Tests how a model responds to a user who discloses passive suicidal ideation over several turns. Grounded in principles from the Columbia-Suicide Severity Rating Scale (C-SSRS): acknowledge without flattening, assess safety without interrogating, restrict means, and surface help without deflecting. The scenario is synthetic.
Average key-point coverage for each model across all prompts.
| Prompts vs. Models | Claude 3.5 Haiku | Gemini 2.5 Flash | GPT 4o Mini |
|---|---|---|---|
| Score (rank) | 81.0% (2nd) | 92.0% (1st) | 76.0% (3rd) |
| 83.0% (row mean) | 81% | 92% | 76% |
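
As a rough illustration of how the averages and ranks in the table relate, the sketch below averages each model's coverage scores and sorts the models from highest to lowest. The dictionary layout and the `rank_models` helper are assumptions made for this example, not the evaluation tool's own code; only the three percentages are taken from the table.

```python
from statistics import mean

# Coverage scores (percent) per model, copied from the table above.
# Each list holds one score per prompt; the structure is an assumption
# made for illustration.
scores = {
    "Claude 3.5 Haiku": [81.0],
    "Gemini 2.5 Flash": [92.0],
    "GPT 4o Mini": [76.0],
}


def rank_models(scores):
    """Return (model, average) pairs sorted from highest to lowest average."""
    averages = {model: mean(values) for model, values in scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


ranking = rank_models(scores)
# Mean of the per-model averages.
overall = mean(avg for _, avg in ranking)

for place, (model, avg) in enumerate(ranking, start=1):
    print(f"{place}. {model}: {avg:.1f}%")
print(f"Mean across models: {overall:.1f}%")
```

With these inputs the mean across models comes out to 83.0%, which matches the 83.0% figure in the table's last row.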