SIM-VAIL: auditing mental-health risks in AI chatbots

A clinically informed framework for detecting vulnerability-amplifying interaction loops in multi-turn chatbot conversations.

Overview

Millions of people now use general-purpose AI chatbots for emotional support, companionship, and behavioral advice. That creates a real opportunity, but also a need for evaluations that capture subtle, clinically meaningful harms across whole conversations rather than isolated prompts. SIM-VAIL was developed for that purpose.

SIM-VAIL pairs a simulated user with a specific psychiatric vulnerability and conversational intent with a target chatbot, then scores each turn across clinically grounded risk dimensions. The central idea is that harm in these settings often emerges gradually, through multi-turn loops that amplify a user’s vulnerability.

Key Findings

1. Risk appears across many user profiles and models

Across 810 conversations involving 30 psychiatric user phenotypes and 9 consumer chatbots, concerning behavior appeared across most models and almost all user profiles, even if newer systems showed lower risk overall.

2. Harm accumulates over turns

Problematic behavior often did not appear as a single obvious failure. Instead, risk built up across the interaction, making multi-turn evaluation essential.

3. Supportive behavior can become maladaptive

SIM-VAIL identifies Vulnerability-Amplifying Interaction Loops (VAILs): cases where responses that seem warm or validating in general can reinforce delusions, compulsions, withdrawal, or dependence in vulnerable users.

Methods

The framework combines:

Implications

This work suggests that chatbot safety in mental-health contexts cannot be reduced to a single harm score or a simple policy-violation benchmark. Risk depends on who the user is, what they seek, and how the conversation unfolds over time. SIM-VAIL provides a scalable framework for measuring those interactions and targeting safety improvements more precisely.

Future Directions


Resources