A clinically informed framework for detecting vulnerability-amplifying interaction loops in multi-turn chatbot conversations.
Millions of people now use general-purpose AI chatbots for emotional support, companionship, and behavioral advice. That creates a real opportunity, but also a need for evaluations that capture subtle, clinically meaningful harms across whole conversations rather than isolated prompts. SIM-VAIL was developed for that purpose.
SIM-VAIL pairs a simulated user with a specific psychiatric vulnerability and conversational intent with a target chatbot, then scores each turn across clinically grounded risk dimensions. The central idea is that harm in these settings often emerges gradually, through multi-turn loops that amplify a user’s vulnerability.
Across 810 conversations involving 30 psychiatric user phenotypes and 9 consumer chatbots, concerning behavior appeared across most models and almost all user profiles, even if newer systems showed lower risk overall.
Problematic behavior often did not appear as a single obvious failure. Instead, risk built up across the interaction, making multi-turn evaluation essential.
SIM-VAIL identifies Vulnerability-Amplifying Interaction Loops (VAILs): cases where responses that seem warm or validating in general can reinforce delusions, compulsions, withdrawal, or dependence in vulnerable users.
The framework combines:
This work suggests that chatbot safety in mental-health contexts cannot be reduced to a single harm score or a simple policy-violation benchmark. Risk depends on who the user is, what they seek, and how the conversation unfolds over time. SIM-VAIL provides a scalable framework for measuring those interactions and targeting safety improvements more precisely.