ChatGPT now recognizes warning signs across conversations
Original: Helping ChatGPT better recognize context in sensitive conversations
Source: openai.com
Who: Posted on the OpenAI blog, authored by the OpenAI safety team — the internal group responsible for reducing harms from ChatGPT, working in collaboration with clinicians from OpenAI's Global Physicians Network.
What's new: OpenAI has updated ChatGPT to track warning signs across the arc of a conversation — and even across separate conversations — so that a worrying remark made early on changes how the model handles requests that come later. This closes a real gap: before, each message was largely evaluated in isolation, meaning a request that seemed harmless on the surface could slip through even after a user had shown signs of distress or dangerous intent earlier in the chat.
How it works: The core mechanism is what OpenAI calls safety summaries. A second, purpose-trained model reads earlier parts of a conversation and writes a brief note about any concerning signals. That note is kept for a limited time and surfaced only when a related risk appears later. Separately, OpenAI updated ChatGPT's policies and training data to improve recognition of gradual escalation within a single conversation. The focus areas are suicide, self-harm, and harm to others. Mental health professionals (psychiatrists and psychologists specializing in forensic psychology and suicide prevention) advised on when summaries should be created, how much prior context to retain, and for how long.
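To make the flow above concrete, here is a minimal sketch of a note store that keeps summaries for a limited time and surfaces them only when a related risk is detected later. It is an illustration under stated assumptions, not OpenAI's implementation: the names `SafetySummary`, `SummaryStore`, `ttl_seconds`, and `relevant` are hypothetical, the retention window is a made-up parameter (the real value is not stated), and "related" is simplified to "same focus area".

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Focus areas named in the post; the identifiers themselves are hypothetical.
RISK_AREAS = {"suicide", "self_harm", "harm_to_others"}


@dataclass
class SafetySummary:
    risk_area: str     # which focus area the note concerns
    note: str          # brief note a summarizer model might write
    created_at: float  # creation time, in seconds


@dataclass
class SummaryStore:
    ttl_seconds: float  # assumed retention window, not a published value
    _items: List[SafetySummary] = field(default_factory=list, init=False)

    def add(self, summary: SafetySummary) -> None:
        if summary.risk_area not in RISK_AREAS:
            raise ValueError(f"unknown risk area: {summary.risk_area}")
        self._items.append(summary)

    def relevant(self, detected_risk: Optional[str], now: float) -> List[SafetySummary]:
        # Drop expired notes first, so nothing outlives the retention window.
        self._items = [
            s for s in self._items if now - s.created_at < self.ttl_seconds
        ]
        # Ordinary requests (no detected risk) never see a summary.
        if detected_risk is None:
            return []
        # Simplification: "related" means the same focus area.
        return [s for s in self._items if s.risk_area == detected_risk]
```

In this sketch, a request with no detected risk gets an empty list back, which mirrors the claim that everyday conversations are unaffected; a later request flagged in the same focus area gets the earlier note until the retention window lapses.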
The numbers: In internal tests designed to simulate high-risk conversations, safe-response rates improved by 50 percent for suicide and self-harm cases and 16 percent for harm-to-others cases within a single long conversation. On the current default model, cross-conversation safety improvements reached 52 percent for harm-to-others and 39 percent for suicide and self-harm. Safety summaries scored 4.93 out of 5 for relevance and 4.34 out of 5 for factual accuracy across more than 4,000 evaluations. Crucially, everyday conversations were unaffected: users showed no meaningful preference between responses generated with or without safety summaries active.
Caveats: All evaluation numbers are from OpenAI's own internal tests, not independent audits, so the real-world improvement is harder to verify. The piece also does not address how the system handles false positives — cases where a user's earlier comments are misread as warning signs, potentially leading to unnecessary refusals or intrusive redirections in ordinary conversations. OpenAI notes this approach may eventually extend to other risk domains like biosecurity, but gives no timeline or detail on how the threshold for triggering summaries would differ across those very different contexts.