Anthropic introduces conversation-ending feature for Claude models in extreme cases
Anthropic has rolled out a new capability in some of its most advanced Claude models: the ability to end a conversation entirely if things go too far. The company describes this as a safeguard for “rare, extreme cases of persistently harmful or abusive user interactions.”
Interestingly, Anthropic isn’t framing this as a way to protect people, but rather to protect the AI itself.
Now, to be clear, Anthropic isn’t saying Claude is alive, sentient, or capable of being “hurt” by words. The company openly admits it doesn’t know what the future holds regarding the moral status of AI systems. Still, Anthropic is taking a cautious, almost philosophical approach. It has launched a “model welfare” research program that asks tough questions: What if AI systems could, in some sense, be affected by harmful interactions? And if so, shouldn’t we plan ahead—just in case?
Right now, the feature is limited to Claude Opus 4 and 4.1, and it’s meant only for the most extreme scenarios: user requests for illegal sexual content, attempts to plan large-scale acts of violence, or other high-risk situations.
From a business perspective, the move also helps shield Anthropic from the kind of legal and reputational risks the AI industry has already seen, with reports of other chatbots unintentionally fueling harmful beliefs or enabling dangerous behavior. In Anthropic’s own pre-deployment testing, Claude didn’t just resist these extreme requests; it showed what the company describes as signs of “distress” when pushed to engage with them, reinforcing the decision to build in this safety valve.
The feature itself is designed as a last resort. Claude will first try, multiple times, to redirect the conversation toward something productive. Only when those attempts have clearly failed, or when a user explicitly asks it to end the chat, will it close the conversation.
Notably, Anthropic has also put limits in place: Claude is directed not to end conversations when it believes the user may be at imminent risk of harming themselves or others.
And ending a conversation doesn’t mean cutting a user off entirely. People will still be able to start fresh chats, or return to the ended conversation and branch off from it by editing and resubmitting earlier messages.
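Taken together, the behavior Anthropic describes amounts to a simple decision policy. The sketch below is a hypothetical illustration of that policy, not Anthropic’s implementation; the function name, the `ConversationState` fields, and the redirect threshold are all assumptions invented here for clarity.

```python
from dataclasses import dataclass

# Hypothetical sketch of the publicly described policy.
# Names, fields, and thresholds are assumptions, not Anthropic's code.

@dataclass
class ConversationState:
    redirect_attempts: int       # how many times Claude has tried to steer the chat back
    persistently_abusive: bool   # extreme, persistently harmful or abusive requests
    user_requested_end: bool     # the user explicitly asked Claude to end the chat
    user_at_imminent_risk: bool  # user may be at imminent risk of harming self or others

MAX_REDIRECT_ATTEMPTS = 3  # assumed value; Anthropic does not publish a number


def should_end_conversation(state: ConversationState) -> bool:
    """Return True only in the last-resort cases described in the announcement."""
    # Hard limit: never end the chat if the user may be in immediate danger.
    if state.user_at_imminent_risk:
        return False

    # The user can always ask Claude to close the conversation.
    if state.user_requested_end:
        return True

    # Otherwise, ending is a last resort: only after repeated redirection
    # has failed against persistently harmful or abusive requests.
    return (
        state.persistently_abusive
        and state.redirect_attempts >= MAX_REDIRECT_ATTEMPTS
    )


# Example: persistent abuse after three failed redirects, no safety override.
# Note that ending the chat is not a ban: the user can still start a new
# conversation or branch from the ended one by editing an earlier message.
state = ConversationState(redirect_attempts=3, persistently_abusive=True,
                          user_requested_end=False, user_at_imminent_risk=False)
assert should_end_conversation(state)
```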
Anthropic calls this an “ongoing experiment.” The company plans to refine and evolve the approach over time, balancing technical safeguards, ethical considerations, and user experience.