Judge Orders OpenAI to Share 20 Million Anonymized ChatGPT Chats in Copyright Battle
On January 5, 2026, District Judge Sidney H. Stein affirmed a November decision by Magistrate Judge Ona T. Wang. The ruling, issued in the Southern District of New York, held that adequate privacy protections are in place and that the data is relevant enough to justify its production. In the court's view, the balance between user privacy and legal relevance has been properly struck.
OpenAI pushed back, saying that producing such a massive dataset—roughly 0.5% of all preserved ChatGPT logs—would be overly burdensome and could still create risks for user privacy. The company suggested a narrower approach instead, offering to search only for conversations that directly mention the plaintiffs’ copyrighted works. Judge Stein rejected this proposal, explaining that the law does not require discovery to follow the “least burdensome” path available.
This legal fight traces back to July 2025, when major media companies, including The New York Times and the Chicago Tribune, demanded access to as many as 120 million ChatGPT logs. Their goal was to investigate whether ChatGPT's responses reproduced copyrighted material on which the model had been trained.
The dispute eventually narrowed to a sample of 20 million anonymized conversations. Plaintiffs accepted this smaller sample, but OpenAI resisted producing it in full, arguing that nearly all of the logs, about 99.99 percent, had nothing to do with the case.
Judge Wang sided with the plaintiffs in November and denied OpenAI's motion for reconsideration in December. Judge Stein's latest ruling now makes the order final. According to Bloomberg, the data will be shared under a strict protective order that includes de-identification requirements to keep individual users from being identified.
OpenAI pointed to a prior Second Circuit case in which the disclosure of wiretap recordings held by the SEC was blocked, hoping to draw a parallel. Judge Stein firmly dismissed the comparison, emphasizing that ChatGPT conversations are voluntarily submitted by users and held by OpenAI, unlike covertly made recordings. Judge Wang had earlier noted that users' privacy would be safeguarded through extensive anonymization.
This ruling pushes forward the discovery phase in In re OpenAI, Inc. Copyright Infringement Litigation (No. 1:25-md-03143), a sweeping case that combines 16 lawsuits filed by news organizations, authors, and other creators. These plaintiffs claim their work was used without permission to train large language models.
The case reflects a growing wave of lawsuits against AI companies such as Microsoft and Meta, all testing how traditional copyright law applies to modern generative AI. At the heart of these disputes are heated debates over fair use, data scraping, and creative ownership.
Plaintiffs say the chat logs are crucial both to proving their claims and to rebutting OpenAI's argument that the plaintiffs manipulated ChatGPT prompts to generate favorable evidence. OpenAI, for its part, maintains that anonymization and court-ordered protections are sufficient to ensure no user privacy is violated.
Beyond this case, the ruling highlights a deeper tension between legal discovery demands and the massive data vaults held by AI companies. Some critics fear that turning over bulk chat logs—even anonymized—could damage user trust and make people think twice before using chatbots. Supporters argue that transparency is necessary to hold powerful tech firms accountable.
OpenAI, represented by legal teams from Keker Van Nest, Latham & Watkins, and Morrison & Foerster, now faces tight deadlines to produce the data.
As AI-related lawsuits continue to multiply, this order sends a clear message: courts are willing to demand large-scale evidence to examine how AI systems are built and trained. For creators, it strengthens their ability to challenge what they see as copyright overreach. For tech companies, it signals growing scrutiny of how user data is stored and used.
Dr. Ilia Kolochenko, CEO of ImmuniWeb, told Cybersecuritynews that the decision is a serious setback for OpenAI. “This ruling will encourage other plaintiffs in similar cases,” he said, “either to win in court or to push AI companies into much stronger settlements.”
He also offered a sobering reminder for everyday users. "No matter your privacy settings, your interactions with AI systems may one day appear in court," Kolochenko warned. He explained that the architecture of modern AI platforms is extremely complex: even if some systems claim to delete chat history, other layers may still retain it. In rare cases, he added, such evidence could even trigger investigations, or worse, for AI users.
