
Court tosses hallucinated citation from Anthropic’s defense in copyright infringement case

Wednesday, May 28, 2025, 04:07, by ComputerWorld
Claude has failed its creator Anthropic during the startup’s defense in a court case.

A district court in California has struck a portion of testimony from an Anthropic data scientist that referenced a fictitious academic article hallucinated by the AI.

The mistake was discovered in a court filing from Anthropic, part of its defense in the case in which Universal Music Group, Concord, and ABKCO accuse the startup, valued at $61 billion, of using copyrighted song lyrics without permission to train its AI chatbot, Claude.


Anthropic admitted its mistake last week, and must now produce 4 million more records of Claude’s interactions with users than the sample it had proposed, focusing in particular on how frequently users prompted the chatbot for copyrighted lyrics.

The development underscores the growing, and concerning, prevalence of AI-introduced hallucinations in legal filings, which can put enterprises at risk if their lawyers use AI tools for legal discovery and documentation.

“AI-induced laziness is becoming an epidemic in the legal profession,” said Brian Jackson, principal research director at Info-Tech Research Group. “AI research tools shouldn’t be relied upon to create court-ready output.”

‘An honest citation mistake’

Universal Music Group, Concord, and ABKCO had asked federal judges in California to stop Anthropic from using their song lyrics to train the company’s models. Judges shot that request down in March; the plaintiffs then filed an amended copyright infringement complaint against Anthropic in April.

One of the points of contention in the case is what constitutes a suitable sample size of Claude’s interactions to be examined.

On April 30, Anthropic data scientist Olivia Chen submitted a filing arguing for a sample size of 1 million, which she said would represent a “reasonable prevalence rate” for a “rare event” (users seeking song lyrics) that could be as low as 0.01% of all user interactions. Her testimony cited an academic article from The American Statistician that does not exist.
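The dispute over sample size comes down to standard arithmetic for estimating the prevalence of a rare event: the rarer the event, the larger the sample needed for a given precision. As a rough illustration only (this is a generic sketch, not Chen’s analysis or the court’s calculation), the usual 95%-confidence formula looks like this in Python:

```python
# Illustrative only: standard sample-size arithmetic for estimating the
# prevalence of a rare event. This is NOT the analysis from Chen's filing.
import math

Z_95 = 1.96  # z-score for a 95% confidence interval


def relative_margin_of_error(prevalence: float, sample_size: int) -> float:
    """Half-width of the 95% CI for `prevalence`, as a fraction of the prevalence."""
    standard_error = math.sqrt(prevalence * (1 - prevalence) / sample_size)
    return Z_95 * standard_error / prevalence


def required_sample_size(prevalence: float, relative_moe: float) -> int:
    """Sample size needed to estimate `prevalence` within `relative_moe` of itself."""
    absolute_moe = relative_moe * prevalence
    return math.ceil(Z_95**2 * prevalence * (1 - prevalence) / absolute_moe**2)


if __name__ == "__main__":
    p = 0.0001  # a rare event occurring in 0.01% of interactions
    for n in (1_000_000, 5_000_000):
        print(f"n={n:>9,}: relative margin of error ≈ {relative_margin_of_error(p, n):.1%}")
    print("n for a ±10% relative margin:", f"{required_sample_size(p, 0.10):,}")
```

Run with a 0.01% event rate, this sketch gives a relative margin of error of roughly 20% for a 1-million-interaction sample and under 9% for 5 million; those are illustrative figures, not ones argued in the filings.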

Earlier this month, the plaintiffs asked the court to examine Chen and strike her declaration because of the hallucination.

The court gave Anthropic time to investigate the matter. The startup’s attorney called it “an honest citation mistake” and admitted Claude was used to “properly format” at least three citations. In doing so, it generated a fictitious article name with inaccurate authors who have never even worked together.

However, while the text in the document was hallucinated, the footnotes do link to the proper article that was “located by a human being” using Google search. Therefore, judges noted in their decision, this was not a case where “attorneys and experts [have] abdicate[d] their independent judgment and critical thinking skills in favor of ready-made, AI-generated answers.”

Judges called it a “plain and simple AI hallucination” and also questioned how a manual citation check did not catch such an error.

The court determined that a margin of error of roughly 11.3% is “within the range that will yield a representative sample.” Thus, Anthropic must now produce a sample of 5 million prompt-output pairs drawn equally from pre-suit and post-suit data: 2.5 million from between September 22, 2023 and October 18, 2023, and 2.5 million from between October 19, 2023 and March 22, 2024. These must be randomly selected, and Anthropic must provide them to the court no later than July 14, 2025.
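Mechanically, the sampling the order describes is simple: split the records at the filing date and draw equal-sized random samples from each side. Here is a minimal sketch, assuming a hypothetical record format in which each prompt-output pair carries a `timestamp` field (the actual production format is not described in the article):

```python
# A minimal sketch of equal-sized random samples from pre-suit and post-suit
# windows. The record schema (a `timestamp` on each pair) is assumed.
import random
from datetime import date

PRE_SUIT = (date(2023, 9, 22), date(2023, 10, 18))
POST_SUIT = (date(2023, 10, 19), date(2024, 3, 22))


def sample_equally(records: list[dict], per_window: int, seed: int = 0) -> list[dict]:
    """Draw `per_window` randomly selected records from each date window."""
    rng = random.Random(seed)
    sample = []
    for start, end in (PRE_SUIT, POST_SUIT):
        window = [r for r in records if start <= r["timestamp"] <= end]
        sample.extend(rng.sample(window, per_window))  # raises if the window is too small
    return sample
```

Under the court’s order, `per_window` would be 2,500,000 for each of the two windows.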

Models trained for law are the best bet

AI gaffes in law are becoming so significant that they are spawning a slew of research papers and legal deep dives. AI researcher Damien Charlotin, for one, keeps a running database chronicling legal decisions in cases where generative AI produced hallucinated content (typically fake citations).

“CIOs have an important role to play in explaining not only hallucinations, but also cybersecurity risks,” said Irina Raicu, who heads the internet ethics program at Santa Clara University’s Markkula Center for Applied Ethics. “Most lawyers, whether in-house or not, are not familiar with these new challenges.”

She also pointed out that last July, the American Bar Association issued its first ethics guidance on lawyers’ use of AI tools. The use of AI in the legal system has significant implications not just for competence, but also for the duties of confidentiality, communication, and disclosure, among others.

“All enterprises should be encouraging greater sharing of information between their legal and technical staff,” said Raicu.

Legal professionals who get called out for including incorrect or hallucinated AI output in legal documents typically seem to be using consumer-grade, general-purpose AI tools like ChatGPT or Gemini, Jackson noted. Law firms using AI should instead seek out industry-specific tools. One example is Harvey, built with OpenAI. Others include Alexi and Clio.

Jackson noted that AI research tools can support the work of paralegals or legal assistants who put together case files and relevant materials, including decisions, client-submitted documentation, and precedents. Those staff report that such tools can help them reduce their research work by 30% to 50%.

“It’s augmenting the process, not automating it,” he said. “The human in the loop here should be the lawyer running the case, reviewing what’s being submitted, and spotting the errors in the material and the false citations.”

Generative AI tools not built for legal research will “at best” find cases on the internet, and “at worst” make something up based on their training data, noted Mathew Kerbis, founding attorney at Subscription Attorney LLC.

That said, the Stanford Institute for Human-Centered AI found that even legal models still hallucinate in 1 out of 6 (or more) benchmarking queries.

Kerbis suggests that lawyers use legal AI tools built on retrieval-augmented generation (RAG), in which the model pulls from a database of case law and checks its answer against that database before responding.
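A bare-bones sketch of that pattern follows, using a toy in-memory corpus and naive keyword matching in place of a production retriever and legal database; the code is purely illustrative and not taken from any of the tools named above.

```python
# Toy RAG-style pipeline: retrieve candidate cases from a corpus, then verify
# that every cited case actually exists in that corpus before trusting it.
from dataclasses import dataclass


@dataclass
class Case:
    name: str
    text: str


def retrieve(query: str, corpus: list[Case], k: int = 3) -> list[Case]:
    """Rank cases by naive keyword overlap with the query (stand-in for a real retriever)."""
    terms = set(query.lower().split())
    scored = sorted(corpus, key=lambda c: -len(terms & set(c.text.lower().split())))
    return scored[:k]


def verify_citations(cited_names: list[str], corpus: list[Case]) -> list[str]:
    """Return any cited case names that do not exist in the database."""
    known = {c.name for c in corpus}
    return [name for name in cited_names if name not in known]
```

The retrieved cases are what get handed to the model as context, and the verification step flags anything the model cites that is not actually in the database before a lawyer relies on it; that grounding is the point Kerbis is making.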

“The most reliable tools have a legal-specific large language model (LLM) that does the research, rather than a general model,” he said. “General models will always do worse than specialized models.”
https://www.computerworld.com/article/3996221/court-tosses-hallucinated-citation-from-anthropics-def...

