A Stanford University study found that hallucinations, the tendency of large language models (LLMs) to produce content that deviates from actual facts or from well-established legal principles and precedents, occurred in 69% to 88% of responses to specific legal queries.
The study ran 200,000 queries against each of the GPT-3.5, Llama 2, and PaLM 2 models. Although these generative AI systems have reportedly passed bar exams, they failed at basic tasks routinely performed by junior attorneys. For example, when asked to assess the precedential relationship between two cases, most LLMs performed no better than random guessing. When answering queries about a court's core ruling (its holding), the models hallucinated at least 75% of the time.
The risks of using LLMs for legal research are especially high for:
- Litigants in lower courts or in less prominent jurisdictions
- Individuals seeking detailed or complex legal information
- Users formulating questions based on incorrect premises
- Those uncertain about the reliability of LLM responses
The findings are particularly concerning given that dozens of legal tech startups and law firms claim to be using AI to deliver better, more efficient legal services. In light of such poor performance on these tests, anyone using AI or LLMs for legal work should exercise extreme caution. The law, it appears, requires more intelligence than artificial intelligence currently offers.