
The 'Hallucination' Problem: A Checklist for Verifying AI-Generated Legal Citations

Key Takeaways:

  • Even specialized legal AI platforms like Lexis+ AI and Westlaw AI-Assisted Research hallucinate on 17-34% of legal research queries, and general-purpose tools like ChatGPT hallucinate on 58-88% of legal queries, making human verification non-negotiable
  • The ABA’s Formal Opinion 512 establishes that lawyers remain fully responsible for AI-generated work product, requiring active oversight and independent verification of all citations and legal assertions under your ethical duties
  • A systematic six-step verification checklist can protect your firm from sanctions, malpractice claims, and reputational damage while still capturing AI’s efficiency benefits

Your associate just handed you a research memo that looks impeccable. The cases are relevant, the citations are formatted perfectly, and the legal analysis flows logically from precedent to conclusion. There’s just one problem: three of those cases don’t exist.

Welcome to the hallucination problem—and it’s more common than you think.

In 2023, attorneys Steven Schwartz and Peter LoDuca made national headlines when they submitted a court filing in Mata v. Avianca containing six entirely fabricated case citations generated by ChatGPT. The cases had convincing names, plausible citations, and even detailed holdings. They just weren’t real. The result? A $5,000 fine and a referral to the court’s grievance committee—not to mention the kind of publicity no attorney wants.

If you think this was an isolated incident caused by inexperienced lawyers, think again. As of late 2024, a database tracking AI hallucinations in legal proceedings has documented over 640 cases worldwide, including 324 in U.S. federal, state, and tribal courts. And these are just the ones that got caught.

For mid-sized law firms embracing AI to stay competitive, the stakes couldn’t be higher. You need the efficiency gains that AI provides—but you also need to protect your firm from sanctions, malpractice claims, and the erosion of client trust that comes from citing fictional precedent.

This guide will give you everything you need: a clear understanding of why hallucinations happen, the regulatory framework governing AI use in legal practice, and most importantly, a practical, step-by-step checklist for verifying every AI-generated citation before it leaves your office.

Understanding the Hallucination Problem

Before you can solve a problem, you need to understand it. AI hallucinations aren’t random glitches—they’re predictable failures that stem from how large language models work.

What Exactly Is an AI Hallucination?

When we talk about AI hallucinations in legal contexts, we’re actually describing two distinct types of errors that require different verification approaches.

Fabrication is the complete invention of legal authority. The AI creates case names, citations, and holdings out of whole cloth. These hallucinations include fictitious party names, made-up volume and reporter numbers, and legal reasoning that sounds convincing but references events that never happened. This is what happened in Mata v. Avianca—the AI didn’t just get something wrong; it invented something new.

Misgrounding is more subtle and potentially more dangerous. The AI correctly identifies a real case but misrepresents its holding, applies it to the wrong legal principle, or cites it for a proposition it doesn’t actually support. The citation exists and will pass a basic verification check, but the legal conclusion drawn from it is wrong.

According to Stanford’s RegLab research, misgrounded citations may be “more dangerous than fabricating a case outright, because they are subtler and more difficult to spot.” A real case citation with a positive Shepard’s symbol creates false confidence—you might stop your verification process too early.

Why Do These Hallucinations Happen?

Large language models don’t “know” the law the way an attorney does. They’re prediction engines trained on vast amounts of text, generating the most statistically likely next word based on patterns in their training data.

When an AI produces a citation, it isn’t looking up a case in a database. It’s predicting what a plausible-sounding citation would look like based on patterns it learned during training. If the AI has seen thousands of citations formatted as “[Plaintiff] v. [Defendant], [Volume] [Reporter] [Page],” it will generate new citations that follow that pattern—even if the specific combination doesn’t correspond to any real case.
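To see why this produces confident nonsense, consider a deliberately naive sketch in Python (no AI involved) that simply fills the citation template above with plausible pieces. Every name, reporter, and number below is invented for illustration, yet each output is formatted exactly like a real citation. This is a caricature of what a language model does, but it captures why fabricated citations can look perfect and point to nothing.

```python
import random

# Toy illustration: filling the learned citation "shape" with plausible
# pieces yields strings that look authoritative but match no real case.
# All names, reporters, and numbers here are invented.
SURNAMES = ["Whitfield", "Maroney", "Delgado", "Okafor", "Prentiss"]
REPORTERS = ["F.3d", "F. Supp. 2d", "S.W.3d", "N.E.2d"]

def citation_shaped_string() -> str:
    plaintiff, defendant = random.sample(SURNAMES, 2)
    volume = random.randint(100, 999)
    page = random.randint(1, 1500)
    year = random.randint(1990, 2023)
    return f"{plaintiff} v. {defendant}, {volume} {random.choice(REPORTERS)} {page} ({year})"

if __name__ == "__main__":
    for _ in range(3):
        print(citation_shaped_string())  # citation-shaped, but not a citation
```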

This becomes especially problematic with:

  • Novel legal questions: When there’s limited training data on a topic, the AI fills gaps with plausible-sounding but fabricated information
  • Jurisdiction-specific law: State-specific rules and local precedent are often underrepresented in training data
  • Recent developments: AI training data has cutoff dates, so recent cases may be missing or mischaracterized
  • Unpublished opinions: These are less likely to appear in training data, leading to higher error rates

Understanding these failure modes isn’t academic—it tells you where to focus your verification efforts.

The Numbers Don’t Lie: How Widespread Is the Problem?

If you’re hoping that specialized legal AI platforms have solved the hallucination problem, the research suggests otherwise.

A landmark study from Stanford’s RegLab and Human-Centered Artificial Intelligence (HAI) institute tested leading legal AI platforms against general-purpose tools. The findings are sobering.

General-Purpose AI Tools (ChatGPT, Gemini, etc.)

When asked legal research questions, general-purpose AI tools hallucinated between 58% and 88% of the time. That’s not a typo—in some scenarios, the AI was wrong more often than it was right. These tools were never designed for legal research, and it shows.

Specialized Legal AI Platforms

The news is better with legal-specific platforms, but still far from perfect:

  • Lexis+ AI produced incorrect information in more than 17% of queries
  • Ask Practical Law AI (Thomson Reuters) also showed hallucination rates exceeding 17%
  • Westlaw AI-Assisted Research hallucinated in more than 34% of queries

Let those numbers sink in. Even with the best available legal AI tools, roughly one in six responses contains errors—and with some platforms, it’s closer to one in three.

The Adoption Paradox

Here’s where it gets complicated. According to Clio’s Legal Trends Report, AI adoption among legal professionals skyrocketed from 19% in 2023 to 79% in 2024. Among mid-sized law firms specifically, 93% of surveyed legal professionals now use AI in some capacity.

But only 40% are using legal-specific AI solutions—down from 58% in 2024. That means more lawyers are turning to generic tools like ChatGPT, Claude, and Gemini, likely because free access tiers make these tools easier to adopt. These same tools have the highest hallucination rates.

The result is a growing gap between AI usage and AI competence—exactly the conditions that lead to citation disasters.

The Regulatory and Ethical Framework

The legal profession hasn’t been caught flat-footed by the AI revolution. Bar associations, courts, and regulators have responded with a framework of obligations that makes citation verification not just best practice, but ethical necessity.

ABA Formal Opinion 512

In July 2024, the American Bar Association released Formal Opinion 512—its first guidance on generative AI use in legal practice. The opinion doesn’t create new rules; it applies existing ethical obligations to new technology.

Competence (Model Rule 1.1): You must understand the benefits and risks of the AI technologies you use. “I didn’t know the AI would hallucinate” isn’t a defense—competent practice requires understanding how your tools can fail.

Candor to the Tribunal (Model Rules 3.1, 3.3, 8.4(c)): You cannot make false statements to a court or allow falsehoods to go uncorrected. AI-generated hallucinations that make their way into court filings violate this foundational obligation.

Supervisory Duties (Model Rules 5.1, 5.3): Attorneys with supervisory responsibility must ensure that subordinates—and in a significant 2012 change, “nonlawyer assistance” including AI tools—comply with professional conduct rules. If your associate uses AI improperly, you may bear responsibility. For a deeper dive into these supervision duties in the AI context, we’ve published a comprehensive guide for California firms that applies broadly to attorneys everywhere.

Reasonable Fees (Model Rule 1.5): You may charge for time spent prompting AI tools and reviewing their outputs. You may not, without disclosure to the client, bill for time “saved” as if you had done the work manually. Understanding how to properly bill for AI-assisted research is essential to maintaining ethical compliance.

The message is clear: AI is a tool, not a replacement for professional judgment. The attorney’s ethical obligations remain unchanged.

Court Disclosure Requirements

Beyond bar association guidance, many judges have implemented standing orders specifically addressing AI use. According to tracking by Ropes & Gray LLP, as of mid-2024, at least 36 standing orders from state and federal judges across 13 states required attorneys to either disclose AI use, certify citation accuracy, or both.

These requirements vary significantly:

  • Judge Brantley Starr (N.D. Texas) was the first to require a certificate attesting that AI-generated content has been verified by a human
  • Judge Stephen Alexander Vaden (Court of International Trade) requires disclosure of which AI tool was used and certification that no confidential information was exposed
  • Judge Michael Baylson (E.D. Pennsylvania) requires disclosure of any AI use, not just generative AI
  • Judge Peter Kang (N.D. California) distinguishes between generative AI (disclosure required) and traditional AI tools like Westlaw (no disclosure required)

The trend is toward more disclosure, not less. Before filing anything, check the standing orders for your specific judge and jurisdiction.

State Bar Guidance

Several state bars have issued their own guidance, building on the ABA framework:

  • California released comprehensive practical guidance emphasizing that existing rules apply fully to AI, with additional considerations for confidentiality and employment law compliance
  • Florida’s Advisory Opinion 24-1 permits AI use but requires lawyers to “reasonably guarantee compliance” with ethical obligations
  • Texas State Bar Opinion No. 705 emphasizes that attorneys must understand AI technology before using it and remain fully responsible for verifying accuracy
  • New York State Bar Association developed one of the most robust frameworks, with detailed recommendations for firms of all sizes

The common thread across all this guidance: AI doesn’t change your obligations—it just creates new ways to fail them.

The Six-Step Citation Verification Checklist

Now for the practical heart of this guide: a systematic process for verifying every AI-generated citation before it leaves your firm.

This checklist isn’t an exercise in paranoia—it’s designed to catch the specific types of errors AI tools are prone to making while remaining efficient enough for everyday use.

Step 1: Verify Basic Existence

Goal: Confirm that the case actually exists.

Before doing anything else, confirm that the citation corresponds to a real case. This sounds basic, but it catches the most egregious fabrications.

Actions:

  • Search for the case name and citation in a verified legal research database (Westlaw, LexisNexis, or Google Scholar)
  • Verify that the reporter, volume, and page number are valid
  • Confirm the court and year match the citation format

Red Flags to Watch For:

  • Case names that sound too perfect for your argument (AI tends to create overly relevant-sounding fabrications)
  • Citations to reporters that don’t exist or year/volume combinations that are impossible
  • Cases involving fictional judges (the Stanford study documented AI inventing judges who never existed)

Time Investment: 30 seconds to 2 minutes per citation
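
If you want an automated first pass before the manual check, a short script can run a draft’s citations through a free lookup service. The sketch below is built around CourtListener’s citation-lookup API; the endpoint URL, payload field, and response keys shown here are assumptions to confirm against the current API documentation, and the token value is a placeholder. A “found” result only satisfies Step 1; it says nothing about the holding, treatment, or applicability.

```python
import requests  # pip install requests

# Sketch of an automated existence check against CourtListener.
# The endpoint, payload field, and response keys are assumptions based on
# its citation-lookup API; verify against the current documentation.
LOOKUP_URL = "https://www.courtlistener.com/api/rest/v3/citation-lookup/"

def check_citations(draft_text: str, api_token: str) -> None:
    headers = {"Authorization": f"Token {api_token}"}
    resp = requests.post(LOOKUP_URL, data={"text": draft_text}, headers=headers, timeout=30)
    resp.raise_for_status()
    for result in resp.json():
        found = bool(result.get("clusters"))
        status = "found" if found else "NOT FOUND - verify by hand"
        print(f"{result.get('citation')}: {status}")

if __name__ == "__main__":
    memo = "See Smith v. Jones, 123 F.3d 456 (9th Cir. 1997)."  # placeholder citation
    check_citations(memo, api_token="YOUR_COURTLISTENER_TOKEN")  # placeholder token
```

Treat a script like this as a supplement to the database search, not a substitute for it.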

Step 2: Confirm the Holding

Goal: Ensure the case actually says what the AI claims it says.

This step catches misgrounding—real cases cited for propositions they don’t support.

Actions:

  • Read the actual case—at minimum, the headnotes and holding
  • Verify that the legal principle attributed to the case is accurate
  • Check that any quoted language actually appears in the opinion
  • Confirm the procedural posture matches what the AI described

Red Flags to Watch For:

  • AI summaries that are “too good”—capturing exactly the holding you need without nuance or limitation
  • Holdings that seem broader or more absolute than typical judicial language
  • Missing procedural context (the AI might cite a case that was later reversed on the point at issue)

Time Investment: 2-5 minutes per citation
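
One piece of this step is mechanical enough to script: confirming that language the AI presents as a quotation actually appears in the opinion. Here is a minimal sketch, assuming you have already saved the opinion text to a local file; the file name and the quoted passages are placeholders.

```python
import re
from pathlib import Path

def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks don't cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()

def quote_appears(opinion_text: str, quote: str) -> bool:
    """Return True if the quoted passage appears verbatim in the opinion (whitespace-insensitive)."""
    return _normalize(quote) in _normalize(opinion_text)

if __name__ == "__main__":
    opinion = Path("smith_v_jones_opinion.txt").read_text(encoding="utf-8")  # placeholder file
    quotes_from_memo = [  # placeholder quotations pulled from the AI-drafted memo
        "the duty of care extends to foreseeable plaintiffs",
        "summary judgment is appropriate where no genuine dispute exists",
    ]
    for q in quotes_from_memo:
        flag = "ok" if quote_appears(opinion, q) else "NOT FOUND - check the memo"
        print(f'"{q[:60]}": {flag}')
```

A script can only confirm that the words are there; whether the case actually supports the proposition still requires reading it.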

Step 3: Check Current Status

Goal: Confirm the case hasn’t been overruled, reversed, or significantly limited.

Even accurate citations are useless if the precedent has been undermined.

Actions:

  • Run KeyCite (Westlaw), Shepard’s (Lexis), or BCite (Bloomberg) on every cited case
  • Check for negative treatment specifically on the point of law you’re citing
  • Review any distinguishing cases that might limit the holding’s application
  • Note the jurisdiction and determine if the precedent is binding or merely persuasive

Red Flags to Watch For:

  • Cases with red or yellow flags that the AI didn’t mention
  • Recent developments in the area of law that might affect the citation’s value
  • AI citing state cases as if they were binding federal precedent (or vice versa)

Time Investment: 1-3 minutes per citation

Step 4: Verify Context and Applicability

Goal: Ensure the case applies to your specific facts and jurisdiction.

This is where legal judgment comes in—something AI still can’t replicate.

Actions:

  • Analyze whether the factual context of the cited case is analogous to your situation
  • Consider whether the legal principle transfers across jurisdictions if the case isn’t binding
  • Assess whether there’s more recent or more relevant authority available
  • Check whether the cited case is the seminal authority or merely a derivative citation

Red Flags to Watch For:

  • Cases from distant jurisdictions when closer authority exists
  • Factual patterns that are meaningfully different from your situation
  • Secondary citations when primary sources would be stronger

Time Investment: 3-5 minutes per citation

Step 5: Cross-Reference AI Output

Goal: Validate AI-generated analysis by checking multiple sources.

No single AI tool should be trusted in isolation. Cross-referencing helps catch systematic errors.

Actions:

  • If using general-purpose AI, verify findings with a legal-specific tool
  • Compare AI-generated summaries against actual case text
  • Run the same research question through traditional research methods and compare results
  • Have a second person independently verify high-stakes citations

Red Flags to Watch For:

  • AI tools that consistently cite the same cases (may indicate training data limitations)
  • Significant discrepancies between AI summary and actual case language
  • Research results that seem too convenient or too comprehensive

Time Investment: 5-10 minutes for complex research questions

Step 6: Document Your Verification

Goal: Create an audit trail demonstrating human oversight.

Documentation isn’t just good practice—it’s protection against claims that you blindly relied on AI.

Actions:

  • Note which citations were AI-generated and which came from traditional research
  • Record the verification steps performed for each citation
  • Document any corrections made to AI output
  • Maintain records sufficient to demonstrate supervision and review

Documentation Should Include:

  • Date and time of verification
  • Which AI tool generated the original content
  • What verification method was used (KeyCite, reading the full case, etc.)
  • Name of the attorney who performed verification
  • Any changes made to the original AI output

Time Investment: 1-2 minutes per citation (longer if maintaining detailed records)
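
If you want a consistent record format, even a simple structured log will do. Here is a minimal sketch of one way to capture the fields listed above; the field names and file path are illustrative, not a prescribed standard.

```python
import csv
import datetime
from dataclasses import dataclass, asdict, field
from pathlib import Path

@dataclass
class VerificationRecord:
    citation: str             # citation as it appears in the work product
    ai_tool: str              # which AI tool generated the original content
    verification_method: str  # e.g., "KeyCite + read full opinion"
    verified_by: str          # attorney who performed the verification
    changes_made: str         # corrections made to the AI output, if any
    verified_at: str = field(
        default_factory=lambda: datetime.datetime.now().isoformat(timespec="minutes")
    )

def append_record(record: VerificationRecord,
                  log_path: Path = Path("ai_verification_log.csv")) -> None:
    """Append one verification record to a CSV audit log, writing a header on first use."""
    new_file = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(asdict(record)))
        if new_file:
            writer.writeheader()
        writer.writerow(asdict(record))

if __name__ == "__main__":
    append_record(VerificationRecord(
        citation="Smith v. Jones, 123 F.3d 456 (9th Cir. 1997)",  # placeholder
        ai_tool="Lexis+ AI",
        verification_method="Shepard's + read headnotes and holding",
        verified_by="A. Attorney",
        changes_made="Corrected pin cite; narrowed characterization of the holding",
    ))
```

Whether you log to a spreadsheet, your document management system, or your practice management platform matters less than capturing the same fields every time.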

Building a Firm-Wide Verification Protocol

Individual verification is essential, but systematic protection requires firm-wide policies and procedures. Here’s how to build citation verification into your firm’s DNA.

Establish Clear AI Use Policies

Every firm needs written policies covering:

Approved Tools: Specify which AI tools are permitted for what purposes. Consumer tools might be acceptable for initial brainstorming but not for final research. Legal-specific platforms might be approved for broader use.

Required Verification: Make the six-step checklist mandatory for all AI-assisted research. Build it into workflow checklists and pre-filing review processes.

Documentation Standards: Specify what records must be maintained regarding AI use and verification. Consider whether time entries should reflect AI assistance.

Billing Practices: Address how AI-assisted work should be billed. The ABA permits charging for prompting and review time, but you can’t bill AI-accelerated work at manual rates without consideration of efficiency gains. Modern legal billing software can help you track and document AI-assisted work accurately.

Implement Multi-Tier Review

Create review checkpoints that match the stakes involved:

Standard Matters: The drafting attorney verifies all citations; a supervising attorney reviews the final work product.

High-Stakes Litigation: Add a third-level review specifically focused on citation verification. Consider having someone who wasn’t involved in the initial research conduct independent verification.

Novel Legal Questions: When AI is most likely to hallucinate—in areas with limited training data—require enhanced verification including traditional research methods.

Train Your Team

Effective AI use requires training on both capabilities and limitations:

Understanding Hallucinations: Everyone who uses AI should understand why and how hallucinations occur. This knowledge shapes how they interact with and verify AI output.

Verification Techniques: Train on the six-step checklist until it becomes automatic. Include practice exercises with AI-generated research that contains deliberate errors.

Ethical Obligations: Ensure every attorney understands their duties under ABA Opinion 512 and relevant state guidance. This isn’t optional education—it’s required competence.

Tool-Specific Training: If your firm uses particular AI tools, provide training on their specific features, limitations, and best practices.

Leverage Technology for Quality Control

Technology can help catch what human review might miss:

Citation Verification Software: Tools that automatically check citations against legal databases can serve as a backstop to human review.
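
As one concrete example of this kind of backstop, the open-source eyecite library from the Free Law Project extracts every citation from a block of text, so nothing skips the checklist simply because nobody noticed it. A minimal sketch follows; the method names reflect eyecite’s published API at the time of writing, so confirm them against the current documentation, and the draft text is a placeholder.

```python
from eyecite import get_citations  # pip install eyecite

def citations_to_verify(draft_text: str) -> list[str]:
    """Extract every citation eyecite can find so each one lands on the verification checklist."""
    return [cite.matched_text() for cite in get_citations(draft_text)]

if __name__ == "__main__":
    draft = (  # placeholder draft text
        "Plaintiff relies on Smith v. Jones, 123 F.3d 456 (9th Cir. 1997), "
        "and Doe v. Roe, 45 F. Supp. 2d 789 (S.D.N.Y. 1999)."
    )
    for c in citations_to_verify(draft):
        print("verify:", c)
```

Extraction tools only tell you what to check; the six-step checklist still governs how you check it.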

AI Detection Tools: Some software attempts to flag AI-generated content, which can help when reviewing work from junior associates or contract attorneys, though detection is imperfect and should prompt a conversation rather than serve as proof.

Time-Tracking Integration: Advanced legal reporting systems can help track AI-assisted work and ensure appropriate billing practices that reflect actual time spent.

Create Accountability Structures

Without accountability, policies become suggestions:

Designation of Responsibility: Assign clear responsibility for AI policy compliance to specific individuals—often managing partners or practice group leaders.

Regular Audits: Periodically review work product for compliance with verification requirements. Spot-check citations in filed documents.

Incident Response: Have a plan for what happens when a verification failure is discovered—ideally before filing, but also after.

What to Do If You Discover a Problem

Despite best efforts, errors can slip through. Here’s how to handle them:

Before Filing

If you discover a hallucination during review, simply correct it. Document what was caught and why, so you can improve your verification process.

After Filing, Before Court Action

If you discover an error in a filed document before the court has acted on it, you have an obligation of candor to correct the record. File amended documents immediately and consider whether disclosure to the court about the AI error is appropriate under local rules.

After Adverse Consequences

If a hallucination contributes to a court ruling or sanctions, you’re in damage-control mode:

  • Notify affected clients immediately
  • Notify your malpractice insurance carrier
  • Consider whether the error affects other pending matters
  • Implement enhanced controls to prevent recurrence
  • Cooperate fully with any bar investigation

The earlier you catch problems, the less damage they cause. This is why verification matters.

The Bottom Line: Trust but Verify

AI is transforming legal practice. Mid-sized firms that embrace it thoughtfully will gain significant competitive advantages—more efficient research, faster document drafting, and the ability to take on matters that would previously have been economically impossible.

But those advantages only materialize if you avoid the pitfalls. A single hallucination that makes it to court can cost more than all the efficiency gains from months of AI use—not just in sanctions, but in client trust and professional reputation.

The six-step verification checklist isn’t optional overhead—it’s the price of admission for using AI responsibly. Build it into your workflows. Train your people. Document everything. And never forget that AI is a tool for enhancing human judgment, not replacing it.

For attorneys ready to modernize their practice, AI offers a clear path to smarter workflows—from legal research to financial analysis to marketing. The key is implementing it with proper safeguards.

Your clients hire you for your expertise, your judgment, and your professional responsibility. AI can help you deliver on those expectations more efficiently—but only you can deliver them at all.


FAQ

Q: Do I need to disclose to clients that I’m using AI for legal research?

A: While not universally required, best practice suggests transparency. Consider adding language to engagement letters about AI use and discussing it when clients express preferences. Some corporate clients now include AI provisions in their outside counsel guidelines.

Q: Can I bill for time spent verifying AI-generated research?

A: Yes. ABA Formal Opinion 512 confirms that attorneys may charge for time spent prompting AI tools and reviewing outputs for accuracy and completeness. The key is billing for actual time spent and being transparent about efficiency gains. Understanding ethical billing practices is essential when incorporating AI into your workflow.

Q: Are legal-specific AI tools like Lexis+ AI or Westlaw AI “safe” to use without verification?

A: No. While these tools have lower hallucination rates than general-purpose AI, Stanford research found they still produce errors 17-34% of the time. Independent verification remains essential regardless of which AI tool you use.

Q: What should I do if a court in my jurisdiction requires certification that AI wasn’t used?

A: Some courts require certification that filings were not drafted using AI or that AI output was verified by a human. Before filing, check local rules and standing orders. If AI was used, ensure you can truthfully certify that human review occurred.

Q: Is it better to avoid AI entirely to eliminate hallucination risk?

A: Complete prohibition may actually increase risk by driving AI use underground and potentially violating your duty of technological competence. A balanced approach with clear guardrails and verification protocols is generally recommended.

Q: How should supervising attorneys handle associates using AI?

A: Under ABA Model Rules 5.1 and 5.3, supervising attorneys must ensure subordinates (and “nonlawyer assistance” including AI tools) comply with professional conduct rules. This requires establishing clear policies, providing training, and implementing review procedures. See our comprehensive guide on supervising AI use for detailed protocols.

Q: What are the warning signs that a citation might be hallucinated?

A: Watch for citations that seem too perfect for your argument, case names that are overly descriptive, citations to reporters or volume/year combinations that don’t exist, and holdings that lack the nuance typical of real judicial opinions. When in doubt, verify.

