The Lawyer Who Got Sanctioned for Using ChatGPT — And What Every Boutique Firm Must Learn From It
The hallucination risk in legal AI is architectural — not a user error problem.
Author
Johan Ang • May 28, 2026
QUICK VERDICT
Choose ChatGPT (GPT-4o) if:
- You use it only for general drafting and research, never for case files
- You never submit AI output to a court without exhaustive manual verification
- You understand the context window and hallucination limitations
Choose Genovra AI if:
- You process medical records, depositions, or discovery documents with AI
- You need every factual claim anchored to an exact Page and Line from your source file
- You cannot accept the professional liability of an unsourced AI output in a legal matter
In June 2023, a federal judge in the Southern District of New York sanctioned two attorneys for submitting a brief containing six fabricated court decisions — all generated by ChatGPT. The case, Mata v. Avianca, Inc., became the first widely documented judicial sanction arising from AI hallucination in legal practice. It will not be the last. This is what happened, why it happened, and what every boutique litigation firm must do differently.
The Mata v. Avianca Case: What Actually Happened
On June 22, 2023, Judge P. Kevin Castel of the U.S. District Court for the Southern District of New York issued a 46-page opinion sanctioning attorneys Steven A. Schwartz and Peter LoDuca, along with their firm, Levidow, Levidow & Oberman. The citation: Mata v. Avianca, Inc., No. 22-cv-1461 (S.D.N.Y. June 22, 2023).
The chain of events was straightforward. Schwartz used ChatGPT to find case law for a brief opposing Avianca's motion to dismiss on statute-of-limitations grounds. ChatGPT produced six decisions: Varghese v. China Southern Airlines, Shaboon v. Egyptair, Petersen v. Iran Air, Martinez v. Delta Airlines, Estate of Durden v. KLM Royal Dutch Airlines, and Miller v. United Airlines. None of the six existed. The quotations and internal citations attributed to them were invented as well.
When opposing counsel could not locate the cases, they flagged the discrepancy. The court ordered Schwartz to produce the decisions. He submitted ChatGPT-generated text purporting to be the opinions. When asked directly whether these decisions existed, Schwartz represented they did — citing ChatGPT's confirmations as evidence. The court then confirmed independently that the cases were fabricated.
Judge Castel found that the attorneys had acted in bad faith, pointing to their false and misleading statements to the court and to their continuing to stand by the fake opinions after the court questioned them. The sanctions included a $5,000 penalty and required letters to the plaintiff and to each judge falsely named as the author of a fake decision.
Schwartz later stated he was unaware that ChatGPT could fabricate legal citations. That statement is the most important fact in this case — not because it is a legal defense, but because it reflects the actual state of knowledge among practicing attorneys in 2023. And in many firms, in 2026.
What ChatGPT Actually Does (And Why It Cannot Help Itself)
ChatGPT is a large language model. It does not retrieve information from a database. It does not search the internet by default. It generates the statistically most probable next token given everything that came before it. When you ask ChatGPT about a legal case, it generates text that looks like a legal citation based on patterns learned during training — not based on verification against an actual database of decisions.
This distinction is structural. It means ChatGPT can produce a citation that looks exactly like a real court decision — correct court, plausible year, plausible subject matter, plausible case name format — that simply does not exist. It can then, when asked to confirm the citation, generate confirmation text. It is not lying in any meaningful sense. It is producing the most probable language sequence given the question. The result is operationally indistinguishable from confident misinformation.
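The gap between a citation that looks right and a citation that is real can be shown in a few lines. The sketch below is illustrative only: `TOY_INDEX` is a hypothetical one-entry stand-in for an authoritative case database, `INVENTED` is a string made up for this example, and the regular expression covers only a handful of reporter formats. A format check is roughly all that a pattern can guarantee; existence requires a lookup against a trusted source.

```python
import re

# Hypothetical stand-in for an authoritative case-law database.
# It holds one real citation: Brown v. Board of Education, 347 U.S. 483 (1954).
TOY_INDEX = {"347 U.S. 483"}

# Matches the *shape* of a few common reporter citations, nothing more.
CITATION_SHAPE = re.compile(r"^\d{1,3} (?:U\.S\.|F\.3d|F\. Supp\. 3d) \d{1,4}$")

# A string invented for this example; it only needs to look plausible.
INVENTED = "812 F.3d 1201"


def is_well_formed(citation: str) -> bool:
    """True if the string has the shape of a reporter citation."""
    return CITATION_SHAPE.match(citation) is not None


def is_in_index(citation: str) -> bool:
    """True only if the citation appears in the trusted index."""
    return citation in TOY_INDEX


for candidate in ("347 U.S. 483", INVENTED):
    print(candidate, is_well_formed(candidate), is_in_index(candidate))
```

Both strings pass the shape check. Only the one present in the index passes the lookup, and asking the generator to "confirm" its own output adds no lookup to the process.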
There is a second structural problem specific to case file work: context limits. ChatGPT's context window is approximately 128,000 tokens — roughly 90,000 to 100,000 words. A 500-page medical record in PDF format, once parsed, often exceeds this limit. When a document exceeds the context window, facts near the beginning or end of the document are frequently dropped. The model continues generating text without informing you that portions of the file were not processed. This is not a flaw that prompt engineering can fix.
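The arithmetic behind that claim is easy to check. The following is a back-of-the-envelope sketch, not a tokenizer: the 128,000-token window is the figure cited above, while `WORDS_PER_TOKEN`, `WORDS_PER_PAGE`, and `reserve_tokens` are assumptions chosen for illustration (real records vary widely, and scanned pages with OCR noise often tokenize worse).

```python
CONTEXT_WINDOW_TOKENS = 128_000  # figure cited in the text
WORDS_PER_TOKEN = 0.75           # assumption: common rule of thumb for English
WORDS_PER_PAGE = 400             # assumption: dense records may run higher


def estimated_tokens(pages: int, words_per_page: int = WORDS_PER_PAGE) -> int:
    """Rough token count for a document of the given page length."""
    return round(pages * words_per_page / WORDS_PER_TOKEN)


def fits_in_context(pages: int, reserve_tokens: int = 4_000) -> bool:
    """True if the document plus room for the prompt and the answer fits.

    reserve_tokens is a hypothetical allowance for instructions and output.
    """
    return estimated_tokens(pages) + reserve_tokens <= CONTEXT_WINDOW_TOKENS


for pages in (100, 250, 500):
    print(pages, estimated_tokens(pages), fits_in_context(pages))
```

Under these assumptions a 500-page record comes to roughly 267,000 tokens, about twice the window, so something has to be cut or condensed before the model sees it.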
The ABA's Response: Opinion 512 and Model Rule 1.1
In July 2024, about a year after the Mata sanctions, the American Bar Association issued Formal Opinion 512: Generative Artificial Intelligence Tools. The opinion is careful and specific. It does not prohibit the use of generative AI tools in legal practice. It establishes the conditions under which use is professionally responsible.
The core holding is grounded in Model Rule 1.1, the duty of competence. ABA Opinion 512 states that attorneys must understand the technology well enough to use it competently, supervise any AI-generated work product, verify all factual and legal claims before submission, and disclose AI use where court rules require it.
The critical implication is this: if an AI tool does not provide a verifiable source for its factual claims, satisfying the duty of competence requires the attorney to independently locate and verify each claim — which in the context of a 500-page medical record defeats the purpose of using AI in the first place.
The opinion also addresses confidentiality under Model Rule 1.6. Attorneys must ensure that client information is not transmitted to AI systems in ways that violate confidentiality obligations. General-purpose AI tools that use conversation data for model training — as many consumer tiers do — may trigger Rule 1.6 concerns when processing actual case files.
The Structural Problem: Why Prompt Engineering Cannot Fix This
After Mata, a common response among attorneys was to add verification steps to their ChatGPT workflow: tell the model to only cite real cases, ask it to confirm citations twice, cross-reference with a separate search. These workflows address a symptom. They do not address the cause.
The cause is that ChatGPT has no ground truth to verify against. When you upload a 400-page deposition transcript to ChatGPT and ask it to identify contradictions, ChatGPT generates text that describes what contradictions might plausibly exist in a deposition — drawing on its training data. It is not reading your document the way a paralegal reads a document, tracking each statement against a prior statement. It is generating text that sounds like an analysis of a deposition.
The distinction matters in practice. A paralegal who identifies a contradiction on page 342 of a deposition can take you to page 342 and show you the exact sentence. ChatGPT can tell you there is a contradiction at "approximately page 340" based on its probabilistic reconstruction of the document. If the document was partially truncated by the context window, page 340 may not have been fully processed. The attorney has no way to know.
Our full Genovra AI vs. ChatGPT comparison documents these limitations in depth, alongside the specific capability differences for litigation case files. Enterprise alternatives like Harvey AI address the hallucination question differently, but at a price point ($50,000–$100,000/year) that is not designed for boutique practices.
What Citation Grounding Actually Means in Practice
An agentic paralegal built for boutique litigation satisfies ABA Opinion 512's verification requirement only when it grounds every factual claim in the source document itself, not in a probabilistic reconstruction of it.
Genovra AI's architecture is built around this requirement. Every factual extraction — every timeline event, every contradiction flag, every witness statement mapped in the Case Master Brief™ — is anchored to an Exact Page and Line citation from the uploaded document. Not a paraphrase. Not a reconstruction. The specific page and line from your file where the claim originates.
The practical consequence: an attorney reviewing Genovra's output on a 500-page medical record can verify any disputed claim in seconds. The citation takes the attorney directly to the source. That verification process is what ABA Model Rule 1.1 requires. It is the difference between using AI as a shortcut and using AI as a force multiplier for attorney judgment.
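What "verifiable in seconds" means can be sketched as a data structure plus one check. This is a minimal illustration under stated assumptions, not Genovra's implementation, which this article does not describe at the code level. `AnchoredClaim`, `normalize`, and `verify` are hypothetical names, and the document is assumed to be already parsed into pages of numbered lines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnchoredClaim:
    """A factual claim tied to one location in the source file (hypothetical)."""
    claim: str   # the extracted statement shown to the attorney
    page: int    # 1-based page number in the source document
    line: int    # 1-based line number on that page
    quote: str   # verbatim text the claim rests on


def normalize(text: str) -> str:
    """Collapse whitespace and case so line-wrap differences do not matter."""
    return " ".join(text.split()).lower()


def verify(claim: AnchoredClaim, pages: list[list[str]]) -> bool:
    """True only if the quoted text really appears at the cited page and line."""
    if not 1 <= claim.page <= len(pages):
        return False
    page_lines = pages[claim.page - 1]
    if not 1 <= claim.line <= len(page_lines):
        return False
    return normalize(claim.quote) in normalize(page_lines[claim.line - 1])


# Toy two-page transcript, invented for this example.
transcript = [
    ["Q. Where were you on March 3?", "A. I was at home all day."],
    ["Q. Did you drive that day?", "A. Yes, I drove to the clinic."],
]

good = AnchoredClaim("Witness says she stayed home", 1, 2, "I was at home all day")
bad = AnchoredClaim("Witness says she stayed home", 2, 2, "I was at home all day")
print(verify(good, transcript), verify(bad, transcript))
```

The design point is that a claim without a matching anchor fails the check outright, so a wrong or missing citation surfaces as a failure rather than as a plausible sentence.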
Zero Data Retention (ZDR) addresses the Model Rule 1.6 confidentiality dimension. Genovra purges uploaded files immediately after analysis completes. No case data is retained, stored, or used for any purpose after the analysis is delivered. The attorney receives the output; the source file is gone from Genovra's systems.
Deep Ear™ audio deposition intelligence extends this same citation-grounded standard to audio. When a 4-hour Zoom deposition recording is uploaded, Deep Ear™ produces a timestamped transcript with speaker attribution. Contradiction flags reference the exact timestamp at which the conflicting statements occur. Every cross-examination question in the output is anchored to the testimony that generated it.
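The same anchoring idea carries over to audio, where the location is a timestamp rather than a page and line. The sketch below is a hypothetical illustration, not Deep Ear's code: `Segment`, `format_timestamp`, and `describe_flag` are invented names, and detecting the contradiction itself is out of scope here. It shows only how a flag can point at the two moments of testimony it depends on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One attributed utterance in a transcript (hypothetical structure)."""
    start_seconds: int  # offset from the start of the recording
    speaker: str
    text: str


def format_timestamp(seconds: int) -> str:
    """Render an offset in seconds as HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def describe_flag(first: Segment, second: Segment) -> str:
    """Describe a possible contradiction by pointing at both source moments."""
    return (
        f"[{format_timestamp(first.start_seconds)}] {first.speaker}: {first.text} "
        f"<-> [{format_timestamp(second.start_seconds)}] {second.speaker}: {second.text}"
    )


# Invented testimony for illustration.
earlier = Segment(3725, "Witness", "I never spoke to the adjuster.")
later = Segment(11410, "Witness", "The adjuster called me twice that week.")
print(describe_flag(earlier, later))
```

With both timestamps in hand, the reviewing attorney can jump to each point in the recording and listen to the testimony directly.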
Checklist: Questions to Ask Before Using Any AI on Case Files
The following questions apply to any AI tool — Genovra included — that an attorney is evaluating for case file work. They reflect the requirements of ABA Formal Opinion 512 and standard professional responsibility principles.
- Does the tool cite the specific location (Page, Line, or Timestamp) of every factual claim in the source document? If not, verification requires manual review of the entire document — defeating the purpose.
- Does the tool process the full document or truncate based on a context window? Partial processing means facts at document edges may be silently omitted.
- What happens to uploaded files after analysis? Does the vendor retain, log, or train on client documents? If yes, Model Rule 1.6 review is required before use.
- Is the tool purpose-built for legal document types (medical records, deposition transcripts, discovery PDFs)? General-purpose tools lack the structured output required for court-preparation work.
- What is the hallucination mitigation architecture? Token prediction alone is insufficient. Multi-model verification against source documents is the current professional standard.
- Does your jurisdiction require disclosure of AI use in filings? Check local rules. Several federal courts now require explicit AI disclosure in motions and briefs.
- Has the tool been used in sanctioned conduct? This is publicly discoverable. Review reported cases involving the tool before adopting it for client matters.
The Verdict
Mata v. Avianca is not a story about a dishonest attorney. It is a story about a tool being used for a purpose it was not built to serve — and about a profession that did not yet have the frameworks to know the difference.
The practical prescription is not to avoid AI. ABA Formal Opinion 512 explicitly confirms that attorneys may — and under the duty of competence, arguably must — understand and use relevant technology. The prescription is to use AI that is built for the specific job: case file analysis with source-anchored citations, full document coverage, and Zero Data Retention (ZDR).
General-purpose language models are not that tool. For case file work — medical records, deposition transcripts, discovery documents — the professional obligation is citation-grounded output that enables attorney verification in seconds. That is the standard. It is not a product feature. It is what Model Rule 1.1 requires.
Technical Specification
BigLaw Scope vs. Boutique Depth
| Capability | ChatGPT (GPT-4o) | Genovra AI |
|---|---|---|
| Source Citations (Page + Line) | No | Yes |
| Hallucination Risk | High — court-documented | Citation-grounded (multi-model) |
| Full 500-page Document Coverage | Partial (context limits) | Yes |
| Legal-Specific Workflow | No | Yes |
| Zero Data Retention by Default | No | Yes |
| ABA 1.1 Compliance Support | No | Yes |
| Audio Deposition Analysis | No | Yes |
| Court-Submission Safe Output | No | Yes |
Frequently Asked Questions
Infrastructure & Compliance Details
What happened in Mata v. Avianca?
In Mata v. Avianca, Inc. (S.D.N.Y. 2023), attorney Steven Schwartz used ChatGPT to research case citations and submitted a brief citing six court decisions that ChatGPT had fabricated. Judge P. Kevin Castel imposed sanctions on the attorneys after opposing counsel reported that the cited decisions could not be located.
What does ABA Formal Opinion 512 say about AI use in legal practice?
ABA Formal Opinion 512 (2024) confirms that attorneys may use generative AI tools in legal practice, subject to Model Rule 1.1 (duty of competence). Attorneys must understand the technology, supervise AI outputs, verify all factual claims, and disclose AI use where required by court rules.
Is ChatGPT safe for legal research?
ChatGPT can assist with general legal research brainstorming, but it is not safe for citing specific cases, analyzing uploaded case files, or producing court-ready factual claims. It has no mechanism to cite the source of a specific fact and can generate plausible but fabricated case law with no warning.
How does Genovra AI prevent hallucinations in legal work?
Genovra AI anchors every factual extraction to an Exact Page and Line citation from your uploaded source document. The attorney can verify any claim in seconds. This is structurally different from ChatGPT, which predicts plausible language without referencing a source document.
Have other lawyers been sanctioned for AI use?
Yes. Following Mata v. Avianca, several courts issued standing orders requiring AI disclosure in filings. Additional sanctions cases have been reported in federal courts across multiple jurisdictions. The pattern is consistent: general-purpose AI tools produce confident, unverified legal citations.
Stop the Paralegal Bottleneck.
We process 500 pages in 12-18 minutes with exact Page and Line citations. We run Genovra on a real document from a closed case before you pay.