Court Treatment Of Artificial Intelligence: Predictive Coding

This post is the first in a planned series about how courts treat artificial intelligence (AI). Advances in AI seemingly happen on a daily basis. AI pioneer Andrew Ng is fond of saying that AI “is the new electricity.” Earlier this year, the consulting firm McKinsey & Company estimated that AI could create several trillion dollars in value for businesses annually. There is little doubt AI is becoming pervasive.

Yet court opinions involving AI are relatively sparse. With the rapid growth of AI, courts increasingly will be called upon to adjudicate related issues. Thus, the time is ripe for discussing how courts treat AI.

We will start with “predictive coding.”

What Is Predictive Coding?

Predictive coding (also known as “computer-assisted review” or “technology-assisted review”) is the area where courts most often deal with AI. In a previous post on this blog (well worth reading), Matt Scherer discussed whether predictive coding will complement human attorneys or replace them.

So what exactly is it? Broadly speaking, predictive coding is an AI application that helps lawyers review records in litigation. Parties to litigation must produce to their opponents reasonably available documents, including electronically stored information (ESI), that are otherwise discoverable. In complex cases, the amount of potentially relevant ESI can exceed what any human could manually review. Complicating this, parties often disagree about what ESI records should be produced, and how to find them.

Enter AI. In predictive coding, knowledgeable attorneys first review a small sample of the universe of records and label each record in that sample. For instance, attorneys might label whether a record is responsive to a discovery request and whether it is covered by attorney-client privilege. The next step is where AI shines: given a sufficiently large sample labeled by attorneys (often called a “seed set”), predictive coding uses AI to predict the appropriate labels for the remaining universe of records. When properly implemented, it can do so with high accuracy.
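To make this two-step workflow concrete, here is a toy sketch in Python. It is not how any particular vendor implements predictive coding, since implementations vary. It simply assumes a basic multinomial Naive Bayes text classifier, trains it on a handful of hypothetical attorney-labeled documents (the seed set), and then predicts labels for unreviewed documents. The names `SEED_SET`, `train`, and `predict`, and all document text, are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical seed set: (document text, label assigned by attorneys).
SEED_SET = [
    ("shipment delay caused breach of supply contract", "responsive"),
    ("contract penalty for late supply delivery", "responsive"),
    ("supplier missed delivery deadline under contract", "responsive"),
    ("office holiday party planning menu", "not_responsive"),
    ("reminder to update parking permit", "not_responsive"),
    ("cafeteria menu for next week", "not_responsive"),
]

def tokenize(text):
    """Very crude tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def train(seed_set):
    """Count word frequencies per label (multinomial Naive Bayes)."""
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for text, label in seed_set:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for this text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        counts = word_counts[label]
        total_words = sum(counts.values())
        # Log prior plus log likelihood of each word, with add-one smoothing
        # so unseen words do not zero out the probability.
        score = math.log(n_docs / total_docs)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(SEED_SET)
unreviewed = [
    "late delivery under the supply contract",
    "parking permit renewal reminder",
]
for doc in unreviewed:
    print(predict(model, doc), "<-", doc)
```

In practice, seed sets are far larger and the models more sophisticated, and many workflows retrain as attorneys review more documents. The point of the sketch is only the shape of the process: label a sample, learn from it, predict the rest.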

But predictive coding is not perfect. In some cases, mistakes have caused significant amounts of responsive documents to be missed. Further, the exact manner of implementing predictive coding varies by vendor. It is not surprising, then, that disagreements arise about the propriety and parameters of predictive coding.
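One common way to quantify how many responsive documents a review missed is recall: the share of truly responsive documents that the process actually flagged. The sketch below assumes a hypothetical validation sample in which attorneys have reviewed documents the model already classified; the data and the `recall` and `missed` helpers are illustrative only.

```python
# Hypothetical validation sample. Each pair is
# (attorney says responsive, model says responsive).
VALIDATION = [
    (True, True), (True, True), (True, False), (True, True),
    (False, False), (False, True), (False, False), (True, False),
]

def recall(pairs):
    """Fraction of truly responsive documents that the model flagged."""
    responsive = [pred for truth, pred in pairs if truth]
    if not responsive:
        return 0.0
    return sum(responsive) / len(responsive)

def missed(pairs):
    """Count of truly responsive documents the model failed to flag."""
    return sum(1 for truth, pred in pairs if truth and not pred)

print(f"recall = {recall(VALIDATION):.2f}, missed = {missed(VALIDATION)}")
# prints: recall = 0.60, missed = 2
```

Here a recall of 0.60 means 40% of the responsive documents in the sample were missed, the kind of shortfall that can lead parties to dispute whether a production was adequate.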

This post serves as a high-level introduction to predictive coding. This is an evolving topic, and in future posts, I plan to provide updates and dive deeper into specific subtopics where appropriate.

When Do Courts Allow Predictive Coding?

Andrew J. Peck, then a federal magistrate judge in the Southern District of New York (now senior counsel at an international law firm), paved the way for predictive coding in litigation. He authored multiple court opinions on the topic, starting with his seminal decision in Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012). That case explicitly “recognize[d] that computer-assisted review is an acceptable way to search for relevant ESI in appropriate cases.” Id. at 183.

Predictive coding got another boost in 2014, when the U.S. Tax Court rejected a party’s argument that predictive coding is “unproven technology.” Dynamo Holdings Ltd. P’ship v. Comm’r of Internal Revenue, 143 T.C. 183, 2014 WL 4636526 (2014). The court held:

Where, as here, petitioners reasonably request to use predictive coding to conserve time and expense, and represent to the Court that they will retain electronic discovery experts to meet with respondent’s counsel or his experts to conduct a search acceptable to respondent, we see no reason petitioners should not be allowed to use predictive coding to respond to respondent’s discovery request.

143 T.C. at 192.

Other courts followed suit. Indeed, one Delaware court even stated, on its own initiative, that the parties should use it, though the court eventually softened its position. EORHB, Inc. v. HOA Holdings LLC, No. 7409-VCL, 2013 WL 1960621 (Del. Ch. May 6, 2013). Court approval of predictive coding in civil cases became so widespread that Judge Peck stated in 2015, “the case law has developed to the point that it is now black letter law that where the producing party wants to utilize [predictive coding] for document review, courts will permit it.” Rio Tinto PLC v. Vale S.A., 306 F.R.D. 125, 127 (S.D.N.Y. 2015).

What Are The Limits To Predictive Coding?

While it has gained acceptance, predictive coding has some limits.

First, courts generally will not force unwilling parties to use predictive coding. In one case, Judge Peck refused to compel a defendant to search for documents using predictive coding when the defendant preferred to use keyword searching. Hyles v. New York City, No. 10 Civ. 3119 (AT)(AJP), 2016 WL 4077114, at *2-3 (S.D.N.Y. Aug. 1, 2016). The court reasoned that the party responding to discovery requests is generally best situated to decide exactly how it should search for relevant ESI. Id. at *3. As another court stated, “[t]he few courts that have considered this issue have all declined to compel predictive coding.” In re Viagra Products Liability Litig., No. 16-md-02691-RS (SK) (N.D. Cal. Oct. 14, 2016) (citing Hyles).

Courts also may reject a party’s attempt to use predictive coding when that same party previously agreed to use other methods of reviewing records. For instance, one court refused a proposal to use predictive coding where (1) the parties had agreed to a different search method, (2) the proposing party failed to comply with recommended “best practices” for using the software, and (3) the proposal “lack[ed] transparency and cooperation regarding the search methodologies applied.” Progressive Cas. Ins. Co. v. Delaney, No. 2:11-cv-00678-LRH-PAL, 2014 WL 3563467, at *8 (D. Nev. July 18, 2014).

But another court reached a different conclusion and approved a plaintiff’s use of predictive coding, despite the parties’ previous agreement to use different search methods. Bridgestone Ams., Inc. v. Int’l Bus. Machs. Corp., No. 3:13-1196, 2014 U.S. Dist. LEXIS 142525, at *3 (M.D. Tenn. July 24, 2014). The court recognized that it was, “to some extent, allowing Plaintiff to switch horses in midstream.” It therefore ordered the plaintiff to “provide the seed documents they are initially using to set up predictive coding,” and indicated that the defendant could also “switch[] to predictive coding if they believe it would…be more efficient….”

Another limitation is that using predictive coding is not practical in every case. Many routine cases have a limited universe of documents that lawyers can manually review. And straightforward disputes over small amounts typically do not justify the budget needed to hire predictive coding vendors.

Further, it is unclear whether courts will permit predictive coding in criminal matters. There are certainly complex criminal cases with a sprawling universe of records, where manual review is impossible or impracticable. In these cases, either the government or the defendants may seek to use predictive coding. The issues raised in these circumstances could get thorny, and may be worthy of a separate post. See United States v. Comprehensive Drug Testing, Inc., 621 F.3d 1162, 1177 (9th Cir. 2010) (en banc) (noting that large volumes of ESI in criminal cases implicate the need to strike “the right balance between the government’s interest in law enforcement” and defendants’ rights).


The use of AI in litigation is growing, and this is particularly evident in predictive coding. Courts now widely accept that AI can help parties categorize documents in large collections of data, subject to the limits discussed above. We are keeping our fingers on the pulse of predictive coding, and will let you know about important new developments in this area.

Finally, my first post here wouldn’t be complete without thanks to my friend Matt Scherer for the chance to join this exciting blog. Matt has provided valuable, cutting-edge insights into the intersection of law and AI. As a lawyer, entrepreneur, and AI programmer, I hope to add to this discussion.


  • Daniel Schiff

    Thanks very much for this overview. I’m actually a bit surprised that the legal profession has allowed this relatively novel technology, given the possible biases and errors that could undermine a party’s right to receive all relevant documents.

    Would love to hear more about how the initial documents are selected and coded, and about the procedures that ensure the predictive coding is reasonably robust. What are the accuracy rates, etc.?

    Welcome to the blog,

    • Joe Wilbert

      Thank you for the welcome and the comment, Daniel. Your suggestion about the particulars behind predictive coding is a good one, which I’m hoping to delve into in a future post. Hope all is well at GA Tech. -Joe

  • Maria Graetsch

    Hi, this is really interesting. Keep the blogs coming. I am a researcher in operationalising AI, with a focus on legal environments and explainability. Really keen to understand how these solutions are being deployed in law firms and how their adoption is managed. Thank you.
