In October 2011, Southern District of New York Magistrate Judge Andrew Peck wrote an article about computer-assisted review, also referred to as computer-assisted coding or predictive coding, in which he noted that, to his knowledge, no reported case had yet ruled on the use of this type of technology in electronic data discovery. He also speculated that many attorneys and clients may be waiting for a judicial decision approving computer-assisted review. That wait is now over. In Da Silva Moore v. Publicis Groupe et al., Judge Peck has now issued the first judicial opinion essentially addressing the defensibility of computer-assisted review and finding that it is “an acceptable way to search for relevant [electronically stored information] in appropriate cases.” The opinion has heightened the “buzz” about predictive coding that has been growing over the past few years as litigants, attorneys and courts have sought effective ways to increase the efficiency and reduce the costs of reviewing and producing the ever-increasing volume of ESI.
What is computer-assisted review and how does it work?
Computer-assisted review involves the use of sophisticated algorithms to aid in the identification of responsive, issue-specific or privileged documents from a certain data set. It allows a computer program to become a review team’s “silent partner” by leveraging small samples of documents that have been reviewed and coded by attorneys to find other similar documents within the full data population. The attorneys, in essence, “train” the program through an iterative, interactive process. While the exact protocol may vary with the specific program or vendor, the parties or the case, the process generally runs as follows. First, one or more experienced attorneys review and code a sample set of documents. Next, the computer program uses an algorithm to identify properties of those documents and applies their coding to other documents in the collection. Often, the program “ranks” documents by likely relevance, assigning each a numerical score. Then the attorneys review samples from each category of documents identified by the computer to confirm the accuracy of the computer’s predictions. The computer uses this feedback to refine its coding until the system’s predictions and the attorneys’ coding sufficiently coincide. Judgmental and statistical sampling are used to select documents and to test and quality-control the accuracy of the process.
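To make the ranking step concrete, the following is a minimal sketch — not any vendor’s actual algorithm — of how a program might score unreviewed documents against a small attorney-coded seed set. It uses simple bag-of-words cosine similarity; all document text and function names are hypothetical illustrations.

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words term counts for a document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_documents(coded_relevant, unreviewed):
    """Score each unreviewed document against the combined term counts
    of the attorney-coded relevant examples and rank high to low."""
    centroid = Counter()
    for doc in coded_relevant:
        centroid.update(vectorize(doc))
    scored = [(cosine(vectorize(doc), centroid), doc) for doc in unreviewed]
    return sorted(scored, key=lambda s: -s[0])

# Attorneys code a small seed set; the program ranks the rest.
seed = ["merger agreement draft terms", "agreement negotiation emails"]
pool = ["lunch menu for friday", "revised merger agreement attached"]
ranking = rank_documents(seed, pool)
```

In a real system the attorneys would then review samples of the ranked output, and their corrections would feed back into the next training round.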
The facts and circumstances of Da Silva Moore
In Da Silva Moore, a putative employment discrimination class action, the plaintiffs sought discovery from defendant MSL Group that implicated approximately three million electronic documents. The parties agreed to use predictive coding to cull down the volume for review and production. However, they disagreed about the scope and implementation of the process.
First, the plaintiffs expressed concern about the accuracy of the computer-training procedure and the possibility that the initial coding could introduce systemic errors, increasing the likelihood that relevant documents would be missed. MSL addressed this issue by agreeing to provide to the plaintiffs, for their review and coding feedback, all of the nonprivileged documents used to train the program, along with MSL’s coding of them.
Second, MSL proposed using seven rounds of coding to train the program. While Judge Peck accepted that number, he added the caveat that additional review and training could be required if after the seventh round there was insufficient agreement between the attorneys’ coding and the program’s predictions.
Third, MSL proposed capping its review and production at the top 40,000 documents identified as responsive through the coding process. Judge Peck found that a hard-number cutoff was inappropriate because the determination of when review and production was complete would be a function of the statistical results. In other words, limiting production to 40,000 documents would not be acceptable if the predictive coding process showed that such a cutoff would leave unreviewed and unproduced a large volume of documents identified as likely to be highly relevant.
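The difference between the two cutoffs can be illustrated with a small sketch. The score distribution below is entirely hypothetical; it simply shows how a hard-number cap can stop partway through a band of documents the process has scored as likely relevant, while a score-based cutoff keeps all of them.

```python
def cap_by_count(scores, n):
    """Hard-number cutoff: keep the top n documents regardless of score."""
    return sorted(scores, reverse=True)[:n]

def cap_by_score(scores, threshold):
    """Statistical cutoff: keep every document scoring at or above threshold."""
    return [s for s in sorted(scores, reverse=True) if s >= threshold]

# Hypothetical relevance scores: 50 documents scored 0.9,
# 30 scored 0.8, and 20 scored 0.2.
scores = [0.9] * 50 + [0.8] * 30 + [0.2] * 20

capped = cap_by_count(scores, 60)     # stops partway through the 0.8 band
by_score = cap_by_score(scores, 0.5)  # keeps all 80 likely-relevant documents
```

Here a cap of 60 would leave 20 documents scored 0.8 unreviewed and unproduced, which is the problem Judge Peck identified with a fixed numeric limit.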
When may computer-assisted review be appropriate?
While Da Silva Moore considers the use of predictive coding to identify documents for production in response to discovery requests, computer-assisted review potentially has a place in other contexts as well, such as document productions received from other parties, internal investigations or other projects involving the review of large volumes of electronic data.
As explained by Judge Peck, predictive coding is “not a magic, Staples-Easy-Button solution appropriate for all cases,” nor is it “a case of machine replacing humans.” Rather, the objective of document review is to identify as many relevant documents as possible while reviewing as few nonrelevant documents as possible, and the goal is to use a review method that maximizes “recall” (the fraction of relevant documents identified during a review) and “precision” (the fraction of identified documents that are relevant) at a cost proportionate to the case’s value. In some cases, that method may involve predictive coding techniques.
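The recall and precision metrics Judge Peck references can be illustrated with hypothetical numbers (these figures are invented for illustration, not drawn from the case):

```python
def recall(relevant_found, total_relevant):
    """Fraction of all relevant documents that the review identified."""
    return relevant_found / total_relevant

def precision(relevant_found, total_identified):
    """Fraction of identified documents that are in fact relevant."""
    return relevant_found / total_identified

# Hypothetical review: a collection contains 10,000 relevant documents;
# the process identifies 12,000 documents, of which 8,000 prove relevant.
r = recall(8_000, 10_000)     # 0.8  (80% of relevant documents found)
p = precision(8_000, 12_000)  # ~0.667 (two-thirds of identified docs relevant)
```

A review method that raises recall without collapsing precision — at proportionate cost — is the stated objective.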
In Da Silva Moore, Judge Peck notes that the parties were not ordered to use predictive coding. Rather, the opinion provides several reasons why predictive coding was appropriate in this case:
- Parties’ agreement. The parties agreed to the use of predictive coding to facilitate the document review and production. The dispute was over the specific protocol to be applied. Judge Peck acknowledged that a case in which the requesting party objected to the producing party’s use of predictive coding would be “slightly more difficult.” Such a case is currently pending in the Northern District of Illinois.
- High volume of ESI. MSL had more than three million electronic documents for potential review.
- Best available option. The court found that predictive coding was superior to the available alternatives under the circumstances of the case.
- Cost effectiveness and proportionality. Under Federal Rule of Civil Procedure 26(b)(2)(C), the proposed discovery must be reasonable and the burden and expense of responding to it should not outweigh its likely benefit.
- Transparency. MSL agreed to make its review and coding process highly transparent. Specifically, it agreed to provide to the plaintiffs for verification all of the nonprivileged documents used to train the computer, including those MSL determined were nonresponsive. The parties also agreed to meet and confer to resolve any disputes that might arise over the coding applied to particular documents.
Judge Peck’s “lessons for the future”
The Da Silva Moore opinion specifically identifies four “lessons” that can be taken from the resolution of the electronic discovery disputes in the case:
- Post-training determination of end point. In cases using predictive coding, courts likely will not be able to determine or approve a party’s proposal as to when review and production can be deemed completed until the computer-assisted review program has been trained and the results of the coding have been verified as accurate. Until that point is reached, neither the parties nor the court will be able to know the number of documents after which there is a clear drop-off from those documents likely to be highly relevant to those likely to be marginally relevant to those not likely to be relevant.
- Phased discovery. Obtaining and reviewing data from the most likely relevant sources (custodians and media) first provides a way to potentially control discovery costs.
- Cooperation. In cases in which a requesting party has knowledge of the producing party’s documents, counsel for the requesting party should consider the potential benefits of strategically and proactively disclosing information to the producing party’s counsel.
- IT involvement in e-discovery court proceedings. Even in cases in which counsel is especially knowledgeable about e-discovery issues, it may be helpful to have each party’s e-discovery vendor or in-house IT personnel available at e-discovery-related court conferences and hearings to participate as needed.
Other practical considerations:
- Upfront time and legal fees. While computer-assisted review is designed to expedite the discovery process and reduce document-review legal fees overall, it typically requires a significant amount of experienced attorney time, and interaction with the technology team, at the outset of the process to train the computer program effectively.
- Nature of the case and documents. As with other document search and review methods and technologies, the decision to use computer-assisted review should be made after a careful and informed evaluation of whether it is appropriate, and likely to be effective, under the specific circumstances of the case.
- Potential variation among courts. While it is not unreasonable to think that other courts that address computer-assisted review issues in the future will share the views and follow the protocol set forth in Da Silva Moore, it is also possible that they could take a different approach.
- More than just technology. To maximize both effectiveness and defensibility, computer-assisted review technology, like other e-discovery solutions, needs to be used in combination with an appropriately designed process that includes quality control testing.
For more information about this case, computer-assisted review or other e-discovery topics, please contact one of the co-chairs of the firm’s E-Discovery Task Force, Rachel Tausend, at email@example.com or 202.419.8405, or Jana Landon, at firstname.lastname@example.org or 215.564.8049.