

Eric J. Pender†
CONTENTS

Introduction
I. What is Predictive Coding?
   A. Comparing Traditional and Predictive Coding Review Methods
      1. Traditional Document Review
      2. Predictive Coding Review
   B. The Effectiveness of Predictive Coding
II. Concerns About Predictive Coding
   A. Does an Attorney Using Predictive Coding Review Need to Certify that Production is "Correct" and "Complete" Under Rule 26(g)(1)(A)?
   B. How Can We Be Sure Our Results Will Be Accurate Before We Begin Predictive Coding?
   C. Do Federal Rule of Evidence 702 and the Daubert Standard Apply to Predictive Coding?
III. Solutions for Anticipating Predictive Coding Concerns
   A. Cooperation
   B. Transparency
Conclusion

† Eric Pender is a J.D. student at Michigan State University College of Law. He is an Articles Editor for the MICHIGAN STATE LAW REVIEW and a member of the school's Appellate Moot Court Board.


Introduction

"The idea is not to make this perfect, it's not going to be perfect. The idea is to make it significantly better than the alternatives without nearly as much cost."1
In a recent ruling from the Delaware Court of Chancery, Vice Chancellor J. Travis Laster stated that the case before him seemed to be "an ideal non-expedited case in which the parties would benefit from using predictive coding."2 This is not particularly remarkable until one considers that neither party to the case requested the use of predictive coding technology, and that the parties were ordered to show cause why predictive coding technology should not be used in the case.3 In fact, Vice Chancellor Laster requested that the parties use a single eDiscovery vendor to house the documents for both parties.4 Meanwhile, much has been written about United States Magistrate Judge Andrew Peck's opinion in Da Silva Moore v. Publicis Groupe, which effectively provided the first judicial stamp of approval for the use of predictive coding technology in the eDiscovery process.5 But lawyers and judges still have concerns about predictive coding technology. Can we really trust predictive coding to find more relevant "needles" in our document "haystacks"?

1. Da Silva Moore v. Publicis Groupe, No. 11-Civ.-1279 (ALC) (AJP), -- F.R.D. --, 2012 WL 607412, at *11 (S.D.N.Y. Feb. 24, 2012).
2. EORHB, Inc., et al. v. HOA Holdings, LLC, C.A. No. 7409VCL, 2012 WL 4896667 (Del. Ch. Oct. 15, 2012), hearing transcript at 66:12-14.
3. Id. at 66:14-17.
4. Id. at 66:18-67:2.
5. Da Silva Moore v. Publicis Groupe, No. 11-Civ.-1279 (ALC) (AJP), -- F.R.D. --, 2012 WL 607412 (S.D.N.Y. Feb. 24, 2012).


The answer to this question can be found through statistical analysis. Studies show that in most cases predictive coding software is capable of returning more relevant documents than a human review process.6 Not only does this software return more of the relevant documents in the total population, but it also has the potential to return fewer false positives than a traditional human review. Practitioners who want to use predictive coding in their own cases will want to alleviate the concerns of opposing counsel and presiding judges by soliciting cooperation and incorporating transparency into their discovery strategy.
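These two measures of review quality, the share of the relevant documents a process finds and the share of its output that is actually relevant, are the recall and precision statistics discussed throughout this Note, summarized by the F1 score. A minimal sketch in Python; the function names are mine, and the figures in the note below are hypothetical:

```python
def recall(relevant_returned: int, relevant_in_corpus: int) -> float:
    """Share of the corpus's relevant documents that the review found."""
    return relevant_returned / relevant_in_corpus

def precision(relevant_returned: int, total_returned: int) -> float:
    """Share of the documents returned that are actually relevant."""
    return relevant_returned / total_returned

def f1_score(r: float, p: float) -> float:
    """Harmonic mean of recall and precision; high only when both are high."""
    return 2 * p * r / (p + r)
```

For a corpus with 50 relevant documents, a review that returns 63 documents, 45 of them relevant, scores 90% recall and roughly 71% precision.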

I. What is Predictive Coding?

Predictive coding software uses sophisticated algorithms to determine the relevance of a document based on training by a human reviewer.7 To understand how a predictive coding review works, it helps first to compare it with the traditional document review process. Even though a subject-matter expert (SME) can train predictive coding software by reviewing just a fraction of the overall population of documents, studies have shown that predictive coding is more effective than traditional human review at returning relevant documents, and that the software can return fewer irrelevant documents as a percentage of the total number of documents returned.

6. See discussion infra Section I.B.
7. Andrew Peck, Search, Forward: Will Manual Document Review and Keyword Searches be Replaced by Computer-Assisted Coding?, L. Tech. News, Oct. 2011, available at Forward_Peck_Recommind.pdf.
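The train-then-rank workflow described above can be sketched with a deliberately naive scorer. Real predictive coding products use far more sophisticated statistical classifiers; everything below (the term-weighting rule, the function names, the sample documents) is invented purely to illustrate the shape of the process: an SME codes a small seed set, the software scores the rest of the population, and review proceeds from the likeliest-relevant documents down.

```python
from collections import Counter

def train_term_weights(seed: list[tuple[str, bool]]) -> Counter:
    """Learn crude term weights from an SME-coded seed set: terms seen
    in relevant documents score +1 per occurrence, terms seen in
    irrelevant documents score -1."""
    weights: Counter = Counter()
    for text, is_relevant in seed:
        for term in text.lower().split():
            weights[term] += 1 if is_relevant else -1
    return weights

def rank_documents(weights: Counter, docs: dict[str, str]) -> list[str]:
    """Order the unreviewed population so that the documents most
    likely to be relevant come first, as a predictive coding tool would."""
    def score(text: str) -> float:
        terms = text.lower().split()
        return sum(weights[t] for t in terms) / max(len(terms), 1)
    return sorted(docs, key=lambda d: score(docs[d]), reverse=True)
```

With a seed set coding "merger price negotiation" relevant and "cafeteria lunch menu" irrelevant, a document about a price negotiation would outrank one about lunch plans.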


A. Comparing Traditional and Predictive Coding Review Methods

In both traditional document review and a predictive coding review, human determinations play a critical role in the process. The difference between the two lies in how human review is deployed. While traditional review takes a brute-force approach, with a human coding every document, predictive coding leverages technology to categorize documents and surface the most relevant files before the population is sent to the document review team.

1. Traditional Document Review

In a traditional document review, the document review attorney (generally a junior associate, contract attorney, or paralegal) conducts a linear review of the documents for the case.8 A sample of the reviewer's files is then checked for quality assurance. If the reviewer has coded an acceptable percentage of documents correctly as relevant, the files are sent along to a second stage of human review, usually to code for issues.

A number of problems, however, are inherent in the traditional review model. First, a case generally has a senior attorney who is an expert in the subject area and knows the case exceptionally well. This person presumably is best equipped, based on case knowledge and experience, to code documents as relevant or not relevant. But this person does not have the time to code all of the documents, and so must train others to make those determinations. Unfortunately, these document review attorneys generally are not as good as the subject-matter attorney at identifying relevant documents. Additionally, individuals coding documents over the course of many hours suffer from fatigue; they lose focus and may code documents differently than they would if they were fresh.

With respect to electronically stored information (ESI) in a traditional review, the documents to be reviewed have usually been culled using keywords.9 While these documents may be organized in some manner (for example, by date or custodian), the fact remains that every document must be reviewed by a human.10 This process exacts a significant burden in both time and money.11 Using this process, there is no way to prioritize individual documents as more relevant than others. Even worse, there was no way to deduplicate documents in a traditional, paper-based review.12

Fortunately, as documents have increasingly shifted from paper to electronic formats, the review process has improved. Electronic files can be deduplicated and deNISTed, and "clustering" software can organize documents by concept. However, these tools still require reviewers to make an evaluation for every single document.13 Predictive coding technology, in contrast, eliminates the need for every document to be evaluated by a human reviewer.

9. KPMG, The Case for Statistical Sampling in e-Discovery 3 (January 2012), available at /Documents/case-for-statistical-sampling-e-discovery.pdf.
10. Id.
11. Id.
12. Peck, supra note 7.
13. The Case for Statistical Sampling in e-Discovery, supra note 9, at 3.
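The deduplication step mentioned above is conceptually simple: two files with identical bytes produce identical cryptographic hashes, so only one copy needs human review. The sketch below is an illustration of the idea, not any vendor's implementation; the function name and data layout are invented.

```python
import hashlib

def deduplicate(documents: dict[str, bytes]) -> dict[str, bytes]:
    """Keep one representative per unique document body.

    `documents` maps a document ID to its raw bytes; exact duplicates
    share a SHA-256 digest and are collapsed to the first copy seen.
    """
    seen: set[str] = set()
    unique: dict[str, bytes] = {}
    for doc_id, body in documents.items():
        digest = hashlib.sha256(body).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique[doc_id] = body
    return unique
```

In practice, production systems also deduplicate near-copies (for example, the same email collected from several custodians), which requires more than a byte-level hash; this sketch covers only exact duplicates.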


2. Predictive Coding Review

In a predictive coding review process, an SME (typically a senior attorney) codes a sample set of the total population of documents as a training set.14 This sample set is then used to train the predictive coding software, which applies what it has learned to the rest of the document population.15 The software then organizes and prioritizes the documents so that the first documents to be reviewed are those identified as most likely to be relevant.16 Additionally, the software allows the supervising attorney to determine a cut-off point beyond which documents are no longer considered relevant. Statistical sampling is used to estimate the percentage of relevant documents returned and the confidence level associated with those results. Using this process, a substantial number of irrelevant documents can be eliminated from the review set before the document review team begins its work, resulting in significant savings in the review process.

B. The Effectiveness of Predictive Coding

Many lawyers still believe that human review produces better results than a predictive coding review that does not have a human evaluate each and every document. Yet recent studies show that predictive coding review returns more accurate results with significantly lower effort.

A study conducted by researchers for the Text Retrieval Conference (TREC) 2009 Legal Track Interactive Task found that across measures of recall,17 precision,18 and F1 score,19 technology-assisted review performed at least as well as, and in many cases better than, a traditional manual review.20 The study used the Enron corpus as its dataset and compared the results of two technology-assisted review teams with manual reviews conducted by professional attorneys and law students.21 The results showed that technology-assisted review achieved statistically significant improvements in precision and F1 score.22 While the recall rates for technology-assisted review were higher than those for manual review, the difference was not statistically significant.23 However, the researchers' objective was to maximize F1 scores.24 Had they focused on improving recall instead, they might have been able to trade off precision to achieve statistically significant recall scores.25

In a 2010 study published by the American Society for Information Science and Technology (ASIS&T), researchers examined whether software from two separate eDiscovery service providers could match the recall and precision rates of two teams of human reviewers.26 The study addressed the research question from a standpoint of reasonableness: is the use of computer-assisted review reasonable in the context of litigation discovery? The study argued that if human review is considered reasonable for Rule 26(g) purposes, then software that performs at least as well as human review should be considered reasonable as well.27 The results from the original review were compared to the results of two human re-review teams and two eDiscovery providers.28 The dataset used for the study was a corpus of documents produced in response to a Department of Justice request regarding Verizon's acquisition of MCI.29 The dataset contained 1.3 terabytes of electronic files, with roughly 1.8 million usable documents for review. The original review team spent four months, seven days a week, sixteen hours a day, reviewing the files.30 The total cost of the original review was $13,598,872.61.31

As an initial matter, the study found that agreement between human reviewers was not particularly high. In fact, the human review teams had overlapping responses for only approximately 70% to 76% of the documents.32 The study noted that these discrepancies could result from reviewers' wandering attention, distraction, or fatigue.33 They could also stem from strategic decisions based on the nature of the risk posed by production, who the requesting party is, whether the producing party will face a challenge for underproduction, and the level of knowledge the producing party has about the case at that point in time.34

The ASIS&T study found that both of the eDiscovery providers had recall rates comparable to those of the human re-review teams.35 Further, the eDiscovery providers had significantly better precision rates than the human re-review teams.36 Of course, the major problem with human review is that even if it returns a comparable percentage of the total number of relevant documents in the corpus, the review team must evaluate every document in the corpus. By using predictive coding, irrelevant documents can be identified, and only documents more likely to be relevant are passed along to the review team. Based on the results of the study, the use of predictive coding could have saved the producing party from reviewing approximately 80% of the population of documents. At $8.50 per document for review (the average price per document in this case), this would be a potential cost savings of over $10 million.37

Finally, in a study conducted by KPMG, predictive coding was tested on documents from three cases.38 The documents in these cases had already been coded for relevance in a traditional human review. In each case, the software achieved greater recall than the traditional review.39 And although the software in this study did not exceed the precision rate of the human review, the predicted cost savings of using the predictive coding software were between 17% and 58%.40

14. KPMG, Software-assisted document review: An ROI your GC Appreciate 6, available at
15. Id.
16. Id.
17. Recall is defined as the percentage of relevant documents returned out of the total number of relevant documents in the corpus. For example, in a corpus that has 50 relevant documents, software that returns 45 of those 50 documents would be said to have a 90% recall rate. See Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, XVII RICH. J.L. & TECH. 1, 8 (2011).
18. Precision is defined as the percentage of relevant documents returned out of the total number of documents returned. For example, if predictive coding software returned 63 documents, 45 of which were relevant, the software would be said to have a 71% precision rate. In other words, 71% of the documents returned by the software are actually relevant. Id.
19. F1 is a commonly used summary measure: the harmonic mean of recall and precision. The F1 score rewards results that achieve both high recall and high precision, while penalizing results that have either low recall or low precision. This score is expressed as F1 = (2 × Precision × Recall) / (Precision + Recall). Id. at 9 n.30.
20. Id. at 43.
21. Id. at 37.
22. Id. On average, the technology-assisted review teams achieved 76.7% recall, 84.7% precision, and 80% F1. In comparison, the manual review teams achieved 59.3% recall, 31.7% precision, and 36% F1. Id. Results were statistically significant for precision and F1.
23. Id. at 37, 44.
24. Id. at 44.
25. Id.
26. Herbert L. Roitblatt, Anne Kershaw & Patrick Oot, Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, 61 J. Am. Soc'y for Info. Sci. & Tech. 70 (2010).
27. Id. at 72.
28. Id. at 72-73.
29. Id. at 73.
30. Id.
31. Id.
32. Id. at 77.
33. Id.
34. Id.
35. Id. at 76. The human review teams had recall rates of 49% and 54%, while the eDiscovery providers had recall rates of 46% and 53%. Id.
36. Id. While the predictive coding software had precision rates between 27% and 29%, the human review teams only had precision rates of around 18% to 20%. Id.
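The savings arithmetic behind figures like these is straightforward. The sketch below reproduces the ASIS&T numbers from the discussion above (roughly 1.8 million reviewable documents, $8.50 per document, about 80% of the corpus culled before human review); like that study's estimate, it ignores the offsetting cost of the software itself.

```python
def review_savings(num_docs: int, cost_per_doc: float,
                   fraction_culled: float) -> float:
    """Dollars saved by not human-reviewing the culled fraction."""
    return num_docs * cost_per_doc * fraction_culled

# 1.8M documents at $8.50 each, with 80% culled, is roughly
# $12.2 million, consistent with the "over $10 million" figure above.
savings = review_savings(1_800_000, 8.50, 0.80)
```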

II. Concerns About Predictive Coding

Despite the efficiencies that can be gained by using a predictive coding review, lawyers and judges remain skeptical of the technology. One concern is that predictive coding fails to return all relevant documents, and that for this reason a producing attorney cannot honestly certify that document production is "complete" and "correct" under Federal Rule of Civil Procedure 26(g)(1)(A). A related concern is that there is no way to be certain at the beginning of the process that the results achieved by the use of predictive coding will be accurate, or that the methodology is reliable. Finally, the requesting party in Da Silva Moore raised the issue of whether Federal Rule of Evidence 702 and the Daubert standard should apply to predictive coding.41

37. This savings figure does not account for the offsetting cost of the computer-assisted review technology itself.
38. Software-assisted document review, supra note 14.
39. Id. at 9-12.
40. Id. at 14.
41. Da Silva Moore, 2012 WL 607412, at *15.


A. Does an Attorney Using Predictive Coding Review Need to Certify that Production is "Correct" and "Complete" Under Rule 26(g)(1)(A)?

The requesting party in Da Silva Moore raised a concern shared by many attorneys when it comes to predictive coding: broadly put, how do we know when document production is "complete"?42 Federal Rule of Civil Procedure 26(g)(1)(A) requires an attorney to certify that document production is "correct" and "complete" at the time it is made.43 Judge Peck addressed this concern first by noting that no method of review could allow an attorney in a case with three million emails to ever honestly certify that production was "complete."44 Further, the requesting party's Rule 26(g) concern was based on an incorrect reading of the rule. Certification under Rule 26(g)(1)(A) applies only to disclosures, a term of art pertaining to documents that the disclosing party may use to support its claims or defenses.45 Instead, it is Rule 26(g)(1)(B) that applies to discovery responses.46 And this part of the rule does not require certification that discovery is complete; rather, it incorporates the Rule 26(b)(2)(C) proportionality doctrine.47

B. How Can We Be Sure Our Results Will Be Accurate Before We Begin Predictive Coding?

Another concern raised by the requesting party in Da Silva Moore was that there was no way to be certain, prior to the actual review, that predictive coding results would be accurate or that the methodology used would be reliable.48 The requesting party argued that delays were likely, as the parties would dispute on a case-by-case basis whether documents should be coded as relevant.49 As to the methodology point, Judge Peck noted that the producing party was being completely transparent in its methodology, offering to provide the seed set and issue codes to the requesting party.50 And based on research indicating that only about 5% of relevancy disputes came from close calls on questions of relevance (as opposed to, say, reviewer errors caused by fatigue), the court would be able to handle those disputes as they arose.51 Finally, as to the accuracy of the results, the requesting party noted that it had not yet been established how many relevant documents would be permitted in the post-review quality assurance check of documents the software coded as irrelevant.52 But Judge Peck was unwilling to stay the use of the software on a prospective basis before any data had been collected. In order to fully evaluate proportionality, more data was needed about the results.53

C. Do Federal Rule of Evidence 702 and the Daubert Standard Apply to Predictive Coding?

The Da Silva Moore requesting party also argued that acceptance of the predictive coding method would violate Federal Rule of Evidence 702 and the Daubert standard.54 Rule 702 and the Daubert decision deal with the trial court's role as gatekeeper, excluding unreliable expert testimony from being submitted to the jury at trial.55 But Judge Peck explained that this

42. Id. at *13.
43. Fed. R. Civ. P. 26(g)(1)(A).
44. Da Silva Moore, 2012 WL 607412, at *13.
45. Id.
46. Id.
47. Id.
48. Id. at *16.
49. Id.
50. Id.
51. Id.
52. Id.
53. Id. at *16-17.
54. Id. at *14.
55. Id. at *15.


argument was inapposite because the emails being produced in discovery were not being offered into evidence at trial as the result of a scientific process or otherwise.56 Instead, specific emails would be admitted based on the characteristics of each individual email (i.e., whether it was a business record, hearsay, a party admission, etc.).57 Admissibility would not hinge on how the email was found in discovery.58 Presumably, if the producing party were seeking to admit the results of predictive coding (for example, the relevancy scores produced by the software), then Rule 702 and Daubert would apply. But the purpose of predictive coding software is not to predict a relevancy score for a particular document in order to convince a jury that a particular email is highly relevant. Instead, the software allows the attorneys to identify documents more likely to be relevant based on the relevancy score. This score is never disclosed to the jury, and thus Rule 702 and Daubert are not implicated. In short, Rule 702 and Daubert apply to the results of a scientific process, and in predictive coding those results are the relevancy scores, not the documents themselves.

56. Id.
57. Id.
58. Id.

III. Solutions for Anticipating Predictive Coding Concerns

Attorneys and judges harbor lingering concerns about the reliability of predictive coding review. They question whether requesting parties can be sure that the software is not missing relevant documents. But as Judge Peck noted, the question is not whether predictive coding returns all relevant documents; indeed, no review method in a large-data case can be expected to return each and every relevant document.59 Instead, the appropriate question is whether the software returns more relevant documents than a human review, at a lower cost.60 Two ways that practitioners can alleviate the concerns of opposing counsel and courts unfamiliar with predictive coding are cooperation and transparency.

A. Cooperation

Practitioners looking to use predictive coding technology for document production should seek to cooperate with opposing counsel for best results. By cooperating with opposing counsel, practitioners can minimize the likelihood that the opposing party will object to the use of a predictive coding review process, and they are more likely to save time and money by achieving better results in working with the other side.

At the outset, practitioners should advise opposing counsel of the plan to use predictive coding and seek that party's agreement. If the opposing party is not willing to agree to the process, practitioners will have to consider whether to forgo the use of the technology or to seek advance approval from the court to use predictive coding. Practitioners will want to convey to requesting parties examples of past success using predictive coding, from both a cost-savings standpoint and a relevancy standpoint. While an opposing party may not be concerned with saving the producing party money unless cost-shifting is a reasonable possibility, all requesting parties should be at least somewhat interested in the likelihood of uncovering more relevant documents than a human review would.

Practitioners should also consider including the opposing party in the review process at appropriate times. For example, consider supplementing the training seed set with documents identified as more likely to be relevant through Boolean keyword searches, and cooperate with opposing counsel by allowing them to propose the keyword queries used to identify these documents for the training set.61 Working with opposing counsel will go a long way toward alleviating concerns about the predictive coding process and achieving consensus to use predictive coding.

Practitioners can go even farther in this effort by also cooperating with the eDiscovery vendors for all parties. The Da Silva Moore parties made sure to have their eDiscovery vendors present at court hearings where the ESI protocol was discussed. Referring to this practice as "Bring Your Geek to Court Day," Judge Peck noted that it was "very helpful" to have eDiscovery vendors, in-house IT personnel, and in-house eDiscovery counsel available at the hearings.62 These stakeholders should be prepared to explain complex eDiscovery concepts in language that is easy to understand for judges who are less than tech-savvy.63

Of course, parties should foster cooperation between vendors outside of the courthouse doors as well. For example, the requesting party in Da Silva Moore wanted to code for additional issues after the producing party had already coded the seed set. Instead of stonewalling this request and drawing the ire of the judge, the producing party worked with the requesting party's vendor to accommodate the additional coding request.64 Since the producing party was already going to provide the seed set of documents to the requesting party, that party could then code for its additional issue tags, and the producing party's vendor incorporated that coding into the system.65

Cooperation can go a long way in securing buy-in from opposing counsel and the court to use predictive coding. Working with counsel for the requesting party to include them in the ESI process effectively shifts a portion of the accountability to that party, and thereby reduces the likelihood that the requesting party will successfully object to a process in which it actively participated. Furthermore, including the opposing party will allow them to do a better job tailoring the results to their case, reducing the likelihood of GIGO (garbage in, garbage out). Cooperation is one strategy that producing parties can use to effect buy-in. Producing parties should also engender transparency in the predictive coding process. As Judge Peck noted, "[a]n important aspect of cooperation is transparency in the discovery process."66

B. Transparency

Engaging the opposing party in the discovery process through cooperation will drive better results, fewer delays, and meaningful cost savings for producing parties. Producing parties should not only involve requesting parties in the predictive coding process, but should also be open and transparent about all aspects of that process.

The first step toward this transparency is allowing the requesting party to see the documents used to train the predictive coding software. In the Da Silva Moore case, roughly 2,400 documents were required to train the system.67 The producing party agreed to disclose all of these documents to the requesting party, minus any privileged documents.68 This allowed the requesting party to ensure that documents were appropriately coded as relevant or not relevant based on its theory of the case, not the producing party's theory of the case.

A producing party may even elect to go so far as to allow the requesting party to code the seed set of documents. Producing parties may be concerned that requesting parties will code the training set liberally for relevance, resulting in significantly more documents being listed as relevant than are actually relevant. But as noted above, predictive coding is a garbage-in, garbage-out process. If the requesting party codes too many fringe documents as relevant, it may skew the results and list less-than-relevant documents as highly relevant. This practice is not without consequence. As the requesting party in Da Silva Moore was warned, if the court decides to cut off production under the Rule 26(b)(2)(C) proportionality doctrine, that party may end up with fewer truly relevant results than if it had not coded fringe documents as relevant.69 And if it were to seek the production of additional documents, counsel would likely have to address a cost-shifting motion.

Finally, there needs to be transparency regarding quality assurance checks after the documents have been coded. By evaluating a sample set of documents the software has coded as irrelevant, parties can calculate how accurate the software has actually been in marking documents as relevant or irrelevant.70 Based on these results, parties will want to evaluate whether too many truly relevant documents are appearing in the "discards," the documents coded as non-relevant.71 Producing parties will want to share this quality assurance data with opposing counsel and the court so that an appropriate production cut-off can be established.

In Da Silva Moore, the producing party proposed to conduct human review and production of the top 40,000 documents coded as relevant by predictive coding, at a cost of roughly $200,000.72 But the court characterized this strategy as a "pig in a poke."73 Instead, the court ruled that the cut-off line for production must be drawn based on what the statistics showed about the accuracy of the coding results.74 If a 40,000-document cut-off was going to leave a significant portion of responsive documents off the table, then that cut-off point was not going to work.75 Presumptively proposing a cut-off point before the system has been trained and coding has been conducted will likely be rejected by courts absent unique circumstances. Instead, it appears that producing parties will be expected to disclose the post-processing data to the court and the opposing party in order to determine an appropriate cut-off point based on the specific circumstances of the case.

59. Id. at *13.
60. Id. at *11.
61. This process was used in the Da Silva Moore case and is highlighted on page 10 of the opinion. Id. at *10.
62. Id. at *25.
63. Id.
64. Id. at *10.
65. Id.
66. Id. at *23.
67. Id. at *9-10.
68. Id. at *10.
69. At one point, the requesting party in Da Silva Moore discussed whether certain emails should be coded as relevant or not relevant. These emails did not involve any plaintiffs or centralized decision-making, but instead involved non-plaintiff employees discussing whether they could be considered for raises. Judge Peck explained that coding these documents as relevant would affect how the system coded the rest of the emails. The potential result would be that more emails about individual, non-plaintiff employees would be coded as relevant, potentially skewing what the system considered the top-ranked results. Judge Peck explained that if the plaintiffs were willing to take this risk, it might result in the court agreeing with the producing party, due to proportionality concerns, to cut off production at some point. In this situation, Judge Peck intimated that the requesting party would be precluded from objecting to the conclusion of production. Da Silva Moore v. Publicis Groupe, -- F. Supp. 2d --, 2012 WL 2218729, at *25-26 (S.D.N.Y. June 15, 2012). Thus, by involving the requesting party in the training of the software, the producing party was able to shift accountability for the success of the results to the opposing party. This effectively would result in the requesting party bearing a heavier burden to show why production should continue in light of the fact that its own actions were at least partly to blame for unsatisfactory results. Presumably this would increase the likelihood of an order for cost shifting from the court.
70. Da Silva Moore, 2012 WL 607412, at *11.
71. Id.
72. Id. at *6.
73. Id.
74. Id.
75. Id.
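The quality assurance check described above, sampling the "discards" to estimate how many relevant documents a proposed cut-off would leave on the table, is a standard statistical exercise. The sketch below uses a normal-approximation confidence interval for a sampled proportion; the function name and the figures in the note that follows are illustrative, not drawn from any case.

```python
import math

def estimated_discarded_relevant(discard_count: int, sample_size: int,
                                 relevant_in_sample: int,
                                 z: float = 1.96) -> tuple[float, float]:
    """Estimate relevant documents left below the cut-off.

    A random sample of `sample_size` discards is human-reviewed and
    `relevant_in_sample` of them prove relevant.  Returns a point
    estimate and the upper bound of a normal-approximation confidence
    interval (z = 1.96 for 95% confidence), both scaled to the whole
    discard pile.
    """
    p = relevant_in_sample / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return discard_count * p, discard_count * min(1.0, p + margin)
```

For example, if 20 of 1,000 sampled discards prove relevant in a 100,000-document discard pile, the point estimate is 2,000 missed documents, with a 95% upper bound near 2,900; data of exactly this kind is what a court could weigh in setting the production cut-off.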

Conclusion

Research has established that predictive coding technology can be at least as accurate as, and sometimes more accurate than, traditional human document review. But attorneys and judges will continue to harbor concerns about the efficacy of predictive coding technology and the document review process that leverages it. Practitioners who wish to use predictive coding as part of the document review process should cooperate with opposing counsel by disclosing in detail how the technology will be used, and should be open and transparent with opposing counsel and the courts with regard to how the software will be trained and the results it produces.

It is important for practitioners to convey to opposing counsel that predictive coding is not a replacement for human review. Instead, it is a process for identifying and removing irrelevant documents through the use of technology prior to human review. This saves the producing party money in review costs and leads to more relevant documents being disclosed to the requesting party. Diligent attorneys, of course, will be reluctant to accept theoretical data regarding technology-assisted review and predictive coding at face value. But by engaging opposing counsel in the process and giving them a meaningful role, producing parties will make them more amenable to accepting predictive coding as part of the document review process.