Wednesday, March 12, 2014

Dubious claims about predictive coding for "information governance"



  • Information governance ("IG") basically means defensible deletion.
  • A vendor’s claim to have achieved 90% precision with de minimis document review in an IG proof-of-concept omitted any mention of recall and is therefore suspect, for recall is the touchstone of defensibility.
  • The claim appears to understate by multiple orders of magnitude the number of documents that would have required review in order to verifiably achieve the results claimed.
  • In the IG world of low prevalence, low precision is not an important issue, and higher recall can be achieved at lower cost.
  • Mass culling should not be overlooked as a supplement to predictive coding.
  • Persistent analysis is a fecund field for investigation.
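
The low-prevalence point in the summary can be made concrete with a little arithmetic. The sketch below is only an illustration, not the vendor's or anyone else's actual calculation: it takes the 0.04% prevalence reported for the proof of concept, assumes a hypothetical recall of 95%, and shows what fraction of a collection would be retained at several hypothetical precision levels. The function name `retained_fraction` is invented for this example.

```python
def retained_fraction(prevalence: float, recall: float, precision: float) -> float:
    """Fraction of the whole collection that is kept (flagged as relevant).

    true positives  = prevalence * recall      (as a fraction of the collection)
    flagged (kept)  = true positives / precision
    """
    return prevalence * recall / precision


PREVALENCE = 0.0004        # four-hundredths of a percent, as reported for the POC
ASSUMED_RECALL = 0.95      # hypothetical target, chosen only for illustration

for precision in (0.90, 0.50, 0.10):   # hypothetical precision levels
    kept = retained_fraction(PREVALENCE, ASSUMED_RECALL, precision)
    print(f"precision {precision:.0%}: keep {kept:.4%} of the collection, "
          f"delete {1 - kept:.4%}")
```

Under these assumptions, even 10% precision leaves well under half a percent of the collection to be kept, which is why precision matters so much less than recall when prevalence is this low.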


I recently attended a symposium on “information governance” at the University of Richmond Law School, sponsored by the Journal of Legal Technology. Kudos to Allison Rienecker and the JOLT team for a well-run event.

At the symposium, a well-known predictive-coding vendor made some interesting and, I daresay, misleading claims about an IG “proof of concept” (POC) that purportedly would have enabled a corporation to safely discard millions of documents after reviewing only about 1,800, despite a prevalence of just four-hundredths of a percent (0.04%). A screen-capture summary of the POC and the vendor's key claims, and a full video of the presentation, are below. The main discussion of the POC begins at around the 2:36:20 mark of the video and lasts for about 5 minutes.





The presenter boasted of an impressive-sounding 90% precision but said nothing about recall, nor can I fathom how the vendor could have determined recall under the circumstances. Law firms and corporate clients should beware of this claim and of any claim that does not address recall. IG has the potential to be cost-effective, including for the dataset discussed by the vendor. But the vendor appears to have understated by multiple orders of magnitude the number of documents that would have required review in order to verify the results claimed.
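
To see why roughly 1,800 reviewed documents looks too small, here is a rough back-of-the-envelope sketch. It rests on assumptions that are mine, not the vendor's: that recall would be verified from a simple random sample of the collection, that a normal-approximation sample-size formula is acceptable, and that the target is a hypothetical ±5% margin of error at 95% confidence. Other validation designs exist and would give different numbers, so treat the output as an order-of-magnitude illustration only. All function names are invented for this example.

```python
import math

PREVALENCE = 0.0004   # four-hundredths of a percent, as reported for the POC
REVIEWED = 1_800      # approximate number of documents reportedly reviewed


def expected_relevant(sample_size: int, prevalence: float) -> float:
    """Expected count of relevant documents in a simple random sample."""
    return sample_size * prevalence


def relevant_needed(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Relevant documents needed to estimate recall within +/- margin.

    Normal-approximation sample size for a proportion, with worst-case p = 0.5.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def random_sample_needed(margin: float, prevalence: float) -> int:
    """Random-sample size expected to contain that many relevant documents."""
    return math.ceil(relevant_needed(margin) / prevalence)


hits = expected_relevant(REVIEWED, PREVALENCE)
needed = random_sample_needed(margin=0.05, prevalence=PREVALENCE)

print(f"Expected relevant documents in a random sample of {REVIEWED:,}: {hits:.2f}")
print(f"Relevant documents needed for a +/-5% recall estimate: {relevant_needed(0.05)}")
print(f"Random sample needed at {PREVALENCE:.2%} prevalence: about {needed:,}")
print(f"Ratio to documents reviewed: about {needed / REVIEWED:,.0f}x")
```

Under these assumptions, a random sample of 1,800 documents would be expected to contain fewer than one relevant document, while a sample large enough to support a recall estimate would run to hundreds of thousands of documents.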