The blind spot in technology journalism

The New York Times covers several software companies whose business is to analyze databases of documents and emails in support of litigation. Large corporations are often sued, and they have to submit troves of documents and emails to lawyers who charge hourly rates to inspect and summarize the information. Bills can run into the millions of dollars in large lawsuits. These technology upstarts claim to be able to replace humans with computers.

I recommend reading this article... with a critical eye. Don't expect help from the journalist (John Markoff), who seems blind to the limitations of the technology, or incapable of evaluating them. The tone and content of this article are typical of the technology pages of our news media: one senses awe and unbounded optimism, as if the articles were press releases issued by the companies making the technologies.

Here are a few points to ponder:

  • Computers apparently don't make any mistakes. People can get tired and overlook things, we are told. But when a computer tells us that the "sentiment" of someone's email is "positive", it never makes a mistake. The truth is that all statistical models are imperfect, and the number of mistakes ranges from tolerable to unacceptably large. Without proper studies of the false positive and false negative rates of these computer algorithms, there is no way for readers to judge whether these technologies represent progress or not.
  • Errors are particularly frequent in analyzing "unstructured" data such as text. Time of day is a "structured" type of data while the sentiment expressed in an email is "unstructured". If the computer tells you that the CEO never sends emails after 9 pm, there is little reason to doubt the accuracy of that computation. If the computer tells you that emails about the CEO became increasingly "negative", how much confidence can one have in the accuracy of this statement? What is a negative email? If the same email contains both positive and negative comments, how is it determined whether it is positive overall? How does the computer recognize sarcasm or irony? A simple comment like "good job" can be positive; it can also be negative (if the job is accounting tricks); it can be ironic; it can be completely irrelevant.
  • Much noise is made about technology displacing workers, and how this might affect the economy by producing lower-skill jobs. Two things are curiously absent from the article. First, the new technology generates new jobs in new companies, and many of these are high-skill jobs like software development, product management, algorithm design, and so on. Second, the computers can surface information, but this software does not remove the need for human beings (analysts) to look at the output and interpret the information. This key point is often lost: reports and dashboards are useless until someone looks at them. Business analyst is also a high-skill job.
  • Finally, I have direct experience with this sort of legal discovery process. A large share of the billings comes from people manually scanning all printed documents collected from your office, including every copy of identical documents you might possess. If you distribute copies of a presentation to 20 people in a meeting, and these 20 people are all considered people of interest, all 20 copies will be scanned eventually. These billings, I suspect, won't be replaceable by computers.
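
To make the first point concrete, here is a minimal Python sketch of how false positive and false negative rates could be computed when the software's flags are checked against a human reviewer's judgments. The function name `error_rates` and the ten labels are made up for illustration; none of this comes from the article or from any vendor's product.

```python
def error_rates(predicted, actual):
    """Return (false_positive_rate, false_negative_rate) for boolean labels.

    predicted: what the software flagged (True = flagged as relevant)
    actual:    what a human reviewer decided (True = truly relevant)
    """
    pairs = list(zip(predicted, actual))
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    negatives = sum(1 for _, a in pairs if not a)
    positives = sum(1 for _, a in pairs if a)
    # Guard against division by zero when one class is absent.
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


# Hypothetical data: ten documents, five of them truly relevant.
human_labels = [True, True, True, True, True,
                False, False, False, False, False]
software_flags = [True, True, True, False, False,
                  True, False, False, False, False]

fpr, fnr = error_rates(software_flags, human_labels)
print(f"false positive rate: {fpr:.0%}")
print(f"false negative rate: {fnr:.0%}")
```

In this toy example the software wrongly flags one of the five irrelevant documents and misses two of the five relevant ones. Numbers like these, measured on real cases, are what a reader would need in order to judge the claims.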

I am not saying these companies are hawking vaporware. For some tasks, computers clearly can do a much better job than humans. I just think that our technology reporters can serve us better by covering both the promise and the limitations of new tools. They can start by interviewing users of such software, both satisfied users and unhappy users.