Know your data 49: the risk of story-telling with data

What we learn from another case of mistaken identity

Photo by Slim Emcee / Unsplash

This ArsTechnica story (link) will continue to recur so long as most of us, especially those of us in the U.S., ignore the endless expansion of the surveillance state.

A man was arrested and falsely accused of luring a child at a McDonald's, based on an erroneous result from facial recognition software widely used by local police in the U.S. It turned out the man lives more than 300 miles away from that McDonald's, and has never even been to that town. Nevertheless, the police obtained an arrest warrant and took the man into custody, and the man required legal assistance to "prove his innocence," a perverse reversal of our legal doctrine of innocent-until-proven-guilty.

The story was excellently written, as it covers several ingredients that contribute to these wrongful arrests. Data by themselves are not sufficient to do harm. The blame goes beyond the facial recognition company's practice of assembling images of Americans, 40 million and counting.

It's also that the facial recognition industry has failed to encourage proper usage of such powerful technology. Instead, the industry issues meaningless statistics conveying misleading confidence about the reliability of the technology. In this case, the man supposedly matched the perpetrator's image "93%". What does 93% mean? It's anyone's guess. The technology vendor calls this number a "confidence score." The police did not seem to understand what the number measures, but 93 percent sounds like a highly confident match.

The falsely-accused man lamented, "Says it’s 93 percent accurate. Far as I’m concerned, it’s 100 percent inaccurate." And he would be right.
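To see why a big-sounding score can coexist with a wrong arrest, here is a rough back-of-the-envelope sketch. Everything in it except the 40 million gallery size is my assumption, not the vendor's: the function name, the one-in-a-million false match rate, the 50/50 chance that the real culprit is in the database at all, and the use of 0.93 as a "hit rate" (the story never says what the score measures, so I borrow the number purely for illustration).

```python
def share_of_flags_that_are_right(gallery_size, false_match_rate,
                                  hit_rate, prior_in_gallery):
    """Expected fraction of flagged faces that belong to the true culprit.

    All four inputs are hypothetical; none comes from the vendor.
    """
    # Expected number of correct flags: the culprit must be in the
    # gallery at all, and the software must then flag that face.
    expected_true = prior_in_gallery * hit_rate
    # Expected number of innocent faces flagged across the gallery
    # (treating every gallery face as a potential false match).
    expected_false = gallery_size * false_match_rate
    return expected_true / (expected_true + expected_false)


share = share_of_flags_that_are_right(
    gallery_size=40_000_000,   # "40 million and counting," from the story
    false_match_rate=1e-6,     # assumed: one false flag per million faces
    hit_rate=0.93,             # assumed: borrowing the "93%" for illustration
    prior_in_gallery=0.5,      # assumed: coin flip that the culprit is in there
)
print(f"{share:.1%}")
```

Under these made-up inputs, only about 1 percent of flagged faces would belong to the actual culprit, because a tiny error rate multiplied by tens of millions of faces still yields dozens of innocent look-alikes. Different assumptions give different answers, which is the point: the "93%" alone tells us none of this.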

The real culprit is story-telling with data. It's what I've been calling "story time": start with one tangible piece of data, then spin it into a grand lullaby.

The single piece of evidence here is the 93% face match. Add a heavy dose of narrative fallacy. The investigators formulated a theory of the man as the perpetrator, discarding anything that didn't fit the story. For example, one officer checked the man's license plate numbers against (yet another) surveillance database, which turned up nothing. Nonetheless, this contradictory evidence did not change their theory; the police ignored it when applying for the arrest warrant.

Moreover, we learned that the investigators were blinded by their conviction. They could have collected plenty of corroborative evidence, such as the McDonald's receipt. The investigators didn't even ask. If they had, they might have overturned their faulty theory.


Anyone who has worked with data recognizes such dangers (whether we are able to avert them is a different matter!). We all tend to be skeptical of an analysis when it disagrees with our theory, and to ask few questions when it agrees. It takes a lot of mental strength to resist narrative fallacy and blind spots.

Surveillance technology gives the impression that it's all-knowing. People frequently believe that the surveillance companies have "all our data". In reality, the technologies offer samples with gaps that must be filled, gaps which frequently get grouted with stories.

Data and investigators form the worst pair, as they both laser-focus on positive identification, and neglect negative evidence. When was the last time you heard a forensic software company state its objective as "prove his innocence"? Every application in that space wants to "prove guilt"! Many databases are records of transactions. If someone is innocent, his data would not appear in the database.


If you think you've heard this story before, you'd be right. A similar story happened to a grandmother, which I featured in this post.