By Kaiser Fung in Statistics — 05 Aug 2020

Tautology in data arguments

Kaiser discusses statistical tautologies.

One of the less discussed fallacies in statistical arguments is tautology. By this, I mean saying the same thing twice, or reaching a conclusion that is true by assumption.

A recent example is the following popular argument:

(A) Our economy has suffered greatly because of anti-pandemic measures such as lockdowns, as evidenced by layoffs and GDP decline.

(B) Meanwhile, the deaths due to Covid-19 are insignificant relative to the lost jobs. (*)

(C) The conclusion is that public health measures were useless.

Every statistical data argument has assumptions. What is a key assumption behind the jump from (B) to (C)? It is that the deaths due to Covid-19 as reported would have been the same had the anti-pandemic measures not been imposed. Said differently, the assumption is that anti-pandemic measures are useless.

If one assumes anti-pandemic measures are useless, one will conclude that anti-pandemic measures are useless. So the argument amounts to saying the same thing twice. It is tautological.

Now suppose the opposite assumption were made: that anti-pandemic measures are effective at suppressing the death toll. Then one would conclude that those measures are useful.
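The circularity can be made concrete with a minimal Python sketch. The function names and the death count are hypothetical, invented for illustration. The observed data are identical in both calls; only the assumed counterfactual (deaths had no measures been imposed, which is never observed) decides the verdict.

```python
def deaths_averted(observed_deaths: float, assumed_deaths_without_measures: float) -> float:
    """Deaths averted by the measures, given an ASSUMED counterfactual.

    Deaths in a world without the measures are never observed, so that
    number has to be supplied as an assumption, not read from the data.
    """
    return assumed_deaths_without_measures - observed_deaths


def verdict(observed_deaths: float, assumed_deaths_without_measures: float) -> str:
    """Label the measures 'useful' only if they averted some deaths."""
    averted = deaths_averted(observed_deaths, assumed_deaths_without_measures)
    return "useful" if averted > 0 else "useless"


# Hypothetical observed death count, for illustration only.
observed = 1000.0

# Assumption 1: the measures changed nothing, so the counterfactual equals the observed count.
print(verdict(observed, assumed_deaths_without_measures=observed))      # useless

# Assumption 2: the measures suppressed deaths, so the counterfactual is higher.
print(verdict(observed, assumed_deaths_without_measures=observed * 3))  # useful
```

Whatever is assumed about the counterfactual comes straight back out as the conclusion; the observed count never changes the answer on its own.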

It turns out to be very easy to fall into the trap of tautology when making statistical (data) arguments. That's because not everything is measured or measurable. Conclusions are an amalgam of data and assumptions. It's not always clear even to the analyst what assumptions have been made.

(*) An article I came across this morning said there were 330 layoffs for every Covid-19 death in the U.S. This layoff-to-Covid-death metric is apples-to-oranges. The numerator counts all layoffs as if there would not have been a single layoff in the absence of the pandemic (untrue), while the denominator counts only deaths confirmed with a positive SARS-CoV-2 diagnosis (an undercount).
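To see how the two mismatches compound, here is a small Python sketch of the ratio arithmetic. Every input is invented, not a real statistic: the raw counts are chosen only so that the naive ratio reproduces the quoted 330, and the baseline layoffs and unconfirmed deaths are arbitrary assumptions.

```python
def layoffs_per_death(layoffs: float, deaths: float) -> float:
    """Simple ratio of layoffs to Covid-19 deaths."""
    return layoffs / deaths


# Hypothetical inputs, NOT real statistics.
all_layoffs = 33_000        # every layoff in the period
baseline_layoffs = 13_000   # assumed: layoffs that would have happened without the pandemic
confirmed_deaths = 100      # deaths with a positive SARS-CoV-2 diagnosis
unconfirmed_deaths = 25     # assumed: Covid-19 deaths never confirmed by a test

# Naive metric: all layoffs over confirmed deaths only.
naive = layoffs_per_death(all_layoffs, confirmed_deaths)

# Adjusted metric: excess layoffs over all Covid-19 deaths (under the assumptions above).
adjusted = layoffs_per_death(all_layoffs - baseline_layoffs,
                             confirmed_deaths + unconfirmed_deaths)

print(naive, adjusted)  # 330.0 160.0
```

Both corrections push the ratio down, and by how much depends entirely on the assumed baseline and undercount, which the headline number silently sets to zero.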

***

Data science is not immune to tautologies. Here's one:

(A) We run an e-commerce website, and have developed an algorithm to recommend products to our customers.

(B) After the recommendation engine was launched, browsing (page views) of the top recommended products increased significantly relative to past trends.

(C) The conclusion is that the engine recommended the right set of products.

What is a key assumption behind the jump from (B) to (C)? It is that page views of other products would not have increased similarly had they been recommended to the customers. Said differently, the assumption is that the engine recommended the right products to these customers.

If one assumes the recommendation engine works, one will conclude that the recommendation engine works. The argument amounts to saying the same thing twice. It is tautological.

Alternatively, one can assume that the engine is useless. Whatever is recommended to customers will get top views. In that case, one concludes that the recommendation engine is useless.
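A short simulation shows why the before/after comparison cannot tell these two stories apart. This is a hypothetical Python sketch: the 100-product catalogue, the 50% placement boost, and the engine's rule of picking the most-viewed items are all invented assumptions. It models the "useless engine" world, where a recommendation slot lifts views no matter which product fills it.

```python
import random

rng = random.Random(2020)  # fixed seed so the sketch is reproducible

# Hypothetical catalogue: 100 products and their pre-launch page views.
before = {f"product_{i}": rng.randint(50, 500) for i in range(100)}

PLACEMENT_BOOST = 0.5  # assumed: being recommended adds 50% more views


def views_after(recommended):
    """Post-launch views in an assumed 'useless engine' world, where the
    recommendation slot, not the choice of product, drives the increase."""
    return {p: v * (1 + PLACEMENT_BOOST) if p in recommended else v
            for p, v in before.items()}


def lift(after, chosen):
    """Relative change in total page views of the chosen products."""
    return sum(after[p] for p in chosen) / sum(before[p] for p in chosen) - 1


# Hypothetical engine rule: recommend the five most-viewed products.
engine_picks = sorted(before, key=before.get, reverse=True)[:5]
# Counterfactual: five products drawn at random, shown in the same slots.
random_picks = rng.sample(sorted(before), 5)

engine_lift = lift(views_after(set(engine_picks)), engine_picks)
random_lift = lift(views_after(set(random_picks)), random_picks)
print(f"engine picks: {engine_lift:.2f}, random picks: {random_lift:.2f}")
# prints: engine picks: 0.50, random picks: 0.50
```

Both sets show the same 50% lift. A jump in page views for the engine's picks is exactly what the "useless engine" assumption predicts too, so the observed lift alone cannot say whether the engine chose well.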

***

Have you come across examples of tautologies? Comment below!
