Changing metrics when you dislike the results

Kaiser reacts to the New York Governor's attempt to redefine what counts as a Covid-19 case.

[Image: xkcd 2073, "Kilogram"]

(Source: xkcd)

Chapter 2 of Numbersense (link) is titled "Can a new statistic make us less fat?" In the context of recent remarks by the acting New York State Governor, perhaps I should have called it "Can a new statistic save some people from dying?"

Trick question:

An executive tracks performance using a key metric displayed on a dashboard. The executive requests a review of the definition of the metric, then decides the definition must be revised, and after the revision, the new metric looks better than before. Which of the following is true?

a) Using the prior definition, the key metric was outperforming expectation

b) Using the prior definition, the key metric was underperforming expectation

***

The New York Times reported that "Ms. Hochul said that some hospital executives have told her between 20 and 50 percent of their Covid patients are not suffering from severe symptoms, but that they are testing positive in the hospital incidentally, after being admitted for other reasons such as car accidents...As a result, beginning Tuesday, the state will begin to ask hospitals to break down how many patients are being admitted for acute Covid-19 symptoms, in an effort to further decipher this wave’s severity."

This is statistical alchemy that is frequently requested by people who don't like what the numbers are showing. It is a trick because they did not commit to restating the entire data series back to the start of data collection. Incidental positives are a "problem" for reading the trend only if the counting rules had recently changed to include admissions for non-Covid-19 reasons, thereby inflating recent data relative to older data, but that's not what they're saying.

Further, notice that they are commingling two separate re-definitions: one is Covid-19 vs not Covid-19; the other is mild vs "acute" Covid-19. I have never heard that "cases" only refer to "acute" Covid-19 cases, until now. The new definition relies on a subjective definition of "acute" and so they can report whatever number they want going forward.

To have any credibility, they need to (a) publish all the definitions that have been deployed in the last 2+ years, including the latest one, and (b) apply the latest definition to restate all historical data. Otherwise, no one can tell whether the trend is up or down since we are not comparing apples to apples!
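To make the apples-to-apples point concrete, here is a minimal sketch in Python. Every number in it is invented for illustration, and the names (`admissions`, `SWITCH_WEEK`, `mixed_series`, `restated_series`) are hypothetical. It compares a series in which the definition changes midway with the same series restated under the new definition from the start.

```python
# Hypothetical weekly data: (label, Covid-positive admissions, incidental positives).
# All numbers are invented for illustration; they are not New York data.
admissions = [
    ("week 1", 100, 20),
    ("week 2", 120, 30),
    ("week 3", 150, 45),
    ("week 4", 180, 63),
]
SWITCH_WEEK = 2  # zero-based index of the first week counted under the new definition


def mixed_series(rows, switch):
    """Old definition before the switch, new definition from the switch onward."""
    return [
        total if i < switch else total - incidental
        for i, (_, total, incidental) in enumerate(rows)
    ]


def restated_series(rows):
    """New definition applied to the entire history."""
    return [total - incidental for _, total, incidental in rows]


mixed = mixed_series(admissions, SWITCH_WEEK)
restated = restated_series(admissions)

print("mixed definitions:", mixed)
print("restated history: ", restated)
```

In this made-up example, the mixed series appears to drop in the week the definition changes, while the restated series rises every week. The apparent improvement comes entirely from the change in counting rules.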

To have maximum credibility, they should do (c) a comprehensive review of the definition, and correct for problems that undercount as well as overcount. For example, are they missing lots of cases found by people doing at-home testing? How are they dealing with people with multiple test results? (This is no small matter: if an infected person keeps taking tests until s/he tests negative, then each positive test is likely to be balanced by one or more negative tests in the near future. This dynamic affects how we can interpret the time series of positive tests divided by number of tests.)
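The repeat-testing effect can be illustrated with a little arithmetic. The sketch below rests on several simplifying assumptions of my own: each infected person records exactly one positive test, later takes a fixed number of follow-up tests that come back negative, and all tests fall in the same reporting window. The function names and the counts are hypothetical.

```python
def positivity_by_test(infected, uninfected, followup_negatives):
    """Positive tests divided by all tests, counting every test taken.

    Assumes one positive per infected person, plus `followup_negatives`
    later negative tests from each of those same people.
    """
    positives = infected
    negatives = uninfected + infected * followup_negatives
    return positives / (positives + negatives)


def positivity_by_person(infected, uninfected):
    """Share of tested people who tested positive at least once."""
    return infected / (infected + uninfected)


# Invented counts for illustration only.
by_test = positivity_by_test(infected=200, uninfected=800, followup_negatives=1)
by_person = positivity_by_person(infected=200, uninfected=800)

print(f"per-test positivity:   {by_test:.1%}")
print(f"per-person positivity: {by_person:.1%}")
```

Under these assumptions, the per-test rate comes out below the per-person rate, and the gap widens as infected people take more follow-up tests. In a real time series the follow-up negatives arrive days later, so they also shift the rate across reporting periods, which this pooled sketch does not model.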

***


In Numbersense (link), I trace how every few years, someone writes an article saying we need a new definition of obesity. The typical reason is that BMI classifies fit athletes as obese. I have never seen someone provide an analysis of how BMI misclassifies the average person. It's always the outliers. Notice in the above quote, the Governor did not give an aggregate proportion of what she considers misclassified. She said "some" hospitals gave her a concerning statistic.

As a data analyst, you have to think about the likelihood that the sample of hospitals she heard from is representative of all hospitals. Imagine you're working at a hospital for which most of the patients testing positive for Covid-19 are being treated for Covid-19. How likely are you to pick up the phone and call the Governor to tell her that you do not have a problem?

Thus, the sample she is working with is biased towards those hospitals experiencing this "problem". The real proportion of cases being "misclassified" is surely lower than what she announced.
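Here is a minimal sketch of that selection effect. The hospital figures are invented, and the rule that a hospital contacts the Governor only when at least 20 percent of its Covid-positive patients are incidental (`CALL_THRESHOLD`) is purely an assumption made for illustration.

```python
# Hypothetical hospitals: (Covid-positive patients, incidental positives).
# All numbers are invented for illustration.
hospitals = [
    (200, 10),
    (150, 15),
    (300, 30),
    (100, 40),
    (250, 50),
    (120, 60),
]
CALL_THRESHOLD = 0.20  # assumed: only hospitals at or above this share speak up


def incidental_share(rows):
    """Pooled share of Covid-positive patients who are incidental positives."""
    total = sum(patients for patients, _ in rows)
    incidental = sum(inc for _, inc in rows)
    return incidental / total


# The self-selected sample: hospitals that consider this a problem worth reporting.
callers = [(p, inc) for p, inc in hospitals if inc / p >= CALL_THRESHOLD]

print(f"share among hospitals that call: {incidental_share(callers):.1%}")
print(f"share among all hospitals:       {incidental_share(hospitals):.1%}")
```

In this toy population, the hospitals that call report shares between 20 and 50 percent, yet the pooled share across all hospitals is well below the pooled share among the callers. The size of the gap depends entirely on the invented numbers; the direction of the bias is the point.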