Govt shutdown shines light on missing data
What data will be missing, how will they be backfilled, and how do they affect us
You haven't seen me on TikTok, but I'm there now, thanks to FullTake, which interviewed me last week about how the government shutdown affects data collection.
Here's the clip:
@fulltake The longest government shutdown in U.S. history has stalled data reports from the Bureau of Labor Statistics. Former Columbia University program director and data science consultant, Kaiser Fung, tells us what will happen as a result of these report suspensions. #data
Statisticians don't like holes in the data, especially avoidable ones.
The government shutdown is punching holes in the datasets that underlie U.S. economic reports. These datasets rely on "shoe leather": staff conducting interviews about people's employment situations, or visiting retail stores to compile lists of prices. During the impasse, data collection has been suspended.
What happens after the government reopens?
We know that the furloughed employees typically get back pay, undoing the damage in one sense. However, data that weren't collected cannot be replaced.
Prices displayed on store shelves one or two or three months later aren't necessarily the prices during the shutdown. When it comes to employment, it is possible to ask someone how many hours they were working several months ago. But such replacement data introduce recall bias. The less steady a person's employment, the less accurate the answer is likely to be. By contrast, someone with a steady job contributes essentially no recall error.
Alternatively, BLS can apply statistical methods to fill data gaps. Think of these fillers as part data, part assumptions. The most famous simple backfill method is "mean imputation," which is a jargonistic way of saying "replace missing values with the average value of the non-missing." Backfilling is typically biased toward maintaining the status quo, because the most common – and least assailable – assumption is that the future replays the past. This assumption is likely to misfire in light of high economic uncertainty.
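To make "mean imputation" concrete, here is a minimal sketch in Python. The numbers are made up for illustration, the function name `mean_impute` is my own, and nothing here is meant to represent how BLS actually backfills its series.

```python
from statistics import mean

def mean_impute(series):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [x for x in series if x is not None]
    fill = mean(observed)
    return [fill if x is None else x for x in series]

# Hypothetical monthly price changes (in percent); None marks months
# in which no data were collected.
monthly_change = [0.2, 0.3, 0.4, None, None, 0.3]
filled = mean_impute(monthly_change)
```

Every gap receives the same value, the average of the observed months, which is why this kind of filler leans toward the status quo: whatever happened during the gap, the filled series says "about the same as usual."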
The government statisticians can elect not to fill in the gaps. This is an act of passing the buck because analysts who use these data series would then have to prepare their own filling materials.
How will any of this affect you and me?
Here's one way. The CPI is used by the government to determine cost-of-living adjustments for Social Security payments. Similarly, employers may use CPI to figure out annual pay increases.
Let's say BLS economists backfilled the missing values caused by the pause in data collection. These fillers mostly reflect assumptions, as there are few, if any, data behind them. The key assumption is likely to be rolling forward the status quo. If the fillers hold prices at their last observed levels while inflation in fact continues, then we would have a few months in which the CPI is under-estimated. This could lead to lower-than-warranted cost-of-living adjustments. (Imagine, for example, the adjustment formula is based on an average of some number of historical monthly inflation figures.)
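The imagined formula can be played out with a toy calculation. Everything below is an assumption made for illustration: the index values, the steady 0.3% monthly rise, the two-month gap, the carry-forward filler, and the simple averaging rule. None of it is the actual BLS procedure or the actual Social Security formula.

```python
def carry_forward(levels):
    """Replace each missing level (None) with the most recent observed level."""
    filled, last = [], None
    for x in levels:
        if x is not None:
            last = x
        filled.append(last)
    return filled

def avg_monthly_inflation(levels):
    """Average of month-over-month percent changes in an index."""
    changes = [100 * (b / a - 1) for a, b in zip(levels, levels[1:])]
    return sum(changes) / len(changes)

# Hypothetical "true" index: prices rise 0.3% every month for six months.
true_index = [100 * 1.003 ** k for k in range(6)]

# The last two months were never collected, so the filler rolls the
# last observed level forward (the status-quo assumption).
published = true_index[:4] + [None, None]
filled_index = carry_forward(published)

true_avg = avg_monthly_inflation(true_index)      # about 0.30% per month
filled_avg = avg_monthly_inflation(filled_index)  # about 0.18% per month
```

In this toy setup, the two filled months show zero inflation, dragging the average from about 0.30% down to about 0.18% per month. An adjustment tied to that average would come out too low, even though prices kept rising the whole time.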