Big data

To me, statistics is about searching for beauty in simplicity. Much of our discipline is concerned with data reduction, or finding creative representations that consume less space than the raw data. That's why I have mixed feelings about complicated, multi-dimensional, dynamic, user-controlled, gee-whiz displays of data. I have no issue with these as works of art, but they tend not to enhance our understanding of the data.

I take a marginally relevant example from Google's well-illustrated 2005 Zeitgeist report.

[Image: Googlez_2, annotated chart from Google's 2005 Zeitgeist report]


The annotator nailed the key insights from this data, especially the flatness of "surfing" versus the seasonality of "snowboarding". The week-by-week fluctuations are a problem, though: they are noise that interferes with our perception of the underlying seasonal trend. An easy remedy is to "smooth" the data using a moving average, exponential smoothing, or a similar technique. The smoothed series loses the jaggedness and is easier to read.
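To make the two smoothers concrete, here is a minimal sketch in plain Python. The series is synthetic (a yearly cosine plus random noise standing in for weekly search volume), not the Google data, and the function names, the window of 5 weeks, and the smoothing weight of 0.3 are all arbitrary choices made for illustration.

```python
import math
import random


def moving_average(values, window):
    """Centered moving average. Assumes an odd window; near the two ends
    the window is truncated to the points that exist."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def exponential_smoothing(values, alpha):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1],
    started at the first observation."""
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def roughness(values):
    """Mean absolute week-to-week change, a crude measure of jaggedness."""
    return sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)


# Synthetic stand-in for a seasonal weekly series (hypothetical data):
# two years of a 52-week cycle plus Gaussian noise.
random.seed(0)
raw = [50 + 40 * math.cos(2 * math.pi * week / 52) + random.gauss(0, 8)
       for week in range(104)]

ma = moving_average(raw, window=5)
es = exponential_smoothing(raw, alpha=0.3)

print("roughness raw:", round(roughness(raw), 2))
print("roughness moving average:", round(roughness(ma), 2))
print("roughness exponential:", round(roughness(es), 2))
```

Both smoothed series should show a much lower week-to-week roughness than the raw one while keeping the seasonal swing. A wider window or a smaller alpha smooths more aggressively, at the cost of flattening and (for exponential smoothing) delaying the seasonal peaks.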