Another data miner who thinks he can "predict terrorist attacks"
... when all he's really saying is something like: I can tell you there is global warming, but I can't tell you whether it will rain tomorrow.
And I am not the least bit convinced.
The reporter seems to be confused about what exactly this guy claims to be able to do. At one point, he tells us:
Clauset hopes, for example, that his work will enable predictions of when terrorists might get their hands on a nuclear, biological or chemical weapon — and when they might use it.
Later, the reporter says:
Clauset’s method is unlikely to predict exactly where or when an attack might occur. Instead, he deals in probabilities that unfold over months, years and decades.
So which is it? Is he predicting or not?
If you read the full article, you'll find that Clauset is very modest about what his research could do; the breathless framing is a journalistic flourish that gives the impression of something ground-breaking. Put differently, he may be contributing to an understanding of patterns of occurrence at a high level of abstraction, but he is not in any way focused on predicting the next attack.
***
This research has one fundamental limitation: it cannot be falsified. It shares this problem with macroeconomic forecasting and climate modeling: we only have one history (technically, one sample path), and any number of models, given sufficient complexity, can fit that history.
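A toy sketch of that point (entirely synthetic data; nothing here comes from Clauset's work): models of very different complexity can fit a single realized path almost equally well, and the path alone cannot arbitrate between them.

```python
import numpy as np

# One observed "history": a single noisy sample path (synthetic, for illustration only).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 30)
history = 2.0 * t + rng.normal(0.0, 0.3, size=t.size)

# Polynomial models of increasing complexity all fit this one path comfortably in-sample...
for degree in (1, 4, 9):
    coeffs = np.polyfit(t, history, degree)
    rmse = np.sqrt(np.mean((np.polyval(coeffs, t) - history) ** 2))
    # ...but they disagree wildly the moment they are extrapolated beyond it.
    print(f"degree {degree}: in-sample RMSE = {rmse:.2f}, "
          f"'forecast' at t = 1.5: {np.polyval(coeffs, 1.5):+.1f}")
```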
I think of these kinds of data mining exercises as essentially descriptive in nature, and I have a problem with conflating them with "predictive modeling". The only way to assess a predictive model is its predictive performance, and it is very difficult to validate predictions of events that occur extremely rarely. Because of this, people in the terrorist-prediction business get a free pass: how would you ever know whether they can predict the future or not?
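To see why the free pass exists, consider a back-of-the-envelope check (the probabilities are hypothetical numbers I picked for illustration):

```python
# Hypothetical claims: analyst A says the annual chance of a catastrophic
# attack is 2%; analyst B says it is 0.5%. Twenty quiet years later, who was right?
years = 20
for name, p in [("A", 0.02), ("B", 0.005)]:
    prob_quiet = (1 - p) ** years  # chance of seeing no attack under each claim
    print(f"analyst {name} (p = {p}): P(no attack in {years} years) = {prob_quiet:.2f}")
# Roughly 0.67 vs 0.90: both claims sit comfortably with the same quiet record,
# so decades of data cannot tell us which "predictor" to believe.
```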
It turns out that this researcher has already had to retract a "prediction":
In a 2005 draft of their paper, Clauset and his collaborators projected that another 9/11-magnitude attack would occur within seven years, a finding that sparked newspaper headlines (“Physicists Predict Next 9/11 In Seven Years”). Clauset now says there were too many uncertainties in the data to make such a specific prediction. “What we had said was, if the future is exactly like the past and the assumptions of the model are correct, this is what you would expect,” he says. “But that number I don’t trust.”
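For readers curious about the arithmetic behind that kind of projection, here is a hedged sketch. Every parameter value below is an assumption I invented for illustration; only the power-law functional form is taken from the article's description of the method.

```python
# All parameter values are illustrative assumptions, not Clauset's estimates.
alpha = 2.4        # hypothetical power-law exponent for attack severity (deaths)
x_min = 10.0       # hypothetical cutoff above which the power law is assumed to hold
rate = 300.0       # hypothetical worldwide rate of attacks above x_min, per year
severity = 3000.0  # roughly a 9/11-scale death toll

# Power-law tail probability: P(X >= x) = (x / x_min) ** (1 - alpha)
tail_prob = (severity / x_min) ** (1 - alpha)
expected_wait = 1.0 / (rate * tail_prob)  # expected years until an attack this severe

print(f"P(attack kills >= {severity:.0f}) = {tail_prob:.1e}")
print(f"expected waiting time = {expected_wait:.0f} years")
# Change alpha from 2.4 to 2.6 and the waiting time roughly triples -- the headline
# number inherits every uncertainty in the fitted exponent, the cutoff, and the rate.
```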
This failure is presented as a lesson learned. In fact, it is one of the first lessons of data mining, perhaps even the first lesson: extrapolating the past into the future is an assumption, not a finding. And yet we are asked to trust him. We are told he "has been invited to consult with the Department of Defense, the Department of Homeland Security and other government agencies."
So, when he makes a prediction like "It’s well within the realm of possibility within the next 50 years that a low-yield nuclear bomb is detonated as a terrorist attack somewhere in the world," I don't know on what basis I should believe it. But the reporter has no such issues -- according to him, "Clearly, that is an eventuality society might want to be prepared for." I am unable to locate this "clarity".
***
The second lesson of data mining (or any kind of statistical modeling, for that matter) is what I have called the false belief in true models. Unfortunately, Clauset is not immune to this either. Consider this passage (my italics):
For example, knowing a group’s size should enable governments and law enforcement to gauge the true threat it poses (because the power law proves that size determines the frequency with which it can attack).
“It tells you that while a lot of things are flexible — different terrorist organizations are very different — there are a couple of things that they can’t change,” Clauset says. “That means that even if they know that we know this, they can’t do anything about it.”
So, because he has found patterns "over months, years and decades", these patterns must persist into the future because the mathematics says they have to. The patterns are a kind of gravity that people can do nothing about. The implicit assumption is that past history contains all useful information about the future.
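To be concrete about what is being claimed, the relationship has the generic shape of a scaling law (my notation, not a formula from the article):

$$ f \propto S^{\beta}, \qquad \beta > 0 $$

where f is a group's attack frequency, S is its size, and beta is an exponent estimated from data. The data can estimate beta; nothing in the mathematics proves that beta must hold steady in the future. The stability of the exponent is an assumption about the world, not a theorem.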
***
If you're still reading, this is turning into a primer on data mining. Third lesson: models do not make decisions; people make decisions. This means you can never keep politics out. Besides, the person building a model has to make many choices during construction, each of which is subjective and can strongly shape the resulting model. (You only have to sample the controversies surrounding climate models to see what I mean.)
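One concrete example of such a choice, sketched with synthetic data and the standard maximum-likelihood estimator for a continuous power law, alpha = 1 + n / sum(ln(x_i / x_min)): where the analyst decides the "tail" begins moves the fitted exponent, and with it every tail probability and waiting time downstream.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic "severity" data: a heavy-tailed lognormal that merely resembles a power law.
data = rng.lognormal(mean=1.0, sigma=1.2, size=5000)

# The same data, three defensible choices of cutoff, three different "laws".
for x_min in (1.0, 5.0, 20.0):
    tail = data[data >= x_min]
    alpha_hat = 1.0 + tail.size / np.log(tail / x_min).sum()
    print(f"x_min = {x_min:4.1f}: n = {tail.size:4d}, fitted alpha = {alpha_hat:.2f}")
```

Each cutoff is defensible; each yields a different exponent; and whoever prefers a scarier or a calmer tail can find a cutoff to match.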
So it is folly to believe that this work is "less prone to ideological distortion" or that "it forces the conversation to remain analytical and apolitical". The right comparison is not model v. no model but model A v. model B, and in my experience a standoff between model A and model B is every bit as political and ideological as you could imagine. That is because there is no such thing as a "true model" (certainly not in a social science or business setting).