Examining the eleventh lecture from Professor Michael Spagat’s Economics of Warfare course that he gives at Royal Holloway University. It is posted on his blog Wars, Numbers and Human Losses at: https://mikespagat.wordpress.com/
This lecture discusses the analysis of cross-country datasets and correlations, and then discusses some problems with statistical testing in general. It is worth reading carefully in its entirety.
The datasets discussed in the first slides are, I assume, from the Correlates of War (COW) dataset, a publicly available dataset that many in academia have used. We have never used it. When we created the MISS (Modern Insurgency Spread Sheets…now called DISS), we built them entirely from our own research.
He then looks at two different studies on the probability of conflict, one done by Fearon and Laitin (slide 5) and the other done by Collier and Hoeffler (slide 15). Even though they are based upon the same data, they produced somewhat different results (all, of course, at 90% or 95% confidence intervals). He summarizes the conclusions of the Fearon and Laitin study on slides 8 and 10 and the conclusions of the Collier and Hoeffler study on slides 16 and 18. It is worth comparing the differences.
Throughout the lecture, he gives warnings about the problems with this analysis. First he discusses “story lines” on slides 12 – 14. This is important. Once you have a correlation, most people are clever enough to be able to explain why such a correlation exists, be it right or wrong.
But the part of the lecture that hit home with me starts with the statement that “These reported results may just have come out that way by luck or chance.” (slide 20). The cartoon on slide 22 makes the point. Basically, if you test 20 different things, even things that are completely irrelevant, at a 95% confidence level, then with average luck you will get at least one correlation! Test enough things, and you will get a correlation. By the same token, add enough variables to your regression model and you will get a fit.
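The arithmetic behind the cartoon's point is straightforward. A minimal sketch (the 20-test figure comes from the slide; the rest is just the standard multiple-comparisons calculation):

```python
# If every one of k hypotheses is actually false (pure noise), a test at
# the 5% significance level still "finds" each one about 5% of the time.
alpha = 0.05   # significance level (95% confidence)
k = 20         # number of independent things tested

# Probability that at least one test comes up "significant" by chance
p_at_least_one = 1 - (1 - alpha) ** k

# Average number of false positives across the k tests
expected_false_positives = k * alpha

print(f"P(at least one false positive in {k} tests): {p_at_least_one:.2f}")
print(f"Expected false positives: {expected_false_positives:.1f}")
```

With 20 tests the chance of at least one spurious "significant" result is about 64%, and on average you get exactly one, which is the cartoon's punchline.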
This is done all the time, and I did discuss it in America’s Modern Wars, pages 73-75, regarding a study done by CAA (Center for Army Analysis) in 2009 using our MISS. As I note in the book, they identified 34 variables, then built a regression model based upon 11 of them, and then boiled the final model down to four: 1) Number of Red Factions, 2) Counterinsurgent per Insurgent Ratio (Peak), 3) Counterinsurgent Developed Nation and 4) Political Concept. As their model was based on force ratio and political concept, it was similar to my regression model, except they added two more variables to the model. The problem is that one of those variables, “Number of Red Factions,” should not have been added. As I note in my book “In our original research we did not systematically and rigorously establish a count of factions for insurgency…. It should not have been used as a variable without further research.”
To continue from my book: “My fear is that this variable (“number of factions”) worked in their regression model because it was helping to shape the curve even though there is not a clear cause-and-effect relationship here. Also, because of the methodology they chose, which was establishing variables based upon statistical significance, as opposed to there being a solid theoretical basis for it, then I believe that statistically there should be around two ‘false’ correlations among those 11 variables.”
I end up concluding: “My natural tendency as a modeler was to make sure I had clearly identified cause-and-effect relationships before I moved forward. That is why my approach starts simply (two variables) and moves forward from there. It is also why I independently examined each possible variable in some depth. In addition, I reviewed and examined a range of theorists before proceeding (see Chapter Seventeen). I have had the experience of dumping lots of variables into a regression model, and lo-and-behold, something fits. It is important to make sure you have clearly established cause-and-effect.”
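The experience of dumping variables into a model and having something fit can be sketched with a small simulation. This is my own illustration, not the CAA analysis: the sample size of 30 cases and the 20 candidate variables are arbitrary choices, and every variable here is pure noise by construction.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

def corr(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

n_cases = 30  # e.g. 30 insurgencies in a dataset
outcome = [random.gauss(0, 1) for _ in range(n_cases)]  # random "outcome"

# 20 candidate "explanatory" variables, all pure noise, unrelated to the
# outcome. Keep the strongest correlation found, as a variable-screening
# procedure effectively does.
best = max(
    abs(corr([random.gauss(0, 1) for _ in range(n_cases)], outcome))
    for _ in range(20)
)
print(f"Strongest correlation among 20 noise variables: {best:.2f}")
```

The strongest of the 20 spurious correlations will typically look respectable, even though nothing here has any cause-and-effect relationship at all, which is why screening variables by statistical significance alone is dangerous.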
Anyhow, what Dr. Spagat warns of on slide 22 is what some people are actually doing. It is not just a mistake made by grad students, but a mistake that a professional DOD analytical organization has made.
Enough preaching; the link to the lecture is here: http://personal.rhul.ac.uk/uhte/014/Economics%20of%20Warfare/Lecture%2011.pdf