Summation of our Validation Posts

This extended series of posts about validation of combat models was originally started by Shawn Woodford’s post on future modeling efforts and the “Base of Sand” problem.

Wargaming Multi-Domain Battle: The Base Of Sand Problem

This post apparently irked some people at TRADOC and they wrote an article in the December issue of the Phalanx referencing his post and criticizing it. This resulted in the following seven responses from me:

Engaging the Phalanx


Validating Attrition

Physics-based Aspects of Combat

Historical Demonstrations?


Engaging the Phalanx (part 7 of 7)

This was probably overkill… but guys who write 1,662-page books tend to be a little wordy.

While it is very important to identify a problem, it is also helpful to show the way forward. Therefore, I decided to discuss what data bases were available for validation. After all, I would like to see the modeling and simulation efforts move forward (and right now, they seem to be moving backward). This led to the following nine posts:

Validation Data Bases Available (Ardennes)

Validation Data Bases Available (Kursk)

The Use of the Two Campaign Data Bases

The Battle of Britain Data Base

Battles versus Campaigns (for Validation)

The Division Level Engagement Data Base (DLEDB)

Battalion and Company Level Data Bases

Other TDI Data Bases

Other Validation Data Bases

There were also a few other validation issues that had come to mind while I was writing these blog posts, so this led to the following series of three posts:

Face Validation

Validation by Use

Do Training Models Need Validation?

Finally, there were a few other related posts scattered through this rather extended diatribe. They include the following six posts:

Paul Davis (RAND) on Bugaboos


TDI Friday Read: Engaging The Phalanx

Combat Adjudication

China and Russia Defeats the USA

Building a Wargamer

That kind of ends this discussion on validation. It kept me busy for a while. I am not sure whether you were entertained or informed by it. It is time for me to move on to another subject, although I have not yet figured out what that will be.

Dupuy’s Verities: The Inefficiency of Combat

The “Mud March” of the Union Army of the Potomac, January 1863.

The twelfth of Trevor Dupuy’s Timeless Verities of Combat is:

Combat activities are always slower, less productive, and less efficient than anticipated.

From Understanding War (1987):

This is the phenomenon that Clausewitz called “friction in war.” Friction is largely due to the disruptive, suppressive, and dispersal effects of firepower upon an aggregation of people. The pace of actual combat operations will be much slower than the progress of field tests and training exercises, even highly realistic ones. Tests and exercises are not truly realistic portrayals of combat, because they lack the element of fear in a lethal environment, present only in real combat. Allowances must be made in planning and execution for the effects of friction, including mistakes, breakdowns, and confusion.

While Clausewitz asserted that the effects of friction on the battlefield could not be measured because they were largely due to chance, Dupuy believed that its influence could, in fact, be gauged and quantified. He identified at least two distinct combat phenomena he thought reflected measurable effects of friction: the differences in casualty rates between large- and small-sized forces, and diminishing returns from adding extra combat power beyond a certain point in battle. He also believed that much more research would be necessary to fully understand and account for this.

Dupuy was skeptical of the accuracy of combat models that failed to account for this interaction between operational and human factors on the battlefield. He was particularly doubtful about approaches that started by calculating the outcomes of combat between individual small-sized units or weapons platforms based on the Lanchester equations or “physics-based” estimates, then used these as inputs for brigade and division-level-battles, the results of which in turn were used as the basis for determining the consequences of theater-level campaigns. He thought that such models, known as “bottom up,” hierarchical, or aggregated concepts (and the prevailing approach to campaign combat modeling in the U.S.), would be incapable of accurately capturing and simulating the effects of friction.
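To make concrete what a “physics-based” building block of such a hierarchy computes, here is a minimal sketch of Lanchester’s square law, integrated with a simple Euler step. It is an illustration only and not the code of any model discussed on this blog. The function name, the attrition coefficients alpha and beta, and the step size are all assumptions. Note what is absent: nothing in it represents friction, suppression, fear, or any other human factor, which is the gap Dupuy was concerned about.

```python
def lanchester_square(a0, b0, alpha, beta, dt=0.01, max_steps=100_000):
    """Illustrative Euler integration of Lanchester's square law.

    dA/dt = -beta * B   (side A is attrited by side B's fire)
    dB/dt = -alpha * A  (side B is attrited by side A's fire)

    alpha, beta and dt are hypothetical parameters chosen by the caller.
    Returns (A, B, elapsed_time) once either side reaches zero
    or max_steps is exhausted.
    """
    a, b, t = float(a0), float(b0), 0.0
    for _ in range(max_steps):
        if a <= 0 or b <= 0:
            break
        # Simultaneous update: both right-hand sides use the old values.
        a, b = a - beta * b * dt, b - alpha * a * dt
        t += dt
    # Clamp the small overshoot below zero from the final step.
    return max(a, 0.0), max(b, 0.0), t


survivors_a, survivors_b, duration = lanchester_square(1000, 500, 0.01, 0.01)
print(round(survivors_a), survivors_b)
```

With equal coefficients, 1,000 against 500 leaves the larger side with roughly 866 survivors. That is close to the square root of (1,000² − 500²), the square law’s signature result, and it is produced without any allowance for friction.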

Building a Wargamer

Interesting article from Elizabeth Bartels of RAND from November 2018. It is on the War on the Rocks website. Worth reading: Building a Pipeline of Wargaming Talent

Let me highlight a few points:

  1. “On issues ranging from potential conflicts with Russia to the future of transportation and logistics, senior leaders have increasingly turned to wargames to imagine potential futures.”
  2. “The path to becoming a gamer today is modeled on the careers of the last generation of gamers — most often members of the military or defense analysts with strong roots in the hobby gaming community of the 1960s and 1970s.”
    1. My question: Should someone at MORS (Military Operations Research Society) nominate Charles S. Roberts and James F. Dunnigan for the Vance R. Wanner or the Clayton J. Thomas awards? (see:
  3. One notes that there is no discussion of the “Base of Sand” problem.
  4. One notes that there is no discussion of VVA (Verification, Validation and Accreditation).
  5. The picture heading her article is of a hex board overlaid by acetate.

Do Training Models Need Validation?

Do we need to validate training models? The argument is that as the model is being used for training (vice analysis), it does not require the rigorous validation that an analytical model would require. In practice, I gather this means they are not validated. It is an argument I encountered after 1997. As such, it is not addressed in my letters to TRADOC in 1996: See

Over time, the modeling and simulation industry has shifted from using models for analysis to using models for training. The use of models for training has exploded, and these efforts certainly employ a large number of software coders. The question is, if the core of the analytical models has not been validated, and in some cases is known to have problems, then what are the models teaching people? To date, I am not aware of any training models that have been validated.

Let us consider the case of JICM. The core of the model’s attrition calculation was the Situational Force Scoring (SFS). Its attrition calculator for ground combat is based upon a version of the 3-to-1 rule comparing force ratios to exchange ratios. This is discussed in some depth in my book War by Numbers, Chapter 9, Exchange Ratios. To quote from page 76:

If the RAND version of the 3 to 1 rule is correct, then the data should show a 3 to 1 force ratio and a 3 to 1 casualty exchange ratio. However, there is only one data point that comes close to this out of the 243 points we examined.

That was 243 battles from 1600-1900 using our Battles Data Base (BaDB). We also tested it against our Division Level Engagement Data Base (DLEDB), covering 1904-1991, with the same result. To quote from page 78 of my book:

In the case of the RAND version of the 3 to 1 rule, there is again only one data point (out of 628) that is anywhere close to the crossover point (even fractional exchange ratio) that RAND postulates. In fact it almost looks like the data conspire to leave a noticeable hole at that point.
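The test described in these quotes can be sketched in code. Under RAND’s version of the rule as described here, a 3-to-1 force ratio goes with a 3-to-1 casualty exchange ratio. Each side therefore loses the same fraction of its strength, which gives a fractional exchange ratio of 1, the crossover point. The fractional exchange ratio is taken here as the attacker’s fractional loss divided by the defender’s. At the crossover the two are equal, so the direction does not matter. The sketch counts how many engagements fall near that point. The record layout, the 15% tolerance that defines “near,” and the three sample engagements are hypothetical. They are not drawn from the BaDB or DLEDB, and this is not necessarily how the analysis in War by Numbers was done.

```python
from dataclasses import dataclass


@dataclass
class Engagement:
    # Hypothetical record layout, not the actual BaDB/DLEDB schema.
    attacker_strength: float
    defender_strength: float
    attacker_losses: float
    defender_losses: float


def force_ratio(e: Engagement) -> float:
    return e.attacker_strength / e.defender_strength


def exchange_ratio(e: Engagement) -> float:
    # Attacker losses per defender loss.
    return e.attacker_losses / e.defender_losses


def fractional_exchange_ratio(e: Engagement) -> float:
    # Attacker's fractional loss divided by defender's fractional loss;
    # algebraically equal to exchange_ratio(e) / force_ratio(e).
    return (e.attacker_losses / e.attacker_strength) / (
        e.defender_losses / e.defender_strength
    )


def near_crossover(e: Engagement, fr_target=3.0, fer_target=1.0, tol=0.15) -> bool:
    # "Near" means within an assumed relative tolerance of both targets.
    return (abs(force_ratio(e) / fr_target - 1) <= tol
            and abs(fractional_exchange_ratio(e) / fer_target - 1) <= tol)


def count_near_crossover(engagements, **kwargs) -> int:
    return sum(near_crossover(e, **kwargs) for e in engagements)


# Hypothetical engagements for illustration only (not historical data).
HYPOTHETICAL_SAMPLE = [
    Engagement(30000, 10000, 3000, 1000),  # 3:1 odds, 3:1 losses -> crossover
    Engagement(30000, 10000, 900, 1000),   # 3:1 odds, attacker loses far less
    Engagement(20000, 10000, 1000, 800),   # 2:1 odds
]
print(count_near_crossover(HYPOTHETICAL_SAMPLE))
```

Only the first hypothetical engagement matches the rule’s prediction. Run against a real data base, the count would show how often historical combat actually lands at the crossover that the rule postulates.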

So, does this create negative learning? If the ground operations are such that an attacker ends up losing 3 times as many troops as the defender when attacking at 3-to-1 odds, does this mean that the model is training people not to attack below those odds, and in fact, to wait until they have much more favorable odds? The model was/is (I haven’t checked recently) being used at the U.S. Army War College. This is the advanced education institution that most promotable colonels attend before advancing to be a general officer. Is such a model teaching them incorrect relationships, force ratios and combat requirements?

You fight as you train. If we are using models to help train people, then it is certainly valid to ask what those models are doing. Are they properly training our soldiers and future commanders? How do we know they are doing this? Have they been validated?

Validation by Use

Sacrobosco, Tractatus de Sphaera (1550 AD)

Another argument I have heard over the decades is that models are validated by use. Apparently the argument is that these models have been used for so long, and so many people have worked with their outputs, that they must be fine. I have seen this argument made in writing by a senior army official in 1997 in response to a letter addressing validation that we encouraged TRADOC to send out:


I doubt that there is any regulation discussing “validation by use,” and I doubt anyone has ever defended this idea in a public paper. Still, it is an argument that I have heard used far more than once or twice.

Now, part of the problem is that some of these models have been around a few decades. For example, the core of some of the models used by CAA, such as COSAGE, first came into existence in 1969. They are using an updated 50-year-old model to model modern warfare. My father worked with this model. RAND’s JICM (Joint Integrated Contingency Model) dates back to the 1980s, so it is at least 30 years old. The irony is that some people argue that one should not use historical warfare examples to validate models of modern warfare. These models now have a considerable legacy.

From a practical point of view, it means that the people who originally designed and developed the model have long since retired. In many cases, the people who intimately knew the inner workings of the model have also retired and have not really been replaced. Some of these models have become “black boxes” where the users do not really know the details of how the models calculate their results. So suddenly, validation by use seems like a reasonable argument, because these models pre-date the analysts, and they assume that there is some validity to them, as people have been using them. They simply inherited the model. Why question it?

Illustration by Bartolomeu Velho, 1568 AD

China and Russia Defeats the USA

A couple of recent articles on the latest wargaming effort done by RAND:

The opening line states: “The RAND Corporation’s annual ‘Red on Blue’ wargame simulation found that the United States would be a loser in a conventional confrontation with Russia and China.”

A few other quotes:

  1. “Blue gets its ass handed to it.”
  2. “…the U.S. forces ‘suffer heavy losses in one scenario after another and still can’t stop Russia or China from overrunning U.S. allies in the Baltics or Taiwan.’”

Also see:

A few quotes from that article:

  1. “The US and NATO are unable to stop an attack in the Balkans by the Russians,…”
  2. “…and the United States and its allies are unable to prevent the takeover of Taiwan by China.”

The articles do not state what simulations were used to wargame this. The second article references this RAND study (RAND Report), but my quick perusal of it did not identify what simulations were used. A search on the words “model” and “wargame” produced nothing. A search on the words “simulation” and “gaming” led to the following:

  1. “It draws on research, analysis, and gaming that the RAND Corporation has done in recent years, incorporating the efforts of strategists, regional specialists, experts in both conventional and irregular military operations, and those skilled in the use of combat simulation tools.”
  2. “Money, time, and talent must therefore be allocated not only to the development and procurement of new equipment and infrastructure, but also to concept development, gaming and analysis, field experimentation, and exploratory joint force exercises.”

Anyhow, I am curious as to what wargames they were using (JICM – Joint Integrated Contingency Model?). I was not able to find out with a cursory search.

Dupuy’s Verities: The Effects of Firepower in Combat

A German artillery barrage falling on Allied trenches, probably during the Second Battle of Ypres in 1915, during the First World War. [Wikimedia]

The eleventh of Trevor Dupuy’s Timeless Verities of Combat is:

Firepower kills, disrupts, suppresses, and causes dispersion.

From Understanding War (1987):

It is doubtful if any of the people who are today writing on the effect of technology on warfare would consciously disagree with this statement. Yet, many of them tend to ignore the impact of firepower on dispersion, and as a consequence they have come to believe that the more lethal the firepower, the more deaths, disruption, and suppression it will cause. In fact, as weapons have become more lethal intrinsically, their casualty-causing capability has either declined or remained about the same because of greater dispersion of targets. Personnel and tank loss rates of the 1973 Arab-Israeli War, for example, were quite similar to those of intensive battles of World War II and the casualty rates in both of these wars were less than in World War I. (p. 7)
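Dupuy’s argument is essentially a ratio. If the dispersion of targets grows faster than the intrinsic lethality of weapons, casualty-causing capability falls even as weapons become deadlier. The toy sketch below makes that arithmetic explicit. It assumes that effective lethality is simply intrinsic lethality divided by dispersion, then classifies the change between two periods. The function names, the ±10% band treated as “about the same,” and the index values are hypothetical illustrations. They are not Dupuy’s published figures or his actual methodology.

```python
def effective_lethality(intrinsic_lethality: float, dispersion: float) -> float:
    # Toy assumption: casualty-causing capability is intrinsic lethality
    # divided by how widely targets are dispersed.
    return intrinsic_lethality / dispersion


def trend(before, after, band=0.1) -> str:
    """Classify the change in effective lethality between two periods.

    `before` and `after` are hypothetical (intrinsic_lethality, dispersion)
    pairs; `band` is an assumed +/-10% range treated as "about the same".
    """
    ratio = effective_lethality(*after) / effective_lethality(*before)
    if ratio < 1 - band:
        return "declined"
    if ratio > 1 + band:
        return "increased"
    return "about the same"


# Hypothetical index values, NOT Dupuy's published figures.
earlier = (100.0, 10.0)   # (intrinsic lethality, dispersion)
later = (1000.0, 120.0)   # lethality x10, dispersion x12
print(trend(earlier, later))  # declined
```

In this example a tenfold rise in lethality is outpaced by a twelvefold rise in dispersion, so the toy model reports a decline. That is the pattern Dupuy describes.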

Research and analysis of real-world historical combat data by Dupuy and TDI has identified at least four distinct combat effects of firepower: infliction of casualties (lethality), disruption, suppression, and dispersion. All of them were found to be heavily influenced—if not determined—by moral (human) factors.

Again, I have written extensively on this blog about Dupuy’s theory of the historical relationship between weapon lethality, dispersion on the battlefield, and historical decline in average daily combat casualty rates. TDI President Chris Lawrence has done further work on the subject as well.

TDI Friday Read: Lethality, Dispersion, And Mass On Future Battlefields

Human Factors In Warfare: Dispersion

Human Factors In Warfare: Suppression

There appears to be a fundamental difference in interpretation of the combat effects of firepower between Dupuy’s emphasis on the primacy of human factors and Defense Department models that account only for the “physics-based” casualty-inflicting capabilities of weapons systems. While U.S. Army combat doctrine accounts for the interaction of firepower and human behavior on the battlefield, it has no clear method for assessing or even fully identifying the effects of such factors on combat outcomes.