Validating A Combat Model (Part IV)

[The article below is reprinted from April 1997 edition of The International TNDM Newsletter.]

The First Test of the TNDM Battalion-Level Validations: Predicting the Winners
by Christopher A. Lawrence

Part I

In the basic concept of the TNDM battalion-level validation, we decided to collect data from battles from three periods: WWI, WWII, and post-WWII. We then made a TNDM run for each battle exactly as the battle was laid out, with both sides having the same CEV [Combat Effectiveness Value]. The results of that run indicated what the CEV should have been for the battle, and we then made a second run using that CEV. That was all we did. We wanted to make sure that there was no “tweaking” of the model for the validation, so we stuck rigidly to this procedure. We then evaluated each run for its fit in three areas:

  1. Predicting the winner/loser
  2. Predicting the casualties
  3. Predicting the advance rate

We did end up changing two engagements around. We had a similar situation with one WWII engagement (Tenaru River) and one modern period engagement (Bir Gifgafa), where the defender received reinforcements part-way through the battle and counterattacked. In both cases we decided to run them as two separate battles (adding two more battles to our database), with the conditions from the first engagement being the starting strength, plus the reinforcements, for the second engagement. Based on our previous experience with running Goose Green, for all the Falklands Island battles we counted the Milans and Carl Gustavs as infantry weapons. That is the only “tweaking” we did that affected the battle outcome in the model. We also put in a casualty multiplier of 4 for WWI engagements, but that is discussed in the article on casualties.

This is the analysis of the first test, predicting the winner/loser. Basically, if the attacker won historically, we assigned it a value of 1, a draw was 0, and a defender win was -1. In the TNDM results summary, it has a column called “winner” which records either an attacker win, a draw, or a defender win. We compared these two results. If they were the same, this is a “correct” result. If they are “off by one,” this means the model predicted an attacker win or loss, where the actual result was a draw, or the model predicted a draw, where the actual result was a win or loss. If they are “off by two” then the model simply missed and predicted the wrong winner.

The results are (the envelope please….):

It is hard to determine a good predictability from a bad one. Obviously, the initial WWI prediction of 57% right is not very good, while the Modern second run result of 97% is quite good. What l would really like to do is compare these outputs to some other model (like TACWAR) to see if they get a closer fit. I have reason to believe that they will not do better.

Most cases in which the model was “off by 1″ were easily correctable by accounting for the different personnel capabilities of the army. Therefore, just to look where the model really failed. let‘s just look at where it simply got the wrong winner:

The TNDM is not designed or tested for WWI battles. It is basically designed to predict combat between 1939 and the present. The total percentages without the WWI data in it are:

Overall, based upon this data I would be willing to claim that the model can predict the correct winner 75% of the time without accounting for human factors and 90% of the time if it does.

CEVs: Quite simply a user of the TNDM must develop a CEV to get a good prediction. In this particular case, the CEVs were developed from the first run. This means that in the second run, the numbers have been juggled (by changing the CEV) to get a better result. This would make this effort meaningless if the CEVs were not fairly consistent over several engagements for one side versus its other side. Therefore, they are listed below in broad groupings so that the reader can determine if the CEVs appear to be basically valid or are simply being used as a “tweak.”

Now, let’s look where it went wrong. The following battles were not predicted correctly:

There are 19 night engagements in the data base, five from WWI, three from WWII, and 11 modern. We looked at whether the miss prediction was clustered among night engagements and that did not seem to be the case. Unable to find a pattern, we examined each engagement to see what the problem was. See the attachments at the end of this article for details.

We did obtain CEVs that showed some consistency. These are shown below. The Marines in World War l record the following CEVs in these WWI battles:

Compare those figures to the performance of the US Army:

In the above two and in all following cases, the italicized battles are the ones with which we had prediction problems.

For comparison purposes, the CEVs were recorded in the battles in World War II between the US and Japan:

For comparison purposes, the following CEVs were recorded in Operation Veritable:

These are the other engagements versus Germans for which CEVs were recorded:

For comparison purposes, the following CEVs were recorded in the post-WWII battles between Vietnamese forces and their opponents:

Note that the Americans have an average CEV advantage of 1 .6 over the NVA (only three cases) while having a 1.8 advantage over the VC (6 cases).

For comparison purposes, the following CEVs were recorded in the battles between the British and Argentine’s:

Next: Part II: Conclusions

Validating A Combat Model (Part III)

[The article below is reprinted from April 1997 edition of The International TNDM Newsletter.]

Numerical Adjustment of CEV Results: Averages and Means
by Christopher A. Lawrence and David L. Bongard

As part of the battalion-level validation effort, we made two runs with the model for each test case—one without CEV [Combat Effectiveness Value] incorporated and one with the CEV incorporated. The printout of a TNDM [Tactical Numerical Deterministic Model] run has three CEV figures for each side: CEVt CEVl and CEVad. CEVt shows the CEV as calculated on the basis of battlefield results as a ratio of the performance of side a versus side b. It measures performance based upon three factors: mission accomplishment, advance, and casualty effectiveness. CEVt is calculated according to the following formula:

P′ = Refined Combat Power Ratio (sum of the modified OLls). The ′ in P′ indicates that this ratio has been “refined” (modified) by two behavioral values already: the factor for Surprise and the Set Piece Factor.

CEVd = 1/CEVa (the reciprocal)

In effect the formula is relative results multiplied by the modified combat power ratio. This is basically the formulation that was used for the QJM [Quantified Judgement Model].

In the TNDM Manual, there is an alternate CEV method based upon comparative effective lethality. This methodology has the advantage that the user doesn’t have to evaluate mission accomplishment on a ten point scale. The CEVI calculated according to the following formula:

In effect, CEVt is a measurement of the difference in results predicted by the model from actual historical results based upon assessment for three different factors (mission success, advance rates, and casualties), while CEVl is a measurement of the difference in predicted casualties from actual casualties. The CEVt and the CEVl of the defender is the reciprocal of the one for the attacker.

Now the problem comes in when one creates the CEVad, which is the average of the two CEVs above. l simply do not know why it was decided to create an alternate CEV calculation from the old QJM method, and then average the two, but this is what is currently being done in the model. This averaging results in a revised CEV for the attacker and for the defender that are not reciprocals of each other, unless the CEVt and the CEVl were the same. We even have some cases where both sides had a CEVad of greater than one. Also, by averaging the two, we have heavily weighted casualty effectiveness relative to mission effectiveness and mission accomplishment.

What was done in these cases (again based more on TDI tradition or habit, and not on any specific rule) was:

(1.) If CEVad are reciprocals, then use as is.

(2.) If one CEV is greater than one while the other is less than 1,  then add the higher CEV to the value of the reciprocal of the lower CEV (1/x) and divide by two. This result is the CEV for the superior force, and its reciprocal is the CEV for the inferior force.

(3.) If both CEVs are above zero, then we divide the larger CEVad value by the smaller, and use its result as the superior force’s CEV.

In the case of (3.) above, this methodology usually results in a slightly higher CEV for the attacker side than if we used the average of the reciprocal (usually 0.1 or 0.2 higher). While the mathematical and logical consistency of the procedure bothered me, the logic for the different procedure in (3.) was that the model was clearly having a problem with predicting the engagement to start with, but that in most cases when this happened before (meaning before the validation), a higher CEV usually produced a better fit than a lower one. As this is what was done before. I accepted it as is, especially if one looks at the example of Mediah Farm. If one averages the reciprocal with the US’s CEV of 8.065, one would get a CEV of 4.13. By the methodology in (3.), one comes up with a more reasonable US CEV of 1.58.

The interesting aspect is that the TNDM rules manual explains how CEVt, CEVl and CEVad are calculated, but never is it explained which CEVad (attacker or defender) should be used. This is the first explanation of this process, and was based upon the “traditions” used at TDI. There is a strong argument to merge the two CEVs into one formulation. I am open to another methodology for calculating CEV. I am not satisfied with how CEV is calculated in the TNDM and intend to look into this further. Expect another article on this subject in the next issue.

Validating A Combat Model (Part II)

[The article below is reprinted from October 1996 edition of The International TNDM Newsletter.]

Validation of the TNDM at Battalion Level
by Christopher A. Lawrence

The original QJM (Quantified Judgement Model) was created and validated using primarily division-level engagements from WWII and the 1967 and 1973 Mid-East Wars. For a number of reasons, we are now using the TNDM (Tactical Numerical Deterministic Model) for analyzing lower-level engagements. We expect, with the changed environment in the world, this trend to continue.

The model, while designed to handle battalion-level engagements, was never validated for those size engagements. There were only 16 engagements in the original QJM Database with less than 5,000 people on one side, and only one with less than 2,000 people on a side. The sixteen smallest engagements are:

While it is not unusual in the operations research community to use unvalidated models of combat, it is a very poor practice. As TDI is starting to use this model for battalion-level engagements, it is time it was formally validated for that use. A model that is validated at one level of combat is not validated to represent sizes, types and forms of combat to which it has not been tested. TDI is undertaking a battalion-level validation effort for the TNDM. We intend to publish the material used and the results of the validation in the International TNDM Newsletter. As part of this battalion-level validation we will also be looking at a number of company-level engagements. Right now, my intention is to simply just throw all the engagements into the same hopper and see what comes out.

By battalion-level, I mean any operation consisting of the equivalent of two or less reinforced battalions on one side. Three or more battalions imply a regiment or brigade—level operation. A battalion in combat can range widely in strength, but that usually does not have an authorized strength in excess of 900. Therefore, the upper limit for a battalion—level engagement is 2,000 people, while its lower limit can easily go below 500 people. Only one engagement in the original OJM Database fits that definition of a battalion-level engagement. HERO, DMSI, TND & Associates, and TDI (all companies founded by Trevor N. Dupuy) examined a number of small engagements over the years. HERO assembled 23 WWI engagements for the Land Warfare Database (LWDB), TDI has done 15 WWII small unit actions for the Suppression contract and Dave Bongard has assembled four others from that period for the Pacific, DMSI did 14 battalion-level engagements from Vietnam for a study on low intensity conflict 10 years ago, and Dave Bongard has been independently looking into the Falkland Islands War and other post-WWII sources to locate 10 more engagements, and we have three engagements that Trevor N. Dupuy did for South Africa. We added two other World War II engagements and the three smallest engagements from the list to the left (those marked with an asterisk). This gives us a list of 74 additional engagements that can be used to test the TNDM.

The smallest of these engagements is 220 people on both sides (100 vs I20), while the largest engagement on this list is 5,336 versus 3,270 or 8,679 vs 725. These 74 engagements consist of 23 engagements from WWI, 22 from WWII, and 29 post-1945 engagements. There are three engagements where both sides have over 3,000 men and 3 more where both sides are above 2,000 men. In the other 68 engagements, at least one side is below 2,000, while in 50 of the engagements, both sides are below 2,000.

This leaves the following force sizes to be tested:

These engagements have been “randomly” selected in the sense that the researchers grabbed whatever had been done and whatever else was conveniently available. It is not a proper random selection, in the sense that every war in this century was analyzed and a representative number of engagements was taken from each conflict. This is not practical, so we settle for less than perfect data selection.

Furthermore, as many of these conflicts are with countries that do not have open archives (and in many cases limited unit records) some of the opposing forces strength and losses had to be estimated. This is especially true with the Viet Nam engagements. It is hoped that the errors in estimation deviate equally on both sides of the norm, but there is no way of knowing that until countries like the People’s Republic of China and Vietnam open up their archives for free independent research.

TDI intends to continue to look for battalion-level and smaller engagements for analysis, and may add to this data base over time. If some of our readers have any other data assembled, we would be interested in seeing it. In the next issue we will publish the preliminary results of our validation.

Note that in the above table, for World War II, German, Japanese, and Axis forces are listed in italics, while US, British, and Allied forces are listed in regular typeface, Also, in the VERITABLE engagements, the 5/7th Gordons’ action continues the assault of the 7th Black Watch, and that the 9th Cameronians assumed the attack begun by the 2d Gordon Highlanders.

Tu-Vu is described in some detail in Fall’s Street Without Joy (pp. 51-53). The remaining Indochina/SE Asia engagements listed here are drawn from a QJM-based analysis of low-intensity operations (HERO Report 124, Feb 1988).

The coding for source and validation status, on the extreme right of each engagement line in the D Cas column, is as follows:

  • n indicates an engagement which has not been employed for validation, but for which good data exists for both sides (35 total).
  • Q indicates an engagement which was part of the original QJM database (3 total).
  • Q+ indicates an engagement which was analyzed as part of the QJM low-intensity combat study in 1988 (14 total).
  • T indicates an engagement analyzed with the TNDM (20 total).

Validating A Combat Model

The question of validating combat models—“To confirm or prove that the output or outputs of a model are consistent with the real-world functioning or operation of the process, procedure, or activity which the model is intended to represent or replicate”—as Trevor Dupuy put it, has taken up a lot of space on the TDI blog this year. What this discussion did not address is what an effort to validate a combat model actually looks like. This will be the first in a series of posts that will do exactly that.

Under the guidance of Christopher A. Lawrence, TDI undertook a battalion-level validation of Dupuy’s Tactical Numerical Deterministic Model (TNDM) in late 1996. This effort tested the model against 76 engagements from World War I, World War II, and the post-1945 world including Vietnam, the Arab-Israeli Wars, the Falklands War, Angola, Nicaragua, etc. It was probably one of the more independent and better-documented validations of a casualty estimation methodology that has ever been conducted to date, in that:

  • The data was independently assembled (assembled for other purposes before the validation) by a number of different historians.
  • There were no calibration runs or adjustments made to the model before the test.
  • The data included a wide range of material from different conflicts and times (from 1918 to 1983).
  • The validation runs were conducted independently (Susan Rich conducted the validation runs, while Christopher A. Lawrence evaluated them).
  • The results of the validation were fully published.
  • The people conducting the validation were independent, in the sense that:

a) there was no contract, management, or agency requesting the validation;
b) none of the validators had previously been involved in designing the model, and had only very limited experience in using it; and
c) the original model designer was not able to oversee or influence the validation. (Dupuy passed away in July 1995 and the validation was conducted in 1996 and 1997.)

The validation was not truly independent, as the model tested was a commercial product of TDI, and the person conducting the test was an employee of the Institute. On the other hand, this was an independent effort in the sense that the effort was employee-initiated and not requested or reviewed by the management of the Institute.

Descriptions and outcomes of this validation effort were first reported in The International TNDM Newsletter. Chris Lawrence also addressed validation of the TNDM in Chapter 19 of War by Numbers (2017).

TDI Friday Read: Engaging The Phalanx

The December 2018 issue of Phalanx, a periodical journal published by The Military Operations Research Society (MORS), contains an article by Jonathan K. Alt, Christopher Morey, and Larry Larimer, entitled “Perspectives on Combat Modeling.” (the article is paywalled, but limited public access is available via JSTOR).

Their article was written partly as a critical rebuttal to a TDI blog post originally published in April 2017, which discussed an issue of which the combat modeling and simulation community has long been aware but slow to address, known as the “Base of Sand” problem.

Wargaming Multi-Domain Battle: The Base Of Sand Problem

In short, because so little is empirically known about the real-world structures of combat processes and the interactions of these processes, modelers have been forced to rely on the judgement of subject matter experts (SMEs) to fill in the blanks. No one really knows if the blend of empirical data and SME judgement accurately represents combat because the modeling community has been reluctant to test its models against data on real world experience, a process known as validation.

TDI President Chris Lawrence subsequently published a series of blog posts responding to the specific comments and criticisms leveled by Alt, Morey, and Larimer.

How are combat models and simulations tested to see if they portray real-world combat accurately? Are they actually tested?

Engaging the Phalanx

How can we know if combat simulations adhere to strict standards established by the DoD regarding validation? Perhaps the validation reports can be released for peer review.

Validation

Some claim that models of complex combat behavior cannot really be tested against real-world operational experience, but this has already been done. Several times.

Validating Attrition

If only the “physics-based aspects” of combat models are empirically tested, do those models reliably represent real-world combat with humans or only the interactions of weapons systems?

Physics-based Aspects of Combat

Is real-world historical operational combat experience useful only for demonstrating the capabilities of combat models, or is it something the models should be able to reliably replicate?

Historical Demonstrations?

If a Subject Matter Expert (SME) can be substituted for a proper combat model validation effort, then could not a SME simply be substituted for the model? Should not all models be considered expert judgement quantified?

SMEs

What should be done about the “Base of Sand” problem? Here are some suggestions.

Engaging the Phalanx (part 7 of 7)

Persuading the military operations research community of the importance of research on real-world combat experience in modeling has been an uphill battle with a long history.

Diddlysquat

And the debate continues…

Engaging the Phalanx (part 7 of 7)

Hopefully this is my last post on the subject (but I suspect not, as I expect a public response from the three TRADOC authors). This is in response to the article in the December 2018 issue of the Phalanx by Alt, Morey and Larimer (see Part 1, Part 2, Part 3, Part 4, Part 5, Part 6). The issue here is the “Base of Sand” problem, which is what the original blog post that “inspired” their article was about:

Wargaming Multi-Domain Battle: The Base Of Sand Problem

While the first paragraph of their article addressed this blog post and they reference Paul Davis’ 1992 Base of Sand paper in their footnotes (but not John Stockfish’s paper, which is an equally valid criticism), they then do not discuss the “Base of Sand” problem further. They do not actually state whether this is a problem or not a problem. I gather by this notable omission that in fact they do understand that it is a problem, but being employees of TRADOC they are limited as to what they can publicly say. I am not.

I do address the “Base of Sand” problem in my book War by Numbers, Chapter 18. It has also been addressed in a few other posts on this blog. We are critics because we do not see significant improvement in the industry. In some cases, we are seeing regression.

In the end, I think the best solution for the DOD modeling and simulation community is not to “circle the wagons” and defend what they are currently doing, but instead acknowledge the limitations and problems they have and undertake a corrective action program. This corrective action program would involve: 1) Properly addressing how to measure and quantify certain aspects of combat (for example: Breakpoints) and 2) Validating these aspects and the combat models these aspects are part of by using real-world combat data. This would be an iterative process, as you develop and then test the model, then further develop it, and then test it again. This moves us forward. It is a more valued approach than just “circling the wagons.” As these models and simulations are being used to analyze processes that may or may not make us fight better, and may or may not save American service members lives, then I think it is important enough to do right. That is what we need to be focused on, not squabbling over a blog post (or seven).

Has The Army Given Up On Counterinsurgency Research, Again?

Mind-the-Gap

[In light of the U.S. Army’s recent publication of a history of it’s involvement in Iraq from 2003 to 2011, it may be relevant to re-post this piece from from 29 June 2016.]

As Chris Lawrence mentioned yesterday, retired Brigadier General John Hanley’s review of America’s Modern Wars in the current edition of Military Review concluded by pointing out the importance of a solid empirical basis for staff planning support for reliable military decision-making. This notion seems so obvious as to be a truism, but in reality, the U.S. Army has demonstrated no serious interest in remedying the weaknesses or gaps in the base of knowledge underpinning its basic concepts and doctrine.

In 2012, Major James A. Zanella published a monograph for the School of Advanced Military Studies of the U.S. Army Command and General Staff College (graduates of which are known informally as “Jedi Knights”), which examined problems the Army has had with estimating force requirements, particularly in recent stability and counterinsurgency efforts.

Historically, the United States military has had difficulty articulating and justifying force requirements to civilian decision makers. Since at least 1975, governmental officials and civilian analysts have consistently criticized the military for inadequate planning and execution. Most recently, the wars in Afghanistan and Iraq reinvigorated the debate over the proper identification of force requirements…Because Army planners have failed numerous times to provide force estimates acceptable to the President, the question arises, why are the planning methods inadequate and why have they not been improved?[1]

Zanella surveyed the various available Army planning tools and methodologies for determining force requirements, but found them all either inappropriate or only marginally applicable, or unsupported by any real-world data. He concluded

Considering the limitations of Army force planning methods, it is fair to conclude that Army force estimates have failed to persuade civilian decision-makers because the advice is not supported by a consistent valid method for estimating the force requirements… What is clear is that the current methods have utility when dealing with military situations that mirror the conditions represented by each model. In the contemporary military operating environment, the doctrinal models no longer fit.[2]

Zanella did identify the existence of recent, relevant empirical studies on manpower and counterinsurgency. He noted that “the existing doctrine on force requirements does not benefit from recent research” but suggested optimistically that it could provide “the Army with new tools to reinvigorate the discussion of troops-to-task calculations.”[3] Even before Zanella published his monograph, however, the Defense Department began removing any detailed reference or discussion about force requirements in counterinsurgency from Army and Joint doctrinal publications.

As Zanella discussed, there is a body of recent empirical research on manpower and counterinsurgency that contains a variety of valid and useful insights, but as I recently discussed, it does not yet offer definitive conclusions. Much more research and analysis is needed before the conclusions can be counted on as a valid and justifiably reliable basis for life and death decision-making. Yet, the last of these government sponsored studies was completed in 2010. Neither the Army nor any other organization in the U.S. government has funded any follow-on work on this subject and none appears forthcoming. This boom-or-bust pattern is nothing new, but the failure to do anything about it is becoming less and less understandable.

NOTES

[1] Major James A. Zanella, “Combat Power Analysis is Combat Power Density” (Ft. Leavenworth, KS: School of Advanced Military Studies, U.S. Army Command and General Staff College, 2012), pp. 1-2.

[2] Ibid, 50.

[3] Ibid, 47.

Historians and the Early Era of U.S. Army Operations Research

While perusing Charles Shrader’s fascinating history of the U.S. Army’s experience with operations research (OR), I came across several references to the part played by historians and historical analysis in early era of that effort.

The ground forces were the last branch of the Army to incorporate OR into their efforts during World War II, lagging behind the Army Air Forces, the technical services, and the Navy. Where the Army was a step ahead, however, was in creating a robust wartime historical field history documentation program. (After the war, this enabled the publication of the U.S. Army in World War II series, known as the “Green Books,” which set a new standard for government sponsored military histories.)

As Shrader related, the first OR personnel the Army deployed forward in 1944-45 often crossed paths with War Department General Staff Historical Branch field historian detachments. They both engaged in similar activities: collecting data on real-world combat operations, which was then analyzed and used for studies and reports written for the use of the commands to which they were assigned. The only significant difference was in their respective methodologies, with the historians using historical methods and the OR analysts using mathematical and scientific tools.

History and OR after World War II

The usefulness of historical approaches to collecting operational data did not go unnoticed by the OR practitioners, according to Shrader. When the Army established the Operations Research Office (ORO) in 1948, it hired a contingent of historians specifically for the purpose of facilitating research and analysis using WWII Army records, “the most likely source for data on operational matters.”

When the Korean War broke out in 1950, ORO sent eight multi-disciplinary teams, including the historians, to collect operational data and provide analytical support for U.S. By 1953, half of ORO’s personnel had spent time in combat zones. Throughout the 1950s, about 40-43% of ORO’s staff was comprised of specialists in the social sciences, history, business, literature, and law. Shrader quoted one leading ORO analyst as noting that, “there is reason to believe that the lawyer, social scientist or historian is better equipped professionally to evaluate evidence which is derived from the mind and experience of the human species.”

Among the notable historians who worked at or with ORO was Dr. Hugh M. Cole, an Army officer who had served as a staff historian for General George Patton during World War II. Cole rose to become a senior manager at ORO and later served as vice-president and president of ORO’s successor, the Research Analysis Corporation (RAC). Cole brought in WWII colleague Forrest C. Pogue (best known as the biographer of General George C. Marshall) and Charles B. MacDonald. ORO also employed another WWII field historian, the controversial S. L. A. Marshall, as a consultant during the Korean War. Dorothy Kneeland Clark did pioneering historical analysis on combat phenomena while at ORO.

The Demise of ORO…and Historical Combat Analysis?

By the late 1950s, considerable institutional friction had developed between ORO, the Johns Hopkins University (JHU)—ORO’s institutional owner—and the Army. According to Shrader,

Continued distrust of operations analysts by Army personnel, questions about the timeliness and focus of ORO studies, the ever-expanding scope of ORO interests, and, above all, [ORO director] Ellis Johnson’s irascible personality caused tensions that led in August 1961 to the cancellation of the Army’s contract with JHU and the replacement of ORO with a new, independent research organization, the Research Analysis Corporation [RAC].

RAC inherited ORO’s research agenda and most of its personnel, but changing events and circumstances led Army OR to shift its priorities away from field collection and empirical research on operational combat data in favor of the use of modeling and wargaming in its analyses. As Chris Lawrence described in his history of federally-funded Defense Department “think tanks,” the rise and fall of scientific management in DOD, the Vietnam War, social and congressional criticism, and an unhappiness by the military services with the analysis led to retrenchment in military OR by the end of the 60s. The Army sold RAC and created its own in-house Concepts Analysis Agency (CAA; now known as the Center for Army Analysis).

By the early 1970s, analysts, such as RAND’s Martin Shubik and Gary Brewer, and John Stockfisch, began to note that the relationships and processes being modeled in the Army’s combat simulations were not based on real-world data and that empirical research on combat phenomena by the Army OR community had languished. In 1991, Paul Davis and Donald Blumenthal gave this problem a name: the “Base of Sand.”

Validating Attrition

Continuing to comment on the article in the December 2018 issue of the Phalanx by Alt, Morey and Larimer (this is part 3 of 7; see Part 1, Part 2)

On the first page (page 28) in the third column they make the statement that:

Models of complex systems, especially those that incorporate human behavior, such as that demonstrated in combat, do not often lend themselves to empirical validation of output measures, such as attrition.

Really? Why can’t you? If fact, isn’t that exactly the model you should be validating?

More to the point, people have validated attrition models. Let me list a few cases (this list is not exhaustive):

1. Done by Center for Army Analysis (CAA) for the CEM (Concepts Evaluation Model) using Ardennes Campaign Simulation Study (ARCAS) data. Take a look at this study done for Stochastic CEM (STOCEM): https://apps.dtic.mil/dtic/tr/fulltext/u2/a489349.pdf

2. Done in 2005 by The Dupuy Institute for six different casualty estimation methodologies as part of Casualty Estimation Methodologies Studies. This was work done for the Army Medical Department and funded by DUSA (OR). It is listed here as report CE-1: http://www.dupuyinstitute.org/tdipub3.htm

3. Done in 2006 by The Dupuy Institute for the TNDM (Tactical Numerical Deterministic Model) using Corps and Division-level data. This effort was funded by Boeing, not the U.S. government. This is discussed in depth in Chapter 19 of my book War by Numbers (pages 299-324) where we show 20 charts from such an effort. Let me show you one from page 315:

 

So, this is something that multiple people have done on multiple occasions. It is not so difficult that The Dupuy Institute was not able to do it. TRADOC is an organization with around 38,000 military and civilian employees, plus who knows how many contractors. I think this is something they could also do if they had the desire.

 

Validation

Continuing to comment on the article in the December 2018 issue of the Phalanx by Jonathan Alt, Christopher Morey and Larry Larimer (this is part 2 of 7; see part 1 here).

On the first page (page 28) top of the third column they make the rather declarative statement that:

The combat simulations used by military operations research and analysis agencies adhere to strict standards established by the DoD regarding verification, validation and accreditation (Department of Defense, 2009).

Now, I have not reviewed what has been done on verification, validation and accreditation since 2009, but I did do a few fairly exhaustive reviews before then. One such review is written up in depth in The International TNDM Newsletter. It is Volume 1, No. 4 (February 1997). You can find it here:

http://www.dupuyinstitute.org/tdipub4.htm

The newsletter includes a letter dated 21 January 1997 from the Scientific Advisor to the CG (Commanding General)  at TRADOC (Training and Doctrine Command). This is the same organization that the three gentlemen who wrote the article in the Phalanx work for. The Scientific Advisor sent a letter out to multiple commands to try to flag the issue of validation (letter is on page 6 of the newsletter). My understanding is that he received few responses (I saw only one, it was from Leavenworth). After that, I gather there was no further action taken. This was a while back, so maybe everything has changed, as I gather they are claiming with that declarative statement. I doubt it.

This issue to me is validation. Verification is often done. Actual validations are a lot rarer. In 1997, this was my list of combat models in the industry that had been validated (the list is on page 7 of the newsletter):

1. Atlas (using 1940 Campaign in the West)

2. Vector (using undocumented turning runs)

3. QJM (by HERO using WWII and Middle-East data)

4. CEM (by CAA using Ardennes Data Base)

5. SIMNET/JANUS (by IDA using 73 Easting data)

 

Now, in 2005 we did a report on Casualty Estimation Methodologies (it is report CE-1 list here: http://www.dupuyinstitute.org/tdipub3.htm). We reviewed the listing of validation efforts, and from 1997 to 2005…nothing new had been done (except for a battalion-level validation we had done for the TNDM). So am I now to believe that since 2009, they have actively and aggressively pursued validation? Especially as most of this time was in a period of severely declining budgets, I doubt it. One of the arguments against validation made in meetings I attended in 1987 was that they did not have the time or budget to spend on validating. The budget during the Cold War was luxurious by today’s standards.

If there have been meaningful validations done, I would love to see the validation reports. The proof is in the pudding…..send me the validation reports that will resolve all doubts.