Streetwise Professor

September 8, 2010

Yeah, So Whatabout Cromwell?

Filed under: History,Politics,Russia — The Professor @ 3:12 pm

Obviously an adherent of the view that the best defense is a good offense, Vladimir Putin responded to a question from a British reporter (Richard Beeston) about the continued presence of Lenin’s corpse in Red Square by bringing up–Oliver Cromwell. Huh?

All was well until I asked him about Vladimir Lenin, whose body still lies in state in Red Square, Moscow. Was it not time to bury him?

The Russian leader’s piercing blue eyes narrowed and he lost his composure: “What about Cromwell, was he any better than Lenin? There are memorials to Cromwell all over Britain. The Russian people will in time decide what happens to Lenin.”

Here’s a more complete account:

Richard Beeston: Thank you Prime Minister. Richard Beeston, The Times, London. Last week we talked about the past, about Russian history and especially about the turbulent 20th century which was fatal for many Russians. I am amazed that with seven years to the centenary of the Bolshevik Revolution, Vladimir Lenin is still lying on display in the Mausoleum in Red Square, with guards standing around him. Don’t you think it’s a good idea to finally bury him before this event, to help Russia turn a new page? Thank you.

Vladimir Putin: Are you from Great Britain?

Richard Beeston: Yes.

Vladimir Putin: Then I have a question for you. Was Cromwell better or worse than Stalin?

Richard Beeston: Probably just as bad. But he is not displayed on Trafalgar Square, but somewhere in Westminster, at the back.

Vladimir Putin: But there are monuments to him all over Britain, everything in its season. When the time comes, the Russian people will decide what to do. History is something that avoids hassle. Next question please.

Although the Irish might disagree, Beeston was wrong to equate Cromwell with Stalin: the latter’s crimes dwarf the former’s.

More to the point, Putin completely ignores the actual question–the continued presence of Lenin on Red Square.  That’s a different thing altogether from the statues of Cromwell scattered across Britain.

Indeed, the historical comparison is quite illuminating.  After the Restoration of Charles II (whose father Cromwell had beheaded), Cromwell’s corpse was exhumed from Westminster, hanged, and beheaded–the posthumous version of the punishment meted out to traitors.  His head was displayed on a pike for decades afterwards, and then was sold from hand to hand, and displayed as a curiosity for centuries thereafter.

Continued display in a reverential way in the capital vs. symbolic repudiation, desecration, and ridicule.  See the difference?  I knew you could.

And let’s say, counterfactually, that the UK continued to let Cromwell rest in an honored place.  Would that be appropriate?  Would this justify the continued honoring of another mass murderer (who also put Cromwell to shame, BTW)?

By going on the offensive (in multiple meanings of the word), Putin avoids answering, in an honest way, the justification for honoring Lenin long after the USSR fell.  As for the “Russian people will decide”: As if.  When have the Russian people been allowed to decide anything?

Seldom does one see such 190 proof, distilled essence of whataboutism.  (Well, sometimes in the comments at SWP.)  It is a low rhetorical dodge intended to allow the whatabouter to escape confronting the issue head on.  It would be better if Putin gave a full-throated defense of Lenin, or of the propriety of keeping his embalmed corpse on display.  You might disagree with his reasoning, but had he answered in that way, he would at least be respecting the question and the questioner, and acknowledging the weightiness of the issue.

But perhaps the whataboutism is telling: perhaps it betrays that even Putin knows that he cannot make a reasoned and reasonable defense, so he feels compelled to stoop to such discreditable rhetorical dodges.

More substantively, reporting from Putin’s Valdai performance almost universally agrees that it indicates he is planning to resume the presidency.

What, this is news?



  1. Ostap and Sublime (#44 and 47),

    Sorry to disappoint you, but Putin did say “Stalin” instead of “Lenin” — but clearly this time it was just a verbal slip rather than yet another demonstration of his hopeless inability to handle unscripted questions.

    Comment by peter — September 15, 2010 @ 5:38 am

  2. Gee Ostap, you get worse with every post.

    Now you are making up theories to try and dig yourself out of an ever deepening hole caused mainly by your lies.

    Stop digging Ostap, better for you if you get your facts right next time, though given your track record that may be impossible.

    By the way, in your Russian transcript Putin says STALIN, and Beeston replied to that. Beeston did not confuse Stalin with Lenin. Putin tried to change the subject because, like the authoritarian wanker that he is, he is extremely uncomfortable with a free press.

    Now, how about an apology for your lies regarding the number of statues of Cromwell.

    By the way, Lenin was every bit as evil as Stalin, probably more so, as he envisaged spreading his evil over the whole planet.

    Comment by Andrew — September 15, 2010 @ 6:48 am

  3. To put this conversation in perspective, Putin’s Second Chechen war has claimed anywhere from 25,000 to 50,000 Chechen civilian lives. Since modern Chechnya has approximately the same population as Ireland in the 17th century, this makes Putin, whose war caused the deaths of 2.5 to 5 percent of the Chechen population, himself 1/4 to 1/2 as bad as Cromwell. Such chutzpah, bringing up Cromwell.

    But let’s not single out Putin. The previous British prime minister and Bush together caused the deaths of 600,000 Iraqi civilians (according to the Lancet) out of a population of 31 million. Percentage-wise this is slightly less bad (2% of the Iraqi population) than Russian totals in Chechnya but in terms of raw numbers of civilian deaths Bush and Blair far “outshine” Putin. When it comes to being a global turn-of-the-century civilian-killer Putin may be in the big leagues but he’s certainly not the champion. Funny is not quite the right word for it, but it almost seems like another example where Russia cannot quite catch up to the USA.
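    The percentages above follow from simple arithmetic; a quick sketch of the check (assuming, as the comment implies, a Chechen population of roughly 1 million, and using the comment’s own figure of 31 million for Iraq):

```python
# Death tolls as a share of population, using the figures cited in the
# comment. The Chechen population of ~1 million is an assumption implied
# by the comment's 2.5%-5% range; it is not stated there explicitly.

def pct(deaths: int, population: int) -> float:
    """Deaths as a percentage of population."""
    return 100 * deaths / population

chechnya_low = pct(25_000, 1_000_000)    # 2.5%
chechnya_high = pct(50_000, 1_000_000)   # 5.0%
iraq = pct(600_000, 31_000_000)          # ~1.9%

print(f"Chechnya: {chechnya_low:.1f}%-{chechnya_high:.1f}%")
print(f"Iraq: {iraq:.1f}%")
```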

    Comment by AP — September 15, 2010 @ 8:30 am

  4. Well, 600,000 is a bit of an overestimate to say the least.

    The Lancet’s methodology has been widely ridiculed.

    Iraq Body Count put the number at between 97,994 and 106,954.

    In addition, in Iraq & Afghanistan 70% to 80% of the deaths are caused by Muslim extremists.

    You know, the Taliban, Al Qaida in Iraq, those guys, the ones who put car bombs in markets etc.

    In Chechnya, the Russians were responsible for around 80% of the deaths, due to the mass carpet bombings of places like Grozny.

    Non Russian estimates of the civilian dead in Chechnya run as high as 250,000.

    Interesting to compare the civilian deaths caused by Russia in Afghanistan 1979-1989 to the (much lower) death toll since the UN-mandated invasion by (predominantly) western forces.

    The Soviet invasion led to 1,000,000 Afghan civilians being killed in 10 years.
    The (UN mandated) US led invasion led to around 14,000 deaths.

    Comment by Andrew — September 16, 2010 @ 5:22 am

  5. Oh, and about that Lancet article, and the severely flawed data on which it was based:

    “Exaggerated claims, substandard research, and a disservice to truth

    ORB’s “million Iraqi deaths” survey seriously flawed, new study shows

    There have been several survey-based attempts to roughly estimate the number of Iraqis killed as a result of the 2003 invasion and subsequent conflict. It is unfortunate that the most careful and well-resourced survey work in this area (from the UNDP and WHO)1 has been scarcely visible, while the most flawed and inadequate work has dominated public discourse. This has been largely due to the shocking (but ultimately numbing) effect of their hugely exaggerated death toll figures.”

    1 ILCS P.54 (UNDP), IFHS (WHO)

    Comment by Andrew — September 16, 2010 @ 5:42 am

    Why not compare the Soviet invasion of Afghanistan to Vietnam? The carpet bombing of Grozny was under Yeltsin, and we are discussing Putin here. Yeltsin’s war was even bloodier than Putin’s. With respect to Iraq, I didn’t list the ORB total of 1 million deaths but the Lancet’s total of 600,000 deaths – it would seem your article attacking the ORB’s findings is irrelevant here. The killings by Islamic extremists are Bush’s fault – Saddam had kept those people under control and they would not have killed anyone if not for Bush/Blair’s invasion of Iraq. I suppose if Kadyrov engages in a bloodbath in Chechnya you won’t hold Putin responsible? I would.

    Comment by AP — September 16, 2010 @ 8:46 am

  7. Sorry AP, you are wrong on that score.

    The greatest destruction of Grozny occurred under Putin. More Chechens died in Putin’s invasion than died in the first war.

    In both wars the Russian armed forces used carpet bombing, and in fact used far higher concentrations of firepower in the second, as well as other refinements such as targeting markets with theatre ballistic missiles, and extending the carpet bombing to towns and villages throughout the republic of Chechnya.

    And if you had even bothered to read the linked article you would have read:

    Iraq Body Count (IBC) applied an early and so far unanswered set of reality checks to the Johns Hopkins survey published in the Lancet in October 2006, a paper which has recently been comprehensively discredited in a new study by Prof. Michael Spagat of Royal Holloway University. Even among the generally inexact survey results for deaths in Iraq the “Lancet estimate” was an extreme outlier, asserting 450,000 more deaths from violence than the much larger WHO-funded study that estimated 151,000 such deaths by July 2006. The only evidence that appeared to support the Lancet finding was published by a polling company, Opinion Research Business (ORB), which estimated 1 million violent Iraqi deaths by August 2007.

    The ORB drivel was the main basis for the highly unreliable and factually inaccurate Lancet study.

    As to Vietnam, interestingly the vast overwhelming number of deaths were caused by the Communist North Vietnamese invading South Vietnam.

    Funny how people like you forget that part.

    No communist invasion of South Vietnam, no war.

    Saddam killed just as many, if not more, of his own people than anyone else. He just did it by gassing them, or shooting them tidily in police stations and burying them in unmarked graves.

    Then there is his genocide against the Marsh Arabs and the Kurds, his campaigns against the Shia Muslims etc. Sorry, your argument falls flat.

    Comment by Andrew — September 16, 2010 @ 12:21 pm

  8. Oh, and a picture from December 16, 1999
    before the carpet bombing

    And one from March 16, 2000
    after the carpet bombing

    Comment by Andrew — September 16, 2010 @ 12:24 pm

    I read the article you linked to, and although it claimed the Lancet study gave figures that were too high, it did not supply any evidence refuting the Lancet study nor offer any specific critiques of the Lancet study’s methodology, other than saying that the discredited ORB study supported it. The article did describe how the ORB study was flawed. It did not state that the Lancet study was in any way based on the bad ORB study, as you claimed above. Therefore your link doesn’t refute the Lancet study; it merely says that another study with similar results was flawed and describes why that other study was flawed.

    As for the Chechen wars, wikipedia (for what it’s worth) gives far higher casualties for the first war (50,000-100,000 dead civilians) than for the second (25,000-50,000), although many of the first war’s casualties were ethnic Russian residents of Grozny (perhaps 35,000 of them). I suppose it’s possible that a few thousand more ethnic Chechens may have died in the second war, taking into account ethnic Russian casualties in the first.

    I’m not sure what Saddam’s own killing rate has to do with this conversation – to bump Putin down to third place after Bush and Saddam? Saddam wasn’t engaging in mass killings for years before Iraq was invaded. AFAIK Saddam killed about 300,000 people in his long career, almost none of whom were murdered after the early 90’s.

    Comment by AP — September 16, 2010 @ 2:05 pm

  10. Wikipedia is not that reliable a source of information on contentious issues AP.

    Furthermore, unless you are illiterate, the quote “Even among the generally inexact survey results for deaths in Iraq the ‘Lancet estimate’ was an extreme outlier, asserting 450,000 more deaths from violence than the much larger WHO-funded study that estimated 151,000 such deaths by July 2006. The only evidence that appeared to support the Lancet finding was published by a polling company, Opinion Research Business (ORB), which estimated 1 million violent Iraqi deaths by August 2007” is usually taken to show that the only data supporting the Lancet article was that provided by the discredited ORB survey, and that the Lancet study was fundamentally flawed.


    Number Crunching
    Taking another look at the Lancet’s Iraq study.
    By Fred Kaplan
    Posted Friday, Oct. 20, 2006, at 6:34 PM ET
    Let us take another look at the Lancet study. This is the report, issued by a team from Johns Hopkins University and published in the current issue of British medical journal the Lancet, estimating that 655,000 Iraqis have died as a consequence of the U.S.-led invasion. It’s a shocking number. Is it true?
    Initially, I decided to stay out of this controversy. I’d written the first critique of an earlier Lancet/Hopkins study, which estimated that 100,000 Iraqis had died in just the first year of the war. The study’s sample was too small, the data-gathering too slipshod, the range of uncertainty so wide as to render the estimate useless.
    The new study looked better: a larger sample, more fastidious attention to data-gathering procedures, a narrower range of uncertainty. The number—655,000 deaths—seemed improbably high (that’s an average of 20,000 deaths a month since the war began), but so have a lot of savage statistics that turned out to be true; and there are many areas of Iraq these days where reporters and human rights groups dare not roam.
    However, the more I read the study and the more I talked with statisticians, the flimsier this number appeared. The study might be as good an effort as anyone can manage in wartime. Certainly, the Iraqis who went door to door conducting the surveys are amazingly brave souls. But the study has two major flaws—the upshot of which is that it’s impossible to infer anything meaningful from it, except that a lot of Iraqis have died and the number is getting higher.
    This point should be emphasized. Let’s say that the study is way off, off by a factor of 10 or five—in other words, that the right number isn’t 655,000 but something between 65,500 and 131,000. That is still a ghastly number—a number that, apart from all other considerations, renders this war a monumental mistake. Here’s the key question: Had it been known ahead of time that invading Iraq would result in the deaths of 100,000 Iraqis (or 50,000, or pick your own threshold number), would the president have made—would Congress have voted to authorize, would any editorial writer or public figure have endorsed—a decision to go to war?
    Here lies the danger of studies that overstate a war’s death toll. The war’s supporters and apologists latch on to the inevitable debunkings and proclaim that really “only 100,000” or “only 200,000” people have died. It’s obscene—it sullies and coarsens the political culture—to place the word “only” in front of such numbers.
    So, let’s look at this study’s numbers and why they’re almost certainly overstated.
    The researchers reached this conclusion through a common technique known as “cluster sampling.” They randomly selected 47 neighborhoods in 18 of Iraq’s regions. Within those neighborhoods, they visited a total of 1,849 households, comprising 12,801 residents, and asked how many of their members had died before the invasion and since the invasion. The researchers then extrapolated from this sample to the entire Iraqi population of 27 million people—from which they concluded that since the war there have been about 655,000 “excess deaths,” of which 601,000 were caused by violence.
    This methodology is entirely proper if the sample was truly representative of the entire population—i.e., as long as those households were really randomly selected. If they were not randomly selected—if some bias crept into the sampling, even unintentionally—then it is improper, and wildly misleading, to extrapolate the findings to the population as a whole.
    There are two reasons to suspect that the sample was not random, and one of those reasons suggests that the sample was biased in a way that exaggerates the death toll.
    First, the Lancet study, like all such studies, estimates not how many people have died, but rather the difference between how many people died in a comparable period before the invasion and how many people have died since the invasion. As the study puts it, 655,000 is roughly the number of deaths “above the number that would be expected in a non-conflict situation.”
    In any such study, it’s crucial that the base-line number—deaths before the invasion—is correct. The Lancet study’s base-line number is dubious.
    Based on the household surveys, the report estimates that, just before the war, Iraq’s mortality rate was 5.5 per 1,000. (That is, for every 1,000 people, 5.5 die each year.) The results also show that, in the three and a half years since the war began, this rate has shot up to 13.3 per 1,000. So, the “excess deaths” amount to 7.8 (13.3 minus 5.5) per 1,000. They extrapolate from this figure to reach their estimate of 655,000 deaths.
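    The extrapolation described above can be reproduced in rough form (a simplified sketch: the study actually worked from person-years in its sample, so simple rate-times-population arithmetic only lands in the neighborhood of the published 655,000 figure rather than hitting it exactly):

```python
# Back-of-envelope version of the Lancet extrapolation described above.
# The study period of ~3.3 years (March 2003 to mid-2006) is an
# approximation assumed here, not a figure from the text.
pre_war_rate = 5.5 / 1000    # deaths per person per year, pre-invasion
post_war_rate = 13.3 / 1000  # deaths per person per year, post-invasion
population = 27_000_000      # Iraq's population, per the study
years = 3.3                  # roughly March 2003 to mid-2006

excess_rate = post_war_rate - pre_war_rate        # 7.8 per 1,000 per year
excess_deaths = excess_rate * population * years  # ~695,000; the study's
                                                  # person-year accounting
                                                  # yields ~655,000
print(round(excess_deaths))
```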
    However, according to data from the United Nations, based on surveys taken at the time, Iraq’s preinvasion mortality rate was 10 per 1,000. The difference between 13.3 and 10.0 is only 3.3, less than half of 7.8.
    Does that mean that the post-invasion death toll is less than half of 655,000? Not necessarily. You can’t just take the data from one survey and plug them into another survey. Maybe the Hopkins survey understated post-invasion deaths as much as it understated preinvasion deaths—in which case, the net effect is nil. Maybe not. Either way, it should have been clear to the data-crunchers that something was wrong with the numbers for preinvasion and post-invasion deaths, since they were derived from the same survey.
    “When you get these large discrepancies between your own results and results that are already well-established, you recrunch your numbers or you send your survey team back into the field to widen your sample,” Beth Osborne Daponte, a demographer at Yale University who has worked on many studies of this sort, told me in a phone interview. “Obviously, they couldn’t do that here. It’s too dangerous. But that doesn’t change the point. You need to triangulate your data”—to make sure they match other data, or, if they don’t, to figure out why. “They didn’t do that.”
    (If the Hopkins researchers want to claim that their estimate is more reliable than the United Nations’, they will have to prove the point. It is also noteworthy that, if Iraq’s preinvasion mortality rate really was 5.5 per 1,000, it was lower than that of almost every country in the Middle East, and many countries in Western Europe.)
    This flaw—or discrepancy—doesn’t tell you whether 655,000 is too high, too low, or (serendipitously) just right. It just tells you that something about the number is almost certainly off.
    However, the second flaw suggests that the number is almost certainly too high.
    A joint research team led by physicists Sean Gourley and Neil Johnson of Oxford University and economist Michael Spagat at Royal Holloway University in London noticed the second flaw. In a statement released Thursday (and reported in today’s issue of the journal Science), they charged that the Lancet study is “fundamentally flawed”—and in a way that systematically overstates the death toll.
    The Lancet study, in its section on methodology, notes that the teams picked the houses they would survey from a “random selection of main streets,” defined as “major commercial streets and avenues.” (Italics added.) They also chose from a “list of residential streets crossing” those main streets.
    The Oxford-Holloway team calls this method “main street bias.” They add:
    Main street bias inflates casualty rates since conflict events such as car bombs, drive-by shootings, artillery strikes on insurgent positions, and marketplace explosions gravitate toward the same neighborhood types that the [Lancet] researchers surveyed. …
    In short, the closer you are to a main road, the more likely you are to die in violent activity. So if researchers only count people living close to a main road, then it comes as no surprise they will over-count the dead.
    Whether or not the Hopkins researchers were aware of this flaw, or its importance, is unclear. An exchange of e-mails with Gilbert Burnham, the study’s chief researcher, raises some disturbing questions about this matter. (Click here for the details.)
    It’s understandable why the surveyors limited their work to the main roads; they were in strange and dangerous places. But that doesn’t negate the Oxford-Holloway team’s point. By this measure alone, the Lancet study is not a random survey. In statistically proper random surveys, each household has the same probability of being chosen. Yet in the Lancet survey, if a household wasn’t on or near a main road, it had zero chance of being chosen. And “cluster samples” cannot be seen as representative of the entire population unless they are chosen randomly.
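    The “main street bias” mechanism is easy to demonstrate with a toy simulation (all numbers here are hypothetical, chosen purely to show the effect: if mortality is higher near main streets and only near-street households can be sampled, the extrapolated rate overshoots the true one):

```python
import random

random.seed(0)

# Toy population: 100,000 households, 30% of them near a main street.
# Assume (hypothetically) that annual mortality near main streets is
# three times the rate in back-alley/rural households.
NEAR_RATE, FAR_RATE = 0.015, 0.005

households = [("near", NEAR_RATE)] * 30_000 + [("far", FAR_RATE)] * 70_000
true_rate = sum(rate for _, rate in households) / len(households)  # 0.008

# A "main street biased" survey can only reach near-street households.
near_only = [h for h in households if h[0] == "near"]
sample = random.sample(near_only, 2_000)
deaths = sum(random.random() < rate for _, rate in sample)
biased_estimate = deaths / len(sample)  # clusters around 0.015, not 0.008

print(f"true rate:       {true_rate:.4f}")
print(f"biased estimate: {biased_estimate:.4f}")
```

Extrapolating the biased estimate to the whole population would roughly double the true death count, which is exactly the Oxford-Holloway objection.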
    The Iraq war is a catastrophe in political, military, and—not least—human terms. How much so may be unfathomable as long as the streets of Iraq are still dangerous. In any event, it’s a question that the Lancet study doesn’t really answer.

    There, is that enough?

    Try actually reading, AP; it’s good for the brain.

    Comment by Andrew — September 16, 2010 @ 11:34 pm

  11. Andrew, can’t you disagree with someone in a civil way (“illiterate”…”try reading, it’s good for the brain.”)? As I said, the first paragraph you provided showed no critique of the methodology of the Lancet study. It stated that the ORB study was the only one that “appeared to support” the Lancet study, not that the Lancet study was in any way based on the discredited ORB study. Incidentally, the ORB study’s death toll of 1 million is about as far from the Lancet’s 600,000 as the Lancet is from the WHO’s 150,000. Does that mean the Lancet study “appears to support” the WHO-funded study?

    As for the Slate article, those concerns have been addressed. From here:

    About the alleged “main street bias”:

    Prof Burnham said the researchers penetrated much further into residential areas than was clear from the Lancet paper. The notion ‘that we avoided back alleys was totally untrue’. He added that 28% of households were in rural areas – which matches the population spread.

    About the allegedly low pre-invasion death rate of 5.5:

    Lancet 2 found a pre-invasion death rate of 5.5 per 1000 people per year. The UN has an estimate of 10? Isn’t that evidence of inaccuracy in the study?

    LR: The last census in Iraq was a decade ago and I suspect the UN number is somewhat outdated. The death rate in Jordan and Syria is about 5. Thus, I suspect that our number is valid. …

    The pre-invasion death rate you found for Iraq was lower than for many rich countries. Is it credible that a poor country like Iraq would have a lower death rate than a rich country like Australia?

    LR: Yes. Jordan and Syria have death rates far below that of the UK because the population in the Middle-east is so young. Over half of the population in Iraq is under 18. Elderly populations in the West are a larger part of the population profile and they die at a much higher rate.


    An Oct. 11, 2006 Washington Post article[4] reports:

    Ronald Waldman, an epidemiologist at Columbia University who worked at the Centers for Disease Control and Prevention for many years, called the survey method “tried and true,” and added that “this is the best estimate of mortality we have.”

    An October 16, 2006 MediaLens article quotes many health experts, epidemiologists, biostatistics experts, polling experts, etc. who approve of the Lancet study and methodology.[82] For example:

    John Zogby, whose New York-based polling agency, Zogby International, has done several surveys in Iraq since the war began, said: “The sampling is solid. The methodology is as good as it gets. It is what people in the statistics business do.” …
    Professor Sheila Bird of the Biostatistics Unit at the Medical Research Council said: “They have enhanced the precision this time around and it is the only scientifically based estimate that we have got where proper sampling has been done and where we get a proper measure of certainty about these results.”


    This would seem to cover the objections raised in the Slate article you posted. The researchers did go to the back alleys, and the pre-war Iraqi death rate of 5.5/1000 was similar to Syria’s and Jordan’s death rate of 5/1000 due to the young population in these countries. Although a small minority disagrees with the Lancet study, it enjoys broad support among the experts.

    Comment by AP — September 17, 2010 @ 1:28 am

  12. No AP, it does not enjoy “broad support” amongst experts.

    The Iraq Body Count project (IBC), who compiles a database of reported civilian deaths, has criticised the Lancet’s estimate of 601,000 violent deaths[26] out of the Lancet estimate of 654,965 total excess deaths related to the war. The IBC argues that the Lancet estimate is suspect “because of a very different conclusion reached by another random household survey, the Iraq Living Conditions Survey 2004 (ILCS), using a comparable method but a considerably better-distributed and much larger sample.” IBC also enumerates several “shocking implications” which would be true if the Lancet report were accurate, e.g. “Half a million death certificates were received by families which were never officially recorded as having been issued” and claims that these “extreme and improbable implications” and “utter failure of local or external agencies to notice and respond to a decimation of the adult male population in key urban areas” are some of several reasons why they doubt the study’s estimates. IBC states that these consequences would constitute “extreme notions”.[27]
    Jon Pedersen of the Fafo Institute[28] and research director for the ILCS survey, which estimated approximately 24,000 (95% CI 18,000-29,000) war-related deaths in Iraq up to April 2004, expressed reservations about the low pre-war mortality rate used in the Lancet study and about the ability of its authors to oversee the interviews properly as they were conducted throughout Iraq. Pedersen has been quoted saying he thinks the Lancet numbers are “high, and probably way too high. I would accept something in the vicinity of 100,000 but 600,000 is too much.”[29]
    Debarati Guha-Sapir, director of the Centre for Research on the Epidemiology of Disasters in Brussels, was quoted in an interview for saying that Burnham’s team have published “inflated” numbers that “discredit” the process of estimating death counts. “Why are they doing this?” she asks. “It’s because of the elections.”.[30] However, another interviewer a week later paints a more measured picture of her criticisms: “She has some methodological concerns about the paper, including the use of local people — who might have opposed the occupation — as interviewers. She also points out that the result does not fit with any she has recorded in 15 years of studying conflict zones. Even in Darfur, where armed groups have wiped out whole villages, she says that researchers have not recorded the 500 predominately violent deaths per day that the Johns Hopkins team estimates are occurring in Iraq. But overall Guha-Sapir says the paper contains the best data yet on the mortality rate in Iraq.”[31] A subsequent article co-authored by Guha-Sapir and Olivier Degomme for CRED reviews the Lancet data in detail. It concludes that the Lancet overestimated deaths and that the war-related death toll was most likely to be around 125,000 for the period covered by the Lancet study, reaching its conclusions by correcting errors in the 2006 Lancet estimate and triangulating with data from IBC and ILCS.[32]
    Beth Osborne Daponte, a demographer known for producing death estimates for the first Gulf War, evaluates the Lancet survey and other sources in a paper for the International Review of the Red Cross.[33] Among other criticisms, Daponte questions the reliability of pre-war estimates used in the Lancet study to derive its “excess deaths” estimate, and the ethical approval for the survey. She concludes that the most reliable information available to date is provided by the Iraq Family Health Survey, the Iraq Living Conditions Survey and Iraq Body Count.
    Mark van der Laan, professor of biostatistics and statistics at UC Berkeley, disputes the estimates of both Lancet studies on several grounds in a paper co-authored with writer Leon de Winter.[34] The authors argue that the confidence intervals in the Lancet study are too narrow, saying, “our statistical analysis could at most conclude that the total number of violent deaths is more than 100.000 with a 0.95 confidence – but this takes not into account various other potential biases in the original data.” Among the main conclusions of their evaluation are that “the estimates based upon these data are extremely unreliable and cannot stand a decent scientific evaluation. It may be that the number of violent deaths is much higher than previously reported, but this specific report, just like the October 2004 report, cannot support the estimates that have been flying around the world on October 29, 2006. It is not science. It is propaganda.”
    Fred Kaplan of Slate criticized the first Lancet study and has again raised concerns about the second.[35][36] Kaplan argues that the second study has made some improvements over the first, such as “a larger sample, more fastidious attention to data-gathering procedures, a narrower range of uncertainty”, and writes that “this methodology is entirely proper if the sample was truly representative of the entire population—i.e., as long as those households were really randomly selected.” He cites the low pre-war mortality estimate and the “main street bias” critique as two reasons for doubting that the sample in this study was truly random. And he concludes saying that the question of the war’s human toll is “a question that the Lancet study doesn’t really answer”.
    Dr. Hicks published a thorough account and clarification of these concerns, which concluded that, “In view of the significant questions that remain unanswered about the feasibility of their study’s methods as practiced at the level of field interviews,it is necessary that Burnham and his co-authors provide detailed, data-based evidence that all reported interviews were indeed carried out, and how this was done in a valid manner. In addition, they need to explain and to demonstrate to what degree their published methodology was adhered to or departed from across interviews, and to demonstrate convincingly that interviews were done in accordance with the standards of ethical research.”[37]
    Borzou Daragahi, Iraq correspondent for the Los Angeles Times, in an interview with PBS, questioned the study based on the paper’s earlier research in Iraq, saying, “Well, we think—the Los Angeles Times thinks these numbers are too large, depending on the extensive research we’ve done. Earlier this year, around June, the report was published at least in June, but the reporting was done over weeks earlier. We went to morgues, cemeteries, hospitals, health officials, and we gathered as many statistics as we could on the actual dead bodies, and the number we came up with around June was about at least 50,000. And that kind of jibed with some of the news report that were out there, the accumulation of news reports, in terms of the numbers killed. The U.N. says that there’s about 3,000 a month being killed; that also fits in with our numbers and with morgue numbers. This number of 600,000 or more killed since the beginning of the war, it’s way off our charts.”[38][39]
    The October 2006 Lancet estimate also drew criticism from the Iraqi government. Government spokesman Ali Debbagh said, “This figure, which in reality has no basis, is exaggerated”.[40] Iraq’s Health Minister Ali al-Shemari gave a similar view in November 2006: “Since three and a half years, since the change of the Saddam regime, some people say we have 600,000 are killed. This is an exaggerated number. I think 150 is OK.”[41]
    It has also been revealed that the survey was partially funded by the anti-war billionaire activist George Soros.[42]
    John Tirman (February 14, 2008) in Editor and Publisher responded to the Soros charge: “My center at MIT used internal funds to underwrite the survey. More than six months after the survey was commissioned, the Open Society Institute, the charitable foundation begun by Soros, provided a grant to support public education efforts of the issue. We used that to pay for some travel for lectures, a web site, and so on. OSI (Open Society Institute), much less Soros himself (who likely was not even aware of this small grant), had nothing to do with the origination, conduct, or results of the survey. The researchers and authors did not know OSI, among other donors, had contributed.”

    In addition, the researchers have never provided evidence that they “went further into the back alleys than was apparent” from the article. We have only their say-so, which is insufficient.

    Their failure to adhere to recognised standards was picked up by:

    AAPOR investigation of the 2nd Lancet survey
    On February 3, 2009, the Executive Council of the American Association for Public Opinion Research (AAPOR) announced that an 8-month investigation found the author of the 2006 Lancet survey, Dr. Gilbert Burnham, had violated the Association’s Code of Professional Ethics & Practices for repeatedly refusing to disclose essential facts about his research. Neither Dr. Burnham nor the Johns Hopkins Bloomberg School of Public Health is a member of AAPOR. “Dr. Burnham provided only partial information and explicitly refused to provide complete information about the basic elements of his research,” said Mary Losch, chair of the association’s Standards Committee.[44][45]
    AAPOR’s President, Richard A. Kulka, added:
    “When researchers draw important conclusions and make public statements and arguments based on survey research data, then subsequently refuse to answer even basic questions about how their research was conducted, this violates the fundamental standards of science, seriously undermines open public debate on critical issues, and undermines the credibility of all survey and public opinion research. These concerns have been at the foundation of AAPOR’s standards and professional code throughout our history, and when these principles have clearly been violated, making the public aware of these violations is an integral part of our mission and values as a professional organization.”[46]
    AAPOR subsequently released a more detailed list of eight specific pieces of information Burnham failed to disclose after repeated requests. These include a copy of the survey questionnaire in all languages into which it was translated, the consent statement, information on sample selection methodology, and a summary of the disposition of all sample cases.[47]
    The American Statistical Association has subsequently written in support of the actions taken by AAPOR, saying: “We are aware that, in taking this action, you have subjected yourselves to some criticism. On behalf of the American Statistical Association, we wish to recognize AAPOR for following procedure and acting professionally on such a difficult and divisive matter. In so doing, you eloquently express by your actions the goals stated in your Code.”

    The simple fact that the Iraqi government, and the Iraqi Ministry of Health in particular, disagrees with the findings should give you an idea that there is something particularly wrong with the Lancet article.

    In addition, this is not the first time The Lancet has screwed up: remember that it had to retract a study linking vaccines with autism due to severely flawed methodology?

    Comment by Andrew — September 17, 2010 @ 4:19 am

  13. Some further criticism:

    One critic is Professor Michael Spagat, an economist from Royal Holloway College, University of London. He and colleagues at Oxford University point to the possibility of “main street bias” – that people living near major thoroughfares are more at risk from car bombs and other urban menaces. Thus, the figures arrived at were likely to exceed the true number. The Lancet study authors initially told The Times that “there was no main street bias” and later amended their reply to “no evidence of a main street bias”.

    Professor Spagat says the Lancet paper contains misrepresentations of mortality figures suggested by other organisations, an inaccurate graph, the use of the word “casualties” to mean deaths rather than deaths plus injuries, and the perplexing finding that child deaths have fallen. Using the “three-to-one rule” – the idea that for every death, there are three injuries – there should be close to two million Iraqis seeking hospital treatment, which does not tally with hospital reports.

    “The authors ignore contrary evidence, cherry-pick and manipulate supporting evidence and evade inconvenient questions,” contends Professor Spagat, who believes the paper was poorly reviewed. “They published a sampling methodology that can overestimate deaths by a wide margin but respond to criticism by claiming that they did not actually follow the procedures that they stated.” The paper had “no scientific standing”. Did he rule out the possibility of fraud? “No.”

    If you factor in politics, the heat increases. One of The Lancet authors, Dr Les Roberts, campaigned for a Democrat seat in the US House of Representatives and has spoken out against the war. Dr Richard Horton, editor of The Lancet, is also antiwar. He says: “I believe this paper was very thoroughly reviewed. Every piece of work we publish is criticised – and quite rightly too. No research is perfect. The best we can do is make sure we have as open, transparent and honest a debate as we can. Then we’ll get as close to the truth as possible. That is why I was so disappointed many politicians rejected the findings of this paper before really thinking through the issues.”

    Knocking on doors in a war zone can be a deadly thing to do. But active surveillance – going out and measuring something – is regarded as a necessary corrective to passive surveillance, which relies on reports of deaths (and, therefore, usually produces an underestimate).

    Iraq Body Count relies on passive surveillance, counting civilian deaths from at least two independent reports from recognised newsgathering agencies and leading English-language newspapers ( The Times is included). So Professor Gilbert Burnham, Dr Les Roberts and Dr Shannon Doocy at the Centre for International Emergency, Disaster and Refugee Studies, Johns Hopkins Bloomberg School of Public Health, Maryland, decided to work through Iraqi doctors, who speak the language and know the territory.

    They drafted in Professor Riyadh Lafta, at Al Mustansiriya University in Baghdad, as a co-author of the Lancet paper. Professor Lafta supervised eight doctors in 47 different towns across the country. In each town, says the paper, a main street was randomly selected, and a residential street crossing that main street was picked at random.

    The doctors knocked on doors and asked residents how many people in that household had died. A person needed to have been living at that address for three months before a death for it to be included. It was deemed too risky to ask if the dead person was a combatant or civilian, but they did ask to see death certificates. More than nine out of ten interviewees, the Lancet paper claims, were able to produce death certificates. Out of 1,849 households contacted, only 15 refused to participate. From this survey, the epidemiologists estimated the number of Iraqis who died after the invasion as somewhere between 393,000 and 943,000. The headline figure became 650,000, of which 601,000 were violent deaths. Even the lowest figure would have raised eyebrows.
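    The extrapolation step behind such estimates can be sketched in miniature. All numbers below are invented round figures chosen only to show the mechanics, not the study's actual inputs: a cluster survey yields crude death rates before and after the invasion, and the excess rate is scaled up by the population and the elapsed time.

```python
# Toy excess-mortality extrapolation (all rates invented for
# illustration; these are not the Lancet study's actual inputs).
population = 26_000_000   # assumed Iraqi population
years = 3.3               # assumed period covered by the survey

pre_war_rate = 5.5        # deaths per 1,000 people per year (assumed)
post_war_rate = 13.0      # deaths per 1,000 people per year (assumed)

excess_per_1000 = post_war_rate - pre_war_rate
excess_deaths = excess_per_1000 / 1000 * population * years
print(f"excess deaths: {excess_deaths:,.0f}")
```

    Note how sensitive the total is to the baseline: shifting the pre-war rate by one death per 1,000 moves the national estimate by tens of thousands, which is why critics focused so heavily on the study's low pre-war mortality figure.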

    Dr Richard Garfield, an American academic who had collaborated with the authors on an earlier study, declined to join this one because he did not think that the risk to the interviewers was justifiable. Together with Professor Hans Rosling and Dr Johan Von Schreeb at the Karolinska Institute in Stockholm, Dr Garfield wrote to The Lancet to insist there must be a “substantial reporting error” because Burnham et al suggest that child deaths had dropped by two thirds since the invasion. The idea that war prevents children dying, Dr Garfield implies, points to something amiss.

    Professor Burnham told The Times in an e-mail that he had “full confidence in Professor Lafta and full faith in his interviewers”, although he did not directly address the drop in child mortality. Dr Garfield also queries the high availability of death certificates. Why, he asks, did the team not simply approach whoever was issuing them to estimate mortality, instead of sending interviewers into a war zone?

    Professor Rosling told The Times that interviewees may have reported family members as dead to conceal the fact that relatives were in hiding, had fled the country, or had joined the police or militia. Young men can also be associated with several households (as a son, a husband or brother), so the same death might have been reported several times.

    Professor Rosling says that, despite e-mails, “the authors haven’t provided us with the information needed to validate what they did”. He would like to see a live blog set up for the authors and their critics so that the matter can be clarified.

    Another critic is Dr Madelyn Hsiao-Rei Hicks, of the Institute of Psychiatry in London, who specialises in surveying communities in conflict. In her letter to The Lancet, she pointed out that it was unfeasible for the Iraqi interviewing team to have covered 40 households in a day, as claimed. She wrote: “Assuming continuous interviewing for ten hours despite 55C heat, this allows 15 minutes per interview, including walking between households, obtaining informed consent and death certificates.”

    Does she think the interviews were done at all? Dr Hicks responds: “I’m sure some interviews have been done but until they can prove it I don’t see how they could have done the study in the way they describe.”

    Professor Burnham says the doctors worked in pairs and that interviews “took about 20 minutes”. The journal Nature, however, alleged last week that one of the Iraqi interviewers contradicts this. Dr Hicks says: “I have started to suspect that they [the American researchers] don’t actually know what the interviewing team did. The fact that they can’t rattle off basic information suggests they either don’t know or they don’t care.”

    And the corpses? Professor Burnham says that, according to reports, mortuaries and cemeteries have run out of space. He says that the Iraqi team has asked for data to remain confidential because of “possible risks” to both interviewers and interviewees.

    Comment by Andrew — September 17, 2010 @ 4:20 am

  14. Furthermore:

    Dubious polls: How accurate are Iraq’s death counts?

    The 2006 Lancet study on Iraqi deaths has received a 2010 “Top Ten Dubious Polling” award from a media ethics project of the Art Science Research Laboratory for its lead author’s “stonewalling in the face of serious questions about a flawed survey project”. This award resulted partly from earlier criticism by the American Association for Public Opinion Research (AAPOR) for ethical violations (eg non-disclosure of essential information on the survey’s methods).

    What’s behind this? Are AAPOR and ASRL motivated by pro-war views? Why else discredit the study which estimated 654,965 Iraqi deaths? Well, AAPOR last made such a charge of ethics violations 12 years ago, against Republican pollster Frank Luntz, and the other 2010 “Top Ten Dubious Polling” awards went to Fox News, Gallup, CBS News, etc – not the usual targets of warmongers.

    Could it be that these criticisms of the Lancet study are accurate and justified? And can those of us who opposed the war consider this question without reflexively dismissing it in disgust? Prior to AAPOR’s involvement, Gilbert Burnham (Lancet study lead author) had revealed that the survey used a sampling methodology which differed from the published account (1). When researchers requested details, all were refused – making it impossible to assess the study’s claim of random sampling (an important matter for a study which estimated 601,000 violent deaths from 300 recorded deaths in the sample surveyed).(2)

    After receiving a related complaint regarding the survey, AAPOR asked Burnham for “basic methodological details” (including “sampling information”, “protocols regarding household selection”, etc) but was refused.(3) As a result, AAPOR criticised Burnham for not answering “even basic questions about how their research was conducted”. AAPOR’s president, Richard Kulka, went as far as stating that “this violates the fundamental standards of science” and “undermines the credibility of all survey and public opinion research”.

    A brief piece about this appeared on New Scientist’s website. The author, Debora MacKenzie, wrote that Burnham “did not send” the information requested by AAPOR, but that, “According to New Scientist’s investigation, however, Burnham has sent his data and methods to other researchers, who found it sufficient.”

    Has Burnham really “sent his methods” to researchers? Well, he hasn’t made basic, crucial details of the sampling methodology available (see above, and footnotes 1 & 2). And since AAPOR’s complaint was largely about this important aspect of the study, MacKenzie’s choice of words here seems misleading. The fact that other assorted information on the study’s methods has been available, and that data has been released to some researchers (some of whom, incidentally, have not found it “sufficient” – presumably MacKenzie’s investigation didn’t stretch to talking to them) is irrelevant to AAPOR’s criticism about what specifically hasn’t been made available to anyone.

    MacKenzie asserts that “Burnham’s complete data, including details of households, is available to bona fide researchers on request”. But the data isn’t available to all researchers – the ‘Main Street Bias’ researchers were refused it, for example. (Their paper, critical of the Lancet study, was awarded ‘Article of the Year’ by the Journal of Peace Research). Also, Burnham did not release “complete” data (as researchers who received the incomplete data would inform MacKenzie, if she’d bothered to ask them). As for “details of households”, that could mean anything (eg household-level data). In the context of AAPOR’s request for household selection data (for assessing sampling methods) it’s misleading, since this data wasn’t made available.

    Meanwhile, what was the reason, if any, that Burnham gave for refusing AAPOR’s request for information about the study? MacKenzie claims that:

    “A spokesman for the Bloomberg School of Public Health at Johns Hopkins, where Burnham works, says the school advised him not to send his data to AAPOR, as the group has no authority to judge the research. The ‘correct forum’, it says, is the scientific literature.”

    This is a strange statement for several reasons. First, what “authority” does AAPOR (or anyone else) need in order to “judge” (ie evaluate) information about a study? Did Burnham refuse the requests of other researchers because they didn’t have the correct “authority”? What does “authority” have to do with it? Note also that the comment about the scientific literature being the “correct forum” is disingenuous, as some of the writers appearing in “the scientific literature” were the very people being refused basic information by Burnham in the first place. One can’t discuss aspects of a study in the “scientific literature” unless information about those aspects is made available.

    If I read in the newspaper of a survey claiming “30% of British children carry a knife” (or whatever), my first thoughts are: how did they get a representative sample, and what questions did they ask? Would it be over-demanding of anyone to want to know these things? On the Lancet study, AAPOR asked for (and were refused) “the wording of questions and other basic [ie sampling] methodological details”.(4)

    Burnham’s school conducted its own investigation (after AAPOR’s criticisms were published). It suspended Burnham for violations of the approved protocol: use of the wrong data collection form and inclusion of respondents’ names (which potentially endangered them). Some commentators argued that this wasn’t relevant to the study’s estimates. Burnham, however, reportedly said the investigation “verified his results” (a surreal claim, since the only thing “verified” was the transcription of data to computer. The school stated that it “did not evaluate aspects of the sampling methodology or statistical approach”).

    While it’s obvious that inclusion of names wouldn’t itself affect the study’s results, it seems relevant to earlier criticism of the study (over sampling methods) that the field team were carrying respondents’ names around (eg through checkpoints). Here’s Gilbert Burnham’s description (2007) of the “main street” aspect of the sampling (which came under criticism):

    “The interviewers wrote the principal streets in a cluster on pieces of paper and randomly selected one. They walked down that street, wrote down the surrounding residential streets and randomly picked one. […] The team took care to destroy the pieces of paper which could have identified households if interviewers were searched at checkpoints.”

    The reason the Lancet study’s authors haven’t released their lists of “principal streets” (which would be crucial to assessing their sampling scheme) was reportedly to protect respondents’ identities. And yet the list of main streets (including both sampled and unsampled streets) would in itself be less likely to reveal their identities than forms which contained their names.

    At least one of the Lancet study’s authors (Riyadh Lafta) presumably knew that names were being carried through checkpoints (contrary to the above claim, within the context of sampling, that care was taken to “destroy” any identifiers). Lafta was part of the Iraq-based team and one of the authors (along with Burnham) of the official companion document to the Lancet study, which states (falsely, we now know): “For ethical reasons, no names were written down, and no incentives were provided to participate.”

    One expects the reasons for non-disclosure of important information to stand up under scrutiny, and it’s particularly relevant to the survey’s estimates that crucial aspects of the sampling methods remain undisclosed, as small biases in sampling can have large effects on the estimates. Of course, this doesn’t necessarily mean the survey’s estimate of deaths was incorrect. Even if no information at all had been published and we had nothing but the team’s assertions to go on, it wouldn’t mean that their estimate was “necessarily” wrong. It would just make claims of an impeccably conducted, intensely-vetted survey look questionable, and we might, as a result, have a preference for other studies (eg IFHS, ILCS) which published more information with which to assess the results.



    1. Gilbert Burnham writes: “As far as selection of the start houses, in areas where there were residential streets that did not cross the main avenues in the area selected, these were included in the random street selection process, in an effort to reduce the selection bias that more busy streets would have.” This refers to a sampling method which was not included in the published account of the study. To date, Burnham has not made details of this sampling method available (in other words the actual procedures used to achieve random sampling “in areas where there were residential streets that did not cross the main avenues in the area selected” have not been released), despite requests from researchers and journalists. See: Ethical and data-integrity problems in the second Lancet survey of mortality in Iraq [p9]

    2. For example: “Peter Lynn, Professor of Survey Methodology at the Institute of Social and Economic Research, University of Essex, has been quietly investigating and, despite several e-mails to the [Lancet team] researchers, has been unable to get answers”. One of the aspects of the study that Lynn was concerned about was sampling methods: “The researchers made a list of all the roads intersecting the main road, and took one of those at random. They then went to 40 adjacent addresses going up one side. But these were all near the main road, so streets away from the main road may not have been represented.”

    See also: Ethical and data-integrity problems in the second Lancet survey of mortality in Iraq


    4. MacKenzie claims, misleadingly, that “The wording of the questions was also provided (pdf format) to a non-scientific magazine article critical of the study”. She links to an “Iraq Mortality Survey Template” which was provided to the National Journal by a third party. But the Lancet authors have declined to confirm or deny if these were the questions used. According to Professor Michael Spagat [p5]: “The L2 authors have not publicly released their questionnaire in any language: English, Arabic or Kurdish (III2). It is not clear at this stage that there was a formal questionnaire for L2 and there is no way to know how questions were worded in the field. Various researchers, such as Fritz Scheuren of National Opinion Research Center (NORC) and Madelyn Hsiao-Rei Hicks of the Institute of Psychiatry in London, have requested copies of the L2 questionnaire and have been refused by the L2 authors (personal communications). Scheuren was also told that the questionnaire exists only in English and that L2 interviewers, said to be fluent in both Arabic and English, translated the questionnaire into Arabic in the field. Several problems ensue.” [L2 = Lancet Iraq study, 2006]

    Comment by Andrew — September 17, 2010 @ 4:27 am

  15. Some stinging criticisms of Johns Hopkins and the Lancet article can be found here, including accusations of falsification of data:

    And this:

    Potential Problems
    Both Lancet studies of Iraqi war deaths rest on the data provided by Lafta, who operated with little American supervision and has rarely appeared in public or been interviewed about his role. In May, Lafta and Roberts presented their study to an off-the-record meeting of experts in Geneva, but other attendees declined to describe Lafta’s remarks. Despite multiple requests sent via e-mails and through Burnham and Roberts, Lafta declined to communicate with National Journal or to send copies of his articles about Iraqi deaths during Saddam’s regime.

    When asked questions about the reliability of their Iraqi partner, the studies’ American authors defend Lafta as a nice guy and a good researcher.

    “I’ve known him for years,” Garfield told NJ. “I used to work with his boss in 2003, studying how Saddam had pilfered cash [intended] for the health care system. He’s thoughtful, careful, and we became friends.”

    John Tirman, a political scientist at the Massachusetts Institute of Technology, described Lafta as “a medical doctor, a professor of medicine. Those factors were a sufficient level of credibility. I never asked [Lafta] about his political views.” Tirman commissioned the Lancet II survey with $46,000 from George Soros’s Open Society Institute and additional support from other funders.

    Lancet Editor Richard Horton shares this fundamental faith in scientists. He told NJ that scientists, including Lafta, can be trusted because “science is a global culture that operates by a set of norms and standards that are truly international, that do not vary by culture or religion. That’s one of the beautiful aspects of science — it unifies cultures, not divides them.”

    Still, the authors have declined to provide the surveyors’ reports and forms that might bolster confidence in their findings. Customary scientific practice holds that an experiment must be transparent — and repeatable — to win credence. Submitting to that scientific method, the authors would make the unvarnished data available for inspection by other researchers. Because they did not do this, citing concerns about the security of the questioners and respondents, critics have raised the most basic question about this research: Was it verifiably undertaken as described in the two Lancet articles?

    “The authors refuse to provide anyone with the underlying data,” said David Kane, a statistician and a fellow at the Institute for Quantitative Social Science at Harvard University. Some critics have wondered whether the Iraqi researchers engaged in a practice known as “curb-stoning,” sitting on a curb and filling out the forms to reach a desired result. Another possibility is that the teams went primarily into neighborhoods controlled by anti-American militias and were steered to homes that would provide information about the “crimes” committed by the Americans.

    Fritz Scheuren, vice president for statistics at the National Opinion Research Center and a past president of the American Statistical Association, said, “They failed to do any of the [routine] things to prevent fabrication.” The weakest part of the Lancet surveys is their reliance on an unsupervised Iraqi survey team, contended Scheuren, who has recently trained survey workers in Iraq.

    When the study came out in October 2006, President Bush said it wasn’t credible.

    The research is “a field study in unstable conditions,” Columbia University’s Garfield, one of the authors of the preliminary 2004 study, told National Journal in October. “You know that it’s imperfect, but … I’ll say this: It’s much easier to discredit than to go into a place like this and try and find answers. None of these harpies are dodging bullets.”

    Perhaps. But overall, the possible shortcomings of the Lancet studies persist, in three broad categories.

    Design And Implementation
    Critics say that the surveys used too few clusters, and too few people, to do the job properly.

    Sample size. The design for Lancet II committed eight surveyors to visit 50 regional clusters (the number ended up being 47) with each cluster consisting of 40 households. By contrast, in a 2004 survey, the United Nations Development Program used many more questioners to visit 2,200 clusters of 10 houses each. This gave the U.N. investigators greater geographical variety and 10 times as many interviews, and produced a figure of about 24,000 excess deaths — one-quarter the number in the first Lancet study. The Lancet II sample is so small that each violent death recorded translated to 2,000 dead Iraqis overall. The question arises whether the chosen clusters were enough to be truly representative of the entire Iraqi population and therefore a valid data set for extrapolating to nationwide totals.
    “Main street” bias? According to the Lancet II article, surveyors randomly selected a main street within a randomly picked district; “a residential street was then randomly selected from a list of residential streets crossing the main street.” This method pulled the survey teams away from side streets and toward main streets, where car bombs can kill the most people, thus boosting the apparent death rate, according to a critique of the study by Michael Spagat, an economics professor at the Royal Holloway, University of London, and Sean Gourley and Neil Johnson of the physics department at Oxford University.
    Burnham responds that The Lancet’s description of how the researchers picked sites was an editing error, and that the method used eliminated main-street bias.

    Oversight. To undertake the first Lancet study, Roberts went into Iraq concealed on the floor of an SUV with $20,000 in cash stuffed into his money belt and shoes. Daring stuff, to be sure, but just eight days after arriving, Roberts witnessed the police detaining two surveyors who had questioned the governor’s household in a Sadr-dominated town. Roberts subsequently remained in a hotel until the survey was completed. Thus, most of the oversight for Lancet I — and all of it for Lancet II — was done long-distance. For this reason, although he defends the methodology, Garfield took his name off Lancet II. “The study in 2006 suffered because Les was running for Congress and wasn’t directly supervising the work as he had done in 2004,” Garfield told NJ.
    Black-Box Data
    With the original data unavailable, other scholars cannot verify the findings, a key test of scientific rigor.

    Response rate. The surveyors said that 1.7 percent of households — fewer than one in 50 — were unoccupied or uncooperative, even though questioners visited each house only once on one day; that answers were taken only from the household’s husband or wife, not from in-laws or adult children; and that householders had reason to fear that their participation would expose them to threats from armed groups.
    To Kane, the study’s reported response rate of more than 98 percent “makes no sense,” if only because many male heads of households would be at work or elsewhere during the day and Iraqi women would likely refuse to participate. On the other hand, Kieran J. Healy, a sociologist at the University of Arizona, found that in four previous unrelated surveys, the polling response in Iraq was typically in the 90 percent range.

    The Lancet II questioners had enough time to accomplish the surveys properly, Burnham said.

    Lack of supporting data. The survey teams failed to collect the fraud-preventing demographic data that pollsters routinely gather. For example, D3 Systems, a polling firm based in Vienna, Va., that has begun working in Iraq, tries to prevent chicanery among its 100-plus Iraqi surveyors by requiring them to ask respondents for such basic demographic data as ages and birthdates. This anti-fraud measure works because particular numbers tend to appear more often in surveys based on fake interviews and data — or “curb-stoning” — than they would in truly random surveys, said Matthew Warshaw, the Iraq director for D3. Curb-stoning surveyors might report the ages of many people to be 30 or 40, for example, rather than 32 or 38. This type of fabrication is called “data-heaping,” Warshaw said, because once the data are transferred to spreadsheets, managers can easily see the heaps of faked numbers.
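    The heaping check Warshaw describes can be sketched as a terminal-digit tally. The ages below are synthetic, invented purely for illustration (not D3's or the survey's data): fabricated ages tend to pile up on digits 0 and 5, while exactly reported ages end in 0 or 5 only about a fifth of the time.

```python
from collections import Counter
import random

random.seed(1)

# Synthetic data: "genuine" ages drawn uniformly; "curb-stoned"
# ages rounded to the nearest multiple of 10, as a fabricator might.
genuine = [random.randint(18, 79) for _ in range(500)]
faked = [10 * round(random.randint(18, 79) / 10) for _ in range(500)]

def terminal_digit_share(ages, digits=(0, 5)):
    """Fraction of ages ending in the given digits; roughly 20% is
    expected for the pair (0, 5) when ages are reported exactly."""
    counts = Counter(age % 10 for age in ages)
    return sum(counts[d] for d in digits) / len(ages)

print(f"genuine: {terminal_digit_share(genuine):.0%} end in 0 or 5")
print(f"faked:   {terminal_digit_share(faked):.0%} end in 0 or 5")
```

    A large excess on round terminal digits is the spreadsheet "heap" that managers look for; it does not prove fraud on its own, but it flags interviews worth re-checking.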
    Death certificates. The survey teams said they confirmed most deaths by examining government-issued death certificates, but they took no photographs of those certificates. “Confirmation of deaths through death certificates is a linchpin for their story,” Spagat told NJ. “But they didn’t record (or won’t provide) information about these death certificates that would make them traceable.”
    Under pressure from critics, the authors did release a disk of the surveyors’ collated data, including tables showing how often the survey teams said they requested to see, and saw, the death certificates. But those tables are suspicious, in part, because they show data-heaping, critics said. For example, the database reveals that 22 death certificates for victims of violence and 23 certificates for other deaths were declared by surveyors and households to be missing or lost. That similarity looks reasonable, but Spagat noticed that the 23 missing certificates for nonviolent deaths were distributed throughout eight of the 16 surveyed provinces, while all 22 missing certificates for violent deaths were inexplicably heaped in the single province of Nineveh. That means the surveyors reported zero missing or lost certificates for 180 violent deaths in 15 provinces outside Nineveh. The odds against such perfection are at least 10,000 to 1, Spagat told NJ. Also, surveyors recorded another 70 violent deaths and 13 nonviolent deaths without explaining the presence or absence of certificates in the database. In a subsequent MIT lecture, Burnham said that the surveyors sometimes forgot to ask for the certificates.
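    Spagat's "10,000 to 1" figure can be illustrated with a back-of-envelope model (this is my sketch under a simple independence assumption, not his actual calculation). Using the article's counts — 22 missing certificates, all in Nineveh, against 180 violent deaths outside Nineveh with zero missing — suppose each violent death independently has its certificate missing at the pooled rate:

```python
# Pooled missing-certificate rate across the 202 violent deaths
# with recorded certificate status (22 missing + 180 present).
p_missing = 22 / (22 + 180)                 # ~10.9% (modeling assumption)

# Probability that all 180 violent deaths outside Nineveh
# nonetheless show zero missing certificates.
p_all_present_outside = (1 - p_missing) ** 180
print(p_all_present_outside)                # on the order of 1e-9
```

    Under this simple model the odds are vastly longer than 10,000 to 1, which is consistent with the article's "at least" phrasing.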

    Suspicious cluster. Lafta’s team reported 24 car bomb deaths in early July, as well as one nonviolent death, in “Cluster 33” in Baghdad. The authors do not say where the cluster was, but the only major car bomb in the city during that period, according to Iraq Body Count’s database, was in Sadr City. It was detonated in a marketplace on July 1, likely by Al Qaeda, and killed at least 60 people, according to press reports.
    The authors should not have included the July data in their report because the survey was scheduled to end on June 30, according to Debarati Guha-Sapir, director of the World Health Organization’s Collaborating Center for Research on the Epidemiology of Disasters at the University of Louvain in Belgium. Because of the study’s methodology, those 24 deaths ultimately added 48,000 to the national death toll and tripled the authors’ estimate for total car bomb deaths to 76,000. That figure is 15 times the 5,046 car bomb killings that Iraq Body Count recorded up to August 2006.
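    The jump from 24 deaths in one cluster to 48,000 nationally follows from the cluster-survey weighting: each sampled death stands in for (national population ÷ surveyed population) deaths. The sample figures below come from the Lancet II paper itself (roughly 12,800 individuals in 1,849 households, against a population near 26 million), not from the NJ article, so treat them as approximate:

```python
surveyed_people = 12_801           # individuals in the surveyed households
national_population = 26_100_000   # approximate figure used in the study
weight = national_population / surveyed_people   # each sampled death ~ 2,000 deaths

cluster_33_violent_deaths = 24
print(round(cluster_33_violent_deaths * weight))   # ~49,000 from one cluster
```

    This is why critics focus so hard on individual clusters: with a weight of roughly 2,000, a single suspect cluster moves the national estimate by tens of thousands.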

    According to a data table reviewed by Spagat and Kane, the team recorded the violent deaths as taking place in early July and did not explain why they failed to see death certificates for any of the 24 victims. The surveyors did remember, however, to ask for the death certificate of the one person who had died peacefully in that cluster.

    The Cluster 33 data is curious for other reasons as well. The 24 Iraqis who died violently were neatly divided among 18 houses — 12 houses reported one death, and six houses reported two deaths, according to the authors’ data. This means, Spagat said, that the survey team found a line of 40 households that neatly shared almost half of the deaths suffered when a marketplace bomb exploded among a crowd of people drawn from throughout the broader neighborhood.

    The data also bolster Spagat’s criticism that the surveyors selected too many clusters in places where bomb explosions and gunfights were most common.

    Ideological Bias
    Virtually everyone connected with the study has been an outspoken opponent of U.S. actions in Iraq. (So are several of the study’s biggest critics, such as Iraq Body Count.) Whether this affected the authors’ scientific judgments and led them to turn a blind eye to flaws is up for debate.

    Follow the money. Lancet II was commissioned and financed by Tirman, the executive director of the Center for International Studies at MIT. (His most recent book is 100 Ways America Is Screwing Up the World.) After Lancet I was published, Tirman commissioned Burnham to do the second study, and sent him $50,000. When asked where Tirman got the money, Burnham told NJ: “I have no idea.”
    In fact, the funding came from the Open Society Institute created by Soros, a top Democratic donor, and from three other foundations, according to Tirman. The money was channeled through Tirman’s Persian Gulf Initiative. Soros’s group gave $46,000, and the Samuel Rubin Foundation gave $5,000. An anonymous donor, and another donor whose identity he does not know, provided the balance, Tirman said. The Lancet II study cost about $100,000, according to Tirman, including about $45,000 for publicity and travel. That means that nearly half of the study’s funding came from an outspoken billionaire who has repeatedly criticized the Iraq campaign and who spent $30 million trying to defeat Bush in 2004.

    Partisan considerations. Soros is not the only person associated with the Lancet studies who had one eye on the data and the other on the U.S. political calendar. In 2004, Roberts conceded that he opposed the Iraq invasion from the outset, and — in a much more troubling admission — said that he had e-mailed the first study to The Lancet on September 30, 2004, “under the condition that it come out before the election.” Burnham admitted that he set the same condition for Lancet II. “We wanted to get the survey out before the election, if at all possible,” he said.
    “Les and Gil put themselves in position to be criticized on the basis of their views,” Garfield concedes, before adding, “But you can have an opinion and still do good science.” Perhaps, but the Lancet editor who agreed to rush their study into print, with an expedited peer-review process and without seeing the surveyors’ original data, also makes no secret of his leftist politics. At a September 2006 rally in Manchester, England, Horton declared, “This axis of Anglo-American imperialism extends its influence through war and conflict, gathering power and wealth as it goes, so millions of people are left to die in poverty and disease.” His speech can be viewed on YouTube.

    Mr. Roberts tries to go to Washington. Roberts, who opposed removing Saddam from power, is the most politically outspoken of the authors. He initiated the first Lancet study and repeatedly used its conclusions to criticize Bush. “I consider myself an advocate,” Roberts told an interviewer in early 2007. “When you start working documenting events in war, the public health response — the most important public health response — is ending the war.”
    In 2006, he acted on this belief, seeking the Democratic nomination for New York’s 24th Congressional District before dropping out in favor of the eventual winner, Democrat Michael Arcuri. Asked why he ran for office, Roberts told NJ: “It was a combination of Iraq and [Hurricane] Katrina that just put me over the top. I thought the country was going in the desperately wrong direction, particularly with regard to public health and science.”

    Politics At Work
    Roberts was hardly the only American to lose confidence in Bush. The question is whether he and his team lost their objectivity as scientists as well. Unanimously, the authors insist that the answer is no.

    Roberts concedes that the only certain way to collect information for a study of Iraqi war casualties would be through a full census, something he says is impossible in the midst of sectarian civil war. His study’s method “has limitations,” he told NJ. “It works less well when bombs are killing people in clusters — and they are killing people in clusters in Iraq — but it remains a fundamentally robust way of determining changes in mortality rates.” Asked if he remains certain that Lafta’s Iraqi teams truly collected the data they turned in, Roberts answered, “I’m just absolutely confident this data is not fabricated.”

    “Dr. Burnham and his colleagues are confident that the data presented in the 2004 and 2006 [studies] are accurate, and they fully stand by the conclusions of their research,” according to a November 27 statement from the Bloomberg School of Public Health. “The findings of independent surveys of Iraqis conducted by the United Nations in March 2005, by the BBC in March 2007, and by the British polling firm ORB in September 2007 support the conclusions of the Hopkins mortality studies.”

    Critics say, however, that the other national reports cited in the Johns Hopkins statement, particularly the ORB poll, have methodological flaws and political overtones similar to those in the Lancet studies.

    “Just stating, ‘We have no biases of that type’ isn’t very convincing,” says Oxford University’s Johnson. “Using ‘I am an expert’ arguments sounds to me like ‘Trust me, I am a doctor.’ ” Johnson and two of his colleagues have called on the scientific community to conduct an in-depth re-evaluation of both Lancet studies. “It’s almost a crime to let it go unchallenged,” Johnson said.

    Even Garfield, a co-author of the first Lancet article, is backing away from his previous defense of his fellow authors. In December, Garfield told National Journal that he guesses that 250,000 Iraqis had died by late 2007. That total requires an underlying casualty rate only one-quarter of that offered by Lancet II.

    The authors — Lafta excepted — have been willing to engage their critics in debate, returning journalists’ calls and, for the most part, avoiding ad hominem arguments. Yet, sometimes their defenses raise new questions. Burnham says, for instance, that Lafta offered to take reporters to visit some of the neighborhoods used in the clusters, although he declined to say whether the reporters would be allowed to visit the surveyed households or to pick the clusters to see.

    Roberts and his defenders emphasize that when their cluster method produced shockingly high mortality rates in the Congo, no one questioned them — not seeming to understand that journalists looking at the Iraq study are now indeed wondering if the Congo results are valid.

    Roberts, when asked if he timed the release of his Lancet studies to hurt the Republicans on Election Day, contends that his biggest concern was ensuring the safety of his researchers. “If this study was finished in September and not published until after the November elections — and it was perceived that we were sitting on the results — my Iraqi colleagues would have been killed,” he told National Journal. Even if true, this assertion undermines his expressions of confidence in the integrity and skill of the Iraqi researchers. How can their data be trusted if their very lives depended on the results?

    No matter whether a latent desire to feed the American public’s opposition to the war might have shaped these studies, another audience was paying close attention: jihadists who used this research as a justification for killing Americans. Roberts already believed that jihadi attacks were, in part, driven by the international image of the United States. “The greatest threat to U.S. national security [is] the image that the United States is a violator of international laws and order and that there is no means other than violence to curb it,” Roberts wrote in a July 2005 article for Tirman’s center. When NJ asked Roberts about the risk that his estimate would incite more violence, his confidence seemed to waver for the only time during the interview. “This area of study is a minefield,” he said. “The people you are talking about are the same kind of people who deny the Holocaust.” Does it give him qualms that some of those people use his study to recruit suicide bombers? “It does,” he replied after a pause. “My guess is that I’ve provided data that can be narrowly cited to incite hatred. On the other hand, I think it’s worse to have our leaders downplaying the level of violence.”

    Burnham also paused when asked whether Iraqi factions manipulated him and his colleagues and then replied, “We’re reasonably confident that we were not manipulated.”

    Professional Responsibilities
    Officials at Iraq Body Count strongly opposed the Iraq war yet issued a detailed critique of the Lancet II study. Researchers wading into a field that is this fraught with danger have a responsibility not to be reckless with statistics, the group said. The numbers claimed by the Lancet study would, under the normal ratios of warfare, result in more than a million Iraqis wounded seriously enough to require medical treatment, according to this critique. Yet official sources in Iraq have not reported any such phenomenon. An Iraq Body Count analysis showed that the Lancet II numbers would have meant that 1,000 Iraqis were dying every day during the first half of 2006, “with less than a tenth of them being noticed by any public surveillance mechanisms.” The February 2006 bombing of the Golden Mosque is widely credited with plunging Iraq into civil war, yet the Lancet II report posits the equivalent of five to 10 bombings of this magnitude in Iraq every day for three years.
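    The scale of Iraq Body Count's implications can be reproduced with rough arithmetic. The violent-death figure is Lancet II's published estimate; the wounded-to-killed ratio is a conventional wartime rule of thumb that I am assuming here for illustration, not a number from IBC's critique:

```python
violent_deaths = 601_000        # Lancet II estimate of violent deaths
months = 40                     # March 2003 through June 2006

# IBC's "more than a million wounded" point, under an assumed 3:1 ratio.
wounded_per_death = 3           # illustrative assumption
print(violent_deaths * wounded_per_death)        # ~1.8 million wounded

# Average daily violent deaths over the whole period; IBC's 1,000/day
# figure applies to the escalated first half of 2006 specifically.
print(round(violent_deaths / (months * 30)))     # ~500 per day on average
```

    Either way the check is run, the implied casualty flow dwarfs anything Iraqi hospitals, morgues, or surveillance systems reported, which is the core of IBC's objection.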

    “In the light of such extreme and improbable implications,” the Iraq Body Count report stated, “a rational alternative conclusion to be considered is that the authors have drawn conclusions from unrepresentative data.”

    Against these criticisms, the authors maintain that they were using methods of study unfamiliar to human-rights groups and that the scientific community widely accepted the Lancet studies. “There have been 56 studies using this retrospective household survey method,” Garfield said. “The estimation of crude mortality in a population does work…. It doesn’t mean you can’t do it wrong. It is the best method we have. The question is, ‘Did they do it right?’ ”

    When it comes to the question of peer review, the study’s defenders sometimes seem to want it both ways. On the one hand, Roberts talks about the need “to step beyond peer review.” Yet the authors insist that their study was peer-reviewed extensively (if rapidly, in order to be published before the election). The authors also maintain that one of the reasons they went to The Lancet with these studies is its quick turnaround time.

    Surprisingly, not one of the peer reviewers seems to have thought to ask a basic question: Are the data in the two studies even true? The possibility of fakery, editor Horton told NJ, “did not come up in peer review.” Medical journals can’t afford to repeat every scientific study, he said, because “if for every paper we published we had to think, ‘Is this fraud?’ … honestly, we would fold tomorrow.”

    In Belgium, Guha-Sapir’s team is completing a paper outlining numerous mathematical and procedural errors in the Lancet II article, and its corrections will likely lower the estimate of dead Iraqis to 450,000, even without consideration of possible fraud during the surveying, a source said.

    Perhaps medical journals, like respected news organizations, will learn that they have to factor the possibility of wartime fraud into their fact-checking. Horton knows the peacetime risks only too well: In a Lancet article in October 2005, exactly halfway between the two Iraq mortality studies, a Norwegian physician named Jon Sudbo wrote that a review of 454 patients showed that such common painkillers as ibuprofen and naproxen reduced smokers’ risk of contracting oral cancer while increasing their risk for heart disease; it later turned out that Sudbo had faked his research.

    Today, the journal’s editor tacitly concedes discomfort with the Iraqi death estimates. “Anything [the authors] can do to strengthen the credibility of the Lancet paper,” Horton told NJ, “would be very welcome.” If clear evidence of misconduct is presented to The Lancet, “we would be happy to go ask the authors and the institution for an official inquiry, and we would then abide by the conclusion of that inquiry.”

    Comment by Andrew — September 17, 2010 @ 4:41 am

  16. Andrew, you made a number of good points. I’ll try to address them in order. You mention the Iraq Living Conditions Survey showing lower numbers of deaths than the Lancet survey.

    Summary: A previous United Nations survey found around half as many additional deaths in the same study period as the MIT/Bloomberg study. The two studies measured different quantities: the ILCS estimated deaths classified as “war-related” by respondents, whereas the MIT/Bloomberg study estimated all excess deaths. The two surveys also employed somewhat different methodologies.

    Background on the ILCS
    In 2005, the United Nations Development Program and the Iraqi Ministry of Planning jointly published the Iraq Living Conditions Survey 2004 (ILCS). This was a wide-ranging study with an extensive questionnaire that took respondents a median time of 83 minutes to complete. The questionnaire covered housing and infrastructure; household economy; basic demography; the education, health, and labour force characteristics of the household members; and women’s reproductive history and children’s health. The main finding was that living conditions in Iraq, as measured by key indicators, deteriorated significantly in the period April 2002 to April 2004. For example, chronic malnutrition increased from 4% to 8%, and access to safe water in urban areas fell from 95% to 60%.

    ILCS mortality findings
    In addition to these topics, the list of questions included the question: “Has any person(s) who was a regular household member died or gone missing during the past 24 months?” As the survey was carried out in April 2004, the “past 24 months” refers to the period April 2002 to April 2004.

    The dates of the deaths were not recorded in the ILCS, so its results cannot be used to establish a change in mortality. A partial comparison with estimates of the change in the death rate is nonetheless possible using the recorded cause of death. The survey asked respondents to classify reported deaths in the categories “Disease / Traffic Accident / War related death / Pregnancy or childbirth / Other”. The number of war-related deaths was estimated at 24 thousand in the period March 2003 to April 2004 (12 months), with 95% confidence that the true number is in the range 18-29 thousand.

    Comparison with the 2004 Bloomberg study
    The ILCS study can be compared to the 2004 Bloomberg study, which found a total of 98 thousand deaths in the period March 2003 to September 2004 (18 months), with 95% confidence that the true number was in the range 8-192 thousand. This corresponds to 65 thousand total deaths per year, or 2.7 times the number found by the ILCS for war-related deaths.

    Comparison with the 2006 MIT/Bloomberg study
    The 2006 Lancet study estimated an increase in crude mortality of 7.8/1,000 for the period March 2003 to June 2006. This figure corresponds to 655 thousand deaths over 40 months.

    In the period March 2003 to April 2004, similar to that of the ILCS, the study reports an increase in crude mortality of 2/1,000 (Table 3). This translates to around 52 thousand deaths per year. This figure, for total crude mortality from all causes, is thus 2.2 times larger than the ILCS figure, which covers only “war-related” deaths.
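    The rate-to-count conversions behind these comparisons are simple, and worth making explicit. The population figure is an assumption of mine (roughly 26 million); the rates and totals are the ones quoted above:

```python
population = 26_000_000        # approximate Iraqi population (assumption)

# ILCS: 24 thousand war-related deaths over 12 months.
ilcs_per_year = 24_000

# 2004 Bloomberg study: 98 thousand excess deaths over 18 months.
bloomberg_per_year = 98_000 * 12 / 18
print(round(bloomberg_per_year))                     # ~65,000 per year
print(round(bloomberg_per_year / ilcs_per_year, 1))  # 2.7x the ILCS figure

# 2006 study, ILCS-like window: excess crude mortality of 2 per 1,000/year.
lancet2_per_year = 2 / 1000 * population
print(round(lancet2_per_year))                       # ~52,000 per year
print(round(lancet2_per_year / ilcs_per_year, 1))    # 2.2x the ILCS figure
```

    The ratios (2.7x and 2.2x) match the text; whether they represent a contradiction depends on how many excess deaths respondents would have filed under "war-related," which is exactly the ambiguity discussed below.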

    Potential explanations for discrepancies between estimates
    There are several reasons that one may not expect the ILCS results to be similar to those found by the Lancet study:

    - Different variables are measured. The ILCS recorded “war-related” deaths whereas the Bloomberg estimates are for total deaths. It is not clear which deaths respondents to the ILCS survey classified as “war-related”, but this category is unlikely to include all deaths (and may not include all violent deaths, such as criminal murders). For this reason, the ILCS number should be expected to be lower than the total of “violent deaths” recorded in the Bloomberg studies.

    - The ILCS survey was not focused on mortality and may not have complete reporting. Lancet study co-author Les Roberts gave the following explanation in a forum hosted on the BBC website (full transcript available from MediaLens):

    I suspect that [the ILCS] mortality estimate was not complete. I say this because the overall non-violent mortality estimate was, I am told, very low compared to our 5.0 and 5.5/ 1000 /year estimates for the pre-war period which many critics (above) claim seems too low. Jon [Pederson, who led the survey] sent interviewers back after the survey was over to the same interviewed houses and asked just about <5 year old deaths. The same houses reported ~50% more deaths the second time around. In our surveys, we sent medical doctors who asked primarily about deaths. Thus, I think we got more complete reporting.

    These questions and answers by one of the Lancet authors address many of the concerns you posted:

    I will post the unedited version of the above:

    1. How do you know that you are not reporting the same fatality multiple times?
    For example if you were to ask people in the UK if they know anyone who has been involved in a traffic accident most would say they do. Applying your logic that means there are 60 million accidents every year.
    Andrew M, London, UK

    Les Roberts: That is an excellent question. To be recorded as a death in a household, the decedent had to have spent most of the nights during the 3 months before their death "sleeping under the same roof" with the household that was being interviewed. This may have made us undercount some deaths (soldiers killed during the 2003 invasion for example) but addressed your main concern that no two households could claim the same death event.

    2. It seems the Lancet has been overrun by left-wing sixth formers.
    The report has a flawed methodology and deceit is shown in the counting process. What is your reaction to that?
    Ian, Whitwick, UK

    LR: Almost every researcher who studies a health problem is opposed to that health problem. For example, few people who study measles empathize with the virus. Thus, given that war is an innately political issue, and that people examining the consequences of war are generally opposed to the war's conception and continuation, it is not surprising that projects like these are viewed as being highly political. That does not mean that the science is any less rigorous than a cluster survey looking at measles deaths. This study was the standard approach for measuring mortality in times of war, it went through a rigorous peer-review process and it probably could have been accepted into any of the journals that cover war and public health.

    The Lancet is a rather traditional medical journal with a long history and is not seen as "left-wing" in the public health and medical communities. The types of different reports (medical trials, case reports, editorials) in the Lancet have been included for scores of years. The Lancet also has a long history of reporting about the adverse effects of war, and the world is a more gentle place for it.

    3. Why is it so hard for people to believe the Lancet report?
    I am an Iraqi and can assure you that the figure given is nearer to the truth than any given before or since.
    S Kazwini, London, UK

    LR: I think it is hard to accept these results for a couple of reasons. People do not see the bodies. While in the UK there are well over 1000 deaths a day, they do not see the bodies there either. Secondly, people feel that all those government officials and all those reporters must be detecting a big portion of the deaths. When in actuality during times of war, it is rare for even 20% to be detected. Finally, there has been so much media attention given to the surveillance-based numbers put out by the coalition forces, the Iraqi Government and a couple of corroborating groups, that a population-based number is a dramatic contrast.

    4. Why do you think some people are trying to rubbish your reports, which use the same technique as used in other war zones for example in Kosovo?
    Another group, which uses only English-language reports – Iraq Body Count – constantly rubbishes your reports. Again, why do you think that is?
    Mark Webb, Dublin, Ireland

    LR: I suspect there are many different groups with differing motives.

    5. Can you explain, if your figures are correct, why 920 more people were dying each day than officially recorded by the Iraqi Ministry of Health – implying huge fraud and/or incompetence on their behalf?
    Dan, Scotland

    LR: It is really difficult to collect death information in a war zone! In 2002, in Katana Health Zone in eastern Democratic Republic of Congo (DRC) there was a terrible meningitis outbreak, where the health zone was supported by the Belgian Government, and with perhaps the best disease surveillance network in the entire country. A survey by the NGO International Rescue Committee showed that only 7% of those meningitis deaths were recorded by the clinics and hospitals and government officials. Patrick Ball at Berkeley showed similar insensitivity by the press in Guatemala during the years of high violence in the 1980s. I do not think that very low reporting implies fraud.

    6. As an analyst myself I would like to know how reliable the method itself actually is.
    Les Roberts and his colleagues claim to have used the same method to estimate deaths in Iraq as is used to estimate deaths in natural disasters. Is there any evidence that the method is accurate? By this I mean a comparison of the number actual deaths after a natural disaster with estimates of the number of deaths.
    Rickard Loe, Stockholm, Sweden

    LR: That is a good question. There is a little evidence of which I am aware. Note that the 2004 and 2006 studies found similar results for the pre- and initial post-invasion period which at least implies reproducibility. I led a 30 cluster mortality survey in Kalima in the DRC in 2001. The relief organization Merlin did a nutritional survey and measured mortality in the same area and with a recall period that covered part of our survey. Both were cluster surveys, Merlin used a different technique to select houses and we obtained statistically identical results. In a couple of refugee settings, cluster surveys have produced similar estimates to grave monitoring.

    In 1999, in Katana Health Zone in the Congo, I led a mortality survey where we walked a grid over the health zone and interviewed 41 clusters of 5 houses at 1km. spacings. In that survey, we estimated that 1,600 children had died of measles in the preceding half year. A couple of weeks later we did a standard immunization coverage survey (30 clusters of 7 children but selected totally proportional to population) that asked about measles deaths and we found an identical result.

    I suspect that Demographic Health Surveys or the UNICEF MICS surveys (which are both retrospective cluster mortality approaches) have been calibrated against census data but I do not know when or where.

    7. My understanding is that this study reports ten times more deaths attributable to the war than other studies because this is the only one to use statistical methods to make inferences about the mortality rate across the whole population.
    Other studies only record verifiable deaths, which one would expect to constitute only a small part of the total number. Am I correct?
    Matthew, Appleton

    LR: Yes.

    8. It seems to me that the timing of the publication of the 2004 and 2006 reports – in both cases shortly before a U.S. election – was a mistake.
    Does Mr Roberts regret the timing of the release of the two reports or does he feel they achieved some benefit?
    Mik Ado, London, UK

    LR: Yes. Both were unfortunate timing. As I said at the time of the first study, I lived in fear that our Iraqi colleagues and interviewers would be killed if we had finished a survey in mid-September and it took two months for the results to get out. This has been widely misquoted as saying we wanted to influence the election, as if the two parties somehow had different positions on the war in Iraq. I think in Iraq, a post-election publication in 2004 would have been seen as my colleagues knowing something but keeping it hidden. It was also unfortunate that the attention span of the U.S. media is short during election seasons.

    More detailed questions from Joe Emersberger

    9. Lancet 2 found a pre-invasion death rate of 5.5 per 1,000 people per year. The UN has an estimate of 10. Isn't that evidence of inaccuracy in the study?

    LR: The last census in Iraq was a decade ago and I suspect the UN number is somewhat outdated. The death rate in Jordan and Syria is about 5. Thus, I suspect that our number is valid. Note that if we are somehow under-detecting deaths, then our death toll would have to be too low, not too high. Both because a) we must be missing a lot, and b) the ratio of violent deaths to non-violent deaths is so high.

    I find it very reassuring that both studies found similar pre-invasion rates, suggesting that the extra two years of recall did not dramatically result in under-reporting, a problem recorded in Zaire and Liberia in the past.

    10. The pre-invasion death rate you found for Iraq was lower than for many rich countries. Is it credible that a poor country like Iraq would have a lower death rate than a rich country like Australia?

    LR: Yes. Jordan and Syria have death rates far below that of the UK because the population in the Middle-east is so young. Over half of the population in Iraq is under 18. Elderly populations in the West are a larger part of the population profile and they die at a much higher rate.

    11. A research team led by physicists Sean Gourley and Neil Johnson of Oxford University and economist Michael Spagat has asserted in an article in Science that the second Lancet study is seriously flawed due to "main street bias." Is this a valid, well-tested concept, and is it likely to have impacted your work significantly?

    LR: I have done (that is, designed, led, and gone to the houses with interviewers) at least 55 surveys in 17 countries since 1990, most of them retrospective mortality surveys such as this one. I have measured at different times self-selection bias, bias from the families with the most deaths leaving an area, and absentee bias, but I have never heard of "main street bias." I have measured the population density of a cluster during mortality surveys in Sierra Leone, Rwanda, the Democratic Republic of Congo, and the Republic of Congo, and in spite of the conventional wisdom that crowding is associated with more disease and death, I have never been able to detect this during these conflicts, where malaria and diarrhoea dominated the mortality profile.

    We worked hard in Iraq to have every street segment have an equal chance of being selected. We worked hard to have each separate house have an equal chance of being selected. I do not believe that this "main street bias" arose because a) about a 1/4th of the clusters were in rural areas, b) main streets were roughly as likely to be selected, c) most urban clusters spanned 2-3 blocks as we moved in a chain from house to house so that the initial selected street usually did not provide the majority of the 40 households in a cluster and d) people being shot was by far the main mechanism of death, and we believe this usually happened away from home. Realize, there would have to be both a systematic selection of one kind of street by our process and a radically different rate of death on that kind of street in order to skew our results. We see no evidence of either.

    12. In Slate Magazine, Fred Kaplan has alleged that:

    "….if a household wasn't on or near a main road, it had zero chance of being chosen. And "cluster samples" cannot be seen as representative of the entire population unless they are chosen randomly." Is Kaplan's statement true?

    LR: His comment about proximity to main roads is just factually wrong! As far as cluster surveys go, they are never perfect; however, they are the main way to measure death rates in this kind of setting. See the SMART initiative at

    13. Madelyn Hicks, a psychiatrist and public health researcher at King's College London in the U.K., says she "simply cannot believe" the paper's claim that 40 consecutive houses were surveyed in a single day. Can you comment on this?

    LR: During my DRC surveys I planned on interviewers each interviewing 20 houses a day, taking about 7 minutes per house. Most of the time in a day was spent on travel and on finding the randomly selected household. In Iraq in 2004, the surveys took about twice as long, and it usually took a two-person team about 3 hours to interview a 30-house cluster. I remember one rural cluster that took about 6 hours, and we got back after dark. Nonetheless, Dr. Hicks's concerns are not valid, as on many days one team interviewed two clusters in 2004.

    14. A recent Science Magazine article stated that Gilbert Burnham (one of your co-authors) didn't know how the Iraqis on the survey team conducted their work. The article also claimed that raw data were destroyed to protect the safety of interviewees. Is this true?

    LR: These statements are simply not true and do not reflect anything said by Gilbert Burnham! He has submitted a letter to the editors of Science in response, which I hope they will print.

    15. A UNDP study carried out a survey 13 months after the war with a much larger sample size than both Lancet studies, and it found about one-third the number of deaths that your team found. Given the much larger sample size, shouldn't we assume the UNDP study was more accurate and that therefore your numbers are far too high?

    LR: The UNDP study was much larger and was led by the highly revered Jon Pederson at Fafo in Norway, but it was not focused on mortality. His group conducted interviews about living conditions, which averaged about 82 minutes, and recorded many things. Questions about deaths were asked, and if there were any, there were a couple of follow-up questions.

    A) I suspect that Jon's mortality estimate was not complete. I say this because the overall non-violent mortality estimate was, I am told, very low compared to our 5.0 and 5.5/1,000/year estimates for the pre-war period, which many critics (above) claim seem too low. Jon sent interviewers back to the same interviewed houses after the survey was over and asked just about deaths of children under five. The same houses reported ~50% more deaths the second time around. In our surveys, we sent medical doctors who asked primarily about deaths. Thus, I think we got more complete reporting.

    B) This UNDP survey covered about 13 months after the invasion. Our first survey recorded almost twice as many violent deaths from the 13th to the 18th month after the invasion as it did during the first 12 months (see figure 2 in the 2004 Lancet article). The second survey found an excess rate of 2/1,000/year over the same period, corresponding to approximately 55,000 deaths by April of 2004 (see table 3 of the 2006 Lancet article). Thus, the rates of violent death recorded in the two survey groups are not so divergent.

    Les Roberts Responds To Steven Moore Of The Wall Street Journal

    Moore's editorial can be read here:

    Distinction between criticism and fabrication regarding deaths in Iraq

    I read with interest the October 18th editorial by Steven Moore reviewing our study reporting that an estimated 650,000 deaths were associated with the 2003 invasion and occupation of Iraq. I had spoken with Mr. Moore the week before when he said that he was writing something for the Wall Street Journal to put this survey in perspective. I am not surprised that we differed on the current relevance of 10 year-old census data in a country that had experienced a major war and mass exodus.

    I am not surprised at his rejection of my suggestion that the references in a web report explaining the methodology for lay people and reporters were not the same as the references in our painstakingly written, peer-reviewed article. What is striking is Mr. Moore's statement that we did not collect any demographic data, and his implication that this makes the report suspect.

    This is curious because, not only did I tell him that we asked about the age and gender of the living residents in the houses we visited, but Mr. Moore and I discussed, verbally and by e-mail, his need to contact the first author of the paper, Gilbert Burnham, in order to acquire this information, as I did not have the raw data. I would assume that this was simply a case of multiple misunderstandings, except that our first report in the Lancet in 2004, referenced in our article as describing the methods, states, "…interviewees were asked for the age and sex of every current household member."

    Thus, it appears Mr. Moore had not read the description of the methods in our reports. It is not important whether this fabrication that "no demographic data was collected" is the result of a subconscious need to reject the results or whether it was intentional deception. What is important is that Mr. Moore and many others are profoundly uncomfortable that our government might have inadvertently triggered 650,000 deaths.

    Most days in the US, more than 5,000 people die. We do not see the bodies. We cannot, from our household perspective, sense the fraction from violence. We rely on a functional governmental surveillance network to do that for us. No such functional network exists in Iraq. Our report suggests that on top of the 300 deaths that must occur in Iraq each day from natural causes, there have been approximately 500 "extra" deaths, mostly from violence.
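The daily figures quoted above can be sanity-checked with quick arithmetic. This is a rough sketch: the 650,000 excess-death total is from the text, while the day count for the window from the March 2003 invasion to the mid-2006 survey is my own approximation.

```python
# Rough consistency check of the quoted "approximately 500 extra
# deaths per day" figure. The total comes from the text; the window
# length (March 2003 to mid-2006) is approximated here.
excess_deaths = 650_000
days = 365 * 3 + 120  # roughly March 2003 to mid-2006

deaths_per_day = excess_deaths / days
print(deaths_per_day)  # on the order of 500 "extra" deaths per day
```

The result lands comfortably in the neighborhood of the 500/day the statement cites, so the daily and total figures are mutually consistent.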

    Of any high profile scientific report in recent history, ours might be the easiest to verify. If we are correct, in the morgues and graveyards of Iraq, most deaths during the occupation would have been due to violence. If Mr. Bush's "30,000 more or less" figure from last December is correct, less than 1 in 10 deaths has been from violence. Let us address the discomfort of Mr. Moore and millions of other Americans, not by uninformed speculation about epidemiological techniques, but by having the press travel the country and tell us how people are dying in Iraq.

    Comment by AP — September 17, 2010 @ 9:39 am

  17. More:

    Lancet Study Author Assesses New Report on Iraqi Death Toll
    January 11, 2008

    The World Health Organization reports on findings just published in the New England Journal of Medicine: “A large national household survey conducted by the Iraqi government and WHO estimates that 151,000 Iraqis died from violence between March 2003 and June 2006.”

    Roberts is co-author of a study published in October 2006 by the leading medical journal The Lancet that estimated 655,000 excess deaths following the 2003 U.S. invasion of Iraq.

    The following are excerpts from a statement released by Roberts: “I think that this new article in the NEJM is a good addition to the discussion. … There is far more in common in the results of the two reports than appears at first glance.

    “The NEJM article found a doubling of mortality after the invasion, we found a 2.4 fold increase. Thus, we roughly agree on the number of excess deaths. The big difference is that we found almost all the increase from violence, they found one-third the increase from violence. …

    “This new estimate is almost four times the ‘widely accepted’ [Iraq Body Count] number from June of 2006, our estimate was 12 times higher. Both studies suggest things are far worse than our leaders have reported.

    “There are reasons to suspect that the NEJM data had an under-reporting of violent deaths.

    “They roughly found a steady rate of violence from 2003 to 2006. Baghdad morgue data, Najaf burial data, Pentagon attack data, and our data all show a dramatic increase over 2005 and 2006. …

    “It is likely that people would be unwilling to admit violent deaths to the study workers who were government employees.

    “Finally, their data suggests one-sixth of deaths over the occupation through June 2006 were from violence. Our data suggests a majority of deaths were from violence. The morgue and graveyard data I have seen is more in keeping with our results.”

    The second article suggests that the discrepancy between the Lancet findings and the WHO/NEJM findings has more to do with violent deaths (car bomb, bullet, etc.) versus excess deaths (i.e., a kid dying from an easily treatable illness because the hospital was bombed and/or the family couldn’t get antibiotics). The UN survey found the Iraqi post-invasion death rate to be twice as high as the death rate before the invasion (vs. 2.4 times higher in the Lancet survey) but attributed only 150,000 vs. 600,000 excess deaths to violence. So the argument is not whether 500,000 or 650,000 civilians died due to the Iraq invasion – the UN survey’s total civilian excess death rate isn’t far removed from the Lancet’s total – but how many of those were shot, beheaded, or bombed vs. died by other means such as lack of medical care, drinking contaminated water, etc. In terms of responsibility I don’t see much difference – those people still died as a result of Bush and Blair’s invasion of Iraq. But that part is a matter of opinion; you might see it differently.

    Without even resorting to studies, it seems pretty straightforward and logical. If the death rate in Iraq pre-invasion was 5.5 per 1,000, as in other comparable Middle Eastern nations with high populations of young people (according to here: Baathist Syria had a death rate of 5 per thousand at the time of the invasion), but post-invasion increased by 5.5 people per thousand, that comes out to 148,500 extra deaths per year in a country of 27 million people, using figures from the new WHO study (which are lower than the Lancet’s figures). In 4 years that’s 594,000 extra deaths caused by the invasion, in whichever way.
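The commenter's arithmetic can be reproduced directly. Note that the population, excess rate, and time span are the comment's own assumptions, not independent estimates.

```python
# Back-of-the-envelope check of the comment's figures.
population = 27_000_000   # assumed Iraqi population (from the comment)
rate_per_1000 = 5.5       # assumed post-invasion excess death rate
years = 4

per_year = population / 1000 * rate_per_1000
total = per_year * years
print(per_year, total)  # 148500.0 594000.0
```

The numbers match the comment exactly: 148,500 extra deaths per year and 594,000 over four years.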

    So, as bad as Putin is – and he is bad – the Bush/Blair team is much worse, in terms of how many civilians have died.

    Comment by AP — September 17, 2010 @ 9:40 am

  18. AP, I suggest you read

    It is a serious and well-researched demolition of the Lancet article.

    The problem with the 5.5 deaths per 1,000 figure for Iraq prior to the invasion is that the same Iraqi researcher used by the team that wrote the Lancet article had spent around 10 years claiming massively increased mortality rates in Iraq during the period between the two Gulf wars, due to sanctions against the Saddam regime.

    The same researcher then tries to say that the death rate was the same as Jordan’s?

    Give me a break.

    “Estimates of deaths during sanctions
    Estimates of excess deaths during sanctions vary depending on the source. The estimates vary [24][33] due to differences in methodologies, specific time-frames covered, and practical difficulties.[34] A short listing of estimates follows:
    Unicef: 500,000 children (including sanctions, collateral effects of war). “[As of 1999] [c]hildren under 5 years of age are dying at more than twice the rate they were ten years ago.”[24][35]
    Former U.N. Humanitarian Coordinator in Iraq Denis Halliday: “Two hundred thirty-nine thousand children 5 years old and under” as of 1998.[6]
    Iraqi Baathist government: 1.5 million.[22]
    Iraqi Cultural Minister Hammadi: 1.7 million (includes sanctions, bombs and other weapons, depleted uranium poisoning) [36]
    “probably … 170,000 children”, Project on Defense Alternatives, “The Wages of War”, 20. October 2003[37]
    350,000 excess deaths among children “even using conservative estimates”, Slate Explainer, “Are 1 Million Children Dying in Iraq?”, 9. October 2001.[38]
    “Richard Garfield, a Columbia University nursing professor … cited the figures 345,000-530,000 for the entire 1990-2002 period”[39] for sanctions-related excess deaths.[40]
    Zaidi, S. and Fawzi, M. C. S., The Lancet (1995, estimate withdrawn in 1997):567,000 children.[9]
    Editor (then “associate editor and media columnist”) Matt Welch,[41] Reason Magazine, 2002: “It seems awfully hard not to conclude that the embargo on Iraq has … contributed to more than 100,000 deaths since 1990.”[22][40]
    Former U.S. Attorney General Ramsey Clark: 1.5 million (includes sanctions, bombs and other weapons, depleted uranium poisoning).[42]
    British Member of Parliament George Galloway: “a million Iraqis, most of them children.”[43]
    Economist Michael Spagat: unknowable because “we cannot observe anything resembling a controlled experiment that isolates the pure impact of economic sanctions on child deaths” but “the contention that sanctions had caused the deaths of more than half a million children is … very likely to be wrong.”[9]
    Infant and child death rates

    Iraq’s infant and child survival rates fell after sanctions were imposed.
    A May 25, 2000 BBC article[44] reported that before Iraq sanctions were imposed by the UN in 1990, infant mortality had “fallen to 47 per 1,000 live births between 1984 and 1989. This compares to approximately 7 per 1,000 in the UK.” The BBC article was reporting on a study by the London School of Hygiene & Tropical Medicine, titled “Sanctions and childhood mortality in Iraq”, that was published in the May 2000 Lancet medical journal.[45] The study concluded that in southern and central Iraq, the infant mortality rate between 1994 and 1999 had risen to 108 per 1,000. The child mortality rate, which refers to children between the ages of one and five years, also increased drastically, from 56 to 131 per 1,000.[44] In the autonomous northern region during the same period, infant mortality declined from 64 to 59 per 1,000 and under-5 mortality fell from 80 to 72 per 1,000, which was attributed to better food and resource allocation.
    The Lancet publication[45] was the result of two separate surveys by UNICEF[24] between February and May 1999, in partnership with the local authorities and with technical support from the WHO. “The large sample sizes – nearly 24,000 households randomly selected from all governorates in the south and center of Iraq and 16,000 from the north – helped to ensure that the margin of error for child mortality in both surveys was low,” UNICEF Executive Director Carol Bellamy said.[24]
    In the spring of 2000 a U.S. Congressional letter demanding the lifting of the sanctions garnered 71 signatures, while House Democratic Whip David Bonior called the economic sanctions against Iraq “infanticide masquerading as policy.”[46]
    Arguments over culpability for excess deaths
    The Lancet[45] and Unicef studies observed that child mortality decreased in the north and increased in the south between 1994 and 1999 but did not attempt to explain the disparity, or to apportion culpability: “Both the Government of Iraq and the U.N. Sanctions Committee should give priority to contracts for supplies that will have a direct impact on the well-being of children,” UNICEF said.[24] However, others did attempt to explain this disparity, or use this to apportion culpability. In The Nation, 2001, David Cortright argued that Iraqi government policy, rather than the UN Sanctions, should be held responsible. He wrote:
    The differential between child mortality rates in northern Iraq, where the UN manages the relief program, and in the south-center, where Saddam Hussein is in charge, says a great deal about relative responsibility for the continued crisis. As noted, child mortality rates have declined in the north but have more than doubled in the south-center. … The tens of thousands of excess deaths in the south-center, compared to the similarly sanctioned but UN-administered north, are also the result of Baghdad’s failure to accept and properly manage the UN humanitarian relief effort.[7]
    In The New Republic, 2001, Michael Rubin argued that
    The difference [t]here is that local Kurdish authorities, in conjunction with the United Nations, spend the money they get from the sale of oil. Everywhere else in Iraq, Saddam does. And when local authorities are determined to get food and medicine to their people–instead of, say, reselling these supplies to finance military spending and palace construction–the current sanctions regime works just fine. Or, to put it more bluntly, the United Nations isn’t starving Saddam’s people. Saddam is.[8]
    However, in Reason Magazine, 2002, Matt Welch acknowledged this but replied that the sanctions are not “‘exactly the same’ in both parts of Iraq” because
    Under the oil-for-food regime, the north, which contains 13 percent of the Iraqi population, receives 13 percent of all oil proceeds, a portion of that in cash. Saddam’s regions, with 87 percent of the population, receive 59 percent of the money … none of it in cash. And there are other factors affecting the north-south disparity…[22]
    Author Anthony Arnove also writes that the situation is more complicated:
    Sanctions are simply not the same in the north and south. Differences in Iraqi mortality rates result from several factors: the Kurdish north has been receiving humanitarian assistance longer than other regions of Iraq; agriculture in the north is better; evading sanctions is easier in the north because its borders are far more porous; the north receives 22 percent more per capita from the oil-for-food program than the south-central region; and the north receives UN-controlled assistance in currency, while the rest of the country receives only commodities. The south also suffered much more direct bombing…[47]
    In Significance, 2010, economist Michael Spagat argues that the ICMMS survey, the only one (of four) international sanctions surveys (graphed in his paper) to show a dramatic increase in child mortality, is suspect because of the abusive, manipulative nature of the Iraqi regime. He offers two possible explanations for the north/south discrepancy:
    First, the Kurdish zone was free of Saddam’s control. In the South/centre, though, the reaction of Saddam Hussein’s regime to the sanctions must be part of a full explanation for child mortality patterns in this zone. … A second potential explanation for the strange patterns displayed by the South/ Centre in the [data] is that they were not real but, rather, results of manipulations by the Iraqi government.[9]”

    You can’t have it both ways, AP: Saddam’s Iraq was in no way comparable to Jordan, Syria, et al.

    Saddam stole funds from the oil-for-food program to build his palaces and buy weapons; the Kurds used theirs to look after their people.

    The child mortality rate in the area controlled by Saddam was 131/1,000.

    It looks like the liars at Johns Hopkins/Bloomberg were taking the mortality figures for the Kurdish-controlled north, which was not under Saddam’s control, and using them for the whole country, rather than the horrific figures for the areas under Saddam’s control.

    By the way, the head of the Bloomberg team was suspended for failure to follow his own protocols. If he lied repeatedly about whether or not his researchers recorded personal details, then what else was he lying about?

    From the damning criticism of the methodology:

    “Quoting again from L2: ‘The third stage consisted of random selection of a main street within the administrative unit from a list of all main streets.’ (Burnham et al., 2006a, emphasis added). These lists of main streets are at the core of the claimed sampling methodology. Yet, the L2 authors have refused to provide these lists or even clarify where they came from.12 Without this information we cannot assess the sampling frame for the study (III.3) and we cannot know the sample design fully (III.4).

    Gilbert Burnham did make aspects of the sampling methodology fairly concrete in Biever (2007), an interview with the New Scientist.

    The interviewers wrote the principal streets in a cluster on pieces of paper and randomly selected one. They walked down that street, wrote down the surrounding residential streets and randomly picked one. Finally, they walked down the selected street, numbered the houses and used a random number table to pick one. That was our starting house, and the interviewers knocked on doors until they’d surveyed 40 households…. The team took care to destroy the pieces of paper which could have identified households if interviewers were searched at checkpoints. (Biever, 2007, emphasis added)

    Whatever its strengths or weaknesses, this does seem to be a procedure that can be followed in the field. The L2 authors may no longer be able to specify their sample design since these pieces of paper have been destroyed. But they should be able to supply lists of principal streets or at least specify how many such streets there were per governorate.

    Burnham explains that the sampling information was destroyed to protect the identities of respondents, but this explanation is inadequate. Pieces of paper with lists of principal streets and surrounding streets would be of no use for identifying households included in the survey. Even lists of all of the households on a street that was actually sampled would not be usable for identifying particular L2 respondents. On the other hand, the L2 data-entry form that Riyadh Lafta submitted to the WHO contains spaces for listing the name of each head of household in addition to names of people who died or were born during the L2 sampling period. If the field teams could travel around with pieces of paper containing the names of their respondents plus many of their family members then they did not have to destroy lists of streets. Finally, as noted above in Section 2, the lists of L2’s respondents would have been widely known at the local level in any case.

    The L2 authors have often dismissed the possibility of sampling bias by stating that they did not actually follow the sampling procedures that they claimed to have followed in their Lancet publication. For example, Burnham and Roberts (2006a) write that they had removed the following sentence from their description of their sampling methodology at the suggestion of peer reviewers and the editorial staff at the Lancet:

    As far as selection of the start houses, in areas where there were residential streets that did not cross the main avenues in the area selected, these were included in the random street selection process, in an effort to reduce the selection bias that more busy streets would have. (Burnham and Roberts, 2006a)

    Thus, this part of the description of sampling methodology should have read:

    The third stage consisted of random selection of a main street within the administrative unit from a list of all main streets. A residential street was then randomly selected from a list of residential streets crossing the main street. As far as selection of the start houses, in areas where there were residential streets that did not cross the main avenues in the area selected, these were included in the random street selection process, in an effort to reduce the selection bias that more busy streets would have. (Original text from Burnham et al., 2008, with new text italicised)

    Combining this with Gilbert Burnham’s New Scientist interview already quoted (Biever, 2007) would imply that at each location:

    Field teams wrote names of main streets on pieces of paper and selected one street at random.
    The field teams then walked down this street writing down names of cross streets on pieces of paper and selected one of these at random.
    The field teams then became aware of all other streets in the area that did not cross the main avenues and may have selected one of these instead of one of the cross streets written on pieces of paper. This wide selection was done according to an undisclosed procedure.
    The Biever (2007) description of Burnham does outline a sampling procedure that could have been followed and is broadly consistent with the published methodology. If other types of streets, beyond those that would be covered by the published methodology, were included in the sampling procedures then the authors need to specify how these streets were included. More fundamentally, how did the field teams discover the existence of such streets that could not be seen by walking down principal streets as described by Burnham in Biever (2007)?

    The L2 field teams would not have brought detailed street maps with them into each selected area or else it would not have been necessary to walk down selected principal streets writing down names of surrounding streets on pieces of paper. We can also rule out the possibility that the teams completely canvassed entire neighbourhoods and built up detailed street maps from scratch in each location. Developing such detailed street maps would have been very time consuming and the L2 field teams had to follow an extremely compressed schedule that required them to perform 40 interviews in a day (Hicks, 2006).

    In Giles (2007), an article in Nature, Burnham and Roberts suggested one possible explanation on how the field teams had managed to augment their street lists beyond streets that could be seen by walking down a main street, but this suggestion was rejected by an L2 field team member interviewed by Nature:

    But again, details are unclear. Roberts and Gilbert Burnham, also at Johns Hopkins, say local people were asked to identify pockets of homes away from the centre; the Iraqi interviewer says the team never worked with locals on this issue. (Giles, 2007)

    Even if locals had identified such ‘pockets of homes away from the centre’ the authors still would have to specify how these were included in the randomisation procedures. Indeed, involving local residents in selecting the streets to be sampled would seem to be at odds with the random selection of households. Locals could, for example, lead the survey teams to particularly violent areas.

    Burnham and Roberts have induced further confusion about their sample design by issuing a series of contradictory statements.

    The sites were selected entirely at random, so all households had an equal chance of being included. (Burnham et al., 2006b, emphasis added)

    Our study team worked very hard to ensure that our sample households were selected at random. We set up rigorous guidelines and methods so that any street block within our chosen village had an equal chance of being selected. (Burnham and Roberts, 2006b, emphasis added)

    … we had an equal chance of picking a main street as a back street. (The National Interest, 2006)

    These statements contradict each other and the methodology published in the Lancet. Some streets are much longer than others. Some streets are much more densely populated than others. Such varied units cannot all have equal probability of selection. If, for example, every street block had an equal chance of selection then households on densely populated street blocks would have lower selection probabilities than households on a sparsely populated street block. If main streets are more densely populated on average than are back streets and main streets and back streets have equal selection probabilities then households on main streets would have lower selection probabilities than households on back streets.

    Thus, the L2 survey appears to violate standards III.3 and III.4 of the AAPOR Code of Professional Ethics and Practices.
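The unequal-probability argument in the passage quoted above can be illustrated with a toy example; the street-block sizes here are invented purely for illustration.

```python
# Two street blocks, each selected with probability 1/2, then one
# household drawn uniformly within the chosen block. Household-level
# selection probability then depends on block size, so "equal chance
# per street block" cannot mean "equal chance per household".
households_per_block = {"dense_block": 100, "sparse_block": 10}
p_block = 1 / len(households_per_block)  # each block equally likely

p_household = {name: p_block / n for name, n in households_per_block.items()}
print(p_household)
# a household on the sparse block is 10x more likely to be chosen
```

This is exactly the contradiction the critique points to: equal selection probabilities for streets (or street blocks) of unequal size are incompatible with equal selection probabilities for households.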

    The sampling methods for the ILCS are explained briefly in ILCS (2005a) and in great detail in ILCS (2005b, Appendix 2). The IFHS sampling methods are explained in IFHS (2008a), including in the supplementary appendix. The sampling methods have been well disclosed for these surveys.

    III.5. Sample sizes and, where appropriate, eligibility criteria, screening procedures, and response rates computed according to AAPOR Standard Definitions. At a minimum, a summary or disposition of sample cases should be provided so that response rates could be computed. (AAPOR, 2005)

    L2 does give information on response rates but this information is unlikely to be correct. L2 reports nobody home in 16 households out of 1849 (0.9%) and refusals to participate from 15 households (0.8%). This degree of success seems especially unlikely given the rushed conditions under which the survey was conducted with field teams regularly conducting 40 interviews in a single day.13 L2 methodology did not follow a common practice, employed in several recent surveys in Iraq including the IFHS and the ILCS, of making three visits to a selected household before accepting failure to make contact. For L2, a head of household or spouse had to be present and agreeable for an interview within a single time window of perhaps 20-30 minutes almost without fail with no opportunity for repeat visits. The L2 paper plus a further clarification by Gilbert Burnham also reports that its field teams conducted interviews in 52 clusters and that there was only one security-related failure to reach a selected cluster, which was in the governorate of Wasit.14

    The IFHS gives a rather direct comparison with L2 since the IFHS field work was conducted only a few months after the L2 field work. The IFHS failed to visit 115 out of its 1086 clusters (10.6%) due to security reasons. These problems encountered by IFHS field workers cast doubt on the L2 report of only one failed cluster visit in 52 attempts (1.9%) due to security reasons. Assume that the IFHS success rate in cluster visits (89.4%) is the true rate for L2 and that the results of attempted visits (success or failure) are statistically independent across these attempts. Then the odds against 0 or 1 failed visits out of 52 attempts would be 47 to 1.
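The quoted "47 to 1" figure can be reproduced with a short binomial calculation under the stated assumptions (an 89.4% per-visit success rate and independence across the 52 attempts); the text's figure appears to be the reciprocal of the tail probability.

```python
from math import comb

n = 52     # attempted L2 cluster visits
p = 0.894  # assumed per-visit success rate (the IFHS rate)
q = 1 - p

# Probability of 0 or 1 failed visits out of 52
p_at_most_one_failure = sum(comb(n, k) * q**k * p**(n - k) for k in (0, 1))
odds_against = 1 / p_at_most_one_failure
print(round(odds_against))  # roughly 47, matching the quoted figure
```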

    The IFHS disaggregates its success rates in visiting clusters by governorate: 34.2% (37/108) for Al-Anbar, 67.7% (65/96) for Baghdad, 83.3% (60/72) for Nineveh and 98.1% (53/54) for Wasit. If we take these percentages as the true ones for L2 and again assume independence across visits then the odds against the record of L2 in Baghdad, 12 successes in 12 attempts, are 108 to 1 against. The odds against L2’s five successes in five attempts in both Al-Anbar and Nineveh are, respectively, 214 to 1 and 2.5 to 1 against.15 The compound odds against 22 successful cluster visits in 22 attempts in these three insecure governorates are 57,780 to 1 against. Somewhat strangely, Wasit was the only governorate for which L2 reported a security-related failed cluster visit although the IFHS experience of 53 successes in 54 attempts suggests that such a failure would be improbable.
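The per-governorate and compound figures in this paragraph can be checked directly. Under independence, the odds against an all-success record are (approximately) the reciprocal of the per-visit success probability raised to the number of attempts, and the 57,780 figure is simply the product of the three quoted per-governorate odds.

```python
# Baghdad: IFHS success rate 65/96, and L2 reported 12 successful
# cluster visits out of 12 attempts. Under independence:
p_baghdad_record = (65 / 96) ** 12
print(round(1 / p_baghdad_record))  # about 108, as quoted

# The compound 57,780-to-1 figure is the product of the three quoted
# per-governorate odds (Baghdad 108, Al-Anbar 214, Nineveh 2.5):
print(108 * 214 * 2.5)  # 57780.0
```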

    For clusters actually visited the IFHS failed to make contact 3.4% of the time compared with L2’s rate of 0.9%. Assuming independence across visits and a success probability of 96.6% for each visit, as suggested by the IFHS record, the odds against the L2 report of only 16 failed contact attempts would be more than 500,000 to 1 against.
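Similarly, the "more than 500,000 to 1" claim can be checked with an exact binomial tail under the stated assumptions: 1,849 independent household visits, each failing to make contact with probability 3.4% (the IFHS rate).

```python
from math import comb

n = 1849        # households L2 reported visiting
p_fail = 0.034  # assumed per-visit no-contact rate (the IFHS rate)
p_ok = 1 - p_fail

# Probability of 16 or fewer failed contacts, as L2 reported
p_tail = sum(comb(n, k) * p_fail**k * p_ok**(n - k) for k in range(17))
print(1 / p_tail)  # far larger than 500,000, consistent with the quote
```

The expected number of failures at the IFHS rate would be about 63, so observing only 16 is an extreme tail event, which is the critique's point.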

    Note that the IFHS did not give up on making contact before making three contact attempts. L2, on the other hand, had a compressed work schedule and could not have tried as hard as the IFHS did to make contact. Thus, the IFHS would have been expected to have a substantially lower no-contact rate than L2’s – just the opposite of what was reported by the two surveys.

    L1 (Roberts et al., 2004) was conducted by many of the same people who did L2 and the two studies shared many methodological commonalities, including strong time pressure on the field teams. L1 is, therefore, a good survey to compare with L2. On the other hand, L1 was conducted nearly two years before L2 was done. During the period in between the two surveys a large number of Iraqis were displaced with at least several hundred thousand fleeing abroad. One would expect the not-at-home rate to be higher in 2006 than it was in 2004. Yet L1 reported 64 out of 988 households visited were empty (6.5%).16 Thus, the no-contact rate for L2 was lower by more than a factor of 7 compared to L1’s. If, again, we assume statistical independence across contact attempts and that the L1 no-contact rate of 6.5% applied during the L2 period then the odds against the L2 contact record would be about 7 × 10^14 to 1. In fact, we would have to lower the true L1 no-contact rate from 6.5% to about 1.5%, to even reduce the odds against the reported L2 rate to about 90 to 1.

    The ILCS, done in 2004 like L1, reports an overall failure-to-interview-rate, mixing no contact with refusals, of 1.6%, which is slightly lower than L2’s 1.7%. There are, however, two reasons why we must adjust the ILCS rate upward in order to make an appropriate comparison with L2.

    First, the ILCS made up to three contact attempts per household; on first attempts it failed to complete interviews 2.6% of the time.

    Second, the ILCS expended considerable effort preparing the ground before selecting and contacting households. Specifically, the ILCS teams completely enumerated all the households in each cluster before selecting the particular households to be interviewed. During these enumerations field teams eliminated all housing units the teams determined to be empty.17

    Thus, L2’s 1.7% failure-to-interview rate should be compared with the ILCS’s 2.6% plus some upward adjustment for the percentage of unoccupied housing in 2006. The field work for the IFHS was conducted only a few months after L2’s field work and reported that for 0.8% of its selected households the ‘entire household was absent for [an] extended period’ and 1.3% of the time the ‘dwelling [was] vacant or address not a dwelling’. With an empty-housing adjustment of 2% for the ILCS, an appropriate failure-to-interview rate would be 4.6% for the ILCS compared with 1.7% for L2. Even without this adjustment the odds against the reported L2 experience, using the same methods as before, are 190 to 1. If we add in the adjustment then the odds against the L2 claim rise to nearly 100,000 to 1.

    A recent poll by the American Broadcasting Company (ABC) and other news organisations, ABC (2007a), experienced a no-contact rate of 7% and a refusal rate of 35% (ABC, 2007b). The refusal rate is not strictly comparable to L2’s because the ABC poll’s use of the ‘next-birthday’ method probably made it harder to progress to a successful interview than it was for L2.18 On the other hand, the L2 methodology allowed interviews only with heads of households or their spouses, so some adults who might have been at home when L2 interviewers visited would have been ineligible to respond to the survey. Even if we reduce the 7% no-contact rate reported by ABC by a factor of 4, the odds against the L2 record would still be 934 to 1.

    A recent poll by Opinion Research Business based in London (ORB, 2008) failed to interview (at least on their mortality question) 251 out of 2414 individuals contacted (10.4%), again suggesting that the claimed L2 success rate is unlikely.

    To summarise, these comparisons provide some evidence of fabrication and falsification both in L2’s reported success rates in visiting selected clusters and in L2’s reported contact rates with selected households.

    Also relevant to the disclosure discussion is the fact that an incomplete L2 dataset has been released but only selectively to certain researchers (Kaiser, 2007). Below is the key part of the data disclosure policy of the L2 researchers (Bloomberg School of Public Health, 2007).

    Conditions for the Release of Data from the 2006 Iraq Mortality Study

    The IFHS dataset has not yet been released. The ILCS dataset is obtainable by approaching the Central Organisation for Statistics and Information Technology (Iraq) or COSIT, the Iraqi national statistical office, although it is not easy to obtain.

    Finally, and most importantly on the subject of disclosure, the AAPOR Standards Committee formally investigated the L2 survey and formally censured L2 lead author Gilbert Burnham for refusing to disclose the L2 funding source, questionnaire, consent script, sample design and other fundamental pieces of information on the survey, thereby stifling further investigation by the Committee (AAPOR, 2009a & b).


    In this section I discuss a varied body of evidence of fabrication and falsification in the L2 data, in the L2 paper and in reports of L2 results. I have already presented some of this evidence in the previous section. I stress the evidence of fabrication/falsification in response rates and in success rates in visiting selected clusters, as well as the failure to properly disclose many aspects of the study, including the wording of questions, the data-entry form, the sample design and data matching anonymised interviewer IDs with particular interviews. In the next subsection I take a different tack, looking at evidence that L2’s results were extrapolated from two previous studies. The main exhibit is the following graphic.

    Some Evidence of Extrapolation of the L2 Results from Previous Studies

    Figure 2 shows results from three mortality surveys.19 The first is the Kosovo study of Spiegel and Salama (2000). This paper is cited in Roberts et al. (2004), Burnham et al. (2006a) and Burnham et al. (2006b). Thus this is a paper that the L2 authors know well.

    FIGURE 2. Some evidence that the violent-death estimate for L2 was extrapolated from two previous surveys.
    There was an exchange of letters in the Lancet of 13 January 2007. Guha-Sapir et al. (2007) questioned the L2 finding that roughly 90% of all excess deaths in Iraq were violent, contrary to findings in other war studies such as those done on the Democratic Republic of Congo (DRC). The L2 authors responded:

    We feel a better comparison would be to the data collected during that war which showed that 1.8% of the 19.9 million people in the eastern part of the country died of violence in the first 33 months of the conflict, a proportion similar to that measured in Iraq. (Burnham et al., 2007)20

    To back up this claim they cite Roberts et al. (2001), a study of the DRC. This is the second point in Figure 2. The third and final data point is L2 itself.

    The three studies are in near-perfect alignment. A regression line drawn through them has an R-squared of 0.9996. One could make a slightly different assumption and feed in slightly different numbers but under any plausible scenario the fit is nearly perfect with an R-squared of at least 0.99. All of these studies have quite large confidence intervals so the chances of their central estimates lining up so well would appear to be very small.
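The near-perfect fit can be checked directly with ordinary least squares. In the sketch below the x-coordinates (months of conflict: roughly 18 for Kosovo, 33 for the DRC and 41 for L2) and the y-coordinates (percentage of the population killed) are approximations read off the text and Figure 2, so the resulting R-squared differs slightly from the published 0.9996.

```python
# Three central estimates: (months of conflict, % of population killed).
points = [(18, 0.8), (33, 1.8), (41, 2.3)]
n = len(points)
mx = sum(x for x, _ in points) / n
my = sum(y for _, y in points) / n
sxx = sum((x - mx) ** 2 for x, _ in points)
sxy = sum((x - mx) * (y - my) for x, y in points)
syy = sum((y - my) ** 2 for _, y in points)

# R-squared of the least-squares line through the three points.
r_squared = sxy**2 / (sxx * syy)  # ~0.9997 with these assumed coordinates
```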

    The Kosovo and DRC studies were in the literature for several years before L2 was done. Draw a line through these first two central estimates and the slope suggests that an additional 15 months of conflict will result in the deaths of an additional 1% of the population. Extending the line, the eight months by which the L2 period exceeds the DRC period would bring the total percentage killed during the L2 period to just over 2.3%. The facts that the L2 authors cite the DRC study as being similar to L2 in terms of the number of months and the percentage of the population killed, and that the L2 authors are well aware of the Kosovo study, reinforce the relevance of the graph.21

    Professor Mark van der Laan of the University of California Berkeley quantified the probability of the three points lining up the way they do due to pure chance as 0.036. This is based on a simulation taking 100,000 draws of three points with normal distributions and respective means and standard errors of (0.8, 0.21), (1.8, 0.4) and (2.3, 0.4) where the standard errors are suggested by the published studies (R code available upon request). Thus, this three-point diagram (Figure 2) provides statistical evidence of data falsification although it is not definitive; we reject the hypothesis that the alignment arose by chance at the 5% level but not at the 1% level.
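Van der Laan's R code is not reproduced here, but the simulation is straightforward to sketch. This reconstruction is mine, not his: it assumes x-coordinates of 18, 33 and 41 months and takes "lining up" to mean that the middle point falls at least as close to the chord through the outer two points as the published central estimates do. Under those assumptions the simulated probability lands near the reported 0.036.

```python
import random

random.seed(12345)

xs = (18, 33, 41)        # assumed months of conflict for the three studies
means = (0.8, 1.8, 2.3)  # central estimates (% of population killed)
ses = (0.21, 0.4, 0.4)   # standard errors suggested by the published studies

w = (xs[1] - xs[0]) / (xs[2] - xs[0])

def deviation(y1, y2, y3):
    # Vertical distance of the middle point from the chord through the outer two.
    return abs(y2 - ((1 - w) * y1 + w * y3))

observed = deviation(*means)
draws = 100_000
hits = sum(
    deviation(random.gauss(means[0], ses[0]),
              random.gauss(means[1], ses[1]),
              random.gauss(means[2], ses[2])) <= observed
    for _ in range(draws)
)
p_chance = hits / draws  # roughly 0.03-0.04 under these assumptions
```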

    Risk Factors for Interviewer Fabrication

    AAPOR and ASA (2003), a joint document of AAPOR and the American Statistical Association (ASA), lists risk factors for data fabrication by interviewers. Most of them are present in L2. Here is the list of risk factors with commentary on their relationship to L2.

    a Hiring and training practices that ignore fabrication threats

    I am not aware of any information concerning hiring practices for L2. L2 states that the interviewers were all medical doctors with ‘previous survey experience and community medicine experience and were fluent in English and Arabic’ but does not explain how they were hired. L2 further states that there was a two-day training session for the field workers but the L2 researchers have refused to disclose any information on the content of these sessions other than that interviewers were ‘trained in the use of the questionnaire’ (Burnham et al., 2006b). There is no evidence of any attention to fabrication threats in any training or hiring practices.

    I have no information on hiring practices for the ILCS or the IFHS. ILCS (2005b) and IFHS (2008a, supplementary appendix) are clear that training and field testing for both surveys were extensive although they contain no information on the content of the training.

    b Inadequate supervision

    None of the US-based authors were in Iraq when the field work was conducted so none of them could have provided meaningful supervision. Burnham et al. (2006a) does not claim that the US-based authors did supply any field supervision. The paper simply states that Riyadh Lafta was the field manager and supervisor. There is no information on how Lafta discharged these duties. Moreover, Lafta is not available to answer questions about how he supervised the L2 field work. He has a policy of not responding to any questions from journalists and his only interaction with researchers on this subject of which I am aware was an off-the-record meeting at the WHO at which he submitted his data-entry form. The US-based L2 researchers do not facilitate contacts with Riyadh Lafta (Munro and Canon, 2008).

    The IFHS employed 112 two-person (male-female) interview teams and 100 supervisors: 21 central, 20 local and 59 in the field (IFHS, 2008b). The ILCS had five-person interview teams, each with its own supervisor (ILCS, 2005a) with additional supervision and visits from COSIT, the Iraqi statistical department, and Fafo, the Norwegian institute that was in charge of the study.

    The AAPOR/ASA document discusses supervisory methods that can be employed to prevent fabrication but there is no evidence that Riyadh Lafta employed any of these methods. These methods include:

    i Observational methods

    This means monitoring interviews. L2 had two field teams consisting of four interviewers who are said to have divided into sub-teams of two for actual interviewing. Thus, it was possible for Riyadh Lafta to monitor up to about 25% of all the interviews. There have, however, been no indications that Lafta actually did any such monitoring.

    ii Recontact methods

    These methods can involve physically revisiting households that were supposed to have been interviewed or simply calling them on the telephone or writing to them through the mail. These recontacts can be used to check data that have been collected or simply to check that interviews were actually conducted. L2 did not use any recontact methods. Furthermore, the apparent destruction of records on where interviews were conducted means that recontact of households that were interviewed for L2 was never and will never be possible.

    iii Data analysis methods

    These methods can involve the identification of suspicious patterns by particular interviewers. The L2 authors have not published any evidence that they used such methods and have refused to cooperate with other people, such as Fritz Scheuren of NORC, who have wanted to apply them. As noted above, the L2 authors refuse to release data with anonymised interviewer IDs matched to the results of interviews.

    Collection and analysis of demographic information on respondents and their families is another important, and commonly used, check against fabrication. But the L2 study did not collect demographic information on households other than the number of males and the number of females contained in each one (with some omissions).

    iv Selection procedures

    The document states that ‘typically 5-15% of the interviews are monitored and/or recontacted’. But L2 apparently did not have any monitoring and had no recontact. Of course, field teams would have been well aware of the lack of supervision in the study and might have acted accordingly.

    All of the above four supervisory methods were employed by the ILCS and the IFHS. Note, in particular, that both surveys collected data matching interviews with anonymised interviewer IDs, and this information is present in the ILCS dataset that has been released.

    c Lack of concern about interviewer motivation

    I found no evidence of concern about interviewer motivation in the L2 study. The L2 authors have not disclosed any information about their interviewers, other than the phrase quoted above under ‘point a’. On the other hand, I also did not find evidence of concern about interviewer motivation in materials released by the ILCS or the IFHS.

    d Poor quality control

    I have already discussed the lack of quality control in the collection of the data. The lack of quality control in the L2 dataset itself has been well-documented, including numerous errors, omissions and inconsistencies.22 Data that are sometimes missing include household sizes (13 times), months in which deaths occurred (57 times), and the number of males and females in each household (55 times).23 The dataset usually gives household sizes in 2002 and 2006 plus births, deaths, immigration to and emigration from the households but for 14% of all households the identity,

    Household size 2006 = Household size 2002 + births – deaths + in-migration – out-migration

    does not hold. Occasionally the identity fails by a wide margin. The L2 paper states:

    The interviewers then asked about births, deaths, and in-migration and out-migration, and confirmed that the reported inflow and exit of residents explained the differences in composition between the start and end of the recall period.

    Thus, these inconsistencies should have been filtered out in the field but often were not.
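The balancing identity is easy to check mechanically, which underscores how little quality control would have been needed to catch these failures. The sketch below uses hypothetical field names, since the released dataset's actual column names are not given in the text.

```python
def identity_holds(hh: dict) -> bool:
    """True when a household record satisfies L2's balancing identity:
    size_2006 = size_2002 + births - deaths + in_migration - out_migration."""
    expected = (hh["size_2002"] + hh["births"] - hh["deaths"]
                + hh["in_migration"] - hh["out_migration"])
    return hh["size_2006"] == expected

# A consistent record passes; one with an unexplained extra member fails.
ok = identity_holds({"size_2002": 7, "births": 1, "deaths": 1,
                     "in_migration": 0, "out_migration": 2, "size_2006": 5})
bad = identity_holds({"size_2002": 7, "births": 1, "deaths": 1,
                      "in_migration": 0, "out_migration": 2, "size_2006": 6})
```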

    In L2’s single cluster in the governorate of Al-Tameem, data on the number of males and the number of females are missing for all 40 households. This can be viewed as another quality-control issue; someone should have spotted this deficiency and sent field workers back to this cluster to gather the missing data. Note, however, that field teams consisting of four people are said to have worked in pairs. This means that one pair should have done approximately 20 of the households in the cluster with the other pair doing the other 20. It is implausible that both pairs would have separately forgotten to record the number of males and females for their entire half of the cluster. Moreover, if these pairs were actually using the data-entry forms that Riyadh Lafta submitted to the WHO it seems unlikely that they could have gone through 20 interviews without realising that they were not filling in the box for gender information. Thus, perhaps interviews were not really conducted as described in the Al-Tameem cluster.

    I am not aware of any similar indicators of poor quality in the ILCS or IFHS mortality data.

    e Excessive workload

    L2 imposed an extraordinary workload on its field workers (Hicks, 2006). Field teams were routinely expected to conduct 40 interviews in a single day. Moreover, it is claimed that the two field teams completed 52 clusters (40 interviews per cluster) in just 52 days of field work. To accomplish this task the teams had to travel all over Iraq during one of the most violent periods of the conflict, encumbered by checkpoints and poor transportation infrastructure in a country that had experienced, over the last three decades, three wars and strict economic sanctions.

    The IFHS had 112 interview teams conduct 9345 interviews in 971 clusters spread over four months. This works out to about two interviews every three days per team on average, with a team completing a cluster of ten households roughly every two weeks. These teams were supported by 100 supervisors and 55 data-entry staff. The ILCS had 500 workers but does not give a breakdown. Since the ILCS sample size was more than twice that of the IFHS and the ILCS was largely conducted within two months, it would appear that ILCS interviewers experienced more time pressure than IFHS interviewers. However, time pressure on L2 interviewers would have been much greater than in either the IFHS or the ILCS.
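The per-team IFHS workload figures follow from simple arithmetic; the sketch below assumes roughly 120 field days for the four-month period, which is my approximation rather than a figure from the IFHS reports.

```python
teams, interviews, clusters = 112, 9345, 971
field_days = 120  # assumption: ~4 months of field work

# About 0.7 interviews per team per day, i.e. roughly two every three days.
interviews_per_team_per_day = interviews / teams / field_days

# Each team handled ~8.7 clusters over ~17 weeks: about one every two weeks.
weeks_per_cluster = (field_days / 7) / (clusters / teams)
```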

    f Inadequate compensation

    g Piece-rate compensation as the primary pay structure

    To my knowledge there is no information available on how the field teams were compensated for L2, the ILCS or the IFHS.

    h Off-site isolation of interviewers from the parent organization

    The parent organization for L2 is Johns Hopkins University so there was indeed off-site isolation of interviewers from the parent organization. No one from the parent organization was present in Iraq during the L2 field work. The IFHS and ILCS did not suffer from such off-site isolation.

    To summarise, most of the risk factors for fabrication identified in the AAPOR/ASA document were present in the L2 study. Some, such as excessive workload, were present, arguably, to an extreme degree. Other factors may not have been present but cannot be ruled out based on the information that is currently available. Of course, the presence of so many risk factors for fabrication does not prove that fabrication actually occurred. Nevertheless, the above discussion demonstrates that the L2 project appears to have operated virtually without defences against fabrication. As Fritz Scheuren of NORC pointed out: ‘They failed to do any of the [routine] things to prevent fabrication’ (Munro and Canon, 2008).

    A Work Schedule that Appears to be Impossible without Ethical Transgressions

    The key reference on this is Hicks (2006), developing ideas that were first expressed by Bohannon (2006). This paper makes concrete the many things that L2 field teams needed to accomplish at each household and argues that it is implausible that the teams could have worked on such a punishing schedule while maintaining acceptable ethical standards.

    Additional factors beyond those covered in the Hicks paper add further grounds for scepticism that the L2 study could have been performed as claimed. The sampling routines described above would have been time consuming. At each cluster a field team needed to walk down a main street writing down the names of cross streets and then select one at random. The team would then have had to walk the length of the selected cross street, enumerating all of the houses on it so that one could be chosen at random as the starting point. If we accept that field teams somehow included streets that were not cross streets to main streets then even more time would have been spent locating these other streets. In addition, travelling between clusters while navigating checkpoints along a road system degraded by years of conflict and sanctions would also have been very time consuming.

    L2 Estimates Compared with those of Other Surveys24

    In this section I compare the distribution of violent deaths nationally and by governorate in L2 with the distribution of ‘war-related deaths’ in the ILCS (ILCS, 2005a) and with violent deaths in the IFHS (IFHS, 2008a). I also make some use of the database of the Iraq Body Count (IBC) project.25

    The ILCS, supported by the United Nations Development Program in Iraq, estimated 24,000 ‘war-related deaths’ with a 95% Confidence Interval (CI) of 18,000 to 29,000 based on field work conducted mainly between 22 March 2004 and 25 May 2004. The ILCS had a recall period of two years so it covered slightly more than a year after the invasion of Iraq and slightly less than a year before the invasion.

    First, note that non-violent death rates for L2 and the ILCS are quite similar: 4.5 and 4.8 per 1000 per year for the ILCS period respectively. L1’s non-violent death rate of 5.3 per 1000 per year is also close to the non-violent death rates for L2 and the ILCS.

    But the violent-death estimates of L2 and the ILCS diverge dramatically. Even taking L2 only through 31 March 2004, eight weeks before the ILCS field work was completed, the L2 central estimate exceeds the ILCS one by nearly a factor of 3 (see Table I). This becomes almost a factor of 4 if we include April and May for L2 (see Table II).

    TABLE I Violent Deaths: ILCS vs. L2 – March 2004. Columns: ILCS lower CI limit; ILCS central estimate; ILCS upper CI limit; L2 central through 31 March 2004; (L2 central)/(ILCS upper limit). Note: Figures in bold type and those underlined particularly illustrate the author’s argument. [Table body not reproduced here.]

    TABLE II Violent Deaths: ILCS vs. L2 – May 2004. Columns: ILCS lower CI limit; ILCS central estimate; ILCS upper CI limit; L2 central through 31 May 2004; (L2 central)/(ILCS upper limit). Note: Figures in bold type and those underlined particularly illustrate the author’s argument. [Table body not reproduced here.]

    The IFHS is suitable for comparing with L2 because it includes almost exactly the same coverage period.26 The IFHS gives a central estimate of 151,000 violent deaths with a 95% CI of 104,000 to 223,000. The central estimate of L2 for violent deaths exceeds that of the IFHS by a factor of 4 and even the bottom of the L2 CI is nearly twice the top of the IFHS CI. The factor-of-4 difference translates into 450,000 additional deaths in the L2 estimate above the IFHS estimate.

    Even this formulation understates the difference between the two surveys. Using conventional estimation methods the IFHS estimate for violent deaths would have been below 100,000. The IFHS paper argues that conflict mortality surveys tend to underestimate violent deaths and adjusts its conventional estimate up to 151,000. If this is right then, for a proper comparison, either the L2 estimate should be adjusted up similarly to how the IFHS estimate was adjusted up or we should compare unadjusted IFHS figures with unadjusted L2 figures. Making the latter comparison suggests at least a factor-of-six difference between L2 and the IFHS. Indeed, L2 estimated a violent mortality rate of 7.2 per 1000 per year compared with a rate of 1.09 in the IFHS. These two estimates differ by a factor of 6.6. This translates into an L2 estimate that exceeds an unadjusted IFHS estimate by well over half a million violent deaths.
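The factor-of-6.6 figure is simply the ratio of the two violent mortality rates quoted above:

```python
l2_rate = 7.2     # violent deaths per 1000 per year, L2
ifhs_rate = 1.09  # violent deaths per 1000 per year, IFHS (unadjusted)
ratio = l2_rate / ifhs_rate  # ~6.6
```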

    It is clear from much of the discussion above that the IFHS and the ILCS had more rigorous quality control than did L2. Both the IFHS and the ILCS are also much larger surveys than L2. The IFHS interviewed 9345 households in 971 clusters and the ILCS interviewed 21,668 households in 2200 clusters compared to (as actually used) 1849 households in 47 clusters for L2. In short, the ILCS and the IFHS are bigger and higher-quality surveys and both suggest that L2 has overestimated violent deaths by a wide margin.

    I now compare the geographical patterns of deaths in the ILCS and L2. Table I shows that L2 and the ILCS agree rather well on violent deaths in the North and in the South.27 In Baghdad, L2 looks rather high compared with the ILCS but not exceptionally high. However, in the central governorates L2 is very high indeed. Even when we allow only L2 deaths occurring before April 2004, L2 still exceeds the upper limit of the ILCS CI by more than a factor of 7. This becomes a factor of 23 in Diyala governorate.

    Table II shows how much more L2 diverges from the ILCS when we extend L2 through to the end of May 2004.

    To summarise the patterns:

    Non-violent deaths match up well, ILCS versus L2.
    Violent deaths also match up well between the two surveys in the North and in the South.
    In Baghdad L2 is definitely high for violent deaths but not dramatically out of line with the ILCS.
    In the centre L2 has far more violent deaths than the ILCS.
    The ILCS seems to perform perfectly well relative to L2 in discovering non-violent deaths throughout Iraq. The ILCS also seems to be just as capable as L2 in discovering violent deaths in the North and South. Therefore, we cannot argue that the ILCS, perhaps due to weaknesses in its questionnaire, was not as good as L2 in finding deaths that have truly occurred. The discrepancy only arises for violent deaths in one particular region where the sudden large distance of L2 from the ILCS casts doubt on L2.

    This surplus of violent deaths in a single region should be viewed within the context of the refusal of the L2 authors to release data tying households to anonymised interviewer IDs. It is possible that a single interview team did all or many of the clusters into which so many of L2’s violent deaths are packed.

    The IFHS-L2 comparison also seems to confirm the L2 pattern of the lumping of deaths into the central governorates, although data are not yet available to repeat the precise L2-ILCS comparisons presented above. Figure 1 of the IFHS paper shows that L2 places about 26% of its violent deaths in Baghdad compared to 54% for the IFHS. About 65% of L2’s deaths are in governorates in the centre and south (Al-Anbar, Diyala, Nineveh, Salahuddin, Babylon and Basra), according to the classifications of the above tables, compared with about 35% for the IFHS.

    Figure 1 of the IFHS paper also shows that the geographical pattern of deaths in the IBC database, which is based primarily on monitoring of the international media, is consistent with that of the IFHS but not with L2.

    The IFHS paper also compares its estimates with L2’s for three different time periods. The ratio of the two studies’ violent mortality rates is 1.8 (not statistically different from 1) for March 2003 to April 2004, 4.2 (highly significant) for May 2004 to May 2005 and 7.2 (highly significant) for June 2005 to June 2006. In short, L2 exhibits an extremely sharp upward trend over time compared to the relatively flat trend exhibited by the IFHS.28

    Both the geographical and the temporal heaping of deaths in L2 are consistent with a hypothesis of fabricated/falsified data. The large divergence of L2 from the IFHS comes after the time periods covered by the two main surveys that existed when L2 was published: L1 and the ILCS. If falsified violent deaths were added into the L2 dataset it would make sense to add most of them after the time period for which comparisons with other surveys were possible at the time L2 was published. This could explain why L2 diverges from the IFHS much more strongly after the ILCS/L1 period than it does before.

    L2’s geographical departures from the ILCS and the IFHS come in governorates that are known to be violent but that are outside of Baghdad. L2 researchers knew that their estimates would be compared to the IBC’s counts. A case can be made that the international media, the main source for IBC, covers Baghdad better than it covers other parts of the country. This may or may not be true but it certainly sounds plausible.29 If we accept the idea of Baghdad bias in IBC data then adding many falsified violent deaths into the Baghdad clusters of L2 would create a very large L2/IBC divergence in Baghdad, which would have been flagged as suspicious. Adding falsified deaths into zones known to be peaceful, such as the Kurdish area, would also have raised suspicions. A better strategy would be to add falsified deaths into acknowledged violent areas outside of Baghdad, that is, the central governorates of Al-Anbar, Diyala, Nineveh and Salahuddin, where L2 is so far out of line with the other data sources. The geographical pattern of deaths in L2 is, therefore, not inconsistent with a falsification hypothesis.

    Finally, note that the L2 paper claims that L1 and L2 confirm each other but Gourley et al. (2007) documents that this claim does not withstand scrutiny. The L2 data suggest roughly twice as many violent deaths during the L1 coverage period as were estimated in L1.

    Cluster 33

    The following anomaly was discovered by Olivier Degomme and Debarati Guha-Sapir of the Centre for Research on the Epidemiology of Disasters (CRED) in Belgium. They found that 24 people in a single cluster of the L2 dataset, Cluster 33 in Baghdad, were killed by car bombs in July 2006.30 L2 field work finished on 10 July 2006, so these deaths must have occurred between 1 and 10 July 2006. During this period IBC recorded separate car bombings that killed 68, 17-19, 10-12, 6, 5 and fewer people in the neighbourhoods of Sadr City, Adhamiya, Jameela, Mansour and Al-Bayaa respectively, plus other places around Baghdad. It is crucial to note that, according to the L2 methodology, in each cluster a field team interviewed 40 contiguous households. It is, therefore, exceptionally implausible that so many close neighbours could have been killed in multiple car bombings in different neighbourhoods of Baghdad within a single 10-day window.31 Thus, the most favourable interpretation for L2 is that all 24 victims were killed in the very large car bombing in Sadr City on 1 July (BBC 2006), and so I will assume this.

    The pictures at BBC (2006) show rather clearly that there was not a line of homes destroyed.32 It would seem to be virtually impossible for a group of 24 people coming from 18 separate homes located more or less right next to each other to all have been walking around the market clustered so close to one another when the bomb exploded. It is hard to imagine how this could have happened unless this large group of people all set out together for the market and then circulated through the market doing their shopping while holding hands. It seems likely that all or most of these deaths in the L2 dataset are fabricated.

    Recall the evidence already presented on security-caused failures to visit clusters, L2 versus IFHS. I argued that the L2 claim of 12 successful Baghdad visits in 12 attempts was highly unlikely given the 67.7% success rate in cluster visits of the IFHS in Baghdad. Cluster 33 adds a specifically suspicious cluster to the general cloud that hangs over all of L2’s Baghdad clusters in light of the IFHS.

    It is important to see the anonymised interviewer IDs for all the clusters in L2 and to check the extent to which the same interviewers might have been involved in both cluster 33 as well as in other suspicious clusters, particularly in the governorates of Diyala, Al-Tameem, Al-Anbar, Nineveh and Salahuddin. Unfortunately, the L2 authors continue to withhold these data.

    Death Certificates

    The very high rates of violent deaths measured in L2 have been defended on the grounds that a high percentage of the deaths recorded by L2 were confirmed through death certificates. According to the L2 paper and Burnham (2007):

    Field teams requested death certificates for 545 of 629 deaths (87%).
    When field teams did not request death certificates this was because they ‘forgot’ (Burnham, 2007).
    When requested, respondents produced death certificates 501 out of 545 times.
    ‘The pattern of deaths in households without death certificates was no different from those with certificates’ (Burnham et al., 2006a).
    The claim that a very high percentage of the deaths in the sample were confirmed by death certificates has been central to the defence of L2 from the beginning. Given the strong unpopularity of the US-led occupation of Iraq it is easy to imagine that many respondents might have invented deaths.33 Less dramatically, it seems likely that people might have reported deaths of extended family members who did not reside within the households of respondents. Very few respondents, and perhaps not even all of the interviewers themselves, would understand the statistical imperative to limit household boundaries clearly. To the contrary, many people may feel a need to ‘bear witness’ to atrocities that have been visited on their friends and relatives. Many people may believe that the correct and moral thing to do is to report deaths of friends and family members. Such people might be baffled by the concept that somehow it is improper to report the death of, for example, a dear cousin.

    L2 largely pre-empted such lines of criticism by claiming that their teams requested death certificates for 545 out of 629 (87%) deaths and respondents were able to produce them in 501 out of these 545 cases (92%).
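    The headline percentages follow directly from the reported counts; a minimal sketch of that arithmetic:

```python
# Death-certificate confirmation rates implied by the counts reported in L2.
deaths = 629       # total deaths recorded by L2
requested = 545    # deaths for which a certificate was requested
produced = 501     # deaths for which a certificate was produced on request

print(round(100 * requested / deaths))    # 87 (% of deaths with a request)
print(round(100 * produced / requested))  # 92 (% produced when requested)
print(round(100 * produced / deaths))     # 80 (% of all deaths confirmed)
```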

    There are, however, some reasons to question the high rate of death-certificate confirmation reported in L2.

    The very high number of estimated deaths in L2 implies that the official death certificate system has issued, but failed to record the issuance of, about 500,000 death certificates during the L2 coverage period.34 This forces L2 into a very delicate balancing act. For the death-certificate data to be valid it must be the case that Iraqi authorities issue death certificates for virtually all violent deaths and yet that same system fails to record the fact that death certificates have been issued roughly 90% of the time. Alternatively, it could be that the Iraqi Ministry of Health is engaged in a massive and highly successful cover-up of deaths that have actually been documented through death certificates. This seems unlikely.
    L2 had an extremely compressed work schedule. Field teams routinely had to complete 40 interviews in a day. This means that respondents had to produce these death certificates almost without fail and within a matter of minutes. In many cases these documents would not have been accessed for several years prior to an L2 interview.
    In L1, the previous Lancet publication on Iraq by (mostly) the same team, the claimed rate of death certificate confirmation upon request was substantially lower than in L2: 80% when requested in L1 compared with 92% when requested in L2. The coverage period for L2 is nearly two years longer than the recall period for L1 so it should have been, if anything, harder to confirm deaths through death certificates in L2 compared to L1. Moreover, a significant fraction of the population had migrated during the time between the two studies with, presumably, at least some death certificates mislaid or buried among other belongings during these movements.
    With the release of some L2 data it became possible to examine L2’s death-certificate claims further. Here are some relatively new findings on death certificates mixed with some older discoveries from Kane (2007).

    In Table III ‘no’ means that a death certificate was requested but not produced, ‘yes’ means that a death certificate was requested and produced and ‘forgot’ (consistent with Gilbert Burnham’s MIT lecture) means that a death certificate was not requested. It is clear that, contrary to the claims of L2, the pattern of deaths with death certificates does differ from the pattern of deaths without them.

    For violent deaths, all failures to produce death certificates when asked were in a single governorate, Nineveh, whereas for non-violent deaths these failures were spread across eight governorates. It is implausible that the system of issuing death certificates and families taking care of them is nearly perfect in all but one governorate in the case of violent deaths whereas these systems are less reliable for non-violent deaths in eight governorates.
    ‘Forgetting’ to ask, or simply not asking, was far more common in Baghdad than outside Baghdad and six times more likely overall for non-violent deaths than for violent deaths (Kane, 2007).
    Baghdad, Nineveh and Thi-Qar all display strange patterns and need to be examined more closely.
    TABLE III Death-Certificate Confirmation and Non-Confirmation of Deaths in L2

    Column headings: No, Violent | No, Non-Violent | Yes, Violent | Yes, Non-Violent | Forgot, Violent | Forgot, Non-Violent. [The table’s data cells are not recoverable in this copy.]

    Note: Figures in bold type and those underlined particularly illustrate the author’s argument.

    Under a variety of reasonable assumptions the perfect run of 180 death certificate confirmations in 180 attempts for violent deaths outside Nineveh appears to be extremely unlikely, for example:35

    Using the death-certificate confirmation rate for L1 of 80% and assuming statistical independence across deaths, the odds against 180 confirmations in a row are 2.7 × 10^17 to 1. In fact, a more direct comparison is possible for the violent deaths recorded in L2 and occurring during the L1 coverage period, i.e. through September 2004. L2 claims a perfect record of 60 confirmations in 60 attempts for violent deaths during the L1 sampling period, for which we can calculate odds of more than 650,000 to 1 against.
    Using the confirmation rate for non-violent deaths in L2 of 92%, the odds against are more than three million to 1.
    Even if we arbitrarily and implausibly assume a 0.98 probability that death certificates can be produced for each violent death we still get odds of 38 to 1 against.
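    Under the stated assumption of independent confirmations, the quoted odds can be reproduced with a short calculation (odds against are approximated as the reciprocal of the probability of a perfect run):

```python
# Odds against a perfect run of n death-certificate confirmations,
# assuming each confirmation is an independent event with probability p.
# The odds against are approximated as 1 / P(perfect run).
def odds_against_perfect_run(p: float, n: int) -> float:
    return 1 / p ** n

# L1's 80% confirmation rate applied to the 180 violent deaths outside Nineveh
print(f"{odds_against_perfect_run(0.80, 180):.1e} to 1")   # ~2.8e+17 to 1

# The same 80% rate applied to the 60 violent deaths from the L1 period
print(f"{odds_against_perfect_run(0.80, 60):,.0f} to 1")   # more than 650,000 to 1

# L2's own 92% confirmation rate for non-violent deaths, over 180 deaths
print(f"{odds_against_perfect_run(0.92, 180):,.0f} to 1")  # more than 3 million to 1

# A deliberately generous 98% confirmation probability
print(f"{odds_against_perfect_run(0.98, 180):.0f} to 1")   # 38 to 1
```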
    I conclude that there is likely fabrication in the death-certificate data in L2 and that these data do not give reliable support to L2’s very high estimated death rate.

    Cluster 34

    As noted in the section on death certificates, L2 reports that its respondents failed to produce death certificates when asked only 22 times regarding violent deaths. All 22 of the missing death certificates for violent deaths occurred in the governorate of Nineveh. L2 has five clusters in Nineveh. One of these, Cluster 34, contains 19 of these 22 confirmation failures.

    Cluster 34 contains 42 deaths, 35 of which are classified as violent. These violent deaths break down into 18 by ‘air strike’, 10 from ‘gunshot’, 4 from ‘car bombs’, 1 from ‘fight’, 1 from ‘crushed, US army vehicle’ and 1 from ‘bomb’.

    The 18 deaths in air strikes, which could only be due to the USA, contribute about 36,000 deaths to L2’s central estimate of 600,000 violent deaths. According to the L2 dataset none of these deaths were confirmed by a death certificate. For seven of the 18 the interviewers forgot to, or simply did not, ask for death certificates. These seven were in a single household that reported deaths of two girls, three boys and two women (one aged 17), due to an air strike, taking the specific form of a ‘missile on home’ in November 2005.
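    The 36,000 figure follows from the extrapolation factor implied by L2’s design: a central estimate of about 601,000 violent deaths built from roughly 300 sampled violent deaths gives about 2,000 national deaths per sampled death. A sketch of that arithmetic (the 300-death sample count is an approximation, used here only for illustration):

```python
# Each violent death in the L2 sample scales to roughly 2,000 deaths in the
# national central estimate: ~601,000 estimated violent deaths from roughly
# 300 sampled violent deaths (the 300 is an approximate figure).
SCALE = 601_000 / 300                    # ~2,000 national deaths per sampled death

air_strike_deaths = 18                   # air-strike deaths in Cluster 34
print(int(round(air_strike_deaths * SCALE, -3)))   # 36000
```

    The same factor reproduces the other extrapolations in this section: 24 deaths attributed to the US military scale to about 48,000, and 5 shooting deaths to about 10,000.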

    For all of the remaining 11 deaths from air strikes in Cluster 34 it is reported that interviewers asked to see death certificates but respondents were unable to produce any. These include a second household that reported deaths in November 2005, of two boys under the age of five, possibly in the same event as the above ‘missile on home’ that is claimed to have killed seven women and children in the same month. The L2 dataset claims four further air strikes in Cluster 34. These events were in June 2005, killing two men in a single household; in October 2005, again killing two men in a single household; in December 2005, killing one girl; and in March 2006, killing two men in one household and two girls in another household.36

    Cluster 34’s 18 deaths in air strikes are spread over seven households in five different months. Thus, according to the L2 data, there were at least five separate air strikes on this small neighbourhood of 40 contiguous households over a ten-month period between June 2005 and March 2006. All of these air strikes came long after the first few weeks of the war in 2003, when air strikes were common.

    Claimed air-strike victims in Cluster 34 include two women and ten children spread across four households in at least three incidents plus a 15-year-old in a fifth household/fourth incident. Survivors in all five of these households would have strong motives to report these deaths so as to receive financial compensation from the United States. Thus, if real, these deaths would be more likely to be backed by death certificates than most deaths in Iraq. Yet L2 reports that none of these deaths were corroborated by death certificates. It is also likely that 12 air-strike killings of women and children would draw international media attention. Yet none of these deaths appear in the IBC database, a strong indicator that they were not reported by the international media.37

    Table IV gives the age distribution of the victims of US air strikes in Cluster 34. This is a surprisingly young set of victims, as many as 2/3 of whom could be considered children, with three of the remaining six aged 19 or 22. The complete absence of victims over the age of 50, or in their late 20s or 30s, is puzzling. Of course, there exists a general and valid perception that it is worse to kill children than it is to kill adults. Thus, this age pattern is consistent with the hypothesis that respondents or interviewers fabricated deaths to make US soldiers look bad. Similarly, 1/3 of the claimed victims in these air strikes were female, although only 9% of all violent deaths in L2 were of females.

    TABLE IV The Age Distribution of People Killed By US Air Strikes in Cluster 34

    Row heading: Number Killed. [The table’s age categories and counts are not recoverable in this copy.]
    The five deaths attributed to ‘bullet by USA army’ account for about 10,000 violent deaths in L2’s central estimate. They break down into two adult males in separate households with death-certificate confirmation in February 2005, a man in May 2005, and a girl and a woman in a single household in June 2005. For the last three deaths it is reported that interviewers requested death certificates but respondents were unable to produce them. Unlike the claimed air-strike deaths, some weak corroborating evidence can be found for these shootings within the IBC database. IBC does have shootings involving US forces, sometimes in firefights with ‘anti-coalition agents’, in the relevant months in various places within the governorate of Nineveh.38 Nevertheless, it still seems unlikely that there were at least three separate shooting incidents in which US soldiers killed residents of four households in this small neighbourhood of 40 contiguous households within a span of 17 months.

    The final death attributed to the US Army is a three-year-old boy claimed to have been crushed by an American military vehicle in August 2005 with death certificate confirmation. This death does not appear in the IBC database although it is a newsworthy incident if true.

    There is no overlap between the seven households reporting deaths from US air strikes, the four households reporting deaths from US Army bullets, and the household reporting a child crushed by an American military vehicle. Thus, Cluster 34 contains 12 households claiming 24 deaths attributed to the US military in at least nine separate incidents over a 17-month period. These 24 deaths attributed to the US military in Cluster 34 constitute fully one quarter of all violent deaths attributed to coalition forces in L2 and account for about 8% of all violent deaths in L2.

    The 24 violent deaths at the hands of US soldiers are 69% of all the violent deaths in the cluster. In contrast, in the IBC database, the US is coded as being fully or partially responsible for 476 out of 2963 (16%) violent deaths of civilians in the governorate of Nineveh during the L2 sampling period. Cluster 34 contributed about 48,000 violent deaths blamed on US forces to L2’s central estimate, roughly 100 times the number of civilian deaths fully or partially attributed to US forces by IBC in the entire governorate of Nineveh. But the true discrepancy is still larger since the L2 dataset contains five Nineveh clusters.39

    The 24 people violently killed by US soldiers in Cluster 34 break down into six girls, six boys, three women and nine men: nine females and 15 males. Thus, in Cluster 34, 50% of these US victims were children and 38% were females. In contrast, of all violent deaths in the full L2 dataset, 11% were children and 9% were females. In all clusters combined, 19 out of 95 US victims (20%) were children and 12 (13%) were females. The entire L2 dataset contains 50 violent deaths of women and children, 15 of which (30%) are recorded as killed by the US Army in Cluster 34 alone.40 According to the L2 dataset, in Cluster 34 alone the US military killed three of the 16 women (19%), six of the 22 boys (27%) and six of the 12 girls (50%) killed violently by any party in all of L2’s 47 counted clusters combined. To summarise, if the Cluster-34 data are true, the behaviour of US soldiers within the cluster was much worse than the behaviour throughout the whole of Iraq both of US soldiers themselves and of all other agents.
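    The percentages quoted above can be checked directly from the stated counts:

```python
# Victims of US soldiers in Cluster 34, broken down as stated in the text.
cluster34_us = {"girls": 6, "boys": 6, "women": 3, "men": 9}
total = sum(cluster34_us.values())                      # 24 victims
children = cluster34_us["girls"] + cluster34_us["boys"]
females = cluster34_us["girls"] + cluster34_us["women"]

print(round(100 * children / total))   # 50 (% children in Cluster 34)
print(round(100 * females / total))    # 38 (% females in Cluster 34)

# All clusters combined: 95 US victims, of whom 19 children and 12 females
print(round(100 * 19 / 95))            # 20 (% children, all clusters)
print(round(100 * 12 / 95))            # 13 (% females, all clusters)
```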

    At least four factors already presented suggest the possibility of fabrication of violent deaths in Cluster 34. These include: (1) the number of killings attributed to US soldiers in the cluster; (2) the number of incidents of such killings; (3) the unique focus of these killings on women and children, compared both to killings by other agents in Iraq and to US norms throughout the country and; (4) the thinness of corroborating evidence for these killings, either through death certificates or through the international media.

    There is further evidence of the possibility of fabrication in the fact that 19 out of the 24 deaths attributed to the Coalition in Cluster 34 are claimed by a string of nine households with L2 dataset IDs of 1311, 1312, 1313, 1314, 1315, 1317, 1319, 1320 and 1321. To the extent that consecutive numbers within the dataset suggest that households are particularly close to each other, this pattern suggests that there may have been some coordination among neighbours in reporting fabricated violent deaths caused by US forces. Such coordination could have been facilitated by advance approaches by neighbourhood children, as discussed in the second section, to explain the purpose of the L2 survey. Alternatively, this string of households might have been interviewed by a single interview team that may have been producing inaccurate data from the same neighbourhood.

    Cluster 34 contains an additional 11 deaths not directly attributed to US forces. Of these, five come in bombings, four of which are specifically classified as car bombings. These deaths are spread over four new households, i.e. households not reporting deaths caused by the US, and three separate months. The first car-bomb killing was of a man in April 2005 claimed to be verified by a death certificate. Next, in November 2005 there were car-bombing deaths of one man and one woman. In both cases it is reported that death certificates were requested but not produced. In addition, in November 2005, there was a bombing death of a 15-year-old classified as a man. These November bombings may have been the same event although they victimized two separate households. The fifth death was a man from a fourth household, in May 2006, and again it is reported that a death certificate was requested but not produced. The international media did report multiple car bombings in Nineveh in April 2005 and May 2006 so there is some small corroboration, at least for two of the three car bombings.41 Nevertheless, it is very unlikely that five people spread across four separate households within a small group of 40 adjacent households would have been killed in three separate car bombings. The probability of this happening may well be lower than the probability that 24 members of a single cluster could have been killed in a single car bombing, as is claimed for Cluster 33.

    L2 claims five further gunshot deaths, all of men, in Cluster 34 in addition to the five people shot to death by US soldiers already discussed above. In March 2004 there was a ‘gunshot robbery’ of a man claimed as verified by death certificate. There were four subsequent deaths in the cluster from ‘gunshot unknown’. The first two, in November and December 2004, are coded as verified by death certificates. For the second two, in September 2005 and April 2006, it is reported that death certificates were requested but not produced. None of these overlap with any of the above incidents or households. Thus, they yield five further incidents affecting five further households among this small cluster of 40 contiguous households. IBC has a number of gunshot deaths attributed to ‘anti-coalition agents’ and ‘unknown agents’ during each of these months. Nevertheless, so much targeting of this one small neighbourhood seems unlikely. Remember that the L2 authors claim, in various forms, that all neighbourhoods had essentially equal chances of being selected into the sample.

    The final violent death in Cluster 34 was a man from another new household recorded as dying in a ‘fight’ confirmed by a death certificate in November 2004. Conceivably this was the same incident in which a member of a different household died from a gunshot.

    The 11 violent killings not directly attributed to US soldiers in Cluster 34 break down into ten men and one woman, although one man was only 15 years old. Thus, the percentage of females killed among these 11 deaths, 9%, exactly matches the percentage of females killed among all violent deaths in the L2 dataset.42 Table V summarises how the number of violent killings, plus their gender and age mix, compare for US soldiers and for other agents both within Cluster 34 and for all clusters. If true, it points to exceptionally dirty behaviour by US soldiers in Cluster 34, where the US is blamed for about half of all killings of women and children recorded nationwide by L2. Other agents are held responsible for killing one woman and no children.

    TABLE V People, Females and Children Killed by US Soldiers and Other Agents

    Column headings: Killed in Cluster 34 | % Killed in Cluster 34 | % Killed in all Clusters | Children Killed in Cluster 34 | % Children among all Children Killed in all Clusters | Females Killed in Cluster 34 | % Females among all Females Killed in all Clusters | Girls Killed in Cluster 34 | % Girls among all Girls Killed in all Clusters
    Row headings: US Soldiers; Other agents. [The table’s data cells are not recoverable in this copy.]
    Combining the violent activity of US soldiers and other agents, Cluster 34 contains at least 17 separate violent incidents affecting 22 of the 40 households in the cluster and causing 35 violent deaths. It is reported that only nine of the violent deaths were confirmed by death certificates, i.e. about 26%. Of the 26 non-corroborated violent deaths, death certificates were not requested for seven (27%) and were requested but not produced for 19 (73%).
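    A quick check of the confirmation percentages from the counts in the paragraph above:

```python
# Death-certificate outcomes for Cluster 34's 35 violent deaths.
confirmed = 9                  # certificate requested and produced
forgot = 7                     # certificate never requested
not_produced = 19              # certificate requested but not produced
total = confirmed + forgot + not_produced       # 35 violent deaths
unconfirmed = forgot + not_produced             # 26 without certificates

print(round(100 * confirmed / total))           # 26 (% confirmed)
print(round(100 * forgot / unconfirmed))        # 27 (% of unconfirmed: not requested)
print(round(100 * not_produced / unconfirmed))  # 73 (% of unconfirmed: not produced)
```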

    Evidence of fabrication of violent deaths in this small cluster of 40 contiguous households comes in four basic forms. First, Cluster 34 seems to have been afflicted with improbably large numbers of violent deaths, violent incidents and households affected by this violence. Second, the extent to which and manner in which US soldiers are blamed for these killings suggests some attempts to tarnish the reputation of US soldiers. The total numbers of US victims, female victims and child victims in Cluster 34 are large compared with the victims of other agents in the cluster. The percentages of female and child victims of US soldiers among all female and child victims of all agents within Cluster 34 are very high: 90% and 100% respectively. The percentages of female and child victims of US soldiers within Cluster 34 among all female and child victims of all agents in all clusters are also very high: 32% and 46% respectively.

    For these claims to be true, the behaviour of US soldiers in Nineveh would have to be very much worse than both the behaviour of other agents in Nineveh and the normal behaviour of US soldiers elsewhere. Third, there is no corroborating evidence, either through the international media or through death certificates, for many of the deaths. Fourth, there is a string of household IDs within which nine households out of 11 reported killings by US soldiers, suggesting that there might have been a coordinated attempt, either by interviewers or respondents, to manipulate the L2 survey.

    Mishandling of Other Evidence on Mortality in Iraq

    The L2 paper does not address contrary evidence, creates spurious confirming evidence and cites other incorrect evidence on mortality in Iraq. The impact of these distortions is to obfuscate the extent to which L2 is an outlier among all the credible sources of mortality information in Iraq (see also Spagat, 2008).

    The L2 introduction contains at least the following problems.

    It cites the US Department of Defense (DoD) as recording 117 civilian deaths per day between May 2005 and June 2006. But Dougherty (2007) exposed the fact that the source cited, DoD (2006), clearly states that this figure is 117 casualties per day of civilians plus combatants (Iraqi Security Forces), where casualties means killings plus injuries. The original figure from the DoD report is reproduced below as Figure 3. Note also that the DoD figure of 117 actually applies to the period 20 May 2006 through 11 August 2006, not May 2005 through June 2006 as claimed in L2. To cover the period of May 2005 through June 2006 cited in L2 we need to include three other periods during which casualties per day of Iraqi civilians plus combatants are, respectively, roughly 82, 55 and 59. Thus, the DoD figures suggest perhaps 70 casualties per day of civilians plus combatants during the period cited in L2, a difference of more than 20,000 casualties. Civilian deaths measured by DoD are likely to be considerably lower than 117 per day during the appropriate period. This is, in fact, a period when L2 measures roughly 1000 violent deaths per day. The DoD figures are again incorrectly presented as mortality numbers in Figure 4, later in the L2 paper.
    It ignores the fact that the ILCS estimated war-related deaths and that its figures are much lower than the L2 figures. As noted above, the L2 estimate exceeds the ILCS one by a factor of three or four. L2 mentions the ILCS but only as confirming that bad water, sewerage and restricted electricity create health problems. L2 also mentions the ILCS in a footnote as ‘predictably’ finding substantially higher numbers than what L2 refers to as ‘passive surveillance’ efforts, i.e. IBC. Yet the ILCS estimate for civilians plus combatants killed is only 1.6 times the IBC number for civilians alone killed during the ILCS period. This period is the early phase of the war, when many combatants were killed. L2, on the other hand, differs from IBC by a factor of 12, somewhat less if we take some account of combatants.
    It ignores the UN mortality monitoring (UNAMI, 2007). These figures are lower than the L2 figures by about a factor of 12 during the first half of 2006. UNAMI measured about 80 deaths per day compared to about 1000 per day for L2, or about 170,000 violent deaths in L2 supposedly missed by the UN monitoring system.
    It ignores the daily casualty monitoring of the Iraq Ministry of Health Emergency Tracking System (Sloboda et al., 2007). These figures are lower than L2’s by about a factor of 15.
    It does mention the IBC figures, which are lower than L2’s by a factor of 12, but does not compare them to L2. Instead, L2 gives a misleading comparison suggesting that the figures of Iraq’s Interior Ministry are 75% higher than IBC’s, which might suggest to some readers that the IBC figures should be dismissed as far too low:
    Estimates from the Iraqi Ministry of the Interior were 75% higher than those based on the Iraq Body Count from the same period. (Burnham et al., 2006a)

    In fact, IBC figures are 50% higher than the Interior Ministry figures to which they are compared in the cited source (O’Hanlon and Kamp, 2006). On close inspection we see that this is an effort of the Brookings Institution that removes all morgue entries and police deaths from IBC. These figures are then compared in L2 to Interior Ministry figures that would likely include police and morgue data, thus bringing the IBC figures from 50% above to 40% below the Interior Ministry ones.
    It cites L1 as confirming L2 but, as noted above (Gourley et al., 2007), this is not the case.
    It comments that in many conflicts indirect and non-violent deaths comprise the majority of excess deaths. Yet it fails to mention that L2’s findings conflict with this common pattern. Excess non-violent deaths are statistically insignificant in L2.
    It cites Janabi (2006), claiming that ‘a detailed survey’ had been conducted that found 37,000 civilian fatalities between March 2003 and September 2003 in Iraq. The origin of this Al-Jazeera story was a letter posted on a blog on 21 August 2003 (Wanniski, 2003) claiming that the Iraqi Freedom Party had made a massive census-like effort to collect data on civilian deaths, visiting:
    all villages, towns, cities and some of the desert areas etc. affected by the aggression (with exception of the Kurdish area), and also by interviewing hundreds of undertakers, hospitals officials and ordinary people in these places, conducted a survey. (Wanniski, 2003)

    The posting goes on to explain that the sole copy of the report on this survey was in the possession of a single man who was unable to find a fax machine (or apparently a photocopying machine) in Baghdad so that he could fax the report to party headquarters. He had, therefore, attempted to cross over to the Kurdish zone of Iraq in search of a fax machine and had disappeared with the only copy of the report. Apparently, all supporting materials from this massive effort are also lost so there will never be a new write-up:
    Due to the absence in Iraq (with the exception of the Kurdish area) of functional communication systems with the outside World, our party headquarters in Baghdad tried to send me a fully comprehensive and detailed report by fax AI-Sulaimaniyah (a Kurdish area). However by crossing to the Kurdish area, the kurdish ‘Peshmerga’ [militia] searched the person carrying that report which was found with him and confiscated. According, he was handed over to the American troops where he was arrested and no one knows yet of his whereabouts. (Wanniski, 2003)

    Such evidence is not suitable for citation as a credible source in an academic paper.
    It claims, similarly, that ‘Iraqiyun, estimated 128,000 deaths from the time of the invasion until July 2005, by use of various sources, including household interviews’ (Burnham et al., 2006a).
    Yet in Appendix C of Burnham et al. (2006b) the L2 authors are less confident about this source: ‘The methods of this organization – reported to be direct accounts from relatives of those killed – could not be confirmed’ (Burnham et al. 2006b). Burnham et al. (2006b) cites UPI (2005):

    An Iraqi humanitarian organization is reporting that 128,000 Iraqis have been killed since the US invasion began in March 2003.

    Mafkarat al-Islam reported that chairman of the Iraqiyun humanitarian organization in Baghdad, Dr Hatim al-‘Alwani, said that the toll includes everyone who has been killed since that time, adding that 55 percent of those killed have been women and children aged 12 and under. (UPI, 2005)

    This three-paragraph UPI article is the sole basis for the claim that a survey was done. No copy of the survey has ever surfaced. Cole (2007) refers to Mafkarat al-Islam as ‘The radical Sunni Arab newspaper’. This is what the US State Department has to say about Mafkarat al-Islam (Islam Memo):

    Islam Memo, or Mafkarat al-Islam, is perhaps the most unreliable source of ‘news’ about Iraq on the Internet. For example, on March 27, 2005, Islam Memo ‘news items’ translated into English by Muhammad Abu Nasr claimed that more than 88 US soldiers had been killed that day. In reality, none had been killed. Such disinformation fabrications are typical of Islam Memo. In the ten-day period from March 20 to March 29, 2005, they claimed that more than 334 US troops had been killed. The real number was eight. (United States State Department, 2005)

    L2 diverts readers from this trail by not citing the UPI article but instead citing NGO (Non-Governmental Organisation) Coordination Committee of Iraq (2006), a fourth-hand reference which gives the Iraqiyun figure, citing the Washington Times which, in turn, just reprinted the UPI article.43

    These problems all arise just within the first four paragraphs of the L2 paper. They show a consistent pattern of not engaging with or misconstruing contrary evidence, claiming supporting evidence that is not appropriate for scientific citation and claiming support from sources that do not actually support L2. These practices are similar to those in Checchi and Roberts (2005) which contains a table (Table 6) that conveys a false impression that analysis of seven selected mortality sources for Iraq showed that IBC’s figures were low by factors of five to ten and those of L1 were moderate. Among other problems, this table cuts the IBC numbers almost in half and cites a mental health study published in the New England Journal of Medicine as yielding an extremely high mortality rate although the study offers no mortality estimate and its data are not usable for such a purpose (Dardagan et al., 2006a). These are examples of information falsification.

    FIGURE 3. Casualty figures taken from the US Department of Defense (2006).
    L2’s Figure 4 attempts to convince readers that L2’s extremely sharp upward trend in mortality rates from the beginning of the war until the middle of 2006 is consistent with evidence from both the DoD and IBC. It is claimed that these common trends support the credibility of the L2 data. L2’s Figure 4 is, however, incorrect and misleading.

    FIGURE 4. Trends in number of deaths reported by the Iraq Body Count and the MultiNational Corps-Iraq and the mortality rates found by this study.
    First, as noted above, the DoD figures are for casualties and not mortality so they are not comparable to the L2 ones.

    Second, the DoD figures only begin on 1 January 2004 yet L2’s Figure 4 claims a DoD figure of roughly 12,000 deaths covering March 2003 through April 2004. This figure of 12,000, which is placed virtually on top of the IBC figure, seems to be without any basis.

    Third, as pointed out in Guha-Sapir et al. (2007), Figure 4 compares L2 numbers for deaths per 1000 per year over three time periods since the start of the war with cumulative DoD and IBC figures. Of course, cumulative figures increase sharply, much like the L2 rates. But a proper comparison of rates shows the IBC figures to be relatively flat over time while the L2 ones increase very sharply.

    The DoD casualty rates for the 13-month period 1 June 2005 through 30 June 2006 are about 45% higher than the DoD figures for 1 May 2004 through 31 May 2005: 0.96 and 0.66 casualties per 1000 per year respectively. The corresponding figures for L2, quoted in its Figure 4, over the same time periods are 10.9 deaths per 1000 per year and 19.8 deaths per 1000 per year, an 82% increase. Therefore, deaths in L2 increase more sharply than casualties in the DoD data. Yet, Figure 4 places the DoD point below the L2 point for May 2004 through May 2005 and above the L2 point for June 2005 through June 2006, creating a false impression that the DoD data exhibit a sharper upward trend than do the L2 data. The opposite is true. Figure 4 of the present paper reproduces Figure 4 as it appears in L2 together with the corrected figure.
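    The two percentage increases can be verified from the quoted rates:

```python
# Rates of increase between the two periods: DoD casualties vs L2 deaths.
dod_rate_2004_05, dod_rate_2005_06 = 0.66, 0.96   # casualties per 1000 per year
l2_rate_2004_05, l2_rate_2005_06 = 10.9, 19.8     # deaths per 1000 per year

dod_increase = dod_rate_2005_06 / dod_rate_2004_05 - 1
l2_increase = l2_rate_2005_06 / l2_rate_2004_05 - 1
print(round(100 * dod_increase))   # 45 (% increase in the DoD casualty rate)
print(round(100 * l2_increase))    # 82 (% increase in the L2 death rate)
```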

    Recall that it is argued above that the very sharp upward trend for violent mortality rates in L2 after the L1 and ILCS sampling periods were finished is, in itself, suggestive of data fabrication. Figure 4 leaves a false impression that other sources confirm this sharp upward trend.

    There is further mishandling of evidence in the ‘Discussion’ section of L2. The objective is to explain the huge difference between IBC figures (and also the spuriously cited DoD figures) and L2 figures by claiming that IBC’s ‘passive surveillance’44 methods have been shown to capture only a tiny fraction of all conflict violence:

    Our estimate of excess deaths is far higher than those reported in Iraq through passive surveillance measures. [Footnote to IBC and the DoD.] This discrepancy is not unexpected. Data from passive surveillance are rarely complete, even in stable circumstances, and are even less complete during conflict, when access is restricted and fatal events could be intentionally hidden. Aside from Bosnia [Footnote], we can find no conflict situation where passive surveillance recorded more than 20% of the deaths measured by population-based methods. In several outbreaks, disease and death recorded by facility-based methods underestimated events by a factor of ten or more when compared with population-based estimates. [Five footnotes] Between 1960 and 1990, newspaper accounts of political deaths in Guatemala correctly reported over 50% of deaths in years of low violence but less than 5% in years of highest violence. (Burnham et al., 2006a).

    What are these allegedly supporting footnotes?

    The ‘Bosnia’ study cited is actually a Croatia study (Kuzman et al., 1993). The paper examines 4339 deaths ‘recorded on two documents: a demographic mortality statistical form completed by authorized civil servants, and a death certificate completed by medical examiners’. The paper cites Ministry of Health figures that estimate ‘a total war toll of 10,000 to 12,000 deaths or more’ but does not say how the Ministry of Health made these estimates. It also mentions that the Red Cross counted 13,708 missing persons but does not speculate on how many of these people died. Conceivably, this paper could have some implications for official surveillance systems but it has no implications for media-based monitoring in Iraq.
    Roberts et al. (2001), a study done in the DRC. It reports on a population-based survey but contains no comparison with any other figures derived from other methods. On its own, it cannot be used to argue that any method undercounts war deaths by any factor compared with population-based methods.
    Roberts and Despines (1999), a letter on mortality in the DRC that reports only on survey findings and does not compare with any other figures.
    Goma Epidemiology Group (1995), a study of the health of Rwandan refugees in what was then Zaire (DRC from 1997). The study includes a survey but it is not used to estimate deaths. Thus, the paper makes no comparison of population-based estimates with deaths estimates from ‘passive surveillance’. This work contains nothing that could be used to evaluate the coverage rate of media-based monitoring such as IBC’s. The paper seems to have been included as a supporting footnote because it does refer to undercounting of deaths:
    48,347 bodies were collected by the trucks between July 14 and Aug. 14. This figure represents a minimum estimate for mortality in this population because an unknown, though probably small, number of refugees who died during the first few weeks were buried privately and, therefore, were not counted by the body collection system. (Goma Epidemiology Group, 1995)

    The paper also observes that the area consists of hard volcanic rock so burial is difficult and bodies are normally left on the ground and are, therefore, easy to count. So this undercount, irrelevant for Iraq, is thought to be small in any case.
    A study of a pellagra outbreak among refugees in Malawi in 1990 (Malfait et al., 1993). Pellagra is a nutritional disease that can result in death in severe cases. This study is not relevant to mortality monitoring in Iraq. Violent killings in Iraq are an international news story. A normally non-fatal nutritional disease among refugees in Malawi is not an international news story. Coverage rates in the monitoring of pellagra in Malawi in 1990 cannot convey useful information about coverage of mortality monitoring in Iraq. In any case, although the article does discuss passive and active surveillance there is no direct comparison between the two since the two systems were never operated simultaneously.
    Spiegel and Salama (2000), a population-based study of the Kosovo War that estimated 12,000 deaths. The study makes no mention of passive surveillance or media monitoring. It does mention three other estimates that range between 9269 and 11,334, i.e. 77% to 94% of the study’s estimate.
    Ball et al. (1999), a Guatemala study, which is the only one mentioned that actually does compare some form of media monitoring with another method. Yet this analysis has little or no applicability to the IBC’s mortality monitoring in Iraq. The Guatemala study argues that 13 mainstream newspapers in Guatemala failed completely to cover large massacres in the Guatemalan countryside in the late 1970s and early 1980s. On the other hand, it also notes that the international media and even some non-mainstream Guatemalan sources did convey at least some news about this violence. Although it is interesting to learn what the mainstream newspapers reported in Guatemala, this base of newspapers is too narrow to illuminate IBC’s coverage of Iraq. IBC incorporates news wires, many non-mainstream news sources and official figures like those of the Baghdad morgue and the Ministry of Health. Moreover, Iraq now is far more in the media spotlight than Guatemala was in the late 1970s and early 1980s and modern technologies such as the Internet and cell phones carry information much more freely out of Iraq in the 21st century than was the case in Guatemala nearly 30 years ago. Moreover, the killings in Guatemala during the relevant period were mostly of indigenous peoples who were probably not prioritised by mainstream Guatemalan newspapers. Finally, according to the Guatemala study, mainstream newspapers captured more violence than the population-based measurements in a number of years. Thus, the Guatemala study does not imply that we should expect a coverage rate for IBC of the order of 5% as suggested in L2.
    The following comparisons are not included among these L2 footnotes despite being far more relevant to the case of Iraq than the articles cited. They all suggest substantially more than 20% coverage for media-based monitoring in Iraq, contrary to the L2 claim that ‘we can find no conflict situation where passive surveillance recorded more than 20% of the deaths measured by population-based methods’:

    L1, conducted mostly by the same authors as L2, estimated 56,700 violent deaths of civilians plus combatants outside Al-Anbar governorate (EPIC, 2004), a large outlier in L1, compared to 17,687 deaths of civilians in Iraq outside Anbar recorded by IBC for the L1 period.
    The ILCS estimated 24,000 war-related deaths of civilians and combatants compared to an IBC figure of about 14,000 deaths of civilians for the ILCS coverage period.
    Benini and Moulton (2004), a study of Afghanistan since 2001 done by colleagues of the L2 authors at Johns Hopkins, compared mortality estimates from a population-based survey with a body count based on media monitoring that used methods that inspired IBC’s approach (Herold, 2004). The survey found 5576 killed. This compares to a media-based count of 3620 civilians killed for the same period.
    I draw two conclusions from the material discussed in this section. First, L2 is much more of an outlier in the Iraq mortality literature than would be suggested by L2’s treatment of the literature. Second, the treatment of the evidence on Iraq mortality in L2 displays a pattern of data and information falsification.


    In the second section I measured L2 against the AAPOR (2005) and argued that there had been a number of violations of principles of professional responsibilities in dealing with respondents and in standards for minimal disclosure. In particular, there is evidence of inadequacies in L2’s informed consent processes and that respondents were endangered and their privacy was breached. The L2 authors have refused to disclose important information including the exact wordings of the questions that were asked, a definitive data-entry form, their full sample design and data matching anonymised interviewer IDs to households.

    In the third section, and also to some extent in the second, I presented evidence of data fabrication and falsification that includes:

    Evidence suggesting that the figure of 600,000 violent deaths was extrapolated from two earlier surveys.
    Shortcomings of disclosure just mentioned, including the L2 questionnaire, data-entry form and sample design, and data that matches interviews with anonymised interviewer IDs.
    Improbable response rates and success rates in visiting selected clusters despite highly insecure conditions.
    The presence of many known risk factors for fabrication listed in AAPOR/ASA (2003).
    A claimed field work schedule that appears to be impossible, at least without committing ethical transgressions in the field.
    Large discrepancies with other data sources on the scale, location and timing of violent deaths in Iraq in ways that are consistent with fabrication and the use of an incorrect trend figure (sub-section in the third section) that eliminates these timing discrepancies.
    Evidence of fabrication in a particular Baghdad cluster (Cluster 33) combined with the implausible claim of zero security-related failures to visit Baghdad clusters during a period when Baghdad was very insecure and further evidence of fabrication in a cluster in Nineveh (Cluster 34).
    Unlikely patterns in the confirmations of violent deaths through the viewing of death certificates and in the patterns of when death certificates were requested and when they were not requested.
    Manipulation of other evidence on mortality in Iraq and material that is not relevant to mortality in Iraq or unsuitable for citation in a scientific publication.
    A few of these anomalies could occur by chance but it is extremely unlikely that all of them could have occurred randomly and simultaneously. In light of these findings, Burnham et al. (2006a) cannot be considered a reliable contribution to knowledge about mortality during the Iraq War.

    I conclude that there should be a formal investigation of the second Lancet survey of mortality in Iraq. To aid such an investigation, L2 authors should first meet the minimal disclosure standards established by AAPOR and, in addition, should provide access to their raw data, including the filled-out data-entry forms (anonymised if necessary) and sampling details.

    Comment by Andrew — September 18, 2010 @ 3:14 am

  20. Lots of long cut/pastes here, but Andrew is correct about the Lancet study. It is bunk. Yes, AP, the ORB poll gives a higher number, but it also covers 14 more months than the Lancet study. That is why it was said that the ORB poll “appears to support” the Lancet study. If you consider that it covers over a year more, and that the intervening period was the most violent since the original invasion in 2003, then 600,000 (Lancet) and 1 million (ORB) look very similar, and hence ORB “appears to support” the Lancet study that was/is widely criticized as too high.

    However, Andrew posted the link on IBC showing that the ORB poll has been debunked. The same page also points to another peer-reviewed paper which “comprehensively discredited” the Lancet study. That is the paper Andrew quotes at length just above, which is here:

    This article shows that Lancet’s estimate diverges widely from all other credible sources (except ORB, which has itself been discredited) and shows evidence of data fabrication and falsification in the Lancet study. It also shows that the authors of the Lancet study have refused to disclose basic methodological information such as, “the survey’s questionnaire, data-entry form, data matching anonymised interviewer identifications with households and sample design.”

    These non-disclosures led the AAPOR to dismiss the study as well:

    The AAPOR President wrote on this, “When researchers draw important conclusions and make public statements and arguments based on survey research data, then subsequently refuse to answer even basic questions about how their research was conducted, this violates the fundamental standards of science, seriously undermines open public debate on critical issues, and undermines the credibility of all survey and public opinion research.”

    The WHO also dismissed the Lancet study in the peer-reviewed report on their survey on Iraq deaths (IFHS), stating flatly that: “the 2006 study by Burnham et al. (the Lancet study) considerably overestimated the number of violent deaths.”

    Another peer-reviewed paper by demographer Beth Daponte rejects the Lancet study in favor of IBC, ILCS and the IFHS (WHO):

    Yet another peer-reviewed paper critiqued the methodology of the Lancet study as biased in a way that would exaggerate the estimate of violent deaths (due to Main Street Bias). This paper won the 2008 Article of the Year award in the JPR journal, with the award jury writing, “The authors show convincingly that [the Lancet study has] significantly overestimated the number of casualties in Iraq.”

    That is four peer-reviewed papers and one major survey research organization dismissing the Lancet study. That is only the tip of the iceberg of the criticism and rejection of the Lancet study by a wide range of experts, both in the academic literature and otherwise. See here for more:

    That said, while Andrew is right about the Lancet, I’d wonder why he is so quick to accept poorly supported estimates for other conflicts, when often these estimates don’t provide any methodology at all, but appear to be little more than guesses that have become ‘conventional wisdom’ by the power of repetition. If we need such academic debate and detailed methodological argument in the case of Iraq to know that things like the Lancet study are wrong, then why should we just accept huge numbers like “1 million” for things like the Soviet invasion of Afghanistan, when these have arguably even less support? What is the real basis of that figure, I wonder, and is it any better than the basis of the Lancet or ORB estimates for Iraq?

    Comment by philip37 — September 19, 2010 @ 6:10 pm

  21. Andrew,

    You wrote that one of the Lancet authors had earlier been writing about how the sanctions had increased the death rate in Iraq and how this now means that he cannot claim that prior to the invasion (and during the sanctions) the Iraqi death rate was about the same as that of Jordan. What did the Lancet author previously claim the Iraqi death rate was during the sanctions but before the war? What did he claim the excess deaths due to the sanctions was? In the 2006 Lancet article it was claimed that the death rate in Iraq prior to the invasion had been 5.5/1000, just a bit worse than Jordan’s and Syria’s 5/1000. The authors stated that after the invasion Iraq’s death rate jumped to 13.3/1000.

    What were the Lancet authors writing in 2000?

    Thank you for the link to the Michael Spagat article. I came across the guy when googling info about the effect of sanctions on Iraq (he says, essentially, that the research claiming the sanctions harmed Iraqi children was wrong). I will note that he is an economist, not a demographer or specialist in public health or epidemiology. I also found that he seems to have a conflict of interest, working with Iraq Body Count, whose figures are much lower than those of the Lancet article and whose methodology is much more suspect.

    Having a conflict of interest does not mean, of course, that he is wrong. It would be interesting to see the Lancet authors’ response to his article. I’m not sure if he “destroyed” it. Two years after the Lancet article, a study by the WHO published in the New England Journal of Medicine produced figures lower than those of the Lancet study but in the same general ballpark (about 400,000 excess deaths vs. 650,000 in Lancet – the big difference was the number who died violently, only 150,000 in the WHO study vs. 600,000 in Lancet).

    Spagat has not criticized the other article.

    About the WHO study:

    DL: What accounts for the disparity between your studies and the WHO-Iraqi government study?

    LR: There’s far less discrepancy than has been suggested in the press. We found, for the three years after the invasion, the death rates went up 2.4-fold. They found the death rate went up 2.0-fold. When they looked in their data, initially they had a death rate before the invasion that seemed to them implausibly low. Remember, this was produced by the Iraqi government, but the WHO helped analyze the data. The WHO scientists very honestly described the three problems they have with underreporting. They said, first of all, compared with other places in the region, it seems that half the death is not being reported to us. They said this for two reasons: because the death rate they found was implausibly low for the period before the invasion, and because they didn’t go to Anbar Province and much of Baghdad – the most dangerous parts of the country. So, they adjusted (for this), and it doubled what they found. When they adjusted, they had a death-rate similar to ours. Therefore, the two studies are in rough agreement as to how many people died from the invasion. We’re saying 650,000 in the first three years – they’re saying about 400,000. Also, their report only looks at violent deaths, which they found to be only 151,000, and which they said only accounted for one out of six deaths after the invasion. Our study said that of the dramatic increase of mortality, virtually all of it was from violence – 600,000 in that same window of three years. So, the two studies are not differing on how many died – they’re differing on how many died of violence. They only went on to produce the number of violent deaths, not the number of excess deaths.

    DL: Can you define “excess deaths?”

    LR: Excess deaths are deaths above the baseline rate before the invasion occurred. In World War II, when we hear a death toll, it’s not just the deaths from bullets and bombs, but also from starvation, medical dysfunction and all those things that tend to go with war.

    Comment by AP — September 20, 2010 @ 12:25 am

  22. Thanks, Phillip (and Andrew). My impression, until I hear more from the Lancet authors, is that the Lancet study most likely overestimated the number of violent deaths in Iraq, while being in the ballpark in terms of total number of excess deaths. As Lancet author Les Roberts stated:

    The NEJM article found a doubling of mortality after the invasion, we found a 2.4 fold increase. Thus, we roughly agree on the number of excess deaths. The big difference is that we found almost all the increase from violence, they found one-third the increase from violence.

    “This new estimate is almost four times the ‘widely accepted’ [Iraq Body Count] number from June of 2006, our estimate was 12 times higher. Both studies suggest things are far worse than our leaders have reported.

    “There are reasons to suspect that the NEJM data had an under-reporting of violent deaths.

    “They roughly found a steady rate of violence from 2003 to 2006. Baghdad morgue data, Najaf burial data, Pentagon attack data, and our data all show a dramatic increase over 2005 and 2006. …

    “It is likely that people would be unwilling to admit violent deaths to the study workers who were government employees.

    “Finally, their data suggests one-sixth of deaths over the occupation through June 2006 were from violence. Our data suggests a majority of deaths were from violence. The morgue and graveyard data I have seen is more in keeping with our results.”

    As I had originally posted when we began this interesting digression, “But let’s not single out Putin. The previous British prime minister and Bush together caused the deaths of 600,000 Iraqi civilians (according to the Lancet) out of a population of 31 million. Percentage-wise this is slightly less bad (2% of the Iraqi population) than Russian totals in Chechnya but in terms of raw numbers of civilian deaths Bush and Blair far “outshine” Putin.” I generally stand by that statement, although the number of people dead as a result of Bush and Blair’s invasion, it now seems, is 400,000 rather than 600,000 as I had written. I was also a bit wrong on Iraq’s population in the early-mid 2000s, which was 27 not 31 million (today’s population), so Bush and Blair are responsible for the deaths of about 1.5% of the Iraqi population. While percentage-wise this may not be as bad as Putin’s 2.5% of Chechnya’s civilians dead, one must take into account Iraq’s larger size.
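The per-capita arithmetic in the comment above can be checked with a quick sketch (using the figures exactly as stated there):

```python
# Figures as stated in the comment above.
deaths_revised = 400_000        # revised death estimate cited in the comment
population_2000s = 27_000_000   # Iraq's population in the early-mid 2000s

share = deaths_revised / population_2000s * 100
print(f"{share:.1f}% of the Iraqi population")  # 1.5% of the Iraqi population
```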

    Comment by AP — September 20, 2010 @ 8:59 am

  23. A number of problems with your responses, AP.

    “I came across the guy [Michael Spagat] when googling info about the effect of sanctions on Iraq (he says, essentially, that the research claiming the sanctions harmed Iraqi children was wrong).”

    That’s not what he said. The article is here:

    It rejects the highest estimates of deaths due to sanctions in the ’90s, which came primarily from a UNICEF survey. This critique largely references another article by Tim Dyson, which, like the Spagat one, does not say sanctions were not harmful to children, but that the highest estimates of deaths due to sanctions are wrong. (Note, too, that death is only the most extreme form of “harm”. There is much other harm short of death that something like sanctions can produce in a population.)

    You then cite a blog asserting a “conflict of interest” on the part of Spagat because of “working with Iraq Body Count whose figures are much lower than those of the Lancet article.” This is a pretty ridiculous kind of rebuttal. It would imply that IBC would also have a “conflict of interest” if it criticizes the Lancet study, which would also imply that the Lancet authors would have a “conflict of interest” if they criticize IBC, or if they criticize the IFHS “whose figures are much lower”, or if the IFHS authors criticize the Lancet article. This becomes ridiculous. Just because different researchers have produced different conclusions or figures does not mean they have a “conflict of interest” when discussing or criticizing each other’s conclusions. Or, if any one of them does, then they all do. This would apply to all of your quoted statements of Les Roberts about IBC or the IFHS, because he has a “conflict of interest” since the figures of his study are “much higher” than either of those. If the charge applies to Spagat, then it applies to all of them. So why then even bring it up in the case of Spagat? It’s just an attempt to poison the well and evade the arguments. If different researchers reach different results you _want_ them to criticize each other, because that is how you might discover which results are right or wrong.

    As far as I can tell, Spagat’s articles do not make false statements, but your quotes of Les Roberts do make numerous false statements. In fact, the Lancet authors have constantly made false statements about their own study, those of others, and on a wide range of issues related to Iraq. In my view, this is because they are trying to defend something that is false, so have to resort to falsehoods to accomplish the task.

    For example, you quote Roberts asserting that, “This new estimate [IFHS] is almost four times the ‘widely accepted’ [Iraq Body Count] number from June of 2006, our estimate was 12 times higher.”

    The estimate of IFHS was three times higher than IBC’s figures for the same period, not four times. In fact, the IFHS report compares them directly and says this explicitly. Note as well that the IFHS is for civilians and combatants alike, while IBC is for civilians only, so it should be somewhat lower even if they had both captured all relevant deaths perfectly. But Roberts falsifies this difference upward to suit his purposes.

    Roberts also asserts, “They [IFHS] roughly found a steady rate of violence from 2003 to 2006. Baghdad morgue data, Najaf burial data, Pentagon attack data, and our data all show a dramatic increase over 2005 and 2006.”

    This and many other false statements of Roberts and other Lancet authors have been addressed in another paper by Spagat here:

    In fact, the trends over time for the IFHS, IBC and other sources are quite similar to each other. It is the Lancet trend line that is off in another universe. None of the things Roberts lists supports the Lancet trend. The Baghdad morgue data is already built into the IBC data. There is no trend data for “Najaf burial data” that could support or refute any of the trend lines, and he misleadingly shifts to Pentagon “attack data” while ignoring the death data, which again supports the IFHS and IBC type trends; but the “attack data” doesn’t really support Lancet either. See sections 5 and 7 of the above paper for details.

    Roberts (and you) then rest on shifting the death estimate of IFHS from violent deaths to “excess deaths”, creating an estimate of “400,000 excess deaths” that was not published in the IFHS or in any peer-reviewed context, but was rather inferred by third parties from the raw data.

    Roberts says, “They only went on to produce the number of violent deaths, not the number of excess deaths.” Indeed, they did not produce any number of excess deaths because they said doing so would be unreliable and would tend to lead to an exaggerated number. Specifically they said, “Overall mortality from nonviolent causes was about 60% higher in the post-invasion period than in the pre-invasion period. Although recall bias may contribute to the increase, since deaths before 2003 were less likely to be reported than more recent deaths, this finding warrants further analysis.”

    To translate, this means that some of the measured increase between the two periods is likely to be not real, but a product of recall bias leading to more underreporting further back in time in the recall period, particularly the pre-war rate used to do the “excess death” calculation. So the IFHS authors do not propose such an estimate and instead suggest it would probably be exaggerated and requires further research.

    This caution is ignored by Roberts so he can create a number that seems somewhat closer to his. But even doing this, the difference between 400,000 and 650,000 is still quite large, some 250,000 deaths. That difference is greater, for example, than the difference between the IBC and IFHS violent death estimates (50,000 vs. 150,000 – a difference of 100,000 deaths). Yet Roberts claims “the two studies are not differing on how many died”, while making a fuss about the smaller difference between IFHS and IBC (which he dishonestly exaggerates to four times).
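The gaps and ratios in this argument follow from the rounded estimates quoted in the thread (a sketch; the figures are those used in the comment, not independent data):

```python
# Rounded estimates as used in the comment above.
ibc_violent = 50_000      # IBC civilian violent deaths for the comparable period
ifhs_violent = 150_000    # IFHS violent death estimate
lancet_excess = 650_000   # Lancet (L2) excess death estimate
ifhs_excess = 400_000     # excess figure inferred by third parties from IFHS data

print(ifhs_violent / ibc_violent)    # 3.0 -- "three times", not four
print(lancet_excess - ifhs_excess)   # 250000 -- the gap Roberts calls "rough agreement"
print(ifhs_violent - ibc_violent)    # 100000 -- the smaller gap he makes a fuss about
```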

    However, let’s just put aside the fact that the IFHS and NEJM never published a “400,000 excess deaths” figure, and put aside that they suggested such would be unreliable and likely exaggerated specifically. And let’s put aside that 400,000 is hardly very similar to 650,000. Still, the two studies are measuring a completely different universe. Roberts tries to pass this off as not too relevant, “the two studies are not differing on how many died – they’re differing on how many died of violence.”

    But even if 400,000 and 650,000 is not considered an important difference, the difference on cause of death is still a huge difference. It means the two studies are simply not seeing the same universe as each other. Imagine if one study claimed that there were 400,000 deaths from criminal murders while another said there were 650,000 from cancer. And imagine an author trying to claim that these studies somehow support each other, simply because the two numbers – in the entirely abstract numerical sense – are sort of not that far apart, in the same “ball park”. This is irrelevant if they aren’t measuring the same phenomenon.

    There are strong differences between violent and non-violent deaths. For one, violent deaths in the Iraq war are, by a huge majority, mainly of adult males aged roughly 18-40. Non-violent deaths would show no such pattern _at all_. They would not have much gender imbalance, and would be weighted more toward the elderly and possibly young children and infants. You cannot just shift one over to the other. They would be measuring different deaths, different people. This would be like saying that 10 young males killed by bullets are the same as 4 children, 3 women and 3 men who died of disease, because both totals come out to the number “10”.

    Moreover, the Lancet study estimate of 650,000 ‘excess’ deaths consists of 600,000 violent deaths and only 50,000 attributed to any non-violent cause. The Lancet report even says the increase in non-violent deaths is not statistically significant, which means the finding might as well be zero. Yet if we accept the 400,000 estimate for IFHS this would be 150,000 from violence and 250,000 from other causes, which would be highly significant for both.

    The two studies are simply not seeing the same Iraq, not by a long shot. They might as well have been looking at two different countries. On the other hand, IBC and IFHS are seeing a similar Iraq. The IFHS is higher than IBC in terms of numbers, explainable by several possible reasonable factors including the exclusion of combatants by IBC, general underreporting in IBC, and overestimation by IFHS, since it is still an extrapolation from a sample that would require a fairly wide range of error. The other large study, ILCS, also saw an Iraq very similar to IFHS and IBC in the same periods. These sources are similar then on numbers, trends over time, and geographic distribution of deaths, while the Lancet study is off on another planet on all these questions. I think all the studies I listed in my previous comment explain why: the Lancet study is just bunk, even if some of the Lancet authors are intent on distorting everything they can to prevent this obvious conclusion.

    Comment by philip37 — September 21, 2010 @ 1:53 pm

  24. I should add to the above, I think that there may very well have been a substantial increase in non-violent death rates in Iraq since the invasion, possibly even into the hundreds of thousands range. There are many reasons why such an increase could have happened, such as the high level of displacement in Iraq due to the war, which usually results in deteriorated health conditions for the displaced people, and the stress on the hospital system making good treatment more difficult to obtain, including the departure of lots of health professionals fleeing the conflict, leaving the public without enough good doctors, etc. It just doesn’t seem like any of the existing studies gives very reliable information on what the numbers might be, and I don’t think the ‘excess’ number derived from the IFHS is enough to support such numbers, nor can it save the Lancet study in any way. If we accept the Lancet study, there was basically no increase in the death rates except for those killed in the violence. If we accept the inference from the IFHS, then it would appear there was a substantial increase in non-violent death rates. Neither inference is very strong in my view.

    Comment by philip37 — September 21, 2010 @ 3:23 pm

  25. AP, you are still basing your information on a widely debunked study, one that seems to have been performed by people with a particular political agenda.

    For the numbers given for Afghanistan, a lot of it comes from the wildly different tactics used by the Russians in that conflict.

    1,000,000 is actually about a mid-range estimate, some estimates running as high as 2,000,000 plus.

    Given the Russian practice of deliberately targeting Afghan agriculture for destruction, and its deliberate practice of killing recalcitrant civilians through execution, well you get the idea.

    They had little or no regard for human rights as it was, and used far more destructive tactics: literally carpet-bombing and napalming anything that moved, mining huge areas of countryside, poisoning wells, dropping delayed-action bombs shaped like traditional Afghan toys, that sort of thing. When the Russians bombed Herat they killed between 18,000 and 23,000 (usually given as 20,000) in a couple of days.

    Anyway some of the numbers:

    Afghanistan (1979-2001): 1,800,000
    Soviets vs. Mujahideen vs. Govt. vs. Taliban [estimates listed chronologically]
    War Annual 6 (1994): 1,000,000
    Britannica Annual (1994): 1,500,000
    Wallechinsky (1995): 1,300,000
    D.Smith (1995): 1,500,000
    B&J (1997): 1,500,000 (1979-95)
    Dictionary of 20C World History (1997): 1M
    CDI: 1,550,000 (1978-97)
    29 April 1999 AP: 2,000,000
    Dict.Wars: >2M
    23 May 1999 Denver Rocky Mtn News: 1,800,000
    Ploughshares 2000: 1,500,000
    [MEDIAN of latest five: 1,800,000]

    Soviet Phase and immediate aftermath only
    Isby, War in a Distant Country: Afghanistan (1989): Civilian deaths:
    1986 voluntary aid study: 600,000
    1987 USAID study: 875,000
    1987 Gallup study: 1,200,000
    2 June 2002 LA Times: 670,000 civilians during 10-year Soviet occupation
    Toronto Star (6 May 1991): more than 1,000,000
    SIPRI 1990: 1,000,000 total dead (the 1988 Yearbook estimated 100-150T battle dead)
    Minneapolis Star-Tribune (14 Sept. 1991): 1,500,000
    FAS 2000: 1-2M Afghans (1979-89)
    USA Today (17 Apr. 1992): more than 2 million.
    [MEDIAN: 1.5M]
    20 Sept 2001 Christian Science Monitor: 400,000 civilian deaths in the 1990s
    Factional fighting in Kabul, 1992-96
    30 Dec. 2001 AP: 50,000
    2 June 2002 LA Times: >50,000 according to the Red Cross
    2 June 2002 LA Times: 20,000 civilians k. by Soviet air raids, March 1979 in Herat
    4 March 1980 AP: 1,300 villagers in Konarha Province k. by Soviets & Afghan govt. “last year”
    By Soviets in Kunduz (province in northern Afg.)
    27 March 1985 Chicago Tribune: 900 massacred
    26 Feb. 1985 AP: 480 civilians massacred at Chahardara (town) ca. Feb. 2/3
    Taliban POWs k. by Northern Alliance in Mazar-i-Sharif, May 1997
    28 Nov.1998 NY Times: up to 2,000
    26 Aug. 2002 Newsweek: 1,250
    By Taliban in Mazar-e Sharif, Nov. 1998
    13 Nov. 1998 News-India Times: 5,000-8,000 massacred
    28 Nov.1998 Washington Post: 2,000-5,000 ethnic Hazara civilians k.
    Harff & Gurr: 1,000,000 old regime loyalists, rebel supporters were victims of revolutionary politicide.
    Soviet deaths:
    FAS 2000: ca. 14,500
    20 May 88 Chicago Tribune: 12-15,000 killed
    Isby, War in a Distant Country: 13,310 KIA as of 25 May 1988
    24 Dec. 1989 Arizona Republic: 13,310
    War Annual 6 (1994): 13,833
    Wallechinsky: 14,454, incl. 11,381 in combat


    Destruction in Afghanistan
    Estimates of the Afghan deaths vary from 100,000[84] to 2 million.[85] 5 million Afghans fled to Pakistan and Iran, 1/3 of the prewar population of the country. Another 2 million Afghans were displaced within the country. In the 1980s, half of all refugees in the world were Afghan.[86]
    Along with fatalities were 1.2 million Afghans disabled (mujahideen, government soldiers and noncombatants) and 3 million maimed or wounded (primarily noncombatants).[87]
    Irrigation systems, crucial to agriculture in Afghanistan’s arid climate, were destroyed by aerial bombing and strafing by Soviet or government forces. In the worst year of the war, 1985, well over half of all the farmers who remained in Afghanistan had their fields bombed, and over one quarter had their irrigation systems destroyed and their livestock shot by Soviet or government troops, according to a survey conducted by Swedish relief experts.[86]
    The population of Afghanistan’s second largest city, Kandahar, was reduced from 200,000 before the war to no more than 25,000 inhabitants, following a months-long campaign of carpet bombing and bulldozing by the Soviets and Afghan communist soldiers in 1987.[88] Land mines had killed 25,000 Afghans during the war and another 10–15 million land mines, most planted by Soviet and government forces, were left scattered throughout the countryside.[89]
    A great deal of damage was done to children by land mines.[90] A 2005 report estimated 3–4% of the Afghan population were disabled due to Soviet and government land mines. In the city of Quetta, a survey of refugee women and children taken shortly after the Soviet withdrawal found over 80% of the refugee children unregistered and child mortality at 31%. Of children who survived, 67% were severely malnourished, with malnutrition increasing with age.[91]
    Critics of Soviet and Afghan government forces describe their effect on Afghan culture as working in three stages: first, the center of customary Afghan culture, Islam, was pushed aside; second, Soviet patterns of life, especially amongst the young, were imported; third, shared Afghan cultural characteristics were destroyed by the emphasis on so-called nationalities, with the outcome that the country was split into different ethnic groups, with no language, religion, or culture in common.[92]

    Comment by Andrew — September 22, 2010 @ 7:34 am

  26. As for the difference in tactics:

    Soviet Atrocities in Afghanistan
    During the Soviet military aggression in Afghanistan, over one million Afghans died and millions more were wounded or left to fend for themselves without the benefit of food, water, medical care, or housing. There were also numerous eyewitness accounts of incredible cruelty and acts of cold, calculated murder. One such act was observed by a doctor in September of 1984 who witnessed Soviet troops in an Afghan village: “They tied them up and piled them like wood. They poured gasoline over them and burned them alive.”
    An Afghan resistance leader recounted how Soviet soldiers treated civilians who were left behind when another village was abandoned: “The Russians tied dynamite to their backs and blew them up.” Another eyewitness described a fiendish practice that Russians used to extract information about the mujahadeen (Muslim freedom fighters): “They would slowly roast a child over fire”.
    The Soviets also, reportedly, would encircle villages, enter every dwelling, and kill every inhabitant, including old men, women and children. Before leaving, they would burn down the entire village.[1] A 1986 report gives a chilling account:
    “In three small villages near Qandahar, last year, the Soviets killed close to 350 women and children in retaliation for a Mujahadeen attack in the vicinity. After slitting the throats of the children, disemboweling pregnant women, raping, shooting and mutilating others, the Russians poured a substance on the bodies which caused instant decomposition.”[2]
    Angelo Rasanayogan offers an analysis of the frustration of fighting an elusive opponent and the tactical brutality of the Russian invasion in his book, Afghanistan: A Modern History:
    “The frustration of waging what appeared to be an ‘unwinnable war’ against unconventional guerilla forces denied the Soviets the prospect of ever hoping to permanently pacify the countryside or to expand the areas under their control. The mujahadeen were like Mao Tse Tung’s fish in the sea, and the Soviets in the mid-1980s began to adopt a policy aimed at draining the sea itself. Civilians were driven out of their homes as Soviet forces indiscriminately bombed villages and destroyed crops, orchards and irrigation systems, and scattered anti-personnel mines over large tracts of the countryside where a guerilla presence was suspected”.[3]
    Other eyewitnesses describe harrowing incidents of cruelty and almost unspeakable butchery:
    “The Russians took 14 of us and made us stand in a line near this wall. Two Russian soldiers stood in front of us with machine guns. We began reciting The Holy Kalima from the Holy Qur’an, because we knew we were about to die. They machine-gunned every one of us. I fell. There was a pile of bodies, all on top of me. The soldiers searched us and took our money.
    They moved me but I just pretended to be dead.”[4]
    An unidentified Soviet soldier described what he perceived: “There was no such thing as a peaceful population, they were all guerrilla fighters. I remember how we once rounded up all the women and children, poured kerosene over them and set fire to them. Yes, it was cruel. Yes, we did it, but those kids were torturing our wounded soldiers with knives.”[5]
    Another Russian describes the wanton lack of regard for human life, and what he perceives as the reasons for the Soviet soldiers’ propensity to kill without restraint:
    “A young soldier might kill just to test his gun, or if he’s curious to see what the inside of a human being looks like or what’s inside a smashed head. But there is also the fact that if you don’t kill, you’ll get killed.
    It’s a feeling of being drunk on blood. Often you kill out of boredom or because you just feel like doing it—it’s like hunting rabbits.”[6]
    Maynom, an Afghan villager from Laghman province, describes a living hell:
    “The rockets were falling all around us like leaves off a tree. My daughter’s head was smashed open. Her brains were hanging from a branch. I lost everything – my cousins, my nephews, everybody was killed – my wife, four children.”[7]
    Of the numerous methods of inflicting suffering and devastation on a defenseless civilian population, none is as revolting and as devilish as the purposeful targeting of children through the dispersal of millions of land mines.
    Many of these explosive devices were designed to look like toys, and were fashioned in bright colors to attract the curiosity of children. These bombs were shaped like butterflies or kites, or made of translucent plastic (making them especially irresistible to unsuspecting children). Apparently, the purpose was to murder and maim children who the Soviets feared would mature into freedom fighters. This practice, while either ignored or overlooked by Western media, is documented by independent news sources.[8]

    Comment by Andrew — September 22, 2010 @ 7:41 am

  27. “1,000,000 is actually about a mid range estimate, some estimates running as high as 2,000,000 plus.”

    …or as low as 100,000 apparently: “Estimates of the Afghan deaths vary from 100,000[84] to 2 million.[85]”. When the range is that wide taking a “mid range” is not particularly meaningful.

    You list a lot of figures, but what seems to be missing is how any of them were derived: the methodology. Without some explanation of how the figures were determined it’s hard to know if any of them are much better than someone’s wild guess.

    Comment by philip37 — September 22, 2010 @ 2:20 pm

  28. @Professor:

    On the one hand, I applaud your tolerance and the fact that you don’t filter or censor posts. On the other hand, there must be a limit to how much damage a saboteur like Andrew can inflict on your site. Andrew’s modus operandi at the LR blog is: whenever he sees a topic that’s damaging to what he perceives as his “cause”, he takes a dump of thousands and thousands of lines of totally irrelevant off-topic cut-and-paste garbage, thus making the topic unreadable. That’s what he has done to you here on this page. I would recommend deleting any dump like that. Make the cut-off at, say, 2,000 words and delete the rest.

    @Peter who wrote: “Sorry to disappoint you, but Putin did say “Stalin” instead of “Lenin” — but clearly this time it was just a verbal slip…”

    No. Can’t you read? Let me remind you:

    Выдержав паузу, Владимир Путин: – «У меня встречный вопрос. Кромвель был лучше или хуже Сталина?»

    In case your Russian is rusty, this says:

    Putin took a long pause: “I have a counter-question for you. Was Cromwell better or worse than Stalin?”

    “Counter-question”! That means “a different question”. So, Putin changed from Lenin to Stalin on purpose and openly told the journalist that he was asking a DIFFERENT question. Unfortunately, the original translation here was “another question”, which hides this fact. So, everything was on the up and up. As always, it was lost in translation.

    Peter continued: “… rather than yet another demonstration of his hopeless inability to handle unscripted questions.”

    Nonsense. You are for some weird reason confusing Putin with Bush. Putin is the smart one. Bush and Palin are the retards. Here are some typical videos and quotes of Bush and Palin answering unscripted questions:

    Couric strikes again – asks Palin impossible question:

    It was a pretty simple question. “When it comes to establishing your worldview, I was curious, what newspapers and magazines did you regularly read before you were tapped for this to stay informed and to understand the world?” Couric asked.

    “I’ve read most of them, again with a great appreciation for the press, for the media,” Palin responded.

    You think Couric may have had a follow-up to that response? You’d be correct. And it wasn’t a devious follow-up. In fact, it was two words. “What specifically?”

    Same response. “Um, all of them, any of them that have been in front of me all these years,” Palin said.
    Palin On Foreign Policy
    “Exclusive”: Katie Couric talks with Gov. Sarah Palin about her foreign policy experience and Alaska’s proximity to Russia
    Bush Speech on Tribal Sovereignty
    “Mr. President, what does tribal sovereignty mean in the 21st century?”
    “Tribal sovereignty means that: it’s sovereign. it’s, you’re a, you’re a — you’ve been given sovereignty, and you’re — viewed as a sovereign entity……. And therefore the relationship between the federal government and —– tribes is one between —— sovereign entities.”
    Bush “Fool Me Once…”
    “There’s an old saying in Tennessee—I know it’s in Texas, probably in Tennessee…. — that says: “fool me once – shame on……… shame on …… you? You fool me—you can’t get fooled again!”

    “You know, one of the hardest parts of my job is to connect Iraq to the war on terror.” –interview with CBS News’ Katie Couric, Sept. 6, 2006
    “Our enemies are innovative and resourceful, and so are we. They never stop thinking about new ways to harm our country and our people, and neither do we.” –Washington, D.C., Aug. 5, 2004
    “Too many good docs are getting out of business. Too many OB-GYNs aren’t able to practice their love with women all across this country.”

    “Rarely is the questioned asked: Is our children learning?” -Florence, South Carolina, Jan. 11, 2000

    “I’ll be long gone before some smart person ever figures out what happened inside this Oval Office.” –Washington, D.C., May 12, 2008

    “The most important thing is for us to find Osama bin Laden. It is our number one priority and we will not rest until we find him.” –Washington, D.C., Sept. 13, 2001

    “I don’t know where bin Laden is. I have no idea and really don’t care. It’s not that important. It’s not our priority.” –Washington, D.C., March 13, 2002

    “Goodbye from the world’s biggest polluter.” –in parting words to world leaders at his final G-8 Summit, punching the air and grinning widely as those present looked on in shock, Rusutsu, Japan, July 10, 2008

    “I wish you’d have given me this written question ahead of time so I could plan for it…I’m sure something will pop into my head here in the midst of this press conference, with all the pressure of trying to come up with answer, but it hadn’t yet…I don’t want to sound like I have made no mistakes. I’m confident I have. I just haven’t — you just put me under the spot here, and maybe I’m not as quick on my feet as I should be in coming up with one.” –after being asked to name the biggest mistake he had made, Washington, D.C., April 3, 2004

    “For every fatal shooting, there were roughly three non-fatal shootings. And, folks, this is unacceptable in America. It’s just unacceptable. And we’re going to do something about it.” –Philadelphia, Penn., May 14, 2001

    “They misunderestimated me.” –Bentonville, Ark., Nov. 6, 2000

    “I would say the best moment of all was when I caught a 7.5 pound largemouth bass in my lake.” –on his best moment in office, interview with the German newspaper Bild am Sonntag, May 7, 2006
    George W Bush Comedy Quotes

    “I know the human being and fish can coexist peacefully.” –Saginaw, Mich., Sept. 29, 2000

    “Do you have blacks, too?” –to Brazilian President Fernando Cardoso, Washington, D.C., Nov. 8, 2001

    Reporter: Is the tide turning in Iraq? — Bush: “I think – tide turning – see, as I remember, I was raised in the desert, but tides kind of, it’s easy to see a tide turn……….. Did I say those words?” June 14, 2006

    “You are working hard to put food on your family.” –Greater Nashua, N.H., Chamber of Commerce, Jan. 27, 2000

    “My pro-life position is: I believe there’s life. It’s not necessarily based in religion. I think there’s a life there, therefore the notion of life, liberty, and pursuit of happiness.” – Quoted in the San Francisco Chronicle, Jan. 23, 2001

    “Border relations between Canada and Mexico have never been better” September 24, 2001

    Comment by Ostap Bender — September 23, 2010 @ 11:11 pm

  30. Oh I don’t know Ostap retard,

    If the professor wants to edit my posts no problem.

    However, as evidenced above, you are the one sabotaging the post with your Bush and Palin drivel.

    Substandard BS from your substandard intellect.

    Comment by Andrew — September 27, 2010 @ 12:28 am
