Streetwise Professor

May 10, 2020

Code Violation: Other Than That, How Was the Play, Mrs. Lincoln?

Filed under: CoronaCrisis, Economics, Politics, Regulation — cpirrong @ 3:03 pm

By far the most important model in the world has been the Imperial College epidemiological model. Largely on the basis of the predictions of this model, nations have been locked down. The UK had been planning to follow a strategy very similar to Sweden’s until the Imperial model stampeded the media, and then the government, into a panic. Imperial predictions regarding the US also contributed to the panicdemic in the US.

These predictions have proved to be farcically wrong, with predicted death tolls exaggerated by one and perhaps two orders of magnitude.

Models only become science when tested against data/experiment. By that standard, the Imperial College model failed spectacularly.

Whoops! What’s a few trillion dollars, right?

I was suspicious of this model from the first. Not only because of its doomsday predictions and the failures of previous models produced by Imperial and the leader of its team, Neil Ferguson. But because of my general skepticism about big models (as @soncharm used to say, “all large calculations are wrong”), and most importantly, because Imperial failed to disclose its code. That is a HUGE red flag. Why were they hiding?

And how right that was. A version of the code has been released, and it is a hot mess. It has more bugs than East Africa does right now.

Here is one code review. The biggest takeaway: due to bugs in the code, the model results are not reproducible. The code itself introduces random variation into the model, which means that runs with the same inputs generate different outputs.

Are you fucking kidding me?

Reproducibility is the essence of science. A model whose predictions cannot be reproduced (let alone any empirical results based on that model) is so much crap. It is the antithesis of science.

After tweeting about the code review article linked above, I received feedback from other individuals with domain expertise who had reviewed the code. They concur, and if anything, the article understates the problems.

Here’s one article by an interlocutor:

The Covid-19 function variations aren’t stochastic. They’re a bug caused by poor management of threads in the code. This causes a random variation, so multiple runs give different results. The response from the team at Imperial is that they run it multiple times and take an average. But this is wrong. Because the results should be identical each time. Including the buggy results as well as the correct ones means that the results are an average of the correct and the buggy ones. And so wouldn’t match the expected results if you did the same calculation by hand.

As an aside, we can’t even do the calculations by hand, because there is no specification for the function, so whether the code is even doing what it is supposed to do is impossible to tell. We should be able to take the specification and write our own tests and check the results. Without that, the code is worthless.

I repeat: “the code is worthless.”
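
To make the threading point concrete, here is a minimal, self-contained C++ sketch (my own illustration, nothing taken from the Imperial code) of the kind of unsynchronized shared-state bug being described. Same inputs, no random seed anywhere, and yet the answer can change from run to run:

```cpp
#include <cstdio>
#include <thread>
#include <vector>

// A shared accumulator updated without any synchronization: a textbook
// data race. Increments from different threads interleave and overwrite
// one another, so identical inputs do not guarantee identical output.
int main() {
    long long total = 0;                          // shared, unprotected
    auto work = [&total] {
        for (int i = 0; i < 1000000; ++i)
            total += 1;                           // racy read-modify-write
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
        threads.emplace_back(work);
    for (auto& th : threads)
        th.join();

    // Correct, deterministic code would always print 4000000.
    // A racy build typically prints a different number on every run.
    std::printf("total = %lld\n", total);
    return 0;
}
```

Averaging over many runs of a program like this, which is reportedly Imperial’s answer, does not fix anything; it just averages correct and corrupted results together, which is exactly the objection quoted above.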

Another correspondent confirmed the assessments of the code’s bugginess, and added an important detail about the underlying model itself:

I spent 3 days reviewing his code last week. It’s an ugly mess of thousands of lines of C (not C++). There are hundreds of input parameters (not counting the fact it models population density to 1km x 1km cells) and 4 different infection mechanisms. It made me feel quite ill.

Hundreds of input parameters–another huge red flag. I replied:

How do you estimate 100s of parameters? Sounds like a climate model . . . .

The response:

Yes. It shares the exact same philosophy as a GCM – model everything, but badly.

I recalled a saying of von Neumann: “With four parameters I can fit an elephant, with five I can make him wiggle his trunk.” Any highly parameterized model is IMMEDIATELY suspect. With so many parameters–hundreds!–overfitting is a massive problem. Moreover, you are highly unlikely to have the data to estimate these parameters, so some are inevitably set a priori. This high dimensionality means that you have no clue whatsoever what is driving your results.
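
To see how little discipline a glut of free parameters imposes, here is a minimal sketch (plain Lagrange interpolation in C++, purely illustrative and unrelated to the Imperial code). Give a model as many parameters as observations and it will fit any data exactly, no matter how meaningless, and then extrapolate wildly the moment you step off the data:

```cpp
#include <cstdio>
#include <vector>

// Lagrange interpolation: a polynomial with as many free parameters as
// data points passes through every point exactly, whatever the points
// are -- von Neumann's elephant, in about a dozen lines.
double fit(const std::vector<double>& x, const std::vector<double>& y, double t) {
    double result = 0.0;
    for (size_t i = 0; i < x.size(); ++i) {
        double term = y[i];
        for (size_t j = 0; j < x.size(); ++j)
            if (j != i) term *= (t - x[j]) / (x[i] - x[j]);
        result += term;
    }
    return result;
}

int main() {
    // Five arbitrary "observations" -- as far as we know, pure noise.
    std::vector<double> x = {0, 1, 2, 3, 4};
    std::vector<double> y = {3.1, -2.0, 7.5, 0.4, 5.9};

    // In-sample: the five-parameter fit reproduces every point exactly.
    for (size_t i = 0; i < x.size(); ++i)
        std::printf("fit(%.0f) = %5.2f   data: %5.2f\n", x[i], fit(x, y, x[i]), y[i]);

    // Out-of-sample: the same fit goes haywire immediately.
    std::printf("fit(5) = %.2f   fit(6) = %.2f\n", fit(x, y, 5.0), fit(x, y, 6.0));
    return 0;
}
```

A model with hundreds of parameters has vastly more freedom than this five-parameter toy, and correspondingly less ability to tell you anything about the world.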

This relates to another comment:

No discussion of comparative statics.

So again, you have no idea what is driving the results, and how changes in the inputs or parameters will change predictions. So how do you use such a model to devise policies, which is inherently an exercise in comparative statics? So as not to leave you in suspense: YOU CAN’T.
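
For readers who want the term made concrete: comparative statics in a simulation setting is nothing exotic. Perturb one input, hold everything else fixed, and see how the output moves. A minimal sketch follows; the model() function is a made-up stand-in, not anything from the Imperial code:

```cpp
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a simulator: some outcome (say, projected
// deaths) as a function of a handful of parameters. Illustrative only.
double model(const std::vector<double>& p) {
    return 1000.0 * p[0] * p[1] / (1.0 + p[2]);
}

int main() {
    // Baseline parameters (the labels are invented for the example).
    std::vector<double> base = {2.4, 0.9, 0.5};   // e.g. R0, contact rate, mitigation
    const double y0 = model(base);

    // Comparative statics: bump each parameter by 1% in turn and report
    // the percentage change in the output, everything else held fixed.
    for (size_t i = 0; i < base.size(); ++i) {
        std::vector<double> p = base;
        p[i] *= 1.01;
        std::printf("parameter %zu: +1%% input -> %+.2f%% output\n",
                    i, 100.0 * (model(p) - y0) / y0);
    }
    return 0;
}
```

With three parameters you can tabulate this by hand. With hundreds of parameters, and code that is not even deterministic, you cannot.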

This is particularly damning:

And also the time resolution. The infection model time steps are 6 hours. I think these models are designed more for CYA. It’s bottom-up micro-modelling which is easier to explain and justify to politicos than a more physically realistic macro level model with fewer parameters.
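
For contrast, the sort of macro-level alternative my correspondent alludes to can be written down in a handful of lines. Here is a bare-bones SIR compartment model with three knobs (purely illustrative; not any specific published model, and the parameter values are placeholders, not estimates):

```cpp
#include <cstdio>

// Bare-bones SIR compartment model: susceptible -> infected -> recovered,
// integrated with a simple daily time step. Three parameters instead of
// hundreds; every assumption is visible on the page.
int main() {
    const double N     = 66.0e6;   // population size (placeholder)
    const double beta  = 0.25;     // transmission rate per day (placeholder)
    const double gamma = 0.10;     // recovery rate per day (R0 = beta/gamma = 2.5)

    double S = N - 100.0, I = 100.0, R = 0.0;
    for (int day = 0; day <= 180; ++day) {
        if (day % 30 == 0)
            std::printf("day %3d: S=%12.0f I=%12.0f R=%12.0f\n", day, S, I, R);
        const double newInfections = beta * S * I / N;
        const double newRecoveries = gamma * I;
        S -= newInfections;
        I += newInfections - newRecoveries;
        R += newRecoveries;
    }
    return 0;
}
```

Whether such a model is right is a separate question. The point is that with three parameters you can at least see, and argue about, what is driving the answer.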

To summarize: these models are absolute crap. Bad code. Bad methodology. Farcical results.

Other than that, how was the play, Mrs. Lincoln?

But it gets better!

The code that was reviewed in the first-linked article . . . had been cleaned up! It’s not the actual code used to make the original predictions. Instead, people from Microsoft spent a month trying to fix it–and it was still as buggy as Kenya. (I note in passing that Bill Gates is a major encourager of panic and lockdown, so the participation of a Microsoft team here is quite telling.)

The code was originally in C, and then upgraded to C++. Well, it could be worse. It could have been Cobol or Fortran–though one of those reviewing the code suggested: “Much of the code consists of formulas for which no purpose is given. John Carmack (a legendary video-game programmer) surmised that some of the code might have been automatically translated from FORTRAN some years ago.”

All in all, this appears to be the epitome of bad modeling and coding practice. Code that grew like weeds over years. Code lacking adequate documentation and version control. Code based on overcomplicated and essentially untestable models.

But it gets even better! The leader of the Imperial team, the aforementioned Ferguson, was caught with his pants down–literally–canoodling with his (married) girlfriend in violation of the lockdown rules for which HE was largely responsible. This story gave verisimilitude to a tweet of mine from several days before the story broke.

It would be funny, if the cost–in lives and livelihoods irreparably damaged, and in lives lost–weren’t so huge.

And on such completely defective foundations policy castles have been built. Policies that have turned the world upside down.

Of course I blame Ferguson and Imperial. But the UK government also deserves severe criticism. How could they spend vast sums on a model, and base policies on a model, that was fundamentally and irretrievably flawed? How could they permit Imperial to make its Wizard of Oz pronouncements without requiring a release of the code that would allow knowledgeable people to look behind the curtain? They should have had experienced coders and software engineers and modelers go over this with a fine-tooth comb. But they didn’t. They accepted the authority of the Pants-less Wizard.

And how could American policymakers base any decision–even in the slightest–on the basis of a pig in a poke? (And saying that it is as ugly as a pig is a grave insult to pigs.)

If this doesn’t make you angry, you are incapable of anger. Or you are an idiot. There is no third choice.

26 Comments

  1. Alex Berenson also made a great speech on marijuana.

    Read it here, the first paragraph will grab you.

    https://imprimis.hillsdale.edu/marijuana-mental-illness-violence/

    Alex is the go to guy on the virus also.

    Comment by Joe Walker — May 10, 2020 @ 3:11 pm

  2. We traded posts on Twitter about this. As you put it, this is despicable: worthy of Mann’s East Anglia “the dog ate my data in the cloud” excuse when he had it subpoenaed. Replicability has become a real crisis. In BD BS non-replicability is reported at over 80%. The so-called natural sciences are polluted as well. Poorly executed Markov processes are a notorious culprit. The real problem is that people buy into these things without understanding what the hell is going on. Did this credentialed idiot set any predetermined sets and outcomes and backtest? Apparently not. Would have cut into his booty call time.

    Comment by Sotosy1 — May 10, 2020 @ 3:15 pm

  3. I failed to mention one thing: this gimp has almost made Markov infected OAS models look good. At least they backtested, even if they had to add fudge factors like mean reversion, etc. to make it look like it worked. I didn’t think that was possible.

    Comment by Sotosy1 — May 10, 2020 @ 3:21 pm

  4. The USA needs to extradite Ferguson for economic sabotage if the UK is not going to do so.
    I’m sure that a case can be constructed of either contributory negligence or deliberate destructiveness based on the faults of the models you’ve outlined above.

    Comment by Nessimmersion — May 10, 2020 @ 3:30 pm

  5. @Joe Walker. Thanks. I follow Berenson closely.

    @Sotosy1. Public science has been totally corrupted by the confluence of money and agendas. Further, many (most?) of those who spend the money are both incapable of evaluating the rigor and reliability of what they bought, and indifferent regardless, because they are buying specific results that advance an agenda. The money too often goes to those who will produce the wanted results, not to those who will evaluate the evidence rigorously and objectively.

    Re backtesting. I was thinking about this the other day. These epidemiological models of something like covid19 cannot be backtested if you believe other things that the “experts” tell us about it. Note the inherent contradiction. A “novel” virus, which we are told over and over again we know little about, and which may differ dramatically from other viruses. How can you backtest a model of a one-off? You can’t! You should put very little weight on a model that cannot be backtested. But governments bet everything on it!

    It drives me to distraction. These people commit every crime against science and good methodological practice that you can imagine. But don’t you dare criticize it! If you do, you will be labeled an anti-science “denier” who wants people to die.

    Comment by cpirrong — May 10, 2020 @ 4:39 pm

  6. @Sotosy1. This model makes me pine for Gaussian copulas!

    Comment by cpirrong — May 10, 2020 @ 4:48 pm

  7. I recently re-read Feynman’s book, “Surely You’re Joking, Mr. Feynman,” in which the last chapter is his famous essay, “Cargo Cult Science.” His criticisms of the lack of integrity in the science of that time are as pertinent today as when he gave his speech in 1974, if not more so.

    http://calteches.library.caltech.edu/51/2/CargoCult.pdf

    Comment by ColoComment — May 10, 2020 @ 4:50 pm

  8. @Sotosy1. The models are such that even if they were coded properly, they would be impossible to understand. And given the appalling code, trying to understand what is actually happening is beyond impossible. Even if you knew the code with metaphysical certainty you could not predict the outcome because the poor coding creates wildly idiosyncratic software-hardware interactions.

    Re poorly executed Markov processes. I recall a statement in Teukolsky et al.’s Numerical Recipes to the effect that you could fill libraries with papers rendered garbage by the use of faulty random number generators.
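
    A classic concrete example (unrelated to the Imperial code, just an illustration of how a “random” number generator can be anything but) is IBM’s old RANDU generator. Its successive outputs obey a simple linear recurrence, so consecutive triples fall on a handful of planes rather than filling space:

    ```cpp
    #include <cstdint>
    #include <cstdio>

    // RANDU: x_{n+1} = 65539 * x_n mod 2^31. Because 65539 = 2^16 + 3, the
    // outputs satisfy x_{k+2} = 6*x_{k+1} - 9*x_k (mod 2^31) -- hidden
    // structure that silently corrupts any simulation built on top of it.
    int main() {
        const uint64_t M = 2147483648ULL;   // 2^31
        uint64_t x = 1;                     // seed
        auto next = [&] { x = (65539ULL * x) % M; return x; };

        uint64_t a = next(), b = next();
        for (int i = 0; i < 5; ++i) {
            uint64_t c = next();
            uint64_t predicted = (6 * b + 9 * (M - a)) % M;   // 6*b - 9*a mod 2^31
            std::printf("x = %10llu   predicted from previous two = %10llu   match = %d\n",
                        (unsigned long long)c, (unsigned long long)predicted,
                        (int)(c == predicted));
            a = b;
            b = c;
        }
        return 0;
    }
    ```

    Modern generators avoid that particular trap, but the lesson stands: the randomness in a simulation is an input you have to verify, not assume.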

    Comment by cpirrong — May 10, 2020 @ 4:52 pm

  9. “If this doesn’t make you angry, you are incapable of anger. Or you are an idiot. There is no third choice.” Could not agree more, but how do we deal with this? I will send it to my local House of Reps member, but what else? Can we start a Help Fund Me program to get the publicity out there? What “models” of advocacy, not mathematics, have worked in the past, or are we a greenfield? How to activate anger into action?

    Now when we see something that does not pass what we call the Pub Test (for Yanks, maybe the bar test) we begin to smell a rat, a corruption rat. We need to follow the money: who benefits from the policy action initiated by the government(s)? I’ve been reading some interesting articles at off-guardian.org which highlight just who is getting the benefit of such policy. Heaven forbid, can you imagine they are Big Pharma and Bill Gates’s cronies.

    The two primary motivations for mankind are fear (life & livelihood) and greed (money & power), especially where greed feeds off fear.

    Comment by Alessandro — May 10, 2020 @ 5:09 pm

  10. The fundamental problem with the Ferguson code can be seen in the following source from Github:

    https://github.com/mrc-ide/covid-sim/blob/master/src/CovidSim.cpp (the 5000+ line main program);
    https://github.com/mrc-ide/covid-sim/tree/master/tests (the current tests)

    There are (now, at least) some tests for the code – but they are regression tests, testing whether a given set of inputs produces the same output as before. There are no *unit tests* which test whether key functions in the code do the right thing in isolation. All that the regression tests do is confirm that whatever mistakes were made in the original code, *continue to happen* in the new code. There is no practical way to know whether any given function does “the right thing”; unit test input/output at least tells you what the original author thought “the right thing” was, and might even (in an ideal universe) explain why.

    An arbitrary example: in the CovidSim.cpp file look for the function “void UpdateProbs”. It’s a reasonable-sized function with a single input. What does it do? There’s literally no documentation. It’s called within the main loop (in RunModel() ) so it’s clearly very important to the model’s output, but there’s no indication of what it does, and hence how it is possible to know it is correct. If it’s obvious enough to require no comments, it’s easy enough to unit-test. If it’s too hard to unit-test, you need to document the snot out of it.
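
    To make that concrete, here is a minimal sketch of the kind of unit test that is missing. The function, its formula, and the expected values are hypothetical (they are not taken from CovidSim); the point is that each check records what the author believed “the right thing” to be, independently of any previous run of the whole model:

    ```cpp
    #include <cassert>
    #include <cmath>

    // Hypothetical example of a small, documented, unit-testable function.
    // infection_prob: probability of infection after `contacts` independent
    // exposures, each transmitting with probability p.
    double infection_prob(double p, int contacts) {
        return 1.0 - std::pow(1.0 - p, contacts);
    }

    int main() {
        // Unit tests: small, isolated checks with hand-computable answers.
        assert(infection_prob(0.0, 10) == 0.0);                    // zero risk per contact => zero risk
        assert(std::abs(infection_prob(0.5, 1) - 0.5)  < 1e-12);   // one contact: just p
        assert(std::abs(infection_prob(0.1, 2) - 0.19) < 1e-12);   // 1 - 0.9^2 = 0.19
        return 0;
    }
    ```

    A regression test, by contrast, only tells you that the function returns the same numbers it returned yesterday, right or wrong.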

    There are areas where reasonable engineers might differ about this code – choice of language, maximum function length / loop nesting depth, commenting strategy. This is not one of them. If you’re not unit testing, you are saying that you don’t actually care – or understand – what each function does.

    Every function over about 20 lines (language dependent) has a significant bug – 95% confidence based on my experience. If I wrote the code, it’s realistically more like 98% confidence – I’m lucky to work with many people smarter than me.

    My qualifications? I’m not going to go the credentialism route (anonymity is nice), but you can look at posts labelled “software” on my blog, and form your own opinion whether I know what I’m talking about. “Sue Denim” is clearly a knowledgeable software engineer and I endorse her/his/its opinions. If anything, Sue was too kind.

    Comment by Hopper — May 10, 2020 @ 5:53 pm

  11. @ColoComment–I think the only question is how many orders of magnitude worse it is. A good guess would be the amount by which funding has grown. The system is utterly corrupted now. There are no real checks and balances. Peer review? Don’t make me laugh (or vomit). That system was always flawed and it has been corrupted too.

    I will read the essay–thanks for the pointer. I was thinking of “Cargo Cult” the other day. I don’t know whether it was a vestigial memory of Feynman, or an original thought 😛 But politicians who wouldn’t know true science from a turnip are certainly like Cargo Cultists when they chant the mantra of science.

    Comment by cpirrong — May 10, 2020 @ 6:29 pm

  12. @Alessandro. Great question. I have been thinking how to do that. It has to be something from outside the system. But overcoming coordination/free rider problems is hard. I’ve joined a small group that is wrestling with the same thing over Zoom/Telegram.

    Can’t do better than Thucydides: Fear, honor, interest (i.e., greed).

    Bill Gates is one of the most malign figures in modern life. I will look at off-guardian.org.

    Comment by cpirrong — May 10, 2020 @ 6:33 pm

  13. @Hopper. Thanks very much. Very informative.

    Credentialism is way overrated.

    One of the people I quoted in the post also told me that he had looked at Ferguson’s published papers, and they did not have “a sufficient description of the model to allow a reader to reproduce it.” So you don’t know what the programs are supposed to do, or how they are supposed to work, and you can’t create programs based on a description of the model. That is another form of non-reproducibility.

    Comment by cpirrong — May 10, 2020 @ 6:40 pm

  14. cpirrong — May 10, 2020 @ 6:29 pm

    Feynman is a marvelous writer, and his curiosity knew no bounds. Literally, no limits. He was a physicist who challenged himself to learn bongos and Portuguese and pick locks, practiced pick-up lines in bars, and played math games in his head: he is/was amazing.

    He is so cute in this video. Who else could say, with a lilt of excitement, “I get a kick out of … thinking about things.” And…, bicycle pumps (near the end.)
    https://www.ted.com/…/richard_feynman_physics_is_fun_to…
    and
    https://www.nature.com/articles/d41586-018-05082-4
    He is by far my favorite science-related writer, along with Oliver Sacks.

    Comment by ColoComment — May 10, 2020 @ 6:40 pm

  15. Link should be

    https://www.ted.com/talks/richard_feynman_physics_is_fun_to_imagine?utm_campaign=tedspread&utm_medium=referral&utm_source=tedcomshare

    Comment by ColoComment — May 10, 2020 @ 9:52 pm

  16. Yes, that is truly atrocious. And not only atrocious – the scale of the income foregone and the cost in livelihoods and possibly lives is, I think, criminal. Maybe a group of us can sue these people for losses.

    Yet another failure of Big Government and ‘SCIENCE!’

    I can only think of possibly one vanishingly minor upside to this clusterfuck: Years ago, when I was learning about pseudo-random number generation, I was taught that there could be no such thing as a string of genuinely random numbers, as all random numbers are produced by some sort of process which could, theoretically, be identified and modelled. The Imperial College team’s buggy code, which produces unreproducible random numbers, may have inadvertently proven that belief wrong.

    Comment by Ex-Global Super-Regulator on Lunch Break — May 10, 2020 @ 10:58 pm

  17. The Imperial model was a bit like the virus itself, i.e. arriving at just the right time on just the right host. As I recall, the report came out a week or two after most of our European neighbours had locked down and the clamour here to do likewise was growing to fever pitch. Also, our minimalist government, like yours, isn’t exactly big on detail or getting behind the numbers, and probably welcomed some deniable (i.e. “we were following the science”) opportunity to justify a particular course of action.

    I haven’t had time yet to read the un-redacted (yes, that actually happened) report from SAGE to see what if any concerns were raised about the Imperial model. I’m surprised the likes of Whitty or Vallance didn’t question it, or at least ask for more time to check it out, given their expertise.

    As for your last comment, there is a third choice i.e. simply being worn down to the point of grimly accepting your fate as a consequence of the seemingly boundless ineptitude of your leadership.

    Comment by David Mercer — May 11, 2020 @ 4:13 am

  18. Financial models, climate models, epidemiological models… I’m noticing a pattern here.

    Comment by Andrew Stanton — May 11, 2020 @ 6:38 am

  19. Poor Gauss – the sins committed in his name! I never understood models that had so-called random variables that were anything but and that, from prior experience, would all radically change their relationships the moment the theater catches on fire, everyone is screaming, all arbitrages are breaking down due to funding and collateral constraints, general panic, no one answering the phone or posting on the boards, etc. What really amused me was watching the children build these things to price and trade FI. Sure, an elegant model will mimic the traders of my generation: headset on, a slice of half-eaten pizza in one hand while he scratches himself with the other, desperately trying to figure out on which moron he could unload the securitized Yurt loans* he was being offered at a point or greater.

    I tried to tell them the difference between evaluations – what something is theoretically worth relative to other things – and valuation – where the damned bid is. No one would listen.

    * I was told such an issue actually existed!?!

    Comment by Sotosy1 — May 11, 2020 @ 10:20 am

  20. It is a mistake to fixate too much on the quality of the code, regardless of how bad it is. Focusing on things like a lack of automated confirmatory checks* misses the point as well. Do not focus on the fact it was Microsoft staff that cleaned it up. Do not give them the slightest reason to dismiss discussion of the core issues.

    What are the core issues here?
    * What was the intent of the model?
    * What were the assumptions?
    * How did they avoid the overfitting problem?
    * …

    You could code up a very nice, clean, modern implementation of the model, one that had none of the issues people are fixating on, and yet, given the intent and assumptions, it would still come up with the same orders of magnitude in its output. Would that be OK for everyone? Would our host suddenly go, oh right, let’s lock down for a long time? Somehow, I do not think so.

    The fundamental thing at play here is different philosophical approaches to risk management. There are essentially two schools of thought: descriptive vs prescriptive. The descriptive method is to describe how you intend to reduce risk “as far as reasonably practicable”; this requires thinking, understanding the context of the system, and the ability to understand and accept that there is no such thing as completely safe and that there are tradeoffs. The prescriptive method is: do xyz and you will be safe, just do as you are told, no thinking required, there will always be someone to blame and be a target for litigation. I sit fairly in the descriptive school, in case it was not obvious.

    Oh, and Mr. Mercer, I totally agree on how soul-destroying it would be to be subjected to the leadership of what is probably the worst sequence of prime ministers in English history: Blair, Brown, Cameron and May. You get what you vote for, I guess. Although the UK should be thankful that Brown was a literal Euro sceptic and kept the UK out of that currency union.

    * An aside from someone who tests software for a living**. Testing is “a technical investigation performed to expose quality-related information about the product under test”. It cannot be automated, but its practitioners can and should make use of tools – or, as I like to call them, software developers :) – to do the repetitive trivial confirmatory checks so that we can do the deeper thoughtful testing. Checks are simple confirmations of expected behaviour; as an example, think of all the times you say “check”, replace it with “test”, and ask whether that makes any sense, e.g. let me test my calendar to see if I’m free.
    ** Testers do not break software; all we do is show where it is already broken.

    Comment by Steve in Calgary — May 11, 2020 @ 11:12 am

  21. At the time the prof was running his model, our ignorance about SARS2 was almost total, apart from a few Chinese lies.
    So even if it had been the best damn model in the universe, it would still have produced garbage.

    Comment by philip — May 11, 2020 @ 11:52 am

  22. @philip Yes there is that too. To place the blame entirely on Imperial’s model is a bit harsh, but then some people are desperate to find any scapegoat for their government’s chaotic response.

    Comment by David Mercer — May 11, 2020 @ 2:00 pm

  23. @Andrew Stanton. I wrote years ago on this blog about parallels between financial models, “big” macroeconomic models, and climate models. Yes. There are similarities.

    In my view, the role of models is to elucidate causal mechanisms. To do so, models have to be relatively simple and transparent. This inevitably requires abstraction and a basic lack of realism. Trying to model everything, but badly (to quote one of the people I corresponded with) is not the right response to abstraction.

    This means that one has to be humble about models, their purposes, and their limitations. I’ve written a lot here and elsewhere that modelers tend to fall deeply in love with their models, and that this is a serious problem.

    All that said, the saying “without theory the data are mute” is true. You need a theory/model to guide empirical inquiry (and to avoid the pitfalls of data mining), and the purpose of empirical research is to test the causal mechanisms identified by theory.

    But to the extent that code is required to bring theory and data together to test predictions, that code has to be reliable. Pace @Steve in Calgary, code with the flaws in the ICL model is sufficient to reject use of this model, regardless of the underlying soundness of that model. And as I mentioned in the post, previous Ferguson publications apparently do not permit an evaluation of the underlying soundness of the model, and the messiness of the code similarly does not permit an evaluation of the underlying model.

    Comment by cpirrong — May 11, 2020 @ 3:25 pm

  24. […] review, by Craig Pirrong, Professor of Finance and Energy Markets Director of the Global Energy […]

    Pingback by Critics Batter The Fake Science Used To Justify The Lockdown As Boris Waffles | The Original Boggart Blog — May 12, 2020 @ 7:26 am

  25. Hitherto Imperial has had a reputation as one of the top STEM universities in the UK. That may have influenced whoever in government decided to listen to him. I suspect this affair will have done significant damage to that rep for the future.

    Comment by Tractor Gent — May 12, 2020 @ 10:19 am

  26. @Tractor Gent. If you are going to spend 10s of millions of pounds on a model, and bet the lives and livelihoods of 10s of millions of people on it, it is beyond negligent to do so purely on the basis of a generalized reputation of a university. That is hardly any guarantee that what you will get is reliable and useful. (Charlatans and outright academic frauds are not unknown, even at the “best” universities.) Moreover, there was no need to rely on ICL’s reputation overall: Ferguson has a long track record. A long and horrible track record of wildly exaggerated predictions. That should have steered the government clear of him, and if it didn’t, at least given them ample reason to perform a thorough forensic evaluation of his model and his code before using it to make a single decision.

    ICL should be furious at Ferguson for precisely the reason that you suggest: the Ferguson fiasco will damage the reputation of the university as a whole. That’s not fair or right, but nor was it right for the government to rely on that reputation in the first place, thereby giving Ferguson the opportunity to rubbish it.

    Comment by cpirrong — May 14, 2020 @ 6:16 pm
