A longer outage than I anticipated, but we’re back! Today we’ll be talking, as we did last issue, about how hard it can be to recognize systemic failure - but this time I want to focus on real-world examples and look more at the stakes of this phenomenon. Let’s start with an article from Matt Stoller’s substack BIG that I couldn’t give proper attention to when it first came out - The War in Afghanistan Is What Happens When McKinsey Types Run Everything.
“McKinsey types” is one of those phrases whose exact meaning is hard to nail down, but quite easy to understand emotionally if you’ve met one. McKinsey is the biggest player in the “management consulting” space, specializing in parachuting into a rich domain and reducing it down to simple metrics. Let’s look at a paragraph from Stoller to set the scene:
In fact, McChrystal and much of our military leadership is tight with consultants like McKinsey, and that whole diseased culture from Harvard Business School of pervasive over-optimism and finance-venture capital monopoly bro-a-thons. McKinsey itself had involvement in Afghanistan, with at least one $18.6 million contract to help the Defense Department define its “strategic focus,” though government watchdogs found that the "only output [they] could find" was a 50-page report about strategic economic development potential in Herat, a province in western Afghanistan. It turns out that ‘strategic focus’ means an $18.6 million PowerPoint. (There was reporting on this contract because Pete Buttigieg worked on it as a junior analyst at McKinsey, and he has failed upward to run the Transportation Department.)
We can define by example: a McKinsey type is one who imagines that eighteen point six million dollars could be usefully spent on “strategic focus” by people who aren’t the same ones doing the actual work. So McKinsey types are not just the employees of management consulting firms, but also the credulous bureaucratic substrate that they feed and multiply off of. “McKinsey type” is a statement of faith about dashboards and metrics, one that ignores any concept of determining their correspondence to the world and instead insists that the act of creating them makes them true. Management consultancy requires a profound arrogance about your ability to compress a domain that isn’t yours into the important bits; it’s no surprise that the people who think they can do this are usually those who are unfamiliar enough with doing real work that they can confuse the map with the territory.
Stoller again:
And their embarrassment covers up something even more dangerous. None of these tens of thousands of Ivy league encrusted PR savvy highly credentialed prestigious people actually know how to do anything useful. They can write books on leadership, or do powerpoints, or leak stories, but the hard logistics of actually using resources to achieve something important are foreign to them, masked by unlimited budgets and public relations. It is, as someone told me in 2019 about the consumer goods giant Procter and Gamble, where “very few white-collar workers at P&G really did anything” except take credit for the work of others.
Because having lots of money leads to you passively accumulating more, top-heavy governments and corporations can be profoundly wasteful and still see their numbers go up. They can fail all they want and still succeed. But what do “success” and “failure” mean, anyway? This is the real asset that management consultancy is selling - definitions in abstract domains. “The war in Afghanistan”, “the COVID-19 pandemic”, “the social welfare programs of the United States” - these are dense and interconnected domains where a lot of people will try a lot of things and at the end some outcome will occur. You don’t get a clear breakdown of what was in your power and what wasn’t, you don’t get an omniscient being to tell you whether your view of what ended up happening was true, and you certainly don’t get a letter grade telling you how good your outcome was among the possible ones.
So what happens when the naive quantification of a McKinsey type meets the real world? Trading away long-term resilience for short-term efficiency, usually. The example Stoller uses is the Afghan air force being maintained by contractors, who promptly abandoned them when the US withdrew. I’m not a military expert, but if you asked me “If we’re trying to build up the Afghan military to be independent, should they be dependent on NATO contractors who will leave when the US withdraws?” I wouldn’t exactly need to think long to say “no”. And neither, I imagine, would you. So how does this sort of thing happen? Well, because in some slide deck that some military head-of-something-or-other looks at every month, there’s a metric being tracked called something like “operating costs”. And when the contractors put in their bid, the monthly cost of running the air force went down, and that metric turned green and had a little +X% next to it. Then some person whose job is making PowerPoints showed that slide and figured it was self-evident that the number being green meant that the military was more capable, and things are officially better than last month, and that’s that.
So they use extremely tenuous proxies like “monthly expenses” to try to predict something like “the outcome of a war”, and fall victim to all sorts of trades that juice the proxy by weakening its already-questionable predictive power. If you knew what I meant the first time I said “McKinsey type”, you probably knew all of this already. But the thing I really want to focus on is how this attitude manages to be so pernicious despite its history of consistent and obvious failure. Why are there still McKinsey types? I think the answer lies in how many kinds of failure are largely unmeasurable.
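To make the dynamic concrete, here’s a toy simulation - every number and name in it is invented for illustration, none of it comes from Stoller’s reporting - of a dashboard that tracks only the proxy while the thing the proxy is supposed to predict quietly rots:

```python
import random

# Toy model of a proxy metric diverging from the thing it stands for.
# All quantities here are hypothetical.
random.seed(0)

monthly_cost = 100.0    # the proxy: what the dashboard tracks
self_sufficiency = 1.0  # the goal: what the dashboard never sees

for month in range(1, 13):
    # Each month, an outsourcing deal shaves 2-5% off the cost...
    savings = random.uniform(0.02, 0.05)
    monthly_cost *= 1 - savings
    # ...but every function outsourced is a capability that is
    # no longer held in-house. Nobody plots this column.
    self_sufficiency *= 1 - 2 * savings

    print(f"month {month:2d}: cost {monthly_cost:5.1f} "
          f"(green, -{savings:.1%}) | "
          f"self-sufficiency {self_sufficiency:.2f} (not on the slide)")
```

Run it and the dashboard shows twelve straight months of green, while the quantity that actually predicts “the outcome of a war” quietly collapses. Nothing in the printout is false; the failure lives entirely in the column nobody chose to track.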
Take this Slate article by Vishal Khetpal: Just Say It: The Health Care System Has Collapsed. It makes the point that saying the US healthcare system is on the brink of collapse due to COVID-19 is giving it too much credit. Daniel Wilkinson presented with gallstone pancreatitis, a condition extremely amenable to quick surgery; the surgery was delayed for lack of ICU capacity, and he died. Is “collapse” a good word for an easily preventable death occurring because of stresses on the system? Like the word “failure”, it primes us into thinking about thresholds and discrete states - the system has collapsed, or it has not.
Instead, try asking: how collapsed is it, exactly? Daniel Wilkinson wasn’t the only patient at that hospital, and some people got the care they needed that day. When you analyze his individual story, it’s clear enough that his death was a profound failure because of the personal details: the seven-hour wait, the huge success rate of his procedure when done on time. But coding this as a “preventable death” is much harder. Hospitals always have delays, and there will always be some sick people who die even if everything is done perfectly. You can measure delays, and you can measure deaths with certain diagnoses, but “could have been prevented” is an individual story. Prevented how? With what kind of effort? IF the wait time had been shorter, THEN Daniel Wilkinson would have lived - but there’s always going to be some sort of margin where that’s true for somebody and false for somebody else.
“Average time to get an ICU bed” is a metric that you can measure, but that’s not the same thing as discovering the impact of its failure. You can set metric thresholds to decide what you want to call a failure, but how much your failure actually meant in material terms requires the sort of counterfactual analysis (“What if everything in the world had been the same that day except the average time to an ICU bed?”) that’s extremely difficult to do over aggregated numbers. This is the bolt hole that a McKinsey type can always crawl back into for self-defense. Systems of sufficient complexity rarely fail completely in discrete and definitive ways - they simply do a lot worse. But as long as there’s someone left behind to pay the consulting fee, a consultant can point to whatever green number is on the dashboard, and instead of saying “Wow, we still bungled the hell out of the war in Afghanistan even though that number was green - I guess I should re-evaluate my thoughts on what I chose to measure”, they can just say “Well I’m sure that bad outcome was inevitable, but it would have been worse if the number was red. Hooray for data!”
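The same toy approach shows why that bolt hole exists. Here’s a hypothetical sketch - invented numbers again, nothing from the actual case - of how a perfectly presentable average wait can coexist with deaths at the margin, and why “preventable” only becomes visible when you ask the counterfactual question patient by patient:

```python
import random

# Hypothetical patients: each has a wait time and a survival window
# (how long they can wait before the delay becomes fatal).
random.seed(1)
patients = [
    (random.uniform(1, 10),   # hours waited for an ICU bed
     random.uniform(4, 14))   # hours this patient could survive the wait
    for _ in range(1000)
]

avg_wait = sum(wait for wait, _ in patients) / len(patients)
deaths = sum(1 for wait, window in patients if wait > window)

# The counterfactual, asked individual by individual:
# same day, same patients, every wait one hour shorter.
deaths_if_faster = sum(1 for wait, window in patients if wait - 1 > window)

print(f"average wait: {avg_wait:.1f}h  <- the number on the slide")
print(f"deaths: {deaths}")
print(f"deaths with 1h shorter waits: {deaths_if_faster}")
print(f"preventable at that margin: {deaths - deaths_if_faster}")
```

The average barely moves between the two scenarios, but specific people live in one and die in the other. Aggregate the patients away and that margin - the entire content of the word “preventable” - disappears with them.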
There’s a phrase that sometimes gets passed around by McKinsey types: “the plural of anecdote is not data”. The intent is to elevate whatever report was put together as superior to the lived experience of any naysayer. But the original quote from Raymond Wolfinger was exactly the opposite: “the plural of anecdote is data”. I’m with Wolfinger on this one. Data is a detail-annihilating projection that may or may not correspond well, or at all, to the real world, while an anecdote is something that actually happened. And far too often, success is expressed in data while failure remains buried within anecdotes. After all, why do all of the work to properly handle details that will just make you look bad when you could tell a nice story and collect a big check instead?
At the beginning of his piece, Stoller asks: “Do we have the competence to govern ourselves anymore?” If your vision of “collapse” is a complete post-apocalypse of marauding gangs, it may seem like a silly question to ask of a society that is broadly functional for many people most of the time. But dying of gallstone pancreatitis that was diagnosed several hours before death, dying for lack of insulin when it’s an order of magnitude cheaper across the border, dying for lack of food while across town police pour bleach on food in a dumpster - these are failures of governing competency. These are stories of neglect, and one enormous form of neglect is to have the depths of your tragedy insufficiently captured and understood - excluded from the metrics, so that on the day you die, the bureaucrat who could have saved your life is looking at a PowerPoint full of green numbers, because nothing about your death made any of those numbers worse. That’s the reason I dislike McKinsey types, and that’s the reason I’m writing Desystemize. We need to be more mindful both of what metrics we choose to track, and of the fact that metrics will always lag anecdotes in detecting and preventing system failure.
Next time we’ll re-tread the ground of Desystemize #5 on AI, but this time focusing on the existential risk of superintelligence - it’s been a popular topic in the comment section and I want to give it a full treatment. If you want to know when it’s done, make sure you subscribe!