Desystemize #1

How hard is it to get counting right?

The thesis of Desystemize, in one sentence: it takes a lot of work for a number to mean anything and we largely aren’t doing it. That’s a simplified version that leaves a lot to explore. There are domains where this happens more and less, approximations that are better and worse, failure cases that are bounded and failure cases without end. And it’s not just numbers, really - it’s the abstractions we create to scale up our knowledge that go bad, with numbers more of a symptom than a cause. But at the end of the day, our mission is to inject detail into systems that don’t mean as much as they think they do.

“Inject detail” - it’s a bit of awkward phrasing, isn’t it? I prefer to use constructive language that focuses on what we’re positively doing, but we don’t have familiar phrases on hand for doing this kind of work. We don’t talk about this stuff enough. That’s my whole point! This newsletter is called Desystemize and not The Detail Injector because -- well, because The Detail Injector would be a terrible name, but also because our language in this space is so geared towards building systems up and not tearing them down. By necessity if not choice, these will be stories largely focused on things going wrong. I prefer to highlight wins over losses, but as we’ll discover, the wins are few and far between and the losses are all around us. Still, negativity is no way to start something new, is it? Let’s start with one of those few wins so we can see what it actually means to desystemize. Let’s talk about counting ticks.

Counting is a system, one a lot more profound and powerful than we give it credit for. After all, if I pick up an animal, count the ticks on it (the “tick burden”), and then set it loose again, I’ve created something from scratch. We now have a number, where before there was only a messy and detailed world. When we set that animal loose, we can never travel back in time to the world that was then and have another look, but the number will survive as far in the future as we care to take it. We can study a tract of past we can’t revisit because we’ve made a number that serves as a mirror to it. Systems are great at creating things!

And you’ll need those creations if you, for example, want to study the spread of Lyme disease. One of the people doing this studying is Dr. Richard S. Ostfeld, whose book “Lyme Disease: The Ecology of a Complex System” excellently records the investigations of his lab group. A number of studies in the 1980s and early 1990s all leaned towards white-footed mice as the primary host of the ticks that transmit Lyme disease, at least in the northeastern United States. They were plentiful, they tended to pass on the responsible bacterium (Borrelia burgdorferi) to ticks feeding on them, but most importantly: they had the highest tick burdens. It was thanks to those counts that we could ID the guilty party. 

Saved again by the power of systems. But remember, the whole thing that makes counting so powerful is that the past is unreachable. We’re completely reliant on the field counts being an accurate way to visit the past. Is the count of ticks on a given animal really the same thing as the number of ticks on a given animal? This is where Dr. Ostfeld comes in with, for my money, one of the most beautifully simple experiments of all time:

"My research group has set and checked many hundreds of thousands of live animal traps over the years. When we catch a white-footed mouse, we remove the animal from the trap and hold it by the scruff of the neck to check its sex, breeding condition, identity (the number on its ear tag), body mass, and the number of ticks that are attached to it. For some unknown reason, blacklegged ticks on mice orient toward the ears, which are big and only sparsely covered with fur. By carefully inspecting the ears and face of a mouse for about one minute, we detect about 90% of the ticks that are attached to it. We know this because we have retrieved hundreds of mice to the lab and held them in wire mesh cages (supplied with water and their favorite foods) over pans of water for up to 5 days, longer than larval blacklegged ticks typically stay attached before they drop off the host. (The pans of water beneath cages contained much more than just ticks—this was a messy and challenging task, but sometimes we must make sacrifices for science.) So we have a full count of tick burdens on many individual mice that we also inspected in the field. Repeating the same process for chipmunks (Tamias striatus), we know we detect about 60% to 75% of the ticks during our one-minute inspections in the field. For all other mammal and bird hosts we trap, counts in the field are such wild underestimates that we don’t even bother. Blacklegged ticks on these other hosts tend to distribute themselves over the entire body, where fur or feathers can be dense and thick and the ticks are impossible to see. If hosts are anesthetized, they can be inspected somewhat more carefully, but even these counts tend to grossly underestimate actual tick burdens."

As it turns out, some species are just easier to count ticks on than others, and those field counts we hoped were a mirror into the past actually just tell us which species have the most visible ticks. In response to this, Dr. Ostfeld’s lab group (led by Kathleen LoGiudice) collected as many mammal and bird species as they could to test via the pan method. Even this has its difficulties. Short-tailed shrews, for example, are skilled at avoiding traps, lack the large external ears needed for ear tags, and are difficult to keep alive in the lab - Dr. Ostfeld muses that this is probably why they’re considered unimportant in Lyme disease ecology. Still, the pan method worked for a variety of animals and revealed a surprising truth: white-footed mice have among the lowest tick burdens of all the mammals tested. But because the ticks they do have sit prominently on their ears, sparing them the dramatic underestimation that field counts inflict on most other species, equating field counts with true numbers gave mice an undeserved prominence. They’re still a key part of Lyme disease ecology, but far from the single dominant host that had previously been supposed. 

From an ecology perspective, this is just science working as usual; a system that did a bad job of mirroring the world was replaced by one that does a better job. For us, the most interesting part is one particular sentence: “For all other mammal and bird hosts we trap, counts in the field are such wild underestimates that we don’t even bother.” This is what it means to desystemize: to stop believing that a given process models the part of the world it was meant to model. The count is not the number, and we’ll stop acting like it is. And as happy as we are that it happened here, looking into the details of what it required paints a grim picture.

First off, we should note how fortunate it was that this experiment led effortlessly to a new systemization. We proved one system (field counts) wrong by creating another system (pan counts) with a design that guaranteed it was more accurate, then showed that the numbers were dramatically different. But field counts would be wrong even if we didn’t have pan counts to replace them. It’s easy to wean people off of a bad number when you made a good number to refute it; far harder to make the case when there’s no replacement. When thinking about desystemization, a replacement system is a luxury, not a guarantee. The far more common outcome is simply walking away with humility that the thing you’re investigating is too slippery for systemization.

There’s also the matter of technology - or more specifically, the lack of it. Our ability to remix, compound, and analyze numbers has exploded in recent years: machine learning algorithms that pull “insight” from thin air, enormous stores of training data, all sorts of statistical techniques that promise to get more mileage out of any numbers you care to feed them. But our ability to start with the right numbers is still bounded by however long it takes a few ecologists to sift through wet mouse turds and count the ticks. Every year our tools for analysis get farther and farther out of sync with the human-scale tools we have to check correspondence with the world, making our eyes bigger than our stomachs and increasing the temptation to assume the data must be all right so we can get to the fun part. 

And finally, there’s the cold and terrifying truth behind nearly every story of oversystemization: nothing about the field counts themselves indicated they were wrong. The ticks on those mouse ears really did exist! And even someone willing to admit the field counts are gross underestimates would be forgiven for thinking that the inaccuracy would be roughly even for all species, instead of “gross underestimates except for specifically white-footed mice and chipmunks, for whom the counts will be close to accurate in a way that will make them seem much more tick-infested than all the other species”. Since it’s systems that generate the data, the errors will be systemic as well; this defies our intuition of errors as a sort of random fuzz around a true center.
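To make the point about systemic error concrete, here is a minimal Python sketch. The 90% (mouse) and 60% to 75% (chipmunk) detection rates come from the quoted passage; the true burdens, and the 10% detection rate for a generic "other host", are invented for illustration and are not field data. The sketch multiplies each assumed true burden by its detection rate to get the expected field count, then compares rankings.

```python
# Hypothetical illustration: species-specific detection rates turn honest
# counts into a systematically biased ranking.

# ASSUMED true mean tick burdens (ticks per animal); invented, not measured.
TRUE_BURDEN = {
    "white-footed mouse": 10.0,
    "chipmunk": 12.0,
    "other host (hypothetical)": 40.0,
}

# Fraction of attached ticks seen in a one-minute field inspection.
# Mouse (~90%) and chipmunk (60-75%, midpoint used) follow the quoted figures;
# the 10% for "other host" is an ASSUMED stand-in for "wild underestimates".
DETECTION_RATE = {
    "white-footed mouse": 0.90,
    "chipmunk": 0.675,
    "other host (hypothetical)": 0.10,
}


def expected_field_count(species: str) -> float:
    """Average ticks an inspector would see: true burden times detection rate."""
    return TRUE_BURDEN[species] * DETECTION_RATE[species]


def rank(scores: dict[str, float]) -> list[str]:
    """Species ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


field = {s: expected_field_count(s) for s in TRUE_BURDEN}

# A uniform correction (the same arbitrary factor for every species) rescales
# the numbers but cannot change their order, so it cannot undo a
# species-specific bias.
uniform = {s: c / 0.5 for s, c in field.items()}

print("field-count ranking:", rank(field))
print("uniform-fix ranking:", rank(uniform))
print("true ranking:       ", rank(TRUE_BURDEN))
```

Run as written, the field counts rank the mouse first and the hypothetical other host last, while the assumed true burdens rank them in exactly the opposite order. Dividing every count by the same factor leaves the wrong order untouched, which is the sense in which a systemic error is not random fuzz around a true center.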

Dr. Ostfeld didn’t start that paragraph by noting that some statistical technique had clearly indicated something was off with the tick counts. He started with the sentence “My research group has set and checked many hundreds of thousands of live animal traps over the years.” In other words, it was familiarity with the data-generating process that enabled the lab group to imagine this potential vulnerability and come up with this experiment. By the time the data gets into the hands of analysts, it’s too late to fix. You can’t math your way out of a wrong number. This mistake was caught only because the same people were generating the data and analyzing it. Which, great for ecology - but as data science becomes more and more specialized, it will be increasingly done by people who are explicitly and solely data scientists. And they’ll inherit datasets from repositories somewhere and never catch a single one of these systemic errors because they couldn’t sift through the wet mouse turds even if they wanted to.

Let’s recap. We started with counting, probably the simplest systemization that exists in the entire world. It turns out that “pick up an animal and count the ticks on it” is not only an insufficient way to learn how many ticks are on it, but it’s insufficient in a non-random and incredibly misleading way. To catch this mistake, you have to be intimately involved in the data-generating process and be curious enough to have a hunch to design an experiment. That experiment involves a lot of hard, tedious, and gross work. If you pull it off, you’ll get a good number only because you happen to be working on a question where the system failed because of its own flaws, and not because the domain itself was inherently resistant to systemization. That’s what it takes to pull off proper desystemization. 

But what about domains that ARE resistant to systemization, where we can’t go any further than “I don’t know what the right answer is, just that there’s no good way to find it”? What about findings that are entirely based in machine learning techniques and offer no easy way to check them against the world? What about data that’s actively trying to hide its generating process because it violates your privacy? What about findings that are expected to come at machine speed, with no stakeholders willing to commit the time and effort for correspondence work? In short - we saw how hard it was to make this easy-looking system (“You can know how many ticks are on an animal by counting them”) meaningful. Given this, how meaningful are all of the other systems around us?

The bad news is: the crisis of meaning in the modern world is pretty much exactly as bad as it sounds. The good news is: once a week or so you can get an email about it. Welcome to Desystemize!