Representation and Uncertainty
How do our forms of expression influence our understanding of what's true?
I.
As always when something is a prerequisite for itself, you have to proceed in a spiral. An approximate understanding of a small part of the subject makes it possible to grasp more of it, and thereby to revise your understanding of the initial beachhead. You need repeated passes over the topic, in increasing breadth and depth, to master it.
-David Chapman, Ontological Remodeling
I want to talk about the modern crisis of meaning. With our powerful computers and modern scientific techniques and access to all sorts of media, why is it so hard to figure out what’s actually true? I think a major reason is a lack of emphasis on representation: that is, how we choose to describe the world in the first place. We evaluate effective representations and ineffective representations side-by-side without distinguishing between them, hoping that truthfulness for one maps to the other, and instead are left only with confusion.
This is a hard thing to talk about, because a “representation” is a concept that I must represent effectively to you. The whole problem is that we’re not used to talking about representational issues, but talking about representational issues is an example of a representational issue. So I can’t describe the problem to you without solving it; it’s a prerequisite for itself. As such, we’ll take Chapman’s advice and proceed in a spiral, insinuating our way towards the subject rather than rushing it head-on. (I’ll also have to gently mislead you for a while, though I promise to fix it before the end.)
We’ll start with just a single word:
Hrair: A great many; an uncountable number; any number over four.
It’s a word in Lapine, the language of the rabbits from Richard Adams’ Watership Down. Rabbits, Adams tells us, can’t count higher than four. It’s not that they only have memory for four things; Hazel, the leader of the novel’s band of rabbits, has more than four followers, and he knows all of their names. But rabbits never represent things in groups higher than four. Anything more than four is hrair, a lot. For example, when Hazel and another rabbit, Blackberry, are discussing a diplomatic mission to another warren, Hazel says:
“We’re agreed, then, that we ought to send an expedition to this warren and there’s a good chance of being successful without fighting. Do you want everyone to go?”
“I’d say not,” said Blackberry. “Two or three days journey; and we’re all in danger, both going and coming. It would be less dangerous for three or four rabbits than for hrair. Three or four can travel quickly and aren’t conspicuous: and the Chief Rabbit of this warren would be less likely to object to a few strangers coming with a civil request.”
“Everyone” is a concept Hazel understands just fine, even though more than four rabbits are part of the “everyone” that lives in his warren. But if Hazel was asked exactly how many rabbits went into this “everyone”, he couldn’t say. What matters to him here is that four rabbits can travel more carefully than hrair rabbits, whether hrair ends up being five or ten or twenty. For the case of “how many rabbits ought to visit the other warren?”, there’s no need to count higher than four.
We can imagine cases in rabbit-world where hrair isn’t sufficient. Suppose there’s a log that could be used like a see-saw to open up a new feeding ground, but requires at least eight rabbits sitting on one end to make it move. We tell Hazel to have his rabbits sit on one end to see what happens, and he responds “I’ve already tried having hrair rabbits sitting on one end, and nothing happened.” We ask him to follow a certain instruction:
Get a group of hrair rabbits. Have each of them choose a different rabbit not already in the group. This is two-hrair.
Two-hrair ends up meaning “at least eight”. If Hazel’s band tries this and discovers that two-hrair rabbits are enough to move the log, he may well decide that two-hrair is a concept worth remembering. Does this mean Hazel learned to count to eight? Not really. He can’t tell in advance whether he has two-hrair worth of rabbits, and he can’t reach the intermediate states of five to seven. But clearly, if Hazel remembers the trick of “when hrair isn’t enough, try two-hrair”, he’s got something he didn’t have before. If it’s not the process of counting that’s changed for him, what did?
The word we’re reaching for is ontology, a jargony philosophy term that can be more or less understood as “the list of things in a category” or “the list of answers to a question.” Before, Hazel’s list of options for “how many of a certain thing is there?” was this: {one, two, three, four, hrair.} Now, it’s this: {one, two, three, four, hrair, two-hrair.} This isn’t the same thing as a list of numbers: numbers are distinct, whereas a hrair group of rabbits might also have two-hrair’s worth of rabbits without anyone knowing it, because it wasn’t assembled through the pairing algorithm. In fact, two-hrair is a lot less general than the other entries on the list, because it can only apply to rabbits that are working together and following the same rule. One, two, three, and four are numbers, hrair is a numeric range, and two-hrair is a numeric range that can only apply to rabbits.
The entries are conceptually quite different – but they’re ontologically similar, because they’re used the same way. Two-hrair is just a new answer that’s potentially available for questions of the form “how many—?”. Nothing says it has to behave like the other answers, or be available for all potential “how many—?” questions that come up. Take note of that asymmetry before continuing on: all numbers are answers to “how many—?” questions, but new answers to “how many—” questions don’t need to be numbers.
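(If a sketch helps: here is a toy rendering of Hazel’s expanded ontology in Python, with names I have made up. Every entry can fill the same slot, an answer to a “how many?” question, even though only some of them behave like numbers.)

```python
from dataclasses import dataclass

# Invented types for illustration: each of these can serve as an answer to
# "how many rabbits?", but they are built very differently.

@dataclass
class Exact:
    value: int   # one, two, three, or four: an actual count

class Hrair:
    """More than four: a range, not a number."""

class TwoHrair:
    """A hrair group doubled by the pairing rule. Only defined for rabbits
    that cooperate, and only useful because it guarantees 'enough'."""

def enough_to_move_the_log(answer) -> bool:
    # The log needs at least eight rabbits sitting on one end.
    if isinstance(answer, Exact):
        return False      # at most four, never enough
    if isinstance(answer, TwoHrair):
        return True       # the pairing rule guarantees enough
    return False          # plain hrair might be only five: no guarantee
```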
Speaking of “how many—?” questions, let’s ask two:
Game A: Would you trade away four value-units for a 25% chance of having five value-units?
Game B: Would you trade away four value-units for a 25% chance of having one thousand value-units?
A lot of modern discussion of uncertainty works by asking these sorts of questions and finding answers that are as objectively defensible as possible. I don’t want to get into that, so I deliberately chose payoffs extreme enough that the answers are pretty obvious. Game A stinks, whether the value units are dollars or candy bars; Game B is incredible unless you only have a few value-units and will instantly die without them (vials of insulin, for example). Imagine we’re all sitting around a picnic table, coming to a quick consensus, when a couple of rabbits hop nearby. We want to get their input, so we translate our questions into Lapine:
Game A: Would you trade away four value-units for a 25% chance of having hrair value-units?
Game B: Would you trade away four value-units for a 25% chance of having hrair value-units?
They look at us with their confused little bunny eyes and ask, “Sorry, why did you describe the same game twice?” Lapine cannot represent the difference between Game A and Game B. Our work with two-hrair won’t help: metaphysical value-units aren’t the same thing as conscious and cooperating rabbits, so we can’t run the process to get to two-hrair. And anyway, Game B would still stink if it ends up only being an eight value-unit payoff.
If we want the rabbits to understand our question, we’ve got to teach them our sort of counting, an algorithm that can be continually re-executed to produce an infinite ontology: {1, 2, 3, …, 1000, …}. What’s the incentive we can give them for learning our method of counting—the equivalent of the log and the new feeding ground we used to teach them two-hrair? Can rabbits get any use out of playing these games? In what situations would they actually be offered the chance to trade away four value-units to get either five or one thousand? What are the value units in question? Do Game A and Game B show up in their lives via sufficiently different contexts that we can just refer to those contexts without teaching counting at all? Maybe something like an aphorism: “bargain a sure thing for hrair with your friends, but never strangers.”
We can see that formal mathematical methods aren’t the right tool to fix the rabbits’ understanding. Their difficulty is ontological, not logical: since Game A and B look the same to them, the application of any rabbit-scale formal method would yield the same answer. And given how different the two games are, a method that gives the same answer for both of them can’t be all that useful. Formal methods presuppose that nebulous reality has been described at the level of detail the method requires. It doesn’t matter how good your math is if you’re telling it to a rabbit.
If, for an audience of humans, I had gotten into the math of Game A vs. Game B, made their payoffs a little closer, used phrases like “expected value”, and maybe thrown in an integral sign somewhere, it would have been a lot of work. So much work, in fact, that it’d be easy to imagine that pushing those numbers around is just a conscious representation of what our brains are doing subconsciously when we make decisions. But hopefully I’ve shown how a precise-enough understanding of the situation has to come first, before any sort of math. Formally solving uncertainty about what to do only works if you can sensibly describe what there is.
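(For the curious, the formal step I’m waving at really is small. Here is a minimal sketch in Python; it assumes value-units are interchangeable and risk-neutral, which is exactly the assumption being questioned.)

```python
# Expected value of each gamble, compared with keeping the four units you
# would trade away. Assumes value-units are interchangeable and risk-neutral.
p = 0.25
cost = 4
ev_game_a = p * 5      # 1.25, well below the 4 given up
ev_game_b = p * 1000   # 250.0, well above the 4 given up
print(ev_game_a > cost, ev_game_b > cost)   # False True
```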
There’s another important point here, one that you might have started thinking about a couple of paragraphs back. I claimed that, in order to teach the rabbits our sort of counting, we’d have to ground it in their local context. But how did we learn our sort of counting? Many people are comfortable reading Game A and Game B as they are, without relating them to a local context. You don’t have to imagine dollars, or candy bars, or vials of insulin. The number “1000” feels meaningful even if it’s not 1000 of anything in particular. So saying “humans know how to count infinitely and rabbits don’t” is just kicking the can down the road as an explanation. The real difference is that humans are able to learn context-free information. Why is that?
Ignorance, A Skilled Practice, an essay written by a literal banana, offers us the frame we need to explore this further. It’s an essay concerning indexicality, a word that, like ontology, sounds awfully jargony but isn’t so bad once you get to know it. Indexicality is essentially the degree to which local context matters for a given concept, though I’ll let the banana describe it in more detail:
“Indexical” is a word almost perfectly calculated to sap the morale of the reader and annihilate interest. I have seriously considered replacing it with the word “pointing,” used as a descriptor. Indexical statements are pointing statements: “I prefer this one,” “don’t do that,” “I made it for you.” These sentences have no particular meaning without reference to the situation in which they are produced. Physical pointing, as with an index finger or the lips or chin, may or may not accompany an indexical expression, but there is a sort of pointing-to-the-situation that occurs in all cases. I’ve decided to keep “indexical” for clarity, but keep in mind that indexical means pointing, in a literal and then in an extended, figurative way.
In the linguistic sense, an expression is indexical if it refers by necessity to some particular state of affairs. “This guy arrived just now” depends on the person indicated and the time of speaking; it is highly indexical. Compare “The Prime Minister arrived at 5:15 p.m.” This is less indexical, but notice that the identity of the Prime Minister depends on the country, and of course we don’t know anything about the circumstances or place of arrival from the text: 5:15 p.m., but in what time zone?
Extremely non-indexical expressions often appear as health or science headlines. These are pretty much the opposite of indexicality:
Stanford Researchers: Average Human Body Temperature Dropping
How Puberty, Pregnancy And Perimenopause Impact Women’s Mental Health
Why is air pollution so harmful? DNA may hold the answer
Predatory-journal papers have little scientific impact
Can a healthy diet reduce your risk of hearing loss? Here’s what the research says
Notice that these refer to people in general, and vague concepts in general. They take the form of objective knowledge that is true in general, for all cases, globally, universally. They “see through” to the ultimate truth of matters, unsullied by the messy realities of particular people and situations. The kind of knowledge that non-indexical statements presume to convey is timeless, and describes all of humanity or the world in general. Indexical knowledge, on the other hand, refers to specific situations, times, people, and interactions. It does not purport to apply timelessly, in general, or to all people.
Indexical statements point at specifics; non-indexical statements describe generalities. “The number 3 is larger than the number 2” is non-indexical, and true in a general sense. “My two cats put together are heavier than your three cats put together” is highly indexical, and true for specific values of “my two cats” and “your three cats,” overriding the general “truth” that 3 is more than 2. “Average body temperature” is a non-indexical concept; but to guess at it, researchers can only measure the indexical body temperatures of certain people at certain times. Rabbits stay in the realm of the indexical, while humans are able to trade in non-indexical statements. Or, at least, some humans. Back to the banana:
To illustrate that global knowledge is a game, consider a story about Alexander Luria, who studied illiterate Russian peasants and their semi-literate children. Consider especially this version of the story, prepared in the 1970s to provide morale and context to reading teachers (John Guthrie, 1977). Essentially, Luria discovered that the illiterate, unschooled peasants were highly resistant to syllogisms and word games. The adult peasants would only answer questions based on their own knowledge, and stubbornly refused to make deductions from given premises. “All bears are white where it is snowy. It is snowy in Nova Zembla. What color are the bears in Nova Zembla?” “I don’t know, I have never been to Nova Zembla.” Children with only a year or two of education, however, were easily able to engage in such abstract reasoning. They quickly answered the syllogisms and drew inferences from hypothetical facts outside of their own observation.
In this story, I argue, Luria’s peasants are indexical geniuses, who refuse to engage in unproven syllogistic games. They are not interested in a global, universal game. Their children, however, are easily introduced to this game by the process of schooling and literacy.
The Russian peasants, like the rabbits, want to keep their observations grounded in context. Whether they’re worried about a trick, too proud to risk giving a wrong answer, or just plain suspicious of Luria, they don’t want to suppose the existence of global, non-indexical concepts like “all bears are white where it is snowy” (never mind things as abstract as Game A and Game B!). I’m not quite as adamant as them—I think there can be a lot of value in drawing inferences from unobserved information. But it’s true that both humans and rabbits are highly indexical beings, who perceive this, do that, here and now. If you want to use non-indexical findings in your indexical life, you have to make sure they actually fit your context.
From our perspective, rabbits and peasants have some extra work they need to do before they can talk about these formal games like we do; they need to be able to become more comfortable with abstractions and language that doesn’t refer to concrete things out in the world. If you ask the rabbits and the peasants, though, we have an extra step we need to do after describing our formal games. Because we let ourselves play around with concepts that we couldn’t actually point to, we have no guarantee that our findings will actually come up in our lives like they do for rabbits and peasants. Non-indexicality lets us take a sort of loan of meaning, playing with tools that aren’t available in the here and now to broaden the ways we can think about things. But that loan has to be paid off eventually through correspondence work showing that the abstract objects you were acting on are similar enough to what actually exists in a particular situation for the finding to be valid. Otherwise, you’ve just been playing a meaningless game that has no bearing on actual reality.
I would argue that rabbits and Russian peasants are not indexical geniuses but indexical misers, unwilling to take on any sort of meaning-debt. To speak of the white bears in Nova Zembla is to become beholden to two abstracts—“All bears are white where it is snowy” and “it is snowy in Nova Zembla.” How would you pay down that debt? Bears are real animals, snow is real, and Nova Zembla is a real place: they aren’t just counters in a logic puzzle. It’d be a big trip to go to Nova Zembla, and maybe it’s only snowy part of the year. What if you go at the wrong time? Who can prove anything about “all bears where it is snowy”, anyway? Has someone been to every snowy place and met every bear? What if the non-white bears were asleep? Do they have a checklist that gets updated every time a new bear is born? Why deal with any of this shit when you could just say “I don’t know, I have never been to Nova Zembla” and get on with your life?
Just like monetary debt, allowing yourself to take on meaning-debt temporarily can let you access new heights and then pay the debt down and end up somewhere better. Every time a novel theory in physics is confirmed by experiment, meaning-debt proves its worth. But also like monetary debt, meaning-debt can ruin your life if you’re not careful, and you’re not exactly wrong to decide it’s not worth the risk. The peasants may be constraining their range of possible thoughts by focusing solely on the here and now, but they’re also guaranteeing that the things they think about will have an impact on their actual lives. If you develop formal theories without checking for heres and nows that behave in the way your theory describes, you may be throwing more and more of your time into a compounding interest hole of meaninglessness.
Statements about value-units are non-indexical, which just means that we’re not pointing at anything in particular when we talk about them. By contrast, if I was deciding whether to trade four french fries now for a 25% chance to get one thousand french fries for lunch tomorrow, that would be a highly indexical decision, because it’s drenched in my local context. It would also be a highly indexical decision to trade four grand pianos now for a 25% chance to get one thousand grand pianos tomorrow. But maybe my answers for those two are different—I’ll take the gamble on the fries because I’m not hungry now anyhow, but I hardly have anywhere to put the four grand pianos while I sell them, never mind one thousand.
This difference shows us the meaning-debt that we incur from non-indexical phrases like “value-units”. If we want to use our findings from Game A and Game B in our lives, we first need to do correspondence work to find whether french fries behave enough like value-units that the findings are applicable, and the same for grand pianos, and the same for anything else. In one sentence: there are possibilities we can conceive of solely because our ontologies can include things we can’t directly point to, but this power comes with the debt to make them point to something later.
II.
Our next step is to see the meaning-debt the peasants were afraid of. What do problems of representation look like? We’ll turn to humorism as an example. Some ancient medical theorists thought that most illnesses could be explained by an excess or deficit of the four humors: {blood, phlegm, yellow bile, black bile}. We’ll imagine a Hippocratic hardliner who thinks this explains everything. That is, every single illness is caused by either too little or too much blood, phlegm, yellow bile, or black bile. (I don’t know how many historical physicians actually believed in humorism this strongly, so don’t treat this illustrative example as an especially accurate historical recounting. Also, it’s going to get worse.)
We know now that humorism isn’t true. Illnesses can be caused by viruses or bacteria or environmental contamination or all manner of other things. But the “humors” aren’t entirely arbitrary concepts, either. The illness is caused by whatever it’s caused by, different illnesses cause different symptoms, and if the humorist diagnostic criteria consistently classify the same symptoms the same way, the humors “listen to” the true causes in a partial, indirect way.
Suppose a patient named Artemis has some set of symptoms—maybe a deep cough and a splitting headache, or whatever. All of the ancient physicians she runs into agree that she must have an excess of phlegm with the other humors in balance (“phlegmatic” for short). What does the statement “Artemis is phlegmatic” actually mean, knowing what we know now about medicine? It’s not pointing at Artemis’s excess of phlegm, because we know now that bodies don’t work this way and there’s no actual excess of phlegm to point to. But it’s not exactly false to say “Artemis is phlegmatic”, because it has a meaning to the people who say it and Artemis is a patient they all agree qualifies. It’s indexical, because it’s pointing to a real thing, but that doesn’t mean it’s true. We’ll need to understand this fussy-sounding distinction to go farther, so we’ll inject enough context that the difference becomes clear.
Let's say that modern medical historians have determined that just two diseases caused people to be phlegmatic in the ancient world: city cough (caused by the bacterium c. urbanicus) and country cough (caused by the bacterium c. pastorilus). You catch city cough in close proximity to other people, and it’s cured by simply increasing intake of sugars and meat—the traditional cure was drinking mulled wine and eating bull testicles—but gets dramatically worse if you have any contaminated food or drink. Country cough gets in your lungs from disturbed soil, and the solution is to beef up your microbiome—traditionally done by drinking brackish water and eating a handful of grave dirt—but it spirals out of control if you eat and drink normally, and god help you if you eat some bull testicles.
Remember that these (imagined) Hippocratic hardliners don’t think of “city cough” or “country cough” as ailments. They think that a phlegmatic patient has an excess of phlegm and that’s the sole reason for their problems. It just so happens, though, that doctors tend to either be city doctors or country doctors, and tend to see cases of city cough or country cough but not both. The city doctors have their regular hookups for bull testicles, country doctors know the perfect graves to skim dirt off of, and the literature tactfully equivocates with “Some doctors recommend mulled wine and bull testicles to cure an excess of phlegm, while others prescribe a course of brackish water and grave dirt.”
Artemis, though, is a bit of an interesting case. She lives on the outskirts of town, spending a fair amount of time in the field but also making frequent trips to the city. So when she wanted to get multiple opinions, she ended up asking one doctor who works in the city—Dr. House—and one who works out in the country—Dr. Field. Dr. House naturally suggests the wine and ball combo, to which Dr. Field indignantly replies, hey, I agree that she’s phlegmatic, but that’s precisely WHY the extra food and drink is a huge problem—we need to get her on brackish water and grave dirt, stat.
When Dr. House says “Artemis is phlegmatic,” he’s relating her to the patients he’s seen previously that got better when they drank mulled wine and ate bull testicles. Why did they drink mulled wine and eat bull testicles? Because he told them to. Why did he tell them to? Because they seemed a certain way. Artemis also seems that certain way, and so Dr. House wants to treat her the same way he treated the other patients, by giving her mulled wine and bull’s testicles. If Artemis seemed a different way, he wouldn’t expect mulled wine and bull testicles to help.
When we ask if something is true or not, this is ultimately what we want to get to: the difference in outcomes for an intervention. Mulled wine and bull testicles work great for some patients and do nothing or are actively harmful for others. Dr. House wants to say “Give mulled wine and bull testicles to the patients for whom they will help, and don’t give them to the patients for whom they will not help.” If he had never met another doctor in his life, and felt no need to justify himself to any patients, he could hold this idea in his head without needing explicit concepts. His personal experience would be enough to have an intuition of “the sort of patient” to give mulled wine and bull testicles to, and he could use that as his guide. This is a normal and very human thing to do.
But that doesn’t cut it when you want to socialize this difference, whether it’s for formal diagnostic rules or just idle banter with colleagues. Dr. House wants to compare his findings with his peers. He wants to explain his reasoning to his patients. He wants to document his actions for future reference. To do any of these, he needs to be able to point at the difference in interventions. But how can he do that? He can’t describe every single thing he noticed in each patient he’s seen. And he can’t point at c. urbanicus itself, because he doesn’t know what that is. He needs a named concept he can point to when he means “look at this difference between patients who got better with mulled wine and bull testicles and those who didn’t.”
This is where the tangle of meaning happened. Dr. House and Dr. Field are trying to point to two separate differences—the difference in outcomes for mulled wine and bull testicles vs. the difference in outcomes for brackish water and grave dirt. They each needed a way to describe that, and were used to the background framework of humorism saying that symptom profiles are all you need to describe the differences in interventions. So they both pointed to “phlegmatic”, without adding in the additional context of city vs. country. In the act of formalizing how one points to phlegmatic, they inadvertently destroyed the why of pointing to phlegmatic. It’s not that they were wrong to try, since the differences they were pointing to can be used to improve human health when used correctly. But their framework of “what constitutes an explanation for a difference?” was too rigid, and so patients that look the same in their framework may not respond the same way to a given intervention.
Let’s repeat our summary of part I:
There are possibilities we can conceive of solely because our ontologies can include things we can’t directly point to, but this power comes with the debt to make them point to something later.
Now, we’ll reframe it:
There are differences we can only point to because our ontologies can hold abstract concepts, but pointing to these concepts comes with the obligation to make sure these differences are really there when we point at the concept.
This is the meaning-debt the indexical misers were desperate to avoid. Formalizing the idea of “phlegmatic” enough that the two doctors agree that Artemis is phlegmatic melded together the differences that each doctor imagines they’re pointing to. “Artemis is phlegmatic” is indexical in the sense that “How do you point to the concept ‘phlegmatic’?” has a shared understanding among physicians. But while it’s indexical, it’s not completely meaningful, because not all uses of the word are pointing at the same difference.
If their language had been refined enough to say “city phlegmatic” and “country phlegmatic”, then the meaning-debt could potentially be paid off, since “city phlegmatic” would be pointing at c. urbanicus and “country phlegmatic” would be pointing at c. pastorilus. Alternatively, if each doctor had stuck squarely to their own terrain and never interacted with an in-betweener like Artemis, it also would have worked out fine. “Phlegmatic” would be pointing to separate differences in the minds of Dr. House and Dr. Field, but each individual use of the word “phlegmatic” would be pointing to what it means to point to.
So it’s not the word “phlegmatic” itself that goes into meaning-debt or not. Instead, it’s the difference being pointed to that incurs the debt when you conceptualize it and try to make something in the world point at it. You pay this debt off when your conceptualization successfully points back to your difference, and the payoff is being able to use that difference in your real, indexical life. If Dr. House had demurred and said “I don’t know, I’ve never treated a patient outside the city center”, then “phlegmatic” would be simultaneously a meaningful word for him (because it always points to a specific difference, the difference in outcomes of interventions for patients with c. urbanicus, when he uses it) while being a bankrupt word for Dr. Field, who overstepped his boundaries too far by presuming the lived experiences of his previous patients applied to Artemis.
In other words, representations are inevitable, and the challenge is to make sure the representations are meaningful with respect to the differences in the world you’re trying to interact with. “Phlegmatic” as a representation carries with it some degree of unrepresented baggage that Artemis won’t be aware of when she hears “you’re phlegmatic”. The doctors who are trying to explain real differences have the job of trying to hold that unrepresented baggage constant enough that their past experience of exploiting those differences applies the same way to the same representation. Since words are just tools that one can use well or poorly, different people may incur different levels of meaning-debt when using the same representation. Once you get practice thinking this way, you’re able to be much more precise about what’s actually going wrong when looking at failures of representation.
Practice! That’s what we need. Here, I’ll send you back to the ancient world using my time machine. Plus side: I’ve also given you a solar-powered laptop capable of running modern machine learning models, so you have raw computational power on your side. Minus side: they think you’re a god of healing and believe they’ll be destroyed if they ever see you or hear you speak, so they’ve locked you in a room. They slide a clay tablet under the door that includes the patient’s humors and a list of possible interventions—mulled wine and bull testicles, brackish water and grave dirt, and many others besides. You mark the tablet by the intervention you want them to do and send it back. Later, they send an offering and another tablet showing how the patient ended up faring.
As you get more data to train your machine learning model, you’ll be able to improve outcomes and eventually choose the “best” intervention for each humor state. But it’d fall way short of modern medicine, because even with modern computational power, you wouldn’t have modern ontological power. You don’t know whether these patients live in the city or the country, because your worshipers think the humors explain everything and don’t put other details on the tablet. And your model will be brittle, because it doesn’t have any theory to inform it when conditions change. If cities expand their borders, so that more patients get city cough and fewer patients get country cough, you won’t be able to predict that in advance. You’ll just notice that brackish water and grave dirt start doing worse and worse for phlegmatic patients and phase it out, chasing the trend after the fact.
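To make that concrete, here is a minimal sketch of the best you can do from behind the door, written in Python with pandas, using toy made-up records and invented column names:

```python
import pandas as pd

# Toy, made-up tablet records. The only features your worshipers record are
# the humor state and the intervention tried; city vs. country appears
# nowhere, because their ontology has no slot for it.
tablets = pd.DataFrame({
    "humor_state":  ["phlegmatic"] * 6,
    "intervention": ["wine_and_testicles"] * 3 + ["dirt_and_water"] * 3,
    "recovered":    [1, 1, 0, 1, 0, 0],
})

# The best you can do behind the door: pick whichever intervention has the
# higher observed success rate for each humor state, and keep chasing that
# average as it drifts. No amount of compute recovers the missing column.
success = tablets.groupby(["humor_state", "intervention"])["recovered"].mean()
print(success)
print(success.groupby(level="humor_state").idxmax())
```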
How would this look from the perspective of the ancient doctors who are tracking the success rate of their god? They probably don’t know that they’re being roadblocked by their representational vagueness. Ontological difficulties are pretty subtle, especially in systems as complex as the human body. It’s extremely easy to be satisfied with a probabilistic conclusion like “if the patient is phlegmatic, then mulled wine and bull testicles will save them 75% of the time” and assume that’s the whole answer, even if they could save everybody by distinguishing city cough from country cough.
What if I had sent a Russian peasant through my time machine instead of you? What does an indexical miser do when they see a tablet come under that door? “I don’t know, I’ve never met this person,” and slide the tablet back without a mark. Eventually the doctors assume that their god is displeased and stop sending the tablets. Was this the responsible thing to do? Probably not. After all, the humors at least partially point to something real. Even if you can’t get to the 100% you’d reach by distinguishing city cough from country cough, it’s at least better than random chance, right?
This argument is a common one when people are faced with representational issues: you can only do the best you can with the data you have. I want to push back strongly against that framing. Because in order to make the indexical miser look unreasonable, I had to create a completely locked door, a strictly one-way flow of information, a representation of a patient you can take or leave but never change or look at in more detail. In real life, of course, you can open the door.
That’s the moral of the story. I’ll repeat it.
In real life, you can open the door.
Whenever you’re given a word you think isn’t pointing to a difference, you can ask: what will you do differently depending on whether this patient is phlegmatic or not? You can ask people to give you stories instead of data. You can try recording things in different ways. If a modern epidemiologist was sent back, but allowed to open the door, their most powerful technology would be their checklists, asking more relevant questions than anyone back then would have thought to ask. Do you use a wood stove? Do your children spend significant amounts of time in the basement? Do you use earthen-ware pottery? And eventually, they’d get to: do you live in the city, or the country? When you see the relative impact of mulled wine and bull testicles vs brackish water and grave dirt, split out by city vs. country, you simply don’t need modern computing power to figure out what’s happening. It’s plain as day, right in front of you.
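Here is the same kind of toy sketch once the checklist has added the one column that matters; again, the records and names are invented for illustration:

```python
import pandas as pd

# The same kind of invented records, now carrying the answer to one extra
# checklist question: does this patient live in the city or the country?
patients = pd.DataFrame({
    "location":     ["city", "city", "city", "country", "country", "country"],
    "intervention": ["wine_and_testicles", "wine_and_testicles", "dirt_and_water",
                     "dirt_and_water", "dirt_and_water", "wine_and_testicles"],
    "recovered":    [1, 1, 0, 1, 1, 0],
})

# Recovery rate split out by location and intervention. With that one extra
# column, the pattern is plain as day; no modern computing power required.
print(patients.pivot_table(index="location", columns="intervention",
                           values="recovered", aggfunc="mean"))
```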
Let me pull you back to the present day and show you something. {Four, hrair, two-hrair} – remember how these were different conceptual things that were only similar ontologically? Well, here are two words from our modern ontology of diseases. One is cholera, here defined by the World Health Organization:
Cholera is an acute diarrhoeal infection caused by ingestion of food or water contaminated with the bacterium Vibrio cholerae… The majority of people can be treated successfully through prompt administration of oral rehydration solution (ORS)… Severely dehydrated patients are at risk of shock and require the rapid administration of intravenous fluids.
The second is fibromyalgia, here defined by the National Institutes of Health:
Fibromyalgia is a long-lasting disorder that causes pain and tenderness throughout the body. It also can cause you to feel overly tired (fatigue) and have trouble sleeping. Doctors do not fully understand what causes fibromyalgia, but people with the disorder are more sensitive to pain… Treatment may include exercise or other movement therapies, mental health and behavioral therapy, and medications.
Cholera and fibromyalgia are similar ontologically, in that a doctor might say “you have {cholera/fibromyalgia} and I’m going to prescribe a treatment based on that.” They’re very different conceptually, though. Cholera is caused by a specific bacterium, Vibrio cholerae. It leads to consistent symptoms that clear up with consistent treatment. We’re pointing at the difference between people who immediately get better when given fluids and people who don’t, and saying that one way you could name that difference is “cholera.” How do we make that indexical? We have microscopes now. We can literally point at Vibrio cholerae. The word “cholera” pays off its meaning-debt when we point and say “this Vibrio cholerae, here and now.”
But what is “fibromyalgia” pointing at? An understanding that many people are experiencing similar symptoms. It behaves more like “phlegmatic” did, where patients described in the same way might respond dramatically differently to the same treatment. It’s more useful than not having a word, because patients with fibromyalgia are more similar to each other than they are to people without fibromyalgia. It’s a hook we can use to begin to investigate a difference. “Fibromyalgia” describes uncertainty, but doesn’t terminate it in the way that “cholera” does. Could the word “fibromyalgia” be made more useful? How much is studying the impact of various treatments on “fibromyalgia” going to serve us until we clarify our understanding of “fibromyalgia”? Can you truly compare the outcomes of two studies on “fibromyalgia” if one is city fibromyalgia and one is country fibromyalgia?
Imagine future scholars looking back at you. What issues are we trying to fix with our current verbiage that will have people from the future thinking, “Well, if I traveled back to their time, I wouldn’t show them our superior computational ability—I’d show them our superior ontology”? Will people in the year 2100 use the word fibromyalgia at all? Or will “fibromyalgia” be replaced by a number of words for different causes of pain and fatigue, each with their own corresponding treatment?
And when you do have a word that seems like it’s not pointing properly—how should you think about it in the meantime? How do you make your strategies for uncertainty account for the fact that even your representations are uncertain?
III.
What we need now is an example of effective decision-making that avoids our pointing problems. If you want to investigate decision-making in areas of high uncertainty and urgency, it’s hard to beat talking to firefighters. In Sources of Power, 20th Anniversary Edition: How People Make Decisions, researcher Gary Klein and his team recount the surprising conclusions they came to after studying firefighters:
We thought this hypothesis—that instead of considering lots of options they would consider only two—was daring. Actually, it was conservative. The commanders did not consider two. In fact, they did not seem to be comparing any options at all. This was disconcerting, and we discovered it at the first background discussion we had with a fireground commander, even before the real interviews. We asked the commander to tell us about some difficult decisions he had made.
“I don’t make decisions,” he announced to his startled listeners. “I don’t remember when I’ve ever made a decision.”
For researchers starting a study of decision making, this was unhappy news. Even worse, he insisted that fireground commanders never make decisions. We pressed him further. Surely there are decisions during a fire—decisions about whether to call a second alarm, where to send his crews, how to contain the fire.
He agreed that there were options, yet it was usually obvious what to do in any given situation. We soon realized that he was defining the making of a decision in the same way as Soelberg’s students—generating a set of options and evaluating them to find the best one. We call this strategy of examining two or more options at the same time, usually by comparing the strengths and weaknesses of each, comparative evaluation. He insisted that he never did it. There just was no time. The structure would burn down by the time he finished listing all the options, let alone evaluating them. [Emphasis mine.]
Not making any decisions about how to fight fires is something that this fireground commander and I have in common. But if I were in charge of leading a fire response, the outcome would be a lot worse. The fireground commander is the highest-ranking person on the scene of the fire, and must have been chosen because he does something relating to fires better than I would. If it’s not decision-making, what is it?
Let’s imagine that there’s a flowchart in the fireground commander’s head declaring exactly what to do in any circumstance. “If it’s a big fire, only attack it from outside with hoses. If it’s medium size, go in and evacuate people, but only if you can do it in five minutes”—those sorts of rules. When the fireground commander says he “doesn’t make decisions,” he’s saying that the flowchart never invokes “decide between these options.” You just follow what the chart says. If the fireground commander managed to get the flowchart out of his head and onto paper, would I then be able to follow it and fight fires as well as he does?
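To make the thought experiment concrete, here is a toy version of such a flowchart in Python. The rules and thresholds are invented for illustration, not taken from Klein or any real fire department, and notice that the function can only run once someone has already decided what counts as “big” or “medium”:

```python
# A toy, invented version of the commander's imagined flowchart. Every branch
# is explicit and nothing is "decided", but the function can only be called
# after someone has already classified the fire for it.
def fireground_flowchart(fire_size: str, minutes_to_evacuate: float) -> str:
    if fire_size == "big":
        return "attack from outside with hoses only"
    if fire_size == "medium":
        if minutes_to_evacuate <= 5:
            return "go in and evacuate people"
        return "attack from outside with hoses only"
    return "go in and put the fire out directly"
```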
Let’s try. When does a fire go from “medium” to “big”? I have no clue. I can observe a here and a now, and I can have a completely well-defined flowchart of decision-making, but in order to make that here and now talk to the flowchart, I have to do correspondence work to make sure I’m mapping the world to the right part of the flowchart. This is the exact inverse of the problem that Drs. House and Field struggled with. They had a systematic, well-agreed-upon method of determining which patients were phlegmatic, but they had two different flowcharts in their heads, pointing at two different differences in interventions, when they said “phlegmatic”. The fireground commander’s representations are meaningful with respect to the differences: big fires are the ones you don’t get in the house for. Instead, the problem is looking at the real world with its infinite supply of detail and knowing which concept to point to.
In some domains, this isn’t such a big deal. In a chess game, for example, the rules assert that the detail of the world can be abstracted away, that a king piece made of young wood is the same as a king piece made of old wood. In fires, it’s entirely possible that the age of this particular wood is going to matter a great deal, as will the moisture in the air, the ambient temperature, the construction of the floor—you get the idea.
Still, it feels like we should be able to get the best of both worlds here. Let’s figure out an objective, agreed-upon way to pick a state for a fire, just like Drs. House and Field had an agreed-upon way to pick a humor for a patient. And we’ll make sure those states map to the fireground commander’s decision-free flowchart. As long as you have those two guarantees, you don’t need the fireground commander at all. In fact, you could fight the fire with your eyes closed! Just have someone describe the fire to you in the agreed-upon way, run through the flowchart in your head, and tell them the answer.
Fighting fires with your eyes closed sure would be nice. Before working out how, we should probably get a more concrete example of how fireground commanders actually point. This time I don’t even have to make up the scenario! Let’s look at this full-length anecdote from Sources of Power:
Example 4.1 The Sixth Sense
It is a simple house fire in a one-story house in a residential neighborhood. The fire is in the back, in the kitchen area. The lieutenant leads his hose crew into the building, to the back, to spray water on the fire, but the fire just roars back at them.
“Odd,” he thinks. The water should have more of an impact. They try dousing it again, and get the same results. They retreat a few steps to regroup.
Then the lieutenant starts to feel as if something is not right.
He doesn’t have any clues; he just doesn't feel right about being in that house, so he orders his men out of the building—a perfectly standard building with nothing out of the ordinary.
As soon as his men leave the building, the floor where they had been standing collapses. Had they still been inside, they would have plunged into the fire below.
“A sixth sense,” he assured us, and part of the makeup of every skilled commander. Some close questioning revealed the following facts:
• He had no suspicion that there was a basement in the house.
• He did not suspect that the seat of the fire was in the basement, directly underneath the living room where he and his men were standing when he gave his order to evacuate.
• But he was already wondering why the fire did not react as expected.
• The living room was hotter than he would have expected for a small fire in the kitchen of a single-family home.
• It was very quiet. Fires are noisy, and for a fire with this much heat, he would have expected a great deal of noise.
The whole pattern did not fit right. His expectations were violated, and he realized he did not quite know what was going on. That was why he ordered his men out of the building. With hindsight, the reasons for the mismatch were clear. Because the fire was under him and not in the kitchen, it was not affected by his crew’s attack, the rising heat was much greater than he had expected, and the floor acted like a baffle to muffle the noise, resulting in a hot but quiet environment.
This incident helped us understand how commanders make decisions by recognizing when a typical situation is developing. In this case, the events were not typical, and his reaction was to pull back, regroup, and try to get a better sense of what was going on. By showing us what happens when the cues do not fit together, this case clarified how much firefighters rely on a recognition of familiarity and prototypicality. By the end of the interview, the commander could see how he had used the available information to make his judgment.
This anecdote is often repeated outside the pages of Sources of Power to describe the “power of intuition.” But what do we mean by intuition, anyway? When I walk into a room and notice that it’s quiet, that’s not intuition, just observation. The impressive part is that the fireground commander recognized the situation was atypical without knowing why, which allowed him to act on a fact (the room was too quiet for how hot the fire was) without actually being able to describe the specific fact he was acting on. He didn’t point to “too quiet”; he pointed to “not normal,” and only figured out the “too quiet” part reflecting after the fact. This almost seems like cheating after all of our hard work. We’re trying to figure out the right way to call a fire “too quiet,” because in our flowchart, “too quiet” points to “leave the building,” and it’s extremely important to pay attention to that difference when it’s relevant. But this fireground commander jumped straight to the difference of “not normal, so leave the building” without mucking around with concepts at all. How the hell are we meant to learn from that?
Let’s turn our question the other way around. We want to understand what the fireground commander actually did in representational terms. What’s the simplest possible example that can give us a sense of that? Suppose I wrote a companion book to Sources of Power called Sources of Minuscule Amounts of Power, and I opened with this case study:
Example 1.1 The Wobbly Table
Gregory puts his mug on the table and notices that the table wobbles when he does so, and his drink seems to be moving slightly away from him. He moves his drink to a different table nearby and then looks under the first table. The far opposite leg is a bit shorter than the others. Luckily, Gregory has a copy of Sources of Power, 20th Anniversary Edition: How People Make Decisions on hand. He puts it under the short leg, then grabs his drink and places it back on the table. The table stops wobbling and his drink is still.
It’s not the sort of thing we’d call intuition, and it’s not the sort of thing that impresses us, but it is the sort of thing that the fireground commander did. Gregory didn’t immediately know why the table was “wrong,” but his vast amounts of lived experience with flat tables let him recognize that the table wasn’t flat and act on it prior to having any particular concepts in mind. Gregory has a mental image of a “normal table” just like the fireground commander has one of a “normal fire,” and he’s attuned to that difference even when he can’t immediately state a cause.
So why is Gregory’s story less impressive than the fireground commander’s? Let’s ask ourselves: what factors can make a table violate our expectations of normality? Well, the legs could be different lengths, or the floor itself could be slanted, requiring an artificial lengthening of one or more legs. That’s basically it! Of course, the infinite detail of the world could theoretically introduce another cause. Maybe the table actually is flat, and the wobbling and drink movement were due to a small earthquake in an area where they aren’t common. In practice, though, that sort of thing almost never happens. We’re comfortable holding an “idealized form” of a table in our heads that says “a non-normal table is non-normal because of irregularities in leg length, or a slant of the floor below.” Basically every person on planet Earth has a large body of lived experience telling us that this almost always works as an explanation. In the times when it doesn’t, we’ll be confused for a bit, hear about the earthquake later, and then have a fun story.
Both the fireground commander and Gregory noticed that something was wrong, but Gregory already had an idealized form telling him exactly what sort of table is “normal” and what sort of causes to look for if it’s non-normal. The infinite sources of detail in the world—the wood, the temperature, the moisture in the air—never came into the picture. The fireground commander doesn’t have that same guarantee. His idea of a “normal fire” is tentative, subject to any number of things that could violate it without him having a list of what those things are. The risk and ambiguity that the fireground commander must grapple with is what makes the first story so compelling and the second one so boring, and the fact that he can recognize a “normal fire” enough to perceive a deviant one is what he possesses that I don’t.
So, how do you fight a fire with your eyes closed? You don’t. Because instead of pointing at a particular fire and saying “medium” or “big,” you might need to say “not normal” and improvise. Going by a flowchart is only possible if you can explicitly list out every single factor that might be relevant ahead of time. For tables and other deliberately constructed things, you can have a small-world idealization that works so often that you can use the flowchart and basically never be punished for it. When you’re dealing with complex, interactive, dynamic sorts of phenomena (like fires), the real world is a lot more likely to intrude on your simplified model of how it works and force you to consider new details.
I mentioned at the start that I’d have to gently mislead you for a while, and now seems like the time to come clean. This is an essay about problems with representation, but throughout the first two sections, I made it seem like they were specifically problems with words. That’s not what’s happening here, though. Look at the word “quiet.” We all know what difference “quiet” is pointing to. In fact, we could exactly measure the volume of every fire if we liked. Firefighters don’t do that, because it doesn’t matter very much—a quiet fire is often a safer fire. The specific fire in The Sixth Sense was dangerous because it was too quiet given the other factors, and therefore not normal. But just as it required intuition to start with a fire and realize you needed to include the volume, it also requires intuition to start with the volume of a fire and figure out which other factors you need before you can judge whether that volume counts as “too quiet.”
It’s not just the pointing of an individual concept we’re worried about, but also the ensemble of concepts we’re choosing to consider. The linguistic battles we were fighting in section II are only part of the story. It’s not enough to make sure an individual concept is well-defined, because we have an infinite number of concepts (well-defined or otherwise) we can choose to consider when evaluating any situation, and we need to use our finite brains to pick enough of them to make a good decision. To make this work, we also need to develop a sense of normality to recognize when the concepts we’re using aren’t enough to explain what’s happening. (Think of Gregory and his table, noticing “not-flat” before noticing why. The fireground commander’s sense of “normality” can be thought of as knowing which fires are “flat,” i.e. imitate the usual model, and which fires have a factor that’s out of balance with expectations.) You can’t list every potential factor ahead of time, so you just need to keep your eyes open and look for hints that it’s time to involve another concept in your representation.
In other words: don’t rely on a strictly one-way flow of information, a representation of a fire you can take or leave but never change. I’m just talking about opening the door again. The work to make “phlegmatic” point at a meaningful difference and the work to intuit that your model of a situation is missing a concept may seem completely different, but they’re both dealing with the same problem of fixed representation. The answer in either case is to allow your representation of what there is—what kind of patient? What kind of fire?—to change with interaction so it can capture the differences that matter.
Don’t try to fight a fire with your eyes closed. Don’t try to diagnose the patient from behind the door. The same moral to both stories, but they reveal different facets of the problem. When you imagine uncertainty like a fire, you see that adding in additional factors was an automatic, intuitive process that happened without conscious effort because it was needed to address an urgent break in normality. It’s important, though, for you to also keep thinking about the tablets coming under the door, because representational problems aren’t simply time and resource constraints forcing us to make approximations. Representations annihilate detail, with the city phlegmatic and country phlegmatic patients having identical tablets, and it’s not anything that you can fix with arbitrary computing power and arbitrary time to think. You need to be able to go back to the well of detail and gather the factors you need. Meaning is a product of interaction, not something that springs from a dead dataset.
To act on anything in the world, you have to represent it in a way where you can imagine which actions to take. We develop this capability before language, as embodied cognition and intuition about physical objects. And part of this capability is the ability to notice when your representations break down and need to be amended. We do this all of the time without needing to consciously think about it. If you were in Gregory’s place for The Wobbly Table, you would have handled it just as well without needing to be taught.
That’s why I had to start with the door and not the fire. You need to see the door in your mind's eye, a great imposing thing between you and the phenomena. You need to appreciate how utterly impossible it is to reach true understanding without opening it. Only once you have that model should you think about uncertainty as a fire that you interact with to pull meaning from. We have tools to fix representational problems, but they’re local tools that have to touch the infinitely detailed world to work. You can’t use them on the tablets that get slid under the door.
More and more, though, we’re being asked to try. We’re assaulted by a flurry of charts and articles and studies and asked to draw meaningful conclusions from them. (If we’re not just told to “trust science!”, as though there were a single correct way to interpret every finding.) That’s where this story has to end. We intuitively know how to open our eyes in our everyday lives, but how do we open the door between us and the myriad of static representations that modern life puts before us?
IV.
Throughout this essay, I’ve had to use strange words that most people aren’t used to seeing, contrived and specific examples, misdirection and repetition. I’m using all of these tricks because we don’t have well-known, socially negotiated ways to describe these representational problems. This is the first and most urgent lesson you need to come away with. This is happening everywhere, all of the time, and it’s happening largely because our concepts around indexicality and meaning are dramatically underdeveloped. No one is driving the bus and making sure meaning gets to where it’s going, and no institutional authority can be trusted to deliver these answers.
There’s no global answer to these questions because all meaning is interactive and contextual, but that’s not the same thing as saying all meaning is relative and personal. The world has real patterns waiting to be unearthed. But to use those real patterns to drive action, you need to create an abstract concept to hold the pattern, then make a real situation point at the abstract concept accurately enough.
We know meaning when we see it, and for most of human history, that worked well enough. Never mind if the actual mechanics of representation are awkward to discuss and difficult to understand—as long as we can stop the table from wobbling and get out of the building when the floor collapses, that has been enough. For much of human history, most of the data we acted upon was held in our heads. Things that were externally recorded were often simple representations that were maintained by the same person who used them to make decisions, like a merchant's inventory and receipts. Data couldn’t help but be interactive and indexical, because it was bound up inextricably with human beings.
But the people involved wouldn’t think of it as “interactive” and “indexical” any more than the fireground commander thought he was making decisions. Figuring out the right way to represent something is an automatic human tool we use to get things done without worrying about theory. What counts as a chair in your house? Well, maybe an ottoman is good enough to perch on when you’re watching TV, but when the in-laws come around, only your chair-chairs count as chairs. A stump is a chair when you’re outside around the fire, but if your friend asked you to bring a chair to his housewarming party, he’d be pretty upset if you lugged a stump into his house. But your friend wouldn’t ask, “What are you pointing at when you say ‘chair’? Do you think your concept of chair is pointing meaningfully at the sort of things we’d like to sit on here? How much did you consider indexicality before coming to this party?” He’d just say “what the hell, man, you’re being ridiculous.”
There’s nothing wrong with just calling stump guy ridiculous! No one needs to draw on these deeper concepts of meaning to point out what went wrong here! We can all handle that problem just fine if we’re there to see it. But what about when we’re not? What about when we’re behind the door, and just given a static representation? Jeremy’s house has six chairs in it. But if he’s counting the ottoman, that’d better be for him and not the in-laws. And if one is a stump? That’d be ridiculous. It’s easy to give these proclamations from within Jeremy’s house, but annoying and high-effort to look at “HOUSE: Jeremy. CHAIRS: Six” and try to figure out from there whether there are actually six chairs or if some of the things being counted are ridiculous. So mostly, we just hope that the process of collecting the data doesn’t result in any ridiculous answers. But because data collection and analysis work can be made more and more automatic and scalable while the anti-ridiculousness work is much more manual (going to Jeremy’s house and seeing what the six chairs actually are), the balance is getting ever more disturbed by modern norms of science.
Tal Yarkoni identifies this tension in his paper “The Generalizability Crisis.” It tackles the scientific “replication crisis”, where dramatic findings in many fields fail to appear when other people run the same experiment. Yarkoni attributes this to the step where we trade verbal constructs for statistical operationalizations:
Suppose I hypothesize that high social status makes people behave dishonestly. If I claim that I can test this hypothesis by randomly assigning people to either read a book or watch television for 10 minutes, and then measuring their performance on a speeded dishwashing task, nobody is going to take me very seriously. It doesn’t even matter how the results of my experiment turn out: there is no arrangement of numbers in a table, no p-value I could compute from my data, that could possibly turn my chosen experimental manipulation into a sensible proxy for social status. And the same goes for the rather questionable use of speeded dishwashing performance as a proxy for dishonesty.
The absurdity of the preceding example exposes a critical assumption that often goes unnoticed: for an empirical result to have bearing on a verbal assertion, the measured variables must be suitable operationalizations of the verbal constructs of interest, and the relationships between the measured variables must parallel those implied by the logical structure of the verbal statements. Equating the broad construct of honesty with a measure of speeded dishwashing is so obviously nonsensical that we immediately reject such a move out of hand. What may be less obvious is that exactly the same logic implicitly applies in virtually every case where researchers lean on statistical quantities to justify their verbal claims. Statistics is not, as many psychologists appear to view it, a rote, mechanical procedure for turning data into conclusions. It is better understood as a parallel, and more precise, language in which one can express one’s hypotheses or beliefs. Every statistical model is a description of some real or hypothetical state of affairs in the world. If its mathematical expression fails to capture roughly the same state of affairs as the verbal hypothesis the researcher began with, then the statistical quantities produced by the model cannot serve as an adequate proxy for the verbal statements—and consequently, the former cannot be taken as support for the latter.
Yarkoni’s point about speeded dishwashing is similar to our pointing problem with “phlegmatic”. Dishwashing aptitude clearly doesn’t point to any difference caused by any concept of “dishonesty”, but even things that have some superficial relationship to our idea of “dishonesty”—amount of eye-contact, or heart rate on a polygraph, or whatever—could have the same danger of meaning-debt if they’re trying to proxy for the reasons we care about “dishonesty” (can I trust them when they say how much my share of the bill was, or do I need to see a receipt?). Yarkoni calls this the “generalizability crisis”. We can view it as a subset of our representational crisis, since it’s the same problem of concepts failing to cleanly map to differences. Remember, though, that uncertainty is like a fire as well as like a door. The representational crisis is not just about individual low-validity concepts, but whether you’re looking at the proper ensemble of concepts.
Let’s suppose Dr. House and Dr. Field each performed studies on the efficacy of mulled wine and bull testicles for phlegmatic patients and published them in modern-style scientific journals. House went first, testing with the city patients and finding a dramatic impact. Field published after, looking at country patients and finding no impact. This might be called a failure of replication, but we know it was actually a failure of representation. Dr. House’s work wasn’t replicable because Dr. Field was testing a different thing than Dr. House was, even if they both called their patients “phlegmatic.”
Just like with the fireground commander, these are problems that humans have experience catching, but only when they’re able to interact with the world and not just with dead representations of things. This story from Chemical & Engineering News is a great case study. A chemist publishes a paper about a supposedly “metal-free” reaction. But other chemists reading it know that palladium is stubborn and hard to completely get rid of. So they rerun the methods of the paper and replicate the original results, but then try different things that remove contaminating palladium and show that the reaction doesn’t work anymore. That’s how this sort of verification has to work. Textual or statistical analysis won’t get you anywhere, since the whole problem is that the original paper represents itself as “metal-free” and you need to have the experience to say: “Well, I’m sure they didn’t mean to have residual palladium, but that’s not the same as metal-free.”
In chemistry, the objects you’re working with are precise down to the elemental level. Even that isn’t enough to stop the problem of uncertainty being like fire! What it does mean, however, is that one chemist generally has a very easy time replicating what another chemist did. The original paper didn’t contain an accurate description of what there was (trace palladium), but the language of chemistry permits very accurate descriptions of what was done. So even if the paper itself is a dead representation that lives behind the door, the methods section allows someone else to open the door for themselves, stand where the original chemists stood, and try something new.
The same doesn’t hold true for House and Field. Even if they had followed identical methods to the letter, they would still have gotten totally different results, because their base unit of “a phlegmatic patient” simply is not stable between experiments in the same way as a chemical compound. Field doesn’t open the door to the same place in which House stood: he opens the door to country patients and tries to replicate a finding from city patients. So of course subject areas that work with highly contextual objects, such as people and ecosystems and cities and ideas, will have problems with replication! And it’s not something that can be caught the same way that the chemists caught the palladium issue, because you can’t run an experiment on the same patients at the same moment in time. You can trust palladium now to be like palladium before, and you can trust palladium here to be the same as palladium there, but that’s obviously not the case for anything to do with human beings. So when an experiment doesn’t replicate, it doesn’t necessarily mean that the first scientist lied about their impact or the second scientist’s experiment contained methodological sloppiness. It could just mean that the two scientists were running entirely different experiments (the effect of a treatment on c. urbanicus vs. c. pastorilus), and only thought they were the same because they labeled their different patients the same way.
I’m not disputing that there’s plenty of methodological sloppiness and outright fraud in science. But I do genuinely think this is the most important frame for the replication crisis. After all, the labels are also how you’re going to end up using the study. “Are you feeling phlegmatic? Well, there was this new study…” Meaningfully using a scientific finding in your indexical life has exactly the same issue with representation as trying to replicate that finding. A country physician needs to know that their phlegmatic patients are different from a city physician’s phlegmatic patients, while each individual patient needs to know which paper to read when faced with the differing results.
When you think about it this way, it’s clear that many of our existing strategies for imbuing scientific papers with more meaning aren’t relevant here. Statistics? House sees a good result 100% of the time, and Field sees a bad result 100% of the time, so neither of them needs any degree of statistical fluency to interpret the signals they’re seeing. Bigger sample sizes? Maybe they’ll get a hint of the real issue if the demand for more patients forces them out of their normal stomping grounds and increases sample diversity, but by and large House will just see more city patients and Field will see more country patients. Preregistration? This wasn’t a method-based problem at all, because House and Field did exactly the same thing.
What about a meta-analysis that looks at several different papers to try to figure out what’s really happening? Now we’re getting warmer, but there’s a critically important distinction here. Meta-analysis can clue you in that the difference exists, but it’s not a tool that lets you reach “the answer,” because the whole issue is that there is no singular answer. Doing a numerical meta-analysis is only going to mislead you by averaging together dramatically different studies, in the same way that no formal methods worked for the rabbits because hrair could be 4 or 1000. We need a new kind of meta-analysis that lets us do what the chemists did and pick out details that the authors didn’t think to explicitly represent. But if two papers can follow identical methods and get different results, how can you read them both and tease apart the difference?
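To make that concrete, here’s a minimal sketch of the House-and-Field situation. Everything in it is invented for illustration (the population labels, the effect sizes, the noise), but it shows the shape of the problem: two studies follow the identical procedure on groups that share a label, each study’s result is rock-solid at any sample size, and a naive pooled average lands on a number that is true of neither group.

```python
# Hypothetical illustration only: labels, effect sizes, and noise levels are
# invented to mirror the House/Field story, not taken from any real study.
import random

random.seed(0)

def run_study(population, n):
    """Treat n patients who all carry the label 'phlegmatic' and return the
    average improvement. The label hides which subpopulation was sampled."""
    true_effect = 2.0 if population == "city" else 0.0  # country patients don't respond
    outcomes = [true_effect + random.gauss(0, 0.5) for _ in range(n)]
    return sum(outcomes) / n

for n in (30, 300, 3000):                # bigger samples don't close the gap
    house = run_study("city", n)         # House only ever sees city patients
    field = run_study("country", n)      # Field only ever sees country patients
    pooled = (house + field) / 2         # naive "meta-analysis": average the two studies
    print(f"n={n:4d}  House={house:+.2f}  Field={field:+.2f}  pooled={pooled:+.2f}")
```

Every run tells the same story: House hovers near +2, Field hovers near 0, and the pooled number hovers near +1, a figure that describes no patient who actually exists. Nothing inside the numbers flags that two different things were measured; the divergence only becomes legible once the city/country split is represented somewhere, and that’s exactly the detail the tablets leave out.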
The antidote to representation issues is context. We need scientific papers to be more like murder mysteries, replete with “unnecessary” details that might include the crucial clue. By definition, the authors themselves won’t know which of the details they add might prove crucial for understanding a representational problem—if they understood it, they’d represent it, and it wouldn’t be a problem. But the authors don’t need to understand the importance of particular details. Let’s say that House and Field were both forced to add some set amount of narrative context to their paper. They’re hardliners who are convinced there’s nothing more to a patient than “phlegmatic” and resent the whole exercise. They agree to both use the space to complain about how annoying the required sample size was. House’s complaint is that he had a constant line of patients outside his door that disrupted his neighbors, while Field’s complaint is that he went through five horses because he had to travel to so many different farms to get to everyone.
What happens when those narrative studies are slipped under the door instead of just the tablets? House never explicitly says “I’m a city doctor working with city patients” in his paper, but now it can be inferred by the reader. You can’t directly recreate the original scenario like the chemists could, but you can at least see into the experiment beyond just the results. In a situation where the same methods led to different outcomes, the papers themselves will now be different in a way that can help you tease out why. You’re not completely helpless like you were up there in section II.
Phrased like this, it’s almost insultingly straightforward. Of course adding more stuff to a paper makes it easier to figure out! Maybe because representational issues are so obvious when you’re face to face with them, we don’t tend to think of them as a general phenomenon. We can recognize them and solve them, but only individually, without seeing them as a “kind” of problem.
We have to imagine detail contaminating every abstraction, though. Just as the theoretical construct of “metal-free” couldn’t help but mean “no metals explicitly added, though some contaminants remain,” “phlegmatic” can’t help but mean something more nuanced than “a symptom state such that all patients who exhibit it will respond exactly the same way to exactly the same interventions.” The more indexical the object is, the more impossible it is to run two experiments and equate them with each other, and the more context you need to determine how much of the relevant detail your representations are actually capturing. Indexicality can’t be objectively measured or tested for, but that doesn’t mean it’s subjective or arbitrary. Clearly, a human being and a fire have many factors that could influence the outcomes of any given intervention, while palladium compounds and tables have fewer. What we don’t have is well-negotiated language to talk about how indexical various concepts are, and to what degree detail intrudes on the pure concept as it’s being used.
No one will give you the answer—so you have to figure it out yourself. This is not a manifesto about all scientific findings and modern rationality being wrong. It’s a warning that the right stuff and the wrong stuff are going to look exactly the same to the untrained eye. The same methods, the same data analysis techniques, the same probabilities. There’s only one way to sort the good from the bad: are the representations meaningful with respect to the differences? You can’t trust the people telling you “the facts” to get this part right. Look for clues. Is it a finding about something largely independent of its context like palladium, or is it highly indexical like a human being? If you see a headline, is it about something you can instantly imagine the definition of (like “all 2018 Volkswagen Jettas”) or is it something that doesn’t have a single socially negotiated meaning (like “healthy diets” or “predatory journals”)? Do you know the story of how the data was collected? If something very weird happened, whose job would it be to encode it, and what categories would they pick to describe it? Was this analysis done by someone in direct contact with the data collection, or are there multiple layers of abstraction between the phenomena and the conclusion?
It’s becoming easier than ever to collect big gobs of data, cheaper than ever to hold on to it, and faster than ever to send it out for other people to derive insight from. Once you start asking these questions about representation, you can’t help but notice the problems are getting worse and worse every year. This is the urgency that drove me to start Desystemize in the first place. You can see this concern right at the very beginning in Desystemize #1:
Dr. Ostfeld didn’t start that paragraph by noting such-and-such statistical technique clearly indicated something was off with the tick counts. He started with the sentence “My research group has set and checked many hundreds of thousands of live animal traps over the years.” In other words, it was familiarity with the data-generating process that enabled the lab group to imagine this potential vulnerability and come up with this experiment. By the time the data gets into the hands of analysts, it’s too late to fix. You can’t math your way out of a wrong number. This mistake was caught only because it was the same people generating the data as analyzing it. Which, great for ecology - but as data science becomes more and more specialized, it will be increasingly done by people who are explicitly and solely data scientists. And they’ll inherit datasets from repositories somewhere and never catch a single one of these systemic errors because they couldn’t sift through the wet mouse turds even if they wanted to.
After all of my previous articles dancing around these issues of representation, at last we’ve arrived within the right arena to face the beast head on. No “scientific method” will ever be enough on its own, because the same method may succeed in domains with meaningful representations and fail dramatically when the representations are drowning in meaning-debt. The solution is to open the door and to engage with differences beyond dead representation. Trust your ear to tell you when the fire’s too quiet. And when you’re asked to make a judgment about something you can’t open the door to yourself—a study or an article or a graph, the sort of thing you’ve been shown thousands of times and will be shown thousands more—don’t trust their ontologies without question. Look for hints that they’ve taken representation as seriously as they ought to. Ask questions, and from their answers, forge a proper language of representation, a social understanding of which ontologies are useful and which are bankrupt. Take this tacit skill scattered among a few experts and make it a core part of education and the human experience. It’s gotten too easy to live behind the door, and the stakes are too high to keep it up much longer.
Thanks to Crispy Chicken for significant conceptual feedback, the rest of the Inexact Sciences crew for inspiration, Lyta Gold for editing, and everyone who kept me company while I wrote.
Some housekeeping in the comments...
Yes, Desystemize is not dead! I just got sick of tiptoeing around the general representational crisis and wanted to get it nailed down once and for all. This took a lot of time, but I'm glad that it's done. We're probably never getting back to a weekly cadence, but we're definitely going back to something more regular than this. Going forward, I'll probably retain the "Desystemize #N" naming scheme for stuff that's kind of just "one thought" and use proper titles for arguments like this that are more comprehensive and structured.
Also - the eagle-eyed among you may have noticed that the last Desystemize said this would be about AI existential risk and it's not. Basically, I wrote a couple thousand words about it and then I got bored. I might collaborate with Crispy on it and revive it in some form, but I didn't feel like pushing it through. I will say, though, that this article kind of gets my main point across - which is that any intelligent AI will need to be an embodied AI that can pick novel details out of its environment, not a correlation machine munching on pre-chewed data. Hopefully once you've internalized "You can't diagnose patients from behind the door", it's easy to see why "Okay, but what if the computer behind the door is like, *really super duper fast*" is not an especially serious argument.
I learned my lesson from the above and didn't put a preview at the end of this one. Though if I don't change my mind, next article will be about ontological remodeling and sudoku.
Excellent post! I just have one small aside:
This is much less general than the broad representational issues you're talking about, but a term I like for the problem that the city doctor and country doctor are having is "denominator problem."
Whenever someone gives you an average, you should ask "what are they dividing by?" There is some background set that counts as "normal", and if you divide by measurements of different things, you get different averages.
The country doctor and the city doctor have different experiences because they see different people. They aren't explicitly doing math, but their ideas about "average" have different denominators.
However, there are two levels of concern: (1) do you understand the dataset? What do they think they're dividing by? (2) What are they really dividing by? As you say, reanalyzing the dataset might not be enough. You might have to go and look.
And then there are the broader questions of whether you're even counting the right things.