Today, we’re looking at an article from Jukka-Pekka Onnela with the refreshingly direct title “Exporting the same data from a wearable twice doesn’t give you the same data”. We’ll use this to more generally explore the concept of reporting continuity (sometimes just “continuity”). Note that this is one of those cases where I’ll be using personal experience as a healthcare data worker and can’t necessarily provide textual backup to everything I say. In fact, when I search “reporting continuity”, the top results are all about business continuity, which is how a business returns to normal operations after a disruption and has nothing to do with what I’m talking about here. For all I know, “reporting continuity” is a tribal affectation for ex-Epic Systems nerds that no one else uses, but it’s the word I know it by so it’s the word I’ll use. (Please sound off if you know of a more widely accepted term for this, though.)
In brief: HealthKit is Apple’s tool to aggregate health data and standardize it for research and interoperability. This makes it easier to work with than raw sensor data from the Apple Watch or whatever other wearable, but it’s also a system that adds a layer of abstraction. And what do systems do? They annihilate detail. Technically speaking, the sensors themselves are also systems, so you end up exporting a ghost of a ghost: data mangled by one set of rules, then another.
What happens when those rules change?
This is what Onnela observed in his article. He was studying heart rate variability (hereafter HRV), but the actual metric is almost irrelevant to the point we’re making. What matters is that he did two extracts separated in time (the first on September 5, 2020, and the second on April 15, 2021) but looking at the same date range in the data. Since time travel is impossible even for Apple, the past should remain constant no matter how far ahead in time you’re looking at it from. For example, you look the same in your 2010 yearbook whether you’re reading it in 2011 or 2021. When you read the book has no bearing on what the book says.
It matters a lot if you’re looking at Apple’s idea of HRV, though:
It’s worth stopping for a moment and reflecting on how completely ridiculous that second graph is. On the x axis is the data from one export, and on the y axis is the data for the same dates from the other. The expected correlation is 100%, represented by the identity line running diagonally through the plot. Why wouldn’t it be? These are precisely the same events; the only difference is when we chose to look at them. So every single point away from that line represents data instability over time.
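To make the check concrete, here’s a minimal sketch of the consistency test that scatter plot is performing. The dates and values below are invented for illustration, not Onnela’s actual data; real exports would come out of HealthKit’s export files. If continuity held, every pair would sit on the identity line y = x, and the mismatch set would be empty.

```python
# Hypothetical sketch: comparing two exports of the same date range.
# All values are made up for illustration.

export_2020 = {  # pulled September 2020
    "2020-08-01": 42.0,
    "2020-08-02": 55.3,
    "2020-08-03": 48.7,
}
export_2021 = {  # the same dates, pulled April 2021
    "2020-08-01": 42.0,
    "2020-08-02": 61.9,  # same event, different number
    "2020-08-03": 48.7,
}

# Continuity means the past is constant: every date should agree exactly.
mismatches = {}
for day, old in export_2020.items():
    new = export_2021.get(day)
    if new != old:
        mismatches[day] = (old, new)

print(mismatches)  # → {'2020-08-02': (55.3, 61.9)}
```

Any non-empty result is a continuity break: the same event, reported differently depending on when you asked.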
Presumably Apple had a good reason to change their algorithm so drastically. I don’t know anything about calculating HRV from sensor data and have no reason to assume they didn’t. If the new algorithm corresponds better to physical reality, isn’t that exactly the kind of work Desystemize wants to hype? In a vacuum, yes. But meaning is a product of interaction, which means this change can’t be analyzed just by what it did to the numbers living on your phone. Instead, we have to analyze the impact it had on the information ecosystem attached to those numbers. And since meaning is contingent on the governance of the actions that went into the data, basically all definition surprises are bad surprises.
A quick caveat: I work downstream from wearable data and don’t have access to these HealthKit exports personally. In a world where the HealthKit extract clearly indicated that you were downloading HRV data calculated by algorithm X or Y, with the choice to export past data under either regime, there would be nothing especially irresponsible about these changes. I am taking Onnela’s surprise at face value and assuming that this change was mandatory and not specifically indicated on the export screen. This may seem presumptuous of me, but quite frankly I’ve been in the industry long enough to know that the big players recklessly and silently break continuity all the time. Still, if someone provides evidence to the contrary, I’ll happily retract my criticisms for this specific case, even if the point still holds as a general phenomenon.
For now we’ll assume it was a forced rollout. What does it mean to suddenly adopt a new definition, even a more accurate one? It means you can no longer compare your previous data with your new data. Now, from the perspective of a single HealthKit-enabled device, that isn’t necessarily a huge problem. Since the new definition works on the old data, you can look back and put your current data in context with the past. It might surprise you if you were keeping track of your “normal” and saw it suddenly change, but it’s a change you can investigate and rectify.
How does that signal to re-evaluate everything propagate outwards, though? Every study using HRV data needs to pull it all again. Any composite dataset of HealthKit data that mixes old-algorithm and new-algorithm HRV is comparing apples to oranges, and useless until it’s brought back onto a single definition. And of course, that’s assuming you have the ability to force a new export. If you’ve ever downloaded a dataset without also keeping a link to the originating devices, the algorithm change effectively means your dataset has gone stale, unable to be used for comparisons with the present or future.
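If you do know (or can infer) when the change landed, one blunt mitigation is to partition a composite dataset by export date and refuse to pool across the boundary. A hypothetical sketch, with the rollout date and record layout invented for illustration:

```python
from datetime import date

# Hypothetical: suppose we somehow learned the new HRV algorithm
# shipped on this date (invented for illustration).
ALGORITHM_CHANGE = date(2021, 1, 1)

records = [
    {"subject": "A", "exported_on": date(2020, 9, 5), "hrv": 48.7},
    {"subject": "B", "exported_on": date(2021, 4, 15), "hrv": 61.9},
]

# Partition by which definition produced each extract.
old_regime = [r for r in records if r["exported_on"] < ALGORITHM_CHANGE]
new_regime = [r for r in records if r["exported_on"] >= ALGORITHM_CHANGE]

# Pooling old_regime and new_regime without re-exporting is exactly
# the apples-to-oranges comparison described above.
if old_regime and new_regime:
    print("Mixed-regime dataset: re-export required before pooling.")
```

Note how much this leans on knowing the change date at all — which is precisely the information a silent rollout withholds.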
And it gets worse! Let’s suppose you’re training a predictive model for patient risk, using patient wearable data to make it “personalized for them”. No matter how responsible you were in investigating the impact of HealthKit data on your model, it’s all for naught when the definition ticks over. If you have a model in use making clinical decisions, that means the HealthKit component could become out of date at any moment, on the whims of Apple, and there’s no remedy except trying to get as many re-exports as you can and training it all over again. How long is that going to take? And what happens in the meantime? You’ll be making decisions off of one idea of HRV, using data based on another.
When you think about it this way, you can see that it’s only partially Apple’s fault. There are definitely ways you can announce continuity breaks loudly: proactively reaching out to people who have exported, showing algorithm changes on the export menu itself, limiting changes to pre-arranged windows (i.e., having only a few set days a year where you reserve the option to change, so people know to check at those specific times). But there’s only so much you can do at the source, because the idea that data can be a freely traded commodity is itself a big part of the problem. How can you expect every single downstream copy to update in time with a change at the root? The governance is part of the data, but a part that requires constant examination and care. Fundamentally, data can just travel farther than governance can, since data stays static forever, free to be endlessly shared, remixed, and trained on.
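One way the “governance is part of the data” idea could be made mechanical: the producer stamps every export with the version of the definition that generated it, so staleness becomes detectable rather than silent. This is a sketch of a convention, not anything HealthKit actually provides; the field names and version strings are invented.

```python
# Hypothetical export manifests: each extract carries the version of
# the definition that produced it. (Invented convention, not a real
# HealthKit feature.)
manifest_old = {"metric": "hrv", "definition_version": "2019.2"}
manifest_new = {"metric": "hrv", "definition_version": "2021.1"}

def compatible(a: dict, b: dict) -> bool:
    """Two extracts are comparable only if the same definition produced both."""
    return (a["metric"] == b["metric"]
            and a["definition_version"] == b["definition_version"])

print(compatible(manifest_old, manifest_new))  # → False: continuity break
print(compatible(manifest_old, manifest_old))  # → True
```

Even this only helps downstream consumers who check the stamp — it can’t force a static copy on someone’s hard drive to update itself, which is the deeper problem the paragraph above describes.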
So it’s our vision of the future that’s the ultimate culprit. Create as much data about yourself as possible, and it can be used to personalize your world! But all prediction supposes that the past data predicts the future data, and when someone else owns the definitions, they can break continuity whenever they want. So any meaningful use of data needs to be a constant, ongoing conversation with the generating process. Instead, we have an ecosystem of “data dumps”, “extracts”, and other static datasets, ready to silently break their relationship with future data at any time. Whenever you see a study based on data that’s generated by an entity outside of the actual people performing that study, you should always stop and ask yourself: if what the data means changes one day, would the people running the study know?