This seems remarkably related to Nelson Goodman’s problem of “grue”. He defined something to be “grue” if it was observed before 2025 and turned out to be green, or if it wasn’t observed before 2025 and turned out to be blue. He notes that all observed emeralds have been green, but also that all observed emeralds have been grue. *We* obviously predict that future observed emeralds will still be green, but there’s a sense in which this post might be suggesting that a certain kind of improperly structured machine learning algorithm might predict they’ll be grue (because it doesn’t notice that the training data is based on the property of “detected cancer”, which is grue-some, rather than “cancer”, which is green-like).
> Predictive models for cancer (and all sorts of other diseases - who knows how many other conditions have this same bias?) are being created at health systems all around the country as we speak. They will be turned on, and they will be used to drive clinical decisions, and they will have a significantly lower score for 64 year olds than 65 year olds because 65 is a much more common year to get diagnosed with cancer than 64.
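The mechanism the post describes can be sketched with a toy simulation. All the numbers here are made up for illustration: true cancer risk is assumed to rise smoothly with age, while detection probability is assumed to jump at 65 (say, because of screening access). A model trained on "detected cancer" labels would inherit the jump even though underlying risk barely changes between 64 and 65:

```python
import random

random.seed(0)

# Hypothetical numbers for illustration only.
def true_risk(age):
    # True cancer risk: smooth, slowly increasing in age
    return 0.02 + 0.001 * (age - 60)

def detection_prob(age):
    # Probability cancer gets detected: access-driven jump at 65
    return 0.9 if age >= 65 else 0.3

def detected_rate(age, n=100_000):
    # Empirical rate of the "detected cancer" label, which is what
    # the training data actually records
    detected = 0
    for _ in range(n):
        has_cancer = random.random() < true_risk(age)
        if has_cancer and random.random() < detection_prob(age):
            detected += 1
    return detected / n

rate_64 = detected_rate(64)
rate_65 = detected_rate(65)
print(f"detected-cancer rate at 64: {rate_64:.4f}")
print(f"detected-cancer rate at 65: {rate_65:.4f}")
```

Under these assumptions the detected-cancer rate roughly triples from 64 to 65 even though true risk rises only a few percent, so any model fit to the detection labels will score 64-year-olds much lower, which is the "grue-like" property the comment above is pointing at.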
On the other hand, it seems like a health executive might know that it's beneficial to try to diagnose cancer earlier than it currently gets caught. I don't know how these predictive models work, but if there's a big spike at 65, couldn't they say, okay, let's test more at 64 to try to catch it earlier? That should help even with a mistaken theory of why the spike happens.
Though maybe then they run into false positives, since age isn't a very specific risk factor, and it's harder to get people to do the testing because of the same access issues, and then they figure out what's actually going on.