If interested in another look at numbers and counting, watch "Breaking the Maya Code" (2008), a documentary about the 200-year struggle to decipher the writing system of the ancient Maya, a 4,000-year-old Mesoamerican civilization.
Life Lesson: It takes many different kinds of people to solve a complex problem.
Movie Scene:
"Our decimal system counts by tens and powers of ten. Förstemann realized that the Maya used a base 20 system, counting by 20s and powers of 20. With this system, they could express and manipulate extremely large numbers."
https://moviewise.wordpress.com/2015/07/31/breaking-the-maya-code/
So many fantastic points here! This is why an understanding of the DGP and theory can never really be replaced by completely pattern-oriented approaches to scientific questions. The type of mixture model approach used for the ticks is actually pretty common in other areas of ecology, particularly wrt mammal and bird population estimates. Point-count and presence-absence data usually assume a true underlying population and imperfect observations (including false zeroes), but those approaches are slowly spreading to other areas, with the increase in popularity of Bayesian approaches probably accounting for a lot of that. Have you ever read Statistical Rethinking by Richard McElreath? Best stats textbook I've ever read, and hits on many of the points here about DGPs, causal inference, and the need for theory to inform our models. Really great stuff, you've got a very happy new subscriber here!
Oops, realizing I replied to @moviewise's comment. While it's also great, I meant this as a response to the original post!
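[A minimal sketch of the kind of mixture model the comment above describes: observed counts are a blend of "false zeroes" (the survey missed everything) and counts of a true underlying burden. All parameter values here are hypothetical, and a zero-inflated Poisson is just one simple instance of this family, not the specific model used in the tick study.]

```python
# Sketch: zero-inflated Poisson as a toy "true population + imperfect observation" model.
# Hypothetical numbers throughout; the point is that fitting the mixture recovers a
# latent mean that a naive average of the raw counts would underestimate.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

rng = np.random.default_rng(42)

# --- Simulate the data-generating process ---
n_mice = 2000
true_mean_burden = 8.0   # latent mean ticks per mouse (hypothetical)
p_false_zero = 0.35      # fraction of surveys that record zero no matter what

true_counts = rng.poisson(true_mean_burden, size=n_mice)
missed = rng.random(n_mice) < p_false_zero
observed = np.where(missed, 0, true_counts)

# --- Fit a zero-inflated Poisson by maximum likelihood ---
def neg_log_lik(params, y):
    logit_pi, log_lam = params
    pi = 1 / (1 + np.exp(-logit_pi))   # P(false zero)
    lam = np.exp(log_lam)              # latent Poisson mean
    # Zeros can come from either component; positive counts only from the Poisson.
    ll_zero = np.log(pi + (1 - pi) * poisson.pmf(0, lam))
    ll_pos = np.log(1 - pi) + poisson.logpmf(y, lam)
    return -np.sum(np.where(y == 0, ll_zero, ll_pos))

fit = minimize(neg_log_lik, x0=[0.0, 0.0], args=(observed,), method="Nelder-Mead")
pi_hat = 1 / (1 + np.exp(-fit.x[0]))
lam_hat = np.exp(fit.x[1])

print(f"naive mean of raw counts : {observed.mean():.2f}")
print(f"estimated false-zero rate: {pi_hat:.2f}")
print(f"estimated latent burden  : {lam_hat:.2f}  (true value {true_mean_burden})")
```

[The naive mean lands well below the true burden, while the mixture fit recovers both the false-zero rate and the latent mean, which is the basic logic behind the point-count and presence-absence models mentioned above.]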
It's funny, you frame this like a bad thing, but it actually gives me hope, because if this problem continues, it would appear there is not much to fear from AI, as it would fail to ever become superhumanly effective.
Sounds like you might like Desystemize #5 :) Yes, I do think one side-effect of our failure to understand what goes into intelligence is that our efforts to produce intelligence will have a lot more headwind than we think. There are a lot of potential benefits to us gaining the proper humility about systems - the negative framing is because right now there's an awful lot of driving the bus blindfolded.
This was such a good piece, and a wonderful follow after reading 'The Limits of Data' by C. Thi Nguyen: https://issues.org/limits-of-data-nguyen/
In my world (data protection), we have a lot of 'shrew vs. white-footed mice' examples -- easily measurable harms (data breaches, 'insufficient' legal basis or technical controls, cookie & privacy notices) which, though often subjective, are easy to spot and get lots of press/regulatory interest. Harder problems -- unethical or illegitimate business practices, data accuracy, bias/discrimination, non-material privacy harms -- are like the tick problem: likely to be far more impactful, but requiring more time, effort & energy than the easy stuff.
Here is a fun anti-empiricism proposition: most domains, if not all, are resistant, anti-fragile, or worse, archotrophic, to the tyranny of systems, thus first hand perception is superior. Reference: #16 https://eggreport.substack.com/p/how-to-find-god-10-16
Interesting feedback on this piece over at /r/slatestarcodex:
> This doesn't seem like the only way error detection in complex systems can work. You might have many ways to realize something is amiss; data where Lyme disease doesn't seem to vary based on mouse population as expected, failed interventions to reduce mouse populations having no effect on cases, ecology models suggesting white-footed mice which really had that many ticks wouldn't be competitive with other mice, etc.
> As your great mound of data and models grows, inconsistencies should be easier to detect, even for ML. New lines of evidence introduce puzzles that contradict models based on tainted older data; only basically correct theories will have explanatory power, and the edge cases will suggest avenues to improve the model or determine its limits. That's how science works.
What do you make of this?