Many of us have seen that episode of Black Mirror in which an app tells us our level of romantic compatibility, in a society where AI controls most social interactions.
For some strange people, like Yuval Harari (Homo Deus, Sapiens) and others, this is a wet dream: deterministic human behavior, and human nature reduced to a set of equations. To a large portion of the population, it is a nightmare that looks unavoidable. For data scientists, it is simply amusing to see how badly probability and predictive modelling can be misunderstood.
Let me explain.
The Problem with Probabilities
If we use social media data, for example, we can build a very precise map of everyone's interests and of many aspects of their opinions and personality (many people even take the time to answer those "free" personality surveys). So we could quite easily build an algorithm that matches people based on all these variables, and it would have quite a high accuracy. Yet each individual "match" it proposes would still have a very low chance of being correct.
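Before getting to the probabilities, here is a minimal sketch of what such a matcher could look like. Everything here is a hypothetical illustration: it assumes interests and survey answers have already been encoded as numeric feature vectors, the `top_matches` function is made up for this post, and the data is random.

```python
import numpy as np

def top_matches(user_vec, candidate_vecs, k=10):
    """Return the indices and scores of the k candidates most similar to the user."""
    # Cosine similarity: dot product of L2-normalized feature vectors.
    user = user_vec / np.linalg.norm(user_vec)
    cands = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = cands @ user
    top = np.argsort(scores)[::-1][:k]
    return top, scores[top]

rng = np.random.default_rng(0)
user = rng.random(50)                 # 50 hypothetical interest/personality features
candidates = rng.random((10_000, 50)) # a hypothetical pool of 10,000 candidates
idx, scores = top_matches(user, candidates)
print(idx, scores)
```

A real app would use far richer features and models, but the point below holds regardless of how the scores are produced.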
Let's assume that out there some perfect matches exist for you: personality, chemistry, everything is there. Say one person in a thousand.
Let's say we have a model with 90% accuracy; obviously, this would be very good. It would give us something like the following table:
| Actual \ Predicted | Good Match | No Match |
|--------------------|------------|----------|
| Good Match         | 90%        | 10%      |
| No Match           | 10%        | 90%      |
This means that among all those identified as bad matches, there is still a 10% probability that you two would click, and among all those identified as good matches, there is still a 10% chance of not "clicking", right?
Not quite.
Let's put some actual numbers in there. Say we have a pool of 10,000,000 potential matches available to us. If 1 in 1,000 is a good match, that is still a huge number: we have 10,000 good candidates to choose from.
But the list given by the model is not quite that good. The model correctly flags 9,000 of our 10,000 good candidates, but it also wrongly flags 10% of the 9,990,000 bad candidates, so those 9,000 good candidates end up mixed in with 999,000 bad ones. Still, our probability of being right has increased a LOT, from 1/1,000 to almost 1/100.
| Actual \ Predicted | Good Match | No Match  |
|--------------------|------------|-----------|
| Good Match         | 9,000      | 1,000     |
| No Match           | 999,000    | 8,991,000 |
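In code, the arithmetic behind this table looks like this (a minimal sketch using the same assumed pool size, base rate, and 90%/90% model as above):

```python
# Reproducing the table above: a 10,000,000-person pool, a 1-in-1,000 base
# rate of good matches, and a model that is 90% accurate on each class.
pool = 10_000_000
base_rate = 1 / 1_000
sensitivity = 0.90   # P(flagged as good | actually a good match)
specificity = 0.90   # P(flagged as bad  | actually a bad match)

good = round(pool * base_rate)   # 10,000 actual good matches
bad = pool - good                # 9,990,000 actual bad matches

true_pos = round(good * sensitivity)        # 9,000 good matches found
false_neg = good - true_pos                 # 1,000 good matches missed
false_pos = round(bad * (1 - specificity))  # 999,000 bad matches flagged as good
true_neg = bad - false_pos                  # 8,991,000 correctly rejected

precision = true_pos / (true_pos + false_pos)
print(f"P(good match | flagged as good) = {precision:.2%}")  # ~0.89%, not 90%
```

In other words, out of every hundred people the app presents as "good matches", roughly ninety-nine are not.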
Clearly, this is not the 90% certainty of being right that we were “offered” by the app.
The Problem with Humans
Of course, the top candidates can have a much higher probability of being a good match, and skimming off the roughly 9,000,000 people the model rejects does help a bit. But even among the top 10 candidates, none would have a "100%" probability of being the perfect life partner. Also, trust in the model might influence behavior in a way that unconsciously invalidates it: just by being told "this is the perfect match", many will take things for granted and jeopardize the relationship, or will not make any effort if told "this is a bad match".
At this point, you could argue that this is a particular case, but most models of human behavior share the same characteristics. Whether we look at tests that detect laziness, good job candidates, psychological pathologies or even sexual tendencies, we find the same situation: a model trying to predict membership in a small group. And in such cases, classification errors are ALWAYS to be expected, and when the target group is rare, the false positives will outnumber the correct classifications, as the sketch below shows.
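Here is that general point in a few lines of Python: keep the same hypothetical model that is 90% accurate on both classes, and just vary how rare the target group is.

```python
# The same effect appears in any rare-class problem: with 90% accuracy on
# both classes, precision collapses as the target group gets rarer.
sensitivity = specificity = 0.90

for prevalence in (0.5, 0.1, 0.01, 0.001):
    true_pos = prevalence * sensitivity               # correctly flagged members
    false_pos = (1 - prevalence) * (1 - specificity)  # wrongly flagged non-members
    precision = true_pos / (true_pos + false_pos)
    print(f"prevalence {prevalence:>6.1%} -> precision {precision:>6.2%}")

# prevalence  50.0% -> precision 90.00%
# prevalence  10.0% -> precision 50.00%
# prevalence   1.0% -> precision  8.33%
# prevalence   0.1% -> precision  0.89%
```

The model never changes; only the rarity of what it is looking for does.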
So, Is AI Not Working?
Obviously, there are areas in which AI WILL replace human labor (and already does), and that is a good thing: many operations can be automated, many things can be classified automatically, and many simple decisions can be taken by algorithms.
Don't get me wrong: AI IS nevertheless incredibly useful, on average: for risk, for marketing, for ad placement, even for HR screening. But at the individual level, we are nowhere near an intelligence that can make decisions for us. And it is a very scary thought that many of our political leaders believe we are.
For complex decisions, the results are really not up to what non-data-scientists would expect from the "80-90% accuracy" offered by the models, and definitely not enough to start believing that AI will replace you in everything. Even face recognition is not quite accurate enough, data can be manipulated, most published models are still garbage, and even in radiology, where most of the investment goes, models are still underwhelming (https://www.politico.com/news/2022/08/15/artificial-intelligence-health-care-00051828).
Do We Still Need to Worry?
Yes. But not because AI will replace us; rather because AI can be badly abused for excessive control, tracking, and invasions of privacy. All of those are nice topics for another post.
Also, we should definitely be worried that some people believe in the results of models, or worse… simulations! When it comes to AI, always remember what pretty much every professor tells students in the first class on modeling: models are not to be trusted, they are to be used.
For more technical information on this topic, you can read this other post on the accuracy fallacy: https://timi.eu/blog/modeler/so-you-think-ai-classifies-well/.