I think you left out the people born on Feb 29 in leap years, so it should be 1/365.25, not 1/365, everywhere. I haven't recalculated with that, but my intuition says it makes such a minor difference that you possibly need 24 people, but maybe not.
(And yes, it is exactly 365.25: 2000 was a leap year even though it's a century year, because the century number is divisible by 4, and there are 0 people in the world you can meet who were born in 1900.)
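For anyone who wants to run the recalculation mentioned above, here is a rough Python sketch. It assumes the 365 ordinary days are equally likely and that Feb 29 carries a quarter of an ordinary day's weight. The function names `p_shared_birthday` and `smallest_group` are made up for this illustration.

```python
from math import prod  # available in Python 3.8+

def p_shared_birthday(n, leap_weight=0.25):
    """Chance that at least two of n people share a birthday.

    Assumption: 365 equally likely ordinary days, plus Feb 29 with
    `leap_weight` times the weight of an ordinary day.
    """
    total = 365 + leap_weight
    p = 1 / total            # probability of any one ordinary day
    q = leap_weight / total  # probability of Feb 29

    def distinct_ordinary(k):
        # Probability that k people land on k different ordinary days.
        return prod((365 - i) * p for i in range(k))

    # No shared birthday means either nobody was born on Feb 29,
    # or exactly one person was and the rest are all different.
    no_match = distinct_ordinary(n) + n * q * distinct_ordinary(n - 1)
    return 1 - no_match

def smallest_group(threshold=0.5, leap_weight=0.25):
    """Smallest group size whose chance of a shared birthday exceeds threshold."""
    n = 1
    while p_shared_birthday(n, leap_weight) <= threshold:
        n += 1
    return n

print(smallest_group())                 # quarter-weight Feb 29
print(smallest_group(leap_weight=0.0))  # classic 365-day version
```

Under these assumptions the break-even group size stays at 23 either way; the quarter day only shaves a fraction of a percentage point off the odds.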
Sigh. I knew someone would do this. I almost didn't publish it, in fact. Here's what the thought process is like for someone with a reasonably-sized readership.
1) Include February 29 as a full day? Then someone will say no, it's only a quarter of a day! Then there will be an argument because you can't have a quarter of a match and the comment section or Notes derails into that and my point is lost.
2) Include February 29 as a quarter day? Then I can't write out the explanation using easily understandable fractions and the entire goddamn fucking point, to make people see that they can understand math, is lost because that's too confusing for a general audience and no, it does not kick it up to 24 people. People who read further about the Birthday Paradox, or who google because they don't believe me, will not see anyone including the quarter day, and I'll get at least two emails telling me I'm wrong as a result.
3) Don't include February 29 at all? Then someone will be in the comment section lecturing me as if I am a very stupid, very young child who doesn't know the history of Leap Year and telling me I should've picked option 1 or 2.
I promise, I'm really not retarded. I'm not a 7 year old autistic savant who manages to write like an adult in the same way that Rain Man can count toothpicks. I really do think about things. I swear. If you need an external witness to this fact, maybe Josh will weigh in and affirm that he's actually seen me think. In person!
I am grateful that my comment section is going away entirely as of my next post. There is literally no way to write anything without creating some kind of frustrating, annoying, time-and-energy-consuming headache for myself. Jesus Christ.
The underlying maths has many other applications of course.
For example, suppose you're told that a facial recognition system has a 99% accuracy rate when comparing 2 head shots to decide whether those 2 head shots are the same person or not. We'll assume for simplicity that we have an equal rate of false positives (falsely saying it is the same person) and false negatives (falsely saying it is not the same person).
This sounds pretty good, and if you were to use it to ensure only the employees of a small company could enter its premises it would probably work well enough, at least if some allowance is made for the occasional mis-identification.
Now, we know that there's a 99/100 chance of correctly recording if 2 head shots are of the same person. But imagine using this to scan headshots taken (via CCTV) of members of the public as they walk down the streets of a busy city to check them against headshots of wanted criminals. Now the odds of getting things wrong become pertinent to your freedom to walk the streets unaccosted by the police.
How many comparisons would you need to do before the probability that every single one of them is correct drops below 50%? The answer is 69. If you compare hundreds or thousands of people against the mugshots of wanted criminals you'll end up accusing lots of random civilians of being criminals and may well miss some of the wanted criminals along the way.
But suppose we raise the accuracy to 99.9%? The point at which that probability drops below 50% is 693 comparisons. At 99.99% accuracy, the figure is 6,932. This application of facial recognition is going to need a very high accuracy to avoid lots of innocent people being flagged incorrectly as wanted criminals.
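The thresholds are easy to check yourself. Here is a minimal Python sketch, assuming each comparison is independent and equally accurate, so the chance that n comparisons are all correct is accuracy to the power n. The function name `comparisons_until_below` is just illustrative.

```python
import math

def comparisons_until_below(accuracy, threshold=0.5):
    """Smallest n for which accuracy**n falls strictly below threshold.

    Assumption: every comparison is independent with the same accuracy.
    """
    # accuracy**n < threshold  <=>  n > log(threshold) / log(accuracy)
    return math.floor(math.log(threshold) / math.log(accuracy)) + 1

for accuracy in (0.99, 0.999, 0.9999):
    print(f"{accuracy:.2%} accurate: {comparisons_until_below(accuracy)} comparisons")
```

Each extra nine of accuracy buys roughly ten times as many comparisons before an error somewhere becomes more likely than not.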
Now maybe there are ways around this by e.g. treating the initial match as a trigger for a deeper check of the person's identity to finally settle whether they're really the wanted criminal or not, but I'm sceptical of this kind of use of facial recognition because the maths indicate that an impressive level of pairwise-accuracy can nevertheless yield lots of error in such settings.
I was guessing over 365 to begin with. Lol. Interesting. Thanks for the shout out as well. :)
My very feeble old-man brain remembered the group size as 13, obviously wrong. But in any case, I always wondered how a small group could give me winning odds. I’ll probably stick with groups of 25 to get me closer to something like Vegas roulette house odds. Thanks for the explainer.
You remembered that the answer is a surprisingly small prime number whose second digit is a 3. That’s pretty good!
In my case, the leap from "does anyone in this crowd share a birthday? Want to bet?" to "does anyone in this crowd share MY birthday..." happens too fast for it to be a matter of simply substituting an easier question for a harder one. At first, I wanted to blame my listening skills, and then I wanted to blame my early training in memorizing the multiplication tables ('what's 6 x 7? Quick! Wow, you need to be able to answer faster than that.'), but now I'm just not sure why it is so easy to fall into error. Maybe the psychology is clearer for other people. I hope it is.
p.s. I decided to start practicing not leaving comments because soon I won't be able to. I failed this time, but will do better. So, since this should be my last comment on one of your posts: I wish you a Happy Halloween, and a Merry Christmas!