I eat Reese’s pieces almost every day after lunch, and they come in three colors: orange, yellow, and brown.
I’ve wondered for a while whether the three colors occur in equal proportions, so for lunch today, I thought I’d try to infer the occurrence rates using Bayes’ Theorem.
Bayes’ Theorem provides a quantitative way to update your estimate of the probability for some event, given some new information. In math, the theorem looks like
The probability for event to happen, given that some condition is met, is the probability that is met, given that happened, times the probability for to happen at all, and divided by the probability for to be met at all.
The and are called the “priors” and often represent your initial estimates for the probability that and occur. is called the “likelihood”, and is the “posterior”, the thing we know AFTER is satisfied. is usually the thing we’re trying to calculate.
Thanks, Winco buy-in-bulk!
So for my case, will be the frequency with which a certain color occurs, and will be my experimental data.
For a given frequency of oranges (or browns or yellows), the probability that I draw oranges is ~ f^N (1 – f)^N(not orange). As I select more and more candies, I can keep re-evaluating for the whole allowed range of f (0 to 1) and find the value that maximizes .
Closing my eyes, I pulled ten different candies out of the bag, with following results in sequence: brown, orange, orange, yellow, orange, orange, orange, brown, orange, yellow, orange. These results obviously suggest orange has a higher frequency than yellow or brown.
This ipython notebook implements the calculation I described, and the plots below show how changes after a certain number of trials :
Applying Bayesian inference to determine the frequency of Reese’s pieces colors.
So, for example, before I did any trials , I assumed all colors were equally likely. After the first trial when I chose a brown candy, the probability that brown has a higher frequency than the other colors goes up. After three trials (brown, orange, orange), orange takes the lead, and since I hadn’t seen any yellows, there’s a non-zero probability that yellow’s frequency is actually zero. We can see how the probabilities settle down after ten trials.
Based on this admittedly simple experiment, it seems that oranges have a frequency about twice that of yellows and browns. Although not as much fun, if I’d bothered to check wikipedia, I would have seen that “The goal color distribution is 50% orange, 25% brown, and 25% yellow” — totally consistent with my estimate.