
From http://en.wikipedia.org/wiki/Reese%27s_Pieces#/media/File:Reeses-pieces-loose.JPG.
I eat Reese’s pieces almost every day after lunch, and they come in three colors: orange, yellow, and brown.
I’ve wondered for a while whether the three colors occur in equal proportions, so for lunch today, I thought I’d try to infer the occurrence rates using Bayes’ Theorem.
Bayes’ Theorem provides a quantitative way to update your estimate of the probability for some event, given some new information. In math, the theorem looks like
<code>
</code>
The probability for event
to happen, given that some condition
is met, is the probability that
is met, given that
happened, times the probability for
to happen at all, and divided by the probability for
to be met at all.
The
and
are called the “priors” and often represent your initial estimates for the probability that
and
occur.
is called the “likelihood”, and
is the “posterior”, the thing we know AFTER
is satisfied.
is usually the thing we’re trying to calculate.

Thanks, Winco buy-in-bulk!
So for my case,
will be the frequency with which a certain color occurs, and
will be my experimental data.
For a given frequency
of oranges (or browns or yellows), the probability
that I draw
oranges is ~ f^N (1 – f)^N(not orange). As I select more and more candies, I can keep re-evaluating
for the whole allowed range of f (0 to 1) and find the value that maximizes
.
Closing my eyes, I pulled ten different candies out of the bag, with following results in sequence: brown, orange, orange, yellow, orange, orange, orange, brown, orange, yellow, orange. These results obviously suggest orange has a higher frequency than yellow or brown.
This ipython notebook implements the calculation I described, and the plots below show how
changes after a certain number of trials
:

Applying Bayesian inference to determine the frequency of Reese’s pieces colors.
So, for example, before I did any trials
, I assumed all colors were equally likely. After the first trial when I chose a brown candy, the probability that brown has a higher frequency than the other colors goes up. After three trials (brown, orange, orange), orange takes the lead, and since I hadn’t seen any yellows, there’s a non-zero probability that yellow’s frequency is actually zero. We can see how the probabilities settle down after ten trials.
Based on this admittedly simple experiment, it seems that oranges have a frequency about twice that of yellows and browns. Although not as much fun, if I’d bothered to check wikipedia, I would have seen that “The goal color distribution is 50% orange, 25% brown, and 25% yellow” — totally consistent with my estimate.