Repository for a workshop on Bayesian statistics
Bayesian Bandits from Scratch
Code from a talk presented to the Boston Bayesian meetup, July 2016.
Copyright 2016 Allen Downey
MIT License: https://opensource.org/licenses/MIT
The cookie problem
Create a Suite with two equally likely hypotheses.
Update each hypothesis with the likelihood of the data (a vanilla cookie).
Print the posterior probabilities.
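Here's one way those steps might look, sketched with a minimal dictionary-based Suite class in the style of Allen Downey's thinkbayes code. The class and its method names (Normalize, Update, Print) are assumptions here, as is the classic cookie setup (Bowl 1 holds 30 vanilla and 10 chocolate cookies, Bowl 2 holds 20 of each):

```python
class Suite(dict):
    """Maps each hypothesis to its probability (a stand-in sketch)."""

    def __init__(self, hypos):
        super().__init__()
        for hypo in hypos:
            self[hypo] = 1              # equal prior weight for each hypothesis
        self.Normalize()

    def Normalize(self):
        """Makes the probabilities add up to 1."""
        total = sum(self.values())
        for hypo in self:
            self[hypo] /= total

    def Likelihood(self, data, hypo):
        """Probability of the data under the hypothesis; subclasses override this."""
        raise NotImplementedError

    def Update(self, data):
        """Multiplies each hypothesis by the likelihood of the data and renormalizes."""
        for hypo in self:
            self[hypo] *= self.Likelihood(data, hypo)
        self.Normalize()

    def Print(self):
        """Prints the hypotheses and their probabilities."""
        for hypo in sorted(self):
            print(hypo, self[hypo])


# Two equally likely hypotheses about which bowl the cookie came from.
cookie = Suite(['Bowl 1', 'Bowl 2'])

# Likelihood of a vanilla cookie: 3/4 from Bowl 1, 1/2 from Bowl 2.
cookie['Bowl 1'] *= 0.75
cookie['Bowl 2'] *= 0.5
cookie.Normalize()

cookie.Print()
```

With those likelihoods, the posterior probabilities come out to 0.6 for Bowl 1 and 0.4 for Bowl 2.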
Suppose we put the first cookie back, stir, choose again from the same bowl, and get a chocolate cookie.
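Sketching that second update with the same stand-in class and the same assumed bowl contents, the likelihood of chocolate is 1/4 from Bowl 1 and 1/2 from Bowl 2:

```python
# Likelihood of a chocolate cookie under each hypothesis.
cookie['Bowl 1'] *= 0.25
cookie['Bowl 2'] *= 0.5
cookie.Normalize()

cookie.Print()
```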
The red dice problem
Create a Suite to represent six-sided dice with different numbers of red sides.
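A sketch of that suite, assuming the hypotheses are simply the possible numbers of red sides, 1 through 6 (the exact set used in the talk may differ):

```python
dice = Suite([1, 2, 3, 4, 5, 6])
dice.Print()
```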
We can do the update by hand like this:
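Under the hypothesis that the die has hypo red sides, the probability of rolling red is hypo/6. Assuming the observed outcome was a red side, the by-hand update might look like this:

```python
# Multiply each hypothesis by the likelihood of rolling a red side, then renormalize.
for hypo in dice:
    dice[hypo] *= hypo / 6
dice.Normalize()

dice.Print()
```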
Now I'll do the same calculation using Suite.Update.
I'll define a new class called RedDice that extends Suite. It provides a method called Likelihood that takes data and hypo and returns the probability of the data (the outcome of rolling the die) for a given hypothesis (number of red sides on the die).
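Here's a sketch of what that class might look like, using 'red' and 'not red' as the two possible outcomes (the encoding of the data is an assumption):

```python
class RedDice(Suite):
    """Hypotheses about how many of a die's six sides are red."""

    def Likelihood(self, data, hypo):
        """Probability of the outcome given hypo red sides."""
        p_red = hypo / 6
        return p_red if data == 'red' else 1 - p_red
```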
Now we can create a RedDice object and update it.
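For example, with the same hypotheses as before and a single red outcome:

```python
dice = RedDice([1, 2, 3, 4, 5, 6])
dice.Update('red')
dice.Print()
```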
If we get more data, we can perform more updates.
Here are the results.
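For instance, a few more made-up outcomes and the resulting posterior:

```python
# These additional outcomes are made up for illustration.
for outcome in ['red', 'red', 'not red']:
    dice.Update(outcome)

dice.Print()
```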
Bayesian bandits
Here's a definition for Bandit, which extends Suite and defines a likelihood function that computes the probability of the data (win or lose) for a given value of x (the probability of a win). Note that hypo is in the range 0 to 100.
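A sketch of that class, assuming wins and losses are encoded as 'W' and 'L':

```python
class Bandit(Suite):
    """Hypotheses about a machine's win percentage, from 0 to 100."""

    def Likelihood(self, data, hypo):
        """Probability of a win ('W') or loss ('L') given win percentage hypo."""
        x = hypo / 100
        return x if data == 'W' else 1 - x
```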
We'll start with a uniform distribution from 0 to 100.
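With the stand-in Suite, that prior is just one equally weighted hypothesis for each integer percentage:

```python
bandit = Bandit(range(101))   # 0, 1, ..., 100, all equally likely
```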
Now we can update with a single loss, then another loss, and then a win:
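With the 'W'/'L' encoding assumed above:

```python
bandit.Update('L')   # a single loss
bandit.Update('L')   # another loss
bandit.Update('W')   # and a win
```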
Starting over, here's what it looks like after 1 win and 9 losses.
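Here's a sketch of that computation, with small helper summaries added because the stand-in class above doesn't define them (the original code presumably used the Suite's built-in methods):

```python
bandit = Bandit(range(101))
for outcome in 'WLLLLLLLLL':          # 1 win and 9 losses
    bandit.Update(outcome)

# Posterior mean and most likely value.
posterior_mean = sum(hypo * prob for hypo, prob in bandit.items())
most_likely = max(bandit, key=bandit.get)

def credible_interval(suite, percentage=90):
    """Central credible interval, found by walking the cumulative probabilities."""
    alpha = (100 - percentage) / 200
    total, low, high = 0, None, None
    for hypo, prob in sorted(suite.items()):
        total += prob
        if low is None and total >= alpha:
            low = hypo
        if high is None and total >= 1 - alpha:
            high = hypo
    return low, high

print(posterior_mean, most_likely, credible_interval(bandit))
```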
The posterior mean is about 17%.
The most likely value is the observed proportion, 1/10.
The posterior credible interval has a 90% chance of containing the true value (provided that the prior distribution truly represents our background knowledge).
Bayesian A/B testing
Now suppose we have several bandits and we want to decide which one to play.
For this example, we have 4 machines with these probabilities:
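The specific values aren't listed here, so the sketches below use made-up placeholders:

```python
# Placeholder win probabilities for the 4 machines (made up for illustration).
actual_probs = [0.10, 0.20, 0.30, 0.40]
```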
The following function simulates playing one machine once.
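A sketch of that function, returning 'W' or 'L' based on the machine's actual probability:

```python
import random

def play(i):
    """Plays machine i once; returns 'W' with probability actual_probs[i], else 'L'."""
    return 'W' if random.random() < actual_probs[i] else 'L'
```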
Here's a test, playing machine 3 twenty times:
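With the play function sketched above:

```python
outcomes = [play(3) for _ in range(20)]
print(outcomes)
```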
Now I'll make 4 Bandit objects to represent our beliefs about the 4 machines.
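Using the Bandit class sketched above, with a uniform prior for each machine:

```python
beliefs = [Bandit(range(101)) for _ in range(4)]
```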
The core of the algorithm is this process for choosing which machine to play:
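One standard approach, presumably the one used here, is Thompson sampling: draw one random value from each machine's posterior and play the machine whose draw is largest. A sketch of that, with a small helper for drawing from a suite:

```python
def random_hypo(suite):
    """Draws one hypothesis at random, weighted by its probability."""
    target = random.random()
    total = 0
    for hypo, prob in sorted(suite.items()):
        total += prob
        if total >= target:
            return hypo
    return hypo                       # guard against floating-point round-off

def choose(beliefs):
    """Draws from each posterior and returns the index of the best draw."""
    draws = [random_hypo(b) for b in beliefs]
    return draws.index(max(draws))
```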
Here's an example.
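With the sketch above, choosing a machine might look like this:

```python
i = choose(beliefs)
print(i)
```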
This function updates our beliefs about one of the machines based on one outcome.
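A sketch, assuming outcomes are encoded 'W' or 'L' as above:

```python
def update(beliefs, i, outcome):
    """Updates our beliefs about machine i with one outcome."""
    beliefs[i].Update(outcome)
```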
Finally, this function chooses a machine, plays once, and updates beliefs:
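Putting the pieces sketched above together:

```python
def choose_play_update(beliefs):
    """Chooses a machine, plays it once, and updates our beliefs about it."""
    i = choose(beliefs)               # pick a machine by Thompson sampling
    outcome = play(i)                 # play it once
    update(beliefs, i, outcome)       # update the corresponding posterior
```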
Here's an example:
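For instance, a single play using the functions sketched above:

```python
choose_play_update(beliefs)
```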
This function displays the four posterior distributions:
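A sketch using matplotlib (the plotting details here are assumptions, not the talk's code):

```python
import matplotlib.pyplot as plt

def plot(beliefs):
    """Plots the posterior distribution for each machine."""
    for i, b in enumerate(beliefs):
        xs = sorted(b)
        ys = [b[x] for x in xs]
        plt.plot(xs, ys, label='Machine %d' % i)
    plt.xlabel('Win percentage')
    plt.ylabel('Probability')
    plt.legend()
    plt.show()
```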
Now we can play a few times and see how beliefs gets updated:
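For example (the number of plays here is arbitrary):

```python
for _ in range(100):
    choose_play_update(beliefs)

plot(beliefs)
```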
We can summarize beliefs by printing the posterior mean and credible interval:
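Using the credible_interval helper sketched earlier:

```python
for i, b in enumerate(beliefs):
    mean = sum(hypo * prob for hypo, prob in b.items())
    print(i, mean, credible_interval(b))
```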
The credible intervals usually contain the true value, but the estimates are still rough, especially for the lower-probability machines.
But that's a feature, not a bug: the goal is to play the high-probability machines most often. Making the estimates more precise is a means to that end, but not an end in itself.