Bayesian Statistics Made Simple
Code and exercises from my workshop on Bayesian statistics in Python.
Copyright 2018 Allen Downey
MIT License: https://opensource.org/licenses/MIT
The likelihood function
Here's a definition for Bandit, which extends Suite and defines a likelihood function that computes the probability of the data (win or lose) for a given value of x (the probability of winning).
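Here's a sketch of what that definition might look like, assuming Suite comes from the thinkbayes2 module and that outcomes are encoded as the strings 'W' and 'L' (the encoding is an assumption):

```python
from thinkbayes2 import Suite

class Bandit(Suite):

    def Likelihood(self, data, hypo):
        """Probability of the data under the hypothesis.

        data: string, 'W' if we won, 'L' if we lost (assumed encoding)
        hypo: hypothetical probability of winning, 0 to 100
        """
        x = hypo / 100
        if data == 'W':
            return x
        else:
            return 1 - x
```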
Note that hypo is in the range 0 to 100.
We'll start with a uniform distribution from 0 to 100.
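A sketch of that cell; with thinkbayes2, passing a sequence of hypotheses to the constructor yields a uniform prior over them:

```python
# uniform prior over win probabilities 0-100 (in percent)
bandit = Bandit(range(101))
```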
Now we can update with a single loss, then another loss, and then a win:
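Sketched with the 'W'/'L' encoding assumed above; each call to Update reweights the hypotheses by their likelihoods:

```python
bandit.Update('L')   # one loss
bandit.Update('L')   # another loss
bandit.Update('W')   # and a win
```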
Starting over, here's what it looks like after 1 win and 9 losses.
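A sketch, assuming the companion thinkplot module for plotting:

```python
import thinkplot

# fresh prior, then 1 win and 9 losses
bandit = Bandit(range(101))
for outcome in 'WLLLLLLLLL':
    bandit.Update(outcome)

thinkplot.Pdf(bandit)
```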
The posterior mean is about 17%.
The most likely value is the observed proportion, 1/10.
The posterior credible interval has a 90% chance of containing the true value (provided that the prior distribution truly represents our background knowledge).
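Sketches of those summaries, assuming the Mean, Items, and CredibleInterval methods from thinkbayes2:

```python
print(bandit.Mean())                # posterior mean, about 17 (percent)

# most likely value: the hypothesis with the highest posterior probability
value, prob = max(bandit.Items(), key=lambda item: item[1])
print(value)                        # 10, the observed proportion 1/10

print(bandit.CredibleInterval(90))  # 90% central credible interval
```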
Multiple bandits
Now suppose we have several bandits and we want to decide which one to play.
For this example, we have 4 machines with these probabilities:
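A sketch; these values are hypothetical, chosen to match the true values cited at the end of the section (10%, 20%, 30%, and 40%):

```python
# actual probability of winning for each machine (hidden from the player)
actual_probs = [0.10, 0.20, 0.30, 0.40]
```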
The following function simulates playing one machine once.
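A sketch of that function; the Counter is an added assumption, used at the end of the section to count how often each machine gets played:

```python
import random
from collections import Counter

# number of times each machine has been played
counter = Counter()

def play(i):
    """Plays machine i once; returns 'W' or 'L'."""
    counter[i] += 1
    if random.random() < actual_probs[i]:
        return 'W'
    return 'L'
```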
Here's a test, playing machine 3 twenty times:
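Using the hypothetical play function above:

```python
outcomes = [play(3) for _ in range(20)]
print(outcomes)
```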
Now I'll make 4 Bandit objects to represent our beliefs about the 4 machines.
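A sketch, starting each machine with the same uniform prior:

```python
# one posterior distribution per machine
beliefs = [Bandit(range(101)) for _ in range(4)]
```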
This function displays the four posterior distributions:
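A sketch, assuming thinkplot's Pdf and Config functions:

```python
import thinkplot

def plot(beliefs, **options):
    """Plots the posterior distribution for each machine."""
    for i, b in enumerate(beliefs):
        thinkplot.Pdf(b, label='Machine %s' % i)
    thinkplot.Config(xlabel='Probability of winning (%)',
                     ylabel='PMF', **options)
```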
Now suppose we play each machine 10 times. This function updates our beliefs about one of the machines based on one outcome.
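A sketch of the update step, followed by the 10 plays per machine:

```python
def update(beliefs, i):
    """Plays machine i once and updates our beliefs about it."""
    outcome = play(i)
    beliefs[i].Update(outcome)

# play each machine 10 times
for i in range(4):
    for _ in range(10):
        update(beliefs, i)
```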
After playing each machine 10 times, we have some information about their probabilities:
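Using the hypothetical plot function from above:

```python
plot(beliefs)
```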
Bayesian Bandits
To get more information, we could play each machine 100 times, but while we are gathering data, we are not making good use of it. The kernel of the Bayesian Bandits algorithm is that it collects and uses data at the same time. In other words, it balances exploration and exploitation.
The following function chooses among the machines so that the probability of choosing each machine is proportional to its "probability of superiority".
Random chooses a value from the posterior distribution.
argmax returns the index of the machine with the highest sampled value.
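A sketch, assuming Pmf's Random method from thinkbayes2 and NumPy's argmax; drawing one value from each posterior and taking the winner chooses each machine with probability equal to its probability of superiority:

```python
import numpy as np

def choose(beliefs):
    """Chooses a machine by sampling from the posteriors."""
    ps = [b.Random() for b in beliefs]
    return np.argmax(ps)
```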
Here's an example.
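Choosing among the four machines based on the current posteriors:

```python
choose(beliefs)
```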
Putting it all together, the following function chooses a machine, plays once, and updates beliefs:
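A sketch, combining the hypothetical helpers from above:

```python
def choose_play_update(beliefs):
    """Chooses a machine, plays once, and updates beliefs."""
    i = choose(beliefs)          # choose by sampling the posteriors
    outcome = play(i)            # simulate one play
    beliefs[i].Update(outcome)   # update the posterior for machine i
```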
Here's an example:
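One step of the algorithm, using the names defined above:

```python
choose_play_update(beliefs)
```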
Trying it out
Let's start again with a fresh set of machines:
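A sketch; the hypothetical play counter gets reset too, so the counts at the end reflect only this run:

```python
beliefs = [Bandit(range(101)) for _ in range(4)]
counter = Counter()
```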
Now we can play a few times and see how beliefs gets updated:
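A sketch; num_play, the number of plays, is the name the exercise at the end refers to:

```python
num_play = 100
for _ in range(num_play):
    choose_play_update(beliefs)

plot(beliefs)
```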
We can summarize beliefs by printing the posterior mean and credible interval:
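A sketch of that summary, assuming the Mean and CredibleInterval methods from thinkbayes2:

```python
def summarize_beliefs(beliefs):
    """Prints the posterior mean and 90% credible interval."""
    for i, b in enumerate(beliefs):
        print('Machine %s:' % i, b.Mean(), b.CredibleInterval(90))

summarize_beliefs(beliefs)
```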
The credible intervals usually contain the true values (10, 20, 30, and 40).
The estimates are still rough, especially for the lower-probability machines. But that's a feature, not a bug: the goal is to play the high-probability machines most often. Making the estimates more precise is a means to that end, but not an end in itself.
Let's see how many times each machine got played. If things go according to plan, the machines with higher probabilities should get played more often.
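A sketch, reading the hypothetical play counter kept by the play function:

```python
def summarize_counter(counter):
    """Prints how many times each machine got played."""
    total = sum(counter.values())
    for i in range(4):
        print('Machine %s:' % i, counter[i], 'plays,',
              '%.0f%% of the total' % (100 * counter[i] / total))

summarize_counter(counter)
```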
Exercise: Go back and run this section again with a different value of num_play and see how it does.