GitHub Repository: Probability-Statistics-Jupyter-Notebook/probability-statistics-notebook
Path: blob/master/notebook-for-learning/Chapter-3-Discrete-Probability-Distributions.ipynb
Kernel: Python 3
''' Import useful libraries here.
Run this cell first for convenience. '''
import numpy as np
import scipy
from scipy import stats
import warnings

warnings.simplefilter('ignore', DeprecationWarning)

Chapter 3 - Discrete Probability Distributions

The Binomial Distribution

Bernoulli Random Variables

  • Modeling of a process with two possible outcomes, labeled 0 and 1

  • Random variable defined by the parameter $p$, $0 \leq p \leq 1$, which is the probability that the outcome is 1

  • The Bernoulli distribution $Ber(p)$ is: \begin{equation} f(x;p) = p^x(1-p)^{1-x}, \text{ } x = 0,1 \end{equation}

  • $E(X) = p$

  • $Var(X) = p(1-p)$

from scipy.stats import bernoulli

p = 0.3  # probability of success

print("Mean: ", bernoulli.mean(p))
print("Variance: ", bernoulli.var(p))
Mean: 0.3
Variance: 0.21

Definition of the Binomial Distribution

  • Consider an experiment consisting of $n$ independent Bernoulli trials $X_1, \cdots, X_n$, each with a constant probability $p$ of success

  • Then the total number of successes $X = \sum_{i=1}^n X_i$ is a random variable that follows a Binomial distribution with parameters $n$ (number of trials) and $p$: \begin{equation} X \sim B(n,p) \end{equation}

  • Probability mass function of a $B(n, p)$ random variable: \begin{equation} f(x;n,p) = \binom{n}{x}p^x(1-p)^{n-x}, \text{ } x = 0,1, \cdots, n \end{equation}

  • $E(X) = np$

  • $Var(X) = np(1-p)$
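The sum-of-Bernoullis construction above can be checked by simulation; a minimal sketch (the values of n and p and the replication count are example choices):

```python
# Sketch: a Binomial(n, p) count as a sum of n independent Bernoulli trials,
# compared against the theoretical moments n*p and n*p*(1-p).
import numpy as np
from scipy.stats import bernoulli

n, p = 10, 0.2  # example parameters

rng = np.random.default_rng(0)
# 100000 replications of n Bernoulli trials, summed across the trials
samples = bernoulli.rvs(p, size=(100_000, n), random_state=rng).sum(axis=1)

print("Empirical mean: ", samples.mean())     # close to n*p = 2.0
print("Empirical variance: ", samples.var())  # close to n*p*(1-p) = 1.6
```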

from scipy.stats import binom

# Parameters
n = 10   # number of trials
x = 7    # number of successes
p = 0.2  # probability of success

print("Mean: ", binom.mean(n, p))
print("Variance: ", binom.var(n, p))
print("Probability mass function: ", binom.pmf(x, n, p))
print("Cumulative distribution function: ", binom.cdf(x, n, p))
Mean: 2.0
Variance: 1.6
Probability mass function: 0.0007864320000000006
Cumulative distribution function: 0.9999220736

Proportion of Successes in Bernoulli Trials

  • Let $X \sim B(n,p)$ and define $Y = \frac{X}{n}$. Then:

  • $E(Y) = p$

  • $Var(Y) = \frac{p(1-p)}{n}$
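These two identities follow from scaling the binomial moments by $\frac{1}{n}$ and $\frac{1}{n^2}$; a quick sketch using scipy (n and p are example values):

```python
# Sketch: moments of the proportion Y = X/n derived from the binomial moments.
from scipy.stats import binom

n = 10   # number of trials (example value)
p = 0.2  # probability of success (example value)

mean_Y = binom.mean(n, p) / n    # E(Y) = E(X)/n = p
var_Y = binom.var(n, p) / n**2   # Var(Y) = Var(X)/n^2 = p(1-p)/n

print("E(Y): ", mean_Y)   # 0.2
print("Var(Y): ", var_Y)  # 0.016
```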

The Geometric and Negative Binomial Distributions

Definition of the Geometric Distribution

  • The number $X$ of trials up to and including the first success in a sequence of independent Bernoulli trials with a constant success probability $p$ has a geometric distribution with parameter $p$

  • Probability mass function: \begin{equation} P(X = x) = (1 - p)^{x-1}p, \text{ } x = 1,2, \cdots. \end{equation}

  • Cumulative distribution function: \begin{equation} P(X \leq x) = 1 - (1-p)^x \end{equation}

  • $E(X) = \frac{1}{p}$

  • $Var(X) = \frac{1-p}{p^2}$

from scipy.stats import geom

x = 5     # number of trials up to and including the first success
p = 0.23  # probability of success

print("Mean: ", geom.mean(p))
print("Variance: ", geom.var(p))
print("Probability mass function: ", geom.pmf(x, p))
print("Cumulative distribution function: ", geom.cdf(x, p))
Mean: 4.3478260869565215
Variance: 14.555765595463136
Probability mass function: 0.08085199430000001
Cumulative distribution function: 0.7293215843

Definition of the Negative Binomial Distribution

  • The number $X$ of trials up to and including the $r$th success in a sequence of independent Bernoulli trials with a constant success probability $p$ has a negative binomial distribution with parameters $p$ and $r$

  • Probability mass function: \begin{equation} P(X = x) = \binom{x-1}{r-1} (1-p)^{x-r}p^r \text{, } x = r, r+1, \cdots. \end{equation}

  • $E(X) = \frac{r}{p}$

  • $Var(X) = \frac{r(1-p)}{p^2}$

from scipy.stats import nbinom

x = 7     # number of trials up to and including the r-th success
r = 4     # number of successes
p = 0.55  # probability of success

# Note: scipy's nbinom models the number of *failures* before the r-th
# success, so pmf/cdf take x - r; its mean is r(1-p)/p, i.e. E(X) - r.
print("Mean: ", nbinom.mean(r, p))
print("Variance: ", nbinom.var(r, p))
print("Probability mass function: ", nbinom.pmf(x - r, r, p))
print("Cumulative distribution function: ", nbinom.cdf(x - r, r, p))
Mean: 3.2727272727272725
Variance: 5.950413223140496
Probability mass function: 0.1667701406250001
Cumulative distribution function: 0.6082877968750002

Hypergeometric Distribution

Definition of the Hypergeometric Distribution

  • Consider a collection of $N$ items of which $r$ are of a certain kind

  • Probability that a randomly chosen item is of the special kind: $p = \frac{r}{N}$

  • If $n$ items are chosen at random with replacement, then $X \sim B(n,p)$

  • Hypergeometric distribution: $n$ items chosen at random without replacement

  • Probability mass function: \begin{equation} f(x; N, n, r) = \frac{ \binom{r}{x} \binom{N-r}{n-x} }{ \binom{N}{n} }, \end{equation} \begin{equation} \max \{ 0, n-(N-r) \} \leq x \leq \min \{ n, r \} \end{equation}

  • $E(X) = n\frac{r}{N}$

  • $Var(X) = \frac{N-n}{N-1} n \frac{r}{N} (1 - \frac{r}{N})$

  • Comparison with $B(n,p)$ when $p = \frac{r}{N}$:

    • $E_B(X) = E_H(X) = np$

    • $\sigma_B^2(X) = npq \geq \sigma_H^2(X) = \frac{N-n}{N-1} npq$
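The variance inequality in this comparison can be verified numerically; a sketch with example values N = 15, r = 9, n = 5:

```python
# Sketch: binomial vs hypergeometric variance when p = r/N (example values).
from scipy.stats import binom, hypergeom

N, r, n = 15, 9, 5
p = r / N

var_b = binom.var(n, p)         # n*p*(1-p)
var_h = hypergeom.var(N, r, n)  # (N-n)/(N-1) * n*p*(1-p)

print("Binomial variance: ", var_b)
print("Hypergeometric variance: ", var_h)
# The finite-population correction (N-n)/(N-1) < 1 makes var_h <= var_b.
```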

from scipy.stats import hypergeom

x = 2   # number of rare elements picked
N = 15  # total number of elements
r = 9   # number of rare elements
n = 5   # number of elements picked

print("Mean: ", hypergeom.mean(N, r, n))
print("Variance: ", hypergeom.var(N, r, n))
print("Probability mass function: ", hypergeom.pmf(x, N, r, n))
print("Cumulative distribution function: ", hypergeom.cdf(x, N, r, n))
Mean: 3.0
Variance: 0.8571428571428571
Probability mass function: 0.23976023976023988
Cumulative distribution function: 0.2867132867132869

The Poisson Distribution

Definition of the Poisson Distribution

  • Describes the number of "events" occurring within certain specified boundaries of space and time

  • A random variable $X$ distributed as a Poisson random variable with parameter $\lambda$ is written as: \begin{equation} X \sim P(\lambda) \end{equation}

  • Probability mass function: \begin{equation} P(X = x) = \frac{ e^{- \lambda} \lambda^x }{x!}, \text{ } x = 0,1,2, \cdots. \end{equation}

  • $E(X) = Var(X) = \lambda$

from scipy.stats import poisson

# Parameters
x = 1           # number of events
Lambda = 2 / 3  # lambda parameter

print("Mean: ", poisson.mean(Lambda))
print("Variance: ", poisson.var(Lambda))
print("Probability mass function: ", poisson.pmf(x, Lambda))
print("Cumulative distribution function: ", poisson.cdf(x, Lambda))
Mean: 0.6666666666666666
Variance: 0.6666666666666666
Probability mass function: 0.3422780793550613
Cumulative distribution function: 0.8556951983876534

The Multinomial Distribution

Definition of the Multinomial Distribution

  • Consider a sequence of $n$ independent trials in which each individual trial can have $k$ outcomes, occurring with constant probabilities $p_1, p_2, \cdots , p_k$ with $p_1 + p_2 + \cdots + p_k = 1$

  • The random variables $X_1, X_2, \cdots , X_k$ with $\sum_{i=1}^k X_i = n$ that count the number of occurrences of the $k$ respective outcomes are said to have a multinomial distribution

  • Joint probability mass function of $X_1, X_2, \cdots , X_k$: \begin{equation} f(x_1, x_2, \cdots, x_k; p_1, \cdots , p_k , n) = \binom{n}{x_1, x_2, \cdots, x_k} p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k} \end{equation} with $\sum_{i=1}^k x_i = n$ and $\sum_{i=1}^k p_i = 1$

  • Also written as: \begin{equation} (X_1, \cdots , X_k) \sim M_k(p_1, \cdots, p_k, n) \end{equation}

  • $E(X_i) = np_i$

  • $Var(X_i) = np_i(1-p_i)$
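The marginal moments can be computed directly from these two formulas; a sketch with example values for n and p:

```python
# Sketch: marginal mean and variance of each count X_i in a multinomial model,
# computed elementwise from E(X_i) = n*p_i and Var(X_i) = n*p_i*(1-p_i).
import numpy as np

n = 15                              # number of trials (example value)
p = np.array([1/6, 1/6, 1/6, 3/6])  # outcome probabilities (example values)

mean_i = n * p           # E(X_i) = n * p_i
var_i = n * p * (1 - p)  # Var(X_i) = n * p_i * (1 - p_i)

print("E(X_i): ", mean_i)
print("Var(X_i): ", var_i)
```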

from scipy.stats import multinomial

# Parameters
x = [3, 3, 4, 5]          # counts of each outcome (must sum to n)
n = 15                    # number of trials
p = [1/6, 1/6, 1/6, 3/6]  # probabilities of the outcomes (must sum to 1)

print("Probability mass function: ", multinomial.pmf(x, n=n, p=p))
Probability mass function: 0.006518417007220723