Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
Probability-Statistics-Jupy…
GitHub Repository: Probability-Statistics-Jupyter-Notebook/probability-statistics-notebook
Path: blob/master/notebook-for-learning/Chapter-2-Random-Variables.ipynb
388 views
Kernel: Python 3
''' Import here useful libraries Run this cell first for convenience ''' # Import here useful libraries import numpy as np from scipy import stats import scipy import warnings from sympy import symbols from sympy import integrate warnings.simplefilter('ignore', DeprecationWarning)

Chapter 2 - Random Variables

Discrete Random Variable

Definition of a Random Variable

  • Random variable XX: mapping from sample space SS to a real line RR

  • Numerical value X(w)X(w) mapped to each outcome ww of a particular experiment

Probability Mass Function

  • Probability Mass Function (p.m.f.): set of probability values pip_i assigned to each value taken by the discrete random variable xix_i

  • 0pi1 and ipi=1 0 \leq p_i \leq 1 \text{ and } \sum_i p_i = 1

  • Probability: P(X=xi)=piP(X = x_i) = p_i

Cumulative Distribution Function

  • Cumulative Distribution Function (CDF): F(x)=P(Xx)F(x) = P(X \leq x)

Continuous Random Variables

Probability Density Function

  • Probability Density Function (pdf): f(x)0\begin{equation} f(x) \geq 0 \end{equation} f(x)dx=1\begin{equation} \int_{-\infty}^{\infty} f(x) dx = 1 \end{equation}

# Verify if provided function is a probability density function # Parameters a = 2 b = 1 x_i = 49.5 # Integration from x_i to x_f x_f = 50.5 from scipy.integrate import quad def integrand(x, a, b): return 1.5 - 6*(x - 50)**2 I = quad(integrand, x_i, x_f, args=(a,b)) print("Result and error: ", I)
Result and error: (0.9999999999999989, 1.1102230246251553e-14)

Cumulative Distribution Function

  • Cumulative Distribution Function for continuous Random Variables:

    • F(x)=xf(y)dyF(x) = \int_{-\infty}^x f(y) dy

    • f(x)=dF(x)dxf(x) = \frac{dF(x)}{dx}

    • P(a<X<b)=F(b)F(a)P(a < X < b) = F(b) - F(a)

    • P(a<X<b)=P(aXb)=P(aX<b)=P(a<Xb)P(a < X < b) = P(a \leq X \leq b) = P(a \leq X < b) = P(a < X \leq b)

Expectation of a Random Variable

Expectations of Discrete Random Variables

  • Expectation of a discrete random variable XX with p.m.f. pp: E(X)=ipixi\begin{equation} E(X) = \sum_i p_i x_i \end{equation}

from scipy.stats import rv_discrete x = [10, 20, 30] p = [0.2, 0.3, 0.5] distribution = rv_discrete(values=(x, p)) print("Expected value: ", distribution.expect())
Expected value: 23.0

Expectation of a Continuous Random Variable

  • Expectation of a continuous random variable with p.d.f. f(x)f(x) E(X)=xf(x)dx\begin{equation} E(X) = \int_{- \infty}^{\infty} xf(x)dx \end{equation}

from scipy.stats import rv_continuous a = 49.5 # lower bound b = 50.5 # upper bound class distribution_gen(rv_continuous): def _pdf(self, x): return 1.5 - 6*(x - 50)**2 distribution = distribution_gen(name='Continuous distribution') print("Expected value: ", distribution.expect(lambda x: 1, lb=a, ub=b))
Expected value: 0.9999999999999989

Symmetric Random Variables

  • Symmetric Random Variables: if p(x)p(x) is symmetric around a point μ\mu so that: f(μ+x)=f(μx)\begin{equation} f(\mu + x) = f(\mu -x) \end{equation}

  • In this case, E(X)=μE(X) = \mu is the point of symmetry

Medians of Random Variables

  • Median: for a random variable XX its median is the value xx such that F(x)=0.5F(x) = 0.5

from scipy.stats import rv_continuous a = 49.5 # lower bound b = 50.5 # upper bound class distribution_gen(rv_continuous): def _pdf(self, x): return 1.5 - 6*(x - 50)**2 distribution = distribution_gen(a=a, b=b) print("Median: ", distribution.median())
Median: 50.0

Variance of a Random Variable

Definition and Interpretation of Variance

  • Variance (σ2 \sigma ^2): Var(X)=E(XE(X))2=E(X2)μ2Var(X) = E(X - E(X))^2 = E(X^2) - \mu ^2

  • Positive quantity measuring the spread of the distribution about its mean value

  • Standard Deviation(σ\sigma): Var(x)\sqrt{Var(x)}

from scipy.stats import rv_discrete x = [10, 20, 30] p = [0.2, 0.3, 0.5] distribution = rv_discrete(values=(x, p)) print("Variance: ", distribution.var()) print("Standard Deviation: ", distribution.std())
Variance: 61.0 Standard Deviation: 7.810249675906654
from scipy.stats import rv_continuous a = 49.5 # lower bound b = 50.5 # upper bound class distribution_gen(rv_continuous): def _pdf(self, x): return 1.5 - 6*(x - 50)**2 distribution = distribution_gen(a=a, b=b) print("Variance: ", distribution.var()) print("Standard Deviation: ", distribution.std())
Variance: 0.049999999999272404 Standard Deviation: 0.22360679774835202
## Why isn't the result correct? from scipy.stats import rv_continuous from math import exp a = 0 # lower bound b = 10 # upper bound class distribution_gen(rv_continuous): def _pdf(self, x): return (( (exp(10 -(x)) -1)/(exp(10)-11) * x)) distribution = distribution_gen(a=a, b=b) print("Variance: ", distribution.var()) print("Standard Deviation: ", distribution.std())
Variance: 2.0423852667674502 Standard Deviation: 1.4291204521549086
## Looks like it works well with polynomials from scipy.stats import rv_continuous from math import exp a = 5 # lower bound b = 6 # upper bound class distribution_gen(rv_continuous): def _pdf(self, x): return ( 2/11 * x) distribution = distribution_gen(a=a, b=b) print("Variance: ", distribution.var()) print("Standard Deviation: ", distribution.std())
Variance: 0.08310376492194038 Standard Deviation: 0.28827723621878365

Chebyshev's Inequality

  • Chebyshev's Inequality: if XX is a random variable with mean μ\mu and variance σ2\sigma ^2 the following holds: P(μcσXμ+cσ)11c2 for c1\begin{equation} P(\mu -c \sigma \leq X \leq \mu + c \sigma) \geq 1 - \frac{1}{c^2} \text{ for } c \geq 1 \end{equation}

Quantiles of Random Variables

  • Quantiles of Random Variables: pp-th quantile xx of a random variable XX is F(x)=p\begin{equation} F(x) = p \end{equation}

  • Upper quartile (Q3Q_3): 75th percentile of the distribution

  • Lower quartile (Q1Q_1): 25th percentile of the distribution

  • Interquantile range (IQR): distance between the two quartiles, Q3Q1Q_3 - Q_1

''' from scipy.stats import rv_continuous a = 49.5 # lower bound b = 50.5 # upper bound class distribution_gen(rv_continuous): def _pdf(self, x): return 1.5 - 6*(x - 50)**2 # Function to return distribution = distribution_gen(a=a, b=b) print("Variance: ", distribution.qu()) print("Standard Deviation: ", distribution.std()) '''

Jointly Distributed Random Variables

Joint Probability Distributions

  • Discrete: P(X=xi,Y=yj)=pij0 satisfying ijpij=1\begin{equation} P(X = x_i, Y = y_j) = p_{ij} \geq 0 \text{ satisfying } \sum_i \sum_j p_{ij} = 1 \end{equation}

  • Continuous: f(x,y)0 satisfying f(x,y)dxdy=1\begin{equation} f(x,y) \geq 0 \text{ satisfying } \int \int f(x,y) dxdy= 1 \end{equation}

  • Joint Cumulative Distribution Function: F(x,y)=P(Xxj,Yyj)\begin{equation} F(x,y) = P(X \leq x_j, Y \leq y_j) \end{equation}

    • Discrete: F(x,y)=i:xixj:yjypij\begin{equation} F(x,y) = \sum_{i:x_i \leq x} \sum_{j:y_j \leq y} p_{ij} \end{equation}

    • Continuous: F(x,y)=yxf(w,z)dwdz\begin{equation} F(x,y) = \int_{- \infty}^{ y} \int_{- \infty}^{ x} f(w, z) dwdz \end{equation}

Marginal Probability Distributions

  • Marginal probability distribution: obtained by summing or integrating the joint probability distribution over the values of the other random variable

    • Discrete: P(X=xi)=pi+=jpij\begin{equation} P(X = x_i) = p_{i+} = \sum_j p_{ij} \end{equation}

    • Continuous: fX(x)=f(x,y)dy\begin{equation} f_X(x) = \int_{- \infty}^{\infty} f(x,y)dy \end{equation}

Conditional Probability Distributions

  • Probability distribution describing the properties of a random variable XX given knowledge of YY

    • Discrete: fXY(xiyi)=P(X=xiY=yj)=pijp+j\begin{equation} f_{X \mid Y}(x_i \mid y_i) = P(X = x_i \mid Y = y_j) = \frac{p_{ij}}{p_{+j}} \end{equation}

    • Continuous: fXY(xy)=f(x,y)fY(y)\begin{equation} f_{X \mid Y}(x \mid y) = \frac{f(x,y)}{f_Y(y)} \end{equation}

Computation of E(g(X,Y))

  • Given g(x,y)g(x,y) function of xx and yy, we have that:

    • Discrete: E(g(X,Y))=x,yg(x,y)f(x,y)\begin{equation} E(g(X,Y)) = \sum_{x,y} g(x,y)f(x,y) \end{equation}

    • Continuous: E(g(X,Y))=g(x,y)f(x,y)dxdy\begin{equation} E(g(X,Y)) = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} g(x,y)f(x,y)dxdy \end{equation}

Independence and Covariance

  • Independence: when two random variables XX and YY satisfy: f(x,y)=fX(x)fY(y) for all x and y\begin{equation} f(x,y) = f_X(x)f_Y(y) \text{ for all } x \text{ and } y \end{equation}

  • Covariance: Cov(X,Y)=E(XE(X))(YE(Y))=E(XY)E(X)E(Y)Cov(X,Y) = E(X - E(X))(Y - E(Y)) = E(XY) -E(X)E(Y)

  • May take a positive or negative value

  • Independent random variables have a covariance of zero, but the contrary is not always true

  • Correlation (ρXY\rho_{XY}): Corr(X,Y)=Cov(X,Y)Var(X)Var(Y)\begin{equation} Corr(X,Y) = \frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}} \end{equation}

  • 1ρXY1-1 \leq \rho_{XY} \leq 1

  • The correlation is invariant to linear transformations of XX and YY

# Calculate the Covariance for discrete random variables # Input X and Y are in a table with corresponding probabilities value_x = np.array([1, 2, 3]) value_y = np.array([1, 2, 3, 4]) prob_matrix = np.array([[0.12, 0.08, 0.07, 0.05], [0.08, 0.15, 0.21, 0.13], [0.01, 0.01, 0.02, 0.07]]) # Covariance matrix # Expectation of x exp_x = 0 for i in range(len(value_x)): exp_x += value_x[i] * np.sum(prob_matrix, axis=1)[i] print("Expectation of x: ", exp_x) # Expectation of y exp_y = 0 for i in range(len(value_y)): exp_y += value_y[i] * np.sum(prob_matrix, axis=0)[i] print("Expectation of y: ", exp_y) # Variance of x exp_x2 = 0 for i in range(len(value_x)): exp_x2 += (value_x[i] ** 2) * np.sum(prob_matrix, axis=1)[i] var_x = exp_x2 - (exp_x ** 2) print("Variance of x: ", var_x) # Variance of y exp_y2 = 0 for i in range(len(value_y)): exp_y2 += (value_y[i] ** 2) * np.sum(prob_matrix, axis=0)[i] var_y = exp_y2 - (exp_y ** 2) print("Variance of y: ", var_y) # Covariance exp_xy = 0 for i in range(len(value_x)): for j in range(len(value_y)): exp_xy += value_x[i] * value_y[j] * prob_matrix[i, j] cov = exp_xy - exp_x * exp_y print("Covariance: ", cov) # Correlation corr = cov / ((var_x * var_y) ** (1/2)) print("Correlation: ", corr)
Expectation of x: 1.79 Expectation of y: 2.59 Variance of x: 0.3858999999999999 Variance of y: 1.161900000000001 Covariance: 0.22389999999999954 Correlation: 0.33437386749732556
# Covariance for continuous random variables x = symbols('x') y = symbols('y') # Inputs func = 8 * x * y - 2 * x * (y ** 2) domain_x = (0, 1) domain_y = (1, 2) # Expectation of x exp_x = float(integrate(x * integrate(func, (y, domain_y[0], domain_y[1])), (x, domain_x[0], domain_x[1]))) print("Expectation of x: ", exp_x) # Expectation of y exp_y = float(integrate(y * integrate(func, (x, domain_x[0], domain_x[1])), (y, domain_y[0], domain_y[1]), (y, domain_y[0], domain_y[1]))) print("Expectation of y: ", exp_y) # Variance of x exp_x2 = float(integrate(x * x * integrate(func, (y, domain_y[0], domain_y[1])), (x, domain_x[0], domain_x[1]))) var_x = exp_x2 - exp_x ** 2 print("Variance of x: ", var_x) # Variance of y exp_y2 = float(integrate(y * y * integrate(func, (x, domain_x[0], domain_x[1])), (y, domain_y[0], domain_y[1]))) var_y = exp_y2 - exp_y ** 2 print("Variance of y: ", var_y) # Covariance exp_xy = float(integrate(x * y * func, (x, domain_x[0], domain_x[1]), (y, domain_y[0], domain_y[1]))) cov = exp_xy - exp_x * exp_y print("Covariance: ", cov) # Correlation corr = cov/(var_x * var_y) ** (1/2) print("Correlation: ", corr)
Expectation of x: 2.4444444444444446 Expectation of y: 5.583333333333333 Variance of x: -4.141975308641976 Variance of y: -22.373611111111106 Covariance: -9.925925925925927 Correlation: -1.0310963144439693

Combinations and Functions of Random Variables

Linear Functions of Random Variables

  • Linear function of a random variable: given XX random variable and Y=aX+bY = aX + b with a,bRa, b \in \mathbb{R} then: E(Y)=aE(X)+b and Var(Y)=a2Var(X)\begin{equation} E(Y) = aE(X) + b \text{ and } Var(Y) = a^2Var(X) \end{equation}

  • Standardization: if XX has expectation μ\mu and variance σ2\sigma^2 then: Y=Xμσ\begin{equation} Y = \frac{X - \mu}{\sigma} \end{equation} has an expectation of zero and variance of one

  • Sums of Random variables: given two random variables X1X_1 and X2X_2 then E(X1+X2)=E(X1)+E(X2)\begin{equation} E(X_1 + X_2) = E(X_1) + E(X_2) \end{equation} and Var(X1+X2)=Var(X1)+Var(X2)+2Cov(X1,X2)\begin{equation} Var(X_1 + X_2) = Var(X_1) + Var(X_2) + 2Cov(X_1,X_2) \end{equation}

  • If X1X_1 and X2X_2 are independent, then: Var(X1+X2)=Var(X1)+Var(X2)\begin{equation} Var(X_1 + X_2) = Var(X_1) + Var(X_2) \end{equation}

Linear Combinations of Random Variables

  • Linear combinations of random variables: if X1,,XnX_1, \cdots, X_n is a sequence of random variables and a1,,ana_1, \cdots, a_n and bb are constants then: E(a1X1++anXn+b)=a1E(X1)++anE(Xn)+b\begin{equation} E(a_1X_1 + \cdots + a_n X_n + b) = a_1 E(X_1) + \cdots +a_n E(X_n) + b \end{equation}

  • If the random variables are independent: Var(a1X1++anXn+b)=a12Var(X1)++an2Var(Xn)\begin{equation} Var(a_1X_1 + \cdots + a_nX_n + b) = a_1^2Var(X_1) + \cdots + a_n^2 Var(X_n) \end{equation}

  • Averaging independent random variables: X1,,XnX_1, \cdots, X_n sequence of random variables with expectation μ\mu and variance σ2\sigma^2 we have: Xˉ=i=1nXin\bar{X} = \frac{\sum_{i=1}^n X_i}{n} then

    • E(Xˉ)=μE(\bar{X}) = \mu

    • Var(Xˉ)=σ2nVar(\bar{X}) = \frac{\sigma^2}{n}

Nonlinear Functions of a Random Variable

  • Nonlinear function of a random variable XX: another random variable Y=g(X)Y=g(X) for some nonlinear function g