GitHub Repository: Probability-Statistics-Jupyter-Notebook/probability-statistics-notebook
Path: blob/master/notebook-for-learning/Chapter-8-Inferences-on-a-Population-Mean.ipynb
Kernel: Python 3
'''
Import useful libraries here.
Run this cell first for convenience.
'''
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.stats.weightstats as sms
from scipy.stats import t
import math

Chapter 8 - Inferences on a Population Mean

Confidence Intervals

  • A confidence interval for an unknown parameter θ is an interval that contains a set of plausible values of the parameter

  • It is associated with a confidence level 1 - α, which measures the probability that the confidence interval actually contains the unknown parameter value

  • Inference methods on a population mean based upon the t-procedure are appropriate for large sample sizes n ≥ 30 and also for small sample sizes as long as the data can reasonably be taken to be approximately normally distributed

Two-sided t-Interval

  • A confidence interval with confidence level $1 - \alpha$ for a population mean $\mu$, based upon a sample of $n$ continuous data observations with sample mean $\bar{x}$ and sample standard deviation $S$, is $\left( \bar{x} - \frac{t_{\alpha/2, n-1} S}{\sqrt{n}}, \ \bar{x} + \frac{t_{\alpha/2, n-1} S}{\sqrt{n}} \right)$, where $T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$ (a short numerical sketch follows after this list)

  • The central limit theorem ensures that the distribution of the sample mean $\bar{X}$ is approximately normal for large sample sizes
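As a quick illustration, here is a minimal sketch of the two-sided t-interval, reusing the sample summary (n = 40, x̄ = 9.39, S = 1.041) from the one-sided example further below, with an assumed 95% confidence level:

"""Two-sided 95% t-interval: a sketch reusing the sample summary from the later example"""
n = 40
x_bar = 9.39
s = 1.041
alpha = 0.05                        # 1 - confidence level (assumed here)
crit = t.ppf(1 - alpha/2, n-1)      # critical point t_{alpha/2, n-1}
wing_span = crit*s/math.sqrt(n)     # half-width of the interval
print(("Two-sided 95% t-interval: ({:.4f}, {:.4f})").format(x_bar - wing_span, x_bar + wing_span))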

Interval length

$L = 2 \times \frac{t_{\alpha/2, n-1} S}{\sqrt{n}}$

If we want the interval length to be no larger than a prescribed value $L_0$, we need at least $n$ samples, where $n \geq 4 \times \frac{t_{\alpha/2, n-1}^2 S^2}{L_0^2}$
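Since $t_{\alpha/2, n-1}$ itself depends on $n$, the bound can be applied iteratively; a short sketch with hypothetical values for the planning standard deviation $S$ and the target length $L_0$:

"""Sample size for a target interval length: a sketch with hypothetical s and L_0"""
s = 1.041        # planning value for the sample standard deviation (assumed)
L_0 = 0.5        # desired maximum interval length (assumed)
alpha = 0.05
n = 2            # smallest n for which t_{alpha/2, n-1} is defined
while 2*t.ppf(1 - alpha/2, n-1)*s/math.sqrt(n) > L_0:
    n += 1       # increase n until the interval length drops to L_0 or below
print("Required sample size: n =", n)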

  • One-Sided t-Interval: One-sided confidence intervals with confidence level $1 - \alpha$ for a population mean $\mu$ are $\left( -\infty, \ \bar{x} + \frac{t_{\alpha, n-1} S}{\sqrt{n}} \right)$, which gives an upper bound on $\mu$, and $\left( \bar{x} - \frac{t_{\alpha, n-1} S}{\sqrt{n}}, \ \infty \right)$, which gives a lower bound on $\mu$

"""One-sided 99% t-interval (upper bound for μ)"""
n = 40
x_bar = 9.39
mu_0 = 10
s = 1.041
alpha = 0.01
t_stat = math.sqrt(n)*(x_bar - mu_0)/s
crit = t.ppf(1-alpha, n-1)
wing_span = crit*s/(math.sqrt(n))
print(("t-statistic t = {:.4f}").format(t_stat))
print(("Confidence interval with 99% confidence level: (-∞, {:.4f})").format(x_bar + wing_span))
t-statistic t = -3.7060
Confidence interval with 99% confidence level: (-∞, 9.7893)

Z-intervals

If we want to construct a confidence interval when the population standard deviation $\sigma$ is known, then we have $\left( \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right)$

"""Example: 𝑧_0.05""" d = stats.norm() print(("z = {:.4f}").format(d.ppf(0.95)))
z = 1.6449
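Building on the quantile above, a minimal sketch of a full two-sided z-interval (the values of n, x̄ and σ here are hypothetical, with σ treated as known):

"""Two-sided 95% z-interval: a sketch with hypothetical data and a known sigma"""
n = 40
x_bar = 9.39
sigma = 1.0                               # assumed known population standard deviation
alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha/2)      # z_{alpha/2}
wing_span = z_crit*sigma/math.sqrt(n)
print(("Two-sided 95% z-interval: ({:.4f}, {:.4f})").format(x_bar - wing_span, x_bar + wing_span))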

Hypothesis testing

  • A null hypothesis $H_0$ for a population mean $\mu$ is a statement that designates possible values for the population mean.

  • It is associated with an alternative hypothesis $H_A$, which is the “opposite” of the null hypothesis.

Two-sided set of hypotheses

  • $H_0: \mu = \mu_0$ versus $H_A: \mu \neq \mu_0$

One-sided set of hypotheses

Can be either:

  • $H_0: \mu \leq \mu_0$ versus $H_A: \mu > \mu_0$

    • or

  • $H_0: \mu \geq \mu_0$ versus $H_A: \mu < \mu_0$

Interpretation of $p$-values

Types of error

  • Type I error: An error committed by rejecting the null hypothesis when it is true.

  • Type II error: An error committed by accepting the null hypothesis when it is false.

Significance level

  • The significance level $\alpha$ is specified as an upper bound on the probability of a Type I error.

$p$-values of a test

  • The $p$-value of a test is the probability of obtaining the observed data set, or one even less consistent with $H_0$, when the null hypothesis is true. A data set can thus be used to measure the plausibility of the null hypothesis $H_0$ through the construction of a $p$-value.

  • The smaller the $p$-value, the less plausible the null hypothesis.

Rejection and acceptance of the Null Hypothesis

  • Rejection: if the $p$-value is smaller than the significance level, then $H_0$ is rejected

  • Acceptance: if the $p$-value is larger than the significance level, then $H_0$ is considered plausible

Calculation of $p$-values

The test statistic is $T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$, which follows a $t_{n-1}$ distribution when $\mu = \mu_0$

Two-sided t-test

If we test $H_0: \mu = \mu_0$ versus $H_A: \mu \neq \mu_0$, then

  • $p$-value $= 2 \times P(T \geq |t|)$

"""𝐻0:𝜇=385, 𝐻𝐴:𝜇≠385""" n = 33 x_bar = 382.97 mu_0 = 385 s = 3.81 alpha = 0.01 t_stat = math.sqrt(n)*(x_bar - mu_0)/s print(("t-statistics T = {:.4f}").format(t_stat)) print(("P-value = {:.4f}").format(2*t.sf(abs(t_stat), n-1)))
t-statistic T = -3.0608
P-value = 0.0044

One-sided t-test

If we test $H_0: \mu \leq \mu_0$ versus $H_A: \mu > \mu_0$, then

  • $p$-value $= P(T \geq t)$ (see the sketch after the example below)

If we test $H_0: \mu \geq \mu_0$ versus $H_A: \mu < \mu_0$, then

  • $p$-value $= P(T \leq t)$

"""𝐻0:𝜇≥10, 𝐻𝐴:𝜇<10""" n = 40 x_bar = 9.39 mu_0 = 10 s = 1.041 alpha = 0.01 t_stat = math.sqrt(n)*(x_bar - mu_0)/s print(("t-statistics T = {:.4f}").format(t_stat)) print(("P-value = {:.4f}").format(t.cdf(t_stat, n-1)))
t-statistic T = -3.7060
P-value = 0.0003
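For the opposite one-sided test, $H_0: \mu \leq \mu_0$ versus $H_A: \mu > \mu_0$, the $p$-value is the upper tail probability $P(T \geq t)$; a minimal sketch with the same hypothetical summary statistics:

"""H0: μ ≤ 10,  HA: μ > 10 — upper-tail p-value sketch (same hypothetical data as above)"""
n = 40
x_bar = 9.39
mu_0 = 10
s = 1.041
t_stat = math.sqrt(n)*(x_bar - mu_0)/s
print(("P-value = {:.4f}").format(t.sf(t_stat, n-1)))   # P(T >= t)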

Significance level of size $\alpha$

A hypothesis test with a significance level of size α

  • rejects the null hypothesis $H_0$ if a $p$-value smaller than $\alpha$ is obtained

  • accepts the null hypothesis $H_0$ if a $p$-value larger than $\alpha$ is obtained.

Two-sided problems

A size $\alpha$ test for $H_0: \mu = \mu_0$ versus $H_A: \mu \neq \mu_0$ rejects $H_0$ if the test statistic is in the rejection region $R = \{ t : |t| > t_{\alpha/2, n-1} \}$ and accepts $H_0$ if it is in the acceptance region $A = \{ t : |t| \leq t_{\alpha/2, n-1} \}$

One-sided problems

A size $\alpha$ test for $H_0: \mu \geq \mu_0$ versus $H_A: \mu < \mu_0$ rejects $H_0$ if the test statistic is in the rejection region $R = \{ t : t < -t_{\alpha, n-1} \}$ and accepts $H_0$ if it is in the acceptance region $A = \{ t : t \geq -t_{\alpha, n-1} \}$

"""t = 2.4428""" print(("Critical point when α/2 = 0.05 = {:.4f}").format(t.ppf(0.95, 19))) print(("Critical point when α/2 = 0.005 = {:.4f}").format(t.ppf(0.995, 19))) # So the null hypotesis is rejected at size 𝛼=0.10 and accepted for 𝛼=0.01 # To verify, we can use the p-value print(("p-value = {:.4f}").format(2*t.sf(2.4428, 19)))
Critical point when α/2 = 0.05: 1.7291
Critical point when α/2 = 0.005: 2.8609
p-value = 0.0245

z-tests

  • We test similarly to the $t$-test, but with data of sample size $n$ from $N(\mu, \sigma^2)$, assuming $\sigma$ is known

  • The $Z$-statistic is given by: $Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
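A minimal sketch of a two-sided z-test (the values of n, x̄, μ₀ and σ below are hypothetical, with σ treated as known):

"""Two-sided z-test sketch: H0: μ = 10 vs HA: μ ≠ 10, hypothetical data and known sigma"""
n = 40
x_bar = 9.39
mu_0 = 10
sigma = 1.0                                  # assumed known population standard deviation
z_stat = math.sqrt(n)*(x_bar - mu_0)/sigma
print(("z-statistic Z = {:.4f}").format(z_stat))
print(("P-value = {:.4f}").format(2*stats.norm.sf(abs(z_stat))))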

Power of a hypothesis test

  • Definition:

    • power $= 1 - P(\text{Type II error} \mid H_A)$

which is the probability that the null hypothesis is rejected when it is false.

Computation of the power of a hypothesis test

We want to test $H_0: \mu = \mu_0$ vs $H_A: \mu \neq \mu_0$ with significance level $\alpha$. We assume a sample of size $n$ from $N(\mu, \sigma^2)$

If $\mu = \mu^* > \mu_0$, then the power is $\beta(\mu^*) = 1 - P_{\mu = \mu^*}\left( |Z| \leq z_{\alpha/2} \right)$
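A sketch of this power calculation (all numeric values below are hypothetical); under $\mu = \mu^*$ the statistic $Z$ is normally distributed with mean $\delta = \sqrt{n}(\mu^* - \mu_0)/\sigma$ and unit variance:

"""Power of a two-sided z-test: a sketch with hypothetical mu_0, mu_star, sigma, n, alpha"""
mu_0 = 10
mu_star = 10.5     # assumed true mean under the alternative
sigma = 1.0        # assumed known population standard deviation
n = 40
alpha = 0.05
delta = math.sqrt(n)*(mu_star - mu_0)/sigma        # mean shift of Z when mu = mu_star
z_crit = stats.norm.ppf(1 - alpha/2)               # z_{alpha/2}
# P(|Z| <= z_{alpha/2}) when Z ~ N(delta, 1), i.e. the probability of accepting H0
accept_prob = stats.norm.cdf(z_crit - delta) - stats.norm.cdf(-z_crit - delta)
print(("Power β(μ*) = {:.4f}").format(1 - accept_prob))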

Determination of sample size in hypothesis testing

  • Find $n$ for which $\beta(\mu^*) = \beta^*$, with $\mu^* > \mu_0$

$\sqrt{n} \approx \sigma \frac{z_{\alpha/2} - z_{\beta^*}}{\mu^* - \mu_0}$

When $\mu^* \rightarrow \mu_0$, we need more samples, since smaller deviations from $\mu_0$ are harder to detect
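A sketch of this sample-size formula (the target power β* = 0.9 and all other values are hypothetical); with the upper-tail convention $z_p = \Phi^{-1}(1-p)$, the term $-z_{\beta^*}$ is positive whenever $\beta^* > 0.5$:

"""Sample size for a target power: a sketch with hypothetical mu_0, mu_star, sigma, alpha, beta_star"""
mu_0 = 10
mu_star = 10.5        # assumed true mean we want to detect
sigma = 1.0           # assumed known population standard deviation
alpha = 0.05
beta_star = 0.9       # desired power at mu = mu_star
z_alpha2 = stats.norm.ppf(1 - alpha/2)       # z_{alpha/2}
z_beta = stats.norm.ppf(1 - beta_star)       # z_{beta*} (negative when beta* > 0.5)
sqrt_n = sigma*(z_alpha2 - z_beta)/(mu_star - mu_0)
print("Required sample size: n ≈", math.ceil(sqrt_n**2))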