GitHub Repository: Probability-Statistics-Jupyter-Notebook/probability-statistics-notebook
Path: blob/master/notebook-for-reviewing/chapter_8_inferences_on_a_population_mean.ipynb

Kernel: Python 3

In [2]:

import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.stats.weightstats as sms

from scipy import stats

Chapter 8 Inferences on a Population Mean

Confidence Intervals

Confidence Interval - a range of plausible values for an unknown parameter, computed from the sample.

Confidence Level $1-\alpha$ - the long-run proportion of such intervals, over repeated sampling, that contain the true parameter.

Two-sided t-interval

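Requirements:

Continuous data set, length = $n$
Sample mean $\bar{x}$
Sample standard deviation $s$
N.B. The true (population) standard deviation $\sigma$ is UNKNOWN.

The $1-\alpha$ confidence interval for the true mean $\mu$ is

$$\left( \bar{x} - \frac{t_{\alpha/2, n-1} s}{\sqrt{n}}, \bar{x} + \frac{t_{\alpha/2, n-1} s}{\sqrt{n}} \right)$$

where $t_{\alpha/2, n-1}$ is the upper $\alpha/2$ critical point of the t-distribution with $n-1$ degrees of freedom. The interval follows from the t-statistic:

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$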
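One way to see what the confidence level means is to simulate repeated sampling and count how often the two-sided t-interval covers the true mean. The sketch below is only an illustration: the population values `mu` and `sigma`, the seed, and the number of trials are assumptions chosen for the demo, not values from the chapter.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)    # fixed seed so the run is repeatable
mu, sigma = 50.0, 0.134           # hypothetical "true" population values
n, alpha, trials = 60, 0.1, 2000  # sample size, 1 - confidence level, repetitions

# Draw `trials` independent samples of size n, one sample per row
samples = rng.normal(mu, sigma, size=(trials, n))
x_bar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)   # ddof=1 gives the sample standard deviation

# Half-width of the two-sided t-interval for every sample
wing = stats.t.ppf(1 - alpha / 2, n - 1) * s / np.sqrt(n)

# Fraction of intervals that contain the true mean
coverage = np.mean((x_bar - wing <= mu) & (mu <= x_bar + wing))
print('Empirical coverage\t{:.3f}'.format(coverage))
```

With enough trials the empirical coverage should settle near $1-\alpha$, here 0.9.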

In [7]:

# Two-sided t-interval

# Input
n = 60
s = 0.134
x_bar = 49.9999
alpha = 0.1

# Calculate
t = stats.t.ppf(1 - alpha / 2, n - 1)
wing = t * s / math.sqrt(n)

# Output
print('T Critical Value\t{:.4f}'.format(t))
print('Interval Length\t\t{:.4f}'.format(2 * wing))
print('Confidence Interval\t({:.4f}, {:.4f})'.format(x_bar - wing, x_bar + wing))

Out[7]:

T Critical Value	1.6711
Interval Length		0.0578
Confidence Interval	(49.9710, 50.0288)
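The cell above starts from summary statistics. When the raw observations are available, the same interval can be computed directly from the data. The sketch below uses a small made-up sample (the `data` values are hypothetical) and cross-checks the manual formula against `scipy.stats.t.interval`, passing the confidence level positionally.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements, made up for illustration
data = np.array([49.91, 50.12, 49.87, 50.05, 50.21, 49.96, 50.03, 49.89])
n = len(data)
alpha = 0.1

x_bar = data.mean()
s = data.std(ddof=1)   # ddof=1 gives the sample standard deviation
se = s / np.sqrt(n)    # standard error of the mean

# Manual interval from the two-sided t-interval formula
t_crit = stats.t.ppf(1 - alpha / 2, n - 1)
manual = (x_bar - t_crit * se, x_bar + t_crit * se)

# Same interval from SciPy: confidence level, degrees of freedom, centre, scale
library = stats.t.interval(1 - alpha, n - 1, loc=x_bar, scale=se)

print('Manual\t({:.4f}, {:.4f})'.format(*manual))
print('SciPy\t({:.4f}, {:.4f})'.format(*library))
```

Both lines should print the same interval, since `stats.t.interval` returns the central region of the t-distribution shifted to `loc` and stretched by `scale`.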

One-sided t-interval

The one-sided $1-\alpha$ confidence intervals for the true mean $\mu$ are

$$\left( -\infty, \bar{x} + \frac{t_{\alpha, n-1} s}{\sqrt{n}} \right)$$

for an upper bound, and

$$\left( \bar{x} - \frac{t_{\alpha, n-1} s}{\sqrt{n}}, +\infty \right)$$

for a lower bound.

In [9]:

# One-sided t-interval

# Input
n = 60
s = 0.134
x_bar = 49.9999
alpha = 0.1

# Calculate
t = stats.t.ppf(1 - alpha, n - 1)
wing = t * s / math.sqrt(n)

# Output
print('T Critical Value\t{:.4f}'.format(t))
print('----- Upper Bound -----')
print('Confidence Interval\t(-inf, {:.4f})'.format(x_bar + wing))
print('----- Lower Bound -----')
print('Confidence Interval\t({:.4f}, +inf)'.format(x_bar - wing))

Out[9]:

T Critical Value	1.2961
----- Upper Bound -----
Confidence Interval	(-inf, 50.0223)
----- Lower Bound -----
Confidence Interval	(49.9775, +inf)

Two-sided z-interval

When the population standard deviation $\sigma$ is known, the $1-\alpha$ confidence interval for the true mean $\mu$ is

$$\left( \bar{x} - \frac{z_{\alpha/2} \sigma}{\sqrt{n}}, \bar{x} + \frac{z_{\alpha/2} \sigma}{\sqrt{n}} \right)$$

In [10]:

# Two-sided z-interval

# Input
n = 60
sigma = 0.134
x_bar = 49.9999
alpha = 0.1

# Calculate
z = stats.norm.ppf(1 - alpha / 2)
wing = z * sigma / math.sqrt(n)

# Output
print('Z Critical Value\t{:.4f}'.format(z))
print('Interval Length\t\t{:.4f}'.format(2 * wing))
print('Confidence Interval\t({:.4f}, {:.4f})'.format(x_bar - wing, x_bar + wing))

Out[10]:

Z Critical Value	1.6449
Interval Length		0.0569
Confidence Interval	(49.9714, 50.0284)
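The z-interval is slightly narrower than the t-interval for the same inputs because the t critical value always exceeds the z critical value, with the gap shrinking as $n$ grows. A small sketch of that comparison follows; the list of sample sizes is an arbitrary choice for illustration.

```python
from scipy import stats

alpha = 0.1
z_crit = stats.norm.ppf(1 - alpha / 2)

# t critical values for increasing (arbitrarily chosen) sample sizes
sizes = [5, 15, 60, 1000]
t_crits = [stats.t.ppf(1 - alpha / 2, n - 1) for n in sizes]

for n, t_crit in zip(sizes, t_crits):
    print('n = {:<5d} t = {:.4f}   z = {:.4f}'.format(n, t_crit, z_crit))
```

For small samples the difference is noticeable, which is why the t-interval is the safer default when $\sigma$ has to be estimated from the data.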

One-sided z-interval

In [11]:

# One-sided z-interval

# Input
n = 60
sigma = 0.134
x_bar = 49.9999
alpha = 0.1

# Calculate
# stats.norm takes no degrees of freedom; a second positional argument would be read as loc
z = stats.norm.ppf(1 - alpha)
wing = z * sigma / math.sqrt(n)

# Output
print('Z Critical Value\t{:.4f}'.format(z))
print('----- Upper Bound -----')
print('Confidence Interval\t(-inf, {:.4f})'.format(x_bar + wing))
print('----- Lower Bound -----')
print('Confidence Interval\t({:.4f}, +inf)'.format(x_bar - wing))

Out[11]:

Z Critical Value	1.2816
----- Upper Bound -----
Confidence Interval	(-inf, 50.0221)
----- Lower Bound -----
Confidence Interval	(49.9777, +inf)

Hypothesis

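Null Hypothesis $H_0$ - the statement under test, which fixes a value (or range of values) $\mu_0$ for the parameter.

Alternative Hypothesis $H_A$ - the complement of the null hypothesis.

Two-sided hypothesis:

$$H_0:\mu = \mu_0 \quad \text{versus} \quad H_A:\mu\ne\mu_0$$

One-sided hypothesis:

$$H_0:\mu \leq \mu_0 \quad \text{versus} \quad H_A:\mu > \mu_0$$

$$H_0:\mu \geq \mu_0 \quad \text{versus} \quad H_A:\mu < \mu_0$$

$p$-value - the probability, assuming the null hypothesis is true, of observing data at least as extreme as the data actually observed.

$p$-value $<$ significance level $\alpha$ - reject the null hypothesis
$p$-value $\geq$ significance level $\alpha$ - fail to reject ("accept") the null hypothesis
N.B. Failing to reject the null hypothesis does not prove that it is true.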
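The decision rule can be sketched on raw data as follows. The sample is made up for illustration (the `data` values are hypothetical), and the manual two-sided p-value is cross-checked against `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements, made up for illustration
data = np.array([49.91, 50.12, 49.87, 50.05, 50.21, 49.96, 50.03, 49.89])
mu_0 = 50
alpha = 0.1
n = len(data)

# Manual two-sided t-test
t_stat = (data.mean() - mu_0) / (data.std(ddof=1) / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_stat), n - 1)

# Same test from SciPy (two-sided by default)
result = stats.ttest_1samp(data, mu_0)

# Compare the p-value with the significance level
decision = 'reject H0' if p_manual < alpha else 'fail to reject H0'
print('t statistic\t{:.4f}'.format(t_stat))
print('P-Value\t\t{:.4f}'.format(p_manual))
print('Decision\t{}'.format(decision))
```

For this sample the mean is very close to `mu_0`, so the p-value is large and the null hypothesis is not rejected.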

In [14]:

# Two-sided t-test

# Input
n = 60
s = 0.1334
x_bar = 49.99856
mu_0 = 50

# Calculate
t = (x_bar - mu_0) / (s / math.sqrt(n))
p_value = 2 * stats.t.sf(abs(t), n - 1)

# Output
print('P-Value\t{:.4f}'.format(p_value))

Out[14]:

P-Value	0.9336

In [16]:

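# One-sided t-test (H_0: mu <= mu_0 versus H_A: mu > mu_0)

# Input
n = 60
s = 0.1334
x_bar = 49.99856
mu_0 = 50

# Calculate
t = (x_bar - mu_0) / (s / math.sqrt(n))
# Upper-tail probability; use stats.t.cdf(t, n - 1) for H_A: mu < mu_0
p_value = stats.t.sf(t, n - 1)

# Output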
print('P-Value\t{:.4f}'.format(p_value))

Out[16]:

P-Value	0.5332
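The direction of a one-sided test decides which tail of the t-distribution supplies the p-value. The following sketch, using the same inputs as the cells above, computes both one-sided p-values and shows how they relate to the two-sided one; the variable names `p_greater` and `p_less` are introduced here for illustration.

```python
import math
from scipy import stats

n, s, x_bar, mu_0 = 60, 0.1334, 49.99856, 50
t = (x_bar - mu_0) / (s / math.sqrt(n))

p_greater = stats.t.sf(t, n - 1)   # H_A: mu > mu_0 (upper tail)
p_less = stats.t.cdf(t, n - 1)     # H_A: mu < mu_0 (lower tail)
p_two_sided = 2 * min(p_greater, p_less)

print('P-Value (greater)\t{:.4f}'.format(p_greater))
print('P-Value (less)\t\t{:.4f}'.format(p_less))
print('P-Value (two-sided)\t{:.4f}'.format(p_two_sided))
```

The two one-sided p-values always sum to 1, and the two-sided p-value is twice the smaller of them.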

In [17]:

# Two-sided acceptance region for the sample mean

# Input
n = 60
s = 0.1334
x_bar = 49.99856
mu_0 = 50
alpha = 0.1

# Calculate
t = stats.t.ppf(1 - alpha / 2, n - 1)
wing = t * (s / math.sqrt(n))

# Output: H_0 is not rejected when x_bar falls inside mu_0 +/- wing
print('Acceptance Region\t({:.4f}, {:.4f})'.format(mu_0 - wing, mu_0 + wing))

Out[17]:

Acceptance Region	(49.9712, 50.0288)

In [18]:

# One-sided acceptance region for the sample mean (H_0: mu <= mu_0 versus H_A: mu > mu_0)

# Input
n = 60
s = 0.1334
x_bar = 49.99856
mu_0 = 50
alpha = 0.1

# Calculate
t = stats.t.ppf(1 - alpha, n - 1)
wing = t * (s / math.sqrt(n))

# Output: H_0 is not rejected when x_bar falls below mu_0 + wing
print('Acceptance Region\t(-inf, {:.4f})'.format(mu_0 + wing))

Out[18]:

Acceptance Region	(-inf, 50.0223)
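For a two-sided test at level $\alpha$, four checks are equivalent: the t-statistic lies within the critical values, the sample mean lies in the acceptance region centred on $\mu_0$, $\mu_0$ lies in the $1-\alpha$ confidence interval centred on $\bar{x}$, and the p-value is at least $\alpha$. A minimal sketch with the same inputs as above (the `by_*` names are introduced here for illustration):

```python
import math
from scipy import stats

n, s, x_bar, mu_0, alpha = 60, 0.1334, 49.99856, 50, 0.1
se = s / math.sqrt(n)
t_crit = stats.t.ppf(1 - alpha / 2, n - 1)

# 1. Test statistic inside [-t_crit, t_crit]
t_stat = (x_bar - mu_0) / se
by_statistic = abs(t_stat) <= t_crit

# 2. Sample mean inside the acceptance region centred on mu_0
by_region = mu_0 - t_crit * se <= x_bar <= mu_0 + t_crit * se

# 3. mu_0 inside the confidence interval centred on x_bar
by_interval = x_bar - t_crit * se <= mu_0 <= x_bar + t_crit * se

# 4. Two-sided p-value at least alpha
by_p_value = 2 * stats.t.sf(abs(t_stat), n - 1) >= alpha

print(by_statistic, by_region, by_interval, by_p_value)
```

All four flags agree, so any one of them can be used to decide whether to reject $H_0$.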

Z-test hypothesis

In [19]:

# Two-sided z-test

# Input
n = 60
sigma = 0.1334
x_bar = 49.99856
mu_0 = 50

# Calculate
z = (x_bar - mu_0) / (sigma / math.sqrt(n))
# stats.norm takes no degrees of freedom; a second positional argument would be read as loc
p_value = 2 * stats.norm.sf(abs(z))

# Output
print('P-Value\t{:.4f}'.format(p_value))

Out[19]:

P-Value	0.9334

In [20]:

# One-sided z-test (H_0: mu <= mu_0 versus H_A: mu > mu_0)

# Input
n = 60
sigma = 0.1334
x_bar = 49.99856
mu_0 = 50

# Calculate
z = (x_bar - mu_0) / (sigma / math.sqrt(n))
# Upper-tail probability; use stats.norm.cdf(z) for H_A: mu < mu_0
p_value = stats.norm.sf(z)

# Output
print('P-Value\t{:.4f}'.format(p_value))

Out[20]:

P-Value	0.5333
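The z-test variants differ only in which tail of the standard normal distribution is used, so they can be wrapped in one function. The helper below is a hypothetical convenience, not part of SciPy; its name `z_test` and its `alternative` argument are assumptions made for this sketch. Note that `stats.norm` has no degrees-of-freedom parameter: a second positional argument to `sf` or `cdf` is interpreted as `loc`, which shifts the distribution and yields meaningless p-values.

```python
import math
from scipy import stats

def z_test(x_bar, mu_0, sigma, n, alternative='two-sided'):
    """Hypothetical helper: z statistic and p-value when sigma is known."""
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    if alternative == 'two-sided':     # H_A: mu != mu_0
        p_value = 2 * stats.norm.sf(abs(z))
    elif alternative == 'greater':     # H_A: mu > mu_0
        p_value = stats.norm.sf(z)
    elif alternative == 'less':        # H_A: mu < mu_0
        p_value = stats.norm.cdf(z)
    else:
        raise ValueError('unknown alternative: {}'.format(alternative))
    return z, p_value

# Usage with the inputs from the cells above
for alt in ('two-sided', 'greater', 'less'):
    z, p_value = z_test(49.99856, 50, 0.1334, 60, alternative=alt)
    print('{:<10s} z = {:.4f}\tP-Value = {:.4f}'.format(alt, z, p_value))
```

A valid p-value always lies between 0 and 1, which makes a quick sanity check on any hand-written test.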