GitHub Repository: Probability-Statistics-Jupyter-Notebook/probability-statistics-notebook
Path: blob/master/notebook-for-learning/Chapter-8-Inferences-on-a-Population-Mean.ipynb
Kernel: Python 3
'''
Import useful libraries here.
Run this cell first for convenience.
'''
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.stats.weightstats as sms
from scipy.stats import t
import math

Chapter 8 - Inferences on a Population Mean

Confidence Intervals

  • A confidence interval for an unknown parameter θ is an interval that contains a set of plausible values of the parameter

  • It is associated with a confidence level 1 - α, which measures the probability that the confidence interval actually contains the unknown parameter value

  • Inference methods on a population mean based upon the t-procedure are appropriate for large sample sizes n ≥ 30 and also for small sample sizes as long as the data can reasonably be taken to be approximately normally distributed

Two-sided t-Interval

  • A confidence interval with confidence level $1 - \alpha$ for a population mean $\mu$, based upon a sample of $n$ continuous data observations with sample mean $\bar{x}$ and sample standard deviation $S$, is $\left( \bar{x} - \frac{t_{\alpha/2, n-1} S}{\sqrt{n}}, \ \bar{x} + \frac{t_{\alpha/2, n-1} S}{\sqrt{n}} \right)$, where $T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$ (a short numerical sketch follows after this list)

  • The central limit theorem ensures that the distribution of the sample mean $\bar{X}$ is approximately normal for large sample sizes
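As a quick illustration, here is a minimal sketch of the two-sided t-interval, reusing the sample summary (n = 40, x̄ = 9.39, S = 1.041) from the one-sided example further below, with an assumed 95% confidence level:

"""Two-sided 95% t-interval: a sketch reusing the sample summary from the later example"""
n = 40
x_bar = 9.39
s = 1.041
alpha = 0.05                        # 1 - confidence level (assumed here)
crit = t.ppf(1 - alpha/2, n-1)      # critical point t_{alpha/2, n-1}
wing_span = crit*s/math.sqrt(n)     # half-width of the interval
print(("Two-sided 95% t-interval: ({:.4f}, {:.4f})").format(x_bar - wing_span, x_bar + wing_span))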

Interval length

$L = 2 \times \frac{t_{\alpha/2, n-1} S}{\sqrt{n}}$

If we want the interval length to be no larger than a prescribed value $L_0$, we need at least $n$ samples, where $n \geq 4 \times \frac{t_{\alpha/2, n-1}^2 S^2}{L_0^2}$
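Since $t_{\alpha/2, n-1}$ itself depends on $n$, the bound can be applied iteratively; a short sketch with hypothetical values for the planning standard deviation $S$ and the target length $L_0$:

"""Sample size for a target interval length: a sketch with hypothetical s and L_0"""
s = 1.041        # planning value for the sample standard deviation (assumed)
L_0 = 0.5        # desired maximum interval length (assumed)
alpha = 0.05
n = 2            # smallest n for which t_{alpha/2, n-1} is defined
while 2*t.ppf(1 - alpha/2, n-1)*s/math.sqrt(n) > L_0:
    n += 1       # increase n until the interval length drops to L_0 or below
print("Required sample size: n =", n)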

  • One-Sided t-Interval: One-sided confidence intervals with confidence level $1 - \alpha$ for a population mean $\mu$ are $\left( -\infty, \ \bar{x} + \frac{t_{\alpha, n-1} S}{\sqrt{n}} \right)$, which gives an upper bound on $\mu$, and $\left( \bar{x} - \frac{t_{\alpha, n-1} S}{\sqrt{n}}, \ \infty \right)$, which gives a lower bound on $\mu$

"""One-sided 99% t-interval (upper bound for μ)"""
n = 40
x_bar = 9.39
mu_0 = 10
s = 1.041
alpha = 0.01
t_stat = math.sqrt(n)*(x_bar - mu_0)/s
crit = t.ppf(1-alpha, n-1)
wing_span = crit*s/(math.sqrt(n))
print(("t-statistic t = {:.4f}").format(t_stat))
print(("Confidence interval with 99% confidence level: (-∞, {:.4f})").format(x_bar + wing_span))
t-statistic t = -3.7060
Confidence interval with 99% confidence level: (-∞, 9.7893)

Z-intervals

If we want to construct a confidence interval when the population standard deviation $\sigma$ is known, then we have $\left( \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right)$

"""Example: 𝑧_0.05""" d = stats.norm() print(("z = {:.4f}").format(d.ppf(0.95)))
z = 1.6449
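Building on the quantile above, a minimal sketch of a full two-sided z-interval (the values of n, x̄ and σ here are hypothetical, with σ treated as known):

"""Two-sided 95% z-interval: a sketch with hypothetical data and a known sigma"""
n = 40
x_bar = 9.39
sigma = 1.0                               # assumed known population standard deviation
alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha/2)      # z_{alpha/2}
wing_span = z_crit*sigma/math.sqrt(n)
print(("Two-sided 95% z-interval: ({:.4f}, {:.4f})").format(x_bar - wing_span, x_bar + wing_span))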

Hypothesis testing

  • A null hypothesis $H_0$ for a population mean $\mu$ is a statement that designates possible values for the population mean.

  • It is associated with an alternative hypothesis $H_A$, which is the “opposite” of the null hypothesis.

Two-sided set of hypotheses

  • $H_0: \mu = \mu_0$ versus $H_A: \mu \neq \mu_0$

One-sided set of hypotheses

Can be either:

  • $H_0: \mu \leq \mu_0$ versus $H_A: \mu > \mu_0$

    • or

  • $H_0: \mu \geq \mu_0$ versus $H_A: \mu < \mu_0$

Interpretation of $p$-values

Types of error

  • Type I error: An error committed by rejecting the null hypothesis when it is true.

  • Type II error: An error committed by accepting the null hypothesis when it is false.

Significance level

  • The significance level $\alpha$ is specified as an upper bound on the probability of a Type I error.

$p$-values of a test

  • The $p$-value of a test is the probability of obtaining the observed data set, or one even less consistent with $H_0$, when the null hypothesis is true. A data set can thus be used to measure the plausibility of the null hypothesis $H_0$ through the construction of a $p$-value.

  • The smaller the $p$-value, the less plausible the null hypothesis.

Rejection and acceptance of the Null Hypothesis

  • Rejection: if the $p$-value is smaller than the significance level, then $H_0$ is rejected

  • Acceptance: if the $p$-value is larger than the significance level, then $H_0$ is considered plausible

Calculation of $p$-values

The test statistic is $T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$, which follows a $t_{n-1}$ distribution when $\mu = \mu_0$

Two-sided t-test

If we test $H_0: \mu = \mu_0$ versus $H_A: \mu \neq \mu_0$, then

  • $p$-value $= 2 \times P(T \geq |t|)$

"""𝐻0:𝜇=385, 𝐻𝐴:𝜇≠385""" n = 33 x_bar = 382.97 mu_0 = 385 s = 3.81 alpha = 0.01 t_stat = math.sqrt(n)*(x_bar - mu_0)/s print(("t-statistics T = {:.4f}").format(t_stat)) print(("P-value = {:.4f}").format(2*t.sf(abs(t_stat), n-1)))
t-statistic T = -3.0608
P-value = 0.0044

One-sided t-test

If we test $H_0: \mu \leq \mu_0$ versus $H_A: \mu > \mu_0$, then

  • $p$-value $= P(T \geq t)$ (see the sketch after the example below)

If we test $H_0: \mu \geq \mu_0$ versus $H_A: \mu < \mu_0$, then

  • $p$-value $= P(T \leq t)$

"""𝐻0:𝜇≥10, 𝐻𝐴:𝜇<10""" n = 40 x_bar = 9.39 mu_0 = 10 s = 1.041 alpha = 0.01 t_stat = math.sqrt(n)*(x_bar - mu_0)/s print(("t-statistics T = {:.4f}").format(t_stat)) print(("P-value = {:.4f}").format(t.cdf(t_stat, n-1)))
t-statistic T = -3.7060
P-value = 0.0003
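For the opposite one-sided test, $H_0: \mu \leq \mu_0$ versus $H_A: \mu > \mu_0$, the $p$-value is the upper tail probability $P(T \geq t)$; a minimal sketch with the same hypothetical summary statistics:

"""H0: μ ≤ 10,  HA: μ > 10 — upper-tail p-value sketch (same hypothetical data as above)"""
n = 40
x_bar = 9.39
mu_0 = 10
s = 1.041
t_stat = math.sqrt(n)*(x_bar - mu_0)/s
print(("P-value = {:.4f}").format(t.sf(t_stat, n-1)))   # P(T >= t)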

Significance level of size $\alpha$

A hypothesis test with a significance level of size α

  • rejects the null hypothesis $H_0$ if a $p$-value smaller than $\alpha$ is obtained

  • accepts the null hypothesis $H_0$ if a $p$-value larger than $\alpha$ is obtained.

Two-sided problems

A size $\alpha$ test for $H_0: \mu = \mu_0$ versus $H_A: \mu \neq \mu_0$ rejects $H_0$ if the test statistic is in the rejection region $R = \{ t : |t| > t_{\alpha/2, n-1} \}$ and accepts $H_0$ if it is in the acceptance region $A = \{ t : |t| \leq t_{\alpha/2, n-1} \}$

One-sided problems

A size $\alpha$ test for $H_0: \mu \geq \mu_0$ versus $H_A: \mu < \mu_0$ rejects $H_0$ if the test statistic is in the rejection region $R = \{ t : t < -t_{\alpha, n-1} \}$ and accepts $H_0$ if it is in the acceptance region $A = \{ t : t \geq -t_{\alpha, n-1} \}$

"""t = 2.4428""" print(("Critical point when α/2 = 0.05 = {:.4f}").format(t.ppf(0.95, 19))) print(("Critical point when α/2 = 0.005 = {:.4f}").format(t.ppf(0.995, 19))) # So the null hypotesis is rejected at size 𝛼=0.10 and accepted for 𝛼=0.01 # To verify, we can use the p-value print(("p-value = {:.4f}").format(2*t.sf(2.4428, 19)))
Critical point when α/2 = 0.05: 1.7291
Critical point when α/2 = 0.005: 2.8609
p-value = 0.0245

z-tests

  • We test similarly to the $t$-test, but with data of sample size $n$ from $N(\mu, \sigma^2)$, assuming $\sigma$ is known

  • The $Z$-statistic is given by: $Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
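A minimal sketch of a two-sided z-test (the values of n, x̄, μ₀ and σ below are hypothetical, with σ treated as known):

"""Two-sided z-test sketch: H0: μ = 10 vs HA: μ ≠ 10, hypothetical data and known sigma"""
n = 40
x_bar = 9.39
mu_0 = 10
sigma = 1.0                                  # assumed known population standard deviation
z_stat = math.sqrt(n)*(x_bar - mu_0)/sigma
print(("z-statistic Z = {:.4f}").format(z_stat))
print(("P-value = {:.4f}").format(2*stats.norm.sf(abs(z_stat))))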

Power of a hypothesis test

  • Definition:

    • power $= 1 - P(\text{Type II error} \mid H_A)$

which is the probability that the null hypothesis is rejected when it is false.

Computation of the power of a hypothesis test

We want to test $H_0: \mu = \mu_0$ vs $H_A: \mu \neq \mu_0$ with significance level $\alpha$. We assume a sample of size $n$ from $N(\mu, \sigma^2)$

If $\mu = \mu^* > \mu_0$, then the power is $\beta(\mu^*) = 1 - P_{\mu = \mu^*}\left( |Z| \leq z_{\alpha/2} \right)$
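A sketch of this power calculation (all numeric values below are hypothetical); under $\mu = \mu^*$ the statistic $Z$ is normally distributed with mean $\delta = \sqrt{n}(\mu^* - \mu_0)/\sigma$ and unit variance:

"""Power of a two-sided z-test: a sketch with hypothetical mu_0, mu_star, sigma, n, alpha"""
mu_0 = 10
mu_star = 10.5     # assumed true mean under the alternative
sigma = 1.0        # assumed known population standard deviation
n = 40
alpha = 0.05
delta = math.sqrt(n)*(mu_star - mu_0)/sigma        # mean shift of Z when mu = mu_star
z_crit = stats.norm.ppf(1 - alpha/2)               # z_{alpha/2}
# P(|Z| <= z_{alpha/2}) when Z ~ N(delta, 1), i.e. the probability of accepting H0
accept_prob = stats.norm.cdf(z_crit - delta) - stats.norm.cdf(-z_crit - delta)
print(("Power β(μ*) = {:.4f}").format(1 - accept_prob))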

Determination of sample size in hypothesis testing

  • Find $n$ for which $\beta(\mu^*) = \beta^*$, with $\mu^* > \mu_0$

$\sqrt{n} \approx \sigma \frac{z_{\alpha/2} - z_{\beta^*}}{\mu^* - \mu_0}$

When $\mu^* \rightarrow \mu_0$, we need more samples, since smaller deviations from $\mu_0$ are harder to detect
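A sketch of this sample-size formula (the target power β* = 0.9 and all other values are hypothetical); with the upper-tail convention $z_p = \Phi^{-1}(1-p)$, the term $-z_{\beta^*}$ is positive whenever $\beta^* > 0.5$:

"""Sample size for a target power: a sketch with hypothetical mu_0, mu_star, sigma, alpha, beta_star"""
mu_0 = 10
mu_star = 10.5        # assumed true mean we want to detect
sigma = 1.0           # assumed known population standard deviation
alpha = 0.05
beta_star = 0.9       # desired power at mu = mu_star
z_alpha2 = stats.norm.ppf(1 - alpha/2)       # z_{alpha/2}
z_beta = stats.norm.ppf(1 - beta_star)       # z_{beta*} (negative when beta* > 0.5)
sqrt_n = sigma*(z_alpha2 - z_beta)/(mu_star - mu_0)
print("Required sample size: n ≈", math.ceil(sqrt_n**2))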