Chapter 10 - Discrete Data Analysis
Inferences on a Population Proportion
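If a proportion $p$ of a population has a given characteristic, then for a random sample of size $n$ we have:
With characteristic: cell probability $p$; cell frequency $x$
Without characteristic: cell probability $1-p$; cell frequency $n-x$
Given the sample proportion $\hat{p} = \frac{x}{n}$, where $X \sim B(n, p)$:
$E(\hat{p}) = p$ and $\mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}$
For large $n$ we have, approximately, $\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right)$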
Confidence Intervals for Population Proportions
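Two-sided $1-\alpha$ confidence interval for a population proportion, with $\hat{p} = x/n$: $p \in \left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$
One-sided with a lower bound: $p \in \left(\hat{p} - z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ 1\right)$
One-sided with an upper bound: $p \in \left(0,\ \hat{p} + z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$
The results are safe if both $x$ and $n-x$ are larger than 5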
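The normal-approximation intervals can be sketched in a few lines of standard-library Python. The function name `proportion_ci`, its `kind` argument, the clamping to $[0, 1]$ and the data (23 successes out of 100) are illustrative assumptions, not part of the original notes.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, alpha=0.05, kind="two-sided"):
    """Normal-approximation confidence interval for a population proportion.

    x: number of sampled items with the characteristic, n: sample size.
    kind: "two-sided", "lower" (lower bound only) or "upper" (upper bound only).
    """
    p_hat = x / n
    se = sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error of p_hat
    z = NormalDist().inv_cdf             # standard normal quantile function
    if kind == "two-sided":
        half = z(1 - alpha / 2) * se     # z_{alpha/2} * se
        return max(0.0, p_hat - half), min(1.0, p_hat + half)
    if kind == "lower":
        return max(0.0, p_hat - z(1 - alpha) * se), 1.0
    if kind == "upper":
        return 0.0, min(1.0, p_hat + z(1 - alpha) * se)
    raise ValueError("kind must be 'two-sided', 'lower' or 'upper'")


# Hypothetical data: 23 of 100 sampled items have the characteristic.
lo, hi = proportion_ci(23, 100)
print(f"95% two-sided CI: ({lo:.4f}, {hi:.4f})")
```

Here both $x = 23$ and $n - x = 77$ exceed 5, so the approximation is in its usual range of validity.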
Hypothesis Tests on a Population Proportion
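Two-sided tests: $H_0: p = p_0$ vs $H_A: p \neq p_0$
$p\text{-value} = 2 \times P(X \geq x)$ if $\hat{p} > p_0$, or $2 \times P(X \leq x)$ if $\hat{p} < p_0$, where $X \sim B(n, p_0)$
When $np_0$ and $n(1-p_0)$ are both larger than 5, a normal approximation may be used to compute the $p$-value:
$p\text{-value} = 2 \times \Phi(-|z|)$ where $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{x - np_0}{\sqrt{np_0(1-p_0)}}$
A continuity correction can be used for a better approximation to the $p$-value: replace the numerator $x - np_0$ by $x - np_0 - 0.5$ when $x - np_0 > 0.5$, and by $x - np_0 + 0.5$ when $x - np_0 < -0.5$.
A size $\alpha$ hypothesis test rejects $H_0$ when $|z| > z_{\alpha/2}$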
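To compare the exact binomial $p$-value with its normal approximation, here is a minimal sketch using only the standard library. The helper names, the handling of $|x - np_0| \leq 0.5$ (treated as $z = 0$) and the data ($x = 60$, $n = 100$, $p_0 = 0.5$) are assumptions made for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), by direct summation (fine for moderate n)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_two_sided_pvalue(x, n, p0):
    """Twice the binomial tail on the side where the sample proportion falls."""
    if x > n * p0:
        tail = 1 - binom_cdf(x - 1, n, p0)   # P(X >= x)
    elif x < n * p0:
        tail = binom_cdf(x, n, p0)           # P(X <= x)
    else:
        return 1.0
    return min(1.0, 2 * tail)


def normal_two_sided_pvalue(x, n, p0, continuity=True):
    """Return (z, p-value) from the normal approximation."""
    diff = x - n * p0
    if continuity:
        # Shrink |x - n*p0| by 0.5; using z = 0 when |diff| <= 0.5 is an assumption.
        diff = max(abs(diff) - 0.5, 0.0) * (1 if diff > 0 else -1)
    z = diff / sqrt(n * p0 * (1 - p0))
    return z, 2 * NormalDist().cdf(-abs(z))


# Hypothetical data: 60 successes in 100 trials, testing H0: p = 0.5.
print(exact_two_sided_pvalue(60, 100, 0.5))
print(normal_two_sided_pvalue(60, 100, 0.5))
print(normal_two_sided_pvalue(60, 100, 0.5, continuity=False))
```

With these numbers the corrected approximation lands noticeably closer to the exact value than the uncorrected one.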
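One-sided hypothesis tests for a population proportion
For testing $H_0: p \geq p_0$ vs $H_A: p < p_0$
By the normal approximation: $p\text{-value} = \Phi(z)$ where $z = \frac{x - np_0 + 0.5}{\sqrt{np_0(1-p_0)}}$
The $+0.5$ is the continuity correction. If we want to test the opposite, $H_0: p \leq p_0$ vs $H_A: p > p_0$, then we have $p\text{-value} = 1 - \Phi(z)$ where $z = \frac{x - np_0 - 0.5}{\sqrt{np_0(1-p_0)}}$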
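The two one-sided cases differ only in the sign of the continuity correction and in which tail is used. A small sketch, where the function name and the `alternative` labels `"less"` / `"greater"` are illustrative choices and the data are hypothetical:

```python
from math import sqrt
from statistics import NormalDist


def one_sided_pvalue(x, n, p0, alternative):
    """Normal-approximation p-value with continuity correction.

    alternative="less":    H0: p >= p0 vs HA: p < p0
    alternative="greater": H0: p <= p0 vs HA: p > p0
    """
    sd = sqrt(n * p0 * (1 - p0))
    phi = NormalDist().cdf
    if alternative == "less":
        z = (x - n * p0 + 0.5) / sd      # approximates P(X <= x)
        return phi(z)
    if alternative == "greater":
        z = (x - n * p0 - 0.5) / sd      # approximates P(X >= x)
        return 1 - phi(z)
    raise ValueError("alternative must be 'less' or 'greater'")


# Hypothetical data with n = 100 and p0 = 0.5.
print(one_sided_pvalue(40, 100, 0.5, "less"))
print(one_sided_pvalue(60, 100, 0.5, "greater"))
```

By symmetry of $B(100, 0.5)$ the two calls give the same $p$-value.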
Sample size calculations
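Considering a two-sided level $1-\alpha$ confidence interval of length $L = 2 z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, the required sample size is $n = \frac{4 z_{\alpha/2}^2\, \hat{p}(1-\hat{p})}{L^2}$
If $\hat{p}$ is not available, use the bound $\hat{p}(1-\hat{p}) \leq \frac{1}{4}$
Then, in the worst case $\hat{p} = \frac{1}{2}$, it suffices to take $n \geq \frac{z_{\alpha/2}^2}{L^2}$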
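As a quick numerical check of the sample size calculation, the sketch below assumes $L$ denotes the total length of the two-sided interval and falls back to the worst case $\hat{p}(1-\hat{p}) = 1/4$ when no estimate is supplied. The function name and the argument `p_guess` are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size(length, alpha=0.05, p_guess=None):
    """Smallest n giving a two-sided (1 - alpha) interval no longer than `length`."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Without a prior guess for p, use the bound p(1 - p) <= 1/4.
    pq = 0.25 if p_guess is None else p_guess * (1 - p_guess)
    return ceil(4 * z**2 * pq / length**2)


print(sample_size(0.1))                # worst case, no estimate of p
print(sample_size(0.1, p_guess=0.2))   # with a prior guess p = 0.2
```

A prior guess away from $1/2$ reduces the required sample size, since $p(1-p)$ is largest at $p = 1/2$.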
Comparing Two Population Proportions
Confidence Intervals for the Difference Between Two Population Proportions
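Assume $X \sim B(n, p_A)$ and $Y \sim B(m, p_B)$ and they are independent, with $\hat{p}_A = x/n$ and $\hat{p}_B = y/m$
$1-\alpha$ confidence intervals for $p_A - p_B$
Two-sided: $p_A - p_B \in \hat{p}_A - \hat{p}_B \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_A(1-\hat{p}_A)}{n} + \frac{\hat{p}_B(1-\hat{p}_B)}{m}}$
These approximations are reasonable as long as $x$, $n - x$, $y$, and $m - y$ are all larger than 5.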
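A minimal sketch of the two-sided interval for the difference, using the unpooled standard error. The function name and the counts (40 of 100 versus 30 of 100) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(x, n, y, m, alpha=0.05):
    """Two-sided normal-approximation CI for p_A - p_B from independent samples."""
    pa, pb = x / n, y / m
    se = sqrt(pa * (1 - pa) / n + pb * (1 - pb) / m)   # unpooled standard error
    half = NormalDist().inv_cdf(1 - alpha / 2) * se
    return pa - pb - half, pa - pb + half


# Hypothetical data: 40 of 100 in sample A, 30 of 100 in sample B.
lo, hi = diff_proportion_ci(40, 100, 30, 100)
print(f"95% CI for p_A - p_B: ({lo:.4f}, {hi:.4f})")
```

The interval contains 0 here, so these data alone do not separate the two proportions at the 5% level.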
Hypothesis Tests on the Difference Between Two Population Proportions
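For testing $H_0: p_A = p_B$ vs $H_A: p_A \neq p_B$
$p$-value: $2 \times \Phi(-|z|)$ where $z = \frac{\hat{p}_A - \hat{p}_B}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n} + \frac{1}{m}\right)}}$ and $\hat{p} = \frac{x+y}{n+m}$ is the pooled estimate
For $H_0: p_A \geq p_B$ vs $H_A: p_A < p_B$: $p\text{-value} = \Phi(z)$
For $H_0: p_A \leq p_B$ vs $H_A: p_A > p_B$: $p\text{-value} = 1 - \Phi(z)$
Reject $H_0$ if the $p$-value is smaller than the significance level $\alpha$, otherwise accept $H_0$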
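The test uses the pooled estimate in the standard error, unlike the confidence interval. A small sketch, with illustrative function and argument names and the same hypothetical counts (40 of 100 versus 30 of 100):

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(x, n, y, m, alternative="two-sided"):
    """Return (z, p-value) for comparing p_A and p_B with a pooled estimate.

    alternative: "two-sided" (HA: pA != pB), "less" (HA: pA < pB)
                 or "greater" (HA: pA > pB).
    """
    pa, pb = x / n, y / m
    pooled = (x + y) / (n + m)
    z = (pa - pb) / sqrt(pooled * (1 - pooled) * (1 / n + 1 / m))
    phi = NormalDist().cdf
    if alternative == "two-sided":
        return z, 2 * phi(-abs(z))
    if alternative == "less":
        return z, phi(z)
    if alternative == "greater":
        return z, 1 - phi(z)
    raise ValueError("unknown alternative")


# Hypothetical data: 40 of 100 in sample A, 30 of 100 in sample B.
z, p = two_proportion_test(40, 100, 30, 100)
print(f"z = {z:.4f}, two-sided p-value = {p:.4f}")
```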
Goodness of Fit Tests for One-Way Contingency Tables
One-Way Classifications
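Each of $n$ observations is classified into one of $k$ categories or cells
Cell frequencies: $x_1, x_2, \dots, x_k$, with $x_1 + \dots + x_k = n$
Cell probabilities: $p_1, p_2, \dots, p_k$, with $p_1 + \dots + p_k = 1$
Test $H_0: p_i = p_i^*$, $1 \leq i \leq k$, vs $H_A$: $H_0$ is false. Under $H_0$, the expected cell frequency at cell $i$, $e_i$, is given by $e_i = n p_i^*$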
There are two test statistics:
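Pearson's chi-square statistic: $X^2 = \sum_{i=1}^{k} \frac{(x_i - e_i)^2}{e_i}$
Likelihood ratio chi-square statistic: $G^2 = 2 \sum_{i=1}^{k} x_i \ln\left(\frac{x_i}{e_i}\right)$
Both of the statistics, $X^2$ and $G^2$, asymptotically follow $\chi^2_{k-1}$ under $H_0$
This asymptotic result is reasonable as long as all the $e_i$ are larger than 5. If not, it is appropriate to group cells so that each expected frequency is at least 5
$p$-value = $P(\chi^2_{k-1} \geq X^2)$ or $P(\chi^2_{k-1} \geq G^2)$
Conclusion: Reject $H_0$ if the $p$-value is smaller than the significance level $\alpha$. Otherwise, accept $H_0$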
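Both statistics and their $p$-values can be computed without external libraries. In the sketch below, `chi2_sf` is a hand-written helper (an assumption, not a library call) that evaluates the chi-square upper tail through the finite-sum closed forms valid for integer degrees of freedom. The die-roll counts are made up for illustration.

```python
from math import erfc, exp, log, pi, sqrt


def chi2_sf(x, df):
    """P(chi-square with integer df >= x), via closed-form finite sums."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return exp(-x / 2) * total
    # Odd df: 2*Q(chi) + 2*phi(chi) * sum_r chi^(2r-1) / (1*3*...*(2r-1))
    chi = sqrt(x)
    phi = exp(-x / 2) / sqrt(2 * pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return erfc(chi / sqrt(2)) + 2 * phi * total


def goodness_of_fit(observed, probs):
    """Return (X2, G2, df, p-value from X2, p-value from G2)."""
    n = sum(observed)
    expected = [n * p for p in probs]
    x2 = sum((x - e) ** 2 / e for x, e in zip(observed, expected))
    # Cells with x = 0 contribute nothing to G2.
    g2 = 2 * sum(x * log(x / e) for x, e in zip(observed, expected) if x > 0)
    df = len(observed) - 1
    return x2, g2, df, chi2_sf(x2, df), chi2_sf(g2, df)


# Hypothetical data: 60 rolls of a die, testing H0: all faces have probability 1/6.
counts = [8, 12, 9, 11, 6, 14]
x2, g2, df, p_x2, p_g2 = goodness_of_fit(counts, [1 / 6] * 6)
print(f"X2 = {x2:.3f}, G2 = {g2:.3f}, df = {df}, p-values = {p_x2:.3f}, {p_g2:.3f}")
```

All expected frequencies are 10 here, so the asymptotic approximation is appropriate, and the large $p$-values give no evidence against a fair die.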
Testing for Independence in Two-Way Contingency Tables
Two-Way Classifications
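Two-way contingency table

| | Level 1 | Level 2 | $\cdots$ | Level j | $\cdots$ | Level c | Row total |
|---|---|---|---|---|---|---|---|
| Level 1 | $x_{11}$ | $x_{12}$ | $\cdots$ | $x_{1j}$ | $\cdots$ | $x_{1c}$ | $x_{1\cdot}$ |
| Level 2 | $x_{21}$ | $x_{22}$ | $\cdots$ | $x_{2j}$ | $\cdots$ | $x_{2c}$ | $x_{2\cdot}$ |
| $\vdots$ | $\vdots$ | $\vdots$ | | $\vdots$ | | $\vdots$ | $\vdots$ |
| Level i | $x_{i1}$ | $x_{i2}$ | $\cdots$ | $x_{ij}$ | $\cdots$ | $x_{ic}$ | $x_{i\cdot}$ |
| $\vdots$ | $\vdots$ | $\vdots$ | | $\vdots$ | | $\vdots$ | $\vdots$ |
| Level r | $x_{r1}$ | $x_{r2}$ | $\cdots$ | $x_{rj}$ | $\cdots$ | $x_{rc}$ | $x_{r\cdot}$ |
| Column total | $x_{\cdot 1}$ | $x_{\cdot 2}$ | $\cdots$ | $x_{\cdot j}$ | $\cdots$ | $x_{\cdot c}$ | $n = x_{\cdot\cdot}$ |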
Testing for Independence
Testing for independence in a Two-way contingency table
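$H_0$: the two factors are independent vs $H_A$: they are not
We have the statistics: $X^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(x_{ij} - e_{ij})^2}{e_{ij}}$ and $G^2 = 2\sum_{i=1}^{r}\sum_{j=1}^{c} x_{ij} \ln\left(\frac{x_{ij}}{e_{ij}}\right)$
where $e_{ij} = \frac{x_{i\cdot}\, x_{\cdot j}}{n}$, with $x_{i\cdot}$ the total of row $i$ and $x_{\cdot j}$ the total of column $j$
These statistics asymptotically follow $\chi^2_{\nu}$ with $\nu = (r-1)(c-1)$
The results are valid as long as the $e_{ij}$ are larger than 5
$p$-value = $P(\chi^2_{\nu} \geq X^2)$ or $P(\chi^2_{\nu} \geq G^2)$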
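The independence test can be sketched the same way: expected counts come from the row and column totals, and the statistics are referred to a chi-square distribution with $(r-1)(c-1)$ degrees of freedom. The helper `chi2_sf` is again a hand-written closed-form tail for integer degrees of freedom (an assumption, not a library function), and the $2 \times 2$ table is hypothetical.

```python
from math import erfc, exp, log, pi, sqrt


def chi2_sf(x, df):
    """P(chi-square with integer df >= x), via closed-form finite sums."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return exp(-x / 2) * total
    chi = sqrt(x)
    phi = exp(-x / 2) / sqrt(2 * pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return erfc(chi / sqrt(2)) + 2 * phi * total


def independence_test(table):
    """Return (X2, G2, df, p-value from X2) for an r x c table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    # Expected count under independence: (row total * column total) / n
    expected = [[ri * cj / n for cj in col_tot] for ri in row_tot]
    cells = [(x, e) for xr, er in zip(table, expected) for x, e in zip(xr, er)]
    x2 = sum((x - e) ** 2 / e for x, e in cells)
    g2 = 2 * sum(x * log(x / e) for x, e in cells if x > 0)
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return x2, g2, df, chi2_sf(x2, df)


# Hypothetical 2 x 2 table of counts.
x2, g2, df, p = independence_test([[20, 30], [30, 20]])
print(f"X2 = {x2:.3f}, G2 = {g2:.3f}, df = {df}, p-value = {p:.4f}")
```

Every expected count is 25 in this example, so the validity condition on the $e_{ij}$ is met.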
Simpson’s paradox
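Suppose $P(A \mid B \cap C_i) > P(A \mid B^c \cap C_i)$ for $i = 1, \dots, k$, with $C_1, \dots, C_k$ a partition of the sample space
It is possible that $P(A \mid B) < P(A \mid B^c)$
This can happen, for example, if the sizes of the events $B \cap C_i$ and $B^c \cap C_i$ are very different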
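A small numerical illustration of the reversal, with made-up counts: option A has the higher success rate inside each group, yet the lower rate once the groups are pooled, because the group sizes are very unbalanced. The labels and numbers are hypothetical.

```python
from fractions import Fraction

# Hypothetical (successes, trials) per group and option.
groups = {
    "group 1": {"A": (8, 10), "B": (70, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}


def rate(successes, trials):
    """Exact success rate as a fraction."""
    return Fraction(successes, trials)


def pooled_rate(option):
    """Success rate for one option after merging all groups."""
    successes = sum(g[option][0] for g in groups.values())
    trials = sum(g[option][1] for g in groups.values())
    return Fraction(successes, trials)


# A beats B within every group...
a_wins_each_group = all(rate(*g["A"]) > rate(*g["B"]) for g in groups.values())
# ...but B beats A after pooling.
b_wins_overall = pooled_rate("A") < pooled_rate("B")

print(a_wins_each_group, b_wins_overall)
print(float(pooled_rate("A")), float(pooled_rate("B")))
```

Option A is mostly observed in the low-success group and option B in the high-success group, which is what drives the pooled comparison the other way.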