Chapter 11 - Analysis of Variance
One Factor Analysis of Variance
One Factor Layouts
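Suppose we experiment on $k$ populations with unknown population means $\mu_1, \ldots, \mu_k$
Observation $x_{ij}$ represents the $j$-th observation from the $i$-th population
The one factor analysis of variance methodology is appropriate for comparing three or more populations
The sample from population $i$ consists of $n_i$ observations $x_{i1}, \ldots, x_{in_i}$
If the sample sizes $n_1, \ldots, n_k$ are all equal, then the data set is balanced; otherwise it is called unbalanced
Total sample size is $n_T = n_1 + \cdots + n_k$
The total data set is called a one-way or one factor layout
The factor has $k$ levels corresponding to the $k$ populations
Completely randomized designs: experiment performed by randomly allocating a total of $n_T$ units among the $k$ populations
Modeling assumption: $x_{ij} = \mu_i + \epsilon_{ij}$
where the error terms $\epsilon_{ij}$ are independent $N(0, \sigma^2)$ random variables
Point estimates of the unknown population means: $\hat{\mu}_i = \bar{x}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}$
If we test $H_0: \mu_1 = \cdots = \mu_k$ vs $H_A$: not $H_0$ (that is, $\mu_i \neq \mu_j$ for some $i$ and $j$)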
Acceptance of the null hypothesis indicates that there is no evidence that any of the population means are unequal
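The layout described above can be sketched in a few lines of Python. This is only an illustration: the level names `A`, `B`, `C` and all the numbers are made up, and the variable names (`layout`, `mu_hat`, and so on) are assumptions rather than anything from the original notebook. The sketch computes the sample sizes, checks whether the layout is balanced, and forms the point estimates of the population means as the sample means at each level.

```python
from statistics import mean

# Hypothetical one factor layout (made-up numbers): factor level -> observations x_ij
layout = {
    "A": [10.0, 12.0, 11.0, 13.0],
    "B": [14.0, 15.0, 13.0, 16.0],
    "C": [9.0, 8.0, 10.0, 9.0],
}

k = len(layout)  # number of factor levels (populations)
sample_sizes = {level: len(obs) for level, obs in layout.items()}  # n_i
n_total = sum(sample_sizes.values())  # total sample size n_T
is_balanced = len(set(sample_sizes.values())) == 1  # True when all n_i are equal

# Point estimate of each population mean: the sample mean at that level
mu_hat = {level: mean(obs) for level, obs in layout.items()}

print(f"k = {k}, n_T = {n_total}, balanced = {is_balanced}")
for level, estimate in mu_hat.items():
    print(f"level {level}: n_i = {sample_sizes[level]}, mu_hat = {estimate:.2f}")
```

A dictionary keyed by factor level keeps unbalanced layouts easy to represent, since each level can hold a list of a different length.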
Partitioning the Total Sum of Squares
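The total sum of squares measures the variability of all observations about the overall mean $\bar{x}_{\cdot\cdot}$: $\text{SST} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{\cdot\cdot})^2$
It partitions as $\text{SST} = \text{SSTr} + \text{SSE}$, where $\text{SSTr} = \sum_{i=1}^{k} n_i(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2$ measures the variability between the factor level sample means $\bar{x}_{i\cdot}$ and $\text{SSE} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{i\cdot})^2$ measures the variability within the factor levels
We can call SSTr the "sum of squares for treatments", supposing we are analyzing the effects of treatments
$p$-value considerations: The plausibility of the null hypothesis that the factor level means are all equal depends upon the relative size of the sum of squares for treatments, SSTr, to the sum of squares for error, SSE: the larger SSTr is relative to SSE, the less plausible the null hypothesis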
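As a numerical check of the partition, the sketch below computes the three sums of squares on made-up data, assuming the usual definitions: SST sums squared deviations of every observation from the overall mean, SSTr sums the squared deviations of the level means from the overall mean weighted by the sample sizes, and SSE sums squared deviations of each observation from its own level mean. The data and variable names are illustrative assumptions.

```python
from math import isclose
from statistics import mean

# Hypothetical data (made-up numbers): factor level -> observations
layout = {
    "A": [10.0, 12.0, 11.0, 13.0],
    "B": [14.0, 15.0, 13.0, 16.0],
    "C": [9.0, 8.0, 10.0, 9.0],
}

all_obs = [x for obs in layout.values() for x in obs]
grand_mean = mean(all_obs)  # overall mean of all n_T observations
level_means = {level: mean(obs) for level, obs in layout.items()}

# Total variability about the overall mean
sst = sum((x - grand_mean) ** 2 for x in all_obs)
# Variability between the factor level means
sstr = sum(len(obs) * (level_means[level] - grand_mean) ** 2
           for level, obs in layout.items())
# Variability within the factor levels
sse = sum((x - level_means[level]) ** 2
          for level, obs in layout.items() for x in obs)

print(f"SST = {sst:.4f}, SSTr = {sstr:.4f}, SSE = {sse:.4f}")
print("SST == SSTr + SSE:", isclose(sst, sstr + sse))
```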
The Analysis of Variance (ANOVA) Table
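Mean square error (MSE): $\text{MSE} = \frac{\text{SSE}}{n_T - k}$, where $n_T$ is the total sample size and $k$ the number of factor levels; $E(\text{MSE}) = \sigma^2$
Mean squares for treatments (MSTr): $\text{MSTr} = \frac{\text{SSTr}}{k - 1}$
If the factor level means are all equal ($\mu_1 = \cdots = \mu_k$), then $E(\text{MSTr}) = \sigma^2$ and under $H_0$ we have: $F = \frac{\text{MSTr}}{\text{MSE}} \sim F_{k-1,\, n_T-k}$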
Analysis of variance table for one factor layout
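Source | d.f. | Sum of Squares | Mean Squares | F-statistic | P-value |
---|---|---|---|---|---|
Treatments | $k - 1$ | SSTr | $\text{MSTr} = \text{SSTr}/(k-1)$ | $F = \text{MSTr}/\text{MSE}$ | $P(F_{k-1,\,n_T-k} > F)$ |
Error | $n_T - k$ | SSE | $\text{MSE} = \text{SSE}/(n_T-k)$ | | |
Total | $n_T - 1$ | SST | | | |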
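The entries of the ANOVA table can be computed directly. The sketch below is one possible way to do it, assuming SciPy is available: the helper `one_factor_anova` is a hypothetical name introduced here, the data are made up, and the $p$-value is taken as the upper tail of the $F_{k-1,\,n_T-k}$ distribution via `scipy.stats.f.sf`. The result is cross-checked against `scipy.stats.f_oneway`.

```python
from math import isclose
from statistics import mean

from scipy.stats import f, f_oneway


def one_factor_anova(groups):
    """Return the ANOVA table entries for a list of samples (one per factor level)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_tr, df_err = k - 1, n_total - k
    mstr, mse = sstr / df_tr, sse / df_err
    f_stat = mstr / mse
    p_value = f.sf(f_stat, df_tr, df_err)  # P(F_{k-1, n_T-k} > observed F)
    return {"df_tr": df_tr, "df_err": df_err, "SSTr": sstr, "SSE": sse,
            "SST": sstr + sse, "MSTr": mstr, "MSE": mse,
            "F": f_stat, "p_value": p_value}


# Hypothetical data (made-up numbers), one list per factor level
groups = [
    [10.0, 12.0, 11.0, 13.0],
    [14.0, 15.0, 13.0, 16.0],
    [9.0, 8.0, 10.0, 9.0],
]

table = one_factor_anova(groups)
for name, value in table.items():
    print(f"{name:8s} {value:.6g}")

# Cross-check against SciPy's built-in one-way ANOVA
check = f_oneway(*groups)
print("matches scipy.stats.f_oneway:",
      isclose(table["F"], check.statistic) and isclose(table["p_value"], check.pvalue))
```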
Pairwise Comparisons of the Factor Level Means (T-Method: Tukey’s Multiple Comparisons Procedure)
When the null hypothesis is rejected, the experimenter can follow up the analysis with pairwise comparisons of the factor level means to discover which ones have been shown to be different and by how much
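With $k$ factor levels there are $k(k-1)/2$ pairwise differences $\mu_{i_1} - \mu_{i_2}$, $1 \leq i_1 < i_2 \leq k$
A set of $1 - \alpha$ confidence level simultaneous confidence intervals for the differences is $\mu_{i_1} - \mu_{i_2} \in \left(\bar{x}_{i_1\cdot} - \bar{x}_{i_2\cdot} - \frac{s\,q_{\alpha,k,\nu}}{\sqrt{2}}\sqrt{\frac{1}{n_{i_1}} + \frac{1}{n_{i_2}}},\ \bar{x}_{i_1\cdot} - \bar{x}_{i_2\cdot} + \frac{s\,q_{\alpha,k,\nu}}{\sqrt{2}}\sqrt{\frac{1}{n_{i_1}} + \frac{1}{n_{i_2}}}\right)$
where $s = \hat{\sigma} = \sqrt{\text{MSE}}$ and
$q_{\alpha,k,\nu}$ is a critical point that is the upper $\alpha$ point of the Studentized range distribution with parameter $k$ and $\nu = n_T - k$ degrees of freedom
Difference from individual $t$-intervals: each $t$-interval has an individual confidence level of $1 - \alpha$, whereas this set of simultaneous confidence intervals has an overall confidence level of $1 - \alpha$
If the confidence interval for the difference $\mu_{i_1} - \mu_{i_2}$ contains zero, then there is no evidence that the means at factor levels $i_1$ and $i_2$ are different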
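A minimal sketch of the pairwise comparison procedure follows, assuming SciPy 1.7 or later so that `scipy.stats.studentized_range` is available for the critical point. The function name `tukey_intervals` and the data are illustrative assumptions. Each interval is centred at the difference of the two sample means, with half-width $\frac{s\,q}{\sqrt{2}}\sqrt{1/n_{i_1} + 1/n_{i_2}}$, where $s = \sqrt{\text{MSE}}$ and $q$ is the upper $\alpha$ point of the Studentized range distribution with $k$ groups and $n_T - k$ degrees of freedom.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

from scipy.stats import studentized_range


def tukey_intervals(layout, alpha=0.05):
    """Simultaneous confidence intervals for all pairwise differences of level means."""
    k = len(layout)
    n_total = sum(len(obs) for obs in layout.values())
    nu = n_total - k  # error degrees of freedom
    means = {level: mean(obs) for level, obs in layout.items()}
    sse = sum((x - means[level]) ** 2 for level, obs in layout.items() for x in obs)
    s = sqrt(sse / nu)  # s = sqrt(MSE)
    q = studentized_range.ppf(1 - alpha, k, nu)  # upper alpha critical point

    intervals = {}
    for a, b in combinations(layout, 2):
        half_width = s * q / sqrt(2) * sqrt(1 / len(layout[a]) + 1 / len(layout[b]))
        diff = means[a] - means[b]
        intervals[(a, b)] = (diff - half_width, diff + half_width)
    return intervals


# Hypothetical data (made-up numbers): factor level -> observations
layout = {
    "A": [10.0, 12.0, 11.0, 13.0],
    "B": [14.0, 15.0, 13.0, 16.0],
    "C": [9.0, 8.0, 10.0, 9.0],
}

intervals = tukey_intervals(layout)
for (a, b), (low, high) in intervals.items():
    verdict = "no evidence of a difference" if low <= 0 <= high else "means differ"
    print(f"mu_{a} - mu_{b}: ({low:.3f}, {high:.3f}) -> {verdict}")
```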
Sample Size Determination
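The sensitivity of the analysis depends on the $k$ sample sizes $n_1, \ldots, n_k$
The power of the test increases as the sample sizes increase
An increase in the sample sizes leads to a decrease in the lengths of the pairwise confidence intervals
If the sample sizes are unequal, then the length of the confidence interval for $\mu_{i_1} - \mu_{i_2}$ is $L = \sqrt{2}\,s\,q_{\alpha,k,\nu}\sqrt{\frac{1}{n_{i_1}} + \frac{1}{n_{i_2}}}$, with $s = \sqrt{\text{MSE}}$
If they are equal, $n_i = n$, then the expression becomes $L = \frac{2\,s\,q_{\alpha,k,\nu}}{\sqrt{n}}$
If we want a maximum length of $L$ then we need to collect $n \geq \frac{4\,s^2\,q_{\alpha,k,\nu}^2}{L^2}$ observations at each factor level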
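The sample size calculation for a balanced layout can be sketched as below. Both function names are hypothetical, and the inputs `s = 1.2` and `q_crit = 3.5` are made-up planning values, not results from any data set. Note one simplifying assumption: the critical point is held fixed, although in practice it depends on the degrees of freedom and therefore on $n$, so the answer is best read as an approximation.

```python
from math import ceil, sqrt


def interval_length(s, q_crit, n):
    """Length of a pairwise confidence interval when every level has n observations."""
    return 2 * s * q_crit / sqrt(n)


def required_sample_size(s, q_crit, max_length):
    """Smallest n per level with interval length <= max_length (q_crit held fixed)."""
    return ceil(4 * s ** 2 * q_crit ** 2 / max_length ** 2)


# Hypothetical planning values: estimate of sigma, critical point, target length
s, q_crit, max_length = 1.2, 3.5, 2.0

n_needed = required_sample_size(s, q_crit, max_length)
print(f"observations needed per factor level: {n_needed}")
print(f"length with n = {n_needed}: {interval_length(s, q_crit, n_needed):.3f}")
print(f"length with n = {n_needed - 1}: {interval_length(s, q_crit, n_needed - 1):.3f}")
```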
Model Assumptions
The observations are independently and normally distributed with a common variance $\sigma^2$
The ANOVA is fairly robust to the distribution of data, so that it provides fairly accurate results as long as the distribution is not very far from a normal distribution
The equality of the variances for each of the k factor levels can be judged from a comparison of the sample variances or from a visual comparison of the lengths of boxplots of the observations at each factor level
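A rough numerical version of this check is sketched below on made-up data: it computes the sample variance at each factor level, the ratio of the largest to the smallest variance, and the interquartile range as a stand-in for the length of each box. The variable names are assumptions, and quartile conventions differ between tools, so the box lengths here (from `statistics.quantiles` with its default method) may not match a particular plotting library exactly.

```python
from statistics import quantiles, variance

# Hypothetical data (made-up numbers): factor level -> observations
layout = {
    "A": [10.0, 12.0, 11.0, 13.0],
    "B": [14.0, 15.0, 13.0, 16.0],
    "C": [9.0, 8.0, 10.0, 9.0],
}

# Sample variance s_i^2 at each factor level
sample_variances = {level: variance(obs) for level, obs in layout.items()}

# Interquartile range at each level, i.e. the length of the box in a boxplot
box_lengths = {}
for level, obs in layout.items():
    q1, _median, q3 = quantiles(obs, n=4)
    box_lengths[level] = q3 - q1

variance_ratio = max(sample_variances.values()) / min(sample_variances.values())

for level in layout:
    print(f"level {level}: variance = {sample_variances[level]:.3f}, "
          f"box length = {box_lengths[level]:.3f}")
print(f"largest / smallest sample variance = {variance_ratio:.2f}")
```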