Path: blob/master/notebook-for-learning/Chapter12-Simple-Linear-Regression-and-Correlation.ipynb
Chapter 12 - Simple Linear Regression and Correlation
The Simple Linear Regression Model
Model Definition and Assumptions
The model can be defined as:

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$

The observed $y_i$ is composed of a linear function of the independent variable $x_i$ and an error term $\epsilon_i$.

The error terms $\epsilon_i$ are generally taken from a $N(0, \sigma^2)$ distribution for some error variance $\sigma^2$.

This means that the $y_i$ are observations from the independent random variables

$$Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$$

where

$\beta_0$: intercept parameter

$\beta_1$: slope parameter

$\beta_0$, $\beta_1$, and $\sigma^2$ can be estimated from the data set

The smaller the error variance $\sigma^2$, the closer the data points $(x_i, y_i)$ are to the line
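As a minimal sketch, we can simulate data from this model (the true parameter values `beta0`, `beta1`, `sigma` below are assumptions chosen purely for illustration):

```python
import numpy as np

# Simulate n observations from y_i = beta0 + beta1 * x_i + eps_i,
# with eps_i ~ N(0, sigma^2); parameter values are hypothetical.
rng = np.random.default_rng(0)
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 50
x = np.linspace(0.0, 10.0, n)
y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
```

Re-running with a smaller `sigma` makes the simulated points hug the line $y = \beta_0 + \beta_1 x$ more tightly, which is the point made above.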
Fitting the Regression Line
Parameter Estimation
Let

$$Q = \sum_{i=1}^n \left(y_i - (\beta_0 + \beta_1 x_i)\right)^2$$

We want to find the values $\hat{\beta}_0, \hat{\beta}_1$, called least squares estimators, minimizing $Q$.

We have to solve

$$\frac{\partial Q}{\partial \beta_0} = 0, \qquad \frac{\partial Q}{\partial \beta_1} = 0$$

by which we get

$$\hat{\beta}_1 = \frac{S_{XY}}{S_{XX}}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

where

$$S_{XX} = \sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n x_i^2 - n\bar{x}^2, \qquad S_{XY} = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^n x_i y_i - n\bar{x}\bar{y}$$

The regression (fitted) line is:

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

Sum of squared errors (SSE):

$$\text{SSE} = \sum_{i=1}^n (y_i - \hat{y}_i)^2$$
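The estimation formulas above can be sketched directly in numpy (the small data set below is hypothetical, chosen only to exercise the formulas):

```python
import numpy as np

# Hypothetical data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])
n = len(x)

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)           # S_XX
Sxy = np.sum((x - xbar) * (y - ybar))   # S_XY

b1 = Sxy / Sxx           # least squares slope estimate
b0 = ybar - b1 * xbar    # least squares intercept estimate

yhat = b0 + b1 * x                  # fitted values on the regression line
SSE = np.sum((y - yhat) ** 2)       # sum of squared errors
sigma2_hat = SSE / (n - 2)          # estimate of the error variance
```

A quick sanity check is that `b1` and `b0` agree with `np.polyfit(x, y, 1)`, which fits the same least squares line.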
Inferences on the Slope Parameter
Inference Procedures
Let $\hat{\sigma}^2 = \dfrac{\text{SSE}}{n-2}$.

Under the assumption of the linear regression model we have:

$$\hat{\beta}_1 \sim N\!\left(\beta_1, \frac{\sigma^2}{S_{XX}}\right)$$

We can build the statistic as:

$$T = \frac{\hat{\beta}_1 - \beta_1}{\mathrm{s.e.}(\hat{\beta}_1)}$$

Then $T \sim t_{n-2}$.

A $(1-\alpha)100$% confidence interval for $\beta_1$:

$$\beta_1 \in \left(\hat{\beta}_1 - t_{\alpha/2, n-2}\,\mathrm{s.e.}(\hat{\beta}_1),\; \hat{\beta}_1 + t_{\alpha/2, n-2}\,\mathrm{s.e.}(\hat{\beta}_1)\right)$$

where the standard error is

$$\mathrm{s.e.}(\hat{\beta}_1) = \frac{\hat{\sigma}}{\sqrt{S_{XX}}}$$
p-values

We test $H_0: \beta_1 = b_1$ against an alternative $H_A$ with the T statistic

$$t = \frac{\hat{\beta}_1 - b_1}{\mathrm{s.e.}(\hat{\beta}_1)}$$

Two-sided test $H_0: \beta_1 = b_1$ vs $H_A: \beta_1 \neq b_1$

p-value $= 2 \times P(T_{n-2} \geq |t|)$

One-sided test $H_0: \beta_1 \leq b_1$ vs $H_A: \beta_1 > b_1$

p-value $= P(T_{n-2} \geq t)$

A size $\alpha$ test rejects $H_0$ if $t > t_{\alpha, n-2}$

One-sided test $H_0: \beta_1 \geq b_1$ vs $H_A: \beta_1 < b_1$

p-value $= P(T_{n-2} \leq t)$

A size $\alpha$ test rejects $H_0$ if $t < -t_{\alpha, n-2}$
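These inference procedures for the slope can be sketched as follows (the data are hypothetical; the null value tested is $b_1 = 0$, the most common case):

```python
import numpy as np
from scipy import stats

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.8, 3.1, 3.9, 5.2, 5.8, 7.1])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
SSE = np.sum((y - (b0 + b1 * x)) ** 2)

se_b1 = np.sqrt(SSE / (n - 2)) / np.sqrt(Sxx)   # s.e.(beta1_hat)

# Two-sided test of H0: beta1 = 0
t_stat = b1 / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)

# 95% confidence interval for beta1
alpha = 0.05
tcrit = stats.t.ppf(1 - alpha / 2, df=n - 2)
ci = (b1 - tcrit * se_b1, b1 + tcrit * se_b1)
```

The same slope, standard error, and two-sided p-value are returned by `scipy.stats.linregress`, which is a convenient cross-check.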
For a fixed value of the error variance $\sigma^2$, the variance of the slope parameter estimate decreases as the value of $S_{XX}$ increases. In other words, the more the values of $x_i$ are spread out, the more "leverage" they have for fitting the regression line.
Inferences on the Regression Line
Inference Procedures
Inferences on the mean response at $x^*$, $\beta_0 + \beta_1 x^*$

The T-statistic is as follows:

$$T = \frac{(\hat{\beta}_0 + \hat{\beta}_1 x^*) - (\beta_0 + \beta_1 x^*)}{\mathrm{s.e.}(\hat{\beta}_0 + \hat{\beta}_1 x^*)}$$

where

$$\mathrm{s.e.}(\hat{\beta}_0 + \hat{\beta}_1 x^*) = \hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}}$$

and then $T \sim t_{n-2}$.

A $(1-\alpha)100$% confidence interval can be obtained as follows:

$$\beta_0 + \beta_1 x^* \in \left(\hat{\beta}_0 + \hat{\beta}_1 x^* \pm t_{\alpha/2, n-2}\,\hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}}\right)$$
Prediction Intervals for Future Response Values
Inference Procedures
Prediction interval for a single future response $y^*$ at $x^*$

The T-statistic can be written as

$$T = \frac{y^* - (\hat{\beta}_0 + \hat{\beta}_1 x^*)}{\hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}}} \sim t_{n-2}$$

A $(1-\alpha)100$% prediction interval for a single response $y^*$ can be obtained as:

$$y^* \in \left(\hat{\beta}_0 + \hat{\beta}_1 x^* \pm t_{\alpha/2, n-2}\,\hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}}\right)$$
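A sketch of both intervals at a point $x^*$, using hypothetical data (note the extra "$1+$" inside the square root for the prediction interval, which makes it wider than the confidence interval for the mean response):

```python
import numpy as np
from scipy import stats

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.8, 3.1, 3.9, 5.2, 5.8, 7.1])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
sigma_hat = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

xstar = 3.5
yhat_star = b0 + b1 * xstar
tcrit = stats.t.ppf(0.975, df=n - 2)

# 95% CI for the mean response at x*
se_mean = sigma_hat * np.sqrt(1 / n + (xstar - x.mean()) ** 2 / Sxx)
ci = (yhat_star - tcrit * se_mean, yhat_star + tcrit * se_mean)

# 95% PI for a single future response at x* (extra "1 +" term)
se_pred = sigma_hat * np.sqrt(1 + 1 / n + (xstar - x.mean()) ** 2 / Sxx)
pi = (yhat_star - tcrit * se_pred, yhat_star + tcrit * se_pred)
```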
The Analysis of Variance Table
Sum of Squares Decomposition

Total sum of squares (SST):

$$\text{SST} = \sum_{i=1}^n (y_i - \bar{y})^2 = \text{SSR} + \text{SSE}$$

where the regression sum of squares is $\text{SSR} = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2$.

Analysis of Variance for Simple Linear Regression Analysis

Source | d.f. | Sum of Squares | Mean Squares | F-statistic | P-value |
---|---|---|---|---|---|
Regression | $1$ | SSR | $\text{MSR} = \text{SSR}/1$ | $F = \text{MSR}/\text{MSE}$ | $P(F_{1,n-2} \geq F)$ |
Error | $n-2$ | SSE | $\text{MSE} = \dfrac{\text{SSE}}{n-2}$ | | |
Total | $n-1$ | SST | | | |
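The decomposition and the F-test can be sketched numerically (hypothetical data; for simple linear regression the F-test of the regression is equivalent to the two-sided t-test on the slope):

```python
import numpy as np
from scipy import stats

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.8, 3.1, 3.9, 5.2, 5.8, 7.1])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)   # total sum of squares
SSE = np.sum((y - yhat) ** 2)       # error sum of squares
SSR = np.sum((yhat - y.mean()) ** 2)  # regression sum of squares

MSR = SSR / 1          # regression mean square, d.f. = 1
MSE = SSE / (n - 2)    # error mean square, d.f. = n - 2
F = MSR / MSE
p_value = stats.f.sf(F, 1, n - 2)   # P(F_{1,n-2} >= F)
```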
The Coefficient of Determination
The coefficient of determination is

$$R^2 = \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}}, \qquad 0 \leq R^2 \leq 1$$

The sample correlation coefficient is

$$r = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}}$$

and for simple linear regression $r^2 = R^2$.
Residual Analysis
Residual Analysis Method
The residuals are defined to be

$$e_i = y_i - \hat{y}_i, \qquad i = 1, \dots, n$$

which satisfy $\sum_{i=1}^n e_i = 0$.

Residual analysis can be used to:

Identify outliers

For instance, outliers have a residual with a large absolute value

Check if the fitted model is appropriate

Check if the error variance is constant

Check if the error terms are normally distributed
The normal score of the $i$-th smallest residual is

$$\Phi^{-1}\!\left(\frac{i - 3/8}{n + 1/4}\right)$$
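A sketch of the residual computations, including the plotting positions for a normal probability plot (the data are hypothetical, and the $(i - 3/8)/(n + 1/4)$ plotting positions are one common convention; other texts use slightly different formulas):

```python
import numpy as np
from scipy import stats

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.8, 3.1, 3.9, 5.2, 5.8, 7.1])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
residuals = y - (b0 + b1 * x)   # e_i = y_i - yhat_i; they sum to zero

# Normal score of the i-th smallest residual, i = 1..n
i = np.arange(1, n + 1)
normal_scores = stats.norm.ppf((i - 3 / 8) / (n + 1 / 4))

# Pairs (normal score, sorted residual) for a normal probability plot:
# roughly a straight line suggests normally distributed errors
pairs = np.column_stack([normal_scores, np.sort(residuals)])
```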
Intrinsically Linear Models
Models that can be linearized via transformations
Example model:

$$y = \gamma_0 e^{\gamma_1 x}$$

Taking logarithms, we can rewrite the model as:

$$\ln y = \ln \gamma_0 + \gamma_1 x$$

which is linear in $x$ with intercept $\ln \gamma_0$ and slope $\gamma_1$.
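As a hedged sketch of the linearization idea, assuming the exponential example model above, we can fit $\ln y$ against $x$ with ordinary least squares and transform the estimates back (the true values of `gamma0` and `gamma1` here are illustrative assumptions):

```python
import numpy as np

# Simulate data from y = gamma0 * exp(gamma1 * x) with multiplicative noise
rng = np.random.default_rng(1)
gamma0, gamma1 = 2.0, 0.8          # assumed true parameters
x = np.linspace(0.5, 5.0, 40)
y = gamma0 * np.exp(gamma1 * x) * np.exp(rng.normal(0.0, 0.05, x.size))

# ln(y) = ln(gamma0) + gamma1 * x  -- an ordinary linear regression
b1, b0 = np.polyfit(x, np.log(y), 1)
gamma0_hat, gamma1_hat = np.exp(b0), b1
```

With small multiplicative noise the back-transformed estimates land close to the assumed true parameters.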
Correlation Analysis
The Sample Correlation Coefficient
Under the assumption that $X$ and $Y$ are jointly bivariate normal in distribution,
we test $H_0: \rho = 0$ versus $H_A: \rho \neq 0$; the test statistic is:

$$T = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}$$

The test is equivalent to testing $H_0: \beta_1 = 0$ vs $H_A: \beta_1 \neq 0$. In fact:

$$T = \frac{\hat{\beta}_1}{\mathrm{s.e.}(\hat{\beta}_1)} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

where $r$ is the Pearson sample correlation coefficient calculated as

$$r = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}}$$

Recall that $R^2 = r^2$.
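The correlation t-test above can be sketched as follows (hypothetical data; the result matches `scipy.stats.pearsonr`, which performs the same test):

```python
import numpy as np
from scipy import stats

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.8, 3.1, 3.9, 5.2, 5.8, 7.1])
n = len(x)

r = np.corrcoef(x, y)[0, 1]   # Pearson's sample correlation coefficient

# Test H0: rho = 0 vs HA: rho != 0
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
```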