Optional Lab - Regularized Cost and Gradient
Goals
In this lab, you will:
extend the previous linear and logistic cost functions with a regularization term.
rerun the previous example of over-fitting with a regularization term added.
Adding regularization


The slides above show the cost and gradient functions for both linear and logistic regression. Note:
Cost
The cost functions differ significantly between linear and logistic regression, but adding regularization to the equations is the same.
Gradient
The gradient functions for linear and logistic regression are very similar. They differ only in the implementation of $f_{\mathbf{w},b}$.
Cost functions with regularization
Cost function for regularized linear regression
The equation for the cost function for regularized linear regression is:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum_{i=0}^{m-1} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2 \tag{1}$$
where:
$$f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b \tag{2}$$
Compare this to the cost function without regularization (which you implemented in a previous lab), which is of the form:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum_{i=0}^{m-1} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)^2$$
The difference is the regularization term, $\frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2$.
Including this term encourages gradient descent to minimize the size of the parameters. Note, in this example, the parameter $b$ is not regularized. This is standard practice.
Below is an implementation of equations (1) and (2). Note that this uses a standard pattern for this course, a for loop over all m examples.
Run the cell below to see it in action.
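The notebook's code cell isn't reproduced in this text, so here is a minimal sketch of what an implementation of equations (1) and (2) might look like. The function name `compute_cost_linear_reg` and the small test data at the end are illustrative assumptions, not necessarily the lab's exact code or values.

```python
import numpy as np

def compute_cost_linear_reg(X, y, w, b, lambda_=1):
    """
    Regularized cost for linear regression, equations (1) and (2).
    X (ndarray (m,n)): m examples with n features
    y (ndarray (m,)) : target values
    w (ndarray (n,)) : model weights
    b (scalar)       : model bias
    lambda_ (scalar) : regularization strength
    """
    m, n = X.shape
    cost = 0.0
    for i in range(m):                      # standard pattern: loop over all m examples
        f_wb_i = np.dot(X[i], w) + b        # model prediction, equation (2)
        cost += (f_wb_i - y[i]) ** 2        # squared error
    cost = cost / (2 * m)

    reg_cost = 0.0
    for j in range(n):                      # regularization term: sum of squared w_j
        reg_cost += w[j] ** 2
    reg_cost = (lambda_ / (2 * m)) * reg_cost

    return cost + reg_cost                  # note: b is not regularized

# Illustrative call on a small random data set (values are an assumption)
np.random.seed(1)
X_tmp = np.random.rand(5, 6)
y_tmp = np.array([0, 1, 0, 1, 0])
w_tmp = np.random.rand(X_tmp.shape[1]) - 0.5
b_tmp = 0.5
print("Regularized cost:", compute_cost_linear_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_=0.7))
```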
Expected Output:
Regularized cost: 0.07917239320214275
Cost function for regularized logistic regression
For regularized logistic regression, the cost function is of the form
$$J(\mathbf{w},b) = \frac{1}{m} \sum_{i=0}^{m-1} \left[ -y^{(i)} \log\left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) \right) - \left( 1 - y^{(i)} \right) \log\left( 1 - f_{\mathbf{w},b}(\mathbf{x}^{(i)}) \right) \right] + \frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2 \tag{3}$$
where:
$$f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = sigmoid(\mathbf{w} \cdot \mathbf{x}^{(i)} + b) \tag{4}$$
Compare this to the cost function without regularization (which you implemented in a previous lab):
$$J(\mathbf{w},b) = \frac{1}{m} \sum_{i=0}^{m-1} \left[ -y^{(i)} \log\left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) \right) - \left( 1 - y^{(i)} \right) \log\left( 1 - f_{\mathbf{w},b}(\mathbf{x}^{(i)}) \right) \right]$$
As was the case in linear regression above, the difference is the regularization term, which is $\frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2$.
Including this term encourages gradient descent to minimize the size of the parameters. Note, in this example, the parameter $b$ is not regularized. This is standard practice.
Run the cell below to see it in action.
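As above, the code cell itself isn't shown here; a minimal sketch of equations (3) and (4), with the illustrative names `sigmoid` and `compute_cost_logistic_reg`, could be:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost_logistic_reg(X, y, w, b, lambda_=1):
    """Regularized cost for logistic regression, equations (3) and (4)."""
    m, n = X.shape
    cost = 0.0
    for i in range(m):                                  # loop over all m examples
        z_i = np.dot(X[i], w) + b
        f_wb_i = sigmoid(z_i)                           # model prediction, equation (4)
        cost += -y[i] * np.log(f_wb_i) - (1 - y[i]) * np.log(1 - f_wb_i)
    cost = cost / m

    reg_cost = 0.0
    for j in range(n):                                  # same regularization term as the linear case
        reg_cost += w[j] ** 2
    reg_cost = (lambda_ / (2 * m)) * reg_cost

    return cost + reg_cost                              # b is not regularized
```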
Expected Output:
Regularized cost: 0.6850849138741673
Gradient descent with regularization
The basic algorithm for running gradient descent does not change with regularization; it is:
$$\begin{align*}
&\text{repeat until convergence:} \; \lbrace \\
&\quad w_j = w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \quad \text{for } j = 0..n-1 \\
&\quad b = b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \\
&\rbrace
\end{align*}$$
Where each iteration performs simultaneous updates on $w_j$ for all $j$.
What changes with regularization is computing the gradients.
Computing the Gradient with regularization (both linear/logistic)
The gradient calculations for linear and logistic regression are nearly identical, differing only in the computation of $f_{\mathbf{w},b}$.
$$\begin{align*}
\frac{\partial J(\mathbf{w},b)}{\partial w_j} &= \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} w_j \\
\frac{\partial J(\mathbf{w},b)}{\partial b} &= \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)
\end{align*}$$
m is the number of training examples in the data set
$f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target
For a linear regression model, $f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$
For a logistic regression model, $z = \mathbf{w} \cdot \mathbf{x} + b$ and $f_{\mathbf{w},b}(\mathbf{x}) = g(z)$, where $g(z)$ is the sigmoid function: $g(z) = \frac{1}{1 + e^{-z}}$
The term which adds regularization is $\frac{\lambda}{m} w_j$.
Gradient function for regularized linear regression
Run the cell below to see it in action.
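The gradient cell is likewise not reproduced; a sketch of the regularized linear-regression gradient above, under the illustrative name `compute_gradient_linear_reg`, might look like:

```python
import numpy as np

def compute_gradient_linear_reg(X, y, w, b, lambda_):
    """
    Regularized gradient for linear regression.
    Returns dj_db (scalar) and dj_dw (ndarray (n,)).
    """
    m, n = X.shape
    dj_dw = np.zeros(n)
    dj_db = 0.0

    for i in range(m):                          # loop over all m examples
        err = (np.dot(X[i], w) + b) - y[i]      # prediction error for example i
        for j in range(n):
            dj_dw[j] += err * X[i, j]
        dj_db += err
    dj_dw /= m
    dj_db /= m

    for j in range(n):                          # add the regularization term (lambda/m) * w_j
        dj_dw[j] += (lambda_ / m) * w[j]

    return dj_db, dj_dw
```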
Expected Output
Gradient function for regularized logistic regression
Run the cell below to see it in action.
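A corresponding sketch for the regularized logistic-regression gradient, identical apart from passing the prediction through the sigmoid (again, the name `compute_gradient_logistic_reg` is an assumption):

```python
import numpy as np

def compute_gradient_logistic_reg(X, y, w, b, lambda_):
    """
    Regularized gradient for logistic regression.
    Returns dj_db (scalar) and dj_dw (ndarray (n,)).
    """
    m, n = X.shape
    dj_dw = np.zeros(n)
    dj_db = 0.0

    for i in range(m):                                           # loop over all m examples
        f_wb_i = 1.0 / (1.0 + np.exp(-(np.dot(X[i], w) + b)))   # sigmoid prediction
        err = f_wb_i - y[i]
        for j in range(n):
            dj_dw[j] += err * X[i, j]
        dj_db += err
    dj_dw /= m
    dj_db /= m

    for j in range(n):                          # regularization term, identical to the linear case
        dj_dw[j] += (lambda_ / m) * w[j]

    return dj_db, dj_dw
```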
Expected Output
Rerun over-fitting example
In the plot above, try out regularization on the previous example. In particular:
Categorical (logistic regression)
set degree to 6, lambda to 0 (no regularization), fit the data
now set lambda to 1 (increase regularization), fit the data, notice the difference.
Regression (linear regression)
try the same procedure.
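The lab's plot is an interactive widget, which isn't reproduced here. As a rough, non-interactive illustration of the same idea, the vectorized sketch below (data, step size, and iteration count are all illustrative assumptions) fits degree-6 polynomial features to a few noisy points with lambda = 0 and lambda = 1 and compares the size of the learned weights:

```python
import numpy as np

np.random.seed(1)
x = np.linspace(0, 1, 10)
y = x + 0.2 * np.random.randn(10)                    # roughly linear data plus noise

X = np.column_stack([x ** p for p in range(1, 7)])   # degree-6 polynomial features
X = (X - X.mean(axis=0)) / X.std(axis=0)             # z-score normalize the features

alpha, iters, m = 0.1, 10_000, X.shape[0]
for lambda_ in (0.0, 1.0):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(iters):
        err = X @ w + b - y                          # prediction errors, shape (m,)
        dj_dw = X.T @ err / m + (lambda_ / m) * w    # regularized gradient w.r.t. w
        dj_db = err.mean()                           # gradient w.r.t. b (not regularized)
        w -= alpha * dj_dw
        b -= alpha * dj_db
    print(f"lambda = {lambda_}: ||w|| = {np.linalg.norm(w):.3f}")
```

With lambda = 1 the learned weights should come out noticeably smaller, which corresponds to the smoother, less over-fit curve you should see in the interactive plot.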
Congratulations!
You have:
seen examples of cost and gradient routines with regularization added for both linear and logistic regression
developed some intuition on how regularization can reduce over-fitting