Path: blob/master/1_Supervised_Machine_Learning/Week 2. Regression with multiple input variables/C1_W2_Linear_Regression.ipynb
2826 views
Practice Lab: Linear Regression
Welcome to your first practice lab! In this lab, you will implement linear regression with one variable to predict profits for a restaurant franchise.
Outline
1 - Packages
First, let's run the cell below to import all the packages that you will need during this assignment.
numpy is the fundamental package for working with matrices in Python.
matplotlib is a famous library to plot graphs in Python.
utils.py
contains helper functions for this assignment. You do not need to modify code in this file.
2 - Problem Statement
Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet.
You would like to expand your business to cities that may give your restaurant higher profits.
The chain already has restaurants in various cities and you have data for profits and populations from the cities.
You also have data on cities that are candidates for a new restaurant.
For these cities, you have the city population.
Can you use the data to help you identify which cities may potentially give your business higher profits?
3 - Dataset
You will start by loading the dataset for this task.
The
load_data()
function shown below loads the data into variablesx_train
andy_train
x_train
is the population of a cityy_train
is the profit of a restaurant in that city. A negative value for profit indicates a loss.Both
X_train
andy_train
are numpy arrays.
View the variables
Before starting on any task, it is useful to get more familiar with your dataset.
A good place to start is to just print out each variable and see what it contains.
The code below prints the variable x_train
and the type of the variable.
x_train
is a numpy array that contains decimal values that are all greater than zero.
These values represent the city population times 10,000
For example, 6.1101 means that the population for that city is 61,101
Now, let's print y_train
Similarly, y_train
is a numpy array that has decimal values, some negative, some positive.
These represent your restaurant's average monthly profits in each city, in units of $10,000.
For example, 17.592 represents $175,920 in average monthly profits for that city.
-2.6807 represents -$26,807 in average monthly loss for that city.
Check the dimensions of your variables
Another useful way to get familiar with your data is to view its dimensions.
Please print the shape of x_train
and y_train
and see how many training examples you have in your dataset.
The city population array has 97 data points, and the monthly average profits also has 97 data points. These are NumPy 1D arrays.
Visualize your data
It is often useful to understand the data by visualizing it.
For this dataset, you can use a scatter plot to visualize the data, since it has only two properties to plot (profit and population).
Many other problems that you will encounter in real life have more than two properties (for example, population, average household income, monthly profits, monthly sales).When you have more than two properties, you can still use a scatter plot to see the relationship between each pair of properties.
Your goal is to build a linear regression model to fit this data.
With this model, you can then input a new city's population, and have the model estimate your restaurant's potential monthly profits for that city.
4 - Refresher on linear regression
In this practice lab, you will fit the linear regression parameters to your dataset.
The model function for linear regression, which is a function that maps from
x
(city population) toy
(your restaurant's monthly profit for that city) is represented asTo train a linear regression model, you want to find the best parameters that fit your dataset.
To compare how one choice of is better or worse than another choice, you can evaluate it with a cost function
is a function of . That is, the value of the cost depends on the value of .
The choice of that fits your data the best is the one that has the smallest cost .
To find the values that gets the smallest possible cost , you can use a method called gradient descent.
With each step of gradient descent, your parameters come closer to the optimal values that will achieve the lowest cost .
The trained linear regression model can then take the input feature (city population) and output a prediction (predicted monthly profit for a restaurant in that city).
5 - Compute Cost
Gradient descent involves repeated steps to adjust the value of your parameter to gradually get a smaller and smaller cost .
At each step of gradient descent, it will be helpful for you to monitor your progress by computing the cost as gets updated.
In this section, you will implement a function to calculate so that you can check the progress of your gradient descent implementation.
Cost function
As you may recall from the lecture, for one variable, the cost function for linear regression is defined as
You can think of as the model's prediction of your restaurant's profit, as opposed to , which is the actual profit that is recorded in the data.
is the number of training examples in the dataset
Model prediction
For linear regression with one variable, the prediction of the model for an example is representented as:
This is the equation for a line, with an intercept and a slope
Implementation
Please complete the compute_cost()
function below to compute the cost .
Exercise 1
Complete the compute_cost
below to:
Iterate over the training examples, and for each example, compute:
The prediction of the model for that example
The cost for that example
Return the total cost over all examples
Here, is the number of training examples and is the summation operator
If you get stuck, you can check out the hints presented after the cell below to help you with the implementation.
Click for hints
You can represent a summation operator eg: in code as follows:
In this case, you can iterate over all the examples in
x
using a for loop and add thecost
from each iteration to a variable (cost_sum
) initialized outside the loop.Then, you can return the
total_cost
ascost_sum
divided by2m
.
You can check if your implementation was correct by running the following test code:
Expected Output:
Cost at initial w: 75.203 |
As described in the lecture videos, the gradient descent algorithm is:
where, parameters are both updated simultaniously and where
m is the number of training examples in the dataset
is the model's prediction, while , is the target value
You will implement a function called compute_gradient
which calculates ,
Exercise 2
Please complete the compute_gradient
function to:
Iterate over the training examples, and for each example, compute:
The prediction of the model for that example
The gradient for the parameters from that example
Return the total gradient update from all the examples
Here, is the number of training examples and is the summation operator
If you get stuck, you can check out the hints presented after the cell below to help you with the implementation.
Click for hints
Then, you can return
dj_dw
anddj_db
both divided bym
.
Run the cells below to check your implementation of the compute_gradient
function with two different initializations of the parameters ,.
Now let's run the gradient descent algorithm implemented above on our dataset.
Expected Output:
Gradient at initial , b (zeros) | -65.32884975 -5.83913505154639 |
Expected Output:
Gradient at test w | -47.41610118 -4.007175051546391 |
2.6 Learning parameters using batch gradient descent
You will now find the optimal parameters of a linear regression model by using batch gradient descent. Recall batch refers to running all the examples in one iteration.
You don't need to implement anything for this part. Simply run the cells below.
A good way to verify that gradient descent is working correctly is to look at the value of and check that it is decreasing with each step.
Assuming you have implemented the gradient and computed the cost correctly and you have an appropriate value for the learning rate alpha, should never increase and should converge to a steady value by the end of the algorithm.
Now let's run the gradient descent algorithm above to learn the parameters for our dataset.
Expected Output:
w, b found by gradient descent | 1.16636235 -3.63029143940436 |
We will now use the final parameters from gradient descent to plot the linear fit.
Recall that we can get the prediction for a single example .
To calculate the predictions on the entire dataset, we can loop through all the training examples and calculate the prediction for each example. This is shown in the code block below.
We will now plot the predicted values to see the linear fit.
Your final values of can also be used to make predictions on profits. Let's predict what the profit would be in areas of 35,000 and 70,000 people.
The model takes in population of a city in 10,000s as input.
Therefore, 35,000 people can be translated into an input to the model as
np.array([3.5])
Similarly, 70,000 people can be translated into an input to the model as
np.array([7.])
Expected Output:
For population = 35,000, we predict a profit of | $4519.77 |