Jupyter notebook Week_1_b exercises.ipynb

Kernel: Python 3

Exercise notebook 1: Having a go at it

This Jupyter notebook, for Week 1 of The Open University's Learn to code for Data Analysis course, contains code examples and coding activities for you.

You'll come across steps in the FutureLearn course directing you to this notebook. Once you've done the exercise, go back to FutureLearn to discuss it with your fellow learners and course facilitators and mark it as complete.

# this code conceals irrelevant warning messages
import warnings
warnings.simplefilter('ignore', FutureWarning)

Exercise 1: variables and assignments

A variable is a named storage for values. An assignment takes a value (like the number 100 below) and stores it in a variable (deathsInPortugal below).

deathsInPortugal = 100

To display the value stored in a variable, write the name of the variable.

deathsInPortugal
100

Each variable can store one value at any time, but the value stored can vary over time, by assigning a new value to the variable.

deathsInPortugal = 100
deathsInPortugal = 140
deathsInPortugal
140
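As a small additional sketch of how a new assignment replaces the stored value, the cell below copies the value into a second variable before reassigning. The name earlierValue is hypothetical and is not used elsewhere in the course.

```python
deathsInPortugal = 100
earlierValue = deathsInPortugal   # copies the value held at this point (100)
deathsInPortugal = 140            # overwrites the value in deathsInPortugal only

print(earlierValue)       # the value copied before the reassignment
print(deathsInPortugal)   # the value after the reassignment
```

Because the lines run from top to bottom, earlierValue keeps the value that deathsInPortugal held when the second line was executed.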

Each assignment is written on a separate line. The computer executes the assignments one line at a time, from top to bottom.

deathsInPortugal = 140
deathsInAngola = 6900
deathsInBrazil = 4400
deathsInRussia = 17000
deathsInIndia = 240000
deathsInChina = 41000
deathsInSouthAfrica = 25000

Task

Add assignments to the code cell above (or in a new code cell) for the estimated deaths by TB in 2013 in the remaining BRICS countries. The values are as follows: Russia 17000, India 240000, China 41000, South Africa 25000.

Don't forget to run the code cell, so that the new variables are available for the exercises further below.

Now go back to the 'Start coding yourself' step in FutureLearn to discuss and mark it complete.

Exercise 2: expressions

An expression is a fragment of code that has a value. A variable name, by itself, is an expression: the expression's value is the value stored in the variable. In Jupyter notebooks, if the last line of a code cell is an expression, then the computer will show its value when executing the cell.

deathsInPortugal
140

By contrast, a statement is a command for the computer to do something. Commands don't produce values, and therefore the computer doesn't display anything.

deathsInPortugal = 140

More complex expressions can be written using the arithmetic operators of addition (+), subtraction (-), multiplication (*) and division (/). For example, the total number of deaths in the three countries is:

deathsInAngola + deathsInBrazil + deathsInPortugal
11440

If the calculated value needs to be used later on in the code, it has to be stored in a variable. In general, the right-hand side of an assignment is an expression; its value is calculated (the expression is evaluated) and stored in the variable.

totalDeaths = deathsInAngola + deathsInBrazil + deathsInPortugal
totalDeaths
11440

The average number of deaths is the total divided by the number of countries.

totalDeaths / 3
3813.3333333333335

The average could also be calculated with a single expression.

(deathsInAngola + deathsInBrazil + deathsInPortugal) / 3
3813.3333333333335

The parentheses (round brackets) are necessary to state that the sum has to be calculated before the division. Without parentheses, Python follows the conventional order used in mathematics: divisions and multiplications are done before additions and subtractions.

deathsInAngola + deathsInBrazil + deathsInPortugal / 3
11346.666666666666
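The effect of the parentheses can be checked side by side. The following is a minimal sketch using the values assigned earlier in this notebook; the names average, withoutParentheses and explicitGrouping are illustrative only.

```python
deathsInAngola = 6900
deathsInBrazil = 4400
deathsInPortugal = 140

# Sum first, then divide: the average of the three values.
average = (deathsInAngola + deathsInBrazil + deathsInPortugal) / 3

# No parentheses: only deathsInPortugal is divided by 3.
withoutParentheses = deathsInAngola + deathsInBrazil + deathsInPortugal / 3

# The same calculation with the implicit grouping written out.
explicitGrouping = deathsInAngola + deathsInBrazil + (deathsInPortugal / 3)

print(withoutParentheses == explicitGrouping)
```

The last two expressions give the same value, which shows that Python divides deathsInPortugal by 3 before doing the additions.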

Task

  • In the cell below, write code to calculate the total number of deaths in the five BRICS countries (Brazil, Russia, India, China, South Africa) in 2013. Run the code to see the result, which should be 327400.

deathsInBrazil + deathsInRussia + deathsInIndia + deathsInChina + deathsInSouthAfrica
327400

  • In the cell below, write code to calculate the average number of deaths in the BRICS countries in 2013. Run the code to see the result, which should be 65480.

(deathsInBrazil + deathsInRussia + deathsInIndia + deathsInChina + deathsInSouthAfrica)/5
65480.0

Now go back to the 'Expressions' step in FutureLearn to discuss and mark it complete.

Exercise 3: functions quiz

A function takes zero or more values (the function's arguments) and returns (produces) a value. To call (use) a function, write the function name, followed by its arguments within parentheses (round brackets). Multiple arguments are separated by commas. Function names follow the same rules and conventions as variable names. A function call is an expression: the expression's value is the value returned by the function.

Python provides two functions to compute the maximum (largest) and minimum (smallest) of two or more values.

max(deathsInBrazil, deathsInPortugal)
4400
min(deathsInAngola, deathsInBrazil, deathsInPortugal)
140

The range of a set of values is the difference between the maximum and the minimum.

largest = max(deathsInBrazil, deathsInPortugal)
smallest = min(deathsInBrazil, deathsInPortugal)
deathsRange = largest - smallest
deathsRange
4260
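Since a function call is an expression, the two calls can also be combined directly, without the intermediate variables. The sketch below does this for the three countries used in Exercise 2; it is only one possible way to write the calculation.

```python
deathsInAngola = 6900
deathsInBrazil = 4400
deathsInPortugal = 140

# Each call is an expression, so the calls can be operands of a subtraction.
deathsRange = (max(deathsInAngola, deathsInBrazil, deathsInPortugal)
               - min(deathsInAngola, deathsInBrazil, deathsInPortugal))

print(deathsRange)
```

Using the variables largest and smallest, as in the cell above, is often easier to read; the single expression is shorter.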

Tasks

Answer the quiz questions on FutureLearn. All of them can be answered by editing the above code cell. Don't forget that you can use TAB-completion to quickly write the variable names of the remaining BRICS countries, namely Russia, India, China and South Africa (Brazil is already in the code above).

Exercise 4: comments

Comments start with the hash sign (#) and go until the end of the line. They're used to annotate the code, e.g. to indicate the units of values.

# population unit: thousands of inhabitants
populationOfPortugal = 10608
# deaths unit: inhabitants
deathsInPortugal = 140
# deaths per 100 thousand inhabitants
deathsInPortugal * 100 / populationOfPortugal
1.3197586726998491

Task

Calculate the deaths per 100 thousand inhabitants for Brazil. Its population in 2013 was roughly 200 million and 362 thousand people. You should obtain a result of around 2.2 deaths per 100 thousand people.

Now go back to the 'Comments' step in FutureLearn to discuss and mark it complete.

Exercise 5: pandas quiz

All programs in this course must start with the following import statement, to load all the code from the pandas module.

from pandas import *

In the line above, from and import are reserved words of the Python language; they cannot be used as names.

Task

Answer the quiz questions in FutureLearn. You can change the above line of code to find out the answers.

Exercise 6: selecting a column

The read_excel() function takes a string with the name of an Excel file, and returns a dataframe, the pandas representation of a table. The computer reports a file not found error if the file is not in the same folder as this notebook, or the file name is misspelt.

data = read_excel('WHO POP TB some.xls')
data

The expression dataFrame[columnName] evaluates to the column with the given name (a string). Column names are case sensitive. Misspelling the column name will result in a rather long key error message. You can see what happens by changing the string in the next code cell (e.g. replace TB by tb) and running it. Don't forget to undo your change and run the code again.

tbColumn = data['TB deaths']
tbColumn
0       6900
1       4400
2      41000
3         67
4       1200
5     240000
6      18000
7        140
8      17000
9         18
10     25000
11       990
Name: TB deaths, dtype: int64
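Column selection and the key error can also be tried without the Excel file. The sketch below builds a small stand-in table in code, which is an assumption made for illustration: it contains only the Brazil and Portugal figures quoted in earlier exercises, not the full data set, and the name lookupFailed is hypothetical.

```python
from pandas import DataFrame

# A two-row stand-in for the table that read_excel() would return.
data = DataFrame({
    'Country': ['Brazil', 'Portugal'],
    'Population (1000s)': [200362, 10608],
    'TB deaths': [4400, 140],
})

# Selecting an existing column works as described above.
tbColumn = data['TB deaths']

# Column names are case sensitive: 'tb deaths' is not a column of the table.
try:
    data['tb deaths']
    lookupFailed = False
except KeyError:
    lookupFailed = True

print(lookupFailed)
```

Catching the error with try and except is not needed in the course exercises; it is used here only so that the cell runs to the end and reports whether the lookup failed.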

Task

In the next cell, select the population column and store it in a variable (you'll use it in the next exercise). You need to scroll back to the start of the exercise to see the column's name.

Now go back to the 'Selecting a column' step in FutureLearn to discuss and mark it complete.

Exercise 7: calculations on a column

A method is a function that can only be called in a certain context, like a dataframe or a column. A method call is of the form context.methodName(argument1, argument2, ...).

Pandas provides several column methods, including methods to calculate the sum, the largest and the smallest of the numbers in a column, as follows.

tbColumn.sum()
354715
tbColumn.max()
240000
tbColumn.min()
18

The mean of a collection of numbers is the sum of those numbers divided by how many there are.

tbColumn.sum() / 12
29559.583333333332
tbColumn.mean()
29559.583333333332

The median of a collection of numbers is the number in the middle, i.e. half of the numbers are below the median and half are above.

tbColumn.median()
5650.0
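The column has twelve values, so there is no single middle number; the median is then the mean of the two middle values once the numbers are sorted. The sketch below works this out by hand, and does the same for the mean. It rebuilds the column from the values displayed in Exercise 6 instead of reading the Excel file, and the names ending in ByHand are illustrative.

```python
from pandas import Series

# The TB deaths values as displayed in Exercise 6.
tbColumn = Series([6900, 4400, 41000, 67, 1200, 240000,
                   18000, 140, 17000, 18, 25000, 990])

# Mean: the sum divided by how many numbers there are.
count = len(tbColumn)
meanByHand = tbColumn.sum() / count

# Median: sort the values and average the two in the middle
# (this shortcut assumes an even number of values, as is the case here).
sortedValues = sorted(tbColumn)
middleLow = sortedValues[count // 2 - 1]
middleHigh = sortedValues[count // 2]
medianByHand = (middleLow + middleHigh) / 2

print(meanByHand, tbColumn.mean())
print(medianByHand, tbColumn.median())
```

The two middle values are 4400 and 6900, which is why the median is 5650.0 even though that number does not appear in the column.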

Tasks

Use the population column variable from the previous exercise to calculate:

  • the total population

  • the maximum population

  • the minimum population

Now go back to the 'Calculations on a column' step in FutureLearn to discuss and mark it complete.

Exercise 8: sorting on a column

The dataframe method sort_values() takes as argument a column name and returns a new dataframe, with rows in ascending order according to the values in that column. (Older versions of pandas provided this method under the name sort(), which has since been removed.)

data.sort_values('TB deaths')

Sorting doesn't change the original table.

data # rows still in original order

Sorting on a column that has text will put the rows in alphabetical order.

data.sort_values('Country')
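The behaviour described in this exercise can be checked on a small table. The sketch below assumes a pandas version that provides sort_values() (0.17 or later) and uses a three-row stand-in table built from the figures in Exercise 1 instead of the Excel file; the names byDeaths and byCountry are illustrative.

```python
from pandas import DataFrame

# A three-row stand-in table, deliberately not in sorted order.
data = DataFrame({
    'Country': ['Brazil', 'Portugal', 'Angola'],
    'TB deaths': [4400, 140, 6900],
})

# Each call returns a new dataframe; data itself is left as it was.
byDeaths = data.sort_values('TB deaths')
byCountry = data.sort_values('Country')

print(list(byDeaths['Country']))    # ascending by number of deaths
print(list(byCountry['Country']))   # alphabetical order
print(list(data['Country']))        # original order
```

To keep a sorted table for later use, assign the result to a variable, as done here with byDeaths and byCountry.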

Task

Sort the same table by population, to quickly see which are the least and the most populous countries.

Now go back to the 'Sorting on a column' step in FutureLearn to discuss and mark it complete.

Final quiz: Calculations over columns

This information will help you to answer questions in the Week 1 quiz.

The value of an arithmetic expression involving columns is a column. In evaluating the expression, the computer computes the expression for each row.

deathsColumn = data['TB deaths']
populationColumn = data['Population (1000s)']
rateColumn = deathsColumn * 100 / populationColumn
rateColumn
0     32.134873
1      2.196025
2      2.942576
3      8.850727
4     70.422535
5     19.167186
6     69.675621
7      1.319759
8     11.901928
9      9.326425
10    47.370017
11    87.378641
dtype: float64

To add a new column to a dataframe, 'select' a non-existing column, i.e. with a new name, and assign to it.

data['TB deaths (per 100000)'] = rateColumn
data
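The row-by-row calculation and the new column can be tried on a small stand-in table. The sketch below is built in code from the Brazil and Portugal figures quoted in earlier exercises, which is an assumption made for illustration; it is not the full data set.

```python
from pandas import DataFrame

# A two-row stand-in for the table read from the Excel file.
data = DataFrame({
    'Country': ['Brazil', 'Portugal'],
    'Population (1000s)': [200362, 10608],
    'TB deaths': [4400, 140],
})

# The arithmetic is applied to each row, so the result is itself a column.
rateColumn = data['TB deaths'] * 100 / data['Population (1000s)']

# Assigning to a column name that doesn't exist yet adds the column.
data['TB deaths (per 100000)'] = rateColumn

print(data)
```

The Portugal row gives the same rate as the single-number calculation in Exercise 4, because the column expression performs that calculation once per row.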

Tasks

Add code to calculate:

  • the range of the population, in thousands of inhabitants

  • the mean of the death rate

  • the median of the death rate

Now you can answer the questions in the Week 1 quiz.