We will be making heavy use of the Python library called NumPy. It is not included by default, so we first need to import it. Go ahead and run the following cell:
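The cell referred to here is not shown; a minimal version, assuming the conventional alias, would be:

```python
# Import NumPy under its conventional short name
import numpy as np
```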
Now we have access to all NumPy functions via the variable np (this is the convention in the Scientific Python community for referring to NumPy). We can take a look at what this variable actually is, and see that it is in fact the numpy module (remember that you will need to have run the cell above before np will be defined!):
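One way to inspect the variable, as a small sketch (the exact text printed depends on where NumPy is installed):

```python
import numpy as np

# Printing the variable shows that it refers to the numpy module
print(np)
# Its type is Python's generic module type
print(type(np))
```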
NumPy is incredibly powerful and has many features, but this can be a bit intimidating when you're first starting to use it. If you are familiar with other scientific computing languages, the following guides may be of use:
NumPy for Matlab Users: http://mathesaurus.sourceforge.net/matlab-numpy.html
NumPy for R (and S-Plus) Users: http://mathesaurus.sourceforge.net/r-numpy.html
If not, don't worry! Here we'll go over the most common NumPy features.
Arrays and lists
The core component of NumPy is the ndarray, which is pronounced like "N-D array" (i.e., 1-D, 2-D, ..., N-D). We'll use the terms ndarray and "array" interchangeably. For now, we're going to stick to just 1-D arrays -- we'll get to multidimensional arrays later.
Arrays are very similar to lists. Let's first review how lists work. Remember that we can create them using square brackets:
And we can access an element via its index. To get the first element, we use an index of 0:
To get the second element, we use an index of 1:
And so on.
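The list steps above might look like the following sketch. The values in mylist are hypothetical, chosen only for illustration:

```python
# Create a list using square brackets (hypothetical values)
mylist = [3, 6, 1, 0, 10, 3]

# Indexing starts at 0, so index 0 is the first element
first = mylist[0]
# Index 1 is the second element
second = mylist[1]

print(first, second)
```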
Arrays work very similarly. The first way to create an array is from an already existing list:
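A minimal sketch of creating an array from a list with np.array (the list values are hypothetical):

```python
import numpy as np

mylist = [3, 6, 1, 0, 10, 3]  # hypothetical values
# Build a NumPy array from the existing list
myarray = np.array(mylist)

# The displayed form is wrapped in array(...), unlike a plain list
print(repr(myarray))
```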
myarray looks different from mylist -- it actually tells you that it's an array. If we take a look at the types of mylist and myarray, we will also see that one is a list and one is an array. Using type can be a very useful way to verify that your variables contain what you want them to contain:
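For instance, with the same hypothetical values as before, the type check could be written as:

```python
import numpy as np

mylist = [3, 6, 1, 0, 10, 3]  # hypothetical values
myarray = np.array(mylist)

# type() reports the built-in list type for mylist
print(type(mylist))
# and numpy.ndarray for myarray
print(type(myarray))
```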
We can get elements from a NumPy array in exactly the same way as we get elements from a list:
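A short sketch of indexing into an array, again assuming hypothetical values:

```python
import numpy as np

myarray = np.array([3, 6, 1, 0, 10, 3])  # hypothetical values

# Same square-bracket indexing as for lists
first = myarray[0]
second = myarray[1]
print(first, second)
```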
Array slicing
We can also select several elements at once using slicing. The syntax is myarray[a:b:c], where a, b, and c are all optional (though the slice must contain at least one colon). a is the index of the beginning of the slice, b is the index of the end of the slice (exclusive), and c is the step size.
Note that the exclusive slice indexing described above is different from some other languages you may be familiar with, like Matlab and R. In Python, myarray[1:2] returns only the second element of myarray, instead of the first and second elements.
First, let's quickly look at what is in our array and list (defined above), for reference:
Now, to get all elements except the first:
To get all elements except the last:
To get all elements except the first and the last:
To get every other element of the array (beginning from the first element):
To get every other element of the array (beginning from the second element):
And to reverse the array:
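The slices listed above can be sketched as follows. The array contents are hypothetical, and the variable names are only for illustration:

```python
import numpy as np

myarray = np.array([3, 6, 1, 0, 10, 3])  # hypothetical values

all_but_first = myarray[1:]           # start at index 1, go to the end
all_but_last = myarray[:-1]           # stop before the last element
all_but_first_and_last = myarray[1:-1]
every_other = myarray[::2]            # step of 2, starting at index 0
every_other_from_second = myarray[1::2]
reversed_array = myarray[::-1]        # a step of -1 walks backwards

print(all_but_first, all_but_last, all_but_first_and_last)
print(every_other, every_other_from_second, reversed_array)
```

The same slice expressions also work on a regular list.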
Array computations
So far, NumPy arrays seem basically the same as regular lists. What's the big deal about them?
Working with single arrays
One advantage of using NumPy arrays over lists is the ability to do a computation over the entire array. For example, if you were using lists and wanted to add one to every element of the list, here's how you would do it:
Or, you could use a list comprehension:
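Both list-based approaches might look like this sketch (hypothetical values; the original cells may have modified the list in place instead):

```python
mylist = [3, 6, 1, 0, 10, 3]  # hypothetical values

# Approach 1: an explicit loop that builds a new list
plus_one_loop = []
for value in mylist:
    plus_one_loop.append(value + 1)

# Approach 2: a list comprehension that does the same thing
plus_one_comp = [value + 1 for value in mylist]

print(plus_one_loop)
print(plus_one_comp)
```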
In contrast, adding one to every element of a NumPy array is far simpler:
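With an array, the same computation is a single expression (hypothetical values again):

```python
import numpy as np

myarray = np.array([3, 6, 1, 0, 10, 3])  # hypothetical values

# Adds one to every element and returns a new array
plus_one = myarray + 1
print(plus_one)
```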
This won't work with normal lists. For example, if you ran mylist + 1, you'd get an error like this:
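The error output is not shown here. A sketch that triggers it safely, assuming hypothetical list values, catches the TypeError that Python raises (the exact wording of the message may vary between versions):

```python
mylist = [3, 6, 1, 0, 10, 3]  # hypothetical values

try:
    mylist + 1  # lists only support + with another list
    message = None
except TypeError as err:
    message = str(err)
    print("TypeError:", message)
```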
We can do the same thing for subtraction, multiplication, etc.:
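A few of these other operations, sketched with hypothetical values:

```python
import numpy as np

myarray = np.array([3, 6, 1, 0, 10, 3])  # hypothetical values

minus_one = myarray - 1   # subtract one from every element
doubled = myarray * 2     # multiply every element by two
halved = myarray / 2      # divide every element by two (gives floats)

print(minus_one, doubled, halved)
```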
Working with multiple arrays
We can also easily do these operations for multiple arrays. For example, let's say we want to add the corresponding elements of two lists together. Here's how we'd do it with regular lists:
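One possible list-based version, using two hypothetical lists and the built-in zip to pair up corresponding elements:

```python
list_a = [1, 2, 3]     # hypothetical values
list_b = [10, 20, 30]  # hypothetical values

# Pair up corresponding elements and add each pair
list_sum = [a + b for a, b in zip(list_a, list_b)]
print(list_sum)

# Note: list_a + list_b would concatenate the lists instead
concatenated = list_a + list_b
```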
With NumPy arrays, we just have to add the arrays together:
Just as when we are working with a single array, we can add, subtract, divide, multiply, etc. several arrays together:
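As a sketch, element-wise arithmetic between two arrays of the same length (hypothetical values) looks like this:

```python
import numpy as np

array_a = np.array([1, 2, 3])     # hypothetical values
array_b = np.array([10, 20, 30])  # hypothetical values

added = array_a + array_b        # element-wise sum
subtracted = array_b - array_a   # element-wise difference
multiplied = array_a * array_b   # element-wise product
divided = array_b / array_a      # element-wise quotient (floats)

print(added, subtracted, multiplied, divided)
```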
Creating and modifying arrays
One thing that you can do with lists but not with NumPy arrays is add or remove elements in place. For example, I can create a list and then add elements to it with append:
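A minimal sketch of appending to a list, with hypothetical values:

```python
# Start with an empty list and grow it one element at a time
mylist = []
mylist.append(7)  # hypothetical value
mylist.append(2)  # hypothetical value
print(mylist)
```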
However, you cannot do this with NumPy arrays. If you tried to run the following code, for example:
You'd get an error like this:
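Neither the code nor the error is shown here. Assuming the attempt was a call to an append method on the array, the following sketch reproduces the failure and catches the resulting AttributeError:

```python
import numpy as np

myarray = np.array([7, 2])  # hypothetical values

try:
    myarray.append(4)  # ndarray objects have no append method
    append_failed = False
except AttributeError as err:
    append_failed = True
    print("AttributeError:", err)
```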
There are a few ways to create a new array with a particular size:
np.empty(size) -- creates an array of size size without initializing its elements, so it holds arbitrary values until you set them
np.zeros(size) -- creates an array of size size and sets all the elements to zero
np.ones(size) -- creates an array of size size and sets all the elements to one
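The three constructors can be compared with a short sketch (the size of 3 is arbitrary):

```python
import numpy as np

# Contents of an np.empty array are arbitrary until assigned
empty_array = np.empty(3)
zeros_array = np.zeros(3)  # all elements are 0.0
ones_array = np.ones(3)    # all elements are 1.0

print(zeros_array)
print(ones_array)
print(empty_array.shape)   # only the shape of empty_array is predictable
```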
So the way that we would create an array like the list above is:
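A sketch of that approach, assuming for illustration that the target contents are the two values 7 and 2: allocate an array of the right size first, then assign each element.

```python
import numpy as np

# Allocate space for two elements, then fill them in by index
myarray = np.empty(2)
myarray[0] = 7  # hypothetical value
myarray[1] = 2  # hypothetical value
print(myarray)
```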
Another way to create an array is with np.arange, which will create an array containing a sequence of numbers (it is very similar to the built-in range function in Python, or xrange in Python 2).
Here are a few examples of using np.arange. Try playing around with them and make sure you understand how it works:
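The original examples are not shown; the following sketch illustrates the stop, start/stop, and start/stop/step forms with arbitrary arguments:

```python
import numpy as np

up_to_five = np.arange(5)          # 0 up to (not including) 5
two_to_ten = np.arange(2, 10)      # 2 up to (not including) 10
by_threes = np.arange(0, 10, 3)    # step of 3
quarters = np.arange(0, 1, 0.25)   # non-integer steps also work

print(up_to_five, two_to_ten, by_threes, quarters)
```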
"Vectorized" computations
Another very useful thing about NumPy is that it comes with many so-called "vectorized" operations. A vectorized operation (or computation) works across the entire array. For example, let's say we want to add together all the numbers in a list. In regular Python, we might do it like this:
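In plain Python, the sum might be accumulated with a loop like this (hypothetical values):

```python
mylist = [3, 6, 1, 0, 10, 3]  # hypothetical values

# Keep a running total and add each element to it
total = 0
for value in mylist:
    total += value
print(total)
```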
Using NumPy arrays, we can just use the np.sum function:
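A minimal sketch with the same hypothetical values:

```python
import numpy as np

myarray = np.array([3, 6, 1, 0, 10, 3])  # hypothetical values

# np.sum adds up all the elements in one call
total = np.sum(myarray)
print(total)
```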
NumPy provides many other vectorized computations, including the product (np.prod), mean (np.mean), and variance (np.var). They all act essentially the same way as np.sum -- give the function an array, and it computes the relevant function across all the elements in the array.
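These functions can be tried on a small hypothetical array. Note that np.var computes the population variance by default (it divides by the number of elements):

```python
import numpy as np

values = np.array([1, 2, 3, 4])  # hypothetical values

product = np.prod(values)   # 1 * 2 * 3 * 4
average = np.mean(values)   # (1 + 2 + 3 + 4) / 4
variance = np.var(values)   # mean of squared deviations from the mean

print(product, average, variance)
```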
Exercise: Euclidean distance (2 points)
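Recall that the Euclidean distance $d$ between two vectors $a$ and $b$, each with $n$ elements, is given by the following equation:
$$d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$$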
In NumPy, this is a fairly simple computation because we can rely on array computations and the np.sum function to do all the heavy lifting for us.
Complete the function euclidean_distance below to compute the Euclidean distance, as given by the equation above. Note that you can compute the square root using np.sqrt.
Be sure to run the cell containing your answer (the one that defines euclidean_distance), and then run the cell below to check your answer. If you make changes to the cell with your answer, you will need to first re-run that cell, and then re-run the test cell to check your answer again.
Creating multidimensional arrays
Previously, we saw that functions like np.zeros or np.ones could be used to create a 1-D array. We can also use them to create N-D arrays. Rather than passing an integer as the first argument, we pass a list or tuple with the shape of the array that we want. For example, to create a 3x4 array of zeros:
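A sketch of creating a 2-D array of zeros with 3 rows and 4 columns:

```python
import numpy as np

# The shape is passed as a single tuple: (rows, columns)
arr = np.zeros((3, 4))
print(arr)
```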
The shape of an array can be accessed through its shape attribute:
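For example, for an array with 3 rows and 4 columns:

```python
import numpy as np

arr = np.zeros((3, 4))
# shape is a tuple with one entry per dimension
arr_shape = arr.shape
print(arr_shape)
```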
Note that for 1-D arrays, the shape returned by the shape attribute is still a tuple, even though it only has a length of one:
This also means that we can create 1-D arrays by passing a length one tuple. Thus, the following two arrays are identical:
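A small sketch comparing the two forms (the length of 3 is arbitrary):

```python
import numpy as np

from_int = np.zeros(3)       # size given as an integer
from_tuple = np.zeros((3,))  # size given as a length-one tuple

# Both are 1-D arrays whose shape is the tuple (3,)
print(from_int.shape, from_tuple.shape)
same = np.array_equal(from_int, from_tuple)
```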
Note that to create an array with shape (3, 4), we must pass the shape as a single tuple: np.zeros((3, 4)). Calling np.zeros(3, 4) will not work, and raises an error.
This is because the second argument to np.zeros is the data type, so NumPy thinks you are trying to create an array of zeros with shape (3,) and datatype 4. It (understandably) doesn't know what you mean by a datatype of 4, and so throws an error.
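The failing call can be reproduced safely by catching the exception. This sketch relies only on the error being a TypeError; the message text differs between NumPy versions:

```python
import numpy as np

try:
    np.zeros(3, 4)  # 4 is interpreted as the dtype argument
    call_failed = False
except TypeError as err:
    call_failed = True
    print("TypeError:", err)

# Passing the shape as one tuple works as intended
arr = np.zeros((3, 4))
```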
We can also get the total number of elements in an array through its size attribute:
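For an array with 3 rows and 4 columns, size is the product of the dimensions:

```python
import numpy as np

arr = np.zeros((3, 4))
# 3 rows * 4 columns = 12 elements in total
n_elements = arr.size
print(n_elements)
```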
We can also create arrays and then reshape them into any shape, provided the new array has the same size as the old array:
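A sketch of reshaping, using np.arange to make the element order visible (the sizes are arbitrary):

```python
import numpy as np

flat = np.arange(12)          # 12 elements: 0 through 11
arr = flat.reshape((3, 4))    # 3 * 4 = 12, so the sizes match
print(arr)

# A shape with a different total size is rejected
try:
    flat.reshape((5, 2))      # 5 * 2 = 10, not 12
    reshape_failed = False
except ValueError:
    reshape_failed = True
```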
Accessing and modifying multidimensional array elements
To access or set individual elements of the array, we can index with a sequence of numbers:
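As a sketch, setting one element of a 2-D array uses a row index and a column index (the indices and value here are arbitrary):

```python
import numpy as np

arr = np.zeros((3, 4))
# Set the element in the second row (index 1), third column (index 2)
arr[1, 2] = 5.0
print(arr)
```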
We can also access the element on its own, without the equals sign and the value to the right of it:
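Reading a single element uses the same indexing. This sketch assumes a hypothetical array filled with the numbers 0 through 11:

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))  # hypothetical contents
# Second row (index 1), third column (index 2)
element = arr[1, 2]
print(element)
```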
We frequently will want to access ranges of elements. In NumPy, the first dimension (or axis) corresponds to the rows of the array, and the second axis corresponds to the columns. For example, to look at the first row of the array:
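Selecting the first row might look like this, again with hypothetical contents:

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))  # hypothetical contents
# A single index selects along the first axis, i.e. a whole row
first_row = arr[0]
# Equivalent, with the column slice written out explicitly
first_row_explicit = arr[0, :]
print(first_row)
```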
To look at columns, we use the following syntax:
The colon in the first position essentially means "select from every row". So, we can interpret arr[:, 1] as meaning "take the second element of every row", or simply "take the second column".
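The column selection described above, sketched on a hypothetical array:

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))  # hypothetical contents
# Every row (:), column index 1
second_column = arr[:, 1]
print(second_column)
```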
Using this syntax, we can select whole regions of an array. For example:
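One possible region selection, with arbitrary slice bounds on a hypothetical array:

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))  # hypothetical contents
# First two rows, and the second and third columns
region = arr[:2, 1:3]
print(region)
```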
Be careful when assigning an existing array to a new variable. For example, if I want to create a second array that multiplies every other value in arr by two, the following code will work but will have unexpected consequences:
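The original cell is not shown; a sketch of the pitfall, assuming a small hypothetical array, could be:

```python
import numpy as np

arr = np.arange(6)  # hypothetical contents: 0 through 5
arr2 = arr          # no copy is made here
# Double every other value, starting from the first
arr2[::2] = arr2[::2] * 2

print(arr)   # arr has changed too
print(arr2)
```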
Note that arr and arr2 both have the same values! This is because the line arr2 = arr doesn't actually copy the array: it just makes another pointer to the same object. To truly copy the array, we need to use the .copy() method:
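A sketch of the corrected version on a small hypothetical array, where the original is left untouched:

```python
import numpy as np

arr = np.arange(6)   # hypothetical contents: 0 through 5
arr2 = arr.copy()    # an independent copy with its own data
# Double every other value of the copy only
arr2[::2] = arr2[::2] * 2

print(arr)   # unchanged
print(arr2)  # modified copy
```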