📚 The CoCalc Library - books, templates and other resources
License: OTHER
Sometimes, there are more advanced operations we want to do with NumPy arrays. For example, if we had an array of values and wanted to set all negative values to zero, how would we do this? The answer is called fancy indexing, and it can be done in two ways: boolean indexing and array indexing.
Boolean indexing
The idea behind boolean indexing is that for each element of the array, we know whether we want to select it or not. A boolean array is an array of the same shape as our original array which contains only True and False values. The locations of the True values in our boolean array indicate the elements in our original array that we want to select, while the locations of the False values correspond to the elements that we don't want to select.
Let's consider our experiment data again:
Recall that these are reaction times. It is typically accepted that really low reaction times -- such as less than 100 milliseconds -- are too fast for people to have actually seen and processed the stimulus. Let's see if there are any reaction times less than 100 milliseconds in our data.
To pull out just the elements less than 100 milliseconds, we need two steps. First, we use boolean comparisons to check which are less than 100ms:
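The notebook's data is not reproduced here, so as a minimal sketch, assume a small made-up array of reaction times named arr (the values are illustrative, not the experiment's). Comparing it against 100 produces the boolean array:

```python
import numpy as np

# Hypothetical reaction times in milliseconds (illustrative values only)
arr = np.array([250.0, 87.0, 412.0, 95.0, 301.0])

# An elementwise comparison yields a boolean array of the same shape as arr
too_fast = arr < 100
print(too_fast.tolist())  # [False, True, False, True, False]
```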
Then, using this too_fast array, we can index back into the original array, and see that there are indeed some trials which were abnormally fast:
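As a sketch of this step, using a small made-up arr in place of the notebook's data (the values are assumptions for illustration), indexing with the boolean array might look like this:

```python
import numpy as np

# Hypothetical reaction times in milliseconds (illustrative values only)
arr = np.array([250.0, 87.0, 412.0, 95.0, 301.0])
too_fast = arr < 100

# Keep only the elements where too_fast is True
fast_trials = arr[too_fast]
print(fast_trials.tolist())  # [87.0, 95.0]
```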
What this is doing is essentially saying: for every element in too_fast that is True, give me the corresponding element in arr.
Because this is a boolean array, we can also negate it, and pull out all the elements that we consider to be valid reaction times:
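One way to negate a boolean array in NumPy is the ~ operator. A minimal sketch, again with made-up values standing in for the notebook's data:

```python
import numpy as np

# Hypothetical reaction times in milliseconds (illustrative values only)
arr = np.array([250.0, 87.0, 412.0, 95.0, 301.0])
too_fast = arr < 100

# ~ flips every True to False and vice versa
valid_trials = arr[~too_fast]
print(valid_trials.tolist())  # [250.0, 412.0, 301.0]
```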
Not only does this give you the elements, but assigning to the array through the boolean index will modify the original array, too. In this case, we will set our "too fast" elements to have a value of "not a number", or NaN:
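The following sketch, using made-up values rather than the notebook's data, shows the assignment. It also illustrates a related point that is easy to miss: the array you get back from a boolean index on its own is a copy, so changing that copy leaves the original untouched.

```python
import numpy as np

# Hypothetical reaction times in milliseconds (illustrative values only)
arr = np.array([250.0, 87.0, 412.0, 95.0, 301.0])
too_fast = arr < 100

# Assigning through the boolean index writes into arr itself
arr[too_fast] = np.nan
print(np.isnan(arr).tolist())  # [False, True, False, True, False]

# Extracting with a boolean index returns a copy, not a view
selected = arr[~too_fast]
selected[0] = -1.0
print(arr[0])  # 250.0 (the original is unchanged)
```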
Now, if we try to find which elements are less than 100 milliseconds, we will not find any:
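A minimal sketch of this check, assuming a made-up array in which the fast trials have already been replaced by NaN. Any comparison with NaN evaluates to False, so nothing is selected. Whether the comparison also emits a warning depends on the NumPy version.

```python
import numpy as np

# Hypothetical data after the "too fast" trials were set to NaN
arr = np.array([250.0, np.nan, 412.0, np.nan, 301.0])

# NaN < 100 is False, so no element passes the test
still_too_fast = arr < 100
print(still_too_fast.any())        # False
print(arr[still_too_fast].size)    # 0
```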
You may see a RuntimeWarning when you run the above cell, saying that an "invalid value" was encountered. Sometimes, it is possible for NaNs to appear in an array without your knowledge: for example, if you multiply infinity (np.inf) by zero. So, NumPy is warning us that it has encountered NaNs (the "invalid value") in case we weren't aware. We knew there were NaNs because we put them there, so in this scenario we can safely ignore the warning. However, if you encounter a warning like this in the future and you weren't expecting it, make sure you investigate the source of the warning!
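If you need to track down where NaNs came from, np.isnan is one way to locate them. The sketch below uses a made-up array; it silences the expected warning with np.errstate only to keep the demonstration quiet, which is an illustrative choice rather than something the notebook does.

```python
import numpy as np

# Hypothetical values; multiplying infinity by zero produces NaN
values = np.array([np.inf, 1.0, 2.0])

with np.errstate(invalid="ignore"):  # suppress the expected "invalid value" warning
    product = values * 0

# np.isnan marks which elements are NaN
print(np.isnan(product).tolist())  # [True, False, False]
```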
Exercise: Threshold (2 points)
Write a function, threshold, which takes an array and returns a new array with values thresholded by the mean of the array.
Array indexing
The other type of fancy indexing is array indexing. Let's consider our average response across participants:
And let's say we also know which element corresponds to which participant, through the following participants array:
In other words, the first element of avg_responses corresponds to the first element of participants (so participant 45), the second element of avg_responses was given by participant 39, and so on.
Let's say we wanted to know which participants had the largest average response, and which participants had the smallest average response. To do this, we might try sorting the responses:
However, we then don't know which responses correspond to which participants. A different way to do this would be to use np.argsort, which returns an array of indices corresponding to the sorted order of the elements, rather than the elements in sorted order:
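To make the difference concrete, here is a minimal sketch on a four-element array. The values in avg_responses are made up and much smaller than the notebook's data, so the indices differ from those quoted in the text.

```python
import numpy as np

# Hypothetical average responses (illustrative values only)
avg_responses = np.array([0.62, 0.40, 0.18, 0.91])

# np.sort returns the values in ascending order
print(np.sort(avg_responses).tolist())  # [0.18, 0.4, 0.62, 0.91]

# np.argsort returns the positions that would put the values in that order
order = np.argsort(avg_responses)
print(order.tolist())  # [2, 1, 0, 3]
```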
What this says is that element 18 is the smallest response, element 42 is the next smallest response, and so on, all the way to element 24, which is the largest response:
To use fancy indexing, we can actually use this array of integers as an index. If we use it on the original array, then we will obtain the sorted elements:
And if we use it on our array of participants, then we can determine what participants had the largest and smallest responses:
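A sketch of both uses of the index array, with small made-up avg_responses and participants arrays standing in for the notebook's data:

```python
import numpy as np

# Hypothetical data: one average response per participant (illustrative values)
avg_responses = np.array([0.62, 0.40, 0.18, 0.91])
participants = np.array([45, 39, 10, 47])

order = np.argsort(avg_responses)

# Indexing the responses with the integer array gives them in sorted order
print(avg_responses[order].tolist())  # [0.18, 0.4, 0.62, 0.91]

# Indexing the participants with the same array keeps the pairing intact
print(participants[order].tolist())   # [10, 39, 45, 47]
```

Because the same index array is applied to both arrays, the first entry of each result belongs to the participant with the smallest response, and the last entry to the participant with the largest.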
So, in this case, participant 10 had the smallest average response, while participant 47 had the largest average response.
From boolean to integer indices
Sometimes, we want to use a combination of boolean and array indexing. For example, if we wanted to pull out just the responses for participant 2, a natural approach would be to use boolean indexing:
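The layout of data is not shown in this section, so the sketch below assumes it holds one row of responses per participant, in the same order as participants. Both arrays are made up for illustration.

```python
import numpy as np

# Hypothetical participant IDs and a (participants x trials) response array
participants = np.array([45, 39, 2, 47])
data = np.array([
    [310.0, 295.0, 330.0],
    [280.0, 305.0, 290.0],
    [150.0, 420.0, 260.0],
    [500.0, 480.0, 510.0],
])

# The boolean array selects the rows where the participant ID is 2
is_participant_2 = participants == 2
responses = data[is_participant_2]
print(responses.tolist())  # [[150.0, 420.0, 260.0]]
```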
Another way that we could do this would be to determine the index of participant 2, and then use that to index into data. To do this, we can use a function called np.argwhere, which returns the indices of elements that are true:
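A minimal sketch of this approach, assuming made-up arrays in which data holds one row per participant (an assumption about the layout, not something this section confirms). Note that np.argwhere returns a 2D array with one row per match, so the index has to be unpacked before use.

```python
import numpy as np

# Hypothetical participant IDs and a (participants x trials) response array
participants = np.array([45, 39, 2, 47])
data = np.array([
    [310.0, 295.0, 330.0],
    [280.0, 305.0, 290.0],
    [150.0, 420.0, 260.0],
    [500.0, 480.0, 510.0],
])

# One row per True element; each row holds that element's index
index = np.argwhere(participants == 2)
print(index.tolist())  # [[2]]

# Unpack the single integer index and use it to select the row
row = data[index[0, 0]]
print(row.tolist())  # [150.0, 420.0, 260.0]
```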
So in this case, we see that participant 2 corresponds to index 26.
Exercise: Averaging responses (2 points)
In Python, errors are raised with the raise keyword. For example, to raise a ValueError, you would do raise ValueError(message), where message is a string explaining specifically what the error was.
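As an illustration of the syntax only (not a solution to the exercise), here is a small sketch. The function check_reaction_time and its rule are hypothetical.

```python
def check_reaction_time(rt):
    """Hypothetical helper: reject negative reaction times."""
    if rt < 0:
        # The message should say specifically what went wrong
        raise ValueError("reaction time must be non-negative, got {}".format(rt))
    return rt


try:
    check_reaction_time(-5)
except ValueError as err:
    message = str(err)
    print(message)  # reaction time must be non-negative, got -5
```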