Analysis of Algorithms
Code examples from Think Complexity, 2nd edition.
Copyright 2017 Allen Downey, MIT License
Empirical order of growth
Sometimes we can figure out what order of growth a function belongs to by running it with a range of problem sizes and measuring the run time.
To measure runtimes, we'll use etime, which uses os.times to compute the total time used by a process, including "user time" and "system time". User time is time spent running your code; system time is time spent running operating system code on your behalf.
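Here's a minimal sketch of etime, assuming only what the description above says; the book's exact implementation may differ:

```python
import os

def etime():
    """Return the total CPU time (user + system) used by this process."""
    t = os.times()
    return t.user + t.system
```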
time_func takes a function object and a problem size, n, runs the function, and returns the elapsed time.
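A sketch consistent with that description:

```python
def time_func(func, n):
    """Run func(n) and return the elapsed user+system time in seconds."""
    start = etime()
    func(n)
    end = etime()
    return end - start
```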
Here's an example: a function that makes a list with the given length using append.
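A plausible version, with a quick timing run:

```python
def append_list(n):
    """Make a list with the given length using append."""
    t = []
    for i in range(n):
        t.append(i)
    return t

time_func(append_list, 1000000)
```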
run_timing_test takes a function, runs it with a range of problem sizes, and returns two lists: problem sizes and times.
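One way to write it; the doubling schedule and the one-second cutoff are assumptions, not necessarily the book's exact choices:

```python
def run_timing_test(func, max_time=1.0):
    """Time func over a range of problem sizes.

    Doubles n until a single run exceeds max_time seconds.

    returns: list of problem sizes, list of run times
    """
    ns = []
    ts = []
    for i in range(10, 28):
        n = 2**i
        t = time_func(func, n)
        print(n, t)
        if t > 0:
            ns.append(n)
            ts.append(t)
        if t > max_time:
            break
    return ns, ts
```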
Here's an example with append_list:
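```python
ns, ts = run_timing_test(append_list)
```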
fit takes the lists of ns and ts and fits them with a curve of the form a * n**exp, where exp is a given exponent and a is chosen so that the line goes through a particular point in the sequence, usually the last.
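A simple implementation of that idea (a sketch using plain Python lists):

```python
def fit(ns, ts, exp=1.0, index=-1):
    """Fit a curve of the form a * n**exp to the data.

    Chooses a so the curve passes through (ns[index], ts[index]);
    by default, the last point.

    returns: list of fitted run times
    """
    a = ts[index] / ns[index]**exp
    return [a * n**exp for n in ns]
```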
plot_timing_test plots the results.
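A sketch using Matplotlib; the styling choices are assumptions:

```python
import matplotlib.pyplot as plt

def plot_timing_test(ns, ts, label='', exp=1.0):
    """Plot measured run times and a fitted curve on a log-log scale."""
    plt.plot(ns, fit(ns, ts, exp), color='gray', linestyle='dashed',
             label='exp = %g' % exp)
    plt.plot(ns, ts, 'o-', label=label)
    plt.xscale('log')
    plt.yscale('log')
    plt.xlabel('Problem size (n)')
    plt.ylabel('Run time (seconds)')
    plt.legend()
```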
Here are the results from append_list. When we plot ts versus ns on a log-log scale, we should get a straight line. If the order of growth is linear, the slope of the line should be 1.
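```python
plot_timing_test(ns, ts, 'append_list', exp=1)
```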
For small values of n, the runtime is so short that we're probably not getting an accurate measurement of just the operation we're interested in. But as n increases, runtime seems to converge to a line with slope 1. That suggests that performing append n times is linear, which suggests that a single append is constant time.
list.pop
Now let's try that with pop, and specifically with pop(0), which pops from the left side of the list.
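A version of pop_left_list consistent with the description, tested with the same harness as before:

```python
def pop_left_list(n):
    """Make a list of n elements, then pop them all from the left."""
    t = list(range(n))
    for _ in range(n):
        t.pop(0)

ns, ts = run_timing_test(pop_left_list)
plot_timing_test(ns, ts, 'pop_left_list', exp=1)
```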
That doesn't look very good. The runtimes increase more steeply than the line with slope 1. Let's try slope 2.
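```python
plot_timing_test(ns, ts, 'pop_left_list', exp=2)
```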
The last few points converge on the line with slope 2, which suggests that pop(0) is quadratic.
Exercise: What happens if you pop from the end of the list? Write a function called pop_right_list that pops the last element instead of the first. Use run_timing_test to characterize its performance. Then use plot_timing_test with a few different values of exp to find the slope that best matches the data. What conclusion can you draw about the order of growth for popping an element from the end of a list?
Sorting
We expect sorting to be n log n. On a log-log scale, that doesn't look like a straight line, so there's no simple test whether it's really n log n. Nevertheless, we can plot results for sorting lists with different lengths, and see what it looks like.
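For example, a function that sorts a list of random values (sorting random data is an assumption; sorted or constant input could hit best-case behavior in Python's sort):

```python
import random

def sort_list(n):
    """Make a list of n random values and sort it in place."""
    t = [random.random() for _ in range(n)]
    t.sort()

ns, ts = run_timing_test(sort_list)
plot_timing_test(ns, ts, 'sort_list', exp=1)
```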
It sure looks like sorting is linear, so that's surprising. But remember that log n changes much more slowly than n. Over a wide range of values, n log n can be hard to distinguish from an algorithm with linear growth. As n gets bigger, we would expect this curve to be steeper than slope 1. But often, for practical problem sizes, n log n is as good as linear.
Bisection search
We expect bisection search to be log n, which is so fast it is hard to measure the way we've been doing it. To make it work, I create the sorted list ahead of time and use the parameter hi to specify which part of the list to search. Also, I have to run each search 100,000 times so it takes long enough to measure.
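Here's a sketch of that setup; the list size and the helper name search_sorted_list are assumptions:

```python
from bisect import bisect

# Build the sorted list once, outside the timed function.
sorted_list = list(range(10**7))

def search_sorted_list(n):
    """Run 100,000 bisection searches in the first n elements.

    The hi parameter restricts each search to sorted_list[:n],
    so we can vary the problem size without rebuilding the list.
    """
    hi = min(n, len(sorted_list))
    for _ in range(100000):
        bisect(sorted_list, hi - 1, hi=hi)

ns, ts = run_timing_test(search_sorted_list)
plot_timing_test(ns, ts, 'search_sorted_list', exp=1)
```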
It looks like the runtime increases slowly as n increases, but it's definitely not linear. To see if it's constant time, we can compare it to the line with slope 0.
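```python
plot_timing_test(ns, ts, 'search_sorted_list', exp=0)
```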
Nope, looks like it's not constant time, either. We can't really conclude that it's log n
based on this curve alone, but the results are certainly consistent with that theory.
Dictionary methods
Exercise: Write functions called add_dict and lookup_dict, based on append_list and pop_left_list. What is the order of growth for adding and looking up elements in a dictionary?
Implementing a hash table
The reason Python dictionaries can add and look up elements in constant time is that they are based on hash tables. In this section, we'll implement a hash table in Python. Remember that this example is for educational purposes only. There is no practical reason to write a hash table like this in Python.
We'll start with a simple linear map, which is a list of key-value pairs.
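Here's one way to write it (a sketch; add does not check for duplicate keys):

```python
class LinearMap:
    """A map implemented as a flat list of key-value pairs."""

    def __init__(self):
        self.items = []

    def add(self, k, v):
        """Append a key-value pair."""
        self.items.append((k, v))

    def get(self, k):
        """Linear search for the given key; raise KeyError if absent."""
        for key, val in self.items:
            if key == k:
                return val
        raise KeyError(k)
```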
First let's make sure it works:
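```python
lmap = LinearMap()
lmap.add('a', 1)
lmap.add('b', 2)
lmap.get('b')  # should return 2
```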
Now let's see how long it takes to add n elements.
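A test function in the same pattern as append_list; the name add_linear_map is an assumption:

```python
def add_linear_map(n):
    """Add n key-value pairs to a LinearMap."""
    lmap = LinearMap()
    for i in range(n):
        lmap.add(i, 1)
    return lmap

ns, ts = run_timing_test(add_linear_map)
plot_timing_test(ns, ts, 'add_linear_map', exp=1)
```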
Adding n elements is linear, so each add is constant time. How about lookup?
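A test that adds n pairs and then looks up each one; the adds are linear, so the quadratic lookups dominate the total:

```python
def lookup_linear_map(n):
    """Add n pairs to a LinearMap, then look up each key."""
    lmap = add_linear_map(n)
    for i in range(n):
        lmap.get(i)

ns, ts = run_timing_test(lookup_linear_map)
plot_timing_test(ns, ts, 'lookup_linear_map', exp=2)
```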
Looking up n elements is quadratic (notice that exp=2). So each lookup is linear.
Let's see what happens if we break the list of key-value pairs into 100 lists.
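A sketch of that idea: use the hash of the key to choose which of the embedded LinearMaps to use. The test function lookup_better_map follows the same pattern as lookup_linear_map.

```python
class BetterMap:
    """A map that distributes key-value pairs across several LinearMaps."""

    def __init__(self, n=100):
        self.maps = [LinearMap() for _ in range(n)]

    def find_map(self, k):
        """Use the hash of the key to choose a LinearMap."""
        index = hash(k) % len(self.maps)
        return self.maps[index]

    def add(self, k, v):
        self.find_map(k).add(k, v)

    def get(self, k):
        return self.find_map(k).get(k)

def lookup_better_map(n):
    """Add n pairs to a BetterMap, then look up each key."""
    bmap = BetterMap()
    for i in range(n):
        bmap.add(i, 1)
    for i in range(n):
        bmap.get(i)

ns, ts = run_timing_test(lookup_better_map)
plot_timing_test(ns, ts, 'lookup_better_map', exp=1)
```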
The run time is better (we get to a larger value of n before we run out of time).
The order of growth is hard to characterize. It looks steeper than the line with slope 1. Let's try slope 2.
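```python
plot_timing_test(ns, ts, 'lookup_better_map', exp=2)
```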
It might be converging to the line with slope 2, but it's hard to say anything conclusive without running larger problem sizes.
Exercise: Go back and run run_timing_test with a larger value of max_time and see if the run time converges to the line with slope 2. Just be careful not to make max_time too big.
Now we're ready for a complete implementation of a hash map.
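A sketch of a resizing hash map built on BetterMap: it starts with 2 embedded maps and, whenever the number of items reaches the number of maps, doubles the number of maps and rehashes the existing items. The details are assumptions consistent with the description.

```python
class HashMap:
    """A hash table that grows so lookups stay fast.

    Keeps the average number of items per LinearMap at or below 1
    by doubling the number of maps when necessary.
    """

    def __init__(self):
        self.maps = BetterMap(2)
        self.num = 0

    def get(self, k):
        return self.maps.get(k)

    def add(self, k, v):
        # Resize before the load factor exceeds 1.
        if self.num == len(self.maps.maps):
            self.resize()
        self.maps.add(k, v)
        self.num += 1

    def resize(self):
        """Double the number of LinearMaps and re-add the existing items."""
        new_map = BetterMap(len(self.maps.maps) * 2)
        for m in self.maps.maps:
            for k, v in m.items:
                new_map.add(k, v)
        self.maps = new_map
```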
Exercise: Write a function called lookup_hash_map, based on lookup_better_map, and characterize its run time.
If things go according to plan, the results should converge to a line with slope 1. Which means that n lookups is linear, which means that each lookup is constant time. Which is pretty much magic.