NK models of evolution
Code examples from Think Complexity, 2nd edition.
Copyright 2016 Allen Downey, MIT License
The NK landscape
Here's an implementation of an NK landscape.
A location in the landscape is represented by a NumPy array of N 0s and 1s.
The `index` attribute of `NKLandscape` is an array of indices into a location, which is an efficient way to select the overlapping slices.
The `cache` attribute is a dictionary that maps from an `(i, slice)` pair to a fitness, where `i` indicates which of the `N` functions we want to evaluate and `slice` contains the parameters of the function.
The first time we see a particular `(i, slice)` pair, we generate a random fitness value. Then we store it in the `cache` in case we need it again.
Here's a small example. The `index` shows how the traits are linked. Trait 0 is linked to traits 1 and 2. Trait 1 is linked to traits 2 and 3, etc.
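The code cells themselves are not reproduced here, so the following is a minimal sketch of what the implementation might look like; the constructor signature and the `lookup` helper are assumptions, not Downey's exact code.

```python
import numpy as np

class NKLandscape:
    """Sketch of an NK landscape (illustrative, not the book's exact code).

    A location is a NumPy array of N 0s and 1s.  Fitness contribution i
    depends on trait i and the K traits that follow it (wrapping around).
    """
    def __init__(self, N, K, seed=17):
        self.N, self.K = N, K
        self.rng = np.random.default_rng(seed)
        # index[i] selects the slice for contribution i: traits i..i+K (mod N)
        self.index = (np.arange(N).reshape(-1, 1) + np.arange(K + 1)) % N
        self.cache = {}

    def lookup(self, i, s):
        """Random fitness for (i, slice), generated once and then cached."""
        return self.cache.setdefault((i, s), self.rng.random())

    def fitness(self, loc):
        """Mean of the N contributions at this location."""
        return np.mean([self.lookup(i, tuple(s))
                        for i, s in enumerate(loc[self.index])])

nk = NKLandscape(5, 2)
print(nk.index)
# row 0 is [0, 1, 2]: trait 0 is linked to traits 1 and 2, and so on
```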
Here's an example that evaluates the fitness at a random location:
Here's what the landscape cache looks like after one evaluation:
If we evaluate the same location again, we should get the same value.
And if we evaluate a different location, we should get a different value.
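Using the same sketched implementation (re-stated compactly here so the cell stands alone), the caching behavior described above can be demonstrated directly:

```python
import numpy as np

class NKLandscape:  # compact re-sketch (illustrative, not the book's code)
    def __init__(self, N, K, seed=17):
        self.rng = np.random.default_rng(seed)
        self.index = (np.arange(N).reshape(-1, 1) + np.arange(K + 1)) % N
        self.cache = {}
    def fitness(self, loc):
        return np.mean([self.cache.setdefault((i, tuple(s)), self.rng.random())
                        for i, s in enumerate(loc[self.index])])

nk = NKLandscape(5, 2)
loc = np.array([0, 1, 1, 0, 1])

f1 = nk.fitness(loc)
print(len(nk.cache))      # 5: one (i, slice) entry per trait

f2 = nk.fitness(loc)      # same location again: cached values, same fitness
other = np.array([1, 0, 1, 1, 0])
f3 = nk.fitness(other)    # different location: different value (almost surely)
print(f1 == f2, f1 == f3)
```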
The agents
Here's a parent class, `NKAgent`, that contains code used by all agents:
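Here is a sketch of what such a parent class might contain; the method names `mutation` and `consider` are assumptions based on the surrounding text, and the landscape is re-sketched compactly so the cell runs on its own.

```python
import numpy as np

class NKLandscape:  # compact re-sketch (illustrative)
    def __init__(self, N, K, seed=17):
        self.N = N
        self.rng = np.random.default_rng(seed)
        self.index = (np.arange(N).reshape(-1, 1) + np.arange(K + 1)) % N
        self.cache = {}
    def fitness(self, loc):
        return np.mean([self.cache.setdefault((i, tuple(s)), self.rng.random())
                        for i, s in enumerate(loc[self.index])])

class NKAgent:
    """Code shared by all agents: a location, its fitness, and moving logic."""
    def __init__(self, landscape):
        self.landscape = landscape
        self.loc = np.random.randint(2, size=landscape.N)  # random start
        self.fitness = landscape.fitness(self.loc)
        self.moves = 0
        self.done = False      # True once the agent decides it is on a peak

    def mutation(self, direction):
        """Copy of the current location with the bit at `direction` flipped."""
        new_loc = self.loc.copy()
        new_loc[direction] ^= 1
        return new_loc

    def consider(self, new_loc):
        """Move to new_loc if it is fitter; return whether we moved."""
        new_fitness = self.landscape.fitness(new_loc)
        if new_fitness > self.fitness:
            self.loc, self.fitness, self.moves = new_loc, new_fitness, self.moves + 1
            return True
        return False

agent = NKAgent(NKLandscape(5, 2))
direction = np.random.randint(agent.landscape.N)   # choose a random direction
moved = agent.consider(agent.mutation(direction))  # and consider moving
print(moved, agent.fitness)
```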
Here's an example using the NKAgent parent class.
We can choose a random direction.
And consider moving.
The following loop considers every direction, in random order, and accepts the first acceptable move.
Now we can encapsulate that strategy, called the "fitter" strategy, in a class:
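One way the "fitter" strategy might be encapsulated is sketched below; the implementation details (and the compact landscape it reuses) are assumptions, not the book's exact code.

```python
import numpy as np

class NKLandscape:  # compact re-sketch (illustrative)
    def __init__(self, N, K, seed=17):
        self.N = N
        self.rng = np.random.default_rng(seed)
        self.index = (np.arange(N).reshape(-1, 1) + np.arange(K + 1)) % N
        self.cache = {}
    def fitness(self, loc):
        return np.mean([self.cache.setdefault((i, tuple(s)), self.rng.random())
                        for i, s in enumerate(loc[self.index])])

class NKAgentFitter:
    """Tries every direction in random order; takes the first fitter neighbor."""
    def __init__(self, landscape):
        self.landscape = landscape
        self.loc = np.random.randint(2, size=landscape.N)
        self.fitness = landscape.fitness(self.loc)
        self.done = False

    def step(self):
        if self.done:
            return False
        for d in np.random.permutation(self.landscape.N):
            new_loc = self.loc.copy()
            new_loc[d] ^= 1                     # one-bit mutation
            new_fitness = self.landscape.fitness(new_loc)
            if new_fitness > self.fitness:      # first acceptable move wins
                self.loc, self.fitness = new_loc, new_fitness
                return True
        self.done = True                        # no fitter neighbor: local peak
        return False

agent = NKAgentFitter(NKLandscape(5, 2))
while agent.step():
    pass
print(agent.done, agent.fitness)   # the agent climbs until it reaches a peak
```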
Exercise: Implement the other strategies described by Vidgen and Padget in Sendero. Write a class definition called `NKAgentMutant` that implements the one-mutant neighbor strategy and a class definition called `NKAgentGreedy` that implements the greedy strategy. Instantiate one of each and invoke their step methods.
The simulator
Here's a class that runs simulations:
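A sketch of what such a class might look like, reusing the fitter agent from above (the class and method names are assumptions; everything is re-stated compactly so the cell stands alone):

```python
import numpy as np

class NKLandscape:  # compact re-sketch (illustrative)
    def __init__(self, N, K, seed=17):
        self.N = N
        self.rng = np.random.default_rng(seed)
        self.index = (np.arange(N).reshape(-1, 1) + np.arange(K + 1)) % N
        self.cache = {}
    def fitness(self, loc):
        return np.mean([self.cache.setdefault((i, tuple(s)), self.rng.random())
                        for i, s in enumerate(loc[self.index])])

class NKAgentFitter:  # compact "fitter" agent (illustrative)
    def __init__(self, landscape):
        self.landscape = landscape
        self.loc = np.random.randint(2, size=landscape.N)
        self.fitness = landscape.fitness(self.loc)
        self.done = False
    def step(self):
        if self.done:
            return False
        for d in np.random.permutation(self.landscape.N):
            new_loc = self.loc.copy()
            new_loc[d] ^= 1
            new_fitness = self.landscape.fitness(new_loc)
            if new_fitness > self.fitness:
                self.loc, self.fitness = new_loc, new_fitness
                return True
        self.done = True
        return False

class NKSimulation:
    """Runs a population of agents until every one has found a peak."""
    def __init__(self, landscape, num_agents=100, agent_maker=NKAgentFitter):
        self.agents = [agent_maker(landscape) for _ in range(num_agents)]
    def step(self):
        # give every agent one chance to move; report how many moved
        return sum(agent.step() for agent in self.agents)
    def run(self):
        while self.step():
            pass
    def get_fitnesses(self):
        return [agent.fitness for agent in self.agents]

sim = NKSimulation(NKLandscape(5, 2), num_agents=20)
sim.run()
print(np.mean(sim.get_fitnesses()))
```

Because each move strictly increases an agent's fitness and the landscape is finite, `run` is guaranteed to terminate.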
And here's an example with small values of `N` and `K`. Initially the distribution of fitness is centered around 0.5.
Here's how the distribution of fitness evolves after each step.
After every agent has found a peak, we can plot the distribution of fitness.
The number of unique locations is a lower bound on the number of peaks.
And we can look at the heights of the peaks.
Here's the distribution of path lengths. A few agents are born on a peak. The longest path is probably 5 steps or fewer.
This function encapsulates the steps for running a simulation:
Here's the same small example:
This function takes a completed simulation and summarizes the results.
And here are the results with small values of `N` and `K`.
Here's a simulation run with larger `N` and `K`:
Exercise: Starting with `N=5` and `K=3`, run simulations with increasing values of `N`, keeping `K` constant, and plot the number of peaks and mean path length as a function of `N`.
Exercise: Starting with `N=20` and `K=0`, run simulations with increasing values of `K`, keeping `N` constant, and plot the number of peaks and mean path length as a function of `K`.
Neutral networks
Consider the extension of the NK model proposed by Newman and Engelhardt (1998), in which fitness values are quantized rather than continuous. Below is a subclass of NKLandscape that implements this extension, which is sometimes called the NKq model.
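Here is one way such a subclass could look. In this sketch each contribution is drawn uniformly from `F` integer levels; the scale is an assumption (what matters for the NKq model is the quantization, not the units), and the landscape is re-sketched compactly so the cell stands alone.

```python
import numpy as np

class NKLandscape:  # compact re-sketch (illustrative)
    def __init__(self, N, K, seed=17):
        self.N = N
        self.rng = np.random.default_rng(seed)
        self.index = (np.arange(N).reshape(-1, 1) + np.arange(K + 1)) % N
        self.cache = {}
    def lookup(self, i, s):
        return self.cache.setdefault((i, s), self.rng.random())
    def fitness(self, loc):
        return np.mean([self.lookup(i, tuple(s))
                        for i, s in enumerate(loc[self.index])])

class NKqLandscape(NKLandscape):
    """NKq variant: each contribution is one of F equally likely integer levels,
    so many distinct locations end up with exactly the same total fitness."""
    def __init__(self, N, K, F, seed=17):
        super().__init__(N, K, seed)
        self.F = F
    def lookup(self, i, s):
        return self.cache.setdefault((i, s), self.rng.integers(self.F))

nkq = NKqLandscape(10, 2, F=2)
loc = np.zeros(10, dtype=int)
print(nkq.fitness(loc))   # a multiple of 1/10, since each contribution is 0 or 1
```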
The NKq model gives rise to "neutral networks". A neutral network is "a set of sequences [locations] that all possess the same fitness and that are connected together via ... point mutations" (p. 1335). Newman and Engelhardt (hereafter NE) discover some striking properties of these networks, which, they suggest, greatly improve the performance of single-mutation evolutionary search on rugged fitness landscapes. In the exercises below, we ask you to replicate and interpret some of their results.
Detecting and analyzing neutral networks
To get you started, here's a class that might help. It keeps track of a collection of nodes that are at the same fitness level. It figures out which nodes are one-bit neighbors and creates edges between them. Then it uses NetworkX to find all connected components.
The tricky part of this implementation is that it converts each location from an array of 0s and 1s to an integer; it uses these integers as nodes in the graph, and it uses bitwise operations on these integers to check for neighbor relationships.
Here's a small example that shows how it works.
When we add another node that is a neighbor of the first, we get an edge between them.
Here's another node that is also a neighbor of the first (but not of the second).
And here's a node that's not connected.
If we print the sizes of the connected components, we get one set of 3 nodes and 1 unconnected node.
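The steps just described can be sketched as follows; the class name `GraphOfLoc` comes from the text, but the method names and details are assumptions.

```python
import networkx as nx

class GraphOfLoc:
    """Sketch of a graph of locations at one fitness level (illustrative).

    Locations are stored as integers; two locations are neighbors if their
    bitwise XOR has exactly one bit set (one point mutation apart).
    """
    def __init__(self):
        self.graph = nx.Graph()

    def add(self, loc_int):
        others = list(self.graph.nodes)
        self.graph.add_node(loc_int)
        for other in others:
            diff = loc_int ^ other
            if diff and diff & (diff - 1) == 0:   # exactly one bit set
                self.graph.add_edge(loc_int, other)

    def component_sizes(self):
        return sorted(len(c) for c in nx.connected_components(self.graph))

g = GraphOfLoc()
g.add(0b0000)      # first node
g.add(0b0001)      # neighbor of the first -> edge
g.add(0b0100)      # neighbor of the first, but not of the second
g.add(0b1011)      # not connected to anything
print(g.component_sizes())   # [1, 3]
```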
Now let's find all connected components for all fitness levels in a quantized landscape.
`make_locs` makes an array that contains the binary representation of all locations in a landscape. For `N=10`, there are `2**N = 1024` locations.
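A possible sketch of `make_locs`, using broadcasting to extract the bits of every integer from 0 to `2**N - 1` (the notebook's version may differ):

```python
import numpy as np

def make_locs(N):
    """All 2**N locations, one row per location, as arrays of binary digits."""
    ints = np.arange(2 ** N)
    # bit j of each integer, most significant bit first
    return (ints[:, None] >> np.arange(N - 1, -1, -1)) & 1

locs = make_locs(10)
print(locs.shape)   # (1024, 10)
```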
`collect_graph` enumerates the locations in the landscape, computes the fitness of each, and makes a dictionary that maps from each fitness level to a `GraphOfLoc` that contains the locations at that fitness level.
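The description above can be sketched as follows; compact versions of the pieces already introduced are re-stated so the cell stands alone, and all names and details remain assumptions rather than the notebook's exact code.

```python
import numpy as np
import networkx as nx

class NKqLandscape:  # compact quantized landscape (illustrative)
    def __init__(self, N, K, F, seed=17):
        self.N = N
        self.rng = np.random.default_rng(seed)
        self.index = (np.arange(N).reshape(-1, 1) + np.arange(K + 1)) % N
        self.cache = {}
        self.F = F
    def fitness(self, loc):
        return np.mean([self.cache.setdefault((i, tuple(s)), self.rng.integers(self.F))
                        for i, s in enumerate(loc[self.index])])

class GraphOfLoc:  # compact re-sketch (illustrative)
    def __init__(self):
        self.graph = nx.Graph()
    def add(self, loc_int):
        others = list(self.graph.nodes)
        self.graph.add_node(loc_int)
        for other in others:
            diff = loc_int ^ other
            if diff and diff & (diff - 1) == 0:   # one-bit neighbors
                self.graph.add_edge(loc_int, other)

def make_locs(N):
    ints = np.arange(2 ** N)
    return (ints[:, None] >> np.arange(N - 1, -1, -1)) & 1

def collect_graph(landscape):
    """Map each fitness level to a GraphOfLoc of the locations at that level."""
    powers = 1 << np.arange(landscape.N - 1, -1, -1)
    d = {}
    for loc in make_locs(landscape.N):
        fit = landscape.fitness(loc)
        d.setdefault(fit, GraphOfLoc()).add(int(loc @ powers))
    return d

d = collect_graph(NKqLandscape(6, 2, F=3))
total = sum(g.graph.number_of_nodes() for g in d.values())
print(total)   # 64: every one of the 2**6 locations lands at exactly one level
```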
We can summarize the results by printing the fitness levels and the sizes of the connected components at each level.
Here is a function that extracts the sizes of the components. (Recall that in the language of the NE paper, each component in this graph is a neutral network.)
And here is a function that computes the fraction of locations (or sequences, in NE's terminology) that reside in "common" networks, where a common network is one that is larger than average.
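Such a function might look like this; it only needs the list of network sizes, and the function name is an assumption.

```python
import numpy as np

def prob_common(sizes):
    """Fraction of locations that reside in 'common' networks, i.e., networks
    larger than the average network size. (Sketch; the name is illustrative.)"""
    sizes = np.asarray(sizes)
    common = sizes[sizes > sizes.mean()]
    return common.sum() / sizes.sum()

print(prob_common([1, 1, 1, 1, 16]))   # mean size is 4; only the 16-network is common
```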
Finally, here is a function that runs a neutral network experiment. It takes values for `N`, `K`, and `F` (leaving the default value of `A=2`), creates an appropriate NKq landscape, instantiates a bunch of agents (by default, 100 agents that use the "fitter" strategy), and lets them walk the landscape.
The `run_experiment` function returns three things: a Cdf of neutral network sizes, the fraction of locations that reside in common networks, and the maximum fitness achieved by the agents. You may find these things remarkably useful in the exercises below.
Your turn!
Exercise: Run experiments for a single value of `N` (e.g., 14) and several values of `K` (e.g., 1, 2, and 5). Plot the distribution of neutral network sizes. Compare your results with Figure 1 in the NE paper.
Exercise: Run experiments for a range of `N` values (e.g., 8 to 16) and several values of `K` (e.g., 2, 4, and 6). For each value of `K`, plot the fraction of locations that reside in common neutral networks. Compare your results with Figure 2 in the NE paper.
Exercise: Run experiments for a single pair of `N` and `K` values (e.g., 14 and 4) and a range of `F` values (e.g., 2 to 16). Plot the maximum fitness achieved as a function of `F`. Compare your results with Figure 3 in the NE paper.
Exercise: Putting these findings together, explain in your own words how neutrality helps an evolving population reach a higher level of fitness. How might you be able to harness this phenomenon in designing engineered artifacts and systems?