📚 The CoCalc Library - books, templates and other resources

cocalc-examples / think-complexity-2ed / notebooks / chap03.ipynb

¹³²⁹²³ views
License: OTHER

Kernel: Python 3

Small World Graphs

Code examples from Think Complexity, 2nd edition.

In [1]:

%matplotlib inline

import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import seaborn as sns

from utils import decorate, savefig

# I set the random seed so the notebook 
# produces the same results every time.
np.random.seed(17)

# TODO: remove this when NetworkX is fixed
from warnings import simplefilter
import matplotlib.cbook
simplefilter("ignore", matplotlib.cbook.mplDeprecation)

In [2]:

# node colors for drawing networks
colors = sns.color_palette('pastel', 5)
#sns.palplot(colors)
sns.set_palette(colors)

Regular ring lattice

To make a ring lattice, I'll start with a generator function that yields edges between each node and the next halfk neighbors.

In [3]:

def adjacent_edges(nodes, halfk):
    """Yields edges between each node and `halfk` neighbors.
    
    halfk: number of edges from each node
    """
    n = len(nodes)
    for i, u in enumerate(nodes):
        for j in range(i+1, i+halfk+1):
            v = nodes[j % n]
            yield u, v

We can test it with 3 nodes and halfk=1

In [4]:

nodes = range(3)
for edge in adjacent_edges(nodes, 1):
    print(edge)

Now we use adjacent_edges to write make_ring_lattice

In [5]:

def make_ring_lattice(n, k):
    """Makes a ring lattice with `n` nodes and degree `k`.
    
    Note: this only works correctly if k is even.
    
    n: number of nodes
    k: degree of each node
    """
    G = nx.Graph()
    nodes = range(n)
    G.add_nodes_from(nodes)
    G.add_edges_from(adjacent_edges(nodes, k//2))
    return G

And we can test it out with n=10 and k=4

In [6]:

lattice = make_ring_lattice(10, 4)

In [7]:

nx.draw_circular(lattice, 
                 node_color='C0', 
                 node_size=1000, 
                 with_labels=True)

savefig('figs/chap03-1')

Exercise: To see how this function fails when k is odd, run it again with k=3 or k=5.

WS graph

To make a WS graph, you start with a ring lattice and then rewire.

In [8]:

def make_ws_graph(n, k, p):
    """Makes a Watts-Strogatz graph.
    
    n: number of nodes
    k: degree of each node
    p: probability of rewiring an edge
    """
    ws = make_ring_lattice(n, k)
    rewire(ws, p)
    return ws

Here's the function that does the rewiring

In [9]:

def rewire(G, p):
    """Rewires each edge with probability `p`.
    
    G: Graph
    p: float
    """
    nodes = set(G)
    for u, v in G.edges():
        if flip(p):
            choices = nodes - {u} - set(G[u])
            new_v = np.random.choice(list(choices))
            G.remove_edge(u, v)
            G.add_edge(u, new_v)
            
def flip(p):
    """Returns True with probability `p`."""
    return np.random.random() < p

Here's an example with p=0.2

In [10]:

ws = make_ws_graph(10, 4, 0.2)
nx.draw_circular(ws, 
                 node_color='C1', 
                 node_size=1000, 
                 with_labels=True)

Just checking that we have the same number of edges we started with:

In [11]:

len(lattice.edges()), len(ws.edges())

Now I'll generate a plot that shows WS graphs for a few values of p

In [12]:

n = 10
k = 4
ns = 100

plt.subplot(1,3,1)
ws = make_ws_graph(n, k, 0)
nx.draw_circular(ws, node_size=ns)
plt.axis('equal')

plt.subplot(1,3,2)
ws = make_ws_graph(n, k, 0.2)
nx.draw_circular(ws, node_size=ns)
plt.axis('equal')

plt.subplot(1,3,3)
ws = make_ws_graph(n, k, 1.0)
nx.draw_circular(ws, node_size=ns)
plt.axis('equal')

#TODO: Set figure size
savefig('figs/chap03-2')

Exercise: What is the order of growth of rewire?

In [13]:

# Solution goes here

Clustering

The following function computes the local clustering coefficient for a given node, u:

In [14]:

def node_clustering(G, u):
    """Computes local clustering coefficient for `u`.
    
    G: Graph
    u: node
    
    returns: float
    """
    neighbors = G[u]
    k = len(neighbors)
    if k < 2:
        return np.nan
        
    possible = k * (k-1) / 2
    exist = 0    
    for v, w in all_pairs(neighbors):
        if G.has_edge(v, w):
            exist +=1
    return exist / possible

def all_pairs(nodes):
    """Generates all pairs of nodes."""
    for i, u in enumerate(nodes):
        for j, v in enumerate(nodes):
            if i < j:
                yield u, v

The network average clustering coefficient is just the mean of the local CCs.

In [15]:

def clustering_coefficient(G):
    """Average of the local clustering coefficients.
    
    G: Graph
    
    returns: float
    """
    cu = [node_clustering(G, node) for node in G]
    return np.nanmean(cu)

In a ring lattice with k=4, the clustering coefficient for each node should be 0.5

In [16]:

lattice = make_ring_lattice(10, 4)
node_clustering(lattice, 1)

And the network average should be 0.5

In [17]:

clustering_coefficient(lattice)

Correct.

In [18]:

%timeit clustering_coefficient(lattice)

Exercise: Write a version of node_clustering that replaces the for loop with a list comprehension. Is it faster?

In [19]:

# Solution goes here

In [20]:

%timeit clustering_coefficient(lattice)

Exercise: What is the order of growth of clustering_coefficient in terms of n, m, and k?

In [21]:

# Solution goes here

Path length

The following function computes path lengths between all pairs of nodes

In [22]:

def path_lengths(G):
    length_iter = nx.shortest_path_length(G)
    for source, dist_map in length_iter:
        for dest, dist in dist_map.items():
            yield dist

The characteristic path length is the mean path length for all pairs.

In [23]:

def characteristic_path_length(G):
    return np.mean(list(path_lengths(G)))

On a complete graph, the average path length should be 1

In [24]:

complete = nx.complete_graph(10)
characteristic_path_length(complete)

On a ring lattice with n=1000 and k=10, the mean is about 50

In [25]:

lattice = make_ring_lattice(1000, 10)
characteristic_path_length(lattice)

Exercise: What is the mean path length in a ring lattice with n=10 and k=4?

In [26]:

# Solution goes here

The experiment

This function generates a WS graph with the given parameters and returns a pair of (mean path length, clustering coefficient):

In [27]:

def run_one_graph(n, k, p):
    """Makes a WS graph and computes its stats.
    
    n: number of nodes
    k: degree of each node
    p: probability of rewiring
    
    returns: tuple of (mean path length, clustering coefficient)
    """
    ws = make_ws_graph(n, k, p)    
    mpl = characteristic_path_length(ws)
    cc = clustering_coefficient(ws)
    print(mpl, cc)
    return mpl, cc

With n=1000 and k=10, it takes a few seconds on my computer:

In [28]:

%time run_one_graph(1000, 10, 0.01)

Now we'll run it with a range of values for p.

In [29]:

ps = np.logspace(-4, 0, 9)
print(ps)

This function runs each value of p 20 times and returns a dictionary that maps from each p to a list of (mpl, cc) pairs.

In [30]:

def run_experiment(ps, n=1000, k=10, iters=20):
    """Computes stats for WS graphs with a range of `p`.
    
    ps: sequence of `p` to try
    n: number of nodes
    k: degree of each node
    iters: number of times to run for each `p`
    
    returns:
    """
    res = []
    for p in ps:
        print(p)
        t = [run_one_graph(n, k, p) for _ in range(iters)]
        means = np.array(t).mean(axis=0)
        print(means)
        res.append(means)
    return np.array(res)

Here are the raw results. Warning: this takes a few minutes to run.

In [31]:

%time res = run_experiment(ps)

In [32]:

res

Let's get the results into a form that's easy to plot.

In [33]:

L, C = np.transpose(res)

In [34]:

In [35]:

And normalize them so they both start at 1.0

In [36]:

L /= L[0]
C /= C[0]

Here's the plot that replicates Watts and Strogatz's Figure 2.

In [37]:

plt.plot(ps, C, 's-', linewidth=1, label='C(p) / C(0)')
plt.plot(ps, L, 'o-', linewidth=1, label='L(p) / L(0)')
decorate(xlabel='Rewiring probability (p)', xscale='log',
         title='Normalized clustering coefficient and path length',
         xlim=[0.00009, 1.1], ylim=[-0.01, 1.01])

savefig('figs/chap03-3')

Breadth-first search

Now let's see how the shortest path algorithm works. We'll start with BFS, which is the basis for Dijkstra's algorithm.

Here's our old friend, the ring lattice:

In [38]:

lattice = make_ring_lattice(10, 4)

In [39]:

nx.draw_circular(lattice, 
                 node_color='C2', 
                 node_size=1000, 
                 with_labels=True)

And here's my implementation of BFS using a deque.

In [40]:

from collections import deque

def reachable_nodes_bfs(G, start):
    """Finds reachable nodes by BFS.
    
    G: graph
    start: node to start at
    
    returns: set of reachable nodes
    """
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(G.neighbors(node))
    return seen

It works:

In [41]:

reachable_nodes_bfs(lattice, 0)

Here's a version that's a little faster, but maybe less readable.

In [42]:

def reachable_nodes_bfs(G, start):
    """Finds reachable nodes by BFS.
    
    G: graph
    start: node to start at
    
    returns: set of reachable nodes
    """
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            neighbors = set(G[node]) - seen
            queue.extend(neighbors)
    return seen

It works, too.

In [43]:

reachable_nodes_bfs(lattice, 0)

Dijkstra's algorithm

Now we're ready for Dijkstra's algorithm, at least for graphs where all the edges have the same weight/length.

In [44]:

def shortest_path_dijkstra(G, source):
    """Finds shortest paths from `source` to all other nodes.
    
    G: graph
    source: node to start at
    
    returns: make from node to path length
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        new_dist = dist[node] + 1

        neighbors = set(G[node]).difference(dist)
        for n in neighbors:
            dist[n] = new_dist
        
        queue.extend(neighbors)
    return dist

Again, we'll test it on a ring lattice.

In [45]:

lattice = make_ring_lattice(10, 4)

In [46]:

nx.draw_circular(lattice, 
                 node_color='C3', 
                 node_size=1000, 
                 with_labels=True)

Here's my implementation:

In [47]:

d1 = shortest_path_dijkstra(lattice, 0)
d1

And here's the result from NetworkX:

In [48]:

d2 = nx.shortest_path_length(lattice, 0)
d2

They are the same:

In [49]:

d1 == d2

Exercise: In a ring lattice with n=1000 and k=10, which node is farthest from 0 and how far is it? Use shortest_path_dijkstra to check your answer.

Note: the maximum distance between two nodes is the diameter of the graph.

In [50]:

# Solution goes here

Exercises

Exercise: In a ring lattice, every node has the same number of neighbors. The number of neighbors is called the degree of the node, and a graph where all nodes have the same degree is called a regular graph.

All ring lattices are regular, but not all regular graphs are ring lattices. In particular, if k is odd, we can't construct a ring lattice, but we might be able to construct a regular graph.

Write a function called make_regular_graph that takes n and k and returns a regular graph that contains n nodes, where every node has k neighbors. If it's not possible to make a regular graph with the given values of n and k, the function should raise a ValueError.

In [51]:

# Solution goes here

In [52]:

# Solution goes here

In [53]:

# Solution goes here

In [54]:

# Solution goes here

Exercise: My implementation of reachable_nodes_bfs is efficient in the sense that it is in $O(n + m)$ , but it incurs a lot of overhead adding nodes to the queue and removing them. NetworkX provides a simple, fast implementation of BFS, available from the NetworkX repository on GitHub.

Here is a version I modified to return a set of nodes:

In [55]:

def plain_bfs(G, start):
    """A fast BFS node generator"""
    seen = set()
    nextlevel = {start}
    while nextlevel:
        thislevel = nextlevel
        nextlevel = set()
        for v in thislevel:
            if v not in seen:
                seen.add(v)
                nextlevel.update(G[v])
    return seen

Compare this function to reachable_nodes_bfs and see which is faster. Then see if you can modify this function to implement a faster version of shortest_path_dijkstra

In [56]:

# Solution goes here

In [57]:

# Solution goes here

In [58]:

# Solution goes here

In [59]:

# Solution goes here

In [60]:

# Solution goes here

In [61]:

# Solution goes here

In [62]:

# Solution goes here

In [63]:

# Solution goes here

Exercise: The following implementation of a BFS contains two performance errors. What are they? What is the actual order of growth for this algorithm?

In [64]:

def bfs(G, start):
    """Breadth-first search on a graph, starting at top_node."""
    visited = set()
    queue = [start]
    while len(queue):
        curr_node = queue.pop(0)    # Dequeue
        visited.add(curr_node)

        # Enqueue non-visited and non-enqueued children
        queue.extend(c for c in G[curr_node]
                     if c not in visited and c not in queue)
    return visited

In [65]:

# Solution goes here

Exercise: In the book, I claimed that Dijkstra's algorithm does not work unless it uses BFS. Write a version of shortest_path_dijkstra that uses DFS and test it on a few examples to see what goes wrong.

In [66]:

# Solution goes here