Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
Download

📚 The CoCalc Library - books, templates and other resources

132923 views
License: OTHER
Kernel: Python 3

Erdos-Renyi Graphs

Code examples from Think Complexity, 2nd edition.

Copyright 2016 Allen Downey, MIT License

%matplotlib inline import matplotlib.pyplot as plt import networkx as nx import numpy as np import seaborn as sns from utils import decorate, savefig # I set the random seed so the notebook # produces the same results every time. np.random.seed(17) # TODO: remove this when NetworkX is fixed from warnings import simplefilter import matplotlib.cbook simplefilter("ignore", matplotlib.cbook.mplDeprecation)
# node colors for drawing networks colors = sns.color_palette('pastel', 5) #sns.palplot(colors) sns.set_palette(colors)

Directed graph

The first example is a directed graph that represents a social network with three nodes.

G = nx.DiGraph() G.add_node('Alice') G.add_node('Bob') G.add_node('Chuck') list(G.nodes())

Here's how we add edges between nodes.

G.add_edge('Alice', 'Bob') G.add_edge('Alice', 'Chuck') G.add_edge('Bob', 'Alice') G.add_edge('Bob', 'Chuck') list(G.edges())

And here's how to draw the graph.

nx.draw_circular(G, node_color='C0', node_size=2000, with_labels=True) plt.axis('equal') savefig('figs/chap02-1')

Exercise: Add another node and a few more edges and draw the graph again.

# Solution goes here

Undirected graph

The second example is an undirected graph that represents cities and the driving times between them.

positions is a dictionary that maps from each city to its coordinates.

positions = dict(Albany=(-74, 43), Boston=(-71, 42), NYC=(-74, 41), Philly=(-75, 40)) positions['Albany']

We can use the keys in pos to add nodes to the graph.

G = nx.Graph() G.add_nodes_from(positions) G.nodes()

drive_times is a dictionary that maps from pairs of cities to the driving times between them.

drive_times = {('Albany', 'Boston'): 3, ('Albany', 'NYC'): 4, ('Boston', 'NYC'): 4, ('NYC', 'Philly'): 2}

We can use the keys from drive_times to add edges to the graph.

G.add_edges_from(drive_times) G.edges()

Now we can draw the graph using positions to indicate the positions of the nodes, and drive_times to label the edges.

nx.draw(G, positions, node_color='C1', node_shape='s', node_size=2500, with_labels=True) nx.draw_networkx_edge_labels(G, positions, edge_labels=drive_times) plt.axis('equal') savefig('figs/chap02-2')

Exercise: Add another city and at least one edge.

# Solution goes here

Complete graph

To make a complete graph, we use a generator function that iterates through all pairs of nodes.

def all_pairs(nodes): for i, u in enumerate(nodes): for j, v in enumerate(nodes): if i < j: yield u, v

make_complete_graph makes a Graph with the given number of nodes and edges between all pairs of nodes.

def make_complete_graph(n): G = nx.Graph() nodes = range(n) G.add_nodes_from(nodes) G.add_edges_from(all_pairs(nodes)) return G

Here's a complete graph with 10 nodes:

complete = make_complete_graph(10) complete.number_of_nodes()

And here's what it looks like.

nx.draw_circular(complete, node_color='C2', node_size=1000, with_labels=True) savefig('figs/chap02-3')

The neighbors method the neighbors for a given node.

list(complete.neighbors(0))

Exercise: Make and draw complete directed graph with 5 nodes.

# Solution goes here

Random graphs

Next we'll make a random graph where the probability of an edge between each pair of nodes is pp.

The helper function flip returns True with probability p and False with probability 1-p

def flip(p): return np.random.random() < p

random_pairs is a generator function that enumerates all possible pairs of nodes and yields each one with probability p

def random_pairs(nodes, p): for edge in all_pairs(nodes): if flip(p): yield edge

make_random_graph makes an ER graph where the probability of an edge between each pair of nodes is p.

def make_random_graph(n, p): G = nx.Graph() nodes = range(n) G.add_nodes_from(nodes) G.add_edges_from(random_pairs(nodes, p)) return G

Here's an example with n=10 and p=0.3

np.random.seed(10) random_graph = make_random_graph(10, 0.3) len(random_graph.edges())

And here's what it looks like:

nx.draw_circular(random_graph, node_color='C3', node_size=1000, with_labels=True) savefig('figs/chap02-4')

Connectivity

To check whether a graph is connected, we'll start by finding all nodes that can be reached, starting with a given node:

def reachable_nodes(G, start): seen = set() stack = [start] while stack: node = stack.pop() if node not in seen: seen.add(node) stack.extend(G.neighbors(node)) return seen

In the complete graph, starting from node 0, we can reach all nodes:

reachable_nodes(complete, 0)

In the random graph we generated, we can also reach all nodes (but that's not always true):

reachable_nodes(random_graph, 0)

We can use reachable_nodes to check whether a graph is connected:

def is_connected(G): start = next(iter(G)) reachable = reachable_nodes(G, start) return len(reachable) == len(G)

Again, the complete graph is connected:

is_connected(complete)

But if we generate a random graph with a low value of p, it's not:

random_graph = make_random_graph(10, 0.1) len(random_graph.edges())
is_connected(random_graph)

Exercise: What do you think it means for a directed graph to be connected? Write a function that checks whether a directed graph is connected.

# Solution goes here

Probability of connectivity

Now let's estimare the probability that a randomly-generated ER graph is connected.

This function takes n and p, generates iters graphs, and returns the fraction of them that are connected.

# version with a for loop def prob_connected(n, p, iters=100): count = 0 for i in range(iters): random_graph = make_random_graph(n, p) if is_connected(random_graph): count += 1 return count/iters
# version with a list comprehension def prob_connected(n, p, iters=100): tf = [is_connected(make_random_graph(n, p)) for i in range(iters)] return np.mean(tf)

With n=10 and p=0.23, the probability of being connected is about 33%.

np.random.seed(17) n = 10 prob_connected(n, 0.23, iters=10000)

According to Erdos and Renyi, the critical value of p for n=10 is about 0.23.

pstar = np.log(n) / n pstar

So let's plot the probability of connectivity for a range of values for p

ps = np.logspace(-1.3, 0, 11) ps

I'll estimate the probabilities with iters=1000

ys = [prob_connected(n, p, 1000) for p in ps] for p, y in zip(ps, ys): print(p, y)

And then plot them, adding a vertical line at the computed critical value

plt.axvline(pstar, color='gray') plt.plot(ps, ys, color='green') decorate(xlabel='Prob of edge (p)', ylabel='Prob connected', xscale='log') savefig('figs/chap02-5')

We can run the same analysis for a few more values of n.

ns = [300, 100, 30] ps = np.logspace(-2.5, 0, 11) sns.set_palette('Blues_r', 4) for n in ns: print(n) pstar = np.log(n) / n plt.axvline(pstar, color='gray', alpha=0.3) ys = [prob_connected(n, p) for p in ps] plt.plot(ps, ys, label='n=%d' % n) decorate(xlabel='Prob of edge (p)', ylabel='Prob connected', xscale='log', xlim=[ps[0], ps[-1]], loc='upper left') savefig('figs/chap02-6')

As n increases, the critical value gets smaller and the transition gets more abrupt.

Exercises

Exercise: In Chapter 2 we analyzed the performance of reachable_nodes and classified it in O(n+m)O(n + m), where nn is the number of nodes and mm is the number of edges. Continuing the analysis, what is the order of growth for is_connected?

def is_connected(G): start = list(G)[0] reachable = reachable_nodes(G, start) return len(reachable) == len(G)
# Solution goes here

Exercise: In my implementation of reachable_nodes, you might be bothered by the apparent inefficiency of adding all neighbors to the stack without checking whether they are already in seen. Write a version of this function that checks the neighbors before adding them to the stack. Does this "optimization" change the order of growth? Does it make the function faster?

def reachable_nodes_precheck(G, start): # FILL THIS IN return []
# Solution goes here
%timeit len(reachable_nodes(complete, 0))
%timeit len(reachable_nodes_precheck(complete, 0))

Exercise: There are actually two kinds of ER graphs. The one we generated in the chapter, G(n,p)G(n, p), is characterized by two parameters, the number of nodes and the probability of an edge between nodes.

An alternative definition, denoted G(n,m)G(n, m), is also characterized by two parameters: the number of nodes, nn, and the number of edges, mm. Under this definition, the number of edges is fixed, but their location is random.

Repeat the experiments we did in this chapter using this alternative definition. Here are a few suggestions for how to proceed:

  1. Write a function called m_pairs that takes a list of nodes and the number of edges, mm, and returns a random selection of mm edges. A simple way to do that is to generate a list of all possible edges and use random.sample.

  2. Write a function called make_m_graph that takes nn and mm and returns a random graph with nn nodes and mm edges.

  3. Make a version of prob_connected that uses make_m_graph instead of make_random_graph.

  4. Compute the probability of connectivity for a range of values of mm.

How do the results of this experiment compare to the results using the first type of ER graph?

# Solution goes here
# Solution goes here
# Solution goes here