Jupyter notebook M725F17/05 - Graph Coloring/Graph Coloring.ipynb

M725F17/05 - Graph Coloring/Graph Coloring.ipynb

¹⁵¹⁶ views

Kernel: Python 2 (SageMath)

Graph Coloring

by Nathan Albin <[email protected]>
created 2/8/2017, updated 2/12/2017

The graph coloring problem

Graph coloring is a famous type of problem in graph theory. The most basic problem of this type is the following. Suppose you are given a network $G=(V,E)$ with vertices $V$ and edges $E$ along with $k$ different colors. Is it possible to assign a color to each vertex in such a way that no two vertices connected by an edge have the same color. For example, a triangle (the complete graph on 3 vertices, $K_3$ ) can be colored with $k\ge 3$ but not with $k<3$ . The smallest number of colors required to color a given graph $G$ is called its chromatic number, $\chi(G)$ . So $\chi(K_3)=3$ . Also, a graph $G$ is bipartite if and only if $\chi(G)=2$ . The famous Four Color Theorem states that every planar graph can be colored with $4$ colors.

One obvious upper bound on the chromatic number of a general graph is $\chi(G) \le |V|$ , just color each vertex a unique color.

A slightly better bound is $\chi(G) \le \max_{v\in V}\deg(v)+1,$ that is, the chromatic number of a graph is no larger than 1 plus the maximum degree of any vertex in $G$ . One way to prove this is via a simple algorithm.

Basic algorithm:

Whenever you must choose a color for a vertex that is still blank, simply pick a color that is different from any colors already assigned to the neighbors.

To begin, below we implement this basic algorithm. Read the code below line by line and make sure you understand what is happening. If you are unsure, don't hesitate to insert a new cell from the toolbar under the word JUPYTER above, and print out some of the objects that are being called.

In [1]:

%matplotlib inline
import matplotlib.pyplot as plt
import networkx as nx

In [2]:

# Create a random graph.  The "seed" ensures reproducibility, which is handy when
# first testing code.  Change the seed to change the graph.
seed = 818829
n = 10
p = 0.3
G = nx.gnp_random_graph(n,p,seed)

# draw the graph
pos = nx.spring_layout(G)
plt.figure(figsize=(5,5))
nx.draw(G,pos,node_color='white',width=2)

# get the maximum vertex degree
max_deg = max( G.degree().values() )
print 'maximum degree = {}'.format(max_deg)

Out[2]:

maximum degree = 5

<matplotlib.figure.Figure at 0x7f059c082d50>

In [15]:

# Create the set of possible colors.
num_colors = max_deg+1
colors = set(xrange(num_colors))

# We'll store the color of each node as an integer property called 'color'.  Start by initializing
# all the colors to -1 (meaning not yet colored)
for v in G.nodes_iter():
    G.node[v]['color'] = -1
    
# loop over all nodes
for v in G.nodes_iter():
    
    print 'processing node {}'.format(v)
    
    # copy the possible colors
    possible = colors.copy()
    
    # find colors of neighbors
    nbrs = set( [ G.node[w]['color'] for w in G.neighbors_iter(v) if G.node[w]['color'] <> -1 ] )
    
    # remove neighbor colors from the set of possible colors
    possible -= nbrs
    
    # pick a color for the current node
    c = next(iter(possible))
    
    print '  neighbor colors: ', nbrs
    print '  possible colors: ', possible
    print '  selected color:  ', c
    
    #update the color
    G.node[v]['color'] = c

Out[15]:

processing node 0
  neighbor colors:  set([])
  possible colors:  set([0, 1, 2, 3, 4, 5])
  selected color:   0
processing node 1
  neighbor colors:  set([0])
  possible colors:  set([1, 2, 3, 4, 5])
  selected color:   1
processing node 2
  neighbor colors:  set([1])
  possible colors:  set([0, 2, 3, 4, 5])
  selected color:   0
processing node 3
  neighbor colors:  set([0])
  possible colors:  set([1, 2, 3, 4, 5])
  selected color:   1
processing node 4
  neighbor colors:  set([1])
  possible colors:  set([0, 2, 3, 4, 5])
  selected color:   0
processing node 5
  neighbor colors:  set([0])
  possible colors:  set([1, 2, 3, 4, 5])
  selected color:   1
processing node 6
  neighbor colors:  set([1])
  possible colors:  set([0, 2, 3, 4, 5])
  selected color:   0
processing node 7
  neighbor colors:  set([0])
  possible colors:  set([1, 2, 3, 4, 5])
  selected color:   1
processing node 8
  neighbor colors:  set([1])
  possible colors:  set([0, 2, 3, 4, 5])
  selected color:   0
processing node 9
  neighbor colors:  set([0])
  possible colors:  set([1, 2, 3, 4, 5])
  selected color:   1
processing node 10
  neighbor colors:  set([0, 1])
  possible colors:  set([2, 3, 4, 5])
  selected color:   2

In [4]:

plt.figure(figsize=(5,5))
nx.draw(G,pos,node_color=[d['color'] for v,d in G.nodes_iter(data=True)],width=2,cmap=plt.cm.Set3)

Out[4]:

<matplotlib.figure.Figure at 0x7f05937edc90>

In [5]:

print G.nodes(data=True)

Out[5]:

[(0, {'color': 0}), (1, {'color': 0}), (2, {'color': 1}), (3, {'color': 1}), (4, {'color': 2}), (5, {'color': 2}), (6, {'color': 3}), (7, {'color': 1}), (8, {'color': 1}), (9, {'color': 0})]

Exercise 1

In the preceding code, notice that I used the Python object called a set. In a sentence or two, explain what properties of a set make it more beneficial than a list for the task at hand. Type your answer where indicated below. (Hint: If you don't know, try asking Google.)

Answer

It is easier to implement set where we need to check uniqness of elements and we need to chack a new element is in the set or not. It is more suitable for set operations, for example union, intersection etc.

A little better visualization

For some graphs, it can be a little tricky to see whether or not the coloring is correct. Here's another way to layout node positions to help with the visualization. With this layout, you only need to verify that there aren't any vertical edges.

In [14]:

def draw_coloring(G,num_colors):
    '''Visualizes a graph coloring.
    
    Inputs:
      G          : a NetworkX graph object with the 'color' property
                   set to an integer between 0 and num_colors-1
                   on each node
      num_colors : the number of colors in the coloring
    
    '''

    # keep track of the number of nodes for each color
    counts = { j:0 for j in xrange(num_colors) }
    
    # initialize an empty position dictionary
    pos = {}
    
    # loop over the nodes
    for v in G.nodes_iter():
        
        # determine the position from the color
        c = G.node[v]['color']
        pos[v] = (c,counts[c])
        
        # update the color count
        counts[c] += 1
        
    # draw the graph
    plt.figure(figsize=(10,5))
    plt.subplot(1,2,1)
    nx.draw(G,node_color=[d['color'] for v,d in G.nodes_iter(data=True)],width=1,edge_color='gray',cmap=plt.cm.Set3)
    plt.subplot(1,2,2)
    nx.draw(G,pos,node_color=[d['color'] for v,d in G.nodes_iter(data=True)],width=1,edge_color='gray',cmap=plt.cm.Set3)
    
# visualize the coloring
draw_coloring(G,num_colors)

Out[14]:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-14-a4862ca21c66> in <module>()
     34 
     35 # visualize the coloring
---> 36 draw_coloring(G,num_colors)

<ipython-input-14-a4862ca21c66> in draw_coloring(G, num_colors)
     20 
     21         # determine the position from the color
---> 22         c = G.node[v]['color']
     23         pos[v] = (c,counts[c])
     24 
KeyError: 'color'

Improving the Bound

It's not too hard to see that the maximum degree bound is overly pessimistic. Here's an example.

In [13]:

# create a wheel graph
n = 11
G = nx.cycle_graph(n-1)
pos = nx.circular_layout(G)
G.add_node(n-1)
G.add_edges_from([(n-1,i) for i in xrange(n-1)])
pos[n-1] = (0,0)

# draw graph
plt.figure(figsize=(5,5))
nx.draw(G,pos,node_color='white',width=2)

Out[13]:

<matplotlib.figure.Figure at 0x7f05915ae650>

The graph above has chromatic number 3, which is much smaller than 11, the bound given by the maximum degree. To improve the estimate of $\chi(G)$ , let's start by sorting the nodes according to degree. The first step is to create a list with entries (deg(a),a).

In [8]:

unsorted_degrees = [ (G.degree(v),v) for v in G.nodes_iter()]
print unsorted_degrees

Out[8]:

[(3, 0), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9), (10, 10)]

Now, let's sort from smallest degree to largest. (Note that this is why we chose the ordering (deg(v),v) rather than (v,deg(v)).) The sorting algorithm compares the first entries of each tuple, then the second, and so on.

In [9]:

sorted_degrees = sorted(unsorted_degrees)
print sorted_degrees

Out[9]:

[(3, 0), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9), (10, 10)]

If we now color the nodes in this order, we might expect to do better than the max-degree-plus-one bound. By starting with small-degree nodes first, we hope to minimize the number of color conflicts as we choose the color of later nodes. The following code shows the first few steps.

In [10]:

# initialize node colors to -1 (currently uncolored)
for v in G.nodes_iter():
    G.node[v]['color'] = -1
    
# look at the first node in the sorted list
# (The double-underscore '__' is just a dummy variable.
# We needed the degrees to sort the nodes, but now we can
# discard that information.)
__,v = sorted_degrees[0]
print 'processing node {}'.format(v)

# form a list of the colors already assigned to neighbors of this node.
# (of course, there won't be any.)
nbrs = set( [ G.node[w]['color'] for w in G.neighbors_iter(v) if G.node[w]['color'] != -1] )
print '  neighbor colors: ', nbrs

# find the first color 0, 1, 2, ... not present in the current list
new_color = 0
while new_color in nbrs:
    new_color += 1
print '  new node color:  ', new_color

# color the new node
G.node[v]['color'] = new_color

# look at the next node in the list
__,v = sorted_degrees[1]
print 'processing node {}'.format(v)

# form a list of the colors already assigned to neighbors of this node.
nbrs = set( [ G.node[w]['color'] for w in G.neighbors_iter(v) if G.node[w]['color'] != -1] )
print '  neighbor colors: ', nbrs

# find the first color 0, 1, 2, ... not present in the current list
new_color = 0
while new_color in nbrs:
    new_color += 1
print '  new node color:  ', new_color

# color the new node
G.node[v]['color'] = new_color

Out[10]:

processing node 0
  neighbor colors:  set([])
  new node color:   0
processing node 1
  neighbor colors:  set([0])
  new node color:   1

Exercise 2

Complete the following function to compute a graph coloring. Be sure to read the function's docstring so you know what the function is supposed to do.

In [21]:

def color_graph(G):
    '''Finds a coloring of the graph G.
    
    Inputs:
      G     : a NetworkX graph
      
    Output:
      num_colors : an integer representing the number of colors used
      
    Side effects:
      To aid in debugging, the function will output some information as
      it processes the network.  For a real application, you'd probably
      want to remove this, or make it optional.
    
      When this function completes, each node of G will have its 'color'
      attribute set to an integer between 0 and num_colors-1, indicating
      this nodes color in the coloring.
    
    '''
    
    # initialize node colors to -1 (currently uncolored)
    for v in G.nodes_iter():
     G.node[v]['color'] = -1
        
    # initialize number of colors
    num_colors = 0
        
    # sort nodes by degree
    unsorted_degrees = [ (G.degree(v),v) for v in G.nodes_iter()]
    sorted_degrees = sorted(unsorted_degrees)
    
    # loop over nodes
    for __,v in sorted_degrees:
        
        print 'processing node {}'.format(v)

        # form a list of the colors already assigned to neighbors of this node.
        nbrs = set( [ G.node[w]['color'] for w in G.neighbors_iter(v) if G.node[w]['color'] <> -1 ] )
    
        print '  neighbor colors: ', nbrs

        # find the first color 0, 1, 2, ... not present in the current list
        new_color = 0
        while new_color in nbrs:
            new_color += 1
        print '  new node color:  ', new_color

        # color the new node
        c = next(iter(possible))

        # update the number of colors
        num_colors = max(num_colors,new_color+1)
        
    return num_colors

When you're finished, your code should be able to color the wheel with 3 colors.

In [18]:

num_colors = color_graph(G)
draw_coloring(G,num_colors)

Out[18]:

processing node 0
  neighbor colors:  set([0, 1])
  new node color:   2
processing node 1
  neighbor colors:  set([0, 1])
  new node color:   2
processing node 2
  neighbor colors:  set([0, 1])
  new node color:   2
processing node 3
  neighbor colors:  set([0, 1])
  new node color:   2
processing node 4
  neighbor colors:  set([0, 1])
  new node color:   2
processing node 5
  neighbor colors:  set([0, 1])
  new node color:   2
processing node 6
  neighbor colors:  set([0, 1])
  new node color:   2
processing node 7
  neighbor colors:  set([0, 1])
  new node color:   2
processing node 8
  neighbor colors:  set([0, 1])
  new node color:   2
processing node 9
  neighbor colors:  set([0, 1])
  new node color:   2
processing node 10
  neighbor colors:  set([0, 1])
  new node color:   2

<matplotlib.figure.Figure at 0x7f05935d1c50>

Some tests

Your code will be compared with a correct solution to the following graphs. Here's how you can try to check your code yourself.

Save your notebook!
In 'Kernel' menu above, select 'Restart Kernel' and click the red 'Restart button'.
Why? The kernel is a persistent Python interpreter that serves the Jupyter notebook. Every time you run a Jupyter cell, the notebook server passes the Python commands to the interpreter, receives any response sent back, and formats it for your web browser. Every Python object or function you define is managed in memory by this Python interpreter. That's important to keep in mind because it means that Python variables and functions will persist in kernel memory even if you erase them (intentionally or accidentally) from the notebook. This can cause your code to work correctly in your current session, but fail to work later when a "fresh" kernel is started. By restarting the kernel right now, you can check that your code is not making use of kernel objects that are no longer defined in the notebook. If, after the restart, something goes wrong, read the error messages that appear. They will tell you what is missing from your notebook.
In the 'Cell' menu above, select 'Run All'. This will run all of your notebook's code, starting from the top, in your fresh new Python kernel. Check the output throughout the notebook, including in the tests below, to see if there are any errors. Also, make sure the results of the following tests look reasonable.

In [11]:

# graph parameters for the tests
graph_params = ( (8,1,1), (5,0.5,29), (10,0.3,818), (20,0.7,8817) )

# run the tests
for test_num, (n,p,seed) in enumerate(graph_params):
    
    print '*** Running Test {} ***'.format(test_num+1)
    print
    
    # create the graph
    G = nx.gnp_random_graph(n,p,seed)
    
    num_colors = color_graph(G)
    print 'num_colors = ', num_colors
    draw_coloring(G,num_colors)
    
    print
    print '*** Done. ***'
    print

Out[11]:

*** Running Test 1 ***

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-11-f049af473890> in <module>()
     11     G = nx.gnp_random_graph(n,p,seed)
     12 
---> 13     num_colors = color_graph(G)
     14     print 'num_colors = ', num_colors
     15     draw_coloring(G,num_colors)
NameError: name 'color_graph' is not defined

Wilf's Theorem

We end by making a connection with the spectral properties of the adjacency matrix. Recall that if $\mu_1$ is the largest eigenvalue of the adjacency matrix $A$ , then ${\rm deg}_{avg}\leq \mu_1.$ By modifying the basic algorithm slightly Wilf's Theorem shows that the largest eigenvalue bounds the chromatic number: $\chi(G)\leq \lceil \mu_1 \rceil.$ The notation $\lceil \mu_1 \rceil$ means "the ceiling function" of $\mu_1$ , namely the closest integer to $\mu_1$ from above.

Here is the idea of the proof. Suppose we find a way of assigning the numbers $\{1,\dots,N\}$ to the vertices, in such a way that for every $x\in[N]$ ( $[N]$ stands for $\{1,\dots,N\}$ ) has at most $\kappa$ neighbors with smaller label. Then we can run the basic algorithm with this ordering. Assign a color to node 1, than node 2, etc...then node $j$ has at most $\kappa$ neighbors that have already been colored, so it is enough to have $\kappa+1$ colors.

The claim is that it is always possible to order the nodes as above with $\kappa=\lfloor \mu_1 \rfloor$ ("the floor" of $\mu_1$ ).

Since ${\rm deg}_{avg}\leq \mu_1$ , there is at least one node $x_1\in V$ such that ${\rm deg}(x_1)\le \mu_1$ .
Assign the label $N$ to $x_1$
Consider the subgraph induced by $V\setminus{x_1}$ (remove $x_1$ from $V$ and remove all edges incident to $x_1$ from $E$ ).
In the HW you are asked to prove the $\mu_1$ also dominates the average degree of any induced subgraph of $G$ . So ${\rm deg}_{avg}(G(V\setminus\{x_1\}))\le \mu_1.$
In particular, one can pick a new node $x_2\in V\setminus\{x_1\}$ whose degree in $G(V\setminus\{x_1\})$ is less or equal $\mu_1$ .
Assign the label $N-1$ to $x_2$ , and repeat the process.
Note that the property of having at most $\mu_1$ neighbors with lower label is preserved at each step. So this produces the ordering that we were looking for.

Hoffman's Bound

There is also a lower bound for the chromatic numeber due to Hoffman: $\chi(G)\ge 1-\frac{\mu_1}{\mu_N}.$ The proof is more complicated and uses block partitioning of the adjacency matrix.

Note however, that when $G$ is bipartite, the lower bound is tight.

(Also note that we always have $\mu_N<0$ because ${\rm Tr}(A)=0$ .)

Optional Challenge Exercise

Implement Wilf's algorithm by adapting the routines described in this notebook.

In [ ]: