📚 The CoCalc Library - books, templates and other resources
License: OTHER
Scale-Free Networks
Code examples from Think Complexity, 2nd edition.
Copyright 2016 Allen Downey, MIT License
Graphs
To represent social networks, we'll use nx.Graph
, the graph representation provided by NetworkX.
Each person is represented by a node. Each friendship is represented by an edge between two nodes.
Here's a simple example with 4 people:
The number of friends a person has is the number of edges that connect to their node, which is the "degree" of the node.
We are often intereted in the "degree distribution" of a graph, which is the number of people who have 0 friends, the number who have 1 friend, and so on.
The following function extracts a list of degrees, one for each node in a graph.
Here's the result for the small example.
I'll use Pmf
from thinkstats2
to make a probability mass function.
And thinkplot
to display it.
Exercise: Add another node or nodes to the graph above, and add a few edges. Plot the new degree distribution.
Facebook data
The following function reads a file with one edge per line, specified by two integer node IDs.
We'll read the Facecook data downloaded from SNAP
To see how popular "you" are, on average, we'll draw a random sample of 1000 people.
For each "you" in the sample, we'll look up the number of friends.
To plot the degree distribution, I'll use EstimatedPdf
, which computes a smooth Probability Density Function that fits the data.
Now what if, instead of "you", we choose one of your friends, and look up the number of friends your friend has.
Here's the degree distribution for your friend's friends:
The bulk of the distribution is wider, and the tail is thicker. This difference is reflected in the means:
And we can estimate the probability that your friend has more friends than you.
Power law distributions
As we'll see below, the degree distribution in the Facebook data looks, in some ways, like a power law distribution. To see what that means, we'll look at the Zipf distribution, which has a power law tail.
Here's a sample from a Zipf distribution.
Here's what the PMF looks like.
Here it is on a log-x scale.
And on a log-log scale.
On a log-log scale, the PMF of the Zipf distribution looks like a straight line (until you get to the extreme tail, which is discrete and noisy.
For comparison, let's look at the Poisson distribution, which does not have a power law tail. I'll choose the Poisson distribution with the same mean as the sample from the Zipf distribution.
Here's the PMF on a log-log scale. It is definitely not a straight line.
So this gives us a simple way to test for power law behavior. If you plot the PMF on a log-log scale, and the result is a straight line, they is evidence of power law behavior.
This test is not entirely reliable; there are better options. But it's good enough for an initial exploration.
Barabási and Albert
Let's see what the degree distribution for the Facebook data looks like on a log-log scale.
For degrees greater than 10, it resembles the Zipf sample (and doesn't look much like the Poisson sample).
We can estimate the parameter of the Zipf distribution by eyeballing the slope of the tail.
Here's a simplified version of the NetworkX function that generates BA graphs.
And here's the function that generates a random subset without repetition.
I'll generate a BA graph with the same number of nodes and edges as the Facebook data:
Providing a random seed means we'll get the same graph every time.
The number of edges is pretty close to what we asked for.
So the mean degree is about right.
The standard deviation of degree is pretty close; maybe a little low.
Let's take a look at the degree distribution.
Looking at the PMFs on a linear scale, we see one difference, which is that the BA model has no nodes with degree less than k
, which is 22.
If we look at the PMF on a log-log scale, the BA model looks pretty good for values bigger than about 20. And it seems to follow a power law.
Cumulative distributions
Cumulative distributions are a better way to visualize distributions. The following function shows what a cumulative probability is:
The total probability for all values up to and including 11 is 0.258, so the 25th percentile is about 11.
The median degree is about 25.
And the 75th percentile is about 57. That is, about 75% of users have 57 friends or fewer.
thinkstats2
provides Cdf
, which computes cumulative distribution functions.
Here are the degree CDFs for the Facebook data, the WS model, and the BA model.
If we plot them on a log-x scale, we get a sense of how well the models fit the central part of the distribution.
The BA model is ok for values above the median, but not very good for smaller values.
On a log-log scale, we see that the BA model fits the tail of the distribution reasonably well.
But there is certainly room for a model that does a better job of fitting the whole distribution.