📚 The CoCalc Library - books, templates and other resources
License: OTHER
The Friendship Paradox
Code examples from Think Complexity, 2nd edition
Copyright 2016 Allen Downey, MIT License
BA graphs
In "Why Your Friends Have More Friends than You Do", Scott L. Feld explains the "friendship paradox": if you choose one of your friends at random, the chances are high that your friend has more friends than you.
In this notebook, we'll explore this effect in a random Barabasi-Albert graph and in a small dataset from Facebook.
First, I'll generate a BA graph:
The following generator iterates through the nodes of a graph.
This function generates a sample of nodes.
Now let's confirm that sample_nodes
generates the right degree distribution.
It does.
Sampling friends
Now let's generate all the "friends" by iterating through the nodes and their friends:
And let's sample friends by choosing a random node and then a random friend.
In Feld's article, he does something a little different: he chooses a random edge and then chooses one of the endpoints:
Let's see if all of these generators produce the same distribution:
It looks like they do, at least approximately.
And, as expected, the distribution we get when we sample friends (either way) is different from what we get when we sample nodes.
Facebook data
Now let's run the same analysis on the Facebook dataset.
Once again, the degree distribution is the same whether we enumerate all nodes or sample them.
But now we get something I didn't expect. We get two different degree distributions for "friends":
If we sample edges, as Feld did, or if we enumerate all friends, we get one distribution.
If we sample by choosing a node and then a friend, we get another distribition.
Analysis
We can compute the distribution of degree by modeling the edge sampling process.
And confirm that the sample matches the computed distribution.
We can also think of this distribution as a biased view of the degree distribution, where each node is overrepresented proportional to its degree.
And again, that agrees with the sample.
We can also compute the distribution that results from the friend sampling process.
And confirm that it agrees with the friend sample.
So it looks like we have two interpretations of the friendship paradox, which are operationalized by two different sampling processes.
Also, the sampling processes yield the same degree distribution for some graphs, like the BA model, but not for others, like the Facebook dataset.
Questions this raises:
Which process better quantifies the friendship paradox? Are there other metrics we should compute, other than degree distributions?
Why are the results different for these two graphs?
How do the results differ for other graphs?
Are there metrics we can compute directly based on graph properties, rather than by sampling?
Degree correlation
One property that might vary from graph to graph, and affect our results, is the correlation between the degrees of adjacent nodes.
Here's how we can compute it:
The BA graph has relatively high correlation.
The Facebook network has lower correlation.
Friends of friends
Another exploration that might be interesting: how does all of this affect the distribution for friends of friends?