# Network basics 2

## Neighborhoods or cliques

The nodes linked to a certain node are called the neighbors or link-neighbors of the node. The node including its link-neighbors is called a neighborhood or clique. If a node $$i$$ has a neighborhood of size $$k_i$$the maximal number of potential links between them is $$l_{max}=\frac{k_i(k_i-1)}2$$

## Clustering coefficient

The clustering coefficient of a node is defined as the probability that two randomly selected neighbors of this node are neighbors with each other. It is the fraction of pairs of the node's neighbors that are connected to each other.

The ratio of the actual to the maximal number of links in a neighborhood defines the local clustering coefficient. If $$n_i$$ is the number of actual links in a neighborhood, then the local clustering coefficient $$C_l$$ is

$C_l = \frac{2n_i}{k_i(k_i-1)}$

The global clustering coefficient C (or just clustering coefficient) is the average of all local clustering coefficients. It expresses the probability that a node i being connected to a node j is also connected to a node k in its neighborhood.

## Triadic closure and Dunbar's number

Triadic closure refers to the probability that unconnected nodes, which have something in common, will tend to develop a connection. Two people for instance, who have a friend in common, have an increased likelihood that they will become friends themselves at some point in the future. This will increase the clustering coefficient of their social network. And in general it will make their social network grow.

Granovetter, Mark (1973). The strength of weak ties. American Journal of Sociology 78,1360-1380

However, at least for humans the capacity for maintaining contacts is limited. Neurophysiologists estimate that humans are able to maintain not much more than 150 social contacts lastingly. This number is called Dunbar's number. It counteracts the enlarging force of triadic closure by cutting links that exceed the maintaining capacity. In consequence, a particular form of social networks emerges with clusters of nodes that are more densely connected within the cluster, but manage to maintain some outward links to other clusters. Such networks are called Small world networks. they have the feature of having relative high clustering coefficients but low average path length.

Dunbar, Robin I.M. (1992): Neocortex size as a constraint on group size in primates; in: Journal of Human Evolution 22: 469-493.

## Average path length

The path length (or geodesic) is the shortest distance $$d$$ between two nodes $$v_i$$and $$v_j$$.
If $$d(v_i, v_j)=0$$ this means that $$v_i=v_j$$ (or that $$v_i$$ can't be reached from $$v_j$$. From this, the average path length of a graph G can be defined as

$apl_G=\frac{1}{n(n-1)} \sum \limits_{i-j} d(v_i, v_j)$

The longest geodesic in a graph is called the diameter of the network. It consists simply of the largest value for $$d(v_i,v_j)$$.

## Small world networks

Small world networks stand out with rather low overall network density interrupted by regions of high density that are connected with short paths to other regions with high density. Formally, they are defined by a concurrency of high clustering coefficient and low average path length (Watts/Strogatz 1998, Watts 1999).

The name derives from a series of famous experiments conducted by the psychologist Stanley Milgram, in which a group of probands from the US-midwest were asked to forward letters to a person in Boston. The address of this person was withheld and people, who did not by chance know the target themselves, were asked to forward the letter to a personally known contact who was more likely to know the target. If this contact then again did not know the target, she or he was asked to repeat the same procedure. On average, it turned out that letters needed 5 to 6 hops to reach the target, which gave rise to the assumption that the US-population is connected via six degrees of separation.

Milgram, St. (1961). The small world problem; in: Psychology Today, 1/1961
Sandberg, O. (2005). Searching in a Small World. Göteborg, Sweden

In 1998, the assumption of a relatively small average number of links connecting the world's population by personal acquaintance was confirmed by a survey using email-chains on the internet (Watts 1999). Meanwhile several other surveys, among others using Short Messenger Services and Tweets, have been conducted, all of them giving rise to describe the social structure of modern society as having the form of a small world network.

Particular instructive cases for small world networks are the collaborations networks of mathematicians (see The Erdös Number Project) and of actors (see The Oracle of Bacon)

Small world networks are not restricted to social issues. Many natural systems show the typical characteristics of high clustering coefficient and low average path length. It has been hypothesized that the prevalence of small world networks in biological systems may reflect an evolutionary advantage of such an architecture (Barabasi 2003). One possibility is that small world networks are more robust to perturbations than other network architectures. If this were the case, it would provide an advantage to biological systems that are subject to damage by mutation or viral infection.

### Rewiring

Following a suggestion of Duncan J. Watts and Steven H. Strogatz (1998) Small World network models can be generated with a technique called rewiring. One variant of this technique arranges the nodes of a network in a circle and connects each of them with their next two neighbors to the left and to the right (left image below). Starting from this initial state the links then are randomly "rewired" with a probability $$p$$, implying that existing links are replaced by random generated new connections to other nodes (see image 3b). Rewiring with $$p = 0$$ results in a network of highly linked neighborhoods (cliques) without any long-range connections. The degree distribution is uniformly at 4, and the network has a high clustering coefficient ($$cc$$) and a high average path length ($$apl$$). (the left-most 50 node-network below has a $$cc$$ of 0.5 and an $$apl$$ of 6.633).

Rewiring with $$p = 1$$ on the other hand, implies near global connectivity, meaning that all links are replaced with randomly generated new links. Such networks stand out with little or no neighborhood-clusters, that is, with low $$cc$$. And they own an abundance of long-range connections, implying a low $$apl$$ (the right-most network above has a $$cc$$ of 0.08 and an $$apl$$ of 2.946)

High cc and high apl at $$p = 0$$ and low $$cc$$ and low $$apl$$ at $$p = 1$$ give reason to assume that with increasing rewiring-probability networks show a linear and combined decrease of $$cc$$ and $$apl$$. They ought to develop from highly clustered and rather locally connected networks to loosely or not at all clustered and rather globally connected networks. However, as Watts and Strogatz found out, $$cc$$ and $$apl$$ do not develop conjointly. As can be seen in the plot below, $$apl$$ declines faster then $$cc$$ when $$p$$ rises, implying a $$p$$ in the interval {0, 1} with a maximal difference of $$apl$$ and $$cc$$. This network with low average path length and high clustering coefficient is the Small World network, harboring a topology that is known for its efficiency in many fields of networked instances, such as the synchronization of coupled phase oscillators or the spreading of diseases (cf. Strogatz 2004).

Watts, Duncan J. / Strogatz, Steven (1998): Collective Dynamics of 'Small-World' Networks; in: Nature 393, p. 440-442.

Strogatz, Steven (2004): Sync. How Order emerges from Chaos in the Universe, Nature and Daily Life. New York: Hyperion.