# Network basics 3

## Density

Density $$D$$ is another measure of connectedness in a network. It is defined as the actual number of links in a network $$L$$, expressed as proportion of the maximum possible number of links $$\frac{N (N - 1)}2$$, hence $$D=\frac{2L} {N(N-1)}$$. If density is close to 1.0 (or 100%), the network is said to be dense, otherwise it is sparse.

In a network with directed links, the maximum possible number of pairs is used instead.

Since $$D$$ is sensible to N, the number of nodes, it cannot be used for comparisons across networks that vary significantly in size.

## Centrality

The centrality of network nodes encompasses two levels: a local and a global one.

### Local (or degree) centrality

A node is locally central if - in comparison to other nodes - its number of links with other nodes is high. The central node in a star network for instance, has a local centrality of 1.0.

### Global centrality - Closeness

A node is globally central if it lies a short distance from many other nodes. Distance in this case refers to the shortest path (the geodesic) connecting two nodes (with a path possibly consisting of several links). A node hence is globally central if it is "close" (in terms of path length) to many other nodes in the network. Sometimes therefore, global centrality is also called closeness centrality. The following image shows a network with nodes having sizes proportional to their closeness centrality.

## Betweenness centrality

A node is central in terms of betweenness if it operates as a bridge on many shortest paths between other nodes in the network. Betweenness centrality hence is calculated as the number of shortest paths from all nodes to all others that pass through that node. It is a measure of how powerful this node is in transmitting or blocking the spread of information (or diseases etc.) on the fastest connection between two nodes. The following image shows the same network as above with nodes having sizes proportional to their betweenness centrality.

## Eigenvector centrality

Eigenvector centrality is a measure of the influence of a node in a network based on the number of connections this node has to other highly influential nodes in the network. Since the influence of these other nodes builds on the same principle, eigenvector centrality it is typically found through iterating initially randomly assigned scores on the principle that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes. The Page rank, as used in Google's search engine (see below), is an instance of eigenvector centrality. The following image shows the same network as above with nodes having sizes proportional to their eigenvector centrality.

## The Page rank

The Page rank, named after Larry Page, one of the founders of Google, was suggested in 1998. It is beased on the idea that information on the World Wide Web can be ordered in a hierarchy by "link popularity", that is, by ranking pages a.) in respect to the number of references (in the form of hyperlinks) from other pages (incoming links or indegree), and b.) in respect to the weight these other pages themselves gain from being referred by other pages. In other words, the relevance of an internet page - its rank in a query - is determined by the relevance of pages linking to it, with these pages in their turn finding their relevance in the same way.

Therefore, the Page rank implies circular processing. The fact that the relevance of a page can only be found in respect to the relevance of other pages, necessitates to find the relevance of these other pages beforehand - and this  applies to all further pages as well. Since there is no "first" page on the internet, which could bequeath its relevance to all other pages, the reference structure of the Page rank is circular and can only be processed iteratively (i.e. recursively). Google is doing this by assuming a random web surfer following the hyperlinks with the probabilities suggested by the link structure of the web. Physically, this surfer is represented by web crawlers, the GoogleBots, permanently screening the web for the chances of this surfer to land on particular web pages.

The following images show a network before and after the iteration process of assigning page ranks. Like in many other recursion-based technologies (see artificial neural networks for instance), the iterations are initiated from random values - in this case the same ones for all nodes. Colors differentiate nodes by indegrees. The sizes and labels of the nodes in the right network indicate its Page rank.

...

## Homophily

Homophily in a social network expresses the principle that friends tend to be similar to their friends. Sometimes this is expressed with the proverb "birds of a feather flock together". An indication for homophily is a ratio of the fraction of links between nodes that differ in kind, which is smaller than it should be if there is no sign of any bias in the link distribution. In mathematical terms:

$$k_{cross} < 2xy$$, with $$k_{cross}$$ indicating the fraction of links between nodes that differ in kind, and $$x$$ and $$y$$ indicating the fractions of the two differing nodes.

The network to the right for instance, depicts a friendship network in a school classroom with boys represented as green circles and girls as red triangles.

If this network would not exhibit homophily by gender (that is, if links would be evenly distributed among nodes), then cross gender friendships - links from girls to boys and vice versa - should have a proportion of $$2bg$$ in the overall network, with $$b$$ indicating the fraction of boys in the network and $$g$$ indicating the fraction of girls. If the fraction of cross-gender edges would be significantly less than $$2bg$$, then this would be evidence for homophily.

In the network to the right, 5 of the 18 edges are cross gender and $$\frac{2}{3}$$ of all nodes represent boys and $$\frac{1}{3}$$ represent girls, implying that $$2bg = \frac{4}{9}=\frac{8}{18}$$. Since $$\frac{5}{18} < \frac{8}{18}$$ this example shows evidence of homophily.

## Assortativity

Very similar to homophily, assortativity refers to the fact that real social networks often show higher connectivity among similar nodes. Rich people tend to associate with other rich people, poor ones with other poor ones.

As a particular case, assortative mixing by degree implies that nodes of high degree tend to associate with similarly highly connected nodes, while nodes with low degree associate with other less connected nodes.

Assortativity is measured by the Pearson correlation r for all pairs of nodes of the network, with $$r$$ lying between −1 and 1. Positive values of $$r$$ indicate a correlation between nodes of similar degree, negative values indicate relationships between nodes of different degree. When $$r = 1$$, the network is said to have perfect assortative mixing patterns, when $$r = 0$$ the network is non-assortative, while at $$r= −1$$ the network is completely disassortative.

While several social networks - such as the co-authorship-networks of scientists for instance - have been shown to be assortative, many technical − e.g. the internet − and biological networks − e.g. protein interaction, neural networks, food webs etc. − tend to be disassortative.

Newman M.E.J. (2002) Assortative mixing in networks. Phys. Rev. Lett. 89 (2002) 208701.

Newman M.E.J. (2010) Networks. An introduction. Oxford University Press, New York.

## Giant component

The size of the giant component $$G_c$$ captures the maximal connected component of a network