Second-Order Assortative Mixing in Social Networks

Similar documents
Summary: What We Have Learned So Far

Response Network Emerging from Simple Perturbation

From Centrality to Temporary Fame: Dynamic Centrality in Complex Networks

Properties of Biological Networks

arxiv:cs/ v4 [cs.ni] 13 Sep 2004

Complex networks: A mixture of power-law and Weibull distributions

Higher order clustering coecients in Barabasi Albert networks

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

The Complex Network Phenomena. and Their Origin

Introduction to network metrics

Network community detection with edge classifiers trained on LFR graphs

Basics of Network Analysis

THE DEPENDENCY NETWORK IN FREE OPERATING SYSTEM

Modeling Traffic of Information Packets on Graphs with Complex Topology

The missing links in the BGP-based AS connectivity maps

Network Theory: Social, Mythological and Fictional Networks. Math 485, Spring 2018 (Midterm Report) Christina West, Taylor Martins, Yihe Hao

Structural Analysis of Paper Citation and Co-Authorship Networks using Network Analysis Techniques

The network of scientific collaborations within the European framework programme

Universal Behavior of Load Distribution in Scale-free Networks

Critical Phenomena in Complex Networks

Fast algorithm for detecting community structure in networks

Self-organization of collaboration networks

Physica A. Changing motif distributions in complex networks by manipulating rich-club connections

Tuning clustering in random networks with arbitrary degree distributions

Structure of biological networks. Presentation by Atanas Kamburov

An introduction to the physics of complex networks

Using graph theoretic measures to predict the performance of associative memory models

TELCOM2125: Network Science and Analysis

Characteristics of Preferentially Attached Network Grown from. Small World

Empirical analysis of online social networks in the age of Web 2.0

1 Homophily and assortative mixing

Modelling weighted networks using connection count

Statistical Analysis of the Metropolitan Seoul Subway System: Network Structure and Passenger Flows arxiv: v1 [physics.soc-ph] 12 May 2008

Optimal weighting scheme for suppressing cascades and traffic congestion in complex networks

Why Do Computer Viruses Survive In The Internet?

Immunization for complex network based on the effective degree of vertex

Smallest small-world network

Universal Properties of Mythological Networks Midterm report: Math 485

Global dynamic routing for scale-free networks

Attack Vulnerability of Network with Duplication-Divergence Mechanism

(Social) Networks Analysis III. Prof. Dr. Daning Hu Department of Informatics University of Zurich

An Evolving Network Model With Local-World Structure

The Establishment Game. Motivation

Social Data Management Communities

Convex skeletons of complex networks

BMC Bioinformatics. Open Access. Abstract

Example for calculation of clustering coefficient Node N 1 has 8 neighbors (red arrows) There are 12 connectivities among neighbors (blue arrows)

2007 by authors and 2007 World Scientific Publishing Company

Incoming, Outgoing Degree and Importance Analysis of Network Motifs

CAIM: Cerca i Anàlisi d Informació Massiva

Lesson 4. Random graphs. Sergio Barbarossa. UPC - Barcelona - July 2008

arxiv:cond-mat/ v1 [cond-mat.dis-nn] 3 Aug 2000

Deterministic Hierarchical Networks

Complex Networks. Structure and Dynamics

Deterministic small-world communication networks

Small World Properties Generated by a New Algorithm Under Same Degree of All Nodes

First-improvement vs. Best-improvement Local Optima Networks of NK Landscapes

Complex Networks: Ubiquity, Importance and Implications. Alessandro Vespignani

SNA 8: network resilience. Lada Adamic

The Betweenness Centrality Of Biological Networks

arxiv:cond-mat/ v2 [cond-mat.stat-mech] 27 Feb 2004

Search in weighted complex networks

A Realistic Model for Complex Networks

Constructing a G(N, p) Network

Wednesday, March 8, Complex Networks. Presenter: Jirakhom Ruttanavakul. CS 790R, University of Nevada, Reno

Erdős-Rényi Model for network formation

Networks and stability

V4 Matrix algorithms and graph partitioning

Constructing a G(N, p) Network

arxiv:cond-mat/ v1 21 Oct 1999

Traffic dynamics based on an efficient routing strategy on scale free networks

Graph Theory for Network Science

V2: Measures and Metrics (II)

Oh Pott, Oh Pott! or how to detect community structure in complex networks

Master s Thesis. Title. Supervisor Professor Masayuki Murata. Author Yinan Liu. February 12th, 2016

CS-E5740. Complex Networks. Scale-free networks

A Big World Inside Small-World Networks

M.E.J. Newman: Models of the Small World

Supplementary material to Epidemic spreading on complex networks with community structures

Integrating local static and dynamic information for routing traffic

Internet as a Complex Network. Guanrong Chen City University of Hong Kong

Example 1: An algorithmic view of the small world phenomenon

1 More configuration model

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

Onset of traffic congestion in complex networks

Fractal small-world dichotomy in real-world networks

Supplementary text S6 Comparison studies on simulated data

Advanced Algorithms and Models for Computational Biology -- a machine learning approach

Graph Theory. Graph Theory. COURSE: Introduction to Biological Networks. Euler s Solution LECTURE 1: INTRODUCTION TO NETWORKS.

arxiv:cond-mat/ v2 [cond-mat.stat-mech] 1 Sep 2002

Random Generation of the Social Network with Several Communities

Topologies and Centralities of Replied Networks on Bulletin Board Systems

An Empirical Analysis of Communities in Real-World Networks

Modeling and Simulating Social Systems with MATLAB

- relationships (edges) among entities (nodes) - technology: Internet, World Wide Web - biology: genomics, gene expression, proteinprotein

Simplicial Complexes of Networks and Their Statistical Properties

TELCOM2125: Network Science and Analysis

Community overlays upon real-world complex networks

Pheromone Static Routing Strategy for Complex Networks. Abstract

CS-E5740. Complex Networks. Network analysis: key measures and characteristics

Transcription:

Second-Order Assortative Mixing in Social Networks 1 arxiv:0903.0687v2 [physics.soc-ph] 23 Nov 2009 Shi Zhou* and Ingemar J. Cox Department of Computer Science, University College London Gower Street, London, WC1E 6BT, United Kingdom Email: s.zhou@cs.ucl.ac.uk Lars K. Hansen Danish Technical University Richard Petersens Plads B321, DK-2800 Lyngby, Denmark Abstract It is common to characterise networks based on their statistical properties. It has been shown that social networks, such as networks of co-starring film actors, collaborating scientists and email communicators, exhibit a positive first-order assortative behaviour, i.e. if two nodes are connected, then their degrees (the number of links a node has) are similar. In contrast, technological and biological networks, such as the Internet and protein interactions, exhibit a disassortative behaviour, i.e. high-degree nodes tend to link with low-degree ones. For social networks, a node s degree has been assumed to be a proxy for its importance/prominence within the network, and the assortative behaviour is then interpreted as indicating that people mix with people of comparable prominence. In this paper we introduce a new property, second-order assortative mixing, which measures the correlation between the most prominent neighbours of two connected nodes, rather than the prominence of the nodes themselves. Five social networks and six other networks are examined. We observe very stronge second-order assortative mixing in social networks. This suggests that if two people interact in a social network then the importance of the most prominent person each knows is very likely to be the same. This is also true if we measure the average prominence of neighbours of the two people. This property is weaker or negative in nonsocial networks. We investigate a number of possible explanations for this property, including statistical significance, neighbourhood size, power-law degree distribution, cluster coefficient, triangles and bipartite graphs. However, none of these properties was found to provide an adequate explanation. We therefore conclude that second-order assortative mixing is a fundamental property of social networks.

2 Fig. 1. Illustration of a part of a network depicting actors (nodes) and the films (links) they have acted together in. I. BACKGROUND A network or graph consists of nodes (also called vertices) connected together via links. Networks are utilised in many disciplines. The nodes model physical elements such as people, proteins or cities, and the links between nodes represent connections between them, such as contacts, biochemical interactions, and roads. For example, a portion of a network describing actors and the films they have acted together in is depicted in Fig. 1. In recent years studying the structure, function and evolution of networked systems in society and nature has become a major research focus [1], [2], [3], [4], [5], [6]. The connectivity or degree, k, of a node is defined as the number of links the node possesses. The probability distribution of node degrees is indicative of a network s global connectivity. For example random graphs with a Poisson degree distribution [7] have most nodes with degrees close to the average degree. In contrast, many complex networks in nature and society are scale-free graphs [8] exhibiting a power-law degree distribution, where many nodes have only a few links and a small number of nodes have a very large number of links. However, the degree distribution alone does not provide a full description of a network s topology. Networks with identical degree distributions can possess other properties that are vastly different [9], [10], [11]. One such property, is the mixing pattern between the two end nodes of a link [12], [13], i.e. the joint probability distribution of a node with degree k being connected to a node with degree k. In general, biological and technological networks are disassortative meaning that well-connected nodes tend to link

3 Fig. 2. An example network. The assortative coefficients are calculated by the excess degree which is degree minus one [13]. Consider the link between nodes A and B, the (excess) degree of node A is 5. The neighbours maximum (excess) degree of node A is 7, which is the (excess) degree of node C. with poorly-connected nodes, and vice versa. In contrast, social networks, such as collaborations between film actors or scientists, exhibit assortative mixing, where nodes with similar degrees are connected. To quantify this mixing property, Newman [13] proposed the assortative coefficient, r, where 1 r 1. It is derived by considering the Pearson correlation between two sequences, where corresponding elements in the two sequences represent the degree of the nodes at either end of a link in the network. 1. When r 1 there is a perfect assortative mixing in the network, i.e., every link connects two nodes with the same degree. When r 0 there is no correlation. In terms of degree mixing, the network is random or neutral. If the network is perfectly disassortative, i.e., every link connects two nodes of different degrees, r is negative and has a value lying in the range 1 r < 0. The mixing pattern has been regarded as a fundamental property of networks and the assortative coefficient r has been widely used to measure this property. II. SECOND-ORDER ASSORTATIVE MIXING We now introduce and define a related property which we refer to as the second-order mixing. Consider a link i connecting two nodes, a and b, with degrees k a and k b. Let N a\b denote the set of node a s neighbours, excluding b, i.e. all the other nodes that a is directly connected to. Each of these neighbours, n a, has a corresponding degree, k na. Following Newman s definition of the (first-order) assortative 1 For a directed network, the degree of the starting node of a link is contained in one sequence, and the degree of the ending node is in the other sequence. The number of elements in each sequence is the number of links. For an undirected network, as all the networks studied in this paper, each undirected link is replaced by two directed links pointing at opposite directions. Thus the number of elements in a sequence is twice the number of links.

4 coefficient r which is based on the degrees of the two end nodes of a link, we define the second-order assortative coefficient R max as R max = L 1 i 1 2 L 1 i where L is the number of links, K i and K i K i K i [ 1 2 L 1 i (K i + K i) (K 2 i + K i 2 [1 ] 2, (1) ) 2 L 1 (K i + K i ) i are the maximum degree of neighbours of the two nodes connected by link i, i.e. K i = max(k na : n a N a\b ) and K i = max(k n b : n b N b\a ). Similarly we define the second-order assortative coefficient R avg based on the neighbours average degrees by replacing K i and K i in the above equation as K i = 1 k a n a N a\b k na and K i = 1 k b n b N b\a k nb. Note that when calculating the first and second assortative coefficients, we actually use the excess degree [13] which is degree minus one 2. See Fig. 2 for examples. Values of the assortative coefficients, r, R avg and R max are provided in Table I. We consider eleven networks, including five social networks, two biological networks, two technology networks, and two synthetic networks based on random connections [7] and the Barabási and Albert [8] model, respectively. The expected standard deviation σ on the value of assortative coefficient r can be obtained by the jackknife method [22] as σ 2 = L i=1 (r i r) 2, where r i is the value of r for the network in which the i-th link is removed and i = 1,2,...L. And likewise for second-order assortative coefficients R max and R avg. For all cases shown in Table 1, the value of σ is very small (< 0.03), which validates the statistical significance of the coefficients. The comparison between the first and second-order assortative coefficients is remarkable. For the five social networks, while both the first and second order coefficients are positive, the value of the secondorder assortative coefficients, in terms of both neighbours average degree and neighbours maximum degree, are very much higher. The strongest effect of second-order mixing is observed for the Film actor, Scientist and Secure email networks. The weakest (though still strong) effect is exhibited by the Email network. The fact that the Secure email network, using the Pretty Good Privacy (PGP) protocol [15], exhibits a much stronger effect is probably due to the nature of this network which explicitly relies on endorsement between members of the network for the purpose of authentication and encryption. Interestingly, while the Email and Musician networks exhibit very low first-order assortative mixing, they exhibit strong second-order assortative mixing. ] 2 2 This is because, when considering two connected nodes, A and B, the neighborhood of A is defined to exclude B. And likewise for the neighborhood of B.

5 TABLE I PROPERTIES OF THE NETWORKS UNDER STUDY. SOCIAL NETWORKS: (A) FILM ACTOR COLLABORATIONS [8], WHERE TWO ACTORS ARE CONNECTED IF THEY HAVE CO-STARRED IN A FILM; (B) SCIENTIST COLLABORATIONS [14], WHERE TWO SCIENTISTS ARE CONNECTED IF THEY HAVE COAUTHORED A PAPER IN CONDENSE MATTER PHYSICS; (C) SECURE EMAIL NETWORK [15], WHERE TWO USERS ARE CONNECTED IF THEY HAVE USED THE PRETTY GOOD PRIVACY (PGP) ALGORITHM FOR SECURE EMAIL EXCHANGE; (D) EMAIL EXCHANGES BETWEEN MEMBERS OF A UNIVERSITY [16]; AND (E) JAZZ MUSICIAN NETWORK [17], WHERE TWO MUSICIANS ARE CONNECTED IF THEY HAVE PLAYED IN A BAND. BIOLOGICAL NETWORKS: (F) C. elegans METABOLIC NETWORK [18], WHERE TWO METABOLITES ARE CONNECTED IF THEY PARTICIPATE IN A BIOCHEMICAL REACTION; AND (G) THE PROTEIN INTERACTIONS OF THE YEAST Saccharomyces cerevisiae [19], [20]. TECHNOLOGICAL NETWORKS: (H) WESTERN STATES POWER GRID OF THE UNITED STATES [21]; AND (I) INTERNET [12] (HTTP://WWW.ROUTEVIEWS.ORG/), WHERE TWO SERVICE PROVIDERS ARE CONNECTED IF THEY HAVE A COMMERCIAL AGREEMENT TO EXCHANGE DATA TRAFFIC. SYNTHETIC NETWORKS: (J) THE RANDOM GRAPHS [7]; AND (K) THE BARABÁSI-ALBERT (BA) GRAPHS [8]. PROPERTIES SHOWN ARE THE NUMBERS OF NODES AND LINKS; THE ASSORTATIVE COEFFICIENTS r, R avg AND R max, BASED ON DEGREE, NEIGHBOURS AVERAGE DEGREE AND NEIGHBOURS MAXIMUM DEGREE, RESPECTIVELY, EACH COMPARING WITH ITS EXPECTED STANDARD DEVIATION σ; AND THE AVERAGE CLUSTERING COEFFICIENT C. Network Nodes Links r σ r R avg σ avg R max σ max C (a) Film actor 82,593 3,666,738 0.206 0.013 0.836 0.027 0.813 0.009 0.75 (b) Scientist 12,722 39,967 0.161 0.007 0.680 0.014 0.647 0.005 0.65 (c) Secure email 10,680 24,316 0.238 0.007 0.653 0.009 0.680 0.007 0.27 (d) Email 1,133 5,451 0.078 0.014 0.242 0.014 0.247 0.014 0.22 (e) Musician 198 2,742 0.020 0.019 0.543 0.023 0.307 0.029 0.62 (f) Metabolism 453 2,025-0.226 0.011 0.265 0.032 0.263 0.023 0.65 (g) Protein 4,626 14,801-0.137 0.008-0.046 0.007 0.033 0.009 0.09 (h) Power grid 4,941 6,594 0.004 0.014 0.205 0.015 0.258 0.016 0.08 (i) Internet 11,174 23,409-0.195 0.001-0.097 0.004 0.036 0.008 0.30 (j) Random graph 10,000 30,000 0 0.009 0 0.011 0 0.006 0 (k) BA graph 10,000 30,000 0 0.004 0 0.008 0 0.008 0 Other forms of networks do not exhibit the very strong second-order correlations exhibited by social networks. However, the Metabolism and Power grid networks do exhibit second order assortative mixing comparable to the Email network, the weakest of the social networks. For the Internet and Protein networks the second order assortative mixing is almost zero (R max, or negative (R avg ). As expected, synthetic networks generated by generic graph models are completely uncorrelated, i.e. R max = R avg = 0. Fig. 3(A) shows the first-order assortative mixing of the Scientist network by plotting the frequency

6 Fig. 3. The first and second-order assortative mixing in the Scientist network. We show the frequency distribution of links as functions of (A) degrees k and k of the two end nodes of a link, with k k ; (B) the neighbours maximum degrees of the two nodes, K max and K max, with K max K max; and (C) the neighbours average degrees, K avg and K avg, with K avg K avg, respectively. (D) is (B) for the randomised case of the network preserving the degree distribution. The maximum degree of the network is 97. distribution of the number of links as a function of degrees, k and k, of the two ends of a link. While we observe a strong correlation between degrees less than 20, there is no correlation beyond this. The frequency distribution rapidly decreases with increasing degree, as expected for a scale-free network. In contrast, Fig. 3(B) and (C) show the second-order assortative mixing by plotting the distribution of links as functions of the neighbours maximum degrees, K max and K max, and the neighbours average degrees, K avg and K avg, respectively. In Fig. 3(B), we observe a very strong correlation for almost all K max values. Moreover, the link distribution along the diagonal does not decrease with the increase of K max, but is approximately flat over two orders of magnitude. Of course the correlation in Fig. 3(B) is not perfect, and a second process appears to be uniform noise. If the average rather than the maximum is considered, Fig. 3(C), we still observe a strong correlation. The noise might be better modelled as Gaussian which is probably due to the summation of many nodes and the central limit theorem.

7 TABLE II NULL HYPOTHESIS TEST. THE AVERAGE ASSORTATIVE COEFFICIENTS, r, R avg AND R max, AND THEIR STANDARD DEVIATION, δ, OBTAINED AFTER RANDOM PERMUTATION OF ONE OF THE TWO VALUE SEQUENCES. Network r δ r R avg δ avg R max δ max (a) Film actor 0.00001 0.00034 0.00000 0.00034 0.00001 0.00036 (b) Scientist -0.00029 0.00360 0.00010 0.00334 0.00035 0.00343 (c) Secure email -0.00065 0.00422 0.00043 0.00470-0.00020 0.00498 (d) Email 0.00021 0.01078 0.00177 0.00861 0.00101 0.00970 (e) Musician 0.00050 0.01250-0.00168 0.01380-0.00026 0.01300 (f) Metabolism -0.00021 0.01469-0.00106 0.01518 0.00134 0.01460 (g) Protein 0.00000 0.00536 0.00058 0.00513-0.00072 0.00520 (h) Power grid -0.00171 0.00842 0.00152 0.00886-0.00039 0.00811 (i) Internet -0.00033 0.00395 0.00072 0.00459 0.00001 0.00422 III. SIGNIFICANCE In this section we first test the statistic significance of the second-order mixing property. Then we address the question of whether the second-order mixing is a new topological property, i.e. whether it can be explained by other known properties of the networks. A. Null Hypothesis Test A high correlation score between two value sequences must be tested against the null hypothesis. For each network and each coefficient, we randomly permuted the order of values in one of the two sequences and re-computed the coefficient. This was repeated 100 times and then we calculated the mean and standard deviation of the coefficient. Results are shown in Table II. Typically the mean value is close to zero and the standard deviation is small. These results confirmed the statistical significance of the first and second-order assortative coefficients, reported in Table I. B. Increasing Neighbourhood Size The strong correlation scores associated with second-order assortative mixing could simply be due to the increased extent of neighbourhood (from distance of one hop to two hops), i.e. a node always has more second-order neighbours than first-order neighbours. To exclude this possibility we also examined the Xth-order assortative coefficients, R max and R avg, which are calculated using the maximum or

8 average degree within the neighbourhood of no more than X hops from each end of a link. We observed that for all networks under study, the values of the 3rd-order coefficients were smaller than the 2nd-order coefficients. This shows the second-order assortative mixing is not caused by increased neighbourhood (alone). Of course, if the neighbourhood continues to increase, eventually the coefficients will increase and tend to one. This is to be expected since eventually, the neighbourhood encompasses the entire network. C. High-Degree Nodes Another possible explanation for the high values of second-order assortative coefficients considered, is that there are a few hub nodes that are extremely well connected and dominate the network structure. To test this we removed the best-connected node (together with the links attaching to it and any resulting isolated nodes) from the networks and re-computed the coefficients. We also calculate the coefficients after removing the top 5 best-connected nodes. Results are shown in Table III. In all cases, the coefficients change very little. For some networks, such as the Secure email, Musician and Metabolism networks, the second-order coefficients became stronger after the best-connected nodes are removed. D. Power-Law Degree Distribution While high degree nodes do not explain the high second order assortative mixing scores, the underlying heterogenous power-law structure of the networks was also considered as an explanation. To exclude this possibility we used the random link rewiring algorithm [9], [11] to produce surrogate networks that preserve the degree distribution of the network under study. The process is as follows. Firstly we choose a pair of links at random. The links should have four different end nodes. Secondly we randomly rewire the two links between the four end nodes. We only accept rewired links if no duplicate link is generated and the network remains as a single component; otherwise we keep the original links. Then we repeat the above two steps a sufficiently large number of times, usually a thousand times of the number of links in the network. This process does not change each node s degree and therefore preserves exactly the network s original degree distribution. The resulting network can be regarded as a randomised case of the original network. Fig. 3(D) illustrates the distribution of links as a function of K max and K max for a randomised case of the Scientist network. The second-order assortative mixing in the original network disappears completely in the randomised case, where the second-order mixing appears to be modelled as uniform noise. This result shows that the second-order mixing is not constrained by a network s degree distribution,

9 TABLE III ASSORTATIVE COEFFICIENTS OF THE NETWORKS AFTER REMOVING (1) THE BEST-CONNECTED NODE OR (2) THE TOP 5 BEST-CONNECTED NODES. ALSO SHOWN ARE THE NUMBERS OF NODES, LINKS AND THE MAXIMUM DEGREE, k max, OF THE RESULTED NETWORKS. Network Nodes Links k max r R avg R max (a) Film actor 82,593 3,666,738 3,789 0.206 0.836 0.813 Remove 1 best-connected node 82,582 3,662,949 3,701 0.208 0.837 0.814 Remove 5 best-connected nodes 82,578 3,648,821 3,229 0.213 0.839 0.825 (b) Scientist 12,722 39,967 97 0.161 0.680 0.647 Remove 1 best-connected node 12,721 39,870 89 0.154 0.674 0.633 Remove 5 best-connected nodes 12,717 39,547 72 0.157 0.673 0.623 (c) Secure email 10,680 24,316 205 0.238 0.653 0.680 Remove 1 best-connected node 10,675 24,111 162 0.267 0.651 0.591 Remove 5 best-connected nodes 10,616 23,603 101 0.354 0.703 0.631 (d) Email 1,133 5,451 71 0.078 0.242 0.247 Remove 1 best-connected node 1,131 5,380 51 0.072 0.211 0.161 Remove 5 best-connected nodes 1,127 5,180 49 0.058 0.195 0.128 (e) Musician 198 2,742 100 0.020 0.543 0.307 Remove 1 best-connected node 197 2,642 95 0.057 0.583 0.615 Remove 5 best-connected nodes 193 2,345 55 0.056 0.442 0.357 (f) Metabolism 453 2,025 237-0.226 0.265 0.263 Remove 1 best-connected node 452 1,788 122-0.222 0.195 0.304 Remove 5 best-connected nodes 448 1,385 72-0.153 0.352 0.338 (g) Protein 4,626 14,801 282-0.137-0.046 0.033 Remove 1 best-connected node 4,574 14,519 197-0.140-0.026 0.080 Remove 5 best-connected nodes 4,497 13,937 115-0.128 0.012 0.074 (h) Power grid 4,941 6,594 19 0.004 0.205 0.258 Remove 1 best-connected node 4,939 6,575 18 0.005 0.189 0.229 Remove 5 best-connected nodes 4,917 6,515 13 0.008 0.172 0.217 (i) Internet 11,174 23,409 2,389-0.195-0.097 0.036 Remove 1 best-connected node 10,782 21,020 1,333-0.214-0.049-0.039 Remove 5 best-connected nodes 9,368 17,155 560-0.192 0.004 0.047

10 Fig. 4. Second-order mixing coefficients vs average clustering coefficient i.e. networks with identical degree distribution could have hugely different mixing patterns both in the first-order [9], [11] and in the second-order. It highlights the limitation of characterising network topology by degree distribution alone and the critical importance of appreciating other topological properties as well. E. Clustering Coefficient We also examined whether the second-order assortative mixing is a consequence of the clustering behaviour observed in many social networks, where one s friends are also friends of each other. This is quantified by the clustering coefficient, C i, which is defined as C i = e i k i(k i 1)/2, where k i is the degree or the number of neighbours of node i, the denominator k i (k i 1)/2 is the maximum possible connections between the neighbours, and e i is the actual number of connections between them [2]. The average clustering coefficient, C, is the arithmetic average over all nodes in the network. Comparison of C with either R avg or R max in Table I and Fig. 4 shows that high values of the second-order assortative coefficients occur for both high and low values of clustering coefficient. For example, while the Scientist network has high clustering coefficient and high second-order assortative coefficients, the Secure Email network has low clustering coefficient and high second-order assortative coefficient. On the other hand, the Internet has an average clustering coefficient higher than the two email networks, but the Internet s second-order assortative coefficients are the lowest.

11 Fig. 5. Average clustering coefficient of k-degree nodes. Fig 5 reveals that the Scientist network and the Secure email network are fundamentally different not only in the average clustering coefficient but also in the relation between clustering coefficient and node degree. Yet they have similar high values of R avg and R max. In contrast, the Scientist network and the metabolism exhibit similar values for the average clustering coefficient, but their R avg and R max values are significantly different. This suggests that the second-order assortative mixing is something quite unexpected, particularly considering the work on the hierarchical organisation of complex networks [23], [24]. F. Triangles It is interesting to consider how many of the two neighbours of maximum degree are one and the same, i.e. the most prominent contact at each end of a link is the same person, and therefore forms a triangle. Let X denote the degree difference between the most prominent neighbour of the two end nodes of a link, i.e. X = K max K max, and L <x denote the number of links with X < x. Table IV shows the ratio of L <4, L <2 and L <1 to the total number of links, L, respectively. Note that L <1 represents the case where K max = K max. Also shown is L /L, where L is the number of links for which the most prominent neighbour of the two end nodes are one and the same node (i.e. L L <1 ). For the case of the Film actor network, the link ratio values are almost the same, 34%, indicating that the second-order mixing behaviour in this network is largely due to the triangle case where the most prominent neighbour of a link s two end nodes are one and the same node. The same is true for the

12 TABLE IV LINK RATIO VALUES OF THE NETWORKS UNDER STUDY. Network L <4 / L L <2 / L L <1 / L L / L R max (a) Film actor 34.2% 34.2% 34.1% 34.1% 0.813 (b) Scientist 50.2% 45.4% 42.9% 41.6% 0.647 (c) Secure email 43.2% 37.8% 35.5% 34.6% 0.680 (d) Email 26.8% 20.2% 15.0% 12.8% 0.247 (e) Musician 56.3% 56.0% 55.7% 55.7% 0.307 (f) Metabolism 51.3% 51.1% 50.8% 50.7% 0.263 (g) Protein 10.5% 8.0% 6.4% 5.7% 0.033 (h) Power grid 68.1% 38.3% 19.1% 7.8% 0.258 (i) Internet 20.5% 20.3% 20.2% 20.2% 0.036 Musician network. However, for other social networks, e.g. the Scientist and Secure email networks, we observe that while a significant portion of links share a common most prominent neighbour, there is a further 1% of links (L <1 /L L /L) of which the most prominent neighbours are not the same but have the same degree value, more than 2% of links whose neighbours maximum degrees differ by one, and 7% of links whose neighbours maximum degrees differ between one and three. Table IV also shows there is no correlation between the link ratio L /L and the coefficient R max. Clearly neither the clustering coefficient nor triangles provide an adequate explanation for our observations. G. Bipartite Network A bipartite network is a network with two non-overlapping sets of nodes and Γ, where all links must have one end node belonging to each set. For example, actors star in films, scientist write papers, and musician play in bands. The Film actor, Scientist and Musician networks under study are constructed from bipartite networks, e.g. two actors are linked if they co-star in a film and two scientists are linked if they co-author a paper. The Film actor, Scientist and Musician networks all exhibit strong second-order assortative mixing (see Table I). It is therefore reasonable to ask whether the second-order assortative mixing can be attributed to the nature of bipartite networks? For example, all actors of one movie constitute a complete subgraph, in which everyone connects with the highest-degree node in the group. However, we found no support for this hypothesis. Firstly, the Metabolic network is also constructed from a bipartite network where the two types of nodes are metabolites and reactions. Two metabolites

13 Fig. 6. Second-order assortative coefficients R max and R avg vs first-order assortative coefficient r. are linked if they participate in a reaction. The Metabolic network, however, does not show a strong second-order assortative mixing. Secondly, the Secure email network is a non-bipartite network, where two email users are linked by direct email communications. The Secure email network exhibits one of the strongest secondorder assortative mixing, whose second-order assortative coefficients are significantly higher than the Musician and Metabolic networks. Bipartite networks might be a contributing factor, but they are not an adequate explanation for the second-order assortative mixing. H. Relation Between First and Second-Order Mixing Coefficients Fig. 6 shows that the assortative coefficient r is loosely correlated with the second-order coefficients R max and R avg. However, we notice that the first-order and the second-order assortative properties are not trivially related. In most cases the second-order mixing is much stronger than the first-order mixing. Moreover there are exceptions where the two are not correlated. Consider the Metabolic network and the Email network, the former is strongly disassortative with r = 0.226, whereas the later is assortative with r = 0.078 (see Table 1). Yet boith networks exhibit similar positive values for the second order mixing coefficient.

14 IV. CONCLUSION For social networks, the degree of a node, i.e. the number of people a person is linked to, is often considered a proxy for the prominence or importance of a person. First order assortative mixing has then been interpreted as indicating that if two people interact in a social network then they are likely to have similar prominence. However, the much stronger second order assortative mixing suggests that if two people interact in a social network then the importance of the most prominent person each knows is very likely to be the same. Our experimental results demonstrated very strong second order assortative mixing in social networks, but weaker, or even negative values for biological and technological networks. We examined a larger variety of other network properties in an effort to establish whether second-order assortative mixing was induced from other network properties such as its power law distribution, cluster coefficient, triangles and bipartite graphs. However, none of these properties was found to provide an adequate explanation. We therefore conclude that second-order assortative mixing is a fundamental property of social networks. Whether our most prominent contacts serve to introduce us (since much of the time they are the same person) or we simply prefer to mix with people who know similarly important people, remains an open question. Our work reveals a new dimension to the hierarchical structure present in social networks. We expect that our work will provide new clues for studying the structure and evolution of social networks as well as complex networks in general. ACKNOWLEDGMENTS The authors thank Ole Winther of DTU, Denmark and Sune Lehmann of Northeastern University, USA for discussions relating to the clustering coefficient. S. Zhou is supported by The Royal Academy of Engineering/EPSRC Research Fellowship under grant no. 10216/70. REFERENCES [1] S. Wasserman and K. Faust. Social Network Analysis. Cambridge University Press, Cambridge, 1994. [2] J. Watts. Small Worlds: The Dynamics of Networks between Order and Randomness. Princeton Univeristy Press, New Jersey, USA, 1999. [3] A. L. Barabási. Linked: The New Science of Networks. Perseus Publishing, 2002. [4] S. Bornholdt and H. G. Schuster. Handbook of Graphs and Networks - From the Genome to the Internet. Wiley-VCH, Weinheim Germany, 2002. [5] S. N. Dorogovtsev and J. F. F. Mendes. Evolution of Networks - From Biological Nets to the Internet and WWW. Oxford University Press, Oxford, 2003.

15 [6] R. Pastor-Satorras and A. Vespignani. Evolution and Structure of the Internet - A Statistical Physics Approach. Cambridge University Press, Cambridge, 2004. [7] P. Erdös and A. Rényi. On random graphs. Publ. Math. Debrecen, 6:290, 1959. [8] A. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509, 1999. [9] S. Maslov, K. Sneppenb, and A. Zaliznyaka. Detection of topological patterns in complex networks: correlation profile of the internet. Physica A, 333:529 540, 2004. [10] P. Mahadevan, D. Krioukov, K. Fall, and A. Vahdat. Systematic topology analysis and generation using degree correlations. ACM SIGCOMM Comp. Comm. Revi., 36(4):135 146, 2006. [11] S. Zhou and R. Mondragón. Structural constraints in complex networks. New Journal of Physics, 9(173):1 11, 2007. [12] R. Pastor-Satorras, A. Vázquez, and A. Vespignani. Dynamical and correlation properties of the internet. Phys. Rev. Lett., 87(258701):258701, 2001. [13] M. E. J Newman. Assortative mixing in networks. Phys. Rev. Lett., 89(208701):208701, 2002. [14] M. E. J. Newman. Scientific collaboration networks. I. Network construction and fundamental results. Phys. Rev. E, 64(016131):016131, 2001. [15] M. Boguñá, R. Pastor-Satorras, and A. Vespignani. Cut-offs and finite size effects in scale-free networks. Eur. Phys. J. B, 38:205 210, 2004. [16] R. Guimera, L. Danon, A. Diaz-Guilera, F. Giralt, and A. Arenas. Self-similar community structure in a network of human interactions. Phys. Rev. E, 68:065103, 2003. [17] P. Gleiser and L. Danon. Community structure in jazz. Adv. Complex Syst., 6:565 573, 2003. [18] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási. The large-scale organization of metabolic networks. Nature, 407:651654, 2000. [19] S. Maslov and K. Sneppen. Specificity and stability in topology of protein networks. Science, 296(5569):910 913, 2002. [20] V. Colizza, A. Flammini, A. Maritan, and A. Vespignani. Characterization and modeling of protein protein interaction networks. Physica A, 352:1 27, 2005. [21] D. J. Watts and S. H. Strogatz. Collective dynamics of small-world networks. Nature, 393:440, 1998. [22] B. Efron. Computers and the theory of statistics thinking the unthinkable. SIAM Rev., 21:460 480, 1979. [23] S. N. Dorogovtsev, A. V. Goltsev, and J. F. F. Mendes. Pseudofractal scale-free web. Physical Review E, 65:066122, 2002. [24] E. Ravasz and A.-L. Barabási. Hierarchical organization in complex networks. Physical Review E, 67:026112, 2003.