My favorite application using eigenvalues: partitioning and community detection in social networks

Will Hobbs

February 17, 2013

Abstract

Social networks are often organized into families, friendship groups, and institutional bodies. However, this organization, functional groups of friends and acquaintances especially, may not always be apparent from the characteristics of the individuals or from global measures of network structure. Community detection methods provide a means to identify cohesive and/or inter-dependent clusters of individuals. In this paper, I review traditional methods to cluster and partition vertices, along with a recent community detection method, modularity maximization, that is more intuitively appealing in many research areas. I focus in particular on the spectral solutions to these problems that have advanced and/or expanded their use in real world applications.

Introduction

Social networks, and many other real world networks, display a high level of order [3]. Globally, empirical degree sequences often follow a heavy-tailed distribution, and this distribution can be used in measures of network centrality. At the meso-scale, vertices tend to organize into groups with a high concentration of within-group edges and a low concentration of between-group edges. This organization is termed community structure [4], or clustering.¹

Community structure is practically interesting because the clusters often share characteristics or functions. For example, social networks organize into families, friendship groups, and institutional bodies. Further, the development of community structure may imply an underlying preference for (and advantage in) forming these memberships that predicts the function of the community and the network as a whole. Membership in such a group can be an indicator of local cohesion and interdependency.

Partitioning and community detection can be readily conceptualized in terms of flows and random walks on a network [3].
A small number of between-group edges forces a random walk to spend more time within a community and to cross into another one relatively infrequently. In a social network, we might connect this to the flow of social support within a group of very close friends or family members (a group that infrequently extends this support to out-groups). This spectral approach has a long history in mathematics [14] and, in recent years, has been adapted to quickly and accurately identify communities in social networks [7, 9].

¹ Much of this paper draws from reviews and explanations of community detection methods by Fortunato [3] and Newman.
History

Community detection has parallel and, recently, mutually reinforcing histories in mathematics and in social science. The mathematical approach identifies communities based on network characteristics (edges and edge strengths) [3], while traditional social science research identifies communities based on the shared characteristics and functions of individuals.

The current spectral methods for detecting communities in networks are rooted in early work by Fiedler [2] and Donath and Hoffman [1]. This research used the eigenvectors of matrices (Donath and Hoffman) and, specifically, the eigenvector corresponding to the second smallest eigenvalue of the graph Laplacian (Fiedler), or the spectral gap, to partition graphs. The eigenvector corresponding to the spectral gap approximately minimizes the cut size of a bisection because each eigenvalue is proportional to the cut size of the partition its eigenvector induces. This method was first practically applied in a 1990 paper by Pothen, Simon, and Liou [12].

The most recent applications of spectral methods to community detection use the largest eigenvalue to maximize the modularity of a network [10], or the difference between realized and expected connections within clusters. Specifically, the modularity approach maximizes the value Q:

Q = \frac{1}{2m} \sum_{ij} B_{ij} \, \delta(c_i, c_j), (1)

B_{ij} = A_{ij} - P_{ij}, (2)

P_{ij} = \frac{k_i k_j}{2m}, (3)

where B_{ij} is the modularity matrix (or the B matrix), A_{ij} is the adjacency matrix, P_{ij} is the null model matrix, k_i is the degree of vertex i, m is the edge count, and \delta is the Kronecker delta indicating shared community membership of two vertices [11]. The most common choice of null model is P_{ij} = k_i k_j / 2m, closely related to the configuration model (of expected connections based on a given degree sequence) [6, 11]. This method is appealing because, in most applications, a community detection method should not cut a graph based on local degree.
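The definition of Q in Eqs. (1)-(3) can be checked directly on a small graph. The following is a minimal sketch (not from the paper) in Python with numpy; the toy network, two triangles joined by a single edge, and its community labels are illustrative assumptions.

```python
import numpy as np

def modularity(A, labels):
    """Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * delta(c_i, c_j), Eqs. (1)-(3)."""
    k = A.sum(axis=1)                 # vertex degrees k_i
    two_m = k.sum()                   # 2m: twice the edge count
    B = A - np.outer(k, k) / two_m    # modularity matrix B_ij = A_ij - P_ij
    delta = labels[:, None] == labels[None, :]  # shared-membership indicator
    return (B * delta).sum() / two_m

# Toy example (an assumption, not from the paper):
# two triangles, vertices 0-2 and 3-5, joined by the bridge edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
labels = np.array([0, 0, 0, 1, 1, 1])
print(round(modularity(A, labels), 3))  # → 0.357
```

Putting each triangle in its own community yields Q = 5/14 ≈ 0.357, well above the Q = 0 obtained by leaving the whole network in one community.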
Connections in excess of what we would expect at random, given a degree sequence, should reflect some other (non-degree-based) social characteristic or function. Further, other null models can be easily adapted to this method. These methods are usually validated against labels assigned by the actors themselves, such as party membership in the US Congress, along with qualitative assessments, such as factions within parties during the Civil Rights Era.

Algorithms

The following methods are split by whether they minimize cuts (using the two, or more, smallest eigenvalues of the graph Laplacian) or maximize modularity (using the one, or more, leading eigenvectors of the modularity matrix). The modularity maximization methods have achieved greater success in real world applications.
Cut minimization

Some traditional graph partitioning and clustering methods use the graph Laplacian to minimize the cuts between clusters. In many of these methods, the size of the cuts is directly related to the probability that a random walk will cross from one cluster to another. These methods originally arose to address problems in parallel computing [12, 11] and require ex-ante specification of group sizes (community detection does not).

Greedy method: Kernighan-Lin algorithm [5]

One of the first methods in computer science to calculate minimum cut sizes is a greedy algorithm that swaps pairs of vertices until a (locally) minimal cut size is achieved. The pair swapping ensures that the final partition assigns a specific number of nodes to each group.

1. Divide nodes into two groups.
2. Find the pair of vertices whose swap reduces the cut size by the largest amount (or increases it by the least).
3. Swap that pair (swapping each vertex at most once) and repeat until all pairs have been swapped.
4. Go back to the configuration with the smallest cut size and begin swapping pairs again (with the preceding, already swapped pairs held in place).
5. Repeat until there is no improvement in cut size.

Spectral partitioning [12]

Spectral partitioning tends to achieve cut sizes similar to the Kernighan-Lin algorithm, but achieves them much more quickly [9]. Results can be improved by applying the Kernighan-Lin algorithm after finding the spectral cuts.

1. Calculate the eigenvector corresponding to the spectral gap of the Laplacian.
2. Sort the n_1 largest elements into one group and the rest into a second group (where n_1 is the user-specified size of one partition).
3. Sort the n_1 smallest elements into one group and the rest into a second group (because we do not know in advance whether n_1 should correspond to the largest or smallest elements).
4. Select the division with the smaller cut size.

Spectral clustering [13]
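The spectral partitioning steps above can be sketched as follows. This is an illustrative Python/numpy sketch, not code from the paper: a dense eigensolver stands in for the sparse (Lanczos-type) solvers used on large graphs, and the example graph, two triangles joined by a single edge, is an assumption for demonstration.

```python
import numpy as np

def spectral_bisection(A, n1):
    """Split vertices into groups of size n1 and n - n1 using the
    eigenvector of the second smallest Laplacian eigenvalue."""
    k = A.sum(axis=1)
    L = np.diag(k) - A                 # graph Laplacian L = D - A
    _, vecs = np.linalg.eigh(L)        # eigenvalues returned in ascending order
    fiedler = vecs[:, 1]               # second smallest eigenvalue's eigenvector
    order = np.argsort(fiedler)
    # Try the n1 smallest elements in one group, then the n1 largest,
    # and keep whichever split cuts fewer edges.
    best = None
    for group in (order[:n1], order[-n1:]):
        s = np.zeros(len(A), dtype=bool)
        s[group] = True
        cut_size = A[s][:, ~s].sum()   # edges crossing the partition
        if best is None or cut_size < best[0]:
            best = (cut_size, s)
    return best[1]

# Toy example (an assumption): two triangles joined by one edge; the
# minimum bisection severs only that bridge edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
side = spectral_bisection(A, 3)
cut = int(A[side][:, ~side].sum())
print(cut)  # → 1
```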
1. Calculate the k eigenvectors corresponding to the k smallest eigenvalues of the normalized graph Laplacian (the unnormalized graph Laplacian will achieve similar results when the vertices' degrees are similar [3]).
2. Create an n × k matrix (where the k columns are the eigenvectors).
3. Perform k-means clustering on the n vertices in k-dimensional Euclidean space.

Modularity maximization [4]

There are two prominent algorithms to optimize the modularity value introduced in the history section of this paper. The algorithms do not substantially differ in speed or accuracy [9].

Greedy method: vertex-moving algorithm (Kernighan-Lin analog) [9]

1. Divide nodes into two groups.
2. Move the vertex that achieves the greatest increase in modularity (moving each vertex only once).
3. After each vertex has been reassigned, go back to the configuration with the largest modularity value and begin moving vertices again (with the preceding, already moved vertices held in place).
4. Repeat until there is no improvement in modularity.
5. Repeat this on each bipartition (until there is no increase in modularity) to find an arbitrary number of communities.

Spectral algorithm [7]

1. Find the leading eigenvector of the modularity matrix (using the power method).
2. Assign nodes corresponding to the positive values of the leading eigenvector to one community and the remainder to another (if there are no positive values, do not split the network).
3. Repeat this on each bipartition (until there are no positive values in the leading eigenvector) to find an arbitrary number of communities.

Other eigenvectors

This method throws away useful information from the other eigenvectors. In theory, the k largest eigenvalues can be used to find more than two communities at once, but there is currently no implementation of this (the problem is too high-dimensional).
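A single bipartition step of the spectral modularity algorithm above can be sketched as follows, again in Python with numpy and on an assumed toy graph; for brevity, a dense eigensolver replaces the power method described later in the paper.

```python
import numpy as np

def modularity_bisection(A):
    """Split a network by the signs of the leading eigenvector of the
    modularity matrix B; return None if no split is indicated."""
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m        # modularity matrix, Eq. (2)
    vals, vecs = np.linalg.eigh(B)
    leading = vecs[:, np.argmax(vals)]    # eigenvector of the largest eigenvalue
    split = leading > 0
    if split.all() or (~split).all():
        return None                        # no mixed signs: do not split the network
    return split

# Toy example (an assumption): two triangles joined by a single edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
split = modularity_bisection(A)
communities = sorted(sorted(int(v) for v in np.flatnonzero(split == s))
                     for s in (True, False))
print(communities)  # → [[0, 1, 2], [3, 4, 5]]
```

In a full implementation this step would be repeated on each resulting community until no leading eigenvector has mixed signs.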
Ideas behind optimization techniques

In this section, I review two techniques that enable the practical implementation of spectral partitioning and modularity maximization.
Spectral approximation

The matrix formulation of the graph partitioning problem is:

R = \frac{1}{4} s^T L s, (4)

The community assignment vector s (with each s_i equal to +1 or -1) that minimizes the cut size R (equivalently, R = \frac{1}{4} \sum_{ij} A_{ij} (1 - s_i s_j)) is difficult to find in practice. A useful approximation permits s to take on any real values subject to s^T s = n. This relaxation allows s to point anywhere on the hypersphere circumscribing the hypercube whose 2^n vertices are the allowed discrete assignments [9]. Given this relaxation, we can differentiate with respect to s_i and find an approximate, continuous solution in terms of the graph Laplacian and its eigenvectors.

Spectral modularity maximization in O(n^2) time

The modularity matrix B is not a sparse matrix, and finding its leading eigenvector with the power method takes O(n^3) time. This is not practical on many real world networks. However, the structure of the modularity matrix allows the computation to be completed more quickly, in O(n^2) [9]. In practice, the power method computes:

Bx = Ax - \frac{k (k^T x)}{2m}, (5)

where B is the modularity matrix, A is the adjacency matrix, x is an arbitrary vector, k is the vector of vertex degrees, and 2m is a scaling term. Ax takes O(m + n) operations to complete, given the sparse matrix A (m is the number of edges and n is the number of vertices), and the second term can be evaluated in O(n). Because the power method converges to the leading eigenvector in O(n) multiplications and most real world networks (especially social networks) are sparse, the algorithm completes in O(n^2) [8].
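The multiplication in Eq. (5) can be sketched in Python/numpy as follows. This is an illustrative implementation, not the paper's code: the fixed random seed, the iteration cap, and the spectral shift of 2m (added so the iteration converges to B's most positive eigenvalue rather than the largest in magnitude; 2m is a generous but sufficient shift for this small example) are all assumptions for the demonstration.

```python
import numpy as np

def leading_modularity_eigenvector(A, iters=1000, tol=1e-12):
    """Power method on the modularity matrix B using Eq. (5),
    without ever forming the dense matrix B."""
    k = A.sum(axis=1)
    two_m = k.sum()
    rng = np.random.default_rng(0)          # fixed seed: an assumption
    x = rng.standard_normal(len(A))
    x /= np.linalg.norm(x)
    y = x
    for _ in range(iters):
        # (B + 2m I) x = Ax - k (k^T x)/2m + 2m x:
        # one sparse multiply plus O(n) extra work per iteration.
        y = A @ x - k * (k @ x) / two_m + two_m * x
        y /= np.linalg.norm(y)
        if np.linalg.norm(y - x) < tol:     # converged
            break
        x = y
    return y

# Toy example (an assumption): two triangles joined by a single edge.
# The sign pattern of the leading eigenvector separates the triangles.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
v = leading_modularity_eigenvector(A)
groups = sorted(sorted(int(i) for i in np.flatnonzero((v > 0) == s))
                for s in (True, False))
print(groups)  # → [[0, 1, 2], [3, 4, 5]]
```

With a sparse representation of A, each iteration costs O(m + n) rather than the O(n^2) of a dense matrix-vector product, which is the source of the speedup described above.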
References

[1] W. E. Donath and A. J. Hoffman, Lower bounds for the partitioning of graphs, IBM Journal of Research and Development, 17 (1973), pp. 420-425.

[2] M. Fiedler, Algebraic connectivity of graphs, Czechoslovak Mathematical Journal, 23 (1973), pp. 298-305.

[3] S. Fortunato, Community detection in graphs, Physics Reports, 486 (2010), pp. 75-174.

[4] M. Girvan and M. Newman, Community structure in social and biological networks, Proceedings of the National Academy of Sciences of the United States of America, 99 (2002), pp. 7821-7826.

[5] B. W. Kernighan and S. Lin, An efficient heuristic procedure for partitioning graphs, Bell System Technical Journal, (1970).

[6] M. Molloy and B. Reed, A critical point for random graphs with a given degree sequence, Random Structures & Algorithms, 6 (1995), pp. 161-180.

[7] M. Newman, Finding community structure in networks using the eigenvectors of matrices, Physical Review E, 74 (2006), p. 036104.

[8] M. Newman, Modularity and community structure in networks, Proceedings of the National Academy of Sciences of the United States of America, 103 (2006), pp. 8577-8582.

[9] M. Newman, Networks: An Introduction, Oxford University Press, New York, NY, USA, 2010.

[10] M. Newman and M. Girvan, Finding and evaluating community structure in networks, Physical Review E, 69 (2004), p. 026113.

[11] M. A. Porter, J.-P. Onnela, and P. J. Mucha, Communities in networks, Notices of the AMS, 56 (2009), pp. 1082-1097.

[12] A. Pothen, H. D. Simon, and K.-P. Liou, Partitioning sparse matrices with eigenvectors of graphs, SIAM Journal on Matrix Analysis and Applications, 11 (1990), pp. 430-452.

[13] J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22 (2000), pp. 888-905.

[14] D. A. Spielman and S.-H. Teng, Spectral partitioning works, Foundations of Computer Science, (1996), pp. 96-105.