A Novel Parallel Hierarchical Community Detection Method for Large Networks
Ping Lu, Shengmei Luo, Lei Hu, Yunlong Lin, Junyang Zou, Qiwei Zhong, Kuangyan Zhu, Jian Lu, Qiao Wang
Southeast University, School of Information Science and Engineering, Nanjing 210096, China.
ZTE Corporation, Nanjing 210012, China.
{lu.ping, luo.shengmei, {linyl, zoujyjs, qwzhong1988, zhukuangyan, lujian198,

Abstract
Detection of community structure in big social networks is a challenging task. In this paper, we present a novel hierarchical clustering method that takes advantage of traditional methods, such as the CNM method proposed by Clauset et al. and Spectral Clustering, together with its parallel implementation. Computer-generated networks and a real-world network of about two million vertices and two billion edges, which we crawled from Sina Weibo, are used to test the implementation. Our parallel method achieves relatively high modularity in much shorter time than other fast algorithms reported to be suitable for massive data (the CNM method, Spectral Clustering, etc.). It also works well on the real-world dataset, which is difficult for traditional algorithms to handle because of CPU and memory limitations.

1 Introduction
Finding community structure in very large networks has been a hot topic in data mining [1, 2, 3]. Community detection is the process of dividing nodes into groups such that there are more connections within groups than between them. In this paper, we present a novel hierarchical clustering method together with its parallel implementation based on the MapReduce framework. To assist readers, Table 1 defines the terms and notations used throughout the paper.
Table 2 compares the time complexity of existing community detection methods with that of our hierarchical method. As the experiments in Section 4 show, our parallel implementation achieves relatively high modularity in much shorter time than other algorithms reported to be suitable for big data, such as the Girvan-Newman method (GN) [4] and the Clauset-Newman-Moore method (CNM) [5].
Table 1: Notations used in the paper
Symbol   Quantity
$m$      number of edges
$n$      number of nodes
$d$      depth of the dendrogram
$p$      number of processing nodes
$t$      number of clusters
$k_i$    degree of node $i$

Table 2: Time complexity of different methods
Algorithm                  Time complexity
GN method                  $O(m^2 n)$
CNM method                 $O(md \log n)$
Spectral Clustering [6]    $O(nt^2)$
Our method                 O( md p log n 3 p )

2 Clauset-Newman-Moore (CNM) Method
Consider a simple network (no repeated edges) whose graph is undirected and unweighted. The CNM method is a hierarchical clustering method based on the greedy optimization of the quantity known as Modularity [4, 5], which measures the quality of a partition. It is defined as

$$Q = \sum_i \left( e_{ii} - a_i^2 \right), \qquad (1)$$

where

$$e_{ij} = \frac{1}{2m} \sum_{vw} A_{vw}\, \delta(c_v, i)\, \delta(c_w, j), \qquad a_i = \sum_j e_{ij}. \qquad (2)$$

In (2), $A_{vw}$ is 1 when vertices $v$ and $w$ are connected and 0 otherwise, $c_v$ denotes the community that contains vertex $v$, and $\delta(i, j)$ is 1 if $i = j$ and 0 otherwise. Thus $e_{ii}$ is the fraction of edges that lie inside community $i$, and for $i \neq j$, $e_{ij}$ is one half of the fraction of edges connecting vertices in community $i$ to those in community $j$. Because global optimization of $Q$ is an NP-complete problem, the CNM method greedily optimizes $Q$ by repeatedly merging the two communities whose merger gives the largest increase in $Q$. It starts with every vertex in its own community, $C^{(n)} = \{c_1, \dots, c_n\}$ with $c_i = \{v_i\}$, and merges one pair of communities at a time; when $Q$ finally starts to decrease, the algorithm returns $C^{(s)} = \{c_1, \dots, c_s\}$. The process can be seen as a search for an agglomerative path in the space of partitions. To make the search efficient, the CNM method maintains a max-heap of the values $\Delta Q_{ij}$, the change in $Q$ when community $i$ is merged into community $j$.
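Equations (1) and (2) can be evaluated in a single pass over the edge list. The sketch below is an illustration under assumptions, not code from the paper: the graph is given as a Python list of vertex pairs (each undirected edge listed once, no self-loops), the partition as a dict from vertex to community label, and the function name `modularity` is hypothetical. Each edge $(v, w)$ contributes both $A_{vw}$ and $A_{wv}$ to the double sum in (2).

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition, following (1) and (2).

    edges: list of (v, w) pairs of an undirected, unweighted simple graph,
           each edge listed once (no self-loops).
    community: dict mapping every vertex v to its community label c_v.
    """
    edges = list(edges)
    m = len(edges)
    e_ii = defaultdict(float)  # e_ii: fraction of edges inside community i
    a = defaultdict(float)     # a_i = sum_j e_ij: fraction of edge ends in i
    for v, w in edges:
        cv, cw = community[v], community[w]
        # A_vw and A_wv each add 1/(2m) to the double sum in (2)
        if cv == cw:
            e_ii[cv] += 2 / (2 * m)
        a[cv] += 1 / (2 * m)
        a[cw] += 1 / (2 * m)
    return sum(e_ii[i] - a[i] ** 2 for i in a)
```

For two triangles joined by a single edge ($m = 7$), the partition into the two triangles gives $Q = 2(3/7 - 1/4) = 5/14 \approx 0.357$, while putting every vertex into one community gives $Q = 0$.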
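According to (1), let $Q_i$ denote the contribution of the $i$th community to $Q$, and $Q_{ij}$ the contribution of the community obtained by merging $i$ and $j$. Since every community initially consists of a single vertex, we set

$$\Delta Q_{ij} = Q_{ij} - (Q_i + Q_j) = \begin{cases} 2\left( \dfrac{1}{2m} - \dfrac{k_i k_j}{(2m)^2} \right) & \text{if } i \text{ and } j \text{ are connected}, \\[4pt] 0 & \text{otherwise}. \end{cases} \qquad (3)$$

The factor 2 arises because an edge between $i$ and $j$ contributes to both $e_{ij}$ and $e_{ji}$; it keeps (3) on the same scale as the update rules (4).

Each step merges two communities and updates $\Delta Q$ for the communities connected to either of them. For convenience, the merged community keeps the label $j$ and community $i$ is deleted. The update rules are as follows. If community $k$ is connected to both $i$ and $j$, then

$$\Delta Q'_{jk} = \Delta Q_{ik} + \Delta Q_{jk}. \qquad (4a)$$

If $k$ is connected to $i$ but not to $j$, then

$$\Delta Q'_{jk} = \Delta Q_{ik} - 2 a_j a_k. \qquad (4b)$$

If $k$ is connected to $j$ but not to $i$, then

$$\Delta Q'_{jk} = \Delta Q_{jk} - 2 a_i a_k. \qquad (4c)$$

Equation (4) shows that once the largest $\Delta Q$ is negative, no further merge can increase $Q$: the sum of two negative values is negative, and subtracting $2a_j a_k \ge 0$ or $2a_i a_k \ge 0$ cannot make a negative value positive, so every updated $\Delta Q$ stays negative. Hence, along the merging process, $Q$ increases up to a single maximum and never increases again afterwards; we refer to this property as the concavity of $Q$ with respect to the merging process.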
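The greedy procedure defined by (3) and (4) can be sketched as follows. This is a minimal single-machine illustration under assumptions, not the authors' implementation: the function name `cnm` is hypothetical, a linear scan over all stored $\Delta Q_{ij}$ replaces CNM's max-heap (so each merge costs time proportional to the number of stored pairs rather than $O(\log n)$), and the initial gains are taken as $2\left(\frac{1}{2m} - \frac{k_i k_j}{(2m)^2}\right)$, the exact change in $Q$ implied by (1) and (2), which is on the same scale as the update rules (4b) and (4c). Merging stops as soon as no stored gain is positive.

```python
from collections import defaultdict


def cnm(edges):
    """Greedy CNM agglomeration on an undirected, unweighted simple graph.

    edges: list of (v, w) pairs, each edge listed once, no self-loops.
    Returns (list of communities as sets, Modularity Q of that partition).
    """
    adj = defaultdict(set)
    for v, w in edges:
        adj[v].add(w)
        adj[w].add(v)
    m = len(edges)
    a = {i: len(adj[i]) / (2 * m) for i in adj}  # a_i = k_i / (2m)
    # Initial gains 2 * (1/(2m) - k_i k_j / (2m)^2) for every edge (i, j)
    dq = {i: {j: 2 * (1 / (2 * m) - a[i] * a[j]) for j in adj[i]} for i in adj}
    members = {i: {i} for i in adj}
    q = -sum(x * x for x in a.values())  # Q of the all-singleton partition
    while True:
        # Linear scan for the largest gain (CNM uses a max-heap instead)
        best, pair = 0.0, None
        for i, row in dq.items():
            for j, val in row.items():
                if val > best:
                    best, pair = val, (i, j)
        if pair is None:  # largest gain <= 0: Q has reached its peak
            break
        i, j = pair
        q += best
        row_i, row_j = dq.pop(i), dq[j]  # merge i into j; j keeps its label
        del row_i[j], row_j[i]
        for k in set(row_i) | set(row_j):
            if k in row_i and k in row_j:
                new = row_i[k] + row_j[k]  # (4a)
            elif k in row_i:
                new = row_i[k] - 2 * a[j] * a[k]  # (4b), a_j before the merge
            else:
                new = row_j[k] - 2 * a[i] * a[k]  # (4c), a_i before the merge
            row_j[k] = dq[k][j] = new
            dq[k].pop(i, None)
        a[j] += a.pop(i)
        members[j] |= members.pop(i)
    return list(members.values()), q
```

On two triangles joined by a single edge, the sketch merges each triangle and then stops with $Q = 5/14$, because joining the two triangles would change $Q$ by $2(1/14 - 1/4) < 0$.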
Figure 1: A framework of the parallel CNM method (Network → Traditional Method → 1-level subgraphs C1l, distributed to Processing Node 1 and Processing Node 2 → Improved CNM on each node → 2-level subgraphs C2l → communities C3l).

3 Our Method
The original CNM method must repeatedly find the largest change in $Q$ over the whole network, so it is not easy to parallelize. The key to parallelizing CNM lies in performing the merging work on every computing node while keeping a reasonably reliable search path. Assume that we have divided the network into $p$ subcommunities, called 1-level subgraphs, either randomly or with another algorithm, and let $Q_{1l}$ be the Modularity of this partition. These subgraphs are distributed to the computing nodes, where the merging rules of Section 2 divide each of them into several smaller communities, called 2-level subgraphs, which form the intermediate result. By the concavity property of Section 2, the Modularity $Q_{2l}$ of the 2-level subgraphs will be no less than $Q_{1l}$. To obtain the final result, we then apply CNM to the 2-level subgraphs, which for the same reason yields a still higher Modularity. The process is shown in Fig. 1. In short, we parallelize the intermediate merging process of CNM across several computing nodes and finally merge the 2-level subgraphs on a master node, so our parallel method improves the Modularity of the first-step raw partition, whether that partition is random or produced by another algorithm.

Two points are worth emphasizing:
1. For each 1-level subgraph, the merging process is initialized according to (3), where $k_i$ is the degree of vertex $i$ in the original graph rather than in the subgraph. This keeps the optimization aimed at the global Modularity and makes the algorithm perform better than it would with subgraph degrees.
2. For the 2-level subgraphs, the initial communities are no longer single vertices, so (3) is generalized to
$$\Delta Q_{ij} = 2\left( e_{ij} - a_i a_j \right), \qquad (5)$$
where $e_{ij}$ and $a_i$ are defined in (2).

Apart from a few global quantities such as $k_i$, the processing of each 1-level subgraph is independent of the others, so parallelization is easy to realize. Pseudocode is given in Appendix A. Based on this algorithm, we propose the following two community detection methods.

3.1 Iterative parallel CNM method
The most straightforward implementation of the parallel CNM method divides the network into several groups randomly and uses them as the input of our iterative method. The algorithm described above forms a single step of the iteration, and each step yields a new community detection result with a higher Modularity. The method is efficient because the complexity of one iteration step is approximately $1/p^3$ times that of the original CNM method. We call this method ICNM for short.

3.2 Parallel CNM method collaborating with Spectral Clustering
The complexity of the basic Lanczos method grows linearly with the number of groups in the Spectral Clustering method, and Spectral Clustering cannot guarantee that each subcommunity is a connected subgraph. Spectral Clustering is therefore not an efficient method for fine-grained clustering into many groups, but it is suitable as the traditional method in our framework (Fig. 1), because $p$, which is also the number of 1-level communities, is relatively small. We therefore use Spectral Clustering to obtain $p$ subgraphs and then apply the method above to obtain the final community detection result. This step is run only once rather than iteratively, because the spectral clustering result is already relatively reliable. We call this method SCNM for short.

4 Empirical Evaluation
In this section, we evaluate the proposed method on benchmark networks and real-world networks. Our parallel algorithm is implemented in Hadoop MapReduce, an open-source implementation of the MapReduce parallel computing model introduced by Google, and runs on our IBM HS21*9 blade servers under Red Hat 5.5 with Linux kernel, hadoop.2.2 and Java.

4.1 Tests on computer-generated networks
The benchmark networks of [7] are used to test our algorithm. We generate the benchmark graphs with the parameters $k = 12$, $k_{max} = 1$, $\gamma = 2$, $\beta = 1$, $\mu = .2$, $s_{min} = 5$, $s_{max} = 2$. For the benchmark network with 1, nodes, we measure the cumulative time and the Modularity after each step of ICNM and SCNM as the number of iterations increases. The result is shown in Fig. 2: our methods run in much shorter time than the original CNM method and obtain slightly higher Modularity. Fig. 2 also indicates when to stop iterating. For ICNM, iteration stops when the Modularity is very close to that of the previous iteration.
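The two-level step of Sect. 3, which ICNM repeats and SCNM runs once, can be sketched on a single machine. This is a minimal illustration under stated assumptions, not the authors' MapReduce code: the hypothetical mapping `node_of` assigns each vertex to a processing node (the 1-level split, random for ICNM or spectral for SCNM); the per-node phase is simulated by only allowing merges between communities on the same node; $a_i$ always uses degrees in the whole graph (point 1 of Sect. 3); initial gains follow $2(e_{ij} - a_i a_j)$ as in (5); and a linear scan replaces the max-heap. The names `greedy_merge` and `two_level_step` are hypothetical, and the ICNM outer loop with its stopping rule is not included.

```python
from collections import defaultdict


def greedy_merge(edges, label, group=None):
    """Greedy CNM-style merging that starts from an arbitrary partition.

    edges: list of undirected (v, w) pairs, each edge listed once, no self-loops.
    label: dict vertex -> initial community id.
    group: optional dict community id -> processing node; if given, only
           communities on the same node may merge (simulated per-node phase).
    Returns (dict vertex -> final community id, Modularity Q).
    """
    m = len(edges)
    e = defaultdict(lambda: defaultdict(float))  # e[i][j] for i != j, as in (2)
    e_in, a = defaultdict(float), defaultdict(float)
    for v, w in edges:
        i, j = label[v], label[w]
        a[i] += 1 / (2 * m)  # degrees always taken in the whole graph
        a[j] += 1 / (2 * m)
        if i == j:
            e_in[i] += 1 / m  # A_vw and A_wv both fall inside community i
        else:
            e[i][j] += 1 / (2 * m)
            e[j][i] += 1 / (2 * m)
    q = sum(e_in[i] - a[i] ** 2 for i in a)  # eq. (1)

    def allowed(i, j):
        return group is None or group[i] == group[j]

    # Initial gains 2(e_ij - a_i a_j) for adjacent communities, eq. (5)
    dq = {i: {j: 2 * (x - a[i] * a[j]) for j, x in e[i].items() if allowed(i, j)}
          for i in list(a)}
    parent = {}
    while True:
        best, pair = 0.0, None
        for i, row in dq.items():
            for j, val in row.items():
                if val > best:
                    best, pair = val, (i, j)
        if pair is None:  # no remaining merge increases Q
            break
        i, j = pair
        q += best
        row_i, row_j = dq.pop(i), dq[j]  # merge i into j
        del row_i[j], row_j[i]
        for k in set(row_i) | set(row_j):
            if k in row_i and k in row_j:
                new = row_i[k] + row_j[k]  # (4a)
            elif k in row_i:
                new = row_i[k] - 2 * a[j] * a[k]  # (4b)
            else:
                new = row_j[k] - 2 * a[i] * a[k]  # (4c)
            row_j[k] = dq[k][j] = new
            dq[k].pop(i, None)
        a[j] += a.pop(i)
        parent[i] = j

    def root(c):  # follow merges to the surviving community id
        while c in parent:
            c = parent[c]
        return c

    return {v: root(c) for v, c in label.items()}, q


def two_level_step(edges, node_of):
    """One step of the Sect. 3 scheme; node_of maps vertex -> processing node."""
    singletons = {v: v for v in node_of}
    level2, _ = greedy_merge(edges, singletons, group=node_of)  # per-node phase
    return greedy_merge(edges, level2)  # master phase on the 2-level subgraphs
```

With `node_of` placing each triangle of the two-triangle example on its own node, the per-node phase already recovers both triangles and the master phase makes no further merge ($Q = 5/14$). Because the master phase only applies merges with a positive gain, its result never has lower Modularity than the per-node result.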
In the tests below, we stop iterating when $(Q_{now} - Q_{pre})/Q_{now} < .1$, a threshold chosen intuitively. For SCNM, as Sect. 3.2 states, only one iteration is run because the Modularity is already high after a single iteration. The performance test on networks of different sizes shown in Fig. 3 illustrates that our method is suited to very large networks rather than small ones, because the iterative, parallel process incurs inter-node communication overhead and job start-up time in our MapReduce framework, and on small networks this fixed cost outweighs the savings.

Figure 2: Cumulative time and Modularity versus the number of iterations in our methods (left: cumulative time in seconds; right: Modularity; curves for clustering by the Spectral Clustering method and for clustering randomly, with the run time and Modularity of the original CNM shown for reference).

4.2 Tests on real-world networks
In this subsection, we test our method on a real-world network of 1,628,853 vertices and 159,285,54 edges that we crawled from Sina Weibo, a Twitter-like microblogging service in China. The community detection result is shown in Appendix B, and the Modularity scores and runtimes of the different methods are listed in Table 3. The relatively high Modularity supports the credibility of the result.

Figure 3: The runtime and performance of our method on networks of different sizes (left: run time in seconds versus network size; right: Modularity versus network size; the original CNM method is shown for comparison).

Table 3: The Modularity score and runtime of different methods on our Sina Weibo dataset
Method   Modularity   Runtime/s
CNM ,338
ICNM ,431
SCNM ,899

5 Significance and Impact
In this paper, we proposed a parallel algorithm that is based on the CNM method but changes the way Modularity is optimized. Instead of joining the pair of communities with the greatest increase in Modularity over the whole network, it joins communities with relatively large Modularity increases independently on each processing node. This improves the performance of the traditional method within a relatively short time. Based on this idea, we proposed two parallel hierarchical community detection methods, ICNM and SCNM. Our parallel implementation achieves relatively high modularity in much shorter time than the original CNM method. The much improved speed of our algorithm makes it possible to find community structure in very large networks, which are difficult for traditional algorithms to handle because of CPU and memory limitations.

References
[1] Santo Fortunato. Community detection in graphs. Physics Reports, 486:75-174, 2010.
[2] Andrea Lancichinetti, Santo Fortunato. Community detection algorithms: A comparative analysis. Physical Review E 80, 056117, 2009.
[3] Jure Leskovec, Kevin J. Lang and Michael Mahoney. Empirical comparison of algorithms for network community detection. Proceedings of the 19th International Conference on World Wide Web, 2010.
[4] M. E. J. Newman, M. Girvan. Finding and evaluating community structure in networks. Physical Review E 69, 026113, 2004.
[5] A. Clauset, M. E. J. Newman and C. Moore. Finding community structure in very large networks. Physical Review E 70, 066111, 2004.
[6] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17, 2007.
[7] Andrea Lancichinetti, Santo Fortunato, Filippo Radicchi. Benchmark graphs for testing community detection algorithms. Physical Review E 78, 046110, 2008.
More informationImproved Balanced Parallel FP-Growth with MapReduce Qing YANG 1,a, Fei-Yang DU 2,b, Xi ZHU 1,c, Cheng-Gong JIANG *
2016 Joint International Conference on Artificial Intelligence and Computer Engineering (AICE 2016) and International Conference on Network and Communication Security (NCS 2016) ISBN: 978-1-60595-362-5
More informationDynamic Clustering in Social Networks using Louvain and Infomap Method
Dynamic Clustering in Social Networks using Louvain and Infomap Method Pascal Held, Benjamin Krause, and Rudolf Kruse Otto von Guericke University of Magdeburg Department of Knowledge Processing and Language
More information732A54/TDDE31 Big Data Analytics
732A54/TDDE31 Big Data Analytics Lecture 10: Machine Learning with MapReduce Jose M. Peña IDA, Linköping University, Sweden 1/27 Contents MapReduce Framework Machine Learning with MapReduce Neural Networks
More informationFall 2018: Introduction to Data Science GIRI NARASIMHAN, SCIS, FIU
Fall 2018: Introduction to Data Science GIRI NARASIMHAN, SCIS, FIU !2 MapReduce Overview! Sometimes a single computer cannot process data or takes too long traditional serial programming is not always
More informationarxiv: v1 [cs.si] 5 Aug 2013
Clustering and Community Detection in Directed Networks: A Survey Fragkiskos D. Malliaros a,, Michalis Vazirgiannis a,b a Computer Science Laboratory, École Polytechnique, 91120 Palaiseau, France b Department
More informationCIS 121 Data Structures and Algorithms Minimum Spanning Trees
CIS 121 Data Structures and Algorithms Minimum Spanning Trees March 19, 2019 Introduction and Background Consider a very natural problem: we are given a set of locations V = {v 1, v 2,..., v n }. We want
More informationLabel propagation with dams on large graphs using Apache Hadoop and Apache Spark
Label propagation with dams on large graphs using Apache Hadoop and Apache Spark ATTAL Jean-Philippe (1) MALEK Maria (2) (1) jal@eisti.eu (2) mma@eisti.eu October 19, 2015 Contents 1 Real-World Graphs
More informationMachine Learning and Data Mining. Clustering. (adapted from) Prof. Alexander Ihler
Machine Learning and Data Mining Clustering (adapted from) Prof. Alexander Ihler Overview What is clustering and its applications? Distance between two clusters. Hierarchical Agglomerative clustering.
More informationImproving Image Segmentation Quality Via Graph Theory
International Symposium on Computers & Informatics (ISCI 05) Improving Image Segmentation Quality Via Graph Theory Xiangxiang Li, Songhao Zhu School of Automatic, Nanjing University of Post and Telecommunications,
More informationJure Leskovec, Cornell/Stanford University. Joint work with Kevin Lang, Anirban Dasgupta and Michael Mahoney, Yahoo! Research
Jure Leskovec, Cornell/Stanford University Joint work with Kevin Lang, Anirban Dasgupta and Michael Mahoney, Yahoo! Research Network: an interaction graph: Nodes represent entities Edges represent interaction
More informationCSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection
CSE 255 Lecture 6 Data Mining and Predictive Analytics Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:
More informationα Coverage to Extend Network Lifetime on Wireless Sensor Networks
Noname manuscript No. (will be inserted by the editor) α Coverage to Extend Network Lifetime on Wireless Sensor Networks Monica Gentili Andrea Raiconi Received: date / Accepted: date Abstract An important
More informationCommunity Mining in Signed Networks: A Multiobjective Approach
Community Mining in Signed Networks: A Multiobjective Approach Alessia Amelio National Research Council of Italy (CNR) Inst. for High Perf. Computing and Networking (ICAR) Via Pietro Bucci, 41C 87036 Rende
More informationExploiting Efficient Densest Subgraph Discovering Methods for Big Data
Exploiting Efficient Densest Subgraph Discovering Methods for Big Data Bo Wu and Haiying Shen Electrical and Computer Engineering, Clemson University, Clemson, SC 29634 Department of Computer Science,
More informationUnsupervised Learning
Unsupervised Learning Unsupervised learning Until now, we have assumed our training samples are labeled by their category membership. Methods that use labeled samples are said to be supervised. However,
More informationOverlapping Communities
Yangyang Hou, Mu Wang, Yongyang Yu Purdue Univiersity Department of Computer Science April 25, 2013 Overview Datasets Algorithm I Algorithm II Algorithm III Evaluation Overview Graph models of many real
More informationBig Data Management and NoSQL Databases
NDBI040 Big Data Management and NoSQL Databases Lecture 2. MapReduce Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Framework A programming model
More informationLarge Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System
Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System Seunghwa Kang David A. Bader 1 A Challenge Problem Extracting a subgraph from
More informationIntroduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Shortest Paths Date: 10/13/15
600.363 Introduction to Algorithms / 600.463 Algorithms I Lecturer: Michael Dinitz Topic: Shortest Paths Date: 10/13/15 14.1 Introduction Today we re going to talk about algorithms for computing shortest
More informationSolution for Homework set 3
TTIC 300 and CMSC 37000 Algorithms Winter 07 Solution for Homework set 3 Question (0 points) We are given a directed graph G = (V, E), with two special vertices s and t, and non-negative integral capacities
More informationAdaptive Modularity Maximization via Edge Weighting Scheme
Information Sciences, Elsevier, accepted for publication September 2018 Adaptive Modularity Maximization via Edge Weighting Scheme Xiaoyan Lu a, Konstantin Kuzmin a, Mingming Chen b, Boleslaw K. Szymanski
More informationSocial Data Management Communities
Social Data Management Communities Antoine Amarilli 1, Silviu Maniu 2 January 9th, 2018 1 Télécom ParisTech 2 Université Paris-Sud 1/20 Table of contents Communities in Graphs 2/20 Graph Communities Communities
More informationAN OPTIMIZATION GENETIC ALGORITHM FOR IMAGE DATABASES IN AGRICULTURE
AN OPTIMIZATION GENETIC ALGORITHM FOR IMAGE DATABASES IN AGRICULTURE Changwu Zhu 1, Guanxiang Yan 2, Zhi Liu 3, Li Gao 1,* 1 Department of Computer Science, Hua Zhong Normal University, Wuhan 430079, China
More informationCS224W: Analysis of Networks Jure Leskovec, Stanford University
CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 2 Observations Models
More informationImprovements and Implementation of Hierarchical Clustering based on Hadoop Jun Zhang1, a, Chunxiao Fan1, Yuexin Wu2,b, Ao Xiao1
3rd International Conference on Machinery, Materials and Information Technology Applications (ICMMITA 2015) Improvements and Implementation of Hierarchical Clustering based on Hadoop Jun Zhang1, a, Chunxiao
More informationA Novel Similarity-based Modularity Function for Graph Partitioning
A Novel Similarity-based Modularity Function for Graph Partitioning Zhidan Feng 1, Xiaowei Xu 1, Nurcan Yuruk 1, Thomas A.J. Schweiger 2 The Department of Information Science, UALR 1 Acxiom Corporation
More information