Label propagation with dams on large graphs using Apache Hadoop and Apache Spark


1 Label propagation with dams on large graphs using Apache Hadoop and Apache Spark
ATTAL Jean-Philippe (1), MALEK Maria (2)
October 19, 2015

2 Contents
1 Real-World Graphs
2 Community Detection Problem
  Community detection algorithms
  Supervised and unsupervised measures
3 Benchmarks to test our community detection algorithms
4 Proposed community detection algorithms
5 Perspectives: a Hadoop and Spark implementation, first experiments on Amazon
6 Conclusion

3 What is a graph?
A simple definition: a graph is a set of vertices and edges, that is, a set of nodes connected by some degree of relationship (a certain level of similarity).
A mathematical definition: a graph is an ordered pair G = (V, E), where V is the set of vertices (or nodes) and E is the set of edges of the graph. We write |V| = n for the number of vertices of G and |E| = m for the number of edges of G.
Figure 1: weighted graph. Figure 2: unweighted graph.
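As a small illustration of the definition above, the following sketch stores a toy graph as a vertex set and an edge set, then derives n, m and a neighbour map. The vertex names, the weights and the helper `adjacency` are hypothetical and do not come from the slides.

```python
# Hypothetical toy graph G = (V, E); names and weights are illustrative only.
V = {"a", "b", "c", "d"}
E = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}  # undirected edges

n = len(V)  # |V|, number of vertices
m = len(E)  # |E|, number of edges

# Weighted variant: one weight per edge; an unweighted graph uses w(e) = 1.
w = {e: 1.0 for e in E}
w[("c", "d")] = 0.5


def adjacency(vertices, edges):
    """Build a neighbour map for an undirected graph."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


adj = adjacency(V, E)
```

The neighbour map is the form that local methods such as label propagation typically iterate over, since each update only needs a node and its neighbourhood.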


5 The community detection problem
The community detection problem is to find a partition of the nodes such that there is a high density of edges within a group and a low density of edges between groups. It amounts to detecting natural groups of vertices in networks. There is no absolute definition in the literature.
Figure 3: A toy example with 3 communities of different scales

6 Analysis of community detection methods
There are three main families of community detection methods:
Local methods: based on the node and its neighbourhood; propagate information or an operation; non-deterministic and very unstable; low complexity, O(m).
Global methods: based on the whole topology of the graph; high complexity (often O(n^3)); propagate information or an operation; deterministic, subject to error propagation.
Hybrid ("glocal") methods: based on a local method that is led by one or several global metrics; complexity depends on the global metrics used; better-quality results than local methods; non-deterministic.

7 Example of the three methodologies Figure 4: Agglomerative, spectral and Leader driven methods

8 Unsupervised measures
To evaluate the quality of our community detection algorithms, we use:
Unsupervised measures: modularity and conductance.
Supervised measures: adjusted Rand index, normalised mutual information and purity.
Let f(S) be the metric that captures the notion of the quality of a cluster S. An optimized value of f(S) signifies a more community-like set of nodes. Here $m_S$ is the number of edges between nodes in S, $N_S$ is the number of nodes in S, and $c_S$ is the number of outgoing links.
The conductance, $f(S) = \frac{c_S}{2 m_S + c_S}$, measures the fraction of total edge volume that points outside the cluster.
The modularity, $f(S) = \frac{1}{4}\left(m_S - E(m_S)\right)$, is the difference between $m_S$ and $E(m_S)$, the expected number of such edges in a random graph with identical degree sequence.
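The two unsupervised scores can be sketched on a toy graph as follows. This is a minimal sketch, not the authors' code: the conductance is taken as c_S / (2 m_S + c_S) and the modularity as (1/4)(m_S - E(m_S)). The slides do not say how E(m_S) is computed, so the sketch assumes the usual configuration-model estimate E(m_S) = vol(S)^2 / (4m), where vol(S) is the sum of the degrees in S. The function names and the example graph are hypothetical.

```python
def cluster_counts(adj, S):
    """Return (m_S, c_S): edges inside S and edges leaving S."""
    S = set(S)
    internal_endpoints = 0
    c_S = 0
    for u in S:
        for v in adj[u]:
            if v in S:
                internal_endpoints += 1  # each internal edge is seen twice
            else:
                c_S += 1
    return internal_endpoints // 2, c_S


def conductance(adj, S):
    """c_S / (2 m_S + c_S); undefined for a set with no incident edges."""
    m_S, c_S = cluster_counts(adj, S)
    return c_S / (2 * m_S + c_S)


def cluster_modularity(adj, S):
    """(1/4) (m_S - E(m_S)) with E(m_S) = vol(S)^2 / (4m) (assumption)."""
    m = sum(len(nb) for nb in adj.values()) // 2
    vol = sum(len(adj[u]) for u in S)
    m_S, _ = cluster_counts(adj, S)
    expected = vol * vol / (4 * m)
    return 0.25 * (m_S - expected)


# Hypothetical example: two triangles joined by the single edge (2, 3).
toy = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
phi = conductance(toy, {0, 1, 2})
q_s = cluster_modularity(toy, {0, 1, 2})
```

For the left triangle, m_S = 3 and c_S = 1, so the conductance is 1/7: only one of the seven edge endpoints attached to the cluster points outside it.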

9 Supervised measures: Adjusted Rand Index (1971)
Let S be a set of N data items, and let U = {U_1, U_2, ..., U_R} and V = {V_1, V_2, ..., V_C} be two partitions of S. The overlap between U and V can be summarized in an R × C contingency table $M = [n_{ij}]_{i=1 \ldots R}^{j=1 \ldots C}$, where $n_{ij}$ denotes the number of objects common to clusters $U_i$ and $V_j$. Counting over the $\binom{N}{2}$ pairs of items, we have:
$N_{11}$: the number of pairs that are in the same cluster in both U and V;
$N_{00}$: the number of pairs that are in different clusters in both U and V;
$N_{01}$: the number of pairs that are in the same cluster in U but in different clusters in V;
$N_{10}$: the number of pairs that are in different clusters in U but in the same cluster in V.
$$\mathrm{ARI} = \frac{2\,(N_{00} N_{11} - N_{01} N_{10})}{(N_{00} + N_{01})(N_{01} + N_{11}) + (N_{00} + N_{10})(N_{10} + N_{11})}$$
Figure 5: College football club network
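The pair-counting form of the ARI, 2(N00 N11 - N01 N10) / ((N00 + N01)(N01 + N11) + (N00 + N10)(N10 + N11)), can be checked directly by enumerating pairs. The sketch below is quadratic in N and meant only for small examples; the function names are hypothetical, and returning 1.0 when the denominator vanishes (both partitions trivial) is a convention chosen here, not something stated in the slides.

```python
from itertools import combinations


def pair_counts(u_labels, v_labels):
    """Return (N11, N00, N01, N10) over all pairs of items."""
    n11 = n00 = n01 = n10 = 0
    for i, j in combinations(range(len(u_labels)), 2):
        same_u = u_labels[i] == u_labels[j]
        same_v = v_labels[i] == v_labels[j]
        if same_u and same_v:
            n11 += 1
        elif not same_u and not same_v:
            n00 += 1
        elif same_u:
            n01 += 1  # same cluster in U, different clusters in V
        else:
            n10 += 1  # different clusters in U, same cluster in V
    return n11, n00, n01, n10


def adjusted_rand_index(u_labels, v_labels):
    n11, n00, n01, n10 = pair_counts(u_labels, v_labels)
    denom = (n00 + n01) * (n01 + n11) + (n00 + n10) * (n10 + n11)
    if denom == 0:
        return 1.0  # convention for two trivial, identical partitions
    return 2.0 * (n00 * n11 - n01 * n10) / denom


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])     # identical up to renaming
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # worse than chance
```

The first call gives 1.0 because the index ignores label names; the second gives -0.5, which shows that the adjusted index can be negative when two partitions agree less than expected by chance.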

10 Supervised measures: the Normalised Mutual Information (1987)
$$H(U) = -\sum_{i=1}^{R} \frac{a_i}{N} \log\frac{a_i}{N}$$
$$H(U, V) = -\sum_{i=1}^{R} \sum_{j=1}^{C} \frac{n_{ij}}{N} \log\frac{n_{ij}}{N}$$
$$MI(U, V) = H(U) + H(V) - H(U, V)$$
Here $a_i = \sum_j n_{ij}$ is the size of cluster $U_i$.
Values of the NMI: the MI, like the NMI, measures the information that partitions U and V share; it tells how much knowing one of these clusterings reduces our uncertainty about the other. The NMI can be considered as a function $f_{NMI}(U, V) \in [0, 1]$.
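A minimal sketch of these quantities, using MI(U, V) = H(U) + H(V) - H(U, V), is given below. The slide does not state which normalisation turns MI into NMI, so the sketch assumes the arithmetic-mean form 2 MI / (H(U) + H(V)); other choices (for example the geometric mean of the entropies) are also common. Function names are hypothetical.

```python
from collections import Counter
from math import log


def entropy(counts, n):
    """Shannon entropy of a partition given its cluster sizes."""
    return -sum(c / n * log(c / n) for c in counts if c > 0)


def mutual_information(u_labels, v_labels):
    """Return (MI, H(U), H(V)) with MI = H(U) + H(V) - H(U, V)."""
    n = len(u_labels)
    h_u = entropy(Counter(u_labels).values(), n)
    h_v = entropy(Counter(v_labels).values(), n)
    # Joint cluster sizes n_ij are the counts of (U label, V label) pairs.
    h_uv = entropy(Counter(zip(u_labels, v_labels)).values(), n)
    return h_u + h_v - h_uv, h_u, h_v


def nmi(u_labels, v_labels):
    """Assumed normalisation: 2 MI / (H(U) + H(V))."""
    mi, h_u, h_v = mutual_information(u_labels, v_labels)
    if h_u + h_v == 0:
        return 1.0  # both partitions have a single cluster
    return 2.0 * mi / (h_u + h_v)


identical = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

With this normalisation, identical partitions score 1 and the two "crossed" partitions score 0, since knowing one tells nothing about the other.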


12 Networks characteristics Figure 6: Network Characteristics


14 Proposed community detection algorithm
Label propagation is a non-deterministic algorithm based on propagating labels.
1: Initialize nodes with unique labels: $\forall n \in N,\ c_n = l_n$.
2: Update each node's label to the label shared by most of its neighbours: $\forall n \in N,\ c_n = \arg\max_{l \in L} |N^l(n)|$, where $N^l(n)$ is the set of neighbours of $n$ carrying label $l$.
3: If every node has a label that the maximum number of its neighbours have, stop the algorithm; otherwise go back to the previous step.
Figure 7: The label propagation
This algorithm is a local method: low complexity, O(m); non-deterministic; very unstable (without a fixed or preferential update order).
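The three steps can be sketched in a few lines of Python. This is an illustrative version under stated assumptions, not the authors' implementation: nodes are visited in a random order on every sweep, a node keeps its current label when that label is already among the most frequent ones in its neighbourhood, and remaining ties are broken at random. The function name, the `seed` and the `max_sweeps` cap are hypothetical.

```python
import random


def label_propagation(adj, seed=0, max_sweeps=100):
    """Asynchronous label propagation on an undirected neighbour map."""
    rng = random.Random(seed)
    labels = {u: u for u in adj}  # step 1: one unique label per node
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)  # the update order is a source of instability
        changed = False
        for u in nodes:
            counts = {}
            for v in adj[u]:
                counts[labels[v]] = counts.get(labels[v], 0) + 1
            if not counts:
                continue  # isolated node keeps its own label
            best = max(counts.values())
            winners = [l for l, c in counts.items() if c == best]
            if labels[u] not in winners:  # step 2: adopt a majority label
                labels[u] = rng.choice(winners)
                changed = True
        if not changed:  # step 3: every label is already a majority label
            break
    return labels


# Hypothetical example: two triangles joined by the edge (2, 3).
toy = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
communities = label_propagation(toy)
```

Changing `seed` changes the visiting order and the tie-breaking, so different runs on the same graph may return different partitions; this is the instability noted on the slide.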

15 Community detection algorithms: problems of label propagation
Label propagation has three major problems: it can produce a bad propagation; it can produce "monster" communities; it is highly unstable (when no given update order is imposed).
Figure 8: The label propagation

16 Community detection algorithms: other label propagation algorithms in the literature
Adding a score to each label, which decreases when the geodesic distance from the label's source node is too high (Leung et al. 2009)
Multistep greedy agglomerative label propagation algorithm using modularity (Liu and Murata 2009)
Offensive, defensive and hop attenuation label propagation (node propagation strength) (Lovro et al. 2013)
COPRA, finding overlapping communities (Steve Gregory 2009)
"Community Detection Using A Neighborhood Strength Driven Label Propagation Algorithm" (Xie and B.K. Szymanski 2011)
Maximum overlap label core detection using MapReduce (Ovelgönne 2013)
"Robust network community detection using balanced propagation" (Šubelj and Bajec 2013)
"Controlled Label Propagation: Preventing Over Propagation through Gradual Expansion" (Rezaei and Soleymani 2015)
The list is not exhaustive.

17 Community detection algorithms: proposed method, a stabilized label propagation with dams
Objective: find community structures without the problem of bad propagation, with a method that is easily parallelizable and produces good-quality community detection.
Figure 9: The label propagation with dams
We denote by β the percentage of edges with dams.

18 Community detection algorithms: proposed method, a hybrid method, a stabilized label propagation with dams
The edge betweenness centrality. Let G = (V, E) be a graph, where V is the set of nodes and E the set of edges of G. Let w be the weight function on edges; for an unweighted graph, w(e) = 1. Consider paths between two vertices, starting at $s \in V$ and finishing at $t \in V$, and let $\sigma_{st}$ denote the total number of shortest paths between s and t. The notion of edge betweenness is based on the number of shortest paths that pass through a given edge. The edge betweenness BC(e) of an edge $e \in E$ is given by
$$BC(e) = \sum_{s, t \in V,\ s \neq t} \frac{\sigma_{st}(e)}{\sigma_{st}}$$
where $\sigma_{st}(e)$ is the number of shortest paths from s to t that pass through the edge e.
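For unweighted graphs, the sum of σ_st(e) / σ_st over pairs of vertices can be computed with one breadth-first search per source followed by a backward accumulation pass (the approach usually attributed to Brandes). The sketch below is a plain-Python illustration, not the authors' code. It counts each unordered pair {s, t} once, which is an assumption about how the sum is meant to be read; counting ordered pairs would simply double every value and leave the ranking of edges unchanged.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected neighbour map."""
    bc = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Forward pass: BFS counting shortest paths (sigma) from s.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: push path dependencies back towards s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                bc[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair was visited from both endpoints.
    return {e: b / 2.0 for e, b in bc.items()}


# Hypothetical example: two triangles joined by the edge (2, 3).
toy = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
scores = edge_betweenness(toy)
```

On this toy graph the bridge (2, 3) carries all 3 × 3 shortest paths between the two triangles and so receives the highest score, which is why high-betweenness edges are natural candidates for separating communities.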

19 Community detection algorithms: proposed method, the LPWD
Figure 10: The LPWD
Complexity: O(n^2) + k O(m - βm)
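Since the figure describing the LPWD is not reproduced here, the following is only a sketch of one plausible reading, not the authors' algorithm. It assumes that a "dam" is an edge across which labels are not propagated, that dams are placed on the fraction β of edges with the highest score (edge betweenness, as in the conclusion), and that a standard label propagation then runs on the remaining m - βm edges. The function name, the `edge_score` input format (a dict keyed by `frozenset` edges) and the tie-handling rule are all hypothetical.

```python
import random


def lpwd_sketch(adj, edge_score, beta, seed=0, max_sweeps=100):
    """Label propagation that ignores the top beta fraction of scored edges."""
    rng = random.Random(seed)
    # Place dams on the highest-scoring edges (assumed selection rule).
    ranked = sorted(edge_score, key=edge_score.get, reverse=True)
    dams = set(ranked[:int(beta * len(ranked))])
    # Labels only travel along edges that carry no dam.
    open_adj = {
        u: [v for v in adj[u] if frozenset((u, v)) not in dams] for u in adj
    }
    labels = {u: u for u in adj}
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)
        changed = False
        for u in nodes:
            counts = {}
            for v in open_adj[u]:
                counts[labels[v]] = counts.get(labels[v], 0) + 1
            if not counts:
                continue
            best = max(counts.values())
            winners = [l for l, c in counts.items() if c == best]
            if labels[u] not in winners:
                labels[u] = rng.choice(winners)
                changed = True
        if not changed:
            break
    return labels, dams


# Hypothetical example: two triangles joined by (2, 3), with precomputed
# edge betweenness values supplied by hand for this toy graph.
toy = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
toy_scores = {
    frozenset((0, 1)): 1.0, frozenset((0, 2)): 4.0, frozenset((1, 2)): 4.0,
    frozenset((2, 3)): 9.0, frozenset((3, 4)): 4.0, frozenset((3, 5)): 4.0,
    frozenset((4, 5)): 1.0,
}
labels, dams = lpwd_sketch(toy, toy_scores, beta=0.15)
```

With β = 0.15 and seven edges, a single dam is placed on the bridge, so no label can cross from one triangle to the other and each triangle settles on its own label.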

20 Community detection algorithms : proposed Method, the LPWD Figure 11: The LPWD on Football Club

21 Community detection algorithms : a core label propagation with dams Figure 12: The core detection of Seifi et al. (2011)

22 Community detection algorithms: LPWDUS
Figure 13: The LPWDUS
Complexity: O(n^2) + O(N k (m - βm))

23 Community detection algorithms: LPWDUSWS
Figure 14: The LPWDUSWS
Modularity can be used as the score metric.
Complexity: O(n^2) + O(1 N k (m - βm))

24 Community detection algorithms: ECDLPWD
Figure 15: The ECDLPWD
Complexity: O(n^2) + O(1 N k (m - βm))

25 Comparative analysis: experiments with some community detection algorithms
Dataset: Zachary (#2)
Columns: Algorithms, Q, Φ, NMI, ARI, Purity, #
Algorithms: Louvain, Seifi, GN, Spin, Spectral, WalkTrap, Leung, LPA *, DPA, Infomap, LICOD, ECDLPWD, LPWDUS, LPWDUSWM, LPWODUS
Table 1: Experiments with some community detection algorithms

26 Comparative analysis: experiments with some community detection algorithms
Dataset: Football (#11)
Columns: Algorithms, Q, Φ, NMI, ARI, Purity, #
Algorithms: Louvain, Seifi, GN, Spin, Spectral, WalkTrap, Leung, LPA *, DPA, Infomap, LICOD, ECDLPWD, LPWDUS, LPWDUSWM, LPWODUS
Table 2: Experiments with some community detection algorithms

27 Comparative analysis: experiments with some community detection algorithms
Dataset: Dolphins (#2)
Columns: Algorithms, Q, Φ, NMI, ARI, Purity, #
Algorithms: Louvain, Seifi, GN, Spin, Spectral, WalkTrap, Leung, LPA *, DPA, Infomap, LICOD, ECDLPWD, LPWDUS, LPWDUSWM, LPWODUS
Table 3: Experiments with some community detection algorithms

28 Comparative analysis: experiments with some community detection algorithms
Dataset: Political (#3)
Columns: Algorithms, Q, Φ, NMI, ARI, Purity, #
Algorithms: Louvain, Seifi, GN, Spin, Spectral, WalkTrap, Leung, LPA *, DPA, Infomap, LICOD, ECDLPWD, LPWDUS, LPWDUSWM, LPWODUS
Table 4: Experiments with some community detection algorithms


30 Perspectives and current work: Parallel graph processing systems Figure 16: PGPS

31 Perspectives : graph partitioning with Mizan Figure 17: Dynamic graph partitioning

32 Perspectives and current work: HADOOP and MapReduce Figure 18: Hadoop architecture

33 Perspectives and current work: Spark and RDD Figure 19: Spark architecture

34 Perspectives and current work: label propagation on large graphs
Current work:
Work on large graphs with billions of edges.
A Hadoop version for large-scale graphs.
How to compute the edge betweenness on large graphs.
How to develop an in-memory solution using Spark.
Study the parametrization of the LPWDUS with α and β.
Propose an improved version of the label propagation with core detection.
Datasets: DBLP, YouTube, LiveJournal.

35 Perspectives: Amazon graph
The Amazon graph represents a network of products, where each vertex is a product and an edge exists between two products if they have been frequently co-purchased.
Figure 20: Amazon network characteristics

36 Perspectives: a label propagation on Hadoop Figure 21: Simple Label propagation on Hadoop
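The figure is not reproduced here, so the following is only a generic illustration of how one label propagation round can be phrased as a map step and a reduce step; it is not the authors' Hadoop job. It is written in plain Python rather than against the Hadoop or Spark APIs. All labels are updated synchronously, and ties are broken by taking the smallest label so that the round is deterministic; both choices, the record layout `(node, label, neighbours)` and the function names are assumptions.

```python
from collections import Counter, defaultdict


def map_node(node, label, neighbours):
    """Map: forward the node's own record and send its label to neighbours."""
    yield node, ("SELF", (label, neighbours))
    for nb in neighbours:
        yield nb, ("LABEL", label)


def reduce_node(node, values):
    """Reduce: adopt the most frequent label received (smallest on ties)."""
    label, neighbours = None, []
    votes = Counter()
    for kind, payload in values:
        if kind == "SELF":
            label, neighbours = payload
        else:
            votes[payload] += 1
    if votes:
        best = max(votes.values())
        label = min(l for l, c in votes.items() if c == best)
    return node, label, neighbours


def lpa_iteration(records):
    """One synchronous round: map, group by key (shuffle), reduce."""
    grouped = defaultdict(list)
    for node, label, neighbours in records:
        for key, value in map_node(node, label, neighbours):
            grouped[key].append(value)
    return [reduce_node(node, values) for node, values in sorted(grouped.items())]


# Hypothetical example: a triangle with one unique label per node.
records = [(0, 0, [1, 2]), (1, 1, [0, 2]), (2, 2, [0, 1])]
round_1 = lpa_iteration(records)
round_2 = lpa_iteration(round_1)
```

Because each round only exchanges labels along edges, the work per round is proportional to the number of edges and distributes naturally over graph partitions. Synchronous updates can oscillate on some graphs, which is one reason asynchronous or semi-synchronous variants are often preferred in practice.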

37 Perspectives: a core label propagation on Hadoop Figure 22: Community size distribution


39 Conclusion
Placing dams improves the quality of community detection with a local method.
ECDLPWD gives better results than LPWODUS.
LPWDUSWM seems, in specific cases, to give better results than ECDLPWD.
For scale-free graphs, placing dams on the 15% to 20% of edges with the highest edge betweenness gives better-quality results.
Combining dams with core detection finds cores, but produces more communities than the standard LPA.
The number of communities in LPWODUS depends on α.

40 Do you have any questions?

41 Please cite the following articles:
Jean-Philippe Attal, Maria Malek, A new label propagation with dams, IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Paris, August.
Jean-Philippe Attal, Maria Malek, Un nouvel algorithme de propagation de labels avec barrages, Journée Réseaux Sociaux et Intelligence Artificielle (Atelier PFIA), Rennes, 29 June.
Jean-Philippe Attal, Maria Malek, Propagation de labels avec barrages sur de grands graphes en utilisant Apache Hadoop et Apache Spark (GraphX), Journée thématique : Fouille de grands graphes (JFGG15), Nîmes, October 2015.


More information

Fast Nearest Neighbor Search on Large Time-Evolving Graphs

Fast Nearest Neighbor Search on Large Time-Evolving Graphs Fast Nearest Neighbor Search on Large Time-Evolving Graphs Leman Akoglu Srinivasan Parthasarathy Rohit Khandekar Vibhore Kumar Deepak Rajan Kun-Lung Wu Graphs are everywhere Leman Akoglu Fast Nearest Neighbor

More information

Diffusion and Clustering on Large Graphs

Diffusion and Clustering on Large Graphs Diffusion and Clustering on Large Graphs Alexander Tsiatas Final Defense 17 May 2012 Introduction Graphs are omnipresent in the real world both natural and man-made Examples of large graphs: The World

More information

L1-graph based community detection in online social networks

L1-graph based community detection in online social networks L1-graph based community detection in online social networks Liang Huang 1, Ruixuan Li 1, Kunmei Wen 1, Xiwu Gu 1, Yuhua Li 1 and Zhiyong Xu 2 1 Huazhong University of Science and Technology 2 Suffork

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

MCL. (and other clustering algorithms) 858L

MCL. (and other clustering algorithms) 858L MCL (and other clustering algorithms) 858L Comparing Clustering Algorithms Brohee and van Helden (2006) compared 4 graph clustering algorithms for the task of finding protein complexes: MCODE RNSC Restricted

More information

Alternative Clusterings: Current Progress and Open Challenges

Alternative Clusterings: Current Progress and Open Challenges Alternative Clusterings: Current Progress and Open Challenges James Bailey Department of Computer Science and Software Engineering The University of Melbourne, Australia 1 Introduction Cluster analysis:

More information

Unsupervised Learning: Clustering

Unsupervised Learning: Clustering Unsupervised Learning: Clustering Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein & Luke Zettlemoyer Machine Learning Supervised Learning Unsupervised Learning

More information

Introduction to network metrics

Introduction to network metrics Universitat Politècnica de Catalunya Version 0.5 Complex and Social Networks (2018-2019) Master in Innovation and Research in Informatics (MIRI) Instructors Argimiro Arratia, argimiro@cs.upc.edu, http://www.cs.upc.edu/~argimiro/

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING Clustering Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu November 7, 2017 Learnt Clustering Methods Vector Data Set Data Sequence Data Text

More information

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani Clustering CE-717: Machine Learning Sharif University of Technology Spring 2016 Soleymani Outline Clustering Definition Clustering main approaches Partitional (flat) Hierarchical Clustering validation

More information

Consensus clustering by graph based approach

Consensus clustering by graph based approach Consensus clustering by graph based approach Haytham Elghazel 1, Khalid Benabdeslemi 1 and Fatma Hamdi 2 1- University of Lyon 1, LIESP, EA4125, F-69622 Villeurbanne, Lyon, France; {elghazel,kbenabde}@bat710.univ-lyon1.fr

More information

Local Community Detection in Dynamic Graphs Using Personalized Centrality

Local Community Detection in Dynamic Graphs Using Personalized Centrality algorithms Article Local Community Detection in Dynamic Graphs Using Personalized Centrality Eisha Nathan, Anita Zakrzewska, Jason Riedy and David A. Bader * School of Computational Science and Engineering,

More information

Cluster Evaluation and Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University

Cluster Evaluation and Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University Cluster Evaluation and Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University Kinds of Clustering Sequential Fast Cost Optimization Fixed number of clusters Hierarchical

More information

Lesson 4. Random graphs. Sergio Barbarossa. UPC - Barcelona - July 2008

Lesson 4. Random graphs. Sergio Barbarossa. UPC - Barcelona - July 2008 Lesson 4 Random graphs Sergio Barbarossa Graph models 1. Uncorrelated random graph (Erdős, Rényi) N nodes are connected through n edges which are chosen randomly from the possible configurations 2. Binomial

More information

CS224W: Analysis of Networks Jure Leskovec, Stanford University

CS224W: Analysis of Networks Jure Leskovec, Stanford University CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 2 Observations Models

More information

Near Linear-Time Community Detection in Networks with Hardly Detectable Community Structure

Near Linear-Time Community Detection in Networks with Hardly Detectable Community Structure Near Linear-Time Community Detection in Networks with Hardly Detectable Community Structure Aria Rezaei Department of Computer Engineering Sharif University of Technology Email: arezaei@ce.sharif.edu Saeed

More information

Topological Centrality and Its Applications. Hai Zhuge, Senior Member, IEEE, and Junsheng Zhang

Topological Centrality and Its Applications. Hai Zhuge, Senior Member, IEEE, and Junsheng Zhang 1 Topological Centrality and Its Applications Hai Zhuge, Senior Member, IEEE, and Junsheng Zhang Abstract Recent development of network structure analysis shows that it plays an important role in characterizing

More information

Algorithms for Grid Graphs in the MapReduce Model

Algorithms for Grid Graphs in the MapReduce Model University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Computer Science and Engineering: Theses, Dissertations, and Student Research Computer Science and Engineering, Department

More information

Community Detection Algorithm based on Centrality and Node Closeness in Scale-Free Networks

Community Detection Algorithm based on Centrality and Node Closeness in Scale-Free Networks 234 29 2 SP-B 2014 Community Detection Algorithm based on Centrality and Node Closeness in Scale-Free Networks Sorn Jarukasemratana Tsuyoshi Murata Xin Liu 1 Tokyo Institute of Technology sorn.jaru@ai.cs.titech.ac.jp

More information

EXTREMAL OPTIMIZATION AND NETWORK COMMUNITY STRUCTURE

EXTREMAL OPTIMIZATION AND NETWORK COMMUNITY STRUCTURE EXTREMAL OPTIMIZATION AND NETWORK COMMUNITY STRUCTURE Noémi Gaskó Department of Computer Science, Babeş-Bolyai University, Cluj-Napoca, Romania gaskonomi@cs.ubbcluj.ro Rodica Ioana Lung Department of Statistics,

More information

Keywords: dynamic Social Network, Community detection, Centrality measures, Modularity function.

Keywords: dynamic Social Network, Community detection, Centrality measures, Modularity function. Volume 6, Issue 1, January 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com An Efficient

More information

Generalized Measures for the Evaluation of Community Detection Methods

Generalized Measures for the Evaluation of Community Detection Methods Edit 13/05/2016: the R source code for the measures described in this article is now publicly available online on GitHub: https://github.com/compnet/topomeasures Generalized Measures for the Evaluation

More information

Chapter 7: Competitive learning, clustering, and self-organizing maps

Chapter 7: Competitive learning, clustering, and self-organizing maps Chapter 7: Competitive learning, clustering, and self-organizing maps António R. C. Paiva EEL 6814 Spring 2008 Outline Competitive learning Clustering Self-Organizing Maps What is competition in neural

More information

arxiv: v1 [cs.si] 17 Sep 2016

arxiv: v1 [cs.si] 17 Sep 2016 Understanding Stability of Noisy Networks through Centrality Measures and Local Connections arxiv:69.542v [cs.si] 7 Sep 26 ABSTRACT Vladimir Ufimtsev Dept. of CS Univ. of Nebraska at Omaha NE 6882, USA

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval http://informationretrieval.org IIR 6: Flat Clustering Wiltrud Kessler & Hinrich Schütze Institute for Natural Language Processing, University of Stuttgart 0-- / 83

More information

Community Structure Detection. Amar Chandole Ameya Kabre Atishay Aggarwal

Community Structure Detection. Amar Chandole Ameya Kabre Atishay Aggarwal Community Structure Detection Amar Chandole Ameya Kabre Atishay Aggarwal What is a network? Group or system of interconnected people or things Ways to represent a network: Matrices Sets Sequences Time

More information

Understanding Clustering Supervising the unsupervised

Understanding Clustering Supervising the unsupervised Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data

More information

ECG782: Multidimensional Digital Signal Processing

ECG782: Multidimensional Digital Signal Processing ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting

More information

Community Mining in Signed Networks: A Multiobjective Approach

Community Mining in Signed Networks: A Multiobjective Approach Community Mining in Signed Networks: A Multiobjective Approach Alessia Amelio National Research Council of Italy (CNR) Inst. for High Perf. Computing and Networking (ICAR) Via Pietro Bucci, 41C 87036 Rende

More information

Modularity CMSC 858L

Modularity CMSC 858L Modularity CMSC 858L Module-detection for Function Prediction Biological networks generally modular (Hartwell+, 1999) We can try to find the modules within a network. Once we find modules, we can look

More information

An overview of Graph Categories and Graph Primitives

An overview of Graph Categories and Graph Primitives An overview of Graph Categories and Graph Primitives Dino Ienco (dino.ienco@irstea.fr) https://sites.google.com/site/dinoienco/ Topics I m interested in: Graph Database and Graph Data Mining Social Network

More information

WSI using Graphs of Collocations. Paper by: Ioannis P. Klapaftis and Suresh Manandhar Presented by: Ahmad R. Shahid

WSI using Graphs of Collocations. Paper by: Ioannis P. Klapaftis and Suresh Manandhar Presented by: Ahmad R. Shahid WSI using Graphs of Collocations Paper by: Ioannis P. Klapaftis and Suresh Manandhar Presented by: Ahmad R. Shahid Word Sense Induction (WSI) Identifying different senses (uses) of a word Finds applications

More information

A Review on Overlapping Community Detection Algorithms

A Review on Overlapping Community Detection Algorithms Review Paper A Review on Overlapping Community Detection Algorithms Authors 1 G.T.Prabavathi*, 2 Dr. V. Thiagarasu Address For correspondence: 1 Asst Professor in Computer Science, Gobi Arts & Science

More information

Community detection. Leonid E. Zhukov

Community detection. Leonid E. Zhukov Community detection Leonid E. Zhukov School of Data Analysis and Artificial Intelligence Department of Computer Science National Research University Higher School of Economics Network Science Leonid E.

More information

Outlier edge detection using random graph generation models and applications

Outlier edge detection using random graph generation models and applications Tampere University of Technology Outlier edge detection using random graph generation models and applications Citation Zhang, H., Kiranyaz, S., & Gabbouj, M. (2017). Outlier edge detection using random

More information

Community Analysis. Chapter 6

Community Analysis. Chapter 6 This chapter is from Social Media Mining: An Introduction. By Reza Zafarani, Mohammad Ali Abbasi, and Huan Liu. Cambridge University Press, 2014. Draft version: April 20, 2014. Complete Draft and Slides

More information

PV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211

PV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211 PV: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv IIR 6: Flat Clustering Handout version Petr Sojka, Hinrich Schütze et al. Faculty of Informatics, Masaryk University, Brno Center

More information

Graph analytics approach to analyse Enterprise Architecture models

Graph analytics approach to analyse Enterprise Architecture models Nikhitha Rajashekar nikhita.rajashekar@rwth-aachen.de Graph analytics approach to analyse Enterprise Architecture models Master Thesis Proposal Supervisor: Simon Hacks Overview 1. Enterprise Architecture

More information

Clustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York

Clustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York Clustering Robert M. Haralick Computer Science, Graduate Center City University of New York Outline K-means 1 K-means 2 3 4 5 Clustering K-means The purpose of clustering is to determine the similarity

More information

Routing Outline. EECS 122, Lecture 15

Routing Outline. EECS 122, Lecture 15 Fall & Walrand Lecture 5 Outline EECS, Lecture 5 Kevin Fall kfall@cs.berkeley.edu Jean Walrand wlr@eecs.berkeley.edu Definition/Key Questions Distance Vector Link State Comparison Variations EECS - Fall

More information

V4 Matrix algorithms and graph partitioning

V4 Matrix algorithms and graph partitioning V4 Matrix algorithms and graph partitioning - Community detection - Simple modularity maximization - Spectral modularity maximization - Division into more than two groups - Other algorithms for community

More information