DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li

Size: px
Start display at page:

Download "DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li"

Transcription

1 Welcome to DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: AK 233 Spring 2018

2 Service Providing Improve urban planning, Ease Traffic Congestion, Save Energy, Reduce Air Pollution,... Urban Data Analytics Data Mining, Machine Learning, Visualization Recommender systems Big Graph Data Mining Applications Big Data Clustering Urban Data Management Spatio-temporal index, streaming, trajectory, and graph data management,... Management Human mobility Traffic Air Quality Meteorolo gy Social Media Energy Road Networks POIs Urban Sensing & Data Acquisition Participatory Sensing, Crowd Sensing, Mobile Sensing Acquisition Cleaning Urban Computing: concepts, methodologies, and applications. Zheng, Y., et al. ACM transactions on Intelligent Systems and Technology.

3 Graph Data: Social Networks Facebook social graph 4-degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna, 2011] J. Leskovec, A. Rajaraman, J. Ullman: 3 Mining of Massive Datasets,

4 Graph Data: Media Networks Connections between political blogs Polarization of the network [Adamic-Glance, 2005] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 4

5 Graph Data: Information Nets Citation networks and Maps of science [Börner et al., 2012] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 5

6 Graph Data: Communication Nets domain2 domain1 router domain3 Internet J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 6

7 Questions? Partial map of the Internet based on the January 15, 2005 data found on opte.org. (from

8 Graph Data: Topological Networks Seven Bridges of Königsberg [Euler, 1735] Return to the starting point by traveling each link of the graph once and only once. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 8

9 Graph representation of networks Friendship Resistance Co-authorship Following One-way road Wireless channel Undirected links Directed links Friend & foe Trust & distrust Multi-relational links Hyperlinks Repulsion & cohesion Signed links

10 Mining in Big Graphs v Network Statistic Analysis (this lecture) Network Size Degree distribution. v Node Ranking (Next lecture) Identifying most influential nodes Viral Marketing, resource allocation

11 Graph Data: Social Networks Facebook social graph 4-degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna, 2011] J. Leskovec, A. Rajaraman, J. Ullman: 11 Mining of Massive Datasets,

12 Sampling graphs Random sampling (uniform & independent) crawling } vertex sampling } BFS sampling } edge sampling } random walk sampling 12 12

13 Random Walks on Graphs Random walk sampling Random Walk Routing Molecule in liquid Influence diffusion

14 Undirected Graphs Undirected!!

15 Random Walk v Adjacency matrix! # A = # # # " $ & & & & %! # D = # # # " v Transition Probability Matrix 1 if i is not equal to j P ij = k i 0 if i=j v E : number of links v Stationary Distribution $ & & & & % Symmetric 1 " $ P = A D 1 = $ $ $ # Undirected 0 1/ 3 1/ 3 1/ 3 1/ 2 0 1/ 2 0 1/ 3 1/ 3 0 1/ 3 1/ 2 0 1/ 2 0 % ' ' ' ' & π i = d i 2 E

16 Metropolis-Hastings Random Walk v Adjacency matrix! # A = # # # " $ & & & & %! # D = # # # " $ & & & & % Symmetric v Transition Probability Matrix 1 min(1, k υ ) if w neighbor of υ P MH k υ,w = υ k w 1 MH P υ,y if w=υ y υ v E : number of links v Stationary Distribution π υ = 1 V 1 " $ P = A D 1 = $ $ $ # Undirected 0 1/ 3 1/ 3 1/ 3 1/ 3 1/ 3 1/ 3 0 1/ 3 1/ 3 0 1/ 3 1/ 3 0 1/ 3 1/ Undirected 2 % ' ' ' ' &

17 Walking in Facebook: A Case Study of Unbiased Sampling of OSNs Minas Gjoka, Maciej Kurant, Carter Butts, Athina Markopoulou UC Irvine, EPFL Minas Gjoka, UC Irvine Walking in Facebook 17

18 Outline v Motivation and Problem Statement v Sampling Methodology v Data Analysis v Conclusion Minas Gjoka, UC Irvine Walking in Facebook 18

19 Online Social Networks (OSNs) v A network of declared friendships between users v Allows users to maintain relationships G F E H C D B v Many popular OSNs with different focus Facebook, LinkedIn, Flickr, A Social Graph Minas Gjoka, UC Irvine Walking in Facebook 19

20 Why Sample OSNs? v Representative samples desirable study properties test algorithms We use node distribution in this study v Obtaining complete dataset difficult companies usually unwilling to share data tremendous overhead to measure all (~100TB for Facebook) Minas Gjoka, UC Irvine Walking in Facebook 20

21 Problem statement v Obtain a representative sample of users in a given OSN by exploration of the social graph. in this work we sample Facebook (FB) explore graph using various crawling techniques Minas Gjoka, UC Irvine Walking in Facebook 21

22 Related Work v Graph traversal (BFS) A. Mislove et al, IMC 2007 Y. Ahn et al, WWW 2007 C. Wilson, Eurosys 2009 v Random walks (MHRW, RDS) M. Henzinger et al, WWW 2000 D. Stutbach et al, IMC 2006 A. Rasti et al, Mini Infocom 2009 Minas Gjoka, UC Irvine Walking in Facebook 22

23 Outline v Motivation and Problem Statement v Sampling Methodology crawling methods data collection convergence evaluation method comparisons v Data Analysis v Conclusion Minas Gjoka, UC Irvine Walking in Facebook 23

24 (1) Breadth-First-Search (BFS) v v v pa ( ) Starting from a seed, explores all neighbor nodes. Process continues iteratively without replacement. BFS leads to bias towards high degree nodes Lee et al, Statistical properties of Sampled Networks, Phys Review E, 2006 Early measurement studies of OSNs use BFS as primary sampling technique i.e [Mislove et al], [Ahn et al], [Wilson et al.] i u A1 A i i = = 1 V u V H G D Subset of sampled nodes with value i All sampled nodes F E C B A Unexplored Explored Visited Walking in Facebook 24

25 (2) Random Walk (RW) Explores graph one node at a time with replacement P Degree of node υ RW υ, w = 1 k In the stationary distribution π υ = k 2 υ υ E Number of edges Subset of sampled u A1 A i i nodes with value i pa ( i) = = u V1 V All sampled nodes Minas Gjoka, UC Irvine Walking in Facebook 25 H G D 1/3 C F 1/3 A 1/3 Next candidate Current node E B

26 (3) Re-Weighted Random Walk (RWRW) Hansen-Hurwitz estimator v Corrects for degree bias at the end of collection v Without re-weighting, the probability distribution for node property A is: pa ( ) i u A1 A i i = = 1 V u V v Re-Weighted probability distribution : Subset of sampled nodes with value i All sampled nodes Minas Gjoka, UC Irvine pa ( ) i u A = u V i 1/ 1/ k k u u Degree of node u Walking in Facebook 26

27 (4) Metropolis-Hastings Random Walk (MHRW) v Explore graph one node at a time with replacement H G C F E 1 k min(1, υ ) if w neighbor of υ MH k k P υ w υ, w = MH 1 Pυ, y if w= υ y υ v In the stationary distribution pa ( ) i 1 π υ = V u A1 A i i = = 1 V u V MH P AA Subset of sampled nodes with value i All sampled nodes D 1/3 1/5 A 2/15 1/3 Walking in Facebook 27 B Next candidate Current node = 1 ( + + ) = MH 1 3 P AC = =

28 Uniform userid Sampling (UNI) v As a basis for comparison, we collect a uniform sample of Facebook userids (UNI) rejection sampling on the 32-bit userid space v UNI not a general solution for sampling OSNs userid space must not be sparse names instead of numbers Minas Gjoka, UC Irvine Walking in Facebook 28

29 Summary of Datasets Sampling method MHRW RW BFS UNI #Valid Users 28x81K 28x81K 28x81K 984K # Unique Users 957K 2.19M 2.20M 984K Egonets for a subsample of MHRW - local properties of nodes Datasets available at: Minas Gjoka, UC Irvine Walking in Facebook 29

30 Data Collection Basic Node Information v What information do we collect for each sampled node u? UserID Name Networks Privacy settings Friend List UserID Name Networks Privacy Settings u UserID Name Networks Privacy settings 1111 UserID Name Networks Privacy settings Regional School/Workplace Send Message View Friends Profile Photo Add as Friend Minas Gjoka, UC Irvine Walking in Facebook 30

31 Detecting Convergence Number of samples (iterations) to loose dependence from starting points? Minas Gjoka, UC Irvine Walking in Facebook 31

32 Online Convergence Diagnostics Geweke v Detects convergence for a single walk. Let X be a sequence of samples for metric of interest. Xa Xb a =10%,b = 50% J. Geweke, Evaluating the accuracy of sampling based approaches to calculate posterior moments in Bayesian Statistics 4, 1992 Minas Gjoka, UC Irvine z = E(X a ) E(X b ) Var(X a ) Var(X b ) [ 1,1] Walking in Facebook 32

33 Online Convergence Diagnostics Gelman-Rubin v Detects convergence for M>1 walks Walk 1 Between walks variance Walk 2 Walk 3 R = N 1 N + M +1 B MN W Within walks variance Convergence is declared when R<1.02 A. Gelman, D. Rubin, Inference from iterative simulation using multiple sequences in Statistical Science Volume 7, 1992 Minas Gjoka, UC Irvine Walking in Facebook 33

34 When do we reach equilibrium? Node Degree Burn-in determined to be 3K Minas Gjoka, UC Irvine Walking in Facebook 34

35 Methods Comparison Node Degree v Poor performance for BFS, RW 28 crawls v MHRW, RWRW produce good estimates per chain overall Minas Gjoka, UC Irvine Walking in Facebook 35

36 Sampling Bias BFS v Low degree nodes underrepresented by two orders of magnitude v BFS is biased Minas Gjoka, UC Irvine Walking in Facebook 36

37 Sampling Bias MHRW, RW, RWRW v v Degree distribution identical to UNI (MHRW,RWRW) RW as biased as BFS but with smaller variance in each walk Minas Gjoka, UC Irvine Walking in Facebook 37

38 Practical Recommendations for Sampling Methods v Use MHRW or RWRW. Do not use BFS, RW. v Use formal convergence diagnostics assess convergence online use multiple parallel walks v MHRW vs RWRW RWRW slightly better performance MHRW provides a ready-to-use sample Minas Gjoka, UC Irvine Walking in Facebook 38

39 FB Social Graph Degree Distribution, Power Law Distribution f(k)=b*k -a a 1 =1.32 a 2 =3.38 v Degree distribution not a power law Minas Gjoka, UC Irvine Walking in Facebook 39

40 Outline v Motivation and Problem Statement v Sampling Methodology v Data Analysis v Conclusion Minas Gjoka, UC Irvine Walking in Facebook 40

41 Any Comments & Critiques?

42 Next Week v 5 team presentations v 30 minutes each team including the presentation and Q&A v We will have snacks and soft drinks Minas Gjoka, UC Irvine Walking in Facebook 42

43 Project 2 starts next week v KDD cup 2018 v Real data, worldwide competition v Will be announced March 1 st, 2018 v Define your own project or KDD cup v v v view/announcing-kdd-cup highway-tollgates-traffic-flowprediction Gjoka, UC Minas Irvine Walking in Facebook 43

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: AK232 Fall 2016 Graph Data: Social Networks Facebook social graph 4-degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna,

More information

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: KH 116 Fall 2017 Reiews/Critiques I will choose one reiew to grade this week. Graph Data: Social

More information

A Walk in Facebook: Uniform Sampling of Users in Online Social Networks

A Walk in Facebook: Uniform Sampling of Users in Online Social Networks A Walk in Facebook: Uniform Sampling of Users in Online Social Networks Minas Gjoka, Maciej Kurant, Carter T. Butts, Athina Markopoulou California Institute for Telecommunications and Information Technology

More information

A Walk in Facebook: Uniform Sampling of Users in Online Social Networks

A Walk in Facebook: Uniform Sampling of Users in Online Social Networks A Walk in Facebook: Uniform Sampling of Users in Online Social Networks Minas Gjoka CalIT2 UC Irvine mgjoka@uci.edu Maciej Kurant CalIT2 UC Irvine maciej.kurant@gmail.com Carter T. Butts Sociology Dept

More information

DS595/CS525: Urban Network Analysis --Urban Mobility Prof. Yanhua Li

DS595/CS525: Urban Network Analysis --Urban Mobility Prof. Yanhua Li Welcome to DS595/CS525: Urban Network Analysis --Urban Mobility Prof. Yanhua Li Time: 6:00pm 8:50pm Wednesday Location: Fuller 320 Spring 2017 2 Team assignment Finalized. (Great!) Guest Speaker 2/22 A

More information

Unbiased Sampling of Facebook

Unbiased Sampling of Facebook Unbiased Sampling of Facebook Minas Gjoka Networked Systems UC Irvine mgjoka@uci.edu Maciej Kurant School of ICS EPFL, Lausanne maciej.kurant@epfl.ch Carter T. Butts Sociology Dept UC Irvine buttsc@uci.edu

More information

DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li

DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: KH 116 Fall 2017 First Grading for Reading Assignment Weka v 6 weeks v https://weka.waikato.ac.nz/dataminingwithweka/preview

More information

Walking in Facebook: A Case Study of Unbiased Sampling of OSNs

Walking in Facebook: A Case Study of Unbiased Sampling of OSNs Walking in Facebook: A Case Study of Unbiased Sampling of OSNs Minas Gjoka Networked Systems UC Irvine mgjoka@uci.edu Maciej Kurant School of Comp.& Comm. Sciences EPFL, Lausanne maciej.kurant@epfl.ch

More information

Making social interactions accessible in online social networks

Making social interactions accessible in online social networks Information Services & Use 33 (2013) 113 117 113 DOI 10.3233/ISU-130702 IOS Press Making social interactions accessible in online social networks Fredrik Erlandsson a,, Roozbeh Nia a, Henric Johnson b

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

Empirical Studies on Methods of Crawling Directed Networks

Empirical Studies on Methods of Crawling Directed Networks www.ijcsi.org 487 Empirical Studies on Methods of Crawling Directed Networks Junjie Tong, Haihong E, Meina Song and Junde Song School of Computer, Beijing University of Posts and Telecommunications Beijing,

More information

How to explore big networks? Question: Perform a random walk on G. What is the average node degree among visited nodes, if avg degree in G is 200?

How to explore big networks? Question: Perform a random walk on G. What is the average node degree among visited nodes, if avg degree in G is 200? How to explore big networks? Question: Perform a random walk on G. What is the average node degree among visited nodes, if avg degree in G is 200? Questions from last time Avg. FB degree is 200 (suppose).

More information

Part 1: Link Analysis & Page Rank

Part 1: Link Analysis & Page Rank Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Graph Data: Social Networks [Source: 4-degrees of separation, Backstrom-Boldi-Rosa-Ugander-Vigna,

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

Sampling Large Graphs: Algorithms and Applications

Sampling Large Graphs: Algorithms and Applications Sampling Large Graphs: Algorithms and Applications Don Towsley College of Information & Computer Science Umass - Amherst Collaborators: P.H. Wang, J.C.S. Lui, J.Z. Zhou, X. Guan Measuring, analyzing large

More information

Multigraph Sampling of Online Social Networks

Multigraph Sampling of Online Social Networks Multigraph Sampling of Online Social Networks Minas Gjoka CalIT2 UC Irvine mgjoka@uci.edu Carter T. Butts Sociology Dept UC Irvine buttsc@uci.edu Maciej Kurant CalIT2 UC Irvine maciej.kurant@gmail.com

More information

Mining and Analyzing Online Social Networks

Mining and Analyzing Online Social Networks The 5th EuroSys Doctoral Workshop (EuroDW 2011) Salzburg, Austria, Sunday 10 April 2011 Mining and Analyzing Online Social Networks Emilio Ferrara eferrara@unime.it Advisor: Prof. Giacomo Fiumara PhD School

More information

A two-phase Sampling based Algorithm for Social Networks

A two-phase Sampling based Algorithm for Social Networks A two-phase Sampling based Algorithm for Social Networks Zeinab S. Jalali, Alireza Rezvanian, Mohammad Reza Meybodi Soft Computing Laboratory, Computer Engineering and Information Technology Department

More information

Sampling Large Graphs: Algorithms and Applications

Sampling Large Graphs: Algorithms and Applications Sampling Large Graphs: Algorithms and Applications Don Towsley Umass - Amherst Joint work with P.H. Wang, J.Z. Zhou, J.C.S. Lui, X. Guan Measuring, Analyzing Large Networks - large networks can be represented

More information

CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul

CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul 1 CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul Introduction Our problem is crawling a static social graph (snapshot). Given

More information

Empirical Characterization of P2P Systems

Empirical Characterization of P2P Systems Empirical Characterization of P2P Systems Reza Rejaie Mirage Research Group Department of Computer & Information Science University of Oregon http://mirage.cs.uoregon.edu/ Collaborators: Daniel Stutzbach

More information

Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Node Importance and Neighborhoods

Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Node Importance and Neighborhoods Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Node Importance and Neighborhoods Matthias Schubert, Matthias Renz, Felix Borutta, Evgeniy Faerman, Christian Frey, Klaus Arthur

More information

Probabilistic Models in Social Network Analysis

Probabilistic Models in Social Network Analysis Probabilistic Models in Social Network Analysis Sargur N. Srihari University at Buffalo, The State University of New York USA US-India Workshop on Large Scale Data Analytics and Intelligent Services December

More information

Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets

Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets Nadathur Satish, Narayanan Sundaram, Mostofa Ali Patwary, Jiwon Seo, Jongsoo Park, M. Amber Hassaan, Shubho Sengupta, Zhaoming

More information

DS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li

DS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: KH116 Fall 2017 Merged CS586 and DS504 Examples of Reviews/ Critiques Random

More information

OSN: when multiple autonomous users disclose another individual s information

OSN: when multiple autonomous users disclose another individual s information OSN: when multiple autonomous users disclose another individual s information Cristina Pérez-Solà Dept. d Enginyeria de la Informació i les Comunicacions Universitat Autònoma de Barcelona Email: cperez@deic.uab.cat

More information

Estimating Sizes of Social Networks via Biased Sampling

Estimating Sizes of Social Networks via Biased Sampling Estimating Sizes of Social Networks via Biased Sampling Liran Katzir, Edo Liberty, and Oren Somekh Yahoo! Labs, Haifa, Israel International World Wide Web Conference, 28th March - 1st April 2011, Hyderabad,

More information

DS504/CS586: Big Data Analytics Big Data Clustering II

DS504/CS586: Big Data Analytics Big Data Clustering II Welcome to DS504/CS586: Big Data Analytics Big Data Clustering II Prof. Yanhua Li Time: 6pm 8:50pm Thu Location: AK 232 Fall 2016 More Discussions, Limitations v Center based clustering K-means BFR algorithm

More information

DS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li

DS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: AK 232 Fall 2016 The Data Equation Oceans of Data Ocean Biodiversity Informatics,

More information

Supplementary file for SybilDefender: A Defense Mechanism for Sybil Attacks in Large Social Networks

Supplementary file for SybilDefender: A Defense Mechanism for Sybil Attacks in Large Social Networks 1 Supplementary file for SybilDefender: A Defense Mechanism for Sybil Attacks in Large Social Networks Wei Wei, Fengyuan Xu, Chiu C. Tan, Qun Li The College of William and Mary, Temple University {wwei,

More information

DS504/CS586: Big Data Analytics Big Data Clustering II

DS504/CS586: Big Data Analytics Big Data Clustering II Welcome to DS504/CS586: Big Data Analytics Big Data Clustering II Prof. Yanhua Li Time: 6pm 8:50pm Thu Location: KH 116 Fall 2017 Updates: v Progress Presentation: Week 15: 11/30 v Next Week Office hours

More information

Crawling Online Social Graphs

Crawling Online Social Graphs Crawling Online Social Graphs Shaozhi Ye Department of Computer Science University of California, Davis sye@ucdavis.edu Juan Lang Department of Computer Science University of California, Davis jilang@ucdavis.edu

More information

Chapter 1. Social Media and Social Computing. October 2012 Youn-Hee Han

Chapter 1. Social Media and Social Computing. October 2012 Youn-Hee Han Chapter 1. Social Media and Social Computing October 2012 Youn-Hee Han http://link.koreatech.ac.kr 1.1 Social Media A rapid development and change of the Web and the Internet Participatory web application

More information

Modeling and Estimating the Structure of D2D-Based Mobile Social Networks

Modeling and Estimating the Structure of D2D-Based Mobile Social Networks Modeling and Estimating the Structure of DD-Based Mobile Social Networks Sigit Aryo Pambudi Wenye Wang Department of Electrical and Computer Engineering North Carolina State University, Raleigh, NC 7606

More information

Lecture 27: Learning from relational data

Lecture 27: Learning from relational data Lecture 27: Learning from relational data STATS 202: Data mining and analysis December 2, 2017 1 / 12 Announcements Kaggle deadline is this Thursday (Dec 7) at 4pm. If you haven t already, make a submission

More information

Mining Social Network Graphs

Mining Social Network Graphs Mining Social Network Graphs Analysis of Large Graphs: Community Detection Rafael Ferreira da Silva rafsilva@isi.edu http://rafaelsilva.com Note to other teachers and users of these slides: We would be

More information

Overview of the Stateof-the-Art. Networks. Evolution of social network studies

Overview of the Stateof-the-Art. Networks. Evolution of social network studies Overview of the Stateof-the-Art in Social Networks INF5370 spring 2014 Evolution of social network studies 1950-1970: mathematical studies of networks formed by the actual human interactions Pandemics,

More information

Extracting Information from Complex Networks

Extracting Information from Complex Networks Extracting Information from Complex Networks 1 Complex Networks Networks that arise from modeling complex systems: relationships Social networks Biological networks Distinguish from random networks uniform

More information

Video Sharing Websites Study

Video Sharing Websites Study Video Sharing Websites Study Content Characteristic Analysis Nan ZHAO Télécom ParisTech Paris, France nan.zhao@telecom-paristech.fr Loïc BAUD DREV, Hadopi Paris, France loic.baud@hadopi.fr Patrick BELLOT

More information

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li

DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li Time: 6:00pm 8:50pm Thu Location: AK 232 Fall 2016 High Dimensional Data v Given a cloud of data points we want to understand

More information

This is the published version of a paper presented at Network Intelligence Conference (ENIC), 2015 Second European.

This is the published version of a paper presented at Network Intelligence Conference (ENIC), 2015 Second European. http://www.diva-portal.org This is the published version of a paper presented at Network Intelligence Conference (ENIC), 2015 Second European. Citation for the original published paper: Erlandsson, F.,

More information

CS224W: Analysis of Network Jure Leskovec, Stanford University

CS224W: Analysis of Network Jure Leskovec, Stanford University CS224W: Analysis of Network Jure Leskovec, Stanford University http://cs224w.stanford.edu 9/25/17 Jure Leskovec, Stanford CS224W: Analysis of Networks 2 Why Networks? Networks are a general language for

More information

Data mining --- mining graphs

Data mining --- mining graphs Data mining --- mining graphs University of South Florida Xiaoning Qian Today s Lecture 1. Complex networks 2. Graph representation for networks 3. Markov chain 4. Viral propagation 5. Google s PageRank

More information

COMMUNITY SHELL S EFFECT ON THE DISINTEGRATION OF SOCIAL NETWORKS

COMMUNITY SHELL S EFFECT ON THE DISINTEGRATION OF SOCIAL NETWORKS Annales Univ. Sci. Budapest., Sect. Comp. 43 (2014) 57 68 COMMUNITY SHELL S EFFECT ON THE DISINTEGRATION OF SOCIAL NETWORKS Imre Szücs (Budapest, Hungary) Attila Kiss (Budapest, Hungary) Dedicated to András

More information

Supervised Random Walks

Supervised Random Walks Supervised Random Walks Pawan Goyal CSE, IITKGP September 8, 2014 Pawan Goyal (IIT Kharagpur) Supervised Random Walks September 8, 2014 1 / 17 Correlation Discovery by random walk Problem definition Estimate

More information

Graph Data Management

Graph Data Management Graph Data Management Analysis and Optimization of Graph Data Frameworks presented by Fynn Leitow Overview 1) Introduction a) Motivation b) Application for big data 2) Choice of algorithms 3) Choice of

More information

745: Advanced Database Systems

745: Advanced Database Systems 745: Advanced Database Systems Yanlei Diao University of Massachusetts Amherst Outline Overview of course topics Course requirements Database Management Systems 1. Online Analytical Processing (OLAP) vs.

More information

A Short History of Markov Chain Monte Carlo

A Short History of Markov Chain Monte Carlo A Short History of Markov Chain Monte Carlo Christian Robert and George Casella 2010 Introduction Lack of computing machinery, or background on Markov chains, or hesitation to trust in the practicality

More information

Algorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov

Algorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov Algorithms and Applications in Social Networks 2017/2018, Semester B Slava Novgorodov 1 Lesson #1 Administrative questions Course overview Introduction to Social Networks Basic definitions Network properties

More information

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization Pedro Ribeiro (DCC/FCUP & CRACS/INESC-TEC) Part 1 Motivation and emergence of Network Science

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu Can we identify node groups? (communities, modules, clusters) 2/13/2014 Jure Leskovec, Stanford C246: Mining

More information

Markov Chain Monte Carlo (part 1)

Markov Chain Monte Carlo (part 1) Markov Chain Monte Carlo (part 1) Edps 590BAY Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2018 Depending on the book that you select for

More information

Graph Data Processing with MapReduce

Graph Data Processing with MapReduce Distributed data processing on the Cloud Lecture 5 Graph Data Processing with MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, 2015 (licensed under Creation Commons Attribution

More information

Observing the Evolution of Social Network on Weibo by Sampled Data

Observing the Evolution of Social Network on Weibo by Sampled Data Observing the Evolution of Social Network on Weibo by Sampled Data Lu Ma, Gang Lu, Junxia Guo College of Information Science and Technology Beijing University of Chemical Technology Beijing, China sizheng@126.com

More information

Cycles in Random Graphs

Cycles in Random Graphs Cycles in Random Graphs Valery Van Kerrebroeck Enzo Marinari, Guilhem Semerjian [Phys. Rev. E 75, 066708 (2007)] [J. Phys. Conf. Series 95, 012014 (2008)] Outline Introduction Statistical Mechanics Approach

More information

Graph Data Management Systems in New Applications Domains. Mikko Halin

Graph Data Management Systems in New Applications Domains. Mikko Halin Graph Data Management Systems in New Applications Domains Mikko Halin Introduction Presentation is based on two papers Graph Data Management Systems for New Application Domains - Philippe Cudré-Mauroux,

More information

CS281 Section 9: Graph Models and Practical MCMC

CS281 Section 9: Graph Models and Practical MCMC CS281 Section 9: Graph Models and Practical MCMC Scott Linderman November 11, 213 Now that we have a few MCMC inference algorithms in our toolbox, let s try them out on some random graph models. Graphs

More information

Popularity of Twitter Accounts: PageRank on a Social Network

Popularity of Twitter Accounts: PageRank on a Social Network Popularity of Twitter Accounts: PageRank on a Social Network A.D-A December 8, 2017 1 Problem Statement Twitter is a social networking service, where users can create and interact with 140 character messages,

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases NDBI040 Big Data Management and NoSQL Databases Lecture 10. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic

More information

Social Networks 2015 Lecture 10: The structure of the web and link analysis

Social Networks 2015 Lecture 10: The structure of the web and link analysis 04198250 Social Networks 2015 Lecture 10: The structure of the web and link analysis The structure of the web Information networks Nodes: pieces of information Links: different relations between information

More information

Enhancing Stratified Graph Sampling Algorithms based on Approximate Degree Distribution

Enhancing Stratified Graph Sampling Algorithms based on Approximate Degree Distribution Enhancing Stratified Graph Sampling Algorithms based on Approximate Degree Distribution Junpeng Zhu 1,2, Hui Li 1,2(), Mei Chen 1,2, Zhenyu Dai 1,2 and Ming Zhu 3 1 College of Computer Science and Technology,

More information

Quantitative Biology II!

Quantitative Biology II! Quantitative Biology II! Lecture 3: Markov Chain Monte Carlo! March 9, 2015! 2! Plan for Today!! Introduction to Sampling!! Introduction to MCMC!! Metropolis Algorithm!! Metropolis-Hastings Algorithm!!

More information

Topic mash II: assortativity, resilience, link prediction CS224W

Topic mash II: assortativity, resilience, link prediction CS224W Topic mash II: assortativity, resilience, link prediction CS224W Outline Node vs. edge percolation Resilience of randomly vs. preferentially grown networks Resilience in real-world networks network resilience

More information

Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011

Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011 Data-Intensive Information Processing Applications! Session #5 Graph Algorithms Jordan Boyd-Graber University of Maryland Thursday, March 3, 2011 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Link Analysis in the Cloud

Link Analysis in the Cloud Cloud Computing Link Analysis in the Cloud Dell Zhang Birkbeck, University of London 2017/18 Graph Problems & Representations What is a Graph? G = (V,E), where V represents the set of vertices (nodes)

More information

Graph Exploration: How to do better than the random walk? Adrian Kosowski. INRIA Bordeaux Sud-Ouest.

Graph Exploration: How to do better than the random walk? Adrian Kosowski. INRIA Bordeaux Sud-Ouest. Graph Exploration: How to do better than the random walk? Adrian Kosowski INRIA Bordeaux Sud-Ouest kosowski@labri.fr Réunion Displexity La Rochelle, April 4, 2013 Talk outline Introduction to network exploration

More information

Latent Space Model for Road Networks to Predict Time-Varying Traffic. Presented by: Rob Fitzgerald Spring 2017

Latent Space Model for Road Networks to Predict Time-Varying Traffic. Presented by: Rob Fitzgerald Spring 2017 Latent Space Model for Road Networks to Predict Time-Varying Traffic Presented by: Rob Fitzgerald Spring 2017 Definition of Latent https://en.oxforddictionaries.com/definition/latent Latent Space Model?

More information

CSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection

CSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection CSE 255 Lecture 6 Data Mining and Predictive Analytics Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:

More information

Lecture 5: Graphs. Rajat Mittal. IIT Kanpur

Lecture 5: Graphs. Rajat Mittal. IIT Kanpur Lecture : Graphs Rajat Mittal IIT Kanpur Combinatorial graphs provide a natural way to model connections between different objects. They are very useful in depicting communication networks, social networks

More information

Graph similarity. Laura Zager and George Verghese EECS, MIT. March 2005

Graph similarity. Laura Zager and George Verghese EECS, MIT. March 2005 Graph similarity Laura Zager and George Verghese EECS, MIT March 2005 Words you won t hear today impedance matching thyristor oxide layer VARs Some quick definitions GV (, E) a graph G V the set of vertices

More information

Link Analysis and Web Search

Link Analysis and Web Search Link Analysis and Web Search Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna http://www.moreno.marzolla.name/ based on material by prof. Bing Liu http://www.cs.uic.edu/~liub/webminingbook.html

More information

Online Social Networks and Media. Community detection

Online Social Networks and Media. Community detection Online Social Networks and Media Community detection 1 Notes on Homework 1 1. You should write your own code for generating the graphs. You may use SNAP graph primitives (e.g., add node/edge) 2. For the

More information

Data-Intensive Distributed Computing

Data-Intensive Distributed Computing Data-Intensive Distributed Computing CS 451/651 (Fall 2018) Part 4: Analyzing Graphs (1/2) October 4, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These slides are

More information

Some Graph Theory for Network Analysis. CS 249B: Science of Networks Week 01: Thursday, 01/31/08 Daniel Bilar Wellesley College Spring 2008

Some Graph Theory for Network Analysis. CS 249B: Science of Networks Week 01: Thursday, 01/31/08 Daniel Bilar Wellesley College Spring 2008 Some Graph Theory for Network Analysis CS 9B: Science of Networks Week 0: Thursday, 0//08 Daniel Bilar Wellesley College Spring 008 Goals this lecture Introduce you to some jargon what we call things in

More information

Lecture 9: I: Web Retrieval II: Webology. Johan Bollen Old Dominion University Department of Computer Science

Lecture 9: I: Web Retrieval II: Webology. Johan Bollen Old Dominion University Department of Computer Science Lecture 9: I: Web Retrieval II: Webology Johan Bollen Old Dominion University Department of Computer Science jbollen@cs.odu.edu http://www.cs.odu.edu/ jbollen April 10, 2003 Page 1 WWW retrieval Two approaches

More information

Mining Data that Changes. 17 July 2015

Mining Data that Changes. 17 July 2015 Mining Data that Changes 17 July 2015 Data is Not Static Data is not static New transactions, new friends, stop following somebody in Twitter, But most data mining algorithms assume static data Even a

More information

1 Methods for Posterior Simulation

1 Methods for Posterior Simulation 1 Methods for Posterior Simulation Let p(θ y) be the posterior. simulation. Koop presents four methods for (posterior) 1. Monte Carlo integration: draw from p(θ y). 2. Gibbs sampler: sequentially drawing

More information

CSE 258 Lecture 6. Web Mining and Recommender Systems. Community Detection

CSE 258 Lecture 6. Web Mining and Recommender Systems. Community Detection CSE 258 Lecture 6 Web Mining and Recommender Systems Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:

More information

MCMC Methods for data modeling

MCMC Methods for data modeling MCMC Methods for data modeling Kenneth Scerri Department of Automatic Control and Systems Engineering Introduction 1. Symposium on Data Modelling 2. Outline: a. Definition and uses of MCMC b. MCMC algorithms

More information

Web consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page

Web consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page Link Analysis Links Web consists of web pages and hyperlinks between pages A page receiving many links from other pages may be a hint of the authority of the page Links are also popular in some other information

More information

COMP 4601 Hubs and Authorities

COMP 4601 Hubs and Authorities COMP 4601 Hubs and Authorities 1 Motivation PageRank gives a way to compute the value of a page given its position and connectivity w.r.t. the rest of the Web. Is it the only algorithm: No! It s just one

More information

Data-Intensive Computing with MapReduce

Data-Intensive Computing with MapReduce Data-Intensive Computing with MapReduce Session 5: Graph Processing Jimmy Lin University of Maryland Thursday, February 21, 2013 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

University of Maryland. Tuesday, March 2, 2010

University of Maryland. Tuesday, March 2, 2010 Data-Intensive Information Processing Applications Session #5 Graph Algorithms Jimmy Lin University of Maryland Tuesday, March 2, 2010 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Framework and Algorithms for Network Bucket Testing

Framework and Algorithms for Network Bucket Testing Framework and Algorithms for Network Bucket Testing Liran Katzir Yahoo! Labs., Haifa, Israel lirank@yahoo-inc.com Edo Liberty Yahoo! Labs., Haifa, Israel edo@yahoo-inc.com Oren Somekh Yahoo! Labs., Haifa,

More information

Recommendation System for Location-based Social Network CS224W Project Report

Recommendation System for Location-based Social Network CS224W Project Report Recommendation System for Location-based Social Network CS224W Project Report Group 42, Yiying Cheng, Yangru Fang, Yongqing Yuan 1 Introduction With the rapid development of mobile devices and wireless

More information

Biological Networks Analysis

Biological Networks Analysis Biological Networks Analysis Introduction and Dijkstra s algorithm Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein The clustering problem: partition genes into distinct

More information

CSE 158 Lecture 6. Web Mining and Recommender Systems. Community Detection

CSE 158 Lecture 6. Web Mining and Recommender Systems. Community Detection CSE 158 Lecture 6 Web Mining and Recommender Systems Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:

More information

Algorithmic and Economic Aspects of Networks. Nicole Immorlica

Algorithmic and Economic Aspects of Networks. Nicole Immorlica Algorithmic and Economic Aspects of Networks Nicole Immorlica Syllabus 1. Jan. 8 th (today): Graph theory, network structure 2. Jan. 15 th : Random graphs, probabilistic network formation 3. Jan. 20 th

More information

Theory of Computing. Lecture 10 MAS 714 Hartmut Klauck

Theory of Computing. Lecture 10 MAS 714 Hartmut Klauck Theory of Computing Lecture 10 MAS 714 Hartmut Klauck Seven Bridges of Königsberg Can one take a walk that crosses each bridge exactly once? Seven Bridges of Königsberg Model as a graph Is there a path

More information

Degree Ranking Using Local Information

Degree Ranking Using Local Information Degree Ranking Using Local Information arxiv:1706.01205v2 [cs.si] 10 Jun 2017 Akrati Saxena* akrati.saxena@iitrpr.ac.in S. R. S. Iyengar* sudarshan@iitrpr.ac.in Ralucca Gera** rgera@nps.edu *Department

More information

Jure Leskovec Machine Learning Department Carnegie Mellon University

Jure Leskovec Machine Learning Department Carnegie Mellon University Jure Leskovec Machine Learning Department Carnegie Mellon University Currently: Soon: Today: Large on line systems have detailed records of human activity On line communities: Facebook (64 million users,

More information

CSE 316: SOCIAL NETWORK ANALYSIS INTRODUCTION. Fall 2017 Marion Neumann

CSE 316: SOCIAL NETWORK ANALYSIS INTRODUCTION. Fall 2017 Marion Neumann CSE 316: SOCIAL NETWORK ANALYSIS Fall 2017 Marion Neumann INTRODUCTION Contents in these slides may be subject to copyright. Some materials are adopted from: http://www.cs.cornell.edu/home /kleinber/ networks-book,

More information

Epilog: Further Topics

Epilog: Further Topics Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Epilog: Further Topics Lecture: Prof. Dr. Thomas

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Recommender Systems II Instructor: Yizhou Sun yzsun@cs.ucla.edu May 31, 2017 Recommender Systems Recommendation via Information Network Analysis Hybrid Collaborative Filtering

More information

Issues in MCMC use for Bayesian model fitting. Practical Considerations for WinBUGS Users

Issues in MCMC use for Bayesian model fitting. Practical Considerations for WinBUGS Users Practical Considerations for WinBUGS Users Kate Cowles, Ph.D. Department of Statistics and Actuarial Science University of Iowa 22S:138 Lecture 12 Oct. 3, 2003 Issues in MCMC use for Bayesian model fitting

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu HITS (Hypertext Induced Topic Selection) Is a measure of importance of pages or documents, similar to PageRank

More information

TI2736-B Big Data Processing. Claudia Hauff

TI2736-B Big Data Processing. Claudia Hauff TI2736-B Big Data Processing Claudia Hauff ti2736b-ewi@tudelft.nl Intro Streams Streams Map Reduce HDFS Pig Ctd. Graphs Pig Design Patterns Hadoop Ctd. Giraph Zoo Keeper Spark Spark Ctd. Learning objectives

More information

Unbiased Sampling over Online Social Networks

Unbiased Sampling over Online Social Networks Unbiased Sampling over Online Social Networks Mojtaba Torkjazi Department of Computer & Information Science University of Oregon moji@cs.uoreogn.edu ABSTRACT During recent years, Online Social Networks

More information

Graphs (Part II) Shannon Quinn

Graphs (Part II) Shannon Quinn Graphs (Part II) Shannon Quinn (with thanks to William Cohen and Aapo Kyrola of CMU, and J. Leskovec, A. Rajaraman, and J. Ullman of Stanford University) Parallel Graph Computation Distributed computation

More information