DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li

Size: px
Start display at page:

Download "DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li"

Transcription

1 Welcome to DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: KH 116 Fall 2017

2 Reiews/Critiques I will choose one reiew to grade this week.

3 Graph Data: Social Networks Facebook social graph 4-degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna, 2011] J. Leskoec, A. Rajaraman, J. Ullman: Mining of Massie Datasets, 3

4 Graph Data: Media Networks Connections between political blogs Polarization of the network [Adamic-Glance, 2005] J. Leskoec, A. Rajaraman, J. Ullman: Mining of Massie Datasets, 4

5 Graph Data: Information Nets Citation networks and Maps of science [Börner et al., 2012] J. Leskoec, A. Rajaraman, J. Ullman: Mining of Massie Datasets, 5

6 Graph Data: Communication Nets domain2 domain1 router domain3 Internet J. Leskoec, A. Rajaraman, J. Ullman: Mining of Massie Datasets, 6

7 Questions? Partial map of the Internet based on the January 15, 2005 data found on opte.org. (from

8 Graph Data: Topological Networks Seen Bridges of Königsberg [Euler, 1735] Return to the starting point by traeling each link of the graph once and only once. J. Leskoec, A. Rajaraman, J. Ullman: Mining of Massie Datasets, 8

9 Graph representation of networks Friendship Resistance Co-authorship Following One-way road Wireless channel Undirected links Directed links + Friend & foe Trust & distrust Multi-relational links Hyperlinks Repulsion & cohesion Signed links

10 Mining in Big Graphs Network Statistic Analysis (this lecture) Network Size Degree distribution. Node Ranking (Next lecture) Identifying most influential nodes Viral Marketing, resource allocation

11 Graph Data: Social Networks Facebook social graph 4-degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna, 2011] J. Leskoec, A. Rajaraman, J. Ullman: Mining of Massie Datasets, 11

12 Sampling graphs Random sampling (uniform & independent) crawling } ertex sampling } BFS sampling } edge sampling } random walk sampling 12 12

13 Random Walks on Graphs Random walk sampling Random Walk Routing Molecule in liquid Influence diffusion

14 Undirected Graphs Undirected!!

15 Random Walk! # A = # # # " Adjacency matrix $ & & & & %! # D = # # # " Transition Probability Matrix 1 if i is not equal to j P ij = k i 0 if i=j E : number of links Stationary Distribution $ & & & & % Symmetric 1 " $ P = A D 1 = $ $ $ # Undirected 0 1/ 3 1/ 3 1/ 3 1/ 2 0 1/ 2 0 1/ 3 1/ 3 0 1/ 3 1/ 2 0 1/ 2 0 % ' ' ' ' & π i = d i 2 E

16 Metropolis-Hastings Random Walk! # A = # # # " Adjacency matrix $ & & & & %! # D = # # # " Transition Probability Matrix 1 min(1, k υ ) if w neighbor of υ P MH k υ,w = υ k w 1 MH P υ,y if w=υ y υ E : number of links Stationary Distribution p = u 1 V $ & & & & % Symmetric 1 " $ P = A D 1 = $ $ $ # Undirected 0 1/ 3 1/ 3 1/ 3 1/ 3 1/ 3 1/ 3 0 1/ 3 1/ 3 0 1/ 3 1/ 3 0 1/ 3 1/ 3 % ' ' ' ' &

17 Walking in Facebook: A Case Study of Unbiased Sampling of OSNs Minas Gjoka, Maciej Kurant, Carter Butts, Athina Markopoulou UC Irine, EPFL Minas Gjoka, UC Irine Walking in Facebook 17

18 Outline Motiation and Problem Statement Sampling Methodology Data Analysis Conclusion Minas Gjoka, UC Irine Walking in Facebook 18

19 Online Social Networks (OSNs) A network of declared friendships between users Allows users to maintain relationships G F E H C Many popular OSNs with different focus Facebook, LinkedIn, Flickr, D A Social Graph B Minas Gjoka, UC Irine Walking in Facebook 19

20 Why Sample OSNs? Representatie samples desirable study properties test algorithms Obtaining complete dataset difficult companies usually unwilling to share data tremendous oerhead to measure all (~100TB for Facebook) Minas Gjoka, UC Irine Walking in Facebook 20

21 Problem statement Obtain a representatie sample of users in a gien OSN by exploration of the social graph. in this work we sample Facebook (FB) explore graph using arious crawling techniques Minas Gjoka, UC Irine Walking in Facebook 21

22 Related Work Graph traersal (BFS) A. Misloe et al, IMC 2007 Y. Ahn et al, WWW 2007 C. Wilson, Eurosys 2009 Random walks (MHRW, RDS) M. Henzinger et al, WWW 2000 D. Stutbach et al, IMC 2006 A. Rasti et al, Mini Infocom 2009 Minas Gjoka, UC Irine Walking in Facebook 22

23 Outline Motiation and Problem Statement Sampling Methodology crawling methods data collection conergence ealuation method comparisons Data Analysis Conclusion Minas Gjoka, UC Irine Walking in Facebook 23

24 (1) Breadth-First-Search (BFS) Starting from a seed, explores all neighbor nodes. Process continues iteratiely without replacement. G F E H C BFS leads to bias towards high degree nodes Lee et al, Statistical properties of Sampled Networks, Phys Reiew E, 2006 D A B Unexplored Early measurement studies of OSNs use BFS as primary sampling technique i.e [Misloe et al], [Ahn et al], [Wilson et al.] Explored Visited Minas Gjoka, UC Irine Walking in Facebook 24

25 (2) Random Walk (RW) Explores graph one node at a time with replacement RW u, w P Degree of node υ In the stationary distribution Number of edges = 1 k u k p = u u 2 E H G D 1/3 F E C 1/3 A 1/3 Next candidate Current node B Minas Gjoka, UC Irine Walking in Facebook 25

26 (3) Re-Weighted Random Walk (RWRW) Hansen-Hurwitz estimator Corrects for degree bias at the end of collection Without re-weighting, the probability distribution for node property A is: pa ( ) i å å 1 A uîa = i = i 1 V uîv Re-Weighted probability distribution : Subset of sampled nodes with alue i All sampled nodes Minas Gjoka, UC Irine pa ( ) i uîa = å å Î u V i 1/ 1/ k k u u Degree of node u Walking in Facebook 26

27 (4) Metropolis-Hastings Random Walk (MHRW) Explore graph one node at a time with replacement H G C F E ì 1 k min(1, u ) if w neighbor of u MH ïk k P u w u, w = í ï MH 1 - å Pu, y if w= u ïî y¹ u In the stationary distribution Minas Gjoka, UC Irine p = u 1 V MH P AA D 1/3 1/5 A 2/15 1/3 Walking in Facebook 27 B Next candidate Current node = 1 - ( + + ) = MH 1 3 P AC = =

28 Uniform userid Sampling (UNI) As a basis for comparison, we collect a uniform sample of Facebook userids (UNI) rejection sampling on the 32-bit userid space UNI not a general solution for sampling OSNs userid space must not be sparse names instead of numbers Minas Gjoka, UC Irine Walking in Facebook 28

29 Summary of Datasets Sampling method MHRW RW BFS UNI #Valid Users 28x81K 28x81K 28x81K 984K # Unique Users 957K 2.19M 2.20M 984K Egonets for a subsample of MHRW - local properties of nodes Datasets aailable at: Minas Gjoka, UC Irine Walking in Facebook 29

30 Data Collection Basic Node Information What information do we collect for each sampled node u? UserID Name Networks Priacy settings Friend List UserID Name Networks Priacy Settings u UserID Name Networks Priacy settings 1111 UserID Name Networks Priacy settings Regional School/Workplace Send Message View Friends Profile Photo Add as Friend Minas Gjoka, UC Irine Walking in Facebook 30

31 Detecting Conergence Number of samples (iterations) to loose dependence from starting points? Minas Gjoka, UC Irine Walking in Facebook 31

32 Online Conergence Diagnostics Geweke Detects conergence for a single walk. Let X be a sequence of samples for metric of interest. Xa Xb EX ( a) - EX ( b) z = Var( X )-Var( X ) a b J. Geweke, Ealuating the accuracy of sampling based approaches to calculate posterior moments in Bayesian Statistics 4, 1992 Minas Gjoka, UC Irine Walking in Facebook 32

33 Online Conergence Diagnostics Gelman-Rubin Detects conergence for m>1 walks Walk 1 Between walks ariance Walk 2 Walk 3 R æn- 1 m+ 1 B ö = ç + è n mn W ø Within walks ariance A. Gelman, D. Rubin, Inference from iteratie simulation using multiple sequences in Statistical Science Volume 7, 1992 Minas Gjoka, UC Irine Walking in Facebook 33

34 When do we reach equilibrium? Node Degree Burn-in determined to be 3K Minas Gjoka, UC Irine Walking in Facebook 34

35 Methods Comparison Node Degree Poor performance for BFS, RW 28 crawls MHRW, RWRW produce good estimates per chain oerall Minas Gjoka, UC Irine Walking in Facebook 35

36 Sampling Bias BFS Low degree nodes underrepresented by two orders of magnitude BFS is biased Minas Gjoka, UC Irine Walking in Facebook 36

37 Sampling Bias MHRW, RW, RWRW Degree distribution identical to UNI (MHRW,RWRW) RW as biased as BFS but with smaller ariance in each walk Minas Gjoka, UC Irine Walking in Facebook 37

38 Practical Recommendations for Sampling Methods Use MHRW or RWRW. Do not use BFS, RW. Use formal conergence diagnostics assess conergence online use multiple parallel walks MHRW s RWRW RWRW slightly better performance MHRW proides a ready-to-use sample Minas Gjoka, UC Irine Walking in Facebook 38

39 Outline Motiation and Problem Statement Sampling Methodology Data Analysis Conclusion Minas Gjoka, UC Irine Walking in Facebook 39

40 FB Social Graph Degree Distribution, Power Law Distribution f(k)=b*k -a a 1 =1.32 a 2 =3.38 Degree distribution not a power law Minas Gjoka, UC Irine Walking in Facebook 40

41 Any Comments & Critiques?

42 Next Class: Graph Mining (II) Do assigned readings before class Submit reiews/critiques Attend in-class discussions 42

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: AK232 Fall 2016 Graph Data: Social Networks Facebook social graph 4-degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna,

More information

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: AK 233 Spring 2018 Service Providing Improve urban planning, Ease Traffic Congestion, Save Energy,

More information

A Walk in Facebook: Uniform Sampling of Users in Online Social Networks

A Walk in Facebook: Uniform Sampling of Users in Online Social Networks A Walk in Facebook: Uniform Sampling of Users in Online Social Networks Minas Gjoka, Maciej Kurant, Carter T. Butts, Athina Markopoulou California Institute for Telecommunications and Information Technology

More information

A Walk in Facebook: Uniform Sampling of Users in Online Social Networks

A Walk in Facebook: Uniform Sampling of Users in Online Social Networks A Walk in Facebook: Uniform Sampling of Users in Online Social Networks Minas Gjoka CalIT2 UC Irvine mgjoka@uci.edu Maciej Kurant CalIT2 UC Irvine maciej.kurant@gmail.com Carter T. Butts Sociology Dept

More information

Unbiased Sampling of Facebook

Unbiased Sampling of Facebook Unbiased Sampling of Facebook Minas Gjoka Networked Systems UC Irvine mgjoka@uci.edu Maciej Kurant School of ICS EPFL, Lausanne maciej.kurant@epfl.ch Carter T. Butts Sociology Dept UC Irvine buttsc@uci.edu

More information

Walking in Facebook: A Case Study of Unbiased Sampling of OSNs

Walking in Facebook: A Case Study of Unbiased Sampling of OSNs Walking in Facebook: A Case Study of Unbiased Sampling of OSNs Minas Gjoka Networked Systems UC Irvine mgjoka@uci.edu Maciej Kurant School of Comp.& Comm. Sciences EPFL, Lausanne maciej.kurant@epfl.ch

More information

How to explore big networks? Question: Perform a random walk on G. What is the average node degree among visited nodes, if avg degree in G is 200?

How to explore big networks? Question: Perform a random walk on G. What is the average node degree among visited nodes, if avg degree in G is 200? How to explore big networks? Question: Perform a random walk on G. What is the average node degree among visited nodes, if avg degree in G is 200? Questions from last time Avg. FB degree is 200 (suppose).

More information

Making social interactions accessible in online social networks

Making social interactions accessible in online social networks Information Services & Use 33 (2013) 113 117 113 DOI 10.3233/ISU-130702 IOS Press Making social interactions accessible in online social networks Fredrik Erlandsson a,, Roozbeh Nia a, Henric Johnson b

More information

Empirical Studies on Methods of Crawling Directed Networks

Empirical Studies on Methods of Crawling Directed Networks www.ijcsi.org 487 Empirical Studies on Methods of Crawling Directed Networks Junjie Tong, Haihong E, Meina Song and Junde Song School of Computer, Beijing University of Posts and Telecommunications Beijing,

More information

Part 1: Link Analysis & Page Rank

Part 1: Link Analysis & Page Rank Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Graph Data: Social Networks [Source: 4-degrees of separation, Backstrom-Boldi-Rosa-Ugander-Vigna,

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

Multigraph Sampling of Online Social Networks

Multigraph Sampling of Online Social Networks Multigraph Sampling of Online Social Networks Minas Gjoka CalIT2 UC Irvine mgjoka@uci.edu Carter T. Butts Sociology Dept UC Irvine buttsc@uci.edu Maciej Kurant CalIT2 UC Irvine maciej.kurant@gmail.com

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

DS504/CS586: Big Data Analytics Data acquisition and measurement Prof. Yanhua Li

DS504/CS586: Big Data Analytics Data acquisition and measurement Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Data acquisition and measurement Prof. Yanhua Li Time: 6:00pm 8:50pm THURSDAY Location: AK 232 Fall 2016 Data acquisition and measurement ia Sampling and Estimation

More information

Sampling Large Graphs: Algorithms and Applications

Sampling Large Graphs: Algorithms and Applications Sampling Large Graphs: Algorithms and Applications Don Towsley Umass - Amherst Joint work with P.H. Wang, J.Z. Zhou, J.C.S. Lui, X. Guan Measuring, Analyzing Large Networks - large networks can be represented

More information

Mining and Analyzing Online Social Networks

Mining and Analyzing Online Social Networks The 5th EuroSys Doctoral Workshop (EuroDW 2011) Salzburg, Austria, Sunday 10 April 2011 Mining and Analyzing Online Social Networks Emilio Ferrara eferrara@unime.it Advisor: Prof. Giacomo Fiumara PhD School

More information

Sampling Large Graphs: Algorithms and Applications

Sampling Large Graphs: Algorithms and Applications Sampling Large Graphs: Algorithms and Applications Don Towsley College of Information & Computer Science Umass - Amherst Collaborators: P.H. Wang, J.C.S. Lui, J.Z. Zhou, X. Guan Measuring, analyzing large

More information

Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Node Importance and Neighborhoods

Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Node Importance and Neighborhoods Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Node Importance and Neighborhoods Matthias Schubert, Matthias Renz, Felix Borutta, Evgeniy Faerman, Christian Frey, Klaus Arthur

More information

CS224W: Analysis of Network Jure Leskovec, Stanford University

CS224W: Analysis of Network Jure Leskovec, Stanford University CS224W: Analysis of Network Jure Leskovec, Stanford University http://cs224w.stanford.edu 9/25/17 Jure Leskovec, Stanford CS224W: Analysis of Networks 2 Why Networks? Networks are a general language for

More information

Empirical Characterization of P2P Systems

Empirical Characterization of P2P Systems Empirical Characterization of P2P Systems Reza Rejaie Mirage Research Group Department of Computer & Information Science University of Oregon http://mirage.cs.uoregon.edu/ Collaborators: Daniel Stutzbach

More information

A two-phase Sampling based Algorithm for Social Networks

A two-phase Sampling based Algorithm for Social Networks A two-phase Sampling based Algorithm for Social Networks Zeinab S. Jalali, Alireza Rezvanian, Mohammad Reza Meybodi Soft Computing Laboratory, Computer Engineering and Information Technology Department

More information

Probabilistic Models in Social Network Analysis

Probabilistic Models in Social Network Analysis Probabilistic Models in Social Network Analysis Sargur N. Srihari University at Buffalo, The State University of New York USA US-India Workshop on Large Scale Data Analytics and Intelligent Services December

More information

CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul

CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul 1 CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul Introduction Our problem is crawling a static social graph (snapshot). Given

More information

Modeling and Estimating the Structure of D2D-Based Mobile Social Networks

Modeling and Estimating the Structure of D2D-Based Mobile Social Networks Modeling and Estimating the Structure of DD-Based Mobile Social Networks Sigit Aryo Pambudi Wenye Wang Department of Electrical and Computer Engineering North Carolina State University, Raleigh, NC 7606

More information

Mining Social Network Graphs

Mining Social Network Graphs Mining Social Network Graphs Analysis of Large Graphs: Community Detection Rafael Ferreira da Silva rafsilva@isi.edu http://rafaelsilva.com Note to other teachers and users of these slides: We would be

More information

OSN: when multiple autonomous users disclose another individual s information

OSN: when multiple autonomous users disclose another individual s information OSN: when multiple autonomous users disclose another individual s information Cristina Pérez-Solà Dept. d Enginyeria de la Informació i les Comunicacions Universitat Autònoma de Barcelona Email: cperez@deic.uab.cat

More information

DS504/CS586: Big Data Analytics Big Data Clustering II

DS504/CS586: Big Data Analytics Big Data Clustering II Welcome to DS504/CS586: Big Data Analytics Big Data Clustering II Prof. Yanhua Li Time: 6pm 8:50pm Thu Location: KH 116 Fall 2017 Updates: v Progress Presentation: Week 15: 11/30 v Next Week Office hours

More information

Estimating Sizes of Social Networks via Biased Sampling

Estimating Sizes of Social Networks via Biased Sampling Estimating Sizes of Social Networks via Biased Sampling Liran Katzir, Edo Liberty, and Oren Somekh Yahoo! Labs, Haifa, Israel International World Wide Web Conference, 28th March - 1st April 2011, Hyderabad,

More information

Video Sharing Websites Study

Video Sharing Websites Study Video Sharing Websites Study Content Characteristic Analysis Nan ZHAO Télécom ParisTech Paris, France nan.zhao@telecom-paristech.fr Loïc BAUD DREV, Hadopi Paris, France loic.baud@hadopi.fr Patrick BELLOT

More information

A Short History of Markov Chain Monte Carlo

A Short History of Markov Chain Monte Carlo A Short History of Markov Chain Monte Carlo Christian Robert and George Casella 2010 Introduction Lack of computing machinery, or background on Markov chains, or hesitation to trust in the practicality

More information

Algorithmic and Economic Aspects of Networks. Nicole Immorlica

Algorithmic and Economic Aspects of Networks. Nicole Immorlica Algorithmic and Economic Aspects of Networks Nicole Immorlica Syllabus 1. Jan. 8 th (today): Graph theory, network structure 2. Jan. 15 th : Random graphs, probabilistic network formation 3. Jan. 20 th

More information

DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li

DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: KH 116 Fall 2017 First Grading for Reading Assignment Weka v 6 weeks v https://weka.waikato.ac.nz/dataminingwithweka/preview

More information

DS504/CS586: Big Data Analytics Big Data Clustering II

DS504/CS586: Big Data Analytics Big Data Clustering II Welcome to DS504/CS586: Big Data Analytics Big Data Clustering II Prof. Yanhua Li Time: 6pm 8:50pm Thu Location: AK 232 Fall 2016 More Discussions, Limitations v Center based clustering K-means BFR algorithm

More information

Supplementary file for SybilDefender: A Defense Mechanism for Sybil Attacks in Large Social Networks

Supplementary file for SybilDefender: A Defense Mechanism for Sybil Attacks in Large Social Networks 1 Supplementary file for SybilDefender: A Defense Mechanism for Sybil Attacks in Large Social Networks Wei Wei, Fengyuan Xu, Chiu C. Tan, Qun Li The College of William and Mary, Temple University {wwei,

More information

DS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li

DS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: KH116 Fall 2017 Merged CS586 and DS504 Examples of Reviews/ Critiques Random

More information

Supervised Random Walks

Supervised Random Walks Supervised Random Walks Pawan Goyal CSE, IITKGP September 8, 2014 Pawan Goyal (IIT Kharagpur) Supervised Random Walks September 8, 2014 1 / 17 Correlation Discovery by random walk Problem definition Estimate

More information

Framework and Algorithms for Network Bucket Testing

Framework and Algorithms for Network Bucket Testing Framework and Algorithms for Network Bucket Testing Liran Katzir Yahoo! Labs., Haifa, Israel lirank@yahoo-inc.com Edo Liberty Yahoo! Labs., Haifa, Israel edo@yahoo-inc.com Oren Somekh Yahoo! Labs., Haifa,

More information

1 Methods for Posterior Simulation

1 Methods for Posterior Simulation 1 Methods for Posterior Simulation Let p(θ y) be the posterior. simulation. Koop presents four methods for (posterior) 1. Monte Carlo integration: draw from p(θ y). 2. Gibbs sampler: sequentially drawing

More information

Social Networks 2015 Lecture 10: The structure of the web and link analysis

Social Networks 2015 Lecture 10: The structure of the web and link analysis 04198250 Social Networks 2015 Lecture 10: The structure of the web and link analysis The structure of the web Information networks Nodes: pieces of information Links: different relations between information

More information

Extracting Information from Complex Networks

Extracting Information from Complex Networks Extracting Information from Complex Networks 1 Complex Networks Networks that arise from modeling complex systems: relationships Social networks Biological networks Distinguish from random networks uniform

More information

Markov Chain Monte Carlo (part 1)

Markov Chain Monte Carlo (part 1) Markov Chain Monte Carlo (part 1) Edps 590BAY Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2018 Depending on the book that you select for

More information

Data mining --- mining graphs

Data mining --- mining graphs Data mining --- mining graphs University of South Florida Xiaoning Qian Today s Lecture 1. Complex networks 2. Graph representation for networks 3. Markov chain 4. Viral propagation 5. Google s PageRank

More information

Introduction To Graphs and Networks. Fall 2013 Carola Wenk

Introduction To Graphs and Networks. Fall 2013 Carola Wenk Introduction To Graphs and Networks Fall 203 Carola Wenk On the Internet, links are essentially weighted by factors such as transit time, or cost. The goal is to find the shortest path from one node to

More information

Algorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov

Algorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov Algorithms and Applications in Social Networks 2017/2018, Semester B Slava Novgorodov 1 Lesson #1 Administrative questions Course overview Introduction to Social Networks Basic definitions Network properties

More information

Data-Intensive Distributed Computing

Data-Intensive Distributed Computing Data-Intensive Distributed Computing CS 451/651 (Fall 2018) Part 4: Analyzing Graphs (1/2) October 4, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These slides are

More information

This is the published version of a paper presented at Network Intelligence Conference (ENIC), 2015 Second European.

This is the published version of a paper presented at Network Intelligence Conference (ENIC), 2015 Second European. http://www.diva-portal.org This is the published version of a paper presented at Network Intelligence Conference (ENIC), 2015 Second European. Citation for the original published paper: Erlandsson, F.,

More information

Graph similarity. Laura Zager and George Verghese EECS, MIT. March 2005

Graph similarity. Laura Zager and George Verghese EECS, MIT. March 2005 Graph similarity Laura Zager and George Verghese EECS, MIT March 2005 Words you won t hear today impedance matching thyristor oxide layer VARs Some quick definitions GV (, E) a graph G V the set of vertices

More information

: Semantic Web (2013 Fall)

: Semantic Web (2013 Fall) 03-60-569: Web (2013 Fall) University of Windsor September 4, 2013 Table of contents 1 2 3 4 5 Definition of the Web The World Wide Web is a system of interlinked hypertext documents accessed via the Internet

More information

Lecture 27: Learning from relational data

Lecture 27: Learning from relational data Lecture 27: Learning from relational data STATS 202: Data mining and analysis December 2, 2017 1 / 12 Announcements Kaggle deadline is this Thursday (Dec 7) at 4pm. If you haven t already, make a submission

More information

Degree Ranking Using Local Information

Degree Ranking Using Local Information Degree Ranking Using Local Information arxiv:1706.01205v2 [cs.si] 10 Jun 2017 Akrati Saxena* akrati.saxena@iitrpr.ac.in S. R. S. Iyengar* sudarshan@iitrpr.ac.in Ralucca Gera** rgera@nps.edu *Department

More information

Crawling Online Social Graphs

Crawling Online Social Graphs Crawling Online Social Graphs Shaozhi Ye Department of Computer Science University of California, Davis sye@ucdavis.edu Juan Lang Department of Computer Science University of California, Davis jilang@ucdavis.edu

More information

MCMC Methods for data modeling

MCMC Methods for data modeling MCMC Methods for data modeling Kenneth Scerri Department of Automatic Control and Systems Engineering Introduction 1. Symposium on Data Modelling 2. Outline: a. Definition and uses of MCMC b. MCMC algorithms

More information

Estimating Deep Web Properties by Random Walk

Estimating Deep Web Properties by Random Walk University of Windsor Scholarship at UWindsor Electronic Theses and Dissertations 2013 Estimating Deep Web Properties by Random Walk Sajib Kumer Sinha University of Windsor Follow this and additional works

More information

DS595/CS525: Urban Network Analysis --Urban Mobility Prof. Yanhua Li

DS595/CS525: Urban Network Analysis --Urban Mobility Prof. Yanhua Li Welcome to DS595/CS525: Urban Network Analysis --Urban Mobility Prof. Yanhua Li Time: 6:00pm 8:50pm Wednesday Location: Fuller 320 Spring 2017 2 Team assignment Finalized. (Great!) Guest Speaker 2/22 A

More information

CS224W: Analysis of Networks Jure Leskovec, Stanford University

CS224W: Analysis of Networks Jure Leskovec, Stanford University CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 2 Observations Models

More information

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization Pedro Ribeiro (DCC/FCUP & CRACS/INESC-TEC) Part 1 Motivation and emergence of Network Science

More information

745: Advanced Database Systems

745: Advanced Database Systems 745: Advanced Database Systems Yanlei Diao University of Massachusetts Amherst Outline Overview of course topics Course requirements Database Management Systems 1. Online Analytical Processing (OLAP) vs.

More information

Cycles in Random Graphs

Cycles in Random Graphs Cycles in Random Graphs Valery Van Kerrebroeck Enzo Marinari, Guilhem Semerjian [Phys. Rev. E 75, 066708 (2007)] [J. Phys. Conf. Series 95, 012014 (2008)] Outline Introduction Statistical Mechanics Approach

More information

COMP 4601 Hubs and Authorities

COMP 4601 Hubs and Authorities COMP 4601 Hubs and Authorities 1 Motivation PageRank gives a way to compute the value of a page given its position and connectivity w.r.t. the rest of the Web. Is it the only algorithm: No! It s just one

More information

CS 3516: Advanced Computer Networks

CS 3516: Advanced Computer Networks Welcome to CS 56: Adanced Computer Networks Prof. Yanhua Li Time: 9:00am 9:50am M, T, R, and F Location: Fuller 0 Fall 07 A-term Some slides are originall the course materials of the tetbook Computer Networking:

More information

Link Analysis and Web Search

Link Analysis and Web Search Link Analysis and Web Search Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna http://www.moreno.marzolla.name/ based on material by prof. Bing Liu http://www.cs.uic.edu/~liub/webminingbook.html

More information

Web consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page

Web consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page Link Analysis Links Web consists of web pages and hyperlinks between pages A page receiving many links from other pages may be a hint of the authority of the page Links are also popular in some other information

More information

Aiding the Detection of Fake Accounts in Large Scale Social Online Services

Aiding the Detection of Fake Accounts in Large Scale Social Online Services Aiding the Detection of Fake Accounts in Large Scale Social Online Services Qiang Cao Duke University Michael Sirivianos Xiaowei Yang Tiago Pregueiro Cyprus Univ. of Technology Duke University Tuenti,

More information

Graph Data Management

Graph Data Management Graph Data Management Analysis and Optimization of Graph Data Frameworks presented by Fynn Leitow Overview 1) Introduction a) Motivation b) Application for big data 2) Choice of algorithms 3) Choice of

More information

CS281 Section 9: Graph Models and Practical MCMC

CS281 Section 9: Graph Models and Practical MCMC CS281 Section 9: Graph Models and Practical MCMC Scott Linderman November 11, 213 Now that we have a few MCMC inference algorithms in our toolbox, let s try them out on some random graph models. Graphs

More information

CS 3516: Advanced Computer Networks

CS 3516: Advanced Computer Networks Welcome to CS 56: Adanced Computer Networks Prof. Yanhua Li Time: 9:00am 9:50am M, T, R, and F Location: Fuller 0 Fall 06 A-term Some slides are originall from the course materials of the tetbook Computer

More information

Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets

Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets Nadathur Satish, Narayanan Sundaram, Mostofa Ali Patwary, Jiwon Seo, Jongsoo Park, M. Amber Hassaan, Shubho Sengupta, Zhaoming

More information

Lecture 9: I: Web Retrieval II: Webology. Johan Bollen Old Dominion University Department of Computer Science

Lecture 9: I: Web Retrieval II: Webology. Johan Bollen Old Dominion University Department of Computer Science Lecture 9: I: Web Retrieval II: Webology Johan Bollen Old Dominion University Department of Computer Science jbollen@cs.odu.edu http://www.cs.odu.edu/ jbollen April 10, 2003 Page 1 WWW retrieval Two approaches

More information

Networks in economics and finance. Lecture 1 - Measuring networks

Networks in economics and finance. Lecture 1 - Measuring networks Networks in economics and finance Lecture 1 - Measuring networks What are networks and why study them? A network is a set of items (nodes) connected by edges or links. Units (nodes) Individuals Firms Banks

More information

ST440/540: Applied Bayesian Analysis. (5) Multi-parameter models - Initial values and convergence diagn

ST440/540: Applied Bayesian Analysis. (5) Multi-parameter models - Initial values and convergence diagn (5) Multi-parameter models - Initial values and convergence diagnostics Tuning the MCMC algoritm MCMC is beautiful because it can handle virtually any statistical model and it is usually pretty easy to

More information

MCMC Diagnostics. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) MCMC Diagnostics MATH / 24

MCMC Diagnostics. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) MCMC Diagnostics MATH / 24 MCMC Diagnostics Yingbo Li Clemson University MATH 9810 Yingbo Li (Clemson) MCMC Diagnostics MATH 9810 1 / 24 Convergence to Posterior Distribution Theory proves that if a Gibbs sampler iterates enough,

More information

CSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection

CSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection CSE 255 Lecture 6 Data Mining and Predictive Analytics Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:

More information

Introduction to Algorithms. Lecture 11. Prof. Patrick Jaillet

Introduction to Algorithms. Lecture 11. Prof. Patrick Jaillet 6.006- Introduction to Algorithms Lecture 11 Prof. Patrick Jaillet Lecture Overview Searching I: Graph Search and Representations Readings: CLRS 22.1-22.3, B.4 Graphs G=(V,E) V a set of vertices usually

More information

Topic mash II: assortativity, resilience, link prediction CS224W

Topic mash II: assortativity, resilience, link prediction CS224W Topic mash II: assortativity, resilience, link prediction CS224W Outline Node vs. edge percolation Resilience of randomly vs. preferentially grown networks Resilience in real-world networks network resilience

More information

Lecture 5: Graphs. Rajat Mittal. IIT Kanpur

Lecture 5: Graphs. Rajat Mittal. IIT Kanpur Lecture : Graphs Rajat Mittal IIT Kanpur Combinatorial graphs provide a natural way to model connections between different objects. They are very useful in depicting communication networks, social networks

More information

Graph Exploration: How to do better than the random walk? Adrian Kosowski. INRIA Bordeaux Sud-Ouest.

Graph Exploration: How to do better than the random walk? Adrian Kosowski. INRIA Bordeaux Sud-Ouest. Graph Exploration: How to do better than the random walk? Adrian Kosowski INRIA Bordeaux Sud-Ouest kosowski@labri.fr Réunion Displexity La Rochelle, April 4, 2013 Talk outline Introduction to network exploration

More information

Graph Data Processing with MapReduce

Graph Data Processing with MapReduce Distributed data processing on the Cloud Lecture 5 Graph Data Processing with MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, 2015 (licensed under Creation Commons Attribution

More information

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

DS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li

DS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: AK 232 Fall 2016 The Data Equation Oceans of Data Ocean Biodiversity Informatics,

More information

An Efficient Methodology for Image Rich Information Retrieval

An Efficient Methodology for Image Rich Information Retrieval An Efficient Methodology for Image Rich Information Retrieval 56 Ashwini Jaid, 2 Komal Savant, 3 Sonali Varma, 4 Pushpa Jat, 5 Prof. Sushama Shinde,2,3,4 Computer Department, Siddhant College of Engineering,

More information

E6885 Network Science Lecture 5: Network Estimation and Modeling

E6885 Network Science Lecture 5: Network Estimation and Modeling E 6885 Topics in Signal Processing -- Network Science E6885 Network Science Lecture 5: Network Estimation and Modeling Ching-Yung Lin, Dept. of Electrical Engineering, Columbia University October 7th,

More information

Chapter 1. Social Media and Social Computing. October 2012 Youn-Hee Han

Chapter 1. Social Media and Social Computing. October 2012 Youn-Hee Han Chapter 1. Social Media and Social Computing October 2012 Youn-Hee Han http://link.koreatech.ac.kr 1.1 Social Media A rapid development and change of the Web and the Internet Participatory web application

More information

Unbiased Sampling over Online Social Networks

Unbiased Sampling over Online Social Networks Unbiased Sampling over Online Social Networks Mojtaba Torkjazi Department of Computer & Information Science University of Oregon moji@cs.uoreogn.edu ABSTRACT During recent years, Online Social Networks

More information

CSE 258 Lecture 6. Web Mining and Recommender Systems. Community Detection

CSE 258 Lecture 6. Web Mining and Recommender Systems. Community Detection CSE 258 Lecture 6 Web Mining and Recommender Systems Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:

More information

Quantitative Biology II!

Quantitative Biology II! Quantitative Biology II! Lecture 3: Markov Chain Monte Carlo! March 9, 2015! 2! Plan for Today!! Introduction to Sampling!! Introduction to MCMC!! Metropolis Algorithm!! Metropolis-Hastings Algorithm!!

More information

Clustering Relational Data using the Infinite Relational Model

Clustering Relational Data using the Infinite Relational Model Clustering Relational Data using the Infinite Relational Model Ana Daglis Supervised by: Matthew Ludkin September 4, 2015 Ana Daglis Clustering Data using the Infinite Relational Model September 4, 2015

More information

Graph and Link Mining

Graph and Link Mining Graph and Link Mining Graphs - Basics A graph is a powerful abstraction for modeling entities and their pairwise relationships. G = (V,E) Set of nodes V = v,, v 5 Set of edges E = { v, v 2, v 4, v 5 }

More information

Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011

Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011 Data-Intensive Information Processing Applications! Session #5 Graph Algorithms Jordan Boyd-Graber University of Maryland Thursday, March 3, 2011 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Social Network Analysis

Social Network Analysis Social Network Analysis Mathematics of Networks Manar Mohaisen Department of EEC Engineering Adjacency matrix Network types Edge list Adjacency list Graph representation 2 Adjacency matrix Adjacency matrix

More information

CSE 158 Lecture 6. Web Mining and Recommender Systems. Community Detection

CSE 158 Lecture 6. Web Mining and Recommender Systems. Community Detection CSE 158 Lecture 6 Web Mining and Recommender Systems Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:

More information

CSE 316: SOCIAL NETWORK ANALYSIS INTRODUCTION. Fall 2017 Marion Neumann

CSE 316: SOCIAL NETWORK ANALYSIS INTRODUCTION. Fall 2017 Marion Neumann CSE 316: SOCIAL NETWORK ANALYSIS Fall 2017 Marion Neumann INTRODUCTION Contents in these slides may be subject to copyright. Some materials are adopted from: http://www.cs.cornell.edu/home /kleinber/ networks-book,

More information

Graph Theory for Network Science

Graph Theory for Network Science Graph Theory for Network Science Dr. Natarajan Meghanathan Professor Department of Computer Science Jackson State University, Jackson, MS E-mail: natarajan.meghanathan@jsums.edu Networks or Graphs We typically

More information

My Best Current Friend in a Social Network

My Best Current Friend in a Social Network Procedia Computer Science Volume 51, 2015, Pages 2903 2907 ICCS 2015 International Conference On Computational Science My Best Current Friend in a Social Network Francisco Moreno 1, Santiago Hernández

More information

Biological Networks Analysis

Biological Networks Analysis Biological Networks Analysis Introduction and Dijkstra s algorithm Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein The clustering problem: partition genes into distinct

More information

Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing

Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing Gautam Bhat, Rajeev Kumar Singh Department of Computer Science and Engineering Shiv Nadar University Gautam Buddh Nagar,

More information

DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li

DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li Time: 6:00pm 8:50pm Thu Location: AK 232 Fall 2016 High Dimensional Data v Given a cloud of data points we want to understand

More information

Network Centrality. Saptarshi Ghosh Department of CSE, IIT Kharagpur Social Computing course, CS60017

Network Centrality. Saptarshi Ghosh Department of CSE, IIT Kharagpur Social Computing course, CS60017 Network Centrality Saptarshi Ghosh Department of CSE, IIT Kharagpur Social Computing course, CS60017 Node centrality n Relative importance of a node in a network n How influential a person is within a

More information

Enhancing Stratified Graph Sampling Algorithms based on Approximate Degree Distribution

Enhancing Stratified Graph Sampling Algorithms based on Approximate Degree Distribution Enhancing Stratified Graph Sampling Algorithms based on Approximate Degree Distribution Junpeng Zhu 1,2, Hui Li 1,2(), Mei Chen 1,2, Zhenyu Dai 1,2 and Ming Zhu 3 1 College of Computer Science and Technology,

More information

Graphs / Networks. CSE 6242/ CX 4242 Feb 18, Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech

Graphs / Networks. CSE 6242/ CX 4242 Feb 18, Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech CSE 6242/ CX 4242 Feb 18, 2014 Graphs / Networks Centrality measures, algorithms, interactive applications Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey

More information

Graph Data Management Systems in New Applications Domains. Mikko Halin

Graph Data Management Systems in New Applications Domains. Mikko Halin Graph Data Management Systems in New Applications Domains Mikko Halin Introduction Presentation is based on two papers Graph Data Management Systems for New Application Domains - Philippe Cudré-Mauroux,

More information