Sampling Large Graphs: Algorithms and Applications

Size: px
Start display at page:

Download "Sampling Large Graphs: Algorithms and Applications"

Transcription

1 Sampling Large Graphs: Algorithms and Applications Don Towsley Umass - Amherst Joint work with P.H. Wang, J.Z. Zhou, J.C.S. Lui, X. Guan

2 Measuring, Analyzing Large Networks - large networks can be represented by graphs - Facebook - WWW - Twitter - Ebay 300 million 1 Billion 50 Billion 233 Million Curse of data dimensionality!!!

3 Challenges in measurement: Information Distortion World Map in 1459 incomplete (Columbus et al. 1492) (Australia 17 th century) wrong proportions (Africa & Asia)

4 Why do we want to understand these networks? High school friendship Want to understand or find out network how did these networks evolve? who are the influential users? how does influence propagate? distribution of preference of users? communities in these networks?.etc.

5 Goals and Challenges Goals generate statistically valid characterization of network structure node pairs in this work Challenges large network sizes correcting for biases

6 How to measure: sampling algorithms Random sampling (uniform & independent) Vertex sampling Crawling Breadth First sampling (BFS) Edge sampling Random walk sampling

7 How to measure: sampling algorithms Random sampling (uniform & independent) Vertex sampling Crawling Breadth First sampling (BFS) Edge sampling Random walk sampling

8 How to measure: sampling algorithms Random sampling (uniform & independent) Vertex sampling Crawling Breadth First sampling (BFS) Edge sampling Random walk sampling

9 How to measure: sampling algorithms Random sampling (uniform & independent) Vertex sampling Crawling Breadth First sampling (BFS) Edge sampling Random walk sampling

10 How to measure: sampling algorithms Random sampling (uniform & independent) Vertex sampling Crawling Breadth First sampling (BFS) Edge sampling Random walk sampling

11 Focus of this talk Measure node pair statistics: important for many applications! 12

12 Classification of Node Pairs Classify node pair [u, v] using shortest path 1-hop node pair class if distance(u,v) = 1 2-hop node pair class if distance(u,v) = 2 13

13 Similarity Function: Homophily Homophily: tendency of users to connect to others with common interests. P. Singla and M. Richardson. Yes, there is a correlation: from social networks to personal behavior on the web. In WWW 2008 (MSN) Can infer characteristics and make recommendations Compare homophily(u,v) between different node pair classes 14

14 Similarity Function: Proximity Proximity(u,v): closeness of u and v; defined as functions such as number of common neighbors of u and v u v knowing proximity distribution of node pairs important for friendship prediction interest recommendation 15

15 Similarity Function: Distance Distance(u,v): length of shortest path between u and v in graph measure distance distribution of all node pairs to calculate average distance Twitter: 4.1 MSN: 6.6 effective diameter (the 90th percentile of all distances) small world 16

16 Problem Formulation undirected graph G = (V, E) measure characteristics of node pairs in following sets: all pairs - S = { u, v : u, v V, u v} one-hop pairs - pairs of connected nodes S (1) = u, v : u, v E two-hop pairs - pairs of nodes with at least one common neighbor S (2) = { u, v : u, v V, u v; x V st x, u, x, v E} 17

17 Problem Formulation F(u, v) - value of node pair property under study, e.g., # of common neighbors of u, v {a 1,, a K } - range of F u, v distribution of F u, v S: (ω 1,, ω K ) S (1) : (ω 1 (1),, ωk (1) ) S (2) : (ω 1 2,, ω K 2 ) ω k, ω k (1), ωk (2) - fractions of node pairs in S, S 1, S (2) with property F u, v = a k 18

18 Challenges OSNs large Facebook, Google+, Twitter, Renren, Sina Weibo,, V > 500 million users huge number of node pairs, V 2 > topology not available sampling required UVS (Uniform Vertex Sampling): sampling bias fors (1), S (2). Sometimes UVS not allowed crawling methods, such as breadth first search, and random walk (RW): sampling bias need to construct unbiased estimator 19

19 Node Pair Sampling Based on UVS Basic sampling techniques UVS: sample nodes from V uniformly weighted vertex sampling (WVS): sample nodes from V with desired probability distribution (π x : x V) independent WVS (IWVS (if we know topology) Metropolis-Hastings WVS (MHWVS): At each step, MHWVS selects a node v using UVS and then accepts the move with probability min(π v /π u, 1), otherwise remains at u. 20

20 Node Pair Sampling Based on UVS All pairs S Sampling method: use UVS to select two different nodes u and v uniformly at random Estimator: given sampled node pairs u i, v i, i = 1,, n n ω k = 1 1(F u n i, v i i=1 = a k ), k = 1,, K Accuracy (unbiased) E ω k = ω k and Var ω k = ω k ω k 2 n, k = 1,, K 21

21 Node Pair Sampling Based on UVS One hop pairs S (1) Sampling node pair [u, v] 1) sample node u according to probability distribution (π u 1 : u V), where d u π u 1 - degree of node u = d u 2 E 2) select neighbor v at random Each [u, v] sampled uniformly from S (1) 22

22 Node Pair Sampling Based on UVS One hop pairs S (1) Estimator: given sampled node pairs u i, v i, i = 1,, n ω k (1) = 1 n n i=1 Accuracy (unbiased) 1(F u i, v i = a k ), k = 1,, K E ω k (1) = ω k (1) and Var ω k (1) = ω (1) 1 2 k ωk n, k = 1,, K 23

23 Node Pair Sampling Based on UVS Two hop pairs S (2) Sampling node pair u, v 1) sample a node x according to probability distribution (π x 2 : x V) π x 2 = d x d x 1 M M - normalization constant 2) then select two neighbors u and v of node x at random 24

24 Node Pair Sampling Based on UVS Two hop pairs S (2) Sample (u, v) in S (2) with probability (2) π [u,v] = m(u, v)/m m(u, v) - number of neighbors common to u, v Estimator: given node pairs u i, v i, i = 1,, K ω k (2 ) = 1 H n 1(F u i, v i = a k ) i=1 m(u i, v i ), k = 1,, K H normalization constant 25

25 Node Pair Sampling Based on UVS Two hop pairs S (2) Accuracy: ω k (2) - asymptotically unbiased estimator of ω k (2), k = 1,, K. Meanwhile, P ω k 2 ω k 2 2εω 2 k 1 ε 1 1 nε 2 m ω k 2 + m 2 where 0 < ε < 1 and m = M/ S 2 26

26 Node Pair Sampling Based on RW Why? UVS not available, too costly. API not provided user IDs sparsely distributed Only crawling techniques can be used Breadth first search: unknown sampling bias Random walk: Walker moves to random neighbor, samples its information. For connected and non-bipartite graph, probability of RW being at node v converges to π v = d v 2 E, v V 27

27 Node Pair Sampling Based on RW All pairs S Sample node pair u i, v i by two independent RWs, where u i, v i are nodes sampled by two RWs at step i Node pair [u,v] sampled according to stationary distribution π [u,v] = d ud v, u, v V 4 E 2 28

28 Node Pair Sampling Based on RW All pairs S Estimator: given sampled node pairs u i, v i, i = 1,, n ω k = 1 J n 1(F u i, v i = a k )1(u i v i ) i=1 d ui d vi, k = 1,, K J normalization constant Accuracy: ω k, k = 1,, K is asymptotically unbiased estimate of ω k 29

29 Node Pair Sampling Based on RW One hop pairs S (1) Sampling method: random node pair [u i, v i ] sampled by RW u i, v i are nodes sampled at steps i and i + 1 Probability of RW sampling node pairs in S (1) are equal when RW reaches the steady state. 30

30 Node Pair Sampling Based on RW One hop pairs S (1) Estimator: given sampled node pairs u i, v i, i = 1,, K ω k (1 ) = 1 n n i=1 1(F u i, v i = a k ), k = 1,, K Accuracy: ω k (1 ) asymptotically unbiased estimate of ω k (1), 1 k K 31

31 Node Pair Sampling Based on RW Two hop pairs S (2) Neighborhood RW (NRW) current edge (u, v) next edge: select a random edge from edges connected to u or v, except edge (u, v) 32

32 Node Pair Sampling Based on RW Two hop pairs S (2) Select node pair by excluding same node in two edges consequently sampled by NRW. For example, output node pair [u, v] for two consequently sampled edges (u, w) and (w, v) Probability NRW samples node pair [u,v] in S (2) converges to (2) π [u,v] = m(u, v)/m m(u, v) - number neighbors common to u, v 33

33 Node Pair Sampling Based on RW Two hop pairs S (2) Estimator: given sampled node pairs u i, v i, i = 1,, n ω k (2 ) = 1 H n 1(F u i, v i = a k ) i=1 m(u i, v i ) where H = n i=1 1/m(u i, v i ) Accuracy: ω k (2 ) asymptotically unbiased estimate of ω k (2), 1 k K 34

34 Simulations: Distance Distribution Estimation B - number of sampled node pairs S - total number of node pairs error metric - NMSE ω k = E ω k ω k 2 ω k, k = 1,, K Gnutella - V 6300 B >.005 S, NMSE < 1 NMSE proportional to inverse of sqr() 35

35 Simulations: Mutual Neighbor Count Distribution in S (1) B = 0.01 S 1 node pairs sampled from S 1 36

36 Simulations: Mutual Neighbor Count Distribution in S (2) B = 0.01 S 2 node pairs sampled from S 2 37

37 Simulations: Interest Distribution Scheme Distribute interests over graph: 10 5 distinct interests: number nodes per interest ~ truncated Pareto distribution over {1,,10 3 } To distribute an interest possessed by k different nodes 1. select a random node v that can reach at least k- 1 different nodes 2. distribute interest to node v and closest k-1 nodes connected to v 38

38 Simulations: Common Interest Count Distribution for Generated Content CCDF of common interest count distribution for S, S (1), and S (2). Wiki-vote P2P-Gnutella 39

39 Simulations: Common Interest Count Distribution in S B = 0.01 S node pairs sampled from S Wiki-vote P2P-Gnutella 40

40 Simulations: Common Interest Count Distribution in S (1) B = 0.01 S (1) node pairs sampled from S (1) Wiki-vote P2P-Gnutella RW ~ IWVS > MHWVS => no topology 41

41 Simulations: Common Interest Count Distribution in S (2) B = 0.01 S 2 node pairs sampled from S 2 Wiki-vote P2P-Gnutella 42

42 Conclusions use sampling to estimate pair characteristics in sets S, S (1), and S (2). sampling methods based on independent vertex sampling and random walk produce asymptotically unbiased estimates validated approaches on wide range of graphs 43

43 Conclusions Markov Chain Mixing Times Other more powerful & elegant sampling methods: Frontier Sampling Efficiently Estimating Motif Statistics of Large Networks in the Dark. TKDD 2014 Design of Efficient Sampling Methods on Hybrid Social- Affiliation Networks. Accepted in IEEE ICDE 15 measuring, maximizing group closeness centrality over diskresident graphs. WWW 14 44

44 Thanks! Slides (will be) at r-sampling.pdf

Sampling Large Graphs: Algorithms and Applications

Sampling Large Graphs: Algorithms and Applications Sampling Large Graphs: Algorithms and Applications Don Towsley College of Information & Computer Science Umass - Amherst Collaborators: P.H. Wang, J.C.S. Lui, J.Z. Zhou, X. Guan Measuring, analyzing large

More information

How to explore big networks? Question: Perform a random walk on G. What is the average node degree among visited nodes, if avg degree in G is 200?

How to explore big networks? Question: Perform a random walk on G. What is the average node degree among visited nodes, if avg degree in G is 200? How to explore big networks? Question: Perform a random walk on G. What is the average node degree among visited nodes, if avg degree in G is 200? Questions from last time Avg. FB degree is 200 (suppose).

More information

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: AK232 Fall 2016 Graph Data: Social Networks Facebook social graph 4-degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna,

More information

Empirical Characterization of P2P Systems

Empirical Characterization of P2P Systems Empirical Characterization of P2P Systems Reza Rejaie Mirage Research Group Department of Computer & Information Science University of Oregon http://mirage.cs.uoregon.edu/ Collaborators: Daniel Stutzbach

More information

Fast Low-Cost Estimation of Network Properties Using Random Walks

Fast Low-Cost Estimation of Network Properties Using Random Walks Fast Low-Cost Estimation of Network Properties Using Random Walks Colin Cooper, Tomasz Radzik, and Yiannis Siantos Department of Informatics, King s College London, WC2R 2LS, UK Abstract. We study the

More information

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: KH 116 Fall 2017 Reiews/Critiques I will choose one reiew to grade this week. Graph Data: Social

More information

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: AK 233 Spring 2018 Service Providing Improve urban planning, Ease Traffic Congestion, Save Energy,

More information

A Walk in Facebook: Uniform Sampling of Users in Online Social Networks

A Walk in Facebook: Uniform Sampling of Users in Online Social Networks A Walk in Facebook: Uniform Sampling of Users in Online Social Networks Minas Gjoka, Maciej Kurant, Carter T. Butts, Athina Markopoulou California Institute for Telecommunications and Information Technology

More information

Modeling and Estimating the Structure of D2D-Based Mobile Social Networks

Modeling and Estimating the Structure of D2D-Based Mobile Social Networks Modeling and Estimating the Structure of DD-Based Mobile Social Networks Sigit Aryo Pambudi Wenye Wang Department of Electrical and Computer Engineering North Carolina State University, Raleigh, NC 7606

More information

Optimizing Capacity-Heterogeneous Unstructured P2P Networks for Random-Walk Traffic

Optimizing Capacity-Heterogeneous Unstructured P2P Networks for Random-Walk Traffic Optimizing Capacity-Heterogeneous Unstructured P2P Networks for Random-Walk Traffic Chandan Rama Reddy Microsoft Joint work with Derek Leonard and Dmitri Loguinov Internet Research Lab Department of Computer

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Outsourcing Privacy-Preserving Social Networks to a Cloud

Outsourcing Privacy-Preserving Social Networks to a Cloud IEEE INFOCOM 2013, April 14-19, Turin, Italy Outsourcing Privacy-Preserving Social Networks to a Cloud Guojun Wang a, Qin Liu a, Feng Li c, Shuhui Yang d, and Jie Wu b a Central South University, China

More information

Absorbing Random walks Coverage

Absorbing Random walks Coverage DATA MINING LECTURE 3 Absorbing Random walks Coverage Random Walks on Graphs Random walk: Start from a node chosen uniformly at random with probability. n Pick one of the outgoing edges uniformly at random

More information

A Walk in Facebook: Uniform Sampling of Users in Online Social Networks

A Walk in Facebook: Uniform Sampling of Users in Online Social Networks A Walk in Facebook: Uniform Sampling of Users in Online Social Networks Minas Gjoka CalIT2 UC Irvine mgjoka@uci.edu Maciej Kurant CalIT2 UC Irvine maciej.kurant@gmail.com Carter T. Butts Sociology Dept

More information

Extracting Information from Complex Networks

Extracting Information from Complex Networks Extracting Information from Complex Networks 1 Complex Networks Networks that arise from modeling complex systems: relationships Social networks Biological networks Distinguish from random networks uniform

More information

Absorbing Random walks Coverage

Absorbing Random walks Coverage DATA MINING LECTURE 3 Absorbing Random walks Coverage Random Walks on Graphs Random walk: Start from a node chosen uniformly at random with probability. n Pick one of the outgoing edges uniformly at random

More information

CENTRALITIES. Carlo PICCARDI. DEIB - Department of Electronics, Information and Bioengineering Politecnico di Milano, Italy

CENTRALITIES. Carlo PICCARDI. DEIB - Department of Electronics, Information and Bioengineering Politecnico di Milano, Italy CENTRALITIES Carlo PICCARDI DEIB - Department of Electronics, Information and Bioengineering Politecnico di Milano, Italy email carlo.piccardi@polimi.it http://home.deib.polimi.it/piccardi Carlo Piccardi

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul

CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul 1 CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul Introduction Our problem is crawling a static social graph (snapshot). Given

More information

Unbiased Sampling of Facebook

Unbiased Sampling of Facebook Unbiased Sampling of Facebook Minas Gjoka Networked Systems UC Irvine mgjoka@uci.edu Maciej Kurant School of ICS EPFL, Lausanne maciej.kurant@epfl.ch Carter T. Butts Sociology Dept UC Irvine buttsc@uci.edu

More information

Supervised Random Walks

Supervised Random Walks Supervised Random Walks Pawan Goyal CSE, IITKGP September 8, 2014 Pawan Goyal (IIT Kharagpur) Supervised Random Walks September 8, 2014 1 / 17 Correlation Discovery by random walk Problem definition Estimate

More information

Link Analysis and Web Search

Link Analysis and Web Search Link Analysis and Web Search Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna http://www.moreno.marzolla.name/ based on material by prof. Bing Liu http://www.cs.uic.edu/~liub/webminingbook.html

More information

CS6200 Information Retreival. The WebGraph. July 13, 2015

CS6200 Information Retreival. The WebGraph. July 13, 2015 CS6200 Information Retreival The WebGraph The WebGraph July 13, 2015 1 Web Graph: pages and links The WebGraph describes the directed links between pages of the World Wide Web. A directed edge connects

More information

Link Prediction for Social Network

Link Prediction for Social Network Link Prediction for Social Network Ning Lin Computer Science and Engineering University of California, San Diego Email: nil016@eng.ucsd.edu Abstract Friendship recommendation has become an important issue

More information

Analysis of Biological Networks. 1. Clustering 2. Random Walks 3. Finding paths

Analysis of Biological Networks. 1. Clustering 2. Random Walks 3. Finding paths Analysis of Biological Networks 1. Clustering 2. Random Walks 3. Finding paths Problem 1: Graph Clustering Finding dense subgraphs Applications Identification of novel pathways, complexes, other modules?

More information

Neural Network Weight Selection Using Genetic Algorithms

Neural Network Weight Selection Using Genetic Algorithms Neural Network Weight Selection Using Genetic Algorithms David Montana presented by: Carl Fink, Hongyi Chen, Jack Cheng, Xinglong Li, Bruce Lin, Chongjie Zhang April 12, 2005 1 Neural Networks Neural networks

More information

Estimating and Sampling Graphs with Multidimensional Random Walks. Group 2: Mingyan Zhao, Chengle Zhang, Biao Yin, Yuchen Liu

Estimating and Sampling Graphs with Multidimensional Random Walks. Group 2: Mingyan Zhao, Chengle Zhang, Biao Yin, Yuchen Liu Estimating and Sampling Graphs with Multidimensional Random Walks Group 2: Mingyan Zhao, Chengle Zhang, Biao Yin, Yuchen Liu Complex Network Motivation Social Network Biological Network erence: www.forbe.com

More information

Complex-Network Modelling and Inference

Complex-Network Modelling and Inference Complex-Network Modelling and Inference Lecture 8: Graph features (2) Matthew Roughan http://www.maths.adelaide.edu.au/matthew.roughan/notes/ Network_Modelling/ School

More information

CS 5220: Parallel Graph Algorithms. David Bindel

CS 5220: Parallel Graph Algorithms. David Bindel CS 5220: Parallel Graph Algorithms David Bindel 2017-11-14 1 Graphs Mathematically: G = (V, E) where E V V Convention: V = n and E = m May be directed or undirected May have weights w V : V R or w E :

More information

Link Farming in Twitter

Link Farming in Twitter Link Farming in Twitter Pawan Goyal CSE, IITKGP Nov 11, 2016 Pawan Goyal (IIT Kharagpur) Link Farming in Twitter Nov 11, 2016 1 / 1 Reference Saptarshi Ghosh, Bimal Viswanath, Farshad Kooti, Naveen Kumar

More information

K Nearest Neighbor Wrap Up K- Means Clustering. Slides adapted from Prof. Carpuat

K Nearest Neighbor Wrap Up K- Means Clustering. Slides adapted from Prof. Carpuat K Nearest Neighbor Wrap Up K- Means Clustering Slides adapted from Prof. Carpuat K Nearest Neighbor classification Classification is based on Test instance with Training Data K: number of neighbors that

More information

Graph Algorithms using Map-Reduce. Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web

Graph Algorithms using Map-Reduce. Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society. Some

More information

Data-Intensive Distributed Computing

Data-Intensive Distributed Computing Data-Intensive Distributed Computing CS 451/651 (Fall 2018) Part 4: Analyzing Graphs (1/2) October 4, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These slides are

More information

Estimating Sizes of Social Networks via Biased Sampling

Estimating Sizes of Social Networks via Biased Sampling Estimating Sizes of Social Networks via Biased Sampling Liran Katzir, Edo Liberty, and Oren Somekh Yahoo! Labs, Haifa, Israel International World Wide Web Conference, 28th March - 1st April 2011, Hyderabad,

More information

Lecture 6: Spectral Graph Theory I

Lecture 6: Spectral Graph Theory I A Theorist s Toolkit (CMU 18-859T, Fall 013) Lecture 6: Spectral Graph Theory I September 5, 013 Lecturer: Ryan O Donnell Scribe: Jennifer Iglesias 1 Graph Theory For this course we will be working on

More information

Using PageRank in Feature Selection

Using PageRank in Feature Selection Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy {ienco,meo,botta}@di.unito.it Abstract. Feature selection is an important

More information

Online Social Networks and Media

Online Social Networks and Media Online Social Networks and Media Absorbing Random Walks Link Prediction Why does the Power Method work? If a matrix R is real and symmetric, it has real eigenvalues and eigenvectors: λ, w, λ 2, w 2,, (λ

More information

Overlay (and P2P) Networks

Overlay (and P2P) Networks Overlay (and P2P) Networks Part II Recap (Small World, Erdös Rényi model, Duncan Watts Model) Graph Properties Scale Free Networks Preferential Attachment Evolving Copying Navigation in Small World Samu

More information

Making social interactions accessible in online social networks

Making social interactions accessible in online social networks Information Services & Use 33 (2013) 113 117 113 DOI 10.3233/ISU-130702 IOS Press Making social interactions accessible in online social networks Fredrik Erlandsson a,, Roozbeh Nia a, Henric Johnson b

More information

Nonparametric Importance Sampling for Big Data

Nonparametric Importance Sampling for Big Data Nonparametric Importance Sampling for Big Data Abigael C. Nachtsheim Research Training Group Spring 2018 Advisor: Dr. Stufken SCHOOL OF MATHEMATICAL AND STATISTICAL SCIENCES Motivation Goal: build a model

More information

Topic mash II: assortativity, resilience, link prediction CS224W

Topic mash II: assortativity, resilience, link prediction CS224W Topic mash II: assortativity, resilience, link prediction CS224W Outline Node vs. edge percolation Resilience of randomly vs. preferentially grown networks Resilience in real-world networks network resilience

More information

Diffusion and Clustering on Large Graphs

Diffusion and Clustering on Large Graphs Diffusion and Clustering on Large Graphs Alexander Tsiatas Thesis Proposal / Advancement Exam 8 December 2011 Introduction Graphs are omnipresent in the real world both natural and man-made Examples of

More information

Using PageRank in Feature Selection

Using PageRank in Feature Selection Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy fienco,meo,bottag@di.unito.it Abstract. Feature selection is an important

More information

Switching for a Small World. Vilhelm Verendel. Master s Thesis in Complex Adaptive Systems

Switching for a Small World. Vilhelm Verendel. Master s Thesis in Complex Adaptive Systems Switching for a Small World Vilhelm Verendel Master s Thesis in Complex Adaptive Systems Division of Computing Science Department of Computer Science and Engineering Chalmers University of Technology Göteborg,

More information

node2vec: Scalable Feature Learning for Networks

node2vec: Scalable Feature Learning for Networks node2vec: Scalable Feature Learning for Networks A paper by Aditya Grover and Jure Leskovec, presented at Knowledge Discovery and Data Mining 16. 11/27/2018 Presented by: Dharvi Verma CS 848: Graph Database

More information

A New Parallel Algorithm for Connected Components in Dynamic Graphs. Robert McColl Oded Green David Bader

A New Parallel Algorithm for Connected Components in Dynamic Graphs. Robert McColl Oded Green David Bader A New Parallel Algorithm for Connected Components in Dynamic Graphs Robert McColl Oded Green David Bader Overview The Problem Target Datasets Prior Work Parent-Neighbor Subgraph Results Conclusions Problem

More information

Chapter 1. Social Media and Social Computing. October 2012 Youn-Hee Han

Chapter 1. Social Media and Social Computing. October 2012 Youn-Hee Han Chapter 1. Social Media and Social Computing October 2012 Youn-Hee Han http://link.koreatech.ac.kr 1.1 Social Media A rapid development and change of the Web and the Internet Participatory web application

More information

Graph Data Management

Graph Data Management Graph Data Management Analysis and Optimization of Graph Data Frameworks presented by Fynn Leitow Overview 1) Introduction a) Motivation b) Application for big data 2) Choice of algorithms 3) Choice of

More information

CS 220: Discrete Structures and their Applications. graphs zybooks chapter 10

CS 220: Discrete Structures and their Applications. graphs zybooks chapter 10 CS 220: Discrete Structures and their Applications graphs zybooks chapter 10 directed graphs A collection of vertices and directed edges What can this represent? undirected graphs A collection of vertices

More information

Algorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov

Algorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov Algorithms and Applications in Social Networks 2017/2018, Semester B Slava Novgorodov 1 Lesson #1 Administrative questions Course overview Introduction to Social Networks Basic definitions Network properties

More information

Bruno Martins. 1 st Semester 2012/2013

Bruno Martins. 1 st Semester 2012/2013 Link Analysis Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2012/2013 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 4

More information

Local Algorithms for Sparse Spanning Graphs

Local Algorithms for Sparse Spanning Graphs Local Algorithms for Sparse Spanning Graphs Reut Levi Dana Ron Ronitt Rubinfeld Intro slides based on a talk given by Reut Levi Minimum Spanning Graph (Spanning Tree) Local Access to a Minimum Spanning

More information

Research Article Link Prediction in Directed Network and Its Application in Microblog

Research Article Link Prediction in Directed Network and Its Application in Microblog Mathematical Problems in Engineering, Article ID 509282, 8 pages http://dx.doi.org/10.1155/2014/509282 Research Article Link Prediction in Directed Network and Its Application in Microblog Yan Yu 1,2 and

More information

A large-scale community structure analysis in Facebook

A large-scale community structure analysis in Facebook Ferrara EPJ Data Science 2012, 1:9 REGULAR ARTICLE OpenAccess A large-scale community structure analysis in Facebook Emilio Ferrara * * Correspondence: ferrarae@indiana.edu Center for Complex Networks

More information

DS504/CS586: Big Data Analytics Data acquisition and measurement Prof. Yanhua Li

DS504/CS586: Big Data Analytics Data acquisition and measurement Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Data acquisition and measurement Prof. Yanhua Li Time: 6:00pm 8:50pm THURSDAY Location: AK 232 Fall 2016 Data acquisition and measurement ia Sampling and Estimation

More information

Inferring Coarse Views of Connectivity in Very Large Graphs

Inferring Coarse Views of Connectivity in Very Large Graphs Inferring Coarse Views of Connectivity in Very Large Graphs Reza Motamedi, Reza Rejaie, Walter Willinger, Daniel Lowd, Roberto Gonzalez http://onrg.cs.uoregon.edu/walkabout 10/8/14 1 Introduction! Large-scale

More information

Mining and Analyzing Online Social Networks

Mining and Analyzing Online Social Networks The 5th EuroSys Doctoral Workshop (EuroDW 2011) Salzburg, Austria, Sunday 10 April 2011 Mining and Analyzing Online Social Networks Emilio Ferrara eferrara@unime.it Advisor: Prof. Giacomo Fiumara PhD School

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second

More information

Cycles in Random Graphs

Cycles in Random Graphs Cycles in Random Graphs Valery Van Kerrebroeck Enzo Marinari, Guilhem Semerjian [Phys. Rev. E 75, 066708 (2007)] [J. Phys. Conf. Series 95, 012014 (2008)] Outline Introduction Statistical Mechanics Approach

More information

Network embedding. Cheng Zheng

Network embedding. Cheng Zheng Network embedding Cheng Zheng Outline Problem definition Factorization based algorithms --- Laplacian Eigenmaps(NIPS, 2001) Random walk based algorithms ---DeepWalk(KDD, 2014), node2vec(kdd, 2016) Deep

More information

Multigraph Sampling of Online Social Networks

Multigraph Sampling of Online Social Networks Multigraph Sampling of Online Social Networks Minas Gjoka CalIT2 UC Irvine mgjoka@uci.edu Carter T. Butts Sociology Dept UC Irvine buttsc@uci.edu Maciej Kurant CalIT2 UC Irvine maciej.kurant@gmail.com

More information

Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing

Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing Gautam Bhat, Rajeev Kumar Singh Department of Computer Science and Engineering Shiv Nadar University Gautam Buddh Nagar,

More information

Introduction to Graph Theory

Introduction to Graph Theory Introduction to Graph Theory Tandy Warnow January 20, 2017 Graphs Tandy Warnow Graphs A graph G = (V, E) is an object that contains a vertex set V and an edge set E. We also write V (G) to denote the vertex

More information

Graphs / Networks CSE 6242/ CX Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech

Graphs / Networks CSE 6242/ CX Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech CSE 6242/ CX 4242 Graphs / Networks Centrality measures, algorithms, interactive applications Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John

More information

Graph traversal and BFS

Graph traversal and BFS Graph traversal and BFS Fundamental building block Graph traversal is part of many important tasks Connected components Tree/Cycle detection Articulation vertex finding Real-world applications Peer-to-peer

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective ECE 60 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models Pregel: A System for Large-Scale Graph Processing

More information

SI Networks: Theory and Application, Fall 2008

SI Networks: Theory and Application, Fall 2008 University of Michigan Deep Blue deepblue.lib.umich.edu 2008-09 SI 508 - Networks: Theory and Application, Fall 2008 Adamic, Lada Adamic, L. (2008, November 12). Networks: Theory and Application. Retrieved

More information

Lecture Note: Computation problems in social. network analysis

Lecture Note: Computation problems in social. network analysis Lecture Note: Computation problems in social network analysis Bang Ye Wu CSIE, Chung Cheng University, Taiwan September 29, 2008 In this lecture note, several computational problems are listed, including

More information

Graphs / Networks CSE 6242/ CX Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech

Graphs / Networks CSE 6242/ CX Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech CSE 6242/ CX 4242 Graphs / Networks Centrality measures, algorithms, interactive applications Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John

More information

Sequential Algorithms for Generating Random Graphs. Amin Saberi Stanford University

Sequential Algorithms for Generating Random Graphs. Amin Saberi Stanford University Sequential Algorithms for Generating Random Graphs Amin Saberi Stanford University Outline Generating random graphs with given degrees (joint work with M Bayati and JH Kim) Generating random graphs with

More information

Analysis of Large Graphs: TrustRank and WebSpam

Analysis of Large Graphs: TrustRank and WebSpam Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

Social Network Analysis

Social Network Analysis Social Network Analysis Mathematics of Networks Manar Mohaisen Department of EEC Engineering Adjacency matrix Network types Edge list Adjacency list Graph representation 2 Adjacency matrix Adjacency matrix

More information

Wedge A New Frontier for Pull-based Graph Processing. Samuel Grossman and Christos Kozyrakis Platform Lab Retreat June 8, 2018

Wedge A New Frontier for Pull-based Graph Processing. Samuel Grossman and Christos Kozyrakis Platform Lab Retreat June 8, 2018 Wedge A New Frontier for Pull-based Graph Processing Samuel Grossman and Christos Kozyrakis Platform Lab Retreat June 8, 2018 Graph Processing Problems modelled as objects (vertices) and connections between

More information

Basics of Network Analysis

Basics of Network Analysis Basics of Network Analysis Hiroki Sayama sayama@binghamton.edu Graph = Network G(V, E): graph (network) V: vertices (nodes), E: edges (links) 1 Nodes = 1, 2, 3, 4, 5 2 3 Links = 12, 13, 15, 23,

More information

PULP: Fast and Simple Complex Network Partitioning

PULP: Fast and Simple Complex Network Partitioning PULP: Fast and Simple Complex Network Partitioning George Slota #,* Kamesh Madduri # Siva Rajamanickam * # The Pennsylvania State University *Sandia National Laboratories Dagstuhl Seminar 14461 November

More information

Short-Cut MCMC: An Alternative to Adaptation

Short-Cut MCMC: An Alternative to Adaptation Short-Cut MCMC: An Alternative to Adaptation Radford M. Neal Dept. of Statistics and Dept. of Computer Science University of Toronto http://www.cs.utoronto.ca/ radford/ Third Workshop on Monte Carlo Methods,

More information

Practical Session No. 12 Graphs, BFS, DFS, Topological sort

Practical Session No. 12 Graphs, BFS, DFS, Topological sort Practical Session No. 12 Graphs, BFS, DFS, Topological sort Graphs and BFS Graph G = (V, E) Graph Representations (V G ) v1 v n V(G) = V - Set of all vertices in G E(G) = E - Set of all edges (u,v) in

More information

Multidimensional Arrays & Graphs. CMSC 420: Lecture 3

Multidimensional Arrays & Graphs. CMSC 420: Lecture 3 Multidimensional Arrays & Graphs CMSC 420: Lecture 3 Mini-Review Abstract Data Types: List Stack Queue Deque Dictionary Set Implementations: Linked Lists Circularly linked lists Doubly linked lists XOR

More information

Economics of Information Networks

Economics of Information Networks Economics of Information Networks Stephen Turnbull Division of Policy and Planning Sciences Lecture 4: December 7, 2017 Abstract We continue discussion of the modern economics of networks, which considers

More information

Copyright 2000, Kevin Wayne 1

Copyright 2000, Kevin Wayne 1 Chapter 3 - Graphs Undirected Graphs Undirected graph. G = (V, E) V = nodes. E = edges between pairs of nodes. Captures pairwise relationship between objects. Graph size parameters: n = V, m = E. Directed

More information

G(B)enchmark GraphBench: Towards a Universal Graph Benchmark. Khaled Ammar M. Tamer Özsu

G(B)enchmark GraphBench: Towards a Universal Graph Benchmark. Khaled Ammar M. Tamer Özsu G(B)enchmark GraphBench: Towards a Universal Graph Benchmark Khaled Ammar M. Tamer Özsu Bioinformatics Software Engineering Social Network Gene Co-expression Protein Structure Program Flow Big Graphs o

More information

CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS

CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS Overview of Networks Instructor: Yizhou Sun yzsun@cs.ucla.edu January 10, 2017 Overview of Information Network Analysis Network Representation Network

More information

De#anonymizing,Social,Networks, and,inferring,private,attributes, Using,Knowledge,Graphs,

De#anonymizing,Social,Networks, and,inferring,private,attributes, Using,Knowledge,Graphs, De#anonymizing,Social,Networks, and,inferring,private,attributes, Using,Knowledge,Graphs, Jianwei Qian Illinois Tech Chunhong Zhang BUPT Xiang#Yang Li USTC,/Illinois Tech Linlin Chen Illinois Tech Outline

More information

Graph Exploration: How to do better than the random walk? Adrian Kosowski. INRIA Bordeaux Sud-Ouest.

Graph Exploration: How to do better than the random walk? Adrian Kosowski. INRIA Bordeaux Sud-Ouest. Graph Exploration: How to do better than the random walk? Adrian Kosowski INRIA Bordeaux Sud-Ouest kosowski@labri.fr Réunion Displexity La Rochelle, April 4, 2013 Talk outline Introduction to network exploration

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second

More information

1 Starting around 1996, researchers began to work on. 2 In Feb, 1997, Yanhong Li (Scotch Plains, NJ) filed a

1 Starting around 1996, researchers began to work on. 2 In Feb, 1997, Yanhong Li (Scotch Plains, NJ) filed a !"#$ %#& ' Introduction ' Social network analysis ' Co-citation and bibliographic coupling ' PageRank ' HIS ' Summary ()*+,-/*,) Early search engines mainly compare content similarity of the query and

More information

Outline. Last 3 Weeks. Today. General background. web characterization ( web archaeology ) size and shape of the web

Outline. Last 3 Weeks. Today. General background. web characterization ( web archaeology ) size and shape of the web Web Structures Outline Last 3 Weeks General background Today web characterization ( web archaeology ) size and shape of the web What is the size of the web? Issues The web is really infinite Dynamic content,

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second

More information

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge Centralities (4) By: Ralucca Gera, NPS Excellence Through Knowledge Some slide from last week that we didn t talk about in class: 2 PageRank algorithm Eigenvector centrality: i s Rank score is the sum

More information

Algorithms Activity 6: Applications of BFS

Algorithms Activity 6: Applications of BFS Algorithms Activity 6: Applications of BFS Suppose we have a graph G = (V, E). A given graph could have zero edges, or it could have lots of edges, or anything in between. Let s think about the range of

More information

Link Analysis in the Cloud

Link Analysis in the Cloud Cloud Computing Link Analysis in the Cloud Dell Zhang Birkbeck, University of London 2017/18 Graph Problems & Representations What is a Graph? G = (V,E), where V represents the set of vertices (nodes)

More information

Collecting social media data based on open APIs

Collecting social media data based on open APIs Collecting social media data based on open APIs Ye Li With Qunyan Zhang, Haixin Ma, Weining Qian, and Aoying Zhou http://database.ecnu.edu.cn/ Outline Social Media Data Set Data Feature Data Model Data

More information

Slides based on those in:

Slides based on those in: Spyros Kontogiannis & Christos Zaroliagis Slides based on those in: http://www.mmds.org A 3.3 B 38.4 C 34.3 D 3.9 E 8.1 F 3.9 1.6 1.6 1.6 1.6 1.6 2 y 0.8 ½+0.2 ⅓ M 1/2 1/2 0 0.8 1/2 0 0 + 0.2 0 1/2 1 [1/N]

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu 10/4/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

More information

CS425: Algorithms for Web Scale Data

CS425: Algorithms for Web Scale Data CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original slides can be accessed at: www.mmds.org J.

More information

Information Retrieval (IR) Introduction to Information Retrieval. Lecture Overview. Why do we need IR? Basics of an IR system.

Information Retrieval (IR) Introduction to Information Retrieval. Lecture Overview. Why do we need IR? Basics of an IR system. Introduction to Information Retrieval Ethan Phelps-Goodman Some slides taken from http://www.cs.utexas.edu/users/mooney/ir-course/ Information Retrieval (IR) The indexing and retrieval of textual documents.

More information

Link Analysis from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer and other material.

Link Analysis from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer and other material. Link Analysis from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer and other material. 1 Contents Introduction Network properties Social network analysis Co-citation

More information

The Role of Homophily and Popularity in Informed Decentralized Search

The Role of Homophily and Popularity in Informed Decentralized Search The Role of Homophily and Popularity in Informed Decentralized Search Florian Geigl and Denis Helic Knowledge Technologies Institute In eldgasse 13/5. floor, 8010 Graz, Austria {florian.geigl,dhelic}@tugraz.at

More information

Evolution of Attribute-Augmented Social Networks: Measurements, Modeling, and Implications using Google+

Evolution of Attribute-Augmented Social Networks: Measurements, Modeling, and Implications using Google+ Evolution of Attribute-Augmented Social Networks: Measurements, Modeling, and Implications using Google+ Neil Zhenqiang Gong, Wenchang Xu, Ling Huang, Prateek Mittal, Emil Stefanov, Vyas Sekar, Dawn Song

More information

Randomized Rumor Spreading in Social Networks

Randomized Rumor Spreading in Social Networks Randomized Rumor Spreading in Social Networks Benjamin Doerr (MPI Informatics / Saarland U) Summary: We study how fast rumors spread in social networks. For the preferential attachment network model and

More information