Estimating Sizes of Social Networks via Biased Sampling
1 Estimating Sizes of Social Networks via Biased Sampling
Liran Katzir, Edo Liberty, and Oren Somekh
Yahoo! Labs, Haifa, Israel
International World Wide Web Conference, 28th March - 1st April 2011, Hyderabad, India
Yahoo! Labs: WWW / 20
2 Social Network size estimation
Goal: Obtain estimates for the sizes of (sub)populations in a social network.
Why:
- Advertisement: estimate of market share.
- Business development: merger/acquisition or asset valuation.
3 The Problem
Difficulties:
- Social networks have become very large: Facebook (650,000,000), Qzone (200,000,000), Twitter (175,000,000), ...
- No public API for population size queries:
  - What is the total number of registered users?
  - What is the number of registered (self-declared) year olds living in New York?
- Even if a public API is provided, an independent estimate is needed.
- An exhaustive crawl is time/space/communication intensive and violates politeness.
4 Population size estimation
Population sizes can be estimated efficiently using the birthday paradox.
The birthday paradox: given $r$ uniform samples from a set of $n$ elements, the expected number of collisions is $\frac{r(r-1)}{2n}$. A collision is a pair of identical samples.
Example: samples $X = (d, b, b, a, b, e)$. Total 3 collisions: $(x_2, x_3)$, $(x_2, x_5)$, and $(x_3, x_5)$.
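The collision count in the example can be reproduced with a few lines of Python. This is a minimal sketch; the function names `count_collisions` and `expected_collisions` are illustrative and do not come from the slides.

```python
from collections import Counter


def count_collisions(samples):
    """Count unordered pairs of identical samples."""
    counts = Counter(samples)
    # A value observed k times contributes k * (k - 1) / 2 colliding pairs.
    return sum(k * (k - 1) // 2 for k in counts.values())


def expected_collisions(r, n):
    """Expected collisions for r uniform samples from n elements: r(r-1)/(2n)."""
    return r * (r - 1) / (2 * n)


# The slide's example: b appears three times, giving 3 colliding pairs.
collisions = count_collisions(["d", "b", "b", "a", "b", "e"])
```

Counting per-value multiplicities avoids comparing all pairs of samples, so the count takes time linear in the number of samples.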
5 Population size estimation
Using the birthday paradox inversely: when observing $C$ collisions, the population can be estimated by $n \approx \frac{r^2}{2C}$.
- If $r = \text{const} \cdot \sqrt{n}$ this gives a rather good estimator.
- Similar to mark-and-recapture, which counts collisions between two sample sets (but is essentially equivalent).
- A newer version of mark-and-recapture also handles non-uniform but a priori known distributions [Chao, 1987].
- Social network size estimation [Ye and Wu, 2010].
Alas, we cannot sample users uniformly from most social networks...
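The inverse use of the birthday paradox can be sketched as follows, assuming uniform sampling is available (which, as the slide notes, is usually not the case for social networks). The population size, sample count, and seed below are arbitrary illustrative choices.

```python
import random
from collections import Counter


def estimate_size_uniform(samples):
    """Estimate the population size as r^2 / (2C) from uniform samples."""
    r = len(samples)
    counts = Counter(samples)
    c = sum(k * (k - 1) // 2 for k in counts.values())
    if c == 0:
        return None  # no collisions observed: too few samples to estimate
    return r * r / (2 * c)


# Hypothetical experiment: 2,000 uniform samples from 10,000 elements.
rng = random.Random(0)
n = 10_000
samples = [rng.randrange(n) for _ in range(2_000)]
estimate = estimate_size_uniform(samples)
```

With these numbers roughly 200 collisions are expected, so the estimate should fall near the true size; with far fewer than $\sqrt{n}$ samples the function will often return `None`.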
6 Uniform distribution on graphs
Social networks can be viewed as undirected graphs that we can traverse using their public APIs. Special random walks can generate close-to-uniform samples:
1 Bipartite query-web page graph [Bharat and Broder, 1998] [Bar-Yossef and Gurevich, 2007].
2 Social network [Gjoka et al., 2010].
This uses only $r = \text{const} \cdot \sqrt{n}$ samples, but obtaining each sample might be hard.
7 Graph size estimation
It is possible to estimate the size of some graphs directly:
1 Estimate the size of a tree [Knuth, 1974].
2 Estimate the size of a directed acyclic graph [Pitt, 1987].
We give an estimator for the size of undirected graphs (and subgraphs) which:
1 Counts collisions but uses the graph's stationary distribution (does not require a uniform sample).
2 Requires asymptotically fewer than $\sqrt{n}$ samples to converge.
3 Obtains samples efficiently (provably small number of random walk steps).
8 Assumptions
- The graph can be traversed from nodes to neighboring nodes.
- We can perform a random walk on the graph: start at any node; in each step, proceed to one of the neighbors uniformly at random.
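The walk described here can be sketched on an in-memory adjacency list. The toy graph below is hypothetical; in a real setting the neighbor lookup `graph[node]` would be replaced by a call to the network's public API.

```python
import random


def random_walk(graph, start, steps, rng):
    """Simple random walk: at each step move to a uniformly chosen neighbor."""
    node = start
    path = [node]
    for _ in range(steps):
        node = rng.choice(graph[node])
        path.append(node)
    return path


# Hypothetical undirected graph, stored as symmetric adjacency lists.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
path = random_walk(graph, "a", 20, random.Random(1))
```

The walk only ever needs the neighbor list of the current node, which is why it fits the traversal-only assumption.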
9 Facts about random walks
This random walk yields the stationary distribution:
1 The probability of getting the $i$-th node is $\frac{d_i}{D}$.
2 $d_i$ is the $i$-th node's degree.
3 $D = \sum_{i=1}^{n} d_i$.
Taking a few steps between samples, or running several walks, ensures independence between two consecutive samples.
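That the degree-proportional distribution is stationary can be checked exactly on a small graph: applying one step of the walk to it must return the same distribution. The sketch below uses exact fractions and a hypothetical four-node graph; it verifies stationarity only and says nothing about how fast a walk converges.

```python
from fractions import Fraction


def stationary_distribution(graph):
    """Degree-proportional distribution: node i has probability d_i / D."""
    total_degree = sum(len(nbrs) for nbrs in graph.values())
    return {v: Fraction(len(nbrs), total_degree) for v, nbrs in graph.items()}


def one_step(graph, dist):
    """Push a distribution through one step of the simple random walk."""
    nxt = {v: Fraction(0) for v in graph}
    for v, p in dist.items():
        for u in graph[v]:
            nxt[u] += p / len(graph[v])
    return nxt


# Hypothetical undirected graph with degrees 2, 2, 3, 1 (so D = 8).
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
pi = stationary_distribution(graph)
```

Here `one_step(graph, pi)` equals `pi`, and the highest-degree node `c` carries probability 3/8, which is the bias the estimator later corrects for.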
10 Algorithm Outline
1 Sample $r$ users using a random walk.
2 $C$: the number of collisions.
3 $\Psi_1$: the sum of the sampled nodes' degrees.
4 $\Psi_{-1}$: the sum of the inverses of the sampled nodes' degrees.
The estimated number of nodes: $\hat{n} = \frac{\Psi_1 \Psi_{-1}}{2C}$.
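The outline translates almost directly into code. The sketch below assumes the samples have already been collected and that each sampled node's degree is known; the function name and the `degree_of` mapping are illustrative, and exact fractions are used only to keep the arithmetic transparent.

```python
from collections import Counter
from fractions import Fraction


def estimate_graph_size(sampled_nodes, degree_of):
    """Return Psi_1 * Psi_{-1} / (2C), or None if no collision was observed."""
    counts = Counter(sampled_nodes)
    collisions = sum(k * (k - 1) // 2 for k in counts.values())
    if collisions == 0:
        return None  # the estimator is undefined without collisions
    psi_1 = sum(degree_of[v] for v in sampled_nodes)
    psi_minus_1 = sum(Fraction(1, degree_of[v]) for v in sampled_nodes)
    return psi_1 * psi_minus_1 / (2 * collisions)


# Three samples d, f, f with degrees 3, 2, 2:
# Psi_1 = 7, Psi_{-1} = 4/3, C = 1, so the estimate is 7 * (4/3) / 2 = 14/3.
degree_of = {"d": 3, "f": 2}
estimate = estimate_graph_size(["d", "f", "f"], degree_of)
```

When every node has the same degree, $\Psi_1 \Psi_{-1} = r^2$ and the expression reduces to the uniform birthday-paradox estimator $\frac{r^2}{2C}$.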
11-47 Worked example (animation frames of a single slide; the graph drawing itself is not preserved). One sample is added per column:
Sampled nodes | d | f | f | c | c | d
Sampled node degree | 3 | 2 | 2 | 4 | 4 | 3
$C$ | 0 | 0 | 1 | 1 | 2 | 3
$\Psi_1$ | 3 | 5 | 7 | 11 | 15 | 18
$\Psi_{-1}$ | 1/3 | 5/6 | 16/12 | 19/12 | 22/12 | 26/12
$\hat{n}$ (as shown) | | | 4 | 8 | 6 |
Degree, $C$, and $\Psi_1$ entries after the second sample are missing from the transcript; they are recomputed here from the sample sequence and the $\Psi_{-1}$ row.
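48 Proof Intuition
Notations: $n$ the graph size, $d_i$ the degree of node $i$, $r$ the number of samples, $D = \sum_{i=1}^{n} d_i$.
Expectations:
$E[\Psi_1] = rD \sum_{i=1}^{n} \left(\frac{d_i}{D}\right)^2$, $E[\Psi_{-1}] = \frac{rn}{D}$, $E[C] = \binom{r}{2} \sum_{i=1}^{n} \left(\frac{d_i}{D}\right)^2$.
Therefore $\frac{E[\Psi_1]\,E[\Psi_{-1}]}{2E[C]} = n \frac{r}{r-1} \approx n$, and
$\hat{n} = \frac{\Psi_1 \Psi_{-1}}{2C} \approx \frac{E[\Psi_1]\,E[\Psi_{-1}]}{2E[C]} \approx n$.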
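The ratio of expectations can be verified exactly for any degree sequence. The sketch below plugs a hypothetical degree list into the three expectation formulas, assuming independent samples drawn from the stationary distribution, and checks that the ratio equals $n \frac{r}{r-1}$.

```python
from fractions import Fraction
from math import comb


def expected_ratio(degrees, r):
    """Compute E[Psi_1] * E[Psi_{-1}] / (2 E[C]) for r stationary samples."""
    n = len(degrees)
    total = sum(degrees)  # D, the sum of all degrees
    s2 = sum(Fraction(d, total) ** 2 for d in degrees)  # sum of (d_i / D)^2
    e_psi_1 = r * total * s2
    e_psi_minus_1 = Fraction(r * n, total)
    e_c = comb(r, 2) * s2
    return e_psi_1 * e_psi_minus_1 / (2 * e_c)


# Hypothetical degree sequence with n = 4 nodes and r = 10 samples.
ratio = expected_ratio([2, 2, 3, 1], r=10)
```

The degree-dependent sum cancels between $E[\Psi_1]$ and $E[C]$, so the result depends only on $n$ and $r$; this is an identity about expectations and does not by itself bound the variance of $\hat{n}$.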
49 Analytic Results
Main statement: using $r(n, \epsilon, \delta)$ samples, $\Pr[\,n(1-\epsilon) \le \hat{n} \le n(1+\epsilon)\,] \ge 1 - \delta$.
Uniform vs. biased: for example, with $n = 10^9$, $\sqrt{n} \approx 30{,}000$ and $\sqrt[4]{n} \log n \approx 6{,}000$.
Sampling method | Number of samples
Any graph, uniform | $O(\sqrt{n})$
Synthetic graph, Zipfian degree distribution $\alpha = 2$, $d_m = \sqrt{n}$, random walk | $O(\sqrt[4]{n} \log n)$
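The two orders of magnitude in the example can be checked numerically. This is only a rough sketch: it ignores the constants hidden in the $O(\cdot)$ notation, and the base-2 logarithm is an assumption, since the slide does not state the base.

```python
import math

n = 10**9

# Uniform sampling needs on the order of sqrt(n) samples.
uniform_scale = math.sqrt(n)

# Random-walk sampling on the Zipfian synthetic graph: n^(1/4) * log n.
# The logarithm base (2 here) is an assumption.
biased_scale = n ** 0.25 * math.log2(n)
```

Under these assumptions `uniform_scale` is a little above 31,000 and `biased_scale` a little above 5,000, in line with the rounded figures quoted for the example.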
50 Setup
Networks of known sizes:
Network | Size | Edges
Synthetic | 1,000,000 | Zipfian $\alpha = 2$, $d_m = 1000$
DBLP | 845,211 | co-authorship
IMDB | 1,955,508 | co-casting
51 A Synthetic Network, Degree Zipfian $\alpha = 2$, $d_m = 1000$
[Figure: synthetic network. Size estimation (relative to network size) vs. number of samples (percentage of network size). Confidence interval curves: Unif. dist. non unique 95%, Deg. dist. non unique 95%, Deg. dist. non unique 5%, Unif. dist. non unique 5%.]
52 DBLP - The Digital Bibliography and Library Project
[Figure: DBLP network. Size estimation (relative to network size) vs. number of samples (percentage of network size). Confidence interval curves: Unif. dist. non unique 95%, Deg. dist. non unique 95%, Deg. dist. non unique 5%, Unif. dist. non unique 5%.]
53 IMDB - The Internet Movie Database
[Figure: IMDB network. Size estimation (relative to network size) vs. number of samples (percentage of network size). Confidence interval curves: Unif. dist. non unique 95%, Deg. dist. non unique 95%, Deg. dist. non unique 5%, Unif. dist. non unique 5%.]
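54 Facebook
Date | April 2009 | October 2010
Sampling method | uniform | random walk
Number of samples | (value not preserved in this transcript) | (value not preserved in this transcript)
Collision estimator | (value not preserved in this transcript) | (value not preserved in this transcript)
Facebook report | (value not preserved in this transcript) | (value not preserved in this transcript)
Thanks to Minas Gjoka for the Facebook crawls.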
55 Conclusions
- An efficient algorithm for estimating the size of a social network through its public API was presented.
- Its effectiveness was demonstrated on synthetic and real-world networks.
- The algorithm outperforms prior-art methods by using biased sampling.
- The algorithm also applies to sub-populations.
56 Thanks!
More informationA quick review. Which molecular processes/functions are involved in a certain phenotype (e.g., disease, stress response, etc.)
Gene expression profiling A quick review Which molecular processes/functions are involved in a certain phenotype (e.g., disease, stress response, etc.) The Gene Ontology (GO) Project Provides shared vocabulary/annotation
More informationInferring Coarse Views of Connectivity in Very Large Graphs
Inferring Coarse Views of Connectivity in Very Large Graphs Reza Motamedi, Reza Rejaie, Walter Willinger, Daniel Lowd, Roberto Gonzalez http://onrg.cs.uoregon.edu/walkabout 10/8/14 1 Introduction! Large-scale
More informationScalable Network Analysis
Inderjit S. Dhillon University of Texas at Austin COMAD, Ahmedabad, India Dec 20, 2013 Outline Unstructured Data - Scale & Diversity Evolving Networks Machine Learning Problems arising in Networks Recommender
More informationLecture 5: Graphs. Rajat Mittal. IIT Kanpur
Lecture : Graphs Rajat Mittal IIT Kanpur Combinatorial graphs provide a natural way to model connections between different objects. They are very useful in depicting communication networks, social networks
More informationPerformance and cost effectiveness of caching in mobile access networks
Performance and cost effectiveness of caching in mobile access networks Jim Roberts (IRT-SystemX) joint work with Salah Eddine Elayoubi (Orange Labs) ICN 2015 October 2015 The memory-bandwidth tradeoff
More informationSocial Network Analysis
Social Network Analysis Mathematics of Networks Manar Mohaisen Department of EEC Engineering Adjacency matrix Network types Edge list Adjacency list Graph representation 2 Adjacency matrix Adjacency matrix
More informationEfficient Identification of Starters and Followers in Social Media
Efficient Identification of Starters and Followers in Social Media Michael Mathioudakis Department of Computer Science University of Toronto mathiou@cs.toronto.edu Nick Koudas Department of Computer Science
More informationGetafix: Workload-aware Distributed Interactive Analytics
Getafix: Workload-aware Distributed Interactive Analytics Presenter: Mainak Ghosh Collaborators: Le Xu, Xiaoyao Qian, Thomas Kao, Indranil Gupta, Himanshu Gupta Data Analytics 2 Picture borrowed from https://conferences.oreilly.com/strata/strata-ny-2016/public/schedule/detail/51640
More informationThink before You Discard: Accurate Triangle Counting in Graph Streams with Deletions
Think before You Discard: Accurate Triangle Counting in Graph Streams with Deletions Kijung Shin 1( ), Jisu Kim 2, Bryan Hooi 2, and Christos Faloutsos 1 School of Computer Science, Carnegie Mellon University,
More informationOn Dimensionality Reduction of Massive Graphs for Indexing and Retrieval
On Dimensionality Reduction of Massive Graphs for Indexing and Retrieval Charu C. Aggarwal 1, Haixun Wang # IBM T. J. Watson Research Center Hawthorne, NY 153, USA 1 charu@us.ibm.com # Microsoft Research
More informationUS Patent 6,658,423. William Pugh
US Patent 6,658,423 William Pugh Detecting duplicate and near - duplicate files Worked on this problem at Google in summer of 2000 I have no information whether this is currently being used I know that
More informationCS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS
CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS Overview of Networks Instructor: Yizhou Sun yzsun@cs.ucla.edu January 10, 2017 Overview of Information Network Analysis Network Representation Network
More information