DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li
1 Welcome to DS504/CS586: Big Data Analytics. Graph Mining. Prof. Yanhua Li. Time: 6:00pm-8:50pm R. Location: KH 116. Fall 2017
2 Reviews/Critiques: I will choose one review to grade this week.
3 Graph Data: Social Networks. Facebook social graph: 4 degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna, 2011]. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets
4 Graph Data: Media Networks. Connections between political blogs: polarization of the network [Adamic-Glance, 2005].
5 Graph Data: Information Nets. Citation networks and maps of science [Börner et al., 2012].
6 Graph Data: Communication Nets. Figure: the Internet as routers linking domain1, domain2, and domain3.
7 Questions? Partial map of the Internet based on the January 15, 2005 data found on opte.org. (from
8 Graph Data: Topological Networks. Seven Bridges of Königsberg [Euler, 1735]: return to the starting point by traveling each link of the graph once and only once.
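The bridge puzzle can be checked mechanically. The sketch below assumes the usual modelling of Königsberg as four land masses joined by seven bridges (the labels A-D and the function names are illustrative, not from the slides) and applies the standard condition that a connected graph has a closed walk using every link exactly once only if every node has even degree.

```python
from collections import Counter, defaultdict

# Seven bridges between four land masses; the labels A-D are illustrative.
BRIDGES = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]


def degrees(edges):
    """Count how many edge endpoints touch each node (parallel edges count)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def is_connected(edges):
    """Depth-first search over the nodes that appear in at least one edge."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    start = next(iter(nbrs))
    seen, stack = {start}, [start]
    while stack:
        for w in nbrs[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nbrs)


def has_euler_circuit(edges):
    """Closed walk using every edge once: connected and all degrees even."""
    return is_connected(edges) and all(d % 2 == 0 for d in degrees(edges).values())


print(dict(degrees(BRIDGES)))      # every land mass has odd degree
print(has_euler_circuit(BRIDGES))  # so no such walk exists
```

Every land mass touches an odd number of bridges, so the requested round trip is impossible.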
9 Graph representation of networks. Undirected links: friendship, resistance, co-authorship. Directed links: following, one-way road, wireless channel. Signed links: friend & foe, trust & distrust, repulsion & cohesion. Multi-relational links; hyperlinks.
10 Mining in Big Graphs. Network statistic analysis (this lecture): network size, degree distribution. Node ranking (next lecture): identifying the most influential nodes, for viral marketing and resource allocation.
11 Graph Data: Social Networks. Facebook social graph: 4 degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna, 2011].
12 Sampling graphs. Random sampling (uniform & independent): vertex sampling, edge sampling. Crawling: BFS sampling, random walk sampling.
13 Random Walks on Graphs Random walk sampling Random Walk Routing Molecule in liquid Influence diffusion
14 Undirected Graphs Undirected!!
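15 Random Walk (undirected graph, so the adjacency matrix is symmetric). Adjacency matrix: $A = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix}$. Degree matrix: $D = \mathrm{diag}(3, 2, 3, 2)$. Transition probability matrix: $P_{ij} = 1/k_i$ if $j$ is a neighbor of $i$, and $P_{ij} = 0$ otherwise, so $P = D^{-1}A = \begin{pmatrix} 0 & 1/3 & 1/3 & 1/3 \\ 1/2 & 0 & 1/2 & 0 \\ 1/3 & 1/3 & 0 & 1/3 \\ 1/2 & 0 & 1/2 & 0 \end{pmatrix}$. Stationary distribution: $\pi_i = d_i / (2|E|)$, where $d_i = k_i$ is the degree of node $i$ and $|E|$ is the number of links.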
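One way to check the stationary distribution on this slide is to build the transition matrix and verify that $\pi P = \pi$. The sketch below is an illustration only: the four-node graph is the one implied by the transition matrix on the slide, and the function names are made up for this example. It uses exact fractions so the check does not depend on floating-point rounding.

```python
from fractions import Fraction

# Four-node undirected graph implied by the transition matrix on the slide.
ADJ = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2, 4], 4: [1, 3]}


def rw_transition_matrix(adj):
    """P[i][j] = 1/k_i if j is a neighbor of i, else 0 (rows sum to 1)."""
    nodes = sorted(adj)
    return [[Fraction(1, len(adj[i])) if j in adj[i] else Fraction(0)
             for j in nodes] for i in nodes]


def degree_stationary(adj):
    """pi_i = k_i / (2|E|); the sum of all degrees equals 2|E|."""
    two_e = sum(len(nbrs) for nbrs in adj.values())
    return [Fraction(len(adj[i]), two_e) for i in sorted(adj)]


def left_multiply(pi, P):
    """Row vector times matrix: one step of the chain applied to pi."""
    n = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


P = rw_transition_matrix(ADJ)
pi = degree_stationary(ADJ)
print([str(x) for x in pi])          # degrees 3, 2, 3, 2 over 2|E| = 10
print(left_multiply(pi, P) == pi)    # stationarity: pi P == pi
```

The walk visits a node in proportion to its degree, which is exactly the bias that later slides correct for.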
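16 Metropolis-Hastings Random Walk (same undirected graph, symmetric adjacency matrix). Adjacency matrix: $A = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix}$. Degree matrix: $D = \mathrm{diag}(3, 2, 3, 2)$. Transition probability matrix: $P^{MH}_{\upsilon,w} = \frac{1}{k_\upsilon}\min\left(1, \frac{k_\upsilon}{k_w}\right)$ if $w$ is a neighbor of $\upsilon$, and $P^{MH}_{\upsilon,\upsilon} = 1 - \sum_{y \neq \upsilon} P^{MH}_{\upsilon,y}$ if $w = \upsilon$, so $P^{MH} = \begin{pmatrix} 0 & 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 & 0 \\ 1/3 & 1/3 & 0 & 1/3 \\ 1/3 & 0 & 1/3 & 1/3 \end{pmatrix}$. Stationary distribution: $\pi_\upsilon = 1/|V|$.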
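The Metropolis-Hastings matrix can be rebuilt from the same rule and checked against the uniform distribution. This is a minimal sketch, again on the four-node graph implied by the slide's matrices; the function names are hypothetical.

```python
from fractions import Fraction

# Four-node undirected graph implied by the matrices on the slide.
ADJ = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2, 4], 4: [1, 3]}


def mh_transition_matrix(adj):
    """P[v][w] = (1/k_v) * min(1, k_v/k_w) for neighbors; the leftover
    probability mass stays on v as a self-loop."""
    nodes = sorted(adj)
    matrix = []
    for v in nodes:
        k_v = len(adj[v])
        row = [Fraction(1, k_v) * min(Fraction(1), Fraction(k_v, len(adj[w])))
               if w in adj[v] else Fraction(0)
               for w in nodes]
        row[nodes.index(v)] = 1 - sum(row)  # self-loop probability
        matrix.append(row)
    return matrix


def left_multiply(pi, P):
    """Row vector times matrix: one step of the chain applied to pi."""
    n = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


P_MH = mh_transition_matrix(ADJ)
uniform = [Fraction(1, len(ADJ))] * len(ADJ)
for row in P_MH:
    print([str(x) for x in row])
print(left_multiply(uniform, P_MH) == uniform)  # uniform is stationary
```

Because $P^{MH}_{\upsilon,w} = \min(1/k_\upsilon, 1/k_w)$ is symmetric in $\upsilon$ and $w$, the uniform distribution is stationary, which is what the final check confirms.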
17 Walking in Facebook: A Case Study of Unbiased Sampling of OSNs. Minas Gjoka, Maciej Kurant, Carter Butts, Athina Markopoulou. UC Irvine, EPFL. Minas Gjoka, UC Irvine: Walking in Facebook
18 Outline: Motivation and Problem Statement; Sampling Methodology; Data Analysis; Conclusion
19 Online Social Networks (OSNs). A network of declared friendships between users; allows users to maintain relationships. Many popular OSNs with different focus: Facebook, LinkedIn, Flickr. Figure: social graph with nodes A-H.
20 Why Sample OSNs? Representative samples are desirable: to study properties and to test algorithms. Obtaining a complete dataset is difficult: companies are usually unwilling to share data, and measuring everything carries tremendous overhead (~100TB for Facebook).
21 Problem statement: obtain a representative sample of users in a given OSN by exploration of the social graph. In this work we sample Facebook (FB) and explore the graph using various crawling techniques.
22 Related Work. Graph traversal (BFS): A. Mislove et al., IMC 2007; Y. Ahn et al., WWW 2007; C. Wilson, Eurosys 2009. Random walks (MHRW, RDS): M. Henzinger et al., WWW 2000; D. Stutzbach et al., IMC 2006; A. Rasti et al., Mini Infocom 2009.
23 Outline: Motivation and Problem Statement; Sampling Methodology (crawling methods, data collection, convergence evaluation, method comparisons); Data Analysis; Conclusion
24 (1) Breadth-First-Search (BFS). Starting from a seed, explores all neighbor nodes; the process continues iteratively, without replacement. BFS leads to bias towards high-degree nodes [Lee et al., Statistical properties of sampled networks, Phys. Review E, 2006]. Early measurement studies of OSNs use BFS as the primary sampling technique, e.g. [Mislove et al.], [Ahn et al.], [Wilson et al.]. Figure: graph with nodes A-H marked as unexplored, explored, or visited.
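A BFS crawl that stops after a fixed budget of nodes can be sketched as follows. The eight-node graph is a hypothetical stand-in for the A-H figure (its edges are not given in the slides), and `bfs_sample` and `mean_degree` are illustrative names.

```python
from collections import deque

# Hypothetical undirected graph on nodes A-H (edges chosen for illustration).
ADJ = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B", "D", "E", "F"],
    "D": ["A", "C"],
    "E": ["C", "G"],
    "F": ["C", "H"],
    "G": ["E"],
    "H": ["F"],
}


def bfs_sample(adj, seed, budget):
    """Visit nodes in breadth-first order from the seed, without
    replacement, stopping once `budget` nodes have been collected."""
    visited = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(visited) < budget:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                visited.append(w)
                queue.append(w)
                if len(visited) == budget:
                    break
    return visited


def mean_degree(adj, nodes):
    return sum(len(adj[u]) for u in nodes) / len(nodes)


sample = bfs_sample(ADJ, "A", 4)
print(sample)
print(mean_degree(ADJ, sample), mean_degree(ADJ, list(ADJ)))
```

On this toy graph an incomplete crawl reaches the high-degree node C early, so the sample's mean degree (3.0) overshoots the graph's (2.25). That is a small-scale picture of the bias described on the slide, not a measurement.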
25 (2) Random Walk (RW). Explores the graph one node at a time, with replacement: $P^{RW}_{u,w} = 1/k_u$ for each neighbor $w$ of $u$, where $k_u$ is the degree of node $u$. In the stationary distribution, $\pi_u = k_u / (2|E|)$, where $|E|$ is the number of edges. Figure: from the current node A, each next candidate is chosen with probability 1/3.
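The degree-proportional stationary distribution can be observed by simulating the walk. The sketch below assumes a hypothetical eight-node graph standing in for the A-H figure, and a fixed random seed so the run is repeatable; names such as `random_walk` are illustrative.

```python
import random
from collections import Counter

# Hypothetical undirected graph on nodes A-H (edges chosen for illustration).
ADJ = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B", "D", "E", "F"],
    "D": ["A", "C"],
    "E": ["C", "G"],
    "F": ["C", "H"],
    "G": ["E"],
    "H": ["F"],
}


def random_walk(adj, start, steps, rng):
    """Simple random walk: move to a uniformly chosen neighbor each step.
    Nodes may be revisited (sampling with replacement)."""
    walk = [start]
    u = start
    for _ in range(steps):
        u = rng.choice(adj[u])
        walk.append(u)
    return walk


def visit_frequencies(walk):
    counts = Counter(walk)
    return {u: counts[u] / len(walk) for u in counts}


rng = random.Random(0)  # fixed seed, an arbitrary choice
freq = visit_frequencies(random_walk(ADJ, "A", 100_000, rng))
two_e = sum(len(nbrs) for nbrs in ADJ.values())
for u in sorted(ADJ):
    # empirical visit frequency next to k_u / (2|E|)
    print(u, round(freq[u], 3), round(len(ADJ[u]) / two_e, 3))
```

The two printed columns should agree closely, with C (degree 5) visited about five times as often as G or H (degree 1).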
26 (3) Re-Weighted Random Walk (RWRW). Uses the Hansen-Hurwitz estimator; corrects for degree bias at the end of collection. Without re-weighting, the probability distribution for node property $A$ is $p(A_i) = \frac{\sum_{u \in A_i} 1}{\sum_{u \in V} 1}$. Re-weighted probability distribution: $p(A_i) = \frac{\sum_{u \in A_i} 1/k_u}{\sum_{u \in V} 1/k_u}$, where $A_i$ is the subset of sampled nodes with value $i$, $V$ is the set of all sampled nodes, and $k_u$ is the degree of node $u$.
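The re-weighting step is a few lines of code. The sketch below is illustrative: it uses a hypothetical eight-node graph and an idealized sample in which each node appears exactly as often as its degree (what a long random walk produces in expectation), and it estimates the degree distribution itself as the node property.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Hypothetical undirected graph on nodes A-H (edges chosen for illustration).
ADJ = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B", "D", "E", "F"],
    "D": ["A", "C"],
    "E": ["C", "G"],
    "F": ["C", "H"],
    "G": ["E"],
    "H": ["F"],
}
DEGREE = {u: len(nbrs) for u, nbrs in ADJ.items()}


def naive_distribution(samples, prop):
    """Fraction of samples with each property value (no correction)."""
    counts = Counter(prop[u] for u in samples)
    return {value: Fraction(c, len(samples)) for value, c in counts.items()}


def reweighted_distribution(samples, degree, prop):
    """Hansen-Hurwitz style estimate: weight each sample by 1/k_u."""
    weights = defaultdict(Fraction)
    total = Fraction(0)
    for u in samples:
        w = Fraction(1, degree[u])
        weights[prop[u]] += w
        total += w
    return {value: w / total for value, w in weights.items()}


# Idealized degree-biased sample: node u appears k_u times.
samples = [u for u in ADJ for _ in range(DEGREE[u])]
naive = naive_distribution(samples, DEGREE)
corrected = reweighted_distribution(samples, DEGREE, DEGREE)
for k in sorted(corrected):
    print(k, str(naive[k]), str(corrected[k]))
```

In this idealized case the corrected estimate matches the true population shares (for example, half of the eight nodes have degree 2), while the naive estimate over-counts the degree-5 node.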
27 (4) Metropolis-Hastings Random Walk (MHRW). Explores the graph one node at a time, with replacement: $P^{MH}_{u,w} = \frac{1}{k_u}\min\left(1, \frac{k_u}{k_w}\right)$ if $w$ is a neighbor of $u$, and $P^{MH}_{u,u} = 1 - \sum_{y \neq u} P^{MH}_{u,y}$ if $w = u$. In the stationary distribution, $\pi_u = 1/|V|$. Example (current node A, next candidates B, C, D): $P^{MH}_{AC} = \frac{1}{3}\min\left(1, \frac{3}{5}\right) = \frac{1}{5}$ and $P^{MH}_{AA} = 1 - \left(\frac{1}{3} + \frac{1}{3} + \frac{1}{5}\right) = \frac{2}{15}$.
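In practice the MHRW is run as a propose-and-accept loop: pick a neighbor uniformly, then move there with probability $\min(1, k_u/k_w)$, otherwise stay. The sketch below assumes a hypothetical eight-node graph in which A has degree 3 and C has degree 5, matching the example values 1/5 and 2/15; the function names and the random seed are illustrative choices.

```python
import random
from collections import Counter
from fractions import Fraction

# Hypothetical undirected graph on nodes A-H (edges chosen for illustration).
ADJ = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B", "D", "E", "F"],
    "D": ["A", "C"],
    "E": ["C", "G"],
    "F": ["C", "H"],
    "G": ["E"],
    "H": ["F"],
}


def mh_probability(adj, u, w):
    """Exact transition probability of the MHRW from u to w."""
    k_u = len(adj[u])
    if w == u:
        return 1 - sum(mh_probability(adj, u, y) for y in adj[u])
    if w in adj[u]:
        return Fraction(1, k_u) * min(Fraction(1), Fraction(k_u, len(adj[w])))
    return Fraction(0)


def mhrw(adj, start, steps, rng):
    """Propose a uniform neighbor w; accept with probability min(1, k_u/k_w)."""
    walk = [start]
    u = start
    for _ in range(steps):
        w = rng.choice(adj[u])
        if rng.random() < len(adj[u]) / len(adj[w]):
            u = w
        walk.append(u)  # a rejected proposal repeats the current node
    return walk


print(mh_probability(ADJ, "A", "C"), mh_probability(ADJ, "A", "A"))
walk = mhrw(ADJ, "A", 200_000, random.Random(0))
counts = Counter(walk)
for u in sorted(ADJ):
    print(u, round(counts[u] / len(walk), 3))  # each should be near 1/8
```

Rejections are what flatten the distribution: the walk lingers at low-degree nodes instead of rushing toward hubs, so every node ends up visited about equally often.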
28 Uniform userid Sampling (UNI). As a basis for comparison, we collect a uniform sample of Facebook userids (UNI) by rejection sampling on the 32-bit userid space. UNI is not a general solution for sampling OSNs: the userid space must not be sparse, and some OSNs use names instead of numbers.
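Rejection sampling on an ID space amounts to drawing random IDs and keeping only those that belong to real users. The sketch below is a toy illustration: the 10-bit ID space, the set of valid IDs, and the `is_valid` lookup are all made up here, whereas the study probed the real 32-bit userid space.

```python
import random


def uniform_userid_sample(is_valid, id_bits, n, rng):
    """Draw IDs uniformly from [0, 2**id_bits) and keep the valid ones.
    Every valid ID is equally likely to be kept, so the result is a
    uniform sample of users (with replacement)."""
    sample = []
    attempts = 0
    while len(sample) < n:
        candidate = rng.getrandbits(id_bits)
        attempts += 1
        if is_valid(candidate):
            sample.append(candidate)
    return sample, attempts


# Toy ID space: 1024 possible IDs, of which only multiples of 7 exist.
VALID_IDS = set(range(0, 1024, 7))
rng = random.Random(0)  # fixed seed, an arbitrary choice
sample, attempts = uniform_userid_sample(VALID_IDS.__contains__, 10, 50, rng)
print(len(sample), attempts)
```

The ratio of attempts to accepted samples grows as the ID space gets sparser, which is why the slide notes that the userid space must not be sparse.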
29 Summary of Datasets
Sampling method    MHRW      RW        BFS       UNI
# Valid users      28x81K    28x81K    28x81K    984K
# Unique users     957K      2.19M     2.20M     984K
Egonets for a subsample of MHRW: local properties of nodes. Datasets available at:
30 Data Collection: Basic Node Information. What information do we collect for each sampled node u? UserID, name, networks (regional, school/workplace), and privacy settings (four flags, e.g. 1111: send message, view friends, profile photo, add as friend), plus the friend list, with the same fields (UserID, name, networks, privacy settings) for each friend.
31 Detecting Convergence. How many samples (iterations) does it take to lose dependence on the starting point?
32 Online Convergence Diagnostics: Geweke. Detects convergence for a single walk. Let $X$ be a sequence of samples for the metric of interest, and let $X_a$ and $X_b$ be two windows of $X$ (early and late in the walk): $z = \frac{E(X_a) - E(X_b)}{\sqrt{\mathrm{Var}(X_a) + \mathrm{Var}(X_b)}}$. [J. Geweke, Evaluating the accuracy of sampling based approaches to calculate posterior moments, in Bayesian Statistics 4, 1992]
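A minimal sketch of the Geweke score, $z = (E(X_a) - E(X_b)) / \sqrt{\mathrm{Var}(X_a) + \mathrm{Var}(X_b)}$, follows. Taking $X_a$ as the first 10% and $X_b$ as the last 50% of the walk is a common convention that is assumed here, not something the slide states, and the two input sequences are synthetic.

```python
from math import sqrt
from statistics import mean, variance


def geweke_z(x, first=0.1, last=0.5):
    """Compare the mean of an early window with the mean of a late window.
    The window fractions are assumptions (a common default choice)."""
    a = x[: int(first * len(x))]
    b = x[len(x) - int(last * len(x)):]
    return (mean(a) - mean(b)) / sqrt(variance(a) + variance(b))


# A sequence whose early and late windows look alike: z is 0.
steady = [0, 1] * 100
# A sequence still drifting away from its starting point: |z| is large.
drifting = list(range(200))

print(geweke_z(steady))
print(round(geweke_z(drifting), 2))
```

A score near zero suggests the early and late parts of the walk agree on the metric; a large magnitude suggests the walk still depends on where it started.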
33 Online Convergence Diagnostics: Gelman-Rubin. Detects convergence for $m > 1$ walks by comparing the between-walk variance $B$ with the within-walk variance $W$: $R = \sqrt{\frac{n-1}{n} + \frac{m+1}{mn}\,\frac{B}{W}}$, where $n$ is the number of samples per walk. [A. Gelman, D. Rubin, Inference from iterative simulation using multiple sequences, Statistical Science, Volume 7, 1992]
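The Gelman-Rubin score can be sketched in a few lines. This example assumes the usual definitions, which the slide does not spell out: $B$ is $n$ times the sample variance of the per-walk means, $W$ is the average of the per-walk sample variances, and the score is $R = \sqrt{\frac{n-1}{n} + \frac{m+1}{mn}\frac{B}{W}}$ for $m$ walks of length $n$. The input walks are synthetic.

```python
from math import sqrt
from statistics import mean, variance


def gelman_rubin(chains):
    """Score for m walks of equal length n; values close to 1 suggest the
    walks have mixed. B and W follow the usual definitions (assumed)."""
    m = len(chains)
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    between = n * variance(chain_means)          # B
    within = mean([variance(c) for c in chains])  # W
    return sqrt((n - 1) / n + (m + 1) / (m * n) * between / within)


# Three walks that agree with each other: score stays below 1.
agreeing = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
# Two walks stuck in different regions: score is far above 1.
stuck = [[0, 1, 0, 1], [10, 11, 10, 11]]

print(round(gelman_rubin(agreeing), 3))
print(round(gelman_rubin(stuck), 3))
```

When the walks explore different regions, the between-walk variance dominates and the score rises well above 1, signalling that more iterations are needed.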
34 When do we reach equilibrium? Node degree: burn-in determined to be 3K.
35 Methods Comparison: Node Degree. 28 crawls. Poor performance for BFS and RW. MHRW and RWRW produce good estimates, both per chain and overall.
36 Sampling Bias: BFS. Low-degree nodes are underrepresented by two orders of magnitude; BFS is biased.
37 Sampling Bias: MHRW, RW, RWRW. Degree distribution identical to UNI (MHRW, RWRW). RW is as biased as BFS, but with smaller variance in each walk.
38 Practical Recommendations for Sampling Methods. Use MHRW or RWRW; do not use BFS or RW. Use formal convergence diagnostics: assess convergence online, and use multiple parallel walks. MHRW vs RWRW: RWRW has slightly better performance; MHRW provides a ready-to-use sample.
39 Outline: Motivation and Problem Statement; Sampling Methodology; Data Analysis; Conclusion
40 FB Social Graph: Degree Distribution. Power-law distribution $f(k) = b \cdot k^{-a}$, with fitted exponents $a_1 = 1.32$ and $a_2 = 3.38$. The degree distribution is not a power law.
41 Any Comments & Critiques?
42 Next Class: Graph Mining (II). Do assigned readings before class. Submit reviews/critiques. Attend in-class discussions.
DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: AK232 Fall 2016 Graph Data: Social Networks Facebook social graph 4-degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna,
More informationDS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: AK 233 Spring 2018 Service Providing Improve urban planning, Ease Traffic Congestion, Save Energy,
More informationA Walk in Facebook: Uniform Sampling of Users in Online Social Networks
A Walk in Facebook: Uniform Sampling of Users in Online Social Networks Minas Gjoka, Maciej Kurant, Carter T. Butts, Athina Markopoulou California Institute for Telecommunications and Information Technology
More informationA Walk in Facebook: Uniform Sampling of Users in Online Social Networks
A Walk in Facebook: Uniform Sampling of Users in Online Social Networks Minas Gjoka CalIT2 UC Irvine mgjoka@uci.edu Maciej Kurant CalIT2 UC Irvine maciej.kurant@gmail.com Carter T. Butts Sociology Dept
More informationUnbiased Sampling of Facebook
Unbiased Sampling of Facebook Minas Gjoka Networked Systems UC Irvine mgjoka@uci.edu Maciej Kurant School of ICS EPFL, Lausanne maciej.kurant@epfl.ch Carter T. Butts Sociology Dept UC Irvine buttsc@uci.edu
More informationWalking in Facebook: A Case Study of Unbiased Sampling of OSNs
Walking in Facebook: A Case Study of Unbiased Sampling of OSNs Minas Gjoka Networked Systems UC Irvine mgjoka@uci.edu Maciej Kurant School of Comp.& Comm. Sciences EPFL, Lausanne maciej.kurant@epfl.ch
More informationHow to explore big networks? Question: Perform a random walk on G. What is the average node degree among visited nodes, if avg degree in G is 200?
How to explore big networks? Question: Perform a random walk on G. What is the average node degree among visited nodes, if avg degree in G is 200? Questions from last time Avg. FB degree is 200 (suppose).
More informationMaking social interactions accessible in online social networks
Information Services & Use 33 (2013) 113 117 113 DOI 10.3233/ISU-130702 IOS Press Making social interactions accessible in online social networks Fredrik Erlandsson a,, Roozbeh Nia a, Henric Johnson b
More informationEmpirical Studies on Methods of Crawling Directed Networks
www.ijcsi.org 487 Empirical Studies on Methods of Crawling Directed Networks Junjie Tong, Haihong E, Meina Song and Junde Song School of Computer, Beijing University of Posts and Telecommunications Beijing,
More informationPart 1: Link Analysis & Page Rank
Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Graph Data: Social Networks [Source: 4-degrees of separation, Backstrom-Boldi-Rosa-Ugander-Vigna,
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationMultigraph Sampling of Online Social Networks
Multigraph Sampling of Online Social Networks Minas Gjoka CalIT2 UC Irvine mgjoka@uci.edu Carter T. Butts Sociology Dept UC Irvine buttsc@uci.edu Maciej Kurant CalIT2 UC Irvine maciej.kurant@gmail.com
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationDS504/CS586: Big Data Analytics Data acquisition and measurement Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Data acquisition and measurement Prof. Yanhua Li Time: 6:00pm 8:50pm THURSDAY Location: AK 232 Fall 2016 Data acquisition and measurement ia Sampling and Estimation
More informationSampling Large Graphs: Algorithms and Applications
Sampling Large Graphs: Algorithms and Applications Don Towsley Umass - Amherst Joint work with P.H. Wang, J.Z. Zhou, J.C.S. Lui, X. Guan Measuring, Analyzing Large Networks - large networks can be represented
More informationMining and Analyzing Online Social Networks
The 5th EuroSys Doctoral Workshop (EuroDW 2011) Salzburg, Austria, Sunday 10 April 2011 Mining and Analyzing Online Social Networks Emilio Ferrara eferrara@unime.it Advisor: Prof. Giacomo Fiumara PhD School
More informationSampling Large Graphs: Algorithms and Applications
Sampling Large Graphs: Algorithms and Applications Don Towsley College of Information & Computer Science Umass - Amherst Collaborators: P.H. Wang, J.C.S. Lui, J.Z. Zhou, X. Guan Measuring, analyzing large
More informationLecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Node Importance and Neighborhoods
Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Node Importance and Neighborhoods Matthias Schubert, Matthias Renz, Felix Borutta, Evgeniy Faerman, Christian Frey, Klaus Arthur
More informationCS224W: Analysis of Network Jure Leskovec, Stanford University
CS224W: Analysis of Network Jure Leskovec, Stanford University http://cs224w.stanford.edu 9/25/17 Jure Leskovec, Stanford CS224W: Analysis of Networks 2 Why Networks? Networks are a general language for
More informationEmpirical Characterization of P2P Systems
Empirical Characterization of P2P Systems Reza Rejaie Mirage Research Group Department of Computer & Information Science University of Oregon http://mirage.cs.uoregon.edu/ Collaborators: Daniel Stutzbach
More informationA two-phase Sampling based Algorithm for Social Networks
A two-phase Sampling based Algorithm for Social Networks Zeinab S. Jalali, Alireza Rezvanian, Mohammad Reza Meybodi Soft Computing Laboratory, Computer Engineering and Information Technology Department
More informationProbabilistic Models in Social Network Analysis
Probabilistic Models in Social Network Analysis Sargur N. Srihari University at Buffalo, The State University of New York USA US-India Workshop on Large Scale Data Analytics and Intelligent Services December
More informationCS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul
1 CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul Introduction Our problem is crawling a static social graph (snapshot). Given
More informationModeling and Estimating the Structure of D2D-Based Mobile Social Networks
Modeling and Estimating the Structure of DD-Based Mobile Social Networks Sigit Aryo Pambudi Wenye Wang Department of Electrical and Computer Engineering North Carolina State University, Raleigh, NC 7606
More informationMining Social Network Graphs
Mining Social Network Graphs Analysis of Large Graphs: Community Detection Rafael Ferreira da Silva rafsilva@isi.edu http://rafaelsilva.com Note to other teachers and users of these slides: We would be
More informationOSN: when multiple autonomous users disclose another individual s information
OSN: when multiple autonomous users disclose another individual s information Cristina Pérez-Solà Dept. d Enginyeria de la Informació i les Comunicacions Universitat Autònoma de Barcelona Email: cperez@deic.uab.cat
More informationDS504/CS586: Big Data Analytics Big Data Clustering II
Welcome to DS504/CS586: Big Data Analytics Big Data Clustering II Prof. Yanhua Li Time: 6pm 8:50pm Thu Location: KH 116 Fall 2017 Updates: v Progress Presentation: Week 15: 11/30 v Next Week Office hours
More informationEstimating Sizes of Social Networks via Biased Sampling
Estimating Sizes of Social Networks via Biased Sampling Liran Katzir, Edo Liberty, and Oren Somekh Yahoo! Labs, Haifa, Israel International World Wide Web Conference, 28th March - 1st April 2011, Hyderabad,
More informationVideo Sharing Websites Study
Video Sharing Websites Study Content Characteristic Analysis Nan ZHAO Télécom ParisTech Paris, France nan.zhao@telecom-paristech.fr Loïc BAUD DREV, Hadopi Paris, France loic.baud@hadopi.fr Patrick BELLOT
More informationA Short History of Markov Chain Monte Carlo
A Short History of Markov Chain Monte Carlo Christian Robert and George Casella 2010 Introduction Lack of computing machinery, or background on Markov chains, or hesitation to trust in the practicality
More informationAlgorithmic and Economic Aspects of Networks. Nicole Immorlica
Algorithmic and Economic Aspects of Networks Nicole Immorlica Syllabus 1. Jan. 8 th (today): Graph theory, network structure 2. Jan. 15 th : Random graphs, probabilistic network formation 3. Jan. 20 th
More informationDS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: KH 116 Fall 2017 First Grading for Reading Assignment Weka v 6 weeks v https://weka.waikato.ac.nz/dataminingwithweka/preview
More informationDS504/CS586: Big Data Analytics Big Data Clustering II
Welcome to DS504/CS586: Big Data Analytics Big Data Clustering II Prof. Yanhua Li Time: 6pm 8:50pm Thu Location: AK 232 Fall 2016 More Discussions, Limitations v Center based clustering K-means BFR algorithm
More informationSupplementary file for SybilDefender: A Defense Mechanism for Sybil Attacks in Large Social Networks
1 Supplementary file for SybilDefender: A Defense Mechanism for Sybil Attacks in Large Social Networks Wei Wei, Fengyuan Xu, Chiu C. Tan, Qun Li The College of William and Mary, Temple University {wwei,
More informationDS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: KH116 Fall 2017 Merged CS586 and DS504 Examples of Reviews/ Critiques Random
More informationSupervised Random Walks
Supervised Random Walks Pawan Goyal CSE, IITKGP September 8, 2014 Pawan Goyal (IIT Kharagpur) Supervised Random Walks September 8, 2014 1 / 17 Correlation Discovery by random walk Problem definition Estimate
More informationFramework and Algorithms for Network Bucket Testing
Framework and Algorithms for Network Bucket Testing Liran Katzir Yahoo! Labs., Haifa, Israel lirank@yahoo-inc.com Edo Liberty Yahoo! Labs., Haifa, Israel edo@yahoo-inc.com Oren Somekh Yahoo! Labs., Haifa,
More information1 Methods for Posterior Simulation
1 Methods for Posterior Simulation Let p(θ y) be the posterior. simulation. Koop presents four methods for (posterior) 1. Monte Carlo integration: draw from p(θ y). 2. Gibbs sampler: sequentially drawing
More informationSocial Networks 2015 Lecture 10: The structure of the web and link analysis
04198250 Social Networks 2015 Lecture 10: The structure of the web and link analysis The structure of the web Information networks Nodes: pieces of information Links: different relations between information
More informationExtracting Information from Complex Networks
Extracting Information from Complex Networks 1 Complex Networks Networks that arise from modeling complex systems: relationships Social networks Biological networks Distinguish from random networks uniform
More informationMarkov Chain Monte Carlo (part 1)
Markov Chain Monte Carlo (part 1) Edps 590BAY Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2018 Depending on the book that you select for
More informationData mining --- mining graphs
Data mining --- mining graphs University of South Florida Xiaoning Qian Today s Lecture 1. Complex networks 2. Graph representation for networks 3. Markov chain 4. Viral propagation 5. Google s PageRank
More informationIntroduction To Graphs and Networks. Fall 2013 Carola Wenk
Introduction To Graphs and Networks Fall 203 Carola Wenk On the Internet, links are essentially weighted by factors such as transit time, or cost. The goal is to find the shortest path from one node to
More informationAlgorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov
Algorithms and Applications in Social Networks 2017/2018, Semester B Slava Novgorodov 1 Lesson #1 Administrative questions Course overview Introduction to Social Networks Basic definitions Network properties
More informationData-Intensive Distributed Computing
Data-Intensive Distributed Computing CS 451/651 (Fall 2018) Part 4: Analyzing Graphs (1/2) October 4, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These slides are
More informationThis is the published version of a paper presented at Network Intelligence Conference (ENIC), 2015 Second European.
http://www.diva-portal.org This is the published version of a paper presented at Network Intelligence Conference (ENIC), 2015 Second European. Citation for the original published paper: Erlandsson, F.,
More informationGraph similarity. Laura Zager and George Verghese EECS, MIT. March 2005
Graph similarity Laura Zager and George Verghese EECS, MIT March 2005 Words you won t hear today impedance matching thyristor oxide layer VARs Some quick definitions GV (, E) a graph G V the set of vertices
More information: Semantic Web (2013 Fall)
03-60-569: Web (2013 Fall) University of Windsor September 4, 2013 Table of contents 1 2 3 4 5 Definition of the Web The World Wide Web is a system of interlinked hypertext documents accessed via the Internet
More informationLecture 27: Learning from relational data
Lecture 27: Learning from relational data STATS 202: Data mining and analysis December 2, 2017 1 / 12 Announcements Kaggle deadline is this Thursday (Dec 7) at 4pm. If you haven t already, make a submission
More informationDegree Ranking Using Local Information
Degree Ranking Using Local Information arxiv:1706.01205v2 [cs.si] 10 Jun 2017 Akrati Saxena* akrati.saxena@iitrpr.ac.in S. R. S. Iyengar* sudarshan@iitrpr.ac.in Ralucca Gera** rgera@nps.edu *Department
More informationCrawling Online Social Graphs
Crawling Online Social Graphs Shaozhi Ye Department of Computer Science University of California, Davis sye@ucdavis.edu Juan Lang Department of Computer Science University of California, Davis jilang@ucdavis.edu
More informationMCMC Methods for data modeling
MCMC Methods for data modeling Kenneth Scerri Department of Automatic Control and Systems Engineering Introduction 1. Symposium on Data Modelling 2. Outline: a. Definition and uses of MCMC b. MCMC algorithms
More informationEstimating Deep Web Properties by Random Walk
University of Windsor Scholarship at UWindsor Electronic Theses and Dissertations 2013 Estimating Deep Web Properties by Random Walk Sajib Kumer Sinha University of Windsor Follow this and additional works
More informationDS595/CS525: Urban Network Analysis --Urban Mobility Prof. Yanhua Li
Welcome to DS595/CS525: Urban Network Analysis --Urban Mobility Prof. Yanhua Li Time: 6:00pm 8:50pm Wednesday Location: Fuller 320 Spring 2017 2 Team assignment Finalized. (Great!) Guest Speaker 2/22 A
More informationCS224W: Analysis of Networks Jure Leskovec, Stanford University
CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 2 Observations Models
More informationAn Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization
An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization Pedro Ribeiro (DCC/FCUP & CRACS/INESC-TEC) Part 1 Motivation and emergence of Network Science
More information745: Advanced Database Systems
745: Advanced Database Systems Yanlei Diao University of Massachusetts Amherst Outline Overview of course topics Course requirements Database Management Systems 1. Online Analytical Processing (OLAP) vs.
More informationCycles in Random Graphs
Cycles in Random Graphs Valery Van Kerrebroeck Enzo Marinari, Guilhem Semerjian [Phys. Rev. E 75, 066708 (2007)] [J. Phys. Conf. Series 95, 012014 (2008)] Outline Introduction Statistical Mechanics Approach
More informationCOMP 4601 Hubs and Authorities
COMP 4601 Hubs and Authorities 1 Motivation PageRank gives a way to compute the value of a page given its position and connectivity w.r.t. the rest of the Web. Is it the only algorithm: No! It s just one
More informationCS 3516: Advanced Computer Networks
Welcome to CS 56: Adanced Computer Networks Prof. Yanhua Li Time: 9:00am 9:50am M, T, R, and F Location: Fuller 0 Fall 07 A-term Some slides are originall the course materials of the tetbook Computer Networking:
More informationLink Analysis and Web Search
Link Analysis and Web Search Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna http://www.moreno.marzolla.name/ based on material by prof. Bing Liu http://www.cs.uic.edu/~liub/webminingbook.html
More informationWeb consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page
Link Analysis Links Web consists of web pages and hyperlinks between pages A page receiving many links from other pages may be a hint of the authority of the page Links are also popular in some other information
More informationAiding the Detection of Fake Accounts in Large Scale Social Online Services
Aiding the Detection of Fake Accounts in Large Scale Social Online Services Qiang Cao Duke University Michael Sirivianos Xiaowei Yang Tiago Pregueiro Cyprus Univ. of Technology Duke University Tuenti,
More informationGraph Data Management
Graph Data Management Analysis and Optimization of Graph Data Frameworks presented by Fynn Leitow Overview 1) Introduction a) Motivation b) Application for big data 2) Choice of algorithms 3) Choice of
More informationCS281 Section 9: Graph Models and Practical MCMC
CS281 Section 9: Graph Models and Practical MCMC Scott Linderman November 11, 213 Now that we have a few MCMC inference algorithms in our toolbox, let s try them out on some random graph models. Graphs
More informationCS 3516: Advanced Computer Networks
Welcome to CS 56: Adanced Computer Networks Prof. Yanhua Li Time: 9:00am 9:50am M, T, R, and F Location: Fuller 0 Fall 06 A-term Some slides are originall from the course materials of the tetbook Computer
More informationNavigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets
Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets Nadathur Satish, Narayanan Sundaram, Mostofa Ali Patwary, Jiwon Seo, Jongsoo Park, M. Amber Hassaan, Shubho Sengupta, Zhaoming
More informationLecture 9: I: Web Retrieval II: Webology. Johan Bollen Old Dominion University Department of Computer Science
Lecture 9: I: Web Retrieval II: Webology Johan Bollen Old Dominion University Department of Computer Science jbollen@cs.odu.edu http://www.cs.odu.edu/ jbollen April 10, 2003 Page 1 WWW retrieval Two approaches
More informationNetworks in economics and finance. Lecture 1 - Measuring networks
Networks in economics and finance Lecture 1 - Measuring networks What are networks and why study them? A network is a set of items (nodes) connected by edges or links. Units (nodes) Individuals Firms Banks
More informationST440/540: Applied Bayesian Analysis. (5) Multi-parameter models - Initial values and convergence diagn
(5) Multi-parameter models - Initial values and convergence diagnostics Tuning the MCMC algoritm MCMC is beautiful because it can handle virtually any statistical model and it is usually pretty easy to
More informationMCMC Diagnostics. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) MCMC Diagnostics MATH / 24
MCMC Diagnostics Yingbo Li Clemson University MATH 9810 Yingbo Li (Clemson) MCMC Diagnostics MATH 9810 1 / 24 Convergence to Posterior Distribution Theory proves that if a Gibbs sampler iterates enough,
More informationCSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection
CSE 255 Lecture 6 Data Mining and Predictive Analytics Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:
More informationIntroduction to Algorithms. Lecture 11. Prof. Patrick Jaillet
6.006- Introduction to Algorithms Lecture 11 Prof. Patrick Jaillet Lecture Overview Searching I: Graph Search and Representations Readings: CLRS 22.1-22.3, B.4 Graphs G=(V,E) V a set of vertices usually
More informationTopic mash II: assortativity, resilience, link prediction CS224W
Topic mash II: assortativity, resilience, link prediction CS224W Outline Node vs. edge percolation Resilience of randomly vs. preferentially grown networks Resilience in real-world networks network resilience
More informationLecture 5: Graphs. Rajat Mittal. IIT Kanpur
Lecture : Graphs Rajat Mittal IIT Kanpur Combinatorial graphs provide a natural way to model connections between different objects. They are very useful in depicting communication networks, social networks
More informationGraph Exploration: How to do better than the random walk? Adrian Kosowski. INRIA Bordeaux Sud-Ouest.
Graph Exploration: How to do better than the random walk? Adrian Kosowski INRIA Bordeaux Sud-Ouest kosowski@labri.fr Réunion Displexity La Rochelle, April 4, 2013 Talk outline Introduction to network exploration
More informationGraph Data Processing with MapReduce
Distributed data processing on the Cloud Lecture 5 Graph Data Processing with MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, 2015 (licensed under Creation Commons Attribution
More informationMining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University
Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit
More informationDS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: AK 232 Fall 2016 The Data Equation Oceans of Data Ocean Biodiversity Informatics,
More informationAn Efficient Methodology for Image Rich Information Retrieval
An Efficient Methodology for Image Rich Information Retrieval 56 Ashwini Jaid, 2 Komal Savant, 3 Sonali Varma, 4 Pushpa Jat, 5 Prof. Sushama Shinde,2,3,4 Computer Department, Siddhant College of Engineering,
More informationE6885 Network Science Lecture 5: Network Estimation and Modeling
E 6885 Topics in Signal Processing -- Network Science E6885 Network Science Lecture 5: Network Estimation and Modeling Ching-Yung Lin, Dept. of Electrical Engineering, Columbia University October 7th,
Chapter 1. Social Media and Social Computing. October 2012 Youn-Hee Han
Chapter 1. Social Media and Social Computing October 2012 Youn-Hee Han http://link.koreatech.ac.kr 1.1 Social Media A rapid development and change of the Web and the Internet Participatory web application
Unbiased Sampling over Online Social Networks
Unbiased Sampling over Online Social Networks Mojtaba Torkjazi Department of Computer & Information Science University of Oregon moji@cs.uoreogn.edu ABSTRACT During recent years, Online Social Networks
CSE 258 Lecture 6. Web Mining and Recommender Systems. Community Detection
CSE 258 Lecture 6 Web Mining and Recommender Systems Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:
Quantitative Biology II
Quantitative Biology II Lecture 3: Markov Chain Monte Carlo March 9, 2015 Plan for Today: Introduction to Sampling, Introduction to MCMC, Metropolis Algorithm, Metropolis-Hastings Algorithm,
Clustering Relational Data using the Infinite Relational Model
Clustering Relational Data using the Infinite Relational Model Ana Daglis Supervised by: Matthew Ludkin September 4, 2015
Graph and Link Mining
Graph and Link Mining Graphs - Basics A graph is a powerful abstraction for modeling entities and their pairwise relationships. G = (V,E) Set of nodes V = v,, v 5 Set of edges E = { v, v 2, v 4, v 5 }
Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011
Data-Intensive Information Processing Applications Session #5 Graph Algorithms Jordan Boyd-Graber University of Maryland Thursday, March 3, 2011 This work is licensed under a Creative Commons Attribution-Noncommercial-Share
Social Network Analysis
Social Network Analysis Mathematics of Networks Manar Mohaisen Department of EEC Engineering Adjacency matrix Network types Edge list Adjacency list Graph representation 2 Adjacency matrix Adjacency matrix
CSE 158 Lecture 6. Web Mining and Recommender Systems. Community Detection
CSE 158 Lecture 6 Web Mining and Recommender Systems Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:
CSE 316: SOCIAL NETWORK ANALYSIS INTRODUCTION. Fall 2017 Marion Neumann
CSE 316: SOCIAL NETWORK ANALYSIS Fall 2017 Marion Neumann INTRODUCTION Contents in these slides may be subject to copyright. Some materials are adopted from: http://www.cs.cornell.edu/home/kleinber/networks-book,
Graph Theory for Network Science
Graph Theory for Network Science Dr. Natarajan Meghanathan Professor Department of Computer Science Jackson State University, Jackson, MS E-mail: natarajan.meghanathan@jsums.edu Networks or Graphs We typically
My Best Current Friend in a Social Network
Procedia Computer Science Volume 51, 2015, Pages 2903-2907 ICCS 2015 International Conference On Computational Science My Best Current Friend in a Social Network Francisco Moreno 1, Santiago Hernández
Biological Networks Analysis
Biological Networks Analysis Introduction and Dijkstra's algorithm Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein The clustering problem: partition genes into distinct
Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing
Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing Gautam Bhat, Rajeev Kumar Singh Department of Computer Science and Engineering Shiv Nadar University Gautam Buddh Nagar,
DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li Time: 6:00pm-8:50pm Thu Location: AK 232 Fall 2016 High Dimensional Data: Given a cloud of data points we want to understand
Network Centrality. Saptarshi Ghosh Department of CSE, IIT Kharagpur Social Computing course, CS60017
Network Centrality Saptarshi Ghosh Department of CSE, IIT Kharagpur Social Computing course, CS60017 Node centrality: Relative importance of a node in a network. How influential a person is within a
Enhancing Stratified Graph Sampling Algorithms based on Approximate Degree Distribution
Enhancing Stratified Graph Sampling Algorithms based on Approximate Degree Distribution Junpeng Zhu 1,2, Hui Li 1,2, Mei Chen 1,2, Zhenyu Dai 1,2 and Ming Zhu 3 1 College of Computer Science and Technology,
Graphs / Networks. CSE 6242/ CX 4242 Feb 18, Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech
CSE 6242/ CX 4242 Feb 18, 2014 Graphs / Networks Centrality measures, algorithms, interactive applications Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey
Graph Data Management Systems in New Applications Domains. Mikko Halin
Graph Data Management Systems in New Applications Domains Mikko Halin Introduction Presentation is based on two papers Graph Data Management Systems for New Application Domains - Philippe Cudré-Mauroux,