DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li
|
|
- Jonathan Morrison
- 6 years ago
- Views:
Transcription
1 Welcome to DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: AK232 Fall 2016
2 Graph Data: Social Networks Facebook social graph 4-degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna, 2011] J. Leskovec, A. Rajaraman, J. Ullman: 2 Mining of Massive Datasets,
3 Graph Data: Media Networks Connections between political blogs Polarization of the network [Adamic-Glance, 2005] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 3
4 Graph Data: Information Nets Citation networks and Maps of science [Börner et al., 2012] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 4
5 Graph Data: Communication Nets domain2 domain1 router domain3 Internet J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 5
6 Graph Data: Topological Networks Seven Bridges of Königsberg [Euler, 1735] Return to the starting point by traveling each link of the graph once and only once. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 6
7 Graph representation of networks Friendship Resistance Co-authorship Following One-way road Wireless channel Undirected links Directed links Friend & foe Trust & distrust Multi-relational links Hyperlinks Repulsion & cohesion Signed links
8 Mining in Big Graphs v Network Statistic Analysis (this lecture) Network Size Degree distribution. v Node Ranking (Next lecture) Identifying most influential nodes Viral Marketing, resource allocation
9 Graph Data: Social Networks Facebook social graph 4-degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna, 2011] J. Leskovec, A. Rajaraman, J. Ullman: 9 Mining of Massive Datasets,
10 Sampling graphs Random sampling (uniform & independent) crawling } vertex sampling } BFS sampling } edge sampling } random walk sampling 10 10
11 Random Walks on Graphs Random walk sampling Random Walk Routing Molecule in liquid Influence diffusion
12 Undirected Graphs Undirected!!
13 Random Walk v Adjacency matrix! # A = # # # " $ & & & & %! # D = # # # " v Transition Probability Matrix P ij = 1 k i v E : number of links v Stationary Distribution $ & & & & % Symmetric 1 " $ P = A D 1 = $ $ $ # Undirected 0 1/ 3 1/ 3 1/ 3 1/ 2 0 1/ 2 0 1/ 3 1/ 3 0 1/ 3 1/ 2 0 1/ 2 0 % ' ' ' ' & π i = d i 2 E
14 Metropolis-Hastings Random Walk v Adjacency matrix! # A = # # # " $ & & & & %! # D = # # # " v Transition Probability Matrix 1 k min(1, υ ) if w neighbor of υ MH k k P υ w υ, w = MH 1 Pυ, y if w= υ y υ v E : number of links v Stationary Distribution π υ = 1 V $ & & & & % Symmetric 1 " $ P = A D 1 = $ $ $ # Undirected 0 1/ 3 1/ 3 1/ 3 1/ 3 1/ 3 1/ 3 0 1/ 3 1/ 3 0 1/ 3 1/ 3 0 1/ 3 1/ 3 % ' ' ' ' &
15 Walking in Facebook: A Case Study of Unbiased Sampling of OSNs Minas Gjoka, Maciej Kurant, Carter Butts, Athina Markopoulou UC Irvine, EPFL Walking in Facebook 15
16 Outline v Motivation and Problem Statement v Sampling Methodology v Data Analysis v Conclusion Walking in Facebook 16
17 Online Social Networks (OSNs) v A network of declared friendships between users v Allows users to maintain relationships G F E H C D B v Many popular OSNs with different focus Facebook, LinkedIn, Flickr, A Social Graph Walking in Facebook 17
18 Why Sample OSNs? v Representative samples desirable study properties test algorithms v Obtaining complete dataset difficult companies usually unwilling to share data tremendous overhead to measure all (~100TB for Facebook) Walking in Facebook 18
19 Problem statement v Obtain a representative sample of users in a given OSN by exploration of the social graph. in this work we sample Facebook (FB) explore graph using various crawling techniques Walking in Facebook 19
20 Related Work v Graph traversal (BFS) A. Mislove et al, IMC 2007 Y. Ahn et al, WWW 2007 C. Wilson, Eurosys 2009 v Random walks (MHRW, RDS) M. Henzinger et al, WWW 2000 D. Stutbach et al, IMC 2006 A. Rasti et al, Mini Infocom 2009 Walking in Facebook 20
21 Outline v Motivation and Problem Statement v Sampling Methodology crawling methods data collection convergence evaluation method comparisons v Data Analysis v Conclusion Walking in Facebook 21
22 (1) Breadth-First-Search (BFS) v Starting from a seed, explores all neighbor nodes. Process continues iteratively without replacement. G F E H C v BFS leads to bias towards high degree nodes Lee et al, Statistical properties of Sampled Networks, Phys Review E, 2006 D A B Unexplored v Early measurement studies of OSNs use BFS as primary sampling technique i.e [Mislove et al], [Ahn et al], [Wilson et al.] Explored Visited Walking in Facebook 22
23 (2) Random Walk (RW) Explores graph one node at a time with replacement P Degree of node υ RW υ, w In the stationary distribution π υ = Number of edges = 1 k k 2 υ υ E H G D 1/3 F E C 1/3 A 1/3 Next candidate Current node B Walking in Facebook 23
24 (3) Re-Weighted Random Walk (RWRW) Hansen-Hurwitz estimator v Corrects for degree bias at the end of collection v Without re-weighting, the probability distribution for node property A is: pa ( ) i u A1 A i i = = 1 V u V v Re-Weighted probability distribution : Subset of sampled nodes with value i All sampled nodes pa ( ) i u A = u V i 1/ 1/ k k u u Degree of node u Walking in Facebook 24
25 (4) Metropolis-Hastings Random Walk (MHRW) v Explore graph one node at a time with replacement H G C F E 1 k min(1, υ ) if w neighbor of υ MH k k P υ w υ, w = MH 1 Pυ, y if w= υ y υ v In the stationary distribution π υ = 1 V MH P AA D 1/3 1/5 A 2/15 1/3 Walking in Facebook 25 B Next candidate Current node = 1 ( + + ) = MH 1 3 P AC = =
26 Uniform userid Sampling (UNI) v As a basis for comparison, we collect a uniform sample of Facebook userids (UNI) rejection sampling on the 32-bit userid space v UNI not a general solution for sampling OSNs userid space must not be sparse names instead of numbers Walking in Facebook 26
27 Summary of Datasets Sampling method MHRW RW BFS UNI #Valid Users 28x81K 28x81K 28x81K 984K # Unique Users 957K 2.19M 2.20M 984K Egonets for a subsample of MHRW - local properties of nodes Datasets available at: Walking in Facebook 27
28 Data Collection Basic Node Information v What information do we collect for each sampled node u? UserID Name Networks Privacy settings Friend List UserID Name Networks Privacy Settings u UserID Name Networks Privacy settings 1111 UserID Name Networks Privacy settings Regional School/Workplace Send Message View Friends Profile Photo Add as Friend Walking in Facebook 28
29 Detecting Convergence Number of samples (iterations) to loose dependence from starting points? Walking in Facebook 29
30 Online Convergence Diagnostics Geweke v Detects convergence for a single walk. Let X be a sequence of samples for metric of interest. Xa Xb EX ( a) EX ( b) z = Var( X ) Var( X ) a b J. Geweke, Evaluating the accuracy of sampling based approaches to calculate posterior moments in Bayesian Statistics 4, 1992 Walking in Facebook 30
31 Online Convergence Diagnostics Gelman-Rubin v Detects convergence for m>1 walks Walk 1 Between walks variance Walk 2 Walk 3 R n 1 m+ 1 B = + n mn W Within walks variance A. Gelman, D. Rubin, Inference from iterative simulation using multiple sequences in Statistical Science Volume 7, 1992 Walking in Facebook 31
32 When do we reach equilibrium? Node Degree Burn-in determined to be 3K Walking in Facebook 32
33 Methods Comparison Node Degree v Poor performance for BFS, RW 28 crawls v MHRW, RWRW produce good estimates per chain overall Walking in Facebook 33
34 Sampling Bias BFS v Low degree nodes underrepresented by two orders of magnitude v BFS is biased Walking in Facebook 34
35 Sampling Bias MHRW, RW, RWRW v v Degree distribution identical to UNI (MHRW,RWRW) RW as biased as BFS but with smaller variance in each walk Walking in Facebook 35
36 Practical Recommendations for Sampling Methods v Use MHRW or RWRW. Do not use BFS, RW. v Use formal convergence diagnostics assess convergence online use multiple parallel walks v MHRW vs RWRW RWRW slightly better performance MHRW provides a ready-to-use sample Walking in Facebook 36
37 Outline v Motivation and Problem Statement v Sampling Methodology v Data Analysis v Conclusion Walking in Facebook 37
38 FB Social Graph Degree Distribution, Power Law Distribution f(k)=b*k -a a 1 =1.32 a 2 =3.38 v Degree distribution not a power law Walking in Facebook 38
39 Any Comments & Critiques?
40 Next Class: Graph Mining (II) v Do assigned readings before class v Submit reviews/critiques v Attend in-class discussions 40
DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: AK 233 Spring 2018 Service Providing Improve urban planning, Ease Traffic Congestion, Save Energy,
More informationDS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: KH 116 Fall 2017 Reiews/Critiques I will choose one reiew to grade this week. Graph Data: Social
More informationA Walk in Facebook: Uniform Sampling of Users in Online Social Networks
A Walk in Facebook: Uniform Sampling of Users in Online Social Networks Minas Gjoka, Maciej Kurant, Carter T. Butts, Athina Markopoulou California Institute for Telecommunications and Information Technology
More informationA Walk in Facebook: Uniform Sampling of Users in Online Social Networks
A Walk in Facebook: Uniform Sampling of Users in Online Social Networks Minas Gjoka CalIT2 UC Irvine mgjoka@uci.edu Maciej Kurant CalIT2 UC Irvine maciej.kurant@gmail.com Carter T. Butts Sociology Dept
More informationUnbiased Sampling of Facebook
Unbiased Sampling of Facebook Minas Gjoka Networked Systems UC Irvine mgjoka@uci.edu Maciej Kurant School of ICS EPFL, Lausanne maciej.kurant@epfl.ch Carter T. Butts Sociology Dept UC Irvine buttsc@uci.edu
More informationWalking in Facebook: A Case Study of Unbiased Sampling of OSNs
Walking in Facebook: A Case Study of Unbiased Sampling of OSNs Minas Gjoka Networked Systems UC Irvine mgjoka@uci.edu Maciej Kurant School of Comp.& Comm. Sciences EPFL, Lausanne maciej.kurant@epfl.ch
More informationHow to explore big networks? Question: Perform a random walk on G. What is the average node degree among visited nodes, if avg degree in G is 200?
How to explore big networks? Question: Perform a random walk on G. What is the average node degree among visited nodes, if avg degree in G is 200? Questions from last time Avg. FB degree is 200 (suppose).
More informationMaking social interactions accessible in online social networks
Information Services & Use 33 (2013) 113 117 113 DOI 10.3233/ISU-130702 IOS Press Making social interactions accessible in online social networks Fredrik Erlandsson a,, Roozbeh Nia a, Henric Johnson b
More informationEmpirical Studies on Methods of Crawling Directed Networks
www.ijcsi.org 487 Empirical Studies on Methods of Crawling Directed Networks Junjie Tong, Haihong E, Meina Song and Junde Song School of Computer, Beijing University of Posts and Telecommunications Beijing,
More informationPart 1: Link Analysis & Page Rank
Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Graph Data: Social Networks [Source: 4-degrees of separation, Backstrom-Boldi-Rosa-Ugander-Vigna,
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationSampling Large Graphs: Algorithms and Applications
Sampling Large Graphs: Algorithms and Applications Don Towsley College of Information & Computer Science Umass - Amherst Collaborators: P.H. Wang, J.C.S. Lui, J.Z. Zhou, X. Guan Measuring, analyzing large
More informationSampling Large Graphs: Algorithms and Applications
Sampling Large Graphs: Algorithms and Applications Don Towsley Umass - Amherst Joint work with P.H. Wang, J.Z. Zhou, J.C.S. Lui, X. Guan Measuring, Analyzing Large Networks - large networks can be represented
More informationMultigraph Sampling of Online Social Networks
Multigraph Sampling of Online Social Networks Minas Gjoka CalIT2 UC Irvine mgjoka@uci.edu Carter T. Butts Sociology Dept UC Irvine buttsc@uci.edu Maciej Kurant CalIT2 UC Irvine maciej.kurant@gmail.com
More informationMining and Analyzing Online Social Networks
The 5th EuroSys Doctoral Workshop (EuroDW 2011) Salzburg, Austria, Sunday 10 April 2011 Mining and Analyzing Online Social Networks Emilio Ferrara eferrara@unime.it Advisor: Prof. Giacomo Fiumara PhD School
More informationLecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Node Importance and Neighborhoods
Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Node Importance and Neighborhoods Matthias Schubert, Matthias Renz, Felix Borutta, Evgeniy Faerman, Christian Frey, Klaus Arthur
More informationA two-phase Sampling based Algorithm for Social Networks
A two-phase Sampling based Algorithm for Social Networks Zeinab S. Jalali, Alireza Rezvanian, Mohammad Reza Meybodi Soft Computing Laboratory, Computer Engineering and Information Technology Department
More informationCS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul
1 CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul Introduction Our problem is crawling a static social graph (snapshot). Given
More informationEmpirical Characterization of P2P Systems
Empirical Characterization of P2P Systems Reza Rejaie Mirage Research Group Department of Computer & Information Science University of Oregon http://mirage.cs.uoregon.edu/ Collaborators: Daniel Stutzbach
More informationSupplementary file for SybilDefender: A Defense Mechanism for Sybil Attacks in Large Social Networks
1 Supplementary file for SybilDefender: A Defense Mechanism for Sybil Attacks in Large Social Networks Wei Wei, Fengyuan Xu, Chiu C. Tan, Qun Li The College of William and Mary, Temple University {wwei,
More informationDS504/CS586: Big Data Analytics Big Data Clustering II
Welcome to DS504/CS586: Big Data Analytics Big Data Clustering II Prof. Yanhua Li Time: 6pm 8:50pm Thu Location: AK 232 Fall 2016 More Discussions, Limitations v Center based clustering K-means BFR algorithm
More informationProbabilistic Models in Social Network Analysis
Probabilistic Models in Social Network Analysis Sargur N. Srihari University at Buffalo, The State University of New York USA US-India Workshop on Large Scale Data Analytics and Intelligent Services December
More informationMining Social Network Graphs
Mining Social Network Graphs Analysis of Large Graphs: Community Detection Rafael Ferreira da Silva rafsilva@isi.edu http://rafaelsilva.com Note to other teachers and users of these slides: We would be
More informationEstimating Sizes of Social Networks via Biased Sampling
Estimating Sizes of Social Networks via Biased Sampling Liran Katzir, Edo Liberty, and Oren Somekh Yahoo! Labs, Haifa, Israel International World Wide Web Conference, 28th March - 1st April 2011, Hyderabad,
More informationModeling and Estimating the Structure of D2D-Based Mobile Social Networks
Modeling and Estimating the Structure of DD-Based Mobile Social Networks Sigit Aryo Pambudi Wenye Wang Department of Electrical and Computer Engineering North Carolina State University, Raleigh, NC 7606
More informationOSN: when multiple autonomous users disclose another individual s information
OSN: when multiple autonomous users disclose another individual s information Cristina Pérez-Solà Dept. d Enginyeria de la Informació i les Comunicacions Universitat Autònoma de Barcelona Email: cperez@deic.uab.cat
More informationDS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li Time: 6:00pm 8:50pm Thu Location: AK 232 Fall 2016 High Dimensional Data v Given a cloud of data points we want to understand
More informationNavigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets
Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets Nadathur Satish, Narayanan Sundaram, Mostofa Ali Patwary, Jiwon Seo, Jongsoo Park, M. Amber Hassaan, Shubho Sengupta, Zhaoming
More informationLecture 27: Learning from relational data
Lecture 27: Learning from relational data STATS 202: Data mining and analysis December 2, 2017 1 / 12 Announcements Kaggle deadline is this Thursday (Dec 7) at 4pm. If you haven t already, make a submission
More informationDS504/CS586: Big Data Analytics Big Data Clustering II
Welcome to DS504/CS586: Big Data Analytics Big Data Clustering II Prof. Yanhua Li Time: 6pm 8:50pm Thu Location: KH 116 Fall 2017 Updates: v Progress Presentation: Week 15: 11/30 v Next Week Office hours
More informationDS595/CS525: Urban Network Analysis --Urban Mobility Prof. Yanhua Li
Welcome to DS595/CS525: Urban Network Analysis --Urban Mobility Prof. Yanhua Li Time: 6:00pm 8:50pm Wednesday Location: Fuller 320 Spring 2017 2 Team assignment Finalized. (Great!) Guest Speaker 2/22 A
More informationCS224W: Analysis of Network Jure Leskovec, Stanford University
CS224W: Analysis of Network Jure Leskovec, Stanford University http://cs224w.stanford.edu 9/25/17 Jure Leskovec, Stanford CS224W: Analysis of Networks 2 Why Networks? Networks are a general language for
More informationA Short History of Markov Chain Monte Carlo
A Short History of Markov Chain Monte Carlo Christian Robert and George Casella 2010 Introduction Lack of computing machinery, or background on Markov chains, or hesitation to trust in the practicality
More informationCrawling Online Social Graphs
Crawling Online Social Graphs Shaozhi Ye Department of Computer Science University of California, Davis sye@ucdavis.edu Juan Lang Department of Computer Science University of California, Davis jilang@ucdavis.edu
More informationMarkov Chain Monte Carlo (part 1)
Markov Chain Monte Carlo (part 1) Edps 590BAY Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2018 Depending on the book that you select for
More informationExtracting Information from Complex Networks
Extracting Information from Complex Networks 1 Complex Networks Networks that arise from modeling complex systems: relationships Social networks Biological networks Distinguish from random networks uniform
More informationGraph Data Management
Graph Data Management Analysis and Optimization of Graph Data Frameworks presented by Fynn Leitow Overview 1) Introduction a) Motivation b) Application for big data 2) Choice of algorithms 3) Choice of
More informationVideo Sharing Websites Study
Video Sharing Websites Study Content Characteristic Analysis Nan ZHAO Télécom ParisTech Paris, France nan.zhao@telecom-paristech.fr Loïc BAUD DREV, Hadopi Paris, France loic.baud@hadopi.fr Patrick BELLOT
More informationThis is the published version of a paper presented at Network Intelligence Conference (ENIC), 2015 Second European.
http://www.diva-portal.org This is the published version of a paper presented at Network Intelligence Conference (ENIC), 2015 Second European. Citation for the original published paper: Erlandsson, F.,
More informationAn Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization
An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization Pedro Ribeiro (DCC/FCUP & CRACS/INESC-TEC) Part 1 Motivation and emergence of Network Science
More informationMining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University
Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit
More informationData mining --- mining graphs
Data mining --- mining graphs University of South Florida Xiaoning Qian Today s Lecture 1. Complex networks 2. Graph representation for networks 3. Markov chain 4. Viral propagation 5. Google s PageRank
More informationSocial Networks 2015 Lecture 10: The structure of the web and link analysis
04198250 Social Networks 2015 Lecture 10: The structure of the web and link analysis The structure of the web Information networks Nodes: pieces of information Links: different relations between information
More informationCycles in Random Graphs
Cycles in Random Graphs Valery Van Kerrebroeck Enzo Marinari, Guilhem Semerjian [Phys. Rev. E 75, 066708 (2007)] [J. Phys. Conf. Series 95, 012014 (2008)] Outline Introduction Statistical Mechanics Approach
More informationLecture 5: Graphs. Rajat Mittal. IIT Kanpur
Lecture : Graphs Rajat Mittal IIT Kanpur Combinatorial graphs provide a natural way to model connections between different objects. They are very useful in depicting communication networks, social networks
More informationData-Intensive Distributed Computing
Data-Intensive Distributed Computing CS 451/651 (Fall 2018) Part 4: Analyzing Graphs (1/2) October 4, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These slides are
More informationLecture 9: I: Web Retrieval II: Webology. Johan Bollen Old Dominion University Department of Computer Science
Lecture 9: I: Web Retrieval II: Webology Johan Bollen Old Dominion University Department of Computer Science jbollen@cs.odu.edu http://www.cs.odu.edu/ jbollen April 10, 2003 Page 1 WWW retrieval Two approaches
More informationGraph Exploration: How to do better than the random walk? Adrian Kosowski. INRIA Bordeaux Sud-Ouest.
Graph Exploration: How to do better than the random walk? Adrian Kosowski INRIA Bordeaux Sud-Ouest kosowski@labri.fr Réunion Displexity La Rochelle, April 4, 2013 Talk outline Introduction to network exploration
More information1 Methods for Posterior Simulation
1 Methods for Posterior Simulation Let p(θ y) be the posterior. simulation. Koop presents four methods for (posterior) 1. Monte Carlo integration: draw from p(θ y). 2. Gibbs sampler: sequentially drawing
More informationIntroduction To Graphs and Networks. Fall 2013 Carola Wenk
Introduction To Graphs and Networks Fall 203 Carola Wenk On the Internet, links are essentially weighted by factors such as transit time, or cost. The goal is to find the shortest path from one node to
More informationMCMC Methods for data modeling
MCMC Methods for data modeling Kenneth Scerri Department of Automatic Control and Systems Engineering Introduction 1. Symposium on Data Modelling 2. Outline: a. Definition and uses of MCMC b. MCMC algorithms
More informationCS281 Section 9: Graph Models and Practical MCMC
CS281 Section 9: Graph Models and Practical MCMC Scott Linderman November 11, 213 Now that we have a few MCMC inference algorithms in our toolbox, let s try them out on some random graph models. Graphs
More informationSupervised Random Walks
Supervised Random Walks Pawan Goyal CSE, IITKGP September 8, 2014 Pawan Goyal (IIT Kharagpur) Supervised Random Walks September 8, 2014 1 / 17 Correlation Discovery by random walk Problem definition Estimate
More informationDS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: KH116 Fall 2017 Merged CS586 and DS504 Examples of Reviews/ Critiques Random
More informationCOMP 4601 Hubs and Authorities
COMP 4601 Hubs and Authorities 1 Motivation PageRank gives a way to compute the value of a page given its position and connectivity w.r.t. the rest of the Web. Is it the only algorithm: No! It s just one
More informationTopic mash II: assortativity, resilience, link prediction CS224W
Topic mash II: assortativity, resilience, link prediction CS224W Outline Node vs. edge percolation Resilience of randomly vs. preferentially grown networks Resilience in real-world networks network resilience
More informationEstimating Deep Web Properties by Random Walk
University of Windsor Scholarship at UWindsor Electronic Theses and Dissertations 2013 Estimating Deep Web Properties by Random Walk Sajib Kumer Sinha University of Windsor Follow this and additional works
More informationTheory of Computing. Lecture 10 MAS 714 Hartmut Klauck
Theory of Computing Lecture 10 MAS 714 Hartmut Klauck Seven Bridges of Königsberg Can one take a walk that crosses each bridge exactly once? Seven Bridges of Königsberg Model as a graph Is there a path
More informationDS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: KH 116 Fall 2017 First Grading for Reading Assignment Weka v 6 weeks v https://weka.waikato.ac.nz/dataminingwithweka/preview
More informationChapter 1. Social Media and Social Computing. October 2012 Youn-Hee Han
Chapter 1. Social Media and Social Computing October 2012 Youn-Hee Han http://link.koreatech.ac.kr 1.1 Social Media A rapid development and change of the Web and the Internet Participatory web application
More informationGraph similarity. Laura Zager and George Verghese EECS, MIT. March 2005
Graph similarity Laura Zager and George Verghese EECS, MIT March 2005 Words you won t hear today impedance matching thyristor oxide layer VARs Some quick definitions GV (, E) a graph G V the set of vertices
More informationFramework and Algorithms for Network Bucket Testing
Framework and Algorithms for Network Bucket Testing Liran Katzir Yahoo! Labs., Haifa, Israel lirank@yahoo-inc.com Edo Liberty Yahoo! Labs., Haifa, Israel edo@yahoo-inc.com Oren Somekh Yahoo! Labs., Haifa,
More informationOnline Social Networks and Media. Community detection
Online Social Networks and Media Community detection 1 Notes on Homework 1 1. You should write your own code for generating the graphs. You may use SNAP graph primitives (e.g., add node/edge) 2. For the
More informationDS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Data Pre-processing and Cleaning Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: AK 232 Fall 2016 The Data Equation Oceans of Data Ocean Biodiversity Informatics,
More informationMCMC Diagnostics. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) MCMC Diagnostics MATH / 24
MCMC Diagnostics Yingbo Li Clemson University MATH 9810 Yingbo Li (Clemson) MCMC Diagnostics MATH 9810 1 / 24 Convergence to Posterior Distribution Theory proves that if a Gibbs sampler iterates enough,
More informationQuantitative Biology II!
Quantitative Biology II! Lecture 3: Markov Chain Monte Carlo! March 9, 2015! 2! Plan for Today!! Introduction to Sampling!! Introduction to MCMC!! Metropolis Algorithm!! Metropolis-Hastings Algorithm!!
More informationOverview of the Stateof-the-Art. Networks. Evolution of social network studies
Overview of the Stateof-the-Art in Social Networks INF5370 spring 2014 Evolution of social network studies 1950-1970: mathematical studies of networks formed by the actual human interactions Pandemics,
More informationDegree Ranking Using Local Information
Degree Ranking Using Local Information arxiv:1706.01205v2 [cs.si] 10 Jun 2017 Akrati Saxena* akrati.saxena@iitrpr.ac.in S. R. S. Iyengar* sudarshan@iitrpr.ac.in Ralucca Gera** rgera@nps.edu *Department
More informationST440/540: Applied Bayesian Analysis. (5) Multi-parameter models - Initial values and convergence diagn
(5) Multi-parameter models - Initial values and convergence diagnostics Tuning the MCMC algoritm MCMC is beautiful because it can handle virtually any statistical model and it is usually pretty easy to
More informationWeb consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page
Link Analysis Links Web consists of web pages and hyperlinks between pages A page receiving many links from other pages may be a hint of the authority of the page Links are also popular in some other information
More informationGraph and Link Mining
Graph and Link Mining Graphs - Basics A graph is a powerful abstraction for modeling entities and their pairwise relationships. G = (V,E) Set of nodes V = v,, v 5 Set of edges E = { v, v 2, v 4, v 5 }
More informationAlgorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov
Algorithms and Applications in Social Networks 2017/2018, Semester B Slava Novgorodov 1 Lesson #1 Administrative questions Course overview Introduction to Social Networks Basic definitions Network properties
More information: Semantic Web (2013 Fall)
03-60-569: Web (2013 Fall) University of Windsor September 4, 2013 Table of contents 1 2 3 4 5 Definition of the Web The World Wide Web is a system of interlinked hypertext documents accessed via the Internet
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu Can we identify node groups? (communities, modules, clusters) 2/13/2014 Jure Leskovec, Stanford C246: Mining
More informationGraph Data Processing with MapReduce
Distributed data processing on the Cloud Lecture 5 Graph Data Processing with MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, 2015 (licensed under Creation Commons Attribution
More informationLink Analysis and Web Search
Link Analysis and Web Search Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna http://www.moreno.marzolla.name/ based on material by prof. Bing Liu http://www.cs.uic.edu/~liub/webminingbook.html
More informationCOMMUNITY SHELL S EFFECT ON THE DISINTEGRATION OF SOCIAL NETWORKS
Annales Univ. Sci. Budapest., Sect. Comp. 43 (2014) 57 68 COMMUNITY SHELL S EFFECT ON THE DISINTEGRATION OF SOCIAL NETWORKS Imre Szücs (Budapest, Hungary) Attila Kiss (Budapest, Hungary) Dedicated to András
More informationCSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection
CSE 255 Lecture 6 Data Mining and Predictive Analytics Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:
More informationOptimizing Capacity-Heterogeneous Unstructured P2P Networks for Random-Walk Traffic
Optimizing Capacity-Heterogeneous Unstructured P2P Networks for Random-Walk Traffic Chandan Rama Reddy Microsoft Joint work with Derek Leonard and Dmitri Loguinov Internet Research Lab Department of Computer
More informationAiding the Detection of Fake Accounts in Large Scale Social Online Services
Aiding the Detection of Fake Accounts in Large Scale Social Online Services Qiang Cao Duke University Michael Sirivianos Xiaowei Yang Tiago Pregueiro Cyprus Univ. of Technology Duke University Tuenti,
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu HITS (Hypertext Induced Topic Selection) Is a measure of importance of pages or documents, similar to PageRank
More informationPopularity of Twitter Accounts: PageRank on a Social Network
Popularity of Twitter Accounts: PageRank on a Social Network A.D-A December 8, 2017 1 Problem Statement Twitter is a social networking service, where users can create and interact with 140 character messages,
More informationIssues in MCMC use for Bayesian model fitting. Practical Considerations for WinBUGS Users
Practical Considerations for WinBUGS Users Kate Cowles, Ph.D. Department of Statistics and Actuarial Science University of Iowa 22S:138 Lecture 12 Oct. 3, 2003 Issues in MCMC use for Bayesian model fitting
More information11/26/17. directed graphs. CS 220: Discrete Structures and their Applications. graphs zybooks chapter 10. undirected graphs
directed graphs S 220: iscrete Structures and their pplications collection of vertices and directed edges graphs zybooks chapter 10 collection of vertices and edges undirected graphs What can this represent?
More informationBig Data Management and NoSQL Databases
NDBI040 Big Data Management and NoSQL Databases Lecture 10. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic
More informationEnhancing Stratified Graph Sampling Algorithms based on Approximate Degree Distribution
Enhancing Stratified Graph Sampling Algorithms based on Approximate Degree Distribution Junpeng Zhu 1,2, Hui Li 1,2(), Mei Chen 1,2, Zhenyu Dai 1,2 and Ming Zhu 3 1 College of Computer Science and Technology,
More informationCSE 258 Lecture 6. Web Mining and Recommender Systems. Community Detection
CSE 258 Lecture 6 Web Mining and Recommender Systems Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:
More informationCS4800: Algorithms & Data Jonathan Ullman
CS4800: Algorithms & Data Jonathan Ullman Lecture 11: Graphs Graph Traversals: BFS Feb 16, 2018 What s Next What s Next Graph Algorithms: Graphs: Key Definitions, Properties, Representations Exploring
More informationIntroduction to Algorithms. Lecture 11. Prof. Patrick Jaillet
6.006- Introduction to Algorithms Lecture 11 Prof. Patrick Jaillet Lecture Overview Searching I: Graph Search and Representations Readings: CLRS 22.1-22.3, B.4 Graphs G=(V,E) V a set of vertices usually
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu SPAM FARMING 2/11/2013 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 2/11/2013 Jure Leskovec, Stanford
More informationCS6200 Information Retreival. The WebGraph. July 13, 2015
CS6200 Information Retreival The WebGraph The WebGraph July 13, 2015 1 Web Graph: pages and links The WebGraph describes the directed links between pages of the World Wide Web. A directed edge connects
More informationGraph Theory. Many problems are mapped to graphs. Problems. traffic VLSI circuits social network communication networks web pages relationship
Graph Graph Usage I want to visit all the known famous places starting from Seoul ending in Seoul Knowledge: distances, costs Find the optimal(distance or cost) path Graph Theory Many problems are mapped
More informationGraph Data Management Systems in New Applications Domains. Mikko Halin
Graph Data Management Systems in New Applications Domains Mikko Halin Introduction Presentation is based on two papers Graph Data Management Systems for New Application Domains - Philippe Cudré-Mauroux,
More informationLecture 26: Graphs: Traversal (Part 1)
CS8 Integrated Introduction to Computer Science Fisler, Nelson Lecture 6: Graphs: Traversal (Part ) 0:00 AM, Apr, 08 Contents Introduction. Definitions........................................... Representations.......................................
More informationLink Analysis in the Cloud
Cloud Computing Link Analysis in the Cloud Dell Zhang Birkbeck, University of London 2017/18 Graph Problems & Representations What is a Graph? G = (V,E), where V represents the set of vertices (nodes)
More informationBiological Networks Analysis
Biological Networks Analysis Introduction and Dijkstra s algorithm Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein The clustering problem: partition genes into distinct
More informationAlgorithmic and Economic Aspects of Networks. Nicole Immorlica
Algorithmic and Economic Aspects of Networks Nicole Immorlica Syllabus 1. Jan. 8 th (today): Graph theory, network structure 2. Jan. 15 th : Random graphs, probabilistic network formation 3. Jan. 20 th
More informationCovert and Anomalous Network Discovery and Detection (CANDiD)
Covert and Anomalous Network Discovery and Detection (CANDiD) Rajmonda S. Caceres, Edo Airoldi, Garrett Bernstein, Edward Kao, Benjamin A. Miller, Raj Rao Nadakuditi, Matthew C. Schmidt, Kenneth Senne,
More informationCSE 158 Lecture 6. Web Mining and Recommender Systems. Community Detection
CSE 158 Lecture 6 Web Mining and Recommender Systems Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:
More information