DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li
1 Welcome to DS504/CS586: Big Data Analytics, Graph Mining. Prof. Yanhua Li. Time: 6:00pm-8:50pm R. Location: AK 233. Spring 2018.
2 Urban Computing (figure)
- Service Providing: improve urban planning, ease traffic congestion, save energy, reduce air pollution, ...
- Urban Data Analytics: data mining, machine learning, visualization
- Urban Data Management: spatio-temporal index, streaming, trajectory, and graph data management, ...
- Urban Sensing & Data Acquisition: participatory sensing, crowd sensing, mobile sensing
- Data sources: human mobility, traffic, air quality, meteorology, social media, energy, road networks, POIs
- Labels on the figure: recommender systems, big graph data mining, applications, big data clustering, management, acquisition, cleaning
Source: Zheng, Y., et al. Urban Computing: concepts, methodologies, and applications. ACM Transactions on Intelligent Systems and Technology.
3 Graph Data: Social Networks. Facebook social graph: 4 degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna, 2011]. Source: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets.
4 Graph Data: Media Networks. Connections between political blogs; polarization of the network [Adamic-Glance, 2005].
5 Graph Data: Information Nets. Citation networks and maps of science [Börner et al., 2012].
6 Graph Data: Communication Nets. The Internet as a graph of routers grouped into domains (figure labels: domain1, domain2, domain3, router, Internet).
7 Questions? Partial map of the Internet based on the January 15, 2005 data found on opte.org. (from
8 Graph Data: Topological Networks. Seven Bridges of Königsberg [Euler, 1735]: return to the starting point by traveling each link of the graph once and only once.
9 Graph representation of networks
- Undirected links: friendship, resistance, co-authorship
- Directed links: following, one-way road, wireless channel
- Signed links: friend & foe, trust & distrust, repulsion & cohesion
- Also listed: multi-relational links, hyperlinks
10 Mining in Big Graphs
- Network Statistic Analysis (this lecture): network size, degree distribution
- Node Ranking (next lecture): identifying most influential nodes; viral marketing, resource allocation
11 Graph Data: Social Networks. Facebook social graph: 4 degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna, 2011].
12 Sampling graphs
- Random sampling (uniform & independent): vertex sampling, edge sampling
- Crawling: BFS sampling, random walk sampling
13 Random Walks on Graphs: random walk sampling, random walk routing, molecule in liquid, influence diffusion.
14 Undirected Graphs Undirected!!
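15 Random Walk (undirected graph)
- Adjacency matrix $A$ (symmetric, since the graph is undirected) and diagonal degree matrix $D$.
- Transition probability matrix: $P_{ij} = 1/k_i$ if $j$ is a neighbor of $i$, and $P_{ij} = 0$ otherwise (in particular $P_{ii} = 0$), where $k_i$ is the degree of node $i$. For the 4-node example, each row sums to 1:
$$P = D^{-1} A = \begin{pmatrix} 0 & 1/3 & 1/3 & 1/3 \\ 1/2 & 0 & 1/2 & 0 \\ 1/3 & 1/3 & 0 & 1/3 \\ 1/2 & 0 & 1/2 & 0 \end{pmatrix}$$
- $|E|$: number of links.
- Stationary distribution: $\pi_i = d_i / (2|E|)$, where $d_i = k_i$ is the degree of node $i$.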
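The claim that a simple random walk settles on a degree-proportional distribution can be checked numerically. The sketch below is only an illustration: the adjacency list is an assumption read off the non-zero pattern of the 4-node transition matrix on this slide, and the function names are made up for the example. It uses exact fractions so the comparison is not affected by rounding.

```python
from fractions import Fraction

# Assumed example graph (nodes renumbered from 0), inferred from the
# non-zero entries of the transition matrix shown on the slide.
adjacency = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}

def transition_matrix(adj):
    """Row-stochastic simple random walk: P[i][j] = 1/k_i if j is a neighbor of i."""
    nodes = sorted(adj)
    return [[Fraction(1, len(adj[i])) if j in adj[i] else Fraction(0) for j in nodes]
            for i in nodes]

def degree_stationary(adj):
    """pi_i = k_i / (2|E|); the degrees of an undirected graph sum to 2|E|."""
    two_e = sum(len(nbrs) for nbrs in adj.values())
    return [Fraction(len(adj[i]), two_e) for i in sorted(adj)]

def step(pi, P):
    """One step of the chain on a row vector: (pi P)_j = sum_i pi_i * P[i][j]."""
    n = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

P = transition_matrix(adjacency)
pi = degree_stationary(adjacency)
print(pi)                  # degree-proportional distribution
print(step(pi, P) == pi)   # True when pi is stationary
```

Because high-degree nodes receive more stationary mass, a plain random walk over-samples them, which is the bias the later slides correct for.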
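16 Metropolis-Hastings Random Walk (undirected graph)
- Adjacency matrix $A$ (symmetric) and diagonal degree matrix $D$, as for the simple random walk.
- Transition probability matrix:
$$P^{MH}_{\upsilon,w} = \begin{cases} \frac{1}{k_\upsilon} \min\left(1, \frac{k_\upsilon}{k_w}\right) & \text{if } w \text{ is a neighbor of } \upsilon \\ 1 - \sum_{y \neq \upsilon} P^{MH}_{\upsilon,y} & \text{if } w = \upsilon \\ 0 & \text{otherwise} \end{cases}$$
- For the 4-node example:
$$P^{MH} = \begin{pmatrix} 0 & 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 & 0 \\ 1/3 & 1/3 & 0 & 1/3 \\ 1/3 & 0 & 1/3 & 1/3 \end{pmatrix}$$
- $|E|$: number of links.
- Stationary distribution: $\pi_\upsilon = 1/|V|$.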
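To see why the Metropolis-Hastings correction removes the degree bias, it can help to build the matrix from the rule on this slide and confirm that the uniform distribution is left unchanged by one step. This is a minimal sketch: the adjacency list is assumed from the 4-node example, and the helper names are hypothetical.

```python
from fractions import Fraction

# Assumed 4-node example graph (nodes renumbered from 0).
adjacency = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}

def mh_transition_matrix(adj):
    """P[v][w] = (1/k_v) * min(1, k_v/k_w) for neighbors; leftover mass stays on v."""
    nodes = sorted(adj)
    matrix = []
    for v in nodes:
        k_v = len(adj[v])
        row = []
        for w in nodes:
            if w in adj[v]:
                k_w = len(adj[w])
                row.append(Fraction(1, k_v) * min(Fraction(1), Fraction(k_v, k_w)))
            else:
                row.append(Fraction(0))
        # Self-loop probability: whatever is not spent on moves to neighbors.
        row[nodes.index(v)] = 1 - sum(row)
        matrix.append(row)
    return matrix

def step(pi, P):
    """One step of the chain on a row vector."""
    n = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

P_mh = mh_transition_matrix(adjacency)
uniform = [Fraction(1, len(adjacency))] * len(adjacency)
for row in P_mh:
    print(row)
print(step(uniform, P_mh) == uniform)  # True when the uniform distribution is stationary
```

For neighbors the rule simplifies to $\min(1/k_\upsilon, 1/k_w)$, which is symmetric in $\upsilon$ and $w$; a symmetric stochastic matrix always has the uniform distribution as a stationary distribution.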
17 Walking in Facebook: A Case Study of Unbiased Sampling of OSNs Minas Gjoka, Maciej Kurant, Carter Butts, Athina Markopoulou UC Irvine, EPFL Minas Gjoka, UC Irvine Walking in Facebook 17
18 Outline
- Motivation and Problem Statement
- Sampling Methodology
- Data Analysis
- Conclusion
19 Online Social Networks (OSNs)
- A network of declared friendships between users
- Allows users to maintain relationships
- Many popular OSNs with different focus: Facebook, LinkedIn, Flickr, ...
Figure: a social graph with nodes A to H.
20 Why Sample OSNs?
- Representative samples are desirable: study properties, test algorithms. We use node distribution in this study.
- Obtaining a complete dataset is difficult: companies are usually unwilling to share data, and measuring everything carries tremendous overhead (~100TB for Facebook).
21 Problem statement
- Obtain a representative sample of users in a given OSN by exploration of the social graph.
- In this work we sample Facebook (FB) and explore the graph using various crawling techniques.
22 Related Work
- Graph traversal (BFS): A. Mislove et al, IMC 2007; Y. Ahn et al, WWW 2007; C. Wilson, Eurosys 2009
- Random walks (MHRW, RDS): M. Henzinger et al, WWW 2000; D. Stutzbach et al, IMC 2006; A. Rasti et al, Mini Infocom 2009
23 Outline
- Motivation and Problem Statement
- Sampling Methodology: crawling methods, data collection, convergence evaluation, method comparisons
- Data Analysis
- Conclusion
24 (1) Breadth-First-Search (BFS)
- Starting from a seed, explores all neighbor nodes. The process continues iteratively without replacement.
- BFS leads to bias towards high-degree nodes (Lee et al, Statistical properties of Sampled Networks, Phys Review E, 2006).
- Early measurement studies of OSNs use BFS as the primary sampling technique, e.g. [Mislove et al], [Ahn et al], [Wilson et al.].
- Estimated distribution of a node property $A$: $p(A_i) = \frac{\sum_{u \in A_i} 1}{\sum_{u \in V} 1}$, where $A_i$ is the subset of sampled nodes with value $i$ and $V$ is the set of all sampled nodes.
Figure: example graph with nodes A to H, marked as unexplored, explored, or visited.
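A BFS crawl with a fixed sample budget, together with the unweighted estimate of $p(A_i)$, might look like the sketch below. The toy graph, the function names, and the `budget` parameter are assumptions made for illustration; they are not taken from the study.

```python
from collections import Counter, deque
from fractions import Fraction

def bfs_sample(adj, seed, budget):
    """Collect up to `budget` distinct nodes in breadth-first order from `seed`."""
    visited = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(visited) < budget:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:          # without replacement: each node is taken once
                seen.add(w)
                visited.append(w)
                queue.append(w)
                if len(visited) == budget:
                    break
    return visited

def empirical_distribution(values):
    """Unweighted estimate p(A_i) = |A_i| / |V| over the sampled values."""
    counts = Counter(values)
    return {value: Fraction(count, len(values)) for value, count in counts.items()}

# Hypothetical toy graph: hub 0 with five neighbors, plus a chain 5-6-7-8-9.
toy = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0],
       5: [0, 6], 6: [5, 7], 7: [6, 8], 8: [7, 9], 9: [8]}

sample = bfs_sample(toy, seed=1, budget=3)
degree_dist = empirical_distribution([len(toy[u]) for u in sample])
print(sample, degree_dist)
```

Even on this small graph the crawl reaches the hub after one hop, so a short BFS sample has a higher average degree than the graph as a whole (7/3 in the sample against 1.8 overall). A single toy run is only suggestive, but it is in line with the bias described on the slide.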
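25 (2) Random Walk (RW)
- Explores the graph one node at a time with replacement.
- Transition probability: $P^{RW}_{\upsilon,w} = 1/k_\upsilon$ for each neighbor $w$ of $\upsilon$, where $k_\upsilon$ is the degree of node $\upsilon$.
- In the stationary distribution: $\pi_\upsilon = k_\upsilon / (2|E|)$, where $|E|$ is the number of edges.
- Estimated distribution of a node property $A$: $p(A_i) = \frac{\sum_{u \in A_i} 1}{\sum_{u \in V} 1}$, where $A_i$ is the subset of sampled nodes with value $i$ and $V$ is the set of all sampled nodes.
Figure: example graph with nodes A to H; from the current node, each of the three next candidates is chosen with probability 1/3.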
26 (3) Re-Weighted Random Walk (RWRW)
- Uses the Hansen-Hurwitz estimator to correct for degree bias at the end of collection.
- Without re-weighting, the probability distribution for node property $A$ is $p(A_i) = \frac{\sum_{u \in A_i} 1}{\sum_{u \in V} 1}$, where $A_i$ is the subset of sampled nodes with value $i$ and $V$ is the set of all sampled nodes.
- Re-weighted probability distribution: $p(A_i) = \frac{\sum_{u \in A_i} 1/k_u}{\sum_{u \in V} 1/k_u}$, where $k_u$ is the degree of node $u$.
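The re-weighting step is short enough to sketch directly. In the example below the input format (a list of `(degree, value)` pairs, one per visited node, repeats included) and the function name are assumptions chosen for illustration.

```python
from collections import defaultdict
from fractions import Fraction

def reweighted_distribution(samples):
    """Hansen-Hurwitz style estimate: each visit counts 1/k_u instead of 1.

    samples: iterable of (degree, value) pairs, one per random-walk visit.
    """
    weight = defaultdict(Fraction)   # Fraction() is 0
    total = Fraction(0)
    for degree, value in samples:
        w = Fraction(1, degree)
        weight[value] += w
        total += w
    return {value: w / total for value, w in weight.items()}

# Idealized walk on a 4-node graph with degrees 3, 2, 3, 2: visits occur in
# proportion to degree, so degree-3 nodes appear 6 times and degree-2 nodes 4 times.
visits = [(3, 3)] * 6 + [(2, 2)] * 4      # the property of interest is the degree itself
estimate = reweighted_distribution(visits)
print(estimate)
```

Counted without weights, degree-3 nodes would make up 60% of this sample; with the weights the estimate returns to one half, which matches a graph where two of the four nodes have degree 3.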
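27 (4) Metropolis-Hastings Random Walk (MHRW)
- Explores the graph one node at a time with replacement.
- Transition probability:
$$P^{MH}_{\upsilon,w} = \begin{cases} \frac{1}{k_\upsilon} \min\left(1, \frac{k_\upsilon}{k_w}\right) & \text{if } w \text{ is a neighbor of } \upsilon \\ 1 - \sum_{y \neq \upsilon} P^{MH}_{\upsilon,y} & \text{if } w = \upsilon \end{cases}$$
- In the stationary distribution $\pi_\upsilon = 1/|V|$, so $p(A_i) = \frac{\sum_{u \in A_i} 1}{\sum_{u \in V} 1}$, where $A_i$ is the subset of sampled nodes with value $i$ and $V$ is the set of all sampled nodes.
- Example from the figure (current node A, next candidates with probabilities 1/3, 1/5, 1/3): $P^{MH}_{AC} = \frac{1}{3} \min\left(1, \frac{3}{5}\right) = \frac{1}{5}$ and $P^{MH}_{AA} = 1 - \left(\frac{1}{3} + \frac{1}{5} + \frac{1}{3}\right) = \frac{2}{15}$.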
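In a crawler the matrix is never built; the walk only needs the degrees of the current node and of the proposed neighbor. The sketch below shows one way to write that step. The toy graph, the function name, and the use of a seeded `random.Random` instance are assumptions for the example, not details of the study's crawler.

```python
import random

def mhrw_walk(adj, start, steps, rng):
    """Metropolis-Hastings random walk targeting the uniform distribution over nodes."""
    walk = [start]
    v = start
    for _ in range(steps):
        w = rng.choice(adj[v])                    # propose a neighbor with prob. 1/k_v
        if rng.random() < min(1.0, len(adj[v]) / len(adj[w])):
            v = w                                 # accept with prob. min(1, k_v / k_w)
        walk.append(v)                            # a rejection repeats v (self-loop)
    return walk

# Hypothetical 4-node graph with degrees 3, 2, 3, 2.
graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
walk = mhrw_walk(graph, start=0, steps=20000, rng=random.Random(0))
frequencies = {v: walk.count(v) / len(walk) for v in graph}
print(frequencies)   # each value should be close to 1/4
```

Rejected proposals are kept as repeated samples of the current node. Dropping them would bring back the bias towards high-degree nodes.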
28 Uniform userid Sampling (UNI)
- As a basis for comparison, we collect a uniform sample of Facebook userids (UNI), using rejection sampling on the 32-bit userid space.
- UNI is not a general solution for sampling OSNs: the userid space must not be sparse, and some OSNs use names instead of numbers as identifiers.
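Rejection sampling on an ID space amounts to drawing random IDs and keeping those that belong to real users. The sketch below assumes a hypothetical `is_valid` callback standing in for the existence check (in a real crawl this would be a profile lookup) and uses a small 16-bit toy ID space so that it runs instantly.

```python
import random

def uniform_userid_sample(is_valid, n, rng, id_bits=32):
    """Draw IDs uniformly from [0, 2**id_bits) and keep the ones that exist."""
    sample = []
    while len(sample) < n:
        candidate = rng.getrandbits(id_bits)
        if is_valid(candidate):        # reject IDs that map to no user
            sample.append(candidate)
    return sample

# Toy stand-in for the OSN: every 7th ID in a 16-bit space is a valid user.
valid_ids = range(0, 2**16, 7)
sample = uniform_userid_sample(lambda uid: uid in valid_ids, n=5,
                               rng=random.Random(1), id_bits=16)
print(sample)
```

Every valid ID is equally likely to be accepted, so the accepted IDs form a uniform sample of users. The cost is the expected number of draws per accepted ID, which equals the size of the ID space divided by the number of valid IDs; this is why a sparse userid space makes the method impractical.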
29 Summary of Datasets
Sampling method    MHRW      RW        BFS       UNI
# Valid users      28x81K    28x81K    28x81K    984K
# Unique users     957K      2.19M     2.20M     984K
- Egonets for a subsample of MHRW: local properties of nodes
- Datasets available at:
30 Data Collection: Basic Node Information
- What information do we collect for each sampled node u? UserID, name, networks, and privacy settings of u, plus its friend list; for each friend in the list, again UserID, name, networks, and privacy settings.
- Networks: regional, school/workplace.
- Privacy settings (four flags, e.g. 1111): send message, view friends, profile photo, add as friend.
31 Detecting Convergence: how many samples (iterations) are needed to lose dependence on the starting point?
32 Online Convergence Diagnostics: Geweke
- Detects convergence for a single walk. Let $X$ be a sequence of samples for the metric of interest, with $X_a$ its first 10% and $X_b$ its last 50% ($a = 10\%$, $b = 50\%$).
- $z = \frac{E(X_a) - E(X_b)}{\sqrt{Var(X_a) + Var(X_b)}}$, with convergence when $z \in [-1, 1]$.
- Reference: J. Geweke, Evaluating the accuracy of sampling based approaches to calculate posterior moments, in Bayesian Statistics 4, 1992.
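A direct transcription of the z-score on this slide is sketched below. It takes `Var` literally as the sample variance of each window, which is an assumption; other formulations of the Geweke diagnostic use an estimate of the variance of the window mean instead, which gives larger values of |z|. The function and parameter names are made up for the example.

```python
from statistics import mean, variance

def geweke_z(xs, first=0.1, last=0.5):
    """z = (E(X_a) - E(X_b)) / sqrt(Var(X_a) + Var(X_b)) for one walk.

    X_a is the first `first` fraction of xs, X_b the last `last` fraction.
    Each window needs at least two samples and non-zero total variance.
    """
    a = xs[: int(len(xs) * first)]
    b = xs[len(xs) - int(len(xs) * last):]
    return (mean(a) - mean(b)) / (variance(a) + variance(b)) ** 0.5

stable = [0, 1] * 50            # metric fluctuates around the same level throughout
drifting = list(range(100))     # metric still trending, so the walk has not converged
z_stable = geweke_z(stable)
z_drifting = geweke_z(drifting)
print(z_stable, z_drifting)
```

A value inside [-1, 1] for the stable sequence and well outside it for the drifting one is the behavior the diagnostic relies on: the start and the end of a converged walk should look statistically alike.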
33 Online Convergence Diagnostics: Gelman-Rubin
- Detects convergence for $M > 1$ walks (figure: Walk 1, Walk 2, Walk 3).
- $R = \frac{N-1}{N} + \frac{M+1}{MN} \cdot \frac{B}{W}$, where $B$ is the between-walks variance, $W$ is the within-walks variance, and $N$ is the number of samples in each walk.
- Convergence is declared when $R < 1.02$.
- Reference: A. Gelman, D. Rubin, Inference from iterative simulation using multiple sequences, in Statistical Science, Volume 7, 1992.
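The formula for R can be evaluated with a few lines of standard-library code. The slide names B and W but does not define them, so the sketch below assumes the usual Gelman-Rubin definitions for walks of equal length: B is N times the sample variance of the per-walk means, and W is the average of the per-walk sample variances. The function name is hypothetical.

```python
from statistics import mean, variance

def gelman_rubin(walks):
    """R = (N-1)/N + (M+1)/(M*N) * B/W for M equal-length walks of N samples.

    Assumed definitions: B = N * variance of the walk means (between walks),
    W = mean of the per-walk variances (within walks). Requires M >= 2, N >= 2.
    """
    m = len(walks)
    n = len(walks[0])
    walk_means = [mean(walk) for walk in walks]
    b = n * variance(walk_means)
    w = mean([variance(walk) for walk in walks])
    return (n - 1) / n + (m + 1) / (m * n) * b / w

agreeing = [[0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1]]   # walks explore the same values
separated = [[0, 1, 0, 1], [10, 11, 10, 11]]            # walks stuck in different regions
r_agreeing = gelman_rubin(agreeing)
r_separated = gelman_rubin(separated)
print(r_agreeing, r_separated)
```

When the walks agree, B is small relative to W and R stays near or below 1; when they sit in different regions, B dominates and R is far above the 1.02 threshold.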
34 When do we reach equilibrium? (Node degree.) Burn-in determined to be 3K.
35 Methods Comparison: Node Degree (28 crawls)
- Poor performance for BFS, RW.
- MHRW, RWRW produce good estimates, both per chain and overall.
36 Sampling Bias: BFS
- Low-degree nodes are underrepresented by two orders of magnitude.
- BFS is biased.
37 Sampling Bias: MHRW, RW, RWRW
- Degree distribution identical to UNI (MHRW, RWRW).
- RW is as biased as BFS but with smaller variance in each walk.
38 Practical Recommendations for Sampling Methods
- Use MHRW or RWRW. Do not use BFS or RW.
- Use formal convergence diagnostics: assess convergence online; use multiple parallel walks.
- MHRW vs RWRW: RWRW has slightly better performance; MHRW provides a ready-to-use sample.
39 FB Social Graph: Degree Distribution
- Power law distribution: $f(k) = b \cdot k^{-a}$, with exponents $a_1 = 1.32$ and $a_2 = 3.38$ shown on the plot.
- The degree distribution is not a power law.
40 Outline
- Motivation and Problem Statement
- Sampling Methodology
- Data Analysis
- Conclusion
41 Any Comments & Critiques?
42 Next Week
- 5 team presentations
- 30 minutes for each team, including the presentation and Q&A
- We will have snacks and soft drinks
43 Project 2 starts next week
- KDD cup 2018
- Real data, worldwide competition
- Will be announced March 1st, 2018
- Define your own project or KDD cup
- view/announcing-kdd-cup highway-tollgates-traffic-flowprediction
Data-Intensive Computing with MapReduce Session 5: Graph Processing Jimmy Lin University of Maryland Thursday, February 21, 2013 This work is licensed under a Creative Commons Attribution-Noncommercial-Share
More informationUniversity of Maryland. Tuesday, March 2, 2010
Data-Intensive Information Processing Applications Session #5 Graph Algorithms Jimmy Lin University of Maryland Tuesday, March 2, 2010 This work is licensed under a Creative Commons Attribution-Noncommercial-Share
More informationFramework and Algorithms for Network Bucket Testing
Framework and Algorithms for Network Bucket Testing Liran Katzir Yahoo! Labs., Haifa, Israel lirank@yahoo-inc.com Edo Liberty Yahoo! Labs., Haifa, Israel edo@yahoo-inc.com Oren Somekh Yahoo! Labs., Haifa,
More informationRecommendation System for Location-based Social Network CS224W Project Report
Recommendation System for Location-based Social Network CS224W Project Report Group 42, Yiying Cheng, Yangru Fang, Yongqing Yuan 1 Introduction With the rapid development of mobile devices and wireless
More informationBiological Networks Analysis
Biological Networks Analysis Introduction and Dijkstra s algorithm Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein The clustering problem: partition genes into distinct
More informationCSE 158 Lecture 6. Web Mining and Recommender Systems. Community Detection
CSE 158 Lecture 6 Web Mining and Recommender Systems Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:
More informationAlgorithmic and Economic Aspects of Networks. Nicole Immorlica
Algorithmic and Economic Aspects of Networks Nicole Immorlica Syllabus 1. Jan. 8 th (today): Graph theory, network structure 2. Jan. 15 th : Random graphs, probabilistic network formation 3. Jan. 20 th
More informationTheory of Computing. Lecture 10 MAS 714 Hartmut Klauck
Theory of Computing Lecture 10 MAS 714 Hartmut Klauck Seven Bridges of Königsberg Can one take a walk that crosses each bridge exactly once? Seven Bridges of Königsberg Model as a graph Is there a path
More informationDegree Ranking Using Local Information
Degree Ranking Using Local Information arxiv:1706.01205v2 [cs.si] 10 Jun 2017 Akrati Saxena* akrati.saxena@iitrpr.ac.in S. R. S. Iyengar* sudarshan@iitrpr.ac.in Ralucca Gera** rgera@nps.edu *Department
More informationJure Leskovec Machine Learning Department Carnegie Mellon University
Jure Leskovec Machine Learning Department Carnegie Mellon University Currently: Soon: Today: Large on line systems have detailed records of human activity On line communities: Facebook (64 million users,
More informationCSE 316: SOCIAL NETWORK ANALYSIS INTRODUCTION. Fall 2017 Marion Neumann
CSE 316: SOCIAL NETWORK ANALYSIS Fall 2017 Marion Neumann INTRODUCTION Contents in these slides may be subject to copyright. Some materials are adopted from: http://www.cs.cornell.edu/home /kleinber/ networks-book,
More informationEpilog: Further Topics
Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Epilog: Further Topics Lecture: Prof. Dr. Thomas
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING Recommender Systems II Instructor: Yizhou Sun yzsun@cs.ucla.edu May 31, 2017 Recommender Systems Recommendation via Information Network Analysis Hybrid Collaborative Filtering
More informationIssues in MCMC use for Bayesian model fitting. Practical Considerations for WinBUGS Users
Practical Considerations for WinBUGS Users Kate Cowles, Ph.D. Department of Statistics and Actuarial Science University of Iowa 22S:138 Lecture 12 Oct. 3, 2003 Issues in MCMC use for Bayesian model fitting
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu HITS (Hypertext Induced Topic Selection) Is a measure of importance of pages or documents, similar to PageRank
More informationTI2736-B Big Data Processing. Claudia Hauff
TI2736-B Big Data Processing Claudia Hauff ti2736b-ewi@tudelft.nl Intro Streams Streams Map Reduce HDFS Pig Ctd. Graphs Pig Design Patterns Hadoop Ctd. Giraph Zoo Keeper Spark Spark Ctd. Learning objectives
More informationUnbiased Sampling over Online Social Networks
Unbiased Sampling over Online Social Networks Mojtaba Torkjazi Department of Computer & Information Science University of Oregon moji@cs.uoreogn.edu ABSTRACT During recent years, Online Social Networks
More informationGraphs (Part II) Shannon Quinn
Graphs (Part II) Shannon Quinn (with thanks to William Cohen and Aapo Kyrola of CMU, and J. Leskovec, A. Rajaraman, and J. Ullman of Stanford University) Parallel Graph Computation Distributed computation
More information