CS246: Mining Massive Datasets Jure Leskovec, Stanford University
|
|
- Tyrone Earl Goodwin
- 5 years ago
- Views:
Transcription
1 CS246: Mining Massive Datasets Jure Leskovec, Stanford University
2 Can we identify node groups? (communities, modules, clusters) 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets Nodes: Football Teams Edges: Games played 2
3 NCAA conferences 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets Nodes: Football Teams Edges: Games played 3
4 Can we identify functional modules? 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets Nodes: Proteins Edges: Physical interactions 4
5 Functional modules 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets Nodes: Proteins Edges: Physical interactions 5
6 Can we identify social communities? 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets Nodes: Facebook Users Edges: Friendships 6
7 Social communities High school Summer internship Stanford (Squash) Stanford (Basketball) 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets Nodes: Facebook Users Edges: Friendships 7
8 Non-overlapping vs. overlapping communities 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 8
9 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 9 Nodes Nodes Network Adjacency matrix
10 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 10 What is the structure of community overlaps: Edge density in the overlaps is higher! Communities as tiles
11 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 11 Communities in a network This is what we want!
12 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 12 1) Given a model, we generate the network: Generative model for networks A C B D E F G 2) Given a network, find the best model H A C B D E H F G Generative model for networks
13 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 13 Goal: Define a model that can generate networks The model will have a set of parameters that we will later want to estimate (and detect communities) Generative model for networks A C B D E F G Q: Given a set of nodes, how do communities generate edges of the network? H
14 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 14 Communities, C p A p B Model Memberships, M Nodes, V Model Network Generative model B(V, C, M, {p c }) for graphs: Nodes V, Communities C, Memberships M Each community c has a single probability p c Later we fit the model to networks to detect communities
15 Communities, C Memberships, M p A p B Model Nodes, V Community Affiliations AGM generates the links: For each For each pair of nodes in community AA, we connect them with prob. pp AA The overall edge probability is: P ( u, v) = 1 (1 ) p c c M u M v If uu, vv share no communities: PP uu, vv = εε Network MM uu set of communities node uu belongs to Think of this as an OR function: If at least 1 community says YES we create an edge 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 15
16 Network 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 16 Model
17 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 17 AGM can express a variety of community structures: Non-overlapping, Overlapping, Nested
18 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 18 Detecting communities with AGM: A C B D E F G H Given a Graph, find the Model 1) Affiliation graph M 2) Number of communities C 3) Parameters p c
19 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 19 Maximum Likelihood Principle (MLE): Given: Data XX Assumption: Data is generated by some model ff(θθ) ff model ΘΘ model parameters Want to estimate PP ff XX ΘΘ): The probability that our model ff (with parameters ΘΘ) generated the data Now let s find the most likely model that could have generated the data: arg max PP ff XX ΘΘ) Θ
20 Imagine we are given a set of coin flips Task: Figure the bias of a coin! Data: Sequence of coin flips: XX = [11, 00, 00, 00, 11, 00, 00, 11] Model: ff ΘΘ = return 1 with prob. Θ, else return 0 What is PP ff XX ΘΘ? Assuming coin flips are independent So, PP ff XX ΘΘ = PP ff 00 ΘΘ PP ff 11 ΘΘ PP ff 11 ΘΘ PP ff 00 ΘΘ What is PP ff 11 ΘΘ? Simple, PP ff 00 ΘΘ = ΘΘ Then, PP ff XX ΘΘ = ΘΘ ΘΘ 55 For example: PP ff XX ΘΘ = = PP ff XX ΘΘ = = What did we learn? Our data was most likely generated by coin with bias ΘΘ = 33/88 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 20 PP ff XX ΘΘ ΘΘ = 33/88 ΘΘ
21 How do we do MLE for graphs? Model generates a probabilistic adjacency matrix We then flip all the entries of the probabilistic matrix to obtain the binary adjacency matrix For every pair of nodes uu, vv AGM gives the prob. pp uuuu of them being linked Flip biased coins The likelihood of AGM generating graph G: P( G Θ) = Π ( u, v) G P( u, v) ( u, v) G (1 P( u, v)) 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 21 Π
22 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 22 Given graph G and Θ we calculate likelihood that Θ generated G: P(G Θ) G Θ=B(V, C, M, {p c }) P(G Θ) G P( G Θ) = Π ( u, v) G P( u, v) Π ( u, v) G (1 P( u, v))
23 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 23 Our goal: Find ΘΘ = BB(VV, CC, MM, pp CC ) such that: arg max Θ P( GG ) AGM Θ How do we find BB(VV, CC, MM, pp CC ) that maximizes the likelihood? No nice way, have to search over all affiliation graphs But, don t despair
24 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 24 Relaxation: Memberships have strengths FF uuuu u v FF uuuu : The membership strength of node uu to community AA (FF uuuu = 00: no membership) Each community AA links nodes independently: PP AA uu, vv = 11 eeeeee ( FF uuaa FF vvaa )
25 j Community membership strength matrix FF Nodes FF = Communities PP AA uu, vv = 11 eeeeee ( FF uuuu FF vvvv ) Probability of connection is proportional to the product of strengths Prob. that at least one common community CC links the nodes: PP uu, vv = 11 CC 11 PP CC uu, vv FF uuuu strength of uu s membership to AA FF vv vector of community membership strengths of uu 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 25
26 FF uu : FF vv : FF ww : Community AA links nodes uu, vv independently: PP AA uu, vv = 11 eeeeee ( FF uuaa FF vvaa ) Then prob. at least one common CC links them: PP uu, vv = 11 CC 11 PP CC uu, vv = 11 eeeeee ( CC FF uuuu FF vvvv ) = 11 eeeeee ( FF uu FF TT vv ) For example: Node community membership strengths 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets Then: FF uu FF vv TT = And: PP uu, vv = 11 eeeeee = But: PP uu, ww = PP vv, ww = 00 26
27 Another way to think about this is: Think of a weighted graph (but we don t see the weights) uu, vv interact with strength XX AA uuuu ~PPPPPPPP(FF uuaa FF vvaa ) Total interaction strength is the sum: XX uuuu = CC CC XX uuuu Then: XX uuuu ~PPPPPPPP( CC FF uuuu FF vvvv ) We observe an edge if there is non-zero interaction: PP XX uuuu > 0 = 1 PP XX uuuu = 0 = = 11 eeeeee ( FF uu FF TT vv ) 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets Poisson PMF: PP(XX = xx) = λλkk kk! ee λλ 27
28 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets [wsdm 13] 28 Task: Given a network, estimate FF Find FF that maximizes the log-likelihood Many times we take the logarithm of the likelihood and call it log-likelihood: ll FF = log PP(GG FF) Objective function is biconvex Non-negative matrix factorization: Update FF uuuu for node uu while fixing the memberships of all other nodes
29 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 29 Task: Given a network, estimate FF Find FF that maximizes the log-likelihood Many times we take the logarithm of the likelihood and call it log-likelihood: ll FF = llllll PP(GG FF) Connection: Non-negative matrix factorization We want to factor the adjacency matrix G 11 eeeeee (. ) = F F T Matrix of edge probs. Graph G
30 Non-negative matrix factorization: Use a gradient based approach! But updating the whole FF takes too long Computing the gradient takes quadratic time! Our network are far too big to allow for that: Network with 1M nodes, can have different edges Instead, update FF in small units : Update FF uuuu for node uu while fixing the memberships of all other nodes 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 30
31 This is slow! Computing FF uu takes linear time! 31 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets Compute gradient of a single row FF uu of FF: Coordinate gradient ascent: Iterate over the rows of FF: Compute gradient FF uu of row uu (while keeping others fixed) Update the row FF uu : FF uu FF uu + ηη ll(ff uu ) Project FF uu back to a non-negative vector: If FF uuuu < 00: FF uuuu = 00
32 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 32 However, we notice: We can cache vv FF vv So, computing vv NN(uu) FF vv now takes linear time in the degree NN(uu) of uu In networks degree of a node is much smaller to the total number of nodes in the network, so this is a significant speedup!
33 BigCLAM takes 5 minutes for 300k node nets Other methods take 10 days Can process networks with 100M edges! 33
34 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 34
35 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 35 Extension: Make community membership edges directed: Outgoing membership: Nodes sends edges Incoming membership: Node receives edges
36 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 36
37 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 37 Everything is almost the same except now we have 2 matrices: FF and HH FF out-going community memberships HH in-coming community memberships FF HH Edge prob.: PP uu, vv = 11 eeeeee( FF uu HH vv TT ) Network log-likelihood: which we optimize the same way as before
38 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 38
39 2/13/2014 Jure Leskovec, Stanford C246: Mining Massive Datasets 39 Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach by J. Yang, J. Leskovec. ACM International Conference on Web Search and Data Mining (WSDM), Detecting Cohesive and 2-mode Communities in Directed and Undirected Networks by J. Yang, J. McAuley, J. Leskovec. ACM International Conference on Web Search and Data Mining (WSDM), Community Detection in Networks with Node Attributes by J. Yang, J. McAuley, J. Leskovec. IEEE International Conference On Data Mining (ICDM), 2013.
Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University
Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit
More informationOnline Social Networks and Media. Community detection
Online Social Networks and Media Community detection 1 Notes on Homework 1 1. You should write your own code for generating the graphs. You may use SNAP graph primitives (e.g., add node/edge) 2. For the
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu SPAM FARMING 2/11/2013 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 2/11/2013 Jure Leskovec, Stanford
More informationCS224W: Analysis of Networks Jure Leskovec, Stanford University
CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 2 Observations Models
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu HITS (Hypertext Induced Topic Selection) Is a measure of importance of pages or documents, similar to PageRank
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu Setting from the last class: AB-A : gets a AB-B : gets b AB-AB : gets max(a, b) Also: Cost
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/24/2014 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 2 High dim. data
More informationGaussian Mixture Models For Clustering Data. Soft Clustering and the EM Algorithm
Gaussian Mixture Models For Clustering Data Soft Clustering and the EM Algorithm K-Means Clustering Input: Observations: xx ii R dd ii {1,., NN} Number of Clusters: kk Output: Cluster Assignments. Cluster
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu Web advertising We discussed how to match advertisers to queries in real-time But we did not discuss how to estimate
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS46: Mining Massive Datasets Jure Leskovec, Stanford University http://cs46.stanford.edu /7/ Jure Leskovec, Stanford C46: Mining Massive Datasets Many real-world problems Web Search and Text Mining Billions
More informationCSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection
CSE 255 Lecture 6 Data Mining and Predictive Analytics Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:
More informationGraphs (Part II) Shannon Quinn
Graphs (Part II) Shannon Quinn (with thanks to William Cohen and Aapo Kyrola of CMU, and J. Leskovec, A. Rajaraman, and J. Ullman of Stanford University) Parallel Graph Computation Distributed computation
More informationCS6716 Pattern Recognition
CS6716 Pattern Recognition Prototype Methods Aaron Bobick School of Interactive Computing Administrivia Problem 2b was extended to March 25. Done? PS3 will be out this real soon (tonight) due April 10.
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 3/6/2012 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 2 In many data mining
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/25/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 3 In many data mining
More informationMining Social Network Graphs
Mining Social Network Graphs Analysis of Large Graphs: Community Detection Rafael Ferreira da Silva rafsilva@isi.edu http://rafaelsilva.com Note to other teachers and users of these slides: We would be
More informationCSE 258 Lecture 6. Web Mining and Recommender Systems. Community Detection
CSE 258 Lecture 6 Web Mining and Recommender Systems Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:
More informationCSE 158 Lecture 6. Web Mining and Recommender Systems. Community Detection
CSE 158 Lecture 6 Web Mining and Recommender Systems Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:
More informationMulti-view object segmentation in space and time. Abdelaziz Djelouah, Jean Sebastien Franco, Edmond Boyer
Multi-view object segmentation in space and time Abdelaziz Djelouah, Jean Sebastien Franco, Edmond Boyer Outline Addressed problem Method Results and Conclusion Outline Addressed problem Method Results
More information1.2 Round-off Errors and Computer Arithmetic
1.2 Round-off Errors and Computer Arithmetic 1 In a computer model, a memory storage unit word is used to store a number. A word has only a finite number of bits. These facts imply: 1. Only a small set
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu /2/8 Jure Leskovec, Stanford CS246: Mining Massive Datasets 2 Task: Given a large number (N in the millions or
More informationThe ABC s of Web Site Evaluation
Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy Zz The ABC s of Web Site Evaluation by Kathy Schrock Digital Literacy by Paul Gilster Digital literacy is the ability to understand
More informationMixture Models and the EM Algorithm
Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Finite Mixture Models Say we have a data set D = {x 1,..., x N } where x i is
More informationLarge Scale Data Analysis Using Deep Learning
Large Scale Data Analysis Using Deep Learning Machine Learning Basics - 1 U Kang Seoul National University U Kang 1 In This Lecture Overview of Machine Learning Capacity, overfitting, and underfitting
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
We need your help with our research on human interpretable machine learning. Please complete a survey at http://stanford.io/1wpokco. It should be fun and take about 1min to complete. Thanks a lot for your
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Clustering and EM Barnabás Póczos & Aarti Singh Contents Clustering K-means Mixture of Gaussians Expectation Maximization Variational Methods 2 Clustering 3 K-
More informationSequence alignment is an essential concept for bioinformatics, as most of our data analysis and interpretation techniques make use of it.
Sequence Alignments Overview Sequence alignment is an essential concept for bioinformatics, as most of our data analysis and interpretation techniques make use of it. Sequence alignment means arranging
More informationTraining Restricted Boltzmann Machines using Approximations to the Likelihood Gradient. Ali Mirzapour Paper Presentation - Deep Learning March 7 th
Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient Ali Mirzapour Paper Presentation - Deep Learning March 7 th 1 Outline of the Presentation Restricted Boltzmann Machine
More informationClassify My Social Contacts into Circles Stanford University CS224W Fall 2014
Classify My Social Contacts into Circles Stanford University CS224W Fall 2014 Amer Hammudi (SUNet ID: ahammudi) ahammudi@stanford.edu Darren Koh (SUNet: dtkoh) dtkoh@stanford.edu Jia Li (SUNet: jli14)
More informationT10/05-233r2 SAT - ATA Errors to SCSI Errors - Translation Map
To: T10 Technical Committee From: Wayne Bellamy (wayne.bellamy@hp.com), Hewlett Packard Date: June 30, 2005 Subject: T10/05-233r2 SAT - s to Errors - Translation Map Revision History Revision 0 (June 08,
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 3/12/2014 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 2 3/12/2014 Jure
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec Stanford University Jure Leskovec, Stanford University http://cs224w.stanford.edu [Morris 2000] Based on 2 player coordination game 2 players
More informationWisconsin Retirement Testing Preparation
Wisconsin Retirement Testing Preparation The Wisconsin Retirement System (WRS) is changing its reporting requirements from annual to every pay period starting January 1, 2018. With that, there are many
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu [Kumar et al. 99] 2/13/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu 10/10/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
More informationVisit MathNation.com or search "Math Nation" in your phone or tablet's app store to watch the videos that go along with this workbook!
Topic 1: Introduction to Angles - Part 1... 47 Topic 2: Introduction to Angles Part 2... 50 Topic 3: Angle Pairs Part 1... 53 Topic 4: Angle Pairs Part 2... 56 Topic 5: Special Types of Angle Pairs Formed
More informationCS Introduction to Data Mining Instructor: Abdullah Mueen
CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 8: ADVANCED CLUSTERING (FUZZY AND CO -CLUSTERING) Review: Basic Cluster Analysis Methods (Chap. 10) Cluster Analysis: Basic Concepts
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu April 24, 2017 Homework 2 out Announcements Due May 3 rd (11:59pm) Course project proposal
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS6: Mining Massive Datasets Jure Leskovec, Stanford University http://cs6.stanford.edu Training data 00 million ratings, 80,000 users, 7,770 movies 6 years of data: 000 00 Test data Last few ratings of
More informationnode2vec: Scalable Feature Learning for Networks
node2vec: Scalable Feature Learning for Networks A paper by Aditya Grover and Jure Leskovec, presented at Knowledge Discovery and Data Mining 16. 11/27/2018 Presented by: Dharvi Verma CS 848: Graph Database
More informationVisual Identity Guidelines. Abbreviated for Constituent Leagues
Visual Identity Guidelines Abbreviated for Constituent Leagues 1 Constituent League Logo The logo is available in a horizontal and vertical format. Either can be used depending on the best fit for a particular
More informationK-Means and Gaussian Mixture Models
K-Means and Gaussian Mixture Models David Rosenberg New York University June 15, 2015 David Rosenberg (New York University) DS-GA 1003 June 15, 2015 1 / 43 K-Means Clustering Example: Old Faithful Geyser
More informationThe vertex set is a finite nonempty set. The edge set may be empty, but otherwise its elements are two-element subsets of the vertex set.
Math 3336 Section 10.2 Graph terminology and Special Types of Graphs Definition: A graph is an object consisting of two sets called its vertex set and its edge set. The vertex set is a finite nonempty
More informationClustering Lecture 5: Mixture Model
Clustering Lecture 5: Mixture Model Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced topics
More informationLearning Undirected Models with Missing Data
Learning Undirected Models with Missing Data Sargur Srihari srihari@cedar.buffalo.edu 1 Topics Log-linear form of Markov Network The missing data parameter estimation problem Methods for missing data:
More informationNon Overlapping Communities
Non Overlapping Communities Davide Mottin, Konstantina Lazaridou HassoPlattner Institute Graph Mining course Winter Semester 2016 Acknowledgements Most of this lecture is taken from: http://web.stanford.edu/class/cs224w/slides
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS6: Mining Massive Datasets Jure Leskovec, Stanford University http://cs6.stanford.edu /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets Training data 100 million ratings, 80,000 users, 17,770
More informationHow to organize the Web?
How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second try: Web Search Information Retrieval attempts to find relevant docs in a small and trusted set Newspaper
More informationBRAND STANDARD GUIDELINES 2014
BRAND STANDARD GUIDELINES 2014 LOGO USAGE & TYPEFACES Logo Usage The Lackawanna College School of Petroleum & Natural Gas logo utilizes typography, two simple rule lines and the Lackawanna College graphic
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second
More informationMath 96--Radicals #1-- Simplify; Combine--page 1
Simplify; Combine--page 1 Part A Number Systems a. Whole Numbers = {0, 1, 2, 3,...} b. Integers = whole numbers and their opposites = {..., 3, 2, 1, 0, 1, 2, 3,...} c. Rational Numbers = quotient of integers
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Queries on streams
More informationLesson 12: Angles Associated with Parallel Lines
Lesson 12 Lesson 12: Angles Associated with Parallel Lines Classwork Exploratory Challenge 1 In the figure below, LL 1 is not parallel to LL 2, and mm is a transversal. Use a protractor to measure angles
More informationCS281 Section 9: Graph Models and Practical MCMC
CS281 Section 9: Graph Models and Practical MCMC Scott Linderman November 11, 213 Now that we have a few MCMC inference algorithms in our toolbox, let s try them out on some random graph models. Graphs
More informationLogistic Regression and Gradient Ascent
Logistic Regression and Gradient Ascent CS 349-02 (Machine Learning) April 0, 207 The perceptron algorithm has a couple of issues: () the predictions have no probabilistic interpretation or confidence
More informationLearning to Discover Social Circles in Ego Networks
Learning to Discover Social Circles in Ego Networks Julian McAuley Stanford jmcauley@cs.stanford.edu Jure Leskovec Stanford jure@cs.stanford.edu Abstract Our personal social networks are big and cluttered,
More informationSocial-Network Graphs
Social-Network Graphs Mining Social Networks Facebook, Google+, Twitter Email Networks, Collaboration Networks Identify communities Similar to clustering Communities usually overlap Identify similarities
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More informationBRANDING AND STYLE GUIDELINES
BRANDING AND STYLE GUIDELINES INTRODUCTION The Dodd family brand is designed for clarity of communication and consistency within departments. Bold colors and photographs are set on simple and clean backdrops
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationrecruitment Logo Typography Colourways Mechanism Usage Pip Recruitment Brand Toolkit
Logo Typography Colourways Mechanism Usage Primary; Secondary; Silhouette; Favicon; Additional Notes; Where possible, use the logo with the striped mechanism behind. Only when it is required to be stripped
More informationOverlapping Communities
Yangyang Hou, Mu Wang, Yongyang Yu Purdue Univiersity Department of Computer Science April 25, 2013 Overview Datasets Algorithm I Algorithm II Algorithm III Evaluation Overview Graph models of many real
More informationIntroduction to Deep Learning
ENEE698A : Machine Learning Seminar Introduction to Deep Learning Raviteja Vemulapalli Image credit: [LeCun 1998] Resources Unsupervised feature learning and deep learning (UFLDL) tutorial (http://ufldl.stanford.edu/wiki/index.php/ufldl_tutorial)
More informationLink Analysis. Hongning Wang
Link Analysis Hongning Wang CS@UVa Structured v.s. unstructured data Our claim before IR v.s. DB = unstructured data v.s. structured data As a result, we have assumed Document = a sequence of words Query
More informationNatural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu
Natural Language Processing CS 6320 Lecture 6 Neural Language Models Instructor: Sanda Harabagiu In this lecture We shall cover: Deep Neural Models for Natural Language Processing Introduce Feed Forward
More informationInformation Networks: PageRank
Information Networks: PageRank Web Science (VU) (706.716) Elisabeth Lex ISDS, TU Graz June 18, 2018 Elisabeth Lex (ISDS, TU Graz) Links June 18, 2018 1 / 38 Repetition Information Networks Shape of the
More informationExpectation Maximization: Inferring model parameters and class labels
Expectation Maximization: Inferring model parameters and class labels Emily Fox University of Washington February 27, 2017 Mixture of Gaussian recap 1 2/27/2017 Jumble of unlabeled images HISTOGRAM blue
More informationGraph Mining: Overview of different graph models
Graph Mining: Overview of different graph models Davide Mottin, Konstantina Lazaridou Hasso Plattner Institute Graph Mining course Winter Semester 2016 Lecture road Anomaly detection (previous lecture)
More informationSimilar Polygons Date: Per:
Math 2 Unit 6 Worksheet 1 Name: Similar Polygons Date: Per: [1-2] List the pairs of congruent angles and the extended proportion that relates the corresponding sides for the similar polygons. 1. AA BB
More informationBrand Guidelines October, 2014
Brand Guidelines October, 2014 Contents 1 Logo 2 Graphical Elements 3 Icons 4 Typography 5 Colors 6 Stationery 7 Social Media 8 Templates 9 Product Line logos Brand Guidelines Page 2 1) Logo Logo: Overview
More informationSection 6: Triangles Part 1
Section 6: Triangles Part 1 Topic 1: Introduction to Triangles Part 1... 125 Topic 2: Introduction to Triangles Part 2... 127 Topic 3: rea and Perimeter in the Coordinate Plane Part 1... 130 Topic 4: rea
More informationNaïve Bayes Classification. Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others
Naïve Bayes Classification Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others Things We d Like to Do Spam Classification Given an email, predict
More informationLecture 1.3 Basic projective geometry. Thomas Opsahl
Lecture 1.3 Basic projective geometr Thomas Opsahl Motivation For the pinhole camera, the correspondence between observed 3D points in the world and D points in the captured image is given b straight lines
More informationDS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: AK232 Fall 2016 Graph Data: Social Networks Facebook social graph 4-degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna,
More informationCS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 10: Learning with Partially Observed Data Theo Rekatsinas 1 Partially Observed GMs Speech recognition 2 Partially Observed GMs Evolution 3 Partially Observed
More informationPracticeAdmin Identity Guide. Last Updated 4/27/2015 Created by Vanessa Street
PracticeAdmin Identity Guide Last Updated 4/27/2015 Created by Vanessa Street About PracticeAdmin Mission At PracticeAdmin, we simplify the complex process of medical billing by providing healthcare professionals
More informationSegmentation: Clustering, Graph Cut and EM
Segmentation: Clustering, Graph Cut and EM Ying Wu Electrical Engineering and Computer Science Northwestern University, Evanston, IL 60208 yingwu@northwestern.edu http://www.eecs.northwestern.edu/~yingwu
More informationClustering: Classic Methods and Modern Views
Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering
More informationExpectation Maximization. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University
Expectation Maximization Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University April 10 th, 2006 1 Announcements Reminder: Project milestone due Wednesday beginning of class 2 Coordinate
More informationCS3600 SYSTEMS AND NETWORKS
CS3600 SYSTEMS AND NETWORKS NORTHEASTERN UNIVERSITY Lecture 15: Networking overview Prof. (amislove@ccs.neu.edu) What is a network? 2 What is a network? Wikipedia: A telecommunications network is a network
More informationNote Set 4: Finite Mixture Models and the EM Algorithm
Note Set 4: Finite Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine Finite Mixture Models A finite mixture model with K components, for
More informationDatabases 2 (VU) ( )
Databases 2 (VU) (707.030) Map-Reduce Denis Helic KMI, TU Graz Nov 4, 2013 Denis Helic (KMI, TU Graz) Map-Reduce Nov 4, 2013 1 / 90 Outline 1 Motivation 2 Large Scale Computation 3 Map-Reduce 4 Environment
More informationHow to Register for Summer Camp. A Tutorial
How to Register for Summer Camp A Tutorial 1. Upon arriving at our website (https://flightcamp.ou.edu/), the very first step is logging in. Please click the Login link in the top left corner of the page
More informationRandom Graph Model; parameterization 2
Agenda Random Graphs Recap giant component and small world statistics problems: degree distribution and triangles Recall that a graph G = (V, E) consists of a set of vertices V and a set of edges E V V.
More information10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors
Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple
More informationMulticast Tree Aggregation in Large Domains
Multicast Tree Aggregation in Large Domains Joanna Moulierac 1 Alexandre Guitton 2 and Miklós Molnár 1 IRISA/University of Rennes I 502 Rennes France 2 Birkbeck College University of London England IRISA/INSA
More informationMedia Kit & Brand Guidelines
Media Kit & Brand Guidelines Generation XYZ Generation X 1960-1980 Generation Y 1977-2004 Generation Z Early 2001 - Present Named after Douglas Coupland s novel Generation X: Tales from an Accelerated
More informationCOMPUTER GRAPHICS COURSE. LuxRender. Light Transport Foundations
COMPUTER GRAPHICS COURSE LuxRender Light Transport Foundations Georgios Papaioannou - 2015 Light Transport Light is emitted at the light sources and scattered around a 3D environment in a practically infinite
More informationEureka Math. Grade 7, Module 6. Student File_A. Contains copy-ready classwork and homework
A Story of Ratios Eureka Math Grade 7, Module 6 Student File_A Contains copy-ready classwork and homework Published by the non-profit Great Minds. Copyright 2015 Great Minds. No part of this work may be
More informationDATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm
DATA MINING LECTURE 7 Hierarchical Clustering, DBSCAN The EM Algorithm CLUSTERING What is a Clustering? In general a grouping of objects such that the objects in a group (cluster) are similar (or related)
More informationParameter Estimation. Learning From Data: MLE. Parameter Estimation. Likelihood. Maximum Likelihood Parameter Estimation. Likelihood Function 12/1/16
Learning From Data: MLE Maximum Estimators Common approach in statistics: use a parametric model of data: Assume data set: Bin(n, p), Poisson( ), N(µ, exp( ) Uniform(a, b) 2 ) But parameters are unknown!!!
More informationIBL and clustering. Relationship of IBL with CBR
IBL and clustering Distance based methods IBL and knn Clustering Distance based and hierarchical Probability-based Expectation Maximization (EM) Relationship of IBL with CBR + uses previously processed
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second
More informationLogis&c Regression. Aar$ Singh & Barnabas Poczos. Machine Learning / Jan 28, 2014
Logis&c Regression Aar$ Singh & Barnabas Poczos Machine Learning 10-701/15-781 Jan 28, 2014 Linear Regression & Linear Classifica&on Weight Height Linear fit Linear decision boundary 2 Naïve Bayes Recap
More informationConsistency Theory in Machine Learning
Consistency Theory in Machine Learning Wei Gao ( 高尉 ) Learning And Mining from DatA (LAMDA) National Key Laboratory for Novel Software Technology Nanjing University Machine learning Training data learn
More informationA Joint Congestion Control, Routing, and Scheduling Algorithm in Multihop Wireless Networks with Heterogeneous Flows
1 A Joint Congestion Control, Routing, and Scheduling Algorithm in Multihop Wireless Networks with Heterogeneous Flows Phuong Luu Vo, Nguyen H. Tran, Choong Seon Hong, *KiJoon Chae Kyung H University,
More informationTranont Mission Statement. Tranont Vision Statement. Change the world s economy, one household at a time.
STYLE GUIDE Tranont Mission Statement Change the world s economy, one household at a time. Tranont Vision Statement We offer individuals world class financial education and training, financial management
More informationDS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li Time: 6:00pm 8:50pm Thu Location: AK 232 Fall 2016 High Dimensional Data v Given a cloud of data points we want to understand
More informationLecture 11: E-M and MeanShift. CAP 5415 Fall 2007
Lecture 11: E-M and MeanShift CAP 5415 Fall 2007 Review on Segmentation by Clustering Each Pixel Data Vector Example (From Comanciu and Meer) Review of k-means Let's find three clusters in this data These
More information