Data-Intensive Computing with MapReduce
1 Data-Intensive Computing with MapReduce
Session 7: Clustering and Classification
Jimmy Lin, University of Maryland
Thursday, March 7, 2013
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States license. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.
2 Today's Agenda
- Personalized PageRank
- Clustering
- Classification
3 Clustering Source: Wikipedia (Star cluster)
4 Problem Setup
Arrange items into clusters:
- High similarity between objects in the same cluster
- Low similarity between objects in different clusters
Cluster labeling is a separate problem
5 Applications
- Exploratory analysis of large collections of objects
- Image segmentation
- Recommender systems
- Cluster hypothesis in information retrieval
- Computational biology and bioinformatics
- Pre-processing for many other algorithms
6 Three Approaches
- Hierarchical
- K-Means
- Gaussian Mixture Models
7 Hierarchical Agglomerative Clustering
Start with each document in its own cluster.
Until there is only one cluster:
- Find the two clusters c_i and c_j that are most similar
- Replace c_i and c_j with a single cluster c_i \cup c_j
The history of merges forms the hierarchy
8 HAC in Action [dendrogram built over items A, B, C, D, E, F, G, H]
9 Cluster Merging
Which two clusters do we merge? What's the similarity between two clusters?
- Single Link: similarity of the two most similar members
- Complete Link: similarity of the two least similar members
- Group Average: average similarity between members
10 Link Functions
Single link uses the maximum similarity of pairs:
sim(c_i, c_j) = \max_{x \in c_i, y \in c_j} sim(x, y)
- Can result in straggly (long and thin) clusters due to a chaining effect
Complete link uses the minimum similarity of pairs:
sim(c_i, c_j) = \min_{x \in c_i, y \in c_j} sim(x, y)
- Makes tighter, more spherical clusters
11 MapReduce Implementation
What's the inherent challenge?
One possible approach:
- Iteratively use LSH to group together similar items
- When the dataset is small enough, run HAC in memory on a single machine
- Observation: structure at the leaves is not very important
12 K-Means Algorithm
Let d be the distance between documents. Define the centroid of a cluster as:
\mu(c) = \frac{1}{|c|} \sum_{x \in c} x
Select k random instances {s_1, s_2, ..., s_k} as seeds.
Until clusters converge:
- Assign each instance x_i to the cluster c_j such that d(x_i, s_j) is minimal
- Update the seeds to the centroid of each cluster: for each cluster c_j, s_j = \mu(c_j)
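The assign/update loop above can be sketched for 1-D points. A fixed iteration count stands in for the convergence check, and all names are illustrative, not from the slides.

```python
# Minimal sketch of the k-means loop for 1-D points; illustrative names only.

def kmeans(points, seeds, iters=20):
    centroids = list(seeds)
    for _ in range(iters):
        # assignment step: each instance goes to its nearest centroid
        clusters = [[] for _ in centroids]
        for x in points:
            j = min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))
            clusters[j].append(x)
        # update step: each seed becomes the centroid (mean) of its cluster
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids
```

With well-separated seeds, e.g. `kmeans([0.0, 1.0, 10.0, 11.0], [0.0, 10.0])`, the centroids settle on the two group means after the first iteration.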
13 K-Means Clustering Example
Pick seeds → Reassign clusters → Compute centroids → Reassign clusters → Compute centroids → Reassign clusters → Converged!
14 Basic MapReduce Implementation
class Mapper
  method Configure()
    c ← LoadClusters()
  method Map(id i, point p)
    n ← NearestClusterID(clusters c, point p)
    p ← ExtendPoint(point p)    // just a clever way to keep track of the denominator
    Emit(clusterid n, point p)
class Reducer
  method Reduce(clusterid n, points [p1, p2, ...])
    s ← InitPointSum()
    for all point p ∈ points do
      s ← s + p
    m ← ComputeCentroid(point s)
    Emit(clusterid n, centroid m)
15 MapReduce Implementation w/ IMC
class Mapper
  method Configure()
    c ← LoadClusters()
    H ← InitAssociativeArray()
  method Map(id i, point p)
    n ← NearestClusterID(clusters c, point p)
    p ← ExtendPoint(point p)
    H{n} ← H{n} + p
  method Close()
    for all clusterid n ∈ H do
      Emit(clusterid n, point H{n})
class Reducer
  method Reduce(clusterid n, points [p1, p2, ...])
    s ← InitPointSum()
    for all point p ∈ points do
      s ← s + p
    m ← ComputeCentroid(point s)
    Emit(clusterid n, centroid m)
16 Implementation Notes
Standard setup of iterative MapReduce algorithms:
- Driver program sets up the MapReduce job
- Waits for completion
- Checks for convergence
- Repeats if necessary
Must be able to keep cluster centroids in memory:
- With large k and large feature spaces, potentially an issue
- Memory requirements of centroids grow over time!
Variant: k-medoids
17 Clustering w/ Gaussian Mixture Models Model data as a mixture of Gaussians Given data, recover model parameters Source: Wikipedia (Cluster analysis)
18 Gaussian Distributions
Univariate Gaussian (i.e., Normal):
p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
A random variable with such a distribution is written x \sim N(\mu, \sigma^2).
Multivariate Gaussian:
p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)
A vector-valued random variable with such a distribution is written x \sim N(\mu, \Sigma).
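The univariate density above is a one-liner to evaluate; this is a direct transcription of the formula (the function name is an assumption), with `var` standing for the variance \sigma^2:

```python
import math

# Univariate Gaussian density p(x; mu, sigma^2), transcribed from the formula.
def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
```

At the mean the density equals 1 / \sqrt{2\pi\sigma^2}, and it falls off monotonically on either side.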
19 Univariate Gaussian Source: Wikipedia (Normal Distribution)
20 Multivariate Gaussians [figure: heatmaps of the density of axis-aligned multivariate Gaussians with mean \mu and diagonal covariance matrix \Sigma]
Source: Lecture notes by Chuong B. Do (IIT Delhi)
21 Gaussian Mixture Models
Model parameters:
- Number of components: K
- Mixing weight vector: \pi_{1:K}
- For each Gaussian, a mean and covariance matrix: \mu_{1:K}, \Sigma_{1:K}
Varying constraints on covariance matrices:
- Spherical vs. diagonal vs. full
- Tied vs. untied
22 Learning for Simple Univariate Case
Problem setup:
- Given the number of components: K
- Given points: x_{1:N}
- Learn parameters: \pi, \mu_{1:K}, \sigma^2_{1:K}
Model selection criterion: maximize the likelihood of the data
- Introduce indicator variables: z_{n,k} = 1 if x_n is in cluster k, 0 otherwise
- Likelihood of the data: p(x_{1:N}, z_{1:N,1:K} \mid \mu_{1:K}, \sigma^2_{1:K}, \pi)
23 EM to the Rescue!
We're faced with this: p(x_{1:N}, z_{1:N,1:K} \mid \mu_{1:K}, \sigma^2_{1:K}, \pi)
- It'd be a lot easier if we knew the z's!
Expectation Maximization:
- Guess the model parameters
- E-step: compute the posterior distribution over the latent (hidden) variables given the model parameters
- M-step: update the model parameters using the posterior distribution computed in the E-step
- Iterate until convergence
24
25 EM for Univariate GMMs
Initialize: \pi, \mu_{1:K}, \sigma^2_{1:K}
Iterate:
- E-step: compute the expectation of the z variables
E[z_{n,k}] = \frac{\pi_k N(x_n \mid \mu_k, \sigma^2_k)}{\sum_{k'} \pi_{k'} N(x_n \mid \mu_{k'}, \sigma^2_{k'})}
- M-step: compute new model parameters
\pi_k = \frac{1}{N} \sum_n E[z_{n,k}]
\mu_k = \frac{\sum_n E[z_{n,k}] x_n}{\sum_n E[z_{n,k}]}
\sigma^2_k = \frac{\sum_n E[z_{n,k}] (x_n - \mu_k)^2}{\sum_n E[z_{n,k}]}
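The E- and M-steps can be sketched for the univariate case. This is an illustrative in-memory sketch, not the deck's code: \pi denotes the mixing weights, a fixed iteration count stands in for a convergence test, and the toy data and initialization are assumptions.

```python
import math

def normal_pdf(x, mu, var):
    # univariate Gaussian density N(x | mu, sigma^2)
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm(xs, pi, mus, sigma2s, iters=50):
    K, N = len(mus), len(xs)
    for _ in range(iters):
        # E-step: responsibilities E[z_{n,k}]
        z = []
        for x in xs:
            w = [pi[k] * normal_pdf(x, mus[k], sigma2s[k]) for k in range(K)]
            total = sum(w)
            z.append([wk / total for wk in w])
        # M-step: re-estimate pi_k, mu_k, sigma^2_k from the responsibilities
        nk = [sum(z[n][k] for n in range(N)) for k in range(K)]
        pi = [nk[k] / N for k in range(K)]
        mus = [sum(z[n][k] * xs[n] for n in range(N)) / nk[k] for k in range(K)]
        sigma2s = [sum(z[n][k] * (xs[n] - mus[k]) ** 2 for n in range(N)) / nk[k]
                   for k in range(K)]
    return pi, mus, sigma2s

# toy data: two well-separated groups around -1 and 4
pi, mus, sigma2s = em_gmm(
    [-1.1, -0.9, -1.0, 3.9, 4.1, 4.0],
    pi=[0.5, 0.5], mus=[-2.0, 2.0], sigma2s=[1.0, 1.0])
```

On this toy data the component means converge to the two group means (-1 and 4) with equal mixing weights.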
26 MapReduce Implementation
Map (E-step): for each point x_n, compute the responsibilities E[z_{n,1}], ..., E[z_{n,K}]:
E[z_{n,k}] = \frac{\pi_k N(x_n \mid \mu_k, \sigma^2_k)}{\sum_{k'} \pi_{k'} N(x_n \mid \mu_{k'}, \sigma^2_{k'})}
Reduce (M-step): recompute the model parameters:
\pi_k = \frac{1}{N} \sum_n E[z_{n,k}]
\mu_k = \frac{\sum_n E[z_{n,k}] x_n}{\sum_n E[z_{n,k}]}
\sigma^2_k = \frac{\sum_n E[z_{n,k}] (x_n - \mu_k)^2}{\sum_n E[z_{n,k}]}
27 K-Means vs. GMMs
Map:
- K-Means: compute the distance of points to centroids
- GMM: E-step, compute the expectation of the z indicator variables
Reduce:
- K-Means: recompute the new centroids
- GMM: M-step, update the values of the model parameters
28 Summary
- Hierarchical clustering: difficult to implement in MapReduce
- K-Means: straightforward implementation in MapReduce
- Gaussian Mixture Models: implementation conceptually similar to k-means, with more bookkeeping
29 Classification Source: Wikipedia (Sorting)
30 Supervised Machine Learning
The generic problem of function induction given sample instances of input and output:
- Classification: output draws from finite discrete labels
- Regression: output is a continuous value
Focus here on supervised classification:
- Suffices to illustrate large-scale machine learning
This is not meant to be an exhaustive treatment of machine learning!
31 Applications
- Spam detection
- Content (e.g., movie) classification
- POS tagging
- Friendship recommendation
- Document ranking
- Many, many more!
32 Supervised Binary Classification
Restrict the output label to be binary:
- Yes/No
- 1/0
Binary classifiers form a primitive building block for multi-class problems:
- One-vs-rest classifier ensembles
- Classifier cascades
33 Limits of Supervised Classification?
Why is this a big data problem?
- Isn't gathering labels a serious bottleneck?
Solution: user behavior logs
- Learning to rank
- Computational advertising
- Link recommendation
The virtuous cycle of data-driven products
34 The Task
Given D = \{(x_i, y_i)\}_{i=1}^n, where x_i = [x_1, x_2, x_3, ..., x_d] is a (sparse) feature vector and y \in \{0, 1\} is the label.
Induce f : X \to Y
- Such that the loss is minimized: \frac{1}{n} \sum_{i=1}^n \ell(f(x_i), y_i)
Typically, consider functions of a parametric form:
\arg\min_\theta \frac{1}{n} \sum_{i=1}^n \ell(f(x_i; \theta), y_i)
where \ell is the loss function and \theta are the model parameters.
35 Key insight: machine learning as an optimization problem! (closed form solutions generally not possible)
36 Gradient Descent: Preliminaries
Rewrite: \arg\min_\theta L(\theta), where L(\theta) = \frac{1}{n} \sum_{i=1}^n \ell(f(x_i; \theta), y_i)
Compute the gradient \nabla L(\theta):
- Points in the direction of fastest increase
So, at any point a, a small step against the gradient decreases the loss:
b = a - \gamma \nabla L(a) \implies L(a) \geq L(b)  (with caveats)
37 Gradient Descent: Iterative Update
Start at an arbitrary point, iteratively update:
\theta^{(t+1)} \leftarrow \theta^{(t)} - \gamma^{(t)} \nabla L(\theta^{(t)})
We have: L(\theta^{(0)}) \geq L(\theta^{(1)}) \geq L(\theta^{(2)}) \geq ...
Lots of details:
- Figuring out the step size
- Getting stuck in local minima
- Convergence rate
- ...
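The iterative update can be sketched on a toy one-dimensional least-squares loss. The loss, data, and fixed step size are illustrative assumptions, not from the slides.

```python
# Batch gradient descent on L(theta) = (1/n) * sum_i (theta * x_i - y_i)^2.

def gradient_descent(xs, ys, theta=0.0, step=0.1, iters=100):
    n = len(xs)
    for _ in range(iters):
        # average gradient of the loss over all training instances
        grad = sum(2 * (theta * x - y) * x for x, y in zip(xs, ys)) / n
        theta -= step * grad  # step against the gradient
    return theta

# data generated by y = 2x, so the minimizer is theta = 2
theta = gradient_descent([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

With this quadratic loss the error shrinks by a constant factor per iteration, so theta converges to 2 quickly; with a step size that is too large, the same loop diverges, which is the "figuring out the step size" detail above.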
38 Gradient Descent
Repeat until convergence:
\theta^{(t+1)} \leftarrow \theta^{(t)} - \gamma^{(t)} \frac{1}{n} \sum_{i=1}^n \nabla \ell(f(x_i; \theta^{(t)}), y_i)
39 Intuition behind the math
\theta^{(t+1)} \leftarrow \theta^{(t)} - \gamma^{(t)} \frac{1}{n} \sum_{i=1}^n \nabla \ell(f(x_i; \theta^{(t)}), y_i)
New weights = old weights, updated based on the average gradient \nabla \ell of the loss \ell (the multivariate analogue of the derivative \frac{d}{dx}\ell).
40
41 Gradient Descent
\theta^{(t+1)} \leftarrow \theta^{(t)} - \gamma^{(t)} \frac{1}{n} \sum_{i=1}^n \nabla \ell(f(x_i; \theta^{(t)}), y_i)
Source: Wikipedia (Hills)
42 Lots More Details
Gradient descent is a first-order optimization technique:
- Often, slow convergence
- Conjugate techniques accelerate convergence
Newton and quasi-Newton methods:
- Intuition: Taylor expansion
f(x + \Delta x) \approx f(x) + f'(x)\,\Delta x + \frac{1}{2} f''(x)\,\Delta x^2
- Requires the Hessian (square matrix of second-order partial derivatives): impractical to fully compute
43 Logistic Regression Source: Wikipedia (Hammer)
44 Logistic Regression: Preliminaries
Given D = \{(x_i, y_i)\}_{i=1}^n, x_i = [x_1, x_2, x_3, ..., x_d], y \in \{0, 1\}
Let's define f(x; w) : R^d \to \{0, 1\} as:
f(x; w) = 1 if w \cdot x \geq t; 0 if w \cdot x < t
Interpretation:
\ln\left[\frac{\Pr(y=1 \mid x)}{\Pr(y=0 \mid x)}\right] = w \cdot x
\ln\left[\frac{\Pr(y=1 \mid x)}{1 - \Pr(y=1 \mid x)}\right] = w \cdot x
45 Relation to the Logistic Function
After some algebra:
\Pr(y=1 \mid x) = \frac{e^{w \cdot x}}{1 + e^{w \cdot x}}
\Pr(y=0 \mid x) = \frac{1}{1 + e^{w \cdot x}}
The logistic function:
f(z) = \frac{e^z}{1 + e^z}
46 Training an LR Classifier
Maximize the conditional likelihood:
\arg\max_w \prod_{i=1}^n \Pr(y_i \mid x_i, w)
Define the objective in terms of the conditional log likelihood:
L(w) = \sum_{i=1}^n \ln \Pr(y_i \mid x_i, w)
- We know y \in \{0, 1\}, so:
\Pr(y \mid x, w) = \Pr(y=1 \mid x, w)^y \left[1 - \Pr(y=1 \mid x, w)\right]^{(1-y)}
- Substituting:
L(w) = \sum_{i=1}^n y_i \ln \Pr(y_i=1 \mid x_i, w) + (1 - y_i) \ln \Pr(y_i=0 \mid x_i, w)
47 LR Classifier Update Rule
Take the derivative:
L(w) = \sum_{i=1}^n y_i \ln \Pr(y_i=1 \mid x_i, w) + (1 - y_i) \ln \Pr(y_i=0 \mid x_i, w)
\frac{\partial}{\partial w_d} L(w) = \sum_{i=1}^n x_{i,d} \left(y_i - \Pr(y_i=1 \mid x_i, w)\right)
General form of the update rule:
w^{(t+1)} \leftarrow w^{(t)} + \gamma^{(t)} \nabla_w L(w^{(t)})
Final update rule:
w_i^{(t+1)} \leftarrow w_i^{(t)} + \gamma^{(t)} \sum_{j=1}^n x_{j,i} \left(y_j - \Pr(y_j=1 \mid x_j, w^{(t)})\right)
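The final update rule can be sketched as batch gradient ascent on the conditional log likelihood. The toy data, step size, and iteration count are assumptions for illustration; note the '+' in the update, since we maximize the likelihood.

```python
import math

def sigmoid(z):
    # Pr(y = 1 | x, w) = e^{w.x} / (1 + e^{w.x}) = 1 / (1 + e^{-w.x})
    return 1.0 / (1.0 + math.exp(-z))

def train_lr(data, d, step=0.5, iters=200):
    w = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(d):
                grad[i] += x[i] * (y - p)  # x_{j,i} * (y_j - Pr(y_j=1 | x_j, w))
        w = [wi + step * gi for wi, gi in zip(w, grad)]  # ascent: '+' not '-'
    return w

# toy separable data with a bias feature x[0] = 1; label flips around x[1] = 0.5
data = [([1.0, -1.0], 0), ([1.0, 0.0], 0), ([1.0, 1.0], 1), ([1.0, 2.0], 1)]
w = train_lr(data, d=2)
```

After training, the learned weights put positive mass on the second feature and classify the training points on the correct side of Pr(y=1|x) = 0.5.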
48 Lots more details Regularization Different loss functions Want more details? Take a real machine-learning course!
49 MapReduce Implementation
\theta^{(t+1)} \leftarrow \theta^{(t)} - \gamma^{(t)} \frac{1}{n} \sum_{i=1}^n \nabla \ell(f(x_i; \theta^{(t)}), y_i)
[diagram: mappers compute partial gradients; a single reducer updates the model; iterate until convergence]
50 Shortcomings
Hadoop is bad at iterative algorithms:
- High job startup costs
- Awkward to retain state across iterations
High sensitivity to skew:
- Iteration speed bounded by the slowest task
Potentially poor cluster utilization:
- Must shuffle all data to a single reducer
Some possible tradeoffs:
- Number of iterations vs. complexity of computation per iteration
- E.g., L-BFGS: faster convergence, but more to compute
51 Gradient Descent Source: Wikipedia (Hills)
52 Stochastic Gradient Descent Source: Wikipedia (Water Slide)
53 Batch vs. Online
Gradient Descent: batch learning, update the model after considering all training instances
\theta^{(t+1)} \leftarrow \theta^{(t)} - \gamma^{(t)} \frac{1}{n} \sum_{i=1}^n \nabla \ell(f(x_i; \theta^{(t)}), y_i)
Stochastic Gradient Descent (SGD): online learning, update the model after considering each (randomly selected) training instance
\theta^{(t+1)} \leftarrow \theta^{(t)} - \gamma^{(t)} \nabla \ell(f(x; \theta^{(t)}), y)
In practice, just as good!
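The online update can be sketched on the same style of toy least-squares loss: shuffle the instances each epoch, then update after each single instance. Data, step size, epoch count, and names are illustrative assumptions.

```python
import random

# SGD on the per-instance loss l(theta) = (theta * x - y)^2.

def sgd(xs, ys, theta=0.0, step=0.05, epochs=50, seed=0):
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # randomly shuffle training instances
        for i in idx:
            # gradient of the loss on ONE instance, not the average over all n
            grad = 2 * (theta * xs[i] - ys[i]) * xs[i]
            theta -= step * grad
    return theta

# data generated by y = 2x, so every per-instance loss is minimized at theta = 2
theta = sgd([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Because every instance here agrees on the same minimizer, SGD converges to theta = 2 exactly; on noisy data the iterates instead hover near the minimizer unless the step size is decayed.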
54 Practical Notes
Most common implementation:
- Randomly shuffle the training instances
- Stream instances through the learner
Single- vs. multi-pass approaches
Mini-batching as a middle ground between batch and stochastic gradient descent
We've solved the iteration problem! What about the single-reducer problem?
55 Ensembles Source: Wikipedia (Orchestra)
56 Ensemble Learning
Learn multiple models, combine results from the different models to make a prediction
Why does it work?
- If errors are uncorrelated, multiple classifiers being wrong is less likely
- Reduces the variance component of error
A variety of different techniques:
- Majority voting
- Simple weighted voting: y = \arg\max_{y \in Y} \sum_{k=1}^n \lambda_k\, p_k(y \mid x)
- Model averaging
- ...
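Simple weighted voting is a one-liner over the component classifiers' posteriors. The toy "classifiers" below are hard-coded scorers and the weights are assumptions, purely for illustration.

```python
# Simple weighted voting: y = argmax_y sum_k lambda_k * p_k(y | x).

def weighted_vote(classifiers, weights, x, labels):
    return max(labels,
               key=lambda y: sum(lam * p_k(y, x)
                                 for lam, p_k in zip(weights, classifiers)))

# three toy scorers returning p_k(y | x) over labels {0, 1}
c1 = lambda y, x: 0.8 if y == 1 else 0.2
c2 = lambda y, x: 0.6 if y == 1 else 0.4
c3 = lambda y, x: 0.3 if y == 1 else 0.7
pred = weighted_vote([c1, c2, c3], [1.0, 1.0, 1.0], x=None, labels=[0, 1])
```

With equal weights, two of the three toy classifiers favor label 1, so the vote goes to 1; upweighting the dissenting classifier flips the prediction, which is how the weights trade off trust among ensemble members.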
57 Practical Notes
Common implementation:
- Train classifiers on different input partitions of the data
- Embarrassingly parallel!
Contrast with bagging
Contrast with boosting
58 MapReduce Implementation
\theta^{(t+1)} \leftarrow \theta^{(t)} - \gamma^{(t)} \nabla \ell(f(x; \theta^{(t)}), y)
[diagram: each mapper trains its own model via SGD on its partition of the training data]
59 MapReduce Implementation: Details
Shuffle/re-sort the training instances before learning
Two possible implementations:
- Mappers write the model out as side data
- Mappers emit the model as intermediate output
60 Sentiment Analysis Case Study
Lin and Kolcz, SIGMOD 2012
Binary polarity classification: {positive, negative} sentiment
- Independently interesting task
- Illustrates the end-to-end flow
- Use the emoticon trick to gather data
Data:
- Test: 500k positive / 500k negative tweets from 9/1/2011
- Training: {1m, 10m, 100m} instances from before (50/50 split)
Features: sliding-window byte-4grams
Models:
- Logistic regression with SGD (L2 regularization)
- Ensembles of various sizes (simple weighted voting)
61 [chart: accuracy vs. number of classifiers in the ensemble, for single classifiers and ensembles trained on 1m, 10m, and 100m instances]
Ensembles with 10m examples are better than a single classifier with 100m! Diminishing returns; extra accuracy "for free."
62 Takeaway Lesson
Big data recipe for problem solving:
- Simple technique
- Simple features
- Lots of data
Usually works very well!
63 Today's Agenda
- Personalized PageRank
- Clustering
- Classification
64 Questions? Source: Wikipedia (Japanese rock garden)
Flat Clustering Slides are mostly from Hinrich Schütze March 7, 07 / 79 Overview Recap Clustering: Introduction 3 Clustering in IR 4 K-means 5 Evaluation 6 How many clusters? / 79 Outline Recap Clustering:
More informationAn Introduction to PDF Estimation and Clustering
Sigmedia, Electronic Engineering Dept., Trinity College, Dublin. 1 An Introduction to PDF Estimation and Clustering David Corrigan corrigad@tcd.ie Electrical and Electronic Engineering Dept., University
More informationAdministrative. Machine learning code. Supervised learning (e.g. classification) Machine learning: Unsupervised learning" BANANAS APPLES
Administrative Machine learning: Unsupervised learning" Assignment 5 out soon David Kauchak cs311 Spring 2013 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture17-clustering.ppt Machine
More informationMultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A
MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. 205-206 Pietro Guccione, PhD DEI - DIPARTIMENTO DI INGEGNERIA ELETTRICA E DELL INFORMAZIONE POLITECNICO DI BARI
More informationAssociation Rule Mining and Clustering
Association Rule Mining and Clustering Lecture Outline: Classification vs. Association Rule Mining vs. Clustering Association Rule Mining Clustering Types of Clusters Clustering Algorithms Hierarchical:
More informationUnsupervised Learning. Pantelis P. Analytis. Introduction. Finding structure in graphs. Clustering analysis. Dimensionality reduction.
March 19, 2018 1 / 40 1 2 3 4 2 / 40 What s unsupervised learning? Most of the data available on the internet do not have labels. How can we make sense of it? 3 / 40 4 / 40 5 / 40 Organizing the web First
More informationCS 2750 Machine Learning. Lecture 19. Clustering. CS 2750 Machine Learning. Clustering. Groups together similar instances in the data sample
Lecture 9 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem: distribute data into k different groups
More informationSupervised vs. Unsupervised Learning
Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now
More informationCPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017
CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2017 Assignment 3: 2 late days to hand in tonight. Admin Assignment 4: Due Friday of next week. Last Time: MAP Estimation MAP
More informationClustering in R d. Clustering. Widely-used clustering methods. The k-means optimization problem CSE 250B
Clustering in R d Clustering CSE 250B Two common uses of clustering: Vector quantization Find a finite set of representatives that provides good coverage of a complex, possibly infinite, high-dimensional
More informationMixture Models and the EM Algorithm
Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Finite Mixture Models Say we have a data set D = {x 1,..., x N } where x i is
More informationCLUSTERING. Quiz information. today 11/14/13% ! The second midterm quiz is on Thursday (11/21) ! In-class (75 minutes!)
CLUSTERING Quiz information! The second midterm quiz is on Thursday (11/21)! In-class (75 minutes!)! Allowed one two-sided (8.5x11) cheat sheet! Solutions for optional problems to HW5 posted today 1% Quiz
More informationCluster Analysis. Jia Li Department of Statistics Penn State University. Summer School in Statistics for Astronomers IV June 9-14, 2008
Cluster Analysis Jia Li Department of Statistics Penn State University Summer School in Statistics for Astronomers IV June 9-1, 8 1 Clustering A basic tool in data mining/pattern recognition: Divide a
More informationk-means demo Administrative Machine learning: Unsupervised learning" Assignment 5 out
Machine learning: Unsupervised learning" David Kauchak cs Spring 0 adapted from: http://www.stanford.edu/class/cs76/handouts/lecture7-clustering.ppt http://www.youtube.com/watch?v=or_-y-eilqo Administrative
More informationClustering. CS294 Practical Machine Learning Junming Yin 10/09/06
Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,
More informationSGN (4 cr) Chapter 11
SGN-41006 (4 cr) Chapter 11 Clustering Jussi Tohka & Jari Niemi Department of Signal Processing Tampere University of Technology February 25, 2014 J. Tohka & J. Niemi (TUT-SGN) SGN-41006 (4 cr) Chapter
More informationCS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample
CS 1675 Introduction to Machine Learning Lecture 18 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem:
More informationRandom Forest A. Fornaser
Random Forest A. Fornaser alberto.fornaser@unitn.it Sources Lecture 15: decision trees, information theory and random forests, Dr. Richard E. Turner Trees and Random Forests, Adele Cutler, Utah State University
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 12 Combining
More informationMachine Learning Department School of Computer Science Carnegie Mellon University. K- Means + GMMs
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University K- Means + GMMs Clustering Readings: Murphy 25.5 Bishop 12.1, 12.3 HTF 14.3.0 Mitchell
More informationColorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.
Professor William Hoff Dept of Electrical Engineering &Computer Science http://inside.mines.edu/~whoff/ 1 Image Segmentation Some material for these slides comes from https://www.csd.uwo.ca/courses/cs4487a/
More informationCS 8520: Artificial Intelligence. Machine Learning 2. Paula Matuszek Fall, CSC 8520 Fall Paula Matuszek
CS 8520: Artificial Intelligence Machine Learning 2 Paula Matuszek Fall, 2015!1 Regression Classifiers We said earlier that the task of a supervised learning system can be viewed as learning a function
More informationNatural Language Processing
Natural Language Processing Machine Learning Potsdam, 26 April 2012 Saeedeh Momtazi Information Systems Group Introduction 2 Machine Learning Field of study that gives computers the ability to learn without
More informationHomework #4 Programming Assignment Due: 11:59 pm, November 4, 2018
CSCI 567, Fall 18 Haipeng Luo Homework #4 Programming Assignment Due: 11:59 pm, ovember 4, 2018 General instructions Your repository will have now a directory P4/. Please do not change the name of this
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering December 4th, 2014 Wolf-Tilo Balke and José Pinto Institut für Informationssysteme Technische Universität Braunschweig The Cluster
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Unsupervised learning Daniel Hennes 29.01.2018 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Supervised learning Regression (linear
More informationCHAPTER 4: CLUSTER ANALYSIS
CHAPTER 4: CLUSTER ANALYSIS WHAT IS CLUSTER ANALYSIS? A cluster is a collection of data-objects similar to one another within the same group & dissimilar to the objects in other groups. Cluster analysis
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval http://informationretrieval.org IIR 6: Flat Clustering Wiltrud Kessler & Hinrich Schütze Institute for Natural Language Processing, University of Stuttgart 0-- / 83
More informationUnsupervised Learning
Networks for Pattern Recognition, 2014 Networks for Single Linkage K-Means Soft DBSCAN PCA Networks for Kohonen Maps Linear Vector Quantization Networks for Problems/Approaches in Machine Learning Supervised
More information