MapReduce ML & Clustering Algorithms


1 MapReduce ML & Clustering Algorithms

2 Reminder MapReduce: A trade-off between ease of use & possible parallelism Graph Algorithms Approaches: Reduce input size (filtering) Graph specific optimizations (Pregel & Giraph) 2

3 Today Machine Learning More filtering -- reducing input size Machine Learning Optimizations & AllReduce Applications: k-means clustering 3

4 Machine Learning Definition: Fitting a function to the data 4

5 Machine Learning Definition: Fitting a function to the data Examples: Classification: Data (x, y) pairs, $x \in \mathbb{R}^d$, $y = \pm 1$. Temperature = 15, Pressure = 755mm, Cloud Cover = 90% : No Rain. Temperature = 17, Pressure = 760mm, Cloud Cover = 75% : No Rain. Temperature = 23, Pressure = 766mm, Cloud Cover = 95% : Rain. Temperature = 19, Pressure = 740mm, Cloud Cover = 100% : ??? 5

6 Machine Learning Definition: Fitting a function to the data Examples: Classification. Regression: Data (x, y) pairs, $x \in \mathbb{R}^d$, $y \in \mathbb{R}$. Temperature = 15, Pressure = 755mm, Cloud Cover = 90% : 1mm Rain. Temperature = 17, Pressure = 760mm, Cloud Cover = 75% : 0mm Rain. Temperature = 23, Pressure = 766mm, Cloud Cover = 95% : 9mm Rain. Temperature = 19, Pressure = 740mm, Cloud Cover = 100% : ??? 6

7 Machine Learning Definition: Fitting a function to the data Examples: Classification. Regression. Clustering: Data $x \in \mathbb{R}^d$. Goal: find a sensible grouping into $k$ groups 7

8 Machine Learning Definition: Fitting a function to the data Examples: Classification. Regression. Clustering. Today: Regression & Clustering 8

9 Regression Data: Aarhus (yesterday) Temperature = 15, Pressure = 755mm, Cloud Cover = 90% : 1mm. Temperature = 17, Pressure = 760mm, Cloud Cover = 75% : 0mm. Temperature = 23, Pressure = 766mm, Cloud Cover = 95% : 9mm. Temperature = 11, Pressure = 740mm, Cloud Cover = 100% : 5mm. Matrix form (columns: Temperature, Pressure, Cloud Cover): $X = \begin{pmatrix} 15 & 755 & 90 \\ 17 & 760 & 75 \\ 23 & 766 & 95 \\ 11 & 740 & 100 \end{pmatrix}$, $y = \begin{pmatrix} 1 \\ 0 \\ 9 \\ 5 \end{pmatrix}$ 9

10 Linear Regression (same $X$, $y$ as above) Approximate $y$ by a linear function of $x$ with weights $\beta$: Example: weight 0.05 on Temperature, 0 on Pressure, 0.1 on Cloud Cover 10

11 Linear Regression (same $X$, $y$ as above) Approximate $y$ by a linear function of $x$ with weights $\beta$: Example: weight 0.05 on Temperature, 0 on Pressure, 0.1 on Cloud Cover. Predictions: $X\beta$. Find the $\beta$ that minimizes the squared distance: $\|X\beta - y\|^2$ 11

12 Linear Regression (same $X$, $y$ as above) Approximate $y$ by a linear function of $x$ with weights $\beta$: Example: weight 0.05 on Temperature, 0 on Pressure, 0.1 on Cloud Cover. Predictions: $X\beta$. Find the $\beta$ that minimizes the squared distance $\|X\beta - y\|^2$. Very simple! Is this complex enough to capture all of the data? 12

13 Linear Regression (same $X$, $y$ as above) Approximate $y$ by a linear function of $x$ with weights $\beta$: Example: weight 0.05 on Temperature, 0 on Pressure, 0.1 on Cloud Cover. Predictions: $X\beta$. Find the $\beta$ that minimizes the squared distance $\|X\beta - y\|^2$. Very simple! Is this complex enough to capture all of the data? Maybe if you have a lot of features 13

14 Doing Regression Problem: Examples $X \in \mathbb{R}^{n \times d}$, labels $Y \in \mathbb{R}^{n \times 1}$. Find a set of weights $\beta \in \mathbb{R}^{d \times 1}$ that minimizes $\|X\beta - Y\|^2$ 14

15 Doing Regression Problem: Examples $X \in \mathbb{R}^{n \times d}$, labels $Y \in \mathbb{R}^{n \times 1}$. Find a set of weights $\beta \in \mathbb{R}^{d \times 1}$ that minimizes $\|X\beta - Y\|^2$. Exact Solution: $\beta = (X^T X)^{-1} X^T Y$ 15

16 Doing Regression Problem: Examples $X \in \mathbb{R}^{n \times d}$, labels $Y \in \mathbb{R}^{n \times 1}$. Find a set of weights $\beta \in \mathbb{R}^{d \times 1}$ that minimizes $\|X\beta - Y\|^2$. Exact Solution: $\beta = (X^T X)^{-1} X^T Y$. If the number of examples is large but the dimensionality is small ($n \gg d$), then $X^T X \in \mathbb{R}^{d \times d}$ and $X^T Y \in \mathbb{R}^{d \times 1}$ are small... 16

17 Doing Regression Problem: Examples $X \in \mathbb{R}^{n \times d}$, labels $Y \in \mathbb{R}^{n \times 1}$. Find a set of weights $\beta \in \mathbb{R}^{d \times 1}$ that minimizes $\|X\beta - Y\|^2$. Exact Solution: $\beta = (X^T X)^{-1} X^T Y$. If the number of examples is large but the dimensionality is small ($n \gg d$), then $X^T X \in \mathbb{R}^{d \times d}$ and $X^T Y \in \mathbb{R}^{d \times 1}$ are small... ...and therefore fit onto a single machine 17

18 Doing Regression Problem: Examples $X \in \mathbb{R}^{n \times d}$, labels $Y \in \mathbb{R}^{n \times 1}$. Find a set of weights $\beta \in \mathbb{R}^{d \times 1}$ that minimizes $\|X\beta - Y\|^2$. Exact Solution: $\beta = (X^T X)^{-1} X^T Y$. If the number of examples is large but the dimensionality is small ($n \gg d$), then $X^T X \in \mathbb{R}^{d \times d}$ and $X^T Y \in \mathbb{R}^{d \times 1}$ are small... ...and therefore fit onto a single machine. Idea: Compute $X^T X$, $X^T Y$ in parallel, finish on a single machine 18
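This idea maps directly onto one MapReduce round: each mapper holds a shard of the examples and emits its local contribution to $X^T X$ and $X^T Y$, a reducer sums the small $d \times d$ and $d \times 1$ pieces, and the final solve happens on one machine. A minimal sketch in Python/NumPy (the shard layout and function names are illustrative, not from the slides):

```python
import numpy as np

def map_partial_stats(X_shard, y_shard):
    """Mapper: one shard's contribution to X^T X and X^T y."""
    return X_shard.T @ X_shard, X_shard.T @ y_shard

def reduce_and_solve(partials):
    """Reducer: sum the d x d and d x 1 pieces, then solve on a single machine."""
    XtX = sum(p[0] for p in partials)
    Xty = sum(p[1] for p in partials)
    return np.linalg.solve(XtX, Xty)  # beta = (X^T X)^{-1} X^T Y

# Toy usage: the four weather examples split across two "machines".
X = np.array([[15, 755, 90], [17, 760, 75], [23, 766, 95], [11, 740, 100]], dtype=float)
y = np.array([1.0, 0.0, 9.0, 5.0])
shards = [(X[:2], y[:2]), (X[2:], y[2:])]
beta = reduce_and_solve([map_partial_stats(Xs, ys) for Xs, ys in shards])
```

Only the $O(d^2)$-sized statistics ever cross the network, which is why this works when $d$ is small even if $n$ is huge.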

19 Computation Idea: Compute $X^T X$, $X^T Y$ in parallel, finish on a single machine 19

20 Computation Idea: Compute $X^T X$, $X^T Y$ in parallel, finish on a single machine. How hard? Computing Matrix-Matrix & Matrix-Vector products. For square matrices this takes $O(1)$ rounds when the machine memory is $m = n^{3/4}$ and the total memory is $M = n^{3/2}$ 20

21 Computation Idea: Compute $X^T X$, $X^T Y$ in parallel, finish on a single machine. How hard? Computing Matrix-Matrix & Matrix-Vector products. For square matrices, both products take $O(1)$ rounds when the machine memory is $m = n^{3/4}$ and the total memory is $M = n^{3/2}$ 21

22 How far can you go? ML Theory: the statistical query model. Interact with the data only via some statistic $f$ that is averaged over all of the examples: $f(X, Y) = \frac{1}{n} \sum_{(x, y) \in (X, Y)} f(x, y)$. This is trivial to parallelize 22

23 Statistical Query Model What can you do using the statistical query model? Linear Regression, Naive Bayes, Logistic Regression, Neural Networks, Principal Component Analysis (PCA) 23

24 Statistical Query Model What can you do using the statistical query model? But required runtime per step... Linear Regression: $O(d^2)$, Naive Bayes: $O(d)$, Logistic Regression: $O(d^2)$, Neural Networks: $O(d)$, Principal Component Analysis (PCA): $O(d^2)$ 24

25 Statistical Query Model What can you do using the statistical query model? But required runtime per step... Linear Regression: $O(d^2)$, Naive Bayes: $O(d)$, Logistic Regression: $O(d^2)$, Neural Networks: $O(d)$, Principal Component Analysis (PCA): $O(d^2)$ ...Applicable only to low dimensional spaces 25

26 Large Dimensionality What if the dimension is too high? Goal: Minimize $J(\beta) = \|X\beta - Y\|^2$ 26

27 Large Dimensionality What if the dimension is too high? Goal: Minimize $J(\beta) = \|X\beta - Y\|^2$. Greedy solution: gradient descent, $\beta_{new} = \beta_{old} + \eta\, \nabla J(\beta_{old})$. For a single coordinate $j$ and a single example $(x, y)$: $\frac{\partial}{\partial \beta_j} J(\beta) = (y - \beta \cdot x)\, x_j$ 27

28 Batch Gradient Descent Given examples: 1. Compute the gradient for every example, for every coordinate $j$: $\frac{\partial}{\partial \beta_j} J(\beta) = (y - \beta \cdot x)\, x_j$. Easy to parallelize! 28

29 Batch Gradient Descent Given examples: 1. Compute the gradient for every example, for every coordinate $j$: $\frac{\partial}{\partial \beta_j} J(\beta) = (y - \beta \cdot x)\, x_j$. Easy to parallelize! 2. Update: $\beta_{new}(j) = \beta_{old}(j) + \eta \sum_{i=1}^{n} (y^{(i)} - \beta \cdot x^{(i)})\, x^{(i)}_j$ 29

30 Batch Gradient Descent Given examples: 1. Compute the gradient for every example, for every coordinate $j$: $\frac{\partial}{\partial \beta_j} J(\beta) = (y - \beta \cdot x)\, x_j$. Easy to parallelize! 2. Update: $\beta_{new}(j) = \beta_{old}(j) + \eta \sum_{i=1}^{n} (y^{(i)} - \beta \cdot x^{(i)})\, x^{(i)}_j$. 3. Repeat until convergence. Will converge given mild conditions on the step size $\eta$ 30
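Steps 1-3 are exactly a statistical-query computation: the per-example terms are independent (that is the part a MapReduce job would distribute) and only their sum feeds the update. A small sequential sketch of the loop, assuming a fixed step size $\eta$ (`eta` in the code); a parallel version would only change how the inner sum over examples is computed:

```python
import numpy as np

def batch_gradient_descent(X, y, eta=1e-6, max_rounds=1000, tol=1e-8):
    """Minimize ||X beta - y||^2 with full-gradient updates."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(max_rounds):
        residual = y - X @ beta          # (y_i - beta . x_i) for every example
        grad = X.T @ residual            # sum_i (y_i - beta . x_i) x_i: the parallelizable sum
        beta_new = beta + eta * grad     # beta_new(j) = beta_old(j) + eta * sum_i (...)
        if np.linalg.norm(beta_new - beta) < tol:   # step 3: repeat until convergence
            return beta_new
        beta = beta_new
    return beta
```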

31 Performance Batch Gradient Descent in MR: Easy to write. But requires many, many rounds to converge... ...and this is inefficient in MapReduce. Remember, we aimed for $O(1)$ rounds. 31

32 Performance Batch Gradient Descent in MR: Easy to write. But requires many, many rounds to converge... ...and this is inefficient in MapReduce. Remember, we aimed for $O(1)$ rounds. The same problem exists sequentially: Batch Gradient Descent looks at all examples in every round! 32

33 Sequential Improvement What if we update the gradient after every example? Read one example: $(x^{(i)}, y^{(i)})$ 33

34 Sequential Improvement What if we update the gradient after every example? Read one example: $(x^{(i)}, y^{(i)})$. Update the parameter: $\beta_{new}(j) = \beta_{old}(j) + \eta (y^{(i)} - \beta \cdot x^{(i)})\, x^{(i)}_j$ 34

35 Sequential Improvement What if we update the gradient after every example? Read one example: $(x^{(i)}, y^{(i)})$. Update the parameter: $\beta_{new}(j) = \beta_{old}(j) + \eta (y^{(i)} - \beta \cdot x^{(i)})\, x^{(i)}_j$. Repeat until the changes are minor. Again, converges to somewhere near the local minimum. Known as Stochastic Gradient Descent 35
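For contrast with the batch version, here is a sketch of the same update applied one example at a time, i.e. stochastic gradient descent; the step size and number of passes are illustrative choices, not values from the slides:

```python
import numpy as np

def sgd(X, y, eta=1e-6, epochs=10):
    """One epoch = read each example once and update beta immediately."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(epochs):
        for i in np.random.permutation(n):        # read one example (x_i, y_i)
            residual = y[i] - X[i] @ beta
            beta = beta + eta * residual * X[i]   # beta(j) += eta * (y_i - beta . x_i) * x_ij
    return beta
```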

36 Stochastic GD & All-Reduce How to parallelize stochastic gradient descent? Main Loop: First compute the residual $y^{(i)} - \beta \cdot x^{(i)}$ (requires all coordinates). Then perform the update $\beta_{new}(j) = \beta_{old}(j) + \eta (y^{(i)} - \beta \cdot x^{(i)})\, x^{(i)}_j$ (easy to parallelize by coordinate) 36

37 All-Reduce Optimizes taking a sum & propagating the updates. 37

44 All-Reduce Optimizes taking a sum & propagating the updates Very optimized! Can get a significant speed up over straightforward Hadoop Mostly maintain fault tolerance given by Hadoop Pipelining means nodes are never idle Delay propagation of gradient by a few rounds 44

45 All-Reduce Optimizes taking a sum & propagating the updates Very optimized! Can get a significant speed up over straightforward Hadoop Mostly maintain fault tolerance given by Hadoop Pipelining means nodes are never idle Delay propagation of gradient by a few rounds Gets good results! Can be made to work with other Statistical Query Algorithms 45
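The primitive underneath is simple: every node contributes a partial result (e.g. the gradient over its shard), the contributions are summed, and the sum is broadcast back so that every node ends up holding the same aggregate. A toy single-process sketch of that sum-then-broadcast contract; real implementations, such as the Hadoop-compatible AllReduce described in the Agarwal et al. reference below, run it over a spanning tree of the nodes with pipelining:

```python
import numpy as np

def allreduce_sum(partials):
    """All-reduce over a list of per-node vectors: every node receives
    the elementwise sum of all contributions."""
    total = np.zeros_like(partials[0])
    for p in partials:                       # "reduce": aggregate the partial gradients
        total += p
    return [total.copy() for _ in partials]  # "broadcast": the same sum at every node

# Each node computed a partial gradient over its shard of the data.
node_gradients = [np.array([1.0, 2.0]), np.array([0.5, -1.0]), np.array([2.0, 0.0])]
synced = allreduce_sum(node_gradients)       # every entry equals [3.5, 1.0]
```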

46 Regression Overview Two approaches: Exact Computation Works only if the dimension is small (quadratic algorithms on dimension allowed) Streaming style computation Works even if dimension is large AllReduce makes MapReduce more scalable 46

47 Today Machine Learning More filtering -- reducing input size Machine Learning Optimizations & AllReduce Applications: k-means clustering 47

48 Clustering Clustering: Group similar items together One of the oldest problems in CS: Thousands of papers Hundreds of algorithms 48

49 Lloyd s Method: k-means Initialize with random clusters 49

50 Lloyd s Method: k-means Assign each point to nearest center 50

51 Lloyd s Method: k-means Recompute optimum centers (means) 51

52 Lloyd s Method: k-means Repeat: Assign points to nearest center 52

53 Lloyd s Method: k-means Repeat: Recompute centers 53

54 Lloyd s Method: k-means Repeat... 54

55 Lloyd s Method: k-means Repeat...Until clustering does not change 55

56 Lloyd s Method: k-means Repeat...Until clustering does not change Total error reduced at every step - guaranteed to converge. 55

57 Lloyd's Method: k-means Repeat...Until clustering does not change. Total error reduced at every step - guaranteed to converge. Minimizes: $\phi(X, C) = \sum_{x \in X} d(x, C)^2$ 56
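A compact sketch of the Lloyd iteration just described: assign every point to its nearest center, recompute each center as the mean of its cluster, and stop when nothing changes (each step can only decrease $\phi(X, C)$). The initialization is left to the caller, which is exactly the issue the next slides turn to:

```python
import numpy as np

def lloyd(X, centers, max_iters=100):
    """Lloyd's method: alternate assignment and mean steps until stable."""
    for _ in range(max_iters):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Recompute each center as the mean of its cluster (keep the old center if empty).
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign
```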

58 k-means Initialization Random? 57

60 k-means Initialization Random? A bad idea 59

61 k-means Initialization Random? A bad idea Even with many random restarts! 59

62 Easy Fix Select centers using a furthest point algorithm (2-approximation to k-Center clustering). 60
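A sketch of that furthest-point rule: start from an arbitrary point and repeatedly add the point whose distance to the centers chosen so far is largest (this distance is the quantity $D(p)$ used on the following slides):

```python
import numpy as np

def furthest_point_init(X, k, first=0):
    """Greedy k-Center seeding: each new center is the point furthest
    from all centers selected so far."""
    centers = [first]
    dist = np.linalg.norm(X - X[first], axis=1)            # D(p) for every point p
    for _ in range(k - 1):
        nxt = int(dist.argmax())                            # furthest point becomes a center
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return X[centers]
```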

67 Sensitive to Outliers 65

72 k-means++ Interpolate between the two methods. Give preference to further points. Let $D(p)$ be the distance between $p$ and the nearest cluster center. Sample the next center proportionally to $D^{\alpha}(p)$. 67

73 k-means++ Interpolate between the two methods. Give preference to further points. Let $D(p)$ be the distance between $p$ and the nearest cluster center. Sample the next center proportionally to $D^{\alpha}(p)$. kmeans++: Select the first point uniformly at random; for (int i=1; i < k; ++i) { Select the next point p with probability $D^{\alpha}(p) / \sum_x D^{\alpha}(x)$; UpdateDistances(); } 68

74 k-means++ Interpolate between the two methods. Give preference to further points. Let $D(p)$ be the distance between $p$ and the nearest cluster center. Sample the next center proportionally to $D^{\alpha}(p)$. kmeans++: Select the first point uniformly at random; for (int i=1; i < k; ++i) { Select the next point p with probability $D^{\alpha}(p) / \sum_x D^{\alpha}(x)$; UpdateDistances(); } Original Lloyd's: $\alpha = 0$. Furthest Point: $\alpha = \infty$. k-means++: $\alpha = 2$. 69
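A sketch of the seeding loop with the exponent $\alpha$ left as a knob, so that $\alpha = 0$ reproduces uniform (Lloyd-style) seeding, a very large $\alpha$ approaches the furthest-point rule, and $\alpha = 2$ is k-means++ proper; the function name and RNG handling are illustrative:

```python
import numpy as np

def kmeanspp_seed(X, k, alpha=2.0, seed=0):
    """Sample k centers, each with probability D(p)^alpha / sum_x D(x)^alpha."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]               # first point uniformly at random
    dist = np.linalg.norm(X - centers[0], axis=1)     # D(p): distance to the nearest chosen center
    for _ in range(k - 1):
        weights = dist ** alpha
        p = rng.choice(len(X), p=weights / weights.sum())
        centers.append(X[p])
        dist = np.minimum(dist, np.linalg.norm(X - X[p], axis=1))   # UpdateDistances()
    return np.array(centers)
```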

79 k-means++ Theorem [AV 07]: k-means++ guarantees an $O(\log k)$ approximation 71

80 Algorithm Initialization: kmeans++: Select the first point uniformly at random; for (int i=1; i < k; ++i) { Select the next point p with probability $D^2(p) / \sum_p D^2(p)$; UpdateDistances(); } Very Sequential! Must update all distances before selecting the next cluster 72

81 Goal: Simulate k-means++ k-means++: Is a soft greedy algorithm Adapt sample & prune technique Sample multiple points in each round In the end, prune back down to k points 73

82 k-means|| kmeans++: Select the first point uniformly at random; for (int i=1; i < k; ++i) { Select the next point p with probability $D^2(p) / \sum_p D^2(p)$; UpdateDistances(); } 74

83 k-means|| kmeans++: Select the first point c uniformly at random; for (int i=1; i < $\log_\ell(\phi(X, c))$; ++i) { Select each point p independently with probability $k\, \ell\, D^2(p) / \sum_x D^2(x)$; UpdateDistances(); } Prune to k points total by clustering the clusters 75

84 k-means|| kmeans++: Select the first point c uniformly at random; for (int i=1; i < $\log_\ell(\phi(X, c))$; ++i) { Select each point p independently with probability $k\, \ell\, D^2(p) / \sum_x D^2(x)$; UpdateDistances(); } Prune to k points total by clustering the clusters. Independent selection: Easy MR 76

85 k-means|| ($\ell$: Oversampling Parameter) kmeans++: Select the first point c uniformly at random; for (int i=1; i < $\log_\ell(\phi(X, c))$; ++i) { Select each point p independently with probability $k\, \ell\, D^2(p) / \sum_x D^2(x)$; UpdateDistances(); } Prune to k points total by clustering the clusters. Independent selection: Easy MR 77

86 k-means|| ($\ell$: Oversampling Parameter) kmeans++: Select the first point c uniformly at random; for (int i=1; i < $\log_\ell(\phi(X, c))$; ++i) { Select each point p independently with probability $k\, \ell\, D^2(p) / \sum_x D^2(x)$; UpdateDistances(); } Prune to k points total by clustering the clusters (the re-clustering step). Independent selection: Easy MR 78
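A rough sketch of the oversampling loop above. The per-point coin flips are independent, so each iteration of the outer loop is one MapReduce round over the data; the sampling probability and round count follow the slide's notation and are assumptions of this sketch. The returned centers still have to be pruned back to k by clustering them (e.g. with a weighted run of k-means++):

```python
import numpy as np

def kmeans_parallel_seed(X, k, ell, rounds, seed=0):
    """k-means|| style oversampling: each round keeps every point independently
    with probability min(1, k * ell * D^2(p) / sum_x D^2(x))."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]                 # first center c uniformly at random
    dist2 = np.sum((X - centers[0]) ** 2, axis=1)       # D^2(p) for every point
    for _ in range(rounds):                             # roughly log_ell(phi(X, c)) rounds
        probs = np.minimum(1.0, k * ell * dist2 / dist2.sum())
        for i in np.nonzero(rng.random(len(X)) < probs)[0]:   # independent selection: easy MR
            centers.append(X[i])
            dist2 = np.minimum(dist2, np.sum((X - X[i]) ** 2, axis=1))
    return np.array(centers)   # about k * ell * rounds points; prune back down to k
```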

87 k-means||: Analysis How Many Rounds? Theorem: After $O(\log_\ell(n\phi))$ rounds, we guarantee an $O(1)$ approximation. In practice: fewer iterations are needed. Need to re-cluster $O(k\, \ell\, \log_\ell(n\phi))$ intermediate centers. Discussion: The number of rounds is independent of k. Tradeoff between the number of rounds and memory 79

88 How well does this work? (figure: clustering cost vs. number of rounds on the KDD dataset, k=65, comparing random initialization, k-means++, and k-means|| with $\ell/k$ = 1, 2, 4) 80

89 Performance vs. k-means++ Even better on small datasets: 4600 points, 50 dimensions (SPAM). (table: accuracy and time in iterations) 81

90 Conclusion: ML Three prevalent ideas: If the dimension is small: prune (in a smart way) If the dimension is large, figure out what to optimize All-Reduce For clustering, adapt methods by oversampling & pruning 82

91 Conclusion: MapReduce Overall: A robust implementation of the BSP model Easy to work with, easy to think about Things to Keep in Mind: The data will be skewed! For specific classes of problems, additional optimizations possible Graphs with Pregel, ML with All-Reduce Wisely sampling the input & using the sample gets you very far 83

92 Conclusion: MapReduce Overall: A robust implementation of the BSP model Easy to work with, easy to think about Things to Keep in Mind: The data will be skewed! For specific classes of problems, additional optimizations possible Graphs with Pregel, ML with All-Reduce Wisely sampling the input & using the sample gets you very far Apparently it never rains in Aarhus :-) 84

93 References
Map-Reduce for Machine Learning on Multicore. Cheng-Tao Chu, Sang Kyun Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Andrew Ng, Kunle Olukotun. NIPS 2006.
Space-Round Tradeoffs for MapReduce Computations. Andrea Pietracaprina, Geppino Pucci, Matteo Riondato, Francesco Silvestri, Eli Upfal. ICS 2012.
Efficient Noise-Tolerant Learning from Statistical Queries. Michael Kearns. STOC 1993.
A Reliable Effective Terascale Linear Learning System. Alekh Agarwal, Olivier Chapelle, Miroslav Dudik, John Langford. arXiv 2011.
k-means++: The Advantages of Careful Seeding. David Arthur, S.V. SODA 2007.
Scalable k-means++. Bahman Bahmani, Benjamin Moseley, Andrea Vattani, Ravi Kumar, S.V. VLDB 2012.
