ECS289: Scalable Machine Learning

1 ECS289: Scalable Machine Learning. Cho-Jui Hsieh, UC Davis, Sept 22, 2016
2 Course Information. Website: ECS289G_Fall2016/main.html. My office: Mathematical Sciences Building (MSB) 4232. Office hours: 1pm-2pm Wednesday. This is a 4-unit course.
3 Course Information. Goals: understand the challenges in large-scale machine learning; understand state-of-the-art approaches for addressing these challenges; identify interesting open questions. Course structure: pick some important machine learning problems (classification, regression, recommender systems, ...); introduce the model; discuss the computational challenges; discuss algorithms to overcome these challenges. Prerequisites: basic knowledge of linear algebra (matrix multiplication, inversion, ...); basic knowledge of programming for the final project.
4 Grading Policy: class participation (15%), paper presentation (35%), final project (50%).
5 Paper presentation. Form a group of 2 students. Read a NIPS, ICML, or KDD paper published in the past 3 years. The presentation should include: the problem to be solved in the paper; related work (previous approaches before this paper and the weak points of those approaches); the proposed approach and why the proposed algorithm is better; algorithms (and theoretical guarantees); any drawbacks of the proposed method. Send me the slides 3 days before the class; I will give some feedback to improve the slides. Presentations on November 10, 15, 17, 22.
6 Final Project. Topics include: develop new algorithms or improve existing algorithms; implement parallel machine learning algorithms and test on large datasets; apply machine learning to some application; compare existing algorithms; survey of algorithms for a specific ML problem; ... Schedule: final project proposal, TBD; final project presentation, 11/29 and 12/1; final project paper due 12/9.
7 Syllabus: mathematical tools (optimization); linear empirical risk minimization (classification and regression); matrix completion; extreme classification; tree-based algorithms (random forest, gradient boosted decision tree); kernel methods; deep learning; ranking.
8 What is Machine Learning? Training and test data are usually assumed to be iid samples from the same distribution.
9 Training. Linear SVM/regression: linear hyperplane. Kernel SVM/regression: nonlinear hyperplane. Decision tree, random forest. Nearest neighbor. ...
10 Prediction. Learn a model that best explains the observed data and also generalizes to unseen data. Scalability issues: time and space complexity of the (training) learning algorithm; size of the model; time complexity of prediction (for real-time applications).
11 A simple example: K-nearest neighbor classification. Model size: storing all the training samples; 1 billion samples, each requiring 1 KByte of space, need 1000 GB of memory. Prediction time: find the nearest training sample; 1 billion samples, each distance evaluation requiring 1 microsecond, need 1000 secs per prediction.
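The arithmetic on this slide can be checked with a short sketch. It uses the slide's own figures; the decimal units (1 KByte = 1000 bytes), the function name `nearest_neighbor_label`, and the toy points are assumptions made for illustration only.

```python
# Back-of-envelope cost of brute-force nearest neighbor, using the slide's figures.
N_SAMPLES = 1_000_000_000      # 1 billion stored training samples
BYTES_PER_SAMPLE = 1_000       # 1 KByte each (decimal units assumed)
SECS_PER_DISTANCE = 1e-6       # 1 microsecond per distance evaluation

model_bytes = N_SAMPLES * BYTES_PER_SAMPLE
predict_secs = N_SAMPLES * SECS_PER_DISTANCE


def nearest_neighbor_label(query, samples, labels):
    """Brute-force 1-NN: scans every stored sample, so one query costs O(n * d)."""
    best = min(
        range(len(samples)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(samples[i], query)),
    )
    return labels[best]


print(f"model size: {model_bytes / 1e9:.0f} GB")
print(f"time per prediction: {predict_secs:.0f} s")
print(nearest_neighbor_label([0.9, 1.1], [[0.0, 0.0], [1.0, 1.0]], [-1, 1]))
```

The model is the training set itself, which is why both memory and prediction time grow linearly with the number of samples.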
12 Topics in this course Classification Regression Matrix Completion (Recommender systems) Ranking Other Nonlinear Models
13 Machine Learning Problems: Classification Image classification Handwritten digit recognition Spam filters
14 Binary Classification. Input: training samples $\{x_1, x_2, \ldots, x_n\}$ and labels $\{y_1, y_2, \ldots, y_n\}$, where $x_i$ is a $d$-dimensional vector and $y_i \in \{+1, -1\}$. Output: a decision function $f$ such that $f(x_i) > 0$ if $y_i = +1$ and $f(x_i) < 0$ if $y_i = -1$.
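As a minimal sketch of such a decision function, assume a linear form f(x) = w·x + b. The linear form, the weights, and the toy data below are hypothetical choices for illustration; the slide does not fix a model class.

```python
def decision_function(w, b, x):
    """Linear decision function f(x) = w . x + b (an assumed model class)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def predict(w, b, x):
    """Map the sign of f(x) to a label in {+1, -1}."""
    return 1 if decision_function(w, b, x) > 0 else -1


def training_accuracy(w, b, X, y):
    """Fraction of training samples whose predicted label matches y_i."""
    hits = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y))
    return hits / len(X)


# Hypothetical 2-dimensional training set and hand-picked parameters.
X = [[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w, b = [1.0, 1.0], 0.0
print(training_accuracy(w, b, X, y))
```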
15 Feature generation for documents. Bag-of-words features for documents: number of features = number of potential words, about 10,000.
16 Feature generation for documents. Bag of n-gram features (n = 2): 10,000 words give 10,000^2 = 100,000,000 potential features.
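A small sketch of bag-of-words and bag-of-bigram counting, assuming whitespace tokenization and lowercasing (neither is specified on the slides). The helper name `bag_of_ngrams` and the example sentence are illustrative.

```python
from collections import Counter


def bag_of_ngrams(text, n=1):
    """Count the n-grams of a document; n=1 gives bag-of-words features."""
    tokens = text.lower().split()  # naive whitespace tokenizer (an assumption)
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)


doc = "the cat sat on the mat"
unigrams = bag_of_ngrams(doc, 1)
bigrams = bag_of_ngrams(doc, 2)

# Size of the potential feature space for the slide's 10,000-word vocabulary.
VOCAB_SIZE = 10_000
potential_bigrams = VOCAB_SIZE ** 2

print(unigrams["the"], len(bigrams), potential_bigrams)
```

Any single document touches only a handful of the potential features, so sparse storage such as a `Counter` is the natural representation.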
17 Classification > 1 million dimensional space, > 1 billion training points
18 Scalability challenges. Large number of features; large number of samples; data cannot fit into memory. Splice-site: 10 million samples, 11 million features, > 1 TB memory. Current solutions: intelligently swap between memory and disk; online algorithms; parallel algorithms on distributed systems. Other ideas?
19 Challenges: large number of categories. Multi-label (or multi-class) classification with a large number of labels. Image classification > labels. Recommending tags for articles: millions of labels (tags).
20 Challenges: large number of categories. Consider a problem with 1 million labels. Traditional approach: reduce to binary problems. Training: 1 million binary classification problems; need 694 days if each binary problem can be solved in 1 minute. Model size: 1 million models; need 1 TB if each model requires 1 MB. Prediction for one test sample: 1 million binary predictions; need 1000 secs if each binary prediction needs $10^{-3}$ secs.
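The figures above follow from multiplying each per-problem cost by the number of labels. The sketch below reproduces them and shows why prediction is linear in the number of labels under the one-vs-rest reduction. The function `one_vs_rest_predict` and the three toy scorers are hypothetical, and decimal units (1 TB = 10^6 MB) are assumed.

```python
# Cost of reducing a 1-million-label problem to binary problems (slide's figures).
N_LABELS = 1_000_000
train_days = N_LABELS * 1 / (60 * 24)   # 1 minute per binary problem
model_tb = N_LABELS * 1 / 1_000_000     # 1 MB per model, decimal TB assumed
predict_secs = N_LABELS * 1e-3          # 10^-3 secs per binary prediction


def one_vs_rest_predict(score_fns, x):
    """Evaluate every binary scorer and return the index of the highest score."""
    return max(range(len(score_fns)), key=lambda i: score_fns[i](x))


# Hypothetical 3-label toy problem with linear scorers.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
scorers = [lambda x, w=w: sum(wi * xi for wi, xi in zip(w, x)) for w in weights]
label = one_vs_rest_predict(scorers, [0.2, 0.9])

print(f"{train_days:.0f} days, {model_tb:.0f} TB, {predict_secs:.0f} s, label {label}")
```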
21 Machine Learning Problems: Regression Line fitting Stock price prediction Polynomial curve fitting (Figures from Dhillon et al)
22 (Figure from Dhillon et al) Machine Learning Problems: Recommender Systems Netflix Problem
23 Machine Learning Problems: Recommender Systems Collaborative Filtering (Figure from Dhillon et al)
24 Machine Learning Problems: Recommender Systems Latent Factor Model (Figure from Dhillon et al)
25 Machine Learning Problems: Recommender Systems Latent Factor Model (Figure from Dhillon et al)
26 Machine Learning Problems: Recommender Systems Latent Factor Model (Figure from Dhillon et al)
27 Recommender Systems: challenges. Size of the matrix: billions of users, billions of items, > 100 billion observations. Memory to store ratings: > 1200 GBytes. How to incorporate side information? User/item profiles; temporal information, click sequences. Prediction time: to recommend the top-k items to a user, we need to compute a row of a matrix in O(mk) time; with m > 1,000,000,000 and k > 500, this needs > 100 seconds. Recommending items to all users: 100 billion seconds, about 3170 years.
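A rough sketch of top-k recommendation under a latent factor model, assuming the predicted rating is the inner product of a user vector and an item vector. The slide uses k both for "top-k" and for the O(mk) factor; the code calls them `k_top` and the latent dimension to keep them apart. The function name and the toy vectors are hypothetical.

```python
import heapq


def top_k_items(user_vec, item_vecs, k_top):
    """Score all m items against one user (O(m * d) work) and keep the best k_top."""
    scores = (
        (sum(u * v for u, v in zip(user_vec, item)), idx)
        for idx, item in enumerate(item_vecs)
    )
    return [idx for _, idx in heapq.nlargest(k_top, scores)]


# Hypothetical latent vectors with latent dimension 2.
user = [1.0, 0.0]
items = [[0.1, 5.0], [0.9, 0.0], [0.5, 1.0], [0.7, -3.0]]
print(top_k_items(user, items, k_top=2))

# The slide's total for all users, converted to years (365-day years assumed).
years_for_all_users = 100e9 / (365 * 24 * 3600)
print(f"{years_for_all_users:.0f} years")
```

`heapq.nlargest` avoids sorting all m scores, but every score still has to be computed, which is the O(mk) cost the slide refers to.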
28 Different architectures. Machine learning on different scales: embedded systems (mobile devices, robotic systems, ...); single computer (multiple cores, but limited (32G) memory with large (1T) disk); single computer with GPU(s); multiple computers (data centers, computing clusters), which need communication between computers. The best algorithm and model can be totally different.
29 Machine learning on embedded devices. Examples: mobile devices, robotic systems, autonomous cars, ... Small memory (model compression to reduce model size). Real-time response (fast prediction time). Need new (distributed) learning algorithms: local (non-iid) samples on each device; slow and unreliable network connection. Need to consider power consumption. Privacy issues.
30 Machine learning on a single computer. Disk access is expensive. Data can fit in memory: can apply many existing optimization algorithms; how to make the algorithms faster by exploiting the problem structure? Data cannot fit in memory (full data stored on disk): online updates (processing one or a few data points at a time); out-of-core methods (load a block of data from disk to memory at a time); distributed systems.
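One way to picture the out-of-core idea is a training loop that only ever holds one block of rows in memory. The sketch below is an illustration under assumptions, not the course's method: an in-memory `io.StringIO` stands in for a file on disk, the rows are a made-up CSV with the target in the last column, and the model is least squares trained with per-sample updates.

```python
import io
from itertools import islice


def read_blocks(lines, block_size):
    """Yield lists of at most block_size lines, so only one block is in memory."""
    it = iter(lines)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def parse(line):
    """Parse 'x1,x2,...,target' into (features, target)."""
    *features, target = map(float, line.split(","))
    return features, target


def out_of_core_sgd(open_data, n_features, block_size=2, epochs=200, lr=0.1):
    """Least-squares SGD that re-reads the data source block by block each epoch."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for block in read_blocks(open_data(), block_size):
            for x, y in map(parse, block):
                err = sum(wj * xj for wj, xj in zip(w, x)) - y
                w = [wj - lr * err * xj for wj, xj in zip(w, x)]
    return w


# Toy data generated by w = [2, -1]; StringIO is a stand-in for a file on disk.
DATA = "1,0,2\n0,1,-1\n1,1,1\n2,1,3\n"
w = out_of_core_sgd(lambda: io.StringIO(DATA), n_features=2)
print([round(wj, 3) for wj in w])
```

With a real dataset, `open_data` would be something like `lambda: open(path)`, where `path` is whatever file holds the rows.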
31 Machine learning on distributed systems. Multiple computers, each with local memory and disk. Inter-computer communication is slow. Programming tools & computing models: Message Passing Interface (MPI); Hadoop (MapReduce); Spark (MapReduce); parameter servers.
32 Popular topics in ML research
33 Topics in NIPS Figure from
34 Optimization. (Almost) all machine learning problems can be modeled as an optimization problem: $\arg\min_{\theta} f(\theta)$, where $f$ is an estimator of the prediction error and $\theta$ is the model parameter. Find the best model to minimize prediction error. Traditional optimization: usually assumes the objective function is convex; wants a very accurate solution.
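As a minimal illustration of solving arg min f(θ) for a convex objective, the sketch below runs plain gradient descent until the gradient norm is tiny, which matches the "very accurate solution" goal of traditional optimization. The quadratic objective, step size, and tolerance are arbitrary choices made for the example.

```python
def gradient_descent(grad, theta0, lr=0.1, tol=1e-10, max_iter=10_000):
    """Iterate theta <- theta - lr * grad(theta) until the gradient norm < tol."""
    theta = list(theta0)
    for _ in range(max_iter):
        g = grad(theta)
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta


def grad_f(theta):
    """Gradient of the convex toy objective (t0 - 3)^2 + 2 * (t1 + 1)^2."""
    return [2.0 * (theta[0] - 3.0), 4.0 * (theta[1] + 1.0)]


theta_star = gradient_descent(grad_f, [0.0, 0.0])
print([round(t, 6) for t in theta_star])
```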
35 Optimization for Big-Data Problems. Parallel optimization: synchronized vs asynchronous; different architectures (multi-core shared memory, distributed systems, GPU); convergence rate analysis. Stochastic/online optimization: each iteration uses only one sample or a subset of the training data; convergence rate: global bound and dependency on data size.
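The stochastic case can be sketched as mini-batch SGD: each iteration computes the gradient on a small random subset instead of the full training set. The least-squares loss, batch size, step size, fixed seed, and toy data (generated by y = 1 + 2x, with a constant first feature acting as the bias) are all assumptions for illustration.

```python
import random


def minibatch_sgd(X, y, batch_size=2, lr=0.05, steps=5000, seed=0):
    """Least-squares SGD where every step uses a random subset of the samples."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for _ in range(steps):
        batch = rng.sample(range(len(X)), batch_size)
        grad = [0.0] * len(w)
        for i in batch:
            err = sum(wj * xj for wj, xj in zip(w, X[i])) - y[i]
            for j, xj in enumerate(X[i]):
                grad[j] += err * xj / batch_size  # average gradient over the batch
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Toy data from y = 1 + 2 * x; the leading 1.0 in each row is a bias feature.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w = minibatch_sgd(X, y)
print([round(wj, 3) for wj in w])
```

The per-step cost depends on the batch size rather than on the number of training samples, which is what makes this family attractive when the data is large.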
36 Optimization for Complex Functions. Nonconvex optimization: examples include neural networks, matrix decomposition, ...; algorithms can converge to (1) a global minimizer, (2) a local minimizer, or (3) saddle points. Discrete optimization: greedy algorithms for submodular optimization; submodularity as an analogue of convexity for discrete functions; relaxation to continuous optimization problems.
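To make the greedy approach to submodular optimization concrete, here is a small sketch on maximum coverage, a standard example of a monotone submodular objective. The problem choice, the function name, and the toy sets are illustrative assumptions; the slide does not name a specific problem.

```python
def greedy_max_coverage(sets, budget):
    """Pick up to `budget` sets, each time taking the largest marginal coverage gain."""
    covered, chosen = set(), []
    for _ in range(budget):
        remaining = [i for i in range(len(sets)) if i not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda i: len(sets[i] - covered))
        if not sets[best] - covered:
            break  # no remaining set adds a new element
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


# Hypothetical candidate sets over the elements 1..7.
candidates = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
chosen, covered = greedy_max_coverage(candidates, budget=2)
print(chosen, sorted(covered))
```

The marginal gain of a set can only shrink as more elements are covered, which is the diminishing-returns property that defines submodularity.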
37 Deep Neural Network Network design for different problems Scalability (parallel stochastic gradient descent) Geometry of the problem (local minimum, global minimum, saddle points... )
38 Matrix/Tensor Decomposition. Learning low-dimensional embeddings (usually unsupervised). Netflix problem (recommender systems), word2vec. New formulations, scalable optimization algorithms, and theoretical guarantees.
39 Other Topics Bandit Reinforcement learning Topic modeling Extreme classification (Multiclass/multilabel with millions of labels) Graph mining Bayesian Clustering...
40 Coming up Next class: linear regression Questions?
More informationSupervised vs unsupervised clustering
Classification Supervised vs unsupervised clustering Cluster analysis: Classes are not known a priori. Classification: Classes are defined apriori Sometimes called supervised clustering Extract useful
More informationBPR: Bayesian Personalized Ranking from Implicit Feedback
452 RENDLE ET AL. UAI 2009 BPR: Bayesian Personalized Ranking from Implicit Feedback Steffen Rendle, Christoph Freudenthaler, Zeno Gantner and Lars SchmidtThieme {srendle, freudenthaler, gantner, schmidtthieme}@ismll.de
More informationNeural Networks. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Neural Networks These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides
More informationOverview and Practical Application of Machine Learning in Pricing
Overview and Practical Application of Machine Learning in Pricing 2017 CAS Spring Meeting May 23, 2017 Duncan Anderson and Claudine Modlin (Willis Towers Watson) Mark Richards (Allstate Insurance Company)
More informationAn Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation
An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, and Yoshua Bengio Université de Montréal 13/06/2007
More informationUSC Viterbi School of Engineering
Introduction to Computational Thinking and Data Science USC Viterbi School of Engineering http://www.datascience4all.org Term: Fall 2016 Time: Tues Thur 10am 11:50am Location: Allan Hancock Foundation
More informationData Preprocessing. Javier Béjar AMLT /2017 CS  MAI. (CS  MAI) Data Preprocessing AMLT / / 71 BY: $\
Data Preprocessing S  MAI AMLT  2016/2017 (S  MAI) Data Preprocessing AMLT  2016/2017 1 / 71 Outline 1 Introduction Data Representation 2 Data Preprocessing Outliers Missing Values Normalization Discretization
More informationParallel Stochastic Gradient Descent: The case for native GPUside GPI
Parallel Stochastic Gradient Descent: The case for native GPUside GPI J. Keuper Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern, Germany Mark Silberstein Accelerated Computer
More informationModule 1 Lecture Notes 2. Optimization Problem and Model Formulation
Optimization Methods: Introduction and Basic concepts 1 Module 1 Lecture Notes 2 Optimization Problem and Model Formulation Introduction In the previous lecture we studied the evolution of optimization
More information