Machine Learning (CSE 446): Perceptron


Machine Learning (CSE 446): Perceptron
Sham M Kakade, © 2018 University of Washington
cse446-staff@cs.washington.edu

Announcements
HW is due this week; see the detailed instructions in the homework. Submit one PDF file, with answers and figures grouped together for each problem (in order). Submit your code (code is only needed for problem 4 of HW1).
Quiz section this week: a little probability + more linear algebra.
Updated late policy: you get 2 late days for the entire quarter, which are automatically deducted per late day (or part thereof). After these days are used up, 33% is deducted per day.
Today: the perceptron algorithm.

Review

The General Recipe (for supervised learning)
The cardinal rule of machine learning: Don't touch your test data. If you follow that rule, this recipe will give you accurate information:
1. Split the data into training, (maybe) development, and test sets.
2. For different hyperparameter settings:
   2.1 Train on the training data using those hyperparameter values.
   2.2 Evaluate loss on the development data.
3. Choose the hyperparameter setting whose model achieved the lowest development-data loss. Optionally, retrain on the training and development data together.
4. Now you have a hypothesis. Evaluate that model on the test data.
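As a rough illustration (not part of the original slides), here is a minimal Python/NumPy sketch of this recipe. The function names split_data, select_hyperparameter, fit, and dev_loss, the split fractions, and the hyperparameter grid are all hypothetical placeholders, not anything prescribed by the course.

    import numpy as np

    def split_data(X, y, train_frac=0.7, dev_frac=0.15, seed=0):
        """Randomly split (X, y) into training, development, and test sets."""
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(y))
        n_train = int(train_frac * len(y))
        n_dev = int(dev_frac * len(y))
        tr, dev, te = np.split(idx, [n_train, n_train + n_dev])
        return (X[tr], y[tr]), (X[dev], y[dev]), (X[te], y[te])

    def select_hyperparameter(train, dev, grid, fit, dev_loss):
        """Steps 2-3 of the recipe: train one model per setting, keep the lowest dev loss."""
        best_setting, best_model, best = None, None, float("inf")
        for setting in grid:
            model = fit(*train, setting)       # 2.1: train with this hyperparameter value
            loss = dev_loss(model, *dev)       # 2.2: evaluate loss on development data
            if loss < best:
                best_setting, best_model, best = setting, model, loss
        return best_setting, best_model        # step 4: evaluate best_model on test data, once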

Also...
Supervised learning algorithms we covered:
Decision trees: use few features, selectively.
Nearest neighbor: uses all features, blindly.
Unsupervised learning algorithms we covered:
K-means: a clustering algorithm; it does not use labels. (We will show later in the class that it converges.)

Probability you should know:
expectations (and notation)
mean, variance
unbiased estimates
joint distributions
conditional distributions

Linear algebra you should know:
vectors
inner products (and their interpretation)
Euclidean distance, Euclidean norm
Soon: matrices and matrix multiplication, outer products, a covariance matrix, how to write a vector x in an orthogonal basis, SVD/eigenvectors/eigenvalues...
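For concreteness (an added illustration, not from the slides; the numbers are arbitrary), the basic vector operations above in NumPy:

    import numpy as np

    x = np.array([1.0, 2.0, 2.0])
    w = np.array([0.5, -1.0, 3.0])

    inner = w @ x                    # inner product: sum of w[j] * x[j] (here 4.5)
    norm = np.linalg.norm(x)         # Euclidean norm of x: sqrt(1 + 4 + 4) = 3.0
    dist = np.linalg.norm(x - w)     # Euclidean distance between x and w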

Today

Is there a happy medium?
Decision trees (that aren't too deep): use relatively few features to classify.
K-nearest neighbors: all features weighted equally.
Today: use all features, but weight them.

For today's lecture, assume that y ∈ {−1, +1} instead of {0, 1}, and that x ∈ R^d.

Inspiration from Neurons
[Image from Wikimedia Commons.] Input signals come in through dendrites; the output signal passes out through the axon.

Neuron-Inspired Classifier
[Figure: diagram of a linear unit. Inputs x[1], ..., x[d] are multiplied by weight parameters w[1], ..., w[d] and summed together with a bias parameter b to form the activation; the output ŷ is "fire, or not?"]

A Parametric Hypothesis
f(x) = sign(w · x + b)
remembering that: w · x = Σ_{j=1}^{d} w[j] x[j]
Learning requires us to set the weights w and the bias b.
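A minimal NumPy sketch of this hypothesis (an added illustration, not code from the course; treating an activation of exactly 0 as +1 is my own convention):

    import numpy as np

    def predict(w, b, x):
        """Linear classifier: return +1 if w . x + b >= 0, else -1."""
        activation = np.dot(w, x) + b
        return 1 if activation >= 0 else -1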

Geometrically... What does the decision boundary look like?

Online Learning
Let's think of an online algorithm, where we try to update w as we examine each training point (x_i, y_i), one at a time.
How should we change w if we do not make an error on some point (x, y)?
How should we change w if we make an error on some point (x, y)?
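A short check of why the mistake-driven update on the next slide (w ← w + y x, b ← b + y) is sensible; this derivation is added here for clarity and is not on the original slide. After updating on a mistake at (x, y), using y² = 1, the new activation on that same point satisfies

    y((w + y x) · x + (b + y)) = y(w · x + b) + y²(x · x + 1) = y(w · x + b) + ‖x‖² + 1.

Since ‖x‖² + 1 > 0, the quantity y(w · x + b), which is non-positive on a mistake, strictly increases, so the update pushes the prediction on that point toward the correct label.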

Perceptron Learning Algorithm

Data: training examples D = (x_1, y_1), ..., (x_N, y_N); number of epochs E
Result: weights w and bias b
initialize: w = 0 and b = 0;
for e ∈ {1, ..., E} do
    for n ∈ {1, ..., N}, in random order do
        # predict
        ŷ = sign(w · x_n + b);
        if ŷ ≠ y_n then
            # update
            w ← w + y_n x_n;
            b ← b + y_n;
        end
    end
end
return w, b

Algorithm 1: PerceptronTrain
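A runnable NumPy version of Algorithm 1, offered as an illustrative sketch (the function name, the seed argument, and the sign(0) = +1 convention are my own choices, not from the slides):

    import numpy as np

    def perceptron_train(X, y, epochs, seed=0):
        """Algorithm 1 (PerceptronTrain): X is an (N, d) array, y holds labels in {-1, +1}."""
        rng = np.random.default_rng(seed)
        N, d = X.shape
        w = np.zeros(d)                                  # initialize: w = 0
        b = 0.0                                          # initialize: b = 0
        for _ in range(epochs):                          # E passes over the data
            for n in rng.permutation(N):                 # visit points in random order
                y_hat = 1 if X[n] @ w + b >= 0 else -1   # predict
                if y_hat != y[n]:                        # mistake?
                    w = w + y[n] * X[n]                  # update weights
                    b = b + y[n]                         # update bias
        return w, b

For example, w, b = perceptron_train(X_train, y_train, epochs=10) returns learned parameters that can be used with the predict function sketched earlier.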

Parameters and Hyperparameters
This is the first supervised algorithm we've seen whose parameters are numerical values (w and b).
The perceptron learning algorithm's sole hyperparameter is E, the number of epochs (passes over the training data).
How should we tune E using the development data?
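Following the general recipe above, one plausible way to tune E (an illustration only; it reuses perceptron_train from the previous sketch, and the candidate values of E and the arrays X_train, y_train, X_dev, y_dev are placeholders) is to train for several values of E and keep the one with the lowest development error:

    import numpy as np

    def error_rate(w, b, X, y):
        """Fraction of points in (X, y) that the classifier (w, b) gets wrong."""
        preds = np.where(X @ w + b >= 0, 1, -1)
        return float(np.mean(preds != y))

    best_E, best_err = None, float("inf")
    for E in [1, 2, 5, 10, 20]:                          # candidate epoch counts (arbitrary grid)
        w, b = perceptron_train(X_train, y_train, epochs=E)
        err = error_rate(w, b, X_dev, y_dev)
        if err < best_err:
            best_E, best_err, best_w, best_b = E, err, w, b
    # Only after choosing best_E: evaluate (best_w, best_b) on the test set, once.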

Linear Decision Boundary
w · x + b = 0
activation = w · x + b
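For intuition (an added example, not on the slide): in d = 2 with w = (2, 1) and b = −1, the boundary w · x + b = 0 is the line 2 x[1] + x[2] = 1. Points on one side have positive activation and are predicted +1; points on the other side have negative activation and are predicted −1. More generally, the signed distance from a point x to the boundary is (w · x + b) / ‖w‖, so the activation is proportional to how far x lies from the boundary.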

Interpretation of Weight Values
What does it mean when...
w[1] = 100?
w[2] = 1?
w[3] = 0?
What about feature scaling?
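To see why scaling matters (an added illustration with made-up units): if x[2] is a length in meters and we re-express it in kilometers, each value of x[2] shrinks by a factor of 1000, so w[2] must grow by roughly a factor of 1000 for the activation w · x + b to stay the same. Weight magnitudes therefore reflect feature units as much as feature importance. One common remedy, sketched below as an assumption rather than anything prescribed by the slides, is to standardize each feature using training-set statistics before running the perceptron:

    import numpy as np

    # X_train and X_dev are placeholder (N, d) arrays of raw features.
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + 1e-12      # guard against constant (zero-variance) features
    X_train_std = (X_train - mu) / sigma     # each training feature: mean 0, variance ~1
    X_dev_std = (X_dev - mu) / sigma         # reuse the *training* statistics on dev/test data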

What can we say about convergence?
Can you say what has to occur for the algorithm to converge?
Can we understand when it will never converge?
Stay tuned...