About the Course. Reading List. Assignments and Examination

Uppsala University
Department of Linguistics and Philology

About the Course
- Introduction to machine learning
- Focus on methods used in NLP:
  - Decision trees and nearest neighbor methods
  - Linear models for classification and structured prediction
  - Ensemble methods
  - Unsupervised learning (clustering)
- Builds on Statistical Methods in NLP:
  - Mostly discriminative methods
  - Generative probability models covered in first course

Reading List
- Main textbook: Ethem Alpaydin, Introduction to Machine Learning (2nd ed.)
- Additional material:
  - Hal Daumé III, A Course in Machine Learning (draft). Do not distribute chapters! Do submit bug reports!
  - Papers on specific methods not covered in the book

Assignments and Examination
- Assignments:
  - Decision trees and nearest neighbor
  - Perceptron learning
  - Clustering
- Examination:
  - Written report submitted for each assignment
  - All three assignments necessary to pass the course
  - Grade determined by majority grade on assignments

Practical Organization
- Lectures in Adobe Connect:
  - Raise hand to ask questions
  - Recordings of lectures available on course home page
- Lecturers:
  - Joakim Nivre (1-4)
  - Oscar Täckström (5)
  - Magnus Rosell (6)

- Assignments:
  - No lab sessions, supervision by
  - Reports submitted to

Lecture slides for Introduction to Machine Learning (2nd ed.), Ethem Alpaydin, The MIT Press, 2010, http://
Revised and adapted by Joakim Nivre

Machine Learning
- Machine learning is programming computers to optimize a performance criterion for some task using example data or past experience
- Why learning?
  - No known exact method: speech recognition
  - Exact method too expensive: statistical physics
  - Task evolves over time: network routing
- Compare: No need to use machine learning for computing payroll

Elements of Machine Learning
- Generalization:
  - Generalize from specific examples
  - Based on statistical inference
- Data:
  - Training data: specific examples to learn from
  - Test data: (new) specific examples to assess performance
- Models:
  - Theoretical assumptions about the task/domain
  - Parameters that can be inferred from data
- Algorithms:
  - Learning algorithm: infer model (parameters) from data
  - Inference algorithm: infer predictions from model

Types of Machine Learning
- Association
- Supervised Learning
  - Classification
  - Regression
- Unsupervised Learning
- Reinforcement Learning

Learning Associations
- Basket analysis: $P(Y \mid X)$, the probability that somebody who buys X also buys Y, where X and Y are products/services
- Example: $P(\text{chips} \mid \text{beer}) =$

Classification
- Example: Credit scoring
- Differentiating between low-risk and high-risk customers from their income and savings
- Discriminant: IF income > $\theta_1$ AND savings > $\theta_2$ THEN low-risk ELSE high-risk

Classification in NLP
- Binary classification:
  - Spam filtering (spam vs. ham)
  - Spelling error detection (error vs. no error)
- Multiclass classification:
  - Text categorization (news, economy, culture, sport, ...)
  - Named entity classification (person, location, organization, ...)
- Structured prediction:
  - Part-of-speech tagging (classes = tag sequences)
  - Syntactic parsing (classes = parse trees)
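The basket-analysis estimate and the credit-scoring discriminant above can be sketched in a few lines of Python. This is only an illustration: the baskets, the threshold values, and the function names `conditional_probability` and `credit_risk` are made up here and do not come from the slides.

```python
# Made-up shopping baskets, purely for illustration.
baskets = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer", "bread"},
    {"bread", "milk"},
]


def conditional_probability(baskets, x, y):
    """Estimate P(y | x) as the share of baskets containing x that also contain y."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        raise ValueError(f"no basket contains {x!r}")
    return sum(1 for b in with_x if y in b) / len(with_x)


def credit_risk(income, savings, theta1, theta2):
    """Discriminant from the slide: low-risk only if both thresholds are exceeded."""
    return "low-risk" if income > theta1 and savings > theta2 else "high-risk"


# Two of the three beer baskets also contain chips.
p_chips_given_beer = conditional_probability(baskets, "beer", "chips")

# Hypothetical thresholds; in practice theta1 and theta2 are learned from data.
label = credit_risk(income=50_000, savings=12_000, theta1=30_000, theta2=10_000)
```

In a real system the thresholds would be the parameters inferred by the learning algorithm rather than fixed by hand.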

Regression
- Example: Price of used car
  - x: car attributes
  - y: price
- $y = g(x \mid \theta)$, where $g(\cdot)$ is the model and $\theta$ its parameters
- Linear model: $y = wx + w_0$

Uses of Supervised Learning
- Prediction of future cases: Use the rule to predict the output for future inputs
- Knowledge extraction: The rule is easy to understand
- Compression: The rule is simpler than the data it explains
- Outlier detection: Exceptions that are not covered by the rule, e.g., fraud

Unsupervised Learning
- Finding regularities in data
- No mapping to outputs
- Clustering: Grouping similar instances
- Example applications:
  - Customer segmentation in CRM
  - Image compression: Color quantization
  - NLP: Unsupervised text categorization

Reinforcement Learning
- Learning a policy = sequence of outputs/actions
- No supervised output but delayed reward
- Example applications:
  - Game playing
  - Robot in a maze
  - NLP: Dialogue systems
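For the regression example $y = wx + w_0$, one common way to infer the parameters is ordinary least squares. The sketch below assumes a single input attribute (car age) and uses made-up prices; the slides do not say which fitting method is intended, so this is one possible choice, and `fit_line` and `predict` are hypothetical names.

```python
def fit_line(xs, ys):
    """Least-squares estimates of w and w0 for the model y = w * x + w0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    w0 = mean_y - w * mean_x
    return w, w0


def predict(x, w, w0):
    """Apply the learned rule to a new input."""
    return w * x + w0


# Made-up data: car age in years and price in thousands.
ages = [1, 2, 3, 4]
prices = [18.0, 16.0, 14.0, 12.0]

w, w0 = fit_line(ages, prices)
price_at_5_years = predict(5, w, w0)
```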

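Back to Classification
- Learning the class C of a family car from examples
  - Prediction: Is car x a family car?
  - Knowledge extraction: What do people expect from a family car?
- Output (labels): Positive (+) and negative (-) examples
- Input representation (features): $x_1$: price, $x_2$: engine power

Training set $\mathcal{X}$
$$\mathcal{X} = \{x^t, r^t\}_{t=1}^{N}$$
$$r = \begin{cases} 1 & \text{if } x \text{ is positive} \\ 0 & \text{if } x \text{ is negative} \end{cases} \qquad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

Hypothesis class $\mathcal{H}$
$$(p_1 \le \text{price} \le p_2) \ \text{AND} \ (e_1 \le \text{engine power} \le e_2)$$

Empirical (training) error
$$h(x) = \begin{cases} 1 & \text{if } h \text{ says } x \text{ is positive} \\ 0 & \text{if } h \text{ says } x \text{ is negative} \end{cases}$$
Empirical error of h on $\mathcal{X}$:
$$E(h \mid \mathcal{X}) = \sum_{t=1}^{N} 1\big(h(x^t) \ne r^t\big)$$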
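The rectangle hypothesis class and the empirical error can be written out directly. The following is a minimal sketch: the `Rectangle` class, the bounds, and the five training cars are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rectangle:
    """Hypothesis h: (p1 <= price <= p2) AND (e1 <= engine power <= e2)."""
    p1: float
    p2: float
    e1: float
    e2: float

    def __call__(self, x):
        price, power = x
        inside = self.p1 <= price <= self.p2 and self.e1 <= power <= self.e2
        return 1 if inside else 0


def empirical_error(h, training_set):
    """E(h | X): number of training examples on which h(x) differs from r."""
    return sum(1 for x, r in training_set if h(x) != r)


# Made-up examples: ((price, engine power), label), label 1 = family car.
training_set = [
    ((15, 100), 1),
    ((18, 120), 1),
    ((30, 250), 0),
    ((8, 60), 0),
    ((17, 200), 0),
]

h = Rectangle(p1=10, p2=20, e1=80, e2=150)
h_wide = Rectangle(p1=10, p2=20, e1=80, e2=220)  # also covers the (17, 200) negative

errors_h = empirical_error(h, training_set)
errors_h_wide = empirical_error(h_wide, training_set)
```

Here `h` is consistent with the training set, while `h_wide` misclassifies one negative example.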

S, G, and the Version Space
- Most specific hypothesis, S
- Most general hypothesis, G
- Any $h \in \mathcal{H}$ between S and G is consistent [$E(h \mid \mathcal{X}) = 0$]; these hypotheses make up the version space

Margin
- Choose h with largest margin
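For axis-aligned rectangles, the most specific hypothesis S is the tightest rectangle around the positive examples. The sketch below computes S and checks that it is consistent; the function names and data are hypothetical, and G is not computed because the most general rectangle is in general not unique.

```python
def most_specific_hypothesis(training_set):
    """Return S as (p1, p2, e1, e2): the tightest rectangle around all positives."""
    positives = [x for x, r in training_set if r == 1]
    if not positives:
        raise ValueError("S is undefined without positive examples")
    prices = [price for price, _ in positives]
    powers = [power for _, power in positives]
    return min(prices), max(prices), min(powers), max(powers)


def classify(bounds, x):
    """Label x with 1 if it lies inside the rectangle, else 0."""
    p1, p2, e1, e2 = bounds
    price, power = x
    return 1 if p1 <= price <= p2 and e1 <= power <= e2 else 0


def is_consistent(bounds, training_set):
    """A hypothesis is consistent when its empirical error is zero."""
    return all(classify(bounds, x) == r for x, r in training_set)


# Made-up examples: ((price, engine power), label).
training_set = [
    ((15, 100), 1),
    ((18, 120), 1),
    ((30, 250), 0),
    ((8, 60), 0),
]

S = most_specific_hypothesis(training_set)
consistent = is_consistent(S, training_set)
```

S is consistent only when no negative example falls inside the tightest rectangle, which is the noise-free case assumed on this slide.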

Noise
- Unwanted anomaly in data:
  - Imprecision in input attributes
  - Errors in labeling data points
  - Hidden attributes (relative to H)
- Consequence: No h in H may be consistent!

Noise and Model Complexity
- Arguments for simpler model:
  - Easier to make predictions
  - Easier to train (fewer parameters)
  - Easier to understand
  - Generalizes better (if data is noisy)

Inductive Bias
- Learning is an ill-posed problem:
  - Training data is never sufficient to find a unique solution
  - There are always infinitely many consistent hypotheses
- We need an inductive bias: assumptions that entail a unique h for a training set $\mathcal{X}$
  - Hypothesis class H: axis-aligned rectangles
  - Learning algorithm: find consistent hypothesis with max margin
  - Hyperparameters: trade-off between training error and margin

Generalization
- Generalization: how well a model performs on new data
  - Overfitting: H more complex than C
  - Underfitting: H less complex than C
- Trade-off between three factors:
  - Complexity of H, c(H)
  - Training set size N
  - Generalization error E on new data
- Dependencies:
  - As N increases, E decreases
  - As c(H) increases, E first decreases and then increases

Model Selection
- To estimate generalization error, we need data unseen during training:
$$\hat{E} = E(h \mid \mathcal{V}) = \sum_{t=1}^{M} 1\big(h(x^t) \ne r^t\big), \qquad \mathcal{V} = \{x^t, r^t\}_{t=1}^{M}$$
- Given models (hypotheses) $h_1, \ldots, h_k$ induced from the training set $\mathcal{X}$, we can use $E(h_i \mid \mathcal{V})$ to select the model $h_i$ with the smallest generalization error

Model Assessment
- To estimate the generalization error of the best model $h_i$, we need data unseen during training and model selection
- Standard setup:
  - Training set $\mathcal{X}$ (50-80%)
  - Validation (development) set $\mathcal{V}$ (10-25%)
  - Test (publication) set $\mathcal{T}$ (10-25%)
- Note:
  - Validation data can be added to training set before testing
  - Resampling methods can be used if data is limited
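The standard setup and the selection rule above can be sketched as follows. The 60/20/20 split, the threshold classifiers, and the names `split`, `error`, and `select_model` are assumptions chosen for illustration; the slides only give the ranges 50-80%, 10-25%, and 10-25%.

```python
import random


def split(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle and cut the data into training, validation, and test sets."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )


def error(h, dataset):
    """E(h | V): number of examples in the dataset that h misclassifies."""
    return sum(1 for x, r in dataset if h(x) != r)


def select_model(hypotheses, validation_set):
    """Return the name of the hypothesis with the smallest validation error."""
    return min(hypotheses, key=lambda name: error(hypotheses[name], validation_set))


# Made-up task: the label is 1 exactly when x >= 10.
data = [(x, int(x >= 10)) for x in range(20)]
training, validation, test = split(data)

# Candidate hypotheses; in practice these would be induced from the training set.
hypotheses = {f"t={t}": (lambda x, t=t: int(x >= t)) for t in (5, 10, 15)}

best = select_model(hypotheses, validation)
# The test set is touched only once, to assess the selected model.
test_error = error(hypotheses[best], test)
```

Keeping the test set out of both training and selection is what makes `test_error` an honest estimate of generalization error.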

Cross-Validation
- K-fold cross-validation: divide $\mathcal{X}$ into $X_1, \ldots, X_K$; fold i validates on $V_i$ and trains on $\mathcal{X}_i$:
$$V_1 = X_1, \qquad \mathcal{X}_1 = X_2 \cup X_3 \cup \cdots \cup X_K$$
$$V_2 = X_2, \qquad \mathcal{X}_2 = X_1 \cup X_3 \cup \cdots \cup X_K$$
$$\vdots$$
$$V_K = X_K, \qquad \mathcal{X}_K = X_1 \cup X_2 \cup \cdots \cup X_{K-1}$$
- Note:
  - Generalization error estimated by means across K folds
  - Training sets for different folds share K - 2 parts
  - Separate test set must be maintained for model assessment

Bootstrapping
- Generate new training sets of size N from $\mathcal{X}$ by random sampling with replacement
- Use original training set as validation set ($\mathcal{V} = \mathcal{X}$)
- Probability that we do not pick an instance after N draws:
$$\left(1 - \frac{1}{N}\right)^{N} \approx e^{-1} = 0.368$$
  that is, only 36.8% of instances are new!

Measuring Error
- Error rate = # of errors / # of instances = (FP + FN) / N
- Accuracy = # of correct / # of instances = (TP + TN) / N
- Recall = # of found positives / # of positives = TP / (TP + FN)
- Precision = # of found positives / # of found = TP / (TP + FP)

Statistical Inference
- Interval estimation to quantify the precision of our measurements:
$$m \pm 1.96 \frac{\sigma}{\sqrt{N}}$$
- Hypothesis testing to assess whether differences between models are statistically significant:
$$\frac{\left(|e_{01} - e_{10}| - 1\right)^2}{e_{01} + e_{10}} \sim \chi^2_1$$
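The quantities on these slides are simple enough to compute by hand. The sketch below gives one possible implementation of K-fold index splitting, the bootstrap probability, the four error measures, and the test statistic from the last formula (McNemar's test). All function names and counts are hypothetical, and the interleaved fold assignment is just one way to divide the data.

```python
import math


def k_fold_splits(n, k):
    """Yield (training_indices, validation_indices) for each of k folds over range(n)."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        training = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield training, folds[i]


def prob_never_picked(n):
    """Probability that a given instance is missed in n draws with replacement."""
    return (1 - 1 / n) ** n


def error_measures(tp, fp, fn, tn):
    """Error rate, accuracy, recall, and precision from confusion-matrix counts."""
    n = tp + fp + fn + tn
    return {
        "error_rate": (fp + fn) / n,
        "accuracy": (tp + tn) / n,
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


def mcnemar_statistic(e01, e10):
    """(|e01 - e10| - 1)^2 / (e01 + e10), where e01 and e10 count the examples
    misclassified by only one of the two models being compared."""
    return (abs(e01 - e10) - 1) ** 2 / (e01 + e10)


splits = list(k_fold_splits(n=6, k=3))
p_missed = prob_never_picked(1000)            # close to exp(-1)
measures = error_measures(tp=40, fp=10, fn=20, tn=30)
statistic = mcnemar_statistic(e01=10, e10=20)
# Compare with 3.84, the 5% critical value of chi-square with 1 degree of freedom.
significant = statistic > 3.84
```

With these made-up counts the statistic stays below the critical value, so the difference between the two models would not be called significant at the 5% level.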

Supervised Learning Summary
- Training data + learner → hypothesis
  - Learner incorporates inductive bias
- Test data + hypothesis → estimated generalization
  - Test data must be unseen
- Next three lectures: Different learners with different inductive biases

Anatomy of a Supervised Learner
- Model:
$$g(x \mid \theta)$$
- Loss function:
$$E(\theta \mid \mathcal{X}) = \sum_{t} L\big(r^t, g(x^t \mid \theta)\big)$$
- Optimization procedure:
$$\theta^* = \arg\min_{\theta} E(\theta \mid \mathcal{X})$$
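The three components (model, loss function, optimization procedure) can be made concrete with a small sketch. It assumes a linear model, squared loss, and plain gradient descent with a hand-picked learning rate; these are illustrative choices, not the ones prescribed by the slides, and the data is made up.

```python
def g(x, theta):
    """Model: g(x | theta) = w * x + w0."""
    w, w0 = theta
    return w * x + w0


def loss(theta, data):
    """E(theta | X): sum over examples of the squared loss L(r, g(x | theta))."""
    return sum((r - g(x, theta)) ** 2 for x, r in data)


def minimize(data, learning_rate=0.02, steps=2000):
    """Optimization procedure: gradient descent on E(theta | X), starting at zero."""
    w, w0 = 0.0, 0.0
    for _ in range(steps):
        residuals = [(x, r - g(x, (w, w0))) for x, r in data]
        grad_w = sum(-2 * res * x for x, res in residuals)
        grad_w0 = sum(-2 * res for _, res in residuals)
        w -= learning_rate * grad_w
        w0 -= learning_rate * grad_w0
    return w, w0


# Made-up training data generated by r = 2 * x + 1 without noise.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

theta_star = minimize(data)
final_loss = loss(theta_star, data)
```

The learning rate matters: for this data a much larger value than 0.02 would make the updates diverge instead of converging to the minimum.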
