Linear Regression & Gradient Descent
1 Linear Regression
2 Regression
Given: n data points X = {x^(1), ..., x^(n)} where x^(i) ∈ R^d
Corresponding labels y = {y^(1), ..., y^(n)} where y^(i) ∈ R
[Figure: September Arctic Sea Ice Extent (1,000,000 sq km) vs. year, with linear and quadratic regression fits. Data from G. Witt, Journal of Statistics Education, Volume 21, Number 1 (2013)]
3 Prostate Cancer Dataset
97 samples, partitioned into 67 train / 30 test.
Eight predictors (features): 6 continuous (4 log transforms), 1 binary, 1 ordinal.
Continuous outcome variable: lpsa = log(prostate specific antigen level).
Based on slide by Jeff Howbert
4 Hypothesis: Linear Regression
h_θ(x) = θ_0 + θ_1 x_1 + ... + θ_d x_d
Assume x_0 = 1, so that h_θ(x) = Σ_{j=0}^d θ_j x_j.
Fit the model by minimizing the sum of squared errors.
Figures are courtesy of Greg Shakhnarovich
5 Least Squares Linear Regression
Cost function: J(θ) = 1/(2n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i))²
Fit by solving min_θ J(θ).
6 Intuition Behind Cost Function
J(θ) = 1/(2n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i))²
For insight on J(θ), let's assume x ∈ R, so θ = [θ_0, θ_1].
Based on example by Andrew Ng
7 Intuition Behind Cost Function
J(θ) = 1/(2n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i))²
For insight on J(θ), let's assume x ∈ R, so θ = [θ_0, θ_1].
Left: for fixed θ, h_θ(x) is a function of x. Right: J is a function of the parameter θ.
[Figure: data points and hypothesis line on the left; J(θ_1) curve on the right.]
Based on example by Andrew Ng
8 Intuition Behind Cost Function
J(θ) = 1/(2n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i))²
For insight on J(θ), let's assume x ∈ R, so θ = [θ_0, θ_1].
Left: for fixed θ, h_θ(x) is a function of x. Right: J is a function of the parameter θ.
Example with points (1, 1), (2, 2), (3, 3) and θ = [0, 0.5]:
J([0, 0.5]) = 1/(2·3) [(0.5 − 1)² + (1 − 2)² + (1.5 − 3)²] ≈ 0.58
Based on example by Andrew Ng
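The arithmetic on this slide can be checked with a short script. This is only an illustrative sketch; the three data points (1, 1), (2, 2), (3, 3) are inferred from the slide's residuals.

```python
import numpy as np

def cost(theta, X, y):
    """Least squares cost J(theta) = 1/(2n) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    n = len(y)
    residuals = X @ theta - y
    return (residuals @ residuals) / (2 * n)

# Toy data inferred from the slide: points (1,1), (2,2), (3,3),
# with a leading column of ones for the intercept feature x_0 = 1.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

print(cost(np.array([0.0, 0.5]), X, y))  # (0.25 + 1 + 2.25) / 6
print(cost(np.array([0.0, 1.0]), X, y))  # perfect fit: 0.0
```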
9 Intuition Behind Cost Function
J(θ) = 1/(2n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i))²
For insight on J(θ), let's assume x ∈ R, so θ = [θ_0, θ_1].
Left: for fixed θ, h_θ(x) is a function of x. Right: J is a function of the parameter θ; the point J([0, 0]) is marked on the curve.
J(θ) is convex.
Based on example by Andrew Ng
10–14 Intuition Behind Cost Function
[Animation over several slides: the left panel shows h_θ(x) for a fixed θ as a function of x; the right panel shows J as a function of the parameters θ, drawn as a contour plot. Different choices of θ correspond to different lines on the left and different points on the contours on the right. Slides by Andrew Ng]
15–16 Basic Search Procedure
Choose an initial value for θ.
Until we reach a minimum: choose a new value for θ to reduce J(θ).
[Figure: surface plot of J(θ_0, θ_1), with the search path descending the surface. Figure by Andrew Ng]
17 Basic Search Procedure
Since the least squares objective function is convex, we don't need to worry about local minima.
[Figure by Andrew Ng]
18 Gradient Descent
Initialize θ.
Repeat until convergence: θ_j ← θ_j − α ∂J(θ)/∂θ_j   (simultaneous update for j = 0 ... d)
α is the learning rate (a small constant).
19 Gradient Descent
Initialize θ. Repeat until convergence: θ_j ← θ_j − α ∂J(θ)/∂θ_j   (simultaneous update for j = 0 ... d)
For linear regression:
∂J(θ)/∂θ_j = ∂/∂θ_j [ 1/(2n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i))² ]
20 Gradient Descent
= ∂/∂θ_j [ 1/(2n) Σ_{i=1}^n (Σ_{k=0}^d θ_k x_k^(i) − y^(i))² ]
21 Gradient Descent
= 1/n Σ_{i=1}^n (Σ_{k=0}^d θ_k x_k^(i) − y^(i)) · ∂/∂θ_j (Σ_{k=0}^d θ_k x_k^(i) − y^(i))
22 Gradient Descent
= 1/n Σ_{i=1}^n (Σ_{k=0}^d θ_k x_k^(i) − y^(i)) · x_j^(i)
23 Gradient Descent for Linear Regression
Initialize θ.
Repeat until convergence: θ_j ← θ_j − α (1/n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i)) x_j^(i)   (simultaneous update for j = 0 ... d)
To achieve the simultaneous update: at the start of each GD iteration, compute h_θ(x^(i)) for every instance; use these stored values in the update step loop.
Assume convergence when ‖θ_new − θ_old‖₂ < ε, where the L₂ norm is ‖v‖₂ = √(Σ_i v_i²) = √(v · v).
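The update rule and convergence test above can be sketched in a few lines of NumPy. The function name, learning rate, and tolerance below are hypothetical choices, not values from the slides.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, tol=1e-8, max_iters=10000):
    """Batch gradient descent for least squares linear regression.

    X is n x (d+1) with a leading column of ones; returns theta in R^(d+1).
    """
    n, d1 = X.shape
    theta = np.zeros(d1)
    for _ in range(max_iters):
        predictions = X @ theta                 # h_theta(x^(i)) for all i, stored first
        gradient = X.T @ (predictions - y) / n  # (1/n) sum_i (h - y) x_j^(i), all j at once
        new_theta = theta - alpha * gradient    # simultaneous update of every theta_j
        if np.linalg.norm(new_theta - theta) < tol:  # ||theta_new - theta_old||_2 < eps
            return new_theta
        theta = new_theta
    return theta
```

Computing all predictions before touching θ is exactly the "simultaneous update" the slide asks for: no component of θ changes until the full gradient has been evaluated.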
24–32 Gradient Descent
[Animation over several slides: the left panel shows the current hypothesis h_θ(x) for fixed θ as a function of x; the right panel shows the contour plot of J as a function of the parameters θ. Each gradient descent step moves θ toward the minimum of J, and the fitted line on the left improves accordingly. Slides by Andrew Ng]
33 Choosing α
α too small: slow convergence.
α too large: J(θ) may increase at some iterations; may overshoot the minimum; may fail to converge; may even diverge.
To see if gradient descent is working, print out J(θ) each iteration. The value should decrease at each iteration; if it doesn't, adjust α.
34 Extending Linear Regression to More Complex Models
The inputs X for linear regression can be:
- Original quantitative inputs
- Transformations of quantitative inputs, e.g. log, exp, square root, square, etc.
- Polynomial transformations, example: y = β_0 + β_1 x + β_2 x² + β_3 x³
- Basis expansions
- Dummy coding of categorical inputs
- Interactions between variables, example: x_3 = x_1 · x_2
This allows use of linear regression techniques to fit non-linear datasets.
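As an illustrative sketch of the transformations above, the function name and the particular expansion here are my own: it builds a design matrix with polynomial terms of one input plus an interaction term, after which ordinary linear regression applies unchanged.

```python
import numpy as np

def expand_features(x1, x2):
    """Hypothetical expansion: polynomial terms of x1 plus an interaction x1*x2.

    Returns the design matrix [1, x1, x1^2, x1^3, x1*x2]. The model stays
    linear in the coefficients even though the fit is non-linear in x1.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack([np.ones_like(x1), x1, x1**2, x1**3, x1 * x2])

print(expand_features([2.0], [3.0]))  # [[1. 2. 4. 8. 6.]]
```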
35 Linear Basis Function Models
Generally, h_θ(x) = Σ_{j=0}^d θ_j φ_j(x), where φ_j(x) is a basis function.
Typically, φ_0(x) = 1 so that θ_0 acts as a bias.
In the simplest case, we use linear basis functions: φ_j(x) = x_j.
Based on slide by Christopher Bishop (PRML)
36 Linear Basis Function Models
Polynomial basis functions: φ_j(x) = x^j. These are global; a small change in x affects all basis functions.
Gaussian basis functions: φ_j(x) = exp(−(x − μ_j)² / (2s²)). These are local; a small change in x only affects nearby basis functions. μ_j and s control location and scale (width).
Based on slide by Christopher Bishop (PRML)
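A minimal sketch of the Gaussian basis expansion above; the centers and width passed in below are arbitrary illustrative choices.

```python
import numpy as np

def gaussian_basis(x, centers, s):
    """Design matrix with phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)).

    Each row corresponds to one scalar input; a constant phi_0(x) = 1
    column is prepended so theta_0 acts as the bias.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    centers = np.asarray(centers, dtype=float).reshape(1, -1)
    phi = np.exp(-((x - centers) ** 2) / (2 * s**2))
    return np.hstack([np.ones((x.shape[0], 1)), phi])
```

At x = μ_j the basis function takes its maximum value 1, and it decays toward 0 as x moves away, which is exactly the locality the slide describes.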
37 Linear Basis Function Models
Sigmoidal basis functions: φ_j(x) = σ((x − μ_j)/s), where σ(a) = 1/(1 + exp(−a)).
These are also local; a small change in x only affects nearby basis functions. μ_j and s control location and scale (slope).
Based on slide by Christopher Bishop (PRML)
38 Example of Fitting a Polynomial Curve with a Linear Model
y = θ_0 + θ_1 x + θ_2 x² + ... + θ_p x^p = Σ_{j=0}^p θ_j x^j
39 Linear Basis Function Models
Basic linear model: h_θ(x) = Σ_{j=0}^d θ_j x_j
Generalized linear model: h_θ(x) = Σ_{j=0}^d θ_j φ_j(x)
Once we have replaced the data by the outputs of the basis functions, fitting the generalized model is exactly the same problem as fitting the basic model.
Unless we use the kernel trick (more on that when we cover support vector machines).
Therefore, there is no point in cluttering the math with basis functions.
Based on slide by Geoff Hinton
40 Linear Algebra Concepts
A vector in R^d is an ordered set of d real numbers, e.g., v = [1, 6, 3, 4] is in R^4.
[1, 6, 3, 4] is a column vector, as opposed to the row vector (1 6 3 4).
An m-by-n matrix is an object with m rows and n columns, where each entry is a real number.
Based on slides by Joseph Bradley
41 Linear Algebra Concepts
Transpose: reflect the vector/matrix across its main diagonal. A row vector (a b) transposes to a column vector, and [a b; c d]^T = [a c; b d].
Note: (Ax)^T = x^T A^T. (We'll define multiplication soon.)
Vector norms: the L_p norm of v = (v_1, ..., v_k) is ‖v‖_p = (Σ_i |v_i|^p)^(1/p).
Common norms: L_1, L_2; L_∞ = max_i |v_i|.
The length of a vector v is its L_2 norm.
Based on slides by Joseph Bradley
42 Linear Algebra Concepts
Vector dot product: u · v = (u_1, u_2) · (v_1, v_2) = u_1 v_1 + u_2 v_2.
Note: the dot product of u with itself equals length(u)² = ‖u‖₂².
Matrix product: for A ∈ R^(m×n) and B ∈ R^(n×p), entry (i, j) of AB is the dot product of row i of A with column j of B:
[a_11 a_12; a_21 a_22] [b_11 b_12; b_21 b_22] = [a_11 b_11 + a_12 b_21, a_11 b_12 + a_12 b_22; a_21 b_11 + a_22 b_21, a_21 b_12 + a_22 b_22]
Based on slides by Joseph Bradley
43 Linear Algebra Concepts
Vector products:
Dot product: u · v = u^T v = (u_1 u_2) (v_1; v_2) = u_1 v_1 + u_2 v_2
Outer product: u v^T = (u_1; u_2) (v_1 v_2) = [u_1 v_1, u_1 v_2; u_2 v_1, u_2 v_2]
Based on slides by Joseph Bradley
44 Benefits of vectorizajon More compact equajons VectorizaJon Faster code (using opjmized matrix libraries) Consider our model: Let = d h(x) = dx j=0 j x j x = 1 x 1... x d Can write the model in vectorized form as h(x) = x 45
45 Vectorization
Consider our model for n instances: h_θ(x^(i)) = Σ_{j=0}^d θ_j x_j^(i)
Let θ ∈ R^(d+1), and let X ∈ R^(n×(d+1)) be the matrix whose i-th row is [1, x_1^(i), ..., x_d^(i)].
Can write the model in vectorized form as h_θ(x) = Xθ.
46 Vectorization
Let y = [y^(1); y^(2); ...; y^(n)].
The linear regression cost function becomes:
J(θ) = 1/(2n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i))² = 1/(2n) Σ_{i=1}^n (θ^T x^(i) − y^(i))² = 1/(2n) (Xθ − y)^T (Xθ − y)
with (Xθ − y)^T ∈ R^(1×n), (Xθ − y) ∈ R^(n×1), X ∈ R^(n×(d+1)), and θ ∈ R^((d+1)×1).
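To make the equivalence above concrete, here is a sketch comparing the per-instance sum with the vectorized matrix form; the test data are made up.

```python
import numpy as np

def cost_loop(theta, X, y):
    """J(theta) computed instance by instance, as in the summation form."""
    n = len(y)
    return sum((X[i] @ theta - y[i]) ** 2 for i in range(n)) / (2 * n)

def cost_vectorized(theta, X, y):
    """The same J(theta), written as 1/(2n) (X theta - y)^T (X theta - y)."""
    r = X @ theta - y
    return (r @ r) / (2 * len(y))
```

Both return the same number; the vectorized form simply lets the matrix library do the loop.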
47 Closed Form Solution
Instead of using GD, solve for the optimal θ analytically.
Notice that the solution is where ∂J(θ)/∂θ = 0.
Derivation:
J(θ) = 1/(2n) (Xθ − y)^T (Xθ − y) ∝ θ^T X^T X θ − 2 θ^T X^T y + y^T y
Take the derivative with respect to θ, set it equal to 0, and solve for θ:
2 X^T X θ − 2 X^T y = 0  ⇒  (X^T X) θ = X^T y
Closed form: θ = (X^T X)^(−1) X^T y
48 Closed Form Solution
Can obtain θ by simply plugging X and y into θ = (X^T X)^(−1) X^T y.
If X^T X is not invertible (i.e., singular), we may need to:
- Use the pseudo-inverse instead of the inverse; in Python, numpy.linalg.pinv(a)
- Remove redundant (not linearly independent) features
- Remove extra features to ensure that d ≤ n
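A sketch of the closed form solution, using the pseudo-inverse as the slide suggests so that a singular X^T X does not crash the computation; the function name is mine.

```python
import numpy as np

def fit_closed_form(X, y):
    """Normal equations theta = (X^T X)^(-1) X^T y, via the pseudo-inverse.

    numpy.linalg.pinv handles the case where X^T X is singular (e.g.,
    linearly dependent features), matching the slide's advice.
    """
    return np.linalg.pinv(X.T @ X) @ (X.T @ y)
```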
49 Gradient Descent vs Closed Form
Gradient Descent: requires multiple iterations; need to choose α; works well when n is large; can support incremental learning.
Closed Form Solution: non-iterative; no need for α; slow when the number of features is large, since computing (X^T X)^(−1) is roughly cubic in the number of features.
50 Improving Learning: Feature Scaling
Idea: ensure that features have similar scales. This makes gradient descent converge much faster.
[Figures: contours of J(θ) before feature scaling (elongated) and after feature scaling (roughly circular).]
51 Feature Standardization
Rescales features to have zero mean and unit variance.
Let μ_j be the mean of feature j: μ_j = (1/n) Σ_{i=1}^n x_j^(i)
Replace each value with: x_j^(i) ← (x_j^(i) − μ_j) / s_j   for j = 1 ... d (not x_0!)
where s_j is the standard deviation of feature j.
Could also use the range of feature j (max_j − min_j) for s_j.
Must apply the same transformation to instances for both training and prediction.
Outliers can cause problems.
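The standardization recipe above can be sketched as two small helpers (the names are mine); returning (μ, s) is what lets the identical transformation be reused at prediction time, as the slide requires. X here is assumed to hold only the d raw feature columns, without the constant x_0 column.

```python
import numpy as np

def standardize_train(X):
    """Zero-mean, unit-variance scaling of each feature column.

    Returns the scaled matrix plus (mu, s) so the SAME transformation
    can be applied to test instances later.
    """
    mu = X.mean(axis=0)
    s = X.std(axis=0)
    return (X - mu) / s, mu, s

def standardize_apply(X, mu, s):
    """Apply a previously fitted standardization to new instances."""
    return (X - mu) / s
```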
52 Quality of Fit
[Figures: price vs. size with three fits: underfitting (high bias), correct fit, overfitting (high variance).]
Overfitting: the learned hypothesis may fit the training set very well (J(θ) ≈ 0) but fails to generalize to new examples.
Based on example by Andrew Ng
53 Regularization
A method for automatically controlling the complexity of the learned hypothesis.
Idea: penalize large values of θ_j. Can incorporate this penalty into the cost function.
Works well when we have a lot of features, each of which contributes a bit to predicting the label.
Can also address overfitting by eliminating features (either manually or via model selection).
54 Regularization
Linear regression objective function:
J(θ) = 1/(2n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i))² + λ/2 Σ_{j=1}^d θ_j²
The first term measures the model's fit to the data; the second is the regularization term.
λ is the regularization parameter (λ ≥ 0).
No regularization on θ_0!
55 Understanding Regularization
J(θ) = 1/(2n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i))² + λ/2 Σ_{j=1}^d θ_j²
Note that Σ_{j=1}^d θ_j² = ‖θ_{1:d}‖₂². This is the squared magnitude of the feature coefficient vector!
We can also think of this as Σ_{j=1}^d (θ_j − 0)² = ‖θ_{1:d} − 0⃗‖₂².
L₂ regularization pulls the coefficients toward 0.
56–57 Understanding Regularization
J(θ) = 1/(2n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i))² + λ/2 Σ_{j=1}^d θ_j²
What happens if we set λ to be huge? The penalty dominates, so all θ_j for j ≥ 1 are driven toward 0, and the fit approaches the constant hypothesis h(x) = θ_0.
[Figures: price vs. size; as λ grows, the fitted curve flattens toward a horizontal line.]
Based on example by Andrew Ng
58 Regularized Linear Regression
Cost function: J(θ) = 1/(2n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i))² + λ/2 Σ_{j=1}^d θ_j²
Fit by solving min_θ J(θ).
Gradient update:
θ_0 ← θ_0 − α (1/n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i))
θ_j ← θ_j − α (1/n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i)) x_j^(i) − α λ θ_j   for j ≥ 1
where the final term comes from the regularization.
59 Regularized Linear Regression
J(θ) = 1/(2n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i))² + λ/2 Σ_{j=1}^d θ_j²
θ_j ← θ_j − α (1/n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i)) x_j^(i) − α λ θ_j
We can rewrite the gradient step as:
θ_j ← θ_j (1 − α λ) − α (1/n) Σ_{i=1}^n (h_θ(x^(i)) − y^(i)) x_j^(i)
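One step of the weight-decay form of the update can be sketched as follows, vectorized over all θ_j at once, with the bias θ_0 excluded from the decay; the helper name and the test values are mine.

```python
import numpy as np

def ridge_gd_step(theta, X, y, alpha, lam):
    """One regularized GD step in the 'weight decay' form:
    theta_j <- theta_j * (1 - alpha*lam) - alpha * (1/n) sum_i (h - y) x_j^(i),
    with no shrinkage applied to the bias theta_0.
    """
    n = len(y)
    grad = X.T @ (X @ theta - y) / n       # unregularized gradient, all j at once
    decay = np.full_like(theta, 1.0 - alpha * lam)
    decay[0] = 1.0                         # no regularization on theta_0
    return theta * decay - alpha * grad
```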
60 Regularized Linear Regression
To incorporate regularization into the closed form solution:
θ = (X^T X + nλ E)^(−1) X^T y
where E ∈ R^((d+1)×(d+1)) is the identity matrix with its upper-left entry (the one corresponding to θ_0) set to 0, so that θ_0 is not penalized.
61 Regularized Linear Regression
Can derive this the same way as before, by solving ∂J(θ)/∂θ = 0.
Can prove that for λ > 0, the inverse always exists.
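A sketch of the regularized closed form; the nλ scaling is my reading of how the λ/2 penalty combines with the 1/(2n) data term on these slides, and the function name is mine.

```python
import numpy as np

def fit_ridge_closed_form(X, y, lam):
    """Regularized closed form: theta = (X^T X + n*lam*E)^(-1) X^T y.

    E is the identity with its (0,0) entry zeroed, so the bias theta_0
    is not penalized. np.linalg.solve avoids forming an explicit inverse.
    """
    n, d1 = X.shape
    E = np.eye(d1)
    E[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + n * lam * E, X.T @ y)
```

With λ = 0 this reduces to the ordinary normal equations; as λ grows, the non-bias coefficients shrink toward 0.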
These slides were assembled by Byron Boots, with grateful acknowledgement to Eric Eaton and the many others who made their course materials freely available online.
More informationContent-based image and video analysis. Machine learning
Content-based image and video analysis Machine learning for multimedia retrieval 04.05.2009 What is machine learning? Some problems are very hard to solve by writing a computer program by hand Almost all
More informationInstance-based Learning
Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 15 th, 2007 2005-2007 Carlos Guestrin 1 1-Nearest Neighbor Four things make a memory based learner:
More informationCPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017
CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2017 Assignment 3: 2 late days to hand in tonight. Admin Assignment 4: Due Friday of next week. Last Time: MAP Estimation MAP
More informationPenalizied Logistic Regression for Classification
Penalizied Logistic Regression for Classification Gennady G. Pekhimenko Department of Computer Science University of Toronto Toronto, ON M5S3L1 pgen@cs.toronto.edu Abstract Investigation for using different
More informationLocally Weighted Learning for Control. Alexander Skoglund Machine Learning Course AASS, June 2005
Locally Weighted Learning for Control Alexander Skoglund Machine Learning Course AASS, June 2005 Outline Locally Weighted Learning, Christopher G. Atkeson et. al. in Artificial Intelligence Review, 11:11-73,1997
More informationLecture #11: The Perceptron
Lecture #11: The Perceptron Mat Kallada STAT2450 - Introduction to Data Mining Outline for Today Welcome back! Assignment 3 The Perceptron Learning Method Perceptron Learning Rule Assignment 3 Will be
More informationData Analysis 3. Support Vector Machines. Jan Platoš October 30, 2017
Data Analysis 3 Support Vector Machines Jan Platoš October 30, 2017 Department of Computer Science Faculty of Electrical Engineering and Computer Science VŠB - Technical University of Ostrava Table of
More information1 Case study of SVM (Rob)
DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how
More informationInstance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015
Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2015 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows K-Nearest
More informationLarge-Scale Lasso and Elastic-Net Regularized Generalized Linear Models
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models DB Tsai Steven Hillion Outline Introduction Linear / Nonlinear Classification Feature Engineering - Polynomial Expansion Big-data
More informationCPSC 340: Machine Learning and Data Mining. Regularization Fall 2016
CPSC 340: Machine Learning and Data Mining Regularization Fall 2016 Assignment 2: Admin 2 late days to hand it in Friday, 3 for Monday. Assignment 3 is out. Due next Wednesday (so we can release solutions
More informationRecap: Gaussian (or Normal) Distribution. Recap: Minimizing the Expected Loss. Topics of This Lecture. Recap: Maximum Likelihood Approach
Truth Course Outline Machine Learning Lecture 3 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Probability Density Estimation II 2.04.205 Discriminative Approaches (5 weeks)
More informationOverfitting. Machine Learning CSE546 Carlos Guestrin University of Washington. October 2, Bias-Variance Tradeoff
Overfitting Machine Learning CSE546 Carlos Guestrin University of Washington October 2, 2013 1 Bias-Variance Tradeoff Choice of hypothesis class introduces learning bias More complex class less bias More
More informationMachine Learning Lecture 3
Many slides adapted from B. Schiele Machine Learning Lecture 3 Probability Density Estimation II 26.04.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course
More informationLearning via Optimization
Lecture 7 1 Outline 1. Optimization Convexity 2. Linear regression in depth Locally weighted linear regression 3. Brief dips Logistic Regression [Stochastic] gradient ascent/descent Support Vector Machines
More informationInstance-based Learning
Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 19 th, 2007 2005-2007 Carlos Guestrin 1 Why not just use Linear Regression? 2005-2007 Carlos Guestrin
More informationLecture on Modeling Tools for Clustering & Regression
Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into
More informationClustering algorithms and autoencoders for anomaly detection
Clustering algorithms and autoencoders for anomaly detection Alessia Saggio Lunch Seminars and Journal Clubs Université catholique de Louvain, Belgium 3rd March 2017 a Outline Introduction Clustering algorithms
More informationConvexization in Markov Chain Monte Carlo
in Markov Chain Monte Carlo 1 IBM T. J. Watson Yorktown Heights, NY 2 Department of Aerospace Engineering Technion, Israel August 23, 2011 Problem Statement MCMC processes in general are governed by non
More information6 Model selection and kernels
6. Bias-Variance Dilemma Esercizio 6. While you fit a Linear Model to your data set. You are thinking about changing the Linear Model to a Quadratic one (i.e., a Linear Model with quadratic features φ(x)
More informationNatural Language Processing with Deep Learning CS224N/Ling284. Christopher Manning Lecture 4: Backpropagation and computation graphs
Natural Language Processing with Deep Learning CS4N/Ling84 Christopher Manning Lecture 4: Backpropagation and computation graphs Lecture Plan Lecture 4: Backpropagation and computation graphs 1. Matrix
More informationChapter 7: Numerical Prediction
Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Chapter 7: Numerical Prediction Lecture: Prof. Dr.
More informationData Preprocessing. Javier Béjar. URL - Spring 2018 CS - MAI 1/78 BY: $\
Data Preprocessing Javier Béjar BY: $\ URL - Spring 2018 C CS - MAI 1/78 Introduction Data representation Unstructured datasets: Examples described by a flat set of attributes: attribute-value matrix Structured
More informationData Preprocessing. Javier Béjar AMLT /2017 CS - MAI. (CS - MAI) Data Preprocessing AMLT / / 71 BY: $\
Data Preprocessing S - MAI AMLT - 2016/2017 (S - MAI) Data Preprocessing AMLT - 2016/2017 1 / 71 Outline 1 Introduction Data Representation 2 Data Preprocessing Outliers Missing Values Normalization Discretization
More informationconvolution shift invariant linear system Fourier Transform Aliasing and sampling scale representation edge detection corner detection
COS 429: COMPUTER VISON Linear Filters and Edge Detection convolution shift invariant linear system Fourier Transform Aliasing and sampling scale representation edge detection corner detection Reading:
More information1 Training/Validation/Testing
CPSC 340 Final (Fall 2015) Name: Student Number: Please enter your information above, turn off cellphones, space yourselves out throughout the room, and wait until the official start of the exam to begin.
More informationNeural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer
More information