Linear Regression QuadraJc Regression Year

Size: px
Start display at page:

Download "Linear Regression QuadraJc Regression Year"

Transcription

1 Linear Regression

2 Regression Given: n Data X = x (1),...,x (n)o where x (i) 2 R d n Corresponding labels y = y (1),...,y (n)o where y (i) 2 R September Arc+c Sea Ice Extent (1,000,000 sq km) Linear Regression QuadraJc Regression Year Data from G. WiF. Journal of StaJsJcs EducaJon, Volume 21, Number 1 (2013) 2

3 Prostate Cancer Dataset 97 samples, parjjoned into 67 train / 30 test Eight predictors (features): 6 conjnuous (4 log transforms), 1 binary, 1 ordinal ConJnuous outcome variable: lpsa: log(prostate specific anjgen level) Based on slide by Jeff Howbert

4 Hypothesis: Linear Regression y = x x d x d Assume x 0 = 1 = dx j=0 j x j Fit model by minimizing sum of squared errors x Figures are courtesy of Greg Shakhnarovich 5

5 Least Squares Linear Regression Cost FuncJon J( ) = 1 2n Fit by solving min nx i=1 J( ) h x (i) y (i) 2 6

6 IntuiJon Behind Cost FuncJon J( ) = 1 2n For insight on J(), let s assume nx h x (i) y (i) 2 i=1 x 2 R so =[ 0, 1 ] Based on example by Andrew Ng 7

7 IntuiJon Behind Cost FuncJon J( ) = 1 2n For insight on J(), let s assume nx h x (i) y (i) 2 i=1 x 2 R so =[ 0, 1 ] (for fixed, this is a funcjon of x) (funcjon of the parameter ) 3 3 y x Based on example by Andrew Ng 8

8 IntuiJon Behind Cost FuncJon J( ) = 1 2n For insight on J(), let s assume nx h x (i) y (i) 2 i=1 x 2 R so =[ 0, 1 ] (for fixed, this is a funcjon of x) (funcjon of the parameter ) 3 3 y Based on example by Andrew Ng x J([0, 0.5]) = (0.5 1) 2 +(1 2) 2 +(1.5 3)

9 IntuiJon Behind Cost FuncJon J( ) = 1 2n For insight on J(), let s assume nx h x (i) y (i) 2 i=1 x 2 R so =[ 0, 1 ] y (for fixed, this is a funcjon of x) (funcjon of the parameter ) 3 3 J([0, 0]) J() is concave x Based on example by Andrew Ng 10

10 IntuiJon Behind Cost FuncJon Slide by Andrew Ng 11

11 IntuiJon Behind Cost FuncJon (for fixed, this is a funcjon of x) (funcjon of the parameters ) Slide by Andrew Ng 12

12 IntuiJon Behind Cost FuncJon (for fixed, this is a funcjon of x) (funcjon of the parameters ) Slide by Andrew Ng 13

13 IntuiJon Behind Cost FuncJon (for fixed, this is a funcjon of x) (funcjon of the parameters ) Slide by Andrew Ng 14

14 IntuiJon Behind Cost FuncJon (for fixed, this is a funcjon of x) (funcjon of the parameters ) Slide by Andrew Ng 15

15 Basic Search Procedure Choose inijal value for UnJl we reach a minimum: Choose a new value for to reduce J( ) J(θ 0,θ 1 ) θ 1 θ 0 Figure by Andrew Ng 16

16 Basic Search Procedure Choose inijal value for UnJl we reach a minimum: Choose a new value for to reduce J( ) J(θ 0,θ 1 ) θ 1 θ 0 Figure by Andrew Ng 17

17 Basic Search Procedure Choose inijal value for UnJl we reach a minimum: Choose a new value for to reduce J( ) J(θ 0,θ 1 ) Since the least squares objecjve funcjon is convex θ 1 (concave), θ 0 we don t need to worry about local minima Figure by Andrew Ng 18

18 Gradient Descent IniJalize Repeat unjl convergence j J( j simultaneous update for j = 0... d learning rate (small) e.g., α = J( )

19 Gradient Descent IniJalize Repeat unjl convergence j J( j simultaneous update for j = 0... d For j J( ) j 2n nx h x (i) y (i) 2 i=1 X X! 2 20

20 Gradient Descent IniJalize Repeat unjl convergence j J( j simultaneous update for j = 0... d For j J( ) j j 1 2n X nx h x (i) y (i) 2 i=1 nx i=1 X dx k=0 k x (i) k! y (i)! 2 X! 21

21 Gradient Descent IniJalize Repeat unjl convergence j J( j simultaneous update for j = 0... d For j J( ) j j 1 2n = 1 n nx i=1 X nx h x (i) y (i) 2 i=1 nx i=1 dx k=0 X dx k=0 k x (i) k k x (i) k y (i)!! y j dx k=0 k x (i) k y (i)! 22

22 Gradient Descent IniJalize Repeat unjl convergence j J( j simultaneous update for j = 0... d For j J( ) j j 1 2n = 1 n = 1 n nx i=1 nx i=1 nx h x (i) y (i) 2 i=1 nx i=1 dx k=0 dx k=0 dx k=0 k x (i) k k x (i) k k x (i) k y (i) y (i)!! nx y (i) x (i) j dx k=0 k x (i) k y (i)! 23

23 Gradient Descent for Linear Regression IniJalize Repeat unjl convergence j j 1 nx h x (i) n i=1 y (i) x (i) j simultaneous update for j = 0... d To achieve simultaneous update At the start of each GD iterajon, compute h x (i) Use this stored value in the update step loop Assume convergence when L kvk 2 = 2 norm: s X vi qv 2 = v v2 v i k new old k 2 < 24

24 Gradient Descent (for fixed, this is a funcjon of x) (funcjon of the parameters ) h(x) = x Slide by Andrew Ng 25

25 Gradient Descent (for fixed, this is a funcjon of x) (funcjon of the parameters ) Slide by Andrew Ng 26

26 Gradient Descent (for fixed, this is a funcjon of x) (funcjon of the parameters ) Slide by Andrew Ng 27

27 Gradient Descent (for fixed, this is a funcjon of x) (funcjon of the parameters ) Slide by Andrew Ng 28

28 Gradient Descent (for fixed, this is a funcjon of x) (funcjon of the parameters ) Slide by Andrew Ng 29

29 Gradient Descent (for fixed, this is a funcjon of x) (funcjon of the parameters ) Slide by Andrew Ng 30

30 Gradient Descent (for fixed, this is a funcjon of x) (funcjon of the parameters ) Slide by Andrew Ng 31

31 Gradient Descent (for fixed, this is a funcjon of x) (funcjon of the parameters ) Slide by Andrew Ng 32

32 Gradient Descent (for fixed, this is a funcjon of x) (funcjon of the parameters ) Slide by Andrew Ng 33

33 Choosing α α too small slow convergence α too large Increasing value for J( ) May overshoot the minimum May fail to converge May even diverge To see if gradient descent is working, print out The value should decrease at each iterajon If it doesn t, adjust α J( ) each iterajon 34

34 Extending Linear Regression to More Complex Models The inputs X for linear regression can be: Original quanjtajve inputs TransformaJon of quanjtajve inputs e.g. log, exp, square root, square, etc. Polynomial transformajon example: y = β 0 + β 1 x + β 2 x 2 + β 3 x 3 Basis expansions Dummy coding of categorical inputs InteracJons between variables example: x 3 = x 1 x 2 This allows use of linear regression techniques to fit non- linear datasets.

35 Linear Basis FuncJon Models Generally, h (x) = dx j=0 j j (x) basis funcjon Typically, 0(x) =1so that 0 acts as a bias In the simplest case, we use linear basis funcjons : j(x) =x j Based on slide by Christopher Bishop (PRML)

36 Linear Basis FuncJon Models Polynomial basis funcjons: These are global; a small change in x affects all basis funcjons Gaussian basis funcjons: These are local; a small change in x only affect nearby basis funcjons. μ j and s control locajon and scale (width). Based on slide by Christopher Bishop (PRML)

37 Based on slide by Christopher Bishop (PRML) Linear Basis FuncJon Models Sigmoidal basis funcjons: where These are also local; a small change in x only affects nearby basis funcjons. μ j and s control locajon and scale (slope).

38 Example of Fimng a Polynomial Curve with a Linear Model y = x + 2 x p x p = px j x j j=0

39 Linear Basis FuncJon Models Basic Linear Model: Generalized Linear Model: h (x) = h (x) = Once we have replaced the data by the outputs of the basis funcjons, fimng the generalized model is exactly the same problem as fimng the basic model Unless we use the kernel trick more on that when we cover support vector machines Therefore, there is no point in clufering the math with basis funcjons dx j=0 dx j=0 j x j j j (x) Based on slide by Geoff Hinton 40

40 Linear Algebra Concepts Vector in R d is an ordered set of d real numbers e.g., v = [1,6,3,4] is in R 4 [1,6,3,4] is a column vector: as opposed to a row vector: ( ) An m- by- n matrix is an object with m rows and n columns, where each entry is a real number: Based on slides by Joseph Bradley

41 Linear Algebra Concepts Transpose: reflect vector/matrix on line: a T = b ( a b) a c b d T = a b c d Note: (Ax) T =x T A T (We ll define muljplicajon soon ) Vector norms: L p norm of v = (v 1,,v k ) is Common norms: L 1, L 2 L infinity = max i v i Length of a vector v is L 2 (v) Based on slides by Joseph Bradley X i v i p! 1 p

42 Linear Algebra Concepts Vector dot product: ( u1 u2) ( v1 v2 ) = u1v 1 u2v2 u v = + Note: dot product of u with itself = length(u) 2 = kuk 2 2 Matrix product: A AB a = a a = a b b a a , + + B = a a b b b b a a b b b b a a b b Based on slides by Joseph Bradley

43 Vector products: Dot product: Outer product: ( ) v u u v v v u u v u v u T + = = = ( ) = = v u v u u v u v v v u u uv T Based on slides by Joseph Bradley Linear Algebra Concepts

44 Benefits of vectorizajon More compact equajons VectorizaJon Faster code (using opjmized matrix libraries) Consider our model: Let = d h(x) = dx j=0 j x j x = 1 x 1... x d Can write the model in vectorized form as h(x) = x 45

45 VectorizaJon Consider our model for n instances: h x (i) dx = j x (i) j Let = d R (d+1) X = Can write the model in vectorized form as j=0 1 x (1) 1... x (1) d x (i) 1... x (i) d x (n) 1... x (n) d R n (d+1) h (x) =X 46

46 VectorizaJon Let: y = For the linear regression cost funcjon: J( ) = 1 nx h x (i) y (i) 2 2n y (1) y (2). y (n) = 1 2n i=1 nx i=1 x (i) y (i) 2 = 1 2n (X y) (X y) R 1 n R n 1 R n (d+1) R (d+1) 1 47

47 Closed Form SoluJon Instead of using GD, solve for opjmal NoJce that the solujon is when DerivaJon: Take derivajve and set equal to 0, then solve for ( X X 2 X y + y (X X) X y =0 Closed Form J( ) ( ) = 1 2n (X y) 1 x 1 J (X y) / X X y X X y + y y / / X X 2 X y + y y (X X) = X y analyjcally =(X X) 1 X y 48

48 Closed Form SoluJon Can obtain by simply plugging X and y into =(X X) 1 X y X = x (1) 1... x (1) d x (i) 1... x (i) d x (n) 1... x (n) d y = 6 4 If X T X is not inverjble (i.e., singular), may need to: Use pseudo- inverse instead of the inverse In python, numpy.linalg.pinv(a) Remove redundant (not linearly independent) features Remove extra features to ensure that d n y (1) y (2). y (n)

49 Gradient Descent vs Closed Form Gradient Descent Requires muljple iterajons Need to choose α Works well when n is large Can support incremental learning Closed Form Solu+on Non- iterajve No need for α Slow if n is large CompuJng (X T X) - 1 is roughly O(n 3 ) 50

50 Improving Learning: Feature Scaling Idea: Ensure that feature have similar scales Before Feature Scaling Makes gradient descent converge much faster Auer Feature Scaling

51 Feature StandardizaJon Rescales features to have zero mean and unit variance Let μ j be the mean of feature j: µ j = 1 nx x (i) j n Replace each value with: x (i) j x (i) j s j s j is the standard deviajon of feature j Could also use the range of feature j (max j min j ) for s j Must apply the same transformajon to instances for both training and predicjon Outliers can cause problems µ j i=1 for j = 1...d (not x 0!) 52

52 Quality of Fit Price Price Price Size Underfimng (high bias) Size Correct fit Size Overfimng (high variance) OverfiHng: The learned hypothesis may fit the training set very well ( J( ) 0 )...but fails to generalize to new examples Based on example by Andrew Ng 53

53 RegularizaJon A method for automajcally controlling the complexity of the learned hypothesis Idea: penalize for large values of Can incorporate into the cost funcjon Works well when we have a lot of features, each that contributes a bit to predicjng the label Can also address overfimng by eliminajng features (either manually or via model selecjon) j 54

54 RegularizaJon Linear regression objecjve funcjon J( ) = 1 2n nx i=1 h x (i) y (i) dx j=1 2 j model fit to data regularizajon is the regularizajon parameter ( 0) No regularizajon on 0! 55

55 Understanding RegularizaJon J( ) = 1 2n nx i=1 h x (i) y (i) dx j=1 2 j dx Note that j 2 = k 1:d k 2 2 j=1 This is the magnitude of the feature coefficient vector! We can also think of this as: dx ( j 0) 2 = k 1:d ~0k 2 2 j=1 L 2 regularizajon pulls coefficients toward 0 56

56 Understanding RegularizaJon J( ) = 1 2n nx i=1 h x (i) y (i) dx j=1 2 j What happens if we set to be huge (e.g., )? Price Size Based on example by Andrew Ng 57

57 Understanding RegularizaJon J( ) = 1 2n nx i=1 h x (i) y (i) dx j=1 2 j What happens if we set to be huge (e.g., )? Price Size Based on example by Andrew Ng 58

58 Regularized Linear Regression Cost FuncJon J( ) = 1 2n Fit by solving nx i=1 min h x (i) y (i) dx J( ) j=1 0 j J( ) Gradient update: n j j 1 n nx i=1 nx i=1 h x (i) y (i) h x (i) y (i) x (i) j j regularizajon 59

59 Regularized Linear Regression J( ) = 1 2n n nx h x (i) y (i) 2 dx + 2 i=1 j j 1 n nx h x (i) y (i) i=1 nx i=1 h x (i) y (i) x (i) j j=1 2 j j We can rewrite the gradient step as: j j (1 ) 1 nx h x (i) n i=1 y (i) x (i) j 60

60 Regularized Linear Regression To incorporate regularizajon into the closed form solujon: = B X X X y C. 5A

61 Regularized Linear Regression To incorporate regularizajon into the closed form solujon: = B X X X y C. 5A Can derive this the same way, by solving Can prove that for λ > 0, inverse exists in the equajon J( ) 62

Linear Regression & Gradient Descent

Linear Regression & Gradient Descent Linear Regression & Gradient Descent These slides were assembled by Byron Boots, with grateful acknowledgement to Eric Eaton and the many others who made their course materials freely available online.

More information

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University September 20 2018 Review Solution for multiple linear regression can be computed in closed form

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

CSC 4510 Machine Learning

CSC 4510 Machine Learning 4: Regression (con.nued) CSC 4510 Machine Learning Dr. Mary Angela Papalaskari Department of CompuBng Sciences Villanova University Course website: www.csc.villanova.edu/~map/4510/ The slides in this presentabon

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:

More information

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University January 22 2019 Outline Practical issues in Linear Regression Outliers Categorical variables Lab

More information

Neural Networks. Robot Image Credit: Viktoriya Sukhanova 123RF.com

Neural Networks. Robot Image Credit: Viktoriya Sukhanova 123RF.com Neural Networks These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides

More information

UVA CS 4501: Machine Learning. Lecture 22: Unsupervised Clustering (I) Dr. Yanjun Qi. University of Virginia. Department of Computer Science

UVA CS 4501: Machine Learning. Lecture 22: Unsupervised Clustering (I) Dr. Yanjun Qi. University of Virginia. Department of Computer Science UVA CS : Machine Learning Lecture : Unsupervised Clustering (I) Dr. Yanjun Qi University of Virginia Department of Computer Science Where are we? è major secjons of this course q Regression (supervised)

More information

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University September 18 2018 Logistics HW 1 is on Piazza and Gradescope Deadline: Friday, Sept. 28, 2018 Office

More information

UVA CS 6316/4501 Fall 2016 Machine Learning. Lecture 19: Unsupervised Clustering (I) Dr. Yanjun Qi. University of Virginia

UVA CS 6316/4501 Fall 2016 Machine Learning. Lecture 19: Unsupervised Clustering (I) Dr. Yanjun Qi. University of Virginia Dr. Yanjun Qi / UVA CS 66 / f6 UVA CS 66/ Fall 6 Machine Learning Lecture 9: Unsupervised Clustering (I) Dr. Yanjun Qi University of Virginia Department of Computer Science Dr. Yanjun Qi / UVA CS 66 /

More information

Slides modified from: PATTERN RECOGNITION AND MACHINE LEARNING CHRISTOPHER M. BISHOP

Slides modified from: PATTERN RECOGNITION AND MACHINE LEARNING CHRISTOPHER M. BISHOP Slides modified from: PATTERN RECOGNITION AND MACHINE LEARNING CHRISTOPHER M. BISHOP Linear regression Linear Basis FuncDon Models (1) Example: Polynomial Curve FiLng Linear Basis FuncDon Models (2) Generally

More information

CPSC 340: Machine Learning and Data Mining. Robust Regression Fall 2015

CPSC 340: Machine Learning and Data Mining. Robust Regression Fall 2015 CPSC 340: Machine Learning and Data Mining Robust Regression Fall 2015 Admin Can you see Assignment 1 grades on UBC connect? Auditors, don t worry about it. You should already be working on Assignment

More information

06: Logistic Regression

06: Logistic Regression 06_Logistic_Regression 06: Logistic Regression Previous Next Index Classification Where y is a discrete value Develop the logistic regression algorithm to determine what class a new input should fall into

More information

Understanding Andrew Ng s Machine Learning Course Notes and codes (Matlab version)

Understanding Andrew Ng s Machine Learning Course Notes and codes (Matlab version) Understanding Andrew Ng s Machine Learning Course Notes and codes (Matlab version) Note: All source materials and diagrams are taken from the Coursera s lectures created by Dr Andrew Ng. Everything I have

More information

Machine Learning and Computational Statistics, Spring 2016 Homework 1: Ridge Regression and SGD

Machine Learning and Computational Statistics, Spring 2016 Homework 1: Ridge Regression and SGD Machine Learning and Computational Statistics, Spring 2016 Homework 1: Ridge Regression and SGD Due: Friday, February 5, 2015, at 6pm (Submit via NYU Classes) Instructions: Your answers to the questions

More information

Machine Learning and Computational Statistics, Spring 2015 Homework 1: Ridge Regression and SGD

Machine Learning and Computational Statistics, Spring 2015 Homework 1: Ridge Regression and SGD Machine Learning and Computational Statistics, Spring 2015 Homework 1: Ridge Regression and SGD Due: Friday, February 6, 2015, at 4pm (Submit via NYU Classes) Instructions: Your answers to the questions

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! h0p://www.cs.toronto.edu/~rsalakhu/ Lecture 3 Parametric Distribu>ons We want model the probability

More information

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University January 24 2019 Logistics HW 1 is due on Friday 01/25 Project proposal: due Feb 21 1 page description

More information

Machine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013

Machine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013 Machine Learning Topic 5: Linear Discriminants Bryan Pardo, EECS 349 Machine Learning, 2013 Thanks to Mark Cartwright for his extensive contributions to these slides Thanks to Alpaydin, Bishop, and Duda/Hart/Stork

More information

All lecture slides will be available at CSC2515_Winter15.html

All lecture slides will be available at  CSC2515_Winter15.html CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 9: Support Vector Machines All lecture slides will be available at http://www.cs.toronto.edu/~urtasun/courses/csc2515/ CSC2515_Winter15.html Many

More information

3 Types of Gradient Descent Algorithms for Small & Large Data Sets

3 Types of Gradient Descent Algorithms for Small & Large Data Sets 3 Types of Gradient Descent Algorithms for Small & Large Data Sets Introduction Gradient Descent Algorithm (GD) is an iterative algorithm to find a Global Minimum of an objective function (cost function)

More information

CPSC 340: Machine Learning and Data Mining. Kernel Trick Fall 2017

CPSC 340: Machine Learning and Data Mining. Kernel Trick Fall 2017 CPSC 340: Machine Learning and Data Mining Kernel Trick Fall 2017 Admin Assignment 3: Due Friday. Midterm: Can view your exam during instructor office hours or after class this week. Digression: the other

More information

CPSC 340: Machine Learning and Data Mining. More Regularization Fall 2017

CPSC 340: Machine Learning and Data Mining. More Regularization Fall 2017 CPSC 340: Machine Learning and Data Mining More Regularization Fall 2017 Assignment 3: Admin Out soon, due Friday of next week. Midterm: You can view your exam during instructor office hours or after class

More information

Machine Learning Classifiers and Boosting

Machine Learning Classifiers and Boosting Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve

More information

Logis&c Regression. Aar$ Singh & Barnabas Poczos. Machine Learning / Jan 28, 2014

Logis&c Regression. Aar$ Singh & Barnabas Poczos. Machine Learning / Jan 28, 2014 Logis&c Regression Aar$ Singh & Barnabas Poczos Machine Learning 10-701/15-781 Jan 28, 2014 Linear Regression & Linear Classifica&on Weight Height Linear fit Linear decision boundary 2 Naïve Bayes Recap

More information

Optimization and least squares. Prof. Noah Snavely CS1114

Optimization and least squares. Prof. Noah Snavely CS1114 Optimization and least squares Prof. Noah Snavely CS1114 http://cs1114.cs.cornell.edu Administrivia A5 Part 1 due tomorrow by 5pm (please sign up for a demo slot) Part 2 will be due in two weeks (4/17)

More information

Homework. Gaussian, Bishop 2.3 Non-parametric, Bishop 2.5 Linear regression Pod-cast lecture on-line. Next lectures:

Homework. Gaussian, Bishop 2.3 Non-parametric, Bishop 2.5 Linear regression Pod-cast lecture on-line. Next lectures: Homework Gaussian, Bishop 2.3 Non-parametric, Bishop 2.5 Linear regression 3.0-3.2 Pod-cast lecture on-line Next lectures: I posted a rough plan. It is flexible though so please come with suggestions Bayes

More information

Perceptron as a graph

Perceptron as a graph Neural Networks Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 10 th, 2007 2005-2007 Carlos Guestrin 1 Perceptron as a graph 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0-6 -4-2

More information

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016 CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2016 A2/Midterm: Admin Grades/solutions will be posted after class. Assignment 4: Posted, due November 14. Extra office hours:

More information

COMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS18. Lecture 2: Linear Regression Gradient Descent Non-linear basis functions

COMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS18. Lecture 2: Linear Regression Gradient Descent Non-linear basis functions COMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS18 Lecture 2: Linear Regression Gradient Descent Non-linear basis functions LINEAR REGRESSION MOTIVATION Why Linear Regression? Simplest

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Xiaojin Zhu jerryzhu@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [ Based on slides from Andrew Moore http://www.cs.cmu.edu/~awm/tutorials] slide 1

More information

REGRESSION ANALYSIS : LINEAR BY MAUAJAMA FIRDAUS & TULIKA SAHA

REGRESSION ANALYSIS : LINEAR BY MAUAJAMA FIRDAUS & TULIKA SAHA REGRESSION ANALYSIS : LINEAR BY MAUAJAMA FIRDAUS & TULIKA SAHA MACHINE LEARNING It is the science of getting computer to learn without being explicitly programmed. Machine learning is an area of artificial

More information

Computational Intelligence (CS) SS15 Homework 1 Linear Regression and Logistic Regression

Computational Intelligence (CS) SS15 Homework 1 Linear Regression and Logistic Regression Computational Intelligence (CS) SS15 Homework 1 Linear Regression and Logistic Regression Anand Subramoney Points to achieve: 17 pts Extra points: 5* pts Info hour: 28.04.2015 14:00-15:00, HS i12 (ICK1130H)

More information

Divide and Conquer Kernel Ridge Regression

Divide and Conquer Kernel Ridge Regression Divide and Conquer Kernel Ridge Regression Yuchen Zhang John Duchi Martin Wainwright University of California, Berkeley COLT 2013 Yuchen Zhang (UC Berkeley) Divide and Conquer KRR COLT 2013 1 / 15 Problem

More information

HMC CS 158, Fall 2017 Problem Set 3 Programming: Regularized Polynomial Regression

HMC CS 158, Fall 2017 Problem Set 3 Programming: Regularized Polynomial Regression HMC CS 158, Fall 2017 Problem Set 3 Programming: Regularized Polynomial Regression Goals: To open up the black-box of scikit-learn and implement regression models. To investigate how adding polynomial

More information

CS535 Big Data Fall 2017 Colorado State University 10/10/2017 Sangmi Lee Pallickara Week 8- A.

CS535 Big Data Fall 2017 Colorado State University   10/10/2017 Sangmi Lee Pallickara Week 8- A. CS535 Big Data - Fall 2017 Week 8-A-1 CS535 BIG DATA FAQs Term project proposal New deadline: Tomorrow PA1 demo PART 1. BATCH COMPUTING MODELS FOR BIG DATA ANALYTICS 5. ADVANCED DATA ANALYTICS WITH APACHE

More information

2. Linear Regression and Gradient Descent

2. Linear Regression and Gradient Descent Pattern Recognition And Machine Learning - EPFL - Fall 2015 Emtiyaz Khan, Timur Bagautdinov, Carlos Becker, Ilija Bogunovic & Ksenia Konyushkova 2. Linear Regression and Gradient Descent 2.1 Goals The

More information

The exam is closed book, closed notes except your one-page cheat sheet.

The exam is closed book, closed notes except your one-page cheat sheet. CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right

More information

Akarsh Pokkunuru EECS Department Contractive Auto-Encoders: Explicit Invariance During Feature Extraction

Akarsh Pokkunuru EECS Department Contractive Auto-Encoders: Explicit Invariance During Feature Extraction Akarsh Pokkunuru EECS Department 03-16-2017 Contractive Auto-Encoders: Explicit Invariance During Feature Extraction 1 AGENDA Introduction to Auto-encoders Types of Auto-encoders Analysis of different

More information

Support vector machines

Support vector machines Support vector machines When the data is linearly separable, which of the many possible solutions should we prefer? SVM criterion: maximize the margin, or distance between the hyperplane and the closest

More information

Machine Learning / Jan 27, 2010

Machine Learning / Jan 27, 2010 Revisiting Logistic Regression & Naïve Bayes Aarti Singh Machine Learning 10-701/15-781 Jan 27, 2010 Generative and Discriminative Classifiers Training classifiers involves learning a mapping f: X -> Y,

More information

Gradient Descent - Problem of Hiking Down a Mountain

Gradient Descent - Problem of Hiking Down a Mountain Gradient Descent - Problem of Hiking Down a Mountain Udacity Have you ever climbed a mountain? I am sure you had to hike down at some point? Hiking down is a great exercise and it is going to help us understand

More information

COMPUTATIONAL INTELLIGENCE (CS) (INTRODUCTION TO MACHINE LEARNING) SS16. Lecture 2: Linear Regression Gradient Descent Non-linear basis functions

COMPUTATIONAL INTELLIGENCE (CS) (INTRODUCTION TO MACHINE LEARNING) SS16. Lecture 2: Linear Regression Gradient Descent Non-linear basis functions COMPUTATIONAL INTELLIGENCE (CS) (INTRODUCTION TO MACHINE LEARNING) SS16 Lecture 2: Linear Regression Gradient Descent Non-linear basis functions LINEAR REGRESSION MOTIVATION Why Linear Regression? Regression

More information

CSE Data Mining Concepts and Techniques STATISTICAL METHODS (REGRESSION) Professor- Anita Wasilewska. Team 13

CSE Data Mining Concepts and Techniques STATISTICAL METHODS (REGRESSION) Professor- Anita Wasilewska. Team 13 CSE 634 - Data Mining Concepts and Techniques STATISTICAL METHODS Professor- Anita Wasilewska (REGRESSION) Team 13 Contents Linear Regression Logistic Regression Bias and Variance in Regression Model Fit

More information

PS 6: Regularization. PART A: (Source: HTF page 95) The Ridge regression problem is:

PS 6: Regularization. PART A: (Source: HTF page 95) The Ridge regression problem is: Economics 1660: Big Data PS 6: Regularization Prof. Daniel Björkegren PART A: (Source: HTF page 95) The Ridge regression problem is: : β "#$%& = argmin (y # β 2 x #4 β 4 ) 6 6 + λ β 4 #89 Consider the

More information

Machine Learning Lecture-1

Machine Learning Lecture-1 Machine Learning Lecture-1 Programming Club Indian Institute of Technology Kanpur pclubiitk@gmail.com February 25, 2016 Programming Club (IIT Kanpur) ML-1 February 25, 2016 1 / 18 Acknowledgement This

More information

CPSC 340: Machine Learning and Data Mining. More Linear Classifiers Fall 2017

CPSC 340: Machine Learning and Data Mining. More Linear Classifiers Fall 2017 CPSC 340: Machine Learning and Data Mining More Linear Classifiers Fall 2017 Admin Assignment 3: Due Friday of next week. Midterm: Can view your exam during instructor office hours next week, or after

More information

CPSC 340: Machine Learning and Data Mining. Multi-Class Classification Fall 2017

CPSC 340: Machine Learning and Data Mining. Multi-Class Classification Fall 2017 CPSC 340: Machine Learning and Data Mining Multi-Class Classification Fall 2017 Assignment 3: Admin Check update thread on Piazza for correct definition of trainndx. This could make your cross-validation

More information

Gradient Descent Optimization Algorithms for Deep Learning Batch gradient descent Stochastic gradient descent Mini-batch gradient descent

Gradient Descent Optimization Algorithms for Deep Learning Batch gradient descent Stochastic gradient descent Mini-batch gradient descent Gradient Descent Optimization Algorithms for Deep Learning Batch gradient descent Stochastic gradient descent Mini-batch gradient descent Slide credit: http://sebastianruder.com/optimizing-gradient-descent/index.html#batchgradientdescent

More information

Machine Learning: Think Big and Parallel

Machine Learning: Think Big and Parallel Day 1 Inderjit S. Dhillon Dept of Computer Science UT Austin CS395T: Topics in Multicore Programming Oct 1, 2013 Outline Scikit-learn: Machine Learning in Python Supervised Learning day1 Regression: Least

More information

CSC 411: Lecture 02: Linear Regression

CSC 411: Lecture 02: Linear Regression CSC 411: Lecture 02: Linear Regression Raquel Urtasun & Rich Zemel University of Toronto Sep 16, 2015 Urtasun & Zemel (UofT) CSC 411: 02-Regression Sep 16, 2015 1 / 16 Today Linear regression problem continuous

More information

Lecture 3: Linear Classification

Lecture 3: Linear Classification Lecture 3: Linear Classification Roger Grosse 1 Introduction Last week, we saw an example of a learning task called regression. There, the goal was to predict a scalar-valued target from a set of features.

More information

Machine Learning. Topic 4: Linear Regression Models

Machine Learning. Topic 4: Linear Regression Models Machine Learning Topic 4: Linear Regression Models (contains ideas and a few images from wikipedia and books by Alpaydin, Duda/Hart/ Stork, and Bishop. Updated Fall 205) Regression Learning Task There

More information

K-Means and Gaussian Mixture Models

K-Means and Gaussian Mixture Models K-Means and Gaussian Mixture Models David Rosenberg New York University June 15, 2015 David Rosenberg (New York University) DS-GA 1003 June 15, 2015 1 / 43 K-Means Clustering Example: Old Faithful Geyser

More information

Kernel Methods. Chapter 9 of A Course in Machine Learning by Hal Daumé III. Conversion to beamer by Fabrizio Riguzzi

Kernel Methods. Chapter 9 of A Course in Machine Learning by Hal Daumé III.   Conversion to beamer by Fabrizio Riguzzi Kernel Methods Chapter 9 of A Course in Machine Learning by Hal Daumé III http://ciml.info Conversion to beamer by Fabrizio Riguzzi Kernel Methods 1 / 66 Kernel Methods Linear models are great because

More information

Partitioning Data. IRDS: Evaluation, Debugging, and Diagnostics. Cross-Validation. Cross-Validation for parameter tuning

Partitioning Data. IRDS: Evaluation, Debugging, and Diagnostics. Cross-Validation. Cross-Validation for parameter tuning Partitioning Data IRDS: Evaluation, Debugging, and Diagnostics Charles Sutton University of Edinburgh Training Validation Test Training : Running learning algorithms Validation : Tuning parameters of learning

More information

CSE 547: Machine Learning for Big Data Spring Problem Set 2. Please read the homework submission policies.

CSE 547: Machine Learning for Big Data Spring Problem Set 2. Please read the homework submission policies. CSE 547: Machine Learning for Big Data Spring 2019 Problem Set 2 Please read the homework submission policies. 1 Principal Component Analysis and Reconstruction (25 points) Let s do PCA and reconstruct

More information

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K.

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K. GAMs semi-parametric GLMs Simon Wood Mathematical Sciences, University of Bath, U.K. Generalized linear models, GLM 1. A GLM models a univariate response, y i as g{e(y i )} = X i β where y i Exponential

More information

Deep Neural Networks Optimization

Deep Neural Networks Optimization Deep Neural Networks Optimization Creative Commons (cc) by Akritasa http://arxiv.org/pdf/1406.2572.pdf Slides from Geoffrey Hinton CSC411/2515: Machine Learning and Data Mining, Winter 2018 Michael Guerzhoy

More information

Simple Model Selection Cross Validation Regularization Neural Networks

Simple Model Selection Cross Validation Regularization Neural Networks Neural Nets: Many possible refs e.g., Mitchell Chapter 4 Simple Model Selection Cross Validation Regularization Neural Networks Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February

More information

CS 559: Machine Learning Fundamentals and Applications 9 th Set of Notes

CS 559: Machine Learning Fundamentals and Applications 9 th Set of Notes 1 CS 559: Machine Learning Fundamentals and Applications 9 th Set of Notes Instructor: Philippos Mordohai Webpage: www.cs.stevens.edu/~mordohai E-mail: Philippos.Mordohai@stevens.edu Office: Lieb 215 Overview

More information

Programming, numerics and optimization

Programming, numerics and optimization Programming, numerics and optimization Lecture C-4: Constrained optimization Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428 June

More information

IST 597 Foundations of Deep Learning Fall 2018 Homework 1: Regression & Gradient Descent

IST 597 Foundations of Deep Learning Fall 2018 Homework 1: Regression & Gradient Descent IST 597 Foundations of Deep Learning Fall 2018 Homework 1: Regression & Gradient Descent This assignment is worth 15% of your grade for this class. 1 Introduction Before starting your first assignment,

More information

Optimization Plugin for RapidMiner. Venkatesh Umaashankar Sangkyun Lee. Technical Report 04/2012. technische universität dortmund

Optimization Plugin for RapidMiner. Venkatesh Umaashankar Sangkyun Lee. Technical Report 04/2012. technische universität dortmund Optimization Plugin for RapidMiner Technical Report Venkatesh Umaashankar Sangkyun Lee 04/2012 technische universität dortmund Part of the work on this technical report has been supported by Deutsche Forschungsgemeinschaft

More information

CS 179 Lecture 16. Logistic Regression & Parallel SGD

CS 179 Lecture 16. Logistic Regression & Parallel SGD CS 179 Lecture 16 Logistic Regression & Parallel SGD 1 Outline logistic regression (stochastic) gradient descent parallelizing SGD for neural nets (with emphasis on Google s distributed neural net implementation)

More information

CS281 Section 3: Practical Optimization

CS281 Section 3: Practical Optimization CS281 Section 3: Practical Optimization David Duvenaud and Dougal Maclaurin Most parameter estimation problems in machine learning cannot be solved in closed form, so we often have to resort to numerical

More information

3.3 Optimizing Functions of Several Variables 3.4 Lagrange Multipliers

3.3 Optimizing Functions of Several Variables 3.4 Lagrange Multipliers 3.3 Optimizing Functions of Several Variables 3.4 Lagrange Multipliers Prof. Tesler Math 20C Fall 2018 Prof. Tesler 3.3 3.4 Optimization Math 20C / Fall 2018 1 / 56 Optimizing y = f (x) In Math 20A, we

More information

Classification: Linear Discriminant Functions

Classification: Linear Discriminant Functions Classification: Linear Discriminant Functions CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Discriminant functions Linear Discriminant functions

More information

More on Learning. Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization

More on Learning. Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization More on Learning Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization Neural Net Learning Motivated by studies of the brain. A network of artificial

More information

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of

More information

Machine Learning Lecture 3

Machine Learning Lecture 3 Course Outline Machine Learning Lecture 3 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Probability Density Estimation II 26.04.206 Discriminative Approaches (5 weeks) Linear

More information

Machine Learning Basics: Stochastic Gradient Descent. Sargur N. Srihari

Machine Learning Basics: Stochastic Gradient Descent. Sargur N. Srihari Machine Learning Basics: Stochastic Gradient Descent Sargur N. srihari@cedar.buffalo.edu 1 Topics 1. Learning Algorithms 2. Capacity, Overfitting and Underfitting 3. Hyperparameters and Validation Sets

More information

Neural Networks and Deep Learning

Neural Networks and Deep Learning Neural Networks and Deep Learning Example Learning Problem Example Learning Problem Celebrity Faces in the Wild Machine Learning Pipeline Raw data Feature extract. Feature computation Inference: prediction,

More information

Content-based image and video analysis. Machine learning

Content-based image and video analysis. Machine learning Content-based image and video analysis Machine learning for multimedia retrieval 04.05.2009 What is machine learning? Some problems are very hard to solve by writing a computer program by hand Almost all

More information

Instance-based Learning

Instance-based Learning Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 15 th, 2007 2005-2007 Carlos Guestrin 1 1-Nearest Neighbor Four things make a memory based learner:

More information

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017 CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2017 Assignment 3: 2 late days to hand in tonight. Admin Assignment 4: Due Friday of next week. Last Time: MAP Estimation MAP

More information

Penalizied Logistic Regression for Classification

Penalizied Logistic Regression for Classification Penalizied Logistic Regression for Classification Gennady G. Pekhimenko Department of Computer Science University of Toronto Toronto, ON M5S3L1 pgen@cs.toronto.edu Abstract Investigation for using different

More information

Locally Weighted Learning for Control. Alexander Skoglund Machine Learning Course AASS, June 2005

Locally Weighted Learning for Control. Alexander Skoglund Machine Learning Course AASS, June 2005 Locally Weighted Learning for Control Alexander Skoglund Machine Learning Course AASS, June 2005 Outline Locally Weighted Learning, Christopher G. Atkeson et. al. in Artificial Intelligence Review, 11:11-73,1997

More information

Lecture #11: The Perceptron

Lecture #11: The Perceptron Lecture #11: The Perceptron Mat Kallada STAT2450 - Introduction to Data Mining Outline for Today Welcome back! Assignment 3 The Perceptron Learning Method Perceptron Learning Rule Assignment 3 Will be

More information

Data Analysis 3. Support Vector Machines. Jan Platoš October 30, 2017

Data Analysis 3. Support Vector Machines. Jan Platoš October 30, 2017 Data Analysis 3 Support Vector Machines Jan Platoš October 30, 2017 Department of Computer Science Faculty of Electrical Engineering and Computer Science VŠB - Technical University of Ostrava Table of

More information

1 Case study of SVM (Rob)

1 Case study of SVM (Rob) DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how

More information

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015 Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2015 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows K-Nearest

More information

Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models

Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models DB Tsai Steven Hillion Outline Introduction Linear / Nonlinear Classification Feature Engineering - Polynomial Expansion Big-data

More information

CPSC 340: Machine Learning and Data Mining. Regularization Fall 2016

CPSC 340: Machine Learning and Data Mining. Regularization Fall 2016 CPSC 340: Machine Learning and Data Mining Regularization Fall 2016 Assignment 2: Admin 2 late days to hand it in Friday, 3 for Monday. Assignment 3 is out. Due next Wednesday (so we can release solutions

More information

Recap: Gaussian (or Normal) Distribution. Recap: Minimizing the Expected Loss. Topics of This Lecture. Recap: Maximum Likelihood Approach

Recap: Gaussian (or Normal) Distribution. Recap: Minimizing the Expected Loss. Topics of This Lecture. Recap: Maximum Likelihood Approach Truth Course Outline Machine Learning Lecture 3 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Probability Density Estimation II 2.04.205 Discriminative Approaches (5 weeks)

More information

Overfitting. Machine Learning CSE546 Carlos Guestrin University of Washington. October 2, Bias-Variance Tradeoff

Overfitting. Machine Learning CSE546 Carlos Guestrin University of Washington. October 2, Bias-Variance Tradeoff Overfitting Machine Learning CSE546 Carlos Guestrin University of Washington October 2, 2013 1 Bias-Variance Tradeoff Choice of hypothesis class introduces learning bias More complex class less bias More

More information

Machine Learning Lecture 3

Machine Learning Lecture 3 Many slides adapted from B. Schiele Machine Learning Lecture 3 Probability Density Estimation II 26.04.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course

More information

Learning via Optimization

Learning via Optimization Lecture 7 1 Outline 1. Optimization Convexity 2. Linear regression in depth Locally weighted linear regression 3. Brief dips Logistic Regression [Stochastic] gradient ascent/descent Support Vector Machines

More information

Instance-based Learning

Instance-based Learning Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 19 th, 2007 2005-2007 Carlos Guestrin 1 Why not just use Linear Regression? 2005-2007 Carlos Guestrin

More information

Lecture on Modeling Tools for Clustering & Regression

Lecture on Modeling Tools for Clustering & Regression Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into

More information

Clustering algorithms and autoencoders for anomaly detection

Clustering algorithms and autoencoders for anomaly detection Clustering algorithms and autoencoders for anomaly detection Alessia Saggio Lunch Seminars and Journal Clubs Université catholique de Louvain, Belgium 3rd March 2017 a Outline Introduction Clustering algorithms

More information

Convexization in Markov Chain Monte Carlo

Convexization in Markov Chain Monte Carlo in Markov Chain Monte Carlo 1 IBM T. J. Watson Yorktown Heights, NY 2 Department of Aerospace Engineering Technion, Israel August 23, 2011 Problem Statement MCMC processes in general are governed by non

More information

6 Model selection and kernels

6 Model selection and kernels 6. Bias-Variance Dilemma Esercizio 6. While you fit a Linear Model to your data set. You are thinking about changing the Linear Model to a Quadratic one (i.e., a Linear Model with quadratic features φ(x)

More information

Natural Language Processing with Deep Learning CS224N/Ling284. Christopher Manning Lecture 4: Backpropagation and computation graphs

Natural Language Processing with Deep Learning CS224N/Ling284. Christopher Manning Lecture 4: Backpropagation and computation graphs Natural Language Processing with Deep Learning CS4N/Ling84 Christopher Manning Lecture 4: Backpropagation and computation graphs Lecture Plan Lecture 4: Backpropagation and computation graphs 1. Matrix

More information

Chapter 7: Numerical Prediction

Chapter 7: Numerical Prediction Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Chapter 7: Numerical Prediction Lecture: Prof. Dr.

More information

Data Preprocessing. Javier Béjar. URL - Spring 2018 CS - MAI 1/78 BY: $\

Data Preprocessing. Javier Béjar. URL - Spring 2018 CS - MAI 1/78 BY: $\ Data Preprocessing Javier Béjar BY: $\ URL - Spring 2018 C CS - MAI 1/78 Introduction Data representation Unstructured datasets: Examples described by a flat set of attributes: attribute-value matrix Structured

More information

Data Preprocessing. Javier Béjar AMLT /2017 CS - MAI. (CS - MAI) Data Preprocessing AMLT / / 71 BY: $\

Data Preprocessing. Javier Béjar AMLT /2017 CS - MAI. (CS - MAI) Data Preprocessing AMLT / / 71 BY: $\ Data Preprocessing S - MAI AMLT - 2016/2017 (S - MAI) Data Preprocessing AMLT - 2016/2017 1 / 71 Outline 1 Introduction Data Representation 2 Data Preprocessing Outliers Missing Values Normalization Discretization

More information

convolution shift invariant linear system Fourier Transform Aliasing and sampling scale representation edge detection corner detection

convolution shift invariant linear system Fourier Transform Aliasing and sampling scale representation edge detection corner detection COS 429: COMPUTER VISON Linear Filters and Edge Detection convolution shift invariant linear system Fourier Transform Aliasing and sampling scale representation edge detection corner detection Reading:

More information

1 Training/Validation/Testing

1 Training/Validation/Testing CPSC 340 Final (Fall 2015) Name: Student Number: Please enter your information above, turn off cellphones, space yourselves out throughout the room, and wait until the official start of the exam to begin.

More information

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer

More information