Non-linear models and model selection for spectral data


Non-linear models and model selection for spectral data
Michel Verleysen, Université catholique de Louvain (Louvain-la-Neuve, Belgium), November 2002

The data
- middle infrared spectra of wine samples
[figure: two spectra, "a fine wine" and "another fine wine"]

The data
- differences between spectra
[figure: a fine wine (middle infrared, for alcohol) and a not-so-fine juice (near infrared, for sugar)]

Calibration
- input-output pair: spectrum and alcohol concentration (the value to "predict") -> calibration model
- But:
  - the order of the input variables is not relevant for the analysis
  - there are large dependencies between the variables

Modelling
- learning: known data (dimension D, N samples) + known outputs -> model
- generalization: new data -> model -> unknown outputs?

Spectra: high-dimensional data
- Linear model: Y = XW, with known data X (N x D), known outputs T (N x 1)
- Learning: W = (XᵀX)⁻¹ XᵀT
  - if D < N: least (mean) squares solution, minimizing ||T − XW||²
  - if D > N: impossible (XᵀX is singular)!
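A minimal numerical sketch of this linear calibration step, on synthetic data and with assumed names X (N x D spectra) and t (N x 1 concentrations); np.linalg.lstsq returns the same least-squares solution as the normal equations above, but more stably.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 50, 10                      # works only because D < N here
X = rng.normal(size=(N, D))
t = X @ rng.normal(size=D) + 0.1 * rng.normal(size=N)

# Least-squares solution of min ||t - X W||^2, i.e. W = (X^T X)^{-1} X^T t
W, *_ = np.linalg.lstsq(X, t, rcond=None)
print("training MSE:", np.mean((X @ W - t) ** 2))
```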

Content
I. High-dimensional data: surprising results
II. Linear and non-linear models
III. Learning, validation, test
IV. Reducing the number of inputs
V. Projection
VI. Selection
- Running example

Content: I. High-dimensional data: surprising results

John Wilder Tukey
- The Future of Data Analysis, Ann. Math. Statist., 33, 1-67, 1962: "Analyze data rather than prove theorems"
- In other words:
  - data are here
  - they will be coming more and more in the future
  - we must analyze them
  - with very humble means
  - insistence on mathematics will distract us from fundamental points
(From D. L. Donoho, High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality. Lecture on August 8, 2000, to the American Mathematical Society, "Math Challenges of the 21st Century".)

Empty space phenomenon
[figure: a function f(x) sampled in one and two dimensions]
- necessity to fill space with learning points
- # of learning points grows exponentially with the dimension

Example: Silverman's result
- how many points are needed to approximate a Gaussian distribution with Gaussian kernels, with a desired accuracy of 90%?
[figure: required # of points vs. dimension, growing roughly exponentially and reaching about 10^6 points around dimension 10]

Surprising phenomena in HD spaces
- Sphere volume: V(d) = π^(d/2) / Γ(d/2 + 1) · r^d
- the ratio (sphere volume) / (cube volume) goes to zero as the dimension grows
- embedded spheres (radius ratio = 0.9): the inner sphere contains a vanishing fraction of the outer sphere's volume as the dimension grows
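A quick numeric check of the sphere/cube and embedded-sphere claims above, using the volume formula from the slide; the dimensions looped over are arbitrary.

```python
from math import pi, gamma

def sphere_volume(d, r=1.0):
    # V(d) = pi^(d/2) / Gamma(d/2 + 1) * r^d
    return pi ** (d / 2) / gamma(d / 2 + 1) * r ** d

for d in (1, 2, 5, 10, 20):
    ratio_cube = sphere_volume(d) / 2 ** d                          # cube of side 2 (radius 1)
    ratio_embedded = sphere_volume(d, 0.9) / sphere_volume(d, 1.0)  # = 0.9**d
    print(f"d={d:2d}  sphere/cube={ratio_cube:.2e}  inner/outer={ratio_embedded:.3f}")
```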

Gaussian kernels
- 1-D Gaussian: most of the probability mass lies close to the centre
- in higher dimensions, the percentage of points inside a sphere of fixed radius around the centre decreases rapidly with the dimension

Concentration of measure phenomenon
- take all pairwise distances in random data
- compute the average A and the variance V of these distances
- if D increases, then
  - V remains fixed
  - A increases
  - all distances seem to concentrate!
- Example: Euclidean norm of samples
  - average A increases with D^0.5
  - variance V remains fixed
  - samples seem to be normalized!
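A small experiment (synthetic Gaussian data, arbitrary dimensions) illustrating the concentration of the Euclidean norm: its average grows like sqrt(D) while its spread stays roughly constant.

```python
import numpy as np

rng = np.random.default_rng(0)
for D in (2, 10, 100, 1000):
    X = rng.normal(size=(5000, D))
    norms = np.linalg.norm(X, axis=1)          # Euclidean norm of each sample
    print(f"D={D:5d}  mean={norms.mean():7.2f}  std={norms.std():.2f}"
          f"  mean/sqrt(D)={norms.mean() / np.sqrt(D):.2f}")
```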

Content: II. Linear and non-linear models

Linear and non-linear models
- Linear models:
  + fixed number of parameters
  + low number of parameters
  + direct learning
  + no local minima
  - restricted to linear problems
- Non-linear models:
  - variable number of parameters
  - large number of parameters
  - adaptive (iterative) learning
  - local minima
  + valid for any problem

Non-linear models
[figure: architectures of an RBF network (Gaussian kernels φ(||x − c_i||, σ_k), linear output layer, optional linear terms) and of an MLP (two layers of weights w_ij, sigmoid hidden units, bias inputs, output F(x))]

Non-linear models
- Number of parameters:
  - MLP: (D+1)K + K + 1 weights
  - RBF: K + 1 weights, K·D centres, K widths
- In any case: large if D is large!
- If # of parameters > # of data: ill-posed problem
  - infinity of solutions
  - ad-hoc criteria give one solution (linear case)
  - there is always a "solution" (non-linear case)
  - but overfitting!

Content: III. Learning, validation, test

Validation and test
- Basic principle: never test on learning data
- Data = learning set + test set

Validation and test
- Basic principle: never test on learning data
- Corollary: never compare models on learning data
- Data (N samples) = learning set (N1) + validation set (N2) + test set (N3)
  - Learning set: used to learn each model
  - Validation set: used to compare models and select one of them
  - Test set: used to estimate the error made by the selected model

Validation and test
- Each model depends on the (sub)set of data used to learn it -> several learnings for each model, and the validation errors are averaged
- Average validation errors are compared between models
- The best model is selected
- Its error is estimated on the test set
- To select subsets of data: several ways
  - (k-fold) (cross-)validation
  - bootstrap
- Particularly important with small sets of high-dimensional data!

Validation
- Data = learning set (N1) + validation set (N2); a model is built on the learning set
- Error: Ê_gen = (1/N2) Σ_{t ∈ VS} (ŷ_t − y_t)²

Cross-validation
- Data = learning set (N1) + validation set (N2); a model is built, and the experiment is repeated K times with different random splits
- Error: Ê_gen = (1/K) Σ_{k=1}^{K} (1/N2) Σ_{t ∈ VS_k} (ŷ_t − y_t)²

K-fold cross-validation
- Data split into K folds of size N/K; each fold is used in turn as validation set, the rest as learning set; a model is built each time (experiment repeated K times)
- Error: Ê_gen = (1/K) Σ_{k=1}^{K} (1/(N/K)) Σ_{t ∈ VS_k} (ŷ_t − y_t)²
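A minimal K-fold cross-validation sketch matching the error formula above; scikit-learn's KFold and a plain linear model stand in for "a model is built", and the data are synthetic.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=80)

K = 5
fold_errors = []
for train_idx, val_idx in KFold(n_splits=K, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    y_hat = model.predict(X[val_idx])
    fold_errors.append(np.mean((y_hat - y[val_idx]) ** 2))   # (1/(N/K)) sum (ŷ_t - y_t)^2

E_gen = np.mean(fold_errors)                                  # average over the K folds
print("estimated generalization error:", E_gen)
```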

Leave-one-out
- Each sample is used in turn as validation set (of size 1), the rest as learning set; a model is built each time (experiment repeated N times)
- Error: Ê_gen = (1/N) Σ_{k=1}^{N} (ŷ_k − y_k)²

Bootstrap: plug-in principle
- World -> sample; the sample is then treated as a new "world", from which new samples are drawn with replacement (sample = new world -> new sample)

Bootstrap: plug-in principle
[figure: World -> sample -> learning; sample = new world -> new sample -> learning]
- Idea: E_gen = E_sample + optimism, where the optimism is estimated in the bootstrap "world" as E_new world − E_new sample
- The same plug-in construction is repeated for each bootstrap replication (sample = new world -> new sample)

Bootstrap
- Definition: E_gen(θ) = E_gen(θ) − E_sample(θ) + E_sample(θ) = D(θ) + E_sample(θ), with the optimism D(θ) = E_gen(θ) − E_sample(θ)
- Estimate: D̂(θ) = E_new world(θ) − E_new sample(θ)

Bootstrap
- Estimate over K bootstrap replications:
  D̂(θ) = (1/K) Σ_{k=1}^{K} [ E^k_new world(θ) − E^k_new sample(θ) ]
  Ê_gen(θ) = E_sample(θ) + (1/K) Σ_{k=1}^{K} [ E^k_new world(θ) − E^k_new sample(θ) ]
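A sketch of this bootstrap (optimism) estimate in the slide's notation: E_sample is the apparent error, E_new world the bootstrap model's error on the original sample, E_new sample its error on its own resample. The data, model and number of replications K are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = X @ rng.normal(size=4) + 0.2 * rng.normal(size=60)

def mse(model, X, y):
    return np.mean((model.predict(X) - y) ** 2)

# Apparent error: model trained on the whole sample, evaluated on the sample
E_sample = mse(LinearRegression().fit(X, y), X, y)

K, optimism = 50, []
for _ in range(K):
    idx = rng.integers(0, len(X), size=len(X))        # resample with replacement
    boot = LinearRegression().fit(X[idx], y[idx])     # "new sample" -> model
    # E_new_world: bootstrap model tested on the original sample (= "new world")
    # E_new_sample: bootstrap model tested on its own training resample
    optimism.append(mse(boot, X, y) - mse(boot, X[idx], y[idx]))

E_gen_hat = E_sample + np.mean(optimism)
print("apparent error:", E_sample, " bootstrap estimate of E_gen:", E_gen_hat)
```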

Content: IV. Reducing the number of inputs

Reducing the number of inputs
- Overfitting (even in the linear case)! = too many parameters with respect to the number of available data
- example: polynomial approximation [figure: low- vs. high-degree polynomial fits of the same points]
- Simplest model: linear (# parameters = # inputs + 1)
- But sometimes # known observations < # inputs!
- Worse with a non-linear model

Multiple Linear Regression (MLR)
- Simplest model: y = w₀ + w₁x₁ + w₂x₂ + … + w_D x_D
- W = (XᵀX)⁻¹ XᵀT minimizes E = ||T − Y||², where X is the N x (D+1) matrix of inputs (with a column of ones for w₀) and T the N x 1 vector of targets
- no non-linear capabilities
- N > D necessary!

Reducing the number of inputs
- Necessity to reduce the number of inputs!
- data X (dimension D, N samples) -> selection or projection -> data X′ (dimension D′ < D) -> model Y = X′W -> outputs Y, targets T

Reducing the number of inputs
- Necessity to reduce the number of inputs!
- Selection: the new variables x′ are chosen among the original variables x

Reducing the number of inputs
- Necessity to reduce the number of inputs!
- Projection: the new variables x′ are linear or non-linear combinations of the original variables x

Content: V. Projection

Projection
1. Linear projection + linear model
- Linear projection: PCA (Principal Component Analysis)
- PCA = Karhunen-Loève transform
- projection into a smaller-dimensional space
- aims:
  - dimension reduction
  - losing a minimum of information
  - data compression and/or representation

PCA: data centering and normalisation
- centred and normalised data:
  - to be independent from measurement units
  - because the origin of the data has no physical signification
  - to make computations easier!
- coordinates (= columns) are transformed: x_i^j <- (x_i^j − E[x_i]) / sqrt(var[x_i])

PCA: search for axes
- We look for axes which:
  - minimise the projection errors
  - maximise the variance after projection
- Along axis u (u = unit vector): V = (1/N) Σ_{j=1}^{N} [uᵀ(x^j − E[x])]²
- But E[x] = 0, therefore V = (1/N) Σ_{j=1}^{N} [uᵀ x^j]² = (1/N) uᵀ XᵀX u
- The two criteria (projection error / variance) are equivalent!

PCA: covariance matrix
- Variance and covariance: s_kl = (1/N) Σ_{j=1}^{N} x_k^j x_l^j  (if the x are centred)
- Variance-covariance matrix: C = [s_kl] = (1/N) XᵀX  (D x D)

PCA: choice of direction
- Classical result: the best choice for u₁ is the eigenvector of C associated with the largest eigenvalue λ₁.
- In the space orthogonal to u₁: the best choice for u₂ is the eigenvector associated with the second largest eigenvalue λ₂.
- And so on...

PCA: properties
- C is symmetric:
  - eigenvalues are real
  - eigenvectors are orthogonal
- C is positive semi-definite:
  - eigenvalues are positive or zero
- Contribution of each axis to the variance: V_k = λ_k / Σ_{i=1}^{D} λ_i
- Eigenvalues are ordered; contribution of the K first axes to the variance: V_K = Σ_{i=1}^{K} λ_i / Σ_{i=1}^{D} λ_i
- One chooses for example K such that V_K > 90%

PCA: example and difficulties
[figure: 2-D PCA example]
- Difficulty: PCA requires diagonalising the matrix C (dimension D x D). Heavy if D is large!
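A compact PCA sketch following the slides (centre/normalise, diagonalise C = XᵀX/N, keep the smallest K reaching 90% of the variance); the toy data and the 90% threshold are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20)) @ rng.normal(size=(20, 20))   # correlated toy data

X = (X - X.mean(axis=0)) / X.std(axis=0)          # centring and normalisation
C = X.T @ X / X.shape[0]                          # covariance matrix
eigval, eigvec = np.linalg.eigh(C)                # symmetric -> real eigenvalues
order = np.argsort(eigval)[::-1]                  # sort by decreasing eigenvalue
eigval, eigvec = eigval[order], eigvec[:, order]

V = np.cumsum(eigval) / eigval.sum()              # V_K, cumulative variance ratio
K = int(np.searchsorted(V, 0.90) + 1)             # smallest K reaching 90% of the variance
X_proj = X @ eigvec[:, :K]                        # projected data, N x K
print("K =", K, " V_K =", V[K - 1])
```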

Projection
1. Linear projection + linear model
- PCR: Principal Component Regression
- Projection (PCA): does not take T into account (unsupervised projection)
- # of variables after projection: user-defined (no automatic procedure) -> 95% criterion or cross-validation necessary!

Projection
1. Linear projection + linear model: cross-validation
- original data set = learning set + validation set (repeated K times)
- data X_L -> projection (learning) -> data X′_L -> model (learning) -> outputs Y, targets T
- data X_V -> projection (generalization) -> data X′_V -> model (generalization) -> validation error
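A sketch of PCR with K-fold cross-validation to choose the number of principal components, as suggested above; the pipeline refits the PCA on each learning fold only, and the spectra/concentrations are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 200))                     # N=90 "spectra", D=200 variables
y = X[:, :5] @ rng.normal(size=5) + 0.1 * rng.normal(size=90)

best = None
for n_comp in range(1, 21):
    errors = []
    for tr, va in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        pcr = make_pipeline(PCA(n_components=n_comp), LinearRegression())
        pcr.fit(X[tr], y[tr])                      # PCA fitted on the learning fold only
        errors.append(np.mean((pcr.predict(X[va]) - y[va]) ** 2))
    err = np.mean(errors)
    if best is None or err < best[1]:
        best = (n_comp, err)
print("selected # of principal components:", best[0], " validation MSE:", best[1])
```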

Projection
1. Linear projection + linear model: cross-validation
[figure: learning and validation errors as a function of the number of variables; the validation error reaches a minimum and then increases]

Projection
1. Linear projection + linear model
- running example: juices (sugar), D = 700 (near infrared), with the samples split into learning and validation sets
[figure: predicted vs. actual concentration for the selected number of principal components; validation NMSE reported on the slide]

Projection
2. Linear projection + non-linear model
- same as PCR, but the model is non-linear: RBFN, MLP, etc.
- non-linear possibilities
- but: K times the learning effort, and choice of meta-parameters!

Projection
2. Linear projection + non-linear model
[figure: predicted vs. actual concentration, RBFN with 6 functions on the principal components; validation NMSE reported on the slide]

Projection
3. Partial Least Squares (PLS)
- build latent variables
  - most correlated to the output
  - orthogonal
- first latent variable: t₁ = w₁₁x₁ + … + w₁D x_D, regression y = c₁t₁ + y₁
- second latent variable: t₂ = w₂₁x′₁ + … + w₂D x′_D, regression y = c₁t₁ + c₂t₂ + y₂, where the x′_i are the residuals of the regression of the x_i on t₁
- etc.
- cross-validation (to choose the number of latent variables)!

Projection
3. Partial Least Squares (PLS)
[figure: predicted vs. actual concentration for the selected number of latent variables; validation NMSE reported on the slide]
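A PLS regression sketch with a cross-validated choice of the number of latent variables; scikit-learn's PLSRegression implements a latent-variable construction of the kind sketched above, and the data are synthetic.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 200))
y = X[:, :5] @ rng.normal(size=5) + 0.1 * rng.normal(size=90)

best = None
for n_lv in range(1, 16):
    errors = []
    for tr, va in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        pls = PLSRegression(n_components=n_lv).fit(X[tr], y[tr])
        errors.append(np.mean((pls.predict(X[va]).ravel() - y[va]) ** 2))
    err = np.mean(errors)
    if best is None or err < best[1]:
        best = (n_lv, err)
print("selected # of latent variables:", best[0], " validation MSE:", best[1])
```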

Projection
4. Partial Least Squares (PLS) + non-linear model
- use the latent variables from PLS
- build a non-linear model on these variables
[figure: predicted vs. actual concentration, RBFN with 6 functions on the latent variables; validation NMSE reported on the slide]

Projection
5. Non-linear projection + (non-)linear model
- Non-linear projection: because linear projections flatten distributions

Non-linear projection: how?
- Build a (bijective) relation between
  - the data in the original space
  - the data in the projected space
- If bijection: possibility to switch between representation spaces ("information" rather than "measure")
- Problems to consider:
  - noise
  - twists and folds
  - impossibility to build a bijection

Non-linear projection: algorithms
- Variance preservation
- Distance preservation (like MDS)
- Neighborhood preservation (like SOM)
- Minimal reconstruction error

Non-linear projection: algorithms
- Variance preservation
  - local PCA (non-continuous representation)
  - kernel PCA: transform the data (non-linearly) into a higher-dimensional space, then project linearly (PCA) in that space
- Distance preservation (like MDS)
  - Sammon's non-linear mapping
  - Curvilinear Component Analysis (CCA) / Curvilinear Distances Analysis (CDA)
- Neighborhood preservation (like SOM)
- Minimal reconstruction error

Sammon's Non-Linear Mapping (NLM) (1/2)
- Criterion to be optimized: distance preservation (cf. metric MDS)
- Sammon's stress: E_NLM = (1 / Σ_{i<j} δ_ij) Σ_{i<j} (δ_ij − d_ij)² / δ_ij, where δ_ij are the distances in the original space and d_ij the distances in the projection space
- preservation of small distances first (because of the 1/δ_ij weighting)
- Calculation: minimization by gradient descent

Sammon's Non-Linear Mapping (NLM) (2/2)
[figure: open box projected by Sammon's NLM]
- Shortcomings:
  - global gradient: lateral faces are "compacted"
  - computational load (preprocess with vector quantization)
  - Euclidean distance (use curvilinear distance instead)
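A plain gradient-descent sketch of Sammon's stress above (the original algorithm uses a diagonal pseudo-Newton step with a "magic factor"; the initialization, step size and iteration count here are only illustrative and convergence is not guaranteed). Data are a synthetic 3-D set.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
t = rng.uniform(0, 3 * np.pi, 200)
X = np.c_[t * np.cos(t), t * np.sin(t), rng.uniform(0, 5, 200)]   # 3-D toy data

delta_vec = pdist(X)                          # δ_ij: distances in the original space
c = delta_vec.sum()
delta = squareform(delta_vec)
np.fill_diagonal(delta, 1.0)                  # dummy diagonal, masked out below

Y = X[:, :2] + rng.normal(scale=1e-3, size=(len(X), 2))   # crude 2-D initialization
eta = 2e-3
for it in range(500):
    d = squareform(pdist(Y))
    np.fill_diagonal(d, 1.0)
    W = (delta - d) / (delta * d)             # weights (δ_ij − d_ij)/(δ_ij · d_ij)
    np.fill_diagonal(W, 0.0)
    # step along the negative stress gradient (constant factor 2/c absorbed in eta)
    Y += eta * (W.sum(axis=1, keepdims=True) * Y - W @ Y)

stress = ((delta_vec - pdist(Y)) ** 2 / delta_vec).sum() / c
print("final Sammon stress:", stress)
```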

Curvilinear Component Analysis (1/2)
- Criterion to be optimized: distance preservation
- preservation of small distances first (but "tears" are allowed)
- E_CCA = Σ_{r<s} (δ_rs − d_rs)² F(d_rs), where F is a decreasing weighting function of the projected distance
- Calculation:
  1. vector quantization as preprocessing
  2. minimization by stochastic gradient descent
  3. interpolation

Curvilinear Component Analysis (2/2)
[figure: open box projected by CCA]
- Shortcomings:
  - convergence of the gradient descent: "torn" faces
  - Euclidean distance -> use curvilinear distance (the method becomes "Curvilinear Distances Analysis")

NLP: use of curvilinear distance (1/4)
- Principle: curvilinear (or geodesic) distance = length of the shortest path from one node to another in a weighted graph

NLP: use of curvilinear distance (2/4)
- Useful for non-linear projection: curvilinear distances are easier to preserve than Euclidean ones!
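A sketch of the curvilinear (geodesic) distance: a k-nearest-neighbour graph weighted by Euclidean distances, with shortest-path lengths computed by Dijkstra's algorithm; the value of k and the toy manifold are arbitrary choices.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

rng = np.random.default_rng(0)
t = rng.uniform(0, 3 * np.pi, 300)
X = np.c_[t * np.cos(t), t * np.sin(t), rng.uniform(0, 5, 300)]  # rolled-up 3-D manifold

G = kneighbors_graph(X, n_neighbors=8, mode="distance")   # k-NN graph, Euclidean weights
G = G.maximum(G.T)                                         # symmetrize the graph

geodesic = shortest_path(G, method="D", directed=False)    # Dijkstra shortest paths
i, j = 0, 150
print("Euclidean:", np.linalg.norm(X[i] - X[j]), " geodesic:", geodesic[i, j])
```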

NLP: use of curvilinear distance (3/4)
- Integration in the projection algorithms: in E_NLM = (1 / Σ_{i<j} δ_ij) Σ_{i<j} (δ_ij − d_ij)² / δ_ij and E_CCA = Σ_{r<s} (δ_rs − d_rs)² F(d_rs), compute the δ's as curvilinear distances (instead of Euclidean ones)

NLP: use of curvilinear distance (4/4)
[figure: open box projected by Sammon's NLM with Euclidean distance (faces are "compacted") vs. with curvilinear distance ("perfect"!)]

Non-linear projection: algorithms
- Neighborhood preservation (like SOM)

Self-Organizing Map (SOM) (1/2)
- Criterion to be optimized: quantization error & neighborhood preservation (no unique mathematical formulation of the neighborhood criterion)
- Calculation:
  - pre-established 1-D or 2-D grid, with grid distance d(r, s)
  - learning rule: winner r(i) = argmin_r ||X_i − C_r||, then C_r <- C_r + α · h_λ(d(r, r(i))) · (X_i − C_r), with a (e.g. exponential/Gaussian) neighbourhood function h_λ
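A compact sketch of the SOM rule quoted above, with a 2-D grid of units and a Gaussian neighbourhood; the grid size and the α and λ schedules are illustrative choices, not prescribed by the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                      # toy data in 3-D

rows, cols = 10, 10
grid = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
C = rng.normal(size=(rows * cols, X.shape[1]))      # codebook vectors C_r

n_iter = 20000
for it in range(n_iter):
    x = X[rng.integers(len(X))]
    frac = it / n_iter
    alpha = 0.5 * (1 - frac)                        # decreasing learning rate
    lam = 3.0 * (1 - frac) + 0.5                    # shrinking neighbourhood width
    winner = np.argmin(np.linalg.norm(C - x, axis=1))          # r(i)
    d_grid = np.linalg.norm(grid - grid[winner], axis=1)       # d(r, r(i)) on the grid
    h = np.exp(-(d_grid / lam) ** 2)                # Gaussian neighbourhood h_lambda
    C += alpha * h[:, None] * (x - C)               # C_r <- C_r + alpha h (x - C_r)

print("mean quantization error:",
      np.mean([np.min(np.linalg.norm(C - x, axis=1)) for x in X]))
```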

Self-Organizing Map (SOM) (2/2)
[figure: open box projected by a SOM]
- Shortcomings:
  - inadequate grid shape: faces are "cracked"
  - 1-D or 2-D grid only

Non-linear projection: algorithms
- Minimal reconstruction error

Autoassociative MLP (1/2)
- Criterion to be minimized: reconstruction error (MSE) after coding and decoding of the data with an autoassociative neural network (MLP)
- Autoassociative MLP: unsupervised (input = output), auto-encoding through a narrow central layer

Autoassociative MLP (2/2)
[figure: open box projected by an autoassociative MLP]
- Shortcomings:
  - "non-geometric" method
  - slow and hazardous convergence (5 layers!)
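An autoassociative MLP sketch: a 5-layer network (D-10-2-10-D) trained to reproduce its input, the 2-unit central layer giving the non-linear 2-D projection. MLPRegressor is used here as a convenient stand-in, and the layer sizes are arbitrary.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = rng.uniform(0, 3 * np.pi, 500)
X = np.c_[t * np.cos(t), t * np.sin(t), rng.uniform(0, 5, 500)]
X = (X - X.mean(axis=0)) / X.std(axis=0)

ae = MLPRegressor(hidden_layer_sizes=(10, 2, 10), activation="tanh",
                  max_iter=5000, random_state=0)
ae.fit(X, X)                                        # unsupervised: input = output
print("reconstruction MSE:", np.mean((ae.predict(X) - X) ** 2))

# Codes = activations of the central 2-unit layer (manual forward pass)
h = np.tanh(X @ ae.coefs_[0] + ae.intercepts_[0])
codes = np.tanh(h @ ae.coefs_[1] + ae.intercepts_[1])   # N x 2 projection
print("projected data shape:", codes.shape)
```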

Projection
5. Non-linear projection + (non-)linear model
- All methods:
  - computer-intensive
  - (up to now) work only in moderate dimensions
  - promising research direction!
  - same limitation as PCA: user-defined projection dimension -> cross-validation necessary!

Projection
5. Non-linear projection + (non-)linear model
- All methods: cross-validation necessary! Same scheme as for PCR: original data set = learning set + validation set (repeated K times); the projection and the model are learned on X_L and applied in generalization to X_V, giving outputs Y, targets T and a validation error

Content: VI. Selection

Selection
6. Constructive (forward) linear selection
- repeat for each variable x_i:
  - build a linear model with variable x_i alone
  - compare the models and select the variable x_a corresponding to the best one
- repeat for each variable x_i except x_a:
  - build a linear model with variables x_a and x_i
  - compare the models and select the variable x_b corresponding to the best one
- etc.

Selection
6. Constructive (forward) linear selection
- if D initial variables and F final variables: FD − F(F−1)/2 models to build!
- on average (F = D/2): 3/8 D² models
- if the models are compared on the learning set: no stopping criterion! -> use a validation set!
- BUT: the validation set then drives the choice of the sets of variables! -> use validation only to choose the number of variables, then run a new forward selection on the whole set
[figure: predicted vs. actual concentration for the selected variables; validation NMSE and number of variables reported on the slide]
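A forward-selection sketch for the linear case: at each step, add the variable that most reduces the validation error. The simple stopping rule used here (stop when the validation error no longer improves) is an illustrative choice, not the procedure advocated on the slide.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 30))
y = X[:, [2, 7, 11]] @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=120)

X_L, X_V, y_L, y_V = train_test_split(X, y, test_size=0.4, random_state=0)

selected, best_err = [], np.inf
remaining = list(range(X.shape[1]))
while remaining:
    errs = {}
    for j in remaining:                     # try each remaining variable
        cols = selected + [j]
        model = LinearRegression().fit(X_L[:, cols], y_L)
        errs[j] = np.mean((model.predict(X_V[:, cols]) - y_V) ** 2)
    j_best = min(errs, key=errs.get)
    if errs[j_best] >= best_err:            # stop when validation error stops improving
        break
    best_err = errs[j_best]
    selected.append(j_best)
    remaining.remove(j_best)

print("selected variables:", selected, " validation MSE:", best_err)
```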

Selection
7. Constructive (forward) non-linear selection
- Same as forward linear selection, except for the model (non-linear)
- Validation + choice of meta-parameters may become computer-intensive!
- Cross-validation: same problem as in the linear case
- Low # of cross-validations: non-smooth error curve (w.r.t. # of variables) -> stopping difficult!

Selection
7. Constructive (forward) non-linear selection
[figure: predicted vs. actual concentration; validation NMSE and number of variables reported on the slide]

Selection
8. Destructive (backward) linear selection
- repeat for each variable x_i:
  - build a linear model with all variables but x_i
  - compare the models and remove the variable x_a that contributes least (the "worst" one)
- repeat for each variable x_j except x_a:
  - build a linear model with all variables but x_a and x_j
  - compare the models and remove the variable x_b that contributes least
- etc.

Selection
8. Destructive (backward) linear selection
- building a (linear) model is only possible if # variables <= # observations! -> need to first reduce the # of variables
  - preliminary PCA
  - or identify the most correlated pairs of variables and eliminate one variable of each pair

Selection
9. Forward-backward on learning set with Fisher test
- idea: test the significance of a variable (added or removed)
- how? ANOVA:

  source      | SS    | df  | MS                  | F
  regression  | SSreg | p−1 | MSreg = SSreg/(p−1) | MSreg/MSres
  residuals   | SSres | N−p | MSres = SSres/(N−p) |
  total       | SStot | N−1 |                     |

- SSres = Σ_{i=1}^{N} (y_i − t_i)², SSreg = Σ_{i=1}^{N} (y_i − t̄)², SStot = SSreg + SSres = Σ_{i=1}^{N} (t_i − t̄)²
- N: # of samples, p = D + 1: # of parameters, R² = 1 − SSres/SStot

Selection
9. Forward-backward on learning set with Fisher test: selection of the first variable
- repeat for each variable x_i:
  - build a linear model with variable x_i alone
  - compare the models and select the variable x_a corresponding to the maximum R²

Selection
9. Forward-backward on learning set with Fisher test: forward step
- repeat for each variable x_i except x_a:
  - build a linear model with variables x_a and x_i
  - compute F_to_enter(x_i) = [SSreg(x_i, x_a) − SSreg(x_a)] / MSres(x_i, x_a)
- choose x_b = argmax_j F_to_enter(x_j)
- compare to the Fisher table (5% or 10% level) and keep x_b if F_to_enter(x_b) is higher

Selection
9. Forward-backward on learning set with Fisher test: backward step
- repeat for each variable x_i except x_b:
  - build a linear model with x_b and without x_i
  - compute F_to_remove(x_i) = [SSreg(x_i, x_b) − SSreg(x_b)] / MSres(x_i, x_b)
- choose x_c = argmin_j F_to_remove(x_j)
- compare to the Fisher table (5% or 10% level) and remove x_c if F_to_remove(x_c) is lower

Selection
9. Forward-backward on learning set with Fisher test
[figure: predicted vs. actual concentration; NMSE_A = 0.088 on the learning set, validation NMSE and number of variables reported on the slide]

Selection
10. Forward-backward linear selection
- Forward selection: when x_b is selected, there is no guarantee that x_a remains optimal!
- Ideal algorithm:
  - (re)test each variable each time a new one is selected
  - equivalent to exhaustive search!
- Compromise:
  - forward selection
  - then start a backward selection from its result
- On the validation set!

Selection
10. Forward-backward linear selection
[figure: predicted vs. actual concentration for the forward selection alone (no backward step!); validation NMSE and number of variables reported on the slide]

Selection
10. Forward-backward linear selection
- But if the (forward) experiment is repeated, with the data now split into learning, validation and test sets, the selected variables vary from one repetition to another

Selection
10. Forward-backward linear selection
- Then forward-backward selection on the learning + validation sets (N_A + N_V)
[figure: predicted vs. actual concentration; test NMSE and number of variables reported on the slide]

Selection
11. Forward-backward non-linear selection
- Same as forward-backward linear selection, except for the model (non-linear)
- Cross-validation + choice of meta-parameters may become computer-intensive!
- Low # of cross-validations: non-smooth error curve (w.r.t. # of variables) -> stopping difficult!

Selection
11. Forward-backward non-linear selection
[figure: predicted vs. actual concentration, RBFN with 8 functions; validation NMSE and number of variables reported on the slide]

Acknowledgements
- Part of this work has been realized in collaboration with PhD students:
  - Nabil Benoudjit
  - John Lee
  - Amaury Lendasse

References
- Curse of dimensionality:
  - D. L. Donoho, High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality. Lecture on August 8, 2000, to the American Mathematical Society, "Math Challenges of the 21st Century".
  - M. Verleysen, Learning high-dimensional data. Accepted for publication in Limitations and Future Trends in Neural Computation, S. Ablameyko, L. Goras, M. Gori, V. Piuri, eds., IOS Press.
  - M. Verleysen, Machine learning of high-dimensional data: local artificial neural networks and the curse of dimensionality. Agrégation in higher education thesis, UCL.

References
- Cross-validation and bootstrap:
  - NN FAQ
  - B. Efron, R. J. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall, first edition, 1993.
  - A. C. Davison, D. V. Hinkley. Bootstrap Methods and their Applications. Cambridge University Press, 3rd edition, 1999.
- PCA:
  - any good textbook on linear statistics
- PLS:
  - M. Tenenhaus, La régression PLS : théorie et pratique, Editions Technip, 1998.

References
- Non-linear projections:
  - J. W. Sammon, A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C-18(5):401-409, 1969.
  - P. Demartines, J. Hérault. Curvilinear Component Analysis: a self-organizing neural network for nonlinear mapping of data sets. IEEE Transactions on Neural Networks, 8(1):148-154, January 1997.
  - J. A. Lee, A. Lendasse, M. Verleysen, Curvilinear Distance Analysis versus Isomap, ESANN 2002, European Symposium on Artificial Neural Networks, Bruges (Belgium), April 2002.

References
- Forward and/or backward selection:
  - Eklöv T., Mårtensson P., Lundström I., Selection of variables for interpreting multivariate gas sensor data, Analytica Chimica Acta (1999).
  - Bertrand D., Dufour E., La spectroscopie infrarouge et ses applications analytiques, Editions Tec & Doc, collection sciences et techniques agroalimentaires (2000).
  - Massart D. L., Vandeginste B. G. M., Buydens L. M. C., De Jong S., Lewi P. J., Smeyers-Verbeke J., Handbook of Chemometrics and Qualimetrics: Part A, Elsevier Science, Amsterdam, 1997.
  - A. D. Walmsley, Improved variable selection procedure for multivariate linear regression, Analytica Chimica Acta, 354 (1997).
  - N. Benoudjit, E. Cools, M. Meurens, M. Verleysen, Calibrage chimiométrique des spectrophotomètres : sélection et validation des variables par modèles non-linéaires, accepted for Chimiométrie 2002, 4-5 December 2002, Paris (France).


More information

Akarsh Pokkunuru EECS Department Contractive Auto-Encoders: Explicit Invariance During Feature Extraction

Akarsh Pokkunuru EECS Department Contractive Auto-Encoders: Explicit Invariance During Feature Extraction Akarsh Pokkunuru EECS Department 03-16-2017 Contractive Auto-Encoders: Explicit Invariance During Feature Extraction 1 AGENDA Introduction to Auto-encoders Types of Auto-encoders Analysis of different

More information

CSC 411: Lecture 14: Principal Components Analysis & Autoencoders

CSC 411: Lecture 14: Principal Components Analysis & Autoencoders CSC 411: Lecture 14: Principal Components Analysis & Autoencoders Richard Zemel, Raquel Urtasun and Sanja Fidler University of Toronto Zemel, Urtasun, Fidler (UofT) CSC 411: 14-PCA & Autoencoders 1 / 18

More information

Dimension Reduction of Image Manifolds

Dimension Reduction of Image Manifolds Dimension Reduction of Image Manifolds Arian Maleki Department of Electrical Engineering Stanford University Stanford, CA, 9435, USA E-mail: arianm@stanford.edu I. INTRODUCTION Dimension reduction of datasets

More information

1. Introduction. performance of numerical methods. complexity bounds. structural convex optimization. course goals and topics

1. Introduction. performance of numerical methods. complexity bounds. structural convex optimization. course goals and topics 1. Introduction EE 546, Univ of Washington, Spring 2016 performance of numerical methods complexity bounds structural convex optimization course goals and topics 1 1 Some course info Welcome to EE 546!

More information

Artificial Neural Networks Unsupervised learning: SOM

Artificial Neural Networks Unsupervised learning: SOM Artificial Neural Networks Unsupervised learning: SOM 01001110 01100101 01110101 01110010 01101111 01101110 01101111 01110110 01100001 00100000 01110011 01101011 01110101 01110000 01101001 01101110 01100001

More information

Instance-based Learning

Instance-based Learning Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 15 th, 2007 2005-2007 Carlos Guestrin 1 1-Nearest Neighbor Four things make a memory based learner:

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Basis Functions Tom Kelsey School of Computer Science University of St Andrews http://www.cs.st-andrews.ac.uk/~tom/ tom@cs.st-andrews.ac.uk Tom Kelsey ID5059-02-BF 2015-02-04

More information

Learning from Data: Adaptive Basis Functions

Learning from Data: Adaptive Basis Functions Learning from Data: Adaptive Basis Functions November 21, 2005 http://www.anc.ed.ac.uk/ amos/lfd/ Neural Networks Hidden to output layer - a linear parameter model But adapt the features of the model.

More information

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017 CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2017 Assignment 3: 2 late days to hand in tonight. Admin Assignment 4: Due Friday of next week. Last Time: MAP Estimation MAP

More information

Locally Weighted Learning

Locally Weighted Learning Locally Weighted Learning Peter Englert Department of Computer Science TU Darmstadt englert.peter@gmx.de Abstract Locally Weighted Learning is a class of function approximation techniques, where a prediction

More information

Clustering: Classic Methods and Modern Views

Clustering: Classic Methods and Modern Views Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering

More information

Machine Learning. Topic 4: Linear Regression Models

Machine Learning. Topic 4: Linear Regression Models Machine Learning Topic 4: Linear Regression Models (contains ideas and a few images from wikipedia and books by Alpaydin, Duda/Hart/ Stork, and Bishop. Updated Fall 205) Regression Learning Task There

More information

CS 664 Slides #11 Image Segmentation. Prof. Dan Huttenlocher Fall 2003

CS 664 Slides #11 Image Segmentation. Prof. Dan Huttenlocher Fall 2003 CS 664 Slides #11 Image Segmentation Prof. Dan Huttenlocher Fall 2003 Image Segmentation Find regions of image that are coherent Dual of edge detection Regions vs. boundaries Related to clustering problems

More information

The exam is closed book, closed notes except your one-page cheat sheet.

The exam is closed book, closed notes except your one-page cheat sheet. CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right

More information

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures

More information

Grundlagen der Künstlichen Intelligenz

Grundlagen der Künstlichen Intelligenz Grundlagen der Künstlichen Intelligenz Unsupervised learning Daniel Hennes 29.01.2018 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Supervised learning Regression (linear

More information

A Stochastic Optimization Approach for Unsupervised Kernel Regression

A Stochastic Optimization Approach for Unsupervised Kernel Regression A Stochastic Optimization Approach for Unsupervised Kernel Regression Oliver Kramer Institute of Structural Mechanics Bauhaus-University Weimar oliver.kramer@uni-weimar.de Fabian Gieseke Institute of Structural

More information

Lecture Topic Projects

Lecture Topic Projects Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, basic tasks, data types 3 Introduction to D3, basic vis techniques for non-spatial data Project #1 out 4 Data

More information

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06 Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,

More information

CPSC 340: Machine Learning and Data Mining. Multi-Dimensional Scaling Fall 2017

CPSC 340: Machine Learning and Data Mining. Multi-Dimensional Scaling Fall 2017 CPSC 340: Machine Learning and Data Mining Multi-Dimensional Scaling Fall 2017 Assignment 4: Admin 1 late day for tonight, 2 late days for Wednesday. Assignment 5: Due Monday of next week. Final: Details

More information

Extending reservoir computing with random static projections: a hybrid between extreme learning and RC

Extending reservoir computing with random static projections: a hybrid between extreme learning and RC Extending reservoir computing with random static projections: a hybrid between extreme learning and RC John Butcher 1, David Verstraeten 2, Benjamin Schrauwen 2,CharlesDay 1 and Peter Haycock 1 1- Institute

More information