Non-linear models and model selection for spectral data


Non-linear models and model selection for spectral data
Michel Verleysen, Université catholique de Louvain (Louvain-la-Neuve, Belgium), November 2002

The data
- middle infrared spectra of wine samples
[figure: two spectra, "a fine wine" and "another fine wine"]

The data
- differences between spectra
[figure: a fine wine (middle infrared, for alcohol) and a not-so-fine juice (near infrared, for sugar)]

Calibration
- input-output pair: spectrum and alcohol concentration (the value to "predict") -> calibration model
- But:
  - the order of the input variables is not relevant for the analysis
  - there are large dependencies between the variables

Modelling
- learning: known data (dimension D, N samples) + known outputs -> model
- generalization: new data -> model -> unknown outputs?

Spectra: high-dimensional data
- Linear model: Y = XW, with known data X (N x D), known outputs T (N x 1)
- Learning: W = (XᵀX)⁻¹ XᵀT
  - if D < N: least (mean) squares solution, minimizing ||T − XW||²
  - if D > N: impossible (XᵀX is singular)!
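A minimal numerical sketch of this linear calibration step, on synthetic data and with assumed names X (N x D spectra) and t (N x 1 concentrations); np.linalg.lstsq returns the same least-squares solution as the normal equations above, but more stably.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 50, 10                      # works only because D < N here
X = rng.normal(size=(N, D))
t = X @ rng.normal(size=D) + 0.1 * rng.normal(size=N)

# Least-squares solution of min ||t - X W||^2, i.e. W = (X^T X)^{-1} X^T t
W, *_ = np.linalg.lstsq(X, t, rcond=None)
print("training MSE:", np.mean((X @ W - t) ** 2))
```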

Content
I. High-dimensional data: surprising results
II. Linear and non-linear models
III. Learning, validation, test
IV. Reducing the number of inputs
V. Projection
VI. Selection
- Running example

Content: I. High-dimensional data: surprising results

John Wilder Tukey
- The Future of Data Analysis, Ann. Math. Statist., 33, 1-67, 1962: "Analyze data rather than prove theorems"
- In other words:
  - data are here
  - they will be coming more and more in the future
  - we must analyze them
  - with very humble means
  - insistence on mathematics will distract us from fundamental points
(From D. L. Donoho, High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality. Lecture on August 8, 2000, to the American Mathematical Society, "Math Challenges of the 21st Century".)

Empty space phenomenon
[figure: a function f(x) sampled in one and two dimensions]
- necessity to fill space with learning points
- # of learning points grows exponentially with the dimension

Example: Silverman's result
- how many points are needed to approximate a Gaussian distribution with Gaussian kernels, with a desired accuracy of 90%?
[figure: required # of points vs. dimension, growing roughly exponentially and reaching about 10^6 points around dimension 10]

Surprising phenomena in HD spaces
- Sphere volume: V(d) = π^(d/2) / Γ(d/2 + 1) · r^d
- the ratio (sphere volume) / (cube volume) goes to zero as the dimension grows
- embedded spheres (radius ratio = 0.9): the inner sphere contains a vanishing fraction of the outer sphere's volume as the dimension grows
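A quick numeric check of the sphere/cube and embedded-sphere claims above, using the volume formula from the slide; the dimensions looped over are arbitrary.

```python
from math import pi, gamma

def sphere_volume(d, r=1.0):
    # V(d) = pi^(d/2) / Gamma(d/2 + 1) * r^d
    return pi ** (d / 2) / gamma(d / 2 + 1) * r ** d

for d in (1, 2, 5, 10, 20):
    ratio_cube = sphere_volume(d) / 2 ** d                          # cube of side 2 (radius 1)
    ratio_embedded = sphere_volume(d, 0.9) / sphere_volume(d, 1.0)  # = 0.9**d
    print(f"d={d:2d}  sphere/cube={ratio_cube:.2e}  inner/outer={ratio_embedded:.3f}")
```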

Gaussian kernels
- 1-D Gaussian: most of the probability mass lies close to the centre
- in higher dimensions, the percentage of points inside a sphere of fixed radius around the centre decreases rapidly with the dimension

Concentration of measure phenomenon
- take all pairwise distances in random data
- compute the average A and the variance V of these distances
- if D increases, then
  - V remains fixed
  - A increases
  - all distances seem to concentrate!
- Example: Euclidean norm of samples
  - average A increases with D^0.5
  - variance V remains fixed
  - samples seem to be normalized!
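A small experiment (synthetic Gaussian data, arbitrary dimensions) illustrating the concentration of the Euclidean norm: its average grows like sqrt(D) while its spread stays roughly constant.

```python
import numpy as np

rng = np.random.default_rng(0)
for D in (2, 10, 100, 1000):
    X = rng.normal(size=(5000, D))
    norms = np.linalg.norm(X, axis=1)          # Euclidean norm of each sample
    print(f"D={D:5d}  mean={norms.mean():7.2f}  std={norms.std():.2f}"
          f"  mean/sqrt(D)={norms.mean() / np.sqrt(D):.2f}")
```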

Content: II. Linear and non-linear models

Linear and non-linear models
- Linear models:
  + fixed number of parameters
  + low number of parameters
  + direct learning
  + no local minima
  - restricted to linear problems
- Non-linear models:
  - variable number of parameters
  - large number of parameters
  - adaptive (iterative) learning
  - local minima
  + valid for any problem

Non-linear models
[figure: architectures of an RBF network (Gaussian kernels φ(||x − c_i||, σ_k), linear output layer, optional linear terms) and of an MLP (two layers of weights w_ij, sigmoid hidden units, bias inputs, output F(x))]

Non-linear models
- Number of parameters:
  - MLP: (D+1)K + K + 1 weights
  - RBF: K + 1 weights, K·D centres, K widths
- In any case: large if D is large!
- If # of parameters > # of data: ill-posed problem
  - infinity of solutions
  - ad-hoc criteria give one solution (linear case)
  - there is always a "solution" (non-linear case)
  - but overfitting!

Content: III. Learning, validation, test

Validation and test
- Basic principle: never test on learning data
- Data = learning set + test set

Validation and test
- Basic principle: never test on learning data
- Corollary: never compare models on learning data
- Data (N samples) = learning set (N1) + validation set (N2) + test set (N3)
  - Learning set: used to learn each model
  - Validation set: used to compare models and select one of them
  - Test set: used to estimate the error made by the selected model

Validation and test
- Each model depends on the (sub)set of data used to learn it -> several learnings for each model, and the validation errors are averaged
- Average validation errors are compared between models
- The best model is selected
- Its error is estimated on the test set
- To select subsets of data: several ways
  - (k-fold) (cross-)validation
  - bootstrap
- Particularly important with small sets of high-dimensional data!

Validation
- Data = learning set (N1) + validation set (N2); a model is built on the learning set
- Error: Ê_gen = (1/N2) Σ_{t ∈ VS} (ŷ_t − y_t)²

Cross-validation
- Data = learning set (N1) + validation set (N2); a model is built, and the experiment is repeated K times with different random splits
- Error: Ê_gen = (1/K) Σ_{k=1}^{K} (1/N2) Σ_{t ∈ VS_k} (ŷ_t − y_t)²

K-fold cross-validation
- Data split into K folds of size N/K; each fold is used in turn as validation set, the rest as learning set; a model is built each time (experiment repeated K times)
- Error: Ê_gen = (1/K) Σ_{k=1}^{K} (1/(N/K)) Σ_{t ∈ VS_k} (ŷ_t − y_t)²
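A minimal K-fold cross-validation sketch matching the error formula above; scikit-learn's KFold and a plain linear model stand in for "a model is built", and the data are synthetic.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=80)

K = 5
fold_errors = []
for train_idx, val_idx in KFold(n_splits=K, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    y_hat = model.predict(X[val_idx])
    fold_errors.append(np.mean((y_hat - y[val_idx]) ** 2))   # (1/(N/K)) sum (ŷ_t - y_t)^2

E_gen = np.mean(fold_errors)                                  # average over the K folds
print("estimated generalization error:", E_gen)
```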

Leave-one-out
- Each sample is used in turn as validation set (of size 1), the rest as learning set; a model is built each time (experiment repeated N times)
- Error: Ê_gen = (1/N) Σ_{k=1}^{N} (ŷ_k − y_k)²

Bootstrap: plug-in principle
- World -> sample; the sample is then treated as a new "world", from which new samples are drawn with replacement (sample = new world -> new sample)

Bootstrap: plug-in principle
[figure: World -> sample -> learning; sample = new world -> new sample -> learning]
- Idea: E_gen = E_sample + optimism, where the optimism is estimated in the bootstrap "world" as E_new world − E_new sample
- The same plug-in construction is repeated for each bootstrap replication (sample = new world -> new sample)

Bootstrap
- Definition: E_gen(θ) = E_gen(θ) − E_sample(θ) + E_sample(θ) = D(θ) + E_sample(θ), with the optimism D(θ) = E_gen(θ) − E_sample(θ)
- Estimate: D̂(θ) = E_new world(θ) − E_new sample(θ)

Bootstrap
- Estimate over K bootstrap replications:
  D̂(θ) = (1/K) Σ_{k=1}^{K} [ E^k_new world(θ) − E^k_new sample(θ) ]
  Ê_gen(θ) = E_sample(θ) + (1/K) Σ_{k=1}^{K} [ E^k_new world(θ) − E^k_new sample(θ) ]
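A sketch of this bootstrap (optimism) estimate in the slide's notation: E_sample is the apparent error, E_new world the bootstrap model's error on the original sample, E_new sample its error on its own resample. The data, model and number of replications K are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = X @ rng.normal(size=4) + 0.2 * rng.normal(size=60)

def mse(model, X, y):
    return np.mean((model.predict(X) - y) ** 2)

# Apparent error: model trained on the whole sample, evaluated on the sample
E_sample = mse(LinearRegression().fit(X, y), X, y)

K, optimism = 50, []
for _ in range(K):
    idx = rng.integers(0, len(X), size=len(X))        # resample with replacement
    boot = LinearRegression().fit(X[idx], y[idx])     # "new sample" -> model
    # E_new_world: bootstrap model tested on the original sample (= "new world")
    # E_new_sample: bootstrap model tested on its own training resample
    optimism.append(mse(boot, X, y) - mse(boot, X[idx], y[idx]))

E_gen_hat = E_sample + np.mean(optimism)
print("apparent error:", E_sample, " bootstrap estimate of E_gen:", E_gen_hat)
```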

Content: IV. Reducing the number of inputs

Reducing the number of inputs
- Overfitting (even in the linear case)! = too many parameters with respect to the number of available data
- example: polynomial approximation [figure: low- vs. high-degree polynomial fits of the same points]
- Simplest model: linear (# parameters = # inputs + 1)
- But sometimes # known observations < # inputs!
- Worse with a non-linear model

Multiple Linear Regression (MLR)
- Simplest model: y = w₀ + w₁x₁ + w₂x₂ + … + w_D x_D
- W = (XᵀX)⁻¹ XᵀT minimizes E = ||T − Y||², where X is the N x (D+1) matrix of inputs (with a column of ones for w₀) and T the N x 1 vector of targets
- no non-linear capabilities
- N > D necessary!

Reducing the number of inputs
- Necessity to reduce the number of inputs!
- data X (dimension D, N samples) -> selection or projection -> data X′ (dimension D′ < D) -> model Y = X′W -> outputs Y, targets T

Reducing the number of inputs
- Necessity to reduce the number of inputs!
- Selection: the new variables x′ are chosen among the original variables x

Reducing the number of inputs
- Necessity to reduce the number of inputs!
- Projection: the new variables x′ are linear or non-linear combinations of the original variables x

Content: V. Projection

Projection
1. Linear projection + linear model
- Linear projection: PCA (Principal Component Analysis)
- PCA = Karhunen-Loève transform
- projection into a smaller-dimensional space
- aims:
  - dimension reduction
  - losing a minimum of information
  - data compression and/or representation

PCA: data centering and normalisation
- centred and normalised data:
  - to be independent from measurement units
  - because the origin of the data has no physical signification
  - to make computations easier!
- coordinates (= columns) are transformed: x_i^j <- (x_i^j − E[x_i]) / sqrt(var[x_i])

PCA: search for axes
- We look for axes which:
  - minimise the projection errors
  - maximise the variance after projection
- Along axis u (u = unit vector): V = (1/N) Σ_{j=1}^{N} [uᵀ(x^j − E[x])]²
- But E[x] = 0, therefore V = (1/N) Σ_{j=1}^{N} [uᵀ x^j]² = (1/N) uᵀ XᵀX u
- The two criteria (projection error / variance) are equivalent!

PCA: covariance matrix
- Variance and covariance: s_kl = (1/N) Σ_{j=1}^{N} x_k^j x_l^j  (if the x are centred)
- Variance-covariance matrix: C = [s_kl] = (1/N) XᵀX  (D x D)

PCA: choice of direction
- Classical result: the best choice for u₁ is the eigenvector of C associated with the largest eigenvalue λ₁.
- In the space orthogonal to u₁: the best choice for u₂ is the eigenvector associated with the second largest eigenvalue λ₂.
- And so on...

PCA: properties
- C is symmetric:
  - eigenvalues are real
  - eigenvectors are orthogonal
- C is positive semi-definite:
  - eigenvalues are positive or zero
- Contribution of each axis to the variance: V_k = λ_k / Σ_{i=1}^{D} λ_i
- Eigenvalues are ordered; contribution of the K first axes to the variance: V_K = Σ_{i=1}^{K} λ_i / Σ_{i=1}^{D} λ_i
- One chooses for example K such that V_K > 90%

PCA: example and difficulties
[figure: 2-D PCA example]
- Difficulty: PCA requires diagonalising the matrix C (dimension D x D). Heavy if D is large!
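A compact PCA sketch following the slides (centre/normalise, diagonalise C = XᵀX/N, keep the smallest K reaching 90% of the variance); the toy data and the 90% threshold are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20)) @ rng.normal(size=(20, 20))   # correlated toy data

X = (X - X.mean(axis=0)) / X.std(axis=0)          # centring and normalisation
C = X.T @ X / X.shape[0]                          # covariance matrix
eigval, eigvec = np.linalg.eigh(C)                # symmetric -> real eigenvalues
order = np.argsort(eigval)[::-1]                  # sort by decreasing eigenvalue
eigval, eigvec = eigval[order], eigvec[:, order]

V = np.cumsum(eigval) / eigval.sum()              # V_K, cumulative variance ratio
K = int(np.searchsorted(V, 0.90) + 1)             # smallest K reaching 90% of the variance
X_proj = X @ eigvec[:, :K]                        # projected data, N x K
print("K =", K, " V_K =", V[K - 1])
```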

Projection
1. Linear projection + linear model
- PCR: Principal Component Regression
- Projection (PCA): does not take T into account (unsupervised projection)
- # of variables after projection: user-defined (no automatic procedure) -> 95% criterion or cross-validation necessary!

Projection
1. Linear projection + linear model: cross-validation
- original data set = learning set + validation set (repeated K times)
- data X_L -> projection (learning) -> data X′_L -> model (learning) -> outputs Y, targets T
- data X_V -> projection (generalization) -> data X′_V -> model (generalization) -> validation error
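A sketch of PCR with K-fold cross-validation to choose the number of principal components, as suggested above; the pipeline refits the PCA on each learning fold only, and the spectra/concentrations are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 200))                     # N=90 "spectra", D=200 variables
y = X[:, :5] @ rng.normal(size=5) + 0.1 * rng.normal(size=90)

best = None
for n_comp in range(1, 21):
    errors = []
    for tr, va in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        pcr = make_pipeline(PCA(n_components=n_comp), LinearRegression())
        pcr.fit(X[tr], y[tr])                      # PCA fitted on the learning fold only
        errors.append(np.mean((pcr.predict(X[va]) - y[va]) ** 2))
    err = np.mean(errors)
    if best is None or err < best[1]:
        best = (n_comp, err)
print("selected # of principal components:", best[0], " validation MSE:", best[1])
```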

Projection
1. Linear projection + linear model: cross-validation
[figure: learning and validation errors as a function of the number of variables; the validation error reaches a minimum and then increases]

Projection
1. Linear projection + linear model
- running example: juices (sugar), D = 700 (near infrared), with the samples split into learning and validation sets
[figure: predicted vs. actual concentration for the selected number of principal components; validation NMSE reported on the slide]

Projection
2. Linear projection + non-linear model
- same as PCR, but the model is non-linear: RBFN, MLP, etc.
- non-linear possibilities
- but: K times the learning effort, and choice of meta-parameters!

Projection
2. Linear projection + non-linear model
[figure: predicted vs. actual concentration, RBFN with 6 functions on the principal components; validation NMSE reported on the slide]

Projection
3. Partial Least Squares (PLS)
- build latent variables
  - most correlated to the output
  - orthogonal
- first latent variable: t₁ = w₁₁x₁ + … + w₁D x_D, regression y = c₁t₁ + y₁
- second latent variable: t₂ = w₂₁x′₁ + … + w₂D x′_D, regression y = c₁t₁ + c₂t₂ + y₂, where the x′_i are the residuals of the regression of the x_i on t₁
- etc.
- cross-validation (to choose the number of latent variables)!

Projection
3. Partial Least Squares (PLS)
[figure: predicted vs. actual concentration for the selected number of latent variables; validation NMSE reported on the slide]
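A PLS regression sketch with a cross-validated choice of the number of latent variables; scikit-learn's PLSRegression implements a latent-variable construction of the kind sketched above, and the data are synthetic.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 200))
y = X[:, :5] @ rng.normal(size=5) + 0.1 * rng.normal(size=90)

best = None
for n_lv in range(1, 16):
    errors = []
    for tr, va in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        pls = PLSRegression(n_components=n_lv).fit(X[tr], y[tr])
        errors.append(np.mean((pls.predict(X[va]).ravel() - y[va]) ** 2))
    err = np.mean(errors)
    if best is None or err < best[1]:
        best = (n_lv, err)
print("selected # of latent variables:", best[0], " validation MSE:", best[1])
```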

Projection
4. Partial Least Squares (PLS) + non-linear model
- use the latent variables from PLS
- build a non-linear model on these variables
[figure: predicted vs. actual concentration, RBFN with 6 functions on the latent variables; validation NMSE reported on the slide]

Projection
5. Non-linear projection + (non-)linear model
- Non-linear projection: because linear projections flatten distributions

Non-linear projection: how?
- Build a (bijective) relation between
  - the data in the original space
  - the data in the projected space
- If bijection: possibility to switch between representation spaces ("information" rather than "measure")
- Problems to consider:
  - noise
  - twists and folds
  - impossibility to build a bijection

Non-linear projection: algorithms
- Variance preservation
- Distance preservation (like MDS)
- Neighborhood preservation (like SOM)
- Minimal reconstruction error

Non-linear projection: algorithms
- Variance preservation
  - local PCA (non-continuous representation)
  - kernel PCA: transform the data (non-linearly) into a higher-dimensional space, then project linearly (PCA) in that space
- Distance preservation (like MDS)
  - Sammon's non-linear mapping
  - Curvilinear Component Analysis (CCA) / Curvilinear Distances Analysis (CDA)
- Neighborhood preservation (like SOM)
- Minimal reconstruction error

Sammon's Non-Linear Mapping (NLM) (1/2)
- Criterion to be optimized: distance preservation (cf. metric MDS)
- Sammon's stress: E_NLM = (1 / Σ_{i<j} δ_ij) Σ_{i<j} (δ_ij − d_ij)² / δ_ij, where δ_ij are the distances in the original space and d_ij the distances in the projection space
- preservation of small distances first (because of the 1/δ_ij weighting)
- Calculation: minimization by gradient descent

Sammon's Non-Linear Mapping (NLM) (2/2)
[figure: open box projected by Sammon's NLM]
- Shortcomings:
  - global gradient: lateral faces are "compacted"
  - computational load (preprocess with vector quantization)
  - Euclidean distance (use curvilinear distance instead)
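A plain gradient-descent sketch of Sammon's stress above (the original algorithm uses a diagonal pseudo-Newton step with a "magic factor"; the initialization, step size and iteration count here are only illustrative and convergence is not guaranteed). Data are a synthetic 3-D set.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
t = rng.uniform(0, 3 * np.pi, 200)
X = np.c_[t * np.cos(t), t * np.sin(t), rng.uniform(0, 5, 200)]   # 3-D toy data

delta_vec = pdist(X)                          # δ_ij: distances in the original space
c = delta_vec.sum()
delta = squareform(delta_vec)
np.fill_diagonal(delta, 1.0)                  # dummy diagonal, masked out below

Y = X[:, :2] + rng.normal(scale=1e-3, size=(len(X), 2))   # crude 2-D initialization
eta = 2e-3
for it in range(500):
    d = squareform(pdist(Y))
    np.fill_diagonal(d, 1.0)
    W = (delta - d) / (delta * d)             # weights (δ_ij − d_ij)/(δ_ij · d_ij)
    np.fill_diagonal(W, 0.0)
    # step along the negative stress gradient (constant factor 2/c absorbed in eta)
    Y += eta * (W.sum(axis=1, keepdims=True) * Y - W @ Y)

stress = ((delta_vec - pdist(Y)) ** 2 / delta_vec).sum() / c
print("final Sammon stress:", stress)
```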

Curvilinear Component Analysis (1/2)
- Criterion to be optimized: distance preservation
- preservation of small distances first (but "tears" are allowed)
- E_CCA = Σ_{r<s} (δ_rs − d_rs)² F(d_rs), where F is a decreasing weighting function of the projected distance
- Calculation:
  1. vector quantization as preprocessing
  2. minimization by stochastic gradient descent
  3. interpolation

Curvilinear Component Analysis (2/2)
[figure: open box projected by CCA]
- Shortcomings:
  - convergence of the gradient descent: "torn" faces
  - Euclidean distance -> use curvilinear distance (the method becomes "Curvilinear Distances Analysis")

NLP: use of curvilinear distance (1/4)
- Principle: curvilinear (or geodesic) distance = length of the shortest path from one node to another in a weighted graph

NLP: use of curvilinear distance (2/4)
- Useful for non-linear projection: curvilinear distances are easier to preserve than Euclidean ones!
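A sketch of the curvilinear (geodesic) distance: a k-nearest-neighbour graph weighted by Euclidean distances, with shortest-path lengths computed by Dijkstra's algorithm; the value of k and the toy manifold are arbitrary choices.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

rng = np.random.default_rng(0)
t = rng.uniform(0, 3 * np.pi, 300)
X = np.c_[t * np.cos(t), t * np.sin(t), rng.uniform(0, 5, 300)]  # rolled-up 3-D manifold

G = kneighbors_graph(X, n_neighbors=8, mode="distance")   # k-NN graph, Euclidean weights
G = G.maximum(G.T)                                         # symmetrize the graph

geodesic = shortest_path(G, method="D", directed=False)    # Dijkstra shortest paths
i, j = 0, 150
print("Euclidean:", np.linalg.norm(X[i] - X[j]), " geodesic:", geodesic[i, j])
```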

NLP: use of curvilinear distance (3/4)
- Integration in the projection algorithms: in E_NLM = (1 / Σ_{i<j} δ_ij) Σ_{i<j} (δ_ij − d_ij)² / δ_ij and E_CCA = Σ_{r<s} (δ_rs − d_rs)² F(d_rs), compute the δ's as curvilinear distances (instead of Euclidean ones)

NLP: use of curvilinear distance (4/4)
[figure: open box projected by Sammon's NLM with Euclidean distance (faces are "compacted") vs. with curvilinear distance ("perfect"!)]

Non-linear projection: algorithms
- Neighborhood preservation (like SOM)

Self-Organizing Map (SOM) (1/2)
- Criterion to be optimized: quantization error & neighborhood preservation (no unique mathematical formulation of the neighborhood criterion)
- Calculation:
  - pre-established 1-D or 2-D grid, with grid distance d(r, s)
  - learning rule: winner r(i) = argmin_r ||X_i − C_r||, then C_r <- C_r + α · h_λ(d(r, r(i))) · (X_i − C_r), with a (e.g. exponential/Gaussian) neighbourhood function h_λ
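A compact sketch of the SOM rule quoted above, with a 2-D grid of units and a Gaussian neighbourhood; the grid size and the α and λ schedules are illustrative choices, not prescribed by the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                      # toy data in 3-D

rows, cols = 10, 10
grid = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
C = rng.normal(size=(rows * cols, X.shape[1]))      # codebook vectors C_r

n_iter = 20000
for it in range(n_iter):
    x = X[rng.integers(len(X))]
    frac = it / n_iter
    alpha = 0.5 * (1 - frac)                        # decreasing learning rate
    lam = 3.0 * (1 - frac) + 0.5                    # shrinking neighbourhood width
    winner = np.argmin(np.linalg.norm(C - x, axis=1))          # r(i)
    d_grid = np.linalg.norm(grid - grid[winner], axis=1)       # d(r, r(i)) on the grid
    h = np.exp(-(d_grid / lam) ** 2)                # Gaussian neighbourhood h_lambda
    C += alpha * h[:, None] * (x - C)               # C_r <- C_r + alpha h (x - C_r)

print("mean quantization error:",
      np.mean([np.min(np.linalg.norm(C - x, axis=1)) for x in X]))
```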

Self-Organizing Map (SOM) (2/2)
[figure: open box projected by a SOM]
- Shortcomings:
  - inadequate grid shape: faces are "cracked"
  - 1-D or 2-D grid only

Non-linear projection: algorithms
- Minimal reconstruction error

Autoassociative MLP (1/2)
- Criterion to be minimized: reconstruction error (MSE) after coding and decoding of the data with an autoassociative neural network (MLP)
- Autoassociative MLP: unsupervised (input = output), auto-encoding through a narrow central layer

Autoassociative MLP (2/2)
[figure: open box projected by an autoassociative MLP]
- Shortcomings:
  - "non-geometric" method
  - slow and hazardous convergence (5 layers!)
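An autoassociative MLP sketch: a 5-layer network (D-10-2-10-D) trained to reproduce its input, the 2-unit central layer giving the non-linear 2-D projection. MLPRegressor is used here as a convenient stand-in, and the layer sizes are arbitrary.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = rng.uniform(0, 3 * np.pi, 500)
X = np.c_[t * np.cos(t), t * np.sin(t), rng.uniform(0, 5, 500)]
X = (X - X.mean(axis=0)) / X.std(axis=0)

ae = MLPRegressor(hidden_layer_sizes=(10, 2, 10), activation="tanh",
                  max_iter=5000, random_state=0)
ae.fit(X, X)                                        # unsupervised: input = output
print("reconstruction MSE:", np.mean((ae.predict(X) - X) ** 2))

# Codes = activations of the central 2-unit layer (manual forward pass)
h = np.tanh(X @ ae.coefs_[0] + ae.intercepts_[0])
codes = np.tanh(h @ ae.coefs_[1] + ae.intercepts_[1])   # N x 2 projection
print("projected data shape:", codes.shape)
```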

Projection
5. Non-linear projection + (non-)linear model
- All methods:
  - computer-intensive
  - (up to now) work only in moderate dimensions
  - promising research direction!
  - same limitation as PCA: user-defined projection dimension -> cross-validation necessary!

Projection
5. Non-linear projection + (non-)linear model
- All methods: cross-validation necessary! Same scheme as for PCR: original data set = learning set + validation set (repeated K times); the projection and the model are learned on X_L and applied in generalization to X_V, giving outputs Y, targets T and a validation error

Content: VI. Selection

Selection
6. Constructive (forward) linear selection
- repeat for each variable x_i:
  - build a linear model with variable x_i alone
  - compare the models and select the variable x_a corresponding to the best one
- repeat for each variable x_i except x_a:
  - build a linear model with variables x_a and x_i
  - compare the models and select the variable x_b corresponding to the best one
- etc.

Selection
6. Constructive (forward) linear selection
- if D initial variables and F final variables: FD − F(F−1)/2 models to build!
- on average (F = D/2): 3/8 D² models
- if the models are compared on the learning set: no stopping criterion! -> use a validation set!
- BUT: the validation set then drives the choice of the sets of variables! -> use validation only to choose the number of variables, then run a new forward selection on the whole set
[figure: predicted vs. actual concentration for the selected variables; validation NMSE and number of variables reported on the slide]
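A forward-selection sketch for the linear case: at each step, add the variable that most reduces the validation error. The simple stopping rule used here (stop when the validation error no longer improves) is an illustrative choice, not the procedure advocated on the slide.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 30))
y = X[:, [2, 7, 11]] @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=120)

X_L, X_V, y_L, y_V = train_test_split(X, y, test_size=0.4, random_state=0)

selected, best_err = [], np.inf
remaining = list(range(X.shape[1]))
while remaining:
    errs = {}
    for j in remaining:                     # try each remaining variable
        cols = selected + [j]
        model = LinearRegression().fit(X_L[:, cols], y_L)
        errs[j] = np.mean((model.predict(X_V[:, cols]) - y_V) ** 2)
    j_best = min(errs, key=errs.get)
    if errs[j_best] >= best_err:            # stop when validation error stops improving
        break
    best_err = errs[j_best]
    selected.append(j_best)
    remaining.remove(j_best)

print("selected variables:", selected, " validation MSE:", best_err)
```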

Selection
7. Constructive (forward) non-linear selection
- Same as forward linear selection, except for the model (non-linear)
- Validation + choice of meta-parameters may become computer-intensive!
- Cross-validation: same problem as in the linear case
- Low # of cross-validations: non-smooth error curve (w.r.t. # of variables) -> stopping difficult!

Selection
7. Constructive (forward) non-linear selection
[figure: predicted vs. actual concentration; validation NMSE and number of variables reported on the slide]

Selection
8. Destructive (backward) linear selection
- repeat for each variable x_i:
  - build a linear model with all variables but x_i
  - compare the models and remove the variable x_a that contributes least (the "worst" one)
- repeat for each variable x_j except x_a:
  - build a linear model with all variables but x_a and x_j
  - compare the models and remove the variable x_b that contributes least
- etc.

Selection
8. Destructive (backward) linear selection
- building a (linear) model is only possible if # variables <= # observations! -> need to first reduce the # of variables
  - preliminary PCA
  - or identify the most correlated pairs of variables and eliminate one variable of each pair

Selection
9. Forward-backward on learning set with Fisher test
- idea: test the significance of a variable (added or removed)
- how? ANOVA:

  source      | SS    | df  | MS                  | F
  regression  | SSreg | p−1 | MSreg = SSreg/(p−1) | MSreg/MSres
  residuals   | SSres | N−p | MSres = SSres/(N−p) |
  total       | SStot | N−1 |                     |

- SSres = Σ_{i=1}^{N} (y_i − t_i)², SSreg = Σ_{i=1}^{N} (y_i − t̄)², SStot = SSreg + SSres = Σ_{i=1}^{N} (t_i − t̄)²
- N: # of samples, p = D + 1: # of parameters, R² = 1 − SSres/SStot

Selection
9. Forward-backward on learning set with Fisher test: selection of the first variable
- repeat for each variable x_i:
  - build a linear model with variable x_i alone
  - compare the models and select the variable x_a corresponding to the maximum R²

Selection
9. Forward-backward on learning set with Fisher test: forward step
- repeat for each variable x_i except x_a:
  - build a linear model with variables x_a and x_i
  - compute F_to_enter(x_i) = [SSreg(x_i, x_a) − SSreg(x_a)] / MSres(x_i, x_a)
- choose x_b = argmax_j F_to_enter(x_j)
- compare to the Fisher table (5% or 10% level) and keep x_b if F_to_enter(x_b) is higher

Selection
9. Forward-backward on learning set with Fisher test: backward step
- repeat for each variable x_i except x_b:
  - build a linear model with x_b and without x_i
  - compute F_to_remove(x_i) = [SSreg(x_i, x_b) − SSreg(x_b)] / MSres(x_i, x_b)
- choose x_c = argmin_j F_to_remove(x_j)
- compare to the Fisher table (5% or 10% level) and remove x_c if F_to_remove(x_c) is lower

Selection
9. Forward-backward on learning set with Fisher test
[figure: predicted vs. actual concentration; NMSE_A = 0.088 on the learning set, validation NMSE and number of variables reported on the slide]

Selection
10. Forward-backward linear selection
- Forward selection: when x_b is selected, there is no guarantee that x_a remains optimal!
- Ideal algorithm:
  - (re)test each variable each time a new one is selected
  - equivalent to exhaustive search!
- Compromise:
  - forward selection
  - then start a backward selection from its result
- On the validation set!

Selection
10. Forward-backward linear selection
[figure: predicted vs. actual concentration for the forward selection alone (no backward step!); validation NMSE and number of variables reported on the slide]

Selection
10. Forward-backward linear selection
- But if the (forward) experiment is repeated, with the data now split into learning, validation and test sets, the selected variables vary from one repetition to another

Selection
10. Forward-backward linear selection
- Then forward-backward selection on the learning + validation sets (N_A + N_V)
[figure: predicted vs. actual concentration; test NMSE and number of variables reported on the slide]

Selection
11. Forward-backward non-linear selection
- Same as forward-backward linear selection, except for the model (non-linear)
- Cross-validation + choice of meta-parameters may become computer-intensive!
- Low # of cross-validations: non-smooth error curve (w.r.t. # of variables) -> stopping difficult!

Selection
11. Forward-backward non-linear selection
[figure: predicted vs. actual concentration, RBFN with 8 functions; validation NMSE and number of variables reported on the slide]

Acknowledgements
- Part of this work has been realized in collaboration with PhD students:
  - Nabil Benoudjit
  - John Lee
  - Amaury Lendasse

References
- Curse of dimensionality:
  - D. L. Donoho, High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality. Lecture on August 8, 2000, to the American Mathematical Society, "Math Challenges of the 21st Century".
  - M. Verleysen, Learning high-dimensional data. Accepted for publication in Limitations and Future Trends in Neural Computation, S. Ablameyko, L. Goras, M. Gori, V. Piuri, eds., IOS Press.
  - M. Verleysen, Machine learning of high-dimensional data: local artificial neural networks and the curse of dimensionality. Agrégation in higher education thesis, UCL.

References
- Cross-validation and bootstrap:
  - NN FAQ
  - B. Efron, R. J. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall, first edition, 1993.
  - A. C. Davison, D. V. Hinkley. Bootstrap Methods and their Applications. Cambridge University Press, 3rd edition, 1999.
- PCA:
  - any good textbook on linear statistics
- PLS:
  - M. Tenenhaus, La régression PLS : théorie et pratique, Editions Technip, 1998.

References
- Non-linear projections:
  - J. W. Sammon, A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C-18(5):401-409, 1969.
  - P. Demartines, J. Hérault. Curvilinear Component Analysis: a self-organizing neural network for nonlinear mapping of data sets. IEEE Transactions on Neural Networks, 8(1):148-154, January 1997.
  - J. A. Lee, A. Lendasse, M. Verleysen, Curvilinear Distance Analysis versus Isomap, ESANN 2002, European Symposium on Artificial Neural Networks, Bruges (Belgium), April 2002.

References
- Forward and/or backward selection:
  - Eklöv T., Mårtensson P., Lundström I., Selection of variables for interpreting multivariate gas sensor data, Analytica Chimica Acta (1999).
  - Bertrand D., Dufour E., La spectroscopie infrarouge et ses applications analytiques, Editions Tec & Doc, collection sciences et techniques agroalimentaires (2000).
  - Massart D. L., Vandeginste B. G. M., Buydens L. M. C., De Jong S., Lewi P. J., Smeyers-Verbeke J., Handbook of Chemometrics and Qualimetrics: Part A, Elsevier Science, Amsterdam, 1997.
  - A. D. Walmsley, Improved variable selection procedure for multivariate linear regression, Analytica Chimica Acta, 354 (1997).
  - N. Benoudjit, E. Cools, M. Meurens, M. Verleysen, Calibrage chimiométrique des spectrophotomètres : sélection et validation des variables par modèles non-linéaires, accepted for Chimiométrie 2002, 4-5 December 2002, Paris (France).


More information

Akarsh Pokkunuru EECS Department Contractive Auto-Encoders: Explicit Invariance During Feature Extraction

Akarsh Pokkunuru EECS Department Contractive Auto-Encoders: Explicit Invariance During Feature Extraction Akarsh Pokkunuru EECS Department 03-16-2017 Contractive Auto-Encoders: Explicit Invariance During Feature Extraction 1 AGENDA Introduction to Auto-encoders Types of Auto-encoders Analysis of different

More information

CSC 411: Lecture 14: Principal Components Analysis & Autoencoders

CSC 411: Lecture 14: Principal Components Analysis & Autoencoders CSC 411: Lecture 14: Principal Components Analysis & Autoencoders Richard Zemel, Raquel Urtasun and Sanja Fidler University of Toronto Zemel, Urtasun, Fidler (UofT) CSC 411: 14-PCA & Autoencoders 1 / 18

More information

Dimension Reduction of Image Manifolds

Dimension Reduction of Image Manifolds Dimension Reduction of Image Manifolds Arian Maleki Department of Electrical Engineering Stanford University Stanford, CA, 9435, USA E-mail: arianm@stanford.edu I. INTRODUCTION Dimension reduction of datasets

More information

1. Introduction. performance of numerical methods. complexity bounds. structural convex optimization. course goals and topics

1. Introduction. performance of numerical methods. complexity bounds. structural convex optimization. course goals and topics 1. Introduction EE 546, Univ of Washington, Spring 2016 performance of numerical methods complexity bounds structural convex optimization course goals and topics 1 1 Some course info Welcome to EE 546!

More information

Artificial Neural Networks Unsupervised learning: SOM

Artificial Neural Networks Unsupervised learning: SOM Artificial Neural Networks Unsupervised learning: SOM 01001110 01100101 01110101 01110010 01101111 01101110 01101111 01110110 01100001 00100000 01110011 01101011 01110101 01110000 01101001 01101110 01100001

More information

Instance-based Learning

Instance-based Learning Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 15 th, 2007 2005-2007 Carlos Guestrin 1 1-Nearest Neighbor Four things make a memory based learner:

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Basis Functions Tom Kelsey School of Computer Science University of St Andrews http://www.cs.st-andrews.ac.uk/~tom/ tom@cs.st-andrews.ac.uk Tom Kelsey ID5059-02-BF 2015-02-04

More information

Learning from Data: Adaptive Basis Functions

Learning from Data: Adaptive Basis Functions Learning from Data: Adaptive Basis Functions November 21, 2005 http://www.anc.ed.ac.uk/ amos/lfd/ Neural Networks Hidden to output layer - a linear parameter model But adapt the features of the model.

More information

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017 CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2017 Assignment 3: 2 late days to hand in tonight. Admin Assignment 4: Due Friday of next week. Last Time: MAP Estimation MAP

More information

Locally Weighted Learning

Locally Weighted Learning Locally Weighted Learning Peter Englert Department of Computer Science TU Darmstadt englert.peter@gmx.de Abstract Locally Weighted Learning is a class of function approximation techniques, where a prediction

More information

Clustering: Classic Methods and Modern Views

Clustering: Classic Methods and Modern Views Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering

More information

Machine Learning. Topic 4: Linear Regression Models

Machine Learning. Topic 4: Linear Regression Models Machine Learning Topic 4: Linear Regression Models (contains ideas and a few images from wikipedia and books by Alpaydin, Duda/Hart/ Stork, and Bishop. Updated Fall 205) Regression Learning Task There

More information

CS 664 Slides #11 Image Segmentation. Prof. Dan Huttenlocher Fall 2003

CS 664 Slides #11 Image Segmentation. Prof. Dan Huttenlocher Fall 2003 CS 664 Slides #11 Image Segmentation Prof. Dan Huttenlocher Fall 2003 Image Segmentation Find regions of image that are coherent Dual of edge detection Regions vs. boundaries Related to clustering problems

More information

The exam is closed book, closed notes except your one-page cheat sheet.

The exam is closed book, closed notes except your one-page cheat sheet. CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right

More information

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures

More information

Grundlagen der Künstlichen Intelligenz

Grundlagen der Künstlichen Intelligenz Grundlagen der Künstlichen Intelligenz Unsupervised learning Daniel Hennes 29.01.2018 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Supervised learning Regression (linear

More information

A Stochastic Optimization Approach for Unsupervised Kernel Regression

A Stochastic Optimization Approach for Unsupervised Kernel Regression A Stochastic Optimization Approach for Unsupervised Kernel Regression Oliver Kramer Institute of Structural Mechanics Bauhaus-University Weimar oliver.kramer@uni-weimar.de Fabian Gieseke Institute of Structural

More information

Lecture Topic Projects

Lecture Topic Projects Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, basic tasks, data types 3 Introduction to D3, basic vis techniques for non-spatial data Project #1 out 4 Data

More information

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06 Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,

More information

CPSC 340: Machine Learning and Data Mining. Multi-Dimensional Scaling Fall 2017

CPSC 340: Machine Learning and Data Mining. Multi-Dimensional Scaling Fall 2017 CPSC 340: Machine Learning and Data Mining Multi-Dimensional Scaling Fall 2017 Assignment 4: Admin 1 late day for tonight, 2 late days for Wednesday. Assignment 5: Due Monday of next week. Final: Details

More information

Extending reservoir computing with random static projections: a hybrid between extreme learning and RC

Extending reservoir computing with random static projections: a hybrid between extreme learning and RC Extending reservoir computing with random static projections: a hybrid between extreme learning and RC John Butcher 1, David Verstraeten 2, Benjamin Schrauwen 2,CharlesDay 1 and Peter Haycock 1 1- Institute

More information