Non-linear models and model selection for spectral data
Michel Verleysen
Université catholique de Louvain (Louvain-la-Neuve, Belgium)
November 2002

The data
- middle infrared spectra of wine samples ("a fine wine", "another fine wine")
The data (continued)
- differences between spectra: a fine wine (middle infrared, for alcohol), a not-so-fine juice (near infrared, for sugar)

Calibration
- input-output pair: a spectrum and the alcohol concentration (the value to "predict")
- a calibration model links the two
- but:
  - the order of the input variables is not relevant for the analysis
  - there are large dependencies between variables
Modelling
- known data (dimension D, number of data N) -> model -> known outputs: learning
- new data -> model -> unknown outputs: generalization

Spectra: high-dimensional data
- known data X, model Y = WX, known outputs T
- linear model: X is N x D, Y and T are N x 1
- learning: W = (X^T X)^{-1} X^T T
- if D < N: least-squares solution, min ||T - WX||^2
- if D > N: impossible! (X^T X is singular)
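A minimal sketch of the linear calibration step above, on synthetic data (the spectra and concentrations here are made up, not the wine data from the slides). It solves the normal equations when D < N and shows that X^T X becomes singular when D > N:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 5
X = rng.normal(size=(N, D))          # N "spectra" of dimension D
w_true = np.arange(1.0, D + 1.0)
t = X @ w_true                        # target concentrations

# D < N: the least-squares weights from the normal equations
W = np.linalg.solve(X.T @ X, X.T @ t)
assert np.allclose(W, w_true)

# D > N: X^T X is rank-deficient, so the solution is not unique
X_wide = rng.normal(size=(10, 50))
rank = np.linalg.matrix_rank(X_wide.T @ X_wide)
print(rank)   # strictly less than 50
```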
Content
I. High-dimensional data: surprising results
II. Linear and non-linear models
III. Learning, validation, test
IV. Reducing the number of inputs
V. Projection
VI. Selection
- Running example

Part I. High-dimensional data: surprising results
John Wilder Tukey
- "The Future of Data Analysis", Ann. Math. Statist., 33, 1-67, 1962: "Analyze data rather than prove theorems"
- in other words:
  - data are here
  - they will be coming more and more in the future
  - we must analyze them, with very humble means
  - insistence on mathematics will distract us from fundamental points
- from D. L. Donoho, "High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality", lecture on August 8, 2000, to the American Mathematical Society, "Math Challenges of the 21st Century"

Empty space phenomenon
- necessity to fill the space with learning points
- the number of learning points grows exponentially with the dimension
Example: Silverman's result
- how many points are needed to approximate a Gaussian distribution with Gaussian kernels, at a desired accuracy of 90%?
- (figure: required number of points versus dimension, on a logarithmic scale up to 10^6; the count grows roughly exponentially with the dimension)

Surprising phenomena in high-dimensional spaces
- sphere volume: V(d) = pi^{d/2} r^d / Gamma(d/2 + 1)
- the ratio sphere volume / cube volume tends to 0 as the dimension grows
- embedded spheres (radius ratio = 0.9): the inner sphere contains a vanishing fraction of the volume
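The two volume phenomena on the slide can be checked numerically; this short sketch uses the V(d) formula above and the radius-ratio argument (the specific dimensions printed are illustrative choices, not from the slides):

```python
import math

def sphere_volume(d, r=1.0):
    # V(d) = pi^(d/2) * r^d / Gamma(d/2 + 1), as on the slide
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)

# ratio of the unit sphere to its enclosing cube [-1, 1]^d collapses with d
for d in (2, 5, 10, 20):
    print(d, sphere_volume(d) / 2 ** d)

# embedded spheres with radius ratio 0.9: the inner sphere's share of the
# volume is 0.9^d, so almost everything lies in the thin outer shell
for d in (2, 10, 50):
    print(d, 0.9 ** d)
```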
Gaussian kernels
- (figure: the percentage of points of a d-dimensional Gaussian falling inside a sphere of fixed radius drops quickly as the dimension grows)

Concentration of measure phenomenon
- take all pairwise distances in random data
- compute the average A and the variance V of these distances
- if D increases, then V remains fixed while A increases
- all distances seem to concentrate!
- example: Euclidean norm of samples
  - the average A increases with D^0.5
  - the variance V remains fixed
  - the samples seem to be normalized!
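A sketch of the concentration-of-measure experiment just described, on random uniform data (sample sizes and dimensions here are arbitrary choices for illustration): the mean pairwise distance grows roughly like sqrt(D), while its spread barely changes.

```python
import numpy as np

rng = np.random.default_rng(1)

def distance_stats(D, n=200):
    x = rng.uniform(size=(n, D))
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    d = d[np.triu_indices(n, k=1)]     # all pairwise distances
    return d.mean(), d.std()

for D in (2, 20, 200):
    mean, std = distance_stats(D)
    print(D, round(mean, 2), round(std, 2))
```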
Part II. Linear and non-linear models

Linear and non-linear models
- linear models:
  + fixed number of parameters
  + low number of parameters
  + direct learning
  + no local minima
  - restricted to linear problems
- non-linear models:
  - variable number of parameters
  - large number of parameters
  - adaptive (iterative) learning
  - local minima
  + valid for any problem
Non-linear models
- (figure: architectures of an RBF network, F(x) = sum_{i=1..K} w_i phi(||x - c_i||) plus optional linear terms, and of a two-layer MLP with bias units and weight layers w_ij^(1), w_ij^(2))
- number of parameters:
  - MLP: (D+1)K + K + 1 weights
  - RBF: K weights, KD centers, K widths
- in any case: large if D is large!
- if the number of parameters exceeds the number of data: ill-posed problem
  - infinity of solutions
  - ad-hoc criteria give one solution (linear case)
  - always a "solution" (non-linear case)
  - but overfitting!
Part III. Learning, validation, test

Validation and test
- basic principle: never test on learning data
- the data are split into a learning set and a test set
Validation and test
- basic principle: never test on learning data
- corollary: never compare models on learning data
- the data are split into three parts (N1, N2, N3):
  - learning set: used to learn each model
  - validation set: used to compare models and select one of them
  - test set: used to estimate the error made by the selected model
Validation and test
- each model depends on the (sub)set of data used: several learnings for each model, and the validation errors are averaged
- average validation errors are compared between models; the best model is selected; its error is estimated on the test set
- several ways to select the subsets of data: (k-fold) cross-validation, bootstrap
- particularly important with small sets of high-dimensional data!

Validation
- the data are split into a learning set (N1) and a validation set (N2); a model is built on the learning set
- error: E_gen_hat = (1/N2) sum_{t in VS} (y_hat_t - y_t)^2
Cross-validation
- the data are split into a learning set (N1) and a validation set (N2); a model is built; the experience is repeated K times with different splits
- error: E_gen_hat = (1/K) sum_{k=1..K} (1/N2) sum_{t in VS_k} (y_hat_t - y_t)^2

K-fold cross-validation
- the data are split into K folds of N/K samples; each fold is used once as validation set, the rest as learning set; repeated K times
- error: E_gen_hat = (1/K) sum_{k=1..K} (1/(N/K)) sum_{t in VS_k} (y_hat_t - y_t)^2
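The K-fold estimate above can be sketched in a few lines for the linear model of the previous sections (the data here are synthetic stand-ins, and K = 5 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
N, D, K = 120, 4, 5
X = rng.normal(size=(N, D))
t = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=N)

idx = rng.permutation(N)
folds = np.array_split(idx, K)
errors = []
for k in range(K):
    val = folds[k]
    train = np.hstack([folds[j] for j in range(K) if j != k])
    # least-squares fit on the K-1 learning folds
    W = np.linalg.solve(X[train].T @ X[train], X[train].T @ t[train])
    errors.append(np.mean((X[val] @ W - t[val]) ** 2))

e_gen = np.mean(errors)   # validation errors averaged over the K folds
print(e_gen)
```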
Leave-one-out
- the validation set is reduced to a single sample; a model is built on the N-1 others; repeated N times
- error: E_gen_hat = (1/N) sum_{k=1..N} (y_hat_k - y_k)^2

Bootstrap: plug-in principle
- world -> sample; the sample becomes a "new world", from which new samples are drawn (resampling with replacement)
Bootstrap: plug-in principle
- E_gen = E_sample + D, where D is the optimism
- in the real world: learning on the sample gives E_sample; the generalization error E_gen is unknown
- in the "new world" (the sample): learning on a new (resampled) sample gives E_new-sample; the error on the whole new world, E_new-world, is computable
- plug-in: D is estimated by E_new-world - E_new-sample
Bootstrap
- definition: E_gen(theta) = E_gen(theta) - E_sample(theta) + E_sample(theta) = D(theta) + E_sample(theta), with D(theta) = E_gen(theta) - E_sample(theta)
- estimate: D_hat(theta) = (1/K) sum_{k=1..K} (E^k_new-world(theta) - E^k_new-sample(theta))
- E_gen_hat(theta) = E_sample(theta) + (1/K) sum_{k=1..K} (E^k_new-world(theta) - E^k_new-sample(theta))
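A minimal sketch of this bootstrap estimate on a one-parameter linear model (the data, the model, and K = 200 resamples are illustrative assumptions, not the slides' setup): the optimism is averaged over resamples and added to the sample error.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 80
x = rng.normal(size=N)
t = 2.0 * x + rng.normal(size=N)

def fit(x, t):                        # 1-D least squares, no intercept
    return (x @ t) / (x @ x)

def mse(w, x, t):
    return np.mean((w * x - t) ** 2)

w = fit(x, t)
e_sample = mse(w, x, t)               # error on the sample itself

K = 200
optimism = []
for _ in range(K):
    b = rng.integers(0, N, size=N)    # resample with replacement
    wb = fit(x[b], t[b])
    # E_new-world: bootstrap model judged on the original sample
    # E_new-sample: bootstrap model judged on its own resample
    optimism.append(mse(wb, x, t) - mse(wb, x[b], t[b]))

e_gen = e_sample + np.mean(optimism)  # plug-in estimate of E_gen
print(e_sample, e_gen)
```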
Part IV. Reducing the number of inputs

Reducing the number of inputs
- overfitting (even in the linear case!) = too many parameters with respect to the number of observations
- example: polynomial approximation (figure: a low-order polynomial fits the trend, a high-order one fits the noise)
- simplest model: linear (number of parameters = number of inputs + 1)
- but sometimes the number of known observations is smaller than the number of inputs!
- worse with non-linear models
Multiple Linear Regression (MLR)
- simplest model: y^t = w_0 + w_1 x_1^t + w_2 x_2^t + ... + w_D x_D^t
- W = (X^T X)^{-1} X^T T minimizes E = ||T - Y||^2, where the rows of X are (1, x_1^t, ..., x_D^t) for t = 1..N and T = (t^1, ..., t^N)
- no non-linear capabilities
- N > D necessary!

Reducing the number of inputs
- necessity to reduce the number of inputs! the data X (dimension D) are reduced, by selection or projection, to data X' (dimension D' < D), and the model Y = WX' is built on the reduced data
Reducing the number of inputs
- selection: the variables x' are found among the original variables x
- projection: the variables x' are linear or non-linear combinations of the original variables x
Part V. Projection

Projection 1. Linear projection + linear model
- linear projection: PCA (Principal Component Analysis), also known as the Karhunen-Loève transform
- projection into a smaller-dimensional space
- aims: dimension reduction while losing a minimum of information; data compression and/or representation
PCA: data centering and normalisation
- the data are centered and normalised:
  - to be independent from the measurement units
  - because the origin of the data has no physical signification
  - to make computations easier!
- the coordinates (= columns) are transformed: x_i^j <- (x_i^j - E[x_i]) / sqrt(var[x_i])

PCA: search for axes
- we look for axes which minimise the projection errors, or equivalently maximise the variance after projection
- along axis u (u a unit vector): V = (1/N) sum_{j=1..N} [u^T (x^j - E[x])]^2
- but E[x] = 0 (centered data), therefore V = (1/N) sum_j [u^T x^j]^2 = u^T X^T X u / N
- equivalent criteria!
PCA: covariance matrix
- variance and covariance: s_kl = (1/N) sum_{j=1..N} x_k^j x_l^j (if the x are centered)
- covariance matrix: C = (s_kl), a D x D symmetric matrix, C = X^T X / N

PCA: choice of direction
- classical result: the best choice for u is the eigenvector u_1 associated to the largest eigenvalue lambda_1 of the matrix C
- in the space orthogonal to u_1: the best choice is the eigenvector u_2 associated to the second largest eigenvalue lambda_2
- and so on...
PCA: properties
- C is symmetric: the eigenvalues are real and the eigenvectors are orthogonal
- C is positive semi-definite: the eigenvalues are positive or zero
- contribution of each axis to the variance: V_k = lambda_k / sum_{i=1..D} lambda_i
- the eigenvalues are ordered; contribution of the K first axes to the variance: V_K = sum_{i=1..K} lambda_i / sum_{i=1..D} lambda_i
- one chooses, for example, K such that V_K > 90%

PCA: example and difficulties
- (figure: 2-D example of the principal axes of a point cloud)
- difficulty: PCA requires diagonalising the matrix C (dimension D x D), which is heavy if D is large!
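The whole PCA recipe above fits in a short sketch (the two-factor synthetic data and the thresholds are illustrative assumptions): center and normalise, build C = X^T X / N, take the eigendecomposition, and keep enough axes for V_K > 90%.

```python
import numpy as np

rng = np.random.default_rng(4)
N, D = 500, 5
latent = rng.normal(size=(N, 2))                 # 2 underlying factors
X = latent @ rng.normal(size=(2, D)) + 0.01 * rng.normal(size=(N, D))

X = (X - X.mean(axis=0)) / X.std(axis=0)         # centering - normalisation
C = X.T @ X / N                                  # covariance matrix
eigval, eigvec = np.linalg.eigh(C)
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]   # sort decreasing

contrib = np.cumsum(eigval) / eigval.sum()       # V_K for K = 1..D
K = int(np.searchsorted(contrib, 0.90)) + 1      # smallest K with V_K > 90%
X_proj = X @ eigvec[:, :K]                       # projected data
print(K, contrib[K - 1])
```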
Projection 1. Linear projection + linear model
- PCR: Principal Component Regression
- the projection (PCA) does not take T into account (unsupervised projection)
- the number of variables after projection is user-defined (no automatic procedure): the 90% criterion or cross-validation is necessary!

Projection 1: cross-validation
- the original data set is split into a learning set and a validation set, repeated K times
- the projection is learned on X_L and applied (generalization) to X_V; the model is learned on the projected X_L and applied to the projected X_V; the outputs Y are compared to the targets T to give the validation error
Projection 1: cross-validation
- (figure: learning and validation errors versus the number of variables; the learning error decreases monotonically while the validation error goes through a minimum)

Projection 1: running example
- juices (sugar), D = 700 (near infrared); the samples are split into learning (N_L) and validation (N_V) sets
- (figure: predicted versus actual concentration with the selected number of principal components and the corresponding validation NMSE)
Projection 2. Linear projection + non-linear model
- same as PCR, but with a non-linear model: RBFN, MLP, etc.
- non-linear possibilities
- but: K learnings and the choice of meta-parameters!
- (figure: predicted versus actual concentration, RBFN with 6 functions, with the number of principal components and the validation NMSE)
Projection 3. Partial Least Squares (PLS)
- build latent variables, most correlated to the output, and orthogonal
- first variable: t_1 = w_11 x_1 + ... + w_1D x_D; regression: y = c_1 t_1 + y^(1)
- second variable: t_2 = w_21 x_1^(1) + ... + w_2D x_D^(1); y = c_1 t_1 + c_2 t_2 + y^(2), where the x_i^(1) are the residuals of the regression of the x_i on t_1
- etc.
- cross-validation!
- (figure: predicted versus actual concentration with the number of latent variables and the validation NMSE)
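The deflation scheme above can be sketched as a small PLS1 loop (this is one common way to implement the idea, on synthetic data; the weight rule w proportional to X^T y and the deflation by loadings are assumptions of this sketch, not spelled out on the slides):

```python
import numpy as np

rng = np.random.default_rng(5)
N, D = 100, 10
X = rng.normal(size=(N, D))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=N)

def pls1(X, y, n_components):
    Xk, yk, T, C = X.copy(), y.copy(), [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # direction most correlated with y
        t = Xk @ w                      # latent variable
        c = (t @ yk) / (t @ t)          # regression coefficient y ~ c t
        p = Xk.T @ t / (t @ t)          # loadings of X on t
        Xk = Xk - np.outer(t, p)        # residuals of X regressed on t
        yk = yk - c * t                 # residual output
        T.append(t); C.append(c)
    return np.array(T).T, np.array(C)

T, C = pls1(X, y, 2)
y_hat = T @ C
print(np.mean((y - y_hat) ** 2))        # small residual error
```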
Projection 4. Partial Least Squares (PLS) + non-linear model
- use the latent variables from PLS; build a non-linear model on these variables
- (figure: predicted versus actual concentration, RBFN with 6 functions, with the number of latent variables and the validation NMSE)

Projection 5. Non-linear projection + (non-)linear model
- non-linear projection: because linear projections flatten distributions
Non-linear projection: how?
- build a (bijective) relation between the data in the original space and the data in the projected space
- if there is a bijection: possibility to switch between the representation spaces ("information" rather than "measure")
- problems to consider: noise, twists and folds, impossibility to build a bijection

Non-linear projection: algorithms
- variance preservation
- distance preservation (like MDS)
- neighborhood preservation (like SOM)
- minimal reconstruction error
Non-linear projection: algorithms (variance preservation)
- local PCA: non-continuous representation
- kernel PCA: transform the data (non-linearly) into a higher-dimensional space, then project linearly (PCA) in that space

Non-linear projection: algorithms (distance preservation, like MDS)
- Sammon's non-linear mapping
- Curvilinear Component Analysis (CCA) / Curvilinear Distances Analysis (CDA)
Sammon's Non-Linear Mapping (NLM) (1/2)
- criterion to be optimized: distance preservation (cf. metric MDS)
- Sammon's stress: E = (1 / sum_{i<j} delta_ij) sum_{i<j} (delta_ij - d_ij)^2 / delta_ij, where the delta_ij are distances in the original space and the d_ij distances in the projection space
- small distances are preserved first
- calculation: minimization by gradient descent

Sammon's Non-Linear Mapping (NLM) (2/2)
- (figure: projection of an open box)
- shortcomings:
  - global gradient: lateral faces are "compacted"
  - computational load (preprocess with vector quantization)
  - Euclidean distance (use the curvilinear distance instead)
Curvilinear Component Analysis (1/2)
- criterion to be optimized: distance preservation; small distances are preserved first (but "tears" are allowed)
- E_CCA = sum_{r<s} (delta_rs - d_rs)^2 F(d_rs)
- calculation: 1. vector quantization as preprocessing; 2. minimization by (approximately) stochastic gradient descent; 3. interpolation

Curvilinear Component Analysis (2/2)
- (figure: projection of an open box)
- shortcomings:
  - convergence of the gradient descent: "torn" faces
  - Euclidean distance: use the curvilinear distance (the method then becomes "Curvilinear Distances Analysis")
NLP: use of the curvilinear distance (1/4)
- principle: the curvilinear (or geodesic) distance is the length of the shortest path from one node to another in a weighted graph

NLP: use of the curvilinear distance (2/4)
- useful for non-linear projection: curvilinear distances are easier to preserve!
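A sketch of the curvilinear distance as defined above: build a k-nearest-neighbour graph weighted by Euclidean edge lengths, then run a shortest-path search (Dijkstra). The quarter-circle data, k = 5, and the Dijkstra formulation are assumptions of this sketch, not prescribed by the slides.

```python
import heapq
import numpy as np

def geodesic_distances(X, src, k=5):
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    nn = np.argsort(d, axis=1)[:, 1:k + 1]   # k nearest neighbours (skip self)
    dist = np.full(n, np.inf)
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:                              # Dijkstra on the weighted graph
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue
        for v in nn[u]:
            alt = du + d[u, v]
            if alt < dist[v]:
                dist[v] = alt
                heapq.heappush(heap, (alt, v))
    return dist

# points on a quarter circle: the graph distance follows the arc (~ pi/2),
# while the Euclidean distance takes the chord (~ sqrt(2))
theta = np.linspace(0, np.pi / 2, 50)
X = np.c_[np.cos(theta), np.sin(theta)]
dist = geodesic_distances(X, 0)
print(dist[-1], np.linalg.norm(X[-1] - X[0]))
```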
NLP: use of the curvilinear distance (3/4)
- integration in the projection algorithms: use the curvilinear distance (instead of the Euclidean one) for the delta_ij in
  E_NLM = (1 / sum_{i<j} delta_ij) sum_{i<j} (delta_ij - d_ij)^2 / delta_ij and E_CCA = sum_{r<s} (delta_rs - d_rs)^2 F(d_rs)

NLP: use of the curvilinear distance (4/4)
- (figure: projected open box) with Sammon's NLM and the Euclidean distance, the faces are "compacted"; with the curvilinear distance, the result is "perfect"!
Non-linear projection: algorithms (neighborhood preservation, like SOM)

Self-Organizing Map (SOM) (1/2)
- criterion to be optimized: quantization error and neighborhood preservation; there is no unique mathematical formulation of the neighborhood criteria
- calculation: a pre-established 1-D or 2-D grid defines a grid distance d(r, s)
- learning rule: r(i) = argmin_r ||X_i - C_r||; Delta C_r = alpha e^{-d(r, r(i)) / lambda} (X_i - C_r)
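The learning rule above can be sketched directly (a 1-D grid on 2-D uniform data; the unit count, learning rate alpha, and neighbourhood width lambda are illustrative assumptions, and no decay schedule is applied here):

```python
import numpy as np

rng = np.random.default_rng(7)

def train_som(X, n_units=10, epochs=20, alpha=0.3, lam=2.0):
    C = rng.uniform(X.min(), X.max(), size=(n_units, X.shape[1]))
    grid = np.arange(n_units)                  # 1-D grid: d(r, s) = |r - s|
    for _ in range(epochs):
        for x in X:
            winner = np.argmin(np.linalg.norm(x - C, axis=1))
            h = np.exp(-np.abs(grid - winner) / lam)
            C += alpha * h[:, None] * (x - C)  # Delta C_r rule from the slide
    return C

X = rng.uniform(0, 1, size=(200, 2))
C = train_som(X)
print(C.min(), C.max())   # centroids stay inside the data range
```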
Self-Organizing Map (SOM) (2/2)
- (figure: projection of an open box)
- shortcomings: with an inadequate grid shape, faces are "cracked"; 1-D or 2-D grids only

Non-linear projection: algorithms (minimal reconstruction error)
Autoassociative MLP (1/2)
- criterion to be minimized: the reconstruction error (MSE) after coding and decoding of the data with an autoassociative neural network (MLP)
- autoassociative MLP: unsupervised (inputs = outputs), i.e. auto-encoding

Autoassociative MLP (2/2)
- (figure: projection of an open box)
- shortcomings: "non-geometric" method; slow and hazardous convergence (5 layers!)
Projection 5. Non-linear projection + (non-)linear model
- all methods are computer-intensive and (up to now) work only in moderate dimensions
- promising research direction!
- same limitation as PCA: the projection dimension is user-defined, so cross-validation is necessary!
- cross-validation scheme: as for PCR, the projection is learned on X_L and applied to X_V, the model is learned and applied on the projected data, and the validation error is computed from the outputs Y and targets T; repeated K times
Part VI. Selection

Selection 6. Constructive (forward) linear selection
- repeat for each variable x_i: build a linear model with variable x_i alone; compare the models and select the variable x_a corresponding to the best one
- repeat for each variable x_i except x_a: build a linear model with variables x_a and x_i; compare the models and select the variable x_b corresponding to the best one
- etc.
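The forward loop just described can be sketched as follows (synthetic data, a fixed train/validation split, and a hard-coded stop after three variables are all illustrative assumptions; the slides select the stopping point by validation instead):

```python
import numpy as np

rng = np.random.default_rng(8)
N, D = 200, 20
X = rng.normal(size=(N, D))
t = 3 * X[:, 4] - 2 * X[:, 11] + 0.1 * rng.normal(size=N)
train, val = np.arange(0, 150), np.arange(150, N)

def val_error(cols):
    A = np.c_[np.ones(N), X[:, cols]]          # intercept + chosen variables
    w = np.linalg.lstsq(A[train], t[train], rcond=None)[0]
    return np.mean((A[val] @ w - t[val]) ** 2)

selected, remaining = [], list(range(D))
for _ in range(3):                             # grow the model one variable at a time
    errs = [(val_error(selected + [j]), j) for j in remaining]
    best_err, best_j = min(errs)
    selected.append(best_j)
    remaining.remove(best_j)
    print(selected, round(best_err, 4))
```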
Selection 6. Constructive (forward) linear selection
- with D initial variables and F final variables: FD - F(F-1)/2 models to build!
- on average (F = D/2): about 3D^2/8 models
- if the models are compared on the learning set: no stopping criterion! use a validation set!
- BUT: this validates sets of variables; use validation to choose the number of variables only, then run a new forward selection on the whole set
- (figure: predicted versus actual concentration with the selected variables and the validation NMSE)
Selection 7. Constructive (forward) non-linear selection
- same as forward linear selection, except for the model
- validation and the choice of meta-parameters may become computer-intensive!
- cross-validation: same problem as in the linear case
- with a low number of cross-validations: non-smooth error curve (with respect to the number of variables), so stopping is difficult!
- (figure: predicted versus actual concentration with the selected variables and the validation NMSE)
Selection 8. Destructive (backward) linear selection
- repeat for each variable x_i: build a linear model with all variables but x_i; compare the models and remove the variable x_a corresponding to the worst one
- repeat for each variable x_j except x_a: build a linear model with all variables but x_a and x_j; compare the models and remove the variable x_b corresponding to the worst one
- etc.
- building a (linear) model is only possible if the number of variables does not exceed the number of observations! the number of variables must first be reduced:
  - preliminary PCA
  - identify the most correlated pairs of variables and eliminate one variable of each pair
Selection 9. Forward-backward on the learning set with a Fisher test
- idea: test the significance of a variable (added or removed)
- how? ANOVA:
  source     | SS    | df  | MS                    | F
  regression | SSreg | p-1 | MSreg = SSreg/(p-1)   | MSreg/MSres
  residuals  | SSres | N-p | MSres = SSres/(N-p)   |
  total      | SStot | N-1 |                       |
- SSres = sum_{i=1..N} (y_i - t_i)^2, SSreg = sum_{i=1..N} (y_i - t_bar)^2, SStot = sum_{i=1..N} (t_i - t_bar)^2 = SSreg + SSres
- N: number of samples; p = D: number of parameters
- R^2 = 1 - SSres/SStot

Selection 9: selection of the first variable
- repeat for each variable x_i: build a linear model with variable x_i; compare the models and select the variable x_a corresponding to the maximum R^2
Selection 9: forward selection
- repeat for each variable x_j except x_a: build a linear model with variables x_a and x_j
- compute F_to-enter(x_j) = (SSreg(x_j, x_a) - SSreg(x_a)) / MSres(x_j, x_a)
- choose x_b = argmax_j F_to-enter(x_j)
- compare to the Fisher table (5% or 10%) and keep x_b if F_to-enter(x_b) is higher

Selection 9: backward selection
- repeat for each variable x_j except x_b: build a linear model with x_b and without x_j
- compute F_to-remove(x_j) = (SSreg(x_j, x_b) - SSreg(x_b)) / MSres(x_j, x_b)
- choose x_c = argmin_j F_to-remove(x_j)
- compare to the Fisher table (5% or 10%) and remove x_c if F_to-remove(x_c) is lower
Selection 9: running example
- (figure: predicted versus actual concentration with the selected variables; NMSE given on both the learning and the validation set)

Selection 10. Forward-backward linear selection
- forward selection: when x_b is selected, there is no guarantee that x_a remains optimal!
- ideal algorithm: (re)test each variable each time a new one is selected; equivalent to exhaustive search!
- compromise: forward selection, then start a backward selection from its result
- on the validation set!
Selection 10: running example
- (figure: predicted versus actual concentration with the selected variables and the validation NMSE; no backward step was triggered!)
- but if the (forward) experience is repeated with different splits into learning (N_L), validation (N_V) and test (N_T) sets, the selected variables vary
Selection 10: then forward-backward on N_A + N_V
- (figure: predicted versus actual concentration with the selected variables and the test NMSE)

Selection 11. Forward-backward non-linear selection
- same as forward linear selection, except for the model
- cross-validation and the choice of meta-parameters may become computer-intensive!
- with a low number of cross-validations: non-smooth error curve (with respect to the number of variables), so stopping is difficult!
48 Selection. Forward-backward non-linear selection predicted concentration actual concentration NMSE V = variables RBFN with 8 functions Michel Verleysen Non-linear models and model selection for spectral data - 95 Acknowledgements p Part of this work has been realized in collaboration with PhD students: p Nabil Benoudjit p John Lee p Amaury Lendasse Michel Verleysen Non-linear models and model selection for spectral data
References
- Curse of dimensionality:
  - D. L. Donoho, "High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality". Lecture on August 8, 2000, to the American Mathematical Society, "Math Challenges of the 21st Century".
  - M. Verleysen, "Learning high-dimensional data". Accepted for publication in Limitations and Future Trends in Neural Computation, S. Ablameyko, L. Goras, M. Gori, V. Piuri eds., IOS Press.
  - M. Verleysen, "Machine learning of high-dimensional data: local artificial neural networks and the curse of dimensionality". Agregation in higher education thesis, UCL, February 2002.
- Cross-validation and bootstrap:
  - the comp.ai.neural-nets FAQ.
  - B. Efron, R. J. Tibshirani, An Introduction to the Bootstrap. Chapman & Hall, first edition, 1993.
  - A. C. Davison, D. V. Hinkley, Bootstrap Methods and their Applications. Cambridge University Press, 3rd edition, 1999.
- PCA:
  - any good textbook on linear statistics.
- PLS:
  - M. Tenenhaus, La régression PLS: théorie et pratique. Editions Technip, 1998.
- Non-linear projections:
  - J. W. Sammon, "A nonlinear mapping algorithm for data structure analysis". IEEE Transactions on Computers, C-18(5):401-409, 1969.
  - P. Demartines, J. Hérault, "Curvilinear Component Analysis: a self-organizing neural network for nonlinear mapping of data sets". IEEE Transactions on Neural Networks, 8(1):148-154, January 1997.
  - J. A. Lee, A. Lendasse, M. Verleysen, "Curvilinear Distance Analysis versus Isomap". ESANN 2002, European Symposium on Artificial Neural Networks, Bruges (Belgium), April 2002.
- Forward and/or backward selection:
  - Eklov T., Martensson P., Lundstrom I., "Selection of variables for interpreting multivariate gas sensor data". Analytica Chimica Acta 381 (1999).
  - Bertrand D., Dufour E., La spectroscopie infrarouge et ses applications analytiques. Editions Tec & Doc, collection sciences et techniques agroalimentaires, 2000.
  - Massart D. L., Vandeginste B. G. M., Buydens L. M. C., De Jong S., Lewi P. J., Smeyers-Verbeke J., Handbook of Chemometrics and Qualimetrics: Part A. Elsevier Science, Amsterdam, 1997.
  - A. D. Walmsley, "Improved variable selection procedure for multivariate linear regression". Analytica Chimica Acta 354 (1997).
  - N. Benoudjit, E. Cools, M. Meurens, M. Verleysen, "Calibrage chimiométrique des spectrophotomètres: sélection et validation des variables par modèles non-linéaires". Accepted for Chimiométrie 2002, 4-5 December 2002, Paris (France).
More informationSupervised Variable Clustering for Classification of NIR Spectra
Supervised Variable Clustering for Classification of NIR Spectra Catherine Krier *, Damien François 2, Fabrice Rossi 3, Michel Verleysen, Université catholique de Louvain, Machine Learning Group, place
More informationTime Series Prediction as a Problem of Missing Values: Application to ESTSP2007 and NN3 Competition Benchmarks
Series Prediction as a Problem of Missing Values: Application to ESTSP7 and NN3 Competition Benchmarks Antti Sorjamaa and Amaury Lendasse Abstract In this paper, time series prediction is considered as
More informationData Preprocessing. Javier Béjar. URL - Spring 2018 CS - MAI 1/78 BY: $\
Data Preprocessing Javier Béjar BY: $\ URL - Spring 2018 C CS - MAI 1/78 Introduction Data representation Unstructured datasets: Examples described by a flat set of attributes: attribute-value matrix Structured
More informationLocal multidimensional scaling with controlled tradeoff between trustworthiness and continuity
Local multidimensional scaling with controlled tradeoff between trustworthiness and continuity Jaro Venna and Samuel Kasi, Neural Networs Research Centre Helsini University of Technology Espoo, Finland
More informationDimension Reduction CS534
Dimension Reduction CS534 Why dimension reduction? High dimensionality large number of features E.g., documents represented by thousands of words, millions of bigrams Images represented by thousands of
More informationCSE 6242 A / CX 4242 DVA. March 6, Dimension Reduction. Guest Lecturer: Jaegul Choo
CSE 6242 A / CX 4242 DVA March 6, 2014 Dimension Reduction Guest Lecturer: Jaegul Choo Data is Too Big To Analyze! Limited memory size! Data may not be fitted to the memory of your machine! Slow computation!
More informationCSE 6242 / CX October 9, Dimension Reduction. Guest Lecturer: Jaegul Choo
CSE 6242 / CX 4242 October 9, 2014 Dimension Reduction Guest Lecturer: Jaegul Choo Volume Variety Big Data Era 2 Velocity Veracity 3 Big Data are High-Dimensional Examples of High-Dimensional Data Image
More informationNon-linear dimension reduction
Sta306b May 23, 2011 Dimension Reduction: 1 Non-linear dimension reduction ISOMAP: Tenenbaum, de Silva & Langford (2000) Local linear embedding: Roweis & Saul (2000) Local MDS: Chen (2006) all three methods
More informationInstance-based Learning
Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 19 th, 2007 2005-2007 Carlos Guestrin 1 Why not just use Linear Regression? 2005-2007 Carlos Guestrin
More informationWhat is machine learning?
Machine learning, pattern recognition and statistical data modelling Lecture 12. The last lecture Coryn Bailer-Jones 1 What is machine learning? Data description and interpretation finding simpler relationship
More informationHyperspectral Chemical Imaging: principles and Chemometrics.
Hyperspectral Chemical Imaging: principles and Chemometrics aoife.gowen@ucd.ie University College Dublin University College Dublin 1,596 PhD students 6,17 international students 8,54 graduate students
More informationData Preprocessing. Javier Béjar AMLT /2017 CS - MAI. (CS - MAI) Data Preprocessing AMLT / / 71 BY: $\
Data Preprocessing S - MAI AMLT - 2016/2017 (S - MAI) Data Preprocessing AMLT - 2016/2017 1 / 71 Outline 1 Introduction Data Representation 2 Data Preprocessing Outliers Missing Values Normalization Discretization
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,
More informationPerceptron as a graph
Neural Networks Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 10 th, 2007 2005-2007 Carlos Guestrin 1 Perceptron as a graph 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0-6 -4-2
More informationSOM+EOF for Finding Missing Values
SOM+EOF for Finding Missing Values Antti Sorjamaa 1, Paul Merlin 2, Bertrand Maillet 2 and Amaury Lendasse 1 1- Helsinki University of Technology - CIS P.O. Box 5400, 02015 HUT - Finland 2- Variances and
More informationUnsupervised Learning
Networks for Pattern Recognition, 2014 Networks for Single Linkage K-Means Soft DBSCAN PCA Networks for Kohonen Maps Linear Vector Quantization Networks for Problems/Approaches in Machine Learning Supervised
More informationUnsupervised Learning
Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover
More informationSGN (4 cr) Chapter 10
SGN-41006 (4 cr) Chapter 10 Feature Selection and Extraction Jussi Tohka & Jari Niemi Department of Signal Processing Tampere University of Technology February 18, 2014 J. Tohka & J. Niemi (TUT-SGN) SGN-41006
More informationToday. Gradient descent for minimization of functions of real variables. Multi-dimensional scaling. Self-organizing maps
Today Gradient descent for minimization of functions of real variables. Multi-dimensional scaling Self-organizing maps Gradient Descent Derivatives Consider function f(x) : R R. The derivative w.r.t. x
More informationCPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016
CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2016 A2/Midterm: Admin Grades/solutions will be posted after class. Assignment 4: Posted, due November 14. Extra office hours:
More informationA Course in Machine Learning
A Course in Machine Learning Hal Daumé III 13 UNSUPERVISED LEARNING If you have access to labeled training data, you know what to do. This is the supervised setting, in which you have a teacher telling
More informationCIE L*a*b* color model
CIE L*a*b* color model To further strengthen the correlation between the color model and human perception, we apply the following non-linear transformation: with where (X n,y n,z n ) are the tristimulus
More informationCSE 5526: Introduction to Neural Networks Radial Basis Function (RBF) Networks
CSE 5526: Introduction to Neural Networks Radial Basis Function (RBF) Networks Part IV 1 Function approximation MLP is both a pattern classifier and a function approximator As a function approximator,
More informationFeature selection. Term 2011/2012 LSI - FIB. Javier Béjar cbea (LSI - FIB) Feature selection Term 2011/ / 22
Feature selection Javier Béjar cbea LSI - FIB Term 2011/2012 Javier Béjar cbea (LSI - FIB) Feature selection Term 2011/2012 1 / 22 Outline 1 Dimensionality reduction 2 Projections 3 Attribute selection
More informationRobust Pose Estimation using the SwissRanger SR-3000 Camera
Robust Pose Estimation using the SwissRanger SR- Camera Sigurjón Árni Guðmundsson, Rasmus Larsen and Bjarne K. Ersbøll Technical University of Denmark, Informatics and Mathematical Modelling. Building,
More information4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used.
1 4.12 Generalization In back-propagation learning, as many training examples as possible are typically used. It is hoped that the network so designed generalizes well. A network generalizes well when
More informationNeural Networks for unsupervised learning From Principal Components Analysis to Autoencoders to semantic hashing
Neural Networks for unsupervised learning From Principal Components Analysis to Autoencoders to semantic hashing feature 3 PC 3 Beate Sick Many slides are taken form Hinton s great lecture on NN: https://www.coursera.org/course/neuralnets
More informationFMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu
FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)
More informationLecture 27, April 24, Reading: See class website. Nonparametric regression and kernel smoothing. Structured sparse additive models (GroupSpAM)
School of Computer Science Probabilistic Graphical Models Structured Sparse Additive Models Junming Yin and Eric Xing Lecture 7, April 4, 013 Reading: See class website 1 Outline Nonparametric regression
More information3 Nonlinear Regression
CSC 4 / CSC D / CSC C 3 Sometimes linear models are not sufficient to capture the real-world phenomena, and thus nonlinear models are necessary. In regression, all such models will have the same basic
More informationNeural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer
More informationAlternative Statistical Methods for Bone Atlas Modelling
Alternative Statistical Methods for Bone Atlas Modelling Sharmishtaa Seshamani, Gouthami Chintalapani, Russell Taylor Department of Computer Science, Johns Hopkins University, Baltimore, MD Traditional
More informationGeometric Registration for Deformable Shapes 3.3 Advanced Global Matching
Geometric Registration for Deformable Shapes 3.3 Advanced Global Matching Correlated Correspondences [ASP*04] A Complete Registration System [HAW*08] In this session Advanced Global Matching Some practical
More informationCOMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS
COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS Toomas Kirt Supervisor: Leo Võhandu Tallinn Technical University Toomas.Kirt@mail.ee Abstract: Key words: For the visualisation
More informationCluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1
Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods
More informationMTTTS17 Dimensionality Reduction and Visualization. Spring 2018 Jaakko Peltonen. Lecture 11: Neighbor Embedding Methods continued
MTTTS17 Dimensionality Reduction and Visualization Spring 2018 Jaakko Peltonen Lecture 11: Neighbor Embedding Methods continued This Lecture Neighbor embedding by generative modeling Some supervised neighbor
More informationOn the Kernel Widths in Radial-Basis Function Networks
Neural ProcessingLetters 18: 139 154, 2003. 139 # 2003 Kluwer Academic Publishers. Printed in the Netherlands. On the Kernel Widths in Radial-Basis Function Networks NABIL BENOUDJIT and MICHEL VERLEYSEN
More informationESANN'2006 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), April 2006, d-side publi., ISBN
ESANN'26 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium), 26-28 April 26, d-side publi., ISBN 2-9337-6-4. Visualizing the trustworthiness of a projection Michaël Aupetit
More informationRecent advances in Metamodel of Optimal Prognosis. Lectures. Thomas Most & Johannes Will
Lectures Recent advances in Metamodel of Optimal Prognosis Thomas Most & Johannes Will presented at the Weimar Optimization and Stochastic Days 2010 Source: www.dynardo.de/en/library Recent advances in
More informationChapter 7: Competitive learning, clustering, and self-organizing maps
Chapter 7: Competitive learning, clustering, and self-organizing maps António R. C. Paiva EEL 6814 Spring 2008 Outline Competitive learning Clustering Self-Organizing Maps What is competition in neural
More informationSELECTION OF A MULTIVARIATE CALIBRATION METHOD
SELECTION OF A MULTIVARIATE CALIBRATION METHOD 0. Aim of this document Different types of multivariate calibration methods are available. The aim of this document is to help the user select the proper
More informationSensitivity to parameter and data variations in dimensionality reduction techniques
Sensitivity to parameter and data variations in dimensionality reduction techniques Francisco J. García-Fernández 1,2,MichelVerleysen 2, John A. Lee 3 and Ignacio Díaz 1 1- Univ. of Oviedo - Department
More informationBasis Functions. Volker Tresp Summer 2017
Basis Functions Volker Tresp Summer 2017 1 Nonlinear Mappings and Nonlinear Classifiers Regression: Linearity is often a good assumption when many inputs influence the output Some natural laws are (approximately)
More informationLS-SVM Functional Network for Time Series Prediction
LS-SVM Functional Network for Time Series Prediction Tuomas Kärnä 1, Fabrice Rossi 2 and Amaury Lendasse 1 Helsinki University of Technology - Neural Networks Research Center P.O. Box 5400, FI-02015 -
More informationBasics of Multivariate Modelling and Data Analysis
Basics of Multivariate Modelling and Data Analysis Kurt-Erik Häggblom 9. Linear regression with latent variables 9.1 Principal component regression (PCR) 9.2 Partial least-squares regression (PLS) [ mostly
More informationRecent Developments in Model-based Derivative-free Optimization
Recent Developments in Model-based Derivative-free Optimization Seppo Pulkkinen April 23, 2010 Introduction Problem definition The problem we are considering is a nonlinear optimization problem with constraints:
More information3 Nonlinear Regression
3 Linear models are often insufficient to capture the real-world phenomena. That is, the relation between the inputs and the outputs we want to be able to predict are not linear. As a consequence, nonlinear
More information3 Feature Selection & Feature Extraction
3 Feature Selection & Feature Extraction Overview: 3.1 Introduction 3.2 Feature Extraction 3.3 Feature Selection 3.3.1 Max-Dependency, Max-Relevance, Min-Redundancy 3.3.2 Relevance Filter 3.3.3 Redundancy
More informationLocally Linear Landmarks for large-scale manifold learning
Locally Linear Landmarks for large-scale manifold learning Max Vladymyrov and Miguel Á. Carreira-Perpiñán Electrical Engineering and Computer Science University of California, Merced http://eecs.ucmerced.edu
More informationMathematical morphology for grey-scale and hyperspectral images
Mathematical morphology for grey-scale and hyperspectral images Dilation for grey-scale images Dilation: replace every pixel by the maximum value computed over the neighborhood defined by the structuring
More informationThe Curse of Dimensionality
The Curse of Dimensionality ACAS 2002 p1/66 Curse of Dimensionality The basic idea of the curse of dimensionality is that high dimensional data is difficult to work with for several reasons: Adding more
More informationClustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin
Clustering K-means Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin 2005-2014 1 Clustering images Set of Images [Goldberger et al.] Carlos Guestrin 2005-2014
More informationLast time... Coryn Bailer-Jones. check and if appropriate remove outliers, errors etc. linear regression
Machine learning, pattern recognition and statistical data modelling Lecture 3. Linear Methods (part 1) Coryn Bailer-Jones Last time... curse of dimensionality local methods quickly become nonlocal as
More informationInstance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015
Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2015 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows K-Nearest
More informationClustering and Visualisation of Data
Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some
More informationA Geometric Analysis of Subspace Clustering with Outliers
A Geometric Analysis of Subspace Clustering with Outliers Mahdi Soltanolkotabi and Emmanuel Candés Stanford University Fundamental Tool in Data Mining : PCA Fundamental Tool in Data Mining : PCA Subspace
More informationPreface to the Second Edition. Preface to the First Edition. 1 Introduction 1
Preface to the Second Edition Preface to the First Edition vii xi 1 Introduction 1 2 Overview of Supervised Learning 9 2.1 Introduction... 9 2.2 Variable Types and Terminology... 9 2.3 Two Simple Approaches
More informationDD-HDS: a method for visualization and exploration of high-dimensional data
TNN05-P800 1 DD-HDS: a method for visualization and exploration of high-dimensional data Sylvain Lespinats, Michel Verleysen, Senior Member, IEEE, Alain Giron, Bernard Fertil Abstract Mapping high-dimensional
More informationChemometrics. Description of Pirouette Algorithms. Technical Note. Abstract
19-1214 Chemometrics Technical Note Description of Pirouette Algorithms Abstract This discussion introduces the three analysis realms available in Pirouette and briefly describes each of the algorithms
More informationNumerical Optimization: Introduction and gradient-based methods
Numerical Optimization: Introduction and gradient-based methods Master 2 Recherche LRI Apprentissage Statistique et Optimisation Anne Auger Inria Saclay-Ile-de-France November 2011 http://tao.lri.fr/tiki-index.php?page=courses
More informationChap.12 Kernel methods [Book, Chap.7]
Chap.12 Kernel methods [Book, Chap.7] Neural network methods became popular in the mid to late 1980s, but by the mid to late 1990s, kernel methods have also become popular in machine learning. The first
More information9 length of contour = no. of horizontal and vertical components + ( 2 no. of diagonal components) diameter of boundary B
8. Boundary Descriptor 8.. Some Simple Descriptors length of contour : simplest descriptor - chain-coded curve 9 length of contour no. of horiontal and vertical components ( no. of diagonal components
More informationMachine Learning for Signal Processing Clustering. Bhiksha Raj Class Oct 2016
Machine Learning for Signal Processing Clustering Bhiksha Raj Class 11. 13 Oct 2016 1 Statistical Modelling and Latent Structure Much of statistical modelling attempts to identify latent structure in the
More informationCOMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS18. Lecture 2: Linear Regression Gradient Descent Non-linear basis functions
COMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS18 Lecture 2: Linear Regression Gradient Descent Non-linear basis functions LINEAR REGRESSION MOTIVATION Why Linear Regression? Simplest
More informationOptimization. Industrial AI Lab.
Optimization Industrial AI Lab. Optimization An important tool in 1) Engineering problem solving and 2) Decision science People optimize Nature optimizes 2 Optimization People optimize (source: http://nautil.us/blog/to-save-drowning-people-ask-yourself-what-would-light-do)
More informationPattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition
Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant
More informationMachine Learning: Think Big and Parallel
Day 1 Inderjit S. Dhillon Dept of Computer Science UT Austin CS395T: Topics in Multicore Programming Oct 1, 2013 Outline Scikit-learn: Machine Learning in Python Supervised Learning day1 Regression: Least
More informationCSC 411: Lecture 14: Principal Components Analysis & Autoencoders
CSC 411: Lecture 14: Principal Components Analysis & Autoencoders Raquel Urtasun & Rich Zemel University of Toronto Nov 4, 2015 Urtasun & Zemel (UofT) CSC 411: 14-PCA & Autoencoders Nov 4, 2015 1 / 18
More informationModel validation T , , Heli Hiisilä
Model validation T-61.6040, 03.10.2006, Heli Hiisilä Testing Neural Models: How to Use Re-Sampling Techniques? A. Lendasse & Fast bootstrap methodology for model selection, A. Lendasse, G. Simon, V. Wertz,
More informationNonparametric regression using kernel and spline methods
Nonparametric regression using kernel and spline methods Jean D. Opsomer F. Jay Breidt March 3, 016 1 The statistical model When applying nonparametric regression methods, the researcher is interested
More informationDIMENSION REDUCTION FOR HYPERSPECTRAL DATA USING RANDOMIZED PCA AND LAPLACIAN EIGENMAPS
DIMENSION REDUCTION FOR HYPERSPECTRAL DATA USING RANDOMIZED PCA AND LAPLACIAN EIGENMAPS YIRAN LI APPLIED MATHEMATICS, STATISTICS AND SCIENTIFIC COMPUTING ADVISOR: DR. WOJTEK CZAJA, DR. JOHN BENEDETTO DEPARTMENT
More informationAkarsh Pokkunuru EECS Department Contractive Auto-Encoders: Explicit Invariance During Feature Extraction
Akarsh Pokkunuru EECS Department 03-16-2017 Contractive Auto-Encoders: Explicit Invariance During Feature Extraction 1 AGENDA Introduction to Auto-encoders Types of Auto-encoders Analysis of different
More informationCSC 411: Lecture 14: Principal Components Analysis & Autoencoders
CSC 411: Lecture 14: Principal Components Analysis & Autoencoders Richard Zemel, Raquel Urtasun and Sanja Fidler University of Toronto Zemel, Urtasun, Fidler (UofT) CSC 411: 14-PCA & Autoencoders 1 / 18
More informationDimension Reduction of Image Manifolds
Dimension Reduction of Image Manifolds Arian Maleki Department of Electrical Engineering Stanford University Stanford, CA, 9435, USA E-mail: arianm@stanford.edu I. INTRODUCTION Dimension reduction of datasets
More information1. Introduction. performance of numerical methods. complexity bounds. structural convex optimization. course goals and topics
1. Introduction EE 546, Univ of Washington, Spring 2016 performance of numerical methods complexity bounds structural convex optimization course goals and topics 1 1 Some course info Welcome to EE 546!
More informationArtificial Neural Networks Unsupervised learning: SOM
Artificial Neural Networks Unsupervised learning: SOM 01001110 01100101 01110101 01110010 01101111 01101110 01101111 01110110 01100001 00100000 01110011 01101011 01110101 01110000 01101001 01101110 01100001
More informationInstance-based Learning
Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 15 th, 2007 2005-2007 Carlos Guestrin 1 1-Nearest Neighbor Four things make a memory based learner:
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Basis Functions Tom Kelsey School of Computer Science University of St Andrews http://www.cs.st-andrews.ac.uk/~tom/ tom@cs.st-andrews.ac.uk Tom Kelsey ID5059-02-BF 2015-02-04
More informationLearning from Data: Adaptive Basis Functions
Learning from Data: Adaptive Basis Functions November 21, 2005 http://www.anc.ed.ac.uk/ amos/lfd/ Neural Networks Hidden to output layer - a linear parameter model But adapt the features of the model.
More informationCPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017
CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2017 Assignment 3: 2 late days to hand in tonight. Admin Assignment 4: Due Friday of next week. Last Time: MAP Estimation MAP
More informationLocally Weighted Learning
Locally Weighted Learning Peter Englert Department of Computer Science TU Darmstadt englert.peter@gmx.de Abstract Locally Weighted Learning is a class of function approximation techniques, where a prediction
More informationClustering: Classic Methods and Modern Views
Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering
More informationMachine Learning. Topic 4: Linear Regression Models
Machine Learning Topic 4: Linear Regression Models (contains ideas and a few images from wikipedia and books by Alpaydin, Duda/Hart/ Stork, and Bishop. Updated Fall 205) Regression Learning Task There
More informationCS 664 Slides #11 Image Segmentation. Prof. Dan Huttenlocher Fall 2003
CS 664 Slides #11 Image Segmentation Prof. Dan Huttenlocher Fall 2003 Image Segmentation Find regions of image that are coherent Dual of edge detection Regions vs. boundaries Related to clustering problems
More informationThe exam is closed book, closed notes except your one-page cheat sheet.
CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right
More informationData Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Unsupervised learning Daniel Hennes 29.01.2018 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Supervised learning Regression (linear
More informationA Stochastic Optimization Approach for Unsupervised Kernel Regression
A Stochastic Optimization Approach for Unsupervised Kernel Regression Oliver Kramer Institute of Structural Mechanics Bauhaus-University Weimar oliver.kramer@uni-weimar.de Fabian Gieseke Institute of Structural
More informationLecture Topic Projects
Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, basic tasks, data types 3 Introduction to D3, basic vis techniques for non-spatial data Project #1 out 4 Data
More informationClustering. CS294 Practical Machine Learning Junming Yin 10/09/06
Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,
More informationCPSC 340: Machine Learning and Data Mining. Multi-Dimensional Scaling Fall 2017
CPSC 340: Machine Learning and Data Mining Multi-Dimensional Scaling Fall 2017 Assignment 4: Admin 1 late day for tonight, 2 late days for Wednesday. Assignment 5: Due Monday of next week. Final: Details
More informationExtending reservoir computing with random static projections: a hybrid between extreme learning and RC
Extending reservoir computing with random static projections: a hybrid between extreme learning and RC John Butcher 1, David Verstraeten 2, Benjamin Schrauwen 2,CharlesDay 1 and Peter Haycock 1 1- Institute
More information