Preprocessing: Smoothing and Derivatives

Similar documents
Data Preprocessing. D.N. Rutledge, AgroParisTech

SPA_GUI. Matlab graphical user interface allowing signal processing and. variable selection for multivariate calibration.

Improving the 3D Scan Precision of Laser Triangulation

Program II Section-by-Section Description of Program Logic

SUPPLEMENTARY INFORMATION

Automatic Lane Extraction in Hemoglobin and Serum Protein Electrophoresis Using Image Processing

Wavelet Shrinkage in Noise Removal of Hyperspectral Remote Sensing Data

VARIANCE REDUCTION TECHNIQUES IN MONTE CARLO SIMULATIONS K. Ming Leung

Help Page for the Java-Powered Simulation for Experimental Structural Dynamics

Smoothing Dissimilarities for Cluster Analysis: Binary Data and Functional Data

Application of Wavelet Packet Transform in Pattern Recognition of Near-IR Data

Interpolation by Spline Functions

Fourier Transformation Methods in the Field of Gamma Spectrometry

14.8 Savitzky-Golay Smoothing Filters

NMRProcFlow Macro-command Reference Guide

IgorCD Procedures. Data -> Load CD Data

Agilent ChemStation for UV-Visible Spectroscopy

Machine Learning / Jan 27, 2010

Experiment 5: Exploring Resolution, Signal, and Noise using an FTIR CH3400: Instrumental Analysis, Plymouth State University, Fall 2013

Agilent ChemStation for UV-visible Spectroscopy

Introduction to Raman spectroscopy measurement data processing using Igor Pro

Pre-processing in vibrational spectroscopy, a when, why and how

Simulations Relating to the Determination of Protein Secondary Structure Fractions from Circular Dichroism Spectra

Multivariate Calibration Quick Guide

Instrument Specific Signal

3D Motion from Image Derivatives Using the Least Trimmed Square Regression

1D Regression. i.i.d. with mean 0. Univariate Linear Regression: fit by least squares. Minimize: to get. The set of all possible functions is...

V. Zetterberg Amlab Elektronik AB Nettovaegen 11, Jaerfaella, Sweden Phone:

Clustering Functional Data with the SOM algorithm

Chapter 7. Widely Tunable Monolithic Laser Diodes

PROGRAM DOCUMENTATION FOR SAS PLS PROGRAM I. This document describes the source code for the first program described in the article titled:

Spline fitting tool for scientometric applications: estimation of citation peaks and publication time-lags

Online Calibration of Path Loss Exponent in Wireless Sensor Networks

Interpolation and Basis Fns

Robust Poisson Surface Reconstruction

A Practical Review of Uniform B-Splines

Estimation of common groundplane based on co-motion statistics

ESTIMATION OF THE BLOOD VELOCITY SPECTRUM USING A RECURSIVE LATTICE FILTER

Improved Centroid Peak Detection and Mass Accuracy using a Novel, Fast Data Reconstruction Method

WAVELET USE FOR IMAGE RESTORATION

Four equations are necessary to evaluate these coefficients. Eqn

Image features. Image Features

Lecture VIII. Global Approximation Methods: I

Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation

Numerical Robustness. The implementation of adaptive filtering algorithms on a digital computer, which inevitably operates using finite word-lengths,

arxiv: v1 [cs.lg] 1 Nov 2018

TIME-FREQUENCY SPECTRA OF MUSIC

Machine Learning in the Wild. Dealing with Messy Data. Rajmonda S. Caceres. SDS 293 Smith College October 30, 2017

Can automated smoothing significantly improve benchmark time series classification algorithms?

THE DEVELOPMENT OF CALIBRATION MODELS FOR SPECTROSCOPIC DATA USING PRINCIPAL COMPONENT REGRESSION

Lecture 7: Most Common Edge Detectors

Curve Fitting Toolbox

EECS 556 Image Processing W 09. Interpolation. Interpolation techniques B splines

Automated Baseline Estimation for Analytical Signals

LS-SVM Functional Network for Time Series Prediction

Curve Fitting Toolbox

SELECTION OF A MULTIVARIATE CALIBRATION METHOD

Design Optimization of a Compressor Loop Pipe Using Response Surface Method

Neural Networks Based Time-Delay Estimation using DCT Coefficients

Model Assessment and Selection. Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer

Savitsky-Golay Filters

k-means, k-means++ Barna Saha March 8, 2016

A Comparative Study of LOWESS and RBF Approximations for Visualization

Privacy Preserving Machine Learning: A Theoretically Sound App

QstatLab: software for statistical process control and robust engineering

Model Fitting. CS6240 Multimedia Analysis. Leow Wee Kheng. Department of Computer Science School of Computing National University of Singapore

Noise Canceller Using a New Modified Adaptive Step Size LMS Algorithm

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

Smooth Curve from noisy 2-Dimensional Dataset

Central Slice Theorem

A Joint Histogram - 2D Correlation Measure for Incomplete Image Similarity

Fitting Baselines to Spectra

Correction and Calibration 2. Preprocessing

Formula for the asymmetric diffraction peak profiles based on double Soller slit geometry

CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY

Agilent ChemStation for UV-visible Spectroscopy

Lecture 16: High-dimensional regression, non-linear regression

A Trimmed Translation-Invariant Denoising Estimator

Dealing with Data in Excel 2013/2016

LECTURE 15. Dr. Teresa D. Golden University of North Texas Department of Chemistry

The latest trend of hybrid instrumentation

Thin film solar cell simulations with FDTD

Locally Weighted Least Squares Regression for Image Denoising, Reconstruction and Up-sampling

IMPULSE RESPONSE IDENTIFICATION BY ENERGY SPECTRUM METHOD USING GEOINFORMATION DATA IN CASE OF REMOTE SENSING IMAGES

Removing a mixture of Gaussian and impulsive noise using the total variation functional and split Bregman iterative method

Multi Layer Perceptron trained by Quasi Newton learning rule

SUPPLEMENTARY INFORMATION

The art of well tying with new MATLAB tools

Image denoising in the wavelet domain using Improved Neigh-shrink

Regression on a Graph

Construction of Limited Diffraction Beams with Bessel Bases

Introduction to Multivariate Image Analysis (MIA) Table of Contents

Multicomponent land data pre-processing for FWI: a benchmark dataset

Alpy 600 Mixed Mode Calibration

COMPUTATIONAL INTELLIGENCE SEW (INTRODUCTION TO MACHINE LEARNING) SS18. Lecture 6: k-nn Cross-validation Regularization

Accelerometer Gesture Recognition

CHAPTER 6 COUNTER PROPAGATION NEURAL NETWORK FOR IMAGE RESTORATION

Recent advances in Metamodel of Optimal Prognosis. Lectures. Thomas Most & Johannes Will

IMAGE DE-NOISING IN WAVELET DOMAIN

Refined Measurement of Digital Image Texture Loss Peter D. Burns Burns Digital Imaging Fairport, NY USA

Transcription:

Preprocessing: Smoothing and Derivatives Jarmo Alander University of Vaasa November 1, 2010

Contents Preprocessing Baseline correction. Autoscaling. Savitzky-Golay Smoothing Filters Savitzky-Golay Filters for Derivatives Smoothing Splines Splines for Derivatives

Preprocessing Normalizing. It is done by dividing each feature realted to a sample vector by a constant. The constant value may be defined as the 1-norm or 2-norm of the vector. Weighting. For weighting any values are used. The purpose to give importance only to some samples. Smoothing. Reduce noise and keep useful variation. Smoothing filters. Baseline correction. Reduce unwilling slow variation. Estimation of baseline and subtruction from spectrum, Derivatives, Multiplicative Scatter Correction MSC.

Baseline correction We consider the measured spectrum s. The baseline is approximated using polynomial [3], for example up to linear term. s = s + α + βx where x is a wavelength number, α defines a shift (offset) in spectrum and β desribes a linear change in spectrum. We are interested in s and the rest terms describe unwilling variation. If the baseline model has only shift then after estimating the shift we may subtruct it from each spectrum.

Derivatives If we have only shift s = s + α we can use derivative with respect x to remove the shift as follows: s = ( s + α) = s. Computing the successive derivaties we may remove the baseline defined by the higher order model. Numerically the first derivative for s = (s 1, s 2,...,s k ) can be computed using a moving window where s i is replaced by s i s i 1. For the second derivate we use the moving window: (s i s i 1 ) (s i 1 s i 2 ) = s i 2s i 1 + s i 2 We can also use the derivative using the moving window: (s i +s i 1 ) 2 ( s i 1+s i 2 ) 2. This computing is robust to noise because we average spectrum values.

Autoscaling The autoscaling procedure uses both mean centering and variance scaling and helps for removing unwilling variation. Mean centering. For each feature we compute mean and subtruct it from the other feature values. Variance scaling is implemented by divison each feature value by standard deviation. Standard deviation is defined using values of this feature.

Savitzky-Golay Smoothing Filters The mesured spectra are usually slowly varying and corrupted by random noise. The beginning and the end of the spectra contain mostly noise. The purpose of smooting is to reduce the level of noise keeping spectrum details. Savitzky-Golay filters were designed to analyze the absorption peaks in noisy spectra. The absorption peaks suggest which chemical elements are presented in the tested materials. Filters have several names Savitzky-Golay, least squares and Digital Smoothing Polynomial (DISPO) filters. The filters operate in the time domain and relate to low-pass filters.

Savitzky-Golay Smoothing Filters Suppose we have a spectrum with equally spaced values s i = s 0 + i, where the spectral values are obtained at intervals anf i =..., 2, 1, 0, 1, 2,.... The finit impulse response filter replaces s i by a linear combination of its neighbor values f i = n right n=n left c n s i+n where n left and n right are the number of points to the left and right of a current point, respectively. c n are weight coefficients.

Savitzky-Golay Smoothing Filters For an averaging filter all coefficients are constant c n = 1/(n left + n right + 1). The coefficients are computed in a moving window. The Savitzky-Golay smoothing filter uses a polynomial of higher order. Least-squares fitting is used to fit the polynomial to the data inside the moving window. The use of the least square fit in this way may be computationally demanding. However it is possible to find the coefficients c n for which least square fitting inside a moving window is automatically implemented.

Savitzky-Golay Smoothing Filters Next the Savitzky-Golay smoothing filter coefficients are given for the polynomial order M [1]. M c nleft c n0 c nright 2-0.086 0.343 0.486 0.343-0.086 4 0.035-0.128 0.070 0.315 0.417 0.315 0.070-0.128 0.035 In practice, however, the larger values n left and n right are used, e.g. n left = n right = 16, 32.

Savitzky-Golay Smoothing Filters a) The generated synthetic signal a(x) which is used as an underlying function. b) The same signal with additive Gausian random noise. a b 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0 1 2 3 4 5 6 2 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0 0 1 2 3 4 5 6

Savitzky-Golay Smoothing Filters a) Moving average, n left = n right = 15. b) Savitzky-Golay filter, n left = n right = 15, polynomial order 2. The smoothed result is shown by green dots. a b 2 a vs. x a vs. x (smooth) 2 a vs. x a vs. x (smooth) 1.8 1.8 1.6 1.6 1.4 1.4 1.2 1.2 1 1 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0 0 1 2 3 4 5 0 0 1 2 3 4 5

Savitzky-Golay Smoothing Filters a) Savitzky-Golay filter, n left = n right = 15, polynomial order 2. a) Savitzky-Golay filter, n left = n right = 30, polynomial order 2. The smoothed result is shown by green dots. a b 2 a vs. x a vs. x (smooth) 2 a vs. x a vs. x (smooth) 1.8 1.8 1.6 1.6 1.4 1.4 1.2 1.2 1 1 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0 0 1 2 3 4 5 0 0 1 2 3 4 5

Savitzky-Golay Filters for Derivatives This technique may be used with noisy data. The polynomial and least square fitting are used at each point of spectra. The measured derivative is the derivative of the fitted polynomial. The Savitzky-Golay filters from libraries can be used in the derivative mode. Computation is made in the moving window.

Smoothing Splines Splines and wavelets are the other popular techniques used for smoothing. For example, the smoothing spline may fit spectral data. For the next illustration the smoothing splines were used from the functional data analysis library [2]. a) The original spectrum of the gastrointestinal tract of the dog. b) The smoothed spectrum. a b Value 300 250 200 150 100 50 0 400 500 600 700 800 900 1000 Wavelength, nm Value 300 250 200 150 100 50 400 500 600 700 800 900 1000 Wavelength, nm

Splines for derivatives The computation of derivatives may be ustable numerically. To achieve robust computation a B-spline is used. After obtaining the analytical spline expansion of spectrum the first and second derivatives can be exactly computed. a) The smoothed spectrum of the gastrointestinal tract of the dog. b) The first derivative. a b Value 300 250 200 150 100 50 Derivative value 20 15 10 5 0 5 10 15 400 500 600 700 800 900 1000 Wavelength, nm 20 400 500 600 700 800 900 1000 Wavelength, nm

References Press W. H., S.A. Teukolsky S.A., W. T. Vetterling W. T., Flannery B. P.. Numerical Recipes in C++, Cambridge University Press, 2002. Ramsay J. O. and Silverman B. W., Functional Data Analysis. Springer-Verlag, New York, 1997 Beebe K., Pell R. J., Seasholtz M. B., Chemometrics: A Practical guide. John Wiley & Sons, Inc., 1998. SOLO toolbox: http://wiki.eigenvector.com/index.php?title=main Page

Questions