NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

Similar documents
S1 Note. Basis functions.

Hermite Splines in Lie Groups as Products of Geodesics

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Exercises (Part 4) Introduction to R UCLA/CCPR. John Fox, February 2005

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

y and the total sum of

Smoothing Spline ANOVA for variable screening

Help for Time-Resolved Analysis TRI2 version 2.4 P Barber,

Support Vector Machines

X- Chart Using ANOM Approach

Wishing you all a Total Quality New Year!

Life Tables (Times) Summary. Sample StatFolio: lifetable times.sgp

Radial Basis Functions

Parameter estimation for incomplete bivariate longitudinal data in clinical trials

Programming in Fortran 90 : 2017/2018

A Semi-parametric Regression Model to Estimate Variability of NO 2

Solutions to Programming Assignment Five Interpolation and Numerical Differentiation

A Statistical Model Selection Strategy Applied to Neural Networks

Wavefront Reconstructor

Feature Reduction and Selection

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

TN348: Openlab Module - Colocalization

Available online at ScienceDirect. Procedia Environmental Sciences 26 (2015 )

Why visualisation? IRDS: Visualization. Univariate data. Visualisations that we won t be interested in. Graphics provide little additional information

Lecture 5: Multilayer Perceptrons

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT

Anonymisation of Public Use Data Sets

Problem Set 3 Solutions

CMPS 10 Introduction to Computer Science Lecture Notes

Mixed Linear System Estimation and Identification

Analysis of Malaysian Wind Direction Data Using ORIANA

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Review of approximation techniques

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

Cubic Spline Interpolation for. Petroleum Engineering Data

A Binarization Algorithm specialized on Document Images and Photos

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Hybridization of Expectation-Maximization and K-Means Algorithms for Better Clustering Performance

Module Management Tool in Software Development Organizations

Improved Methods for Lithography Model Calibration

Adaptive Regression in SAS/IML

A Robust Method for Estimating the Fundamental Matrix

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements

Computational Statistics and Data Analysis. Robust smoothing of gridded data in one and higher dimensions with missing values

Edge Detection in Noisy Images Using the Support Vector Machines

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Statistical analysis on mean rainfall and mean temperature via functional data analysis technique

What are the camera parameters? Where are the light sources? What is the mapping from radiance to pixel color? Want to solve for 3D geometry

FITTING A CHI -square CURVE TO AN OBSERVI:D FREQUENCY DISTRIBUTION By w. T Federer BU-14-M Jan. 17, 1951

Three supervised learning methods on pen digits character recognition dataset

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010

Unsupervised Learning

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Analysis of Continuous Beams in General

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Classification / Regression Support Vector Machines

Related-Mode Attacks on CTR Encryption Mode

Outlier Detection based on Robust Parameter Estimates

SENSITIVITY ANALYSIS IN LINEAR PROGRAMMING USING A CALCULATOR

Biostatistics 615/815

CS 534: Computer Vision Model Fitting

Page 0 of 0 SPATIAL INTERPOLATION METHODS

Estimating Regression Coefficients using Weighted Bootstrap with Probability

Unsupervised Learning and Clustering

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

REFRACTIVE INDEX SELECTION FOR POWDER MIXTURES

Categories and Subject Descriptors B.7.2 [Integrated Circuits]: Design Aids Verification. General Terms Algorithms

7/12/2016. GROUP ANALYSIS Martin M. Monti UCLA Psychology AGGREGATING MULTIPLE SUBJECTS VARIANCE AT THE GROUP LEVEL

A Similarity-Based Prognostics Approach for Remaining Useful Life Estimation of Engineered Systems

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Mathematics 256 a course in differential equations for engineering students

Lecture 5: Probability Distributions. Random Variables

Array transposition in CUDA shared memory

5 The Primal-Dual Method

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007

Fuzzy Filtering Algorithms for Image Processing: Performance Evaluation of Various Approaches

A Robust LS-SVM Regression

Laplacian Eigenmap for Image Retrieval

A Bootstrap Approach to Robust Regression

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline

General Vector Machine. Hong Zhao Department of Physics, Xiamen University

Outline. Midterm Review. Declaring Variables. Main Variable Data Types. Symbolic Constants. Arithmetic Operators. Midterm Review March 24, 2014

ECONOMICS 452* -- Stata 12 Tutorial 6. Stata 12 Tutorial 6. TOPIC: Representing Multi-Category Categorical Variables with Dummy Variable Regressors

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

GSLM Operations Research II Fall 13/14

A Simple and Efficient Goal Programming Model for Computing of Fuzzy Linear Regression Parameters with Considering Outliers

LECTURE : MANIFOLD LEARNING

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Circuit Analysis I (ENGR 2405) Chapter 3 Method of Analysis Nodal(KCL) and Mesh(KVL)

Backpropagation: In Search of Performance Parameters

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

Video Object Tracking Based On Extended Active Shape Models With Color Information

NAG C Library Function Document nag_robust_m_regsn_param_var (g02hfc)

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016)

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Transcription:

Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson Models... 2 2.3 Densty Estmaton... 3 2.4 Smoothers for Tme Seres... 4 3 Recommendatons on Choce and Use of Avalable Routnes... 4 4 Routnes Wthdrawn or Scheduled for Wthdrawal... 5 5 References... 5 [NP3546/20A] G10.1

Introducton G10 NAG Fortran Lbrary Manual 1 Scope of the Chapter Ths chapter s concerned wth methods for smoothng data. Included are methods for densty estmaton, smoothng tme seres data, and statstcal applcatons of splnes. These methods may also be vewed as nonparametrc modellng. 2 Background to the Problems 2.1 Smoothng Methods Many of the methods used n statstcs nvolve fttng a model, the form of whch s determned by a small number of parameters, for example, a dstrbuton model lke the gamma dstrbuton, a lnear regresson model or an autoregresson model n tme seres. In these cases the fttng nvolves the estmaton of the small number of parameters from the data. In modellng data wth these models there are two mportant stages n addton to the estmaton of the parameters; these are the dentfcaton of a sutable model, for example, the selecton of a gamma dstrbuton rather than a Webull dstrbuton, and the checkng to see f the ftted model adequately fts the data. Whle these parametrc models can be farly flexble, they wll not adequately ft all data sets, especally f the number of parameters s to be kept small. Alternatve models based on smoothng can be used. These models wll not be wrtten explctly n terms of parameters. They are suffcently flexble for a much wder range of stuatons than parametrc models. The man requrement for such a model to be sutable s that the underlyng models would be expected to be smooth, so excludng those stuatons where, for example, a step functon would be expected. These smoothng methods can be used n a varety of ways, for example: 1. producng smoothed plots to ad understandng; 2. dentfyng of a sutable parametrc model from the shape of the smoothed data; 3. elmnatng complex effects that are not of drect nterest so that attenton can be focused on the effects of nterest. Several smoothng technques make use of a smoothng parameter whch can be ether chosen by the user or estmated from the data. The smoothng parameter balances the two crteron of smoothness of the ftted model and the closeness of the ft of the model to the data. Generally, the larger the smoothng parameter s, the smoother the ftted model wll be, but for small values of the smoothng parameter the model wll closely follow the data, and for large values the ft wll be poorer. The smoothng parameter can be ether chosen usng prevous experence of a sutable value for such data, or estmated from the data. The estmaton can be ether formal, usng a crteron such as the crossvaldaton, or nformal by tryng dfferent values and examnng the result by means of sutable graphs. Smoothng methods can be used n three mportant areas of of statstcs: regresson modellng, dstrbuton modellng and tme seres modellng. 2.2 Smoothng Splnes and Regresson Models For a set of n observatons (y ;x ), ¼ 1; 2;...;n, the splne provdes a flexble smooth functon for stuatons n whch a smple polynomal or nonlnear regresson model s not sutable. Cubc smoothng splnes arse as the functon, f, wth contnuous frst dervatve whch mnmzes X n w ðy fðx ÞÞ 2 þ Z 1 1 ðf 00 ðxþþ 2 dx; where w s the (optonal) weght for the th observaton and s the smoothng parameter. Ths crteron conssts of two parts: the frst measures the ft of the curve and the second the smoothness of the curve. The value of the smoothng parameter,, weghts these two aspects: larger values of gve a smoother ftted curve but, n general, a poorer ft. Splnes are lnear smoothers snce the ftted values, ^y ¼ð^y 1 ; ^y 2 ;...; ^y n Þ T, can be wrtten as a lnear functon of the observed values y ¼ðy 1 ;y 2 ;...;y n Þ T, that s, ^y ¼ Hy G10.2 [NP3546/20A]

Introducton G10 for a matrx H. The degrees of freedom for the splne s traceðhþ gvng resdual degrees of freedom traceði HÞ ¼ Xn ð1 h Þ: The dagonal elements of H, h, are the leverages. The parameter can be estmated n a number of ways. 1. The degrees of freedom for the splne can be specfed,.e., fnd such that traceðhþ ¼ 0 for gven 0. 2. Mnmze the cross-valdaton (CV),.e., fnd such that the CV s mnmzed, where CV ¼ 1 X n r 2 : n 1 h 3. Mnmze generalsed cross-valdaton (GCV),.e., fnd such that the GCV s mnmzed, where GCV ¼ n P n r2 P n ð 1 h 2!: Þ 2.3 Densty Estmaton The object of densty estmaton s to produce from a set of observatons a smooth nonparametrc estmate of the unknown densty functon from whch the observatons were drawn. That s, gven a sample of n observatons, x 1, x 2 ;...;x n, from a dstrbuton wth unknown densty functon, fðxþ, fnd an estmate of the densty functon, ^fðxþ. The smplest form of densty estmator s the hstogram; ths may be defned by ^fðxþ ¼ 1 nh n j; a þðj 1Þh <x<aþjh; j ¼ 1; 2;...;n s ; where n j s the number of observatons fallng n the nterval a þðj 1Þh to a þ jh, a s the lower bound of the hstogram and b ¼ n s h s the upper bound. The value h s known as the wndow wdth. A smple development of ths estmator would be the runnng hstogram estmator ^fðxþ ¼ 1 2nh n x; a x b; where n x s the number of observatons fallng n the nterval ½x h : x þ hš. Ths estmator can be wrtten as ^fðxþ ¼ 1 X n w x x nh h for a functon w where wðxþ ¼ 1 2 f 1 <x<1 ¼ 0 otherwse: The functon w can be consdered as a kernel functon. To produce a smoother densty estmate, the kernel functon, KðtÞ, whch satsfes the followng condtons can be used: Z 1 1 The kernel densty estmator s therefore defned as KðtÞ dt ¼ 1 and KðtÞ 0:0: ^fðxþ ¼ 1 X n nh K x x : h The choce of KðÞ s usually not mportant, but to ease computatonal burden use can be made of Gaussan kernel defned as KðtÞ ¼ p 1 ffffffffff e t2 =2 : 2 [NP3546/20A] G10.3

Introducton G10 NAG Fortran Lbrary Manual The smoothness of the estmator, ^fðxþ, depends on the wndow wdth, h. In general, the larger the value h s, the smoother the resultng densty estmate s. There s, however, the problem of oversmoothng when the value of h s too large and essental features of the dstrbuton functon are removed. For example, f the dstrbuton was bmodal, a large value of h may result n a unmodal estmate. The value of h has to be chosen such that the essental shape of the dstrbuton s retaned whle effects due only to the observed sample are smoothed out. The choce of h can be aded by lookng at plots of the densty estmate for dfferent values of h, or by usng cross-valdaton methods; see Slverman (1990). Slverman (1990) shows how the Gaussan kernel densty estmator can be computed usng a fast Fourer transform (FFT). 2.4 Smoothers for Tme Seres If the data conssts of a sequence of n observatons recorded at equally spaced ntervals, usually a tme seres, several robust smoothers are avalable. The ftted curve s ntended to be robust to any outlyng observatons n the sequence, hence the technques employed prmarly make use of medans rather than means. These deas come from the feld of exploratory data analyss (EDA); see Tukey (1977) and Velleman and Hoagln (1981). The smoothers are based on the use of runnng medans to summarze overlappng segments; these provde a smple but flexble curve. In EDA termnology, the ftted curve and the resduals are called the smooth and the rough respectvely, so that Data ¼ Smooth þ Rough: Usng the notaton of Tukey, one of the smoothers commonly used s 4253H,twce. Ths conssts of a runnng medan of 4, then 2, then 5, then 3. Ths s then followed by what s known as hannng. Hannng s a runnng weghted mean, the weghts beng 1/4, 1/2 and 1/4. The result of ths smoothng s then reroughed. Ths nvolves computng resduals from the computed ft, applyng the same smoother to the resduals and addng the result to the smooth of the frst pass. 3 Recommendatons on Choce and Use of Avalable Routnes Note: refer to the Users Note for your mplementaton to check that a routne s avalable. The followng routnes ft smoothng splnes: G10ABF computes a cubc smoothng splne for a gven value of the smoothng parameter. The results returned nclude the values of leverages and the coeffcents of the cubc splne. Optons allow only parts of the computaton to be performed when the routne s used to estmate the value of the smoothng parameter or as when t s part of an teratve procedure such as that used n fttng generalzed addtve models; see Haste and Tbshran (1990). G10ACF estmates the value of the smoothng parameter usng one of three crtera and fts the cubc smoothng splne usng that value. G10ABF and G10ACF requre the x to be strctly ncreasng. If two or more observatons have the same x -value then they should be replaced by a sngle observaton wth y equal to the (weghted) mean of the y-values and weght, w, equal to the sum of the weghts. Ths operaton can be performed by G10ZAF. The followng routne produces an estmate of the densty functon: G10BAF computes a densty estmate usng a Normal kernel. The followng routne produces a smoothed estmate for a tme seres: G10CAF computes a smoothed seres usng runnng medan smoothers. The followng servce routne s also avalable: G10ZAF orders and weghts the ðx; yþ nput data to produce a data set strctly monotonc n x. G10.4 [NP3546/20A]

Introducton G10 4 Routnes Wthdrawn or Scheduled for Wthdrawal None. 5 References Haste T J and Tbshran R J (1990) Generalzed Addtve Models Chapman and Hall Slverman B W (1990) Densty Estmaton Chapman and Hall Tukey J W (1977) Exploratory Data Analyss Addson-Wesley Velleman P F and Hoagln D C (1981) Applcatons, Bascs, and Computng of Exploratory Data Analyss Duxbury Press, Boston, MA [NP3546/20A] G10.5 (last)