Package feature. R topics documented: July 8, Version Date 2013/07/08

Similar documents
Package feature. R topics documented: October 26, Version Date

Package curvhdr. April 16, 2017

Package r2d2. February 20, 2015

Package SiZer. February 19, 2015

Package munfold. R topics documented: February 8, Type Package. Title Metric Unfolding. Version Date Author Martin Elff

Package FCGR. October 13, 2015

Package qvcalc. R topics documented: September 19, 2017

Package beanplot. R topics documented: February 19, Type Package

Package orthodr. R topics documented: March 26, Type Package

The RcmdrPlugin.HH Package

On Kernel Density Estimation with Univariate Application. SILOKO, Israel Uzuazor

Package RcmdrPlugin.HH

Package fso. February 19, 2015

FFT-Based Fast Computation of Multivariate Kernel Density Estimators with Unconstrained Bandwidth Matrices

Package Mondrian. R topics documented: March 4, Type Package

Package panelview. April 24, 2018

Package EnQuireR. R topics documented: February 19, Type Package Title A package dedicated to questionnaires Version 0.

Package kdevine. May 19, 2017

Package MPCI. October 25, 2015

The nor1mix Package. August 3, 2006

Package ALS. August 3, 2015

Package npphen. August 31, 2017

Package weco. May 4, 2018

The nor1mix Package. June 12, 2007

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

Package waterfall. R topics documented: February 15, Type Package. Version Date Title Waterfall Charts in R

Package nprotreg. October 14, 2018

Package GLDreg. February 28, 2017

Package replicatedpp2w

Package sure. September 19, 2017

Package Daim. February 15, 2013

Package esabcv. May 29, 2015

Package cncagui. June 21, 2015

Package sspline. R topics documented: February 20, 2015

1 Overview XploRe is an interactive computational environment for statistics. The aim of XploRe is to provide a full, high-level programming language

Package mcmcse. February 15, 2013

Package meshsimp. June 13, 2017

Package mlegp. April 15, 2018

Package Kernelheaping

Package gplm. August 29, 2016

Package influence.sem

Package cgh. R topics documented: February 19, 2015

Package SCVA. June 1, 2017

Package spikes. September 22, 2016

Package sbf. R topics documented: February 20, Type Package Title Smooth Backfitting Version Date Author A. Arcagni, L.

Package gwrr. February 20, 2015

Package mixphm. July 23, 2015

MATLAB Routines for Kernel Density Estimation and the Graphical Representation of Archaeological Data

Package mtsdi. January 23, 2018

Package pca3d. February 17, 2017

Package batchmeans. R topics documented: July 4, Version Date

Package pendvine. R topics documented: July 9, Type Package

Package SCBmeanfd. December 27, 2016

Package desire. R topics documented: February 19, Version Title Desirability functions in R

Package seg. February 15, 2013

Package DFIT. February 19, 2015

Package spd. R topics documented: August 29, Type Package Title Semi Parametric Distribution Version Date

Package OLScurve. August 29, 2016

Package gibbs.met. February 19, 2015

Multivariate Analysis

Stat 302 Statistical Software and Its Applications Density Estimation

Package DTRlearn. April 6, 2018

Package SPIn. R topics documented: February 19, Type Package

Package ExceedanceTools

Package fractaldim. R topics documented: February 19, Version Date Title Estimation of fractal dimensions

Package basictrendline

Package TPD. June 14, 2018

Package gains. September 12, 2017

Package hsiccca. February 20, 2015

Package mrbsizer. May 2, 2018

Package MfUSampler. June 13, 2017

Bandwidth Selection for Kernel Density Estimation Using Total Variation with Fourier Domain Constraints

A Fast Clustering Algorithm with Application to Cosmology. Woncheol Jang

Package ade4tkgui. R topics documented: November 9, Version Date Title 'ade4' Tcl/Tk Graphical User Interface

Package visualizationtools

Package Kernelheaping

Package mmeta. R topics documented: March 28, 2017

Package ClustGeo. R topics documented: July 14, Type Package

Package bdots. March 12, 2018

Package MixSim. April 29, 2017

Package dirmcmc. R topics documented: February 27, 2017

Package OptimaRegion

The heplots Package. February 1, 2007

Package CINID. February 19, 2015

Package ldbod. May 26, 2017

Package svmpath. R topics documented: August 30, Title The SVM Path Algorithm Date Version Author Trevor Hastie

Package gsscopu. R topics documented: July 2, Version Date

Package spark. July 21, 2017

Package blocksdesign

Package GFD. January 4, 2018

Package iosmooth. R topics documented: February 20, 2015

Package MFDFA. April 18, 2018

Package sciplot. February 15, 2013

Package RTNduals. R topics documented: March 7, Type Package

Regression III: Advanced Methods

Improving the Post-Smoothing of Test Norms with Kernel Smoothing

Package SoftClustering

Package PIPS. February 19, 2015

Package ggsubplot. February 15, 2013

Package Grid2Polygons

Transcription:

Package feature July 8, 2013 Version 1.2.9 Date 2013/07/08 Title Feature significance for multivariate kernel density estimation Author Tarn Duong <tarn.duong@gmail.com> & Matt Wand <Matt.Wand@uts.edu.au> Maintainer Tarn Duong <tarn.duong@gmail.com> Depends R (>= 1.4.0), KernSmooth, ks (>= 1.8.0), tcltk Suggests MASS Feature significance for multivariate kernel density estimation License GPL-2 GPL-3 URL http://www.mvstat.net/tduong NeedsCompilation no Repository CRAN Date/Publication 2013-07-08 18:53:46 R topics documented: feature-package....................................... 2 earthquake.......................................... 2 featuresignif........................................ 3 featuresignifgui...................................... 5 plot.fs............................................ 6 SiZer,siCon......................................... 8 Index 10 1

2 earthquake feature-package feature Details Package for feature significance for multivariate kernel density estimation. The feature package contains functions to display and compute kernel density estimates, significant gradient and significant curvature regions. Significant gradient and/or curvature regions often correspond to significant features (e.g. local modes). There are two main functions in this package. featuresignifgui is the interactive function where the user can select bandwidths from a pre-defined range. This mode is useful for initial exploratory data analysis. featuresignif is the non-interactive function. This is useful when the user has a more definite idea of suitable values for the bandwidths. For a more detailed example for 1-d and 2-d data, see vignette("feature"). Author(s) See Also Tarn Duong <tarn.duong@gmail.com> & Matt Wand <wand@uow.edu.au> ks, sm, KernSmooth earthquake Mt St Helens earthquake data Usage Format This data set is a reduced version of the full data set in Scott (1992). It contains the first three variables. data(earthquake) A matrix with 3 columns and 510 rows. Each row corresponds to the measurements of an earthquake beneath the Mt St Helens volcano. The first column is the longitude (in degrees, where a negative number indicates west of the International Date Line), the second column is the latitude (in degrees, where a positive number indicates north of the Equator) and the third column is the depth (in km, where a negative number indicates below the Earth s surface).

featuresignif 3 Source Scott, D.W. (1992). Multivariate Density Estimation: Theory, Practice and Visualization. John Wiley & Sons Inc., New York. featuresignif Feature significance for kernel density estimation Usage Identify significant features of kernel density estimates of 1- to 4-dimensional data. featuresignif(x, bw, gridsize, scaledata=false, addsignifgrad=true, addsignifcurv=true, signiflevel=0.05) Arguments x bw gridsize scaledata addsignifgrad addsignifcurv signiflevel data matrix vector of bandwidth(s) vector of estimation grid sizes flag for scaling the data i.e. transforming to unit variance for each dimension. flag for computing significant gradient regions flag for computing significant curvature regions significance level Details Feature significance is based on significance testing of the gradient (first derivative) and curvature (second derivative) of a kernel density estimate. This was developed for 1-d data by Chaudhuri & Marron (1995), for 2-d data by Godtliebsen, Marron & Chaudhuri (1999), and for 3-d and 4-d data by Duong, Cowling, Koch & Wand (2007). The test statistic for gradient testing is at a point x is W (x) = f(x; H) 2 where f(x; H) is kernel estimate of the gradient of f(x) with bandwidth H, and is the Euclidean norm. W (x) is approximately chi-squared distributed with d degrees of freedom where d is the dimension of the data. The analogous test statistic for curvature is W (2) (x) = vech (2) f(x; H) 2 where (2) f(x; H) is the kernel estimate of the curvature of f(x), and vech is the vector-half operator. W (2) (x) is approximately chi-squared distributed with d(d + 1)/2 degrees of freedom. Since this is a situation with many dependent hypothesis tests, we use a multiple comparison or simultaneous test to control the overall level of significance. We use a Hochberg-type procedure. See Hochberg (1988) and Duong, Cowling, Koch & Wand (2007).

4 featuresignif Value Returns an object of class fs which is a list with the following fields x data matrix names name labels used for plotting bw vector of bandwidths fhat kernel density estimate on a grid grad logical grid for significant gradient curv logical grid for significant curvature graddata logical vector for significant gradient data points graddatapoints significant gradient data points curvdata logical vector for significant curvature data points curvdatapoints significant curvature data points References Chaudhuri, P. & Marron, J.S. (1999) SiZer for exploration of structures in curves. Journal of the American Statistical Association, 94, 807-823. Duong, T., Cowling, A., Koch, I. & Wand, M.P. (2008) Feature significance for multivariate kernel density estimation. Computational Statistics and Data Analysis, 52, 4225-4242. Hochberg, Y. (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800-802. Godtliebsen, F., Marron, J.S. & Chaudhuri, P. (2002) Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics, 11, 1-22. See Also Wand, M.P. & Jones, M.C. (1995) Kernel Smoothing. Chapman & Hall/CRC, London. featuresignifgui, plot.fs Examples ## Univariate example data(earthquake) eq3 <- -log10(-earthquake[,3]) fs <- featuresignif(eq3, bw=0.1) plot(fs, addsignifgradregion=true) ## Bivariate example library(mass) data(geyser) fs <- featuresignif(geyser) plot(fs, addsignifcurvregion=true) ## Trivariate example data(earthquake)

featuresignifgui 5 earthquake[,3] <- -log10(-earthquake[,3]) fs <- featuresignif(earthquake, scaledata=true, bw=c(0.06, 0.06, 0.05)) plot(fs, addkde=true) plot(fs, addkde=false, addsignifcurvregion=true) featuresignifgui GUI for feature significance for kernel density estimation GUI for feature significance for kernel density estimation. Usage featuresignifgui(x, scaledata=false) Arguments x scaledata data matrix flag for scaling the data to the unit interval in each dimension Details In the first column are the sliders for selecting the bandwidths (one for each dimension). Move the slider buttons to change the value of the bandwidths. The text field is for the grid size which specifies the number of points in each dimension of the kernel estimation binning grid. Press the Compute significant features button to begin the computation. This creates a plot of the kernel density estimate (KDE) from the data with the specified bandwidths by calling featuresignif. Once this complete, a pop-up window will appear. In the second column are the axis limits and labels. The last text field is for the (maximum) number of data points used in the display. Press the Reset plot (except KDE) button to clear the plot of all added features except for the KDE itself. In the third column are 5 buttons which can be used to add to the KDE plot such as the data points, significant gradient points/regions and significant curvature points/regions. For 1-d data, the button in the third column is Compute SiZer map. Press this button to compute a gradient SiZer plot using the SiZer function. Once this complete, a pop-up window will appear. For 2- and 3-d data, the button in the third column is Reset plot. This will clear the plot of all features as well as the KDE. This is useful for showing only the significant features when the KDE may interfere with their display. For 3-d data, there is an extra fourth column of options: these are sliders for the transparency values for the features. Move the slider button along to the desired value (between 0 and 1) and then press the Add... button to the left. Repeatedly pressing the Add... button will cause the transparency of the features to decrease. In this case, press the one of the Reset plot buttons to clear the plot window, and replot the significant feature with the desired transparency.

6 plot.fs Examples ## Not run: library(mass) data(geyser) duration <- geyser$duration featuresignifgui(duration) featuresignifgui(geyser) ## univariate example ## bivariate example data(earthquake) ## trivariate example earthquake$depth <- -log10(-earthquake$depth) featuresignifgui(earthquake, scaledata=true) ## End(Not run) plot.fs Feature signficance plot for 1- to 3-dimensional data Usage Feature signficance plot for 1- to 3-dimensional data. ## S3 method for class fs plot(x,..., xlab, ylab, zlab, xlim, ylim, zlim, add=false, adddata=false, scaledata=false, adddatanum=1000, addkde=true, jitterrug=true, addsignifgradregion=false, addsignifgraddata=false, addsignifcurvregion=false, addsignifcurvdata=false, addaxes3d=true, denscol, datacol="black", gradcol="green4", curvcol="blue", axiscol="black", bgcol="white", dataalpha=0.1, graddataalpha=0.3, gradregionalpha=0.2, curvdataalpha=0.3, curvregionalpha=0.3) Arguments x an object of class fs (output from featuresignif function) xlim, ylim, zlim x-, y-, z-axis limits xlab, ylab, zlab x-, y-, z-axis labels scaledata add adddata adddatanum addkde flag for scaling the data i.e. transforming to unit variance for each dimension flag for adding to an existing plot flag for display of the data maximum number of data points plotted in displays flag for display of kernel density estimates

plot.fs 7 jitterrug flag for jittering of rug-plot for univariate data display addsignifgradregion,addsignifgraddata flag for display of significant gradient regions/data points addsignifcurvregion,addsignifcurvdata flag for display of significant curvature regions/data points addaxes3d denscol datacol gradcol curvcol axiscol bgcol dataalpha flag for displaying axes in 3-d displays colour of density estimate curve colour of data points colour of significant gradient regions/data points colour of significant curvature regions/data points colour of axes colour of background transparency of data points gradregionalpha,graddataalpha transparency of significant gradient regions/data points curvregionalpha,curvdataalpha transparency of significant curvature regions/data points... other graphics parameters Value Plot of 1-d and 2-d kernel density estimates are sent to graphics window. Plot for 3-d is sent to RGL window. See Also featuresignif Examples library(mass) data(geyser) fs <- featuresignif(geyser, bw=c(4.5, 0.37)) plot(fs, addkde=false, adddata=true) ## data only plot(fs, addkde=true) ## KDE plot only plot(fs, addsignifgradregion=true) plot(fs, addkde=false, addsignifcurvregion=true) plot(fs, addsignifcurvdata=true, curvcol="cyan")

8 SiZer,siCon SiZer,siCon SiZer and SiCon plots for 1-dimensional data SiZer (Significant Zero crossings) and (Significant Convexity) plots for 1-dimensional data. Usage SiZer(x, bw, gridsize, scaledata=false, signiflevel=0.05, plotsizer=true, logbw=true, xlim, xlab, addlegend=true, poslegend="bottomright") SiCon(x, bw, gridsize, scaledata=false, signiflevel=0.05, plotsicon=true, logbw=true, xlim, xlab, addlegend=true, poslegend="bottomright") Arguments x bw gridsize scaledata data vector vector of range of bandwidths number of x- and y-axis grid points flag for scaling the data i.e. transforming to unit variance for each dimension. signiflevel significance level plotsizer,plotsicon flag for displaying SiZer/SiCon map logbw xlim xlab Details addlegend poslegend flag for displaying log bandwidths on y-axis x-axis limits x-axis label flag for legend display legend position The gradient SiZer and curvature SiCon maps of Chaudhuri & Marron (1999) are implemented. The horizontal axis is the data axis, the vertical axis are the bandwidths. The colour scheme for the SiZer map is red: negative gradient, blue: positive gradient, purple: zero gradient and grey: sparse regions. For the SiCon map, orange: negative curvature (concave), blue: positive curvature (convex), green: zero curvature and grey: sparse regions. Value SiZer plot sent to graphics window.

SiZer,siCon 9 References Chaudhuri, P. & Marron, J.S. (1999) SiZer for exploration of structures in curves. Journal of the American Statistical Association, 94, 807-823. See Also featuresignif Examples data(earthquake) eq3 <- -log10(-earthquake[,3]) SiZer(eq3) SiCon(eq3)

Index Topic datasets earthquake, 2 Topic hplot plot.fs, 6 SiZer,siCon, 8 Topic package feature-package, 2 Topic smooth featuresignif, 3 featuresignifgui, 5 earthquake, 2 feature (feature-package), 2 feature-package, 2 featuresignif, 2, 3, 5 7, 9 featuresignifgui, 2, 4, 5 plot.fs, 4, 6 SiCon (SiZer,siCon), 8 SiZer, 5 SiZer (SiZer,siCon), 8 SiZer,siCon, 8 10