Package Numero. November 24, 2018


Type Package
Title Statistical Framework to Define Subgroups in Complex Datasets
Version 1.1.1
Date 2018-11-21
Author Song Gao [aut], Stefan Mutter [aut], Aaron E. Casey [aut], Ville-Petteri Makinen [aut, cre]
Maintainer Ville-Petteri Makinen <vpmakine@gmail.com>
Description High-dimensional datasets that do not exhibit a clear intrinsic clustered structure pose a challenge to conventional clustering algorithms. For this reason, we developed an unsupervised framework that helps scientists to better subgroup their datasets based on visual cues; please see Gao S, Mutter S, Casey A, Makinen V-P (2018) Numero: a statistical framework to define multivariable subgroups in complex population-based datasets, Int J Epidemiology, dyy113, <doi:10.1093/ije/dyy113>. The framework includes the necessary functions to construct a self-organizing map of the data, to evaluate the statistical significance of the observed patterns, and to visualize the results.
License GPL (>= 2)
Imports Rcpp (>= 0.11.4)
LinkingTo Rcpp
VignetteBuilder knitr
Suggests knitr, rmarkdown, BiocStyle
NeedsCompilation yes
Repository CRAN
SystemRequirements C++11
Encoding UTF-8
Date/Publication 2018-11-24 17:20:09 UTC

R topics documented:
    nroaggregate
    nrocolorize
    nrokmeans
    nrokohonen
    nrolabel
    nromatch
    nropermute
    nroplot
    nropreprocess
    nrosummary
    nrotrain

nroaggregate    Regional averages on a self-organizing map

Description
Estimate district averages based on assigned map locations for each data point.

Usage
nroaggregate(topology, districts, data = NULL)

Arguments
topology    A data frame with K rows and six columns that contains the district positions of a self-organizing map; please see nrokohonen for details.
districts   A vector of M best-matching districts for each row in the data matrix; please see nromatch for a typical scenario.
data        A vector of M elements or an M x N matrix of data values.

Value
A data frame of K rows and N columns that contains the average district values after smoothing. The data frame also has the attribute histogram that contains a K x 1 vector of estimated sample counts after smoothing.
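As a rough illustration of what a district average is, the sketch below computes plain (unsmoothed) per-district means in base R. It is not the package's implementation: the smoothing step that nroaggregate applies is omitted, missing values are not handled, and the helper name `aggregate_sketch` is hypothetical.

```r
# Hypothetical helper: unsmoothed district averages (nroaggregate also smooths).
# k: number of districts, districts: M district indices, data: M values or M x N matrix.
aggregate_sketch <- function(k, districts, data) {
  data <- as.matrix(data)
  # K x M indicator matrix: 1 where a row belongs to the district, 0 otherwise.
  ind <- outer(seq_len(k), districts, "==") * 1
  counts <- rowSums(ind)
  # Column sums per district divided by the number of rows in each district.
  averages <- (ind %*% data) / counts   # NaN for districts with no rows
  attr(averages, "histogram") <- counts
  averages
}

avg <- aggregate_sketch(k = 3, districts = c(1, 1, 3), data = c(2, 4, 10))
```

Districts that receive no rows come out as NaN here, which is one reason a smoothing step over neighboring districts is useful on a real map.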

nrocolorize    Assign colors based on value

Description
Assign colors to map districts based on the respective district values.

Usage
nrocolorize(values, amplitudes = 1, palette = "rhodo")

Arguments
values      A vector of K values or a K x N matrix, where K is the number of map districts and N is the number of variables.
amplitudes  A single value or a vector of N elements that specifies the available proportion of the color range for each variable.
palette     One of the pre-defined colormap names (see Details) or a sorted vector of hexadecimal color codes as strings; see rgb() for additional details.

Details
The argument amplitudes controls the part of the color range that is available for the district value range. For proportions below 1, the minimum district value is assigned to a color that is between the first and middle element in the color palette, and the maximum is assigned to a color that is between the middle and the last element. If the amplitude is greater than 1, the extreme low and high values are clipped to the first and last color in the palette, respectively.

The palette argument can also contain the name of a colormap: gray, fire, jungle, miami, rhodo or tan. Any other word will revert to a rainbow colormap.

Value
A data frame with K rows and N columns that contains color definitions as character strings.
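The effect of the amplitude described above can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: values are centered on the midpoint of their range (the package may center differently), a single variable is handled, and the helper name `colorize_sketch` is hypothetical.

```r
# Hypothetical helper: map values to palette entries with an amplitude factor.
colorize_sketch <- function(values, amplitude = 1,
                            palette = grDevices::rainbow(101)) {
  rng <- range(values, na.rm = TRUE)
  mid <- mean(rng)
  half <- diff(rng) / 2
  # Scale to [-1, 1] around the range midpoint (assumed centering).
  z <- if (half > 0) (values - mid) / half else values * 0
  # Amplitude below 1 shrinks the used color range; above 1 clips the extremes.
  z <- pmin(pmax(amplitude * z, -1), 1)
  idx <- 1 + round((z + 1) / 2 * (length(palette) - 1))
  palette[idx]
}

pal <- grDevices::colorRampPalette(c("#0000FF", "#FFFFFF", "#FF0000"))(101)
half_range <- colorize_sketch(c(0, 5, 10), amplitude = 0.5, palette = pal)
clipped <- colorize_sketch(c(0, 2, 5, 8, 10), amplitude = 2, palette = pal)
```

With an amplitude of 0.5 the minimum and maximum land a quarter and three quarters of the way through the palette; with an amplitude of 2 every value in the outer half of the range is clipped to the first or last color.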

nrokmeans    K-means clustering

Description
K-means clustering for multi-dimensional data.

Usage
nrokmeans(data, k = 3, subsample = NULL, balance = 0)

Arguments
data        Numerical data frame or matrix with M rows and N columns.
k           Number of centroids.
subsample   Number of randomly selected rows used during a single training cycle.
balance     If 0, the algorithm is applied with no balancing; if 1, all the clusters are forced to be of equal size.

Details
The K centroids are determined by Lloyd's algorithm with Euclidean distances. If subsample is less than the number of data rows, a random subset of rows is used for each training cycle.

Value
A list of three named elements: centroids is a k x N matrix of the main results, and layout contains the M best-matching centroid labels for each sample from the original dataset. Finally, history is the chronological record of training errors.

nrokohonen    Self-organizing map

Description
Interpolates the initial district profiles of a self-organizing map based on pre-determined seed profiles. The function is named after Teuvo Kohonen, the inventor of the self-organizing map.

Usage
nrokohonen(seeds, radius = 3)

Arguments
seeds       A matrix of K rows and N columns.
radius      Map radius.

Value
A list containing two named elements: centroids contains the N-dimensional district profiles, and topology is an H x 6 matrix that contains the 2D spatial layout for the map districts: the first two columns (X, Y) indicate the positions of districts in Cartesian coordinates, the other four columns (RADIUS1, RADIUS2, ANGLE1, ANGLE2) define the perimeter of the district areas for visualisation on a circular map.

See Also
Please see nrokmeans as a means to create the seed profiles.

nrolabel    Label pruning

Description
Optimize the selection of labels on map districts.

Usage
nrolabel(topology, values, gap = 2.3)

Arguments
topology    A matrix with six columns that contain the geometric properties of map districts; please see nrokohonen for details.
values      A vector of K elements or a K x N matrix of smoothed district values, where K is the number of map districts and N is the number of variables; please see nroaggregate for details.
gap         Minimum distance between map districts with non-empty labels.

Details
The function assigns non-empty labels for districts based on the absolute deviations from the average district value. The most extreme districts are picked first, and then the remaining districts are prioritized based on their value and distance to the other districts already labeled.

Value
A data frame with K rows and N columns that contains labels for all the districts.

nromatch    Best-matching districts

Description
Compare the multivariate data samples from a dataset against the district profiles of a self-organizing map (SOM).

Usage
nromatch(som, data)

Arguments
som         Either a matrix or a list with an element centroids that contains the matrix of reference profiles, such as outputs from nrokmeans and nrotrain.
data        A data matrix with identical column names to the centroid matrix.

Details
The matching error between a data sample and a reference profile is defined as the Euclidean distance in N-dimensional space, where N is the number of variables.

Value
A vector of integers with elements corresponding to the rows in data. Each element contains the index of the best matching centroid from som. The vector also has the attribute quality that contains three columns: RESIDUAL is the Euclidean distance in data space (shorter is better), QUALITY is a scale-independent measure of the matching quality if training history is available (higher is better, 1 means equal quality to the training set), and COVERAGE shows the proportion of data elements that were available for matching. Finally, the names of the columns that were used for matching are stored in the attribute features.
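The Euclidean matching rule can be sketched in base R as shown below. This is a minimal illustration, not the package's implementation: it assumes complete data with no missing values, it returns only a residual attribute rather than the full quality table, and the helper name `match_sketch` is hypothetical.

```r
# Hypothetical helper: index of the nearest centroid for each data row.
match_sketch <- function(centroids, data) {
  # Align data columns with the centroid columns by name.
  data <- data[, colnames(centroids), drop = FALSE]
  # Euclidean distance from every row to each centroid (M x K).
  d <- sapply(seq_len(nrow(centroids)), function(k) {
    sqrt(rowSums(sweep(data, 2, centroids[k, ])^2))
  })
  d <- matrix(d, nrow = nrow(data))
  bmu <- max.col(-d, ties.method = "first")
  attr(bmu, "residual") <- d[cbind(seq_len(nrow(data)), bmu)]
  bmu
}

centroids <- matrix(c(0, 0, 10, 10), ncol = 2, byrow = TRUE,
                    dimnames = list(NULL, c("a", "b")))
samples <- matrix(c(1, 1, 9, 8, -1, 0), ncol = 2, byrow = TRUE,
                  dimnames = list(NULL, c("a", "b")))
bmu <- match_sketch(centroids, samples)
```

Here the first and third samples match the first centroid and the second sample matches the second one; the residual attribute holds the corresponding distances.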

nropermute    Permutation analysis

Description
Estimate the dynamic range (and statistical significance) for regional patterns on a self-organizing map using permutations.

Usage
nropermute(som, districts, data, n = 10000)

Arguments
som         A list that must contain the element topology; see nrokohonen for details.
districts   A numeric vector of M best matching districts, typically the output from nromatch.
data        A numeric vector of M data values or an M x N matrix, where M is the number of data points and N is the number of variables.
n           Maximum number of permutations.

Value
A data frame with eight columns. For example, P.z is a parametric estimate for statistical significance, P.freq is the frequency-based estimate for statistical significance, and Z is the estimated z-score of how far the observed map plane is from the average randomly generated layout. N.data indicates how many data values were used and N.cycles tells the number of completed permutations. AMPLITUDE is the dynamic range for colors that can be used in nrocolorize.

nroplot    Plot a self-organizing map

Description
Create a graphical interface for selecting subgroups from multiple map colorings simultaneously.

Usage
nroplot(elements, colors, labels = NULL, values = NULL, subplot = NULL, interactive = FALSE, file = NULL)

Arguments
elements     A data frame with K rows and six columns that contains the district positions of a self-organizing map (i.e. the topology); please see nrokohonen for details. May also contain additional columns for visualization of subgroup information.
colors       A character vector with K elements or a K x N matrix of hexadecimal color codes as strings.
labels       A character vector with K elements or a K x N matrix of district labels.
values       A vector with K elements or a K x N matrix of district values.
subplot      A two-element vector that sets out the number of rows and columns for a grid layout of multiple colorings.
interactive  If TRUE, an interactive version of the plot is launched.
file         If non-empty, the figure is saved as a Scalable Vector Graphics file.

Details
Some non-alphanumeric characters are not supported and will be automatically converted to _. Labels or column names that are too long will be truncated.

Value
A data frame with K rows that contains the topology and subgrouping information.

See Also
Please see nrocolorize for converting values into color codes, and nrolabel for optimizing which labels to show on the map.

nropreprocess    Data cleaning and standardization

Description
Convert to numerical values, remove unusable rows and columns, and standardize the scale of each variable.

Usage
nropreprocess(data, training = NULL, strata = NULL, key = NULL)

Arguments
data        A matrix or a data frame.
training    A character vector that contains the headings of feature columns that are intended for model training and data point matching.
strata      The heading of the column that defines batches; each batch will be processed separately.
key         The heading of the column that contains row identifiers.

Value
A list with three members is returned: original contains a subset of the original dataset where unusable rows were removed, values contains those columns that could be converted to numbers, and features contains the standardized training columns.

nrosummary    Estimate subgroup statistics

Description
Combine subgrouping information for districts with the data points that reside in the districts, and estimate statistics for each subgroup and variable.

Usage
nrosummary(data, districts, regions, categlim = 8)

Arguments
data        A vector of M elements or an M x N matrix of data values.
districts   A vector of M best-matching districts for each row in the data matrix; please see nromatch for a typical usage case.
regions     A vector of K elements that defines if a district belongs to a larger region (i.e. a subgroup).
categlim    The threshold for the number of unique values before a variable is considered continuous.

Details
The region vector must have K elements where K is the total number of map districts. Accordingly, the value at element [i] indicates the region for the district [i].
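The indexing convention for the region vector can be illustrated with a few lines of base R. The values below are made up for illustration, and the per-region mean stands in for the fuller set of statistics that nrosummary reports.

```r
# Made-up example: K = 4 districts grouped into two regions, M = 5 data rows.
regions <- c("A", "A", "B", "B")       # regions[i] is the region of district i
districts <- c(1, 3, 3, 2, 4)          # best-matching district for each row
values <- c(5, 7, 9, 6, 1)             # one variable measured on each row

# Look up the region of every row through its district index.
row_regions <- regions[districts]

# One simple subgroup statistic: the mean of the variable within each region.
region_means <- tapply(values, row_regions, mean)
```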

Value
A data frame of summary statistics that contains a row for every combination of subgroups and variables.

nrotrain    Train Self-Organizing Map

Description
Iterative algorithm to adapt a self-organizing map (SOM) to a set of multivariable data.

Usage
nrotrain(som, data, subsample = NULL)

Arguments
som         A list of two elements: centroids and topology; see nrokohonen for additional details.
data        A data matrix with the same column names as the centroids.
subsample   Number of data rows used during a single training cycle.

Value
A copy of som, where the centroids list element is updated according to the data patterns. In addition, the quantization errors during training are stored in the element history.

See Also
nrokohonen for details on the SOM.
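To give a feel for iterative SOM training and the recorded quantization errors, here is a generic batch SOM sketch in base R. It is not the algorithm used by nrotrain, whose update rule is not described here: the Gaussian neighborhood, the fixed width `sigma`, the square grid, the absence of subsampling, and the helper name `train_som_sketch` are all assumptions made for illustration.

```r
# Hypothetical helper: generic batch SOM training (not the Numero algorithm).
# centroids: K x N matrix, xy: K x 2 district coordinates, data: M x N matrix.
train_som_sketch <- function(centroids, xy, data, cycles = 10, sigma = 1) {
  # Gaussian neighborhood weights between all pairs of districts (K x K).
  w <- exp(-as.matrix(dist(xy))^2 / (2 * sigma^2))
  history <- numeric(cycles)
  rows <- seq_len(nrow(data))
  for (i in seq_len(cycles)) {
    # Squared Euclidean distance from every sample to every centroid (M x K).
    d2 <- outer(rowSums(data^2), rowSums(centroids^2), "+") -
      2 * data %*% t(centroids)
    bmu <- max.col(-d2, ties.method = "first")
    # Quantization error: mean distance between samples and their best match.
    history[i] <- mean(sqrt(pmax(d2[cbind(rows, bmu)], 0)))
    # Each centroid becomes the neighborhood-weighted mean of the samples.
    h <- w[, bmu, drop = FALSE]          # K x M
    centroids <- (h %*% data) / rowSums(h)
  }
  list(centroids = centroids, history = history)
}

set.seed(1)
samples <- matrix(rnorm(200), ncol = 2)
xy <- as.matrix(expand.grid(X = 1:3, Y = 1:3))
seeds <- samples[sample(nrow(samples), nrow(xy)), ]
fit <- train_som_sketch(seeds, xy, samples)
```

The returned list mirrors the documented structure loosely: updated centroids plus a history vector with one quantization error per training cycle.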
