Package deductive. June 2, 2017

Similar documents
Package ECctmc. May 1, 2018

Package errorlocate. R topics documented: March 30, Type Package Title Locate Errors with Validation Rules Version 0.1.3

Package fastdummies. January 8, 2018

Package lumberjack. R topics documented: July 20, 2018

Package cattonum. R topics documented: May 2, Type Package Version Title Encode Categorical Features

Package lintools. July 30, 2018

Package pwrab. R topics documented: June 6, Type Package Title Power Analysis for AB Testing Version 0.1.0

Automatic editing of numerical data with R

Package validara. October 19, 2017

Package datasets.load

Package vinereg. August 10, 2018

Package dbx. July 5, 2018

Package SEMrushR. November 3, 2018

Package messaging. May 27, 2018

Package geojsonsf. R topics documented: January 11, Type Package Title GeoJSON to Simple Feature Converter Version 1.3.

Package available. November 17, 2017

Package clipr. June 23, 2018

Package bigreadr. R topics documented: August 13, Version Date Title Read Large Text Files

Package labelvector. July 28, 2018

Package projector. February 27, 2018

Package strat. November 23, 2016

Package rprojroot. January 3, Title Finding Files in Project Subdirectories Version 1.3-2

Package hyphenatr. August 29, 2016

Package wrswor. R topics documented: February 2, Type Package

Package ecoseries. R topics documented: September 27, 2017

An R-based data editing system. Mark van der Loo

Package humanize. R topics documented: April 4, Version Title Create Values for Human Consumption

Package gtrendsr. October 19, 2017

Package crochet. January 8, 2018

Package spark. July 21, 2017

Package ggimage. R topics documented: November 1, Title Use Image in 'ggplot2' Version 0.0.7

Package gggenes. R topics documented: November 7, Title Draw Gene Arrow Maps in 'ggplot2' Version 0.3.2

Package ether. September 22, 2018

Package canvasxpress

Package shinyfeedback

Package shinyhelper. June 21, 2018

Package goodpractice

Package bisect. April 16, 2018

Package TestDataImputation

Package ompr. November 18, 2017

Package jdx. R topics documented: January 9, Type Package Title 'Java' Data Exchange for 'R' and 'rjava'

Package nlgeocoder. October 8, 2018

Package vip. June 15, 2018

Package widyr. August 14, 2017

Package sfc. August 29, 2016

Package repec. August 31, 2018

Package ggimage. R topics documented: December 5, Title Use Image in 'ggplot2' Version 0.1.0

Package nngeo. September 29, 2018

Package balance. October 12, 2018

Package knitrprogressbar

Package sessioninfo. June 21, 2017

Package preprosim. July 26, 2016

Package UNF. June 13, 2017

Package semver. January 6, 2017

Package kdtools. April 26, 2018

Package dotcall64. January 11, 2018

Package aws.transcribe

Package SimilaR. June 21, 2018

Package triebeard. August 29, 2016

Package mdftracks. February 6, 2017

Package eply. April 6, 2018

Package postgistools

Package gtrendsr. August 4, 2018

Package robotstxt. November 12, 2017

Package tidyimpute. March 5, 2018

Package calpassapi. August 25, 2018

Package pdfsearch. July 10, 2018

Package effectr. January 17, 2018

Package kirby21.base

Package barcoder. October 26, 2018

Package spelling. December 18, 2017

Package jpmesh. December 4, 2017

Package RODBCext. July 31, 2017

Package raker. October 10, 2017

Package censusr. R topics documented: June 14, Type Package Title Collect Data from the Census API Version 0.0.

Package WordR. September 7, 2017

Package milr. June 8, 2017

Package censusapi. August 19, 2018

Package chunked. July 2, 2017

Package ECOSolveR. February 18, 2018

Package rbraries. April 18, 2018

Package rnn. R topics documented: June 21, Title Recurrent Neural Network Version 0.8.1

Package editdata. October 7, 2017

Package qap. February 27, Index 5. Solve Quadratic Assignment Problems (QAP)

Package regexselect. R topics documented: September 22, Version Date Title Regular Expressions in 'shiny' Select Lists

Package utilsipea. November 13, 2017

Package wikitaxa. December 21, 2017

Package weco. May 4, 2018

Package virustotal. May 1, 2017

Package docxtools. July 6, 2018

Package combiter. December 4, 2017

Package cregulome. September 13, 2018

Package gsalib. R topics documented: February 20, Type Package. Title Utility Functions For GATK. Version 2.1.

Package NFP. November 21, 2016

Package modmarg. R topics documented:

Package internetarchive

Package opencage. January 16, 2018

Package edfreader. R topics documented: May 21, 2017

Package waver. January 29, 2018

Package rmatio. July 28, 2017

Transcription:

Package deductive June 2, 2017 Maintainer Mark van der Loo <mark.vanderloo@gmail.com> License GPL-3 Title Data Correction and Imputation Using Deductive Methods LazyData no Type Package LazyLoad yes Attempt to repair inconsistencies and missing values in data records by using information from valid values and validation rules restricting the data. Version 0.1.2 Depends R (>= 3.2.0) URL https://github.com/data-cleaning/deductive BugReports https://github.com/data-cleaning/deductive/issues Date 2017-06-02 Imports methods, lintools (>= 0.1.1.2), validate, stringdist Suggests testthat, knitr RoxygenNote 6.0.1 NeedsCompilation yes Author Mark van der Loo [cre, aut], Edwin de Jonge [aut] Repository CRAN Date/Publication 2017-06-02 18:49:04 UTC R topics documented: correct_typos........................................ 2 deductive.......................................... 3 impute_lr.......................................... 3 Index 5 1

2 correct_typos correct_typos Correct typos in restricted numeric data Attempt to fix violations of linear (in)equality restrictions imposed on a record by replacing values with values that differ from the original values by typographical errors. Usage correct_typos(dat, x,...) ## S4 method for signature 'data.frame,validator' correct_typos(dat, x, fixate = NULL, eps = 1e-08, maxdist = 1,...) Arguments dat x Value An R object holding numeric (integer) data. An R object holding linear data validation rules... Options to be passed to stringdist which is used to determine the typographic distance between the original value and candidate solutions. By default, the optimal string alignment distance is used, with all weights equal to one. fixate eps maxdist dat, with values corrected. [character] vector of variable names that may not be changed [numeric] maximum roundoff error [numeric] maximum allowd typographical distance Details The algorithm works by proposing candidate replacement values and checking whether they are likely to be the result of a typographical error. A value is accepted as a solution when it resolves at least one equality violation. An equality restriction a.x=b is considered satisfied when abs(a.x-b)<eps. Setting eps to one or two units of measurement allows for robust typographical error detection in the presence of roundoff-errors. The algorithm is meant to be used on numeric data representing integers. References The first version of the algorithm was described by S. Scholtus (2009). Automatic correction of simple typing errors in numerical data with balance edits. Statistics Netherlands, Discussion Paper 09046

deductive 3 The generalized version of this algorithm that is implemented for this package is described in M. van der Loo, E. de Jonge and S. Scholtus (2011). Correction of rounding, typing and sign errors with the deducorrect package. Statistics Netherlands, Discussion Paper 2011019 Examples library(validate) # example from section 4 in Scholtus (2009) v <-validate::validator( x1 + x2 == x3, x2 == x4, x5 + x6 + x7 == x8, x3 + x8 == x9, x9 - x10 == x11 ) dat <- read.csv(textconnection( "x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11 1452, 116, 1568, 116, 323, 76, 12, 411, 1979, 1842, 137 1452, 116, 1568, 161, 323, 76, 12, 411, 1979, 1842, 137 1452, 116, 1568, 161, 323, 76, 12, 411, 19979, 1842, 137 1452, 116, 1568, 161, 0, 0, 0, 411, 19979, 1842, 137 1452, 116, 1568, 161, 323, 76, 12, 0, 19979, 1842, 137" )) cor <- correct_typos(dat,v) dat - cor deductive Deductive Data Correction and Imputation Deductive Data Correction and Imputation impute_lr Impute values derived from linear (in)equality restrictions.

4 impute_lr Usage Partially filled records x under linear (in)equality restrictions may reveal unique imputation solutions when the system of linear inequalities is reduced by substituting observed values. This function applies a number of fast heuristic methods before deriving all variable ranges and unique values. impute_lr(dat, x,...) ## S4 method for signature 'data.frame,validator' impute_lr(dat, x,...) Arguments dat x an R object carrying data an R object carrying validation rules... arguments to be passed to other methods. Examples v <- validate::validator(y ==2,y + z ==3, x +y <= 0) dat <- data.frame(x=na_real_,y=na_real_,z=na_real_) impute_lr(dat,v)

Index correct_typos, 2 correct_typos,data.frame,validator-method (correct_typos), 2 deductive, 3 deductive-package (deductive), 3 impute_lr, 3 impute_lr,data.frame,validator-method (impute_lr), 3 stringdist, 2 5