Package knnp. July 1, 2018

Version: 1.0.0
Date: 2018-06-18
Title: Time Series Prediction using K-Nearest Neighbors Algorithm (Parallel)
Depends: R (>= 3.3.3)
Imports: parallelDist, forecast, stats, utils, doParallel, foreach
Description: Two main functionalities are provided. One of them is predicting values with the k-nearest neighbors algorithm and the other is optimizing the parameters k and d of the algorithm. These are carried out in parallel using multiple threads.
License: AGPL-3
RoxygenNote: 6.0.1
URL: https://github.com/dani-basta/tfg
BugReports: https://github.com/dani-basta/tfg/issues
NeedsCompilation: no
Author: Daniel Bastarrica Lacalle [aut], Javier Berecio Trigueros [aut, cre]
Maintainer: Javier Berecio Trigueros <javierbereciot@gmail.com>
Repository: CRAN
Date/Publication: 2018-07-01 15:00:02 UTC

R topics documented:

    knn_distances
    knn_elements
    knn_next
    knn_optim
    knn_optim_parallel
    knn_optim_parallel2
    knn_optim_parallelf
    knn_past

knn_distances    Distance matrices computation and saving in files with a maximum number of columns

Description

    Calculates one distance matrix per value of d for the given time series and then saves them in files. Each file contains at most cols columns of the corresponding distance matrix.

Usage

    knn_distances(y, d, distance_metric = "euclidean", threads = NULL, file, cols = 1)

Arguments

    y                A time series.
    d                Values of d to be analyzed.
    distance_metric  Type of metric used to evaluate the distance between points. Many metrics are supported: Euclidean, Manhattan, dynamic time warping, Canberra and others. For more information about the supported metrics, check the values that the method argument of the parDist function (from the parallelDist package) can take, as this is the function used to calculate the distances. Link to the package info: https://cran.r-project.org/package=parallelDist. Some of the values that this argument can take are "euclidean", "manhattan", "dtw", "canberra", "chord".
    threads          Number of threads to be used when parallelizing the distance calculation. The default is the number of cores detected minus 1, or 1 if there is only one core.
    file             Path and id of the files where the distance matrices will be saved.
    cols             Number of columns per file.

Examples

    knn_distances(AirPassengers, 1:3, threads = 2, file = "AirPassengers", cols = 2)
    knn_distances(LakeHuron, 1:6, threads = 2, file = "LakeHuron", cols = 10)
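The idea of splitting a distance matrix into files of at most cols columns can be sketched in base R. This is an illustration only, not the package's implementation: stats::dist() stands in for parallelDist::parDist(), embed() is used to build the elements, and the file names and .rds format are hypothetical.

```r
# Sketch only: base R stand-in for the idea behind knn_distances().
x <- as.numeric(LakeHuron)
d <- 3      # length of each element
cols <- 10  # maximum number of columns per file

# One element per row; each row holds d consecutive observations.
elements <- embed(x, d)

# Full Euclidean distance matrix between all elements.
dist_mat <- as.matrix(dist(elements, method = "euclidean"))

# Group column indices into consecutive blocks of at most `cols` columns.
col_idx <- seq_len(ncol(dist_mat))
blocks <- split(col_idx, ceiling(col_idx / cols))

# Save each block to its own file (temporary directory, hypothetical names).
out_dir <- tempfile("knn_dist_")
dir.create(out_dir)
files <- vapply(names(blocks), function(b) {
  path <- file.path(out_dir, paste0("LakeHuron_d", d, "_", b, ".rds"))
  saveRDS(dist_mat[, blocks[[b]], drop = FALSE], path)
  path
}, character(1))

# Reading the blocks back in order and binding them restores the full matrix.
restored <- do.call(cbind, lapply(unname(files), readRDS))
```

Smaller values of cols produce more, smaller files, so a reader that needs only a few columns loads less data at the cost of more file handles.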

knn_elements    Elements matrix computation

Description

    Creates a matrix to be used for calculating distances. The most recent element is put in the first row of the matrix, the second most recent element in the second row, and so on. Therefore, the oldest element is put in the last row.

Usage

    knn_elements(y, d)

Arguments

    y    A matrix.
    d    Length of each of the elements.

Value

    A matrix to be used for calculating distances.


knn_next    Next value prediction

Description

    Predicts the next value of the time series using the k-nearest neighbors algorithm.

Usage

    knn_next(y, k, d, v = 1, distance_metric = "euclidean", weight = "proximity", threads = NULL)

Arguments

    y                A time series.
    k                Number of neighbors.
    d                Length of each of the elements.
    v                Variable to be predicted if given a multivariate time series.
    distance_metric  Type of metric used to evaluate the distance between points. Many metrics are supported: Euclidean, Manhattan, dynamic time warping, Canberra and others. For more information about the supported metrics, check the values that the method argument of the parDist function (from the parallelDist package) can take, as this is the function used to calculate the distances. Link to the package info: https://cran.r-project.org/package=parallelDist. Some of the values that this argument can take are "euclidean", "manhattan", "dtw", "canberra", "chord".
    weight           Type of weight to be used when calculating the predicted value with a weighted mean. Three are supported: proximity, same, linear.
                     proximity  the weight assigned to each neighbor is proportional to its distance
                     same       all neighbors are assigned the same weight
                     linear     the nearest neighbor is assigned weight k, the second closest neighbor weight k - 1, and so on until the farthest neighbor, which is assigned a weight of 1
    threads          Number of threads to be used when parallelizing the distance calculation. The default is the number of cores detected minus 1, or 1 if there is only one core.

Value

    The predicted value.

Examples

    knn_next(AirPassengers, 5, 2, threads = 2)
    knn_next(LakeHuron, 3, 6, threads = 2)


knn_optim    k and d optimization

Description

    Optimizes the values of k and d for a given time series. First, values corresponding to instants from init + 1 to the last one are predicted. The first predicted value, which corresponds to instant init + 1, is calculated using instants from 1 to init; the second predicted value, which corresponds to instant init + 2, is predicted using instants from 1 to init + 1; and so on until the last value, which corresponds to instant n (the length of the given time series), is predicted using instants from 1 to n - 1. Finally, the error between the predicted values and the real values of the series is evaluated.

    This version of the optimization function uses only one thread, except for the calculation of the distance matrices, for which the number of threads to be used can be specified.

Usage

    knn_optim(y, k, d, v = 1, init = NULL, distance_metric = "euclidean", error_metric = "MAE", weight = "proximity", threads = NULL)

Arguments

    y                A time series.
    k                Values of k to be analyzed.
    d                Values of d to be analyzed.
    v                Variable to be predicted if given a multivariate time series.
    init             Variable that determines the limit of the known past for the first instant predicted.
    distance_metric  Type of metric used to evaluate the distance between points. Many metrics are supported: Euclidean, Manhattan, dynamic time warping, Canberra and others. For more information about the supported metrics, check the values that the method argument of the parDist function (from the parallelDist package) can take, as this is the function used to calculate the distances. Link to the package info: https://cran.r-project.org/package=parallelDist. Some of the values that this argument can take are "euclidean", "manhattan", "dtw", "canberra", "chord".
    error_metric     Type of metric used to evaluate the prediction error. Five metrics are supported:
                     ME    Mean Error
                     RMSE  Root Mean Squared Error
                     MAE   Mean Absolute Error
                     MPE   Mean Percentage Error
                     MAPE  Mean Absolute Percentage Error
    weight           Type of weight to be used when calculating the predicted value with a weighted mean. Three are supported: proximity, same, linear.
                     proximity  the weight assigned to each neighbor is proportional to its distance
                     same       all neighbors are assigned the same weight
                     linear     the nearest neighbor is assigned weight k, the second closest neighbor weight k - 1, and so on until the farthest neighbor, which is assigned a weight of 1
    threads          Number of threads to be used when parallelizing. The default is the number of cores detected minus 1, or 1 if there is only one core.

Value

    A matrix of errors, optimal k and d.

Examples

    knn_optim(AirPassengers, 1:5, 1:3, threads = 2)
    knn_optim(LakeHuron, 1:10, 1:6, threads = 2)
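The evaluation scheme described for knn_optim (predict each instant from init + 1 to n using only its past, then score the predictions over a grid of k and d) can be sketched in base R as follows. This is a single-threaded illustration under stated assumptions, not the package code: the helper names (make_elements, knn_next_sketch, knn_past_sketch, errors) are hypothetical, Euclidean distance and equal ("same") weights are used, init is set arbitrarily to 70% of the series, and errors are taken as real minus predicted.

```r
# Sketch only: base R illustration of the rolling evaluation, not the package code.

# Elements matrix: row 1 is the most recent element, the last row the oldest.
make_elements <- function(x, d) {
  starts <- seq(length(x) - d + 1, 1)  # start index of each element
  matrix(unlist(lapply(starts, function(s) x[s:(s + d - 1)])),
         ncol = d, byrow = TRUE)
}

# One-step prediction: mean of the values that followed the k nearest elements.
knn_next_sketch <- function(x, k, d) {
  el <- make_elements(x, d)
  target <- el[1, ]                    # most recent element
  cand <- el[-1, , drop = FALSE]       # older elements with a known successor
  dists <- sqrt(rowSums(sweep(cand, 2, target)^2))
  nn <- order(dists)[seq_len(k)]
  # Candidate row i ends at index n - i, so its successor is x[n - i + 1].
  mean(x[length(x) - nn + 1])
}

# Predict instants init + 1, ..., n, each time using only instants 1..(t - 1).
knn_past_sketch <- function(x, k, d, init) {
  vapply(init:(length(x) - 1),
         function(last) knn_next_sketch(x[1:last], k, d), numeric(1))
}

# The five error metrics named in the manual (sign convention: real - predicted).
errors <- function(pred, real) {
  e <- real - pred
  c(ME = mean(e), RMSE = sqrt(mean(e^2)), MAE = mean(abs(e)),
    MPE = 100 * mean(e / real), MAPE = 100 * mean(abs(e / real)))
}

x <- as.numeric(AirPassengers)
init <- floor(length(x) * 0.7)         # assumption: not the package default
ks <- 1:5
ds <- 1:3

# MAE for every (k, d) pair; rows index k, columns index d.
mae <- outer(ks, ds, Vectorize(function(k, d) {
  errors(knn_past_sketch(x, k, d, init), x[(init + 1):length(x)])[["MAE"]]
}))
dimnames(mae) <- list(paste0("k=", ks), paste0("d=", ds))

# Pair with the smallest error.
idx <- arrayInd(which.min(mae), dim(mae))
best_k <- ks[idx[1, 1]]
best_d <- ds[idx[1, 2]]
```

Each cell of the grid is independent of the others, which is what makes the search straightforward to parallelize over d, or over d and instants, as the parallel variants describe.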

knn_optim_parallel    Parallel k and d optimization

Description

    Optimizes the values of k and d for a given time series. First, values corresponding to instants from init + 1 to the last one are predicted. The first predicted value, which corresponds to instant init + 1, is calculated using instants from 1 to init; the second predicted value, which corresponds to instant init + 2, is predicted using instants from 1 to init + 1; and so on until the last value, which corresponds to instant n (the length of the given time series), is predicted using instants from 1 to n - 1. Finally, the error between the predicted values and the real values of the series is evaluated.

    This version of the optimization function uses a parallelized distance calculation function, and the computation of the predicted values is parallelized over both the values of d and the instants to be predicted.

Usage

    knn_optim_parallel(y, k, d, v = 1, init = NULL, distance_metric = "euclidean", error_metric = "MAE", weight = "proximity", threads = NULL)

Arguments

    y                A time series.
    k                Values of k to be analyzed.
    d                Values of d to be analyzed.
    v                Variable to be predicted if given a multivariate time series.
    init             Variable that determines the limit of the known past for the first instant predicted.
    distance_metric  Type of metric used to evaluate the distance between points. Many metrics are supported: Euclidean, Manhattan, dynamic time warping, Canberra and others. For more information about the supported metrics, check the values that the method argument of the parDist function (from the parallelDist package) can take, as this is the function used to calculate the distances. Link to the package info: https://cran.r-project.org/package=parallelDist. Some of the values that this argument can take are "euclidean", "manhattan", "dtw", "canberra", "chord".
    error_metric     Type of metric used to evaluate the prediction error. Five metrics are supported:
                     ME    Mean Error
                     RMSE  Root Mean Squared Error
                     MAE   Mean Absolute Error
                     MPE   Mean Percentage Error
                     MAPE  Mean Absolute Percentage Error
    weight           Type of weight to be used when calculating the predicted value with a weighted mean. Three are supported: proximity, same, linear.
                     proximity  the weight assigned to each neighbor is proportional to its distance
                     same       all neighbors are assigned the same weight
                     linear     the nearest neighbor is assigned weight k, the second closest neighbor weight k - 1, and so on until the farthest neighbor, which is assigned a weight of 1
    threads          Number of threads to be used when parallelizing. The default is the number of cores detected minus 1, or 1 if there is only one core.

Value

    A matrix of errors, optimal k and d.

Examples

    knn_optim_parallel(AirPassengers, 1:5, 1:3, threads = 2)
    knn_optim_parallel(LakeHuron, 1:10, 1:6, threads = 2)


knn_optim_parallel2    Parallel k and d optimization

Description

    Optimizes the values of k and d for a given time series. First, values corresponding to instants from init + 1 to the last one are predicted. The first predicted value, which corresponds to instant init + 1, is calculated using instants from 1 to init; the second predicted value, which corresponds to instant init + 2, is predicted using instants from 1 to init + 1; and so on until the last value, which corresponds to instant n (the length of the given time series), is predicted using instants from 1 to n - 1. Finally, the error between the predicted values and the real values of the series is evaluated.

    This version of the optimization function uses a parallelized distance calculation function, and the computation of the predicted values is parallelized over the values of d only.

Usage

    knn_optim_parallel2(y, k, d, v = 1, init = NULL, distance_metric = "euclidean", error_metric = "MAE", weight = "proximity", threads = NULL)

Arguments

    y                A time series.
    k                Values of k to be analyzed.
    d                Values of d to be analyzed.
    v                Variable to be predicted if given a multivariate time series.
    init             Variable that determines the limit of the known past for the first instant predicted.

    distance_metric  Type of metric used to evaluate the distance between points. Many metrics are supported: Euclidean, Manhattan, dynamic time warping, Canberra and others. For more information about the supported metrics, check the values that the method argument of the parDist function (from the parallelDist package) can take, as this is the function used to calculate the distances. Link to the package info: https://cran.r-project.org/package=parallelDist. Some of the values that this argument can take are "euclidean", "manhattan", "dtw", "canberra", "chord".
    error_metric     Type of metric used to evaluate the prediction error. Five metrics are supported:
                     ME    Mean Error
                     RMSE  Root Mean Squared Error
                     MAE   Mean Absolute Error
                     MPE   Mean Percentage Error
                     MAPE  Mean Absolute Percentage Error
    weight           Type of weight to be used when calculating the predicted value with a weighted mean. Three are supported: proximity, same, linear.
                     proximity  the weight assigned to each neighbor is proportional to its distance
                     same       all neighbors are assigned the same weight
                     linear     the nearest neighbor is assigned weight k, the second closest neighbor weight k - 1, and so on until the farthest neighbor, which is assigned a weight of 1
    threads          Number of threads to be used when parallelizing. The default is the number of cores detected minus 1, or 1 if there is only one core.

Value

    A matrix of errors, optimal k and d.

Examples

    knn_optim_parallel2(AirPassengers, 1:5, 1:3, threads = 2)
    knn_optim_parallel2(LakeHuron, 1:10, 1:6, threads = 2)


knn_optim_parallelf    Parallel k and d optimization reading from files

Description

    Optimizes the values of k and d for a given time series. First, values corresponding to instants from init + 1 to the last one are predicted. The first predicted value, which corresponds to instant init + 1, is calculated using instants from 1 to init; the second predicted value, which corresponds to instant init + 2, is predicted using instants from 1 to init + 1; and so on until the last value, which corresponds to instant n (the length of the given time series), is predicted using instants from 1 to n - 1. Finally, the error between the predicted values and the real values of the series is evaluated.

    This version of the optimization function uses a parallelized distance calculation function, and the computation of the predicted values is parallelized over both the values of d and the instants to be predicted. Each thread that calculates predicted values reads only the part of the corresponding distance matrix that contains the information used for its prediction.

Usage

    knn_optim_parallelf(y, k, d, v = 1, init = NULL, error_metric = "MAE", weight = "proximity", threads = NULL, file, cols)

Arguments

    y             A time series.
    k             Values of k to be analyzed.
    d             Values of d to be analyzed.
    v             Variable to be predicted if given a multivariate time series.
    init          Variable that determines the limit of the known past for the first instant predicted.
    error_metric  Type of metric used to evaluate the prediction error. Five metrics are supported:
                  ME    Mean Error
                  RMSE  Root Mean Squared Error
                  MAE   Mean Absolute Error
                  MPE   Mean Percentage Error
                  MAPE  Mean Absolute Percentage Error
    weight        Type of weight to be used when calculating the predicted value with a weighted mean. Three are supported: proximity, same, linear.
                  proximity  the weight assigned to each neighbor is proportional to its distance
                  same       all neighbors are assigned the same weight
                  linear     the nearest neighbor is assigned weight k, the second closest neighbor weight k - 1, and so on until the farthest neighbor, which is assigned a weight of 1
    threads       Number of threads to be used when parallelizing. The default is the number of cores detected minus 1, or 1 if there is only one core.
    file          Path and id of the files where the distance matrices are stored.
    cols          Number of columns per file.

Value

    A matrix of errors, optimal k and d.

Examples

    knn_distances(AirPassengers, 1:3, file = "AirPassengers", cols = 2, threads = 2)
    knn_optim_parallelf(AirPassengers, 1:5, 1:3, file = "AirPassengers", cols = 2, threads = 2)
    knn_distances(LakeHuron, 1:6, file = "LakeHuron", cols = 10, threads = 2)
    knn_optim_parallelf(LakeHuron, 1:10, 1:6, file = "LakeHuron", cols = 10, threads = 2)
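Reading only the needed part of a distance matrix comes down to index arithmetic once the matrix has been split into files of cols columns. The sketch below assumes consecutive, 1-indexed column blocks; the manual does not specify the actual file layout, and locate_column is a hypothetical helper.

```r
# Sketch only: locate a distance-matrix column inside chunked files,
# assuming file f holds columns ((f - 1) * cols + 1) .. (f * cols).
locate_column <- function(j, cols) {
  c(file = (j - 1) %/% cols + 1,   # which file holds column j
    offset = (j - 1) %% cols + 1)  # position of column j inside that file
}

loc <- locate_column(23, cols = 10)  # file 3, offset 3
```

Note that the cols value passed when reading has to match the one used when the files were written, which is why the examples above pass the same cols to both functions.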

knn_past    Past time prediction

Description

    Predicts values of the time series using the k-nearest neighbors algorithm. Values corresponding to instants from init + 1 to the last one are predicted. The first predicted value, which corresponds to instant init + 1, is calculated using instants from 1 to init; the second predicted value, which corresponds to instant init + 2, is predicted using instants from 1 to init + 1; and so on until the last value, which corresponds to instant n (the length of the given time series), is predicted using instants from 1 to n - 1.

Usage

    knn_past(y, k, d, v = 1, init = NULL, distance_metric = "euclidean", weight = "proximity", threads = NULL)

Arguments

    y                A time series.
    k                Number of neighbors.
    d                Length of each of the elements.
    v                Variable to be predicted if given a multivariate time series.
    init             Variable that determines the limit of the known past for the first instant predicted.
    distance_metric  Type of metric used to evaluate the distance between points. Many metrics are supported: Euclidean, Manhattan, dynamic time warping, Canberra and others. For more information about the supported metrics, check the values that the method argument of the parDist function (from the parallelDist package) can take, as this is the function used to calculate the distances. Link to the package info: https://cran.r-project.org/package=parallelDist. Some of the values that this argument can take are "euclidean", "manhattan", "dtw", "canberra", "chord".
    weight           Type of weight to be used when calculating the predicted value with a weighted mean. Three are supported: proximity, same, linear.
                     proximity  the weight assigned to each neighbor is proportional to its distance
                     same       all neighbors are assigned the same weight
                     linear     the nearest neighbor is assigned weight k, the second closest neighbor weight k - 1, and so on until the farthest neighbor, which is assigned a weight of 1
    threads          Number of threads to be used when parallelizing. The default is the number of cores detected minus 1, or 1 if there is only one core.

Value

    The predicted values.

Examples

    knn_past(AirPassengers, 5, 2, threads = 2)
    knn_past(LakeHuron, 3, 6, threads = 2)
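The three weighting schemes can be made concrete with a small base R helper. This is a sketch under assumptions rather than the package's code: neighbor_weights is a hypothetical name, the weights are normalized to sum to 1, and "proximity" is read here as inversely proportional to distance (closer neighbors weigh more), since the manual's wording does not pin down the exact formula.

```r
# Sketch only: weights for the k nearest neighbors given their distances.
neighbor_weights <- function(dists, weight = c("proximity", "same", "linear")) {
  weight <- match.arg(weight)
  k <- length(dists)
  w <- switch(weight,
    # Assumption: inverse distance; the small constant avoids division by zero.
    proximity = 1 / (dists + .Machine$double.eps),
    # Every neighbor counts equally.
    same = rep(1, k),
    # Nearest neighbor gets k, the next k - 1, ..., the farthest gets 1.
    linear = k + 1 - rank(dists, ties.method = "first"))
  w / sum(w)
}

# Weighted-mean prediction from the values that followed each neighbor.
dists <- c(1, 2, 3)
successors <- c(10, 20, 30)
pred_linear <- sum(neighbor_weights(dists, "linear") * successors)
pred_same <- sum(neighbor_weights(dists, "same") * successors)
```

With these inputs the linear scheme pulls the prediction toward the nearest neighbor's successor, while the same scheme reduces to a plain mean.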
