Package 'knnp'
July 1, 2018

Title Time Series Prediction using K-Nearest Neighbors Algorithm (Parallel)
Version 1.0.0
Date 2018-06-18
Depends R (>= 3.3.3)
Imports parallelDist, forecast, stats, utils, doParallel, foreach
Description Two main functionalities are provided. One of them is predicting values with the k-nearest neighbors algorithm and the other is optimizing the parameters k and d of the algorithm. These are carried out in parallel using multiple threads.
License AGPL-3
RoxygenNote 6.0.1
URL https://github.com/dani-basta/tfg
BugReports https://github.com/dani-basta/tfg/issues
NeedsCompilation no
Author Daniel Bastarrica Lacalle [aut], Javier Berdecio Trigueros [aut, cre]
Maintainer Javier Berdecio Trigueros <javierberdeciot@gmail.com>
Repository CRAN
Date/Publication 2018-07-01 15:00:02 UTC

R topics documented:

    knn_distances
    knn_elements
    knn_next
    knn_optim
    knn_optim_parallel
    knn_optim_parallel2
    knn_optim_parallelf
    knn_past

knn_distances    Distances matrices computation and saving in files with a maximum number of columns

Description

    Calculates one distances matrix for each d for the given time series and then saves them in files. Each file will contain at most cols columns from the corresponding distances matrix.

Usage

    knn_distances(y, d, distance_metric = "euclidean", threads = NULL, file, cols = 1)

Arguments

    y                  A time series.
    d                  Values of d to be analyzed.
    distance_metric    Type of metric to evaluate the distance between points. Many metrics are supported: euclidean, manhattan, dynamic time warping, canberra and others. For more information about the supported metrics, check the values that the method argument of function parDist (from the parallelDist package) can take, as this is the function used to calculate the distances. Link to the package info: https://cran.r-project.org/package=parallelDist. Some of the values that this argument can take are "euclidean", "manhattan", "dtw", "canberra", "chord".
    threads            Number of threads to be used when parallelizing the distances calculation; default is the number of cores detected - 1, or 1 if there is only one core.
    file               Path and id of the files where the distances matrices will be saved.
    cols               Number of columns per file.

Examples

    knn_distances(AirPassengers, 1:3, threads = 2, file = "AirPassengers", cols = 2)
    knn_distances(LakeHuron, 1:6, threads = 2, file = "LakeHuron", cols = 10)
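
As a rough illustration of the "at most cols columns per file" idea, the following base R sketch computes one distances matrix and writes it out in column chunks. It is not the package's implementation: the helper save_in_chunks, the file naming scheme and the use of embed() and dist() from stats are all assumptions made for this example.

```r
# Hypothetical helper: write a matrix to several files, each holding
# at most `cols` consecutive columns. The naming scheme
# <prefix>_<first>_<last>.rds is an assumption of this sketch.
save_in_chunks <- function(m, prefix, cols = 1) {
  starts <- seq(1, ncol(m), by = cols)
  paths <- character(length(starts))
  for (i in seq_along(starts)) {
    last <- min(starts[i] + cols - 1, ncol(m))
    paths[i] <- sprintf("%s_%d_%d.rds", prefix, starts[i], last)
    saveRDS(m[, starts[i]:last, drop = FALSE], paths[i])
  }
  paths
}

y <- as.numeric(LakeHuron)
d <- 3

# One row per window of d consecutive values (row layout differs from
# the one described for knn_elements; it is only used for illustration).
elements <- embed(y, d)
dist_mat <- as.matrix(dist(elements, method = "euclidean"))

paths <- save_in_chunks(dist_mat, file.path(tempdir(), "LakeHuron_d3"), cols = 10)
length(paths)   # number of files written
```

Keeping the chunks small means a later reader can load only the columns it needs instead of the whole matrix.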

knn_elements    Elements matrix computation

Description

    Creates a matrix to be used for calculating distances. The most recent 'element' is put in the first row of the matrix, the second most recent 'element' in the second row, and so on. Therefore, the oldest 'element' is put in the last row.

Usage

    knn_elements(y, d)

Arguments

    y    A matrix.
    d    Length of each of the 'elements'.

Value

    A matrix to be used for calculating distances.


knn_next    Next value prediction

Description

    Predicts the next value of the time series using the k-nearest neighbors algorithm.

Usage

    knn_next(y, k, d, v = 1, distance_metric = "euclidean", weight = "proximity", threads = NULL)

Arguments

    y                  A time series.
    k                  Number of neighbors.
    d                  Length of each of the 'elements'.
    v                  Variable to be predicted if given a multivariate time series.
    distance_metric    Type of metric to evaluate the distance between points. Many metrics are supported: euclidean, manhattan, dynamic time warping, canberra and others. For more information about the supported metrics, check the values that the method argument of function parDist (from the parallelDist package) can take, as this is the function used to calculate the distances. Link to the package info: https://cran.r-project.org/package=parallelDist. Some of the values that this argument can take are "euclidean", "manhattan", "dtw", "canberra", "chord".
    weight             Type of weight to be used when calculating the predicted value with a weighted mean. Three supported: proximity, same, linear.
                           proximity  the weight assigned to each neighbor is proportional to its distance
                           same       all neighbors are assigned the same weight
                           linear     the nearest neighbor is assigned weight k, the second closest neighbor weight k-1, and so on until the farthest neighbor, which is assigned a weight of 1
    threads            Number of threads to be used when parallelizing the distances calculation; default is the number of cores detected - 1, or 1 if there is only one core.

Value

    The predicted value.

Examples

    knn_next(AirPassengers, 5, 2, threads = 2)
    knn_next(LakeHuron, 3, 6, threads = 2)


knn_optim    k and d optimization

Description

    Optimizes the values of k and d for a given time series. First, values corresponding to instants from init + 1 to the last one are predicted. The first value predicted, which corresponds to instant init + 1, is calculated using instants from 1 to instant init; the second value predicted, which corresponds to instant init + 2, is predicted using instants from 1 to instant init + 1; and so on until the last value, which corresponds to instant n (the length of the given time series), is predicted using instants from 1 to instant n - 1. Finally, the error is evaluated between the predicted values and the real values of the series.

    This version of the optimization function only uses one thread, except for the distances matrices calculation, for which the number of threads to be used can be specified.

Usage

    knn_optim(y, k, d, v = 1, init = NULL, distance_metric = "euclidean", error_metric = "MAE", weight = "proximity", threads = NULL)

Arguments

    y                  A time series.
    k                  Values of k to be analyzed.
    d                  Values of d to be analyzed.
    v                  Variable to be predicted if given a multivariate time series.
    init               Variable that determines the limit of the known past for the first instant predicted.
    distance_metric    Type of metric to evaluate the distance between points. Many metrics are supported: euclidean, manhattan, dynamic time warping, canberra and others. For more information about the supported metrics, check the values that the method argument of function parDist (from the parallelDist package) can take, as this is the function used to calculate the distances. Link to the package info: https://cran.r-project.org/package=parallelDist. Some of the values that this argument can take are "euclidean", "manhattan", "dtw", "canberra", "chord".
    error_metric       Type of metric to evaluate the prediction error. Five metrics supported:
                           ME    Mean Error
                           RMSE  Root Mean Square Error
                           MAE   Mean Absolute Error
                           MPE   Mean Percentage Error
                           MAPE  Mean Absolute Percentage Error
    weight             Type of weight to be used when calculating the predicted value with a weighted mean. Three supported: proximity, same, linear.
                           proximity  the weight assigned to each neighbor is proportional to its distance
                           same       all neighbors are assigned the same weight
                           linear     the nearest neighbor is assigned weight k, the second closest neighbor weight k-1, and so on until the farthest neighbor, which is assigned a weight of 1
    threads            Number of threads to be used when parallelizing; default is the number of cores detected - 1, or 1 if there is only one core.

Value

    A matrix of errors, optimal k and d.

Examples

    knn_optim(AirPassengers, 1:5, 1:3, threads = 2)
    knn_optim(LakeHuron, 1:10, 1:6, threads = 2)
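
The walk-forward evaluation described for knn_optim can be sketched in base R as follows. This is a minimal, single-threaded illustration under stated assumptions, not the package's code: predict_next and grid_errors are hypothetical helpers, neighbors are averaged with equal weights, distances are Euclidean, the percentage metrics are scaled by 100, and init must be given explicitly.

```r
# The five error metrics named in the manual (percentage scaling assumed).
error_fun <- list(
  ME   = function(actual, pred) mean(actual - pred),
  RMSE = function(actual, pred) sqrt(mean((actual - pred)^2)),
  MAE  = function(actual, pred) mean(abs(actual - pred)),
  MPE  = function(actual, pred) 100 * mean((actual - pred) / actual),
  MAPE = function(actual, pred) 100 * mean(abs((actual - pred) / actual))
)

# Predict the value after the end of y: find the k windows of length d
# closest to the last window and average the values that followed them.
predict_next <- function(y, k, d) {
  n <- length(y)
  target <- y[(n - d + 1):n]
  starts <- 1:(n - d)   # windows that have a known successor
  dists <- vapply(starts, function(s) {
    sqrt(sum((y[s:(s + d - 1)] - target)^2))
  }, numeric(1))
  nearest <- order(dists)[seq_len(min(k, length(starts)))]
  mean(y[starts[nearest] + d])
}

# For every (k, d) pair, predict instants init + 1 .. n, each time using
# only instants 1 .. (instant - 1), then score against the real values.
grid_errors <- function(y, ks, ds, init, metric = "MAE") {
  n <- length(y)
  errors <- matrix(NA_real_, length(ks), length(ds),
                   dimnames = list(k = as.character(ks), d = as.character(ds)))
  for (i in seq_along(ks)) {
    for (j in seq_along(ds)) {
      preds <- vapply(init:(n - 1), function(end) {
        predict_next(y[1:end], ks[i], ds[j])
      }, numeric(1))
      errors[i, j] <- error_fun[[metric]](y[(init + 1):n], preds)
    }
  }
  errors
}

y <- as.numeric(AirPassengers)
ks <- 1:5
ds <- 1:3
errors <- grid_errors(y, ks, ds, init = 100)

# Position of the smallest error gives the selected k and d.
best <- which(errors == min(errors), arr.ind = TRUE)[1, ]
c(k = ks[best[[1]]], d = ds[best[[2]]])
```

Each prediction only sees values up to the instant before it, so the reported error reflects out-of-sample behavior for every (k, d) pair.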


knn_optim_parallel    Parallel k and d optimization

Description

    Optimizes the values of k and d for a given time series. First, values corresponding to instants from init + 1 to the last one are predicted. The first value predicted, which corresponds to instant init + 1, is calculated using instants from 1 to instant init; the second value predicted, which corresponds to instant init + 2, is predicted using instants from 1 to instant init + 1; and so on until the last value, which corresponds to instant n (the length of the given time series), is predicted using instants from 1 to instant n - 1. Finally, the error is evaluated between the predicted values and the real values of the series.

    This version of the optimization function uses a parallelized distances calculation function, and the computation of the predicted values is parallelized by the number of d's and the number of instants to be predicted.

Usage

    knn_optim_parallel(y, k, d, v = 1, init = NULL, distance_metric = "euclidean", error_metric = "MAE", weight = "proximity", threads = NULL)

Arguments

    y                  A time series.
    k                  Values of k to be analyzed.
    d                  Values of d to be analyzed.
    v                  Variable to be predicted if given a multivariate time series.
    init               Variable that determines the limit of the known past for the first instant predicted.
    distance_metric    Type of metric to evaluate the distance between points. Many metrics are supported: euclidean, manhattan, dynamic time warping, canberra and others. For more information about the supported metrics, check the values that the method argument of function parDist (from the parallelDist package) can take, as this is the function used to calculate the distances. Link to the package info: https://cran.r-project.org/package=parallelDist. Some of the values that this argument can take are "euclidean", "manhattan", "dtw", "canberra", "chord".
    error_metric       Type of metric to evaluate the prediction error. Five metrics supported:
                           ME    Mean Error
                           RMSE  Root Mean Square Error
                           MAE   Mean Absolute Error
                           MPE   Mean Percentage Error
                           MAPE  Mean Absolute Percentage Error
    weight             Type of weight to be used when calculating the predicted value with a weighted mean. Three supported: proximity, same, linear.
                           proximity  the weight assigned to each neighbor is proportional to its distance
                           same       all neighbors are assigned the same weight
                           linear     the nearest neighbor is assigned weight k, the second closest neighbor weight k-1, and so on until the farthest neighbor, which is assigned a weight of 1
    threads            Number of threads to be used when parallelizing; default is the number of cores detected - 1, or 1 if there is only one core.

Value

    A matrix of errors, optimal k and d.

Examples

    knn_optim_parallel(AirPassengers, 1:5, 1:3, threads = 2)
    knn_optim_parallel(LakeHuron, 1:10, 1:6, threads = 2)


knn_optim_parallel2    Parallel k and d optimization

Description

    Optimizes the values of k and d for a given time series. First, values corresponding to instants from init + 1 to the last one are predicted. The first value predicted, which corresponds to instant init + 1, is calculated using instants from 1 to instant init; the second value predicted, which corresponds to instant init + 2, is predicted using instants from 1 to instant init + 1; and so on until the last value, which corresponds to instant n (the length of the given time series), is predicted using instants from 1 to instant n - 1. Finally, the error is evaluated between the predicted values and the real values of the series.

    This version of the optimization function uses a parallelized distances calculation function, and the computation of the predicted values is parallelized by the number of d's.

Usage

    knn_optim_parallel2(y, k, d, v = 1, init = NULL, distance_metric = "euclidean", error_metric = "MAE", weight = "proximity", threads = NULL)

Arguments

    y                  A time series.
    k                  Values of k to be analyzed.
    d                  Values of d to be analyzed.
    v                  Variable to be predicted if given a multivariate time series.
    init               Variable that determines the limit of the known past for the first instant predicted.
    distance_metric    Type of metric to evaluate the distance between points. Many metrics are supported: euclidean, manhattan, dynamic time warping, canberra and others. For more information about the supported metrics, check the values that the method argument of function parDist (from the parallelDist package) can take, as this is the function used to calculate the distances. Link to the package info: https://cran.r-project.org/package=parallelDist. Some of the values that this argument can take are "euclidean", "manhattan", "dtw", "canberra", "chord".
    error_metric       Type of metric to evaluate the prediction error. Five metrics supported:
                           ME    Mean Error
                           RMSE  Root Mean Square Error
                           MAE   Mean Absolute Error
                           MPE   Mean Percentage Error
                           MAPE  Mean Absolute Percentage Error
    weight             Type of weight to be used when calculating the predicted value with a weighted mean. Three supported: proximity, same, linear.
                           proximity  the weight assigned to each neighbor is proportional to its distance
                           same       all neighbors are assigned the same weight
                           linear     the nearest neighbor is assigned weight k, the second closest neighbor weight k-1, and so on until the farthest neighbor, which is assigned a weight of 1
    threads            Number of threads to be used when parallelizing; default is the number of cores detected - 1, or 1 if there is only one core.

Value

    A matrix of errors, optimal k and d.

Examples

    knn_optim_parallel2(AirPassengers, 1:5, 1:3, threads = 2)
    knn_optim_parallel2(LakeHuron, 1:10, 1:6, threads = 2)


knn_optim_parallelf    Parallel k and d optimization reading from files

Description

    Optimizes the values of k and d for a given time series. First, values corresponding to instants from init + 1 to the last one are predicted. The first value predicted, which corresponds to instant init + 1, is calculated using instants from 1 to instant init; the second value predicted, which corresponds to instant init + 2, is predicted using instants from 1 to instant init + 1; and so on until the last value, which corresponds to instant n (the length of the given time series), is predicted using instants from 1 to instant n - 1. Finally, the error is evaluated between the predicted values and the real values of the series.

    This version of the optimization function uses a parallelized distances calculation function, and the computation of the predicted values is parallelized by the number of d's and the number of instants to be predicted. Each thread that calculates predicted values reads only the part of the corresponding distances matrix that contains the information used to predict.

Usage

    knn_optim_parallelf(y, k, d, v = 1, init = NULL, error_metric = "MAE", weight = "proximity", threads = NULL, file, cols)

Arguments

    y                  A time series.
    k                  Values of k to be analyzed.
    d                  Values of d to be analyzed.
    v                  Variable to be predicted if given a multivariate time series.
    init               Variable that determines the limit of the known past for the first instant predicted.
    error_metric       Type of metric to evaluate the prediction error. Five metrics supported:
                           ME    Mean Error
                           RMSE  Root Mean Square Error
                           MAE   Mean Absolute Error
                           MPE   Mean Percentage Error
                           MAPE  Mean Absolute Percentage Error
    weight             Type of weight to be used when calculating the predicted value with a weighted mean. Three supported: proximity, same, linear.
                           proximity  the weight assigned to each neighbor is proportional to its distance
                           same       all neighbors are assigned the same weight
                           linear     the nearest neighbor is assigned weight k, the second closest neighbor weight k-1, and so on until the farthest neighbor, which is assigned a weight of 1
    threads            Number of threads to be used when parallelizing; default is the number of cores detected - 1, or 1 if there is only one core.
    file               Path and id of the files where the distances matrices are.
    cols               Number of columns per file.

Value

    A matrix of errors, optimal k and d.

Examples

    knn_distances(AirPassengers, 1:3, file = "AirPassengers", cols = 2, threads = 2)
    knn_optim_parallelf(AirPassengers, 1:5, 1:3, file = "AirPassengers", cols = 2, threads = 2)
    knn_distances(LakeHuron, 1:6, file = "LakeHuron", cols = 10, threads = 2)
    knn_optim_parallelf(LakeHuron, 1:10, 1:6, file = "LakeHuron", cols = 10, threads = 2)
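
The combination of a default thread count and workers that read their own distances file can be sketched with the base parallel package. This is only an illustration under assumptions: default_threads and nearest_to_last are hypothetical helpers, each worker here reads one whole per-d file rather than a slice of it, and the package itself imports doParallel and foreach rather than using parLapply.

```r
library(parallel)

# Default worker count as the manual describes it: detected cores - 1,
# or 1 on a single core. detectCores() may return NA, which this sketch
# also treats as a single core.
default_threads <- function() {
  cores <- detectCores()
  if (is.na(cores)) 1L else max(1L, cores - 1L)
}

# Write one distances matrix per d to its own file (naming is assumed).
y <- as.numeric(LakeHuron)
ds <- 1:3
paths <- file.path(tempdir(), sprintf("dist_d%d.rds", ds))
for (i in seq_along(ds)) {
  saveRDS(as.matrix(dist(embed(y, ds[i]))), paths[i])
}

# Worker task: load a single file and return the index of the window
# closest to the most recent one (the last row of the matrix).
nearest_to_last <- function(path) {
  m <- readRDS(path)
  last <- nrow(m)
  unname(which.min(m[last, -last]))
}

threads <- min(2L, default_threads())
cl <- makeCluster(threads)
result <- parLapply(cl, paths, nearest_to_last)
stopCluster(cl)

# The parallel result should match a plain sequential run.
sequential <- lapply(paths, nearest_to_last)
identical(result, sequential)
```

Passing file paths instead of matrices keeps the data sent to each worker small, since every worker loads only what it needs from disk.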


knn_past    Past time prediction

Description

    Predicts values of the time series using the k-nearest neighbors algorithm. Values corresponding to instants from init + 1 to the last one are predicted. The first value predicted, which corresponds to instant init + 1, is calculated using instants from 1 to instant init; the second value predicted, which corresponds to instant init + 2, is predicted using instants from 1 to instant init + 1; and so on until the last value, which corresponds to instant n (the length of the given time series), is predicted using instants from 1 to instant n - 1.

Usage

    knn_past(y, k, d, v = 1, init = NULL, distance_metric = "euclidean", weight = "proximity", threads = NULL)

Arguments

    y                  A time series.
    k                  Number of neighbors.
    d                  Length of each of the 'elements'.
    v                  Variable to be predicted if given a multivariate time series.
    init               Variable that determines the limit of the known past for the first instant predicted.
    distance_metric    Type of metric to evaluate the distance between points. Many metrics are supported: euclidean, manhattan, dynamic time warping, canberra and others. For more information about the supported metrics, check the values that the method argument of function parDist (from the parallelDist package) can take, as this is the function used to calculate the distances. Link to the package info: https://cran.r-project.org/package=parallelDist. Some of the values that this argument can take are "euclidean", "manhattan", "dtw", "canberra", "chord".
    weight             Type of weight to be used when calculating the predicted value with a weighted mean. Three supported: proximity, same, linear.
                           proximity  the weight assigned to each neighbor is proportional to its distance
                           same       all neighbors are assigned the same weight
                           linear     the nearest neighbor is assigned weight k, the second closest neighbor weight k-1, and so on until the farthest neighbor, which is assigned a weight of 1
    threads            Number of threads to be used when parallelizing; default is the number of cores detected - 1, or 1 if there is only one core.

Value

    The predicted values.

Examples

    knn_past(AirPassengers, 5, 2, threads = 2)
    knn_past(LakeHuron, 3, 6, threads = 2)
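
The three weight options can be illustrated with a small base R sketch. neighbor_weights is a hypothetical helper, not a function of the package. The "same" and "linear" schemes follow the descriptions in the manual directly; "proximity" is implemented here as inverse distance so that closer neighbors count more, which is an assumption about the intended meaning.

```r
# Normalized weights for k neighbors given their distances, ordered
# from nearest to farthest.
neighbor_weights <- function(dists, type = c("proximity", "same", "linear")) {
  type <- match.arg(type)
  k <- length(dists)
  w <- switch(type,
    # Assumption: inverse distance; the small constant avoids division by zero.
    proximity = 1 / (dists + .Machine$double.eps),
    same      = rep(1, k),
    # Nearest neighbor gets weight k, the farthest gets weight 1.
    linear    = k:1
  )
  w / sum(w)
}

dists  <- c(0.5, 1.0, 2.0)   # distances to the neighbors, nearest first
values <- c(10, 20, 40)      # value that followed each neighbor

# Weighted-mean prediction under each scheme.
predictions <- sapply(c("proximity", "same", "linear"), function(type) {
  sum(neighbor_weights(dists, type) * values)
})
predictions
```

With "same" the prediction is the plain mean of the neighbor values, while the other two schemes pull it toward the value that followed the nearest neighbor.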