Package bigreadr. R topics documented: August 13, Version Date Title Read Large Text Files

Version 0.1.3 Date 2018-08-12 Title Read Large Text Files Package bigreadr August 13, 2018 Read large text s by splitting them in smaller s. License GPL-3 Encoding UTF-8 LazyData true ByteCompile true RoxygenNote 6.1.0 Imports data.table, Rcpp, parallel, fpeek, utils Suggests spelling, testthat, covr LinkingTo Rcpp Language en-us URL https://github.com/privefl/bigreadr BugReports https://github.com/privefl/bigreadr/issues NeedsCompilation yes Author Florian Privé [aut, cre] Maintainer Florian Privé <florian.prive.21@gmail.com> Repository CRAN Date/Publication 2018-08-13 16:30:09 UTC R topics documented: big_fread1.......................................... 2 big_fread2.......................................... 2 cbind_df........................................... 3 fread2............................................ 4 fwrite2............................................ 4 nlines............................................ 5 rbind_df........................................... 6 split_........................................... 6 1

2 big_fread2 Index 8 big_fread1 Read large text Read large text by splitting lines. big_fread1(, every_nlines,.transform = identity,.combine = rbind_df, skip = 0,..., print_timings = TRUE) every_nlines.transform.combine skip Path to that you want to read. Maximum number of lines in new parts. Function to transform each data frame corresponding to each part of the. Default doesn t change anything. Function to combine results (list of data frames). Number of lines to skip at the beginning of.... Other arguments to be passed to data.table::fread, excepted input,, skip and col.names. print_timings Whether to print timings? Default is TRUE. A data.frame by default; a data.table when data.table = TRUE. big_fread2 Read large text Read large text by splitting columns. big_fread2(, nb_parts = NULL,.transform = identity,.combine = cbind_df, skip = 0, select = NULL, progress = FALSE, part_size = 500 * 1024^2,...)

cbind_df 3 nb_parts.transform.combine skip select progress Path to that you want to read. Number of parts in which to split reading (and transforming). Parts are referring to blocks of selected columns. Default uses part_size to set a good value. Function to transform each data frame corresponding to each block of selected columns. Default doesn t change anything. Function to combine results (list of data frames). Number of lines to skip at the beginning of. Indices of columns to keep (sorted). Default keeps them all. Show progress? Default is FALSE. part_size Size of the parts if nb_parts is not supplied. Default is 500 * 1024^2 (500 MB).... Other arguments to be passed to data.table::fread, excepted input,, skip, select and showprogress. The outputs of fread2 +.transform, combined with.combine. cbind_df Merge data frames Merge data frames cbind_df(list_df) list_df A list of multiple data frames with the same observations in the same order. One merged data frame. cbind_df(list(iris, iris))

4 fwrite2 fread2 Read a text Read a text fread2(,..., data.table = FALSE, nthread = getoption("bigreadr.nthread")) Path to the that you want to read from.... Other arguments to be passed to data.table::fread. data.table nthread Whether to return a data.table or just a data.frame? Default is FALSE (and is the opposite of data.table::fread). Number of threads to use. Default uses all threads minus one. A data.frame by default; a data.table when data.table = TRUE. tmp <- fwrite2(iris) iris2 <- fread2(tmp) all.equal(iris2, iris) ## fread doesn't use factors fwrite2 Write a data frame to a text Write a data frame to a text fwrite2(x, = temp(),..., quote = FALSE, nthread = getoption("bigreadr.nthread"))

nlines 5 x Data frame to write. Path to the that you want to write to. Defaults uses temp().... Other arguments to be passed to data.table::fwrite. quote Whether to quote strings (default is FALSE). nthread Number of threads to use. Default uses all threads minus one. Input parameter, invisibly. tmp <- fwrite2(iris) iris2 <- fread2(tmp) all.equal(iris2, iris) ## fread doesn't use factors nlines Number of lines Get the number of lines of a using R package fpeek. nlines() Path of the. The number of lines as one integer. tmp <- fwrite2(iris) nlines(tmp)

6 split_ rbind_df Merge data frames Merge data frames rbind_df(list_df) list_df A list of multiple data frames with the same variables in the same order. One merged data frame with the names of the first input data frame. rbind_df(list(iris, iris)) split_ Split every nlines Split every nlines Get s from splitting. split_(, every_nlines, prefix_out = temp()) get_split_s(split out) Path to that you want to split. every_nlines Maximum number of lines in new parts. prefix_out Prefix for created s. Default uses temp(). split out Output of split_.

split_ 7 A list with name_in: input parameter, prefix_out: input parameter prefix_out, ns: Number of s (parts) created, nlines_part: input parameter every_nlines, nlines_all: total number of lines of. Vector of paths created by split_. tmp <- fwrite2(iris) infos <- split_(tmp, 100) str(infos) get_split_s(infos)

Index big_fread1, 2 big_fread2, 2 cbind_df, 3 data.table::fread, 2 4 data.table::fwrite, 5 fread2, 4 fwrite2, 4 get_split_s (split_), 6 nlines, 5 rbind_df, 6 split_, 6, 6, 7 8