Package GetHFData. November 28, 2017


Title Download and Aggregate High Frequency Trading Data from Bovespa
Version 1.5
Date 2017-11-27
Description Downloads and aggregates high frequency trading data for Brazilian instruments directly from Bovespa ftp site <ftp://ftp.bmf.com.br/marketdata/>.
Depends R (>= 3.3.0)
Imports stringr, stats, RCurl, lubridate, readr, utils, curl, dplyr
License GPL-2
BugReports https://github.com/msperlin/gethfdata/issues
URL https://github.com/msperlin/gethfdata/
LazyData true
RoxygenNote 6.0.1
Suggests knitr, rmarkdown, testthat, ggplot2
VignetteBuilder knitr
NeedsCompilation no
Author Marcelo Perlin [aut, cre], Henrique Ramos [ctb]
Maintainer Marcelo Perlin <marceloperlin@gmail.com>
Repository CRAN
Date/Publication 2017-11-28 19:54:43 UTC

R topics documented:
    add.order
    ghfd_build_lob
    ghfd_download_file
    ghfd_get_available_tickers_from_file
    ghfd_get_available_tickers_from_ftp
    ghfd_get_ftp_contents

    ghfd_get_hf_data
    ghfd_read_file
    ghfd_read_file.orders
    ghfd_read_file.trades
    organize.lob
    print.lob
    process.lob.from.df
    Index


add.order    Adds an order to the LOB

Description
    Adds an order to the LOB

Usage
    add.order(my.lob, order.in, silent = TRUE)

Arguments
    my.lob      A LOB (order book)
    order.in    An order from the data
    silent      Should the function print progress? (TRUE (default) or FALSE)

Value
    An LOB with the new order

Examples
    # no example (internal)


ghfd_build_lob    Building LOB (limit order book) from orders

Description
    Building LOB (limit order book) from orders

Usage
    ghfd_build_lob(df.orders, silent = TRUE)

Arguments
    df.orders    A dataframe, output from ghfd_get_hf_data
    silent       Should the function print progress? (TRUE (default) or FALSE)

Value
    A dataframe with information about LOB

Examples
    ## Not run:
    library(GetHFData)

    my.assets <- NULL  # NULL (the default) imports all available tickers
    first.time <- '11:00:00'
    last.time <- '17:00:00'
    first.date <- as.Date('2015-11-03')
    last.date <- as.Date('2015-11-03')
    type.output <- 'raw'
    type.data <- 'orders'
    type.market <- 'equity-odds'

    df.out <- ghfd_get_hf_data(my.assets = my.assets,
                               type.market = type.market,
                               type.data = type.data,
                               first.date = first.date,
                               last.date = last.date,
                               first.time = first.time,
                               last.time = last.time,
                               type.output = type.output)

    df.lob <- ghfd_build_lob(df.out)
    ## End(Not run)


ghfd_download_file    Downloads a single file from Bovespa ftp

Description
    This function takes as input an ftp address and the name of the downloaded file in the local drive, and downloads the corresponding file. Returns TRUE if it worked and FALSE otherwise.

Usage
    ghfd_download_file(my.ftp, out.file, dl.dir = "Dl Files", max.dl.tries = 10)
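The retry behavior suggested by max.dl.tries and the TRUE/FALSE return value can be sketched as follows. This is only an illustration in base R, not the package's implementation: the helper download_with_retries and the simulated fetcher are hypothetical, and no network access is involved.

```r
# Hypothetical helper: call `fetch` until it succeeds or max.tries is reached.
# `fetch` is any function that performs one download attempt and signals
# failure by throwing an error.
download_with_retries <- function(fetch, max.tries = 10) {
  for (i.try in seq_len(max.tries)) {
    ok <- tryCatch({
      fetch()
      TRUE
    }, error = function(e) FALSE)
    if (ok) return(TRUE)
  }
  FALSE
}

# Simulated fetcher that fails twice and then succeeds (no network needed).
n.calls <- 0
flaky.fetch <- function() {
  n.calls <<- n.calls + 1
  if (n.calls < 3) stop("simulated connection failure")
  invisible(NULL)
}

result.ok  <- download_with_retries(flaky.fetch, max.tries = 10)
result.bad <- download_with_retries(function() stop("always fails"),
                                    max.tries = 3)
print(c(result.ok, result.bad))
```

Passing the attempt as a function keeps the retry logic separate from the transfer itself, so the same loop could wrap any downloader.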

Arguments
    my.ftp          A complete ftp address, including the file name, to download the file from
    out.file        Name of downloaded file with HFT data from Bovespa
    dl.dir          The folder to download the zip files to (default = "Dl Files")
    max.dl.tries    Maximum attempts to download the files from ftp

Value
    TRUE if successful, FALSE if not

Examples
    my.ftp <- 'ftp://ftp.bmf.com.br/marketdata/bovespa-opcoes/neg_opcoes_20151229.zip'
    out.file <- 'temp.zip'

    ## Not run:
    ghfd_download_file(my.ftp = my.ftp, out.file = out.file)
    ## End(Not run)


ghfd_get_available_tickers_from_file    Function to get available tickers from downloaded zip file

Description
    This function will read the zip file downloaded from Bovespa and output a numeric vector where the names of the elements represent the different tickers and the numeric values are the number of trades for each ticker

Usage
    ghfd_get_available_tickers_from_file(out.file)

Arguments
    out.file    Name of downloaded file with HFT data from Bovespa

Value
    A dataframe with the number of trades for each ticker found in file

Examples
    ## get file from package (usually this would have been downloaded from the ftp)
    out.file <- system.file("extdata", 'NEG_OPCOES_20151126.zip', package = "GetHFData")

    df.tickers <- ghfd_get_available_tickers_from_file(out.file)
    print(head(df.tickers))


ghfd_get_available_tickers_from_ftp    Function to get available tickers from ftp

Description
    This function will read the Bovespa ftp for a given market/date and output a numeric vector where the names of the elements represent the different tickers and the numeric values are the number of trades for each ticker

Usage
    ghfd_get_available_tickers_from_ftp(my.date = "2015-11-03",
      type.market = "equity", type.data = "trades", dl.dir = "ftp files",
      max.dl.tries = 10)

Arguments
    my.date         A single date to check tickers in ftp (e.g. "2015-11-03")
    type.market     The type of market to download data from ("equity", "equity-odds", "options", "BMF").
    type.data       The type of financial data to download and aggregate ("trades" or "orders").
    dl.dir          The folder to download the zip files to (default = "ftp files")
    max.dl.tries    Maximum attempts to download the files from ftp

Value
    A data.frame with the tickers, number of found trades and file name

Examples
    ## Not run:
    df.tickers <- ghfd_get_available_tickers_from_ftp(my.date = '2015-11-03',
                                                      type.market = 'BMF')
    print(head(df.tickers))
    ## End(Not run)
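Both ticker functions above report the number of trades per ticker. A minimal base R sketch of that tabulation is shown below. The data and the column names (InstrumentSymbol, TradePrice, tickers, n.trades) are hypothetical and chosen only for illustration; the layout of the package's actual output is not taken from the manual.

```r
# Hypothetical trade records; real data would come from a Bovespa zip file.
trades <- data.frame(
  InstrumentSymbol = c("PETR4", "VALE5", "PETR4", "PETR4", "VALE5", "ABEV3"),
  TradePrice       = c(8.1, 13.2, 8.2, 8.0, 13.1, 18.5),
  stringsAsFactors = FALSE
)

# table() counts rows per ticker; sort so the most traded ticker comes first.
counts <- sort(table(trades$InstrumentSymbol), decreasing = TRUE)

# Store the result as a data frame with one row per ticker.
df.tickers <- data.frame(
  tickers  = names(counts),
  n.trades = as.integer(counts),
  stringsAsFactors = FALSE
)
print(df.tickers)
```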

ghfd_get_ftp_contents    Gets the contents of Bovespa ftp

Description
    This function will access the Bovespa ftp and return a vector with all files related to trades (all others are ignored)

Usage
    ghfd_get_ftp_contents(type.market = "equity", max.dl.tries = 10,
      type.data = "trades")

Arguments
    type.market     The type of market to download data from ("equity", "equity-odds", "options", "BMF").
    max.dl.tries    Maximum attempts to download the files from ftp
    type.data       The type of financial data to download and aggregate ("trades" or "orders").

Value
    A list with all files from the ftp that are related to executed trades

Examples
    ## Not run:
    ftp.files <- ghfd_get_ftp_contents(type.market = 'equity')
    print(ftp.files)
    ## End(Not run)


ghfd_get_hf_data    Downloads and aggregates high frequency trading data directly from the Bovespa ftp

Description
    This function downloads zip files containing trades from Bovespa's ftp (ftp://ftp.bmf.com.br/marketdata/) and imports them into R. See the vignette and examples for more details on how to use the function.

Usage
    ghfd_get_hf_data(my.assets = NULL, type.matching = "exact",
      type.market = "equity", type.data = "trades",
      first.date = "2016-01-01", last.date = "2016-01-05",
      first.time = NULL, last.time = NULL, type.output = "agg",
      agg.diff = "15 min", dl.dir = "ftp files", max.dl.tries = 10,
      clean.files = FALSE, only.dl = FALSE)

Arguments
    my.assets        The tickers (symbols) of the desired assets to import data (e.g. c("PETR4", "VALE5")). The function allows for partial matching (e.g. "PETR" for all assets related to Petrobras). Default is set to NULL (download all available tickers)
    type.matching    Type of matching for asset names in data ("exact" or "partial")
    type.market      The type of market to download data from ("equity", "equity-odds", "options", "BMF").
    type.data        The type of financial data to download and aggregate ("trades" or "orders").
    first.date       The first date of the imported data (e.g. "2016-01-01")
    last.date        The last date of the imported data (e.g. "2016-01-05")
    first.time       The first intraday period to import the data. All trades/orders before this time of day are ignored. As character, e.g. "10:00:00".
    last.time        The last intraday period to import the data. All trades/orders after this time of day are ignored. As character, e.g. "18:00:00".
    type.output      Defines the type of output of the data. The choice "agg" outputs aggregated data for time intervals defined in agg.diff. The choice "raw" outputs the raw, tick by tick/order by order, data from the zip files.
    agg.diff         The time interval used in the aggregation of data. Only used for type.output = "agg". It should contain an integer followed by a time unit ("sec" or "secs", "min" or "mins", "hour" or "hours", "day" or "days"). Example: agg.diff = "15 mins", agg.diff = "1 hour".
    dl.dir           The folder to download the zip files to (default = "ftp files")
    max.dl.tries     Maximum attempts to download the files from ftp
    clean.files      Logical. Should the files be removed after reading them? (TRUE or FALSE)
    only.dl          Logical. Should the function only download the files? (TRUE or FALSE). This is useful if you just want the files for later analysis

Value
    A dataframe with the financial data in the raw format (tick by tick) or aggregated

Examples
    my.assets <- 'ABEVA69'
    type.market <- 'options'
    first.date <- as.Date('2015-12-29')

    last.date <- as.Date('2015-12-29')

    ## Not run:
    # arguments are passed by name: type.matching sits between my.assets
    # and type.market in the signature, so positional matching would misassign them
    df.out <- ghfd_get_hf_data(my.assets = my.assets,
                               type.market = type.market,
                               first.date = first.date,
                               last.date = last.date)
    ## End(Not run)


ghfd_read_file    Reads zip file downloaded from Bovespa ftp (trades or orders)

Description
    Reads zip file downloaded from Bovespa ftp (trades or orders)

Usage
    ghfd_read_file(out.file, my.assets = NULL, type.matching = "exact",
      type.data = "trades", first.time = "10:00:00",
      last.time = "17:00:00", type.output = "agg", agg.diff = "15 min")

Arguments
    out.file         Name of zip file
    my.assets        The tickers (symbols) of the desired assets to import data (e.g. c("PETR4", "VALE5")). The function allows for partial matching (e.g. "PETR" for all assets related to Petrobras). Default is set to NULL (download all available tickers)
    type.matching    Type of matching for asset names in data ("exact" or "partial")
    type.data        The type of financial data to download and aggregate ("trades" or "orders").
    first.time       The first intraday period to import the data. All trades/orders before this time of day are ignored. As character, e.g. "10:00:00".
    last.time        The last intraday period to import the data. All trades/orders after this time of day are ignored. As character, e.g. "18:00:00".
    type.output      Defines the type of output of the data. The choice "agg" outputs aggregated data for time intervals defined in agg.diff. The choice "raw" outputs the raw, tick by tick/order by order, data from the zip files.
    agg.diff         The time interval used in the aggregation of data. Only used for type.output = "agg". It should contain an integer followed by a time unit ("sec" or "secs", "min" or "mins", "hour" or "hours", "day" or "days"). Example: agg.diff = "15 mins", agg.diff = "1 hour".

Value
    A dataframe with the raw (tick by tick/order by order) dataset
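The difference between the two type.matching modes can be illustrated with a small base R sketch. The helper match_assets is hypothetical, and it assumes that "partial" means a substring match; the manual does not state how the package implements partial matching.

```r
# Hypothetical helper: returns a logical vector marking the symbols to keep.
match_assets <- function(symbols, my.assets, type.matching = "exact") {
  # NULL keeps every symbol, mirroring the documented default for my.assets
  if (is.null(my.assets)) return(rep(TRUE, length(symbols)))

  if (type.matching == "exact") {
    symbols %in% my.assets
  } else if (type.matching == "partial") {
    # keep a symbol if it contains any of the requested strings (assumption)
    hits <- lapply(my.assets, function(a) grepl(a, symbols, fixed = TRUE))
    Reduce("|", hits)
  } else {
    stop("type.matching must be 'exact' or 'partial'")
  }
}

symbols <- c("PETR4", "PETR3", "PETRL78", "VALE5")

exact.idx   <- match_assets(symbols, "PETR4", type.matching = "exact")
partial.idx <- match_assets(symbols, "PETR", type.matching = "partial")

print(symbols[exact.idx])
print(symbols[partial.idx])
```

With these inputs, exact matching keeps only PETR4, while partial matching on "PETR" keeps PETR4, PETR3 and PETRL78.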

Examples
    my.assets <- c('ABEVA20', 'PETRL78')

    ## getting data from local file (in practice it would be downloaded from ftp)
    out.file <- system.file("extdata", 'NEG_OPCOES_20151126.zip', package = "GetHFData")

    df.out <- ghfd_read_file(out.file, my.assets)
    print(head(df.out))


ghfd_read_file.orders    Reads zip file downloaded from Bovespa ftp (orders) - INTERNAL USE

Description
    Reads zip file downloaded from Bovespa ftp (orders) - INTERNAL USE

Usage
    ghfd_read_file.orders(out.file, my.assets = NULL, type.matching = NULL,
      first.time = "10:00:00", last.time = "17:00:00",
      type.output = "agg", agg.diff = "15 min")

Arguments
    out.file         Name of zip file
    my.assets        The tickers (symbols) of the desired assets to import data (e.g. c("PETR4", "VALE5")). The function allows for partial matching (e.g. "PETR" for all assets related to Petrobras). Default is set to NULL (download all available tickers)
    type.matching    Type of matching for asset names in data ("exact" or "partial")
    first.time       The first intraday period to import the data. All trades/orders before this time of day are ignored. As character, e.g. "10:00:00".
    last.time        The last intraday period to import the data. All trades/orders after this time of day are ignored. As character, e.g. "18:00:00".
    type.output      Defines the type of output of the data. The choice "agg" outputs aggregated data for time intervals defined in agg.diff. The choice "raw" outputs the raw, tick by tick/order by order, data from the zip files.
    agg.diff         The time interval used in the aggregation of data. Only used for type.output = "agg". It should contain an integer followed by a time unit ("sec" or "secs", "min" or "mins", "hour" or "hours", "day" or "days"). Example: agg.diff = "15 mins", agg.diff = "1 hour".

Value
    A dataframe with trade data (aggregated or raw)

Examples
    # no example


ghfd_read_file.trades    Reads zip file downloaded from Bovespa ftp (trades) - INTERNAL USE

Description
    Reads zip file downloaded from Bovespa ftp (trades) - INTERNAL USE

Usage
    ghfd_read_file.trades(out.file, my.assets = NULL, type.matching = NULL,
      first.time = "10:00:00", last.time = "17:00:00",
      type.output = "agg", agg.diff = "15 min")

Arguments
    out.file         Name of zip file
    my.assets        The tickers (symbols) of the desired assets to import data (e.g. c("PETR4", "VALE5")). The function allows for partial matching (e.g. "PETR" for all assets related to Petrobras). Default is set to NULL (download all available tickers)
    type.matching    Type of matching for asset names in data ("exact" or "partial")
    first.time       The first intraday period to import the data. All trades/orders before this time of day are ignored. As character, e.g. "10:00:00".
    last.time        The last intraday period to import the data. All trades/orders after this time of day are ignored. As character, e.g. "18:00:00".
    type.output      Defines the type of output of the data. The choice "agg" outputs aggregated data for time intervals defined in agg.diff. The choice "raw" outputs the raw, tick by tick/order by order, data from the zip files.
    agg.diff         The time interval used in the aggregation of data. Only used for type.output = "agg". It should contain an integer followed by a time unit ("sec" or "secs", "min" or "mins", "hour" or "hours", "day" or "days"). Example: agg.diff = "15 mins", agg.diff = "1 hour".

Value
    A dataframe with trade data (aggregated or raw)

Examples
    # no example
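The aggregation controlled by type.output = "agg" and agg.diff can be pictured as assigning each tick to a fixed-width time bucket and then summarising each bucket. The sketch below is a base R illustration under stated assumptions: the helper parse_agg_diff, the tick data, and the output columns (n.trades, last.price, sum.volume) are all hypothetical, and the package may compute different statistics or align its intervals differently.

```r
# Hypothetical parser: turn strings such as "15 min" or "1 hour" into seconds.
parse_agg_diff <- function(agg.diff) {
  parts <- strsplit(agg.diff, " ", fixed = TRUE)[[1]]
  n     <- as.integer(parts[1])
  unit  <- sub("s$", "", parts[2])  # "mins" -> "min", "hours" -> "hour"
  secs  <- c(sec = 1, min = 60, hour = 3600, day = 86400)[[unit]]
  n * secs
}

# Hypothetical tick data for a single asset, already sorted by time.
ticks <- data.frame(
  time   = as.POSIXct(c("2015-11-03 10:01:10", "2015-11-03 10:07:45",
                        "2015-11-03 10:16:02", "2015-11-03 10:29:59",
                        "2015-11-03 10:31:00"), tz = "UTC"),
  price  = c(10.0, 10.2, 10.1, 10.4, 10.3),
  volume = c(100, 200, 150, 50, 300)
)

# Floor each timestamp to the start of its interval.
agg.secs <- parse_agg_diff("15 min")
bucket   <- as.POSIXct(floor(as.numeric(ticks$time) / agg.secs) * agg.secs,
                       origin = "1970-01-01", tz = "UTC")
key      <- format(bucket, "%H:%M:%S")

# Summarise each interval: trade count, last price and total volume.
n.by.key <- tapply(ticks$price, key, length)
agg <- data.frame(
  interval   = names(n.by.key),
  n.trades   = as.integer(n.by.key),
  last.price = as.numeric(tapply(ticks$price, key, function(x) x[length(x)])),
  sum.volume = as.numeric(tapply(ticks$volume, key, sum)),
  stringsAsFactors = FALSE
)
print(agg)
```

Flooring the numeric timestamp keeps the buckets aligned to the clock (10:00, 10:15, 10:30) regardless of when the first tick arrives.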

organize.lob    Organizes LOB (internal function)

Description
    This internal recursive function organizes the LOB by making sure that all prices and times are ordered. Every time the prices in the bid and ask match, it will create a trade and modify the LOB accordingly.

Usage
    organize.lob(my.lob, silent = TRUE)

Arguments
    my.lob    A LOB (order book)
    silent    Should the function print progress? (TRUE (default) or FALSE)

Value
    An organized LOB

Examples
    # no examples (internal)


print.lob    Prints the LOB

Description
    Prints the LOB

Usage
    ## S3 method for class 'lob'
    print(my.lob, max.level = 3)

Arguments
    my.lob       A LOB (order book)
    max.level    Max level of LOB to print

Value
    Nothing

Examples
    # no example (internal)


process.lob.from.df    Process LOB from asset dataframe

Description
    Process LOB from asset dataframe

Usage
    process.lob.from.df(asset.df, silent = T)

Arguments
    asset.df    A dataframe with orders for a single asset
    silent      Should the function print progress? (TRUE (default) or FALSE)

Value
    The LOB for the single asset

Examples
    # no example (internal)
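The LOB functions documented above are internal and come without examples. The following base R sketch illustrates the general idea described for organize.lob: order both sides of the book and generate a trade whenever bid and ask prices meet. Everything here is an assumption made for illustration, not the package's data structure or algorithm: the LOB is represented as a list with bid and ask data frames, a trade is triggered when the best bid is at or above the best ask, the trade is recorded at the best ask price, and the loop is iterative rather than recursive.

```r
# Hypothetical helper: order one side of the book, best price first,
# then by arrival time.
sort_side <- function(side, decreasing) {
  ord <- order(if (decreasing) -side$price else side$price, side$time)
  side[ord, , drop = FALSE]
}

# Hypothetical sketch of organizing a LOB and matching crossing orders.
organize_lob_sketch <- function(lob) {
  lob$bid <- sort_side(lob$bid, decreasing = TRUE)
  lob$ask <- sort_side(lob$ask, decreasing = FALSE)
  trades  <- data.frame(price = numeric(0), qty = numeric(0))

  while (nrow(lob$bid) > 0 && nrow(lob$ask) > 0 &&
         lob$bid$price[1] >= lob$ask$price[1]) {
    qty <- min(lob$bid$qty[1], lob$ask$qty[1])
    # assumption: the trade is recorded at the best ask price
    trades <- rbind(trades, data.frame(price = lob$ask$price[1], qty = qty))
    lob$bid$qty[1] <- lob$bid$qty[1] - qty
    lob$ask$qty[1] <- lob$ask$qty[1] - qty
    # drop orders that were completely filled
    lob$bid <- lob$bid[lob$bid$qty > 0, , drop = FALSE]
    lob$ask <- lob$ask[lob$ask$qty > 0, , drop = FALSE]
  }
  list(lob = lob, trades = trades)
}

lob <- list(
  bid = data.frame(price = c(10.0, 10.2), qty = c(100, 50), time = c(1, 2)),
  ask = data.frame(price = c(10.3, 10.1), qty = c(80, 70), time = c(1, 3))
)

res <- organize_lob_sketch(lob)
print(res$trades)
print(res$lob)
```

In this example the bid at 10.2 crosses the ask at 10.1, so 50 units trade; the bid is fully filled and the ask keeps 20 units.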

Index
    add.order
    ghfd_build_lob
    ghfd_download_file
    ghfd_get_available_tickers_from_file
    ghfd_get_available_tickers_from_ftp
    ghfd_get_ftp_contents
    ghfd_get_hf_data
    ghfd_read_file
    ghfd_read_file.orders
    ghfd_read_file.trades
    organize.lob
    print.lob
    process.lob.from.df