Package reconstructr

Similar documents
Package cattonum. R topics documented: May 2, Type Package Version Title Encode Categorical Features

Package validara. October 19, 2017

Package triebeard. August 29, 2016

Package geojsonsf. R topics documented: January 11, Type Package Title GeoJSON to Simple Feature Converter Version 1.3.

Package ECctmc. May 1, 2018

Package rwars. January 14, 2017

Package fastdummies. January 8, 2018

Package lumberjack. R topics documented: July 20, 2018

Package ezsummary. August 29, 2016

Package robotstxt. November 12, 2017

Package fst. December 18, 2017

Package githubinstall

Package canvasxpress

Package bigreadr. R topics documented: August 13, Version Date Title Read Large Text Files

Package spark. July 21, 2017

Package docxtools. July 6, 2018

Package intcensroc. May 2, 2018

Package urltools. October 17, 2016

Package fst. June 7, 2018

Package knitrprogressbar

Package jpmesh. December 4, 2017

Package dbx. July 5, 2018

Package nngeo. September 29, 2018

Package opencage. January 16, 2018

Package kdtools. April 26, 2018

Package crossword.r. January 19, 2018

Package fitbitscraper

Package rzeit2. January 7, 2019

Package dkanr. July 12, 2018

Package padr. November 16, 2017

Package farver. November 20, 2018

Package WikipediR. February 5, 2017

Package loggit. April 9, 2018

Package pdfsearch. July 10, 2018

Package jdx. R topics documented: January 9, Type Package Title 'Java' Data Exchange for 'R' and 'rjava'

Package strat. November 23, 2016

Package rgho. R topics documented: January 18, 2017

Package humanize. R topics documented: April 4, Version Title Create Values for Human Consumption

Package climber. R topics documented:

Package rbraries. April 18, 2018

Package timechange. April 26, 2018

Package understandbpmn

Package ompr. November 18, 2017

Package censusr. R topics documented: June 14, Type Package Title Collect Data from the Census API Version 0.0.

Package BiocManager. November 13, 2018

Package facerec. May 14, 2018

Package httpcache. October 17, 2017

Package rucrdtw. October 13, 2017

Package projector. February 27, 2018

Package RPresto. July 13, 2017

Package ramcmc. May 29, 2018

Package guardianapi. February 3, 2019

Package leaflet.minicharts

Package rtext. January 23, 2019

Package statsdk. September 30, 2017

Package textrank. December 18, 2017

Package epitab. July 4, 2018

Package splithalf. March 17, 2018

Package tidytransit. March 4, 2019

Package semver. January 6, 2017

Package labelvector. July 28, 2018

Package MTLR. March 9, 2019

Package svalues. July 15, 2018

Package wikitaxa. December 21, 2017

Package readxl. April 18, 2017

Package virustotal. May 1, 2017

Package lexrankr. December 13, 2017

Package PUlasso. April 7, 2018

Package bisect. April 16, 2018

Package snplist. December 11, 2017

Package WordR. September 7, 2017

Package postal. July 27, 2018

Package reval. May 26, 2015

Package patentsview. July 12, 2017

Package wrswor. R topics documented: February 2, Type Package

Package milr. June 8, 2017

Package internetarchive

Package RODBCext. July 31, 2017

Package eply. April 6, 2018

Package BANEScarparkinglite

Package vinereg. August 10, 2018

Package repec. August 31, 2018

Package modules. July 22, 2017

Package tibble. August 22, 2017

Package textrecipes. December 18, 2018

Package radiomics. March 30, 2018

Package UNF. June 13, 2017

Package geoops. March 19, 2018

Package readobj. November 2, 2017

Package deductive. June 2, 2017

Package skm. January 23, 2017

Package rdryad. June 18, 2018

Package datasets.load

Package tidyimpute. March 5, 2018

Package sparsehessianfd

Package Rtsne. April 14, 2017

Package lvec. May 24, 2018

Package fastrtext. December 10, 2017

Package editdata. October 7, 2017

Package bupar. March 21, 2018

Transcription:

Type Package Title Session Reconstruction and Analysis Version 2.0.2 Date 2018-07-26 Author Oliver Keyes Package reconstructr July 26, 2018 Maintainer Oliver Keyes <ironholds@gmail.com> Functions to reconstruct sessions from web log or other user trace data and calculate various metrics around them, producing tabular, output that is compatible with 'dplyr' or 'data.table' centered processes. License MIT + file LICENSE LinkingTo Rcpp Imports Rcpp, openssl Suggests testthat, knitr VignetteBuilder knitr URL https://github.com/ironholds/reconstructr BugReports https://github.com/ironholds/reconstructr/issues RoxygenNote 6.0.1 Depends R (>= 3.3.0) NeedsCompilation yes Repository CRAN Date/Publication 2018-07-26 19:00:03 UTC R topics documented: bounce_rate......................................... 2 reconstructr......................................... 3 sessionise.......................................... 3 session_count........................................ 4 session_dataset....................................... 5 session_length........................................ 5 time_on_page........................................ 6 1

2 bounce_rate Index 8 bounce_rate calculate the bounce rate within a session dataset Calculates the "bounce rate" within a set of sessions - the proportion of sessions consisting only of a single event. bounce_rate(sessions, user_id = NULL, precision = 2) sessions user_id precision a sessions dataset, presumably generated with sessionise. a column that contains unique user IDs. NULL by default; if set, the assumption will be that you want per-user bounce rates. the number of decimal places to round the output to - set to 2 by default. either a single numeric value, representing the percentage of sessions overall that are bounces, or a data.frame of user IDs and bounce rates if user_id is set to a column rather than NULL. See Also sessionise for session reconstruction, and session_length, session_count and time_on_page for other session-related metrics. #Load and sessionise the dataset sessions <- sessionise(session_dataset, timestamp, uuid) # Calculate overall bounce rate rate <- bounce_rate(sessions) # Calculate bounce rate on a per-user basis per_user <- bounce_rate(sessions, user_id = uuid)

reconstructr 3 reconstructr functions for session reconstruction and analysis sessionreconstruct provides functions to aid in reconstructing and analysing user sessions. Although primarily designed for web sessions (see the introductory vignette), its session approach is plausibly applicable to other domains. Author(s) Oliver Keyes <okeyes@wikimedia.org> sessionise Reconstruct sessions (experimental) sessionise takes a data.frame of events (including timestamps and user IDs) and sessionises them, returning the same data.frame but with two additional columns - one containing a unique session ID, and one the time difference between successive events in the same session. sessionise(x, timestamp, user_id, threshold = 3600) x timestamp user_id threshold a data.frame of events. the name of the column of x containing timestamps, which should be (either) a representation of the number of seconds, or a POSIXct or POSIXlt date/time object. If it is neither, strptime can be used to convert most representations of date-times into POSIX formats. the name of the column of x containing unique user IDs. the number of seconds to use as the intertime threshold - the time that can elapse between two events before the second is considered part of a new session. Set to 3600 (one hour) by default. x, ordered by userid and timestamp, with two new columns - session_id (containing a unique ID for the session a row is in) and delta (containing the time elapsed between that row s event, and the previous event, if they were both in the same session).

4 session_count See Also bounce_rate, time_on_page, session_length and session_count - common metrics that can be calculated with a sessionised dataset. # Take a dataset with URLs and similar metadata and sessionise it - # retaining that metadata sessionised_data <- sessionise(x = session_dataset, timestamp = timestamp, user_id = uuid, threshold = 1800) session_count Count the number of sessions in a sessionised dataset link{session_count} counts the number of sessions in a sessionised dataset, producing either a count for the overall dataset or on a per-user basis (see below). session_count(sessions, user_id = NULL) sessions user_id a dataset of sessions, presumably generated by sessionise the column of sessions containing user IDs. If NULL (the default), a single count of sessions for the entire dataset will be generated. Otherwise, a data.frame of user IDs and the session count for each user ID will be returned. either a single integer value or a data.frame (see above). #Load and sessionise the dataset sessions <- sessionise(session_dataset, timestamp, uuid) # Calculate overall bounce rate count <- session_count(sessions)

session_dataset 5 # Calculate session count on a per-user basis per_user <- session_count(sessions, user_id = uuid) session_dataset Example event dataset an example dataset of events, for experimenting with session reconstruction and analysis session_dataset Format a data.frame of 63,524 rows consisting of: uuid Hashed and salted unique identifiers representing 10,000 unique clients. timestamp timestamps, as POSIXct objects url URLs, to demonstrate the carrying-along of metadata through the sessionisation process Source The uuid and timestamp columns come from an anonymised dataset of Wikipedia readers; the URLs are from NASA s internal web server, because space is awesome. session_length Calculate session length Calculate the overall length of each session. session_length(sessions) sessions a dataset of sessions, presumably generated with sessionise.

6 time_on_page a data.frame of two columns - session_id, containing unique session IDs, and session_length, containing the length (in seconds) of that particular session. Please note that these lengths should be considered a minimum; because of how sessions behave, calculating the time-on-page of the last event in a session is impossible. See Also sessionise for session reconstruction, and time_on_page, session_count and bounce_rate for other session-related metrics. #Load and sessionise the dataset sessions <- sessionise(session_dataset, timestamp, uuid) # Calculate session length len <- session_length(sessions) time_on_page Calculate time-on-page metrics time_on_page generates metrics around the mean (or median) time-on-page - on an overall, peruser, or per-session basis. time_on_page(sessions, by_session = FALSE, median = FALSE, precision = 2) sessions by_session median precision a sessions dataset, presumably generated with sessionise. Whether to generate time-on-page for the dataset overall (FALSE), or on a persession basis (TRUE). FALSE by default. whether to generate the median (TRUE) or mean (FALSE) time-on-page. FALSE by default. the number of decimal places to round the output to - set to 2 by default. either a single numeric value, representing the mean/median time on page for the overall dataset, or a data.frame of session IDs and numeric values if by_session is TRUE.

time_on_page 7 See Also sessionise for session reconstruction, and session_length, session_count and bounce_rate for other session-related metrics. #Load and sessionise the dataset sessions <- sessionise(session_dataset, timestamp, uuid) # Calculate overall time on page top <- time_on_page(sessions) # Calculate time-on-page on a per_session basis per_session <- time_on_page(sessions, by_session = TRUE) # Use median instead of mean top_med <- time_on_page(sessions, median = TRUE)

Index Topic datasets session_dataset, 5 bounce_rate, 2, 4, 6, 7 reconstructr, 3 reconstructr-package (reconstructr), 3 session_count, 2, 4, 4, 6, 7 session_dataset, 5 session_length, 2, 4, 5, 7 sessionise, 2, 3, 4 7 strptime, 3 time_on_page, 2, 4, 6, 6 8