Package pksea. December 22, 2017

Similar documents
Package datasets.load

Package cattonum. R topics documented: May 2, Type Package Version Title Encode Categorical Features

Package IATScore. January 10, 2018

Package sfc. August 29, 2016

Package sigqc. June 13, 2018

Package TANDEM. R topics documented: June 15, Type Package

Package kirby21.base

Package crossword.r. January 19, 2018

Package messaging. May 27, 2018

Package jstree. October 24, 2017

Package fastdummies. January 8, 2018

Package repec. August 31, 2018

Package spelling. December 18, 2017

Package comparedf. February 11, 2019

Package tidyimpute. March 5, 2018

Package gtrendsr. October 19, 2017

Package geojsonsf. R topics documented: January 11, Type Package Title GeoJSON to Simple Feature Converter Version 1.3.

Package robotstxt. November 12, 2017

Package mmpa. March 22, 2017

Package textrank. December 18, 2017

Package barcoder. October 26, 2018

Package bisect. April 16, 2018

Package SEMrushR. November 3, 2018

Package lxb. R topics documented: August 29, 2016

Package oasis. February 21, 2018

Package PCADSC. April 19, 2017

Package fastrtext. December 10, 2017

Package PlasmaMutationDetector

Package spark. July 21, 2017

Package effectr. January 17, 2018

Package autoshiny. June 25, 2018

Package gtrendsr. August 4, 2018

Package pwrab. R topics documented: June 6, Type Package Title Power Analysis for AB Testing Version 0.1.0

Package RNAseqNet. May 16, 2017

Package ecoseries. R topics documented: September 27, 2017

Package QCAtools. January 3, 2017

Package dalmatian. January 29, 2018

Package dbx. July 5, 2018

Package TVsMiss. April 5, 2018

Package calpassapi. August 25, 2018

Package auctestr. November 13, 2017

Package gquad. June 7, 2017

Package ggimage. R topics documented: December 5, Title Use Image in 'ggplot2' Version 0.1.0

Package ECctmc. May 1, 2018

Package fastqcr. April 11, 2017

Package strat. November 23, 2016

Package optimus. March 24, 2017

Package available. November 17, 2017

Package ssh. June 4, 2018

Package wikitaxa. December 21, 2017

Package farver. November 20, 2018

Package mdftracks. February 6, 2017

Package geogrid. August 19, 2018

Package librarian. R topics documented:

Package IRkernel. January 7, 2019

Package rzeit2. January 7, 2019

Package estprod. May 2, 2018

Package humanize. R topics documented: April 4, Version Title Create Values for Human Consumption

Package bigreadr. R topics documented: August 13, Version Date Title Read Large Text Files

Package splithalf. March 17, 2018

Package CuCubes. December 9, 2016

Package readobj. November 2, 2017

Package facerec. May 14, 2018

Package ggimage. R topics documented: November 1, Title Use Image in 'ggplot2' Version 0.0.7

Package raker. October 10, 2017

Package essurvey. August 23, 2018

Package fst. December 18, 2017

Package shinyhelper. June 21, 2018

Package seqcat. March 25, 2019

Package ClusterBootstrap

Package rpst. June 6, 2017

Package censusr. R topics documented: June 14, Type Package Title Collect Data from the Census API Version 0.0.

Package linkspotter. July 22, Type Package

Package GetITRData. October 22, 2017

Package githubinstall

Package mgc. April 13, 2018

Package lumberjack. R topics documented: July 20, 2018

Package ASICS. January 23, 2018

Package hypercube. December 15, 2017

Package weco. May 4, 2018

Package projector. February 27, 2018

Package staplr. May 9, 2018

Package signalhsmm. August 29, 2016

Package cregulome. September 13, 2018

Package MetaLonDA. R topics documented: January 22, Type Package

Package tidylpa. March 28, 2018

Package internetarchive

Package geniusr. December 6, 2017

Package validara. October 19, 2017

Package rcv. August 11, 2017

Package BASiNET. October 2, 2018

Package dkanr. July 12, 2018

Package reval. May 26, 2015

Package fitbitscraper

Package IgorR. May 21, 2017

Package d3plus. September 25, 2017

Package meme. December 6, 2017

Package catchr. R topics documented: December 30, Type Package. Title Catch, Collect, and Raise Conditions. Version 0.1.0

Package knitrprogressbar

Package nonlinearicp

Transcription:

Type Package Package pksea December 22, 2017 Title Prediction-Based Kinase-Substrate Enrichment Analysis Version 0.0.1 A tool for inferring kinase activity changes from phosphoproteomics data. 'pksea' uses kinase-substrate prediction scores to weight observed changes in phosphopeptide abundance to calculate a phosphopeptide-level contribution score, then sums up these contribution scores by kinase to obtain a phosphoproteome-level kinase activity change score (KAC score). 'pksea' then assesses the significance of changes in predicted substrate abundances for each kinase using permutation testing. This results in a permutation score (pksea significance score) reflecting the likelihood of a similarly high or low KAC from random chance, which can then be interpreted in an analogous manner to an empirically calculated p-value. 'pksea' contains default databases of kinase-substrate predictions from 'NetworKIN' (NetworKINPred_db) <http://networkin.info> Horn, et. al (2014) <doi:10.1038/nmeth.2968> and of known kinase-substrate links from 'PhosphoSitePlus' (KSEAdb) <https://www.phosphosite.org/> Hornbeck PV, et. al (2015) <doi:10.1093/nar/gku1267>. Depends R (>= 3.3.0) License MIT + file LICENSE Encoding UTF-8 LazyData true RoxygenNote 6.0.1 Maintainer Peter Liao <pll21@case.edu> NeedsCompilation no Author Peter Liao [aut, cre] Repository CRAN Date/Publication 2017-12-22 18:46:08 UTC 1

2 batchrun R topics documented: batchrun........................................... 2 compare........................................... 3 get_matched_data...................................... 4 KSEAdb........................................... 5 NetworKINPred_db..................................... 5 run_on_matched...................................... 6 Index 7 batchrun Running pksea::compare() on multiple files For running compare() on multiple CSV data files in the same directory and for writing results to a folder in the designated data directory. Can receive various arguments to be passed on to downstream functions. Writes to tempdir() unless outputpath variable is specified by user (argument passed on to results_write). batchrun(summaryfiledir, commonfilestring = ".csv", predictiondb, results_folder = NULL,...) summaryfiledir Directory containing summary statistic CSV files. Required data file columns: GN = gene name identifier that will be matched with prediction database, Peptide = unique peptide identifier (for example, sequence with modifications), Phosphosites = comma-separated phosphorylation sites (eg. "T102,S105"), pval= pairwise test p-value, fc= mean fold change, t= pairwise test t-statistic. pval and fc are used for results reporting only, all others are important for database searching, calculation, and permutation testing. commonfilestring Common string identifying all files to be included in analysis predictiondb Input database whose prediction scores will be used for calculations. Required columns: substrate_name= name of substrate corresponding to GN in summary_data, kinase_id = identifiers for kinase predictors, position= phosphorylated residue number, score = numeric score for strength of prediction. results_folder if desired, a single output folder. Else each run performed on each file will have a separate output folder identified by run initiation time.... parameters to be passed on to downstream functions, including(default): outputpath (tempdir()) n_permutations (1000), seed (123), kseadb (NULL), kin_ens_table (NULL). See run_on_matched, compare for details.

compare 3 #point to data directory that contains summary.csv files datapath <- system.file("extdata", package = "pksea") #run batchrun function to analyze all files in that folder, with options batchrun(datapath, predictiondb=networkinpred_db, kseadb = KSEAdb, n_permutations = 5) compare Running analysis runs on known substrates, predicted substrates, and both. Performs up to three run_on_matched() runs on summary-prediction matcheddata from get_matched_data(), returning permutation significance score results. If a KSEA database is provided for filtering and comparison, one full analysis will be performed on all phosphosites, one on data with all known kinase substrates removed according to the provided KSEA database, and one on known kinase substrates only. compare(matched_data, predictiondb, kseadb,...) matched_data predictiondb kseadb File path to summary statistic phosphoproteomics CSV data file with an entry for each phosphopeptide. Required data file columns: GN = gene name identifier that will be matched with prediction database, Peptide = unique peptide identifier (for example, sequence with modifications), Phosphosites = commaseparated phosphorylation sites (eg. "T102,S105"), pval= pairwise test p-value, fc= mean fold change, t= pairwise test t-statistic. pval and fc are used for results reporting only, all others are important for database searching, calculation, and permutation testing. Input database whose prediction scores will be used for calculations. Required columns: substrate_name= name of substrate corresponding to GN in summary_data, kinase_id = identifiers for kinase predictors, position= phosphorylated residue number, score = numeric score for strength of prediction. Optional KSEA database for filtering purposes. Containing substrate gene name "SUB_GENE" and phosphorylated residue "SUB_MOD_RSD" in standard form (ie. T302).... optional parameters to be passed on to downstream functions, including (default): n_permutations (1000), seed (123), kin_ens_table (NULL). See run_on_matched for details.

4 get_matched_data #Read in example summary statistics dataset from csv summarydata_ex <- read.csv(system.file("extdata", "example_data1.csv", package="pksea")) #Get matched data using predictions from NetworKIN matched_data_ex <- get_matched_data(summarydata_ex, NetworKINPred_db) #Perform comparative analysis using provided KSEAdb as filter ## Not run: compare_results_ex <- compare(matched_data_ex, kseadb = KSEAdb, n_permutations = 10) ## End(Not run) get_matched_data Filtering data to matched predictions This function reformats summary statistic phosphoproteomicdata to single observations for each phosphorylation site, duplicating other fields for multiple sites on the same peptide. Next, it attempts to find predictions for each phosphorylation site in the provided database. It returns observations (phosphorylation sites) for which a prediction is detected in the database, matching based on HUGO gene name and phosphorylated residue. get_matched_data(datafull, predictiondb) datafull predictiondb Statistical summary data with an entry for each phosphopeptide. Required columns: GN = gene name identifier that will be matched with prediction database, Peptide = unique peptide identifier (for example, sequence with modifications), Phosphosites = comma-separated phosphorylation sites (eg. "T102,S105"), pval= pairwise test p-value, fc= mean fold change, t= pairwise test t-statistic. pval and fc are used for results reporting only, all others are important for database searching, calculation, and permutation testing. Input database whose prediction scores will be used for calculations. Required columns: substrate_name= name of substrate corresponding to GN in datafull, kinase_id = identifiers for kinase predictors, position= phosphorylated residue number, score = numeric score for strength of prediction. #Read in example summary statistics dataset from csv summarydata_ex <- read.csv(system.file("extdata", "example_data1.csv", package="pksea")) #Get matched data using predictions from NetworKIN matched_data_ex <- get_matched_data(summarydata_ex, NetworKINPred_db)

KSEAdb 5 KSEAdb KSEAdb Format Source A data table containing all known kinase-substrate links known in PhosphoSitePlus. KSEAdb An object of class data.frame with 240749 rows and 6 columns. https://www.phosphosite.org/staticdownloads.action References Hornbeck PV, Zhang B, Murray B, Kornhauser JM, Latham V, Skrzypek E PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res. 2015 43:D512-20. NetworKINPred_db NetworKINPred_db Format Source A data table containing all precalculated NetworKIN predictions performed on known ensembl sequences. NetworKINPred_db An object of class data.frame with 450418 rows and 4 columns. http://networkin.info/download.shtml References Horn et al., KinomeXplorer: an integrated platform for kinome biology studies. Nature Methods 2014 Jun;11(6):603 4.

6 run_on_matched run_on_matched Runs pksea analysis on a dataset result from get_matched_data. Calculates score contributions from summary statistics (tscore) and prediction scores, and sums contribution scores by kinase to calculate raw kinase activity change scores (KAC scores). Performs permutation test on summary statistic data to assess significance of kinase activity change scores, and reports significance as a percentile score (pksea significance score). run_on_matched(matched_data, n_permutations = 1000, seed = 123, kin_ens_table = NULL) matched_data data after filtering against predictions (results from get_matched_data()) n_permutations number of mutations to perform (default 1000) seed kin_ens_table seed used for permutation testing optional table for inclusion of matched ensembl ids for kinases, with columns: ens = ensembl id, kinases = kinase_id as otherwise used #Read in example summary statistics dataset from csv summarydata_ex <- read.csv(system.file("extdata", "example_data1.csv", package="pksea")) #Get matched data using predictions from NetworKIN matched_data_ex <- get_matched_data(summarydata_ex, NetworKINPred_db) #Perform single run of pksea analysis single_run_results_ex <- run_on_matched(matched_data_ex, n_permutations = 10)

Index Topic datasets KSEAdb, 5 NetworKINPred_db, 5 batchrun, 2 compare, 2, 3 get_matched_data, 4 KSEAdb, 5 NetworKINPred_db, 5 results_write, 2 run_on_matched, 2, 3, 6 7