Package 'pcaGoPromoter'


November 13, 2018

Version 1.26.0
Date 2012-03-16
Title pcaGoPromoter is used to analyze DNA micro array data
Author Morten Hansen, Jorgen Olsen
Maintainer Morten Hansen <mhansen@sund.ku.dk>
Description This package contains functions to ease the analyses of DNA micro arrays. It utilizes principal component analysis as the initial multivariate analysis, followed by functional interpretation of the principal component dimensions with overrepresentation analysis for GO terms and regulatory interpretations using overrepresentation analysis of predicted transcription factor binding sites with the primo algorithm.
biocViews GeneExpression, Microarray, GO, Visualization
Imports AnnotationDbi
Depends R (>= 2.14.0), ellipse, Biostrings
Suggests Rgraphviz, GO.db, hgu133plus2.db, mouse4302.db, rat2302.db, hugene10sttranscriptcluster.db, mogene10sttranscriptcluster.db, pcaGoPromoter.Hs.hg19, pcaGoPromoter.Mm.mm9, pcaGoPromoter.Rn.rn4, serumStimulation, parallel
LazyLoad yes
License GPL (>= 2)
git_url https://git.bioconductor.org/packages/pcagopromoter
git_branch RELEASE_3_8
git_last_commit 89221ea
git_last_commit_date 2018-10-30
Date/Publication 2018-11-12

R topics documented:

pcaGoPromoter-package
GOtree
pca
primo
primoData
Index

pcaGoPromoter-package    Tools for analyzing data from DNA micro arrays.

Description

Welcome to the pcaGoPromoter package. The purpose of this package is to ease the analysis of data from genome-wide gene expression studies. Multiple platforms can be used (Affymetrix GeneChip, Illumina expression BeadChips, qRT-PCR and more). Probe identifiers should be Affymetrix probe set IDs, gene symbols or Entrez gene IDs. The overall analysis strategy is to conduct principal component analysis, followed by interpretation of the principal component dimensions by functional annotation for biological function and prediction of regulatory transcription factor networks.

You can list the objects this package provides with the following command:

ls("package:pcaGoPromoter")

Each of these objects has its own manual page detailing where relevant data was obtained, along with some examples of how to use it.

Author(s)

Morten Hansen <mhansen@sund.ku.dk> and Jorgen Olsen <jolsen@sund.ku.dk>
Maintainer: Morten Hansen <mhansen@sund.ku.dk>

Examples

ls("package:pcaGoPromoter")


GOtree    GOtree, plot.GOtree and GOtreeWithLeaveOut

Description

GOtree finds significantly overrepresented Gene Ontology terms (biological processes only) in a list of probes and returns an object of type GOtree.
print.GOtree lists the significant GO terms from an object of type GOtree.
plot.GOtree creates a visual representation of the GO connections from an object of type GOtree.
GOtreeHits returns the genes/probes for a specific GO term.
GOtreeWithLeaveOut returns the same as GOtree, but runs through the samples multiple times with leave-one-out cross-validation.

Usage

GOtree(input, inputType = "hgu133plus2", org = "Hs",
       statisticalTest = "binom", binomAlpha = NA,
       p.adjust.method = "fdr")

## S3 method for class 'GOtree'
print(x, ...)

## S3 method for class 'GOtree'
plot(x, boxes = 25, legendPosition = "topright",
     main = "Gene Ontology tree, biological processes", ...)

GOtreeHits(input, inputType = "hgu133plus2", org = "Hs",
           GOid, returnGeneSymbols = TRUE)

GOtreeWithLeaveOut(exprsData, inputType = "hgu133plus2", org = "Hs",
                   pc = 1, decreasing = TRUE, noProbes = 1000,
                   leaveOut = 1, runs = NCOL(exprsData))

Arguments

input    a character vector of Affymetrix probe IDs, gene symbols or Entrez gene IDs.
inputType    a character vector describing the input type. Must be an Affymetrix chip type, "genesymbol" or "entrezid". The following Affymetrix chip types are supported: hgu133plus2, mouse4302, rat2302, hugene10st and mogene10st. Default is Affymetrix chip type "hgu133plus2".
org    a character vector with the organism. Can be "Hs", "Mm" or "Rn". Only needed if inputType is "genesymbol" or "entrezid". See details. Default is "Hs".
statisticalTest    a character vector with the statistical method to be used. Can be "binom" or "fisher". Default is "binom".
binomAlpha    a value with the p-value to use in the self-contained test.
p.adjust.method    the method used to adjust p-values for multiple testing. The adjusted p-values are reported in a separate column.
x    an object of type GOtree.
boxes    an integer indicating the number of boxes (terms) in the plot.
legendPosition    a vector describing the position of the legend. See ?xy.coords for possibilities. Set to NULL for no legend. Default is "topright".
main    a title for the GO tree plot.
...    other parameters to be passed through to plotting functions.
GOid    a vector with the GO term of interest.
returnGeneSymbols    a logical indicating whether gene symbols or probe IDs should be returned. Default is gene symbols.
exprsData    a table with expression data. Row names should be probe identifiers (Affymetrix probe set IDs, gene symbols or Entrez gene IDs). Column names should be sample identifiers.

pc    a number indicating which principal component to extract the probe list from, based on the loading values from the PCA.
decreasing    a logical value indicating whether the loadings should be sorted in decreasing or ascending order (decreasing == FALSE). Decreasing order yields information about the positive direction and ascending order about the negative direction of the particular principal component.
noProbes    a number indicating the number of probes included in the calculations.
leaveOut    a number indicating what percentage to leave out in the cross-validation. If set to 1, each observation is left out once and runs is set equal to the number of observations. Default is 1.
runs    a number indicating how many times to run with leave-out. If leaveOut = 1, runs is overridden with the number of observations.

Details

GOtree returns a GOtree object. It contains a list of significant GO terms.
plot() generates a visual plot of the GO tree.
GOtreeHits returns a vector with the genes/probes in a specific GO term.
GOtreeWithLeaveOut repeats function GOtree, but with different input. GOtreeWithLeaveOut takes a table of expression data as input, performs PCA, extracts probes/genes for the specified principal component and subsequently performs GOtree. This is repeated the specified number of times. It can run with leave-one-out or with a percentage left out. Only the GO terms that are found overrepresented in all the runs "qualify", and a new p-value is calculated as the median of the p-values from all the runs. An object of type GOtree is returned.

Value

GOtree returns an object of type GOtree.
GOtreeWithLeaveOut returns an object of type GOtree.

Note

GOtreeWithLeaveOut may take some time to run, depending on the number of samples.

Author(s)

Morten Hansen <mhansen@sund.ku.dk> and Jorgen Olsen <jolsen@sund.ku.dk>

Examples

library(serumStimulation)
data(serumStimulation)
pcaOutput <- pca( serumStimulation )
posLoadings <- getRankedProbeIds( x=pcaOutput )
GOs <- GOtree( input=posLoadings[1:1000] )
GOs
plot(GOs, legendPosition=NULL)
## Not run:
GOs <- GOtreeWithLeaveOut( exprsData=serumStimulation )
## End(Not run)
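The way leave-out runs are combined (a term must be significant in every run, and its new p-value is the median over the runs) can be illustrated with a small base R sketch. This is not the package's code: combineLeaveOutRuns is a hypothetical helper, and the term names and p-values are made up for illustration.

```r
# Hypothetical helper, not part of pcaGoPromoter: combine the results of
# several leave-out runs. Each element of `runs` is a named numeric vector
# holding the p-values of the terms found significant in that run.
combineLeaveOutRuns <- function(runs) {
  # Keep only the terms that were reported in every run.
  common <- Reduce(intersect, lapply(runs, names))
  if (length(common) == 0) {
    return(setNames(numeric(0), character(0)))
  }
  # One row per run, one column per shared term.
  pvals <- do.call(rbind, lapply(runs, function(r) r[common]))
  # The combined p-value of a term is the median over all runs.
  sort(apply(pvals, 2, median))
}

# Made-up results from three runs.
runs <- list(
  c(termA = 0.001, termB = 0.020, termC = 0.040),
  c(termA = 0.003, termB = 0.010),
  c(termB = 0.030, termA = 0.002, termD = 0.050)
)
combined <- combineLeaveOutRuns(runs)
print(combined)
```

Here termC and termD are dropped because they are not significant in all three runs; only termA and termB survive.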

pca    pca - Principal Component Analysis (PCA)

Description

pca - This is a wrapper function around prcomp from the stats package. It prepares the DNA micro array data, runs the principal component analysis and returns an object of type pca.
plot.pca - Makes a score plot with legend from an object of type pca.
getRankedProbeIds.pca - Gets probe IDs from PCA loadings.
pcaInfoPlot - Makes an informative score plot with optional overrepresented GO terms and predicted transcription factor binding sites annotated along the axes.

Usage

pca(eData, printDropped = TRUE, scale = TRUE, center = TRUE)

## S3 method for class 'pca'
plot(x, groups, PCs = c(1,2), printNames = TRUE, symbolColors = TRUE,
     plotCI = TRUE, GOtreeObjs, primoObjs, main, ...)

## S3 method for class 'pca'
getRankedProbeIds(x, pc = 1, decreasing = TRUE)

pcaInfoPlot(eData, inputType = "hgu133plus2", org = "Hs", groups,
            printNames = TRUE, plotCI = TRUE, noProbes = 1365,
            GOtermsAnnotation = TRUE, primoAnnotation = TRUE)

Arguments

eData    an ExpressionSet object or a table with expression data with variables in rows and observations in columns. Row names should be probe identifiers (Affymetrix probe set IDs, gene symbols or Entrez IDs), column names should be sample identifiers.
printDropped    a logical value determining whether dropped probes should be printed. Probes are dropped if all values are equal.
scale    a logical value determining whether input values should be scaled. Default is TRUE.
center    a logical value determining whether input values should be centered. Default is TRUE.
x    an object of type pca, created by function pca.
groups    a factor containing group information.
PCs    an integer vector indicating which principal components to use in the score plot. Exactly two principal components should be given.
printNames    a logical value indicating whether observation names should be printed. Default is TRUE.
symbolColors    a logical value indicating whether the symbols should be plotted with colors. Default is TRUE.

plotCI    a logical value indicating whether the confidence intervals for each group should be plotted. Default is TRUE.
GOtreeObjs    a list of GOtree objects used for annotation of the axes in the PCA plot. Used by function pcaInfoPlot().
primoObjs    a list of primo objects used for annotation of the axes in the PCA plot. Used by function pcaInfoPlot().
main    a character vector with the title of the plot. If no title is given, a default one is provided. Set to NULL for no title.
...    other parameters to be passed through to plotting functions.
pc    a number indicating which principal component to extract the probe identifiers from.
decreasing    a logical value indicating whether the probe identifiers from PCA loadings should be sorted in decreasing or ascending order (decreasing == FALSE).
inputType    a character vector with the input type. See ?primo for details. Default is "hgu133plus2".
org    a character vector with the organism. See ?primo for details. Default is "Hs".
noProbes    a numeric value indicating the number of probes to use for GOtree and primo.
GOtermsAnnotation    a logical value to select GO terms annotation along the axes. The default is TRUE.
primoAnnotation    a logical value to select predicted transcription factor binding site annotation along the axes. The default is TRUE.

Details

pca uses prcomp to do the principal component analysis. The input data (x) is scaled and centered, so constant variables (sd = 0) are removed to avoid division by zero.
plot.pca makes a PCA score plot. The input data is an object of type pca. The score plot is formatted and the observations are colored according to their class. A legend describing the classes is placed below the plot. The proportion of variance is plotted along the axes.
getRankedProbeIds.pca extracts the probe identifiers from the PCA loadings of an object of type pca. The purpose of this function is to ease that extraction: it takes the row names from the loadings and sorts them according to pc and decreasing.

Value

pca returns an object of type pca.
getRankedProbeIds returns a vector with the probe IDs sorted from higher to lower if decreasing = TRUE, or from lower to higher if decreasing = FALSE.

Author(s)

Morten Hansen <mhansen@sund.ku.dk> and Jorgen Olsen <jolsen@sund.ku.dk>

See Also

prcomp.
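The steps described in Details (drop constant probes, run prcomp on the scaled and centered data, rank probe identifiers by loading) can be approximated with base R alone. The sketch below is an illustration under stated assumptions, not the package's implementation: the expression table is simulated, and the object names are hypothetical.

```r
# Simulated expression table (made-up values): probes in rows, samples in columns.
set.seed(1)
eData <- matrix(rnorm(6 * 4), nrow = 6,
                dimnames = list(paste0("probe", 1:6), paste0("sample", 1:4)))
eData["probe3", ] <- 5   # a constant probe (sd = 0)

# Drop constant probes: they cannot be scaled to unit variance.
keep <- apply(eData, 1, sd) > 0
dropped <- rownames(eData)[!keep]

# prcomp expects observations in rows, so the table is transposed.
fit <- prcomp(t(eData[keep, , drop = FALSE]), center = TRUE, scale. = TRUE)

# Rank probe identifiers by their loading on the first principal component,
# highest loading first (the analogue of decreasing = TRUE).
loadings <- fit$rotation[, 1]
ranked <- names(sort(loadings, decreasing = TRUE))

print(dropped)
print(ranked)
```

Sorting with decreasing = FALSE instead would put the probes that drive the negative direction of the component first.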

Examples

library(serumStimulation)
data(serumStimulation)
groups <- as.factor(c(rep("control", 5), rep("serumInhib", 5), rep("serumOnly", 3)))
pcaOutput <- pca(serumStimulation)
pcaOutput
plot(pcaOutput, groups=groups)
posLoadings <- getRankedProbeIds(pcaOutput)


primo    primo, primoHits and primoWithLeaveOut

Description

primo returns the predicted transcription factor binding sites for a given subset of probes.
primoHits returns a list of RefSeqs where a given PWM scores above a threshold.
primoWithLeaveOut returns the same as primo, but runs through the samples multiple times with the given leave-out cross-validation settings.

Usage

primo( input, inputType = "hgu133plus2", org = "Hs",
       PvalueCutOff = 0.05, cutOff = 0.9,
       p.adjust.method = "fdr", printIgnored = FALSE,
       primoData = NULL )

primoHits( input, inputType = "hgu133plus2", org = "Hs",
           id, cutOff = 0.9, printIgnored = FALSE,
           primoData = NULL )

primoWithLeaveOut( exprsData, inputType = "hgu133plus2", org = "Hs",
                   pc = 1, decreasing = TRUE, noProbes = 1000,
                   leaveOut = 1, runs = NCOL(exprsData) )

Arguments

input    a character vector with the input. Can be Affymetrix probe set IDs, Entrez IDs or gene symbols.
inputType    a character vector describing the input type. Must be an Affymetrix chip type, "genesymbol" or "entrezid". The following Affymetrix chip types are supported: hgu133plus2, mouse4302, rat2302, hugene10st and mogene10st. Default is Affymetrix chip type "hgu133plus2".
org    a character vector with the organism. Can be "Hs", "Mm" or "Rn". Only needed if inputType is "genesymbol" or "entrezid". See details. Default is "Hs".
PvalueCutOff    a P-value cut-off. All results with P-values below this cut-off are returned.
cutOff    the percentage of promoters that each TF should bind to in theory.
p.adjust.method    the method used to adjust p-values for multiple testing. Default is "fdr" (False Discovery Rate). See the manual page of p.adjust for possibilities.
printIgnored    Should the ignored RefSeqs be printed? Default is FALSE.

primoData    a primoData object from function primoData. See the manual page for primoData for more information. Default is NULL.
id    a vector with the PWM id used to retrieve the probes with the corresponding predicted hits. The ids should be taken from the "id" column of the primo output.
exprsData    a table with expression data. Variables in rows and observations in columns. Row names should be probe identifiers (Affymetrix probe set IDs, gene symbols or Entrez IDs), column names should be sample identifiers.
pc    a number indicating which principal component to extract the loadings from.
decreasing    a logical value indicating whether the loadings should be sorted in decreasing or ascending order (decreasing == FALSE).
noProbes    a number indicating how many probe loadings to use.
leaveOut    a number indicating what percentage to leave out. If set to 1, each observation is left out once and runs is set equal to the number of observations. Default is 1.
runs    a number indicating how many times to run with leave-out. If leaveOut = 1, runs is overridden with the number of observations.

Details

primo returns the predicted transcription factor binding sites for a given set of probes (Affymetrix probe IDs, Entrez IDs or gene symbols). Note: all input values are first converted into RefSeqs.
primoHits returns all RefSeqs representing promoters where the PWM scores above the threshold. The threshold is based on the cutoff percentage. For each promoter the maximum value of the PWM is collected, and these values are sorted in increasing order. The threshold is the PWM score at the cutoff percentage in the list. Each promoter that has a score above the threshold is considered to have a potential binding site for the transcription factor.
primoWithLeaveOut repeats function primo, but with different input. primoWithLeaveOut runs primo the specified number of times. It can run with leave-one-out or with a percentage left out. Only the PWMs that are represented in all the runs "qualify", and a new p-value is calculated as the median of the p-values from all the runs. The list of over- and underrepresented PWMs is returned.

Value

upregulated    a data frame with the possible transcription factors for up-regulated genes.
downregulated    a data frame with the possible transcription factors for down-regulated genes.

primoHits returns a character vector of probe IDs (or RefSeqs).
primoWithLeaveOut returns the same data structure as primo.

Note

The P-values in column pvalue of the output are not corrected for multiple testing.
primoWithLeaveOut takes some time to run, especially if there are a lot of samples.

Author(s)

Morten Hansen <mhansen@sund.ku.dk> and Jorgen Olsen <jolsen@sund.ku.dk>
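The threshold rule described in Details for primoHits can be illustrated with a few lines of base R. This is only a sketch: the maximum scores are made up, the promoter names are placeholders, and the exact index convention used to read off "the score at the cutoff percentage" is an assumption rather than something the manual specifies.

```r
# Made-up maximum PWM score per promoter; names are placeholders, not real RefSeqs.
maxScores <- c(2.1, 7.4, 3.3, 9.8, 5.0, 1.2, 6.6, 4.7, 8.9, 0.5)
names(maxScores) <- paste0("refseq", 1:10)
cutOff <- 0.9

# Sort the scores in increasing order.
sorted <- sort(maxScores)

# Position of the cutoff percentage in the sorted list (assumed convention).
# The small tolerance guards against floating point noise in cutOff * n.
idx <- ceiling(cutOff * length(sorted) - 1e-9)
threshold <- sorted[[idx]]

# Promoters scoring strictly above the threshold count as predicted hits.
hits <- names(maxScores)[maxScores > threshold]

print(threshold)
print(hits)
```

With ten promoters and a cutoff of 0.9, the threshold is the ninth-lowest score, so only the single best-scoring promoter is reported as a hit.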

References

Stegmann, A., Hansen, M., et al. (2006). "Metabolome, transcriptome, and bioinformatic cis-element analyses point to HNF-4 as a central regulator of gene expression during enterocyte differentiation." Physiological Genomics 27(2): 141-155.

Examples

library(serumStimulation)
data(serumStimulation)
pcaOutput <- pca( serumStimulation )
negLoadings <- getRankedProbeIds( pcaOutput, pc=1, decreasing=FALSE )
TFs <- primo( negLoadings[1:1000] )
probeIds <- primoHits( negLoadings, id=9326 )
## Not run:
TFs <- primoWithLeaveOut( serumStimulation )
## End(Not run)


primoData    primoData

Description

primoData calculates the data objects used by primo().

Usage

primoData(promoters, matrices, cleanUpPromoters = TRUE)

Arguments

promoters    a list with promoter data. See details.
matrices    a list with matrices. See details.
cleanUpPromoters    Should some house-keeping procedures (remove duplicates, map common names etc.) be performed on promoters?

Details

primoData makes a custom data object for function primo(). primoData needs two arguments: promoters and matrices.
promoters must consist of two lists: a list with the RefSeq names for all the promoters and a list with all the promoter sequences in lowercase letters. The length of the two lists must be the same. The function pcagopromoters:::primodata.getpromoter( filename ) can be used to load promoters from a file in FASTA format.
matrices must consist of one list of lists. Each list element must have a unique name and holds three data elements: baseid, name and pwm. baseid is a character vector with a base id. name is a character vector with the common name for the matrix. pwm is a position weight matrix with the bases A, C, G, T in rows and the weights in columns.

Examples of the promoter and matrices data types are shown in the wiki at:???

Value

primoData returns an object of type primoData. It contains two lists: thresholds and promotersrefseqs.
thresholds is a list of lists. Each list element has a unique name and holds four data elements: baseid, name, pwmlength and maxscores. baseid is a character vector with a base id. name is a character vector with the common name for the matrix. pwmlength is the length of the position weight matrix. maxscores holds the maximum scores from all the promoters.
promotersrefseqs is a list with the RefSeq names for all the promoters. Some promoters can have more than one RefSeq name. These names map to the maxscores from thresholds.
primoData returns an object to be used by primo() as argument primoData.

Note

primoData takes a very long time (many hours) to run. Install package multicore and use a computer with multiple cores.

Author(s)

Morten Hansen <mhansen@sund.ku.dk> and Jorgen Olsen <jolsen@sund.ku.dk>

References

Stegmann, A., Hansen, M., et al. (2006). "Metabolome, transcriptome, and bioinformatic cis-element analyses point to HNF-4 as a central regulator of gene expression during enterocyte differentiation." Physiological Genomics 27(2): 141-155.

Examples

## Not run:
myPrimoData <- primoData( myPromoters, myMatrices )
## End(Not run)
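To make the input layout described in Details more concrete, the sketch below builds toy promoters and matrices objects in base R and computes a maximum PWM score per promoter. Several things here are assumptions for illustration only: the element names refseqs and sequences, the placeholder identifiers, the helper maxPwmScore, and the additive forward-strand scoring, which may differ from what primoData actually computes.

```r
# Toy promoter data: RefSeq-style names plus lower-case sequences.
# The element names `refseqs` and `sequences` are assumed, not documented.
myPromoters <- list(
  refseqs   = list("NM_example1", "NM_example2"),
  sequences = list("acgtacgtaacc", "ttgagctcaagg")
)

# Toy position weight matrix: bases A, C, G, T in rows, positions in columns.
pwm <- matrix(c(0.7, 0.1, 0.1, 0.1,    # position 1 favours A
                0.1, 0.1, 0.7, 0.1,    # position 2 favours G
                0.1, 0.7, 0.1, 0.1),   # position 3 favours C
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))
myMatrices <- list(
  M0001 = list(baseid = "M0001", name = "exampleTF", pwm = pwm)
)

# Hypothetical helper: best additive score of a PWM over all windows of a
# sequence (forward strand only).
maxPwmScore <- function(sequence, pwm) {
  bases <- strsplit(toupper(sequence), "")[[1]]
  width <- ncol(pwm)
  starts <- seq_len(length(bases) - width + 1)
  max(vapply(starts, function(s) {
    window <- bases[s:(s + width - 1)]
    # Pick the weight of each observed base at each position and add them up.
    sum(pwm[cbind(match(window, rownames(pwm)), seq_len(width))])
  }, numeric(1)))
}

scores <- vapply(myPromoters$sequences, maxPwmScore, numeric(1),
                 pwm = myMatrices$M0001$pwm)
names(scores) <- unlist(myPromoters$refseqs)
print(scores)
```

The second toy sequence contains the motif "agc" favoured by the matrix, so it receives the higher maximum score.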

Index

Topic methods
GOtree
pca
primo
primoData

Topic package
pcaGoPromoter-package

getRankedProbeIds (pca)
GOtree
GOtreeHits (GOtree)
GOtreeWithLeaveOut (GOtree)
pca
pcaGoPromoter (pcaGoPromoter-package)
pcaGoPromoter-package
pcaInfoPlot (pca)
plot.GOtree (GOtree)
plot.pca (pca)
primo
primoData
primoHits (primo)
primoWithLeaveOut (primo)
print.GOtree (GOtree)
print.pca (pca)