Annotation and Gene Set Analysis with R and Bioconductor
Alex Sánchez
Statistics and Bioinformatics Research Group
Departament de Estadística. Universitat de Barcelona
April 22, 2013

Contents
1 Introduction
1.1 The estrogen case study
2 Probe annotation information
2.1 Probe annotation information using array specific annotation package
3 Creating Annotated Results Tables (or how to get pretty output)
4 Species specific annotation packages

1 Introduction

This lab discusses and exemplifies methods for annotating genes and for analyzing the biological significance of lists of genes. These methods usually rely on one or more lists of genes obtained from a gene selection process. For completeness, the selection of differentially expressed genes is reproduced below, although it is not discussed because it has been treated elsewhere.

1.1 The estrogen case study

Data for the analyses are obtained from the estrogen dataset, available in the estrogen package.

> if (!(require(estrogen))){
+ source("
+ biocLite("estrogen")
+ library(estrogen)
+ }
> estrogendir <- system.file("extdata", package = "estrogen")
> # print(estrogendir)
> workingdir <- getwd()
> datadir <- file.path(workingdir, "datos")
> if (!file.exists("datos")) system ("mkdir datos")
> resultsdir <- file.path(workingdir, "results")
> if (!file.exists("results")) system ("mkdir results")

First, data are read from the package data directory.

> require(Biobase)
> require(affy)
> sampleinfo <- read.AnnotatedDataFrame(file.path(estrogendir,"targlimma.txt"),
+ header = TRUE, row.names = 1, sep="\t")
> filenames <- pData(sampleinfo)$filename
> rawdata <- read.affybatch(filenames=file.path(estrogendir,filenames),
+ phenoData=sampleinfo)

Exploration and quality control are omitted because they have been presented elsewhere. We go straight to normalization followed by non-specific filtering.

> stopifnot(require(affy))
> eset_rma <- rma(rawdata)
Background correcting
Normalizing
Calculating Expression
> save(eset_rma, file=file.path(datadir,"normalized.rda"))
> if(!(require(genefilter))) biocLite("genefilter")
> if(!(require("hgu95av2.db"))) biocLite("hgu95av2.db")
> filtrats <- nsFilter(eset_rma)

Gene selection is based on the linear model approach implemented in the limma package.

> cont.matrix <- makeContrasts (
+ Estro10=(est10h-neg10h),
+ Estro48=(est48h-neg48h),
+ Tiempo=(neg48h-neg10h),
+ levels=design)
> cont.matrix
        Contrasts
Levels   Estro10 Estro48 Tiempo
  neg10h      -1       0     -1
  est10h       1       0      0
  neg48h       0      -1      1
  est48h       0       1      0
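The design matrix and the model fit are not shown in this transcript, so the following is only a minimal base R sketch of what the contrast matrix encodes. It assumes one coefficient per group (neg10h, est10h, neg48h, est48h); the group means are made-up numbers for a single hypothetical probe, and the names `group_means`, `contrasts_by_hand` and `estimates` are introduced purely for illustration.

```r
# Hypothetical log2 group means for one probe (illustrative values only).
group_means <- c(neg10h = 7.0, est10h = 9.5, neg48h = 7.2, est48h = 10.1)

# Contrast matrix written by hand: one row per group, one column per
# comparison, mirroring the three contrasts defined above.
contrasts_by_hand <- cbind(
  Estro10 = c(neg10h = -1, est10h = 1, neg48h = 0,  est48h = 0),
  Estro48 = c(neg10h = 0,  est10h = 0, neg48h = -1, est48h = 1),
  Tiempo  = c(neg10h = -1, est10h = 0, neg48h = 1,  est48h = 0)
)

# Each contrast estimate is a weighted sum of the group means,
# i.e. a difference between two groups in this design.
estimates <- drop(group_means %*% contrasts_by_hand)
print(estimates)
```

Every column sums to zero, which is what makes each one a comparison between groups rather than an average level.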
> toptabestro10 <- topTable (fit.main, number=nrow(fit.main), coef="Estro10", adjust="fdr")
> toptabestro48 <- topTable (fit.main, number=nrow(fit.main), coef="Estro48", adjust="fdr")
> toptabtiempo <- topTable (fit.main, number=nrow(fit.main), coef="Tiempo", adjust="fdr")
> save(toptabestro10, toptabestro48, toptabtiempo, file=file.path(resultsdir, "toptables.rd

To select genes that change in at least one of the comparisons we rely on the decideTests function.

Estro10 Estro48 Tiempo
> probenames<-rownames(res)
> probenames.selected<-probenames[sum.res.rows!=0]
> exprsselected <-exprs(eset_rma)[probenames.selected,]
> save(exprsselected, file=file.path(resultsdir, "exprsselected.rda"))

2 Probe annotation information

The Bioconductor project provides software for associating microarray and other genomic data in real time with biological metadata from web databases such as GenBank, LocusLink and PubMed (annotate package). Functions are also provided for incorporating the results of statistical analyses into HTML reports with links to annotation web resources. Software tools are available for assembling and processing genomic annotation data from databases such as GenBank, the Gene Ontology Consortium, Entrez, UniGene or the UCSC Human Genome Project (AnnotationDbi package). Data packages are distributed to provide mappings between different probe identifiers (e.g. Affy IDs, Entrez, PubMed). Customized annotation libraries can also be assembled.

The use of Bioconductor annotation for Affymetrix arrays is illustrated below. We will use alternative approaches to obtain probe annotation.

2.1 Probe annotation information using array specific annotation package

The purpose of an annotation package such as hgu95av2.db is to provide detailed information about the hgu95av2 platform.
To use it, it must first be loaded:

> library(hgu95av2.db)

We can try different options for displaying information about the content of the package:

> require(hgu95av2.db)
> ls("package:hgu95av2.db")
 [1] "hgu95av2" "hgu95av2ACCNUM" "hgu95av2ALIAS2PROBE"
 [4] "hgu95av2CHR" "hgu95av2CHRLENGTHS" "hgu95av2CHRLOC"
 [7] "hgu95av2CHRLOCEND" "hgu95av2.db" "hgu95av2_dbconn"
[10] "hgu95av2_dbfile" "hgu95av2_dbInfo" "hgu95av2_dbschema"
[13] "hgu95av2ENSEMBL" "hgu95av2ENSEMBL2PROBE" "hgu95av2ENTREZID"
[16] "hgu95av2ENZYME" "hgu95av2ENZYME2PROBE" "hgu95av2GENENAME"
[19] "hgu95av2GO" "hgu95av2GO2ALLPROBES" "hgu95av2GO2PROBE"
[22] "hgu95av2MAP" "hgu95av2MAPCOUNTS" "hgu95av2OMIM"
[25] "hgu95av2ORGANISM" "hgu95av2ORGPKG" "hgu95av2PATH"
[28] "hgu95av2PATH2PROBE" "hgu95av2PFAM" "hgu95av2PMID"
[31] "hgu95av2PMID2PROBE" "hgu95av2PROSITE" "hgu95av2REFSEQ"
[34] "hgu95av2SYMBOL" "hgu95av2UNIGENE" "hgu95av2UNIPROT"
> head(ls("package:hgu95av2.db"), n = 10)
 [1] "hgu95av2" "hgu95av2ACCNUM" "hgu95av2ALIAS2PROBE"
 [4] "hgu95av2CHR" "hgu95av2CHRLENGTHS" "hgu95av2CHRLOC"
 [7] "hgu95av2CHRLOCEND" "hgu95av2.db" "hgu95av2_dbconn"
[10] "hgu95av2_dbfile"
> hgu95av2()
Quality control information for hgu95av2:
This package has the following mappings:
hgu95av2ACCNUM has mapped keys (of keys)
hgu95av2ALIAS2PROBE has mapped keys (of keys)
hgu95av2CHR has mapped keys (of keys)
hgu95av2CHRLENGTHS has 93 mapped keys (of 93 keys)
hgu95av2CHRLOC has mapped keys (of keys)
hgu95av2CHRLOCEND has mapped keys (of keys)
hgu95av2ENSEMBL has mapped keys (of keys)
hgu95av2ENSEMBL2PROBE has 9677 mapped keys (of keys)
hgu95av2ENTREZID has mapped keys (of keys)
hgu95av2ENZYME has 2154 mapped keys (of keys)
hgu95av2ENZYME2PROBE has 791 mapped keys (of 975 keys)
hgu95av2GENENAME has mapped keys (of keys)
hgu95av2GO has mapped keys (of keys)
hgu95av2GO2ALLPROBES has mapped keys (of keys)
hgu95av2GO2PROBE has mapped keys (of keys)
hgu95av2MAP has mapped keys (of keys)
hgu95av2OMIM has mapped keys (of keys)
hgu95av2PATH has 5504 mapped keys (of keys)
hgu95av2PATH2PROBE has 228 mapped keys (of 229 keys)
hgu95av2PFAM has mapped keys (of keys)
hgu95av2PMID has mapped keys (of keys)
hgu95av2PMID2PROBE has mapped keys (of keys)
hgu95av2PROSITE has mapped keys (of keys)
hgu95av2REFSEQ has mapped keys (of keys)
hgu95av2SYMBOL has mapped keys (of keys)
hgu95av2UNIGENE has mapped keys (of keys)
hgu95av2UNIPROT has mapped keys (of keys)
Additional Information about this package:
DB schema: HUMANCHIP_DB
DB schema version: 2.1
Organism: Homo sapiens
Date for NCBI data: 2012-Sep4
Date for GO data:
Date for KEGG data: 2011-Mar15
Date for Golden Path data: 2010-Mar22
Date for Ensembl data: 2012-Jul31
> ?hgu95av2UNIGENE
> head(toTable(hgu95av2UNIGENE))
probe_id unigene_id
_at Hs
_at Hs
_f_at Hs
_s_at Hs
_at Hs
_at Hs

We will now use some of the functions provided by the annotate package. The basic purpose of this package is to supply interface routines for getting data out of specific meta-data libraries (e.g. hgu95av2.db).

It is easy to get information about an individual probe or a list of probes using the get/mget functions:

> get("38187_at", hgu95av2GENENAME)
[1] "N-acetyltransferase 1 (arylamine N-acetyltransferase)"
> affyid <- c("38187_at", "38912_at", "33825_at", "36512_at", "38434_at")
> mget(affyid, hgu95av2GENENAME)
$`38187_at`
[1] "N-acetyltransferase 1 (arylamine N-acetyltransferase)"
$`38912_at`
[1] "N-acetyltransferase 2 (arylamine N-acetyltransferase)"
$`33825_at`
[1] "serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 3"
$`36512_at`
[1] "arylacetamide deacetylase"
$`38434_at`
[1] "angio-associated, migratory cell protein"
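The get/mget pattern above can be tried without the annotation package by using a plain R environment as a stand-in for the mapping object. This is a sketch under that assumption, not the annotation package itself: `probe2symbol` is a hypothetical toy map, the control probe is given no symbol only for illustration, and "not_on_chip" is an invented identifier. It also shows one way to count identifiers that have no symbol.

```r
# Toy stand-in for a probe-to-symbol mapping (hypothetical, three real
# probes from the example above plus one entry without a symbol).
probe2symbol <- list2env(list(
  "38187_at"       = "NAT1",
  "38912_at"       = "NAT2",
  "33825_at"       = "SERPINA3",
  "AFFX-BioB-5_at" = NA_character_
))

# Single lookup, analogous to get() on an annotation map.
get("38187_at", envir = probe2symbol)

# Vectorised lookup; identifiers absent from the map return NA
# instead of raising an error.
ids <- c("38187_at", "AFFX-BioB-5_at", "not_on_chip")
symbols <- unlist(mget(ids, envir = probe2symbol, ifnotfound = NA_character_))

# Count identifiers without a symbol (mapped to NA or not found at all).
n_missing <- sum(is.na(symbols))
print(symbols)
print(n_missing)
```

Supplying `ifnotfound` keeps a long lookup from stopping at the first unknown identifier, which is convenient when a probe list comes from a different source than the map.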
Exercise: Try adding more annotation to the fit2 object generated in the linear model analysis described above. Add the gene symbol and the Entrez gene ID.

Exercise: How many probes do not have a gene symbol?

3 Creating Annotated Results Tables (or how to get pretty output)

It is possible to make reasonably nice looking HTML tables for presenting the results of a microarray analysis. These tables are a convenient format because you can insert clickable links to various public annotation databases, which facilitates the downstream analysis. In addition, the format is quite compact, can be posted on the web, and can be viewed using any number of free web browsers.

The Bioconductor project supplies annotation packages for many of the more popular Affymetrix chips, as well as for many commercial spotted cDNA chips. For chips that have annotation packages, the annaffy package is the preferred method for making HTML tables.

In this example we will assume that we have analyzed an experiment using limma and that we have saved a topTable object with the most interesting genes to a file, so we begin by reloading it.
> if (!(exists("toptabestro48"))) load (file=file.path(resultsdir, "toptables.rda"))
> toptab <- toptabestro48
> stopifnot(require(annotate))
> ### We will use ENTREZID codes to link with databases
> gnames<-as.character(toptab$ID)
> # myenvirentrezid<-eval(parse(text = paste(anotpackage,"ENTREZID",sep="")))
> # gll<- mget(gnames, env = myenvirentrezid)
> ### Add also gene symbols
> # myenvirsymbol<-eval(parse(text = paste(anotpackage,"SYMBOL",sep="")))
> # gsym <- mget(gnames, env = myenvirsymbol)
> gll <- getEG(gnames, "hgu95av2.db")
> gsym <- getSYMBOL(gnames, "hgu95av2.db")
> linked <- list (misgenes=gll)
> ### Prepare a dataframe to organize the output
> othernames = data.frame(gsym, gnames, round( toptab$logFC,4), round(toptab$t,4),
+ round(toptab$P.Value, 6), round(toptab$adj.P.Val,6), round(toptab$B,4))
> names(othernames) = c("genesymbol", "AffyID", "M", "t-stat", "p-val", "Adj. p-val", "B-st
> htmlpage(linked,
+ filename =file.path(datadir, "Selected Genes.html"),
+ title = "Comparison of cell types after LPS treatment",
+ othernames = othernames,
+ table.head = c("locus ID", "Gene Symbol", "Affy ID",
+ "logfc", "t-stat", "p-val", "Adj. p-val","b-stat"),
+ table.center = TRUE,
+ repository=list("en"))

A different approach is provided by the annaffy package.
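Before moving on, the core idea behind the htmlpage call above, pairing each identifier with a link into a public database and writing the rows as an HTML table, can be sketched in base R. This is not how annotate implements it; the data frame `results` holds made-up statistics for two genes (only the Entrez IDs 9 and 10, for NAT1 and NAT2, are real identifiers), and the output file name is hypothetical.

```r
# Hypothetical result rows; the logFC values are invented for illustration.
results <- data.frame(
  entrez = c("9", "10"),
  symbol = c("NAT1", "NAT2"),
  logFC  = c(1.2345, -0.9876),
  stringsAsFactors = FALSE
)

# Turn each Entrez ID into a clickable link to its NCBI Gene page.
gene_url <- paste0("https://www.ncbi.nlm.nih.gov/gene/", results$entrez)
link <- sprintf('<a href="%s">%s</a>', gene_url, results$entrez)

# One HTML table row per gene, with the statistic rounded to 4 decimals.
rows <- sprintf("<tr><td>%s</td><td>%s</td><td>%.4f</td></tr>",
                link, results$symbol, results$logFC)
html <- c("<table>",
          "<tr><th>Entrez ID</th><th>Symbol</th><th>logFC</th></tr>",
          rows,
          "</table>")

# Write the table to a temporary file that can be opened in a browser.
out_file <- file.path(tempdir(), "selected_genes_sketch.html")
writeLines(html, out_file)
```

Keeping the linked identifier in its own column, separate from the statistics, mirrors the split between the linked list and the othernames data frame in the call above.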
annaffy allows easy access to many types of annotations. Its use is more straightforward than that of htmlpage, but it is restricted to Affymetrix chips.

> source("
> if(!(require(annotate))) biocLite("annotate")
> if(!(require("hgu95av2.db", character.only=TRUE))) biocLite("hgu95av2.db")
> if(!(require("KEGG.db", character.only=TRUE))) biocLite("KEGG.db")
> if(!(require("GO.db"))) biocLite("GO.db")
> if(!(require("annaffy"))) biocLite("annaffy")
> atab <- aafTableAnn(toptab$ID,"hgu95av2.db", aaf.handler() )
> saveHTML(atab, file=file.path(datadir, "Annotations for Selected Genes.html"))

See the section "Building HTML pages" in the package vignette to learn how to build HTML pages combining annotations and results.

4 Species specific annotation packages

After some time relying on platform-specific annotation packages (centered on the chips), it was decided to move the focus to organism-centered packages, allowing a more flexible annotation system that does not depend on a specific brand dominating the market.

> if(!(require(org.Hs.eg.db))) biocLite("org.Hs.eg.db")
> require(KEGG.db)
> caff <- get("Caffeine metabolism",
+ revmap(KEGGPATHID2NAME))
> get(caff, revmap(org.Hs.egPATH))
[1] "9" "10" "1544" "1548" "1549" "1553" "7498"

Exercise: Which gene symbols and gene names are associated with the following Entrez gene IDs: 1544, 1548 and 1549?
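The two lookups above both go through revmap, which reverses the direction of a mapping. What that reversal amounts to can be sketched in base R with an ordinary named list. The Entrez IDs are taken from the output above, but the pathway labels "pathA", "pathB" and "pathC" and the assignments themselves are invented for illustration, as are the object names.

```r
# Hypothetical gene-to-pathway map: names are Entrez IDs, values are
# made-up pathway labels.
gene2path <- list(
  "9"    = c("pathA", "pathB"),
  "10"   = "pathA",
  "1544" = c("pathA", "pathC")
)

# Flatten the map into parallel vectors of (gene, pathway) pairs.
genes <- rep(names(gene2path), lengths(gene2path))
paths <- unlist(gene2path, use.names = FALSE)

# Regroup the genes by pathway: this is the reversed map.
path2gene <- split(genes, paths)

# All genes annotated to one pathway, analogous to the second get() above.
print(path2gene[["pathA"]])
```

The reversed map holds exactly the same gene and pathway pairs as the original; only the key used for lookup changes.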
More informationPackage DupChecker. April 11, 2018
Type Package Package DupChecker April 11, 2018 Title a package for checking high-throughput genomic data redundancy in meta-analysis Version 1.16.0 Date 2014-10-07 Author Quanhu Sheng, Yu Shyr, Xi Chen
More informationBioMart: a research data management tool for the biomedical sciences
Yale University From the SelectedWorks of Rolando Garcia-Milian 2014 BioMart: a research data management tool for the biomedical sciences Rolando Garcia-Milian, Yale University Available at: https://works.bepress.com/rolando_garciamilian/2/
More informationAnalysis of Genomic and Proteomic Data. Practicals. Benjamin Haibe-Kains. February 17, 2005
Analysis of Genomic and Proteomic Data Affymetrix c Technology and Preprocessing Methods Practicals Benjamin Haibe-Kains February 17, 2005 1 R and Bioconductor You must have installed R (available from
More informationLab: Using R and Bioconductor
Lab: Using R and Bioconductor Robert Gentleman Florian Hahne Paul Murrell June 19, 2006 Introduction In this lab we will cover some basic uses of R and also begin working with some of the Bioconductor
More informationSequence Alignment. GBIO0002 Archana Bhardwaj University of Liege
Sequence Alignment GBIO0002 Archana Bhardwaj University of Liege 1 What is Sequence Alignment? A sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity.
More informationHow to use CNTools. Overview. Algorithms. Jianhua Zhang. April 14, 2011
How to use CNTools Jianhua Zhang April 14, 2011 Overview Studies have shown that genomic alterations measured as DNA copy number variations invariably occur across chromosomal regions that span over several
More informationKEGG.db. August 19, Bioconductor annotation data package
KEGG.db August 19, 2018 KEGG.db Bioconductor annotation data package Welcome to the KEGG.db annotation Package. The purpose of this package was to provide detailed information about the latest version
More informationPackage GEOmetadb. October 4, 2013
Package GEOmetadb October 4, 2013 Type Package Title A compilation of metadata from NCBI GEO Version 1.20.0 Date 2011-11-28 Depends GEOquery,RSQLite Author Jack Zhu and Sean Davis Maintainer Jack Zhu
More informationDesign and Annotation Files
Design and Annotation Files Release Notes SeqCap EZ Exome Target Enrichment System The design and annotation files provide information about genomic regions covered by the capture probes and the genes
More informationDepartment of Computer Science, UTSA Technical Report: CS TR
Department of Computer Science, UTSA Technical Report: CS TR 2008 008 Mapping microarray chip feature IDs to Gene IDs for microarray platforms in NCBI GEO Cory Burkhardt and Kay A. Robbins Department of
More informationPackage HomoVert. November 10, 2010
Package HomoVert November 10, 2010 Version 0.4.1 Date 2010-10-27 Title HomoVert: Functions to convert Gene IDs between species Author Matthew Fero Maintainer Matthew Fero
More informationImport GEO Experiment into Partek Genomics Suite
Import GEO Experiment into Partek Genomics Suite This tutorial will illustrate how to: Import a gene expression experiment from GEO SOFT files Specify annotations Import RAW data from GEO for gene expression
More informationDatabase Searching Lecture - 2
Database Searching Lecture - 2 Slides borrowed from: Debbie Laudencia-Chingcuanco, USDA-ARS Cheryl Seaton, USDA-ARS Victoria Carrollo, USDA-ARS Zjelka McBride, UC Davis Database Searching Utilizes Search
More informationPackage RmiR. R topics documented: September 26, 2018
Package RmiR September 26, 2018 Title Package to work with mirnas and mirna targets with R Description Useful functions to merge microrna and respective targets using differents databases Version 1.36.0
More informationPackage graphite. June 29, 2018
Version 1.27.2 Date 2018-05-22 Package graphite June 29, 2018 Title GRAPH Interaction from pathway Topological Environment Author Gabriele Sales , Enrica Calura ,
More informationRelational Databases for Biologists: Efficiently Managing and Manipulating Your Data
Relational Databases for Biologists: Efficiently Managing and Manipulating Your Data Session 1 Data Conceptualization and Database Design Robert Latek, Ph.D. Sr. Bioinformatics Scientist Whitehead Institute
More informationUseful software utilities for computational genomics. Shamith Samarajiwa CRUK Autumn School in Bioinformatics September 2017
Useful software utilities for computational genomics Shamith Samarajiwa CRUK Autumn School in Bioinformatics September 2017 Overview Search and download genomic datasets: GEOquery, GEOsearch and GEOmetadb,
More information