Bioconductor tutorial

Similar documents
Robert Gentleman! Copyright 2011, all rights reserved!

I. Overview of the Bioconductor Project. Bioinformatics and Biostatistics Lab., Seoul National Univ. Seoul, Korea Eun-Kyung Lee

BioConductor Overviewr

Analysis of Genomic and Proteomic Data. Practicals. Benjamin Haibe-Kains. February 17, 2005

Lab: Using R and Bioconductor

Programming with R. Educational Materials 2006 S. Falcon, R. Ihaka, and R. Gentleman

An introduction to Genomic Data Structures

Programming with R. Educational Materials 2006 S. Falcon, R. Ihaka, and R. Gentleman

Affymetrix Microarrays

Lab: An introduction to Bioconductor

INSTALLATION AND CONFIGURATION GUIDE R SOFTWARE for PIPELINE PILOT 2016

From raw data to gene annotations

Introduction to R. Educational Materials 2007 S. Falcon, R. Ihaka, and R. Gentleman

Package AffyExpress. October 3, 2013

Introduction: microarray quality assessment with arrayqualitymetrics

HowTo: Querying online Data

Package frma. R topics documented: March 8, Version Date Title Frozen RMA and Barcode

Exploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber

Exploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Wolfgang Huber

Exploring cdna Data. Achim Tresch, Andreas Buness, Tim Beißbarth, Florian Hahne, Wolfgang Huber. June 17, 2005

An Introduction to Bioconductor s ExpressionSet Class

Analysis of two-way cell-based assays

Codelink Legacy: the old Codelink class

Analysis of screens with enhancer and suppressor controls

Textual Description of webbioc

Exploring cdna Data. Achim Tresch, Andreas Buness, Wolfgang Huber, Tim Beißbarth

Using metama for differential gene expression analysis from multiple studies

Bioconductor: Annotation Package Overview

Package makecdfenv. March 13, 2019

Course Introduction. Welcome! About us. Thanks to Cancer Research Uk! Mark Dunning 27/07/2015

Package affypdnn. January 10, 2019

RNA-Seq. Joshua Ainsley, PhD Postdoctoral Researcher Lab of Leon Reijmers Neuroscience Department Tufts University

PathWave: short manual Version 1.0 February 4th, 2009

Introduction to R: Part I

Annotation and Gene Set Analysis with R y Bioconductor

Package PGSEA. R topics documented: May 4, Type Package Title Parametric Gene Set Enrichment Analysis Version 1.54.

Microarray Technology (Affymetrix ) and Analysis. Practicals

Introduction: microarray quality assessment with arrayqualitymetrics

How to use CNTools. Overview. Algorithms. Jianhua Zhang. April 14, 2011

MATH3880 Introduction to Statistics and DNA MATH5880 Statistics and DNA Practical Session Monday, 16 November pm BRAGG Cluster

An Introduction to R- Programming

Introduction to Mfuzz package and its graphical user interface

Frozen Robust Multi-Array Analysis and the Gene Expression Barcode

R version has been released on (Linux source code versions)

Biobase development and the new eset

RDBMS in bioinformatics: the Bioconductor experience

GS Analysis of Microarray Data

Building R objects from ArrayExpress datasets

Building Packages. Chao-Jen Wong, Nishant Gopalakrishnan, Marc Carson, and Patrick Aboyoun May, Fred Hutchinson Cancer Research Center

Biobase. April 20, 2011

Introduction into R. A Short Overview. Thomas Girke. December 8, Introduction into R Slide 1/21

Package virtualarray

Bioinformatics Workshop - NM-AIST

Package SCAN.UPC. July 17, 2018

8.1 Come analizzare i dati: R

Introduction to the Bioconductor marray package : Input component

affyqcreport: A Package to Generate QC Reports for Affymetrix Array Data

Introduction to R statistical environment

Programming R. Manuel J. A. Eugster. Chapter 1: Basic Vocabulary. Last modification on April 11, 2012

Alternative CDF environments

Bioconductor exercises 1. Exploring cdna data. June Wolfgang Huber and Andreas Buness

Creating a New Annotation Package using SQLForge

Package sigqc. June 13, 2018

Package SCAN.UPC. October 9, Type Package. Title Single-channel array normalization (SCAN) and University Probability of expression Codes (UPC)

ReadqPCR: Functions to load RT-qPCR data into R

CompClustTk Manual & Tutorial

/ Computational Genomics. Normalization

Package GSRI. March 31, 2019

Package ConsensusClusterPlus

R Basics. Laurent Gatto Computational Statistics for Genome Biology 2 nd July 2012

SimBindProfiles: Similar Binding Profiles, identifies common and unique regions in array genome tiling array data

Package arrayqualitymetrics

End-to-end analysis of cell-based screens: from raw intensity readings to the annotated hit list (Short version)

Drug versus Disease (DrugVsDisease) package

AGA User Manual. Version 1.0. January 2014

Using FARMS for summarization Using I/NI-calls for gene filtering. Djork-Arné Clevert. Institute of Bioinformatics, Johannes Kepler University Linz

Package EDASeq. December 30, 2018

Description of gcrma package

Expander Online Documentation

Aho, Kaisa-Leena; Kerkelä, Erja; Yli-Harja, Olli; Roos, Christophe. Construction of a computational data analysis pipeline using a workflow system

Package Risa. November 28, 2017

Workshop: R and Bioinformatics

Tutorial. RNA-Seq Analysis of Breast Cancer Data. Sample to Insight. November 21, 2017

CARMAweb users guide version Johannes Rainer

Short Introduction to R

funricegenes Comprehensive understanding and application of rice functional genes

mirnet Tutorial Starting with expression data

Working with Affymetrix data: estrogen, a 2x2 factorial design example

Package mimager. March 7, 2019

Package POST. R topics documented: November 8, Type Package

Basic R Part 1 BTI Plant Bioinformatics Course

Microarray Data Analysis (V) Preprocessing (i): two-color spotted arrays

Human Disease Models Tutorial

Graphs,EDA and Computational Biology. Robert Gentleman

Microarray annotation and biological information

Package SC3. November 27, 2017

Tutorial - Analysis of Microarray Data. Microarray Core E Consortium for Functional Glycomics Funded by the NIGMS

Why use R? Getting started. Why not use R? Introduction to R: Log into tak. Start R R or. It s hard to use at first

Part 1: Getting Started

Normalization: Bioconductor s marray package

Transcription:

Bioconductor tutorial Adapted by Alex Sanchez from tutorials by (1) Steffen Durinck, Robert Gentleman and Sandrine Dudoit (2) Laurent Gautier (3) Matt Ritchie (4) Jean Yang

Outline The Bioconductor Project OOP in R and Bioconductor Start-up: Installation, Courses, Vignettes

The Bioconductor Project

Bioconductor Bioconductor is an open source and open development software project for the analysis of biomedical and genomic data. The project was started in the Fall of 2001 and includes more than 25 core developers in the US, Europe, and Australia. Releases v 1.0: May 2 nd, 2002, 15 packages. v 1.1: November 18 th, 2002, 20 packages. v 1.2: May 28 th, 2003, 30 packages... v 1.9: October 4, 2006, 188 packages.... V 2.11 October 2012 608 Software p.,668 Annotation p., 137 Data p.

Goals Provide access to powerful statistical and graphical methods for the analysis of genomic data. Facilitate the integration of biological metadata (GenBank, GO, Entrez, PubMed) in the analysis of experimental data. Allow the rapid development of extensible, interoperable, and scalable software. Promote high-quality documentation and reproducible research. Provide training in computational and statistical methods.

Bioconductor packages R and the R package system are used to design and distribute Bioconductor software. An R package is a structured collection of code (R, C, or other), documentation, and/or data for performing specific types of analyses. Different types of packages Code Data (metadata or experimental data) Other (course packages )

Bioconductor packages Code packages provide implementations of specialized statistical and graphical methods. Data packages: Biological metadata: mappings between different gene identifiers (e.g., AffyID, GO, Entrez, PMID), CDF and probe sequence information for Affy arrays. E.g. hgu133plus2, GO, KEGG. Experimental data: code, data, and documentation for specific experiments or projects. yeastcc: Spellman et al. (1998) yeast cell cycle. golubesets: Golub et al. (2000) ALL/AML data. Course packages: code, data, documentation, and labs for the instruction of a particular course. E.g. EMBO03 course package.

Some packages arranged by their functionality Pre-processing CEL, CDF affy vsn.gpr,.spot, MAGEML marray limma vsn ExpressionSet Differential expression edd genefilter limma multtest ROC + CRAN Graphs & networks graph RBGL Rgraphviz Graphics gplots geneplotter hexbin + CRAN Cluster analysis CRAN class cluster MASS mva Prediction CRAN class e1071 ipred LogitBoost MASS nnet randomforest rpart Annotation annotate annaffy + metadata packages Data estrogen AMLL

OOP in Bioconductor and R

OOP The Bioconductor project has adopted the objectoriented programming (OOP) paradigm proposed in J. M. Chambers (1998). Programming with Data. This object-oriented class/method design allows efficient representation and manipulation of large and complex biological datasets of multiple types. Tools for programming using the class/method mechanism are provided in the R methods package. Tutorial:www.omegahat.org/RSMethods/index.html.

OOP: classes A class provides a software abstraction of a real world object. It reflects how we think of certain objects and what information these objects should contain. Classes are defined in terms of slots which contain the relevant data. An object is an instance of a class. A class defines the structure, inheritance, and initialization of objects.

OOP: methods A method is a function that performs an action on data (objects). Methods define how a particular function should behave depending on the class of its arguments. Methods allow computations to be adapted to particular data types, i.e., classes. A generic function is a dispatcher, it examines its arguments and determines the appropriate method to invoke. Examples of generic functions in R include plot, summary, print.

OOP: methods It is important to realize that when calling a generic function (such as plot), the actions performed depend on the class of the arguments. Methods define how a particular function should behave depending on the class of its arguments. Methods allow computations to be adapted to particular data types, i.e., classes.

ExpressionSet class Processed Affymetrix or spotted array data assaydata phenodata annotation annotation Matrix of expression measures, genes x samples Sample level covariates, instance of class annotateddataframe Name of annotation data source (annotation package) Name of features = data identifiers description notes MIAME information Any notes Use of object-oriented programming to deal with data complexity. S4 class/method mechanism (methods package).

AffyBatch class Probe-level intensity data for a batch of arrays (same CDF) cdfname nrow exprs ncol se.exprs Name of CDF file for arrays in the batch Dimensions of the array Matrices of probe-level intensities and SEs rows probe cells, columns arrays. phenodata annotation description notes Sample level covariates, instance of class phenodata Name of annotation data MIAME information Any notes

Getting Started

Installation 1. Main R software: download from CRAN ( cran.r-project.org), use latest release. 2. Bioconductor packages: download from Bioconductor (www.bioconductor.org), use latest release. Available for Linux/Unix, Windows, and Mac OS.

Installation After installing R, install Bioconductor packages using getbioc install script. From R > source("http://www.bioconductor.org/bioclite().r") > bioclite() In general, R packages can be installed using the function install.packages. In Windows, can also use Packages pull-down menus.

Installing vs. loading Packages only need to be installed once. But packages must be loaded with each new R session. Packages are loaded using the function library. From R > library(biobase) or the Packages pull-down menus in Windows. To update packages, use function update.packages or Packages pull-down menus in Windows. To quit: > q()

Documentation and help R manuals and tutorials:available from the R website or on-line in an R session. R on-line help system: detailed on-line documentation, available in text, HTML, PDF, and LaTeX formats. > help.start() > help(lm) >?hclust > apropos(mean) > example(hclust) > demo() > demo(image)

Short courses Bioconductor short courses modular training segments on software and statistical methodology; lectures notes, computer labs, and course packages available on WWW for self-instruction.

Vignettes Bioconductor has adopted a new documentation paradigm, the vignette. A vignette is an executable document consisting of a collection of code chunks and documentation text chunks. Vignettes provide dynamic, integrated, and reproducible statistical documents that can be automatically updated if either data or analyses are changed. Vignettes can be generated using the Sweave function from the R tools package.

Vignettes Each Bioconductor package contains at least one vignette, providing task-oriented descriptions of the package's functionality. Vignettes are located in the doc subdirectory of an installed package and are accessible from the help browser. Vignettes can be used interactively. Vignettes are also available separately from the Bioconductor website.

Vignettes Tools are being developed for managing and using this repository of step-by-step tutorials Some packages have functions to access or browse vignettes Biobase: openvignette Menu of available vignettes and interface for viewing vignettes (PDF). browsevignette()- tkwidgets: vexplorer Interactive use of vignettes.

Vignettes HowTo s: Task-oriented descriptions of package functionality. Executable documents consisting of documentation text and code chunks. Dynamic, integrated, and reproducible statistical documents. Can be used interactively vexplorer. Generated using Sweave (tools package). vexplorer

References R www.r-project.org, cran.r-project.org software (CRAN); documentation; newsletter: R News; mailing list. Bioconductor www.bioconductor.org software, data, and documentation (vignettes); training materials from short courses; mailing list.

Exercises Install Bioconductor using BiocLite() Load the Biobase package Browse which vignettes are available using the openvignette() function Browse the vignettes and carefully read and try the exercises in # 1: An Introduction to Biobase and ExpressionSets