Introduction to Cancer Genomics


1 Introduction to Cancer Genomics
Gene expression data analysis, part I
David Gfeller, Computational Cancer Biology, Ludwig Center for Cancer Research
david.gfeller@unil.ch

2 Overview
1. Basic understanding of RNA-Seq data processing.
2. Differential expression.
3. Dimensionality reduction.
With examples of R code.

3 Goals
- Help you understand what can be done with a computer -> programming logic.
- Give you a basic idea of how to ask the computer to perform some tasks -> syntax.
- Show you a few examples of gene expression data analysis in R that you could reuse for your projects (see also the practical).

4 Gene expression experiments
Microarrays: Chip with DNA probes that pair with the DNA (reverse-transcribed RNA) in a sample. Intensity is measured as a light signal. Very popular in ( ).
RNA-Seq: Directly counts how many transcripts (mRNA molecules) originate from each gene in a sample. Increasingly replacing microarrays for gene expression analyses.

5 RNA-Seq
Workflow: RNA -> fragmentation -> reverse transcription -> adaptors + amplification -> sequencing (>100M reads, e.g., ACCTAG, CGGTAA, ATGGCA, TGGGAC, TATAGG) -> map to reference transcriptome (Gene A, Gene B).
- Gene expression => Quite easy (count the reads).
- Gene fusion => More difficult (especially for new fusion events).
- Splicing => More difficult (especially for poorly annotated isoforms).

6 1 - Typical output of RNA-Seq
Raw sequences:
- Fastq format (sequence of the reads + quality information)
Processed data:
- Counts: Number of reads mapping to each gene/transcript.
- Bam format (compressed)
- Sra format (compressed)

7 How to think about these data in a computer
Sample1: gene1: 254; gene2: 1284; gene3: 7234
Sample2: gene1: 5; gene2: 362; gene3: 0
Sample3: gene1: 8902; gene2: 2199; gene3: 722
- Each expression value corresponds to a scalar.
- Each sample corresponds to a vector.
- All samples form a matrix M with S rows (samples) and N columns (genes).
- M[s,n] corresponds to the expression of gene n in sample s.
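The three example samples above can be written directly as an R matrix. This is only a minimal sketch; the row and column names are added here for readability and are not required.

```r
# Expression values of the three example samples (rows) for three genes (columns).
M <- matrix(
  c( 254, 1284, 7234,
       5,  362,    0,
    8902, 2199,  722),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Sample1", "Sample2", "Sample3"),
                  c("gene1", "gene2", "gene3"))
)

S <- nrow(M)  # number of samples
N <- ncol(M)  # number of genes

value  <- M[2, 3]         # a scalar: expression of gene3 in Sample2
sample <- M["Sample1", ]  # a vector: all genes measured in Sample1
```

Indexing by name (M["Sample1", ]) and by position (M[1, ]) return the same vector.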

8 Computers like numbers
In R:
- Scalar (numeric)
- Vector (array)
- Matrix (multidimensional array, e.g. S x N)
Gene expression data are naturally digitized, which makes them especially appropriate to use with computers.
Many other biological objects can be digitized as vectors or matrices:
- Protein/DNA sequences <-> vectors of letters/numbers
- Protein structures <-> vectors/matrices of 3D coordinates
- Interactions <-> N x N matrix with 1s and 0s
- Image <-> matrix of pixels (1/0 for a two-color image)
- Set of measurements <-> vector of values

9 How to think about these data in a computer
In R, once you load your data into a matrix (M), you can very easily:
- Print one specific column: M[,2]
- Print one specific row: M[1,]
- Plot the correlation of two genes: plot(M[,5], M[,7])
- Perform operations on rows or columns.
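As an illustration of "operations on rows or columns", here is a small sketch using the three-sample example matrix from the earlier slide. The functions rowSums(), colMeans() and cor() are standard R; the variable names are chosen here for illustration only.

```r
# Toy matrix: 3 samples (rows) x 3 genes (columns).
M <- matrix(c(254, 1284, 7234,
                5,  362,    0,
             8902, 2199,  722), nrow = 3, byrow = TRUE)

col2 <- M[, 2]   # one specific column (one gene across all samples)
row1 <- M[1, ]   # one specific row (one sample across all genes)

total_per_sample <- rowSums(M)    # total number of reads in each sample
mean_per_gene    <- colMeans(M)   # average expression of each gene

# Correlation between the first two genes across samples
# (plot(M[, 1], M[, 2]) would show the same relationship graphically).
r <- cor(M[, 1], M[, 2])
```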

10 Let's practice
- Create an empty directory Tutorial_Gfeller and a subdirectory Tutorial_Gfeller/Data.
- Download the file: GSE93722_RAW.tar at:
- Put it in Tutorial_Gfeller/Data/, uncompress it, then uncompress the zip files it contains. Each of the files corresponds to the gene expression profiling of a melanoma sample.
- Open RStudio. Set the working directory (Session -> Set Working Directory) to Tutorial_Gfeller.
- Create a new R script file (File -> New File -> R Script); this is where you will write your code. Save it in Tutorial_Gfeller as file.r.

11 Let's load the data
- Each GSMxxx corresponds to one sample.
- First have a look at the files in Excel (or any text editor).
- To start with, we will focus on the expected_count column.
- The command to load a file is read.delim():
    m1 <- read.delim("Data/GSE93722_RAW/GSMxxx_LAU125.genes.results.txt")
  Here m1 is the name of the object that will store the data, and the quoted string is the path to the file to be loaded.
- Then execute the command in the Console (by pasting it, or with command+enter).
- Now you can look at the elements of m1 (e.g., for the first line, type m1[1,] in the console). Does it correspond to the first line of the file?
- With dim(m1) you can check the dimensions of m1.

12 Let's load the data
- Load the other files into m2 (LAU1255), m3 (LAU1314) and m4 (LAU355).
- Build a matrix taking the fifth column in each file:
    M <- matrix(nrow=4, ncol=dim(m1)[1])   # Initialize an empty matrix with the correct dimensions
    M[1,] <- m1[,5]                        # In the first row, put the 5th column of m1
- Do the same with m2, m3 and m4 (if you had many files, you would use a loop, see exercises).
- Try to query any entry of your matrix (e.g., M[3,5]). Do you get the expected number?
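The loop mentioned above could look like the following sketch. To keep it runnable anywhere, it first writes four tiny tab-delimited files to a temporary folder; the file names, column names and values are hypothetical stand-ins for the real files, with the counts placed in the fifth column as in the slide.

```r
# Write four small toy files (hypothetical names and values) to a temporary folder.
data_dir <- file.path(tempdir(), "toy_genes_results")
dir.create(data_dir, showWarnings = FALSE)
samples <- c("LAU125", "LAU1255", "LAU1314", "LAU355")
for (k in seq_along(samples)) {
  toy <- data.frame(
    gene_id          = c("geneA", "geneB", "geneC"),
    transcript_id    = "tx",
    length           = c(1000, 2000, 500),
    effective_length = c(900, 1900, 400),
    expected_count   = c(10, 20, 30) * k
  )
  write.table(toy,
              file.path(data_dir, paste0("toy_", samples[k], ".genes.results.txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# List all files, then fill one row of M per file.
files <- list.files(data_dir, pattern = "\\.genes\\.results\\.txt$", full.names = TRUE)
first <- read.delim(files[1])
M <- matrix(nrow = length(files), ncol = nrow(first))
rownames(M) <- sub("^toy_(.*)\\.genes\\.results\\.txt$", "\\1", basename(files))
for (s in seq_along(files)) {
  m <- read.delim(files[s])
  M[s, ] <- m[, 5]   # fifth column: expected_count
}
```

Naming the rows after the files makes it easy to check that each sample ended up in the row you expect.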

13 Genes have (many) names
- In these files, we have Ensembl gene IDs. We want to convert them to common gene names.
- We need a file with the mapping (two columns, one for Ensembl IDs, one for gene names).
- Go to:
- Select Ensembl Genes 90, then Human genes.
- In Attributes, select GENE: -> Gene stable ID and EXTERNAL: -> HGNC symbol.
- Click on Results, then Unique results only, and Go to save to a local file (put the file in Tutorial_Gfeller/Data).

14 Then in R
- Open the file:
    mapping <- read.delim("Data/mart_export.txt")
- Use the match() function to find the position in mapping of all the genes for which you have expression data in m1:
    i <- match(m1[,1], mapping[,1])
- Then build a vector with the gene names:
    gene <- as.character(mapping[i,2])
    N <- length(gene)
- Verify that the mapping is correct by checking a few examples.
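To see what match() returns, here is a small sketch with made-up identifiers and gene names (they are placeholders, not real Ensembl IDs). Note that identifiers absent from the mapping give NA, which is worth counting when you verify the result.

```r
# Hypothetical identifiers found in the expression file (first column of m1).
expr_ids <- c("ENSG_toy3", "ENSG_toy1", "ENSG_toy9")

# Hypothetical mapping table: column 1 = ID, column 2 = gene name.
mapping <- data.frame(
  id     = c("ENSG_toy1", "ENSG_toy2", "ENSG_toy3"),
  symbol = c("GENE_A", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)

# Position in 'mapping' of each identifier; NA if it is not found.
i <- match(expr_ids, mapping[, 1])

gene <- as.character(mapping[i, 2])
N <- length(gene)

# How many identifiers could not be mapped to a name?
n_unmapped <- sum(is.na(gene))
```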

15 Computers like simple and sequential calculations
- Additions/subtractions and multiplications/divisions.
- You need to decompose any problem into a set of simple operations.
- You need to tell the computer about every step of your calculations (e.g., loop over all entries in one column).
Example: Find the average expression of a gene (e.g., EGFR) across samples.

16 How to do it on a computer
1) Have a matrix M with all expression values and a vector gene with the names of the genes (columns of M).
2) Find the column corresponding to your gene:
    n <- which(gene == "EGFR")
3) Initialize a scalar:
    av <- 0
4) Go through each element of the column M[,n]:
    S <- dim(M)[1]
    for(s in 1:S){ av <- av + M[s,n] }
5) Normalize your value:
    av <- av/S
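The five steps can be run end to end on a toy matrix. The values and the gene list below are invented for illustration; the last line compares the loop with R's built-in mean() function.

```r
# Toy data: 4 samples (rows) x 3 genes (columns); values are made up.
M <- matrix(c(10, 200, 3,
              30, 220, 5,
              20, 180, 4,
              40, 240, 8), nrow = 4, byrow = TRUE)
gene <- c("TP53", "EGFR", "CD19")

n <- which(gene == "EGFR")   # column of the gene of interest
S <- dim(M)[1]               # number of samples

av <- 0
for (s in 1:S) {
  av <- av + M[s, n]         # accumulate the sum over the column
}
av <- av / S                 # divide by the number of samples

av_builtin <- mean(M[, n])   # same calculation with the built-in function
```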

17 How programming languages work
- The exact commands will change between programming languages (R, Python, Perl, C, MATLAB), but the logic remains the same ("grammar").
- Learning the syntax ("words") can be done with many online resources.
- In these two days, we will focus on R, since it is very convenient for graphical visualization of the data.
- There are many built-in functions (e.g., mean()), but it is important to understand the logic.

18 Typical output of RNA-Seq
Raw sequences:
- Fastq format (sequence of the reads + quality information)
Processed data:
- Counts: Number of reads mapping to each gene/transcript.
- Bam format (compressed)
- Sra format (compressed)

19 Computational analyses
Reads (>100M, e.g., ACCTAG, CGGTAA, ATGGCA, TGGGAC, TATAGG) are mapped to a reference transcriptome (Gene A, Gene B). Things to consider:
- Alignments
- Isoforms (splicing)
- Low complexity regions (repeats)
- Variable regions (TCR, MHC)
- Sequencing errors
- Poorly annotated regions / genomes

20 What else needs to be considered
- Different samples can have different total numbers of reads (e.g., different sequencing depth).
- Longer genes have more reads.
- If you want to compare expression between samples, you need to renormalize by the total number of reads.
- If you want to compare expression between genes, you need to renormalize by gene length.
[Figure: reads mapped to Gene A and Gene B in Sample 1 and Sample 2]

21 How to do it (naïve way)
    S <- dim(M)[1]                        # Number of samples (rows)
    N <- dim(M)[2]                        # Number of genes (columns)
    M.norm <- matrix(nrow=S, ncol=N)      # Initialize an empty matrix
    for(s in 1:S){
      tot <- 0
      for(n in 1:N){
        tot <- tot + M[s,n]               # Compute the sum over row s
      }
      for(n in 1:N){
        M.norm[s,n] <- M[s,n]/tot         # Normalize row s
      }
    }
Finally, multiply M.norm by a large constant factor to avoid having too small numbers.
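The same normalization can be written without explicit loops. This is a sketch rather than the slide's own code; the scaling constant (here 1e6) is an assumption chosen for illustration, since the slide does not state its value. The final check follows the advice of verifying that all row sums are equal after normalizing rows.

```r
# Toy count matrix: 3 samples (rows) x 3 genes (columns).
M <- matrix(c(254, 1284, 7234,
                5,  362,    0,
             8902, 2199,  722), nrow = 3, byrow = TRUE)

scale_factor <- 1e6   # assumed constant, only to avoid very small numbers

# Dividing a matrix by a vector with one entry per row divides each row
# by its own total (R recycles the vector down the columns).
M.norm <- M / rowSums(M) * scale_factor

# Sanity check: every row should now sum to the same value.
row_totals <- rowSums(M.norm)
stopifnot(isTRUE(all.equal(row_totals, rep(scale_factor, nrow(M)))))
```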

22 A few names commonly used
- Raw counts: Number of reads mapping to a gene.
- Scaled counts: Raw counts after renormalization by the total number of counts in the sample.
- Reads Per Kilobase per Million mapped reads (RPKM): Divide by the total number of reads and then by the gene length. Multiply by a constant factor to have numbers that are easier to read.
- Transcripts Per Million (TPM): Divide by gene length and then normalize across all genes (i.e., the sum of TPMs of all genes is the same for all samples).

23 Scaled counts vs TPM vs RPKM
- TPM are increasingly used. The sum over all genes is always equal to 10^6 in TPM.
- Within a sample, the two values (TPM vs RPKM) are equivalent, up to a renormalizing factor.
- Scaled counts are enough to compare the same gene in different samples.
- TPM/RPKM are required to compare different genes.
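A small sketch of both calculations, using invented counts and gene lengths (in base pairs). It follows the usual definitions: RPKM divides counts by the sample total (in millions) and by gene length (in kilobases); TPM divides by gene length first and then rescales each sample to sum to 10^6. The last lines check that, within each sample, TPM and RPKM differ only by a constant factor.

```r
# Hypothetical counts: 2 samples (rows) x 3 genes (columns).
counts <- matrix(c(100, 400, 500,
                   300, 300, 400), nrow = 2, byrow = TRUE,
                 dimnames = list(c("Sample1", "Sample2"),
                                 c("geneA", "geneB", "geneC")))
gene_length <- c(geneA = 1000, geneB = 4000, geneC = 2000)  # in bp (made up)

# RPKM: reads per kilobase of gene per million reads in the sample.
per_million <- rowSums(counts) / 1e6
rpkm <- sweep(counts / per_million, 2, gene_length / 1e3, "/")

# TPM: length-normalize first, then rescale each sample to sum to 1e6.
rate <- sweep(counts, 2, gene_length / 1e3, "/")
tpm  <- rate / rowSums(rate) * 1e6

# Within a sample, the ratio TPM/RPKM is the same for every gene.
ratio <- tpm / rpkm
```

The ratio differs between the two samples, which is why RPKM values do not sum to the same total across samples while TPM values do.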

24 Studying expression of some gene in two types of samples
[Figure: column M[,n] of the matrix, with the samples of group G1 in a blue box and those of group G2 in a red box]
1) Define the groups:
    G1 <- c(1,2); G2 <- c(3,4)
2) Find the column corresponding to the gene:
    n <- which(gene == "CD19")
3) Take the mean over the blue box:
    av1 <- 0; for(s in G1){ av1 <- av1 + M.norm[s,n] }; av1 <- av1/length(G1)
4) Take the mean over the red box:
    av2 <- 0; for(s in G2){ av2 <- av2 + M.norm[s,n] }; av2 <- av2/length(G2)
5) Compare expression.
6) With more samples you can do statistics (t-test, boxplot, see exercises).
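With a few more samples per group, the comparison can be done with built-in functions. The sketch below uses invented values for six samples and two genes; mean() replaces the explicit loops and t.test() is R's standard two-sample t-test (Welch's version by default).

```r
# Hypothetical normalized expression: 6 samples (rows) x 2 genes (columns).
M.norm <- matrix(c(  5,   8,   6,  50,  62,  55,    # CD19
                   100,  90, 110,  95, 105, 100),   # a second gene
                 ncol = 2)
gene <- c("CD19", "ACTB")

G1 <- 1:3   # samples of the first type
G2 <- 4:6   # samples of the second type
n  <- which(gene == "CD19")

av1 <- mean(M.norm[G1, n])   # mean expression in group 1
av2 <- mean(M.norm[G2, n])   # mean expression in group 2

# Two-sample t-test comparing the groups.
tt <- t.test(M.norm[G1, n], M.norm[G2, n])
p_value <- tt$p.value
```

In an interactive session, boxplot(M.norm[G1, n], M.norm[G2, n]) shows the two groups side by side.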

25 2 - Differential expression
How can we quantify these differences?
[Figure: expression level in samples S1 and S2]

26 Differential expression
Log fold change: Highly expressed genes can show much bigger differences in counts ( to ) than lowly expressed genes (10 to 20), even if they experience the same relative change. It is better to use logarithms: 10 -> 20 corresponds to a log2 fold change of 1, the same as for any other doubling.
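A short numerical illustration of why logarithms help. The low-expression example (10 to 20) is from the slide; the high-expression values (1000 to 2000) are an assumption chosen here to represent the same relative change.

```r
low_before  <- 10;   low_after  <- 20
high_before <- 1000; high_after <- 2000   # hypothetical highly expressed gene

# Absolute differences are very different...
diff_low  <- low_after  - low_before
diff_high <- high_after - high_before

# ...but the log2 fold changes are identical, because both genes doubled.
lfc_low  <- log2(low_after  / low_before)
lfc_high <- log2(high_after / high_before)
```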

27 Differential expression
P-value: Gives a statistical significance, but is not trivial to estimate.
Differences in the mean values are not enough!
[Figure: three panels showing expression levels]

28 Differential expression
P-value: Gives a statistical significance, but is not trivial to estimate.
Depending on your random model, the first case may be more likely to appear by chance.

29 Differential expression
P-value: Gives a statistical significance, but is not trivial to estimate.
Advanced statistical methods have been developed to estimate P-values in RNA-Seq data!

30 Differential expression
P-value: Gives a statistical significance, but is not trivial to estimate.
Many genes (20 000) => many tests => higher chances that some differences are just due to chance.
[Figure: expression levels of Gene 1 to Gene 11]

31 Tools for differential expression
- Accurate estimation of P-values aims at taking these different issues into account when testing the hypothesis that the expression values come from the same distribution, or have the same mean, in two conditions.
- Consider the multiple testing problem.
- The result is a table P with one row per gene and the columns: gene, mean, log-fold change, P-value, adjusted P-value.
Tools in R:
- edgeR
- DESeq2
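The multiple testing problem can be illustrated with base R alone. In this sketch, random data are simulated so that no gene is truly different between the two groups; a plain t-test per gene is used only as a simple stand-in and is not what edgeR or DESeq2 do internally. The adjustment uses p.adjust() with the Benjamini-Hochberg method, one common choice.

```r
set.seed(1)
N  <- 2000    # number of genes (columns)
G1 <- 1:4     # samples of condition 1
G2 <- 5:8     # samples of condition 2

# Simulated expression: 8 samples x N genes, with no real difference.
M <- matrix(rnorm(8 * N), nrow = 8)

# One t-test per gene.
p <- sapply(seq_len(N), function(n) t.test(M[G1, n], M[G2, n])$p.value)

# Adjust the P-values for multiple testing.
p_adj <- p.adjust(p, method = "BH")

n_raw <- sum(p < 0.05)       # genes that look "significant" by chance alone
n_adj <- sum(p_adj < 0.05)   # genes still significant after adjustment
```

Comparing n_raw and n_adj shows how many apparent hits disappear once the number of tests is taken into account.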

32 How to show your results?
The table P has one row per gene and the columns: gene, mean, log-fold change, P-value, adjusted P-value. Genes with P_adj < 0.05 and genes with P_adj >= 0.05 are shown in different colors. How to plot this on your computer?
1) Select genes with P_adj >= 0.05:
    ind1 <- which( P[,5] >= 0.05 )
2) Plot these points, with axis ranges that cover all genes:
    plot( P[ind1, 2], P[ind1, 3], xlim=range(P[,2]), ylim=range(P[,3]) )
3) Select genes with P_adj < 0.05:
    ind2 <- which( P[,5] < 0.05 )
4) Overlay these points on the same graph:
    points( P[ind2, 2], P[ind2, 3], col="red" )

33 3 - Visualizing high-dimensional data
Each sample can be considered as a point in a very high dimensional space (N dimensions). In this high-dimensional space, are some samples more similar to each other?
- Replicates
- Similar cell types
- Cancer subtypes

34 Example in 3D (i.e. 3 genes)
[Figure: samples S1 to S5 plotted along the axes Gene 1, Gene 2 and Gene 3]
Visually, you can see that:
- S1, S3, S4 are similar to each other.
- S2, S5 are similar to each other.
Can you quantify it?
- Distance
- Angle (correlation)

35 Distances - How would you do it on a computer?
    S1 <- c(5, 6, -1)
    S2 <- c(-2, 5, 3)
    d12 <- 0
    for(i in 1:3){ d12 <- d12 + (S1[i]-S2[i])**2 }
    d12 <- sqrt(d12)
Here we used ** for taking the square of a number and the sqrt() function for the square root.

36 What if you have many genes?
- Very hard to visualize.
- You can still compute distances:
    d12 <- 0
    N <- length(S1)
    for(i in 1:N){ d12 <- d12 + (S1[i]-S2[i])**2 }
    d12 <- sqrt(d12)
This is a big advantage of using programming languages, compared to Excel (or manual calculations).
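For comparison, the same Euclidean distance can be computed in three equivalent ways. This sketch reuses the vectors S1 and S2 from the 3-gene example; the vectorized line and the built-in dist() function work unchanged for vectors of any length.

```r
S1 <- c(5, 6, -1)
S2 <- c(-2, 5, 3)

# 1) Explicit loop, as above (^ and ** both mean "to the power of" in R).
d12_loop <- 0
for (i in seq_along(S1)) {
  d12_loop <- d12_loop + (S1[i] - S2[i])^2
}
d12_loop <- sqrt(d12_loop)

# 2) Vectorized: R subtracts and squares element by element.
d12_vec <- sqrt(sum((S1 - S2)^2))

# 3) Built-in function: dist() computes distances between the rows of a matrix.
d12_dist <- as.numeric(dist(rbind(S1, S2)))
```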

37 Visualization
- Distances are still not very intuitive.
- If you have many points (S), the number of pairwise distances is S(S-1)/2.
- Idea: Project the data in 2D, so that it optimally represents the raw data (gene expression profiles) in the N-dimensional space.
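To make the S(S-1)/2 count concrete, the sketch below computes all pairwise distances between the five example points of this lecture (S1 to S5, three genes each) with the built-in dist() function.

```r
# Five samples (rows) described by three genes (columns).
mat <- rbind(S1 = c( 5,   6,   -1),
             S2 = c(-2,   5,    3),
             S3 = c( 5.5, 6.5, -1.3),
             S4 = c( 4,   6.5, -0.3),
             S5 = c(-2.2, 5.3,  3.1))

d <- dist(mat)    # all pairwise Euclidean distances between rows
S <- nrow(mat)

n_pairs <- length(d)      # number of distinct pairs
expected <- S * (S - 1) / 2

D <- as.matrix(d)         # same distances as a full S x S table
```

Even with only five samples there are already ten distances to read, and D["S1", "S3"] is much smaller than D["S1", "S2"], in line with the grouping seen in the 3D example.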

38 2D projection: the good choice
[Figure: samples S1 to S5 in the 3D space (Gene 1, Gene 2, Gene 3) and their projection in 2D onto the plane PC1, PC2]

39 2D projection: the bad choice
[Figure: samples S1 to S5 in the 3D space (Gene 1, Gene 2, Gene 3) and their projection in 2D onto a poorly chosen plane]

40 Principal Component Analysis (PCA)
How to select the 2D plane on which to project the data?
- Intuitive idea: Take the axes with the largest variance or dispersion (Principal Components).
- The math behind it is not simple (eigenvalue decomposition of the covariance matrix), but it does not depend on the number of genes (dimension).
- You do not need to understand the math to use it.
[Figure: samples S1 to S5 in the 3D space (Gene 1, Gene 2, Gene 3) with the axes PC1 and PC2]

41 How to do it on your computer
In R, use the function prcomp (stats package).
Each point in space:
    S1 <- c(5, 6, -1)
    S2 <- c(-2, 5, 3)
    S3 <- c(5.5, 6.5, -1.3)
    S4 <- c(4, 6.5, -0.3)
    S5 <- c(-2.2, 5.3, 3.1)
Coordinates along the x, y, z axes:
    x <- c(S1[1], S2[1], S3[1], S4[1], S5[1])
    y <- c(S1[2], S2[2], S3[2], S4[2], S5[2])
    z <- c(S1[3], S2[3], S3[3], S4[3], S5[3])
Plot the data in 3D:
    library(rgl)
    plot3d(x,y,z, xlim=c(-10,10), ylim=c(-10,10), zlim=c(-10,10))
Make a PCA analysis (first make a matrix with each point in one row):
    mat <- t(matrix(c(S1, S2, S3, S4, S5), nrow=3))
    pca <- prcomp(mat)
    plot(pca$x[,1], pca$x[,2])
See practical this afternoon.
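As a complement, this sketch shows how much of the total variance each principal component captures for the five example points, and checks the link with the eigenvalue decomposition of the covariance matrix. The variable names var_explained and eig are chosen here for illustration.

```r
# Five samples (rows) x three genes (columns), as in the example above.
mat <- rbind(c( 5,   6,   -1),
             c(-2,   5,    3),
             c( 5.5, 6.5, -1.3),
             c( 4,   6.5, -0.3),
             c(-2.2, 5.3,  3.1))

pca <- prcomp(mat)

# Fraction of the total variance captured by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# The variances along the components are the eigenvalues of the covariance matrix.
eig <- eigen(cov(mat))
same_variances <- isTRUE(all.equal(pca$sdev^2, eig$values))
```

Here the first component carries most of the variance, which is why the 2D plot of pca$x[,1] against pca$x[,2] separates the two groups of samples so clearly.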

42 Now let's look at the tumor expression data
Run:
    pca <- prcomp(M.norm)
    # Plot the samples along the two first components
    plot(pca$x[,1], pca$x[,2])
What do you see? Does it make sense in light of the expression of CD19?

43 Principal component analysis: some discussions
- The axes with the largest variance do not necessarily reflect the structures in the data.
- In PCA, the principal components are always orthogonal (linear method).
- It is often useful to make sure the mean of the samples is at 0.
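Regarding the last point, prcomp() already subtracts the mean of each gene by default (argument center = TRUE). The sketch below, on the five example points, shows where the removed means are stored and what changes when centering is switched off.

```r
mat <- rbind(c( 5,   6,   -1),
             c(-2,   5,    3),
             c( 5.5, 6.5, -1.3),
             c( 4,   6.5, -0.3),
             c(-2.2, 5.3,  3.1))

# Default behaviour: each column (gene) is centered before the PCA.
pca <- prcomp(mat)
removed_means  <- pca$center       # the column means that were subtracted
centered_means <- colMeans(pca$x)  # projected samples have mean ~0 on every axis

# Without centering, the projected samples are no longer centered at 0.
pca_raw <- prcomp(mat, center = FALSE)
raw_means <- colMeans(pca_raw$x)
```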

44 Many refinements/alternatives
- In PCA, only select a subset of genes (high expression, high variability, ...).
- Multi-dimensional scaling (MDS): Plot the points in 2D so that distances in the original space are best preserved (R function cmdscale() in the stats package).
- t-distributed Stochastic Neighbor Embedding (t-SNE): Very popular these days (R package tsne). Non-linear technique (not a simple projection).
- All these techniques are fully unsupervised: they do not need to know what your data are, which clusters you should expect, ...
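A minimal MDS sketch on the five example points, using cmdscale() from the stats package (classical MDS). The input is the table of pairwise distances; the output is one 2D coordinate per sample.

```r
mat <- rbind(S1 = c( 5,   6,   -1),
             S2 = c(-2,   5,    3),
             S3 = c( 5.5, 6.5, -1.3),
             S4 = c( 4,   6.5, -0.3),
             S5 = c(-2.2, 5.3,  3.1))

d_orig <- dist(mat)               # distances in the original 3D space
mds    <- cmdscale(d_orig, k = 2) # 2D coordinates, one row per sample

d_2d <- dist(mds)                 # distances after projection to 2D

# Largest distortion introduced by going from 3D to 2D.
max_distortion <- max(abs(as.numeric(d_orig) - as.numeric(d_2d)))
```

In an interactive session, plot(mds[, 1], mds[, 2]) shows the resulting 2D map.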

45 How to choose?
Start with PCA.
If you know what your samples are (e.g., different cell types), you can try to play a bit with parameters (e.g., choice of genes, choice of algorithm) to obtain meaningful clusters. This can go two ways:
- Find optimal parameters that best capture the signal in your data. => Allows you to discover new things.
- Overfit your data: see only what you want to see (even if it is not there). => Prevents you from seeing anything new.

46 Where to access gene expression data
- GEO: Largest collection of gene expression data (microarray, RNA-Seq). Often has counts (not only raw data).
- ENA (European Nucleotide Archive): Large collection of raw RNA-Seq data (bam files).
- ArrayExpress: functional genomics data.
See exercises this afternoon.

47 Where can we access cancer gene expression data
TCGA: large collection of tumor RNA-Seq, Exome-Seq, methylation, clinical information, ...
> patients with sequenced tumors.
See exercises tomorrow.

48 General remarks about programming
- Computers like numbers and simple operations. You need to decompose complex tasks into simple steps.
- Learning a programming language takes time, but you do not need to know everything before starting. First understand the logic, then use books or online resources for the syntax.
- Data analysis takes time. Analyzing large datasets is often more challenging than producing them.

49 General remarks about programming
- Many ways of making many mistakes!!! We all make mistakes.
- You need to check your outputs when you write code. If you do a normalization on matrix rows, check that the row sums are truly equal.
- If there is something incoherent in your output, always go back to find the mistakes (do not attribute it to "noise"), even if the data come from a bioinformatics expert.

50 General remarks about programming
- In the beginning, it is a big investment to write a script, rather than using Excel.
- But in the long run, it allows you to go much faster and quickly analyze many datasets without having to redo everything each time.
- Many analyses cannot be done in Excel, while R provides many packages that you can use.

51 How to get support for bioinformatics analyses of gene expression data
- Sequencing facility: GTF (Keith Harshman).
- Standard pipelines for normalizing and PCA: Bioinformatics core facility (Delorenzi) or Vital-IT (Xenarios).
- Very specific analyses: groups working in computational biology.

52 Questions?


More information
