From microarray images to Biological knowledge. Junior Barrera BIOINFO-USP DCC/IME-USP

Size: px
Start display at page:

Download "From microarray images to Biological knowledge. Junior Barrera BIOINFO-USP DCC/IME-USP"

Transcription

1 From microarray images to Biological knowledge Junior Barrera BIOINFO-USP DCC/IME-USP

2 Team Hugo A. Armelin Junior Barrera Helena Brentaini Marcel Brun Y. Chen Edward R. Dougherty Roberto M. Cesar Jr. Daniel Dantas Gustavo Esteves Marco D. Gubitoso Nina S. T. Hirata Roberto Hirata Jr Luiz F. Reis Paulo S. Silva Sandro de Souza Walter Trepode...

3 Layout Introduction Chip design Image analysis Normalization Genes Signature Clustering Genetic networks An environment for knowledge discovery

4 Introduction

5 Knowledge evolution in genetics Gene expression

6 Knowledge evolution in genetics

7 Data acquisition

8 Data acquisition

9 Analysis Phases G. Signature Measure Normal, Clustering Clustering Clustering G. R. Net.

10 Biological questions What genes are enough to separate a cancerous tissue from a normal one? What genes can separate to types of cancer? What genes differentiate two types of chickens or trees? What genes regulate a pathway?

11 Chip design

12 Clone Selection 1. Clustering for the same gene 5 Known gene 3 2. Choice of a clone representing the gene 5 3 Cuidado com sítios de Poliadenilação!!

13 Affect hybridization process Clone size - they should have similar size Clone position in the gene - they should come from similar regions Clone similarity - they should not have large similar regions

14 Hybridization cdnas Captura das Imagens Cy5 Misturas de cdnas marcados CY3 e CY5 Cy3

15 Image Analysis

16 A scanned image of a microarray slide

17 Processo de Segmentação de Imagens de Microarrays Hirata R, Barrera J, Hashimoto R, Dantas D, Esteves G. In press, 2002.

18 Raw data to the gene expression estimation step

19 Available solutions Scanalyze: usually doesn t t find misaligned spots. SpotFinder(TIGR): subarrays must be placed manually. Arrayvision: very good on locating misaligned spots; many options. UCSF Spot: : does everything automatically if the image is perfect. Quantarray,, F-scanF scan, Dapple, Genepix, Imagene etc. All of them require user interaction to some level.

20 Our aim... Is to reduce the user interaction, doing the job automatically and measuring correctly the relative mrna concentrations. This will make the process cheaper and faster. User interaction makes the segmentation subjective. Eliminating that, the results may be more reproducible.

21 Our software

22 Parameter setting In this window the user sets parameters for a whole family of arrays He can save in a file for reusing them

23 A vertical image profile is the sum of the spots values of each image line

24 The subarray gridding Is done by filtering the horizontal and vertical profiles

25 And finally taking the local minima of the filtered profile

26 the same is done with the horizontal profile. Here the result

27 Spots gridding is done separately for each subarray

28 The profile filtering is simpler having just one step, and also uses local minima

29 The spots detection step is basically the application of the Watershed operator

30 To avoid oversegmentation the image must be filtered

31 The filtered image also gives markers that will be used in Watershed

32 We give as input to the Watershed the markers, grid and the filtered image gradient

33 Here the resulting grid in white and spots cortours in light blue

34 Segmentation example

35 Segmentation example

36 Raw data to the gene expression estimation step The raw data of a spot consists on: the pixels values of both channels inside its rectangular region of interest which pixels belong to foreground or backround Foreground is the region with spotted cdna Background is the region without it.

37 Raw data to the gene expression estimation step

38 Gene expression estimation Is to find a value that represents the relative quantity of mrna in the two samples.

39 Some techniques to estimate gene expression Linear regression or least-squares squares fit of the values of pixels in the two channels.

40 Some techniques to estimate gene expression (ch1i-ch1b) ch1b) / (ch2i-ch2b) ch2b) where chxi is the estimated foreground intensity and chxb is the estimated backround intensity of channel X.

41 Some techniques to estimate gene expression To estimate chxi and chxb we can do: mean or median of all pixels in the foreground and background. mean or median of some percentiles in the foreground and background (fixed region method) mean or median of higher percentiles of all the pixels in the rectangle to estimate chxi and of lower percentiles to estimate the chxb.. Foreground and background information is ignored (histogram method)

42 Gene expression generation The program saves the expression data in a tab separated text file The file has the same format of the ones generated by ScanAlyze

43 Validation We made controlled experiments to test the expression estimation techniques. The objective of the experiment was to test how expression was affected by: position in the slide dilution of cdna length of mrna fragments being marked with cy3 or cy5

44 Validation We spotted microarrays with 32 blocks, each block with 6 genes x 5 dilutions x 2 repetitions + 4 landmarks = 64 spots We made six slides like this and, onto them, we poured six different mrna soups: Dilution gene 43A 43B 44A 44B 45A 45B Irf Trp ST IL Q Lys

45 Validation Here each point is the value of a spot obtained by the fixed region method. Spots from different dilutions are grouped. The black ones are from the three bigger mrna fragments, and the red, from the three smaller. sqrt( ( (ch1i A -ch1b A ) x (ch2i B -ch2b B ) ) sqrt( ( (ch2i A -ch2b A ) x (ch1i B -ch1b B ) )

46 Validation And here is the best result, obtained with the histogram method. sqrt( ( (ch1i A -ch1b A ) x (ch2i B -ch2b B ) ) sqrt( ( (ch2i A -ch2b A ) x (ch1i B -ch1b B ) )

47 Normalization

48 Normalization The expected expression of the gene IRF was 1.0 but the expression found was 1.6 This is due to the physical properties of the dyes.

49 Normalization When we have a single slide, we must eliminate the constant k assuming, when appropriate, that we can normalize all the spots using the expression of a housekeeping gene we can normalize assuming that the mean of normalized expression rates is one x = k (ch1i ch1b) (ch2i ch2b)

50 Normalization by swap Consists on eliminating the influence of the dyes properties by using two slides, and swapping the dye used to label the mrna sample. Use it if you find the single slide normalization hypotheses too strong.

51 Normalization by swap Better results can be achieved by doing swap experiments. x = k (ch1i A ch1b A ) = (ch2i B ch2b B ) 1 (ch2i A ch2b A ) (ch1i B ch1b B ) k

52 Normalization by swap Better results can be achieved by doing swap experiments. x = (ch1i (ch2i A A ch1b ch2b A A ) ) (ch2i (ch1i B B ch2b ch1b B B ) )

53 Genes Signature

54 Cancer tissue characterization Problem: from a small set (60) of microarrays, find a minimum number of genes that are enough to separate cancer A and B.

55 v erb b2 avian erythroblastic leukemia viral oncogene homolog LINEAR CLASSIFIER (DISPERSED GAUSSIAN) w/ σ = the tumor sample from Patient phosphofructokinase, platelet 7 th ( ) BRCA1 BRCA2/sporadic 1 th ( )

56 Approach randomize data compute classifier using genes subsets measure error for different dispersions choose the subset that balance small error and high dispersion. A supercomputer is required.

57 Linear classifier Dispersion centered in the sample Flat round dispersion model Error computed analytically (faster) X i [R n-1 ] 2 R h 2 r k R x k h X j [R 1 ] Misclassification error, ε n (h,r) Hyper-plane section: V n-1

58 Robustness analysis r k r j

59 Error curves under various dispersion levels, σ σ = 0.20 σ = 0.30 σ = 0.40 σ = 0.50 σ = 0.60 σ = 0.70 misclassification erorr, ε # of variables, d

60 Linear programming optimization

61

62

63

64 Steps The best linear classifier uses about genes Genes used are eliminated and the best linear classifier is computed, more genes are separated The procedure is repeated till having about 100 genes The full search is applied in the selected subset of genes

65 Separation

66 Validation Expression of chosen subsets of genes are measured several times in low cost experiments If the experiments reveal compact clusters the subset of genes chosen should be correct.

67 Clustering

68 Time course data 6 time course data C1 C2 C3 4 Clustered by dendrogram 2 Original clusters log2(ratio) 0 Dendrogram time KL plot multidimensional space different views of Microarray data KL plot mis-classified C1 C2 C

69 Example No error! Tighter clusters due to small variance Results from Fuzzy c-means

70 Gene Regulation Networks

71 Cell peptide other signals GENES NETWORK mrna Transcr. Translat. Proteins Pool Metabolic Pathways

72 Cell Cycle Modeling Division Steps x1fp... x1 z u1 y1fp x1f y1f I u2 u5 p w1fp vfp w2fp w1f v w2f y2fp w1 w2 x3f x4f y1f y2f y2f y1 y2 z Forward Signal Feedback to p Feedback to previous layer... x6fp x6f x6

73 Modeling Dynamical Systems

74 Simulator Architecture

75

76

77

78 Knockout x1fp... x1 Division Steps z u1 y1fp x1f y1f I u2 u5 p w1fp vfp w2fp w1f v w2f y2fp w1 w2 x3f x4f y1f y2f y2f y1 y2 z Forward Signal Feedback to p Feedback to previous layer... x6fp x6f x6

79

80

81 An environment for knowledge discovery

82 Objected oriented database P1 P2 Pn Pi : analytical and mining procedures (kernel parallel)

83 Integrated Environment Genbank database Another databases Access control module Clinical data Sequence module P1 microarray module P1 analitical operational P2 P3 analitical operational P2 P3 query Pn Pn query query

84 Users 1 Analysis tools writer SpreadSheet Dataminig 2 M_database MOLAP/ROLAP server 3 Metadata Data Warehouse Data Mart Integrator Wrapper Wrapper Wrapper 4 IMS RDMS Others

85 GRID Computer - DCC-IME-USP Internet and Intranet E4000 E4000:Databases and web application services (by SEFAZ) 10 processors 16 G-bytes main memory 1 T-bytes disk Processing services E3500 V880 V880?... (by CAGE-FAPESP) 4 processors 8 gigabytes (by Malaria-FAPESP) 16 processors 32 G-bytes (by eucalipto-cnpq???) 16 processors 32 G-bytes

86 Σ What genes regulate the pathway A->B->C->D? Wet Lab Proteome Transcriptome Genome Pathways

Clustering Algorithms: Can anything be Concluded?

Clustering Algorithms: Can anything be Concluded? Clustering Algorithms: Can anything be Concluded? Edward R. Dougherty, Seungchan Kim Texas A&M University Junior Barrera, Marcel Brun, Roberto Marcondes Universidade de Sao Paulo Yidong Chen, Michael Bittner,

More information

/ Computational Genomics. Normalization

/ Computational Genomics. Normalization 10-810 /02-710 Computational Genomics Normalization Genes and Gene Expression Technology Display of Expression Information Yeast cell cycle expression Experiments (over time) baseline expression program

More information

SVM Classification in -Arrays

SVM Classification in -Arrays SVM Classification in -Arrays SVM classification and validation of cancer tissue samples using microarray expression data Furey et al, 2000 Special Topics in Bioinformatics, SS10 A. Regl, 7055213 What

More information

Introduction to GE Microarray data analysis Practical Course MolBio 2012

Introduction to GE Microarray data analysis Practical Course MolBio 2012 Introduction to GE Microarray data analysis Practical Course MolBio 2012 Claudia Pommerenke Nov-2012 Transkriptomanalyselabor TAL Microarray and Deep Sequencing Core Facility Göttingen University Medical

More information

Micro-array Image Analysis using Clustering Methods

Micro-array Image Analysis using Clustering Methods Micro-array Image Analysis using Clustering Methods Mrs Rekha A Kulkarni PICT PUNE kulkarni_rekha@hotmail.com Abstract Micro-array imaging is an emerging technology and several experimental procedures

More information

Automatic Techniques for Gridding cdna Microarray Images

Automatic Techniques for Gridding cdna Microarray Images Automatic Techniques for Gridding cda Microarray Images aima Kaabouch, Member, IEEE, and Hamid Shahbazkia Department of Electrical Engineering, University of orth Dakota Grand Forks, D 58202-765 2 University

More information

EECS490: Digital Image Processing. Lecture #22

EECS490: Digital Image Processing. Lecture #22 Lecture #22 Gold Standard project images Otsu thresholding Local thresholding Region segmentation Watershed segmentation Frequency-domain techniques Project Images 1 Project Images 2 Project Images 3 Project

More information

Microarray Data Analysis (V) Preprocessing (i): two-color spotted arrays

Microarray Data Analysis (V) Preprocessing (i): two-color spotted arrays Microarray Data Analysis (V) Preprocessing (i): two-color spotted arrays Preprocessing Probe-level data: the intensities read for each of the components. Genomic-level data: the measures being used in

More information

MICROARRAY IMAGE SEGMENTATION USING CLUSTERING METHODS

MICROARRAY IMAGE SEGMENTATION USING CLUSTERING METHODS Mathematical and Computational Applications, Vol. 5, No. 2, pp. 240-247, 200. Association for Scientific Research MICROARRAY IMAGE SEGMENTATION USING CLUSTERING METHODS Volkan Uslan and Đhsan Ömür Bucak

More information

Nature Publishing Group

Nature Publishing Group Figure S I II III 6 7 8 IV ratio ssdna (S/G) WT hr hr hr 6 7 8 9 V 6 6 7 7 8 8 9 9 VII 6 7 8 9 X VI XI VIII IX ratio ssdna (S/G) rad hr hr hr 6 7 Chromosome Coordinate (kb) 6 6 Nature Publishing Group

More information

Fuzzy C-means with Bi-dimensional Empirical Mode Decomposition for Segmentation of Microarray Image

Fuzzy C-means with Bi-dimensional Empirical Mode Decomposition for Segmentation of Microarray Image www.ijcsi.org 316 Fuzzy C-means with Bi-dimensional Empirical Mode Decomposition for Segmentation of Microarray Image J.Harikiran 1, D.RamaKrishna 2, M.L.Phanendra 3, Dr.P.V.Lakshmi 4, Dr.R.Kiran Kumar

More information

Goal-oriented Schema in Biological Database Design

Goal-oriented Schema in Biological Database Design Goal-oriented Schema in Biological Database Design Ping Chen Department of Computer Science University of Helsinki Helsinki, Finland 00014 EMAIL: pchen@cs.helsinki.fi Abstract In this paper, I reviewed

More information

Introduction to Data Mining of Microarrays using the MicroArray Explorer

Introduction to Data Mining of Microarrays using the MicroArray Explorer Introduction to Data Mining of Microarrays using the MicroArray Explorer Peter F. Lemkin Lab. Experimental & Computational Biology, CCR, NCI Frederick, MD 21702 MAExplorer: http://www.lecb.ncifcrf.gov/maexplorer

More information

Exploratory data analysis for microarrays

Exploratory data analysis for microarrays Exploratory data analysis for microarrays Jörg Rahnenführer Computational Biology and Applied Algorithmics Max Planck Institute for Informatics D-66123 Saarbrücken Germany NGFN - Courses in Practical DNA

More information

EECS 730 Introduction to Bioinformatics Microarray. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Microarray. Luke Huan Electrical Engineering and Computer Science EECS 730 Introduction to Bioinformatics Microarray Luke Huan Electrical Engineering and Computer Science http://people.eecs.ku.edu/~jhuan/ GeneChip 2011/11/29 EECS 730 2 Hybridization to the Chip 2011/11/29

More information

CARMAweb users guide version Johannes Rainer

CARMAweb users guide version Johannes Rainer CARMAweb users guide version 1.0.8 Johannes Rainer July 4, 2006 Contents 1 Introduction 1 2 Preprocessing 5 2.1 Preprocessing of Affymetrix GeneChip data............................. 5 2.2 Preprocessing

More information

Preprocessing -- examples in microarrays

Preprocessing -- examples in microarrays Preprocessing -- examples in microarrays I: cdna arrays Image processing Addressing (gridding) Segmentation (classify a pixel as foreground or background) Intensity extraction (summary statistic) Normalization

More information

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology 9/9/ I9 Introduction to Bioinformatics, Clustering algorithms Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Outline Data mining tasks Predictive tasks vs descriptive tasks Example

More information

Tutorial:OverRepresentation - OpenTutorials

Tutorial:OverRepresentation - OpenTutorials Tutorial:OverRepresentation From OpenTutorials Slideshow OverRepresentation (about 12 minutes) (http://opentutorials.rbvi.ucsf.edu/index.php?title=tutorial:overrepresentation& ce_slide=true&ce_style=cytoscape)

More information

Normalization: Bioconductor s marray package

Normalization: Bioconductor s marray package Normalization: Bioconductor s marray package Yee Hwa Yang 1 and Sandrine Dudoit 2 October 30, 2017 1. Department of edicine, University of California, San Francisco, jean@biostat.berkeley.edu 2. Division

More information

Analyzing ICAT Data. Analyzing ICAT Data

Analyzing ICAT Data. Analyzing ICAT Data Analyzing ICAT Data Gary Van Domselaar University of Alberta Analyzing ICAT Data ICAT: Isotope Coded Affinity Tag Introduced in 1999 by Ruedi Aebersold as a method for quantitative analysis of complex

More information

Pathway Studio Quick Start Guide

Pathway Studio Quick Start Guide Pathway Studio Quick Start Guide This Quick Start Guide is for users of the Pathway Studio 4.0 pathway analysis software. The Quick Start Guide demonstrates the key features of the software and provides

More information

Statistics 202: Data Mining. c Jonathan Taylor. Week 8 Based in part on slides from textbook, slides of Susan Holmes. December 2, / 1

Statistics 202: Data Mining. c Jonathan Taylor. Week 8 Based in part on slides from textbook, slides of Susan Holmes. December 2, / 1 Week 8 Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Part I Clustering 2 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group

More information

cdna Microarray Genome Image Processing Using Fixed Spot Position

cdna Microarray Genome Image Processing Using Fixed Spot Position American Journal of Applied Sciences 3 (2): 1730-1734, 2006 ISSN 1546-9239 2006 Science Publications cdna Microarray Genome Image Processing Using Fixed Spot Position Basim Alhadidi, Hussam Nawwaf Fakhouri

More information

Tutorial. RNA-Seq Analysis of Breast Cancer Data. Sample to Insight. November 21, 2017

Tutorial. RNA-Seq Analysis of Breast Cancer Data. Sample to Insight. November 21, 2017 RNA-Seq Analysis of Breast Cancer Data November 21, 2017 Sample to Insight QIAGEN Aarhus Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.qiagenbioinformatics.com AdvancedGenomicsSupport@qiagen.com

More information

Bioconductor exercises 1. Exploring cdna data. June Wolfgang Huber and Andreas Buness

Bioconductor exercises 1. Exploring cdna data. June Wolfgang Huber and Andreas Buness Bioconductor exercises Exploring cdna data June 2004 Wolfgang Huber and Andreas Buness The following exercise will show you some possibilities to load data from spotted cdna microarrays into R, and to

More information

Analysis of (cdna) Microarray Data: Part I. Sources of Bias and Normalisation

Analysis of (cdna) Microarray Data: Part I. Sources of Bias and Normalisation Analysis of (cdna) Microarray Data: Part I. Sources of Bias and Normalisation MICROARRAY ANALYSIS My (Educated?) View 1. Data included in GEXEX a. Whole data stored and securely available b. GP3xCLI on

More information

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06 Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,

More information

Discovery Net : A UK e-science Pilot Project for Grid-based Knowledge Discovery Services. Patrick Wendel Imperial College, London

Discovery Net : A UK e-science Pilot Project for Grid-based Knowledge Discovery Services. Patrick Wendel Imperial College, London Discovery Net : A UK e-science Pilot Project for Grid-based Knowledge Discovery Services Patrick Wendel Imperial College, London Data Mining and Exploration Middleware for Distributed and Grid Computing,

More information

MAGE-ML: MicroArray Gene Expression Markup Language

MAGE-ML: MicroArray Gene Expression Markup Language MAGE-ML: MicroArray Gene Expression Markup Language Links: - Full MAGE specification: http://cgi.omg.org/cgi-bin/doc?lifesci/01-10-01 - MAGE-ML Document Type Definition (DTD): http://cgi.omg.org/cgibin/doc?lifesci/01-11-02

More information

Introduction to Systems Biology II: Lab

Introduction to Systems Biology II: Lab Introduction to Systems Biology II: Lab Amin Emad NIH BD2K KnowEnG Center of Excellence in Big Data Computing Carl R. Woese Institute for Genomic Biology Department of Computer Science University of Illinois

More information

CLUSTERING IN BIOINFORMATICS

CLUSTERING IN BIOINFORMATICS CLUSTERING IN BIOINFORMATICS CSE/BIMM/BENG 8 MAY 4, 0 OVERVIEW Define the clustering problem Motivation: gene expression and microarrays Types of clustering Clustering algorithms Other applications of

More information

Segmentation

Segmentation Lecture 6: Segmentation 24--4 Robin Strand Centre for Image Analysis Dept. of IT Uppsala University Today What is image segmentation? A smörgåsbord of methods for image segmentation: Thresholding Edge-based

More information

Windows. RNA-Seq Tutorial

Windows. RNA-Seq Tutorial Windows RNA-Seq Tutorial 2017 Gene Codes Corporation Gene Codes Corporation 525 Avis Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) +1.734.769.7249 (elsewhere) +1.734.769.7074 (fax) www.genecodes.com

More information

MATH3880 Introduction to Statistics and DNA MATH5880 Statistics and DNA Practical Session Monday, 16 November pm BRAGG Cluster

MATH3880 Introduction to Statistics and DNA MATH5880 Statistics and DNA Practical Session Monday, 16 November pm BRAGG Cluster MATH3880 Introduction to Statistics and DNA MATH5880 Statistics and DNA Practical Session Monday, 6 November 2009 3.00 pm BRAGG Cluster This document contains the tasks need to be done and completed by

More information

ECS 234: Data Analysis: Clustering ECS 234

ECS 234: Data Analysis: Clustering ECS 234 : Data Analysis: Clustering What is Clustering? Given n objects, assign them to groups (clusters) based on their similarity Unsupervised Machine Learning Class Discovery Difficult, and maybe ill-posed

More information

THE preceding chapters were all devoted to the analysis of images and signals which

THE preceding chapters were all devoted to the analysis of images and signals which Chapter 5 Segmentation of Color, Texture, and Orientation Images THE preceding chapters were all devoted to the analysis of images and signals which take values in IR. It is often necessary, however, to

More information

Network Analysis, Visualization, & Graphing TORonto (NAViGaTOR) User Documentation

Network Analysis, Visualization, & Graphing TORonto (NAViGaTOR) User Documentation Network Analysis, Visualization, & Graphing TORonto (NAViGaTOR) User Documentation Jurisica Lab, Ontario Cancer Institute http://ophid.utoronto.ca/navigator/ November 10, 2006 Contents 1 Introduction 2

More information

Double Self-Organizing Maps to Cluster Gene Expression Data

Double Self-Organizing Maps to Cluster Gene Expression Data Double Self-Organizing Maps to Cluster Gene Expression Data Dali Wang, Habtom Ressom, Mohamad Musavi, Cristian Domnisoru University of Maine, Department of Electrical & Computer Engineering, Intelligent

More information

Automated Bioinformatics Analysis System on Chip ABASOC. version 1.1

Automated Bioinformatics Analysis System on Chip ABASOC. version 1.1 Automated Bioinformatics Analysis System on Chip ABASOC version 1.1 Phillip Winston Miller, Priyam Patel, Daniel L. Johnson, PhD. University of Tennessee Health Science Center Office of Research Molecular

More information

How do microarrays work

How do microarrays work Lecture 3 (continued) Alvis Brazma European Bioinformatics Institute How do microarrays work condition mrna cdna hybridise to microarray condition Sample RNA extract labelled acid acid acid nucleic acid

More information

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data. Summary Statistics Acquisition Description Exploration Examination what data is collected Characterizing properties of data. Exploring the data distribution(s). Identifying data quality problems. Selecting

More information

Biclustering for Microarray Data: A Short and Comprehensive Tutorial

Biclustering for Microarray Data: A Short and Comprehensive Tutorial Biclustering for Microarray Data: A Short and Comprehensive Tutorial 1 Arabinda Panda, 2 Satchidananda Dehuri 1 Department of Computer Science, Modern Engineering & Management Studies, Balasore 2 Department

More information

Review of feature selection techniques in bioinformatics by Yvan Saeys, Iñaki Inza and Pedro Larrañaga.

Review of feature selection techniques in bioinformatics by Yvan Saeys, Iñaki Inza and Pedro Larrañaga. Americo Pereira, Jan Otto Review of feature selection techniques in bioinformatics by Yvan Saeys, Iñaki Inza and Pedro Larrañaga. ABSTRACT In this paper we want to explain what feature selection is and

More information

Computer Vision Group Prof. Daniel Cremers. 8. Boosting and Bagging

Computer Vision Group Prof. Daniel Cremers. 8. Boosting and Bagging Prof. Daniel Cremers 8. Boosting and Bagging Repetition: Regression We start with a set of basis functions (x) =( 0 (x), 1(x),..., M 1(x)) x 2 í d The goal is to fit a model into the data y(x, w) =w T

More information

Incorporating Known Pathways into Gene Clustering Algorithms for Genetic Expression Data

Incorporating Known Pathways into Gene Clustering Algorithms for Genetic Expression Data Incorporating Known Pathways into Gene Clustering Algorithms for Genetic Expression Data Ryan Atallah, John Ryan, David Aeschlimann December 14, 2013 Abstract In this project, we study the problem of classifying

More information

Vector Xpression 3. Speed Tutorial: III. Creating a Script for Automating Normalization of Data

Vector Xpression 3. Speed Tutorial: III. Creating a Script for Automating Normalization of Data Vector Xpression 3 Speed Tutorial: III. Creating a Script for Automating Normalization of Data Table of Contents Table of Contents...1 Important: Please Read...1 Opening Data in Raw Data Viewer...2 Creating

More information

Segmentation

Segmentation Lecture 6: Segmentation 215-13-11 Filip Malmberg Centre for Image Analysis Uppsala University 2 Today What is image segmentation? A smörgåsbord of methods for image segmentation: Thresholding Edge-based

More information

Is deformable image registration a solved problem?

Is deformable image registration a solved problem? Is deformable image registration a solved problem? Marcel van Herk On behalf of the imaging group of the RT department of NKI/AVL Amsterdam, the Netherlands DIR 1 Image registration Find translation.deformation

More information

Excel R Tips. is used for multiplication. + is used for addition. is used for subtraction. / is used for division

Excel R Tips. is used for multiplication. + is used for addition. is used for subtraction. / is used for division Excel R Tips EXCEL TIP 1: INPUTTING FORMULAS To input a formula in Excel, click on the cell you want to place your formula in, and begin your formula with an equals sign (=). There are several functions

More information

Machine Learning. Computational biology: Sequence alignment and profile HMMs

Machine Learning. Computational biology: Sequence alignment and profile HMMs 10-601 Machine Learning Computational biology: Sequence alignment and profile HMMs Central dogma DNA CCTGAGCCAACTATTGATGAA transcription mrna CCUGAGCCAACUAUUGAUGAA translation Protein PEPTIDE 2 Growth

More information

Gene signature selection to predict survival benefits from adjuvant chemotherapy in NSCLC patients

Gene signature selection to predict survival benefits from adjuvant chemotherapy in NSCLC patients 1 Gene signature selection to predict survival benefits from adjuvant chemotherapy in NSCLC patients 1,2 Keyue Ding, Ph.D. Nov. 8, 2014 1 NCIC Clinical Trials Group, Kingston, Ontario, Canada 2 Dept. Public

More information

Modern Medical Image Analysis 8DC00 Exam

Modern Medical Image Analysis 8DC00 Exam Parts of answers are inside square brackets [... ]. These parts are optional. Answers can be written in Dutch or in English, as you prefer. You can use drawings and diagrams to support your textual answers.

More information

SVM CLASSIFICATION AND ANALYSIS OF MARGIN DISTANCE ON MICROARRAY DATA. A Thesis. Presented to. The Graduate Faculty of The University of Akron

SVM CLASSIFICATION AND ANALYSIS OF MARGIN DISTANCE ON MICROARRAY DATA. A Thesis. Presented to. The Graduate Faculty of The University of Akron SVM CLASSIFICATION AND ANALYSIS OF MARGIN DISTANCE ON MICROARRAY DATA A Thesis Presented to The Graduate Faculty of The University of Akron In Partial Fulfillment of the Requirements for the Degree Master

More information

Region-based Segmentation

Region-based Segmentation Region-based Segmentation Image Segmentation Group similar components (such as, pixels in an image, image frames in a video) to obtain a compact representation. Applications: Finding tumors, veins, etc.

More information

Analyzing Genomic Data with NOJAH

Analyzing Genomic Data with NOJAH Analyzing Genomic Data with NOJAH TAB A) GENOME WIDE ANALYSIS Step 1: Select the example dataset or upload your own. Two example datasets are available. Genome-Wide TCGA-BRCA Expression datasets and CoMMpass

More information

Informatica Data Explorer Performance Tuning

Informatica Data Explorer Performance Tuning Informatica Data Explorer Performance Tuning 2011 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise)

More information

Clustering Color/Intensity. Group together pixels of similar color/intensity.

Clustering Color/Intensity. Group together pixels of similar color/intensity. Clustering Color/Intensity Group together pixels of similar color/intensity. Agglomerative Clustering Cluster = connected pixels with similar color. Optimal decomposition may be hard. For example, find

More information

Gene Clustering & Classification

Gene Clustering & Classification BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering

More information

0 Graphical Analysis Use of Excel

0 Graphical Analysis Use of Excel Lab 0 Graphical Analysis Use of Excel What You Need To Know: This lab is to familiarize you with the graphing ability of excels. You will be plotting data set, curve fitting and using error bars on the

More information

Workload Characterization Techniques

Workload Characterization Techniques Workload Characterization Techniques Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-08/

More information

Table Of Contents: xix Foreword to Second Edition

Table Of Contents: xix Foreword to Second Edition Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data

More information

Introduction to the Bioconductor marray package : Input component

Introduction to the Bioconductor marray package : Input component Introduction to the Bioconductor marray package : Input component Yee Hwa Yang 1 and Sandrine Dudoit 2 April 30, 2018 Contents 1. Department of Medicine, University of California, San Francisco, jean@biostat.berkeley.edu

More information

Feature Selection in Knowledge Discovery

Feature Selection in Knowledge Discovery Feature Selection in Knowledge Discovery Susana Vieira Technical University of Lisbon, Instituto Superior Técnico Department of Mechanical Engineering, Center of Intelligent Systems, IDMEC-LAETA Av. Rovisco

More information

Out-of-sample extension of diffusion maps in a computer-aided diagnosis system. Application to breast cancer virtual slide images.

Out-of-sample extension of diffusion maps in a computer-aided diagnosis system. Application to breast cancer virtual slide images. Out-of-sample extension of diffusion maps in a computer-aided diagnosis system. Application to breast cancer virtual slide images. Philippe BELHOMME Myriam OGER Jean-Jacques MICHELS Benoit PLANCOULAINE

More information

Introduction to Medical Imaging (5XSA0) Module 5

Introduction to Medical Imaging (5XSA0) Module 5 Introduction to Medical Imaging (5XSA0) Module 5 Segmentation Jungong Han, Dirk Farin, Sveta Zinger ( s.zinger@tue.nl ) 1 Outline Introduction Color Segmentation region-growing region-merging watershed

More information

Automated Data Pipelines & Intelligent Microscopy

Automated Data Pipelines & Intelligent Microscopy Galaxy Workshop Freiburg University Automated Data Pipelines & Intelligent Microscopy RNAi Screening Facility BioQuant, Heidelberg University Jürgen Reymann Scientific Infrastructure Automated Data Pipelines

More information

Tutorial 1: Exploring the UCSC Genome Browser

Tutorial 1: Exploring the UCSC Genome Browser Last updated: May 12, 2011 Tutorial 1: Exploring the UCSC Genome Browser Open the homepage of the UCSC Genome Browser at: http://genome.ucsc.edu/ In the blue bar at the top, click on the Genomes link.

More information

Norbert Schuff VA Medical Center and UCSF

Norbert Schuff VA Medical Center and UCSF Norbert Schuff Medical Center and UCSF Norbert.schuff@ucsf.edu Medical Imaging Informatics N.Schuff Course # 170.03 Slide 1/67 Objective Learn the principle segmentation techniques Understand the role

More information

Gene expression & Clustering (Chapter 10)

Gene expression & Clustering (Chapter 10) Gene expression & Clustering (Chapter 10) Determining gene function Sequence comparison tells us if a gene is similar to another gene, e.g., in a new species Dynamic programming Approximate pattern matching

More information

10 Microarray Analysis Using the MicroArray Explorer

10 Microarray Analysis Using the MicroArray Explorer 10 Microarray Analysis Using the MicroArray Explorer Peter F. Lemkin, Gregory C. Thornwall, Jai Evans 3 (1) Laboratory of Experimental and Computational Biology, CCR, NCI- Frederick, Frederick, MD 21702;

More information

RPPanalyzer (Version 1.0.3) Analyze reverse phase protein array data User s Guide

RPPanalyzer (Version 1.0.3) Analyze reverse phase protein array data User s Guide RPPanalyzer (Version 1.0.3) Analyze reverse phase protein array data User s Guide Heiko Mannsperger and Stephan Gade German Cancer Research Center Heidelberg, Germany October 1, 2012 Contents 1 Introduction

More information

ROTS: Reproducibility Optimized Test Statistic

ROTS: Reproducibility Optimized Test Statistic ROTS: Reproducibility Optimized Test Statistic Fatemeh Seyednasrollah, Tomi Suomi, Laura L. Elo fatsey (at) utu.fi March 3, 2016 Contents 1 Introduction 2 2 Algorithm overview 3 3 Input data 3 4 Preprocessing

More information

Introduction to Bioinformatics AS Laboratory Assignment 2

Introduction to Bioinformatics AS Laboratory Assignment 2 Introduction to Bioinformatics AS 250.265 Laboratory Assignment 2 Last week, we discussed several high-throughput methods for the analysis of gene expression in cells. Of those methods, microarray technologies

More information

TIGR ExpressConverter

TIGR ExpressConverter TIGR ExpressConverter (Version 1.7) January, 2005 Microarray Software Group The Institute for Genomic Research Table of Contents Introduction ---------------------------------------------------------------------------

More information

Contents. Foreword to Second Edition. Acknowledgments About the Authors

Contents. Foreword to Second Edition. Acknowledgments About the Authors Contents Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments About the Authors xxxi xxxv Chapter 1 Introduction 1 1.1 Why Data Mining? 1 1.1.1 Moving toward the Information Age 1

More information

Organizing, cleaning, and normalizing (smoothing) cdna microarray data

Organizing, cleaning, and normalizing (smoothing) cdna microarray data Organizing, cleaning, and normalizing (smoothing) cdna microarray data All product names are given as examples only and they are not endorsed by the USDA or the University of Illinois. INTRODUCTION The

More information

SEEK User Manual. Introduction

SEEK User Manual. Introduction SEEK User Manual Introduction SEEK is a computational gene co-expression search engine. It utilizes a vast human gene expression compendium to deliver fast, integrative, cross-platform co-expression analyses.

More information

Contents. ! Data sets. ! Distance and similarity metrics. ! K-means clustering. ! Hierarchical clustering. ! Evaluation of clustering results

Contents. ! Data sets. ! Distance and similarity metrics. ! K-means clustering. ! Hierarchical clustering. ! Evaluation of clustering results Statistical Analysis of Microarray Data Contents Data sets Distance and similarity metrics K-means clustering Hierarchical clustering Evaluation of clustering results Clustering Jacques van Helden Jacques.van.Helden@ulb.ac.be

More information

Image Analysis - Lecture 5

Image Analysis - Lecture 5 Texture Segmentation Clustering Review Image Analysis - Lecture 5 Texture and Segmentation Magnus Oskarsson Lecture 5 Texture Segmentation Clustering Review Contents Texture Textons Filter Banks Gabor

More information

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748 CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 3/3/08 CAP5510 1 Gene g Probe 1 Probe 2 Probe N 3/3/08 CAP5510

More information

AGA User Manual. Version 1.0. January 2014

AGA User Manual. Version 1.0. January 2014 AGA User Manual Version 1.0 January 2014 Contents 1. Getting Started... 3 1a. Minimum Computer Specifications and Requirements... 3 1b. Installation... 3 1c. Running the Application... 4 1d. File Preparation...

More information

Clustering Jacques van Helden

Clustering Jacques van Helden Statistical Analysis of Microarray Data Clustering Jacques van Helden Jacques.van.Helden@ulb.ac.be Contents Data sets Distance and similarity metrics K-means clustering Hierarchical clustering Evaluation

More information

Lecture: Segmentation I FMAN30: Medical Image Analysis. Anders Heyden

Lecture: Segmentation I FMAN30: Medical Image Analysis. Anders Heyden Lecture: Segmentation I FMAN30: Medical Image Analysis Anders Heyden 2017-11-13 Content What is segmentation? Motivation Segmentation methods Contour-based Voxel/pixel-based Discussion What is segmentation?

More information

Breeding Guide. Customer Services PHENOME-NETWORKS 4Ben Gurion Street, 74032, Nes-Ziona, Israel

Breeding Guide. Customer Services PHENOME-NETWORKS 4Ben Gurion Street, 74032, Nes-Ziona, Israel Breeding Guide Customer Services PHENOME-NETWORKS 4Ben Gurion Street, 74032, Nes-Ziona, Israel www.phenome-netwoks.com Contents PHENOME ONE - INTRODUCTION... 3 THE PHENOME ONE LAYOUT... 4 THE JOBS ICON...

More information

Topics of the talk. Biodatabases. Data types. Some sequence terminology...

Topics of the talk. Biodatabases. Data types. Some sequence terminology... Topics of the talk Biodatabases Jarno Tuimala / Eija Korpelainen CSC What data are stored in biological databases? What constitutes a good database? Nucleic acid sequence databases Amino acid sequence

More information

Performance Evaluation of the TINA Medical Image Segmentation Algorithm on Brainweb Simulated Images

Performance Evaluation of the TINA Medical Image Segmentation Algorithm on Brainweb Simulated Images Tina Memo No. 2008-003 Internal Memo Performance Evaluation of the TINA Medical Image Segmentation Algorithm on Brainweb Simulated Images P. A. Bromiley Last updated 20 / 12 / 2007 Imaging Science and

More information

Introduction to Cancer Genomics

Introduction to Cancer Genomics Introduction to Cancer Genomics Gene expression data analysis part I David Gfeller Computational Cancer Biology Ludwig Center for Cancer research david.gfeller@unil.ch 1 Overview 1. Basic understanding

More information

CANCER PREDICTION USING PATTERN CLASSIFICATION OF MICROARRAY DATA. By: Sudhir Madhav Rao &Vinod Jayakumar Instructor: Dr.

CANCER PREDICTION USING PATTERN CLASSIFICATION OF MICROARRAY DATA. By: Sudhir Madhav Rao &Vinod Jayakumar Instructor: Dr. CANCER PREDICTION USING PATTERN CLASSIFICATION OF MICROARRAY DATA By: Sudhir Madhav Rao &Vinod Jayakumar Instructor: Dr. Michael Nechyba 1. Abstract The objective of this project is to apply well known

More information

Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering

Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Team 2 Prof. Anita Wasilewska CSE 634 Data Mining All Sources Used for the Presentation Olson CF. Parallel algorithms

More information

Dimension reduction : PCA and Clustering

Dimension reduction : PCA and Clustering Dimension reduction : PCA and Clustering By Hanne Jarmer Slides by Christopher Workman Center for Biological Sequence Analysis DTU The DNA Array Analysis Pipeline Array design Probe design Question Experimental

More information

Introduction to R and Statistical Data Analysis

Introduction to R and Statistical Data Analysis Microarray Center Introduction to R and Statistical Data Analysis PART II Petr Nazarov petr.nazarov@crp-sante.lu 22-11-2010 OUTLINE PART II Descriptive statistics in R (8) sum, mean, median, sd, var, cor,

More information

Image Processing: Final Exam November 10, :30 10:30

Image Processing: Final Exam November 10, :30 10:30 Image Processing: Final Exam November 10, 2017-8:30 10:30 Student name: Student number: Put your name and student number on all of the papers you hand in (if you take out the staple). There are always

More information

Generalized Principal Component Analysis CVPR 2007

Generalized Principal Component Analysis CVPR 2007 Generalized Principal Component Analysis Tutorial @ CVPR 2007 Yi Ma ECE Department University of Illinois Urbana Champaign René Vidal Center for Imaging Science Institute for Computational Medicine Johns

More information

GLIDER: Gradient Landmark-Based Distributed Routing for Sensor Networks. Stanford University. HP Labs

GLIDER: Gradient Landmark-Based Distributed Routing for Sensor Networks. Stanford University. HP Labs GLIDER: Gradient Landmark-Based Distributed Routing for Sensor Networks Qing Fang Jie Gao Leonidas J. Guibas Vin de Silva Li Zhang Stanford University HP Labs Point-to-Point Routing in Sensornets Routing

More information

Points Lines Connected points X-Y Scatter. X-Y Matrix Star Plot Histogram Box Plot. Bar Group Bar Stacked H-Bar Grouped H-Bar Stacked

Points Lines Connected points X-Y Scatter. X-Y Matrix Star Plot Histogram Box Plot. Bar Group Bar Stacked H-Bar Grouped H-Bar Stacked Plotting Menu: QCExpert Plotting Module graphs offers various tools for visualization of uni- and multivariate data. Settings and options in different types of graphs allow for modifications and customizations

More information

ScienceDirect. Fuzzy clustering algorithms for cdna microarray image spots segmentation.

ScienceDirect. Fuzzy clustering algorithms for cdna microarray image spots segmentation. Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 46 (2015 ) 417 424 International Conference on Information and Communication Technologies (ICICT 2014) Fuzzy clustering

More information

Committee: Dr. Rosemary Renaut 1 Professor Department of Mathematics and Statistics, Director Computational Biosciences PSM Arizona State University

Committee: Dr. Rosemary Renaut 1 Professor Department of Mathematics and Statistics, Director Computational Biosciences PSM Arizona State University Evaluation of Gene Selection Using Support Vector Machine Recursive Feature Elimination A report presented in fulfillment of internship requirements of the CBS PSM Degree Committee: Dr. Rosemary Renaut

More information

DNA microarrays [1] are used to measure the expression

DNA microarrays [1] are used to measure the expression IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 24, NO. 7, JULY 2005 901 Mixture Model Analysis of DNA Microarray Images K. Blekas*, Member, IEEE, N. P. Galatsanos, Senior Member, IEEE, A. Likas, Senior Member,

More information

GeneSifter.Net User s Guide

GeneSifter.Net User s Guide GeneSifter.Net User s Guide 1 2 GeneSifter.Net Overview Login Upload Tools Pairwise Analysis Create Projects For more information about a feature see the corresponding page in the User s Guide noted in

More information