Supervised Clustering of Yeast Gene Expression Data
- Godwin Daniels
In the DeRisi paper, five expression profile clusters were cited, each containing a small number (7-8) of genes. In the following examples we apply supervised clustering techniques to these cluster prototypes to classify the remaining genes in the dataset. Classifiers were first trained on the genes in the original clusters and then applied to the remaining genes to assign each one to a cluster. In the first example, a Kohonen self-organizing feature map was used to arrange the original clusters in a two-dimensional layout. The unclassified genes were then mapped onto this layout, creating a clustering of the genes. New clusters were defined by selecting a region of the map for each new cluster, thus classifying the genes within that region. In the second example, a decision tree was trained on the original clusters. An extra cluster was added to represent genes that did not sufficiently match any of the original DeRisi cluster expression profiles. The remaining genes were filtered to remove those without a significant change in expression level and were then classified by the decision tree. In the third example, a Naive-Bayes classifier was generated from the original clusters.
[Figure: DeRisi Cluster Expression Profiles. Fold change versus time for Centroid B (n=7), Centroid C (n=7), Centroid D (n=7), Centroid E (n=7), and Centroid F (n=8).] The original DeRisi clusters are represented by a graph of the cluster centroids.
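The centroid plot above summarizes each cluster by a single averaged profile. A minimal sketch of that computation follows, assuming each gene is stored as a list of fold-change values with one entry per time point. The function name `cluster_centroids` and the sample values are hypothetical and are not taken from the slides.

```python
from statistics import mean


def cluster_centroids(profiles, labels):
    """Return {cluster label: centroid}, averaging profiles time point by time point."""
    grouped = {}
    for profile, label in zip(profiles, labels):
        grouped.setdefault(label, []).append(profile)
    return {
        label: [mean(values) for values in zip(*members)]
        for label, members in grouped.items()
    }


# Hypothetical fold-change profiles over three time points (not real DeRisi data).
profiles = [[1.0, 2.0, 4.0], [1.0, 4.0, 8.0], [1.0, 0.5, 0.25]]
labels = ["B", "B", "C"]
centroids = cluster_centroids(profiles, labels)
```

Plotting each centroid against time would give one line per cluster, as in the figure described above.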
A parallel coordinates visualization displaying gene expression levels for each DeRisi cluster.
A Kohonen self-organizing feature map computes a new pair of axes and locates the genes according to its idea of similarity.
A Kohonen self-organizing feature map displaying user-defined clusters.
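As a rough illustration of how a self-organizing map places profiles on a two-dimensional grid, the sketch below trains a small map and looks up the best-matching node for each gene. It is a generic textbook formulation, not the tool used for the slides. The grid size, learning-rate schedule, neighborhood function, and sample profiles are all assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest to profile x."""
    return min(
        weights,
        key=lambda node: sum((a - b) ** 2 for a, b in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2.0
    # Initialise every node from a randomly chosen training profile.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)    # shrinking neighbourhood
        for x in rng.sample(data, len(data)):
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_dist2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-grid_dist2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * influence * (x[i] - w[i])
    return weights


# Hypothetical expression profiles (not real data).
profiles = [[0.0, 0.1, 0.2], [0.1, 0.1, 0.3], [3.0, 2.5, 0.1], [2.9, 2.6, 0.0]]
weights = train_som(profiles, rows=3, cols=3)
placements = [best_matching_unit(weights, p) for p in profiles]
```

A user-defined cluster can then be represented as a set of grid cells, with every gene whose placement falls inside that set assigned to the cluster.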
[Figure: Kohonen Map Cluster Expression Profiles. Fold change versus time for Centroid none (n=5730), Centroid newb (n=17), Centroid newf (n=210), Centroid newd (n=143), Centroid newc (n=26), and Centroid newe (n=27).] The Kohonen self-organizing feature map clusters, presented as a plot of the cluster centroids.
A parallel coordinates visualization showing the new Kohonen map clusters as compared to the original DeRisi clusters.
A visualization of a decision tree that was created from the original DeRisi clusters (plus an extra None cluster). This part of the tree shows clusters E and F being split from cluster None at time 18.5, and clusters E and F being split apart at time 14.5.
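The splits described above can be pictured as threshold tests on the expression level at a single time point. The sketch below searches for the one split that best separates the labels, using Gini impurity as the criterion. The slides do not say which criterion or tool was used, so Gini, the function names, and the sample data are assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows, labels):
    """Return (attribute index, threshold, weighted impurity) of the best binary split."""
    best = None
    for attr in range(len(rows[0])):
        values = sorted({row[attr] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [lab for row, lab in zip(rows, labels) if row[attr] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[attr] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (attr, threshold, score)
    return best


# Hypothetical expression levels at two time points (columns), one row per gene.
rows = [[0.1, 3.0], [0.2, 2.8], [0.1, 0.2], [0.3, 0.1]]
labels = ["E", "E", "none", "none"]
attr, threshold, impurity = best_split(rows, labels)
```

A full decision tree would apply `best_split` recursively to the left and right subsets until each leaf is pure or too small to split.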
[Figure: Decision Tree Cluster Expression Profiles. Fold change versus time for Centroid none (n=197), Centroid C (n=21), Centroid E (n=71), Centroid B (n=47), Centroid D (n=72), and Centroid F (n=347).] The decision tree clusters, presented as a plot of the cluster centroids.
Visualization of the Naive-Bayes classifier created from the original DeRisi clusters. The attributes are listed in order of importance (with respect to the cluster designation). The fact that the squares for time 18.5 are mostly one color indicates that time 18.5 is a very good predictor for the cluster class.
This visualization of the Naive-Bayes classifier shows the probability distribution for cluster D. Cluster D can be classified perfectly from attribute T18.5 alone.
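To make the Naive-Bayes idea concrete, the sketch below fits one Gaussian per attribute and class, then picks the class with the highest log posterior. The Gaussian form is an assumption: the slides do not state how attribute values were modeled, and the visualization described above suggests they may have been binned instead. The function names and data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels, min_var=1e-3):
    """Return {label: (prior, [(mean, variance) per attribute])}."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        stats = []
        for column in zip(*members):
            mu = sum(column) / len(column)
            var = sum((v - mu) ** 2 for v in column) / len(column)
            stats.append((mu, max(var, min_var)))  # floor avoids zero variance
        model[label] = (len(members) / len(rows), stats)
    return model


def predict(model, row):
    """Return the label with the highest log posterior for one profile."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for value, (mu, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Hypothetical training profiles: two attributes (time points) per gene.
rows = [[0.1, 3.0], [0.2, 2.8], [0.1, 0.2], [0.3, 0.1]]
labels = ["D", "D", "E", "E"]
model = fit_gaussian_nb(rows, labels)
```

In this toy data the second attribute alone separates the two classes, which mirrors the observation above that a single time point can be a very good predictor.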
[Figure: five panels, Cluster G2/M (n=195), Cluster G1 (n=300), Cluster S (n=71), Cluster S/G2 (n=121), and Cluster M/G1 (n=113), each plotting fold change against time points T0-T160.] Expression levels of the five yeast cell cycle peak phases, as designated from the Spellman dataset. The average of each cluster is plotted for all time periods (T0-T160), along with the standard deviation values for each peak phase.
A visualization of the five Spellman peak phase clusters displayed as a sequence of sixteen histograms for each cell cycle.
A Radviz visualization of the yeast cell cycle data, clustered using the time and colored by Spellman's peak phase classification. This visualization technique employs the physical concept of spring forces to position the multi-dimensional data.
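The spring-force idea can be written down compactly. In the usual Radviz formulation, each dimension gets an anchor on the unit circle, and a record settles at the weighted average of the anchors, with its (scaled) values as the weights. The sketch below assumes values already scaled to [0, 1] and anchors spaced evenly around the circle; the function name is hypothetical.

```python
import math


def radviz_point(values):
    """Map one record (values already scaled to [0, 1]) to 2-D Radviz coordinates."""
    n = len(values)
    total = sum(values)
    if total == 0:
        return (0.0, 0.0)  # no spring pulls: the point rests at the centre
    # Anchor j sits at angle 2*pi*j/n on the unit circle.
    x = sum(v * math.cos(2 * math.pi * j / n) for j, v in enumerate(values)) / total
    y = sum(v * math.sin(2 * math.pi * j / n) for j, v in enumerate(values)) / total
    return (x, y)


# A record dominated by its first dimension is pulled to the first anchor,
# while a record with equal values in every dimension stays near the centre.
near_anchor = radviz_point([1.0, 0.0, 0.0, 0.0])
centre = radviz_point([1.0, 1.0, 1.0, 1.0])
```

For time-series expression data, each anchor would correspond to one time point, so genes peaking at similar times land near the same part of the circle.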
A dendrogram visualization displaying a user-selected cluster generated by a standard hierarchical clustering method.
A K-means clustering of the Spellman data. The visualization features a relative neighborhood graph (minimum set of lines that connect the centroids) and the outliers for all five K-means clusters.
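For readers unfamiliar with K-means, a minimal sketch of the standard assign-then-update loop (Lloyd's algorithm) follows. It is not the implementation behind the slide; the random initialization, iteration cap, and sample points are assumptions.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Return (centroids, assignment) after Lloyd-style K-means."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


# Hypothetical two-dimensional points forming two well-separated groups.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, assignment = kmeans(points, k=2)
```

With expression data, each point would be a gene's full time-series profile rather than a two-dimensional coordinate.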
A plot displaying the results of a Kohonen self-organizing map generated from the Spellman data, with the classification from Cho overlaid.
The statistical results of a self-organizing feature map trained on the Spellman data. Blue lines display the cluster centroids and red lines show the standard deviations.
Comparison of a K-means clustering technique that generates 30 clusters with the five expression patterns designated by Spellman. While some of the 30 clusters represent subsets of a Spellman class (such as the yellow lines), other clusters have genes that fall into two or more Spellman classes.
A comparison of two clustering techniques using a jittered scatterplot of the Spellman data. Five clusters from one technique (along the Y-axis) are compared with 12 clusters from another technique (along the X-axis). If the X-axis clusters were a pure superset of the Y-axis clusters, there would be only one clump per vertical line. In this case only the 12th cluster on the X-axis is pure, while the 1st is nearly so.
A circle segment visualization comparing the results of different classification techniques. The true class is represented in color, while the predicted class is represented in grayscale. If the change in grayscale value matches the change in color, then there is a strong correlation between the true and predicted class. In this example, "cl03" correlates well with the true class feature, the peak.
Comparing Clustering Techniques
[Table: columns are Rank, Clustering Technique, Data, Number of Clusters, % correct (method 1), % correct (method 2), % correct (method 3), % correct (method 4), and % correct (maximum). Rows in rank order, given as technique (data): Kohonen 3 (Norm), Kohonen 1 (Norm), Kohonen 2 (Norm), C K-means 1 (Norm), SOM 4 (Original), SOM 12 (Original), Kohonen 2 (Original), C K-means 1 (Original), Kohonen 1 (Original), Kohonen 3 (Original), C K-means 2 (Norm), SOM 7 (Norm), M K-means 1 (Original), Dendrogram 2 (Original), K-means 2 (Original), SOM 7 (Original), Dendrogram 1 (Original), SOM 12 (Norm), M K-means 2 (Original), M K-means 3 (Original), random (Original). The numeric values are not present in this transcription.]
The results of several clustering techniques were compared to the five Spellman classifications (G2/M, G1, S, S/G2, and M/G1). For a given technique, each generated cluster was considered to be a subset of one of the Spellman classes. The class chosen for each cluster was the majority Spellman class among the genes in that cluster. After each cluster was categorized, the resulting accuracies were calculated. The total percent correct and the average accuracy for each class were calculated and are presented in the method columns.
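The majority-vote scoring described above can be sketched as follows. Each cluster is labeled with the most common true class among its genes, and accuracy is then computed overall and per class. How the four "method" columns differ is not stated in the slides, so this shows only the basic procedure. The function name and sample labels are hypothetical, and ties are broken by whichever class was seen first.

```python
from collections import Counter, defaultdict


def majority_vote_accuracy(cluster_ids, true_classes):
    """Label each cluster by its majority class; return overall and per-class accuracy."""
    members = defaultdict(list)
    for cluster, true_class in zip(cluster_ids, true_classes):
        members[cluster].append(true_class)
    # Majority class per cluster (ties go to the class encountered first).
    cluster_to_class = {
        cluster: Counter(classes).most_common(1)[0][0]
        for cluster, classes in members.items()
    }
    predicted = [cluster_to_class[c] for c in cluster_ids]
    correct = Counter(t for p, t in zip(predicted, true_classes) if p == t)
    totals = Counter(true_classes)
    overall = sum(correct.values()) / len(true_classes)
    per_class = {cls: correct[cls] / totals[cls] for cls in totals}
    return overall, per_class, cluster_to_class


# Hypothetical cluster assignments and Spellman-style phase labels for six genes.
cluster_ids = [0, 0, 0, 1, 1, 2]
true_classes = ["G1", "G1", "S", "S", "S", "M/G1"]
overall, per_class, mapping = majority_vote_accuracy(cluster_ids, true_classes)
```

Note that this measure rewards techniques that produce many small clusters, since a cluster with a single gene is always scored as correct. This is worth keeping in mind when comparing techniques with different numbers of clusters.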
Unsupervised Clustering of Yeast Gene Expression Data
In the Cho paper, 416 genes were visually identified as cell cycle regulated. In the Spellman paper, the Cho data was combined with the results from other experiments, and 800 genes were identified algorithmically as cell cycle regulated. In the following examples, we apply various unsupervised clustering techniques to a subset of the Cho dataset (the 800 genes that were identified in the Spellman paper). The first row (images 1-3) consists of visualizations of the original data (gene expression levels during two cell cycles). The second row (images 4-6) visually presents the results of several clustering algorithms. The third row (images 7-9) displays the statistical properties of each cluster generated by various algorithms. The fourth row (images 10-12) provides visual comparisons between selected clustering algorithms.
References
Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell Dec; 9(12):
Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, Davis RW. A genome-wide transcriptional analysis of the mitotic cell cycle. Molecular Cell 2: 65-73.
DeRisi JL, Iyer VR, Brown PO. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science Oct 24; 278(5338):
DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how
More informationAPPLY DATA CLUSTERING TO GENE EXPRESSION DATA
California State University, San Bernardino CSUSB ScholarWorks Electronic Theses, Projects, and Dissertations Office of Graduate Studies 12-2015 APPLY DATA CLUSTERING TO GENE EXPRESSION DATA Abdullah Jameel
More informationAdaptive quality-based clustering of gene expression profiles
Adaptive quality-based clustering of gene expression profiles Frank De Smet *, Janick Mathys, Kathleen Marchal, Gert Thijs, Bart De Moor, Yves Moreau ESAT-SISTA/COSIC/DocArch, K.U.Leuven, Kasteelpark Arenberg
More informationSeismic facies analysis using generative topographic mapping
Satinder Chopra + * and Kurt J. Marfurt + Arcis Seismic Solutions, Calgary; The University of Oklahoma, Norman Summary Seismic facies analysis is commonly carried out by classifying seismic waveforms based
More informationBMC Bioinformatics. Open Access. Abstract
BMC Bioinformatics BioMed Central Methodology article Methods for simultaneously identifying coherent local clusters with smooth global patterns in gene expression profiles Yin-Jing Tien 1, Yun-Shien Lee
More informationFigures and figure supplements
RESEARCH ARTICLE Figures and figure supplements Comprehensive machine learning analysis of Hydra behavior reveals a stable basal behavioral repertoire Shuting Han et al Han et al. elife 8;7:e35. DOI: https://doi.org/.755/elife.35
More informationFPF-SB: a Scalable Algorithm for Microarray Gene Expression Data Clustering
FPF-SB: a Scalable Algorithm for Microarray Gene Expression Data Clustering Filippo Geraci 1,3, Mauro Leoncini 2,1, Manuela Montangero 2,1, Marco Pellegrini 1, and M. Elena Renda 1 1 CNR, Istituto di Informatica
More informationA Dendrogram. Bioinformatics (Lec 17)
A Dendrogram 3/15/05 1 Hierarchical Clustering [Johnson, SC, 1967] Given n points in R d, compute the distance between every pair of points While (not done) Pick closest pair of points s i and s j and
More informationTime Series Gene Expression Data Classification via L 1 -norm Temporal SVM
Time Series Gene Expression Data Classification via L 1 -norm Temporal SVM Carlotta Orsenigo and Carlo Vercellis Dept. of Management, Economics and Industrial Engineering, Politecnico di Milano Via Lambruschini
More informationA Feature Selection Method to Handle Imbalanced Data in Text Classification
A Feature Selection Method to Handle Imbalanced Data in Text Classification Fengxiang Chang 1*, Jun Guo 1, Weiran Xu 1, Kejun Yao 2 1 School of Information and Communication Engineering Beijing University
More informationRandom Forest Similarity for Protein-Protein Interaction Prediction from Multiple Sources. Y. Qi, J. Klein-Seetharaman, and Z.
Random Forest Similarity for Protein-Protein Interaction Prediction from Multiple Sources Y. Qi, J. Klein-Seetharaman, and Z. Bar-Joseph Pacific Symposium on Biocomputing 10:531-542(2005) RANDOM FOREST
More information