A systematic performance evaluation of clustering methods for single-cell RNA-seq data Supplementary Figures and Tables

Angelo Duò, Mark D Robinson, Charlotte Soneson

Supplementary Table: Overview of the datasets used in the study.

Columns: Data set; Total # cells; Total # features; Filtering; Retained # cells; Retained # features.

Data sets: Koh, KohTCC, Kumar, KumarTCC, SimKumar4easy, SimKumar4hard, SimKumar8hard, Trapnell, TrapnellTCC, Zhengmix4eq, Zhengmix4uneq, Zhengmix8eq.

Supplementary Figure 1: t-SNE representations of all data sets used in the study (before gene filtering). The cells are colored by the annotation used to define the ground truth, to which all clusterings are compared.

Panels (one per data set): Zhengmix4eq, Zhengmix4uneq, Zhengmix8eq, Kumar, Trapnell, Koh, SimKumar4easy, SimKumar4hard, SimKumar8hard, KohTCC, TrapnellTCC, KumarTCC.

Supplementary Figure 2: Overlaps between the features retained by the different gene filtering approaches, for each of the data sets. The numbers represent percentages of the total number of features selected by at least one of the approaches.
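The percentages described in this caption can be illustrated with a small sketch. It assumes each filtering approach is represented as a set of retained feature names. The function name `overlap_percentages` and the toy gene names are hypothetical, and this is not necessarily how the authors produced the figure.

```python
from collections import Counter


def overlap_percentages(selected):
    """Percentage of the union of features that falls in each overlap region.

    selected: dict mapping a filtering name to the set of features it retains.
    Returns a dict mapping a frozenset of filtering names (one region of the
    Venn diagram) to the percentage of all features selected by at least one
    filtering that lie in exactly that region.
    """
    union = set().union(*selected.values())
    if not union:
        return {}
    # For every feature, record exactly which filterings retained it.
    regions = Counter(
        frozenset(name for name, feats in selected.items() if feature in feats)
        for feature in union
    )
    return {region: 100.0 * n / len(union) for region, n in regions.items()}


# Toy input (hypothetical filtering and gene names).
toy = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3", "g4"},
    "C": {"g3", "g5"},
}
percentages = overlap_percentages(toy)
```

Because every feature in the union belongs to exactly one region, the percentages always sum to 100.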

Supplementary Figure 3: Adjusted Rand Index (ARI) as a function of the number of clusters (k), for each of the data sets (columns) and gene filterings (rows). The vertical lines indicate the true number of populations in the respective data sets.

Axes: number of clusters (x), median Adjusted Rand Index (y).
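For readers unfamiliar with the ARI, the following is a minimal sketch of the standard Hubert-Arabie formula computed from pair counts. The function name `adjusted_rand_index` and the toy labels are hypothetical, and the evaluation itself may have relied on an existing library implementation instead.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index between two labelings of the same cells."""
    if len(truth) != len(pred):
        raise ValueError("label vectors must have the same length")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: the partitions trivially agree
    # Numbers of cell pairs sharing a contingency-table cell, a true class,
    # or an inferred cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Toy labels: a relabeled copy of the truth, and an unrelated split.
truth = [0, 0, 1, 1]
ari_same = adjusted_rand_index(truth, [1, 1, 0, 0])
ari_crossed = adjusted_rand_index(truth, [0, 1, 0, 1])
```

The ARI is invariant to the naming of clusters, equals 1 for identical partitions, and is close to 0 (possibly negative) when the agreement is no better than expected by chance.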

Supplementary Figure 4: t-SNE plots showing examples of data set/clustering method combinations with a large disagreement between the inferred and true clusters. The respective data sets, gene filtering methods and clustering methods are indicated above each plot. The true clusters are indicated with different shapes, and the inferred clusters in different colors.

Panels: A and B, Zhengmix4eq; C, Zhengmix4uneq; D, Trapnell.

Supplementary Figure 5: Adjusted Rand Index (ARI) at the number of clusters that gives the best performance. Some methods failed to return any clustering for certain data sets (indicated by white squares).

Supplementary Figure 6: Association between the number of detected features and the cluster assignment, for all data sets and filterings, with the true number of clusters. Only one of the five runs is shown. Values are adjusted p-values (using the Benjamini-Hochberg method) from a Kruskal-Wallis test.
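The Benjamini-Hochberg adjustment mentioned in this caption can be sketched as follows. The function name `benjamini_hochberg` and the example p-values are hypothetical. The sketch covers only the multiple-testing adjustment and takes the raw Kruskal-Wallis p-values as given.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Visit the p-values from largest to smallest so that a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

Each adjusted value is the smallest false discovery rate threshold at which the corresponding test would still be called significant.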

Supplementary Figure 7: Actual run times versus the imposed number of clusters, for each data set and gene filtering method.

Axes: number of clusters (x), elapsed time in seconds (y).

Supplementary Figure 8: Actual run times, across all numbers of clusters, for each data set and gene filtering method.

Supplementary Figure 9: Comparison of the clusterings obtained after different gene filterings. The values represent the median Adjusted Rand Index (ARI) across the data sets and filterings, for the true number of clusters, for each pair of filterings.

Supplementary Figure 10: Entropy for different imposed numbers of clusters. The true number of clusters is indicated by the vertical line, and the cross represents the entropy of the true partition.
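As an illustration of the entropy of a partition, the sketch below computes the Shannon entropy of the cluster proportions. The base-2 logarithm and the function name `partition_entropy` are assumptions made here, since the caption does not state how the entropy was defined.

```python
from collections import Counter
from math import log2


def partition_entropy(labels):
    """Shannon entropy (in bits, an assumed base) of cluster proportions."""
    n = len(labels)
    if n == 0:
        raise ValueError("at least one label is required")
    # Sum p * log2(1 / p) over clusters, where p is the cluster's share of cells.
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


# Toy partitions: balanced, unbalanced, and a single cluster.
balanced = partition_entropy(["a", "a", "b", "b"])
unbalanced = partition_entropy(["a", "a", "a", "b"])
single = partition_entropy(["a", "a", "a", "a"])
```

Under this definition the entropy is largest when the clusters are equally sized and falls toward zero as most cells are placed in a single cluster.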

Supplementary Figure 11: Difference between estimated and true number of clusters, for methods including a function to estimate the number of clusters.

Supplementary Figure 12: Adjusted Rand Index (ARI) at the number of clusters estimated by the indicated methods. Empty squares indicate that the estimated number of clusters was outside the evaluated range, which differs between data sets with at most 7 true clusters and data sets with more than 7 true clusters.

Review: Identification of cell types from single-cell transcriptom. method

Review: Identification of cell types from single-cell transcriptom. method Review: Identification of cell types from single-cell transcriptomes using a novel clustering method University of North Carolina at Charlotte October 12, 2015 Brief overview Identify clusters by merging

More information

Clustering. RNA-seq: What is it good for? Finding Similarly Expressed Genes. Data... And Lots of It!

Clustering. RNA-seq: What is it good for? Finding Similarly Expressed Genes. Data... And Lots of It! RNA-seq: What is it good for? Clustering High-throughput RNA sequencing experiments (RNA-seq) offer the ability to measure simultaneously the expression level of thousands of genes in a single experiment!

More information

Minitab Notes for Activity 1

Minitab Notes for Activity 1 Minitab Notes for Activity 1 Creating the Worksheet 1. Label the columns as team, heat, and time. 2. Have Minitab automatically enter the team data for you. a. Choose Calc / Make Patterned Data / Simple

More information

Supplementary Material. Cell type-specific termination of transcription by transposable element sequences

Supplementary Material. Cell type-specific termination of transcription by transposable element sequences Supplementary Material Cell type-specific termination of transcription by transposable element sequences Andrew B. Conley and I. King Jordan Controls for TTS identification using PET A series of controls

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming

More information

srap: Simplified RNA-Seq Analysis Pipeline

srap: Simplified RNA-Seq Analysis Pipeline srap: Simplified RNA-Seq Analysis Pipeline Charles Warden October 30, 2017 1 Introduction This package provides a pipeline for gene expression analysis. The normalization function is specific for RNA-Seq

More information

ChIP-seq hands-on practical using Galaxy

ChIP-seq hands-on practical using Galaxy ChIP-seq hands-on practical using Galaxy In this exercise we will cover some of the basic NGS analysis steps for ChIP-seq using the Galaxy framework: Quality control Mapping of reads using Bowtie2 Peak-calling

More information

Frequency Distributions

Frequency Distributions Displaying Data Frequency Distributions After collecting data, the first task for a researcher is to organize and summarize the data so that it is possible to get a general overview of the results. Remember,

More information

Math 121 Project 4: Graphs

Math 121 Project 4: Graphs Math 121 Project 4: Graphs Purpose: To review the types of graphs, and use MS Excel to create them from a dataset. Outline: You will be provided with several datasets and will use MS Excel to create graphs.

More information

Windows. RNA-Seq Tutorial

Windows. RNA-Seq Tutorial Windows RNA-Seq Tutorial 2017 Gene Codes Corporation Gene Codes Corporation 525 Avis Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) +1.734.769.7249 (elsewhere) +1.734.769.7074 (fax) www.genecodes.com

More information

mirnet Tutorial Starting with expression data

mirnet Tutorial Starting with expression data mirnet Tutorial Starting with expression data Computer and Browser Requirements A modern web browser with Java Script enabled Chrome, Safari, Firefox, and Internet Explorer 9+ For best performance and

More information

Package SC3. November 27, 2017

Package SC3. November 27, 2017 Type Package Title Single-Cell Consensus Clustering Version 1.7.1 Author Vladimir Kiselev Package SC3 November 27, 2017 Maintainer Vladimir Kiselev A tool for unsupervised

More information

Package scmap. March 29, 2019

Package scmap. March 29, 2019 Type Package Package scmap March 29, 2019 Title A tool for unsupervised projection of single cell RNA-seq data Version 1.4.1 Author Vladimir Kiselev Maintainer Vladimir Kiselev

More information

Package SC3. September 29, 2018

Package SC3. September 29, 2018 Type Package Title Single-Cell Consensus Clustering Version 1.8.0 Author Vladimir Kiselev Package SC3 September 29, 2018 Maintainer Vladimir Kiselev A tool for unsupervised

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Features and Patterns The Curse of Size and

More information

Differential Expression Analysis at PATRIC

Differential Expression Analysis at PATRIC Differential Expression Analysis at PATRIC The following step- by- step workflow is intended to help users learn how to upload their differential gene expression data to their private workspace using Expression

More information

SNPViewer Documentation

SNPViewer Documentation SNPViewer Documentation Module name: Description: Author: SNPViewer Displays SNP data plotting copy numbers and LOH values Jim Robinson (Broad Institute), gp-help@broad.mit.edu Summary: The SNPViewer displays

More information

Automated Bioinformatics Analysis System on Chip ABASOC. version 1.1

Automated Bioinformatics Analysis System on Chip ABASOC. version 1.1 Automated Bioinformatics Analysis System on Chip ABASOC version 1.1 Phillip Winston Miller, Priyam Patel, Daniel L. Johnson, PhD. University of Tennessee Health Science Center Office of Research Molecular

More information

Package SIMLR. December 1, 2017

Package SIMLR. December 1, 2017 Version 1.4.0 Date 2017-10-21 Package SIMLR December 1, 2017 Title Title: SIMLR: Single-cell Interpretation via Multi-kernel LeaRning Maintainer Daniele Ramazzotti Depends

More information

How to Make Graphs in EXCEL

How to Make Graphs in EXCEL How to Make Graphs in EXCEL The following instructions are how you can make the graphs that you need to have in your project.the graphs in the project cannot be hand-written, but you do not have to use

More information

SEEK User Manual. Introduction

SEEK User Manual. Introduction SEEK User Manual Introduction SEEK is a computational gene co-expression search engine. It utilizes a vast human gene expression compendium to deliver fast, integrative, cross-platform co-expression analyses.

More information

Clustering Analysis of Simple K Means Algorithm for Various Data Sets in Function Optimization Problem (Fop) of Evolutionary Programming

Clustering Analysis of Simple K Means Algorithm for Various Data Sets in Function Optimization Problem (Fop) of Evolutionary Programming Clustering Analysis of Simple K Means Algorithm for Various Data Sets in Function Optimization Problem (Fop) of Evolutionary Programming R. Karthick 1, Dr. Malathi.A 2 Research Scholar, Department of Computer

More information

Unsupervised Learning : Clustering

Unsupervised Learning : Clustering Unsupervised Learning : Clustering Things to be Addressed Traditional Learning Models. Cluster Analysis K-means Clustering Algorithm Drawbacks of traditional clustering algorithms. Clustering as a complex

More information

16 - Comparing Groups

16 - Comparing Groups 16 - Comparing Groups Contents 16 - COMPARING GROUPS... 1 QUALITATIVE DATA: CONTENT OF CODED SEGMENTS... 1 QUANTITATIVE DATA: CODE FREQUENCIES... 3 16 - Comparing Groups MAXQDA allows you to compare different

More information

Tutorial: RNA-Seq Analysis Part II (Tracks): Non-Specific Matches, Mapping Modes and Expression measures

Tutorial: RNA-Seq Analysis Part II (Tracks): Non-Specific Matches, Mapping Modes and Expression measures : RNA-Seq Analysis Part II (Tracks): Non-Specific Matches, Mapping Modes and February 24, 2014 Sample to Insight : RNA-Seq Analysis Part II (Tracks): Non-Specific Matches, Mapping Modes and : RNA-Seq Analysis

More information

How to use the DEGseq Package

How to use the DEGseq Package How to use the DEGseq Package Likun Wang 1,2 and Xi Wang 1. October 30, 2018 1 MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST /Department of Automation, Tsinghua University. 2

More information

Charts in Excel 2003

Charts in Excel 2003 Charts in Excel 2003 Contents Introduction Charts in Excel 2003...1 Part 1: Generating a Basic Chart...1 Part 2: Adding Another Data Series...3 Part 3: Other Handy Options...5 Introduction Charts in Excel

More information

Nature Methods: doi: /nmeth Supplementary Figure 1

Nature Methods: doi: /nmeth Supplementary Figure 1 Supplementary Figure 1 Schematic representation of the Workflow window in Perseus All data matrices uploaded in the running session of Perseus and all processing steps are displayed in the order of execution.

More information

ITMO Ecole de Bioinformatique Hands-on session: smallrna-seq N. Servant 21 rd November 2013

ITMO Ecole de Bioinformatique Hands-on session: smallrna-seq N. Servant 21 rd November 2013 ITMO Ecole de Bioinformatique Hands-on session: smallrna-seq N. Servant 21 rd November 2013 1. Data and objectives We will use the data from GEO (GSE35368, Toedling, Servant et al. 2011). Two samples were

More information

CHAPTER 6. The Normal Probability Distribution

CHAPTER 6. The Normal Probability Distribution The Normal Probability Distribution CHAPTER 6 The normal probability distribution is the most widely used distribution in statistics as many statistical procedures are built around it. The central limit

More information

Supplementary text S6 Comparison studies on simulated data

Supplementary text S6 Comparison studies on simulated data Supplementary text S Comparison studies on simulated data Peter Langfelder, Rui Luo, Michael C. Oldham, and Steve Horvath Corresponding author: shorvath@mednet.ucla.edu Overview In this document we illustrate

More information

Expression Analysis with the Advanced RNA-Seq Plugin

Expression Analysis with the Advanced RNA-Seq Plugin Expression Analysis with the Advanced RNA-Seq Plugin May 24, 2016 Sample to Insight CLC bio, a QIAGEN Company Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.clcbio.com support-clcbio@qiagen.com

More information

Canadian National Longitudinal Survey of Children and Youth (NLSCY)

Canadian National Longitudinal Survey of Children and Youth (NLSCY) Canadian National Longitudinal Survey of Children and Youth (NLSCY) Fathom workshop activity For more information about the survey, see: http://www.statcan.ca/ Daily/English/990706/ d990706a.htm Notice

More information

Page 1. Graphical and Numerical Statistics

Page 1. Graphical and Numerical Statistics TOPIC: Description Statistics In this tutorial, we show how to use MINITAB to produce descriptive statistics, both graphical and numerical, for an existing MINITAB dataset. The example data come from Exercise

More information

PROMO 2017a - Tutorial

PROMO 2017a - Tutorial PROMO 2017a - Tutorial Introduction... 2 Installing PROMO... 2 Step 1 - Importing data... 2 Step 2 - Preprocessing... 6 Step 3 Data Exploration... 9 Step 4 Clustering... 13 Step 5 Analysis of sample clusters...

More information

RNA-Seq Analysis With the Tuxedo Suite

RNA-Seq Analysis With the Tuxedo Suite June 2016 RNA-Seq Analysis With the Tuxedo Suite Dena Leshkowitz Introduction In this exercise we will learn how to analyse RNA-Seq data using the Tuxedo Suite tools: Tophat, Cuffmerge, Cufflinks and Cuffdiff.

More information

Projection with Public Data (PPD)

Projection with Public Data (PPD) Projection with Public Data (PPD) Goal To compare users 16S rrna data with published datasets by processing and normalization them together, and projecting into 3D PCoA plot for visual comparative analysis

More information

Package ALDEx2. April 4, 2019

Package ALDEx2. April 4, 2019 Type Package Package ALDEx2 April 4, 2019 Title Analysis Of Differential Abundance Taking Sample Variation Into Account Version 1.14.1 Date 2017-10-11 Author Greg Gloor, Ruth Grace Wong, Andrew Fernandes,

More information

Package M3Drop. August 23, 2018

Package M3Drop. August 23, 2018 Version 1.7.0 Date 2016-07-15 Package M3Drop August 23, 2018 Title Michaelis-Menten Modelling of Dropouts in single-cell RNASeq Author Tallulah Andrews Maintainer Tallulah Andrews

More information

Detecting Novel Associations in Large Data Sets

Detecting Novel Associations in Large Data Sets Detecting Novel Associations in Large Data Sets J. Hjelmborg Department of Biostatistics 5. februar 2013 Biostat (Biostatistics) Detecting Novel Associations in Large Data Sets 5. februar 2013 1 / 22 Overview

More information

Using the Kolmogorov-Smirnov Test for Image Segmentation

Using the Kolmogorov-Smirnov Test for Image Segmentation Using the Kolmogorov-Smirnov Test for Image Segmentation Yong Jae Lee CS395T Computational Statistics Final Project Report May 6th, 2009 I. INTRODUCTION Image segmentation is a fundamental task in computer

More information

HIERARCHICAL DATA VISUALIZATION AS A TOOL FOR DEVELOPING STUDENT UNDERSTANDING OF VARIATION OF DATA GENERATED IN SIMULATIONS

HIERARCHICAL DATA VISUALIZATION AS A TOOL FOR DEVELOPING STUDENT UNDERSTANDING OF VARIATION OF DATA GENERATED IN SIMULATIONS HIERARCHICAL DATA VISUALIZATION AS A TOOL FOR DEVELOPING STUDENT UNDERSTANDING OF VARIATION OF DATA GENERATED IN SIMULATIONS William Concord Consortium, USA wfinzer@concord.org Data Games is a data exploration

More information

Project 11 Graphs (Using MS Excel Version )

Project 11 Graphs (Using MS Excel Version ) Project 11 Graphs (Using MS Excel Version 2007-10) Purpose: To review the types of graphs, and use MS Excel 2010 to create them from a dataset. Outline: You will be provided with several datasets and will

More information

Version 1.0. Reference manual: RaceID (Rare Cell Type Identification)

Version 1.0. Reference manual: RaceID (Rare Cell Type Identification) Version.0 Reference manual: RaceID (Rare Cell Type Identification) 0. Prerequisites... I. Input data... II. Preprocessing of input data... 3 III. k-means clustering... 3 IV. Outlier identification... 8

More information

Slides URL.

Slides URL. MDS vs t SNE 1 2 Slides URL https://git.embl.de/klaus/mds_vs_tsne 3 Multi Dimensional Scaling (MDS) MDS: given distances can we infer (Eucledian) coordinates? yes!, via the double centering trick distances

More information

Technical University of Munich. Exercise 8: Neural Networks

Technical University of Munich. Exercise 8: Neural Networks Technical University of Munich Chair for Bioinformatics and Computational Biology Protein Prediction I for Computer Scientists SoSe 2018 Prof. B. Rost M. Bernhofer, M. Heinzinger, D. Nechaev, L. Richter

More information

Statistics 202: Data Mining. c Jonathan Taylor. Clustering Based in part on slides from textbook, slides of Susan Holmes.

Statistics 202: Data Mining. c Jonathan Taylor. Clustering Based in part on slides from textbook, slides of Susan Holmes. Clustering Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group will be similar (or

More information

Package EMDomics. November 11, 2018

Package EMDomics. November 11, 2018 Package EMDomics November 11, 2018 Title Earth Mover's Distance for Differential Analysis of Genomics Data The EMDomics algorithm is used to perform a supervised multi-class analysis to measure the magnitude

More information

DREM. Dynamic Regulatory Events Miner (v1.0.9b) User Manual

DREM. Dynamic Regulatory Events Miner (v1.0.9b) User Manual DREM Dynamic Regulatory Events Miner (v1.0.9b) User Manual Jason Ernst (jernst@cs.cmu.edu) Ziv Bar-Joseph Machine Learning Department School of Computer Science Carnegie Mellon University Contents 1 Introduction

More information

How do microarrays work

How do microarrays work Lecture 3 (continued) Alvis Brazma European Bioinformatics Institute How do microarrays work condition mrna cdna hybridise to microarray condition Sample RNA extract labelled acid acid acid nucleic acid

More information

Package BayesMetaSeq

Package BayesMetaSeq Type Package Package BayesMetaSeq March 19, 2016 Title Bayesian hierarchical model for RNA-seq differential meta-analysis Version 1.0 Date 2016-01-29 Author Tianzhou Ma, George Tseng Maintainer Tianzhou

More information

3.3 Hardware Karnaugh Maps

3.3 Hardware Karnaugh Maps 2P P = P = 3.3 Hardware UIntroduction A Karnaugh map is a graphical method of Boolean logic expression reduction. A Boolean expression can be reduced to its simplest form through the 4 simple steps involved

More information

BGGN-213: FOUNDATIONS OF BIOINFORMATICS (Lecture 14)

BGGN-213: FOUNDATIONS OF BIOINFORMATICS (Lecture 14) BGGN-213: FOUNDATIONS OF BIOINFORMATICS (Lecture 14) Genome Informatics (Part 1) https://bioboot.github.io/bggn213_f17/lectures/#14 Dr. Barry Grant Nov 2017 Overview: The purpose of this lab session is

More information

CompClustTk Manual & Tutorial

CompClustTk Manual & Tutorial CompClustTk Manual & Tutorial Brandon King Copyright c California Institute of Technology Version 0.1.10 May 13, 2004 Contents 1 Introduction 1 1.1 Purpose.............................................

More information

Evaluation and comparison of gene clustering methods in microarray analysis

Evaluation and comparison of gene clustering methods in microarray analysis Evaluation and comparison of gene clustering methods in microarray analysis Anbupalam Thalamuthu 1 Indranil Mukhopadhyay 1 Xiaojing Zheng 1 George C. Tseng 1,2 1 Department of Human Genetics 2 Department

More information

fastgear Manual Contents

fastgear Manual Contents fastgear Manual fastgear was developed in a collaborative project by Rafal Mostowy, Nicholas J. Croucher, Cheryl P. Andam, Jukka Corander, William P. Hanage, and Pekka Marttinen* Contents Introduction...

More information

Chapter 2 Assignment (due Thursday, April 19)

Chapter 2 Assignment (due Thursday, April 19) (due Thursday, April 19) Introduction: The purpose of this assignment is to analyze data sets by creating histograms and scatterplots. You will use the STATDISK program for both. Therefore, you should

More information

Statistics with a Hemacytometer

Statistics with a Hemacytometer Statistics with a Hemacytometer Overview This exercise incorporates several different statistical analyses. Data gathered from cell counts with a hemacytometer is used to explore frequency distributions

More information

Tutorial. RNA-Seq Analysis of Breast Cancer Data. Sample to Insight. November 21, 2017

Tutorial. RNA-Seq Analysis of Breast Cancer Data. Sample to Insight. November 21, 2017 RNA-Seq Analysis of Breast Cancer Data November 21, 2017 Sample to Insight QIAGEN Aarhus Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.qiagenbioinformatics.com AdvancedGenomicsSupport@qiagen.com

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2013 http://ce.sharif.edu/courses/91-92/2/ce725-1/ Agenda Features and Patterns The Curse of Size and

More information

EXERCISE: GETTING STARTED WITH SAV

EXERCISE: GETTING STARTED WITH SAV Sequencing Analysis Viewer (SAV) Overview 1 EXERCISE: GETTING STARTED WITH SAV Purpose This exercise explores the following topics: How to load run data into SAV How to explore run metrics with SAV Getting

More information

User s Guide. Using the R-Peridot Graphical User Interface (GUI) on Windows and GNU/Linux Systems

User s Guide. Using the R-Peridot Graphical User Interface (GUI) on Windows and GNU/Linux Systems User s Guide Using the R-Peridot Graphical User Interface (GUI) on Windows and GNU/Linux Systems Pitágoras Alves 01/06/2018 Natal-RN, Brazil Index 1. The R Environment Manager...

More information

CURIOUS BROWSERS: Automated Gathering of Implicit Interest Indicators by an Instrumented Browser

CURIOUS BROWSERS: Automated Gathering of Implicit Interest Indicators by an Instrumented Browser CURIOUS BROWSERS: Automated Gathering of Implicit Interest Indicators by an Instrumented Browser David Brown Mark Claypool Computer Science Department Worcester Polytechnic Institute Worcester, MA 01609,

More information

Chapter 2 Assignment (due Thursday, October 5)

Chapter 2 Assignment (due Thursday, October 5) (due Thursday, October 5) Introduction: The purpose of this assignment is to analyze data sets by creating histograms and scatterplots. You will use the STATDISK program for both. Therefore, you should

More information

Variable Selection for Consistent Clustering

Variable Selection for Consistent Clustering Variable Selection for Consistent Clustering Ron Yurko Rebecca Nugent Sam Ventura Department of Statistics Carnegie Mellon University Classification Society, 2017 Typical Questions in Cluster Analysis

More information

NUMERICAL COMPUTING For Finance Using Excel. Sorting and Displaying Data

NUMERICAL COMPUTING For Finance Using Excel. Sorting and Displaying Data NUMERICAL COMPUTING For Finance Using Excel Sorting and Displaying Data Outline 1 Sorting data Excel Sort tool (sort data in ascending or descending order) Simple filter (by ROW, COLUMN, apply a custom

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2012 http://ce.sharif.edu/courses/90-91/2/ce725-1/ Agenda Features and Patterns The Curse of Size and

More information

AUTOMATIC BLASTOMERE DETECTION IN DAY 1 TO DAY 2 HUMAN EMBRYO IMAGES USING PARTITIONED GRAPHS AND ELLIPSOIDS

AUTOMATIC BLASTOMERE DETECTION IN DAY 1 TO DAY 2 HUMAN EMBRYO IMAGES USING PARTITIONED GRAPHS AND ELLIPSOIDS AUTOMATIC BLASTOMERE DETECTION IN DAY 1 TO DAY 2 HUMAN EMBRYO IMAGES USING PARTITIONED GRAPHS AND ELLIPSOIDS Amarjot Singh 1, John Buonassisi 1, Parvaneh Saeedi 1, Jon Havelock 2 1. School of Engineering

More information

Accelerated Life Testing Module Accelerated Life Testing - Overview

Accelerated Life Testing Module Accelerated Life Testing - Overview Accelerated Life Testing Module Accelerated Life Testing - Overview The Accelerated Life Testing (ALT) module of AWB provides the functionality to analyze accelerated failure data and predict reliability

More information

University of Florida CISE department Gator Engineering. Clustering Part 2

University of Florida CISE department Gator Engineering. Clustering Part 2 Clustering Part 2 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Partitional Clustering Original Points A Partitional Clustering Hierarchical

More information

Package GREP2. July 25, 2018

Package GREP2. July 25, 2018 Type Package Package GREP2 July 25, 2018 Title GEO RNA-Seq Experiments Processing Pipeline Version 1.0.1 Maintainer Naim Al Mahi An R based pipeline to download and process Gene Expression

More information

Quantification. Part I, using Excel

Quantification. Part I, using Excel Quantification In this exercise we will work with RNA-seq data from a study by Serin et al (2017). RNA-seq was performed on Arabidopsis seeds matured at standard temperature (ST, 22 C day/18 C night) or

More information

EBSeqHMM: An R package for identifying gene-expression changes in ordered RNA-seq experiments

EBSeqHMM: An R package for identifying gene-expression changes in ordered RNA-seq experiments EBSeqHMM: An R package for identifying gene-expression changes in ordered RNA-seq experiments Ning Leng and Christina Kendziorski April 30, 2018 Contents 1 Introduction 1 2 The model 2 2.1 EBSeqHMM model..........................................

More information

Predicting ground-level scene Layout from Aerial imagery. Muhammad Hasan Maqbool

Predicting ground-level scene Layout from Aerial imagery. Muhammad Hasan Maqbool Predicting ground-level scene Layout from Aerial imagery Muhammad Hasan Maqbool Objective Given the overhead image predict its ground level semantic segmentation Predicted ground level labeling Overhead/Aerial

More information
