More about liquid association


1 More about liquid association

2 Liquid Association (LA)
LA is a generalized notion of association for describing a certain kind of ternary relationship between variables in a system (Li 2002, PNAS).
[Figure: scatter plot of the expression profiles of genes X and Y. Both axes run from low (-) to high (+). One fitted line is shown for state 1 and one for state 2, with the transit state between them.]
Green points represent four conditions for cellular state 1. Red points represent four conditions for cellular state 2. Blue points represent the transit state between cellular states 1 and 2. (X, Y) forms an LA pair.
Important: the overall correlation between X and Y is 0.

3 Statistical theory for LA
X, Y, Z: random variables with mean 0 and variance 1.
$\mathrm{Corr}(X,Y) = E(XY) = E(E(XY \mid Z)) = E\,g(Z)$, where $g(z) = E(XY \mid Z = z)$.
$g(z)$ is an ideal summary of the association pattern between X and Y when $Z = z$; $g'(z)$ is the derivative of $g(z)$.
Definition. The LA of X and Y with respect to Z is $LA(X,Y \mid Z) = E\,g'(Z)$.

4 Statistical theory for LA (continued)
Theorem. If Z is standard normal, then $LA(X,Y \mid Z) = E(XYZ)$.
Proof. By Stein's lemma: $E\,g'(Z) = E(g(Z)Z) = E(E(XY \mid Z)Z) = E(XYZ)$.
Additional mathematical properties:
- bounded by the third moment
- equal to 0 if X, Y, Z are jointly normal
- transformation

5 Stein's lemma
Computing $E(g'(Z))$ directly is not easy. With help from mathematical statistics, $LA(X,Y \mid Z)$ simplifies to $E(XYZ)$ when Z follows a standard normal distribution:
$LA(X,Y \mid Z) = E(g'(Z)) = E(Z\,g(Z)) = E(Z\,E(XY \mid Z)) = E(E(XYZ \mid Z)) = E(XYZ)$,
where the second equality is Stein's lemma.

6 Lemma 1: $E\,h'(X) = h(1) - h(0)$, where $X \sim \mathrm{Uniform}[0,1]$ and h is differentiable.
This is the fundamental theorem of calculus (Sir Isaac Newton, Gottfried Leibniz; portraits from Wikipedia).

7 Lemma 2 (Stein's lemma, after Charles Stein): $E\,h'(X) = E(X\,h(X))$, where $X \sim \mathrm{Normal}(0,1)$.
Proof by integration by parts: start from the right side, write down the density of X, and integrate by parts.
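
The proof outline for Lemma 2 can be written out as below. This is the standard integration-by-parts argument, not a transcription of the slide; it assumes h is differentiable and that $h(x)\phi(x)$ vanishes as $x \to \pm\infty$, where $\phi$ denotes the standard normal density (a symbol introduced here).

```latex
\begin{aligned}
E\bigl(X\,h(X)\bigr)
  &= \int_{-\infty}^{\infty} x\,h(x)\,\phi(x)\,dx,
     \qquad \phi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^{2}/2},
     \quad \phi'(x) = -x\,\phi(x) \\
  &= -\int_{-\infty}^{\infty} h(x)\,\phi'(x)\,dx \\
  &= -\Bigl[h(x)\,\phi(x)\Bigr]_{-\infty}^{\infty}
     + \int_{-\infty}^{\infty} h'(x)\,\phi(x)\,dx \\
  &= E\,h'(X).
\end{aligned}
```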

8 Lemma 3: $E(X\,h(X)) = \lambda\,E\,h(X+1)$, where $X \sim \mathrm{Poisson}(\lambda)$.
This identity underlies the Chen-Stein method for Poisson approximation (Louis Chen, National University of Singapore, Director of IMS).
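
The Poisson identity in Lemma 3 can be checked numerically. The sketch below is only an illustration: the function names and the truncation point `kmax` are assumptions made here, and the infinite sums are approximated by finite ones, which is adequate for small λ.

```python
from math import exp


def poisson_pmf(lam, kmax):
    """Return [P(X=0), ..., P(X=kmax)] for X ~ Poisson(lam), computed iteratively."""
    p = exp(-lam)
    pmf = [p]
    for k in range(1, kmax + 1):
        p *= lam / k          # P(X=k) = P(X=k-1) * lam / k
        pmf.append(p)
    return pmf


def chen_stein_sides(lam, h, kmax=200):
    """Return (E[X h(X)], lam * E[h(X+1)]) using sums truncated at kmax."""
    pmf = poisson_pmf(lam, kmax)
    lhs = sum(k * h(k) * pmf[k] for k in range(kmax + 1))
    rhs = lam * sum(h(k + 1) * pmf[k] for k in range(kmax + 1))
    return lhs, rhs


# Example: h(k) = k^2, so the left side is the third moment of a Poisson(2.5).
lhs, rhs = chen_stein_sides(2.5, lambda k: k * k)
```

With h(k) = k², both sides equal λ³ + 3λ² + λ, the third moment of a Poisson variable.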

9 Inadmissibility of the normal mean when the dimension is at least 3
$X_1 \sim N(\mu_1, \sigma^2)$, $X_2 \sim N(\mu_2, \sigma^2)$, $X_3 \sim N(\mu_3, \sigma^2)$.
Squared error loss for estimating the mean parameters; variance known.
Risk $= E\{(X_1 - \mu_1)^2 + (X_2 - \mu_2)^2 + (X_3 - \mu_3)^2\} = 3\sigma^2$.
A better estimate can be constructed by shrinkage toward the origin. Write $y = (x_1, x_2, x_3)$ and $\theta = (\mu_1, \mu_2, \mu_3)$.
By Stein's lemma, an unbiased estimate of the risk of the James-Stein estimate can be constructed; it is smaller than $3\sigma^2$.
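
The slide does not spell out the shrinkage formula. The sketch below assumes the standard James-Stein form, $(1 - (p-2)\sigma^2/\|y\|^2)\,y$, together with its usual unbiased risk estimate $p\sigma^2 - (p-2)^2\sigma^4/\|y\|^2$; the function names are chosen here for illustration.

```python
def james_stein(y, sigma2):
    """Shrink the observation vector y toward the origin (known variance sigma2)."""
    p = len(y)
    norm2 = sum(v * v for v in y)
    if norm2 == 0.0:
        raise ValueError("y must not be the zero vector")
    shrink = 1.0 - (p - 2) * sigma2 / norm2
    return [shrink * v for v in y]


def james_stein_risk_estimate(y, sigma2):
    """Unbiased estimate of the James-Stein risk; below p * sigma2 when p >= 3."""
    p = len(y)
    norm2 = sum(v * v for v in y)
    if norm2 == 0.0:
        raise ValueError("y must not be the zero vector")
    return p * sigma2 - (p - 2) ** 2 * sigma2 ** 2 / norm2


# Example with p = 3 and sigma^2 = 1: ||y||^2 = 25, so the shrinkage factor is 0.96.
estimate = james_stein([3.0, 4.0, 0.0], 1.0)
risk_hat = james_stein_risk_estimate([3.0, 4.0, 0.0], 1.0)
```

For this example the risk estimate is 3 - 1/25 = 2.96, compared with 3 for the unshrunk observation.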

10 Normality?
Convert each gene expression profile by taking the normal score transformation. Then $LA(X,Y \mid Z)$ is estimated by the average of the triplet products of the three gene profiles:
$(x_1 y_1 z_1 + x_2 y_2 z_2 + \cdots + x_n y_n z_n)/n$
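
The two steps on this slide can be sketched in a few lines. The rank convention used below, $\Phi^{-1}(\text{rank}/(n+1))$ with ties broken by order of appearance, is one common choice and an assumption here; the slide does not say which convention is used. The function names are illustrative.

```python
from statistics import NormalDist


def normal_scores(values):
    """Replace each value by the standard normal quantile of rank / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # stable sort: ties keep input order
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = std_normal.inv_cdf(rank / (n + 1))
    return scores


def la_score(x, y, z):
    """Average triplet product of the normal-score-transformed profiles."""
    if not (len(x) == len(y) == len(z)):
        raise ValueError("profiles must have the same length")
    xs, ys, zs = normal_scores(x), normal_scores(y), normal_scores(z)
    return sum(a * b * c for a, b, c in zip(xs, ys, zs)) / len(xs)
```

Scores built this way have mean 0 but variance slightly below 1 for small n; rescaling to unit variance would be an optional extra step.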

11 Liquid association is not partial correlation
Causal setting: Z -> X, Z -> Y.
$X = aZ + b + e_1$, $Y = a'Z + b' + e_2$.
The partial correlation of X and Y with respect to (adjusted by, given) Z is $\mathrm{corr}(e_1, e_2)$.
If Z is the common cause of X and Y, then the partial correlation is 0 (for example, X = Coke sales, Y = eye disease incidence rate, Z = season).
Partial-correlation procedure: starting with a pair of positively correlated genes Y, Z ($\mathrm{corr}(Y,Z) > 0$), find X to reduce the partial correlation. This procedure is very different from LA.
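
For contrast with LA, the partial correlation described on this slide can be computed as the correlation of the residuals $e_1$ and $e_2$ after regressing X and Y on Z. The sketch below uses ordinary least squares and illustrative function names; it is not taken from the slides.

```python
from math import sqrt


def _mean(v):
    return sum(v) / len(v)


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    mu, mv = _mean(u), _mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def residuals(target, z):
    """Residuals of the least-squares fit target = slope * z + intercept."""
    mt, mz = _mean(target), _mean(z)
    slope = sum((a - mz) * (t - mt) for a, t in zip(z, target)) / sum((a - mz) ** 2 for a in z)
    intercept = mt - slope * mz
    return [t - (slope * a + intercept) for t, a in zip(target, z)]


def partial_corr(x, y, z):
    """Partial correlation of x and y given z: corr(e1, e2)."""
    return pearson(residuals(x, z), residuals(y, z))
```

Partial correlation removes the linear effect of Z once and for all, whereas LA asks how the X-Y association itself changes as Z varies.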

12 Quadratic relationship
Sometimes liquid association may occur when X and Y have a quadratic trend. This is often the case when Z is well correlated with either X or Y.
For example, let $Y = X^2 + e_1$, where X is normal with mean 0 and variance 1, and $e_1$ has mean 0 and variance 1. Then $\mathrm{Corr}(X,Y) = 0$.
Let $Z = 0.8X + 0.6e_2$, where $e_2$ has mean 0 and variance 1, and $e_1$, $e_2$ are independent of X and of each other.
Show LA activity plot.
$E(XYZ) = 0.8\,E(X^4) > 0$
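
A small simulation can illustrate the quadratic example. The sketch below assumes that $e_1$ and $e_2$ are standard normal and independent of X (the slide only gives their means and variances) and uses Y unstandardized, as in the slide's formula. Since $E(X^4) = 3$ for a standard normal X, the triple-product mean should be close to 2.4 while the X-Y correlation stays near 0.

```python
import random
from math import sqrt


def simulate_quadratic_example(n=200_000, seed=0):
    """Return (sample corr(X, Y), sample mean of X*Y*Z) for Y = X^2 + e1, Z = 0.8X + 0.6e2."""
    rng = random.Random(seed)
    s_x = s_y = s_xx = s_yy = s_xy = s_xyz = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e1 = rng.gauss(0.0, 1.0)   # assumed normal; the slide only fixes mean and variance
        e2 = rng.gauss(0.0, 1.0)
        y = x * x + e1
        z = 0.8 * x + 0.6 * e2
        s_x += x
        s_y += y
        s_xx += x * x
        s_yy += y * y
        s_xy += x * y
        s_xyz += x * y * z
    mean_x, mean_y = s_x / n, s_y / n
    cov_xy = s_xy / n - mean_x * mean_y
    var_x = s_xx / n - mean_x ** 2
    var_y = s_yy / n - mean_y ** 2
    return cov_xy / sqrt(var_x * var_y), s_xyz / n


corr_xy, mean_xyz = simulate_quadratic_example()
```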

13 Statistical significance
The P-value can be calculated by a permutation test or by a large-sample approximation.
Plots of liquid association are provided by two methods:
- MLE for a mixture model
- a discrete method
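
The slides do not describe the permutation scheme. The sketch below shows one plausible version under stated assumptions: only the Z profile is permuted, the test is two-sided, and an add-one correction keeps the P-value away from zero. Function names and defaults are illustrative, and the inputs are assumed to be already transformed profiles.

```python
import random


def triple_product_mean(x, y, z):
    """Average of x_i * y_i * z_i over all conditions."""
    return sum(a * b * c for a, b, c in zip(x, y, z)) / len(x)


def la_permutation_pvalue(x, y, z, n_perm=1000, seed=0):
    """Return (observed LA score, two-sided permutation P-value).

    Assumption: the null distribution is generated by shuffling z only,
    which breaks its link to the (x, y) pair while keeping corr(x, y) intact.
    """
    rng = random.Random(seed)
    observed = triple_product_mean(x, y, z)
    z_perm = list(z)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(z_perm)
        if abs(triple_product_mean(x, y, z_perm)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

With `n_perm` permutations the smallest attainable P-value is 1/(n_perm + 1), so very small P-values need very large permutation counts.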

14 Figure 3. Organization chart for incorporating LA with similarity-based methods. Coexpressed genes found by profile similarity analysis can be pooled together to obtain a consensus profile for LA-scouting. Likewise, the genes identified through the LA system can be further analyzed for patterns of clustering. For some applications, the scouting variable may come from external sources related to the expression profiles. SVD: singular value decomposition; PCA: principal component analysis.
[Chart boxes: Full genome expression profiles. Similarity-based analysis: co-expression neighbors; hierarchical clustering; eigen profile by SVD or PCA; etc. LA-based analysis: finding LA-scouting genes for a given pair of genes; finding LAPs for a given scouting variable Z, using a consensus profile, a gene profile, or an external variable as Z. Similarity-based analysis.]

15 A website for co-mining public and in-house data

16 Data sets
Organisms:
- Primary: Homo sapiens; mouse; yeast
- Others: C. elegans; Arabidopsis; E. coli
Homo sapiens: 17 datasets (more have been added since)
- 60 cell line_affy: 60 conditions, 5611 genes
- 60 cell line cDNA: 60 conditions, 9706 genes
- GNF_atlas (2002): 101 conditions, [...] genes
- GNF_atlas (2004): 158 conditions, [...] genes
- Human eQTL (B-cell): 355 conditions, 8793 genes
- Lung cancer, 4 data sets:
  - Bhattacharjee et al.: 203 conditions, [...] genes
  - Beer et al.: 96 conditions, 7129 genes
  - Garber et al.: 73 conditions, [...] genes
  - Wigle et al.: 39 conditions, [...] genes

17 Facilities
Basic:
- Correlation for a pair of genes
- Liquid association for a triplet of genes
Enhancement:
- Advanced search methods: gene symbols; gene locations; gene ontology; regulation (Transfac); locus link
- Compute:
  - Variations in computing LA scores: liquid association (default); projective LA (for multiple genes)
  - Transformation: LA scouting genes; correlation only; raw data; normal transformation
- Clustering: k-means, hierarchical clustering, self-organizing methods (still testing)

18 Facilities (continued)
Post-LA refining tools:
- Summary: counts, histogram, GO, pathway (still testing)
- Correlation
- Liquid association
- Instant link to Entrez Gene or SGD (yeast only)
- Liquid association graphs (two methods)
- Save
- Info:
  - gene annotation, from public domain: Gene_sym, Gene-Name, chrom, start, stop, etc.
  - expression data, computed: Indices, Ranks, Quantiles, Rank_LAP, Rank_Corr, Transfac
  - GO term (for yeast now)
- Compute: correlation matrix (raw or normalized); clustering (K-means; hierarchical)

19 Facilities (continued)
- P*: permutation with 50,000 iterations (testing)
- P**: permutation with 1,000,000 iterations (does not work yet)
- Download (create Excel files for exporting)
- MAP (chromosome locations of output genes)
- Alert system: MS markers; MS candidate genes; yeast genetics; user-added system (talk to us)
- Disease pages (work in progress): multiple sclerosis
- Group by: adding genes; delete; modify
- Computation methods
- Databases

20 Special tools (under development)
- For handling marker data: converting to binary data
- Additional links
- Precomputed data: master LA genes (for limited datasets for now); protein complex data (only in yeast for now); KEGG pathway

SEEK User Manual. Introduction

SEEK User Manual. Introduction SEEK User Manual Introduction SEEK is a computational gene co-expression search engine. It utilizes a vast human gene expression compendium to deliver fast, integrative, cross-platform co-expression analyses.

More information

Expander Online Documentation

Expander Online Documentation Expander Online Documentation Table of Contents Introduction...1 Starting EXPANDER...2 Input Data...4 Preprocessing GE Data...8 Viewing Data Plots...12 Clustering GE Data...14 Biclustering GE Data...17

More information

Nature Methods: doi: /nmeth Supplementary Figure 1

Nature Methods: doi: /nmeth Supplementary Figure 1 Supplementary Figure 1 Schematic representation of the Workflow window in Perseus All data matrices uploaded in the running session of Perseus and all processing steps are displayed in the order of execution.

More information

Missing Data Estimation in Microarrays Using Multi-Organism Approach

Missing Data Estimation in Microarrays Using Multi-Organism Approach Missing Data Estimation in Microarrays Using Multi-Organism Approach Marcel Nassar and Hady Zeineddine Progress Report: Data Mining Course Project, Spring 2008 Prof. Inderjit S. Dhillon April 02, 2008

More information

/ Computational Genomics. Normalization

/ Computational Genomics. Normalization 10-810 /02-710 Computational Genomics Normalization Genes and Gene Expression Technology Display of Expression Information Yeast cell cycle expression Experiments (over time) baseline expression program

More information

Package LiquidAssociation

Package LiquidAssociation Type Package Title LiquidAssociation Version 1.36.0 Date 2009-10-07 Package LiquidAssociation Author Yen-Yi Ho Maintainer Yen-Yi Ho March 8, 2019 The package contains functions

More information

Evaluation and comparison of gene clustering methods in microarray analysis

Evaluation and comparison of gene clustering methods in microarray analysis Evaluation and comparison of gene clustering methods in microarray analysis Anbupalam Thalamuthu 1 Indranil Mukhopadhyay 1 Xiaojing Zheng 1 George C. Tseng 1,2 1 Department of Human Genetics 2 Department

More information

Tutorial:OverRepresentation - OpenTutorials

Tutorial:OverRepresentation - OpenTutorials Tutorial:OverRepresentation From OpenTutorials Slideshow OverRepresentation (about 12 minutes) (http://opentutorials.rbvi.ucsf.edu/index.php?title=tutorial:overrepresentation& ce_slide=true&ce_style=cytoscape)

More information

Dimension reduction : PCA and Clustering

Dimension reduction : PCA and Clustering Dimension reduction : PCA and Clustering By Hanne Jarmer Slides by Christopher Workman Center for Biological Sequence Analysis DTU The DNA Array Analysis Pipeline Array design Probe design Question Experimental

More information

10-701/15-781, Fall 2006, Final

10-701/15-781, Fall 2006, Final -7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly

More information

Dimension Induced Clustering

Dimension Induced Clustering Dimension Induced Clustering Aris Gionis Alexander Hinneburg Spiros Papadimitriou Panayiotis Tsaparas HIIT, University of Helsinki Martin Luther University, Halle Carnegie Melon University HIIT, University

More information

Analyzing ICAT Data. Analyzing ICAT Data

Analyzing ICAT Data. Analyzing ICAT Data Analyzing ICAT Data Gary Van Domselaar University of Alberta Analyzing ICAT Data ICAT: Isotope Coded Affinity Tag Introduced in 1999 by Ruedi Aebersold as a method for quantitative analysis of complex

More information

Clustering analysis of gene expression data

Clustering analysis of gene expression data Clustering analysis of gene expression data Chapter 11 in Jonathan Pevsner, Bioinformatics and Functional Genomics, 3 rd edition (Chapter 9 in 2 nd edition) Human T cell expression data The matrix contains

More information

Step-by-Step Guide to Advanced Genetic Analysis

Step-by-Step Guide to Advanced Genetic Analysis Step-by-Step Guide to Advanced Genetic Analysis Page 1 Introduction In the previous document, 1 we covered the standard genetic analyses available in JMP Genomics. Here, we cover the more advanced options

More information

Customizable information fields (or entries) linked to each database level may be replicated and summarized to upstream and downstream levels.

Customizable information fields (or entries) linked to each database level may be replicated and summarized to upstream and downstream levels. Manage. Analyze. Discover. NEW FEATURES BioNumerics Seven comes with several fundamental improvements and a plethora of new analysis possibilities with a strong focus on user friendliness. Among the most

More information

Statistics 202: Data Mining. c Jonathan Taylor. Week 8 Based in part on slides from textbook, slides of Susan Holmes. December 2, / 1

Statistics 202: Data Mining. c Jonathan Taylor. Week 8 Based in part on slides from textbook, slides of Susan Holmes. December 2, / 1 Week 8 Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Part I Clustering 2 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group

More information

VIDAEXPERT: DATA ANALYSIS Here is the Statistics button.

VIDAEXPERT: DATA ANALYSIS Here is the Statistics button. Here is the Statistics button. After creating dataset you can analyze it in different ways. First, you can calculate statistics. Open Statistics dialog, Common tabsheet, click Calculate. Min, Max: minimal

More information

DAVID hands-on. by Ester Feldmesser, June 2017

DAVID hands-on. by Ester Feldmesser, June 2017 DAVID hands-on by Ester Feldmesser, June 2017 1. Go to the DAVID website (http://david.abcc.ncifcrf.gov/) 2. Press on Start Analysis: 3. Choose the Upload tab in the left panel: 4. Download the k-means5_arabidopsis.txt

More information

Differential Expression Analysis at PATRIC

Differential Expression Analysis at PATRIC Differential Expression Analysis at PATRIC The following step- by- step workflow is intended to help users learn how to upload their differential gene expression data to their private workspace using Expression

More information

Application of Hierarchical Clustering to Find Expression Modules in Cancer

Application of Hierarchical Clustering to Find Expression Modules in Cancer Application of Hierarchical Clustering to Find Expression Modules in Cancer T. M. Murali August 18, 2008 Innovative Application of Hierarchical Clustering A module map showing conditional activity of expression

More information

Introduction to GE Microarray data analysis Practical Course MolBio 2012

Introduction to GE Microarray data analysis Practical Course MolBio 2012 Introduction to GE Microarray data analysis Practical Course MolBio 2012 Claudia Pommerenke Nov-2012 Transkriptomanalyselabor TAL Microarray and Deep Sequencing Core Facility Göttingen University Medical

More information

Predicting Disease-related Genes using Integrated Biomedical Networks

Predicting Disease-related Genes using Integrated Biomedical Networks Predicting Disease-related Genes using Integrated Biomedical Networks Jiajie Peng (jiajiepeng@nwpu.edu.cn) HanshengXue(xhs1892@gmail.com) Jin Chen* (chen.jin@uky.edu) Yadong Wang* (ydwang@hit.edu.cn) 1

More information

STATISTICS (STAT) Statistics (STAT) 1

STATISTICS (STAT) Statistics (STAT) 1 Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).

More information

Step-by-Step Guide to Relatedness and Association Mapping Contents

Step-by-Step Guide to Relatedness and Association Mapping Contents Step-by-Step Guide to Relatedness and Association Mapping Contents OBJECTIVES... 2 INTRODUCTION... 2 RELATEDNESS MEASURES... 2 POPULATION STRUCTURE... 6 Q-K ASSOCIATION ANALYSIS... 10 K MATRIX COMPRESSION...

More information

Exercises. Biological Data Analysis Using InterMine workshop exercises with answers

Exercises. Biological Data Analysis Using InterMine workshop exercises with answers Exercises Biological Data Analysis Using InterMine workshop exercises with answers Exercise1: Faceted Search Use HumanMine for this exercise 1. Search for one or more of the following using the keyword

More information

What is clustering. Organizing data into clusters such that there is high intra- cluster similarity low inter- cluster similarity

What is clustering. Organizing data into clusters such that there is high intra- cluster similarity low inter- cluster similarity Clustering What is clustering Organizing data into clusters such that there is high intra- cluster similarity low inter- cluster similarity Informally, finding natural groupings among objects. High dimensional

More information

Correlation Motif Vignette

Correlation Motif Vignette Correlation Motif Vignette Hongkai Ji, Yingying Wei October 30, 2018 1 Introduction The standard algorithms for detecting differential genes from microarray data are mostly designed for analyzing a single

More information

The Course Structure for the MCA Programme

The Course Structure for the MCA Programme The Course Structure for the MCA Programme SEMESTER - I MCA 1001 Problem Solving and Program Design with C 3 (3-0-0) MCA 1003 Numerical & Statistical Methods 4 (3-1-0) MCA 1007 Discrete Mathematics 3 (3-0-0)

More information

Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin

Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin Clustering K-means Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin 2005-2014 1 Clustering images Set of Images [Goldberger et al.] Carlos Guestrin 2005-2014

More information

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pre-treatment 1. Normalization 2. Centering,

More information

Recommender Systems New Approaches with Netflix Dataset

Recommender Systems New Approaches with Netflix Dataset Recommender Systems New Approaches with Netflix Dataset Robert Bell Yehuda Koren AT&T Labs ICDM 2007 Presented by Matt Rodriguez Outline Overview of Recommender System Approaches which are Content based

More information

Generalized trace ratio optimization and applications

Generalized trace ratio optimization and applications Generalized trace ratio optimization and applications Mohammed Bellalij, Saïd Hanafi, Rita Macedo and Raca Todosijevic University of Valenciennes, France PGMO Days, 2-4 October 2013 ENSTA ParisTech PGMO

More information

Programming Exercise 7: K-means Clustering and Principal Component Analysis

Programming Exercise 7: K-means Clustering and Principal Component Analysis Programming Exercise 7: K-means Clustering and Principal Component Analysis Machine Learning May 13, 2012 Introduction In this exercise, you will implement the K-means clustering algorithm and apply it

More information

srap: Simplified RNA-Seq Analysis Pipeline

srap: Simplified RNA-Seq Analysis Pipeline srap: Simplified RNA-Seq Analysis Pipeline Charles Warden October 30, 2017 1 Introduction This package provides a pipeline for gene expression analysis. The normalization function is specific for RNA-Seq

More information

Exploratory data analysis for microarrays

Exploratory data analysis for microarrays Exploratory data analysis for microarrays Jörg Rahnenführer Computational Biology and Applied Algorithmics Max Planck Institute for Informatics D-66123 Saarbrücken Germany NGFN - Courses in Practical DNA

More information

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures

More information

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet.

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or

More information

Lecture 11: Clustering and the Spectral Partitioning Algorithm A note on randomized algorithm, Unbiased estimates

Lecture 11: Clustering and the Spectral Partitioning Algorithm A note on randomized algorithm, Unbiased estimates CSE 51: Design and Analysis of Algorithms I Spring 016 Lecture 11: Clustering and the Spectral Partitioning Algorithm Lecturer: Shayan Oveis Gharan May nd Scribe: Yueqi Sheng Disclaimer: These notes have

More information

The Allen Human Brain Atlas offers three types of searches to allow a user to: (1) obtain gene expression data for specific genes (or probes) of

The Allen Human Brain Atlas offers three types of searches to allow a user to: (1) obtain gene expression data for specific genes (or probes) of Microarray Data MICROARRAY DATA Gene Search Boolean Syntax Differential Search Mouse Differential Search Search Results Gene Classification Correlative Search Download Search Results Data Visualization

More information

WebGestalt Manual. January 30, 2013

WebGestalt Manual. January 30, 2013 WebGestalt Manual January 30, 2013 The Web-based Gene Set Analysis Toolkit (WebGestalt) is a suite of tools for functional enrichment analysis in various biological contexts. WebGestalt compares a user

More information

Using the DATAMINE Program

Using the DATAMINE Program 6 Using the DATAMINE Program 304 Using the DATAMINE Program This chapter serves as a user s manual for the DATAMINE program, which demonstrates the algorithms presented in this book. Each menu selection

More information

Version 2.4 of Idiogrid

Version 2.4 of Idiogrid Version 2.4 of Idiogrid Structural and Visual Modifications 1. Tab delimited grids in Grid Data window. The most immediately obvious change to this newest version of Idiogrid will be the tab sheets that

More information

Introduction to Bioinformatics AS Laboratory Assignment 2

Introduction to Bioinformatics AS Laboratory Assignment 2 Introduction to Bioinformatics AS 250.265 Laboratory Assignment 2 Last week, we discussed several high-throughput methods for the analysis of gene expression in cells. Of those methods, microarray technologies

More information

Project 11 Graphs (Using MS Excel Version )

Project 11 Graphs (Using MS Excel Version ) Project 11 Graphs (Using MS Excel Version 2007-10) Purpose: To review the types of graphs, and use MS Excel 2010 to create them from a dataset. Outline: You will be provided with several datasets and will

More information

SAS (Statistical Analysis Software/System)

SAS (Statistical Analysis Software/System) SAS (Statistical Analysis Software/System) SAS Adv. Analytics or Predictive Modelling:- Class Room: Training Fee & Duration : 30K & 3 Months Online Training Fee & Duration : 33K & 3 Months Learning SAS:

More information

Network-based auto-probit modeling for protein function prediction

Network-based auto-probit modeling for protein function prediction Network-based auto-probit modeling for protein function prediction Supplementary material Xiaoyu Jiang, David Gold, Eric D. Kolaczyk Derivation of Markov Chain Monte Carlo algorithm with the GO annotation

More information

User s Guide. Using the R-Peridot Graphical User Interface (GUI) on Windows and GNU/Linux Systems

User s Guide. Using the R-Peridot Graphical User Interface (GUI) on Windows and GNU/Linux Systems User s Guide Using the R-Peridot Graphical User Interface (GUI) on Windows and GNU/Linux Systems Pitágoras Alves 01/06/2018 Natal-RN, Brazil Index 1. The R Environment Manager...

More information

Last time... Coryn Bailer-Jones. check and if appropriate remove outliers, errors etc. linear regression

Last time... Coryn Bailer-Jones. check and if appropriate remove outliers, errors etc. linear regression Machine learning, pattern recognition and statistical data modelling Lecture 3. Linear Methods (part 1) Coryn Bailer-Jones Last time... curse of dimensionality local methods quickly become nonlocal as

More information

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging 1 CS 9 Final Project Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging Feiyu Chen Department of Electrical Engineering ABSTRACT Subject motion is a significant

More information

User Manual. Ver. 3.0 March 19, 2012

User Manual. Ver. 3.0 March 19, 2012 User Manual Ver. 3.0 March 19, 2012 Table of Contents 1. Introduction... 2 1.1 Rationale... 2 1.2 Software Work-Flow... 3 1.3 New in GenomeGems 3.0... 4 2. Software Description... 5 2.1 Key Features...

More information

Lecture Topic Projects

Lecture Topic Projects Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, basic tasks, data types 3 Introduction to D3, basic vis techniques for non-spatial data Project #1 out 4 Data

More information

Expander 7.2 Online Documentation

Expander 7.2 Online Documentation Expander 7.2 Online Documentation Introduction... 2 Starting EXPANDER... 2 Input Data... 3 Tabular Data File... 4 CEL Files... 6 Working on similarity data no associated expression data... 9 Working on

More information

Data Preprocessing. Javier Béjar. URL - Spring 2018 CS - MAI 1/78 BY: $\

Data Preprocessing. Javier Béjar. URL - Spring 2018 CS - MAI 1/78 BY: $\ Data Preprocessing Javier Béjar BY: $\ URL - Spring 2018 C CS - MAI 1/78 Introduction Data representation Unstructured datasets: Examples described by a flat set of attributes: attribute-value matrix Structured

More information

Processing of binary images

Processing of binary images Binary Image Processing Tuesday, 14/02/2017 ntonis rgyros e-mail: argyros@csd.uoc.gr 1 Today From gray level to binary images Processing of binary images Mathematical morphology 2 Computer Vision, Spring

More information

Methods for Intelligent Systems

Methods for Intelligent Systems Methods for Intelligent Systems Lecture Notes on Clustering (II) Davide Eynard eynard@elet.polimi.it Department of Electronics and Information Politecnico di Milano Davide Eynard - Lecture Notes on Clustering

More information

Smoothing Dissimilarities for Cluster Analysis: Binary Data and Functional Data

Smoothing Dissimilarities for Cluster Analysis: Binary Data and Functional Data Smoothing Dissimilarities for Cluster Analysis: Binary Data and unctional Data David B. University of South Carolina Department of Statistics Joint work with Zhimin Chen University of South Carolina Current

More information

Select the Points You ll Use. Tech Assignment: Find a Quadratic Function for College Costs

Select the Points You ll Use. Tech Assignment: Find a Quadratic Function for College Costs In this technology assignment, you will find a quadratic function that passes through three of the points on each of the scatter plots you created in an earlier technology assignment. You will need the

More information

JMP Book Descriptions

JMP Book Descriptions JMP Book Descriptions The collection of JMP documentation is available in the JMP Help > Books menu. This document describes each title to help you decide which book to explore. Each book title is linked

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover

More information

Database Repository and Tools

Database Repository and Tools Database Repository and Tools John Matese May 9, 2008 What is the Repository? Save and exchange retrieved and analyzed datafiles Perform datafile manipulations (averaging and annotations) Run specialized

More information

caution in interpreting graph-theoretic diagnostics

caution in interpreting graph-theoretic diagnostics April 17, 2013 What is a network [1, 2, 3] What is a network [1, 2, 3] What is a network [1, 2, 3] What is a network [1, 2, 3] What is a network a collection of more or less identical agents or objects,

More information

Visual Encoding Design

Visual Encoding Design CSE 442 - Data Visualization Visual Encoding Design Jeffrey Heer University of Washington Review: Expressiveness & Effectiveness / APT Choosing Visual Encodings Assume k visual encodings and n data attributes.

More information

Clustering gene expression data

Clustering gene expression data Clustering gene expression data 1 How Gene Expression Data Looks Entries of the Raw Data matrix: Ratio values Absolute values Row = gene s expression pattern Column = experiment/condition s profile genes

More information

Week 7 Picturing Network. Vahe and Bethany

Week 7 Picturing Network. Vahe and Bethany Week 7 Picturing Network Vahe and Bethany Freeman (2005) - Graphic Techniques for Exploring Social Network Data The two main goals of analyzing social network data are identification of cohesive groups

More information

Metabolomic Data Analysis with MetaboAnalyst

Metabolomic Data Analysis with MetaboAnalyst Metabolomic Data Analysis with MetaboAnalyst User ID: guest6522519400069885256 April 14, 2009 1 Data Processing and Normalization 1.1 Reading and Processing the Raw Data MetaboAnalyst accepts a variety

More information

JAVA PROGRAMMING. Unit-3 :Creating Gui Using The Abstract Windowing Toolkit:

JAVA PROGRAMMING. Unit-3 :Creating Gui Using The Abstract Windowing Toolkit: JAVA PROGRAMMING UNIT-1: Introduction To Java, Getting Started With Java, Applets And Application, Creating A Java Application, Creating A Java Applets, Object Oriented Programming In Java, Object And

More information

9.1. K-means Clustering

9.1. K-means Clustering 424 9. MIXTURE MODELS AND EM Section 9.2 Section 9.3 Section 9.4 view of mixture distributions in which the discrete latent variables can be interpreted as defining assignments of data points to specific

More information

STEM. Short Time-series Expression Miner (v1.1) User Manual

STEM. Short Time-series Expression Miner (v1.1) User Manual STEM Short Time-series Expression Miner (v1.1) User Manual Jason Ernst (jernst@cs.cmu.edu) Ziv Bar-Joseph Center for Automated Learning and Discovery School of Computer Science Carnegie Mellon University

More information

M. Tech. Bioinformatics (Evening) Semester II w.e.f Jan 2017

M. Tech. Bioinformatics (Evening) Semester II w.e.f Jan 2017 M. Tech. Bioinformatics (Evening) Semester II S. Course Subject Subject Periods Evaluation Scheme Subject No. category code Sessional Exam Total Theory CT TA Total ESE 1. ESA MTE-503 Applied Mathematics

More information

Iterative Signature Algorithm for the Analysis of Large-Scale Gene Expression Data. By S. Bergmann, J. Ihmels, N. Barkai

Iterative Signature Algorithm for the Analysis of Large-Scale Gene Expression Data. By S. Bergmann, J. Ihmels, N. Barkai Iterative Signature Algorithm for the Analysis of Large-Scale Gene Expression Data By S. Bergmann, J. Ihmels, N. Barkai Reasoning Both clustering and Singular Value Decomposition(SVD) are useful tools

More information

2. Take a few minutes to look around the site. The goal is to familiarize yourself with a few key components of the NCBI.

2. Take a few minutes to look around the site. The goal is to familiarize yourself with a few key components of the NCBI. 2 Navigating the NCBI Instructions Aim: To become familiar with the resources available at the National Center for Bioinformatics (NCBI) and the search engine Entrez. Instructions: Write the answers to

More information

2016 Stat-Ease, Inc. & CAMO Software

2016 Stat-Ease, Inc. & CAMO Software Multivariate Analysis and Design of Experiments in practice using The Unscrambler X Frank Westad CAMO Software fw@camo.com Pat Whitcomb Stat-Ease pat@statease.com Agenda Goal: Part 1: Part 2: Show how

More information
