SEEK User Manual. Introduction


SEEK is a computational gene co-expression search engine. It uses a large compendium of human gene expression data to deliver fast, integrative, cross-platform co-expression analyses. SEEK also provides instant visualization of the co-expressed genes in the relevant datasets. The following sections walk users through the pages of the SEEK website, annotate the functions available on each page, and explain how to interpret the heat maps that SEEK generates.

Version 1.0

Contents
1.0 Main Page
2.0 Expression View Page
3.0 Gene Analysis Dialog
4.0 Dataset Analysis Dialog
5.0 Expression Zoom-In Page
6.0 Gene Enrichment Page
7.0 Search Refinement Page
8.0 Options Page
9.0 Co-expression View Page
10.0 Gene Co-expression Contribution Page
11.0 Export Pages

1.0 Main Page

Query box: enter the query gene(s) as gene symbols, separated by spaces.

2.0 Expression View Page (Search Result)

When the search for the user's query is complete, the Expression View is the first page presented to the user.
1. The user's query.
2. The query genes' expression profiles across the top relevant datasets detected for this query.
3. The co-expressed genes' expression profiles.
4. The list of co-expressed genes, showing each gene's rank, co-expression score, and gene name. Clicking a gene name opens the Gene Analysis Dialog.
5. An expression profile within a dataset. The dataset heading shows the rank of the dataset, its relevance weight (in parentheses), and a list of keywords describing the dataset, obtained by mining the dataset description. Clicking the dataset heading opens the Dataset Analysis Dialog.
6. Hierarchical clustering of the conditions within a dataset, based on the expression of the top 50 co-expressed genes.
6b. Expression heat map. Hatching lines indicate that a gene is not present in the dataset. Clicking any row of the heat map opens the Expression Zoom-In Page.
7. Color gradient indicating down- and up-regulated expression values.
8. Navigation to the next page of co-expressed genes, or to the next page of datasets.
9. Gene enrichment analysis of the top co-expressed genes (see Gene Enrichment Page).
10. Search refinement by narrowing down the datasets (see Search Refinement Page).
11. Text export of the gene list, dataset list, and expression matrices retrieved by SEEK on this page (see Export Pages).
12. Search and visualization options (see Options Page).
13. Toggle between the Expression View and the Co-expression View (see Co-expression View Page).

3.0 Gene Analysis Dialog

1. Gene name, description, and Entrez ID. Clicking the Entrez ID redirects the user to the NCBI Entrez Gene page.
2. Gene description, shown by clicking the [ + ] icon.

4.0 Dataset Analysis Dialog

1. Dataset GSE ID, keywords, and NCBI GEO link.
2. Description of the dataset, shown by clicking [ + ].
3. Down-regulated conditions: conditions in which the query genes and the top 50 co-expressed genes have an average gene-centered expression of -1.0 or lower. Gene-centered expression is calculated by subtracting the row average from the absolute expression value and then dividing the result by the row standard deviation.
4. Up-regulated conditions: conditions in which the query genes and the top 50 co-expressed genes have an average gene-centered expression of 1.0 or above.
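The gene-centering and the up/down thresholds described above can be sketched in a few lines of Python. This is an illustration, not SEEK's code. Each row of `matrix` stands for one gene (the query genes plus the top 50 co-expressed genes) and each column for one condition. The choice of population standard deviation (`pstdev`) and the treatment of constant rows are assumptions.

```python
from statistics import mean, pstdev


def gene_center(matrix):
    """Gene-center each row: (value - row mean) / row standard deviation."""
    centered = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        # Assumption: a constant row (sd == 0) is mapped to all zeros.
        centered.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return centered


def classify_conditions(matrix, threshold=1.0):
    """Return (down, up): column indices whose average gene-centered value
    over all rows is <= -threshold (down) or >= +threshold (up)."""
    centered = gene_center(matrix)
    down, up = [], []
    for j in range(len(centered[0])):
        avg = mean(row[j] for row in centered)
        if avg <= -threshold:
            down.append(j)
        elif avg >= threshold:
            up.append(j)
    return down, up


# Two identical toy rows over four conditions.
print(classify_conditions([[0, 4, 2, 2], [0, 4, 2, 2]]))  # ([0], [1])
```

Because the averaging runs over all rows, a single strongly deviating gene cannot flag a condition by itself unless the other genes move in the same direction.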

5.0 Expression Zoom-In Page

This view shows the expression of a given gene across the conditions of a given dataset (for example, SMO in dataset GSE2630). Each dataset is made up of a set of conditions, and each condition has a set of attributes annotated to it (acquired from the NCBI Gene Expression Omnibus), such as patient age, sex, subtype information, and drug treatments. SEEK uses these attribute-value pairs for display and sorting.
1. The display attribute. Changing the display attribute changes the information shown for each condition but does not alter the order of the conditions.

2. The sort attribute. Changing it reorders the conditions according to their values for that attribute. For example, setting the sort attribute to sex groups all conditions with the value female together, followed by the group of conditions with the value male. Alternatively, the conditions can be ordered by hierarchical clustering of their expression values, which does not use the attribute annotations.
3. The attribute values. Mousing over a condition label displays all the information about that condition. Clicking the label redirects the user to the NCBI page for that condition.

Tip: The display attribute and the sort attribute do not need to be the same. Users can take advantage of this to discover relationships that might exist between attributes.

Tip: We also suggest pairing hierarchical clustering (as the sort option) with a display attribute of the user's choice. This combination can reveal connections between expression values and an attribute. For example, users might want to know whether ESR+ status in breast cancer patients is associated with up-regulated expression of the gene of interest.
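The separation between the sort attribute and the display attribute can be illustrated with a small Python sketch. The condition records, the sample IDs, and the attribute names (`sex`, `ER`) below are hypothetical stand-ins for GEO annotations. Placing conditions that lack the sort attribute at the end is an assumption.

```python
def order_conditions(conditions, sort_attr):
    """Group conditions by their value for sort_attr (stable, so ties keep
    their original order). Conditions lacking the attribute go last."""
    return sorted(
        conditions,
        key=lambda c: (sort_attr not in c, str(c.get(sort_attr, ""))),
    )


def display_labels(conditions, sort_attr, display_attr):
    """Labels shown for each condition: ordered by sort_attr, showing display_attr."""
    return [c.get(display_attr, "NA") for c in order_conditions(conditions, sort_attr)]


conditions = [  # hypothetical GEO sample annotations
    {"id": "GSM1", "sex": "male", "ER": "+"},
    {"id": "GSM2", "sex": "female", "ER": "-"},
    {"id": "GSM3", "sex": "male", "ER": "-"},
    {"id": "GSM4", "sex": "female", "ER": "+"},
]
print([c["id"] for c in order_conditions(conditions, "sex")])  # ['GSM2', 'GSM4', 'GSM1', 'GSM3']
print(display_labels(conditions, "sex", "ER"))                 # ['-', '+', '+', '-']
```

Sorting by one attribute and displaying another makes any pattern in the displayed values visible within each group.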

6.0 Gene Enrichment Page

1. The number of top-ranked genes on which to perform the enrichment analysis.
2. Perform a gene-set enrichment analysis against an external functional database of gene sets. The functional databases connected to SEEK are:
   Chromosomal position: MSigDB positional gene sets
   Pathway: BioCarta gene sets; KEGG pathway gene sets; Reactome gene sets
   Differentially expressed genes: MSigDB chemical and genetic perturbation gene sets
   Biological process: Gene Ontology Biological Process branch (with experimental annotations only); Gene Ontology Biological Process branch (with electronic annotations)
   miRNA motif: TargetScan gene sets of shared miRNA motifs
3. The set of enriched terms found to be statistically significant. Standard hypergeometric tests are performed, followed by Benjamini-Hochberg correction for multiple hypothesis testing. The table displays the term name, p-value, q-value, the size of the term (T), the size of the analysis set after removing genes without any annotation (A), and the size of the overlap (T&A).

Tip: Mouse over a number in the T&A column to see the names of the overlapping genes.
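The statistics in item 3 can be sketched with the Python standard library. This is an illustration rather than SEEK's implementation. The background size (the number of annotated genes the test draws from) is an assumption that the caller must supply. The analysis set is also assumed to be pre-filtered to annotated genes, matching the definition of A.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    hi = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, hi + 1)) / comb(N, n)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg q-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def enrich(analysis_set, term_sets, background_size):
    """Rows of (term, p, q, T, A, T&A) for each term, as in the results table."""
    rows = []
    for name, term in term_sets.items():
        overlap = len(analysis_set & term)
        p = hypergeom_sf(overlap, background_size, len(term), len(analysis_set))
        rows.append((name, p, len(term), len(analysis_set), overlap))
    qs = benjamini_hochberg([r[1] for r in rows])
    return [(name, p, q, t, a, ta) for (name, p, t, a, ta), q in zip(rows, qs)]
```

The p-value is the upper tail at the observed overlap T&A. The running minimum in the Benjamini-Hochberg step keeps the q-values monotone in the p-values and caps them at 1.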

7.0 Search Refinement Page

Users may find search refinement helpful in the following situations:
-- The query is too small to be informative for SEEK's dataset weighting.
-- Users would like to continue the analysis after seeing the search results, performing the same query on a subset of datasets related to a certain disease, cell type, or tissue.
-- Users wish to see the effect of integrating only the top X datasets ranked by SEEK.

Search refinement supplements the query-based dataset weighting to provide more dataset-specific results. Users can:
1. Limit datasets by tissue, cell type, or disease.
2. Limit datasets by the rank assigned to them by SEEK.
3. Restore the default "use all datasets" option.

Limiting datasets by tissue, cell type, or disease

SEEK has pre-mapped the datasets to UMLS tissue, cell type, and disease categories based on the dataset descriptions obtained from the Gene Expression Omnibus.
1. Select one or more tissue, cell type, or disease categories.
2. Optionally, enter a keyword in the text box to filter the list. All categories are displayed if no keyword is entered. Use None to select none of the categories in the list, or All to select all of them.
3. Click Check Selection to see which datasets are selected.
4. The list of selected datasets is displayed.
5. Click Refine to begin refining the current query within the selected subset of datasets.

Limiting datasets by rank

1. The rank of the dataset, based on the dataset ranking for the current query. Users can select the top 10, the top 100, or any other number of datasets.
2. Click Refine to begin integrating the current query within the top X datasets.
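The two refinement modes amount to simple dataset filters. The Python sketch below illustrates them under stated assumptions. The dataset-to-category mapping and the accession-style IDs are hypothetical. SEEK's own mapping comes from its UMLS annotation of GEO descriptions. The input to the rank filter is assumed to be already ordered by SEEK's dataset rank.

```python
def refine_by_category(dataset_categories, selected):
    """Datasets mapped to at least one selected tissue/cell type/disease category."""
    wanted = set(selected)
    return sorted(d for d, cats in dataset_categories.items() if wanted & set(cats))


def refine_by_rank(ranked_datasets, top_x):
    """The top_x datasets from a list already ordered by SEEK's dataset rank."""
    return ranked_datasets[:top_x]


# Hypothetical dataset-to-category mapping.
mapping = {"GSE1": ["breast", "epithelial cell"], "GSE2": ["liver"], "GSE3": ["breast"]}
print(refine_by_category(mapping, ["breast"]))        # ['GSE1', 'GSE3']
print(refine_by_rank(["GSE3", "GSE1", "GSE2"], 2))   # ['GSE3', 'GSE1']
```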

8.0 Options Page

1. Dataset aggregation method. Three choices are offered: CV RBP weighted (default) [1], Order statistics [2], and Equally weighted.
2. The parameter p [3] used in dataset weighting (applies when CV RBP weighted is chosen).
3. The minimum threshold for using a dataset. Because not all datasets contain all of the query genes, this option lets users specify the minimum fraction of query genes that must be present for a dataset to be considered for integration. A dataset that does not meet this threshold is skipped for the query.
4. Distance measure. Three choices are offered. All are based on Pearson correlation but differ in their additional processing: (1) Pearson correlation; (2) Pearson correlation + Fisher transform + standardization (producing a z-score); (3) z-score + gene-connectivity correction [4], which reduces the influence of well-connected genes in the search while boosting weakly connected genes.
5. If true, display gene-centered expression values in the Expression View.

6. If specified, restrict the genes shown in the Expression View by their P-values. A P-value is computed for each gene with respect to the current query, based on the empirical probability that the gene's score in a large pool of random queries exceeds its observed co-expression score.
7. If true, each cell in the heat map of the Co-expression View shows the co-expression z-score multiplied by the dataset weight.
8. If true, and the user has previously made a dataset selection with the Refine Search function, the dataset selection is carried over to all future queries. To change the selection, the user must re-select datasets using Refine Search. There is currently no function to modify an existing selection once it has been made.

Notes

[1] CV RBP weighting (the default in SEEK) uses a novel dataset weighting algorithm:

$$w_d = \frac{1}{|Q|} \sum_{q \in Q} \sum_{i=1}^{|R_q|} (1-p)\,\mathrm{rel}(i)\,p^{\mathrm{rank}(i)}$$

where Q is the query and R_q is the ranking of genes obtained by sorting the genes in the genome by their correlation to a single query gene q. Here i is an item in the ranking R_q, and rel(i) is 1.0 if i is one of the genes in Q and 0 otherwise. rank(i) is the rank of i in R_q, and p is a tuning parameter (see note 3) set to 0.99 by default. The datasets are integrated using these weights as follows:

$$\mathrm{score}(g, Q) = \frac{\sum_{d \in D} w_d \left[ \frac{1}{|Q|} \sum_{q \in Q} z_d(g, q) \right]}{\sum_{d \in D} w_d}$$

where D is the set of datasets, Q is the query, z_d(g, q) is the z-scored correlation between g and q in dataset d, corrected for gene connectivity, and w_d is the weight of dataset d.

[2] Order statistics is the search algorithm described in the MEM paper (Adler et al., Genome Biology, 2009, 10:R39).

[3] The parameter p appears in the w_d formula (default value 0.99). It tunes the importance of highly ranked correlations: a lower value of p places more importance on the top-ranked correlations. The recommended range for p is 0.95 to 0.999. Adjusting this parameter controls how the weights are distributed across datasets.
[4] Gene-connectivity correction on the z-score:

$$z_{\mathrm{corrected}}(g, q) = z(g, q) - \frac{1}{|G|} \sum_{x \in G} z(g, x)$$

where G is the genome.
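The weighting and integration described in the notes above can be sketched in Python as a toy, in-memory illustration. It is not SEEK's implementation and rests on several assumptions:
- Each dataset is a dict mapping every gene to a dict of its z-scored correlations with every other gene, with self-pairs omitted.
- Ranks in R_q start at 0 for the top gene, following the rank-biased precision convention.
- The connectivity correction subtracts the gene's average z-score over the genome.
- The Fisher standardization uses the mean and SD of all supplied correlations.
- A gene's score is averaged only over the datasets that contain it.
- The `min_fraction` default of 0.5 is hypothetical.

```python
from math import atanh
from statistics import mean, pstdev


def fisher_standardize(r_values):
    """Distance measure (2): Fisher-transform Pearson r values, then z-score them.
    Assumes the mean/SD come from all correlations supplied and |r| < 1."""
    f = [atanh(r) for r in r_values]
    mu, sd = mean(f), pstdev(f)
    return [(v - mu) / sd for v in f]


def connectivity_correct(z):
    """Note 4 (assumed subtraction form): z(g, q) minus g's mean z over all x."""
    avg = {g: mean(row.values()) for g, row in z.items()}
    return {g: {x: v - avg[g] for x, v in row.items()} for g, row in z.items()}


def cv_rbp_weight(z, query, p=0.99):
    """Note 1: w_d for one dataset, averaged over the query genes."""
    query = set(query)
    total = 0.0
    for q in query:
        # R_q: all other genes sorted by their z-score to q, best first.
        ranking = sorted(z[q], key=z[q].get, reverse=True)
        for rank, gene in enumerate(ranking):  # rank 0 = top (assumed)
            if gene in query:  # rel(i) = 1
                total += (1 - p) * p ** rank
    return total / len(query)


def integrate(datasets, query, p=0.99, min_fraction=0.5):
    """Note 1: weighted average over datasets of each gene's mean z to the query."""
    num, den = {}, {}
    for z in datasets:
        present = [q for q in query if q in z]
        # Option 3: skip datasets that lack too many of the query genes.
        if not present or len(present) < min_fraction * len(query):
            continue
        w = cv_rbp_weight(z, present, p)
        for g, row in z.items():
            if g in query:
                continue  # only non-query genes are scored
            avg = mean(row[q] for q in present)
            num[g] = num.get(g, 0.0) + w * avg
            den[g] = den.get(g, 0.0) + w
    # Assumption: a gene is averaged only over datasets that contain it.
    return {g: num[g] / den[g] for g in num if den[g] > 0}
```

To apply distance measure (3), pass `[connectivity_correct(z) for z in datasets]` to `integrate`. The correction only shifts each row by a constant, so it leaves every R_q ranking, and hence every w_d, unchanged. Using 1-based instead of 0-based ranks would multiply every w_d by p, which leaves the integrated scores unchanged because they are ratios of weighted sums.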

9.0 Co-expression View Page

The Co-expression View displays the co-expression landscape across 50 datasets at a time.
1. The top 50 datasets. The heading for each dataset includes the dataset rank and a list of keywords describing the dataset.
2. Query cross-validations. This heat map shows how each query gene correlates with the rest of the query across the 50 datasets.
3. Color gradient scale for the query cross-validation heat map. Values indicate the contribution of query gene q to the formula for w_d.
4. The co-expression heat map. A solid-color cell denotes the co-expression score of a gene to the query in a particular dataset, and a cell with hatching lines denotes a gene missing from the dataset. With this heat map, users can visualize each dataset's contribution to the final co-expression score of each gene in the ranked list.
5. Color gradient scale for the co-expression heat map. Values indicate z-scored correlations.

Clicking a gene name on the Co-expression View Page opens the Gene Co-expression Contribution Page.
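The values behind the two heat maps can be sketched in Python. This is an illustration under assumptions, not SEEK's code. A dataset is represented as a dict mapping each gene to its z-scores with every other gene. A cross-validation cell is taken to be the per-query-gene term inside w_d, with ranks starting at 0. A co-expression cell is taken to be the gene's mean z-score to the query genes present in that dataset, optionally multiplied by the dataset weight as in option 7.

```python
def query_contributions(z, query, p=0.99):
    """Per-query-gene term inside w_d (the cross-validation heat map).

    z maps each gene to {other_gene: z-score}; w_d is the mean of the values.
    """
    contrib = {}
    for q in query:
        ranking = sorted(z[q], key=z[q].get, reverse=True)  # R_q, best first
        contrib[q] = sum((1 - p) * p ** r for r, g in enumerate(ranking) if g in query)
    return contrib


def coexpression_row(datasets, weights, gene, query, weighted=False):
    """One row of the co-expression heat map: the gene's mean z-score to the
    query in each dataset, or None where the gene is missing (hatched cell).
    With weighted=True (option 7) each cell is multiplied by the dataset weight."""
    row = []
    for z, w in zip(datasets, weights):
        present = [q for q in query if q in z and q != gene]
        if gene not in z or not present:
            row.append(None)
            continue
        cell = sum(z[gene][q] for q in present) / len(present)
        row.append(cell * w if weighted else cell)
    return row
```

A query gene with a low contribution correlates poorly with the rest of the query in that dataset. This is why such datasets tend to receive low weights.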

10.0 Gene Co-expression Contribution Page

This view presents the co-expression scores of a given gene to the query across all 50 datasets.

11.0 Export Pages

Gene list export
Tab-delimited export of the list of co-expressed genes, ranked by co-expression score for the current query.
Tip: To see where a gene of interest is ranked, search for it within the browser (using the browser's search function).

Dataset list export
Tab-delimited export of the list of datasets, ranked by query-relevance weight for the current query.
Tip: To see where a dataset of interest is ranked, search for it within the browser (using the browser's search function).
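Outside the browser, a saved export can be searched with a few lines of Python. This sketch assumes the exported rows are in ranked order. The column that holds the gene or dataset name (`name_col`) and the presence of a header row (`has_header`) are guesses about the file layout and should be adjusted to the actual export. The sample text is made up.

```python
import csv
import io


def find_rank(tsv_text, name, name_col=0, has_header=False):
    """1-based position of `name` in an exported tab-delimited ranked list.

    Assumes rows are in ranked order; name_col and has_header are guesses
    about the file layout. Matching ignores case and surrounding spaces.
    """
    rows = [r for r in csv.reader(io.StringIO(tsv_text), delimiter="\t") if r]
    if has_header:
        rows = rows[1:]
    target = name.strip().upper()
    for position, row in enumerate(rows, start=1):
        if len(row) > name_col and row[name_col].strip().upper() == target:
            return position
    return None


export = "Gene\tScore\nSHH\t2.1\nPTCH1\t1.8\nGLI1\t1.5\n"  # made-up example file
print(find_rank(export, "gli1", has_header=True))  # 3
```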