Exercises. Biological Data Analysis Using InterMine workshop exercises with answers

Exercise 1: Faceted Search
Use HumanMine for this exercise.
1. Search for one or more of the following using the keyword search (only the result for PPARG is shown below): PPARG, rs10509540, *insulin*
Human PPARG is the first gene result returned when HumanMine is searched for PPARG. Click on this gene to be taken to the report page.

Exercise 1: Faceted Search
2. Filter and create a list:
   a. Search for *diabetes*.
   b. Filter for publications.
   c. Make a list of the publications: use the checkbox in the header to select all the publications, then create the list.
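
The keyword search and the category filter used above can also be expressed as a web-service URL. The following is a minimal sketch, not part of the workshop material: the base URL, the `search` endpoint path and the `facet_Category` parameter name are assumptions that should be checked against the web-service documentation of the mine you are using. The code only builds the URL and sends no request.

```python
from urllib.parse import urlencode

# Assumed base URL of the HumanMine web services; verify it on the mine's API page.
HUMANMINE_SERVICE = "https://www.humanmine.org/humanmine/service"


def build_search_url(term, category=None, base=HUMANMINE_SERVICE):
    """Return a keyword-search URL, optionally restricted to one result category."""
    params = {"q": term}
    if category is not None:
        # Assumed facet parameter name; it mirrors the Category filter on the results page.
        params["facet_Category"] = category
    return f"{base}/search?{urlencode(params)}"


# Step 1: plain keyword search. Step 2: wildcard search filtered to publications.
print(build_search_url("PPARG"))
print(build_search_url("*diabetes*", category="Publication"))
```

Building the URL separately from sending it makes it easy to inspect the query string before anything is sent to the server.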

Exercise 2: Exploring a Gene
Use HumanMine for this exercise.
You are interested in the human PPARG gene and want to know the following things about it:
1. On which chromosome is PPARG located?
2. Can I access the sequence for the PPARG gene?
3. With which diseases is PPARG associated?
4. In which tissues is PPARG most highly expressed?
5. Does the PPARG protein have any known isoforms?
6. Is there a PPARG orthologue in D. melanogaster?
7. Does this orthologue interact with any other genes/proteins? Identify the interaction type (genetic/physical).
8. For the interaction with CG3040, what was the original experiment and publication that determined this interaction?

Exercise 2: Exploring a Gene
1. On which chromosome is PPARG located?
2. Can I access the sequence for the PPARG gene?
The first section on the gene report page provides the chromosome location of the gene, along with identifiers, synonyms and a link to the FASTA DNA sequence.

Exercise 2: Exploring a Gene
3. With which diseases is PPARG associated?
In HumanMine a summary of data is provided at the top of the report page (note that this feature is not available in all InterMines). The Diseases link leads to a table of OMIM data. Further disease information is sometimes available from the curated comments from UniProt:

Exercise 2: Exploring a Gene
4. In which tissues is PPARG most highly expressed?
Data on tissue expression can be found from three sources:
A. Human gene expression atlas of 5372 samples representing 369 different cell and tissue types, disease states and cell lines, from http://www.ebi.ac.uk/arrayexpress/experiments/e-mtab-62/
B. Human Protein Atlas: http://www.proteinatlas.org/
C. Curated comments from UniProt:

Exercise 2: Exploring a Gene
5. Does the PPARG protein have any known isoforms?
a. Navigate to the proteins table using the quick links.
b. Select the PPARG_HUMAN protein (the canonical UniProt-annotated protein) to be taken to the protein report page. Note: we can also see from this table that PPARG has two isoforms, PPARG_HUMAN2 and PPARG_HUMAN3.
c. Navigate to the Isoforms table on the protein report page. Note that this table links to a report page for each of the isoforms.

Exercise 2: Exploring a Gene
6. Is there a PPARG orthologue in D. melanogaster?
Use the Links to other Mines to navigate to the D. melanogaster orthologue in FlyMine. Note that there are three orthologous fly genes. For this exercise select the first.
7. Does this orthologue interact with any other proteins? Identify the interaction type (genetic/physical).
Use the Interactions quick link to navigate to the protein and genetic interaction data. Eip75B has genetic interactions with the genes Nos, Ph-p, Smn, S6K, Kr, Rheb and Hr51, and physical interactions with CG3040, COX5B and Hr51.

Exercise 2: Exploring a Gene
8. For the interaction with CG3040, what was the original experiment and publication that determined this interaction?
a. Select Show in table format.
b. Select the interaction name for the first entry.
c. Select the experiment name.
d. Information about the publication is on the experiment page.

Exercise 3: List Upload
Use FlyMine for this exercise.
1. Navigate to the Lists tab and the Upload sub-tab.
2. Select the example list (leave type and organism as the default values).
3. Click Create list.
4. Examine and understand the list page, then name and save your list.
E2f has matched two genes (duplicates). In this case you need to decide which of the two genes you want in your list (or both); the action column allows you to do this.
Two of the identifiers in the list matched the same gene: FBgn0010433 and ato. This is indicated in the direct hits.

Exercise 3: List Upload
One of the identifiers is a protein identifier (TWIST_DROME). As the associated gene could be identified, this gene has been added to the list. This is shown under non-gene identifiers.
Two of the identifiers matched a synonym (rather than a current identifier). As each synonym matched only one gene, these genes are automatically added to the list.
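
The categories reported on the list upload page (direct hits, duplicates, synonyms, non-gene identifiers) can be pictured as a simple classification step. The sketch below is only an illustration of that logic under stated assumptions: the lookup tables are hypothetical stand-ins for the mine's identifier index, the names `E2f-match-1`, `E2f-match-2`, `old-name-1` and `geneX` are placeholders, and the function names are invented for this example.

```python
# Hypothetical lookup tables standing in for the mine's identifier index.
CURRENT_IDS = {
    "FBgn0010433": ["ato"],                  # primary identifier of ato
    "ato": ["ato"],                          # symbol of the same gene
    "E2f": ["E2f-match-1", "E2f-match-2"],   # placeholder names for two matches
}
SYNONYMS = {"old-name-1": ["geneX"]}         # placeholder synonym matching one gene
PROTEIN_TO_GENE = {"TWIST_DROME": "twi"}     # protein identifier mapped to its gene


def resolve(identifiers):
    """Sort each identifier into the category the upload page would report."""
    report = {"direct": {}, "duplicate": {}, "synonym": {}, "non_gene": {}, "unresolved": []}
    for ident in identifiers:
        if ident in CURRENT_IDS:
            genes = CURRENT_IDS[ident]
            bucket = "direct" if len(genes) == 1 else "duplicate"
            report[bucket][ident] = genes
        elif ident in PROTEIN_TO_GENE:
            report["non_gene"][ident] = [PROTEIN_TO_GENE[ident]]
        elif ident in SYNONYMS:
            genes = SYNONYMS[ident]
            bucket = "synonym" if len(genes) == 1 else "duplicate"
            report[bucket][ident] = genes
        else:
            report["unresolved"].append(ident)
    return report


def auto_added(report):
    """Genes added without user action; duplicates wait for a manual choice."""
    genes = set()
    for bucket in ("direct", "synonym", "non_gene"):
        for matched in report[bucket].values():
            genes.update(matched)
    return genes


report = resolve(["FBgn0010433", "ato", "E2f", "TWIST_DROME", "old-name-1"])
print(sorted(auto_added(report)))
print(report["duplicate"])
```

Because FBgn0010433 and ato resolve to the same gene, the resulting set contains that gene only once, which matches the behaviour described for the direct hits.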

Exercise 4: List Analysis Pages
Use FlyMine for this exercise.
Examine the FlyMine public list PL_FlyTF_putativeTFs.
1. In which tissues are these genes most highly expressed?
2. What is the most enriched GO term for this list?
3. How many genes in the list are annotated with this GO term? Note: you could make a sub-list containing only the genes from this list annotated with this term by clicking on the matches number.
4. Navigate to the MouseMine database to examine the mouse orthologues for this list.
5. How many mouse orthologues are there for this list?
6. Are these mouse genes enriched for any phenotypes (Mammalian Phenotype Ontology)?

Exercise 4: List Analysis Pages
1. In which tissues are these genes most highly expressed?
445 genes are up-regulated in the larval CNS.
2. What is the most enriched GO term for this list?
3. How many genes in the list are annotated with this GO term?
476 genes in the list are annotated with the GO term "transcription, DNA-templated". This is the most enriched GO term. You can click on this number to create a sub-list of just these 476 genes.

Exercise 4: List Analysis Pages
4. Navigate to the MouseMine database to examine the mouse orthologues for this list.
5. How many mouse orthologues are there for this list?
6. Are these mouse genes enriched for any phenotypes?
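
The enrichment questions above (GO terms, phenotypes) ask whether a term is annotated to more genes in the list than chance would predict. A common way to score this is a one-sided hypergeometric test. The sketch below assumes that model and leaves out multiple-testing correction, so it is an illustration of the idea rather than a reproduction of the p-values shown on the list analysis page. The function name and the numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated_in_population, sample, annotated_in_sample):
    """Probability of at least `annotated_in_sample` annotated genes in a random sample.

    population: number of genes in the background set
    annotated_in_population: background genes carrying the term
    sample: number of genes in the list
    annotated_in_sample: list genes carrying the term
    """
    upper = min(sample, annotated_in_population)
    total = comb(population, sample)
    favourable = sum(
        comb(annotated_in_population, k)
        * comb(population - annotated_in_population, sample - k)
        for k in range(annotated_in_sample, upper + 1)
    )
    return favourable / total


# Toy numbers: 10 background genes, 4 carry the term, the list holds 3 genes, 2 carry the term.
print(enrichment_p_value(10, 4, 3, 2))
```

A small p-value means the term is over-represented in the list relative to the background. When many terms are tested at once, as on an enrichment page, a correction for multiple testing is normally applied on top of this.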

Exercise 5: Template searches
1. Browse the template searches in FlyMine and HumanMine. Try running a few or changing the filters.
2. Use the search box to find template searches for interactions.
3. Filter the FlyMine template searches to show only expression templates.

Exercise 6: Using template searches
In our exploration of the PPARG gene we could see that it is up-regulated in adipose tissue. Use the template searches in HumanMine to answer the following questions:
1. What other genes are up-regulated in adipose tissue according to the ArrayExpress dataset? Don't forget to add a filter for UP expression. Save a list of these genes.
2. Do any of the genes identified in question 1 interact with PPARG? Save a list of these genes.

Exercise 6: Using template searches
Use HumanMine for this exercise.
1. What other genes are up-regulated in adipose tissue according to the ArrayExpress dataset? Don't forget to add a filter for UP expression. Save a list of these genes.
Find the following template and set Condition = Adipose tissue.
Filter the column Atlas Expression.Expression for UP.
Save as List: Atlas Expression > Gene. You should have a list of 1,654 genes.

Exercise 6: Using template searches
2. Do any of the genes identified in question 1 interact with PPARG? Save a list of these genes.
Use the following template: set the first condition to PPARG and the second condition to your saved list.
Save the genes from Gene > Interaction > Participant 2 (24 genes).
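
The two template searches in this exercise amount to two filters applied one after the other: first keep the genes with UP expression in adipose tissue, then keep the PPARG interactors that are in that saved list. The sketch below illustrates this on a few invented rows. The gene names `GENE_A` to `GENE_D`, the row layout and the function names are hypothetical and do not come from HumanMine.

```python
# Hypothetical expression rows: (gene, condition, expression call).
EXPRESSION_ROWS = [
    ("GENE_A", "adipose tissue", "UP"),
    ("GENE_B", "adipose tissue", "DOWN"),
    ("GENE_C", "adipose tissue", "UP"),
    ("GENE_A", "liver", "UP"),
    ("GENE_D", "liver", "UP"),
]

# Hypothetical interaction rows: (participant 1, participant 2).
INTERACTION_ROWS = [
    ("PPARG", "GENE_A"),
    ("PPARG", "GENE_D"),
    ("GENE_C", "GENE_B"),
]


def genes_with_expression(rows, condition, expression):
    """Template 1: genes with the given expression call in the given condition."""
    return {gene for gene, cond, call in rows if cond == condition and call == expression}


def interactors_in_list(rows, gene, gene_list):
    """Template 2: participant-2 genes that interact with `gene` and are in `gene_list`."""
    return {p2 for p1, p2 in rows if p1 == gene and p2 in gene_list}


up_in_adipose = genes_with_expression(EXPRESSION_ROWS, "adipose tissue", "UP")
pparg_partners = interactors_in_list(INTERACTION_ROWS, "PPARG", up_in_adipose)
print(sorted(up_in_adipose))
print(sorted(pparg_partners))
```

Leaving out the UP filter in the first step would let GENE_B into the list, which is why the exercise reminds you to add it.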

Exercise 7: Query Builder
Using HumanMine, we will build a query to show human genes and OMIM diseases, and then add a further constraint to show genes associated with all types of diabetes.
1. Start your query from Gene.
2. Constrain Organism to Homo sapiens.
3. Add the columns of data we want in our results: Gene: Primary identifier and Symbol; Disease: name.
4. Run this search (Show results).
5. Return to the query (use the Trail in the top left) and add a constraint to Disease name for CONTAINS Diabetes.
6. Run the search and save the set of genes.

Exercise 7: Query Builder
Use HumanMine for this exercise.
1. Start your query from Gene.
2. Constrain Organism.name to Homo sapiens.

Exercise 7: Query Builder
3. Add the columns of data we want in our results: Gene: Primary identifier and Symbol; Disease: name.
4. Run this search (Show results).

Exercise 7: Query Builder
5. Return to the query (use the Trail in the top left) and add a constraint to Disease name for CONTAINS Diabetes. NOTE: Make sure you change the constraint from = to CONTAINS.
6. Run the search and save the set of genes (66 genes).
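
The query assembled in steps 1 to 6 can also be written down as query XML, which is how a query built in the Query Builder can be exported and re-imported. The sketch below builds such a document with the Python standard library. The path names (`Gene.diseases.name`, `Gene.organism.name`) and the model name `genomic` are assumptions based on the column names used in this exercise, so compare them with the XML that the Query Builder itself exports before relying on them.

```python
import xml.etree.ElementTree as ET


def build_diabetes_query(disease_term="Diabetes"):
    """Return query XML for human genes whose disease name contains `disease_term`."""
    query = ET.Element(
        "query",
        {
            "model": "genomic",  # assumed model name
            # Output columns (step 3): identifier, symbol and disease name.
            "view": "Gene.primaryIdentifier Gene.symbol Gene.diseases.name",
        },
    )
    # Step 2: restrict the organism with an exact match.
    ET.SubElement(
        query,
        "constraint",
        {"path": "Gene.organism.name", "op": "=", "value": "Homo sapiens"},
    )
    # Step 5: substring match on the disease name (CONTAINS, not =).
    ET.SubElement(
        query,
        "constraint",
        {"path": "Gene.diseases.name", "op": "CONTAINS", "value": disease_term},
    )
    return ET.tostring(query, encoding="unicode")


print(build_diabetes_query())
```

The only difference between the unconstrained query of step 4 and the final query of step 6 is the second constraint element, which is why switching the operator from = to CONTAINS matters: an exact match on "Diabetes" would miss disease names that merely include the word.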

Exercise 8: Analysis Workflows
Use HumanMine for this exercise.
1. Identify the sets of genes you have created under the Lists view tab.
2. Use the list set operations available on this page to intersect the list of diabetes genes you created with the Query Builder with the set of genes you created in Exercise 6 (genes up-regulated in adipose tissue that interact with PPARG).
Only 1 gene is found that is up-regulated in adipose tissue, interacts with PPARG and is associated with diabetes. You can find out more about this gene by clicking through to its report page.

Exercise 8: Analysis Workflows
3. We now want to know if this gene has been identified in GWAS studies. Run a template on your intersected list/gene to find this out.
4. Use the column summary to find out whether any of the GWAS phenotypes are related to diabetes.
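
The list intersection in step 2 behaves like an ordinary set intersection. A minimal sketch with invented gene names follows; the contents of both lists and the function name are hypothetical, and the real lists come from Exercises 6 and 7.

```python
def intersect_lists(*gene_lists):
    """Return the genes present in every one of the given lists."""
    if not gene_lists:
        return set()
    shared = set(gene_lists[0])
    for genes in gene_lists[1:]:
        shared &= set(genes)
    return shared


# Hypothetical list contents, for illustration only.
diabetes_genes = ["GENE_A", "GENE_E", "GENE_F"]
adipose_pparg_interactors = ["GENE_A", "GENE_C"]

print(intersect_lists(diabetes_genes, adipose_pparg_interactors))
```

The other set operations on the Lists page (for example union or subtraction) map onto the corresponding Python set operators in the same way.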

Exercise 9: Region Search
Use FlyMine for this exercise.
1. Select the example set of regions.
2. De-select the features and re-select Genes and Regulatory regions.
3. Extend the search by 5 kb.
4. Run the search.

Exercise 9: Region Search
Examine the results and:
5. Create a list of all genes found.
6. Create a list of the regulatory regions found in the first genomic span.
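
Extending a region search by 5 kb means widening each region by 5,000 bases on both sides before looking for overlapping features. The sketch below shows that arithmetic. It assumes regions written as `chromosome:start..end`, and the coordinates, the feature and the function names are hypothetical examples rather than values taken from FlyMine.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chromosome: str
    start: int
    end: int


# Assumed input format: chromosome:start..end
REGION_PATTERN = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)\.\.(?P<end>\d+)$")


def parse_region(text):
    """Parse 'chromosome:start..end' into a Region with start <= end."""
    match = REGION_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised region: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        start, end = end, start
    return Region(match["chrom"], start, end)


def extend(region, flank=5000):
    """Widen the region by `flank` bases on both sides, never going below position 1."""
    return Region(region.chromosome, max(1, region.start - flank), region.end + flank)


def overlaps(region, feature):
    """True if the two regions share at least one base on the same chromosome."""
    return (
        region.chromosome == feature.chromosome
        and region.start <= feature.end
        and feature.start <= region.end
    )


searched = parse_region("2L:14615455..14619002")   # hypothetical search region
feature = Region("2L", 14620000, 14621000)         # hypothetical feature nearby

print(overlaps(searched, feature))          # feature lies just outside the region
print(overlaps(extend(searched), feature))  # feature falls inside the 5 kb flank
```

A feature that sits just beyond the end of a region is missed by the plain search and picked up once the region is extended, which is the reason for extending the search in step 3.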