Down with Species-Specific Database Projects, Up with Data Services


Lincoln D. Stein, Cold Spring Harbor Laboratory

This whitepaper begins with an illustration drawn from a database that has nothing to do with the plant kingdom. Figure 1 shows a Cell page from WormBase, a web site devoted to the genome and biology of C. elegans. The information on this page includes the cell's role in the organism's life cycle, its lineage, its fate, a diagram of its anatomy, the genes known to be turned on in this cell, and links to citations that refer to the cell.

Figure 1: The Standard HTML Display of a C. elegans Cell

Figure 2 shows what is displayed when the user clicks on the link labeled "XML Display" at the top of the first page. This displays the raw database record in XML (eXtensible Markup Language) form. It is the same information that was displayed on the HTML page, but freed of extraneous formatting and easily parsed by standard software.

Figure 2: The XML Representation of a C. elegans Cell

Every single data object in WormBase, whether it be a cell, a gene, a sequence, a protein, a genetic map, a mutant, or an allele, is available in XML form. To fetch the data, there is a simple published URL in which N and C are the name and class of the object to fetch in XML form. For the page shown earlier, the object name is ADAR, and the class is Cell. The schema for a class can be fetched using a variant of this URL, in which N is the name of the class, for example Cell.
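To make the name-and-class addressing concrete, here is a minimal Python client sketch. Because the actual WormBase URL is not reproduced here, the base URL, the `/schema` path, and the query-parameter names are hypothetical placeholders, and the sample record is made up with placeholder values. Only the idea of addressing an object by (name, class) and getting back parseable XML comes from the text.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; substitute the database's real published URL.
BASE_URL = "https://example.org/db/xml"

def object_url(name: str, cls: str) -> str:
    """Request URL for one object, addressed by its name (N) and class (C)."""
    return f"{BASE_URL}?{urlencode({'name': name, 'class': cls})}"

def schema_url(cls: str) -> str:
    """Request URL for the schema of a class (the 'variant' URL)."""
    return f"{BASE_URL}/schema?{urlencode({'class': cls})}"

def summarize(xml_text: str) -> dict:
    """Parse a returned record into its class, name and top-level fields."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in root}
    return {"class": root.tag, "name": root.get("name"), "fields": fields}

def fetch_object(name: str, cls: str, timeout: float = 30.0) -> dict:
    """Fetch and parse one object over HTTP (not called below)."""
    with urlopen(object_url(name, cls), timeout=timeout) as resp:
        return summarize(resp.read().decode("utf-8"))

# Made-up record standing in for a real response; the values are placeholders.
sample = '<Cell name="ADAR"><Lineage>placeholder lineage</Lineage><Fate>placeholder fate</Fate></Cell>'

print(object_url("ADAR", "Cell"))  # https://example.org/db/xml?name=ADAR&class=Cell
print(summarize(sample)["fields"]["Fate"])  # placeholder fate
```

Because the client only depends on the (name, class) convention and on generic XML parsing, the same code works for any object class the database exposes.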

At first this service might seem extraneous. After all, the primary target audience for WormBase, the bench biologist, has no need for XML. What he wants is a browsable HTML page or a downloadable Excel spreadsheet. In fact, the XML service is provided for the benefit of computer-savvy biologists and bioinformatics professionals, both those directly involved with WormBase and those outside the project.

For WormBase insiders, the XML interface allows us to decouple our HTML pages from the underlying database. The core WormBase sequence annotation viewer is driven entirely off XML, and can be moved from data source to data source without modification.

More importantly, the XML service is one of the ways that we make WormBase "sticky" for other bioinformatics efforts. Should a bioinformaticist need to download a genetic map, a cell lineage or a set of sequence annotations from WormBase, he need only generate a request for the URL described above, and the database will obligingly yield up all it knows in a predictable, easily parsed format. Other ways that we have endeavoured to make WormBase sticky are:

1) a universal incoming link format, which allows an external web page to link to WormBase using the database object's name and class;
2) ad hoc queries on the database using an HTTP request;
3) direct ad hoc queries on the database via a command-line Perl API;
4) flatfile dumps of sequence annotations using HTTP requests;
5) structured dumps of sequence annotations using the Distributed Annotation System (DAS) protocol;
6) FTPable tab-delimited tables containing frequently requested extracts of the database;
7) the entire database (in ACEDB format), which can be downloaded and installed;
8) all the software for WormBase, which is open source and available on the WormBase FTP site. There is no restriction on who uses the software or for what purpose.

Why Biological Databases Should be Sticky

Stickiness means having hooks for third parties to connect to.
It is a hallmark of good biological web sites. NCBI's Entrez is sticky by virtue of having a published URL for incoming links and the LinkOut system for outgoing connections. The UCSC genome browser provides flat file dumps and a call interface for dumping out selected annotation tracks. Ensembl provides a Perl command-line interface for accessing its resources at a high level of abstraction, and a SQL interface for issuing low-level queries.

Why is this important? Because the era in which biological databases could stand alone has passed, if indeed it ever existed. The first factor is the breakdown of the one-species/one-database model. We are moving away from a world of single-species databases towards one in which biological databases specialize in a particular type of information across a wide variety of species. Prominent examples include TIGR's TOGA groups, which cross species boundaries, InterPro's protein families, and the Gene Ontology Consortium's process terms.

Another factor is the increasing sophistication of users. As a new generation of computer-savvy biologists appears, biological databases are called upon to serve researchers who want to do more than browse web pages. They want to extract the information, transform it, compare it to their own data, and integrate it into data services that they have built themselves.

Sticky databases provide the hooks needed to integrate data. They provide the stable interfaces needed to link data sources together, to extract and transform information, and possibly even to submit new data. They get used in creative ways that their inventors did not envision, enriching the community and enlivening the scientific discourse. Non-sticky databases are pretty web sites: very useful in their own right, but with no potential to grow beyond the immediate goals of their designers.

Biological Databases as Service Providers

The ultimate sticky database is one that is organized around the idea of a set of data services. Instead of offering a few hooks to the community, a biological data service is nothing but hooks. I'll give a concrete example of what I mean.
Imagine a database that contains a number of genetic maps, each of which is composed of a set of markers. A genetic map service would have a published interface that accepts requests for genetic maps and returns lists of markers and their positions. Other interfaces would allow the user to retrieve the list of available maps, and to select genetic maps by map type and species.

Now imagine a marker information service that contains molecular information for genetic markers: the primer pairs, assay conditions, and so forth. It would respond to requests for markers by returning the associated information, and provide a query interface for selecting markers by their type, polymorphism rate, and so forth.

Using a combination of these two services, a programmer could still put together the classic browsable genetic map interface, which draws genetic maps and responds to clicks on markers by returning the corresponding molecular information. By breaking the information into discrete services with a published interface, the data provider has opened this information to the community for diverse purposes such as comparative map analysis.

There are other benefits. The service model makes it possible for the two databases to be physically separate, and possibly under different administrative control. Data visualization and query tools can now be written to a stable interface, providing modularity. This modularity, in turn, encourages code reuse and sharing, and allows one user interface to run on top of many data sources.

The DAS Experience

The prototype for this type of biological information service is DAS, the Distributed Annotation System. DAS is an experimental client/server protocol designed by Sean Eddy and myself which allows biological databases to become service providers for sequence annotation information. At the core of the protocol, an information consumer asks the server for all or a subset of its annotations in a particular region of the genome, and the server responds by returning a list of its annotations. It is then up to the client to store, analyze or display the information.

The protocol deliberately limits the information that the data source can transmit to the bare essentials of a sequence annotation: a reference point on the genome, the sequence range that the annotation covers, and a brief description of the annotation type. For further information on how the annotation was made and its significance, the client is referred to a URL provided by the data source.
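The request/response cycle described above can be sketched in Python as an in-memory server that answers region queries with minimal annotation records. This is an illustration of the idea, not the DAS wire format: the class and method names, the 1-based inclusive coordinates, and the sample records and URLs are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

@dataclass(frozen=True)
class Annotation:
    """The bare essentials of a sequence annotation (field names are hypothetical)."""
    ref: str    # reference point on the genome, e.g. a chromosome or contig name
    start: int  # first base covered (1-based, inclusive: an assumption)
    end: int    # last base covered (inclusive)
    kind: str   # brief description of the annotation type
    link: str   # URL where the client can learn how the annotation was made

class AnnotationServer:
    """Toy data source that answers 'what do you have in this region?' requests."""

    def __init__(self, annotations: Iterable[Annotation]) -> None:
        self._annotations = list(annotations)

    def features(self, ref: str, start: int, end: int,
                 kinds: Optional[Set[str]] = None) -> List[Annotation]:
        """Return annotations on `ref` that overlap [start, end], optionally filtered by type."""
        hits = [
            a for a in self._annotations
            if a.ref == ref and a.start <= end and a.end >= start
            and (kinds is None or a.kind in kinds)
        ]
        # Sorting is a convenience; storing, analyzing or displaying is left to the client.
        return sorted(hits, key=lambda a: (a.start, a.end))

# Made-up records; the URLs are placeholders.
server = AnnotationServer([
    Annotation("chrI", 100, 500, "gene", "https://example.org/ann/1"),
    Annotation("chrI", 450, 900, "exon", "https://example.org/ann/2"),
    Annotation("chrII", 1, 50, "gene", "https://example.org/ann/3"),
])

for a in server.features("chrI", 400, 460):
    print(a.kind, a.start, a.end, a.link)
```

Keeping the record this small is what makes the service easy to provide: anything richer lives behind the `link` URL, under the data source's control.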
The DAS protocol allows the same visualization and analysis tool to run on top of any database that provides a DAS service. Data providers can retrofit their databases to provide the DAS service by creating a relatively thin DAS compatibility layer. So far, the DAS experiment seems successful. In recent months it has proven very popular among the model organism databases, and it is now used by EBI Ensembl, the human genome browser at UCSC, TIGR, WormBase, the University of Cambridge, the Berkeley Drosophila Sequencing Project, and others. There is considerable enthusiasm among the developer community, and an ever-increasing number of data providers have indicated their intent to build or install DAS servers.
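A "thin compatibility layer" can be pictured as an adapter: it translates requests on a shared interface into queries against a provider's existing schema, so that a tool written once against the interface works on any compliant source. The sketch below assumes a hypothetical legacy table `gene_models` in an in-memory SQLite database, a hypothetical `features` method as the shared interface, and a placeholder link URL. It illustrates the adapter idea and is not a DAS implementation.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Protocol

@dataclass(frozen=True)
class Annotation:
    ref: str
    start: int
    end: int
    kind: str
    link: str

class AnnotationSource(Protocol):
    """Shared interface every provider exposes (the method name is hypothetical)."""
    def features(self, ref: str, start: int, end: int) -> List[Annotation]: ...

class SqliteGeneAdapter:
    """Thin layer mapping a provider's own table onto the shared interface."""
    # Hypothetical schema: gene_models(chrom, begin_pos, end_pos, biotype, gene_id)
    LINK_TEMPLATE = "https://example.org/gene/{}"  # placeholder report-page URL

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def features(self, ref: str, start: int, end: int) -> List[Annotation]:
        rows = self.conn.execute(
            "SELECT chrom, begin_pos, end_pos, biotype, gene_id FROM gene_models "
            "WHERE chrom = ? AND begin_pos <= ? AND end_pos >= ? ORDER BY begin_pos",
            (ref, end, start),
        )
        return [Annotation(c, b, e, t, self.LINK_TEMPLATE.format(g)) for c, b, e, t, g in rows]

def track_summary(source: AnnotationSource, ref: str, start: int, end: int) -> List[str]:
    """A 'viewer' written once against the interface, usable on any provider."""
    return [f"{a.kind}:{a.start}-{a.end}" for a in source.features(ref, start, end)]

# A tiny in-memory stand-in for a provider's existing database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene_models (chrom TEXT, begin_pos INTEGER, "
             "end_pos INTEGER, biotype TEXT, gene_id TEXT)")
conn.executemany("INSERT INTO gene_models VALUES (?, ?, ?, ?, ?)", [
    ("chrI", 100, 500, "gene", "g1"),
    ("chrI", 800, 1200, "gene", "g2"),
])
adapter = SqliteGeneAdapter(conn)
print(track_summary(adapter, "chrI", 400, 900))  # ['gene:100-500', 'gene:800-1200']
```

Supporting a second provider would mean writing another small adapter with the same `features` method; `track_summary` itself would not change.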

Modularizing Biological Databases

In addition to the genetic mapping and sequence annotation services described earlier, I see many opportunities for reorganizing biological databases as sets of discrete services. Here are a few ideas for standard services:

- A comparative genetic mapping service, which, given coordinates on one map, translates those coordinates to another map. This could be used to compare different genetic maps in the same species as well as maps in different, but substantially syntenic, species.
- Along the same lines, a genome assembly translation service, which translates coordinates from one version of a sequence assembly to another.
- A gene ontology service, which, given a protein identifier, returns the gene ontology assignments for that protein.
- A protein family service, which, given a protein identifier, returns its domains, families and superfamilies according to one or more protein classification systems.
- A mutant strain service, which, given a phenotype and a species, returns all strains that express that phenotype (this presupposes a phenotype ontology, such as several groups are developing).
- A sequence similarity service, which runs searches for nearly identical sequences using one of the new fast algorithms (e.g. SSAHA or BLAT).

I envision these services being implemented on top of an industry-standard communications protocol. I lean towards SOAP/XML because of the predominant industry trend in this direction, but other protocols (e.g. CORBA) should be taken into consideration.

WormBase as an Anachronism

Let's return to WormBase for a moment. Although I think WormBase has done well in serving the needs of the C. elegans community, its original goal of being the authoritative one-stop shop for all C. elegans information is increasingly unrealistic. We do very well at presenting the C. elegans genome, genetic map and mutants, less well with the proteome and cellular anatomy, and very poorly when it comes to microarray data or transposon-mediated knockouts. The fact is that our expertise is strong in some areas but weak in others, and that the community's ability to develop new types of information outstrips the WormBase curators' ability to classify and incorporate it.

In recognition of this, WormBase has made alliances with other data providers so that we can draw on each other's strengths. The oldest of our alliances is with WormPD, the database of the C. elegans proteome provided by Proteome, Inc. (now a division of Incyte). We have agreed upon a common nomenclature for the proteins and developed a simple calling scheme that allows WormBase to link to Proteome for protein information, and Proteome to link back to WormBase for genetic mapping and genomic information. More recently, we have made similar arrangements with SwissProt for Gene Ontology and protein family information, and with EuGenes for orthologue clustering. We are using DAS to exchange information with TIGR, and are currently working on data exchange protocols with the C. elegans Microarray project at Stanford, the ORFeome project at Dana-Farber, and the Transcriptome project at NCBI. We realize that becoming a component in a network of data service providers allows us to play to our strengths. Ultimately both WormBase and the community benefit.

Challenges and Opportunities

Transitioning from a species-oriented to a service-oriented mission presents both challenges and opportunities for providers of biological data. As described earlier, this transition would allow a data source that has garnered expertise in, say, the storage and analysis of microarray data from Arabidopsis to establish itself as a provider of microarray data services for a large number of plant species. The shift in emphasis would also benefit newcomers, who could focus on setting up a discrete service rather than making the much more challenging leap to become a complete source of species-specific information.

The major challenge is integration and standardization.
It is very good for a data source to provide external hooks into its database, but the real benefits only kick in when several data sources settle on a standard interface. This allows the same software tools to be used across all data sources and encourages new sources to create compliant interfaces, a phenomenon known as the "network effect." However, it is notoriously difficult to develop standards among biological databases. There are several reasons for this. One is simply that standardization is hard: there are many potential technical approaches, and reasonable people will reasonably disagree. Yet, as the Internet sector has shown, it is possible to overcome these technical barriers by adopting well-tested standardization practices. I happen to favor the IETF (Internet Engineering Task Force) model, in which proposals for standardization are accompanied by reference implementations that can then be tested head-to-head, but other types of standardization processes work as well.

What the NSF Can Do

Service standardization won't occur unless there is a strong incentive for it, and to date the funding practices of NSF and other agencies have discouraged evolution in this direction. By focusing efforts on species-specific databases, and by insisting that these projects become self-sufficient (i.e. profitable) after the initial development is finished, funding agencies encourage data providers to build proprietary, non-portable systems. The next time a database is needed, groups must start again from scratch. This is a wasteful and inefficient practice.

A more parsimonious approach would be to take the long view and provide funding directly for the development of biological data service infrastructure. The deliverables for such projects would be portable, general-purpose software and standards that are made freely available to the academic community as well as industry. The projects should not be tied to a particular species, and should not be piggybacked on top of a database delivery project; when a data release deadline looms, the need to push the data out the door always wins out over the portability of the underlying software.

The flip side of funding for infrastructure development is funding for infrastructure operations. I feel strongly that biological data services are no different from stock centers in their need for stable, reliable, long-term funding. In many fields, data services have become indispensable to the practice of biological research. Let's figure out a way to ensure the ongoing growth and availability of this infrastructure.

Finally, it is important for the funding agencies to coordinate their efforts.
The NHGRI and NIGMS have recently announced their intention to jointly fund the development of a "model organism database toolkit." The USDA ARS has called together a working group to create a common set of services for agricultural sequencing projects. The DOE has organized workshops to develop standard XML formats for exchanging biological data. Working together with a consistent vision of the goal, the funding agencies can transform the bioinformatics landscape from a small number of insular database projects to a large number of open, interoperable data services, together forming the fabric of a new biological data infrastructure.


ISO INTERNATIONAL STANDARD. Health informatics Genomic Sequence Variation Markup Language (GSVML) INTERNATIONAL STANDARD ISO 25720 First edition 2009-08-15 Health informatics Genomic Sequence Variation Markup Language (GSVML) Informatique de santé Langage de balisage de la variation de séquence génomique

More information

Facilitating Semantic Alignment of EBI Resources

Facilitating Semantic Alignment of EBI Resources Facilitating Semantic Alignment of EBI Resources 17 th March, 2017 Tony Burdett Technical Co-ordinator Samples, Phenotypes and Ontologies Team www.ebi.ac.uk What is EMBL-EBI? Europe s home for biological

More information

Creating and Using Genome Assemblies Tutorial

Creating and Using Genome Assemblies Tutorial Creating and Using Genome Assemblies Tutorial Release 8.1 Golden Helix, Inc. March 18, 2014 Contents 1. Create a Genome Assembly for Danio rerio 2 2. Building Annotation Sources 5 A. Creating a Reference

More information

Data Curation Profile Plant Genetics / Corn Breeding

Data Curation Profile Plant Genetics / Corn Breeding Profile Author Author s Institution Contact Researcher(s) Interviewed Researcher s Institution Katherine Chiang Cornell University Library ksc3@cornell.edu Withheld Cornell University Date of Creation

More information

Min Wang. April, 2003

Min Wang. April, 2003 Development of a co-regulated gene expression analysis tool (CREAT) By Min Wang April, 2003 Project Documentation Description of CREAT CREAT (coordinated regulatory element analysis tool) are developed

More information

EBI is an Outstation of the European Molecular Biology Laboratory.

EBI is an Outstation of the European Molecular Biology Laboratory. EBI is an Outstation of the European Molecular Biology Laboratory. InterPro is a database that groups predictive protein signatures together 11 member databases single searchable resource provides functional

More information

Using The Arabidopsis Information Resource (TAIR) to Find Information About Arabidopsis Genes

Using The Arabidopsis Information Resource (TAIR) to Find Information About Arabidopsis Genes UNIT 1.11 Using The Arabidopsis Information Resource (TAIR) to Find Information About Arabidopsis Genes Leonore Reiser 1, Shabari Subramaniam 1, Donghui Li 1, and Eva Huala 1 1 Phoenix Bioinformatics,

More information

Integrating large, fast-moving, and heterogeneous data sets in biology.

Integrating large, fast-moving, and heterogeneous data sets in biology. Integrating large, fast-moving, and heterogeneous data sets in biology. C. Titus Brown Asst Prof, CSE and Microbiology; BEACON NSF STC Michigan State University ctb@msu.edu Introduction Background: Modeling

More information

The GenAlg Project: Developing a New Integrating Data Model, Language, and Tool for Managing and Querying Genomic Information

The GenAlg Project: Developing a New Integrating Data Model, Language, and Tool for Managing and Querying Genomic Information The GenAlg Project: Developing a New Integrating Data Model, Language, and Tool for Managing and Querying Genomic Information Joachim Hammer and Markus Schneider Department of Computer and Information

More information

SETTING UP AN HCS DATA ANALYSIS SYSTEM

SETTING UP AN HCS DATA ANALYSIS SYSTEM A WHITE PAPER FROM GENEDATA JANUARY 2010 SETTING UP AN HCS DATA ANALYSIS SYSTEM WHY YOU NEED ONE HOW TO CREATE ONE HOW IT WILL HELP HCS MARKET AND DATA ANALYSIS CHALLENGES High Content Screening (HCS)

More information

einfrastructures Concertation Event

einfrastructures Concertation Event einfrastructures Concertation Event Steve Crumb, Executive Director December 5, 2007 OGF Vision & Mission Our Vision: The Open Grid Forum accelerates grid adoption to enable scientific discovery and business

More information

Tutorial:OverRepresentation - OpenTutorials

Tutorial:OverRepresentation - OpenTutorials Tutorial:OverRepresentation From OpenTutorials Slideshow OverRepresentation (about 12 minutes) (http://opentutorials.rbvi.ucsf.edu/index.php?title=tutorial:overrepresentation& ce_slide=true&ce_style=cytoscape)

More information

Writing a Data Management Plan A guide for the perplexed

Writing a Data Management Plan A guide for the perplexed March 29, 2012 Writing a Data Management Plan A guide for the perplexed Agenda Rationale and Motivations for Data Management Plans Data and data structures Metadata and provenance Provisions for privacy,

More information

NCBI News, November 2009

NCBI News, November 2009 Peter Cooper, Ph.D. NCBI cooper@ncbi.nlm.nh.gov Dawn Lipshultz, M.S. NCBI lipshult@ncbi.nlm.nih.gov Featured Resource: New Discovery-oriented PubMed and NCBI Homepage The NCBI Site Guide A new and improved

More information

Evaluating Three Scrutability and Three Privacy User Privileges for a Scrutable User Modelling Infrastructure

Evaluating Three Scrutability and Three Privacy User Privileges for a Scrutable User Modelling Infrastructure Evaluating Three Scrutability and Three Privacy User Privileges for a Scrutable User Modelling Infrastructure Demetris Kyriacou, Hugh C Davis, and Thanassis Tiropanis Learning Societies Lab School of Electronics

More information

Migration to Service Oriented Architecture Using Web Services Whitepaper

Migration to Service Oriented Architecture Using Web Services Whitepaper WHITE PAPER Migration to Service Oriented Architecture Using Web Services Whitepaper Copyright 2004-2006, HCL Technologies Limited All Rights Reserved. cross platform GUI for web services Table of Contents

More information

INTEGRATING BIOLOGICAL DATABASES

INTEGRATING BIOLOGICAL DATABASES INTEGRATING BIOLOGICAL DATABASES Lincoln D. Stein Recent years have seen an explosion in the amount of available biological data. More and more genomes are being sequenced and annotated, and protein and

More information

Bioinformatics approach for exploring MS/MS proteomics data

Bioinformatics approach for exploring MS/MS proteomics data Bioinformatics approach for exploring MS/MS proteomics data Mudita Singhal, Kyle Klicker, George Chin, Lynn Trease, Eric Stephan, Deborah Gracio Computational Sciences and Mathematics, Pacific Northwest

More information

Advanced UCSC Browser Functions

Advanced UCSC Browser Functions Advanced UCSC Browser Functions Dr. Thomas Randall tarandal@email.unc.edu bioinformatics.unc.edu UCSC Browser: genome.ucsc.edu Overview Custom Tracks adding your own datasets Utilities custom tools for

More information

Network Analysis, Visualization, & Graphing TORonto (NAViGaTOR) User Documentation

Network Analysis, Visualization, & Graphing TORonto (NAViGaTOR) User Documentation Network Analysis, Visualization, & Graphing TORonto (NAViGaTOR) User Documentation Jurisica Lab, Ontario Cancer Institute http://ophid.utoronto.ca/navigator/ November 10, 2006 Contents 1 Introduction 2

More information

Tutorial: Using the SFLD and Cytoscape to Make Hypotheses About Enzyme Function for an Isoprenoid Synthase Superfamily Sequence

Tutorial: Using the SFLD and Cytoscape to Make Hypotheses About Enzyme Function for an Isoprenoid Synthase Superfamily Sequence Tutorial: Using the SFLD and Cytoscape to Make Hypotheses About Enzyme Function for an Isoprenoid Synthase Superfamily Sequence Requirements: 1. A web browser 2. The cytoscape program (available for download

More information

A Brief Description of the SAIL Environment

A Brief Description of the SAIL Environment A Brief Description of the SAIL Environment Stephen Bannasch and Robert Tinker July 31, 2008 O VERVIEW SAIL (the Scalable Architecture for Interactive Learning) is both a framework and a collection of

More information

SciMiner User s Manual

SciMiner User s Manual SciMiner User s Manual Copyright 2008 Junguk Hur. All rights reserved. Bioinformatics Program University of Michigan Ann Arbor, MI 48109, USA Email: juhur@umich.edu Homepage: http://jdrf.neurology.med.umich.edu/sciminer/

More information

In the sense of the definition above, a system is both a generalization of one gene s function and a recipe for including and excluding components.

In the sense of the definition above, a system is both a generalization of one gene s function and a recipe for including and excluding components. 1 In the sense of the definition above, a system is both a generalization of one gene s function and a recipe for including and excluding components. 2 Starting from a biological motivation to annotate

More information

Chapter Outline. Chapter 2 Distributed Information Systems Architecture. Layers of an information system. Design strategies.

Chapter Outline. Chapter 2 Distributed Information Systems Architecture. Layers of an information system. Design strategies. Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 dessloch@informatik.uni-kl.de Chapter 2 Distributed Information Systems Architecture Chapter Outline

More information

Web Services in Cincom VisualWorks. WHITE PAPER Cincom In-depth Analysis and Review

Web Services in Cincom VisualWorks. WHITE PAPER Cincom In-depth Analysis and Review Web Services in Cincom VisualWorks WHITE PAPER Cincom In-depth Analysis and Review Web Services in Cincom VisualWorks Table of Contents Web Services in VisualWorks....................... 1 Web Services

More information

Genome Browser. Background & Strategy. Spring 2017 Faction II

Genome Browser. Background & Strategy. Spring 2017 Faction II Genome Browser Background & Strategy Spring 2017 Faction II Outline Beginning of the Last Phase Goals State of Art Applicable Genome Browsers Not So Genome Browsers Storing Data Strategy for the website

More information

TBtools, a Toolkit for Biologists integrating various HTS-data

TBtools, a Toolkit for Biologists integrating various HTS-data 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 TBtools, a Toolkit for Biologists integrating various HTS-data handling tools with a user-friendly interface Chengjie Chen 1,2,3*, Rui Xia 1,2,3, Hao Chen 4, Yehua

More information

ehealth Ministerial Conference 2013 Dublin May 2013 Irish Presidency Declaration

ehealth Ministerial Conference 2013 Dublin May 2013 Irish Presidency Declaration ehealth Ministerial Conference 2013 Dublin 13 15 May 2013 Irish Presidency Declaration Irish Presidency Declaration Ministers of Health of the Member States of the European Union and delegates met on 13

More information

A cell-cycle knowledge integration framework

A cell-cycle knowledge integration framework A cell-cycle knowledge integration framework Erick Antezana Dept. of Plant Systems Biology. Flanders Interuniversity Institute for Biotechnology/Ghent University. Ghent BELGIUM. erant@psb.ugent.be http://www.psb.ugent.be/cbd/

More information

Click on "+" button Select your VCF data files (see #Input Formats->1 above) Remove file from files list:

Click on + button Select your VCF data files (see #Input Formats->1 above) Remove file from files list: CircosVCF: CircosVCF is a web based visualization tool of genome-wide variant data described in VCF files using circos plots. The provided visualization capabilities, gives a broad overview of the genomic

More information

Principles for Interoperability in the Internet of Things

Principles for Interoperability in the Internet of Things Principles for Interoperability in the Internet of Things A Technical Paper prepared for SCTE/ISBE by J. Clarke Stevens Principal Architect, Emerging Technologies Shaw Communications 2420 17th Street Denver,

More information

Genome Browser. Background and Strategy

Genome Browser. Background and Strategy Genome Browser Background and Strategy Contents What is a genome browser? Purpose of a genome browser Examples Structure Extra Features Contents What is a genome browser? Purpose of a genome browser Examples

More information

Chapter Outline. Chapter 2 Distributed Information Systems Architecture. Distributed transactions (quick refresh) Layers of an information system

Chapter Outline. Chapter 2 Distributed Information Systems Architecture. Distributed transactions (quick refresh) Layers of an information system Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 dessloch@informatik.uni-kl.de Chapter 2 Distributed Information Systems Architecture Chapter Outline

More information

Data Mining Technologies for Bioinformatics Sequences

Data Mining Technologies for Bioinformatics Sequences Data Mining Technologies for Bioinformatics Sequences Deepak Garg Computer Science and Engineering Department Thapar Institute of Engineering & Tecnology, Patiala Abstract Main tool used for sequence alignment

More information

About the Edinburgh Pathway Editor:

About the Edinburgh Pathway Editor: About the Edinburgh Pathway Editor: EPE is a visual editor designed for annotation, visualisation and presentation of wide variety of biological networks, including metabolic, genetic and signal transduction

More information

Proceedings of the Postgraduate Annual Research Seminar

Proceedings of the Postgraduate Annual Research Seminar Proceedings of the Postgraduate Annual Research Seminar 2006 202 Database Integration Approaches for Heterogeneous Biological Data Sources: An overview Iskandar Ishak, Naomie Salim Faculty of Computer

More information

Towards Interactive Exploration of Images, Meta-Data, and Analytic Results in the Open Microscopy Environment

Towards Interactive Exploration of Images, Meta-Data, and Analytic Results in the Open Microscopy Environment Towards Interactive Exploration of Images, Meta-Data, and Analytic Results in the Open Microscopy Environment Harry Hochheiser and Ilya G. Goldberg Image Informatics and Computational Biology Unit, Laboratory

More information

Literature Databases

Literature Databases Literature Databases Introduction to Bioinformatics Dortmund, 16.-20.07.2007 Lectures: Sven Rahmann Exercises: Udo Feldkamp, Michael Wurst 1 Overview 1. Databases 2. Publications in Science 3. PubMed and

More information

Chapter 2 Distributed Information Systems Architecture

Chapter 2 Distributed Information Systems Architecture Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 dessloch@informatik.uni-kl.de Chapter 2 Distributed Information Systems Architecture Chapter Outline

More information

Chapter 4 Research Prototype

Chapter 4 Research Prototype Chapter 4 Research Prototype According to the research method described in Chapter 3, a schema and ontology-assisted heterogeneous information integration prototype system is implemented. This system shows

More information

Grid computing and bioinformatics development. A case study on the Oryza sativa (rice) genome*

Grid computing and bioinformatics development. A case study on the Oryza sativa (rice) genome* Pure Appl. Chem., Vol. 74, No. 6, pp. 891 897, 2002. 2002 IUPAC Grid computing and bioinformatics development. A case study on the Oryza sativa (rice) genome* Wasinee Rungsarityotin, Noppadon Khiripet,

More information

SobekCM Digital Repository : A Retrospective

SobekCM Digital Repository : A Retrospective SobekCM Digital Repository : A Retrospective By Mark Sullivan (12/11/2014) As 2014 draws to a close, the time is ripe for a retrospective on the accomplishments of the SobekCM community over the last twelve

More information

The ELIXIR of Linked Data

The ELIXIR of Linked Data The ELIXIR of Linked Data Professor Carole Goble (UK node) Barend Mons (NL node), Helen Parkinson (EMBL-EBI node) The Interoperability Services Backbone Team European Life Sciences Infrastructure for Biological

More information

Postgres Plus and JBoss

Postgres Plus and JBoss Postgres Plus and JBoss A New Division of Labor for New Enterprise Applications An EnterpriseDB White Paper for DBAs, Application Developers, and Enterprise Architects October 2008 Postgres Plus and JBoss:

More information

The MEG Metadata Schemas Registry Schemas and Ontologies: building a Semantic Infrastructure for GRIDs and digital libraries Edinburgh, 16 May 2003

The MEG Metadata Schemas Registry Schemas and Ontologies: building a Semantic Infrastructure for GRIDs and digital libraries Edinburgh, 16 May 2003 The MEG Metadata Schemas Registry Schemas and Ontologies: building a Semantic Infrastructure for GRIDs and digital libraries Edinburgh, 16 May 2003 Pete Johnston UKOLN, University of Bath Bath, BA2 7AY

More information

SDI, Containers and DevOps - Cloud Adoption Trends Driving IT Transformation

SDI, Containers and DevOps - Cloud Adoption Trends Driving IT Transformation SDI, Containers and DevOps - Cloud Adoption Trends Driving IT Transformation Research Report August 2017 suse.com Executive Summary As we approach 2020, businesses face a maelstrom of increasing customer

More information

Chapter 1: Distributed Information Systems

Chapter 1: Distributed Information Systems Chapter 1: Distributed Information Systems Contents - Chapter 1 Design of an information system Layers and tiers Bottom up design Top down design Architecture of an information system One tier Two tier

More information

QLIKVIEW ARCHITECTURAL OVERVIEW

QLIKVIEW ARCHITECTURAL OVERVIEW QLIKVIEW ARCHITECTURAL OVERVIEW A QlikView Technology White Paper Published: October, 2010 qlikview.com Table of Contents Making Sense of the QlikView Platform 3 Most BI Software Is Built on Old Technology

More information