Down with Species-Specific Database Projects, Up with Data Services
Lincoln D. Stein, Cold Spring Harbor Laboratory

This whitepaper begins with an illustration drawn from a database that has nothing to do with the plant kingdom. Figure 1 shows a Cell page from WormBase, a web site devoted to the genome and biology of C. elegans. Among the information available on this page are the cell's role in the organism's life cycle, its lineage, its fate, a diagram of its anatomy, information on the genes that are known to be turned on in this cell, and links to citations that refer to the cell.

Figure 1: The Standard HTML Display of a C. elegans Cell

Figure 2 shows what is displayed when the user clicks on the link labeled "XML Display" at the top of the first page. This displays the raw database record in XML (Extensible Markup Language) form. It is the same information as was displayed on the HTML page, but freed of extraneous formatting and easily parsed by standard software.

Figure 2: The XML Representation of a C. elegans Cell

Every single data object in WormBase, whether it be a cell, a gene, a sequence, a protein, a genetic map, a mutant, or an allele, is available in XML form. To fetch the data, there is a simple published URL: where N and C are the name and class of the object to fetch in XML form. For the page shown earlier, the object name is ADAR, and the class is Cell. The schema for the class can be fetched using a variant on this URL: where N is the name of the class, for example Cell.
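As a rough illustration of how a client might use a name-and-class URL of this kind, the Python sketch below builds a request URL and flattens an XML record into a dictionary. The base URL, the query parameter names (`name`, `class`), and the sample record are all hypothetical placeholders; they are not the actual WormBase URL or schema, which are not shown here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint: a stand-in, not the real published URL.
BASE_URL = "https://example.org/db/xml"


def object_url(name, cls, base=BASE_URL):
    """Build the URL for one object, identified by its name and class."""
    return f"{base}?{urlencode({'name': name, 'class': cls})}"


def parse_record(xml_text):
    """Flatten a simple one-level XML record into plain Python values."""
    root = ET.fromstring(xml_text)
    fields = {}
    for child in root:
        # Repeated tags (e.g. several expressed genes) accumulate in a list.
        fields.setdefault(child.tag, []).append((child.text or "").strip())
    return {"class": root.tag, "name": root.get("name"), "fields": fields}


# Made-up record standing in for a response body; tag names and values
# are placeholders, not real WormBase content.
SAMPLE = """<Cell name="ADAR">
  <Remark>placeholder text</Remark>
  <Expresses>gene-1</Expresses>
  <Expresses>gene-2</Expresses>
</Cell>"""

if __name__ == "__main__":
    print(object_url("ADAR", "Cell"))
    print(parse_record(SAMPLE))
```

Against a live service, the response body would presumably be obtained with something like `urllib.request.urlopen(object_url("ADAR", "Cell")).read()` and passed to `parse_record`; the parsing step stays the same whatever the record's class.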
At first this service might seem extraneous. After all, the primary target audience for WormBase, the bench biologist, has no need for XML. What he wants is a browsable HTML page, or a downloadable Excel spreadsheet. In fact, the XML service is provided for the benefit of computer-savvy biologists and bioinformatics professionals, both those directly involved with WormBase, and those outside the project.

For the WormBase insiders, the XML interface allows us to decouple our HTML pages from the underlying database. The core WormBase sequence annotation viewer is driven entirely off XML, and can be moved from data source to data source without modification.

More importantly, the XML service is one of the ways that we make WormBase "sticky" for other bioinformatics efforts. Should a bioinformaticist need to download a genetic map, a cell lineage or a set of sequence annotations from WormBase, he need only generate a request for the URL given above and the database will obligingly yield up all it knows in a predictable, easily-parsed format.

Other ways that we have endeavoured to make WormBase sticky are:

1) a universal incoming link format that allows an external web page to link to WormBase using the database object's name and class
2) ad hoc queries on the database using an HTTP request
3) direct ad hoc queries on the database via a command-line Perl API
4) flatfile dumps of sequence annotations using HTTP requests
5) structured dumps of sequence annotations using the distributed annotation system (DAS) protocol
6) FTPable tab-delimited tables containing frequently-requested extracts of the database
7) the entire database (in ACEDB format), which can be downloaded and installed
8) all the software for WormBase is open source and available on the WormBase FTP site. There is no restriction on who uses the software or what purpose they use it for.

Why Biological Databases Should be Sticky

Stickiness means having hooks for third parties to connect to.
It is a hallmark of good biological web sites. NCBI's Entrez is sticky by virtue of having a published URL for incoming links, and the LinkOut system for outgoing connections. The UCSC genome browser provides flat file dumps and a call interface for dumping out selected annotation tracks. Ensembl provides a Perl command-line interface for accessing its resources at a high level of abstraction and a SQL interface for issuing low-level queries.

Why is this important? Because the era in which biological databases could stand alone has departed, if indeed it existed at all. First is the breakdown of the one-species/one-database model. We are moving away from a world of single-species databases towards a situation in which biological databases become specialists for a particular type of information across a wide variety of species. Prominent examples include TIGR's TOGA groups, which cross species boundaries, InterPro's protein families, and the Gene Ontology Consortium's process terms.

Another factor is the increasing sophistication of users. As a new generation of computer-savvy biologists appears, biological databases are now called upon to serve the needs of researchers who want to do more than browse web pages. They want to extract the information, transform it, compare it to their own data, and integrate it into data services that they have built themselves.

Sticky databases provide the hooks needed to integrate data. They provide the stable interfaces needed to link data sources together, to extract and transform information, and possibly even to submit new data. They get used in creative ways that their inventors did not envision, enriching the community, and enlivening the scientific discourse. Non-sticky databases are pretty web sites; very useful in their own right, but with no potential to grow beyond the immediate goals of their designers.

Biological Databases as Service Providers

The ultimate sticky database is one that is organized around the idea of a set of data services. Instead of offering a few hooks to the community, a biological data service is nothing but hooks. I'll give a concrete example of what I mean.
Imagine a database that contains a number of genetic maps, each of which is composed of a set of markers. A genetic map service would have a published interface which accepts requests for genetic maps and returns lists of markers and their positions. Other interfaces would allow the user to retrieve the list of maps available, and to select certain genetic maps based on the map type and species.

Now imagine a marker information service that contains molecular information for genetic markers: the primer pairs, assay conditions, and so forth. It would respond to requests for markers by returning the associated information, and provide a query interface for selecting markers by their type, polymorphism rate, and so forth.

Using a combination of these two services, a programmer could still put together the classic browsable genetic map interface which draws genetic maps and responds to clicks on markers by returning the corresponding molecular information. By breaking the information into discrete services with a published interface, the data provider has opened this information to the community to use for diverse purposes such as comparative map analysis.

There are other benefits. The service model makes it possible for the two databases to be physically separate, and possibly under different administrative control. Data visualization and query tools can now be written to a stable interface, providing modularity. This modularity, in turn, encourages code reuse and sharing, and allows one user interface to run on top of many data sources.

The DAS Experience

The prototype for this type of biological information service is DAS, the Distributed Annotation System. DAS is an experimental client/server protocol designed by Sean Eddy and myself which allows biological databases to become service providers for sequence annotation information. In the core of the protocol, an information consumer asks the server for all or a subset of its annotations in a particular region of the genome, and the server responds by returning a list of its annotations. It is then up to the client to store, analyze or display the information. The protocol deliberately limits the amount of information that can be transmitted by the data source to the bare essentials of a sequence annotation: a reference point on the genome, the sequence range that the annotation covers, and a brief description of the annotation type. For further information on how the annotation was made and its significance, the client is referred to a URL provided by the data source.
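The request/response pattern described above can be sketched in a few lines of Python. This is an in-memory illustration only and does not reproduce the actual DAS wire format: the `Annotation` field names, the 1-based inclusive coordinates, the `annotations_in_region` function, and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    reference: str  # reference point on the genome, e.g. a chromosome name
    start: int      # first base covered (1-based, inclusive: an assumption)
    end: int        # last base covered (inclusive)
    type: str       # brief description of the annotation type
    link: str       # URL at the data source holding further details


def annotations_in_region(store, reference, start, end, types=None):
    """Return annotations overlapping [start, end] on `reference`.

    `types` optionally restricts the answer to a subset of annotation
    types, mirroring the "all or a subset" request in the text.
    """
    hits = [
        a for a in store
        if a.reference == reference
        and a.start <= end and a.end >= start      # interval overlap test
        and (types is None or a.type in types)
    ]
    return sorted(hits, key=lambda a: (a.start, a.end))


# Made-up annotations; coordinates and URLs are placeholders.
STORE = [
    Annotation("I", 100, 500, "gene", "https://example.org/a/1"),
    Annotation("I", 450, 900, "repeat", "https://example.org/a/2"),
    Annotation("I", 2000, 2500, "gene", "https://example.org/a/3"),
    Annotation("II", 100, 500, "gene", "https://example.org/a/4"),
]

if __name__ == "__main__":
    for a in annotations_in_region(STORE, "I", 400, 1000):
        print(a.reference, a.start, a.end, a.type, a.link)
```

The record carries only the reference, range, type, and a link, so everything else about an annotation stays with the data source and the client follows the URL when it needs more.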
The DAS protocol allows the same visualization and analysis tool to run on top of any database that provides a DAS service. Data providers can retrofit their databases to provide the DAS service by creating a relatively thin DAS compatibility layer. So far, the DAS experiment seems successful. In recent months, it has proven to be very popular among the model organism databases, and is now used by EBI Ensembl, the human genome browser at UCSC, TIGR, WormBase, University of Cambridge, the Berkeley Drosophila Sequencing Project, and others. There is considerable enthusiasm among the developer community, and an ever-increasing number of data providers have indicated their intent to build or install DAS servers.
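One way to picture a "thin compatibility layer" is as an adapter that exposes a provider's native storage through a shared interface, so that a single tool runs unchanged on every source. The Python sketch below assumes two hypothetical storage layouts and a hypothetical `features` method; none of these names come from the DAS specification.

```python
from typing import Dict, Iterator, Protocol, Tuple

Feature = Tuple[str, int, int, str]  # (reference, start, end, type)


class AnnotationSource(Protocol):
    """The shared interface every adapter agrees to provide (hypothetical)."""

    def features(self, reference: str, start: int, end: int) -> Iterator[Feature]:
        ...


class RowTableAdapter:
    """Wraps a database that stores flat rows of (type, reference, start, end)."""

    def __init__(self, rows):
        self._rows = rows

    def features(self, reference, start, end):
        for ftype, ref, s, e in self._rows:
            if ref == reference and s <= end and e >= start:
                yield (ref, s, e, ftype)


class NestedDictAdapter:
    """Wraps a database keyed by reference that stores start and length."""

    def __init__(self, by_ref):
        self._by_ref = by_ref

    def features(self, reference, start, end):
        for rec in self._by_ref.get(reference, []):
            s = rec["start"]
            e = s + rec["length"] - 1  # convert length to an inclusive end
            if s <= end and e >= start:
                yield (reference, s, e, rec["kind"])


def count_by_type(source: AnnotationSource, reference, start, end) -> Dict[str, int]:
    """A tiny 'analysis tool' written once against the shared interface."""
    counts: Dict[str, int] = {}
    for _, _, _, ftype in source.features(reference, start, end):
        counts[ftype] = counts.get(ftype, 0) + 1
    return counts


# Two made-up back ends holding the same two annotations in different shapes.
ROWS = [("gene", "I", 100, 500), ("repeat", "I", 450, 900), ("gene", "I", 2000, 2500)]
NESTED = {"I": [{"start": 100, "length": 401, "kind": "gene"},
                {"start": 450, "length": 451, "kind": "repeat"}]}

if __name__ == "__main__":
    print(count_by_type(RowTableAdapter(ROWS), "I", 400, 1000))
    print(count_by_type(NestedDictAdapter(NESTED), "I", 400, 1000))
```

Each adapter is only a few lines because it translates rather than stores, while `count_by_type` never learns which back end it is talking to.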
Modularizing Biological Databases

In addition to the genetic mapping and sequence annotation services described earlier, I see many opportunities for reorganizing biological databases as sets of discrete services. Here are a few ideas for standard services:

- A comparative genetic mapping service, which given coordinates on one map, translates those coordinates to another map. This could be used to compare different genetic maps in the same species as well as those in different, but substantially syntenic species.
- Along the same lines, a genome assembly translation service which translates coordinates from one version of a sequence assembly to another.
- A gene ontology service, which given a protein identifier, returns the gene ontology assignments for that protein.
- A protein family service, which given a protein identifier, returns its domains, families and superfamilies according to one or more protein classification systems.
- A mutant strain service, which given a phenotype and a species, returns all strains that express that phenotype (this presupposes a phenotype ontology, such as several groups are developing).
- A sequence similarity service, which runs searches for nearly identical sequences using one of the new fast algorithms (e.g. SSAHA or BLAT).

I envision these services being implemented on top of an industry-standard communications protocol. I lean towards SOAP/XML because of the predominant industry trend in this direction, but other protocols (e.g. CORBA) should be taken under consideration.

WormBase as an Anachronism

Let's return to WormBase for a moment. Although I think WormBase has done well in serving the needs of the C. elegans community, its original goal to be the authoritative one-stop shop for all C. elegans information is an increasingly unrealistic one. We do very well at presenting the C. elegans genome, genetic map and mutants, not so well with the proteome and cellular anatomy, and very poorly when it comes to microarray data or transposon-mediated knockouts. The fact is that our expertise is strong in some areas but weak in others, and that the ability of the community to develop new types of information outstrips the WormBase curators' ability to classify and incorporate it.

In recognition of this, WormBase has made alliances with other data providers so that we can draw on each other's strengths. The oldest of our alliances is with WormPD, the database of the C. elegans proteome provided by Proteome, Inc. (now a division of Incyte). We have agreed upon a common nomenclature for the proteins and developed a simple calling scheme that allows WormBase to link to Proteome for protein information, and for Proteome to link back to WormBase for genetic mapping and genomic information. More recently, we have made similar arrangements with SwissProt for Gene Ontology and protein family information, and with EuGenes for orthologue clustering. We are using DAS to exchange information with TIGR, and are currently working on data exchange protocols with the C. elegans Microarray project at Stanford, the Orfeome project at Dana Farber, and the Transcriptome project at NCBI.

We realize that becoming a component in a network of data service providers allows us to play to our strengths. Ultimately both WormBase and the community benefit.

Challenges and Opportunities

Transitioning from a species-oriented to a service-oriented mission presents both challenges and opportunities for providers of biological data. As described earlier, this transition would allow a data source that has garnered expertise in, say, the storage and analysis of microarray data from Arabidopsis, to establish itself as a provider of microarray data services for a large number of plant species. However, the reemphasis would also benefit newcomers, who could now focus on setting up a discrete service rather than making the much more challenging leap to become a complete source for species-specific information.

The major challenge is integration and standardization.
It is very good for a data source to provide external hooks into its database, but the real benefits only kick in when several data sources settle on a standard interface. This allows the same software tools to be used across all data sources, and encourages new sources to create compliant interfaces, a phenomenon known as the "network effect."

However, it is notoriously difficult to develop standards among biological databases. There are several reasons for this. One is simply that standardization is hard. There are many potential technical approaches, and reasonable people will reasonably disagree. However, as the Internet sector has shown, it is possible to overcome these technical barriers by adopting well-tested standardization practices. I happen to favor the IETF (Internet Engineering Task Force) model, in which proposals for standardization are accompanied by reference implementations that can then be tested head-to-head, but other types of standardization processes work as well.

What the NSF Can Do

Service standardization won't occur unless there is a strong incentive to do so, and to date the funding practices of NSF and other agencies have discouraged evolution in this direction. By focusing efforts on species-specific databases, and by insisting that these projects become self-sufficient (i.e. profitable) after the initial development is finished, funding agencies encourage data providers to build proprietary, non-portable systems. The next time a database is needed, groups need to start again from scratch. This is a wasteful and inefficient practice.

A more parsimonious approach would be to take the long view and provide funding directly for the development of biological data service infrastructure. The deliverables for such projects would be portable, general-purpose software and standards that are made freely available to the academic community as well as industry. The projects should not be tied to a particular species, and should not be piggybacked on top of a database delivery project; when a data release deadline looms, the need to push the data out the door always wins out over the portability of the underlying software.

The flip side of funding for infrastructure development is funding for infrastructure operations. I feel strongly that biological data services are no different from the stock centers in their need for stable, reliable, long-term funding. In many fields, the data services have become indispensable to the practice of biological research. Let's figure out a way to ensure the ongoing growth and availability of this infrastructure.

Finally, it is important for the funding agencies to coordinate their efforts.
The NHGRI and NIGMS have recently announced their intention to jointly fund the development of a "model organism database toolkit." The USDA ARS has called together a working group to create a common set of services for agricultural sequencing projects. The DOE has organized workshops to develop standard XML formats for exchanging biological data. Working together with a consistent vision of the goal, the funding agencies can transform the bioinformatics landscape from a small number of insular database projects to a large number of open, interoperable data services, together forming the fabric of a new biological data infrastructure.
More informationWriting a Data Management Plan A guide for the perplexed
March 29, 2012 Writing a Data Management Plan A guide for the perplexed Agenda Rationale and Motivations for Data Management Plans Data and data structures Metadata and provenance Provisions for privacy,
More informationNCBI News, November 2009
Peter Cooper, Ph.D. NCBI cooper@ncbi.nlm.nh.gov Dawn Lipshultz, M.S. NCBI lipshult@ncbi.nlm.nih.gov Featured Resource: New Discovery-oriented PubMed and NCBI Homepage The NCBI Site Guide A new and improved
More informationEvaluating Three Scrutability and Three Privacy User Privileges for a Scrutable User Modelling Infrastructure
Evaluating Three Scrutability and Three Privacy User Privileges for a Scrutable User Modelling Infrastructure Demetris Kyriacou, Hugh C Davis, and Thanassis Tiropanis Learning Societies Lab School of Electronics
More informationMigration to Service Oriented Architecture Using Web Services Whitepaper
WHITE PAPER Migration to Service Oriented Architecture Using Web Services Whitepaper Copyright 2004-2006, HCL Technologies Limited All Rights Reserved. cross platform GUI for web services Table of Contents
More informationINTEGRATING BIOLOGICAL DATABASES
INTEGRATING BIOLOGICAL DATABASES Lincoln D. Stein Recent years have seen an explosion in the amount of available biological data. More and more genomes are being sequenced and annotated, and protein and
More informationBioinformatics approach for exploring MS/MS proteomics data
Bioinformatics approach for exploring MS/MS proteomics data Mudita Singhal, Kyle Klicker, George Chin, Lynn Trease, Eric Stephan, Deborah Gracio Computational Sciences and Mathematics, Pacific Northwest
More informationAdvanced UCSC Browser Functions
Advanced UCSC Browser Functions Dr. Thomas Randall tarandal@email.unc.edu bioinformatics.unc.edu UCSC Browser: genome.ucsc.edu Overview Custom Tracks adding your own datasets Utilities custom tools for
More informationNetwork Analysis, Visualization, & Graphing TORonto (NAViGaTOR) User Documentation
Network Analysis, Visualization, & Graphing TORonto (NAViGaTOR) User Documentation Jurisica Lab, Ontario Cancer Institute http://ophid.utoronto.ca/navigator/ November 10, 2006 Contents 1 Introduction 2
More informationTutorial: Using the SFLD and Cytoscape to Make Hypotheses About Enzyme Function for an Isoprenoid Synthase Superfamily Sequence
Tutorial: Using the SFLD and Cytoscape to Make Hypotheses About Enzyme Function for an Isoprenoid Synthase Superfamily Sequence Requirements: 1. A web browser 2. The cytoscape program (available for download
More informationA Brief Description of the SAIL Environment
A Brief Description of the SAIL Environment Stephen Bannasch and Robert Tinker July 31, 2008 O VERVIEW SAIL (the Scalable Architecture for Interactive Learning) is both a framework and a collection of
More informationSciMiner User s Manual
SciMiner User s Manual Copyright 2008 Junguk Hur. All rights reserved. Bioinformatics Program University of Michigan Ann Arbor, MI 48109, USA Email: juhur@umich.edu Homepage: http://jdrf.neurology.med.umich.edu/sciminer/
More informationIn the sense of the definition above, a system is both a generalization of one gene s function and a recipe for including and excluding components.
1 In the sense of the definition above, a system is both a generalization of one gene s function and a recipe for including and excluding components. 2 Starting from a biological motivation to annotate
More informationChapter Outline. Chapter 2 Distributed Information Systems Architecture. Layers of an information system. Design strategies.
Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 dessloch@informatik.uni-kl.de Chapter 2 Distributed Information Systems Architecture Chapter Outline
More informationWeb Services in Cincom VisualWorks. WHITE PAPER Cincom In-depth Analysis and Review
Web Services in Cincom VisualWorks WHITE PAPER Cincom In-depth Analysis and Review Web Services in Cincom VisualWorks Table of Contents Web Services in VisualWorks....................... 1 Web Services
More informationGenome Browser. Background & Strategy. Spring 2017 Faction II
Genome Browser Background & Strategy Spring 2017 Faction II Outline Beginning of the Last Phase Goals State of Art Applicable Genome Browsers Not So Genome Browsers Storing Data Strategy for the website
More informationTBtools, a Toolkit for Biologists integrating various HTS-data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 TBtools, a Toolkit for Biologists integrating various HTS-data handling tools with a user-friendly interface Chengjie Chen 1,2,3*, Rui Xia 1,2,3, Hao Chen 4, Yehua
More informationehealth Ministerial Conference 2013 Dublin May 2013 Irish Presidency Declaration
ehealth Ministerial Conference 2013 Dublin 13 15 May 2013 Irish Presidency Declaration Irish Presidency Declaration Ministers of Health of the Member States of the European Union and delegates met on 13
More informationA cell-cycle knowledge integration framework
A cell-cycle knowledge integration framework Erick Antezana Dept. of Plant Systems Biology. Flanders Interuniversity Institute for Biotechnology/Ghent University. Ghent BELGIUM. erant@psb.ugent.be http://www.psb.ugent.be/cbd/
More informationClick on "+" button Select your VCF data files (see #Input Formats->1 above) Remove file from files list:
CircosVCF: CircosVCF is a web based visualization tool of genome-wide variant data described in VCF files using circos plots. The provided visualization capabilities, gives a broad overview of the genomic
More informationPrinciples for Interoperability in the Internet of Things
Principles for Interoperability in the Internet of Things A Technical Paper prepared for SCTE/ISBE by J. Clarke Stevens Principal Architect, Emerging Technologies Shaw Communications 2420 17th Street Denver,
More informationGenome Browser. Background and Strategy
Genome Browser Background and Strategy Contents What is a genome browser? Purpose of a genome browser Examples Structure Extra Features Contents What is a genome browser? Purpose of a genome browser Examples
More informationChapter Outline. Chapter 2 Distributed Information Systems Architecture. Distributed transactions (quick refresh) Layers of an information system
Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 dessloch@informatik.uni-kl.de Chapter 2 Distributed Information Systems Architecture Chapter Outline
More informationData Mining Technologies for Bioinformatics Sequences
Data Mining Technologies for Bioinformatics Sequences Deepak Garg Computer Science and Engineering Department Thapar Institute of Engineering & Tecnology, Patiala Abstract Main tool used for sequence alignment
More informationAbout the Edinburgh Pathway Editor:
About the Edinburgh Pathway Editor: EPE is a visual editor designed for annotation, visualisation and presentation of wide variety of biological networks, including metabolic, genetic and signal transduction
More informationProceedings of the Postgraduate Annual Research Seminar
Proceedings of the Postgraduate Annual Research Seminar 2006 202 Database Integration Approaches for Heterogeneous Biological Data Sources: An overview Iskandar Ishak, Naomie Salim Faculty of Computer
More informationTowards Interactive Exploration of Images, Meta-Data, and Analytic Results in the Open Microscopy Environment
Towards Interactive Exploration of Images, Meta-Data, and Analytic Results in the Open Microscopy Environment Harry Hochheiser and Ilya G. Goldberg Image Informatics and Computational Biology Unit, Laboratory
More informationLiterature Databases
Literature Databases Introduction to Bioinformatics Dortmund, 16.-20.07.2007 Lectures: Sven Rahmann Exercises: Udo Feldkamp, Michael Wurst 1 Overview 1. Databases 2. Publications in Science 3. PubMed and
More informationChapter 2 Distributed Information Systems Architecture
Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 dessloch@informatik.uni-kl.de Chapter 2 Distributed Information Systems Architecture Chapter Outline
More informationChapter 4 Research Prototype
Chapter 4 Research Prototype According to the research method described in Chapter 3, a schema and ontology-assisted heterogeneous information integration prototype system is implemented. This system shows
More informationGrid computing and bioinformatics development. A case study on the Oryza sativa (rice) genome*
Pure Appl. Chem., Vol. 74, No. 6, pp. 891 897, 2002. 2002 IUPAC Grid computing and bioinformatics development. A case study on the Oryza sativa (rice) genome* Wasinee Rungsarityotin, Noppadon Khiripet,
More informationSobekCM Digital Repository : A Retrospective
SobekCM Digital Repository : A Retrospective By Mark Sullivan (12/11/2014) As 2014 draws to a close, the time is ripe for a retrospective on the accomplishments of the SobekCM community over the last twelve
More informationThe ELIXIR of Linked Data
The ELIXIR of Linked Data Professor Carole Goble (UK node) Barend Mons (NL node), Helen Parkinson (EMBL-EBI node) The Interoperability Services Backbone Team European Life Sciences Infrastructure for Biological
More informationPostgres Plus and JBoss
Postgres Plus and JBoss A New Division of Labor for New Enterprise Applications An EnterpriseDB White Paper for DBAs, Application Developers, and Enterprise Architects October 2008 Postgres Plus and JBoss:
More informationThe MEG Metadata Schemas Registry Schemas and Ontologies: building a Semantic Infrastructure for GRIDs and digital libraries Edinburgh, 16 May 2003
The MEG Metadata Schemas Registry Schemas and Ontologies: building a Semantic Infrastructure for GRIDs and digital libraries Edinburgh, 16 May 2003 Pete Johnston UKOLN, University of Bath Bath, BA2 7AY
More informationSDI, Containers and DevOps - Cloud Adoption Trends Driving IT Transformation
SDI, Containers and DevOps - Cloud Adoption Trends Driving IT Transformation Research Report August 2017 suse.com Executive Summary As we approach 2020, businesses face a maelstrom of increasing customer
More informationChapter 1: Distributed Information Systems
Chapter 1: Distributed Information Systems Contents - Chapter 1 Design of an information system Layers and tiers Bottom up design Top down design Architecture of an information system One tier Two tier
More informationQLIKVIEW ARCHITECTURAL OVERVIEW
QLIKVIEW ARCHITECTURAL OVERVIEW A QlikView Technology White Paper Published: October, 2010 qlikview.com Table of Contents Making Sense of the QlikView Platform 3 Most BI Software Is Built on Old Technology
More information