Discovery Net: A UK e-science Pilot Project for Grid-based Knowledge Discovery Services. Patrick Wendel, Imperial College, London
1 Discovery Net: A UK e-science Pilot Project for Grid-based Knowledge Discovery Services
Patrick Wendel, Imperial College, London
Data Mining and Exploration Middleware for Distributed and Grid Computing, September 18-19, 2003
2 Why Discovery Net?
- Data challenge: distributed, heterogeneous and large-scale data sets; novel and real-time data sources
- Resource challenge: novel specialised data analysis components/services are continually being published and made available; computational resources are also being provided
- Information challenge: data cleaning, normalisation and calibration; new data needs to be related to existing data
- Knowledge challenge: collaborative, interactive and people-intensive; results must be interpreted and validated in relation to existing knowledge; knowledge sharing is key
3 What is Discovery Net?
Goal: construct an infrastructure for global knowledge discovery services.
Key technologies:
- Grid and distributed computing
- Workflow and service composition
- Data mining and visualisation
- Data access and information structuring
- High-throughput screening devices (real-time)
4 Discovery Net: Unifying the World's Knowledge
- Data integration: dynamic, real-time construction of data grids
- Application integration: component- and service-based integration
- People integration: global discovery groupware
- Knowledge integration: multi-subject and multi-modality integrative analysis to cross-validate and annotate related discovery work
5 What is Discovery Net?
[Diagram labels: Scientific Literature; Databases; Operational Data; Images; Instrument Data; Real-Time Information Integration; Workflow Construction; Dynamic Application Integration; Interactive Visual Analysis Using Distributed Resources; Scientific Discovery]
6 Discovery Net Layer Model (Life Science Application)
- D-Net clients: end-user applications and user interfaces allowing scientists to construct and drive knowledge discovery activities
- Deployment: Web/Grid services (OGSA)
- D-Net middleware: provides execution logic for distributed knowledge discovery and access to distributed resources
- Computation and data resources: distributed databases, compute servers and scientific devices
- High-performance, Grid-enabled transfer protocols (GSI-FTP, DSTP, ...)
- Grid-enabled infrastructure (GSI)
7 Goal: Plug & Play
[Diagram: a knowledge grid based on D-Net servers hosting data sources, analysis components and knowledge discovery processes (InfoGrid data access and storage, components, computation, deployment, D-Net API); D-Net servers, participating clients and web clients communicate over the Internet/WWW using XML/DPML; RDBMS data sources and computational services are attached]
- Several types of clients for different uses (from thin web client to participating client)
- The current implementation is based on Java distributed objects (EJB) and is moving towards Web/Grid services
- Deployment and API access, however, are provided through standard Web/Grid services
8 Discovery Process Management
- Workflow-based service composition
- The data-flow approach fits the knowledge discovery process
- Allows scientists to develop their own processes
Towards a standard workflow representation for discovery informatics, the Discovery Process Markup Language (DPML):
- Contains component data-flow graphs, but also:
  - records collaboration information (users, changes)
  - records execution constraints (location, parameterisation)
- Becomes key intellectual property: discovery processes can be stored, reused, audited, refined and deployed in various forms
[Figure: D-Net workflow for genome annotation: 16 services executing across the Internet]
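The slide describes DPML only in terms of what it records. As a rough illustration of those three kinds of information, the Python sketch below models a workflow as a data-flow graph whose nodes carry execution constraints and whose history records which user made which change. It then derives a valid execution order with Kahn's topological sort. All class and field names (`Node`, `Workflow`, `history`, `execution_order`) are hypothetical and do not reflect the actual DPML schema.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A component in the data-flow graph (hypothetical fields, not the DPML schema)."""
    name: str
    location: Optional[str] = None                                # execution constraint: resource
    parameters: Dict[str, object] = field(default_factory=dict)  # execution constraint: parameters


@dataclass
class Workflow:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)    # (producer, consumer) links
    history: List[Tuple[str, str]] = field(default_factory=list)  # collaboration info: (user, change)

    def add(self, node: Node, user: str) -> None:
        self.nodes[node.name] = node
        self.history.append((user, f"added {node.name}"))

    def connect(self, src: str, dst: str, user: str) -> None:
        self.edges.append((src, dst))
        self.history.append((user, f"connected {src} -> {dst}"))

    def execution_order(self) -> List[str]:
        """Kahn's algorithm: a component becomes ready once all its inputs are produced."""
        indegree = {name: 0 for name in self.nodes}
        for _, dst in self.edges:
            indegree[dst] += 1
        ready = deque(name for name, degree in indegree.items() if degree == 0)
        order: List[str] = []
        while ready:
            current = ready.popleft()
            order.append(current)
            for src, dst in self.edges:
                if src == current:
                    indegree[dst] -= 1
                    if indegree[dst] == 0:
                        ready.append(dst)
        if len(order) != len(self.nodes):
            raise ValueError("data-flow graph contains a cycle")
        return order


# Toy genome-annotation workflow edited by two hypothetical users.
wf = Workflow()
for step in ["fetch_sequence", "blast", "genscan", "merge_annotations"]:
    wf.add(Node(step), user="alice")
wf.connect("fetch_sequence", "blast", user="alice")
wf.connect("fetch_sequence", "genscan", user="bob")
wf.connect("blast", "merge_annotations", user="bob")
wf.connect("genscan", "merge_annotations", user="bob")
print(wf.execution_order())
# ['fetch_sequence', 'blast', 'genscan', 'merge_annotations']
```

Keeping the edit history alongside the graph is what allows a stored process to be audited later. The execution constraints stay attached to each node so that a scheduler can read them.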
9 InfoGrid: Dynamic Data Integration
Dynamic data integration = on-demand access to heterogeneous data sources + information structuring
Towards a dynamic information integration methodology:
- Specialised information source access: InfoGrid allows users to register, locate and connect to various specialised information sources
- On-the-fly integration: InfoGrid allows users to build their own integration structure on the fly (worst case: proprietary protocol/format; best case: JDBC/HTTP-XML-XPath/Web Service)
- Easy maintenance: wrappers/drivers for new data sources can be added through a clean API
[Diagram: integrative analysis across sources such as journals, project reports, patents, structures, libraries, catalogues, clinical trials, patients, chemistry, gene, biological screening, protein/target, sequence, activity, protocol, toxicology, metabolic pathway, expression and function data]
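The slide describes the InfoGrid API only at a high level. The Python sketch below shows one way such an API could be shaped, under stated assumptions. Every source is hidden behind a common driver interface. A registry lets users register and locate sources by name, and an on-the-fly integration step joins results from two sources on a shared key. The names `SourceDriver`, `InMemoryDriver`, `SourceRegistry` and `integrate` are hypothetical. The in-memory driver stands in for a real JDBC, HTTP/XML or Web Service wrapper.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

Row = Dict[str, object]


class SourceDriver(ABC):
    """Hypothetical wrapper interface: every information source returns rows as dicts."""

    @abstractmethod
    def query(self, **criteria: object) -> List[Row]:
        ...


class InMemoryDriver(SourceDriver):
    """Stand-in for a JDBC, HTTP/XML or Web Service wrapper; filters rows by equality."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def query(self, **criteria: object) -> List[Row]:
        return [row for row in self.rows
                if all(row.get(key) == value for key, value in criteria.items())]


class SourceRegistry:
    """Register, locate and connect to information sources by name."""

    def __init__(self) -> None:
        self._drivers: Dict[str, SourceDriver] = {}

    def register(self, name: str, driver: SourceDriver) -> None:
        self._drivers[name] = driver

    def locate(self, name: str) -> SourceDriver:
        return self._drivers[name]


def integrate(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """On-the-fly integration: hash join of two result sets on a shared key."""
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**left_row, **right_row}
            for left_row in left
            for right_row in index.get(left_row[key], [])]


# Toy data with hypothetical identifiers.
registry = SourceRegistry()
registry.register("genes", InMemoryDriver([
    {"gene": "G1", "chromosome": "17"},
    {"gene": "G2", "chromosome": "17"},
    {"gene": "G3", "chromosome": "X"},
]))
registry.register("pathways", InMemoryDriver([{"gene": "G2", "pathway": "P7"}]))

genes_on_17 = registry.locate("genes").query(chromosome="17")
print(integrate(genes_on_17, registry.locate("pathways").query(), key="gene"))
# [{'gene': 'G2', 'chromosome': '17', 'pathway': 'P7'}]
```

Adding a new source under this design only means writing another `SourceDriver` subclass. The registry and the integration code stay unchanged.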
10 Dynamic Application Integration Services
Dynamic application integration = on-demand access to, and composition of, remote analysis components
Towards dynamic component integration:
- Component service: allows users to register, locate and remotely execute components (Java component interface or Web Service port type)
- Execution service: allows users to control the execution of components in distributed environments
- Easy maintenance: new components can be added through a clean API
[Diagram: components accessed through the D-NET API: clustering, classification, regression, promoter prediction, gene function prediction, homology search]
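The split between a component service and an execution service can be sketched in Python as follows. One object registers and locates components by category. A second object submits them for asynchronous execution and hands back futures the caller can wait on. This is an assumption-laden illustration, not the D-Net API: `ComponentService`, `ExecutionService` and the two toy components are hypothetical, and a local `ThreadPoolExecutor` stands in for remote execution servers.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple


class ComponentService:
    """Hypothetical registry: register, locate (by category) and fetch components."""

    def __init__(self) -> None:
        self._components: Dict[str, Callable[..., object]] = {}
        self._categories: Dict[str, str] = {}

    def register(self, name: str, category: str, component: Callable[..., object]) -> None:
        self._components[name] = component
        self._categories[name] = category

    def locate(self, category: str) -> List[str]:
        return sorted(name for name, cat in self._categories.items() if cat == category)

    def get(self, name: str) -> Callable[..., object]:
        return self._components[name]


class ExecutionService:
    """Controls execution; a local thread pool stands in for remote execution servers."""

    def __init__(self, components: ComponentService, workers: int = 4) -> None:
        self.components = components
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, name: str, data: object, **params: object) -> Future:
        return self.pool.submit(self.components.get(name), data, **params)

    def shutdown(self) -> None:
        self.pool.shutdown(wait=True)


def threshold_classifier(values: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Toy classification component."""
    return ["high" if v >= threshold else "low" for v in values]


def linear_fit(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Toy regression component: ordinary least squares, returns (slope, intercept)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return slope, (sy - slope * sx) / n


components = ComponentService()
components.register("threshold_classifier", "classification", threshold_classifier)
components.register("linear_fit", "regression", linear_fit)

executor = ExecutionService(components)
labels = executor.submit("threshold_classifier", [0.2, 0.7], threshold=0.5)
fit = executor.submit("linear_fit", [(0, 1), (1, 3), (2, 5)])
print(components.locate("regression"), labels.result(), fit.result())
# ['linear_fit'] ['low', 'high'] (2.0, 1.0)
executor.shutdown()
```

Returning futures instead of results lets the execution service control and monitor several components at once. A real distributed version would also need to transfer data to wherever each component runs.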
11 Discovery Deployment
Discovery deployment = on-demand rapid application construction and publishing
Towards dynamic deployment of knowledge discovery procedures:
- Deployment engine: allows users to build and publish applications, based on DPML code coordinating remotely executed components, as a Web page, a Web/Grid service or a command-line tool
- Easy maintenance: new discovery procedures are described in DPML, a standardised representation of composed discovery procedures
- Storage & reporting servers: allow users to share DPML procedures and to generate workflow audit reports
[Diagram labels: Discovery Process in DPML; Discovery Component; Discovery Service; Report; Batch processing]
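To illustrate one of the deployment targets listed above, the Python sketch below turns a workflow plus a declaration of its parameters into a command-line entry point. It uses the standard `argparse` module. The function `deploy_as_cli` and the toy `gc_content` workflow are hypothetical. The sketch assumes that the parameters a deployed process exposes can be read from its description, which is how a DPML-driven engine would plausibly work, but it does not show the actual Discovery Net deployment engine.

```python
import argparse
from typing import Callable, Dict, List, Optional


def deploy_as_cli(name: str, workflow: Callable[..., object],
                  params: Dict[str, type]) -> Callable[[Optional[List[str]]], object]:
    """Build a command-line entry point from a workflow's declared parameters."""
    parser = argparse.ArgumentParser(prog=name, description=f"Deployed workflow: {name}")
    for param, param_type in params.items():
        parser.add_argument(f"--{param}", type=param_type, required=True)

    def main(argv: Optional[List[str]] = None) -> object:
        # With argv=None, argparse reads sys.argv[1:], as a real command-line tool would.
        args = parser.parse_args(argv)
        return workflow(**vars(args))

    return main


def gc_content(sequence: str, digits: int) -> float:
    """Toy workflow body: fraction of G/C bases, rounded to `digits` places."""
    gc = sum(base in "GC" for base in sequence.upper())
    return round(gc / len(sequence), digits)


gc_tool = deploy_as_cli("gc-content", gc_content, {"sequence": str, "digits": int})
print(gc_tool(["--sequence", "GGCAT", "--digits", "2"]))
# 0.6
```

The same parameter declarations could in principle drive a generated Web form or a service operation, so one process description would serve all three deployment targets.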
12 Knowledge Integration & Interpretation
Dynamic knowledge interpretation = cross-reference and verify analysis results against background knowledge
Towards a knowledge integration framework:
- Multi-subject data analysis
- Specialised client interfaces: interactive analysis and dynamic component interaction
- Result annotation, structuring and storage: information source query, result browsing, sharing and markup
[Diagram: life science example application combining text mining, sequence analysis, genetic analysis and pathway analysis]
13 Workflow Execution
Component execution location resolution:
- The user maintains a list of known resources
- A component can explicitly require to be executed on a particular resource
- A component can choose from a set of proposed resources (and could use Grid resource information systems and network weather information to decide where to run)
- For unconstrained components, a simple near-the-data execution policy applies:
  - if all input data comes from a single location, execute there
  - otherwise, fall back to the original execution location
- Allows usual DPKD workflows to be designed
- Handles data management and transfer (serialisation, Java-based, FTP-based)
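The location-resolution rules above form a small decision procedure, sketched here in Python. The order of the checks follows the slide. Several details are assumptions: the `Component` fields, the requirement that explicit and proposed resources appear in the user's list of known resources, and the default "first candidate" choice. The `chooser` callable is a hypothetical placeholder for logic that might consult GRIS or Network Weather Service data.

```python
from dataclasses import dataclass, field
from typing import Callable, Collection, List, Optional, Sequence


@dataclass
class Component:
    """Placement-relevant view of a workflow component (hypothetical fields)."""
    name: str
    required_resource: Optional[str] = None                       # explicit placement
    candidate_resources: List[str] = field(default_factory=list)  # proposed set
    # Picks one candidate; a real system might consult GRIS or NWS here.
    chooser: Optional[Callable[[List[str]], str]] = None


def resolve_location(component: Component,
                     input_locations: Sequence[str],
                     original_location: str,
                     known_resources: Collection[str]) -> str:
    # 1. The component explicitly requires a particular resource.
    if component.required_resource is not None:
        if component.required_resource not in known_resources:
            raise ValueError(f"{component.name}: unknown resource {component.required_resource}")
        return component.required_resource
    # 2. The component chooses from a proposed set of known resources.
    if component.candidate_resources:
        candidates = [r for r in component.candidate_resources if r in known_resources]
        if not candidates:
            raise ValueError(f"{component.name}: no known candidate resource")
        choose = component.chooser or (lambda options: options[0])
        return choose(candidates)
    # 3. Unconstrained component: near-the-data policy.
    data_locations = set(input_locations)
    if len(data_locations) == 1:
        return data_locations.pop()  # single input data location: execute there
    return original_location         # otherwise fall back to the original location


known = {"hpc1", "hpc2", "db1", "db2"}
print(resolve_location(Component("filter"), ["db1", "db1"], "client", known))  # db1
print(resolve_location(Component("merge"), ["db1", "db2"], "client", known))   # client
```

Under this sketch, a component with no inputs (a data source) also falls back to its original location, since it has no data location to move towards.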
14 Discovery Net and Grid Technologies
Cluster/campus Grid level:
- Partial or complete workflow execution on Condor / SGE
- Task farming on a subset of the workflow
Global Grid:
- GSI integration (Java CoG Kit)
- GSI-FTP transfer functionality (Java CoG Kit)
- OGSA Grid Service access to functionalities (GT3)
- Potential use of GRIS or NWS in component implementation
- Globus scheduler? Unicore? SRB?
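Task farming on a subset of the workflow means running the same linear chain of steps independently over many inputs. The Python sketch below shows the general idea: the inputs are split into batches, each batch becomes one independent task, and the results are gathered back in input order. The helper names (`run_chain`, `chunks`, `farm`) are hypothetical. A local `ThreadPoolExecutor` stands in for jobs that a real deployment would submit to Condor or SGE.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
Step = Callable[[object], object]


def run_chain(steps: Sequence[Step], item: object) -> object:
    """Apply a linear sub-workflow (a chain of steps) to one input item."""
    return reduce(lambda value, step: step(value), steps, item)


def chunks(items: Sequence[T], size: int) -> List[Sequence[T]]:
    return [items[i:i + size] for i in range(0, len(items), size)]


def farm(steps: Sequence[Step], items: Sequence[object],
         chunk_size: int = 2, workers: int = 4) -> List[object]:
    """Task farming: each batch of inputs becomes one independent task."""
    def task(batch: Sequence[object]) -> List[object]:
        return [run_chain(steps, item) for item in batch]

    # A local thread pool stands in for jobs submitted to Condor or SGE.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = pool.map(task, chunks(items, chunk_size))
        # map() yields results in submission order, so input order is preserved.
        return [result for batch in batches for result in batch]


def reverse_complement(sequence: str) -> str:
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(sequence))


print(farm([str.upper, reverse_complement], ["atg", "ggc", "aat"]))
# ['CAT', 'GCC', 'ATT']
```

Batching trades scheduling overhead against parallelism. On a real cluster, the chunk size would be tuned so that each job runs long enough to amortise its queueing and data-transfer cost.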
15 Discovery Net Application Testbeds
[Image: GUSTO units with wireless connectivity]
- Life science testbed: gene sequencing, protein chips. High-throughput, real-time genome annotation testbed: analyse and interpret new sequences using existing distributed bioinformatics tools and databases
- Environmental modelling: pollution sensors (GUSTO) measuring SO2, benzene, ... High-throughput, real-time pollution monitoring testbed: analyse and interpret time-resolved correlations among remote stations and with other environmental data sets
- Geo-hazard prediction: multi-spectral, multi-temporal satellite imagery. Real-time geo-hazard prediction testbed: analyse and interpret satellite images together with other data sets to generate thematic knowledge
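One simple reading of "time-resolved correlations among remote stations" is a lagged correlation. In that reading, pollution readings at one station are correlated with readings at another station shifted by a number of samples, and the lag with the strongest correlation suggests how long a pollution pattern takes to travel between them. The Python sketch below implements that reading with a plain Pearson coefficient. The function names and the synthetic station data are illustrative assumptions, not the testbed's actual analysis.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / (spread_x * spread_y)


def lagged_correlations(a: Sequence[float], b: Sequence[float],
                        max_lag: int) -> Dict[int, float]:
    """Correlate a[t] with b[t + lag] for lag = 0..max_lag (equal-length series)."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return {lag: pearson(a[:len(a) - lag], b[lag:]) for lag in range(max_lag + 1)}


def best_lag(a: Sequence[float], b: Sequence[float], max_lag: int) -> int:
    correlations = lagged_correlations(a, b, max_lag)
    return max(correlations, key=correlations.get)


# Synthetic readings: station B sees the same pattern as station A two samples later.
station_a = [1, 3, 2, 5, 4, 6, 8, 7]
station_b = [0, 0] + station_a[:6]
print(best_lag(station_a, station_b, max_lag=3))  # 2
```

In a real-time setting the same computation would run over a sliding window of recent readings. The lag could also be cross-checked against other environmental data, for example wind direction.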
16 Case Study: SC2002 HPC Challenge
D-Net-based global collaborative real-time genome annotation, starting from high-throughput sequencers (identify the organism's chromosomes; identify the organism's DNA):
- Nucleotide-level annotation: genes, gene markers, regulatory regions, segmental duplication, tRNAs, rRNAs, non-translated RNAs, repetitive elements, SNP variations, literature references, ... Tools and databases: EMBL, TIGR, NCBI, SNP, GenScan, GRAIL, e-PCR, BLAST, RepeatMasker
- Protein-level annotation: identify proteins, functional characterisation, domain and fold prediction, classification into protein families, homologues, 3-D structure, secondary structure, literature references, ... Tools and databases: InterPro, SMART, SWISS-PROT, BLAST, Pfam, PREDATOR, 3D-PSSM, motif search, DSC
- Process-level annotation: relate to cell cycle, drugs, cell death, metabolism, biological processes, embryogenesis, ..., using pathway maps, ontologies and literature references. Tools and databases: GO, CSNDB, AmiGO, GeneMaps, KEGG, GK, GenNav, virtual chip
In total: 15 databases and 21 applications
17 How It Works
[Diagram: nucleotide annotation workflows built in the interactive editor & visualisation client. A sequence is downloaded from a reference server, the distributed annotation workflow is executed against InterPro, EMBL, SMART, NCBI, KEGG, SWISS-PROT, TIGR, SNP and GO, and the results are saved to a distributed annotation server]
- Manual approach: 1800 clicks, 500 Web accesses, 200 copy/paste operations, 3 weeks of work
- Discovery Net: 1 workflow and a few seconds of execution
18 Conclusion and Future Work
- Towards an open integration platform that enables scientists to conduct their knowledge discovery (KD) activities
- Several levels of integration are required
- Enable the use of available resources
- Evolution towards cost-model integration (performance, value, QoS)
- Semantic-based service retrieval and composition
- Other useful standards? (OGSA-DAI?)
More informationWhat is a Web Service?
Web Services What is a Web Service? Piece of software available over Internet Uses standardized (i.e., XML) messaging system More general definition: collection of protocols and standards used for exchanging
More informationMaster Thesis. Andreas Schlicker
Master Thesis A Global Approach to Comparative Genomics: Comparison of Functional Annotation over the Taxonomic Tree by Andreas Schlicker A Thesis Submitted to the Center for Bioinformatics of Saarland
More informationAnnotating a Genome in PATRIC
Annotating a Genome in PATRIC The following step-by-step workflow is intended to help you learn how to navigate the new PATRIC workspace environment in order to annotate and browse your genome on the PATRIC
More informationTools and Services for Distributed Knowledge Discovery on Grids
Tools and Services for Distributed Knowledge Discovery on Grids Domenico Talia (Joint work with Mario Cannataro and Paolo Trunfio) DEIS University of Calabria, ITALY talia@deis.unical.it HPC 2002, Cetraro,
More informationArchitecture, Metadata, and Ontologies in the Knowledge Grid
UNICZ UNICAL Architecture, Metadata, and Ontologies in the Knowledge Grid Mario Cannataro University Magna Græcia of Catanzaro, Italy cannataro@unicz.it joint work with: D. Talia, C. Comito, A. Congiusta,
More informationMaximizing Public Data Sources for Sequencing and GWAS
Maximizing Public Data Sources for Sequencing and GWAS February 4, 2014 G Bryce Christensen Director of Services Questions during the presentation Use the Questions pane in your GoToWebinar window Agenda
More informationIntegrative Informatics
Early Vision Integrative Informatics Isaac S. Kohane 4.27.04 PIP s Integration Integrating Genomics and Pharmacology RNA expression in NCI 60 cell lines was determined using Affymetrix HU6000 arrays 5,223
More informationA SURVEY OF DATA MINING & ITS APPLICATIONS
A SURVEY OF DATA MINING & ITS APPLICATIONS Pankaj jain M.Tech Student, Computer Science Siddhi Vinayak College of Science & Hr.Education, Alwar (Rajasthan) Abstract- Data mining consists of evolving set
More informationCAP BIOINFORMATICS Su-Shing Chen CISE. 8/19/2005 Su-Shing Chen, CISE 1
CAP 5510-2 BIOINFORMATICS Su-Shing Chen CISE 8/19/2005 Su-Shing Chen, CISE 1 Building Local Genomic Databases Genomic research integrates sequence data with gene function knowledge. Gene ontology to represent
More information* Inter-Cloud Research: Vision
* Inter-Cloud Research: Vision for 2020 Ana Juan Ferrer, ATOS & Cluster Chair Vendor lock-in for existing adopters Issues: Lack of interoperability, regulatory context, SLAs. Inter-Cloud: Hardly automated,
More informationA Grid Middleware for Ontology Access
Available online at http://www.ges2007.de This document is under the terms of the CC-BY-NC-ND Creative Commons Attribution A Grid Middleware for Access Michael Hartung 1 and Erhard Rahm 2 1 Interdisciplinary
More informationLecture 5 Advanced BLAST
Introduction to Bioinformatics for Medical Research Gideon Greenspan gdg@cs.technion.ac.il Lecture 5 Advanced BLAST BLAST Recap Sequence Alignment Complexity and indexing BLASTN and BLASTP Basic parameters
More informationSarah Cohen-Boulakia. Université Paris Sud, LRI CNRS UMR
Sarah Cohen-Boulakia Université Paris Sud, LRI CNRS UMR 8623 cohen@lri.fr 01 69 15 32 16 https://www.lri.fr/~cohen/bigdata/biodata-ami2b.html Understanding Life Sciences Progress in multiple domains: biology,
More informationbiokepler: A Comprehensive Bioinforma2cs Scien2fic Workflow Module for Distributed Analysis of Large- Scale Biological Data
biokepler: A Comprehensive Bioinforma2cs Scien2fic Workflow Module for Distributed Analysis of Large- Scale Biological Data Ilkay Al/ntas 1, Jianwu Wang 2, Daniel Crawl 1, Shweta Purawat 1 1 San Diego
More informationCoherence and Binding for Workflows and Service Compositions
Coherence and Binding for Workflows and Service Compositions Item Type Presentation Authors Patrick, Timothy Citation Coherence and Binding for Workflows and Service Compositions 2005-10, Download date
More information