UNIVERSIDAD CARLOS III DE MADRID
ESCUELA POLITÉCNICA SUPERIOR
DEPARTAMENTO DE INFORMÁTICA

Semantically-Enhanced Bioinformatics Platform (SEBIO)

REPORT
Convocatoria Movilidad de Jóvenes Doctores

Author: Dr. Juan Miguel Gómez
Supervisor at the host institution: Prof. Dr. Ying Liu
Host institution: Laboratory of Bioinformatics and Medical Informatics, University of Texas at Dallas (UTD)

Table of Contents

1 INTRODUCTION
  1.1 Context and Objectives
  1.2 Contributions of this Report
  1.3 Report Organization
2 PROBLEM STATEMENT
  2.1 Problem Scenarios
    2.1.1 Real World Scenarios
    2.1.2 Semantic Heterogeneity
3 RELATED WORK
4 SEBIO FUNDAMENTAL CONCEPTS
  4.1 Bioinformatics
  4.2 The Semantic Web
  4.3 SEBIO Components
5 MASIA: A MICRO-ARRAY INFORMATION AND DATA INTEGRATION SEMANTICS-BASED ARCHITECTURE
  5.1 Introduction and Goals
  5.2 Micro-Array Data Sources
    5.2.1 Heterogeneity in Biological Data
    5.2.2 Micro-Array Data Sources
    5.2.3 The MGED Ontology
  5.3 Micro-Array Data Integration
  5.4 The MASIA Approach
  5.5 The MASIA Software Architecture
6 BLISS: A BIOMEDICAL LITERATURE SOCIAL RANKING SYSTEM
  6.1 Introduction and Goals
  6.2 Collaborative Discovery
  6.3 Bridging the Gap: Social Semantics
  6.4 BLISS: A Biological Literature Social Ranking System
7 BIRD: BIOMEDICAL INFORMATION INTEGRATION AND DISCOVERY WITH SEMANTIC WEB SERVICES
  7.1 Introduction and Goals
  7.2 BIRD: Biological Information Integration Discovery
  7.3 Needle in a Haystack: Dealing with Semantic Web Services
  7.4 Using BIRD for Biomedical Information Integration
8 CONCLUSIONS AND FUTURE WORK
9 REFERENCES
  9.1 Printed Publications
  9.2 World Wide Web Resources

1 Introduction

This report summarizes and outlines the scope of the joint scientific work and collaboration that took place during a two-month research stay in the context of the Convocatoria de Movilidad de Jóvenes Doctores de la Universidad Carlos III de Madrid. This research was conducted in the Laboratory of Bioinformatics and Medical Informatics of the University of Texas at Dallas (UTD), headed by Prof. Dr. Ying Liu, one of the most prominent researchers in the area.

In recent years, technological advances in high-throughput techniques and efficient data gathering methods, coupled with a worldwide effort in computational biology, have resulted in a vast amount of life science data, often available in distributed and heterogeneous repositories. These repositories contain interesting information such as sequence and structure data, annotations for biological data, results of complex and expensive computations, genetic sequences and multiple bio-datasets. However, the multiplicity and heterogeneity in the objectives, methods, representation, and platforms of these data sources and analysis tools have created an urgent and immediate need for research in resource integration and platform-independent processing of investigative queries involving heterogeneous data sources and analysis tools. It is now universally recognized that a database approach to the analysis and management of biological data offers a convenient, high-level, and efficient alternative to high-volume biological data processing and management.

The Semantic Web and Semantic Web Services paradigm promises a new level of data and process integration that can be leveraged to develop novel high-performance data and process management systems for biological applications. Biomedical ontologies constitute a best-of-breed approach for addressing the aforementioned problems. Using semantic technologies as a key technology for the interoperation of various datasets enables data integration of the vast amount of biological and biomedical data. In this research proposal, we aim at providing a Semantically-Enhanced Bioinformatics Platform (SEBIO) that effectively handles:

- Conceptual Models for Biological Data
- Use of Semantics to manage the interoperation of Biomedical datasets
- Biomedical Data Engineering using ontologies

- Support of Ontologies for Biological Information Retrieval and Web Services

Our SEBIO approach enables knowledge discovery and provides a semantically-enhanced solution to harvest and integrate information from text, biological databases, ontologies and terminological resources. This would, for example, be used with large-scale knowledge bases such as the Gene Ontology. Equally, we will benefit from the existence of semantically annotated corpora for testing and training. We will also focus on topics of paramount importance for efficient semantic mining based on NLP techniques.

1.1 Context and Objectives

The goal of this research stay is to start off and advance an international research collaboration line which enables mutual synergy as well as knowledge transfer and exchange. This should translate into fruitful cooperation between the Universidad Carlos III and the SoftLab Group on the one hand, and the University of Texas at Dallas and the group of Prof. Dr. Ying Liu on the other. These objectives have been fully achieved, as the outcome documented in this report will show: a two-way scientific cooperation has been established, solidly built on the accomplishments of the SEBIO proposal. In the following sections, we document the mentioned outcome and the contributions of this report.

1.2 Contributions of this Report

Technology is a means of optimizing benefits and resources for a number of disciplines, particularly research disciplines. In particular, several research disciplines such as bioinformatics, where information integration is critical, could benefit from harnessing the potential of new approaches, such as the Semantic Web and Semantic Web Services. For this, we believe that the Semantic Web and Semantic Web Services paradigm (see section 4.2 for further details) promises a new level of data and process integration that can be leveraged to develop novel high-performance data and process management systems for biological applications. Using semantic technologies as a key technology for the interoperation of various datasets enables data integration of the vast amount of biological and biomedical data. The breakthrough of adding semantics to such data is depicted in Figure 1. In a nutshell, the use of knowledge-oriented biomedical data

integration would lead to achieving Intelligent Biomedical Data Integration, which will bring biomedical research to its full potential.

Figure 1. Intelligent Biomedical Data Integration

In the following sections, we will unfold how the SEBIO approach contributes to, and sets itself as a cornerstone for, the development of Intelligent Biomedical Data Integration.

1.3 Report Organization

This report is organized as follows. In chapter 2, the problem statement is presented. We describe the context of our work and, particularly, our domain of interest. We illustrate our motivation with a precise problem scenario and use it to formulate the leading guidelines of our work. In chapter 3, we review the state of the art and related work, clarifying what we understand by bioinformatics and biomedical research.

The gist of our work is discussed in chapters 4, 5, 6 and 7, where the SEBIO approach is unfolded. Finally, the report closes with our conclusions and future work.

2 Problem Statement

As discussed in [Cohen, 04], it is undeniable that, among the sciences, biology played a key role in the twentieth century. Over the past fifteen years we have witnessed a dramatic transformation in the practice of life sciences research. We have already picked many of the proverbial low-hanging fruit of dominant mutations and simple diseases. Chronic and more complex diseases, as well as efforts to design microbes for engineering needs or to uncover the basis of genetic repair, need the ladder of IT to reach the higher branches in living systems. At the same time, technological improvements in sequencing instrumentation and automated sample preparation have made it possible to create high-throughput facilities for DNA sequencing, high-throughput combinatorial chemistry for drug screening, high-throughput proteomics, high-throughput genomics, etc. In consequence, what was once a cottage industry, marked by scarce, expensive data obtained largely through the manual efforts of small groups of graduate students, post-docs and a few technicians, has become industrialized and data-rich, marked by factory-scale sequencing organizations. The role of biology is likely to acquire further importance in the years to come. In the wake of the work of Watson and Crick [Watson and Crick, 2003] and the sequencing of the human genome, far-reaching discoveries are constantly being made.

One of the central factors promoting the importance of biology is its relationship with medicine. Fundamental progress in medicine depends on elucidating some of the mysteries that occur in the biological sciences. However, biomedical research is now information intensive; the volume and diversity of new data sources challenge current database technologies. The development and tuning of database technologies for biology and medicine will maintain and accelerate the current pace of innovation and discovery. There are four main classes of situations in which data management technology is critical to supporting health-related goals:

- The rapid construction of task-specific databases to manage diverse data for solving a targeted problem.
- The creation of data systems that assist research efforts by combining multiple sources of data to generate and test new hypotheses, for instance, about diseases and their treatments.
- The management of databases that accumulate data supporting entire research communities.
- The creation of databases to support data collection and analysis for clinical and field decision support.

Such data integration is technically difficult for several reasons. First, the technologies on which different databases are based may differ and do not interoperate smoothly; standards for cross-database communication are needed to allow the databases (and their users) to exchange information. Second, the precise naming conventions for many scientific concepts (such as individual genes, proteins or drugs) in fast-developing fields are often inconsistent, so mappings are required between different vocabularies. Third, the precise underlying biological model for the data may be different (scientists view things differently), so integrating these data requires a common model of the relevant concepts and their allowable relations. This reason is particularly crucial because unstated assumptions may lead to improper use of information that, on the surface, appears to be valid. Fourth, as our understanding of a particular domain improves, not only will the data change, but even the database structures will evolve. Any users of the data source, and in particular any data integrators, must be able to manage such data source evolution. These problems will be tackled in more detail in the problem scenarios of the following section.

2.1 Problem Scenarios

In this section, we will first focus on three real-world scenarios where the aforementioned problems are found. Subsequently, we will focus on the major problem that encompasses them, namely semantic heterogeneity. We will finally discuss its major consequences and how we will tackle such a problem.

2.1.1 Real World Scenarios

The first real-world scenario was the Four Corners hantavirus outbreak. Identifying new pathogens used to take months to years; the identification of the Legionnaires' disease and AIDS pathogens are cases in point. However, in 1993, when healthy young people in the American southwest began to die from an unknown pathogen, the virus responsible was identified in only one week using a combination of molecular biology and bioinformatics approaches. Traditional immunological approaches were only able to suggest that the virus involved in the Four Corners epidemic was distantly related to a family of viruses known as hantaviruses, not enough information to prevent or treat the disease. DNA sequences of related viruses in the hantavirus family were retrieved from DNA sequence databases and allowed the design of molecular probes (PCR primers) which were used in the first definitive test for the virus (confirming it as the pathogen) and allowed the determination of the DNA sequence of the new virus. In turn, the DNA sequence allowed the identification of the new virus's closest relatives (viruses found in Korea), which shared similar animal vectors (rodents) and produced similar symptoms. Because the Four Corners hantavirus produces symptoms that resemble those of cold or flu before progressing to pulmonary arrest and sudden death, the assay developed based on sequences found in DNA sequence databases was critical in stopping the spread of this epidemic. If this information had not been available online, well described and searchable, it might have taken several years and many deaths before this pathogen was identified. In the intervening ten years, electronic data resources have continued to grow, leading to ongoing challenges in building the kind of integrated, online resources needed to attack similar diseases. The 2003 SARS threat underlines this need.

A second real-world scenario is related to the WTC victim identification. After the tragedy of September 11, 2001, the police officers in New York City had the task of identifying the remains of victims, so that they could be returned to family members. Existing database systems were built predominantly on the assumption that individual remains would be found and identified on a one-by-one basis. The possibility of thousands of victims and tens of thousands of samples was never considered in the design of the initial database system.

There are two sources of DNA in the tissues: nuclear and mitochondrial. Each of these sources has a number of attributes that can be measured, and the combination of these attributes tends to be unique for individuals, thus allowing identification. Given a sample of known origin (taken from the personal effects of the victims and gathered

from their families), it can be compared with the profile of attributes gathered from the unknown samples, and matched. In many cases, additional evidence is required, including DNA samples from parents and siblings (who share some, but not all, DNA attributes with their relatives), information about where the remains were found, information about what personal effects were used for identification, and the contact information of all the people reported as missing. To manage these data, the investigators built a complex system using cutting-edge database technology and state-of-the-art understanding of how to use genetics and other evidence to identify victims. The resulting tool continues to evolve, but has assisted in the identification of many victims and the return of their remains to loved ones. Although this database was built under extraordinary circumstances, the need for urgent assembly and integration of data and the provision of novel analytic capabilities based on these data occurs routinely both in biomedical research and in the delivery of healthcare. When these needs arise, it is too late to perform the essential background research to support these efforts, so they must be anticipated in order to respond in a timely manner to urgent needs.

Finally, the malaria studies real-world scenario also deals with information integration. The malaria parasite, Plasmodium falciparum, is responsible for nearly 1.1 million deaths annually of children under the age of five. One of the great scientific achievements of 2002 was the publication of the full genome (the DNA sequence) of both the parasite and the mosquito (Anopheles gambiae) that carries it to human victims, and the first public release of the full genome database (PlasmoDB) [PlasmoDB, 03]. For the first time ever, we have the complete triad of genomes involved in this disease (the parasite, the vector mosquito and the human host). A primary health goal is to develop new drugs to effectively treat and perhaps eradicate malaria as a major threat to human health. The genome database provides the list of the genes that are present in the parasite, but does not organize these genes into the pathways and networks of interactions that could be used to understand the underlying wiring of the parasite and how it works. Fortunately, there are other databases, including the MetaCyc database [MetaCyc, 03] of metabolic pathways, that can be used to assemble the genes into the metabolic machine that makes the parasite run. With a clear picture of this machine, we are able to identify vulnerable regions that can be targeted for interference with new drugs. In order to validate these targeted metabolic capabilities, we use other research databases (revealing where and when genes are turned on and off, including micro-array databases and proteomics databases) in order to prioritize the possible targets and assess their likelihood of success.

Given a set of genes that would be good targets, we can further filter them by comparing them to human genes in order to help ensure that the new drugs will not be toxic for human use. In some cases, the gene targets are proteins with known three-dimensional structures (or strong similarity to known structures), stored in the Protein Data Bank (PDB); in those cases we can explore the detailed atomic structure of these proteins, and use databases of existing compounds in order to get a detailed understanding of how a potential drug might actually interact with its target and what modifications might make the drug more potent. At the end of this pipeline, then, we will have a relatively small set of candidates for further drug development that have been filtered using disparate information sources, each of which provides a unique type of information. The resulting drugs can then be tested experimentally, and the process of drug discovery will have taken full advantage of all the relevant data sources upfront, thus decreasing the time to useful new drugs.

2.1.2 Semantic Heterogeneity

The need to manage bioinformatics data has been coming into increasingly sharp focus for some time. Years ago, these data sat in silos attached to specific applications. Then the Web came into the arena, bringing the hurly-burly of data becoming available across applications, departments and entities in general. However, throughout these developments, a particular underlying problem has remained unsolved: data resides in thousands of incompatible formats and cannot be systematically managed, integrated, unified or cleansed. To make matters worse, this incompatibility is not limited to the use of different data technologies or to the multiple different flavors of each technology (for example, the different relational databases in existence); the most challenging incompatibility arises from semantic differences. In principle, each data asset is set up with its own world-view and vocabulary, i.e. its schema. This incompatibility exists even if both assets use the same technology. For example, one database could have a table called Protein A intending to model a particular protein, classifying its function, categorizing it and relating it with some other proteins. Another database could simply refer to the same concept (the very same Protein A) as Protein Alfa, subdivided in a different way, related to some other proteins and linked to various functions. Since both proteins (despite being the same) present such dissimilarities, they will never be related or co-related. If a particular researcher wants to know all the information about Protein A, he or she will not be able to obtain a complete overview of the information, since these sources are absolutely unrelated. In a larger context, this problem may be multiplied by thousands of data structures located in hundreds of incompatible databases and message formats. And the problem is growing: bioinformatics-related techniques continue to gather more data, reengineer intense and massive data-processing techniques and integrate with more sources. Moreover, developers are continuing to write new applications and to create new databases based on requests from users, without worrying about overall data management issues.
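To make the Protein A example above concrete, the following minimal Python sketch (with invented, hypothetical table contents; not a real database schema) shows how a purely syntactic search over two such sources silently misses half of the available information:

```python
# Toy illustration of the semantic heterogeneity problem described above.
# All record and field names are hypothetical, chosen only to mirror the
# "Protein A" vs. "Protein Alfa" example.

# Database 1 models the protein under one name and world-view...
db1 = {
    "Protein A": {"function": "kinase", "category": "enzyme",
                  "related": ["Protein B", "Protein C"]},
}

# ...while Database 2 stores the very same protein under a different name,
# with a different sub-division of its attributes.
db2 = {
    "Protein Alfa": {"role": "phosphotransferase activity",
                     "linked_functions": ["signalling"],
                     "interacts_with": ["Protein Beta"]},
}

def naive_search(term):
    """A purely syntactic search: it can only match identical strings."""
    hits = []
    for source in (db1, db2):
        if term in source:
            hits.append(source[term])
    return hits

# The researcher asking about "Protein A" silently loses half the picture:
print(naive_search("Protein A"))     # finds only the db1 record
print(naive_search("Protein Alfa"))  # finds only the db2 record
# Without a shared model stating that both names denote the same entity,
# the two records can never be related or co-related.
```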

3 Related Work

In this section, we focus on the related work surrounding the scope of this work. Hence, we first give the rationale for the importance of bioinformatics and information integration by providing background and context for them. We also explain the requirements that motivate this work and provide a contextualized and well-referenced description of the related work.

The integration of heterogeneous data in life sciences is a growing and recognized challenge. As discussed in [Gopalacharyulu et al, 05], several approaches for biological data integration have been developed. Well-known approaches include rule-based links such as SRS [Etzold and Argos, 93] or [Etzold et al, 96], federated middleware frameworks such as the Kleisli system [Davidson et al, 97] or [Chung and Wong, 99], as well as wrapper-based solutions such as IBM DiscoveryLink [Hass et al, 01]. In parallel, progress has been made in organizing biological knowledge in a conceptual way by developing ontologies and domain-specific vocabularies, such as in [Ashburner et al, 00] or [Bard and Rhee, 04].

With the emergence of the Semantic Web, the ontology-based approach to life science data integration has become more prominent. In this context, data integration comprises problems like homogenizing the data model with schema integration, combining multiple database queries and answers, and transforming and integrating the latter to construct knowledge based on the underlying knowledge representation. However, the ontology-based approach cannot cope with evolving concepts in biology, and its best

promise lies in specialized domains and environments where concepts and vocabularies can be well controlled [Searls, 05].

An approach similar to the work presented here has been followed in [Gopalacharyulu et al, 05]. Their integration approach is based on the premise that relationships between biological entities can be represented as a complex network. Context dependency is achieved by a judicious use of distance measures on these networks. The biological entities and the distances between them are mapped, for the purpose of visualization, into a lower-dimensional space using Sammon's mapping. Finally, their system implementation is based on a multi-tier architecture using a native XML database and software tools for querying and visualizing complex biological networks. However, the approach is hampered by the fact that the mappings are stated at a pure XML level, without taking into account their particular semantics, and hence it cannot exploit the semantics inherent to the data formats. In this work, we present a novel approach to integrating micro-array data stemming from massive data-gathering experiments. Our future work will focus on finding more use cases and real-world scenarios to validate the efficiency of our approach and to determine the feasibility of the semantic matching of lightweight ontologies and mappings in particular contexts.

This section has described several requirements extracted from our example and provided definitions and contextualized explanations of them. In the next section, we will examine how these requirements are faced in the current state of the art.

4 SEBIO Fundamental Concepts

In this section, we first define the fundamental concepts encompassed by a Semantically-Enhanced Bioinformatics Platform (SEBIO): what bioinformatics is and what its main goals are, what the Semantic Web is and how it can help with data and information integration, and, finally, how we partition our approach into three main projects realizing this vision.

4.1 Bioinformatics

There is no commonly recognized worldwide definition of bioinformatics, and there are several standpoints when considering current bioinformatics goals. Fundamentally, [Cohen, 04] claims that the main role of bioinformatics is to aid biologists in gathering and processing genomic data to study protein function, and to aid medical researchers in making detailed studies of protein structures to facilitate drug design. From a more general perspective, these goals can be outlined as follows:

- Inferring a protein's shape and function from a given sequence of amino acids.
- Finding all the genes and proteins in a given genome.
- Determining sites in the protein structure where drug molecules can be attached.

Hence, the major role of bioinformatics is to help infer gene function from existing data, this data being varied, incomplete and noisy. For that, a number of techniques, strategies and approaches are summarized in the following, as explained in [Cohen, 04]:

- Comparing Sequences: Given the huge number of gene sequences available, algorithms to compare them must be developed that allow deletions, insertions and replacements of symbols representing nucleotides or amino acids, for such transmutations occur in nature (see the sketch after this list).
- Constructing Evolutionary (Phylogenetic) Trees: Often constructed after comparing sequences from different organisms, these trees group the sequences according to their degree of similarity. In particular, they serve as a guide for reasoning about how these sequences have been transformed through evolution.
- Detecting Patterns in Sequences: Using machine learning and probabilistic grammars or neural networks, the goal here is to detect significant patterns in DNA and amino acid sequences.
- Determining 3D Structures from Sequences: Inferring shapes from RNA sequences has proved difficult and remains an unsolved problem (cubic complexity).
- Inferring Cell Regulation: Genes interact with each other, and proteins can also prevent or assist the production of other proteins. What drives the behavior of these interactions? It is relevant to study the role of a gene or protein in a metabolic or signaling pathway.
- Determining Protein Function and Metabolic Pathways: The objective here is to interpret human annotations for protein function and also to develop databases representing graphs that can be queried for the existence of nodes (specifying reactions) and paths (specifying sequences of reactions).
- Assembling DNA fragments.
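As an illustration of the sequence-comparison task in the list above, here is a minimal dynamic-programming sketch in the style of Needleman-Wunsch global alignment; the scoring values are invented for illustration and not tuned for real biological use:

```python
# A minimal sketch of the "Comparing Sequences" task: global alignment scoring
# that allows insertions, deletions and replacements of symbols.

def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Dynamic-programming score of the best global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # deletions from a
    for j in range(1, m + 1):
        score[0][j] = j * gap          # insertions into a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            score[i][j] = max(diag,                 # match / replacement
                              score[i-1][j] + gap,  # deletion
                              score[i][j-1] + gap)  # insertion
    return score[n][m]

# Two toy DNA fragments differing by one deletion:
print(global_alignment_score("GATTACA", "GATACA"))  # -> 4
```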

A different approach is taken by [Ignacimuthu, 05], where the definition and main goals of bioinformatics are split into aims, tasks and areas. The main aims of bioinformatics are as follows:

- To organize data in a way that allows researchers to access existing information and to submit new entries as they are produced.
- To develop tools and resources that aid in the analysis and management of data.
- To use these tools to analyze the data and interpret the results in a biologically meaningful manner.

The main tasks involve the analysis of sequence information, which implies the following:

- Identifying the genes in the DNA sequences of various organisms.
- Developing methods to study the structure and function of newly identified sequences and corresponding structural RNA sequences.
- Identifying families of related sequences and developing models.
- Aligning similar sequences and generating phylogenetic trees to examine evolutionary relationships.

Finally, the main areas to be tackled are:

- Handling and management of biological data, including its organization, control, linkages, analysis and so on.

- Communication among people, projects and institutions engaged in biological research and applications.
- Organization, access, search and retrieval of biological information, documents and literature.
- Analysis and interpretation of the biological data through computational approaches concerning visualization, mathematical modeling and the development of algorithms for highly parallel processing of complex biological structures.

In a nutshell, and as far as the scope of this work is concerned, bioinformatics is the use of techniques from applied mathematics, informatics, statistics, and computer science to solve biological problems, whereby the integration and exchange of data within and among organizations is a universally recognized critical need. Bioinformatics and biomedical research deal with the problem of information and data integration, since, by far, the most obvious frustration of a life scientist today is the extreme difficulty of putting together information available from multiple distinct sources. As discussed in section 2, and in more detail in the real-world problem scenarios of section 2.1.1, a commonly noted obstacle for integration efforts in bioinformatics is that relevant information is widely distributed, both across the Internet and within individual organizations, and is found in a variety of storage formats, both traditional relational databases and non-traditional sources (e.g. text data sources in semi-structured text files or XML, and the results of analytic applications such as gene-finding applications or homology searches). Finally, as discussed in section 2.1.2, arguably a more critical need in data integration is to overcome semantic heterogeneity, i.e. to identify objects in different databases that represent the same or related biological objects (genes, proteins, etc.) and to resolve the differences in database structures, or schemas, among the related entities.

4.2 The Semantic Web

The Semantic Web term was coined in [Berners-Lee et al., 01] to describe the evolution of a Web that consisted largely of documents for humans to read towards a new paradigm that includes data and information for computers to manipulate. Ontologies

[Fensel, 02] are its cornerstone technology, providing structured vocabularies that describe a formal specification of a shared conceptualization. The fundamental aim of the Semantic Web is to provide a response to the ever-growing need for data integration on the Web. The benefit of adding semantics is bridging nomenclature and terminological inconsistencies so as to comprehend the underlying meaning in a unified manner. Semantics can be achieved by formally capturing the meaning of data: since a common data format will likely never be achieved, data can eventually be managed efficiently by establishing a common understanding [Shadbolt et al, 05].

The de facto Semantic Web standard ontology language is OWL (Web Ontology Language) [OWL, 04]. OWL is a markup language for publishing and sharing data using ontologies on the Internet. OWL is a vocabulary extension of the Resource Description Framework (RDF) and is derived from the DAML+OIL Web Ontology Language. The OWL specification is maintained by the World Wide Web Consortium (W3C). OWL currently has three flavors: OWL Lite, OWL DL, and OWL Full. These flavors incorporate different features; in general, it is easier to reason about OWL Lite than OWL DL, and about OWL DL than OWL Full. OWL Lite and OWL DL are constructed in such a way that every statement can be decided in finite time, whereas OWL Full can contain endless 'loops'. OWL DL is based on description logics; its subset OWL Lite is based on a less expressive logic. A more detailed explanation of the three increasingly expressive sublanguages, designed for use by specific communities of implementers and users, follows.

OWL Lite supports those users primarily needing a classification hierarchy and simple constraints. For example, while it supports cardinality constraints, it only permits cardinality values of 0 or 1 (see the sketch below). It should be simpler to provide tool support for OWL Lite than for its more expressive relatives, and OWL Lite provides a quick migration path for thesauri and other taxonomies. OWL Lite also has a lower formal complexity than OWL DL. OWL DL supports those users who want maximum expressiveness while retaining computational completeness (all conclusions are guaranteed to be computed) and decidability (all computations will finish in finite time). OWL DL includes all OWL language constructs, but they can be used only under certain restrictions (for example, while a class may be a subclass of many classes, a class cannot be an instance of another class). OWL DL is so named due to its correspondence with description logic, a field of research that has studied the logics that form the formal foundation of OWL. Finally, OWL Full is meant for users who want maximum expressiveness and the syntactic freedom of RDF, with no computational guarantees. For example, in OWL Full a class can be treated simultaneously as a collection of individuals and as an individual in its own right. OWL Full allows an ontology to augment the meaning of the pre-defined (RDF or OWL) vocabulary. It is unlikely that any reasoning software will be able to support complete reasoning for every feature of OWL Full.
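As a small illustration of these constructs, the following sketch builds an OWL Lite-style cardinality restriction as RDF triples using the Python rdflib library (assumed available); the ex: namespace and all class and property names are invented for illustration:

```python
# A small sketch of an OWL cardinality restriction: instances of a
# (hypothetical) ex:Gene class carry at most one ex:sequence value.
from rdflib import Graph, Namespace, BNode, Literal
from rdflib.namespace import OWL, RDF, RDFS, XSD

EX = Namespace("http://example.org/bio#")  # invented namespace
g = Graph()
g.bind("ex", EX)
g.bind("owl", OWL)

# Declare the class and the property.
g.add((EX.Gene, RDF.type, OWL.Class))
g.add((EX.sequence, RDF.type, OWL.DatatypeProperty))

# Anonymous restriction node: owl:maxCardinality 1 on ex:sequence.
restriction = BNode()
g.add((restriction, RDF.type, OWL.Restriction))
g.add((restriction, OWL.onProperty, EX.sequence))
g.add((restriction, OWL.maxCardinality,
       Literal(1, datatype=XSD.nonNegativeInteger)))
g.add((EX.Gene, RDFS.subClassOf, restriction))

print(g.serialize(format="turtle"))
```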

A more lightweight ontology language is the Resource Description Framework (RDF) [Hayes, 04]. RDF is a family of specifications for a metadata model that is often implemented as an application of XML. The RDF family of specifications is maintained by the World Wide Web Consortium (W3C). The RDF metadata model is based on the idea of making statements about resources in the form of subject-predicate-object expressions, called triples in RDF terminology. The subject is the resource, the "thing" being described. The predicate is a trait or aspect of that resource, and often expresses a relationship between the subject and the object. The object is the object of the relationship or the value of that trait. RDF's simple data model and its ability to model disparate, abstract concepts have also led to its increasing use in knowledge management applications unrelated to Semantic Web activity.
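The following minimal sketch, again using the rdflib library, shows the subject-predicate-object model in practice; it also previews the SPARQL querying used later by the MASIA Query Engine (section 5.5). The URIs and property names are hypothetical:

```python
# Each g.add() below states one triple: (subject, predicate, object).
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/bio#")  # invented namespace
g = Graph()

# Subject ex:BRCA1, predicate ex:locatedOn, object ex:Chromosome17.
g.add((EX.BRCA1, RDF.type, EX.Gene))
g.add((EX.BRCA1, EX.locatedOn, EX.Chromosome17))
g.add((EX.BRCA1, EX.label, Literal("Breast cancer type 1 gene")))

# A simple SPARQL query over the triples: which resources are genes,
# and where are they located?
results = g.query("""
    PREFIX ex: <http://example.org/bio#>
    SELECT ?gene ?loc WHERE { ?gene a ex:Gene ; ex:locatedOn ?loc . }
""")
for gene, loc in results:
    print(gene, loc)
```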

4.3 SEBIO Components

In recent years, technological advances in high-throughput techniques and efficient data gathering methods, coupled with a worldwide effort in computational biology, have resulted in a vast amount of life science data, often available in distributed and heterogeneous repositories. These repositories contain interesting information such as sequence and structure data, annotations for biological data, results of complex and expensive computations, genetic sequences and multiple bio-datasets. However, the multiplicity and heterogeneity in the objectives, methods, representation, and platforms of these data sources and analysis tools have created an urgent and immediate need for research in resource integration and platform-independent processing of investigative queries involving heterogeneous data sources and analysis tools. It is now universally recognized that a database approach to the analysis and management of biological data offers a convenient, high-level, and efficient alternative to high-volume biological data processing and management.

For this, we believe that the Semantic Web and Semantic Web Services paradigm (see section 4.2 for further details) promises a new level of data and process integration that can be leveraged to develop novel high-performance data and process management systems for biological applications. Using semantic technologies as a key technology for the interoperation of various datasets enables data integration of the vast amount of biological and biomedical data. The breakthrough of adding semantics to such data is that knowledge-oriented biomedical data integration leads to Intelligent Biomedical Data Integration, which will bring biomedical research to its full potential.

For that, we have divided SEBIO and its main goals into three main projects, which together achieve the aforementioned goals. Each of these projects tackles a particular feature of the SEBIO platform, namely: semantic data integration, semantic web services integration and, finally, literature data integration. The three projects and the features covered are shown in Figure 2.

Figure 2. SEBIO components

The Micro-Array Information and Data Integration Semantics-based Architecture (MASIA) is an architecture to enable the integration of micro-array data sources. The Biomedical Information Integration and Discovery with Semantic Web Services project (BIRD) aims at achieving fundamental integration of biomedical information sources. Finally, the Biomedical Literature Social Ranking System (BLISS) offers a wide range of documents and literature ranked in terms of interest in a number of topics.

These three projects are encompassed by the SEBIO approach, but significant research has been carried out in the context of each and every one of them. Hence, in the next sections, these projects will be discussed in detail, being the real outcome of this work.

5 MASIA: A Micro-Array Information and Data Integration Semantics-based Architecture

With the advent of online accessible bioinformatics data, information integration has become a fundamental issue. In the last years, the ability to perform biological in silico experiments has increased massively, largely due to high-throughput techniques gathering massive amounts of data. As the Semantic Web is maturing and data and information integration is growing, harnessing the synergy of both approaches can compensate for the lack of widely accepted standards by fostering the use of Semantic Web technologies to represent, store and query metadata and data across bioinformatics datasets. In this section, we present MASIA, a fully-fledged semantically-enhanced architecture for the integration of micro-array data sources. We propose the MGED ontology as a basis for integrating the various data formats and depict a use-case scenario to show the advantages of our approach.

5.1 Introduction and Goals

Searching and integrating data from various sources has become a fundamental issue in bioinformatics research. Particularly in those fields facing massive data gathering, the need for information integration is critical, preserving by all means the semantics inherent to the different data sources and formats. As discussed in [Ignacimuthu, 05], such integration would permit data to be organized properly, fostering the analysis of and access to such information to accomplish critical tasks, such as processing micro-array data to study protein function and helping medical researchers make detailed studies of protein structures to facilitate drug design.

A DNA microarray (also commonly known as gene chip, DNA chip, or biochip) is a collection of microscopic DNA spots attached to a solid surface, such as a glass, plastic or silicon chip, forming an array for the purpose of expression profiling: monitoring

expression levels for thousands of genes simultaneously. Measuring gene expression using micro-arrays is relevant to many areas of biology and medicine, such as studying treatments, disease, and developmental stages. For example, micro-arrays can be used to identify disease genes by comparing gene expression in disease and normal cells (a toy illustration is given at the end of this section). These experiments are, however, pouring out massive amounts of data which are stored in a number of data formats and data sources, hampering interoperability among them. The main problem looming over this lack of integration is the fact that the current Web is an environment primarily developed for human users and that micro-array data resources lack widely accepted standards, which leads to tremendous data heterogeneity. The need to add semantics to the Web and to use semantics to achieve information integration becomes even more critical as information systems become more complicated and data formats gain a more complex structure.

The Semantic Web is about adding machine-understandable and machine-processable metadata to Web resources through its key enabling technology: ontologies [Fensel, 02]. Ontologies are a formal, explicit and shared specification of a conceptualization. The goal of the Semantic Web is to provide a response to the ever-growing need for data integration on the Web. The benefit of adding semantics is bridging nomenclature and terminological inconsistencies to comprehend the underlying meaning in a unified manner.

In this section, we present MASIA, a Micro-Array Information and Data Integration Semantics-based Architecture. The breakthrough of MASIA is using semantics as a formal means of reconciling different vocabularies and terminologies and fostering integration. Firstly, the MASIA approach consists of a methodology to gather requirements, collect and classify metadata and the different data schemas stemming from the data resources to be integrated, construct a Unifying Information Model (UIM), rationalize the data semantics and utilize it. Secondly, we depict the MASIA software architecture as a fully-fledged software architecture by outlining the functionality of its components and its capability to enable integration.

The remainder of this section is organized as follows. Section 5.2 describes a number of micro-array data sources and the problem scenario. In section 5.3, micro-array data integration is introduced. Section 5.4 presents the MASIA methodology and requirements. Finally, section 5.5 depicts the proof-of-concept architecture.
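As a toy illustration of the disease-vs-normal comparison mentioned above, the following sketch flags differentially expressed genes by log2 fold change; the gene names and numbers are invented, and a real analysis would add normalization and statistical testing:

```python
# Given expression measurements for each gene in disease and normal cells,
# flag genes whose absolute log2 fold change exceeds a threshold.
from math import log2

expression = {           # gene -> (mean disease signal, mean normal signal)
    "geneA": (850.0, 100.0),
    "geneB": (120.0, 110.0),
    "geneC": (40.0, 400.0),
}

def differentially_expressed(data, threshold=2.0):
    """Return genes with |log2(disease/normal)| >= threshold."""
    hits = {}
    for gene, (disease, normal) in data.items():
        fold = log2(disease / normal)
        if abs(fold) >= threshold:
            hits[gene] = round(fold, 2)
    return hits

print(differentially_expressed(expression))
# {'geneA': 3.09, 'geneC': -3.32} -> up- and down-regulated candidates
```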

5.2 Micro-Array Data Sources

In this section, we first discuss the current caveats in the integration of biological data. We then focus on micro-array data integration problems and depict how the MGED ontology can serve as a backbone for the integration of such formats.

5.2.1 Heterogeneity in Biological Data

The need to manage bioinformatics data has been coming into increasingly sharp focus for some time. Years ago, these data sat in silos attached to specific applications. Then the Web came into the arena, bringing the hurly-burly of data becoming available across applications, departments and entities in general. However, throughout these developments, a particular underlying problem has remained unsolved: data resides in thousands of incompatible formats and cannot be systematically managed, integrated, unified or cleansed. To make matters worse, this incompatibility is not limited to the use of different data technologies or to the multiple different flavors of each technology (for example, the different relational databases in existence); the most challenging incompatibility arises from semantic differences. In principle, each data asset is set up with its own world-view and vocabulary, i.e. its schema. This incompatibility exists even if both assets use the same technology. For example, one database could have a table called Protein A intending to model a particular protein, classifying its function, categorizing it and relating it with some other proteins. Another database could simply refer to the same concept (the very same Protein A) as Protein Alfa, subdivided in a different way, related to some other proteins and linked to various functions. Since both proteins (despite being the same) present such dissimilarities, they will never be related or co-related. If a particular researcher wants to know all the information about Protein A, he or she will not be able to obtain a complete overview of the information, since these sources are absolutely unrelated. In a larger context, this problem may be multiplied by thousands of data structures located in hundreds of incompatible databases and message formats. And the problem is growing: bioinformatics-related techniques continue to gather more data, reengineer intense and massive data-processing techniques and integrate with more sources. Moreover, developers are continuing to write new applications and to create new databases based on requests from users, without worrying about overall data management issues.

5.2.2 Micro-Array Data Sources

The lack of standardization in arrays presents an interoperability problem in bioinformatics, which hinders the exchange of array data. A number of micro-array data sources scattered all over the world provide such information. One of the most prominent efforts is the Stanford Micro Array Database (SMD), a database of micro-array experiment results hoarding data from a large number of experiments (many of them public), 7192 spots, 1553 users, 309 labs, 43 organisms and 29 publications. Another micro-array data source is the European Bioinformatics Institute (EBI) ArrayExpress, a public repository for micro-array data, complemented by the ArrayExpress Data Warehouse, which stores gene-indexed expression profiles from a particular subset of experiments in the repository. Likewise, the maxd project from the University of Manchester is a data warehouse and visualization environment for genomic expression data.

The aforementioned lack of standardization in the data formats of these resources is hampering the potential exchange of array data and analyses. Various grass-roots open-source projects are attempting to facilitate the exchange and analysis of data produced with non-proprietary chips. The "Minimum Information About a Micro-array Experiment" (MIAME) XML-based standard for describing a micro-array experiment is being adopted by many journals as a requirement for the submission of papers incorporating micro-array results. The goal of MIAME is to outline the minimum information required to unambiguously interpret, and potentially reproduce and verify, an array-based gene expression monitoring experiment. Although the details of particular experiments may differ, MIAME aims to define the core that is common to most experiments. MIAME is not a formal specification, but a set of guidelines. A standard micro-array data model and exchange format, MAGE, which is able to capture the information specified by MIAME, has been submitted by the EBI (for MGED) and Rosetta Biosoftware, and recently became an Adopted Specification of the OMG standards group. Many organizations, including Agilent, Affymetrix, and Iobion, have contributed ideas to MAGE. Also, the ArrayExpress source from EBI claims to be compliant with this initiative.

However, the heterogeneity among these formats, and many others that lie beyond the scope of this report, has triggered the development of a conceptual model or ontology, the MGED ontology, which is detailed in the next section.

5.2.3 The MGED Ontology

The MGED ontology is a conceptual model for micro-array experiments in support of MAGE v.1. The aim of MGED is to establish concepts, definitions, terms, and resources for the standardized description of a micro-array experiment in support of MAGE v.1. The MGED ontology is divided into the MGED Core ontology, which is intended to be stable and in sync with MAGE v.1, and the MGED Extended ontology, which adds further associations and classes not found in MAGE v.1. Since MGED has been recognized as a de facto unifying terminology by most of the actors in the micro-array data sources scenario, it is a perfect gold-standard candidate for a common understanding model. The MGED ontology is depicted in Figure 3.

Figure 3. The MGED Ontology

As previously mentioned, the primary purpose of the MGED ontology is to provide standard terms for the annotation of micro-array experiments. These terms enable structured queries over elements of the experiments. Furthermore, the terms also enable unambiguous descriptions of how an experiment was performed. The terms are provided in the form of an ontology, which means that they are organized into classes with properties and are defined. A standard ontology format is used.

For descriptions of biological material (biomaterial) and certain treatments used in an experiment, terms may come from external resources that are specified in the ontology. Software programs utilizing the ontology are expected to generate forms for annotation, populate databases directly, or generate files in the established MAGE-ML format. Thus, the ontology will be used directly by investigators annotating their micro-array experiments as well as by software and database developers, and is therefore developed with these very practical applications in mind. Currently, the MGED ontology has an OWL syntax (see section 4.2); it previously had an RDF-S syntax, which was retired due to incompleteness. The fundamental importance of an ontology as a formal specification of a shared conceptualization, and its impact on our approach, is discussed in section 4.2, where the Semantic Web languages and their formal foundations are further explained.

5.3 Micro-Array Data Integration

Micro-array experiments involve massive data gathering. Efforts to integrate micro-array data from such experiments will always have to struggle with a large number of physically different data formats. While a common data format will likely never be achieved, the key to efficiently managing data is to establish a common understanding. This is the idea behind semantics: bridging nomenclature and terminological inconsistencies to comprehend the underlying meaning in a unified manner. Semantics can be achieved by formally capturing the meaning of data. This is accomplished by relating physical data schemas to concepts in an agreed-upon model. We call this central model, which acts as a unifying entity under whose umbrella the physical data schemas are related, a Unifying Information Model (UIM). The UIM does not reflect any specific data model, but rather the agreed-upon scientific view, scientific vocabulary and rules which provide a common basis for understanding data. Semantics builds upon traditional informal metadata and captures the formal meaning of data in agreed-upon terms.

For example, in the MIAME context, a number of biosource properties (properties that stem from the probe or sample being examined in the experiment) must be provided, namely: organism (NCBI taxonomy), contact details for the sample, and several descriptors relevant to the particular sample. These descriptors are: sex, age, development stage, organism part (tissue), cell type, animal/plant strain or line, genetic variation (e.g., gene knockout, transgenic variation), individual genetic characteristics (e.g., disease alleles,

polymorphisms), disease state or normal, additional clinical information available, and the individual (for the interrelation of the samples in the experiment). Clearly, a proper management of this hierarchical data structure must take place. Following our approach, the UIM might capture the major meaning or intention of concepts such as descriptors, and of more specific concepts such as sex, age and so forth. A semantic mapping will then relate the physical data schemas of the various micro-array data formats to the Unifying Information Model. For instance, a semantic mapping might capture the fact that one of the micro-array experiments has a concept called age that is called maturity in a relational database table, years in the XML Schema of another experiment, and time-to-live in yet another experiment (sketched in code at the end of this section). The semantic mapping therefore formally captures the meaning of the data by reference to the agreed-upon terminology, in this case the MIAME model as a basis for the UIM.

The motivation to capture data semantics gains momentum when data management and use become critical for the analysis and statistical treatment of the data. Essentially, semantics saves time by capturing the meaning of data once. Without semantics, each data asset will be interpreted multiple times by different users as it is designed, implemented, integrated, cleansed, extended, extracted and eventually decommissioned. This independent interpretation is time-consuming and error-prone. With clearly defined semantics, the data asset is mapped and interpreted only once, ready to be related, processed and linked properly with a number of other, subsequent assets. Secondly, new assets can be generated from the Unifying Information Model. Finally, the most significant impact of semantics is a strategic one. Semantics turns hundreds of data sources into one coherent body of information. The semantic architecture includes a record of where data is and what it means. Using this record of which information is represented in each data asset, it becomes possible to automate the search for overlap and redundancy. The UIM provides the basis for creating new data assets in a consistent way and serves as a reliable reference for understanding the interrelationship between disparate sources and for automatically planning how to translate between them. Finally, semantics provides a central basis for impact analysis and for the smooth, computer-aided realization of change. In summary, semantics plays a key role in integration, enabling an effective strategic approach to data management. In the following section, we will present the MASIA methodology and architecture, which turn micro-array data integration based on a UIM into a solution for the data integration heterogeneity problem scenario.
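The following minimal sketch makes the age/maturity/years/time-to-live example concrete; the mapping tables and records are hypothetical, and a real MASIA mapping would target the MGED ontology rather than plain dictionaries:

```python
# Each source schema declares how its local field names map onto the
# agreed UIM terms; records from any source can then be translated into
# one shared vocabulary.

UIM_MAPPINGS = {
    "relational_db": {"maturity": "age", "organism_name": "organism"},
    "xml_schema":    {"years": "age", "species": "organism"},
    "experiment3":   {"time-to-live": "age", "organism": "organism"},
}

def to_uim(source, record):
    """Translate a source-specific record into UIM terminology."""
    mapping = UIM_MAPPINGS[source]
    return {mapping.get(field, field): value for field, value in record.items()}

# Three physically different records now share one vocabulary:
print(to_uim("relational_db", {"maturity": 12, "organism_name": "M. musculus"}))
print(to_uim("xml_schema", {"years": 12, "species": "M. musculus"}))
print(to_uim("experiment3", {"time-to-live": 12, "organism": "M. musculus"}))
# All three yield {'age': 12, 'organism': 'M. musculus'}, so they can be
# compared, integrated and queried as one coherent body of information.
```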

5.4 The MASIA Approach

In this section, we present the MASIA methodology to tie together both ends: the data sources and the Unifying Information Model (UIM). We also present the functional software requirements of the MASIA system.

Some problems have to be faced when trying to use Semantic Information Management. Firstly, a fragmented data environment leads to business information quality problems: how can the gap be bridged between simple data and structured information? Also, information management is a key issue in a dynamic environment such as the modern enterprise, where application deployment, business process reengineering or possible restructuring of the data models leads to a burden of hard-coded scripts, data assets and proprietary definitions. Finally, the meaning and context of the data must be captured and managed in a way that represents long-term value for the enterprise. How to bridge the gap between this situation and the Semantic Information Management level is defined by the Semantic Information Management methodology. This methodology is structured such that each stage adds value in its own right, while simultaneously progressing the enterprise towards the benefits of full semantic data integration.

1. Gather requirements: Establish the project scope, survey the relevant data sources and capture the organization's information requirements.
2. Collect and classify metadata: Catalog data assets and collect metadata relevant to the organization and its use of data.
3. Construct the Unifying Information Model: Capture the desired business world-view, a comprehensive vocabulary and business rules.
4. Rationalize the data semantics: Capture the meaning of data by mapping to the Information Model.
5. Publish/Deploy: Share the Information Model, metadata and semantics with relevant stakeholders; customize it to their specialized needs.
6. Utilize: Create processes to ensure utilization of the architecture in achieving data management, data integration and data quality.

Figure 4. Methodology

For these steps to be followed, the system must also present some features. Successful semantically-enhanced integration must be supported by an appropriate suite of architectural components, which should be fully integrated. Key components of the supporting system should include:

- Metadata Repository: A repository for storing metadata on data assets, schemas and models associated with the assets.
- Data Semantics: Integrated tools for ontology modeling to support the creation of a Unifying Information Model and for semantically mapping data schemas to that Unifying Information Model.
- Data Management Services: The system should use the Unifying Information Model's standard business terminology as a lens through which data is managed. Data management should include the ability to author and edit the Information Model, discover data assets for any given business concept, administer data, create reports and statistics about data assets, test and simulate the Unifying Information Model, and analyze impact in support of change.
- Data Integration Services: The system should automatically generate code for queries and data transformation scripts between any two mapped data schemas, utilizing the common understanding provided by data semantics.
- Data Quality Services: In order to provide a systematic approach to data quality, the system should support the identification and decommissioning of redundant data assets. It should support comparison for ensuring consistency among semantically different data and the validation/cleansing of individual sources against the central repository of rules.

Metadata Interface: The system must be able to collect metadata and data models directly from relational databases and other asset types, and to exchange metadata with other metadata repositories. Similarly, the metadata and models accumulated by the system must be open to exchange with other systems through the use of adaptors and standards such as XMI (the XML Metadata Interchange standard).

Run-Time Interface: A key differentiator of semantic information technology is its active data integration capabilities. The Run-Time Interface ensures that queries, translation scripts, schemas and cleansing scripts generated automatically by the system may be exported using standard languages.

User Interface: The User Interface should include a rich client for power users in the data management group.

Platform: The system should include a platform supporting version control, collaboration, permission management and configuration for all metadata and active content in the system.

In this section, we have presented the MASIA methodology and software functional requirements, including a brief description of the functionality and capabilities of particular software components.
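To make the Data Integration Services requirement concrete, the following is a minimal sketch, in Python, of mapping-driven query generation: two schemas are mapped to shared UIM concepts, field correspondences are derived, and a trivial SQL transformation script is emitted. All schema, table and concept names are illustrative assumptions, not MASIA's actual vocabulary.

```python
# Minimal sketch of mapping-driven query generation (not the actual MASIA
# code). Each mapping relates a source field to a UIM concept; fields of
# two schemas that share a concept are paired, and a trivial SQL
# transformation script is generated from the pairing.

SCHEMA_A = {"gene_sym": "uim:GeneSymbol", "expr_val": "uim:ExpressionLevel"}
SCHEMA_B = {"symbol": "uim:GeneSymbol", "level": "uim:ExpressionLevel"}

def field_correspondences(src: dict, dst: dict) -> dict:
    """Pair fields of two schemas that are mapped to the same UIM concept."""
    dst_by_concept = {concept: field for field, concept in dst.items()}
    return {f: dst_by_concept[c] for f, c in src.items() if c in dst_by_concept}

def generate_transform(src_table: str, dst_table: str, src: dict, dst: dict) -> str:
    """Emit a simple SQL script that moves data between mapped schemas."""
    pairs = field_correspondences(src, dst)
    cols = ", ".join(pairs.values())
    select = ", ".join(pairs.keys())
    return f"INSERT INTO {dst_table} ({cols}) SELECT {select} FROM {src_table};"

print(generate_transform("array_a", "array_b", SCHEMA_A, SCHEMA_B))
# INSERT INTO array_b (symbol, level) SELECT gene_sym, expr_val FROM array_a;
```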

5.5 The MASIA Software Architecture

In this section, we present a novel and promising architecture to tackle the situation depicted in the previous section. We propose a tailor-made, value-adding technological solution which addresses the aforementioned challenges and solves the integration problem of searching, finding, interacting with and integrating heterogeneous sources by means of semantic technologies. The MASIA architecture is composed of a number of components, depicted in the following figure.

Figure 5. The MASIA Software Architecture

These components are detailed in what follows:

Crawler: A software agent which browses the information sources in a methodical, automatic manner. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

Mappings Engine: The Mappings Engine is a set of integrated tools for semantically mapping data schemas to the Unifying Information Model. Its purpose is to support the semi-automatic mapping of schemas to concepts or categories of the UIM, alleviating a tedious process that otherwise requires human intervention. Since fully automatic mapping is not advisable, owing to semantic incompatibilities and ambiguities among the source schemas and data formats, the engine bridges the gap between cost-efficient machine-learning mapping techniques and purely manual interaction. The Mappings Engine takes the MGED ontology (see section 5.2.3) as a conceptual basis for the mappings from the various sources. It then relates, as explained in section 5.3, data schemas with the semantic structure of the ontology.
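As an illustration of this semi-automatic flavour, the sketch below suggests candidate field-to-term mappings by string similarity and leaves confirmation to a human curator. The term list is a simplified stand-in for MGED ontology classes, and the threshold and scoring are illustrative assumptions, not the engine's actual logic.

```python
# Illustrative sketch of a semi-automatic mapping suggestion step (assumed
# names, not the real Mappings Engine): candidates are ranked by string
# similarity and presented to a human curator for confirmation.
from difflib import SequenceMatcher

MGED_TERMS = ["BioSample", "LabeledExtract", "Hybridization", "Organism"]

def suggest_mappings(fields, terms=MGED_TERMS, threshold=0.5):
    """Return (field, term, score) candidates for human confirmation."""
    candidates = []
    for field in fields:
        for term in terms:
            score = SequenceMatcher(None, field.lower(), term.lower()).ratio()
            if score >= threshold:
                candidates.append((field, term, round(score, 2)))
    return sorted(candidates, key=lambda c: -c[2])

for field, term, score in suggest_mappings(["bio_sample", "organism_name"]):
    print(f"{field} -> {term} (similarity {score})")
```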

YARS: The YARS (Yet Another RDF Store) system is a semantic data store that allows semantic querying and offers a higher abstraction layer enabling fast storage and retrieval of large amounts of RDF (see section 4.2 for more details about Semantic Web languages such as RDF), while keeping a small footprint and a lightweight architecture. YARS deals with data and legacy integration.

GUI: This is the component that interacts with the user. It collects the user's request and presents the results obtained. In our architecture, the GUI collects requests pertaining to search criteria, such as, for example, a descriptor. The GUI passes the user's request to the Execution Manager component and displays the results provided in response.

Query Engine: The Query Engine component uses a query language to issue queries against the YARS storage system. The semantics of a query are defined not by a precise rendering of a formal syntax, but by an interpretation of the most suitable results of the query. Since YARS stores RDF triples (see section 4.2 for more details about Semantic Web languages), there are several candidate query languages. One is RDQL (the RDF Data Query Language), a W3C member submission that has been superseded by the SPARQL Query Language for RDF, now a W3C Recommendation. Since YARS supports SPARQL querying, SPARQL is, for pragmatic reasons, the query language of our choice.

Execution Manager: The Execution Manager component is the main component of the architecture. It manages the different interactions among the components. Firstly, it communicates with the Mappings Engine to verify that the information extracted by the Crawler is being correctly mapped onto the MGED ontology as Unifying Information Model (UIM) and finally stored into YARS in RDF syntax. Secondly, it accepts the user's search requests through the GUI and hands them over to the Query Engine, which, in turn, queries YARS to retrieve all RDF triples related to the particular search criteria. By retrieving a large number of triples from all the integrated resources, the user benefits from a knowledge-aware search response mapped to the underlying terminology and unified criteria of the Unifying Information Model, with the added advantage that all resources can be tracked and identified separately, i.e., data provenance can be traced and assigned to a particular resource.
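The following is a minimal sketch of the kind of SPARQL query the Query Engine could issue; rdflib stands in for YARS here, and the namespace and property names are illustrative assumptions rather than the actual MASIA vocabulary.

```python
# Minimal sketch of a descriptor search over a triple store. rdflib is used
# as a stand-in for YARS; URIs and properties are illustrative assumptions.
from rdflib import Graph, Literal, Namespace, URIRef

UIM = Namespace("http://example.org/uim#")

g = Graph()
g.add((URIRef("http://example.org/exp/1"), UIM.descriptor, Literal("osteoclast")))
g.add((URIRef("http://example.org/exp/1"), UIM.source, Literal("ArrayExpress")))

# Retrieve every experiment annotated with a given descriptor.
query = """
PREFIX uim: <http://example.org/uim#>
SELECT ?exp ?source WHERE {
    ?exp uim:descriptor "osteoclast" .
    ?exp uim:source ?source .
}
"""
for exp, source in g.query(query):
    print(exp, source)
```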

6 BLISS: A Biomedical Literature Social Ranking System

With the explosion of online accessible bioinformatics literature, selection of the most suitable resources has become very important for further progress. Bioinformatics literature access relies heavily on the Web, but searching for quality literature is hindered by information overload. Recently, the exchange of information on the Web has gained momentum with the rise of socially oriented collaborative trends. Phenomena such as blogging, wikis or social software sites such as Digg or Slashdot have emerged as a paradigm shift in which the consumer-producer equation on the Web has been inverted. The increasing success of these initiatives in pointing at and recommending resources on the Web is fuelling a new type of social recommendation for the discovery and location of resources. Together with Semantic Web technologies and vocabularies that have gained momentum and proved useful, they can help to overcome the significant shortcomings of information overload and foster sharing and collaboration through semantics. In this section, we present the BLISS system, a proof-of-concept implementation of a biological literature social ranking system used in the bioinformatics field.

6.1 Introduction and Goals

Over the past fifteen years we have witnessed a dramatic transformation in the practice of life sciences research. As a consequence, biomedical literature has increased exponentially, yet so far its use has only picked the low-hanging fruit of endlessly searching documents and articles with old-fashioned information retrieval techniques. Achieving the full potential of current search over biomedical information resources, fundamentally articles about a particular topic or subject, needs the ladder of IT to reach the higher branches.

The Web is undergoing significant change with regard to how people communicate. A shift in the Web content consumer-producer paradigm is making the Web a means of conversation, cooperation and mass empowerment. Emerging killer applications combine information sharing with a social dimension, undermining the very principles on which content delivery has relied for decades, namely information asymmetry and top-down distribution. The Semantic Web has emerged as an attempt to provide machine-processable metadata for the ever-increasing information resources on the Web. Following this paradigm shift, initiatives such as the FOAF project [FOAF, 05] or the SIOC vocabulary [SIOC, 05] aim at fostering the social aspect. In these approaches, a vocabulary and proper semantics are defined for widely used terms, which thereafter benefit from annotation, taxonomy or tagging, and custom semantics. This does not follow the traditional network definition of devices or objects (phones, fax machines, computers or documents) being linked, but moves to the next level, where what is being linked are people and organizations [Reed et al, 05].

The breakthrough of adding semantic metadata to services (in our case, Web Services) is the ability to enable automatic or semi-automatic discovery. However, this leads to the so-called chicken-and-egg problem of metadata: the provider of a service demands a good reason, application or benefit before providing metadata; yet, if the metadata is not generated, no application or value-added functionality can be achieved. In this section, we argue that collaborative discovery can bridge a number of caveats found in real-world scenarios, based on an analogy with how current Web resources are being found, shared and provided by the aforementioned social software trends. Collaborative discovery would thus learn from the recent changes on the Web, becoming simply old wine in new bottles. The remainder of this section is organized as follows. Section 6.2 discusses several aspects of collaborative discovery. Section 6.3 discusses how collaborative discovery could bridge the gap between semantically-enabled discovery techniques and social software trends. Finally, section 6.4 provides an overview of BLISS, a proof-of-concept implementation used in the bioinformatics field, and concludes.

6.2 Collaborative Discovery

It has recently been acknowledged that the blogging and social bookmarking phenomena are among the most popular means of communication on the Web, affecting public opinion and mass media around the world [Kline et al, 05]. A weblog, or blog, is simply a website in which items are posted and displayed with the newest at the top, combining text, images and links to other websites or blogs. Social bookmarking consists in locating, ranking and sharing bookmarks, and classifying them appropriately by tagging them, i.e., assigning them a tag, a descriptive keyword which summarizes the category they belong to. Moreover, blogging and social bookmarking have converged in collaborative websites such as Digg or Slashdot. In these websites, news, stories and pointers to the location of other interesting Web resources are submitted by users, and then promoted to the front page through a user-based ranking system. This differs from the hierarchical editorial system that many other news sites employ. More particularly, in Digg, readers can go through all of the stories (or pointers to resource locations) that have been submitted by users in the "digg all" section of the site. A digg is a vote by which a registered user signals the importance of, and interest in, a submission. Once a story has received enough "diggs", depending on the calculations performed by Digg's algorithm, it appears on Digg's front page. Should the story not receive enough diggs, or if enough users make use of the problem report feature to point out issues with the submission, the story remains in the "digg all" area, where it may eventually be removed. If a user wants to look for a particular resource on a particular topic, he can use the tag hierarchy to navigate through topics and then find resources that have been considered interesting and useful by the other Digg users.
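Digg's actual promotion algorithm is proprietary; purely as an illustration of a user-based ranking system, the following toy sketch promotes a story once its votes cross a threshold, unless enough problem reports have been filed. The thresholds and names are assumptions, not Digg's real parameters.

```python
# Toy sketch of a user-based ranking system in the spirit of Digg (the real
# algorithm is proprietary): enough diggs promote a story to the front
# page, unless enough problem reports block it.
from dataclasses import dataclass

PROMOTE_AT = 50   # diggs needed for the front page (illustrative)
BURY_AT = 10      # problem reports that block promotion (illustrative)

@dataclass
class Story:
    title: str
    url: str
    diggs: int = 0
    reports: int = 0

    def digg(self):
        self.diggs += 1

    def report_problem(self):
        self.reports += 1

    @property
    def promoted(self) -> bool:
        return self.diggs >= PROMOTE_AT and self.reports < BURY_AT

story = Story("Osteoclast gene expression survey", "http://example.org/article")
for _ in range(60):
    story.digg()
print(story.promoted)  # True: enough diggs and few problem reports
```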

Figure 6. A screenshot from Digg

This can be envisaged as a collaborative search strategy in which the users act as a filter against the current information overload on the Web. Fundamentally, users are building up a kind of human-generated metadata, a distributed mechanism for locating genuinely valuable resources.

6.3 Bridging the Gap: Social Semantics

In principle, a user aims at finding a particular resource to fulfill a particular goal [Fensel & Bussler, 02]. For example, a user who would like to locate a history book would simply try one of the well-known, worldwide search engines such as Google or Yahoo. Fundamentally, this can be achieved by two means: either the service provider (e.g., Amazon) provides metadata and waits for a software agent to find, interpret and finally access it, or a third party points to the service and vouches for its quality. An issue that has loomed over the first approach is that the provider's lack of motivation, accuracy or efficiency in providing the metadata hampers its full potential. As we have mentioned in the introduction, this is where the so-called chicken-and-egg problem of metadata shows up.

In the latter approach, let us imagine John, a user of the aforementioned Digg-style software, who points to a great library where the resource can be found by bookmarking, blogging, ranking and qualifying it. The gist of the matter consists of turning this simple recommendation into human-generated metadata. This is what can be achieved using Social Semantics, i.e., semantic metadata harvested from social collaborative software. First of all, automatically generated metadata can be extracted from John's assertion in a machine-understandable manner, using an RDF vocabulary such as the Semantically Interlinked Online Communities [SIOC, 05] vocabulary; this implies having a machine-readable syntax and lightweight semantics provided by the RDF graph (fundamentally, relationships among resources); a sketch follows below. Secondly, since the assertion has been assessed, ranked and filtered in a collaborative manner by the users of the collaborative software environment, it enjoys a wide consensus, which turns the assertion into significant, precise and reliable information. Finally, all of this is achieved without the least effort or hassle on the service provider's side, in a cost-effective manner, and it is ready to be widely used.

In addition, the tagging system certainly constitutes an interesting development, since the folksonomies emerging organically appear to be a potential source of metadata. They arise because a large number of people are interested in particular information and are encouraged to describe it; rather than a centralized form of classification, they are a free, bottom-up attempt to classify information [Shadbolt et al, 06]. They are close to the concept of shallow ontologies, which comprise relatively few unchanging, frequently recurring terms and relations that organize very large amounts of data. Finally, regarding how further metadata could be harvested from the blogging side of such websites, [Karger and Quan, 04] shows how blogs provide an important source of metadata. In a nutshell, a blog is structured and annotated, usually syndicated in RSS 2.0 (not RDF-compliant) but sometimes in RSS 1.0 (based on RDF). It also complies with a well-known, widely used structure that already provides some metadata, which could be expressed in, for example, Dublin Core. Eventually, it is also categorized and tagged. Regarding the effectiveness of the approach, rather than bogging down in details, we note that a number of social science studies on the opinion of the masses and on opinion leaders [Wolf, 87] indicate that critical mass, in terms of balancing biases of interest and attracting attention, is highly effective.
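The following is a minimal sketch of how John's recommendation could be expressed as RDF with the SIOC and FOAF vocabularies using rdflib. The resource URIs are illustrative assumptions, and the property choice is one plausible profile rather than a normative SIOC usage pattern.

```python
# Minimal sketch: a Digg-style recommendation turned into an RDF graph
# using SIOC and FOAF. URIs are illustrative assumptions.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF

SIOC = Namespace("http://rdfs.org/sioc/ns#")

g = Graph()
g.bind("sioc", SIOC)
g.bind("foaf", FOAF)

john = URIRef("http://example.org/user/john")
post = URIRef("http://example.org/post/42")

g.add((john, RDF.type, SIOC.UserAccount))
g.add((john, FOAF.name, Literal("John")))
g.add((post, RDF.type, SIOC.Post))
g.add((post, SIOC.has_creator, john))
g.add((post, SIOC.topic, URIRef("http://example.org/topic/history-books")))
g.add((post, SIOC.links_to, URIRef("http://library.example.org/resource/123")))

print(g.serialize(format="turtle"))
```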

These semantic capabilities are the cornerstones of what we call SITIO, a Social Semantic Recommendation system. The idea is to provide a collaborative discovery system such as Digg or Slashdot with the aforementioned semantic metadata. The advantage, apart from what has been previously stressed, is that keyword-based query answering does not allow exploiting the semantics inherent in these structured or semi-structured data formats; a semantic system also benefits from formal reasoning and inference strategies for classifying and relating information.

6.4 BLISS: A Biological Literature Social Ranking System

As pointed out in [Cohen, 04], it is undeniable that, among the sciences, biology played a key role in the twentieth century, and that role is likely to acquire further importance in the years to come. In the wake of the work on DNA and the sequencing of the human genome, far-reaching discoveries are constantly being made. One of the central factors promoting the importance of biology is its relationship with medicine: fundamental progress in medicine depends on elucidating some of the mysteries that occur in the biological sciences. Biologists need software that is reliable and can deal with huge amounts of data, as well as interfaces that facilitate human-machine interaction.

Consider a biologist working on osteoporosis, a major bone disease affecting millions of people. To target osteoporosis, it is important to understand the balance between osteoblasts, the cells which produce bone substance, and osteoclasts, the cells which consume it. Imagine that a scientist would like to find fundamental, best-of-breed information about gene expression levels of osteoclasts during cell differentiation. He would also wish to find all the important articles about the subject without having to wade through the vast amount of literature found in digital libraries, journals and huge information repositories. All this information, taken together, may lead to insights into which proteins may be suitable targets to treat bone-related diseases. Most of the needed information and analysis tools are accessible over the Web. However, they are designed for low-throughput human use, not for high-throughput automated use. The vision of a Semantic Web for bioinformatics transparently integrates some of these resources through the use of mark-up languages, ontologies and metadata provided for the applications involved in this process.

The Biological Literature Social Ranking System (BLISS) is a joint research effort between the Universidad Carlos III de Madrid and the Laboratory of Bioinformatics and Medical Informatics of the University of Texas at Dallas (UTD). A screenshot of BLISS is depicted in the next figure.

Figure 7. The BLISS implementation

The main features of the system are outlined as follows:

The user (biologist, bioinformatician, medical practitioner, etc.) finds an article interesting and wants to communicate it to the community. To do so, he submits the article (providing a URL as a pointer) and selects a category under which it is relevant (e.g., Yeast or Lung Cancer).

Users who join the system can, given their experience in the field, vote on and hence rank the documents. The more votes an article receives, the higher it climbs.

New users can then be recommended a number of articles of particular importance for a number of topics, which, given the social nature of the approach, ensures the quality of, and feedback on, the articles.
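The sketch below illustrates this submission, voting and recommendation workflow; the class and method names are illustrative, not the actual BLISS implementation.

```python
# Toy sketch of a BLISS-style submission, voting and recommendation
# workflow; names are illustrative assumptions.
from collections import defaultdict

class Bliss:
    def __init__(self):
        self.by_category = defaultdict(list)  # category -> [article URLs]
        self.votes = defaultdict(int)         # article URL -> vote count

    def submit(self, url: str, category: str):
        """A user submits an article under a category (e.g. 'Yeast')."""
        self.by_category[category].append(url)

    def vote(self, url: str):
        """An experienced user votes an article up."""
        self.votes[url] += 1

    def recommend(self, category: str, top_n: int = 3):
        """Suggest the most highly voted articles in a category."""
        ranked = sorted(self.by_category[category], key=lambda u: -self.votes[u])
        return ranked[:top_n]

bliss = Bliss()
bliss.submit("http://example.org/article/osteoporosis-review", "Osteoporosis")
bliss.vote("http://example.org/article/osteoporosis-review")
print(bliss.recommend("Osteoporosis"))
```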

In BLISS, the biologist finds all the information regarding osteoporosis classified under the relevant categories (organized under the Survey of Oncology Society taxonomy) where users of the system have recommended it. Beyond that, BLISS provides relevant metadata that can be harvested and used for intelligent collaborative discovery, as discussed in section 6.2. For example, BLISS provides a labelled graph (based on an RDF representation) of resources recommended under a common topic or with similar features.

MEDLINE is a major repository of biomedical literature supported by the U.S. National Library of Medicine (NLM). It currently collects and maintains more than 15 million abstracts in the fields of biology and medicine, and grows by thousands of new articles every day. PubMed is the most popular interface to access the MEDLINE database. If the osteoporosis articles the biologist is searching for are in MEDLINE, he will find them via a PubMed identifier link. He can then be confident that the article is of a certain quality (since it has been verified and recommended by a pool of users) and access it directly via the BLISS interface.

Current work in this research line is proving very fruitful, with case studies and scenarios coming from the real world. As the use of new communication paradigms and technologies on the Web grows and changes, the problem of finding and relating appropriate resources in order to achieve a particular goal will become more acute. In this section, we have proposed an approach based on collaborative discovery and a social semantic ranking system as a means of bridging the gap between provider-supplied metadata and the current collaborative discovery techniques and initiatives on the Web. The benefits of our approach are twofold. On the one hand, current technology can easily be extended with plug-ins to provide the described functionality and to benefit from the harvesting of further information, as previously noted. On the other hand, wider critical-mass use of social-software-based collaborative discovery techniques can improve the effectiveness and efficiency of discovery, eventually enhancing the whole resource discovery approach. We have also presented BLISS, a proof-of-concept implementation that is being used in the bioinformatics field. Finally, our future work will focus on finding more use cases and real-world scenarios to validate the efficiency of our approach and determine the feasibility of collaborative discovery in particular domains. This work is related to existing efforts on social software and new distributed collaborative trends.
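As an aside on the PubMed link-out just described, the sketch below shows one way a BLISS-style system could resolve a topic to PubMed identifiers, assuming the public NCBI E-utilities esearch endpoint; it is an illustration, not the actual BLISS code.

```python
# Sketch of resolving a topic to PubMed identifiers, assuming the public
# NCBI E-utilities esearch endpoint (illustration, not BLISS's code).
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_ids(term, retmax=5):
    """Search PubMed/MEDLINE and return the matching PMIDs."""
    params = urlencode({"db": "pubmed", "term": term,
                        "retmax": retmax, "retmode": "json"})
    with urlopen(f"{ESEARCH}?{params}") as response:
        data = json.load(response)
    return data["esearchresult"]["idlist"]

print(pubmed_ids("osteoporosis AND osteoclast differentiation"))
```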

7 BIRD: Biomedical Information Integration and Discovery with Semantic Web Services

Biomedical research is now information intensive; the volume and diversity of new data sources challenge current database technologies. The development and tuning of database technologies for biology and medicine will maintain and accelerate the current pace of innovation and discovery. Promising new application fields such as the Semantic Web and Semantic Web Services can leverage the potential of biomedical information integration and discovery, facing the problem of the semantic heterogeneity of biomedical information sources in a variety of storage and data formats, widely distributed both across the Internet and within individual organizations. In this section, we present BIRD, a fully-fledged biomedical information integration solution that combines natural language analysis and semantically-empowered techniques to ascertain how user needs can best be met. Our approach is backed by a proof-of-concept implementation in which the benefits and efficiency of integrating the biomedical publications database PubMed, the Database of Interacting Proteins (DIP) and the Munich Information Center for Protein Sequences (MIPS) have been tested.

7.1 Introduction and Goals

Integration and exchange of data within and among organizations is a universally recognized need in bioinformatics and genomics research. By far the most obvious frustration of a life scientist today is the extreme difficulty of putting together information available from multiple distinct sources. A commonly noted obstacle for integration efforts in bioinformatics is that relevant information is widely distributed, both across the Internet and within individual organizations, and is found in a variety of storage formats, both traditional relational databases and non-traditional sources (e.g., text data in semi-structured files or XML, and the results of analytic applications such as gene-finding tools or homology searches).

Arguably the most critical need in biomedical data integration is to overcome semantic heterogeneity, i.e., to identify objects in different databases that represent the same or related biological objects (genes, proteins, etc.) and to resolve the differences in database structures, or schemas, among the related objects. Such data integration is technically difficult for several reasons. First, the technologies on which different databases are based may differ and do not interoperate smoothly.

Standards for cross-database communication allow the databases (and their users) to exchange information. Second, the precise naming conventions for many scientific concepts (such as individual genes, proteins or drugs) in fast-developing fields are often inconsistent, so mappings are required between different vocabularies. Third, the precise underlying biological model for the data may differ (scientists view things differently), so integrating these data requires a common model of the relevant concepts and their allowable relations. This reason is particularly crucial because unstated assumptions may lead to improper use of information that, on the surface, appears to be valid. Fourth, as our understanding of a particular domain improves, not only will data change, but even database structures will evolve. Any users of the data source, including in particular any data integrators, must be able to manage such data source evolution.

Since the current Web is an environment primarily developed for human users, the need to add semantics to the Web becomes more critical as organizations rely on service-oriented architecture paradigms to expose their data sources by means of Web Services. The Semantic Web is about adding machine-understandable and machine-processable metadata to Web resources through its key enabling technology: ontologies [Fensel, 02]. Ontologies are a formal, explicit and shared specification of a conceptualization. The breakthrough of adding semantics to Web Services leads to the Semantic Web Services paradigm, which offers the possibility of ascertaining which services could best fit the wishes and fulfil the goals of the user. Semantic Web Services can be discovered, located and accessed, since they provide formal means of leveraging different vocabularies and terminologies and of fostering mediation. However, the problem remains of bridging the gap between the current Web, primarily designed for human users whose intentions are expressed in natural language, and the formalization of those wishes. Potential users might be deterred from using Semantic Web Services, since the underlying formalization and difficulty of use hamper adoption from a rich user-interaction perspective. Hence, we present in this section our work on the Biomedical Information Integration and Discovery with Semantic Web Services (BIRD) platform, which fosters intelligent interaction between natural language user intentions and existing Semantic Web Services execution environments. Our contribution is an overall solution, based on a fully-fledged architecture and proof-of-concept implementation, that transforms user intentions into semantically-empowered goals which can be used to encompass interaction with a number of available Semantic Web Services architectures, such as WSMX [WSMX], the OWL-S Virtual Machine [OWL-S] and METEOR-S [METEOR-S].

The remainder of this section is organized as follows. Section 7.2 describes the BIRD platform for Semantic Web Services. Section 7.3 introduces the interaction between BIRD and the available Semantic Web Services execution environments. Finally, section 7.4 presents the proof-of-concept implementation, based on a real-world scenario in which the integration of the biomedical publications database PubMed, the Database of Interacting Proteins (DIP) and the Munich Information Centre for Protein Sequences (MIPS) has been tested.

7.2 BIRD: Biological Information Integration Discovery

BIRD is a two-faced software agent designed to interact with human beings as a gateway, or man-in-the-middle, to Semantic Web Services execution environments. The main goal of the system is to help users express their needs in terms of information retrieval and to achieve information integration by means of Semantic Web Services. BIRD allows users to state their needs in natural language or to go through a list of the most important terms, extracted from the Gene Ontology (GO). For this, BIRD makes use of ontology-driven data mining. It first captures and gathers the terms the user would like to search for (e.g., Gene A, Protein Y), using the aforementioned GO terms as a reference. Second, it builds a lightweight ontology, i.e., a very simple graph made of the relationships among those terms. Finally, it looks for the goal in the goal template repository that best fits the user's search criteria and requirements; a toy sketch of this matching step is given below. A goal in Semantic Web Services technology refers to the aim a user expects to fulfil through the use of a service. Once BIRD has inferred the goals derived from the user's wishes, it sends them to the suitable Semantic Web Services execution environment, which retrieves the outcome resulting from the integration of the applications being accessed (e.g., all the biomedical and biological publication and medical databases). The aim of this section is to describe the functionality of the components in the architecture. Loose coupling and reusability of components have been the major intentions behind the architectural decisions and implementation. Some of these details are reflected in the particular components to make them more understandable.
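The following is a toy sketch of the term extraction and goal-template matching just described; the templates, vocabulary and scoring are illustrative assumptions, not BIRD's actual logic.

```python
# Toy sketch of term extraction against a GO-like vocabulary followed by
# goal-template matching; all names and the scoring are assumptions.

GOAL_TEMPLATES = {
    "find_publications": {"gene", "protein", "publication"},
    "find_protein_interactions": {"protein", "interaction"},
}

def extract_terms(query: str, vocabulary: set) -> set:
    """Keep only the query words that belong to the reference vocabulary."""
    return {w for w in query.lower().split() if w in vocabulary}

def match_goal(terms: set) -> str:
    """Pick the goal template sharing the most terms with the user request."""
    return max(GOAL_TEMPLATES, key=lambda g: len(GOAL_TEMPLATES[g] & terms))

vocabulary = {"gene", "protein", "interaction", "publication"}
terms = extract_terms("find publication records about protein p53 and gene tp53",
                      vocabulary)
print(match_goal(terms))  # -> find_publications
```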

Figure 8. The BIRD Architecture

This figure depicts the main components of BIRD. The core component is the Control Manager, which supervises the whole process and acts as an intermediary among the other components. The GUI is placed between the user and the Control Manager. Users have two possibilities: they can either introduce text in natural language or use the ontology-guided tool, which assists them in expressing their goals (though this option requires further work not covered in this section). When the Control Manager has extracted the user intention, it invokes the Goal Loader. The Goal Loader retrieves all the possible goals from the repository, and the Goal Matcher infers which goals are needed to satisfy the user's requests. Finally, the Control Manager sends these goals separately to the Goal Sender, which is responsible for dispatching them to the suitable execution environment. In the following subsections, a concise description of each of these components is presented.

Language Analyzer: The task of the Language Analyzer is to filter and process the input introduced by the user in natural language and to determine the concepts (attributes and values) and relations included in it.

Goal Loader: This component looks for goal templates in the Goal Template Repository, where different types of goals are stored. The Goal Loader retrieves all the goal templates and transmits them to the Control Manager. Since in this version of BIRD there is no fixed Semantic Web Services execution environment, different types of goal repositories are taken into account.

The repository is outside the architecture, so that anybody may plug in his or her own goal repository.

Goal Matcher: Matching is a widely used term which, in our case, encompasses both a syntactic and a semantic perspective. The Goal Matcher compares the ontology elements obtained from the analysis of the user's wishes with the descriptions of the goal templates extracted from the repository. From this matching, several goals are selected, which are composed by the Control Manager in order to build up the sequence of execution.

Goal Sender: This component sends the different goals to the execution environment, which returns the results obtained from the execution of the services. Its functionality is quite simple, since the sequence of execution is predefined in the BIRD Control Manager. The sending of goals is sequential, without taking into account any other workflow constructs.

GUI: This is the component that interacts with the user. It collects the user's request and presents the results obtained. The following figure depicts the basic outlook of the GUI.

Figure 9. Simple GUI outlook

Control Manager: This is the main component of the architecture. It manages the different interactions among the components. Firstly, it accepts the user's input through the GUI, either natural language text or a structured sentence written with the assistance of the Ontology-guided Input. If the input is in natural language, it instructs the Language Analyzer to attempt recognition of the major concepts in the text, and communicates with the Goal Loader and the Goal Matcher to orchestrate the different goals that will be sent to the execution environment through the Goal Sender. It then communicates with the GUI so that the user receives a view of the selected goals and decides whether they are correct and comply with their expectations. Finally, if the user approves them, they are sent sequentially.

In this section, we have depicted the BIRD architecture. In the following sections, we discuss how BIRD deals with the Semantic Web Services approach.

7.3 Needle in a Haystack: Dealing with Semantic Web Services

One of the most important features of the system is its capability to interoperate with different Semantic Web Services execution environments. Several approaches to Semantic Web Services have emerged. As the process of finding a common standard for this technology has not yet concluded, it is important not to overlook any of them. Therefore, BIRD has been designed to support and interact with several of these approaches (those incorporating an execution environment). In this section, the set of approaches submitted to the W3C (World Wide Web Consortium) is briefly described, and the way BIRD deals with them is detailed.

Along with some of the W3C submissions for Semantic Web Services, execution environments have been defined to automate the execution of semantically annotated Web Services. WSMX, for example, as pointed out before, is an execution environment intended as the reference implementation of WSMO. It enables automatic discovery, selection, mediation, invocation and interoperation of Semantic Web Services by accepting goals in a specified format as input and returning the Web Service invocation results. Another example is the OWL-S Virtual Machine [OWL-S] for OWL-S-annotated Web Services. It uses OWL-S descriptions of Web Services and OWL ontologies to control the interaction between Web Services. Similarly to WSMX, the OWL-S Virtual Machine is a complete framework for Semantic Web Services that starts by parsing an OWL-S description and executes the process model consistently with the OWL-S operational semantics.

Besides, it uses the OWL-S Grounding to transform the abstract description of the information exchanges between the provider and the requester into WSDL operations. Finally, METEOR-S [METEOR-S] can be associated with WSDL-S. The METEOR-S project attempts to add semantics to the complete Web process lifecycle. In particular, four categories of semantics are identified: Data Semantics (the semantics of the inputs and outputs of Web Services), Functional Semantics (what does a service do?), Execution Semantics (correctness and verification of execution) and QoS Semantics (performance and cost parameters associated with the service). The main advantage of this approach is that it is built upon existing service-oriented architecture and Semantic Web standards where possible, thus adding semantics to current industry standards.

BIRD is able to interact with these three execution environments. It possesses a goal template repository that contains a different set of goal templates for each execution environment. When BIRD obtains the knowledge representing the user's goals, it tries to match it against the whole set of goal templates. It may be the case that goal templates from all three execution environments are needed in order to achieve the user's goals; if so, BIRD sends the goals to the different execution environments sequentially, as needed. It is also possible that all the goals can be accomplished by the same execution environment, making it easier for BIRD to meet the user's expectations. The way this works is depicted in Figure 10.

Figure 10. BIRD dealing with Semantic Web Execution Environments

In this section, we have described how BIRD deals with Semantic Web Services execution environments. In the following, we focus on the benefits and efficiency of applying this approach to biomedical data sources.

7.4 Using BIRD for Biomedical Information Integration

In this section, we present a use case scenario based on a real-world setting, in order to show the advantages provided by BIRD from the user perspective. Three data sources are being integrated, detailed in what follows. First, the PubMed database, a free search engine offered by the United States National Library of Medicine (as part of the Entrez information retrieval system). The inclusion of an article in PubMed does not imply endorsement of its contents, and the service allows searching the MEDLINE database. MEDLINE covers over 4,800 published journals and also offers access to citations to articles that are out of scope (e.g., covering plate tectonics or astrophysics) from certain MEDLINE journals, primarily general science and general chemistry journals, for which the life science articles are indexed for MEDLINE, as well as in-process citations, which provide a record for an article before it is indexed and added to MEDLINE or converted to out-of-scope status.


More information

Definition of Information Systems

Definition of Information Systems Information Systems Modeling To provide a foundation for the discussions throughout this book, this chapter begins by defining what is actually meant by the term information system. The focus is on model-driven

More information

Grid Computing a new tool for science

Grid Computing a new tool for science Grid Computing a new tool for science CERN, the European Organization for Nuclear Research Dr. Wolfgang von Rüden Wolfgang von Rüden, CERN, IT Department Grid Computing July 2006 CERN stands for over 50

More information

An Integrated Framework to Enhance the Web Content Mining and Knowledge Discovery

An Integrated Framework to Enhance the Web Content Mining and Knowledge Discovery An Integrated Framework to Enhance the Web Content Mining and Knowledge Discovery Simon Pelletier Université de Moncton, Campus of Shippagan, BGI New Brunswick, Canada and Sid-Ahmed Selouani Université

More information

The IDN Variant TLD Program: Updated Program Plan 23 August 2012

The IDN Variant TLD Program: Updated Program Plan 23 August 2012 The IDN Variant TLD Program: Updated Program Plan 23 August 2012 Table of Contents Project Background... 2 The IDN Variant TLD Program... 2 Revised Program Plan, Projects and Timeline:... 3 Communication

More information

Automatic Test Markup Language Sept 28, 2004

Automatic Test Markup Language <ATML/> Sept 28, 2004 Automatic Test Markup Language Sept 28, 2004 ATML Document Page 1 of 16 Contents Automatic Test Markup Language...1 ...1 1 Introduction...3 1.1 Mission Statement...3 1.2...3 1.3...3 1.4

More information

Semantic Web Mining and its application in Human Resource Management

Semantic Web Mining and its application in Human Resource Management International Journal of Computer Science & Management Studies, Vol. 11, Issue 02, August 2011 60 Semantic Web Mining and its application in Human Resource Management Ridhika Malik 1, Kunjana Vasudev 2

More information

A GML SCHEMA MAPPING APPROACH TO OVERCOME SEMANTIC HETEROGENEITY IN GIS

A GML SCHEMA MAPPING APPROACH TO OVERCOME SEMANTIC HETEROGENEITY IN GIS A GML SCHEMA MAPPING APPROACH TO OVERCOME SEMANTIC HETEROGENEITY IN GIS Manoj Paul, S. K. Ghosh School of Information Technology, Indian Institute of Technology, Kharagpur 721302, India - (mpaul, skg)@sit.iitkgp.ernet.in

More information

Automatic Generation of Workflow Provenance

Automatic Generation of Workflow Provenance Automatic Generation of Workflow Provenance Roger S. Barga 1 and Luciano A. Digiampietri 2 1 Microsoft Research, One Microsoft Way Redmond, WA 98052, USA 2 Institute of Computing, University of Campinas,

More information

Text Mining. Representation of Text Documents

Text Mining. Representation of Text Documents Data Mining is typically concerned with the detection of patterns in numeric data, but very often important (e.g., critical to business) information is stored in the form of text. Unlike numeric data,

More information

e-infrastructures in FP7 INFO DAY - Paris

e-infrastructures in FP7 INFO DAY - Paris e-infrastructures in FP7 INFO DAY - Paris Carlos Morais Pires European Commission DG INFSO GÉANT & e-infrastructure Unit 1 Global challenges with high societal impact Big Science and the role of empowered

More information

ICME: Status & Perspectives

ICME: Status & Perspectives ICME: Status & Perspectives from Materials Science and Engineering Surya R. Kalidindi Georgia Institute of Technology New Strategic Initiatives: ICME, MGI Reduce expensive late stage iterations Materials

More information

Universal Model Framework -- An Introduction

Universal Model Framework -- An Introduction Universal Model Framework -- An Introduction By Visible Systems Corporation www.visible.com This document provides an introductory description of the Universal Model Framework an overview of its construct

More information

A Design Rationale Representation for Model-Based Designs in Software Engineering

A Design Rationale Representation for Model-Based Designs in Software Engineering A Design Rationale Representation for Model-Based Designs in Software Engineering Adriana Pereira de Medeiros, Daniel Schwabe, and Bruno Feijó Dept. of Informatics, PUC-Rio, Rua Marquês de São Vicente

More information

A Web-Based Protocol Tracking Management System For Clinical Research

A Web-Based Protocol Tracking Management System For Clinical Research A Web-Based Protocol Tracking Management System For Clinical Research Huey Cheung a, Yang Fann b, Shaohua A. Wang a, Barg Upender a, Adam Frazin a Raj Lingam b, Sarada Chintala a, Frank Pecjak a, Gladys

More information

A Formal Approach for the Inference Plane Supporting Integrated Management Tasks in the Future Internet in ManFI Selected Management Topics Session

A Formal Approach for the Inference Plane Supporting Integrated Management Tasks in the Future Internet in ManFI Selected Management Topics Session In conjuction with: A Formal Approach for the Inference Plane Supporting Integrated Management Tasks in the Future Internet in ManFI Selected Management Topics Session Martín Serrano Researcher at TSSG-WIT

More information

Korea Institute of Oriental Medicine, South Korea 2 Biomedical Knowledge Engineering Laboratory,

Korea Institute of Oriental Medicine, South Korea 2 Biomedical Knowledge Engineering Laboratory, A Medical Treatment System based on Traditional Korean Medicine Ontology Sang-Kyun Kim 1, SeJin Nam 2, Dong-Hun Park 1, Yong-Taek Oh 1, Hyunchul Jang 1 1 Literature & Informatics Research Division, Korea

More information

Ontology Refinement and Evaluation based on is-a Hierarchy Similarity

Ontology Refinement and Evaluation based on is-a Hierarchy Similarity Ontology Refinement and Evaluation based on is-a Hierarchy Similarity Takeshi Masuda The Institute of Scientific and Industrial Research, Osaka University Abstract. Ontologies are constructed in fields

More information

Shine a Light on Dark Data with Vertica Flex Tables

Shine a Light on Dark Data with Vertica Flex Tables White Paper Analytics and Big Data Shine a Light on Dark Data with Vertica Flex Tables Hidden within the dark recesses of your enterprise lurks dark data, information that exists but is forgotten, unused,

More information

From IHE Audit Trails to XES Event Logs Facilitating Process Mining

From IHE Audit Trails to XES Event Logs Facilitating Process Mining 40 Digital Healthcare Empowering Europeans R. Cornet et al. (Eds.) 2015 European Federation for Medical Informatics (EFMI). This article is published online with Open Access by IOS Press and distributed

More information

Available online at ScienceDirect. Procedia Computer Science 52 (2015 )

Available online at  ScienceDirect. Procedia Computer Science 52 (2015 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 52 (2015 ) 1071 1076 The 5 th International Symposium on Frontiers in Ambient and Mobile Systems (FAMS-2015) Health, Food

More information

Comparison of FP tree and Apriori Algorithm

Comparison of FP tree and Apriori Algorithm International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 10, Issue 6 (June 2014), PP.78-82 Comparison of FP tree and Apriori Algorithm Prashasti

More information

Data Cleansing Strategies

Data Cleansing Strategies Page 1 of 8 Data Cleansing Strategies InfoManagement Direct, October 2004 Kuldeep Dongre The presence of data alone does not ensure that all the management functions and decisions can be smoothly undertaken.

More information

Introduction to Data Science

Introduction to Data Science UNIT I INTRODUCTION TO DATA SCIENCE Syllabus Introduction of Data Science Basic Data Analytics using R R Graphical User Interfaces Data Import and Export Attribute and Data Types Descriptive Statistics

More information

Extracting knowledge from Ontology using Jena for Semantic Web

Extracting knowledge from Ontology using Jena for Semantic Web Extracting knowledge from Ontology using Jena for Semantic Web Ayesha Ameen I.T Department Deccan College of Engineering and Technology Hyderabad A.P, India ameenayesha@gmail.com Khaleel Ur Rahman Khan

More information

Featured Articles AI Services and Platforms A Practical Approach to Increasing Business Sophistication

Featured Articles AI Services and Platforms A Practical Approach to Increasing Business Sophistication 118 Hitachi Review Vol. 65 (2016), No. 6 Featured Articles AI Services and Platforms A Practical Approach to Increasing Business Sophistication Yasuharu Namba, Dr. Eng. Jun Yoshida Kazuaki Tokunaga Takuya

More information

UNCLASSIFIED. FY 2016 Base FY 2016 OCO

UNCLASSIFIED. FY 2016 Base FY 2016 OCO Exhibit R-2, RDT&E Budget Item Justification: PB 2016 Office of the Secretary Of Defense Date: February 2015 0400: Research, Development, Test & Evaluation, Defense-Wide / BA 2: COST ($ in Millions) Prior

More information

THE GETTY VOCABULARIES TECHNICAL UPDATE

THE GETTY VOCABULARIES TECHNICAL UPDATE AAT TGN ULAN CONA THE GETTY VOCABULARIES TECHNICAL UPDATE International Working Group Meetings January 7-10, 2013 Joan Cobb Gregg Garcia Information Technology Services J. Paul Getty Trust International

More information

Collaborative Ontology Construction using Template-based Wiki for Semantic Web Applications

Collaborative Ontology Construction using Template-based Wiki for Semantic Web Applications 2009 International Conference on Computer Engineering and Technology Collaborative Ontology Construction using Template-based Wiki for Semantic Web Applications Sung-Kooc Lim Information and Communications

More information

Taccumulation of the social network data has raised

Taccumulation of the social network data has raised International Journal of Advanced Research in Social Sciences, Environmental Studies & Technology Hard Print: 2536-6505 Online: 2536-6513 September, 2016 Vol. 2, No. 1 Review Social Network Analysis and

More information

General Framework for Secure IoT Systems

General Framework for Secure IoT Systems General Framework for Secure IoT Systems National center of Incident readiness and Strategy for Cybersecurity (NISC) Government of Japan August 26, 2016 1. General Framework Objective Internet of Things

More information

The roles and limitations of the Semantic Web are still unclear. The Semantic Web hopes to provide reliable, cheap, and speedy access to data.

The roles and limitations of the Semantic Web are still unclear. The Semantic Web hopes to provide reliable, cheap, and speedy access to data. SEMANTIC WEB December 22, 2007 We need a unifying logical language for data - for the machine interfaces to data systems - in the same way that HTML was a unifying language for human interfaces to information

More information

Towards Ontology Mapping: DL View or Graph View?

Towards Ontology Mapping: DL View or Graph View? Towards Ontology Mapping: DL View or Graph View? Yongjian Huang, Nigel Shadbolt Intelligence, Agents and Multimedia Group School of Electronics and Computer Science University of Southampton November 27,

More information

Supporting Bioinformatic Experiments with A Service Query Engine

Supporting Bioinformatic Experiments with A Service Query Engine Supporting Bioinformatic Experiments with A Service Query Engine Xuan Zhou Shiping Chen Athman Bouguettaya Kai Xu CSIRO ICT Centre, Australia {xuan.zhou,shiping.chen,athman.bouguettaya,kai.xu}@csiro.au

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 4, Jul-Aug 2015

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 4, Jul-Aug 2015 RESEARCH ARTICLE OPEN ACCESS Multi-Lingual Ontology Server (MOS) For Discovering Web Services Abdelrahman Abbas Ibrahim [1], Dr. Nael Salman [2] Department of Software Engineering [1] Sudan University

More information

How to Leverage Containers to Bolster Security and Performance While Moving to Google Cloud

How to Leverage Containers to Bolster Security and Performance While Moving to Google Cloud PRESENTED BY How to Leverage Containers to Bolster Security and Performance While Moving to Google Cloud BIG-IP enables the enterprise to efficiently address security and performance when migrating to

More information

NeOn Methodology for Building Ontology Networks: a Scenario-based Methodology

NeOn Methodology for Building Ontology Networks: a Scenario-based Methodology NeOn Methodology for Building Ontology Networks: a Scenario-based Methodology Asunción Gómez-Pérez and Mari Carmen Suárez-Figueroa Ontology Engineering Group. Departamento de Inteligencia Artificial. Facultad

More information

Implementing ITIL v3 Service Lifecycle

Implementing ITIL v3 Service Lifecycle Implementing ITIL v3 Lifecycle WHITE PAPER introduction GSS INFOTECH IT services have become an integral means for conducting business for all sizes of businesses, private and public organizations, educational

More information