Semantic Technology Opportunities
Avinash Punekar
Scientific Publishing Services
April 2011 2 Semantic Technology
What is Semantic Technology?
• Semantic Web
• Web 3.0
• Linked Open Data / Linked Enterprise Data
• Web of Data
• Web of Things
• GGG: Giant Global Graph
• Is about using software to leverage our understanding and use of information
• And more!
Semantic Technology
It is all about DATA
• Semantic data that is not only machine READABLE.
• It is machine UNDERSTANDABLE!
It is not
• A software package
• Something that will ever be complete
• A replacement for the current Web
• A pipe dream
• A silver bullet
Semantic Technology
It is
• A Web-scale architecture
• A metadata technology
• A layer of meaning on the existing Web
• In use TODAY!
Semantic enrichment is a process whereby text within a research or scholarly document is annotated with semantic metadata. It enables free text to be converted into a database of knowledge by extracting the concepts and linking them to related knowledge bases.
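Semantic enrichment as described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny knowledge base, the example.org URIs, and the function name enrich are all hypothetical, and a real enrichment pipeline would use far richer linguistic processing than whole-word matching.

```python
import re

# Hypothetical knowledge base: surface term -> concept URI.
# The example.org URIs are placeholders, not real identifiers.
KNOWLEDGE_BASE = {
    "aspirin": "http://example.org/concept/aspirin",
    "headache": "http://example.org/concept/headache",
}


def enrich(text, knowledge_base=KNOWLEDGE_BASE):
    """Return one annotation per known term found in the text."""
    annotations = []
    for term, uri in knowledge_base.items():
        # Whole-word, case-insensitive match of the term.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append(
                {"start": match.start(), "end": match.end(),
                 "text": match.group(0), "concept": uri}
            )
    # Report annotations in document order.
    return sorted(annotations, key=lambda a: a["start"])


sentence = "Aspirin is commonly used to treat headache."
annotations = enrich(sentence)
for annotation in annotations:
    print(annotation)
```

Each annotation records where the concept was found and which knowledge-base entry it links to, which is the "layer of meaning" added on top of the unchanged text.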
Semantic Technology
Machine Understanding - How?
• By uniquely identifying THINGS
• By uniquely identifying RELATIONSHIPS
• By using TRIPLES
Semantic Technology
What is a THING?
A THING is anything that can be uniquely identified by a URI or a literal (string)
Me → http://twitter.com/ericaxel
My postal code → http://www.city-data.com/zips/90043.html
The White House → Lat: 38.89859 Long: -77.035971
L.A. County's sales tax rate → 9.750 %
[image] → http://ericfranzon.com/operator.jpg
Semantic Technology
What is a RELATIONSHIP?
Something which connects two THINGS uniquely
THING --- isfatherof ---> THING
<owl:ObjectProperty rdf:ID="isfather">
  <rdfs:domain rdf:resource="#person"/>
  <rdfs:range rdf:resource="#person"/>
</owl:ObjectProperty>
Semantic Technology
What is a TRIPLE?
This book --- has title ---> [title]
Thing --- a Relationship ---> Thing
Subject --- Predicate ---> Object
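The subject, predicate, object shape of a triple can be written down directly as a data structure. The sketch below is a minimal illustration, not an RDF library: the book URI and the title string are invented placeholders, while the predicate is the Dublin Core "title" property.

```python
from collections import namedtuple

# A triple is a (subject, predicate, object) statement.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical identifier for the book; example.org is a placeholder domain.
BOOK = "http://example.org/book/1"
# Dublin Core "title" property, used here as the relationship.
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

# The object is a literal (a plain string); the title itself is invented.
statement = Triple(subject=BOOK, predicate=DC_TITLE, object="An Example Title")

print(statement.subject, statement.predicate, statement.object)
```

Here the subject and the predicate are URIs, and the object is a literal, matching the two kinds of THING described earlier in the deck.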
Semantic Technology
Where is it now?
Semantic Technology
Technologies
• RDBMS: Data, Schema, Query Language
• Semantic: Data, Schema (Vocabularies), Query Language
• Data Language: Resource Description Framework (RDF)
• RDF is good for distributing data across the Web and pretending it's in one place
Subject | Predicate | Object
http://plushbeautybar.com | dc:creator | http://www.ericaxel.com/foaf.rdf
http://www.geonames.org/maps/google_34.021_-118.396.html | dc:location | N 34° 1' 16'' W 118° 23' 47''
http://twitter.com/ericaxel | foaf:knows | Dave McComb
Semantic Technology
Vocabularies
• Ontologies
• Taxonomies
• Folksonomies
Semantic Technology
Some are ways of describing vocabularies:
• RDF: property triple RELATIONSHIPS
• RDFS (RDF Schema)
• OWL (Web Ontology Language)
Some are controlled vocabularies like:
• Dublin Core
• SKOS (Simple Knowledge Organization System)
• SIOC (Semantically-Interlinked Online Communities)
Reuse or make up your own!
Semantic Technology
Query Language: SPARQL
• SPARQL: SPARQL Protocol And RDF Query Language
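A SPARQL query works by matching triple patterns in which some positions are variables. The plain-Python sketch below imitates that idea with a wildcard match over an in-memory list; it is not a SPARQL engine, the function name match is an assumption made for illustration, and the two triples are taken from the RDF example on the Technologies slide.

```python
# Triples taken from the RDF example on the Technologies slide.
TRIPLES = [
    ("http://plushbeautybar.com", "dc:creator", "http://www.ericaxel.com/foaf.rdf"),
    ("http://twitter.com/ericaxel", "foaf:knows", "Dave McComb"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard (a variable)."""
    results = []
    for s, p, o in triples:
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        results.append((s, p, o))
    return results


# Roughly: SELECT ?who WHERE { <http://twitter.com/ericaxel> foaf:knows ?who }
known = [o for _, _, o in match(TRIPLES,
                                subject="http://twitter.com/ericaxel",
                                predicate="foaf:knows")]
print(known)
```

Leaving every position as a wildcard returns the whole data set, which is the equivalent of a pattern made only of variables.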
Semantic Components
The semantic web comprises the standards and tools of HTML5, XML, XML Schema, RDF, RDF Schema and OWL that are organized in the Semantic Web Stack. The OWL Web Ontology Language Overview describes the function and relationship of each of these components of the semantic web:
• XML provides an elemental syntax for content structure within documents, yet associates no semantics with the meaning of the content contained within.
• XML Schema is a language for providing and restricting the structure and content of elements contained within XML documents.
• RDF is a simple language for expressing data models, which refer to objects ("resources") and their relationships. An RDF-based model can be represented in XML syntax.
• RDF Schema extends RDF and is a vocabulary for describing properties and classes of RDF-based resources, with semantics for generalization hierarchies of such properties and classes.
• OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
• SPARQL is a protocol and query language for semantic web data sources.
Semantic Projects
DBpedia - DBpedia is an effort to publish structured data extracted from Wikipedia. The data is published in RDF and made available on the Web for use under the GNU Free Documentation License, allowing Semantic Web agents to provide inferencing and advanced querying over the Wikipedia-derived dataset and facilitating interlinking, re-use and extension in other data sources.
FOAF - A popular application of the semantic web is Friend of a Friend (FOAF), which uses RDF to describe the relationships people have to other people and the "things" around them. FOAF is an example of how the Semantic Web attempts to make use of the relationships within a social context.
GoodRelations for e-commerce - A huge potential for Semantic Web technologies lies in adding data structure and typed links to the vast amount of offer data, product model features, and tendering / request for quotation data. The GoodRelations ontology is a popular vocabulary for expressing product information, prices, payment options, etc. It also allows expressing demand in a straightforward fashion. GoodRelations has been adopted by BestBuy, Yahoo, OpenLink Software, O'Reilly Media, the Book Mashup, and many others.
NextBio - A database consolidating high-throughput life sciences experimental data, tagged and connected via biomedical ontologies. NextBio is accessible via a search engine interface. Researchers can contribute their findings for incorporation into the database. The database currently supports gene or protein expression data and is steadily expanding to support other biological data types.
Web Ontology Language (OWL)
The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies, endorsed by the World Wide Web Consortium. The languages are characterised by formal semantics and RDF/XML-based serializations for the Semantic Web. OWL has attracted academic, medical and commercial interest.
Examples of ontologies:
• Basic Formal Ontology,[14] a formal upper ontology designed to support scientific research
• BioPAX, an ontology for the exchange and interoperability of biological pathway (cellular processes) data
• BMO, an e-business Model Ontology based on a review of enterprise ontologies and business model literature
• CCO (Cell-Cycle Ontology), an application ontology that represents the cell cycle
• CContology, an e-business ontology to support online customer complaint management
• CIDOC Conceptual Reference Model, an ontology for cultural heritage[19]
• COSMO, a Foundation Ontology designed to contain representations of all of the primitive concepts needed to logically specify the meanings of any domain entity. It is intended to serve as a basic ontology that can be used to translate among the representations in other ontologies or databases. It started as a merger of the basic elements of the OpenCyc and SUMO ontologies, and has been supplemented with other ontology elements (types, relations) so as to include representations of all of the words in the Longman dictionary defining vocabulary.
Web Ontology Language (OWL)
Further examples of ontologies:
• Cyc, a large Foundation Ontology for formal representation of the universe of discourse
• Disease Ontology, designed to facilitate the mapping of diseases and associated conditions to particular medical codes
• DOLCE, a Descriptive Ontology for Linguistic and Cognitive Engineering
• Dublin Core, a simple ontology for documents and publishing
• Foundational, Core and Linguistic Ontologies
• Foundational Model of Anatomy, an ontology for human anatomy
• Gene Ontology for genomics
• GUM (Generalized Upper Model), a linguistically motivated ontology for mediating between client systems and natural language technology
• NIFSTD Ontologies from the Neuroscience Information Framework: a modular set of ontologies for the neuroscience domain. See http://neuinfo.org
Web Ontology Language (OWL)
Further examples of ontologies:
• OBO Foundry, a suite of interoperable reference ontologies in biomedicine
• Ontology for Biomedical Investigations, an open access, integrated ontology for the description of biological and clinical investigations
• OMNIBUS Ontology, an ontology of learning, instruction, and instructional design
• Plant Ontology for plant structures and growth/development stages, etc.
• POPE, Purdue Ontology for Pharmaceutical Engineering
• PRO, the Protein Ontology of the Protein Information Resource, Georgetown University
• Program abstraction taxonomy
• Protein Ontology for proteomics
• Systems Biology Ontology (SBO), for computational models in biology
• Many more (ONIX, MARC, Dublin Core)
Semantic Technology
Why is it important to us?
• It is the future
• All major governments have made adoption mandatory
• All big businesses have adopted it
• The scope in all areas and especially in publishing is huge
• Fundamentally changes what we are doing
• Our customers have adopted it
• Presents new opportunities
Market
The market for semantic enrichment will be much larger. Some of the industries/sectors are:
• Publishing: Books, Journals
• Media & Entertainment
• Banking, Finance, Insurance
• Pharmaceutical
• Medical
• Government
• Legal
Semantic Opportunities
• Content Abstraction
• Technical Data Extraction
• Keyword/Semantic Indexing
• Bibliographic Data Management
• Editorial Services
• Taxonomy, Thesaurus, Ontology, Terminology
• Annotation, Recommendation Creation
• Semantic Tagging
• Semantic Linking
• Researched Linking
• Resource Repurposing
Content Abstraction
Content abstraction is the process of creating a condensed version of a full-text article or other technical and research document. An abstract gives the reader an indication of the core themes discussed in the full text. Publishers use it as a document surrogate to promote the delivery and sales of full-text documents.
Indicative Abstracts - These describe what the article covers in terms of topic and methodology, without providing the key content present in the article. Examples: product reviews, book abstracts, etc.
Informative Abstracts - These provide a condensed view of the entire content of the full-text document, culling out the key topics and concepts covered. Examples: abstracts of technical articles, technical standards and specifications.
Structured Abstracts - Abstracts created in a structured format with pre-defined headings that represent the way the full text is organized. Examples: abstracts of clinical trials and medical case reports. Here the abstracts follow the typical structure of Introduction/Background, Scope/Methods, Results/Discussion/Conclusion, based on the specific house style followed by the publisher or information provider.
Enhanced/Value Added Abstracts - Abstracts that pick out the key knowledge that is helpful for decision making, using domain expertise and inferences. Examples: English abstracts of patents in multiple languages that extract the key patentability parameters like novelty, use and advantage; bottom-line summaries/clinical pearls, etc.
Technical Data Extraction
Technical data extraction is the process of extracting properties, attributes, metadata and conceptual entities from unstructured technical documents such as patents and non-patent technical literature. A few examples of data that can be extracted from typical chemical and life science related documents are:
• Systematic Chemical Names (IUPAC Nomenclature) with different spellings; Commas, Periods, Hyphens, Parentheses, Apostrophes, Plusses, Minuses and Greek Symbols
• Common or Generic Names
• Trade Names
• Company Codes
• Abbreviations
• Fragmented Descriptors
• Molecular Formula
• Genetic Information
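As a rough illustration of one item in the list above, molecular formulas can be picked out of free text with a pattern match. The regular expression below is a deliberately simple heuristic chosen for this sketch, not a production extractor: it does not validate element symbols, and it ignores formulas without digits (such as NaCl) to avoid matching ordinary capitalised words.

```python
import re

# Heuristic: two or more element-like tokens (capital letter, optional
# lowercase letter, optional count), in a word that contains a digit.
# Tokens are not checked against the periodic table.
FORMULA = re.compile(r"\b(?=[A-Za-z]*\d)(?:[A-Z][a-z]?\d*){2,}\b")


def extract_formulas(text):
    """Return candidate molecular formulas in order of appearance."""
    return FORMULA.findall(text)


sample = "Ethanol (C2H6O) was treated with H2SO4 and NaCl."
formulas = extract_formulas(sample)
print(formulas)  # ['C2H6O', 'H2SO4']
```

In practice such candidates would be checked against a chemical dictionary or reviewed by a domain expert before being stored as extracted data.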
Keyword/Semantic Indexing
Indexing is a process in which the key descriptors that represent the core theme of an article or document are extracted and the article or document is tagged with those descriptors. The descriptors can be keywords that are actually present in the document (keyword indexing) or descriptors that represent the key concepts elaborated in the article but are not necessarily present in the document (semantic indexing). Some of the areas are:
• Journal Indexing
• Subject Category Indexing
• Image Indexing
• Medical Indexing and Coding / Evidence Based Rating
• Drug Indexing
• Chemical Structure Drawing and Indexing
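The difference between keyword indexing and semantic indexing can be shown with a small Python sketch. The keyword list and the word-to-concept map are hypothetical and exist only for this illustration; a real indexing service would rely on a curated vocabulary and trained indexers or tools.

```python
import re

# Hypothetical controlled keyword list and concept map, for illustration only.
KEYWORDS = {"insulin", "glucose"}
CONCEPTS = {
    "insulin": "Diabetes Mellitus",
    "glucose": "Carbohydrate Metabolism",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_index(text):
    """Keyword indexing: descriptors that literally occur in the text."""
    return sorted(set(tokenize(text)) & KEYWORDS)


def semantic_index(text):
    """Semantic indexing: concepts implied by the text, present in it or not."""
    return sorted({CONCEPTS[word] for word in tokenize(text) if word in CONCEPTS})


abstract = "Insulin lowers blood glucose levels."
print(keyword_index(abstract))   # ['glucose', 'insulin']
print(semantic_index(abstract))  # ['Carbohydrate Metabolism', 'Diabetes Mellitus']
```

Neither "Diabetes Mellitus" nor "Carbohydrate Metabolism" appears in the sample sentence, which is exactly what separates the semantic descriptors from the keywords.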
Bibliographic Data Management
It includes developing, validating, updating and editing bibliographic databases based on the cataloguing rules of leading bibliographic databases and catalogs such as ISSN and OCLC. It should also include ONIX and RSS feeds.
Editorial Services
Editorial Workflow Administration - Handle the entire manuscript handling process, including peer reviewer selection, tracking of manuscripts, reminders to peer reviewers, and style checking of manuscripts.
Developmental Editing - Work in tandem with the authors in editing and fine-tuning their manuscript. Provide services such as fact checking and content enrichment to enhance the authenticity and readability of the manuscript.
• Content Editing
• Language Editing
• Technical Editing
• Proofreading
Editorial Services for Business and Commercial News Services:
• News Summaries
• Press Report Analysis
• Newsletters
• Media Monitoring
• Product and Service Descriptions
Taxonomy, Thesaurus, Ontology, Terminology
The offerings should include the following:
• Taxonomy development and maintenance
• Taxonomy Mapping/Integration
• Taxonomy expansion
• Semantic labeling of taxonomy nodes through ontology
• Development of niche taxonomies for medical specialties
• Automated content mining and vertical search solutions through the deployment of taxonomy and ontology
• Lexicon development - Word variants, Spelling variants, Morphological variants, Language variants
• Thesaurus development - Multilevel Broader and Narrower Terms Hierarchical Displays, Construction of Equivalent Terms (Synonyms), Construction of Associated Terms (Related Terms)
• Ontology development - Conceptual definition for each node, Disambiguation of homonyms, Deconstruction of existing taxonomies and semantic labeling of taxonomy nodes
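The thesaurus relationships named above (broader and narrower terms, equivalent terms, associated terms) map naturally onto simple lookup tables. The following Python sketch assumes a tiny invented medical thesaurus; the entries and function names are hypothetical and only show how the three kinds of relationship can be stored and navigated.

```python
# Hypothetical thesaurus entries, for illustration only.
BROADER = {
    "Myocardial infarction": "Heart diseases",
    "Heart diseases": "Cardiovascular diseases",
}
SYNONYMS = {"Heart attack": "Myocardial infarction"}  # equivalent -> preferred term
RELATED = {"Myocardial infarction": ["Chest pain"]}   # associated terms


def preferred(term):
    """Resolve an equivalent term (synonym) to its preferred term."""
    return SYNONYMS.get(term, term)


def broader_terms(term):
    """Return all broader terms, nearest first, by walking up the hierarchy."""
    chain = []
    current = preferred(term)
    while current in BROADER:
        current = BROADER[current]
        chain.append(current)
    return chain


def narrower_terms(term):
    """Return the direct narrower terms of a term."""
    target = preferred(term)
    return sorted(child for child, parent in BROADER.items() if parent == target)


def related_terms(term):
    """Return the associated (related) terms of a term."""
    return RELATED.get(preferred(term), [])


print(broader_terms("Heart attack"))
print(narrower_terms("Cardiovascular diseases"))
print(related_terms("Heart attack"))
```

Resolving synonyms before every lookup means a search for "Heart attack" and a search for "Myocardial infarction" land on the same node of the hierarchy.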
Annotation/Recommendation Creation
Annotation Creation - During this process the data from the databases is annotated semantically. The process makes the heterogeneous collection data syntactically and semantically interoperable.
Recommendation Creation - Rules that define more associative relations between different metadata items need to be created. These rules are based on the domain ontologies, the collection item annotations, and expert knowledge.
Semantic Services @ SPS
• Semantic Tagging Services - We can offer our services for content transformation with semantic tagging.
• Semantic Linking Services - We can offer our services for linking the semantic tags with external objects, resources or databases.
• Researched Linking Services - In addition to the above, our teams can research disparate information on the internet about these objects, resources and databases, which can then be linked to the content.
• Resource Repurposing/Rebuilding Services - We can also offer our services for repurposing or rebuilding resources and objects such as images, graphs, charts, tables, animations, audio and video.
Thank You
Avinash Punekar
avinash@sps.co.in
Phone: +91 91766 50335
Scientific Publishing Services