Industry Adoption of Semantic Web Technology
  Dr. Yue Pan, IBM China Research Laboratory
  panyue@cn.ibm.com
Outline
  Business Drivers
  Industries as Early Adopters
  A Software Roadmap
  Conclusion
Data Semantics Returns
  "The three most important problems in databases used to be performance, performance and performance; in the future, the three most important problems will be semantics, semantics and semantics." (paraphrase of Stefano Ceri, June 11, 1998)
  What it is
    The study of how to establish and maintain the correspondence between a data source, hereafter a model, and its intended subject matter.
  A historical view
    1970-80: Entity-Relationship Model
    1980 onward: Description Logics and other knowledge representation (KR)
    2000 onward: Semantic Web
  Now
    Enterprise Semantic Web was chosen as one of Gartner's highlighted emerging technologies with high business impact in 2006.
Data Semantics: Why Now
  Progress in storage and systems
  Human resource cost
  Large number of database and application silos deployed in the enterprise
  Business becoming more dynamic
Critical Attributes of Information On Demand
  Moving from a project-based to a flexible architecture
  Deliver information: in context, in line, effectively governed
  Integrate information: structured and unstructured, timely, accurate
  Manage complexity: open standards, flexible and resilient infrastructure, heterogeneous applications and information
Semantic Web
  Today's Web is mainly for human consumption; the Semantic Web extends it toward data and programs.
  A Web of (Hyper)Text (HTML) -> A Web of Data (XML) -> A Web of Meaningful Data (RDF, OWL)
  Smarter data (vs. smarter machines)
    Make content easier for machines to find, access and process
    Express data and meaning in a standard machine-readable format
    Support decentralized definition and management
  ... this is what the Semantic Web is about
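To make "express data and meaning in a standard machine-readable format" concrete, here is a minimal Python sketch of data held as subject-predicate-object triples, the shape RDF uses. The namespace, the facts and the helper `find` are hypothetical illustrations, not part of any RDF library.

```python
# Hypothetical namespace and facts, for illustration only.
EX = "http://example.org/"

triples = {
    (EX + "aspirin", EX + "type", EX + "Drug"),
    (EX + "aspirin", EX + "treats", EX + "headache"),
    (EX + "headache", EX + "type", EX + "Symptom"),
}


def find(store, s=None, p=None, o=None):
    """Return, in sorted order, the triples that agree with every non-None position."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# "Which resources are drugs?" is answered by a pattern, not by custom parsing.
drugs = [s for s, _, _ in find(triples, p=EX + "type", o=EX + "Drug")]
```

Because every statement has the same three-part shape, a program can combine facts from independently managed sources without knowing their layout in advance.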
Industries as Early Adopters
  Diagnosis of needs for Semantic Technology and the Semantic Web
    Complexity of data (structures, sources, volume)
    Volatility of data
    Willingness to experiment on the bleeding edge
  Most likely
    Healthcare and Life Sciences
    Financial Market
    Interactive media companies
    Aerospace and defense
  Maybe
    Insurance
    Chemical and Petroleum
    Travel and Transportation
    Electronics
  Reference
    A. Jackson, L. Weitzman, S. Martin. A Beginner's Guide to the Semantic Web. IBM Report, 2004.
Healthcare and Life Science
  UMLS: Unified Medical Language System
    UMLS Semantic Network
  HL7: Health Level Seven
    HL7 CTS: Common Terminology Services
  Mayo Clinic's LexGrid
  Reference
    Vipul Kashyap. The UMLS Semantic Network and the Semantic Web.
    G. Eysenbach. The Semantic Web and healthcare consumers: a new challenge and opportunity on the horizon? Int. J. Healthcare Technology and Management 5, 2003.
    LexGrid. http://informatics.mayo.edu/lexgrid/
Healthcare and Life Science (Continued)
  OMIC (genomics, transcriptomics, proteomics, etc.) standards
    Gene Ontology
    Life Science Identifier
  MIT's Haystack
  BioDASH
  Simile
  Reference
    X. Wang, R. Gorlitsky & J.S. Almeida. From XML to RDF: how semantic web technologies will change the design of 'omic' standards. Nature Biotechnology 23, 2005.
    Semantic Web in Health-care and Life Sciences - Applications and Demonstrations. http://www.w3.org/2005/04/swls/
Financial Market
  XBRL: Extensible Business Reporting Language
    XBRL Taxonomy
    XBRL Linkbase
  Portia's XBRL Tool & Service
  Reference
    Gartner Highlights Key Emerging Technologies in 2005 Hype Cycle.
    http://www.portia.dk/websites/xbrltoolsservices.htm
Interactive Media
  Dublin Core
  RSS: Really Simple Syndication
  Reference
    Dublin Core Metadata Initiative. http://dublincore.org
    RSS. http://en.wikipedia.org/wiki/rss_(file_format)
A Software Roadmap
  Place the foundation
    Representation Language and Standards
    Interpretation Agent (reasoner, knowledge base)
  Enable ontology management
    Metadata Management
    Master Data Management
    Web Service Registry
  Support Semantic Web
Semantic Web Activity
  The Semantic Web is not going to replace the current Web (HTTP, XML, URL), but to build on top of it and realize the full potential of the Web.
  Timeline
    1997: the W3C defined the first Resource Description Framework (RDF)
    1999: RDF became a W3C Recommendation
    2004: RDF Schema (RDFS) and the Web Ontology Language (OWL) became W3C Recommendations
    2005: work began on the Rule Interchange Format
  Source: Tim Berners-Lee, AAAI 2006 Conference keynote
Challenges in Current Semantic Web
  OWL Full (DL, Lite) is neither necessary nor adequate in many domains.
    How to subset and extend them in specific applications?
    If OWL is subsetted and/or extended, how can interoperability between the dialects still be ensured?
  Most existing data sources use other kinds of data models with different assumptions, such as the closed-world assumption in databases.
    The Semantic Web must be bootstrapped from existing data sources, so how can the incompatible assumptions be overcome?
  Scalable, high-performance knowledge bases that support millions to billions of triples
  Distributed reasoning algorithms
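The mismatch between the closed-world assumption of databases and the open-world assumption behind OWL can be illustrated with a minimal Python sketch. The fact sets and function names are hypothetical; the point is only that a missing fact means "false" in one reading and "unknown" in the other.

```python
from typing import Optional, Set, Tuple

Fact = Tuple[str, str, str]

# Hypothetical example facts.
known: Set[Fact] = {("alice", "worksFor", "acme")}
known_false: Set[Fact] = {("alice", "worksFor", "globex")}


def ask_closed_world(facts: Set[Fact], fact: Fact) -> bool:
    """Database-style reading: whatever is not recorded is taken to be false."""
    return fact in facts


def ask_open_world(facts: Set[Fact], negated: Set[Fact], fact: Fact) -> Optional[bool]:
    """Open-world reading: a missing fact is unknown (None) unless explicitly negated."""
    if fact in facts:
        return True
    if fact in negated:
        return False
    return None


question = ("bob", "worksFor", "acme")
closed_answer = ask_closed_world(known, question)
open_answer = ask_open_world(known, known_false, question)
```

Here `closed_answer` is `False` while `open_answer` is `None`, which is the kind of disagreement that has to be resolved when a knowledge base is bootstrapped from relational data.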
Integrated Ontology Development Toolkit
  IODT is both a core technology for managing RDF, RDFS and OWL ontology data and an API and toolkit for developing and using the technology.
  EODM
    Source code at http://www.eclipse.org/emft/projects/eodm/
  IODT
    Binary code at http://www.alphaworks.ibm.com/tech/semanticstk
    Source code at https://cs.opensource.ibm.com/projects/iodt
    Ranked 19th by download count among all 556 alphaWorks technologies in June 2006
Integrated Ontology Development Toolkit
  [Figure: two views of an ontology in IODT. Labels: Object-Oriented Model, comprising the ODM Model (Model Driven Perspective) and the ECore Model (EMF Perspective); EODM Model; Triples and Logic Model (Semantic Perspective); Minerva]
Metadata: Help understand/exploit information assets
  [Figure: information assets at the center, described by business metadata, operational metadata and application metadata, surrounded by the roles that depend on them]
  End Users: "We're losing revenue because similar products are hard to find in our online catalog."
  Business Analyst: "We need a system to automatically reconcile product inventory from our subsidiaries."
  Software Architect: "We can build a system to integrate product information across subsidiaries."
  Developer: "This EJB will represent product information in a vendor-independent way."
  Data Architect: "This data model will integrate our product inventory with vendor information."
  IT Admin: "We'll need to put in a high-speed WAN between our subsidiary data centers."
  Data Admin: "We can deploy this data model using federation to integrate existing inventory databases."
  Source: IBM Metadata TT Study, 2005
Metadata Challenges
  What is metadata: metadata defines the content, context, and structure of information
    IT level
      Table names, column names and data types
      Key constraints between tables
      Text annotations/descriptions of what the tables and columns represent
    Business level
      Business concepts
      Business objects
      Business rules
  Current limitations
    Lack of interoperability between metadata
      Each data source has its own metadata and/or metadata repository
    Limited business understanding and collaboration between business and IT
      Models are IT-centric, not business-centric
    Insufficient semantics for dynamic SOA environments
      Informational, but not computational, limiting automation
Semantic Metadata Management
  Use ontologies to model business concepts and ontology instances to model business objects
    Capture industry-wide or enterprise-wide business concepts in ontologies
    Populate and manage metadata about business concepts, physical data and the inter-relationships between them
    Construct virtual business objects dynamically from the metadata
  Enable greater IT flexibility to support business operations more responsively
    Give a view of data at the business level instead of the IT level
    Enable dynamic definition of new business concepts
    Apply rules and policies to business objects instead of IT-level objects
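Constructing a virtual business object from metadata can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the concept, the source, table and column names, and the in-memory `SOURCES` stand-in for physical databases are all hypothetical.

```python
# Metadata: business concept -> property -> (source, table, column).
# Every name below is hypothetical.
CONCEPT_MAPPINGS = {
    "Product": {
        "name": ("inventory_db", "PROD", "PROD_NM"),
        "price": ("pricing_db", "PRICE_LIST", "AMT"),
    }
}

# Stand-in for physical data: source -> table -> key -> row.
SOURCES = {
    "inventory_db": {"PROD": {"p1": {"PROD_NM": "Widget"}}},
    "pricing_db": {"PRICE_LIST": {"p1": {"AMT": 9.5}}},
}


def build_business_object(concept, key):
    """Assemble one business-level object by following the metadata mappings."""
    obj = {"concept": concept, "id": key}
    for prop, (source, table, column) in CONCEPT_MAPPINGS[concept].items():
        row = SOURCES[source][table].get(key, {})
        obj[prop] = row.get(column)  # None when the source holds no value
    return obj


product = build_business_object("Product", "p1")
```

Defining a new business concept then means adding an entry to the mapping metadata rather than changing the IT-level schemas.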
Master Data
  Definition
    Master data is data that is shared across systems (such as lists or hierarchies of customers, suppliers, accounts, or organizational units) and is used to classify and define transactional data. [IDC]
  Example transaction
    Sell Product A to Customer X on 1/1/06 for $100.
  With master data, we should be able to answer questions such as
    What is a customer? (defines the concept)
    How to add a new customer? (defines the workflow)
    How to know that two customers refer to the same identity? (defines business rules)
  Value of Master Data Management
    Use an MDM Hub as the master to keep the multiple systems already deployed, acting as slaves, consistent
    Apply workflows and rules consistently across applications
  [Figure: an MDM Hub (hierarchies, rules, workflow) connected to Procurement, CRM, Marketing, Business Performance Management, Finance, Data warehouse and Planning systems, each holding transactional data, master data and metadata; uses shown are Transactions, Query and Reporting, and Analysis]
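The question of whether two customer records refer to the same identity is answered by a business rule. A minimal Python sketch of one such rule follows; the rule itself (matching on normalized name and email) and the record fields are assumptions chosen for illustration, not a rule taken from the talk.

```python
def normalize(record):
    """Reduce a customer record to a comparable key (hypothetical rule)."""
    name = " ".join(record["name"].lower().split())  # collapse case and spacing
    email = record["email"].strip().lower()
    return (name, email)


def same_identity(a, b):
    """Business rule: two records denote one customer if their keys match."""
    return normalize(a) == normalize(b)


# Hypothetical records as they might appear in two deployed systems.
crm_record = {"name": "ACME  Corp", "email": " Sales@acme.example "}
finance_record = {"name": "acme corp", "email": "sales@acme.example"}
other_record = {"name": "Acme Corp", "email": "billing@acme.example"}
```

Keeping the rule in one place, such as an MDM hub, is what allows it to be applied consistently across applications.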
Master Data Challenges
  Three essential technologies for MDM
    "An enterprise-wide data model", which provides a logical model for aggregating and reconciling the various data sources that comprise a master record.
    "Federated capabilities" to connect independent data stores with a thin structure while leaving most of the data in their source locations.
    "Identity management" to not only securely identify customers but also centrally manage privacy policies.
    Source: CDI Institute MarketPulse Survey, 2004
  Next-generation Master Data Management
    Data model: flexible, easy to change, expressive, semantically complete
    Storage model: effective for managing metadata and data, scalable and high-performance
Semantic Master Data Management
  Use ontologies and ontology instances to model master data
  Develop a scalable ontology repository and search engine
  Added value of an ontology model for master data
    Support dynamic categories of customers and products
    Support rich relationship types and reasoning over relationships
    Query over relationships with a semantic query language (e.g. SPARQL)
    Categorize objects on the fly for master data exchanges
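Reasoning over relationships to categorize master data on the fly can be sketched with a small subclass hierarchy in Python. The class names and instances are hypothetical, and each class is assumed to have at most one direct superclass, which is a simplification of what RDFS and OWL allow.

```python
# Hypothetical ontology: class -> direct superclass (single inheritance assumed).
SUBCLASS_OF = {"GoldCustomer": "Customer", "Customer": "Party", "Supplier": "Party"}

# Hypothetical master data: instance -> asserted class.
INSTANCE_OF = {"acme": "GoldCustomer", "bolt": "Supplier", "cora": "Customer"}


def superclasses(cls):
    """Follow subclass links upward and return every inherited class."""
    seen = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        if cls in seen:  # guard against accidental cycles in the data
            break
        seen.append(cls)
    return seen


def instances_of(cls):
    """Return instances asserted in cls or inferred through a subclass."""
    return sorted(
        name for name, asserted in INSTANCE_OF.items()
        if asserted == cls or cls in superclasses(asserted)
    )


customers = instances_of("Customer")
parties = instances_of("Party")
```

Adding a new category, such as a subclass of `Customer`, requires only one new entry in the hierarchy; existing queries pick it up without change.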
Web Service Discovery Need a standard interoperable platform that enables companies and applications to quickly, easily, and dynamically find and use Web services over the Internet
Semantic Web Services
  Semantics can improve software reuse and discovery, significantly facilitate the composition of Web services, and enable the integration of legacy applications.
    Semantic models provide agreement on the meaning and intended use of terms.
    Reasoning techniques can be used to find the semantic similarity between the service description and the request.
  Example: WSDL-S
    Associates semantic annotations with Web services described using WSDL
  Other representations: WSMO, OWL-S, SWSA/SWSL
  [Figure: WSDL-S Information Model]
  Reference
    R. Akkiraju, J. Farrell, J. Miller, M. Nagarajan, M. Schmidt, A. Sheth, K. Verma. "Web Service Semantics - WSDL-S." A joint UGA-IBM Technical Note, version 1.0, April 18, 2005.
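Matching a request against semantically annotated services can be sketched in Python as follows. This is not WSDL-S syntax: the concept hierarchy, the service names and the two-level ranking (exact match before more specific match) are hypothetical assumptions meant only to show how subsumption reasoning supports discovery.

```python
# Hypothetical domain ontology: concept -> direct parent concept.
PARENT = {"CreditCardPayment": "Payment", "Payment": "Transaction"}

# Hypothetical registry: service name -> concept annotating what it provides.
SERVICES = {
    "PayByCard": "CreditCardPayment",
    "GenericPay": "Payment",
    "Lookup": "Address",
}


def is_a(concept, ancestor):
    """True if concept equals ancestor or is reachable from it via parent links."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def discover(requested):
    """Rank services: exact concept match first, then more specific concepts."""
    ranked = []
    for name, concept in SERVICES.items():
        if concept == requested:
            ranked.append((0, name))
        elif is_a(concept, requested):
            ranked.append((1, name))
    return [name for _, name in sorted(ranked)]


payment_services = discover("Payment")
```

A purely keyword-based registry would miss `PayByCard` for a `Payment` request; the ontology is what tells the matcher that a credit card payment is a kind of payment.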
Information Delivery and Integration on Semantic Web
  Support many kinds of data integration, so that we can
    1) postpone the labor-intensive aspects of data integration until they are absolutely needed, and choose the right integration approach;
    2) reuse previous data integration efforts
  Kinds of integration: Federation, ETL, Aggregation, Search, Iterative
  [Figure: integration architecture built on RDF + OWL + SPARQL + HTTP. Labels: SPARQL Service, RDF form of Data, Ontologies of Objects, Object Crawler, Search Interface, Index of Objects, Mapping, Adapter, XQuery Service, Existing Database (SQL Schema), Existing XML]
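The adapter-and-mapping step, which exposes an existing SQL database as an RDF form of data, can be sketched with Python's standard `sqlite3` module. The namespace, table, columns and the `COLUMN_TO_PROPERTY` mapping are hypothetical, and the table name is assumed to come from trusted configuration because it is interpolated into the query.

```python
import sqlite3

EX = "http://example.org/"  # hypothetical namespace

# Hypothetical mapping: SQL column -> ontology property.
COLUMN_TO_PROPERTY = {"name": EX + "name", "city": EX + "city"}


def rows_to_triples(conn, table, key_column):
    """Turn each row of a table into (subject, property, value) triples."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table must be trusted input
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{EX}{table}/{record[key_column]}"
        for column, prop in COLUMN_TO_PROPERTY.items():
            if record.get(column) is not None:
                triples.append((subject, prop, record[column]))
    return triples


# Stand-in for an existing database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Acme', 'Beijing')")
triples = rows_to_triples(conn, "customer", "id")
```

Because the mapping is data rather than code, it can be reused when the same source is later integrated by federation, ETL or search.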
Conclusion: Semantic Web, the Next Generation Web
  [Figure: progression toward Information On Demand. Labels, in source order: Unstructured Information, Electronic File, Web 1.0, Web 3.0, Semantic Web, Structured Information, Database, Web 2.0 (SOA); stages: Digitalized, Structured, Linked, Integrated, Intelligent]
Q&A
  Thank You!