Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.


Semantic Web Company PoolParty - Server PoolParty - Technical White Paper http://www.poolparty.biz

Table of Contents

Introduction
PoolParty Technical Overview
PoolParty Components Overview
Summary
PoolParty Thesaurus Server
PoolParty Extractor
PoolParty UnifiedViews
PoolParty Graph Search Server
Integrating PoolParty
PoolParty PowerTagging
PoolParty Semantic Integrator

2016, Semantic Web Company (SWC)

Introduction

PoolParty Semantic Suite (http://www.poolparty.biz/) is a world-class semantic technology suite that offers sharply focused solutions to transform knowledge organization and content business. As semantic middleware, PoolParty enriches all kinds of information with valuable metadata and links business and content assets automatically.

Figure 1: PoolParty Semantic Suite - Overview

PoolParty Technical Overview

The PoolParty technology platform consists of several components and can be configured and extended to meet individual requirements.

PoolParty Thesaurus Server supports web-based taxonomy and ontology management. It is built entirely on top of W3C's Semantic Web standards (http://www.w3.org/standards/semanticweb/). At its core, PoolParty uses RDF to represent SKOS and other vocabularies like Dublin Core or FOAF, so an RDF triple store serves as its technological basis. Unlike systems that still rely on relational databases, PoolParty can consume and publish Linked Data out of the box.
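To make the RDF representation of SKOS more concrete, the following minimal sketch builds the triples for a single concept and serializes them as N-Triples using only the Python standard library. The namespace `http://example.org/thesaurus/`, the concept identifiers and the helper names are assumptions made for illustration; they are not taken from PoolParty.

```python
# Minimal sketch: one SKOS concept expressed as RDF triples (subject, predicate, object).
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/thesaurus/"  # hypothetical namespace, not a PoolParty URI


def uri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text, lang):
    """Build a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_triples(concept_id, pref_label, alt_labels=(), broader=None, lang="en"):
    """Return the triples that describe one SKOS concept."""
    subject = uri(BASE + concept_id)
    triples = [
        (subject, uri(RDF_TYPE), uri(SKOS + "Concept")),
        (subject, uri(SKOS + "prefLabel"), literal(pref_label, lang)),
    ]
    for alt in alt_labels:
        triples.append((subject, uri(SKOS + "altLabel"), literal(alt, lang)))
    if broader is not None:
        triples.append((subject, uri(SKOS + "broader"), uri(BASE + broader)))
    return triples


def to_ntriples(triples):
    """Serialize triples one per line in N-Triples syntax."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = concept_triples(
    "cocktail", "Cocktail", alt_labels=["Mixed drink"], broader="beverage"
)
print(to_ntriples(triples))
```

Because every statement is a triple, labels, hierarchy and links to other vocabularies all share one data model, which is what allows a triple store to hold them side by side.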

Besides the possibility to publish any PoolParty-based thesaurus via a Linked Data front-end, the system offers a SPARQL endpoint (http://www.w3.org/TR/rdf-sparql-query/) to execute queries over each thesaurus project. This technology can be used to integrate knowledge graphs with content platforms (wikis, CMS, etc.) or search engines. Additionally, PoolParty supports highly scalable and precise entity extraction and other text mining services. Its ability to transform structured and unstructured information into RDF offers new options for data analytics.

PoolParty Semantic Integrator, the most advanced configuration level of PoolParty Suite, is a solution for data analytics built on top of Linked Data Warehouses with SPARQL engines at its core.

In addition to full SPARQL support, PoolParty APIs offer traditional means to integrate semantics into enterprise information systems and web platforms. Based on JSON and REST, developers can use all CRUD methods necessary to maintain a taxonomy from within a third-party application like a CMS. PoolParty integrations have been implemented with content platforms like Drupal, SharePoint, Confluence, Alfresco, or WordPress. As a by-product, guidelines have been developed that can be reused for other integration projects. PoolParty has also been integrated into search engines like Solr, Elasticsearch, Mindbreeze or Intrafind.

The following figure illustrates the most important technical components of the PoolParty technology platform. As a basic layer, various data formats and sources can be consumed and transformed, either to generate knowledge graphs from them or as an incoming information stream that is automatically linked to the knowledge graphs in the RDF store. PoolParty Semantic Middleware, which is deployed in Apache Tomcat, offers various semantic services, GUIs and APIs. Core services are built on top of the Spring Framework (http://spring.io/).

Taxonomists and thesaurus managers take care of the knowledge graphs. Information architects make sure that relevant content sources are properly linked to them, which is a precondition for establishing applications and services like semantic indexing on top. Developers use PoolParty APIs to build smart applications and semantic services like recommender systems, automatic tagging facilities, or semantic search applications. Linked Data Warehouses (remote RDF graph databases) can be attached to the platform to store all resulting RDF graphs and make them available via SPARQL queries and reports.
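As an illustration of how a client might address the SPARQL endpoint mentioned above, the sketch below prepares an HTTP GET request in the style of the W3C SPARQL Protocol, which passes the query in a `query` parameter. The endpoint URL is a placeholder and not a documented PoolParty path; the request is only built, not sent, so the example runs without a server.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL; the real path depends on the installation.
ENDPOINT = "http://example.org/sparql/my-thesaurus"

# List up to ten SKOS concepts together with their English preferred labels.
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Prepare a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url[:60] + "...")

# Against a reachable endpoint, the request could then be sent with
# urllib.request.urlopen(request) and the body parsed with json.load().
```

Requesting `application/sparql-results+json` keeps the client side simple, since the result bindings can be read with any JSON parser.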

Figure 2: PoolParty Semantic Suite - Technical Overview

PoolParty Components Overview

The technical architecture above gives an overview of the technical components that form the basis of the PoolParty Semantic platform. From an application perspective, the platform can be divided into different functional components (see diagram below) that are combined into specific product bundles offering various integration options.

Figure 3: PoolParty Semantic Suite - Functional Components

The following chapters give an overview of the different components and outline integration options:

- PoolParty Thesaurus Server
- PoolParty Extractor
- PoolParty UnifiedViews
- PoolParty Graph Search Server
- Integrating PoolParty

Summary

PoolParty Product Suite offers a wide variety of options to benefit from semantic technologies. The major topics covered are semantic search, taxonomy and ontology management, data integration and linked data. At its core, PoolParty uses Semantic Web technologies, which are built around open standards and state-of-the-art technologies.

Professional metadata management is the key to efficient information management in large organisations and on the web. PoolParty combines methodologies from the Semantic Web with text mining algorithms and approaches for collaborative knowledge engineering to make applications smarter and to improve user experience.

PoolParty Thesaurus Server

In the age of the web, taxonomies and thesauri should in most cases be engineered and maintained collaboratively. PoolParty is fully web-based; administrators need only a web browser to perform all typical CRUD operations like creating, editing or deleting concepts or relations in a knowledge graph. Workflows and approval processes can be activated if desired.

Figure 4: PoolParty Thesaurus Server - Graphical User Interface

The PoolParty graph modelling approach intertwines taxonomy and ontology management. SKOS taxonomies can be extended by ontologies like schema.org or FOAF, and PoolParty users can create their own custom ontologies and schemas by reusing existing ontologies.

By means of text corpus analysis and automatic quality checks, PoolParty helps taxonomists make sure that the resulting thesauri cover the content base in a meaningful way. Suggestions for the supervised extension of taxonomies are generated automatically by text mining algorithms.

The rise of Linked Data, indicated by the enormous growth of the Linked Open Data cloud, is an important argument for many organisations to maintain their own data at least partly based on Linked Data standards. PoolParty makes use of existing Linked Data sources: concepts can be aligned and enriched with additional information from sources like DBpedia, Geonames, Wikidata or others. To generate seed thesauri for a certain domain, DBpedia can be used as a source to extract a taxonomy automatically.

A wiki frontend for each thesaurus project helps to involve people other than taxonomists in the thesaurus development process. Linking concepts is another flexible way to build thesauri in decentralised structures. Based on the Linked Data principles, thesauri can be maintained in different places and still be connected to each other, indicating that several concepts are similar or even identical.

PoolParty is an enterprise-ready system that offers high reliability, security and performance, as well as mechanisms like failover, which guarantee smooth workflows and protection from loss of data. The use of open standards guarantees high investment security. The integration of PoolParty thesauri with enterprise systems can be realised on top of standard APIs.

PoolParty Thesaurus Server is natively built on top of RDF triple stores. Its graph management facilities can be integrated with all graph stores providing a SAIL API (http://rdf4j.org/sesame/2.7/docs/users.docbook?view).

PoolParty Extractor

The PoolParty Extractor analyses documents and text and automatically extracts meaningful phrases, named entities, categories or other metadata with high throughput and accuracy. Different data or metadata schemas can be mapped to a SKOS thesaurus that is used as a unified semantic knowledge model. During this process the extracted entities are linked to the knowledge model (the thesaurus in the PoolParty Thesaurus Server) via URIs, which provide a direct path to integration following Semantic Web principles.

The PoolParty Extractor is implemented as a pipeline of annotation units in which each unit adds to the final result. This keeps the system flexible and allows it to be adapted quickly to new requirements.

Advanced linguistic features include classification, corpus statistics and disambiguation. Documents are classified along the structure of a thesaurus, which allows the user to change the classification criteria flexibly. Corpora (sets of domain-specific documents) are a good way to add background knowledge to text mining processes. They provide term frequencies and distributions that improve the scoring of entities and drive the detection of new relevant entities in text. Ambiguity can greatly reduce the precision of entity extraction when identical terms are used to refer to different entities. Such ambiguities can be modelled in PoolParty, which improves extraction quality and, in the end, the experience of the users who interact with the annotation results.
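The idea of a pipeline of annotation units, with extracted entities linked to thesaurus URIs, can be sketched as follows. This is a simplified illustration and not the Extractor's implementation: the three units, the tiny thesaurus, its `http://example.org/` URIs and the relative-frequency score (a stand-in for corpus-based scoring) are all assumptions.

```python
import re
from collections import Counter

# Hypothetical thesaurus: lower-cased label -> concept URI.
THESAURUS = {
    "linked data": "http://example.org/thesaurus/linked-data",
    "sparql": "http://example.org/thesaurus/sparql",
    "rdf": "http://example.org/thesaurus/rdf",
}


def tokenize_unit(doc):
    """Unit 1: split the text into lower-cased word tokens."""
    doc["tokens"] = re.findall(r"[a-z0-9]+", doc["text"].lower())
    return doc


def concept_unit(doc):
    """Unit 2: match token n-grams against thesaurus labels, count per URI."""
    tokens = doc["tokens"]
    max_len = max(len(label.split()) for label in THESAURUS)
    counts = Counter()
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            if start + length > len(tokens):
                break  # the n-gram would run past the end of the text
            phrase = " ".join(tokens[start:start + length])
            if phrase in THESAURUS:
                counts[THESAURUS[phrase]] += 1
    doc["concepts"] = counts
    return doc


def scoring_unit(doc):
    """Unit 3: score each concept by its share of all concept matches."""
    total = sum(doc["concepts"].values()) or 1
    doc["annotations"] = sorted(
        (
            {"uri": concept, "frequency": n, "score": n / total}
            for concept, n in doc["concepts"].items()
        ),
        key=lambda a: (-a["frequency"], a["uri"]),
    )
    return doc


PIPELINE = [tokenize_unit, concept_unit, scoring_unit]


def run_pipeline(text):
    """Pass a document through every unit; each unit adds to the result."""
    doc = {"text": text}
    for unit in PIPELINE:
        doc = unit(doc)
    return doc["annotations"]


annotations = run_pipeline("Linked Data is published as RDF. SPARQL queries RDF graphs.")
for annotation in annotations:
    print(annotation)
```

Because each unit only reads and extends a shared document record, a further unit (for example one handling disambiguation) could be added to `PIPELINE` without changing the others.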

Figure 5: PoolParty Extractor - Result of extraction in a demo application

The text mining functionality of the PoolParty Extractor is integrated with other systems via a web service API that follows REST principles and produces results in JSON. The API is designed for high throughput. Where there are special requirements for high availability or scalability, the system can also be operated in clustered mode. Out of the box, the system comes with connectors to RDF graph databases that enable easy integration of the results of text mining processes with other RDF data.

PoolParty UnifiedViews

PoolParty UnifiedViews, as part of the PoolParty Semantic Integrator, provides a framework to develop, execute, monitor, debug, schedule, and share RDF data processing tasks. It can be seen as an Extract-Transform-Load (ETL) framework for RDF, although it doesn't strictly follow ETL processes since, for example, one process can trigger the next one. Data processing tasks are modelled as pipelines via a graphical interface and can consist of several Data Processing Units (DPUs). PoolParty UnifiedViews comes with a predefined set of DPUs and offers a well-documented API to develop custom DPUs (plugins) on demand.

Figure 6: PoolParty UnifiedViews - Graphical User Interface
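The pipeline model can be pictured as a chain of small processing steps that each take RDF data in and hand RDF data on. The sketch below is a conceptual illustration in Python and does not use the UnifiedViews DPU API; the `Pipeline` class, the three example DPUs and the sample records are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical source records: (concept id, raw label).
RECORDS = [("c1", " wine "), ("c2", "Beer"), ("c1", "wine")]


def extract_records(_: List[Triple]) -> List[Triple]:
    """Extractor DPU: ignore the input and emit triples from the source records."""
    return [(f"ex:{cid}", "skos:prefLabel", label) for cid, label in RECORDS]


def normalize_labels(triples: List[Triple]) -> List[Triple]:
    """Transformer DPU: trim whitespace and normalize capitalization of labels."""
    return [(s, p, o.strip().capitalize()) for s, p, o in triples]


def deduplicate(triples: List[Triple]) -> List[Triple]:
    """Transformer DPU: drop repeated triples while keeping their order."""
    return list(dict.fromkeys(triples))


@dataclass
class Pipeline:
    """A named sequence of DPUs with a simple execution log."""
    name: str
    dpus: List[Callable[[List[Triple]], List[Triple]]]
    log: List[str] = field(default_factory=list)

    def run(self, triples=None):
        data = list(triples or [])
        for dpu in self.dpus:
            data = dpu(data)
            # Record the intermediate size after every step for later review.
            self.log.append(f"{dpu.__name__}: {len(data)} triples")
        return data


pipeline = Pipeline("labels", [extract_records, normalize_labels, deduplicate])
result = pipeline.run()
print(result)
print(pipeline.log)
```

Keeping every DPU to the same input and output type is what makes the steps freely combinable; the log stands in, very loosely, for the kind of per-step information a monitoring view would show.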

Pipelines can be scheduled to run at set times or can be triggered by other pipelines. The Execution Monitor gives detailed information on the execution of a pipeline. All processed data is stored in separate graphs and can be reviewed for debugging. The underlying triple store is configurable: by default a built-in memory store is used, but essentially any triple store supporting the SAIL API can be integrated. The Scheduler also includes a notification system that can send information about the outcome of scheduled data processing tasks.

PoolParty UnifiedViews is based on the open source UnifiedViews project, and SWC is a member of its core development team (see http://unifiedviews.eu). PoolParty UnifiedViews is an SWC-maintained and supported build of UnifiedViews that includes DPUs for easy integration with the other PoolParty Semantic Integrator server components.

PoolParty Graph Search Server

The usual workflow of a PoolParty Graph Search project starts with gathering structured and unstructured data from various sources using UnifiedViews and/or transforming documents into RDF by means of the PoolParty Extractor. The processed RDF data is stored in a search index like Apache Solr or Elasticsearch, or in an enterprise triple store.

The PoolParty Graph Search Server offers a web service API that follows REST principles and produces results in JSON. It supports traditional document search applications and adds features like synonym search and hierarchical drill-down based on the knowledge graph that is managed with PoolParty Thesaurus Server.
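How a knowledge graph can drive synonym search and hierarchical drill-down is sketched below. This is a deliberately naive illustration that uses substring matching over an in-memory dictionary; the thesaurus content, the document list and the function names are assumptions, and a real deployment would delegate matching to the search index.

```python
# Hypothetical thesaurus: concept id -> labels (synonyms) and narrower concepts.
THESAURUS = {
    "beverage": {"labels": ["beverage", "drink"], "narrower": ["wine", "beer"]},
    "wine": {"labels": ["wine", "vino"], "narrower": ["red-wine"]},
    "red-wine": {"labels": ["red wine"], "narrower": []},
    "beer": {"labels": ["beer", "ale"], "narrower": []},
}

DOCUMENTS = [
    "A glass of vino with dinner",
    "Craft ale festival",
    "Tax report 2016",
]


def find_concept(term):
    """Return the id of the concept that carries the term as a label, if any."""
    term = term.lower()
    for concept_id, concept in THESAURUS.items():
        if term in concept["labels"]:
            return concept_id
    return None


def expand(concept_id):
    """Collect the labels of a concept and, recursively, of its narrower concepts."""
    labels = list(THESAURUS[concept_id]["labels"])
    for child in THESAURUS[concept_id]["narrower"]:
        labels.extend(expand(child))
    return labels


def search(term, documents):
    """Return documents matching the term, its synonyms or any narrower concept."""
    concept_id = find_concept(term)
    terms = expand(concept_id) if concept_id else [term.lower()]
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


print(search("drink", DOCUMENTS))
print(search("wine", DOCUMENTS))
```

A query for "drink" finds documents that mention only "vino" or "ale" because the expansion walks from the broader concept down to all narrower ones, which is the effect hierarchical drill-down relies on.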

Figure 7: PoolParty Graph Search

Using an enterprise triple store additionally makes it possible to exploit special data structures in the data to be searched, combining optimized SPARQL queries with the default document search features. The data can be organized in named graphs to keep data sets separate. This way, you can aggregate and manage large volumes of information like DBpedia, WordNet, Geonames etc. and integrate them into your search and analytics applications. Since all relevant data can be retrieved from the triple store, interactive visualizations or other forms of analytics, such as reports, can be built using SPARQL queries.

Customizing integrated linked knowledge graphs and adapting SPARQL queries allows you to modify your analysis applications in a dynamic and agile way. As with mashups, you can combine different data sets and formulate queries to retrieve data according to your needs.

Integrating PoolParty

Its APIs make PoolParty ready to be integrated into any enterprise information system and thereby provide the means to connect those systems via a semantic layer based on the developed taxonomies (knowledge graphs). Different APIs are provided per component.

All of them share the following features:

- JSON RESTful: Semantic technologies ready to be integrated based on popular web technologies.
- CRUD: Create, read, update, and delete. A full-blown API for all kinds of interactions with taxonomies and knowledge graphs.
- Secure: The PoolParty API is fully integrated into PoolParty's security layer based on Spring.
- SPARQL endpoint: In addition to the standard API, PoolParty's SPARQL endpoint is used to execute more complex queries and integrate data in a highly flexible manner.

The following two chapters outline, on the one hand, our unified approach to integrating PoolParty into any enterprise information system (PowerTagging) and, on the other hand, the principles of the PoolParty Semantic Integrator, which makes it possible to establish a unified layer over several systems.

PoolParty PowerTagging

PoolParty PowerTagging is a unified integration approach to semantically enrich Enterprise Information Systems (EIS) and provide advanced search features inside those systems. On the PoolParty Server side it is based on the Thesaurus Server API and the Entity Extractor API, and it requires the respective integration on the EIS side. Integrations exist for Atlassian Confluence, Drupal, Alfresco and MS SharePoint. The advantages of this approach are manifold:

- Automatic concept tagging: Annotate your content and attachments with concepts from your thesaurus and add additional tags if you like.
- Consistent metadata: Benefit from consistent tagging through auto-completion based on controlled vocabularies.
- Enhanced search: Extend your CMS's search capabilities with search facets, precise similarity search, automatic query expansion, sentiment analysis, and trend diagrams.
- Bulk-tagging: An existing CMS and its whole content base can be tagged automatically in one pass via bulk-tagging.
- Multilinguality: Multilingual thesauri, and therefore multilingual tagging and search, are supported.

Figure 8: Power Tagging Integration - MS SharePoint

This approach establishes a unified metadata layer within one Enterprise Information System. The next step is to use this information to connect different systems, either by using the same metadata layer or by linking different metadata layers via Semantic Web standards. This approach is offered by the PoolParty Semantic Integrator and outlined in the next chapter.

PoolParty Semantic Integrator

The PoolParty Semantic Integrator is a unified integration approach for connecting data from different sources. It provides a Unified Metadata Layer based on Semantic Web standards that makes it possible to create integrated views over different data sources. This is a novel and cost-efficient approach to data integration that allows enterprises to explore and use their data in ways that were not possible before. On the PoolParty Server side it is based on the Thesaurus Server API, the Entity Extractor API and the Graph Search Server API. PoolParty UnifiedViews can be used to automate and schedule different processing, transformation and synchronization tasks. That way a Semantic Platform is in place supporting the following features:

- SQL to RDF mapping: Data residing in different relational databases can be integrated easily and cost-efficiently.
- Text to RDF mapping: Based on the PoolParty Extractor, unstructured information can be integrated into the "enterprise knowledge graph".
- Integrations with various Enterprise CMS: Existing Power Tagging integrations can be added, so that information in different systems is interlinked.

- Integrate with LOD sources: The internal (enterprise) knowledge graph can be enriched with freely available knowledge sources like DBpedia, Freebase, Geonames etc.
- Unified API: Since all information is based on Semantic Web standards, a unified data model is in place that provides a unified API with SPARQL as a standardized query language.

Using these features, the PoolParty Semantic Integrator makes it possible to develop applications that provide access to the knowledge of an enterprise beyond simple search, e.g. semantic (graph) search, Linked Data based search, geospatial search, recommender engines, etc. Data analytics can be done faster and more cost-efficiently, providing 360-degree views of decision-critical business objects.

Figure 9: PoolParty Semantic Integrator - High-level architecture
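The SQL to RDF mapping feature listed above can be illustrated with a minimal sketch that turns each row of a relational table into a set of triples. It uses an in-memory SQLite database so that it is self-contained; the table, the `http://example.org/` namespaces and the one-predicate-per-column rule are assumptions and do not describe how the Semantic Integrator maps data.

```python
import sqlite3

BASE = "http://example.org/resource/"  # hypothetical namespace for row subjects
VOCAB = "http://example.org/vocab/"    # hypothetical namespace for column predicates


def rows_to_triples(connection, table, key_column):
    """Map each row to triples: one subject per row, one predicate per column.

    The table and column names are interpolated into the SQL text, so they
    must come from trusted configuration, never from user input.
    """
    cursor = connection.execute(f"SELECT * FROM {table} ORDER BY {key_column}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key_column]}>"
        for column, value in record.items():
            if column == key_column or value is None:
                continue  # the key forms the subject; NULL values yield no triple
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            triples.append((subject, f"<{VOCAB}{column}>", f'"{escaped}"'))
    return triples


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, category TEXT)"
)
connection.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "Riesling", "wine"), (2, "Pilsner", None)],
)

triples = rows_to_triples(connection, "product", "id")
for s, p, o in triples:
    print(s, p, o, ".")
```

Once relational rows are expressed this way, they can be loaded into the same store as the thesaurus and queried together with it through SPARQL.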