DataGraft: A Platform for Open Data Publishing

Dumitru Roman (1), Marin Dimitrov (2), Nikolay Nikolov (1), Antoine Putlier (1), Brian Elvesæter (1), Alex Simov (2), Yavor Petkov (2)

(1) SINTEF, Forskningsveien 1a, 0373 Oslo, Norway
{firstname.lastname}@sintef.no
(2) Ontotext AD, Tsarigradsko Shosse 47A, 1784 Sofia, Bulgaria
{firstname.lastname}@ontotext.com

Abstract. DataGraft is a platform for Open Data management. Its goals are to simplify and speed up the data publishing process and to improve the reliability and scalability of the data consumption process. This demonstrator summarises the key features of the current DataGraft platform and presents a simple demo scenario from the domain of property-related data.

1 Introduction

DataGraft aims to provide tools and approaches for easier and lower-cost publication and reuse of Open Data (and Linked Data in particular). The lifecycle for publishing Open Data typically involves data cleaning and transformation (most often from tabular formats), mapping to standard Linked Data models, and generating a semantic RDF graph. The resulting semantic graph is stored in a triple store, so that applications and services can easily access and query the data. While this process is conceptually straightforward, publishing and consuming (linked) Open Data remains a complex and time-consuming task for several reasons:

1. The technical complexity of preparing Open Data for publication is high: toolkits are poorly integrated and often require expert knowledge;
2. There is a considerable cost for publishing data and providing reliable access to it. The required expertise and resources are often excessively high for many nonprofit organisations;
3. The supply of Open Data is fragmented: datasets are usually provided through disconnected channels, inconsistently formatted and structured, and poorly maintained.

2 The DataGraft Platform

DataGraft (http://datagraft.net/) provides a cloud-based platform for Open Data publishing. Its key features are:

- Interactive design of data transformations: transformations provide feedback to publishers on how data changes;
- Repeatable data transformations: data transformation processes often need to be executed repeatedly as new data arrives. Executable and repeatable transformations are a key requirement for a low-cost data publication process;
- Shareable and reusable data transformations: the ability to reuse and extend data transformations created by other developers further lowers the data publication cost;
- Reliable data access: provisioning data reliably is another key requirement for the third-party data services and applications built on top of Open Data.

Fig. 1. Key DataGraft components

The key enablers of DataGraft are shown in Fig. 1. Grafter (http://grafter.org/) is an open source framework of reusable components designed to support complex data transformations. Grafter provides a domain-specific language (DSL), which allows the specification of transformation pipelines that convert tabular data or produce Linked Data graphs. The main advantages of Grafter over similar ETL frameworks include: 1) efficient support for very large datasets, due to its streaming approach to data processing; 2) its highly modular and extensible design; 3) the ability to serialize and execute transformations as services in a sandboxed environment.

Grafterizer is an open source web-based frontend for data cleaning and transformation built on top of Grafter. It provides an interactive user interface that supports the data transformation process through: 1) forking of existing data transformations; 2) creation of complex data transformation workflows by combining and configuring data transformation steps; and 3) live preview of the data transformation over sample data.

Another key enabler is the semantic Graph Database-as-a-Service (DBaaS) triple store, which is used for accessing the Linked Data on the platform.
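The path from tabular input to the triples that end up in such a store can be illustrated with a minimal Python sketch. It is not Grafter's actual DSL (Grafter is a separate framework with its own language); it only mimics the streaming idea by chaining generator steps so that rows are cleaned and mapped one at a time. The column names, the sample rows, the `http://example.org/property/` namespace and all function names are assumptions made up for this illustration.

```python
import csv
import io
from typing import Callable, Iterable, Iterator

Row = dict[str, str]
Step = Callable[[Iterable[Row]], Iterator[Row]]

# Hypothetical namespace and made-up sample data, for illustration only.
BASE = "http://example.org/property/"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
SAMPLE = "id;name;area\n1;Office A; 120,5 \n;Missing id;10\n2;Depot;80\n"


def drop_rows_without_id(rows: Iterable[Row]) -> Iterator[Row]:
    """Cleaning step: skip rows whose identifier is empty."""
    for row in rows:
        if row["id"].strip():
            yield row


def normalise_area(rows: Iterable[Row]) -> Iterator[Row]:
    """Cleaning step: trim whitespace and use '.' as decimal separator."""
    for row in rows:
        yield {**row, "area": row["area"].strip().replace(",", ".")}


def run_steps(rows: Iterable[Row], steps: list[Step]) -> Iterator[Row]:
    """Chain the steps lazily; no step materialises the whole table."""
    for step in steps:
        rows = step(rows)
    return iter(rows)


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(rows: Iterable[Row]) -> Iterator[str]:
    """Mapping step: emit two N-Triples lines per cleaned row."""
    for row in rows:
        subject = f"<{BASE}{row['id'].strip()}>"
        yield f'{subject} <{BASE}name> "{escape_literal(row["name"])}" .'
        yield f'{subject} <{BASE}area> "{row["area"]}"^^<{XSD_DECIMAL}> .'


def transform(text: str) -> list[str]:
    """Read semicolon-separated text and return the generated triples."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    cleaned = run_steps(reader, [drop_rows_without_id, normalise_area])
    return list(to_ntriples(cleaned))


if __name__ == "__main__":
    for line in transform(SAMPLE):
        print(line)
```

Because every step is a generator, a large file could be processed row by row; `transform` collects the result into a list only to keep the demonstration short. The resulting N-Triples lines are the kind of output that would then be loaded into a triple store.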
With a database-as-a-service solution, data publishers do not need to deal with administrative overheads such as installation, upgrades, maintenance and provisioning. From the point of view of a data publisher or a data consumer, the DBaaS provides standard APIs and endpoints for Linked Data access, querying, and management. These functionalities are based on a complex cloud architecture, which ensures database scalability, extensibility and availability at large scale [1].

Finally, the Open Data portal integrates the components in a web-based interface. The entire process of publishing data is reduced to a simple wizard-like interface, where publishers can drop their data and enter some basic metadata. Currently, the platform provides a number of visualization widgets, including tables, line charts, bar charts, pie charts, scatter charts, bubble charts and maps.

3 Demo Scenario: Publishing Property-related Data

The demonstration scenario highlights the capabilities of the DataGraft platform: transforming data provided by the State of Estate service for state-owned properties in Norway and publishing the data as Linked Data. The scenario workflow is summarised in Fig. 2.

Fig. 2. Demo scenario

The scenario will demonstrate:

1. Interactive specification of tabular data transformations and mapping of tabular data to graph data (Linked Data);
2. Publication of data transformations on the DataGraft asset catalogue;
3. Execution and storage of transformed data on the semantic graph database-as-a-service on DataGraft;
4. Sharing, reusing and extending user-generated content;
5. Querying published data from the live endpoint and visualising query results (Fig. 3).

A visitor of the demonstration will learn how to:

- Use DataGraft for simple data transformation and publishing;
- Easily create data transformations through DataGraft's GUI;
- Share and reuse data transformations already published in DataGraft;
- Run data transformations and publish the resulting data on DataGraft's cloud-based semantic graph database;
- Access and query data published on DataGraft;
- Use DataGraft for real-life applications (publishing property data).

Fig. 3. Data query and visualization in DataGraft

DataGraft is available via http://datagraft.net/ and further details can be found in [2].

4 Ongoing Work

DataGraft is currently under active development within the proDataMarket project (http://prodatamarket.eu/), and new features and improvements are added to the live platform on a regular basis. Various new DataGraft features are already in development or planned for delivery within the next 12 months:

- Extending the data hosting platform towards data science and analytics, with the ability to configure and run simple analytics directly on the platform (rather than downloading data and running the analytics locally);
- Ability to interlink the generated Linked Data to existing datasets in a semi-automated manner;
- Dealing with data streams (rather than static input data files);
- Extensions towards working with large geo-spatial datasets and queries;
- Ability to share and reuse other assets, such as data queries or visualization widgets;
- Improved error reporting in data transformations.

Acknowledgements. This work was partly funded by the European Commission within the following research projects: DaPaaS (FP7 610988), SmartOpenData (FP7 603824), InfraRisk (FP7 603960), and proDataMarket (H2020 644497).

References

1. M. Dimitrov, A. Simov, and Y. Petkov. Low-cost Open Data As-a-Service in the Cloud. In Proceedings of the 2nd Semantic Web Developers Workshop (SemDev 2015), part of the Extended Semantic Web Conference (ESWC 2015), May 31st 2015, Portoroz, Slovenia.
2. D. Roman, N. Nikolov, A. Putlier, D. Sukhobok, B. Elvesæter, A. Berre, X. Ye, M. Dimitrov, A. Simov, M. Zarev, R. Moynihan, B. Roberts, I. Berlocher, S. Kim, T. Lee, A. Smith, and T. Heath. DataGraft: One-Stop-Shop for Open Data Management. Technical Report, January 2016. Available at http://www.semantic-web-journal.net/system/files/swj1285.pdf.