The EHRI GraphQL API IEEE Big Data Workshop on Computational Archival Science

Similar documents
GraphQL: Mind Your Ps and QLs

GraphQL for Archival Metadata: An Overview of the EHRI GraphQL API

15/06/2018 In Out, In Out, And Shake It All About. A Moving Story of Data

Introduction to GraphQL and Relay. Presenter: Eric W. Greene

European Holocaust Research Infrastructure Theme [INFRA ] GA no Deliverable D19.5

The Future of the Realtime Web BETTER APIS WITH GRAPHQL. Josh

API Documentation. Web Application Development. Zsolt Tóth. University of Miskolc. Zsolt Tóth (University of Miskolc) API Documentation / 28

Reflecting on digital library technologies

DATABASE SYSTEMS. Database programming in a web environment. Database System Course, 2016

Graphene Documentation

The DataCite Metadata Schema. Frauke Ziedorn Workshop: Metadata and Persistent Identifiers for Social and Economic Data 7th May 2012


LUCITY REST API INTRODUCTION AND CORE CONCEPTS

REST API Documentation Using OpenAPI (Swagger)

GraphQL. Concepts & Challenges. - I m Robert Mosolgo - Work from home Ruby developer - From Charlottesville VA - For GitHub

ADVANCED DATABASES CIS 6930 Dr. Markus Schneider. Group 5 Ajantha Ramineni, Sahil Tiwari, Rishabh Jain, Shivang Gupta

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

GraphQL in Python and Django. Patrick

LOG8430: Architecture logicielle et conception avancée

Oracle APEX 18.1 New Features

Joining the BRICKS Network - A Piece of Cake

Best Practices for Choosing Content Reporting Tools and Datasources. Andrew Grohe Pentaho Director of Services Delivery, Hitachi Vantara

Course Content MongoDB

Its All About The Metadata

Big Data Hadoop Stack

Sharing Archival Metadata MODULE 20. Aaron Rubinstein

RPC VS. REST VS. GraphQL

bjoern Documentation Release 0.1 Fabian Yamaguchi

JVA-563. Developing RESTful Services in Java

Big Data Hadoop Course Content

GraphQL - when REST API is not

Testing with Soap UI. Tomaš Maconko

Vlad Vinogradsky

Data Exchange and Conversion Utilities and Tools (DExT)

The Community Data Portal and the WMO WIS

Integration Framework. Architecture

X-S Framework Leveraging XML on Servlet Technology

The Design of a DLS for the Management of Very Large Collections of Archival Objects

Increasing access to OA material through metadata aggregation

Etanova Enterprise Solutions

An early challenge that LabKey users encounter is what is the best way to get their data into the server and what are the appropriate data types to

Web2py Instant Admin Documentation

ORCA-Registry v2.4.1 Documentation

Site Administrator Help

OAI-PMH. DRTC Indian Statistical Institute Bangalore

Metadata and Encoding Standards for Digital Initiatives: An Introduction

com.walmartlabs/lacinia-pedestal Documentation

CIS 612 Advanced Topics in Database Big Data Project Lawrence Ni, Priya Patil, James Tench

What is database? Types and Examples

The Now Platform Reference Guide

OCF Specification Overview Core Technology Specification. OCF 2.0 Release June 2018

DATA FORMATS FOR DATA SCIENCE Remastered

Preventing Injection Vulnerabilities through Context-Sensitive String Evaluation (CSSE)

Programming Technologies for Web Resource Mining

Distributed Multitiered Application

We are ready to serve Latest Testing Trends, Are you ready to learn? New Batch Details

Voldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

SYMFONY2 WEB FRAMEWORK

The Materials Data Facility

Table of Contents. I. Pre-Requisites A. Audience B. Pre-Requisites. II. Introduction A. The Problem B. Overview C. History

OpenGovIntelligence. Deliverable 3.5. OpenGovIntelligence ICT tools

The NoPlsql and Thick Database Paradigms

Query Based Reports in Maximo. Overview of Maximo Ad-hoc reporting functionality

SobekCM METS Editor Application Guide for Version 1.0.1

Topics. History. Architecture. MongoDB, Mongoose - RDBMS - SQL. - NoSQL

Using the MySQL Document Store

Oracle NoSQL Database Enterprise Edition, Version 18.1

B2SAFE metadata management

D2.5 Data mediation. Project: ROADIDEA

Linked Data in Translation-Kits

Portal Cache Tuning with Portal Cache Viewer Open Mic 10/01/2014

RESTful API Design APIs your consumers will love

ComSi WooCommerce product synchronization between unlimited webshops

Documenting APIs with Swagger. TC Camp. Peter Gruenbaum

Fondly Collisions: Archival hierarchy and the Europeana Data Model

Database Acceleration Solution Using FPGAs and Integrated Flash Storage

Open Archives Initiatives Protocol for Metadata Harvesting Practices for the cultural heritage sector

Call: Datastage 8.5 Course Content:35-40hours Course Outline

Introduction to Big Data. NoSQL Databases. Instituto Politécnico de Tomar. Ricardo Campos

Problem: Solution: No Library contains all the documents in the world. Networking the Libraries

Web Services For Translation

Top 7 Data API Headaches (and How to Handle Them) Jeff Reser Data Connectivity & Integration Progress Software

Full Stack Web Developer Nanodegree Syllabus

XML information Packaging Standards for Archives

Archivists Toolkit: Description Functional Area

Investigating Source Code Reusability for Android and Blackberry Applications

XML Web Service? A programmable component Provides a particular function for an application Can be published, located, and invoked across the Web

Exploring the Nuxeo REST API

Research repository models: Can one size fit all?

WhamTech SmartData Fabric Healthcare Configurable FHIR REST APIs

MuseKnowledge Hybrid Search

MarkLogic 8 Overview of Key Features COPYRIGHT 2014 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Yahoo Search ATS Plugins. Daniel Morilha and Scott Beardsley

Copyright 2014 Blue Net Corporation. All rights reserved

CISC 7610 Lecture 4 Approaches to multimedia databases

foreword to the first edition preface xxi acknowledgments xxiii about this book xxv about the cover illustration

Chapter 13 XML: Extensible Markup Language

How APEXBlogs was built

InfoSphere Data Architect Pluglets

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.

Transcription:

The EHRI GraphQL API IEEE Big Data Workshop on Computational Archival Science 13/12/2017 Mike Bryant CONNECTING COLLECTIONS

The EHRI Project The main objective of EHRI is to support the Holocaust research community by 1. integrating information on key archival collections and institutions into an online portal (https://portal.ehri-project.eu) 2. encouraging collaborative Holocaust research and archiving and investigating new methodologies http://ehri-project.eu

EHRI Portal APIs why? APIs = ways of getting data in structured form We want to offer our data to researchers in structured form We want to facilitate data interchange data between our own sites Also: Experimentation!

Embedding EHRI data into Wordpress

Search API REST-style: http://jsonapi.org

OAI-PMH Dublin Core & EAD

XML Downloads (EAD / EAC / EAG)

Ad-hoc datasets

Wikidata ghettos, concentration camps

Why another API? Lots of EHRI data not exposed by previous interfaces: Interconnections / links between items Annotations Digital provenance REST-like Search API cumbersome to extend and maintain Needed something better suited to building alternative Uis Also: something that was more user-friendly?

GraphQL? Invented by Facebook in 2012 Not related to graph databases! Name refers to a generic object graph Not tied to any type of storage mechanism Schema based Uses a schema to define the data types that are available and validate requests Allows requests to precisely defining the data required in the response Introspection: standard requests can inspect the schema Somewhere between HTTP REST API and full DB query language

GraphQL query & response example Queries list data fields required from those defined in the schema. Optional or mandatory parameters can qualify certain fields. Output data corresponds one-toone with query.

GraphQL ecosystem Big ecosystem of libraries and tools, supported by companies such as Facebook and Github: Apollo Relay Graphcool graphql-java and many more GraphiQL Online query-editing interface Works with any standard GraphQL API end-point

GraphiQL Inline documentation

GraphiQL Auto-completion of fields

GraphiQL Inline error messages

The EHRI GraphQL API - implementation Runs as a plugin to EHRI s Neo4j-based database Minimal network overhead Index-free traversals and native property lookups Inherits access-control system used in the EHRI portal GraphQL schema quite similar to database schema, but de-coupled and simplified in various ways

Use Case retrieving annotations

Use Case retrieving annotations

Use Case - context Archival resources are hierarchical, not a flat set Parent item Child items This can cause problems of over-fetching or underfetching of data in resource-oriented (REST-like) APIs: How much context is integral to the resource? How much is needed for any given usage? vs

Use Case - context

Use Case Bulk data streaming Most APIs are batch-oriented / paginated Fetching bulk data means fetching successive pages in multiple requests Many Big Data systems are better suited to stream processing rather than in-memory or distributed batches

Use Case Bulk data streaming EHRI GraphQL API allows setting a streaming mode Disables default pagination, allowing a single request to stream potentially unlimited amounts of data Very useful when combined with a JSON stream processing tool such as JQ curl -header X-Stream:true - data-binary @query.gql \ https://portal.ehri-project.eu/api/graphql

EHRI GraphQL: limitations Not nearly as expressive as a bona-fida database query language: Cannot do aggregation, sorting, or search Flip-side to allowing streaming bulk requests, available operations are deliberately limited in complexity to avoid potential resource-usage problems Currently only supports JSON response format

Future work Add lots more examples that users can load in GraphiQL Allow users to download data as a file, from GraphiQL Improve schema in certain ways

In summary No API is perfectly suited to every task GraphQL has many advantages Bring your own storage : can integrate with legacy databases Query model handles archival hierarcy/context well Schema and introspection enable many user aids and make API more explorable and discoverable Very active ecosystem of libraries and tools

THANKS! CONNECTING COLLECTIONS