Oracle Spatial and Graph: Benchmarking a Trillion Edges RDF Graph ORACLE WHITE PAPER NOVEMBER 2016

Similar documents
An Oracle White Paper November Primavera Unifier Integration Overview: A Web Services Integration Approach

Generate Invoice and Revenue for Labor Transactions Based on Rates Defined for Project and Task

Oracle CIoud Infrastructure Load Balancing Connectivity with Ravello O R A C L E W H I T E P A P E R M A R C H

Automatic Receipts Reversal Processing

Creating Custom Project Administrator Role to Review Project Performance and Analyze KPI Categories

Configuring Oracle Business Intelligence Enterprise Edition to Support Teradata Database Query Banding

An Oracle White Paper June Exadata Hybrid Columnar Compression (EHCC)

Loading User Update Requests Using HCM Data Loader

Installation Instructions: Oracle XML DB XFILES Demonstration. An Oracle White Paper: November 2011

Oracle NoSQL Database For Time Series Data O R A C L E W H I T E P A P E R D E C E M B E R

Oracle DIVArchive Storage Plan Manager

Correction Documents for Poland

An Oracle White Paper December Oracle Exadata Database Machine Warehouse Architectural Comparisons

Oracle Exadata Statement of Direction NOVEMBER 2017

Veritas NetBackup and Oracle Cloud Infrastructure Object Storage ORACLE HOW TO GUIDE FEBRUARY 2018

Oracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data

Oracle Fusion Configurator

Tutorial on How to Publish an OCI Image Listing

JD Edwards EnterpriseOne Licensing

An Oracle White Paper June Enterprise Database Cloud Deployment with Oracle SuperCluster T5-8

Using the Oracle Business Intelligence Publisher Memory Guard Features. August 2013

Oracle Cloud Applications. Oracle Transactional Business Intelligence BI Catalog Folder Management. Release 11+

Working with Time Zones in Oracle Business Intelligence Publisher ORACLE WHITE PAPER JULY 2014

Automatic Data Optimization with Oracle Database 12c O R A C L E W H I T E P A P E R S E P T E M B E R

Oracle Data Masking and Subsetting

An Oracle White Paper September Security and the Oracle Database Cloud Service

Oracle Database Exadata Cloud Service Exadata Performance, Cloud Simplicity DATABASE CLOUD SERVICE

Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 2, Memory Barriers and Memory Fences

Load Project Organizations Using HCM Data Loader O R A C L E P P M C L O U D S E R V I C E S S O L U T I O N O V E R V I E W A U G U S T 2018

Oracle Enterprise Performance Reporting Cloud. What s New in September 2016 Release (16.09)

An Oracle White Paper December, 3 rd Oracle Metadata Management v New Features Overview

An Oracle White Paper September Oracle Utilities Meter Data Management Demonstrates Extreme Performance on Oracle Exadata/Exalogic

Subledger Accounting Reporting Journals Reports

Configuring a Single Oracle ZFS Storage Appliance into an InfiniBand Fabric with Multiple Oracle Exadata Machines

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval

Hybrid Columnar Compression (HCC) on Oracle Database 18c O R A C L E W H IT E P A P E R FE B R U A R Y

StorageTek ACSLS Manager Software Overview and Frequently Asked Questions

Leverage the Oracle Data Integration Platform Inside Azure and Amazon Cloud

Oracle Data Provider for.net Microsoft.NET Core and Entity Framework Core O R A C L E S T A T E M E N T O F D I R E C T I O N F E B R U A R Y

An Oracle White Paper October Oracle Social Cloud Platform Text Analytics

Oracle JD Edwards EnterpriseOne Object Usage Tracking Performance Characterization Using JD Edwards EnterpriseOne Object Usage Tracking

Oracle Hyperion Planning on the Oracle Database Appliance using Oracle Transparent Data Encryption

Repairing the Broken State of Data Protection

Achieving High Availability with Oracle Cloud Infrastructure Ravello Service O R A C L E W H I T E P A P E R J U N E

Sun Fire X4170 M2 Server Frequently Asked Questions

Extreme Performance Platform for Real-Time Streaming Analytics

Oracle Database 12c: JMS Sharded Queues

Oracle JD Edwards EnterpriseOne Object Usage Tracking Performance Characterization Using JD Edwards EnterpriseOne Object Usage Tracking

Siebel CRM Applications on Oracle Ravello Cloud Service ORACLE WHITE PAPER AUGUST 2017

RAC Database on Oracle Ravello Cloud Service O R A C L E W H I T E P A P E R A U G U S T 2017

An Oracle White Paper October The New Oracle Enterprise Manager Database Control 11g Release 2 Now Managing Oracle Clusterware

Oracle Enterprise Data Quality New Features Overview

Oracle Event Processing Extreme Performance on Sparc T5

Oracle Secure Backup. Getting Started. with Cloud Storage Devices O R A C L E W H I T E P A P E R F E B R U A R Y

Oracle Business Activity Monitoring 12c Best Practices ORACLE WHITE PAPER DECEMBER 2015

April Understanding Federated Single Sign-On (SSO) Process

Hard Partitioning with Oracle VM Server for SPARC O R A C L E W H I T E P A P E R J U L Y

Oracle Flash Storage System QoS Plus Operation and Best Practices ORACLE WHITE PAPER OCTOBER 2016

An Oracle White Paper June StorageTek In-Drive Reclaim Accelerator for the StorageTek T10000B Tape Drive and StorageTek Virtual Storage Manager

Oracle Learn Cloud. Taleo Release 16B.1. Release Content Document

Oracle Service Registry - Oracle Enterprise Gateway Integration Guide

Technical Upgrade Guidance SEA->SIA migration

Using Oracle In-Memory Advisor with JD Edwards EnterpriseOne

Oracle Big Data Connectors

WebCenter Portal Task Flow Customization in 12c O R A C L E W H I T E P A P E R J U N E

Oracle WebLogic Portal O R A C L E S T A T EM EN T O F D I R E C T IO N F E B R U A R Y 2016

Oracle Communications Interactive Session Recorder and Broadsoft Broadworks Interoperability Testing. Technical Application Note

An Oracle White Paper September, Oracle Real User Experience Insight Server Requirements

Oracle Clusterware 18c Technical Overview O R A C L E W H I T E P A P E R F E B R U A R Y

A Distinctive View across the Continuum of Care with Oracle Healthcare Master Person Index ORACLE WHITE PAPER NOVEMBER 2015

An Oracle Technical White Paper September Detecting and Resolving Oracle Solaris LUN Alignment Problems

SecureFiles Migration O R A C L E W H I T E P A P E R F E B R U A R Y

ORACLE S PEOPLESOFT GENERAL LEDGER 9.2 (WITH COMBO EDITING) USING ORACLE DATABASE 11g FOR ORACLE SOLARIS (UNICODE) ON AN ORACLE S SPARC T7-2 Server

ORACLE SNAP MANAGEMENT UTILITY FOR ORACLE DATABASE

An Oracle Technical White Paper October Sizing Guide for Single Click Configurations of Oracle s MySQL on Sun Fire x86 Servers

An Oracle White Paper May Oracle VM 3: Overview of Disaster Recovery Solutions

Oracle Flashback Data Archive (FDA) O R A C L E W H I T E P A P E R M A R C H

Deploying Custom Operating System Images on Oracle Cloud Infrastructure O R A C L E W H I T E P A P E R M A Y

Oracle Database Vault

An Oracle White Paper July Methods for Downgrading from Oracle Database 11g Release 2

Oracle Utilities CC&B V2.3.1 and MDM V2.0.1 Integrations. Utility Reference Model Synchronize Master Data

Oracle VM 3: IMPLEMENTING ORACLE VM DR USING SITE GUARD O R A C L E W H I T E P A P E R S E P T E M B E R S N

Oracle Grid Infrastructure 12c Release 2 Cluster Domains O R A C L E W H I T E P A P E R N O V E M B E R

August 6, Oracle APEX Statement of Direction

Oracle Service Cloud Agent Browser UI. November What s New

An Oracle White Paper July Oracle WebCenter Portal: Copying a Runtime-Created Skin to a Portlet Producer

DATA INTEGRATION PLATFORM CLOUD. Experience Powerful Data Integration in the Cloud

Oracle Database Appliance X6-2S / X6-2M ORACLE ENGINEERED SYSTEMS NOW WITHIN REACH FOR EVERY ORGANIZATION

Oracle Financial Consolidation and Close Cloud. What s New in the December Update (16.12)

Oracle Financial Consolidation and Close Cloud. What s New in the November Update (16.11)

ORACLE DATABASE LIFECYCLE MANAGEMENT PACK

STORAGETEK SL150 MODULAR TAPE LIBRARY

Deploy VPN IPSec Tunnels on Oracle Cloud Infrastructure. White Paper September 2017 Version 1.0

Technical White Paper August Recovering from Catastrophic Failures Using Data Replicator Software for Data Replication

Migrating VMs from VMware vsphere to Oracle Private Cloud Appliance O R A C L E W H I T E P A P E R O C T O B E R

An Oracle White Paper February Optimizing Storage for Oracle PeopleSoft Applications

Oracle Database Vault

Increasing Network Agility through Intelligent Orchestration

SOA Cloud Service Automatic Service Migration

Transitioning from Oracle Directory Server Enterprise Edition to Oracle Unified Directory

Transcription:

Oracle Spatial and Graph: Benchmarking a Trillion Edges RDF Graph ORACLE WHITE PAPER NOVEMBER 2016

Introduction One trillion is a really big number. What could you store with one trillion facts?» 1000 tweets for every one of the 1 Billion Twitter users.» 770 facts about every one of the 1.3 Billion Facebook users.» 10 facts from 107 Billion sensors, located somewhere on the planet.» 400 metabolic readings for each of the 2.5 Billion heart beats over an average human life time.» 12 facts about every one of the 86 Billion neurons in the human brain.» 5 facts about every one of the 200 Billion stars in the Milky Way Galaxy.» 7 facts about every one of the 150 Billion galaxies in the universe.» 6,350 facts about each of the 158 Million books in the Library of Congress, the largest in the world.» 10 facts about each of the 107 Billion people who ever lived Resource Description Framework (RDF) graphs and the analytics they permit are becoming central to big data applications for social networks and linked data. These applications are often found in public sector, healthcare and life sciences, finance, media, and intelligence communities. The World Wide Web Consortium (W3C) 1 defines RDF and the Web Ontology Language (OWL) graph standards for representing and defining semantic data and rules, and SPARQL, a pattern matching query language designed specifically for graph analysis. The basic nature of an RDF graph facilitates identification, integration, and discovery:» RDF data elements are globally unique. They are defined using Uniform Resource Identifiers (URIs) that enable a consistent metadata layer for integration of disparate data sources.» RDF data elements are linked to form a graph. Elements are used to make statements in the form of subject-predicate-object triples. Predicates (edges) link the subject and object (nodes) and can describe any relationship or property. The object can be another subject to link triples together to form a graph or a literal that is an attribute of the subject. The triples can be further qualified with a fourth named graph component, which are referred to as RDF quads.» The RDF model allows easy, dynamic schema evolution. Adding a new schema element is as easy as inserting a triple with a new predicate.» RDF and SPARQL support ad hoc queries. Queries may not be known when the schema is designed. 1 http://www.w3.org/rdf/ 1 ORACLE SPATIAL AND GRAPH: BENCHMARKING A TRILLION EDGES RDF GRAPH

» The RDF model makes an Open World Assumption that can facilitate discovery. It assumes that what is unknown is undefined, rather than false, as is the case with relational technology. It also has technologies that help discover missing results.» RDF embeds semantics (meaning) directly in the data. Entities are categorized with classes, predicates are properties or relationships, and they are all part of the data, unlike column headers, foreign keys, or constraints in relational data.» RDF supports machine-driven inferencing for discovery. The OWL semantic language and rules used to define the predicates in triples are based on formal Description Logics that enable automatic discovery, such as identifying same-as relationships between different terms with the same meaning in two applications. The set of inferred triples (conclusions that can be drawn) is referred to as an entailment.» The OWL language can unify an enterprise s dictionaries, vocabularies, and taxonomies. All of the terms used by the applications in an enterprise can be related to each other and form concepts. Concepts are managed as one or more domain-specific ontologies and stored in RDF graphs. Ontologies are linked to the asserted instance data in graphs and used for inferencing and querying. This is another capability that facilitates creating a consistent metadata layer for data integration. The Lehigh University Benchmark 2 (LUBM) is a de facto industry standard benchmark for evaluating RDF graph store product performance. It is used by RDF graph store vendors to characterize the load, inference, and query performance of their product. Vendors post results on the W3C Large Triple Stores page 3. End users use LUBM benchmark results as part of their evaluation of an RDF Graph store product. The benchmark includes a W3C OWL-based university ontology, a data generator to create a graph of any size, and fourteen test queries. Oracle conducted a one trillion edges LUBM benchmark (LUBM 4400k) in September 2014 with Oracle Spatial and Graph in Oracle Database 12c on an Oracle Exadata Database Machine and achieved two record-setting accomplishments:» Oracle believes its benchmark is the largest complete LUBM benchmark in the industry to date.» The combined load, inference, and query results are the fastest RDF graph performance numbers reported; this is especially significant for a benchmark of this scale and complexity. The details for this benchmark, including results, configuration, and best practices are discussed in the next section of this paper. 2 http://swat.cse.lehigh.edu/projects/lubm/ 3 https://www.w3.org/wiki/largetriplestores 2 ORACLE SPATIAL AND GRAPH: BENCHMARKING A TRILLION EDGES RDF GRAPH

A Trillion Edges RDF Graph Benchmark on Oracle Database 12c As big data graphs grow from billions to trillions of relationships it becomes increasingly important to characterize product performance. Oracle conducted an RDF graph LUBM 4400k benchmark in September 2014. It involved loading, inferencing, and querying over one trillion edges with Oracle Spatial and Graph in Oracle Database 12c on an Oracle Exadata Database Machine. The LUBM environment was used to generate data about universities and their departments. The data was created and ordered into 4.4 million named graphs by expanding the triples into quads. There was one named graph per university. The overall graph included 605.4 billion unique asserted quads and an entailment of another 475.6+ billion quads. The Results The Oracle Spatial and Graph LUBM 4400k benchmark on Oracle Database 12c achieved the following results:» Data Loading Performance: 1.420 million Quads Loaded and Indexed per Second.» 605.4 Billion Quads were loaded and two indexes were created in 115.2 hours.» Note: Graph loading in Oracle Database is unique in the industry for checking that quads are well formed and for removing duplicates.» Inference Performance: 1.527 million Triples Inferred and Indexed per Second.» 475.6 Billion Triples and two indexes were created in 86.5 hours.» SPARQL Query Performance: 1.130 Million Query Results per Second.» 92.5 Billion Answers were generated in 22.5 hours. A TRILLION TRIPLES GRAPH Asserted Inferred Total Answers 605.4 Billion Quads 475.6 Billion Triples 1.081 Trillion Quads 92.5 Billion The Configuration The market-leading performance of this benchmark was due to the combination of the native RDF graph store capabilities of Oracle Spatial and Graph in Oracle Database 12c on the balanced configuration of an Oracle Exadata Database Machine X4-2. The unique capabilities of the Exadata Database Machine that assisted benchmark query performance include:» Smart Scan that reduces data movement between storage servers (cells) and database server by pushing queries down to the storage cell,» storage indexes used by the storage cell to read only regions of storage that have relevant data, and» InfiniBand fabric that provides fast transfer (40 Gb/second) of relevant bytes back to the database server to complete the execution of a query. The Oracle Exadata Database Machine X4-2 High capacity full rack was configured as follows:» 8 database nodes (192 cores) and 14 storage nodes (168 cores)» 2 TB total RAM and 44.8 TB Flash Cache» ZS3-2 storage with 2 controllers and 8 trays of disks» Software: Oracle Database 12.1.0.1 standard installation on Exadata Database Machine. 3 ORACLE SPATIAL AND GRAPH: BENCHMARKING A TRILLION EDGES RDF GRAPH

Best Practices Used The best practices fall into two categories, database settings and tuning. Database settings:» SGA_TARGET=132GB» PGA_AGGREGATE_TARGET=100G» Open cursors=1000» Processes=1000» 32K blocksize given to all graph tablespaces» a TEMP group created with 3 bigfile tablespaces» Use of the auto-allocate option for allocation of tablespace extents coupled with a large, 8 million bytes extent size. This reduced the number of waits caused by HV enqueue contention; that is, waits on a lock that is used to alter the high-water mark in a tablespace. As a result, contention among multiple processes requesting tablespace expansion could be avoided.» DOP settings (296, 256, 192) for automatic degrees of parallelism used in loading, inferencing, and querying.» Use of additional compression beyond basic table compression during inferencing provided by the Hybrid Columnar Compression feature of Oracle Exadata Database Machine. Tuning:» Oracle Enterprise Manager provided specific performance insights into operations for tuning. The methodology used is documented in the Oracle Database Performance Tuning Guide. 4 Conclusion RDF graphs provide unique, standards-based, big data capabilities for metadata integration, and discovery to support social networks and linked data applications in a variety of industries. Oracle Spatial and Graph demonstrated industry-leading scalability and performance for loading, inference, and querying a one trillion edges RDF graph managed in Oracle Database 12c. The LUBM 4400k RDF graph benchmark benefited from the balanced hardware configuration of an Oracle Exadata Database Machine X4-2. The best practices settings used to achieve these benchmark results are also generally applicable to real-world applications on Oracle Exadata Database Machine and other balanced hardware configurations. 4 http://docs.oracle.com/database/121/tgdba/toc.htm 4 ORACLE SPATIAL AND GRAPH: BENCHMARKING A TRILLION EDGES RDF GRAPH

Oracle Corporation, World Headquarters Worldwide Inquiries 500 Oracle Parkway Phone: +1.650.506.7000 Redwood Shores, CA 94065, USA Fax: +1.650.506.7200 CONNECT WITH US blogs.oracle.com/oracle facebook.com/oracle twitter.com/oracle oracle.com Copyright 2016, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only, and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group. 0116 Oracle Spatial and Graph: Benchmarking a Trillion Edges RDF Graph November 2016 Author: Bill Beauregard Contributing Authors: Souri Das, Matt Perry, Zhe Wu 1 ENTER TITLE OF DOCUMENT HERE