New Oracle NoSQL Database APIs that Speed Insertion and Retrieval

Oracle White Paper, February 2016

Introduction

Fast insertion and retrieval of data in a NoSQL database is of critical importance to a modern organization. As large amounts of data are created (think Big Data or the Internet of Things), a reliable, industry-leading database must be able to respond to the demands of both the users who consume the information and the mechanisms that create the data. In the following sections we discuss: an Oracle NoSQL Database summary and the motivation for a new API; the BulkPut API with example code; and the BulkGet API with example code.

Oracle NoSQL Database

Oracle NoSQL Database is a highly scalable, highly available, fault-tolerant, "Always On" distributed key-value database that you can deploy on low-cost commodity hardware in a scale-out manner. Use cases of Oracle NoSQL Database include distributed web-scale applications, real-time event processing, mobile data management, time series and sensor data management, and online gaming.

Oracle NoSQL Database offers all the features that are common to a typical NoSQL product, such as elasticity, eventually consistent transactions, multiple data centers, secondary indexes, security, and schema flexibility. The key differentiators include ACID transactions, online rolling upgrade, streaming large object support, availability on Engineered Systems, and integration with Oracle technology. Data can be modeled as relational-database-style tables, JSON documents, or key-value pairs.

Oracle NoSQL Database is a sharded (shared-nothing) system that distributes the data uniformly across the shards in the cluster, based on the hashed value of the primary key. Within each shard, storage nodes are replicated to ensure high availability, rapid failover in the event of a node failure, and optimal load balancing of queries. NoSQL Database provides Java, C, Python, and Node.js drivers and a REST API to simplify application development.

NoSQL Database is integrated with a wide variety of related Oracle and open source applications in order to simplify and streamline the development and deployment of modern big data applications. NoSQL Database is dual-licensed and available as an open-source Community Edition as well as a commercially licensed Enterprise Edition.

Motivation

Our customers have often asked us: what's the fastest and most efficient way to insert and retrieve a large number of records in Oracle NoSQL Database? Recently, a shipping company reached out to us with the specific requirement of using Oracle NoSQL Database for their ship management application, which tracks the movements of the container ships that carry cargo from port to port. The cargo ships are all fitted with GPS and other tracking devices that relay each ship's location to the application every few seconds. The application is then queried for:

1) The location of all the ships displayed on the map
2) A specific ship's trajectory over a given period of time.

As the volume of the location data grew, the company found it hard to scale the application and is now looking for a back-end system that can ingest this large data set very efficiently.

Oracle NoSQL Database BulkPut

Historically, we have supported the option to execute a batch of operations for records that share the same shard key, which is what our large airline customer (Airbus) has done. They pre-sort the data by the shard key and then perform a multi-record insert for a batch of records that share the same shard key. Rather than sending and storing one record at a time, they can send a large number of records in a single operation. This certainly saved network trips, but they could only batch insert records that shared the same shard key.

With Oracle NoSQL Database release 3.5.2, we have added the ability to bulk insert records across different shards in parallel, allowing application developers to work more effectively with very large data sets. The BulkPut API is available for the table data model as well as for the key/value data model. The API provides significant performance gains over single-row inserts by reducing network round trips and by doing ordered, batched inserts of internally sorted data across different shards in parallel.

BulkPut API

KV interface: Loads Key/Value pairs supplied by special purpose streams into the store.

    public void put(List<EntryStream<KeyValue>> streams, BulkWriteOptions bulkWriteOptions)

Table interface: Loads rows supplied by special purpose streams into the store.

    public void put(List<EntryStream<Row>> streams, BulkWriteOptions bulkWriteOptions)

streams: the streams that supply the rows to be inserted.
bulkWriteOptions: non-default arguments controlling the behavior of the bulk write operations.

Stream interface:

    public interface EntryStream<E> {
        String name();
        E getNext();
        void completed();
        void keyExists(E entry);
        void catchException(RuntimeException exception, E entry);
    }

BulkPut Performance

We ran the Yahoo! Cloud Serving Benchmark (YCSB) internally to measure the performance of the new BulkPut API. For the performance test, we set up a 3x3 NoSQL cluster: three shards, each having three copies of the data, for scalability and high availability. The cluster ran on bare metal servers with a uniform configuration: a total of nine servers (one NoSQL Storage Node per server), each configured with 250 GB RAM and a 1 TB HDD, running an Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz. The client machine had a similar hardware configuration. For the workload we used a Zipfian distribution of keys, which is intended to model real loads. We ingested a total of 150 M records (50 M records per shard) across the datastore, using a total of nine parallel
threads (3 per shard) and a total of 54 input streams across nine storage nodes (6 per storage node). The number of parallel threads can be configured using perShardParallelism of BulkWriteOptions, and the number of input streams using streamParallelism of BulkWriteOptions. The results of the benchmark run are shown in the graph below:

[Figure: throughput (ops/sec) of the BulkPut API versus the simple put API, for Ack-None and Ack-Simple Majority durability]

Oracle NoSQL Database allows a configurable acknowledgment-based durability policy that describes whether the master node waits for acknowledgments from replicas before considering a write operation to have completed successfully. You can require the master node to wait for no acknowledgments, for acknowledgments from a simple majority of replica nodes in primary zones, or for acknowledgments from all replica nodes in primary zones. The more acknowledgments the master requires, the slower its write performance will be. We ran the performance test with durability settings of None (Ack-None) and Simple Majority (Ack-Simple Majority). The graph compares the throughput (ops/sec) of the BulkPut API and the simple put API on a NoSQL store with 1000 partitions for the two durability settings. As the graph shows, there is over a 100% increase in throughput with either durability setting.

Sample Example on GitHub

An example of the BulkPut API can be found at: https://github.com/swatianand/nosql. The sample demonstrates how to use the BulkPut API in your application code. There is also a ReadMe file in the same repository; refer to it for details on running the program.

Oracle NoSQL Database BulkGet API

Customers continue to ask for fast and easy-to-use methods for retrieving data from a NoSQL database. An example of such a request comes from an e-commerce website, where potential customers want to retrieve all the phones in the price range of $200 to $500 from Apple, Samsung, Nokia, Motorola, and a host of other manufacturers, and have all the details returned, including the images of the products.
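Returning briefly to BulkPut before the BulkGet details: the stream interface listed in the BulkPut API section can be illustrated with a short, self-contained sketch. This is only an illustration under stated assumptions, not Oracle's implementation. The EntryStream interface is re-declared locally so that the example compiles without the Oracle NoSQL client library; ListEntryStream, kv, and load are hypothetical names, and load is a toy stand-in for the store-side loader. The sketch also assumes that getNext() returns null once a stream is exhausted.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class BulkPutSketch {

    /** Local copy of the stream interface shown in the paper; NOT the Oracle class. */
    interface EntryStream<E> {
        String name();
        E getNext();                // assumption: returns null when the stream is exhausted
        void completed();
        void keyExists(E entry);
        void catchException(RuntimeException exception, E entry);
    }

    /** Hypothetical stream that supplies entries from an in-memory list. */
    static class ListEntryStream<E> implements EntryStream<E> {
        private final String name;
        private final Iterator<E> entries;
        final List<E> duplicates = new ArrayList<>();
        boolean done = false;

        ListEntryStream(String name, List<E> entries) {
            this.name = name;
            this.entries = entries.iterator();
        }

        public String name() { return name; }
        public E getNext() { return entries.hasNext() ? entries.next() : null; }
        public void completed() { done = true; }
        public void keyExists(E entry) { duplicates.add(entry); }
        public void catchException(RuntimeException exception, E entry) { throw exception; }
    }

    /** Hypothetical helper that builds one key/value entry. */
    static Map.Entry<String, String> kv(String key, String value) {
        return new AbstractMap.SimpleImmutableEntry<>(key, value);
    }

    /**
     * Toy stand-in for the loader: drains every stream into a map, reports
     * already-present keys back to the stream, and returns the insert count.
     */
    static int load(List<? extends EntryStream<Map.Entry<String, String>>> streams,
                    Map<String, String> store) {
        int inserted = 0;
        for (EntryStream<Map.Entry<String, String>> stream : streams) {
            Map.Entry<String, String> entry;
            while ((entry = stream.getNext()) != null) {
                if (store.putIfAbsent(entry.getKey(), entry.getValue()) == null) {
                    inserted++;
                } else {
                    stream.keyExists(entry);
                }
            }
            stream.completed();
        }
        return inserted;
    }

    public static void main(String[] args) {
        ListEntryStream<Map.Entry<String, String>> s1 = new ListEntryStream<>(
                "stream-1", Arrays.asList(kv("ship-1", "51.9,4.1"), kv("ship-2", "1.3,103.8")));
        ListEntryStream<Map.Entry<String, String>> s2 = new ListEntryStream<>(
                "stream-2", Arrays.asList(kv("ship-2", "1.4,103.9"), kv("ship-3", "22.3,114.2")));

        Map<String, String> store = new HashMap<>();
        int inserted = load(Arrays.asList(s1, s2), store);
        // prints "inserted=3 duplicates=1"
        System.out.println("inserted=" + inserted + " duplicates=" + s2.duplicates.size());
    }
}
```

In a real application the streams would be handed to the put method shown earlier rather than to a local loop. Splitting the input into several independent streams is what gives the loader room to read and insert in parallel.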
With release 3.4.7, we introduced a high-performance BulkGet API that retrieves records matching multiple primary keys in a single operation. If you are familiar with SQL syntax, you can think of this as similar to the IN clause of a SQL query. The API takes a list of keys, each of which can be a partial or complete shard key, does a parallel scan across the shards for all keys in the list, and provides an iterator API over the
matching rows, including those matched by the ancestor or descendant tables. The result is not transactional, and the operation effectively provides read-committed isolation. The implementation batches the fetching of rows in the iterator to minimize the number of network round trips, while not monopolizing the available bandwidth. Batches are fetched in parallel across multiple Replication Nodes, and the degree of parallelism is controlled by the TableIteratorOptions argument.

There are a few constraints the user should be aware of:

1) A primary key must contain all the fields defined for the table's shard key
2) All the primary keys must belong to the same table.

For more information, refer to the Java documentation for the API. We support both the key/value and table interfaces for the API.

Performance

In our internal Yahoo! Cloud Serving Benchmark (YCSB) runs we found that we could retrieve 30 M rows in 149 seconds with 72 executor threads, running on a 3x3 NoSQL cluster (3 shards, each having 3 copies of the data), with 90 reader threads (client-side threads) and a record size of 100 bytes. The number of executor threads can be configured with the maxConcurrentRequests parameter of TableIteratorOptions. The hardware configuration is similar to the one described in the BulkPut performance section above. Refer to the charts below for details of the benchmark runs:

[Figure: (a) time to retrieve 30 M rows and (b) throughput in ops/sec, each plotted against the number of executor threads]

As seen from timing graph (a), getting 30 M rows using the simple get API would have taken 420 seconds, which drops to 149 seconds with 72 executor threads (plotted on the X-axis) using the bulk get API. This is almost a 3X improvement (420 / 149 is about 2.8). As seen in graph (b), throughput rose to 200K ops/sec with 72 executor threads, from 68K ops/sec using a simple get operation. That is again almost a 3X improvement (200 / 68 is about 2.9).
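The idea behind the bulk get described above (one list of keys, scanned across shards in parallel with a bounded number of concurrent requests) can be sketched in plain Java. This is a toy simulation under stated assumptions and does not use or reproduce the Oracle NoSQL client API: each shard is modeled as an in-memory map, bulkGet is a hypothetical name, and its maxConcurrentRequests argument only mimics the role that the parameter of the same name plays in TableIteratorOptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BulkGetSketch {

    /**
     * Toy bulk get: every shard (an in-memory map from key to row) is scanned
     * for all requested keys, with at most maxConcurrentRequests scans running
     * at the same time. Rows come back in shard order, then key order.
     */
    static List<String> bulkGet(List<Map<String, String>> shards,
                                List<String> keys,
                                int maxConcurrentRequests) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrentRequests);
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (Map<String, String> shard : shards) {
                // One batch request per shard, similar in spirit to an IN clause.
                Callable<List<String>> scan = () -> {
                    List<String> hits = new ArrayList<>();
                    for (String key : keys) {
                        String row = shard.get(key);
                        if (row != null) {
                            hits.add(row);
                        }
                    }
                    return hits;
                };
                pending.add(pool.submit(scan));
            }
            List<String> rows = new ArrayList<>();
            for (Future<List<String>> batch : pending) {
                rows.addAll(batch.get());   // blocks until that shard's batch is ready
            }
            return rows;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("bulk get failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, String> shardA = new HashMap<>();
        shardA.put("Apple", "iPhone, $499");
        shardA.put("Nokia", "Lumia, $249");
        Map<String, String> shardB = new HashMap<>();
        shardB.put("Samsung", "Galaxy, $450");

        List<String> rows = bulkGet(Arrays.asList(shardA, shardB),
                Arrays.asList("Apple", "Samsung", "Motorola"), 2);
        // prints "[iPhone, $499, Galaxy, $450]"
        System.out.println(rows);
    }
}
```

Raising the concurrency limit lets more shard scans overlap, which is the effect the benchmark measures; a lower limit trades elapsed time for a smaller footprint on the cluster.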
In the above charts, bulk-x denotes the maximum number of concurrent requests, which specifies the maximum degree of parallelism (in effect, the maximum number of NoSQL client-side threads) to be used when running an iteration. The optimal value varies with the application's requirements: some applications may want to be unobtrusive and use minimal resources (but efficiently), with elapsed time a lower priority, e.g. running analytics on secondary zones, whereas others may want strict real-time latency, e.g. running multi-get on primary zones.

Sample Example on GitHub

An example of the BulkGet API can be found at: https://github.com/swatianand/oraclenosqlbulkgget. The sample demonstrates how to use the new BulkGet API in the phone example described above. The repository also contains a ReadMe file that describes the table and the steps to run the example. The example returns an iterator over the keys matching the manufacturers within the price range [200-500] supplied by the iterator. If, along with the other details, the images of all the phones are also required, the images can be modeled as a child table (for efficiency reasons) and retrieved in the same single API call.

Summary

The Oracle NoSQL BulkGet and BulkPut APIs provide the most effective and performant way to store and fetch data in bulk in Oracle NoSQL Database. As demonstrated by the YCSB runs, you can expect a 2 to 3 times performance improvement when retrieving data in bulk.

Oracle Corporation, World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065, USA

Worldwide Inquiries
Phone: +1.650.506.7000
Fax: +1.650.506.7200

Connect with us: blogs.oracle.com/oracle, facebook.com/oracle, twitter.com/oracle, oracle.com

Copyright 2016, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only, and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group. 0116

February 2016
Authors: Anand Chandak, Jin Zhao, Michael Schulman