

Oracle NoSQL Database: A Distributed Key-Value Store
Charles Lamb

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

Agenda
- Oracle and NoSQL
- Oracle NoSQL Database Architecture
- Oracle NoSQL Database Technical Details

What Does NoSQL Mean to Oracle?
- Distributed
- Large data (terabyte to petabyte range)
- Two categories: OLTP (our focus is here) and BI (M/R & Hadoop; we integrate here)
- Data models: key-value (our focus is here), document, columnar
- Berkeley DB by itself is not really NoSQL by this definition

Target Use Cases
- Large schema-less data repositories
- Web applications (click-through capture)
- Online retail
- Sensor/statistics/network capture (factory automation, for example)
- Backup services for mobile devices
- Scalable authentication
- Personalization
- Social networks

Design Requirements
- Terabytes to petabytes of data
- Tens of thousands to millions of ops/sec
- No single point of failure
- Elastic scalability on commodity hardware
- Fast, predictable response time to simple queries
- Flexible ACID transactions, a la Berkeley DB
- Unstructured or semi-structured data, with clustering capability
- Simple administration, enterprise support
- Commercial-grade NoSQL solution

Agenda
- Oracle and NoSQL
- Oracle NoSQL Database Architecture
- Oracle NoSQL Database Technical Details

Oracle NoSQL Database Overview: A Distributed, Scalable Key-Value Database
- Simple data model: key-value pairs (with major/minor key paradigm); CRUD + iteration
- Scalability: data partitioning and distribution; optimized data access via an intelligent driver
- High availability: one or more replicas; resilient to failure of a partition's master; no single point of failure; disaster recovery through placement of replicas
- Transparent load balancing: reads from master or replicas; the driver is network-topology and latency aware
- Elasticity (Release 2): online addition/removal of storage nodes with automatic data redistribution
(Diagram: applications, each linked with the NoSQL DB Driver (.jar), accessing storage nodes in Data Center A and Data Center B)

Oracle NoSQL Database: What the Programmer Sees
- Simple data model: key-value pair (major/minor key paradigm)
- Simple operations: CRUD, read-modify-write (compare-and-swap), iteration
- Conflict resolution not required
- ACID transactions for records within a major key, in a single API call
- Unordered scan of all data (non-transactional)
- Ordered iteration across sub-keys within a key
- Consistency (read) and durability (write) specified per operation
(Example record: major key "userid"; sub-keys "subscriptions" and "address"; values such as expiration date, phone #, email id)
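The major/minor key paradigm and the read-modify-write (compare-and-swap) operation can be sketched with a small in-memory model. This is our own illustration, not the product API: the names `put` and `put_if_version` and the integer versioning scheme are assumptions made for clarity. The point it shows is that records sharing a major key live together, which is what makes single-API-call ACID transactions across them feasible.

```python
# Toy in-memory model of major/minor keys plus CAS (illustrative only).
store = {}  # major key -> {minor key -> (version, value)}

def put(major, minor, value):
    """Unconditional write; returns the new version of the record."""
    rec = store.setdefault(major, {})
    version = rec.get(minor, (0, None))[0] + 1
    rec[minor] = (version, value)
    return version

def put_if_version(major, minor, value, expect_version):
    """Read-modify-write (CAS): write only if the stored version matches."""
    current = store.get(major, {}).get(minor)
    if current is None or current[0] != expect_version:
        return None  # lost the race; caller must re-read and retry
    return put(major, minor, value)

v = put("userid", "address", "1 Main St")      # v == 1
assert put_if_version("userid", "address", "2 Oak Ave", v) == 2
assert put_if_version("userid", "address", "stale write", v) is None
```

A real client would re-read the record and retry when the CAS returns a failure, the usual optimistic-concurrency loop.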

Oracle NoSQL Database: No Single Point of Failure
- App servers can be replicated by the user
- The driver (request dispatcher) is linked into each app server
- Nodes are kept current (underlying BDB-JE HA technology)
- Nodes may live in multiple data centers
- Node replacement: graceful degradation until the node is fixed or replaced; automatic recovery

Oracle NoSQL Database: Easy Management
- Web-based console and CLI commands
- Manages and monitors:
  - Topology
  - Configuration changes
  - Load: number of operations, data size
  - Performance: latency, throughput (min, max, average, trailing, ...)
  - Events: failover, recovery, load distribution
  - Alerts: failure, poor performance, ...
- We couldn't have developed the product without this tooling

Agenda
- Oracle and NoSQL
- Oracle NoSQL Database Architecture
- Oracle NoSQL Database Technical Details

Oracle NoSQL Database Building Blocks: Berkeley DB Java Edition
- Robust storage for a distributed key-value database: ACID transactions, persistence, high availability, high throughput, large capacity, simple administration
- Already used in Amazon Dynamo, Voldemort (LinkedIn), and GenieDB

Oracle NoSQL Database: Building upon Berkeley DB Java Edition
- Data distribution
- Dynamic partitioning (aka "sharding")
- Load balancing
- Monitoring and administration
- Predictable latency
- Multi-node backup
- One-stop support for the entire stack
- Optimized hardware (Oracle Big Data Appliance)

Distributed Data Storage: Network Traffic
- Minimizing latency is key
- Design target: high volume (100K-1M ops/second), many concurrent threads
- The driver is topology aware: it directs operations to the proper node and performs load balancing
- Single network hop, except when a node fails or the topology changes; topology changes are returned with results
- Single- and multi-record operations; the common use case accesses a single key-value pair or a related set of sub-keys

Oracle NoSQL Database: Storage Terminology
- The full key space hashes into multiple hash buckets (partitions)
- A set of partitions maps to a replication group (a logical container for a subset of the data)
- A set of replication nodes for each replication group provides HA and read scalability for that group
- Each replication node runs on a storage node (a physical or virtual machine)
(Diagram: the key space mapped to partitions and replication groups, with masters marked "m"; only a subset of the storage nodes is shown)

Oracle NoSQL Database Driver, Locating Data: Partition Map
- MD5(major key) % npartitions yields the partition ID
- The partition map maps a partition ID to a rep group
- Each driver has a copy of the partition map, initialized on the first request to any node
- This allows direct contact with a node capable of servicing the request
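The hashing step above can be sketched in a few lines of Python. This is a conceptual model, not the driver's actual code: `N_PARTITIONS`, the `partition_map` contents, and the function names are our own assumptions (the three rep groups mirror the topology example later in the deck).

```python
import hashlib

N_PARTITIONS = 1000

# Partition map (illustrative): partition ID -> replication group ID.
# Here partitions are simply striped across 3 rep groups.
partition_map = {pid: pid % 3 + 1 for pid in range(N_PARTITIONS)}

def partition_id(major_key: str) -> int:
    """MD5(major key) % npartitions -> partition ID."""
    digest = hashlib.md5(major_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % N_PARTITIONS

def rep_group_for(major_key: str) -> int:
    """Look the partition up in the driver's local copy of the map."""
    return partition_map[partition_id(major_key)]

pid = partition_id("user42")
print(pid, rep_group_for("user42"))
```

Because the map lives in the driver, a request normally reaches an owning node in a single network hop, which is the latency point the slide makes.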

Oracle NoSQL Database Driver, Locating Data: Rep Node State Table
- The Rep Node State Table (RNST) locates the optimum rep node within a rep group to handle a request, keyed by partition ID, operation type, and consistency requirement
- For each rep node in a rep group, the RNST contains:
  - Whether that node is currently the master
  - The VLSN lag at a replica (how far behind the replica is)
  - The entry's last update time
  - The number of outstanding requests
  - The trailing average time associated with a request
- This accommodates varying topology and response times as seen from the drivers
- The RNST is updated by responses
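A simplified model of RNST-driven node selection follows. The field names, the lag threshold, and the tie-breaking rule are our own assumptions, not the product's: writes must reach the master, while reads go to the least-loaded eligible node (judged here by outstanding requests, then trailing average latency), skipping replicas that lag too far behind to satisfy consistency.

```python
from dataclasses import dataclass

@dataclass
class RepNodeState:
    name: str
    is_master: bool
    vlsn_lag: int           # how far behind the master this replica is
    outstanding: int        # requests currently in flight
    trailing_avg_ms: float  # trailing average request time

def pick_node(rnst, op, max_lag=100):
    """Select a target rep node for a request (illustrative policy)."""
    if op == "write":
        return next(n for n in rnst if n.is_master)
    eligible = [n for n in rnst if n.is_master or n.vlsn_lag <= max_lag]
    return min(eligible, key=lambda n: (n.outstanding, n.trailing_avg_ms))

rnst = [
    RepNodeState("rn1", True, 0, 9, 5.0),
    RepNodeState("rn2", False, 10, 2, 6.0),
    RepNodeState("rn3", False, 500, 1, 4.0),  # too far behind: ineligible
]
print(pick_node(rnst, "write").name)  # rn1
print(pick_node(rnst, "read").name)   # rn2
```

In the real system the table entries are refreshed from the extra state piggybacked on responses, so the driver's view of load and lag is continuously updated without extra round trips.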

Resolving a Request
- Hash the major key to determine the partition ID
- Use the partition map to map the partition ID to a rep group
- Use the state table to determine the eligible rep node(s) within the rep group
- Use the load balancer to select the best eligible rep node
- Contact the rep node directly
- The rep node handles the request (almost always) or forwards it (almost never)
- The operation result, plus any new partition map/RNST information, is returned to the client

Oracle NoSQL Database: Storage Nodes
- A machine with its own local storage and IP address; physical (preferred) or virtual
- Additional storage nodes increase throughput and capacity, or decrease latency
- Not necessarily symmetrical: different processors, memory, capacity
- May host one or more replication nodes; by default, replication nodes are 1:1 with storage nodes
- It would be unwise to locate multiple replication nodes from the same replication group on the same storage node

Oracle NoSQL Database: Topology Example
100M keys on 9 storage nodes might be configured as:
- 1000 partitions
- 3 rep groups
- Replication factor 3
- 9 rep/storage nodes (SN1-SN9)
(Diagram: partitions P1-P1000 spread across Rep Group 1, Rep Group 2, and Rep Group 3, each group replicated on three of the storage nodes)
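The arithmetic behind such a configuration can be sketched as follows; `plan_topology` is our own helper for illustration, not a product tool.

```python
def plan_topology(n_storage_nodes, replication_factor, n_partitions):
    """How many rep groups fit, and how many partitions each holds.

    Assumes one replication node per storage node (the default), so the
    number of rep groups is storage nodes divided by replication factor.
    """
    assert n_storage_nodes % replication_factor == 0
    n_groups = n_storage_nodes // replication_factor
    per_group = n_partitions // n_groups
    return n_groups, per_group

groups, per_group = plan_topology(9, 3, 1000)
print(groups, per_group)  # 3 rep groups, 333 partitions each
```

Note that 1000 partitions do not divide evenly by 3 groups, so in practice one group would carry the one extra partition; having many more partitions than groups is what later allows rebalancing by moving individual partitions.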

High Availability/Replication: Overview
- A replication group consists of a single master and multiple replicas
- Logical replication; dynamic group membership, failover, and elections
- Synchronous or asynchronous
- Transaction durability and consistency guarantees are specifiable per operation
- Uses TCP/IP and standard Java libraries
- Supports nodes with heterogeneous platform hardware/OS
- An HA monitor class tracks replication group state

High Availability/Replication: Failure and Recovery
- Nodes join a replication group; when quorum is reached, an election is held, resulting in a master plus replicas
- Node failure:
  - Failed nodes can be replaced in the group via the administrative API
  - Rejoining nodes automatically synchronize with the master
  - Isolated nodes can still service reads as long as consistency guarantees are met
- Master failover:
  - Nodes detect failure via a missing heartbeat or a connection failure
  - A new master is elected automatically via a distributed two-phase election algorithm (Paxos)
  - For maximum availability, whenever feasible, elections are overlapped with the normal operation of a node
- The system automatically maintains group membership and status

High Availability/Replication: Transaction Behavior
- Controlled by a durability option
- Transactions on the master can fail if:
  - There are not enough nodes to satisfy the durability requirement
  - The node transitions from master to replica

Durability (Writes)
- Specified on a per-operation basis; the default can be changed
- Durability consists of:
  - Sync policy (for master and replica):
    - Sync: force to disk
    - Write No Sync: force to the OS
    - No Sync: write when convenient
  - Replica ack policy: All, Simple Majority, or None
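The replica ack policies can be modeled by asking how many replica acknowledgments the master must collect before a commit is considered durable. The enum and function below are our own sketch, not the product API; the one subtlety it encodes is that with Simple Majority the master itself counts toward the majority.

```python
from enum import Enum

class AckPolicy(Enum):
    ALL = "all"
    SIMPLE_MAJORITY = "simple_majority"
    NONE = "none"

def required_acks(policy, replication_factor):
    """Replica acks the master must collect before a commit is durable."""
    replicas = replication_factor - 1        # everyone except the master
    if policy is AckPolicy.ALL:
        return replicas
    if policy is AckPolicy.SIMPLE_MAJORITY:
        # Master plus this many replicas forms a strict majority.
        return replication_factor // 2
    return 0                                 # NONE: master alone suffices

# Replication factor 3: majority is 2 of 3 nodes, so 1 replica ack.
print(required_acks(AckPolicy.SIMPLE_MAJORITY, 3))  # 1
print(required_acks(AckPolicy.ALL, 3))              # 2
```

This is why Simple Majority tolerates the failure of a minority of replicas without blocking writes, while All stalls as soon as any replica is down.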

Consistency (Reads)
- Specified on a per-operation basis; the default can be changed
- Consistency options:
  - Absolute (read from the master)
  - Time-based
  - Version
  - None (read from any node)
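The read-side counterpart can be sketched the same way: which nodes may serve a read under each policy, given each replica's lag behind the master? The names are ours, and version-based consistency is omitted for brevity.

```python
def can_serve_read(policy, is_master, lag_ms=0, permitted_lag_ms=0):
    """May this node serve a read under the given consistency policy?"""
    if policy == "absolute":
        return is_master                     # only the master qualifies
    if policy == "time":
        # Replicas qualify only if within the permitted staleness window.
        return is_master or lag_ms <= permitted_lag_ms
    return True                              # "none": any node will do

print(can_serve_read("absolute", is_master=False))                     # False
print(can_serve_read("time", False, lag_ms=50, permitted_lag_ms=100))  # True
print(can_serve_read("none", False, lag_ms=10_000))                    # True
```

Relaxing consistency from absolute toward none widens the set of eligible nodes, which is exactly what lets the driver spread reads across replicas for scalability.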

CAP
- We focus on the simple-majority replica ack policy
- Only one version is maintained; no user reconciliation of records
- We might truncate transactions
- We are probably more CA than AP

What We've Been Testing
- YCSB-based QA/benchmarking
- Key: ~10 bytes; data: 1108 bytes
- Preferred sizing: keys up to tens of bytes, data up to 100K; >100KB data is OK
- Configurations of 10-200 nodes
- Typical replication factor of 3 (master + 2 replicas)
- Minimal I/O overhead: the B+tree fits in memory, so reads take one I/O per record
- Writes are buffered, and the log-structured storage system yields fast write throughput

Performance: Nashua
- Single rep group (3 nodes): Sun X4170, dual Xeon X5670, 2.93 GHz (3.3 GHz turbo), 2 x 6 cores, 2 threads/core (24 contexts), 72GB memory, 8 x 300GB 10k SAS in RAID-5
- 400M records, inserts: 15.3k/sec; 9 ms (95th %ile), 12 ms (99th %ile)
- 500M records, 50/50 read/update: 6,542 ops/sec overall
  - Read latency: 7/32/58 ms (avg/95/99)
  - Update latency: 8/33/61 ms (avg/95/99)

Performance: SLC 72
- 20 rep groups on 60 VMs on 12 physical servers
- 2 x 6-core Xeon, 96GB per physical machine; 2 cores x 2 threads (4 contexts) and 16GB per VM
- 100M records per rep group
- Inserts: 109k/sec; latency 5/12/60 ms (avg/95/99)
- 50/50 read/update: 33k ops/sec overall
  - Read latency: 16/74/155 ms (avg/95/99)
  - Update latency: 19/79/171 ms (avg/95/99)
- Remember: these are VMs, so we're only testing scalability, not real performance

Performance: Intel Dupont
- 64 rep groups on 192 physical servers
- 2 x 6-core Xeon X5670 (2.93 GHz / 3.3 GHz turbo), 24GB per machine, 300GB local disk per machine
- 2.1B total records (a YCSB limitation)
- Ext3 tuning applied
- 50/50 read/update: 101.4k ops/sec overall
  - Read latency: 10/21/73 ms (avg/95/99)
  - Update latency: 23/25/197(*) ms (avg/95/99)
- (*) Multiple YCSB clients may be treating the same keys as hot; under investigation

QUESTIONS?