Scylla Open Source 3.0

SCYLLADB PRODUCT OVERVIEW

Scylla is an open source NoSQL database that offers the horizontal scale-out and fault tolerance of Apache Cassandra, but delivers 10X the throughput and consistent, low single-digit latencies. Implemented from scratch in C++, Scylla's close-to-the-hardware design significantly reduces the number of database nodes you require and self-optimizes to dynamic workloads and various hardware combinations.

CONTENTS

INTRODUCTION
MATERIALIZED VIEWS
GLOBAL SECONDARY INDEXES
ALLOW FILTERING
NEW FILE FORMAT
HINTED HANDOFF
FULL, MULTI-PARTITION SCAN IMPROVEMENTS
STREAMING IMPROVEMENTS
RELEASE NOTES

INTRODUCTION

With the release of Scylla Open Source 3.0, we've introduced a rich set of new features for more efficient querying, reduced storage requirements, lower repair times, and better overall database performance. Already the industry's most performant NoSQL database, Scylla now includes production-ready features that surpass the capabilities of Apache Cassandra. Scylla Open Source 3.0 is now available for download.

MATERIALIZED VIEWS

Materialized Views automate the tedious and inefficient chores created when an application maintains several tables with the same data organized differently. Data is divided into partitions that can be found by a partition key. Sometimes the application needs to find a partition or partitions by the value of another column. Doing this efficiently, without scanning all of the partitions, requires indexing.

For years, applications have implemented this pattern, often called denormalization, on the client side. The application maintained two or more views as separate tables holding the same data under different partition keys. Every write had to go to each of those tables, while reads were served directly (and efficiently) from the desired table. However, ensuring any level of consistency between the tables required complex and slow application logic.

Scylla's Materialized Views feature moves this complexity out of the application and into the servers. The implementation is faster (fewer round trips to the application) and more reliable. This approach makes it much easier for applications to begin using multiple views into their data. The application just declares the additional views, Scylla creates the new view tables, and on every update to the base table the view tables are automatically updated as well. Writes are executed only on the base table and are automatically propagated to the view tables. Reads go directly to the view tables.

As usual, the Scylla version is compatible in features and CQL syntax with the Apache Cassandra version (where it is still in experimental mode).

    CREATE TABLE buildings (
        name text,
        city text,
        built int,
        feet int,
        PRIMARY KEY (name)
    );

    CREATE MATERIALIZED VIEW building_by_city AS
        SELECT * FROM buildings
        WHERE city IS NOT NULL
        PRIMARY KEY (city, name);

    SELECT * FROM building_by_city;
    SELECT name, built, feet FROM building_by_city LIMIT 1;

GLOBAL SECONDARY INDEXES

Scylla Open Source 3.0 introduces production-ready global secondary indexes that can scale to any size of distributed cluster, unlike the local-indexing approach adopted by Apache Cassandra. The secondary index uses a Materialized View under the hood in order to make the index independent of the number of nodes in the cluster. Secondary Indexes are (mostly) transparent to the application. Queries have access to all the columns in the table, and you can add and remove indexes without changing the application.

Secondary Indexes can also have less storage overhead than Materialized Views because Secondary Indexes need to duplicate only the indexed column and primary key, not the queried columns as a Materialized View does. For the same reason, updates can be more efficient with Secondary Indexes because only changes to the primary key and indexed column cause an update in the index view. In the case of a Materialized View, an update to any of the columns that appear in the view requires the backing view to be updated.

As always, the decision whether to use Secondary Indexes or Materialized Views depends on the requirements of your application. If you need maximum performance and are likely to query a specific set of columns, you should use Materialized Views. However, if the application needs to query different sets of columns, Secondary Indexes are a better choice because they can be added and removed with less storage overhead depending on application needs.

Global secondary indexes minimize the amount of data retrieved from the database, providing many benefits:

- Results are paged and customizable
- Filtering is supported to narrow result sets
- Keys, rather than data, are denormalized
- More general-purpose use cases are supported than with Materialized Views
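The storage and update trade-off described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not Scylla's implementation: the `IndexedTable` class, its methods, and the sample rows are all hypothetical. It shows an index view that holds only the indexed value and the primary key, a two-step read (index view first, then base table), and an update to a non-indexed column that leaves the index view untouched.

```python
from collections import defaultdict


class IndexedTable:
    """Toy model: a base table plus one index view on a single column.

    Hypothetical illustration only; not Scylla's storage engine.
    """

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                      # primary key -> full row
        self.index_view = defaultdict(set)  # indexed value -> primary keys
        self.index_writes = 0               # how often the index view changed

    def upsert(self, key, **columns):
        old = self.rows.get(key, {})
        new = {**old, **columns}
        old_value = old.get(self.indexed_column)
        new_value = new.get(self.indexed_column)
        if old_value != new_value:
            # Only a change to the indexed column touches the index view.
            if old_value is not None:
                self.index_view[old_value].discard(key)
            if new_value is not None:
                self.index_view[new_value].add(key)
            self.index_writes += 1
        self.rows[key] = new

    def select_where(self, value):
        # Step 1: the index view yields primary keys.
        # Step 2: full rows are read from the base table.
        keys = sorted(self.index_view.get(value, ()))
        return [{"name": k, **self.rows[k]} for k in keys]


# Made-up sample data.
buildings = IndexedTable(indexed_column="city")
buildings.upsert("Tower A", city="Springfield", built=1990, feet=800)
buildings.upsert("Tower B", city="Springfield", built=2005, feet=950)
buildings.upsert("Tower C", city="Shelbyville", built=1975, feet=600)
buildings.upsert("Tower A", feet=810)  # non-indexed column: index untouched

springfield = buildings.select_where("Springfield")
print([row["name"] for row in springfield])
print(buildings.index_writes)
```

In this model a Materialized View would correspond to storing the full row under the indexed value, so every column change would also rewrite the view. That is the extra overhead the comparison above refers to.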

ALLOW FILTERING

ALLOW FILTERING is a way to make a more complex query, returning only a subset of matching results. Because the filtering is done on the server, this feature also reduces the amount of data transferred over the network between the cluster and the application. Such filtering may incur a processing cost on the Scylla cluster. For example, a query might require the database to filter an extremely large data set before returning a response. By default, such queries are prevented from execution, returning the following message:

    Bad Request: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance.

Unpermitted queries include those that restrict:

- Non-partition key fields
- Parts of primary keys that are not prefixes
- Partition keys with something other than an equality relation (though you can combine a secondary index (SI) with ALLOW FILTERING to support inequalities such as >= or <=)
- Clustering keys with a range restriction and then by other conditions

However, in some cases (usually due to data modeling decisions), applications need to make queries that violate these basic rules. Starting with Scylla Open Source 3.0, queries can be appended with the ALLOW FILTERING keyword to bypass this restriction and use server-side filtering. The benefits of filtering include:

- Cassandra query compatibility
- Spark-Cassandra connector query compatibility
- Query flexibility against legacy data sets

NEW FILE FORMAT

Scylla Open Source 3.0 introduces support for a more performant storage format (SSTable), which is not only compatible with Apache Cassandra 3.x but also reduces storage volume by as much as 3X. The older 2.x format duplicated the column name next to each cell on disk. The new format eliminates the duplication: the column names are stored once, within the schema.
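A rough back-of-the-envelope model shows why removing per-cell column names can shrink storage so much. The sketch below is a deliberate simplification, not the actual SSTable layout: it counts only column-name bytes and value bytes, ignoring timestamps, headers, indexes, and compression. The function names, schema, and workload figures are all hypothetical.

```python
def bytes_with_names_per_cell(rows, columns, value_bytes):
    """Older layout (simplified): every cell carries its column name."""
    per_row = sum(len(name.encode("utf-8")) + value_bytes for name in columns)
    return rows * per_row


def bytes_with_names_in_schema(rows, columns, value_bytes):
    """Newer layout (simplified): names stored once, cells hold values only."""
    names_once = sum(len(name.encode("utf-8")) for name in columns)
    return names_once + rows * len(columns) * value_bytes


columns = ["sensor_identifier", "reading_value"]  # hypothetical schema
rows, value_bytes = 1_000_000, 8                   # hypothetical workload

old_size = bytes_with_names_per_cell(rows, columns, value_bytes)
new_size = bytes_with_names_in_schema(rows, columns, value_bytes)
print(f"old={old_size} new={new_size} saved={1 - new_size / old_size:.0%}")
```

In this toy model the saving grows with the ratio of column-name length to value size. Tables with long column names and small values benefit the most, while tables with large values see far less.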

The newly introduced format is identical to that used by Apache Cassandra 3.x, while remaining backward-compatible with prior Scylla SSTable formats. New deployments of Scylla Open Source 3.0 will automatically use the new format, while existing files remain unchanged. This new storage format delivers important benefits, including:

- Can read existing Apache Cassandra 3.x files when migrating
- Faster than previous versions
- Reduced storage footprint of up to 66%, depending on the data model used
- Range delete support

HINTED HANDOFF

Hinted handoffs are designed to help when any individual node is temporarily unresponsive due to heavy write load, network weather, hardware failure, or any other factor. Hinted handoffs also help in the event of short-term network issues or node restarts, reducing the time for scheduled repairs, and resulting in higher overall performance for distributed deployments. Originally introduced as an experimental feature in Scylla Open Source 2.1, hinted handoffs are another production-ready feature in Scylla Open Source 3.0.

Based on the replication factor (RF), the coordinator attempts to write to RF nodes. If one node is down (with RF = 3, as in the figures), acknowledgments are only returned from two nodes.

[Figure: a client writes partition 1 to a five-node cluster. Nodes = V, W, X, Y, Z; partitions = 1, 2, 3, 4, 5. V holds 2, 4, 5; W holds 1, 2, 5; X holds 1, 3, 4; Y holds 2, 3, 5; Z holds 1, 3, 4.]

The coordinator will write and store a hint for the missing node.

[Figure: the same cluster, showing the client request, the Write and Ack messages for partition 1, and a stored hint labeled x:1.]

Once the down node comes back up, the coordinator replays the hint for that node. After the coordinator receives an acknowledgment of the write, the hint is deleted.
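The write, hint, and replay sequence described above can be sketched as a tiny simulation. This is a minimal model and not Scylla's code: the `Replica` and `Coordinator` classes are hypothetical, and it leaves out write timestamps, consistency levels, hint expiry, and persistence of hints on disk. The replicas W, X, and Z stand for the three nodes that hold partition 1 in the figures.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def write(self, key, value):
        """Apply the write and return True (an ack), or False if unreachable."""
        if not self.up:
            return False
        self.data[key] = value
        return True


class Coordinator:
    def __init__(self):
        self.hints = []  # (replica, key, value) for writes that were not acked

    def write(self, replicas, key, value):
        acks = 0
        for replica in replicas:
            if replica.write(key, value):
                acks += 1
            else:
                self.hints.append((replica, key, value))  # store a hint
        return acks

    def replay_hints(self):
        remaining = []
        for replica, key, value in self.hints:
            if not replica.write(key, value):
                remaining.append((replica, key, value))  # still down: keep it
        self.hints = remaining  # hints that were acked are dropped


w, x, z = Replica("W"), Replica("X"), Replica("Z")
coordinator = Coordinator()

x.up = False                                   # X is temporarily unresponsive
acks = coordinator.write([w, x, z], key=1, value="v1")
hints_while_down = len(coordinator.hints)

x.up = True                                    # X recovers
coordinator.replay_hints()
print(acks, hints_while_down, len(coordinator.hints), x.data)
```

After the replay, X holds the same value as W and Z without a repair having run. This is the effect the feature is meant to achieve.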

Technically, a hint is a record of a write request held by the coordinator until an unresponsive replica node comes back online. When a write is deemed successful but one or more replica nodes fail to acknowledge it, Scylla writes a hint that is replayed to those nodes when they recover. Once the node becomes available again, the write request data in the hint is written to the replica node. Hinted handoffs deliver the following benefits:

- Minimize the difference between data on the nodes when nodes are down, whether for scheduled upgrades or for all-too-common intermittent network issues.
- Reduce the amount of data transferred during repair.
- Reduce the chances of checksum mismatch (during read-repair) and thus improve overall latency.

FULL, MULTI-PARTITION SCAN IMPROVEMENTS

Scylla Open Source 3.0 builds on earlier improvements by extending stateful paging to support range scans as well. As opposed to other partition queries, which read a single partition or a list of distinct partitions, range scans read all of the partitions that fall into the range specified by the client. Since the precise number and identity of partitions in a given range cannot be determined in advance, the query must read data from all nodes containing data for the range.

[Figure: range scan results before and after the change, normalized to 1.0. Source: scylladb.com/2018/11/01/more-efficient-range-scan-paging-with-scylla-3-0/]
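As a rough illustration of a paged range scan that keeps its state between pages, the sketch below merges token-ordered readers from several shards into one stream and hands it out page by page. It is a toy model under stated assumptions, not Scylla's algorithm: the shard contents, tokens, and function names are hypothetical, and a Python generator stands in for the reader state that a stateful scan keeps alive between pages.

```python
import heapq
from itertools import islice


def shard_reader(partitions):
    """Yield one shard's (token, key) pairs in token order."""
    yield from sorted(partitions)


def range_scan(shards, low, high):
    """Merge per-shard readers into one token-ordered stream for (low, high]."""
    readers = [shard_reader(p) for p in shards]
    for token, key in heapq.merge(*readers):
        if low < token <= high:
            yield token, key


def next_page(scan, page_size):
    """Fetch one page; the scan generator keeps its position between calls."""
    return list(islice(scan, page_size))


# Hypothetical data: three shards, each owning some (token, key) pairs.
shards = [
    [(10, "a"), (40, "d"), (70, "g")],
    [(20, "b"), (50, "e")],
    [(30, "c"), (60, "f")],
]

scan = range_scan(shards, low=15, high=65)
page1 = next_page(scan, 2)
page2 = next_page(scan, 2)
page3 = next_page(scan, 2)
print(page1, page2, page3)
```

A stateless variant would rebuild the readers and skip forward to the last returned token for every page. Reusing the open readers avoids that repeated work.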

To improve range scan paging, Scylla Open Source 3.0 introduces a new control algorithm for reading all data belonging to a range from all shards. It caches the intermediate streams on each of the shards and directs paged queries to the matching, previously used, cached results. The new algorithm is essentially a multiplexer that combines the output of readers opened on affected shards into a single stream. The readers are created on demand when the partition scan attempts to read from the shard. To ensure that the read won't stall, the algorithm uses buffering and read-ahead. Benefits include:

- Improved system responsiveness
- Throughput of range scans improved by as much as 30%
- Amount of data read from the disk reduced by as much as 40%
- Disk operations lowered by as much as 75%

STREAMING IMPROVEMENTS

Streaming is used during node recovery to populate restored nodes with data replicated from running nodes. The Scylla streaming model reads data on one node, transmits it to another node, and then writes it to disk. The sender creates SSTable readers to read the rows from SSTables on disk and sends them over the network. The receiver receives the rows from the network and writes them to a memtable. The rows in the memtable are flushed into SSTables periodically or when the memtable is full.

[Figure: streaming data path before Scylla Open Source 3.0.]

[Figure: streaming data path after the change.]

In Scylla Open Source 3.0, stream synchronization between nodes bypasses memtables, significantly reducing the time to repair, add, and remove nodes. These improvements result in higher performance when there is a change in the cluster topology, improving streaming bandwidth by as much as 240% and reducing the time it takes to perform a rebuild operation by 70%. Scylla's new streaming improvements provide the following benefits:

- Lower memory consumption. The saved memory can be used to handle your CQL workload instead.
- Better CPU utilization. No CPU cycles are spent inserting into and sorting memtables.
- Bigger SSTables and fewer compactions.

RELEASE NOTES

You can read more details about Scylla Open Source 3.0 in the Release Notes. Download Scylla Open Source 3.0 now!

ABOUT SCYLLADB

Scylla is the real-time big data database. Fully compatible with Apache Cassandra, Scylla embraces a shared-nothing approach that increases throughput and storage capacity as much as 10X that of Cassandra. Comcast, Ola Cabs, Samsung, AdGear, IBM Compose, Grab, Snapfish, Intel, MediaMath, AppNexus, CERN, SAS, Investing.com, L3 Technologies and many more leading companies have adopted Scylla to realize order-of-magnitude performance improvements and reduce hardware costs. ScyllaDB was founded by the team responsible for the KVM hypervisor and is backed by Bessemer Venture Partners, Innovation Endeavors, Wing Venture Capital, Qualcomm Ventures, TLV Partners, Magma Venture Partners, Western Digital Capital and Samsung Ventures.

For more information: ScyllaDB.com

United States Headquarters
1900 Embarcadero Rd
Palo Alto, CA 94303 U.S.A.
Phone: +1 (650) 332-9582
Email: info@scylladb.com

Israel Headquarters
11 Galgalei Haplada
Herzelia, Israel

Copyright 2017 ScyllaDB Inc. All rights reserved. All trademarks or registered trademarks used herein are property of their respective owners.