vbuckets: The Core Enabling Mechanism for Couchbase Server Data Distribution (aka Auto-Sharding)


Table of Contents

vbucket Defined
Couchbase key-vbucket-server mapping illustrated
vbuckets in a world of memcached clients
TCP ports
Deployment Option 1 - Using the embedded proxy
Deployment Option 2 - Standalone proxy installed on each client server
Deployment Option 3 - vbucket-aware client

A key design goal for Couchbase Server is to support off-the-shelf memcached clients while also providing data replication, failover and dynamic cluster reconfiguration. The vbucket concept is a foundational mechanism for meeting these seemingly irreconcilable requirements. In this document, we explore the concept of vbuckets in Couchbase Server, covering definitions, key mapping, and deployment options.

Note: For simplicity, in this document we completely ignore multi-tenancy (the unit of multi-tenancy in Couchbase Server is the bucket, which represents a virtual instance inside a single cluster). The bucket and vbucket concepts are not to be confused; they are unrelated. For purposes of this document, a bucket can simply be viewed as synonymous with a cluster.

vbuckets defined

A vbucket is defined as the owner of a subset of the key space of a Couchbase Server cluster. Every key belongs to exactly one vbucket. A mapping function is used to calculate the vbucket to which a given key belongs. In Couchbase Server, that mapping function is a hashing function that takes a key as input and outputs a vbucket identifier (each cluster has a fixed number of vbuckets, determined when the cluster is first installed). Once the vbucket identifier has been computed, a table is consulted to look up the server currently acting as the master server for that vbucket. The table contains one row per vbucket, pairing the vbucket with its master server. A server appearing in this table can be (and usually is) responsible for multiple vbuckets.

(Figure: all possible keys are hashed to vbuckets, and each vbucket is mapped by table lookup to its master server; e.g. Key m → vbucket n → Server p.)

The hashing function used by Couchbase Server to map keys to vbuckets is configurable: both the hashing algorithm and the output space (the total number of vbuckets output by the function) can be changed. Naturally, if the number of vbuckets in the output space of the hash function is changed, then the table which maps vbuckets to servers must be resized.
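The two-step lookup described above (hash the key to a vbucket, then consult the table for the master server) can be sketched in a few lines. The vbucket count, the CRC32 hash, and the server names are illustrative assumptions for this example, not the exact production configuration:

```python
import zlib

NUM_VBUCKETS = 1024  # illustrative; fixed when the cluster is first created

# vbucket -> master server table: one row per vbucket; each server
# is responsible for many vbuckets.
vbucket_map = {vb: f"Server {'ABC'[vb % 3]}" for vb in range(NUM_VBUCKETS)}

def key_to_vbucket(key: bytes) -> int:
    # Step 1: hash the key into the fixed vbucket space.
    return zlib.crc32(key) % NUM_VBUCKETS

def master_for_key(key: bytes) -> str:
    # Step 2: table lookup of the master server for that vbucket.
    return vbucket_map[key_to_vbucket(key)]

vb = key_to_vbucket(b"user:1001")
print(vb, master_for_key(b"user:1001"))
```

Note that the client never hashes directly to a server: every routing decision goes through the table, which is what makes later reconfiguration possible.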

Couchbase key-vbucket-server mapping illustrated

The vbucket mechanism provides a layer of indirection between the hashing algorithm and the server responsible for a given key. This indirection is useful in managing the orderly transition from one cluster configuration to another, whether the transition was planned (e.g. adding new servers to a cluster) or unexpected (e.g. a server failure). Memcached, in contrast, has no intermediary. It uses a hashing function to directly map keys to servers (using a statically maintained list of servers as the output space). When the server list is changed, the hashing function will remap keys to new servers. Because memcached is a cache, it can simply drop the data that has been remapped; the data will eventually be re-cached. This doesn't work with a database: the data can't just be dropped, it has to be moved.

The diagram below shows how key-server mapping works when using the vbucket construct. There are three servers in the cluster (Servers A, B and C). A client wants to read the value of KEY. The client first hashes the key to calculate the vbucket which owns KEY. Assume Hash(KEY) = vbucket 8. The client then consults the vbucket-server mapping table and determines the master server for vbucket 8. The read operation is sent to that server by the Couchbase client library.

(Figure: the vbucket-server map, with vbuckets distributed across the servers in the cluster.)

After some period of time, there is a need to add a server to the cluster (e.g. to sustain performance in the face of growing use). The administrator adds Server D to the cluster, and the vbucket map is updated accordingly. (Note: the vbucket-server map is updated by an internal algorithm, and the updated table is transmitted by Couchbase Server to all cluster participants: servers, clients and proxies.)

(Figure: the updated vbucket-server map after the addition, with some vbuckets now mastered by Server D.)
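The value of the indirection can be seen in a few lines: when a server joins, only the affected map rows change; the hash function, and therefore the vbucket a key belongs to, does not. The map fragment and server names below are illustrative assumptions:

```python
def lookup(vbucket_map, vb):
    # The client's routing decision is a pure table lookup.
    return vbucket_map[vb]

# Before the addition: vbucket 8 is mastered by one of the original servers.
vbucket_map = {8: "Server C"}
vb = 8  # Hash(KEY) = vbucket 8; the hash output never changes here
assert lookup(vbucket_map, vb) == "Server C"

# Rebalance: the cluster recomputes the map and pushes the new table to
# all participants (servers, clients and proxies). No key is rehashed;
# only the vbucket's row changes.
vbucket_map[8] = "Server D"
assert lookup(vbucket_map, vb) == "Server D"
print("vbucket 8 moved without rehashing any keys")
```

Contrast this with memcached's direct hashing, where changing the server list silently remaps a large fraction of all keys at once.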

After the addition, a client once again wants to read the value of KEY. Because the hashing algorithm in this case has not changed, Hash(KEY) = vbucket 8, as before. The client examines the vbucket-server mapping table and determines that Server D is now the master server for vbucket 8. The read operation is sent to Server D.

vbuckets in a world of memcached clients

Couchbase Server is designed to be a drop-in replacement for an existing memcached server, while adding persistence, replication, failover and dynamic cluster reconfiguration. Existing applications will likely be using an old memcached client library to communicate with a memcached cluster. That library will probably be using a hashing algorithm to directly map keys to servers, as previously described. To make this work, a proxy is required. Note that the optimal solution is to replace the memcached client library with a client that implements the vbucket concept directly (though a proxy will continue to be desirable in some environments). There are vbucket-aware clients for Java, .NET, Ruby, PHP, Python and C/C++. But in this example, we assume a memcached client library is already running and that a change is undesired.

TCP ports

Couchbase Server listens for clients on two configurable ports; TCP ports 11211 and 11210 (see figure below) are the defaults. Both ports are memcapable, supporting the memcached ASCII and Binary protocols (Binary only on 11210). Port 11211 is the port on which an embedded proxy listens (11211 being the traditional memcached standard port). It can receive, and successfully process, requests for keys that are owned by vbuckets not hosted by this server; the proxy will forward the request to the right server, then return the result to the client. Port 11210 is the port on which the database server itself listens. It will reject requests for keys owned by vbuckets not hosted by this server. The client sends the vbucket number in the request; that vbucket is then compared with the list of vbuckets hosted by this server.
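The behavior of the direct (database) port can be sketched as a server-side ownership check. The NOT_MY_VBUCKET status name follows the memcached binary protocol; the rest of the structure is a simplified assumption:

```python
# vbuckets this server currently masters (illustrative set)
hosted_vbuckets = {0, 1, 2, 3}
store = {}  # (vbucket, key) -> value

def handle(op: str, vbucket: int, key: str, value=None):
    # The client includes the vbucket number in every request; the server
    # compares it against its hosted set and rejects mismatches outright,
    # rather than forwarding (forwarding is the proxy port's job).
    if vbucket not in hosted_vbuckets:
        return ("NOT_MY_VBUCKET", None)
    if op == "set":
        store[(vbucket, key)] = value
        return ("OK", None)
    return ("OK", store.get((vbucket, key)))

print(handle("set", 2, "k", "v"))  # ('OK', None)
print(handle("get", 2, "k"))       # ('OK', 'v')
print(handle("get", 9, "k"))       # ('NOT_MY_VBUCKET', None)
```

A rejection tells the client its map is stale, which is the signal to fetch the latest vbucket-server table and retry.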

(Figure: three client-side configurations. With the embedded proxy, the legacy client's server list names the Couchbase servers directly (Deployment Option 1, Priority A1). With a standalone proxy, the client's server list contains only localhost (Deployment Option 2, Priority A2). A vbucket-aware client needs no proxy at all (Deployment Option 3, Priority B).)

Deployment Option 1 - Using the embedded proxy

The first deployment option is to communicate with the proxy embedded in Couchbase Server over port 11211. This option allows you to install Couchbase Server and begin using it with an existing application, via a memcached client library, without also installing another piece of software. The tradeoff is a potential performance impact, though Couchbase Server attempts to minimize latency and throughput degradation. In this deployment option, versus a memcached deployment, key-server mapping will in the worst case happen twice (first using direct hashing against a server list on the client, then using vbucket hashing and server mapping on the proxy), with an additional round-trip network hop introduced.

(Figure: the client hashes KEY against its server list [1] and sends the request to the server selected; the embedded proxy there vbucket-hashes KEY [2], determines that the master is Server A, and forwards the operation to Server A.)
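The worst-case double mapping can be sketched as follows. Both hash choices (CRC32 for each step) and the map layout are illustrative assumptions:

```python
import zlib

servers = ["Server A", "Server B", "Server C"]
NUM_VBUCKETS = 1024
vbucket_map = {vb: servers[vb % 3] for vb in range(NUM_VBUCKETS)}

def legacy_pick(key: bytes) -> str:
    # Mapping 1: the unmodified client hashes directly against its static
    # server list and sends the request to that server's proxy port.
    return servers[zlib.crc32(key) % len(servers)]

def embedded_proxy_pick(key: bytes) -> str:
    # Mapping 2: the embedded proxy on the first-hop server vbucket-hashes
    # the key and looks up the true master in the vbucket map.
    return vbucket_map[zlib.crc32(key) % NUM_VBUCKETS]

key = b"KEY"
first_hop, owner = legacy_pick(key), embedded_proxy_pick(key)
# Whenever first_hop != owner, the proxy forwards the operation to the
# owner, adding the extra network round trip described above.
print(first_hop, owner)
```

When the two picks happen to agree, the embedded proxy serves the request locally and no extra hop occurs; the degradation is therefore probabilistic, not constant.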

Assume there is an existing application, with a memcached client library, with a server list of three servers (Servers A, B and C). Couchbase Server is installed in place of the memcached server software on each of these three servers. As shown in the figure above, when the application wants to Get(KEY), it calls a function in the client library. The library will hash(KEY) [see 1] and be directed, based on the server list and hashing function, to one of the three servers. The Get operation is sent to that server on port 11211 (the embedded proxy). When the request arrives on that port [see 2], the key is hashed again to determine its vbucket and server mapping. This time, the result is Server A. The proxy will contact Server A on port 11210, perform the read operation and return the result to the client.

Deployment Option 2 - Standalone proxy installed on each client server

The second option is to deploy a standalone proxy, which performs substantially the same way as the embedded proxy, but potentially eliminates a network hop. A standalone proxy deployed on a client machine may also be able to provide valuable services, such as connection pooling. The diagram below shows the flow with a standalone proxy (the Couchbase proxy is called moxi) installed on the client server. The client library is configured to have just one server in its server list (localhost), so all operations are forwarded to localhost, on a port serviced by the proxy. The proxy hashes the key to a vbucket, looks up the host server in the vbucket table, and then sends the operation to the appropriate Couchbase Server (Server A in this case) on port 11210.

(Figure: the application sends Get(KEY) to the localhost proxy [1]; the proxy vbucket-hashes the key [2], finds Server A in the map, and sends the operation to Server A.)

Deployment Option 3 - vbucket-aware client

In the final case, no proxy is installed anywhere in the data flow. The client library has been updated and performs server selection directly via the vbucket mechanism. Where there is flexibility to replace technology on an existing application, or for new development, this is the highest-performance option.

(Figure: the vbucket-aware client library hashes KEY to a vbucket [1], looks up the master in its copy of the vbucket map, and sends the operation directly to Server A.)

For more information on building and deploying clients with Couchbase Server, visit www.couchbase.com.
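As a closing illustration, a vbucket-aware client folds both steps (hash and map lookup) into the library itself. The sketch below also shows the common pattern for surviving a rebalance: on a NOT_MY_VBUCKET response the client refreshes its map and retries. The in-process FakeServer and the map-refresh callable are assumptions for illustration, not a real transport:

```python
import zlib

NUM_VBUCKETS = 1024

class FakeServer:
    """In-process stand-in for a Couchbase server's direct (database) port."""
    def __init__(self, name, hosted):
        self.name, self.hosted, self.data = name, set(hosted), {}

    def handle(self, op, vb, key, value=None):
        if vb not in self.hosted:
            return ("NOT_MY_VBUCKET", None)
        if op == "set":
            self.data[(vb, key)] = value
            return ("OK", None)
        return ("OK", self.data.get((vb, key)))

class VBucketAwareClient:
    def __init__(self, vbucket_map, refresh_map):
        self.vbucket_map = vbucket_map  # vb -> server; may be stale
        self.refresh_map = refresh_map  # callable returning the latest map

    def _op(self, op, key, value=None):
        vb = zlib.crc32(key) % NUM_VBUCKETS
        status, result = self.vbucket_map[vb].handle(op, vb, key, value)
        if status == "NOT_MY_VBUCKET":
            # A rebalance moved this vbucket: fetch the new map and retry.
            self.vbucket_map = self.refresh_map()
            status, result = self.vbucket_map[vb].handle(op, vb, key, value)
        return result

    def set(self, key, value): return self._op("set", key, value)
    def get(self, key): return self._op("get", key)

# Server A used to master everything; Server D has taken over half the map.
a = FakeServer("A", range(512))
d = FakeServer("D", range(512, NUM_VBUCKETS))
latest = {vb: (a if vb < 512 else d) for vb in range(NUM_VBUCKETS)}
stale = {vb: a for vb in range(NUM_VBUCKETS)}  # client starts out of date

client = VBucketAwareClient(stale, lambda: latest)
client.set(b"KEY", "value")   # may trigger one refresh-and-retry
print(client.get(b"KEY"))     # -> value
```

Because the map lookup happens in the client, each operation travels straight to its master server: one hop, no proxy, which is why this option performs best.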