vbuckets: The Core Enabling Mechanism for Couchbase Server Data Distribution (aka Auto-Sharding)
Table of Contents
vbuckets defined
key-vbucket-server mapping illustrated
vbuckets in a world of memcached clients
TCP ports
Deployment Option 1: Using embedded proxy
Deployment Option 2: Standalone proxy installed on each client server
Deployment Option 3: vbucket-aware client
A key design goal for Couchbase Server is to support off-the-shelf (memcached) clients while also providing data replication, failover and dynamic cluster reconfiguration. The vbucket concept is a foundational mechanism for meeting these seemingly irreconcilable requirements. In this document, we explore vbuckets in Couchbase Server, covering definitions, key mapping and deployment options.

Note: For simplicity, this document ignores multi-tenancy entirely. In Couchbase Server, the unit of multi-tenancy is the bucket, which represents a virtual instance inside a single cluster. Buckets and vbuckets are unrelated concepts and should not be confused. For the purposes of this document, a bucket can simply be treated as synonymous with a cluster.

vbuckets defined
A vbucket is defined as the owner of a subset of the key space of a cluster. Every key belongs to exactly one vbucket. A mapping function is used to calculate the vbucket to which a given key belongs. In Couchbase Server, that mapping function is a hashing function that takes a key as input and outputs a vbucket identifier (each cluster has a fixed number of vbuckets, determined when the cluster is first installed). Once the vbucket identifier has been computed, a table is consulted to look up the server currently acting as the master server for that vbucket. The table contains one row per vbucket, pairing the vbucket with its master server. A server appearing in this table can be (and usually is) responsible for multiple vbuckets.

[Figure: All possible keys (Key 1 to Key 10) map to vbuckets (vbucket 1, vbucket 2, vbucket 3, ...) through a hash function; each vbucket maps to its master server (Server 1, Server 2, ...) through a table lookup.]

The hashing function used by Couchbase Server to map keys to vbuckets is configurable: both the hashing algorithm and the output space (the total number of vbuckets output by the function) can be changed. Naturally, if the number of vbuckets in the output space of the hash function changes, the table that maps vbuckets to servers must be resized.
[Figure: Key m maps to vbucket n, which maps to Server p.]
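The two-step lookup described above can be sketched in Python. This is a minimal illustration, not Couchbase Server's implementation: the CRC32-based hash (modeled on the scheme commonly used by Couchbase client libraries), the count of 1024 vbuckets, and the round-robin assignment of vbuckets to servers are all assumptions, and the helper names are hypothetical.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed; fixed when the cluster is first installed


def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Step 1: hash a key to a vbucket id (assumed CRC32-based hash)."""
    crc = zlib.crc32(key.encode("utf-8"))
    return ((crc >> 16) & 0x7FFF) % num_vbuckets


def build_vbucket_map(servers: list[str], num_vbuckets: int = NUM_VBUCKETS) -> list[str]:
    """One row per vbucket naming its master server.

    Round-robin assignment is a hypothetical placeholder for the
    cluster's internal placement algorithm.
    """
    return [servers[vb % len(servers)] for vb in range(num_vbuckets)]


def master_for_key(key: str, vbucket_map: list[str]) -> tuple[int, str]:
    """Step 2: look up the master server for the key's vbucket."""
    vb = vbucket_for_key(key, len(vbucket_map))  # output space = table size
    return vb, vbucket_map[vb]


if __name__ == "__main__":
    vb_map = build_vbucket_map(["Server 1", "Server 2", "Server 3"])
    vb, server = master_for_key("KEY", vb_map)
    print(f"KEY -> vbucket {vb} -> {server}")
```

Because the sketch takes the vbucket count from the length of the table, resizing the hash's output space and resizing the table always happen together, as the text requires.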
Couchbase key-vbucket-server mapping illustrated
The vbucket mechanism provides a layer of indirection between the hashing algorithm and the server responsible for a given key. This indirection is useful in managing the orderly transition from one cluster configuration to another, whether the transition was planned (e.g. adding new servers to a cluster) or unexpected (e.g. a server failure).

Memcached, in contrast, has no intermediary. It uses a hashing function to map keys directly to servers (using a statically maintained list of servers as the output space). When the server list changes, the hashing function remaps keys to new servers. Because memcached is a cache, it simply drops the data that has been moved, and that data will eventually be re-cached by the application. This doesn't work for a database: the data can't just be dropped, it has to be moved.

The diagram below shows how key-server mapping works when using the vbucket construct. There are three servers in the cluster. A client wants to read the value of KEY. The client first hashes the key to calculate the vbucket that owns KEY. Assume Hash(KEY) = vbucket 8. The client then consults the vbucket-server mapping table and determines that Server C is the master server for vbucket 8. The read operation is sent to Server C by the Couchbase client library.

[Figure: vbucket-server map for a three-server cluster; Hash(KEY) = vbucket 8, whose master is Server C.]

After some period of time, a server needs to be added to the cluster (e.g. to sustain performance in the face of growing use). The administrator adds Server D to the cluster, and the vbucket map is updated as shown below. (Note: the vbucket-server map is updated by an internal algorithm, and the updated table is transmitted by Couchbase Server to all cluster participants: servers, clients and proxies.)

[Figure: updated vbucket-server map after Server D is added; vbucket 8 is now mastered by Server D.]
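The value of the indirection shows up when counting how many keys change servers as Server D joins. The sketch below is illustrative only: it uses naive modulo hashing as a stand-in for memcached's direct key-to-server hashing (consistent hashing, which many memcached clients use, moves fewer keys), a CRC32-based vbucket hash and 1024 vbuckets as assumptions, and a hypothetical rebalance that simply hands every fourth vbucket to the new server. With vbuckets, Hash(KEY) never changes; only table rows change, so the data that must move is exactly the set of reassigned vbuckets.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed vbucket count


def crc(key: str) -> int:
    return zlib.crc32(key.encode("utf-8"))


def direct_server(key: str, servers: list[str]) -> str:
    """memcached-style: map the key straight onto the server list (modulo, assumed)."""
    return servers[crc(key) % len(servers)]


def vbucket_of(key: str) -> int:
    """vbucket id for a key; independent of how many servers exist."""
    return ((crc(key) >> 16) & 0x7FFF) % NUM_VBUCKETS


def add_server(vb_map: list[str], new_server: str) -> list[str]:
    """Hypothetical rebalance: give the new server every n-th vbucket.

    Only the reassigned rows change; all other rows keep their owner.
    """
    new_map = list(vb_map)
    n = len(set(vb_map)) + 1  # server count after the addition
    for vb in range(n - 1, len(new_map), n):
        new_map[vb] = new_server
    return new_map


servers = ["Server A", "Server B", "Server C"]
keys = [f"key-{i}" for i in range(10_000)]
old_map = [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]
new_map = add_server(old_map, "Server D")

moved_direct = sum(
    direct_server(k, servers) != direct_server(k, servers + ["Server D"]) for k in keys
)
moved_vbucket = sum(old_map[vbucket_of(k)] != new_map[vbucket_of(k)] for k in keys)
print(f"direct hashing moved {moved_direct} keys; vbuckets moved {moved_vbucket}")
```

Going from three servers to four, modulo hashing keeps a key in place only when its hash leaves the same remainder modulo 3 and modulo 4, so roughly three quarters of the keys move. The vbucket version moves only the keys in the 256 reassigned vbuckets, roughly one quarter, and every one of them moves to Server D.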
After the addition, a client once again wants to read the value of KEY. Because the hashing algorithm has not changed, Hash(KEY) = vbucket 8, as before. The client examines the vbucket-server mapping table and determines that Server D is now the master server for vbucket 8. The read operation is sent to Server D.

vbuckets in a world of memcached clients
Couchbase Server is designed to be a drop-in replacement for an existing memcached server, while adding persistence, replication, failover and dynamic cluster reconfiguration. Existing applications will likely be using an older memcached client to communicate with a memcached cluster. Such a client will probably be using a hashing algorithm to map keys directly to servers, as described previously. To make this work, a proxy is required. Note that the optimal solution is to replace the memcached client library with a client that implements the vbucket concept directly (though a proxy will continue to be desirable in some environments). There are vbucket-aware clients for Java, .NET, Ruby, PHP, Python and C/C++. In this example, however, we assume an application is already running and that a client change is undesirable.

TCP ports
Couchbase Server listens for client requests on two configurable ports. TCP ports 11211 and 11210 (see figure below) are the defaults. Both ports are memcapable: port 11211 supports both the ASCII and binary memcached protocols, while port 11210 supports the binary protocol only.

Port 11211 is the port on which an embedded proxy listens (the traditional memcached port). It can receive, and successfully process, requests for keys owned by vbuckets not hosted by this server: the proxy forwards the request to the right server and then returns the result to the client.

Port 11210 is the port on which the Couchbase Server database server listens. It rejects requests for keys owned by vbuckets not hosted by this server. The client sends the vbucket number in the request, and the server compares it with the list of vbuckets it hosts.
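The different behavior of the two ports can be modeled with two small classes. This is a toy in-memory sketch, not the wire protocol: the class names are hypothetical, requests are plain method calls, and the string status "NOT_MY_VBUCKET" stands in for the rejection the data port returns (the memcached binary protocol defines a dedicated "vbucket belongs to another server" status for this case). The CRC32-based hash and the 1024-vbucket count are assumptions.

```python
import zlib
from typing import Optional

NUM_VBUCKETS = 1024  # assumed vbucket count


def vbucket_for_key(key: str) -> int:
    """Assumed CRC32-based key-to-vbucket hash."""
    return ((zlib.crc32(key.encode("utf-8")) >> 16) & 0x7FFF) % NUM_VBUCKETS


class DataPort:
    """Models a server's port 11210: serves only the vbuckets it hosts."""

    def __init__(self, name: str, hosted_vbuckets: set[int]):
        self.name = name
        self.hosted = set(hosted_vbuckets)
        self.store: dict[str, str] = {}

    def set(self, vbucket: int, key: str, value: str) -> str:
        if vbucket not in self.hosted:
            return "NOT_MY_VBUCKET"  # reject: the request reached the wrong server
        self.store[key] = value
        return "OK"

    def get(self, vbucket: int, key: str) -> tuple[str, Optional[str]]:
        if vbucket not in self.hosted:
            return ("NOT_MY_VBUCKET", None)
        return ("OK", self.store.get(key))


class EmbeddedProxy:
    """Models port 11211: accepts any key and forwards it to the vbucket's master."""

    def __init__(self, vbucket_map: list[str], data_ports: dict[str, DataPort]):
        self.vbucket_map = vbucket_map
        self.data_ports = data_ports

    def _route(self, key: str) -> tuple[int, DataPort]:
        vb = vbucket_for_key(key)
        return vb, self.data_ports[self.vbucket_map[vb]]

    def set(self, key: str, value: str) -> str:
        vb, port = self._route(key)
        return port.set(vb, key, value)

    def get(self, key: str) -> tuple[str, Optional[str]]:
        vb, port = self._route(key)
        return port.get(vb, key)


# Usage: three servers, round-robin vbucket ownership (assumed)
servers = ["Server A", "Server B", "Server C"]
vb_map = [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]
ports = {s: DataPort(s, {vb for vb, owner in enumerate(vb_map) if owner == s}) for s in servers}
proxy = EmbeddedProxy(vb_map, ports)
proxy.set("KEY", "value")  # accepted via the proxy, forwarded to the owner
vb = vbucket_for_key("KEY")
wrong = next(s for s in servers if s != vb_map[vb])
print(proxy.get("KEY"), ports[wrong].get(vb, "KEY"))
```

The proxy succeeds for any key because it re-routes, while a data port addressed directly with a vbucket it does not host refuses the request.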
[Figure: Client paths to Couchbase Server under the three deployment options. Deployment Option 1 (Priority A1): a memcached client with a server list, connecting to the embedded proxy. Deployment Option 2 (Priority A2): a memcached client whose server list is localhost, connecting to a standalone proxy. Deployment Option 3 (Priority B): a new vbucket-aware client.]

Deployment Option 1: Using embedded proxy
The first deployment option is to communicate with the embedded proxy in Couchbase Server over port 11211. This option allows you to install Couchbase Server and begin using it with an existing application, via a memcached client, without installing any additional software. The tradeoff is a potential performance impact, though Couchbase Server attempts to minimize latency and throughput degradation. Compared with a plain memcached deployment, in the worst case this option (shown in detail below) performs server mapping twice (first using memcached direct hashing to a server list on the client, then using vbucket hashing and server mapping on the proxy) and introduces an additional network round trip.

[Figure: Deployment Option 1. The unmodified application's memcached client computes (1) ConsistentHash(KEY) over its server list (Servers A, B and C) to select a server; the embedded proxy on that server then computes (2) vbuckethash(KEY) and looks up the master server (Server A) in the vbucket map.]
Assume there is an existing application using a memcached client with a server list of three servers (Servers A, B and C). Couchbase Server is installed in place of the memcached server software on each of these three servers. As shown in the figure above, when the application wants to Get(KEY), it calls a function in the client library. The client library hashes KEY [see 1] and is directed, based on the server list and hashing function, to Server C. The Get operation is sent to Server C, port 11211 (the embedded proxy). When it arrives at the proxy port [see 2], the key is hashed again to determine its vbucket and server mapping. This time, the result is Server A. The proxy contacts Server A on port 11210, performs the read operation and returns the result to the client.

Deployment Option 2: Standalone proxy installed on each client server
The second option is to deploy a standalone proxy, which works in substantially the same way as the embedded proxy but can eliminate a network hop. A standalone proxy deployed on a client may also be able to provide valuable services, such as connection pooling. The diagram below shows the flow with a standalone proxy (called moxi) installed on the client server. The client is configured with just one server in its server list (localhost), so all operations are forwarded to localhost:11211, a port serviced by the proxy. The proxy hashes the key to a vbucket, looks up the host server in the vbucket table, and then sends the operation to the appropriate Couchbase Server (Server A in this case) on port 11210.

[Figure: Deployment Option 2. The unmodified application's client sends (1) every request to localhost, where the standalone proxy computes (2) vbuckethash(KEY) = Server A and forwards the request to Server A.]
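The difference between the two proxy deployments comes down to where the first mapping happens and how many network hops a request crosses. The sketch below traces a Get through each option as a simplified model, assuming a CRC32-based vbucket hash, naive modulo hashing as a stand-in for the client's ConsistentHash(KEY), 1024 vbuckets, and that traffic to localhost does not count as a network hop. The function names are hypothetical.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed vbucket count


def crc(key: str) -> int:
    return zlib.crc32(key.encode("utf-8"))


def vbucket_owner(key: str, vbucket_map: list[str]) -> str:
    """Mapping done by the proxy: key -> vbucket -> master server."""
    vb = ((crc(key) >> 16) & 0x7FFF) % len(vbucket_map)
    return vbucket_map[vb]


def option1_get(key: str, server_list: list[str], vbucket_map: list[str]) -> tuple[list[str], int]:
    """Embedded proxy: two mappings, up to two network hops."""
    first = server_list[crc(key) % len(server_list)]  # [1] client picks a server
    owner = vbucket_owner(key, vbucket_map)  # [2] embedded proxy re-maps the key
    path = [f"{first}:11211", f"{owner}:11210"]
    hops = 1 if first == owner else 2  # extra hop only when the client guessed wrong
    return path, hops


def option2_get(key: str, vbucket_map: list[str]) -> tuple[list[str], int]:
    """Standalone proxy on the client host: the server list is just localhost."""
    owner = vbucket_owner(key, vbucket_map)
    return ["localhost:11211", f"{owner}:11210"], 1  # localhost leg is not a network hop


servers = ["Server A", "Server B", "Server C"]
vb_map = [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]
for key in ["KEY", "user:42"]:
    print(key, option1_get(key, servers, vb_map), option2_get(key, vb_map))
```

Whenever the client's own hash happens to pick the vbucket's master, Option 1 costs a single hop; otherwise the embedded proxy adds a second one. Option 2 always pays exactly one network hop, which is the hop the standalone proxy can save.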
Deployment Option 3: vbucket-aware client
In the final case, no proxy is installed anywhere in the data flow. The client has been updated and performs server selection directly via the vbucket mechanism. Where there is flexibility to replace technology in an existing application, or for new development, this is the highest-performance option.

[Figure: Deployment Option 3. The application's vbucket-aware client computes (1) vbuckethash(KEY) = Server A and sends the request directly to Server A.]

For more information on building and deploying applications with Couchbase Server, visit www.couchbase.com.
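A vbucket-aware client folds the proxy's work into the client library itself: it keeps its own copy of the vbucket-server map, computes the vbucket locally, sends the request (carrying the vbucket id) directly to the master server's port 11210, and replaces its map when the cluster transmits an updated one. The sketch below is a hypothetical model of that routing logic only, with no networking, assuming a CRC32-based hash and 1024 vbuckets; it replays the Server D example from earlier in this document.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed vbucket count


class VBucketAwareClient:
    """Hypothetical client-side routing for Deployment Option 3."""

    def __init__(self, vbucket_map: list[str]):
        self.vbucket_map = list(vbucket_map)

    def update_map(self, new_map: list[str]) -> None:
        """Install a vbucket-server map transmitted by the cluster."""
        self.vbucket_map = list(new_map)

    def vbucket_for_key(self, key: str) -> int:
        crc = zlib.crc32(key.encode("utf-8"))
        return ((crc >> 16) & 0x7FFF) % len(self.vbucket_map)

    def route(self, key: str) -> tuple[int, str]:
        """Return the vbucket id to put in the request and the address to send it to."""
        vb = self.vbucket_for_key(key)
        return vb, f"{self.vbucket_map[vb]}:11210"


servers = ["Server A", "Server B", "Server C"]
client = VBucketAwareClient([servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)])
vb, before = client.route("KEY")

# Simulate the rebalance: the cluster reassigns KEY's vbucket to Server D
updated = list(client.vbucket_map)
updated[vb] = "Server D"
client.update_map(updated)
vb_after, after = client.route("KEY")
print(vb, before, "->", vb_after, after)
```

As in the Server D walkthrough, the key's vbucket stays the same across the map update; only the server it resolves to changes.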