Axway API Management 7.5.x Cassandra Best practices. #axway


Agenda
- Apache Cassandra - Overview
- Apache Cassandra - Focus on consistency level
- Apache Cassandra - Additional definitions
- Cassandra configuration for API Mgt - Overview
- Cassandra configuration for API Mgt - Single node deployment
- Cassandra configuration for API Mgt - 3 Node Cluster Deployment (Single Datacenter)
- Apache Cassandra - Reference
- Apache Cassandra - Tools
- Apache Cassandra - To go further in Cassandra understanding

Apache Cassandra - Overview

Cassandra overview - What is Cassandra?
Apache Cassandra is an open source, distributed and decentralized storage system (database) for managing very large amounts of structured data spread out across the world. It provides a highly available service with no single point of failure.
Cassandra is classified as a NoSQL database. The primary objectives of a NoSQL database are:
- simplicity of design
- horizontal scaling
- finer control over availability

Cassandra overview - Architecture
Cassandra has a peer-to-peer distributed architecture: data is distributed among all the nodes in a cluster, and all the nodes play the same role. Each node is independent and, at the same time, interconnected with the other nodes.
Each node in a cluster can accept read and write requests, regardless of where the data is actually located in the cluster. When a node goes down, read/write requests can be served from other nodes in the network.
[Figure: cluster with 4 Cassandra nodes (Node 1 to Node 4)]

Cassandra overview - Write process
We use the following data to illustrate Cassandra's features (the name is the primary key):

Primary key | Age | Car    | Gender
Jim         | 36  | camaro | M
Carol       | 37  | subaru | F
Johnny      | 12  |        | M
Suzy        | 10  |        | F

Let's see how Cassandra stores this data.
Partitioner: the partitioner determines how data is distributed across the nodes in the cluster (including replicas). Basically, a partitioner is a function that derives a token representing a row from its partition key, typically by hashing. Each row of data is then distributed across the cluster according to the value of its token.
[Figure: the partition keys Jim, Carol, Johnny and Suzy, each hashed to a token]
Several partitioners are available (Murmur3Partitioner is the default in recent Cassandra versions; RandomPartitioner uses MD5). In our example, the MD5 hash function (128 bit) is applied to each primary key.
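To make the partitioner step concrete, here is a minimal sketch that derives a token from a partition key with MD5, as in this example. It is an illustration only, not Cassandra's code: the real RandomPartitioner and the default Murmur3Partitioner compute and bound their tokens differently, and the function name `md5_token` is hypothetical.

```python
import hashlib


def md5_token(partition_key: str) -> int:
    """Hash a partition key with MD5 and read the 128-bit digest as an unsigned integer token."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


if __name__ == "__main__":
    for key in ("Jim", "Carol", "Johnny", "Suzy"):
        # Print each token as 32 hexadecimal digits (128 bits).
        print(f"{key:<7} {md5_token(key):032x}")
```

The same key always yields the same token, which is what lets any node work out where a row lives without a central lookup table.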

Cassandra overview - Write process
A range of hash values (tokens) is associated with each node. The whole hash space is broken into ranges; with 4 nodes in our example, each node owns a quarter of it.
[Table: start and end token of the ranges owned by Node 1, Node 2, Node 3 and Node 4]
Each row is written to the node whose range contains the row's token:
- Jim: Node 3
- Carol: Node 4
- Johnny: Node 1
- Suzy: Node 3
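The range lookup can be sketched as follows, assuming (as in this example) a 128-bit token space split into equal, contiguous ranges, one per node. Real Cassandra assigns each node one or more tokens (vnodes) and a node owns the range ending at its token, so `build_ranges` and `owner` are hypothetical helpers for illustration only.

```python
TOKEN_SPACE = 2**128  # size of the MD5 token space assumed in this example


def build_ranges(node_names, token_space=TOKEN_SPACE):
    """Split the token space into equal ranges, one per node (start inclusive, end exclusive)."""
    size = token_space // len(node_names)
    ranges = []
    for i, name in enumerate(node_names):
        start = i * size
        # The last node absorbs any remainder so that the ranges cover the whole space.
        end = token_space if i == len(node_names) - 1 else start + size
        ranges.append((name, start, end))
    return ranges


def owner(token, ranges):
    """Return the node whose range contains the token."""
    for name, start, end in ranges:
        if start <= token < end:
            return name
    raise ValueError("token outside the token space")


if __name__ == "__main__":
    ring = build_ranges(["Node 1", "Node 2", "Node 3", "Node 4"])
    for name, start, end in ring:
        print(f"{name}: {start:#034x} .. {end - 1:#034x}")
```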

Cassandra overview - Replication factor - Definition
The replication factor is the total number of replicas across the cluster. You define the replication factor for each data center. Generally, you should set the replication factor greater than one, but no greater than the number of nodes in the cluster.
A replication factor of 1 means that there is only one copy of each row, on one node. A replication factor of 2 means two copies of each row, where each copy is on a different node. All replicas are equally important; there is no primary or master replica.
Back to our previous example: with a replication factor of 3, there are 3 replicas of each row, and replication is made clockwise around the ring:
- Node 1: Jim, Suzy, Carol, Johnny
- Node 2: Carol, Johnny
- Node 3: Jim, Suzy, Johnny
- Node 4: Jim, Suzy, Carol
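The clockwise placement above can be reproduced with a short sketch of the SimpleStrategy rule: the first replica goes to the node chosen by the partitioner, the others to the next nodes around the ring. The helper names and the `PRIMARIES` mapping (taken from the example) are illustrative assumptions, not Cassandra APIs.

```python
RING = ["Node 1", "Node 2", "Node 3", "Node 4"]  # ring order, clockwise


def replicas(primary, replication_factor, ring=RING):
    """First replica on the partitioner's node, the others on the next nodes clockwise."""
    if not 1 <= replication_factor <= len(ring):
        raise ValueError("replication factor must be between 1 and the number of nodes")
    start = ring.index(primary)
    return [ring[(start + i) % len(ring)] for i in range(replication_factor)]


def placement(primaries, replication_factor, ring=RING):
    """Map each node to the sorted list of rows it stores."""
    stored = {node: [] for node in ring}
    for row, primary in primaries.items():
        for node in replicas(primary, replication_factor, ring):
            stored[node].append(row)
    return {node: sorted(rows) for node, rows in stored.items()}


# Node chosen by the partitioner for each row in the example.
PRIMARIES = {"Jim": "Node 3", "Carol": "Node 4", "Johnny": "Node 1", "Suzy": "Node 3"}

if __name__ == "__main__":
    for node, rows in placement(PRIMARIES, 3).items():
        print(node, rows)
```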

Cassandra overview - Reading data
Reading data is performed in parallel across a cluster. A user requests data from any node (which becomes that user's coordinator node), and the user's query is assembled from one or more nodes holding the necessary data.
Read - Process
The client is aware of every node and can send a request to any of them (every node can receive a read request). In this example, Node 4 does not hold the requested data, but it knows all the nodes of the cluster, so it plays the role of coordinator.
If a node holding the required data is down, Cassandra simply requests the data from another node holding a replicated copy of that data.
[Figure: the client reads through coordinator Node 4; Node 1 holds the data and Node 2 and Node 3 hold copies; the nodes acknowledge with different latencies (5 µs, 12 µs, 500 µs)]

Cassandra overview - Read - Process
If Node 4 has crashed, the client sends its request to Node 1 instead.
[Figure: the client reads directly from Node 1; Node 4 is down; Node 2 and Node 3 hold copies of Node 1's data]
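The two fallbacks described above (the client picking another coordinator, the coordinator picking another replica) can be sketched as below. This is a simplified illustration under the assumption that liveness is known through an `is_up` callback; both function names are hypothetical, and real drivers use configurable load-balancing and retry policies.

```python
from typing import Callable, Iterable


def choose_coordinator(contact_points: Iterable[str], is_up: Callable[[str], bool]) -> str:
    """Client side: any live node can act as coordinator, so try the known nodes in order."""
    for node in contact_points:
        if is_up(node):
            return node
    raise RuntimeError("no Cassandra node is reachable")


def pick_replica(replica_nodes: Iterable[str], is_up: Callable[[str], bool]) -> str:
    """Coordinator side: read from the first live node holding a copy of the row."""
    for node in replica_nodes:
        if is_up(node):
            return node
    raise RuntimeError("no replica holding the row is available")


if __name__ == "__main__":
    down = {"Node 4"}
    coordinator = choose_coordinator(["Node 4", "Node 1", "Node 2"], lambda n: n not in down)
    print(coordinator)  # Node 1: Node 4 has crashed, so the client falls back to Node 1
```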

Apache Cassandra - Focus on consistency level

Consistency - Definition
Consistency refers to how up-to-date and synchronized a row of Cassandra data is on all of its replicas. Cassandra extends the concept of eventual consistency by offering tunable consistency: for any given read or write operation, the client application decides how consistent the requested data must be.
Even at low consistency levels, Cassandra writes to all replicas of the partition key, including replicas in other data centers. The consistency level determines only the number of replicas that need to acknowledge the write before success is reported to the client application.
Typically, a client specifies a consistency level that is lower than the replication factor specified by the keyspace. This practice ensures that the coordinating node reports the write as successful even if some replicas are down or otherwise not responsive.
Resource:

Consistency (Write) - Definition
The coordinator sends a write request to all replicas that own the row being written. As long as all replica nodes are up and available, they receive the write regardless of the consistency level specified by the client. The write consistency level determines how many replica nodes must respond with a success acknowledgement for the write to be considered successful. Success means that the data was written to the commit log and the memtable (see the write process described at the end of this document).
Level definitions:
- ONE: a write must be written to the commit log and memtable of at least one replica node.
- TWO: a write must be written to the commit log and memtable of at least two replica nodes.
- QUORUM*: a write must be written to the commit log and memtable on a quorum of replica nodes across all data centers.
- LOCAL_QUORUM*: strong consistency. A write must be written to the commit log and memtable on a quorum of replica nodes in the same data center as the coordinator node. Avoids the latency of inter-data center communication.
- ALL: a write must be written to the commit log and memtable on all replica nodes in the cluster for that partition.
* Note: QUORUM = floor(N / 2) + 1, where N is the replication factor (summed over all data centers for QUORUM; the local data center's replication factor for LOCAL_QUORUM).
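The level definitions boil down to a number of acknowledgements the coordinator waits for. The sketch below computes it from a per-data-center replication factor map; the function names and the `rf_per_dc` / `local_dc` parameters are illustrative assumptions, not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """QUORUM = floor(RF / 2) + 1 (integer division rounds down)."""
    return replication_factor // 2 + 1


def required_acks(level: str, rf_per_dc: dict, local_dc: str) -> int:
    """Number of replica acknowledgements the coordinator waits for at a given consistency level."""
    total_rf = sum(rf_per_dc.values())
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "TWO":
        return 2
    if level == "QUORUM":
        return quorum(total_rf)  # counted across all data centers
    if level == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc])  # counted in the coordinator's data center only
    if level == "ALL":
        return total_rf
    raise ValueError(f"unsupported consistency level: {level}")
```

For example, with two hypothetical data centers of 3 replicas each, QUORUM needs 4 acknowledgements while LOCAL_QUORUM needs only 2, which is why LOCAL_QUORUM avoids waiting on the remote data center.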

Consistency (Write) - Example (1/3)
Back to our previous example (4 nodes, replication factor = 3): the incoming write (1) goes to all 3 nodes that own the requested row (2).
[Figure: the client writes Johnny through coordinator Node 4, which forwards the write to Node 1, Node 2 and Node 3]
Consistency (Write) - Example (2/3)
If the write consistency level specified by the client is ONE (1), the first node to complete the write responds back to the coordinator (3), which then proxies the success message back to the client (4). A consistency level of ONE means that 2 of the 3 replicas could miss the write if they happened to be down when the request was made. If a replica misses a write, Cassandra makes the row consistent later using one of its built-in repair mechanisms: hinted handoff, read repair, or anti-entropy node repair.
[Figure: write of Johnny at consistency level ONE; Node 1 acknowledges the coordinator (Node 4) after 5 µs and the coordinator then acknowledges the client]

Consistency (Write) - Example (3/3)
If the write consistency level specified by the client is ALL (1), a write must be written to the commit log and memtable on all replica nodes in the cluster for that partition (2). The coordinator node must wait for the acknowledgements (3) of all replicas (3 in our example, as required by the consistency level) before acknowledging the client (4).
[Figure: write of Johnny at consistency level ALL; Node 1, Node 2 and Node 3 acknowledge after 5 µs, 12 µs and 500 µs, so the client is acknowledged only after the slowest replica]
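The difference between ONE and ALL in these examples is essentially which acknowledgement the coordinator waits for. A minimal sketch, taking the per-replica times from the figures (5, 12 and 500 µs) as illustrative inputs and ignoring the coordinator-to-client hop: the coordinator answers after the k-th fastest acknowledgement, where k is the number required by the consistency level. The function name is hypothetical.

```python
def coordinator_latency_us(ack_latencies_us, required_acks):
    """The coordinator replies once `required_acks` replicas have acknowledged,
    i.e. as soon as the k-th fastest acknowledgement arrives."""
    if required_acks < 1:
        raise ValueError("at least one acknowledgement is required")
    if required_acks > len(ack_latencies_us):
        # Not enough replicas answered: the request fails instead of succeeding late.
        raise RuntimeError("not enough replicas acknowledged")
    return sorted(ack_latencies_us)[required_acks - 1]


# Replica acknowledgement times from the figures, in microseconds.
ACKS_US = [5, 12, 500]

if __name__ == "__main__":
    for level, k in (("ONE", 1), ("QUORUM", 2), ("ALL", 3)):
        print(level, coordinator_latency_us(ACKS_US, k), "us")
```

This is why ALL makes every write as slow as the slowest replica, and why it fails outright when a single replica is down.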

Consistency (Read) - Definition
There are three types of read requests that a coordinator node can send to a replica:
- a direct read request
- a digest request
- a background read repair request
The coordinator node contacts one replica node with a direct read request. Then the coordinator sends a digest request to a number of replicas determined by the consistency level specified by the client. The digest request checks the data in the replica node to make sure it is up to date. Then the coordinator sends a digest request to all remaining replicas. If any replica nodes have out-of-date data, a background read repair request is sent. Read repair requests ensure that the requested row is made consistent on all replicas.
For a digest request, the coordinator first contacts the replicas specified by the consistency level, choosing the replicas that are currently responding the fastest. The contacted nodes respond with a digest of the requested data; if multiple nodes are contacted, the rows from each replica are compared in memory to see if they are consistent. If they are not, the replica that has the most recent data (based on the timestamp) is used by the coordinator to forward the result back to the client.
To ensure that all replicas have the most recent version of frequently-read data, the coordinator also contacts and compares, in the background, the data from all the remaining replicas that own the row. If the replicas are inconsistent, the coordinator issues writes to the out-of-date replicas to update the row to the most recent values. This process is known as read repair. Read repair can be configured per table for non-quorum consistency levels (using read_repair_chance) and is enabled by default.
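The digest comparison and read repair steps can be sketched as follows. This is a simplified model, assuming each replica returns a single value with a write timestamp and that a digest is an MD5 hash of value and timestamp; `Cell`, `digest`, `reconcile` and `read_repair_targets` are hypothetical names, and Cassandra's actual digest format is different.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write timestamp; the highest timestamp wins


def digest(cell: Cell) -> str:
    """Stand-in for a digest response: a hash of the data instead of the data itself."""
    return hashlib.md5(f"{cell.value}|{cell.timestamp}".encode("utf-8")).hexdigest()


def digests_match(responses: dict) -> bool:
    """A mismatch between digests means at least one replica is out of date."""
    return len({digest(cell) for cell in responses.values()}) == 1


def reconcile(responses: dict) -> Cell:
    """Return the most recent version (by timestamp) among the replica responses."""
    return max(responses.values(), key=lambda cell: cell.timestamp)


def read_repair_targets(responses: dict) -> list:
    """Replicas holding an older version than the winner; these receive repair writes."""
    newest = reconcile(responses)
    return sorted(node for node, cell in responses.items() if cell.timestamp < newest.timestamp)
```

For instance, if Node 2 still holds an older version of Johnny's row, the digests differ, the newer version is returned to the client, and Node 2 is the replica that gets repaired.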

Consistency (Read) - Level definitions
- ONE: returns a response from the closest replica, as determined by the snitch. By default, a read repair runs in the background to make the other replicas consistent.
- TWO: returns the most recent data from two of the closest replicas.
- QUORUM*: returns the record after a quorum of replicas from all data centers has responded.
- LOCAL_QUORUM*: returns the record after a quorum of replicas in the same data center as the coordinator node has responded. Avoids the latency of inter-data center communication.
- ALL: returns the record after all replicas have responded. The read operation fails if a replica does not respond.
* Note: QUORUM = floor(N / 2) + 1, where N is the replication factor.

Consistency (Read) - Example (1/2)
In a single data center cluster with a replication factor of 3 and a read consistency level of ONE (1), the closest replica for the given row is contacted to fulfill the read request (2). In the background, a read repair is potentially initiated.
[Figure: read of Johnny at consistency level ONE through coordinator Node 4; Node 1 answers, while Node 2 and Node 3 also hold Johnny]
Consistency (Read) - Example (2/2)
In a single data center cluster with a replication factor of 3 and a read consistency level of QUORUM (1), 2 of the 3 replicas for the given row must respond to fulfill the read request (2 and 3). If the contacted replicas have different versions of the row, the replica with the most recent version returns the requested data. In the background, the third replica is checked for consistency with the first two and, if needed, a read repair is initiated for the out-of-date replicas.
[Figure: read of Johnny at consistency level QUORUM through coordinator Node 4; Node 1, Node 2 and Node 3 answer after 5 µs, 12 µs and 500 µs]

Consistency - Summary
Using a replication factor of 3, a quorum is 2 nodes, so the cluster can tolerate 1 replica down:
- With 1 of the 3 replicas down, 2 nodes can still provide the data: the quorum can be achieved.
- With 2 of the 3 replicas down, only one node can still provide the data: the quorum cannot be achieved.
Using a replication factor of 6, a quorum is 4, so the cluster can tolerate 2 replicas down:
- With 2 of the 6 replicas down, 4 nodes can still provide the data: the quorum can be achieved.
- With 3 of the 6 replicas down, only 3 nodes can still provide the data: the quorum cannot be achieved.
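The summary follows directly from the quorum formula: the number of replicas that can be lost is the replication factor minus the quorum. A minimal sketch (function names are illustrative):

```python
def quorum(replication_factor: int) -> int:
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas of a row that can be down while QUORUM reads and writes still succeed."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in (3, 5, 6):
        print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_replica_failures(rf)}")
```

The same arithmetic shows that a replication factor of 5 also tolerates 2 replica failures, with a quorum of 3 instead of 4.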

Apache Cassandra - Additional definitions

Seed nodes - Definition
The seed node designation has no purpose other than bootstrapping the gossip process for new nodes joining the cluster. Cassandra nodes use this list of hosts to find each other and learn the topology of the ring.
To prevent problems in gossip communication, use the same list of seed nodes for all nodes in a cluster. More than one seed node per data center is recommended for fault tolerance.
Example: cassandra.yaml

Keyspace - Definition
A cluster is a container for keyspaces, typically a single keyspace. A keyspace is the outermost container for data in Cassandra, corresponding closely to a relational database. In the same way that a database is a container for tables, a keyspace is a container for a list of one or more column families. A column family is roughly analogous to a table in the relational model, and is a container for a collection of rows. Each row contains ordered columns. Column families represent the structure of your data. Each keyspace has at least one, and often many, column families.
Like a relational database, a keyspace has a name and a set of attributes that define keyspace-wide behavior:
- replication factor
- replica placement strategy
- column families

Cassandra configuration for API Mgt - Overview

Cassandra usage for API Management
- API Gateway KPS: custom KPS tables (*)
- API Gateway OAuth Token Store (*)
- API Manager (Client Registry, API Catalog, quotas)
- API Gateway Client Registry (Client Registry for API keys and the OAuth solution, when API Gateway only is used)
(*) Cassandra is optional for these; other data store options are available.

Cassandra configuration for API Management - Overview
Based on the previous chapters, the elements that must be configured are:

Element | Configuration | Configuration location / tool
Each Cassandra node | Node configuration (IP, port): rpc address and port for client (API Gateway/Manager) connections; seed nodes, so that the node is aware of all other nodes in the cluster; listen address and port for internode communication (used for replication) | cassandra.yaml
Keyspace | Replication factor; replica placement strategy | Policy Studio
Cassandra client (API Gateway/Manager) | Read and write consistency level | Policy Studio

Best practices (1/2)
- Always build your Cassandra deployment pattern first.
- Configure JAVA_HOME to reference an independent Oracle JRE 1.8 installation. This completely removes the interdependency between API Gateway and Cassandra.
- Enable authentication and SSL between the Cassandra Hector client and the Cassandra server (multi-node cluster).
- Enable SSL communication between Cassandra servers (multi-node cluster).
- For optimal write performance, place the commit log on a separate disk partition (advice from DataStax).

Best practices (2/2)
- Configure Cassandra high availability (HA) before installing the product.
- The only supported configuration is strong consistency, with at minimum: 3 nodes, QUORUM as the read and write consistency level, and a replication factor of 3.
- Synchronize time on all servers (Cassandra uses timestamps to decide which version of a row is the most recent).
- Don't add a node to the cluster if the seed node is not started.
- Start Cassandra Node 1 (seed) first; after it has booted, start Cassandra Node 2; after that has booted, start Cassandra Node 3.
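The reason QUORUM reads plus QUORUM writes with a replication factor of 3 count as strong consistency is that the two replica sets must overlap (R + W > RF), so every read reaches at least one replica holding the latest acknowledged write. The sketch below checks that property and flags deviations from the supported setup listed above; `check_recommended_setup` and its messages are hypothetical, and it reads "minimum" as applying to both the node count and the replication factor.

```python
def quorum(replication_factor: int) -> int:
    return replication_factor // 2 + 1


def reads_see_latest_write(read_acks: int, write_acks: int, replication_factor: int) -> bool:
    """Strong consistency: every read replica set overlaps every write replica set when R + W > RF."""
    return read_acks + write_acks > replication_factor


def check_recommended_setup(node_count: int, replication_factor: int,
                            read_level: str, write_level: str) -> list:
    """List deviations from the supported setup: 3+ nodes, RF 3, QUORUM reads and writes."""
    problems = []
    if node_count < 3:
        problems.append("at least 3 Cassandra nodes are required")
    if replication_factor < 3:
        problems.append("replication factor should be at least 3")
    if replication_factor > node_count:
        problems.append("replication factor cannot exceed the number of nodes")
    for name, level in (("read", read_level), ("write", write_level)):
        if level.upper() != "QUORUM":
            problems.append(f"{name} consistency level should be QUORUM")
    return problems
```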

Cassandra configuration for API Mgt - Single node deployment

Single node
- A single node deployment is the simplest.
- Suitable for development environments ONLY.
- Using LOCALHOST in cassandra.yaml and in the Cassandra hosts (see Policy Studio server settings) is only suitable for a single node deployment.
- NOTE: IPv6 may need to be disabled.
[Figure: API Gateway (with or without API Manager) connected to Cassandra Node 1]

Single node configuration
cassandra.yaml
seed_provider:
    # Addresses of hosts that are deemed contact points.
    # Cassandra nodes use this list of hosts to find each other and learn
    # the topology of the ring. You must change this if you are running
    # multiple nodes!
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # seeds is actually a comma-delimited list of addresses.
          # Ex: "<ip1>,<ip2>,<ip3>"
          - seeds: " "

listen_address: interface used for data (internode) connections (port 7000, or 7001 with SSL)
    listen_address:
    # listen_interface: eth0
    Note: set either listen_address or listen_interface, not both.

rpc_address: interface used for client connections (ports 9160 and 9042)
    rpc_address:
    # rpc_interface: eth1
    Note: set either rpc_address or rpc_interface, not both.

Keyspace and client configuration
- Register the host.
- Designate the Admin Node Manager.
- Configure the API Gateway instance.
- Configure the Hector client via Policy Studio (Server Settings / Cassandra / Hosts). Once the configuration is deployed, the Cassandra keyspace is created.
- Install APIMGR.

Cassandra configuration for API Mgt - 3 Node Cluster Deployment (Single Datacenter)

3 Node Cluster
[Figure: data center DC 1 with Cassandra DB Node 1, Node 2 and Node 3, used by API Gateway instances (with or without API Manager)]

Each node configuration
cassandra.yaml
seed_provider:
    # Addresses of hosts that are deemed contact points.
    # Cassandra nodes use this list of hosts to find each other and learn
    # the topology of the ring. You must change this if you are running
    # multiple nodes!
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # seeds is actually a comma-delimited list of addresses.
          # Ex: "<ip1>,<ip2>,<ip3>"
          - seeds: (all Cassandra instances should reference the same seed)

listen_address: interface used for data (internode) connections (port 7000, or 7001 with SSL)
    listen_address: (change the address to match this Cassandra instance)
    # listen_interface: eth0
    Note: set either listen_address or listen_interface, not both.

rpc_address: interface used for client connections (ports 9160 and 9042)
    rpc_address: (change the address to match this Cassandra instance)
    # rpc_interface: eth1
    Note: set either rpc_address or rpc_interface, not both.
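The rule on this slide (same seed list everywhere, per-node listen and RPC addresses) can be generated rather than hand-edited. A minimal sketch: the IP addresses are hypothetical placeholders, and the output reproduces only the three cassandra.yaml settings discussed here, not a complete file.

```python
SEEDS = ["10.0.0.11"]  # hypothetical address of Cassandra Node 1, the seed
NODE_ADDRESSES = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # hypothetical node addresses


def node_settings(node_address, seeds=SEEDS):
    return {
        "seeds": ",".join(seeds),        # identical on every node
        "listen_address": node_address,  # internode traffic (port 7000, or 7001 with SSL)
        "rpc_address": node_address,     # client traffic (ports 9160 and 9042)
    }


def render_yaml(settings):
    """Render the settings as the corresponding cassandra.yaml lines."""
    return "\n".join([
        "seed_provider:",
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider",
        "      parameters:",
        f'          - seeds: "{settings["seeds"]}"',
        f"listen_address: {settings['listen_address']}",
        f"rpc_address: {settings['rpc_address']}",
    ])


if __name__ == "__main__":
    for address in NODE_ADDRESSES:
        print(render_yaml(node_settings(address)), end="\n\n")
```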

Keyspace and client configuration - For the first API Gateway/Manager
1. Register the host.
2. Configure one API Gateway instance.
3. Configure the Hector client via Policy Studio. Once the configuration is deployed, the Cassandra keyspace is created.
4. Install APIMGR on the first gateway.
5. Update the read/write consistency level to QUORUM for KPS collections via Policy Studio.
6. Update the replication factor (procedure below).
7. Register the remaining hosts and configure their API Gateway instances, only AFTER the replication factor has been updated.
Update the replication factor:
- Log in to Cassandra DB Node 1 and start cqlsh:
    # cd ../cassandra/bin
    # ./cqlsh <IP Address>
- Find the keyspace:
    > DESCRIBE KEYSPACES;
    Example: x8746e4a4_e423_40ac_95a7_ e4e5d_group_2
- Execute the following command to alter the keyspace:
    > ALTER KEYSPACE x8746e4a4_e423_40ac_95a7_ e4e5d_group_2 WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 3 };
- Exit the cqlsh utility and run the following command on all Cassandra instances:
    # nodetool repair x8746e4a4_e423_40ac_95a7_ e4e5d_group_2
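When this procedure is scripted, building the statement from validated inputs avoids the quoting mistakes that are easy to make by hand. A minimal sketch: `example_ks` is a hypothetical placeholder for the name returned by DESCRIBE KEYSPACES, the name check is a simplified assumption (letters, digits, underscores, up to 48 characters), and the repair command is returned as an argument list to run on each instance, for example with `subprocess.run(..., check=True)`.

```python
import re

# Simplified keyspace name check: letters, digits and underscores, up to 48 characters.
KEYSPACE_NAME = re.compile(r"[A-Za-z0-9_]{1,48}")


def alter_keyspace_cql(keyspace: str, replication_factor: int) -> str:
    """Build the ALTER KEYSPACE statement used on this slide (SimpleStrategy, single data center)."""
    if not KEYSPACE_NAME.fullmatch(keyspace):
        raise ValueError(f"unexpected keyspace name: {keyspace!r}")
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return (f"ALTER KEYSPACE {keyspace} WITH REPLICATION = "
            f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};")


def repair_command(keyspace: str) -> list:
    """argv of the repair to run on every Cassandra instance after the change."""
    return ["nodetool", "repair", keyspace]


if __name__ == "__main__":
    print(alter_keyspace_cql("example_ks", 3))
    print(" ".join(repair_command("example_ks")))
```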

Keyspace and client configuration - For the other API Gateways/Managers
- Register the remaining hosts and configure their API Gateway instances.

Apache Cassandra - Reference

Reference - /tmp noexec (Cassandra only)
If /tmp is mounted with noexec, an error is generated when Cassandra is started.
Solution 1: create a tmp directory in /opt/cassandra, then edit the cassandra-env.sh file (under the /opt/axway/cassandra/conf folder) and add the following line:
    # vi cassandra-env.sh
    JVM_OPTS="$JVM_OPTS -Djava.io.tmpdir=$CASSANDRA_HOME/tmp"
Solution 2 (works for both API Gateway and Cassandra; the remount does not persist across reboots unless /etc/fstab is updated as well):
    # sudo mount -o remount,exec /tmp
Reference - JAVA_HOME
    # sudo mkdir -p /opt/jdk
    # tar xvzf jdk-8u101-linux-x64.tar.gz -C /opt/jdk/
Make sure the JAVA_HOME variable is available to all users by adding the following entries to the /etc/profile file:
    # sudo vi /etc/profile
    export JAVA_HOME=/opt/jdk/jdk1.8.0_101
    export PATH=$PATH:$JAVA_HOME/bin
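Before choosing a solution, it can help to check whether /tmp is actually mounted noexec. A minimal sketch that parses /proc/mounts-style text (Linux only); the helper names are hypothetical, and the check only applies when /tmp is a separate mount.

```python
import os


def mount_options(mounts_text: str, mount_point: str):
    """Return the option set of `mount_point` from /proc/mounts-style text, or None if absent.
    Each line is: device mount_point fs_type options dump pass. The last matching line wins,
    because a later mount on the same path hides the earlier one."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            options = set(fields[3].split(","))
    return options


def is_noexec(mounts_text: str, mount_point: str = "/tmp") -> bool:
    options = mount_options(mounts_text, mount_point)
    return options is not None and "noexec" in options


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts", encoding="utf-8") as mounts:
        print("/tmp is noexec" if is_noexec(mounts.read()) else "/tmp is not mounted noexec")
```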

Apache Cassandra - Tools

Tools
DBeaver (Linux)

Apache Cassandra - To go further in Cassandra understanding

Cassandra: Components - Write process - Additional definitions
- Commit log: the commit log is a crash-recovery mechanism in Cassandra. Every write operation is written to the commit log.
- Memtable: a memtable is a memory-resident data structure. After the commit log, the data is written to the memtable. Sometimes, for a single column family, there will be multiple memtables.
- SSTable: a disk file to which the data is flushed from the memtable when its contents reach a threshold value.

Cassandra: Components - Write process (1/3)
1. The client sends the request to the node:
    UPDATE users SET firstname = 'Patrick' WHERE id = 'pmcfadin';
2. The row key and column (id = pmcfadin, firstname = Patrick) are written into the commit log on the server disk. This is very fast.
[Figure: Cassandra server with the commit log and the data directory on the file system]
Resource:

Cassandra: Components - Write process (2/3)
3. The data is then put in a memtable stored in memory.
4. The node sends an acknowledgement to the client.
[Figure: memtable for table users containing id = pmcfadin, firstname = Patrick, lastname = McFadin; the commit log contains id = pmcfadin, firstname = Patrick]
Resource:

Cassandra: Components - Write process (3/3)
5. The flush process writes the memtable out to disk in a file called an SSTable. This is sequential I/O (sequential writes), not random I/O. Each flush produces a new, immutable SSTable whose rows are sorted by partition key.
[Figure: the memtable for table users (id = pmcfadin, firstname = Patrick, lastname = McFadin) is flushed to an SSTable in the data directory]
Resource:
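The five steps above can be condensed into a toy model of one node's write path. It is a sketch under simplifying assumptions: the flush threshold is a hypothetical row count (Cassandra uses memory thresholds), the flush runs synchronously inside `write`, and commit log segment recycling is not modelled.

```python
class WritePathSketch:
    """Toy model of a single node's write path: commit log, then memtable, then SSTable flush."""

    def __init__(self, flush_threshold: int = 3):
        self.commit_log = []   # append-only list standing in for the on-disk commit log
        self.memtable = {}     # in-memory map: row key -> columns
        self.sstables = []     # immutable, key-sorted "files" produced by flushes
        self.flush_threshold = flush_threshold  # hypothetical: rows per memtable before a flush

    def write(self, row_key: str, columns: dict) -> str:
        self.commit_log.append((row_key, dict(columns)))       # steps 1-2: append to the commit log
        self.memtable.setdefault(row_key, {}).update(columns)  # step 3: update the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()
        return "ack"                                           # step 4: acknowledge the client

    def flush(self) -> None:
        # Step 5: write the memtable, sorted by row key, as a new SSTable in one sequential pass.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}


if __name__ == "__main__":
    node = WritePathSketch(flush_threshold=2)
    node.write("pmcfadin", {"firstname": "Patrick", "lastname": "McFadin"})
    node.write("jdoe", {"firstname": "John"})
    print(node.sstables)
```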

Replication strategies - Definition
A replication strategy determines the nodes where replicas are placed. Two replication strategies are available:
- SimpleStrategy: use only for a single data center. SimpleStrategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering topology (rack or data center location).
- NetworkTopologyStrategy: use when you have (or plan to have) your cluster deployed across multiple data centers. This strategy lets you specify how many replicas you want in each data center.
The strategy is configured per keyspace.
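The two strategies differ mainly in the shape of the keyspace replication map. A minimal sketch that builds both maps and renders them in the syntax used by CREATE KEYSPACE / ALTER KEYSPACE; the data center names `DC1` and `DC2` and all helper names are hypothetical.

```python
def simple_strategy(replication_factor: int) -> dict:
    return {"class": "SimpleStrategy", "replication_factor": replication_factor}


def network_topology_strategy(rf_per_dc: dict) -> dict:
    # One entry per data center: data center name -> number of replicas in that data center.
    return {"class": "NetworkTopologyStrategy", **rf_per_dc}


def total_replicas(replication: dict) -> int:
    if replication["class"] == "SimpleStrategy":
        return replication["replication_factor"]
    return sum(count for key, count in replication.items() if key != "class")


def to_cql_map(replication: dict) -> str:
    """Render the map as used in ... WITH replication = {...}."""
    parts = []
    for key, value in replication.items():
        rendered = str(value) if isinstance(value, int) else f"'{value}'"
        parts.append(f"'{key}': {rendered}")
    return "{" + ", ".join(parts) + "}"


if __name__ == "__main__":
    print(to_cql_map(simple_strategy(3)))
    print(to_cql_map(network_topology_strategy({"DC1": 3, "DC2": 2})))
```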

Snitch - Definition
A snitch determines which data centers and racks nodes belong to. Snitches inform Cassandra about the network topology so that requests are routed efficiently, and they allow Cassandra to distribute replicas by grouping machines into data centers and racks. Specifically, the replication strategy places the replicas based on the information provided by the snitch. All nodes must report consistent rack and data center information. Cassandra does its best not to have more than one replica on the same rack (which is not necessarily a physical location).
Examples:
- SimpleSnitch (default): used only for single-data center deployments or a single zone in public clouds. It does not recognize data center or rack information. It treats strategy order as proximity, which can improve cache locality when disabling read repair. With SimpleSnitch, you define the keyspace to use SimpleStrategy and specify a replication factor.
- GossipingPropertyFileSnitch: recommended for production. It uses the rack and data center information for the local node defined in the cassandra-rackdc.properties file and propagates this information to other nodes via gossip, so all nodes are updated automatically when new nodes are added.
The snitch is set in cassandra.yaml, for example: endpoint_snitch: SimpleSnitch

Internode communications (gossip) - Definition
Cassandra uses a protocol called gossip to discover location and state information about the other nodes participating in a Cassandra cluster. Gossip is a peer-to-peer communication protocol in which nodes periodically exchange state information about themselves and about other nodes they know about.
The gossip process runs every second and exchanges state messages with up to three other nodes in the cluster. The nodes exchange information about themselves and about the other nodes that they have gossiped about, so all nodes quickly learn about all other nodes in the cluster. A gossip message has a version associated with it, so that during a gossip exchange, older information is overwritten with the most current state for a particular node.
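The two ideas in this definition, periodic exchanges with up to three peers and version-based overwriting, can be simulated in a few lines. This is a simplified model, not Cassandra's protocol (which uses a three-message exchange and heartbeat versions); the node names, the 10-node cluster size and the seed value are arbitrary assumptions.

```python
import random


def merge(local: dict, remote: dict) -> None:
    """Adopt every entry from `remote` whose version is newer than what `local` knows."""
    for node, (version, state) in remote.items():
        if node not in local or version > local[node][0]:
            local[node] = (version, state)


def gossip_round(views: dict, rng: random.Random, fanout: int = 3) -> None:
    """Every node exchanges its view with up to `fanout` randomly chosen peers (both directions)."""
    nodes = list(views)
    for node in nodes:
        peers = [other for other in nodes if other != node]
        for peer in rng.sample(peers, k=min(fanout, len(peers))):
            merge(views[peer], views[node])
            merge(views[node], views[peer])


def simulate(node_count: int = 10, seed: int = 42, max_rounds: int = 50):
    """Return (rounds needed until every node knows every other node, final views)."""
    # Initially each node only knows its own state, at version 1.
    views = {f"node{i}": {f"node{i}": (1, "UP")} for i in range(node_count)}
    rng = random.Random(seed)
    for round_number in range(1, max_rounds + 1):  # one round stands for one second
        gossip_round(views, rng)
        if all(len(view) == node_count for view in views.values()):
            return round_number, views
    return None, views


if __name__ == "__main__":
    rounds, _ = simulate()
    print(f"all 10 nodes know each other after {rounds} gossip round(s)")
```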

Thank you!


Presented By: Devarsh Patel

Presented By: Devarsh Patel : Amazon s Highly Available Key-value Store Presented By: Devarsh Patel CS5204 Operating Systems 1 Introduction Amazon s e-commerce platform Requires performance, reliability and efficiency To support

More information

CQL for Apache Cassandra 3.0 (Earlier version)

CQL for Apache Cassandra 3.0 (Earlier version) CQL for Apache Cassandra 3.0 (Earlier version) Updated: 2018-08-20-07:00 2018 DataStax, Inc. All rights reserved. DataStax, Titan, and TitanDB are registered trademark of DataStax, Inc. and its subsidiaries

More information

Dynamo: Amazon s Highly Available Key-Value Store

Dynamo: Amazon s Highly Available Key-Value Store Dynamo: Amazon s Highly Available Key-Value Store DeCandia et al. Amazon.com Presented by Sushil CS 5204 1 Motivation A storage system that attains high availability, performance and durability Decentralized

More information

Web Services and Applications Deployment Guide. Multiple Data Center Deployment

Web Services and Applications Deployment Guide. Multiple Data Center Deployment Web Services and Applications Deployment Guide Multiple Data Center Deployment 1/17/2018 Contents 1 Multiple Data Center Deployment 1.1 Overview 1.2 Architecture 1.3 Incoming traffic distribution 1.4 Configuration

More information

BIG DATA AND CONSISTENCY. Amy Babay

BIG DATA AND CONSISTENCY. Amy Babay BIG DATA AND CONSISTENCY Amy Babay Outline Big Data What is it? How is it used? What problems need to be solved? Replication What are the options? Can we use this to solve Big Data s problems? Putting

More information

CS Amazon Dynamo

CS Amazon Dynamo CS 5450 Amazon Dynamo Amazon s Architecture Dynamo The platform for Amazon's e-commerce services: shopping chart, best seller list, produce catalog, promotional items etc. A highly available, distributed

More information

FAQs Snapshots and locks Vector Clock

FAQs Snapshots and locks Vector Clock //08 CS5 Introduction to Big - FALL 08 W.B.0.0 CS5 Introduction to Big //08 CS5 Introduction to Big - FALL 08 W.B. FAQs Snapshots and locks Vector Clock PART. LARGE SCALE DATA STORAGE SYSTEMS NO SQL DATA

More information

Cassandra Installation and Configuration Guide. Orchestration Server 8.1.4

Cassandra Installation and Configuration Guide. Orchestration Server 8.1.4 Cassandra Installation and Configuration Guide Orchestration Server 8.1.4 12/15/2017 Table of Contents Cassandra 2.2.5 / 3.9 Installation/Configuration Guide 3 Overview 4 Prerequisites 6 Installation 7

More information

5 reasons why choosing Apache Cassandra is planning for a multi-cloud future

5 reasons why choosing Apache Cassandra is planning for a multi-cloud future White Paper 5 reasons why choosing Apache Cassandra is planning for a multi-cloud future Abstract We have been hearing for several years now that multi-cloud deployment is something that is highly desirable,

More information

Making Non-Distributed Databases, Distributed. Ioannis Papapanagiotou, PhD Shailesh Birari

Making Non-Distributed Databases, Distributed. Ioannis Papapanagiotou, PhD Shailesh Birari Making Non-Distributed Databases, Distributed Ioannis Papapanagiotou, PhD Shailesh Birari Dynomite Ecosystem Dynomite - Proxy layer Dyno - Client Dynomite-manager - Ecosystem orchestrator Dynomite-explorer

More information

A Non-Relational Storage Analysis

A Non-Relational Storage Analysis A Non-Relational Storage Analysis Cassandra & Couchbase Alexandre Fonseca, Anh Thu Vu, Peter Grman Cloud Computing - 2nd semester 2012/2013 Universitat Politècnica de Catalunya Microblogging - big data?

More information

Installation and Configuration Guide for Cassandra Message Store Release 8.0.2

Installation and Configuration Guide for Cassandra Message Store Release 8.0.2 [1]Oracle Communications Messaging Server Installation and Configuration Guide for Cassandra Message Store Release 8.0.2 E79615-01 October 2017 Oracle Communications Messaging Server Installation and Configuration

More information

Infrastructures for Cloud Computing and Big Data

Infrastructures for Cloud Computing and Big Data University of Bologna Dipartimento di Informatica Scienza e Ingegneria (DISI) Engineering Bologna Campus Class of Computer Networks M or Infrastructures for Cloud Computing and Big Data Global Data Storage

More information

Deploying Apache Cassandra on Oracle Cloud Infrastructure Quick Start White Paper October 2016 Version 1.0

Deploying Apache Cassandra on Oracle Cloud Infrastructure Quick Start White Paper October 2016 Version 1.0 Deploying Apache Cassandra on Oracle Cloud Infrastructure Quick Start White Paper October 2016 Version 1.0 Disclaimer The following is intended to outline our general product direction. It is intended

More information

Intro Cassandra. Adelaide Big Data Meetup.

Intro Cassandra. Adelaide Big Data Meetup. Intro Cassandra Adelaide Big Data Meetup instaclustr.com @Instaclustr Who am I and what do I do? Alex Lourie Worked at Red Hat, Datastax and now Instaclustr We currently manage x10s nodes for various customers,

More information

Bitnami Cassandra for Huawei Enterprise Cloud

Bitnami Cassandra for Huawei Enterprise Cloud Bitnami Cassandra for Huawei Enterprise Cloud Description Apache Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers,

More information

Scylla Open Source 3.0

Scylla Open Source 3.0 SCYLLADB PRODUCT OVERVIEW Scylla Open Source 3.0 Scylla is an open source NoSQL database that offers the horizontal scale-out and fault-tolerance of Apache Cassandra, but delivers 10X the throughput and

More information

Exploring Cassandra and HBase with BigTable Model

Exploring Cassandra and HBase with BigTable Model Exploring Cassandra and HBase with BigTable Model Hemanth Gokavarapu hemagoka@indiana.edu (Guidance of Prof. Judy Qiu) Department of Computer Science Indiana University Bloomington Abstract Cassandra is

More information

NetApp FAS and Cassandra

NetApp FAS and Cassandra VC Technical Report NetApp FAS and Cassandra Akshay Patil, Karthikeyan Nagalingam July 2016 TR-4527 TABLE OF CONTENTS 1 Introduction... 4 2 Solution Overview... 4 2.1 NetApp FAS... 4 2.2 Snap Creator Framework...

More information

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : PH NO: 9963799240, 040-40025423

More information

API Gateway Version September Key Property Store User Guide

API Gateway Version September Key Property Store User Guide API Gateway Version 7.5.2 15 September 2017 Key Property Store User Guide Copyright 2017 Axway All rights reserved. This documentation describes the following Axway software: Axway API Gateway 7.5.2 No

More information

Replication in Distributed Systems

Replication in Distributed Systems Replication in Distributed Systems Replication Basics Multiple copies of data kept in different nodes A set of replicas holding copies of a data Nodes can be physically very close or distributed all over

More information

ExaminingCassandra Constraints: Pragmatic. Eyes

ExaminingCassandra Constraints: Pragmatic. Eyes International Journal of Management, IT & Engineering Vol. 9 Issue 3, March 2019, ISSN: 2249-0558 Impact Factor: 7.119 Journal Homepage: Double-Blind Peer Reviewed Refereed Open Access International Journal

More information

Analysis, Archival, & Retrieval Guide. StorageX 8.0

Analysis, Archival, & Retrieval Guide. StorageX 8.0 Analysis, Archival, & Retrieval Guide StorageX 8.0 March 2018 Copyright 2018 Data Dynamics, Inc. All Rights Reserved. The trademark Data Dynamics is the property of Data Dynamics, Inc. All other brands,

More information

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Copyright 2013, Oracle and/or its affiliates. All rights reserved. 1 Oracle NoSQL Database: Release 3.0 What s new and why you care Dave Segleau NoSQL Product Manager The following is intended to outline our general product direction. It is intended for information purposes

More information

Distributed PostgreSQL with YugaByte DB

Distributed PostgreSQL with YugaByte DB Distributed PostgreSQL with YugaByte DB Karthik Ranganathan PostgresConf Silicon Valley Oct 16, 2018 1 CHECKOUT THIS REPO: github.com/yugabyte/yb-sql-workshop 2 About Us Founders Kannan Muthukkaruppan,

More information

Distributed File Systems II

Distributed File Systems II Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation

More information

Intuitive distributed algorithms. with F#

Intuitive distributed algorithms. with F# Intuitive distributed algorithms with F# Natallia Dzenisenka Alena Hall @nata_dzen @lenadroid A tour of a variety of intuitivedistributed algorithms used in practical distributed systems. and how to prototype

More information

10. Replication. Motivation

10. Replication. Motivation 10. Replication Page 1 10. Replication Motivation Reliable and high-performance computation on a single instance of a data object is prone to failure. Replicate data to overcome single points of failure

More information

DataStax Enterprise 4.0 In-Memory Option A look at performance, use cases, and anti-patterns. White Paper

DataStax Enterprise 4.0 In-Memory Option A look at performance, use cases, and anti-patterns. White Paper DataStax Enterprise 4.0 In-Memory Option A look at performance, use cases, and anti-patterns White Paper Table of Contents Abstract... 3 Introduction... 3 Performance Implications of In-Memory Tables...

More information

Goal of the presentation is to give an introduction of NoSQL databases, why they are there.

Goal of the presentation is to give an introduction of NoSQL databases, why they are there. 1 Goal of the presentation is to give an introduction of NoSQL databases, why they are there. We want to present "Why?" first to explain the need of something like "NoSQL" and then in "What?" we go in

More information

c 2014 Ala Walid Shafe Alkhaldi

c 2014 Ala Walid Shafe Alkhaldi c 2014 Ala Walid Shafe Alkhaldi LEVERAGING METADATA IN NOSQL STORAGE SYSTEMS BY ALA WALID SHAFE ALKHALDI THESIS Submitted in partial fulfillment of the requirements for the degree of Master of Science

More information

It also performs many parallelization operations like, data loading and query processing.

It also performs many parallelization operations like, data loading and query processing. Introduction to Parallel Databases Companies need to handle huge amount of data with high data transfer rate. The client server and centralized system is not much efficient. The need to improve the efficiency

More information

Cassandra Design Patterns

Cassandra Design Patterns Cassandra Design Patterns Sanjay Sharma Chapter No. 1 "An Overview of Architecture and Data Modeling in Cassandra" In this package, you will find: A Biography of the author of the book A preview chapter

More information

Computer Engineer Programming Electronics Math <3 <3 Physics Lego Meetups Animals Coffee GIFs

Computer Engineer Programming Electronics Math <3 <3 Physics Lego Meetups Animals Coffee GIFs SASI and Secondary Indexes Hi! Computer Engineer Programming Electronics Math

More information

Performance Evaluation of NoSQL Databases

Performance Evaluation of NoSQL Databases Performance Evaluation of NoSQL Databases A Case Study - John Klein, Ian Gorton, Neil Ernst, Patrick Donohoe, Kim Pham, Chrisjan Matser February 2015 PABS '15: Proceedings of the 1st Workshop on Performance

More information

Advanced Databases ( CIS 6930) Fall Instructor: Dr. Markus Schneider. Group 17 Anirudh Sarma Bhaskara Sreeharsha Poluru Ameya Devbhankar

Advanced Databases ( CIS 6930) Fall Instructor: Dr. Markus Schneider. Group 17 Anirudh Sarma Bhaskara Sreeharsha Poluru Ameya Devbhankar Advanced Databases ( CIS 6930) Fall 2016 Instructor: Dr. Markus Schneider Group 17 Anirudh Sarma Bhaskara Sreeharsha Poluru Ameya Devbhankar BEFORE WE BEGIN NOSQL : It is mechanism for storage & retrieval

More information

Cassandra From Her Nephew s POV An presentation about using Cassandra with Astyanax! By Mike Epstein! Principal callfire.

Cassandra From Her Nephew s POV An presentation about using Cassandra with Astyanax! By Mike Epstein! Principal callfire. Cassandra From Her Nephew s POV An presentation about using Cassandra with Astyanax By Mike Epstein Principal Engineer @ callfire.com A little bit about the presenter - I am currently a principal engineer

More information

Migrating Oracle Databases To Cassandra

Migrating Oracle Databases To Cassandra BY UMAIR MANSOOB Why Cassandra Lower Cost of ownership makes it #1 choice for Big Data OLTP Applications. Unlike Oracle, Cassandra can store structured, semi-structured, and unstructured data. Cassandra

More information

BARNS: Backup and Recovery for NoSQL Databases

BARNS: Backup and Recovery for NoSQL Databases BARNS: Backup and Recovery for NoSQL Databases Atish Kathpal, Priya Sehgal Advanced Technology Group, NetApp 1 Why Backup/Restore NoSQL DBs? Customers are directly ingesting into NoSQL Security breach

More information

Eventual Consistency 1

Eventual Consistency 1 Eventual Consistency 1 Readings Werner Vogels ACM Queue paper http://queue.acm.org/detail.cfm?id=1466448 Dynamo paper http://www.allthingsdistributed.com/files/ amazon-dynamo-sosp2007.pdf Apache Cassandra

More information

Getting to know. by Michelle Darling August 2013

Getting to know. by Michelle Darling August 2013 Getting to know by Michelle Darling mdarlingcmt@gmail.com August 2013 Agenda: What is Cassandra? Installation, CQL3 Data Modelling Summary Only 15 min to cover these, so please hold questions til the end,

More information

Tools for Social Networking Infrastructures

Tools for Social Networking Infrastructures Tools for Social Networking Infrastructures 1 Cassandra - a decentralised structured storage system Problem : Facebook Inbox Search hundreds of millions of users distributed infrastructure inbox changes

More information

EECS 498 Introduction to Distributed Systems

EECS 498 Introduction to Distributed Systems EECS 498 Introduction to Distributed Systems Fall 2017 Harsha V. Madhyastha Dynamo Recap Consistent hashing 1-hop DHT enabled by gossip Execution of reads and writes Coordinated by first available successor

More information

Datacenter replication solution with quasardb

Datacenter replication solution with quasardb Datacenter replication solution with quasardb Technical positioning paper April 2017 Release v1.3 www.quasardb.net Contact: sales@quasardb.net Quasardb A datacenter survival guide quasardb INTRODUCTION

More information

Simba ODBC Driver with SQL Connector for Apache Cassandra

Simba ODBC Driver with SQL Connector for Apache Cassandra Simba ODBC Driver with SQL Connector for Apache Cassandra 2.0.16 The release notes provide details of enhancements and features in Simba ODBC Driver with SQL Connector for Apache Cassandra 2.0.16, as well

More information

Background. Distributed Key/Value stores provide a simple put/get interface. Great properties: scalability, availability, reliability

Background. Distributed Key/Value stores provide a simple put/get interface. Great properties: scalability, availability, reliability Background Distributed Key/Value stores provide a simple put/get interface Great properties: scalability, availability, reliability Increasingly popular both within data centers Cassandra Dynamo Voldemort

More information

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo Document Sub Title Yotpo Technical Overview 07/18/2016 2015 Yotpo Contents Introduction... 3 Yotpo Architecture... 4 Yotpo Back Office (or B2B)... 4 Yotpo On-Site Presence... 4 Technologies... 5 Real-Time

More information

VMware Mirage Getting Started Guide

VMware Mirage Getting Started Guide Mirage 5.8 This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition. To check for more recent editions of this document,

More information

CAP Theorem, BASE & DynamoDB

CAP Theorem, BASE & DynamoDB Indian Institute of Science Bangalore, India भ रत य व ज ञ न स स थ न ब गल र, भ रत DS256:Jan18 (3:1) Department of Computational and Data Sciences CAP Theorem, BASE & DynamoDB Yogesh Simmhan Yogesh Simmhan

More information

Why Scale-Out Big Data Apps Need A New Scale- Out Storage

Why Scale-Out Big Data Apps Need A New Scale- Out Storage Why Scale-Out Big Data Apps Need A New Scale- Out Storage Modern storage for modern business Rob Whiteley, VP, Marketing, Hedvig April 9, 2015 Big data pressures on storage infrastructure The rise of elastic

More information

BigTable: A Distributed Storage System for Structured Data

BigTable: A Distributed Storage System for Structured Data BigTable: A Distributed Storage System for Structured Data Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) BigTable 1393/7/26

More information

Dynamo: Amazon s Highly Available Key-value Store

Dynamo: Amazon s Highly Available Key-value Store Dynamo: Amazon s Highly Available Key-value Store Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and

More information

A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA

A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA Hima S 1, Varalakshmi P 2 and Surekha Mariam Varghese 3 Department of Computer Science and Engineering, M.A. College of Engineering, Kothamangalam,

More information

Introduction Data Model API Building Blocks SSTable Implementation Tablet Location Tablet Assingment Tablet Serving Compactions Refinements

Introduction Data Model API Building Blocks SSTable Implementation Tablet Location Tablet Assingment Tablet Serving Compactions Refinements Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber Google, Inc. M. Burak ÖZTÜRK 1 Introduction Data Model API Building

More information

Distributed Data Analytics Partitioning

Distributed Data Analytics Partitioning G-3.1.09, Campus III Hasso Plattner Institut Different mechanisms but usually used together Distributing Data Replication vs. Replication Store copies of the same data on several nodes Introduces redundancy

More information

Massively scalable NoSQL with Apache Cassandra! Jonathan Ellis Project Chair, Apache Cassandra CTO,

Massively scalable NoSQL with Apache Cassandra! Jonathan Ellis Project Chair, Apache Cassandra CTO, Massively scalable NoSQL with Apache Cassandra! Jonathan Ellis Project Chair, Apache Cassandra CTO, DataStax @spyced Cassandra Job Trends Big Data trend Why Big Data Matters Big data Analytics (Hadoop)?

More information

Dynamo: Amazon s Highly Available Key-value Store. ID2210-VT13 Slides by Tallat M. Shafaat

Dynamo: Amazon s Highly Available Key-value Store. ID2210-VT13 Slides by Tallat M. Shafaat Dynamo: Amazon s Highly Available Key-value Store ID2210-VT13 Slides by Tallat M. Shafaat Dynamo An infrastructure to host services Reliability and fault-tolerance at massive scale Availability providing

More information

Trade- Offs in Cloud Storage Architecture. Stefan Tai

Trade- Offs in Cloud Storage Architecture. Stefan Tai Trade- Offs in Cloud Storage Architecture Stefan Tai Cloud computing is about providing and consuming resources as services There are five essential characteristics of cloud services [NIST] [NIST]: http://csrc.nist.gov/groups/sns/cloud-

More information

Discover CephFS TECHNICAL REPORT SPONSORED BY. image vlastas, 123RF.com

Discover CephFS TECHNICAL REPORT SPONSORED BY. image vlastas, 123RF.com Discover CephFS TECHNICAL REPORT SPONSORED BY image vlastas, 123RF.com Discover CephFS TECHNICAL REPORT The CephFS filesystem combines the power of object storage with the simplicity of an ordinary Linux

More information

Compactions in Apache Cassandra

Compactions in Apache Cassandra Thesis no: MSEE-2016-21 Compactions in Apache Cassandra Performance Analysis of Compaction Strategies in Apache Cassandra Srinand Kona Faculty of Computing Blekinge Institute of Technology SE-371 79 Karlskrona

More information

Apache Cassandra - A Decentralized Structured Storage System

Apache Cassandra - A Decentralized Structured Storage System Apache Cassandra - A Decentralized Structured Storage System Avinash Lakshman Prashant Malik from Facebook Presented by: Oded Naor Acknowledgments Some slides are based on material from: Idit Keidar, Topics

More information

Flat Datacenter Storage. Edmund B. Nightingale, Jeremy Elson, et al. 6.S897

Flat Datacenter Storage. Edmund B. Nightingale, Jeremy Elson, et al. 6.S897 Flat Datacenter Storage Edmund B. Nightingale, Jeremy Elson, et al. 6.S897 Motivation Imagine a world with flat data storage Simple, Centralized, and easy to program Unfortunately, datacenter networks

More information

4. Managing Big Data. Cloud Computing & Big Data MASTER ENGINYERIA INFORMÀTICA FIB/UPC. Fall Jordi Torres, UPC - BSC

4. Managing Big Data. Cloud Computing & Big Data MASTER ENGINYERIA INFORMÀTICA FIB/UPC. Fall Jordi Torres, UPC - BSC 4. Managing Big Data Cloud Computing & Big Data MASTER ENGINYERIA INFORMÀTICA FIB/UPC Fall - 2013 Jordi Torres, UPC - BSC www.jorditorres.eu Slides are only for presentation guide We will discuss+debate

More information

Distributed Data with Cassandra NetworkTopologyStrategy

Distributed Data with Cassandra NetworkTopologyStrategy Distributed Data with Cassandra NetworkTopologyStrategy Presented by: Eric Tamme Outline Use case for multi data center Cassandra deployment Use Case: Sourcing and Locality OnSIP and Cassandra Cassandra

More information

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI 2006 Presented by Xiang Gao 2014-11-05 Outline Motivation Data Model APIs Building Blocks Implementation Refinement

More information

Oracle NoSQL Database 3.0

Oracle NoSQL Database 3.0 Oracle NoSQL Database 3.0 Installation, Cluster Topology Deployment, HA and more Seth Miller, Oracle ACE Robert Greene, Product Management / Strategy Oracle Server Technologies July 09, 2014 Safe Harbor

More information

Dynamo: Key-Value Cloud Storage

Dynamo: Key-Value Cloud Storage Dynamo: Key-Value Cloud Storage Brad Karp UCL Computer Science CS M038 / GZ06 22 nd February 2016 Context: P2P vs. Data Center (key, value) Storage Chord and DHash intended for wide-area peer-to-peer systems

More information

Compaction strategies in Apache Cassandra

Compaction strategies in Apache Cassandra Thesis no: MSEE-2016:31 Compaction strategies in Apache Cassandra Analysis of Default Cassandra stress model Venkata Satya Sita J S Ravu Faculty of Computing Blekinge Institute of Technology SE-371 79

More information

NoSQL Databases. Amir H. Payberah. Swedish Institute of Computer Science. April 10, 2014

NoSQL Databases. Amir H. Payberah. Swedish Institute of Computer Science. April 10, 2014 NoSQL Databases Amir H. Payberah Swedish Institute of Computer Science amir@sics.se April 10, 2014 Amir H. Payberah (SICS) NoSQL Databases April 10, 2014 1 / 67 Database and Database Management System

More information

CQL for DataStax Enterprise 5.1 (Previous version)

CQL for DataStax Enterprise 5.1 (Previous version) CQL for DataStax Enterprise 5.1 (Previous version) Updated: 2018-06-11-07:00 2018 DataStax, Inc. All rights reserved. DataStax, Titan, and TitanDB are registered trademark of DataStax, Inc. and its subsidiaries

More information

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 1 Introduction Modified by: Dr. Ramzi Saifan Definition of a Distributed System (1) A distributed

More information

Dynamo. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Motivation System Architecture Evaluation

Dynamo. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Motivation System Architecture Evaluation Dynamo Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/20 Outline Motivation 1 Motivation 2 3 Smruti R. Sarangi Leader

More information

PRESENTATION TITLE GOES HERE. Understanding Architectural Trade-offs in Object Storage Technologies

PRESENTATION TITLE GOES HERE. Understanding Architectural Trade-offs in Object Storage Technologies Object Storage 201 PRESENTATION TITLE GOES HERE Understanding Architectural Trade-offs in Object Storage Technologies SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA

More information

ZHT: Const Eventual Consistency Support For ZHT. Group Member: Shukun Xie Ran Xin

ZHT: Const Eventual Consistency Support For ZHT. Group Member: Shukun Xie Ran Xin ZHT: Const Eventual Consistency Support For ZHT Group Member: Shukun Xie Ran Xin Outline Problem Description Project Overview Solution Maintains Replica List for Each Server Operation without Primary Server

More information