NOSQL DATABASES
A comparison between the MongoDB, Cassandra, and Redis databases
OCTOBER 20, 2015
ANDREW HYTE

Contents

Introduction
MongoDB
  History
  Data Model
  Physical Storage
  Transactions
  Scalability
Cassandra
  History
  Data Model
  Physical Storage
  Transactions
  Scalability
Redis
  History
  Data Model
  Physical Storage
  Transactions
  Scalability
Differences
Conclusions

Introduction

NoSQL databases are acclaimed by many web developers for their partition tolerance. Most NoSQL databases have that in common; however, they differ in how they achieve scalability and in how much availability or consistency they provide. This report examines the history, data models, physical storage, transactional capabilities, and scalability of three different types of NoSQL databases. Pay attention to the differences between the three and to the advantages or disadvantages one affords over another.

MongoDB

MongoDB may be the most well-known of all NoSQL databases. This assumption is supported by the fact that the DB-Engines group has ranked MongoDB as the fourth most popular database management system overall, led in popularity only by three relational databases (Oracle, MySQL, and Microsoft SQL Server) (DB-Engines).

History

MongoDB was originally created by Eliot Horowitz and Dwight Merriman, two developers from DoubleClick (Chodorow). The two left their company, went on to found several other startups, and ran into the same problem over and over: how to scale out an application. What they decided to do next was to create a platform as a service similar to Google App Engine. They were originally going to call the database ED, for Eliot and Dwight. The database was only part of this PaaS package. The system as a whole was not readily adopted, and the project would have been a flop had it not been for the database. People were saying things like, "Well, the database is cool, but blech, app engine." (Chodorow). When the owners recognized what they had, they decided to strip out the database, name it Mongo (from "humongous"), and open source it. The database quickly gained traction and soon had many developers not only using it but also contributing to the project and creating their own versions.
Today there are countless projects which have grown up around MongoDB, such as Casbah, Morphia, MongoMapper, Mongoose, CandyGram, MongoKit, Mongoid, and Ming, to name a few.

Data Model

MongoDB stores data as documents in the Binary JSON (BSON) format instead of the tables and rows used in a traditional relational database (MongoDB). This document data model works well for most modern software applications written in the object-oriented programming paradigm. BSON builds on JSON by adding some extra types and by providing efficient encoding and decoding of data in different languages.

Because documents are lightweight and easy to traverse, MongoDB can support rich queries as well as indexes, including unique indexes. Indexes support the efficient traversal of collections when querying data. If no suitable index is defined, Mongo must scan every document in a collection rather than only the documents the index points to.

One of the most interesting ways Mongo has made it possible to index and query data is through geospatial indexes. There is a built-in 2dsphere index type which allows data to be indexed in relation to a sphere, such as the earth. This makes geospatial queries exceptionally quick.
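The difference between scanning a collection and using an index can be illustrated with a small sketch. It uses plain Python dictionaries as stand-ins for documents and does not call MongoDB at all; the collection contents and the helper names (collection_scan, build_index, indexed_find) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical collection: each document is a dict, much like a JSON/BSON document.
# Documents in the same collection do not need to share the same fields.
users = [
    {"_id": 1, "name": "Ada", "city": "London"},
    {"_id": 2, "name": "Linus", "city": "Helsinki"},
    {"_id": 3, "name": "Grace", "city": "London", "languages": ["COBOL"]},
]


def collection_scan(docs, field, value):
    """Without an index, every document must be inspected."""
    return [doc for doc in docs if doc.get(field) == value]


def build_index(docs, field):
    """Map each value of `field` to the positions of the documents holding it."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        if field in doc:
            index[doc[field]].append(position)
    return index


def indexed_find(docs, index, value):
    """With an index, only the matching documents are touched."""
    return [docs[position] for position in index.get(value, [])]


city_index = build_index(users, "city")
scan_result = collection_scan(users, "city", "London")
index_result = indexed_find(users, city_index, "London")
```

Both lookups return the same documents; the indexed version simply avoids visiting documents that cannot match, which is the benefit the paragraph above describes.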

Physical Storage

Thanks to Mongo's ability to easily shard a database into many sections, the size of the database is not limited by the physical storage of a single machine. One could have a 1 TB database on a single machine or just as easily have GB-sized shards distributed across multiple machines. This report discusses sharding further in the Scalability section.

One topic that belongs under physical storage is replication. Mongo can replicate its data easily, creating multiple servers that hold the same data. If needed, one of these servers can become the primary server. Mongo allows this to happen easily with its implementation of replica set elections. The primary in a set is the only member that can accept write operations. If the primary becomes unavailable, elections make it possible to resume normal operation without the intervention of a DBA.

Figure 1: If a member fails, an election is held to change the primary.

Elections take some time and do not allow writes during the process; for these reasons, Mongo avoids holding elections unless absolutely necessary (MongoDB). The following are some of the factors which drive elections:

- Heartbeats: the replica members send heartbeats (pings) to each other every two seconds. If a heartbeat is not returned within 10 seconds, the delinquent server is marked as unavailable.
- Priority: priorities may be set on a replica member. If the highest-priority member is already the primary, no election is held. Members with a priority of zero cannot become the primary and are not considered in elections.
- Connectivity: a member must be able to connect to a majority of the other members in order to be eligible to become primary. If no member can connect to a majority of the other members in a replica set, no primary is elected.

Several events may trigger an election. An election is held whenever the set has no primary.
This may occur if a new replica set has been initiated, a secondary loses its connection with the primary, or the primary steps down. A primary will step down if it is asked to with an explicit command, if another member has a higher priority and is eligible to be primary, or if the primary loses contact with the majority of the group.
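The election factors described above can be sketched in a few lines of Python. This is a deliberately simplified model and not MongoDB's actual election protocol: it assumes that every reachable member can see every other reachable member, and the Member class, the elect_primary function, and the example timestamps are all hypothetical.

```python
from dataclasses import dataclass

# Seconds without a heartbeat reply before a member is marked unavailable.
HEARTBEAT_TIMEOUT = 10.0


@dataclass
class Member:
    name: str
    priority: int          # 0 means "can never become primary"
    last_heartbeat: float  # time (in seconds) the last heartbeat reply was seen


def available(member, now):
    """A member is available if it answered a heartbeat recently enough."""
    return now - member.last_heartbeat <= HEARTBEAT_TIMEOUT


def elect_primary(members, now):
    """Return the name of the new primary, or None if no member is eligible."""
    reachable = [m for m in members if available(m, now)]
    # A candidate must be able to reach a majority of the whole set.
    if len(reachable) * 2 <= len(members):
        return None
    # Members with priority zero are never considered.
    candidates = [m for m in reachable if m.priority > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.priority).name


healthy = [Member("a", 2, 100.0), Member("b", 1, 100.0), Member("c", 0, 100.0)]
primary_when_healthy = elect_primary(healthy, now=105.0)

# Member "a" stopped answering heartbeats 25 seconds ago.
degraded = [Member("a", 2, 80.0), Member("b", 1, 100.0), Member("c", 0, 100.0)]
primary_after_failure = elect_primary(degraded, now=105.0)
```

In the degraded case the remaining two members still form a majority of three, so the highest-priority eligible member takes over; with only one member left, the function returns None, matching the rule that no primary is elected without a majority.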

Elections are one way MongoDB has made it easy to shard the database onto many different machines and still be able to use replication. A large benefit is that this capability is already integrated. The overhead of MongoDB is very light since many of the needed systems are already built in, such as analytics, text search, geospatial queries, in-memory performance, and global replication (MongoDB).

Transactions

Mongo does not advertise that it can do transactions. It does, however, offer a transaction-like operation in which a series of writes is made conditional on the success of all of the writes. It is called transaction-like because intermediate processes can still return data while the transaction is being committed. This transaction-like process is called a two-phase commit. Two-phase commits allow data to be written to multiple documents and still allow the data to be recovered should an error occur. There are various transaction commands which give the user full control over the order of the data write process.

The most important part of transactions is not the write syntax but what happens when there is an error in the procedure. Mongo defines different states which refer to the steps in the transaction process; most notably there are the Applied and the Pending states. For each state in the transaction process, if an error occurs, there are certain operations which may be used to revert to a previous state, start a transaction over, or even roll back (undo) an applied action.

Even though MongoDB does not have true relational-database-style transactions, the technology does a pretty good job of making up for it. Two-phase commits are a great workaround, and there is plenty of documentation to help anyone who wishes to explore the functionality.

Scalability

Figure 2: Sharding and Replication.
Historically, scaling has mostly happened in a vertical manner, meaning that when an application needed more memory or storage space, the database server itself would be upgraded. Resources such as physical storage, RAM, or graphics cards could be added, replaced, or upgraded. Mongo's approach to scaling, much like that of most other NoSQL databases, is horizontal scaling, or sharding. With Mongo, a user can set up the database to shard automatically.
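Returning to the two-phase commit pattern from the Transactions section, the idea of moving a multi-document write through explicit states can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not MongoDB's API: the accounts and transactions dictionaries stand in for two collections, and the state names other than "pending" and "applied" (here "initial", "done", "canceling", "canceled") are illustrative choices.

```python
# Stand-ins for two collections: account documents and transaction documents.
accounts = {
    "A": {"balance": 1000, "pending": []},
    "B": {"balance": 1000, "pending": []},
}
transactions = {}


def begin(txn_id, source, destination, value):
    """Record the intent to transfer before touching any account."""
    transactions[txn_id] = {
        "source": source, "destination": destination,
        "value": value, "state": "initial",
    }


def apply_transaction(txn_id):
    """Phase one: mark the transaction pending and apply it to both accounts."""
    txn = transactions[txn_id]
    txn["state"] = "pending"
    changes = ((txn["source"], -txn["value"]), (txn["destination"], txn["value"]))
    for name, delta in changes:
        account = accounts[name]
        # The pending list makes each write safe to retry after a failure.
        if txn_id not in account["pending"]:
            account["balance"] += delta
            account["pending"].append(txn_id)
    txn["state"] = "applied"


def finish(txn_id):
    """Phase two: clear the pending markers and close the transaction."""
    txn = transactions[txn_id]
    for name in (txn["source"], txn["destination"]):
        if txn_id in accounts[name]["pending"]:
            accounts[name]["pending"].remove(txn_id)
    txn["state"] = "done"


def rollback(txn_id):
    """Undo a transaction whose writes are still marked as pending."""
    txn = transactions[txn_id]
    txn["state"] = "canceling"
    changes = ((txn["source"], txn["value"]), (txn["destination"], -txn["value"]))
    for name, delta in changes:
        account = accounts[name]
        if txn_id in account["pending"]:
            account["balance"] += delta
            account["pending"].remove(txn_id)
    txn["state"] = "canceled"


begin("t1", "A", "B", 100)
apply_transaction("t1")
rollback("t1")  # both balances return to 1000
```

Because every step records its state, a recovery process can inspect a transaction that stopped partway and decide whether to resume it or roll it back, which is the property the section above attributes to two-phase commits.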

Cassandra

History

Avinash Lakshman invented Cassandra at Facebook. The project was started during one of Facebook's hackathons. The main goal was to help query the massive amounts of data the company was dealing with, particularly in users' inboxes. The project was released to the open source community in July of 2008, and by February 2010 it was an Apache top-level project.

The software is thought to have been named after Cassandra, the prophet of Greek mythology. The myth goes that the princess of Troy was given prophetic powers by Apollo, who wished something in return. When he did not get what he wanted, he cursed her so that no one would ever believe her word again. An astute blogger at Kellabyte.com points out that the creators at Facebook may have put a little more thought into the name than just a cool Greek myth: Cassandra is the cursed Oracle (Kellabyte).

Data Model

The data model is meant to be somewhat familiar to traditional RDBMS users. An instance of Cassandra has one table which is made up of multiple column families as defined by the user. Each column family can contain one of two structures: super columns or columns. There is no limit on the number of these that can be stored in a column family. Columns have a name, a value, and a user-defined timestamp associated with them. The number of columns allowed in a column family is very large. Super columns are a data structure which has a name and an unbounded number of columns associated with it. Overall they exhibit the same characteristics as columns.

Column families are made up of row entries. Each row is made up of columns and has a primary key. The first part of a key is a column name. This is where things become interesting in the way the database is distributed, which is addressed further in the Scalability section of this report. Every row is uniquely identified by a partition key, which is a string and has no limit on its size.
All rows are distributed across the cluster based on the value of the hashed key.

One feature which should be mentioned is CQL, which is very familiar to users of SQL. For example, a create table statement looks like this:

    CREATE TABLE tweets ( tweet_id uuid PRIMARY KEY, author varchar, body varchar );

Or adding a new column to a table:

    ALTER TABLE users ADD birth_date INT;

This may raise the question: how is Cassandra different from a relational database? The key is in the way that it allocates storage for a column. In a traditional RDBMS, each row reserves storage space for every column it is associated with, even if nothing is populated in that column for a particular entry.
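The cost of reserving a slot for every column can be made concrete with a small sketch that contrasts a static layout with one that stores only populated columns. It is an illustration of the storage idea only, not of any database's on-disk format; the column names, the sample rows, and the two helper functions are hypothetical.

```python
# Hypothetical table definition with five declared columns.
COLUMNS = ["user_id", "name", "email", "birth_date", "phone"]

# Two rows that populate only some of the declared columns.
rows = [
    {"user_id": 1, "name": "Ada"},
    {"user_id": 2, "name": "Linus", "email": "linus@example.com"},
]


def static_layout(row):
    """Reserve one slot per declared column, using None for missing values."""
    return [row.get(column) for column in COLUMNS]


def sparse_layout(row):
    """Keep only the (column, value) pairs that are actually populated."""
    return [(column, row[column]) for column in COLUMNS if column in row]


# Count how many cells each layout has to store for the same data.
static_slots = sum(len(static_layout(row)) for row in rows)
sparse_slots = sum(len(sparse_layout(row)) for row in rows)
```

With five declared columns and two rows, the static layout stores ten cells while the sparse layout stores five; the gap widens as the number of rarely used columns grows.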

Figure 3: In a static-column storage engine, each row must reserve space for every column.

Figure 4: In a sparse-column engine, space is only used by columns present in each row.

In Cassandra a row is sparse, meaning only columns which have data are stored. In this way Cassandra affords its users the flexibility normally associated with a schemaless system like MongoDB, while also providing the benefits of a defined schema that an RDBMS typically has. This also means that Cassandra can easily support thousands of columns per table without wasting space if each row only needs a few of them (Ellis).

Physical Storage

Nodes make up the basic infrastructure of Cassandra. A data center is a collection of nodes. These data centers can be either physical or virtual. A cluster contains one or more data centers and may be distributed over several physical locations. Cassandra is designed to handle big data workloads across multiple nodes with no single point of failure. One of the biggest advantages of Cassandra is that servers do not depend on each other to a degree that would cause multiple failures if one server lost its connection with another.

Creator Avinash Lakshman described the problem that led them to come up with Cassandra as a fragile system which had too many points of failure. Facebook had a lot of data just sitting around on a lot of servers, which created a sort of house-of-cards effect. When one server went down, it caused big issues for the system as a whole. With Cassandra, the data can be distributed across many systems in such a way that one server's failure, which inevitably happens, has only the smallest impact on the entire application.

Transactions

Cassandra does not use ACID transactions with rollback mechanisms, but instead offers atomic, isolated, and durable transactions with eventual consistency. Cassandra's transactions allow the user to decide how strong or eventual they would like the consistency of the transaction to be.
Atomicity means that everything in a transaction succeeds or else the entire transaction is rolled back. Isolation means that transactions cannot interfere with each other, and durability means that completed transactions persist in the event of crashes or failure. Lightweight transactions can be used in INSERT and UPDATE statements using the IF clause in CQL. For example:

    INSERT INTO USERS (login, email, name, login_count) values ('jbellis', 'jbellis@datastax.com', 'Jonathan Ellis', 1) IF NOT EXISTS

Or

    UPDATE users SET reset_token = null, password = 'newpassword' WHERE login = 'jbellis' IF reset_token = 'some-generated-reset-token'

In these cases the preceding commands will only take effect if the IF condition is met.

Scalability

Cassandra is designed to handle large amounts of data across multiple nodes with no single point of failure. The architecture of Cassandra takes into account that system failures can and will happen. To remedy this problem, the system employs a peer-to-peer distributed design. According to a post by the creator of the database on Facebook in August 2008 (close to when the technology was first developed), Facebook was using Cassandra for its Inbox search system and had scaled to a cluster of 600+ cores and 120+ TB of disk space. In this same post Avinash Lakshman says, "Reliability at massive scale is a very big challenge. Outages in the service can have significant negative impact. Hence Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different datacenters). At this scale, small and large components fail continuously; the way Cassandra manages the persistent state in the face of these failures drives the reliability and scalability of the software systems relying on this service."

Data is replicated across systems, and eventual consistency is the mantra of the system. This is because a user may write data to any one of the nodes in the cluster, and the changes are then eventually pushed out to the rest of the nodes via peer-to-peer communication between the nodes. Consistency is one of the tradeoffs Cassandra makes in order to achieve high availability and partition tolerance. An advantage which stems from this tradeoff, however, is that the system has great incremental scalability properties: scaling can be as easy as dropping in a new node and having it automatically initialized with data.
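The "write anywhere, propagate later" behavior described above can be sketched with a toy model. This is not Cassandra's replication protocol: it assumes a last-write-wins rule based on the per-value timestamps mentioned in the Data Model section, and the Node class, the sync function, and the key names are hypothetical.

```python
class Node:
    """A toy replica that stores key -> (timestamp, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, timestamp):
        """Accept a write only if it is newer than what the node already holds."""
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(source, target):
    """Push every entry from source to target; newer timestamps win."""
    for key, (timestamp, value) in list(source.data.items()):
        target.write(key, value, timestamp)


nodes = [Node("n1"), Node("n2"), Node("n3")]

# A client writes to a single node; the other replicas are briefly stale.
nodes[0].write("inbox:42", "hello", timestamp=1)
stale_read = nodes[2].read("inbox:42")

# One round of pairwise exchange stands in for peer-to-peer propagation.
for source in nodes:
    for target in nodes:
        if source is not target:
            sync(source, target)

converged_reads = [node.read("inbox:42") for node in nodes]
```

Before the exchange, a read from the third node returns nothing; afterwards every node returns the same value. That window of staleness is the consistency being traded for availability.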

Redis

History

Redis (REmote DIctionary Server) is a key-value database which was originally developed by an Italian software engineer named Salvatore Sanfilippo (Russo). While working at a company he had started, Sanfilippo developed an application which would allow a developer to see, in real time, who was accessing his site and what actions they were taking. This application was called LLOOGG. With the rapid rate and the large amount of data coming in to the application, there was no way his original implementation using MySQL could keep up and scale according to needs. So in early 2009, Sanfilippo started working on Redis to take care of the scalability needs. By June 2009, Redis was released as the production database for LLOOGG.

After this initial release, Redis became a hit in the NoSQL community. Sanfilippo added features quickly and was always helping resolve database corruption bugs and other Redis-related issues. In March of 2010, Sanfilippo was hired by VMware to work full time on Redis, even though it was open source under the BSD license.

Data Model

The data model used in Redis is very familiar to computer scientists. A programmer will use strings, lists, sets, sorted sets, and hashes on a regular basis. These are all types of data that can be stored in a Redis database. Regardless of the data being stored, it is always identified by a key, and that key is always a string.

Physical Storage

Redis can perform 100,000+ SETs and 80,000+ GETs per second because it requires the entire dataset to be loaded in memory at all times (Russo). This may be argued to be one of the main disadvantages of using Redis, because the amount of RAM needed is proportional to the size of the data set. RAM is very fast, yet very expensive.

Replication is available in Redis through a master and slave topology. There is one master which may have any number of slaves. Each slave can in turn have as many slaves of its own as desired.
This allows for many different server configurations and a good deal of customization. When a slave is initialized, it subscribes as a slave of another member of the topology. After the initialization process, the slave is given a snapshot of the current master and is then notified of all commands the master receives after that snapshot was initiated.

Data persistence is achieved in various ways. If data durability is not of great importance, the snapshot technique is recommended. This involves a snapshot of the entire data set being taken every x seconds and written to disk. This operation has been optimized to use at most 2x the memory needed for the entire set. If durability is desired, the append-only file method is suggested. This method logs each write to a file on disk; after a server failure and restart, Redis simply replays the entire file into active memory again. The synchronization to disk may be set up to be carried out with every command, every second, or whenever the OS decides to sync.
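The append-only file idea can be sketched as a tiny in-memory store that logs every write and rebuilds itself by replaying the log. This is an illustration of the technique only and does not reproduce Redis's actual file format: the TinyStore class, the tab-separated log lines, and the file name are hypothetical, and the sketch assumes keys and values contain no tabs or newlines.

```python
import os
import tempfile


class TinyStore:
    """In-memory dict whose writes are logged to an append-only file."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self._replay()

    def _replay(self):
        """Rebuild the in-memory state by re-running every logged command."""
        if not os.path.exists(self.path):
            return
        with open(self.path, "r", encoding="utf-8") as log:
            for line in log:
                command, key, *rest = line.rstrip("\n").split("\t")
                if command == "SET":
                    self.data[key] = rest[0]
                elif command == "DEL":
                    self.data.pop(key, None)

    def _append(self, *fields):
        with open(self.path, "a", encoding="utf-8") as log:
            log.write("\t".join(fields) + "\n")
            log.flush()
            os.fsync(log.fileno())  # the "sync on every command" policy

    def set(self, key, value):
        self._append("SET", key, value)
        self.data[key] = value

    def delete(self, key):
        self._append("DEL", key)
        self.data.pop(key, None)


log_path = os.path.join(tempfile.mkdtemp(), "appendonly.log")
store = TinyStore(log_path)
store.set("user:1", "Ada")
store.set("user:2", "Linus")
store.delete("user:2")

# Simulate a restart: a fresh instance rebuilds its state from the log.
restarted = TinyStore(log_path)
```

Calling fsync after every command gives the strongest durability at the highest cost; syncing once per second or leaving it to the OS would correspond to the other two policies mentioned above.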

Transactions

MULTI, EXEC, DISCARD, and WATCH are the commands most often associated with transactions in Redis. A user begins a transaction with MULTI. Instead of executing the commands that follow, Redis queues them. All queued commands are then run once EXEC is called. A user may instead call DISCARD, which flushes the transaction queue and exits the transaction. If an error occurs during the execution of a transaction, however, it does not stop the execution of the other commands in the transaction. In order to maintain the speed of commands, Redis has no rollback capability.

Scalability

Since data is stored as key-value pairs, Redis makes it very easy to partition the data set and distribute it over multiple computers. Because Redis is an in-memory database, the overall possible size of a Redis instance depends on how much RAM is available. Partitioning and distributing across multiple computers adds more resources and therefore increases the overall capacity to the total amount of RAM in the cluster.

Redis is architected in a way that allows multiple choices of partitioning strategy. One of the more useful strategies is called hash partitioning. The key name is hashed according to some hash function the user defines. The hash value is then taken modulo the number of computers in the Redis cluster. The resulting number tells the program which computer the key-value pair should be stored on.

Figure 5: Redis is easily partitioned thanks to the key-value data model.

As an example of how scalable Redis is, we can observe what Twitter has done with it. In 2014, according to highscalability.com, the timeline feature of Twitter alone was using around 40 TB of RAM. The Redis deployment behind the timeline feature served over 30 million queries per second across more than 6000 instances (Hoff).
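The hash partitioning steps described above (hash the key name, take the result modulo the number of machines) can be sketched as follows. The sketch uses plain dictionaries as stand-ins for Redis instances rather than a Redis client, and CRC32 is an assumed choice of hash function; the function names and key pattern are hypothetical.

```python
import zlib


def node_for_key(key, num_nodes):
    """Hash the key name, then take the result modulo the number of nodes."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


# Three dictionaries stand in for three Redis instances.
nodes = [dict() for _ in range(3)]


def partitioned_set(key, value):
    """Store the pair on whichever node the key hashes to."""
    nodes[node_for_key(key, len(nodes))][key] = value


def partitioned_get(key):
    """Look the key up on the same node the hash selects."""
    return nodes[node_for_key(key, len(nodes))].get(key)


for i in range(100):
    partitioned_set(f"user:{i}", f"value-{i}")

lookup = partitioned_get("user:7")
```

One design consequence worth noting: with plain modulo partitioning, changing the number of nodes changes the result for most keys, so adding a machine generally means moving a large share of the data.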

Differences

The main differences between the three databases compared in this report are in their data models and in the amount of consistency versus availability they afford.

MongoDB is an example of a document database, where the main advantage is the flexibility of the data schema. As long as the data can be represented as a JSON object, it can be stored in the database. The strong sharding and redundancy of Mongo allow for pretty good consistency while maintaining scalability.

Cassandra is an example of a table-oriented database and is best used in situations where the stakes are high and data must always be available. The peer-to-peer architecture between nodes allows for high availability and ensures that there is no single point of failure.

Redis is built for speed. With great speed come great resource demands, since this key-value store requires that the entire data set be loaded in RAM at all times. The nature of a key-value database allows for easy partitioning, and Redis generally values consistency over availability.

Conclusions

One of the greatest reasons to choose a specific technology may be the amount of documentation available for it. Not only does this help with learning, but it can also be a great advantage in using the technology in the most effective way possible. Although all three technologies considered in this report have pretty good documentation on their respective websites, at this point in time MongoDB has a humongous advantage in community documentation over the other two. All three technologies have their own scaling options and handle scaling relatively well, so a technology decision for myself would depend on the use case. I would choose:

Redis, as a very interesting option for a system which is projected to grow quickly and needs to make exceptionally fast reads and writes.

Mongo, for a quick project which may not be completely thought through, because of the flexibility it affords.
The simplicity of the BSON format would be useful for the same reason. I realize that this way of thinking may cause problems down the road, but who are we kidding: as developers, sometimes we just want to quickly throw something together.

Cassandra, for extremely large data sets which may have lots of different connections and lots of reads and writes happening at any given time. Cassandra is also useful in situations where I need to be able to run complex queries on the data and get my results quickly. I would not use Cassandra in applications such as banking software where consistency is paramount, since the system runs on the mantra that eventual consistency is good enough.

References

Chodorow, Kristina.
DB-Engines. n.d.
Ellis, Jonathan.
Hoff, Todd.
Kellabyte.
MongoDB. n.d.
MongoDB. n.d.
MongoDB. n.d.
Russo, Michael.


More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff and Shun Tak Leung Google* Shivesh Kumar Sharma fl4164@wayne.edu Fall 2015 004395771 Overview Google file system is a scalable distributed file system

More information

MongoDB Schema Design

MongoDB Schema Design MongoDB Schema Design Demystifying document structures in MongoDB Jon Tobin @jontobs MongoDB Overview NoSQL Document Oriented DB Dynamic Schema HA/Sharding Built In Simple async replication setup Automated

More information

A Non-Relational Storage Analysis

A Non-Relational Storage Analysis A Non-Relational Storage Analysis Cassandra & Couchbase Alexandre Fonseca, Anh Thu Vu, Peter Grman Cloud Computing - 2nd semester 2012/2013 Universitat Politècnica de Catalunya Microblogging - big data?

More information

MongoDB - a No SQL Database What you need to know as an Oracle DBA

MongoDB - a No SQL Database What you need to know as an Oracle DBA MongoDB - a No SQL Database What you need to know as an Oracle DBA David Burnham Aims of this Presentation To introduce NoSQL database technology specifically using MongoDB as an example To enable the

More information

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL Building High Performance Apps using NoSQL Swami Sivasubramanian General Manager, AWS NoSQL Building high performance apps There is a lot to building high performance apps Scalability Performance at high

More information

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems Jargons, Concepts, Scope and Systems Key Value Stores, Document Stores, Extensible Record Stores Overview of different scalable relational systems Examples of different Data stores Predictions, Comparisons

More information

GFS: The Google File System

GFS: The Google File System GFS: The Google File System Brad Karp UCL Computer Science CS GZ03 / M030 24 th October 2014 Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one

More information

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein. Copyright 2003 Philip A. Bernstein. Outline

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein. Copyright 2003 Philip A. Bernstein. Outline 10. Replication CSEP 545 Transaction Processing Philip A. Bernstein Copyright 2003 Philip A. Bernstein 1 Outline 1. Introduction 2. Primary-Copy Replication 3. Multi-Master Replication 4. Other Approaches

More information

GFS Overview. Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures

GFS Overview. Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures GFS Overview Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures Interface: non-posix New op: record appends (atomicity matters,

More information

Datacenter replication solution with quasardb

Datacenter replication solution with quasardb Datacenter replication solution with quasardb Technical positioning paper April 2017 Release v1.3 www.quasardb.net Contact: sales@quasardb.net Quasardb A datacenter survival guide quasardb INTRODUCTION

More information

RethinkDB. Niharika Vithala, Deepan Sekar, Aidan Pace, and Chang Xu

RethinkDB. Niharika Vithala, Deepan Sekar, Aidan Pace, and Chang Xu RethinkDB Niharika Vithala, Deepan Sekar, Aidan Pace, and Chang Xu Content Introduction System Features Data Model ReQL Applications Introduction Niharika Vithala What is a NoSQL Database Databases that

More information

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?

More information

Introduction to NoSQL

Introduction to NoSQL Introduction to NoSQL Agenda History What is NoSQL Types of NoSQL The CAP theorem History - RDBMS Relational DataBase Management Systems were invented in the 1970s. E. F. Codd, "Relational Model of Data

More information

Switching to Innodb from MyISAM. Matt Yonkovit Percona

Switching to Innodb from MyISAM. Matt Yonkovit Percona Switching to Innodb from MyISAM Matt Yonkovit Percona -2- DIAMOND SPONSORSHIPS THANK YOU TO OUR DIAMOND SPONSORS www.percona.com -3- Who We Are Who I am Matt Yonkovit Principal Architect Veteran of MySQL/SUN/Percona

More information

NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY

NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY WHAT IS NOSQL? Stands for No-SQL or Not Only SQL. Class of non-relational data storage systems E.g.

More information

How you can benefit from using. javier

How you can benefit from using. javier How you can benefit from using I was Lois Lane redis has super powers myth: the bottleneck redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop,mset -P 16 -q On my laptop: SET: 513610 requests

More information

Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila Prakash, Dhanusha Varik

Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila Prakash, Dhanusha Varik Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila Prakash, Dhanusha Varik mongodb (humongous) Introduction What is MongoDB? Why MongoDB? MongoDB Terminology Why Not MongoDB? What is MongoDB? DOCUMENT STORE

More information

CA485 Ray Walshe NoSQL

CA485 Ray Walshe NoSQL NoSQL BASE vs ACID Summary Traditional relational database management systems (RDBMS) do not scale because they adhere to ACID. A strong movement within cloud computing is to utilize non-traditional data

More information

Kim Greene - Introduction

Kim Greene - Introduction Kim Greene kim@kimgreene.com 507-216-5632 Skype/Twitter: iseriesdomino Copyright Kim Greene Consulting, Inc. All rights reserved worldwide. 1 Kim Greene - Introduction Owner of an IT consulting company

More information

Architekturen für die Cloud

Architekturen für die Cloud Architekturen für die Cloud Eberhard Wolff Architecture & Technology Manager adesso AG 08.06.11 What is Cloud? National Institute for Standards and Technology (NIST) Definition On-demand self-service >

More information

Course Content MongoDB

Course Content MongoDB Course Content MongoDB 1. Course introduction and mongodb Essentials (basics) 2. Introduction to NoSQL databases What is NoSQL? Why NoSQL? Difference Between RDBMS and NoSQL Databases Benefits of NoSQL

More information

Database Availability and Integrity in NoSQL. Fahri Firdausillah [M ]

Database Availability and Integrity in NoSQL. Fahri Firdausillah [M ] Database Availability and Integrity in NoSQL Fahri Firdausillah [M031010012] What is NoSQL Stands for Not Only SQL Mostly addressing some of the points: nonrelational, distributed, horizontal scalable,

More information

Documentation Accessibility. Access to Oracle Support

Documentation Accessibility. Access to Oracle Support Oracle NoSQL Database Availability and Failover Release 18.3 E88250-04 October 2018 Documentation Accessibility For information about Oracle's commitment to accessibility, visit the Oracle Accessibility

More information

An Introduction to Big Data Formats

An Introduction to Big Data Formats Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION

More information

MySQL High Availability. Michael Messina Senior Managing Consultant, Rolta-AdvizeX /

MySQL High Availability. Michael Messina Senior Managing Consultant, Rolta-AdvizeX / MySQL High Availability Michael Messina Senior Managing Consultant, Rolta-AdvizeX mmessina@advizex.com / mike.messina@rolta.com Introduction Michael Messina Senior Managing Consultant Rolta-AdvizeX, Working

More information

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Copyright 2013, Oracle and/or its affiliates. All rights reserved. 1 Oracle NoSQL Database: Release 3.0 What s new and why you care Dave Segleau NoSQL Product Manager The following is intended to outline our general product direction. It is intended for information purposes

More information

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval New Oracle NoSQL Database APIs that Speed Insertion and Retrieval O R A C L E W H I T E P A P E R F E B R U A R Y 2 0 1 6 1 NEW ORACLE NoSQL DATABASE APIs that SPEED INSERTION AND RETRIEVAL Introduction

More information

DATABASE SYSTEMS. Database programming in a web environment. Database System Course, 2016

DATABASE SYSTEMS. Database programming in a web environment. Database System Course, 2016 DATABASE SYSTEMS Database programming in a web environment Database System Course, 2016 AGENDA FOR TODAY Advanced Mysql More than just SELECT Creating tables MySQL optimizations: Storage engines, indexing.

More information

Cassandra Couldn't Detect Any Schema

Cassandra Couldn't Detect Any Schema Cassandra Couldn't Detect Any Schema Definitions In Local Storage Follow-up: The incorrect schema propagated to other servers. Couldn't detect any schema definitions in local storage - after handling schema

More information

Scalability of web applications

Scalability of web applications Scalability of web applications CSCI 470: Web Science Keith Vertanen Copyright 2014 Scalability questions Overview What's important in order to build scalable web sites? High availability vs. load balancing

More information

4 Myths about in-memory databases busted

4 Myths about in-memory databases busted 4 Myths about in-memory databases busted Yiftach Shoolman Co-Founder & CTO @ Redis Labs @yiftachsh, @redislabsinc Background - Redis Created by Salvatore Sanfilippo (@antirez) OSS, in-memory NoSQL k/v

More information

Performance Comparison of NOSQL Database Cassandra and SQL Server for Large Databases

Performance Comparison of NOSQL Database Cassandra and SQL Server for Large Databases Performance Comparison of NOSQL Database Cassandra and SQL Server for Large Databases Khalid Mahmood Shaheed Zulfiqar Ali Bhutto Institute of Science and Technology, Karachi Pakistan khalidmdar@yahoo.com

More information

/ Cloud Computing. Recitation 7 October 10, 2017

/ Cloud Computing. Recitation 7 October 10, 2017 15-319 / 15-619 Cloud Computing Recitation 7 October 10, 2017 Overview Last week s reflection Project 3.1 OLI Unit 3 - Module 10, 11, 12 Quiz 5 This week s schedule OLI Unit 3 - Module 13 Quiz 6 Project

More information

Cassandra Database Security

Cassandra Database Security Cassandra Database Security Author: Mohit Bagria NoSQL Database A NoSQL database (sometimes called as Not Only SQL) is a database that provides a mechanism to store and retrieve data other than the tabular

More information

relational Relational to Riak Why Move From Relational to Riak? Introduction High Availability Riak At-a-Glance

relational Relational to Riak Why Move From Relational to Riak? Introduction High Availability Riak At-a-Glance WHITEPAPER Relational to Riak relational Introduction This whitepaper looks at why companies choose Riak over a relational database. We focus specifically on availability, scalability, and the / data model.

More information

DATABASE DESIGN II - 1DL400

DATABASE DESIGN II - 1DL400 DATABASE DESIGN II - 1DL400 Fall 2016 A second course in database systems http://www.it.uu.se/research/group/udbl/kurser/dbii_ht16 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

There And Back Again

There And Back Again There And Back Again Databases At Uber Evan Klitzke October 4, 2016 Outline Background MySQL To Postgres Connection Scalability Write Amplification/Replication Miscellaneous Other Things Databases at Uber

More information

The Google File System

The Google File System October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single

More information

Massively scalable NoSQL with Apache Cassandra! Jonathan Ellis Project Chair, Apache Cassandra CTO,

Massively scalable NoSQL with Apache Cassandra! Jonathan Ellis Project Chair, Apache Cassandra CTO, Massively scalable NoSQL with Apache Cassandra! Jonathan Ellis Project Chair, Apache Cassandra CTO, DataStax @spyced Cassandra Job Trends Big Data trend Why Big Data Matters Big data Analytics (Hadoop)?

More information

Final Exam Review 2. Kathleen Durant CS 3200 Northeastern University Lecture 23

Final Exam Review 2. Kathleen Durant CS 3200 Northeastern University Lecture 23 Final Exam Review 2 Kathleen Durant CS 3200 Northeastern University Lecture 23 QUERY EVALUATION PLAN Representation of a SQL Command SELECT {DISTINCT} FROM {WHERE

More information

Database Evolution. DB NoSQL Linked Open Data. L. Vigliano

Database Evolution. DB NoSQL Linked Open Data. L. Vigliano Database Evolution DB NoSQL Linked Open Data Requirements and features Large volumes of data..increasing No regular data structure to manage Relatively homogeneous elements among them (no correlation between

More information

When, Where & Why to Use NoSQL?

When, Where & Why to Use NoSQL? When, Where & Why to Use NoSQL? 1 Big data is becoming a big challenge for enterprises. Many organizations have built environments for transactional data with Relational Database Management Systems (RDBMS),

More information

SCALABLE DATABASES. Sergio Bossa. From Relational Databases To Polyglot Persistence.

SCALABLE DATABASES. Sergio Bossa. From Relational Databases To Polyglot Persistence. SCALABLE DATABASES From Relational Databases To Polyglot Persistence Sergio Bossa sergio.bossa@gmail.com http://twitter.com/sbtourist About Me Software architect and engineer Gioco Digitale (online gambling

More information

Upgrading Databases. without losing your data, your performance or your mind. Charity

Upgrading Databases. without losing your data, your performance or your mind. Charity Upgrading Databases without losing your data, your performance or your mind Charity Majors @mipsytipsy Upgrading Databases without losing your data, your performance or your mind Charity Majors @mipsytipsy

More information

Scaling for Humongous amounts of data with MongoDB

Scaling for Humongous amounts of data with MongoDB Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director, EMEA alvin@10gen.com @jonnyeight alvinonmongodb.com From here... http://bit.ly/ot71m4 ...to here... http://bit.ly/oxcsis

More information

Oracle NoSQL Database at OOW 2017

Oracle NoSQL Database at OOW 2017 Oracle NoSQL Database at OOW 2017 CON6544 Oracle NoSQL Database Cloud Service Monday 3:15 PM, Moscone West 3008 CON6543 Oracle NoSQL Database Introduction Tuesday, 3:45 PM, Moscone West 3008 CON6545 Oracle

More information

How do we build TiDB. a Distributed, Consistent, Scalable, SQL Database

How do we build TiDB. a Distributed, Consistent, Scalable, SQL Database How do we build TiDB a Distributed, Consistent, Scalable, SQL Database About me LiuQi ( 刘奇 ) JD / WandouLabs / PingCAP Co-founder / CEO of PingCAP Open-source hacker / Infrastructure software engineer

More information

Getting to know. by Michelle Darling August 2013

Getting to know. by Michelle Darling August 2013 Getting to know by Michelle Darling mdarlingcmt@gmail.com August 2013 Agenda: What is Cassandra? Installation, CQL3 Data Modelling Summary Only 15 min to cover these, so please hold questions til the end,

More information

Database Solution in Cloud Computing

Database Solution in Cloud Computing Database Solution in Cloud Computing CERC liji@cnic.cn Outline Cloud Computing Database Solution Our Experiences in Database Cloud Computing SaaS Software as a Service PaaS Platform as a Service IaaS Infrastructure

More information

MongoDB. copyright 2011 Trainologic LTD

MongoDB. copyright 2011 Trainologic LTD MongoDB MongoDB MongoDB is a document-based open-source DB. Developed and supported by 10gen. MongoDB is written in C++. The name originated from the word: humongous. Is used in production at: Disney,

More information

Shen PingCAP 2017

Shen PingCAP 2017 Shen Li @ PingCAP About me Shen Li ( 申砾 ) Tech Lead of TiDB, VP of Engineering Netease / 360 / PingCAP Infrastructure software engineer WHY DO WE NEED A NEW DATABASE? Brief History Standalone RDBMS NoSQL

More information

Introduction to NoSQL Databases

Introduction to NoSQL Databases Introduction to NoSQL Databases Roman Kern KTI, TU Graz 2017-10-16 Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 1 / 31 Introduction Intro Why NoSQL? Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 2 / 31 Introduction

More information

8/24/2017 Week 1-B Instructor: Sangmi Lee Pallickara

8/24/2017 Week 1-B Instructor: Sangmi Lee Pallickara Week 1-B-0 Week 1-B-1 CS535 BIG DATA FAQs Slides are available on the course web Wait list Term project topics PART 0. INTRODUCTION 2. DATA PROCESSING PARADIGMS FOR BIG DATA Sangmi Lee Pallickara Computer

More information

Introduction to Database Services

Introduction to Database Services Introduction to Database Services Shaun Pearce AWS Solutions Architect 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Today s agenda Why managed database services? A non-relational

More information

Blizzard: A Distributed Queue

Blizzard: A Distributed Queue Blizzard: A Distributed Queue Amit Levy (levya@cs), Daniel Suskin (dsuskin@u), Josh Goodwin (dravir@cs) December 14th 2009 CSE 551 Project Report 1 Motivation Distributed systems have received much attention

More information

Map-Reduce. Marco Mura 2010 March, 31th

Map-Reduce. Marco Mura 2010 March, 31th Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of

More information

10. Replication. Motivation

10. Replication. Motivation 10. Replication Page 1 10. Replication Motivation Reliable and high-performance computation on a single instance of a data object is prone to failure. Replicate data to overcome single points of failure

More information

Transactions and ACID

Transactions and ACID Transactions and ACID Kevin Swingler Contents Recap of ACID transactions in RDBMSs Transactions and ACID in MongoDB 1 Concurrency Databases are almost always accessed by multiple users concurrently A user

More information

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568 FLAT DATACENTER STORAGE Paper-3 Presenter-Pratik Bhatt fx6568 FDS Main discussion points A cluster storage system Stores giant "blobs" - 128-bit ID, multi-megabyte content Clients and servers connected

More information

Introduction to MySQL Cluster: Architecture and Use

Introduction to MySQL Cluster: Architecture and Use Introduction to MySQL Cluster: Architecture and Use Arjen Lentz, MySQL AB (arjen@mysql.com) (Based on an original paper by Stewart Smith, MySQL AB) An overview of the MySQL Cluster architecture, what's

More information

MongoDB. David Murphy MongoDB Practice Manager, Percona

MongoDB. David Murphy MongoDB Practice Manager, Percona MongoDB Click Replication to edit Master and Sharding title style David Murphy MongoDB Practice Manager, Percona Who is this Person and What Does He Know? Former MongoDB Master Former Lead DBA for ObjectRocket,

More information

Cassandra - A Decentralized Structured Storage System. Avinash Lakshman and Prashant Malik Facebook

Cassandra - A Decentralized Structured Storage System. Avinash Lakshman and Prashant Malik Facebook Cassandra - A Decentralized Structured Storage System Avinash Lakshman and Prashant Malik Facebook Agenda Outline Data Model System Architecture Implementation Experiments Outline Extension of Bigtable

More information

Distributed Data Management Replication

Distributed Data Management Replication Felix Naumann F-2.03/F-2.04, Campus II Hasso Plattner Institut Distributing Data Motivation Scalability (Elasticity) If data volume, processing, or access exhausts one machine, you might want to spread

More information

Topics. File Buffer Cache for Performance. What to Cache? COS 318: Operating Systems. File Performance and Reliability

Topics. File Buffer Cache for Performance. What to Cache? COS 318: Operating Systems. File Performance and Reliability Topics COS 318: Operating Systems File Performance and Reliability File buffer cache Disk failure and recovery tools Consistent updates Transactions and logging 2 File Buffer Cache for Performance What

More information

Tools for Social Networking Infrastructures

Tools for Social Networking Infrastructures Tools for Social Networking Infrastructures 1 Cassandra - a decentralised structured storage system Problem : Facebook Inbox Search hundreds of millions of users distributed infrastructure inbox changes

More information

Cassandra- A Distributed Database

Cassandra- A Distributed Database Cassandra- A Distributed Database Tulika Gupta Department of Information Technology Poornima Institute of Engineering and Technology Jaipur, Rajasthan, India Abstract- A relational database is a traditional

More information

Distributed File Systems II

Distributed File Systems II Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 10: Mutable State (1/2) March 14, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These

More information

TWOO.COM CASE STUDY CUSTOMER SUCCESS STORY

TWOO.COM CASE STUDY CUSTOMER SUCCESS STORY TWOO.COM CUSTOMER SUCCESS STORY With over 30 million users, Twoo.com is Europe s leading social discovery site. Twoo runs the world s largest scale-out SQL deployment, with 4.4 billion transactions a day

More information