NoSQL Databases: MongoDB vs Cassandra
Kenny Huynh, Andre Chik, Kevin Vu
Introduction
- Relational database model
  - Concept developed in 1970
  - Can become inefficient as applications grow in scale and complexity
- NoSQL
  - Term first used in 1998
  - Influenced by Google's Bigtable and Amazon's Dynamo
Purpose
- Experiment
  - Use a single, low-capacity server to run a mix of read/update queries and compare performance
- Compare MongoDB and Cassandra
  - Differences between the two models
- Benchmarking
  - Performance
  - Understand how execution time varies with database size
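NoSQL Databases
- Follow the BASE principle (Basically Available, Soft state, Eventually consistent) rather than ACID (Atomicity, Consistency, Isolation, Durability)
- Types
  - Key-value store
  - Document store
  - Column-family
  - Graph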
Types of NoSQL Databases
- Key-value store: Azure Table Storage, Redis
- Document store: MongoDB, CouchDB
- Column-family: Cassandra, Accumulo
- Graph: Neo4j, InfiniteGraph
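To make the four data models concrete, here is a minimal Python sketch that stores one hypothetical user record in each shape. The record, its keys and the `followers_of` helper are invented for illustration; real systems such as Redis, MongoDB, Cassandra or Neo4j add their own encodings and query languages on top of these shapes.

```python
# One hypothetical "user" record expressed in the four NoSQL data models.
# Plain Python structures only; no database client is involved.

# Key-value store: an opaque value looked up by a single key.
key_value = {
    "user:1001": '{"name": "Ana", "city": "Lisbon"}',
}

# Document store: a nested, self-describing document identified by _id.
document = {
    "_id": 1001,
    "name": "Ana",
    "address": {"city": "Lisbon", "country": "PT"},
    "tags": ["admin", "beta"],
}

# Column family: row key -> {column name: value}; rows may have different columns.
column_family = {
    "users": {
        "1001": {"name": "Ana", "city": "Lisbon"},
        "1002": {"name": "Ben"},  # sparse row: no "city" column
    }
}

# Graph: nodes plus labelled edges between them.
graph = {
    "nodes": {1001: {"name": "Ana"}, 1002: {"name": "Ben"}},
    "edges": [(1001, "FOLLOWS", 1002)],
}


def followers_of(g, node_id):
    """Return ids of nodes that have a FOLLOWS edge pointing at node_id."""
    return [src for src, label, dst in g["edges"] if label == "FOLLOWS" and dst == node_id]


print(column_family["users"]["1002"].get("city"))  # None
print(followers_of(graph, 1002))  # [1001]
```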
MongoDB vs Cassandra
- Describe MongoDB and Cassandra
- Differences between MongoDB and Cassandra
MongoDB
- Developed in C++ by 10gen
- Document store
  - Schemaless
  - Documents stored in BSON format
  - 16 MB maximum document size
  - Identification by a defined type, not just an id
- Indexes
  - Automatic unique index on the _id field
  - Compound indexes
- Areas of use
  - CMS (content management systems), comment storage
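The index bullets can be illustrated with a toy Python model: a dictionary stands in for the automatic unique index on `_id`, and a sorted list answers queries on the leading field of a compound index or on both fields, which is the prefix behaviour MongoDB compound indexes offer. `Collection`, `CompoundIndex` and the sample fields are hypothetical; MongoDB itself stores indexes as B-trees.

```python
import bisect


class CompoundIndex:
    """Toy ascending compound index on (field_a, field_b).

    Entries are kept sorted by (a, b, _id), so a query on a alone (a prefix)
    or on (a, b) is answered with a binary search plus a short scan.
    """

    def __init__(self, field_a, field_b):
        self.fields = (field_a, field_b)
        self.entries = []  # sorted list of (a, b, _id)

    def insert(self, doc):
        a, b = (doc[f] for f in self.fields)
        bisect.insort(self.entries, (a, b, doc["_id"]))

    def find(self, a, b=None):
        """Return _ids matching a (and b if given), in index order."""
        prefix = (a,) if b is None else (a, b)
        i = bisect.bisect_left(self.entries, prefix)
        out = []
        while i < len(self.entries) and self.entries[i][:len(prefix)] == prefix:
            out.append(self.entries[i][2])
            i += 1
        return out


class Collection:
    def __init__(self):
        self.by_id = {}  # stands in for the automatic unique index on _id
        self.indexes = []

    def create_index(self, field_a, field_b):
        idx = CompoundIndex(field_a, field_b)
        for doc in self.by_id.values():
            idx.insert(doc)
        self.indexes.append(idx)
        return idx

    def insert(self, doc):
        if doc["_id"] in self.by_id:  # uniqueness enforced before any indexing
            raise KeyError(f"duplicate _id {doc['_id']!r}")
        self.by_id[doc["_id"]] = doc
        for idx in self.indexes:
            idx.insert(doc)


users = Collection()
idx = users.create_index("city", "age")
users.insert({"_id": 1, "city": "Lisbon", "age": 30})
users.insert({"_id": 2, "city": "Lisbon", "age": 25})
users.insert({"_id": 3, "city": "Porto", "age": 30})
print(idx.find("Lisbon"))      # [2, 1]  (sorted by age within the city)
print(idx.find("Lisbon", 30))  # [1]
```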
Characteristics of MongoDB
- Most important characteristics are durability and concurrency
- Allows creation of replicas
  - Master-slave: one master and one or more slaves
  - The master serves reads and writes while the slaves serve as backups
  - When the master goes down, the slave with the most recent data is promoted to master
  - Replica members can be configured
- Locking
  - WiredTiger storage engine (MongoDB 3.0+, document-level concurrency)
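The failover rule on this slide can be sketched as follows, using the primary/secondary vocabulary of MongoDB replica sets. This is a simplified model under stated assumptions: `Member`, `optime` as a plain integer and `elect_primary` are hypothetical, and real elections also weigh member priority and votes. It does keep one real constraint: a new primary is only chosen while a strict majority of members is reachable.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    optime: int          # position of the last applied write (higher = more recent)
    healthy: bool = True


def elect_primary(members):
    """Simplified failover: return the healthy member with the most recent data.

    No primary is chosen unless a strict majority of members is reachable,
    which prevents two partitions from each electing their own primary.
    """
    healthy = [m for m in members if m.healthy]
    if len(healthy) * 2 <= len(members):
        return None  # no majority: no primary, so the set cannot accept writes
    return max(healthy, key=lambda m: m.optime)


rs = [Member("A", optime=120), Member("B", optime=118), Member("C", optime=119)]
rs[0].healthy = False           # the primary goes down
print(elect_primary(rs).name)   # C: most recent data among the survivors
rs[2].healthy = False
print(elect_primary(rs))        # None: only 1 of 3 members is reachable
```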
Cassandra
- Developed in Java by the Apache Software Foundation
- Column-family store
  - Similar to the relational model
- Designed to store large amounts of data and interact with it efficiently
- Data can be distributed and stored across clusters
- Areas of use
  - Banking, finance, logging
Characteristics of Cassandra
- Most important characteristics are durability, high availability and scalability
- Peer-to-peer architecture
  - Possible to store petabytes of data
  - Failed nodes can be replaced quickly
- Replication strategies
  - Simple (SimpleStrategy)
  - Network topology (NetworkTopologyStrategy)
- Replication types
  - Synchronous
  - Asynchronous
- Indexes
  - Secondary indexes implemented as hidden tables
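A minimal Python sketch of how Cassandra spreads data across a peer-to-peer cluster and places replicas with a SimpleStrategy-like rule: each key is hashed onto a token ring, stored on the first node clockwise from its token, and copied to the next nodes around the ring. Assumptions: MD5 replaces Cassandra's default Murmur3 partitioner, each node has a single token (no virtual nodes), and the rack and data-center awareness of NetworkTopologyStrategy is not modelled. The hypothetical `read_sees_latest_write` helper shows one way to read the synchronous/asynchronous split: the coordinator waits for W replica acknowledgements on writes and R on reads, the remaining replicas catch up later, and a read is guaranteed to see the latest write only when R + W > N.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key onto the ring (MD5 here; Cassandra's default is Murmur3)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes):
        # One token per node, derived from its name (no virtual nodes).
        self.tokens = sorted((token(n), n) for n in nodes)

    def replicas(self, key, rf):
        """SimpleStrategy-like placement: the first node whose token is >= the
        key's token (wrapping around), then the next rf - 1 nodes clockwise."""
        i = bisect.bisect_left(self.tokens, (token(key),))
        n = len(self.tokens)
        return [self.tokens[(i + k) % n][1] for k in range(min(rf, n))]


def read_sees_latest_write(n, r, w):
    """True if every read of r replicas overlaps every write acknowledged by w."""
    return r + w > n


ring = Ring(["node1", "node2", "node3", "node4"])
print(ring.replicas("user1234123", rf=3))     # three distinct nodes
print(read_sees_latest_write(n=3, r=2, w=2))  # True: QUORUM reads and writes
print(read_sees_latest_write(n=3, r=1, w=1))  # False: ONE/ONE may read stale data
```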
Feature Comparison
- Similar core properties
  - Locking, file types, querying, transactions, data storage and operating systems
- MongoDB
  - Better for frequently written data and dynamic queries
  - Queries are written in a JSON-like syntax
- Cassandra
  - Optimized for storing and interacting with large amounts of data
  - CQL (Cassandra Query Language), based on SQL
- Main difference (CAP theorem)
  - MongoDB is a CP system (consistency + partition tolerance)
  - Cassandra is an AP system (availability + partition tolerance)
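To show what JSON-like queries look like next to CQL, here is a toy Python evaluator for a handful of MongoDB comparison operators (`$eq`, `$gt`, `$gte`, `$lt`, `$lte`). It only mimics the filter syntax on in-memory dictionaries and is not how MongoDB executes queries; the `users` data, the `matches` helper and the CQL table layout (partition key `city`, clustering column `age`) are assumptions.

```python
import operator

# Comparison operators supported by this toy evaluator (a small subset).
OPS = {"$eq": operator.eq, "$gt": operator.gt, "$gte": operator.ge,
       "$lt": operator.lt, "$lte": operator.le}


def matches(doc, flt):
    """True if doc satisfies every condition in a MongoDB-style filter."""
    for field, cond in flt.items():
        if isinstance(cond, dict):  # operator form, e.g. {"$gt": 26}
            for op, value in cond.items():
                if field not in doc or not OPS[op](doc[field], value):
                    return False
        elif doc.get(field) != cond:  # shorthand equality, e.g. "Lisbon"
            return False
    return True


users = [
    {"_id": 1, "city": "Lisbon", "age": 30},
    {"_id": 2, "city": "Lisbon", "age": 25},
    {"_id": 3, "city": "Porto"},
]

# MongoDB shell form: db.users.find({city: "Lisbon", age: {$gt: 26}})
flt = {"city": "Lisbon", "age": {"$gt": 26}}
print([d["_id"] for d in users if matches(d, flt)])  # [1]

# Roughly equivalent CQL, assuming PRIMARY KEY (city, age) on the table:
cql = "SELECT * FROM users WHERE city = 'Lisbon' AND age > 26;"
```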
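Benchmark
- YCSB (Yahoo! Cloud Serving Benchmark)
- 6 workloads: A, B, C, F, G, H
  - A, B, C and F are YCSB core workloads; G and H are additional update-heavy workloads
- Test environment
  - 32-bit Ubuntu VM with 2 GB RAM
  - Host: Windows 7, 4 GB RAM, Intel Core 2 Quad
- MongoDB 2.4.3
- Cassandra 1.2.4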
Benchmark Process
- Three data sets: 100K, 280K and 700K records
- 1 record = 1 KB (10 fields)
  - Each field contains random characters; record keys look like user1234123
- Workloads
  - Each workload executed 3 times, restarting the computer after each run
  - Reported result is the average of the 3 runs
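The record layout and data-set sizes can be checked with a short Python sketch. It assumes YCSB's default field count (10) and field length (100 bytes), field names `field0` to `field9`, and ASCII letters as the random characters; the key format follows the slide's example, and `make_record` and `raw_size_mb` are hypothetical helpers. Sizes count raw field payload only, so on-disk sizes will be larger once keys, indexes and storage overhead are included.

```python
import random
import string

FIELD_COUNT = 10    # YCSB default fieldcount
FIELD_LENGTH = 100  # YCSB default fieldlength, in bytes


def make_record(i, rng):
    """One YCSB-style record: a 'user<N>' key and 10 fields of 100 random chars."""
    key = f"user{i}"
    fields = {
        f"field{j}": "".join(rng.choices(string.ascii_letters, k=FIELD_LENGTH))
        for j in range(FIELD_COUNT)
    }
    return key, fields


def raw_size_mb(record_count):
    """Payload size of a data set in MB, ignoring keys and storage overhead."""
    return record_count * FIELD_COUNT * FIELD_LENGTH / 1_000_000


rng = random.Random(42)
key, fields = make_record(1234123, rng)
print(key, sum(len(v) for v in fields.values()))  # user1234123 1000

for n in (100_000, 280_000, 700_000):  # the three data sets: 100, 280 and 700 MB
    print(f"{n:>7,} records ~ {raw_size_mb(n):.0f} MB")
```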
Evaluation: Insertion
- MongoDB inserted records about 24% faster than Cassandra
Workload A
- 50% read, 50% update
- Cassandra ~2.54× faster than MongoDB
- Both databases ran the 700K data set faster than the 280K data set
  - NoSQL databases are optimized for larger data sets
  - Designed for nodes/clusters
Workload B
- 95% read, 5% update
- MongoDB faster with smaller data sets
- Cassandra faster with bigger data sets
Workload C
- 100% read
- Same behavior as Workload B
- Cassandra uses MemTables/SSTables
- MongoDB uses memory-mapped files
  - If there is not enough RAM, page faults occur
  - Larger data set = more page faults = slower reads
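Why a larger data set means more page faults can be illustrated with a crude simulation: uniform random reads go through an LRU page cache of fixed size, and every miss counts as a page fault. The page counts and the `count_page_faults` helper are hypothetical, and the operating system's real page replacement differs from plain LRU, but the trend matches the slide: while the data fits in RAM only first-touch faults occur, and once it outgrows RAM most reads fault.

```python
import random
from collections import OrderedDict


def count_page_faults(data_pages, ram_pages, accesses, seed=0):
    """Count faults for uniform random page reads through an LRU page cache.

    A crude model of reading a memory-mapped file: a page that is not resident
    must be faulted in from disk, evicting the least recently used page.
    """
    rng = random.Random(seed)
    cache = OrderedDict()
    faults = 0
    for _ in range(accesses):
        page = rng.randrange(data_pages)
        if page in cache:
            cache.move_to_end(page)  # hit: mark as most recently used
        else:
            faults += 1
            cache[page] = True
            if len(cache) > ram_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return faults


RAM_PAGES = 500  # hypothetical number of pages that fit in memory
for size in (250, 500, 1000, 2000):
    print(size, count_page_faults(size, RAM_PAGES, accesses=20_000))
```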
Workload F
- Read-modify-write
  - Read a record
  - Modify it
  - Write it back
- Same behavior as Workloads B/C
Workload G
- 5% read, 95% update
- Cassandra much faster
- Cassandra writes are append-only
  - Appended to the end of a file (the commit log)
- MongoDB writes are in place
  - Locate the page on disk
  - Load it into memory
  - Update it
  - Write it back to disk
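The two write paths can be contrasted with a simplified Python model. `AppendOnlyStore` follows the Cassandra-style path (append to a log, update an in-memory table, no read before writing), while `InPlaceStore` follows the four steps listed for MongoDB. Both classes, the page size, the cache size and the FIFO eviction policy are hypothetical, and the dictionaries merely stand in for disk and memory.

```python
import random


class AppendOnlyStore:
    """Cassandra-style write path (simplified): append, never read first."""

    def __init__(self):
        self.commit_log = []  # stands in for the append-only commit log file
        self.memtable = {}    # latest value per key, later flushed to SSTables

    def update(self, key, value):
        self.commit_log.append((key, value))  # sequential append at end of log
        self.memtable[key] = value

    def get(self, key):
        return self.memtable.get(key)


class InPlaceStore:
    """In-place update path (simplified): locate page, load, update, write back."""

    def __init__(self, num_pages, records_per_page, cache_pages):
        self.records_per_page = records_per_page
        self.disk = {p: {} for p in range(num_pages)}  # page number -> records
        self.cache = {}  # pages currently in memory, in load order
        self.cache_pages = cache_pages
        self.disk_reads = 0
        self.disk_writes = 0

    def update(self, key, value):
        page_no = key // self.records_per_page          # 1. locate the page
        if page_no not in self.cache:                   # 2. load it into memory
            self.disk_reads += 1
            if len(self.cache) >= self.cache_pages:
                self.cache.pop(next(iter(self.cache)))  # evict oldest-loaded page
            self.cache[page_no] = dict(self.disk[page_no])
        self.cache[page_no][key] = value                # 3. update the record
        self.disk[page_no] = dict(self.cache[page_no])  # 4. write the page back
        self.disk_writes += 1

    def get(self, key):
        return self.disk[key // self.records_per_page].get(key)


rng = random.Random(1)
log_store = AppendOnlyStore()
page_store = InPlaceStore(num_pages=1000, records_per_page=4, cache_pages=50)
for i in range(10_000):
    key = rng.randrange(4000)
    log_store.update(key, i)
    page_store.update(key, i)

print(len(log_store.commit_log), "appends, no reads")
print(page_store.disk_reads, "page reads,", page_store.disk_writes, "page writes")
```

Both stores end up with the same latest values; the difference is that every in-place update that misses the cache pays an extra random read before it can write.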
Workload H
- 100% update
- Same behavior as Workload G
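The operation mixes of the workloads can be written down as a small YCSB-style generator. Proportions for A, B, C, G and H come from the slides; F uses the YCSB core workload F default of 50% reads and 50% read-modify-writes, since the slide gives no split. The `operations` helper is hypothetical and draws operations independently, whereas YCSB also chooses which record to touch using a request distribution (Zipfian by default for workloads A, B, C and F), which is not modelled here.

```python
import random
from collections import Counter

# Operation mix per workload. F follows the YCSB core workload F default.
WORKLOADS = {
    "A": {"read": 0.50, "update": 0.50},
    "B": {"read": 0.95, "update": 0.05},
    "C": {"read": 1.00},
    "F": {"read": 0.50, "read_modify_write": 0.50},
    "G": {"read": 0.05, "update": 0.95},
    "H": {"update": 1.00},
}


def operations(workload, count, seed=0):
    """Yield `count` operation names drawn according to the workload's mix."""
    mix = WORKLOADS[workload]
    rng = random.Random(seed)
    names, weights = list(mix), list(mix.values())
    for _ in range(count):
        yield rng.choices(names, weights=weights)[0]


print(Counter(operations("B", 10_000)))  # roughly 9,500 reads and 500 updates
```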
Summary
- RDBMS performance degrades as applications grow and become more complex
- NoSQL databases are growing in popularity and are being integrated into many production systems
- Compared and contrasted MongoDB and Cassandra
- The workload benchmarks suggest Cassandra is faster for larger-scale applications
  - Small data set / mostly reads: MongoDB
  - Big data set / mostly writes: Cassandra