Distributed Databases: SQL vs NoSQL

Seda Unal, Yuchen Zheng
April 23, 2017

1 Introduction

Distributed databases have become increasingly popular in the era of big data because of their advantages over traditional databases. In this project, distributed databases are investigated from a relational versus non-relational perspective: SQL and NoSQL are introduced, and their advantages and disadvantages for distributed databases are discussed. A distributed database is a database in which portions of the data are stored in multiple physical locations and processing is distributed among multiple database nodes. SQL (Structured Query Language) is the standard language for relational database management systems. In a relational database, data is organized into relations (tables), and the collection of table definitions is called a schema. The need to store, process and analyze unstructured data led to the development of schema-less alternatives to SQL, namely NoSQL, also read as "not only SQL". In this project, SQL and NoSQL are compared for distributed databases on aspects ranging from model features, data integrity and flexibility to scalability.

2 Distributed Databases

A distributed database is a collection of multiple, logically related databases distributed over a computer network [1]. It has advantages over a traditional centralized database system, such as improved performance, speed and resource efficiency. However, it also has disadvantages, such as increased complexity and the difficulty of maintaining data integrity. A distributed database system allows applications to access data from local and remote databases. With the data stored on multiple computers, users across the network can access and modify the data simultaneously. A DDBMS (distributed database management system) controls all database servers and maintains the consistency of the global database.

Distributed databases use a client-server-node architecture to process information requests: a client is an application that requests information from a server, a server is the software that manages a database, and a node in a distributed database system can be either a client or a server. In recent years, preference has gradually moved to distributed database systems, motivated by the distributed nature of organizational units, support for both OLTP (online transaction processing) and OLAP (online analytical processing), and database recovery [2, 3]. Distributed databases have advantages over centralized databases in reliability, speed and communication cost. A distributed database is more reliable because a failure at one site causes only minimal disruption and does not bring the entire system to a halt. Because the data is stored on multiple computers, requests do not all have to be sent to a central computer for processing, which allows faster responses for users. Lastly, data in a distributed database is located close to where it is used most, which minimizes the communication cost of data manipulation [3, 4].

3 SQL vs NoSQL in Distributed Databases

SQL and NoSQL are both great inventions that keep data storage and retrieval optimized and smooth, with each having advantages over the other in certain scenarios [5, 7, 8]. Consider a social platform where users engage with each other through posts associated with images, audio, video, comments, links to websites, etc. With SQL, a query with many joins is necessary to retrieve this content. In addition to the complexity of the data, consider a stream of posts that loads dynamically: with SQL, many queries and joins are needed to complete the task. One solution to this problem is to store the content as JSON, a widely supported format for dynamic data. Another approach, from a non-relational perspective, is to use NoSQL, which simplifies the handling of this specific scenario. Because of this simplicity, the use of NoSQL has grown along with social media platforms and with the growing demands of the IoT (Internet of Things).
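
The social platform scenario above can be illustrated with a minimal sketch. The table names, column names and sample data are hypothetical and chosen only for illustration, and Python's built-in sqlite3 module stands in for a relational database. The relational layout needs joins to reassemble one post, whereas the document layout keeps the post as a single JSON value.

```python
import json
import sqlite3

# Relational layout: one table per kind of content (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts    (id INTEGER PRIMARY KEY, author TEXT, body TEXT);
    CREATE TABLE images   (post_id INTEGER, url TEXT);
    CREATE TABLE comments (post_id INTEGER, author TEXT, body TEXT);
    INSERT INTO posts    VALUES (1, 'alice', 'Hello');
    INSERT INTO images   VALUES (1, 'img/a.png');
    INSERT INTO comments VALUES (1, 'bob', 'Nice'), (1, 'carol', 'Welcome');
""")

# Retrieving one post with its attachments needs joins across the tables.
rows = conn.execute("""
    SELECT p.author, p.body, i.url, c.author, c.body
    FROM posts p
    LEFT JOIN images   i ON i.post_id = p.id
    LEFT JOIN comments c ON c.post_id = p.id
    WHERE p.id = 1
    ORDER BY c.author
""").fetchall()

# Document layout: the same post stored as one self-contained JSON value.
post_document = {
    "id": 1,
    "author": "alice",
    "body": "Hello",
    "images": ["img/a.png"],
    "comments": [
        {"author": "bob", "body": "Nice"},
        {"author": "carol", "body": "Welcome"},
    ],
}
stored = json.dumps(post_document)  # the value a document store would keep
loaded = json.loads(stored)         # one lookup returns the whole post
```

The join returns one row per comment and repeats the post fields in each row, while the document comes back in the nested shape the application uses.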

3.1 SQL

SQL (Structured Query Language) is a programming language widely used to manage data in relational databases. A relational database is a set of tables containing data fitted into predefined categories. Users can access data from the database without knowing how the tables are physically organized. Scalability is an important issue for relational SQL databases, since the database has to be distributed onto multiple servers and handling tables across different servers can be problematic. However, Google has announced F1, a SQL database built to scale that runs at the core of its AdWords business, which indicates that SQL remains a viable option for distributed databases [8].
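
What "predefined categories" means in practice can be shown with a small sketch, again using Python's built-in sqlite3 module as a stand-in for a relational database. The users table and its columns are hypothetical. A row with a field the schema does not define is rejected until the schema itself is changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema is declared up front (hypothetical table for illustration).
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")

# A row with a column the schema does not define is rejected.
try:
    conn.execute(
        "INSERT INTO users (id, name, email) VALUES (2, 'bob', 'bob@example.org')"
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# The schema has to be changed explicitly before the new field can be stored.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
conn.execute(
    "INSERT INTO users (id, name, email) VALUES (2, 'bob', 'bob@example.org')"
)
rows = conn.execute("SELECT id, name, email FROM users ORDER BY id").fetchall()
```

After the schema change, existing rows simply have no value for the new column.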

3.2 NoSQL

Non-relational databases have been around since the late 1960s, but the major shift to NoSQL databases happened only in the last decade, when Amazon introduced its Dynamo distributed NoSQL system [2]. A NoSQL database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases.

3.2.1 NoSQL Databases

There has been a boom in all kinds of data in recent years, a large portion of which is no longer the structured data that SQL is good at handling. NoSQL is a tool designed to deal with this unstructured and semi-structured data. A NoSQL database environment is, simply put, a non-relational and largely distributed database system that enables rapid, ad-hoc organization and analysis of extremely high-volume, disparate data types [9]. Schema-less NoSQL databases share essentially all the advantages of distributed databases and have become the alternative to relational databases.

What does it mean to say that NoSQL is schema-less? It means that for any data we put into a NoSQL database, we do not need to predefine a rigid schema like those in relational databases. The format of the data can even be changed at any time without disrupting anything, which provides great application flexibility and can be faster than a relational database. NoSQL also has great horizontal scalability. It automatically spreads data over many servers, which can be added to or removed from the data layer. Another reason for this scalability is that most NoSQL databases do not enforce the full ACID properties (atomicity, consistency, isolation, durability) that relational databases provide; ACID makes good horizontal scalability harder to achieve. Other features, such as replication (which maintains availability) and integrated caching (which improves read performance), also distinguish NoSQL databases from relational databases.
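
How schema-less records can be spread over servers is sketched below under simplifying assumptions. The node names, the keys and the helper functions node_for and distribute are hypothetical, and placement uses a plain hash-modulo rule rather than the mechanism of any particular NoSQL product.

```python
import hashlib


def node_for(key: str, nodes: list[str]) -> str:
    """Pick a node by hashing the key (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def distribute(documents: dict[str, dict], nodes: list[str]) -> dict[str, dict]:
    """Return {node: {key: document}} for the given list of nodes."""
    placement: dict[str, dict] = {node: {} for node in nodes}
    for key, doc in documents.items():
        placement[node_for(key, nodes)][key] = doc
    return placement


# Schema-less records: each document carries its own set of fields.
documents = {
    "user:1": {"name": "alice"},
    "user:2": {"name": "bob", "email": "bob@example.org"},
    "post:7": {"author": "alice", "tags": ["intro"], "likes": 3},
}

# Adding a server only means recomputing the placement over a longer node list.
three_nodes = distribute(documents, ["node-a", "node-b", "node-c"])
four_nodes = distribute(documents, ["node-a", "node-b", "node-c", "node-d"])
```

With modulo placement, many keys can move when the number of nodes changes. Systems such as Dynamo, Cassandra and Riak use consistent hashing to limit that movement, which this sketch does not attempt.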
3.2.2 Types of NoSQL Databases

Among the many types of NoSQL databases, four are the most common: the document database, which stores document-oriented information (semi-structured data); the column store, which stores data tables as sections of columns of data rather than as rows of data; the key-value store, which stores data as a value under an indexed key; and the graph database, which stores data whose relations are represented as a graph [9]. A brief comparison of these databases in terms of their performance, scalability, flexibility, complexity and functionality is given in Figure 1. As an example, we choose one database of each type, namely MongoDB (document), Cassandra (column), Riak (key-value) and Neo4j (graph), and compare their data distribution mechanisms and their support for distributed data processing.
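
The four data models described above can be sketched with plain Python structures. This only illustrates the shape of the data under each model; the field names, the sample values and the helper followers_of are hypothetical, and nothing here reflects the storage format or API of any specific product.

```python
import json

# Key-value store: an opaque value looked up by its key.
key_value = {"user:1": json.dumps({"name": "alice", "city": "Rome"})}

# Document database: a semi-structured, nested record with its own fields.
document = {"_id": 1, "name": "alice", "address": {"city": "Rome"}, "follows": [2]}

# Column store: values grouped by column rather than by row.
column_store = {
    "name": {1: "alice", 2: "bob"},
    "city": {1: "Rome", 2: "Lyon"},
}

# Graph database: nodes plus labelled edges between them.
graph = {
    "nodes": {1: {"name": "alice"}, 2: {"name": "bob"}},
    "edges": [(1, "FOLLOWS", 2)],
}


def followers_of(g: dict, node_id: int) -> list[int]:
    """Return the ids of nodes that have a FOLLOWS edge to node_id."""
    return [
        src for src, label, dst in g["edges"]
        if label == "FOLLOWS" and dst == node_id
    ]


# Reading one whole column touches only that column's values.
cities = list(column_store["city"].values())
bob_followers = followers_of(graph, 2)
```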

Figure 1: Comparison among NoSQL Databases [9]

Database    Data distribution mechanism              Distributed data processing support
MongoDB     Key-range and hash based distribution    Aggregation pipeline, MapReduce
Cassandra   Key-range and hash based distribution    Hadoop, MapReduce
Riak        Hash based distribution                  MapReduce
Neo4j       Master-slave cluster                     Hadoop

Table 1: Comparison of Four Representative NoSQL Databases [10]

3.3 NoSQL vs SQL

NoSQL, which supports non-relational, flexible, dynamic and horizontally scalable databases, is the frequently selected approach for distributed database systems. SQL, on the other hand, enforces a strict schema, structured data, strong consistency and vertical scalability. The best fit between SQL and NoSQL can be decided depending on the needs of the application. In distributed systems, NoSQL is mostly preferred because of the scalability problems of SQL. However, there is evidence that scalability in SQL can be achieved with clustered hierarchical distributed databases [8].

4 Conclusion

In the last decade, big data, IoT and increasing user activity on social platforms have pushed the backbone of database technology to become distributed. Distributed databases offer better reliability, higher speed and lower communication cost than traditional centralized databases. SQL and NoSQL are both great inventions that keep data storage and retrieval optimized and smooth. Both approaches have advantages and disadvantages in different applications, and the better choice depends on the situation and the needs.

References

[1] M. T. Özsu and P. Valduriez, Principles of distributed databases, 2011.
[2] S. Venkatraman et al., SQL versus NoSQL movement with big data analytics, International Journal of Information Technology and Computer Science 8(12), pp. 59-66, 2016.
[3] Distributed DBMS-Distributed Databases: https://www.tutorialspoint.com/distributed_dbms/distributed_dbms_databases.htm
[4] N. Leavitt, Will NoSQL databases live up to their promise? Computer 43(2), pp. 12-14, 2010.
[5] MongoDB White Paper: Top 5 Considerations When Evaluating NoSQL Databases
[6] Oracle9i Database Administrator's Guide Release 2 (9.2), Part Number A96521-01
[7] NoSQL vs. SQL, Microsoft Azure: https://docs.microsoft.com/en-us/azure/documentdb/documentdb-nosql-vs-sql
[8] Ian Rae et al., Proceedings of the VLDB Endowment, Vol. 6, No. 11, August 26th 2013.
[9] What is NoSQL?: https://academy.datastax.com/planet-cassandra/what-is-nosql
[10] Xiaoming Gao, Investigation and Comparison of Distributed NoSQL Database Systems, Indiana University