The Consistency Analysis of Secondary Index on Distributed Ordered Tables. Houliang Qi, Xu Chang, Xingwu Liu, Li Zha

Size: px
Start display at page:

Download "The Consistency Analysis of Secondary Index on Distributed Ordered Tables. Houliang Qi, Xu Chang, Xingwu Liu, Li Zha"

Transcription

1 The Consistency Analysis of Secondary Index on Distributed Ordered Tables Houliang Qi, Xu Chang, Xingwu Liu, Li Zha

2 Agenda Background MoEvaEon SoluEons Consistency Consistency Window Consistency Model Secondary Index EvaluaEon Conclusion & Future Work Related Work

3 Background NoSQL is widely used The digital universe is huge, and growing exponeneally Google BigTable, Yahoo! PNUTS, Apache HBase, Cassandra, etc high performance, low space overhead, and high reliability MulE- dimensional Range Queries MDRQ is common laetude > 41.5 and laetude < 44.6 and longitude > and longitude < and type = shop NoSQL only support row- key query Indexing techniques HIndex, Apache Phoenix, Diff- Index etc.

4 Motivation Only concerned about performance, do not care about consistency Data inconsistency problem Some queseons can t answer How to measure the consistency between the base table and index table? What is the difference between indexing techniques with eventual consistency models? What can we learn form the above analysis?

5 Contribution of our work Introduce the inconsistency window to measure the consistency degree Present the definieon of strong, RYW and eventual consistency between the index table and base table Classify the typical secondary indexing techniques and implement them by ourselves Experimental evaluaeon

6 Question1 How to measure the consistency between the base table and index table?

7 Index model Two operaeons C(v):reading key from index table by the value v and reading the record < k, v > from base table by the key, then comparing v with v. S(k, v): wrieng the record< k, v > into the base table and updaeng the index table ExecuEon DAG C(v1) C(v2) C(v3) S(k,v1) S(k,v2) S(k,v3) C(v1) C(v2) C(v3)

8 Inconsistency Window Concurrent execueon serializaeon Concurrent ES: ExecuEon Sequence Client P1: P2: Pm: O O! O! O i 1n O O! O! O i 2n! O O! O! O m1 m2 mi mnm 1 2 O O! O! O 1' 2' i' n' O O! O! O 1'' 2'' i'' n'' Server Server SequenEal RS: Result Sequence Inconsistency Window :Given any operaeon O, consider the set of operaeons O preceding O in ES and afer O in RS, denoted by Lag(O). Define inconsistency window of the RS as maxo RS Lag(O).

9 Inconsistency Window Calculate O1 Execu&on Sequence: Sort according to the Eme at which the operaeon begin to execute time O2 ES O 1 O 2 O 3 O 4 O 5 O 6 O 7 O3 O4 O5 Result Sequence: Sort by Eme according to the compleeon of the operaeon RS O6 O7 ICW O 1 0= 1-1 O 3 1= 3-2 O 5 2= 5-3 O 7 3= 7-4 O 2 3= 2-5 O 6 0= 6-6 O 4 3= 4-7 Execu&on DAG: from client perspeceve Inconsistency Window: For each operaeon, calculate its distance between the ES and RS

10 Consistency Model: Index table & Base Table Strong Consistency For any execueon ES, there is a linearizaeon RS preserving the order in ES, which means that if an operaeon O1 precedes O2 in ES, then operaeon O1 precedes O2 in RS. C(v1) C(v2) C(v3) S(k,v1) S(k,v2) S(k,v3) C(v1) C(v2) C(v3)

11 Consistency Model: Index table & Base Table Strong Consistency For any execueon ES, there is a linearizaeon RS preserving the order in ES, which means that if an operaeon O1 precedes O2 in ES, then operaeon O1 precedes O2 in RS. Read- Your- Writes(RYW) Consistency. There is a number Δ > 0. For any execueon ES, there is a linearizaeon whose inconsistency window is less than Δ and preserving the ordering of operaeons launched by any client. Eventual Consistency. There is a number Δ > 0 such that for any execueon ES, there is a linearizaeon whose inconsistency window is less than Δ C(v1) C(v2) C(v3) S(k,v1) S(k,v2) S(k,v3) C(v1) C(v2) C(v3) C(v3) failure failure failure

12 Question 2 What is the difference between indexing techniques with eventual consistency models?

13 Update operation in DOT Distributed Ordered Tables introduced by Yahoo! pareeons conenuous keys to regions, replicates regions, distributes regions to shared- nothing servers. serves as tables and columns, supports range queries on primary keys. Update operaeon in DOT

14 Secondary Indexes Indexing techniques classificaeon No. Indexing Techniques DescripGon 1 Sync Step1+Step2+Step3+Step4 2 Async-insert Step1+Step2+async maintain 3 Async-compact Step1+Step2+compact maintain 4 Async-simple Step1+async maintain Here we only focus on the eventual consistency Strong RYW Eventual Google Spanner, XiaoMi Themis, Apache Phoenix Diff-Index CMIndex, SLIK, Diff-Index, DELI, HIndex, IHBase, Apache Phoenix Indexing Techniques Sync Global CMIndex, Diff-Index Async-insert Diff-Index, SLIK - Local Apache Phoenix Async-compact DELI HIndex Async-simple Diff-Index IHBase

15 Evaluation Experimental setup 7 nodes: 3 client, 1 master, 3 slaves Client: generate workload by YCSB Servers: HBase/HDFS default configuraeon Workload scenario Scenario A: 100% update Scenario B: 50%scan, 50%update

16 Evaluation- Scenario A Sync: higher Async: lower Sync: lower Async: higher

17 Evaluation- Scenario A Global and local have the similar consistency for the same indexing techniques. Sync has the highest degree of consistency Async- insert and async- simple indexing technique both show worse consistency indexing techniques percenele inconsistency window global local Async- insert Async- simple

18 Evaluation- Scenario B Compared with update latency, scan latency is lower. The trend of performance of the indexing techniques is the same as scenario A The total performance of scenario B is beoer than scenario A

19 Evaluation- Scenario B Global and local have the similar consistency for the same indexing techniques. Sync has the highest degree of consistency. Async- insert and async- simple indexing technique both show worse consistency. read and write operaeons have been completed, laving only the asynchronous process indexing techniques percenele inconsistency window global local Async- insert Async- simple

20 Evaluation- Eme breakup Local index have no sever RPC Eme compared with global index

21 Question 3 What can we learn form the above analysis?

22 Conclusion Asynchronous indexing techniques are much beoer than synchronous indexing techniques in performance Async- simple is beoer than async- insert in performance while has similar inconsistency window with async- insert Comparing with the global indexing techniques, the local indexing techniques have no server RPC Eme Item Feature Performance Async vs Sync 10x worse Async-simple vs Async-insert 1.86x-2.87x (local) 3.20x-6.26x (global) Consistency similar Local vs Global better similar

23 Conclusion Async- simple is beoer than async- insert in performance while has similar inconsistency window with async- insert,why? No. Indexing Techniques DescripGon 2 Async-insert Step1+Step2+async maintain 4 Async-simple Step1+async maintain Although async- insert do step1 and step2 synchronizaeon, but they are not atomic.

24 Future Work Why not async- simple local secondary indexing technique? Whether it can improve the consistency without decreasing the performance?

25 Related Work Index vs Replica Consistency model [Lamport1979How], [Ahamad1995Causal], [Vogels2008Eventually], [Bailis2013Bolt], [Phansalkar2015Tunable] SemanGcs Secondary index database Replicas system Read First reading the index table and then reading the base table. Reading data from anyone replica. Write First writing data to base table and then writing data to index table or conversely. No certain writing order between the replicas.

26 Related Work Indexing OpEmizaEon Concurrency Control [graefe2012concurrency], [faleiro2015rethinking], Query Verify [tan2014diff- index], [kejriwal2016slik] RDMA [huang2012high- performance], [li2016acceleraeng] Double- Level Indexing [wu2009an],[cheng2014bf], [Sidirourgos2013Column] Domain- Specific Data [Wang2014Lightweight]

27 Q&A Thanks!

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu NoSQL Databases MongoDB vs Cassandra Kenny Huynh, Andre Chik, Kevin Vu Introduction - Relational database model - Concept developed in 1970 - Inefficient - NoSQL - Concept introduced in 1980 - Related

More information

Google File System, Replication. Amin Vahdat CSE 123b May 23, 2006

Google File System, Replication. Amin Vahdat CSE 123b May 23, 2006 Google File System, Replication Amin Vahdat CSE 123b May 23, 2006 Annoucements Third assignment available today Due date June 9, 5 pm Final exam, June 14, 11:30-2:30 Google File System (thanks to Mahesh

More information

HyperDex. A Distributed, Searchable Key-Value Store. Robert Escriva. Department of Computer Science Cornell University

HyperDex. A Distributed, Searchable Key-Value Store. Robert Escriva. Department of Computer Science Cornell University HyperDex A Distributed, Searchable Key-Value Store Robert Escriva Bernard Wong Emin Gün Sirer Department of Computer Science Cornell University School of Computer Science University of Waterloo ACM SIGCOMM

More information

10 Million Smart Meter Data with Apache HBase

10 Million Smart Meter Data with Apache HBase 10 Million Smart Meter Data with Apache HBase 5/31/2017 OSS Solution Center Hitachi, Ltd. Masahiro Ito OSS Summit Japan 2017 Who am I? Masahiro Ito ( 伊藤雅博 ) Software Engineer at Hitachi, Ltd. Focus on

More information

PNUTS: Yahoo! s Hosted Data Serving Platform. Reading Review by: Alex Degtiar (adegtiar) /30/2013

PNUTS: Yahoo! s Hosted Data Serving Platform. Reading Review by: Alex Degtiar (adegtiar) /30/2013 PNUTS: Yahoo! s Hosted Data Serving Platform Reading Review by: Alex Degtiar (adegtiar) 15-799 9/30/2013 What is PNUTS? Yahoo s NoSQL database Motivated by web applications Massively parallel Geographically

More information

Cassandra - A Decentralized Structured Storage System. Avinash Lakshman and Prashant Malik Facebook

Cassandra - A Decentralized Structured Storage System. Avinash Lakshman and Prashant Malik Facebook Cassandra - A Decentralized Structured Storage System Avinash Lakshman and Prashant Malik Facebook Agenda Outline Data Model System Architecture Implementation Experiments Outline Extension of Bigtable

More information

Goal of the presentation is to give an introduction of NoSQL databases, why they are there.

Goal of the presentation is to give an introduction of NoSQL databases, why they are there. 1 Goal of the presentation is to give an introduction of NoSQL databases, why they are there. We want to present "Why?" first to explain the need of something like "NoSQL" and then in "What?" we go in

More information

LCIndex: A Local and Clustering Index on Distributed OrderedTablesforFlexible Multi-Dimensional Range Queries

LCIndex: A Local and Clustering Index on Distributed OrderedTablesforFlexible Multi-Dimensional Range Queries LCIndex: A Local and Clustering Index on Distributed OrderedTablesforFlexible Multi-Dimensional Range Queries Chen Feng,XiYang,FanLiang,Xian-HeSun and Zhiwei Xu SKL Computer Architecture, ICT, CAS, China

More information

Overview. * Some History. * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL. * NoSQL Taxonomy. *TowardsNewSQL

Overview. * Some History. * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL. * NoSQL Taxonomy. *TowardsNewSQL * Some History * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL * NoSQL Taxonomy * Towards NewSQL Overview * Some History * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL * NoSQL Taxonomy *TowardsNewSQL NoSQL

More information

Rule 14 Use Databases Appropriately

Rule 14 Use Databases Appropriately Rule 14 Use Databases Appropriately Rule 14: What, When, How, and Why What: Use relational databases when you need ACID properties to maintain relationships between your data. For other data storage needs

More information

CISC 7610 Lecture 2b The beginnings of NoSQL

CISC 7610 Lecture 2b The beginnings of NoSQL CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone

More information

10. Replication. Motivation

10. Replication. Motivation 10. Replication Page 1 10. Replication Motivation Reliable and high-performance computation on a single instance of a data object is prone to failure. Replicate data to overcome single points of failure

More information

Why distributed databases suck, and what to do about it. Do you want a database that goes down or one that serves wrong data?"

Why distributed databases suck, and what to do about it. Do you want a database that goes down or one that serves wrong data? Why distributed databases suck, and what to do about it - Regaining consistency Do you want a database that goes down or one that serves wrong data?" 1 About the speaker NoSQL team lead at Trifork, Aarhus,

More information

NewSQL Databases. The reference Big Data stack

NewSQL Databases. The reference Big Data stack Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica NewSQL Databases Corso di Sistemi e Architetture per Big Data A.A. 2017/18 Valeria Cardellini The reference

More information

NoSQL systems. Lecture 21 (optional) Instructor: Sudeepa Roy. CompSci 516 Data Intensive Computing Systems

NoSQL systems. Lecture 21 (optional) Instructor: Sudeepa Roy. CompSci 516 Data Intensive Computing Systems CompSci 516 Data Intensive Computing Systems Lecture 21 (optional) NoSQL systems Instructor: Sudeepa Roy Duke CS, Spring 2016 CompSci 516: Data Intensive Computing Systems 1 Key- Value Stores Duke CS,

More information

DrRobert N. M. Watson

DrRobert N. M. Watson Distributed systems Lecture 15: Replication, quorums, consistency, CAP, and Amazon/Google case studies DrRobert N. M. Watson 1 Last time General issue of consensus: How to get processes to agree on something

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2015 Lecture 14 NoSQL References Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol. 39, No.

More information

Scalable Low-Latency Indexes for a Key-Value Store Ankita Kejriwal

Scalable Low-Latency Indexes for a Key-Value Store Ankita Kejriwal Scalable Low-Latency Indexes for a Key-Value Store Ankita Kejriwal With Arjun Gopalan, Ashish Gupta, Zhihao Jia, Stephen Yang and John Ousterhout Conjecture Can a key value store support strongly consistent

More information

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL CISC 7610 Lecture 5 Distributed multimedia databases Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL Motivation YouTube receives 400 hours of video per minute That is 200M hours

More information

10/18/2017. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414

10/18/2017. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414 Announcements Database Systems CSE 414 Lecture 11: NoSQL & JSON (mostly not in textbook only Ch 11.1) HW5 will be posted on Friday and due on Nov. 14, 11pm [No Web Quiz 5] Today s lecture: NoSQL & JSON

More information

Distributed File Systems II

Distributed File Systems II Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation

More information

How do we build TiDB. a Distributed, Consistent, Scalable, SQL Database

How do we build TiDB. a Distributed, Consistent, Scalable, SQL Database How do we build TiDB a Distributed, Consistent, Scalable, SQL Database About me LiuQi ( 刘奇 ) JD / WandouLabs / PingCAP Co-founder / CEO of PingCAP Open-source hacker / Infrastructure software engineer

More information

Data Storage Revolution

Data Storage Revolution Data Storage Revolution Relational Databases Object Storage (put/get) Dynamo PNUTS CouchDB MemcacheDB Cassandra Speed Scalability Availability Throughput No Complexity Eventual Consistency Write Request

More information

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis 1 NoSQL So-called NoSQL systems offer reduced functionalities compared to traditional Relational DBMSs, with the aim of achieving

More information

CompSci 516 Database Systems

CompSci 516 Database Systems CompSci 516 Database Systems Lecture 20 NoSQL and Column Store Instructor: Sudeepa Roy Duke CS, Fall 2018 CompSci 516: Database Systems 1 Reading Material NOSQL: Scalable SQL and NoSQL Data Stores Rick

More information

A Fast and High Throughput SQL Query System for Big Data

A Fast and High Throughput SQL Query System for Big Data A Fast and High Throughput SQL Query System for Big Data Feng Zhu, Jie Liu, and Lijie Xu Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190

More information

CIB Session 12th NoSQL Databases Structures

CIB Session 12th NoSQL Databases Structures CIB Session 12th NoSQL Databases Structures By: Shahab Safaee & Morteza Zahedi Software Engineering PhD Email: safaee.shx@gmail.com, morteza.zahedi.a@gmail.com cibtrc.ir cibtrc cibtrc 2 Agenda What is

More information

Cassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent

Cassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent Tanton Jeppson CS 401R Lab 3 Cassandra, MongoDB, and HBase Introduction For my report I have chosen to take a deeper look at 3 NoSQL database systems: Cassandra, MongoDB, and HBase. I have chosen these

More information

Asynchronous View Maintenance for VLSD Databases

Asynchronous View Maintenance for VLSD Databases Asynchronous View Maintenance for VLSD Databases Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, Raghu Ramakrishnan SIGMOD 2009 Talk by- Prashant S. Jaiswal Ketan J. Mav Motivation

More information

SCALABLE CONSISTENCY AND TRANSACTION MODELS

SCALABLE CONSISTENCY AND TRANSACTION MODELS Data Management in the Cloud SCALABLE CONSISTENCY AND TRANSACTION MODELS 69 Brewer s Conjecture Three properties that are desirable and expected from realworld shared-data systems C: data consistency A:

More information

Final Exam Logistics. CS 133: Databases. Goals for Today. Some References Used. Final exam take-home. Same resources as midterm

Final Exam Logistics. CS 133: Databases. Goals for Today. Some References Used. Final exam take-home. Same resources as midterm Final Exam Logistics CS 133: Databases Fall 2018 Lec 25 12/06 NoSQL Final exam take-home Available: Friday December 14 th, 4:00pm in Olin Due: Monday December 17 th, 5:15pm Same resources as midterm Except

More information

Distributed Data Store

Distributed Data Store Distributed Data Store Large-Scale Distributed le system Q: What if we have too much data to store in a single machine? Q: How can we create one big filesystem over a cluster of machines, whose data is

More information

CS / Cloud Computing. Recitation 11 November 5 th and Nov 8 th, 2013

CS / Cloud Computing. Recitation 11 November 5 th and Nov 8 th, 2013 CS15-319 / 15-619 Cloud Computing Recitation 11 November 5 th and Nov 8 th, 2013 Announcements Encounter a general bug: Post on Piazza Encounter a grading bug: Post Privately on Piazza Don t ask if my

More information

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis 1 NoSQL So-called NoSQL systems offer reduced functionalities compared to traditional Relational DBMS, with the aim of achieving

More information

PNUTS and Weighted Voting. Vijay Chidambaram CS 380 D (Feb 8)

PNUTS and Weighted Voting. Vijay Chidambaram CS 380 D (Feb 8) PNUTS and Weighted Voting Vijay Chidambaram CS 380 D (Feb 8) PNUTS Distributed database built by Yahoo Paper describes a production system Goals: Scalability Low latency, predictable latency Must handle

More information

In the news. Request- Level Parallelism (RLP) Agenda 10/7/11

In the news. Request- Level Parallelism (RLP) Agenda 10/7/11 In the news "The cloud isn't this big fluffy area up in the sky," said Rich Miller, who edits Data Center Knowledge, a publicaeon that follows the data center industry. "It's buildings filled with servers

More information

Introduction to NoSQL Databases

Introduction to NoSQL Databases Introduction to NoSQL Databases Roman Kern KTI, TU Graz 2017-10-16 Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 1 / 31 Introduction Intro Why NoSQL? Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 2 / 31 Introduction

More information

CS 655 Advanced Topics in Distributed Systems

CS 655 Advanced Topics in Distributed Systems Presented by : Walid Budgaga CS 655 Advanced Topics in Distributed Systems Computer Science Department Colorado State University 1 Outline Problem Solution Approaches Comparison Conclusion 2 Problem 3

More information

EECS 498 Introduction to Distributed Systems

EECS 498 Introduction to Distributed Systems EECS 498 Introduction to Distributed Systems Fall 2017 Harsha V. Madhyastha Dynamo Recap Consistent hashing 1-hop DHT enabled by gossip Execution of reads and writes Coordinated by first available successor

More information

Benchmarking Replication and Consistency Strategies in Cloud Serving Databases: HBase and Cassandra

Benchmarking Replication and Consistency Strategies in Cloud Serving Databases: HBase and Cassandra Benchmarking Replication and Consistency Strategies in Cloud Serving Databases: HBase and Cassandra Huajin Wang 1,2, Jianhui Li 1(&), Haiming Zhang 1, and Yuanchun Zhou 1 1 Computer Network Information

More information

Quantitative Analysis of Consistency in NoSQL Key-value Stores

Quantitative Analysis of Consistency in NoSQL Key-value Stores Quantitative Analysis of Consistency in NoSQL Key-value Stores Si Liu, Son Nguyen, Jatin Ganhotra, Muntasir Raihan Rahman Indranil Gupta, and José Meseguer February 206 NoSQL Systems Growing quickly $3.4B

More information

CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. Lecture 19 Memory Consistency Models

CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. Lecture 19 Memory Consistency Models CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture Lecture 19 Memory Consistency Models Krste Asanovic Electrical Engineering and Computer Sciences University of California

More information

MDCC MULTI DATA CENTER CONSISTENCY. amplab. Tim Kraska, Gene Pang, Michael Franklin, Samuel Madden, Alan Fekete

MDCC MULTI DATA CENTER CONSISTENCY. amplab. Tim Kraska, Gene Pang, Michael Franklin, Samuel Madden, Alan Fekete MDCC MULTI DATA CENTER CONSISTENCY Tim Kraska, Gene Pang, Michael Franklin, Samuel Madden, Alan Fekete gpang@cs.berkeley.edu amplab MOTIVATION 2 3 June 2, 200: Rackspace power outage of approximately 0

More information

Building Consistent Transactions with Inconsistent Replication

Building Consistent Transactions with Inconsistent Replication Building Consistent Transactions with Inconsistent Replication Irene Zhang, Naveen Kr. Sharma, Adriana Szekeres, Arvind Krishnamurthy, Dan R. K. Ports University of Washington Distributed storage systems

More information

GridGain and Apache Ignite In-Memory Performance with Durability of Disk

GridGain and Apache Ignite In-Memory Performance with Durability of Disk GridGain and Apache Ignite In-Memory Performance with Durability of Disk Dmitriy Setrakyan Apache Ignite PMC GridGain Founder & CPO http://ignite.apache.org #apacheignite Agenda What is GridGain and Ignite

More information

Building Consistent Transactions with Inconsistent Replication

Building Consistent Transactions with Inconsistent Replication DB Reading Group Fall 2015 slides by Dana Van Aken Building Consistent Transactions with Inconsistent Replication Irene Zhang, Naveen Kr. Sharma, Adriana Szekeres, Arvind Krishnamurthy, Dan R. K. Ports

More information

ZHT: Const Eventual Consistency Support For ZHT. Group Member: Shukun Xie Ran Xin

ZHT: Const Eventual Consistency Support For ZHT. Group Member: Shukun Xie Ran Xin ZHT: Const Eventual Consistency Support For ZHT Group Member: Shukun Xie Ran Xin Outline Problem Description Project Overview Solution Maintains Replica List for Each Server Operation without Primary Server

More information

Shen PingCAP 2017

Shen PingCAP 2017 Shen Li @ PingCAP About me Shen Li ( 申砾 ) Tech Lead of TiDB, VP of Engineering Netease / 360 / PingCAP Infrastructure software engineer WHY DO WE NEED A NEW DATABASE? Brief History Standalone RDBMS NoSQL

More information

Extreme Computing. NoSQL.

Extreme Computing. NoSQL. Extreme Computing NoSQL PREVIOUSLY: BATCH Query most/all data Results Eventually NOW: ON DEMAND Single Data Points Latency Matters One problem, three ideas We want to keep track of mutable state in a scalable

More information

Time Series Storage with Apache Kudu (incubating)

Time Series Storage with Apache Kudu (incubating) Time Series Storage with Apache Kudu (incubating) Dan Burkert (Committer) dan@cloudera.com @danburkert Tweet about this talk: @getkudu or #kudu 1 Time Series machine metrics event logs sensor telemetry

More information

April 21, 2017 Revision GridDB Reliability and Robustness

April 21, 2017 Revision GridDB Reliability and Robustness April 21, 2017 Revision 1.0.6 GridDB Reliability and Robustness Table of Contents Executive Summary... 2 Introduction... 2 Reliability Features... 2 Hybrid Cluster Management Architecture... 3 Partition

More information

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13 Bigtable A Distributed Storage System for Structured Data Presenter: Yunming Zhang Conglong Li References SOCC 2010 Key Note Slides Jeff Dean Google Introduction to Distributed Computing, Winter 2008 University

More information

NOSQL DATABASE SYSTEMS: DECISION GUIDANCE AND TRENDS. Big Data Technologies: NoSQL DBMS (Decision Guidance) - SoSe

NOSQL DATABASE SYSTEMS: DECISION GUIDANCE AND TRENDS. Big Data Technologies: NoSQL DBMS (Decision Guidance) - SoSe NOSQL DATABASE SYSTEMS: DECISION GUIDANCE AND TRENDS h_da Prof. Dr. Uta Störl Big Data Technologies: NoSQL DBMS (Decision Guidance) - SoSe 2017 163 Performance / Benchmarks Traditional database benchmarks

More information

MongoDB Distributed Write and Read

MongoDB Distributed Write and Read VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui MongoDB Distributed Write and Read Lecturer : Dr. Pavle Mogin SWEN 432 Advanced Database Design and Implementation Advanced

More information

Introduction to Distributed Data Systems

Introduction to Distributed Data Systems Introduction to Distributed Data Systems Serge Abiteboul Ioana Manolescu Philippe Rigaux Marie-Christine Rousset Pierre Senellart Web Data Management and Distribution http://webdam.inria.fr/textbook January

More information

YCSB++ benchmarking tool Performance debugging advanced features of scalable table stores

YCSB++ benchmarking tool Performance debugging advanced features of scalable table stores YCSB++ benchmarking tool Performance debugging advanced features of scalable table stores Swapnil Patil M. Polte, W. Tantisiriroj, K. Ren, L.Xiao, J. Lopez, G.Gibson, A. Fuchs *, B. Rinaldi * Carnegie

More information

Migrating Oracle Databases To Cassandra

Migrating Oracle Databases To Cassandra BY UMAIR MANSOOB Why Cassandra Lower Cost of ownership makes it #1 choice for Big Data OLTP Applications. Unlike Oracle, Cassandra can store structured, semi-structured, and unstructured data. Cassandra

More information

Introduction to Computer Science. William Hsu Department of Computer Science and Engineering National Taiwan Ocean University

Introduction to Computer Science. William Hsu Department of Computer Science and Engineering National Taiwan Ocean University Introduction to Computer Science William Hsu Department of Computer Science and Engineering National Taiwan Ocean University Chapter 9: Database Systems supplementary - nosql You can have data without

More information

Database Architectures

Database Architectures Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/15/15 Agenda Check-in Parallelism and Distributed Databases Technology Research Project Introduction to NoSQL

More information

Final Exam Review 2. Kathleen Durant CS 3200 Northeastern University Lecture 23

Final Exam Review 2. Kathleen Durant CS 3200 Northeastern University Lecture 23 Final Exam Review 2 Kathleen Durant CS 3200 Northeastern University Lecture 23 QUERY EVALUATION PLAN Representation of a SQL Command SELECT {DISTINCT} FROM {WHERE

More information

Megastore: Providing Scalable, Highly Available Storage for Interactive Services & Spanner: Google s Globally- Distributed Database.

Megastore: Providing Scalable, Highly Available Storage for Interactive Services & Spanner: Google s Globally- Distributed Database. Megastore: Providing Scalable, Highly Available Storage for Interactive Services & Spanner: Google s Globally- Distributed Database. Presented by Kewei Li The Problem db nosql complex legacy tuning expensive

More information

CockroachDB on DC/OS. Ben Darnell, CTO, Cockroach Labs

CockroachDB on DC/OS. Ben Darnell, CTO, Cockroach Labs CockroachDB on DC/OS Ben Darnell, CTO, Cockroach Labs Agenda A cloud-native database CockroachDB on DC/OS Why CockroachDB Demo! Cloud-Native Database What is Cloud-Native? Horizontally scalable Individual

More information

NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY

NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY WHAT IS NOSQL? Stands for No-SQL or Not Only SQL. Class of non-relational data storage systems E.g.

More information

HyperDex: Next Generation NoSQL

HyperDex: Next Generation NoSQL Emin Gün Sirer Robert Escriva Bernard Wong CloudPhysics June 28, 2013 1 / 40 From RDBMS to NoSQL RDBMS have difficulty with scalability and performance NoSQL systems emerged to fill the gap 2 / 40 Problems

More information

ICALEPS 2013 Exploring No-SQL Alternatives for ALMA Monitoring System ADC

ICALEPS 2013 Exploring No-SQL Alternatives for ALMA Monitoring System ADC ICALEPS 2013 Exploring No-SQL Alternatives for ALMA Monitoring System Overview The current paradigm (CCL and Relational DataBase) Propose of a new monitor data system using NoSQL Monitoring Storage Requirements

More information

Replication in Distributed Systems

Replication in Distributed Systems Replication in Distributed Systems Replication Basics Multiple copies of data kept in different nodes A set of replicas holding copies of a data Nodes can be physically very close or distributed all over

More information

A Non-Relational Storage Analysis

A Non-Relational Storage Analysis A Non-Relational Storage Analysis Cassandra & Couchbase Alexandre Fonseca, Anh Thu Vu, Peter Grman Cloud Computing - 2nd semester 2012/2013 Universitat Politècnica de Catalunya Microblogging - big data?

More information

Database Evolution. DB NoSQL Linked Open Data. L. Vigliano

Database Evolution. DB NoSQL Linked Open Data. L. Vigliano Database Evolution DB NoSQL Linked Open Data Requirements and features Large volumes of data..increasing No regular data structure to manage Relatively homogeneous elements among them (no correlation between

More information

There is a tempta7on to say it is really used, it must be good

There is a tempta7on to say it is really used, it must be good Notes from reviews Dynamo Evalua7on doesn t cover all design goals (e.g. incremental scalability, heterogeneity) Is it research? Complexity? How general? Dynamo Mo7va7on Normal database not the right fit

More information

Modern Database Concepts

Modern Database Concepts Modern Database Concepts Basic Principles Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz NoSQL Overview Main objective: to implement a distributed state Different objects stored on different

More information

Eventual Consistency Today: Limitations, Extensions and Beyond

Eventual Consistency Today: Limitations, Extensions and Beyond Eventual Consistency Today: Limitations, Extensions and Beyond Peter Bailis and Ali Ghodsi, UC Berkeley - Nomchin Banga Outline Eventual Consistency: History and Concepts How eventual is eventual consistency?

More information

Presented by Nanditha Thinderu

Presented by Nanditha Thinderu Presented by Nanditha Thinderu Enterprise systems are highly distributed and heterogeneous which makes administration a complex task Application Performance Management tools developed to retrieve information

More information

Consistency examples. COS 418: Distributed Systems Precept 5. Themis Melissaris

Consistency examples. COS 418: Distributed Systems Precept 5. Themis Melissaris Consistency examples COS 418: Distributed Systems Precept 5 Themis Melissaris Plan Midterm poll Consistency examples 2 Fill out this poll: http://tinyurl.com/zdeq4lr 3 Linearizability 4 Once read returns

More information

Basic vs. Reliable Multicast

Basic vs. Reliable Multicast Basic vs. Reliable Multicast Basic multicast does not consider process crashes. Reliable multicast does. So far, we considered the basic versions of ordered multicasts. What about the reliable versions?

More information

App Engine: Datastore Introduction

App Engine: Datastore Introduction App Engine: Datastore Introduction Part 1 Another very useful course: https://www.udacity.com/course/developing-scalableapps-in-java--ud859 1 Topics cover in this lesson What is Datastore? Datastore and

More information

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval New Oracle NoSQL Database APIs that Speed Insertion and Retrieval O R A C L E W H I T E P A P E R F E B R U A R Y 2 0 1 6 1 NEW ORACLE NoSQL DATABASE APIs that SPEED INSERTION AND RETRIEVAL Introduction

More information

Indexing Large-Scale Data

Indexing Large-Scale Data Indexing Large-Scale Data Serge Abiteboul Ioana Manolescu Philippe Rigaux Marie-Christine Rousset Pierre Senellart Web Data Management and Distribution http://webdam.inria.fr/textbook November 16, 2010

More information

YCSB++ Benchmarking Tool Performance Debugging Advanced Features of Scalable Table Stores

YCSB++ Benchmarking Tool Performance Debugging Advanced Features of Scalable Table Stores YCSB++ Benchmarking Tool Performance Debugging Advanced Features of Scalable Table Stores Swapnil Patil Milo Polte, Wittawat Tantisiriroj, Kai Ren, Lin Xiao, Julio Lopez, Garth Gibson, Adam Fuchs *, Billie

More information

CGAR: Strong Consistency without Synchronous Replication. Seo Jin Park Advised by: John Ousterhout

CGAR: Strong Consistency without Synchronous Replication. Seo Jin Park Advised by: John Ousterhout CGAR: Strong Consistency without Synchronous Replication Seo Jin Park Advised by: John Ousterhout Improved update performance of storage systems with master-back replication Fast: updates complete before

More information

Scaling with mongodb

Scaling with mongodb Scaling with mongodb Ross Lawley Python Engineer @ 10gen Web developer since 1999 Passionate about open source Agile methodology email: ross@10gen.com twitter: RossC0 Today's Talk Scaling Understanding

More information

Distributed Systems. 19. Spanner. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Systems. 19. Spanner. Paul Krzyzanowski. Rutgers University. Fall 2017 Distributed Systems 19. Spanner Paul Krzyzanowski Rutgers University Fall 2017 November 20, 2017 2014-2017 Paul Krzyzanowski 1 Spanner (Google s successor to Bigtable sort of) 2 Spanner Take Bigtable and

More information

CMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS

CMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB s C. Faloutsos A. Pavlo Lecture#23: Distributed Database Systems (R&G ch. 22) Administrivia Final Exam Who: You What: R&G Chapters 15-22

More information

Comparing SQL and NOSQL databases

Comparing SQL and NOSQL databases COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2014 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations

More information

arxiv: v1 [cs.dc] 9 Jan 2018

arxiv: v1 [cs.dc] 9 Jan 2018 Search on Secondary Attributes in Geo-Distributed Systems Dimitrios Vasilas 1,2, Marc Shapiro 1, and Bradley King 2 arxiv:1801.02974v1 [cs.dc] 9 Jan 2018 1 Inria & Sorbonne Universités, UPMC Univ Paris

More information

6.UAP Final Report: Replication in H-Store

6.UAP Final Report: Replication in H-Store 6.UAP Final Report: Replication in H-Store Kathryn Siegel May 14, 2015 This paper describes my research efforts implementing replication in H- Store. I first provide general background on the H-Store project,

More information

Chapter 11 - Data Replication Middleware

Chapter 11 - Data Replication Middleware Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 dessloch@informatik.uni-kl.de Chapter 11 - Data Replication Middleware Motivation Replication: controlled

More information

CSE 344 JULY 9 TH NOSQL

CSE 344 JULY 9 TH NOSQL CSE 344 JULY 9 TH NOSQL ADMINISTRATIVE MINUTIAE HW3 due Wednesday tests released actual_time should have 0s not NULLs upload new data file or use UPDATE to change 0 ~> NULL Extra OOs on Mondays 5-7pm in

More information

SQL, NoSQL, MongoDB. CSE-291 (Cloud Computing) Fall 2016 Gregory Kesden

SQL, NoSQL, MongoDB. CSE-291 (Cloud Computing) Fall 2016 Gregory Kesden SQL, NoSQL, MongoDB CSE-291 (Cloud Computing) Fall 2016 Gregory Kesden SQL Databases Really better called Relational Databases Key construct is the Relation, a.k.a. the table Rows represent records Columns

More information

Integrity in Distributed Databases

Integrity in Distributed Databases Integrity in Distributed Databases Andreas Farella Free University of Bozen-Bolzano Table of Contents 1 Introduction................................................... 3 2 Different aspects of integrity.....................................

More information

MDHIM: A Parallel Key/Value Store Framework for HPC

MDHIM: A Parallel Key/Value Store Framework for HPC MDHIM: A Parallel Key/Value Store Framework for HPC Hugh Greenberg 7/6/2015 LA-UR-15-25039 HPC Clusters Managed by a job scheduler (e.g., Slurm, Moab) Designed for running user jobs Difficult to run system

More information

Prophecy: Using History for High Throughput Fault Tolerance

Prophecy: Using History for High Throughput Fault Tolerance Prophecy: Using History for High Throughput Fault Tolerance Siddhartha Sen Joint work with Wyatt Lloyd and Mike Freedman Princeton University Non crash failures happen Non crash failures happen Model as

More information

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce Parallel Programming Principle and Practice Lecture 10 Big Data Processing with MapReduce Outline MapReduce Programming Model MapReduce Examples Hadoop 2 Incredible Things That Happen Every Minute On The

More information

Benchmarking Cloud Serving Systems with YCSB 詹剑锋 2012 年 6 月 27 日

Benchmarking Cloud Serving Systems with YCSB 詹剑锋 2012 年 6 月 27 日 Benchmarking Cloud Serving Systems with YCSB 詹剑锋 2012 年 6 月 27 日 Motivation There are many cloud DB and nosql systems out there PNUTS BigTable HBase, Hypertable, HTable Megastore Azure Cassandra Amazon

More information

Faster HBase queries. Introducing hindex Secondary indexes for HBase. ApacheCon North America Rajeshbabu Chintaguntla

Faster HBase queries. Introducing hindex Secondary indexes for HBase. ApacheCon North America Rajeshbabu Chintaguntla Security Level: Faster HBase queries Introducing hindex Secondary indexes for HBase ApacheCon North America 2014 www.huawei.com Rajeshbabu Chintaguntla rajeshbabu@apache.org HUAWEI TECHNOLOGIES CO., LTD.

More information

Percolator. Large-Scale Incremental Processing using Distributed Transactions and Notifications. D. Peng & F. Dabek

Percolator. Large-Scale Incremental Processing using Distributed Transactions and Notifications. D. Peng & F. Dabek Percolator Large-Scale Incremental Processing using Distributed Transactions and Notifications D. Peng & F. Dabek Motivation Built to maintain the Google web search index Need to maintain a large repository,

More information

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi 1 Lecture Notes 1 Basic Concepts Anand Tripathi CSci 8980 Operating Systems Anand Tripathi CSci 8980 1 Distributed Systems A set of computers (hosts or nodes) connected through a communication network.

More information

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs 1 Anand Tripathi CSci 8980 Operating Systems Lecture Notes 1 Basic Concepts Distributed Systems A set of computers (hosts or nodes) connected through a communication network. Nodes may have different speeds

More information

MillWheel:Fault Tolerant Stream Processing at Internet Scale. By FAN Junbo

MillWheel:Fault Tolerant Stream Processing at Internet Scale. By FAN Junbo MillWheel:Fault Tolerant Stream Processing at Internet Scale By FAN Junbo Introduction MillWheel is a low latency data processing framework designed by Google at Internet scale. Motived by Google Zeitgeist

More information

Relational databases

Relational databases COSC 6397 Big Data Analytics NoSQL databases Edgar Gabriel Spring 2017 Relational databases Long lasting industry standard to store data persistently Key points concurrency control, transactions, standard

More information

NoSQL systems: sharding, replication and consistency. Riccardo Torlone Università Roma Tre

NoSQL systems: sharding, replication and consistency. Riccardo Torlone Università Roma Tre NoSQL systems: sharding, replication and consistency Riccardo Torlone Università Roma Tre Data distribution NoSQL systems: data distributed over large clusters Aggregate is a natural unit to use for data

More information