Scalable Low-Latency Indexes for a Key-Value Store Ankita Kejriwal

1 Scalable Low-Latency Indexes for a Key-Value Store Ankita Kejriwal With Arjun Gopalan, Ashish Gupta, Zhihao Jia, Stephen Yang and John Ousterhout

2 Conjecture Can a key value store support strongly consistent secondary indexes while operating at low latency and large scale? SLIK Slide 2

3 Scalable Low-latency Indexes for a Key-value Store: SLIK
- Enables multiple secondary keys for each object
- Allows lookups and range queries on these keys
- Key design features:
  - Scalability using independent partitioning
  - Strong consistency using an ordered write approach
- Implemented in RAMCloud: a low-latency, DRAM-based, distributed key-value store
- Performance: summary of results
  - Scalability: linear throughput increase with increasing number of partitions
  - Low latency: 11-13 µs indexed reads, µs durable writes/overwrites
  - Latency approximately 2x that of non-indexed reads and writes

4 Talk Outline
- Motivation
- Design
- Performance
- Related Work
- Summary

7 Motivation
- Traditional RDBMSs: MySQL
- NoSQL systems: + scalability, - data models, - consistency
- Properties marked in the diagram: + consistency, + data models, + low latency
- Systems shown: HBase, Espresso, RAMCloud, Yesquel, Spanner, Megastore, HyperDex, MongoDB, H-Store, Memcached, PNUTS, Tao

10 Design
- Data model
- Scalability
- Strong consistency
- Storage
- Durability
- Availability

13 Scalability
- Nearly constant low latency irrespective of the server span
- Linear increase in throughput with the server span

14 Index Partitioning: Colocation
- Colocate index entries and objects
- One of the keys is used to partition the objects and indexes
- Example layout (indexlet entries are index key → primary key; tablet rows are primary key, index key, value):
  - Server 1: indexlet {a → 2, n → 3, q → 1}; tablet {1 q rose, 2 a tulip, 3 n violet}
  - Server 2: indexlet {e → 4, g → 6, v → 5}; tablet {4 e clover, 5 v daily, 6 g iris}
  - Server 3: indexlet {b → 8, m → 7}; tablet {7 m lily, 8 b dahlia}

15 Index Partitioning: Colocation
- Colocate index entries and objects
- One of the keys is used to partition the objects and indexes
- No association between index partitions and index key ranges
- Same example layout as slide 14
- Metadata:
  - tablet & indexlet w/ pk 1 to 3: S1
  - tablet & indexlet w/ pk 4 to 6: S2
  - tablet & indexlet w/ pk >= 7: S3

16 Index Partitioning: Colocation
- Client query: objects with index key between m and q
- Server 1: indexlet {a → 2, n → 3, q → 1}; tablet {1 q rose, 2 a tulip, 3 n violet}
- Server 2: indexlet {e → 4, g → 6, v → 5}; tablet {4 e clover, 5 v daily, 6 g iris}
- Server 3: indexlet {b → 8, m → 7}; tablet {7 m lily, 8 b dahlia}

19 Index Partitioning: Colocation
- Same query and layout as slide 16
- The matching entries (m → 7, n → 3, q → 1) sit on Server 3 and Server 1, but because indexlets are not organized by index key range, the client cannot know this without asking every server
- Not scalable!
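The fan-out behind "Not scalable!" can be sketched in a few lines of Python. This is a minimal in-memory model under stated assumptions, not RAMCloud or SLIK code: the names `SERVERS`, `local_indexlet`, and `colocated_range_query` are hypothetical, and the data is the example from the colocation slides.

```python
# Hypothetical model of the colocation layout: each server stores a tablet
# (primary key -> (index key, value)); its indexlet covers only its own objects.
SERVERS = {
    "S1": {1: ("q", "rose"), 2: ("a", "tulip"), 3: ("n", "violet")},
    "S2": {4: ("e", "clover"), 5: ("v", "daily"), 6: ("g", "iris")},
    "S3": {7: ("m", "lily"), 8: ("b", "dahlia")},
}


def local_indexlet(tablet):
    """Build the colocated indexlet (index key -> primary key) for one tablet."""
    return {sk: pk for pk, (sk, _value) in tablet.items()}


def colocated_range_query(lo, hi):
    """Return (values, servers contacted) for index keys in [lo, hi]."""
    contacted = []
    results = []
    for name, tablet in SERVERS.items():
        # Indexlets are split by primary key, so any server may hold a match:
        # the query has to be sent to all of them.
        contacted.append(name)
        indexlet = local_indexlet(tablet)
        for sk, pk in indexlet.items():
            if lo <= sk <= hi:
                results.append(tablet[pk][1])
    return sorted(results), contacted


values, contacted = colocated_range_query("m", "q")
print(values, contacted)
```

In this model the number of servers contacted equals the number of partitions regardless of how few entries match, which is the cost the slide calls out.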

20 Index Partitioning: Independent
- Partition each index and table independently
- Partition each index according to sort order for that index
- Example layout (indexlet entries are index key → primary key; tablet rows are primary key, index key, value):
  - Server 4: indexlet {a → 2, b → 8, e → 4, g → 6}
  - Server 5: indexlet {m → 7, n → 3, q → 1, v → 5}
  - Server 1: tablet {1 q rose, 2 a tulip, 3 n violet}
  - Server 2: tablet {4 e clover, 5 v daily, 6 g iris}
  - Server 3: tablet {7 m lily, 8 b dahlia}

21 Index Partitioning: Independent
- Partition each index and table independently
- Partition each index according to sort order for that index
- Same example layout as slide 20
- Metadata:
  - tablet w/ pk 1 to 3: S1
  - tablet w/ pk 4 to 6: S2
  - tablet w/ pk >= 7: S3
  - indexlet w/ sk a to g: S4
  - indexlet w/ sk >= h: S5

22 Index Partitioning: Independent
- Client query: objects with index key between m and q
- Server 4: indexlet {a → 2, b → 8, e → 4, g → 6}
- Server 5: indexlet {m → 7, n → 3, q → 1, v → 5}
- Server 1: tablet {1 q rose, 2 a tulip, 3 n violet}
- Server 2: tablet {4 e clover, 5 v daily, 6 g iris}
- Server 3: tablet {7 m lily, 8 b dahlia}

25 Index Partitioning: Independent
- Same query and layout as slide 22
- Only the indexlet on Server 5 covers the range m to q; it yields primary keys 7, 3, and 1, so the objects are then read from Server 3 and Server 1 only
- Scalable!
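For comparison, here is a sketch of the same query under independent partitioning. It is an illustrative model only: the dictionaries, the `owner` helper, and `independent_range_query` are hypothetical names, and the partition boundaries are the ones listed in the metadata on the independent-partitioning slides.

```python
import bisect

# Tablets are partitioned by primary key (pk 1-3: S1, 4-6: S2, >= 7: S3).
TABLETS = {
    "S1": {1: ("q", "rose"), 2: ("a", "tulip"), 3: ("n", "violet")},
    "S2": {4: ("e", "clover"), 5: ("v", "daily"), 6: ("g", "iris")},
    "S3": {7: ("m", "lily"), 8: ("b", "dahlia")},
}
TABLET_STARTS = [1, 4, 7]
TABLET_SERVERS = ["S1", "S2", "S3"]

# Indexlets are partitioned by secondary key (sk a to g: S4, sk >= h: S5).
INDEXLETS = {
    "S4": {"a": 2, "b": 8, "e": 4, "g": 6},
    "S5": {"m": 7, "n": 3, "q": 1, "v": 5},
}
INDEXLET_STARTS = ["a", "h"]
INDEXLET_SERVERS = ["S4", "S5"]


def owner(starts, servers, key):
    """Find the server whose key range contains `key`."""
    return servers[max(bisect.bisect_right(starts, key) - 1, 0)]


def independent_range_query(lo, hi):
    """Return (values, servers contacted) for index keys in [lo, hi]."""
    first = max(bisect.bisect_right(INDEXLET_STARTS, lo) - 1, 0)
    last = max(bisect.bisect_right(INDEXLET_STARTS, hi) - 1, 0)
    contacted = []
    pks = []
    # Step 1: ask only the indexlets whose key range overlaps [lo, hi].
    for server in INDEXLET_SERVERS[first:last + 1]:
        contacted.append(server)
        for sk, pk in sorted(INDEXLETS[server].items()):
            if lo <= sk <= hi:
                pks.append(pk)
    # Step 2: fetch each object from the tablet that owns its primary key.
    values = []
    for pk in pks:
        server = owner(TABLET_STARTS, TABLET_SERVERS, pk)
        if server not in contacted:
            contacted.append(server)
        values.append(TABLETS[server][pk][1])
    return values, contacted


values, contacted = independent_range_query("m", "q")
print(values, contacted)
```

Here the query touches one indexlet server plus the tablet servers that actually hold matching objects; Server 2 and Server 4 are never contacted.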

27 Scalability
- Nearly constant low latency irrespective of the server span
- Linear increase in throughput with the server span
- Solution: use independent partitioning
- But: indexed object writes become distributed operations
- Potential consistency issues between indexes and objects

30 Consistency Properties
1. If an object contains a given secondary key, then an index lookup with that key will return the object
2. If an object is returned by an index lookup, then this object contains a secondary key for that index within the specified range

31 Consistency Properties (example for property 1)
- Students: Bob, Peggy, Frank, Alice, Carol, Trent

32 Consistency Properties (example for property 1)
- Query: students with name between a and d?
- Result shown: Alice, Carol (?), which omits Bob

35 Consistency Properties (example for property 2)
- Query: students with name between a and d?
- Result shown: Alice, Bob, Carol, Peggy (?), which includes Peggy although she is outside the range

36 Consistency Properties
- Query: students with name between a and d?
- Result: Alice, Bob, Carol

37 Consistency
- Consistency properties:
  1. If an object contains a given secondary key, then an index lookup with that key will return the object
  2. If an object is returned by an index lookup, then this object contains a secondary key for that index within the specified range
- Solution:
  - Longer index lifespan (via ordered writes)
  - Object data is ground truth and index entries serve as hints

38 Consistency (ordered write example)
- Index entry: Bob → Foo
- Object: Foo, with secondary key Bob

39 Consistency (ordered write example)
- 1. Add new index entry
- Index entries: Bob → Foo, Sam → Foo
- Object: Foo, with secondary key Bob

40 Consistency (ordered write example)
- 1. Add new index entry
- 2. Modify object
- Index entries: Bob → Foo, Sam → Foo
- Object: Foo, with secondary key Sam

41 Consistency (ordered write example)
- 1. Add new index entry
- 2. Modify object
- 3. Remove old index entry
- Index entry: Sam → Foo
- Object: Foo, with secondary key Sam
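The three ordered steps can be sketched as follows. This is a single-process illustration under stated assumptions, not the RAMCloud implementation: `ordered_write` and `entries_for` are hypothetical names, the index is modeled as a dict from secondary key to a set of primary keys, and an object is reduced to its secondary key.

```python
def entries_for(index, pk):
    """List the secondary keys whose index entries currently point at pk."""
    return sorted(sk for sk, pks in index.items() if pk in pks)


def ordered_write(index, objects, pk, new_sk):
    """Overwrite object pk with a new secondary key, recording each step."""
    trace = []
    old_sk = objects.get(pk)

    # 1. Add the new index entry before touching the object.
    index.setdefault(new_sk, set()).add(pk)
    trace.append(("add new index entry", entries_for(index, pk), objects.get(pk)))

    # 2. Modify the object itself (the object data is the ground truth).
    objects[pk] = new_sk
    trace.append(("modify object", entries_for(index, pk), objects[pk]))

    # 3. Only now remove the old index entry.
    if old_sk is not None and old_sk != new_sk:
        index[old_sk].discard(pk)
        if not index[old_sk]:
            del index[old_sk]
    trace.append(("remove old index entry", entries_for(index, pk), objects[pk]))
    return trace


index = {"Bob": {"Foo"}}
objects = {"Foo": "Bob"}
for step in ordered_write(index, objects, "Foo", "Sam"):
    print(step)
```

At every point in the trace there is an index entry for the key the object currently holds, so the index entry always outlives the object version it describes; the extra entries that exist in between are the stale hints a reader has to filter out.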

42 Consistency (index entry lifespan)
- Solution: longer index lifespan (via ordered writes); object data is ground truth and index entries serve as hints
- [Figure: timeline with the events write object, modify object, remove object; it shows the lifespans of index entries Bob → Foo and Sam → Foo against the object versions Foo (Bob...) and Foo (Sam...)]

44 Consistency (index entry lifespan)
- [Figure: same timeline as slide 42, with three commit points marked along it]
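Treating index entries as hints means a lookup has to check each candidate against the object before returning it. The sketch below shows one way to do that; `lookup_range` is a hypothetical name and the data structures match no real API (index: secondary key to set of primary keys; objects: primary key to current secondary key).

```python
def lookup_range(index, objects, lo, hi):
    """Return primary keys of objects whose secondary key lies in [lo, hi]."""
    results = []
    for sk in sorted(index):
        if not (lo <= sk <= hi):
            continue
        for pk in sorted(index[sk]):
            actual = objects.get(pk)  # None if the object no longer exists
            # The index entry is only a hint: keep the object only if its
            # current secondary key really falls inside the requested range.
            if actual is not None and lo <= actual <= hi and pk not in results:
                results.append(pk)
    return results


# State in the middle of an ordered write: both entries exist, and the
# object still holds its old key Bob.
index = {"Bob": {"Foo"}, "Sam": {"Foo"}}
objects = {"Foo": "Bob"}
print(lookup_range(index, objects, "A", "C"))  # Bob is in range
print(lookup_range(index, objects, "S", "T"))  # Sam entry is a stale hint
```

Because extra index entries are filtered out by this check, and missing ones are avoided by the write ordering, both consistency properties can hold without a distributed transaction between the indexlet and the tablet.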

47 Performance: Questions
- Does SLIK provide low latency?
- Does SLIK provide scalability?
- How does the performance of indexing with SLIK compare to other state-of-the-art systems?

48 Performance: Systems for Comparison
- H-Store:
  - Main-memory database
  - Data (and indexes) partitioned based on a specified attribute
  - Many parameters for tuning
  - Got assistance from developers to tune for each test
  - Examples: txn_incoming_delay, partitioning column
- HyperDex:
  - Spaces containing objects
  - Data (and indexes) partitioned using hyperspace hashing
  - Each index contains all object data
  - Designed to use disk for storage

49 Hardware SLIK Slide 49

50 Latency
- Experiments:
  1. Lookups: table with single secondary index
  2. Overwrites: table with single secondary index
  3. Overwrites: varying number of secondary indexes
- Configuration:
  - Single client
  - Single partition for table and (each) index
  - Object: 30 B pk, 30 B sk, 100 B value
  - SLIK: three-way replication to durable backups
  - H-Store: no replication, durability disabled, single server

51 Lookup Latency
[Figure (a): lookup latency (µs) vs. size of index (# objects); series: H-Store, SLIK TCP, SLIK]

53 Overwrite Latency
[Figure (b): overwrite latency (µs) vs. size of index (# objects); series: H-Store, SLIK TCP, SLIK]

54 Multiple Secondary Indexes
[Figure: overwrite latency (µs) vs. number of indexes; series: H-Store via PK, H-Store via SK, SLIK TCP, SLIK]

57 Scalability
- Compare: (a) partitioning approaches, (b) systems
- Experiments:
  1. Lookup throughput with increasing number of partitions
  2. Lookup latency with increasing number of partitions
- Configuration:
  - Single table with one secondary index
  - Table and index partitioned across servers
  - Object: 30 B pk, 30 B sk, 100 B value
  - Throughput experiments: loaded system
  - Latency experiments: unloaded system

58 Scalability: Throughput
[Figure: throughput (10^3 lookups/sec) vs. number of indexlets; series: independent partitioning, colocation; data label shown: 3092]

59 Scalability: Throughput
[Figure: throughput (10^3 lookups/sec) vs. number of servers; series: H-Store, SLIK TCP, SLIK; data label shown: 4248]

60 Scalability: Latency
[Figure: lookup latency (µs) vs. number of servers; series: colocation and independent partitioning, each at two sizes]

61 Scalability: Latency
[Figure: average latency per lookup (µs) vs. number of indexlets; series: H-Store, SLIK TCP, SLIK]

63 Related Work
- Data storage systems, compared along:
  - Data model (spectrum from key-value to relational)
  - Consistency (spectrum from eventual to strong)
  - Performance: latency and/or throughput

64 Current Web Scale Datastores
[Figure: consistency level (eventual; causal, SI, define your own; strong; stronger is better) vs. approximate read/write latency (from 100 s down to 10 µs; lower is better). Systems added over successive builds: PNUTS, CouchDB, Espresso, Tao (slide 65); MongoDB, HBase, Spanner, H-Store, HyperDex (slide 66); Cassandra (slide 67); SLIK (slide 68)]

70 Conjecture
- Can a key-value store support strongly consistent secondary indexes while operating at low latency and large scale?

71 Summary
- A key-value store can support strongly consistent secondary indexes while operating at low latency and large scale.
- Lookups and range queries on secondary keys
- By using ordered writes and treating indexes as hints
- By using approaches that have minimal overheads we get: µs lookups and µs (over)writes
- By using independent partitioning we get: linear throughput increase and minimal impact on latency as the scale increases

76 Thank you!
- Code available free and open source: github.com/platformlab/ramcloud
- My papers and other information at: stanford.edu/~ankitak
- I can be reached at: ankitak@cs.stanford.edu


More information

Final Exam Review 2. Kathleen Durant CS 3200 Northeastern University Lecture 23

Final Exam Review 2. Kathleen Durant CS 3200 Northeastern University Lecture 23 Final Exam Review 2 Kathleen Durant CS 3200 Northeastern University Lecture 23 QUERY EVALUATION PLAN Representation of a SQL Command SELECT {DISTINCT} FROM {WHERE

More information

App Engine: Datastore Introduction

App Engine: Datastore Introduction App Engine: Datastore Introduction Part 1 Another very useful course: https://www.udacity.com/course/developing-scalableapps-in-java--ud859 1 Topics cover in this lesson What is Datastore? Datastore and

More information

SCALABLE CONSISTENCY AND TRANSACTION MODELS

SCALABLE CONSISTENCY AND TRANSACTION MODELS Data Management in the Cloud SCALABLE CONSISTENCY AND TRANSACTION MODELS 69 Brewer s Conjecture Three properties that are desirable and expected from realworld shared-data systems C: data consistency A:

More information

5/1/17. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414

5/1/17. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414 Announcements Database Systems CSE 414 Lecture 15: NoSQL & JSON (mostly not in textbook only Ch 11.1) 1 Homework 4 due tomorrow night [No Web Quiz 5] Midterm grading hopefully finished tonight post online

More information

Database Availability and Integrity in NoSQL. Fahri Firdausillah [M ]

Database Availability and Integrity in NoSQL. Fahri Firdausillah [M ] Database Availability and Integrity in NoSQL Fahri Firdausillah [M031010012] What is NoSQL Stands for Not Only SQL Mostly addressing some of the points: nonrelational, distributed, horizontal scalable,

More information

Introduction to Computer Science. William Hsu Department of Computer Science and Engineering National Taiwan Ocean University

Introduction to Computer Science. William Hsu Department of Computer Science and Engineering National Taiwan Ocean University Introduction to Computer Science William Hsu Department of Computer Science and Engineering National Taiwan Ocean University Chapter 9: Database Systems supplementary - nosql You can have data without

More information

Big Data Development CASSANDRA NoSQL Training - Workshop. November 20 to (5 days) 9 am to 5 pm HOTEL DUBAI GRAND DUBAI

Big Data Development CASSANDRA NoSQL Training - Workshop. November 20 to (5 days) 9 am to 5 pm HOTEL DUBAI GRAND DUBAI Big Data Development CASSANDRA NoSQL Training - Workshop November 20 to 24 2016 (5 days) 9 am to 5 pm HOTEL DUBAI GRAND DUBAI ISIDUS TECH TEAM FZE PO Box 9798 Dubai UAE, email training-coordinator@isidusnet

More information

YeSQL: Battling the NoSQL Hype Cycle with Postgres

YeSQL: Battling the NoSQL Hype Cycle with Postgres YeSQL: Battling the NoSQL Hype Cycle with Postgres BRUCE MOMJIAN This talk explores how new NoSQL technologies are unique, and how existing relational database systems like Postgres are adapting to handle

More information

Tools for Social Networking Infrastructures

Tools for Social Networking Infrastructures Tools for Social Networking Infrastructures 1 Cassandra - a decentralised structured storage system Problem : Facebook Inbox Search hundreds of millions of users distributed infrastructure inbox changes

More information

MDCC MULTI DATA CENTER CONSISTENCY. amplab. Tim Kraska, Gene Pang, Michael Franklin, Samuel Madden, Alan Fekete

MDCC MULTI DATA CENTER CONSISTENCY. amplab. Tim Kraska, Gene Pang, Michael Franklin, Samuel Madden, Alan Fekete MDCC MULTI DATA CENTER CONSISTENCY Tim Kraska, Gene Pang, Michael Franklin, Samuel Madden, Alan Fekete gpang@cs.berkeley.edu amplab MOTIVATION 2 3 June 2, 200: Rackspace power outage of approximately 0

More information

Time Series Storage with Apache Kudu (incubating)

Time Series Storage with Apache Kudu (incubating) Time Series Storage with Apache Kudu (incubating) Dan Burkert (Committer) dan@cloudera.com @danburkert Tweet about this talk: @getkudu or #kudu 1 Time Series machine metrics event logs sensor telemetry

More information

PNUTS and Weighted Voting. Vijay Chidambaram CS 380 D (Feb 8)

PNUTS and Weighted Voting. Vijay Chidambaram CS 380 D (Feb 8) PNUTS and Weighted Voting Vijay Chidambaram CS 380 D (Feb 8) PNUTS Distributed database built by Yahoo Paper describes a production system Goals: Scalability Low latency, predictable latency Must handle

More information

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( ) Guide: CIS 601 Graduate Seminar Presented By: Dr. Sunnie S. Chung Dhruv Patel (2652790) Kalpesh Sharma (2660576) Introduction Background Parallel Data Warehouse (PDW) Hive MongoDB Client-side Shared SQL

More information

5/2/16. Announcements. NoSQL Motivation. The New Hipster: NoSQL. Serverless. What is the Problem? Database Systems CSE 414

5/2/16. Announcements. NoSQL Motivation. The New Hipster: NoSQL. Serverless. What is the Problem? Database Systems CSE 414 Announcements Database Systems CSE 414 Lecture 16: NoSQL and JSon Current assignments: Homework 4 due tonight Web Quiz 6 due next Wednesday [There is no Web Quiz 5 Today s lecture: JSon The book covers

More information

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL Building High Performance Apps using NoSQL Swami Sivasubramanian General Manager, AWS NoSQL Building high performance apps There is a lot to building high performance apps Scalability Performance at high

More information

Introduction to NoSQL Databases

Introduction to NoSQL Databases Introduction to NoSQL Databases Roman Kern KTI, TU Graz 2017-10-16 Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 1 / 31 Introduction Intro Why NoSQL? Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 2 / 31 Introduction

More information

Database Systems CSE 414

Database Systems CSE 414 Database Systems CSE 414 Lecture 16: NoSQL and JSon CSE 414 - Spring 2016 1 Announcements Current assignments: Homework 4 due tonight Web Quiz 6 due next Wednesday [There is no Web Quiz 5] Today s lecture:

More information

CIB Session 12th NoSQL Databases Structures

CIB Session 12th NoSQL Databases Structures CIB Session 12th NoSQL Databases Structures By: Shahab Safaee & Morteza Zahedi Software Engineering PhD Email: safaee.shx@gmail.com, morteza.zahedi.a@gmail.com cibtrc.ir cibtrc cibtrc 2 Agenda What is

More information

Scaling for Humongous amounts of data with MongoDB

Scaling for Humongous amounts of data with MongoDB Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director, EMEA alvin@10gen.com @jonnyeight alvinonmongodb.com From here... http://bit.ly/ot71m4 ...to here... http://bit.ly/oxcsis

More information

Data-Intensive Distributed Computing

Data-Intensive Distributed Computing Data-Intensive Distributed Computing CS 451/651 (Fall 2018) Part 7: Mutable State (2/2) November 13, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These slides are

More information

Advanced Data Management Technologies

Advanced Data Management Technologies ADMT 2017/18 Unit 15 J. Gamper 1/44 Advanced Data Management Technologies Unit 15 Introduction to NoSQL J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE ADMT 2017/18 Unit 15

More information

Latest Trends in Database Technology NoSQL and Beyond

Latest Trends in Database Technology NoSQL and Beyond Latest Trends in Database Technology NoSQL and Beyond Sebas>an Marsching www.aquenos.com Why we want more than SQL Performance / Data Size Opera>onal Costs Availability 2 NoSQL NoSQL Not Only SQL 3 NoSQL

More information

Beyond Relational Databases: MongoDB, Redis & ClickHouse. Marcos Albe - Principal Support Percona

Beyond Relational Databases: MongoDB, Redis & ClickHouse. Marcos Albe - Principal Support Percona Beyond Relational Databases: MongoDB, Redis & ClickHouse Marcos Albe - Principal Support Engineer @ Percona Introduction MySQL everyone? Introduction Redis? OLAP -vs- OLTP Image credits: 451 Research (https://451research.com/state-of-the-database-landscape)

More information

Final Exam Logistics. CS 133: Databases. Goals for Today. Some References Used. Final exam take-home. Same resources as midterm

Final Exam Logistics. CS 133: Databases. Goals for Today. Some References Used. Final exam take-home. Same resources as midterm Final Exam Logistics CS 133: Databases Fall 2018 Lec 25 12/06 NoSQL Final exam take-home Available: Friday December 14 th, 4:00pm in Olin Due: Monday December 17 th, 5:15pm Same resources as midterm Except

More information

Part I What are Databases?

Part I What are Databases? Part I 1 Overview & Motivation 2 Architectures 3 Areas of Application 4 History Saake Database Concepts Last Edited: April 2019 1 1 Educational Objective for Today... Motivation for using database systems

More information

Performance Evaluation of NoSQL Databases

Performance Evaluation of NoSQL Databases Performance Evaluation of NoSQL Databases A Case Study - John Klein, Ian Gorton, Neil Ernst, Patrick Donohoe, Kim Pham, Chrisjan Matser February 2015 PABS '15: Proceedings of the 1st Workshop on Performance

More information

Realtime visitor analysis with Couchbase and Elasticsearch

Realtime visitor analysis with Couchbase and Elasticsearch Realtime visitor analysis with Couchbase and Elasticsearch Jeroen Reijn @jreijn #nosql13 About me Jeroen Reijn Software engineer Hippo @jreijn http://blog.jeroenreijn.com About Hippo Visitor Analysis OneHippo

More information

Cassandra - A Decentralized Structured Storage System. Avinash Lakshman and Prashant Malik Facebook

Cassandra - A Decentralized Structured Storage System. Avinash Lakshman and Prashant Malik Facebook Cassandra - A Decentralized Structured Storage System Avinash Lakshman and Prashant Malik Facebook Agenda Outline Data Model System Architecture Implementation Experiments Outline Extension of Bigtable

More information

References. NoSQL Motivation. Why NoSQL as the Solution? NoSQL Key Feature Decisions. CSE 444: Database Internals

References. NoSQL Motivation. Why NoSQL as the Solution? NoSQL Key Feature Decisions. CSE 444: Database Internals References SE 444: atabase Internals Scalable SQL and NoSQL ata Stores, Rick attell, SIGMO Record, ecember 2010 (Vol. 39, No. 4) ynamo: mazon s Highly vailable Key-value Store. y Giuseppe eandia et. al.

More information

JANUARY 20, 2016, SAN JOSE, CA. Microsoft. Microsoft SQL Hekaton Towards Large Scale Use of PM for In-memory Databases

JANUARY 20, 2016, SAN JOSE, CA. Microsoft. Microsoft SQL Hekaton Towards Large Scale Use of PM for In-memory Databases JANUARY 20, 2016, SAN JOSE, CA PRESENTATION Cristian TITLE Diaconu GOES HERE Microsoft Microsoft SQL Hekaton Towards Large Scale Use of PM for In-memory Databases What this talk is about Trying to answer

More information

ITG Software Engineering

ITG Software Engineering Introduction to MongoDB Course ID: Page 1 Last Updated 12/15/2014 MongoDB for Developers Course Overview: In this 3 day class students will start by learning how to install and configure MongoDB on a Mac

More information

Distributed Data Analytics Partitioning

Distributed Data Analytics Partitioning G-3.1.09, Campus III Hasso Plattner Institut Different mechanisms but usually used together Distributing Data Replication vs. Replication Store copies of the same data on several nodes Introduces redundancy

More information

How do we build TiDB. a Distributed, Consistent, Scalable, SQL Database

How do we build TiDB. a Distributed, Consistent, Scalable, SQL Database How do we build TiDB a Distributed, Consistent, Scalable, SQL Database About me LiuQi ( 刘奇 ) JD / WandouLabs / PingCAP Co-founder / CEO of PingCAP Open-source hacker / Infrastructure software engineer

More information

At-most-once Algorithm for Linearizable RPC in Distributed Systems

At-most-once Algorithm for Linearizable RPC in Distributed Systems At-most-once Algorithm for Linearizable RPC in Distributed Systems Seo Jin Park 1 1 Stanford University Abstract In a distributed system, like RAM- Cloud, supporting automatic retry of RPC, it is tricky

More information

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis 1 NoSQL So-called NoSQL systems offer reduced functionalities compared to traditional Relational DBMSs, with the aim of achieving

More information

Eventual Consistency 1

Eventual Consistency 1 Eventual Consistency 1 Readings Werner Vogels ACM Queue paper http://queue.acm.org/detail.cfm?id=1466448 Dynamo paper http://www.allthingsdistributed.com/files/ amazon-dynamo-sosp2007.pdf Apache Cassandra

More information

PRESENTATION TITLE GOES HERE. Understanding Architectural Trade-offs in Object Storage Technologies

PRESENTATION TITLE GOES HERE. Understanding Architectural Trade-offs in Object Storage Technologies Object Storage 201 PRESENTATION TITLE GOES HERE Understanding Architectural Trade-offs in Object Storage Technologies SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA

More information

Consistency in Distributed Storage Systems. Mihir Nanavati March 4 th, 2016

Consistency in Distributed Storage Systems. Mihir Nanavati March 4 th, 2016 Consistency in Distributed Storage Systems Mihir Nanavati March 4 th, 2016 Today Overview of distributed storage systems CAP Theorem About Me Virtualization/Containers, CPU microarchitectures/caches, Network

More information

Database Solution in Cloud Computing

Database Solution in Cloud Computing Database Solution in Cloud Computing CERC liji@cnic.cn Outline Cloud Computing Database Solution Our Experiences in Database Cloud Computing SaaS Software as a Service PaaS Platform as a Service IaaS Infrastructure

More information

Low-Latency Datacenters. John Ousterhout Platform Lab Retreat May 29, 2015

Low-Latency Datacenters. John Ousterhout Platform Lab Retreat May 29, 2015 Low-Latency Datacenters John Ousterhout Platform Lab Retreat May 29, 2015 Datacenters: Scale and Latency Scale: 1M+ cores 1-10 PB memory 200 PB disk storage Latency: < 0.5 µs speed-of-light delay Most

More information

Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 NoSQL Databases

Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 NoSQL Databases Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 NoSQL Databases Matthias Schubert, Matthias Renz, Felix Borutta, Evgeniy Faerman, Christian Frey, Klaus Arthur Schmid, Daniyal Kazempour,

More information

The NoSQL Ecosystem. Adam Marcus MIT CSAIL

The NoSQL Ecosystem. Adam Marcus MIT CSAIL The NoSQL Ecosystem Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua About Me Social Computing + Database Systems Easily Distracted: Wrote The NoSQL Ecosystem in The Architecture of Open Source Applications

More information

Kaladhar Voruganti Senior Technical Director NetApp, CTO Office. 2014, NetApp, All Rights Reserved

Kaladhar Voruganti Senior Technical Director NetApp, CTO Office. 2014, NetApp, All Rights Reserved Kaladhar Voruganti Senior Technical Director NetApp, CTO Office Storage Used to Be Simple DRAM $$$ DISK Nearline TAPE volatile persistent Access Latency 2 Talk Focus: Persistent Memory Design Center DRAM

More information

Cloud Spanner. Rohit Gupta, Solutions

Cloud Spanner. Rohit Gupta, Solutions Cloud Spanner Rohit Gupta, Solutions Engineer @rohitforcloud Today s goals Provide a brief history of Spanner at Google Provide an explanation of Cloud Spanner Do a demo! Built on the same infrastructure

More information

MDHIM: A Parallel Key/Value Store Framework for HPC

MDHIM: A Parallel Key/Value Store Framework for HPC MDHIM: A Parallel Key/Value Store Framework for HPC Hugh Greenberg 7/6/2015 LA-UR-15-25039 HPC Clusters Managed by a job scheduler (e.g., Slurm, Moab) Designed for running user jobs Difficult to run system

More information

The Consistency Analysis of Secondary Index on Distributed Ordered Tables. Houliang Qi, Xu Chang, Xingwu Liu, Li Zha

The Consistency Analysis of Secondary Index on Distributed Ordered Tables. Houliang Qi, Xu Chang, Xingwu Liu, Li Zha The Consistency Analysis of Secondary Index on Distributed Ordered Tables Houliang Qi, Xu Chang, Xingwu Liu, Li Zha Agenda Background MoEvaEon SoluEons Consistency Consistency Window Consistency Model

More information

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL CISC 7610 Lecture 5 Distributed multimedia databases Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL Motivation YouTube receives 400 hours of video per minute That is 200M hours

More information

10. Replication. Motivation

10. Replication. Motivation 10. Replication Page 1 10. Replication Motivation Reliable and high-performance computation on a single instance of a data object is prone to failure. Replicate data to overcome single points of failure

More information

CMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS

CMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB s C. Faloutsos A. Pavlo Lecture#23: Distributed Database Systems (R&G ch. 22) Administrivia Final Exam Who: You What: R&G Chapters 15-22

More information

Flash Storage Complementing a Data Lake for Real-Time Insight

Flash Storage Complementing a Data Lake for Real-Time Insight Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum

More information

Study of NoSQL Database Along With Security Comparison

Study of NoSQL Database Along With Security Comparison Study of NoSQL Database Along With Security Comparison Ankita A. Mall [1], Jwalant B. Baria [2] [1] Student, Computer Engineering Department, Government Engineering College, Modasa, Gujarat, India ank.fetr@gmail.com

More information

A Global In-memory Data System for MySQL Daniel Austin, PayPal Technical Staff

A Global In-memory Data System for MySQL Daniel Austin, PayPal Technical Staff A Global In-memory Data System for MySQL Daniel Austin, PayPal Technical Staff Percona Live! MySQL Conference Santa Clara, April 12th, 2012 v1.3 Intro: Globalizing NDB Proposed Architecture What We Learned

More information

NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY

NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY WHAT IS NOSQL? Stands for No-SQL or Not Only SQL. Class of non-relational data storage systems E.g.

More information

SQL, NoSQL, MongoDB. CSE-291 (Cloud Computing) Fall 2016 Gregory Kesden

SQL, NoSQL, MongoDB. CSE-291 (Cloud Computing) Fall 2016 Gregory Kesden SQL, NoSQL, MongoDB CSE-291 (Cloud Computing) Fall 2016 Gregory Kesden SQL Databases Really better called Relational Databases Key construct is the Relation, a.k.a. the table Rows represent records Columns

More information

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis 1 NoSQL So-called NoSQL systems offer reduced functionalities compared to traditional Relational DBMS, with the aim of achieving

More information

EECS 498 Introduction to Distributed Systems

EECS 498 Introduction to Distributed Systems EECS 498 Introduction to Distributed Systems Fall 2017 Harsha V. Madhyastha November 27, 2017 EECS 498 Lecture 19 2 Windows Azure Storage Focus on storage within a data center Goals: Durability Strong

More information

Shen PingCAP 2017

Shen PingCAP 2017 Shen Li @ PingCAP About me Shen Li ( 申砾 ) Tech Lead of TiDB, VP of Engineering Netease / 360 / PingCAP Infrastructure software engineer WHY DO WE NEED A NEW DATABASE? Brief History Standalone RDBMS NoSQL

More information

Relational databases

Relational databases COSC 6397 Big Data Analytics NoSQL databases Edgar Gabriel Spring 2017 Relational databases Long lasting industry standard to store data persistently Key points concurrency control, transactions, standard

More information

Scaling with mongodb

Scaling with mongodb Scaling with mongodb Ross Lawley Python Engineer @ 10gen Web developer since 1999 Passionate about open source Agile methodology email: ross@10gen.com twitter: RossC0 Today's Talk Scaling Understanding

More information

Intro Cassandra. Adelaide Big Data Meetup.

Intro Cassandra. Adelaide Big Data Meetup. Intro Cassandra Adelaide Big Data Meetup instaclustr.com @Instaclustr Who am I and what do I do? Alex Lourie Worked at Red Hat, Datastax and now Instaclustr We currently manage x10s nodes for various customers,

More information

LogBase: A Scalable Log-structured Database System in the Cloud

LogBase: A Scalable Log-structured Database System in the Cloud : A Scalable Log-structured Database System in the Cloud [Technical Report] Hoang Tam Vo #, Sheng Wang #, Divyakant Agrawal, Gang Chen, Beng Chin Ooi # # School of Computing, National University of Singapore

More information

MongoDB and Mysql: Which one is a better fit for me? Room 204-2:20PM-3:10PM

MongoDB and Mysql: Which one is a better fit for me? Room 204-2:20PM-3:10PM MongoDB and Mysql: Which one is a better fit for me? Room 204-2:20PM-3:10PM About us Adamo Tonete MongoDB Support Engineer Agustín Gallego MySQL Support Engineer Agenda What are MongoDB and MySQL; NoSQL

More information