SCHISM: A WORKLOAD-DRIVEN APPROACH TO DATABASE REPLICATION AND PARTITIONING


SCHISM: A WORKLOAD-DRIVEN APPROACH TO DATABASE REPLICATION AND PARTITIONING ZEYNEP KORKMAZ CS742 - PARALLEL AND DISTRIBUTED DATABASE SYSTEMS UNIVERSITY OF WATERLOO

OUTLINE 1. Background 2. What is Schism? 3. Cost of Distributed Transactions 4. Partitioning and Replication 5. Optimization 6. Experiments 7. Conclusion 8. Discussion - Critiques

BACKGROUND Problem: scaling database workloads. Solution: partitioning, so that the number of nodes involved in answering a query is minimized. Classic schemes: round-robin, range, hash. Social-networking workloads are hard to partition.

BACKGROUND Problem: distributed transactions are expensive. Solution: minimize the number of distributed transactions while producing balanced partitions. This is what Schism does.

SCHISM A novel graph-based, data-driven partitioning system for transactional workloads. Steps:
1. Data pre-processing. Input: a trace of transactions and the DB; extract read and write sets.
2. Creating the graph. Nodes: tuples; edges: co-accesses by transactions.
3. Partitioning the graph. Balanced min-cut partitioning and replication.
4. Explaining the partition. A decision tree over frequent attribute sets.
5. Final validation. Which strategy is best?

COST OF DISTRIBUTED TRANSACTIONS Transactions that access data on a single node incur no additional overhead. Distributed transactions are expensive: locking contention and overheads, distributed deadlocks, and complex statements that need to access data from multiple servers. Experiment: a simple transaction touches two rows by issuing two statements, under two configurations: (a) every transaction runs on a single server; (b) every transaction is distributed.

COST OF DISTRIBUTED TRANSACTIONS Result: making every transaction distributed reduces throughput by a factor of 2 and doubles the average latency.

GRAPH REPRESENTATION Build a graph from transaction traces. Nodes: tuples. Edges: co-usage of tuples within a transaction. Edge weights: the number of transactions that co-access a pair of tuples. Why not hypergraphs? Extension: replicated tuples.

GRAPH REPRESENTATION (EXAMPLE) An account(id, name, bal) table with five tuples (carlo, evan, sam, eugene, yang) and a trace of four transactions:
BEGIN; UPDATE account SET bal=60k WHERE id=2; SELECT * FROM account WHERE id=1; COMMIT;
BEGIN; UPDATE account SET bal=bal-1k WHERE name="carlo"; UPDATE account SET bal=bal+1k WHERE name="evan"; COMMIT;
BEGIN; SELECT * FROM account WHERE id IN {1,3}; ABORT;
BEGIN; UPDATE account SET bal=bal+1k WHERE bal < 100k; COMMIT;
[Figure: the resulting graph, with transaction edges connecting co-accessed tuples and the cut splitting them into PARTITION 0 and PARTITION 1.]
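The co-access graph above can be sketched as follows; the trace and tuple ids are illustrative stand-ins for the example, not Schism's actual implementation:

```python
from collections import Counter
from itertools import combinations

def build_graph(transactions):
    """Build a co-access graph from a transaction trace.

    Nodes are tuples (identified here by primary key); the edge between
    two tuples is weighted by the number of transactions that access
    both of them.
    """
    nodes = set()
    edges = Counter()  # frozenset({a, b}) -> co-access count
    for txn in transactions:  # txn = set of tuple ids it reads/writes
        nodes.update(txn)
        for a, b in combinations(sorted(txn), 2):
            edges[frozenset((a, b))] += 1
    return nodes, edges

# Illustrative trace: two transactions touch tuples {1, 2}, one reads
# {1, 3}, and one updates {3, 4, 5}.
trace = [{1, 2}, {1, 2}, {1, 3}, {3, 4, 5}]
nodes, edges = build_graph(trace)
print(edges[frozenset((1, 2))])  # tuples 1 and 2 are co-accessed twice
```

Cutting a heavy edge (such as the weight-2 edge between tuples 1 and 2) would turn two transactions into distributed ones, which is exactly what the min-cut objective penalizes.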

GRAPH WITH REPLICATION An extension to the basic graph representation: tuple-level replication. In the basic graph, a single node is a single tuple. In the star-shaped configuration, n+1 nodes represent a single tuple, where n is the number of transactions that access the tuple. Replication edge weights: the number of transactions that update the tuple in the workload.

GRAPH WITH REPLICATION [Figure: the example graph with each tuple expanded into a star of replica nodes; replication edges, weighted by update counts, connect the replicas, while transaction edges connect co-accessed tuples.]
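The star expansion can be sketched like this; node naming and the access list format are invented for illustration:

```python
def expand_with_replication(tuple_id, accesses):
    """Expand one tuple into the star-shaped configuration: a center
    node plus one replica node per accessing transaction (n+1 nodes in
    total). Every replication edge carries weight equal to the number
    of transactions that update the tuple, i.e. the cost of keeping
    replicas in sync.

    `accesses` is a list of (txn_id, is_update) pairs.
    """
    n_updates = sum(1 for _, is_update in accesses if is_update)
    center = f"t{tuple_id}"
    replicas = [f"t{tuple_id}~txn{txn_id}" for txn_id, _ in accesses]
    repl_edges = {(center, r): n_updates for r in replicas}
    return [center] + replicas, repl_edges

# Tuple 1 is accessed by three transactions, two of which update it:
nodes_r, repl_edges = expand_with_replication(1, [(0, True), (1, True), (2, False)])
```

A read-mostly tuple gets cheap replication edges, so the min-cut will happily split its star (replicate it); an update-heavy tuple gets expensive replication edges and stays in one partition.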

GRAPH PARTITIONING Split the graph into k non-overlapping partitions so that: the overall weight of the cut edges is minimized (min-cut), and the weight of each partition stays within a constant factor of perfect balance. For each tuple the cut decides: replicate it and pay for distributed updates, or place it in a single partition and pay for distributed transactions? Schism uses METIS to partition the graph and assign nodes to partitions.

GRAPH PARTITIONING [Figure: the partitioned example graph and a per-tuple lookup table mapping each tuple id to a partition label, with R marking replicated tuples, split into PARTITION 0 and PARTITION 1.] The result is a fine-grained mapping between nodes and partitions: a look-up table on attributes that frequently appear in WHERE clauses.
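A toy way to see what the min-cut objective measures (METIS does the actual partitioning; this sketch only evaluates a given assignment, with invented edge weights):

```python
def cut_cost(edges, assignment):
    """Total weight of edges whose endpoints land in different
    partitions; each cut transaction edge stands for a potentially
    distributed transaction."""
    return sum(w for pair, w in edges.items()
               if len({assignment[n] for n in pair}) > 1)

edges = {frozenset((1, 2)): 2, frozenset((1, 3)): 1, frozenset((3, 4)): 1}
good = {1: 0, 2: 0, 3: 1, 4: 1}  # cuts only the weight-1 edge (1,3)
bad = {1: 0, 2: 1, 3: 0, 4: 1}   # cuts the weight-2 edge and (3,4)
print(cut_cost(edges, good), cut_cost(edges, bad))  # 1 3
```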

GRAPH PARTITIONING Look-up tables: can they be stored in RAM? Can they be maintained efficiently under updates? They are not ideal for large DBs or insert-heavy workloads. Hence another phase of Schism: explanation, i.e., predicate-based partitioning.
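A minimal sketch of the per-tuple look-up table; the mapping values and attribute name are invented for illustration:

```python
# Fine-grained look-up table: partitioning-attribute value -> partition.
# One entry per tuple, which is why it must fit in RAM and why an
# insert-heavy workload is problematic (new rows have no entry yet).
lookup = {1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

def route(where_id):
    """Route a statement whose WHERE clause fixes the partitioning
    attribute (here, the hypothetical account.id)."""
    return lookup[where_id]

print(route(2), route(5))  # 0 1
```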

EXPLAINING THE PARTITION Find a compact model (a set of rules) that represents the partitions: decision trees. Values: tuples. Labels: partitions. Replicated tuples are labeled by a replication identifier.

EXPLAINING THE PARTITION The explanation is useful if: it is based on frequent attributes; it does not reduce the partitioning quality too much; and it avoids over-fitting (pruning).
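As a sketch of the explanation phase, here is a one-level stand-in for the decision tree: it searches for a single range predicate on a candidate attribute that best reproduces the per-tuple labels (Schism trains a full decision tree on frequent attributes; this stump is only illustrative):

```python
def best_split(samples):
    """Find the threshold t for which the rule `attr < t -> partition 0`
    best explains per-tuple partition labels.

    `samples` is a list of (attribute_value, partition_label) pairs.
    Returns (threshold, fraction of tuples explained correctly).
    """
    values = sorted({v for v, _ in samples})
    best = (None, -1)
    for t in values[1:]:
        correct = sum((v < t) == (p == 0) for v, p in samples)
        if correct > best[1]:
            best = (t, correct)
    threshold, correct = best
    return threshold, correct / len(samples)

# Labels from the running example: ids 1-2 -> partition 0, ids 3-5 -> 1.
samples = [(1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]
t, acc = best_split(samples)
print(f"id < {t} -> partition 0")  # id < 3 -> partition 0
```

If the learned predicate explains the labels perfectly (accuracy 1.0 here), the huge per-tuple look-up table collapses into one range rule.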

FINAL VALIDATION Compare the candidate solutions: fine-grained per-tuple partitioning, range-predicate partitioning, hash partitioning, and full replication. The best solution is the one that produces the smallest number of distributed transactions. Tie? Choose the lowest complexity (e.g., hash vs. predicate).
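The selection rule can be sketched as a two-key minimum; the distributed-transaction counts and complexity ranks below are invented for illustration:

```python
def pick_strategy(stats):
    """Pick the strategy with the fewest distributed transactions,
    breaking ties by lowest complexity rank (the simpler scheme wins).

    `stats`: list of (name, complexity_rank, distributed_txn_count).
    """
    return min(stats, key=lambda s: (s[2], s[1]))[0]

stats = [
    ("fine-grained", 3, 40),
    ("range-predicates", 2, 40),  # ties fine-grained, but is simpler
    ("hash", 1, 120),
    ("full-replication", 0, 300),
]
print(pick_strategy(stats))  # range-predicates
```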

OPTIMIZATIONS Scalability: graph partitioning scales well in the number of partitions, but running time increases substantially with graph size.

OPTIMIZATIONS Reducing the size of the graph with limited impact on quality: transaction-level sampling (reduces the number of edges), tuple-level sampling (reduces the number of nodes), and tuple coalescing (represents tuples that are always accessed together as a single node).
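Two of these reductions can be sketched directly on a trace; the trace below is an illustrative toy, not data from the paper:

```python
import random

def sample_transactions(trace, rate, seed=0):
    """Transaction-level sampling: keep each transaction with
    probability `rate` (fewer transactions means fewer edges)."""
    rng = random.Random(seed)
    return [t for t in trace if rng.random() < rate]

def coalesce(trace):
    """Tuple coalescing: tuples accessed by exactly the same set of
    transactions carry identical graph information, so each such group
    can be merged into a single node."""
    seen_in = {}
    for i, txn in enumerate(trace):
        for t in sorted(txn):
            seen_in.setdefault(t, []).append(i)
    groups = {}
    for t, txns in seen_in.items():
        groups.setdefault(tuple(txns), []).append(t)
    return list(groups.values())

trace = [{1, 2}, {1, 2}, {3, 4, 5}]
print(coalesce(trace))  # tuples 1,2 merge; tuples 3,4,5 merge
```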

EXPERIMENTS The experiments compare the number of distributed transactions produced by: Schism (both fine-grained per-tuple and range predicates), the best manual partitioning, replication of all tables, and hash partitioning; varying the fraction of the dataset sampled and the number of partitions, followed by final validation.

EXPERIMENTS Datasets: Yahoo! Cloud Serving Benchmark (Workload A: reads/updates, 50%-50%; Workload E: short scans plus one-tuple updates, 95%-5%). TPC-C: a write-intensive OLTP workload; sampling and the number of partitions varied, at 2W and 10W scales. TPC-E: a read-intensive OLTP workload; complex (33 tables, 188 columns, 10 kinds of transactions). Epinions.com: a social website with n-to-n relations in the schema.

EXPERIMENTS

CONCLUSION Schism: a system for fine-grained partitioning of OLTP databases. It represents the DB and its transactions as a graph, supports tuple-level replication, uses a graph-partitioning algorithm, uses classification techniques to represent partitions compactly, and proposes sampling to reduce graph size.

DISCUSSIONS Schism overcomes the main partitioning challenges: distributed transactions and many-to-many relations. Open questions: What is the quality of the sampling and of the decision tree? How is pruning done? What is the running time of Schism including all steps? The overhead of complexity (choose the simplest)? The overhead of fine-grained partitioning?

DISCUSSIONS The provided scalability is largely a result of METIS graph partitioning. Schism focuses on using classification techniques to transform fine-grained partitioning into range partitions. How else could fine-grained partitioning be used? Hypergraphs vs. a collection of edges?

DISCUSSIONS Statements that access tuples via the partitioning attributes are sent to the corresponding partitions. Accessing a table via other attributes? Broadcast the statement to all partitions. More complex statements that access multiple tables using non-partitioning attributes? Not currently handled.
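The routing behaviour described above can be sketched as follows; the attribute name and look-up table are illustrative assumptions:

```python
def route_statement(attr, value, lookup, n_partitions):
    """Send a statement to the owning partition when its WHERE clause
    uses the partitioning attribute; otherwise fall back to
    broadcasting it to every partition."""
    if attr == "id" and value in lookup:
        return [lookup[value]]
    return list(range(n_partitions))

lookup = {1: 0, 2: 0, 3: 1}
print(route_statement("id", 2, lookup, 2))        # targeted: [0]
print(route_statement("name", "sam", lookup, 2))  # broadcast: [0, 1]
```

The broadcast fallback preserves correctness but costs a round trip to every partition, which is why the explanation phase prefers predicates on frequently used attributes.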

THANK YOU