management systems Elena Baralis, Silvia Chiusano Politecnico di Torino Pag. 1 Distributed architectures Distributed Database Management Systems

Similar documents
Distributed Database Management Systems. Data and computation are distributed over different machines Different levels of complexity

Database Systems Concepts, Languages and Architectures

Distributed KIDS Labs 1

Database Management Systems Concurrency Control

Elena Baralis, Silvia Chiusano Politecnico di Torino. Reliability Management. DBMS Architecture D B M G. Database Management Systems. Pag.

Database Management Systems

Chapter 19: Distributed Databases

Database Architectures

Chapter 18: Parallel Databases

Chapter 19: Distributed Databases

Distributed Transaction Management

Chapter 18: Parallel Databases Chapter 19: Distributed Databases ETC.

Database Architectures

Distributed Databases

DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Advances in Data Management Distributed and Heterogeneous Databases - 2

CMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS

Distributed DBMS. Concepts. Concepts. Distributed DBMS. Concepts. Concepts 9/8/2014

Mobile and Heterogeneous databases Distributed Database System Query Processing. A.R. Hurson Computer Science Missouri Science & Technology

Database Management Systems Introduction to DBMS

Chapter 2 Distributed Information Systems Architecture

CORBA Object Transaction Service

Distributed Databases

Distributed Database Systems

It also performs many parallelization operations like, data loading and query processing.

Chapter 1: Distributed Information Systems

Distributed Database Systems

Introduction to the course

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Slide 25-1

CSE 444: Database Internals. Lecture 22 Distributed Query Processing and Optimization

Chapter Outline. Chapter 2 Distributed Information Systems Architecture. Distributed transactions (quick refresh) Layers of an information system

ATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases

Database Replication in Tashkent. CSEP 545 Transaction Processing Sameh Elnikety

Transactions. ACID Properties of Transactions. Atomicity - all or nothing property - Fully performed or not at all

SQL Server 2005 Analysis Services

B.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2

TIBCO StreamBase 10 Distributed Computing and High Availability. November 2017

CSE 444: Database Internals. Section 9: 2-Phase Commit and Replication

Database Management Systems Reliability Management

Problems Caused by Failures

Data Modeling and Databases Ch 14: Data Replication. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich

Database. Università degli Studi di Roma Tor Vergata. ICT and Internet Engineering. Instructor: Andrea Giglio

Heckaton. SQL Server's Memory Optimized OLTP Engine

Huge market -- essentially all high performance databases work this way

CSE 544, Winter 2009, Final Examination 11 March 2009

Distributed Databases Systems

Advanced Databases: Parallel Databases A.Poulovassilis

Integrity in Distributed Databases

Gustavo Alonso, ETH Zürich. Web services: Concepts, Architectures and Applications - Chapter 1 2

Distributed Databases. Distributed Database Systems. CISC437/637, Lecture #21 Ben Cartere?e

GUJARAT TECHNOLOGICAL UNIVERSITY

CSE 344 Final Review. August 16 th

Global atomicity. Such distributed atomicity is called global atomicity A protocol designed to enforce global atomicity is called commit protocol

Scaling Database Systems. COSC 404 Database System Implementation Scaling Databases Distribution, Parallelism, Virtualization

CSC 261/461 Database Systems Lecture 21 and 22. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.

INTRODUCTORY INFORMATION TECHNOLOGY ENTERPRISE DATABASES AND DATA WAREHOUSES. Faramarz Hendessi

Control. CS432: Distributed Systems Spring 2017

Concurrency and Recovery

Topics in Reliable Distributed Systems

Course Content. Parallel & Distributed Databases. Objectives of Lecture 12 Parallel and Distributed Databases

6.830 Problem Set 3 Assigned: 10/28 Due: 11/30

Distributed System Chapter 16 Issues in ch 17, ch 18

Distributed Data Management

Chapter 3. Database Architecture and the Web

CS5412: TRANSACTIONS (I)

Overview of Transaction Management

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

Distributed Databases

Incompatibility Dimensions and Integration of Atomic Commit Protocols

Chapter 11 - Data Replication Middleware

CSC 261/461 Database Systems Lecture 20. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Administration Naive DBMS CMPT 454 Topics. John Edgar 2

Physical Design. Elena Baralis, Silvia Chiusano Politecnico di Torino. Phases of database design D B M G. Database Management Systems. Pag.

Announcements. me your survey: See the Announcements page. Today. Reading. Take a break around 10:15am. Ack: Some figures are from Coulouris

Concurrency Control & Recovery

Chapter 17: Parallel Databases

Module 16: Distributed System Structures

Fault tolerance with transactions: past, present and future. Dr Mark Little Technical Development Manager, Red Hat

5. Distributed Transactions. Distributed Systems Prof. Dr. Alexander Schill

Adapting Commit Protocols for Large-Scale and Dynamic Distributed Applications

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

HYRISE In-Memory Storage Engine

SCHISM: A WORKLOAD-DRIVEN APPROACH TO DATABASE REPLICATION AND PARTITIONING

CSE 190D Database System Implementation

Distributed Databases

Security Mechanisms I. Key Slide. Key Slide. Security Mechanisms III. Security Mechanisms II

C-Store: A column-oriented DBMS

02 - Distributed Systems

Management Information Systems Review Questions. Chapter 6 Foundations of Business Intelligence: Databases and Information Management

MarkLogic Server. Database Replication Guide. MarkLogic 6 September, Copyright 2012 MarkLogic Corporation. All rights reserved.

Janus: A Hybrid Scalable Multi-Representation Cloud Datastore

Transactions. Kathleen Durant PhD Northeastern University CS3200 Lesson 9

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems

Distributed Systems Principles and Paradigms. Chapter 01: Introduction. Contents. Distributed System: Definition.

Suggested Topics for Written Project Report. Traditional Databases:

Middleware. Adapted from Alonso, Casati, Kuno, Machiraju Web Services Springer 2004

Introduction to Database Systems CSE 444. Lecture 26: Distributed Transactions

Distributed Systems Principles and Paradigms. Chapter 01: Introduction

Consistency and Scalability

Transcription:

atabase Management Systems istributed database istributed architectures atabase Management Systems istributed atabase Management Systems ata and computation are distributed over different machines ifferent levels of complexity epending on the independence level of nodes Typical advantages Performance improvement Increased availability Stronger reliability B M G 1 M BG 2 istributed architectures istributed architectures Client/server Simplest and more widespread Server manages the database Client manages the user interface istributed database system ifferent BMS servers on different network nodes autonomous able to cooperate Guaranteeing the ACI properties requires more complex techniques ata replication A replica is a copy of the data stored on a different network node The replication server autonomously manages copy update Simpler architecture than distributed database M BG 3 M BG 4 istributed architectures Relevant properties Parallel architectures Performance increase is the only objective ifferent architectures Multiprocessor machines CPU clusters edicated network connections ata warehouses Servers specialized in decision support Perform OLAP (On Line Analytical Processing) different from OLTP (On Line Transaction Processing) M BG 5 Portability Capability of moving a program from a system to a different system Guaranteed by the SQL standard Interoperability Capability of different BMS servers to cooperate on a given task Interaction protocols are needed OBC X-Open-TP M BG 6 Pag. 1 1

atabase Management Systems istributed database Client/server architectures atabase Management Systems 2-Tier Thick clients with some application logic BMS server provides access to data CLIENT 1 BMS SERVER CLIENTn Client/server Architectures B B M G 7 M BG 8 Client/server architectures SQL execution 3-Tier Thin client browser Application server implements business logic typically also a web server BMS Server provides access to data CLIENT 1 BMS SERVER B CLIENTn APPLICATION SERVER Compile & Go The query is sent to the server The query is prepared generation of the query plan The query is executed The result is shipped The query plan is not stored on the server Effective for one-shot query executions provides flexible execution of dynamic SQL M BG 9 M BG 10 SQL execution Compile & Store The query is sent to the server The query is prepared generation of the query plan the query plan is stored for future usage may continue with execution the query is executed the result is shipped Efficient for repeated query executions parametric executions of the same query atabase Management Systems istributed atabase Systems M BG 11 B M G 12 Pag. 2 2

atabase Management Systems istributed database istributed database systems Client transactions access more than one BMS server ifferent complexity of available distributed services Local autonomy Each BMS server manages its local data in an autonomous way e.g., concurrency control, recovery istributed database systems Functional advantages Appropriate localization of data and applications e.g., geographical distribution Technological advantages Increased data availability Total block probability is reduced Local blocks may be more frequent Enhanced scalability Provided by the modularity and flexibility of the architecture M BG 13 M BG 14 ata fragmentation atabase Management Systems istributed atabase esign Given a relation R, a data fragment is a subset of R in terms of tuples, or schema, or both ifferent criteria to perform fragmentation horizontal subset of tuples vertical subset of schema mixed both horizontal and vertical together B M G 15 M BG 16 Horizontal fragmentation The horizontal fragmentation of a relation R selects a subset of tuples in R with same schema of R obtained by means of s p p is the partitioning predicate Fragments are not overlapped The following table is given Employee (Emp#, Ename, eptname, Tax) Horizontal fragmentation on attribute eptname card(eptname) = N E 1 = s eptname = Production Employee E N = s eptname = Marketing Employee Reconstruction of the original table Employee = E 1 E 2 E N M BG 17 M BG 18 Pag. 3 3

atabase Management Systems istributed database Vertical fragmentation The vertical fragmentation of a relation R selects a subset of schema of R Obtained by means of p X X is a subset of the schema of R The primary key should be included in X to allow rebuilding R All tuples are included Fragments are overlapping on the primary key The following table is given Employee (Emp#, Ename, eptname, Tax) Vertical fragmentation E 1 = p Emp#, Ename, eptname Employee E 2 = p Emp#, Ename, Tax Employee Reconstruction of the original table Employee = E 1 E 2 M BG 19 M BG20 20 Fragmentation properties istributed database design Completeness each information in relation R is contained in at least one fragment R i Correctness the information in R can be rebuilt from its fragments It is based on data fragmentation ata distribution over different servers Each fragment of a relation R is usually stored in a different file possibly, on a different server Relation R does not exist it may be rebuilt from fragments M BG 21 M BG 22 Allocation of fragments Allocation of fragments The allocation schema describes how fragments are stored on different server nodes Non redundant mapping if each fragment is stored on one single node SITE 1 Redundant mapping if some fragments are replicated on different servers increased data availability complex maintenance copy synchronization is needed SITE 1 SITE 2 SITE 2 SITE 1 SITE 1 + SITE 2 M BG 23 M BG 24 Pag. 4 4

atabase Management Systems istributed database Transparency levels Transparency levels describe the knowledge of data distribution An application should operate differently depending on the transparency level supported by the BMS Transparency levels fragmentation transparency allocation transparency language transparency Transparency levels The following table is given Supplier S (S#, SName, City, Status) Horizontal fragmentation on the City attribute domain of city = {Torino, Roma} Allocation schema Horizontal fragment S 1 = s city = Torino S S 2 = s city = Roma S Allocation schema S 1 @xxx.torino.it S 2 @xxx.roma1.it S 2 @xxx.roma2.it M BG 25 M BG 26 Fragmentation transparency Applications know the existence of tables and not of their fragments data distribution is not visible The programmer only accesses table S not its fragments SELECT SName FROM S WHERE S#=:COE M BG 27 Allocation transparency Applications know the existence of fragments, but not their allocation not aware of replication of fragments must enumerate all fragments SELECT SName FROM S 1 WHERE S# = :COE IF (NOT FOUN) SELECT SName FROM S 2 M B WHERE S# = :COE G 28 Language transparency The programmer should select both the fragment and its allocation No SQL dialects are used This is the format in which higher level queries are transformed by a distributed BMS SELECT SName FROM S 1 @xxx.torino.it WHERE S# = :COE Selection of a IF (NOT FOUN) specific replica of S 2 SELECT SName FROM S 2 @xxx.roma1.it M BG WHERE S# = :COE 29 atabase Management Systems B M G Transaction classification 30 Pag. 5 5

atabase Management Systems istributed database Transaction classification The client requests the execution of a transaction to a given BMS server the BMS server is in charge of redistributing it Classes define different complexity levels in the interaction among BMS servers They are based on the type of SQL instruction which the transaction is allowed to contain Remote request Read only request only select statement Single remote server Remote transaction Any SQL command Single remote server Transaction classification M BG 31 M BG 32 Transaction classification istributed transaction Any SQL command Each SQL statement is addressed to one single server Global atomicity is needed 2 phase commit protocol istributed request Each SQL command may refer to data on different servers istributed optimization is needed Fragmentation transparency is in this class only M BG 33 The following table is given Account (Acc#, Name, Balance) Fragments and allocation schema Horizontal fragmentation Allocation schema A 1 = s acc# < 10000 Account Node 1 A 2 = s acc# >=10000 Account Node 2 M BG 34 Money transfer transaction BoT (Beginning of transaction) UPATE Account SET Balance = Balance - 100 WHERE Acc# = 3000 UPATE Account SET Balance = Balance + 100 WHERE Acc# = 13000 EoT (End of transaction) What is the class of the transaction? istributed request because Account is not explicitly partitioned If instead the update instructions reference explicitly A 1 and A 2 istributed transaction If both update instructions reference only A 1 e.g., second update with WHERE Acc#=9000 Remote transaction B M G 35 M BG 36 Pag. 6 6

atabase Management Systems istributed database ACI properties atabase Management Systems istributed BMS Technology Atomicity It requires distributed techniques 2 phase commit Consistency Constraints are currently enforced only locally Isolation It requires strict 2PL and 2 Phase Commit urability It requires the extension of local procedures to manage atomicity in presence of failure B M G 37 M BG 38 Other issues Atomicity istributed query optimization is performed by the BMS receiving the query execution request It partitions the query in subqueries, each addressed to a single BMS It selects the execution strategy order of operations and execution technique order of operations on different nodes transmission cost may become relevant (optionally) selection of the appropriate replica It coordinates operations on different nodes and information exchange All nodes (i.e., BMS servers) participating to a distributed transaction must implement the same decision (commit or rollback) Coordinated by 2 phase commit protocol Failure causes Node failure Network failure which causes lost messages Acknowledgement of messages (ack) Usage of timeout Network partitioning in separate subnetworks M BG 39 M BG 40 Objective Coordination of the conclusion of a distributed transaction Parallel with a wedding Priest celebrating the wedding Coordinates the agreement Couple to be married Participate to the agreement istributed transaction One coordinator Transaction Manager () Several BMS servers which take part to the transaction Resource Managers () Any participant may take the role of Also the client requesting the transaction execution M BG 41 M BG 42 Pag. 7 7

atabase Management Systems istributed database New log records and have separate private logs Records in the log it contains the identity of all s participating to the transaction (Node I + Process I) Global commit/abort final decision on the transaction outcome Complete written at the end of the protocol New log records New records in the log Ready The is willing to perform commit of the transaction The decision cannot be changed afterwards The node has to be in a reliable state WAL and commit precedence rules are enforced Resources are locked After this point the loses its autonomy for the current transaction M BG 43 M BG 44 Phase I 1. The Writes the prepare record in the log Sends the prepare message to all (participants) Sets a timeout, defining maximum waiting time for answer M BG 45 M BG 46 Phase I Ready 2. The s Wait for the prepare message When they receive it If they are in a reliable state Write the ready record in the log Send the ready message to the If they are not in a reliable state Send a not ready message to the Terminate the protocol Perform local rollback If the crashed No answer is sent M BG 47 M BG 48 Pag. 8 8

atabase Management Systems istributed database Phase I Ready Global 3. The Collects all incoming messages from the s If it receives ready from all s The commit global decision record is written in the log If it receives one or more not ready or the timeout expires The abort global decision record is written in the log M BG 49 M BG 50 Phase II 1. The Sends the global decision to the s Sets a timeout for the answers Ready Global B M BG 51 M BG 52 Phase II 2. The Waits for the global decision When it receives it The commit/abort record is written in the log The database is updated An ACK message is sent to the Ready Global Complete B M BG 53 M BG 54 Pag. 9 9

atabase Management Systems istributed database Phase II 3. The Collects the ACK messages from the s If all ACK messages are received The complete record is written in the log If the timeout expires and some ACK messages are missing A new timeout is set The global decision is resent to the s which did not answer until all answers are received Ready B uncertainty window Global Complete Phase I Phase II M BG 55 M BG 56 Uncertainty window Each is affected by an uncertainty window Start after ready msg is sent End upon receipt of global decision Local resources in the are locked during the uncertainty window It should be small M BG 57 Failure of a participant () The warm restart procedure is modified with a new case If the last record in the log for transaction T is ready, then T does not know the global decision of its Recovery REAY list new list collecting the Is of all transactions in ready state For all transactions in the ready list, the global decision is asked to the at restart Remote recovery request M BG 58 Failure of the coordinator () Messages that can be lost (outgoing) Ready (incoming) I Phase Global decision (outgoing) II Phase Recovery If the last record in the log is prepare The global abort decision is written in the log and sent to all participants Alternative: redo phase I (not implemented) If the last record in the log is the global decision M BG Repeat phase II 59 Network failures Any network problem in phase I causes global abort The prepare or the ready msg are not received Any network problem in phase II causes the repetition of phase II The global decision or the ACK are not received M BG 60 Pag. 10 10

atabase Management Systems istributed database X-Open-TP atabase Management Systems X-Open-TP Protocol for the coordination of distributed transactions It guarantees interoperability of distributed transactions on heterogeneous BMSs i.e., different BMS products Based on One client One Several s B M G 61 M BG 62 Interfaces Standard features X-Open-TP defines interfaces for the communication between client and interface between and XA interface BMS servers provide the XA interface Specialized products implement the and provide the interface E.g., BEA tuxedo M BG 63 s are passive and only answer to remote procedure invocations from the The control of the protocol is embedded in the The protocol implements two optimizations of 2 Phase Commit Presumed abort Read only Heuristic decision to allow controlled transaction evolution in presence of failures M BG 64 Presumed abort The, when no information is available in the log, answers abort to a remote recovery request by a Reduces the number of synchronous log writes prepare, global abort, complete are not synchronous Synchronous writes are still needed global commit in log ready, commit in log Exploited by a that did not modify its database during the transaction The answers read only to the prepare request does not write the log locally terminates the protocol The will ignore the in phase II of the protocol Read only M BG 65 M BG 66 Pag. 11 11

Client - communication Session Transaction Transaction atabase Management Systems istributed database Heuristic decision Heuristic decision Allows transaction evolution in presence of failures uring the uncertainty window, a may be blocked because of a failure Locked resources are blocked until recovery The blocked transaction evolves locally under operator control Transaction end is forced by the operator Typically rollback, rarely commit Heuristic decision, because actual transaction outcome is not known Blocked resources are released M BG 67 uring recovery, decisions are compared to the actual decisions If decision and heuristic decision are different, atomicity is lost The protocol guarantees that the inconsistency is notified to the client process Resolving inconsistencies caused by a heuristic decision is up to user applications M BG 68 Protocol interaction Client ( Interface) (XA Interface).Init().Open.Begin() : :.Commit() XA.Open() XA.Start() XA.Precom() XA.Abort() XA.Commit() XA.End().Term() XA.Close().Exit() M BG 69 atabase Management Systems B M G Parallel BMS 70 Parallelism Inter-query parallelism Parallel computation increases BMS efficiency Queries can be effectively parallelized s large table scan performed in parallel on different portions of data data is fragmented on different disks group by on a large dataset partitioned on different processors group by result merged ifferent technological solutions are available Multiprocessor systems Computer clusters M BG 71 ifferent queries are scheduled on different processors Used in OLTP systems Appropriate for workloads characterized by simple, short transactions high transaction load 100-1000 tps Load balancing on the pool of available processing units M BG 72 Pag. 12 12

atabase Management Systems istributed database Intra-query parallelism Subparts of the same query are executed on different processors Used in OLAP systems Appropriate for workloads characterized by complex queries reduced query load Complex queries are partitioned in subqueries each subquery performs one or more operations on a subset of data group by and join are easily parallelizable pipelining operations is possible M BG 73 atabase Management Systems B M G BMS benchmarks 74 BMS benchmarks Benchmarks describe the conditions in which performance is measured for a system BMS benchmarks are standardized by the TPC (Transaction Processing Council) Each benchmark is characterized by Transaction load distribution of arrival time of transactions atabase size and content randomized data generation Transaction code Techniques to measure and certify performance M BG 75 Types of benchmarks TPC C Order entry transactions It simulates the behavior of an OLTP system New evolution is TPC E TPC H ecision support (OLAP) It is a mix of complex queries and some updates TPC App Transactions on the web Simulation of an e-commerce site M BG 76 Pag. 13 13