Introduction to the Course

Similar documents
Distributed Databases Systems

Outline. File Systems. Page 1

Distributed KIDS Labs 1

DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Database Management Systems

Distributed DBMS. Concepts. Concepts. Distributed DBMS. Concepts. Concepts 9/8/2014

Information Integration

Database Architectures

It also performs many parallelization operations like, data loading and query processing.

Specific Objectives Contents Teaching Hours 4 the basic concepts 1.1 Concepts of Relational Databases

Database Architectures

CMPT 354 Database Systems I. Spring 2012 Instructor: Hassan Khosravi

Distributed Databases

Chapter 19: Distributed Databases

GUJARAT TECHNOLOGICAL UNIVERSITY

Mobile and Heterogeneous databases Distributed Database System Transaction Management. A.R. Hurson Computer Science Missouri Science & Technology

Dr. Awad Khalil. Computer Science & Engineering department

Database Management System. Fundamental Database Concepts

Distributed Data Management

Distributed Databases

Chapter 3. Database Architecture and the Web

Distributed Database Systems

Distributed Database Systems

Distributed Transaction Management 2003

Chapter 18: Parallel Databases Chapter 19: Distributed Databases ETC.

Course Book Academic Year

Chapter 18: Parallel Databases

COURSE 1. Database Management Systems

Quick Facts about the course. CS 2550 / Spring 2006 Principles of Database Systems. Administrative. What is a Database Management System?

DISTRIBUTED DATABASES

Chapter 19: Distributed Databases

CSC 4710 / CSC 6710 Database Systems. Rao Casturi

VII. Database System Architecture

CMPUT 391 Database Management Systems. Fall Semester 2006, Section A1, Dr. Jörg Sander. Introduction

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

CONCEPTS OF DISTRIBUTED AND PARALLEL DATABASE

CMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS

Operating Systems Comprehensive Exam. Spring Student ID # 2/17/2011

Chapter 9: Concurrency Control

Introduction and Overview

Advanced Databases Lecture 17- Distributed Databases (continued)

UNIVERSITY OF BOLTON CREATIVE TECHNOLOGIES COMPUTING PATHWAY SEMESTER TWO EXAMINATION 2014/2015 ADVANCED DATABASE SYSTEMS MODULE NO: CPU6007

Lecture 1: January 23

Database Management Systems MIT Introduction By S. Sabraz Nawaz

data dependence Data dependence Structure dependence

Big Data Processing Technologies. Chentao Wu Associate Professor Dept. of Computer Science and Engineering

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Slide 25-1

Distributed Databases

Multi-CPU and distributed systems. Database system architectures. Lecture topics: monolithic systems client server systems parallel database servers

Introduction: Database Concepts Slides by: Ms. Shree Jaswal

CSE 5306 Distributed Systems. Course Introduction

Introduction and Overview

Lecture 1: January 22

LIS 2680: Database Design and Applications

Lecture 23 Database System Architectures

CS510 Operating System Foundations. Jonathan Walpole

Chapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!

Database Administration. Database Administration CSCU9Q5. The Data Dictionary. 31Q5/IT31 Database P&A November 7, Overview:

Introduction. Distributed Systems. Introduction. Introduction. Instructor Brian Mitchell - Brian

Chapter 20: Database System Architectures

CPS352 - DATABASE SYSTEMS. Professor: Russell C. Bjork Spring semester, Office: KOSC 242 x4377

Operating Systems (ECS 150) Spring 2011

Chapter 1: Distributed Systems: What is a distributed system? Fall 2013

The Replication Technology in E-learning Systems

Chapter 1 Introduction

Computer Science and Engineering Technology Course Descriptions

Addressed Issue. P2P What are we looking at? What is Peer-to-Peer? What can databases do for P2P? What can databases do for P2P?

CSE 3241: Database Systems I Databases Introduction (Ch. 1-2) Jeremy Morris

Distributed OS and Algorithms

Process Synchronisation (contd.) Deadlock. Operating Systems. Spring CS5212

management systems Elena Baralis, Silvia Chiusano Politecnico di Torino Pag. 1 Distributed architectures Distributed Database Management Systems

Advanced Databases: Parallel Databases A.Poulovassilis

Introduction Distributed Systems

Chapter 1: Introduction

Mini-Project, Exam, Etc

Client Server & Distributed System. A Basic Introduction

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

Mobile and Heterogeneous databases Security. A.R. Hurson Computer Science Missouri Science & Technology

Introduction. A more thorough explanation of the overall topic

DATABASE MANAGEMENT SYSTEM ARCHITECTURE

Transaction Management Exercises KEY

Philadelphia University Faculty of Information Technology Department of Computer Science --- Semester, 2007/2008. Course Syllabus

Chapter 2 Distributed Information Systems Architecture

LECTURE1: PRINCIPLES OF DATABASES

Course Name: Database Systems - 1 Course Code: IS211

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 1-1

B.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2

Chapter Outline. Chapter 2 Distributed Information Systems Architecture. Distributed transactions (quick refresh) Layers of an information system

DISTRIBUTED COMPUTING

Distributed Database Management Systems. Data and computation are distributed over different machines Different levels of complexity

Outline. Quick Introduction to Database Systems. Data Manipulation Tasks. What do they all have in common? CSE142 Wi03 G-1

CSC 261/461 Database Systems Lecture 20. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

CS564: Database Management Systems. Lecture 1: Course Overview. Acks: Chris Ré 1

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein Sameh Elnikety. Copyright 2012 Philip A. Bernstein

MIT Database Management Systems Lesson 01: Introduction

Kingdom of Saudi Arabia Ministry of Higher Education College of Computer & Information Sciences Majmaah University. Course Profile

COURSE OUTLINE. IST 253 Database Concept 3 Course Number Course Title Credits

ECE519 Advanced Operating Systems

Database Management Systems MIT Lesson 01 - Introduction By S. Sabraz Nawaz

CSC 407 Database System I COURSE PARTICULARS COURSE INSTRUCTORS COURSE DESCRIPTION

Transcription:

Outline Introduction to the Course Introduction Distributed DBMS Architecture Distributed Database Design Query Processing Transaction Management Issues in Distributed Databases and an Example Objectives of the course, deadlines, exam Transparency Intuitively Definition of Distributed Computing and Distributed Databases Transparency in Distributed Databases Distributed Concurrency Control Distributed DBMS Reliability Parallel Database Systems Spring, 2006 Arturas Mazeika Page 2 Issues in Distributed Databases An Example of a Distributed Database Architecture and Design How to fragment data How to replicate data Query and Transaction Management Convert user transactions into data manipulation queries Convert user queries into parallel queries Optimize queries Concurrency Control Synchronization of concurrent accesses Deadlock Management Reliability How to make the system resilient to failures Atomicity and Durability Database consists of two relations: employees and projects The relations are distributed What are the problems with queries, transactions, concurrency, and reliability? Spring, 2006 Arturas Mazeika Page 3 Spring, 2006 Arturas Mazeika Page 4

Objectives Teaching Format Course: Present and compare the key aspects of the distributed databases. I expect a high participation of the students in the course Labs: Select a topic of the distributed databases or extend your Data Warehousing and Data Mining project Lectures Project-based programming in the labs Spring, 2006 Arturas Mazeika Page 5 April 24th. The course starts Deadlines May 1st (one week) The students forms the groups, choose a mini-project, and defines the problem. The groups should consists of two to four people. The group sends an email to the professor of the course with the list of the members (together with their email addresses) and the problem definition June 8th (five weeks) The presentation of the projects. The group makes a five minutes presentation on the project. The presentation should include (i) the problem definition, (ii) example of the extended algorithm June 9th (five weeks) Mini-project report deadline. The students must send a PDF version of the report to the professor of the course. Also the students should hand in a hard copy of the report to the secretaries of the faculty. The report must define the problem (0.5-1 page), give a solution (0.5-1 page), illustrate the solution by examples (2-3 pages), give a manual to the command line interface (0.5-1 page) Spring, 2006 Arturas Mazeika Page 6 Exam Organization of the Exam: The exam is oral. The exam consists of two parts: (i) the defense of the project, and (ii) a question from the DDB course. The student will have to explain his project by an example. The student will have to explain the aspect of the distributed databases and show the positive and negative characteristics of the aspect Each student will be examined individually for 15 minutes This is an open book exam, i.e. the student is allows to use any material in the exam. However, the student should freely operate with the concepts and know definitions of the course Grading of the Exam: The student can earn up to 18 points for the project part (60 per cent of the grade) The student can earn up to 12 points for the theory part (40 per cent of the grade) The students of a group can earn up to 6 points for the presentation in the DB seminar series (optional) The final grade is a sum of 1 3. However, the maximum grade for the course is 30 points (+cum laude) Spring, 2006 Arturas Mazeika Page 7 Spring, 2006 Arturas Mazeika Page 8

Exam Cont. Data Independence: Evolution of Programs 1/3 Our Advice: Prepare a plan for each question of the theory part. Absence of a plan typically introduces some degree of disorder in the presentation, and lowers the final grade Answer the questions in the exam precisely and shortly. Giving all information that is just relevant to the asked question does not increase the grade of the exam The successful pass of the exam depends on two things: the knowledge of the course, and the ability to present the knowledge. Make sure you learn the subject and learn how to present the subject Programs can store data in regular files and achieve data independence (transparency) Spring, 2006 Arturas Mazeika Page 9 Data Independence: Evolution of Programs 2/3 Spring, 2006 Arturas Mazeika Page 10 Data Independence: Evolution of Programs 3/3 Programs can store data with a help of DBMS and fully achieve data independence (transparency) Goal: achieve data distribution transparency Integration of databases and networking does not mean centralization (in fact quite opposite) Spring, 2006 Arturas Mazeika Page 11 Spring, 2006 Arturas Mazeika Page 12

Definition of a Distributed Computation Synonymous Terms A distributed database is a collection of autonomous processing elements that are interconnected by a computer network. The elements cooperate in order to perform the assigned task Synonymous terms: distributed function distributed data processing multiprocessors/multicomputers satellite processing back-end processing dedicated/special purpose computers timeshared systems functionally modular systems The term distributed is very broadly used. The exact meaning of the word depends on the context Spring, 2006 Arturas Mazeika Page 13 What Can Be Distributed? Spring, 2006 Arturas Mazeika Page 14 Definition of DDB and DDBMS The following can be distributed: Processing logic Functions Data Control A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users I ll use the term DDBMS and DDBS interchangeably Spring, 2006 Arturas Mazeika Page 15 Spring, 2006 Arturas Mazeika Page 16

What is not a DDBS? DDBS Environment Shared Memory Shared Disk Shared Nothing Central Databases These systems are parallel database systems and are quite different (though related to) distributed DB systems Spring, 2006 Arturas Mazeika Page 17 Implicit Assumptions Spring, 2006 Arturas Mazeika Page 18 Applications Data stored at a number of sites each site logically consists of a single processor Processors at different sites are interconnected by a computer network (we do not consider multiprocessors in DDBMS, cf. parallel systems) DDBS is a database, not a collection of files (cf. relational data model). Placement and query of data is impacted by the access patterns of the user DDBMS is a collections of DBMSes (not a remote file system) Manufacturing, especially multi-plant manufacturing Military command and control Airlines Hotel chains Any organization which has a decentralized organization structure Spring, 2006 Arturas Mazeika Page 19 Spring, 2006 Arturas Mazeika Page 20

Motivation for DDBS Motivation for DDBS (Higher Reliability) Distributed Database Systems delivers: Higher reliability Partially improved performance System expansion Transparency Replication components No single points of failure Communications links Spring, 2006 Arturas Mazeika Page 21 Motivation for DDBS (Partial Improved Performance) Spring, 2006 Arturas Mazeika Page 22 Motivation for DDBS (System Expansion) Proximity of data to its points of use Requires some support for fragmentation and replication Parallelism in execution Inter-query parallelism Intra-query parallelism Update and read-only queries influences the design of DDBS substantially Issue is database scaling Emergence of microprocessor and workstation technologies Data communication cost versus telecommunication cost Increasing database size Spring, 2006 Arturas Mazeika Page 23 Spring, 2006 Arturas Mazeika Page 24

Motivation for DDBS (Transparency) Transparency User Wants to See One Database Programmer Sees Many Databases Transparent system hides the implementation details from the users Distribution transparency Replication transparency Fragmentation transparency Network transparency Location transparency Naming transparency Transaction transparency Concurrency transparency Failure transparency Performance transparency Spring, 2006 Arturas Mazeika Page 25 Transparency (Distribution Transparency) Spring, 2006 Arturas Mazeika Page 26 Transparency (Naming Transparency 1/3) Distribution transparency allows user to perceive database as single, logical entity If DDBMS exhibits distribution transparency, user does not need to know: data is fragmented (fragmentation transparency) location of data items (location transparency) With replication transparency, user is unaware of replication of fragments Each item in a DDB must have a unique name DDBMS must ensure that no two sites create a database object with same name One solution is to create central name server. However, this results in: loss of some local autonomy central site may become a bottleneck low availability. If the central site fails remaining sites cannot create any new objects Spring, 2006 Arturas Mazeika Page 27 Spring, 2006 Arturas Mazeika Page 28

Transparency (Naming Transparency 2/3) Transparency (Naming Transparency 3/3) Alternative solution - prefix object with identifier of site that created it For example, branch created at site S1 might be named S1.BRANCH Also need to identify each fragment and its copies Thus, copy 2 of fragment 3 of Branch created at site S1 might be referred to as S1.BRANCH.F3.C2 An approach that resolves these problems uses aliases for each database object Thus, S1.BRANCH.F3.C2 might be known as Local Branch by user at site S1 DDBMS has task of mapping an alias to appropriate database object Spring, 2006 Arturas Mazeika Page 29 Transparency (Transaction Transparency) Spring, 2006 Arturas Mazeika Page 30 Transparency (Concurrency Transparency 1/2) Ensures that all distributed transactions maintain integrity and consistency of the DDB Distributed transaction accesses data at more than one location Each transaction is divided into a number of sub-transactions (a sub-transaction for each site that has relevant data) DDBMS must ensure the indivisibility of both the global transaction and each of the sub-transactions All transactions must execute independently and be logically consistent. The result of a set of the (parallel) execution should be the same, as if the transactions were executed in some arbitrary serial order Same fundamental principles as for centralized DBMS DDBMS must ensure both global and local transactions do not interfere with each other Similarly, DDBMS must ensure consistency of all sub-transactions of global transaction Spring, 2006 Arturas Mazeika Page 31 Spring, 2006 Arturas Mazeika Page 32

Transparency (Concurrency Transparency 2/2) Transparency (Failure Transparency) Replication makes concurrency more complex If a copy of a replicated data item is updated, update must be propagated to all copies Could propagate changes as part of original transaction, making it an atomic operation However, if one site holding copy is not reachable, then transaction is delayed until site is reachable Could limit update propagation to only those sites currently available. Remaining sites updated when they become available again DDBMS must ensure atomicity and durability of the global transaction, i.e., the sub-transactions of the global transaction either all commit or all abort Thus, DDBMS must synchronize global transaction to ensure that all sub-transactions have completed successfully before recording a final COMMIT for the global transaction The solution should be robust in presence of site and network failures Could allow updates to copies to happen asynchronously, sometime after the original update. Delay in regaining consistency may range from a few seconds to several hours Spring, 2006 Arturas Mazeika Page 33 Transparency (Performance Transparency 1/2) Spring, 2006 Arturas Mazeika Page 34 Transparency (Performance Transparency 1/2) Distributed Query Processor (DQP) maps data request into an ordered sequence of operations on local databases DQP must consider fragmentation, replication, and allocation schemas DDBMS must perform as if it were a centralized DBMS DDBMS should not suffer any performance degradation due to the distributed architecture DDBMS should determine most cost-effective strategy to execute a request DQP has to decide: which fragment to access which copy of a fragment to use which location to use DQP produces execution strategy optimized with respect to some cost function Typically, costs associated with a distributed request include: I/O cost CPU cost communication cost Spring, 2006 Arturas Mazeika Page 35 Spring, 2006 Arturas Mazeika Page 36

Complicating Factors Related Technologies/Problems Complexity Cost Security Integrity control more difficult Lack of standards Lack of experience Database design more complex Distributed database design Distributed query processing Distributed directory management Distributed concurrency control Distributed deadlock management Reliability Heterogeneous databases Spring, 2006 Arturas Mazeika Page 37 Relationship Between Issues Spring, 2006 Arturas Mazeika Page 38 Conclusion A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network Data stored at a number of sites, the sites are connected by a network. DDB supports the relational model. DDB is not a remote file system Transparent system hides the implementation details from the users Distribution transparency Network transparency Transaction transparency Performance transparency Programming a distributed database involves: Distributed database design Distributed query processing Distributed directory management Distributed concurrency control Distributed deadlock management Reliability Spring, 2006 Arturas Mazeika Page 39 Spring, 2006 Arturas Mazeika Page 40