B.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2


Introduction :- Today a single-CPU architecture is not capable enough for modern databases, which must handle more demanding and complex user requirements, for example high performance, increased availability, distributed access to data, analysis of distributed data and so on. To meet these requirements, modern database systems operate on architectures in which multiple CPUs work in parallel to provide the database services. In some of these architectures, the CPUs are physically located in a close environment, such as the same building, and communicate at very high speed. Databases operating in such an environment are called Parallel Databases.

In a parallel database system, multiple CPUs work in parallel to improve performance through the parallel implementation of operations such as loading data, building indexes and evaluating queries. Parallel processing divides a large task into many smaller tasks and executes the smaller tasks concurrently on several CPUs; as a result, the larger task completes more quickly. Parallel database systems improve processing and I/O speed by using multiple CPUs and disks working in parallel. They are especially useful for applications that have to query large databases and process a large number of transactions per second. In parallel processing many operations are performed simultaneously, as opposed to centralized processing, in which computation is performed serially.

The goal of Parallel Database System :- To ensure that the database system can continue to perform at an acceptable speed even as the size of the database and the number of transactions increase. Increasing the capacity of the system by increasing parallelism provides a smoother growth path for an enterprise than replacing a centralized system with a faster machine.
Parallel database systems are usually designed to provide the best cost/performance ratio, and they are quite uniform in site machine architecture. Cooperation between site machines is usually achieved at the level of the transaction module of the database system. A parallel database system represents an attempt to construct a faster centralized computer using several small CPUs.

Why do we need Parallel Database?
o More and More Data! We have databases that hold a huge amount of data, on the order of 10^12 bytes: 1,000,000,000,000 bytes!
o Faster and Faster Access! We have data applications that need to process data at very high speeds: tens of thousands of transactions per second!

Advantages of Parallel Database System :-
o Increased throughput (Scale-Up).
o Reduced response time (Speed-Up).
o Useful for applications that query extremely large databases and process an extremely large number of transactions (on the order of thousands of transactions per second).
o Increased availability of the system.
o Greater flexibility.
o Possible to serve a large number of users.

Disadvantages of Parallel Database System :-
o Higher start-up cost.
o Interference problem: CPUs compete with each other for shared resources.

How to Measure the benefits?
o Speed-Up. As you multiply resources by a certain factor, the time taken to execute a task should be reduced by the same factor:
  10 seconds to scan a DB of 10,000 records using 1 CPU
  1 second to scan a DB of 10,000 records using 10 CPUs
o Scale-Up. As you multiply resources, the size of a task that can be executed in a given time should increase by the same factor:
  1 second to scan a DB of 1,000 records using 1 CPU
  1 second to scan a DB of 10,000 records using 10 CPUs
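The scan in the Speed-Up example can be sketched as a partitioned scan: the records are split into one partition per CPU, each partition is scanned independently, and the partial results are combined. In this minimal sketch, threads from Python's `concurrent.futures` stand in for CPUs, and the `is_even` predicate is a hypothetical filter. Because of CPython's global interpreter lock, this pure-Python code shows the structure of a parallel scan, not real CPU speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n):
    """Split records into n contiguous partitions whose sizes differ by at most one."""
    size, extra = divmod(len(records), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        parts.append(records[start:end])
        start = end
    return parts


def scan(part, predicate):
    """Sequentially scan one partition and count the records that match."""
    return sum(1 for record in part if predicate(record))


def parallel_scan(records, predicate, workers):
    """Scan every partition concurrently (threads stand in for CPUs), then add up the counts."""
    parts = partition(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(scan, parts, [predicate] * workers))
    return sum(partial_counts)


def is_even(record):
    """Hypothetical scan predicate."""
    return record % 2 == 0


if __name__ == "__main__":
    db = list(range(10_000))  # a hypothetical DB of 10,000 records
    print(parallel_scan(db, is_even, workers=1))   # 5000
    print(parallel_scan(db, is_even, workers=10))  # 5000
```

The answer is the same with 1 or 10 workers; only the amount of work per worker changes, which is what makes the ideal 10-second to 1-second reduction possible on real parallel hardware.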

Architecture of Parallel Database :-
o Shared-Memory Multiple CPU
o Shared-Disk Multiple CPU
o Shared-Nothing Multiple CPU

1. Shared-Memory Multiple CPU :- In this system a computer has multiple simultaneously active CPUs that are attached to an interconnection network and share a single MAIN MEMORY. Thus, in this architecture a single copy of a multithreaded operating system and a multithreaded DBMS can support multiple CPUs. This architecture is the closest to the traditional single-CPU processor of a centralized database system, but it is much faster than a single CPU of the same power.

Benefits of Shared-Memory Architecture :-
o Communication between CPUs is extremely efficient. Data can be accessed by any CPU without being moved by software.
o A CPU can send a message to another CPU much faster by using memory writes, which usually take less than a microsecond, than by sending a message through a communication mechanism.
o Communication overhead is low, because main memory can be used for this purpose, and operating system services can be used to utilize the additional CPUs.

Limitations of Shared-Memory Architecture :-

o Memory access uses a very high-speed mechanism that is difficult to partition without losing efficiency. Thus the design must take special care that the different CPUs have equal access to the common memory.
o Since the communication bus or interconnection network is shared by all CPUs, this architecture does not scale beyond about 80 or 100 CPUs in parallel. The bus or interconnection network becomes a bottleneck as the number of CPUs increases.
o Adding more CPUs causes the CPUs to spend time waiting for their turn on the bus to access memory.

2. Shared-Disk Multiple CPU :- In this system multiple CPUs are attached to an interconnection network, and each CPU has its own memory, but all of them have access to the same disk storage or, more commonly, to a shared array of disks. The scalability of the system is largely determined by the capacity and throughput of the interconnection network. Since main memory is not shared among the CPUs, each machine has its own OS and its own DBMS. Because the same data is accessible to every node, two or more nodes may want to read or write the same data at the same time. Therefore a global locking scheme is required to preserve data integrity.

Benefits of Shared-Disk Architecture :-
o Easy to load-balance, because data does not have to be permanently divided among the available CPUs.
o Since each CPU has its own memory, the memory bus is not a bottleneck.

o It offers a low-cost solution for providing a degree of fault tolerance. In the case of a CPU or memory failure, the other CPUs can take over its task, since the database resides on disks that are accessible from all CPUs.
o It has found acceptance in a wide range of applications.

Limitations of Shared-Disk Architecture :-
o It also faces interference and memory-contention bottlenecks as the number of CPUs increases. As more CPUs are added, the existing CPUs slow down because of increased contention for memory accesses and network bandwidth.
o It also has scalability problems. The interconnection to the disk subsystem becomes a bottleneck, particularly when the database makes a large number of disk accesses.

3. Shared-Nothing Multiple CPU :- In this system multiple CPUs are attached to an interconnection network, and each CPU has its own local memory and local disk storage, but no two CPUs can access the same storage area. All communication between CPUs is through a high-speed interconnection network. Thus a shared-nothing environment involves no sharing of memory or disk. Each CPU has its own copy of the OS, its own copy of the DBMS, and its own portion of the data managed by the DBMS. In this type of architecture, the CPUs sharing responsibility for database services usually split up the data among themselves. The CPUs then perform transactions and queries by dividing up the work and communicating by messages over the high-speed network.
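A minimal sketch of how shared-nothing CPUs can split up the data and the work, assuming hash partitioning on a hypothetical `id` column: each row is stored on exactly one node, each node aggregates only its own rows, and a coordinator merges the partial results that would travel over the interconnection network as messages. The `Node` class, the `orders` table and the in-process method calls are illustrative stand-ins for separate machines and real network messages.

```python
from collections import defaultdict


class Node:
    """One shared-nothing node: it can read only the rows on its own local disk."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.rows = []  # stands in for this node's private local storage

    def local_count_by(self, column):
        """Aggregate only the rows stored on this node (no remote access)."""
        counts = defaultdict(int)
        for row in self.rows:
            counts[row[column]] += 1
        return dict(counts)


def hash_partition(rows, nodes, key):
    """Store each row on exactly one node, chosen by hashing its partitioning key."""
    for row in rows:
        nodes[hash(row[key]) % len(nodes)].rows.append(row)


def coordinator_count_by(nodes, column):
    """Merge the partial results each node would send back over the network."""
    totals = defaultdict(int)
    for node in nodes:
        for value, n in node.local_count_by(column).items():
            totals[value] += n
    return dict(totals)


if __name__ == "__main__":
    nodes = [Node(i) for i in range(3)]
    regions = ["north", "south", "north", "east", "south", "north"]
    orders = [{"id": i, "region": r} for i, r in enumerate(regions)]  # hypothetical table
    hash_partition(orders, nodes, key="id")
    print([len(node.rows) for node in nodes])       # [2, 2, 2]
    print(coordinator_count_by(nodes, "region"))    # {'north': 3, 'east': 1, 'south': 2}
```

Only the small per-node partial counts cross the "network" here, which mirrors the point that local disk references stay local. The flip side is also visible: choosing `len(nodes)` as the modulus means that adding a node changes where most rows belong, which is why adding CPUs to a shared-nothing system can require redistributing data.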

Benefits of Shared-Nothing Architecture :-
o This architecture minimizes contention among CPUs by not sharing resources and therefore offers a high degree of scalability.
o Since local disk references are serviced by the local disk at each CPU, this architecture overcomes the limitation of requiring all I/O to go through a single interconnection network. Only queries, accesses to non-local disks, and result relations pass through the network.
o The interconnection network for this architecture is usually designed to be scalable. Thus adding more CPUs and more disks enables the system to grow in proportion to the power and capacity of the newly added components. In other words, the shared-nothing architecture provides linear Speed-Up and linear Scale-Up.
o Linear Speed-Up and Scale-Up increase the transmission capacity of the shared-nothing architecture as more nodes are added, and therefore it can easily support a large number of CPUs.

Limitations of Shared-Nothing Architecture :-
o Shared-nothing architectures are difficult to load-balance. In many multi-CPU environments, it is necessary to split the system workload in some way so that all system resources are used efficiently. Properly splitting or balancing the workload across a shared-nothing system requires an administrator to partition the data correctly across the various disks. In practice this is difficult to achieve.
o Adding a new CPU and disk to a shared-nothing architecture means the data may need to be redistributed in order to take advantage of the new resources, which requires more extensive reorganization of the DBMS.
o The cost of communication and non-local disk access is higher than in the shared-disk or shared-memory architecture, because sending data involves software interaction at both ends.
o The high-speed network is limited in size because of speed-of-light considerations. This leads to the requirement that a parallel architecture has CPUs that are physically close together.
o It requires an OS that is capable of handling the heavy amount of messaging required to support inter-processor communication.

Key Elements of Parallel Database Processing :-
o Speed-Up
o Scale-Up
o Synchronization
o Locking

1. Speed-Up :- Speed-Up is the property that the time taken to perform a task decreases as the number of CPUs increases. In other words, Speed-Up is the property of running a given task in less time by increasing the degree of parallelism (more hardware). With additional hardware, Speed-Up holds the task constant and measures the time saved. Thus Speed-Up enables users to improve the system response time for their queries, assuming the size of their database remains the same.

Speed-Up = To / Tp

where
To = execution time of a task on the original or smaller machine (original processing time)
Tp = execution time of the same task on the parallel or larger machine (parallel processing time)
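The Speed-Up ratio To / Tp, and the comparison of that ratio with the resource multiplier, can be computed directly. In this minimal sketch, `resource_factor` is an assumed name for N, the factor by which the larger system's resources exceed the smaller system's; the 40-second timing is a hypothetical value chosen to show a sub-linear case.

```python
import math


def speed_up(t_original, t_parallel):
    """Speed-Up = To / Tp for the same task on the smaller and the larger system."""
    if t_parallel <= 0:
        raise ValueError("parallel processing time must be positive")
    return t_original / t_parallel


def classify_speed_up(speedup, resource_factor):
    """Compare Speed-Up with N, the factor by which resources were multiplied."""
    if math.isclose(speedup, resource_factor):
        return "linear"
    if speedup < resource_factor:
        return "sub-linear"
    return "super-linear"  # above N; possible in practice but not covered in these notes


if __name__ == "__main__":
    s = speed_up(60, 30)               # 60 s originally, 30 s on double the hardware
    print(s, classify_speed_up(s, 2))  # 2.0 linear
    s = speed_up(60, 40)               # hypothetical: 40 s on double the hardware
    print(s, classify_speed_up(s, 2))  # 1.5 sub-linear
```

`math.isclose` is used instead of `==` so that timings measured as floats are not misclassified because of rounding.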

Here the original processing time To is the time spent by a centralized or small system on the given task, and the parallel processing time Tp is the time spent by a large or parallel system on the same task. Consider a database application running on a parallel system with a certain number of CPUs and disks. Now suppose the size of the system is increased by increasing the number of CPUs, disks and other hardware components. The goal is to process the task in time inversely proportional to the number of CPUs and disks allocated. If the original system takes 60 seconds to perform the task and the parallel system (with double the capacity) takes 30 seconds to complete the same task, then Speed-Up = 60/30 = 2. Because the resources were doubled, a Speed-Up value of 2 indicates Linear Speed-Up. In general, the Speed-Up is linear if it equals N when the larger system has N times the resources of the smaller system. If the Speed-Up value is less than N, the system is said to demonstrate Sub-Linear Speed-Up.

2. Scale-Up :- Scale-Up is the property that the performance of the parallel database is sustained if the number of CPUs and disks is increased in proportion to the amount of data. In other words, Scale-Up is the ability to handle a larger task, by increasing the degree of parallelism, in the same time period as the original system. With added hardware, Scale-Up holds the time constant and measures the increase in task size.
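Scale-Up can be computed as the ratio of the transaction volume the larger system processes to the volume the original system processes in the same amount of time. This minimal sketch checks the ratio against N, the resource multiplier; `resource_factor` is an assumed name for N, and the 5100-transaction figure is a hypothetical sub-linear case.

```python
import math


def scale_up(v_original, v_parallel):
    """Scale-Up = Vp / Vo, where both volumes are measured over the same time period."""
    if v_original <= 0:
        raise ValueError("original processing volume must be positive")
    return v_parallel / v_original


def is_linear_scale_up(scaleup, resource_factor):
    """Linear when N times the hardware handles N times the volume in the same time."""
    return math.isclose(scaleup, resource_factor)


if __name__ == "__main__":
    s = scale_up(3000, 6000)            # 3000 vs. 6000 transactions in the same time
    print(s, is_linear_scale_up(s, 2))  # 2.0 True
    s = scale_up(3000, 5100)            # hypothetical sub-linear case
    print(s, is_linear_scale_up(s, 2))  # 1.7 False
```

Note the contrast with Speed-Up: here the time is fixed and the volume grows, so the ratio is volume over volume rather than time over time.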

Thus Scale-Up enables users to increase the size of their database while maintaining the same response time.

Scale-Up = Vp / Vo

where
Vp = parallel or large processing volume
Vo = original or small processing volume

Here the original processing volume is the transaction volume processed in a given amount of time on a smaller system, and the parallel processing volume is the transaction volume processed in the same amount of time on a larger system. For example, if the original system can process 3000 transactions in a given amount of time and the parallel system (with twice the hardware) can process 6000 transactions in the same amount of time, then Scale-Up = 6000/3000 = 2. The Scale-Up value 2 indicates Linear Scale-Up, which means that twice as much hardware can process twice the data volume in the same amount of time. If the Scale-Up value is less than 2, it is called Sub-Linear Scale-Up. In general, with Linear Scale-Up the processing volume grows by the same factor as the resources of the parallel system.

3. Synchronization :- Synchronization is the coordination of concurrent tasks.

For successful operation of a parallel database system, the tasks should be divided so that the synchronization requirement is small. Synchronization is necessary for correctness, and with less synchronization, better Speed-Up and Scale-Up can be achieved. The amount of synchronization depends on the amount of resources and on the number of users and tasks working on those resources. More synchronization is required to coordinate a large number of concurrent tasks.

4. Locking :- Locking is a method of synchronizing concurrent tasks. Both internal and external locking mechanisms are used to synchronize the tasks required by a parallel database system. For external locking, a distributed lock manager (DLM) is used, which is a part of the OS. The DLM coordinates resource sharing between communicating nodes running a parallel server. The instances of a parallel server use the DLM to communicate with each other and coordinate the modification of database resources. The DLM allows applications to synchronize access to resources such as data, software and devices, so that concurrent requests for the same resource are coordinated between applications running on different nodes.

Query Parallelism :-
o Intra-Query Parallelism
o Inter-Query Parallelism
o Intra-Operation Parallelism
o Inter-Operation Parallelism

1. Intra-Query Parallelism :- Intra-Query Parallelism refers to the execution of a single query in parallel on multiple CPUs using the Shared-Nothing Architecture technique. It is sometimes called Parallel Query Processing. For example, suppose a table has been partitioned across multiple disks by range partitioning on some attribute, and the user now wants to SORT on the partitioning attribute. The SORT operation can be implemented by sorting each partition in parallel and then concatenating the sorted partitions to get the final sorted relation. Thus a query can be parallelized by parallelizing its individual operations.
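A minimal sketch of the parallel SORT described above, assuming the table is range-partitioned on the sort attribute with hypothetical boundary values `[10, 20]`: each partition is sorted independently (threads stand in for the CPUs), and because the partitions cover consecutive, non-overlapping ranges, simply concatenating them yields the fully sorted relation with no merge step.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor


def range_partition(values, boundaries):
    """Range-partition on the sort attribute. With sorted boundaries [b0, b1, ...],
    partition 0 gets v < b0, partition i gets b(i-1) <= v < bi, the last gets the rest."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for v in values:
        parts[bisect_right(boundaries, v)].append(v)
    return parts


def parallel_sort(values, boundaries):
    """Sort each range partition concurrently, then concatenate them in range order."""
    parts = range_partition(values, boundaries)
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        sorted_parts = list(pool.map(sorted, parts))
    # Every value in partition i is smaller than every value in partition i+1,
    # so concatenation (not a merge) yields the final sorted relation.
    return [v for part in sorted_parts for v in part]


if __name__ == "__main__":
    data = [42, 7, 19, 3, 25, 10, 31, 14]   # hypothetical attribute values
    print(range_partition(data, [10, 20]))  # [[7, 3], [19, 10, 14], [42, 25, 31]]
    print(parallel_sort(data, [10, 20]))    # [3, 7, 10, 14, 19, 25, 31, 42]
```

If the boundaries are chosen badly, one partition receives most of the rows and its CPU dominates the running time; this is the same load-balancing problem noted for shared-nothing systems.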

Advantages :-
o Intra-Query Parallelism speeds up long-running queries.
o It is beneficial for decision-support applications that issue complex, read-only queries, including queries involving multiple JOINs.

2. Inter-Query Parallelism :- In Inter-Query Parallelism, multiple transactions are executed in parallel, one by each CPU. It is sometimes also called Parallel Transaction Processing. The primary use of Inter-Query Parallelism is to scale up a transaction-processing system to support a large number of transactions per second. To support Inter-Query Parallelism, a DBMS generally uses task or transaction dispatching. Efficient lock management is another method used by a DBMS to support Inter-Query Parallelism, particularly in the Shared-Disk architecture. In this case the DBMS must track the locks held by different transactions executing on different CPUs in order to preserve data integrity. Since in Inter-Query Parallelism each query runs sequentially, it does not help to speed up a long-running query. Inter-Query Parallelism on a Shared-Disk architecture performs best when transactions that execute in parallel do not access the same disk.

Advantages :-

o It is the easiest form of parallelism to support in a database system, particularly in a Shared-Memory parallel system.
o It scales up a transaction-processing system to support a large number of transactions per second.

Disadvantages :-
o The response time of an individual transaction is no faster than it would be if the transaction were run in isolation.
o It is more complicated to support in Shared-Disk and Shared-Nothing architectures.

3. Intra-Operation Parallelism :- In Intra-Operation Parallelism, each individual operation of a task, such as sorting, projection, join and so on, is executed in parallel. Since the number of operations in a typical query is small compared to the number of tuples processed by each operation, Intra-Operation Parallelism scales better with increasing parallelism.

Advantages :-
o Intra-Operation Parallelism is natural in a database.
o The degree of parallelism is potentially enormous.

4. Inter-Operation Parallelism :- In Inter-Operation Parallelism, the different operations in a query expression are executed in parallel. The following two types of Inter-Operation Parallelism are used:
o Pipelined Parallelism
o Independent Parallelism

1. Pipelined Parallelism :- In this form of parallelism, the output tuples of one operation A are consumed by a second operation B even before the first operation has produced the entire set of tuples in its output. Thus it is possible to run operations A and B simultaneously on different processors, so that operation B consumes tuples in parallel with operation A producing them.

Advantages :-
o Pipelined parallelism is useful with a small number of CPUs.
o Pipelined execution also avoids writing intermediate results to disk.
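A minimal sketch of pipelined parallelism, assuming two hypothetical operators: operation A is a selection that keeps rows with a positive `qty`, and operation B is a projection onto `item`. The two run in separate threads connected by a bounded in-memory queue, so B consumes each tuple while A is still producing the rest, and no intermediate result is written to disk. The operator names, the sentinel and the `buffer_size` parameter are illustrative assumptions.

```python
import queue
import threading

_END = object()  # sentinel: operation A has produced its last tuple


def operation_a(rows, out_q):
    """Hypothetical selection: pass on each qualifying tuple as soon as it is found."""
    for row in rows:
        if row["qty"] > 0:
            out_q.put(row)  # blocks while the buffer is full, so A never runs far ahead of B
    out_q.put(_END)


def operation_b(in_q, results):
    """Hypothetical projection: consume tuples while A is still producing."""
    while True:
        row = in_q.get()
        if row is _END:
            break
        results.append(row["item"])


def pipelined(rows, buffer_size=4):
    """Run A and B at the same time, connected by a small in-memory buffer."""
    buf = queue.Queue(maxsize=buffer_size)  # intermediate tuples stay in memory, not on disk
    results = []
    producer = threading.Thread(target=operation_a, args=(rows, buf))
    consumer = threading.Thread(target=operation_b, args=(buf, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


if __name__ == "__main__":
    stock = [{"item": "pen", "qty": 3}, {"item": "ink", "qty": 0}, {"item": "pad", "qty": 5}]
    print(pipelined(stock))  # ['pen', 'pad']
```

This pattern only works because both operators can emit output incrementally; a blocking operator such as a full sort could not take A's place, since it produces nothing until it has read all of its input.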

Disadvantages :-
o It does not scale up well.
o Pipeline chains generally do not attain sufficient length to provide a high degree of parallelism.
o It is not possible to pipeline relational operators that do not produce output until all their inputs have been accessed.
o Only marginal Speed-Up is obtained for the frequent case in which one operation's cost is much higher than the others'.

2. Independent Parallelism :- In independent parallelism, the operations in a query expression that do not depend on one another can be executed in parallel.

Advantages :-
o It is useful with a lower degree of parallelism.

Disadvantages :-
o Like pipelined parallelism, independent parallelism does not provide a high degree of parallelism, so it is less useful in a highly parallel system.
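A minimal sketch of independent parallelism, using the common textbook case of a four-way join r1 JOIN r2 JOIN r3 JOIN r4: the joins r1 JOIN r2 and r3 JOIN r4 share no inputs, so they can run at the same time, and only the final join must wait for both. The simple in-memory `hash_join` helper and the relations are assumptions for illustration; threads stand in for processors.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_join(left, right, key):
    """Equi-join two lists of dicts on `key` using an in-memory hash table."""
    table = {}
    for row in left:
        table.setdefault(row[key], []).append(row)
    return [{**l, **r} for r in right for l in table.get(r[key], [])]


def independent_parallel_join(r1, r2, r3, r4, key):
    """r1 JOIN r2 and r3 JOIN r4 do not depend on each other, so run them
    concurrently; the final join depends on both and must wait for them."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(hash_join, r1, r2, key)
        f2 = pool.submit(hash_join, r3, r4, key)
        temp1, temp2 = f1.result(), f2.result()
    return hash_join(temp1, temp2, key)


if __name__ == "__main__":
    r1 = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
    r2 = [{"id": 1, "b": 10}, {"id": 2, "b": 20}]
    r3 = [{"id": 1, "c": True}]
    r4 = [{"id": 1, "d": "z"}, {"id": 3, "d": "w"}]
    print(independent_parallel_join(r1, r2, r3, r4, "id"))
    # [{'id': 1, 'a': 'x', 'b': 10, 'c': True, 'd': 'z'}]
```

Only two operations can ever run at once in this plan, which illustrates why independent parallelism alone does not provide a high degree of parallelism.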


More information

Four-Socket Server Consolidation Using SQL Server 2008

Four-Socket Server Consolidation Using SQL Server 2008 Four-Socket Server Consolidation Using SQL Server 28 A Dell Technical White Paper Authors Raghunatha M Leena Basanthi K Executive Summary Businesses of all sizes often face challenges with legacy hardware

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff and Shun Tak Leung Google* Shivesh Kumar Sharma fl4164@wayne.edu Fall 2015 004395771 Overview Google file system is a scalable distributed file system

More information

Definition of RAID Levels

Definition of RAID Levels RAID The basic idea of RAID (Redundant Array of Independent Disks) is to combine multiple inexpensive disk drives into an array of disk drives to obtain performance, capacity and reliability that exceeds

More information

IMPROVING THE PERFORMANCE, INTEGRITY, AND MANAGEABILITY OF PHYSICAL STORAGE IN DB2 DATABASES

IMPROVING THE PERFORMANCE, INTEGRITY, AND MANAGEABILITY OF PHYSICAL STORAGE IN DB2 DATABASES IMPROVING THE PERFORMANCE, INTEGRITY, AND MANAGEABILITY OF PHYSICAL STORAGE IN DB2 DATABASES Ram Narayanan August 22, 2003 VERITAS ARCHITECT NETWORK TABLE OF CONTENTS The Database Administrator s Challenge

More information

Parallel Query Optimisation

Parallel Query Optimisation Parallel Query Optimisation Contents Objectives of parallel query optimisation Parallel query optimisation Two-Phase optimisation One-Phase optimisation Inter-operator parallelism oriented optimisation

More information

Outline. Parallel Database Systems. Information explosion. Parallelism in DBMSs. Relational DBMS parallelism. Relational DBMSs.

Outline. Parallel Database Systems. Information explosion. Parallelism in DBMSs. Relational DBMS parallelism. Relational DBMSs. Parallel Database Systems STAVROS HARIZOPOULOS stavros@cs.cmu.edu Outline Background Hardware architectures and performance metrics Parallel database techniques Gamma Bonus: NCR / Teradata Conclusions

More information

CPS352 Lecture: Database System Architectures last revised 3/27/2017

CPS352 Lecture: Database System Architectures last revised 3/27/2017 CPS352 Lecture: Database System Architectures last revised 3/27/2017 I. Introduction - ------------ A. Most large databases require support for accesing the database by multiple users, often at multiple

More information

CSC 261/461 Database Systems Lecture 20. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

CSC 261/461 Database Systems Lecture 20. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101 CSC 261/461 Database Systems Lecture 20 Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101 Announcements Project 1 Milestone 3: Due tonight Project 2 Part 2 (Optional): Due on: 04/08 Project 3

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

Lecture 9: MIMD Architectures

Lecture 9: MIMD Architectures Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is connected

More information

Database Server. 2. Allow client request to the database server (using SQL requests) over the network.

Database Server. 2. Allow client request to the database server (using SQL requests) over the network. Database Server Introduction: Client/Server Systems is networked computing model Processes distributed between clients and servers. Client Workstation (usually a PC) that requests and uses a service Server

More information

Database Technology Database Architectures. Heiko Paulheim

Database Technology Database Architectures. Heiko Paulheim Database Technology Database Architectures Today So far, we have treated Database Systems as a black box We can define a schema...and write data into it...and read data from it Today Opening the black

More information

High Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

High Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur High Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 23 Hierarchical Memory Organization (Contd.) Hello

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Distributed Databases

Distributed Databases Distributed Databases These slides are a modified version of the slides of the book Database System Concepts (Chapter 20 and 22), 5th Ed., McGraw-Hill, by Silberschatz, Korth and Sudarshan. Original slides

More information

CA485 Ray Walshe Google File System

CA485 Ray Walshe Google File System Google File System Overview Google File System is scalable, distributed file system on inexpensive commodity hardware that provides: Fault Tolerance File system runs on hundreds or thousands of storage

More information

CS307: Operating Systems

CS307: Operating Systems CS307: Operating Systems Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building 3-513 wuct@cs.sjtu.edu.cn Download Lectures ftp://public.sjtu.edu.cn

More information

First-In-First-Out (FIFO) Algorithm

First-In-First-Out (FIFO) Algorithm First-In-First-Out (FIFO) Algorithm Reference string: 7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1 3 frames (3 pages can be in memory at a time per process) 15 page faults Can vary by reference string:

More information

SYSTEM UPGRADE, INC Making Good Computers Better. System Upgrade Teaches RAID

SYSTEM UPGRADE, INC Making Good Computers Better. System Upgrade Teaches RAID System Upgrade Teaches RAID In the growing computer industry we often find it difficult to keep track of the everyday changes in technology. At System Upgrade, Inc it is our goal and mission to provide

More information

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture II: Indexing Part I of this course Indexing 3 Database File Organization and Indexing Remember: Database tables

More information

Chapter 13: I/O Systems. Operating System Concepts 9 th Edition

Chapter 13: I/O Systems. Operating System Concepts 9 th Edition Chapter 13: I/O Systems Silberschatz, Galvin and Gagne 2013 Chapter 13: I/O Systems Overview I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations

More information

Example: CPU-bound process that would run for 100 quanta continuously 1, 2, 4, 8, 16, 32, 64 (only 37 required for last run) Needs only 7 swaps

Example: CPU-bound process that would run for 100 quanta continuously 1, 2, 4, 8, 16, 32, 64 (only 37 required for last run) Needs only 7 swaps Interactive Scheduling Algorithms Continued o Priority Scheduling Introduction Round-robin assumes all processes are equal often not the case Assign a priority to each process, and always choose the process

More information

CompSci 516: Database Systems. Lecture 20. Parallel DBMS. Instructor: Sudeepa Roy

CompSci 516: Database Systems. Lecture 20. Parallel DBMS. Instructor: Sudeepa Roy CompSci 516 Database Systems Lecture 20 Parallel DBMS Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 Announcements HW3 due on Monday, Nov 20, 11:55 pm (in 2 weeks) See some

More information

Memory Management. Jo, Heeseung

Memory Management. Jo, Heeseung Memory Management Jo, Heeseung Today's Topics Why is memory management difficult? Old memory management techniques: Fixed partitions Variable partitions Swapping Introduction to virtual memory 2 Memory

More information

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering Multiprocessors and Thread-Level Parallelism Multithreading Increasing performance by ILP has the great advantage that it is reasonable transparent to the programmer, ILP can be quite limited or hard to

More information

Chapter 13: I/O Systems

Chapter 13: I/O Systems Chapter 13: I/O Systems DM510-14 Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations STREAMS Performance 13.2 Objectives

More information

Best Practices for Setting BIOS Parameters for Performance

Best Practices for Setting BIOS Parameters for Performance White Paper Best Practices for Setting BIOS Parameters for Performance Cisco UCS E5-based M3 Servers May 2013 2014 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public. Page

More information

Chapter 3. Database Architecture and the Web

Chapter 3. Database Architecture and the Web Chapter 3 Database Architecture and the Web 1 Chapter 3 - Objectives Software components of a DBMS. Client server architecture and advantages of this type of architecture for a DBMS. Function and uses

More information

I/O Systems. Amir H. Payberah. Amirkabir University of Technology (Tehran Polytechnic)

I/O Systems. Amir H. Payberah. Amirkabir University of Technology (Tehran Polytechnic) I/O Systems Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) I/O Systems 1393/9/15 1 / 57 Motivation Amir H. Payberah (Tehran

More information

Current Topics in OS Research. So, what s hot?

Current Topics in OS Research. So, what s hot? Current Topics in OS Research COMP7840 OSDI Current OS Research 0 So, what s hot? Operating systems have been around for a long time in many forms for different types of devices It is normally general

More information

Increasing Performance for PowerCenter Sessions that Use Partitions

Increasing Performance for PowerCenter Sessions that Use Partitions Increasing Performance for PowerCenter Sessions that Use Partitions 1993-2015 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying,

More information

Data Modeling and Databases Ch 14: Data Replication. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 14: Data Replication. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 14: Data Replication Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich Database Replication What is database replication The advantages of

More information

CSCI 4717 Computer Architecture

CSCI 4717 Computer Architecture CSCI 4717/5717 Computer Architecture Topic: Symmetric Multiprocessors & Clusters Reading: Stallings, Sections 18.1 through 18.4 Classifications of Parallel Processing M. Flynn classified types of parallel

More information

Column Stores vs. Row Stores How Different Are They Really?

Column Stores vs. Row Stores How Different Are They Really? Column Stores vs. Row Stores How Different Are They Really? Daniel J. Abadi (Yale) Samuel R. Madden (MIT) Nabil Hachem (AvantGarde) Presented By : Kanika Nagpal OUTLINE Introduction Motivation Background

More information

SSDs vs HDDs for DBMS by Glen Berseth York University, Toronto

SSDs vs HDDs for DBMS by Glen Berseth York University, Toronto SSDs vs HDDs for DBMS by Glen Berseth York University, Toronto So slow So cheap So heavy So fast So expensive So efficient NAND based flash memory Retains memory without power It works by trapping a small

More information

Introduction. CS3026 Operating Systems Lecture 01

Introduction. CS3026 Operating Systems Lecture 01 Introduction CS3026 Operating Systems Lecture 01 One or more CPUs Device controllers (I/O modules) Memory Bus Operating system? Computer System What is an Operating System An Operating System is a program

More information

Chapter 12: I/O Systems

Chapter 12: I/O Systems Chapter 12: I/O Systems Chapter 12: I/O Systems I/O Hardware! Application I/O Interface! Kernel I/O Subsystem! Transforming I/O Requests to Hardware Operations! STREAMS! Performance! Silberschatz, Galvin

More information

Chapter 13: I/O Systems

Chapter 13: I/O Systems Chapter 13: I/O Systems Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations STREAMS Performance Silberschatz, Galvin and

More information

Chapter 12: I/O Systems. Operating System Concepts Essentials 8 th Edition

Chapter 12: I/O Systems. Operating System Concepts Essentials 8 th Edition Chapter 12: I/O Systems Silberschatz, Galvin and Gagne 2011 Chapter 12: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations STREAMS

More information

Parallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides)

Parallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides) Parallel Computing 2012 Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides) Parallel Algorithm Design Outline Computational Model Design Methodology Partitioning Communication

More information

Heckaton. SQL Server's Memory Optimized OLTP Engine

Heckaton. SQL Server's Memory Optimized OLTP Engine Heckaton SQL Server's Memory Optimized OLTP Engine Agenda Introduction to Hekaton Design Consideration High Level Architecture Storage and Indexing Query Processing Transaction Management Transaction Durability

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part V Lecture 15, March 15, 2015 Mohammad Hammoud Today Last Session: DBMS Internals- Part IV Tree-based (i.e., B+ Tree) and Hash-based (i.e., Extendible

More information

SONAS Best Practices and options for CIFS Scalability

SONAS Best Practices and options for CIFS Scalability COMMON INTERNET FILE SYSTEM (CIFS) FILE SERVING...2 MAXIMUM NUMBER OF ACTIVE CONCURRENT CIFS CONNECTIONS...2 SONAS SYSTEM CONFIGURATION...4 SONAS Best Practices and options for CIFS Scalability A guide

More information

The Design and Optimization of Database

The Design and Optimization of Database Journal of Physics: Conference Series PAPER OPEN ACCESS The Design and Optimization of Database To cite this article: Guo Feng 2018 J. Phys.: Conf. Ser. 1087 032006 View the article online for updates

More information

Hierarchical Clustering: A Structure for Scalable Multiprocessor Operating System Design

Hierarchical Clustering: A Structure for Scalable Multiprocessor Operating System Design Journal of Supercomputing, 1995 Hierarchical Clustering: A Structure for Scalable Multiprocessor Operating System Design Ron Unrau, Orran Krieger, Benjamin Gamsa, Michael Stumm Department of Electrical

More information

Operating System Performance and Large Servers 1

Operating System Performance and Large Servers 1 Operating System Performance and Large Servers 1 Hyuck Yoo and Keng-Tai Ko Sun Microsystems, Inc. Mountain View, CA 94043 Abstract Servers are an essential part of today's computing environments. High

More information

Database Architectures

Database Architectures B0B36DBS, BD6B36DBS: Database Systems h p://www.ksi.m.cuni.cz/~svoboda/courses/172-b0b36dbs/ Lecture 11 Database Architectures Authors: Tomáš Skopal, Irena Holubová Lecturer: Mar n Svoboda, mar n.svoboda@fel.cvut.cz

More information

Peer-to-Peer Systems. Chapter General Characteristics

Peer-to-Peer Systems. Chapter General Characteristics Chapter 2 Peer-to-Peer Systems Abstract In this chapter, a basic overview is given of P2P systems, architectures, and search strategies in P2P systems. More specific concepts that are outlined include

More information

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana Introduction to Indexing 2 Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana Indexed Sequential Access Method We have seen that too small or too large an index (in other words too few or too

More information

Segmentation with Paging. Review. Segmentation with Page (MULTICS) Segmentation with Page (MULTICS) Segmentation with Page (MULTICS)

Segmentation with Paging. Review. Segmentation with Page (MULTICS) Segmentation with Page (MULTICS) Segmentation with Page (MULTICS) Review Segmentation Segmentation Implementation Advantage of Segmentation Protection Sharing Segmentation with Paging Segmentation with Paging Segmentation with Paging Reason for the segmentation with

More information

Data Processing on Modern Hardware

Data Processing on Modern Hardware Data Processing on Modern Hardware Jens Teubner, TU Dortmund, DBIS Group jens.teubner@cs.tu-dortmund.de Summer 2014 c Jens Teubner Data Processing on Modern Hardware Summer 2014 1 Part V Execution on Multiple

More information

CMSC424: Database Design. Instructor: Amol Deshpande

CMSC424: Database Design. Instructor: Amol Deshpande CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representa1on of the data Data Retrieval How to ask ques1ons of the database How to answer those ques1ons

More information

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi Lecture - 13 Virtual memory and memory management unit In the last class, we had discussed

More information

Multiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448. The Greed for Speed

Multiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448. The Greed for Speed Multiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448 1 The Greed for Speed Two general approaches to making computers faster Faster uniprocessor All the techniques we ve been looking

More information

VERITAS Storage Foundation 4.0 TM for Databases

VERITAS Storage Foundation 4.0 TM for Databases VERITAS Storage Foundation 4.0 TM for Databases Powerful Manageability, High Availability and Superior Performance for Oracle, DB2 and Sybase Databases Enterprises today are experiencing tremendous growth

More information

King 2 Abstract: There is one evident area of operating systems that has enormous potential for growth and optimization. Only recently has focus been

King 2 Abstract: There is one evident area of operating systems that has enormous potential for growth and optimization. Only recently has focus been King 1 Input and Output Optimization in Linux for Appropriate Resource Allocation and Management James Avery King March 25, 2016 University of North Georgia Annual Research Conference King 2 Abstract:

More information