Classifying Physical Storage Media. Chapter 11: Storage and File Structure. Storage Hierarchy (Cont.) Storage Hierarchy. Magnetic Hard Disk Mechanism


Chapter 11: Storage and File Structure
- Overview of Storage Media
- Magnetic Disks
- Characteristics
- RAID
- Database Buffers
- Structure of Records
- Organizing Records within Files
- Data-Dictionary Storage

Classifying Physical Storage Media
- Speed
  - Initial delay (latency)
  - Sustained rate
- Cost per bit
- Reliability
  - Data loss on power failure or system crash
  - Physical failure, occasional or permanent
- Volatile/non-volatile
  - Volatile: loses contents when power goes off

Storage Hierarchy
[Figure: storage hierarchy. Labels: faster toward the processor; lower cost per bit and more capacity farther from it. Level annotations: volatile, sometimes portable, portable.]

Storage Hierarchy (Cont.)
- Primary storage
  - E.g. cache, main memory (semiconductor RAM)
  - Fastest, but volatile and expensive
  - For programs and files in recent active use
- Secondary storage
  - E.g. on-line magnetic hard disk
  - Default location for programs and data/databases
- Tertiary storage
  - E.g. removable magnetic tape, optical disk
  - Slowest but cheapest
  - For archives and back-ups

Magnetic Hard Disk Mechanism
- A sector is the smallest addressable unit (~512 bytes)
- To access a sector:
  1. Move the head to the correct track (seek time)
  2. Spin the disk to the right angular position (rotational latency)
- NOTE: The diagram is schematic and simplifies the structure of actual disk drives

Performance Measures of Disks
- A magnetic disk's access time is ~1 million times slower than main memory's
- Access time = seek time + rotational latency
  - Initial delay from request to first bit of data
- Seek time
  - Average seek time is ~1/2 the worst-case seek time
  - 8 to 20 ms on typical disks
- Rotational latency
  - Average latency is ~1/2 of the worst-case latency
  - Worst case (one full rotation) is 4 to 11 ms on typical disks (5400 to 15000 r.p.m.), so the average is about 2 to 5.5 ms
- Data-transfer rate (bandwidth)
  - From first byte to last byte of one block request
  - 25 to 100 MB/s max, lower for inner tracks
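As a rough worked example of the access-time formula on the last slide, the sketch below estimates the delay for one block request. The function names, the 4096-byte block size, and the sample figures are assumptions chosen for illustration; only the formula (access time = seek time + rotational latency) and the typical ranges come from the slides.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one full rotation, in milliseconds."""
    full_rotation_ms = 60_000.0 / rpm  # 60,000 ms per minute / rotations per minute
    return full_rotation_ms / 2.0


def access_time_ms(avg_seek_ms: float, rpm: float) -> float:
    """Access time = seek time + rotational latency (delay until the first bit)."""
    return avg_seek_ms + avg_rotational_latency_ms(rpm)


def transfer_time_ms(block_bytes: int, rate_mb_per_s: float) -> float:
    """Time to stream one block once the head is in position (1 MB = 10**6 bytes)."""
    return block_bytes / (rate_mb_per_s * 1_000_000) * 1000.0


if __name__ == "__main__":
    # Hypothetical drive: 8 ms average seek, 15000 r.p.m., 50 MB/s transfer rate.
    access = access_time_ms(avg_seek_ms=8.0, rpm=15000)
    transfer = transfer_time_ms(block_bytes=4096, rate_mb_per_s=50.0)
    print(f"access time:   {access:.2f} ms")
    print(f"transfer time: {transfer:.4f} ms")
```

With these assumed figures the access time is 8 + 2 = 10 ms, while streaming the 4 KB block takes under 0.1 ms. The initial delay dominates, which is why reducing the number of disk accesses matters more than raw bandwidth.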

Optimization of Disk-Block Access
- Minimize the number of disk accesses
  - Larger blocks?
  - Arrange file contents by blocks
- Minimize physical distance traveled
  - Cluster blocks together
  - Reorder the sequence of disk accesses
- Use fast caches and buffers to eliminate or parallelize disk accesses

RAID for Reliability and Speed
- Redundant Arrays of Independent Disks
- With N disks, the failure rate increases by a factor of N
- Reliability through redundancy
  - Mirroring (or shadowing): duplicate set of disks; tolerates the complete failure of one disk per pair
  - Parity bit: detects single-bit errors, and can correct them when the failed disk is known
- Performance through parallelism
  - Can complete one request more quickly
  - Can sometimes do multiple requests in parallel
  - Bit-level striping: spread the bits of each byte across 8 disks
  - Block-level striping: spread consecutive blocks across N disks

RAID Levels 1 and 5 (Most Popular)
- Level 1: mirrored disks with block striping
  - Offers best write performance
  - Popular for applications such as storing log files in a database system
- Level 5: block-interleaved distributed parity
  - Partitions data and parity among all N + 1 disks, rather than storing data on N disks and parity on 1 disk

Choice of RAID Level
- Decision factors
  - Monetary cost
  - Throughput and bandwidth of normal operation
  - Performance during failure
  - Performance during rebuild of failed disk
- Level 1 has better write performance than level 5
  - Level 5 requires at least 2 block reads and 2 block writes to write a single block; level 1 requires only 2 block writes
- Level 5 is preferred for applications with a low update rate and large amounts of data
- Level 1 is preferred for all other applications

Database Buffer
- The DBMS uses part of main memory as a disk cache (the buffer)
- When a block is needed:
  1. If the block is already in the buffer, get it from the buffer instead of disk (cache hit)
  2. If the block is not in the buffer (cache miss), allocate space in the buffer for the block, then read the block from disk into that space
     A. Replace some block already in the buffer to make space for the new block
     B. The replaced block is first written back to disk if it was modified since it was buffered (write-back policy)

Buffer-Replacement Policies
- Most operating systems replace the block least recently used (LRU strategy)
- LRU isn't always a good strategy
  - Example: nested-loop join
      for each tuple t_r of r do
        for each tuple t_s of s do
          if the tuples t_r and t_s match ...
- Better strategy: a query optimizer provides hints on replacement strategy
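To see why LRU can behave badly for the nested-loop join, the following sketch counts buffer misses while the inner relation s is scanned repeatedly. It is an illustrative simulation, not how any particular DBMS implements its buffer manager: the function `count_misses`, the block names, and the buffer capacity are all hypothetical, and the most-recently-used (MRU) policy is brought in here only as one possible alternative for comparison (the slides themselves say only that the optimizer supplies hints).

```python
from collections import OrderedDict


def count_misses(accesses, capacity, policy="LRU"):
    """Count cache misses for a sequence of block accesses.

    The OrderedDict keeps blocks ordered from least to most recently used.
    """
    buffer = OrderedDict()
    misses = 0
    for block in accesses:
        if block in buffer:
            buffer.move_to_end(block)  # cache hit: mark as most recently used
            continue
        misses += 1
        if len(buffer) >= capacity:
            # LRU evicts the oldest entry (front); MRU evicts the newest (back).
            buffer.popitem(last=(policy == "MRU"))
        buffer[block] = True
    return misses


# Assumption: s occupies 4 blocks and is scanned once per outer tuple (3 scans),
# while the buffer has room for only 3 blocks.
inner_scans = [b for _ in range(3) for b in ("s0", "s1", "s2", "s3")]

lru_misses = count_misses(inner_scans, capacity=3, policy="LRU")
mru_misses = count_misses(inner_scans, capacity=3, policy="MRU")
print("LRU misses:", lru_misses)
print("MRU misses:", mru_misses)
```

In this setup LRU always evicts exactly the block that the next scan will need soonest, so every access misses. A policy informed by the access pattern keeps part of s resident across scans.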

File Organization
- A database is stored as a collection of files. Each file is a sequence of records. A record is a sequence of fields.
- Two approaches
  - Fixed-length records, homogeneous files
    - Each file has records of one particular type only
    - Each table has a separate file
  - Variable-length records, heterogeneous files
    - Variable-length fields (VARCHAR)
    - Records from multiple tables in one file

Case 1: Fixed-Length Records
- Store record i starting from byte n * (i - 1), where n is the size of each record
- But what if we need to delete record i? 3 options (here n denotes the number of the last record):
  - Move records i + 1, ..., n to i, ..., n - 1
  - Move record n to i
  - Do not move records, but link all free records on a free list

Free Lists
- Store the address of the first deleted record in the file header
- Use this first record to point to the second deleted record, and so on
- The figure shows a pointer field in each record, but there is a way to eliminate this. How?

Case 2: Variable-Length Fields
- Variable-length fields can be represented by a pair (offset, length)
  - offset is the location within the record
  - length is the field length
- All fields start at a predefined location, but extra indirection is required for variable-length fields
- Q: How do we insert a record? How do we then update the free list?
- [Figure: example record structure of an account record. Labels: account_number (A-102), balance (400), branch_name (Perryridge), offset 10.]

Slotted Page Structure for Mixed Files
- Header contains:
  - Number of record entries
  - End of free space in the block
  - Location and size of each record
- Records can be moved around within a page to keep them contiguous, with no empty space between them; the entry in the header must be updated

Organizing Records within a File
- Sequential: store records in sequential order, based on the value of the search key of each record
- Heap: a record can be placed anywhere in the file where there is space
- Hashing: a hash function is computed on some attribute of each record; the result specifies in which block of the file the record should be placed
- Multitable clustering: stores related records from several different relations adjacent to one another
  - Like doing part of the work of a JOIN ahead of time
- More in Chapter 12
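The slotted page structure described above can be sketched in a few lines. This is a simplified model under stated assumptions: the class name `SlottedPage` and its methods are hypothetical, the header is kept in ordinary Python attributes rather than serialized into the block, and header space is not charged against the free space. It does show the three header items from the slide (number of entries, end of free space, location and size of each record) and the compaction step that keeps records contiguous after a deletion.

```python
class SlottedPage:
    """Toy slotted page: records are packed at the end of the block."""

    def __init__(self, size: int = 64):
        self.data = bytearray(size)
        self.slots = []        # header: (offset, length) per record, None if deleted
        self.free_end = size   # header: end of free space in the block

    def insert(self, record: bytes) -> int:
        if len(record) > self.free_end:
            raise ValueError("not enough free space in page")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1  # slot number identifies the record

    def read(self, slot: int) -> bytes:
        entry = self.slots[slot]
        if entry is None:
            raise KeyError(f"slot {slot} was deleted")
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot: int) -> None:
        entry = self.slots[slot]
        if entry is None:
            raise KeyError(f"slot {slot} was deleted")
        offset, length = entry
        # Slide every record stored before `offset` toward the end of the
        # block so that no gap remains between records.
        self.data[self.free_end + length:offset + length] = self.data[self.free_end:offset]
        # Update the header entries of the records that moved.
        for i, other in enumerate(self.slots):
            if other is not None and other[0] < offset:
                self.slots[i] = (other[0] + length, other[1])
        self.slots[slot] = None
        self.free_end += length


page = SlottedPage(32)
a = page.insert(b"A-102")
b = page.insert(b"Perryridge")
c = page.insert(b"400")
page.delete(b)               # record c is moved; its header entry is updated
print(page.read(a), page.read(c), page.free_end)
```

Because callers refer to records by slot number instead of byte offset, records can move within the page without invalidating outside references. That indirection is the main design point of the structure.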

Sequential File Organization
- Suitable for applications that require sequential processing of the entire file
- The records in the file are ordered by a search key

Maintaining Sequential File Organization
- Deletion: use pointer chains
- Insertion: locate the position where the record is to be inserted
  - If there is free space, insert there
  - If there is no free space, insert the record in an overflow block
  - In either case, the pointer chain must be updated
- Need to reorganize the file from time to time to restore sequential order

Multitable Clustering File Organization
- Store several relations in one file using a multitable clustering file organization

Multitable Clustering File Organization
- [Figure: multitable clustering of the customer and depositor relations.]
- Good for queries involving depositor ⋈ customer, and for queries involving one single customer and his accounts
- Bad for queries involving only customer
- Can add pointer chains to link records of a particular relation

Data Dictionary (aka System Catalog)
- Stores metadata; that is, data about data, such as:
  - Logical schema
    - Names of relations
    - Names and types of attributes of each relation
    - Names and definitions of views
    - Integrity constraints
  - Physical schema
    - Physical location of relation
    - How relation is organized (sequential/hash/...)
    - Information about indices (Chapter 12)
  - User and accounting information
  - Statistical and descriptive data

Data Dictionary Storage (Cont.)
- Catalog structure
  - Relational representation on disk, or
  - Specialized data structures designed for efficient access
  - Q: What does MySQL use?
- A possible catalog representation:
  - Relation_metadata = (relation_name, number_of_attributes, storage_organization, location)
  - Attribute_metadata = (attribute_name, relation_name, domain_type, position, length)
  - User_metadata = (user_name, encrypted_password, group)
  - Index_metadata = (index_name, relation_name, index_type, index_attributes)
  - View_metadata = (view_name, definition)
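The relational catalog representation listed above can be mimicked with plain tuples to show how a system might answer "which attributes does this relation have, and how long is one record?". This is only a sketch: the field names follow the slide, but the helper functions, the sample rows for an account relation, and the domain types, lengths, and file location are made-up illustration values.

```python
from typing import List, NamedTuple


class RelationMetadata(NamedTuple):
    relation_name: str
    number_of_attributes: int
    storage_organization: str
    location: str


class AttributeMetadata(NamedTuple):
    attribute_name: str
    relation_name: str
    domain_type: str
    position: int
    length: int


def attributes_of(relation_name: str,
                  catalog: List[AttributeMetadata]) -> List[AttributeMetadata]:
    """Return the attributes of one relation in their declared order."""
    rows = [a for a in catalog if a.relation_name == relation_name]
    return sorted(rows, key=lambda a: a.position)


def record_length(relation_name: str, catalog: List[AttributeMetadata]) -> int:
    """Size in bytes of one fixed-length record of the relation."""
    return sum(a.length for a in attributes_of(relation_name, catalog))


# Hypothetical catalog contents for an account relation.
relations = [RelationMetadata("account", 3, "sequential", "/data/account.dat")]
attributes = [
    AttributeMetadata("balance", "account", "real", 3, 8),
    AttributeMetadata("account_number", "account", "char(10)", 1, 10),
    AttributeMetadata("branch_name", "account", "char(22)", 2, 22),
]

names = [a.attribute_name for a in attributes_of("account", attributes)]
print(names, record_length("account", attributes))
```

Storing the catalog as ordinary relations means the system can query its own metadata with the same machinery it uses for user data. The specialized-structure alternative on the slide trades that uniformity for faster access.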

Figure 11.4 RAID Levels

Figure 11.100 More recent RAID Levels

Different Ways to Handle Deleting Record 2
- Shift all later records to fill the gap
- Move last record to fill the gap

Clustering File Structure With Pointer Chains

Byte-String Representation of Variable-Length Records
- Non-1NF schema: (branch_name, {account(s)})
- Attach an end-of-record (⊥) control character to the end of each record
- Difficulty with deletion
- Difficulty with growth

Byte-String Representation (cont)
- Allow two kinds of block in file:
  - Anchor block: contains the first records of chains
  - Overflow block: contains records other than those that are the first records of chains