
Professor: Pete Keleher (keleher@cs.umd.edu)

- Mechanisms and definitions for working with FDs: closures (a small sketch follows below), candidate keys, canonical covers, etc.; Armstrong's axioms.
- Decompositions: lossless decompositions, dependency-preserving decompositions.
- BCNF: how to achieve a BCNF schema.
- BCNF may not preserve dependencies; 3NF solves that problem.
- BCNF still allows some redundancy; 4NF solves that problem.
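Since the recap mentions closures, here is a minimal sketch (in Python, with a made-up toy schema, not from the lecture) of the standard attribute-closure algorithm used when finding candidate keys and checking FDs:

```python
# Sketch: closure of an attribute set under a set of functional dependencies.
def closure(attrs, fds):
    """attrs: set of attribute names; fds: list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the left-hand side, we also get the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Toy schema R(A, B, C, D) with FDs A -> B and B -> C.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(closure({"A"}, fds))        # {'A', 'B', 'C'}: A is not a key (D is missing)
print(closure({"A", "D"}, fds))   # {'A', 'B', 'C', 'D'}: AD is a candidate key
```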

MovieTitle      MovieYear   StarName        Address
Star wars       1977        Harrison Ford   Address 1, LA
Star wars       1977        Harrison Ford   Address 2, FL
Indiana Jones   198x        Harrison Ford   Address 1, LA
Indiana Jones   198x        Harrison Ford   Address 2, FL
Witness         19xx        Harrison Ford   Address 1, LA
Witness         19xx        Harrison Ford   Address 2, FL

A lot of redundancy. FDs? There are no non-trivial FDs, so the schema is trivially in BCNF (and 3NF). What went wrong?
- The redundancy is caused by multi-valued dependencies (MVDs), denoted:
  starname ->> address
  starname ->> movietitle, movieyear
- This should not happen if the schema is constructed from an E/R diagram.
- Functional dependencies are a special case of multi-valued dependencies. (The cross-product effect of the MVDs is illustrated below.)
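To see why the MVDs force the redundancy above, here is a small sketch (a toy in-memory representation, not part of the lecture) showing that a star's movies and addresses are independent, so the single relation must contain their full cross product:

```python
# Each star has a set of movies and a set of addresses, independent of each other.
movies = {"Harrison Ford": [("Star wars", "1977"),
                            ("Indiana Jones", "198x"),
                            ("Witness", "19xx")]}
addresses = {"Harrison Ford": ["Address 1, LA", "Address 2, FL"]}

# starname ->> address (and starname ->> movietitle, movieyear) means the relation
# must hold every (movie, address) combination per star: 3 x 2 = 6 rows here.
rows = [(title, year, star, addr)
        for star, films in movies.items()
        for (title, year) in films
        for addr in addresses[star]]
for r in rows:
    print(r)
```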

- Similar to BCNF, except with MVDs instead of FDs.
- Given a relation schema R and a set of multi-valued dependencies F, if every MVD A ->> B is either:
  1. trivial, or
  2. such that A is a superkey of R,
  then R is in 4NF (4th Normal Form). (A small sketch of this test appears below.)
- 4NF implies BCNF implies 3NF implies 2NF implies 1NF:
  If a schema is in 4NF, it is in BCNF; if a schema is in BCNF, it is in 3NF.
  The other way round is not always true.

                                        3NF      BCNF     4NF
  Eliminates redundancy due to FDs      Mostly   Yes      Yes
  Eliminates redundancy due to MVDs     No       No       Yes
  Preserves FDs                         Yes      Maybe    Maybe
  Preserves MVDs                        Maybe    Maybe    Maybe

  All three are lossless.

- 4NF is typically desired and achieved. A good E/R diagram won't generate non-4NF relations at all.
- The choice between 3NF and BCNF is up to the designer.
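A hedged sketch of what the 4NF test above looks like in code (hypothetical helper names; superkey checking is simplified to membership in a precomputed set):

```python
# Sketch: check the 4NF condition for relation R given its MVDs and known superkeys.
def trivial(lhs, rhs, R):
    # A ->> B is trivial if B is a subset of A, or A union B covers all of R.
    return rhs <= lhs or lhs | rhs == R

def in_4nf(R, mvds, superkeys):
    return all(trivial(a, b, R) or a in superkeys for a, b in mvds)

R = frozenset({"starname", "movietitle", "movieyear", "address"})
mvds = [(frozenset({"starname"}), frozenset({"address"}))]
superkeys = {R}  # here the whole relation is the only superkey

print(in_4nf(R, mvds, superkeys))  # False: starname is not a superkey
# The standard fix is to decompose on the violating MVD:
#   R1(starname, address)  and  R2(starname, movietitle, movieyear)
```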

Three ways to come up with a schema:
1. Using an E/R diagram: if it is good, then little normalization is needed; this tends to generate 4NF designs.
2. A universal relation R that contains all attributes (the "universal relation" approach); note that MVDs will be needed in this case.
3. An ad hoc schema that is then normalized; MVDs may be needed in this case.

What about 1st and 2nd normal forms?
- 1NF: essentially says that no set-valued attributes are allowed. Formally, a domain is called atomic if the elements of the domain are considered indivisible; a schema is in 1NF if the domains of all attributes are atomic. We assumed 1NF throughout the discussion; non-1NF is just not a good idea. (A tiny flattening example follows below.)
- 2NF: mainly of historic interest; see Exercise 7.15 in the book if interested.
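For the 1NF point, a tiny illustration (toy data, not from the book) of turning a set-valued attribute into atomic rows:

```python
# Non-1NF: the 'phones' attribute is set-valued.
non_1nf = [("Alice", {"301-555-0001", "301-555-0002"}),
           ("Bob",   {"301-555-0003"})]

# 1NF version: one atomic phone value per row.
in_1nf = [(name, phone) for name, phones in non_1nf for phone in sorted(phones)]
print(in_1nf)
```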

We would like our relation schemas to:
- not allow potential redundancy because of FDs or MVDs;
- be dependency-preserving: make it easy to check for dependencies, since they are a form of integrity constraints.

Functional dependencies / multi-valued dependencies:
- Domain knowledge about the data properties.

Normal forms:
- Define the rules that schemas must follow.
- 4NF is preferred, but 3NF is sometimes used instead.

Denormalization:
- After doing the normalization, we may have too many tables; too many tables mean too many joins during queries.
- We may denormalize for performance reasons.
- A better option is to use views instead: if a specific set of tables is joined often, create a view on the join.

More advanced normal forms:
- Project-join normal form (PJNF or 5NF), domain-key normal form.
- Rarely used in practice.

Professor: Pete Keleher (keleher@cs.umd.edu)

- Data Models: conceptual representation of the data.
- Data Retrieval: how to ask questions of the database; how to answer those questions.
- Data Storage: how/where to store data, and how to access it.
- Data Integrity: manage crashes and concurrency; manage semantic inconsistencies.

Query Processing Engine:
- Given an input user query, decide how to execute it.
- Specify the sequence of pages to be brought into memory.
- Operate upon the tuples to produce results.

Buffer Management (a toy sketch follows below):
- Bringing pages from disk to memory.
- Managing the limited memory.

Space Management on Persistent Storage (e.g., disks):
- Storage hierarchy.
- How are relations mapped to files?
- How are tuples mapped to disk blocks?

(In the architecture diagram: the user query goes to the query processing engine, which returns results; the engine sends page requests to the buffer manager and gets back pointers to pages; the buffer manager sends block requests to the storage layer and gets back data.)

Outline:
- Storage hierarchy
- Disks
- RAID
- File organization
- Etc.
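A minimal sketch of the buffer-manager role described above (a made-up page API and LRU eviction; real buffer managers are far more involved):

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: keeps at most `capacity` pages in memory, LRU eviction."""
    def __init__(self, capacity, read_block_from_disk):
        self.capacity = capacity
        self.read_block = read_block_from_disk   # callback that issues block requests
        self.pages = OrderedDict()               # page_id -> page data, in LRU order

    def get_page(self, page_id):
        if page_id in self.pages:                # hit: no disk I/O needed
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        if len(self.pages) >= self.capacity:     # pool full: evict least recently used
            self.pages.popitem(last=False)
        data = self.read_block(page_id)          # miss: bring the block in from disk
        self.pages[page_id] = data
        return data

# Usage (hypothetical read callback):
pool = BufferPool(100, read_block_from_disk=lambda pid: b"page %d" % pid)
print(pool.get_page(7))
```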

- Tradeoffs between speed and cost of access.
- Volatile vs. non-volatile: volatile storage loses its contents when power is switched off.
- Sequential vs. random access:
  Sequential: read the data contiguously (e.g., select * from employee).
  Random: read the data from anywhere at any time (e.g., select * from employee where name like ...).
- Why care? We need to know how data is stored in order to optimize and to understand what's going on.
- Trade-offs have shifted drastically over the last 10-15 years, especially with fast networks, SSDs, and large memories. However, the volume of data is also growing quite rapidly.
- Some observations:
  It is cheaper to access another computer's memory than the local disk.
  Cache is playing a more and more important role.
  Data often fits in the memory of a single machine, or of a cluster of machines, making disk considerations less important.
  Still: disks are where most of the data lives today, and similar reasoning and algorithms are required anyway.


- Cache: super fast; volatile; typically on chip.
  L1 vs. L2 vs. L3 caches: L1 about 64 KB or so; L2 about 1 MB; L3 from 8 MB (on chip) to 256 MB (off chip).
  Huge L3 caches are available nowadays, so it is becoming more and more important to care about this: cache misses are expensive.
  Similar tradeoffs as between main memory and disks.
  Cache coherency?
(source: http://cse1.net/recaps/4-memory.html)

(Figure: K8 core in the AMD Athlon 64 CPU.)

- Main memory: 10s or 100s of ns; volatile.
  Pretty cheap and dropping: 1 GByte < $100, now around $10. Main-memory databases are feasible nowadays.
- Flash memory (EEPROM):
  Limited number of write/erase cycles.
  Non-volatile; slower than main memory (especially writes).
  Examples?
- Question: how does what we discuss next change if we use flash memory only?
  Key issue: random access is as cheap as sequential access.

- Magnetic disk (hard drive):
  Non-volatile.
  Sequential access much, much faster than random access.
  Discussed in more detail later.
- Optical storage (CDs/DVDs; jukeboxes):
  Used more as backups. Why? Very slow to write (if possible at all).
- Tape storage:
  Backups; super cheap; painful to access.
  IBM just released a secure tape drive storage solution.

Classification:
- Primary: e.g., main memory, cache; typically volatile, fast.
- Secondary: e.g., disks, solid state drives (SSDs); non-volatile.
- Tertiary: e.g., tapes; non-volatile, super cheap, slow.

Jim Gray's storage latency analogy: how far away is the data? (source: http://cse1.net/recaps/4-memory.html)

  Relative access time   Level                  Analogy             Time analogy
  10^9                   Tape / optical robot   Andromeda           2,000 years
  10^6                   Disk                   Pluto               2 years
  100                    Memory                 Sacramento          1.5 hr
  10                     On-board cache         This lecture hall   10 min
  2                      On-chip cache          This room
  1                      Registers              My head             1 min

Outline:
- Storage hierarchy
- Disks
- RAID
- File organization
- Etc.

1956: IBM RAMAC, 24 platters, 100,000 characters each, 5 million characters total.

- 1979: Seagate, 5 MB
- 1998: Seagate, 47 GB
- 2006: Western Digital, 500 GB (weight, max.: 600 g)
- Latest single hard drive: Toshiba 7200.10 SATA, 3 TB, 7200 rpm, uses perpendicular recording, $84

Accessing a sector:
- Time to seek to the track (seek time): average 4 to 10 ms.
- Waiting for the sector to get under the head (rotational latency): average 4 to 11 ms.
- Time to transfer the data (transfer time): very low.
- About 10 ms per access in total. So with randomly accessed blocks, we can only do about 100 block transfers per second: 100 x 512 bytes = 50 KB/s.

Data transfer rates:
- Rate at which data can be transferred without any seeks: 30-50 MB/s, up to 200 MB/s (compare to the 50 KB/s above).
- Seeks are bad! (The arithmetic is spelled out below.)
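The arithmetic behind "seeks are bad", as a small sketch using the rough per-access numbers from the slide:

```python
seek_ms, rot_ms = 6.0, 4.0        # rough average seek and rotational latency
access_ms = seek_ms + rot_ms      # ~10 ms per random block access; transfer time is negligible
block_bytes = 512

accesses_per_sec = 1000 / access_ms                   # ~100 random block reads per second
random_kb_per_sec = accesses_per_sec * block_bytes / 1024
print(random_kb_per_sec)                              # ~50 KB/s when every block needs a seek

seq_kb_per_sec = 50 * 1024                            # vs. 30-50+ MB/s for sequential transfer
print(seq_kb_per_sec / random_kb_per_sec)             # roughly a 1000x gap
```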

Example disk specification:
- Heads: 8; disks: 4
- Bytes per sector: 512
- Default cylinders: 16,383
- Default sectors per track: 63
- Default read/write heads: 16
- Spindle speed: 7,200 rpm
- Average latency: 4.16 ms
- Track-to-track seek time: 1-1.2 ms
- Internal data transfer rate: 1,287 Mbits/sec max
- Average seek: 8.5-9.5 ms

We also care about power now.

Mean time to/between failure (MTTF/MTBF): 57 to 136 years.
Consider: 1,000 new disks, each with 1,200,000 hours of MTTF. On average, one of them will fail every 1,200 hours = 50 days.
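The failure arithmetic from the slide, spelled out (a simple expected-value estimate that ignores failure-rate curves):

```python
mttf_hours = 1_200_000      # quoted MTTF per disk
n_disks = 1000

# With 1000 disks, some disk is expected to fail roughly every MTTF / n hours.
hours_to_first_failure = mttf_hours / n_disks
print(hours_to_first_failure, "hours =", hours_to_first_failure / 24, "days")  # 1200 h = 50 days
```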

Disk controller:
- Interface between the disk and the CPU.
- Accepts the commands.
- Uses checksums to verify correctness.
- Remaps bad sectors.

Blocks:
- Sectors are typically too small, so data is grouped into blocks.
- Block: a contiguous sequence of sectors, 512 bytes to several kilobytes.
- All data transfers are done in units of blocks.

Scheduling of block access requests:
- Considerations: performance and fairness.
- Elevator algorithm (sketched below).
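A sketch of the elevator (SCAN) idea mentioned above: serve pending requests in track order while the head sweeps one way, then reverse and serve the rest (a simplified model with a hypothetical request list):

```python
def elevator_order(pending, head, moving_up=True):
    """Return the service order for pending track requests under the SCAN policy."""
    above = sorted(t for t in pending if t >= head)              # tracks ahead when moving up
    below = sorted((t for t in pending if t < head), reverse=True)  # tracks behind, nearest first
    return above + below if moving_up else below + above

# Head at track 53, moving toward higher track numbers.
print(elevator_order([98, 183, 37, 122, 14, 124, 65, 67], head=53))
# [65, 67, 98, 122, 124, 183, 37, 14]
```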

Solid State Drives (SSDs):
- Essentially flash that emulates hard disk interfaces.
- No seeks, so much better random read performance.
- Writes are slower, and the number of writes to the same location is limited; must write an entire block at a time.
- About a factor of 10 more expensive right now.
- Will soon lead to perhaps the most radical hardware configuration change in a while.

Outline:
- Storage hierarchy
- Disks
- RAID
- File organization
- Etc.

RAID: Redundant Array of Independent Disks.
- Goal: disks are very cheap, but failures are very costly, so use extra disks to ensure reliability. If one disk goes down, the data still survives. RAID also allows faster access to data.
- There are many RAID levels, with different reliability and performance properties.

(a) No redundancy.
(b) Mirroring: make a copy of the disks. If one disk goes down, we have a copy.
    Reads: can go to either disk, so a higher data rate is possible.
    Writes: need to write to both disks.

(c) Memory-style error correction: keep extra bits around so we can reconstruct the data. Superseded by the schemes below.
(d) One disk contains parity for the main data disks. Can handle a single disk failure, with little overhead (only 25% in the above case).

RAID Level 5:
- Distributed parity blocks instead of bits; subsumes Level 4.
- Normal operation:
  Read: read directly from the disk; uses all 5 disks.
  Write: need to read and update the parity block. To update block 9 to 9':
    read 9 and P2;
    compute P2' = P2 xor 9 xor 9';
    write 9' and P2'.
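The parity math above is just XOR. A small sketch of the Level 5 small-write path (toy byte-string blocks and made-up names, not a real RAID implementation):

```python
import functools

def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(functools.reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

# One stripe of four data blocks; the parity block is their XOR.
data = [b"\x01\x10", b"\x02\x20", b"\x09\x90", b"\x04\x40"]
parity = xor_blocks(*data)

# Small write: update one block without reading the other data blocks.
old, new = data[2], b"\x0a\xa0"
parity = xor_blocks(parity, old, new)      # P' = P xor old xor new
data[2] = new

assert parity == xor_blocks(*data)         # parity still covers the whole stripe
```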

Failure operation (disk 3 has failed):
- Read block 0: read it directly from disk 2.
- Read block 1 (which is on disk 3): read P0, 0, 2, 3 and compute 1 = P0 xor 0 xor 2 xor 3.
- Write: to update block 9 to 9', read 9 and P2; but P2 is on disk 3, so there is no need to update it; just write 9'.

RAID 1 vs. RAID 5:
- The main choice is between RAID 1 and RAID 5.
- Level 1 has better write performance than Level 5:
  Level 5 needs 2 block reads and 2 block writes to write a single block; Level 1 only requires 2 block writes.
  Level 1 is preferred for high-update environments such as log disks.
- Level 5 has lower storage cost:
  Level 1 uses 50% of the disks for redundancy.
  Level 5 is preferred for applications with a low update rate and large amounts of data.
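And the failure-mode read: a lost block is recovered by XOR-ing the parity block with the surviving data blocks (self-contained toy sketch, same assumptions as above):

```python
from functools import reduce

def xor_blocks(*blocks):
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

data = [b"\x01\x10", b"\x02\x20", b"\x09\x90", b"\x04\x40"]
parity = xor_blocks(*data)

# Suppose the disk holding data[1] fails: rebuild its block from parity + survivors.
survivors = [blk for i, blk in enumerate(data) if i != 1]
print(xor_blocks(parity, *survivors) == data[1])   # True
```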