Relational Theory and Data Independence: Unfinished Business. Logical Data Independence and the CREATE VIEW Statement.

Similar documents
Violating Independence

Data Base Concepts. Course Guide 2

8) A top-to-bottom relationship among the items in a database is established by a

Fundamentals of Information Systems, Seventh Edition

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Slide 16-1

SQL. History. From Wikipedia, the free encyclopedia.

Introduction to SET08104

Introduction to Information Systems

2. An implementation-ready data model needn't necessarily contain enforceable rules to guarantee the integrity of the data.

CHAPTER 2: DATA MODELS

B.H.GARDI COLLEGE OF MASTER OF COMPUTER APPLICATION. Ch. 1 :- Introduction Database Management System - 1

Data about data is database Select correct option: True False Partially True None of the Above

CHAPTER 2: DATA MODELS

Course Logistics & Chapter 1 Introduction

Lecture 02. Fall 2017 Borough of Manhattan Community College

Elena Baralis and Tania Cerquitelli 2013 Politecnico di Torino 1

Basant Group of Institution

Mahathma Gandhi University

Several major software companies including IBM, Informix, Microsoft, Oracle, and Sybase have all released object-relational versions of their

Review -Chapter 4. Review -Chapter 5

Definitions. Database Architecture. References Fundamentals of Database Systems, Elmasri/Navathe, Chapter 2. HNC Computing - Databases

COSC 304 Introduction to Database Systems. Database Introduction. Dr. Ramon Lawrence University of British Columbia Okanagan

DATABASTEKNIK - 1DL116

CISC 3140 (CIS 20.2) Design & Implementation of Software Application II

DBMS and its Architecture

Layers. External Level Conceptual Level Internal Level

DATABASE DESIGN I - 1DL300

FAQ: Relational Databases in Accounting Systems

Introduction to Data Management. Lecture #2 (Big Picture, Cont.)

Database Management Systems. Chapter 1

DATABASE MANAGEMENT SYSTEMS. UNIT I Introduction to Database Systems

CPS510 Database System Design Primitive SYSTEM STRUCTURE

Relational Data Model

Database Systems Concepts *

Practical Database Design Methodology and Use of UML Diagrams Design & Analysis of Database Systems

Oral Questions and Answers (DBMS LAB) Questions & Answers- DBMS

4/28/2014. File-based Systems. Arose because: Result

DATABASE SYSTEMS CHAPTER 2 DATA MODELS 1 DESIGN IMPLEMENTATION AND MANAGEMENT INTERNATIONAL EDITION ROB CORONEL CROCKETT

What is Data? ANSI definition: Volatile vs. persistent data. Data. Our concern is primarily with persistent data

What is Data? Volatile vs. persistent data Our concern is primarily with persistent data

CPS352 Lecture - The Transaction Concept

DATABASE SYSTEM CONCEPTS AND ARCHITECTURE

Review for Exam 1 CS474 (Norton)

CPS221 Lecture: Relational Database Querying and Updating

Introduction to the Structured Query Language [ SQL ] (Significant Concepts)

Index. Bitmap Heap Scan, 156 Bitmap Index Scan, 156. Rahul Batra 2018 R. Batra, SQL Primer,

CPS221 Lecture: Relational Database Querying and Updating

Bottom line: A database is the data stored and a database system is the software that manages the data. COSC Dr.

Introduction to Data Management. Lecture #2 (Big Picture, Cont.) Instructor: Chen Li

Teradata Certified Professional Program Teradata V2R5 Certification Guide

Database Fundamentals Chapter 1

Database Optimization

Database Management Systems MIT Introduction By S. Sabraz Nawaz

Solved MCQ on fundamental of DBMS. Set-1

Event Stores (I) [Source: DB-Engines.com, accessed on August 28, 2016]

Database Systems Overview. Truong Tuan Anh CSE-HCMUT

Relational Model. Topics. Relational Model. Why Study the Relational Model? Linda Wu (CMPT )

Help student appreciate the DBMS scope of function

Introduction to Databases

SYED AMMAL ENGINEERING COLLEGE

Chapter 7. Introduction to Structured Query Language (SQL) Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel

CPS352 Lecture - Indexing

Relational Database Management Systems Oct/Nov I. Section-A: 5 X 4 =20 Marks

Database Management System. Fundamental Database Concepts

CS211 Lecture: Database Querying and Updating

Essay Question: Explain 4 different means by which constrains are represented in the Conceptual Data Model (CDM).

Fundamentals of Physical Design: State of Art

Review: Where have we been?

Introduction to Database Management Systems

Stored Relvars 18 th April 2013 (30 th March 2001) David Livingstone. Stored Relvars

Introduction to Database Systems. Chapter 1. Instructor: . Database Management Systems, R. Ramakrishnan and J. Gehrke 1

The Relational Model

DATA AND SCHEMA MODIFICATIONS CHAPTERS 4,5 (6/E) CHAPTER 8 (5/E)

Readings. Important Decisions on DB Tuning. Index File. ICOM 5016 Introduction to Database Systems

Database Management System Dr. S. Srinath Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.

11. Architecture of Database Systems

The Relational Model. Relational Data Model Relational Query Language (DDL + DML) Integrity Constraints (IC)

Techno India Batanagar Computer Science and Engineering. Model Questions. Subject Name: Database Management System Subject Code: CS 601

Database Systems: Design, Implementation, and Management Tenth Edition. Chapter 7 Introduction to Structured Query Language (SQL)

Introduction to Database Systems. Fundamental Concepts

Relational Database Systems Part 01. Karine Reis Ferreira

Interview Questions on DBMS and SQL [Compiled by M V Kamal, Associate Professor, CSE Dept]

EMC GREENPLUM MANAGEMENT ENABLED BY AGINITY WORKBENCH

CGS 3066: Spring 2017 SQL Reference

DumpsKing. Latest exam dumps & reliable dumps VCE & valid certification king


Brief History of SQL. Relational Database Management System. Popular Databases

Assignment Session : July-March

CSE 3241: Database Systems I Databases Introduction (Ch. 1-2) Jeremy Morris

Product Review: James F. Koopmann Pine Horse, Inc. SoftTree s SQL Assistant. Product Review: SoftTree s SQL Assistant

Wentworth Institute of Technology COMP2670 Databases Spring 2016 Derbinsky. Physical Tuning. Lecture 12. Physical Tuning

5/23/2014. Limitations of File-based Approach. Limitations of File-based Approach CS235/CS334 DATABASE TECHNOLOGY CA 40%

UNIT I. Introduction

Copyright 2004 Pearson Education, Inc.

Databases: Why? Databases: What? Databases: How? DATABASE DESIGN I - 1DL300

Course Modules for MCSA: SQL Server 2016 Database Development Training & Certification Course:

Administrivia. The Relational Model. Review. Review. Review. Some useful terms

4. SQL - the Relational Database Language Standard 4.3 Data Manipulation Language (DML)

data dependence Data dependence Structure dependence

Lecture2: Database Environment

Transcription:

Relational Theory and Data Independence: Unfinished Business. Dr. Tom Johnston Much has been made of the data independence that relational technology is said to provide. And indeed, much has been accomplished by this theory and these products. In particular, relational products implement two levels of data independence: physical and logical. These concepts can best be understood in terms of the ANSI/SPARC three-layered data architecture. [insert illustration.] Logical Data Independence and the CREATE VIEW Statement. SQL Data Definition Language (DDL) provides a CREATE VIEW statement which does two things. First, it creates the objects in the External Layer. These objects are view schemas. Second, it provides a mapping between those objects and the objects in the next layer down. Those nextlayer-down objects are base table schemas, and so CREATE VIEW statements define schemas for views based on schemas for tables. This mapping is what provides logical data independence, which is independence of objects in the External Layer from objects in the Conceptual Layer. To a considerable extent, schemas in either layer can be changed without affecting schemas in the other layer. Also, because this mapping is done declaratively (with the CREATE VIEW statement, not with procedural code), it is easily changed. This ease of change this mapping which is specified declaratively makes change less expensive than if the mapping had been done with procedural code. Physical Data Independence and the CREATE TABLE Statement. At the next layer down, SQL DDL provides a CREATE TABLE statement which also does two things. First, it creates the objects in the Conceptual Layer schemas for base tables. Second, it partially specifies objects in the Last revised November, 2007. Page 1.

Physical Layer, and associates them with those tables. This association constitutes the mapping between the Conceptual and Physical Layers. This mapping is what provides physical data independence, which is independence of objects in the Conceptual Layer from objects in the Physical Layer. But much of the work to create Physical Layer persistent physical structures (files, indexes, hash keys) and also to create Physical Layer non-persistent physical structures (buffers, queues, stacks) is done automatically. The DBA neither needs to nor is able to influence many of the decisions which go into setting up the Physical Layer for a database. DBAs are always able to direct the creation of some Physical Layer objects, however. This varies from one DBMS to another. But all come with SQL DDL dialects that offer different clauses of the CREATE TABLE statement that do influence the construction of the Physical Layer, e.g. clauses that define unique or non-unique indexes on non-primary key columns. Three ANSI/SPARC Layers, Two CREATE Statements, One Problem. So in effect, CREATE TABLE statements do three things, not two things as do CREATE VIEW statements. CREATE TABLE statements: Create base table schemas, the objects in the Conceptual Layer; Create physical storage structures, the objects in the Physical layer; and Provides a mapping between those two sets of objects. This lack of a Physical Layer CREATE statement might be considered an aesthetic blemish only, doing no more harm than destroying the symmetry of three CREATE statements, one for each of the three ANSI/SPARC layers. But we can nonetheless distinguish the conceptual from the physical aspects of CREATE TABLE statements. We can do it like this. Last revised November, 2007. Page 2.

CREATE TABLE As a Conceptual And As a Physical Specification. Both the External and the Conceptual Layers are accessible to the programmer and end-user, since SQL can be written against either views or base tables. 1 So any portion of a CREATE TABLE statement that can alter what is visible to SQL is a portion that defines an object in the Conceptual Layer. On the other hand, any portion of a CREATE TABLE statement that can be altered without altering what is visible to SQL, in any way, is a portion that defines an object in the Physical Layer. Let s call these the conceptual and physical portions of CREATE TABLE statements, respectively. For example, adding a column to a table is changing the conceptual portion because it alters the table as seen by SQL. Adding an index is a physical change because it does not. Some DBMS vendors might consider this lack a good thing because, on balance, they believe that their own developers can do a better job than enduser DBAs at defining the best static and dynamic physical structures for the database. 2 But this lack of a separate CREATE PHYSICAL STRUCTURE statement is more than an aesthetic blemish. It is symptomatic of an unfinished revolution. For by the most important criterion of all, relational products fail to provide physical data independence. Relational products fail to make denormalization unnecessary! Relational theory fails to define the conceptual structures that would, when implemented, realize full Physical Data Independence. 1 2 By SQL in this context, I mean the SQL that programmers and end users write to access the content of tables or views. This is more formally called SQL Data Manipulation Language. But I follow the convention of most authors, and use the unqualified SQL to mean SQL DML. CREATE VIEW and CREATE TABLE statements, on the other hand, are also SQL. But they are SQL Data Definition Language statements. Teradata is famous for managing its physical data structures with little input from DBAs. For example, Teradata DBAs cannot define various kinds and sizes of buffer pools. The physical keys in Teradata databases are values computed by an internal hashing algorithm. DB2 UDB, by contrast, uses as primary keys what the DBAs define as primary keys, in the CREATE TABLE statements, and also gives wide latitude to the DBA in assigning buffer pools. Last revised November, 2007. Page 3.

Relational theory, and relational products, fail to make denormalization unnecessary, and thus fail to support full Physical Data Independence. Figure 1. How Relational Theory and Products Fail to Support Physical Data Independence. Data modelers and DBAs know this for a fact. We are frequently faced with the need to improve the performance of on-line queries. And frequently, the best or only way to do so is to eliminate one or more of the joins in those queries. And how do we do that? The only way to eliminate a join, against otherwise normalized tables, is to denormalize! Specifically, it is to introduce a second normal form violation, a partial dependency, or else to introduce a third normal form violation, a transitive dependency. For example, illustrating a second normal form violation, to eliminate a join that gets the salesperson name on a Salesperson By Customer associative table, we add the salesperson name to that associative table, introducing a partial dependency. Since the fact that a specific salesperson identifier is associated with a specific salesperson name can now be recorded in the database as many times as that salesperson identifier appears as part of the primary key of the associative table, the database is denormalized, and duplicate facts are allowed into the database. To illustrate a third normal form violation, we may choose to eliminate a join which gets the department name for an employee from the Department table by adding department name to the Employee table. Since department number (the primary key of the Department table) was already part of the Employee table, this introduces a transitive dependency into the Employee table. Last revised November, 2007. Page 4.

A more direct way to see this failure to achieve full physical data independence is to focus on the join operation itself. For what is a join? It is, in actuality, a specification of both a logical and a physical operation. The logical operation is a mathematical operation on logical objects. This is how the join operation is defined. But unfortunately, a join, as it is provided by relational DBMSs, is also a physical operation. This operation always involves accessing separate and distinct physical structures. The reason that extra joins slow down a retrieval is that the separate structures accessed by the join are usually so separate that they cannot both be retrieved in the same I/O operation. It is the extra I/Os that slow down joins. If relational theory specified a more complete separation of conceptual from physical objects and processes, and if relational DBMSs implemented that more complete separation, then there would not be a correlation between the number of joins in a SQL query and the number of I/Os performed to return the result set. And consequently, fully normalized data models would never need to be denormalized in the course of implementing them and performance tuning the database. Figure 2. In a Perfect World.. This, in turn, would guarantee that a logically correct image of the database was always presented to the programmer and end user. It would be presented as a fully normalized set of schemas, in the Conceptual Layer, and a set of views derived from them in the External Layer. Since data modelers and DBAs must work with the DBMSs and SQL dialects given to them by their vendors, the question is how best to cope with these unfortunate and unnecessary limitations. Current best practice is Last revised November, 2007. Page 5.

simply to denormalize only when absolutely necessary, when required performance cannot be achieved by any other means. What Can We Do? [Notes only. I'll provide a more extensive write-up later.] 1. Don't wait for standards committees or vendors. This issue doesn't seem to be even on their horizon. 2. The bottom-line solution is to provide a purely syntactical, semantically neutral, set of physical schemas for data. This is what RDF attempts to do. 3. I made my own attempt, in 2004, in an eight-part series at (the now defunct). DataWarehouse.com, entitled Stability and Flexibility in Databases. (Links to these articles will appear in the DMReview.com archives sometime soon.) 4. I don't yet know enough about RDF to be able to compare it with my own solution. Last revised November, 2007. Page 6.