
7 Normalization

7.1 INTRODUCTION

The concept of normalization was introduced in Section 3.2. A normalized relation may be defined as one for which each of the underlying domains contains atomic (nondecomposable) values only, so that every value in the relation is in turn atomic. Figure 3.3 demonstrated by example that an unnormalized relation may easily be reduced to an equivalent normalized form, and the relational model was then defined to consist of normalized relations only. (Allowing unnormalized relations in the relational model would open the door to all the problems associated with the hierarchical approach, since an unnormalized relation is analogous, more or less, to a file of hierarchical records.) We now proceed to investigate the concept of normalization in more detail.

Codd has defined three levels of normalization, which he calls the first, second, and third normal forms, respectively. Briefly, all normalized relations are in first normal form (1NF); some 1NF relations are also in second normal form (2NF); and some 2NF relations are also in third normal form (3NF). See Fig. 7.1.

Fig. 7.1 Levels of normalization

It is the objective of the present chapter to demonstrate the advantages of third normal form. In aiming for this objective, we make no attempt at mathematical rigor in our arguments and definitions; rather, we rely to a considerable extent on plain intuition. Indeed, part of the argument is that third normal form, despite its somewhat esoteric name, is essentially a very simple and common-sense idea. Published papers by Codd and others treat the material in a more rigorous manner.

7.2 FUNCTIONAL DEPENDENCE

We start by introducing the notion of functional dependence (within a relation), a concept of absolutely paramount importance, not least for the DBA in his work of data model design. Given a relation R, we say that domain Y of R is functionally dependent on domain X of R if and only if each X-value in R has associated with it precisely one Y-value in R (at any one time).¹ In the relation S of our suppliers-and-parts data model, for example, domains SNAME, STATUS, and CITY are each functionally dependent on domain S#: given a particular S# value, there exists precisely one corresponding value for each of SNAME, STATUS, and CITY. We may represent these functional dependencies diagrammatically, as shown in Fig. 7.2.

Fig. 7.2 Functional dependencies in the relation S

Recognizing the functional dependencies is an essential part of understanding the meaning or semantics of the data. The fact that CITY is functionally dependent on S#, for example, means that each supplier is located in precisely one city. To put it another way, we have a constraint in the real world which the database represents, namely, that each supplier is located in precisely one city. Since it is part of the semantics of the situation, this constraint must somehow be observed in the data model. One way of ensuring that it is so observed is to specify the constraint in the data model definition (DMD) so that the DBMS can enforce it, and the way to specify it in the DMD is to declare the functional dependency. Later we shall see that third normal form provides a simple means of declaring such functional dependencies.

The notion of functional dependence can be extended to cover the case where X or Y or both are composite domains. For example, in the relation SP of our suppliers-and-parts data model, the domain QTY is functionally dependent on the composite domain (S#,P#): given a particular combination of S# and P# values, there exists precisely one corresponding QTY value (assuming that the particular S#-P# combination occurs within SP). We may represent this as shown in Fig. 7.3.

Fig. 7.3 Functional dependency in the relation SP

We also introduce the concept of full functional dependence. Domain Y is fully functionally dependent on domain X if it is functionally dependent on X and not functionally dependent on any subset of X (it is assumed that X is composite). For example, in the relation S, the domain CITY is functionally dependent on the composite domain (S#,STATUS). However, it is not fully functionally dependent on this composite domain because, of course, it is also functionally dependent on S# alone. Throughout this chapter we shall take "functional dependence" to mean full functional dependence unless explicitly stated otherwise.

¹ Strictly speaking, the term "domain" in the definition of functional dependence refers to the use of a domain within a relation (remember that the same domain may be used several times within a relation; see the relation COMPONENT of Fig. 3.2, for example). For simplicity we shall tend to ignore this distinction in what follows.

All the examples so far (in relations S and SP) have been of domains functionally dependent on a primary key. The functional dependencies in relation P are also of this nature. This is not always so, however; the next section contains several counterexamples. [The principal reason that all functional dependencies so far considered are of this type is that the relations concerned (S, P, SP) are in fact simple examples of relations in third normal form.]

7.3 THE THREE NORMAL FORMS

We are now in a position to describe the three normal forms. We shall consider them in the sequence third (3NF), second (2NF), first (1NF). This is not out of contrariness, but because it is the easiest way to understand them. At the end of the chapter we shall discuss briefly the successive reduction of a 1NF relation to 2NF and thence to 3NF, thus justifying the terminology.

We start, then, with a definition of third normal form (3NF).

A normalized relation R is said to be in 3NF if and only if the non-key domains of R, if any, are (a) mutually independent and (b) functionally dependent on the primary key of R.

The term "non-key domain" refers to any domain not participating in the primary key. The non-key domains of a relation are "mutually independent" if none of them is functionally dependent on any other. As already mentioned, S, P, and SP are all examples of 3NF relations.

We may interpret the 3NF definition loosely as follows. At any one time, each tuple in a 3NF relation consists of a primary key value which identifies some entity, together with a set of values which describe that entity in some way. These descriptive values are all independent of one another, in the sense that none is determined by (any combination of) the others.
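The definitions of functional dependence and full functional dependence can be tried out mechanically on a relation instance. The Python sketch below illustrates the definitions; it is not part of the chapter's method, and it rests on several assumptions of our own. Relations are modelled as lists of dicts. The function names fd_holds and fully_dependent are our own invention. The sample rows of S are hypothetical. Note that such a check can only show whether the current contents are consistent with a dependency. The dependency itself is a statement about the semantics of the data, so it must be declared, not inferred from one set of values.

```python
from itertools import combinations

# A relation instance: a list of tuples, each a dict from domain name to value.
Relation = list[dict[str, object]]


def fd_holds(r: Relation, x: tuple[str, ...], y: str) -> bool:
    """True if each X-value in this instance has precisely one associated Y-value."""
    seen: dict[tuple, object] = {}
    for t in r:
        x_value = tuple(t[d] for d in x)
        if x_value in seen and seen[x_value] != t[y]:
            return False
        seen[x_value] = t[y]
    return True


def fully_dependent(r: Relation, x: tuple[str, ...], y: str) -> bool:
    """True if Y depends on X but on no proper subset of X (the empty set included)."""
    if not fd_holds(r, x, y):
        return False
    return not any(
        fd_holds(r, subset, y)
        for size in range(len(x))
        for subset in combinations(x, size)
    )


# Hypothetical sample contents of relation S, for illustration only.
S = [
    {"S#": "S1", "SNAME": "SMITH", "STATUS": 20, "CITY": "LONDON"},
    {"S#": "S2", "SNAME": "JONES", "STATUS": 10, "CITY": "PARIS"},
    {"S#": "S3", "SNAME": "BLAKE", "STATUS": 30, "CITY": "PARIS"},
    {"S#": "S4", "SNAME": "CLARK", "STATUS": 20, "CITY": "LONDON"},
    {"S#": "S5", "SNAME": "ADAMS", "STATUS": 30, "CITY": "ATHENS"},
]

print(fd_holds(S, ("S#",), "CITY"))                  # True
print(fd_holds(S, ("S#", "STATUS"), "CITY"))         # True
print(fully_dependent(S, ("S#", "STATUS"), "CITY"))  # False: S# alone suffices
print(fully_dependent(S, ("S#",), "CITY"))           # True
print(fd_holds(S, ("CITY",), "STATUS"))              # False: Paris has 10 and 30
```

With these sample rows, fd_holds(S, ("SNAME",), "CITY") also returns True, simply because no two suppliers happen to share a name. Whether SNAME really determines anything is a semantic question of exactly the kind taken up in Section 7.5.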
Thus, for example, each tuple in the relation S consists of an S# value which identifies a particular supplier, together with three pieces of descriptive information about that supplier: his name, status, and location. Moreover, each of the three values, name, status, and location, is independent of the other two. Similar remarks apply to relations P and SP. The entities identified by the primary key values are the fundamental entities about which data is recorded in the database (see Section 1.1).

To obtain a definition of second normal form (2NF), we simply remove the 3NF requirement of mutual independence among the non-key domains.

A normalized relation R is said to be in 2NF if and only if the non-key domains of R, if any, are functionally dependent on the primary key of R.

As an example of a 2NF relation which is not also in 3NF, let us suppose that the STATUS value for supplier S3 in relation S is changed to 10 (from 30). The change makes S3 consistent with an additional constraint, which we now introduce, that STATUS is functionally dependent on CITY. The meaning of this constraint is that a supplier's status is defined by the city in which he is located. Figure 7.4 shows the actual data values which we assume to form the current contents of the supplier relation (with its additional constraint). To avoid confusion, we shall refer to this relation as S'.

Fig. 7.4 The relation S'

Figure 7.5 is the functional dependency diagram for relation S'. (Note that the diagram is "more complex" than a 3NF diagram.) We still have a functional dependency from S# to STATUS (given a particular S# value, the corresponding STATUS value is uniquely determined), but the dependency is transitive (via CITY). That is, the S# value determines the CITY value, which in turn determines the STATUS value. This relation is not in 3NF because CITY and STATUS are not mutually independent.

Fig. 7.5 Functional dependencies in the relation S'

The 2NF data model of Fig. 7.4 suffers from anomalies with respect to storage operations. These anomalies are very similar to those encountered with the hierarchical approach (see Section 3.4).

Adding. Since we now have a functional dependency from CITY to STATUS, it is a reasonable requirement to want to add new instances of this dependency. For example, we might want to state that any supplier in Rome must have a status value of 50. However, we cannot insert this information until some supplier is actually located in Rome, because until such a supplier exists, we have no appropriate primary key value. It is not permissible to create a tuple such as < -, -, 50, ROME >, where the dash represents an undefined or null value. To do so would be to say, in effect, that the database is to contain an unidentified entity, which is a contradiction in terms.²

Deleting. If we delete the only supplier in a particular city, then we also lose the association between that city and its corresponding status value. For example, if we delete supplier S5, we lose the information that the status value associated with Athens is 30.

Updating. Suppose we need to change the status value associated with a particular city, for example, to change the status value associated with London to 30. We are then faced with a choice of problems. One is the problem of searching the S' relation to find every occurrence of London. The other is the possibility of producing an inconsistent result (London's status may be given as 20 in one place and 30 in another). Similar difficulties arise if we wish to change the city or status value for a particular supplier.

² No tuple may ever contain an undefined primary key value (i.e., none of the constituent values may be null). Other values in the tuple may be null, in general.

The solution to these storage problems is to replace the 2NF relation S' by two 3NF relations. The first is the projection (SSC') of S' over S#, SNAME, and CITY. The second is the projection (CS') of S' over CITY and STATUS. Figure 7.6 shows the values of SSC' and CS' corresponding to the relation S' of Fig. 7.4, and Fig. 7.7 shows the functional dependencies involved. The reader should confirm that these two relations are indeed 3NF (the primary keys are S# and CITY, respectively). It should be clear that the data model of Fig. 7.6 overcomes all the storage difficulties we encountered with the 2NF equivalent.

Fig. 7.6 The relations SSC' and CS'
Fig. 7.7 Functional dependencies in the relations SSC' and CS'

Adding. We can add the information that Rome's status is 50 by simply adding the appropriate tuple to CS'.

Deleting. We can delete supplier S5 from SSC' without losing the information that Athens's status is 30.

Updating. We can change London's status to 30 by changing it once and for all in the appropriate CS' tuple. Also, we can change a supplier's city (without having to change any associated value) by simply changing it in the appropriate SSC' tuple. Changing a supplier's status is a different matter. It is clear from the DMD (of which Fig. 7.6 may be considered a representation) that such a change may be effected only by changing the status associated with his city or by changing his city. This constraint is a true reflection of the real-world situation. The constraint was concealed in the 2NF data model of Fig. 7.4, where it looked as if a supplier's status could be changed independently of his city.

The reader should note, incidentally, that the relation S' may be recovered from the relations SSC' and CS' by means of a join operation over the domain CITY. Thus any information which may be retrieved from the 2NF data model may also be derived from the 3NF data model. (Here "3NF data model" means a data model consisting entirely of 3NF relations, and similarly for 2NF.) The reverse is not true, however. The 3NF data model may contain information, such as the fact that Rome's status is 50, which cannot be represented in the 2NF data model.³

Finally, we consider first normal form (1NF).
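Before doing so, the S' example can be made concrete. The following Python sketch is only an illustration, and it rests on several assumptions of our own:

- Relations are modelled as lists of dicts.
- The helper names project, natural_join and as_set are our own invention.
- The rows of S' are hypothetical, because Fig. 7.4 is not reproduced here. They are chosen to agree with the values the text does mention: S3 in Paris with status 10, S5 in Athens with status 30, and London at status 20.

The sketch projects S' over (S#, SNAME, CITY) and over (CITY, STATUS), then checks that the join over CITY gives S' back. It then records Rome's status and deletes S5 in the decomposed form.

```python
# Hypothetical contents of S' (S3's status already changed to 10).
S_PRIME = [
    {"S#": "S1", "SNAME": "SMITH", "STATUS": 20, "CITY": "LONDON"},
    {"S#": "S2", "SNAME": "JONES", "STATUS": 10, "CITY": "PARIS"},
    {"S#": "S3", "SNAME": "BLAKE", "STATUS": 10, "CITY": "PARIS"},
    {"S#": "S4", "SNAME": "CLARK", "STATUS": 20, "CITY": "LONDON"},
    {"S#": "S5", "SNAME": "ADAMS", "STATUS": 30, "CITY": "ATHENS"},
]


def project(r, domains):
    """Projection: keep only the named domains and drop duplicate tuples."""
    out, seen = [], set()
    for t in r:
        row = tuple(t[d] for d in domains)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(domains, row)))
    return out


def natural_join(r1, r2):
    """Join two relations over all the domains they have in common."""
    common = [d for d in r1[0] if d in r2[0]] if r1 and r2 else []
    return [
        {**t1, **t2}
        for t1 in r1
        for t2 in r2
        if all(t1[d] == t2[d] for d in common)
    ]


def as_set(r):
    """Compare relations as sets of tuples, ignoring row order."""
    return {frozenset(t.items()) for t in r}


ssc = project(S_PRIME, ("S#", "SNAME", "CITY"))  # SSC'
cs = project(S_PRIME, ("CITY", "STATUS"))        # CS'

# Joining SSC' and CS' over CITY gives back exactly S'.
recovered = natural_join(ssc, cs)
assert as_set(recovered) == as_set(S_PRIME)

# Adding: Rome's status can be recorded although no supplier is located there.
cs.append({"CITY": "ROME", "STATUS": 50})

# Deleting: removing supplier S5 keeps the fact that Athens has status 30.
ssc = [t for t in ssc if t["S#"] != "S5"]
assert {"CITY": "ATHENS", "STATUS": 30} in cs
```

The check compares sets of tuples, so on its own it confirms the join only for this sample. The general guarantee that the join recovers S' comes from the dependency of STATUS on CITY, not from any one instance.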
To obtain a definition of 1NF, we remove the 2NF requirement that non-key domains be fully⁴ functionally dependent on the primary key. We then have left just a simple statement that any normalized relation is in 1NF (which is of course correct). As an example of a 1NF relation which is not in 2NF (and hence not in 3NF, either), we consider the relation SPC of Fig. 7.8. The domains have their usual meaning. Figure 7.9 shows the functional dependencies.

³ The 3NF model also introduces a minor consistency problem not present in the 2NF model, namely, that any city value occurring in SSC' must also occur in CS'. The question of interrelation consistency is discussed elsewhere.

⁴ It is here that the concept of full functional dependence becomes significant.

Fig. 7.8 The relation SPC
Fig. 7.9 Functional dependencies in the relation SPC

Here QTY is functionally dependent on the primary key (the combination S#-P#). CITY is also functionally dependent on the primary key, but not fully (see Section 7.2), because it is also functionally dependent on S# alone. The semantic interpretation is that, given a particular SPC tuple, the city value is a piece of descriptive information about the supplier, not about the supplier-part connection. Thus it is not a description of the entity identified by the primary key value. The relation SPC is therefore not in 2NF. (Again we observe informally that the diagram is "more complex" than a 3NF diagram.)

Once again we encounter anomalies with storage operations.

Adding. We cannot enter the fact that a particular supplier is located in a particular city (for example, that supplier S6 is located in Rome) until that supplier supplies at least one part. The reason is that, until he supplies a part, we have no appropriate primary key value (remember that the primary key value involves both a supplier number and a part number).

Deleting. Suppose we delete the only SPC tuple for a given supplier. We then destroy not only the link between that supplier and the corresponding part but also the information that the supplier is located in a particular city. For example, if we delete the SPC tuple with S# value S5 and P# value P5, we lose the information that S5 is located in Athens.

Updating. We cannot change the city value for a given supplier without either search problems or possibly inconsistent results.

Again the solution to these problems is to replace the 1NF relation SPC by a pair of 3NF relations. The first is the projection SP of SPC over S#, P#, and QTY. The second is the projection SC of SPC over S# and CITY. Here SP is, of course, the familiar relation SP of our suppliers-and-parts (3NF) data model, and SC is the projection of the relation S over S# and CITY. The functional dependencies are illustrated in Fig. 7.10. We already know that with this data model the storage problems do not arise. Detailed consideration of the anomalies mentioned in connection with the 1NF relation SPC is left to the reader.

We observe that the relation SPC may be recovered by taking the join of SP and SC over S#. However, the 3NF data model may contain information which could not be represented in the original (1NF) version: for example, the fact that supplier S6 is located in Rome (although he currently supplies no parts).⁵

Fig. 7.10 Functional dependencies in the relations SP and SC

⁵ Again, a (minor) consistency problem has been introduced, namely, any S# value occurring in SP should also occur in SC (a reasonable interrelation constraint from the real world).

7.4 PRACTICAL SIGNIFICANCE OF THIRD NORMAL FORM

The concept of 3NF has practical implications for the DBA, for the DBMS, and for the user. Of the three, the DBA is probably the one for whom the concept has most direct significance. However, it is not possible to justify this statement without at the same time showing how 3NF affects the design of the DBMS and the way in which the user views the data. The following paragraphs amplify these remarks.

Consider first of all the DBA. As explained in Chapter 1, the DBA is responsible for deciding the information content of the database. In essence, he must identify the fundamental types of entity which are of interest to the enterprise, and for each one he must establish what type of information is to be held in the database. For example, he may decide that suppliers form one such type of entity, and that for each supplier the information to be held consists of name, status, and city (in addition to the supplier identity, of course). The first point to be made, then, is that 3NF is clearly an aid to precise thinking in this area. It provides the DBA with a formal means of structuring the information in such a way that it is clear exactly what types of data exist and what constraints (functional dependencies) they satisfy.⁶ (We ignore here the question of interrelation constraints, which are discussed elsewhere. It is also possible to have intrarelation constraints; this question is considered in Section 7.5.)

⁶ The procedure here outlined, in which the DBA first identifies the entities and then the corresponding descriptive information, may be viewed as the constructive approach to data model design. In practice, the DBA will frequently be faced with an existing collection of data, often large and usually somewhat amorphous, upon which he must somehow impose a rational integrated structure. In such a situation he will have to employ an analytic approach based on the semantics of the data. Again, 3NF is an invaluable tool; for example, see Exercises 7.1 and 7.2.

Having decided what types of data exist and what constraints they satisfy, the DBA must record his decision in an appropriate form in the DMD, in order to communicate it both to the DBMS and to the users. In other words, he must specify the appropriate domains and (as we know from Section 7.2) the corresponding functional dependencies. The simplest way to specify the functional dependencies is to define a 3NF relation over the domains; see Fig. 7.11.

    DOMAIN   S#       CHARACTER (5)
             SNAME    CHARACTER (20)
             STATUS   NUMERIC (3)
             CITY     CHARACTER (15)
    RELATION S (S#, SNAME, STATUS, CITY)
             KEY (S#)

Fig. 7.11 Part of the suppliers-and-parts DMD

Figure 7.11 represents that part of the suppliers-and-parts DMD concerned with the relation S. The declaration of S in this DMD informs the DBMS (and the users) of the functional dependencies involved, provided that the DBMS (and the users) are aware that the relation is 3NF.

To take the DBMS first, let us assume that the DBMS does indeed know that the relation S is 3NF. Then, given the foregoing DMD, the DBMS can determine, within the context of the relation S, two facts: (a) S# values are unique, and (b) SNAME, STATUS, and CITY values are not necessarily unique. Thus, to maintain the internal integrity of the relation S, it is sufficient for the DBMS to reject any operation which attempts to introduce a duplicate primary key (S#) value. (Again we ignore for the time being intrarelation constraints, which are constraints over and above those implied by 3NF.)

The question arises: How does the DBMS in fact know that the relation S is 3NF? There are two answers to this question. One is that the DBMS may simply assume that every relation defined in the DMD is 3NF; that is, the DBMS may actually have been constructed on this assumption.
The alternative is that the DBA may specify the level of normalization (1NF, 2NF, or 3NF) explicitly in the DMD. If the level is 1NF or 2NF, he must then also specify the functional dependencies explicitly (for 3NF they can be taken by default). Of the two possibilities, we tend to favor the first approach, since it is simpler for both the DBA and the DBMS. (In the last analysis, as we shall see in Section 7.5, there is theoretically very little difference between the two, but in practice the first approach is much easier for all concerned.)

As for the user, we have already seen that a DMD based on 3NF relations provides a simple means for informing the user of the precise semantics of the data. (Again, it is easiest for all concerned if the user may simply assume that all relations defined in the DMD are 3NF.) The fact that his view of the data is so clear-cut is obviously beneficial to the user, no matter what function he is trying to perform. However, it becomes really significant when he comes to consider the effect of a storage operation. Basically, the user can assume that every tuple in the data model is independent of every other. That is, an operation such as updating a particular tuple affects that tuple and no other (there are no side effects). We amplify this point in the material that follows.

Adding. The user knows that the only restriction on additions is that he may not introduce a duplicate primary key value into a relation. This is not the only restriction in a non-3NF data model. For example, with the 2NF relation S' of Fig. 7.4, is it permissible for the user to add the following tuple?

    < S6, THOMPSON, 20, PARIS >

If the answer is no, then the user has to be aware of the additional restriction, namely, that STATUS is functionally dependent on CITY. If the answer is yes, then either we have a loss of integrity in the resulting relation, or the DBMS must produce side effects in the relation to maintain the integrity. Neither outcome is particularly desirable.

Deleting. The user knows that any tuple may be deleted without restriction. Again, this is not necessarily so in a non-3NF data model. For example, with reference again to the relation S', the restriction may be imposed that a tuple may not be deleted if it represents the only supplier in a particular city.

Updating. The user knows that any value (other than the primary key value) in any tuple may be updated independently of all other values, in this tuple or any other. Once again, this does not apply to a non-3NF data model, such as that in Fig. 7.4. There, for example, a supplier's city cannot be changed without his status changing too (otherwise loss of integrity may result).

We must emphasize once again, however, that the foregoing remarks apply to what may be termed a "pure" 3NF data model only. In practice the problem of interrelation consistency must also be addressed. For the suppliers-and-parts data model, an example of an interrelation consistency constraint might be that any S# value appearing in the relation SP must also appear in the relation S. Such a constraint would restrict the user's freedom with respect to add and delete operations. Another example of an interrelation constraint might arise where values appearing in one relation are totals of values appearing in another relation (so that if one relation is updated, the other one must be, too). We defer discussion of this problem to Chapter 20.

To summarize the arguments of the present section, then: The notion of 3NF is significant for the DBA, the DBMS, and the user. It is significant for the DBA for two reasons. It is an aid to precise thinking, and it provides a simple means of defining the data and the intrarelation dependencies (or at least most of them; see Section 7.5). It is significant for the DBMS, inasmuch as the actual construction of the DBMS may be simplified (perhaps a debatable point).
And it is significant for the user because it provides him with a concise, clear-cut view of the data and its semantics. One other point, not explicitly mentioned above, is that 3NF is also of assistance in the problem of "domain migration." This point will be discussed in the next chapter.

7.5 INTRARELATION DEPENDENCIES

We mentioned several times in the previous section that functional dependencies may exist within a relation over and above the "basic" ones implied by 3NF. In practice such additional dependencies are perhaps unlikely to occur, since the 3NF mechanism is sufficient to handle almost all practical situations. Nevertheless, we present some examples of additional dependencies to illustrate the sort of situation which could arise in theory. Some of these examples are highly contrived, but the very fact that they can be devised at all indicates that we must be able to handle such cases. We do not here consider in any detail how such dependencies should be dealt with.

Relation S of the suppliers-and-parts data model provides us with our first (and perhaps more realistic) example. It may well be that we have an additional functional dependency from SNAME to S#. In other words, each supplier has a unique name as well as a unique number. The functional dependency diagram is shown in Fig. 7.12. We now have two candidates for the primary key, S# and SNAME. We may arbitrarily choose S# as the primary key. STATUS and CITY are each apparently transitively dependent on the primary key (via SNAME). Despite this, the relation is still 3NF; the important point is that they are also directly dependent on the primary key. (See the formal explanation of 3NF in [7.1].)

Fig. 7.12 Additional dependency in relation S
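The enforcement side of Sections 7.4 and 7.5 can be sketched in the same spirit. The class below is a hypothetical illustration, not the interface of any DBMS. It rejects an insert whose primary key value is null (footnote 2) or duplicates an existing one, which is all a pure 3NF relation needs. Optionally, it also enforces extra candidate keys. That option is one possible reading of a declaration such as UNIQUE (SNAME), which is discussed next. The names KeyedRelation and rejected are our own.

```python
class KeyedRelation:
    """Hypothetical sketch of a relation that enforces its primary key and any
    additional candidate keys declared UNIQUE. Not the API of any real DBMS."""

    def __init__(self, domains, key, unique=()):
        self.domains = tuple(domains)
        self.key = tuple(key)
        # Every candidate key (primary key first) must have unique combined values.
        self.candidate_keys = [self.key] + [tuple(u) for u in unique]
        self.tuples = []

    def insert(self, row):
        if set(row) != set(self.domains):
            raise ValueError(f"a tuple must supply exactly the domains {self.domains}")
        if any(row[d] is None for d in self.key):
            raise ValueError("primary key values may not be null")
        for k in self.candidate_keys:
            new_value = tuple(row[d] for d in k)
            if any(tuple(t[d] for d in k) == new_value for t in self.tuples):
                raise ValueError(f"duplicate value {new_value} for {k}")
        self.tuples.append(dict(row))


def rejected(rel, row):
    """True if the relation refuses the insert; a refused insert changes nothing."""
    try:
        rel.insert(row)
    except ValueError:
        return True
    return False


s = KeyedRelation(("S#", "SNAME", "STATUS", "CITY"), key=("S#",), unique=[("SNAME",)])
s.insert({"S#": "S1", "SNAME": "SMITH", "STATUS": 20, "CITY": "LONDON"})
print(rejected(s, {"S#": "S1", "SNAME": "JONES", "STATUS": 10, "CITY": "PARIS"}))  # True: duplicate S#
print(rejected(s, {"S#": "S6", "SNAME": "SMITH", "STATUS": 10, "CITY": "PARIS"}))  # True: duplicate SNAME
print(rejected(s, {"S#": None, "SNAME": None, "STATUS": 50, "CITY": "ROME"}))      # True: null key
print(rejected(s, {"S#": "S2", "SNAME": "JONES", "STATUS": 10, "CITY": "PARIS"}))  # False: accepted
```

Nothing here checks dependencies among non-key domains. For a pure 3NF relation the text argues that no such check is needed. A non-3NF relation such as S' would need extra checks, for example that a new tuple agrees with existing tuples on the status of its city.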

So far as the database system is concerned, the DBA needs to be able to declare this additional constraint, the DBMS needs to be able to enforce it, and the user needs to be aware of it. All these aims can be achieved by some phrase such as UNIQUE (SNAME) following the definition of the relation S in the DMD.

As our second example, we consider a relation SJT defined on domains S (student), J (subject), and T (teacher). The meaning of an SJT tuple is that the specified student is taught the specified subject by the specified teacher. The semantic rules are as follows:

1. For each subject, each student of that subject is taught by only one teacher.
2. Each teacher teaches only one subject.
3. Each subject is taught by several teachers.

Figure 7.13 shows a sample tabulation of this relation.

Fig. 7.13 The relation SJT

What are the functional dependencies in this relation? From the first semantic rule we have a functional dependency of T on the composite domain (S,J). From the second semantic rule we have a functional dependency of J on T. The third semantic rule tells us that there is not a functional dependency of T on J. Hence we have the situation shown in Fig. 7.14.

Fig. 7.14 Functional dependencies in the relation SJT

The relation SJT is in 3NF, and the key is the combination S-J [7] (see the definition in Section 7.3). However, we find once again that we have difficulties with the storage operations. For example, if we wish to delete the information that Jones is studying physics (see Fig. 7.13), we cannot do so without at the same time losing the information that Prof. Brown teaches physics.

[7] An alternative key is the combination S-T. If this key is chosen, the relation appears to be only in 1NF, not 3NF, since the domain J is not fully functionally dependent on the key. However, it is in fact in 3NF, because J is not a non-key domain: it is a constituent of one of the candidate keys (S-J).

We can get out of these difficulties by replacing the relation SJT by two of its (3NF) projections. The projections for the data of Fig. 7.13 are shown in Fig. 7.15, and the functional dependencies are given in Fig. 7.16.

Fig. 7.15 The relations ST and TJ

The reader should convince himself that this solution does indeed avoid the storage problems. Note once again that we have introduced an interrelation consistency problem. (What is it?) Note, incidentally, that relation ST is "all key": it has no non-key domains.

Fig. 7.16 Functional dependencies in the relations ST and TJ

Our third and last example concerns a relation EXAM defined on domains S (student), J (subject), and P (position). The meaning of an EXAM tuple is that the specified student was examined in the specified subject and achieved the specified position in the class list. For the purposes of the example, we suppose that the following semantic rule holds: there are no ties; that is, no two students obtained the same position in the same subject. Then the functional dependencies are as illustrated in Fig. 7.17. Again the relation is in 3NF, regardless of which domain combination is chosen as the primary key. If the DBA chooses (S,J) as the primary key (the structure on the left in Fig. 7.17), he also needs to be able to specify the dependency of S on (J,P) [8], shown on the right in Fig. 7.17. Conversely, of course, if he chooses the structure on the right, he needs to be able to specify the dependency shown on the left. There is no way of avoiding this problem.

[8] To inform both the DBMS and the user of the additional dependency.
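The claim that the projections ST and TJ avoid the deletion problem can be checked mechanically. The Python sketch below uses hypothetical tuples (not the data of Fig. 7.13) chosen to obey the three semantic rules. It projects SJT over (S,T) and (T,J), confirms that the natural join over T rebuilds SJT exactly, and then deletes the fact that Jones studies physics from each design. The function names are illustrative only.

```python
# Hypothetical <student, subject, teacher> tuples; NOT the data of Fig. 7.13.
# They obey the rules: (S,J) -> T, T -> J, and each subject has several teachers.
SJT = {
    ("Smith", "Math", "Prof. White"),
    ("Smith", "Physics", "Prof. Green"),
    ("Jones", "Math", "Prof. Black"),
    ("Jones", "Physics", "Prof. Brown"),
}


def project_st(sjt):
    """Projection of SJT over (S, T)."""
    return {(s, t) for s, _j, t in sjt}


def project_tj(sjt):
    """Projection of SJT over (T, J)."""
    return {(t, j) for _s, j, t in sjt}


def join_st_tj(st, tj):
    """Natural join of ST and TJ over T, returned as <S, J, T> tuples."""
    return {(s, j, t) for s, t in st for t2, j in tj if t == t2}


ST, TJ = project_st(SJT), project_tj(SJT)

# Since T -> J, every ST tuple matches exactly one TJ tuple, so the join
# rebuilds SJT without spurious tuples: no information is lost.
assert join_st_tj(ST, TJ) == SJT

# Single-relation design: delete the fact "Jones studies physics" from SJT.
sjt_after = {row for row in SJT if row[:2] != ("Jones", "Physics")}

# Two-relation design: delete the same fact from ST only; TJ is untouched.
physics_teachers = {t for t, j in TJ if j == "Physics"}
st_after = {(s, t) for s, t in ST if not (s == "Jones" and t in physics_teachers)}

print(("Prof. Brown", "Physics") in project_tj(sjt_after))  # False: fact lost
print(("Prof. Brown", "Physics") in TJ)                      # True: fact kept
```

The join is lossless here precisely because T functionally determines J; without that dependency, joining the two projections could produce tuples that were never in SJT.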
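In SQL, the role of a phrase such as UNIQUE (SNAME) is played by a UNIQUE constraint, and the EXAM situation can be handled the same way: declare one candidate key as PRIMARY KEY and the other as UNIQUE. The sketch below uses Python's built-in `sqlite3` module as a modern analog; it is not the DMD notation of the text, and the table name `exam`, its column names, and the helper `try_insert` are assumptions made for illustration. Declaring (S,J) as the primary key covers the dependency of P on (S,J), while UNIQUE (J,P) enforces the no-ties rule and therefore the dependency of S on (J,P).

```python
import sqlite3

# In-memory SQLite database; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE exam (
        s TEXT NOT NULL,      -- student
        j TEXT NOT NULL,      -- subject
        p INTEGER NOT NULL,   -- position in the class list
        PRIMARY KEY (s, j),   -- chosen key: (S,J) determines P
        UNIQUE (j, p)         -- additional dependency: (J,P) determines S
    )
    """
)


def try_insert(row):
    """Insert one EXAM tuple; return False if a declared key is violated."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO exam (s, j, p) VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(("Smith", "Physics", 1)))  # True
print(try_insert(("Jones", "Physics", 2)))  # True
print(try_insert(("Jones", "Physics", 3)))  # False: (S,J) already present
print(try_insert(("Adams", "Physics", 1)))  # False: a tie at position 1
```

Whichever pair is made the primary key, the other must still be declared; this is the "no way of avoiding this problem" point of the text, restated as a schema declaration.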

Fig. 7.17 Functional dependencies in the relation EXAM

7.6 SUMMARY

In this chapter we have presented arguments for basing a relational data model on relations in third normal form. The examples of Section 7.3 give some insight into the method of reducing an arbitrary 1NF relation to an equivalent collection of 3NF relations. The reduction process may be informally described as follows:

a) Take projections of the original 1NF relation to eliminate any non-full functional dependencies. This will produce a collection of 2NF relations.
b) Take projections of these 2NF relations to eliminate any transitive dependencies. This will produce a collection of 3NF relations.

(The reader should realize that, in general, the collection of 3NF relations corresponding to a given 1NF relation may not be unique. In particular, this is so if the original 1NF relation includes more than one candidate key. This point is discussed by Rissanen and Delobel. Codd also touches on it when he warns against "over-projection," i.e., the replacement of a relation which is already in 3NF by two or more of its (3NF) projections.)

In Section 7.4 the practical implications of the 3NF concept were discussed at some length, particularly from the point of view of the DBA. Finally, in Section 7.5 we pointed out that 3NF is not strictly adequate for handling all intrarelation dependencies, although it should be sufficient for nearly all practical situations.

EXERCISES

7.1 Figure 7.18 represents a hierarchical structure (see Chapter 3) which contains information about departments of a company. For each department the database contains a department number (unique), a budget value, and the department manager's employee number (unique). For each department the database also contains information about all employees working in the department, all projects assigned to the department, and all offices occupied by the department.
The employee information consists of employee number (unique), the number of the project on which he is working, his office number, and his phone number; the project information consists of project number (unique) and a budget value; and the office information consists of an office number (unique) and the area of that office (in square feet). Also, for each employee the database contains the title of each job the employee has held, together with date and salary for each distinct salary he received in that job; and for each office it contains the numbers (unique) of all phones in that office.

Fig. 7.18 A company database (hierarchical structure)

Convert this hierarchical structure to an appropriate collection of 3NF relations. Make any assumptions you deem reasonable about the functional dependencies involved.

7.2 A database used in an order-entry system is to contain information about customers, items, and orders. The following information is to be included.

For each customer:
    Customer number (unique)
    Valid "ship-to" addresses (several per customer)
    Balance
    Credit limit
    Discount
For each order:
    Heading information: customer number, "ship-to" address, date of order
    Detail lines (several per order), each giving item number and quantity ordered
For each item:
    Item number (unique)
    Manufacturing plants
    Quantity on hand at each plant
    Stock danger level for each plant
    Item description

For internal processing reasons a "quantity outstanding" value is associated with each detail line of each order. [This value is initially set equal to the quantity of the item ordered and is (progressively) reduced to zero as (partial) shipments are made.] Design a data model for this data based on 3NF relations. As in the previous question, make any semantic assumptions that seem necessary.

7.3 Suppose that in Exercise 7.2 only a very small number of customers, say 1 percent, actually have more than one ship-to address. (This is typical of real-life situations, in which frequently just a very few exceptions, usually rather important ones, fail to conform to a general pattern.) Can you see any drawbacks to your solution to Exercise 7.2? Can you think of any improvements?

7.4 A relation TIMETABLE is defined on the following domains:

    D  Day of the week (1-5)
    P  Period within day (1-8)
    C  Classroom number
    T  Teacher name
    S  Student name
    L  Lesson identifier

A tuple <d, p, c, t, s, l> is an element of this relation if at time <d, p> student s is taught lesson l by teacher t in classroom c. You may assume that lessons are one period in duration and that every lesson has an identifier which is unique among all lessons taught in the week. Reduce TIMETABLE to an equivalent set of 3NF relations.