Question 1: Are normalization rules followed exclusively in the real world?
Answer 1: Unfortunately, no. Database design and development do not have hard-and-fast rules, and designing databases could be considered more of an art form than a science, so not every database is normalized. In fact, if you asked a database analyst (DBA) to recite the nine rules of normalization, it is very unlikely that he or she could recite three, let alone all nine. Normalization is an academic concept, but its effects are ingrained in the DBA's understanding of the trade. Although DBAs may not be able to cite all of the normalization rules, they practice them routinely, and the rules have become part of regular procedure. Unfortunately, compromises to the overall efficiency of the relational database are constantly made for the benefit of the business system of the end users.

Question 2: What should all database developers remember?
Answer 2: A simple rule is that there should be more tables and fewer fields. Because of relational integrity, the database can work faster through joined relationships across many narrow tables than against a few tables with a large number of fields. If the database contains many tables with few fields, expanding that schema will be simpler and more efficient over time. A database designed with few tables and many fields will have rampant normalization problems, along with speed problems in areas of intense redundancy.

Question 3: Is there a bad approach to database development?
Answer 3: One bad approach to database development is an approach based entirely on assumption. If there is no study of the actual needs of the business or of its end users, then development is performed in a vacuum.
The database analyst (DBA) makes assumptions about what kind of information must be stored, reported, and preserved for conducting regular business activity. These assumptions can limit the ability of the database to serve its end users over time. Another bad approach to database development is to isolate the DBA from end-user constituents, although this happens with some regularity. Some more revolutionary systems development life cycle (SDLC) models, such as extreme programming (XP), place the DBAs alongside the analysts when conducting interviews and documenting requirements. This exposes the DBA to the stories and the complexity of the business process as experienced by the end users, so the assumptions that the DBA might have about how the information is used can be dispelled. Perhaps the worst approach to database development is not asking questions or not questioning assumptions.

Question 4: How complex are enterprise databases compared to the small- to medium-size business (SMB) database?
Answer 4: Scale has a direct relationship to complexity. If the rule that there should be more tables and fewer fields is applied, the enterprise would have hundreds if not thousands more tables inside its business system than the small business. There is something intuitive about this. The small Access flat-file database is likely to be used for a very narrow business function, whereas Oracle's enterprise resource planning (ERP) products and solutions will be used for extensively complex and broad business functions. A smaller product is likely to yield less complexity; however, it would be foolish to think that smaller databases are inherently simple, because complexity relates back to the overall design.

Question 5: Should normalization be factored into the database system development life cycle (DSDL)?
Answer 5: It should be factored in, particularly in the conceptualization phases, but it is not a phase in and of itself. Normalization enters the database system development life cycle (DSDL) in the course of developing the conceptual understanding of tables and fields; the database analyst (DBA) responsible for documenting requirements should realize that normalization must be applied during the conceptualization process. Various iterations of the initial conceptual schema may also be prepared by different individuals and refined before actually being created inside the database management system (DBMS) environment. Normalization should be a regular part of conceptualizing the database in the initial stages of the DSDL.

Question 6: Why does a company not fix its database management system (DBMS) problems?
Answer 6: This is a difficult yet quite common question. Users will often notice redundancy, inaccuracy, or performance problems related to database activity. They notice because users are generally more savvy about databases than they have been historically, and they have greater access to the underlying layers of the database management system (DBMS) than ever before. It is not uncommon to see users who understand that inaccuracies in a database lead to inaccuracies in reporting, data extraction, or business system conclusions. In weighing the cost of redesigning the database to fix the problem, the information technology (IT) department may conclude that completely revising the database schema would affect far too many applications and that the problem is therefore impractical to correct. Thus, the IT department may choose to allow inaccuracy and redundancy to exist because the cost of the correction far exceeds its value to the business system. The IT department may also be waiting for a correction to arrive with a scheduled update or a service pack for the database environment. End users may be frustrated with the IT department's sluggish, if not uninterested, response, but the IT department is simply making a cost-benefit judgment about its time and resources. The IT department might leave persistent problems with the database alone so that it can focus on areas where it can provide more value.

Question 7: How does the database management system (DBMS) structure relate to application services?
Answer 7: N-tiered architecture refers to various levels of databases and applications working on multiple servers. One server may be configured to house an application, another server may be configured to house an operations database, and still another server may be configured to house a security database for the application. Multiple servers are used to balance the overall application across many database engines and across many application servers. The complexity of this kind of arrangement is enormous, but the database management system (DBMS) structure can be partitioned in this way to improve overall performance and to specialize data in specific containers or places in the information system.
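
The separation of an application from an operations database and a security database, as described above, can be sketched in miniature. This is only an illustration, assuming Python's built-in sqlite3 module: two in-memory databases stand in for two database servers, and the table names, user names, and the list_orders function are hypothetical.

```python
import sqlite3

# Two independent in-memory databases stand in for two separate database servers.
security_db = sqlite3.connect(":memory:")    # hypothetical security database
operations_db = sqlite3.connect(":memory:")  # hypothetical operations database

# The security database holds only users and their permissions.
security_db.execute(
    "CREATE TABLE users (username TEXT PRIMARY KEY, can_read_orders INTEGER NOT NULL)"
)
security_db.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

# The operations database holds only business data.
operations_db.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, item TEXT NOT NULL)"
)
operations_db.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, "widget"), (2, "gadget")]
)


def list_orders(username):
    """Application tier: consult the security database, then the operations database."""
    row = security_db.execute(
        "SELECT can_read_orders FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None or not row[0]:
        raise PermissionError(f"{username} may not read orders")
    return operations_db.execute(
        "SELECT order_id, item FROM orders ORDER BY order_id"
    ).fetchall()


print(list_orders("alice"))
```

In a real deployment each connection would point at a different server, but the division of responsibility is the same: the application tier is the only place where the two data stores meet.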
The database structure supports the application primarily in terms of speed: the application must access, retrieve, and perform DBMS operations quickly so as not to irritate the end user; because of this, the DBMS structure will be optimized for the application service.

Question 8: Is too much normalization bad? Is there a risk of overdoing it?
Answer 8: Although this is probably a matter of opinion, yes, too much normalization is probably bad. A good example is an address. An address typically consists of about five fields: address_1, address_2, city, state, and zip_code. It makes sense to believe that any U.S. postal address could be stored in this structure; however, applying 2NF and 3NF reveals inconsistencies in storing multiple addresses this way. There would be repeating rows of cities and states, which would be transitively dependent on the primary key of another entity, such as a customer ID. Both situations fail 2NF and 3NF. You might be motivated to create separate tables for states, cities, and zip codes, with a primary key for each entity. An address would then require three joins across at least three tables (address joined to states, states joined to cities, cities joined to zip codes), all to avoid 2NF and 3NF irregularities. In this design, it would be impossible to misspell a city, assign it an incorrect zip code, or have redundant or misspelled states. This may look fairly desirable, but imagine the record set: {123 Main Street, STE 10, 398, 176, 9}. In the preceding example, the numbers are foreign keys. This may be extraordinarily convoluted and perhaps too much for what you are attempting to accomplish, which is simply to store an address. This is a common example of too much normalization adding too much complexity to a relatively simple problem.

Question 9: How many database analysts (DBAs) does a company need?
Answer 9: That depends. Many companies that have extensive enterprise-scale databases will have teams of database analysts (DBAs). Smaller firms may have just one DBA, whereas even smaller companies may give their software development teams DBA authority. Some companies might even outsource the DBA function entirely to a vendor who can manage it more effectively than they can themselves. The decision is based on how management perceives the value of the database to the execution of its business strategy. If the database is perceived primarily as a utilitarian function and not as a strategic asset, it is likely that there will be few DBAs or that the DBA function will be outsourced.
If data accuracy, speed, performance, reporting, and consistency are important to the overall execution of a company's business plan, it is likely that the company will install DBAs to safeguard its database activities.

Question 10: How would a database analyst (DBA) use entity-relationship diagrams (ERDs)?
Answer 10: The entity-relationship diagram (ERD) is a map that shows the relationships between tables and other database management system (DBMS) objects. Enterprise-scale databases are exceptionally complicated, with numerous tables and tens of thousands of relationships. The ERD, posted on a wall or available in electronic form, allows the database analyst (DBA) to conceptualize data as they are stored in the database. The ERD can be an easy way to share complex information with a variety of stakeholders; it gives the DBA a point of reference for conversation with developers, analysts, and vendors. It is an important tool in managing the complex, highly scaled enterprise DBMS solution.
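
The over-normalized address from Answer 8, and the kind of relationship map an ERD would show for it (Answer 10), can be sketched in a few lines. This is a minimal illustration, assuming Python's built-in sqlite3 module. The table and column names and the sample city, state, and zip values are hypothetical; only the street lines and the key values 398, 176, and 9 come from the record set in Answer 8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    PRAGMA foreign_keys = ON;
    CREATE TABLE cities    (city_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE states    (state_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE zip_codes (zip_id   INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE);
    CREATE TABLE addresses (
        address_id INTEGER PRIMARY KEY,
        address_1  TEXT NOT NULL,
        address_2  TEXT,
        city_id    INTEGER NOT NULL REFERENCES cities(city_id),
        state_id   INTEGER NOT NULL REFERENCES states(state_id),
        zip_id     INTEGER NOT NULL REFERENCES zip_codes(zip_id)
    );
""")

# Hypothetical lookup rows; the key values mirror the record set in Answer 8.
conn.execute("INSERT INTO cities VALUES (398, 'Springfield')")
conn.execute("INSERT INTO states VALUES (176, 'IL')")
conn.execute("INSERT INTO zip_codes VALUES (9, '62701')")
conn.execute(
    "INSERT INTO addresses (address_1, address_2, city_id, state_id, zip_id) "
    "VALUES ('123 Main Street', 'STE 10', 398, 176, 9)"
)

# What is actually stored: two text fields and three foreign keys.
stored_row = conn.execute(
    "SELECT address_1, address_2, city_id, state_id, zip_id FROM addresses"
).fetchone()

# Reading the address back as text takes three joins.
readable_row = conn.execute("""
    SELECT a.address_1, a.address_2, c.name, s.name, z.code
    FROM addresses a
    JOIN cities    c ON c.city_id  = a.city_id
    JOIN states    s ON s.state_id = a.state_id
    JOIN zip_codes z ON z.zip_id   = a.zip_id
""").fetchone()

# ERD-style listing: (child table, child column, parent table, parent column).
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
relationships = sorted(
    (table, fk[3], fk[2], fk[4])
    for table in tables
    for fk in conn.execute(f"PRAGMA foreign_key_list({table})")
)

print(stored_row)
print(readable_row)
for child, column, parent, parent_column in relationships:
    print(f"{child}.{column} -> {parent}.{parent_column}")
```

The stored row is compact and hard to corrupt, but it is unreadable without the joins, which is the trade-off Answer 8 describes. The relationship listing is the same information an ERD presents graphically.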