OpenEdge. Database Essentials. Getting Started:


Copyright 2017 Progress Software Corporation and/or its subsidiaries or affiliates. All rights reserved. These materials and all Progress software products are copyrighted and all rights are reserved by Progress Software Corporation. The information in these materials is subject to change without notice, and Progress Software Corporation assumes no responsibility for any errors that may appear therein. The references in these materials to specific platforms supported are subject to change. Business Making Progress, Corticon, DataDirect (and design), DataDirect Cloud, DataDirect Connect, DataDirect Connect64, DataDirect XML Converters, DataDirect XQuery, Deliver More Than Expected, Icenium, Kendo UI, Making Software Work Together, NativeScript, OpenEdge, Powered by Progress, Progress, Progress Software Developers Network, Rollbase, SequeLink, Sitefinity (and Design), SpeedScript, Stylus Studio, TeamPulse, Telerik, Telerik (and Design), Test Studio, and WebSpeed are registered trademarks of Progress Software Corporation or one of its affiliates or subsidiaries in the U.S. and/or other countries. AccelEvent, Analytics360, AppsAlive, AppServer, Arcade, BravePoint, BusinessEdge, DataDirect Spy, DataDirect SupportLink, DevCraft, DigitalFactory, Fiddler, Future Proof, High Performance Integration, JustCode, JustDecompile, JustMock, JustTrace, OpenAccess, ProDataSet, Progress Arcade, Progress Profiles, Progress Results, Progress RFID, Progress Software, ProVision, PSE Pro, SectorAlliance, Sitefinity, SmartBrowser, SmartComponent, SmartDataBrowser, SmartDataObjects, SmartDataView, SmartDialog, SmartFolder, SmartFrame, SmartObjects, SmartPanel, SmartQuery, SmartViewer, SmartWindow, WebClient, and Who Makes Progress are trademarks or service marks of Progress Software Corporation and/or its subsidiaries or affiliates in the U.S. and other countries. Java is a registered trademark of Oracle and/or its affiliates. 
Any other marks contained herein may be trademarks of their respective owners. Please refer to the Release Notes applicable to the particular Progress product release for any third-party acknowledgements required to be provided in the documentation associated with the Progress product. The Release Notes can be found in the OpenEdge installation directory and online at: For the latest documentation updates see OpenEdge Product Documentation on Progress Communities: ( openedge-product-documentation-overview.aspx).

August 2014
Last updated with new content: Release Updated: 2017/03/24


Contents

Preface
  Purpose
  Audience
  Organization
  References to ABL data types
  Typographical conventions
  OpenEdge messages
  Obtaining more information about OpenEdge messages
Chapter 1: Introduction to Databases
  Describing a database
  Elements of a relational database
  Tables
  Rows
  Columns
  Keys
  Indexes
  Applying the principles of the relational model
  OpenEdge database and the relational model
  Database schema and metaschema
  Sports 2000 database
  Key points to remember
Chapter 2: Database Design
  Design basics
  Data analysis
  Logical database design
  Table relationships
  One-to-one relationship
  One-to-many relationship
  Many-to-many relationship
  Normalization
  First normal form
  Second normal form
  Third normal form
  Denormalization
  Defining indexes
  Indexing basics
  Choosing which tables and columns to index
  Indexes and ROWIDs
  Calculating index size
  Eliminating redundant indexes
  Deactivating indexes
  Physical database design
Chapter 3: OpenEdge RDBMS
  OpenEdge database file structure
  Other database-related files
  OpenEdge architecture
  Storage areas
  Guidelines for choosing storage area locations
  Extents
  Clusters
  Blocks
  Other block types
  Storage design overview
  Mapping objects to areas
  Determining configuration options
  System platform
  Connection modes
  Client type
  Database location
  Database connections
  Relative- and absolute-path databases
Chapter 4: Administrative Planning
  Data layout
  Calculating database storage requirements
  Sizing your database areas
  Database areas
  Data area optimization
  Primary recovery (before-image) information
  After-image information
  System resources
  Disk capacity
  Disk storage
  Projecting future storage requirements
  Comparing expensive and inexpensive disks
  Understanding cache usage
  Increasing disk reliability with RAID
  OpenEdge in a network storage environment
  Disk summary
  Memory usage
  Estimating memory requirements
  Optimizing memory usage
  CPU activity
  Tuning your system
  Understanding idle time
  Fast CPUs versus many CPUs
  Tunable operating system resources
Chapter 5: Database Administration
  Database administrator role
  Security administrator role
  Ensuring system availability
  Database capacity
  Application load
  System memory
  Additional factors to consider in monitoring performance
  Testing to avoid problems
  Safeguarding your data
  Why backups are done
  Creating a complete backup and recovery strategy
  Using PROBKUP versus operating system utilities
  After-imaging implementation and maintenance
  Testing your recovery strategy
  Maintaining your system
  Daily monitoring tasks
  Monitoring the database log file
  Monitoring area fill
  Monitoring buffer hit rate
  Monitoring buffers flushed at checkpoint
  Monitoring system resources (disks, memory, and CPU)
  Periodic monitoring tasks
  Database analysis
  Rebuilding indexes
  Compacting indexes
  Fixing indexes
  Moving tables
  Moving indexes
  Truncating and growing BI files
  Dumping and loading
  Periodic event administration
  Annual backups
  Archiving
  Modifying applications
  Migrating OpenEdge releases
  Profiling your system performance
  Establishing a performance baseline
  Performance tuning methodology
  Summary
Index

Preface

For details, see the following topics:
- Purpose
- Audience
- Organization
- References to ABL data types
- Typographical conventions
- OpenEdge messages

Purpose

This book introduces the principles of a relational database, database design, and the architecture of the OpenEdge database. The book also introduces planning concepts for a successful database deployment, and the database administration tasks required for database maintenance and tuning.

You should use this book if you are unfamiliar with either relational database concepts or database administration tasks.

For the latest documentation updates see the OpenEdge Product Documentation on PSDN: openedge-product-documentation-overview.aspx.

Audience

This book is for new database designers and database administrators who need conceptual information that introduces them to the tasks and responsibilities of their role.

Organization

- Chapter 1, Introduction to Databases: Presents an introduction to relational database terms and concepts.
- Chapter 2, Database Design: Provides an overview of database design techniques.
- Chapter 3, OpenEdge RDBMS: Explains the architecture and configuration supported by an OpenEdge database. This chapter also provides information on storage design and client/server configurations.
- Chapter 4, Administrative Planning: Offers administrative planning advice for block sizes, disk space, and other system resource requirements.
- Chapter 5, Database Administration: Introduces the database administrator role and discusses the associated responsibilities and tasks.

References to ABL data types

ABL provides built-in data types, built-in class data types, and user-defined class data types. References to built-in data types follow these rules:

- Like most other keywords, references to specific built-in data types appear in all UPPERCASE, using a font that is appropriate to the context. No uppercase reference ever includes or implies any data type other than itself.
- Wherever integer appears, this is a reference to the INTEGER or INT64 data type.
- Wherever character appears, this is a reference to the CHARACTER, LONGCHAR, or CLOB data type.
- Wherever decimal appears, this is a reference to the DECIMAL data type.
- Wherever numeric appears, this is a reference to the INTEGER, INT64, or DECIMAL data type.

References to built-in class data types appear in mixed case with initial caps, for example, Progress.Lang.Object. References to user-defined class data types appear in mixed case, as specified for a given application example.

Typographical conventions

This documentation uses the following typographical and syntax conventions:

Bold: Bold typeface indicates commands or characters the user types, provides emphasis, or the names of user interface elements.
Italic: Italic typeface indicates the title of a document, or signifies new terms.
SMALL, BOLD CAPITAL LETTERS: Small, bold capital letters indicate OpenEdge key functions and generic keyboard keys; for example, GET and CTRL.
KEY1+KEY2: A plus sign between key names indicates a simultaneous key sequence: you press and hold down the first key while pressing the second key. For example, CTRL+X.
KEY1 KEY2: A space between key names indicates a sequential key sequence: you press and release the first key, then press another key. For example, ESCAPE H.

Syntax:

Fixed width: A fixed-width font is used in syntax, code examples, system output, and file names.
Fixed-width italics: Fixed-width italics indicate variables in syntax.
Fixed-width bold: Fixed-width bold italic indicates variables in syntax with special emphasis.
UPPERCASE fixed width: ABL keywords in syntax and code examples are almost always shown in uppercase. Although shown in uppercase, you can type ABL keywords in either uppercase or lowercase in a procedure or class.
Period (.) or colon (:): All statements except DO, FOR, FUNCTION, PROCEDURE, and REPEAT end with a period. DO, FOR, FUNCTION, PROCEDURE, and REPEAT statements can end with either a period or a colon.
[ ] (large): Large brackets indicate the items within them are optional.
[ ] (small): Small brackets are part of ABL.
{ } (large): Large braces indicate the items within them are required. They are used to simplify complex syntax diagrams.
{ } (small): Small braces are part of ABL. For example, a called external procedure must use braces when referencing arguments passed by a calling procedure.
| (vertical bar): A vertical bar indicates a choice.
... (ellipses): Ellipses indicate repetition: you can choose one or more of the preceding items.

OpenEdge messages

OpenEdge displays several types of messages to inform you of routine and unusual occurrences:

- Execution messages inform you of errors encountered while OpenEdge is running a procedure; for example, if OpenEdge cannot find a record with a specified index field value.
- Compile messages inform you of errors found while OpenEdge is reading and analyzing a procedure before running it; for example, if a procedure references a table name that is not defined in the database.
- Startup messages inform you of unusual conditions detected while OpenEdge is getting ready to execute; for example, if you entered an invalid startup parameter.

After displaying a message, OpenEdge proceeds in one of several ways:

- Continues execution, subject to the error-processing actions that you specify or that are assumed as part of the procedure. This is the most common action taken after execution messages.
- Returns to the Procedure Editor, so you can correct an error in a procedure. This is the usual action taken after compiler messages.
- Halts processing of a procedure and returns immediately to the Procedure Editor. This does not happen often.
- Terminates the current session.

OpenEdge messages end with a message number in parentheses. In this example, the message number is 200:

** Unknown table name table. (200)

If you encounter an error that terminates OpenEdge, note the message number before restarting.

Obtaining more information about OpenEdge messages

On Windows platforms, use OpenEdge online help to obtain more information about OpenEdge messages. Many OpenEdge tools include the following Help menu options to provide information about messages:

- Choose Help > Recent Messages to display detailed descriptions of the most recent OpenEdge message and all other messages returned in the current session.
- Choose Help > Messages and then type the message number to display a description of a specific OpenEdge message.
- In the Procedure Editor, press the HELP key or F1.

On UNIX platforms, use the OpenEdge pro command to start a single-user mode character OpenEdge client session and view a brief description of a message by providing its number.

To use the pro command to obtain a message description by message number:

1. Start the Procedure Editor:

   OpenEdge-install-dir/bin/pro

2. Press F3 to access the menu bar, then choose Help > Messages.
3. Type the message number and press ENTER. Details about that message number appear.
4. Press F4 to close the message, press F3 to access the Procedure Editor menu, and choose File > Exit.


Chapter 1: Introduction to Databases

Before you can administer an OpenEdge database, it is important to understand the basic concepts of relational databases. This chapter introduces those concepts. For details, see the following topics:
- Describing a database
- Elements of a relational database
- Applying the principles of the relational model
- OpenEdge database and the relational model
- Key points to remember

Describing a database

A database is a collection of data that can be searched in a systematic way to maintain and retrieve information. A database offers you many advantages, including:

Centralized and shared data: You enter and store all your data in the computer. This minimizes the use of paper, files, and folders, as well as the likelihood of losing or misplacing them. Once the data is in the computer, many users can access it through a computer network, regardless of the users' physical or geographical locations.

Current data: Because users can quickly update data, the data available is current and ready to use.

Speed and productivity: You can search, sort, retrieve, change, and print your data, as well as tally totals, more quickly than by performing these tasks by hand.

Accuracy and consistency: You can design your database to validate data entry, ensuring that the data is consistent and valid. For example, if a user enters "OD" instead of "OH" for Ohio, your database can display an error message. It can also prevent a user from deleting a customer record that has an outstanding order.

Analysis: Databases can store, track, and process large volumes of data from diverse sources. You can use the collected data to track the performance of an area of your business, or to reveal business trends. For example, a clothing retailer can track faulty suppliers, customers' credit ratings, and returns of defective clothing, and an auto manufacturer can track assembly line operation costs, product reliability, and worker productivity.

Security: You can protect your database by establishing a list of authorized user identifications and passwords. Security ensures that each user can perform only permitted operations. For example, you might allow users to read data in your database but not to update or delete it.

Crash recovery: System failures are inevitable. A database protects data integrity in the event of a failure: the database management system uses a transaction log to ensure that your data is properly recovered when you restart after a crash.

Transactions: The transaction concept provides a generalized error recovery mechanism that protects against the consequences of unexpected errors. Transactions ensure that a group of related database changes always occurs as a unit; either all the changes are made or none of them are. This allows you to restore the previous state of the database if an error occurs after you begin making changes, or if you decide not to complete the change.
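The all-or-nothing behavior described under Transactions can be illustrated outside OpenEdge. The following is a minimal sketch in Python using the standard-library sqlite3 module, not OpenEdge ABL; the orders and order_line tables, the make_db, place_order, and run_demo helpers, and the sample values are all hypothetical. An order header and its lines are written in one transaction, and when one line violates a validation rule, the header and the line already written are rolled back together.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (order_num TEXT PRIMARY KEY NOT NULL);
        CREATE TABLE order_line (
            order_num TEXT NOT NULL,
            line_num  INTEGER NOT NULL,
            item_num  TEXT NOT NULL,
            qty       INTEGER NOT NULL CHECK (qty > 0),  -- validation rule
            PRIMARY KEY (order_num, line_num)
        );
        """
    )
    return conn


def place_order(conn, order_num, lines):
    """Insert an order header and all of its lines as one transaction.

    Used as a context manager, the connection commits when the block
    finishes normally and rolls back if any statement raises, so either
    every row is stored or none is.
    """
    with conn:
        conn.execute("INSERT INTO orders VALUES (?)", (order_num,))
        for line_num, (item_num, qty) in enumerate(lines, start=1):
            conn.execute(
                "INSERT INTO order_line VALUES (?, ?, ?, ?)",
                (order_num, line_num, item_num, qty),
            )


def run_demo():
    conn = make_db()
    place_order(conn, "O1", [("I1", 2), ("I2", 1)])  # both lines valid: committed
    try:
        # The second line has qty 0, which fails the CHECK constraint.
        place_order(conn, "O2", [("I1", 1), ("I3", 0)])
    except sqlite3.IntegrityError:
        pass  # the O2 header and its first line were rolled back
    orders = conn.execute("SELECT order_num FROM orders").fetchall()
    line_count = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]
    return orders, line_count


if __name__ == "__main__":
    print(run_demo())  # ([('O1',)], 2)
```

Only order O1 and its two lines survive; no trace of the partially written O2 remains, which is the atomicity property described next.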
To satisfy the definition of a transaction, a database management system must adhere to the following four properties:

Atomicity: The transaction is either completed or entirely undone. There can be no partial transaction.
Consistency: The transaction must transform the database from one consistent state to another.
Isolation: Each transaction must execute independently of any other transaction.
Durability: Completed transactions are permanent.

Transactions that satisfy these four properties are described as ACID compliant, an acronym formed from the first letter of each property.

Elements of a relational database

Relational databases are based on the relational model. The relational model is a set of rules, set forth by E. F. Codd and based on mathematical principles (relational algebra), that defines how database management systems should function. The basic structures of a relational database, as defined by the relational model, are tables, columns (or fields), rows (or records), and keys. This section describes these elements.

Tables

A table is a collection of logically related information treated as a unit. Tables are organized by rows and columns. The following figure shows the contents of a sample Customer table.

Figure 1: Columns and rows in the Customer table

Other common tables include an Order table in a retail database that tracks the orders each customer places, an Assignment table in a departmental database that tracks all the projects each employee works on, and a Student Schedule table in a college database that tracks all the courses each student takes.

Tables are generally grouped into three types:

Kernel tables: Tables that are independent entities. Kernel tables often represent or model things that exist in the real world. Some example kernel tables are customers, vendors, employees, parts, goods, and equipment.
Association tables: Tables that represent a relationship among entities. For example, an order represents an association between a customer and goods.
Characteristic tables: Tables whose purpose is to qualify or describe some other entity. Characteristic tables only have meaning in relation to the entity they describe. For example, order-lines might describe orders; without an order, an order-line is useless.

Rows

A table is made up of rows (or records). A row is a single occurrence of the data contained in a table; each row is treated as a single unit. In the Customer table shown in Figure 1, there are four rows, and each row contains information about an individual customer.

Columns

Rows are organized as a set of columns (or fields). All rows in a table comprise the same set of columns. In the Customer table shown in Figure 1, the columns are Cust Number, Name, and Street.

Keys

There are two types of keys: primary and foreign. A primary key is a column (or group of columns) whose value uniquely identifies each row in a table. Because the key value is always unique, you can use it to detect and prevent duplicate rows. A good primary key has the following characteristics:

- It is mandatory; that is, it must store non-null values. If the column is left blank, duplicate rows can occur.
- It is unique. For example, the social security number column in an Employee or Student table is an example of a unique key because it uniquely identifies each individual. The Cust Number column in the Customer table uniquely identifies each customer. It is not practical to use a person's name as a unique key because more than one customer might have the same name. Also, databases do not detect variations in names as duplicates (for example, Cathy for Catherine, Joe for Joseph). Furthermore, people do sometimes change their names (for example, through marriage or divorce).
- It is stable; that is, it is unlikely to change. A social security number is an example of a stable key because it is unlikely to change, while a person's or customer's name might change.
- It is short; that is, it has few characters. Smaller columns occupy less storage space, database searches are faster, and entries are less prone to mistakes. For example, a social security number column of nine digits is easier to access than a name column of 30 characters.

A foreign key is a column value in one table that is required to match the column value of the primary key in another table. In other words, it is the reference by one table to another. If the foreign key value is not null, then the primary key value in the referenced table must exist. This relationship of a column in one table to a column in another table is what gives a relational database its ability to join tables. Chapter 2: Database Design describes this concept in more detail.
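To see these key rules enforced by a database engine, here is a minimal sketch in Python with the standard-library sqlite3 module rather than OpenEdge; the table layout, the make_db, attempt, and run_demo helpers, and the sample rows are hypothetical. Two details are SQLite-specific: SQLite checks foreign keys only after PRAGMA foreign_keys = ON is issued on the connection, and a TEXT primary key needs an explicit NOT NULL to be mandatory. The last statement also shows the outstanding-order rule mentioned under Accuracy and consistency.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this per-connection pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customer (
            cust_num TEXT PRIMARY KEY NOT NULL,  -- mandatory and unique
            name     TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_num TEXT PRIMARY KEY NOT NULL,
            cust_num  TEXT NOT NULL REFERENCES customer (cust_num)  -- foreign key
        );
        """
    )
    return conn


def attempt(conn, sql):
    """Run one statement as its own transaction; return True if the keys allowed it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


def run_demo():
    conn = make_db()
    return [
        attempt(conn, "INSERT INTO customer VALUES ('C1', 'Don Smith')"),  # accepted
        attempt(conn, "INSERT INTO customer VALUES (NULL, 'No Key')"),     # null key
        attempt(conn, "INSERT INTO customer VALUES ('C1', 'Jim Cain')"),   # duplicate key
        attempt(conn, "INSERT INTO orders VALUES ('01', 'C9')"),           # no customer C9
        attempt(conn, "INSERT INTO orders VALUES ('01', 'C1')"),           # valid reference
        attempt(conn, "DELETE FROM customer WHERE cust_num = 'C1'"),       # C1 has an order
    ]


if __name__ == "__main__":
    print(run_demo())  # [True, False, False, False, True, False]
```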
When either a primary key or a foreign key consists of multiple columns, it is considered a composite key.

Indexes

An index in a database operates like the index tab on a file folder. It points out one identifying column, such as a customer's name, that makes it easier and quicker to find the information you want. When you use index tabs in a file folder, you use those pieces of information to organize your files. If you index by customer name, you organize your files alphabetically; if you index by customer number, you organize them numerically. Indexes in the database serve the same purpose.

You can use a single column to define a simple index, or a combination of columns to define a composite or compound index. To decide which columns to use, first determine how the data in the table is accessed. If users frequently look up customers by last name, then the last name is a good choice for an index. It is typical to base indexes on primary keys (columns that contain unique information).

An index has the following advantages:

- Faster row search and retrieval. It is more efficient to locate a row by searching a sorted index table than by searching an unsorted table.
- In an application written with OpenEdge ABL (Advanced Business Language), records are ordered automatically to support your particular data access patterns. Regardless of how you change the table, when you browse or print it, the rows appear in indexed order instead of their stored physical order on disk.
- When you define an index as unique, each row is unique, which ensures that duplicate rows do not occur. A unique index can contain nulls; a primary key, although unique, cannot contain nulls.
- A combination of columns can be indexed together to allow you to sort a table in several different ways simultaneously (for example, sort the Projects table by a combined employee and date column).

- Efficient access to data in multiple related tables.

When you define an index as unique, each key value must be unique; the database engine prevents you from entering records with duplicate key values.

Applying the principles of the relational model

The relational model organizes data into tables and allows you to create relationships among tables by referencing columns that are common to both the primary and foreign keys. This concept of relationships between tables is easiest to understand with a common business example. A hypothetical business needs to track information about customers and their orders. The business's database, shown in Figure 2: Example of a relational database, includes the following tables:

The Customer table: The Customer table shows four rows, one for each individual customer. Each row has two columns: Cust Num and Name. To uniquely identify each customer, every customer has a unique customer number. Each column contains exactly one data value, such as C3 and Jim Cain. The primary key is Cust Num.

The Order table: The Order table shows five rows for orders placed by the customers in the Customer table. Each Order row contains two columns: Cust Num, from the Customer table, and Order Num. The primary key is Order Num. The Cust Num column is the foreign key that relates the two tables. This relationship lets you find all the orders placed by a particular customer, as well as information about the customer for a particular order.

The Order-Line table: The Order-Line table shows seven rows for the order-lines of each order. Each order-line row contains three columns: Order-Line Num; Item Num, from the Item table; and Order Num, from the Order table. The primary key is the combination of Order Num and Order-Line Num.
The two foreign keys, Order Num and Item Num, relate the Order-Line table to the Order and Item tables (and, through the Order table, to the Customer table), so that you can find the following information:
- All the order-lines for an order
- Information about the order for a particular order-line
- The item in each order-line
- Information about an item

The Item table: The Item table shows four rows, one for each separate item. Each Item row contains two columns: Item Num and Description. Every item in the Item table has a unique item number. Item Num is the primary key.

Figure 2: Example of a relational database

Suppose you want to find out which customers ordered ski boots. To gather this data from your database, you must know which item number identifies ski boots and who ordered them. There is no direct relationship between the Item table and the Customer table, so to gather the data you need, you join four tables using their primary/foreign key relationships, following these steps:

1. Select the Item table row whose Description value equals ski boots. Its Item Num value is I1.
2. Next, locate the orders that contain item I1. Because the Order table does not contain items, you first select the Order-Lines that contain I1, and then determine the orders related to these Order-Lines. Orders 01 and 04 contain item I1.
3. Now that you know the order numbers, you can find out which customers placed the orders. Select orders 01 and 04, and determine the associated customer numbers. They are C1 and C3.
4. Finally, to determine the names of customers C1 and C3, select the Customer table rows that contain customer numbers C1 and C3. Don Smith and Jim Cain ordered ski boots.
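Steps 1 through 4 collapse into a single query that follows the primary/foreign key path from Item to Order-Line to Order to Customer. The following minimal sketch uses Python's standard-library sqlite3 module and SQL rather than OpenEdge ABL. Only the values named in the steps (I1 for ski boots, orders 01 and 04, customers C1 Don Smith and C3 Jim Cain) come from the example; the other rows, the column names, and the load_sample and customers_who_ordered helpers are hypothetical, and the full contents of Figure 2 are not reproduced. The order_line table uses a composite primary key, and the CREATE INDEX statement adds an index on the Item Num foreign key that the engine can use when joining the related tables.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customer (cust_num TEXT PRIMARY KEY NOT NULL, name TEXT NOT NULL);
CREATE TABLE item (item_num TEXT PRIMARY KEY NOT NULL, description TEXT NOT NULL);
CREATE TABLE orders (
    order_num TEXT PRIMARY KEY NOT NULL,
    cust_num  TEXT NOT NULL REFERENCES customer (cust_num)
);
CREATE TABLE order_line (
    order_num TEXT NOT NULL REFERENCES orders (order_num),
    line_num  INTEGER NOT NULL,
    item_num  TEXT NOT NULL REFERENCES item (item_num),
    PRIMARY KEY (order_num, line_num)  -- composite primary key
);
-- Index on the Item Num foreign key, available to the join below.
CREATE INDEX order_line_item ON order_line (item_num);
"""


def load_sample(conn):
    """Insert a small hypothetical data set consistent with steps 1-4."""
    with conn:
        conn.executemany("INSERT INTO customer VALUES (?, ?)",
                         [("C1", "Don Smith"), ("C2", "Customer Two"), ("C3", "Jim Cain")])
        conn.executemany("INSERT INTO item VALUES (?, ?)",
                         [("I1", "ski boots"), ("I2", "ski poles")])
        conn.executemany("INSERT INTO orders VALUES (?, ?)",
                         [("01", "C1"), ("02", "C2"), ("04", "C3")])
        conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                         [("01", 1, "I1"), ("01", 2, "I2"), ("02", 1, "I2"), ("04", 1, "I1")])


def customers_who_ordered(conn, description):
    """Join Item -> Order-Line -> Order -> Customer and return customer names."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.name
        FROM item AS i
        JOIN order_line AS ol ON ol.item_num = i.item_num
        JOIN orders     AS o  ON o.order_num = ol.order_num
        JOIN customer   AS c  ON c.cust_num  = o.cust_num
        WHERE i.description = ?
        ORDER BY c.name
        """,
        (description,),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load_sample(conn)
    print(customers_who_ordered(conn, "ski boots"))  # ['Don Smith', 'Jim Cain']
```

Each JOIN ... ON clause corresponds to one of the manual steps, matching a foreign key to the primary key it references.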

The following figure illustrates the steps outlined in the previous procedure.

Figure 3: Selecting records from related tables

By organizing your data into tables and relating the tables with common columns, you can perform powerful queries. The structures of tables and columns are relatively simple to implement and modify, and the data is consistent regardless of the queries or applications used to access it. Figure 3 shows the primary key values as character data for clarity, but a numeric key is generally more efficient.

OpenEdge database and the relational model

The OpenEdge database is a relational database management system (RDBMS). You can add, change, manipulate, or delete the data and data structures in your database as your requirements change.

Database schema and metaschema

The logical structure of the OpenEdge database consists of the elements of a relational database: tables, columns, and indexes. The description of the database's structure, including the tables it contains, the columns within the tables, views, and so on, is called the database schema or the data definitions.

The underlying structure of a database that makes it possible to store and retrieve data is called the metaschema. That is, the metaschema defines that there can be database tables and columns, and the structural characteristics of those database parts. All metaschema table names begin with an underscore ( _ ).

Note: The metaschema is a set of tables that includes itself. Therefore, you can do ordinary queries on the metaschema to examine all table and index definitions, including the definitions of the metaschema itself.

The physical structure of the database and its relationship to the logical structure are discussed in Chapter 3: OpenEdge RDBMS.
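The idea in the Note, that table and index definitions are themselves stored as rows you can query, is not unique to OpenEdge. As a rough analogy only, the following Python sketch uses SQLite, whose catalog table sqlite_master records every table and index definition. This is SQLite's mechanism, not the OpenEdge metaschema, and unlike the OpenEdge metaschema it does not list its own definition. The customer table, the customer_name index, and the describe_schema helper are hypothetical.

```python
import sqlite3


def describe_schema(conn):
    """Return (type, name, owning table) for each table and index in the catalog."""
    return conn.execute(
        "SELECT type, name, tbl_name FROM sqlite_master "
        "WHERE type IN ('table', 'index') ORDER BY type, name"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # INTEGER PRIMARY KEY aliases SQLite's rowid, so no automatic index is created.
    conn.execute("CREATE TABLE customer (cust_num INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE INDEX customer_name ON customer (name)")
    for row in describe_schema(conn):
        print(row)
    # ('index', 'customer_name', 'customer')
    # ('table', 'customer', 'customer')
```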

Sports 2000 database

The Sports 2000 database is one of several sample databases provided with the product, and it is frequently used in the documentation to illustrate database concepts and programming techniques. This database holds the information necessary to track customers, take and process orders, bill customers, and track inventory. The following table describes the tables of the Sports 2000 database. For details about the fields and indexes of the Sports 2000 database, you can use either the Data Dictionary or the Data Admin Tool to create table and index reports. For details on how to create reports, see the online Help.

Table 1: The Sports 2000 database

Table           Description
Benefits        Contains employee benefits
BillTo          Contains bill-to address information for an order
Bin             Represents the bins in each warehouse that contain items
Customer        Contains customer information including balance and address
Department      Contains a master listing of departments
Employee        Stores employee information including name and address
Family          Tracks an employee's family information
Feedback        Contains customer feedback regarding likes and dislikes
InventoryTrans  Contains information about the movement of inventory
Invoice         Contains financial information by invoice for the receivables subsystem
Item            Provides quick reference for stocking, pricing, and descriptive information about items in inventory
Local-Default   Contains format and label information for various countries
Order           Contains sales and shipping header information for orders
Order-Line      Provides identification of and pricing information for a specific item ordered on a specific order
POLine          Contains the PO detail information including the item and quantity on the PO
PurchaseOrder   Contains information pertaining to the purchase order including PO number and status
Ref-Call        Contains all history for a customer
Salesrep        Contains names, regions, and quotas for the salespeople
ShipTo          Contains the ship-to address information for an order
State           Provides U.S. state names, their abbreviations, and sales region
Supplier        Contains a supplier's name, address, and additional information pertaining to the supplier
SupplierItemXr  Lists all of the items that are supplied by a particular supplier
TimeSheet       Records time in and out, hours worked, and overtime
Vacation        Tracks employee vacation time
Warehouse       Contains warehouse information including warehouse name and address

Key points to remember

The following are some key points to remember:

- A database is an electronic filing system for organizing and storing data that relates to a broad subject area, such as sales and inventory.
- A database is made up of tables. A table is a collection of rows about a specific subject, such as customers.
- A row is a collection of pieces of information about one thing, such as a specific customer.
- A column is a specific item of information, such as a customer name.
- An index, such as one on customer number, is a set of pointers to rows that you use as the basis for searching, sorting, or otherwise processing rows.
- A primary key is a column (or group of columns) whose value uniquely identifies each row in a table. Because the key value is always unique, you can use it to detect and prevent duplicate rows. It cannot contain null data.
- An index in a database operates like the index tab on a file folder, making it easier to find information.
- A foreign key is a column (or group of columns) in one table whose values are required to match the value of a primary key in another table.


2 Database Design

It is important to understand the concepts relating to database design. This chapter presents an overview of database design.

For details, see the following topics:
- Design basics
- Data analysis
- Logical database design
- Table relationships
- Normalization
- Defining indexes
- Physical database design

Design basics

Once you understand the basic structure of a relational database, you can begin the database design process. Designing a database is an iterative process that involves developing and refining a database structure based on the information and processing requirements of your business. This chapter describes each phase of the design process.

Data analysis

The first step in the database design cycle is to define the data requirements for your business. Answer the following questions to get started:
- What types of information does my business currently use?
- What types of information does my business need?
- What kind of information do I want from this system?
- What kind of reports do I want to generate?
- What will I do with this information?
- What kind of data control and security does this system require? For information on how a user is identified and authenticated and how access is authorized, see OpenEdge Getting Started: Identity Management.
- Where is expansion most likely to occur?
- Will multiple clients or sites utilize one common database? Is any information shared between the clients? For a complete introduction to multi-tenancy, see OpenEdge Getting Started: Multi-tenancy Overview.
- Do you anticipate large tables that can be partitioned horizontally? Horizontal table partitioning allows you to design a physical database layout that aligns storage with specific data values or ranges. The physical separation of data into partitions can improve performance, maintenance, and data availability. For overview information on table partitioning, see OpenEdge Getting Started: Table Partitioning.

It is never too early to consider the security requirements of your design. For example:
- Will any data need to be encrypted?
- Will I need to audit changes to my data?

For complete discussions of OpenEdge support for auditing and transparent data encryption, see OpenEdge Getting Started: Core Business Services - Security and Auditing.

To answer some of these questions, list all the data you intend to input and modify in your database, along with all the expected outputs.
For example, some of the requirements a retail store might include are the ability to:
- Input data for customers, orders, and inventory items
- Add, update, and delete rows
- Sort all customer addresses by zip code
- List alphabetically all customers with outstanding balances of over $1,000
- List the total year-to-date sales and unpaid balances of all customers in a specific region
- List all orders for a specific item (for example, ski boots)
- List all items in inventory that have fewer than 200 units, and automatically generate a reorder report
- List the amount of overhead for each item in inventory
- Track customer information to have a current listing of customer accounts and balances
- Track customer orders, and print customer orders and billing information for both customers and the accounting department
- Track inventory to know which materials are in stock, which materials need to be ordered, where they are kept, and how much of your assets are tied up with inventory
- Track customer returns on items to know which items to discontinue and which suppliers to notify

The process of identifying the goals of the business, interviewing, and gathering information from the different sources who will use the database is time-consuming but essential. Once you have gathered the information, you are ready to define your tables and columns.

Logical database design

Logical database design helps you define and communicate your business's information requirements. When you create a logical database design, you describe each piece of information you need to track and the relationships among, or the business rules that govern, those pieces of information.

Once you create a logical database design, you can verify with users and management that the design is complete (that is, it contains all of the data that must be tracked) and accurate (that is, it reflects the correct table relationships and enforces the business rules).

Creating a logical database design is an information-gathering, iterative process. It includes the following steps:
- Define the tables you need based on the information your business requires.
- Determine the relationships between the tables.
- Determine the contents (or columns) of each table.
- Normalize the tables to at least the third normal form.
- Determine the primary keys and the column domain. A domain is the set of valid values for each column. For example, the domain for the customer number can include all positive numbers.

At this point, you do not consider processing requirements, performance, or hardware constraints.

Table relationships

In a relational database, tables relate to one another by sharing a common column or columns. This column, existing in two or more tables, allows the tables to be joined. When you design your database, you define the table relationships based on the rules of your business. The relationship is frequently between primary and foreign key columns; however, tables can also be related by other nonkey columns.
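As a sketch of joining on a shared column, again using SQLite through Python's sqlite3 module rather than OpenEdge, and with hypothetical names and sample rows, the following code indexes the shared customer-number column and uses it to navigate in both directions: from a customer to its orders, and from an order back to its customer.

```python
import sqlite3

# Hypothetical sample data modeled on the Customer and Order examples in this chapter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustNum INTEGER PRIMARY KEY, Name TEXT, Street TEXT);
CREATE TABLE Orders (
    OrderNum  TEXT PRIMARY KEY,
    CustNum   INTEGER REFERENCES Customer(CustNum),  -- the shared (foreign key) column
    OrderDate TEXT                                   -- ISO dates sort chronologically
);
-- Index the shared column so that finding a customer's orders does not scan Orders
CREATE INDEX OrdersCustNum ON Orders(CustNum);
INSERT INTO Customer VALUES (101, 'Jones, Sue', '2 Mill Ave.'), (102, 'Hand, Jim', '12 Dudley St.');
INSERT INTO Orders VALUES ('M31', 101, '2005-03-19'), ('M98', 101, '2005-08-13'),
                          ('M56', 102, '2004-05-14');
""")


def orders_for_customer(cust_num):
    """Customer -> Orders: all orders that carry this customer number."""
    return conn.execute(
        "SELECT OrderNum, OrderDate FROM Orders WHERE CustNum = ? ORDER BY OrderDate",
        (cust_num,),
    ).fetchall()


def customer_for_order(order_num):
    """Orders -> Customer: join on the shared CustNum column."""
    return conn.execute(
        "SELECT c.Name, c.Street FROM Orders AS o "
        "JOIN Customer AS c ON c.CustNum = o.CustNum "
        "WHERE o.OrderNum = ?",
        (order_num,),
    ).fetchone()


print(orders_for_customer(101))
print(customer_for_order("M56"))
```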

The following figure illustrates that the Customer and Order tables are related by a foreign key, the Customer Number.

Figure 4: Relating the Customer table and the Order table

If the Customer Number is an index in both tables, you can quickly do the following:
- Find all the orders for a given customer and query information for each order (such as order date, promised delivery date, and actual shipping date)
- Find customer information (such as name and address) for each order, using the order's customer number

One-to-one relationship

A one-to-one relationship exists when each row in one table has only one related row in a second table. For example, a business might decide to assign one office to exactly one employee. Thus, one employee can have only one office. The same business might also decide that a department can have only one manager. Thus, one manager can manage only one department. The following figure shows these one-to-one relationships.

Figure 5: Examples of one-to-one relationships

The business might also decide that for one office there can be zero or one employee, or for one department there can be no manager or one manager. These relationships are described as zero-or-one relationships.
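One way to express these rules in SQL, shown here as a SQLite sketch with hypothetical Employee and Office tables, is to place a UNIQUE constraint on the foreign key. The constraint limits each employee to one office, and leaving the column nullable allows an office with no employee (the zero-or-one case).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key checks in SQLite
conn.executescript("""
CREATE TABLE Employee (EmpNum INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- UNIQUE on the foreign key: an employee can occupy at most one office.
-- EmpNum is nullable, so an office may have no employee (zero-or-one).
CREATE TABLE Office (
    OfficeNum INTEGER PRIMARY KEY,
    EmpNum    INTEGER UNIQUE REFERENCES Employee(EmpNum)
);
INSERT INTO Employee VALUES (1, 'Jones, Sue'), (2, 'Hand, Jim');
-- SQLite allows several NULLs in a UNIQUE column, so empty offices are fine.
INSERT INTO Office VALUES (10, 1), (11, NULL), (12, NULL);
""")


def assign_office(office_num, emp_num):
    """Try to give an office to an employee; return False if a constraint rejects it."""
    try:
        conn.execute("UPDATE Office SET EmpNum = ? WHERE OfficeNum = ?", (emp_num, office_num))
        return True
    except sqlite3.IntegrityError:
        return False


print(assign_office(11, 1))  # employee 1 already has office 10, so this is rejected
print(assign_office(11, 2))  # employee 2 has no office yet, so this succeeds
```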

One-to-many relationship

A one-to-many relationship exists when each row in one table has one or many related rows in a second table. The following figure shows examples: one customer can place many orders, and a sales representative can have many customer accounts.

Figure 6: Examples of one-to-many relationships

However, the business rule might be that for one customer there can be zero-or-many orders, one student can take zero-or-many courses, and a sales representative can have zero-or-many customer accounts. This relationship is described as a zero-or-many relationship.

Many-to-many relationship

A many-to-many relationship exists when a row in one table has many related rows in a second table, and each of those related rows has many related rows in the first table. The following figure shows examples:
- An order can contain many items, and an item can appear in many different orders.
- An employee can work on many projects, and a project can have many employees working on it.

Figure 7: Examples of the many-to-many relationship

Accessing information in tables with a many-to-many relationship is difficult and time-consuming. For efficient processing, you can convert the many-to-many relationship into two one-to-many relationships by connecting the two tables with a cross-reference table that contains the related columns.
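The cross-reference technique can be sketched in SQLite as follows. The table names and rows are hypothetical, and the cross-reference table is keyed by order number and line number, in the style of an order-line table. Each order has many lines and each item appears on many lines, so the many-to-many relationship becomes two one-to-many relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderNum INTEGER PRIMARY KEY);
CREATE TABLE Item   (ItemNum  INTEGER PRIMARY KEY, ItemName TEXT);
-- Cross-reference table: each row links one order to one item,
-- so Orders -> OrderLine and Item -> OrderLine are both one-to-many.
CREATE TABLE OrderLine (
    OrderNum INTEGER NOT NULL REFERENCES Orders(OrderNum),
    LineNum  INTEGER NOT NULL,
    ItemNum  INTEGER NOT NULL REFERENCES Item(ItemNum),
    Qty      INTEGER NOT NULL,
    PRIMARY KEY (OrderNum, LineNum)
);
CREATE INDEX OrderLineItem ON OrderLine(ItemNum);  -- fast item -> order-lines access
INSERT INTO Orders VALUES (1), (2);
INSERT INTO Item VALUES (10, 'Ski boots'), (20, 'Ski poles');
INSERT INTO OrderLine VALUES (1, 1, 10, 1), (1, 2, 20, 2), (2, 1, 10, 3);
""")


def items_on_order(order_num):
    """All item names on one order (one order -> many items), in line order."""
    rows = conn.execute(
        "SELECT i.ItemName FROM OrderLine AS ol JOIN Item AS i ON i.ItemNum = ol.ItemNum "
        "WHERE ol.OrderNum = ? ORDER BY ol.LineNum",
        (order_num,),
    )
    return [name for (name,) in rows]


def orders_with_item(item_num):
    """All orders that contain one item (one item -> many orders)."""
    rows = conn.execute(
        "SELECT DISTINCT OrderNum FROM OrderLine WHERE ItemNum = ? ORDER BY OrderNum",
        (item_num,),
    )
    return [num for (num,) in rows]


print(items_on_order(1))
print(orders_with_item(10))
```

Neither Orders nor Item needs a repeating column: adding another item to an order, or another order for an item, is just one more OrderLine row.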

To replace the many-to-many relationship between the Order and Item tables with two one-to-many relationships, create a cross-reference table, Order-Line, as shown in the following figure. The Order-Line table contains both the Order Number and the Item Number. Without this table, you would have to store repetitive information or create multiple columns in both the Order and Item tables.

Figure 8: Using a cross-reference table to relate Order and Item tables

Normalization

Normalization is an iterative process during which you streamline your database to reduce redundancy and increase stability. During the normalization process, you determine in which table a particular piece of data belongs based on the data itself, its meaning to your business, and its relationship to other data. Normalizing your database results in a data-driven design that is more stable over time.

Normalization requires that you know your business and know the different ways you want to relate the data in your business. When you normalize your database, you eliminate columns that:
- Contain more than one value
- Are duplicates or repeat
- Do not describe the table in which they currently reside
- Contain redundant data
- Can be derived from other columns
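The goals in the list above can be illustrated with a minimal Python sketch. It uses hypothetical sample rows modeled on the Customer and Order examples in the normal-form sections that follow. It moves a multi-valued order-number column into a separate order table keyed back to the customer, and it computes a derived value (an order total after an assumed 10% tax) on demand rather than storing it as a column.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical un-normalized rows: the last field holds several order numbers.
UNNORMALIZED = [
    (101, "Jones, Sue", "2 Mill Ave.", ["M31", "M98", "M129"]),
    (102, "Hand, Jim", "12 Dudley St.", ["M56"]),
    (103, "Lee, Sandy", "45 School St.", ["M37", "M140"]),
    (104, "Tan, Steve", "67 Main St.", ["M41"]),
]


def split_multivalued(rows):
    """Move the multi-valued order column into its own table of single values."""
    customers = []  # (cust_num, name, street): one row per customer
    orders = []     # (order_num, cust_num): one row per order; cust_num is the foreign key
    for cust_num, name, street, order_nums in rows:
        customers.append((cust_num, name, street))
        orders.extend((order_num, cust_num) for order_num in order_nums)
    return customers, orders


def orders_for(orders, cust_num):
    """Find a customer's orders by examining a single column."""
    return [order_num for order_num, owner in orders if owner == cust_num]


def total_after_tax(order_amount: Decimal, tax_rate: Decimal = Decimal("0.10")) -> Decimal:
    """Derive the value when it is needed instead of storing it in a column."""
    total = order_amount * (1 + tax_rate)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


customers, orders = split_multivalued(UNNORMALIZED)
print(orders_for(orders, 101))
print(total_after_tax(Decimal("100.00")))
```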

The result of each iteration of the normalization process is a table that is in a normal form. After one complete iteration, your table is said to be in first normal form; after two, second normal form; and so on. The sections that follow describe the rules for the first, second, and third normal forms.

A perfectly normalized database represents the most stable data-driven design, but it might not yield the best performance. Increasing the number of tables and keys generally leads to higher overhead per query. If performance degrades due to normalization, consider denormalizing your data. See Denormalization for more information.

First normal form

The first rule of normalization is that you must move duplicate columns, and columns that contain more than one value, to a new table. The columns of a table in the first normal form have these characteristics:
- They contain only one value.
- They occur once and do not repeat.

First, examine an un-normalized Customer table, as shown in the following table.

Table 2: Un-normalized Customer table with several values in a column

Cust Num  Name        Street         Order Number
101       Jones, Sue  2 Mill Ave.    M31, M98, M129
102       Hand, Jim   12 Dudley St.  M56
103       Lee, Sandy  45 School St.  M37, M140
104       Tan, Steve  67 Main St.    M41

Here, the Order Number column has more than one entry. This makes it very difficult to perform even the simplest tasks, such as deleting an order, finding the total number of orders for a customer, or printing orders in sorted order. To perform any of those tasks, you need a complex algorithm to examine each value in the Order Number column for each row. You can eliminate the complexity by updating the table so that each column in a table consists of exactly one value. The following table shows the same Customer table in a different un-normalized format, which contains only one value per column.

Table 3: Un-normalized Customer table with multiple duplicate columns

Cust Num  Name        Street         Order Number1  Order Number2  Order Number3
101       Jones, Sue  2 Mill Ave.    M31            M98            M129
102       Hand, Jim   12 Dudley St.  M56            Null           Null
103       Lee, Sandy  45 School St.  M37            M140           Null
104       Tan, Steve  67 Main St.    M41            Null           Null

Here, instead of a single Order Number column, there are three separate but duplicate columns for multiple orders. This format is also not efficient. What happens if a customer has more than three orders? You must either add a new column or clear an existing column value to make a new entry. It is difficult to estimate a reasonable maximum number of orders for a customer. If your business is brisk, you might have to create 200 Order Number columns for a row. But if a customer has only 10 orders, the database will contain 190 null values for this customer.

Furthermore, it is difficult and time-consuming to retrieve data with repeating columns. For example, to determine which customer has Order Number M98, you must look at each Order Number column individually (all 200 of them) in every row to find a match.

To reduce the Customer table to the first normal form, split it into two smaller tables: one table to store only Customer information and another to store only Order information. Table 4: Customer table reduced to first normal form shows the normalized Customer table, and Table 5: Order table created when normalizing the Customer table shows the new Order table.

Table 4: Customer table reduced to first normal form

Cust Num (Primary key)  Name        Street
101                     Jones, Sue  2 Mill Ave.
102                     Hand, Jim   12 Dudley St.
103                     Lee, Sandy  45 School St.
104                     Tan, Steve  67 Main St.

Table 5: Order table created when normalizing the Customer table

Order Number (Primary key)  Cust Num (Foreign key)
M31                         101
M98                         101
M129                        101
M56                         102
M37                         103
M140                        103
M41                         104

There is now only one instance of a column in the Customer and Order tables, and each column contains exactly one value. The Cust Num column in the Order table relates to the Cust Num column in the Customer table.

A table that is normalized to the first normal form has these advantages:
- It allows you to create any number of orders for each customer without having to add new columns.
- It allows you to query and sort data for orders very quickly because you search only one column, Order Number.
- It uses disk space more efficiently because no empty columns are stored.

Second normal form

The second rule of normalization is that you must move those columns that do not depend on the primary key of the current table to a new table. A table is in the second normal form when it is in the first normal form and contains only columns that give you information about the key of the table.

The following table shows a Customer table that is in the first normal form because there are no duplicate columns, and every column has exactly one value.

Table 6: Customer table with repeated data

Cust Num  Name        Street         Order Number  Order Date  Order Amount
101       Jones, Sue  2 Mill Ave.    M31           3/19/05     $…
101       Jones, Sue  2 Mill Ave.    M98           8/13/05     $3,…
101       Jones, Sue  2 Mill Ave.    M129          2/9/05      $…
102       Hand, Jim   12 Dudley St.  M56           5/14/04     $1,…
103       Lee, Sandy  45 School St.  M37           12/25/04    $…
103       Lee, Sandy  45 School St.  M140          3/15/05     $…
104       Tan, Steve  67 Main St.    M41           4/2/04      $2,…

However, the table is not normalized to the second normal form because it has these problems:
- The first three rows in this table repeat the same data for the columns Cust Num, Name, and Street. This is redundant data.
- If the customer Sue Jones changes her address, you must then update all existing rows to reflect the new address. In this case, you would update three rows. Any row with the old address left unchanged leads to inconsistent data, and your database will lack integrity.
- You might want to trim your database by eliminating all orders before November 1, 2004, but in the process, you also lose all the customer information for Jim Hand and Steve Tan.

Problems like these, in which changing or deleting rows leaves inconsistent data or unintentionally loses information, are called anomalies.

To resolve these problems, you must move data. Note that Table 6: Customer table with repeated data contains information about an individual customer, such as Cust Num, Name, and Street, that remains the same when you add an order. Columns like Order Number, Order Date, and Order Amount do not pertain to the customer and do not depend on the primary key Cust Num. They should be in a different table.

To reduce the Customer table to the second normal form, move the Order Date and Order Amount columns to the Order table, as shown in Table 7: Customer table and Table 8: Order table.

Table 7: Customer table

Cust Num (Primary key)  Name        Street
101                     Jones, Sue  2 Mill Ave.
102                     Hand, Jim   12 Dudley St.
103                     Lee, Sandy  45 School St.
104                     Tan, Steve  67 Main St.

Table 8: Order table

Order Number (Primary key)  Order Date  Order Amount  Cust Num (Foreign key)
M31                         3/19/05     $…            101
M98                         8/13/05     $3,…          101
M129                        2/9/05      $…            101
M56                         5/14/04     $1,…          102
M37                         12/25/04    $…            103
M140                        3/15/05     $…            103
M41                         4/2/04      $2,…          104

The Customer table now contains only one row for each individual customer, while the Order table contains one row for every order, and the Order Number is its primary key. The Order table contains a common column, Cust Num, that relates the Order rows with the Customer rows.

A table that is normalized to the second normal form has these advantages:
- It allows you to make updates to customer information in just one row.
- It allows you to delete customer orders without eliminating necessary customer information.
- It uses disk space more efficiently because no repeating or redundant data is stored.

Third normal form

The third rule of normalization is that you must remove columns that can be derived from existing columns. A table is in the third normal form when it contains only independent columns, that is, columns not derived from other columns.

The following table shows an Order table with a Total After Tax column that is calculated by adding a 10% tax to the Order Amount column.

Table 9: Order table with derived column

Order Number (Primary key)  Order Date  Order Amount  Total After Tax  Cust Num (Foreign key)
M31                         3/19/05     $…            $…               101
M98                         8/13/05     $3,…          $3,…             101
M129                        2/9/05      $…            $…               101
M56                         5/14/04     $1,…          $1,…             102
M37                         12/25/04    $…            $…               103
M140                        3/15/05     $…            $…               103
M41                         4/2/04      $2,…          $2,…             104

To reduce this table to the third normal form, eliminate the Total After Tax column because it is a dependent column that changes when the Order Amount or the tax rate changes. For your report, you can calculate the Total After Tax amount when you need it. You need only keep the source value because you can always derive dependent values. Similarly, if you have an Employee table, you do not need to include an Age column if you already have a Date of Birth column, because you can always calculate the age from the date of birth.

A table that is in the third normal form gives you these advantages:
- It uses disk space more efficiently because no unnecessary data is stored.
- It contains only the necessary columns because superfluous columns are removed.

Although a database normalized to the third normal form is desirable because it provides a high level of consistency, it might impact performance when you implement the database. If this occurs, consider denormalizing these tables.

Denormalization

Denormalizing a database means that you reintroduce redundancy into your database to meet processing requirements. To reduce Table 9: Order table with derived column to the third normal form, the Total After Tax column was eliminated because it contained data that can be derived. However, when data access requirements are considered, you discover that this data is constantly used. Although you can construct the Total After Tax value, your customer service representatives need this information immediately, and you do not want to have to calculate it every time it is needed. If it is kept in the database, it is always available on request. In this instance, performance outweighs other considerations, so you denormalize the data by including the derived column in the table.

Defining indexes

An index on a database table speeds up the process of searching and sorting rows. Although it is possible to search and sort data without using indexes, indexes generally speed up data access. Use them to avoid or limit row scanning operations and to avoid sorting operations. If you frequently search and sort row data by particular columns, you might want to create indexes on those columns. Or, if you regularly join tables to retrieve data, consider creating indexes on the common columns.
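To see the difference an index makes, you can ask a database for its query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN (an SQLite facility, not an OpenEdge one) on a hypothetical Orders table. Before the index exists, the plan reports a full-table scan; afterward, it reports a search using the index. The exact wording of the plan text varies between SQLite versions, so treat the printed strings as illustrative.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)  # last column holds the readable detail


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderNum INTEGER PRIMARY KEY, CustNum INTEGER, OrderDate TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(n, n % 50, "2004-09-13") for n in range(1, 1001)],  # hypothetical rows
)

QUERY = "SELECT * FROM Orders WHERE CustNum = ?"
plan_before = query_plan(conn, QUERY, (7,))  # no index on CustNum yet: full table scan
conn.execute("CREATE INDEX OrdersCustNum ON Orders(CustNum)")
plan_after = query_plan(conn, QUERY, (7,))   # the planner can now search the index

print(plan_before)
print(plan_after)
```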

Indexes also have costs. They consume disk space and add to the processing overhead of many data operations, including data entry, backup, and other common administration tasks. Each time you update an indexed column, OpenEdge updates the index and related indexes as well. When you create or delete a row, OpenEdge updates each index on the affected tables.

As you move into the details of index design, remember that index design is not a one-time operation. It is a process, and it is intricately related to your coding practices. Faulty code can undermine an index scheme, and masterfully coded queries can perform poorly if not properly supported by indexes. Therefore, as your applications develop and evolve, your indexing scheme might need to evolve as well.

The following sections discuss indexes in detail:
- Indexing basics
- Choosing which tables and columns to index
- Indexes and ROWIDs
- Calculating index size
- Eliminating redundant indexes
- Deactivating indexes

Indexing basics

This section explains the basics of indexing, including:
- How indexes work
- Reasons for defining an index
- Sample indexes
- Disadvantages of defining an index

How indexes work

A database index works like a book index. To look up a topic, you scan the book index, locate the topic, and turn to the pages where the information resides. The index itself does not contain the information; it only contains page numbers that direct you to the pages where the information resides. Without an index, you would have to search the entire book, scanning each page sequentially.
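The book-index analogy maps onto a simple data structure: a list of (key, row pointer) pairs kept in sorted order. The following Python class is a toy illustration only (a real OpenEdge index is organized as a tree structure, as described under Indexes and ROWIDs); the row IDs and sample Order rows are hypothetical, and dates are stored as ISO strings so that they sort chronologically. Binary search finds an exact key, and a range lookup returns row pointers in key order regardless of the order in which rows were stored.

```python
import bisect


class ToyIndex:
    """Sorted (key, rowid) pairs: a toy stand-in for a real index."""

    def __init__(self):
        self._entries = []  # always sorted by key, then rowid

    def add(self, key, rowid):
        bisect.insort(self._entries, (key, rowid))

    def find(self, key):
        """Row IDs whose key equals `key`, located by binary search."""
        i = bisect.bisect_left(self._entries, (key,))
        result = []
        while i < len(self._entries) and self._entries[i][0] == key:
            result.append(self._entries[i][1])
            i += 1
        return result

    def find_range(self, low, high):
        """Row IDs for low <= key <= high, returned in key order."""
        i = bisect.bisect_left(self._entries, (low,))
        result = []
        while i < len(self._entries) and self._entries[i][0] <= high:
            result.append(self._entries[i][1])
            i += 1
        return result


# Hypothetical Order rows, stored in arrival order and addressed by row ID.
orders = {
    501: {"order_num": 3, "order_date": "2004-09-20"},
    502: {"order_num": 1, "order_date": "2004-09-13"},
    503: {"order_num": 4, "order_date": "2004-09-06"},
    504: {"order_num": 2, "order_date": "2004-10-01"},
}
by_num, by_date = ToyIndex(), ToyIndex()
for rowid, row in orders.items():
    by_num.add(row["order_num"], rowid)
    by_date.add(row["order_date"], rowid)

# Look up order number 4 through the index, then follow the pointer to the row.
print(orders[by_num.find(4)[0]])
# Range query: row IDs come back in date order, not storage order.
print(by_date.find_range("2004-09-06", "2004-09-20"))  # [503, 502, 501]
```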

A database engine uses an index in much the same way to find the data you request. An index contains two pieces of information: the index key and a row pointer that points to the corresponding row in the main table. The following figure illustrates this using the Order table from the Sports 2000 database.

Figure 9: Indexing the Order table

Index entries are always sorted in numerical, alphabetical, or chronological order. Using the pointers, the system can then access data rows directly, and in the sort order specified by the index.

Every table should have at least one index, the primary index. When you create the first index on any table, OpenEdge assumes it is the primary index and sets the Primary flag accordingly. In the above figure, the Order-Num index is the primary index.

Reasons for defining an index

There are four benefits to defining an index for a table:
- Direct access and rapid retrieval of rows. The rows of the tables are physically stored in the sequence the users enter them into the database. Without an index, to find a particular row the database engine must scan every individual row in the entire table until it locates one or more rows that meet your selection criteria. Scanning is inefficient and time-consuming, particularly as the size of your table increases. When you create an index, the index entries are stored in an ordered manner to allow for fast lookup. For example, when you query for order number 4, OpenEdge does not go to the main table. Instead, it goes directly to the Order-Num index to search for this value, and then uses the pointer to read the corresponding row in the Order table. Because the index is stored in numerical order, the search and retrieval of rows is very fast. Similarly, having an index on the date column allows the system to go directly to the date value that you query (for example, 9/13/04). The system then uses the pointer to read the row with that date in the Order table. Again, because the date index is stored in chronological order, the search and retrieval of rows is very fast.
- Automatic ordering of rows. An index imposes an order on rows. Because an index sorts rows by key value (instead of the order in which the rows are created and stored on the disk), you can get very fast responses for range queries. For example, when you query, "Find all orders with dates from 09/6/04 to 09/20/04," all the order rows for that range appear in chronological order.
  Note: Although an index imposes order on rows, the data stored on disk is in the order in which it was created. You can have multiple indexes on a table, each providing a different sort ordering, and the physical storage order is not controlled by any of the indexes.
- Enforced uniqueness. When you define a unique index for a table, the system ensures that no two rows can have the same value for that index. For example, if order-num 4 already exists and you attempt to create an order with order-num 4, you get an error message stating that 4 already exists. The message appears because order-num is a unique index for the Order table.
- Rapid processing of inter-table relationships. Two tables are related if you define a column (or columns) in one table that you use to access a row in another table. If the table you access has an index based on the corresponding column, then the row access is much more efficient. The column you use to relate two tables does not need to have the same name in both tables.

Sample indexes

The following table lists some indexes defined in the Sports 2000 database, showing why each index is defined.

Table 10: Reasons for defining some Sports 2000 database indexes

Table       Index name  Index column(s)      Primary  Unique
Customer    cust-num    cust-num             YES      YES
  Why the index is defined:
  - Rapid access to a customer given a customer's number.
  - Reporting customers in order by number.
  - Ensuring that there is only one customer row for each customer number (uniqueness).
  - Rapid access to a customer from an order, using the customer number in the order row.
            name        name                 NO       NO
  Why the index is defined:
  - Rapid access to a customer given a customer's name.
  - Reporting customers in order by name.
            zip         zip                  NO       NO
  Why the index is defined:
  - Rapid access to all customers with a given zip code or in a zip code range.
  - Reporting customers in order by zip code, perhaps for generating mailing lists.
Item        item-num    item-num             YES      YES
  Why the index is defined:
  - Rapid access to an item given an item number.
  - Reporting items in order by number.
  - Ensuring that there is only one item row for each item number (uniqueness).
  - Rapid access to an item from an order-line, using the item-num column in the order-line row.
Order-line  order-line  order-num, line-num  YES      YES
  Why the index is defined:
  - Ensuring that there is only one order-line row with a given order number and line number. The index is based on both columns together since neither column alone needs to be unique.
  - Rapid access to all of the order-lines for an order, ordered by line number.
            item-num    item-num             NO       NO
  Why the index is defined:
  - Rapid access to all the order-lines for a given item.
Order       order-num   order-num            YES      YES
  Why the index is defined:
  - Rapid access to an order given an order number.
  - Reporting orders in order by number.
  - Ensuring that there is only one order row for each order number (uniqueness).
  - Rapid access to an order from an order-line, using the order-num column in the order-line row.
            cust-order  cust-num, order-num  NO       YES
  Why the index is defined:
  - Rapid access to all the orders placed by a customer. Without this index, all of the records in the Order table would be examined to find those having a particular value in the cust-num column.
  - Ensuring that there is only one row for each customer/order combination (uniqueness).
  - Rapid access to the order numbers of a customer's orders.
            order-date  order-date           NO       NO
  Why the index is defined:
  - Rapid access to all the orders placed on a given date or in a range of dates.

Disadvantages of defining an index

Even though indexes are beneficial, there are two things to remember when defining indexes for your database:
- Indexes take up disk space (see Calculating index size).
- Indexes can slow down other processes. When the user updates an indexed column, OpenEdge updates all related indexes as well, and when the user creates or deletes a row, OpenEdge changes all the indexes for that table.

Define the indexes that your application requires, but avoid indexes that provide little benefit or are infrequently used. For example, unless you frequently display data in a particular order (such as by zip code), sorting the data when you display it is more efficient than defining an index to do automatic sorting.

Choosing which tables and columns to index

If you perform frequent adds, deletes, and updates against a small table, you might not want to index it because of slower performance caused by the index overhead. However, if you mostly perform retrievals, then it is useful to create an index for the table. You can index the columns that are retrieved most often, and in the order they are most often retrieved.
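Column order matters in a multi-column index because a query can use the index efficiently only when it constrains the leading column or columns. The SQLite sketch below (hypothetical table and index names; SQLite's query planner stands in for OpenEdge's) checks the query plan for three queries against an index on (CustNum, OrderDate).

```python
import sqlite3


def query_plan(conn, sql):
    """Join SQLite's EXPLAIN QUERY PLAN detail lines into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    "OrderNum INTEGER PRIMARY KEY, CustNum INTEGER, OrderDate TEXT, Amount REAL)"
)
# Two-column index: CustNum is the leading column, OrderDate the second.
conn.execute("CREATE INDEX CustDate ON Orders(CustNum, OrderDate)")

by_cust = query_plan(conn, "SELECT * FROM Orders WHERE CustNum = 101")
by_cust_and_date = query_plan(
    conn, "SELECT * FROM Orders WHERE CustNum = 101 AND OrderDate >= '2004-09-06'"
)
# No condition on the leading column, so the index cannot be used to seek.
by_date_only = query_plan(conn, "SELECT * FROM Orders WHERE OrderDate >= '2004-09-06'")

for label, plan in [("cust", by_cust), ("cust+date", by_cust_and_date), ("date", by_date_only)]:
    print(f"{label:10} {plan}")
```

If queries by date alone were common, this reasoning suggests either reversing the column order or adding a separate order-date index, as the Sports 2000 Order table does.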

You do not have to create an index if you are retrieving a large percentage of the rows in a table (for example, 19,000 out of 20,000 rows) because it is more efficient to scan the table. However, it is worth the effort to create an index to retrieve a very small number of rows (for example, 100 out of 20,000) because then OpenEdge scans only the index instead of the entire table.

Indexes and ROWIDs

An index is a list of index values and row IDs (ROWIDs). ROWIDs are physical pointers to the database tables that give you the fastest access to rows. ROWIDs do not change during the life of a row; they change only when you dump and reload a database. If you delete a row and create a new identical row, it might have a different ROWID.

The database blocks of an index are organized into a tree structure for fast access. OpenEdge locates rows within an index by traversing the index tree. Once a row is located, the ROWID accesses the data. OpenEdge does not lock the whole index tree while it looks for the row; it locks only the block that contains the row. Therefore, other users can simultaneously access rows in the same database.

Calculating index size

You can estimate the approximate maximum amount of disk space occupied by an index by using this formula:

Number of rows * (7 + number of columns in index + index column storage) * 2

For example, if you have an index on a character column with an average of 21 characters for column index storage and there are 500 rows in the table, the index size is:

500 * (7 + 1 + 21) * 2 = 29,000 bytes

The size of an index depends on four things:
- The number of entries or rows.
- The number of columns in the key.
- The size of the column values; for example, the character value "abcdefghi" takes more space than "xyz." In addition, special characters and multi-byte Unicode characters take even more space.
- The number of similar key values.
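The estimate is simple arithmetic. A minimal Python helper that applies the formula exactly as stated, together with the worked example above:

```python
def max_index_size_bytes(num_rows: int, num_index_columns: int, column_storage_bytes: int) -> int:
    """Upper-bound estimate from the text: rows * (7 + columns + column storage) * 2."""
    return num_rows * (7 + num_index_columns + column_storage_bytes) * 2


# Worked example from the text: one character column averaging 21 bytes, 500 rows.
example = max_index_size_bytes(500, 1, 21)
print(f"{example:,} bytes")  # 29,000 bytes
```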
You will never reach the maximum because OpenEdge uses a data compression algorithm to reduce the amount of disk space an index uses. In fact, an index uses on average about 20% to 60% less disk space than the maximum amount you calculated using the formula above. The amount of data compressed depends on the data itself. OpenEdge compresses identical leading data and collapses trailing entries into one entry. Typically, non-unique indexes get better compression than unique indexes.

Note: All key values are compressed in the index, eliminating as many redundant bytes as possible.
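Compression of identical leading characters is commonly implemented as front coding: each key stores only the number of characters it shares with the previous key, plus the remaining suffix, and repeated keys store only their row pointers. The Python sketch below illustrates that general technique; it is not OpenEdge's on-disk format, the City keys echo the example in the following figure, and all ROWIDs are hypothetical.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def compress(entries):
    """entries: (key, rowid) pairs sorted by key, then rowid.

    Each distinct key becomes (shared_prefix_len, remaining_suffix, rowids), where
    the prefix is shared with the previous distinct key. A repeated key adds only
    its rowid to the existing entry.
    """
    out = []
    prev = ""
    for key, rowid in entries:
        if out and key == prev:
            out[-1][2].append(rowid)  # duplicate key: store the rowid only
            continue
        n = common_prefix_len(prev, key)
        out.append((n, key[n:], [rowid]))
        prev = key
    return out


def decompress(compressed):
    """Rebuild the original (key, rowid) pairs."""
    entries = []
    prev = ""
    for n, suffix, rowids in compressed:
        key = prev[:n] + suffix
        entries.extend((key, rowid) for rowid in rowids)
        prev = key
    return entries


# Hypothetical City index entries; the ROWIDs are made up for illustration.
CITY_ENTRIES = [("Bolonia", 101), ("Bolton", 205), ("Bolton", 206),
                ("Bonn", 310), ("Boston", 1111), ("Boston", 1118)]
packed = compress(CITY_ENTRIES)
print(packed)
# [(0, 'Bolonia', [101]), (3, 'ton', [205, 206]), (2, 'nn', [310]), (2, 'ston', [1111, 1118])]
```

For these keys the sketch stores "ton" for Bolton and drops the shared "Bo" from Bonn and Boston, matching the behavior described for the figure; the ROWID digit compression that the text also describes is not modeled here.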

The following figure shows how OpenEdge compresses data.

Figure 10: Data compression

The City index is stored by city and by ROWID in ascending order. There is no compression for the very first entry, "Bolonia." For subsequent entries, OpenEdge eliminates any characters that are identical to the leading characters of Bolonia. Therefore, for the second entry, "Bolton," it is not necessary to save the first three characters, "Bol," since they are identical to the leading characters of Bolonia. Instead, Bolton compresses to "ton." Subsequently, OpenEdge does not save redundant occurrences of Bolton. Similarly, the first two characters of "Bonn" and "Boston" ("Bo") are not saved.

For ROWIDs, OpenEdge eliminates identical leading digits. It saves the last digit of the ROWID separately and combines ROWIDs that differ only by the last digit into one entry. For example, OpenEdge saves the leading three digits of the first ROWID (333) under ROWID and saves the last digit under nth byte. Going down the list, notice that the first occurrence of Boston has a ROWID of 1111 and the second has a ROWID of 1118. Since the leading three digits (111) of the second ROWID are identical to the first one, they are not saved; only the last digit (8) appears in the index.

Because of the compression feature, OpenEdge can substantially decrease the amount of space indexes normally use. In the above figure, only 65 bytes are used to store the index that previously took up 141 bytes, a saving of approximately 54%. As you can see, the amount of disk space saved depends on the data itself, and you can save the most space on non-unique indexes.

Eliminating redundant indexes

If two indexes on the same table share the same leading, contiguous components, they are redundant. Redundant indexes occupy space and slow down performance.
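A simple check for the redundancy rule can compare each index's column list with the leading columns of every other index on the same table. This Python sketch uses hypothetical index definitions modeled loosely on the Sports 2000 Order table (the extra non-unique cust-num index on Order is invented for the example). It also assumes that a unique index should never be reported as redundant, because it enforces a constraint that the longer index may not; that exception is this sketch's assumption, not a rule stated above.

```python
from typing import NamedTuple


class IndexDef(NamedTuple):
    table: str
    name: str
    columns: tuple
    unique: bool


def find_redundant(indexes):
    """Return (table, shorter_index, covering_index) triples where the shorter index's
    columns are the leading, contiguous columns of another index on the same table."""
    found = []
    for a in indexes:
        if a.unique:
            continue  # assumption: keep unique indexes, which also enforce a constraint
        for b in indexes:
            if a.table != b.table or a.name == b.name:
                continue
            if b.columns[: len(a.columns)] != a.columns:
                continue
            # Shorter prefix is redundant; identical column lists are reported once.
            if len(a.columns) < len(b.columns) or b.unique or a.name > b.name:
                found.append((a.table, a.name, b.name))
    return found


ORDER_INDEXES = [
    IndexDef("Order", "order-num", ("order-num",), True),
    IndexDef("Order", "cust-order", ("cust-num", "order-num"), True),
    IndexDef("Order", "order-date", ("order-date",), False),
    IndexDef("Order", "cust-num", ("cust-num",), False),  # hypothetical extra index
    IndexDef("Customer", "cust-num", ("cust-num",), True),
]
print(find_redundant(ORDER_INDEXES))  # [('Order', 'cust-num', 'cust-order')]
```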

Deactivating indexes

Indexes that you seldom use can impair performance by causing unnecessary overhead. If you do not want to delete a seldom-used index, deactivate it instead. Deactivating an index eliminates the processing overhead associated with the index, but it does not free up the index disk space. For information on how to deactivate indexes, see OpenEdge Data Management: Database Administration. To learn how to deactivate indexes using SQL, see OpenEdge Data Management: SQL Reference.

Physical database design

The physical database design is a refinement of the logical design. In this phase, you examine how users will access the database. During this phase, ask yourself:

What data will I commonly use?
Which columns in the table should I index based on data access?
Where should I build in flexibility and allow for growth?
Should I denormalize the database to improve performance?

At this stage, you might denormalize the database to meet performance requirements. Once you determine the physical design of your database, you must determine how to map the database to your hardware. Maintaining the physical database is the primary responsibility of a database administrator. The OpenEdge RDBMS on page 45 discusses the physical storage of the OpenEdge database.


3 OpenEdge RDBMS

When administering an OpenEdge database, it is important to understand its architecture and the configuration options it supports. This chapter presents an overview of the OpenEdge Release 10 database. For details, see the following topics:

OpenEdge database file structure
OpenEdge architecture
Storage design overview
Determining configuration options
Relative- and absolute-path databases

OpenEdge database file structure

The OpenEdge database contains more than data. The following figure illustrates the components of an OpenEdge database. Descriptions of the files follow the figure.

Figure 11: OpenEdge RDBMS

As shown in the above figure, a typical OpenEdge database consists of:

A structure description (.st) file, which defines the structure of the database. The .st file is a text file with a .st filename extension. The administration utility PROSTRCT CREATE uses the information in the .st file to create the areas and extents of the database. It is the database administrator's responsibility to create the .st file. For detailed information about structure description files, see OpenEdge Data Management: Database Administration.

A log (.lg) file, which is a text file. The .lg file contains a history of significant database events, including server startup and shutdown, client login and logout, and maintenance activity.

One database (.db) control area, which is a binary file containing a database structure extent. The control area and its .db file act as a table of contents for the database engine, listing the name and location of every area and extent in the database.

One primary recovery (before-image) area, which contains one or more extents with a .bn filename extension. The .bn files store notes about data changes. In the event of hardware failure, the database engine uses these notes to undo any incomplete transactions and maintain data integrity.

One schema area, which contains at least one variable-length extent with a .dn filename extension. The schema area contains the master and sequence blocks, as well as schema tables and indexes. Progress Software Corporation recommends that you place all your application data in additional data areas, but if you do not create application data areas, the schema area contains your user data.

Optionally, application data areas, which contain at least one variable-length extent with a .dn filename extension. Application data areas contain user data, indexes, CLOBs, and BLOBs.

Optionally, one after-image area when after-imaging is enabled. The after-image area can contain many fixed-length and variable-length extents with the .an filename extension. In the event of hardware failure, the database engine uses the .an files and the most recent backup to reconstruct the database.

Optionally, one transaction log area when two-phase commit is in use. The transaction log area contains one or more fixed-length extents with the .tn filename extension; variable-length extents are not allowed. The transaction log lists committed two-phase commit transactions.

An OpenEdge database is collectively all the files described above: the control area, schema area, data areas, recovery files, and log files. You should treat these files as an indivisible unit. For example, the phrase "back up the database" means "back up the database data: the .db, .dn, .lg, .tn, .an, and .bn files together."
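Because these files form an indivisible unit, a script that copies or inventories a database needs to pick up every one of them. The following Python sketch classifies file names by the extensions described above and collects the files that belong to one database. It assumes that the n in .bn, .dn, .an, and .tn is an extent number written as digits, and that every file name starts with the database name followed by "." or "_"; both the naming rule and the sample file names are hypothetical, since actual extent names come from your .st file. It also includes the .st file, which the text lists as part of a typical database.

```python
import re
from typing import Dict, List, Optional

# Extension patterns taken from the file-structure description above.
# The trailing digits stand in for the "n" in .bn/.dn/.an/.tn (assumed extent number).
ROLES = [
    (re.compile(r"\.db$"), "control area"),
    (re.compile(r"\.st$"), "structure description file"),
    (re.compile(r"\.lg$"), "log file"),
    (re.compile(r"\.b\d+$"), "primary recovery (before-image) extent"),
    (re.compile(r"\.d\d+$"), "schema or application data extent"),
    (re.compile(r"\.a\d+$"), "after-image extent"),
    (re.compile(r"\.t\d+$"), "transaction log extent"),
]

def classify(filename: str) -> Optional[str]:
    """Return the role of a database file, or None for any other file."""
    for pattern, role in ROLES:
        if pattern.search(filename):
            return role
    return None

def database_files(dbname: str, filenames: List[str]) -> Dict[str, str]:
    """Collect the files of one database, assuming they all start with its name."""
    found = {}
    for name in filenames:
        if not (name.startswith(dbname + ".") or name.startswith(dbname + "_")):
            continue  # belongs to some other database or application
        role = classify(name)
        if role is not None:  # skips .lk, .df, .d, and similar files
            found[name] = role
    return found

# Hypothetical directory listing for a database named "sales".
listing = ["sales.db", "sales.st", "sales.lg", "sales.b1", "sales.d1",
           "sales_7.d1", "sales_7.d2", "sales.a1", "sales.lk", "sales.df",
           "inventory.db"]

for name, role in database_files("sales", listing).items():
    print(f"{name:12} {role}")
```

The prefix rule is deliberately simple. A second database whose name begins with "sales_" would be picked up as well, so a real script is better driven by the extent list recorded for the database than by a name pattern.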
Other database-related files

While maintaining your database, you might encounter files with the extensions listed in the following table:

Table 11: Other database-related files

File extension    Description
.abd              Archived binary dump of audit data
.bd               Binary dump file (table-based)
.blb              Object data file for a BLOB or CLOB
.cf               Schema cache file
.cp               Binary compiled code page information file
.cst              Client database-request statement cache file
.dfsql            SQL data definition file
.dsql             SQL table dump file
.d                ABL table dump file
.df               ABL data definition file
.fd               ABL bulk loader description file
.ks               OpenEdge key store for encryption-enabled databases
.lic              License file
.lk               Lock file
.repl.properties  OpenEdge Replication properties file
.repl.recovery    OpenEdge Replication file for maintaining replication state information
.rpt              License usage report file

OpenEdge architecture

The architecture for the OpenEdge database is known as Type II. Prior to Release 10, the supported architecture was known as Type I. OpenEdge Release 10 continues to support the Type I architecture, but since Type II offers significant advantages in both storage efficiency and data access performance, you should consider migrating your legacy databases to a Type II storage architecture.

The Type II architecture contains these elements, as described in the following sections:

Storage areas
Extents
Clusters
Blocks

The elements are defined in your database structure definition file. For details on the structure definition file, see OpenEdge Data Management: Database Administration.

Storage areas

A storage area is a set of physical disk files, and it is the largest physical unit of a database. With storage areas, you have physical control over the location of database objects: you can place each database object in its own storage area, you can place many database objects in a single storage area, or you can place objects of different types in the same storage area. Even though you can extend a table or index across multiple extents, you cannot split it across storage areas.

Certain storage areas have restrictions on the types of extents they support. See Extents on page 50 for a definition of extents. The transaction log storage area, used for two-phase commit, uses only fixed-length extents, but it can use more than one. The other storage areas can use many extents, but they can have only one variable-length extent, which must be the last extent. Storage areas are identified by their names.
The number and types of storage areas used vary from database to database. However, all OpenEdge databases must contain a control area, a schema area, and a primary recovery area. The database storage areas are:

Control area
Schema area
Primary recovery area
Application data area
After-image area
Encryption Policy area
Audit data and index areas (optional)
Transaction log area

Control area

The control area contains only one variable-length extent: the database structure extent, which is a binary file with a .db extension. The .db file contains the _area table and the _area-extent tables, which list the name of every area in the database, as well as the location and size of each extent.

Schema area

The schema area can contain as many fixed-length extents as needed; however, every schema area should have a variable-length extent as its last extent. The schema area stores all database system and user information, and any objects not assigned to another area. If you choose not to create any optional application data areas, the schema area contains all of the objects and sequences of the database.

Primary recovery area

The primary recovery area can contain as many fixed-length extents as needed, as long as the last extent is a variable-length extent. The primary recovery area is also called the before-image area. Its files, named .bn, record data changes. In the event of a database crash, the server uses the contents of the .bn files to perform crash recovery during the next startup of the database. Crash recovery is the process of backing out incomplete transactions.

Application data area

The application data storage area contains all application-related database objects. Defining more than one application data area allows you to improve database performance by storing different objects on different disks. Each application data area contains one or more extents with a .dn extension.

After-image area

The optional after-image area contains as many fixed-length or variable-length extents as needed. After-image extents are used to apply changes made to a database since the last backup. Enable after-imaging for your database when the risk of data loss due to a system failure is unacceptable.
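The extent rules above lend themselves to a quick sanity check of a planned layout before you write the .st file. The following Python sketch models each area as a list of extent kinds ("f" for fixed-length, "v" for variable-length). It checks that the required control, schema, and primary recovery areas exist, that the control area is a single variable-length extent, that the transaction log area uses only fixed-length extents, and that every other area has at most one variable-length extent, placed last. Because the after-image area is described as holding any mix of fixed-length and variable-length extents, the sketch leaves it unrestricted. The area keys and the data model are hypothetical labels for this illustration, not .st file syntax, and the checker enforces only the rules listed here.

```python
from typing import Dict, List

# Hypothetical area keys; these are labels for this sketch, not .st file syntax.
REQUIRED_AREAS = ("control", "schema", "primary_recovery")

def check_layout(areas: Dict[str, List[str]]) -> List[str]:
    """Return rule violations for a planned layout.

    `areas` maps an area key to its extents in order:
    "f" = fixed-length extent, "v" = variable-length extent.
    """
    problems = []
    for name in REQUIRED_AREAS:
        if name not in areas:
            problems.append(f"missing required area: {name}")
    for name, extents in areas.items():
        if not extents:
            problems.append(f"{name}: area has no extents")
        elif name == "control":
            # The control area holds exactly one variable-length extent (the .db file).
            if extents != ["v"]:
                problems.append("control: must be a single variable-length extent")
        elif name == "transaction_log":
            # Two-phase commit log: fixed-length extents only.
            if "v" in extents:
                problems.append("transaction_log: variable-length extents are not allowed")
        elif name == "after_image":
            pass  # treated as unrestricted, per the after-image area description
        elif extents.count("v") > 1:
            problems.append(f"{name}: more than one variable-length extent")
        elif "v" in extents and extents[-1] != "v":
            problems.append(f"{name}: the variable-length extent must be last")
    return problems

planned = {
    "control": ["v"],
    "schema": ["f", "v"],
    "primary_recovery": ["f", "f", "v"],
    "app_data_7": ["f", "f", "v"],
    "after_image": ["v", "v"],
    "transaction_log": ["f", "f"],
}
print(check_layout(planned))  # an empty list means no listed rule was broken
```

If the schema area's variable-length extent were placed first, for example, the checker would report "schema: the variable-length extent must be last".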

Encryption Policy area

For databases enabled for transparent data encryption, a dedicated area called the "Encryption Policy Area" is required to hold your encryption policies. The Encryption Policy Area is a specialized Type II application data area. You cannot perform any record operation on the data in the Encryption Policy Area with either an SQL or an ABL client. The Encryption Policy Area contains one or more extents with a .dn extension, but it is defined in your structure definition file with an "e" token.

Audit data and index areas (optional)

For databases enabled for auditing, specifying an application data area exclusively for audit data is recommended. If you anticipate generating large volumes of audit data, you can achieve better performance by also creating a dedicated area for audit indexes, separating the data from the indexes. Both the audit data and audit index areas are application data areas with no special restrictions.

Transaction log area

The transaction log area is required if two-phase commit is used. This area contains one or more fixed-length extents with the .tn filename extension; variable-length extents are not allowed.

Guidelines for choosing storage area locations

When choosing the locations for your database storage areas, consider the following:

Protect against disk failures by creating the after-image storage area on a separate physical disk from the disks that store the database control and primary recovery areas.

Improve performance by creating the primary recovery area on a separate disk from the disk that stores the database control area and its extents.

If using two-phase commit, create the transaction log area in the same directory as the database control area to simplify management.

Extents

Extents are disk files that store physical blocks of database objects. Extents make it possible for an OpenEdge database to extend across more than one file system or physical volume.
There are two types of extents: fixed-length and variable-length. With fixed-length extents, you control how much disk space each extent uses by defining the size of the extent in the .st file. Variable-length extents do not have a predefined length and can continue to grow until they use all available space on a disk or until they reach the file system's limit on file size.

Clusters

A cluster is a contiguous allocation of space for one type of database object. Data clusters reduce fragmentation and enable your database to yield better performance from the underlying file system.

Data clusters are specified on a per-area basis. There is one cluster size for all extents in an area. The minimum size of a data cluster is 8 blocks, but you can also specify larger clusters of 64 or 512 blocks. All blocks within a data cluster contain the same type of object. The high-water mark of an extent is increased by the cluster size.

In the Type I architecture, blocks are laid out one at a time. In the Type II architecture, blocks are laid out a cluster at a time. With the Type II architecture, data is maintained at the cluster level, and blocks only include data associated with one particular object. Additionally, all blocks of an individual cluster are associated with a particular object.

In Release 10 of OpenEdge, existing storage areas use the Type I architecture, as does the schema area. New storage areas can use the Type II architecture or the Type I architecture. To use the Type II architecture, you must allocate clusters of 8, 64, or 512 blocks to an area. If you do not, you will get the Type I architecture. Cluster sizes are defined in your structure definition file. For details on the structure definition file, see OpenEdge Data Management: Database Administration.

Blocks

A block is the smallest unit of physical storage in a database. Many types of database blocks are stored inside the database, and most of the work to store these database blocks happens behind the scenes. However, it is helpful to know how blocks are stored so that you can create the best database layout. The most common database blocks are divided into three groups:

Data blocks
Index blocks
Other block types

Data blocks

Data blocks are the most common blocks in the database. There are two types of data blocks: RM blocks and RM chain blocks. The only difference between the two is that RM blocks are considered full and RM chain blocks are not full. The internal structure of the blocks is the same.
Both types of RM blocks are social. Social blocks can contain records from different tables. In other words, RM blocks allow table information (records) from multiple tables to be stored in a single block. In contrast, index blocks contain index data from only one index in a single table. The number of records that can be stored per block is tunable per storage area. See Data layout on page 61 for a discussion of calculating optimal records per block settings.

Each RM block contains four types of information:

Block header
Records
Fields
Free space

The block header contains the address of the block (dbkey), the block type, the chain type, a backup counter, the address of the next block, an update counter (used for schema changes), free space pointers, and record pointers. For a Type I storage area, the block header is 16 bytes long. For a Type II storage area, the block header size varies: the header of the first and last block in a cluster is 80 bytes, while the header for the remaining blocks in a cluster is 64 bytes. Each record contains a fragment pointer (used by record pointers in individual fields), the Length of the Record field, and the Skip Table field (used to increase field search performance). Each field needs a minimum of 15 bytes for overhead storage and contains a Length field, a Miscellaneous Information field, and data.
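The cluster sizes and block-header sizes described above can be combined into a small calculation. The following Python sketch (helper names hypothetical) reports which architecture an area gets for a given blocks-per-cluster setting, treating anything other than 8, 64, or 512 as Type I, and totals the block-header bytes for a number of blocks using only the figures stated in the text: 16 bytes per Type I block, and for Type II, 80 bytes for the first and last block of each cluster and 64 bytes for every block in between. It counts header bytes only, not records, fields, or free space.

```python
# Header sizes as stated in the text (bytes per block).
TYPE_I_HEADER = 16
TYPE_II_EDGE_HEADER = 80    # first and last block of a cluster
TYPE_II_INNER_HEADER = 64   # every other block in the cluster
TYPE_II_CLUSTER_SIZES = (8, 64, 512)

def architecture(blocks_per_cluster: int) -> str:
    """Only 8-, 64-, or 512-block clusters give Type II; anything else is Type I."""
    return "Type II" if blocks_per_cluster in TYPE_II_CLUSTER_SIZES else "Type I"

def header_bytes(blocks_per_cluster: int, blocks: int) -> int:
    """Total block-header bytes for `blocks` blocks in one area."""
    if architecture(blocks_per_cluster) == "Type I":
        return TYPE_I_HEADER * blocks
    if blocks % blocks_per_cluster:
        # Type II space grows a whole cluster at a time.
        raise ValueError("Type II areas are allocated in whole clusters")
    clusters = blocks // blocks_per_cluster
    per_cluster = (2 * TYPE_II_EDGE_HEADER
                   + (blocks_per_cluster - 2) * TYPE_II_INNER_HEADER)
    return clusters * per_cluster

for bpc in (1, 8, 64, 512):
    print(bpc, architecture(bpc), header_bytes(bpc, 512))
```

For a single 8-block cluster this gives 2 × 80 + 6 × 64 = 544 bytes of header, compared with 8 × 16 = 128 bytes for eight Type I blocks.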

The following figure shows the layout of an RM block.

Figure 12: RM Block layout

Index blocks

Index blocks have the same header information as data blocks, and have the same size requirements: 16 bytes for Type I storage areas, and 64 or 80 bytes for Type II storage areas. Index blocks store as much information as fits within the block, and that information is compressed for efficiency. As stated earlier, index blocks can only contain information referring to a single index.

Indexes are used to find records in the database quickly. Each index in an OpenEdge RDBMS is a structured B-tree and is always in a compressed format. This improves performance by reducing key comparisons. A database can have up to 32,767 indexes. Each B-tree starts at the root. The root is stored in an _storageobject record. For the sake of efficiency, indexes are multi-threaded, allowing concurrent access. Rather than locking the whole B-tree, only those nodes that are required by a process are locked.

Other block types

There are other types of blocks that are valuable to understand. These include:

Master blocks
Storage object blocks
Free blocks
Empty blocks

Master blocks

The master block contains the same 16-byte header as other blocks, but this block stores status information about the entire database. It is always the first block in the database and it is found in Area 6 (a Type I storage area). This block contains the database version number, the total allocated blocks, time stamps, and status flags. You can retrieve additional information from this block using the Virtual System Table (VST) _mstrblk. For more information on VSTs, see OpenEdge Data Management: Database Administration.

Storage object blocks

Storage object blocks contain the addresses of the first and last records in every table by each index. If a user runs a program that requests the first or last record in a table, it is not necessary to traverse the index. The database engine obtains the information from the storage object block and goes directly to the record. Because storage object blocks are frequently used, they are pinned in memory. This availability further increases the efficiency of the request.

Free blocks

Free blocks have a header, but no data is stored in the blocks. These blocks can become any other valid block type. These blocks are below the high-water mark. The high-water mark is a pointer to the last formatted block within the database storage area. Free blocks can be created by extending the high-water mark of the database, extending the database, or reformatting blocks during an index rebuild. If the user deletes many records, the RM blocks are put on the RM chain. However, index blocks can only be reclaimed through an index rebuild or an index compress.

Empty blocks

Empty blocks do not contain header information. These blocks must be formatted prior to use. These blocks are above the high-water mark but below the total number of blocks in the area. The total blocks are the total number of allocated blocks for the storage area.
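The relationship between formatted blocks, empty blocks, and the high-water mark can be expressed as a position check. The following Python sketch assumes that blocks are numbered from 1 within a storage area, that the high-water mark is the number of the last formatted block, and that the total is the area's allocated block count; the function names and sample numbers are hypothetical. Position alone cannot distinguish a free block from a block in use, so the sketch only separates formatted space from empty (unformatted) space.

```python
def block_region(block: int, high_water_mark: int, total_blocks: int) -> str:
    """Classify a block number within one storage area by its position.

    Assumes blocks are numbered from 1, high_water_mark is the number of the
    last formatted block, and total_blocks is the area's allocated block count.
    """
    if not 0 <= high_water_mark <= total_blocks:
        raise ValueError("expected 0 <= high_water_mark <= total_blocks")
    if not 1 <= block <= total_blocks:
        return "not allocated"
    if block <= high_water_mark:
        # Formatted: a free block or any other block type in use.
        return "formatted"
    # Above the high-water mark: no header yet, must be formatted before use.
    return "empty"

def empty_blocks(high_water_mark: int, total_blocks: int) -> int:
    """Number of allocated blocks that still lie above the high-water mark."""
    return total_blocks - high_water_mark

# Hypothetical area: 1,000 allocated blocks, formatted up to block 640.
print(block_region(12, 640, 1000), block_region(700, 640, 1000), empty_blocks(640, 1000))
```

For this hypothetical area, the call prints "formatted empty 360": block 12 lies below the high-water mark, block 700 lies above it, and 360 allocated blocks remain to be formatted.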


Storage design overview

The storage design of the OpenEdge RDBMS is divided into a physical model and a logical model. You manipulate the physical storage model through ABL, OpenEdge SQL, and database administration utility interfaces. The following figure shows how areas can be stored on and span different file slices. Observe that although an area can be made up of many extents (disk files), each extent can only be associated with one storage area.

Figure 13: Physical storage model

The logical storage model overlays the physical model. Logical database objects are described in the database schema and include the tables, indexes, and sequences that your application manipulates. The following figure illustrates how logical objects can span physical extents.

Figure 14: Logical storage model

Mapping objects to areas

Creating optimal storage areas and proper relationships between areas and objects enhances your database performance. You can define one object per area, or you can have multiple objects in an area. Use the information in the following table to guide your design decision.

Table 12: Guidelines for storage areas

If your database contains...             Then you should...
Many large or frequently updated tables  Create a storage area for each table. This will give you easier administration and better performance.
Many small tables                        Create storage areas to represent a distinct subject (for example, Sales, Inventory, etc.) and assign related tables to that storage area. This will give you easier administration.
A mixture of large and small tables      Combine the above two strategies.
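The guidelines in Table 12 can be applied as a simple assignment rule. The following Python sketch gives each large or frequently updated table its own area and groups the remaining small tables into one area per subject, which is the combined strategy the table recommends for a mixture of table sizes. The text does not define what counts as "large," so the row-count threshold, the table names, the subject labels, and the area naming scheme are all hypothetical inputs that you would replace with figures from your own application.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Table:
    name: str
    subject: str              # hypothetical grouping label, e.g. "Sales"
    rows: int                 # estimated row count
    frequently_updated: bool

def assign_areas(tables: List[Table], large_rows: int) -> Dict[str, List[str]]:
    """Map area names to table names following the Table 12 guidelines."""
    areas: Dict[str, List[str]] = {}
    for t in tables:
        if t.rows >= large_rows or t.frequently_updated:
            area = f"{t.name} Area"      # one area per large or busy table
        else:
            area = f"{t.subject} Area"   # small tables share a subject area
        areas.setdefault(area, []).append(t.name)
    return areas

tables = [
    Table("Order", "Sales", 5_000_000, True),
    Table("OrderLine", "Sales", 20_000_000, True),
    Table("SalesRep", "Sales", 200, False),
    Table("State", "Sales", 60, False),
    Table("Warehouse", "Inventory", 40, False),
    Table("Bin", "Inventory", 3_000, False),
]

for area, names in assign_areas(tables, large_rows=1_000_000).items():
    print(area, names)
```

With these inputs, Order and OrderLine each get a dedicated area, while the small Sales and Inventory tables share one area per subject.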
