CS2 Current Technologies Note 1

Relational Database Systems

Introduction

When we wish to extract information from a database, we communicate with the Database Management System (DBMS) using a query language called SQL. SQL is the most frequently used programming language in the world, in the sense that every day more SQL programs are written, compiled and executed than programs in any other programming language.

This needs some qualification. There are 4 million Visual Basic programmers in the world, and more software is written in Visual Basic than in all of the other programming languages put together. However, this does not contradict the assertion that SQL is the most frequently used programming language. Most SQL programs are written by other computer programs (written in Visual Basic, for example), not by human programmers. So, typically, SQL is an embedded language. I mention this now because many features of SQL only make sense when you bear in mind that the language is designed for use by other software.

SQL is used in a large number of DBMSs, so it may be relevant to you later in your career. SQL stands for Structured Query Language. The name is slightly misleading, because besides expressing queries the language has features for performing a variety of other operations on tables. SQL is the de facto industry standard database query language. In the next section, we shall see examples of the use of SQL as a data definition language.

Relational Databases

SQL is used with relational database systems. In a relational database, all of the data is stored in tables. There are also object-oriented databases in use. These databases store objects as defined by the classes of an object-oriented programming language. Object-oriented databases have been around for many years, but they have never proved popular. Currently, relational systems account
for ninety percent of the market for database systems, whereas object-oriented database systems account for the other ten percent.

In a relational database table, each row corresponds to some real-world entity (or possibly to some relationship between real-world entities) and each column is labelled by an attribute of those entities. These systems are called relational databases because they are supposed to be based on the mathematical theory of relations. Each table corresponds to a predicate and each row to a tuple over that predicate.

A moderately large commercial installation might have about two hundred tables in a database. Some large installations have many thousands of tables. The number of rows in a table depends on the number of entities in the entity class being modelled.

An Example Database

Throughout this note we shall use a sample database, concentrating on two of its tables. One table contains information about the employees of a company. The other table contains information about the departments of the company and their locations. Typically, a database comprises several tables, so each table must have a name in order that we can state from which table(s) the information is to be extracted. The name of the employee table is EMP and the content of this table is shown in Table 1. The name of the department table is DEPT and its content is shown in Table 2.

This database contains some personnel details of a company. The data items are stored in a tabular form; we shall cover the details in subsequent notes. There are several tables associated with this database. For example, one table is called SALGRADE and contains the minimum and maximum salary levels associated with each of the salary grades in the company's pay structure. (The other tables are named EMP and DEPT.) A simple query on this table is as follows.

SQL> SELECT * FROM salgrade;

You should get the following result.

     GRADE      LOSAL      HISAL
---------- ---------- ----------
         1        700       1200
         2       1201       1400
         3       1401       2000
         4       2001       3000
         5       3001       9999
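
The introduction observed that SQL is usually embedded in programs written in another language. As a minimal sketch of that idea, the Python fragment below builds the SALGRADE table and issues the same query from a host program. It assumes SQLite, accessed through Python's standard sqlite3 module, as a stand-in for the DBMS behind the SQL> prompt used in this note; the INTEGER column types and the variable names are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the course DBMS (an assumption).
conn = sqlite3.connect(":memory:")

# Column types are guessed; the note only shows the column names and values.
conn.execute("CREATE TABLE salgrade (grade INTEGER, losal INTEGER, hisal INTEGER)")
conn.executemany(
    "INSERT INTO salgrade VALUES (?, ?, ?)",
    [
        (1, 700, 1200),
        (2, 1201, 1400),
        (3, 1401, 2000),
        (4, 2001, 3000),
        (5, 3001, 9999),
    ],
)

# The host program passes the SQL text to the DBMS as an ordinary string.
cursor = conn.execute("SELECT * FROM salgrade")
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(column_names)
for row in rows:
    print(row)
```

Here the query is just a string handed to the DBMS, which is what makes it easy for programs, rather than people, to generate SQL.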

EMPNO  ENAME   JOB        MGR   HIREDATE    SAL  COMM  DEPTNO
 7369  SMITH   CLERK      7902  17-DEC-80   800            20
 7499  ALLEN   SALESMAN   7698  20-FEB-81  1600   300      30
 7521  WARD    SALESMAN   7698  22-FEB-81  1250   500      30
 7566  JONES   MANAGER    7839  02-APR-81  2975            20
 7654  MARTIN  SALESMAN   7698  28-SEP-81  1250  1400      30
 7698  BLAKE   MANAGER    7839  01-MAY-81  2850            30
 7782  CLARK   MANAGER    7839  09-JUN-81  2450            10
 7788  SCOTT   ANALYST    7566  27-JUN-90  3000            20
 7839  KING    PRESIDENT        17-NOV-81  5000            10
 7844  TURNER  SALESMAN   7698  08-SEP-81  1500     0      30
 7876  ADAMS   CLERK      7788  31-JUL-90  1100            20
 7900  JAMES   CLERK      7698  03-DEC-81   950            30
 7902  FORD    ANALYST    7566  03-DEC-81  3000            20
 7934  MILLER  CLERK      7782  23-JAN-82  1300            10

Table 1: Database table EMP

DEPTNO  DNAME       LOC
    10  ACCOUNTING  NEW YORK
    20  RESEARCH    DALLAS
    30  SALES       CHICAGO
    40  OPERATIONS  BOSTON

Table 2: Database table DEPT

Note that some SQL systems convert letters to upper case. When you want to set up an alias in the SELECT line, surrounding a string in double quotes means that SQL will respect the case of the letters. This is illustrated in the following example.

SQL> SELECT grade "Grade" FROM salgrade;

     Grade
----------
         1
         2
         3
         4
         5

SQL>

It is easy to forget to type the semicolon at the end of a SELECT command. If you type <return> before you have typed the semicolon that terminates the query, you will get another prompt, as in the following example.

SQL> SELECT LOC FROM DEPT
  2

You may continue typing the rest of the query, or just the terminating semicolon, on this line. A query may be continued over many lines.

Data manipulation: SQL

For the purposes of this course, SQL is used exclusively for data retrieval exercises. That is, you query an existing database for particular information, rather than alter information in the database. The data manipulation operators at the user's disposal generate new tables from old. For example:

  - Take a subset of the columns of a table.
  - Take a subset of the rows of a table.
  - Join two tables together.

Database Tables

In a relational database management system all data in the database are stored in tables, such as EMP and DEPT. A table consists of rows and columns. The columns of the DEPT table are DEPTNO, DNAME and LOC; DEPT consists of 4 rows. Each row consists of fields, which hold the data values of that row for each of the table's columns. For example, the value of the DNAME column in the first row of the DEPT table shown in Table 2 is ACCOUNTING.

The fields in a row describe information about one particular entity, i.e. they are related to each other. If, again, we take the first row in the DEPT table, this means that there exists a department with department number (DEPTNO) 10, that its name (DNAME) is ACCOUNTING, and that it is located (LOC) in New York. Access to the table is specified in SQL statements with respect to these columns and rows.

We note that attributes act as the names of columns. We also note that rows do not have names in the sense that columns and tables have names.

The Basic SELECT Statement

The SQL SELECT statement is used to retrieve information from the database. The simplest form of SELECT is given in Figure 1.

    SELECT column-names
    FROM   table-names ;

Figure 1: Basic SELECT statement

The column-names given in the SELECT-clause determine which columns of the table we would like to retrieve. The FROM-clause determines from which table we want to retrieve data. If we were interested in the department numbers and department names of the DEPT table, we would issue the query shown in Example 1.

Example 1 List the number and name of each department.

    SELECT DEPTNO, DNAME
    FROM DEPT;

    DEPTNO  DNAME
        10  ACCOUNTING
        20  RESEARCH
        30  SALES
        40  OPERATIONS

Several points should be noted. If more than one column name is given in the SELECT-clause, then the column names have to be separated by commas (,). The end of a SELECT-statement is indicated by a semicolon (;). We will see later that some SELECT-statements retrieve data from more than one table, in which case the table names in the FROM-clause are also separated by commas.

If we wish to select all columns of a table, we can do so by writing an asterisk (*) instead of the list of all column names. The SELECT-statement in Example 2, for example, retrieves all columns (and all rows) of the EMP table. The output of this query is identical to Table 1.

Example 2 List the contents of the EMP table.

    SELECT *
    FROM EMP;

Key words, such as SELECT and FROM, are written here in upper case. This is not required by SQL. If you prefer, you can type the entire statement in lower-case characters and on one line. As soon as you type a semicolon, SQL will try to execute the statement you typed.

The order of the columns displayed in the output depends on the order of the column names given in the SELECT-clause, i.e. if a list of column names is specified in the SELECT-clause, the output will order the columns according to that list. If the * option is used, the output will follow the column order of the original table.
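
The points above about column lists, the asterisk and lower-case key words can be checked with a short program. The following is a minimal sketch, not the set-up used in this course: it assumes SQLite, accessed through Python's standard sqlite3 module, in place of the DBMS behind the SQL> prompt. The column types and the helper column_order are hypothetical names introduced only for this illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the course DBMS (an assumption).
conn = sqlite3.connect(":memory:")

# Column types are guessed; Table 2 only gives the column names and values.
conn.execute("CREATE TABLE dept (deptno INTEGER, dname TEXT, loc TEXT)")
conn.executemany(
    "INSERT INTO dept VALUES (?, ?, ?)",
    [
        (10, "ACCOUNTING", "NEW YORK"),
        (20, "RESEARCH", "DALLAS"),
        (30, "SALES", "CHICAGO"),
        (40, "OPERATIONS", "BOSTON"),
    ],
)


def column_order(sql):
    """Hypothetical helper: output column names of a query, upper-cased, in display order."""
    cursor = conn.execute(sql)
    return [description[0].upper() for description in cursor.description]


# Example 1: a subset of the columns, separated by commas.
example_1 = conn.execute("SELECT DEPTNO, DNAME FROM DEPT").fetchall()

# The output follows the order of the list in the SELECT-clause ...
listed = column_order("SELECT DNAME, DEPTNO FROM DEPT")

# ... whereas * follows the column order of the table itself.
# Key words and names are typed in lower case here to show that this is accepted.
starred = column_order("select * from dept")

print(example_1)
print(listed)
print(starred)
```

Note that the host program does not type a terminating semicolon: in this sketch each call to execute passes exactly one complete statement, so there is no prompt waiting for the statement to end.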

The WHERE Clause

In all previous examples, when we selected some or all columns from a table, we did so for all rows stored in that table. Sometimes, however, we are only interested in a subset of the rows of a table. The manager of our example company may want to know the salary of a particular employee. To facilitate this kind of query, SQL provides the WHERE-clause in the SELECT-statement. The extended definition of a SELECT-statement is given in Figure 2. The search-condition in the WHERE-clause specifies which rows in the table we are interested in. All rows that satisfy the condition in the WHERE-clause are retrieved from the table.

    SELECT column-names
    FROM   table-names
    WHERE  search-condition ;

Figure 2: SELECT statement with WHERE-clause

Suppose that we want to select rows that have a particular column value. For example, we may wish to know the salary of a particular employee. In Example 3, we specify an employee by his or her number (EMPNO).

Example 3 What is the salary of the person with employee number 7902?

    SELECT SAL
    FROM EMP
    WHERE EMPNO = 7902;

     SAL
    3000

When testing for equality, the WHERE-clause has the following general form:

    WHERE column-name = value

Further Reading

Elmasri and Navathe, Chapter 1.
Deitel, Deitel and Nieto, Chapter 25, pages 840-850.
Ramakrishnan and Gehrke, Database Management Systems, Chapter 1.

Chris Walton (adapted from a note by Rob Procter)