Department of Computer Science and Information Systems, College of Business and Technology, Morehead State University

Lecture 3 Part A
CIS 311 Introduction to Management Information Systems (Spring 2017)

Computers and information technology have been the main tools used to handle data. In the early days, however, data were not the main part of software. In other words, software was designed so that programs (instructions or processes) were the main part, and data were handled as a byproduct of the programs. This approach is still used sometimes, but it is not a good way to manage a large amount of data, and new approaches to managing data have been introduced since the 1960s.

Some earlier models (e.g., the hierarchical model used by IBM's IMS system and the network model used by IDMS) were efficient in terms of speed but too rigid to be used in daily operations. A more reliable and robust database model, the relational model, was introduced in 1969 by Edgar F. Codd, and commercial relational database management systems followed later (e.g., Oracle Version 2). Since then, many organizations have used relational database management systems because they are reliable and flexible enough to support their business.

In the 1990s, with the popularity of the Internet and the World Wide Web, organizations began to collect external data and to use a special type of database management system, the data warehouse, to manage and analyze different types of internal and external data. Since 2002, the amount of data that some organizations deal with has become so large that traditional database management systems are no longer practical for it. These traditional systems (e.g., relational database management systems) are still widely used for internal enterprise database management, but for Big Data, different types of database management systems are used (e.g., NoSQL, NewSQL, etc.). Nowadays, all these database management systems (relational DBMS, data warehouses, NoSQL, etc.) are used in business.

Database management systems are used to organize a set of data efficiently and store it economically in a permanent or secondary storage device (e.g., a hard disk drive). They also provide efficient tools to locate any required part of the data set (or all of it), retrieve it, manipulate it, and generate outputs. So far, the most reliable, flexible, and efficient database management systems are based on the relational model, and thus it is important to understand the model.

Relational databases use relations to organize and store data. A relation in a relational database is a two-dimensional table with a primary key and other constraints integrated. In other words, not every table is a relation. To be a relation, a table should meet the following conditions:

1. A relation has a name.
2. Data in a relation are organized in two dimensions: row (record, tuple) and column (field, attribute).
3. All values in a column (field or attribute) are of the same data type. If a column is of a numeric data type, data of a text type cannot be inserted in the column.
4. The order of columns does not matter.
5. The order of rows does not matter.
6. Every relation must have one, and only one, primary key.
7. A foreign key is optional. A relation may have no foreign key, or it may have one or more foreign keys.

In a database, data are stored in several tables. This is to improve database management and save resources. Let's review the following table. It could be a relation if it has a name (EMPLOYEE_INFORMATION) and a primary key (e.g., the empno and deptno columns together could be set as the primary key for the table). Even if it may be a relation, the data should not be stored as is because the table contains redundant data and is subject to several anomalies.

EMPLOYEE_INFORMATION

empno  ename   job        hiredate   sal        deptno  dname       loc
7839   KING    PRESIDENT  17-Nov-81  $5,000.00  10      ACCOUNTING  NEW YORK
7782   CLARK   MANAGER    09-Jun-81  $2,450.00  10      ACCOUNTING  NEW YORK
7934   MILLER  CLERK      23-Jan-82  $1,300.00  10      ACCOUNTING  NEW YORK
7566   JONES   MANAGER    02-Apr-81  $2,975.00  20      RESEARCH    DALLAS
7788   SCOTT   ANALYST    13-Jul-87  $3,000.00  20      RESEARCH    DALLAS
7902   FORD    ANALYST    03-Dec-81  $3,000.00  20      RESEARCH    DALLAS
7369   SMITH   CLERK      17-Dec-80  $800.00    20      RESEARCH    DALLAS
7876   ADAMS   CLERK      13-Jul-87  $1,100.00  20      RESEARCH    DALLAS
7698   BLAKE   MANAGER    01-May-81  $2,850.00  30      SALES       CHICAGO
7499   ALLEN   SALESMAN   20-Feb-81  $1,600.00  30      SALES       CHICAGO
7521   WARD    SALESMAN   22-Feb-81  $1,250.00  30      SALES       CHICAGO
7654   MARTIN  SALESMAN   28-Sep-81  $1,250.00  30      SALES       CHICAGO
7844   TURNER  SALESMAN   08-Sep-81  $1,500.00  30      SALES       CHICAGO
7900   JAMES   CLERK      03-Dec-81  $950.00    30      SALES       CHICAGO

The EMPLOYEE_INFORMATION table contains a lot of redundant data. For instance, the dname column has many repeating values (e.g., ACCOUNTING, RESEARCH, and SALES). The loc column also has repeating values. This table shows only 14 records (rows), so the redundancy may seem negligible. In practice, however, a relation usually holds a large amount of data (e.g., millions or billions of records), and these repeating values would waste a great deal of storage.

Another problem is the anomalies in the table. For instance, if the accounting department moves to a new place (e.g., L.A.), then every copy of the old value (i.e., each NEW YORK cell in the loc column for that department) must be changed to L.A. This is called an update anomaly. In the table, it is also not possible to record a department that has no employee records yet (e.g., a new IT department), because each row describes an employee. This is called an insertion anomaly.
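The redundancy and the update anomaly described above can be made concrete with a small sketch. It is only an illustration: it uses Python's built-in sqlite3 module rather than Microsoft Access, keeps only the columns needed for the point, and the table name employee_information and the function name are chosen here for the example.

```python
import sqlite3

# Flat, non-normalized rows: (empno, ename, deptno, dname, loc).
# Only the columns needed for the illustration are kept.
ROWS = [
    (7839, "KING", 10, "ACCOUNTING", "NEW YORK"),
    (7782, "CLARK", 10, "ACCOUNTING", "NEW YORK"),
    (7934, "MILLER", 10, "ACCOUNTING", "NEW YORK"),
    (7566, "JONES", 20, "RESEARCH", "DALLAS"),
    (7788, "SCOTT", 20, "RESEARCH", "DALLAS"),
    (7902, "FORD", 20, "RESEARCH", "DALLAS"),
    (7369, "SMITH", 20, "RESEARCH", "DALLAS"),
    (7876, "ADAMS", 20, "RESEARCH", "DALLAS"),
    (7698, "BLAKE", 30, "SALES", "CHICAGO"),
    (7499, "ALLEN", 30, "SALES", "CHICAGO"),
    (7521, "WARD", 30, "SALES", "CHICAGO"),
    (7654, "MARTIN", 30, "SALES", "CHICAGO"),
    (7844, "TURNER", 30, "SALES", "CHICAGO"),
    (7900, "JAMES", 30, "SALES", "CHICAGO"),
]


def measure_redundancy(rows):
    """Return (stored dname values, distinct dname values, rows touched by one move)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table employee_information("
        "empno integer, ename text, deptno integer, dname text, loc text)"
    )
    conn.executemany("insert into employee_information values (?, ?, ?, ?, ?)", rows)

    # How many department names are stored versus how many are really different?
    stored, distinct = conn.execute(
        "select count(dname), count(distinct dname) from employee_information"
    ).fetchone()

    # Update anomaly: moving one department means rewriting every copy of its location.
    cursor = conn.execute(
        "update employee_information set loc = 'L.A.' where dname = 'ACCOUNTING'"
    )
    touched = cursor.rowcount
    conn.close()
    return stored, distinct, touched


stored, distinct, touched = measure_redundancy(ROWS)
print(stored, distinct, touched)  # 14 3 3
```

In this flat layout, 14 department names are stored although only 3 differ, and a single office move rewrites 3 rows. Once the department data is kept in its own table, the same move would change one row.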
Another type, the deletion anomaly, occurs when a record cannot be deleted without also deleting related data (e.g., if the ACCOUNTING department were removed, several employee records would be deleted with it). A table with these problems is called a non-normalized relation (or table).

One easy way to avoid or minimize these problems in the example is to divide the table into two or more tables. For instance, the EMPLOYEE_INFORMATION table could be divided into two tables as shown in the following figure (the Employees and Departments tables). To make each of these a relation, the primary key constraint should be set up in each table (e.g., empno for Employees and deptno for Departments). If the tables are compared with the previous table, it is clear that a significant number of repeated values (i.e., values in many cells) have been eliminated, though not all of them. As shown in the figure, there are fewer repeating values in the new tables. For instance, the value ACCOUNTING is shown only once. Likewise, the values RESEARCH, SALES, OPERATIONS, NEW YORK, DALLAS, CHICAGO, and BOSTON appear only once.

There are, however, still some redundant values. For instance, repeating values appear in the job column and the deptno column. If there were more information about jobs (e.g., job characteristics, job title number, etc.), a new table could be developed for them. In the given example, however, there is none, and thus keeping the job column as is seems acceptable. The deptno column in the Employees table is different and plays an important role in the table. In fact, it is the foreign key in the Employees table, which references the primary key in the Departments table. The foreign key is used as a linking point between the two tables. For instance, if a record is selected in the Employees table (e.g., SMITH with the empno 7369), then the foreign key can be used to trace the relevant information in the Departments table (deptno 20). In other words, the value in the deptno column of the Employees table makes it possible to find the matching deptno in the Departments table (i.e., the department name is RESEARCH, and the department location is DALLAS).

These tables are normalized relations, and the process of converting tables with anomalies into normalized tables is called normalization. This process, however, can be very complex and time consuming. A better way is to use a conceptual data model, and the most popular one is the entity-relationship model. In practice, system analysts or database administrators develop a conceptual data model first, convert it into a relational data schema, generate SQL statements based on the relational data schema, and execute the SQL statements to create the actual tables and databases. For instance, the following SQL statement can be executed in a database management system to create a table (relation) called Departments.

    create table Departments(
        deptno byte,
        dname text,
        loc text,
        constraint pk_departments primary key (deptno)
    );

SQL, or Structured Query Language, is a fourth-generation programming language (non-procedural or goal-oriented). It is the US (ANSI) and international (ISO) standard language for relational databases. In the previous SQL example, create table is the reserved keyword phrase that is used to create a relation. The word after the create table keywords, Departments, is the name of the table (relation). Inside the parentheses, columns (attributes or fields) are defined with column names, data types, and optional constraints.
The columns are separated by commas, and the whole statement could be written on one line like the following:

    create table Departments(deptno byte, dname text, loc text, constraint pk_departments primary key (deptno));

In practice, however, the multi-line form is used more often because it is easier to read. The semicolon after the closing parenthesis is the ending mark (like a period in an English sentence).

The SQL code in the example is modified to conform to Microsoft Access, and it can be executed in the program using the SQL view of the Query Design menu. To test the code, first open Microsoft Access and click on the Blank database menu. Initially, an unnamed database is created with a table (Table1, which can be deleted). Select the Create tab menu, click on the Query Design menu of the Queries group, and click on the Close button if you see a pop-up window (Show Tables). If this is done correctly, the following will be shown.

If the SQL View menu is selected, the following will be displayed. Delete the SELECT; statement in the Query1 window, type in the example SQL code, and click on the Run button to create the Departments table.
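For readers who want to try the statement without Access, the following is a minimal sketch using Python's built-in sqlite3 module. It is an assumption of this example, not part of the Access procedure: SQLite is a different database system, so the Access data type byte is replaced with integer, and the table lives only in memory.

```python
import sqlite3

# An in-memory database stands in for the blank Access database.
conn = sqlite3.connect(":memory:")

# Same statement as in the text, with the Access type "byte" replaced by "integer".
conn.execute(
    """
    create table Departments(
        deptno integer,
        dname  text,
        loc    text,
        constraint pk_departments primary key (deptno)
    )
    """
)

# "pragma table_info" lists one row per column: (cid, name, type, notnull, default, pk).
# Keep the column name and the primary-key flag (1 = part of the primary key).
columns = [(row[1], row[5]) for row in conn.execute("pragma table_info(Departments)")]
print(columns)  # [('deptno', 1), ('dname', 0), ('loc', 0)]
```

The output confirms that the table has three columns and that deptno is the only column in the primary key.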

Likewise, in the Query1 window, delete the previous SQL statement and type in the following to create the Employees table. After the code replaces the previous statement, click on the Run button (if it is not shown, select the Design tab).

    create table Employees(
        empno int,
        ename text,
        job text,
        hiredate date,
        sal currency,
        deptno byte,
        constraint pk_employees primary key (empno),
        constraint fk_deptno foreign key (deptno) references Departments (deptno)
    );

Once the relations have been created, an insert into statement can be used to enter a record (row or tuple) into a table. For instance, the following statement can be executed to enter a record into the Departments table (follow the process explained previously to enter the data):

    insert into Departments values(10, 'ACCOUNTING', 'NEW YORK');

There are ways to enter all the data at one time, but they require more advanced programming. For now, type in the following SQL statements one by one to enter the remaining records for the Departments and Employees tables:

    insert into Departments values(20, 'RESEARCH', 'DALLAS');
    insert into Departments values(30, 'SALES', 'CHICAGO');
    insert into Departments values(40, 'OPERATIONS', 'BOSTON');

    insert into Employees values(7839, 'KING', 'PRESIDENT', format('17-11-1981','dd-mm-yyyy'), 5000, 10);
    insert into Employees values(7698, 'BLAKE', 'MANAGER', format('1-5-1981','dd-mm-yyyy'), 2850, 30);
    insert into Employees values(7782, 'CLARK', 'MANAGER', format('9-6-1981','dd-mm-yyyy'), 2450, 10);
    insert into Employees values(7566, 'JONES', 'MANAGER', format('2-4-1981','dd-mm-yyyy'), 2975, 20);
    insert into Employees values(7788, 'SCOTT', 'ANALYST', format('13-jul-1987','dd-mmm-yyyy'), 3000, 20);
    insert into Employees values(7902, 'FORD', 'ANALYST', format('3-12-1981','dd-mm-yyyy'), 3000, 20);
    insert into Employees values(7369, 'SMITH', 'CLERK', format('17-12-1980','dd-mm-yyyy'), 800, 20);
    insert into Employees values(7499, 'ALLEN', 'SALESMAN', format('20-2-1981','dd-mm-yyyy'), 1600, 30);
    insert into Employees values(7521, 'WARD', 'SALESMAN', format('22-2-1981','dd-mm-yyyy'), 1250, 30);
    insert into Employees values(7654, 'MARTIN', 'SALESMAN', format('28-9-1981','dd-mm-yyyy'), 1250, 30);
    insert into Employees values(7844, 'TURNER', 'SALESMAN', format('8-9-1981','dd-mm-yyyy'), 1500, 30);
    insert into Employees values(7876, 'ADAMS', 'CLERK', format('13-jul-1987','dd-mmm-yyyy'), 1100, 20);
    insert into Employees values(7900, 'JAMES', 'CLERK', format('3-12-1981','dd-mm-yyyy'), 950, 30);
    insert into Employees values(7934, 'MILLER', 'CLERK', format('23-1-1982','dd-mm-yyyy'), 1300, 10);

After the insert into statements have been executed, the following select statement will display the records of the Departments table:

    select * from Departments;

If the Run button is clicked, the data will be displayed as follows:

The records shown in the Query1 window are not the Departments table itself, but a data set retrieved from the table and loaded into the computer's memory. Likewise, the following statement will display the data from the Employees table:

    select * from Employees;
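As a hedged illustration of entering all the records at one time, and of how the foreign key links the two tables, the sketch below loads both tables with Python's built-in sqlite3 module and then traces SMITH's department. SQLite is used here in place of Access, so the data types are simplified (dates are stored as ISO text, salaries as integers), and the function names build_database and department_of are made up for this example.

```python
import sqlite3

DEPARTMENTS = [
    (10, "ACCOUNTING", "NEW YORK"),
    (20, "RESEARCH", "DALLAS"),
    (30, "SALES", "CHICAGO"),
    (40, "OPERATIONS", "BOSTON"),
]

# (empno, ename, job, hiredate, sal, deptno); dates are ISO text in this sketch.
EMPLOYEES = [
    (7839, "KING", "PRESIDENT", "1981-11-17", 5000, 10),
    (7698, "BLAKE", "MANAGER", "1981-05-01", 2850, 30),
    (7782, "CLARK", "MANAGER", "1981-06-09", 2450, 10),
    (7566, "JONES", "MANAGER", "1981-04-02", 2975, 20),
    (7788, "SCOTT", "ANALYST", "1987-07-13", 3000, 20),
    (7902, "FORD", "ANALYST", "1981-12-03", 3000, 20),
    (7369, "SMITH", "CLERK", "1980-12-17", 800, 20),
    (7499, "ALLEN", "SALESMAN", "1981-02-20", 1600, 30),
    (7521, "WARD", "SALESMAN", "1981-02-22", 1250, 30),
    (7654, "MARTIN", "SALESMAN", "1981-09-28", 1250, 30),
    (7844, "TURNER", "SALESMAN", "1981-09-08", 1500, 30),
    (7876, "ADAMS", "CLERK", "1987-07-13", 1100, 20),
    (7900, "JAMES", "CLERK", "1981-12-03", 950, 30),
    (7934, "MILLER", "CLERK", "1982-01-23", 1300, 10),
]


def build_database():
    """Create both relations in memory and load every record in two calls."""
    conn = sqlite3.connect(":memory:")
    conn.execute("pragma foreign_keys = on")  # SQLite checks foreign keys only when enabled
    conn.execute(
        "create table Departments("
        "deptno integer, dname text, loc text, "
        "constraint pk_departments primary key (deptno))"
    )
    conn.execute(
        "create table Employees("
        "empno integer, ename text, job text, hiredate text, sal integer, deptno integer, "
        "constraint pk_employees primary key (empno), "
        "constraint fk_deptno foreign key (deptno) references Departments (deptno))"
    )
    # executemany runs the same insert statement once per tuple in the list.
    conn.executemany("insert into Departments values (?, ?, ?)", DEPARTMENTS)
    conn.executemany("insert into Employees values (?, ?, ?, ?, ?, ?)", EMPLOYEES)
    return conn


def department_of(conn, empno):
    """Follow the employee's deptno to the matching row of Departments."""
    return conn.execute(
        "select Departments.dname, Departments.loc "
        "from Employees, Departments "
        "where Employees.deptno = Departments.deptno and Employees.empno = ?",
        (empno,),
    ).fetchone()


conn = build_database()
print(conn.execute("select count(*) from Employees").fetchone()[0])  # 14
print(department_of(conn, 7369))  # ('RESEARCH', 'DALLAS')
```

The lookup for empno 7369 returns RESEARCH and DALLAS, which is the trace from SMITH's deptno value (20) to the Departments table described earlier.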

The following SQL statement will summarize the data from the two tables and display the result as shown in the following figure:

    select Departments.deptno, sum(sal)
    from Departments, Employees
    where Departments.deptno = Employees.deptno
    group by Departments.deptno, dname;

The query (currently Query1) can be saved by clicking on the Save button.

Name it SalaryByDepartment and click the OK button, and the query will be shown in the Queries area. The result that the SQL statement retrieves from the database can be used in other application programs (e.g., Excel). To see how, first right-click on the query and select the Excel menu of the Export list. Complete the Export to Excel process by clicking on the OK and Close buttons, and the file (SalaryByDepartment.xlsx) can be found in the location to which it was exported.
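The summary query and the export step can also be sketched outside Access. The example below is an assumption-laden illustration: it uses Python's sqlite3 and csv modules, keeps only the columns the query needs, adds dname and an order by clause to the select list for readability, and writes CSV text (which Excel can open) instead of an .xlsx file.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Departments(deptno integer primary key, dname text, loc text)")
conn.execute("create table Employees(empno integer primary key, sal integer, deptno integer)")
conn.executemany(
    "insert into Departments values (?, ?, ?)",
    [(10, "ACCOUNTING", "NEW YORK"), (20, "RESEARCH", "DALLAS"),
     (30, "SALES", "CHICAGO"), (40, "OPERATIONS", "BOSTON")],
)
# Only empno, sal, and deptno are needed for the salary summary.
conn.executemany(
    "insert into Employees values (?, ?, ?)",
    [(7839, 5000, 10), (7698, 2850, 30), (7782, 2450, 10), (7566, 2975, 20),
     (7788, 3000, 20), (7902, 3000, 20), (7369, 800, 20), (7499, 1600, 30),
     (7521, 1250, 30), (7654, 1250, 30), (7844, 1500, 30), (7876, 1100, 20),
     (7900, 950, 30), (7934, 1300, 10)],
)

# Total salary per department, joining the tables on deptno.
summary = conn.execute(
    """
    select Departments.deptno, dname, sum(sal) as total_sal
    from Departments, Employees
    where Departments.deptno = Employees.deptno
    group by Departments.deptno, dname
    order by Departments.deptno
    """
).fetchall()
print(summary)
# [(10, 'ACCOUNTING', 8750), (20, 'RESEARCH', 10875), (30, 'SALES', 9400)]

# "Export": write the result as CSV text; saving it to a file would let Excel open it.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["deptno", "dname", "total_sal"])
writer.writerows(summary)
csv_text = buffer.getvalue()
print(csv_text)
```

OPERATIONS (deptno 40) does not appear in the result because no employee record refers to it, so the where condition never matches that department.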

In Excel, the data can be modified and analyzed further. The summarized data can be saved in the Excel file, too, but the raw data is usually stored in a database. As mentioned before, Excel is different from Access. Excel is an electronic spreadsheet program which is used to analyze numeric data; Access, on the other hand, is a relational database management system which is used to collect, organize, and store a large amount of data. In Excel, there are many cells in a big table-like structure, but it is not a relation (it has no primary key, foreign key, or other constraints). In Access, data are organized in many tables, and each table is a relation which should include a primary key. Optionally, if necessary, one or more foreign keys could be implemented in a relation.

A primary key is a column or a set of columns in a relation and is used to identify each record. For instance, if a value of the primary key is given, a specific record can be identified (differentiated from other records). A foreign key is an extra column or set of columns in a relation. It is the primary key in another relation (the reference table) and is used to keep a link between the tables (the reference table and the referring table that has the foreign key).

In practice, ordinary business people don't use a database management system directly unless they know the important concepts and techniques. Usually they use business applications or analytic programs (e.g., electronic spreadsheet programs), which can import data from database management systems and then process and analyze the imported data. Some business applications or analytic systems are introduced in CIS 211, CIS 385, and other IS courses. Some database management courses (CIS 326, CIS 426) provide more information on database management. It is strongly recommended that Microsoft Access not be used unless the advanced concepts and techniques (e.g., Entity-Relationship Model, Relational Data Model, etc.) are learned. On the other hand, all college students, especially business-major students, should master an electronic spreadsheet program.
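The roles of the primary key and the foreign key described above can be checked with a short sketch. It again assumes Python's built-in sqlite3 module instead of Access, uses trimmed-down versions of the two tables, and the helper name try_insert is invented for this example.

```python
import sqlite3


def try_insert(conn, sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite checks foreign keys only when enabled
conn.execute("create table Departments(deptno integer primary key, dname text, loc text)")
conn.execute(
    "create table Employees("
    "empno integer primary key, ename text, "
    "deptno integer references Departments (deptno))"
)
conn.execute("insert into Departments values (10, 'ACCOUNTING', 'NEW YORK')")

insert_employee = "insert into Employees values (?, ?, ?)"

# A new empno that points at an existing department is accepted.
accepted = try_insert(conn, insert_employee, (7839, "KING", 10))

# Reusing empno 7839 violates the primary key: each record must be identifiable.
duplicate_key = try_insert(conn, insert_employee, (7839, "CLARK", 10))

# deptno 50 does not exist in Departments, so the foreign key rejects the row.
missing_department = try_insert(conn, insert_employee, (7782, "CLARK", 50))

print(accepted, duplicate_key, missing_department)  # True False False
```

A spreadsheet would accept all three rows without complaint, which is one practical difference between a table-like grid of cells and a relation with constraints.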