Topic 2: Relational Databases and SQL Olaf Hartig olaf.hartig@liu.se
Relational Data Model
Recall: DB Design Process
Relational Model Concepts
- Relational database: represents data as a collection of relations
- Think of a relation as a table of values
  - Each row (tuple) represents a record of related data values
    - Facts that typically correspond to a real-world entity or relationship
  - Each column (attribute) holds a corresponding value for each row
    - Columns are associated with a data type (domain)
    - Each column header is an attribute name
Relational Model Concepts (cont'd)
- Relational database: represents data as a collection of relations
- Think of a relation as a table of values
- Schema describes the relation
  - Relation name, attribute names and domains
  - Integrity constraints
- Instance denotes the current contents of the relation (also called state)
  - Set of tuples
Domains
- A domain is a set of atomic values
  - { 0, 1, 2, ... }
  - { Jo Smith, Dana Jones, Ashley Wong, Y. K. Lee, ... }
- Atomic: each value is indivisible
- Domains are specified by data type rather than by enumeration
  - Integer, string, date, real, etc.
  - Can be specified by format, e.g., (ddd)ddd-dddd for phone numbers (where d represents a digit)
Schemas and Attributes
- Relation schema
  - A relation name R and a list of attributes A1, A2, ..., An
  - Denoted by R(A1, A2, ..., An)
- Attribute Ai
  - Name of a role in the relation schema R
  - Associated with a domain dom(Ai)
  - Attribute names do not repeat within a relation schema, but domains can repeat
- Degree (or arity) of a relation
  - Number of attributes n in its relation schema
NULL Values
- Each domain may be augmented with a special value called NULL
  - Represents the values of attributes that may be unknown or may not apply to a tuple
- If an attribute of a tuple is NULL, we cannot make any assumption about the value for that attribute (for that tuple)
- Interpretations for NULL values
  - Nothing is known about the value
  - Value exists but is (currently) not available
  - Value undefined (i.e., attribute does not apply to this tuple)
- For instance, "Ashley's telephone number is NULL" could mean
  - Ashley doesn't have a phone
  - Ashley has a phone but we don't know the number (perhaps withheld)
  - Ashley has a phone that has no number
Integrity Constraints
What are Integrity Constraints?
- Constraints are restrictions on the permitted values in a DB state
  - Derived from the rules in the miniworld that the DB represents
1. Inherent model-based constraints (also called implicit constraints)
   - Inherent in the data model
   - e.g., duplicate tuples are not allowed in a relation
2. Schema-based constraints (also called explicit constraints)
   - Can be directly expressed in schemas of the data model
   - e.g., films have only one director
   - Our focus here
3. Application-based constraints (also called semantic constraints or business rules)
   - Not directly expressed in schemas
   - Expressed and enforced by the application program
   - e.g., this year's salary increase can be no more than last year's
Key Constraints
- Uniqueness constraints on tuples
- A key of a relation R is a set K of attributes of R that has two properties:
  1. Uniqueness: no two distinct tuples have the same values across all attributes in K (i.e., it is a superkey)
  2. Minimality: no proper subset of K has the uniqueness property
- Superkey: set of attributes that has the uniqueness property (but that is not necessarily minimal)
- Keys are declared as part of the schema of a relation
  - Uniqueness must hold in all valid states
  - Serve as a constraint on updates
Key Constraints (cont'd)
- Candidate key: if there is more than one key in a relation, every key is called a candidate key
- Primary key: a particular candidate key is chosen as the primary key
  - Diagrammatically, underline its attribute(s)
  - Tuples cannot have NULL for any primary key attribute
- Other candidate keys are designated as unique
  - Non-NULL values cannot repeat, but values may be NULL
Other Integrity Constraints
- Entity integrity constraint: no primary key value can be NULL
- Domain constraint: declared by specifying the data type of attributes
- Referential integrity constraint
  - Specified between two relations
  - Allows tuples in one relation to refer to tuples in another
  - Maintains consistency among tuples in two relations
- Foreign key rules:
  - Let PK be the primary key in a relation R1 (i.e., the set of attributes in its relation schema declared to be the primary key)
  - Let FK be a set of attributes for another relation R2
  - The attribute(s) FK have the same domain(s) as the attribute(s) PK
  - The value of FK in a tuple t2 of the current state of R2 either occurs as a value of PK for some tuple t1 in the current state of R1, or it is NULL
Diagramming Referential Constraints
- Show each relation schema
  - Underline primary key attributes in each
- Directed arc from each foreign key to the relation it references
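The key and foreign key rules can be tried out in a few lines. The following is only a minimal sketch using Python's built-in sqlite3 module; the table names R1 and R2 follow the notation of the foreign key rules, while the column names and the sample values are assumptions made for illustration. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`, so other systems may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# R1 has primary key pk; R2.fk refers to R1.pk (hypothetical column names).
conn.execute("CREATE TABLE R1 (pk INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE R2 (id INTEGER PRIMARY KEY, fk INTEGER REFERENCES R1(pk))")

conn.execute("INSERT INTO R1 VALUES (1, 'a')")
conn.execute("INSERT INTO R2 VALUES (10, 1)")     # fk occurs as a pk value in R1: allowed
conn.execute("INSERT INTO R2 VALUES (11, NULL)")  # fk is NULL: allowed


def violates(sql):
    """Run the statement and report whether an integrity constraint rejected it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_key_rejected = violates("INSERT INTO R1 VALUES (1, 'b')")  # key constraint
dangling_fk_rejected = violates("INSERT INTO R2 VALUES (12, 99)")    # referential integrity
print(duplicate_key_rejected, dangling_fk_rejected)
```

Both rejected statements leave the tables unchanged, which is what "uniqueness must hold in all valid states" asks for.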
SQL
Structured Query Language
- Declarative language (what data to get, not how)
- Considered one of the major reasons for the commercial success of relational databases
- Statements for data definitions, queries, and updates
  - Both DDL (data definition language) and DML (data manipulation language)
- Terminology:
    Relational Model    SQL
    relation            table
    tuple               row
    attribute           column
- Syntax notes:
  - Some interfaces require each statement to end with a semicolon
  - SQL keywords are not case-sensitive
SQL DDL
Creating Tables
    CREATE TABLE <tablename> (
        <colname> <datatype> [<constraint>],
        ...,
        [<constraint>, ...]
    );
- Data types: integer, decimal, number, varchar, char, etc.
- Constraints: not null, primary key, foreign key, unique, etc.
Creating Tables (Example)
    CREATE TABLE WORKS_ON (
        ESSN integer,
        PNO integer,
        HOURS decimal(3,1),
        constraint pk_workson primary key (ESSN, PNO),
        constraint fk_works_emp FOREIGN KEY (ESSN) references EMPLOYEE(SSN),
        constraint fk_works_proj FOREIGN KEY (PNO) references PROJECT(PNUMBER)
    );
Modifying Table Definitions
- Add, delete, and modify columns and constraints
    ALTER TABLE EMPLOYEE ADD COLUMN JOB VARCHAR(12);
    ALTER TABLE EMPLOYEE DROP COLUMN ADDRESS CASCADE;
    ALTER TABLE WORKS_ON DROP FOREIGN KEY fk_works_emp;
    ALTER TABLE WORKS_ON ADD CONSTRAINT fk_works_emp
        FOREIGN KEY (ESSN) REFERENCES EMPLOYEE(SSN);
- Delete a table and its definition
    DROP TABLE EMPLOYEE;
SQL Queries
Basic SQL Retrieval Queries
- All retrievals use the SELECT statement:
    SELECT <return list>
    FROM <table list>
    [ WHERE <condition> ] ;
  where
  - <return list> is a list of column names (or expressions) whose values are to be retrieved
  - <table list> is a list of table names required to process the query
  - <condition> is a Boolean expression that identifies the tuples to be retrieved by the query (if there is no WHERE clause, all tuples are retrieved)
Example
    SELECT title, year, genre
    FROM Film
    WHERE director = 'Steven Spielberg'
1. Start with the relation named in the FROM clause
2. Consider each tuple one after the other, eliminating those that do not satisfy the WHERE clause
3. For each remaining tuple, create a return tuple with columns for each expression (column name) in the SELECT clause
Film (the table also has the columns minutes, budget and gross)
    title              genre       year   director
    The Company Men    drama       2010   John Wells
    Lincoln            biography   2012   Steven Spielberg
    War Horse          drama       2011   Steven Spielberg
    Argo               drama       2012   Ben Affleck
All Attributes
- List all information about the employees of department 5.
    SELECT FNAME, MINIT, LNAME, SSN, BDATE, ADDRESS, SEX, SALARY, SUPERSSN, DNO
    FROM EMPLOYEE
    WHERE DNO = 5;
  or
    SELECT *
    FROM EMPLOYEE
    WHERE DNO = 5;
- Comparison operators {=, <>, >, >=, etc.}
Logical Operators
- List the last name, birth date and address for all employees whose name is 'Alicia J. Zelaya'
    SELECT LNAME, BDATE, ADDRESS
    FROM EMPLOYEE
    WHERE FNAME = 'Alicia' AND MINIT = 'J' AND LNAME = 'Zelaya';
- Logical operators {and, or, not}
Pattern Matching in Strings
- List the birth date and address for all employees whose last name contains the substring 'aya'
    SELECT BDATE, ADDRESS
    FROM EMPLOYEE
    WHERE LNAME LIKE '%aya%';
- LIKE comparison operator
  - % represents 0 or more characters
  - _ represents a single character
NULLs
- List all employees that do not have a boss.
    SELECT FNAME, LNAME
    FROM EMPLOYEE
    WHERE SUPERSSN IS NULL;
- SUPERSSN = NULL and SUPERSSN <> NULL will not return any matching tuples, because NULL is incomparable to any value, including another NULL
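The two LIKE wildcards can be checked with a small experiment. This is a sketch only, run against an in-memory SQLite database through Python's sqlite3 module; the one-column EMPLOYEE table and its four rows are assumptions for illustration (the last names are borrowed from the examples in this deck).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (LNAME TEXT)")
# Illustrative rows only.
conn.executemany("INSERT INTO EMPLOYEE VALUES (?)",
                 [("Zelaya",), ("Narayan",), ("Wong",), ("Borg",)])


def last_names(pattern):
    """Return the sorted last names that match the given LIKE pattern."""
    rows = conn.execute("SELECT LNAME FROM EMPLOYEE WHERE LNAME LIKE ?", (pattern,))
    return sorted(r[0] for r in rows)


contains_aya = last_names("%aya%")  # % stands for zero or more characters
ends_in_ong = last_names("_ong")    # _ stands for exactly one character
print(contains_aya, ends_in_ong)
```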
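The difference between IS NULL and a comparison with NULL is easy to observe. The sketch below uses Python's sqlite3 module with a reduced, hypothetical EMPLOYEE table (only LNAME and SUPERSSN) and three illustrative rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (LNAME TEXT, SUPERSSN INTEGER)")
# Illustrative rows: only Borg has no boss (SUPERSSN is NULL).
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [("Borg", None), ("Wong", 888665555), ("Smith", 333445555)])


def names(condition):
    """Return the sorted last names of the tuples that satisfy the WHERE condition."""
    rows = conn.execute("SELECT LNAME FROM EMPLOYEE WHERE " + condition)
    return sorted(r[0] for r in rows)


is_null = names("SUPERSSN IS NULL")  # the intended test
eq_null = names("SUPERSSN = NULL")   # comparison with NULL is never true
ne_null = names("SUPERSSN <> NULL")  # ... and neither is its negation
print(is_null, eq_null, ne_null)
```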
Tables as Sets
- List all salaries:
    SELECT SALARY
    FROM EMPLOYEE;
    SALARY
    30000
    40000
    25000
    43000
    38000
    25000
    25000
    55000
- SQL considers a table as a multi-set (bag), i.e., tuples may occur more than once in a table
  - This is different from the relational data model
- Why?
  - Removing duplicates is expensive
  - User may want information about duplicates
  - Aggregation operators (e.g., sum)
Removing Duplicates
- List all salaries:
    SELECT SALARY
    FROM EMPLOYEE;
    SALARY
    30000
    40000
    25000
    43000
    38000
    25000
    25000
    55000
- List all salaries without duplicates:
    SELECT DISTINCT SALARY
    FROM EMPLOYEE;
    SALARY
    30000
    40000
    25000
    43000
    38000
    55000
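A quick way to see the bag semantics is to count the rows returned with and without DISTINCT. The following sketch uses Python's sqlite3 module; the eight salary values are illustrative assumptions, chosen so that one value occurs three times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (SALARY INTEGER)")
# Illustrative salaries; 25000 occurs three times.
salaries = [30000, 40000, 25000, 43000, 38000, 25000, 25000, 55000]
conn.executemany("INSERT INTO EMPLOYEE VALUES (?)", [(s,) for s in salaries])

all_rows = [r[0] for r in conn.execute("SELECT SALARY FROM EMPLOYEE")]
distinct_rows = [r[0] for r in conn.execute("SELECT DISTINCT SALARY FROM EMPLOYEE")]

# Without DISTINCT the result is a bag: duplicates are kept.
print(len(all_rows), len(distinct_rows))
```

Keeping the duplicates matters for aggregation: summing the DISTINCT salaries would give a different total than summing all of them.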
Set Operations
- Queries can be combined by set operations: UNION, INTERSECT, EXCEPT (MySQL versions before 8.0.31 support only UNION)
- Duplicate tuples are removed.
- Retrieve the first names of all people in the database.
    SELECT FNAME FROM EMPLOYEE
    UNION
    SELECT DEPENDENT_NAME FROM DEPENDENT;
- Which department managers have dependents? Show their SSN.
    SELECT MGRSSN FROM DEPARTMENT
    INTERSECT
    SELECT ESSN FROM DEPENDENT;
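Both example queries can be run as written against SQLite, which supports UNION and INTERSECT. This is a sketch via Python's sqlite3 module; the reduced table definitions and all rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (FNAME TEXT, SSN INTEGER)")
conn.execute("CREATE TABLE DEPENDENT (ESSN INTEGER, DEPENDENT_NAME TEXT)")
conn.execute("CREATE TABLE DEPARTMENT (DNAME TEXT, MGRSSN INTEGER)")
# Illustrative rows: 'John' and 'Alice' each occur more than once across the inputs.
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [("John", 1), ("Franklin", 2), ("Alicia", 3)])
conn.executemany("INSERT INTO DEPENDENT VALUES (?, ?)",
                 [(2, "Alice"), (2, "Joy"), (1, "Alice"), (1, "John")])
conn.executemany("INSERT INTO DEPARTMENT VALUES (?, ?)",
                 [("Research", 2), ("Administration", 3)])

first_names = sorted(r[0] for r in conn.execute(
    "SELECT FNAME FROM EMPLOYEE UNION SELECT DEPENDENT_NAME FROM DEPENDENT"))
managers_with_dependents = [r[0] for r in conn.execute(
    "SELECT MGRSSN FROM DEPARTMENT INTERSECT SELECT ESSN FROM DEPENDENT")]
print(first_names, managers_with_dependents)
```

Seven input rows feed the UNION, but only five names come out, because the set operation removes duplicates.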
Join: Cartesian Product
- List all employees and the names of their departments.
    SELECT LNAME, DNAME
    FROM EMPLOYEE, DEPARTMENT;
EMPLOYEE                 DEPARTMENT
    LNAME     DNO            DNAME            DNUM
    Smith     5              Research         5
    Wong      5              Administration   4
    Zelaya    4              Headquarters     1
    Wallace   4
    Narayan   5
    English   5
    Jabbar    4
    Borg      1
- Result: every employee is paired with every department, giving 8 x 3 = 24 rows
    LNAME                                                             DNAME
    Smith, Wong, Zelaya, Wallace, Narayan, English, Jabbar, Borg      Research
    Smith, Wong, Zelaya, Wallace, Narayan, English, Jabbar, Borg      Administration
    Smith, Wong, Zelaya, Wallace, Narayan, English, Jabbar, Borg      Headquarters
- Without a join condition, the query does not answer the question
Join: Equijoin
- List all employees and the names of their departments.
    SELECT LNAME, DNAME
    FROM EMPLOYEE, DEPARTMENT
    WHERE DNO = DNUM;
- DNO is a foreign key in EMPLOYEE; DNUM is the primary key in DEPARTMENT
EMPLOYEE                 DEPARTMENT
    LNAME     DNO            DNAME            DNUM
    Smith     5              Research         5
    Wong      5              Administration   4
    Zelaya    4              Headquarters     1
    Wallace   4
    Narayan   5
    English   5
    Jabbar    4
    Borg      1
- Equijoin: from the 24 rows of the Cartesian product (LNAME, DNO, DNAME, DNUM), keep only those rows in which DNO = DNUM
Inner Join
- List all employees and the names of their departments.
    SELECT LNAME, DNAME
    FROM EMPLOYEE, DEPARTMENT
    WHERE DNO = DNUM;
- As an alternative, the join condition may be given in the FROM clause by using the keywords INNER JOIN and ON as follows:
    SELECT LNAME, DNAME
    FROM EMPLOYEE INNER JOIN DEPARTMENT ON DNO = DNUM;
Ambiguous Names: Aliasing
- What if the same attribute name is used in different relations?
- No alias (ambiguous):
    SELECT NAME, NAME
    FROM EMPLOYEE, DEPARTMENT
    WHERE DNO=DNUM;
- Whole name:
    SELECT EMPLOYEE.NAME, DEPARTMENT.NAME
    FROM EMPLOYEE, DEPARTMENT
    WHERE EMPLOYEE.DNO=DEPARTMENT.DNUM;
- Alias:
    SELECT E.NAME, D.NAME
    FROM EMPLOYEE E, DEPARTMENT D
    WHERE E.DNO=D.DNUM;
Self-Join
- List the last name for all employees together with the last names of their bosses
    SELECT E.LNAME AS Employee, S.LNAME AS Boss
    FROM EMPLOYEE E, EMPLOYEE S
    WHERE E.SUPERSSN = S.SSN;
    Employee   Boss
    Smith      Wong
    Wong       Borg
    Zelaya     Wallace
    Wallace    Borg
    Narayan    Wong
    English    Wong
    Jabbar     Wallace
Self-Joins may also be written as Inner Join
- List the last name for all employees together with the last names of their bosses
    SELECT E.LNAME AS Employee, S.LNAME AS Boss
    FROM EMPLOYEE E, EMPLOYEE S
    WHERE E.SUPERSSN = S.SSN;
  or
    SELECT E.LNAME Employee, S.LNAME Boss
    FROM EMPLOYEE E INNER JOIN EMPLOYEE S ON E.SUPERSSN = S.SSN;
    Employee   Boss
    Smith      Wong
    Wong       Borg
    Zelaya     Wallace
    Wallace    Borg
    Narayan    Wong
    English    Wong
    Jabbar     Wallace
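To compare the Cartesian product with the two ways of writing the join, one can count and inspect the result rows. The sketch below uses Python's sqlite3 module; the reduced EMPLOYEE and DEPARTMENT tables and their rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (LNAME TEXT, DNO INTEGER)")
conn.execute("CREATE TABLE DEPARTMENT (DNAME TEXT, DNUM INTEGER)")
# Illustrative rows: four employees, three departments.
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [("Smith", 5), ("Wong", 5), ("Zelaya", 4), ("Borg", 1)])
conn.executemany("INSERT INTO DEPARTMENT VALUES (?, ?)",
                 [("Research", 5), ("Administration", 4), ("Headquarters", 1)])

# No join condition: every employee is paired with every department.
product = conn.execute("SELECT LNAME, DNAME FROM EMPLOYEE, DEPARTMENT").fetchall()

# Join condition in the WHERE clause ...
where_join = sorted(conn.execute(
    "SELECT LNAME, DNAME FROM EMPLOYEE, DEPARTMENT WHERE DNO = DNUM"))

# ... or in the FROM clause with INNER JOIN ... ON.
inner_join = sorted(conn.execute(
    "SELECT LNAME, DNAME FROM EMPLOYEE INNER JOIN DEPARTMENT ON DNO = DNUM"))

print(len(product), where_join == inner_join)
```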
Left Outer Join
- Every tuple in the left table appears in the result
  - If there exist matching tuples in the right table, works like an inner join
  - If there is no matching tuple in the right table, one tuple in the result with the left tuple's values, padded with NULL values for the columns of the right table
Customer
    custid   name       address        phone
    120      Lee        633 S. First   -1219
    3122     Willis     41 King        -9876
    2134     Smith      213 Main       -1234
    1697     Ng         Queen N.       -002
    3982     Harrison   808 Main       -4829
Sale
    saleid   date     custid
    A17      5 Dec    3122
    B823     5 Dec    1697
    B219     9 Dec    3122
    C41      15 Dec   120
    X00      23 Dec   NULL
    SELECT *
    FROM Customer LEFT JOIN Sale ON Customer.custid = Sale.custid
Result
    Customer.custid   name       address        phone   saleid   date     Sale.custid
    120               Lee        633 S. First   -1219   C41      15 Dec   120
    3122              Willis     41 King        -9876   A17      5 Dec    3122
    3122              Willis     41 King        -9876   B219     9 Dec    3122
    2134              Smith      213 Main       -1234   NULL     NULL     NULL
    1697              Ng         Queen N.       -002    B823     5 Dec    1697
    3982              Harrison   808 Main       -4829   NULL     NULL     NULL
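The padding with NULL can be observed directly. The following is a sketch using Python's sqlite3 module; the Customer and Sale tables are reduced to two columns each, and the rows are illustrative assumptions loosely based on the example above (Python shows SQL NULL as None).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (custid INTEGER, name TEXT)")
conn.execute("CREATE TABLE Sale (saleid TEXT, custid INTEGER)")
# Illustrative rows: Smith and Harrison have no sale; sale X00 has no customer.
conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                 [(3122, "Willis"), (2134, "Smith"), (1697, "Ng"), (3982, "Harrison")])
conn.executemany("INSERT INTO Sale VALUES (?, ?)",
                 [("A17", 3122), ("B823", 1697), ("B219", 3122), ("X00", None)])

left_join = conn.execute(
    "SELECT name, saleid "
    "FROM Customer LEFT JOIN Sale ON Customer.custid = Sale.custid "
    "ORDER BY Customer.custid, saleid").fetchall()

inner_join = conn.execute(
    "SELECT name, saleid "
    "FROM Customer INNER JOIN Sale ON Customer.custid = Sale.custid").fetchall()

for row in left_join:
    print(row)
```

Every customer appears at least once in the left outer join, whereas the inner join drops the customers without a sale. Sale X00 appears in neither result.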
Joins Revisited
    A                  B
    A1     A2          B1     B2
    100    A           100    W
    NULL   B           200    X
    300    C           NULL   Y
    NULL   D           NULL   Z
- Cartesian product
    SELECT * FROM a, b;
    A2   A1     B1     B2
    A    100    100    W
    B    NULL   100    W
    C    300    100    W
    D    NULL   100    W
    A    100    200    X
    B    NULL   200    X
    C    300    200    X
    D    NULL   200    X
    A    100    NULL   Y
    B    NULL   NULL   Y
    C    300    NULL   Y
    D    NULL   NULL   Y
    A    100    NULL   Z
    B    NULL   NULL   Z
    C    300    NULL   Z
    D    NULL   NULL   Z
- Equijoin, inner join
    SELECT * from A, B WHERE A1=B1;
    A2   A1     B1     B2
    A    100    100    W
- Thetajoin
    SELECT * from A, B WHERE A1>B1;
    A2   A1     B1     B2
    C    300    100    W
    C    300    200    X
Joins Revisited (cont'd)
    A                  B
    A1     A2          B1     B2
    100    A           100    W
    NULL   B           200    X
    300    C           NULL   Y
    NULL   D           NULL   Z
- Right outer join
    SELECT * FROM A RIGHT JOIN B on A1=B1;
    A2     A1     B1     B2
    A      100    100    W
    NULL   NULL   200    X
    NULL   NULL   NULL   Y
    NULL   NULL   NULL   Z
- Left outer join
    SELECT * FROM A LEFT JOIN B on A1=B1;
    A2   A1     B1     B2
    A    100    100    W
    C    300    NULL   NULL
    B    NULL   NULL   NULL
    D    NULL   NULL   NULL
- Full outer join (union of right + left)
    SELECT * FROM A FULL JOIN b on A1=B1;
    A2     A1     B1     B2
    A      100    100    W
    NULL   NULL   200    X
    NULL   NULL   NULL   Y
    NULL   NULL   NULL   Z
    C      300    NULL   NULL
    B      NULL   NULL   NULL
    D      NULL   NULL   NULL
Subqueries
- List all employees that do not have any project assignment with more than 10 hours
- A plain join does not answer this question: the following query returns the employees that have at least one assignment with at most 10 hours
    SELECT LNAME
    FROM EMPLOYEE, WORKS_ON
    WHERE SSN = ESSN AND HOURS <= 10.0;
- Use a subquery with NOT IN instead
    SELECT LNAME
    FROM EMPLOYEE
    WHERE SSN NOT IN (SELECT ESSN
                      FROM WORKS_ON
                      WHERE HOURS > 10.0);
- Or with NOT EXISTS
    SELECT LNAME
    FROM EMPLOYEE
    WHERE NOT EXISTS (SELECT *
                      FROM WORKS_ON
                      WHERE SSN = ESSN AND HOURS > 10.0);
- Subquery results can also be compared using {>, >=, <, <=, <>} + {ANY, SOME, ALL}
Quiz
    Country(Code, Name)
    IsMember(Country, Organization)
- Consider the following two SQL queries:
    SELECT Name
    FROM Country
    WHERE Code IN ( SELECT Country
                    FROM IsMember
                    WHERE Organization = 'EU' );

    SELECT Name
    FROM Country, IsMember
    WHERE Code = Country AND Organization = 'EU';
- Are these queries equivalent? (yes) (no)
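Running the three formulations on a small instance shows why the subquery is needed. This is a sketch with Python's sqlite3 module; the reduced tables and all rows are hypothetical, chosen so that one employee has both a long and a short assignment and one employee has no assignment at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (SSN INTEGER PRIMARY KEY, LNAME TEXT)")
conn.execute("CREATE TABLE WORKS_ON (ESSN INTEGER, PNO INTEGER, HOURS REAL)")
# Illustrative rows: Smith has a 32.5 h and a 7.5 h assignment,
# Wong has one 10 h assignment, Borg has no assignment.
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [(1, "Smith"), (2, "Wong"), (3, "Borg")])
conn.executemany("INSERT INTO WORKS_ON VALUES (?, ?, ?)",
                 [(1, 1, 32.5), (1, 2, 7.5), (2, 3, 10.0)])


def last_names(sql):
    """Run the query and return the sorted list of last names."""
    return sorted(r[0] for r in conn.execute(sql))


join_only = last_names(
    "SELECT LNAME FROM EMPLOYEE, WORKS_ON WHERE SSN = ESSN AND HOURS <= 10.0")
not_in = last_names(
    "SELECT LNAME FROM EMPLOYEE WHERE SSN NOT IN "
    "(SELECT ESSN FROM WORKS_ON WHERE HOURS > 10.0)")
not_exists = last_names(
    "SELECT LNAME FROM EMPLOYEE WHERE NOT EXISTS "
    "(SELECT * FROM WORKS_ON WHERE SSN = ESSN AND HOURS > 10.0)")
print(join_only, not_in, not_exists)
```

The join wrongly includes Smith (because of the 7.5 h assignment) and misses Borg (who has no WORKS_ON tuple to join with); the two subquery versions agree with each other.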
Additional Features
Extended SELECT Syntax
    SELECT <attribute-list and function-list>
    FROM <table-list>
    [ WHERE <condition> ]
    [ GROUP BY <grouping attribute-list> ]
    [ HAVING <group condition> ]
    [ ORDER BY <attribute-list> ];
Aggregate Functions
- Used to accumulate information from multiple tuples, forming a single-tuple summary
- Built-in aggregate functions: SUM, MAX, MIN, AVG, COUNT
- Example: What is the average budget of all movies?
    SELECT AVG(budget)
    FROM Film;
- Used in the SELECT clause and the HAVING clause
  - Hence, cannot be used in the WHERE clause!
- NULL values are not considered in the computations; e.g., the AVG of 50, 100, NULL is 75, whereas the AVG of 50, 100, 0 is 50
Aggregate Functions (cont'd)
- Example: How many movies were directed by Steven Spielberg?
    SELECT COUNT(*)
    FROM Film
    WHERE director = 'Steven Spielberg';
- All tuples in the result are counted, with duplicates!
  - i.e., COUNT(title) or COUNT(director) give the same result (provided these columns contain no NULL values, which COUNT(<column>) skips)
  - COUNT(DISTINCT year) would include each year only once
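How aggregates treat NULL can be verified with three values. The sketch below uses Python's sqlite3 module; the table T and its column x are hypothetical names introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (x INTEGER)")
conn.executemany("INSERT INTO T VALUES (?)", [(50,), (100,), (None,)])

# AVG and COUNT(x) skip the NULL; COUNT(*) counts every tuple.
avg_with_null, count_star, count_x = conn.execute(
    "SELECT AVG(x), COUNT(*), COUNT(x) FROM T").fetchone()

# Replacing the NULL by 0 changes the average, because 0 is a real value.
conn.execute("UPDATE T SET x = 0 WHERE x IS NULL")
avg_with_zero = conn.execute("SELECT AVG(x) FROM T").fetchone()[0]

print(avg_with_null, count_star, count_x, avg_with_zero)
```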
Grouping Before Aggregation
- How can we answer a query such as "How many films were directed by each director after 2001?"
  - Need to produce a result with one tuple per director
1. Partition the relation into subsets based on grouping column(s)
2. Apply the aggregate function to each such group independently
3. Produce one tuple per group
Grouping Before Aggregation
- How can we answer a query such as "How many films were directed by each director after 2001?"
- GROUP BY clause to specify grouping attributes
    SELECT director, COUNT(*)
    FROM Film
    WHERE year > 2001
    GROUP BY director;
- Important: every element in the SELECT clause must be a grouping column or an aggregation function.
  - e.g., SELECT director, year, COUNT(*) would not be allowed (in the query above) unless also grouping by year, i.e., GROUP BY director, year
Filtering Out Whole Groups
- After partitioning into groups, whole partitions can be discarded
  - i.e., the HAVING clause specifies a condition on the grouped tuples
    SELECT DNO, COUNT(*), AVG(SALARY)
    FROM EMPLOYEE
    GROUP BY DNO
    HAVING COUNT(*) > 2;
- HAVING clause cannot reference individual tuples within a group
  - Instead, it can reference grouping column(s) and aggregates only
- Contrast the WHERE clause with the HAVING clause: WHERE filters individual tuples before grouping, HAVING filters whole groups after grouping
- Note: as for aggregation, no GROUP BY clause means the relation is treated as one group
Sorting Query Results
- Show the department names and their locations in alphabetical order
    SELECT DNAME, DLOCATION
    FROM DEPARTMENT D, DEPT_LOCATIONS DL
    WHERE D.DNUMBER = DL.DNUMBER
    ORDER BY DNAME ASC, DLOCATION DESC;
    DNAME            DLOCATION
    Administration   Stafford
    Headquarters     Houston
    Research         Sugarland
    Research         Houston
    Research         Bellaire
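The HAVING example can be run on a small instance to see which groups survive. This is a sketch using Python's sqlite3 module; the reduced EMPLOYEE table and the salary figures are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (DNO INTEGER, SALARY INTEGER)")
# Illustrative rows: department 5 has four employees, 4 has three, 1 has one.
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [(5, 30000), (5, 40000), (5, 38000), (5, 25000),
                  (4, 25000), (4, 43000), (4, 25000),
                  (1, 55000)])

all_groups = conn.execute(
    "SELECT DNO, COUNT(*), AVG(SALARY) FROM EMPLOYEE "
    "GROUP BY DNO ORDER BY DNO").fetchall()

# HAVING discards whole groups: department 1 has only one employee.
large_groups = conn.execute(
    "SELECT DNO, COUNT(*), AVG(SALARY) FROM EMPLOYEE "
    "GROUP BY DNO HAVING COUNT(*) > 2 ORDER BY DNO").fetchall()

print(all_groups)
print(large_groups)
```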
SQL Data Manipulation
Inserting Data
    INSERT INTO <table> (<attr>, ...) VALUES ( <val>, ... ) ;
    INSERT INTO <table> (<attr>, ...) <subquery> ;
- Store information about how many hours an employee works for the project '1' into WORKS_ON.
    INSERT INTO WORKS_ON VALUES (123456789, 1, 32.5);
- Integrity constraint! Referential integrity constraint! (The new tuple must satisfy the key constraint and the foreign keys of WORKS_ON.)
Updating Data
    UPDATE <table> SET <attr> = <val>, ... WHERE <condition> ;
    UPDATE <table> SET (<attr>, ...) = ( <subquery> ) WHERE <condition> ;
- Integrity constraint! Referential integrity constraint!
- Give all employees in the 'Research' department a 10% raise in salary
    UPDATE EMPLOYEE
    SET SALARY = SALARY*1.1
    WHERE DNO IN (SELECT DNUMBER
                  FROM DEPARTMENT
                  WHERE DNAME = 'Research');
Deleting Data
    DELETE FROM <table> WHERE <condition> ;
- Delete the employees having the last name 'Borg' from the EMPLOYEE table.
    DELETE FROM EMPLOYEE
    WHERE LNAME = 'Borg';
EMPLOYEE
    FNAME    M   LNAME     SSN
    Ramesh   K   Narayan   666884444
    Joyce    A   English   453453453
    Ahmad    V   Jabbar    987987987
    James    E   Borg      888665555
DEPARTMENT (MGRSSN is a foreign key referencing EMPLOYEE)
    DNAME            DNUMBER   MGRSSN
    Research         5         333445555
    Administration   4         987654321
    Headquarters     1         888665555
- Referential integrity constraint! Borg is referenced as manager of Headquarters
- ON DELETE SET NULL / DEFAULT / CASCADE?
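The effect of the ON DELETE options can be tried out as follows. This is only a sketch with Python's sqlite3 module; the reduced schemas, the small SSN values and the choice of SET NULL for DEPARTMENT and CASCADE for DEPENDENT are assumptions made for illustration, and SQLite applies these actions only when `PRAGMA foreign_keys = ON` is set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to act on foreign keys
conn.execute("CREATE TABLE EMPLOYEE (SSN INTEGER PRIMARY KEY, LNAME TEXT)")
conn.execute("CREATE TABLE DEPARTMENT (DNAME TEXT, "
             "MGRSSN INTEGER REFERENCES EMPLOYEE(SSN) ON DELETE SET NULL)")
conn.execute("CREATE TABLE DEPENDENT (DEPENDENT_NAME TEXT, "
             "ESSN INTEGER REFERENCES EMPLOYEE(SSN) ON DELETE CASCADE)")
# Illustrative rows: Borg manages Headquarters and has one dependent.
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", [(1, "Borg"), (2, "Wong")])
conn.executemany("INSERT INTO DEPARTMENT VALUES (?, ?)",
                 [("Headquarters", 1), ("Research", 2)])
conn.executemany("INSERT INTO DEPENDENT VALUES (?, ?)", [("Alex", 1), ("Joy", 2)])

conn.execute("DELETE FROM EMPLOYEE WHERE LNAME = 'Borg'")

# SET NULL: the department stays, but loses its manager reference.
departments = conn.execute(
    "SELECT DNAME, MGRSSN FROM DEPARTMENT ORDER BY DNAME").fetchall()
# CASCADE: the dependents of the deleted employee are deleted as well.
dependents = conn.execute("SELECT DEPENDENT_NAME, ESSN FROM DEPENDENT").fetchall()
print(departments, dependents)
```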
Views
What are Views?
- A virtual table derived from other (possibly virtual) tables, i.e., always up-to-date
    CREATE VIEW dept_view AS
    SELECT DNO, COUNT(*) AS C, AVG(SALARY) AS S
    FROM EMPLOYEE
    GROUP BY DNO;
- Why?
  - Simplify query commands
  - Provide data security
  - Enhance programming productivity
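That a view is "always up-to-date" can be seen by querying it before and after an insert into the underlying table. The sketch below uses Python's sqlite3 module with the dept_view definition from above; the reduced EMPLOYEE table and its rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (LNAME TEXT, SALARY INTEGER, DNO INTEGER)")
# Illustrative rows only.
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                 [("Smith", 30000, 5), ("Wong", 40000, 5), ("Borg", 55000, 1)])
conn.execute("CREATE VIEW dept_view AS "
             "SELECT DNO, COUNT(*) AS C, AVG(SALARY) AS S "
             "FROM EMPLOYEE GROUP BY DNO")

before = conn.execute("SELECT * FROM dept_view ORDER BY DNO").fetchall()

# The view stores no data of its own; it is evaluated against the base table.
conn.execute("INSERT INTO EMPLOYEE VALUES ('Narayan', 38000, 5)")
after = conn.execute("SELECT * FROM dept_view ORDER BY DNO").fetchall()

print(before)
print(after)
```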
www.liu.se