CSE 530A SQL Washington University Fall 2013
SELECT

SELECT * FROM employee;

 employee_id | last_name | first_name |   department    | salary
-------------+-----------+------------+-----------------+--------
       12345 | Bunny     | Bugs       | Management      |  70000
       23456 | Duck      | Daffy      | Sales           |  40000
       34567 | Porky     | Pig        | Human Resources |  35000
       45678 | Sam       | Yosemite   | Sales           |  47000
       56789 | Fudd      | Elmer      | Human Resources |  36000
       67890 | Bird      | Tweety     | Marketing       |  38000
       78901 | Coyote    | Wile       | Research        |  42000
       89012 | Runner    | Road       | Research        |  52000
(8 rows)

Returns all rows in employee

WHERE

SELECT * FROM employee WHERE department = 'Sales';

 employee_id | last_name | first_name | department | salary
-------------+-----------+------------+------------+--------
       23456 | Duck      | Daffy      | Sales      |  40000
       45678 | Sam       | Yosemite   | Sales      |  47000
(2 rows)

Returns rows where the predicate is true

WHERE

SELECT * FROM employee
WHERE department = 'Sales' AND salary > 45000;

 employee_id | last_name | first_name | department | salary
-------------+-----------+------------+------------+--------
       45678 | Sam       | Yosemite   | Sales      |  47000
(1 row)

The WHERE clause can be a compound boolean expression

WHERE

SELECT * FROM employee
WHERE (department = 'Sales' OR department = 'Research')
  AND salary > 45000;

 employee_id | last_name | first_name | department | salary
-------------+-----------+------------+------------+--------
       45678 | Sam       | Yosemite   | Sales      |  47000
       89012 | Runner    | Road       | Research   |  52000
(2 rows)

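The parentheses in the compound predicate above matter, because AND binds more tightly than OR. The following is a minimal sketch of that effect, assuming an in-memory SQLite database driven from Python's standard sqlite3 module rather than the PostgreSQL-style session shown on the slides; the variable names are illustrative only.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the database used on the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employee_id integer, last_name text, "
    "first_name text, department text, salary integer)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (12345, "Bunny", "Bugs", "Management", 70000),
        (23456, "Duck", "Daffy", "Sales", 40000),
        (34567, "Porky", "Pig", "Human Resources", 35000),
        (45678, "Sam", "Yosemite", "Sales", 47000),
        (56789, "Fudd", "Elmer", "Human Resources", 36000),
        (67890, "Bird", "Tweety", "Marketing", 38000),
        (78901, "Coyote", "Wile", "Research", 42000),
        (89012, "Runner", "Road", "Research", 52000),
    ],
)

# With parentheses: (Sales OR Research) AND salary > 45000.
with_parens = conn.execute(
    "SELECT employee_id FROM employee "
    "WHERE (department = 'Sales' OR department = 'Research') "
    "AND salary > 45000 ORDER BY employee_id"
).fetchall()

# Without parentheses: Sales OR (Research AND salary > 45000),
# so every Sales employee is returned regardless of salary.
without_parens = conn.execute(
    "SELECT employee_id FROM employee "
    "WHERE department = 'Sales' OR department = 'Research' "
    "AND salary > 45000 ORDER BY employee_id"
).fetchall()

print(with_parens)
print(without_parens)
```

Dropping the parentheses adds Daffy Duck (23456) to the result, even though his salary is below 45000.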
Aggregate Functions

SELECT count(*) FROM employee WHERE department = 'Sales';

 count
-------
     2
(1 row)

Aggregate functions operate over the matching rows

Aggregate Functions

SELECT avg(salary) FROM employee WHERE department = 'Sales';

        avg
--------------------
 43500.000000000000
(1 row)

Common aggregate functions include avg, count, min, max, sum

GROUP BY

SELECT department, avg(salary)
FROM employee
GROUP BY department;

   department    |        avg
-----------------+--------------------
 Research        | 47000.000000000000
 Marketing       | 38000.000000000000
 Management      | 70000.000000000000
 Human Resources | 35500.000000000000
 Sales           | 43500.000000000000
(5 rows)

GROUP BY allows aggregate functions to be applied to groups of rows sharing a property

GROUP BY

SELECT department, avg(salary), count(*)
FROM employee
GROUP BY department;

   department    |        avg         | count
-----------------+--------------------+-------
 Research        | 47000.000000000000 |     2
 Marketing       | 38000.000000000000 |     1
 Management      | 70000.000000000000 |     1
 Human Resources | 35500.000000000000 |     2
 Sales           | 43500.000000000000 |     2
(5 rows)

HAVING

SELECT department, avg(salary), count(*)
FROM employee
GROUP BY department
HAVING count(*) > 1;

   department    |        avg         | count
-----------------+--------------------+-------
 Research        | 47000.000000000000 |     2
 Human Resources | 35500.000000000000 |     2
 Sales           | 43500.000000000000 |     2
(3 rows)

HAVING filters grouped results

WHERE - GROUP BY

SELECT department, avg(salary), count(*)
FROM employee
WHERE salary > 40000
GROUP BY department;

 department |        avg         | count
------------+--------------------+-------
 Research   | 47000.000000000000 |     2
 Management | 70000.000000000000 |     1
 Sales      | 47000.000000000000 |     1
(3 rows)

WHERE is applied before the grouping, HAVING is applied after

WHERE - GROUP BY - HAVING

SELECT department, avg(salary), count(*)
FROM employee
WHERE salary > 40000
GROUP BY department
HAVING count(*) > 1;

 department |        avg         | count
------------+--------------------+-------
 Research   | 47000.000000000000 |     2
(1 row)

WHERE - GROUP BY - HAVING

SELECT department, avg(salary), count(*)
FROM employee
WHERE salary > 50000
GROUP BY department
HAVING count(*) > 1;

 department | avg | count
------------+-----+-------
(0 rows)

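The difference between filtering before and after grouping can be checked side by side. This is a hedged sketch, again assuming an in-memory SQLite database via Python's sqlite3 module (note that SQLite reports avg as a floating-point value, unlike the numeric output on the slides); an ORDER BY is added only to make the row order deterministic.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the database used on the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employee_id integer, last_name text, "
    "first_name text, department text, salary integer)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (12345, "Bunny", "Bugs", "Management", 70000),
        (23456, "Duck", "Daffy", "Sales", 40000),
        (34567, "Porky", "Pig", "Human Resources", 35000),
        (45678, "Sam", "Yosemite", "Sales", 47000),
        (56789, "Fudd", "Elmer", "Human Resources", 36000),
        (67890, "Bird", "Tweety", "Marketing", 38000),
        (78901, "Coyote", "Wile", "Research", 42000),
        (89012, "Runner", "Road", "Research", 52000),
    ],
)

# HAVING alone: group all rows, then keep groups with more than one member.
having_only = conn.execute(
    "SELECT department, avg(salary), count(*) FROM employee "
    "GROUP BY department HAVING count(*) > 1 ORDER BY department"
).fetchall()

# WHERE then HAVING: rows with salary <= 40000 are discarded first,
# so the groups (and their counts) are computed on fewer rows.
where_then_having = conn.execute(
    "SELECT department, avg(salary), count(*) FROM employee "
    "WHERE salary > 40000 "
    "GROUP BY department HAVING count(*) > 1 ORDER BY department"
).fetchall()

print(having_only)
print(where_then_having)
```

Sales and Human Resources pass the HAVING test on the full table, but drop out once WHERE has removed their lower-paid members.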
ORDER BY

Results are returned in arbitrary order unless an order is explicitly specified

SELECT * FROM employee ORDER BY last_name, first_name;

 employee_id | last_name | first_name |   department    | salary
-------------+-----------+------------+-----------------+--------
       67890 | Bird      | Tweety     | Marketing       |  38000
       12345 | Bunny     | Bugs       | Management      |  70000
       78901 | Coyote    | Wile       | Research        |  42000
       23456 | Duck      | Daffy      | Sales           |  40000
       56789 | Fudd      | Elmer      | Human Resources |  36000
       34567 | Porky     | Pig        | Human Resources |  35000
       89012 | Runner    | Road       | Research        |  52000
       45678 | Sam       | Yosemite   | Sales           |  47000
(8 rows)

SELECT Summary

SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
;

SELECT and FROM clauses are required
Other clauses are optional but must appear in this order
Conceptually they are also applied in this order, except that the SELECT list is evaluated after HAVING and before ORDER BY

Aliases

Fields in the SELECT clause and tables in the FROM clause can be given aliases

SELECT department, avg(salary) AS average_salary, count(*) AS number
FROM employee AS emp
WHERE emp.salary > 40000
GROUP BY department
HAVING count(*) > 1;

 department |   average_salary   | number
------------+--------------------+--------
 Research   | 47000.000000000000 |      2
(1 row)

Aliases

The AS keyword is optional

SELECT department, avg(salary) average_salary, count(*) number
FROM employee emp
WHERE emp.salary > 40000
GROUP BY department
HAVING count(*) > 1;

 department |   average_salary   | number
------------+--------------------+--------
 Research   | 47000.000000000000 |      2
(1 row)

Joining

A key feature of relational databases is the ability to combine fields from multiple tables using common values

Example

Consider our employee example

  Table "public.employee"
   Column    |  Type   | Modifiers
-------------+---------+-----------
 employee_id | integer |
 last_name   | text    |
 first_name  | text    |
 department  | text    |
 salary      | integer |

Suppose we want to add data about the departments, such as budget?

Example

We could add it to the employee table

  Table "public.employee"
   Column    |  Type   | Modifiers
-------------+---------+-----------
 employee_id | integer |
 last_name   | text    |
 first_name  | text    |
 department  | text    |
 salary      | integer |
 budget      | integer |

but that leads to duplicate data (and isn't really about the employees)

Example

 employee_id | last_name | first_name | department | salary | budget
-------------+-----------+------------+------------+--------+--------
       78901 | Coyote    | Wile       | Research   |  42000 | 150000
       89012 | Runner    | Road       | Research   |  52000 | 150000

There's a danger that the budget data can become inconsistent
Solution: Create a separate table for department

Example

CREATE TABLE department (
    name text,
    budget integer
);

INSERT INTO department VALUES ('Management', 200000);
INSERT INTO department VALUES ('Sales', 150000);
INSERT INTO department VALUES ('Human Resources', 100000);
INSERT INTO department VALUES ('Marketing', 300000);
INSERT INTO department VALUES ('Research', 200000);

Example

SELECT * FROM department;

      name       | budget
-----------------+--------
 Management      | 200000
 Sales           | 150000
 Human Resources | 100000
 Marketing       | 300000
 Research        | 200000
(5 rows)

Cross Join

SELECT * FROM employee, department;
SELECT * FROM employee CROSS JOIN department;

CROSS JOIN returns the Cartesian product of the tables, combining every row of the first table with every row of the second table

 employee_id | last_name | first_name |   department    | salary |      name       | budget
-------------+-----------+------------+-----------------+--------+-----------------+--------
       12345 | Bunny     | Bugs       | Management      |  70000 | Management      | 200000
       23456 | Duck      | Daffy      | Sales           |  40000 | Management      | 200000
       34567 | Porky     | Pig        | Human Resources |  35000 | Management      | 200000
       45678 | Sam       | Yosemite   | Sales           |  47000 | Management      | 200000
       56789 | Fudd      | Elmer      | Human Resources |  36000 | Management      | 200000
       67890 | Bird      | Tweety     | Marketing       |  38000 | Management      | 200000
       78901 | Coyote    | Wile       | Research        |  42000 | Management      | 200000
       89012 | Runner    | Road       | Research        |  52000 | Management      | 200000
       12345 | Bunny     | Bugs       | Management      |  70000 | Sales           | 150000
       23456 | Duck      | Daffy      | Sales           |  40000 | Sales           | 150000
       34567 | Porky     | Pig        | Human Resources |  35000 | Sales           | 150000
       45678 | Sam       | Yosemite   | Sales           |  47000 | Sales           | 150000
       56789 | Fudd      | Elmer      | Human Resources |  36000 | Sales           | 150000
       67890 | Bird      | Tweety     | Marketing       |  38000 | Sales           | 150000
       78901 | Coyote    | Wile       | Research        |  42000 | Sales           | 150000
       89012 | Runner    | Road       | Research        |  52000 | Sales           | 150000
       12345 | Bunny     | Bugs       | Management      |  70000 | Human Resources | 100000
       23456 | Duck      | Daffy      | Sales           |  40000 | Human Resources | 100000
       34567 | Porky     | Pig        | Human Resources |  35000 | Human Resources | 100000
       45678 | Sam       | Yosemite   | Sales           |  47000 | Human Resources | 100000
       56789 | Fudd      | Elmer      | Human Resources |  36000 | Human Resources | 100000
       67890 | Bird      | Tweety     | Marketing       |  38000 | Human Resources | 100000
       78901 | Coyote    | Wile       | Research        |  42000 | Human Resources | 100000
       89012 | Runner    | Road       | Research        |  52000 | Human Resources | 100000
       12345 | Bunny     | Bugs       | Management      |  70000 | Marketing       | 300000
       23456 | Duck      | Daffy      | Sales           |  40000 | Marketing       | 300000
       34567 | Porky     | Pig        | Human Resources |  35000 | Marketing       | 300000
       45678 | Sam       | Yosemite   | Sales           |  47000 | Marketing       | 300000
       56789 | Fudd      | Elmer      | Human Resources |  36000 | Marketing       | 300000
       67890 | Bird      | Tweety     | Marketing       |  38000 | Marketing       | 300000
       78901 | Coyote    | Wile       | Research        |  42000 | Marketing       | 300000
       89012 | Runner    | Road       | Research        |  52000 | Marketing       | 300000
       12345 | Bunny     | Bugs       | Management      |  70000 | Research        | 200000
       23456 | Duck      | Daffy      | Sales           |  40000 | Research        | 200000
       34567 | Porky     | Pig        | Human Resources |  35000 | Research        | 200000
       45678 | Sam       | Yosemite   | Sales           |  47000 | Research        | 200000
       56789 | Fudd      | Elmer      | Human Resources |  36000 | Research        | 200000
       67890 | Bird      | Tweety     | Marketing       |  38000 | Research        | 200000
       78901 | Coyote    | Wile       | Research        |  42000 | Research        | 200000
       89012 | Runner    | Road       | Research        |  52000 | Research        | 200000
(40 rows)

Cross Join

Cross joins are generally not very useful unless combined with a WHERE clause

SELECT * FROM employee, department
WHERE employee.department = department.name;

 employee_id | last_name | first_name |   department    | salary |      name       | budget
-------------+-----------+------------+-----------------+--------+-----------------+--------
       12345 | Bunny     | Bugs       | Management      |  70000 | Management      | 200000
       45678 | Sam       | Yosemite   | Sales           |  47000 | Sales           | 150000
       23456 | Duck      | Daffy      | Sales           |  40000 | Sales           | 150000
       56789 | Fudd      | Elmer      | Human Resources |  36000 | Human Resources | 100000
       34567 | Porky     | Pig        | Human Resources |  35000 | Human Resources | 100000
       67890 | Bird      | Tweety     | Marketing       |  38000 | Marketing       | 300000
       89012 | Runner    | Road       | Research        |  52000 | Research        | 200000
       78901 | Coyote    | Wile       | Research        |  42000 | Research        | 200000
(8 rows)

Inner Join

A cross join with a WHERE clause that matches rows between the tables is the same as an inner join

SELECT * FROM employee
INNER JOIN department ON (employee.department = department.name);

 employee_id | last_name | first_name |   department    | salary |      name       | budget
-------------+-----------+------------+-----------------+--------+-----------------+--------
       12345 | Bunny     | Bugs       | Management      |  70000 | Management      | 200000
       45678 | Sam       | Yosemite   | Sales           |  47000 | Sales           | 150000
       23456 | Duck      | Daffy      | Sales           |  40000 | Sales           | 150000
       56789 | Fudd      | Elmer      | Human Resources |  36000 | Human Resources | 100000
       34567 | Porky     | Pig        | Human Resources |  35000 | Human Resources | 100000
       67890 | Bird      | Tweety     | Marketing       |  38000 | Marketing       | 300000
       89012 | Runner    | Road       | Research        |  52000 | Research        | 200000
       78901 | Coyote    | Wile       | Research        |  42000 | Research        | 200000
(8 rows)

The inner join syntax is now preferred

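The equivalence between the two join spellings can be verified mechanically. The sketch below assumes an in-memory SQLite database via Python's sqlite3 module (not the database used on the slides) and compares the sorted results of both forms; the variable names are illustrative.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the database used on the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employee_id integer, last_name text, "
    "first_name text, department text, salary integer)"
)
conn.execute("CREATE TABLE department (name text, budget integer)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (12345, "Bunny", "Bugs", "Management", 70000),
        (23456, "Duck", "Daffy", "Sales", 40000),
        (34567, "Porky", "Pig", "Human Resources", 35000),
        (45678, "Sam", "Yosemite", "Sales", 47000),
        (56789, "Fudd", "Elmer", "Human Resources", 36000),
        (67890, "Bird", "Tweety", "Marketing", 38000),
        (78901, "Coyote", "Wile", "Research", 42000),
        (89012, "Runner", "Road", "Research", 52000),
    ],
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [
        ("Management", 200000),
        ("Sales", 150000),
        ("Human Resources", 100000),
        ("Marketing", 300000),
        ("Research", 200000),
    ],
)

# The bare cross join pairs every employee with every department: 8 * 5 rows.
cross_count = conn.execute(
    "SELECT count(*) FROM employee CROSS JOIN department"
).fetchone()[0]

# Cross join filtered by WHERE.
where_rows = sorted(conn.execute(
    "SELECT employee_id, name, budget FROM employee, department "
    "WHERE employee.department = department.name"
).fetchall())

# The same match condition expressed as INNER JOIN ... ON.
inner_rows = sorted(conn.execute(
    "SELECT employee_id, name, budget FROM employee "
    "INNER JOIN department ON (employee.department = department.name)"
).fetchall())

print(cross_count, len(where_rows), where_rows == inner_rows)
```

The results are sorted before comparison because neither query specifies an ORDER BY.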
Aliases in Joins

SELECT * FROM employee AS emp
INNER JOIN department AS dep ON (emp.department = dep.name);

Table aliases can sometimes simplify complex statements

NULLs in Joins

What about NULLs? Let's add an employee without a department

INSERT INTO employee (employee_id, last_name, first_name)
VALUES (90123, 'Martian', 'Marvin');

Could also do

INSERT INTO employee
VALUES (90123, 'Martian', 'Marvin', NULL, NULL);

Or

INSERT INTO employee
VALUES (90123, 'Martian', 'Marvin');

Note that NULL values at the end can be left off

NULLs in Joins

SELECT * FROM employee ORDER BY employee_id;

 employee_id | last_name | first_name |   department    | salary
-------------+-----------+------------+-----------------+--------
       12345 | Bunny     | Bugs       | Management      |  70000
       23456 | Duck      | Daffy      | Sales           |  40000
       34567 | Porky     | Pig        | Human Resources |  35000
       45678 | Sam       | Yosemite   | Sales           |  47000
       56789 | Fudd      | Elmer      | Human Resources |  36000
       67890 | Bird      | Tweety     | Marketing       |  38000
       78901 | Coyote    | Wile       | Research        |  42000
       89012 | Runner    | Road       | Research        |  52000
       90123 | Martian   | Marvin     |                 |
(9 rows)

Note the empty (NULL) values in the last row

NULLs in Joins

What happens when we join employee with department?
Remember that NULL doesn't match anything, even itself

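The claim that NULL does not match even itself can be checked directly. This is a small sketch assuming an in-memory SQLite database via Python's sqlite3 module, with a two-row employee table trimmed down for illustration; SQL NULL appears as None on the Python side.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the database used on the slides.
conn = sqlite3.connect(":memory:")

# NULL = NULL evaluates to NULL (unknown), not true; IS NULL is the proper test.
null_equals_null, null_is_null = conn.execute(
    "SELECT NULL = NULL, NULL IS NULL"
).fetchone()

# Trimmed-down table: one employee with a department, one without.
conn.execute(
    "CREATE TABLE employee (employee_id integer, last_name text, department text)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(12345, "Bunny", "Management"), (90123, "Martian", None)],
)

# An equality test against NULL never selects a row.
equals_count = conn.execute(
    "SELECT count(*) FROM employee WHERE department = NULL"
).fetchone()[0]
is_null_count = conn.execute(
    "SELECT count(*) FROM employee WHERE department IS NULL"
).fetchone()[0]

# Joining the table to itself on department: Marvin's NULL department
# does not match his own row, so only Bunny pairs up.
self_join_names = conn.execute(
    "SELECT a.last_name FROM employee AS a "
    "INNER JOIN employee AS b ON (a.department = b.department)"
).fetchall()

print(null_equals_null, null_is_null, equals_count, is_null_count, self_join_names)
```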
NULLs in Joins

SELECT * FROM employee AS emp
INNER JOIN department AS dep ON (emp.department = dep.name)
ORDER BY employee_id;

 employee_id | last_name | first_name |   department    | salary |      name       | budget
-------------+-----------+------------+-----------------+--------+-----------------+--------
       12345 | Bunny     | Bugs       | Management      |  70000 | Management      | 200000
       23456 | Duck      | Daffy      | Sales           |  40000 | Sales           | 150000
       34567 | Porky     | Pig        | Human Resources |  35000 | Human Resources | 100000
       45678 | Sam       | Yosemite   | Sales           |  47000 | Sales           | 150000
       56789 | Fudd      | Elmer      | Human Resources |  36000 | Human Resources | 100000
       67890 | Bird      | Tweety     | Marketing       |  38000 | Marketing       | 300000
       78901 | Coyote    | Wile       | Research        |  42000 | Research        | 200000
       89012 | Runner    | Road       | Research        |  52000 | Research        | 200000
(8 rows)

Note that Marvin is missing!

Outer Joins

Outer joins include rows without matches

LEFT OUTER JOIN: includes all rows from the first (left) table whether or not there are matching rows in the second (right) table
RIGHT OUTER JOIN: includes all rows from the second (right) table
FULL OUTER JOIN: includes all rows from both tables

Example

INSERT INTO department VALUES ('IT', 250000);

SELECT * FROM department;

      name       | budget
-----------------+--------
 Management      | 200000
 Sales           | 150000
 Human Resources | 100000
 Marketing       | 300000
 Research        | 200000
 IT              | 250000
(6 rows)

LEFT OUTER JOIN

SELECT * FROM employee AS emp
LEFT OUTER JOIN department AS dep ON (emp.department = dep.name)
ORDER BY employee_id;

 employee_id | last_name | first_name |   department    | salary |      name       | budget
-------------+-----------+------------+-----------------+--------+-----------------+--------
       12345 | Bunny     | Bugs       | Management      |  70000 | Management      | 200000
       23456 | Duck      | Daffy      | Sales           |  40000 | Sales           | 150000
       34567 | Porky     | Pig        | Human Resources |  35000 | Human Resources | 100000
       45678 | Sam       | Yosemite   | Sales           |  47000 | Sales           | 150000
       56789 | Fudd      | Elmer      | Human Resources |  36000 | Human Resources | 100000
       67890 | Bird      | Tweety     | Marketing       |  38000 | Marketing       | 300000
       78901 | Coyote    | Wile       | Research        |  42000 | Research        | 200000
       89012 | Runner    | Road       | Research        |  52000 | Research        | 200000
       90123 | Martian   | Marvin     |                 |        |                 |
(9 rows)

Note that Marvin is included but the IT department is not

RIGHT OUTER JOIN

SELECT * FROM employee AS emp
RIGHT OUTER JOIN department AS dep ON (emp.department = dep.name)
ORDER BY employee_id;

 employee_id | last_name | first_name |   department    | salary |      name       | budget
-------------+-----------+------------+-----------------+--------+-----------------+--------
       12345 | Bunny     | Bugs       | Management      |  70000 | Management      | 200000
       23456 | Duck      | Daffy      | Sales           |  40000 | Sales           | 150000
       34567 | Porky     | Pig        | Human Resources |  35000 | Human Resources | 100000
       45678 | Sam       | Yosemite   | Sales           |  47000 | Sales           | 150000
       56789 | Fudd      | Elmer      | Human Resources |  36000 | Human Resources | 100000
       67890 | Bird      | Tweety     | Marketing       |  38000 | Marketing       | 300000
       78901 | Coyote    | Wile       | Research        |  42000 | Research        | 200000
       89012 | Runner    | Road       | Research        |  52000 | Research        | 200000
             |           |            |                 |        | IT              | 250000
(9 rows)

Note that the IT department is included but Marvin is not

FULL OUTER JOIN

SELECT * FROM employee AS emp
FULL OUTER JOIN department AS dep ON (emp.department = dep.name)
ORDER BY employee_id;

 employee_id | last_name | first_name |   department    | salary |      name       | budget
-------------+-----------+------------+-----------------+--------+-----------------+--------
       12345 | Bunny     | Bugs       | Management      |  70000 | Management      | 200000
       23456 | Duck      | Daffy      | Sales           |  40000 | Sales           | 150000
       34567 | Porky     | Pig        | Human Resources |  35000 | Human Resources | 100000
       45678 | Sam       | Yosemite   | Sales           |  47000 | Sales           | 150000
       56789 | Fudd      | Elmer      | Human Resources |  36000 | Human Resources | 100000
       67890 | Bird      | Tweety     | Marketing       |  38000 | Marketing       | 300000
       78901 | Coyote    | Wile       | Research        |  42000 | Research        | 200000
       89012 | Runner    | Road       | Research        |  52000 | Research        | 200000
       90123 | Martian   | Marvin     |                 |        |                 |
             |           |            |                 |        | IT              | 250000
(10 rows)

Note that both Marvin and the IT department are included

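The outer join behaviour can be reproduced on a trimmed-down data set. The sketch below assumes an in-memory SQLite database via Python's sqlite3 module and uses only a few of the rows from the slides. Because SQLite releases before 3.39.0 lack RIGHT and FULL OUTER JOIN, the right-hand case is written here as a LEFT OUTER JOIN with the tables swapped, which returns the same rows.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the database used on the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employee_id integer, last_name text, department text)"
)
conn.execute("CREATE TABLE department (name text, budget integer)")
# Trimmed-down data: Marvin has no department, IT has no employees.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        (12345, "Bunny", "Management"),
        (23456, "Duck", "Sales"),
        (90123, "Martian", None),
    ],
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [("Management", 200000), ("Sales", 150000), ("IT", 250000)],
)

# LEFT OUTER JOIN: every employee is kept; unmatched department columns are NULL.
left_rows = conn.execute(
    "SELECT emp.last_name, dep.name FROM employee AS emp "
    "LEFT OUTER JOIN department AS dep ON (emp.department = dep.name) "
    "ORDER BY emp.employee_id"
).fetchall()

# Tables swapped: every department is kept, which matches what
# employee RIGHT OUTER JOIN department would return.
swapped_rows = conn.execute(
    "SELECT emp.last_name, dep.name FROM department AS dep "
    "LEFT OUTER JOIN employee AS emp ON (emp.department = dep.name) "
    "ORDER BY dep.name"
).fetchall()

print(left_rows)
print(swapped_rows)
```

Marvin appears only in the first result and IT only in the second; a full outer join would contain both unmatched rows.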
Multiple Joins

Multiple tables can be joined together

Example:

SELECT * FROM address;

 employee_id |   address
-------------+-------------
       12345 | 1 Acme Lane
       23456 | 2 Acme Lake
       34567 | 3 Acme Ave
(3 rows)

Multiple Joins

SELECT * FROM employee AS emp
LEFT OUTER JOIN department AS dep ON (emp.department = dep.name)
LEFT OUTER JOIN address AS adr ON (emp.employee_id = adr.employee_id)
ORDER BY emp.employee_id;

 employee_id | last_name | first_name |   department    | salary |      name       | budget | employee_id |   address
-------------+-----------+------------+-----------------+--------+-----------------+--------+-------------+-------------
       12345 | Bunny     | Bugs       | Management      |  70000 | Management      | 200000 |       12345 | 1 Acme Lane
       23456 | Duck      | Daffy      | Sales           |  40000 | Sales           | 150000 |       23456 | 2 Acme Lake
       34567 | Porky     | Pig        | Human Resources |  35000 | Human Resources | 100000 |       34567 | 3 Acme Ave
       45678 | Sam       | Yosemite   | Sales           |  47000 | Sales           | 150000 |             |
       56789 | Fudd      | Elmer      | Human Resources |  36000 | Human Resources | 100000 |             |
       67890 | Bird      | Tweety     | Marketing       |  38000 | Marketing       | 300000 |             |
       78901 | Coyote    | Wile       | Research        |  42000 | Research        | 200000 |             |
       89012 | Runner    | Road       | Research        |  52000 | Research        | 200000 |             |
       90123 | Martian   | Marvin     |                 |        |                 |        |             |
(9 rows)

Note that employee_id shows up twice, once for each table it is in, since we used * in the SELECT

Disambiguation

We can eliminate the duplicate fields by explicitly listing the fields we want, but we must use the table name or alias to disambiguate which employee_id we want
Note that we also needed to disambiguate the ORDER BY field

SELECT emp.employee_id, first_name, last_name, address
FROM employee AS emp
LEFT OUTER JOIN address AS adr ON (emp.employee_id = adr.employee_id)
ORDER BY emp.employee_id;

 employee_id | first_name | last_name |   address
-------------+------------+-----------+-------------
       12345 | Bugs       | Bunny     | 1 Acme Lane
       23456 | Daffy      | Duck      | 2 Acme Lake
       34567 | Pig        | Porky     | 3 Acme Ave
       45678 | Yosemite   | Sam       |
       56789 | Elmer      | Fudd      |
       67890 | Tweety     | Bird      |
       78901 | Wile       | Coyote    |
       89012 | Road       | Runner    |
       90123 | Marvin     | Martian   |
(9 rows)

USING

If we're joining on columns with the same name then the USING construct can be used

SELECT * FROM employee AS emp
LEFT OUTER JOIN address AS adr USING (employee_id)
ORDER BY employee_id;

 employee_id | last_name | first_name |   department    | salary |   address
-------------+-----------+------------+-----------------+--------+-------------
       12345 | Bunny     | Bugs       | Management      |  70000 | 1 Acme Lane
       23456 | Duck      | Daffy      | Sales           |  40000 | 2 Acme Lake
       34567 | Porky     | Pig        | Human Resources |  35000 | 3 Acme Ave
       45678 | Sam       | Yosemite   | Sales           |  47000 |
       56789 | Fudd      | Elmer      | Human Resources |  36000 |
       67890 | Bird      | Tweety     | Marketing       |  38000 |
       78901 | Coyote    | Wile       | Research        |  42000 |
       89012 | Runner    | Road       | Research        |  52000 |
       90123 | Martian   | Marvin     |                 |        |
(9 rows)

Note there is only one employee_id column in the results
And therefore we don't need to disambiguate the ORDER BY employee_id

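The effect of USING on the result columns can be inspected programmatically. This sketch assumes an in-memory SQLite database via Python's sqlite3 module, with trimmed-down employee and address tables; it reads the column names from the cursor's description attribute instead of printing the rows.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the database used on the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id integer, last_name text)")
conn.execute("CREATE TABLE address (employee_id integer, address text)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(12345, "Bunny"), (45678, "Sam")],
)
conn.execute("INSERT INTO address VALUES (12345, '1 Acme Lane')")

# Join with ON: SELECT * returns employee_id from both tables.
on_cursor = conn.execute(
    "SELECT * FROM employee AS emp "
    "LEFT OUTER JOIN address AS adr ON (emp.employee_id = adr.employee_id)"
)
on_columns = [column[0] for column in on_cursor.description]

# Join with USING: the shared column appears only once.
using_cursor = conn.execute(
    "SELECT * FROM employee AS emp "
    "LEFT OUTER JOIN address AS adr USING (employee_id)"
)
using_columns = [column[0] for column in using_cursor.description]

print(on_columns)
print(using_columns)
```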
IN

SELECT * FROM employee
WHERE department IN ('Sales', 'Research')
ORDER BY employee_id;

 employee_id | last_name | first_name | department | salary
-------------+-----------+------------+------------+--------
       23456 | Duck      | Daffy      | Sales      |  40000
       45678 | Sam       | Yosemite   | Sales      |  47000
       78901 | Coyote    | Wile       | Research   |  42000
       89012 | Runner    | Road       | Research   |  52000
(4 rows)

Subselect

SELECT * FROM employee AS emp
WHERE department IN (
    SELECT name FROM department WHERE budget > 200000
)
ORDER BY employee_id;

 employee_id | last_name | first_name | department | salary
-------------+-----------+------------+------------+--------
       67890 | Bird      | Tweety     | Marketing  |  38000
(1 row)

Join

SELECT * FROM employee emp
INNER JOIN department dep ON (emp.department = dep.name)
WHERE dep.budget > 200000
ORDER BY employee_id;

 employee_id | last_name | first_name | department | salary |   name    | budget
-------------+-----------+------------+------------+--------+-----------+--------
       67890 | Bird      | Tweety     | Marketing  |  38000 | Marketing | 300000
(1 row)

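The subselect and the join above select the same employees. The following sketch checks that on the slide data, assuming an in-memory SQLite database via Python's sqlite3 module; only employee_id is selected so that the two results can be compared directly.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the database used on the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employee_id integer, last_name text, "
    "first_name text, department text, salary integer)"
)
conn.execute("CREATE TABLE department (name text, budget integer)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (12345, "Bunny", "Bugs", "Management", 70000),
        (23456, "Duck", "Daffy", "Sales", 40000),
        (34567, "Porky", "Pig", "Human Resources", 35000),
        (45678, "Sam", "Yosemite", "Sales", 47000),
        (56789, "Fudd", "Elmer", "Human Resources", 36000),
        (67890, "Bird", "Tweety", "Marketing", 38000),
        (78901, "Coyote", "Wile", "Research", 42000),
        (89012, "Runner", "Road", "Research", 52000),
        (90123, "Martian", "Marvin", None, None),
    ],
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [
        ("Management", 200000),
        ("Sales", 150000),
        ("Human Resources", 100000),
        ("Marketing", 300000),
        ("Research", 200000),
        ("IT", 250000),
    ],
)

# Subselect form: the inner query yields the names of well-funded departments.
subselect_ids = conn.execute(
    "SELECT employee_id FROM employee AS emp "
    "WHERE department IN (SELECT name FROM department WHERE budget > 200000) "
    "ORDER BY employee_id"
).fetchall()

# Join form: match employees to departments, then filter on budget.
join_ids = conn.execute(
    "SELECT emp.employee_id FROM employee emp "
    "INNER JOIN department dep ON (emp.department = dep.name) "
    "WHERE dep.budget > 200000 ORDER BY emp.employee_id"
).fetchall()

print(subselect_ids, join_ids)
```

IT also has a budget above 200000, but it has no employees, so it contributes no rows to either form.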