INFS0 - Database Management Systems
SQL: Aggregate Functions with Grouping
Peter Y. Wu
Department of Computer and Information Systems
Robert Morris University

Simple SQL Query Syntax: Select-From-Where

    select [ distinct | all ] <goal-list>     (removal of duplicates)
    from <source-table>                       (simple query has one table)
    where <conditions>                        (combination using operators)
    order by <column> [ asc | desc ], ...     (order of presentation)

(c) Peter Y Wu - RMU.
SQL Query using aggregate functions: Select-functions-From-Where

    select <goal-list-using-aggregate-functions>
    from <source-table>     (simple query has one table)
    where <conditions>      (filter to eliminate rows)

The result has only ONE row.
No need to use the distinct keyword.
No need to use order by.

SQL Query with GROUP BY: Select-From-Where-Group By-Having

    select [ distinct | all ] <goal-list-from-grouped-table>
    from <source-table>                                     (simple query has one table)
    where <conditions>                                      (filter: eliminate rows from consideration)
    group by <pivot-list>                                   (to divide the table into groups)
    having <conditions>                                     (conditions based on the grouped table)
    order by <term-from-grouped-table> [ asc | desc ], ...  (order of presentation)
Aggregate Functions

Earliest birth and latest death of composers who lived for longer than 50 years:

    select min(born), max(died)
    from Composer
    where Died - Born > 50;

Aggregate functions do not consider null values.
Aggregate functions do consider duplicates, unless the keyword distinct specifies otherwise.

    select count(distinct <column>) from Composer;

The keyword distinct applies only to a specific term; it cannot apply to the * argument in count(*).

Aggregate Functions

    select count(cname) from Composer;

    count(cname)
    ------------
    5

Composer
     CName      Born  Died
1    Vivaldi    1678  1741
2    Bach       1685  1750
3    Mozart     1756  1791
4    Prokofiev  1891  1953
5    Dvorak     1841  1904

Can we find the number of composers born in each century?
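The null and duplicate rules above can be checked with a small sketch. It uses Python's standard sqlite3 module with an in-memory database. The row named "Unknown" is a hypothetical addition, included only so that a null value is present, and `Born / 100` is just a convenient term that produces duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Composer (CName text, Born integer, Died integer)")
conn.executemany(
    "insert into Composer values (?, ?, ?)",
    [
        ("Vivaldi", 1678, 1741),
        ("Bach", 1685, 1750),
        ("Mozart", 1756, 1791),
        ("Prokofiev", 1891, 1953),
        ("Dvorak", 1841, 1904),
        ("Unknown", None, None),  # hypothetical row, added only to show null handling
    ],
)

# Earliest birth and latest death among composers who lived longer than 50 years.
earliest_latest = conn.execute(
    "select min(Born), max(Died) from Composer where Died - Born > 50"
).fetchone()

# count(*) counts rows; count(Born) skips the row whose Born is null.
row_counts = conn.execute("select count(*), count(Born) from Composer").fetchone()

# Born / 100 is integer division here, so several composers share a value.
# Without distinct the duplicates are counted; with distinct they are not.
dup_counts = conn.execute(
    "select count(Born / 100), count(distinct Born / 100) from Composer"
).fetchone()

print(earliest_latest, row_counts, dup_counts)
```

Each query returns a single row, as expected for aggregate functions used without GROUP BY.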
Aggregate Functions

PNo  PName
000  The Four Seasons    1
00   B-minor Mass        2
003  Christmas Oratorio  2
00   Missa Solemnis      3
006  Classical Symphony  4
007  Cinderella          4

Composer
     CName      Born  Died
1    Vivaldi    1678  1741
2    Bach       1685  1750
3    Mozart     1756  1791
4    Prokofiev  1891  1953
5    Dvorak     1841  1904

    select count(pname) "# of pieces"
    from <source-table>
    where <column> = <value>;

Can we find the number of pieces written by each composer?

Aggregate functions with group by

We can apply an aggregate function to sub-groups of values in any data set (i.e., a column).

    select <goal-list-with-aggr-func-and-pivot-cols>
    from <source-table>
    group by <pivot-list>;

[Figure: the table of pieces, divided into sub-groups by the pivot.]
Aggregate functions with group by

We form the sub-groups by the unique values of the group by columns, that is, the pivot. We apply the aggregate function to each sub-group.

    select <goal-list-with-aggr-func-and-pivot-cols>
    from <source-table>
    group by <pivot-list>;

[Figure: the table of pieces, divided into sub-groups by the pivot.]

Aggregate functions with group by

The result table therefore consists of ONE row for each sub-group (each unique value of the pivot). How many rows should the result table contain?
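A minimal sketch of the question "how many pieces did each composer write?", again using Python's sqlite3 module. The table name `Piece`, the column name `CNo`, and the composer numbers stored in it are assumptions made for illustration; only the piece names come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Piece and CNo are assumed names; the CNo values are assumed composer numbers.
conn.execute("create table Piece (PName text, CNo integer)")
conn.executemany(
    "insert into Piece values (?, ?)",
    [
        ("The Four Seasons", 1),
        ("B-minor Mass", 2),
        ("Christmas Oratorio", 2),
        ("Missa Solemnis", 3),
        ("Classical Symphony", 4),
        ("Cinderella", 4),
    ],
)

# One sub-group per unique CNo; count(PName) is evaluated once per sub-group.
pieces_per_composer = conn.execute(
    "select CNo, count(PName) from Piece group by CNo order by CNo"
).fetchall()

print(pieces_per_composer)
```

With four distinct composer numbers in the assumed data, the result has four rows, one per sub-group.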
Aggregate function without GROUP BY always produces exactly one row of result.
Aggregate function with GROUP BY becomes very powerful: it produces a table!
We must first understand how grouping works with the GROUP BY clause.

The pivot is the collection of columns (terms) used in GROUP BY.
A table is a collection of records (rows). With GROUP BY, we sub-divide the collection of records into sub-groups.
Table T GROUP BY C means that we sub-divide table T into sub-groups by the unique values under column C, that is, the unique values under the pivot. For each unique value, we have one sub-group.
Table T GROUP BY C

[Figure: table T with columns C1, C2 and C3, shown first in its original row order and then rearranged so that rows sharing a pivot value sit together. Column C2 holds the names John, Robert and Edward, each in two rows.]

Each choice of pivot gives its own set of groups. Grouping by C2, for example: there are 3 groups, one for each name!

Table T GROUP BY C3, and Table T GROUP BY a two-column pivot that includes C3

[Figure: table T rearranged by each of the two pivots.]

Both pivots give the same groups!
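The grouping step itself can be sketched in plain Python, without a database. The function `group_by` and all the cell values below are hypothetical, except for the three names in column C2; the C3 values were chosen so that grouping by C3 and grouping by (C2, C3) happen to give the same sub-groups.

```python
from collections import defaultdict

# Hypothetical table T: only the C2 names are taken from the slides.
T = [
    {"C1": 1, "C2": "John", "C3": 33},
    {"C1": 2, "C2": "Robert", "C3": 17},
    {"C1": 3, "C2": "Edward", "C3": 21},
    {"C1": 4, "C2": "John", "C3": 33},
    {"C1": 5, "C2": "Robert", "C3": 42},
    {"C1": 6, "C2": "Edward", "C3": 21},
]


def group_by(rows, pivot):
    """Divide rows into sub-groups, one per unique combination of pivot values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in pivot)
        groups[key].append(row)
    return dict(groups)


def partition(groups):
    """Describe the sub-groups by the C1 values of their rows, ignoring the keys."""
    return sorted([row["C1"] for row in rows] for rows in groups.values())


by_c2 = group_by(T, ["C2"])
by_c3 = group_by(T, ["C3"])
by_c2_c3 = group_by(T, ["C2", "C3"])

print(len(by_c2), len(by_c3), partition(by_c3) == partition(by_c2_c3))
```

Every sub-group holds at least one row, and all rows in a sub-group share the same pivot value, which is exactly what the dictionary keys enforce.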
Table Composer GROUP BY floor(born/100)+1

Composer
     CName      Born  Died  floor(born/100)+1 (extended)
1    Vivaldi    1678  1741  17
2    Bach       1685  1750  17
3    Mozart     1756  1791  18
4    Prokofiev  1891  1953  19
5    Dvorak     1841  1904  19

There are 3 groups: one sub-group for each century!

The pivot is the collection of columns (terms) used in GROUP BY.
Each sub-group has at least one record (row).
Each sub-group has the same value under the pivot, that is, all the records (rows) in each sub-group share the same value of the pivot.
Under other columns, the values may be different.
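The extended column can be computed directly, as in the sketch below. One assumption to note: the query uses SQLite's integer division `Born / 100` in place of `floor(born/100)`, because the `floor` function is not available in every SQLite build. For positive integer years the two give the same value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Composer (CName text, Born integer, Died integer)")
conn.executemany(
    "insert into Composer values (?, ?, ?)",
    [
        ("Vivaldi", 1678, 1741),
        ("Bach", 1685, 1750),
        ("Mozart", 1756, 1791),
        ("Prokofiev", 1891, 1953),
        ("Dvorak", 1841, 1904),
    ],
)

# The pivot term, evaluated for every row (the "extended" column).
# Integer division stands in for floor(born/100).
extended = conn.execute(
    "select CName, Born / 100 + 1 from Composer order by Born"
).fetchall()

# The number of sub-groups equals the number of unique pivot values.
group_count = conn.execute(
    "select count(distinct Born / 100 + 1) from Composer"
).fetchone()[0]

print(extended, group_count)
```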
Table GROUP BY

Each sub-group has the same value under the pivot, that is, all the records (rows) in each sub-group share the same value of the pivot. Under other columns, the values may be different.

[Figure: the table of pieces, divided into sub-groups by the pivot.]

Aggregate function with Table GROUP BY

Now when we apply an aggregate function, we apply it once to each sub-group, generating exactly one row for each sub-group.

    select count(pno)
    from <source-table>
    group by <pivot-list>;

[Figure: the table of pieces, divided into sub-groups, with one count(pno) value for each sub-group.]
Aggregate function with GROUP BY

When we apply an aggregate function to each sub-group, we may also select columns (or terms) from the pivot to be listed.

    select <goal-list-with-aggr-func-and-pivot-cols>
    from <source-table>
    group by <pivot-list>;

[Figure: the table of pieces, divided into sub-groups, with the pivot value listed for each sub-group.]

Aggregate function with GROUP BY

But selecting from non-pivot columns will NOT be allowed, because that won't make sense!

    select PNo, <aggregate-function>
    from <source-table>
    group by <pivot-list>;

[Figure: the table of pieces, divided into sub-groups. A sub-group can hold several different PNo values, so no single PNo can be listed for it.]
Aggregate function with GROUP BY

Even when the rows of a non-pivot column share the same value in each sub-group, the column cannot be selected, because it is not in the pivot!

    select CName, <aggregate-function>
    from <source-table>
    group by <pivot-list>;

PNo  PName               CName
000  The Four Seasons    Vivaldi
00   B-minor Mass        Bach
003  Christmas Oratorio  Bach
00   Missa Solemnis      Mozart
006  Classical Symphony  Prokofiev
007  Cinderella          Prokofiev

Aggregate function with GROUP BY

The goal list to be selected can only be:
(1) aggregate functions, or
(2) terms involving only columns in the pivot.

    select floor(born/100)+1 as Century,
           count(*) as "# of composers",
           avg(died-born) as "Average Life"
    from Composer
    group by floor(born/100)+1;

Composer
     CName      Born  Died
1    Vivaldi    1678  1741
2    Bach       1685  1750
3    Mozart     1756  1791
4    Prokofiev  1891  1953
5    Dvorak     1841  1904

Century  # of composers  Average Life
17       2               64
18       1               35
19       2               62.5
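The century query can be run as a sketch in Python's sqlite3 module. As an assumption for portability, `Born / 100` (integer division in SQLite) replaces `floor(born/100)`. The `order by` clause is added only to make the row order predictable. Every term in the goal list is either an aggregate function or built from the pivot term alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Composer (CName text, Born integer, Died integer)")
conn.executemany(
    "insert into Composer values (?, ?, ?)",
    [
        ("Vivaldi", 1678, 1741),
        ("Bach", 1685, 1750),
        ("Mozart", 1756, 1791),
        ("Prokofiev", 1891, 1953),
        ("Dvorak", 1841, 1904),
    ],
)

# Goal list: the pivot term itself plus two aggregate functions.
century_stats = conn.execute(
    """
    select Born / 100 + 1 as Century,
           count(*) as "# of composers",
           avg(Died - Born) as "Average Life"
    from Composer
    group by Born / 100 + 1
    order by Century
    """
).fetchall()

for row in century_stats:
    print(row)
```

Note that some database systems, SQLite among them, do not reject a non-pivot column in the goal list; the value they return for it is taken from an arbitrary row of the sub-group, which is the reason the rule above forbids it.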
Aggregate functions with group by

    select <goal-list-with-aggr-func-and-pivot-cols>
    from <source-table>
    where <where-condition>
    group by <pivot-list>;

First, the table in the source-list is identified.
Second, the where-condition is evaluated for each row. Rows not meeting the where-condition are eliminated.
Third, the columns (terms) for the pivot list are identified.
Fourth, the table is grouped according to the unique values of the pivot columns, into sub-groups.
Fifth, the columns referred to in the aggregate functions in the goal list are evaluated, one evaluation for each sub-group, producing one row of values for each sub-group.
Sixth, the results are listed, with exactly ONE row for each unique value of the pivot columns.

Group by with having

    select <goal-list-with-aggr-func-and-pivot-cols>
    from <source-table>
    where <where-condition>
    group by <pivot-list>
    having <group-condition>;

First, the table in the source-list is identified.
Second, the where-condition is evaluated for each row. Rows not meeting the where-condition are eliminated.
Third, the columns (terms) for the pivot list are identified.
Fourth, the table is grouped according to the unique values of the pivot columns, into sub-groups.
Fifth, the columns referred to in the aggregate functions in the goal list are evaluated, one evaluation for each sub-group, producing one row of values for each sub-group.
Sixth, the results are listed, with exactly ONE row for each unique value of the pivot columns.
Seventh, only those rows satisfying the group-condition (specified by having) are listed in the result as output.
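The seven steps can be traced by hand in plain Python. This is a sketch of the conceptual order described above, not of how any particular database engine executes a query. The where-condition (`Born > 1680`) and the group-condition (`count >= 2`) are hypothetical choices made for the example.

```python
from collections import defaultdict

# Step 1: the source table (CName, Born, Died).
composers = [
    ("Vivaldi", 1678, 1741),
    ("Bach", 1685, 1750),
    ("Mozart", 1756, 1791),
    ("Prokofiev", 1891, 1953),
    ("Dvorak", 1841, 1904),
]

# Step 2: evaluate the where-condition row by row and eliminate failing rows.
kept = [row for row in composers if row[1] > 1680]  # hypothetical: Born > 1680

# Steps 3 and 4: compute the pivot term for each row and form one
# sub-group per unique pivot value.
groups = defaultdict(list)
for name, born, died in kept:
    groups[born // 100 + 1].append((name, born, died))

# Steps 5 and 6: evaluate the aggregate once per sub-group, producing
# exactly one row (century, count) for each unique pivot value.
grouped = [(century, len(rows)) for century, rows in sorted(groups.items())]

# Step 7: keep only the rows that satisfy the group-condition (having).
result = [row for row in grouped if row[1] >= 2]  # hypothetical: count >= 2

print(grouped)
print(result)
```

Because the where-condition runs before grouping, Vivaldi is removed before any count is taken, so the 17th century sub-group holds a single composer and fails the group-condition.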
Group by with having

    select <goal-list-with-aggr-func-and-pivot-cols>
    from <source-table>
    group by <pivot-list>
    having <aggregate-function> >= <value>;

[Figure: the table of pieces, divided into sub-groups by the pivot; only the sub-groups meeting the having condition appear in the result.]

In the same way as the goal-list, the condition specified for having may only involve aggregate functions, or terms included in the pivot.
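A short sketch of a having clause, using Python's sqlite3 module. The names `Piece` and `CNo`, the composer numbers, and the threshold of 2 are all assumptions made for illustration. The having condition uses only an aggregate function, as the rule above requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Piece and CNo are assumed names; the CNo values are assumed composer numbers.
conn.execute("create table Piece (PName text, CNo integer)")
conn.executemany(
    "insert into Piece values (?, ?)",
    [
        ("The Four Seasons", 1),
        ("B-minor Mass", 2),
        ("Christmas Oratorio", 2),
        ("Missa Solemnis", 3),
        ("Classical Symphony", 4),
        ("Cinderella", 4),
    ],
)

# Keep only the sub-groups (composers) with at least two pieces.
# The threshold 2 is a hypothetical choice for this example.
prolific = conn.execute(
    """
    select CNo, count(PName)
    from Piece
    group by CNo
    having count(PName) >= 2
    order by CNo
    """
).fetchall()

print(prolific)
```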