Grouping Data using GROUP BY in MySQL One key feature supported by the SELECT is to find aggregate values, grouping data elements as needed. <select statement>::= SELECT... [GROUP BY <group by definition>] [HAVING <expression> [{<operator> <expression>}...]] ] <group by definition>::= <column name> [ASC DESC] [{, <column name> [ASC DESC]}...] [WITH ROLLUP] The films in sakila are assigned to various categories, ratings and are in different languages. If you wish to determine how many films there are in each category, use GROUP BY category_i d. Each row in the result set is made of the multiple entries with the same value, also called grouping. Aggregate Functions in MySQL Aggregate functions are used to calculate results using field values from multiple records. There are five common aggregate functions. Common Aggregate Functions MySQL Aggregate Function Description COUNT() Returns the number of rows containing non-null values in the specified SUM() Returns the sum of the non-null values in the specified field. AVG() Returns the average of the non-null values in the specified field. MAX() Returns the maximum of the non-null values in the specified field. MIN() Returns the minimum of the non-null values in the specified field. SELECT customer_id, MAX(rental_date) 'LastRentalOn' FROM rental GROUP BY customer_id; SELECT customer_id, MIN(payment_date) 'OldestPaymentOn' FROM payment 1 / 11
GROUP BY customer_id; SELECT customer_id, SUM(amount) 'PaidAmount' FROM payment GROUP BY customer_id; SELECT l.name, COUNT(f.film_id) 'FilmCount' FROM language l LEFT JOIN film_detail f USING (language_id) GROUP BY f.language_id; SELECT category_name, ROUND(AVG(length)/60,2) 'Average Length In Hours' GROUP BY category_id; Show various aggregation functions usage. Using Conditions in Grouping Note: Conditions can be placed within an aggregate function, for example, to enable the consideration of only those values that satisfy a certain condition when you are using COUNT or SUM. SELECT MONTH(rental_date) rental_month, SUM(IF(rating = 'PG',1,0)) PG_count, COUNT(rental_id) rental_count FROM rental_detail GROUP BY MONTH(rental_date) ORDER BY 3 DESC, 2; Show count of rented films by rental month, also show count of 'PG' rated films. You can also use CASE statement in your grouping. SELECT r.category_name, (CASE WHEN return_date IS NULL THEN 'Still Out' 2 / 11
WHEN DATEDIFF(r.return_date, r.rental_date) <= r.rental_duration THEN 'On Time' ELSE 'Late' END) AS status, COUNT(r.rental_id) As rental_count FROM rental_detail r GROUP BY r.category_name, (CASE WHEN return_date IS NULL THEN 'Still Out' WHEN DATEDIFF(r.return_date, r.rental_date) <= r.rental_duration THEN 'On Time' ELSE 'Late' END) ORDER BY 1, 2; Show count of rented films by film-category and timeliness of returns, whether late or on-time. GROUP_CONCAT: Aggregate Functions The aggregate function GROUP_CONCAT groups together character strings, as shown by the following example where actors in a film are arrayed for a film in alphabetical order. SELECT f.title, GROUP_CONCAT(CONCAT(a.first_name, _utf8' ', a.last_name) ORDER BY a.last_name, a.first_name SEPARATOR ', ') AS actors FROM film f JOIN film_actor fa USING(film_id) JOIN actor a ON fa.actor_id = a.actor_id WHERE title LIKE 'SU%' GROUP BY f.film_id; Join film, film_actor and actor tables. Concatenate actors for each film into one string, grouping by film. Order actors by last_name, first_name. 3 / 11
Multiple Column GROUPING GROUP BY can be used for multiple columns. The following query averages rental and length for rating and category: SELECT category_name, rating, ROUND(AVG(rental_rate),2) 'Average Rental', ROUND(AVG(length),0) 'Average Length' GROUP BY category_name, rating ORDER BY category_name, rating; Show average length and rental rate by rating and category. GROUP BY WITH ROLLUP The key word WITH ROLLUP can be appended to GROUP BY column, where if GROUP BY groups only a single column, then an additional sum row is added, with the group name NULL : SELECT category_name, ROUND(AVG(rental_rate),2) 'Average Rental', ROUND(AVG(length),0) 'Average Length' GROUP BY category_name WITH ROLLUP; SELECT category_name, ROUND(AVG(rental_rate),2) 'Average Rental', ROUND(AVG(length),0) 'Average Length' WHERE category_name IS NOT NULL GROUP BY category_name WITH ROLLUP; SELECT category_name, ROUND(AVG(rental_rate),2) 'Average Rental', ROUND(AVG(length),0) 'Average Length' WHERE category_name IN ('Children','Foreign') GROUP BY category_name WITH ROLLUP; 4 / 11
Show average length and rental rate by category with a Rollup. Note there may be a NULL category, which can be filtered out via WHERE clause. ORDER BY cannot be used with ROLLUP. The last row is the rolled-up, summary row for all categories. WHERE clause limits the result set and also affects the ROLLUP. Filtering Aggregates using HAVING Clause The HAVING clause is used to filter grouped data. For example, the following code specifies that we only want information on languages that have more than a certain number of films. SELECT l.name, COUNT(f.film_id) FROM language l JOIN film_detail f USING (language_id) GROUP BY f.language_id HAVING COUNT(f.film_id) > 5 ORDER BY 2 DESC; Count films by language. Limit result set to languages with more than 5 films. 5 / 11
The HAVING clause works similar to the WHERE clause in that it consists of one or more conditions that define which rows are included in a result set. You cannot use aggregate functions or column aliases in expressions in your WHERE clause for which we use HAVING clause. In general, the HAVING clause is normally best suited to use in conjunction with the GROUP BY clause. A HAVING clause is constructed exactly like a WHERE clause, in terms of defining conditions and connecting multiple conditions with operators. For example, the following SELECT statement includes a HAVING clause that contains one condition: SELECT category_name, ROUND(AVG(rental_rate),2) 'Average Rental', ROUND(AVG(length),0) 'Average Length' GROUP BY category_name HAVING AVG(length) > 120 ORDER BY category_name; Show average length and rental rate by category. Limit result set to those having average length more than 2 hours. 6 / 11
You can use both WHERE and HAVING clauses in the same SELECT. The following query shows use of Alias in HAVING clause. SELECT c.customer_id, c.last_name, c.first_name, cat.name, COUNT(r.rental_id) AS total_rentals, SUM(p.amount) AS total_sales FROM payment AS p INNER JOIN rental AS r ON p.rental_id = r.rental_id INNER JOIN inventory AS i ON r.inventory_id = i.inventory_id INNER JOIN film AS f ON i.film_id = f.film_id INNER JOIN film_category AS fc ON f.film_id = fc.film_id INNER JOIN category AS cat ON fc.category_id = cat.category_id INNER JOIN customer AS c ON r.customer_id = c.customer_id WHERE c.customer_id BETWEEN 100 AND 200 GROUP BY c.customer_id, cat.category_id HAVING total_rentals > 3 ORDER BY total_sales DESC; Group payments and rental count by customer and category. Limit result set to those having rental count more than 3, using an alias for the count. Limit result set by customer ID between 100 and 200. We are using both WHERE and HAVING clauses in the same SELECT. Miscellaneous Grouping Concepts and General Constraints 7 / 11
Finding Top-N or Bottom-N Entities It is desirable sometimes to find best or worst-case records, such as top paying customers and films that are not getting rented. The same is achieved using creative mix of Grouping, Ordering and Limits. The following query finds the Top 5 paying customers. SELECT c.last_name, c.first_name, COUNT(1) AS 'RentalCount' FROM rental r JOIN customer c USING (customer_id) WHERE r.rental_date BETWEEN '20050601' AND '20050930' GROUP BY r.customer_id ORDER BY RentalCount DESC LIMIT 5; Group rental count by customer. Limit result set to 5 top renting customers by rental count. The third quarter of 2005 is used to filter data. Find 5 least-rented films. SELECT f.title, f.rating, f.category_name, COUNT(1) AS 'RentalCount' FROM rental r JOIN inventory i USING (inventory_id) JOIN film_detail f USING (film_id) GROUP BY f.film_id 8 / 11
ORDER BY RentalCount LIMIT 5; Group rental count by films. Find lowest renting films. Find oldest-rented 10 films. SELECT f.title, f.rating, f.category_name, MAX(rental_date) AS 'LastRentedOn' FROM rental r JOIN inventory i USING (inventory_id) JOIN film_detail f USING (film_id) GROUP BY f.film_id ORDER BY LastRentedOn LIMIT 10; Read max rental date by films. Find 10 films whose rental date is the oldest, or first in ordered list. Order of Clauses As seen in the syntax, the clauses are expected to follow a certain order to be syntactically correct. 1. SELECT 2. FROM 3. WHERE 4. GROUP BY 9 / 11
5. HAVING 6. ORDER BY SELECT city, COUNT(customer_id) AS NumCustomers FROM customer c JOIN address USING (address_id) JOIN city USING (city_id) WHERE city BETWEEN 'B' AND 'G' GROUP BY city HAVING COUNT(customer_id) > 1 ORDER BY NumCustomers; Group customer count by city. Filter cities by name. List cities with more than 1 living customer. Grouping Rules - Normally, every non-aggregate column that appears in the SELECT clause must also appear in the GROUP BY clause. MySQL relaxes this rule by allowing one to SELECT columns that are not in GROUP BY and vice versa, as will be seen in many examples. There are arguments on both sides, but this is an advantageous feature of MySQL, which can be made stricter using SQL Modes discussed in another lesson. - Normally, you may not use aliases in the HAVING clause, but again we see MySQL's liberty at play here. It is very common to use aliases in the HAVING clause in MySQL queries. - You may use aliases or actual fields in the ORDER BY clause, in addition to column 10 / 11
positions. - Normally, you may only use calculated fields in the HAVING clause, but MySQL's allows use of aliases which relaxes this control. - As we have seen, using ROLLUP is allowed with only one grouping column, and no ORDER BY can be used. Aggregate Functions and Grouping Conclusion in MySQL This lesson gave you the information necessary to perform the following tasks: - Use GROUP BY clauses to your SELECT statements to generate summary data - Use HAVING and other clauses to your SELECT statements to filter the results returned by summarized data To continue to learn MySQL go to the top of this page and click on the next lesson in this M ysql Tutorial's Table of Contents. 11 / 11