Grouping Data using GROUP BY in MySQL

One key feature of the SELECT statement is computing aggregate values, grouping rows as needed.

<select statement> ::=
    SELECT ...
    [GROUP BY <group by definition>]
    [HAVING <expression> [{<operator> <expression>}...]]

<group by definition> ::=
    <column name> [ASC | DESC]
    [{, <column name> [ASC | DESC]}...]
    [WITH ROLLUP]

The ASC and DESC qualifiers on GROUP BY columns were removed in MySQL 8.0.13; on current servers, sort with ORDER BY instead.

The films in sakila are assigned to various categories and ratings, and are in different languages. To determine how many films there are in each category, use GROUP BY category_id. Each row in the result set then summarizes all the rows that share the same value of the grouping column; such a set of rows is called a group.

Aggregate Functions in MySQL

Aggregate functions calculate a result from the field values of multiple records. There are five common aggregate functions.

Common Aggregate Functions

    MySQL Aggregate Function   Description
    COUNT()                    Returns the number of rows containing non-null values in the specified field.
    SUM()                      Returns the sum of the non-null values in the specified field.
    AVG()                      Returns the average of the non-null values in the specified field.
    MAX()                      Returns the maximum of the non-null values in the specified field.
    MIN()                      Returns the minimum of the non-null values in the specified field.

    SELECT customer_id, MAX(rental_date) AS 'LastRentalOn'
    FROM rental
    GROUP BY customer_id;

    SELECT customer_id, MIN(payment_date) AS 'OldestPaymentOn'
    FROM payment
    GROUP BY customer_id;

    SELECT customer_id, SUM(amount) AS 'PaidAmount'
    FROM payment
    GROUP BY customer_id;

    SELECT l.name, COUNT(f.film_id) AS 'FilmCount'
    FROM language l
    LEFT JOIN film_detail f USING (language_id)
    GROUP BY l.language_id;   -- group on the language table so each language without films keeps its own row

    SELECT category_name, ROUND(AVG(length)/60,2) AS 'Average Length In Hours'
    FROM film_detail          -- assumes the film_detail view used elsewhere in this lesson
    GROUP BY category_id;

These queries show typical uses of the aggregate functions.

Using Conditions in Grouping

Note: Conditions can be placed within an aggregate function, so that COUNT or SUM considers only those values that satisfy a certain condition.

    SELECT MONTH(rental_date) rental_month,
           SUM(IF(rating = 'PG',1,0)) PG_count,
           COUNT(rental_id) rental_count
    FROM rental_detail
    GROUP BY MONTH(rental_date)
    ORDER BY 3 DESC, 2;

Show the count of rented films by rental month, along with the count of 'PG' rated films among them.

You can also use a CASE expression in your grouping.

    SELECT r.category_name,
           (CASE WHEN return_date IS NULL THEN 'Still Out'
                 WHEN DATEDIFF(r.return_date, r.rental_date) <= r.rental_duration THEN 'On Time'
                 ELSE 'Late' END) AS status,
           COUNT(r.rental_id) AS rental_count
    FROM rental_detail r
    GROUP BY r.category_name,
             (CASE WHEN return_date IS NULL THEN 'Still Out'
                   WHEN DATEDIFF(r.return_date, r.rental_date) <= r.rental_duration THEN 'On Time'
                   ELSE 'Late' END)
    ORDER BY 1, 2;

Show the count of rented films by film category and timeliness of return: still out, on time, or late.

GROUP_CONCAT Aggregate Function

The aggregate function GROUP_CONCAT joins the character strings of a group into one string. In the following example, the actors of each film are listed in alphabetical order.

    SELECT f.title,
           GROUP_CONCAT(CONCAT(a.first_name, _utf8' ', a.last_name)
                        ORDER BY a.last_name, a.first_name
                        SEPARATOR ', ') AS actors
    FROM film f
    JOIN film_actor fa USING (film_id)
    JOIN actor a ON fa.actor_id = a.actor_id
    WHERE title LIKE 'SU%'
    GROUP BY f.film_id;

Join the film, film_actor and actor tables. Concatenate the actors of each film into one string, grouping by film. Order the actors by last_name, first_name.
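The conditional aggregate and GROUP_CONCAT ideas above can be tried without the sakila views. The following is a minimal sketch on a toy table; demo_rental and its rows are hypothetical and exist only for illustration.

```sql
-- Hypothetical toy data; demo_rental is not a sakila table.
CREATE TEMPORARY TABLE demo_rental (
    customer_id INT,
    title       VARCHAR(20),
    rating      VARCHAR(5)
);

INSERT INTO demo_rental (customer_id, title, rating) VALUES
    (1, 'Alpha', 'PG'),
    (1, 'Cobra', 'R'),
    (1, 'Bravo', 'PG'),
    (2, 'Delta', 'G');

SELECT customer_id,
       COUNT(*)                     AS rental_count,  -- all rows in the group
       SUM(IF(rating = 'PG', 1, 0)) AS pg_count,      -- only the PG rows add 1
       GROUP_CONCAT(title ORDER BY title SEPARATOR ', ') AS titles
FROM demo_rental
GROUP BY customer_id
ORDER BY customer_id;

-- customer_id | rental_count | pg_count | titles
-- 1           | 3            | 2        | Alpha, Bravo, Cobra
-- 2           | 1            | 0        | Delta
```

The IF inside SUM turns each row into 1 or 0, so the sum is a count of matching rows only, while COUNT(*) still counts every row in the group.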

Multiple Column Grouping

GROUP BY can be used with multiple columns. The following query averages rental rate and length for each combination of category and rating:

    SELECT category_name, rating,
           ROUND(AVG(rental_rate),2) AS 'Average Rental',
           ROUND(AVG(length),0) AS 'Average Length'
    FROM film_detail          -- assumes the film_detail view used elsewhere in this lesson
    GROUP BY category_name, rating
    ORDER BY category_name, rating;

Show average length and rental rate by rating and category.

GROUP BY WITH ROLLUP

The keywords WITH ROLLUP can be appended to the GROUP BY column list. When GROUP BY groups on a single column, one additional summary row is added, with NULL as the group name. With several grouping columns, a summary row is added for each grouping level as well.

    SELECT category_name,
           ROUND(AVG(rental_rate),2) AS 'Average Rental',
           ROUND(AVG(length),0) AS 'Average Length'
    FROM film_detail
    GROUP BY category_name WITH ROLLUP;

    SELECT category_name,
           ROUND(AVG(rental_rate),2) AS 'Average Rental',
           ROUND(AVG(length),0) AS 'Average Length'
    FROM film_detail
    WHERE category_name IS NOT NULL
    GROUP BY category_name WITH ROLLUP;

    SELECT category_name,
           ROUND(AVG(rental_rate),2) AS 'Average Rental',
           ROUND(AVG(length),0) AS 'Average Length'
    FROM film_detail
    WHERE category_name IN ('Children','Foreign')
    GROUP BY category_name WITH ROLLUP;
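To see the summary rows that WITH ROLLUP adds, here is a small sketch on a toy table. The table demo_film and its rows are hypothetical; the row order shown is what MySQL typically returns, but without ORDER BY it is not guaranteed.

```sql
-- Hypothetical toy data; demo_film is not a sakila table.
CREATE TEMPORARY TABLE demo_film (
    category VARCHAR(20),
    rating   VARCHAR(5),
    length   INT
);

INSERT INTO demo_film (category, rating, length) VALUES
    ('Action', 'PG', 100),
    ('Action', 'R',  120),
    ('Comedy', 'PG',  80),
    ('Comedy', 'PG',  90);

-- One grouping column: one extra row with category = NULL.
SELECT category, COUNT(*) AS films, SUM(length) AS total_length
FROM demo_film
GROUP BY category WITH ROLLUP;

-- category | films | total_length
-- Action   | 2     | 220
-- Comedy   | 2     | 170
-- NULL     | 4     | 390

-- Two grouping columns: a subtotal per category plus a grand total.
SELECT category, rating, COUNT(*) AS films
FROM demo_film
GROUP BY category, rating WITH ROLLUP;

-- category | rating | films
-- Action   | PG     | 1
-- Action   | R      | 1
-- Action   | NULL   | 2
-- Comedy   | PG     | 2
-- Comedy   | NULL   | 2
-- NULL     | NULL   | 4
```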

The three WITH ROLLUP queries above show average length and rental rate by category, with a rollup row. Note that there may be a NULL category, which can be filtered out with a WHERE clause. Before MySQL 8.0.12, ORDER BY could not be used together with WITH ROLLUP; later versions allow it. The last row is the rolled-up summary row for all categories. The WHERE clause limits the result set and therefore also affects the ROLLUP row.

Filtering Aggregates using HAVING Clause

The HAVING clause is used to filter grouped data. For example, the following query returns only the languages that have more than a certain number of films.

    SELECT l.name, COUNT(f.film_id)
    FROM language l
    JOIN film_detail f USING (language_id)
    GROUP BY f.language_id
    HAVING COUNT(f.film_id) > 5
    ORDER BY 2 DESC;

Count films by language. Limit the result set to languages with more than 5 films.

The HAVING clause works much like the WHERE clause in that it consists of one or more conditions that define which rows are included in a result set. The difference is that a WHERE clause cannot contain aggregate functions or column aliases; conditions on those belong in the HAVING clause. In general, the HAVING clause is best used together with the GROUP BY clause.

A HAVING clause is constructed exactly like a WHERE clause, in terms of defining conditions and connecting multiple conditions with operators. For example, the following SELECT statement includes a HAVING clause that contains one condition:

    SELECT category_name,
           ROUND(AVG(rental_rate),2) AS 'Average Rental',
           ROUND(AVG(length),0) AS 'Average Length'
    FROM film_detail          -- assumes the film_detail view used elsewhere in this lesson
    GROUP BY category_name
    HAVING AVG(length) > 120
    ORDER BY category_name;

Show average length and rental rate by category. Limit the result set to categories whose average length is more than 2 hours.
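The difference between the two clauses comes down to timing: WHERE removes rows before the groups are formed, and HAVING removes groups after the aggregates are computed. The following sketch illustrates this on a toy table; demo_payment and its rows are hypothetical.

```sql
-- Hypothetical toy data; demo_payment is not a sakila table.
CREATE TEMPORARY TABLE demo_payment (
    customer_id INT,
    amount      DECIMAL(5,2)
);

INSERT INTO demo_payment (customer_id, amount) VALUES
    (1, 5.00), (1, 0.99), (1, 4.00),
    (2, 0.99), (2, 0.99),
    (3, 9.00);

SELECT customer_id,
       COUNT(*)    AS payments,
       SUM(amount) AS total
FROM demo_payment
WHERE amount >= 1.00        -- row filter: applied before grouping
GROUP BY customer_id
HAVING COUNT(*) >= 2        -- group filter: applied after aggregation
ORDER BY customer_id;

-- customer_id | payments | total
-- 1           | 2        | 9.00

-- Without the WHERE clause, customer 1 would show 3 payments and
-- customer 2 (two payments of 0.99) would also pass the HAVING test.
-- Writing the aggregate test as WHERE COUNT(*) >= 2 is rejected by MySQL.
```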

You can use both WHERE and HAVING clauses in the same SELECT. The following query also shows the use of an alias in the HAVING clause.

    SELECT c.customer_id, c.last_name, c.first_name, cat.name,
           COUNT(r.rental_id) AS total_rentals,
           SUM(p.amount) AS total_sales
    FROM payment AS p
    INNER JOIN rental AS r ON p.rental_id = r.rental_id
    INNER JOIN inventory AS i ON r.inventory_id = i.inventory_id
    INNER JOIN film AS f ON i.film_id = f.film_id
    INNER JOIN film_category AS fc ON f.film_id = fc.film_id
    INNER JOIN category AS cat ON fc.category_id = cat.category_id
    INNER JOIN customer AS c ON r.customer_id = c.customer_id
    WHERE c.customer_id BETWEEN 100 AND 200
    GROUP BY c.customer_id, cat.category_id
    HAVING total_rentals > 3
    ORDER BY total_sales DESC;

Group payments and rental count by customer and category. Limit the result set to groups with a rental count of more than 3, using an alias for the count. Limit the result set to customer IDs between 100 and 200. Both WHERE and HAVING clauses are used in the same SELECT.

Miscellaneous Grouping Concepts and General Constraints

Finding Top-N or Bottom-N Entities

It is sometimes desirable to find best-case or worst-case records, such as the top customers or the films that are rarely rented. This is achieved with a mix of grouping, ordering and LIMIT. The following query finds the top 5 renting customers.

    SELECT c.last_name, c.first_name, COUNT(1) AS 'RentalCount'
    FROM rental r
    JOIN customer c USING (customer_id)
    WHERE r.rental_date BETWEEN '20050601' AND '20050930'
    GROUP BY r.customer_id
    ORDER BY RentalCount DESC
    LIMIT 5;

Group rental count by customer. Limit the result set to the 5 top renting customers by rental count. Only rentals between 1 June 2005 and 30 September 2005 are considered.

Find the 5 least-rented films.

    SELECT f.title, f.rating, f.category_name, COUNT(1) AS 'RentalCount'
    FROM rental r
    JOIN inventory i USING (inventory_id)
    JOIN film_detail f USING (film_id)
    GROUP BY f.film_id
    ORDER BY RentalCount
    LIMIT 5;

Group rental count by film. Find the films with the lowest rental counts. Because the joins are inner joins, films that were never rented do not appear.

Find the 10 films whose latest rental is the oldest.

    SELECT f.title, f.rating, f.category_name, MAX(rental_date) AS 'LastRentedOn'
    FROM rental r
    JOIN inventory i USING (inventory_id)
    JOIN film_detail f USING (film_id)
    GROUP BY f.film_id
    ORDER BY LastRentedOn
    LIMIT 10;

Read the maximum rental date of each film. Find the 10 films whose last rental date is the oldest, that is, first in the ordered list.

Order of Clauses

As seen in the syntax, the clauses must follow a certain order to be syntactically correct.

1. SELECT
2. FROM
3. WHERE
4. GROUP BY
5. HAVING
6. ORDER BY

    SELECT city, COUNT(customer_id) AS NumCustomers
    FROM customer c
    JOIN address USING (address_id)
    JOIN city USING (city_id)
    WHERE city BETWEEN 'B' AND 'G'
    GROUP BY city
    HAVING COUNT(customer_id) > 1
    ORDER BY NumCustomers;

Group customer count by city. Filter cities by name. List the cities in which more than one customer lives.

Grouping Rules

- Normally, every non-aggregate column that appears in the SELECT clause must also appear in the GROUP BY clause. MySQL can relax this rule, allowing you to SELECT columns that are not in GROUP BY and vice versa, as seen in many examples. There are arguments on both sides, but this can be a convenient feature of MySQL. It is controlled by SQL modes, discussed in another lesson; since MySQL 5.7.5 the default mode includes ONLY_FULL_GROUP_BY, which enforces the stricter rule.
- Normally, you may not use aliases in the HAVING clause, but here again MySQL is more liberal. It is very common to use aliases in the HAVING clause in MySQL queries.
- You may use aliases or actual fields in the ORDER BY clause, in addition to column positions.
- Normally, a HAVING condition must spell out the calculated expression, such as COUNT(customer_id) > 1; MySQL also accepts the alias of that expression.
- The ROLLUP examples in this lesson group on a single column, but WITH ROLLUP also works with several grouping columns. Before MySQL 8.0.12, ORDER BY could not be used in the same query as WITH ROLLUP.

Aggregate Functions and Grouping Conclusion in MySQL

This lesson gave you the information necessary to perform the following tasks:

- Use GROUP BY clauses in your SELECT statements to generate summary data
- Use HAVING and other clauses in your SELECT statements to filter the summarized results
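As a closing sketch of the first grouping rule listed above, the following script turns on the strict ONLY_FULL_GROUP_BY mode for the current session and shows one way to keep a non-grouped column in the SELECT list. The table demo_rule and its rows are hypothetical; ANY_VALUE() is available from MySQL 5.7.5.

```sql
-- Hypothetical toy data; demo_rule is not a sakila table.
CREATE TEMPORARY TABLE demo_rule (
    customer_id INT,
    title       VARCHAR(20)
);

INSERT INTO demo_rule (customer_id, title) VALUES
    (1, 'Alpha'), (1, 'Bravo'), (2, 'Delta');

-- Remember the current mode, then enforce the strict grouping rule.
SET @saved_mode = @@SESSION.sql_mode;
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- Rejected in this mode, because title is neither grouped nor aggregated:
--   SELECT customer_id, title, COUNT(*) FROM demo_rule GROUP BY customer_id;

-- Accepted: ANY_VALUE() states that any title from the group will do,
-- and MIN() picks a well-defined one.
SELECT customer_id,
       ANY_VALUE(title) AS some_title,
       MIN(title)       AS first_title,
       COUNT(*)         AS rentals
FROM demo_rule
GROUP BY customer_id
ORDER BY customer_id;

-- Restore the mode that was active before.
SET SESSION sql_mode = @saved_mode;
```

With the mode disabled, the commented-out query runs, but the title returned for each customer is chosen arbitrarily, which is the main argument for keeping the strict mode on.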