10 How Do We Group Data Into Summary Rows?
Summary Queries
Summary queries are queries that contain one or more aggregate functions.
Aggregate Functions
Aggregate functions operate on a set of rows to calculate and return a single value. For example, determining how many rows are returned; adding up values; or finding the highest, lowest, or average values in a column or group of the result set.
Common SQL Aggregate Functions
AVG() |
Returns a column’s average value |
SUM() |
Returns the sum of a column’s values |
MAX() |
Returns a column’s highest value |
MIN() |
Returns a column’s lowest value |
COUNT() |
Returns the number of rows in a column |
The AVG() Function Syntax:
AVG([ALL| DISTINCT] expression)
SELECT AVG(salary) AS ‘Average Salary’ FROM salaries
Average Salary |
---|
64238.8072 |
THE SUM() Function Syntax:
SUM([ALL| DISTINCT] expression)
SELECT SUM(salary) AS ‘Total of all Salaries’ FROM salaries
Total of all Salaries |
---|
609497803 |
The MAX() Function Syntax:
MAX([ALL| DISTINCT] expression)
THE MIN() Function Syntax:
MIN([ALL| DISTINCT] expression)
SELECT MAX(salary) AS ‘Highest Salary’, MIN(salary) As ‘Lowest Salary’ FROM salaries
Highest Salary | Lowest Salary |
---|---|
136004 | 39264 |
THE COUNT() Function Syntax:
COUNT([ALL| DISTINCT] expression)
SELECT COUNT(*) AS ‘Number of Employees’ FROM employees
Number of Employees |
---|
9488 |
The ALL keyword is assumed with aggregate functions. You don’t have to include it. This means all rows are included in the aggregate function, which is the default.
SELECT AVG(ALL prod_price) AS ‘Average Price’ FROM products
All product prices are included in the average calculation.
You can use the DISTINCT keyword instead to include only unique values.
SELECT AVG(DISTINCT prod_price) AS ‘Average Price’ FROM products
So if there are multiple products with the same price, that price will only be included once in the average calculation.
Creating Groups
In previous examples the aggregate functions summarized data across all rows in the given table. Creating groups allows you to summarize subsets of table contents.
We will use the GROUP BY clause to do this. The GROUP BY clause is used when you have an aggregate function that you want to use on the grouped data.
SELECT statement with GROUP BY and HAVING clause syntax:
SELECT select_list
FROM table_source
[WHERE search_condition]
[GROUP BY group-by_list]
[HAVING search_condition]
[ORDER BY order_by_list]
SELECT vendor_id, COUNT(*) AS ‘Number of Products’ FROM products GROUP BY vendor_id
The GROUP BY clause sorts the data and groups by vendors, then the COUNT function is used to count the number of products for each vendor. So you’d have a count once per vendor rather than for the entire table.
vendor_id | Number of Products |
---|---|
021 | 15 |
044 | 23 |
121 | 31 |
Filtering Groups
The HAVING clause also works with aggregate functions and allows you to filter the groups (or in other words specify which groups to include or not include). It works a lot like the WHERE clause but the WHERE clause filters rows and the HAVING clause filters groups.
SELECT vendor_id, COUNT(*) AS ‘Number of Products’ FROM products GROUP BY vendor_id HAVING COUNT(*) > 20
vendor_id | Number of Products |
---|---|
044 | 23 |
121 | 31 |
The HAVING clause filters the vendors after they are grouped and shows only the vendors with the number of products over 20.
You can use WHERE and HAVING together. If you want to only include product prices greater than $10 and then show the number of products that result grouped by vendors for those vendors who have more than 20 products.
SELECT vendor_id, COUNT(*) AS ‘Number of Products’ FROM products WHERE product_price > 10 GROUP BY vendor_id HAVING COUNT(*) > 20
vendor_id | Number of Products |
---|---|
121 | 25 |
The ROLLUP operator can be used with grouping and aggregates allowing you to add one or more summary rows to your results. A summary row will show up for every group you have.
SELECT vendor_id, SUM(product_price) AS ‘Total of Product Prices’ FROM products WHERE vendor_id < 15 GROUP BY vendor_id WITH ROLLUP
vendor_id | Total of Product Prices |
---|---|
010 | 121.87 |
011 | 242.21 |
012 | 450.43 |
013 | 312.12 |
014 | 576.94 |
NULL | 1703.57 |
The summary row shows up as the bottom row and totals all the totals into a grand total.
SELECT Clause Syntax Order
The order of how you must write the code.
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
LIMIT
SELECT Order of Execution
The written order, or the syntax order of how we write a query differs from the order of how the query is actually executed. It’s important to understand the order of execution so you know what results are accessible where.
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
LIMIT
The data from the entire table(s) is retrieved, then that is filtered by the WHERE criteria. Then that resulting data is grouped and filtered by the HAVING criteria. The columns, calculations or functions that will show up are determined in SELECT. Then that resulting data is sorted and then limited.