10 How Do We Group Data Into Summary Rows?

Summary Queries

Summary queries are queries that contain one or more aggregate functions.

Aggregate Functions

Aggregate functions operate on a set of rows to calculate and return a single value. For example, determining how many rows are returned; adding up values; or finding the highest, lowest, or average values in a column or group of the result set.

Common SQL Aggregate Functions

AVG()

Returns a column’s average value

SUM()

Returns the sum of a column’s values

MAX()

Returns a column’s highest value

MIN()

Returns a column’s lowest value

COUNT()

Returns the number of rows in a column

The AVG() Function Syntax:

AVG([ALL| DISTINCT] expression)


 SELECT AVG(salary) AS ‘Average Salary’
 FROM salaries
			
Average Salary
64238.8072

THE SUM() Function Syntax:

SUM([ALL| DISTINCT] expression)


 SELECT SUM(salary) AS ‘Total of all Salaries’
 FROM salaries
			
Total of all Salaries
609497803

The MAX() Function Syntax:

MAX([ALL| DISTINCT] expression)



THE MIN() Function Syntax:

MIN([ALL| DISTINCT] expression)


 SELECT MAX(salary) AS ‘Highest Salary’, MIN(salary) As ‘Lowest Salary’
 FROM salaries
			
Highest Salary Lowest Salary
136004 39264

THE COUNT() Function Syntax:

COUNT([ALL| DISTINCT] expression)


 SELECT COUNT(*) AS ‘Number of Employees’
 FROM employees
			
Number of Employees
9488

The ALL keyword is assumed with aggregate functions. You don’t have to include it. This means all rows are included in the aggregate function, which is the default.


 SELECT AVG(ALL prod_price) AS ‘Average Price’
 FROM products
			

All product prices are included in the average calculation.

You can use the DISTINCT keyword instead to include only unique values.


 SELECT AVG(DISTINCT prod_price) AS ‘Average Price’
 FROM products
			

So if there are multiple products with the same price, that price will only be included once in the average calculation.

Creating Groups

In previous examples the aggregate functions summarized data across all rows in the given table. Creating groups allows you to summarize subsets of table contents.

We will use the GROUP BY clause to do this. The GROUP BY clause is used when you have an aggregate function that you want to use on the grouped data.


SELECT statement with GROUP BY and HAVING clause syntax:

SELECT select_list
FROM table_source
[WHERE search_condition]
[GROUP BY group-by_list]
[HAVING search_condition]
[ORDER BY order_by_list]


 SELECT vendor_id, COUNT(*) AS ‘Number of Products’
 FROM products 
 GROUP BY vendor_id
			

The GROUP BY clause sorts the data and groups by vendors, then the COUNT function is used to count the number of products for each vendor. So you’d have a count once per vendor rather than for the entire table.

vendor_id Number of Products
021 15
044 23
121 31

Filtering Groups

The HAVING clause also works with aggregate functions and allows you to filter the groups (or in other words specify which groups to include or not include). It works a lot like the WHERE clause but the WHERE clause filters rows and the HAVING clause filters groups.


 SELECT vendor_id, COUNT(*) AS ‘Number of Products’
 FROM products
 GROUP BY vendor_id
 HAVING COUNT(*) > 20
			
vendor_id Number of Products
044 23
121 31

The HAVING clause filters the vendors after they are grouped and shows only the vendors with the number of products over 20.

You can use WHERE and HAVING together. If you want to only include product prices greater than $10 and then show the number of products that result grouped by vendors for those vendors who have more than 20 products.

 SELECT vendor_id, COUNT(*) AS ‘Number of Products’
 FROM products
 WHERE product_price > 10
 GROUP BY vendor_id
 HAVING COUNT(*)  > 20
			
vendor_id Number of Products
121 25

The ROLLUP operator can be used with grouping and aggregates allowing you to add one or more summary rows to your results. A summary row will show up for every group you have.


 SELECT vendor_id, SUM(product_price) AS ‘Total of Product Prices’
 FROM products
 WHERE vendor_id < 15
 GROUP BY vendor_id WITH ROLLUP
			
vendor_id Total of Product Prices
010 121.87
011 242.21
012 450.43
013 312.12
014 576.94
NULL 1703.57

The summary row shows up as the bottom row and totals all the totals into a grand total.

SELECT Clause Syntax Order

The order of how you must write the code.

SELECT

FROM

WHERE

GROUP BY

HAVING

ORDER BY

LIMIT

SELECT Order of Execution

The written order, or the syntax order of how we write a query differs from the order of how the query is actually executed. It’s important to understand the order of execution so you know what results are accessible where.

FROM

WHERE

GROUP BY

HAVING

SELECT

ORDER BY

LIMIT

The data from the entire table(s) is retrieved, then that is filtered by the WHERE criteria. Then that resulting data is grouped and filtered by the HAVING criteria. The columns, calculations or functions that will show up are determined in SELECT. Then that resulting data is sorted and then limited.