TL;DR
GROUP BY collapses multiple rows into summary rows. COUNT, SUM, AVG, MIN, MAX are aggregate functions. Every non-aggregated column in SELECT must be in GROUP BY.
HAVING filters aggregated results (after grouping). WHERE filters individual rows (before grouping). Use HAVING when your filter involves an aggregate function like SUM > 10000.
NULL in aggregates: SUM(amount) where amount is NULL skips that row; COUNT(*) counts NULLs, COUNT(column) skips them; SUM(NULL) + 100 = NULL (NULL poison).

Why We Need GROUP BY

Imagine this question: "What is the total earned by each participant in Q1 2026?" Without GROUP BY, you'd get one row per result transaction. With GROUP BY, you collapse those rows into one row per participant with a SUM of their earnings.

GROUP BY groups rows by one or more columns and lets you summarize each group with aggregate functions.

The Five Aggregate Functions

Function Purpose Example
COUNT(*) Count all rows (including NULLs) COUNT(*) returns 5 if there are 5 rows
COUNT(column) Count non-NULL values in a column COUNT(RESULT_AMOUNT) counts result rows (skips NULLs)
SUM(amount) Add up all values SUM(RESULT_AMOUNT) = total earnings
AVG(amount) Average of all values (NULLs excluded) AVG(RESULT_AMOUNT) = average earning per transaction
MIN(value)
MAX(value)
Minimum / Maximum value MAX(RESULT_AMOUNT) = highest single earning

GROUP BY Syntax and The GROUP BY Rule

Here's the critical rule: Every column in SELECT must either be in GROUP BY or wrapped in an aggregate function.

SQL — GROUP BY example
SELECT
  PARTICIPANT_ID,                      -- In GROUP BY ✓
  SUM(RESULT_AMOUNT) AS TOTAL_EARNED   -- Aggregate function ✓
FROM CSC_RESULTS
WHERE PERIOD_ID = '2026-Q1'
GROUP BY PARTICIPANT_ID
ORDER BY TOTAL_EARNED DESC;

This query:

  • Groups all results by PARTICIPANT_ID
  • For each group, calculates SUM(RESULT_AMOUNT)
  • Returns one row per participant with their total

If you tried to SELECT a column not in GROUP BY (like NAME), the database returns an error. It doesn't know which NAME to pick from potentially multiple results rows.

Real ICM Example: Total Earnings by Participant

Here's a production query that combines everything: participants, results, and aggregations:

SQL — SAP SuccessFactors IM
SELECT
  p.PARTICIPANT_ID,
  p.NAME,
  p.REGION,
  COUNT(r.ID)              AS TRANSACTION_COUNT,
  SUM(r.RESULT_AMOUNT)    AS TOTAL_EARNED,
  AVG(r.RESULT_AMOUNT)    AS AVG_PER_TRANSACTION,
  MAX(r.RESULT_AMOUNT)    AS HIGHEST_EARNING,
  MIN(r.RESULT_AMOUNT)    AS LOWEST_EARNING
FROM     CSC_PARTICIPANT p
LEFT JOIN CSC_RESULTS r
  ON r.PARTICIPANT_ID = p.PARTICIPANT_ID
WHERE    p.STATUS = 'ACTIVE'
  AND      r.PERIOD_ID = '2026-Q1'
GROUP BY p.PARTICIPANT_ID, p.NAME, p.REGION
ORDER BY TOTAL_EARNED DESC;

GROUP BY includes PARTICIPANT_ID, NAME, and REGION. All three must be in GROUP BY because they're in SELECT but not aggregated. The database groups by all three, ensuring one row per unique combination (though typically PARTICIPANT_ID is unique).

HAVING: Filtering After Aggregation

WHERE filters before aggregation. HAVING filters after. This matters:

SQL — WHERE vs HAVING
-- Find active participants
WHERE p.STATUS = 'ACTIVE';  -- Filters rows before grouping ✓

-- Find groups where total earned > $10,000
HAVING SUM(r.RESULT_AMOUNT) > 10000;  -- Filters groups after aggregation ✓

The execution order is: FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. This is why HAVING can reference aggregate functions (they've already been calculated by then).

Real scenario: "Show me participants who earned more than $10,000 in Q1 2026, including their count of transactions and average earning per transaction."

SQL — SAP SuccessFactors IM
SELECT
  p.NAME,
  COUNT(r.ID)            AS TRANSACTIONS,
  SUM(r.RESULT_AMOUNT)  AS TOTAL_EARNED,
  AVG(r.RESULT_AMOUNT)  AS AVG_TRANSACTION
FROM     CSC_PARTICIPANT p
LEFT JOIN CSC_RESULTS r
  ON r.PARTICIPANT_ID = p.PARTICIPANT_ID
WHERE    p.STATUS = 'ACTIVE'
  AND      r.PERIOD_ID = '2026-Q1'
GROUP BY p.PARTICIPANT_ID, p.NAME
HAVING   SUM(r.RESULT_AMOUNT) > 10000
ORDER BY TOTAL_EARNED DESC;

The WHERE clause filters to active participants in Q1 2026. GROUP BY groups by participant. HAVING then filters to only those groups with SUM > 10000. You can't use HAVING without GROUP BY (it has no meaning without groups).

💡Rule of thumb: If your filter involves an aggregate (SUM, COUNT, AVG, etc.), use HAVING. If it involves a regular column, use WHERE.

NULL Behavior in Aggregates

NULLs are tricky in aggregates:

  • COUNT(*) counts all rows, including those with NULL values
  • COUNT(column) counts non-NULL values in that column. If column is NULL, the row is skipped
  • SUM(amount) skips NULL rows. SUM of (100, NULL, 200) = 300, not NULL
  • AVG(amount) skips NULL rows. AVG of (100, NULL, 200) = 150, not NULL
  • MIN/MAX(amount) skip NULLs
SQL — NULL in aggregates
-- Example data: (100, NULL, 200)

COUNT(*)       = 3      -- all rows, including NULL
COUNT(amount) = 2      -- only non-NULL
SUM(amount)   = 300    -- 100 + 200, NULL skipped
AVG(amount)   = 150    -- 300 / 2, not 300 / 3

In ICM, this matters: if a participant has some results and some missing results (NULL), the aggregates correctly skip the missing data.

GROUP BY Multiple Columns: Earnings by Period and Region

You can group by more than one column:

SQL — GROUP BY multiple columns
SELECT
  r.PERIOD_ID,
  p.REGION,
  COUNT(r.ID)            AS TRANSACTIONS,
  SUM(r.RESULT_AMOUNT)  AS TOTAL_EARNED,
  AVG(r.RESULT_AMOUNT)  AS AVG_TRANSACTION
FROM     CSC_PARTICIPANT p
LEFT JOIN CSC_RESULTS r
  ON r.PARTICIPANT_ID = p.PARTICIPANT_ID
WHERE    p.STATUS = 'ACTIVE'
GROUP BY r.PERIOD_ID, p.REGION
ORDER BY r.PERIOD_ID ASC, TOTAL_EARNED DESC;

This groups by both PERIOD_ID and REGION, returning one row per unique combination. You get earnings totals broken down by period and region.

Avoiding GROUP BY Errors

Common mistakes:

  • "Column X is not in GROUP BY": You selected X without aggregating it and it's not in GROUP BY. Either add it to GROUP BY or wrap it in an aggregate.
  • "Aggregate function not allowed": You used COUNT or SUM in WHERE. Move it to HAVING.
  • "Unexpected NULL values": Check if your JOIN produced NULLs. Use LEFT JOIN carefully when aggregating.
  • "Wrong totals": Verify the WHERE clause filters correctly. If you filter after the JOIN, you might reduce the aggregate scope.

What Comes Next

GROUP BY and aggregates answer "what is the total?" In Lesson 05, we'll learn advanced queries that answer "rank participants by earnings" or "show running totals by period" without collapsing rows. That's where window functions and CTEs come in.