Aggregation and Grouping

  • ID: DBSQL-F-L03
  • Type: Lesson
  • Audience: Public
  • Theme: Summarizing data with aggregation

Selecting rows retrieves information.

Aggregation summarizes it.

In analytics, we often need answers like:


Setup


Core Aggregate Functions

Common aggregate functions:

  • COUNT()
  • SUM()
  • AVG()
  • MIN()
  • MAX()

Count Orders

df = pd.read_sql(
    "SELECT COUNT(*) AS total_orders FROM orders;",
    con
)
print(df)
   total_orders
0             5

Compute Total Revenue

Revenue in this dataset is computed from line items:

quantity × unit_price

df = pd.read_sql(
    "SELECT SUM(quantity * unit_price) AS total_revenue FROM order_items;",
    con
)
print(df)
   total_revenue
0          65.55

GROUP BY

GROUP BY creates summaries per group.

Orders by Status

df = pd.read_sql(
    '''
    SELECT status, COUNT(*) AS order_count
    FROM orders
    GROUP BY status
    ORDER BY order_count DESC;
    ''',
    con
)
print(df)
      status  order_count
0       paid            4
1  cancelled            1

Aggregation with Joins

To summarize revenue by product name, we join order_items to products.

query = '''
SELECT p.product_name,
       SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items oi
JOIN products p
  ON oi.product_id = p.product_id
GROUP BY p.product_name
ORDER BY revenue DESC;
'''

df = pd.read_sql(query, con)
print(df)
     product_name  revenue
0  USB Drive 32GB    26.70
1      Headphones    19.50
2     Notebook A5     7.50
3    Water Bottle     6.25
4        Pen Blue     5.60

HAVING

WHERE filters rows before grouping.

HAVING filters groups after aggregation.

df = pd.read_sql(
    '''
    SELECT status, COUNT(*) AS order_count
    FROM orders
    GROUP BY status
    HAVING COUNT(*) > 1;
    ''',
    con
)
print(df)
  status  order_count
0   paid            4

Aggregation is powerful.

It must be paired with structural discipline, especially when joins are involved.