df = pd.read_sql(
"SELECT COUNT(*) AS total_orders FROM orders;",
con
)
print(df) total_orders
0 5
Selecting rows retrieves information.
Aggregation summarizes it.
In analytics, we often need answers like:
Common aggregate functions:
df = pd.read_sql(
"SELECT COUNT(*) AS total_orders FROM orders;",
con
)
print(df) total_orders
0 5
Revenue in this dataset is computed from line items:
quantity × unit_price
df = pd.read_sql(
"SELECT SUM(quantity * unit_price) AS total_revenue FROM order_items;",
con
)
print(df) total_revenue
0 65.55
GROUP BY creates summaries per group.
df = pd.read_sql(
'''
SELECT status, COUNT(*) AS order_count
FROM orders
GROUP BY status
ORDER BY order_count DESC;
''',
con
)
print(df) status order_count
0 paid 4
1 cancelled 1
To summarize revenue by product name, we join order_items to products.
query = '''
SELECT p.product_name,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items oi
JOIN products p
ON oi.product_id = p.product_id
GROUP BY p.product_name
ORDER BY revenue DESC;
'''
df = pd.read_sql(query, con)
print(df) product_name revenue
0 USB Drive 32GB 26.70
1 Headphones 19.50
2 Notebook A5 7.50
3 Water Bottle 6.25
4 Pen Blue 5.60
WHERE filters rows before grouping.
HAVING filters groups after aggregation.
df = pd.read_sql(
'''
SELECT status, COUNT(*) AS order_count
FROM orders
GROUP BY status
HAVING COUNT(*) > 1;
''',
con
)
print(df) status order_count
0 paid 4
Aggregation is powerful.
It must be paired with structural discipline, especially when joins are involved.