Data Frosch logo

Data Frosch

Guides

← All Guides

Guide

Pivot Tables

A pivot table answers one question: how does this number break down across these categories? This guide builds the intuition for what pivot tables do, how to read them, and how to construct them from a question.

The one question a pivot table answers

You have a long flat list of rows, like contracts, crimes, patients, transactions. You want to know: how many rows match each category? or what is the total value for each group?

That is all a pivot table does. It collapses many rows into a summary, grouped by one or more categories.

You could do the same thing by hand: sort by ministry, count the rows, write down the number, repeat. A pivot table does it instantly, and lets you change the grouping without starting over.

From flat list to grouped summary

Start with a flat list of procurement contracts. Each row is one contract.

Tidy data: one row per contract

ministryyearvalue (€)
Health2022120,000
Finance202245,000
Health202389,000
Finance2023210,000
Health202255,000
Finance202330,000

Pivot: count of contracts per ministry

ministrycount
Finance3
Health3

The six rows collapsed into two, one per ministry. The pivot table counted how many times each ministry appeared.

But you can ask a different question of the same data. Let's ask how much instead of how many:

Same tidy data

ministryyearvalue (€)
Health2022120,000
Finance202245,000
Health202389,000
Finance2023210,000
Health202255,000
Finance202330,000

Pivot: total spending per ministry over the years

ministrytotal (€)
Finance285,000
Health264,000

The years dimension disappeared and a we got a sum over the years. Pretty neat!

The four fields

Every pivot table tool (Excel, Google Sheets, Python, R) organises the same four fields. Once you understand what each one does, the tool doesn't matter.

Values: what you want to measure

This is a numeric column. You choose an aggregation: count (how many rows), sum (add them up), average, max, min. Count doesn't need a numeric column, it just counts rows. Sum, average, max, and min do.

Example: Sum of value, or Count of contract_id.

Rows: what you want to group by

A categorical column. The pivot table creates one output row per unique value in this column.

Example: ministry → one row per ministry.

Columns: a second grouping, spread sideways

Optional. A second categorical column whose unique values become column headers. This is what gives the pivot table its name. You are pivoting a category from rows into columns. The result is a cross-tabulation.

Example: year → columns become 2022, 2023, 2024.

Filters: zoom in on a subset

Restrict the data before summarising. The pivot table runs on only the rows that match your filter.

Example: Filter to year = 2023 before counting contracts.

Adding a second dimension with Columns

Here is what happens when you move year from Rows to Columns:

Rows: ministry & year: two grouping columns, stacked

ministryyeartotal (€)
Finance202245,000
Finance2023240,000
Health2022175,000
Health202389,000

Cross-tabulation. Rows: ministry, Columns: year

ministry20222023
Finance45,000240,000
Health175,00089,000

Both tables contain the same information. The cross-tabulation on the right is easier to scan for comparisons across years. It is also the form journalists most often see in press releases and reports.

Use the stacked form (left) when you have many years or want to sort by total. Use the cross-tab (right) when you want to compare two categories against each other at a glance.

Count vs. sum

These answer different questions. Getting them mixed up produces wrong numbers.

Count: how many contracts did each ministry sign?

ministrycount of contracts
Finance3
Health3

Counts rows. Does not care about the value column.

Sum: how much did each ministry spend?

ministrytotal value (€)
Finance285,000
Health264,000

Adds up the value column. A ministry with one huge contract scores higher than one with many small ones.

Both are valid, they just answer different questions. Before you build a pivot table, ask: am I measuring frequency (count) or magnitude (sum)?

From question to pivot table

Work backwards from your question. Each part of the question maps to a field.

Question Rows Columns Values
How many contracts per ministry? ministry count
Total spending per ministry per year? ministry year sum of value
Average contract size per region? region average of value
Number of crimes per neighbourhood per year, only violent crimes? neighbourhood year count
Which supplier received the largest single contract? supplier max of value

If you cannot map your question to these fields, the question may be too complex for a single pivot table, or the data may need to be tidied first.

What pivot tables need to work

Pivot tables only work well on tidy data. If your data has these problems, fix them first:

  • Merged cells: a merged region header makes the entire column unusable as a Row or Column field. Unmerge and fill down first.
  • Inconsistent values: "Netherlands", "The Netherlands", and "NL" will appear as three separate groups. Standardise categorical columns before pivoting.
  • Numbers stored as text: a value column with currency symbols or spaces will return 0 when you try to Sum it. Strip the non-numeric characters first.
  • Subtotal rows mixed into data: these will be counted and summed alongside the real data, inflating every result.

See the Tidy Data guide.