Pivot Tables | Data Frosch Guides

The one question a pivot table answers

You have a long flat list of rows, like contracts, crimes, patients, transactions. You want to know: how many rows match each category? or what is the total value for each group?

That is all a pivot table does. It collapses many rows into a summary, grouped by one or more categories.

You could do the same thing by hand: sort by ministry, count the rows, write down the number, repeat. A pivot table does it instantly, and lets you change the grouping without starting over.

From flat list to grouped summary

Start with a flat list of procurement contracts. Each row is one contract.

Tidy data: one row per contract

ministry	year	value (€)
Health	2022	120,000
Finance	2022	45,000
Health	2023	89,000
Finance	2023	210,000
Health	2022	55,000
Finance	2023	30,000

Pivot: count of contracts per ministry

ministry	count
Finance	3
Health	3

The six rows collapsed into two, one per ministry. The pivot table counted how many times each ministry appeared.

But you can ask a different question of the same data. Let's ask how much instead of how many:

Same tidy data

ministry	year	value (€)
Health	2022	120,000
Finance	2022	45,000
Health	2023	89,000
Finance	2023	210,000
Health	2022	55,000
Finance	2023	30,000

Pivot: total spending per ministry over the years

ministry	total (€)
Finance	285,000
Health	264,000

The years dimension disappeared and a we got a sum over the years. Pretty neat!

The four fields

Every pivot table tool (Excel, Google Sheets, Python, R) organises the same four fields. Once you understand what each one does, the tool doesn't matter.

Values: what you want to measure

This is a numeric column. You choose an aggregation: count (how many rows), sum (add them up), average, max, min. Count doesn't need a numeric column, it just counts rows. Sum, average, max, and min do.

Example: Sum of value, or Count of contract_id.

Rows: what you want to group by

A categorical column. The pivot table creates one output row per unique value in this column.

Example: ministry → one row per ministry.

Columns: a second grouping, spread sideways

Optional. A second categorical column whose unique values become column headers. This is what gives the pivot table its name. You are pivoting a category from rows into columns. The result is a cross-tabulation.

Example: year → columns become 2022, 2023, 2024.

Filters: zoom in on a subset

Restrict the data before summarising. The pivot table runs on only the rows that match your filter.

Example: Filter to year = 2023 before counting contracts.

Adding a second dimension with Columns

Here is what happens when you move year from Rows to Columns:

Rows: ministry & year: two grouping columns, stacked

ministry	year	total (€)
Finance	2022	45,000
Finance	2023	240,000
Health	2022	175,000
Health	2023	89,000

Cross-tabulation. Rows: ministry, Columns: year

ministry	2022	2023
Finance	45,000	240,000
Health	175,000	89,000

Both tables contain the same information. The cross-tabulation on the right is easier to scan for comparisons across years. It is also the form journalists most often see in press releases and reports.

Use the stacked form (left) when you have many years or want to sort by total. Use the cross-tab (right) when you want to compare two categories against each other at a glance.

Count vs. sum

These answer different questions. Getting them mixed up produces wrong numbers.

Count: how many contracts did each ministry sign?

ministry	count of contracts
Finance	3
Health	3

Counts rows. Does not care about the value column.

Sum: how much did each ministry spend?

ministry	total value (€)
Finance	285,000
Health	264,000

Adds up the value column. A ministry with one huge contract scores higher than one with many small ones.

Both are valid, they just answer different questions. Before you build a pivot table, ask: am I measuring frequency (count) or magnitude (sum)?

From question to pivot table

Work backwards from your question. Each part of the question maps to a field.

Question	Rows	Columns	Values
How many contracts per ministry?	ministry		count
Total spending per ministry per year?	ministry	year	sum of value
Average contract size per region?	region		average of value
Number of crimes per neighbourhood per year, only violent crimes?	neighbourhood	year	count
Which supplier received the largest single contract?	supplier		max of value

If you cannot map your question to these fields, the question may be too complex for a single pivot table, or the data may need to be tidied first.

What pivot tables need to work

Pivot tables only work well on tidy data. If your data has these problems, fix them first:

Merged cells: a merged region header makes the entire column unusable as a Row or Column field. Unmerge and fill down first.
Inconsistent values: "Netherlands", "The Netherlands", and "NL" will appear as three separate groups. Standardise categorical columns before pivoting.
Numbers stored as text: a value column with currency symbols or spaces will return 0 when you try to Sum it. Strip the non-numeric characters first.
Subtotal rows mixed into data: these will be counted and summed alongside the real data, inflating every result.

See the Tidy Data guide.