Pivot Tables & Crosstab

A pivot table in pandas is a summary grid built with df.pivot_table() that takes long, row-by-row data and reshapes it so one categorical column becomes the rows, another becomes the columns, and a numeric column is aggregated into each cell — exactly like a spreadsheet PivotTable.

Learn Pivot Tables & Crosstab in our free Pandas course — a beginner-friendly interactive lesson with worked examples, a practice exercise and a quick…

Part of the free Pandas course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

Start with long-format sales data — one row per transaction. pivot_table turns it into a grid: index picks the rows, columns picks the columns, values is the number to summarise, and aggfunc says how (sum, mean, count, ...).

Add margins=True for an "All" row and column of grand totals. Pass a list to aggfunc to compute several summaries at once.

df.pivot() only reshapes and errors if any cell would receive more than one value . pd.crosstab() is a shortcut for frequency tables — counting how often category combinations occur, optionally as proportions with normalize= .

Build a pivot table of the average score for each class. Which aggfunc gives the average?

From transaction-level data, build a store-by-month revenue grid with grand totals, then find the best month overall.

Lesson complete — you can summarise data like a spreadsheet!

You can build pivot tables with custom aggregations and totals, choose between pivot and pivot_table, and count category combinations with crosstab. These are the everyday reporting tools of a data analyst.

🚀 Up next: Reshaping with melt, stack, unstack — convert between wide and long layouts.

Practice quiz

In pivot_table, which argument picks the rows of the grid?

columns
index
values
aggfunc

Answer: index. index chooses which column's categories become the rows.

What does the values argument specify?

The row labels
The column labels
The numeric column to aggregate into each cell
The sort order

Answer: The numeric column to aggregate into each cell. values is the numeric column summarised in each cell.

What is the default aggfunc of pivot_table?

mean
sum
count
max

Answer: mean. pivot_table aggregates with mean by default.

Which aggfunc gives the average score per group?

'sum'
'count'
'max'
'mean'

Answer: 'mean'. aggfunc='mean' computes the average of the values in each cell.

What does margins=True add?

An 'All' row and column of grand totals
A sort
A new index
Random noise

Answer: An 'All' row and column of grand totals. margins=True appends grand-total row and column (labelled 'All' by default).

How do you compute several statistics at once in pivot_table?

Call it many times
sum
mean
count

Answer: sum. A list of functions in aggfunc returns multiple summary columns.

Why does df.pivot raise an error on duplicate index/column pairs?

It only reshapes and can't decide which value to keep
It needs an index
It needs more memory
It is deprecated

Answer: It only reshapes and can't decide which value to keep. pivot can't aggregate, so duplicate cells are ambiguous and raise ValueError.

Which function is the shortcut for frequency (count) tables?

df.pivot()
pd.crosstab()
df.merge()
df.stack()

Answer: pd.crosstab(). pd.crosstab counts how often combinations of categories co-occur.

What does crosstab(a, b, normalize='index') produce?

Raw counts
A sum
Each row as proportions that sum to 1
A pivot with totals

Answer: Each row as proportions that sum to 1. normalize='index' converts each row's counts into proportions.

If df.pivot raises 'Index contains duplicate entries', what should you switch to?

crosstab without values
merge
concat
pivot_table with an aggfunc

Answer: pivot_table with an aggfunc. pivot_table plus an aggfunc decides how to combine the duplicate cells.