Pivot Tables & Crosstab
A pivot table in pandas is a summary grid built with df.pivot_table() that takes long, row-by-row data and reshapes it so one categorical column becomes the rows, another becomes the columns, and a numeric column is aggregated into each cell — exactly like a spreadsheet PivotTable.
Learn Pivot Tables & Crosstab in our free Pandas course — a beginner-friendly interactive lesson with worked examples, a practice exercise and a quick…
Part of the free Pandas course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
Start with long-format sales data — one row per transaction. pivot_table turns it into a grid: index picks the rows, columns picks the columns, values is the number to summarise, and aggfunc says how (sum, mean, count, ...).
Add margins=True for an "All" row and column of grand totals. Pass a list to aggfunc to compute several summaries at once.
df.pivot() only reshapes and errors if any cell would receive more than one value . pd.crosstab() is a shortcut for frequency tables — counting how often category combinations occur, optionally as proportions with normalize= .
Build a pivot table of the average score for each class. Which aggfunc gives the average?
From transaction-level data, build a store-by-month revenue grid with grand totals, then find the best month overall.
Lesson complete — you can summarise data like a spreadsheet!
You can build pivot tables with custom aggregations and totals, choose between pivot and pivot_table, and count category combinations with crosstab. These are the everyday reporting tools of a data analyst.
🚀 Up next: Reshaping with melt, stack, unstack — convert between wide and long layouts.
Practice quiz
In pivot_table, which argument picks the rows of the grid?
- columns
- index
- values
- aggfunc
Answer: index. index chooses which column's categories become the rows.
What does the values argument specify?
- The row labels
- The column labels
- The numeric column to aggregate into each cell
- The sort order
Answer: The numeric column to aggregate into each cell. values is the numeric column summarised in each cell.
What is the default aggfunc of pivot_table?
- mean
- sum
- count
- max
Answer: mean. pivot_table aggregates with mean by default.
Which aggfunc gives the average score per group?
- 'sum'
- 'count'
- 'max'
- 'mean'
Answer: 'mean'. aggfunc='mean' computes the average of the values in each cell.
What does margins=True add?
- An 'All' row and column of grand totals
- A sort
- A new index
- Random noise
Answer: An 'All' row and column of grand totals. margins=True appends grand-total row and column (labelled 'All' by default).
How do you compute several statistics at once in pivot_table?
- Call it many times
- sum
- mean
- count
Answer: sum. A list of functions in aggfunc returns multiple summary columns.
Why does df.pivot raise an error on duplicate index/column pairs?
- It only reshapes and can't decide which value to keep
- It needs an index
- It needs more memory
- It is deprecated
Answer: It only reshapes and can't decide which value to keep. pivot can't aggregate, so duplicate cells are ambiguous and raise ValueError.
Which function is the shortcut for frequency (count) tables?
- df.pivot()
- pd.crosstab()
- df.merge()
- df.stack()
Answer: pd.crosstab(). pd.crosstab counts how often combinations of categories co-occur.
What does crosstab(a, b, normalize='index') produce?
- Raw counts
- A sum
- Each row as proportions that sum to 1
- A pivot with totals
Answer: Each row as proportions that sum to 1. normalize='index' converts each row's counts into proportions.
If df.pivot raises 'Index contains duplicate entries', what should you switch to?
- crosstab without values
- merge
- concat
- pivot_table with an aggfunc
Answer: pivot_table with an aggfunc. pivot_table plus an aggfunc decides how to combine the duplicate cells.