Cross-Tabulation (crosstab)
A cross-tabulation counts how often each combination of two categories occurs, laying one variable down the rows and the other across the columns to reveal their relationship at a glance.
Part of the free Pandas course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
You'll learn pd.crosstab(rows, cols), percentages with normalize=, totals with margins=True, and value summaries with aggfunc.
The simplest call pd.crosstab(df["dept"], df["level"]) builds a frequency table: each cell is the number of rows that have that dept and that level. Add margins=True to tack on an All row and column with the totals.
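As a minimal sketch of the two calls described above, the snippet below builds a small frame and tabulates it. The six rows are made up for illustration; only the dept and level column names come from the lesson.

```python
import pandas as pd

# Hypothetical six-row staff table; the values are invented for illustration.
df = pd.DataFrame({
    "dept":  ["Eng", "Eng", "Eng", "Sales", "Sales", "Sales"],
    "level": ["Jr", "Sr", "Sr", "Jr", "Jr", "Sr"],
})

# Plain frequency table: rows are dept, columns are level, cells are counts.
counts = pd.crosstab(df["dept"], df["level"])

# Same table with an "All" row and column holding the totals.
with_totals = pd.crosstab(df["dept"], df["level"], margins=True)

print(counts)
print(with_totals)
```

In this made-up data the bottom-right All cell equals the number of rows in the frame, because every row falls into exactly one dept/level cell.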
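Raw counts can mislead when groups are different sizes. normalize rescales the table into proportions: normalize=True divides every cell by the grand total, normalize='index' makes each row sum to 1, and normalize='columns' makes each column sum to 1.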
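The three normalize settings can be compared side by side. This is a sketch on the same kind of made-up dept/level data, not output from the course's own dataset.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "dept":  ["Eng", "Eng", "Eng", "Sales", "Sales", "Sales"],
    "level": ["Jr", "Sr", "Sr", "Jr", "Jr", "Sr"],
})

# Every cell divided by the grand total: the whole table sums to 1.
overall = pd.crosstab(df["dept"], df["level"], normalize=True)

# Row proportions: each dept's row sums to 1.
by_row = pd.crosstab(df["dept"], df["level"], normalize="index")

# Column proportions: each level's column sums to 1.
by_col = pd.crosstab(df["dept"], df["level"], normalize="columns")

print(overall)
print(by_row)
print(by_col)
```

Multiplying any of these tables by 100 turns the proportions into percentages.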
A crosstab does not have to count. Supply values= with a numeric Series and aggfunc= with how to summarise it, and each cell becomes that statistic for the combination — for example the average salary of each dept/level pair instead of a head count.
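A short sketch of the average-salary case follows. The salary column and its figures are assumptions added for illustration; the lesson names only dept, level and the idea of averaging salary.

```python
import pandas as pd

# Hypothetical data; the salary figures (in thousands) are invented.
df = pd.DataFrame({
    "dept":   ["Eng", "Eng", "Eng", "Sales", "Sales", "Sales"],
    "level":  ["Jr", "Sr", "Sr", "Jr", "Jr", "Sr"],
    "salary": [60, 100, 110, 50, 54, 80],
})

# Each cell is now the mean salary for that dept/level pair, not a count.
avg_salary = pd.crosstab(
    df["dept"],
    df["level"],
    values=df["salary"],
    aggfunc="mean",
)

print(avg_salary)
```

Swapping aggfunc="mean" for another aggregation name such as "sum" or "max" changes the statistic without changing the table's shape.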
Two stores log which category each sale belongs to. Compare their mix.
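One possible way to approach this exercise is sketched below. The sales frame, its store and category column names, and every row in it are hypothetical, since the lesson's own dataset is not shown here.

```python
import pandas as pd

# Hypothetical sales log: one row per sale, invented for illustration.
sales = pd.DataFrame({
    "store":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "category": ["Food", "Food", "Toys", "Books", "Food", "Toys", "Toys", "Toys"],
})

# Row proportions show each store's category mix, so stores with
# different sales volumes can be compared on the same scale.
mix = pd.crosstab(sales["store"], sales["category"], normalize="index")

print(mix)
```

Row normalisation is used here because the question is about each store's own mix; column normalisation would instead show how each category's sales are split between the stores.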
Lesson complete — you can profile categories fast!
You can count combinations with pd.crosstab, add totals with margins=True, switch to percentages with normalize, and summarise values with values= + aggfunc=.
🚀 Up next: Reading & Writing SQL — move data between pandas and databases.
Practice quiz
What does a plain pd.crosstab(rows, cols) compute in each cell?
- The mean of a value
- The sum of a value
- The count of rows with that row/column combination
- A correlation
Answer: The count of rows with that row/column combination. By default crosstab builds a frequency table: each cell counts the matching rows.
In pd.crosstab(a, b), what does the first argument become?
- The rows (index)
- The columns
- The values
- The totals
Answer: The rows (index). The first argument forms the rows (index) and the second forms the columns.
What must you pass to crosstab as the row and column arguments?
- Column name strings
- A dictionary
- A list of bins
- The actual Series
Answer: The actual Series. crosstab takes the data itself (a Series or other array-like), not column-name strings; a string is treated as data rather than looked up as a column.
What does margins=True add?
- Percentages
- An 'All' row and column with the totals
- A new value column
- Sorting
Answer: An 'All' row and column with the totals. margins=True appends an 'All' row and column holding the marginal totals.
What does normalize=True do?
- Each cell is divided by the grand total (whole table sums to 1)
- Each row sums to 1
- Each column sums to 1
- Nothing changes
Answer: Each cell is divided by the grand total (whole table sums to 1). normalize=True divides every cell by the grand total so the whole table sums to 1.
Which normalize value makes each row sum to 1 (row percentages)?
- normalize=True
- normalize='columns'
- normalize='index'
- normalize='rows'
Answer: normalize='index'. normalize='index' rescales so each row's values sum to 1.
To compute the average salary per dept/level cell instead of a count, you pass...
- margins=True
- values= and aggfunc= together
- normalize='index'
- keys=
Answer: values= and aggfunc= together. values= supplies the numeric Series and aggfunc= says how to summarise it; both are required together.
What happens if you pass values= without aggfunc=?
- It defaults to count
- It averages automatically
- It returns the values unchanged
- pandas raises an error
Answer: pandas raises an error. values= and aggfunc= must travel together; one without the other raises an error.
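The error can be observed directly. The sketch below uses a made-up frame and catches the exception so the script keeps running; the exact message wording may differ between pandas versions.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "dept":   ["Eng", "Eng", "Sales", "Sales"],
    "level":  ["Jr", "Sr", "Jr", "Sr"],
    "salary": [60, 100, 50, 80],
})

error_message = None
try:
    # values= is given but aggfunc= is missing, so pandas refuses.
    pd.crosstab(df["dept"], df["level"], values=df["salary"])
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```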
For a 6-row frame with depts Eng/Sales each appearing across Jr and Sr, what is the 'All' total cell with margins=True?
- 3
- 6
- 2
- 4
Answer: 6. The bottom-right 'All' cell is the grand total of rows, which is 6.
How does crosstab differ from pivot_table?
- They are identical
- pivot_table only counts
- crosstab defaults to counting and accepts plain Series; pivot_table works from existing columns and defaults to averaging
- crosstab cannot use aggfunc
Answer: crosstab defaults to counting and accepts plain Series; pivot_table works from existing columns and defaults to averaging. crosstab is a frequency-focused wrapper taking Series; pivot_table works from DataFrame columns and averages by default.
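To make the contrast concrete, here is a sketch that runs both functions on the same made-up frame. The data and the salary column are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data; salary figures (in thousands) are invented.
df = pd.DataFrame({
    "dept":   ["Eng", "Eng", "Eng", "Sales", "Sales", "Sales"],
    "level":  ["Jr", "Sr", "Sr", "Jr", "Jr", "Sr"],
    "salary": [60, 100, 110, 50, 54, 80],
})

# crosstab: pass the Series themselves; cells are counts by default.
ct = pd.crosstab(df["dept"], df["level"])

# pivot_table: pass column names; cells are means by default.
pt_mean = df.pivot_table(index="dept", columns="level", values="salary")

# Asking pivot_table to count reproduces the crosstab numbers here.
pt_count = df.pivot_table(
    index="dept", columns="level", values="salary", aggfunc="count"
)

print(ct)
print(pt_mean)
print(pt_count)
```

The count-based pivot table matches the crosstab in this example because every dept/level pair occurs at least once and no salary is missing.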