Cross-Tabulation (crosstab)

A cross-tabulation counts how often each combination of two categories occurs, laying one variable down the rows and the other across the columns to reveal their relationship at a glance.

Part of the free Pandas course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

You'll learn pd.crosstab(rows, cols), percentages with normalize=, totals with margins=True, and value summaries with values= and aggfunc=.

The simplest call, pd.crosstab(df["dept"], df["level"]), builds a frequency table: each cell is the number of rows that have that dept and that level. Add margins=True to tack on an All row and column with the totals.
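As a minimal sketch of that call, the example below builds a small frame and tabulates it with and without totals. The six rows of staff data are invented for illustration; only the dept and level column names come from the lesson.

```python
import pandas as pd

# Hypothetical staff data, invented for illustration.
df = pd.DataFrame({
    "dept":  ["Eng", "Eng", "Eng", "Sales", "Sales", "Sales"],
    "level": ["Jr", "Sr", "Sr", "Jr", "Jr", "Sr"],
})

# The first argument becomes the rows, the second the columns.
counts = pd.crosstab(df["dept"], df["level"])

# margins=True appends an "All" row and an "All" column with the totals.
with_totals = pd.crosstab(df["dept"], df["level"], margins=True)

print(counts)
print(with_totals)
```

With this data, Eng has one Jr and two Sr rows, and the bottom-right All cell equals the number of rows in the frame.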

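Raw counts can mislead when groups are different sizes. normalize= rescales the table into proportions: normalize=True divides every cell by the grand total, normalize='index' makes each row sum to 1, and normalize='columns' makes each column sum to 1.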
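The three normalize settings can be compared side by side in a short sketch. The data is the same kind of invented six-row frame, and the variable names overall, by_row and by_col are only illustrative.

```python
import pandas as pd

# Hypothetical staff data, invented for illustration.
df = pd.DataFrame({
    "dept":  ["Eng", "Eng", "Eng", "Sales", "Sales", "Sales"],
    "level": ["Jr", "Sr", "Sr", "Jr", "Jr", "Sr"],
})

# Every cell divided by the grand total: the whole table sums to 1.
overall = pd.crosstab(df["dept"], df["level"], normalize=True)

# Row proportions: each row sums to 1.
by_row = pd.crosstab(df["dept"], df["level"], normalize="index")

# Column proportions: each column sums to 1.
by_col = pd.crosstab(df["dept"], df["level"], normalize="columns")

print(overall)
print(by_row)
print(by_col)
```

Row proportions answer "what share of each dept is Sr?", while column proportions answer "what share of all Sr staff sits in each dept?".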

A crosstab does not have to count. Supply values= with a numeric Series and aggfunc= with how to summarise it, and each cell becomes that statistic for the combination — for example the average salary of each dept/level pair instead of a head count.
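A rough sketch of the average-salary case follows. The salary figures are made up for illustration, and aggfunc="mean" is just one possible summary.

```python
import pandas as pd

# Hypothetical staff data with invented salaries (in thousands).
df = pd.DataFrame({
    "dept":   ["Eng", "Eng", "Eng", "Sales", "Sales", "Sales"],
    "level":  ["Jr", "Sr", "Sr", "Jr", "Jr", "Sr"],
    "salary": [70, 100, 120, 50, 60, 90],
})

# values= supplies the numbers, aggfunc= says how to summarise them per cell.
avg_salary = pd.crosstab(
    df["dept"], df["level"],
    values=df["salary"],
    aggfunc="mean",
)

print(avg_salary)
```

Here the Eng/Sr cell holds the mean of 100 and 120 rather than the count of 2.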

Two stores log which category each sale belongs to. Compare their mix.
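One possible way to approach this exercise is sketched below. The store names, categories and sales rows are all hypothetical, since the lesson's dataset is not shown here. Row proportions are used so that stores with different sales volumes can be compared fairly.

```python
import pandas as pd

# Hypothetical sales log: one row per sale (invented data).
sales = pd.DataFrame({
    "store":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "category": ["Food", "Food", "Toys", "Books", "Food", "Toys", "Toys", "Toys"],
})

# Each row shows one store's category mix as proportions summing to 1.
mix = pd.crosstab(sales["store"], sales["category"], normalize="index")

print(mix)
```

A category that a store never sold shows up as 0 in that store's row.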

Lesson complete — you can profile categories fast!

You can count combinations with pd.crosstab, add totals with margins=True, switch to percentages with normalize=, and summarise values with values= plus aggfunc=.

🚀 Up next: Reading & Writing SQL — move data between pandas and databases.

Practice quiz

What does a plain pd.crosstab(rows, cols) compute in each cell?

  • The mean of a value
  • The sum of a value
  • The count of rows with that row/column combination
  • A correlation

Answer: The count of rows with that row/column combination. By default crosstab builds a frequency table: each cell counts the matching rows.

In pd.crosstab(a, b), what does the first argument become?

  • The rows (index)
  • The columns
  • The values
  • The totals

Answer: The rows (index). The first argument forms the rows (index) and the second forms the columns.

What must you pass to crosstab as the row and column arguments?

  • Column name strings
  • A dictionary
  • A list of bins
  • The actual Series

Answer: The actual Series. crosstab takes the data itself, such as df["dept"], not a column-name string; a string is treated as data rather than looked up as a column.

What does margins=True add?

  • Percentages
  • An 'All' row and column with the totals
  • A new value column
  • Sorting

Answer: An 'All' row and column with the totals. margins=True appends an 'All' row and column holding the marginal totals.

What does normalize=True do?

  • Each cell is divided by the grand total (whole table sums to 1)
  • Each row sums to 1
  • Each column sums to 1
  • Nothing changes

Answer: Each cell is divided by the grand total (whole table sums to 1). normalize=True divides every cell by the grand total so the whole table sums to 1.

Which normalize value makes each row sum to 1 (row percentages)?

  • normalize=True
  • normalize='columns'
  • normalize='index'
  • normalize='rows'

Answer: normalize='index'. normalize='index' rescales so each row's values sum to 1.

To compute the average salary per dept/level cell instead of a count, you pass...

  • margins=True
  • values= and aggfunc= together
  • normalize='index'
  • keys=

Answer: values= and aggfunc= together. values= supplies the numeric Series and aggfunc= says how to summarise it; both are required together.

What happens if you pass values= without aggfunc=?

  • It defaults to count
  • It averages automatically
  • It returns the values unchanged
  • pandas raises an error

Answer: pandas raises an error. values= and aggfunc= must travel together; one without the other raises an error.
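The pairing rule can be checked with a small sketch that tries each argument on its own and records the ValueError pandas raises. The data and the errors list are invented for illustration.

```python
import pandas as pd

# Hypothetical staff data with invented salaries (in thousands).
df = pd.DataFrame({
    "dept":   ["Eng", "Eng", "Sales", "Sales"],
    "level":  ["Jr", "Sr", "Jr", "Sr"],
    "salary": [70, 110, 55, 90],
})

errors = []
# Try values= alone, then aggfunc= alone; both calls should be rejected.
for kwargs in ({"values": df["salary"]}, {"aggfunc": "mean"}):
    try:
        pd.crosstab(df["dept"], df["level"], **kwargs)
    except ValueError as err:
        errors.append(str(err))

for message in errors:
    print(message)
```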

For a 6-row frame with depts Eng/Sales each appearing across Jr and Sr, what is the 'All' total cell with margins=True?

  • 3
  • 6
  • 2
  • 4

Answer: 6. The bottom-right 'All' cell is the grand total: every row is counted in exactly one cell, so the total equals the number of rows, which is 6.

How does crosstab differ from pivot_table?

  • They are identical
  • pivot_table only counts
  • crosstab defaults to counting and accepts plain Series; pivot_table works from existing columns and defaults to averaging
  • crosstab cannot use aggfunc

Answer: crosstab defaults to counting and accepts plain Series; pivot_table works from existing columns and defaults to averaging. crosstab is a frequency-focused wrapper taking Series; pivot_table works from DataFrame columns and averages by default.
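The difference can be seen in a short side-by-side sketch. The frame and its salary figures are invented for illustration.

```python
import pandas as pd

# Hypothetical staff data with invented salaries (in thousands).
df = pd.DataFrame({
    "dept":   ["Eng", "Eng", "Eng", "Sales", "Sales", "Sales"],
    "level":  ["Jr", "Sr", "Sr", "Jr", "Jr", "Sr"],
    "salary": [70, 100, 120, 50, 60, 90],
})

# crosstab: pass the Series themselves; the default is a count.
ct = pd.crosstab(df["dept"], df["level"])

# pivot_table: name columns of an existing DataFrame; the default is the mean.
pt = df.pivot_table(index="dept", columns="level", values="salary")

# Asking crosstab for the mean explicitly gives the same numbers as pivot_table.
ct_mean = pd.crosstab(df["dept"], df["level"], values=df["salary"], aggfunc="mean")

print(ct)
print(pt)
print(ct_mean)
```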