ANOVA & Model Comparison
ANOVA (Analysis of Variance) is a statistical test that uses an F-statistic to decide whether the means of three or more groups differ by more than random variation alone would explain.
Part of the free R course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
By the end of this lesson you'll fit one-way and two-way ANOVAs with aov(), read every column of the ANOVA table, run TukeyHSD to find which groups differ, and use anova() to compare nested models.
What You'll Learn in This Lesson
1️⃣ One-Way ANOVA and the F-Test
One-way ANOVA tests whether a numeric outcome differs across the levels of a single grouping factor. You fit it with aov(outcome ~ group) and read the result with summary(). The key output is the F value and its Pr(>F) p-value.
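As a minimal sketch of the workflow just described, the code below fits a one-way ANOVA on R's built-in PlantGrowth data. The dataset and the variable names (fit, tab, f_value, p_value, f_by_hand) are illustrative choices, not part of the lesson's own exercise.

```r
# One-way ANOVA on the built-in PlantGrowth data (30 plants in 3 groups).
fit <- aov(weight ~ group, data = PlantGrowth)

# summary() returns a list; its first element is the ANOVA table.
tab <- summary(fit)[[1]]
print(tab)

# Row 1 is the group term, row 2 is the residuals.
f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]

# F is the between-group mean square divided by the within-group mean square.
f_by_hand <- tab[["Mean Sq"]][1] / tab[["Mean Sq"]][2]
```

Pulling the table out of summary() this way lets you reuse the F value and p-value in later code instead of reading them off the printed output.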
2️⃣ Which Groups Differ? TukeyHSD
A significant ANOVA only says "some group differs." To find which pairs differ, run TukeyHSD() on the fitted model. It reports each pairwise difference with a confidence interval and an adjusted p-value, correcting for multiple comparisons.
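A short sketch of the follow-up step, again assuming the built-in PlantGrowth data as a stand-in example. The names tukey, pairwise and sig_pairs are hypothetical, and the 0.05 cutoff is just the conventional threshold.

```r
# Refit the one-way model so this block runs on its own.
fit <- aov(weight ~ group, data = PlantGrowth)

# TukeyHSD() returns one matrix per factor in the model, named after the factor.
tukey <- TukeyHSD(fit)
pairwise <- tukey$group
print(pairwise)

# Columns: diff (difference in means), lwr and upr (confidence interval),
# and "p adj" (p-value adjusted for multiple comparisons).
sig_pairs <- rownames(pairwise)[pairwise[, "p adj"] < 0.05]
print(sig_pairs)
```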
3️⃣ Two-Way ANOVA and Comparing Models
Two-way ANOVA studies two factors at once and whether they interact. The formula y ~ a * b expands to both main effects plus the a:b interaction. A significant interaction means the effect of one factor depends on the other.
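To illustrate the formula expansion and the interaction row, here is a sketch using the built-in ToothGrowth data, with supp and dose standing in for the factors a and b. The dataset choice and the names tg, fit2, tab2 and p_interaction are assumptions made for the example.

```r
# ToothGrowth stores dose as a number, so convert it to a factor first.
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# supp * dose is shorthand for supp + dose + supp:dose.
term_labels <- attr(terms(len ~ supp * dose), "term.labels")
print(term_labels)

# Two-way ANOVA with both main effects and the interaction.
fit2 <- aov(len ~ supp * dose, data = tg)
tab2 <- summary(fit2)[[1]]
print(tab2)

# Row 3 of the table is the supp:dose interaction.
p_interaction <- tab2[["Pr(>F)"]][3]
```

If p_interaction is small, read the main effects with care: the effect of supp would then depend on which dose level you look at.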
Separately, anova() compares two nested linear models — a simpler one against a richer one — to decide whether the extra predictors earn their keep.
Your turn. Fill in the # TODO blank, run it, and interpret the table.
Fit two nested models on mtcars and let anova() referee. The lesson is the discipline of adding complexity only when the data justify it.
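One way the comparison described above might look. The choice of wt as the base predictor and hp as the added one is an assumption for illustration, not taken from the exercise itself; m1, m2, cmp and p_added are hypothetical names.

```r
# Simpler model: fuel economy explained by weight alone.
m1 <- lm(mpg ~ wt, data = mtcars)

# Richer model: the same predictor plus horsepower, so m1 is nested in m2.
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# anova() runs an F-test on the terms that m2 adds to m1.
cmp <- anova(m1, m2)
print(cmp)

# Row 2 holds the test for the added term.
p_added <- cmp[["Pr(>F)"]][2]
```

A small p_added suggests the extra predictor improves the fit by more than chance would; a large one suggests keeping the simpler model.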
📋 Quick Reference — ANOVA
Practice quiz
Which function fits a one-way ANOVA model in R?
- lm.anova(y, group)
- anova.fit(y ~ group)
- aov(y ~ group)
- fisher(y ~ group)
Answer: aov(y ~ group). aov() fits the ANOVA model; you read it with summary().
What does the F-statistic represent in ANOVA?
- Between-group variation divided by within-group variation
- Within-group divided by between-group variation
- The total sum of squares
- The number of groups minus one
Answer: Between-group variation divided by within-group variation. F is the ratio of between-group mean square to within-group mean square.
A small Pr(>F) value (e.g. < 0.05) tells you that...
- All group means are exactly equal
- The model failed to fit
- The residuals are normal
- At least one group mean differs from the others
Answer: At least one group mean differs from the others. A small p-value is evidence that group means are not all equal.
Why must the grouping variable usually be a factor?
- Factors run faster than character vectors
- Otherwise numbers may be treated as continuous, not as groups
- aov() rejects character vectors entirely
- Factors use less memory
Answer: Otherwise numbers may be treated as continuous, not as groups. Wrap a grouping variable in factor() so R treats levels as distinct groups.
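A small sketch of why this matters, assuming the built-in ToothGrowth data, where dose is stored as a number (the names as_number, as_factor, df_number and df_factor are illustrative):

```r
# dose left as a number: R fits a single slope, so the term uses 1 df.
as_number <- aov(len ~ dose, data = ToothGrowth)
df_number <- summary(as_number)[[1]][["Df"]][1]

# dose wrapped in factor(): three distinct groups, so the term uses 2 df.
as_factor <- aov(len ~ factor(dose), data = ToothGrowth)
df_factor <- summary(as_factor)[[1]][["Df"]][1]

print(c(numeric = df_number, factor = df_factor))
```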
After a significant ANOVA, which function finds WHICH pairs differ?
- TukeyHSD(fit)
- summary(fit)
- predict(fit)
- coef(fit)
Answer: TukeyHSD(fit). TukeyHSD() compares every pair while adjusting for multiple comparisons.
In the formula y ~ a * b, what does a * b expand to?
- a:b only
- a + b only
- a + b + a:b
- a - b
Answer: a + b + a:b. The * shorthand includes both main effects plus the a:b interaction.
What does a significant a:b interaction term mean?
- Both factors are irrelevant
- The effect of one factor depends on the level of the other
- The two factors are perfectly correlated
- The model has no main effects
Answer: The effect of one factor depends on the level of the other. An interaction means you can't describe one factor without mentioning the other.
What does anova(m1, m2) do for two nested lm models?
- Plots the residuals of both models
- Returns the AIC of each model
- Refits both models on new data
- F-tests whether the extra terms significantly improve the fit
Answer: F-tests whether the extra terms significantly improve the fit. anova() compares nested models with an F-test on the added terms.
For anova() to compare two models validly, the models must be...
- Completely unrelated
- Nested and fit on the same rows
- Fit with different datasets
- Both intercept-only models
Answer: Nested and fit on the same rows. anova() requires nested models fit on the same data; for non-nested models fit on the same data, compare them with AIC() instead.
How do you read a column of the ANOVA table named Pr(>F)?
- It is the degrees of freedom
- It is the residual sum of squares
- It is the p-value for that term's F-test
- It is the mean of the outcome
Answer: It is the p-value for that term's F-test. Pr(>F) is the p-value; small values flag a significant effect.