Generalized Linear Models (glm)
A generalized linear model extends ordinary linear regression to outcomes that aren't continuous — like yes/no results or counts — by adding a "family" and a link function, with logistic regression (family = binomial) being the most common example.
Part of the free R course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
In this lesson you'll see why lm() fails for 0/1 outcomes, fit a logistic model with glm(..., family = binomial), read the summary() coefficients, convert them to odds ratios, and get probabilities with predict(type = "response").
1️⃣ Why glm()? From lm to Families
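Ordinary lm() treats the outcome as continuous and can predict any real number, so for a 0/1 outcome it can return nonsense like a probability of 1.4 or a negative probability. glm() fixes this by pairing a family (the assumed distribution of the outcome) with a link function that maps the linear predictor onto a valid range. For family = binomial the default logit link keeps every predicted probability between 0 and 1.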
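A minimal sketch of the problem, assuming the built-in mtcars data as a stand-in for the lesson's dataset (the choice of vs as the 0/1 outcome and mpg as the predictor is ours, not the lesson's). It fits a straight line to the 0/1 column, asks for predictions at a few mpg values, and then shows how the inverse logit squeezes any real number into the valid range.

```r
# Assumption: mtcars (ships with R) stands in for the lesson's data;
# vs is a 0/1 column (engine shape) and mpg is a numeric predictor.
lin_fit <- lm(vs ~ mpg, data = mtcars)

# Ask the straight line for "probabilities" at low, typical and high mpg
new_cars <- data.frame(mpg = c(5, 20, 40))
lin_pred <- predict(lin_fit, newdata = new_cars)
print(round(lin_pred, 2))   # first value falls below 0, last one above 1

# plogis() is the inverse of the logit link (the binomial family's default):
# it maps any real number into the open interval (0, 1)
squeezed <- plogis(c(-5, 0, 5))
print(round(squeezed, 3))
```

The linear fit has no way to know that the outcome is bounded, which is why the extreme predictions leave [0, 1]; the logit link builds that bound into the model.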
Now fit a real logistic model. The formula syntax is identical to lm(), outcome ~ predictor; you just add family = binomial.
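As a hedged illustration, the fit might look like the following. The dataset (mtcars) and the variables vs and mpg are assumptions chosen because they ship with R; the lesson's own data is not shown here.

```r
# Assumption: mtcars stands in for the lesson's data (vs is 0/1, mpg is numeric)
fit <- glm(vs ~ mpg, data = mtcars, family = binomial)

# Coefficient table with columns Estimate, Std. Error, z value, Pr(>|z|)
print(summary(fit))

# The family object records the distribution and the link in use
print(family(fit))
```

Only the family argument differs from the equivalent lm() call; the logit link is picked automatically because it is the binomial family's default.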
2️⃣ Interpreting Coefficients: Odds Ratios
Logistic coefficients live on the log-odds scale, which is awkward to talk about. Exponentiating with exp() turns each slope into an odds ratio: the factor by which the odds of the outcome are multiplied for a one-unit increase in that predictor, holding any other predictors fixed.
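The rule of thumb: an odds ratio above 1 means the odds of the outcome rise as the predictor increases, below 1 means they fall, and exactly 1 means no effect. For example, an odds ratio of 1.5 means each one-unit increase multiplies the odds by 1.5, a 50% increase in the odds (not in the probability). This rule covers most of logistic-regression interpretation.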
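The sketch below computes odds ratios and then checks the "multiplier on the odds" reading by hand. It assumes the same stand-in model on mtcars (vs ~ mpg), and it uses confint.default(), which gives Wald intervals; profile-likelihood intervals from confint() are a common alternative.

```r
# Assumption: mtcars stands in for the lesson's data (vs is 0/1, mpg is numeric)
fit <- glm(vs ~ mpg, data = mtcars, family = binomial)

# Odds ratios: exponentiate the log-odds coefficients
odds_ratios <- exp(coef(fit))
print(odds_ratios)

# Wald 95% intervals, moved to the odds-ratio scale
print(exp(confint.default(fit)))

# Check by hand: odds at mpg = 21 divided by odds at mpg = 20
# should equal the odds ratio for mpg
p <- predict(fit, newdata = data.frame(mpg = c(20, 21)), type = "response")
odds <- p / (1 - p)
ratio <- unname(odds[2] / odds[1])
print(all.equal(ratio, unname(odds_ratios["mpg"])))
```

The same ratio would come out for any two mpg values one unit apart, which is what makes the odds ratio a single, constant summary of the predictor's effect.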
3️⃣ Predicting Probabilities
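To predict for new data, pass a data frame to predict() through its newdata argument. Remember that predict() on a glm defaults to the link scale, which is log-odds for logistic regression. Add type = "response" to get back genuine probabilities you can act on.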
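A short sketch of both scales side by side, again assuming the stand-in mtcars model; the new_cars data frame is made up for illustration.

```r
# Assumption: mtcars stands in for the lesson's data (vs is 0/1, mpg is numeric)
fit <- glm(vs ~ mpg, data = mtcars, family = binomial)

# Hypothetical new observations; column names must match the predictors
new_cars <- data.frame(mpg = c(15, 20, 25))

log_odds <- predict(fit, newdata = new_cars)                     # default: link scale
probs    <- predict(fit, newdata = new_cars, type = "response")  # probabilities

print(log_odds)
print(probs)

# type = "response" is just the inverse logit applied to the link-scale values
print(all.equal(plogis(log_odds), probs))
```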
Plotting those probabilities against the predictor reveals the signature S-shaped (sigmoid) logistic curve. With a positive coefficient, predictions hug 0 at low values, rise steeply through the middle, and flatten toward 1 (a negative coefficient gives the mirror image), never escaping the valid [0, 1] range.
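One way to draw the curve with base graphics is sketched below, assuming the stand-in mtcars model; the grid limits of 0 to 45 mpg are an arbitrary choice made so that both flat tails are visible.

```r
# Assumption: mtcars stands in for the lesson's data (vs is 0/1, mpg is numeric)
fit <- glm(vs ~ mpg, data = mtcars, family = binomial)

# Predict over a wide, evenly spaced grid of predictor values
grid <- data.frame(mpg = seq(0, 45, by = 0.5))
grid$prob <- predict(fit, newdata = grid, type = "response")

# Observed 0/1 points with the fitted probability curve on top
plot(mtcars$mpg, mtcars$vs,
     xlim = range(grid$mpg), ylim = c(0, 1),
     xlab = "mpg", ylab = "P(vs = 1)")
lines(grid$mpg, grid$prob, lwd = 2)

# The curve crosses 0.5 where the log-odds are zero: x = -intercept / slope
midpoint <- unname(-coef(fit)[1] / coef(fit)[2])
print(midpoint)
```

The steepest part of the curve sits at that midpoint, where a small change in the predictor moves the probability the most.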
Your turn. Fill in the # TODO blank and run it.
Real models use several predictors, including categorical ones. Fit a default-risk model on income plus a credit rating, interpret both odds ratios, predict for a new applicant, and compare AIC between candidate models (lower AIC indicates a better balance of fit and complexity).
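The lesson's default-risk data is not included here, so the sketch below simulates a stand-in. Everything about the applicants data frame is hypothetical: the column names, the three rating levels, the coefficients used to generate outcomes, and the income-only model used as the AIC comparison.

```r
set.seed(42)  # simulated data, purely illustrative
n <- 500
applicants <- data.frame(
  income = round(rnorm(n, mean = 50, sd = 15)),   # hypothetical, in thousands
  rating = factor(sample(c("good", "fair", "poor"), n, replace = TRUE),
                  levels = c("good", "fair", "poor"))   # "good" is the baseline
)

# Made-up relationship, used only to generate 0/1 default outcomes
log_odds <- -1 - 0.04 * (applicants$income - 50) +
  0.8 * (applicants$rating == "fair") +
  1.6 * (applicants$rating == "poor")
applicants$default <- rbinom(n, size = 1, prob = plogis(log_odds))

income_only <- glm(default ~ income, data = applicants, family = binomial)
full_model  <- glm(default ~ income + rating, data = applicants, family = binomial)

# One odds ratio for income, plus one per non-baseline rating level
print(round(exp(coef(full_model)), 3))

# Predict for a single new applicant, reusing the training factor levels
new_applicant <- data.frame(
  income = 45,
  rating = factor("fair", levels = levels(applicants$rating))
)
p_default <- predict(full_model, newdata = new_applicant, type = "response")
print(p_default)

# Lower AIC suggests a better balance of fit and complexity
print(AIC(income_only, full_model))
```

For a categorical predictor, each odds ratio compares one level with the baseline level, so ratingpoor is read as "poor versus good" at the same income.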
📋 Quick Reference — glm
Practice quiz
Which family argument fits a logistic regression for a 0/1 outcome?
- family = binomial
- family = gaussian
- family = poisson
- family = ordinal
Answer: family = binomial. family = binomial gives logistic regression for binary outcomes.
Which family makes glm() behave like ordinary lm() for a continuous outcome?
- family = poisson
- family = gaussian
- family = binomial
- family = Gamma
Answer: family = gaussian. family = gaussian is glm()'s default; with its identity link it gives the same coefficient estimates as lm().
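A quick check of that equivalence, assuming mtcars with mpg as a continuous outcome and wt as the predictor (both chosen only for illustration):

```r
# Assumption: mtcars supplies a continuous outcome (mpg) and a predictor (wt)
lm_fit  <- lm(mpg ~ wt, data = mtcars)
glm_fit <- glm(mpg ~ wt, data = mtcars)   # no family given, so gaussian is used

print(family(glm_fit)$family)
print(all.equal(coef(lm_fit), coef(glm_fit)))
```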
Which family is appropriate for COUNT data (0, 1, 2, ...)?
- family = binomial
- family = gaussian
- family = poisson
- family = uniform
Answer: family = poisson. family = poisson models count outcomes.
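For comparison with the logistic examples, here is a hedged sketch of a count model. It assumes the built-in warpbreaks data, where breaks is a count of yarn breaks and tension is a three-level factor; exponentiated Poisson coefficients are read as rate ratios rather than odds ratios.

```r
# Assumption: warpbreaks (ships with R) stands in for count data
count_fit <- glm(breaks ~ tension, data = warpbreaks, family = poisson)

# The poisson family uses a log link by default
print(family(count_fit))

# Rate ratios: expected count at each tension level relative to the baseline (L)
rate_ratios <- exp(coef(count_fit))
print(round(rate_ratios, 3))
```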
On what scale are raw logistic-regression coefficients reported?
- Probabilities
- Counts
- Percentages
- Log-odds (logit)
Answer: Log-odds (logit). Logistic coefficients are on the log-odds scale, which is why they are hard to read directly.
What does exp(coef(model)) give you?
- Odds ratios
- Probabilities
- p-values
- Standard errors
Answer: Odds ratios. Exponentiating the log-odds coefficients yields interpretable odds ratios.
An odds ratio of exactly 1 means:
- The predictor doubles the odds
- The predictor halves the odds
- No effect on the odds
- The model failed to fit
Answer: No effect on the odds. OR > 1 raises the odds, < 1 lowers them, and exactly 1 means no effect.
What does predict() on a glm return by DEFAULT?
- Probabilities
- Predictions on the link (log-odds) scale
- Class labels
- The raw data
Answer: Predictions on the link (log-odds) scale. predict() defaults to the link scale; for logistic that is log-odds.
Which argument makes predict() return PROBABILITIES in [0, 1]?
- type = "prob"
- scale = "response"
- type = "odds"
- type = "response"
Answer: type = "response". type = "response" applies the inverse link to give probabilities.
What happens if you forget family = binomial on 0/1 data?
- glm() defaults to gaussian (like lm), giving wrong results
- glm() errors out
- It silently uses poisson
- It returns exact probabilities anyway
Answer: glm() defaults to gaussian (like lm), giving wrong results. Without a family, glm() uses gaussian, which is inappropriate for binary outcomes.
In the summary(), what does the Pr(>|z|) column report?
- The odds ratio
- The p-value testing whether the coefficient differs from zero
- The AIC
- The confidence interval width
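Answer: The p-value testing whether the coefficient differs from zero. Pr(>|z|) is the two-sided p-value from the z-test (estimate divided by its standard error) for each coefficient; values below the conventional cutoffs are flagged with significance stars.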
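To work with that column directly rather than reading it off the printout, the coefficient table can be pulled out as a matrix. This sketch assumes the stand-in mtcars model (vs ~ mpg) and recomputes the p-values by hand as a sanity check.

```r
# Assumption: mtcars stands in for the lesson's data (vs is 0/1, mpg is numeric)
fit <- glm(vs ~ mpg, data = mtcars, family = binomial)

# coef(summary(fit)) returns the coefficient table as a numeric matrix
coef_table <- coef(summary(fit))
print(coef_table)

p_values <- coef_table[, "Pr(>|z|)"]

# Recompute by hand: z = estimate / standard error, then the two-sided
# tail area under the standard normal curve
z <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]
print(all.equal(2 * pnorm(-abs(z)), p_values))
```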