Generalized Linear Models (glm)

A generalized linear model extends ordinary linear regression to outcomes that aren't continuous — like yes/no results or counts — by adding a "family" and a link function, with logistic regression (family = binomial) being the most common example.

Part of the free R course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

In this lesson you'll see why lm() fails for 0/1 outcomes, fit a logistic model with glm(..., family = binomial), read the summary() coefficients, convert them to odds ratios, and get probabilities with predict(type = "response").

1️⃣ Why glm()? From lm to Families

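Ordinary lm() can predict any real number, so for a 0/1 outcome it can return nonsense like a probability of 1.4. glm() fixes this with a family that matches your outcome type and a link function that keeps predictions in the valid range; for family = binomial, the default logit link keeps predicted probabilities between 0 and 1.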
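Here is a minimal sketch of the problem. The hours and passed vectors are hypothetical data invented for illustration, not the lesson's own dataset. Fitting lm() to a 0/1 outcome and predicting near the edges of the data can give values below 0 or above 1.

```r
# Hypothetical data: hours studied and whether the student passed (1) or failed (0)
hours  <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
passed <- c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)

# Ordinary linear regression treats the 0/1 outcome as a continuous number
linear_fit <- lm(passed ~ hours)

# Predict at the low end, the middle, and beyond the high end of the data
new_hours   <- data.frame(hours = c(0, 5, 15))
linear_pred <- predict(linear_fit, newdata = new_hours)
print(linear_pred)

# Flag any prediction that cannot be a probability
print(linear_pred < 0 | linear_pred > 1)
```

Predictions outside [0, 1] cannot be read as probabilities, which is the motivation for switching to glm() with a binomial family.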

Now fit a real logistic model. The formula syntax is identical to lm() (outcome ~ predictor); you just add family = binomial.
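As a sketch, the fit might look like this on a small invented dataset. The hours and passed vectors are hypothetical; they stand in for whatever outcome and predictor you have.

```r
# Hypothetical data: hours studied and whether the student passed (1) or failed (0)
hours  <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
passed <- c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)

# Same formula as lm(), plus a family that matches the binary outcome
logit_fit <- glm(passed ~ hours, family = binomial)

# Coefficients are reported on the log-odds scale
print(summary(logit_fit))
```

The only change from the lm() call is the function name and the family argument; the binomial family uses the logit link by default.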

2️⃣ Interpreting Coefficients: Odds Ratios

Logistic coefficients live on the log-odds scale, which is awkward to talk about. Exponentiating with exp() turns each into an odds ratio — a multiplier on the odds per unit change in the predictor.
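A short sketch of that conversion, assuming a model fitted to invented data (hours and passed are hypothetical). The interval shown is a Wald interval from confint.default(), chosen here for simplicity; other interval methods exist.

```r
# Hypothetical data: hours studied and whether the student passed (1) or failed (0)
hours  <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
passed <- c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)
logit_fit <- glm(passed ~ hours, family = binomial)

# Raw coefficients: change in log-odds per one-unit change in the predictor
print(coef(logit_fit))

# Exponentiate to get odds ratios: multiplier on the odds per one-unit change
odds_ratios <- exp(coef(logit_fit))
print(odds_ratios)

# Odds ratios alongside Wald 95% intervals, exponentiated onto the same scale
or_table <- cbind(odds_ratio = odds_ratios, exp(confint.default(logit_fit)))
print(or_table)
```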

The rule of thumb: an odds ratio above 1 means the predictor raises the odds of the outcome, below 1 lowers them, and exactly 1 means no effect. This single sentence unlocks most of logistic-regression interpretation.
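To see what "a multiplier on the odds" means in practice, the sketch below compares the predicted odds at two predictor values one unit apart and checks that their ratio matches the exponentiated coefficient. The data are hypothetical, invented for illustration.

```r
# Hypothetical data: hours studied and whether the student passed (1) or failed (0)
hours  <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
passed <- c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)
logit_fit <- glm(passed ~ hours, family = binomial)

# Predicted probabilities at 5 and 6 hours (one unit apart)
p <- predict(logit_fit, newdata = data.frame(hours = c(5, 6)), type = "response")

# Convert probabilities to odds: p / (1 - p)
odds <- p / (1 - p)

# Ratio of the two odds versus the exponentiated coefficient
or_from_predictions <- unname(odds[2] / odds[1])
or_from_coefficient <- unname(exp(coef(logit_fit)["hours"]))
print(c(from_predictions = or_from_predictions,
        from_coefficient = or_from_coefficient))
```

The two numbers agree for any pair of values one unit apart, because the model is linear on the log-odds scale.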

3️⃣ Predicting Probabilities

To predict for new data, use predict() — but remember it defaults to the log-odds scale. Add type = "response" to get back genuine probabilities you can act on.
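A minimal sketch of the two scales, using invented data (hours, passed and new_students are hypothetical). It also checks that applying the inverse logit, plogis(), to the default output reproduces the type = "response" probabilities.

```r
# Hypothetical data: hours studied and whether the student passed (1) or failed (0)
hours  <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
passed <- c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)
logit_fit <- glm(passed ~ hours, family = binomial)

new_students <- data.frame(hours = c(2, 5.5, 9))

# Default: predictions on the link (log-odds) scale, which can be any real number
log_odds <- predict(logit_fit, newdata = new_students)
print(log_odds)

# type = "response": predictions converted to probabilities
probs <- predict(logit_fit, newdata = new_students, type = "response")
print(probs)

# The inverse logit of the log-odds gives the same probabilities
print(all.equal(plogis(log_odds), probs))
```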

Plotting those probabilities against the predictor reveals the signature S-shaped (sigmoid) logistic curve: predictions hug 0 at low values, rise steeply through the middle, and flatten toward 1 — never escaping the valid [0, 1] range.
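One way to draw that curve with base graphics is sketched below. The data and the grid of predictor values are hypothetical, chosen only to make the shape visible.

```r
# Hypothetical data: hours studied and whether the student passed (1) or failed (0)
hours  <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
passed <- c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)
logit_fit <- glm(passed ~ hours, family = binomial)

# A fine grid of predictor values to trace a smooth curve
grid <- data.frame(hours = seq(0, 11, by = 0.1))
curve_probs <- predict(logit_fit, newdata = grid, type = "response")

# Observed 0/1 outcomes as points, fitted probabilities as a line
plot(hours, passed,
     xlim = c(0, 11), ylim = c(0, 1),
     xlab = "Hours studied", ylab = "Probability of passing")
lines(grid$hours, curve_probs)

# The fitted probabilities never leave the valid range
print(range(curve_probs))
```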

Real models use several predictors, including categorical ones. Fit a default-risk model on income plus a credit rating, interpret both odds ratios, predict for a new applicant, and compare AIC.
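The sketch below walks through those steps on simulated data. Everything about the data is an assumption made for illustration: the loans data frame, the variable names, the sample size and the coefficients used to simulate defaults are all invented, not taken from the lesson.

```r
# Simulated (hypothetical) loan data; the generating coefficients are arbitrary
set.seed(42)
n      <- 200
income <- round(runif(n, min = 20, max = 100))   # annual income, in thousands
rating <- factor(sample(c("good", "fair", "poor"), n, replace = TRUE),
                 levels = c("good", "fair", "poor"))   # "good" is the baseline
rating_effect <- c(good = 0, fair = 0.8, poor = 1.8)
true_log_odds <- 0.5 - 0.04 * income + unname(rating_effect[as.character(rating)])
defaulted     <- rbinom(n, size = 1, prob = plogis(true_log_odds))
loans <- data.frame(defaulted, income, rating)

# Two candidate models: income only, and income plus the categorical rating
income_only <- glm(defaulted ~ income, family = binomial, data = loans)
full_model  <- glm(defaulted ~ income + rating, family = binomial, data = loans)

# Odds ratios: income is per extra thousand; each rating level is versus "good"
print(exp(coef(full_model)))

# Predicted default probability for one new applicant
applicant <- data.frame(income = 45,
                        rating = factor("fair", levels = levels(loans$rating)))
applicant_prob <- predict(full_model, newdata = applicant, type = "response")
print(applicant_prob)

# Compare the two models; a lower AIC indicates the preferred model
aic_table <- AIC(income_only, full_model)
print(aic_table)
```

R expands the factor into one coefficient per non-baseline level (here ratingfair and ratingpoor), so each of those odds ratios compares that level with the baseline level at the same income.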

Practice quiz

Which family argument fits a logistic regression for a 0/1 outcome?

  • family = binomial
  • family = gaussian
  • family = poisson
  • family = ordinal

Answer: family = binomial. family = binomial gives logistic regression for binary outcomes.

Which family makes glm() behave like ordinary lm() for a continuous outcome?

  • family = poisson
  • family = gaussian
  • family = binomial
  • family = Gamma

Answer: family = gaussian. family = gaussian (with its identity link) is glm()'s default and reproduces the lm() fit.
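A quick check of that equivalence, sketched on R's built-in mtcars dataset (the choice of mpg and wt as variables is arbitrary):

```r
# Ordinary linear regression on a continuous outcome
lm_fit <- lm(mpg ~ wt, data = mtcars)

# The same model through glm() with the gaussian family (also glm()'s default)
glm_fit <- glm(mpg ~ wt, family = gaussian, data = mtcars)

# The coefficient estimates agree
print(coef(lm_fit))
print(coef(glm_fit))
print(all.equal(coef(lm_fit), coef(glm_fit)))
```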

Which family is appropriate for COUNT data (0, 1, 2, ...)?

  • family = binomial
  • family = gaussian
  • family = poisson
  • family = uniform

Answer: family = poisson. family = poisson models count outcomes.
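For illustration, a count model might be fitted as sketched here. The ad_spend and signups vectors are hypothetical data invented for this example. The poisson family uses a log link by default, so exponentiated coefficients are multipliers on the expected count.

```r
# Hypothetical data: advertising spend and the number of sign-ups that followed
ad_spend <- c(1, 2, 3, 4, 5, 6, 7, 8)
signups  <- c(0, 1, 1, 3, 2, 5, 6, 9)

# Poisson regression for a count outcome
count_fit <- glm(signups ~ ad_spend, family = poisson)

# Multiplier on the expected count per one-unit increase in ad_spend
print(exp(coef(count_fit)))

# Expected count for a new value; type = "response" undoes the log link
expected_signups <- predict(count_fit,
                            newdata = data.frame(ad_spend = 10),
                            type = "response")
print(expected_signups)
```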

On what scale are raw logistic-regression coefficients reported?

  • Probabilities
  • Counts
  • Percentages
  • Log-odds (logit)

Answer: Log-odds (logit). Logistic coefficients are on the log-odds scale, which is why they are hard to read directly.

What does exp(coef(model)) give you?

  • Odds ratios
  • Probabilities
  • p-values
  • Standard errors

Answer: Odds ratios. Exponentiating the log-odds coefficients yields interpretable odds ratios.

An odds ratio of exactly 1 means:

  • The predictor doubles the odds
  • The predictor halves the odds
  • No effect on the odds
  • The model failed to fit

Answer: No effect on the odds. OR > 1 raises the odds, < 1 lowers them, and exactly 1 means no effect.

What does predict() on a glm return by DEFAULT?

  • Probabilities
  • Predictions on the link (log-odds) scale
  • Class labels
  • The raw data

Answer: Predictions on the link (log-odds) scale. predict() defaults to the link scale; for logistic that is log-odds.

Which argument makes predict() return PROBABILITIES in [0, 1]?

  • type = "prob"
  • scale = "response"
  • type = "odds"
  • type = "response"

Answer: type = "response". type = "response" applies the inverse link to give probabilities.

What happens if you forget family = binomial on 0/1 data?

  • glm() defaults to gaussian (like lm), giving wrong results
  • glm() errors out
  • It silently uses poisson
  • It returns exact probabilities anyway

Answer: glm() defaults to gaussian (like lm), giving wrong results. Without a family argument, glm() fits a gaussian model: it runs without an error, but it treats the 0/1 outcome as continuous and can predict values outside [0, 1].

In the summary(), what does the Pr(>|z|) column report?

  • The odds ratio
  • The p-value testing whether the coefficient differs from zero
  • The AIC
  • The confidence interval width

Answer: The p-value testing whether the coefficient differs from zero. Pr(>|z|) is the two-sided p-value from each coefficient's z test; by default, summary() marks values below 0.05 with one or more significance stars.
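If you want those numbers as data rather than printed text, one approach is to pull the coefficient table out of the summary object, as sketched below on invented data (hours and passed are hypothetical).

```r
# Hypothetical data: hours studied and whether the student passed (1) or failed (0)
hours  <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
passed <- c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)
logit_fit <- glm(passed ~ hours, family = binomial)

# The coefficient table as a numeric matrix:
# columns are Estimate, Std. Error, z value, Pr(>|z|)
coef_table <- coef(summary(logit_fit))
print(coef_table)

# Extract just the p-values, one per coefficient
p_values <- coef_table[, "Pr(>|z|)"]
print(p_values)
```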