Bayesian Statistics in R

Bayesian statistics updates beliefs with evidence: start from a prior, combine it with the likelihood of the data, and get a posterior distribution that quantifies your remaining uncertainty.

Part of the free R course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

By the end of this lesson you'll understand priors, likelihoods, and posteriors, fit Bayesian regressions with brms and rstanarm via MCMC, and read credible intervals and posterior predictive checks.

What You'll Learn in This Lesson

1️⃣ Bayes' Theorem: Prior × Likelihood

The posterior is proportional to the prior times the likelihood. With a conjugate prior the math is clean: a Beta(a, b) prior combined with k successes in n Binomial trials gives a Beta(a + k, b + n - k) posterior, no sampling required.
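The conjugate update can be sketched in a few lines of base R. The prior Beta(2, 2) and the data (7 successes in 10 trials) are hypothetical values, chosen only so that the result is the Beta(9, 5) posterior this lesson uses for credible intervals.

```r
# Hypothetical prior and data, chosen only for illustration
prior_a   <- 2    # Beta shape1: prior pseudo-count of successes
prior_b   <- 2    # Beta shape2: prior pseudo-count of failures
successes <- 7
trials    <- 10

# Conjugate update: add successes to shape1 and failures to shape2
post_a <- prior_a + successes
post_b <- prior_b + (trials - successes)

# Mean of a Beta(a, b) distribution is a / (a + b)
post_mean <- post_a / (post_a + post_b)

cat("Posterior: Beta(", post_a, ", ", post_b, ")\n", sep = "")
cat("Posterior mean:", round(post_mean, 3), "\n")
```

With these numbers the posterior is Beta(9, 5), so no MCMC is needed to summarize it.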

2️⃣ Credible Intervals

A credible interval is a direct probability statement about a parameter, conditional on the model and prior. For a Beta(9, 5) posterior, qbeta() gives the 2.5% and 97.5% quantiles, which bound the central 95% credible interval.
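A minimal sketch of that calculation, using only base R. The variable names are illustrative; the Beta(9, 5) shape parameters come from the text above.

```r
# Shape parameters of the Beta(9, 5) posterior
post_a <- 9
post_b <- 5

# Central 95% credible interval: the 2.5% and 97.5% posterior quantiles
ci_95 <- qbeta(c(0.025, 0.975), shape1 = post_a, shape2 = post_b)
names(ci_95) <- c("lower", "upper")
print(round(ci_95, 3))

# Sanity check: posterior probability mass between the two bounds
prob_inside <- unname(
  pbeta(ci_95["upper"], post_a, post_b) -
    pbeta(ci_95["lower"], post_a, post_b)
)
cat("Posterior probability inside the interval:", round(prob_inside, 3), "\n")
```

The check with pbeta() makes the interpretation concrete: the interval holds 95% of the posterior probability by construction.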

3️⃣ Bayesian Regression with brms

brms writes Stan code for you. You give a formula and priors; it runs MCMC chains and returns posterior means, credible intervals, and the Rhat convergence diagnostic.
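A minimal sketch of such a fit follows. It assumes brms and a working Stan toolchain are installed, and it skips the fit otherwise. The built-in mtcars data, the formula, and the normal(0, 10) slope prior are illustrative choices rather than part of this lesson's own example.

```r
# Illustrative model: fuel economy as a linear function of car weight
model_formula <- mpg ~ wt

# Fitting compiles a Stan program, so the first run can take a minute or more.
# The fit is skipped when brms is not installed.
if (requireNamespace("brms", quietly = TRUE)) {
  library(brms)

  fit <- brm(
    formula = model_formula,
    data    = mtcars,
    family  = gaussian(),
    # Assumed weakly informative prior on the regression slope(s)
    prior   = set_prior("normal(0, 10)", class = "b"),
    chains  = 4,      # number of independent MCMC chains
    iter    = 2000,   # iterations per chain, including warmup
    seed    = 123     # for reproducible draws
  )

  # Posterior estimates, 95% credible intervals, and Rhat per parameter
  print(summary(fit))
}
```

Running several chains is what makes the Rhat diagnostic meaningful, since it compares the chains with one another.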

4️⃣ Posterior Predictive Checks

A model that fits well should generate data that look like the data it was fitted to. pp_check() simulates datasets from the posterior predictive distribution and overlays them on the observed data so you can spot misfit at a glance.
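To show what a posterior predictive check does under the hood, here is a hand-rolled sketch in base R for a Beta-Binomial model. It is not pp_check() itself, and the prior and data (Beta(2, 2), 7 successes in 10 trials) are hypothetical.

```r
set.seed(42)

# Hypothetical observed data
observed_successes <- 7
trials <- 10

# Posterior under an assumed Beta(2, 2) prior
post_a <- 2 + observed_successes
post_b <- 2 + (trials - observed_successes)

# Step 1: draw parameter values from the posterior
n_draws <- 4000
theta_draws <- rbeta(n_draws, post_a, post_b)

# Step 2: simulate one replicated dataset (a success count) per draw
replicated_successes <- rbinom(n_draws, size = trials, prob = theta_draws)

# Step 3: compare the replicated counts with the observed count
replicated_table <- table(factor(replicated_successes, levels = 0:trials))
print(round(replicated_table / n_draws, 3))

# Share of replicated datasets at least as large as the observed count
share_at_least_observed <- mean(replicated_successes >= observed_successes)
cat("Share of replications >= observed:", round(share_at_least_observed, 3), "\n")
```

If the observed count sat far out in the tail of the replicated counts, that would be a sign of misfit. For a fitted brms model, pp_check() draws the equivalent comparison as a plot.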

Your turn. Fill in the # TODO blank, run it, and compare with the expected output.

Update a Beta prior with spam-filter data and report the posterior mean and a 90% credible interval. The conjugate Beta-Binomial pair keeps the arithmetic simple.
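One way the exercise could be worked is sketched below. The flat Beta(1, 1) prior and the counts (12 spam messages out of 40) are hypothetical stand-ins, since the exercise's own numbers are not shown here.

```r
# Hypothetical prior and data; substitute the exercise's own values
prior_a <- 1      # flat Beta(1, 1) prior
prior_b <- 1
spam    <- 12     # messages flagged as spam
emails  <- 40     # total messages inspected

# Conjugate Beta-Binomial update
post_a <- prior_a + spam
post_b <- prior_b + (emails - spam)

# Posterior mean of the spam rate
post_mean <- post_a / (post_a + post_b)

# Central 90% credible interval: the 5% and 95% posterior quantiles
ci_90 <- qbeta(c(0.05, 0.95), shape1 = post_a, shape2 = post_b)

cat("Posterior: Beta(", post_a, ", ", post_b, ")\n", sep = "")
cat("Posterior mean:", round(post_mean, 3), "\n")
cat("90% credible interval:", round(ci_90, 3), "\n")
```

A 90% interval leaves 5% in each tail, which is why the quantiles are 0.05 and 0.95 rather than 0.025 and 0.975.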

📋 Quick Reference — Bayesian R

Practice quiz

In Bayesian terms, what does the prior represent?

  • Beliefs about a parameter before seeing the data
  • The final answer
  • The observed data
  • The p-value

Answer: Beliefs about a parameter before seeing the data. The prior encodes what you believe about a parameter before observing the current data.

What does the likelihood describe?

  • The sample size
  • The prior belief
  • How probable the observed data is given a parameter value
  • The posterior mean

Answer: How probable the observed data is given a parameter value. The likelihood is the probability (or probability density) of the observed data, viewed as a function of the parameter value.

Bayes' theorem says the posterior is proportional to which combination?

  • likelihood divided by prior
  • prior times likelihood
  • posterior times evidence
  • prior minus likelihood

Answer: prior times likelihood. Posterior is proportional to prior times likelihood (then normalized by the evidence).

Which R packages fit Bayesian regression models via Stan?

  • dplyr and tidyr
  • ggplot2 and plotly
  • knitr and rmarkdown
  • brms and rstanarm

Answer: brms and rstanarm. brms and rstanarm provide Bayesian regression interfaces built on the Stan engine.

What is a credible interval?

  • A range with a stated posterior probability of containing the parameter
  • The same as a confidence interval
  • The range of the raw data
  • The interval between two priors

Answer: A range with a stated posterior probability of containing the parameter. A 95% credible interval contains the parameter with 95% posterior probability, given the model.

What does MCMC sampling produce?

  • Exact closed-form integrals
  • Draws that approximate the posterior distribution
  • The prior distribution
  • A single point estimate only

Answer: Draws that approximate the posterior distribution. Markov chain Monte Carlo draws samples whose distribution approximates the posterior.

Which statement about credible vs confidence intervals is correct?

  • Confidence intervals require a prior
  • Credible intervals ignore the data
  • They are identical by definition
  • A credible interval makes a direct probability statement about the parameter

Answer: A credible interval makes a direct probability statement about the parameter. A credible interval directly states the probability the parameter lies in the range; a confidence interval is about the procedure's long-run coverage.

What is a posterior predictive check used for?

  • Counting MCMC chains
  • Choosing the optimizer
  • Comparing data simulated from the fitted model to the observed data
  • Setting the prior automatically

Answer: Comparing data simulated from the fitted model to the observed data. Posterior predictive checks compare simulated replicated data to the observed data to assess model fit.

In brms, which argument lets you specify your prior beliefs?

  • chains
  • prior
  • iter
  • family

Answer: prior. The prior argument (often built with set_prior or prior()) specifies prior distributions in brms.

What does an Rhat value near 1.00 indicate in MCMC output?

  • The chains have converged
  • The model has too few predictors
  • The prior is wrong
  • The data is missing

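Answer: The chains have converged. Rhat compares between-chain and within-chain variance, so a value close to 1.00 means the chains agree with one another, which is consistent with convergence. Values noticeably above 1 (for example, above 1.01) suggest running the chains longer or revisiting the model.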