Model Interpretability (SHAP & LIME)
Open the black box. Learn to explain why a model made a decision using feature importance, SHAP values, LIME, and partial dependence — the difference between a model you trust and one you don't.
Part of the free AI & Machine Learning course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
Imagine a bank rejects your loan. "Computer says no" isn't good enough: by law in many jurisdictions, and by basic fairness, you deserve to know why. A good loan officer explains: "Your income helped, but three missed payments last year pushed you over the line." That breakdown is exactly what SHAP and LIME produce for a model.
Interpretability turns an opaque score into a sentence a human can check, challenge, and trust — and that's what separates a deployable model from a risky one.
Some models are white-box — you can read the decision straight out of them. Linear and logistic regression expose coefficients; a small decision tree is a flowchart. Others are black-box — deep neural networks and large gradient-boosting ensembles are often the most accurate, but their internal reasoning is opaque.
The tension is real: the most accurate model is frequently the least interpretable. Rather than always downgrading to a simpler model, we use post-hoc, model-agnostic explanation tools to interrogate the black box after training.
Global feature importance ranks which features matter most overall. But it can't tell you why a specific customer was rejected. SHAP values can: borrowed from cooperative game theory (Shapley values), they fairly split a single prediction among its features.
Their defining property is additivity: for one prediction, the SHAP values sum to the gap between that prediction and the model's baseline (average) output. So you can read exactly which features pushed the result up and which pulled it down. Run this hand-built version.
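A minimal sketch of that additivity check, computing exact Shapley values by brute force over every feature ordering. The scoring function, the feature names, and the four background applicants are all hypothetical stand-ins, and the baseline is assumed to be the average prediction over that background set. Real SHAP libraries use faster approximations than this.

```python
from itertools import permutations

# Hypothetical background applicants; their average prediction is the baseline.
BACKGROUND = [
    {"income_k": 40, "missed_payments": 0, "years_employed": 2},
    {"income_k": 50, "missed_payments": 1, "years_employed": 5},
    {"income_k": 70, "missed_payments": 0, "years_employed": 10},
    {"income_k": 40, "missed_payments": 2, "years_employed": 3},
]

def approval_score(x):
    """Toy stand-in for a black box: tenure only counts with a clean record."""
    score = 0.30 + 0.004 * x["income_k"] - 0.10 * x["missed_payments"]
    if x["missed_payments"] == 0:
        score += 0.02 * x["years_employed"]
    return score

def coalition_value(predict, instance, background, present):
    """Average prediction when only the `present` features take the instance's values."""
    total = 0.0
    for row in background:
        mixed = {name: instance[name] if name in present else row[name] for name in row}
        total += predict(mixed)
    return total / len(background)

def shapley_values(predict, instance, background):
    """Exact Shapley values: average each feature's marginal gain over all orderings."""
    names = list(instance)
    orderings = list(permutations(names))
    totals = {name: 0.0 for name in names}
    for order in orderings:
        present = set()
        previous = coalition_value(predict, instance, background, present)
        for name in order:
            present.add(name)
            current = coalition_value(predict, instance, background, present)
            totals[name] += current - previous
            previous = current
    return {name: total / len(orderings) for name, total in totals.items()}

applicant = {"income_k": 60, "missed_payments": 3, "years_employed": 4}
baseline = coalition_value(approval_score, applicant, BACKGROUND, set())
prediction = approval_score(applicant)
values = shapley_values(approval_score, applicant, BACKGROUND)

for name, value in values.items():
    print(f"{name:16s} {value:+.3f}")
total = sum(values.values())
print(f"baseline {baseline:.3f} + contributions {total:+.3f} = {baseline + total:.3f}")
print(f"model prediction {prediction:.3f}")
```

In this toy setup income pushes the score up and missed payments pull it down, and the last two printed lines agree: baseline plus contributions reproduces the prediction.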
LIME (Local Interpretable Model-agnostic Explanations) takes a different route: to explain one prediction, it nudges the input around that point, watches how the black box reacts, and fits a simple model locally to approximate it. The simple model's coefficients become the explanation — fast and intuitive, though only faithful near that one point.
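The perturb, weight, and fit loop can be sketched in plain Python. This is an illustration of the idea under stated assumptions, not the `lime` library's implementation: `black_box` is a made-up model, the neighbours come from a small fixed grid (the real library samples randomly) so the result is reproducible, and the kernel width of 0.5 is arbitrary.

```python
import math
from itertools import product

def black_box(x):
    """Stand-in opaque model: non-linear in x[0], linear in x[1]."""
    return x[0] ** 2 + 3.0 * x[1]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def lime_sketch(predict, point, offsets=(-0.2, -0.1, 0.0, 0.1, 0.2), width=0.5):
    """Fit a proximity-weighted linear model to the black box around `point`."""
    rows, targets, weights = [], [], []
    for deltas in product(offsets, repeat=len(point)):
        neighbour = [p + d for p, d in zip(point, deltas)]
        rows.append([1.0] + list(deltas))           # intercept + offsets from the point
        targets.append(predict(neighbour))          # query the black box
        dist2 = sum(d * d for d in deltas)
        weights.append(math.exp(-dist2 / width ** 2))  # closer neighbours count more
    k = len(point) + 1
    # Weighted normal equations: (X^T W X) beta = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, rows, targets)) for i in range(k)]
    beta = solve(xtwx, xtwy)
    return beta[0], beta[1:]

intercept, coefs = lime_sketch(black_box, [2.0, 1.0])
print([round(c, 3) for c in coefs])
```

Around the point (2, 1) the local slopes come out at about 4 for the first feature and 3 for the second. The first slope is only valid near that point: at x[0] = 5 the same black box would have a local slope near 10, which is the "only faithful near that one point" caveat in practice.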
A partial dependence plot (PDP) gives the global complement: it shows how the predicted outcome changes as one feature varies across its range, averaging over the others — revealing whether a feature's effect is steady, threshold-like, or non-linear.
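The averaging behind a PDP is short enough to write by hand. The sketch below assumes a made-up scoring function and four invented applicants. `pdp_curve` is a hypothetical helper name, not a library function.

```python
def approval_score(row):
    """Hypothetical black box: row = [income_k, missed_payments]."""
    income_k, missed_payments = row
    base = 0.6 if missed_payments < 2 else 0.1   # sharp drop at two missed payments
    return base + 0.002 * income_k

def pdp_curve(predict, rows, feature, grid):
    """For each grid value, force `feature` to that value in every row and average."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

rows = [[30, 0], [50, 1], [80, 3], [40, 2]]
grid = [0, 1, 2, 3]
curve = pdp_curve(approval_score, rows, feature=1, grid=grid)
for value, average in zip(grid, curve):
    print(f"missed_payments={value}: average score {average:.2f}")
```

The averaged score stays near 0.70 for zero or one missed payment and falls to about 0.20 from two onward, which is the threshold-like shape a plotted PDP would show.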
Here's how the libraries are called. Study it (it isn't runnable in the in-browser sandbox) and notice that all three are applied after training to an already-fitted model.
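A rough sketch of those calls, assuming the `shap`, `lime`, and `scikit-learn` packages are installed and that `model` is an already-fitted tree-based scikit-learn classifier. The function name and its arguments are hypothetical, and the inputs are assumed to be NumPy arrays. The imports sit inside the function so the file still loads where the libraries are missing.

```python
def explain_fitted_model(model, X_train, X_test, feature_names):
    """Apply SHAP, LIME, and partial dependence to a model that is already trained."""
    import shap
    from lime.lime_tabular import LimeTabularExplainer
    from sklearn.inspection import partial_dependence

    # SHAP: per-feature contributions for each row of X_test.
    # The exact shape of the result depends on the shap version and model type.
    shap_explainer = shap.TreeExplainer(model)
    shap_values = shap_explainer.shap_values(X_test)

    # LIME: local linear surrogate around the first test row.
    lime_explainer = LimeTabularExplainer(
        X_train, feature_names=feature_names, mode="classification"
    )
    lime_explanation = lime_explainer.explain_instance(
        X_test[0], model.predict_proba, num_features=5
    )

    # PDP: average prediction as feature 0 sweeps across its range.
    pdp = partial_dependence(model, X_train, features=[0])

    return shap_values, lime_explanation.as_list(), pdp
```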
TreeExplainer is the fast SHAP variant for tree/boosting models; LIME and PDP treat the model as a queryable black box.
In regulated fields such as lending, healthcare, hiring, and insurance, a decision must be explainable and auditable. People affected have a right to know why, and organisations must audit models for bias and fairness. Interpretability also helps you debug: a feature dominating for the wrong reason often reveals leakage or a flawed pipeline.
Fill in the blanks so the contributions sum to give the prediction. The expected output is in the comments.
Find the feature that pushed a prediction up the most. Fill in the blank in the max() key.
Build a full SHAP-style explanation from baseline to prediction and name the biggest driver. Only a comment outline is provided.
These are the traps when explaining models. Watch for them.
❌ Treating an explanation as ground truth
SHAP and LIME describe the model, not reality: a confident explanation of a wrong model is still wrong.
✅ Fix: read them as how the model behaves, and corroborate:
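One way to corroborate, sketched under assumptions: score the model on held-out data before reporting what it leans on. The leaky model, the `account_flag` column, the contribution numbers, and the 0.8 accuracy threshold are all hypothetical.

```python
def holdout_accuracy(predict, rows, labels):
    """Share of held-out rows the model labels correctly."""
    correct = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return correct / len(rows)

def describe(contributions, accuracy, min_accuracy=0.8):
    """Phrase the explanation as model behaviour and flag a model that fails the check."""
    top = max(contributions, key=lambda name: abs(contributions[name]))
    message = f"The model leans most on '{top}'"
    if accuracy < min_accuracy:
        message += f", but it scores only {accuracy:.0%} on held-out data, so debug before acting"
    return message

def leaky_model(row):
    """Hypothetical model that keys entirely on one suspicious column."""
    return 1 if row["account_flag"] == 1 else 0

holdout_rows = [{"account_flag": 1}, {"account_flag": 0}, {"account_flag": 1}, {"account_flag": 0}]
holdout_labels = [0, 0, 1, 1]

accuracy = holdout_accuracy(leaky_model, holdout_rows, holdout_labels)
message = describe({"account_flag": 0.45, "income": 0.02}, accuracy)
print(message)
```

The explanation is perfectly clear about `account_flag`, yet the model is right only half the time on held-out rows, so the attribution points at something to debug, not at a fact about applicants.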
❌ Confusing global importance with a single decision
Global importance can't explain why one specific person was rejected.
✅ Fix: use local SHAP/LIME for one prediction:
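A small illustration with invented per-applicant contributions (the numbers are hypothetical, not output from any library): the feature that ranks first globally need not be the one behind a particular rejection.

```python
# Hypothetical SHAP-style contributions, one dict per applicant.
contributions = [
    {"income": 0.20, "missed_payments": -0.02, "age": 0.01},
    {"income": -0.25, "missed_payments": -0.01, "age": 0.02},
    {"income": 0.05, "missed_payments": -0.30, "age": 0.00},  # the rejected applicant
]

def global_importance(rows):
    """Mean absolute contribution per feature across all applicants."""
    names = rows[0].keys()
    return {name: sum(abs(row[name]) for row in rows) / len(rows) for name in names}

def top_driver(row):
    """Feature with the largest absolute contribution for one applicant."""
    return max(row, key=lambda name: abs(row[name]))

overall = global_importance(contributions)
print("global top feature:", max(overall, key=overall.get))
print("driver for the rejected applicant:", top_driver(contributions[2]))
```

Across the three applicants income matters most, but the rejection in the third row is driven by missed payments, which only the local view reveals.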
❌ Using a black box when the law needs transparency
Sometimes a post-hoc explanation isn't enough and a directly interpretable model is required.
✅ Fix: prefer an inherently interpretable model when mandated:
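As a sketch of what "inherently interpretable" buys you, here is a hand-written logistic scorecard whose weights are the explanation. The weights, bias, and feature names are made up for illustration. In practice they would come from a fitted logistic regression.

```python
import math

# Hypothetical coefficients in log-odds units; a fitted model would supply these.
WEIGHTS = {"income_k": 0.04, "missed_payments": -0.9}
BIAS = -1.0

def explain(applicant):
    """Each feature's contribution to the log-odds is just weight * value."""
    return {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}

def approve_probability(applicant):
    """Logistic regression by hand: sigmoid of bias plus summed contributions."""
    log_odds = BIAS + sum(explain(applicant).values())
    return 1.0 / (1.0 + math.exp(-log_odds))

applicant = {"income_k": 50, "missed_payments": 3}
parts = explain(applicant)
probability = approve_probability(applicant)

for name, value in parts.items():
    print(f"{name:16s} {value:+.2f} log-odds")
print(f"approval probability {probability:.2f}")
```

No separate explainer is needed here: an auditor can read the two weights, multiply them by the applicant's values, and reproduce the decision by hand.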
You can now explain a model's decisions with feature importance, SHAP values, LIME, and partial dependence plots, and you know when to choose a white-box model outright.
🎓 That wraps up the Classical Machine Learning track — from SVMs and clustering through boosting, forecasting, anomaly detection, and now interpretability. Well done!
Practice quiz
What is model interpretability?
- Understanding why a model made a given prediction
- Making a model faster
- Adding more features
- Increasing accuracy
Answer: Understanding why a model made a given prediction. Interpretability is about explaining a model's predictions — why it decided what it did, in human-understandable terms.
Which of these is generally an interpretable ('white-box') model?
- Deep neural network
- Linear/logistic regression
- Random forest with 500 trees
- Large gradient boosting ensemble
Answer: Linear/logistic regression. Linear and logistic regression (and small decision trees) are directly readable; deep nets and big ensembles are black boxes.
What does global feature importance tell you?
- Why one specific prediction was made
- The training time
- Which features matter most across the whole model
- The number of layers
Answer: Which features matter most across the whole model. Global feature importance ranks features by their overall contribution across all predictions, not for a single case.
SHAP values are based on a concept from:
- Thermodynamics
- Cooperative game theory (Shapley values)
- Calculus only
- Random sampling alone
Answer: Cooperative game theory (Shapley values). SHAP uses Shapley values from game theory to fairly attribute a prediction among the features that contributed to it.
A key property of SHAP values for a single prediction is that they:
- Are always equal
- Sum to the difference between the prediction and the average prediction
- Ignore the features
- Are random each run
Answer: Sum to the difference between the prediction and the average prediction. SHAP values are additive: they sum to how far this prediction sits from the model's baseline (average) output.
How does LIME explain a single prediction?
- By retraining the whole model
- By fitting a simple local model around that one point
- By deleting features permanently
- By clustering the data
Answer: By fitting a simple local model around that one point. LIME perturbs the input around one instance and fits a simple, interpretable model locally to approximate the black box there.
What does a partial dependence plot (PDP) show?
- The training loss curve
- How the predicted outcome changes as one feature varies, averaging over the rest
- The number of trees
- The confusion matrix
Answer: How the predicted outcome changes as one feature varies, averaging over the rest. A PDP shows the marginal effect of a feature on predictions, averaging out the other features to reveal its overall shape.
Why does interpretability matter in regulated fields like lending or healthcare?
- It speeds up training
- Decisions must be explained and audited for fairness and legal reasons
- It removes the need for data
- It increases the learning rate
Answer: Decisions must be explained and audited for fairness and legal reasons. Regulators and affected people need to know why a model decided as it did, to check for fairness, bias, and legality.
What is the main trade-off interpretability methods address?
- Speed vs memory
- Accuracy of complex models vs the need to understand them
- Color vs size
- Rows vs columns
Answer: Accuracy of complex models vs the need to understand them. Powerful black-box models are often most accurate; interpretability tools let you keep accuracy while still explaining decisions.
SHAP and LIME are examples of what kind of explanation method?
- Model-specific only
- Data-cleaning tools
- Loss functions
- Post-hoc, model-agnostic explanations
Answer: Post-hoc, model-agnostic explanations. Both are post-hoc (applied after training) and largely model-agnostic — they can explain almost any trained model.