Anomaly Detection
Find the rare points that don't belong — the fraud, the defect, the intrusion. Learn statistical rules plus Isolation Forest, One-Class SVM, and Local Outlier Factor.
Part of the free AI & Machine Learning course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
Picture a bank watching thousands of card transactions an hour. Almost all are ordinary: coffee, groceries, fuel. Then one charges $9,000 in a foreign country at 3am. It doesn't fit the pattern, so it gets flagged for review. That is anomaly detection.
The challenge is that fraud is rare: you mostly have examples of "normal" and very few of "weird". So instead of learning to tell two balanced classes apart, anomaly detectors learn what normal looks like and flag whatever sits far outside it.
An anomaly (or outlier) is a point that differs markedly from the rest. The difficulty is rarity and missing labels: anomalies might be 0.1% of the data, and often nobody has tagged them. That makes the classes wildly imbalanced: a lazy classifier could call everything "normal" and score 99.9% accuracy while catching nothing useful.
So most anomaly detection is unsupervised: learn the shape of "normal" and flag what falls outside it. The methods range from simple statistical rules to dedicated algorithms.
The simplest detectors work on a single feature. The z-score rule measures how many standard deviations a point is from the mean and flags anything beyond a threshold (often |z| > 3). It assumes the data is roughly bell-shaped. The IQR rule is more robust: it uses quartiles and flags points below Q1 - 1.5*IQR or above Q3 + 1.5*IQR, where IQR = Q3 - Q1.
Run both on a series with one obvious outlier and compare.
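The comparison can be sketched with Python's standard library alone. The readings below are made up for illustration, and the helper names zscore_outliers and iqr_outliers are hypothetical, not part of the lesson's own code.

```python
import statistics

# Made-up sensor readings: 19 ordinary values and one obvious outlier (95.0).
readings = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3, 9.5, 12.4, 10.6, 11.8,
            10.1, 11.1, 9.9, 12.0, 10.4, 11.6, 10.8, 11.9, 10.5, 95.0]


def zscore_outliers(data, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(data)
    std = statistics.pstdev(data)  # population standard deviation
    return [x for x in data if abs(x - mean) / std > threshold]


def iqr_outliers(data, k=1.5):
    """Return values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


z_flagged = zscore_outliers(readings)
iqr_flagged = iqr_outliers(readings)
print("z-score rule flags:", z_flagged)
print("IQR rule flags:    ", iqr_flagged)
```

On this series both rules single out 95.0. Note that statistics.quantiles uses one particular quartile convention; other tools may compute slightly different fences on small samples.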
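When you have many features, dedicated algorithms shine:
- Isolation Forest: randomly partitions the data; anomalies are isolated in fewer splits than normal points, so they end up with short path lengths.
- One-Class SVM: fits a boundary around the normal data and flags anything outside it as anomalous.
- Local Outlier Factor (LOF): compares a point's local density with that of its neighbours; a point in a much sparser region than its neighbours is an outlier.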
Here are all three model-based detectors side by side. Study the code (it isn't runnable in the in-browser sandbox) and notice the contamination or nu parameter, which tells each detector roughly how many anomalies to expect.
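A minimal sketch of such a comparison, assuming scikit-learn and NumPy are installed. The toy readings, the choice of n_neighbors=5, and the slightly loose nu=0.15 are assumptions made for this example, not values taken from the lesson.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Made-up data: 19 ordinary readings and one obvious outlier, 95 (last item).
values = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3, 9.5, 12.4, 10.6, 11.8,
          10.1, 11.1, 9.9, 12.0, 10.4, 11.6, 10.8, 11.9, 10.5, 95.0]
X = np.array(values).reshape(-1, 1)  # scikit-learn expects a 2-D array

# One point in twenty is anomalous, so contamination is 0.05.
iso = IsolationForest(contamination=0.05, random_state=0)
iso_labels = iso.fit_predict(X)

lof = LocalOutlierFactor(n_neighbors=5, contamination=0.05)
lof_labels = lof.fit_predict(X)

# nu is an upper bound on the fraction of training points left outside the
# boundary; it is set a little loosely here (an assumption for this tiny set).
svm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.15)
svm_labels = svm.fit_predict(X)

# Each detector returns -1 for anomalies and 1 for normal points.
for name, labels in [("Isolation Forest", iso_labels),
                     ("Local Outlier Factor", lof_labels),
                     ("One-Class SVM", svm_labels)]:
    print(f"{name}: label for 95 is {labels[-1]}")
```

Because nu is only an upper bound, One-Class SVM may also mark a borderline normal reading or two; on real data you would check its flags against known cases before trusting them.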
All three flag the value 95 as -1. In practice you'd set contamination from how often anomalies really occur.
Most detectors need to know roughly how many anomalies to expect: the contamination (or nu) parameter. It sets the threshold between normal scores and anomalous ones. Set it from domain knowledge: if about 1% of transactions are fraudulent, contamination is roughly 0.01.
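To make the link between contamination and the threshold concrete, here is a simplified sketch. It assumes a higher score means "more anomalous" and uses made-up scores; flag_by_contamination is a hypothetical helper, and real libraries derive the cut-off from a percentile of their own internal scores rather than exactly this way.

```python
def flag_by_contamination(scores, contamination):
    """Label the highest-scoring fraction of points as -1, the rest as 1.

    Assumes higher score = more anomalous (a convention chosen for this sketch).
    """
    n_flag = max(1, round(contamination * len(scores)))
    # The n_flag-th largest score becomes the threshold.
    threshold = sorted(scores, reverse=True)[n_flag - 1]
    return [-1 if s >= threshold else 1 for s in scores]


# Hypothetical anomaly scores for 10 points; only the 0.92 is a true anomaly.
scores = [0.11, 0.09, 0.14, 0.10, 0.92, 0.12, 0.08, 0.13, 0.35, 0.10]

labels_10pct = flag_by_contamination(scores, 0.10)
labels_30pct = flag_by_contamination(scores, 0.30)
print("contamination=0.10:", labels_10pct)
print("contamination=0.30:", labels_30pct)
```

With contamination at 0.10 only the 0.92 point is flagged. Raising it to 0.30 forces three flags, so two ordinary points become false alarms.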
Fill in the blank so is_outlier() compares the absolute z-score to the threshold. The expected output is in the comments.
Detectors return -1 for anomalies and 1 for normal. Fill in the two blanks to translate the label into words.
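For reference, one way the finished translation might look. The function name describe and the sample labels are hypothetical, since the exercise scaffold is not reproduced here.

```python
def describe(label):
    """Translate a detector label into words (-1 = anomaly, 1 = normal)."""
    return "anomaly" if label == -1 else "normal"


labels = [1, 1, -1, 1]  # hypothetical detector output
words = [describe(label) for label in labels]
print(words)
```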
Implement the robust IQR outlier rule from scratch and flag the outlier. Only a comment outline is provided.
These are the traps that catch anomaly-detection beginners. Watch for them.
With 0.1% anomalies, calling everything "normal" scores 99.9% accuracy and catches nothing.
✅ Fix: use precision, recall, and F1 on the anomaly class:
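A small sketch of the difference, written from scratch so the arithmetic is visible. The labels are invented for illustration (999 normal points, 1 anomaly), and anomaly_metrics is a hypothetical helper.

```python
def anomaly_metrics(y_true, y_pred, anomaly_label=-1):
    """Return (precision, recall, f1) computed on the anomaly class only."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == anomaly_label and p == anomaly_label)
    fp = sum(1 for t, p in pairs if t != anomaly_label and p == anomaly_label)
    fn = sum(1 for t, p in pairs if t == anomaly_label and p != anomaly_label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented ground truth: 999 normal points (1) and a single anomaly (-1).
y_true = [1] * 999 + [-1]

# A lazy "detector" that calls everything normal.
lazy_pred = [1] * 1000
lazy_accuracy = sum(t == p for t, p in zip(y_true, lazy_pred)) / len(y_true)
lazy_scores = anomaly_metrics(y_true, lazy_pred)

# A detector that catches the anomaly but also raises one false alarm.
detector_pred = [-1] + [1] * 998 + [-1]
detector_scores = anomaly_metrics(y_true, detector_pred)

print("lazy accuracy:", lazy_accuracy)
print("lazy precision/recall/F1:", lazy_scores)
print("detector precision/recall/F1:", detector_scores)
```

The lazy predictor reaches 99.9% accuracy yet scores zero on every anomaly-class metric, while the second detector's one false alarm shows up as a precision of 0.5.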
The mean and std are themselves distorted by the very outliers you're hunting.
✅ Fix: use the robust IQR rule (quartile-based):
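The masking effect and the fix can be seen on a ten-point sample. The values below are made up for illustration. The outlier inflates the standard deviation so much that its own z-score stays just under 3, while the quartiles barely move.

```python
import statistics

# Made-up sample of ten values with one obvious outlier (95).
data = [10, 11, 12, 10, 11, 13, 12, 11, 10, 95]

# z-score rule: the outlier drags the mean up and inflates the std.
mean = statistics.mean(data)
std = statistics.pstdev(data)
z_of_95 = abs(95 - mean) / std
z_flagged = [x for x in data if abs(x - mean) / std > 3]

# IQR rule: quartiles are computed from ranks, so one extreme value
# has little effect on the fences.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
iqr_flagged = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print("z-score of 95:", round(z_of_95, 3))
print("z-score rule flags:", z_flagged)
print("IQR rule flags:    ", iqr_flagged)
```

Here the z-score rule with a threshold of 3 flags nothing, while the IQR rule flags 95.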
A wildly wrong contamination floods you with false alarms or hides real anomalies.
You can now flag outliers with the z-score and robust IQR rules, and reach for Isolation Forest, One-Class SVM, or Local Outlier Factor on richer data, tuning contamination to the real anomaly rate.
🚀 Up next: Model Interpretability — opening the black box with SHAP and LIME.
Practice quiz
What is anomaly detection?
- Predicting a continuous value
- Finding rare points that differ markedly from the rest
- Grouping similar points
- Reducing dimensions
Answer: Finding rare points that differ markedly from the rest. Anomaly (outlier) detection flags the unusual points that don't fit the normal pattern of the data.
Why are anomalies usually hard to learn with ordinary classification?
- They are too common
- They are rare and often unlabelled, so classes are highly imbalanced
- They have too many features
- They never repeat
Answer: They are rare and often unlabelled, so classes are highly imbalanced. Anomalies are rare and frequently unlabelled, so the data is extremely imbalanced and standard classifiers struggle.
How does the z-score method flag an outlier?
- By clustering
- By tree depth
- By how many standard deviations a point is from the mean
- By the median only
Answer: By how many standard deviations a point is from the mean. A z-score measures distance from the mean in standard deviations; points beyond a threshold (e.g. |z| > 3) are flagged.
The IQR method flags points that fall outside:
- The mean plus 1
- 1.5 times the interquartile range beyond Q1 or Q3
- The first value
- Any negative value
Answer: 1.5 times the interquartile range beyond Q1 or Q3. The IQR rule flags values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR; it is robust because it uses quartiles, not the mean.
How does Isolation Forest detect anomalies?
- By averaging trees' votes for a label
- Anomalies are isolated in fewer random splits than normal points
- By computing distances to centroids
- By gradient descent
Answer: Anomalies are isolated in fewer random splits than normal points. Isolation Forest randomly partitions data; outliers get isolated with very few splits, giving short path lengths.
What does One-Class SVM learn?
- A boundary around the normal data, flagging anything outside as anomalous
- Two balanced classes
- A regression line
- A cluster count
Answer: A boundary around the normal data, flagging anything outside as anomalous. One-Class SVM fits a tight boundary around the 'normal' training points; new points outside that boundary are anomalies.
Local Outlier Factor (LOF) measures anomalies based on:
- Global mean only
- A point's local density compared with its neighbours' density
- Tree depth
- The label distribution
Answer: A point's local density compared with its neighbours' density. LOF compares a point's density to that of its neighbours; a point in a much sparser region than its neighbours is an outlier.
Statistical methods like z-score assume the data is roughly:
- Uniformly random
- Normally distributed (bell-shaped)
- All identical
- Categorical
Answer: Normally distributed (bell-shaped). The z-score rule relies on a roughly normal distribution; on skewed data the robust IQR method is usually safer.
Which is a classic real-world use case for anomaly detection?
- Image resizing
- Credit-card fraud detection
- Sorting a list
- Spell-checking
Answer: Credit-card fraud detection. Fraud detection is a flagship use case: fraudulent transactions are rare anomalies hidden among many normal ones.
In scikit-learn, what does a contamination parameter set?
- The learning rate
- The number of features
- The tree depth
- The expected proportion of anomalies in the data
Answer: The expected proportion of anomalies in the data. contamination tells the detector roughly what fraction of points to expect as anomalies, setting the decision threshold.