Apply Isolation Forest: Detect anomalies in high-dimensional data without labeled examples
Build Autoencoders: Use neural networks to learn normal patterns and flag deviations
Implement LSTM Networks: Detect temporal sequence anomalies in streaming data
Choose Between ML Methods: Select appropriate algorithms based on data characteristics and deployment constraints
In 60 Seconds
ML anomaly detection learns what “normal” sensor behavior looks like from historical data and flags anything that deviates. Unlike manual thresholds, these methods catch complex multi-dimensional patterns – for example, temperature and vibration individually normal but their combination at a specific RPM being unusual. Isolation Forest works without labeled anomalies and runs on edge devices. Autoencoders handle high-dimensional data (20+ sensors). LSTMs detect temporal sequence anomalies. Start simple (Isolation Forest), graduate to deep learning only when needed.
For Beginners: ML-Based Anomaly Detection
Data analytics and machine learning for IoT is about extracting useful insights from the massive streams of sensor data. Think of it like panning for gold – raw sensor readings are the river gravel, and analytics tools help you find the valuable nuggets of information hidden within. Machine learning takes this further by automatically learning patterns and making predictions.
Minimum Viable Understanding: ML Anomaly Detection
Core Concept: Machine learning methods learn what “normal” looks like from historical data and flag points that don’t fit the learned patterns. No explicit rules or thresholds required.
Why It Matters: ML methods detect complex, multi-dimensional anomalies that statistical methods miss. A motor with normal temperature, normal vibration, and normal current individually might still be anomalous if the combination of these values is unusual.
Key Takeaway: Use Isolation Forest when you have multi-dimensional data and no labeled anomalies. Use autoencoders for high-dimensional sensor data. Use LSTM for sequential patterns over time.
13.2 Prerequisites
Before diving into this chapter, you should be familiar with:
Statistical Methods: Understanding when simpler methods fail and ML is needed
Anomaly Types: Understanding collective anomalies that require pattern-based detection
~20 min | Advanced | P10.C01.U04
Key Concepts
Isolation Forest: Unsupervised ML algorithm that detects anomalies by measuring how few random splits are needed to isolate a data point – fewer splits means more anomalous
Autoencoder: Neural network trained to compress and reconstruct normal data; anomalies produce high reconstruction error because the network has never learned their patterns
LSTM (Long Short-Term Memory): Recurrent neural network that learns sequential patterns over time, detecting anomalies when actual values deviate from predicted next values
Contamination parameter: The expected proportion of anomalies in the dataset, used by Isolation Forest to set its detection threshold
Reconstruction error: The difference between an autoencoder’s input and its output; high error signals the input deviates from learned normal patterns
Concept drift: Gradual change in what constitutes “normal” behavior over time, causing static ML models to degrade without periodic retraining
Unsupervised learning: Training approach requiring only normal (unlabeled) data – the model learns the structure of normality without being told what anomalies look like
13.3 Introduction
When statistical methods reach their limits – complex patterns, high-dimensional data, or no known distribution – machine learning becomes essential. This chapter covers three core ML approaches for anomaly detection: Isolation Forest for unsupervised multi-dimensional detection, autoencoders for high-dimensional pattern learning, and LSTM networks for temporal sequence analysis. Each trades off complexity for detection power, and understanding when to use which method is key to building effective IoT monitoring systems.
How It Works
ML-based anomaly detection follows a fundamentally different approach than statistical methods. Instead of defining what is “abnormal” (like thresholds), ML learns what is “normal” from historical data and flags deviations.
The Learning Process:
Training Phase: Feed the model thousands of examples of normal sensor behavior. The model learns patterns, correlations, and typical ranges without being explicitly told what they are.
Pattern Encoding: The model internally represents “normal” as mathematical patterns. For Isolation Forest, this means decision trees that separate normal data. For autoencoders, it means compressed representations that capture essential features. For LSTM, it means sequential patterns over time.
Detection Phase: When new data arrives, the model compares it to learned patterns. High deviation scores indicate anomalies. The key advantage is that the model detects complex patterns humans might miss – like “vibration is normal, temperature is normal, but this combination at this RPM is unusual.”
Why This Works Better Than Rules: Traditional rule-based systems require engineers to manually define every possible failure mode. ML systems discover failure patterns automatically from data, including rare combinations that humans would never think to check.
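To make "individually normal, jointly anomalous" concrete, here is a small illustrative sketch (the sensor values and the Mahalanobis-distance check are a toy example of our own, not a production method):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate normal motor behavior: temperature and vibration are strongly
# correlated (hotter bearings vibrate more).
temp = rng.normal(75, 2.0, 5000)
vib = 1.2 + 0.05 * (temp - 75) + rng.normal(0, 0.02, 5000)
X = np.column_stack([temp, vib])

mean = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

def mahalanobis(x):
    """Distance that accounts for feature correlations, in 'combined sigmas'."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

# Each feature individually within ~1.5 sigma of its own mean...
point = np.array([78.0, 1.12])  # hot motor, but vibration is LOW
z_temp = abs(point[0] - mean[0]) / X[:, 0].std()
z_vib = abs(point[1] - mean[1]) / X[:, 1].std()

print(f"z(temp)={z_temp:.1f}, z(vib)={z_vib:.1f}, combined={mahalanobis(point):.1f}")
# Per-feature z-scores look normal, but the combination sits far outside
# the learned correlation structure -- a multi-dimensional anomaly.
```

A per-feature threshold check would pass this reading; only a method that has learned the joint structure of the data flags it.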
13.4 Unsupervised Learning
Key Advantage: No labeled anomaly data required. The model learns “normal” from unlabeled data and flags deviations.
13.4.1 Isolation Forest
Core Concept: Anomalies are “easier to isolate” than normal points. Build random decision trees that split data – anomalies require fewer splits to isolate.
Why It Works:
Normal points are clustered – requires many splits to isolate one point
Anomalies are isolated – few splits needed
Algorithm:
Randomly select a feature and split value
Partition data recursively
Anomalies end up in shorter paths (fewer splits)
Anomaly score = average path length across trees
Putting Numbers to It
Isolation Forest anomaly scores are based on path length in binary trees. For a dataset of \(n\) samples, the average path length for normal points follows:
\[c(n) = 2H(n-1) - \frac{2(n-1)}{n}\]
where \(H(i)\) is the harmonic number (\(H(i) \approx \ln(i) + 0.5772\), the Euler-Mascheroni constant). The anomaly score is:
\[s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}\]
where \(E(h(x))\) is the average path length for point \(x\) across all trees.
Example: For \(n = 1,000\) training samples and 100 trees:

- Normal point: \(E(h) \approx 12\) splits, \(s \approx 0.53\) (near the 0.5 decision boundary)
- Anomaly: \(E(h) \approx 4\) splits, \(s \approx 0.81\) (clearly anomalous)
With contamination=0.01, the threshold is set to catch the top 1% of scores. For a motor with 4 features, an anomaly requiring only 3-4 splits (vs. 10-12 for normal) indicates a rare multi-dimensional pattern worth investigating.
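The formulas above can be checked directly with a quick sketch (the helper names `c` and `anomaly_score` are ours):

```python
import numpy as np

def c(n):
    """Average BST path length c(n) = 2H(n-1) - 2(n-1)/n,
    with H(i) ~ ln(i) + 0.5772 (Euler-Mascheroni constant)."""
    H = np.log(n - 1) + 0.5772156649
    return 2 * H - 2 * (n - 1) / n

def anomaly_score(avg_path_length, n):
    """Isolation Forest score s(x, n) = 2^(-E(h)/c(n)); closer to 1 = more anomalous."""
    return 2 ** (-avg_path_length / c(n))

n = 1000
print(f"c({n}) = {c(n):.2f} expected splits")                # ~13 splits
print(f"Normal  (E(h)=12): s = {anomaly_score(12, n):.2f}")  # ~0.53
print(f"Anomaly (E(h)=4):  s = {anomaly_score(4, n):.2f}")   # ~0.81
```

The score is scale-free: it compares a point's observed path length against the expected path length for a dataset of that size, so scores are comparable across datasets.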
Implementation:
```python
from sklearn.ensemble import IsolationForest
import numpy as np

class IsolationForestDetector:
    def __init__(self, contamination=0.01, n_estimators=100):
        """
        contamination: expected proportion of anomalies (1% default)
        n_estimators: number of trees
        """
        self.model = IsolationForest(
            contamination=contamination,
            n_estimators=n_estimators,
            random_state=42
        )
        self.is_trained = False

    def train(self, normal_data):
        """
        Train on normal operating data
        normal_data: shape (n_samples, n_features)
        """
        self.model.fit(normal_data)
        self.is_trained = True

    def predict(self, data):
        """
        Predict anomalies in new data
        Returns: array of -1 (anomaly) or 1 (normal)
        """
        if not self.is_trained:
            raise ValueError("Model must be trained first")
        return self.model.predict(data)

    def score(self, data):
        """
        Get anomaly scores (more negative = more anomalous)
        """
        return self.model.score_samples(data)

# Example: Multi-sensor motor monitoring
# Features: [temperature, vibration, current, RPM]

# Normal operating data (1000 samples)
normal_data = np.array([
    [75 + np.random.normal(0, 2),       # Temperature ~75 C
     1.2 + np.random.normal(0, 0.1),    # Vibration ~1.2 mm/s
     8.5 + np.random.normal(0, 0.3),    # Current ~8.5 A
     1450 + np.random.normal(0, 10)]    # RPM ~1450
    for _ in range(1000)
])

detector = IsolationForestDetector(contamination=0.01)
detector.train(normal_data)

# Test data including anomalies
test_data = np.array([
    [76, 1.3, 8.4, 1455],    # Normal
    [74, 1.2, 8.6, 1448],    # Normal
    [92, 3.8, 12.1, 1480],   # Anomaly: overheating + high vibration
    [75, 1.1, 8.5, 1200],    # Anomaly: RPM drop
])

predictions = detector.predict(test_data)
scores = detector.score(test_data)

for i, (pred, score) in enumerate(zip(predictions, scores)):
    status = "ANOMALY" if pred == -1 else "Normal"
    print(f"Sample {i}: {status} (score: {score:.3f})")

# Example output (score_samples values are negative; exact values vary by run):
# Sample 0: Normal (score: -0.42)
# Sample 1: Normal (score: -0.43)
# Sample 2: ANOMALY (score: -0.64)  <- Detected!
# Sample 3: ANOMALY (score: -0.59)  <- Detected!
```
Explore how Isolation Forest detects anomalies in 2D sensor data. The scatter plot shows a cluster of “normal” motor readings (temperature vs. vibration). Use the sliders to place a test point and adjust the contamination parameter. Points far from the cluster center are easier to “isolate” and receive higher anomaly scores.
Advantages:
No labeled anomalies needed for training
Handles high-dimensional data well
Computationally efficient (can run on edge gateways)
Limitations:
The contamination parameter must be set close to the true anomaly rate, which is often unknown in advance
Struggles with evolving “normal” (concept drift)
13.4.2 One-Class SVM
Core Concept: Learn a boundary that encloses normal data points in high-dimensional space. Points outside boundary are anomalies.
When to Use:
Small, well-defined “normal” region
You have clean training data (no anomalies in training set)
Moderate dimensionality (up to ~50 features)
Trade-Off: Can achieve tighter decision boundaries than Isolation Forest for well-defined normal regions, but computationally expensive (\(O(n^2)\) to \(O(n^3)\) training) – typically cloud-deployed.
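A minimal sketch using scikit-learn's OneClassSVM (synthetic two-feature data; the `nu` and `gamma` values are illustrative, not tuned):

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Clean training data: normal motor readings only [temperature, vibration]
normal = np.column_stack([
    rng.normal(75, 2.0, 500),
    rng.normal(1.2, 0.1, 500),
])

# nu bounds the fraction of training points allowed outside the boundary,
# playing a role similar to Isolation Forest's contamination parameter.
scaler = StandardScaler().fit(normal)
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
model.fit(scaler.transform(normal))

test = np.array([
    [76.0, 1.25],   # inside the learned boundary
    [92.0, 3.8],    # far outside: overheating + high vibration
])
print(model.predict(scaler.transform(test)))  # 1 = normal, -1 = anomaly
```

Standardizing features before fitting matters here: the RBF kernel measures distances, and unscaled temperature values would otherwise dominate vibration.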
13.4.3 K-Nearest Neighbors (KNN)
Core Concept: For each point, find K nearest neighbors. If average distance to neighbors is high, point is anomalous.
Algorithm:
For each test point:
1. Find K nearest neighbors in training data
2. Calculate average distance to K neighbors
3. If distance > threshold, flag as anomaly
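The steps above can be sketched in a few lines of NumPy (brute-force search; `k` and the 99th-percentile threshold are illustrative choices):

```python
import numpy as np

def knn_anomaly_scores(train, test, k=5):
    """Average Euclidean distance from each test point to its k nearest
    training points. Brute force: every query scans the whole training set,
    which is why KNN is memory- and compute-heavy at the edge."""
    scores = []
    for x in test:
        d = np.sqrt(((train - x) ** 2).sum(axis=1))
        scores.append(np.sort(d)[:k].mean())
    return np.array(scores)

rng = np.random.default_rng(1)
normal = np.column_stack([rng.normal(75, 2, 1200), rng.normal(1.2, 0.1, 1200)])
train, holdout = normal[:1000], normal[1000:]

# Threshold: 99th percentile of scores on held-out normal data
threshold = np.percentile(knn_anomaly_scores(train, holdout), 99)

test = np.array([[76.0, 1.25],   # typical reading
                 [92.0, 3.8]])   # overheating + high vibration
scores = knn_anomaly_scores(train, test)
print(scores > threshold)
```

Note that the detector must keep all 1,000 training points in memory and compute 1,000 distances per query, which is the source of the "Memory: High" and "Speed: Low" ratings below.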
Pros/Cons:
| Aspect | Rating | Notes |
|---|---|---|
| Accuracy | High | Very accurate for low-dimensional data |
| Speed | Low | Slow for large datasets (must search all points) |
| Memory | High | Must store entire training dataset |
| Edge Deploy | No | Too resource-intensive for constrained devices |
Tradeoff: Statistical Methods vs ML-Based Anomaly Detection
Option A: Statistical methods (Z-score, IQR, ARIMA residuals)

Option B: Machine learning approaches (Isolation Forest, Autoencoders, LSTM)

Decision Factors: Statistical methods require minimal training data, are interpretable, run efficiently on edge devices, and work well for point anomalies with known distributions. ML methods excel at detecting complex multi-dimensional patterns, collective anomalies, and subtle deviations without explicit threshold tuning, but require representative training data and more computational resources. For edge deployment with <100KB RAM, use statistical methods. For cloud-based detection of complex industrial equipment patterns, use ML. Many production systems use statistical methods at the edge for fast response, with ML in the cloud for deeper analysis.
Try It: Z-Score Anomaly Detection for IoT Sensors
Objective: Z-score detection is the simplest statistical anomaly method and often the first line of defense on edge devices. It flags readings that deviate more than a threshold number of standard deviations from the mean. This code processes a real-time sensor stream and demonstrates why Z-score works well for point anomalies but misses collective anomalies.
```python
import numpy as np

class ZScoreDetector:
    """Real-time Z-score anomaly detector using Welford's online algorithm.
    Uses O(1) memory -- ideal for microcontrollers."""

    def __init__(self, threshold=3.0, warmup=30):
        self.threshold = threshold
        self.warmup = warmup
        self.n = 0
        self.mean = 0.0
        self.M2 = 0.0  # Running sum of squared deviations

    def update(self, value):
        """Process one reading. Returns (z_score, is_anomaly)."""
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.M2 += delta * (value - self.mean)
        if self.n < self.warmup:
            return 0.0, False
        std = (self.M2 / (self.n - 1)) ** 0.5
        if std == 0:
            return 0.0, False
        z_score = abs(value - self.mean) / std
        return round(z_score, 2), z_score > self.threshold

# Example: Temperature sensor with injected anomalies
readings = list(np.random.normal(23.0, 1.5, 100))
readings[40] = 35.0   # Spike anomaly
readings[70] = 5.0    # Sensor disconnect

detector = ZScoreDetector(threshold=3.0, warmup=30)
for i, temp in enumerate(readings):
    z, is_anomaly = detector.update(temp)
    if is_anomaly:
        print(f"[ANOMALY] Reading {i}: {temp:.1f}C (z={z})")

# Detects sudden spikes but misses gradual drift -- use Isolation
# Forest or LSTM for collective anomaly detection.
```
What to Observe:
Z-score uses only O(1) memory (running mean and variance) – ideal for ESP32 microcontrollers with limited RAM
The warmup period prevents false positives during initial calibration when mean/std are unstable
Sudden spikes (35C) and drops (5C) are easily detected as they exceed 3 standard deviations
Gradual drift (23 -> 24 -> 25 -> 26.5C) is missed because each individual step is within normal range
In production, combine Z-score at the edge (fast, catches obvious failures) with ML in the cloud (catches subtle patterns)
Try It: Confusion Matrix for IoT Anomaly Alerting
Objective: Evaluate anomaly detection performance using a confusion matrix. In IoT, the cost of false negatives (missed failures) and false positives (unnecessary maintenance) differs dramatically, so accuracy alone is misleading. This code calculates precision, recall, and F1-score and shows how to tune thresholds for different cost structures.
```python
import numpy as np

def confusion_matrix_analysis(y_true, y_pred):
    """Compute confusion matrix and metrics for anomaly detection."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}

# Simulated labels: 1000 readings, 2% anomaly rate (20 true anomalies),
# detector catches 16 of them (80% recall) and raises 30 false positives
y_true = [1] * 20 + [0] * 980
y_pred = [1] * 16 + [0] * 4 + [1] * 30 + [0] * 950

m = confusion_matrix_analysis(y_true, y_pred)
print(f"Precision: {m['precision']:.1%}, Recall: {m['recall']:.1%}, F1: {m['f1']:.1%}")

# IoT cost analysis: missing failures is far more expensive than false alarms
cost_fp, cost_fn = 150, 5000  # $/event
total_cost = m['fp'] * cost_fp + m['fn'] * cost_fn

# Key insight: ~97% accuracy sounds great, but missing 4 failures
# costs $20,000 vs $4,500 for false alarms. In IoT, recall matters most.
```
What to Observe:
Accuracy is misleading for rare anomalies: 96% accuracy sounds good, but a model that predicts “normal” for everything gets 98% accuracy with 2% anomaly rate
Precision answers “when we alert, are we right?” – low precision means too many false alarms causing alert fatigue
Recall answers “do we catch real failures?” – low recall means missed equipment failures
Cost asymmetry drives threshold tuning: if a missed failure costs 33x more than a false alarm, optimize for recall
For safety-critical IoT (medical, industrial), target recall above 95% even at the cost of more false positives
13.5 Deep Learning
When anomalies have complex temporal or spatial patterns, deep learning becomes powerful.
13.5.1 Autoencoders
Core Concept: Neural network that compresses data (encodes) then reconstructs it (decodes). Train on normal data – anomalies reconstruct poorly (high reconstruction error).
Architecture:
```
Input -> Encoder (compress) -> Bottleneck -> Decoder (reconstruct) -> Output
[Train on normal data only]                                              |
                                                         Compare with Input
                                                        High error = Anomaly
```
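Real autoencoders are nonlinear neural networks, but the compress-reconstruct-compare loop can be illustrated without a deep learning framework using PCA as a linear stand-in (synthetic three-sensor data; the 1-component bottleneck is our assumption):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)

# Normal data: 3 sensors that move together (shared underlying load factor)
load = rng.normal(0, 1, 2000)
X = np.column_stack([
    75 + 4 * load + rng.normal(0, 0.3, 2000),     # temperature
    1.2 + 0.2 * load + rng.normal(0, 0.02, 2000), # vibration
    8.5 + 0.8 * load + rng.normal(0, 0.1, 2000),  # current
])

# "Encode" to a 1-component bottleneck, "decode" back, measure error
pca = PCA(n_components=1).fit(X)

def reconstruction_error(data):
    recon = pca.inverse_transform(pca.transform(data))
    return np.mean((data - recon) ** 2, axis=1)

threshold = np.percentile(reconstruction_error(X), 99)

# Anomaly: each value plausible alone, but they violate the learned correlation
anomaly = np.array([[83.0, 1.1, 8.0]])  # hot, yet vibration/current stay low
print(reconstruction_error(anomaly)[0] > threshold)  # True
```

A neural autoencoder generalizes this idea: nonlinear encoder/decoder layers capture curved manifolds of "normal" that a single linear component cannot.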
Explore how an autoencoder detects anomalies by measuring reconstruction error. A “normal” signal is a sinusoidal temperature pattern. Adjust the amplitude and frequency offset to deviate from normal, and watch the reconstruction error rise. When the error exceeds the threshold, the reading is flagged as anomalous.
When Autoencoders Excel:
High-dimensional sensor data (>20 features)
Complex, nonlinear relationships between sensors
Sufficient training data (>10,000 samples)
Deployment Considerations:
Training: Cloud-based (GPUs)
Inference: Can run on edge gateways (Raspberry Pi 4, NVIDIA Jetson)
13.5.2 LSTM (Long Short-Term Memory) Networks
Core Concept: Recurrent neural network that learns temporal sequences. Predict next value(s) based on recent history – large prediction errors indicate anomalies.
Architecture for Anomaly Detection:
```
Sliding window of past N timesteps -> LSTM layers -> Predict next value
                                                              |
                                             Compare prediction vs actual
                                                 High error = Anomaly
```
Ideal for:
Vibration patterns (bearing wear develops over time)
Network traffic (DDoS attacks show temporal patterns)
Power consumption (usage patterns span hours/days)
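The predict-and-compare loop can be sketched with a rolling-mean predictor standing in for a trained LSTM (toy vibration data; the window size and the mean + 4-sigma threshold are illustrative choices):

```python
import numpy as np

def predict_next(history, window=10):
    """Toy stand-in for an LSTM's learned next-step prediction:
    a rolling mean of the last `window` values."""
    return np.mean(history[-window:])

rng = np.random.default_rng(3)
t = np.arange(300)
vibration = 1.2 + 0.3 * np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.02, 300)
vibration[200] += 1.5  # injected spike (bearing impact)

window = 10
errors = np.array([abs(vibration[i] - predict_next(vibration[:i], window))
                   for i in range(window, len(vibration))])
threshold = errors.mean() + 4 * errors.std()
flagged = np.where(errors > threshold)[0] + window
print(flagged)  # timestep(s) near 200
```

A real LSTM replaces the rolling mean with a learned nonlinear function of the window, so it can anticipate periodic patterns instead of lagging behind them, giving much lower baseline error and higher sensitivity.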
Try It: LSTM Autoencoder for Time-Series Anomaly Detection
Objective: Build a conceptual LSTM autoencoder that learns normal temporal patterns in sensor data and flags sequences that deviate from learned behavior. Unlike point-based methods (Z-score, Isolation Forest), LSTMs capture sequential dependencies – detecting anomalies like “temperature rising while compressor is running” that only make sense in context.
```python
import numpy as np

class SimpleLSTMAutoencoder:
    """Conceptual LSTM autoencoder for anomaly detection.
    Architecture: Input window -> Encoder -> Bottleneck -> Decoder -> Reconstruction
    High reconstruction error = anomaly. Use tf.keras.layers.LSTM in production."""

    def __init__(self, window_size=20, threshold_percentile=95):
        self.window_size = window_size
        self.threshold_percentile = threshold_percentile

    def train(self, normal_data):
        """Learn normal patterns from sliding windows of historical data."""
        windows = np.array([normal_data[i:i + self.window_size]
                            for i in range(len(normal_data) - self.window_size)])
        self.training_mean = np.mean(windows, axis=0)
        self.training_std = np.std(windows, axis=0) + 1e-8
        errors = self._compute_errors(windows)
        self.threshold = np.percentile(errors, self.threshold_percentile)

    def _compute_errors(self, windows):
        """MSE between input and reconstructed output (simplified)."""
        normalized = (windows - self.training_mean) / self.training_std
        return np.mean(normalized ** 2, axis=1)

    def detect(self, data):
        """Flag windows where reconstruction error exceeds threshold."""
        windows = np.array([data[i:i + self.window_size]
                            for i in range(len(data) - self.window_size)])
        errors = self._compute_errors(windows)
        return errors, errors > self.threshold

# Train on a normal sinusoidal temperature pattern, test with injected anomalies
normal = 22 + 3 * np.sin(2 * np.pi * np.arange(2000) / 200) + np.random.normal(0, 0.5, 2000)
test = 22 + 3 * np.sin(2 * np.pi * np.arange(500) / 200) + np.random.normal(0, 0.5, 500)
test[200:230] = 25.0                    # Pattern break: flatline (individually normal, sequentially anomalous)
test[350:400] += np.linspace(0, 8, 50)  # Gradual drift

model = SimpleLSTMAutoencoder(window_size=20)
model.train(normal)
errors, anomalies = model.detect(test)

# LSTM detects pattern breaks and drift that Z-score misses,
# because it learned the expected SEQUENCE, not just valid ranges.
```
What to Observe:
The LSTM learns the normal sinusoidal temperature pattern from training data – not just the valid range
A “flatline” at 25C is detected as anomalous even though 25C is a perfectly normal temperature – the PATTERN is wrong
Gradual drift is detected because the sequence diverges from the expected sinusoidal shape
This is fundamentally different from Z-score (which only checks individual values) and Isolation Forest (which checks feature combinations but not sequences)
In production, use tf.keras.layers.LSTM with GPU training and TensorRT inference on edge devices
Explore how LSTM-style sequence prediction detects temporal anomalies. The chart shows a motor vibration time series with a predicted next value based on recent history (rolling average). Inject an anomaly at a specific timestamp and adjust the sensitivity threshold to see how prediction errors spike at anomalous points.
13.5.3 Isolation Forest Anomaly Score Explorer
Use this interactive calculator to explore how Isolation Forest scores change with dataset size and path length. Shorter paths (fewer splits to isolate a point) yield higher anomaly scores.
Select your data characteristics below and get a recommendation for which ML anomaly detection method best fits your scenario. The selector considers dimensionality, data availability, temporal structure, and deployment constraints.
Option A (Batch Training): Train models offline on historical data (weeks to months of sensor readings), then deploy frozen models that remain static until manually retrained. Training takes hours on cloud GPUs, but inference is fast (10-100 ms) with predictable performance.

Option B (Online Learning): Continuously update model parameters as new data arrives, adapting to concept drift in real time. Each new reading triggers an incremental model update (100-500 ms), allowing detection thresholds to evolve with changing equipment behavior.

Decision Factors: Choose batch training when normal behavior is stable over the deployment lifetime (factory equipment with fixed operating conditions), when regulatory compliance requires reproducible model behavior (medical devices, financial systems), or when edge compute resources cannot support training workloads. Choose online learning when "normal" shifts frequently (seasonal HVAC patterns, user behavior in smart homes), when deploying to environments without historical training data (new equipment types), or when concept drift causes batch models to degrade within weeks. Hybrid approach: a batch-trained base model with online threshold adaptation handles most industrial IoT scenarios, keeping the model structure frozen while decision boundaries adjust monthly based on confirmed anomaly feedback.
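The hybrid approach can be sketched as a frozen scorer with an adaptive threshold (a minimal illustration; the EWMA mean/variance update and all parameter values are our assumptions, and confirmed-feedback handling is omitted):

```python
import numpy as np

class AdaptiveThreshold:
    """Hybrid pattern sketch: a frozen, batch-trained model emits anomaly
    scores; only the decision threshold adapts online, tracking slow drift
    in the score distribution."""

    def __init__(self, alpha=0.01, k=4.0, warmup=50):
        self.alpha = alpha    # adaptation rate
        self.k = k            # sigmas above the moving mean
        self.warmup = warmup  # readings before alerting is enabled
        self.n = 0
        self.mean = 0.0
        self.var = 0.0

    def update(self, score):
        """Feed one model score; returns True if it exceeds the current threshold."""
        self.n += 1
        if self.n == 1:
            self.mean = score
            return False
        threshold = self.mean + self.k * np.sqrt(self.var)
        is_anomaly = self.n > self.warmup and score > threshold
        if not is_anomaly:  # adapt only on accepted scores (avoid self-poisoning)
            diff = score - self.mean
            self.mean += self.alpha * diff
            self.var = (1 - self.alpha) * (self.var + self.alpha * diff ** 2)
        return is_anomaly

rng = np.random.default_rng(5)
# Model scores drift slowly upward (e.g., seasonal shift in "normal"):
scores = rng.normal(0.30, 0.02, 500) + np.linspace(0.0, 0.2, 500)
detector = AdaptiveThreshold()
alarms = [detector.update(s) for s in scores]
spike_alarm = detector.update(0.9)  # genuine anomaly after the drift

print(f"false alarms during drift: {sum(alarms)}, spike flagged: {spike_alarm}")
```

The moving threshold follows the gradual drift without alarming on it, while a genuine spike still stands out against the adapted baseline.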
13.7 Worked Examples
Worked Example: Isolation Forest for HVAC Anomaly Detection
Scenario: A commercial building management company deploys anomaly detection across 200 HVAC units to identify failing compressors before breakdown. Each unit reports temperature, humidity, power consumption, and compressor vibration every 30 seconds.
Given:
Training data: 6 months of normal operation (5.2 million samples from 200 units)
Feature vector per sample: [supply_temp, return_temp, humidity, power_kW, vibration_rms, outdoor_temp]
Known anomaly rate from historical maintenance records: ~0.8%
Deployment target: Raspberry Pi 4 gateway (4GB RAM) serving 50 units
Requirement: <2 second detection latency, <5% false positive rate
Steps:
Data preparation:
Remove known maintenance periods and startup transients (first 10 minutes after power-on)
Normalize features: Z-score standardization using training set statistics
Clean dataset: 4.8 million samples after filtering
Result: Isolation Forest with 50 trees achieves 87% recall and 3.1% false positive rate. Over 3 months of production operation, the system detected 23 of 26 actual failures (88.5% real-world recall) with 847 false alarms across 200 units (1.4 false alarms per unit per month). Estimated savings: $156,000 in prevented emergency repairs vs. $127,000 investigation cost for false alarms ($150 per dispatch).
Key Insight: Isolation Forest’s contamination parameter directly controls the precision-recall tradeoff. Start with your historical anomaly rate, then adjust based on the cost ratio of false negatives (missed failures) to false positives (unnecessary investigations). For HVAC where emergency repairs cost 5-10x preventive maintenance, accepting higher false positive rates (2-5%) to achieve 85%+ recall is economically justified.
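Under stated assumptions (synthetic telemetry standing in for the real fleet data), the preparation and training steps from this example might be sketched as:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(11)

# Synthetic stand-in for cleaned fleet data:
# [supply_temp, return_temp, humidity, power_kW, vibration_rms, outdoor_temp]
n = 50_000
outdoor = rng.normal(18, 8, n)
X = np.column_stack([
    13 + 0.1 * outdoor + rng.normal(0, 0.5, n),                      # supply_temp
    24 + 0.2 * outdoor + rng.normal(0, 0.7, n),                      # return_temp
    rng.normal(45, 5, n),                                            # humidity
    5 + 0.15 * np.maximum(outdoor - 15, 0) + rng.normal(0, 0.4, n),  # power_kW
    rng.normal(1.0, 0.08, n),                                        # vibration_rms
    outdoor,
])

# Normalize with training-set statistics, then fit with contamination
# set near the historical anomaly rate (~0.8%)
scaler = StandardScaler().fit(X)
model = IsolationForest(n_estimators=50, contamination=0.008, random_state=42)
model.fit(scaler.transform(X))

# A failing compressor: high vibration + elevated power for the conditions
suspect = np.array([[13.5, 25.0, 46.0, 9.5, 2.1, 18.0]])
print(model.predict(scaler.transform(suspect)))  # -1 = flag for inspection
```

Fitting the scaler on training data only and reusing it at inference time mirrors the normalization step in the example above.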
Worked Example: Autoencoder for Multi-Sensor Industrial Anomaly Detection
Scenario: A semiconductor fab deploys deep learning anomaly detection on plasma etch chambers. Each chamber has 47 sensors (gas flows, pressures, RF power, temperatures, optical emission spectra) sampled at 10 Hz during 3-minute etch processes.
Given:
Training data: 45,000 normal etch processes (no defective wafers) over 8 months
Input shape per process: 1,800 timesteps x 47 features = 84,600 values
Anomaly definition: Process that produces defective wafer (known from downstream metrology)
Historical defect rate: 0.3% of processes
Deployment: NVIDIA Jetson AGX Xavier (32GB RAM, 512 CUDA cores)
Requirement: Process-level decision within 10 seconds of etch completion
Steps:
Feature engineering for temporal data:
Segment each process into 6 phases (30s each): gas stabilization, plasma ignition, main etch (3 phases), purge
Extract per-phase statistics: mean, std, min, max, slope for each sensor (47 x 5 x 6 = 1,410 features)
Dimensionality: 84,600 raw values compressed to 1,410 phase-aggregated features
Autoencoder architecture design:
Encoder: 1410 -> 512 -> 256 -> 64 (bottleneck)
Decoder: 64 -> 256 -> 512 -> 1410
Activation: ReLU (hidden), linear (output)
Total parameters: 1.8 million (7.2 MB float32)
Training configuration:
Loss function: MSE reconstruction loss
Optimizer: Adam, learning rate 1e-4 with cosine decay
Batch size: 128 processes
Training epochs: 150 (early stopping at epoch 112)
Training time: 2.4 hours on 4x V100 GPUs
Validation reconstruction error (normal): mean 0.023, std 0.008
Threshold determination:
Compute reconstruction error on 5,000 validation processes (all normal)
Set threshold at 99.5th percentile: 0.047 (3 sigma above mean)
Test on 200 known-defective processes: 178 exceed threshold (89% recall)
Test on 5,000 normal processes: 24 exceed threshold (0.48% false positive rate)
Deployment optimization for Jetson:
Convert to TensorRT: 2.1x inference speedup
Mixed precision (FP16): Model size 3.6 MB, inference time 82ms
Batch inference: 10 processes in parallel, 340ms total
Result: TensorRT-optimized autoencoder achieves 89% recall on defective processes with 0.48% false positive rate. End-to-end latency is 3.2 seconds (feature extraction) + 0.34s (inference) = 3.5 seconds per process. Over 6-month deployment, the system flagged 156 processes; 141 were true defects (90.4% precision), preventing $2.1M in downstream processing of defective wafers. 15 false alarms cost ~$45K in additional metrology.
Key Insight: For high-dimensional time-series anomaly detection, phase-based feature aggregation dramatically reduces autoencoder input size while preserving discriminative information. A 60x dimensionality reduction (84,600 to 1,410) enables faster training, smaller models, and more robust generalization. The key is domain knowledge - knowing that plasma etch has distinct phases allows meaningful aggregation rather than arbitrary windowing.
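The chamber model above is a deep autoencoder; as a minimal, self-contained illustration of the same mechanism (reconstruct, measure error, threshold at a high percentile), here is a linear "autoencoder" built from a truncated SVD in plain numpy. The data, dimensions, and threshold are synthetic stand-ins, not the fab's values.

```python
# Reconstruction-error anomaly detection with a linear autoencoder (PCA).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal" phase features: 2000 processes x 20 features that share
# a low-dimensional latent structure (strong cross-feature correlation)
latent = rng.normal(size=(2000, 4))
mixing = rng.normal(size=(4, 20))
X = latent @ mixing + 0.05 * rng.normal(size=(2000, 20))

# "Encoder/decoder": project onto the top-k principal components and back
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
k = 4

def encode(A):
    return (A - mu) @ Vt[:k].T        # bottleneck

def decode(Z):
    return Z @ Vt[:k] + mu            # reconstruction

def recon_error(A):
    return np.mean((A - decode(encode(A))) ** 2, axis=1)   # per-sample MSE

# Threshold at the 99.5th percentile of error on normal data
threshold = np.percentile(recon_error(X), 99.5)

# A process whose feature *combination* breaks the learned correlations
anomaly = rng.normal(size=(1, 20)) * 3.0
flagged = recon_error(anomaly)[0] > threshold
print(f"threshold={threshold:.4f}, anomaly flagged: {flagged}")
```

The deep autoencoder in the worked example replaces the linear projection with stacked nonlinear layers, but the decision rule is identical: reconstruction error above a high percentile of the normal-data error distribution.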
Concept Relationships
How These Concepts Connect:
Anomaly types determine detection methods: point anomalies suit statistical methods, contextual anomalies call for time-series methods, and collective anomalies require ML
ML methods handle pattern complexity: When Z-score fails because patterns are multi-dimensional, Isolation Forest succeeds
Training data drives method selection: Unsupervised (Isolation Forest) works with no labels, supervised needs labeled anomalies
See Also
Foundation Concepts (read these first):
Anomaly Types - Understand point vs contextual vs collective anomalies before choosing ML methods
Statistical Methods - Learn when statistical approaches fail and ML becomes necessary
Related Detection Methods:
Time-Series Methods - ARIMA and STL for temporal patterns before trying LSTM
Experiment 1: Compare Detection Methods on Multi-Sensor Data
Modify the Isolation Forest example to compare three approaches on the same motor monitoring data:
Z-score (statistical): Apply to each sensor individually
Isolation Forest (ML): Consider all sensors together
Manual rules: Define thresholds for temperature, vibration, current
Generate test data where individual sensors are in range but the combination is anomalous (e.g., high temperature + high vibration + low RPM). Which method catches it?
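A starter for this experiment, with an assumed two-sensor setup: generate correlated temperature/vibration data, then craft a point where each sensor is within 3 sigma individually but the combination breaks the correlation.

```python
# Experiment 1 starter: per-sensor z-score vs Isolation Forest (synthetic data).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)

# Normal operation: vibration rises with temperature (strong correlation)
temp = rng.normal(60.0, 5.0, 3000)
vib = 0.08 * (temp - 60.0) + rng.normal(2.0, 0.1, 3000)
X = np.column_stack([temp, vib])

mu, sigma = X.mean(axis=0), X.std(axis=0)

# Suspect point: temp at +2.5 sigma, vib at -2.5 sigma - each in range,
# but high temperature with LOW vibration never occurs in training data
suspect = mu + np.array([2.5, -2.5]) * sigma

# Per-sensor z-score with a 3-sigma threshold misses it (both |z| = 2.5)
z = np.abs((suspect - mu) / sigma)
zscore_flagged = bool(np.any(z > 3.0))

# Isolation Forest sees both sensors together
iso = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
iso.fit(X)
iso_flagged = bool(iso.predict(suspect.reshape(1, -1))[0] == -1)
print(f"z-score flagged: {zscore_flagged}, Isolation Forest flagged: {iso_flagged}")
```

Extending this with the manual-rule detector and a third sensor (current or RPM) completes the three-way comparison.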
Experiment 2: Tune Contamination Parameter
The Isolation Forest contamination parameter controls how many anomalies to expect. Try values from 0.001 to 0.1 on your data:
contamination=0.001 (0.1%): Very strict, few alerts
contamination=0.01 (1%): Balanced
contamination=0.05 (5%): Loose, many alerts
Plot precision vs recall for each setting. How does this trade-off compare to adjusting Z-score thresholds?
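One way to run the sweep, on a labeled synthetic set (the anomaly count and value ranges are illustrative):

```python
# Experiment 2: precision/recall at each contamination setting.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)

# 2000 normal points plus 20 injected anomalies (1% true anomaly rate)
normal = rng.normal(0.0, 1.0, size=(2000, 3))
anomalies = rng.uniform(5.0, 8.0, size=(20, 3))
X = np.vstack([normal, anomalies])
y = np.concatenate([np.zeros(2000), np.ones(20)])   # 1 = anomaly

results = []
for c in [0.001, 0.005, 0.01, 0.05, 0.1]:
    pred = IsolationForest(contamination=c, random_state=0).fit_predict(X)
    flagged = pred == -1
    tp = np.sum(flagged & (y == 1))
    precision = tp / max(flagged.sum(), 1)
    recall = tp / 20
    results.append((c, precision, recall))
    print(f"contamination={c:<6} precision={precision:.2f} recall={recall:.2f}")
```

With a fixed `random_state` the forest is identical at every setting; `contamination` only moves the score threshold, so the flagged sets are nested and recall can only grow as contamination increases, at the cost of precision.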
Experiment 3: Edge Deployment Simulation
Convert the Isolation Forest detector to use fixed-point arithmetic (integers instead of floats) and measure:
Memory usage (bytes)
Processing time per sample (milliseconds)
Accuracy change vs full-precision model
Can this run on an ESP32 with 520KB RAM? If not, how would you simplify it?
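A back-of-envelope memory estimate helps before writing any firmware. The node layout below is an assumption for illustration (sklearn's internal representation differs): an isolation tree on n subsamples has about 2n - 1 nodes, each needing a feature index, a split value, and two child indices.

```python
# Rough memory estimate for an embedded Isolation Forest (assumed node layout).
SUBSAMPLE = 256                      # sklearn's default max_samples
TREES = 50
NODES_PER_TREE = 2 * SUBSAMPLE - 1   # binary tree on 256 subsamples

float_node = 1 + 4 + 2 * 2           # uint8 feature + float32 split + 2 uint16 children
fixed_node = 1 + 2 + 2 * 2           # int16 fixed-point split instead of float32

float_bytes = TREES * NODES_PER_TREE * float_node
fixed_bytes = TREES * NODES_PER_TREE * fixed_node
print(f"float32 model: {float_bytes / 1024:.1f} KB, fixed-point: {fixed_bytes / 1024:.1f} KB")
print(f"fits in ESP32 520 KB RAM: {fixed_bytes < 520 * 1024}")
```

Under these assumptions both variants fit in the ESP32's RAM, but only alongside the rest of the firmware if the tree count or subsample size is trimmed.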
Challenge: Build a Hybrid Detector
Combine Z-score (for fast edge detection) with Isolation Forest (for deeper analysis):
1. Edge device: Z-score flags readings >3 sigma
2. Gateway: Isolation Forest analyzes flagged readings + recent history
3. Cloud: LSTM autoencoder for long-term pattern analysis
Compare detection rate and latency vs running each method alone.
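A minimal sketch of the first two stages (function names and data are hypothetical): the edge gate is a cheap z-score check, and only flagged readings, together with recent history, reach the gateway's Isolation Forest.

```python
# Hybrid detector sketch: z-score gate on the edge, Isolation Forest on the gateway.
import numpy as np
from collections import deque
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
baseline = rng.normal(0.0, 1.0, size=(2000, 3))
mu, sigma = baseline.mean(axis=0), baseline.std(axis=0)

gateway = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
gateway.fit((baseline - mu) / sigma)

history = deque(maxlen=50)           # recent readings kept for context

def edge_gate(reading):
    """Stage 1 on the device: flag if any sensor exceeds 3 sigma."""
    return bool(np.any(np.abs((reading - mu) / sigma) > 3.0))

def process(reading):
    history.append(reading)
    if not edge_gate(reading):
        return "normal"              # cheap path: no gateway traffic
    window = (np.asarray(history) - mu) / sigma
    verdict = gateway.predict(window)[-1]   # classify the flagged reading in context
    return "anomaly" if verdict == -1 else "normal"

stream = rng.normal(0.0, 1.0, size=(200, 3))
stream[100] = np.array([6.0, 6.0, 6.0])     # injected fault
labels = [process(r) for r in stream]
print(f"anomalies reported: {labels.count('anomaly')}")
```

The gate keeps gateway traffic low; the Isolation Forest then suppresses most of the single-sensor excursions the z-score check would have alerted on alone.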
Common Pitfalls
1. Training on contaminated data
The mistake: Including undetected anomalies in your “normal” training set. If 2% of your training data are actually anomalies, the model learns them as normal and will never flag similar patterns.
Why it’s easy to make: You rarely have perfectly clean data. Equipment may have had intermittent faults during the training period that went unnoticed.
How to fix: Use robust preprocessing – remove statistical outliers from training data before fitting the ML model. For Isolation Forest, set a conservative contamination parameter during a “cleaning pass” to filter the training set first, then retrain on the cleaned data.
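The cleaning pass can be sketched in a few lines. The contamination values and the 2% fault mixture below are illustrative:

```python
# Two-pass training: filter the raw set, then fit the production model on it.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(5)

# Raw training set: 2% undetected anomalies mixed into the "normal" data
normal = rng.normal(0.0, 1.0, size=(4900, 4))
hidden_faults = rng.normal(4.0, 0.5, size=(100, 4))
X_raw = np.vstack([normal, hidden_faults])

# Pass 1 (cleaning): contamination set above the suspected anomaly rate,
# so likely faults are filtered out even at the cost of dropping some
# genuinely normal samples
cleaner = IsolationForest(contamination=0.03, random_state=0).fit(X_raw)
X_clean = X_raw[cleaner.predict(X_raw) == 1]

# Pass 2 (production): retrain only on the cleaned set
production = IsolationForest(contamination=0.005, random_state=0).fit(X_clean)
print(f"kept {len(X_clean)} of {len(X_raw)} samples for the final fit")
```

Losing a few percent of genuinely normal samples in pass 1 is a cheap price for a production model that has never seen the hidden faults labeled as normal.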
2. Ignoring concept drift
The mistake: Deploying a model trained on summer data and expecting it to work in winter. Normal behavior shifts with seasons, equipment aging, process changes, and workload patterns.
Why it’s easy to make: The model performs well during initial deployment (same conditions as training), creating false confidence. Degradation is gradual – precision drops from 90% to 60% over months without obvious failure.
How to fix: Monitor model performance metrics (false positive rate, reconstruction error distribution) continuously. Retrain quarterly or implement online threshold adaptation. Alert when the baseline error distribution shifts significantly from the training distribution.
3. Using accuracy as the primary metric
The mistake: Reporting “98% accuracy” for an anomaly detector when the anomaly rate is 1%. A model that always predicts “normal” achieves 99% accuracy while catching zero anomalies.
Why it’s easy to make: Accuracy is the default metric in most ML tutorials. It is deeply misleading for imbalanced problems like anomaly detection.
How to fix: Use precision (are my alerts real?), recall (am I catching failures?), and F1-score. For IoT, also compute the cost-weighted metric: total_cost = FP × cost_per_false_alarm + FN × cost_per_missed_failure. Optimize for the metric that matches your business objective.
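The cost-weighted metric is simple to compute. The numbers below reuse the HVAC deployment's rough figures (23 of 26 failures caught, 847 false alarms, $150 per dispatch); the $6,000 cost per missed failure is an assumed figure for illustration.

```python
# Cost-weighted evaluation: total_cost = FP * cost_per_false_alarm + FN * cost_per_missed_failure
def detection_metrics(tp, fp, fn, cost_fp, cost_fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    total_cost = fp * cost_fp + fn * cost_fn
    return precision, recall, f1, total_cost

p, r, f1, cost = detection_metrics(tp=23, fp=847, fn=3, cost_fp=150, cost_fn=6000)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f} total_cost=${cost:,}")
```

Note how low the precision is despite strong recall; optimizing accuracy or even F1 alone would hide the business question of whether 847 dispatches are worth 23 prevented failures.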
4. Over-engineering when statistical methods suffice
The mistake: Deploying an LSTM autoencoder on a cloud GPU to detect a temperature sensor exceeding 80C. A simple threshold check on the ESP32 would achieve the same result at 1/1000th the latency and cost.
Why it’s easy to make: ML methods are exciting and feel more sophisticated. There is pressure to use “AI” even when simpler approaches work.
How to fix: Start with the simplest method that meets requirements. If Z-score with a 3-sigma threshold catches 95% of your anomalies, you do not need Isolation Forest. Graduate to ML only when: (a) anomalies are multi-dimensional, (b) patterns are temporal, or (c) statistical methods have unacceptable false positive rates.
5. Setting contamination parameter without domain knowledge
The mistake: Using the default contamination=0.1 (10%) from sklearn tutorials when your actual anomaly rate is 0.5%. This floods operators with false alarms, causing alert fatigue, and operators start ignoring real anomalies.
Why it’s easy to make: The contamination parameter seems like just another hyperparameter. Without historical anomaly data, choosing the right value feels arbitrary.
How to fix: Always start with your domain’s historical failure rate. If unknown, start conservative (0.005-0.01) and adjust based on operator feedback. Track the false positive rate weekly and tune until it matches the acceptable alert burden for your team (typically 1-5 false alarms per device per month).
13.8 Summary
Machine learning methods excel at detecting complex anomalies that statistical methods miss:
Isolation Forest: Best for multi-dimensional data, no labels required, edge-deployable
Autoencoders: Excel at high-dimensional sensor data with complex correlations
LSTM: Captures temporal sequence patterns for time-series anomalies
Key Takeaway: Start with statistical methods for point anomalies. Graduate to ML for collective anomalies, high dimensions, or when statistical methods have too many false positives.
For Kids: Meet the Sensor Squad!
Sammy the Sensor was puzzled. Every day, he measured the temperature in the school greenhouse, and every day it was around 25 degrees. But one morning, the reading said 25 degrees, the humidity said 80%, and the light level was high – all perfectly normal numbers on their own. Yet Lila the LED started flashing red!
“Why are you alarming?” Sammy asked. “Each of my readings looks fine!”
Max the Microcontroller grinned. “That is where machine learning comes in! I have been watching ALL your readings together for months. Normally when light is high, temperature goes up a little and humidity goes down. Today, all three are high at the same time – that combination has never happened before!”
“So it is not about one number being wrong,” Sammy realized. “It is about the pattern being unusual!”
“Exactly!” said Max. “I used something called an Isolation Forest. Think of it like a game of 20 Questions. Normal readings take lots of questions to tell apart, but weird combinations stand out right away – like finding a penguin at a beach party!”
Bella the Battery chimed in: “And the best part is, Max learned what ‘normal’ looks like all by himself, just by watching data for a while. He did not need anyone to tell him what ‘broken’ looks like!”
They investigated and found that a sprinkler had accidentally turned on during sunny hours, explaining the unusual combination. Machine learning saved the plants from overwatering!
Key lesson: Machine learning finds unusual patterns that simple rules miss, especially when you need to look at many measurements together!
13.9 What’s Next
If you want to…
Read this
Build production-grade detection pipelines with edge-cloud architecture