Apply Isolation Forest: Detect anomalies in high-dimensional data without labeled examples
Build Autoencoders: Use neural networks to learn normal patterns and flag deviations
Implement LSTM Networks: Detect temporal sequence anomalies in streaming data
Choose Between ML Methods: Select appropriate algorithms based on data characteristics and deployment constraints
In 60 Seconds
ML anomaly detection learns what “normal” sensor behavior looks like from historical data and flags anything that deviates. Unlike manual thresholds, these methods catch complex multi-dimensional patterns – for example, temperature and vibration individually normal but their combination at a specific RPM being unusual. Isolation Forest works without labeled anomalies and runs on edge devices. Autoencoders handle high-dimensional data (20+ sensors). LSTMs detect temporal sequence anomalies. Start simple (Isolation Forest), graduate to deep learning only when needed.
For Beginners: ML-Based Anomaly Detection
Data analytics and machine learning for IoT is about extracting useful insights from the massive streams of sensor data. Think of it like panning for gold – raw sensor readings are the river gravel, and analytics tools help you find the valuable nuggets of information hidden within. Machine learning takes this further by automatically learning patterns and making predictions.
Minimum Viable Understanding: ML Anomaly Detection
Core Concept: Machine learning methods learn what “normal” looks like from historical data and flag points that don’t fit the learned patterns. No explicit rules or thresholds required.
Why It Matters: ML methods detect complex, multi-dimensional anomalies that statistical methods miss. A motor with normal temperature, normal vibration, and normal current individually might still be anomalous if the combination of these values is unusual.
Key Takeaway: Use Isolation Forest when you have multi-dimensional data and no labeled anomalies. Use autoencoders for high-dimensional sensor data. Use LSTM for sequential patterns over time.
13.2 Prerequisites
Before diving into this chapter, you should be familiar with:
Statistical Methods: Understanding when simpler methods fail and ML is needed
Anomaly Types: Understanding collective anomalies that require pattern-based detection
~20 min | Advanced | P10.C01.U04
Key Concepts
Isolation Forest: Unsupervised ML algorithm that detects anomalies by measuring how few random splits are needed to isolate a data point – fewer splits means more anomalous
Autoencoder: Neural network trained to compress and reconstruct normal data; anomalies produce high reconstruction error because the network has never learned their patterns
LSTM (Long Short-Term Memory): Recurrent neural network that learns sequential patterns over time, detecting anomalies when actual values deviate from predicted next values
Contamination parameter: The expected proportion of anomalies in the dataset, used by Isolation Forest to set its detection threshold
Reconstruction error: The difference between an autoencoder’s input and its output; high error signals the input deviates from learned normal patterns
Concept drift: Gradual change in what constitutes “normal” behavior over time, causing static ML models to degrade without periodic retraining
Unsupervised learning: Training approach requiring only normal (unlabeled) data – the model learns the structure of normality without being told what anomalies look like
13.3 Introduction
When statistical methods reach their limits – complex patterns, high-dimensional data, or no known distribution – machine learning becomes essential. This chapter covers three core ML approaches for anomaly detection: Isolation Forest for unsupervised multi-dimensional detection, autoencoders for high-dimensional pattern learning, and LSTM networks for temporal sequence analysis. Each trades off complexity for detection power, and understanding when to use which method is key to building effective IoT monitoring systems.
How It Works
ML-based anomaly detection follows a fundamentally different approach than statistical methods. Instead of defining what is “abnormal” (like thresholds), ML learns what is “normal” from historical data and flags deviations.
The Learning Process:
Training Phase: Feed the model thousands of examples of normal sensor behavior. The model learns patterns, correlations, and typical ranges without being explicitly told what they are.
Pattern Encoding: The model internally represents “normal” as mathematical patterns. For Isolation Forest, this means decision trees that separate normal data. For autoencoders, it means compressed representations that capture essential features. For LSTM, it means sequential patterns over time.
Detection Phase: When new data arrives, the model compares it to learned patterns. High deviation scores indicate anomalies. The key advantage is that the model detects complex patterns humans might miss – like “vibration is normal, temperature is normal, but this combination at this RPM is unusual.”
Why This Works Better Than Rules: Traditional rule-based systems require engineers to manually define every possible failure mode. ML systems discover failure patterns automatically from data, including rare combinations that humans would never think to check.
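To make "individually normal, jointly anomalous" concrete, here is a small illustrative sketch (the sensor values and the Mahalanobis-distance check are a toy example of our own, not a production method):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate normal motor behavior: temperature and vibration are strongly
# correlated (hotter bearings vibrate more).
temp = rng.normal(75, 2.0, 5000)
vib = 1.2 + 0.05 * (temp - 75) + rng.normal(0, 0.02, 5000)
X = np.column_stack([temp, vib])

mean = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

def mahalanobis(x):
    """Distance that accounts for feature correlations, in 'combined sigmas'."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

# Each feature individually within ~1.5 sigma of its own mean...
point = np.array([78.0, 1.12])  # hot motor, but vibration is LOW
z_temp = abs(point[0] - mean[0]) / X[:, 0].std()
z_vib = abs(point[1] - mean[1]) / X[:, 1].std()

print(f"z(temp)={z_temp:.1f}, z(vib)={z_vib:.1f}, combined={mahalanobis(point):.1f}")
# Per-feature z-scores look normal, but the combination sits far outside
# the learned correlation structure -- a multi-dimensional anomaly.
```

A per-feature threshold check would pass this reading; only a method that has learned the joint structure of the data flags it.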
13.4 Unsupervised Learning
Key Advantage: No labeled anomaly data required. The model learns “normal” from unlabeled data and flags deviations.
13.4.1 Isolation Forest
Core Concept: Anomalies are “easier to isolate” than normal points. Build random decision trees that split data – anomalies require fewer splits to isolate.
Why It Works:
Normal points are clustered – requires many splits to isolate one point
Anomalies are isolated – few splits needed
Algorithm:
Randomly select a feature and split value
Partition data recursively
Anomalies end up in shorter paths (fewer splits)
Anomaly score = average path length across trees
Putting Numbers to It
Isolation Forest anomaly scores are based on path length in binary trees. For a dataset of \(n\) samples, the average path length for normal points follows:
\[c(n) = 2H(n-1) - \frac{2(n-1)}{n}\]
where \(H(i)\) is the harmonic number (\(H(i) \approx \ln(i) + 0.5772\), the Euler-Mascheroni constant). The anomaly score is:
\[s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}\]
where \(E(h(x))\) is the average path length for point \(x\) across all trees.
Example: For \(n = 1,000\) training samples and 100 trees:

- Normal point: \(E(h) \approx 12\) splits, \(s \approx 0.53\) (near the 0.5 decision boundary)
- Anomaly: \(E(h) \approx 4\) splits, \(s \approx 0.81\) (clearly anomalous)
With contamination=0.01, the threshold is set to catch the top 1% of scores. For a motor with 4 features, an anomaly requiring only 3-4 splits (vs. 10-12 for normal) indicates a rare multi-dimensional pattern worth investigating.
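The formulas above can be checked directly with a quick sketch (the helper names `c` and `anomaly_score` are ours):

```python
import numpy as np

def c(n):
    """Average BST path length c(n) = 2H(n-1) - 2(n-1)/n,
    with H(i) ~ ln(i) + 0.5772 (Euler-Mascheroni constant)."""
    H = np.log(n - 1) + 0.5772156649
    return 2 * H - 2 * (n - 1) / n

def anomaly_score(avg_path_length, n):
    """Isolation Forest score s(x, n) = 2^(-E(h)/c(n)); closer to 1 = more anomalous."""
    return 2 ** (-avg_path_length / c(n))

n = 1000
print(f"c({n}) = {c(n):.2f} expected splits")                # ~13 splits
print(f"Normal  (E(h)=12): s = {anomaly_score(12, n):.2f}")  # ~0.53
print(f"Anomaly (E(h)=4):  s = {anomaly_score(4, n):.2f}")   # ~0.81
```

The score is scale-free: it compares a point's observed path length against the expected path length for a dataset of that size, so scores are comparable across datasets.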
Implementation:
```python
from sklearn.ensemble import IsolationForest
import numpy as np

class IsolationForestDetector:
    def __init__(self, contamination=0.01, n_estimators=100):
        """
        contamination: expected proportion of anomalies (1% default)
        n_estimators: number of trees
        """
        self.model = IsolationForest(
            contamination=contamination,
            n_estimators=n_estimators,
            random_state=42
        )
        self.is_trained = False

    def train(self, normal_data):
        """
        Train on normal operating data
        normal_data: shape (n_samples, n_features)
        """
        self.model.fit(normal_data)
        self.is_trained = True

    def predict(self, data):
        """
        Predict anomalies in new data
        Returns: array of -1 (anomaly) or 1 (normal)
        """
        if not self.is_trained:
            raise ValueError("Model must be trained first")
        return self.model.predict(data)

    def score(self, data):
        """
        Get anomaly scores (more negative = more anomalous)
        """
        return self.model.score_samples(data)

# Example: Multi-sensor motor monitoring
# Features: [temperature, vibration, current, RPM]

# Normal operating data (1000 samples)
normal_data = np.array([
    [75 + np.random.normal(0, 2),       # Temperature ~75 C
     1.2 + np.random.normal(0, 0.1),    # Vibration ~1.2 mm/s
     8.5 + np.random.normal(0, 0.3),    # Current ~8.5 A
     1450 + np.random.normal(0, 10)]    # RPM ~1450
    for _ in range(1000)
])

detector = IsolationForestDetector(contamination=0.01)
detector.train(normal_data)

# Test data including anomalies
test_data = np.array([
    [76, 1.3, 8.4, 1455],    # Normal
    [74, 1.2, 8.6, 1448],    # Normal
    [92, 3.8, 12.1, 1480],   # Anomaly: overheating + high vibration
    [75, 1.1, 8.5, 1200],    # Anomaly: RPM drop
])

predictions = detector.predict(test_data)
scores = detector.score(test_data)

for i, (pred, score) in enumerate(zip(predictions, scores)):
    status = "ANOMALY" if pred == -1 else "Normal"
    print(f"Sample {i}: {status} (score: {score:.3f})")

# Example output (score_samples values are negative; exact values vary by run):
# Sample 0: Normal (score: -0.42)
# Sample 1: Normal (score: -0.43)
# Sample 2: ANOMALY (score: -0.64)  <- Detected!
# Sample 3: ANOMALY (score: -0.59)  <- Detected!
```
Explore how Isolation Forest detects anomalies in 2D sensor data. The scatter plot shows a cluster of “normal” motor readings (temperature vs. vibration). Use the sliders to place a test point and adjust the contamination parameter. Points far from the cluster center are easier to “isolate” and receive higher anomaly scores.
Advantages:
No labeled anomalies needed for training
Handles high-dimensional data well
Computationally efficient (can run on edge gateways)
Limitations:
The contamination parameter must be set close to the true anomaly rate, which is often unknown in advance
Struggles with evolving “normal” (concept drift)
13.4.2 One-Class SVM
Core Concept: Learn a boundary that encloses normal data points in high-dimensional space. Points outside boundary are anomalies.
When to Use:
Small, well-defined “normal” region
You have clean training data (no anomalies in training set)
Moderate dimensionality (up to ~50 features)
Trade-Off: Can achieve tighter decision boundaries than Isolation Forest for well-defined normal regions, but computationally expensive (\(O(n^2)\) to \(O(n^3)\) training) – typically cloud-deployed.
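A minimal sketch using scikit-learn's OneClassSVM (synthetic two-feature data; the `nu` and `gamma` values are illustrative, not tuned):

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Clean training data: normal motor readings only [temperature, vibration]
normal = np.column_stack([
    rng.normal(75, 2.0, 500),
    rng.normal(1.2, 0.1, 500),
])

# nu bounds the fraction of training points allowed outside the boundary,
# playing a role similar to Isolation Forest's contamination parameter.
scaler = StandardScaler().fit(normal)
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
model.fit(scaler.transform(normal))

test = np.array([
    [76.0, 1.25],   # inside the learned boundary
    [92.0, 3.8],    # far outside: overheating + high vibration
])
print(model.predict(scaler.transform(test)))  # 1 = normal, -1 = anomaly
```

Standardizing features before fitting matters here: the RBF kernel measures distances, and unscaled temperature values would otherwise dominate vibration.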
13.4.3 K-Nearest Neighbors (KNN)
Core Concept: For each point, find K nearest neighbors. If average distance to neighbors is high, point is anomalous.
Algorithm:
For each test point:
1. Find K nearest neighbors in training data
2. Calculate average distance to K neighbors
3. If distance > threshold, flag as anomaly
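The steps above can be sketched in a few lines of NumPy (brute-force search; `k` and the 99th-percentile threshold are illustrative choices):

```python
import numpy as np

def knn_anomaly_scores(train, test, k=5):
    """Average Euclidean distance from each test point to its k nearest
    training points. Brute force: every query scans the whole training set,
    which is why KNN is memory- and compute-heavy at the edge."""
    scores = []
    for x in test:
        d = np.sqrt(((train - x) ** 2).sum(axis=1))
        scores.append(np.sort(d)[:k].mean())
    return np.array(scores)

rng = np.random.default_rng(1)
normal = np.column_stack([rng.normal(75, 2, 1200), rng.normal(1.2, 0.1, 1200)])
train, holdout = normal[:1000], normal[1000:]

# Threshold: 99th percentile of scores on held-out normal data
threshold = np.percentile(knn_anomaly_scores(train, holdout), 99)

test = np.array([[76.0, 1.25],   # typical reading
                 [92.0, 3.8]])   # overheating + high vibration
scores = knn_anomaly_scores(train, test)
print(scores > threshold)
```

Note that the detector must keep all 1,000 training points in memory and compute 1,000 distances per query, which is the source of the "Memory: High" and "Speed: Low" ratings below.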
Pros/Cons:
| Aspect | Rating | Notes |
|---|---|---|
| Accuracy | High | Very accurate for low-dimensional data |
| Speed | Low | Slow for large datasets (must search all points) |
| Memory | High | Must store entire training dataset |
| Edge Deploy | No | Too resource-intensive for constrained devices |
Tradeoff: Statistical Methods vs ML-Based Anomaly Detection
Option A: Statistical methods (Z-score, IQR, ARIMA residuals)

Option B: Machine learning approaches (Isolation Forest, Autoencoders, LSTM)

Decision Factors: Statistical methods require minimal training data, are interpretable, run efficiently on edge devices, and work well for point anomalies with known distributions. ML methods excel at detecting complex multi-dimensional patterns, collective anomalies, and subtle deviations without explicit threshold tuning, but require representative training data and more computational resources. For edge deployment with <100KB RAM, use statistical methods. For cloud-based detection of complex industrial equipment patterns, use ML. Many production systems use statistical methods at the edge for fast response, with ML in the cloud for deeper analysis.
Try It: Z-Score Anomaly Detection for IoT Sensors
Objective: Z-score detection is the simplest statistical anomaly method and often the first line of defense on edge devices. It flags readings that deviate more than a threshold number of standard deviations from the mean. This code processes a real-time sensor stream and demonstrates why Z-score works well for point anomalies but misses collective anomalies.
```python
import numpy as np

class ZScoreDetector:
    """Real-time Z-score anomaly detector using Welford's online algorithm.
    Uses O(1) memory -- ideal for microcontrollers."""

    def __init__(self, threshold=3.0, warmup=30):
        self.threshold = threshold
        self.warmup = warmup
        self.n = 0
        self.mean = 0.0
        self.M2 = 0.0  # Running sum of squared deviations

    def update(self, value):
        """Process one reading. Returns (z_score, is_anomaly)."""
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.M2 += delta * (value - self.mean)
        if self.n < self.warmup:
            return 0.0, False
        std = (self.M2 / (self.n - 1)) ** 0.5
        if std == 0:
            return 0.0, False
        z_score = abs(value - self.mean) / std
        return round(z_score, 2), z_score > self.threshold

# Example: Temperature sensor with injected anomalies
readings = list(np.random.normal(23.0, 1.5, 100))
readings[40] = 35.0   # Spike anomaly
readings[70] = 5.0    # Sensor disconnect

detector = ZScoreDetector(threshold=3.0, warmup=30)
for i, temp in enumerate(readings):
    z, is_anomaly = detector.update(temp)
    if is_anomaly:
        print(f"[ANOMALY] Reading {i}: {temp:.1f}C (z={z})")

# Detects sudden spikes but misses gradual drift -- use Isolation
# Forest or LSTM for collective anomaly detection.
```
What to Observe:
Z-score uses only O(1) memory (running mean and variance) – ideal for ESP32 microcontrollers with limited RAM
The warmup period prevents false positives during initial calibration when mean/std are unstable
Sudden spikes (35C) and drops (5C) are easily detected as they exceed 3 standard deviations
Gradual drift (23 -> 24 -> 25 -> 26.5C) is missed because each individual step is within normal range
In production, combine Z-score at the edge (fast, catches obvious failures) with ML in the cloud (catches subtle patterns)
Try It: Confusion Matrix for IoT Anomaly Alerting
Objective: Evaluate anomaly detection performance using a confusion matrix. In IoT, the cost of false negatives (missed failures) and false positives (unnecessary maintenance) differs dramatically, so accuracy alone is misleading. This code calculates precision, recall, and F1-score and shows how to tune thresholds for different cost structures.
```python
import numpy as np

def confusion_matrix_analysis(y_true, y_pred):
    """Compute confusion matrix and metrics for anomaly detection."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}

# Simulated labels: 1000 readings, 2% anomaly rate (20 true anomalies),
# detector catches 16 of them (80% recall) and raises 30 false positives
y_true = [1] * 20 + [0] * 980
y_pred = [1] * 16 + [0] * 4 + [1] * 30 + [0] * 950

m = confusion_matrix_analysis(y_true, y_pred)
print(f"Precision: {m['precision']:.1%}, Recall: {m['recall']:.1%}, F1: {m['f1']:.1%}")

# IoT cost analysis: missing failures is far more expensive than false alarms
cost_fp, cost_fn = 150, 5000  # $/event
total_cost = m['fp'] * cost_fp + m['fn'] * cost_fn

# Key insight: ~97% accuracy sounds great, but missing 4 failures
# costs $20,000 vs $4,500 for false alarms. In IoT, recall matters most.
```
What to Observe:
Accuracy is misleading for rare anomalies: 96% accuracy sounds good, but a model that predicts “normal” for everything gets 98% accuracy with 2% anomaly rate
Precision answers “when we alert, are we right?” – low precision means too many false alarms causing alert fatigue
Recall answers “do we catch real failures?” – low recall means missed equipment failures
Cost asymmetry drives threshold tuning: if a missed failure costs 33x more than a false alarm, optimize for recall
For safety-critical IoT (medical, industrial), target recall above 95% even at the cost of more false positives
13.5 Deep Learning
When anomalies have complex temporal or spatial patterns, deep learning becomes powerful.
13.5.1 Autoencoders
Core Concept: Neural network that compresses data (encodes) then reconstructs it (decodes). Train on normal data – anomalies reconstruct poorly (high reconstruction error).
Architecture:
```
Input -> Encoder (compress) -> Bottleneck -> Decoder (reconstruct) -> Output
[Train on normal data only]                                              |
                                                         Compare with Input
                                                        High error = Anomaly
```
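Real autoencoders are nonlinear neural networks, but the compress-reconstruct-compare loop can be illustrated without a deep learning framework using PCA as a linear stand-in (synthetic three-sensor data; the 1-component bottleneck is our assumption):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)

# Normal data: 3 sensors that move together (shared underlying load factor)
load = rng.normal(0, 1, 2000)
X = np.column_stack([
    75 + 4 * load + rng.normal(0, 0.3, 2000),     # temperature
    1.2 + 0.2 * load + rng.normal(0, 0.02, 2000), # vibration
    8.5 + 0.8 * load + rng.normal(0, 0.1, 2000),  # current
])

# "Encode" to a 1-component bottleneck, "decode" back, measure error
pca = PCA(n_components=1).fit(X)

def reconstruction_error(data):
    recon = pca.inverse_transform(pca.transform(data))
    return np.mean((data - recon) ** 2, axis=1)

threshold = np.percentile(reconstruction_error(X), 99)

# Anomaly: each value plausible alone, but they violate the learned correlation
anomaly = np.array([[83.0, 1.1, 8.0]])  # hot, yet vibration/current stay low
print(reconstruction_error(anomaly)[0] > threshold)  # True
```

A neural autoencoder generalizes this idea: nonlinear encoder/decoder layers capture curved manifolds of "normal" that a single linear component cannot.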
Explore how an autoencoder detects anomalies by measuring reconstruction error. A “normal” signal is a sinusoidal temperature pattern. Adjust the amplitude and frequency offset to deviate from normal, and watch the reconstruction error rise. When the error exceeds the threshold, the reading is flagged as anomalous.
When Autoencoders Excel:
High-dimensional sensor data (>20 features)
Complex, nonlinear relationships between sensors
Sufficient training data (>10,000 samples)
Deployment Considerations:
Training: Cloud-based (GPUs)
Inference: Can run on edge gateways (Raspberry Pi 4, NVIDIA Jetson)
13.5.2 LSTM (Long Short-Term Memory) Networks
Core Concept: Recurrent neural network that learns temporal sequences. Predict next value(s) based on recent history – large prediction errors indicate anomalies.
Architecture for Anomaly Detection:
```
Sliding window of past N timesteps -> LSTM layers -> Predict next value
                                                              |
                                             Compare prediction vs actual
                                                 High error = Anomaly
```
Ideal for:
Vibration patterns (bearing wear develops over time)
Network traffic (DDoS attacks show temporal patterns)
Power consumption (usage patterns span hours/days)
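The predict-and-compare loop can be sketched with a rolling-mean predictor standing in for a trained LSTM (toy vibration data; the window size and the mean + 4-sigma threshold are illustrative choices):

```python
import numpy as np

def predict_next(history, window=10):
    """Toy stand-in for an LSTM's learned next-step prediction:
    a rolling mean of the last `window` values."""
    return np.mean(history[-window:])

rng = np.random.default_rng(3)
t = np.arange(300)
vibration = 1.2 + 0.3 * np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.02, 300)
vibration[200] += 1.5  # injected spike (bearing impact)

window = 10
errors = np.array([abs(vibration[i] - predict_next(vibration[:i], window))
                   for i in range(window, len(vibration))])
threshold = errors.mean() + 4 * errors.std()
flagged = np.where(errors > threshold)[0] + window
print(flagged)  # timestep(s) near 200
```

A real LSTM replaces the rolling mean with a learned nonlinear function of the window, so it can anticipate periodic patterns instead of lagging behind them, giving much lower baseline error and higher sensitivity.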
Try It: LSTM Autoencoder for Time-Series Anomaly Detection
Objective: Build a conceptual LSTM autoencoder that learns normal temporal patterns in sensor data and flags sequences that deviate from learned behavior. Unlike point-based methods (Z-score, Isolation Forest), LSTMs capture sequential dependencies – detecting anomalies like “temperature rising while compressor is running” that only make sense in context.
```python
import numpy as np

class SimpleLSTMAutoencoder:
    """Conceptual LSTM autoencoder for anomaly detection.
    Architecture: Input window -> Encoder -> Bottleneck -> Decoder -> Reconstruction
    High reconstruction error = anomaly. Use tf.keras.layers.LSTM in production."""

    def __init__(self, window_size=20, threshold_percentile=95):
        self.window_size = window_size
        self.threshold_percentile = threshold_percentile

    def train(self, normal_data):
        """Learn normal patterns from sliding windows of historical data."""
        windows = np.array([normal_data[i:i + self.window_size]
                            for i in range(len(normal_data) - self.window_size)])
        self.training_mean = np.mean(windows, axis=0)
        self.training_std = np.std(windows, axis=0) + 1e-8
        errors = self._compute_errors(windows)
        self.threshold = np.percentile(errors, self.threshold_percentile)

    def _compute_errors(self, windows):
        """MSE between input and reconstructed output (simplified)."""
        normalized = (windows - self.training_mean) / self.training_std
        return np.mean(normalized ** 2, axis=1)

    def detect(self, data):
        """Flag windows where reconstruction error exceeds threshold."""
        windows = np.array([data[i:i + self.window_size]
                            for i in range(len(data) - self.window_size)])
        errors = self._compute_errors(windows)
        return errors, errors > self.threshold

# Train on a normal sinusoidal temperature pattern, test with injected anomalies
normal = 22 + 3 * np.sin(2 * np.pi * np.arange(2000) / 200) + np.random.normal(0, 0.5, 2000)
test = 22 + 3 * np.sin(2 * np.pi * np.arange(500) / 200) + np.random.normal(0, 0.5, 500)
test[200:230] = 25.0                    # Pattern break: flatline (individually normal, sequentially anomalous)
test[350:400] += np.linspace(0, 8, 50)  # Gradual drift

model = SimpleLSTMAutoencoder(window_size=20)
model.train(normal)
errors, anomalies = model.detect(test)

# LSTM detects pattern breaks and drift that Z-score misses,
# because it learned the expected SEQUENCE, not just valid ranges.
```
What to Observe:
The LSTM learns the normal sinusoidal temperature pattern from training data – not just the valid range
A “flatline” at 25C is detected as anomalous even though 25C is a perfectly normal temperature – the PATTERN is wrong
Gradual drift is detected because the sequence diverges from the expected sinusoidal shape
This is fundamentally different from Z-score (which only checks individual values) and Isolation Forest (which checks feature combinations but not sequences)
In production, use tf.keras.layers.LSTM with GPU training and TensorRT inference on edge devices
Explore how LSTM-style sequence prediction detects temporal anomalies. The chart shows a motor vibration time series with a predicted next value based on recent history (rolling average). Inject an anomaly at a specific timestamp and adjust the sensitivity threshold to see how prediction errors spike at anomalous points.
13.5.3 Isolation Forest Anomaly Score Explorer
Use this interactive calculator to explore how Isolation Forest scores change with dataset size and path length. Shorter paths (fewer splits to isolate a point) yield higher anomaly scores.
Select your data characteristics below and get a recommendation for which ML anomaly detection method best fits your scenario. The selector considers dimensionality, data availability, temporal structure, and deployment constraints.
Option A (Batch Training): Train models offline on historical data (weeks to months of sensor readings), then deploy frozen models that remain static until manually retrained. Training takes hours on cloud GPUs, but inference is fast (10-100 ms) with predictable performance.

Option B (Online Learning): Continuously update model parameters as new data arrives, adapting to concept drift in real time. Each new reading triggers an incremental model update (100-500 ms), allowing detection thresholds to evolve with changing equipment behavior.

Decision Factors: Choose batch training when normal behavior is stable over the deployment lifetime (factory equipment with fixed operating conditions), when regulatory compliance requires reproducible model behavior (medical devices, financial systems), or when edge compute resources cannot support training workloads. Choose online learning when "normal" shifts frequently (seasonal HVAC patterns, user behavior in smart homes), when deploying to environments without historical training data (new equipment types), or when concept drift causes batch models to degrade within weeks. Hybrid approach: a batch-trained base model with online threshold adaptation handles most industrial IoT scenarios, keeping the model structure frozen while decision boundaries adjust monthly based on confirmed anomaly feedback.
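The hybrid approach can be sketched as a frozen scorer with an adaptive threshold (a minimal illustration; the EWMA mean/variance update and all parameter values are our assumptions, and confirmed-feedback handling is omitted):

```python
import numpy as np

class AdaptiveThreshold:
    """Hybrid pattern sketch: a frozen, batch-trained model emits anomaly
    scores; only the decision threshold adapts online, tracking slow drift
    in the score distribution."""

    def __init__(self, alpha=0.01, k=4.0, warmup=50):
        self.alpha = alpha    # adaptation rate
        self.k = k            # sigmas above the moving mean
        self.warmup = warmup  # readings before alerting is enabled
        self.n = 0
        self.mean = 0.0
        self.var = 0.0

    def update(self, score):
        """Feed one model score; returns True if it exceeds the current threshold."""
        self.n += 1
        if self.n == 1:
            self.mean = score
            return False
        threshold = self.mean + self.k * np.sqrt(self.var)
        is_anomaly = self.n > self.warmup and score > threshold
        if not is_anomaly:  # adapt only on accepted scores (avoid self-poisoning)
            diff = score - self.mean
            self.mean += self.alpha * diff
            self.var = (1 - self.alpha) * (self.var + self.alpha * diff ** 2)
        return is_anomaly

rng = np.random.default_rng(5)
# Model scores drift slowly upward (e.g., seasonal shift in "normal"):
scores = rng.normal(0.30, 0.02, 500) + np.linspace(0.0, 0.2, 500)
detector = AdaptiveThreshold()
alarms = [detector.update(s) for s in scores]
spike_alarm = detector.update(0.9)  # genuine anomaly after the drift

print(f"false alarms during drift: {sum(alarms)}, spike flagged: {spike_alarm}")
```

The moving threshold follows the gradual drift without alarming on it, while a genuine spike still stands out against the adapted baseline.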
13.7 Worked Examples
Worked Example: Isolation Forest for HVAC Anomaly Detection
Scenario: A commercial building management company deploys anomaly detection across 200 HVAC units to identify failing compressors before breakdown. Each unit reports temperature, humidity, power consumption, and compressor vibration every 30 seconds.
Given:
Training data: 6 months of normal operation (5.2 million samples from 200 units)
Feature vector per sample: [supply_temp, return_temp, humidity, power_kW, vibration_rms, outdoor_temp]
Known anomaly rate from historical maintenance records: ~0.8%
Deployment target: Raspberry Pi 4 gateway (4GB RAM) serving 50 units
Requirement: <2 second detection latency, <5% false positive rate
Steps:
Data preparation:
Remove known maintenance periods and startup transients (first 10 minutes after power-on)
Normalize features: Z-score standardization using training set statistics
Clean dataset: 4.8 million samples after filtering
Result: Isolation Forest with 50 trees achieves 87% recall and 3.1% false positive rate. Over 3 months of production operation, the system detected 23 of 26 actual failures (88.5% real-world recall) with 847 false alarms across 200 units (1.4 false alarms per unit per month). Estimated savings: $156,000 in prevented emergency repairs vs. $127,000 investigation cost for false alarms ($150 per dispatch).
Key Insight: Isolation Forest’s contamination parameter directly controls the precision-recall tradeoff. Start with your historical anomaly rate, then adjust based on the cost ratio of false negatives (missed failures) to false positives (unnecessary investigations). For HVAC where emergency repairs cost 5-10x preventive maintenance, accepting higher false positive rates (2-5%) to achieve 85%+ recall is economically justified.
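Under stated assumptions (synthetic telemetry standing in for the real fleet data), the preparation and training steps from this example might be sketched as:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(11)

# Synthetic stand-in for cleaned fleet data:
# [supply_temp, return_temp, humidity, power_kW, vibration_rms, outdoor_temp]
n = 50_000
outdoor = rng.normal(18, 8, n)
X = np.column_stack([
    13 + 0.1 * outdoor + rng.normal(0, 0.5, n),                      # supply_temp
    24 + 0.2 * outdoor + rng.normal(0, 0.7, n),                      # return_temp
    rng.normal(45, 5, n),                                            # humidity
    5 + 0.15 * np.maximum(outdoor - 15, 0) + rng.normal(0, 0.4, n),  # power_kW
    rng.normal(1.0, 0.08, n),                                        # vibration_rms
    outdoor,
])

# Normalize with training-set statistics, then fit with contamination
# set near the historical anomaly rate (~0.8%)
scaler = StandardScaler().fit(X)
model = IsolationForest(n_estimators=50, contamination=0.008, random_state=42)
model.fit(scaler.transform(X))

# A failing compressor: high vibration + elevated power for the conditions
suspect = np.array([[13.5, 25.0, 46.0, 9.5, 2.1, 18.0]])
print(model.predict(scaler.transform(suspect)))  # -1 = flag for inspection
```

Fitting the scaler on training data only and reusing it at inference time mirrors the normalization step in the example above.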
Worked Example: Autoencoder for Multi-Sensor Industrial Anomaly Detection
Scenario: A semiconductor fab deploys deep learning anomaly detection on plasma etch chambers. Each chamber has 47 sensors (gas flows, pressures, RF power, temperatures, optical emission spectra) sampled at 10 Hz during 3-minute etch processes.
Given:
Training data: 45,000 normal etch processes (no defective wafers) over 8 months
Input shape per process: 1,800 timesteps x 47 features = 84,600 values
Anomaly definition: Process that produces defective wafer (known from downstream metrology)
Historical defect rate: 0.3% of processes
Deployment: NVIDIA Jetson AGX Xavier (32GB RAM, 512 CUDA cores)
Requirement: Process-level decision within 10 seconds of etch completion
Steps:
Feature engineering for temporal data:
Segment each process into 6 phases (30s each): gas stabilization, plasma ignition, main etch (3 phases), purge
Extract per-phase statistics: mean, std, min, max, slope for each sensor (47 x 5 x 6 = 1,410 features)
Dimensionality: 84,600 raw values compressed to 1,410 phase-aggregated features
Autoencoder architecture design:
Encoder: 1410 -> 512 -> 256 -> 64 (bottleneck)
Decoder: 64 -> 256 -> 512 -> 1410
Activation: ReLU (hidden), linear (output)
Total parameters: 1.8 million (7.2 MB float32)
Training configuration:
Loss function: MSE reconstruction loss
Optimizer: Adam, learning rate 1e-4 with cosine decay
Batch size: 128 processes
Training epochs: 150 (early stopping at epoch 112)
Training time: 2.4 hours on 4x V100 GPUs
Validation reconstruction error (normal): mean 0.023, std 0.008
Threshold determination:
Compute reconstruction error on 5,000 validation processes (all normal)
Set threshold at 99.5th percentile: 0.047 (3 sigma above mean)
Test on 200 known-defective processes: 178 exceed threshold (89% recall)
Test on 5,000 normal processes: 24 exceed threshold (0.48% false positive rate)
Deployment optimization for Jetson:
Convert to TensorRT: 2.1x inference speedup
Mixed precision (FP16): Model size 3.6 MB, inference time 82ms
Batch inference: 10 processes in parallel, 340ms total
Result: TensorRT-optimized autoencoder achieves 89% recall on defective processes with 0.48% false positive rate. End-to-end latency is 3.2 seconds (feature extraction) + 0.34s (inference) = 3.5 seconds per process. Over 6-month deployment, the system flagged 156 processes; 141 were true defects (90.4% precision), preventing $2.1M in downstream processing of defective wafers. 15 false alarms cost ~$45K in additional metrology.
Key Insight: For high-dimensional time-series anomaly detection, phase-based feature aggregation dramatically reduces autoencoder input size while preserving discriminative information. A 60x dimensionality reduction (84,600 to 1,410) enables faster training, smaller models, and more robust generalization. The key is domain knowledge - knowing that plasma etch has distinct phases allows meaningful aggregation rather than arbitrary windowing.
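The chamber model above is a deep autoencoder; as a minimal, self-contained illustration of the same mechanism (reconstruct, measure error, threshold at a high percentile), here is a linear "autoencoder" built from a truncated SVD in plain numpy. The data, dimensions, and threshold are synthetic stand-ins, not the fab's values.

```python
# Reconstruction-error anomaly detection with a linear autoencoder (PCA).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal" phase features: 2000 processes x 20 features that share
# a low-dimensional latent structure (strong cross-feature correlation)
latent = rng.normal(size=(2000, 4))
mixing = rng.normal(size=(4, 20))
X = latent @ mixing + 0.05 * rng.normal(size=(2000, 20))

# "Encoder/decoder": project onto the top-k principal components and back
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
k = 4

def encode(A):
    return (A - mu) @ Vt[:k].T        # bottleneck

def decode(Z):
    return Z @ Vt[:k] + mu            # reconstruction

def recon_error(A):
    return np.mean((A - decode(encode(A))) ** 2, axis=1)   # per-sample MSE

# Threshold at the 99.5th percentile of error on normal data
threshold = np.percentile(recon_error(X), 99.5)

# A process whose feature *combination* breaks the learned correlations
anomaly = rng.normal(size=(1, 20)) * 3.0
flagged = recon_error(anomaly)[0] > threshold
print(f"threshold={threshold:.4f}, anomaly flagged: {flagged}")
```

The deep autoencoder in the worked example replaces the linear projection with stacked nonlinear layers, but the decision rule is identical: reconstruction error above a high percentile of the normal-data error distribution.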
Concept Relationships
How These Concepts Connect:
Anomaly types determine detection methods: point anomalies suit statistical methods, contextual anomalies call for time-series methods, and collective anomalies require ML
ML methods handle pattern complexity: When Z-score fails because patterns are multi-dimensional, Isolation Forest succeeds
Training data drives method selection: Unsupervised (Isolation Forest) works with no labels, supervised needs labeled anomalies
See Also
Foundation Concepts (read these first):
Anomaly Types - Understand point vs contextual vs collective anomalies before choosing ML methods
Statistical Methods - Learn when statistical approaches fail and ML becomes necessary
Related Detection Methods:
Time-Series Methods - ARIMA and STL for temporal patterns before trying LSTM
Experiment 1: Compare Detection Methods on Multi-Sensor Data
Modify the Isolation Forest example to compare three approaches on the same motor monitoring data:
Z-score (statistical): Apply to each sensor individually
Isolation Forest (ML): Consider all sensors together
Manual rules: Define thresholds for temperature, vibration, current
Generate test data where individual sensors are in range but the combination is anomalous (e.g., high temperature + high vibration + low RPM). Which method catches it?
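A starter for this experiment, with an assumed two-sensor setup: generate correlated temperature/vibration data, then craft a point where each sensor is within 3 sigma individually but the combination breaks the correlation.

```python
# Experiment 1 starter: per-sensor z-score vs Isolation Forest (synthetic data).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)

# Normal operation: vibration rises with temperature (strong correlation)
temp = rng.normal(60.0, 5.0, 3000)
vib = 0.08 * (temp - 60.0) + rng.normal(2.0, 0.1, 3000)
X = np.column_stack([temp, vib])

mu, sigma = X.mean(axis=0), X.std(axis=0)

# Suspect point: temp at +2.5 sigma, vib at -2.5 sigma - each in range,
# but high temperature with LOW vibration never occurs in training data
suspect = mu + np.array([2.5, -2.5]) * sigma

# Per-sensor z-score with a 3-sigma threshold misses it (both |z| = 2.5)
z = np.abs((suspect - mu) / sigma)
zscore_flagged = bool(np.any(z > 3.0))

# Isolation Forest sees both sensors together
iso = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
iso.fit(X)
iso_flagged = bool(iso.predict(suspect.reshape(1, -1))[0] == -1)
print(f"z-score flagged: {zscore_flagged}, Isolation Forest flagged: {iso_flagged}")
```

Extending this with the manual-rule detector and a third sensor (current or RPM) completes the three-way comparison.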
Experiment 2: Tune Contamination Parameter
The Isolation Forest contamination parameter controls how many anomalies to expect. Try values from 0.001 to 0.1 on your data:
contamination=0.001 (0.1%): Very strict, few alerts
contamination=0.01 (1%): Balanced
contamination=0.05 (5%): Loose, many alerts
Plot precision vs recall for each setting. How does this trade-off compare to adjusting Z-score thresholds?
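One way to run the sweep, on a labeled synthetic set (the anomaly count and value ranges are illustrative):

```python
# Experiment 2: precision/recall at each contamination setting.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)

# 2000 normal points plus 20 injected anomalies (1% true anomaly rate)
normal = rng.normal(0.0, 1.0, size=(2000, 3))
anomalies = rng.uniform(5.0, 8.0, size=(20, 3))
X = np.vstack([normal, anomalies])
y = np.concatenate([np.zeros(2000), np.ones(20)])   # 1 = anomaly

results = []
for c in [0.001, 0.005, 0.01, 0.05, 0.1]:
    pred = IsolationForest(contamination=c, random_state=0).fit_predict(X)
    flagged = pred == -1
    tp = np.sum(flagged & (y == 1))
    precision = tp / max(flagged.sum(), 1)
    recall = tp / 20
    results.append((c, precision, recall))
    print(f"contamination={c:<6} precision={precision:.2f} recall={recall:.2f}")
```

With a fixed `random_state` the forest is identical at every setting; `contamination` only moves the score threshold, so the flagged sets are nested and recall can only grow as contamination increases, at the cost of precision.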
Experiment 3: Edge Deployment Simulation
Convert the Isolation Forest detector to use fixed-point arithmetic (integers instead of floats) and measure:
Memory usage (bytes)
Processing time per sample (milliseconds)
Accuracy change vs full-precision model
Can this run on an ESP32 with 520KB RAM? If not, how would you simplify it?
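A back-of-envelope memory estimate helps before writing any firmware. The node layout below is an assumption for illustration (sklearn's internal representation differs): an isolation tree on n subsamples has about 2n - 1 nodes, each needing a feature index, a split value, and two child indices.

```python
# Rough memory estimate for an embedded Isolation Forest (assumed node layout).
SUBSAMPLE = 256                      # sklearn's default max_samples
TREES = 50
NODES_PER_TREE = 2 * SUBSAMPLE - 1   # binary tree on 256 subsamples

float_node = 1 + 4 + 2 * 2           # uint8 feature + float32 split + 2 uint16 children
fixed_node = 1 + 2 + 2 * 2           # int16 fixed-point split instead of float32

float_bytes = TREES * NODES_PER_TREE * float_node
fixed_bytes = TREES * NODES_PER_TREE * fixed_node
print(f"float32 model: {float_bytes / 1024:.1f} KB, fixed-point: {fixed_bytes / 1024:.1f} KB")
print(f"fits in ESP32 520 KB RAM: {fixed_bytes < 520 * 1024}")
```

Under these assumptions both variants fit in the ESP32's RAM, but only alongside the rest of the firmware if the tree count or subsample size is trimmed.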
Challenge: Build a Hybrid Detector
Combine Z-score (for fast edge detection) with Isolation Forest (for deeper analysis):
1. Edge device: Z-score flags readings >3 sigma
2. Gateway: Isolation Forest analyzes flagged readings + recent history
3. Cloud: LSTM autoencoder for long-term pattern analysis
Compare detection rate and latency vs running each method alone.
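A minimal sketch of the first two stages (function names and data are hypothetical): the edge gate is a cheap z-score check, and only flagged readings, together with recent history, reach the gateway's Isolation Forest.

```python
# Hybrid detector sketch: z-score gate on the edge, Isolation Forest on the gateway.
import numpy as np
from collections import deque
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
baseline = rng.normal(0.0, 1.0, size=(2000, 3))
mu, sigma = baseline.mean(axis=0), baseline.std(axis=0)

gateway = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
gateway.fit((baseline - mu) / sigma)

history = deque(maxlen=50)           # recent readings kept for context

def edge_gate(reading):
    """Stage 1 on the device: flag if any sensor exceeds 3 sigma."""
    return bool(np.any(np.abs((reading - mu) / sigma) > 3.0))

def process(reading):
    history.append(reading)
    if not edge_gate(reading):
        return "normal"              # cheap path: no gateway traffic
    window = (np.asarray(history) - mu) / sigma
    verdict = gateway.predict(window)[-1]   # classify the flagged reading in context
    return "anomaly" if verdict == -1 else "normal"

stream = rng.normal(0.0, 1.0, size=(200, 3))
stream[100] = np.array([6.0, 6.0, 6.0])     # injected fault
labels = [process(r) for r in stream]
print(f"anomalies reported: {labels.count('anomaly')}")
```

The gate keeps gateway traffic low; the Isolation Forest then suppresses most of the single-sensor excursions the z-score check would have alerted on alone.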
Common Pitfalls
1. Training on contaminated data
The mistake: Including undetected anomalies in your “normal” training set. If 2% of your training data are actually anomalies, the model learns them as normal and will never flag similar patterns.
Why it’s easy to make: You rarely have perfectly clean data. Equipment may have had intermittent faults during the training period that went unnoticed.
How to fix: Use robust preprocessing – remove statistical outliers from training data before fitting the ML model. For Isolation Forest, set a conservative contamination parameter during a “cleaning pass” to filter the training set first, then retrain on the cleaned data.
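The cleaning pass can be sketched in a few lines. The contamination values and the 2% fault mixture below are illustrative:

```python
# Two-pass training: filter the raw set, then fit the production model on it.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(5)

# Raw training set: 2% undetected anomalies mixed into the "normal" data
normal = rng.normal(0.0, 1.0, size=(4900, 4))
hidden_faults = rng.normal(4.0, 0.5, size=(100, 4))
X_raw = np.vstack([normal, hidden_faults])

# Pass 1 (cleaning): contamination set above the suspected anomaly rate,
# so likely faults are filtered out even at the cost of dropping some
# genuinely normal samples
cleaner = IsolationForest(contamination=0.03, random_state=0).fit(X_raw)
X_clean = X_raw[cleaner.predict(X_raw) == 1]

# Pass 2 (production): retrain only on the cleaned set
production = IsolationForest(contamination=0.005, random_state=0).fit(X_clean)
print(f"kept {len(X_clean)} of {len(X_raw)} samples for the final fit")
```

Losing a few percent of genuinely normal samples in pass 1 is a cheap price for a production model that has never seen the hidden faults labeled as normal.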
2. Ignoring concept drift
The mistake: Deploying a model trained on summer data and expecting it to work in winter. Normal behavior shifts with seasons, equipment aging, process changes, and workload patterns.
Why it’s easy to make: The model performs well during initial deployment (same conditions as training), creating false confidence. Degradation is gradual – precision drops from 90% to 60% over months without obvious failure.
How to fix: Monitor model performance metrics (false positive rate, reconstruction error distribution) continuously. Retrain quarterly or implement online threshold adaptation. Alert when the baseline error distribution shifts significantly from the training distribution.
3. Using accuracy as the primary metric
The mistake: Reporting “98% accuracy” for an anomaly detector when the anomaly rate is 1%. A model that always predicts “normal” achieves 99% accuracy while catching zero anomalies.
Why it’s easy to make: Accuracy is the default metric in most ML tutorials. It is deeply misleading for imbalanced problems like anomaly detection.
How to fix: Use precision (are my alerts real?), recall (am I catching failures?), and F1-score. For IoT, also compute the cost-weighted metric: total_cost = FP × cost_per_false_alarm + FN × cost_per_missed_failure. Optimize for the metric that matches your business objective.
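The cost-weighted metric is simple to compute. The numbers below reuse the HVAC deployment's rough figures (23 of 26 failures caught, 847 false alarms, $150 per dispatch); the $6,000 cost per missed failure is an assumed figure for illustration.

```python
# Cost-weighted evaluation: total_cost = FP * cost_per_false_alarm + FN * cost_per_missed_failure
def detection_metrics(tp, fp, fn, cost_fp, cost_fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    total_cost = fp * cost_fp + fn * cost_fn
    return precision, recall, f1, total_cost

p, r, f1, cost = detection_metrics(tp=23, fp=847, fn=3, cost_fp=150, cost_fn=6000)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f} total_cost=${cost:,}")
```

Note how low the precision is despite strong recall; optimizing accuracy or even F1 alone would hide the business question of whether 847 dispatches are worth 23 prevented failures.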
4. Over-engineering when statistical methods suffice
The mistake: Deploying an LSTM autoencoder on a cloud GPU to detect a temperature sensor exceeding 80C. A simple threshold check on the ESP32 would achieve the same result at 1/1000th the latency and cost.
Why it’s easy to make: ML methods are exciting and feel more sophisticated. There is pressure to use “AI” even when simpler approaches work.
How to fix: Start with the simplest method that meets requirements. If Z-score with a 3-sigma threshold catches 95% of your anomalies, you do not need Isolation Forest. Graduate to ML only when: (a) anomalies are multi-dimensional, (b) patterns are temporal, or (c) statistical methods have unacceptable false positive rates.
5. Setting contamination parameter without domain knowledge
The mistake: Using the default contamination=0.1 (10%) from sklearn tutorials when your actual anomaly rate is 0.5%. This floods operators with false alarms, causing alert fatigue, and operators start ignoring real anomalies.
Why it’s easy to make: The contamination parameter seems like just another hyperparameter. Without historical anomaly data, choosing the right value feels arbitrary.
How to fix: Always start with your domain’s historical failure rate. If unknown, start conservative (0.005-0.01) and adjust based on operator feedback. Track the false positive rate weekly and tune until it matches the acceptable alert burden for your team (typically 1-5 false alarms per device per month).
13.8 Summary
Machine learning methods excel at detecting complex anomalies that statistical methods miss:
Isolation Forest: Best for multi-dimensional data, no labels required, edge-deployable
Autoencoders: Excel at high-dimensional sensor data with complex correlations
LSTM: Captures temporal sequence patterns for time-series anomalies
Key Takeaway: Start with statistical methods for point anomalies. Graduate to ML for collective anomalies, high dimensions, or when statistical methods have too many false positives.
For Kids: Meet the Sensor Squad!
Sammy the Sensor was puzzled. Every day, he measured the temperature in the school greenhouse, and every day it was around 25 degrees. But one morning, the reading said 25 degrees, the humidity said 80%, and the light level was high – all perfectly normal numbers on their own. Yet Lila the LED started flashing red!
“Why are you alarming?” Sammy asked. “Each of my readings looks fine!”
Max the Microcontroller grinned. “That is where machine learning comes in! I have been watching ALL your readings together for months. Normally when light is high, temperature goes up a little and humidity goes down. Today, all three are high at the same time – that combination has never happened before!”
“So it is not about one number being wrong,” Sammy realized. “It is about the pattern being unusual!”
“Exactly!” said Max. “I used something called an Isolation Forest. Think of it like a game of 20 Questions. Normal readings take lots of questions to tell apart, but weird combinations stand out right away – like finding a penguin at a beach party!”
Bella the Battery chimed in: “And the best part is, Max learned what ‘normal’ looks like all by himself, just by watching data for a while. He did not need anyone to tell him what ‘broken’ looks like!”
They investigated and found that a sprinkler had accidentally turned on during sunny hours, explaining the unusual combination. Machine learning saved the plants from overwatering!
Key lesson: Machine learning finds unusual patterns that simple rules miss, especially when you need to look at many measurements together!
13.9 What’s Next
If you want to…
Read this
Build production-grade detection pipelines with edge-cloud architecture