10  Anomaly Detection for IoT Systems

10.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Distinguish anomalies across three categories: point, contextual, and collective anomalies
  • Apply statistical methods (Z-score, IQR) for real-time anomaly detection on edge devices
  • Implement time-series anomaly detection using ARIMA and exponential smoothing
  • Configure machine learning approaches (Isolation Forest, autoencoders) for complex pattern detection
  • Design end-to-end anomaly detection pipelines balancing edge and cloud processing
  • Assess detection systems using appropriate metrics for imbalanced IoT data

In 60 Seconds

Anomaly detection identifies the rare critical events hidden within billions of normal IoT sensor readings by scoring data against statistical or learned baselines. The key takeaway: start with Z-score or IQR at the edge and escalate to ML models only when patterns are too complex for statistical methods to explain.

10.2 How It Works

10.2.1 Overview

A single anomalous vibration pattern detected at a wind turbine bearing could indicate imminent failure—catching it early saves $250,000 in repair costs and prevents 2 weeks of downtime. Missing that subtle signal costs millions in lost generation and emergency repairs. This is the critical role of anomaly detection in IoT.

Anomaly detection systems operate through a three-stage pipeline:

  1. Feature Extraction: Raw sensor data (vibration, temperature, pressure) is transformed into statistical features (mean, variance, frequency spectrum components) or time-series representations (ARIMA residuals, autoencoder reconstruction errors)
  2. Anomaly Scoring: Each data point receives an anomaly score using statistical methods (Z-score distance from mean), time-series forecasts (prediction error magnitude), or ML models (Isolation Forest path lengths, autoencoder reconstruction loss)
  3. Threshold Classification: Scores above a tuned threshold trigger alerts - the threshold balances precision (avoiding false alarms) against recall (catching all real anomalies) based on domain-specific cost functions
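
The three stages above can be sketched in a few lines of Python (a minimal illustration with made-up readings; a real pipeline would extract richer features such as frequency-spectrum components):

```python
import math

def extract_features(window):
    """Stage 1: turn raw readings into simple statistical features."""
    mean = sum(window) / len(window)
    var = sum((x - mean) ** 2 for x in window) / len(window)
    return mean, math.sqrt(var)

def anomaly_score(value, mean, std):
    """Stage 2: score a point by its distance from the baseline (Z-score)."""
    return abs(value - mean) / std if std > 0 else 0.0

def classify(score, threshold=3.0):
    """Stage 3: turn the score into an alert decision."""
    return score > threshold

baseline = [22.1, 22.4, 21.9, 22.6, 22.3, 22.0, 22.5, 22.2]  # normal readings
mean, std = extract_features(baseline)
for reading in (22.7, 85.0):
    s = anomaly_score(reading, mean, std)
    print(f"{reading:5.1f}C  score={s:6.2f}  alert={classify(s)}")
```

Raising the threshold trades recall for precision, which is exactly the tuning decision described in stage 3.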

In traditional IT systems, anomalies are relatively rare—a server crash or security breach. In IoT, we face a unique challenge: billions of sensors generating trillions of data points, of which 99.99% are normal and only 0.01% represent critical anomalies. How do we find that needle in the haystack, in real time, at scale?

Consider a wind farm with 100 turbines, each reporting 12 sensor readings every second. That’s \(100 \times 12 \times 86400 = 103.68\) million readings per day. If anomalies occur at 0.01% rate, we expect \(103.68M \times 0.0001 = 10,368\) anomalous readings daily to investigate.

Traditional storage of all readings: \(103.68M \times 50\text{ bytes} = 5.18\text{ GB/day}\), costing ~$4/month in S3. But transmitting this data from remote turbines at \(\$0.09/\text{GB}\) costs \(5.18 \times 30 \times 0.09 = \$14/\text{month}\) in bandwidth alone.

Edge anomaly detection reduces cloud transmission by more than 99.9%: only flagged anomalies plus hourly summaries go to the cloud. New daily volume: \((10,368 \times 50) + (100 \times 12 \times 24 \times 20) = 1.1\text{ MB/day}\). Monthly bandwidth cost drops to roughly \(\$0.003\)—a reduction of over 4,000x. Bandwidth savings of about \(\$14\)/month mean the anomaly detection algorithm running on a \(\$200\) edge gateway pays for itself in under 15 months—and much faster when factoring in avoided downtime costs.
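
The bandwidth arithmetic can be reproduced directly (a quick sanity check; the byte sizes and prices are the figures from the text):

```python
READING_BYTES = 50
GB = 1e9

raw_per_day = 100 * 12 * 86_400 * READING_BYTES        # every reading, every day
anomalies_per_day = 100 * 12 * 86_400 * 0.0001         # 0.01% anomaly rate
edge_per_day = (anomalies_per_day * READING_BYTES
                + 100 * 12 * 24 * 20)                  # plus hourly summaries

cost_raw = raw_per_day * 30 / GB * 0.09                # $/month at $0.09/GB
cost_edge = edge_per_day * 30 / GB * 0.09

print(f"Raw:  {raw_per_day / GB:.2f} GB/day, ${cost_raw:.2f}/month")
print(f"Edge: {edge_per_day / 1e6:.2f} MB/day, ${cost_edge:.4f}/month")
print(f"Cost reduction: {cost_raw / cost_edge:.0f}x")
```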

Minimum Viable Understanding: Anomaly Detection Fundamentals

Core Concept: Anomaly detection identifies data points or patterns that deviate significantly from expected behavior - finding the 0.01% of critical events in the 99.99% of normal sensor readings.

Why It Matters: A missed anomaly in predictive maintenance costs $250,000+ in emergency repairs; too many false alarms cause “alert fatigue” where operators ignore real warnings. The business value lies in finding the balance.

Key Takeaway: Start simple - Z-score thresholds catch 80% of anomalies with 10% of the complexity. Only add ML models (Isolation Forest, autoencoders) when statistical methods fail. Always evaluate with precision/recall, never accuracy - in imbalanced IoT data, a “99% accurate” model that never detects anomalies is worthless.

Think of anomaly detection like having a really attentive crossing guard at a school:

  • Normal traffic: Cars and buses pass by every day at about the same times, following the speed limit
  • Anomaly: A car speeding through at 80 mph during school hours - the crossing guard blows the whistle!

In IoT systems, your sensors are constantly watching for “speeding cars”:

  • A temperature sensor normally reads 20-25°C in an office building
  • Suddenly it reads 85°C - ANOMALY! Something is wrong (maybe a fire!)
  • The system alerts the operator before damage occurs

Why anomaly detection matters: Without it, you’d have to manually monitor millions of sensor readings. That’s impossible! Anomaly detection acts like thousands of crossing guards watching your entire system 24/7.

Imagine your sensors are like security guards at a museum!

10.2.2 The Sensor Squad Adventure: Finding the Sneaky Thief

The Sensor Squad has been hired to guard a famous museum at night. Their job? Find anything UNUSUAL!

The Normal Pattern:

  • Security cameras show empty hallways from 6 PM to 6 AM
  • Temperature stays at 68°F all night
  • Motion sensors detect NOTHING after closing time

One Night, Something Strange Happens:

🔍 Sammy the Motion Sensor notices: “Hey! I detected movement in Gallery 3 at 2 AM!”

🌡️ Tina the Temperature Sensor adds: “And the temperature near the back door dropped to 45°F - someone opened it!”

🎥 Cam the Camera confirms: “I see a shadowy figure near the paintings!”

The Sensor Squad used ANOMALY DETECTION! They knew what “normal” looked like, so when something “abnormal” happened, they sounded the alarm!

10.2.3 Real Examples:

  • Normal: Your smartwatch tracks 5,000-10,000 steps daily

  • Anomaly: Your smartwatch shows 0 steps for 3 days (Did you lose it? Are you sick?)

  • Normal: Your family’s smart thermostat runs the AC for 2 hours in summer evenings

  • Anomaly: The AC runs for 12 hours straight (Maybe someone left a window open!)

10.2.4 The Three Types of “Weird Stuff” Sensors Find:

  1. One Really Weird Thing (Point Anomaly): A temperature of 1000°F in your kitchen - something is VERY wrong!
  2. Weird Timing (Contextual Anomaly): 80°F is normal in summer, but WEIRD in winter if you live in Alaska!
  3. Weird Patterns (Collective Anomaly): One loud noise at night is okay (maybe a car). But 50 loud noises in a pattern might mean someone is trying to break in!

Try This at Home: Keep track of how many times your family opens the refrigerator each day for a week. What’s normal? If one day it’s opened 100 times, that’s an anomaly! (Maybe you’re having a party? 🎉)

10.3 Prerequisites

Before diving into this chapter, you should be familiar with:

  • Big Data Overview: Understanding IoT data characteristics—volume, velocity, variety—provides context for why anomaly detection requires specialized techniques that handle streaming data at scale
  • Modeling and Inferencing: Knowledge of machine learning fundamentals, feature extraction, and model deployment prepares you for ML-based anomaly detection approaches
  • Edge Compute Patterns: Familiarity with edge vs cloud processing trade-offs helps you design anomaly detection pipelines that balance latency, bandwidth, and computational constraints
  • Multi-Sensor Data Fusion: Understanding sensor correlation and fusion techniques is essential for detecting collective anomalies that span multiple sensors

How This Chapter Fits Into Data Analytics

Anomaly detection is a critical real-time analytics capability that sits at the intersection of streaming data, machine learning, and edge computing.

If you’re unsure about time-series analysis or ML fundamentals, review those earlier chapters before diving into advanced detection algorithms.

10.4 Chapter Guide

This chapter is split into focused sections covering different aspects of anomaly detection. Work through them in order for a comprehensive understanding.

Classification diagram showing the six chapter topics in the anomaly detection module: types, statistical methods, time-series methods, machine learning approaches, detection pipelines, and performance metrics

10.4.1 1. Types of Anomalies

What You’ll Learn: Understand the three fundamental anomaly types—point, contextual, and collective—and how to match each type to appropriate detection methods and deployment locations.

Key Topics:

  • Point anomalies: Single outliers detected with statistical methods
  • Contextual anomalies: Context-dependent deviations requiring time-series analysis
  • Collective anomalies: Pattern-based anomalies needing ML approaches
  • Detection method selection framework
  • Edge/fog/cloud deployment strategies

Time: ~10 minutes | Difficulty: Intermediate


10.4.2 2. Statistical Methods

What You’ll Learn: Master lightweight statistical techniques for real-time point anomaly detection on edge devices.

Key Topics:

  • Z-score detection for Gaussian distributions
  • IQR (Interquartile Range) for skewed data
  • Moving statistics and adaptive thresholds
  • Edge deployment and resource constraints
  • When statistical methods suffice vs. when to escalate to ML

Time: ~15 minutes | Difficulty: Intermediate


10.4.3 3. Time-Series Methods

What You’ll Learn: Apply time-series analysis techniques to detect contextual anomalies that depend on temporal patterns.

Key Topics:

  • ARIMA forecasting for anomaly detection
  • Exponential smoothing methods
  • Seasonal decomposition (STL)
  • Handling concept drift and seasonal patterns
  • Forecast error thresholds

Time: ~12 minutes | Difficulty: Intermediate


10.4.4 4. Machine Learning Approaches

What You’ll Learn: Deploy advanced ML algorithms for complex collective anomalies and multivariate patterns.

Key Topics:

  • Isolation Forest for unsupervised detection
  • Autoencoders for reconstruction-based anomaly detection
  • LSTM networks for sequence anomalies
  • One-Class SVM and density estimation
  • Model training, tuning, and deployment trade-offs

Time: ~18 minutes | Difficulty: Advanced


10.4.5 5. Detection Pipelines

What You’ll Learn: Design end-to-end real-time detection systems that balance edge and cloud processing.

Key Topics:

  • Three-tier architecture (edge/fog/cloud)
  • Streaming data pipelines
  • Ensemble detection (combining multiple methods)
  • Alert fusion and prioritization
  • Latency vs. accuracy trade-offs

Time: ~14 minutes | Difficulty: Advanced


10.4.6 6. Performance Metrics

What You’ll Learn: Evaluate and optimize anomaly detection systems using appropriate metrics for imbalanced IoT data.

Key Topics:

  • Confusion matrix for imbalanced data
  • Precision, recall, and F1 score
  • False positive rate and alert fatigue
  • ROC curves and threshold tuning
  • Domain-specific cost functions
  • Hands-on lab: Building a detection system

Time: ~20 minutes | Difficulty: Advanced


10.5 Learning Path

Recommended Study Sequence

For Beginners (Focus on fundamentals):

  1. Read Types of Anomalies to understand classification
  2. Study Statistical Methods for practical edge detection
  3. Review Performance Metrics to evaluate your systems

For Practitioners (Comprehensive coverage):

  1. Work through all sections in order
  2. Complete the hands-on lab in Performance Metrics
  3. Experiment with the interactive tools in each section

For Advanced Users (Deep specialization):

  1. Skim Types and Statistical Methods
  2. Focus on Machine Learning Approaches
  3. Study Detection Pipelines for production deployment
  4. Use Performance Metrics to optimize your systems

10.6 Interactive Tools

Explore how edge-based anomaly detection reduces data transmission costs. Adjust the parameters to see how sensor count, sampling rate, and anomaly rate affect bandwidth savings.

Experiment with anomaly detection algorithms using these interactive simulations.

10.7 Cross-Hub Connections

Enhance your learning with these interactive resources:

Practice & Simulation:

  • Simulations Hub: Test anomaly detection algorithms with interactive sensor data simulations—experiment with Z-score, IQR, and Isolation Forest on realistic IoT datasets
  • Quizzes Hub: Self-assess your understanding of statistical methods, confusion matrices, and edge/cloud deployment trade-offs

Clarify Concepts:

  • Knowledge Gaps Hub: Common misconceptions about false positive rates, concept drift, and when to use ML vs statistical methods
  • Videos Hub: Visual explanations of ARIMA forecasting, autoencoder architectures, and real-world anomaly detection case studies

Navigate Connections:

  • Knowledge Map: See how anomaly detection connects to edge computing, time-series databases, and predictive maintenance workflows

10.8 Quick Reference

  Topic                    Key Concepts                           Go To
  -----------------------  -------------------------------------  --------------------
  Anomaly classification   Point, contextual, collective          Types
  Statistical detection    Z-score, IQR, moving averages          Statistical Methods
  Time-series methods      ARIMA, exponential smoothing, STL      Time-Series
  ML approaches            Isolation Forest, autoencoders, LSTM   Machine Learning
  Pipeline design          Edge/fog/cloud, ensemble methods       Pipelines
  Evaluation metrics       Precision, recall, F1, ROC             Metrics

10.9 Knowledge Check

Test your understanding of anomaly detection fundamentals:

A manufacturing plant monitors vibration patterns from a motor. The vibration suddenly spikes 10x above normal for a single reading, then returns to normal. What type of anomaly is this?

  1. Contextual anomaly
  2. Collective anomaly
  3. Point anomaly
  4. Temporal anomaly

C) Point anomaly - A point anomaly is a single data point that deviates significantly from the rest of the data. The spike is isolated to one reading, making it a classic point anomaly. Contextual anomalies depend on context (time, season), collective anomalies involve patterns across multiple points, and temporal anomaly is not a standard category.

A smart building tracks HVAC energy consumption. The system needs to detect when energy usage is anomalously high FOR THE CURRENT SEASON (summer vs winter). Which method is most appropriate?

  1. Z-score with fixed threshold
  2. Isolation Forest
  3. Contextual anomaly detection with seasonal decomposition
  4. One-Class SVM

C) Contextual anomaly detection with seasonal decomposition - The key phrase is “for the current season.” The same energy reading might be normal in winter (heating) but anomalous in summer. This requires contextual awareness - time-series decomposition (STL) separates seasonal patterns, allowing detection of deviations from expected seasonal behavior. Fixed Z-score thresholds don’t account for seasonality.
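
A minimal sketch of this idea, assuming we simply keep a separate baseline per season (full STL decomposition would model trend and seasonality jointly; the kWh figures are hypothetical):

```python
import statistics

# Hypothetical daily HVAC energy readings (kWh) grouped by season
history = {
    "summer": [30, 32, 31, 29, 33, 30, 31],  # heavy AC load
    "winter": [12, 11, 13, 12, 10, 12, 11],  # light load
}

def is_contextual_anomaly(value, season, threshold=3.0):
    """Score a reading against its own season's baseline, not a global one."""
    base = history[season]
    mean = statistics.mean(base)
    std = statistics.pstdev(base)
    return abs(value - mean) / std > threshold

# The same 30 kWh reading is normal in summer but anomalous in winter
print(is_contextual_anomaly(30, "summer"))
print(is_contextual_anomaly(30, "winter"))
```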

An anomaly detection system for a nuclear power plant has the following results: 98% accuracy, 60% precision, 95% recall. Which statement is TRUE?

  1. The system is excellent because accuracy is 98%
  2. The system is problematic because 40% of alerts are false alarms
  3. The system should be tuned for higher precision even if recall drops
  4. Accuracy is the most important metric for safety-critical systems

B) The system is problematic because 40% of alerts are false alarms - With 60% precision, 40% of detected anomalies are false positives. In safety-critical systems, this matters because operators may develop “alert fatigue” and ignore real warnings. However, 95% recall means we’re catching most actual anomalies. For nuclear plants, missing a real anomaly (low recall) is worse than false alarms, so B is true and C is dangerous advice. Accuracy is misleading in imbalanced data - with 99.99% normal data, a model that always predicts “normal” would have 99.99% accuracy but zero anomaly detection.

Which anomaly detection method is MOST suitable for deployment on a battery-powered edge device with 32KB RAM?

  1. Deep autoencoder with 5 hidden layers
  2. Isolation Forest with 1000 trees
  3. Z-score with exponential moving average
  4. LSTM network for sequence analysis

C) Z-score with exponential moving average - Resource constraints on edge devices require lightweight algorithms. Z-score calculation needs only mean and standard deviation (or exponentially weighted versions that update incrementally with O(1) memory). Autoencoders, Isolation Forests, and LSTMs require significant memory for model parameters and cannot run on 32KB RAM devices. Statistical methods are the go-to choice for edge deployment.
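
The O(1) incremental update mentioned in the answer can be sketched with an exponentially weighted mean and variance (the smoothing factor and warm-up count below are tuning assumptions):

```python
class EwmaDetector:
    """Z-score detector with O(1) memory: just a mean, a variance, a count."""
    def __init__(self, alpha=0.05, threshold=3.0, warmup=5):
        self.alpha, self.threshold, self.warmup = alpha, threshold, warmup
        self.mean, self.var, self.n = 0.0, 0.0, 0

    def update(self, x):
        """Return True if x is anomalous, then fold x into the baseline."""
        anomalous = False
        if self.n >= self.warmup:
            std = self.var ** 0.5
            anomalous = std > 0 and abs(x - self.mean) / std > self.threshold
        if self.n == 0:
            self.mean = x
        else:
            # Exponentially weighted updates: constant memory, constant time
            diff = x - self.mean
            self.mean += self.alpha * diff
            self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        self.n += 1
        return anomalous

det = EwmaDetector()
flags = [det.update(v) for v in [22.0, 22.3, 21.8, 22.1, 22.4, 22.0, 85.0]]
print(flags)
```

Only three numbers persist between readings (mean, variance, count), which is why this style of detector fits comfortably on a 32KB microcontroller.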

An IoT system processes 1 million sensor readings daily, with approximately 100 genuine anomalies (0.01%). A detection model reports 500 anomalies with 80 true positives. Calculate the precision and recall.

  1. Precision: 16%, Recall: 80%
  2. Precision: 80%, Recall: 16%
  3. Precision: 80%, Recall: 80%
  4. Precision: 99.96%, Recall: 80%

A) Precision: 16%, Recall: 80%

  • Precision = True Positives / Predicted Positives = 80 / 500 = 16%
  • Recall = True Positives / Actual Positives = 80 / 100 = 80%

This illustrates a key challenge: even with 80% recall (catching 80 of 100 real anomalies), the low precision means 420 of the 500 alerts are false alarms. Operators would be overwhelmed reviewing 500 alerts daily when only 80 are real. This is why precision matters critically in production systems.
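
The same arithmetic in code, using the figures from the question:

```python
def precision_recall(true_pos, predicted_pos, actual_pos):
    """Precision = TP / predicted positives; recall = TP / actual positives."""
    return true_pos / predicted_pos, true_pos / actual_pos

p, r = precision_recall(true_pos=80, predicted_pos=500, actual_pos=100)
print(f"Precision: {p:.0%}, Recall: {r:.0%}")
print(f"False alarms to review: {500 - 80} of 500 alerts")
```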

A smart city traffic system monitors 10,000 intersections with cameras. Engineers need to detect accidents in <5 seconds while minimizing cloud bandwidth costs. Which architecture is BEST?

  1. Send all video frames to cloud for centralized ML processing
  2. Run Isolation Forest on edge cameras, send all anomaly scores to cloud
  3. Run motion detection and simple rule-based filtering on edge, send only flagged frames to cloud for deep learning verification
  4. Store all video locally, process in batch overnight with autoencoders

C) Run motion detection and simple rule-based filtering on edge, send only flagged frames to cloud for deep learning verification

This hybrid approach satisfies both constraints:

  • Latency (<5 seconds): Simple motion detection and rule-based checks (sudden stop, unusual object positions) run instantly on edge devices
  • Bandwidth: Only flagged frames (~1% of video) are sent to cloud, reducing bandwidth by 99%
  • Accuracy: Cloud-based deep learning verifies edge decisions, reducing false positives

Option A violates bandwidth constraints (sending all frames is expensive). Option B still sends too much data (anomaly scores from every frame). Option D violates the <5 second latency requirement (overnight batch processing).

Objective: Implement real-time Z-score anomaly detection on simulated IoT sensor data and visualize which readings are flagged as anomalous.

import random
import math

# Simulate IoT temperature sensor with occasional anomalies
random.seed(42)
normal_mean, normal_std = 22.5, 1.2  # Normal office temperature

readings = []
for i in range(100):
    if random.random() < 0.05:  # 5% chance of anomaly
        # Inject anomalous readings (sensor malfunction or real event)
        value = random.choice([random.gauss(50, 3),   # Spike high
                               random.gauss(-5, 2),    # Spike low
                               random.gauss(22.5, 8)]) # High variance
    else:
        value = random.gauss(normal_mean, normal_std)
    readings.append(round(value, 1))

# Z-Score anomaly detection with rolling window
WINDOW_SIZE = 20
THRESHOLD = 3.0
anomalies = []

for i in range(WINDOW_SIZE, len(readings)):
    window = readings[i - WINDOW_SIZE:i]
    mean = sum(window) / len(window)
    variance = sum((x - mean) ** 2 for x in window) / len(window)
    std = math.sqrt(variance) if variance > 0 else 0.001

    z_score = abs(readings[i] - mean) / std

    if z_score > THRESHOLD:
        anomalies.append((i, readings[i], round(z_score, 2)))
        print(f"  [ANOMALY] Index {i:3d}: {readings[i]:6.1f}C "
              f"(z-score: {z_score:.2f}, mean: {mean:.1f}, std: {std:.1f})")

print(f"\nTotal readings: {len(readings)}")
print(f"Anomalies detected: {len(anomalies)} "
      f"({100 * len(anomalies) / len(readings):.1f}%)")
print(f"Normal readings: {len(readings) - len(anomalies)}")

What to Observe:

  1. Z-score measures how many standard deviations a reading is from the rolling mean
  2. A threshold of 3.0 catches extreme outliers while allowing normal variation
  3. The rolling window adapts to gradual changes (concept drift)
  4. This algorithm uses minimal memory (just the window)—ideal for edge deployment

Objective: Compare IQR-based detection with Z-score on the same data, demonstrating how IQR is more robust to existing outliers.

import random

# Simulated sensor data with outliers already present
random.seed(42)
data = [random.gauss(25, 1.5) for _ in range(50)]
# Inject 5 outliers
data[10] = 55.0   # Equipment malfunction
data[22] = -8.0   # Sensor dropout
data[35] = 48.2   # Heat event
data[41] = 60.0   # Fire alarm range
data[47] = -12.0  # Freezing anomaly

def detect_iqr(values, multiplier=1.5):
    """IQR-based outlier detection"""
    sorted_vals = sorted(values)
    n = len(sorted_vals)
    q1 = sorted_vals[n // 4]
    q3 = sorted_vals[3 * n // 4]
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    return lower, upper

def detect_zscore(values, threshold=3.0):
    """Z-score outlier detection"""
    mean = sum(values) / len(values)
    std = (sum((x - mean) ** 2 for x in values) / len(values)) ** 0.5
    return mean - threshold * std, mean + threshold * std

# Compare methods
iqr_low, iqr_high = detect_iqr(data)
z_low, z_high = detect_zscore(data)

print("Detection Bounds Comparison:")
print(f"  IQR method:    [{iqr_low:.1f}, {iqr_high:.1f}]")
print(f"  Z-score method: [{z_low:.1f}, {z_high:.1f}]")

print("\nOutlier Detection Results:")
print(f"{'Index':>5} {'Value':>7} {'IQR':>8} {'Z-Score':>8}")
print("-" * 32)
for i, v in enumerate(data):
    is_iqr = v < iqr_low or v > iqr_high
    is_z = v < z_low or v > z_high
    if is_iqr or is_z:
        iqr_flag = "OUTLIER" if is_iqr else "normal"
        z_flag = "OUTLIER" if is_z else "normal"
        print(f"{i:5d} {v:7.1f} {iqr_flag:>8} {z_flag:>8}")

# Count detections
iqr_count = sum(1 for v in data if v < iqr_low or v > iqr_high)
z_count = sum(1 for v in data if v < z_low or v > z_high)
print(f"\nIQR detected: {iqr_count} outliers")
print(f"Z-score detected: {z_count} outliers")
print(f"\nKey insight: Existing outliers inflate the Z-score's mean and std,")
print(f"making it LESS sensitive. IQR uses the median and is robust to outliers.")

What to Observe:

  1. Z-score’s bounds are wider because existing outliers inflate the mean and standard deviation
  2. IQR uses percentiles (Q1, Q3) which are resistant to extreme values
  3. IQR typically catches more outliers when the data already contains some
  4. Choose Z-score for clean data; choose IQR when outliers may already be present

10.10 Worked Example: Industrial Motor Monitoring

Let’s walk through a complete anomaly detection scenario for an industrial motor in a manufacturing plant.

Scenario: Predicting Motor Bearing Failure

Context: A factory monitors 500 motors using vibration sensors (accelerometers) sampling at 1 kHz. Each motor generates 86.4 million readings per day. Total data: 43.2 billion readings daily.

Challenge: Detect bearing degradation before catastrophic failure (typical lead time: 2-4 weeks of subtle vibration changes before failure).

Cost Impact: Early detection saves $50,000-$250,000 per motor in emergency repairs and lost production.

Edge computing architecture diagram showing the three-tier anomaly detection pipeline: edge devices performing statistical filtering, fog nodes for cross-sensor correlation, and cloud servers for ML-based analysis

10.10.1 Step-by-Step Detection Pipeline

Step 1: Edge Processing (per motor)

  • Input: Raw vibration signal at 1 kHz (86.4M readings/day)
  • Processing: FFT to extract frequency components, Z-score on RMS vibration level
  • Output: Only readings exceeding 3σ threshold (~0.1% = 86,400 candidates/day)
  • Resource: 32KB RAM microcontroller, <1ms latency

Step 2: Fog Aggregation (per floor)

  • Input: Anomaly candidates from 50 motors (~4.3M candidates/day)
  • Processing: Cross-motor correlation (environmental vs. motor-specific)
  • Output: Correlated motor-specific anomalies (~10,000/day)
  • Insight: If 40 motors spike simultaneously → environmental (HVAC change), ignore. If 1 motor spikes alone → motor-specific, escalate.

Step 3: Cloud ML Analysis

  • Input: Correlated anomalies from all factory floors
  • Processing: Isolation Forest on multivariate features (frequency spectrum, temperature, current draw)
  • Output: Anomaly score + degradation trend prediction
  • Alert: Motors with >0.8 anomaly score + upward trend → maintenance queue
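
The three steps can be sketched as a chain of filters (an illustrative skeleton; the function names, thresholds, and the stand-in cloud scorer are assumptions, and a production system would run an actual FFT and Isolation Forest):

```python
def edge_filter(rms, mean, std, sigma=3.0):
    """Step 1: keep only readings beyond the 3-sigma vibration threshold."""
    return abs(rms - mean) / std > sigma

def fog_correlate(candidates, total_motors, env_fraction=0.5):
    """Step 2: drop floor-wide spikes (environmental), keep isolated ones.
    candidates maps motor_id -> anomalous RMS reading for one time slot."""
    if len(candidates) / total_motors >= env_fraction:
        return {}          # e.g. an HVAC change hit every motor: ignore
    return candidates      # motor-specific: escalate to the cloud

def cloud_score(features):
    """Step 3 stand-in: a real system would run Isolation Forest here."""
    return min(1.0, features["rms_ratio"] / 10)

# One time slot on a 50-motor floor: only motor 17 passes the edge filter
assert edge_filter(rms=9.4, mean=1.0, std=1.0)
escalated = fog_correlate({17: 9.4}, total_motors=50)
for motor, rms in escalated.items():
    score = cloud_score({"rms_ratio": rms})
    if score > 0.8:
        print(f"motor {motor}: anomaly score {score:.2f} -> maintenance queue")
```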

10.10.2 Results

  Metric                Value
  --------------------  -------------------------------------------------------
  Data reduction        99.9% (43.2B → 10K readings/day to cloud)
  Detection lead time   14 days average before failure
  False positive rate   3% (alerts that don’t require action)
  True positive rate    92% (caught 92 of 100 eventual failures)
  Cost savings          $2.3M annually (23 failures prevented × $100K average)

Key Takeaway from This Example

The three-tier architecture achieves massive data reduction (99.9%) while maintaining high detection accuracy. Statistical methods at the edge handle the bulk of data; ML at the cloud handles the complexity. This is the hybrid approach in action.

10.11 Concept Check

Scenario: A predictive maintenance system processes 1 million sensor readings daily with 100 genuine failures (0.01%). Two detection models:

Model A: Detects 95 of 100 failures (95% recall) but generates 500 total alerts (405 false positives, 81% false alarm rate)

Model B: Detects 80 of 100 failures (80% recall) but generates 100 total alerts (20 false positives, 20% false alarm rate)

Which model should a plant operator choose?

Answer: Model B. While Model A catches more failures (95 vs 80), operators must investigate 500 alerts daily (81% of which are false alarms). Model B’s 100 alerts/day is manageable, and the 80% recall still catches most critical failures. In practice, alert fatigue from Model A would cause operators to ignore warnings, reducing effective recall below 80% anyway. Precision matters as much as recall in production systems.
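
The alert-fatigue argument can be made concrete with a toy model in which operators can realistically investigate only a fixed number of alerts per day (the capacity of 150 is a made-up figure, not from the scenario):

```python
def effective_recall(nominal_recall, daily_alerts, capacity=150):
    """Hypothetical fatigue model: alerts beyond operator capacity go unread,
    so realized recall scales with the fraction actually investigated."""
    attention = min(1.0, capacity / daily_alerts)
    return nominal_recall * attention

a = effective_recall(0.95, daily_alerts=500)  # Model A: drowning in alerts
b = effective_recall(0.80, daily_alerts=100)  # Model B: every alert reviewed
print(f"Model A effective recall: {a:.0%}")
print(f"Model B effective recall: {b:.0%}")
```

Under this assumption Model A's nominal 95% recall collapses once most of its alerts go uninvestigated, while Model B keeps its full 80%.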

10.12 Concept Relationships

Anomaly detection integrates statistical methods, machine learning, and edge computing architectures:

To Statistical Methods (Time-Series Analytics): Z-score and IQR detection provide lightweight edge-deployable algorithms catching 80% of anomalies with <1ms latency - suitable for battery-powered devices with 32KB RAM.

To Machine Learning (Modeling and Inferencing): Isolation Forest (unsupervised), autoencoders (reconstruction-based), and LSTM (sequence) networks handle multivariate patterns and collective anomalies beyond statistical methods’ capabilities.

To Edge Computing (Edge Compute Patterns): Three-tier architecture places statistical filtering at edge (99.9% data reduction), correlation at fog tier (cross-sensor validation), and complex ML at cloud (computational resources) - achieving <5 second end-to-end latency.

To Security (IoT Intrusion Detection): The same algorithms (Isolation Forest, autoencoders) detect both sensor data anomalies (predictive maintenance) and network traffic anomalies (intrusion detection) - different domain, same techniques.

10.13 See Also

For detection method deep dives:

For system design:

For related analytics:

For application domains:

Common Pitfalls

In imbalanced IoT data (0.01% anomalies), a detector that always predicts ‘normal’ achieves 99.99% accuracy yet catches zero real events. Always evaluate with precision, recall, and F1-score.

A 3σ Z-score threshold is a starting point, not a rule. Tune thresholds using real cost ratios: missed failure cost vs false alarm investigation cost.

Models trained on summer patterns will generate false alarms in winter. Build adaptive thresholds or retrain periodically so ‘normal’ evolves with the environment.

Isolation Forest with 100 trees will not fit in 32 KB of RAM. Profile memory before choosing an algorithm; statistical methods are the only viable option on microcontrollers.

A false alarm at 3 AM on a non-critical pump and one during peak production on a safety valve carry very different costs. Weight alert priority by asset criticality.

10.14 Summary

Anomaly detection is a critical capability for IoT systems that process billions of sensor readings to find the rare but critical events that indicate failures, security breaches, or opportunities.

  • Anomaly Types: Point (single outliers), contextual (depends on time/season), collective (patterns across multiple points) - classification drives method selection
  • Statistical Methods: Z-score and IQR are lightweight, suitable for edge deployment, catch 80% of anomalies with 10% complexity
  • Time-Series Methods: ARIMA, STL decomposition handle seasonality and trend - essential for contextual anomalies
  • ML Approaches: Isolation Forest (efficient), autoencoders (multivariate), LSTM (sequences) - use when statistical methods fail
  • Pipeline Design: Three-tier architecture: edge for filtering, fog for correlation, cloud for complex ML
  • Metrics: Use precision/recall for imbalanced data - accuracy is misleading when anomalies are < 1%

Key Decision Framework

Use this decision tree to select the right anomaly detection approach:

Decision tree flowchart for selecting anomaly detection methods: statistical methods for point anomalies on edge devices, time-series methods for seasonal contextual anomalies, and machine learning for complex multivariate collective anomalies

When to use statistical methods (Z-score, IQR):

  1. Point anomaly detection with known normal distributions
  2. Edge deployment with severe resource constraints
  3. Real-time detection with <10ms latency requirements
  4. Data follows relatively stable patterns

When to use time-series methods (ARIMA, STL):

  1. Strong seasonal or trend components in data
  2. Contextual anomalies that depend on time of day/year
  3. Need to handle concept drift over time

When to use machine learning (Isolation Forest, autoencoders):

  1. Multivariate data with complex correlations
  2. Collective anomalies requiring pattern recognition
  3. Unknown or evolving normal behavior
  4. Sufficient computational resources (fog/cloud tier)

Hybrid approach (recommended): Use statistical methods at edge for fast filtering, ML at cloud for complex analysis - this catches obvious anomalies instantly while allowing sophisticated detection of subtle patterns.
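
The decision framework can be captured as a tiny helper (a sketch only; the rule ordering and the 64 KB cutoff are illustrative readings of the lists above, not hard limits):

```python
def choose_method(anomaly_type, seasonal, multivariate, ram_kb):
    """Map the decision framework to a method-family recommendation."""
    if ram_kb < 64:  # severe edge constraint: only statistics fit
        return "statistical (Z-score / IQR)"
    if anomaly_type == "contextual" or seasonal:
        return "time-series (ARIMA / STL)"
    if anomaly_type == "collective" or multivariate:
        return "machine learning (Isolation Forest / autoencoder)"
    return "statistical (Z-score / IQR)"  # stable point-anomaly patterns

print(choose_method("point", seasonal=False, multivariate=False, ram_kb=32))
print(choose_method("contextual", seasonal=True, multivariate=False, ram_kb=4096))
print(choose_method("collective", seasonal=False, multivariate=True, ram_kb=16_000))
```

In the hybrid approach, the first branch runs at the edge and the third at the cloud, with the fog tier deciding what to escalate.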

Connection: Data Anomaly Detection meets Security Intrusion Detection

The same algorithms used for detecting sensor data anomalies (Z-score, Isolation Forest, autoencoders) are used in IoT network intrusion detection systems (NIDS). A temperature spike that triggers a maintenance alert and a suspicious traffic pattern that triggers a security alert are both “anomalies”—they just have different consequences. Isolation Forest trained on normal network traffic patterns can detect port scans, data exfiltration, and botnet C2 communication with the same unsupervised approach used for predictive maintenance. The key difference is the cost of errors: in maintenance, a false negative means a missed failure; in security, a false negative means an undetected breach. See IoT Intrusion Detection for security-specific applications of these techniques.

10.15 What’s Next

  If you want to…                           Read this
  ----------------------------------------  ----------------------------
  Understand the three anomaly types        Types of Anomalies
  Apply Z-score and IQR at the edge         Statistical Methods
  Detect seasonal and trending anomalies    Time-Series Methods
  Use Isolation Forest and autoencoders     Machine Learning Approaches
  Design edge/fog/cloud pipelines           Detection Pipelines
  Evaluate with precision and recall        Performance Metrics