12  Statistical Anomaly Methods

12.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Apply Z-Score Detection: Implement standard deviation-based anomaly detection for Gaussian data
  • Use IQR Method: Deploy robust outlier detection that works with any distribution
  • Build Adaptive Thresholds: Create moving statistics systems that handle concept drift
  • Select Appropriate Methods: Choose between statistical approaches based on data characteristics

In 60 Seconds

Statistical anomaly detection (Z-score and IQR) provides lightweight, interpretable algorithms that run in microseconds on microcontrollers with as little as 32 KB RAM, making them the first-choice method for edge-tier anomaly detection. These methods work best on data with known, stable distributions; switch to time-series or ML methods when seasonality or complex correlations dominate.

Statistical anomaly detection uses math to spot unusual sensor readings. Think of measuring the heights of everyone in a room – someone three times the average height would clearly stand out. Similarly, if a temperature sensor suddenly reads far outside its normal range, statistical methods can automatically flag it as suspicious without needing complex machine learning.

Minimum Viable Understanding: Statistical Anomaly Detection

Core Concept: Statistical methods detect anomalies by measuring how far a value deviates from “normal” - either in standard deviations (Z-score) or quartile ranges (IQR).

Why It Matters: Statistical methods are computationally lightweight, require no labelled training data (only a short baseline of normal readings), and can run on resource-constrained edge devices. As a rule of thumb, they catch roughly 80% of anomalies with 10% of the complexity of ML approaches.

Key Takeaway: Use Z-score for Gaussian data, IQR for skewed or bounded data, and adaptive thresholds when “normal” changes over time.

12.2 Prerequisites

Before diving into this chapter, you should be familiar with:

Key Concepts

  • Z-score: A measure of how many standard deviations a data point lies from the mean of its reference window; values beyond ±3σ are conventionally flagged as anomalies.
  • IQR (Interquartile Range): The range between the 25th and 75th percentiles (Q3 − Q1); used to define outlier bounds robust to existing outliers, unlike the Z-score.
  • Exponential moving average (EMA): A weighted average that gives more weight to recent observations, allowing the reference mean to adapt gradually to slow trends without storing a full window.
  • Adaptive threshold: A detection boundary that updates automatically as the underlying data distribution changes, preventing false alarms when normal operating conditions shift.
  • Sliding window: A fixed-size buffer of the most recent N readings used as the reference distribution for computing Z-scores or IQR bounds.
  • False alarm rate: The proportion of normal readings incorrectly flagged as anomalies; for a Gaussian distribution, a 3σ threshold yields a 0.27% false alarm rate.

12.3 Introduction

Statistical approaches form the foundation of anomaly detection in IoT systems. They are computationally lightweight, suitable for edge deployment, and work well when data follows known distributions. Unlike machine learning methods that require training data and significant compute resources, a Z-score detector built on running statistics needs as little as 16 bytes of RAM for its state, so it fits on even the smallest microcontrollers. This chapter covers three complementary methods (Z-score, IQR, and adaptive thresholds), each suited to different data characteristics and deployment constraints.

How It Works

Statistical anomaly detection works by quantifying how unusual a value is relative to historical patterns, using mathematical measures of deviation.

Z-Score Mechanism:

  1. Establish baseline: Calculate mean (μ) and standard deviation (σ) from recent normal readings
  2. Measure deviation: For each new reading, compute how many standard deviations it is from the mean
  3. Apply threshold: Values beyond 3σ (99.7% confidence interval) are flagged as anomalies

Why this works: In Gaussian distributions, 99.7% of values fall within ±3σ. Readings beyond this are statistically rare (<0.3% probability if data is normal).

IQR Mechanism:

  1. Find quartiles: Sort historical data, identify 25th percentile (Q1) and 75th percentile (Q3)
  2. Calculate range: IQR = Q3 - Q1 (the middle 50% of data)
  3. Define fences: Lower fence = Q1 - 1.5×IQR, Upper fence = Q3 + 1.5×IQR
  4. Flag outliers: Values outside fences are anomalies

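Why this works: IQR uses percentiles (order statistics) rather than mean and standard deviation, making it robust to outliers and far less dependent on distribution shape. It works even when data is skewed or has heavy tails. For reference, on Gaussian data the 1.5×IQR fences sit at about ±2.7σ, so roughly 0.7% of normal readings fall outside them.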

Adaptive Thresholds Mechanism:

  1. Sliding window: Maintain buffer of last N readings (e.g., 50 samples)
  2. Recompute statistics: Calculate mean and std from window, not all-time history
  3. Dynamic baseline: As new readings arrive, old readings drop out, keeping statistics current

Why this works: Sensor characteristics drift due to aging, temperature, etc. Fixed thresholds become obsolete. Adaptive methods track changing baselines automatically.

12.4 Z-Score (Standard Deviation Method)

Core Concept: Measure how many standard deviations a data point is from the mean.

Formula:

\[Z = \frac{x - \mu}{\sigma}\]

  • \(x\) = observed value
  • \(\mu\) = mean of dataset
  • \(\sigma\) = standard deviation
  • Flag an anomaly when \(|Z| > 3\) (99.7% confidence interval)

The Z-score threshold directly controls your false alarm rate. For a Gaussian distribution, the probability of exceeding a threshold is:

\[P(|Z| > z) = 2 \times (1 - \Phi(z))\]

where \(\Phi(z)\) is the cumulative distribution function of the standard normal distribution. Concrete thresholds:

  • \(|Z| > 2.0\): 4.6% of normal readings flagged (1 false alarm per 22 readings)
  • \(|Z| > 2.5\): 1.2% false alarm rate (1 per 80 readings)
  • \(|Z| > 3.0\): 0.27% false alarm rate (1 per 370 readings)
  • \(|Z| > 4.0\): 0.006% false alarm rate (1 per 15,800 readings)

Example: A temperature sensor samples every minute (1,440 readings/day). With \(|Z| > 3.0\), expect \(1,440 \times 0.0027 \approx 4\) false alarms per day even when nothing is wrong. For 100 sensors, that’s about 390 false alarms daily. Raising the threshold to \(|Z| > 3.5\) (0.047% rate) cuts this to about 67 daily false alarms across the fleet.
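The arithmetic above can be reproduced with a few lines of standard-library Python. This is a minimal sketch that assumes independent, Gaussian readings; the function names `false_alarm_rate` and `daily_false_alarms` are illustrative and not part of any library.

```python
import math

def false_alarm_rate(z: float) -> float:
    """Two-sided tail probability P(|Z| > z) for a standard normal variable."""
    # 2 * (1 - Phi(z)) is equal to erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))

def daily_false_alarms(z: float, readings_per_day: int, sensors: int = 1) -> float:
    """Expected false alarms per day, assuming independent Gaussian readings."""
    return false_alarm_rate(z) * readings_per_day * sensors

# One reading per minute, fleet of 100 sensors (the example above)
for z in (2.0, 2.5, 3.0, 3.5, 4.0):
    rate = false_alarm_rate(z)
    fleet = daily_false_alarms(z, readings_per_day=1440, sensors=100)
    print(f"|Z| > {z}: {rate:.4%} per reading, about {fleet:.0f} per day for 100 sensors")
```

Real sensor noise is often autocorrelated, so treat these figures as a planning estimate rather than a guarantee.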

When It Works:

  • Data follows Gaussian (normal) distribution
  • You have sufficient historical data to calculate μ and σ
  • Distribution is stable over time (no concept drift)

When It Fails:

  • Non-normal distributions (skewed, multi-modal)
  • Data with seasonal patterns
  • When “normal” range changes dynamically

Implementation Example:

import random
from collections import deque

import numpy as np

class ZScoreDetector:
    def __init__(self, threshold=3.0, window_size=100, min_samples=30):
        self.threshold = threshold
        self.min_samples = min_samples
        # deque drops the oldest reading automatically once the window is full
        self.values = deque(maxlen=window_size)

    def update(self, value):
        """Streaming Z-score anomaly detection.

        The reading is scored against the window *before* it is added,
        so an outlier cannot inflate the statistics it is judged by.
        """
        is_anomaly, z_score = False, 0.0

        # Need minimum samples for a stable baseline
        if len(self.values) >= self.min_samples:
            mean = np.mean(self.values)
            std = np.std(self.values)
            if std > 0:  # Avoid division by zero
                z_score = abs((value - mean) / std)
                is_anomaly = z_score > self.threshold

        self.values.append(value)
        return is_anomaly, z_score

# Usage example: Temperature sensor monitoring
detector = ZScoreDetector(threshold=3.0)

# Seed with 30+ normal readings to meet minimum sample requirement
random.seed(42)
for _ in range(35):
    normal_temp = 22.0 + random.uniform(-0.5, 0.5)
    detector.update(normal_temp)

# Now test with normal and anomalous readings
for temp in [22.1, 22.5, 21.8, 22.3, 22.0]:
    anomaly, score = detector.update(temp)
    print(f"Temp: {temp}C, Z-score: {score:.2f}, Anomaly: {anomaly}")

# Anomalous reading
anomaly, score = detector.update(-5.0)  # Sensor malfunction
print(f"Temp: -5.0C, Z-score: {score:.2f}, Anomaly: {anomaly}")
# Expected: a Z-score far above the threshold and Anomaly: True
# (the exact score depends on the seeded noise)

Implementation sketch:

  • Keep a recent sliding window of sensor values.
  • Wait until you have at least 30 baseline readings.
  • Compute mean and standard deviation from the window.
  • Calculate abs((value - mean) / std) for each new reading.
  • Flag an anomaly when the score is greater than the threshold.

Temperature example: after learning a normal baseline around 22 C, a -5 C reading produces an extremely large Z-score and is flagged immediately.

12.5 Interquartile Range (IQR)

When to Choose IQR Over Z-Score

Not all unusual values indicate real-world problems. A temperature reading that jumps from 22 C to -40 C is almost certainly a sensor malfunction, not actual weather. IQR is the better choice when your data is skewed or has natural outliers because it uses percentiles rather than mean and standard deviation, making it far less sensitive to the distortions that extreme values cause in Z-score calculations. For IoT sensors with physical constraints (humidity 0-100%, voltage 0-5 V), combine IQR with hard bounds: any reading outside the physical limits is immediately flagged as a sensor fault rather than a statistical anomaly.
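One way to combine hard physical bounds with IQR fences is sketched below. The function name `classify_reading`, the three labels, and the humidity values are assumptions made for illustration; `statistics.quantiles` with `method="inclusive"` uses linear interpolation between data points.

```python
import statistics

def classify_reading(history, value, physical_min, physical_max, k=1.5):
    """Return 'sensor_fault', 'anomaly', or 'normal' for one reading."""
    # Physically impossible values are faults regardless of the statistics.
    if value < physical_min or value > physical_max:
        return "sensor_fault"

    # Quartiles of the recent history define the IQR fences.
    q1, _, q3 = statistics.quantiles(history, n=4, method="inclusive")
    iqr = q3 - q1
    if value < q1 - k * iqr or value > q3 + k * iqr:
        return "anomaly"
    return "normal"

# Illustrative relative-humidity history in % RH (made-up values)
humidity_history = [41.0, 43.5, 42.2, 44.1, 40.8, 42.9, 43.3, 41.7, 42.5, 43.0]

for reading in (42.6, 55.0, 130.0):
    label = classify_reading(humidity_history, reading, 0.0, 100.0)
    print(f"{reading} % RH -> {label}")
```

Checking the physical limits first keeps impossible readings out of the anomaly statistics and lets the two cases be routed differently, for example maintenance tickets versus process alerts.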

Core Concept: Identify outliers based on the spread of the middle 50% of data.

Formula:

\[\mathrm{IQR} = Q_3 - Q_1\] \[\mathrm{Lower\ Bound} = Q_1 - 1.5 \times \mathrm{IQR}\] \[\mathrm{Upper\ Bound} = Q_3 + 1.5 \times \mathrm{IQR}\]

  • Flag an outlier when \(x < \mathrm{Lower\ Bound}\) or \(x > \mathrm{Upper\ Bound}\)
  • \(Q_1\) = 25th percentile
  • \(Q_3\) = 75th percentile

Advantages over Z-Score:

  • No distribution assumptions (works with skewed data)
  • Robust to extreme outliers
  • Good for bounded sensor ranges
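
The robustness claim can be checked with a small experiment. In this sketch the history already contains one old glitch (500); the numbers are invented for illustration. The glitch inflates the mean and standard deviation enough to hide a clearly unusual new reading from the Z-score, while the quartiles barely move.

```python
import statistics

# Reference history with one earlier glitch left in it (illustrative values)
history = [49, 50, 51, 50, 49, 51, 50, 50, 49, 51, 50, 500]
new_value = 80

# Z-score: mean and standard deviation are both distorted by the glitch
mean = statistics.fmean(history)
std = statistics.pstdev(history)
z_score = abs(new_value - mean) / std
z_flag = z_score > 3.0

# IQR: quartiles depend only on the middle of the sorted data
q1, _, q3 = statistics.quantiles(history, n=4, method="inclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_flag = new_value < lower or new_value > upper

print(f"Z-score: {z_score:.2f}, flagged: {z_flag}")
print(f"IQR fences: {lower:.2f} to {upper:.2f}, flagged: {iqr_flag}")
```

Here the Z-score stays well below 3 even though 80 is far from the typical value of 50, whereas the IQR fences flag it.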

IoT Use Cases:

  • Battery voltage monitoring (naturally bounded 0-4.2V)
  • Humidity sensors (0-100% RH)
  • Any sensor with physical constraints on range

Implementation:

import numpy as np

def iqr_anomaly_detection(data, value):
    """
    Detect if a new value is anomalous using IQR method
    """
    q1 = np.percentile(data, 25)
    q3 = np.percentile(data, 75)
    iqr = q3 - q1

    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr

    is_anomaly = (value < lower_bound) or (value > upper_bound)

    return is_anomaly, lower_bound, upper_bound

# Example: Battery voltage monitoring (normal: 3.3-4.2V)
battery_readings = [3.85, 3.92, 3.78, 3.88, 3.95, 3.82, 3.90,
                   3.87, 3.93, 3.81, 3.89, 3.86]
# Sorted: [3.78, 3.81, 3.82, 3.85, 3.86, 3.87, 3.88, 3.89, 3.90, 3.92, 3.93, 3.95]
# Q1 ≈ 3.84, Q3 ≈ 3.91, IQR ≈ 0.06 (np.percentile interpolates linearly between samples)

# Check new reading
new_reading = 2.1  # Low battery or sensor fault
anomaly, lower, upper = iqr_anomaly_detection(battery_readings, new_reading)

print(f"Battery: {new_reading}V")
print(f"Expected range: {lower:.2f}V - {upper:.2f}V")
print(f"Anomaly: {anomaly}")
# Output:
# Battery: 2.1V
# Expected range: 3.75V - 4.00V
# Anomaly: True

Implementation sketch:

  • Compute the 25th percentile (Q1) and 75th percentile (Q3).
  • Calculate IQR = Q3 - Q1.
  • Set lower = Q1 - 1.5 * IQR and upper = Q3 + 1.5 * IQR.
  • Flag any new value below lower or above upper.

Battery example: a 2.1 V reading falls well below the expected band, so the detector marks it as anomalous.

12.6 Moving Statistics (Adaptive Thresholds)

Core Concept: Calculate mean and standard deviation over a sliding window to adapt to changing conditions.

Why It Matters for IoT:

  • Sensor characteristics drift over time (aging, temperature effects)
  • Environmental conditions change (seasons, occupancy patterns)
  • Fixed thresholds become obsolete

Implementation:

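from collections import deque
import random

import numpy as np

class AdaptiveThresholdDetector:
    def __init__(self, window_size=50, n_std=3.0, warmup=10):
        self.window = deque(maxlen=window_size)
        self.n_std = n_std
        self.warmup = warmup

    def update(self, value):
        """Compare a value to the rolling baseline, then update the baseline."""
        # Warm-up: collect readings without alerting
        if len(self.window) < self.warmup:
            self.window.append(value)
            return False, None, None

        mean = np.mean(self.window)
        std = np.std(self.window)
        lower = mean - self.n_std * std
        upper = mean + self.n_std * std
        flagged = (value < lower) or (value > upper)

        # Avoid contaminating the baseline with anomalies. Caveat: after a
        # genuine, lasting level shift every reading is rejected, so consider
        # a reset rule (e.g. rebuild the window after N consecutive flags).
        if not flagged:
            self.window.append(value)

        return flagged, lower, upper

# Example: indoor temperature that drifts slowly upward, with one injected fault
random.seed(42)
detector = AdaptiveThresholdDetector(window_size=50, n_std=2.5)
for i in range(60):
    reading = 22.0 + 0.02 * i + random.uniform(-0.2, 0.2)
    if i == 45:
        reading = 30.0  # Simulated sensor fault
    flagged, lower, upper = detector.update(reading)
    if flagged:
        print(f"Reading {i}: {reading:.1f} C is outside {lower:.1f}-{upper:.1f} C")
# The fault at i == 45 is flagged; with n_std=2.5 an occasional noisy
# reading may be flagged as well.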

Implementation sketch:

  • Store recent readings in a fixed-size queue.
  • During the warm-up period, keep collecting values without alerting.
  • Compute rolling mean and standard deviation from the queue.
  • Alert when a reading falls outside mean ± n_std * std.
  • Only add non-anomalous readings back into the baseline so that outliers do not contaminate the rolling statistics.

Window Size Trade-Offs:

Small (10-50)

  • Responsiveness: High - detects sudden changes
  • Stability: Low - sensitive to noise
  • Best for: Fast-changing environments

Medium (50-200)

  • Responsiveness: Balanced
  • Stability: Balanced
  • Best for: General IoT monitoring

Large (200+)

  • Responsiveness: Low - slow to adapt
  • Stability: High - ignores noise
  • Best for: Stable industrial processes

Tradeoff: Fixed Thresholds vs Adaptive Detection

Option A: Fixed statistical thresholds (Z-score > 3, IQR bounds)

Option B: Adaptive thresholds with moving statistics

Decision Factors: Fixed thresholds are simpler to implement and explain (auditable for compliance), work well when normal behavior is stable, and have lower computational overhead. Adaptive methods handle concept drift, seasonal patterns, and changing sensor characteristics, but require tuning window sizes and may temporarily miss anomalies during adaptation periods. Choose fixed for stable industrial processes; choose adaptive for environments with natural variation like building HVAC or outdoor sensors.
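
To make the trade-off concrete, the sketch below counts alarms from both options on the same drifting signal. The signal is synthetic and deterministic (a slow ramp plus a small oscillation, with no real anomalies), and the function names are hypothetical, so every alarm counted is a false alarm.

```python
import math
import statistics
from collections import deque

def drifting_signal(n):
    """Synthetic readings: slow upward drift plus a small oscillation."""
    return [20.0 + 0.01 * i + 0.2 * math.sin(0.7 * i) for i in range(n)]

def count_fixed_alarms(readings, baseline_size=50, n_std=3.0):
    """Option A: thresholds computed once from the first readings."""
    baseline = readings[:baseline_size]
    mean, std = statistics.fmean(baseline), statistics.pstdev(baseline)
    return sum(abs(x - mean) > n_std * std for x in readings[baseline_size:])

def count_adaptive_alarms(readings, window_size=50, n_std=3.0):
    """Option B: thresholds recomputed from the most recent readings."""
    window = deque(readings[:window_size], maxlen=window_size)
    alarms = 0
    for x in readings[window_size:]:
        mean, std = statistics.fmean(window), statistics.pstdev(window)
        if abs(x - mean) > n_std * std:
            alarms += 1
        else:
            window.append(x)  # Only normal readings update the baseline
    return alarms

readings = drifting_signal(400)
fixed_alarms = count_fixed_alarms(readings)
adaptive_alarms = count_adaptive_alarms(readings)
print(f"Fixed thresholds:    {fixed_alarms} alarms")
print(f"Adaptive thresholds: {adaptive_alarms} alarms")
```

On this signal the fixed baseline ends up flagging most readings once the drift exceeds three baseline standard deviations, while the rolling window follows the ramp. The reverse risk also holds: a slow fault that looks like drift would be absorbed by the adaptive baseline.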

12.7 Method Comparison

Figure 12.1: Comparison matrix showing computational characteristics and deployment recommendations for each statistical method. Z-score suits resource-constrained edge devices, moving average handles concept drift at gateways, and IQR combined with ensembles provides maximum robustness in cloud deployments.

The same comparison, method by method:

Z-Score

  • Best fit: Gaussian data with stable operating ranges
  • Why teams use it: Fastest option for edge devices with minimal memory
  • Deployment: Edge MCU or gateway

IQR Method

  • Best fit: Skewed, bounded, or outlier-heavy sensor data
  • Why teams use it: Robust to distribution shape and extreme values
  • Deployment: Gateway or cloud when you can afford sorting

Adaptive Thresholds

  • Best fit: Environments with concept drift or changing baselines
  • Why teams use it: Tracks gradual change without retraining a model
  • Deployment: Gateway with a tuned rolling window

12.7.1 Worked Example: Arla Foods Cold Chain Z-Score Monitoring

Scenario: Arla Foods, a Scandinavian dairy cooperative, monitors refrigerated transport trucks delivering milk from farms to processing plants across Denmark. Each truck has 4 temperature sensors (front, rear, left wall, right wall) reporting every 30 seconds. Arla needs to detect cooling failures before milk spoils (milk must stay between 2 C and 6 C).

Given:

  • 850 trucks, each with 4 sensors = 3,400 sensors
  • Reporting interval: 30 seconds = 2,880 readings/sensor/day
  • Normal operating temperature: mean 3.8 C, standard deviation 0.4 C
  • Regulatory limit: milk above 6 C for more than 30 minutes must be discarded
  • Spoilage cost per truck load: EUR 12,000 (average 8,000 litres)
  • Current method: Manual spot checks at delivery (catches failures only after spoilage)

Step 1: Select detection method

Z-Score

  • Suitability: Good
  • Reasoning: Temperature is normally distributed around 3.8 C.

IQR

  • Suitability: Acceptable
  • Reasoning: Would also work, but Z-score is simpler for Gaussian data.

Adaptive

  • Suitability: Not needed
  • Reasoning: “Normal” is stable because the refrigeration setpoint does not drift.

Step 2: Set Z-Score threshold

Z = 2.0

  • Temperature trigger: 4.6 C (3.8 + 2 x 0.4)
  • False positive rate: 4.6% (too many alerts)
  • Detection speed: Very early

Z = 3.0

  • Temperature trigger: 5.0 C (3.8 + 3 x 0.4)
  • False positive rate: 0.27%
  • Detection speed: Early (1.0 C below limit)

Z = 4.0

  • Temperature trigger: 5.4 C (3.8 + 4 x 0.4)
  • False positive rate: 0.006%
  • Detection speed: Moderate (0.6 C margin)

Z = 5.0

  • Temperature trigger: 5.8 C (3.8 + 5 x 0.4)
  • False positive rate: 0.00006%
  • Detection speed: Late (only 0.2 C margin)

Selected: Z = 3.0 (triggers at 5.0 C, giving a 1.0 C margin and approximately 14 minutes of warning before the 6 C regulatory limit is reached at a typical failure warming rate of 0.07 C/minute).
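
The trigger temperatures and warning times for each candidate threshold follow directly from the scenario values. A short sketch, using only the numbers given above (the constant and function names are illustrative):

```python
MEAN_C = 3.8               # Normal operating mean (from the scenario)
STD_C = 0.4                # Normal standard deviation
LIMIT_C = 6.0              # Regulatory limit
WARMING_C_PER_MIN = 0.07   # Typical warming rate after a cooling failure

def trigger_temperature(z: float) -> float:
    """Temperature at which an upper Z-score threshold fires."""
    return MEAN_C + z * STD_C

def warning_minutes(z: float) -> float:
    """Minutes between the alert and reaching the regulatory limit."""
    return (LIMIT_C - trigger_temperature(z)) / WARMING_C_PER_MIN

for z in (2.0, 3.0, 4.0, 5.0):
    print(f"Z = {z}: trigger at {trigger_temperature(z):.1f} C, "
          f"margin {LIMIT_C - trigger_temperature(z):.1f} C, "
          f"about {warning_minutes(z):.0f} min of warning")
```

This assumes the warming rate stays constant after the failure; a door left open or a full compressor loss could warm the load faster.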

Step 3: Calculate business impact

Detection time

  • Before: At delivery (2-6 hours late)
  • After: Within 90 seconds of anomaly
  • Improvement: 99% faster

Monthly spoilage events

  • Before: 23 truck loads
  • After: 3 truck loads (caught early, rerouted)
  • Improvement: 87% reduction

Monthly spoilage cost

  • Before: EUR 276,000
  • After: EUR 36,000
  • Improvement: EUR 240,000 saved

False alarms per truck per day

  • Before: 0
  • After: ~31 (0.27% of 11,520 readings per truck)
  • Improvement: High, needs filtering

Edge compute cost

  • Before: 0
  • After: EUR 0 (runs on existing truck gateway MCU)
  • Improvement: No hardware needed

Result: A Z-Score threshold of 3.0, running on the existing truck gateway microcontroller with zero additional hardware cost, detects cooling failures within 90 seconds and saves Arla EUR 240,000 per month in prevented milk spoilage. The raw false alarm rate of ~31 per truck per day is addressed by requiring 3 consecutive Z > 3.0 readings (up to 90 seconds from the onset of the fault) before escalating, which reduces false alerts to fewer than 1 per truck per day.
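
The consecutive-reading rule can be sketched as a small streak counter placed after the Z-score check. The class name `PersistenceFilter` and the sample readings are hypothetical; the mean, standard deviation, and threshold come from the worked example, and only the warm side is checked because that is the spoilage risk.

```python
MEAN_C, STD_C, Z_THRESHOLD = 3.8, 0.4, 3.0  # Values from the worked example

class PersistenceFilter:
    """Escalate only after `required` consecutive flagged readings."""

    def __init__(self, required=3):
        self.required = required
        self.streak = 0

    def update(self, flagged: bool) -> bool:
        # Any normal reading resets the streak
        self.streak = self.streak + 1 if flagged else 0
        return self.streak >= self.required

# Illustrative 30-second readings: one isolated spike, then a real failure
readings_c = [3.9, 5.2, 4.0, 5.1, 5.3, 5.6, 5.9]

alert_filter = PersistenceFilter(required=3)
escalations = []
for temp in readings_c:
    flagged = (temp - MEAN_C) / STD_C > Z_THRESHOLD
    escalations.append(alert_filter.update(flagged))
    print(f"{temp} C  flagged={flagged}  escalate={escalations[-1]}")
```

The isolated 5.2 C spike never escalates, while the sustained rise escalates on its third consecutive flagged reading. If readings were statistically independent, three false flags in a row would occur with probability of roughly 0.0027³, about 2 in 100 million readings; real sensor noise is usually correlated, so the practical reduction is smaller.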

Key Insight: Statistical anomaly detection does not require cloud connectivity, ML models, or specialized hardware. When implemented with running statistics rather than a stored window, a Z-Score calculation consumes fewer than 20 floating-point operations per reading and fits in about 16 bytes of RAM, making it deployable on virtually any microcontroller. The key design decision is the threshold: Z = 3.0 provides the sweet spot between early warning (1.0 C margin) and manageable false positives (0.27%) for this cold chain use case.
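
One plausible way to reach a constant, few-bytes footprint is to replace the stored window with an exponentially weighted mean and variance. The sketch below is an assumption about how such a detector could look, not the implementation used in the worked example; `EwmaZScore`, `alpha`, and `warmup` are illustrative names and defaults.

```python
class EwmaZScore:
    """Z-score against an exponentially weighted mean and variance.

    Per-sensor state is just a mean, a variance, and a sample counter.
    """

    def __init__(self, alpha=0.05, threshold=3.0, warmup=30):
        self.alpha = alpha          # Weight given to the newest reading
        self.threshold = threshold
        self.warmup = warmup        # Readings to observe before alerting
        self.count = 0
        self.mean = 0.0
        self.var = 0.0

    def update(self, value):
        if self.count == 0:         # First reading initialises the mean
            self.mean = value
            self.count = 1
            return False, 0.0

        diff = value - self.mean
        std = self.var ** 0.5
        z = abs(diff) / std if std > 0 else 0.0
        flagged = self.count >= self.warmup and z > self.threshold

        if not flagged:             # Keep anomalies out of the baseline
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1 - self.alpha) * (self.var + diff * incr)
            self.count += 1
        return flagged, z

# Illustrative readings around 24 C, then a sudden drop
detector = EwmaZScore()
normal = [24.2 if i % 2 == 0 else 23.8 for i in range(60)]
results = [detector.update(x) for x in normal]
flagged, z = detector.update(15.2)
print(f"15.2 C -> z = {z:.1f}, flagged = {flagged}")
```

A smaller `alpha` behaves like a longer window (more stable, slower to adapt), mirroring the window-size trade-off for sliding-window detectors.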

Concept Relationships

Figure: Concept relationship diagram showing Statistical Anomaly Detection branching into Z-Score, IQR Method, and Adaptive Thresholds with their characteristics, including data requirements, deployment contexts, and tuning considerations.

In short: Z-score maps to Gaussian edge data, IQR maps to skewed or bounded sensors, and adaptive thresholds map to changing baselines that need window tuning.

How These Concepts Connect:

  • Distribution assumptions guide method selection: Z-score requires Gaussian data (temperature, pressure), IQR works with any distribution (battery voltage, humidity)
  • Resource constraints favor statistical methods: Z-score uses <100 bytes RAM, runs in <1ms on ESP32, perfect for edge deployment
  • Concept drift requires adaptive approaches: Fixed thresholds fail when sensor characteristics change; adaptive windows track drift automatically
  • Statistical methods are often first line of defense: Deploy at edge for fast response, escalate to ML (fog/cloud) only when statistical methods generate too many false positives


Common Pitfalls

Z-score assumes a roughly bell-shaped distribution. Sensor data with heavy tails or strong skew will produce excessive false alarms. Visualise the distribution first; if non-Gaussian, use IQR or a log-transformation.

Computing Z-score against the overall historical mean ignores drift. A gradual temperature rise will eventually flag every reading as an anomaly. Always use a sliding or exponentially weighted window.

At 1 Hz, a 0.27% false alarm rate means about 233 false alarms per day per sensor (86,400 × 0.0027). Scale your threshold with sampling rate; higher rates often require stricter thresholds (4σ or 5σ) plus temporal persistence checks.

Mixing raw ADC counts (0–4095) with calibrated engineering units (°C) in a single score lets the higher-magnitude variable dominate the result. Always normalise each sensor channel independently.

12.9 Summary

Statistical methods provide the foundation for anomaly detection in IoT:

  • Z-Score: Fast, simple, works for Gaussian data. Best for edge devices with minimal memory.
  • IQR: Robust to outliers and skewed data. Requires more memory for sorting.
  • Adaptive Thresholds: Handles concept drift. Requires tuning of window size.

Key Takeaway: Start with statistical methods. As a rule of thumb, they catch roughly 80% of anomalies with minimal resources. Only escalate to ML when statistical approaches fail.

Sammy the Sensor was monitoring the temperature of the school aquarium. Every day, the water was a nice, steady 24 degrees Celsius – give or take half a degree.

One morning, Sammy read 24.3 degrees. “Perfectly normal!” he reported to Max the Microcontroller.

Then he read 24.1. “Still fine!” Then 23.8. “No worries!”

Then suddenly: 15.2 degrees!

“WHOA!” shouted Sammy. “That is WAY different from normal!”

Max the Microcontroller quickly did some math. “The average temperature has been 24.0 degrees, and readings usually only vary by about 0.5 degrees. Your reading of 15.2 is almost EIGHTEEN standard deviations away from normal! That is called a Z-score of 18 – anything above 3 is suspicious!”

Lila the LED turned bright red to alert the teacher. They discovered the aquarium heater had broken!

“But wait,” said Bella the Battery. “What if we had a different sensor watching battery voltage? My voltage readings are not spread out evenly – they cluster near the top when I am full and drop fast at the end. Would Z-score work for me?”

“Great question!” said Max. “For you, we use the IQR method instead. It looks at the middle chunk of your readings and does not care about the shape of the data. It is more robust – like wearing a raincoat that works in any weather!”

Key lesson: Z-score works great when data follows a bell curve (like temperature). IQR works for any shape of data (like battery voltage). Pick the right tool for your data!

12.10 What’s Next

  • To detect seasonally dependent anomalies, read Time-Series Methods.
  • To handle complex multivariate patterns, read Machine Learning Approaches.
  • To integrate methods into a production pipeline, read Detection Pipelines.
  • To evaluate your detector’s accuracy, read Performance Metrics.
  • To return to the module overview, read Anomaly Detection Overview.