12  Statistical Anomaly Methods

12.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Apply Z-Score Detection: Implement standard deviation-based anomaly detection for Gaussian data
  • Use IQR Method: Deploy robust outlier detection that works with any distribution
  • Build Adaptive Thresholds: Create moving statistics systems that handle concept drift
  • Select Appropriate Methods: Choose between statistical approaches based on data characteristics
In 60 Seconds

Statistical anomaly detection — Z-score and IQR — provides lightweight, interpretable algorithms that run in microseconds on microcontrollers with as little as 32 KB RAM, making them the first-choice method for edge-tier anomaly detection. These methods work best on data with known, stable distributions; switch to time-series or ML methods when seasonality or complex correlations dominate.

Statistical anomaly detection uses math to spot unusual sensor readings. Think of measuring the heights of everyone in a room – someone three times the average height would clearly stand out. Similarly, if a temperature sensor suddenly reads far outside its normal range, statistical methods can automatically flag it as suspicious without needing complex machine learning.

Minimum Viable Understanding: Statistical Anomaly Detection

Core Concept: Statistical methods detect anomalies by measuring how far a value deviates from “normal” - either in standard deviations (Z-score) or quartile ranges (IQR).

Why It Matters: Statistical methods are computationally lightweight, require no training data, and can run on resource-constrained edge devices. They catch 80% of anomalies with 10% of the complexity of ML approaches.

Key Takeaway: Use Z-score for Gaussian data, IQR for skewed or bounded data, and adaptive thresholds when “normal” changes over time.

12.2 Prerequisites

Before diving into this chapter, you should be familiar with:


Key Concepts

  • Z-score: A measure of how many standard deviations a data point lies from the mean of its reference window; values beyond ±3σ are conventionally flagged as anomalies.
  • IQR (Interquartile Range): The range between the 25th and 75th percentiles (Q3 − Q1); used to define outlier bounds robust to existing outliers, unlike the Z-score.
  • Exponential moving average (EMA): A weighted average that gives more weight to recent observations, allowing the reference mean to adapt gradually to slow trends without storing a full window.
  • Adaptive threshold: A detection boundary that updates automatically as the underlying data distribution changes, preventing false alarms when normal operating conditions shift.
  • Sliding window: A fixed-size buffer of the most recent N readings used as the reference distribution for computing Z-scores or IQR bounds.
  • False alarm rate: The proportion of normal readings incorrectly flagged as anomalies; for a Gaussian distribution, a 3σ threshold yields a 0.27% false alarm rate.

12.3 Introduction

Statistical approaches form the foundation of anomaly detection in IoT systems. They are computationally lightweight, suitable for edge deployment, and work well when data follows known distributions. Unlike machine learning methods that require training data and significant compute resources, a statistical detector's entire state (a running mean and variance) fits in as little as 16 bytes of RAM, so it runs on virtually any microcontroller. This chapter covers three complementary methods – Z-score, IQR, and adaptive thresholds – each suited to different data characteristics and deployment constraints.

How It Works

Statistical anomaly detection works by quantifying how unusual a value is relative to historical patterns, using mathematical measures of deviation.

Z-Score Mechanism:

  1. Establish baseline: Calculate mean (μ) and standard deviation (σ) from recent normal readings
  2. Measure deviation: For each new reading, compute how many standard deviations it is from the mean
  3. Apply threshold: Values more than 3σ from the mean are flagged as anomalies

Why this works: In Gaussian distributions, 99.7% of values fall within ±3σ. Readings beyond this are statistically rare (<0.3% probability if data is normal).

IQR Mechanism:

  1. Find quartiles: Sort historical data, identify 25th percentile (Q1) and 75th percentile (Q3)
  2. Calculate range: IQR = Q3 - Q1 (the middle 50% of data)
  3. Define fences: Lower fence = Q1 - 1.5×IQR, Upper fence = Q3 + 1.5×IQR
  4. Flag outliers: Values outside fences are anomalies

Why this works: IQR uses percentiles (order statistics) rather than mean/std, making it robust to outliers and distribution shape. Works even when data is skewed or has heavy tails.

Adaptive Thresholds Mechanism:

  1. Sliding window: Maintain buffer of last N readings (e.g., 50 samples)
  2. Recompute statistics: Calculate mean and std from window, not all-time history
  3. Dynamic baseline: As new readings arrive, old readings drop out, keeping statistics current

Why this works: Sensor characteristics drift due to aging, temperature, etc. Fixed thresholds become obsolete. Adaptive methods track changing baselines automatically.
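The sliding window above stores N raw readings. When even that buffer is too costly, the mean and variance can be maintained incrementally with Welford's online algorithm using just three numbers of state, which is how a Z-score detector can fit in the roughly 16 bytes of RAM cited in this chapter. A minimal sketch (the class name is an illustrative choice):

```python
class WelfordStats:
    """Constant-memory running mean/std via Welford's online algorithm."""

    def __init__(self):
        self.n = 0        # sample count
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Population standard deviation; meaningful once n >= 2
        return (self.m2 / self.n) ** 0.5 if self.n > 1 else 0.0

stats = WelfordStats()
for x in [22.1, 22.4, 21.9, 22.0, 22.2]:
    stats.update(x)
print(f"mean={stats.mean:.2f}, std={stats.std():.3f}")  # mean=22.12, std=0.172
```

The trade-off versus a sliding window is that Welford statistics cover all history, so this form is best paired with an exponential decay or periodic reset when drift is expected.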

12.4 Z-Score (Standard Deviation Method)

Core Concept: Measure how many standard deviations a data point is from the mean.

Formula:

Z = (x - mu) / sigma

Where:
- x = observed value
- mu = mean of dataset
- sigma = standard deviation
- |Z| > 3 typically indicates anomaly (99.7% of Gaussian data falls within ±3σ)

The Z-score threshold directly controls your false alarm rate. For a Gaussian distribution, the probability of exceeding a threshold is:

\[P(|Z| > z) = 2 \times (1 - \Phi(z))\]

where \(\Phi(z)\) is the standard normal cumulative distribution function. Concrete thresholds:

  • \(|Z| > 2.0\): 4.6% of normal readings flagged (1 false alarm per 22 readings)
  • \(|Z| > 2.5\): 1.2% false alarm rate (1 per 80 readings)
  • \(|Z| > 3.0\): 0.27% false alarm rate (1 per 370 readings)
  • \(|Z| > 4.0\): 0.006% false alarm rate (1 per 15,800 readings)

Example: A temperature sensor samples every minute (1,440 readings/day). With \(Z > 3.0\), expect \(1,440 \times 0.0027 \approx 4\) false alarms per day even when nothing is wrong. For 100 sensors, that’s 400 false alarms daily. Adjust threshold to \(Z > 3.5\) (0.047% rate) for 68 daily false alarms across the fleet.
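These rates follow directly from the standard normal CDF and can be reproduced with the Python standard library alone (no SciPy required):

```python
import math

def false_alarm_rate(z):
    """Two-sided probability that a Gaussian sample exceeds |Z| > z."""
    phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF
    return 2 * (1 - phi)

for z in [2.0, 2.5, 3.0, 4.0]:
    rate = false_alarm_rate(z)
    per_day = 1440 * rate  # sensor sampling once per minute
    print(f"|Z| > {z}: {rate:.4%} of normal readings, ~{per_day:.1f} false alarms/day")
```

Running this reproduces the bullet list above, including the roughly 4 false alarms per day for a 3σ threshold at one reading per minute.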

12.4.1 Z-Score Threshold Explorer

Adjust the sensor parameters below to see how Z-score thresholds translate to real detection boundaries and false alarm rates.

When It Works:

  • Data follows Gaussian (normal) distribution
  • You have sufficient historical data to calculate mu and sigma
  • Distribution is stable over time (no concept drift)

When It Fails:

  • Non-normal distributions (skewed, multi-modal)
  • Data with seasonal patterns
  • When “normal” range changes dynamically

Implementation Example:

import numpy as np

class ZScoreDetector:
    def __init__(self, threshold=3.0, window_size=100):
        self.threshold = threshold
        self.window_size = window_size
        self.values = []

    def update(self, value):
        """Streaming Z-score anomaly detection."""
        # Score against the existing window BEFORE appending, so an
        # extreme reading cannot inflate its own baseline statistics
        if len(self.values) >= 30:  # need minimum samples
            mean = np.mean(self.values)
            std = np.std(self.values)
            if std == 0:  # avoid division by zero
                is_anomaly, z_score = False, 0.0
            else:
                z_score = abs((value - mean) / std)
                is_anomaly = z_score > self.threshold
        else:
            is_anomaly, z_score = False, 0.0

        self.values.append(value)
        # Keep only the most recent window
        if len(self.values) > self.window_size:
            self.values.pop(0)

        return is_anomaly, z_score

# Usage example: Temperature sensor monitoring
detector = ZScoreDetector(threshold=3.0)

# Seed with 30+ normal readings to meet minimum sample requirement
import random
random.seed(42)
for _ in range(35):
    normal_temp = 22.0 + random.uniform(-0.5, 0.5)
    detector.update(normal_temp)

# Now test with normal and anomalous readings
for temp in [22.1, 22.5, 21.8, 22.3, 22.0]:
    anomaly, score = detector.update(temp)
    print(f"Temp: {temp}C, Z-score: {score:.2f}, Anomaly: {anomaly}")

# Anomalous reading
anomaly, score = detector.update(-5.0)  # Sensor malfunction
print(f"Temp: -5.0C, Z-score: {score:.2f}, Anomaly: {anomaly}")
# Output: Z-score far above the 3.0 threshold, Anomaly: True

12.5 Interquartile Range (IQR)

When to Choose IQR Over Z-Score

Not all unusual values indicate real-world problems. A temperature spike from 22C to -40C is almost certainly a sensor malfunction, not actual weather. IQR is the better choice when your data is skewed or has natural outliers because it uses percentiles rather than mean/standard deviation, making it immune to the distortions that extreme values cause in Z-score calculations. For IoT sensors with physical constraints (humidity 0-100%, voltage 0-5V), combine IQR with hard bounds – any reading outside physical limits is immediately flagged as a sensor fault, not an anomaly.

Core Concept: Identify outliers based on the spread of the middle 50% of data.

Formula:

IQR = Q3 - Q1
Lower Bound = Q1 - 1.5 x IQR
Upper Bound = Q3 + 1.5 x IQR

Outlier if: x < Lower Bound OR x > Upper Bound

Where:
- Q1 = 25th percentile
- Q3 = 75th percentile

Advantages over Z-Score:

  • No distribution assumptions (works with skewed data)
  • Robust to extreme outliers
  • Good for bounded sensor ranges

IoT Use Cases:

  • Battery voltage monitoring (naturally bounded 0-4.2V)
  • Humidity sensors (0-100% RH)
  • Any sensor with physical constraints on range

Implementation:

import numpy as np

def iqr_anomaly_detection(data, value):
    """
    Detect if a new value is anomalous using IQR method
    """
    q1 = np.percentile(data, 25)
    q3 = np.percentile(data, 75)
    iqr = q3 - q1

    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr

    is_anomaly = (value < lower_bound) or (value > upper_bound)

    return is_anomaly, lower_bound, upper_bound

# Example: Battery voltage monitoring (normal: 3.3-4.2V)
battery_readings = [3.85, 3.92, 3.78, 3.88, 3.95, 3.82, 3.90,
                   3.87, 3.93, 3.81, 3.89, 3.86]
# Sorted: [3.78, 3.81, 3.82, 3.85, 3.86, 3.87, 3.88, 3.89, 3.90, 3.92, 3.93, 3.95]
# Q1 ≈ 3.84, Q3 ≈ 3.91, IQR ≈ 0.06

# Check new reading
new_reading = 2.1  # Low battery or sensor fault
anomaly, lower, upper = iqr_anomaly_detection(battery_readings, new_reading)

print(f"Battery: {new_reading}V")
print(f"Expected range: {lower:.2f}V - {upper:.2f}V")
print(f"Anomaly: {anomaly}")
# Output: Battery: 2.1V, Expected range: 3.75V - 4.00V, Anomaly: True
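As noted at the start of this section, IQR fences pair naturally with hard physical bounds: a reading outside physical limits is a sensor fault, while an in-range reading outside the fences is an anomaly. A minimal sketch of that two-stage check (the function name is an illustrative choice):

```python
import numpy as np

def classify_reading(history, value, phys_min, phys_max, k=1.5):
    """Return 'fault', 'anomaly', or 'normal' for a new sensor reading."""
    # Stage 1: physical limits - readings outside them cannot be real
    if not (phys_min <= value <= phys_max):
        return "fault"
    # Stage 2: IQR fences computed from recent in-range history
    q1, q3 = np.percentile(history, [25, 75])
    iqr = q3 - q1
    if value < q1 - k * iqr or value > q3 + k * iqr:
        return "anomaly"
    return "normal"

humidity = [45.2, 46.1, 44.8, 45.5, 46.0, 45.1, 44.9, 45.7]  # % RH
print(classify_reading(humidity, 103.0, 0, 100))  # impossible value -> fault
print(classify_reading(humidity, 62.0, 0, 100))   # possible but far off -> anomaly
print(classify_reading(humidity, 45.3, 0, 100))   # -> normal
```

Separating the two outcomes matters operationally: a fault triggers sensor maintenance, while an anomaly triggers investigation of the monitored process.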

12.6 Moving Statistics (Adaptive Thresholds)

Core Concept: Calculate mean and standard deviation over a sliding window to adapt to changing conditions.

Why It Matters for IoT:

  • Sensor characteristics drift over time (aging, temperature effects)
  • Environmental conditions change (seasons, occupancy patterns)
  • Fixed thresholds become obsolete

Implementation:

from collections import deque
import numpy as np

class AdaptiveThresholdDetector:
    def __init__(self, window_size=50, n_std=3.0):
        self.window = deque(maxlen=window_size)
        self.n_std = n_std

    def update(self, value):
        """Add new value and check if it's anomalous."""
        if len(self.window) > 10:
            mean = np.mean(self.window)
            std = np.std(self.window)
            lower = mean - self.n_std * std
            upper = mean + self.n_std * std
            is_anomaly = (value < lower) or (value > upper)
        else:
            is_anomaly, lower, upper = False, None, None

        # Add to window AFTER checking (don't contaminate with anomalies)
        if not is_anomaly:
            self.window.append(value)
        return is_anomaly, lower, upper

# Example: Indoor temperature monitoring with day/night cycles
import random
random.seed(42)

detector = AdaptiveThresholdDetector(window_size=50, n_std=2.5)

# Seed the window so the detector has a baseline (bounds are None before then)
for _ in range(20):
    detector.update(23.0 + random.uniform(-0.5, 0.5))

temp = 23.0 + random.uniform(-0.5, 0.5)  # Daytime reading
anomaly, lower, upper = detector.update(temp)
print(f"Temp: {temp:.1f}C, Range: [{lower:.1f}, {upper:.1f}], Anomaly: {anomaly}")

Window Size Trade-Offs:

Window Size       Responsiveness                  Stability                  Best For
Small (10-50)     High - detects sudden changes   Low - sensitive to noise   Fast-changing environments
Medium (50-200)   Balanced                        Balanced                   General IoT monitoring
Large (200+)      Low - slow to adapt             High - ignores noise       Stable industrial processes

Tradeoff: Fixed Thresholds vs Adaptive Detection

Option A: Fixed statistical thresholds (Z-score > 3, IQR bounds)

Option B: Adaptive thresholds with moving statistics

Decision Factors: Fixed thresholds are simpler to implement and explain (auditable for compliance), work well when normal behavior is stable, and have lower computational overhead. Adaptive methods handle concept drift, seasonal patterns, and changing sensor characteristics, but require tuning window sizes and may temporarily miss anomalies during adaptation periods. Choose fixed for stable industrial processes; choose adaptive for environments with natural variation like building HVAC or outdoor sensors.
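The sliding window above stores N raw readings. The exponential moving average from the key-concepts list tracks drift with only two floats of state. A minimal sketch, with the class name, warm-up length, and alpha value as illustrative choices rather than settled recommendations:

```python
class EMADetector:
    """Adaptive threshold from exponentially weighted mean and variance."""

    def __init__(self, alpha=0.1, n_std=3.0, warmup=10):
        self.alpha = alpha    # higher alpha = faster adaptation
        self.n_std = n_std
        self.warmup = warmup  # readings to observe before flagging
        self.mean = None
        self.var = 0.0
        self.count = 0

    def update(self, value):
        if self.mean is None:          # first sample initialises the baseline
            self.mean = value
            return False
        self.count += 1
        delta = value - self.mean
        std = self.var ** 0.5
        is_anomaly = (self.count > self.warmup and std > 0
                      and abs(delta) > self.n_std * std)
        if not is_anomaly:
            # Standard incremental EWMA update for mean and variance
            self.mean += self.alpha * delta
            self.var = (1 - self.alpha) * (self.var + self.alpha * delta ** 2)
        return is_anomaly

det = EMADetector(alpha=0.1)
for x in [22.0, 22.1, 21.9] * 4:   # steady readings establish the baseline
    det.update(x)
print(det.update(30.0))  # sudden spike -> True
print(det.update(22.0))  # back near baseline -> False
```

Compared with a deque-based window, this trades exact window semantics for constant memory: old readings fade out gradually rather than dropping off a buffer edge.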

12.7 Method Comparison

Figure 12.1: Comparison matrix showing computational characteristics and deployment recommendations for each statistical method. Z-score suits resource-constrained edge devices, moving average handles concept drift at gateways, and IQR combined with ensembles provides maximum robustness in cloud deployments.

12.7.1 Worked Example: Arla Foods Cold Chain Z-Score Monitoring

Scenario: Arla Foods, a Scandinavian dairy cooperative, monitors refrigerated transport trucks delivering milk from farms to processing plants across Denmark. Each truck has 4 temperature sensors (front, rear, left wall, right wall) reporting every 30 seconds. Arla needs to detect cooling failures before milk spoils (milk must stay between 2 C and 6 C).

Given:

  • 850 trucks, each with 4 sensors = 3,400 sensors
  • Reporting interval: 30 seconds = 2,880 readings/sensor/day
  • Normal operating temperature: mean 3.8 C, standard deviation 0.4 C
  • Regulatory limit: milk above 6 C for more than 30 minutes must be discarded
  • Spoilage cost per truck load: EUR 12,000 (average 8,000 litres)
  • Current method: Manual spot checks at delivery (catches failures only after spoilage)

Step 1: Select detection method

Method     Suitability   Reasoning
Z-Score    Good          Temperature is normally distributed around 3.8 C
IQR        Acceptable    Would also work, but Z-Score is simpler for Gaussian data
Adaptive   Not needed    “Normal” is stable (refrigeration setpoint does not drift)

Step 2: Set Z-Score threshold

Threshold   Temperature Trigger     False Positive Rate      Detection Speed
Z = 2.0     4.6 C (3.8 + 2 x 0.4)   4.6% (too many alerts)   Very early
Z = 3.0     5.0 C (3.8 + 3 x 0.4)   0.27%                    Early (1.0 C below limit)
Z = 4.0     5.4 C (3.8 + 4 x 0.4)   0.006%                   Moderate (0.6 C margin)
Z = 5.0     5.8 C (3.8 + 5 x 0.4)   0.00006%                 Late (only 0.2 C margin)

Selected: Z = 3.0 (triggers at 5.0 C, giving 1.0 C and approximately 15 minutes of warning before the 6 C regulatory limit is reached at typical failure rate of 0.07 C/minute).

Step 3: Calculate business impact

Metric                           Before (Manual)                After (Z-Score)                              Improvement
Detection time                   At delivery (2-6 hours late)   Within 90 seconds of anomaly                 99% faster
Monthly spoilage events          23 truck loads                 3 truck loads (caught early, rerouted)       87% reduction
Monthly spoilage cost            EUR 276,000                    EUR 36,000                                   EUR 240,000 saved
False alarms per truck per day   0                              ~31 (0.27% of 11,520 readings per truck)     High, needs filtering
Edge compute cost                0                              EUR 0 (runs on existing truck gateway MCU)   No hardware needed

Result: A Z-Score threshold of 3.0, running on the existing truck gateway microcontroller with zero additional hardware cost, detects cooling failures within 90 seconds and saves Arla EUR 240,000 per month in prevented milk spoilage. The raw false alarm rate of ~31 per truck per day is addressed by requiring 3 consecutive Z > 3.0 readings (90 seconds) before escalating, reducing false alerts to fewer than 1 per truck per day across the fleet.

Key Insight: Statistical anomaly detection does not require cloud connectivity, ML models, or specialized hardware. A Z-Score calculation consumes fewer than 20 floating-point operations per reading and fits in 16 bytes of RAM, making it deployable on any microcontroller. The key design decision is the threshold: Z = 3.0 provides the sweet spot between early warning (1.0 C margin) and manageable false positives (0.27%) for this cold chain use case.
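The "3 consecutive readings" rule used to tame false alarms in this example is a generic persistence (debounce) filter that composes with any detector. A minimal sketch; the class name is an illustrative choice:

```python
class PersistenceFilter:
    """Escalate an alert only after k consecutive anomalous readings."""

    def __init__(self, k=3):
        self.k = k
        self.streak = 0  # current run of consecutive anomalies

    def update(self, is_anomaly):
        # Any normal reading resets the streak
        self.streak = self.streak + 1 if is_anomaly else 0
        return self.streak >= self.k

# One isolated outlier is suppressed; a sustained excursion escalates
filt = PersistenceFilter(k=3)
flags = [False, True, False, True, True, True, True]
alerts = [filt.update(f) for f in flags]
print(alerts)  # [False, False, False, False, False, True, True]
```

Because independent 3σ false alarms rarely occur three times in a row, requiring a short streak cuts the raw false-alarm rate by orders of magnitude, at the cost of a detection delay of k sampling intervals (90 seconds here).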

12.8 Interactive Demo: Anomaly Detection Methods

Experiment with different anomaly detection methods on simulated IoT sensor data. Adjust the threshold and detection method to see how they affect true positive and false positive rates.

Concept Relationships

Concept relationship diagram showing Statistical Anomaly Detection branching into Z-Score, IQR Method, and Adaptive Thresholds with their characteristics including data requirements, deployment contexts, and tuning considerations

How These Concepts Connect:

  • Distribution assumptions guide method selection: Z-score requires Gaussian data (temperature, pressure), IQR works with any distribution (battery voltage, humidity)
  • Resource constraints favor statistical methods: Z-score uses <100 bytes RAM, runs in <1ms on ESP32, perfect for edge deployment
  • Concept drift requires adaptive approaches: Fixed thresholds fail when sensor characteristics change; adaptive windows track drift automatically
  • Statistical methods are often first line of defense: Deploy at edge for fast response, escalate to ML (fog/cloud) only when statistical methods generate too many false positives


Common Pitfalls

Z-score assumes a roughly bell-shaped distribution. Sensor data with heavy tails or strong skew will produce excessive false alarms. Visualise the distribution first; if non-Gaussian, use IQR or a log-transformation.

Computing Z-score against the overall historical mean ignores drift. A gradual temperature rise will eventually flag every reading as an anomaly. Always use a sliding or exponentially weighted window.

At 1 Hz, a 0.27% false alarm rate means roughly 230 false alarms per day per sensor (0.0027 x 86,400 readings). Scale your threshold with sampling rate; higher rates often require tighter bounds (4σ or 5σ) plus temporal persistence checks.

Mixing raw ADC counts (0–4095) with calibrated engineering units (°C) will dominate the score with the higher-magnitude variable. Always normalise each sensor channel independently.

12.9 Summary

Statistical methods provide the foundation for anomaly detection in IoT:

  • Z-Score: Fast, simple, works for Gaussian data. Best for edge devices with minimal memory.
  • IQR: Robust to outliers and skewed data. Requires more memory for sorting.
  • Adaptive Thresholds: Handles concept drift. Requires tuning of window size.

Key Takeaway: Start with statistical methods. They catch 80% of anomalies with minimal resources. Only escalate to ML when statistical approaches fail.

Sammy the Sensor was monitoring the temperature of the school aquarium. Every day, the water was a nice, steady 24 degrees Celsius – give or take half a degree.

One morning, Sammy read 24.3 degrees. “Perfectly normal!” he reported to Max the Microcontroller.

Then he read 24.1. “Still fine!” Then 23.8. “No worries!”

Then suddenly: 15.2 degrees!

“WHOA!” shouted Sammy. “That is WAY different from normal!”

Max the Microcontroller quickly did some math. “The average temperature has been 24.0 degrees, and readings usually only vary by about 0.5 degrees. Your reading of 15.2 is almost EIGHTEEN standard deviations away from normal! That is called a Z-score of 18 – anything above 3 is suspicious!”

Lila the LED turned bright red to alert the teacher. They discovered the aquarium heater had broken!

“But wait,” said Bella the Battery. “What if we had a different sensor watching battery voltage? My voltage readings are not spread out evenly – they cluster near the top when I am full and drop fast at the end. Would Z-score work for me?”

“Great question!” said Max. “For you, we use the IQR method instead. It looks at the middle chunk of your readings and does not care about the shape of the data. It is more robust – like wearing a raincoat that works in any weather!”

Key lesson: Z-score works great when data follows a bell curve (like temperature). IQR works for any shape of data (like battery voltage). Pick the right tool for your data!

12.10 What’s Next

If you want to…                                Read this
Detect seasonally dependent anomalies          Time-Series Methods
Handle complex multivariate patterns           Machine Learning Approaches
Integrate methods into a production pipeline   Detection Pipelines
Evaluate your detector’s accuracy              Performance Metrics
Return to the module overview                  Anomaly Detection Overview