Apply Z-Score Detection: Implement standard deviation-based anomaly detection for Gaussian data
Use IQR Method: Deploy robust outlier detection that works with any distribution
Build Adaptive Thresholds: Create moving statistics systems that handle concept drift
Select Appropriate Methods: Choose between statistical approaches based on data characteristics
In 60 Seconds
Statistical anomaly detection — Z-score and IQR — provides lightweight, interpretable algorithms that run in microseconds on microcontrollers with as little as 32 KB RAM, making them the first-choice method for edge-tier anomaly detection. These methods work best on data with known, stable distributions; switch to time-series or ML methods when seasonality or complex correlations dominate.
For Beginners: Statistical Anomaly Detection
Statistical anomaly detection uses math to spot unusual sensor readings. Think of measuring the heights of everyone in a room – someone three times the average height would clearly stand out. Similarly, if a temperature sensor suddenly reads far outside its normal range, statistical methods can automatically flag it as suspicious without needing complex machine learning.
Core Concept: Statistical methods detect anomalies by measuring how far a value deviates from “normal” - either in standard deviations (Z-score) or quartile ranges (IQR).
Why It Matters: Statistical methods are computationally lightweight, require no training data, and can run on resource-constrained edge devices. They catch 80% of anomalies with 10% of the complexity of ML approaches.
Key Takeaway: Use Z-score for Gaussian data, IQR for skewed or bounded data, and adaptive thresholds when “normal” changes over time.
12.2 Prerequisites
Before diving into this chapter, you should be familiar with:
Anomaly Types: Understanding the difference between point, contextual, and collective anomalies
Z-score: A measure of how many standard deviations a data point lies from the mean of its reference window; values beyond ±3σ are conventionally flagged as anomalies.
IQR (Interquartile Range): The range between the 25th and 75th percentiles (Q3 − Q1); used to define outlier bounds robust to existing outliers, unlike the Z-score.
Exponential moving average (EMA): A weighted average that gives more weight to recent observations, allowing the reference mean to adapt gradually to slow trends without storing a full window.
Adaptive threshold: A detection boundary that updates automatically as the underlying data distribution changes, preventing false alarms when normal operating conditions shift.
Sliding window: A fixed-size buffer of the most recent N readings used as the reference distribution for computing Z-scores or IQR bounds.
False alarm rate: The proportion of normal readings incorrectly flagged as anomalies; for a Gaussian distribution, a 3σ threshold yields a 0.27% false alarm rate.
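The EMA and adaptive-threshold terms above can be sketched in a few lines of Python. This is an illustrative sketch: the `EmaBaseline` class name and the smoothing factor `alpha` are assumptions, not definitions from this chapter.

```python
class EmaBaseline:
    """Exponentially weighted mean and variance for a drifting sensor.

    alpha is the weight given to the newest reading; smaller values
    adapt more slowly (illustrative choice, not from the chapter).
    """

    def __init__(self, alpha=0.05, n_std=3.0):
        self.alpha = alpha
        self.n_std = n_std
        self.mean = None
        self.var = 0.0

    def update(self, x):
        """Return True if x falls outside the adaptive +/- n_std band."""
        if self.mean is None:  # first reading seeds the baseline
            self.mean = x
            return False
        std = self.var ** 0.5
        is_anomaly = std > 0 and abs(x - self.mean) > self.n_std * std
        # Exponentially weighted updates: constant memory, no window buffer
        diff = x - self.mean
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return is_anomaly


baseline = EmaBaseline(alpha=0.1)
for reading in [22.0, 22.1, 21.9, 22.0, 22.1]:
    baseline.update(reading)
print(f"EMA mean: {baseline.mean:.2f}")
```

Because the state is just two floats, this is the kind of detector that fits on the smallest edge devices.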
12.3 Introduction
Statistical approaches form the foundation of anomaly detection in IoT systems. They are computationally lightweight, suitable for edge deployment, and work well when data follows known distributions. Unlike machine learning methods that require training data and significant compute resources, a statistical detector needs as little as 16 bytes of state and runs on even the smallest microcontrollers. This chapter covers three complementary methods – Z-score, IQR, and adaptive thresholds – each suited to different data characteristics and deployment constraints.
How It Works
Statistical anomaly detection works by quantifying how unusual a value is relative to historical patterns, using mathematical measures of deviation.
Z-Score Mechanism:
Establish baseline: Calculate mean (μ) and standard deviation (σ) from recent normal readings
Measure deviation: For each new reading, compute how many standard deviations it is from the mean
Apply threshold: Values beyond 3σ (outside the 99.7% interval) are flagged as anomalies
Why this works: In Gaussian distributions, 99.7% of values fall within ±3σ. Readings beyond this are statistically rare (<0.3% probability if data is normal).
IQR Mechanism:
Compute quartiles: Find Q1 (25th percentile) and Q3 (75th percentile) of recent readings
Build fences: Lower fence = Q1 − 1.5 × IQR, upper fence = Q3 + 1.5 × IQR, where IQR = Q3 − Q1
Flag outliers: Values outside the fences are anomalies
Why this works: IQR uses percentiles (order statistics) rather than mean/std, making it robust to outliers and distribution shape. It works even when data is skewed or has heavy tails.
Adaptive Thresholds Mechanism:
Sliding window: Maintain buffer of last N readings (e.g., 50 samples)
Recompute statistics: Calculate mean and std from window, not all-time history
Dynamic baseline: As new readings arrive, old readings drop out, keeping statistics current
Why this works: Sensor characteristics drift due to aging, temperature, etc. Fixed thresholds become obsolete. Adaptive methods track changing baselines automatically.
12.4 Z-Score (Standard Deviation Method)
Core Concept: Measure how many standard deviations a data point is from the mean.
Formula:
Z = (x - mu) / sigma
Where:
- x = observed value
- mu = mean of dataset
- sigma = standard deviation
- |Z| > 3 typically indicates an anomaly (outside the 99.7% interval)
Putting Numbers to It
The Z-score threshold directly controls your false alarm rate. For a Gaussian distribution, the probability of exceeding a threshold is:
\[P(|Z| > z) = 2 \times (1 - \Phi(z))\]
where \(\Phi(z)\) is the standard normal cumulative distribution function. Concrete thresholds:
\(|Z| > 2.0\): 4.6% of normal readings flagged (1 false alarm per 22 readings)
\(|Z| > 3.0\): 0.27% of normal readings flagged (1 false alarm per 370 readings)
\(|Z| > 3.5\): 0.047% of normal readings flagged (1 false alarm per 2,150 readings)
Example: A temperature sensor samples every minute (1,440 readings/day). With \(Z > 3.0\), expect \(1,440 \times 0.0027 \approx 4\) false alarms per day even when nothing is wrong. For 100 sensors, that’s 400 false alarms daily. Adjust threshold to \(Z > 3.5\) (0.047% rate) for 68 daily false alarms across the fleet.
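These rates can be checked directly with the standard normal tail formula. A quick sketch using Python's `math.erfc`; the per-day figure assumes the one-reading-per-minute example above.

```python
import math


def false_alarm_rate(z):
    """P(|Z| > z) for a standard normal: 2 * (1 - Phi(z)) = erfc(z / sqrt(2))."""
    return math.erfc(z / math.sqrt(2))


readings_per_day = 1440  # one reading per minute
for z in (2.0, 3.0, 3.5):
    rate = false_alarm_rate(z)
    print(f"|Z| > {z}: {rate:.4%} of normal readings flagged, "
          f"~{rate * readings_per_day:.1f} false alarms/sensor/day")
```

Multiplying the per-sensor figure by fleet size shows why the threshold choice matters: at scale, even a 0.27% rate produces hundreds of daily alerts.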
12.4.1 Z-Score Threshold Explorer
Adjust the sensor parameters below to see how Z-score thresholds translate to real detection boundaries and false alarm rates.
Show code
```{ojs}
viewof sensorMean = Inputs.range([0, 100], {value: 22.0, step: 0.1, label: "Sensor mean (μ)"})
viewof sensorStd = Inputs.range([0.1, 10], {value: 0.5, step: 0.1, label: "Sensor std dev (σ)"})
viewof zThreshold = Inputs.range([1.5, 5.0], {value: 3.0, step: 0.1, label: "Z-score threshold"})
viewof readingsPerDay = Inputs.range([100, 86400], {value: 1440, step: 100, label: "Readings per day"})
```
Show code
```{ojs}
{
  const lowerTrigger = sensorMean - zThreshold * sensorStd;
  const upperTrigger = sensorMean + zThreshold * sensorStd;

  // False alarm rate: P(|Z| > z) ≈ 2 * (1 - Phi(z))
  // Using approximation: erfc(z / sqrt(2))
  function erfcApprox(x) {
    const t = 1 / (1 + 0.3275911 * Math.abs(x));
    const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    const result = poly * Math.exp(-x * x);
    return x >= 0 ? result : 2 - result;
  }

  const falseAlarmRate = erfcApprox(zThreshold / Math.SQRT2);
  const falseAlarmsPerDay = falseAlarmRate * readingsPerDay;
  const falseAlarmsPerDayPer100 = falseAlarmsPerDay * 100;

  const bgColor = "var(--bs-light, #f8f9fa)";
  const textColor = "var(--bs-body-color, #2C3E50)";

  return html`<div style="background: ${bgColor}; color: ${textColor}; border-radius: 8px; padding: 16px; font-family: Arial, sans-serif; border-left: 4px solid #16A085;">
    <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 12px;">
      <div>
        <strong style="color: #16A085;">Detection Boundaries</strong><br>
        Lower trigger: <strong>${lowerTrigger.toFixed(1)}</strong><br>
        Upper trigger: <strong>${upperTrigger.toFixed(1)}</strong><br>
        Normal range: ${lowerTrigger.toFixed(1)} to ${upperTrigger.toFixed(1)}
      </div>
      <div>
        <strong style="color: #E67E22;">False Alarm Impact</strong><br>
        False alarm rate: <strong>${(falseAlarmRate * 100).toFixed(4)}%</strong><br>
        Per sensor/day: <strong>${falseAlarmsPerDay.toFixed(1)}</strong> false alarms<br>
        Per 100 sensors/day: <strong>${falseAlarmsPerDayPer100.toFixed(0)}</strong> false alarms
      </div>
    </div>
    <div style="margin-top: 10px; font-size: 0.9em; color: #7F8C8D;">
      Any reading below ${lowerTrigger.toFixed(1)} or above ${upperTrigger.toFixed(1)} triggers an anomaly alert.
      ${falseAlarmsPerDay > 10 ? "⚠ High false alarm rate — consider increasing the threshold." : "✓ Manageable false alarm rate."}
    </div>
  </div>`;
}
```
When It Works:
Data follows Gaussian (normal) distribution
You have sufficient historical data to calculate mu and sigma
Distribution is stable over time (no concept drift)
When It Fails:
Non-normal distributions (skewed, multi-modal)
Data with seasonal patterns
When “normal” range changes dynamically
Implementation Example:
```python
import random

import numpy as np


class ZScoreDetector:
    def __init__(self, threshold=3.0, window_size=100):
        self.threshold = threshold
        self.window_size = window_size
        self.values = []

    def update(self, value):
        """Streaming Z-score anomaly detection"""
        self.values.append(value)
        # Keep only recent window
        if len(self.values) > self.window_size:
            self.values.pop(0)
        # Need minimum samples
        if len(self.values) < 30:
            return False, 0.0
        mean = np.mean(self.values)
        std = np.std(self.values)
        if std == 0:  # Avoid division by zero
            return False, 0.0
        z_score = abs((value - mean) / std)
        is_anomaly = z_score > self.threshold
        return is_anomaly, z_score


# Usage example: Temperature sensor monitoring
detector = ZScoreDetector(threshold=3.0)

# Seed with 30+ normal readings to meet the minimum sample requirement
random.seed(42)
for _ in range(35):
    normal_temp = 22.0 + random.uniform(-0.5, 0.5)
    detector.update(normal_temp)

# Now test with normal and anomalous readings
for temp in [22.1, 22.5, 21.8, 22.3, 22.0]:
    anomaly, score = detector.update(temp)
    print(f"Temp: {temp}C, Z-score: {score:.2f}, Anomaly: {anomaly}")

# Anomalous reading: sensor malfunction
anomaly, score = detector.update(-5.0)
print(f"Temp: -5.0C, Z-score: {score:.2f}, Anomaly: {anomaly}")
# The faulty reading is flagged: its Z-score lands far above the 3.0 threshold
# (the anomalous value itself enters the window before the statistics are
# computed, which shrinks but does not hide its Z-score)
```
12.5 Interquartile Range (IQR)
When to Choose IQR Over Z-Score
Not all unusual values indicate real-world problems. A temperature spike from 22C to -40C is almost certainly a sensor malfunction, not actual weather. IQR is the better choice when your data is skewed or has natural outliers because it uses percentiles rather than mean/standard deviation, making it immune to the distortions that extreme values cause in Z-score calculations. For IoT sensors with physical constraints (humidity 0-100%, voltage 0-5V), combine IQR with hard bounds – any reading outside physical limits is immediately flagged as a sensor fault, not an anomaly.
Core Concept: Identify outliers based on the spread of the middle 50% of data.
Formula:
IQR = Q3 - Q1
Lower Bound = Q1 - 1.5 x IQR
Upper Bound = Q3 + 1.5 x IQR
Outlier if: x < Lower Bound OR x > Upper Bound
Where:
- Q1 = 25th percentile
- Q3 = 75th percentile
Advantages over Z-Score:
No distribution assumptions (works with skewed data)
Robust to extreme outliers
Good for bounded sensor ranges
IoT Use Cases:
Battery voltage monitoring (naturally bounded 0-4.2V)
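A sliding-window IQR detector can be sketched as follows. This is an illustrative implementation: the `IQRDetector` name, window size, and minimum-sample guard are assumptions, not from the chapter; the 1.5 fence multiplier matches the formula above.

```python
import numpy as np


class IQRDetector:
    def __init__(self, window_size=100, k=1.5):
        self.window_size = window_size
        self.k = k  # fence multiplier (1.5 is the classic Tukey value)
        self.values = []

    def update(self, value):
        """Return (is_anomaly, lower_fence, upper_fence) for the new reading."""
        self.values.append(value)
        if len(self.values) > self.window_size:
            self.values.pop(0)
        if len(self.values) < 20:  # need enough samples for stable quartiles
            return False, None, None
        q1, q3 = np.percentile(self.values, [25, 75])
        iqr = q3 - q1
        lower, upper = q1 - self.k * iqr, q3 + self.k * iqr
        return (value < lower) or (value > upper), lower, upper


# Battery voltage: bounded and skewed near full charge, a poor fit for Z-score
detector = IQRDetector()
for v in [4.1, 4.0, 3.9, 4.05, 3.95] * 6:  # 30 normal readings
    detector.update(v)
anomaly, lower, upper = detector.update(2.4)  # sudden voltage collapse
print(f"Reading 2.4 V anomalous: {anomaly}, fences [{lower:.2f}, {upper:.2f}]")
```

Because the fences come from order statistics, the 2.4 V outlier does not distort the baseline the way it would distort a mean and standard deviation.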
12.6 Adaptive Thresholds (Moving Statistics)
Implementation Example:
```python
from collections import deque
import random

import numpy as np


class AdaptiveThresholdDetector:
    def __init__(self, window_size=50, n_std=3.0):
        self.window = deque(maxlen=window_size)
        self.n_std = n_std

    def update(self, value):
        """Add new value and check if it's anomalous."""
        if len(self.window) > 10:
            mean = np.mean(self.window)
            std = np.std(self.window)
            lower = mean - self.n_std * std
            upper = mean + self.n_std * std
            is_anomaly = (value < lower) or (value > upper)
        else:
            is_anomaly, lower, upper = False, None, None
        # Add to window AFTER checking (don't contaminate with anomalies)
        if not is_anomaly:
            self.window.append(value)
        return is_anomaly, lower, upper


# Example: Indoor temperature monitoring with day/night cycles
detector = AdaptiveThresholdDetector(window_size=50, n_std=2.5)

# Seed the window so bounds exist before we report a reading
for _ in range(20):
    detector.update(23.0 + random.uniform(-0.5, 0.5))

temp = 23.0 + random.uniform(-0.5, 0.5)  # Daytime reading
anomaly, lower, upper = detector.update(temp)
print(f"Temp: {temp:.1f}C, Range: [{lower:.1f}, {upper:.1f}], Anomaly: {anomaly}")
```
Window Size Trade-Offs:
| Window Size | Responsiveness | Stability | Best For |
|---|---|---|---|
| Small (10-50) | High - detects sudden changes | Low - sensitive to noise | Fast-changing environments |
| Medium (50-200) | Balanced | Balanced | General IoT monitoring |
| Large (200+) | Low - slow to adapt | High - ignores noise | Stable industrial processes |
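The responsiveness trade-off can be made concrete with a small experiment: after a sudden level shift, how many samples does a rolling mean of each size need before it reflects half of the shift? This is an illustrative sketch; the half-shift lag metric is an assumption, not from the chapter.

```python
from collections import deque


def rolling_mean_lag(window_size, step_at=100, n=200):
    """Steps after a level shift (0 -> 10) until the rolling mean
    covers half the shift, as a proxy for responsiveness."""
    window = deque(maxlen=window_size)
    for t in range(n):
        value = 0.0 if t < step_at else 10.0
        window.append(value)
        mean = sum(window) / len(window)
        if t >= step_at and mean >= 5.0:
            return t - step_at
    return None


for size in (10, 50, 200):
    print(f"window={size}: lag of {rolling_mean_lag(size)} samples")
```

Small windows react within a handful of samples; a 200-sample window needs roughly 100 samples to acknowledge the same shift, which is exactly the stability-versus-responsiveness trade-off in the table.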
Tradeoff: Fixed Thresholds vs Adaptive Detection
Option A: Fixed statistical thresholds (Z-score > 3, IQR bounds)
Option B: Adaptive thresholds with moving statistics
Decision Factors: Fixed thresholds are simpler to implement and explain (auditable for compliance), work well when normal behavior is stable, and have lower computational overhead. Adaptive methods handle concept drift, seasonal patterns, and changing sensor characteristics, but require tuning window sizes and may temporarily miss anomalies during adaptation periods. Choose fixed for stable industrial processes; choose adaptive for environments with natural variation like building HVAC or outdoor sensors.
12.7 Method Comparison
Figure 12.1: Comparison matrix showing computational characteristics and deployment recommendations for each statistical method. Z-score suits resource-constrained edge devices, moving average handles concept drift at gateways, and IQR combined with ensembles provides maximum robustness in cloud deployments.
12.7.1 Worked Example: Arla Foods Cold Chain Z-Score Monitoring
Scenario: Arla Foods, a Scandinavian dairy cooperative, monitors refrigerated transport trucks delivering milk from farms to processing plants across Denmark. Each truck has 4 temperature sensors (front, rear, left wall, right wall) reporting every 30 seconds. Arla needs to detect cooling failures before milk spoils (milk must stay between 2 C and 6 C).
Normal operating temperature: mean 3.8 C, standard deviation 0.4 C
Regulatory limit: milk above 6 C for more than 30 minutes must be discarded
Spoilage cost per truck load: EUR 12,000 (average 8,000 litres)
Current method: Manual spot checks at delivery (catches failures only after spoilage)
Step 1: Select detection method
| Method | Suitability | Reasoning |
|---|---|---|
| Z-Score | Good | Temperature is normally distributed around 3.8 C |
| IQR | Acceptable | Would also work, but Z-Score is simpler for Gaussian data |
| Adaptive | Not needed | "Normal" is stable (refrigeration setpoint does not drift) |
Step 2: Set Z-Score threshold
| Threshold | Temperature Trigger | False Positive Rate | Detection Speed |
|---|---|---|---|
| Z = 2.0 | 4.6 C (3.8 + 2 x 0.4) | 4.6% (too many alerts) | Very early |
| Z = 3.0 | 5.0 C (3.8 + 3 x 0.4) | 0.27% | Early (1.0 C below limit) |
| Z = 4.0 | 5.4 C (3.8 + 4 x 0.4) | 0.006% | Moderate (0.6 C margin) |
| Z = 5.0 | 5.8 C (3.8 + 5 x 0.4) | 0.00006% | Late (only 0.2 C margin) |
Selected: Z = 3.0 (triggers at 5.0 C, giving a 1.0 C margin and approximately 14 minutes of warning before the 6 C regulatory limit is reached at a typical failure drift of 0.07 C/minute).
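The trigger temperatures and warning margins follow directly from the scenario numbers (a quick sketch; the 0.07 C/minute failure drift is the rate quoted in the selection rationale):

```python
def trigger_temp(mean, std, z):
    """Temperature at which a Z-score threshold fires."""
    return mean + z * std


def warning_minutes(trigger, limit=6.0, drift_per_min=0.07):
    """Minutes between the alert and the regulatory limit."""
    return (limit - trigger) / drift_per_min


for z in (2.0, 3.0, 4.0, 5.0):
    t = trigger_temp(3.8, 0.4, z)
    print(f"Z = {z}: trigger at {t:.1f} C, ~{warning_minutes(t):.1f} min of warning")
```

The Z = 3.0 row fires at 5.0 C with roughly 14 minutes of margin, matching the selected threshold above.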
Step 3: Calculate business impact
| Metric | Before (Manual) | After (Z-Score) | Improvement |
|---|---|---|---|
| Detection time | At delivery (2-6 hours late) | Within 90 seconds of anomaly | 99% faster |
| Monthly spoilage events | 23 truck loads | 3 truck loads (caught early, rerouted) | 87% reduction |
| Monthly spoilage cost | EUR 276,000 | EUR 36,000 | EUR 240,000 saved |
| False alarms per truck per day | 0 | ~31 (0.27% of 11,520 readings per truck) | High, needs filtering |
| Edge compute cost | 0 | EUR 0 (runs on existing truck gateway MCU) | No hardware needed |
Result: A Z-Score threshold of 3.0, running on the existing truck gateway microcontroller with zero additional hardware cost, detects cooling failures within 90 seconds and saves Arla EUR 240,000 per month in prevented milk spoilage. The raw false alarm rate of ~31 per truck per day is addressed by requiring 3 consecutive Z > 3.0 readings (90 seconds) before escalating, reducing false alerts to fewer than 1 per truck per day across the fleet.
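The 3-consecutive-readings escalation rule can be sketched as a small debounce filter (illustrative; the `PersistenceFilter` name is an assumption):

```python
class PersistenceFilter:
    """Escalate only after `required` consecutive anomalous readings."""

    def __init__(self, required=3):
        self.required = required
        self.streak = 0

    def update(self, is_anomaly):
        # Reset the streak on any normal reading
        self.streak = self.streak + 1 if is_anomaly else 0
        return self.streak >= self.required


f = PersistenceFilter(required=3)
readings = [False, True, True, False, True, True, True, True]
alerts = [f.update(r) for r in readings]
print(alerts)  # escalates only from the third consecutive anomaly onward
```

Isolated false positives never build a streak of three, which is why this single integer of extra state cuts the raw alert rate so sharply.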
Key Insight: Statistical anomaly detection does not require cloud connectivity, ML models, or specialized hardware. A Z-Score calculation consumes fewer than 20 floating-point operations per reading and fits in 16 bytes of RAM, making it deployable on any microcontroller. The key design decision is the threshold: Z = 3.0 provides the sweet spot between early warning (1.0 C margin) and manageable false positives (0.27%) for this cold chain use case.
12.8 Interactive Demo: Anomaly Detection Methods
Interactive: Anomaly Detection Demo
Experiment with different anomaly detection methods on simulated IoT sensor data. Adjust the threshold and detection method to see how they affect true positive and false positive rates.
Concept Relationships
How These Concepts Connect:
Distribution assumptions guide method selection: Z-score requires Gaussian data (temperature, pressure), IQR works with any distribution (battery voltage, humidity)
Resource constraints favor statistical methods: Z-score uses <100 bytes RAM, runs in <1ms on ESP32, perfect for edge deployment
Concept drift requires adaptive approaches: Fixed thresholds fail when sensor characteristics change; adaptive windows track drift automatically
Statistical methods are often first line of defense: Deploy at edge for fast response, escalate to ML (fog/cloud) only when statistical methods generate too many false positives
See Also
Foundation Concepts:
Anomaly Types - Statistical methods detect point anomalies; see other types for ML approaches
Common Pitfalls
1. Assuming Gaussian data without checking
Z-score assumes a roughly bell-shaped distribution. Sensor data with heavy tails or strong skew will produce excessive false alarms. Visualise the distribution first; if non-Gaussian, use IQR or a log-transformation.
2. Using a global mean instead of a rolling window
Computing Z-score against the overall historical mean ignores drift. A gradual temperature rise will eventually flag every reading as an anomaly. Always use a sliding or exponentially weighted window.
3. Choosing a 3σ threshold without considering sampling rate
At 1 Hz, a 0.27% false alarm rate means roughly 233 false alarms per day per sensor (86,400 readings × 0.0027). Scale your threshold with sampling rate; higher rates often require stricter thresholds (4σ or 5σ) plus temporal persistence checks.
4. Ignoring sensor units and scaling
Mixing raw ADC counts (0–4095) with calibrated engineering units (°C) will dominate the score with the higher-magnitude variable. Always normalise each sensor channel independently.
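Per-channel normalisation can be sketched in one function (illustrative; the example channel values are assumptions):

```python
import numpy as np


def zscore_normalise(channel):
    """Scale one sensor channel to zero mean and unit variance."""
    arr = np.asarray(channel, dtype=float)
    std = arr.std()
    return (arr - arr.mean()) / std if std > 0 else arr - arr.mean()


# Raw ADC counts and calibrated temperatures end up on the same scale
adc_norm = zscore_normalise([1000, 1010, 990, 1005])
temp_norm = zscore_normalise([21.9, 22.1, 22.0, 22.2])
print(f"ADC std: {adc_norm.std():.1f}, temp std: {temp_norm.std():.1f}")
```

After normalisation both channels contribute comparably to any combined score, instead of the raw ADC counts dominating by sheer magnitude.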
12.9 Summary
Statistical methods provide the foundation for anomaly detection in IoT:
Z-Score: Fast, simple, works for Gaussian data. Best for edge devices with minimal memory.
IQR: Robust to outliers and skewed data. Requires more memory for sorting.
Adaptive Thresholds: Handles concept drift. Requires tuning of window size.
Key Takeaway: Start with statistical methods. They catch 80% of anomalies with minimal resources. Only escalate to ML when statistical approaches fail.
For Kids: Meet the Sensor Squad!
Sammy the Sensor was monitoring the temperature of the school aquarium. Every day, the water was a nice, steady 24 degrees Celsius – give or take half a degree.
One morning, Sammy read 24.3 degrees. “Perfectly normal!” he reported to Max the Microcontroller.
Then he read 24.1. “Still fine!” Then 23.8. “No worries!”
Then suddenly: 15.2 degrees!
“WHOA!” shouted Sammy. “That is WAY different from normal!”
Max the Microcontroller quickly did some math. “The average temperature has been 24.0 degrees, and readings usually only vary by about 0.5 degrees. Your reading of 15.2 is almost EIGHTEEN standard deviations away from normal! That is called a Z-score of 18 – anything above 3 is suspicious!”
Lila the LED turned bright red to alert the teacher. They discovered the aquarium heater had broken!
“But wait,” said Bella the Battery. “What if we had a different sensor watching battery voltage? My voltage readings are not spread out evenly – they cluster near the top when I am full and drop fast at the end. Would Z-score work for me?”
“Great question!” said Max. “For you, we use the IQR method instead. It looks at the middle chunk of your readings and does not care about the shape of the data. It is more robust – like wearing a raincoat that works in any weather!”
Key lesson: Z-score works great when data follows a bell curve (like temperature). IQR works for any shape of data (like battery voltage). Pick the right tool for your data!