1354  Anomaly Detection Performance Metrics and Lab

1354.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Evaluate Detection Performance: Use precision, recall, F1, and confusion matrices for imbalanced data
  • Tune Thresholds: Optimize detection thresholds based on business costs
  • Reduce False Alarms: Apply temporal persistence and multi-sensor correlation
  • Build a Detection System: Implement multi-method anomaly detection on embedded hardware

Tip: Minimum Viable Understanding: Anomaly Detection Metrics

Core Concept: Standard accuracy is meaningless for anomaly detection. When only 0.1% of samples are anomalies, a detector that always says “normal” achieves 99.9% accuracy but catches zero anomalies. Use precision, recall, and F1 instead.

Why It Matters: The cost of a missed anomaly (false negative) is often 10-100x the cost of a false alarm (false positive). Threshold tuning is a business decision, not a statistical one.

Key Takeaway: Set thresholds based on the cost ratio of false negatives to false positives. For safety-critical systems, optimize for recall (>99%). For consumer systems, optimize for precision (>90%).

1354.2 Prerequisites

Before diving into this chapter, you should be familiar with:

~10 min | Intermediate | P10.C01.U06

1354.3 Introduction

How do you know if your anomaly detector is working well? Standard accuracy is misleading for imbalanced data (99.9% normal, 0.1% anomalies).

1354.4 The Fundamental Trade-Off

Sensitivity vs Specificity:

  • High Sensitivity (Recall): Catch all anomalies, but many false alarms
  • High Specificity: Few false alarms, but miss some anomalies

Real-world costs:

  • False Positive: Operator investigates and finds nothing, which wastes time and causes alarm fatigue
  • False Negative: A critical failure is missed, which risks equipment damage and safety

The balance depends on domain:

| Domain | Priority | Target Metrics | Rationale |
|---|---|---|---|
| Industrial Safety | Recall | >99% recall | Cannot miss critical failures |
| Consumer IoT | Precision | >80% precision | Users ignore frequent false alarms |
| Predictive Maintenance | Balanced | F1 > 0.85 | Balance early detection vs maintenance costs |

1354.5 Key Metrics Explained

Confusion Matrix:

                  Predicted
                Normal  Anomaly
Actual Normal     TN      FP    (False Positive = False Alarm)
      Anomaly     FN      TP    (False Negative = Missed Anomaly)

Derived Metrics:

  1. Precision (Positive Predictive Value)

    Precision = TP / (TP + FP)
    "Of alerts raised, how many were real anomalies?"
    • High precision means few false alarms
    • Critical for systems with alert fatigue risk
  2. Recall (Sensitivity, True Positive Rate)

    Recall = TP / (TP + FN)
    "Of real anomalies, how many did we catch?"
    • High recall means few critical events are missed
    • Critical for safety systems
  3. F1 Score (Harmonic Mean)

    F1 = 2 x (Precision x Recall) / (Precision + Recall)
    • Balanced metric when both precision and recall matter
    • Single number for model comparison
  4. False Positive Rate

    FPR = FP / (FP + TN)
    "Of normal samples, how many did we incorrectly flag?"
    • Critical for operational burden
    • Target: <0.1% for industrial (fewer than 1 false alarm per 1,000 normal samples)

Worked Example:

# Motor monitoring system over 1 week
# 1,000,000 sensor readings, 100 real anomalies

TP = 95   # Detected 95 real anomalies
FN = 5    # Missed 5 real anomalies
FP = 200  # 200 false alarms
TN = 999700  # Correctly identified 999,700 normal samples

precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * (precision * recall) / (precision + recall)
fpr = FP / (FP + TN)

print(f"Precision: {precision:.3f} (95/295 alerts were real)")
print(f"Recall: {recall:.3f} (caught 95/100 anomalies)")
print(f"F1 Score: {f1:.3f}")
print(f"False Positive Rate: {fpr:.5f} (0.02%)")

# Output:
# Precision: 0.322 (95/295 alerts were real) <- LOW, only 32% of alerts are real
# Recall: 0.950 (caught 95/100 anomalies) <- HIGH, good detection
# F1 Score: 0.481 <- Mediocre balance
# False Positive Rate: 0.00020 (0.02%) <- EXCELLENT, few false alarms per sample

# Interpretation: The system catches most anomalies, but the operator receives
# 200 false alarms per week (about 29/day) -> likely alarm fatigue
# Solution: Increase the detection threshold to improve precision
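The worked example starts from counts that are already known. In practice the counts usually have to be derived from labelled readings first. The following is a minimal sketch of that step, assuming binary labels (0 = normal, 1 = anomaly); the helper names `confusion_counts` and `detection_metrics` and the toy data are illustrative, not part of the lab code.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (0 = normal, 1 = anomaly)."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def detection_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1 and FPR; undefined ratios become 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


# Toy data (made up for illustration): 10 readings, 2 real anomalies
y_true = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
metrics = detection_metrics(*counts)
print("TP, FP, FN, TN =", counts)
print(metrics)
```

Returning 0.0 for undefined ratios is a convention chosen here so that an "always normal" detector (no alerts at all) gets a precision and F1 of zero instead of raising a division error.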

1354.6 Real-World Performance Targets

Industry Benchmarks:

| Application | Precision Target | Recall Target | False Alarm Tolerance |
|---|---|---|---|
| Manufacturing Safety | >70% | >99% | <10 false alarms/day |
| Predictive Maintenance | >80% | >95% | <5 false alarms/week |
| Energy Management | >85% | >90% | <2 false alarms/week |
| Smart Home | >90% | >80% | <1 false alarm/month |
| Network Security | >60% | >99.9% | <100 false alarms/day |
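The tolerances above are stated in alarms per day or week, while the false positive rate is measured per sample, so the two need to be converted before they can be compared. A rough sketch of that arithmetic follows. It reuses the numbers from the motor monitoring worked example and treats every sample as normal, which is a close approximation when anomalies are rare. The function names and the 1 sample/second rate are assumptions made for illustration.

```python
def false_alarms_per_day(fpr, samples_per_day):
    """Expected false alarms per day for a given per-sample FPR."""
    return fpr * samples_per_day


def max_fpr_for_budget(alarms_per_day, samples_per_day):
    """Largest per-sample FPR that stays within a daily false alarm budget."""
    return alarms_per_day / samples_per_day


# Worked example: 1,000,000 readings per week, FP = 200, TN = 999,700
samples_per_day = 1_000_000 / 7
fpr = 200 / (200 + 999_700)
print(f"{false_alarms_per_day(fpr, samples_per_day):.1f} false alarms/day")

# Hypothetical sensor sampled once per second, budget of 10 false alarms/day
budget_fpr = max_fpr_for_budget(10, 86_400)
print(f"Required FPR: {budget_fpr:.6f}")
```

The second calculation shows why a per-sample FPR that looks excellent can still exceed an operational budget: the faster the sampling rate, the lower the FPR has to be for the same number of alarms per day.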

1354.7 Threshold Tuning

Tuning Strategy:

import numpy as np
from sklearn.metrics import precision_recall_curve

def find_optimal_threshold(y_true, y_scores, target_recall=0.95):
    """
    Find the threshold that meets a target recall while maximizing precision.

    y_true: actual labels (0=normal, 1=anomaly)
    y_scores: anomaly scores from model (higher = more anomalous)
    target_recall: minimum recall required
    """
    precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

    # precision and recall have one more element than thresholds (the final
    # point is precision=1, recall=0), so drop it to keep the arrays aligned
    precision, recall = precision[:-1], recall[:-1]

    # Find thresholds that meet the recall target
    valid = recall >= target_recall

    if not valid.any():
        print(f"Cannot achieve {target_recall:.1%} recall")
        return None

    # Among valid thresholds, pick the one with the best precision
    best_idx = np.argmax(precision[valid])
    best_threshold = thresholds[valid][best_idx]
    best_precision = precision[valid][best_idx]
    best_recall = recall[valid][best_idx]

    print(f"Threshold: {best_threshold:.3f}")
    print(f"Achieves: {best_recall:.1%} recall, {best_precision:.1%} precision")

    return best_threshold
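The function above fixes a recall target and then maximizes precision. A different way to act on the idea that thresholds should follow the cost ratio of false negatives to false positives is to minimize total cost directly. The sketch below does this with a plain exhaustive search over the observed scores. The function name `cost_optimal_threshold`, the cost values, and the toy scores are all hypothetical, and a score at or above the threshold is assumed to raise an alert.

```python
import numpy as np


def cost_optimal_threshold(y_true, y_scores, cost_fn, cost_fp):
    """Return (threshold, total_cost) minimizing cost_fn*FN + cost_fp*FP."""
    y_true = np.asarray(y_true)
    y_scores = np.asarray(y_scores, dtype=float)

    # Candidates: every observed score, plus one value above the maximum
    # (which corresponds to never raising an alert)
    candidates = np.append(np.unique(y_scores), y_scores.max() + 1.0)

    best_threshold, best_cost = None, float("inf")
    for threshold in candidates:
        alerts = y_scores >= threshold
        fn = np.sum((y_true == 1) & ~alerts)   # missed anomalies
        fp = np.sum((y_true == 0) & alerts)    # false alarms
        cost = cost_fn * fn + cost_fp * fp
        if cost < best_cost:
            best_threshold, best_cost = float(threshold), float(cost)
    return best_threshold, best_cost


# Toy data (made up): one normal reading scores higher than one anomaly
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]

# Missed anomalies are expensive: the threshold drops to catch all of them
safety = cost_optimal_threshold(y_true, y_scores, cost_fn=50, cost_fp=1)
# False alarms are expensive: the threshold rises and one anomaly is missed
consumer = cost_optimal_threshold(y_true, y_scores, cost_fn=1, cost_fp=5)
print("Safety-critical:", safety)
print("Consumer:", consumer)
```

With the same scores, changing only the cost ratio moves the selected threshold, which is the sense in which threshold tuning is a business decision.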

The Misconception: 99% accuracy means the detector works well.

Why It’s Wrong:

  • Anomalies are rare (typically 0.1% of data)
  • A detector that always says “normal” achieves 99.9% accuracy!
  • Accuracy ignores the cost of different errors
  • False positives and false negatives have different impacts

Real-World Example:

  • Factory sensor: 1 million readings/day, 100 actual anomalies
  • Detector A: “Always normal” -> 99.99% accuracy, 0 anomalies detected
  • Detector B: 90% recall -> detects 90 anomalies, 1000 false alarms
  • Detector B is useful despite lower “accuracy”

The Correct Understanding:

| Metric | Definition | What It Measures |
|---|---|---|
| Accuracy | (TP+TN)/(All) | Misleading for rare events |
| Precision | TP/(TP+FP) | How many alerts are real |
| Recall | TP/(TP+FN) | How many anomalies caught |
| F1 Score | Harmonic mean | Balance of precision/recall |

Use precision, recall, and F1 for anomaly detection. Accuracy is almost meaningless.
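The factory sensor comparison can be checked numerically. This short sketch plugs the Detector A and Detector B counts from the example into the accuracy, recall, and precision formulas; the helper names and the dictionary layout are illustrative choices.

```python
def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fp, fn, tn):
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp, fn, tn):
    return tp / (tp + fp) if (tp + fp) else 0.0


total, anomalies = 1_000_000, 100
normal = total - anomalies

# Detector A: always predicts "normal", so it never raises an alert
detector_a = dict(tp=0, fp=0, fn=anomalies, tn=normal)
# Detector B: catches 90 of 100 anomalies and raises 1000 false alarms
detector_b = dict(tp=90, fp=1000, fn=10, tn=normal - 1000)

for name, counts in (("A", detector_a), ("B", detector_b)):
    print(f"Detector {name}: "
          f"accuracy={accuracy(**counts):.4%}, "
          f"recall={recall(**counts):.0%}, "
          f"precision={precision(**counts):.1%}")
```

Detector A scores higher on accuracy while detecting nothing, and the two accuracy values differ by less than a tenth of a percentage point, so accuracy cannot distinguish a useless detector from a useful one on this data.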

Tip: Tradeoff Decision Guide: Statistical vs ML Anomaly Detection

| Factor | Statistical (Z-score/IQR) | ML (Isolation Forest/Autoencoder) | When to Choose |
|---|---|---|---|
| Compute Requirements | Minimal (<1 KB RAM) | Significant (MB-GB RAM) | Statistical for edge devices; ML for cloud/gateway |
| Training Data Needed | None (online calculation) | 1000+ normal samples | Statistical for cold-start; ML with historical data |
| Interpretability | High (clear thresholds) | Low (black box scores) | Statistical for regulated/auditable systems |
| Multivariate Patterns | Poor (single variable) | Excellent (cross-sensor) | ML for complex correlations; statistical for single sensors |
| Concept Drift Handling | Manual threshold updates | Automatic with retraining | Statistical with domain expertise; ML for autonomous operation |
| False Positive Rate | Higher (simple rules) | Lower (learned patterns) | ML when false alarm cost is high |
| Setup Time | Minutes | Days to weeks | Statistical for rapid deployment; ML for mature systems |

Quick Decision Rule: Start with Z-score/IQR for immediate value with minimal setup; graduate to ML methods only when you have sufficient training data AND the false positive reduction justifies the computational and maintenance overhead.


1354.8 Lab: Build an Anomaly Detection System

~45 min | Intermediate | P10.C01.LAB01

1354.8.1 Learning Objectives

By completing this hands-on lab, you will be able to:

  • Implement Z-score based anomaly detection on embedded hardware
  • Build a moving average baseline for adaptive thresholds
  • Apply IQR (Interquartile Range) method for robust outlier detection
  • Design threshold-based alerts with hysteresis to reduce false positives
  • Compare different anomaly detection methods and understand their tradeoffs
  • Visualize anomaly detection decisions in real-time

Note: What You’ll Build

A complete anomaly detection system on ESP32 that demonstrates four detection methods running simultaneously: Z-score (statistical), Moving Average deviation, IQR-based outliers, and threshold with hysteresis. You’ll see how each method responds differently to the same sensor data, helping you understand when to use each approach in production IoT systems.

1354.8.2 Anomaly Detection Methods Demonstrated

This lab implements several key anomaly detection patterns:

| Method | How It Works | Strengths | Weaknesses |
|---|---|---|---|
| Z-Score | Measures standard deviations from mean | Mathematically rigorous, well-understood | Assumes normal distribution, sensitive to outliers in baseline |
| Moving Average | Compares to rolling baseline | Adapts to slow changes, simple | Lag in detection, window size tuning needed |
| IQR (Interquartile Range) | Uses quartiles for robust bounds | Resistant to outliers, no distribution assumption | Requires sorted data buffer, higher memory |
| Threshold + Hysteresis | Fixed bounds with entry/exit gap | Prevents oscillation, deterministic | Requires domain knowledge for thresholds |
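The table's claim that Z-score is sensitive to outliers in the baseline, while IQR resists them, can be illustrated with a small desktop experiment. The sketch below is a Python illustration, not the lab firmware. The baseline values are made up, the 2.5 and 1.5 multipliers match the lab constants, and the quartiles use the same simple index rule as the lab code (element n/4 and 3n/4 of the sorted window).

```python
import statistics


def zscore_bounds(window, k=2.5):
    """Bounds at mean +/- k standard deviations (population std)."""
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)
    return mean - k * std, mean + k * std


def iqr_bounds(window, multiplier=1.5):
    """Fences at Q1 - m*IQR and Q3 + m*IQR using index-based quartiles."""
    ordered = sorted(window)
    n = len(ordered)
    q1 = ordered[n // 4]
    q3 = ordered[(3 * n) // 4]
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


# Made-up baseline: 19 readings near 25 C plus one earlier spike of 60 C
baseline = [24.8, 25.1, 25.0, 24.9, 25.2, 25.0, 24.7, 25.3, 25.1, 24.9,
            25.0, 25.2, 24.8, 25.1, 25.0, 24.9, 25.1, 25.0, 24.9, 60.0]
reading = 31.0  # new reading, clearly unusual for this sensor

z_low, z_high = zscore_bounds(baseline)
i_low, i_high = iqr_bounds(baseline)

print(f"Z-score bounds: {z_low:.1f} to {z_high:.1f}")
print(f"IQR fences:     {i_low:.1f} to {i_high:.1f}")
print("Z-score flags reading:", not (z_low <= reading <= z_high))
print("IQR flags reading:    ", not (i_low <= reading <= i_high))
```

The single spike inflates the standard deviation enough that the Z-score bounds swallow the 31 C reading, while the quartiles barely move and the IQR fences still flag it.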

1354.8.3 Wokwi Simulator

Use the embedded simulator below to build your anomaly detection system:

1354.8.4 Circuit Setup

Connect the temperature sensor and indicator LEDs to the ESP32:

| Component | ESP32 Pin | Purpose |
|---|---|---|
| Temperature Sensor (NTC) | GPIO 34 | Primary data source for anomaly detection |
| Potentiometer | GPIO 35 | Simulate temperature variations/anomalies |
| Red LED | GPIO 18 | Z-score anomaly indicator |
| Orange LED | GPIO 19 | Moving average anomaly indicator |
| Yellow LED | GPIO 21 | IQR anomaly indicator |
| Green LED | GPIO 22 | Normal operation (no method reports an anomaly) |
| Blue LED | GPIO 23 | Threshold + hysteresis anomaly indicator |

Add this diagram.json configuration in Wokwi:

{
  "version": 1,
  "author": "IoT Class - Anomaly Detection Lab",
  "editor": "wokwi",
  "parts": [
    { "type": "wokwi-esp32-devkit-v1", "id": "esp", "top": 0, "left": 0 },
    { "type": "wokwi-ntc-temperature-sensor", "id": "temp1", "top": -120, "left": 120 },
    { "type": "wokwi-potentiometer", "id": "pot1", "top": -120, "left": 260, "attrs": { "value": "50" } },
    { "type": "wokwi-led", "id": "led_red", "top": 180, "left": 80, "attrs": { "color": "red" } },
    { "type": "wokwi-led", "id": "led_orange", "top": 180, "left": 130, "attrs": { "color": "orange" } },
    { "type": "wokwi-led", "id": "led_yellow", "top": 180, "left": 180, "attrs": { "color": "yellow" } },
    { "type": "wokwi-led", "id": "led_green", "top": 180, "left": 230, "attrs": { "color": "green" } },
    { "type": "wokwi-led", "id": "led_blue", "top": 180, "left": 280, "attrs": { "color": "blue" } },
    { "type": "wokwi-resistor", "id": "r1", "top": 240, "left": 80, "attrs": { "value": "220" } },
    { "type": "wokwi-resistor", "id": "r2", "top": 240, "left": 130, "attrs": { "value": "220" } },
    { "type": "wokwi-resistor", "id": "r3", "top": 240, "left": 180, "attrs": { "value": "220" } },
    { "type": "wokwi-resistor", "id": "r4", "top": 240, "left": 230, "attrs": { "value": "220" } },
    { "type": "wokwi-resistor", "id": "r5", "top": 240, "left": 280, "attrs": { "value": "220" } }
  ],
  "connections": [
    ["esp:GND.1", "temp1:GND", "black", ["h0"]],
    ["esp:3V3", "temp1:VCC", "red", ["h0"]],
    ["esp:34", "temp1:OUT", "green", ["h0"]],
    ["esp:GND.1", "pot1:GND", "black", ["h0"]],
    ["esp:3V3", "pot1:VCC", "red", ["h0"]],
    ["esp:35", "pot1:SIG", "purple", ["h0"]],
    ["esp:18", "led_red:A", "red", ["h0"]],
    ["led_red:C", "r1:1", "black", ["h0"]],
    ["r1:2", "esp:GND.2", "black", ["h0"]],
    ["esp:19", "led_orange:A", "orange", ["h0"]],
    ["led_orange:C", "r2:1", "black", ["h0"]],
    ["r2:2", "esp:GND.2", "black", ["h0"]],
    ["esp:21", "led_yellow:A", "yellow", ["h0"]],
    ["led_yellow:C", "r3:1", "black", ["h0"]],
    ["r3:2", "esp:GND.2", "black", ["h0"]],
    ["esp:22", "led_green:A", "green", ["h0"]],
    ["led_green:C", "r4:1", "black", ["h0"]],
    ["r4:2", "esp:GND.2", "black", ["h0"]],
    ["esp:23", "led_blue:A", "blue", ["h0"]],
    ["led_blue:C", "r5:1", "black", ["h0"]],
    ["r5:2", "esp:GND.2", "black", ["h0"]]
  ]
}

1354.8.5 Step-by-Step Instructions

1354.8.5.1 Step 1: Set Up the Simulator

  1. Open the Wokwi simulator embedded above (or visit wokwi.com)
  2. Create a new ESP32 project
  3. Click the diagram.json tab and paste the circuit configuration
  4. Copy the Arduino code from the collapsible section below

1354.8.5.2 Step 2: Run and Observe Normal Operation

  1. Click the Play button to start the simulation
  2. Open the Serial Monitor to see detection output
  3. Keep the potentiometer at center position (normal temperature range)
  4. Observe: All four methods should show “Normal” with the green LED lit
  5. Watch the buffer fill as the system collects baseline data

1354.8.5.3 Step 3: Trigger Anomalies with the Potentiometer

  1. Slowly rotate the potentiometer to the right (increase temperature)
  2. Watch which method detects the anomaly first:
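    • Hysteresis: Triggers as soon as the temperature crosses the 45°C threshold
    • Z-score: Triggers when the reading is more than 2.5 standard deviations from the window mean for 3 consecutive samples
    • Moving Average: Triggers when the reading deviates more than 15% from the window average for 3 consecutive samples
    • IQR: Triggers when the reading falls outside Q1 - 1.5 x IQR or Q3 + 1.5 x IQR for 3 consecutive samples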
  3. Note: Different LEDs light up as each method triggers

1354.8.5.4 Step 4: Observe Hysteresis Behavior

  1. Push temperature above 45°C (potentiometer far right)
  2. Blue LED turns ON (entered anomaly state)
  3. Slowly decrease temperature by rotating potentiometer left
  4. Notice: Blue LED stays ON until temperature drops below 38°C
  5. This gap (45°C to 38°C) is the hysteresis band, which prevents oscillation

Copy this code into the Wokwi editor:

// Anomaly Detection Lab: Multi-Method Comparison System
// Demonstrates: Z-Score, Moving Average, IQR, Hysteresis

const int TEMP_PIN = 34;
const int POT_PIN = 35;
const int LED_ZSCORE = 18;
const int LED_MAVG = 19;
const int LED_IQR = 21;
const int LED_NORMAL = 22;
const int LED_HYSTERESIS = 23;

const int SAMPLE_INTERVAL_MS = 200;
const int WINDOW_SIZE = 50;
const float ZSCORE_THRESHOLD = 2.5;
const float MAVG_DEVIATION_PCT = 15.0;
const float IQR_MULTIPLIER = 1.5;
const float HYSTERESIS_HIGH = 45.0;
const float HYSTERESIS_LOW = 38.0;
const int CONSECUTIVE_REQUIRED = 3;

float dataBuffer[WINDOW_SIZE];
float sortedBuffer[WINDOW_SIZE];
int bufferIndex = 0;
int bufferCount = 0;

float runningSum = 0;
float runningSumSq = 0;
float movingAverage = 0;
bool inHysteresisAnomaly = false;

int zscoreConsecutive = 0;
int mavgConsecutive = 0;
int iqrConsecutive = 0;

unsigned long totalSamples = 0;
unsigned long lastSampleTime = 0;

void setup() {
  Serial.begin(115200);
  delay(1000);

  pinMode(TEMP_PIN, INPUT);
  pinMode(POT_PIN, INPUT);
  pinMode(LED_ZSCORE, OUTPUT);
  pinMode(LED_MAVG, OUTPUT);
  pinMode(LED_IQR, OUTPUT);
  pinMode(LED_NORMAL, OUTPUT);
  pinMode(LED_HYSTERESIS, OUTPUT);

  Serial.println("Anomaly Detection Lab Started");
  Serial.println("Adjust potentiometer to simulate anomalies");
}

float readTemperature() {
  int ntcRaw = analogRead(TEMP_PIN);
  int potRaw = analogRead(POT_PIN);
  float baseTemp = map(ntcRaw, 0, 4095, 2000, 4000) / 100.0;
  float offset = map(potRaw, 0, 4095, -2000, 3000) / 100.0;
  return baseTemp + offset;
}

void addToBuffer(float value) {
  if (bufferCount == WINDOW_SIZE) {
    float oldValue = dataBuffer[bufferIndex];
    runningSum -= oldValue;
    runningSumSq -= oldValue * oldValue;
  }

  dataBuffer[bufferIndex] = value;
  runningSum += value;
  runningSumSq += value * value;

  bufferIndex = (bufferIndex + 1) % WINDOW_SIZE;
  if (bufferCount < WINDOW_SIZE) bufferCount++;
}

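// Z-score of `value` against the window mean and standard deviation.
// Note: `value` has already been added to the buffer, so it slightly
// influences its own baseline.
float calculateZScore(float value) {
  if (bufferCount < 10) return 0;  // not enough history yet
  float mean = runningSum / bufferCount;
  // Population variance via E[x^2] - (E[x])^2
  float variance = (runningSumSq / bufferCount) - (mean * mean);
  if (variance <= 0) return 0;  // flat signal or rounding error
  // fabsf/sqrtf keep the math in float; abs() can resolve to the
  // integer version on some Arduino cores and truncate the result
  return fabsf((value - mean) / sqrtf(variance));
}

// Percentage deviation of `value` from the window average.
float calculateMADeviation(float value) {
  if (bufferCount == 0) return 0;
  float avg = runningSum / bufferCount;
  if (avg == 0) return 0;  // avoid division by zero
  return fabsf((value - avg) / avg) * 100.0;
}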

void sortBuffer() {
  for (int i = 0; i < bufferCount; i++) {
    sortedBuffer[i] = dataBuffer[i];
  }
  for (int i = 0; i < bufferCount - 1; i++) {
    for (int j = 0; j < bufferCount - i - 1; j++) {
      if (sortedBuffer[j] > sortedBuffer[j + 1]) {
        float temp = sortedBuffer[j];
        sortedBuffer[j] = sortedBuffer[j + 1];
        sortedBuffer[j + 1] = temp;
      }
    }
  }
}

bool isIQRAnomaly(float value) {
  if (bufferCount < 20) return false;
  sortBuffer();
  float q1 = sortedBuffer[bufferCount / 4];
  float q3 = sortedBuffer[(3 * bufferCount) / 4];
  float iqr = q3 - q1;
  float lowerFence = q1 - (IQR_MULTIPLIER * iqr);
  float upperFence = q3 + (IQR_MULTIPLIER * iqr);
  return (value < lowerFence || value > upperFence);
}

bool checkHysteresis(float value) {
  if (!inHysteresisAnomaly) {
    if (value > HYSTERESIS_HIGH) inHysteresisAnomaly = true;
  } else {
    if (value < HYSTERESIS_LOW) inHysteresisAnomaly = false;
  }
  return inHysteresisAnomaly;
}

bool updateConsecutive(bool detected, int* counter) {
  if (detected) {
    (*counter)++;
    return (*counter) >= CONSECUTIVE_REQUIRED;
  } else {
    *counter = 0;
    return false;
  }
}

void loop() {
  unsigned long now = millis();

  if (now - lastSampleTime >= SAMPLE_INTERVAL_MS) {
    lastSampleTime = now;
    totalSamples++;

    float temp = readTemperature();
    addToBuffer(temp);

    float zscore = calculateZScore(temp);
    float maDeviation = calculateMADeviation(temp);

    bool zscoreRaw = (zscore > ZSCORE_THRESHOLD) && (bufferCount >= 10);
    bool mavgRaw = (maDeviation > MAVG_DEVIATION_PCT) && (bufferCount >= 5);
    bool iqrRaw = isIQRAnomaly(temp);
    bool hystAnomaly = checkHysteresis(temp);

    bool zscoreAnomaly = updateConsecutive(zscoreRaw, &zscoreConsecutive);
    bool mavgAnomaly = updateConsecutive(mavgRaw, &mavgConsecutive);
    bool iqrAnomaly = updateConsecutive(iqrRaw, &iqrConsecutive);

    digitalWrite(LED_ZSCORE, zscoreAnomaly ? HIGH : LOW);
    digitalWrite(LED_MAVG, mavgAnomaly ? HIGH : LOW);
    digitalWrite(LED_IQR, iqrAnomaly ? HIGH : LOW);
    digitalWrite(LED_HYSTERESIS, hystAnomaly ? HIGH : LOW);

    bool anyAnomaly = zscoreAnomaly || mavgAnomaly || iqrAnomaly || hystAnomaly;
    digitalWrite(LED_NORMAL, anyAnomaly ? LOW : HIGH);

    Serial.print("T:");
    Serial.print(temp, 1);
    Serial.print(" Z:");
    Serial.print(zscore, 2);
    Serial.print(" MA%:");
    Serial.print(maDeviation, 1);
    Serial.print(" | ");
    if (zscoreAnomaly) Serial.print("Z-SCORE ");
    if (mavgAnomaly) Serial.print("MA ");
    if (iqrAnomaly) Serial.print("IQR ");
    if (hystAnomaly) Serial.print("HYST ");
    if (!anyAnomaly) Serial.print("NORMAL");
    Serial.println();
  }
}
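The `updateConsecutive` helper in the sketch above is the lab's form of temporal persistence: a raw detection only becomes an alert after `CONSECUTIVE_REQUIRED` detections in a row. The same logic is easy to experiment with on a desktop. The following Python sketch mirrors it; the function name `persistence_filter` and the sample flag sequence are made up for illustration.

```python
def persistence_filter(raw_flags, required=3):
    """Confirm an anomaly only after `required` consecutive raw detections."""
    confirmed = []
    streak = 0
    for flag in raw_flags:
        # A detection extends the streak; a normal sample resets it to zero
        streak = streak + 1 if flag else 0
        confirmed.append(streak >= required)
    return confirmed


# Made-up raw detections: one isolated spike, then a sustained excursion
raw = [False, True, False, True, True, True, True, False, True]
confirmed = persistence_filter(raw)

for r, c in zip(raw, confirmed):
    print(f"raw={r!s:5}  confirmed={c}")
```

The isolated spike never reaches a streak of three and is suppressed, while the sustained excursion is confirmed on its third sample. The price is detection delay: at the lab's 200 ms sample interval, confirmation arrives two samples (400 ms) after the first raw detection.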

1354.8.6 Key Concepts Explained

Threshold + Hysteresis

How It Works:

  • Define two thresholds: HIGH (enter anomaly state) and LOW (exit anomaly state)
  • Once in the anomaly state, the value must drop below LOW to exit
  • The gap between the thresholds prevents oscillation

In This Lab:

  • HIGH = 45°C (enter anomaly)
  • LOW = 38°C (exit anomaly)
  • 7°C hysteresis band

Strengths:

  • Deterministic and predictable
  • Prevents “bouncing” alerts at the threshold boundary
  • Simple state machine implementation

When to Use: Safety-critical systems with known limits, regulatory compliance
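To see why the gap between the two thresholds matters, it helps to compare the state machine against a single fixed threshold on a signal that hovers near the limit. The Python sketch below mirrors the lab's 45/38 settings. The function names and the temperature sequence are made up for illustration.

```python
def hysteresis_states(readings, high=45.0, low=38.0):
    """Enter the anomaly state above `high`; leave it only below `low`."""
    in_anomaly = False
    states = []
    for value in readings:
        if not in_anomaly and value > high:
            in_anomaly = True
        elif in_anomaly and value < low:
            in_anomaly = False
        states.append(in_anomaly)
    return states


def single_threshold_states(readings, threshold=45.0):
    """Naive alternative: anomaly whenever the value exceeds one threshold."""
    return [value > threshold for value in readings]


def count_transitions(states):
    """Number of times the alert state flips between samples."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


# Made-up temperatures that hover around 45 C before cooling down
readings = [40.0, 44.0, 46.0, 44.5, 45.5, 44.0, 46.0, 41.0, 37.5, 36.0]

with_hysteresis = hysteresis_states(readings)
without_hysteresis = single_threshold_states(readings)

print("Hysteresis flips:      ", count_transitions(with_hysteresis))
print("Single threshold flips:", count_transitions(without_hysteresis))
```

On this sequence the single threshold flips on nearly every sample while the reading hovers near 45, whereas the hysteresis version switches on once and off once, after the reading falls below 38.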


1354.9 Summary

Performance metrics and threshold tuning are critical for production anomaly detection:

  • Metrics: Use precision, recall, F1 - never accuracy for imbalanced data
  • Threshold Tuning: Based on cost ratio of false negatives to false positives
  • False Alarm Reduction: Temporal persistence and multi-sensor correlation
  • Lab: Multiple methods have different trade-offs - use the right tool for the job

Key Takeaway: Anomaly detection is as much about operational tuning as algorithm selection. The best algorithm with poor thresholds performs worse than a simple method with well-calibrated thresholds.

1354.10 What’s Next

Return to the Anomaly Detection Overview for a complete summary, or explore related topics: