1351  Production ML: Monitoring and Anomaly Detection

1351.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Monitor Production ML: Track key metrics and detect model degradation
  • Design Anomaly Detection Pipelines: Build predictive maintenance systems
  • Handle Concept Drift: Detect and respond to changing data distributions
  • Debug IoT ML Systems: Diagnose and fix common production issues

1351.2 Prerequisites

Note - Chapter Series: Modeling and Inferencing

This is part 7 (final) of the IoT Machine Learning series:

  1. ML Fundamentals - Core concepts
  2. Mobile Sensing - HAR, transportation
  3. IoT ML Pipeline - 7-step pipeline
  4. Edge ML & Deployment - TinyML
  5. Audio Feature Processing - MFCC
  6. Feature Engineering - Feature design
  7. Production ML (this chapter) - Monitoring and anomaly detection

1351.3 Monitoring IoT ML Models

Deploying ML to IoT introduces unique challenges. Unlike cloud ML with direct server access, IoT models run on distributed, often disconnected edge devices.

1351.3.1 Key Metrics to Track

| Metric Category | Critical Metrics | Alert Threshold |
|---|---|---|
| Model Performance | Inference accuracy, confidence distribution | < 80% of baseline accuracy; KL divergence > 0.3 |
| Inference Latency | P50/P99 latency, timeout rate | P50 > 100 ms / P99 > 500 ms; timeout rate > 1% |
| Resource Usage | CPU, memory, battery drain | > 80% sustained; > 5 mW average drain |
| Data Quality | Missing features, out-of-range values | > 5% of devices; > 1% of readings |
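
The latency row is the easiest to operationalize on-device: keep a rolling buffer of per-inference timings and compare the percentiles against the alert thresholds. A minimal sketch (the 100 ms / 500 ms limits come from the table; the buffer size and reporting mechanism are assumptions):

import numpy as np

# Hypothetical rolling buffer of recent inference latencies, in milliseconds.
latency_buffer_ms = []

def record_latency(latency_ms, max_samples=1000):
    """Append a measurement, keeping only the most recent window."""
    latency_buffer_ms.append(latency_ms)
    if len(latency_buffer_ms) > max_samples:
        del latency_buffer_ms[0]

def check_latency_alerts(p50_limit_ms=100.0, p99_limit_ms=500.0):
    """Return alert strings based on the thresholds in the table above."""
    if not latency_buffer_ms:
        return []
    p50 = np.percentile(latency_buffer_ms, 50)
    p99 = np.percentile(latency_buffer_ms, 99)
    alerts = []
    if p50 > p50_limit_ms:
        alerts.append(f"P50 latency {p50:.1f} ms exceeds {p50_limit_ms} ms")
    if p99 > p99_limit_ms:
        alerts.append(f"P99 latency {p99:.1f} ms exceeds {p99_limit_ms} ms")
    return alerts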

1351.3.2 Common Issues and Solutions

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1'}}}%%
flowchart TB
    Issues[Production Issues]

    Issues --> Drift[Accuracy Degraded<br/>Concept Drift]
    Issues --> Latency[High Latency<br/>Model Too Large]
    Issues --> Battery[Battery Drain<br/>Too Frequent]
    Issues --> Memory[Model Crashes<br/>Out of Memory]

    Drift --> DriftSol[Compare distributions<br/>Retrain with recent data]
    Latency --> LatencySol[Profile inference<br/>Quantize/prune model]
    Battery --> BatterySol[Check duty cycle<br/>Reduce sampling rate]
    Memory --> MemSol[Monitor heap usage<br/>Optimize batch size]

    style Issues fill:#E67E22,stroke:#2C3E50,color:#fff
    style Drift fill:#E74C3C,stroke:#2C3E50,color:#fff
    style Latency fill:#E74C3C,stroke:#2C3E50,color:#fff
    style Battery fill:#E74C3C,stroke:#2C3E50,color:#fff
    style Memory fill:#E74C3C,stroke:#2C3E50,color:#fff
    style DriftSol fill:#27AE60,stroke:#2C3E50,color:#fff
    style LatencySol fill:#27AE60,stroke:#2C3E50,color:#fff
    style BatterySol fill:#27AE60,stroke:#2C3E50,color:#fff
    style MemSol fill:#27AE60,stroke:#2C3E50,color:#fff

Figure 1351.1: IoT ML Production Issues with Diagnostic Solutions

1351.4 Case Study: Fall Detection False Positives

Problem: A fall detection system achieves 95% accuracy and 99.9% specificity in testing, yet generates thousands of false alarms per user per year in deployment.

Root Cause: Class imbalance. Falls are rare (roughly 1 per user per year), but the sensor stream arrives every 100 ms and the detector emits a decision on roughly 3.15M non-fall windows per user per year. At a 0.1% false positive rate, 3.15M Γ— 0.1% β‰ˆ 3,150 false alarms per user per year!

1351.4.1 Solution: Three-Stage Pipeline

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1'}}}%%
flowchart LR
    Stream[Sensor Stream<br/>Every 100ms]

    Stream --> Stage1[Stage 1:<br/>High Sensitivity<br/>Accel > 3.0g<br/>99% sensitivity<br/>90% specificity]

    Stage1 -->|Alert| Stage2[Stage 2:<br/>High Specificity<br/>Full ML Model<br/>95% sensitivity<br/>99.99% specificity]

    Stage2 -->|Confirmed| Stage3[Stage 3:<br/>User Confirmation<br/>10-second window]

    Stage3 -->|No response| Alert[Emergency<br/>Services]

    Stage1 -->|Normal| Stream
    Stage2 -->|False Alarm| Stream
    Stage3 -->|User OK| Stream

    style Stream fill:#7F8C8D,stroke:#2C3E50,color:#fff
    style Stage1 fill:#2C3E50,stroke:#16A085,color:#fff
    style Stage2 fill:#16A085,stroke:#2C3E50,color:#fff
    style Stage3 fill:#E67E22,stroke:#2C3E50,color:#fff
    style Alert fill:#E74C3C,stroke:#2C3E50,color:#fff

Figure 1351.2: Three-Stage Fall Detection with Progressive Filtering
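
A minimal sketch of the staged decision logic for one incoming window (the 3.0 g gate and 0.5 probability cut-off mirror the figure; the function name and the sklearn-style ml_model.predict_proba interface are assumptions):

def classify_window(accel_magnitude_g, window_features, ml_model, gate_g=3.0):
    """Progressive filtering: cheap gate first, full model second, user confirmation last."""
    # Stage 1: high-sensitivity threshold on acceleration magnitude (rejects ~90% of non-falls).
    if accel_magnitude_g < gate_g:
        return "normal"
    # Stage 2: high-specificity ML model, run only on windows that pass the gate.
    fall_probability = ml_model.predict_proba([window_features])[0][1]
    if fall_probability < 0.5:
        return "false_alarm"
    # Stage 3: prompt the user; emergency services are contacted only if there is no response.
    return "await_user_confirmation"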

Results: False positives reduced from 3,150 to 1.5 per user per year (acceptable!)

1351.5 Monitoring Tools Comparison

| Tool | Best For | IoT Support | Key Features |
|---|---|---|---|
| TensorBoard | Training metrics | Limited | Visualization, profiling |
| MLflow | Experiment tracking | Good | Model versioning |
| Prometheus + Grafana | Infrastructure monitoring | Excellent | Time-series metrics, alerting |
| Edge Impulse | TinyML | Native | Device profiling, OTA updates |
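
Prometheus + Grafana pairs well with a gateway process that exposes inference metrics over HTTP. A minimal sketch using the official prometheus_client Python package (metric names, bucket edges, and the scrape port are assumptions):

from prometheus_client import Gauge, Histogram, start_http_server

# Per-inference latency; bucket edges bracket the 100 ms / 500 ms alert thresholds.
INFERENCE_LATENCY = Histogram(
    "iot_inference_latency_seconds",
    "Model inference latency per prediction",
    buckets=(0.01, 0.05, 0.1, 0.25, 0.5, 1.0),
)
# Most recent prediction confidence, useful for spotting distribution shifts in Grafana.
PREDICTION_CONFIDENCE = Gauge(
    "iot_prediction_confidence",
    "Confidence of the most recent prediction",
)

def record_inference(latency_seconds, confidence):
    INFERENCE_LATENCY.observe(latency_seconds)
    PREDICTION_CONFIDENCE.set(confidence)

start_http_server(9100)  # expose /metrics for Prometheus to scrape (port is an assumption)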

1351.6 Production ML Checklist

Before deploying ML models to IoT devices:

  • Model Performance: Establish baseline accuracy, track per-class metrics
  • Data Quality: Track feature drift using the KS test and KL divergence (see the sketch after this list)
  • Inference Performance: Monitor latency percentiles (P50/P90/P99)
  • Fleet Management: Track model versions, implement gradual rollout
  • Operational: Automate retraining, configure drift alerts
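
The data-quality item above names the KS test and KL divergence; a minimal sketch of a per-feature drift check against the training baseline (the function name and histogram binning are assumptions; the 0.3 KL alert threshold echoes the monitoring table):

import numpy as np
from scipy import stats

def feature_drift_report(baseline, current, bins=50):
    """Compare one feature's production distribution against its training baseline."""
    ks_stat, ks_pvalue = stats.ks_2samp(baseline, current)
    # Histogram both samples on a shared grid, then compute KL(baseline || current).
    edges = np.histogram_bin_edges(np.concatenate([baseline, current]), bins=bins)
    p, _ = np.histogram(baseline, bins=edges, density=True)
    q, _ = np.histogram(current, bins=edges, density=True)
    kl = stats.entropy(p + 1e-10, q + 1e-10)
    return {"ks_statistic": ks_stat, "ks_pvalue": ks_pvalue, "kl_divergence": float(kl)}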

1351.7 Worked Example: Predictive Maintenance Pipeline

Scenario: A water utility operates 200 industrial pumps with vibration sensors (4 kHz accelerometer). Goal: Detect failures 2-4 weeks before they occur.

1351.7.1 Step 1: Feature Engineering from Vibration Data

import numpy as np
from scipy import stats

def extract_vibration_features(raw_signal, sample_rate=4000):
    """
    Extract predictive-maintenance features from one second of raw
    accelerometer data (4000 samples at 4 kHz).
    Returns a dictionary of features capturing mechanical health
    (a representative subset of the 18 features used in production).
    """
    features = {}

    # Time-domain features
    features['rms'] = np.sqrt(np.mean(raw_signal**2))
    features['peak'] = np.max(np.abs(raw_signal))
    features['crest_factor'] = features['peak'] / features['rms']
    features['kurtosis'] = stats.kurtosis(raw_signal)

    # Frequency-domain features (positive frequencies only)
    fft_vals = np.abs(np.fft.fft(raw_signal))[:len(raw_signal)//2]
    freqs = np.fft.fftfreq(len(raw_signal), 1/sample_rate)[:len(raw_signal)//2]

    # Spectral energy in fixed bands
    features['energy_0_500hz'] = np.sum(fft_vals[freqs < 500]**2)
    features['energy_500_1000hz'] = np.sum(fft_vals[(freqs >= 500) & (freqs < 1000)]**2)

    # Energy around the bearing fault frequency
    rpm = 1800  # nominal shaft speed
    bpfo = 0.4 * (rpm / 60) * 9  # approximate Ball Pass Frequency, Outer race (9 rolling elements)
    features['bpfo_energy'] = np.sum(fft_vals[(freqs >= bpfo - 5) & (freqs <= bpfo + 5)]**2)

    return features
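
A quick usage example on synthetic data (a 30 Hz sine plus noise standing in for a real 1-second accelerometer window; values are illustrative):

import numpy as np

# One second of simulated vibration: a 30 Hz shaft component plus broadband noise.
rng = np.random.default_rng(0)
t = np.arange(4000) / 4000.0
window = 0.5 * np.sin(2 * np.pi * 30 * t) + 0.05 * rng.standard_normal(4000)

feats = extract_vibration_features(window, sample_rate=4000)
print(f"RMS={feats['rms']:.3f}, kurtosis={feats['kurtosis']:.2f}, "
      f"BPFO energy={feats['bpfo_energy']:.1f}")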

Feature importance from domain knowledge:

| Feature | Physical Meaning | Fault Correlation |
|---|---|---|
| RMS | Overall vibration level | General degradation (r = 0.85) |
| Kurtosis | Impulsiveness of the signal | Bearing pitting (r = 0.92) |
| Crest factor | Peak-to-RMS ratio | Localized damage (r = 0.78) |
| BPFO energy | Bearing fault frequency band | Outer race wear (r = 0.95) |

1351.7.2 Step 2: Model Selection

With only 12 failures vs 400,000+ hours of normal data, supervised classification would overfit. Use semi-supervised anomaly detection.

| Model | Approach | Strengths | Best For |
|---|---|---|---|
| Isolation Forest | Tree-based isolation | Fast, handles high-dimensional features | General anomalies |
| One-Class SVM | Boundary around normal data | Robust to outliers | Small datasets |
| LSTM-Autoencoder | Temporal reconstruction error | Captures sequences | Time-series |
Selected: Ensemble of Isolation Forest + LSTM-Autoencoder:

  • Isolation Forest: fast (< 10 ms inference), runs on the edge gateway
  • LSTM-Autoencoder: captures temporal patterns, runs in the cloud for flagged pumps
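
A minimal sketch of training the edge-side detector, assuming the feature vectors come from the Step 1 extractor applied to known-healthy windows (the placeholder training array and the contamination setting are assumptions):

import numpy as np
from sklearn.ensemble import IsolationForest

# Placeholder for real training data: (n_windows, n_features) vectors from healthy pumps.
healthy_features = np.random.default_rng(0).normal(size=(5000, 7))

iso_forest = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
iso_forest.fit(healthy_features)

# score_samples is higher for normal points; negate so larger values mean "more anomalous".
anomaly_scores = -iso_forest.score_samples(healthy_features[:10])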

1351.7.3 Step 3: Threshold Tuning (Cost-Based)

| Event | Cost |
|---|---|
| Unplanned failure | $15,000 (repair + downtime) |
| Planned maintenance (true positive) | $3,000 (scheduled repair) |
| False alarm inspection | $500 (technician visit) |

Threshold analysis:

| Threshold | Recall | FP/year | FN/year | Annual Cost |
|---|---|---|---|---|
| 0.70 | 100% | 80 | 0 | $76,000 |
| 0.80 | 95% | 35 | 0.6 | $48,500 |
| 0.85 | 92% | 18 | 1 | $42,000 |
| 0.90 | 83% | 8 | 2 | $44,000 |

Selected: 0.85 - Optimal cost with 92% recall
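
One plausible way to reproduce the cost comparison, assuming true positives are converted into planned maintenance and roughly 12 failures per year (this simple cost model reproduces the first row; the published figures for the other rows may reflect additional assumptions):

def annual_cost(failures_per_year, recall, false_positives,
                cost_unplanned=15000, cost_planned=3000, cost_false_alarm=500):
    """Expected annual cost under a simple cost model (an assumption, not the authors' exact model)."""
    caught = failures_per_year * recall            # failures converted to planned maintenance
    missed = failures_per_year * (1 - recall)      # failures that still occur unplanned
    return caught * cost_planned + missed * cost_unplanned + false_positives * cost_false_alarm

print(annual_cost(failures_per_year=12, recall=1.00, false_positives=80))  # 76000.0, matches the 0.70 row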

1351.7.4 Step 4: Deployment Architecture

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1'}}}%%
flowchart TB
    subgraph Sensors["200 Pumps"]
        P1[Pump 1]
        P2[Pump 2]
        PN[Pump N]
    end

    subgraph Edge["Edge Gateway"]
        FE[Feature Extraction<br/>4 kHz to 18 features/sec]
        IF[Isolation Forest<br/>Stage 1 Inference]
        ALERT[Alert if<br/>score > 0.7]
    end

    subgraph Cloud["Cloud Platform"]
        LSTM[LSTM-Autoencoder<br/>Stage 2 Inference]
        TREND[Trend Analysis<br/>7-day rolling]
        DASH[Maintenance<br/>Dashboard]
    end

    P1 & P2 & PN --> FE --> IF --> ALERT
    ALERT -->|Cellular| LSTM --> TREND --> DASH

    style Sensors fill:#2C3E50,stroke:#16A085,color:#fff
    style Edge fill:#E67E22,stroke:#2C3E50,color:#fff
    style Cloud fill:#16A085,stroke:#2C3E50,color:#fff

Figure 1351.3: Predictive Maintenance Edge-to-Cloud Architecture

Edge gateway specs:

  • Hardware: Raspberry Pi 4 ($75)
  • Feature extraction: 4 kHz raw samples reduced to 1 feature vector per second (4000Γ— rate reduction)
  • Isolation Forest: 2 MB model, 5 ms inference
  • Connectivity: 4G LTE, about 1 MB/day uplink
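
Putting the edge pieces together, the per-second gateway loop might look like the following sketch (read_one_second_window, send_to_cloud, and the 0.7 score threshold from the figure are assumptions; extract_vibration_features is the Step 1 function):

import time
import numpy as np

def gateway_loop(iso_forest, read_one_second_window, send_to_cloud, score_threshold=0.7):
    """Extract features locally, score with the Isolation Forest, and uplink only flagged windows."""
    while True:
        window = read_one_second_window()             # 4000 raw samples from the accelerometer
        feats = extract_vibration_features(window)    # feature dictionary (see Step 1)
        vector = np.array([list(feats.values())])     # keep key order consistent with training
        score = -iso_forest.score_samples(vector)[0]  # larger = more anomalous
        if score > score_threshold:
            send_to_cloud({"features": feats, "anomaly_score": float(score)})
        time.sleep(1.0)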

1351.7.5 Step 5: Continuous Learning Pipeline

import numpy as np
from sklearn.ensemble import IsolationForest

def monthly_model_update():
    """
    Monthly retraining job:
      1. Collect new data
      2. Validate data quality
      3. Retrain models
      4. A/B test the new model against the current one
      5. Deploy if performance has not regressed

    The fetch_*, check_feature_drift, evaluate_model, and deploy_to_edge_gateways
    helpers, and the historical_* / current_model globals, are pipeline-specific placeholders.
    """
    # Step 1: Data collection
    new_normal_data = fetch_last_month_normal_data()
    new_failure_data = fetch_confirmed_failures()  # confirmed failures feed the holdout set in Step 4

    # Step 2: Data quality checks
    assert len(new_normal_data) > 10000, "not enough new data to retrain"
    assert check_feature_drift(new_normal_data) < 0.3, "drift too large; investigate before retraining"

    # Step 3: Retrain on historical + new normal data
    combined_normal = np.vstack([historical_normal_data, new_normal_data])
    new_iso_forest = IsolationForest(n_estimators=100)
    new_iso_forest.fit(combined_normal)

    # Step 4: A/B validation on held-out confirmed failures
    old_recall = evaluate_model(current_model, holdout_failures)
    new_recall = evaluate_model(new_iso_forest, holdout_failures)

    # Step 5: Deploy only if recall has not dropped by more than 5 points
    if new_recall >= old_recall - 0.05:
        deploy_to_edge_gateways(new_iso_forest)
Drift detection thresholds:

| Metric | Threshold | Action |
|---|---|---|
| Feature distribution shift | KL divergence > 0.3 | Investigate |
| False positive rate increase | > 50% | Retrain |
| Missed failure | Any | Retrain immediately |

1351.7.6 Results

| Metric | Value |
|---|---|
| Failure detection rate | 92% (11 of 12 failures caught) |
| Lead time | 2-4 weeks before failure |
| False positive rate | 18/year fleet-wide (< 2 per pump per year) |
| Annual cost reduction | $123,000 (68% savings) |
| ROI | 820% (development cost: $15,000) |

1351.8 Production Monitoring Pitfall

Caution - Pitfall: Deploying ML Without Production Monitoring

The Mistake: Treating deployment as the final step without continuous monitoring for model degradation, data drift, or silent failures.

Why It Happens: Once accuracy looks good in testing, teams move on. ML monitoring is less established than application monitoring.

The Fix:

  1. Prediction distribution monitoring: track model outputs over time
  2. Feature drift detection: monitor input distributions (KL divergence, PSI)
  3. Ground truth sampling: manually review 1-5% of predictions
  4. Automatic alerting: set thresholds for key metrics
  5. Model versioning: enable instant rollback

Warning sign: If you cannot answer β€œwhat was our model accuracy last week?”, you are operating blind.
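
Item 2 of the fix mentions PSI (Population Stability Index) alongside KL divergence; a minimal sketch of a per-feature PSI check (the 0.1 / 0.25 interpretation bands are a common rule of thumb, not a figure from this chapter):

import numpy as np

def population_stability_index(expected, actual, bins=10):
    """
    PSI between a baseline sample (expected) and a production sample (actual).
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant shift.
    """
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    # Clip both samples into the baseline range so out-of-range production values land in the end bins.
    e_counts, _ = np.histogram(np.clip(expected, edges[0], edges[-1]), bins=edges)
    a_counts, _ = np.histogram(np.clip(actual, edges[0], edges[-1]), bins=edges)
    e_pct = np.clip(e_counts / e_counts.sum(), 1e-6, None)
    a_pct = np.clip(a_counts / a_counts.sum(), 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))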

1351.9 Knowledge Check

Question 1: A fall detection system achieves 95% accuracy but has a 5% false positive rate. In a population of 1,000 users, each with 1 fall per year and roughly 10 candidate motion events per day, approximately how many false alarms occur annually?

Explanation: The false positive rate applies to NON-FALL events. Each user produces about 3,650 non-fall events per year (10/day Γ— 365), so 1,000 users produce 3.65M non-fall events. False alarms = 3.65M Γ— 5% = 182,500 per year. Solution: increase specificity to 99.9%, which cuts false alarms to about 3,650 in total.

1351.10 Summary

This chapter covered production ML for IoT:

  • Monitoring: Track accuracy, latency, resource usage, and data quality
  • Anomaly Detection: Use semi-supervised learning for rare-event detection
  • Cost-Based Thresholds: Optimize for business cost, not just accuracy
  • Hybrid Architecture: Edge for real-time, cloud for complex analysis
  • Continuous Learning: Monthly retraining with drift detection

Key Insight: Production ML requires as much engineering as model development. Monitoring, drift detection, and automated retraining are essential for long-term success.

1351.11 What’s Next

You have completed the IoT Machine Learning series. Explore related topics: