%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1'}}}%%
flowchart TB
Issues[Production Issues]
Issues --> Drift[Accuracy Degraded<br/>Concept Drift]
Issues --> Latency[High Latency<br/>Model Too Large]
Issues --> Battery[Battery Drain<br/>Too Frequent]
Issues --> Memory[Model Crashes<br/>Out of Memory]
Drift --> DriftSol[Compare distributions<br/>Retrain with recent data]
Latency --> LatencySol[Profile inference<br/>Quantize/prune model]
Battery --> BatterySol[Check duty cycle<br/>Reduce sampling rate]
Memory --> MemSol[Monitor heap usage<br/>Optimize batch size]
style Issues fill:#E67E22,stroke:#2C3E50,color:#fff
style Drift fill:#E74C3C,stroke:#2C3E50,color:#fff
style Latency fill:#E74C3C,stroke:#2C3E50,color:#fff
style Battery fill:#E74C3C,stroke:#2C3E50,color:#fff
style Memory fill:#E74C3C,stroke:#2C3E50,color:#fff
style DriftSol fill:#27AE60,stroke:#2C3E50,color:#fff
style LatencySol fill:#27AE60,stroke:#2C3E50,color:#fff
style BatterySol fill:#27AE60,stroke:#2C3E50,color:#fff
style MemSol fill:#27AE60,stroke:#2C3E50,color:#fff
1351 Production ML: Monitoring and Anomaly Detection
1351.1 Learning Objectives
By the end of this chapter, you will be able to:
- Monitor Production ML: Track key metrics and detect model degradation
- Design Anomaly Detection Pipelines: Build predictive maintenance systems
- Handle Concept Drift: Detect and respond to changing data distributions
- Debug IoT ML Systems: Diagnose and fix common production issues
1351.2 Prerequisites
- IoT ML Pipeline: 7-step ML pipeline
- Edge ML & Deployment: Deployment strategies
- Feature Engineering: Feature design
This is part 7 (final) of the IoT Machine Learning series:
- ML Fundamentals - Core concepts
- Mobile Sensing - HAR, transportation
- IoT ML Pipeline - 7-step pipeline
- Edge ML & Deployment - TinyML
- Audio Feature Processing - MFCC
- Feature Engineering - Feature design
- Production ML (this chapter) - Monitoring and anomaly detection
1351.3 Monitoring IoT ML Models
Deploying ML to IoT introduces unique challenges. Unlike cloud ML with direct server access, IoT models run on distributed, often disconnected edge devices.
1351.3.1 Key Metrics to Track
| Metric Category | Critical Metrics | Alert Threshold |
|---|---|---|
| Model Performance | Inference accuracy, confidence distribution | < 80% baseline, KL divergence > 0.3 |
| Inference Latency | P50/P99 latency, timeout rate | > 100ms / > 500ms, > 1% |
| Resource Usage | CPU, memory, battery drain | > 80% sustained, > 5mW avg |
| Data Quality | Missing features, out-of-range values | > 5% devices, > 1% readings |
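The KL divergence threshold in the table can be checked with a few lines of code. The following is a minimal sketch, assuming confidence scores in [0, 1] are binned into a fixed histogram and that a small smoothing constant keeps empty bins from producing log(0). The helper names, bin count, smoothing value, and sample data are illustrative assumptions, not part of the original pipeline.

```python
import math

def histogram(values, bins=10, eps=1e-6):
    """Bin values in [0, 1] into a normalized histogram with additive smoothing."""
    counts = [0] * bins
    for v in values:
        idx = min(int(v * bins), bins - 1)  # clamp v == 1.0 into the last bin
        counts[idx] += 1
    total = len(values) + eps * bins
    return [(c + eps) / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Illustrative data: baseline confidences cluster high, current ones spread out.
baseline = [0.90, 0.92, 0.95, 0.88, 0.91, 0.97, 0.93, 0.90, 0.94, 0.96]
current = [0.55, 0.60, 0.90, 0.45, 0.70, 0.65, 0.95, 0.50, 0.58, 0.62]

drift = kl_divergence(histogram(current), histogram(baseline))
alert = drift > 0.3  # threshold taken from the table above
print(f"KL divergence: {drift:.2f}, alert: {alert}")
```

Because the result depends strongly on the bin count and smoothing constant, the 0.3 threshold is only meaningful if both are held fixed between the baseline and the live measurement.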
1351.3.2 Common Issues and Solutions
1351.4 Case Study: Fall Detection False Positives
Problem: Fall detection achieves 95% accuracy and 99.9% specificity, yet still generates roughly 3,150 false alarms per user per year.
Root Cause: Class imbalance. Falls are rare (1 per year), but the system checks every 100ms. With 3.15M non-fall checks per year × 0.1% FP rate = 3,150 false alarms!
1351.4.1 Solution: Three-Stage Pipeline
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1'}}}%%
flowchart LR
Stream[Sensor Stream<br/>Every 100ms]
Stream --> Stage1[Stage 1:<br/>High Sensitivity<br/>Accel > 3.0g<br/>99% sensitivity<br/>90% specificity]
Stage1 -->|Alert| Stage2[Stage 2:<br/>High Specificity<br/>Full ML Model<br/>95% sensitivity<br/>99.99% specificity]
Stage2 -->|Confirmed| Stage3[Stage 3:<br/>User Confirmation<br/>10-second window]
Stage3 -->|No response| Alert[Emergency<br/>Services]
Stage1 -->|Normal| Stream
Stage2 -->|False Alarm| Stream
Stage3 -->|User OK| Stream
style Stream fill:#7F8C8D,stroke:#2C3E50,color:#fff
style Stage1 fill:#2C3E50,stroke:#16A085,color:#fff
style Stage2 fill:#16A085,stroke:#2C3E50,color:#fff
style Stage3 fill:#E67E22,stroke:#2C3E50,color:#fff
style Alert fill:#E74C3C,stroke:#2C3E50,color:#fff
Results: False positives reduced from 3,150 to 1.5 per user per year (acceptable!)
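The effect of cascading the stages can be checked with simple arithmetic. The sketch below uses the figures quoted above (3.15M non-fall checks per user per year and the per-stage specificities). The `dismiss_rate` for Stage 3 is not given in the text; it is an assumption back-solved from the reported 1.5 alarms per year, and the stages are assumed to make independent errors.

```python
NON_FALL_CHECKS_PER_YEAR = 3_150_000  # per user, as quoted in the case study

def false_alarms(checks, specificities):
    """Expected false alarms after non-fall checks pass through each stage.

    Assumes the stages make independent errors, which is optimistic in practice.
    """
    remaining = checks
    for spec in specificities:
        remaining *= (1.0 - spec)  # fraction of non-falls wrongly passed on
    return remaining

single_stage = false_alarms(NON_FALL_CHECKS_PER_YEAR, [0.999])
after_stage2 = false_alarms(NON_FALL_CHECKS_PER_YEAR, [0.90, 0.9999])

# Assumption: the user dismisses about 95% of false prompts within the
# 10-second window. This value is inferred, not stated in the source.
dismiss_rate = 0.95
escalated = after_stage2 * (1.0 - dismiss_rate)

print(f"Single-stage false alarms/year: {single_stage:.0f}")
print(f"Prompts reaching the user/year: {after_stage2:.1f}")
print(f"Escalated to emergency services/year: {escalated:.2f}")
```

The trade-off is sensitivity: a fall must pass both automated stages, so the combined sensitivity is about 0.99 × 0.95 ≈ 94%.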
1351.5 Monitoring Tools Comparison
| Tool | Best For | IoT Support | Key Features |
|---|---|---|---|
| TensorBoard | Training metrics | Limited | Visualization, profiling |
| MLflow | Experiment tracking | Good | Model versioning |
| Prometheus + Grafana | Infrastructure | Excellent | Time-series, alerting |
| Edge Impulse | TinyML | Native | Device profiling, OTA |
1351.6 Production ML Checklist
Before deploying ML models to IoT devices:
- Model Performance: Establish baseline accuracy, track per-class metrics
- Data Quality: Track feature drift using KS test, KL divergence
- Inference Performance: Monitor latency percentiles (P50/P90/P99)
- Fleet Management: Track model versions, implement gradual rollout
- Operational: Automate retraining, configure drift alerts
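The KS test named in the Data Quality item compares the empirical distribution of a feature in production against its training distribution. As a minimal sketch, the two-sample KS statistic can be computed by hand as the largest gap between the two empirical CDFs. The function name and the synthetic Gaussian data are illustrative assumptions; in practice `scipy.stats.ks_2samp` returns this statistic together with a p-value.

```python
import bisect
import random

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    # The gap between two step functions can only change at observed values.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap

random.seed(0)
training = [random.gauss(0.0, 1.0) for _ in range(500)]   # reference window
same_dist = [random.gauss(0.0, 1.0) for _ in range(500)]  # no drift
shifted = [random.gauss(1.0, 1.0) for _ in range(500)]    # mean shifted by 1 sigma

print(f"No drift: D = {ks_statistic(training, same_dist):.3f}")
print(f"Shifted:  D = {ks_statistic(training, shifted):.3f}")
```

A one-sigma shift in the mean produces a much larger statistic than two samples drawn from the same distribution, which is the signal a drift alert would key on.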
1351.7 Worked Example: Predictive Maintenance Pipeline
Scenario: A water utility operates 200 industrial pumps with vibration sensors (4 kHz accelerometer). Goal: Detect failures 2-4 weeks before they occur.
1351.7.1 Step 1: Feature Engineering from Vibration Data
```python
import numpy as np
from scipy import stats
from scipy.fft import fft, fftfreq

def extract_vibration_features(raw_signal, sample_rate=4000):
    """
    Extract predictive maintenance features from vibration.
    Input: 1 second of raw accelerometer data (4000 samples)
    Output: dict of features capturing mechanical health
            (a subset of the 18 features used by the pipeline)
    """
    raw_signal = np.asarray(raw_signal, dtype=float)
    features = {}

    # Time-domain features
    features['rms'] = np.sqrt(np.mean(raw_signal**2))
    features['peak'] = np.max(np.abs(raw_signal))
    # Guard against division by zero on an all-zero (dead sensor) window
    features['crest_factor'] = features['peak'] / features['rms'] if features['rms'] > 0 else 0.0
    features['kurtosis'] = stats.kurtosis(raw_signal)  # excess kurtosis (normal = 0)

    # Frequency-domain features (keep the positive-frequency half of the spectrum)
    n = len(raw_signal)
    fft_vals = np.abs(fft(raw_signal))[:n // 2]
    freqs = fftfreq(n, 1 / sample_rate)[:n // 2]

    # Spectral energy in bands
    features['energy_0_500hz'] = np.sum(fft_vals[freqs < 500]**2)
    features['energy_500_1000hz'] = np.sum(fft_vals[(freqs >= 500) & (freqs < 1000)]**2)

    # Bearing fault frequencies (rule-of-thumb estimate at 1800 RPM)
    rpm = 1800
    bpfo = 0.4 * (rpm / 60) * 9  # Ball Pass Frequency Outer, about 108 Hz
    features['bpfo_energy'] = np.sum(fft_vals[(freqs >= bpfo - 5) & (freqs <= bpfo + 5)]**2)

    return features
```

Feature importance from domain knowledge:
| Feature | Physical Meaning | Fault Correlation |
|---|---|---|
| RMS | Overall vibration | General degradation (r=0.85) |
| Kurtosis | Impulsiveness | Bearing pitting (r=0.92) |
| Crest factor | Peak-to-RMS | Localized damage (r=0.78) |
| BPFO energy | Bearing fault frequency | Outer race wear (r=0.95) |
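To see why kurtosis and crest factor respond to localized damage, it helps to compare a smooth signal with the same signal plus sharp impulses. The sketch below is an illustration under stated assumptions: the 30 Hz sine stands in for healthy shaft vibration, and the periodic impulses are a hypothetical fault signature, not measured pump data. The helpers use only the standard library and follow the excess-kurtosis convention that `scipy.stats.kurtosis` uses by default.

```python
import math

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def crest_factor(x):
    """Peak-to-RMS ratio."""
    return max(abs(v) for v in x) / rms(x)

def excess_kurtosis(x):
    """Fisher (excess) kurtosis: fourth standardized moment minus 3."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 ** 2) - 3.0

SAMPLE_RATE = 4000  # Hz, matching the scenario
healthy = [math.sin(2 * math.pi * 30 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]

# Hypothetical fault signature: a sharp impulse every 40 samples (100 Hz).
faulty = [v + (5.0 if i % 40 == 0 else 0.0) for i, v in enumerate(healthy)]

for name, sig in [("healthy", healthy), ("faulty", faulty)]:
    print(f"{name}: crest factor = {crest_factor(sig):.2f}, "
          f"excess kurtosis = {excess_kurtosis(sig):.2f}")
```

A pure sine has a crest factor near 1.41 and an excess kurtosis of -1.5; the impulsive signal scores far higher on both, while its RMS changes comparatively little.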
1351.7.2 Step 2: Model Selection
With only 12 failures vs 400,000+ hours of normal data, supervised classification would overfit. Use semi-supervised anomaly detection.
| Model | Approach | Strengths | Best For |
|---|---|---|---|
| Isolation Forest | Tree-based isolation | Fast, handles high-dim | General anomalies |
| One-Class SVM | Boundary around normal | Robust to outliers | Small datasets |
| LSTM-Autoencoder | Temporal reconstruction | Captures sequences | Time-series |
Selected: Ensemble of Isolation Forest + LSTM-Autoencoder
- Isolation Forest: Fast (<10ms), runs on edge gateway
- LSTM-AE: Captures temporal patterns, runs in cloud for flagged pumps
1351.7.3 Step 3: Threshold Tuning (Cost-Based)
| Event | Cost |
|---|---|
| Unplanned failure | $15,000 (repair + downtime) |
| Planned maintenance (true positive) | $3,000 (scheduled repair) |
| False alarm inspection | $500 (technician visit) |
Threshold analysis:
| Threshold | Recall | FP/year | FN/year | Annual Cost |
|---|---|---|---|---|
| 0.70 | 100% | 80 | 0 | $76,000 |
| 0.80 | 95% | 35 | 0.6 | $60,700 |
| 0.85 | 92% | 18 | 1 | $57,000 |
| 0.90 | 83% | 8 | 2 | $64,000 |
Selected: 0.85 - Optimal cost with 92% recall
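The annual cost for each threshold can be recomputed from the three event costs. This is a minimal sketch, assuming the 12 failures mentioned in Step 2 occur within one year and that every failure is either caught (planned maintenance) or missed (unplanned failure); the variable and function names are illustrative.

```python
COST_MISSED_FAILURE = 15_000  # unplanned failure
COST_PLANNED_REPAIR = 3_000   # true positive, scheduled repair
COST_FALSE_ALARM = 500        # technician visit

FAILURES_PER_YEAR = 12  # assumption: the 12 failures from Step 2 span one year

# (threshold, false positives per year, missed failures per year)
candidates = [
    (0.70, 80, 0.0),
    (0.80, 35, 0.6),
    (0.85, 18, 1.0),
    (0.90, 8, 2.0),
]

def annual_cost(false_positives, missed, failures=FAILURES_PER_YEAR):
    """Expected yearly cost of caught failures, missed failures, and false alarms."""
    caught = failures - missed
    return (caught * COST_PLANNED_REPAIR
            + missed * COST_MISSED_FAILURE
            + false_positives * COST_FALSE_ALARM)

costs = {thr: annual_cost(fp, fn) for thr, fp, fn in candidates}
best_threshold = min(costs, key=costs.get)

for thr, cost in costs.items():
    print(f"threshold {thr:.2f}: ${cost:,.0f}")
print(f"Lowest-cost threshold: {best_threshold}")
```

Under these assumptions the lowest-cost threshold is 0.85. Compared with letting all 12 failures occur unplanned (12 × $15,000 = $180,000), the cost at 0.85 is consistent with the $123,000 annual saving reported in the Results section.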
1351.7.4 Step 4: Deployment Architecture
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1'}}}%%
flowchart TB
subgraph Sensors["200 Pumps"]
P1[Pump 1]
P2[Pump 2]
PN[Pump N]
end
subgraph Edge["Edge Gateway"]
FE[Feature Extraction<br/>4 kHz to 18 features/sec]
IF[Isolation Forest<br/>Stage 1 Inference]
ALERT[Alert if<br/>score > 0.7]
end
subgraph Cloud["Cloud Platform"]
LSTM[LSTM-Autoencoder<br/>Stage 2 Inference]
TREND[Trend Analysis<br/>7-day rolling]
DASH[Maintenance<br/>Dashboard]
end
P1 & P2 & PN --> FE --> IF --> ALERT
ALERT -->|Cellular| LSTM --> TREND --> DASH
style Sensors fill:#2C3E50,stroke:#16A085,color:#fff
style Edge fill:#E67E22,stroke:#2C3E50,color:#fff
style Cloud fill:#16A085,stroke:#2C3E50,color:#fff
Edge gateway specs:
- Hardware: Raspberry Pi 4 ($75)
- Feature extraction: 4 kHz → 1 Hz (4000× reduction)
- Isolation Forest: 2 MB, 5ms inference
- Connectivity: 4G LTE, 1 MB/day
1351.7.5 Step 5: Continuous Learning Pipeline
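```python
import numpy as np
from sklearn.ensemble import IsolationForest

# fetch_last_month_normal_data(), fetch_confirmed_failures(),
# check_feature_drift(), evaluate_model(), and deploy_to_edge_gateways()
# are project-specific helpers assumed to be defined elsewhere.

def monthly_model_update(historical_normal_data, current_model, holdout_failures):
    """
    1. Collect new data
    2. Validate data quality
    3. Retrain models
    4. A/B test new model
    5. Deploy if recall has not regressed
    """
    # Step 1: Data collection
    new_normal_data = fetch_last_month_normal_data()
    # Not used for fitting: the Isolation Forest is trained on normal data only
    new_failure_data = fetch_confirmed_failures()

    # Step 2: Data quality checks (explicit errors survive `python -O`, unlike assert)
    if len(new_normal_data) <= 10000:
        raise ValueError("Too few normal samples to retrain")
    if check_feature_drift(new_normal_data) >= 0.3:
        raise ValueError("Feature drift is 0.3 or higher; investigate before retraining")

    # Step 3: Retrain on historical plus new normal data
    combined_normal = np.vstack([historical_normal_data, new_normal_data])
    new_iso_forest = IsolationForest(n_estimators=100)
    new_iso_forest.fit(combined_normal)

    # Step 4: A/B validation on held-out failures
    old_recall = evaluate_model(current_model, holdout_failures)
    new_recall = evaluate_model(new_iso_forest, holdout_failures)

    # Step 5: Deploy unless recall drops by more than 5 percentage points
    if new_recall >= old_recall - 0.05:
        deploy_to_edge_gateways(new_iso_forest)
```

Drift detection thresholds: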
| Metric | Threshold | Action |
|---|---|---|
| Feature distribution shift | KL > 0.3 | Investigate |
| False positive rate increase | > 50% | Retrain |
| Missed failure | Any | Immediately retrain |
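The threshold table can be turned into a small rule function that a monitoring job evaluates on each reporting cycle. This is a sketch under stated assumptions: the function name, its parameters, and the action labels are hypothetical, and the "> 50%" rule is interpreted as a relative increase over a baseline false positive rate, which the table does not specify.

```python
def drift_actions(kl_divergence, baseline_fp_rate, current_fp_rate, missed_failures):
    """Map monitoring metrics to the actions in the drift threshold table."""
    actions = []

    # Feature distribution shift: KL > 0.3 -> investigate
    if kl_divergence > 0.3:
        actions.append("investigate")

    # False positive rate increase: more than 50% above baseline -> retrain
    # (assumption: the increase is relative to the baseline rate)
    if baseline_fp_rate > 0:
        relative_increase = (current_fp_rate - baseline_fp_rate) / baseline_fp_rate
        if relative_increase > 0.5:
            actions.append("retrain")

    # Any missed failure -> retrain immediately
    if missed_failures > 0:
        actions.append("retrain_immediately")

    return actions

# Example: 18 false positives/year at baseline, 30 this year, one missed failure.
print(drift_actions(kl_divergence=0.4, baseline_fp_rate=18,
                    current_fp_rate=30, missed_failures=1))
```

Returning a list rather than a single action lets the caller decide how to prioritize when several thresholds trip at once.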
1351.7.6 Results
| Metric | Value |
|---|---|
| Failure detection rate | 92% (11/12 failures caught) |
| Lead time | 2-4 weeks before failure |
| False positive rate | 18/year (< 2/pump/year) |
| Annual cost reduction | $123,000 (68% savings) |
| ROI | 820% (cost: $15,000 development) |
1351.8 Production Monitoring Pitfall
The Mistake: Treating deployment as the final step without continuous monitoring for model degradation, data drift, or silent failures.
Why It Happens: Once accuracy looks good in testing, teams move on. ML monitoring is less established than application monitoring.
The Fix:
1. Prediction distribution monitoring: Track model outputs over time
2. Feature drift detection: Monitor input distributions (KL-divergence, PSI)
3. Ground truth sampling: Review 1-5% of predictions manually
4. Automatic alerting: Set thresholds for key metrics
5. Model versioning: Enable instant rollback
Warning sign: If you cannot answer "what was our model accuracy last week?", you are operating blind.
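The Population Stability Index (PSI) mentioned in the fix list compares the share of values falling into each bin at training time with the share seen in production. The following is a minimal sketch, assuming the feature has already been binned and each list of fractions sums to 1; the function name, the `eps` guard, and the example fractions are illustrative assumptions.

```python
import math

def psi(expected_fractions, actual_fractions, eps=1e-6):
    """Population Stability Index over pre-binned fractions that each sum to 1."""
    total = 0.0
    for e, a in zip(expected_fractions, actual_fractions):
        e, a = max(e, eps), max(a, eps)  # guard against empty bins
        total += (a - e) * math.log(a / e)
    return total

# Illustrative data: training data was uniform over four bins,
# production data has shifted toward the upper bins.
training_bins = [0.25, 0.25, 0.25, 0.25]
production_bins = [0.10, 0.20, 0.30, 0.40]

score = psi(training_bins, production_bins)
print(f"PSI = {score:.3f}")
```

Unlike KL divergence, PSI is symmetric in its two arguments. A widely used rule of thumb (not taken from this chapter) treats values below 0.1 as stable and values above 0.25 as a significant shift, so alert thresholds should be validated against your own data.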
1351.9 Knowledge Check
Question 1: A fall detection system achieves 95% accuracy but has a 5% false positive rate. In a population of 1000 users, each generating about 10 non-fall events per day and 1 fall per year, approximately how many false alarms occur annually?
Explanation: False positive rate applies to NON-FALL events. Events per user: 3650 non-falls (10/day × 365). Total: 1000 users × 3650 = 3.65M non-fall events. False alarms = 3.65M × 5% = 182,500/year. Solution: Increase specificity to 99.9% for only 3,650 false alarms total.
1351.10 Summary
This chapter covered production ML for IoT:
- Monitoring: Track accuracy, latency, resource usage, and data quality
- Anomaly Detection: Use semi-supervised learning for rare-event detection
- Cost-Based Thresholds: Optimize for business cost, not just accuracy
- Hybrid Architecture: Edge for real-time, cloud for complex analysis
- Continuous Learning: Monthly retraining with drift detection
Key Insight: Production ML requires as much engineering as model development. Monitoring, drift detection, and automated retraining are essential for long-term success.
1351.11 What's Next
You have completed the IoT Machine Learning series. Explore related topics:
- Security and Privacy: Protecting ML models and data
- Multi-Sensor Data Fusion: Combining multiple sensor streams
- Stream Processing: Real-time data processing architectures