Building ML for IoT follows a systematic 7-step pipeline: data collection, cleaning, feature engineering, train/test split, model selection, evaluation, and deployment. The most critical pitfalls are using random instead of chronological splits for time-series data (causing 10-20% inflated accuracy), ignoring class imbalance, and training on clean lab data that fails in noisy real-world environments.
5.1 Learning Objectives
By the end of this chapter, you will be able to:
Design ML Pipelines: Implement a systematic 7-step ML pipeline for IoT applications
Avoid Common Pitfalls: Diagnose and address data leakage, overfitting, and class imbalance
Select Appropriate Models: Choose ML algorithms based on accuracy, latency, and deployment constraints
Evaluate IoT ML Systems: Use appropriate metrics for imbalanced IoT datasets
Key Concepts
ML pipeline: The end-to-end sequence of steps transforming raw sensor data into deployed model predictions: data collection → preprocessing → feature engineering → model training → validation → deployment → monitoring.
Training-serving skew: A discrepancy between the feature distribution seen during model training and the distribution encountered in production, causing model performance to degrade after deployment.
Model registry: A versioned repository for trained ML models, storing model artifacts, performance metrics, and metadata, enabling reproducible deployments and rollbacks.
Shadow deployment: Running a new model in parallel with the current production system, comparing predictions without affecting live decisions, to validate real-world performance before cutover.
Drift detection: Monitoring the statistical properties of production input features and model output distributions over time to detect when the data distribution has shifted enough to require model retraining.
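As an illustration of drift detection, one common statistic is the Population Stability Index (PSI) between the training-time and production feature distributions; the sketch below uses a 0.2 retraining trigger, which is a widely used heuristic rather than a fixed rule.

```python
import numpy as np

def population_stability_index(train_vals, prod_vals, bins=10):
    """PSI between a training-time and a production feature distribution.
    Common heuristic: PSI > 0.2 suggests enough drift to retrain."""
    edges = np.histogram_bin_edges(train_vals, bins=bins)
    train_pct = np.histogram(train_vals, bins=edges)[0] / len(train_vals)
    prod_pct = np.histogram(prod_vals, bins=edges)[0] / len(prod_vals)
    # Floor empty bins to avoid log(0)
    train_pct = np.clip(train_pct, 1e-6, None)
    prod_pct = np.clip(prod_pct, 1e-6, None)
    return float(np.sum((prod_pct - train_pct) * np.log(prod_pct / train_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 5000)   # feature values seen during training
drifted = rng.normal(0.8, 1.0, 5000)    # production values after sensor drift
print(population_stability_index(baseline, drifted))  # well above the 0.2 trigger
```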
For Beginners: IoT Machine Learning Pipeline
Data analytics and machine learning for IoT is about extracting useful insights from the massive streams of sensor data. Think of it like panning for gold – raw sensor readings are the river gravel, and analytics tools help you find the valuable nuggets of information hidden within. Machine learning takes this further by automatically learning patterns and making predictions.
5.2 Prerequisites
ML Fundamentals: Understanding training vs inference and feature extraction
Figure 5.1: Seven-Step IoT ML Pipeline with Continuous Feedback Loop
5.3.1 How It Works: The ML Pipeline Process
The IoT ML pipeline provides a systematic framework for transforming raw sensor data into deployed predictive models:
Step 1: Data Collection - Gather diverse sensor readings from the target environment (e.g., 2 weeks of vibration data from 100 pumps)
Step 2: Data Cleaning - Remove outliers, handle missing values, filter sensor glitches using domain knowledge (e.g., physically impossible readings)
Step 3: Feature Engineering - Transform raw time-series into statistical features (mean, variance, FFT peaks) that capture patterns
Step 4: Train/Test Split - CRITICAL for time-series: use a chronological split (not random) to avoid data leakage—train on past, test on future
Step 5: Model Selection - Choose an algorithm based on constraints (Decision Tree for MCU, Random Forest for ESP32, Neural Network for cloud)
Step 6: Evaluation & Tuning - Optimize hyperparameters using cross-validation; measure with appropriate metrics (F1-score for imbalanced data)
Step 7: Deployment & Monitoring - Deploy to edge/cloud; continuously monitor for accuracy degradation and feature drift
The pipeline is cyclical—monitoring (Step 7) feeds back to data collection (Step 1) when model performance degrades, triggering retraining with recent data. This continuous loop is essential because IoT environments change over time (sensor aging, seasonal patterns, equipment updates).
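As a sketch, the seven steps can be wired together as stage callbacks; every function name here is a hypothetical placeholder you would replace with your own implementations.

```python
def run_pipeline(collect, clean, extract_features, split, train, evaluate, deploy,
                 min_f1=0.80):
    """Skeleton of the 7-step pipeline; each argument is one stage's callable."""
    raw = collect()                      # Step 1: data collection
    data = clean(raw)                    # Step 2: cleaning
    X, y = extract_features(data)        # Step 3: feature engineering
    splits = split(X, y)                 # Step 4: chronological split
    model = train(splits)                # Step 5: model selection/training
    metrics = evaluate(model, splits)    # Step 6: evaluation
    if metrics["f1"] >= min_f1:          # Step 7: deploy only if good enough;
        deploy(model)                    # monitoring then feeds back to Step 1
    return metrics
```

Keeping each stage behind a callable makes the feedback loop explicit: when monitoring flags drift, you re-run the same pipeline on fresh data.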
5.4 Step 1: Data Collection
Goal: Gather representative sensor data that captures the full range of conditions your model will encounter.
| Data Source | Sampling Rate | Duration | Labels |
|---|---|---|---|
| Accelerometer | 50-100 Hz | 1-2 weeks | Activity type |
| Temperature | 0.01-1 Hz | 30+ days | Normal/Anomaly |
| Audio | 16 kHz | 10+ hours | Keyword/Not |
Best Practices:
Collect from diverse users, devices, and environments
Include edge cases (unusual activities, sensor noise)
Gap filling: Interpolate short gaps (< 10 samples)
Timestamp alignment: Synchronize multi-sensor data
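The gap-filling and alignment practices above can be sketched with pandas; the 50 Hz grid, 2-sample interpolation limit, and 10 ms alignment tolerance below are illustrative values, not prescriptions.

```python
import numpy as np
import pandas as pd

# Hypothetical accelerometer trace at 50 Hz with a short dropout
idx = pd.date_range("2024-01-01", periods=10, freq="20ms")
s = pd.Series([0.1, 0.2, np.nan, np.nan, 0.5, 0.6, 0.7, np.nan, 0.9, 1.0], index=idx)

# Interpolate only short gaps (here: up to 2 consecutive missing samples)
filled = s.interpolate(method="linear", limit=2)

# Align a second sensor logged at irregular times onto the same 20 ms grid
other = pd.Series([1.0, 2.0],
                  index=pd.to_datetime(["2024-01-01 00:00:00.003",
                                        "2024-01-01 00:00:00.047"]))
aligned = other.reindex(idx, method="nearest", tolerance=pd.Timedelta("10ms"))
```

Longer gaps stay NaN so they can be dropped explicitly, matching the "interpolate short gaps only" guidance.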
5.5 Step 2: Data Cleaning
Try It: Data Cleaning Impact Simulator
Adjust the parameters below to see how different cleaning operations affect your sensor dataset. Observe how aggressive vs. conservative settings trade data loss for quality.
5.6 Step 3: Feature Engineering
Goal: Transform raw sensor data into discriminative features that capture patterns.
Figure 5.2: Feature Engineering Pipeline from Raw Data to Normalized Feature Vector
Feature Categories:
| Category | Features | Purpose |
|---|---|---|
| Statistical | Mean, Std, Min, Max, IQR | Central tendency, spread |
| Signal Shape | Zero crossings, Peak count | Periodicity indicators |
| Frequency | FFT peaks, Spectral energy | Periodic patterns |
| Domain-Specific | Step frequency, Bearing frequencies | Application knowledge |
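A minimal sketch of extracting the statistical and frequency features above from one window, using only numpy; the window length and 50 Hz sample rate are illustrative.

```python
import numpy as np

def window_features(x, fs=50.0):
    """One feature vector from a 1-D sensor window (fs = sample rate in Hz)."""
    x = np.asarray(x, dtype=float)
    feats = {
        "mean": float(x.mean()),
        "std": float(x.std()),
        "iqr": float(np.percentile(x, 75) - np.percentile(x, 25)),
        "zero_crossings": int(np.sum(np.diff(np.signbit(x).astype(int)) != 0)),
    }
    # Dominant frequency from the FFT magnitude (skip the DC bin)
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    feats["dominant_freq_hz"] = float(freqs[1:][np.argmax(spectrum[1:])])
    return feats

t = np.arange(0, 2, 1 / 50.0)        # 2-second window at 50 Hz
walk = np.sin(2 * np.pi * 2.0 * t)   # ~2 Hz gait-like oscillation
```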
Try It: Feature Engineering Explorer
Select a window size and see how statistical features are computed from raw sensor data. Notice how different window sizes capture different levels of detail.
5.7 Step 4: Train/Test Split
Critical for Time-Series: Use chronological splits, NOT random splits.
Data Leakage Warning
Wrong: Random 80/20 split (future data leaks into training)
Right: Chronological split (train on past, test on future)
```python
# WRONG - Data leakage!
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

# RIGHT - Chronological split
train_end = int(len(X) * 0.7)
val_end = int(len(X) * 0.85)
X_train = X[:train_end]        # Days 1-21
X_val = X[train_end:val_end]   # Days 22-25
X_test = X[val_end:]           # Days 26-30
```
Split Strategy:
| Split | Percentage | Purpose |
|---|---|---|
| Training | 70% | Learn patterns |
| Validation | 15% | Tune hyperparameters |
| Test | 15% | Final evaluation |
5.8 Step 5: Model Selection
Choose based on constraints:
| Model | Accuracy | Model Size | Inference | Best For |
|---|---|---|---|---|
| Decision Tree | 80-85% | 5-50 KB | < 1ms | Interpretable, MCU |
| Random Forest | 88-93% | 200-500 KB | 5-20ms | Tabular data, ESP32 |
| SVM | 85-90% | 10-100 KB | 1-5ms | High-dimensional |
| Neural Network | 92-98% | 1-10 MB | 20-100ms | Complex patterns |
| Quantized NN | 90-95% | 50-200 KB | 5-30ms | Edge AI |
Figure 5.3: Model Selection Decision Tree Based on Device Constraints
Try It: Model Selection Constraint Checker
Enter your device constraints below to see which ML models are feasible for your IoT deployment. The tool highlights compatible models and flags those that exceed your limits.
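Outside the interactive tool, the same constraint check is a simple filter; the footprint numbers below are the upper bounds taken from the model comparison table above.

```python
# Representative worst-case footprints from the model comparison table
MODELS = {
    "Decision Tree":  {"size_kb": 50,    "latency_ms": 1},
    "Random Forest":  {"size_kb": 500,   "latency_ms": 20},
    "SVM":            {"size_kb": 100,   "latency_ms": 5},
    "Neural Network": {"size_kb": 10240, "latency_ms": 100},
    "Quantized NN":   {"size_kb": 200,   "latency_ms": 30},
}

def feasible_models(max_size_kb, max_latency_ms):
    """Return model families whose worst-case footprint fits the device budget."""
    return [name for name, m in MODELS.items()
            if m["size_kb"] <= max_size_kb and m["latency_ms"] <= max_latency_ms]

print(feasible_models(max_size_kb=150, max_latency_ms=50))
# e.g. a tight ESP32 budget leaves only the small models
```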
Pitfall 1: Training on Clean Lab Data, Deploying to Noisy Real World
The Mistake: Developing ML models using carefully curated datasets collected under controlled laboratory conditions, then expecting the same performance when deployed to production environments.
Why It Happens: Lab datasets are convenient, well-labeled, and produce impressive accuracy numbers. Real-world data collection is expensive and messy.
The Fix:
Data augmentation: Add synthetic noise matching expected sensor characteristics
Domain randomization: Train on data from multiple devices and environments
Staged deployment: Deploy to 5% of devices first, monitor for accuracy degradation
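The data-augmentation fix can be sketched as follows; the noise and drift magnitudes are placeholders you would replace with values from your sensor's datasheet and field recordings.

```python
import numpy as np

def augment_with_sensor_noise(X, noise_std=0.05, drift_max=0.02, rng=None):
    """Simulate field conditions on clean lab windows: additive Gaussian noise
    plus a slow per-window offset drift. Both magnitudes are assumptions --
    match them to your sensor's real noise characteristics."""
    rng = rng or np.random.default_rng()
    noise = rng.normal(0.0, noise_std, size=X.shape)
    drift = rng.uniform(-drift_max, drift_max, size=(X.shape[0], 1))
    return X + noise + drift

clean = np.zeros((100, 64))   # 100 clean windows of 64 samples each
noisy = augment_with_sensor_noise(clean, rng=np.random.default_rng(42))
```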
A model that always predicts "normal" on 10,000 samples (9,900 normal, 100 failures) scores 99% accuracy. But recall for the failure class (the proportion of actual failures detected) is zero: \[
\text{Recall}_{\text{failure}} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} = \frac{0}{0 + 100} = 0\%
\]
Weighted loss function correction: Assign class weights inversely proportional to frequency: \[
w_{\text{failure}} = \frac{N_{\text{total}}}{2 \times N_{\text{failure}}} = \frac{10{,}000}{200} = 50, \quad w_{\text{normal}} = \frac{10{,}000}{19{,}800} \approx 0.505
\]
With weighted loss, a false negative (missed failure) costs 50× more than a false positive—forcing the model to prioritize detecting rare failures over maximizing overall accuracy.
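The weight formula above can be computed directly; this mirrors the common "balanced" heuristic \(w_c = N_{\text{total}} / (n_{\text{classes}} \times N_c)\).

```python
import numpy as np

def balanced_class_weights(y):
    """w_c = N_total / (n_classes * N_c), matching the formula above."""
    classes, counts = np.unique(y, return_counts=True)
    total = counts.sum()
    return {int(c): total / (len(classes) * n) for c, n in zip(classes, counts)}

# 10,000 samples: 9,900 normal (0), 100 failures (1)
y = np.array([0] * 9900 + [1] * 100)
print(balanced_class_weights(y))  # failure class weighted 50x
```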
5.11.1 Explore: Class Imbalance Impact Calculator
Use the sliders below to see how class imbalance affects naive accuracy and the class weights needed to correct it.
The Mistake: Using random train/test splits on time-series data, allowing the model to “see the future” during training. Models with data leakage show 10-20% higher test accuracy than real-world performance.
Why It Happens: Default sklearn train_test_split uses random sampling, breaking temporal order.
The Fix: Always use chronological splits (train on past, test on future). See the detailed walkthrough below for a complete analysis of why this matters.
Deep Dive: Why Random Splits Fail on Time-Series Data
The Mistake: An activity recognition system randomly splits accelerometer data into 80% training and 20% test sets using sklearn.train_test_split(). The model achieves 95% test accuracy, but fails completely in production (60% accuracy).
Why It Happens:
```python
# Typical mistake: treating time-series like static data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# This randomly shuffles data, breaking temporal order
```
The Hidden Data Leakage:
Imagine a user’s morning commute from 8:00-8:30 AM:
The model learns patterns from 8:00-8:24 AM, then “predicts” 8:25 AM. But in production, it must predict tomorrow’s 8:25 AM based only on past days’ data. This is fundamentally different.
Real-World Impact:
```
# Lab results (random split with leakage)
Accuracy:  95.2%
Precision: 0.94
Recall:    0.96

# Production results (true future data)
Accuracy:  62.1%   <- 33% drop!
Precision: 0.58
Recall:    0.61
```
Why Performance Drops:
Autocorrelation: Adjacent time windows are highly similar. Random split puts similar samples in train and test.
Activity transitions: Model learns “Walking at 8:04 AM often continues at 8:05 AM” instead of general walking patterns.
User-specific quirks: Random split leaks user-specific motion patterns into test set.
The Correct Approach:
```python
# Chronological split: train on past, test on future
def chronological_split(X, y, timestamps, train_ratio=0.7, val_ratio=0.15):
    n = len(X)
    train_end = int(n * train_ratio)
    val_end = int(n * (train_ratio + val_ratio))
    X_train, y_train = X[:train_end], y[:train_end]
    X_val, y_val = X[train_end:val_end], y[train_end:val_end]
    X_test, y_test = X[val_end:], y[val_end:]
    print(f"Training:   {timestamps[0]} to {timestamps[train_end]}")
    print(f"Validation: {timestamps[train_end]} to {timestamps[val_end]}")
    print(f"Test:       {timestamps[val_end]} to {timestamps[-1]}")
    return X_train, X_val, X_test, y_train, y_val, y_test
```
Even Better: User-Based Splits for Multi-User Systems
```python
# Split by users, not time (test generalization to new users)
train_users = ['user_01', 'user_02', ..., 'user_24']  # 80% of 30 users
val_users = ['user_25', 'user_26', 'user_27']         # 10%
test_users = ['user_28', 'user_29', 'user_30']        # 10%

X_train = X[X['user_id'].isin(train_users)]
X_val = X[X['user_id'].isin(val_users)]
X_test = X[X['user_id'].isin(test_users)]
```
Results Comparison:
| Split Method | Test Accuracy | Production Accuracy | Accuracy Gap |
|---|---|---|---|
| Random Split (WRONG) | 95.2% | 62.1% | 33.1% drop |
| Chronological Split | 87.6% | 85.2% | 2.4% drop |
| User-Based Split | 83.1% | 82.8% | 0.3% drop |
Rule of Thumb: If your lab test accuracy is more than 10% higher than production accuracy, you likely have data leakage. Chronological or user-based splits are mandatory for time-series IoT data.
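The rule of thumb translates to a one-line check you can run on every experiment report.

```python
def leakage_suspected(lab_accuracy, production_accuracy, threshold=0.10):
    """Flag a likely data-leakage problem when the lab/production gap
    exceeds the ~10-point rule of thumb from the text."""
    return (lab_accuracy - production_accuracy) > threshold

print(leakage_suspected(0.952, 0.621))  # random split: 33-point gap
print(leakage_suspected(0.876, 0.852))  # chronological split: 2.4 points
```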
5.12 Worked Example: Model Selection for Industrial Predictive Maintenance
Scenario: A manufacturing plant needs vibration-based predictive maintenance on 500 CNC machines. Each machine has a 3-axis accelerometer at 4 kHz. The edge device is an ESP32 (520KB RAM, 240 MHz).
Constraints:
Model must fit in 150KB
Inference < 100ms
Target: >90% recall for Critical class, >70% precision
Model Comparison:
| Model | Size | Latency | Critical Recall | Critical Precision |
|---|---|---|---|---|
| Random Forest (100 trees) | 2.1 MB | 45ms | 87% | 68% |
| Decision Tree Ensemble (10 trees) | 89 KB | 8ms | 82% | 71% |
| + Frequency Features | 112 KB | 35ms | 91% | 74% |
| + INT8 Quantization | 28 KB | 28ms | 90% | 73% |
Result: INT8 quantized decision tree ensemble with 16 features achieves 90% critical recall, 28ms inference, 28KB model size.
Key Insight: Start with the simplest model that fits your constraints, then add complexity only if metrics demand it.
Try It: Quantization Trade-off Explorer
Explore how model quantization reduces size and latency at the cost of accuracy. Adjust the original model parameters to see how INT8 and INT4 quantization affect deployment feasibility.
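As a rough static sketch of the same trade-off: the INT8 and INT4 size ratios follow directly from bit width (4x and 8x smaller than float32 weights), while the latency and accuracy deltas below are placeholder heuristics, not guarantees; always measure on the target device.

```python
def quantization_estimate(size_kb, latency_ms, accuracy_pct):
    """Back-of-envelope INT8/INT4 projections from a float32 baseline.
    Size ratios follow from bit width; the latency speedups and accuracy
    drops are rough assumptions -- validate them with real measurements."""
    return {
        "float32": {"size_kb": size_kb, "latency_ms": latency_ms, "acc": accuracy_pct},
        "int8": {"size_kb": size_kb / 4, "latency_ms": latency_ms * 0.6,
                 "acc": accuracy_pct - 1.0},
        "int4": {"size_kb": size_kb / 8, "latency_ms": latency_ms * 0.5,
                 "acc": accuracy_pct - 3.0},
    }

est = quantization_estimate(size_kb=2100, latency_ms=45, accuracy_pct=92.0)
print(est["int8"]["size_kb"])  # 525.0 -- still over a 150 KB budget
```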
Worked Example: Predictive Maintenance for Industrial Pumps
Scenario: A chemical plant operates 120 centrifugal pumps critical to production. Unplanned downtime costs $50,000/hour. Current reactive maintenance causes 8-12 emergency failures per year. Goal: Build ML model to predict pump failure 7 days in advance.
Available Data: 18 months of operational data from 120 pumps with 47 documented failures.
Complete 7-Step Pipeline Implementation:
Step 1: Data Collection
```python
# Sensor data collected at 1-minute intervals
sensors = {
    'vibration_x': 'Accelerometer X-axis (m/s²)',
    'vibration_y': 'Accelerometer Y-axis (m/s²)',
    'vibration_z': 'Accelerometer Z-axis (m/s²)',
    'temperature_bearing': 'Bearing temperature (°C)',
    'temperature_motor': 'Motor temperature (°C)',
    'flow_rate': 'GPM',
    'pressure_inlet': 'PSI',
    'pressure_outlet': 'PSI',
    'power_consumption': 'kW',
    'rpm': 'Rotations per minute',
}

# Failure labels from maintenance logs
# Label: 1 (failure within 7 days), 0 (normal operation)
# Class distribution:
#   Normal:      775,000 samples (99.4%)
#   Pre-failure:   4,700 samples (0.6%)
#   -> Severe class imbalance!
```
Step 2: Data Cleaning
```python
import pandas as pd
import numpy as np

def clean_sensor_data(df):
    # Remove sensor glitches (physically impossible values)
    df = df[(df['vibration_magnitude'] >= 0) & (df['vibration_magnitude'] < 50)]
    df = df[(df['temperature_bearing'] > 0) & (df['temperature_bearing'] < 150)]

    # Handle missing values (sensor dropouts)
    # Linear interpolation for gaps < 10 minutes
    for col in sensor_columns:
        df[col] = df[col].interpolate(method='linear', limit=10)

    # Drop remaining NaN (gaps > 10 min indicate sensor failure)
    df = df.dropna()

    # Remove duplicate timestamps (logging errors)
    df = df.drop_duplicates(subset=['pump_id', 'timestamp'])

    # Synchronize multi-sensor data (align to 1-min buckets)
    df['timestamp'] = df['timestamp'].dt.floor('1min')
    return df

# Result: 775k samples -> 761k samples after cleaning (1.8% removed)
```
Step 3: Feature Engineering
```python
import numpy as np
import scipy.stats

def extract_pump_features(df_window):
    """Extract features from a 1-hour sliding window."""
    features = {}

    # Vibration features (key indicators of bearing wear)
    vib_magnitude = np.linalg.norm(
        df_window[['vibration_x', 'vibration_y', 'vibration_z']], axis=1)
    features['vib_rms'] = np.sqrt(np.mean(vib_magnitude ** 2))
    features['vib_peak'] = np.max(vib_magnitude)
    features['vib_kurtosis'] = scipy.stats.kurtosis(vib_magnitude)  # Spikiness indicates wear

    # Temperature trends (gradual increase before failure)
    features['temp_bearing_mean'] = df_window['temperature_bearing'].mean()
    features['temp_bearing_trend'] = (df_window['temperature_bearing'].iloc[-1]
                                      - df_window['temperature_bearing'].iloc[0])
    features['temp_motor_vs_bearing'] = (df_window['temperature_motor'].mean()
                                         - df_window['temperature_bearing'].mean())

    # Flow anomalies (cavitation or blockage)
    features['flow_std'] = df_window['flow_rate'].std()
    features['flow_efficiency'] = (df_window['flow_rate'].mean()
                                   / df_window['power_consumption'].mean())

    # Pressure differential (pump performance)
    features['pressure_diff'] = (df_window['pressure_outlet'].mean()
                                 - df_window['pressure_inlet'].mean())
    features['pressure_diff_std'] = (df_window['pressure_outlet']
                                     - df_window['pressure_inlet']).std()

    # Power consumption patterns
    features['power_per_flow'] = (df_window['power_consumption'].mean()
                                  / df_window['flow_rate'].mean())
    return features

# Result: 12 engineered features from 10 raw sensors
```
Step 4: Train/Test Split (CRITICAL for Time-Series)
```python
# WRONG: Random split (causes data leakage!)
# X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

# RIGHT: Chronological split by time
#   Training:   first 14 months, all pumps
#   Validation: months 15-16, all pumps
#   Test:       months 17-18, all pumps (never seen by model)
train_cutoff = '2023-03-01'
val_cutoff = '2023-05-01'

X_train = X[X['timestamp'] < train_cutoff]
y_train = y[X['timestamp'] < train_cutoff]
X_val = X[(X['timestamp'] >= train_cutoff) & (X['timestamp'] < val_cutoff)]
y_val = y[(X['timestamp'] >= train_cutoff) & (X['timestamp'] < val_cutoff)]
X_test = X[X['timestamp'] >= val_cutoff]
y_test = y[X['timestamp'] >= val_cutoff]

# Also split by pump (test on different pumps)
test_pump_ids = random.sample(pump_ids, 24)  # 20% of pumps
```
Step 5: Model Selection with Class Imbalance Handling
```python
from sklearn.ensemble import RandomForestClassifier
from imblearn.over_sampling import SMOTE

# Handle 99.4% / 0.6% imbalance with SMOTE
smote = SMOTE(sampling_strategy=0.1)  # Raise minority to 10% (not 50% - too synthetic)
X_train_balanced, y_train_balanced = smote.fit_resample(X_train, y_train)

# Random Forest with class weights
model = RandomForestClassifier(
    n_estimators=200,
    max_depth=15,
    class_weight={0: 1, 1: 50},  # Penalize false negatives 50x more
    random_state=42,
)
model.fit(X_train_balanced, y_train_balanced)
```
Step 6: Evaluation with Appropriate Metrics
```python
from sklearn.metrics import classification_report, roc_auc_score, precision_recall_curve

y_pred = model.predict(X_test)
y_pred_proba = model.predict_proba(X_test)[:, 1]

# Accuracy is USELESS for imbalanced data
accuracy = (y_pred == y_test).mean()
print(f"Accuracy: {accuracy:.1%}")  # 98.7% - but predicting all "normal" gives 99.4%!

# Use precision, recall, F1, ROC-AUC
print(classification_report(y_test, y_pred))
# Results:
#               precision  recall  f1-score
# Normal (0)        0.997   0.982    0.989
# Pre-fail (1)      0.68    0.91     0.78   <- Key metric
# ROC-AUC: 0.94

# Business metric: catch 91% of failures with 68% precision
#   -> 91% of failures predicted 7 days early
#   -> 32% false alarms (acceptable for $50k/hr downtime cost)
```
Step 7: Deployment and Monitoring
```python
# Deploy to edge gateway (Raspberry Pi 4 at each pump station)
#   Model size: 4.2 MB (fits in RAM)
#   Inference time: 15 ms per pump (8 pumps per gateway = 120 ms total)

# Monitoring dashboard tracks:
#   1. Prediction distribution (drift detection)
#   2. Feature values vs training distribution (KL divergence)
#   3. False alarm rate (business metric)
#   4. Missed failure rate (critical safety metric)

# Alert rules:
#   If probability > 0.7 for 3 consecutive hours -> Alert maintenance team
#   If probability > 0.9 -> Immediate inspection required
```
Results:
Before ML: 8-12 emergency failures/year, $400k-$600k downtime cost
After ML (first year): 2 emergency failures (missed predictions), 10 successful early interventions, $100k downtime cost
Key Lesson: The 7-step pipeline structure forced systematic thinking that caught the data leakage risk (Step 4) and class imbalance issue (Step 6) that would have made a naive model unusable.
Decision Framework: Choosing the Right ML Algorithm for IoT
Use this decision tree to select appropriate ML algorithms based on your constraints and data characteristics:
| Decision Point | Question | If YES → | If NO → |
|---|---|---|---|
| 1. Data Type | Is your data time-series (sequential sensor readings)? | Consider LSTM, GRU, or time-series features + Random Forest | Use standard tabular ML (skip to 2) |
| 2. Labeled Data | Do you have >1,000 labeled examples per class? | Supervised learning (proceed to 3) | Use unsupervised (anomaly detection, clustering) |
| 3. Real-Time | Must inference run in <10ms on MCU? | Decision Tree or tiny MLP (proceed to 4) | Can use Random Forest, SVM, or NN (proceed to 5) |
| 4. Memory | RAM <100 KB? | Decision Tree (≤10 depth) or quantized MLP | Random Forest (50-100 trees) or SVM |
| 5. Interpretability | Must you explain predictions to regulators/users? | Decision Tree or Linear Model | Neural Network acceptable |
| 6. Class Balance | Is minority class <10% of data? | Apply SMOTE + class weights | Standard training OK |
| 7. Complexity | Tried simple model first? | Deploy simple model; iterate if needed | Start with simplest: Decision Tree |
Algorithm Selection Matrix:
| Algorithm | Accuracy | Speed | Memory | Interpretability | Best For |
|---|---|---|---|---|---|
| Decision Tree | Medium | <1ms | 5-50 KB | High | MCU deployment, need explainability |
| Random Forest | High | 5-20ms | 200-500 KB | Medium | Tabular data, ESP32+ devices |
| SVM | High | 1-5ms | 10-100 KB | Low | High-dimensional, small datasets |
| Neural Network | Very High | 20-100ms | 1-10 MB | Very Low | Complex patterns, cloud/edge |
| Quantized NN | High | 5-30ms | 50-200 KB | Very Low | Compromise: accuracy + edge deployment |
| LSTM/GRU | Very High | 50-200ms | 5-20 MB | Very Low | Sequential time-series, cloud |
Example Decision Paths:
Scenario 1: Fall Detection on Wearable (Cortex-M4, 256KB RAM)
Time-series? YES → Extract time-domain features (no LSTM on MCU)
Labeled data? YES (200 fall events, 5,000 normal activities)
Real-time <10ms? YES (safety-critical)
RAM <100 KB? NO (256 KB available)
→ Decision: Random Forest (10 trees) with 8 time-domain features
→ Result: 96% recall, 8 ms inference, 45 KB model
Scenario 2: Industrial Predictive Maintenance (gateway-class device)
Time-series? YES → Can use LSTM or feature engineering
Labeled data? YES (1,500 failure events over 2 years)
Real-time <10ms? NO (hourly predictions acceptable)
Memory constraint? NO (plenty of RAM)
Interpretability? YES (maintenance teams need explanations)
→ Decision: Random Forest (200 trees) with 15 engineered features
→ Result: 91% recall, 15 ms inference, 4.2 MB model, feature importance plots
Interpretability? NO (consumer product, no explanation needed)
→ Decision: Small Neural Network (3 layers: 4→8→8→1)
→ Result: 94% accuracy, 2 ms inference, 12 KB model
Common Pitfall: Starting with deep learning because it’s trendy, then realizing it won’t fit on the target device. Always start with the simplest model that might work (Decision Tree), then increase complexity only if metrics demand it.
Checklist Before Deploying ML Model:
5.14 Concept Relationships
The IoT ML Pipeline integrates all concepts from the ML series:
Connects To: Production ML extends Step 7 with monitoring and drift detection
Informs: Data Quality validates Step 2 cleaning procedures
The pipeline represents best practices distilled from thousands of failed IoT ML projects—skipping any step (especially the chronological split in Step 4) leads to models that appear accurate in development but fail catastrophically in production.
“Building Machine Learning Powered Applications” by Emmanuel Ameisen (O’Reilly, 2020)
5.16 Try It Yourself
Hands-On Challenge: Implement the 7-step pipeline for temperature anomaly detection
Task: Build a complete pipeline from scratch using Python:
Generate Synthetic Data (Step 1):
Create 30 days of hourly temperature readings (720 samples)
Normal: 20-25°C with daily cycles
Inject 10 anomalies (spikes to 40°C or drops to 5°C)
Clean Data (Step 2):
Add 5% missing values, interpolate gaps <3 hours
Add sensor glitches (readings > 50°C), remove them
Feature Engineering (Step 3):
Per 6-hour window: mean, std, rate of change, deviation from 24h average
Split Data (Step 4):
Chronological: Days 1-21 train, 22-25 validation, 26-30 test
Train Model (Step 5):
Isolation Forest for anomaly detection
Evaluate (Step 6):
Precision, recall, F1 on test set
Deploy & Monitor (Step 7):
Run on new data, track false positive rate
What to Observe:
Random split inflates accuracy by 10-20% vs chronological split (data leakage)
Feature engineering reduces false positives dramatically vs raw values
Small training sets (<100 samples) lead to overfitting
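A possible starter for Step 1 of the challenge; the daily-cycle amplitude and noise level are one reasonable reading of the spec, so adjust them as you like.

```python
import numpy as np

def make_temperature_series(days=30, anomalies=10, seed=0):
    """Step 1 starter: hourly readings with a daily cycle, plus injected
    spikes (40°C) and drops (5°C). Returns (values, is_anomaly labels)."""
    rng = np.random.default_rng(seed)
    hours = np.arange(days * 24)
    # Normal operation: ~20-25°C with a 24-hour cycle and mild noise
    temps = 22.5 + 2.5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.3, len(hours))
    labels = np.zeros(len(hours), dtype=int)
    for i in rng.choice(len(hours), size=anomalies, replace=False):
        temps[i] = rng.choice([40.0, 5.0])   # spike or drop
        labels[i] = 1
    return temps, labels

temps, labels = make_temperature_series()
print(len(temps), int(labels.sum()))  # 720 samples, 10 anomalies
```

From here, Steps 2-7 of the challenge (cleaning, 6-hour-window features, chronological split, Isolation Forest) build on this array.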
Interactive Quiz: Match Concepts
Interactive Quiz: Sequence the Steps
Label the Diagram
5.17 Summary
This chapter covered the systematic 7-step IoT ML pipeline:
Data Collection: Gather diverse, representative sensor data
Data Cleaning: Remove outliers, handle missing values, align timestamps
Feature Engineering: Extract time-domain and frequency-domain features
Train/Test Split: Use chronological splits to avoid data leakage
Model Selection: Choose based on RAM, latency, and accuracy constraints
Evaluation: Use F1-score, recall, precision for imbalanced data
Deployment: Monitor inference latency and prediction drift
Key Takeaways:
Chronological splits are mandatory for time-series data
Class imbalance requires specialized metrics and techniques
Start simple (Decision Tree), add complexity only if needed
Real-world accuracy is typically 10-15% lower than lab accuracy
For Kids: Meet the Sensor Squad!
Building a smart brain for sensors – the Sensor Squad’s 7-step recipe!
Max the Microcontroller wants to build a brain that can predict when a machine is about to break. But how do you teach a computer brain? The Sensor Squad follows a 7-step recipe!
Step 1 - Collect Data: Sammy the Sensor listens to the machine for 30 days, recording every vibration and temperature change. “I need to hear what NORMAL sounds like AND what BREAKING sounds like!”
Step 2 - Clean Up: Some of Sammy’s recordings have glitches – like when someone bumped into the sensor. Max removes these mistakes. “Garbage in, garbage out!” he says.
Step 3 - Find Clues: Instead of feeding the brain millions of raw numbers, they extract clues: “How loud was the average vibration? Did the temperature go up? Were there any sudden spikes?”
Step 4 - Split the Data: Here is the tricky part! They use the FIRST 3 weeks to teach the brain, and the LAST week to test it. “Never peek at the test!” warns Lila the LED. “That would be like studying the answer key before an exam!”
Step 5 - Pick a Brain Type: Small device? Use a simple brain (decision tree). Big computer? Use a fancy brain (neural network). “Start simple!” says Max.
Step 6 - Grade the Brain: How many times did it predict correctly? Did it miss any real breakdowns? “Getting 95% right sounds good, but if it misses the ONE time the machine actually breaks, that is terrible!” explains Bella the Battery.
Step 7 - Deploy and Watch: The brain goes to work! But they keep checking it every month. “Machines change over time,” says Sammy. “Our brain needs to keep learning too!”
5.17.1 Try This at Home!
Try predicting tomorrow’s weather using only the last 7 days. Write down the temperature each day for a week. On day 8, guess the temperature before checking. Were you close? That is basically what machine learning does – it looks at patterns in old data to predict the future!