5  IoT Machine Learning Pipeline

In 60 Seconds

Building ML for IoT follows a systematic 7-step pipeline: data collection, cleaning, feature engineering, train/test split, model selection, evaluation, and deployment. The most critical pitfalls are using random instead of chronological splits for time-series data (causing 10-20% inflated accuracy), ignoring class imbalance, and training on clean lab data that fails in noisy real-world environments.

5.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Design ML Pipelines: Implement a systematic 7-step ML pipeline for IoT applications
  • Avoid Common Pitfalls: Diagnose and address data leakage, overfitting, and class imbalance
  • Select Appropriate Models: Choose ML algorithms based on accuracy, latency, and deployment constraints
  • Evaluate IoT ML Systems: Use appropriate metrics for imbalanced IoT datasets

Key Concepts

  • ML pipeline: The end-to-end sequence of steps transforming raw sensor data into deployed model predictions: data collection → preprocessing → feature engineering → model training → validation → deployment → monitoring.
  • Training-serving skew: A discrepancy between the feature distribution seen during model training and the distribution encountered in production, causing model performance to degrade after deployment.
  • Model registry: A versioned repository for trained ML models, storing model artifacts, performance metrics, and metadata, enabling reproducible deployments and rollbacks.
  • Shadow deployment: Running a new model in parallel with the current production system, comparing predictions without affecting live decisions, to validate real-world performance before cutover.
  • Drift detection: Monitoring the statistical properties of production input features and model output distributions over time to detect when the data distribution has shifted enough to require model retraining.

Data analytics and machine learning for IoT is about extracting useful insights from the massive streams of sensor data. Think of it like panning for gold – raw sensor readings are the river gravel, and analytics tools help you find the valuable nuggets of information hidden within. Machine learning takes this further by automatically learning patterns and making predictions.

5.2 Prerequisites

Chapter Series: Modeling and Inferencing

This is part 3 of the IoT Machine Learning series:

  1. ML Fundamentals - Core concepts
  2. Mobile Sensing - HAR, transportation
  3. IoT ML Pipeline (this chapter) - 7-step pipeline
  4. Edge ML & Deployment - TinyML
  5. Audio Feature Processing - MFCC
  6. Feature Engineering - Feature design
  7. Production ML - Monitoring

5.3 The 7-Step IoT ML Pipeline

Diagram showing 7-step ML pipeline stages and data flow
Figure 5.1: Seven-Step IoT ML Pipeline with Continuous Feedback Loop

5.3.1 How It Works: The ML Pipeline Process

The IoT ML pipeline provides a systematic framework for transforming raw sensor data into deployed predictive models:

Step 1: Data Collection - Gather diverse sensor readings from target environment (e.g., 2 weeks of vibration data from 100 pumps)
Step 2: Data Cleaning - Remove outliers, handle missing values, filter sensor glitches using domain knowledge (e.g., physically impossible readings)
Step 3: Feature Engineering - Transform raw time-series into statistical features (mean, variance, FFT peaks) that capture patterns
Step 4: Train/Test Split - CRITICAL for time-series: use chronological split (not random) to avoid data leakage—train on past, test on future
Step 5: Model Selection - Choose algorithm based on constraints (Decision Tree for MCU, Random Forest for ESP32, Neural Network for cloud)
Step 6: Evaluation & Tuning - Optimize hyperparameters using cross-validation, measure with appropriate metrics (F1-score for imbalanced data)
Step 7: Deployment & Monitoring - Deploy to edge/cloud, continuously monitor for accuracy degradation and feature drift

The pipeline is cyclical—monitoring (Step 7) feeds back to data collection (Step 1) when model performance degrades, triggering retraining with recent data. This continuous loop is essential because IoT environments change over time (sensor aging, seasonal patterns, equipment updates).

5.4 Step 1: Data Collection

Goal: Gather representative sensor data that captures the full range of conditions your model will encounter.

| Data Source | Sampling Rate | Duration | Labels |
|---|---|---|---|
| Accelerometer | 50-100 Hz | 1-2 weeks | Activity type |
| Temperature | 0.01-1 Hz | 30+ days | Normal/Anomaly |
| Audio | 16 kHz | 10+ hours | Keyword/Not |

Best Practices:

  • Collect from diverse users, devices, and environments
  • Include edge cases (unusual activities, sensor noise)
  • Document collection conditions (timestamp, device model, location)

5.5 Step 2: Data Cleaning

Goal: Remove noise, handle missing values, and ensure data quality.

# Common cleaning operations
def clean_sensor_data(df):
    # Remove outliers (sensor glitches)
    df = df[(df['accel_mag'] > 0) & (df['accel_mag'] < 50)]

    # Handle missing values
    df = df.interpolate(method='linear', limit=10)
    df = df.dropna()

    # Remove duplicates
    df = df.drop_duplicates(subset=['timestamp'])

    return df

Key Operations:

  • Outlier removal: Filter physically impossible values
  • Gap filling: Interpolate short gaps (< 10 samples)
  • Timestamp alignment: Synchronize multi-sensor data
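Timestamp alignment deserves its own snippet, since multi-sensor fusion silently degrades when streams drift apart. Below is a minimal sketch assuming pandas DataFrames with a `timestamp` column; the 1-minute grid and column names are illustrative.

```python
import numpy as np
import pandas as pd

def align_sensors(accel: pd.DataFrame, temp: pd.DataFrame) -> pd.DataFrame:
    """Resample two sensor streams onto a common 1-minute grid and join them."""
    accel = accel.set_index('timestamp').resample('1min').mean()
    temp = (temp.set_index('timestamp').resample('1min').mean()
                .interpolate(method='linear', limit=10))  # fill short gaps only
    # Inner join keeps only timestamps where both sensors have data
    return accel.join(temp, how='inner')
```

With a fast accelerometer and a slow temperature probe, the result is one row per minute containing both readings, ready for windowed feature extraction.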
Try It: Data Cleaning Impact Simulator

Adjust the parameters below to see how different cleaning operations affect your sensor dataset. Observe how aggressive vs. conservative settings trade data loss for quality.

5.6 Step 3: Feature Engineering

Goal: Transform raw sensor data into discriminative features that capture patterns.

Diagram showing feature pipeline stages and data flow
Figure 5.2: Feature Engineering Pipeline from Raw Data to Normalized Feature Vector

Feature Categories:

| Category | Features | Purpose |
|---|---|---|
| Statistical | Mean, Std, Min, Max, IQR | Central tendency, spread |
| Signal Shape | Zero crossings, Peak count | Periodicity indicators |
| Frequency | FFT peaks, Spectral energy | Periodic patterns |
| Domain-Specific | Step frequency, Bearing frequencies | Application knowledge |
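As a concrete sketch of the categories above, the snippet below computes one representative feature from each using NumPy only; the sampling rate and feature names are illustrative, and domain-specific features are omitted since they depend on the application.

```python
import numpy as np

def window_features(x: np.ndarray, fs: float = 50.0) -> dict:
    """One representative feature per category: statistical, shape, frequency."""
    feats = {
        'mean': float(np.mean(x)),   # statistical: central tendency
        'std': float(np.std(x)),     # statistical: spread
        # signal shape: crossings of the mean indicate periodicity
        'zero_crossings': int(np.sum(np.diff(np.sign(x - np.mean(x))) != 0)),
    }
    # frequency: location of the dominant FFT peak, ignoring the DC bin
    spectrum = np.abs(np.fft.rfft(x - np.mean(x)))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    feats['dominant_freq_hz'] = float(freqs[1:][np.argmax(spectrum[1:])])
    return feats
```

For a 2 Hz sinusoid sampled at 50 Hz, `dominant_freq_hz` comes out at 2.0, and the zero-crossing count reflects two crossings per cycle.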
Try It: Feature Engineering Explorer

Select a window size and see how statistical features are computed from raw sensor data. Notice how different window sizes capture different levels of detail.

5.7 Step 4: Train/Test Split

Critical for Time-Series: Use chronological splits, NOT random splits.

Data Leakage Warning

Wrong: Random 80/20 split (future data leaks into training)

Right: Chronological split (train on past, test on future)

# WRONG - Data leakage!
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

# RIGHT - Chronological split
train_end = int(len(X) * 0.7)
val_end = int(len(X) * 0.85)

X_train = X[:train_end]          # Days 1-21
X_val = X[train_end:val_end]     # Days 22-25
X_test = X[val_end:]             # Days 26-30

Split Strategy:

| Split | Percentage | Purpose |
|---|---|---|
| Training | 70% | Learn patterns |
| Validation | 15% | Tune hyperparameters |
| Test | 15% | Final evaluation |
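Beyond a single chronological split, scikit-learn's `TimeSeriesSplit` gives walk-forward cross-validation: every fold trains on an expanding window of past samples and validates on the block immediately after it. A minimal sketch:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Stand-in for chronologically ordered feature rows
X = np.arange(100).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
    # Training indices always precede validation indices -- no future leakage
    assert train_idx.max() < val_idx.min()
    print(f"Fold {fold}: train [0..{train_idx.max()}], "
          f"val [{val_idx.min()}..{val_idx.max()}]")
```

This is the time-series-safe replacement for the k-fold cross-validation used later in hyperparameter tuning.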

5.8 Step 5: Model Selection

Choose based on constraints:

| Model | Accuracy | Model Size | Inference | Best For |
|---|---|---|---|---|
| Decision Tree | 80-85% | 5-50 KB | < 1ms | Interpretable, MCU |
| Random Forest | 88-93% | 200-500 KB | 5-20ms | Tabular data, ESP32 |
| SVM | 85-90% | 10-100 KB | 1-5ms | High-dimensional |
| Neural Network | 92-98% | 1-10 MB | 20-100ms | Complex patterns |
| Quantized NN | 90-95% | 50-200 KB | 5-30ms | Edge AI |
Decision tree diagram for model selection
Figure 5.3: Model Selection Decision Tree Based on Device Constraints
Try It: Model Selection Constraint Checker

Enter your device constraints below to see which ML models are feasible for your IoT deployment. The tool highlights compatible models and flags those that exceed your limits.
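The kind of check this tool performs can be sketched as a filter over the model comparison table in this section; the size and latency numbers below are the upper ends of the table's ranges and are purely illustrative.

```python
# Feasibility filter over the model comparison table (illustrative numbers:
# worst-case size and latency from the upper end of each listed range).
MODELS = {
    'Decision Tree':  {'size_kb': 50,     'latency_ms': 1},
    'Random Forest':  {'size_kb': 500,    'latency_ms': 20},
    'SVM':            {'size_kb': 100,    'latency_ms': 5},
    'Neural Network': {'size_kb': 10_000, 'latency_ms': 100},
    'Quantized NN':   {'size_kb': 200,    'latency_ms': 30},
}

def feasible_models(ram_kb, deadline_ms):
    """Return models whose worst-case size and latency fit the device budget."""
    return [name for name, m in MODELS.items()
            if m['size_kb'] <= ram_kb and m['latency_ms'] <= deadline_ms]
```

For example, a 256 KB device with a 10 ms deadline rules out Random Forest (too large) and Quantized NN (too slow), leaving Decision Tree and SVM.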

5.9 Step 6: Evaluation and Tuning

Use appropriate metrics for IoT:

| Use Case | Primary Metric | Why |
|---|---|---|
| Activity Recognition | F1-Score, Accuracy | Balanced classes |
| Fall Detection | Recall, Specificity | Rare events, false alarm cost |
| Anomaly Detection | Precision @ Recall | Class imbalance |
| Prediction | MAE, RMSE | Continuous output |

Hyperparameter Tuning:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 15, None],
    'min_samples_leaf': [1, 2, 5]
}

grid_search = GridSearchCV(
    RandomForestClassifier(),
    param_grid,
    cv=5,  # 5-fold cross-validation
    scoring='f1_macro',
    n_jobs=-1
)

grid_search.fit(X_train, y_train)
best_model = grid_search.best_estimator_

5.10 Step 7: Deployment and Monitoring

Deployment Options:

| Location | When to Use | Tools |
|---|---|---|
| Edge (MCU) | Real-time, offline | TensorFlow Lite Micro |
| Edge (ESP32/RPi) | Moderate complexity | TensorFlow Lite |
| Cloud | Complex models, fleet analytics | AWS SageMaker, Azure ML |

Monitoring Checklist:

  • Track inference latency (P50, P95, P99)
  • Monitor prediction distribution
  • Detect feature drift (KL divergence)
  • Set alerts for accuracy degradation
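The feature-drift item in the checklist can be sketched with NumPy alone: compare a histogram of a feature in production against its training-time histogram via KL divergence. The bin count and the alert threshold are placeholders you would tune on held-out data.

```python
import numpy as np

def kl_drift(train_values, prod_values, bins=20, eps=1e-9):
    """KL(prod || train) over shared histogram bins; larger means more drift."""
    lo = min(train_values.min(), prod_values.min())
    hi = max(train_values.max(), prod_values.max())
    p, _ = np.histogram(prod_values, bins=bins, range=(lo, hi))
    q, _ = np.histogram(train_values, bins=bins, range=(lo, hi))
    p = p / p.sum() + eps   # smooth so empty bins don't produce log(0)
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))
```

In production you would compute this per feature over a sliding window and alert when the score exceeds a calibrated threshold; a shifted distribution scores clearly higher than a matching one.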

5.11 Common Pipeline Pitfalls

Pitfall 1: Training on Clean Lab Data, Deploying to Noisy Real World

The Mistake: Developing ML models using carefully curated datasets collected under controlled laboratory conditions, then expecting the same performance when deployed to production environments.

Why It Happens: Lab datasets are convenient, well-labeled, and produce impressive accuracy numbers. Real-world data collection is expensive and messy.

The Fix:

  1. Data augmentation: Add synthetic noise matching expected sensor characteristics
  2. Domain randomization: Train on data from multiple devices and environments
  3. Staged deployment: Deploy to 5% of devices first, monitor for accuracy degradation
  4. Graceful degradation: Output confidence scores, reject uncertain predictions

Rule of thumb: If your lab accuracy is 95%, budget for 80-85% real-world accuracy.
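Fix 1 (data augmentation) can be sketched as injecting white Gaussian noise at a target signal-to-noise ratio; the 20 dB default is a placeholder you would calibrate from recordings of your actual deployed sensors.

```python
import numpy as np

def augment_with_noise(windows, snr_db=20.0, rng=None):
    """Add white Gaussian noise at a target SNR to clean training windows."""
    rng = rng or np.random.default_rng()
    signal_power = np.mean(windows ** 2, axis=-1, keepdims=True)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, 1.0, windows.shape) * np.sqrt(noise_power)
    return windows + noise
```

Training on `np.concatenate([clean, augment_with_noise(clean)])` exposes the model to realistic sensor noise it will never see in a curated lab dataset.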

Pitfall 2: Ignoring Class Imbalance

The Mistake: Training on imbalanced data (95% normal, 5% anomaly) and celebrating “95% accuracy” when the model just predicts “normal” for everything.

Why It Happens: Accuracy rewards majority class prediction.

The Fix:

  • Use precision, recall, F1-score, and ROC-AUC
  • Apply class weighting or SMOTE oversampling
  • Set decision thresholds based on business costs

Example: Fall detection with 99% normal, 1% falls:

  • Naive model: 99% accuracy, 0% recall (misses all falls!)
  • Proper model: 95% accuracy, 90% recall (catches most falls)

Class Imbalance Impact on Predictive Maintenance:

Quantifying how imbalanced data misleads naive accuracy metrics.

Dataset composition:

  • Normal operation samples: \(N_{\text{normal}} = 9{,}900\)
  • Failure event samples: \(N_{\text{failure}} = 100\)
  • Imbalance ratio: \(9{,}900 / 100 = 99:1\)

Naive classifier (always predicts “normal”): \[ \text{Accuracy} = \frac{N_{\text{normal}}}{N_{\text{normal}} + N_{\text{failure}}} = \frac{9{,}900}{10{,}000} = 99\% \]

But recall for failure class (proportion of actual failures detected): \[ \text{Recall}_{\text{failure}} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} = \frac{0}{0 + 100} = 0\% \]

Weighted loss function correction: Assign class weights inversely proportional to frequency: \[ w_{\text{failure}} = \frac{N_{\text{total}}}{2 \times N_{\text{failure}}} = \frac{10{,}000}{200} = 50, \quad w_{\text{normal}} = \frac{10{,}000}{19{,}800} \approx 0.505 \]

With weighted loss, a false negative (missed failure) costs 50× more than a false positive—forcing the model to prioritize detecting rare failures over maximizing overall accuracy.
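The arithmetic above is worth verifying in code; a few lines reproduce every number in the example.

```python
# Reproduce the class-weight arithmetic from the 9,900/100 example above.
n_normal, n_failure = 9_900, 100
n_total = n_normal + n_failure

# Naive "always normal" classifier: high accuracy, zero failure recall
naive_accuracy = n_normal / n_total      # 9,900 / 10,000 = 0.99
failure_recall = 0 / n_failure           # 0 detected out of 100 failures

# Inverse-frequency class weights: w = N_total / (2 * N_class)
w_failure = n_total / (2 * n_failure)    # 10,000 / 200 = 50.0
w_normal = n_total / (2 * n_normal)      # 10,000 / 19,800 ~ 0.505
```

These weights plug directly into, for example, scikit-learn's `class_weight` parameter, as shown in the predictive-maintenance walkthrough later in this chapter.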

5.11.1 Explore: Class Imbalance Impact Calculator

Use the sliders below to see how class imbalance affects naive accuracy and the class weights needed to correct it.

Pitfall 3: Data Leakage in Time-Series

The Mistake: Using random train/test splits on time-series data, allowing the model to “see the future” during training. Models with data leakage show 10-20% higher test accuracy than real-world performance.

Why It Happens: Default sklearn train_test_split uses random sampling, breaking temporal order.

The Fix: Always use chronological splits (train on past, test on future). See the detailed walkthrough below for a complete analysis of why this matters.

Detailed walkthrough: An activity recognition system randomly splits accelerometer data into 80% training and 20% test sets using sklearn.train_test_split(). The model achieves 95% test accuracy, but fails badly in production (60% accuracy).

Why It Happens:

# Typical mistake: treating time-series like static data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# This randomly shuffles data, breaking temporal order

The Hidden Data Leakage:

Imagine a user’s morning commute from 8:00-8:30 AM:

8:00 AM: Walking (train set)
8:05 AM: Walking (test set)    <- Leakage!
8:10 AM: Walking (train set)
8:15 AM: Bus riding (test set)  <- Leakage!
8:20 AM: Bus riding (train set)
8:25 AM: Bus riding (test set)  <- Leakage!

The model learns patterns from 8:00-8:24 AM, then “predicts” 8:25 AM. But in production, it must predict tomorrow’s 8:25 AM based only on past days’ data. This is fundamentally different.

Real-World Impact:

# Lab results (random split with leakage)
Accuracy: 95.2%
Precision: 0.94
Recall: 0.96

# Production results (true future data)
Accuracy: 62.1%  <- 33% drop!
Precision: 0.58
Recall: 0.61

Why Performance Drops:

  1. Autocorrelation: Adjacent time windows are highly similar. Random split puts similar samples in train and test.
  2. Activity transitions: Model learns “Walking at 8:04 AM often continues at 8:05 AM” instead of general walking patterns.
  3. User-specific quirks: Random split leaks user-specific motion patterns into test set.

The Correct Approach:

# Chronological split: train on past, test on future
def chronological_split(X, y, timestamps, train_ratio=0.7, val_ratio=0.15):
    n = len(X)
    train_end = int(n * train_ratio)
    val_end = int(n * (train_ratio + val_ratio))

    X_train = X[:train_end]
    y_train = y[:train_end]

    X_val = X[train_end:val_end]
    y_val = y[train_end:val_end]

    X_test = X[val_end:]
    y_test = y[val_end:]

    print(f"Training: {timestamps[0]} to {timestamps[train_end]}")
    print(f"Validation: {timestamps[train_end]} to {timestamps[val_end]}")
    print(f"Test: {timestamps[val_end]} to {timestamps[-1]}")

    return X_train, X_val, X_test, y_train, y_val, y_test

Even Better: User-Based Splits for Multi-User Systems

# Split by users, not time (test generalization to new users)
train_users = ['user_01', 'user_02', ..., 'user_24']  # 80% of 30 users
val_users = ['user_25', 'user_26', 'user_27']          # 10%
test_users = ['user_28', 'user_29', 'user_30']         # 10%

X_train = X[X['user_id'].isin(train_users)]
X_val = X[X['user_id'].isin(val_users)]
X_test = X[X['user_id'].isin(test_users)]

Results Comparison:

| Split Method | Test Accuracy | Production Accuracy | Accuracy Gap |
|---|---|---|---|
| Random Split (WRONG) | 95.2% | 62.1% | 33.1% drop |
| Chronological Split | 87.6% | 85.2% | 2.4% drop |
| User-Based Split | 83.1% | 82.8% | 0.3% drop |

Rule of Thumb: If your lab test accuracy is more than 10% higher than production accuracy, you likely have data leakage. Chronological or user-based splits are mandatory for time-series IoT data.

5.12 Worked Example: Model Selection for Industrial Predictive Maintenance

Scenario: A manufacturing plant needs vibration-based predictive maintenance on 500 CNC machines. Each machine has a 3-axis accelerometer at 4 kHz. The edge device is an ESP32 (520KB RAM, 240 MHz).

Constraints:

  • Model must fit in 150KB
  • Inference < 100ms
  • Target: >90% recall for Critical class, >70% precision

Model Comparison:

| Model | Size | Latency | Critical Recall | Critical Precision |
|---|---|---|---|---|
| Random Forest (100 trees) | 2.1 MB | 45ms | 87% | 68% |
| Decision Tree Ensemble (10 trees) | 89 KB | 8ms | 82% | 71% |
| + Frequency Features | 112 KB | 35ms | 91% | 74% |
| + INT8 Quantization | 28 KB | 28ms | 90% | 73% |

Result: INT8 quantized decision tree ensemble with 16 features achieves 90% critical recall, 28ms inference, 28KB model size.

Key Insight: Start with the simplest model that fits your constraints, then add complexity only if metrics demand it.

Try It: Quantization Trade-off Explorer

Explore how model quantization reduces size and latency at the cost of accuracy. Adjust the original model parameters to see how INT8 and INT4 quantization affect deployment feasibility.

5.13 Knowledge Check

Scenario: A chemical plant operates 120 centrifugal pumps critical to production. Unplanned downtime costs $50,000/hour. Current reactive maintenance causes 8-12 emergency failures per year. Goal: Build ML model to predict pump failure 7 days in advance.

Available Data: 18 months of operational data from 120 pumps with 47 documented failures.

Complete 7-Step Pipeline Implementation:

Step 1: Data Collection

# Sensor data collected at 1-minute intervals
sensors = {
    'vibration_x': 'Accelerometer X-axis (m/s²)',
    'vibration_y': 'Accelerometer Y-axis (m/s²)',
    'vibration_z': 'Accelerometer Z-axis (m/s²)',
    'temperature_bearing': 'Bearing temperature (°C)',
    'temperature_motor': 'Motor temperature (°C)',
    'flow_rate': 'GPM',
    'pressure_inlet': 'PSI',
    'pressure_outlet': 'PSI',
    'power_consumption': 'kW',
    'rpm': 'Rotations per minute'
}

# Failure labels from maintenance logs
# Label: 1 (failure within 7 days), 0 (normal operation)

# Class distribution:
# Normal: 775,000 samples (99.4%)
# Pre-failure: 4,700 samples (0.6%)
# → Severe class imbalance!

Step 2: Data Cleaning

import pandas as pd
import numpy as np

def clean_sensor_data(df):
    # Remove sensor glitches (physically impossible values)
    df = df[(df['vibration_magnitude'] >= 0) & (df['vibration_magnitude'] < 50)]
    df = df[(df['temperature_bearing'] > 0) & (df['temperature_bearing'] < 150)]

    # Handle missing values (sensor dropouts)
    # Linear interpolation for gaps < 10 minutes
    for col in sensor_columns:
        df[col] = df[col].interpolate(method='linear', limit=10)

    # Drop remaining NaN (gaps > 10 min indicate sensor failure)
    df = df.dropna()

    # Remove duplicate timestamps (logging errors)
    df = df.drop_duplicates(subset=['pump_id', 'timestamp'])

    # Synchronize multi-sensor data (align to 1-min buckets)
    df['timestamp'] = df['timestamp'].dt.floor('1min')

    return df

# Result: 775k samples → 761k samples after cleaning (1.8% removed)

Step 3: Feature Engineering

import numpy as np
import scipy.stats  # needed for the kurtosis feature below

def extract_pump_features(df_window):
    """Extract features from 1-hour sliding window"""
    features = {}

    # Vibration features (key indicators of bearing wear)
    vib_magnitude = np.linalg.norm(df_window[['vibration_x', 'vibration_y', 'vibration_z']], axis=1)
    features['vib_rms'] = np.sqrt(np.mean(vib_magnitude ** 2))
    features['vib_peak'] = np.max(vib_magnitude)
    features['vib_kurtosis'] = scipy.stats.kurtosis(vib_magnitude)  # Spikiness indicates wear

    # Temperature trends (gradual increase before failure)
    features['temp_bearing_mean'] = df_window['temperature_bearing'].mean()
    features['temp_bearing_trend'] = (df_window['temperature_bearing'].iloc[-1] -
                                       df_window['temperature_bearing'].iloc[0])
    features['temp_motor_vs_bearing'] = (df_window['temperature_motor'].mean() -
                                          df_window['temperature_bearing'].mean())

    # Flow anomalies (cavitation or blockage)
    features['flow_std'] = df_window['flow_rate'].std()
    features['flow_efficiency'] = (df_window['flow_rate'].mean() /
                                    df_window['power_consumption'].mean())

    # Pressure differential (pump performance)
    features['pressure_diff'] = (df_window['pressure_outlet'].mean() -
                                  df_window['pressure_inlet'].mean())
    features['pressure_diff_std'] = (df_window['pressure_outlet'] -
                                      df_window['pressure_inlet']).std()

    # Power consumption patterns
    features['power_per_flow'] = (df_window['power_consumption'].mean() /
                                   df_window['flow_rate'].mean())

    return features

# Result: 12 engineered features from 10 raw sensors

Step 4: Train/Test Split (CRITICAL for Time-Series)

# WRONG: Random split (causes data leakage!)
# X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

# RIGHT: Chronological split by pump
# Training: First 14 months, all pumps
# Validation: Months 15-16, all pumps
# Test: Months 17-18, all pumps (never seen by model)

train_cutoff = '2023-03-01'
val_cutoff = '2023-05-01'

X_train = X[X['timestamp'] < train_cutoff]
y_train = y[X['timestamp'] < train_cutoff]

X_val = X[(X['timestamp'] >= train_cutoff) & (X['timestamp'] < val_cutoff)]
y_val = y[(X['timestamp'] >= train_cutoff) & (X['timestamp'] < val_cutoff)]

X_test = X[X['timestamp'] >= val_cutoff]
y_test = y[X['timestamp'] >= val_cutoff]

# Also split by pump (test generalization to pumps never seen in training)
import random
test_pump_ids = random.sample(pump_ids, 24)  # 24 of 120 pumps = 20%

Step 5: Model Selection with Class Imbalance Handling

from sklearn.ensemble import RandomForestClassifier
from imblearn.over_sampling import SMOTE

# Handle 99.4% / 0.6% imbalance with SMOTE
smote = SMOTE(sampling_strategy=0.1)  # Increase minority to 10% (not 50% - too synthetic)
X_train_balanced, y_train_balanced = smote.fit_resample(X_train, y_train)

# Random Forest with class weights
model = RandomForestClassifier(
    n_estimators=200,
    max_depth=15,
    class_weight={0: 1, 1: 50},  # Penalize false negatives 50× more
    random_state=42
)

model.fit(X_train_balanced, y_train_balanced)

Step 6: Evaluation with Appropriate Metrics

from sklearn.metrics import classification_report, roc_auc_score, precision_recall_curve

y_pred = model.predict(X_test)
y_pred_proba = model.predict_proba(X_test)[:, 1]

# Accuracy is USELESS for imbalanced data
accuracy = (y_pred == y_test).mean()
print(f"Accuracy: {accuracy:.1%}")  # 98.7% - but predicting all "normal" gives 99.4%!

# Use precision, recall, F1, ROC-AUC
print(classification_report(y_test, y_pred))

# Results:
#               precision  recall  f1-score
# Normal (0)      0.997     0.982     0.989
# Pre-fail (1)    0.68      0.91      0.78   ← Key metric
# ROC-AUC: 0.94

# Business metric: Catch 91% of failures with 68% precision
# → 91% of failures predicted 7 days early
# → 32% false alarms (acceptable for $50k/hr downtime cost)

Step 7: Deployment and Monitoring

# Deploy to edge gateway (Raspberry Pi 4 at each pump station)
# Model size: 4.2 MB (fits in RAM)
# Inference time: 15 ms per pump (8 pumps per gateway = 120 ms total)

# Monitoring dashboard tracks:
# 1. Prediction distribution (drift detection)
# 2. Feature values vs training distribution (KL divergence)
# 3. False alarm rate (business metric)
# 4. Missed failure rate (critical safety metric)

# Alert rule:
# If probability > 0.7 for 3 consecutive hours → Alert maintenance team
# If probability > 0.9 → Immediate inspection required
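The alert rule in the comments above can be implemented as a small stateful check; the function name is illustrative, but the thresholds mirror the rule.

```python
def alert_level(prob_history, hours=3, watch=0.7, critical=0.9):
    """Map recent hourly failure probabilities (oldest first) to an alert level."""
    if prob_history and prob_history[-1] > critical:
        return 'immediate_inspection'
    # Sustained elevated risk: watch threshold exceeded `hours` times in a row
    if len(prob_history) >= hours and all(p > watch for p in prob_history[-hours:]):
        return 'alert_maintenance'
    return 'normal'
```

Requiring three consecutive elevated readings suppresses one-off probability spikes, trading a few hours of alert latency for far fewer false alarms.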

Results:

  • Before ML: 8-12 emergency failures/year, $400k-$600k downtime cost
  • After ML (first year): 2 emergency failures (missed predictions), 10 successful early interventions, $100k downtime cost
  • ROI: $500k savings - $80k implementation cost = $420k net benefit
  • Payback period: 2 months

Key Lesson: The 7-step pipeline structure forced systematic thinking that caught the data leakage risk (Step 4) and class imbalance issue (Step 6) that would have made a naive model unusable.

Use this decision tree to select appropriate ML algorithms based on your constraints and data characteristics:

| Decision Point | Question | If YES → | If NO → |
|---|---|---|---|
| 1. Data Type | Is your data time-series (sequential sensor readings)? | Consider LSTM, GRU, or time-series features + Random Forest | Use standard tabular ML (skip to 2) |
| 2. Labeled Data | Do you have >1,000 labeled examples per class? | Supervised learning (proceed to 3) | Use unsupervised (anomaly detection, clustering) |
| 3. Real-Time | Must inference run in <10ms on MCU? | Decision Tree or tiny MLP (proceed to 4) | Can use Random Forest, SVM, or NN (proceed to 5) |
| 4. Memory | RAM <100 KB? | Decision Tree (≤10 depth) or quantized MLP | Random Forest (50-100 trees) or SVM |
| 5. Interpretability | Must you explain predictions to regulators/users? | Decision Tree or Linear Model | Neural Network acceptable |
| 6. Class Balance | Is minority class <10% of data? | Apply SMOTE + class weights | Standard training OK |
| 7. Complexity | Tried simple model first? | Deploy simple model; iterate if needed | Start with simplest: Decision Tree |
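Decision points 3-5 can also be encoded directly as a function, which is a handy way to document a team's selection policy; this simplified sketch covers only the device-constraint branches.

```python
def pick_model(latency_budget_ms, ram_kb, needs_interpretability):
    """Simplified encoding of decision points 3-5 above (illustrative only)."""
    if latency_budget_ms < 10:          # point 3: hard real-time on MCU
        if ram_kb < 100:                # point 4: tight memory
            return 'Decision Tree (depth <= 10) or quantized MLP'
        return 'Random Forest (50-100 trees) or SVM'
    if needs_interpretability:          # point 5: explainable predictions
        return 'Decision Tree or Linear Model'
    return 'Random Forest, SVM, or Neural Network'
```

Encoding the policy in code makes model choices reviewable and repeatable across projects, rather than ad hoc per deployment.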

Algorithm Selection Matrix:

| Algorithm | Accuracy | Speed | Memory | Interpretability | Best For |
|---|---|---|---|---|---|
| Decision Tree | Medium | <1ms | 5-50 KB | High | MCU deployment, need explainability |
| Random Forest | High | 5-20ms | 200-500 KB | Medium | Tabular data, ESP32+ devices |
| SVM | High | 1-5ms | 10-100 KB | Low | High-dimensional, small datasets |
| Neural Network | Very High | 20-100ms | 1-10 MB | Very Low | Complex patterns, cloud/edge |
| Quantized NN | High | 5-30ms | 50-200 KB | Very Low | Compromise: accuracy + edge deployment |
| LSTM/GRU | Very High | 50-200ms | 5-20 MB | Very Low | Sequential time-series, cloud |

Example Decision Paths:

Scenario 1: Fall Detection on Wearable (Cortex-M4, 256KB RAM)

  • Time-series? YES → Extract time-domain features (no LSTM on MCU)
  • Labeled data? YES (200 fall events, 5,000 normal activities)
  • Real-time <10ms? YES (safety-critical)
  • RAM <100 KB? NO (256 KB available)

Decision: Random Forest (10 trees) with 8 time-domain features
Result: 96% recall, 8 ms inference, 45 KB model

Scenario 2: Predictive Maintenance on Industrial Gateway (RPi 4, 4GB RAM)

  • Time-series? YES → Can use LSTM or feature engineering
  • Labeled data? YES (1,500 failure events over 2 years)
  • Real-time <10ms? NO (hourly predictions acceptable)
  • Memory constraint? NO (plenty of RAM)
  • Interpretability? YES (maintenance teams need explanations)

Decision: Random Forest (200 trees) with 15 engineered features
Result: 91% recall, 15 ms inference, 4.2 MB model, feature importance plots

Scenario 3: Smart Thermostat Occupancy Detection (ESP32, 520KB RAM)

  • Time-series? NO (snapshot features: temperature, humidity, CO2, motion)
  • Labeled data? YES (10,000 labeled samples: occupied/vacant)
  • Real-time <10ms? NO (update every minute)
  • RAM <100 KB? NO (520 KB available)
  • Interpretability? NO (consumer product, no explanation needed)

Decision: Small Neural Network (3 layers: 4→8→8→1)
Result: 94% accuracy, 2 ms inference, 12 KB model

Common Pitfall: Starting with deep learning because it’s trendy, then realizing it won’t fit on the target device. Always start with the simplest model that might work (Decision Tree), then increase complexity only if metrics demand it.

Checklist Before Deploying ML Model:

5.14 Concept Relationships

The IoT ML Pipeline integrates all concepts from the ML series:

The pipeline represents best practices distilled from thousands of failed IoT ML projects—skipping any step (especially the chronological split in Step 4) leads to models that appear accurate in development but fail catastrophically in production.

5.15 See Also

Related Chapters:

External Resources:

5.16 Try It Yourself

Hands-On Challenge: Implement the 7-step pipeline for temperature anomaly detection

Task: Build a complete pipeline from scratch using Python:

  1. Generate Synthetic Data (Step 1):
    • Create 30 days of hourly temperature readings (720 samples)
    • Normal: 20-25°C with daily cycles
    • Inject 10 anomalies (spikes to 40°C or drops to 5°C)
  2. Clean Data (Step 2):
    • Add 5% missing values, interpolate gaps <3 hours
    • Add sensor glitches (readings > 50°C), remove them
  3. Feature Engineering (Step 3):
    • Per 6-hour window: mean, std, rate of change, deviation from 24h average
  4. Split Data (Step 4):
    • Chronological: Days 1-21 train, 22-25 validation, 26-30 test
  5. Train Model (Step 5):
    • Isolation Forest for anomaly detection
  6. Evaluate (Step 6):
    • Precision, recall, F1 on test set
  7. Deploy & Monitor (Step 7):
    • Run on new data, track false positive rate
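The steps above can be sketched as a compact starting point, assuming NumPy and scikit-learn; all constants (noise level, daily-cycle amplitude, contamination rate) are illustrative and worth experimenting with.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Step 1: 30 days of hourly temperature with a daily cycle, plus 10 anomalies
hours = np.arange(720)
temp = 22.5 + 2.5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.3, 720)
anomaly_idx = rng.choice(720, size=10, replace=False)
temp[anomaly_idx] = rng.choice([40.0, 5.0], size=10)  # spikes and drops

# Step 3: features per non-overlapping 6-hour window (mean, std, range)
windows = temp.reshape(-1, 6)                      # 120 windows
X = np.column_stack([windows.mean(axis=1), windows.std(axis=1),
                     windows.max(axis=1) - windows.min(axis=1)])
y = np.isin(np.arange(720).reshape(-1, 6), anomaly_idx).any(axis=1)

# Step 4: chronological split (days 1-21 train, days 26-30 test)
X_train, X_test, y_test = X[:84], X[100:], y[100:]

# Steps 5-6: Isolation Forest fit on past windows, scored on future windows
clf = IsolationForest(contamination=0.05, random_state=42).fit(X_train)
pred = clf.predict(X_test) == -1                   # -1 means "anomaly"
recall = (pred & y_test).sum() / max(y_test.sum(), 1)
print(f"Anomalous test windows flagged: {recall:.0%}")
```

Swapping the chronological slice for a random split on this same data is an easy way to observe the leakage effect described below.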

What to Observe:

  • Random split inflates accuracy by 10-20% vs chronological split (data leakage)
  • Feature engineering reduces false positives dramatically vs raw values
  • Small training sets (<100 samples) lead to overfitting

5.17 Summary

This chapter covered the systematic 7-step IoT ML pipeline:

  1. Data Collection: Gather diverse, representative sensor data
  2. Data Cleaning: Remove outliers, handle missing values, align timestamps
  3. Feature Engineering: Extract time-domain and frequency-domain features
  4. Train/Test Split: Use chronological splits to avoid data leakage
  5. Model Selection: Choose based on RAM, latency, and accuracy constraints
  6. Evaluation: Use F1-score, recall, precision for imbalanced data
  7. Deployment: Monitor inference latency and prediction drift

Key Takeaways:

  • Chronological splits are mandatory for time-series data
  • Class imbalance requires specialized metrics and techniques
  • Start simple (Decision Tree), add complexity only if needed
  • Real-world accuracy is typically 10-15% lower than lab accuracy

Building a smart brain for sensors – the Sensor Squad’s 7-step recipe!

Max the Microcontroller wants to build a brain that can predict when a machine is about to break. But how do you teach a computer brain? The Sensor Squad follows a 7-step recipe!

Step 1 - Collect Data: Sammy the Sensor listens to the machine for 30 days, recording every vibration and temperature change. “I need to hear what NORMAL sounds like AND what BREAKING sounds like!”

Step 2 - Clean Up: Some of Sammy’s recordings have glitches – like when someone bumped into the sensor. Max removes these mistakes. “Garbage in, garbage out!” he says.

Step 3 - Find Clues: Instead of feeding the brain millions of raw numbers, they extract clues: “How loud was the average vibration? Did the temperature go up? Were there any sudden spikes?”

Step 4 - Split the Data: Here is the tricky part! They use the FIRST 3 weeks to teach the brain, and the LAST week to test it. “Never peek at the test!” warns Lila the LED. “That would be like studying the answer key before an exam!”

Step 5 - Pick a Brain Type: Small device? Use a simple brain (decision tree). Big computer? Use a fancy brain (neural network). “Start simple!” says Max.

Step 6 - Grade the Brain: How many times did it predict correctly? Did it miss any real breakdowns? “Getting 95% right sounds good, but if it misses the ONE time the machine actually breaks, that is terrible!” explains Bella the Battery.

Step 7 - Deploy and Watch: The brain goes to work! But they keep checking it every month. “Machines change over time,” says Sammy. “Our brain needs to keep learning too!”

5.17.1 Try This at Home!

Try predicting tomorrow’s weather using only the last 7 days. Write down the temperature each day for a week. On day 8, guess the temperature before checking. Were you close? That is basically what machine learning does – it looks at patterns in old data to predict the future!

5.18 What’s Next

| Direction | Chapter | Focus |
|---|---|---|
| Next | Edge ML & Deployment | TinyML techniques for resource-constrained IoT devices |
| Previous | Mobile Sensing & Activity | HAR pipeline and transportation mode detection |
| Related | Production ML | Monitoring, drift detection, and continuous retraining |