1350  IoT Machine Learning Pipeline

1350.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Design ML Pipelines: Implement a systematic 7-step ML pipeline for IoT applications
  • Avoid Common Pitfalls: Recognize and address data leakage, overfitting, and class imbalance
  • Select Appropriate Models: Choose ML algorithms based on accuracy, latency, and deployment constraints
  • Evaluate IoT ML Systems: Use appropriate metrics for imbalanced IoT datasets

1350.2 Prerequisites

Note: Chapter Series - Modeling and Inferencing

This is part 3 of the IoT Machine Learning series:

  1. ML Fundamentals - Core concepts
  2. Mobile Sensing - HAR, transportation
  3. IoT ML Pipeline (this chapter) - 7-step pipeline
  4. Edge ML & Deployment - TinyML
  5. Audio Feature Processing - MFCC
  6. Feature Engineering - Feature design
  7. Production ML - Monitoring

1350.3 The 7-Step IoT ML Pipeline

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1'}}}%%
flowchart LR
    S1[1. Data<br/>Collection] --> S2[2. Data<br/>Cleaning]
    S2 --> S3[3. Feature<br/>Engineering]
    S3 --> S4[4. Train/Test<br/>Split]
    S4 --> S5[5. Model<br/>Selection]
    S5 --> S6[6. Evaluation<br/>& Tuning]
    S6 --> S7[7. Deployment<br/>& Monitoring]

    S7 -.->|Retrain| S1

    style S1 fill:#2C3E50,stroke:#16A085,color:#fff
    style S2 fill:#16A085,stroke:#2C3E50,color:#fff
    style S3 fill:#E67E22,stroke:#2C3E50,color:#fff
    style S4 fill:#9B59B6,stroke:#2C3E50,color:#fff
    style S5 fill:#3498DB,stroke:#2C3E50,color:#fff
    style S6 fill:#1ABC9C,stroke:#2C3E50,color:#fff
    style S7 fill:#27AE60,stroke:#2C3E50,color:#fff

Figure 1350.1: Seven-Step IoT ML Pipeline with Continuous Feedback Loop

1350.4 Step 1: Data Collection

Goal: Gather representative sensor data that captures the full range of conditions your model will encounter.

Data Source     Sampling Rate   Duration    Labels
Accelerometer   50-100 Hz       1-2 weeks   Activity type
Temperature     1-60 Hz         30+ days    Normal/Anomaly
Audio           16 kHz          10+ hours   Keyword/Not

Best Practices:

  • Collect from diverse users, devices, and environments
  • Include edge cases (unusual activities, sensor noise)
  • Document collection conditions (timestamp, device model, location); a minimal logging sketch follows this list
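
A minimal sketch of the documentation practice above; the field names and JSONL file format are illustrative choices, not tied to any particular SDK:

import json
import time

def make_record(accel_xyz, device_id, location, activity_label):
    """Bundle one accelerometer sample with its collection context."""
    return {
        'timestamp': time.time(),       # Unix epoch seconds
        'device_id': device_id,         # e.g., hardware serial number
        'location': location,           # free text or a GPS fix
        'label': activity_label,        # ground-truth activity
        'accel_x': accel_xyz[0],
        'accel_y': accel_xyz[1],
        'accel_z': accel_xyz[2],
    }

# Append one JSON record per line (JSONL) so logs stream and merge easily
with open('collection_log.jsonl', 'a') as f:
    record = make_record((0.1, -0.2, 9.8), 'dev-042', 'lab-3', 'walking')
    f.write(json.dumps(record) + '\n')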

1350.5 Step 2: Data Cleaning

Goal: Remove noise, handle missing values, and ensure data quality.

# Common cleaning operations
import pandas as pd

def clean_sensor_data(df: pd.DataFrame) -> pd.DataFrame:
    # Sort by time so gap interpolation fills values in the right order
    df = df.sort_values('timestamp')

    # Remove outliers (sensor glitches): drop physically impossible magnitudes
    df = df[(df['accel_mag'] > 0) & (df['accel_mag'] < 50)]

    # Handle missing values: interpolate short gaps (<= 10 samples), drop the rest
    df = df.interpolate(method='linear', limit=10)
    df = df.dropna()

    # Remove duplicate readings that share a timestamp
    df = df.drop_duplicates(subset=['timestamp'])

    return df

Key Operations:

  • Outlier removal: Filter physically impossible values
  • Gap filling: Interpolate short gaps (< 10 samples)
  • Timestamp alignment: Synchronize multi-sensor data (see the alignment sketch below)
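
Timestamp alignment deserves its own example; a minimal sketch using pandas merge_asof to pair each accelerometer sample with the nearest earlier temperature reading (column names are illustrative):

import pandas as pd

# Two sensors sampled at different rates, both keyed by time
accel = pd.DataFrame({'timestamp': pd.to_datetime(['2024-01-01 00:00:00.00',
                                                   '2024-01-01 00:00:00.02']),
                      'accel_mag': [9.8, 10.1]})
temp = pd.DataFrame({'timestamp': pd.to_datetime(['2024-01-01 00:00:00.01']),
                     'temp_c': [21.5]})

# merge_asof matches each accel row to the most recent temp reading
# within a 1-second tolerance; unmatched rows get NaN
aligned = pd.merge_asof(accel.sort_values('timestamp'),
                        temp.sort_values('timestamp'),
                        on='timestamp',
                        direction='backward',
                        tolerance=pd.Timedelta('1s'))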

1350.6 Step 3: Feature Engineering

Goal: Transform raw sensor data into discriminative features that capture patterns.

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1'}}}%%
flowchart TB
    Raw[Raw Sensor Data<br/>100 Hz, 3-axis] --> Window[Sliding Window<br/>2 sec, 50% overlap]

    Window --> Time[Time Domain<br/>Mean, Std, Min, Max<br/>Zero Crossings]
    Window --> Freq[Frequency Domain<br/>FFT Peaks, Energy<br/>Spectral Entropy]

    Time --> Vector[Feature Vector<br/>27 features/window]
    Freq --> Vector

    Vector --> Norm[Normalize<br/>Zero mean, unit variance]

    style Raw fill:#7F8C8D,stroke:#2C3E50,color:#fff
    style Window fill:#2C3E50,stroke:#16A085,color:#fff
    style Time fill:#16A085,stroke:#2C3E50,color:#fff
    style Freq fill:#16A085,stroke:#2C3E50,color:#fff
    style Vector fill:#E67E22,stroke:#2C3E50,color:#fff
    style Norm fill:#27AE60,stroke:#2C3E50,color:#fff

Figure 1350.2: Feature Engineering Pipeline from Raw Data to Normalized Feature Vector

Feature Categories:

Category          Features                              Purpose
Statistical       Mean, Std, Min, Max, IQR              Central tendency, spread
Signal Shape      Zero crossings, Peak count            Periodicity indicators
Frequency         FFT peaks, Spectral energy            Periodic patterns
Domain-Specific   Step frequency, Bearing frequencies   Application knowledge
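
A minimal sketch of the windowing and extraction flow in Figure 1350.2; the features shown are a small subset, and a full 27-feature vector would add more from each category:

import numpy as np

def extract_features(signal, fs=100, win_s=2.0, overlap=0.5):
    """Slide a window over a 1-D signal and compute features per window."""
    win = int(fs * win_s)
    step = int(win * (1 - overlap))
    features = []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win]
        # Time domain: sign changes between consecutive samples
        zero_crossings = np.sum(np.signbit(w[:-1]) != np.signbit(w[1:]))
        # Frequency domain: magnitude spectrum with DC removed
        spectrum = np.abs(np.fft.rfft(w - w.mean()))
        features.append([
            w.mean(), w.std(), w.min(), w.max(),   # statistical features
            zero_crossings,
            spectrum.argmax() * fs / win,          # dominant frequency (Hz)
            (spectrum ** 2).sum(),                 # spectral energy
        ])
    return np.array(features)

# 10 s of stand-in 100 Hz accelerometer magnitude -> (9, 7) feature matrix
X = extract_features(np.random.randn(1000))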

1350.7 Step 4: Train/Test Split

Critical for Time-Series: Use chronological splits, NOT random splits.

Warning: Data Leakage

Wrong: Random 80/20 split (future data leaks into training)

Right: Chronological split (train on past, test on future)

# WRONG - Data leakage: random sampling mixes future windows into training
from sklearn.model_selection import train_test_split
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

# RIGHT - Chronological split: train on the past, evaluate on the future
train_end = int(len(X) * 0.7)
val_end = int(len(X) * 0.85)

X_train, y_train = X[:train_end], y[:train_end]              # Days 1-21
X_val, y_val = X[train_end:val_end], y[train_end:val_end]    # Days 22-25
X_test, y_test = X[val_end:], y[val_end:]                    # Days 26-30

Split Strategy:

Split        Percentage   Purpose
Training     70%          Learn patterns
Validation   15%          Tune hyperparameters
Test         15%          Final evaluation
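
When a single chronological split leaves too little data for tuning, forward-chaining cross-validation preserves the same past-to-future discipline; a minimal sketch with scikit-learn's TimeSeriesSplit:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(30).reshape(-1, 1)   # 30 days of stand-in features

# Each fold trains on an expanding past window and validates on the next block
tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
    print(f"fold {fold}: train days {train_idx[0]}-{train_idx[-1]}, "
          f"validate days {val_idx[0]}-{val_idx[-1]}")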

1350.8 Step 5: Model Selection

Choose based on constraints:

Model            Accuracy   Model Size   Inference   Best For
Decision Tree    80-85%     5-50 KB      < 1 ms      Interpretable, MCU
Random Forest    88-93%     200-500 KB   5-20 ms     Tabular data, ESP32
SVM              85-90%     10-100 KB    1-5 ms      High-dimensional
Neural Network   92-98%     1-10 MB      20-100 ms   Complex patterns
Quantized NN     90-95%     50-200 KB    5-30 ms     Edge AI

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1'}}}%%
flowchart TB
    Start[Start Model<br/>Selection] --> RAM{RAM < 256KB?}

    RAM -->|Yes| Simple[Decision Tree<br/>or Logistic Reg]
    RAM -->|No| Latency{Latency < 10ms?}

    Latency -->|Yes| RF[Random Forest<br/>50-100 trees]
    Latency -->|No| Accuracy{Need 95%+ acc?}

    Accuracy -->|Yes| NN[Neural Network<br/>Quantized]
    Accuracy -->|No| RF

    style Start fill:#2C3E50,stroke:#16A085,color:#fff
    style RAM fill:#E67E22,stroke:#2C3E50,color:#fff
    style Latency fill:#E67E22,stroke:#2C3E50,color:#fff
    style Accuracy fill:#E67E22,stroke:#2C3E50,color:#fff
    style Simple fill:#27AE60,stroke:#2C3E50,color:#fff
    style RF fill:#27AE60,stroke:#2C3E50,color:#fff
    style NN fill:#27AE60,stroke:#2C3E50,color:#fff

Figure 1350.3: Model Selection Decision Tree Based on Device Constraints
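
The branches in Figure 1350.3 can be checked empirically before committing to a model; a minimal sketch that trains a candidate and measures its serialized size and single-sample latency (pickle size is only a rough proxy for the deployed footprint, and the data here is synthetic):

import pickle
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.randn(1000, 27)        # 27 features/window, as in Fig. 1350.2
y = np.random.randint(0, 4, 1000)    # 4 activity classes (illustrative)

model = RandomForestClassifier(n_estimators=100).fit(X, y)

# Serialized size as a rough stand-in for on-device model size
size_kb = len(pickle.dumps(model)) / 1024

# Wall-clock time for one single-sample prediction
t0 = time.perf_counter()
model.predict(X[:1])
latency_ms = (time.perf_counter() - t0) * 1000

print(f"size: {size_kb:.0f} KB, latency: {latency_ms:.2f} ms")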

1350.9 Step 6: Evaluation and Tuning

Use appropriate metrics for IoT:

Use Case               Primary Metric        Why
Activity Recognition   F1-Score, Accuracy    Balanced classes
Fall Detection         Recall, Specificity   Rare events, false-alarm cost
Anomaly Detection      Precision @ Recall    Class imbalance
Prediction             MAE, RMSE             Continuous output
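
For the imbalanced rows in this table, accuracy alone hides failure; a minimal sketch showing why, using a synthetic 99%/1% fall-detection label set and scikit-learn's per-class report:

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Synthetic 99% normal / 1% fall labels, plus a naive all-normal predictor
y_true = np.array([0] * 990 + [1] * 10)
y_naive = np.zeros_like(y_true)

# 99% accuracy but 0% recall on falls - the failure mode from Pitfall 2
print(confusion_matrix(y_true, y_naive))
print(classification_report(y_true, y_naive,
                            target_names=['normal', 'fall'],
                            zero_division=0))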

Hyperparameter Tuning:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 15, None],
    'min_samples_leaf': [1, 2, 5]
}

grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=TimeSeriesSplit(n_splits=5),  # forward-chaining CV; plain k-fold would leak future data
    scoring='f1_macro',
    n_jobs=-1
)

grid_search.fit(X_train, y_train)
best_model = grid_search.best_estimator_

1350.10 Step 7: Deployment and Monitoring

Deployment Options:

Location           When to Use                       Tools
Edge (MCU)         Real-time, offline                TensorFlow Lite Micro
Edge (ESP32/RPi)   Moderate complexity               TensorFlow Lite
Cloud              Complex models, fleet analytics   AWS SageMaker, Azure ML

Monitoring Checklist:

  • Track inference latency (P50, P95, P99)
  • Monitor prediction distribution
  • Detect feature drift (KL divergence); a drift-check sketch follows this list
  • Set alerts for accuracy degradation
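
One concrete way to implement the drift check above: histogram each feature over a live window and compare it with the training distribution. A minimal sketch using scipy's rel_entr (the 0.1 alert threshold is an illustrative assumption, not a standard value):

import numpy as np
from scipy.special import rel_entr

def kl_divergence(train_vals, live_vals, bins=20):
    """KL(live || train) over a shared histogram of one feature."""
    lo = min(train_vals.min(), live_vals.min())
    hi = max(train_vals.max(), live_vals.max())
    p, _ = np.histogram(live_vals, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(train_vals, bins=bins, range=(lo, hi), density=True)
    p, q = p + 1e-9, q + 1e-9          # smooth empty bins
    p, q = p / p.sum(), q / q.sum()
    return rel_entr(p, q).sum()

train_feat = np.random.normal(0.0, 1, 10_000)  # feature at training time
live_feat = np.random.normal(0.5, 1, 1_000)    # drifted live window

if kl_divergence(train_feat, live_feat) > 0.1:  # illustrative threshold
    print("feature drift detected - consider retraining")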

1350.11 Common Pipeline Pitfalls

Caution - Pitfall 1: Training on Clean Lab Data, Deploying to Noisy Real World

The Mistake: Developing ML models using carefully curated datasets collected under controlled laboratory conditions, then expecting the same performance when deployed to production environments.

Why It Happens: Lab datasets are convenient, well-labeled, and produce impressive accuracy numbers. Real-world data collection is expensive and messy.

The Fix:

  1. Data augmentation: Add synthetic noise matching expected sensor characteristics (see the sketch below)
  2. Domain randomization: Train on data from multiple devices and environments
  3. Staged deployment: Deploy to 5% of devices first, monitor for accuracy degradation
  4. Graceful degradation: Output confidence scores, reject uncertain predictions

Rule of thumb: If your lab accuracy is 95%, budget for 80-85% real-world accuracy.
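
The augmentation item in the fix list can be as simple as injecting noise and offsets that mimic field conditions; a minimal sketch (the noise scales are illustrative assumptions, not measured sensor characteristics):

import numpy as np

def augment(window, noise_std=0.05, drift_max=0.1, rng=np.random.default_rng()):
    """Simulate field conditions on one clean accelerometer window."""
    noisy = window + rng.normal(0, noise_std, window.shape)   # sensor noise
    noisy += rng.uniform(-drift_max, drift_max)               # calibration offset
    return noisy

# Each clean lab window yields several field-like training windows
clean = np.random.randn(200)
augmented = [augment(clean) for _ in range(5)]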

Caution - Pitfall 2: Ignoring Class Imbalance

The Mistake: Training on imbalanced data (95% normal, 5% anomaly) and celebrating “95% accuracy” when the model just predicts “normal” for everything.

Why It Happens: Accuracy rewards majority class prediction.

The Fix:

  • Use precision, recall, F1-score, and ROC-AUC
  • Apply class weighting or SMOTE oversampling (a class-weighting sketch follows the example below)
  • Set decision thresholds based on business costs

Example: Fall detection with 99% normal, 1% falls:

  • Naive model: 99% accuracy, 0% recall (misses all falls!)
  • Proper model: 95% accuracy, 90% recall (catches most falls)
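
Class weighting is often the cheapest of the fixes above; a minimal sketch using scikit-learn's built-in 'balanced' mode on synthetic data:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score

X = np.random.randn(1000, 10)
y = np.array([0] * 990 + [1] * 10)     # 99% normal, 1% fall

# 'balanced' scales each class weight by n_samples / (n_classes * class_count),
# so the 10 fall samples count as heavily as the 990 normal ones
model = RandomForestClassifier(class_weight='balanced').fit(X, y)

# Evaluated on training data only, for illustration
print("fall recall:", recall_score(y, model.predict(X), pos_label=1))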

Caution - Pitfall 3: Data Leakage in Time-Series

The Mistake: Using random train/test splits on time-series data, allowing the model to “see the future” during training.

Why It Happens: scikit-learn's train_test_split shuffles rows by default, so windows from the same future period land in both the training and test sets.

The Fix: Always use chronological splits:

  • Training: Past data (days 1-21)
  • Validation: Near future (days 22-25)
  • Test: Far future (days 26-30)

Impact: Models with data leakage show 10-20% higher test accuracy than real-world performance.

1350.12 Worked Example: Model Selection for Industrial Predictive Maintenance

Scenario: A manufacturing plant needs vibration-based predictive maintenance on 500 CNC machines. Each machine has a 3-axis accelerometer sampled at 4 kHz. The edge device is an ESP32 (520 KB RAM, 240 MHz).

Constraints:

  • Model must fit in 150 KB
  • Inference < 100 ms
  • Target: > 90% recall for the Critical class, > 70% precision

Model Comparison:

Model                               Size     Latency   Critical Recall   Critical Precision
Random Forest (100 trees)           2.1 MB   45 ms     87%               68%
Decision Tree Ensemble (10 trees)   89 KB    8 ms      82%               71%
+ Frequency Features                112 KB   35 ms     91%               74%
+ INT8 Quantization                 28 KB    28 ms     90%               73%

Result: The INT8-quantized decision tree ensemble with 16 features achieves 90% Critical recall, 28 ms inference, and a 28 KB model size.
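
The path from the winning ensemble to firmware is not shown above; one plausible route is transpiling the trained trees to plain C and compiling them into the ESP32 image. A minimal sketch using the m2cgen library (an illustrative option, not necessarily the plant's toolchain; the INT8 quantization step in the table would be a separate pass):

import m2cgen
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in training data: 16 features, 3 health classes (illustrative)
X = np.random.randn(500, 16)
y = np.random.randint(0, 3, 500)   # 0=Normal, 1=Warning, 2=Critical

model = RandomForestClassifier(n_estimators=10, max_depth=8).fit(X, y)

# Transpile the trained ensemble to dependency-free C for the firmware build
with open('model.c', 'w') as f:
    f.write(m2cgen.export_to_c(model))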

Key Insight: Start with the simplest model that fits your constraints, then add complexity only if metrics demand it.

1350.13 Knowledge Check

Question 1: What is the primary advantage of Random Forests over single Decision Trees for activity recognition?

Explanation: A Random Forest is an ensemble of many decision trees, each trained on a random subset of the data and features. Voting (classification) or averaging (regression) produces the final prediction. A single tree achieves 75% test accuracy, while a Random Forest of 100 trees reaches 90% because the ensemble averages out the mistakes of individual trees.

Question 2: What machine learning technique is most appropriate for real-time activity recognition on a smartphone with limited labeled data?

Explanation: Transfer learning leverages knowledge from large datasets to improve performance when labeled data is scarce. Pre-train on a public dataset (100k users), freeze the lower layers (which capture general motion features), and fine-tune the upper layers on the target user's data. This achieves 85% accuracy with 1 hour of personal data, versus 70% when training from scratch. A minimal sketch follows.
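
A minimal sketch of that freeze-and-fine-tune recipe in Keras; the layer sizes, window shape, and class count are illustrative assumptions:

import tensorflow as tf

# Pretend 'base' was pre-trained on a large public HAR dataset
base = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 3)),            # 128-sample, 3-axis windows
    tf.keras.layers.Conv1D(32, 5, activation='relu'),
    tf.keras.layers.GlobalAveragePooling1D(),
])
base.trainable = False                         # freeze general motion features

# New head fine-tuned on ~1 hour of the target user's labeled data
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(6, activation='softmax'),  # 6 activity classes
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')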

1350.14 Summary

This chapter covered the systematic 7-step IoT ML pipeline:

  1. Data Collection: Gather diverse, representative sensor data
  2. Data Cleaning: Remove outliers, handle missing values, align timestamps
  3. Feature Engineering: Extract time-domain and frequency-domain features
  4. Train/Test Split: Use chronological splits to avoid data leakage
  5. Model Selection: Choose based on RAM, latency, and accuracy constraints
  6. Evaluation: Use F1-score, recall, precision for imbalanced data
  7. Deployment: Monitor inference latency and prediction drift

Key Takeaways:

  • Chronological splits are mandatory for time-series data
  • Class imbalance requires specialized metrics and techniques
  • Start simple (Decision Tree), add complexity only if needed
  • Real-world accuracy is typically 10-15% lower than lab accuracy

1350.15 What’s Next

Continue to Edge ML & Deployment to learn TinyML techniques for deploying ML models on resource-constrained IoT devices.