2  Modeling and Inferencing for IoT

Learning Objectives

After completing this chapter series, you will be able to:

  • Distinguish between training and inference phases of the IoT machine learning lifecycle
  • Design feature engineering pipelines that extract domain-specific features from raw sensor data
  • Select appropriate ML model families for common IoT problem types including classification, regression, and anomaly detection
  • Apply model optimization techniques such as quantization and pruning for deployment on constrained edge devices
  • Evaluate edge versus cloud deployment trade-offs based on latency, privacy, and connectivity requirements

In 60 Seconds

Machine learning for IoT transforms raw sensor streams into actionable intelligence through a pipeline of feature engineering, model training, validation, and edge deployment — and the unique challenge is producing models compact enough to fit on resource-constrained devices while maintaining sufficient accuracy. The key insight is that most of the accuracy gain in IoT ML comes from good feature engineering, not from choosing a sophisticated algorithm.

MVU — Minimum Viable Understanding

Effective IoT machine learning depends more on well-engineered features from domain knowledge than on choosing the most complex algorithm. Getting the data pipeline right—from sensor data collection through feature engineering to model deployment—is the single biggest determinant of model accuracy in production.

Characters: Sammy the Sensor, Lila the LED, Max the Microcontroller, Bella the Battery

Sammy: “I collect numbers all day—temperature, motion, light. But what do they mean?”

Max: “That’s where machine learning comes in! Think of it like teaching a puppy. You show the puppy lots of pictures of cats, and eventually it learns to recognize them. We show a computer lots of sensor readings, and it learns patterns too!”

Lila: “So if Sammy records vibrations from a machine, the computer can learn what healthy vibrations look like and what broken vibrations look like?”

Max: “Exactly! That’s called predictive maintenance—the computer warns you before something breaks, like a doctor checking your heartbeat.”

Bella: “But I don’t have enough energy to run big programs! Can the learning happen on my tiny chip?”

Max: “Yes! That’s called TinyML—we shrink the trained model so it fits on small devices like us. It’s like summarizing a whole textbook into a one-page cheat sheet!”

Sammy: “So I collect data, Max learns from it, and Lila can flash a warning if something goes wrong? Teamwork!”

Machine learning (ML) for IoT means teaching computers to find patterns in sensor data so they can make predictions or decisions automatically. Instead of writing explicit rules like “if temperature > 80°C, send alert,” ML lets the system learn what “normal” looks like from historical data and flag anything unusual.

Three things to know:

  1. Training vs. Inference: Training is the learning phase (needs lots of data and compute). Inference is using the trained model to make predictions (can run on tiny devices).
  2. Features matter most: The way you prepare and transform raw sensor readings (called “feature engineering”) has a bigger impact on accuracy than which algorithm you pick.
  3. Edge vs. Cloud: You can run ML models on the sensor device itself (edge—fast, private, but limited) or in the cloud (powerful, but needs connectivity and adds latency).
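The "learn what normal looks like, flag anything unusual" idea can be sketched in a few lines. This toy example learns a baseline from hypothetical historical temperature readings and uses a 3-sigma rule as the threshold, instead of hard-coding "if temperature > 80°C":

```python
import math

# Hypothetical historical temperature readings (°C) from a healthy sensor
history = [21.0, 21.4, 20.8, 21.2, 21.1, 20.9, 21.3, 21.0, 21.2, 20.7]

# "Training": learn what normal looks like (mean and spread)
mean = sum(history) / len(history)
std = math.sqrt(sum((x - mean) ** 2 for x in history) / len(history))

def is_anomaly(reading, k=3.0):
    """Flag readings more than k standard deviations from the learned mean."""
    return abs(reading - mean) > k * std

print(is_anomaly(21.1))   # -> False (normal reading)
print(is_anomaly(35.0))   # -> True  (clearly unusual)
```

This is the simplest possible learned anomaly detector; Chapter 7 covers production-grade approaches such as Isolation Forests.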

If this is your first time, start with the ML Fundamentals chapter.

2.1 Overview

Machine learning transforms raw IoT sensor data into actionable insights—detecting activities, predicting failures, and enabling intelligent automation. This chapter series covers the complete ML lifecycle from data collection through production deployment.

[Figure: geometric visualization of the IoT data science pipeline, showing stages from raw sensor data through feature engineering, model training, validation, and deployment, with feedback loops for continuous improvement]
Figure 2.1: The data science pipeline for IoT follows a systematic progression from raw sensor streams to deployed models.

2.2 Chapter Series

This topic has been organized into seven focused chapters for easier navigation:

| Chapter | Title | Key Topics | Difficulty |
|---|---|---|---|
| 1 | ML Fundamentals | Training vs inference, feature extraction, edge vs cloud | Beginner |
| 2 | Mobile Sensing & Activity Recognition | HAR, transportation mode detection, duty cycling | Intermediate |
| 3 | IoT ML Pipeline | 7-step pipeline, data leakage, model selection | Intermediate |
| 4 | Edge ML & TinyML Deployment | Quantization, pruning, HVAC predictive control | Intermediate |
| 5 | Audio Feature Processing | MFCC extraction, wake word detection | Intermediate |
| 6 | Feature Engineering | Good vs bad features, domain knowledge | Intermediate |
| 7 | Production ML | Monitoring, anomaly detection, predictive maintenance | Advanced |

2.3 Learning Path

2.3.1 For Beginners

Start with ML Fundamentals to understand:

  • What machine learning does for IoT
  • The difference between training and inference
  • Why feature extraction matters
  • When to use edge vs cloud ML

2.3.2 For Practitioners

Follow the complete pipeline:

  1. Mobile Sensing - Real-world activity recognition
  2. IoT ML Pipeline - Systematic 7-step approach
  3. Edge Deployment - TinyML and quantization
  4. Feature Engineering - Designing discriminative features

2.3.3 For Production Engineers

Focus on deployment and operations: begin with Production ML (Chapter 7) for monitoring, drift detection, and predictive maintenance, then review Edge ML & TinyML Deployment (Chapter 4) for on-device constraints.

2.4 Key Concepts Summary

2.4.1 The IoT ML Pipeline

[Figure: flowchart of the end-to-end IoT machine learning lifecycle, from sensor data collection through feature engineering, model training, validation, deployment, and monitoring, with a feedback loop for retraining]
Figure 2.2: The end-to-end IoT ML lifecycle from data collection through deployment.

2.4.2 Edge vs Cloud Decision

| Factor | Choose Edge | Choose Cloud |
|---|---|---|
| Latency | < 100 ms required | 1-5 seconds acceptable |
| Privacy | Sensitive data | Anonymous data |
| Connectivity | Intermittent | Always connected |
| Model Size | < 1 MB | > 10 MB |

Consider a predictive maintenance ML model for 500 industrial pumps:

Cloud ML approach:

  • Model: 45 MB TensorFlow SavedModel
  • Raw sensor upload: 10 kHz vibration × 2 bytes = 20 KB/s per pump

\[\text{Fleet upload} = 20\text{ KB/s} \times 500 = 10{,}000\text{ KB/s} = 10\text{ MB/s}\]

\[\text{Monthly bandwidth} = 10\text{ MB/s} \times 2{,}592{,}000\text{ s} = 25{,}920\text{ GB} \approx 26\text{ TB}\]

\[\text{AWS cost} = 26\text{ TB} \times \text{\$0.09/GB} = \text{\$2,340/month}\]
  • Inference latency: 150-300 ms round-trip

Edge ML approach (TinyML on ESP32):

  • Model: 180 KB INT8 quantized TensorFlow Lite
  • Upload only anomaly alerts: ~5 alerts/day per pump × 100 bytes

\[\text{Monthly bandwidth} = 5 \times 100\text{ bytes} \times 500 \times 30 = 7.5\text{ MB/month}\]

\[\text{Cost} = 7.5\text{ MB} \times \text{\$0.09/GB} \approx \text{\$0.00} \text{ (negligible)}\]

  • Inference latency: 65 ms local processing

Result: Edge ML saves $2,340/month in bandwidth while reducing latency by 55-80%. Initial cost: ESP32 modules ($3 each × 500 = $1,500 one-time). Payback period: 19 days.

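The bandwidth and cost arithmetic above can be reproduced with a short script. The $0.09/GB egress price, fleet size, and alert rate are the example's assumptions; the exact cloud figure (≈ $2,333) differs slightly from the text's rounded 26 TB × $0.09 = $2,340:

```python
def cloud_monthly_cost(fleet, sample_rate_hz, bytes_per_sample, price_per_gb=0.09):
    """Monthly egress cost (USD) for streaming raw samples to the cloud."""
    bytes_per_s = fleet * sample_rate_hz * bytes_per_sample
    gb_per_month = bytes_per_s * 2_592_000 / 1e9   # 30-day month, decimal GB
    return gb_per_month * price_per_gb

def edge_monthly_cost(fleet, alerts_per_day, bytes_per_alert, price_per_gb=0.09):
    """Monthly egress cost when only anomaly alerts leave the device."""
    bytes_per_month = fleet * alerts_per_day * bytes_per_alert * 30
    return bytes_per_month / 1e9 * price_per_gb

cloud = cloud_monthly_cost(500, 10_000, 2)   # 500 pumps, 10 kHz, 2 bytes/sample
edge = edge_monthly_cost(500, 5, 100)        # ~5 alerts/day, 100 bytes each
print(f"Cloud: ${cloud:,.0f}/month, edge: ${edge:.4f}/month")
```

Plugging in your own fleet parameters makes the edge/cloud break-even point easy to explore.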

2.4.3 Feature Engineering Priority

Feature engineering contributes more to accuracy than algorithm choice:

  1. Domain knowledge (physics-based features) > generic statistics
  2. Time-domain features (mean, variance) are cheap and effective
  3. Frequency-domain features (FFT) add 5-10% accuracy for periodic signals
  4. Correlation analysis removes redundant features
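As a sketch of points 2 and 3, the snippet below computes cheap time-domain statistics and a dominant-frequency feature. It uses a naive O(n²) DFT so it stays standard-library only; in practice you would use numpy.fft. The 5 Hz test signal is hypothetical:

```python
import math
import cmath

def time_features(signal):
    """Cheap time-domain statistics: mean, variance, RMS."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    return {"mean": mean, "variance": var, "rms": rms}

def dominant_freq_hz(signal, fs):
    """Strongest frequency via a naive O(n^2) DFT (use numpy.fft in practice)."""
    n = len(signal)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2):          # skip DC, positive frequencies only
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * fs / n

# Hypothetical 5 Hz vibration sampled at 100 Hz for 1 second
fs = 100
sig = [math.sin(2 * math.pi * 5 * t / fs) for t in range(fs)]
print(time_features(sig)["variance"])   # ~0.5 for a unit-amplitude sine
print(dominant_freq_hz(sig, fs))        # -> 5.0
```

For a periodic fault signature (e.g. a bearing defect), the dominant-frequency feature is exactly the kind of physics-informed input that lifts accuracy beyond generic statistics.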

2.4.4 ML Model Types for IoT Tasks

Understanding which model family to use for a given IoT problem is critical:

| IoT Problem Type | Suitable ML Models | Typical Use Case |
|---|---|---|
| Classification | Decision Trees, Random Forest, SVM, CNN | Activity recognition, fault detection |
| Regression | Linear Regression, Gradient Boosted Trees, MLP | Temperature prediction, energy forecasting |
| Anomaly Detection | Isolation Forest, One-Class SVM, Autoencoder | Equipment fault detection, intrusion detection |
| Time-Series Forecasting | LSTM, GRU, Prophet, ARIMA | Demand prediction, environmental monitoring |
| Clustering | k-Means, DBSCAN, Gaussian Mixture | Device profiling, usage pattern discovery |

2.4.5 Model Optimization for Constrained Devices

Deploying ML models on IoT devices requires shrinking them without unacceptable accuracy loss. The typical optimization pipeline progresses through several stages:

| Stage | Technique | Size Reduction | Accuracy Impact |
|---|---|---|---|
| 1 | Pruning (remove low-magnitude weights) | 2–4x | 0.5–2% loss |
| 2 | Quantization (FP32 to INT8) | 4x | 1–3% loss |
| 3 | Knowledge Distillation (student-teacher) | 5–20x | 2–5% loss |
| 4 | Hardware-Specific Compilation (TFLite, ONNX) | 1.2–2x | Negligible |
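Stage 2 can be illustrated with symmetric linear INT8 quantization of a toy weight vector. This is a sketch of the idea, not a real TFLite converter; the weights are made up:

```python
import struct

def quantize_int8(weights):
    """Symmetric linear quantization: map [-max|w|, +max|w|] onto [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values from the INT8 codes."""
    return [qi * scale for qi in q]

weights = [0.82, -0.41, 0.05, -1.27, 0.63]     # toy FP32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

fp32_bytes = len(weights) * struct.calcsize("f")   # 4 bytes per weight
int8_bytes = len(q)                                # 1 byte per weight
print(f"{fp32_bytes} B -> {int8_bytes} B (4x smaller)")
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"max round-trip error: {max_err:.4f}")
```

The 4x size reduction comes directly from storing one byte per weight instead of four; the rounding error is what produces the table's 1–3% accuracy loss in real models.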


Common Pitfalls
  1. Skipping feature engineering and jumping to deep learning: In IoT, well-crafted domain-specific features (e.g., vibration RMS, rolling averages, frequency peaks) typically outperform throwing raw data at a neural network. A random forest with 10 good features often beats a deep model trained on raw accelerometer samples.

  2. Training on data without temporal splits: IoT data is time-ordered. If you randomly shuffle data for train/test splitting, you leak future information into the training set (“data leakage”), producing overly optimistic accuracy that collapses in production. Always split by time—train on past, test on future.

  3. Ignoring model drift after deployment: Sensor behavior changes over time due to aging, environmental shifts, and firmware updates. A model that was 95% accurate at deployment can degrade to 70% within months if you do not monitor for concept drift and retrain periodically.

  4. Over-fitting to a single device: Training a model on data from one sensor and deploying it across hundreds of identical units can fail due to manufacturing variance. Sensors of the same type can exhibit different offsets, gains, and noise characteristics. Train on data from multiple devices or apply domain adaptation techniques.

  5. Ignoring class imbalance in fault detection: In predictive maintenance, “healthy” readings vastly outnumber “fault” readings (often 99:1 or worse). A model that always predicts “healthy” achieves 99% accuracy but catches zero faults. Use precision-recall metrics, F1-score, and resampling techniques (SMOTE, class weighting) instead of raw accuracy.
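Pitfall 5 is easy to demonstrate numerically. With a hypothetical 99:1 healthy-to-fault ratio, a classifier that always predicts "healthy" scores 99% accuracy while catching zero faults:

```python
# 990 healthy (0) and 10 faulty (1) readings; classifier always predicts "healthy"
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # faults caught
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # faults missed
recall = tp / (tp + fn)   # fraction of real faults detected

print(f"accuracy: {accuracy:.1%}, fault recall: {recall:.0%}")
# 99.0% accuracy, yet 0% of faults are caught
```

This is why fault-detection models are judged on recall, precision, and F1 rather than raw accuracy.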

2.7 Knowledge Check

Build a full ML pipeline – from raw sensor data to trained model with evaluation – using only Python’s standard library (no external dependencies). The script walks through the first five stages of the six-stage lifecycle detailed in Section 2.9 (deployment and monitoring are covered in Chapter 7).

import random
import math

random.seed(42)

# Stage 1: COLLECT - Simulate 3-axis accelerometer for activity recognition
# Activities: 0=sitting, 1=walking, 2=running
def generate_samples(n_per_class=100):
    data, labels = [], []
    for _ in range(n_per_class):
        # Sitting: low variance, near-zero mean
        data.append([random.gauss(0, 0.1) for _ in range(3)])
        labels.append(0)
        # Walking: moderate variance, small positive offset
        data.append([random.gauss(0, 0.5) + 0.3 * math.sin(random.random())
                      for _ in range(3)])
        labels.append(1)
        # Running: high variance, high magnitude
        data.append([random.gauss(0, 1.2) + 0.8 for _ in range(3)])
        labels.append(2)
    return data, labels

samples, labels = generate_samples(150)

# Stage 2: ENGINEER FEATURES - Extract domain features from raw axes
def extract_features(sample):
    magnitude = math.sqrt(sum(x**2 for x in sample))
    variance = sum((x - sum(sample)/3)**2 for x in sample) / 3
    max_val = max(abs(x) for x in sample)
    return [magnitude, variance, max_val]

features = [extract_features(s) for s in samples]

# Stage 3: TRAIN - Time-based split (first 70% train, last 30% test)
split = int(0.7 * len(features))
X_train, y_train = features[:split], labels[:split]
X_test, y_test = features[split:], labels[split:]

# Simple k-NN classifier (k=5) -- no external libraries needed
def distance(a, b):
    return math.sqrt(sum((ai - bi)**2 for ai, bi in zip(a, b)))

def knn_predict(X_train, y_train, query, k=5):
    dists = [(distance(query, x), y) for x, y in zip(X_train, y_train)]
    dists.sort(key=lambda d: d[0])
    votes = [d[1] for d in dists[:k]]
    return max(set(votes), key=votes.count)

# Stage 4: EVALUATE
correct = 0
confusion = [[0]*3 for _ in range(3)]
for x, y_true in zip(X_test, y_test):
    y_pred = knn_predict(X_train, y_train, x)
    confusion[y_true][y_pred] += 1
    if y_pred == y_true:
        correct += 1

accuracy = correct / len(X_test)
print(f"Activity Recognition Results (k-NN, k=5)")
print(f"Training samples: {len(X_train)}, Test samples: {len(X_test)}")
print(f"Accuracy: {accuracy:.1%}\n")

activity_names = ["Sitting", "Walking", "Running"]
print("Confusion Matrix:")
print(f"{'':>10} {'Pred Sit':>9} {'Pred Walk':>10} {'Pred Run':>9}")
for i, row in enumerate(confusion):
    total = sum(row)
    recall = row[i] / total if total > 0 else 0
    print(f"{activity_names[i]:>10} {row[0]:>9} {row[1]:>10} {row[2]:>9}  "
          f"(recall: {recall:.0%})")

# Stage 5: OPTIMIZE - Simulate quantization impact
print(f"\nQuantization Impact Simulation:")
print(f"  FP32 accuracy: {accuracy:.1%} (baseline)")
print(f"  INT8 estimate:  {accuracy - 0.02:.1%} (typical 1-3% loss)")
print(f"  INT4 estimate:  {accuracy - 0.06:.1%} (typical 3-8% loss)")
print(f"  Model size: FP32=12KB -> INT8=3KB (fits on ESP32)")

What to Observe:

  • Sitting is easiest to classify (low movement = distinctive features)
  • Walking vs running confusion shows the value of good feature engineering
  • The 3 extracted features (magnitude, variance, max) capture activity differences better than raw accelerometer values
  • Temporal split avoids data leakage – earlier samples train, later samples test

2.8 Worked Example: Choosing an ML Model for Smart Building Occupancy

Worked Example: Model Selection and Deployment for Office Occupancy Prediction

Scenario: WeWork operates a co-working space in San Francisco with 8 floors and 200 desks. They want to predict hourly occupancy per floor to optimize HVAC and lighting. Available sensor data includes Wi-Fi connected device counts, CO2 levels (ppm), PIR motion events, and door badge swipes.

Given:

  • Training data: 6 months of hourly readings from 4 sensor types across 8 floors (34,944 samples)
  • Features per sample: Wi-Fi count, CO2 ppm, PIR events/hour, badge swipes/hour, hour-of-day, day-of-week
  • Target: Floor occupancy (0-25 people, regression) or occupancy band (empty/low/medium/high, classification)
  • Deployment constraint: Must run on floor-level Raspberry Pi 4 (4GB RAM, no GPU)
  • Latency requirement: Prediction within 100ms for HVAC pre-conditioning

Step 1 – Compare model families on this dataset:

| Model | Training Time | Inference Time (RPi4) | MAE (people) | Model Size | Pros/Cons for This Task |
|---|---|---|---|---|---|
| Linear Regression | 2 seconds | 0.1 ms | 4.2 | 1 KB | Fast but misses nonlinear occupancy patterns |
| Random Forest (100 trees) | 45 seconds | 8 ms | 1.8 | 12 MB | Good accuracy, interpretable feature importance |
| Gradient Boosted Trees (XGBoost) | 90 seconds | 5 ms | 1.5 | 8 MB | Best accuracy, slightly slower to train |
| Neural Network (2 hidden layers) | 5 minutes | 3 ms | 1.7 | 2 MB | Needs GPU for training, comparable accuracy |
| k-NN (k=7) | 0 (lazy) | 45 ms | 2.1 | 28 MB (stores all data) | Too slow for 100 ms requirement with full dataset |

Step 2 – Feature importance analysis (from Random Forest):

| Feature | Importance | Why |
|---|---|---|
| Wi-Fi device count | 0.42 | Most direct proxy for occupancy (people carry phones) |
| Hour of day | 0.22 | Strong daily pattern (empty at night, peak 10am-2pm) |
| CO2 ppm | 0.18 | Correlates with breathing occupants, lags by 15-20 min |
| Day of week | 0.10 | Fridays have 40% lower occupancy than Tuesdays |
| Badge swipes/hour | 0.05 | Only counts entries, misses people already inside |
| PIR events/hour | 0.03 | Saturates above 10 people (motion everywhere) |

Key insight: Wi-Fi count alone predicts occupancy with MAE of 2.8 people. Adding hour-of-day improves to 2.1. The remaining 4 features only improve from 2.1 to 1.5 – diminishing returns. For a simpler deployment, Wi-Fi + time features may be sufficient.

Step 3 – Deployment decision:

  • Selected model: XGBoost (MAE 1.5, 5ms inference, 8 MB)
  • Why not Random Forest: XGBoost is 17% more accurate with similar inference speed
  • Why not Neural Network: Comparable accuracy but requires GPU for retraining; XGBoost retrains on RPi4 in 90 seconds
  • Quantization: Not needed (5ms inference already well under 100ms limit)
  • Retraining schedule: Monthly, using previous 3 months of data (handles seasonal occupancy shifts)

Step 4 – Production monitoring metrics:

| Metric | Threshold | Action if Exceeded |
|---|---|---|
| MAE (7-day rolling) | > 3.0 people | Trigger retrain |
| Feature drift (Wi-Fi count distribution) | KL divergence > 0.5 | Investigate sensor change |
| Prediction staleness | > 2 consecutive hours with constant prediction | Check RPi health |
| HVAC energy waste | > 15% over manual baseline | Review prediction accuracy per floor |
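The feature-drift check above can be sketched with binned histograms and KL divergence. The bin ranges, smoothing constant, and simulated Wi-Fi counts are illustrative assumptions:

```python
import math
import random

def histogram(values, n_bins, lo, hi):
    """Normalized histogram over [lo, hi); out-of-range values clamp to edge bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        i = max(0, min(int((v - lo) / width), n_bins - 1))
        counts[i] += 1
    return [c / len(values) for c in counts]

def kl_divergence(p, q, eps=1e-6):
    """D_KL(p || q) with additive smoothing so empty bins do not divide by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

random.seed(0)
baseline = [random.gauss(12, 3) for _ in range(1000)]   # Wi-Fi counts at deployment
current = [random.gauss(20, 3) for _ in range(1000)]    # distribution has shifted
p = histogram(baseline, 10, 0, 30)
q = histogram(current, 10, 0, 30)
drift = kl_divergence(p, q)
print(f"KL divergence: {drift:.2f}")
if drift > 0.5:
    print("Investigate sensor change")
```

A shift like this often means a Wi-Fi access point was moved or replaced, not that occupancy actually changed, which is why the action is "investigate" rather than "retrain".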

Result: XGBoost model deployed on 8 Raspberry Pi 4 devices predicts floor occupancy with MAE of 1.5 people (on a 0-25 scale). HVAC pre-conditioning starts 30 minutes before predicted high occupancy, reducing energy waste by 22% compared to fixed schedule. Monthly retraining keeps accuracy stable across seasonal changes. Total deployment cost: 8 × $55 (RPi4) + $0 (open-source ML stack) = $440 hardware.

Key Insight: For IoT ML model selection, inference speed and model size on the target hardware matter more than marginal accuracy differences. XGBoost and Random Forest are the workhorses of tabular IoT data – they handle mixed feature types, require no normalization, provide feature importance for debugging, and run efficiently on ARM processors without GPU.

2.9 How It Works: The IoT ML Lifecycle

The complete IoT machine learning lifecycle operates as a continuous feedback loop with six distinct stages:

Stage 1: Collect - Data acquisition begins with properly timestamped, labeled sensor streams. For example, a smart building collects temperature (5 bytes), occupancy (binary), and HVAC state (binary) every minute from 500 sensors, generating 43,200 samples per sensor over 30 days. Critical requirement: metadata includes sensor ID, location, and calibration state to enable troubleshooting.

Stage 2: Engineer Features - Raw sensor readings transform into domain-informed features. Instead of feeding raw temperature values to a model, extract lag features (temperature 1 hour ago, 30 minutes ago), gradients (outdoor temperature change per hour), and cyclic encodings (hour-of-day as sin/cos pair). This stage contributes 60-80% of final model accuracy – more than algorithm choice.
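A minimal sketch of these two transformations (lag features and cyclic hour encoding), using hypothetical hourly temperature readings:

```python
import math

def cyclic_hour(hour):
    """Encode hour-of-day as (sin, cos) so 23:00 and 00:00 land near each other."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)

def add_lags(series, lags=(1, 2)):
    """Feature rows: [current value, value lag steps ago, ...]; skips warm-up rows."""
    rows = []
    for t in range(max(lags), len(series)):
        rows.append([series[t]] + [series[t - lag] for lag in lags])
    return rows

temps = [20.1, 20.3, 20.8, 21.5, 22.0]   # hypothetical hourly readings
print(add_lags(temps))
# -> [[20.8, 20.3, 20.1], [21.5, 20.8, 20.3], [22.0, 21.5, 20.8]]

s0, c0 = cyclic_hour(0)
s23, c23 = cyclic_hour(23)
dist = math.hypot(s23 - s0, c23 - c0)
print(f"encoded distance 23h -> 0h: {dist:.3f}")  # small: wrap-around preserved
```

Without the cyclic encoding, a model sees hours 23 and 0 as maximally far apart, hiding the overnight continuity that matters for HVAC prediction.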

Stage 3: Train and Validate - Split data chronologically (not randomly!) into training (70%), validation (15%), and test (15%) sets. Train multiple model families (linear regression, random forest, gradient boosted trees, neural networks) on the training set, tune hyperparameters using validation set, and report final accuracy on unseen test set. For IoT, random forests and gradient boosted trees consistently outperform neural networks on tabular data due to better handling of mixed feature types and no normalization requirements.
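The chronological 70/15/15 split can be written as a small helper (the ratios are the ones quoted above):

```python
def chrono_split(samples, train=0.70, val=0.15):
    """Split time-ordered samples without shuffling: past trains, future tests."""
    n = len(samples)
    i = int(n * train)
    j = int(n * (train + val))
    return samples[:i], samples[i:j], samples[j:]

readings = list(range(100))          # stand-in for 100 time-ordered samples
tr, va, te = chrono_split(readings)
print(len(tr), len(va), len(te))     # -> 70 15 15
assert max(tr) < min(va) < min(te)   # no future data leaks into training
```

Contrast this with random shuffling, where future readings land in the training set and inflate validation accuracy.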

Stage 4: Optimize - Apply pruning (remove 70% of weights), quantization (FP32 → INT8 = 4x size reduction), and knowledge distillation (train small “student” model to mimic large “teacher” model). Result: 500KB model shrinks to 50KB with only 1-3% accuracy loss, enabling deployment on microcontrollers.

Stage 5: Deploy - Choose edge (low latency, privacy, offline capability) or cloud (powerful, flexible) based on requirements. Edge example: Alexa wake word detection runs on Cortex-M4 with 8KB model and <100ms latency. Cloud example: Full natural language understanding requires 100 MB+ models that are too large to fit on edge hardware.

Stage 6: Monitor - Track model drift (accuracy degradation over time), data quality (sensor failures), and prediction staleness (frozen predictions indicate device failure). Retrain monthly using previous 3 months of data to handle seasonal shifts. Example: HVAC model MAE increases from 0.8°C to 3.0°C over 6 months without retraining due to sensor calibration drift.
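A rolling-MAE monitor with a retrain trigger, as described here, might look like the following sketch; the window and threshold values are illustrative:

```python
from collections import deque

class DriftMonitor:
    """Track rolling mean absolute error; flag when it crosses a threshold."""

    def __init__(self, window=168, threshold=3.0):   # e.g. 7 days of hourly errors
        self.errors = deque(maxlen=window)
        self.threshold = threshold

    def update(self, actual, predicted):
        """Record one prediction error; return (rolling MAE, retrain flag)."""
        self.errors.append(abs(actual - predicted))
        mae = sum(self.errors) / len(self.errors)
        return mae, mae > self.threshold

monitor = DriftMonitor(window=5, threshold=3.0)
for actual, predicted in [(10, 9), (12, 11), (15, 10), (14, 8), (16, 9)]:
    mae, retrain = monitor.update(actual, predicted)
print(f"rolling MAE: {mae:.1f}, retrain needed: {retrain}")
# -> rolling MAE: 4.0, retrain needed: True
```

In production the retrain flag would kick off the feedback loop described next, pulling recent data back into Stages 1–4.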

The feedback loop: Production monitoring (Stage 6) identifies degraded accuracy, triggering data collection (Stage 1) with new edge cases, feature engineering improvements (Stage 2), and model retraining (Stage 3-4), completing the cycle.

2.10 Summary and Key Takeaways

Key Takeaways

Core Principle: IoT machine learning transforms raw sensor streams into actionable predictions, but success depends far more on data quality and feature engineering than on model complexity.

The IoT ML Lifecycle (6 stages, continuously iterating):

  1. Collect — Gather labeled sensor data with proper timestamping and metadata.
  2. Engineer Features — Extract domain-informed features (time-domain statistics, frequency components, cross-sensor correlations) that capture the physical phenomena of interest.
  3. Train and Validate — Use chronological train/test splits to avoid data leakage; select models appropriate for the problem type (classification, regression, anomaly detection, forecasting).
  4. Optimize — Apply pruning, quantization, and knowledge distillation to fit models onto constrained devices (50 MB to 200 KB is achievable).
  5. Deploy — Choose edge (low-latency, private, offline-capable) or cloud (powerful, flexible) deployment based on requirements.
  6. Monitor — Track model drift, data quality, and prediction accuracy in production; trigger retraining when performance degrades.

Essential Rules of Thumb:

  • Feature engineering is the highest-leverage activity: Domain-specific features grounded in physical understanding consistently outperform brute-force approaches. Invest time here before experimenting with complex algorithms.
  • Temporal integrity is non-negotiable: Always split IoT data by time, not randomly. Data leakage is the most common source of inflated accuracy in IoT ML projects.
  • Edge vs. cloud is a design decision: Latency, privacy, connectivity, and model complexity determine where inference runs. Many production systems use a hybrid approach.
  • TinyML enables on-device intelligence: Techniques like quantization (32-bit to 8-bit) and pruning can shrink models by 4–10x with minimal accuracy loss, enabling deployment on microcontrollers.
  • Production models require monitoring: Concept drift, sensor degradation, and environmental changes mean that deployed models must be continuously monitored and periodically retrained.
  • Start simple, add complexity only when needed: A logistic regression or random forest with good features is often the right starting point. Upgrade to neural networks only when simpler models demonstrably fall short.

2.11 Concept Relationships

Parallel concepts:

  • Feature engineering ↔︎ Edge data reduction: Both transform raw data into compact, meaningful representations
  • Edge ML quantization ↔︎ Data compression: Both sacrifice precision for efficiency with minimal quality loss
  • ML model drift ↔︎ Sensor calibration drift: Both require periodic retraining/recalibration to maintain accuracy

2.12 See Also

Chapter series:

  1. ML Fundamentals - Training vs inference, feature extraction, edge vs cloud
  2. Mobile Sensing - HAR, transportation mode detection
  3. IoT ML Pipeline - 7-step systematic approach
  4. Edge ML & Deployment - Quantization, pruning, TinyML
  5. Audio Feature Processing - MFCC extraction for voice recognition
  6. Feature Engineering - Designing discriminative features
  7. Production ML - Monitoring, drift detection, anomaly detection

2.13 What’s Next

| Direction | Chapter | Link |
|---|---|---|
| Start Here | ML Fundamentals | modeling-ml-fundamentals.html |
| Next | Mobile Sensing and Activity Recognition | modeling-mobile-sensing.html |
| Related | Edge ML and TinyML Deployment | modeling-edge-deployment.html |
| Related | Feature Engineering | modeling-feature-engineering.html |