3 IoT Machine Learning Fundamentals

3.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Distinguish Training from Inference: Differentiate between training and inference phases in machine learning
  • Explain Feature Extraction: Describe why raw sensor data must be transformed into meaningful features
  • Evaluate Accuracy Metrics: Assess why accuracy alone is misleading for imbalanced IoT datasets
  • Compare Edge vs Cloud ML: Analyze the trade-offs between running ML on devices vs in the cloud
In 60 Seconds

Machine learning for IoT involves selecting, training, validating, and deploying models that map sensor feature vectors to actionable predictions — and the unique IoT constraints are the scarcity of labelled data, the need for edge deployment, and the non-stationarity of sensor environments. Understanding the bias-variance trade-off and the difference between supervised, unsupervised, and semi-supervised learning is the foundation for all subsequent IoT ML work.

Minimum Viable Understanding: IoT Machine Learning

Core Concept: Machine learning transforms raw sensor data (numbers like “9.8 m/s^2”) into meaningful insights (“user is running”). Models learn patterns from historical data (training) then apply those patterns to new data (inference).

Why It Matters: IoT generates billions of data points daily. Without ML, you’d need humans to interpret every reading. With ML, sensors become intelligent - detecting falls, predicting equipment failures, and recognizing activities automatically.

Key Takeaway: In most IoT projects, feature engineering (extracting meaningful statistics from raw data) contributes more to model accuracy than algorithm choice. A common rule of thumb is to spend about 80% of your time on features and 20% on model selection.

3.2 Prerequisites

Before diving into this chapter, you should be familiar with:

  • Data Storage and Databases: Understanding how IoT data is collected, stored, and accessed provides the foundation for building machine learning models
  • Edge and Fog Computing: Knowledge of distributed computing architectures helps contextualize where ML inference occurs
  • Basic statistics concepts: Understanding mean, variance, and standard deviation helps with feature engineering


Key Concepts

  • Supervised learning: Training a model on labelled examples (sensor readings paired with known outcomes) to predict labels on new, unseen data — requires labelled IoT datasets which are often expensive to obtain.
  • Unsupervised learning: Finding patterns in unlabelled sensor data — clustering similar device behaviours, detecting anomalies — without requiring labelled training examples.
  • Bias-variance trade-off: The fundamental tension between model simplicity (high bias, underfitting) and model complexity (high variance, overfitting); finding the right balance is the core model selection challenge in IoT ML.
  • Cross-validation: A technique for estimating model generalisation performance by training and evaluating on multiple different splits of the available data, producing a more reliable accuracy estimate than a single train/test split.
  • Class imbalance: The situation where anomalous or failure events are far less common than normal events in training data (0.1% vs 99.9%), requiring techniques like oversampling, undersampling, or class-weighted loss functions.
  • Model generalisation: The ability of a trained model to perform accurately on sensor data from deployment conditions not seen during training — the ultimate goal of IoT ML model development.
Chapter Series: Modeling and Inferencing

This is the first chapter in a series on IoT Machine Learning:

  1. ML Fundamentals (this chapter) - Core concepts, training vs inference
  2. Mobile Sensing & Activity Recognition - HAR, transportation detection
  3. IoT ML Pipeline - 7-step pipeline, best practices
  4. Edge ML & Deployment - TinyML, quantization
  5. Audio Feature Processing - MFCC, keyword recognition
  6. Feature Engineering - Feature design and selection
  7. Production ML - Monitoring, anomaly detection
Geometric visualization of the data science pipeline for IoT showing stages from raw sensor data through feature engineering, model training, validation, and deployment with feedback loops for continuous improvement
Figure 3.1: The data science pipeline for IoT follows a systematic progression from raw sensor streams to deployed models.

3.3 Getting Started (For Beginners)

Machine Learning is like teaching your sensors to become super-smart detectives!

3.3.1 The Sensor Squad Adventure: The Pattern Patrol

It was a quiet Tuesday when Motion Mo noticed something strange. “Hey team, I keep seeing the same pattern every day! The humans wake up at 7am, eat breakfast at 7:30, and leave for work at 8:15.”

Thermo the Temperature Sensor nodded excitedly. “I see patterns too! The house gets warm around 6pm when everyone comes home, and it cools down at 11pm when they go to bed.”

Signal Sam gathered everyone for an important announcement. “Sensors, you’ve just discovered something AMAZING. You’re doing what scientists call Machine Learning - finding patterns in data and using them to make smart predictions!”

Sam drew a picture to explain:

Step 1 - COLLECT: “First, we gather lots of examples. Motion Mo, you’ve recorded 10,000 mornings of people waking up.”

Step 2 - LEARN: “Then we show a computer all those examples. ‘See how the motion always starts in the bedroom, then moves to the bathroom, then to the kitchen? THAT’S the wake-up pattern!’”

Step 3 - PREDICT: “Now the smart part! When Motion Mo sees that same pattern starting, the computer can predict: ‘Someone’s waking up! Better start warming up the coffee maker!’”

3.3.2 Key Words for Kids

  • Machine Learning: Teaching computers to find patterns and make predictions
  • Pattern: Something that happens the same way over and over
  • Training: Showing the computer LOTS of examples so it can learn
  • Prediction: A smart guess about what will happen next
  • Model: The “brain” that the computer builds after learning
  • Inference: When the trained model looks at NEW data and makes a prediction

3.3.3 Try This at Home!

Be a Pattern Detective:

  1. For one week, write down what time you go to bed and wake up
  2. Also note if it’s a school day or weekend
  3. After a week, look at your data - can you find the pattern?
  4. Now make a PREDICTION: What time will you wake up next Saturday?

Congratulations - you just did Machine Learning with your brain!

3.4 IoT Machine Learning Lifecycle Overview

The IoT ML lifecycle involves distinct phases, from data collection to deployed model maintenance:

Six-stage IoT machine learning lifecycle diagram showing collect data, prepare features, train model, evaluate metrics, deploy to edge or cloud, and monitor performance

Key Phases:

  • Data Collection: Gather sensor readings on edge devices and store them as raw time-series data.
  • Feature Engineering: Extract meaningful patterns on the edge or in the cloud to produce feature vectors.
  • Model Training: Learn from historical data in the cloud, often on GPU-backed compute for neural networks.
  • Compression: Reduce model size in the cloud before deployment to constrained hardware.
  • Edge Inference: Apply the trained model to new data on the device and generate predictions.
  • Monitoring: Track production performance in the cloud and raise drift alerts when behavior changes.

3.5 What is Machine Learning for IoT?

Simple Explanation

Analogy: ML for IoT is like teaching a smart assistant to recognize patterns.

Imagine you have a fitness tracker. Instead of just showing raw numbers (steps: 5000, heart rate: 120), it can tell you:

  • “You’re running” (not walking)
  • “Your workout intensity is high”
  • “You might be getting sick” (unusual heart patterns)

That’s machine learning—turning raw sensor data into meaningful insights!

Diagram showing machine learning transformation: raw accelerometer values (9.8, 0.2, 0.1 m/s squared) flow through an ML model to produce human-readable outputs like activity state equals running and intensity equals high
Figure 3.2: Machine Learning Transforms Raw Sensor Numbers into Actionable Insights

3.6 Training vs Inference: Two Phases

Machine learning for IoT follows a distinct two-phase workflow that separates learning from application:

Phase 1: Training (Cloud, Offline)

  1. Data Collection: Gather labeled examples—10,000 hours of walking/running/sitting accelerometer data from diverse users
  2. Feature Extraction: Transform raw sensor streams into statistical features (mean, variance, FFT peaks)
  3. Model Training: Feed features into algorithm (Random Forest, Neural Network) that learns patterns distinguishing classes
  4. Validation: Test on held-out data to measure accuracy and tune hyperparameters
  5. Compression: Quantize weights (float32 → int8) and prune weak connections to fit edge device constraints
  6. Output: A trained model file (weights + architecture) ready for deployment

Phase 2: Inference (Edge Device, Real-Time)

  1. Sensor Input: Accelerometer provides live data stream (50 Hz)
  2. Feature Extraction: Same feature pipeline as training (must match exactly)
  3. Model Execution: Forward pass through trained model using extracted features
  4. Prediction Output: Classification label (e.g., “running”) or confidence scores
  5. Action: Trigger response (update dashboard, log event, adjust system)

Key Separation: Training runs offline in the cloud, often on powerful GPUs over hours to days, and is repeated only when the model is retrained; inference runs continuously on devices (ESP32, smartphone) with millisecond-scale latency. In this workflow the model is frozen during inference: the device applies the learned parameters but does not update them.

  • Training: Learn patterns from historical data in the cloud using powerful computers. Example: analyze 10,000 hours of walking and running data.
  • Inference: Apply those learned patterns on an edge device in real time. Example: detect the user’s current activity from a live sensor stream.
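A minimal sketch of the two phases, using only the Python standard library. The nearest-centroid classifier, the two features, and the tiny hand-made windows are illustrative assumptions (not the Random Forest or Neural Network named above); the point is the structure: one shared `features` function, a training step that produces a frozen parameter file, and an inference step that only applies it.

```python
import json
import math

def features(window):
    """Shared feature pipeline: the SAME function runs in training and inference."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    return [mean, std]

# ---- Training phase (offline, e.g. in the cloud) ----
def train(labeled_windows):
    """Nearest-centroid 'model' (illustrative choice): average feature vector per class."""
    sums, counts = {}, {}
    for window, label in labeled_windows:
        f = features(window)
        acc = sums.setdefault(label, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

# ---- Inference phase (on device): parameters are frozen, only applied ----
def predict(model, window):
    f = features(window)
    return min(model, key=lambda label: math.dist(f, model[label]))

# Hypothetical accelerometer-magnitude windows (m/s^2)
training_data = [
    ([9.8, 9.81, 9.79, 9.8], "sitting"),
    ([9.8, 9.8, 9.82, 9.78], "sitting"),
    ([6.0, 14.0, 5.5, 13.5], "running"),
    ([7.0, 13.0, 6.5, 12.8], "running"),
]
model_json = json.dumps(train(training_data))  # stands in for the deployable model file

on_device_model = json.loads(model_json)
print(predict(on_device_model, [6.5, 13.2, 6.1, 14.1]))  # running
```

Serializing to JSON stands in for the deployed model file; on a real device the same parameters would be compiled into firmware or loaded from flash.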
Diagram showing two-phase ML workflow: Training phase in cloud with powerful GPUs processes historical data to create a model, then the compressed model is deployed to edge devices for real-time inference on live sensor data
Figure 3.3: Cloud Training to Edge Inference Deployment Pipeline

3.7 Model Compression for IoT

IoT devices have limited resources. Large models must be compressed before deployment:

Edge AI deployment pipeline showing a full TensorFlow model converted to TFLite, optimized for microcontrollers, and reduced to a much smaller TFLM model before on-device inference
Figure 3.4: Cloud-to-microcontroller optimization path for edge ML deployment.

Common Compression Techniques:

  • Quantization: Reduce precision from float32 to int8. Typical gain: about 4x smaller, with roughly 1-3% accuracy loss.
  • Pruning: Remove near-zero weights that contribute little to the prediction. Typical gain: about 2-10x smaller, with roughly 1-5% accuracy loss.
  • Knowledge Distillation: Train a small student model to mimic a larger teacher model. Typical gain: about 10-100x smaller, with roughly 2-10% accuracy loss.
  • Architecture Search: Choose a more efficient network structure from the start. Size reduction varies, and accuracy can sometimes improve as well.
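As a rough sketch of the quantization bullet, the snippet below applies symmetric per-tensor int8 quantization to 50,000 float32 weights using only the standard library. The scale rule (largest absolute weight divided by 127) is one common choice, assumed here for illustration; production toolchains such as TensorFlow Lite use their own calibrated schemes. It measures weight error only, not the model's accuracy loss.

```python
import array
import random

def quantize_int8(weights):
    """Symmetric per-tensor quantization (assumed scheme): q = round(w / scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return array.array("b", q), scale  # "b" = signed 8-bit integers

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]

# Hypothetical layer: 50,000 weights drawn from a small Gaussian
random.seed(0)
w = array.array("f", (random.gauss(0.0, 0.1) for _ in range(50_000)))  # float32

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

size_float32 = w.itemsize * len(w)  # 4 bytes per weight
size_int8 = q.itemsize * len(q)     # 1 byte per weight
max_err = max(abs(a - b) for a, b in zip(w, w_hat))

print(f"float32: {size_float32} bytes, int8: {size_int8} bytes "
      f"({size_float32 / size_int8:.0f}x smaller)")
print(f"max reconstruction error: {max_err:.5f} (bound scale/2 = {scale / 2:.5f})")
```

Rounding to the nearest code bounds each weight's error by half a quantization step (scale / 2), which is why int8 usually costs little accuracy.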

Model Compression for ESP32 Deployment:

Compressing a neural network for embedded inference involves quantifying memory and latency trade-offs.

Original model (32-bit floating point):

\[ \text{Model size} = N_{\text{params}} \times 4 \text{ bytes} \]

For a HAR model with \(N_{\text{params}} = 50{,}000\) parameters (using 1 KB = 1,024 bytes):

\[ \text{Size}_{\text{float32}} = 50{,}000 \times 4 = 200{,}000 \text{ bytes} \approx 195 \text{ KB} \]

Quantized model (8-bit integers):

\[ \text{Size}_{\text{int8}} = 50{,}000 \times 1 = 50{,}000 \text{ bytes} \approx 49 \text{ KB} \]

Compression ratio: \(195 / 49 \approx 4\times\) size reduction.

Inference latency improvement: assuming INT8 kernels run about 3× faster than FP32 on a 240 MHz ESP32 (the real speedup depends on the chip variant and kernel library) and a float32 latency of 150 ms:

\[ \text{Latency}_{\text{int8}} \approx \frac{\text{Latency}_{\text{float32}}}{3} = \frac{150 \text{ ms}}{3} = 50 \text{ ms} \]

Accuracy degradation from int8 quantization is often under 2%. For 10 Hz inference (a 100 ms budget per prediction), the float32 model (150 ms) would miss the deadline, while the int8 model (50 ms) fits with headroom, so the trade-off is worthwhile.
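The size and latency arithmetic for ESP32 deployment can be wrapped in a small calculator. This sketch uses the same illustrative inputs as the text (50,000 parameters, 150 ms float32 latency, a 3× int8 speedup, a 100 ms budget); the speedup factor is an assumption you would replace with a measurement on your own hardware.

```python
def compression_estimate(n_params, fp32_latency_ms, int8_speedup=3.0, budget_ms=100.0):
    """Estimate size and latency before/after int8 quantization.

    int8_speedup and budget_ms are assumptions, not measured ESP32 figures.
    """
    kib = 1024
    size_fp32_kib = n_params * 4 / kib  # 4 bytes per float32 weight
    size_int8_kib = n_params * 1 / kib  # 1 byte per int8 weight
    int8_latency_ms = fp32_latency_ms / int8_speedup
    return {
        "size_fp32_kib": size_fp32_kib,
        "size_int8_kib": size_int8_kib,
        "ratio": size_fp32_kib / size_int8_kib,
        "int8_latency_ms": int8_latency_ms,
        "fp32_fits_budget": fp32_latency_ms <= budget_ms,
        "int8_fits_budget": int8_latency_ms <= budget_ms,
    }

est = compression_estimate(n_params=50_000, fp32_latency_ms=150.0)
for key, value in est.items():
    print(f"{key}: {value:.1f}" if isinstance(value, float) else f"{key}: {value}")
```

Changing `n_params` or `int8_speedup` shows quickly whether a candidate model fits both the memory and the 100 ms latency budget.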


3.8 Feature Extraction: What the Model Actually Sees

Classical ML models rarely work well on raw sensor readings directly, so we first extract features, meaningful statistics computed over a window of samples:

  • 1000 accelerometer samples -> mean, variance, peak frequency. These help distinguish running from sitting because running has much higher variance.
  • Heart rate over 1 minute -> average, minimum, maximum, and variability. These help distinguish exercise from rest.
  • Temperature readings -> rate of change and trend. These help with fire detection and HVAC optimization.
Flowchart showing feature extraction pipeline: raw accelerometer samples (1000 data points) flow through a sliding window into statistical feature extraction producing mean, variance, peak frequency, and zero crossings, which then feed into the ML model for activity classification
Figure 3.5: Feature Extraction Pipeline for Activity Recognition
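The sketch below computes the four statistics shown in the pipeline for one window of accelerometer magnitude samples. The synthetic signals (gravity plus a 2.5 Hz oscillation for running, gravity plus small noise for sitting) are hypothetical, and the peak frequency uses a naive discrete Fourier transform to stay dependency-free; with NumPy you would normally use `numpy.fft.rfft` instead.

```python
import math
import random

def extract_features(samples, fs_hz):
    """Mean, variance, peak frequency (Hz) and zero crossings for one window."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]  # remove the gravity / DC offset
    variance = sum(c * c for c in centered) / n

    # Naive DFT over bins 1..n/2 (DC skipped); O(n^2) but fine for short windows
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(c * math.cos(2 * math.pi * k * i / n) for i, c in enumerate(centered))
        im = -sum(c * math.sin(2 * math.pi * k * i / n) for i, c in enumerate(centered))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_bin, best_mag = k, mag
    peak_freq_hz = best_bin * fs_hz / n

    # Sign changes of the mean-removed signal
    zero_crossings = sum(1 for a, b in zip(centered, centered[1:]) if (a < 0) != (b < 0))
    return {"mean": mean, "variance": variance,
            "peak_freq_hz": peak_freq_hz, "zero_crossings": zero_crossings}

# Hypothetical data: a 10-second window at 50 Hz
random.seed(1)
FS_HZ = 50
t = [i / FS_HZ for i in range(500)]
sitting = [9.8 + random.gauss(0, 0.05) for _ in t]
running = [9.8 + 3.0 * math.sin(2 * math.pi * 2.5 * ti) + random.gauss(0, 0.05) for ti in t]

sit_feats = extract_features(sitting, FS_HZ)
run_feats = extract_features(running, FS_HZ)
print("sitting:", sit_feats)
print("running:", run_feats)
```

Both windows have a mean near 9.8 m/s² (gravity), so the mean alone cannot separate them; the variance and peak frequency are what distinguish running from sitting.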
Key Takeaway

In one sentence: The hardest part of IoT machine learning is not the algorithm - it is extracting the right features from noisy sensor data and compressing models small enough to run on constrained devices.

Remember this: Feature engineering usually contributes more to model accuracy than algorithm choice, so invest most of your effort (roughly 80%) in domain-informed features and the rest in model selection.

3.9 Edge ML: Running AI on Tiny Devices

Because IoT devices have limited resources, where a model runs is a key design decision. The following decision chain helps you choose between edge, cloud, or hybrid deployment:

Decision chain for edge versus cloud ML showing checks for latency, privacy, and model size before recommending an edge or cloud deployment

Edge ML Overview

Edge ML means running models directly on devices rather than in the cloud:

  • Cloud: Supports powerful, complex models, but needs internet access and adds latency. Example: voice assistants such as Alexa.
  • Edge: Fast and works offline, but limits model size. Example: fall detection on a smartwatch.
  • Hybrid: Combines both approaches at the cost of extra architectural complexity. Example: process locally while training in the cloud.

Why Edge ML?

  • Fast: No network round trip (critical for safety)
  • Private: Raw data can stay on the device
  • Cheaper to operate: Less cloud compute and data transfer
  • Efficient: Only results or events, not raw streams, need to be sent

Tradeoff: Edge ML vs Cloud ML

Option A (Edge ML): Deploy compressed models directly on IoT devices (microcontrollers, smartphones, gateways). There is no network round trip, so latency depends only on on-device compute (from microseconds for small decision trees to tens of milliseconds for small neural networks); it works offline and data stays private. Model size is limited by device memory (typically 50 KB to 50 MB).

Option B (Cloud ML): Send sensor data to cloud servers for processing with large, accurate models. It can use state-of-the-art architectures (transformers, large CNNs) with far looser size constraints, but it requires reliable connectivity and typically adds 50-500 ms of network latency.

Decision Factors: Choose Edge ML when latency is critical (fall detection, collision avoidance), privacy is paramount (health data, location tracking), connectivity is unreliable (remote industrial sites, mobile applications), or operating costs must be minimized (millions of devices, metered data plans). Choose Cloud ML when model accuracy is paramount and simpler edge models are insufficient, when models need frequent updates with new training data, or when device constraints are severe (<10KB available memory). Most production systems use a hybrid approach: edge handles time-critical inferences and pre-filtering while cloud handles complex analytics, model training, and aggregated insights.
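The decision factors can be encoded as a rule-of-thumb function. This is a deliberate simplification: the `Requirements` fields, the 50 ms default round-trip time (the low end of the 50-500 ms range above), and the choice to return "hybrid" when edge is needed but the model does not fit are assumptions for illustration, not a standard algorithm.

```python
from dataclasses import dataclass

@dataclass
class Requirements:
    max_latency_ms: float        # deadline for one prediction
    sensitive_data: bool         # e.g., health or location data
    reliable_connectivity: bool
    model_size_kb: float         # size after compression
    device_memory_kb: float      # memory available for the model on the device

def choose_deployment(req: Requirements, network_rtt_ms: float = 50.0) -> str:
    """Rule-of-thumb placement (assumed logic, simplified from the factors above)."""
    fits_on_device = req.model_size_kb <= req.device_memory_kb
    needs_edge = (req.max_latency_ms < network_rtt_ms   # cloud round trip too slow
                  or req.sensitive_data                 # keep raw data local
                  or not req.reliable_connectivity)     # must work offline
    if needs_edge and fits_on_device:
        return "edge"
    if needs_edge:
        # Edge preferred but model too large: run a smaller model locally
        # for time-critical filtering and defer heavy analysis to the cloud.
        return "hybrid"
    return "cloud"

# Hypothetical scenarios
fall_detector = Requirements(20, True, False, 45, 320)
fleet_analytics = Requirements(5000, False, True, 500_000, 320)
camera_detector = Requirements(30, True, True, 20_000, 320)

for name, req in [("fall_detector", fall_detector),
                  ("fleet_analytics", fleet_analytics),
                  ("camera_detector", camera_detector)]:
    print(f"{name}: {choose_deployment(req)}")
```

Real projects weigh these factors rather than applying hard cutoffs, but writing them down this way makes the assumptions behind a deployment choice explicit and reviewable.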

3.10 Common Misconception: Accuracy Metrics

“95% Accuracy Means My Model Is Great!”

The Trap: Many IoT developers celebrate achieving 95% accuracy, assuming this guarantees production success. However, accuracy alone is misleading for imbalanced datasets.

Real-World Example: Smart Factory Anomaly Detection

A manufacturing company deployed a 95% accurate anomaly detector:

  • Dataset: 10,000 normal operations + 500 anomalies (about 95% normal, 5% anomalies)
  • Naive model: Always predict “normal” - 95% accuracy! (Detects 0% of anomalies)
  • 95% accurate model: Catches 90% of anomalies (450 of 500) BUT raises false alarms on 4.75% of normal operations (475 of 10,000)

The Financial Impact:

  • Production line: 1,000,000 checks/day
  • False positives: 950,000 normal checks x 4.75% ≈ 45,125 false alarms/day
  • Investigation cost: 45,125 false alarms x $50 ≈ $2.26M/day wasted

The Fix: Right Metrics for IoT

Instead of accuracy, use:

  1. Precision: True positives / (true positives + false positives)
  2. Recall: True positives / actual anomalies
  3. F1-Score: Harmonic mean balancing precision/recall
  4. Specificity: True negatives / actual normals

Key Lesson: For rare events (falls, machine failures, security breaches), optimize for high specificity (99.9%+) and high recall (95%+), NOT overall accuracy.
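To see why accuracy hides the problem, the sketch below computes all five metrics from confusion-matrix counts for the factory dataset (10,000 normal operations, 500 anomalies). The detector's counts are an assumption chosen to give exactly 95% accuracy: it catches 450 of 500 anomalies and raises 475 false alarms.

```python
def metrics(tp, fp, tn, fn):
    """Binary metrics with 'anomaly' as the positive class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

# 10,000 normal operations and 500 anomalies (assumed detector counts)
naive = metrics(tp=0, fp=0, tn=10_000, fn=500)        # always predicts "normal"
detector = metrics(tp=450, fp=475, tn=9_525, fn=50)   # catches 90% of anomalies

for name, m in [("naive", naive), ("detector", detector)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Both models score about 95% accuracy, yet the naive model has zero recall and the detector's precision is under 50%, meaning roughly one alarm in two is false. Only the per-class metrics reveal the difference.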

3.11 Worked Example: Building a Simple Activity Classifier

Worked Example: Smartphone Activity Recognition

Scenario: You are building a fitness app that detects whether the user is walking, running, or stationary using the smartphone’s accelerometer.

Given:

  • Accelerometer sampling at 50 Hz (50 readings per second)
  • 3-axis data: X (left-right), Y (forward-back), Z (up-down)
  • Target activities: Stationary, Walking, Running
  • Training data: 100 users, 10 minutes each activity x 3 activities = 3000 minutes total
  • Deployment target: Android smartphone, must run in background with <5% battery impact

Step-by-Step Solution:

  1. Window the data: Group samples into 2-second windows (100 samples per window)

    • Why 2 seconds? Long enough to capture a walking stride (~1s), short enough for responsive detection
  2. Extract features per window (for each axis and magnitude):

    • Mean: sum(x)/n -> detects orientation changes.
    • Standard Deviation: sqrt(sum((x-mean)^2)/n) -> higher for running than walking.
    • Peak-to-Peak: max(x) - min(x) -> running has larger amplitude.
    • Zero Crossing Rate: count(sign changes of x - mean)/n -> higher for rhythmic activities (the mean is removed first because gravity keeps most raw axes from crossing zero).

    Total: 4 features x 4 (X, Y, Z, magnitude) = 16 features per window

  3. Train a Random Forest classifier:

    • Input: 16 features
    • Output: Stationary (0), Walking (1), Running (2)
    • Why Random Forest? Handles mixed feature types, fast inference, no normalization needed
  4. Evaluate using stratified 5-fold cross-validation:

    • Stationary: precision 99.2%, recall 99.5%, F1-score 99.3%.
    • Walking: precision 94.1%, recall 92.8%, F1-score 93.4%.
    • Running: precision 96.5%, recall 97.3%, F1-score 96.9%.
  5. Deploy with optimizations:

    • Model size: 45 KB (100 trees, max depth 8)
    • Inference time: 2ms per window
    • Battery: Sample 5s, sleep 5s = 50% duty cycle, ~3% battery/hour
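Steps 1 and 2 can be sketched in plain Python as follows. Two details are assumptions not fixed by the text: windows do not overlap, and the zero crossing rate is computed on the mean-removed signal. The synthetic recording is hypothetical.

```python
import math

FS_HZ = 50
WINDOW = 2 * FS_HZ  # 2-second windows = 100 samples

def axis_features(x):
    """[mean, std, peak-to-peak, zero crossing rate] for one signal window."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    std = math.sqrt(sum(c * c for c in centered) / n)
    p2p = max(x) - min(x)
    # Assumption: count sign changes after removing the mean
    zcr = sum(1 for a, b in zip(centered, centered[1:]) if (a < 0) != (b < 0)) / n
    return [mean, std, p2p, zcr]

def window_features(ax, ay, az):
    """16 features: axis_features for X, Y, Z and the magnitude signal."""
    mag = [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]
    feats = []
    for signal in (ax, ay, az, mag):
        feats.extend(axis_features(signal))
    return feats

def windowed_features(ax, ay, az, size=WINDOW):
    """Non-overlapping windows (assumption); a trailing partial window is dropped."""
    return [window_features(ax[s:s + size], ay[s:s + size], az[s:s + size])
            for s in range(0, len(ax) - size + 1, size)]

# Hypothetical 10 s recording: 2 Hz bounce on top of gravity
t = [i / FS_HZ for i in range(10 * FS_HZ)]
ax = [0.3 * math.sin(2 * math.pi * 2 * ti) for ti in t]
ay = [0.1] * len(t)
az = [9.8 + 1.5 * math.sin(2 * math.pi * 2 * ti) for ti in t]

rows = windowed_features(ax, ay, az)
print(len(rows), "windows x", len(rows[0]), "features")
```

Each 16-element row is one training example; a classifier such as scikit-learn's `RandomForestClassifier(n_estimators=100, max_depth=8)` would then be fit on these rows and the activity labels.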

Result: 96.5% macro F1-score (the mean of the three per-class F1-scores) with 3% hourly battery consumption. Walking/Running confusion (6% of walking detected as running) occurs during fast walking - acceptable for fitness tracking.

Key Insight: Simple features (mean, std) outperformed complex frequency features in this case. Always start simple - add complexity only when needed.

3.12 Worked Example: Decision Tree for Sensor Fault Classification

Worked Example: Detecting Faulty Sensors in a Building HVAC System

Scenario: A commercial building has 50 temperature sensors monitoring HVAC zones. Over time, sensors develop faults – stuck readings, drift, erratic spikes, or gradual failure. Currently, a technician manually inspects sensor logs weekly, taking 4 hours. You want to automate fault detection using a decision tree classifier.

Given:

  • 50 temperature sensors, sampling every 60 seconds
  • 6 months of labeled data (technician flagged 312 confirmed faults)
  • Fault categories: Normal (0), Stuck (1), Drift (2), Spike (3), Dead (4)
  • Goal: Flag faulty sensors within 1 hour of fault onset

Step 1: Engineer features from raw temperature streams

For each sensor, compute features over a 1-hour sliding window (60 readings):

  • Variance: var(readings) -> stuck sensors have near-zero variance.
  • Range: max - min -> dead sensors have range = 0.
  • Delta from neighbors: abs(mean_self - mean_nearest_3) -> drifted sensors diverge from nearby sensors.
  • Spike count: count(abs(diff) > 3 * std) -> erratic sensors have frequent large jumps.
  • Trend slope: linear_regression_slope(readings) -> drifting sensors show a consistent positive or negative slope.
  • Flatline ratio: count(consecutive_identical) / n -> stuck sensors repeat the same value.
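A minimal sketch of these six features for one 1-hour window follows. The text leaves some details open, so these are assumptions: the spike threshold uses the window's own standard deviation, the trend slope is an ordinary least-squares fit against the reading index (°C per minute at 60-second sampling), and `neighbor_means` holds precomputed window means of the three nearest sensors.

```python
import statistics

def sensor_features(readings, neighbor_means):
    """Six fault-detection features for one 1-hour window of readings (deg C).

    readings: oldest first, one per minute (60 values).
    neighbor_means: window means of the 3 nearest sensors (assumed precomputed).
    """
    n = len(readings)
    mean = statistics.fmean(readings)
    variance = statistics.pvariance(readings)
    range_val = max(readings) - min(readings)
    neighbor_delta = abs(mean - statistics.fmean(neighbor_means))

    # Spikes: reading-to-reading jumps larger than 3x the window's std (assumption)
    std = variance ** 0.5
    diffs = [b - a for a, b in zip(readings, readings[1:])]
    spike_count = sum(1 for d in diffs if std > 0 and abs(d) > 3 * std)

    # Ordinary least-squares slope against the reading index (deg C per minute)
    x_mean = (n - 1) / 2
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    trend_slope = sum((i - x_mean) * (y - mean) for i, y in enumerate(readings)) / sxx

    # Fraction of consecutive readings that repeat exactly
    flatline_ratio = sum(1 for d in diffs if d == 0) / (n - 1)

    return {"variance": variance, "range_val": range_val,
            "neighbor_delta": neighbor_delta, "spike_count": spike_count,
            "trend_slope": trend_slope, "flatline_ratio": flatline_ratio}

# Hypothetical windows: a mostly stuck sensor and a steadily drifting one
stuck = sensor_features([21.5] * 55 + [21.6] * 5, neighbor_means=[21.9, 22.1, 21.8])
drift = sensor_features([20 + 0.05 * i for i in range(60)],
                        neighbor_means=[20.1, 20.0, 20.2])
print("stuck:", stuck)
print("drift:", drift)
```

The dictionary keys match the parameter names of `classify_sensor` in Step 5, so one window's features could be passed to it directly with `classify_sensor(**features)`.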

Step 2: Examine the decision boundaries

A decision tree learns rules like these from labeled data:

  • If variance < 0.01 and range = 0, classify the sensor as Dead.
  • If variance < 0.01 and flatline_ratio > 0.85, classify the sensor as Stuck.
  • If spike_count > 5, classify the sensor as Spike.
  • If neighbor_delta > 4 °C and the trend keeps diverging, classify the sensor as Drift.
  • Otherwise, classify the sensor as Normal.

Why a decision tree and not a neural network? Three reasons specific to this IoT use case:

  1. Interpretability: When a technician receives a “Sensor 23: DRIFT” alert, they need to understand WHY. A decision tree can report: “Flagged because variance=0.34 (OK), but neighbor_delta=5.2 °C (threshold: 4 °C).” A neural network cannot explain its reasoning this clearly.

  2. Small training set: 312 confirmed faults (about 800 labeled fault windows across four fault classes) is modest. Decision trees perform well on small datasets, whereas deep learning often needs thousands of examples per class.

  3. Fast inference: The rules above need only a handful of threshold comparisons per prediction (at most six for a depth-6 tree). On an ESP32, this takes microseconds, while even a small neural network would typically need milliseconds and more memory.

Step 3: Train and evaluate with Python

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Feature matrix: 50 sensors x ~182.5 days x 24 hourly windows/day
# = ~219,000 samples with 6 features each
# Labels: 0=Normal, 1=Stuck, 2=Drift, 3=Spike, 4=Dead
X = np.load("sensor_features.npy")   # shape: (219000, 6)
y = np.load("sensor_labels.npy")     # shape: (219000,)

# Class distribution (highly imbalanced):
# Normal: 218,200 (99.6%), Stuck: 380 (0.17%),
# Drift: 220 (0.10%), Spike: 150 (0.07%), Dead: 50 (0.02%)

# Decision tree with depth and leaf-size limits to prevent overfitting
clf = DecisionTreeClassifier(
    max_depth=6,              # At most 2^6 = 64 leaves
    min_samples_leaf=20,      # Require 20+ samples per leaf
    class_weight="balanced",  # Upweight rare fault classes
    random_state=42,          # Reproducible results
)

# Stratified 5-fold cross-validation: every window is predicted by a model that
# did not see it during training, so one report covers all 219,000 windows.
# Caveat: shuffled folds mix adjacent hours from the same sensor; a chronological
# split gives a more conservative estimate (see Common Pitfalls).
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
y_pred = cross_val_predict(clf, X, y, cv=skf)
print(classification_report(
    y, y_pred,
    labels=[0, 1, 2, 3, 4],
    target_names=["Normal", "Stuck", "Drift", "Spike", "Dead"],
    digits=3,
))
```

Step 4: Evaluate results

  • Normal: precision 99.9%, recall 99.7%, F1-score 99.8%, support 218,200.
  • Stuck: precision 87.2%, recall 94.5%, F1-score 90.7%, support 380.
  • Drift: precision 78.4%, recall 82.3%, F1-score 80.3%, support 220.
  • Spike: precision 91.0%, recall 86.7%, F1-score 88.8%, support 150.
  • Dead: precision 96.0%, recall 96.0%, F1-score 96.0%, support 50.

Interpreting the results:

  • Normal: 99.7% recall means only 0.3% of normal readings get falsely flagged. With 218,000 normal windows, that is ~654 false alarms per 6 months, or ~3.6 per day. Acceptable for a 50-sensor building.
  • Stuck: 94.5% recall catches nearly all stuck sensors (about 359 of 380 windows). The 87.2% precision means some normal low-variance periods (nighttime, stable weather) get flagged: roughly 53 false stuck alerts over 6 months, or about 2 per week.
  • Drift: 82.3% recall is the weakest. Slow drift is genuinely hard to distinguish from real temperature changes. Adding HVAC schedule as a feature would improve this.
  • Dead: 96% both ways. Dead sensors are obvious (range=0, variance=0) and easy to classify.

Step 5: Deploy to edge gateway

The trained tree has max_depth=6, meaning at most 64 leaf nodes. Exported as a series of if-else statements, the model is under 2 KB:

```python
# Hand-translated from the rules printed by sklearn.tree.export_text(clf)
def classify_sensor(variance, range_val, neighbor_delta,
                    spike_count, trend_slope, flatline_ratio):
    """Return 0=Normal, 1=Stuck, 2=Drift, 3=Spike or 4=Dead for one window."""
    if variance < 0.01:
        if range_val < 0.001:
            return 4  # Dead
        if flatline_ratio > 0.85:
            return 1  # Stuck (repeating values)
        return 1      # Stuck (low variance)
    if spike_count > 5.5:
        if spike_count > 12:
            return 3  # Spike (severe)
        if variance > 8.2:
            return 3  # Spike
        return 0      # Normal (occasional spikes OK)
    if neighbor_delta > 3.8:
        if abs(trend_slope) > 0.02:
            return 2  # Drift (diverging with trend)
        return 0      # Normal (sensor in different zone)
    return 0          # Normal
```

Result: The building now detects sensor faults within 1 hour instead of 1 week. The 4-hour weekly manual inspection is replaced by reviewing ~4 alerts per day (under 10 minutes). Over 50 sensors, this catches 2-3 faults per month that previously went undetected for days, preventing HVAC energy waste estimated at $200-400 per undetected drift fault.

Key Insight: For tabular IoT data (sensor features arranged in rows and columns), decision trees and random forests often match or outperform deep learning while being faster, more interpretable, and easier to deploy on edge devices. Reserve neural networks for unstructured data like images, audio, or raw time-series, where learned features can outperform hand-engineered ones.

3.13 Self-Check Questions

Test Your Understanding

Before continuing, can you answer:

  1. What’s the difference between training and inference?
    • Hint: One learns, one applies
  2. Why do we extract features from raw sensor data?
    • Hint: Can an ML model understand “X: 0.2, Y: 9.8”?
  3. Why would you run ML on the edge instead of the cloud?
    • Hint: Think about speed, privacy, and connectivity
  4. Why is 95% accuracy potentially misleading?
    • Hint: What if 95% of your data is one class?

3.14 See Also

ML Fundamentals serves as the foundation for the entire IoT ML series. The core trade-off – cloud training vs. edge inference – appears throughout IoT systems: training requires powerful compute and large datasets, while inference must run on constrained devices with millisecond latency.


3.15 Try It Yourself

Hands-On Challenge: Experience the training vs inference phases with a simple temperature anomaly detector

Task: Build a threshold-based “ML lite” model that learns normal temperature patterns:

  1. Training Phase (Simulate Learning):
    • Collect 100 temperature readings during normal operation (simulate: 20-25°C with random noise)
    • Calculate the mean and standard deviation of training data
    • Define anomaly threshold: mean ± 3 standard deviations
    • Save these learned parameters (mean, std) as your “model”
  2. Inference Phase (Apply Model):
    • Generate 20 new test readings (18 normal 20-25°C, 2 anomalies at 35°C and 10°C)
    • For each reading, check if it falls outside mean ± 3σ threshold
    • Count how many anomalies you detected correctly

What to Observe:

  • Training computes statistics once; inference applies them repeatedly
  • The “model” is just two numbers (mean, std) but encodes learned normal behavior
  • False positives occur if normal data has outliers; false negatives if anomalies are close to normal range

Extension: Try with smaller training sets (10 samples vs 100)—notice how model quality depends on training data quantity.
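One possible solution sketch, using only the standard library. Drawing normal readings uniformly from 20-25 °C is an assumption about what "random noise" means here; swapping in `random.gauss` or a smaller training set shows how the learned threshold shifts.

```python
import random
import statistics

random.seed(42)

# --- Training phase: learn "normal" from 100 readings (assumed uniform 20-25 deg C) ---
train = [random.uniform(20.0, 25.0) for _ in range(100)]
model = {"mean": statistics.fmean(train), "std": statistics.stdev(train)}

def is_anomaly(reading, model, k=3.0):
    """Inference: flag readings outside mean +/- k * std."""
    return abs(reading - model["mean"]) > k * model["std"]

# --- Inference phase: 18 normal readings plus 2 injected anomalies ---
test = [random.uniform(20.0, 25.0) for _ in range(18)] + [35.0, 10.0]
labels = [False] * 18 + [True, True]
preds = [is_anomaly(r, model) for r in test]

true_pos = sum(p and y for p, y in zip(preds, labels))
false_pos = sum(p and not y for p, y in zip(preds, labels))
print(f"model: mean={model['mean']:.2f}, std={model['std']:.2f}")
print(f"detected {true_pos}/2 anomalies, {false_pos} false alarms")
```

The whole "model" is the two numbers in `model`; training computes them once, and `is_anomaly` reuses them for every new reading.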

Common Pitfalls

A model evaluated on the same data it was trained on will appear highly accurate due to memorisation rather than generalisation. Always evaluate on a held-out test set or with cross-validation.

Random shuffling of time-series IoT data before splitting produces train/test contamination because future information leaks into training. Always split chronologically: train on earlier data, test on later data.
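A chronological split can be as simple as sorting by timestamp and cutting at a fixed fraction. The 80/20 ratio and the `(timestamp, features, label)` record layout are assumptions for illustration; scikit-learn's `TimeSeriesSplit` offers the same idea as expanding-window cross-validation folds.

```python
from datetime import datetime, timedelta

def chronological_split(records, train_fraction=0.8):
    """Sort (timestamp, features, label) records by time: earliest -> train, latest -> test."""
    ordered = sorted(records, key=lambda r: r[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

# Hypothetical hourly records, deliberately stored out of order
start = datetime(2024, 1, 1)
records = [(start + timedelta(hours=h), [float(h)], h % 2) for h in reversed(range(100))]

train_set, test_set = chronological_split(records)
print(len(train_set), "train /", len(test_set), "test")
print("train ends:", train_set[-1][0], "| test starts:", test_set[0][0])
```

Every test record is later than every training record, so the evaluation mimics deployment: the model only ever predicts the future.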

A model trained on data with 0.1% anomalies can learn to always predict ‘normal’ and still achieve 99.9% accuracy. Address class imbalance explicitly with oversampling (SMOTE), undersampling, or class-weighted loss functions.
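Class weighting is often the easiest of these fixes. The sketch below reproduces the formula scikit-learn documents for `class_weight="balanced"`, n_samples / (n_classes × count), applied to the class counts from the HVAC fault example earlier in this chapter. Each class ends up with the same total weight (count × weight), so each fault class as a whole counts as much as the entire Normal class in the loss.

```python
def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count).

    Mirrors the formula documented for scikit-learn's class_weight="balanced".
    """
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}

# Class distribution from the HVAC fault example
counts = {"Normal": 218_200, "Stuck": 380, "Drift": 220, "Spike": 150, "Dead": 50}
weights = balanced_class_weights(counts)
for label, w in weights.items():
    print(f"{label:>6}: weight {w:8.2f}")
```

A single Dead window ends up weighted about 876 times, versus about 0.2 for a Normal window.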

Laboratory cross-validation uses historical data; real deployments encounter distribution shift (new equipment, changed processes, seasonal variation). Always run a shadow deployment period where the model makes predictions alongside human decisions before trusting it for automated action.

3.16 Summary

This chapter introduced the fundamentals of machine learning for IoT:

  • Training vs Inference: Training learns patterns in the cloud; inference applies them on edge devices
  • Feature Extraction: Raw sensor data must be transformed into meaningful statistics
  • Model Compression: Techniques like quantization and pruning enable deployment on constrained devices
  • Accuracy Metrics: Use precision, recall, and F1-score instead of just accuracy for imbalanced IoT data

3.17 What’s Next