3  IoT Machine Learning Fundamentals

3.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Distinguish Training from Inference: Differentiate between training and inference phases in machine learning
  • Explain Feature Extraction: Describe why raw sensor data must be transformed into meaningful features
  • Evaluate Accuracy Metrics: Assess why accuracy alone is misleading for imbalanced IoT datasets
  • Compare Edge vs Cloud ML: Analyze the trade-offs between running ML on devices vs in the cloud

In 60 Seconds

Machine learning for IoT involves selecting, training, validating, and deploying models that map sensor feature vectors to actionable predictions — and the unique IoT constraints are the scarcity of labelled data, the need for edge deployment, and the non-stationarity of sensor environments. Understanding the bias-variance trade-off and the difference between supervised, unsupervised, and semi-supervised learning is the foundation for all subsequent IoT ML work.

Minimum Viable Understanding: IoT Machine Learning

Core Concept: Machine learning transforms raw sensor data (numbers like “9.8 m/s^2”) into meaningful insights (“user is running”). Models learn patterns from historical data (training) then apply those patterns to new data (inference).

Why It Matters: IoT generates billions of data points daily. Without ML, you’d need humans to interpret every reading. With ML, sensors become intelligent - detecting falls, predicting equipment failures, and recognizing activities automatically.

Key Takeaway: Feature engineering (extracting meaningful statistics from raw data) contributes more to model accuracy than algorithm choice. Spend 80% of your time on features and 20% on model selection.

3.2 Prerequisites

Before diving into this chapter, you should be familiar with:

  • Data Storage and Databases: Understanding how IoT data is collected, stored, and accessed provides the foundation for building machine learning models
  • Edge and Fog Computing: Knowledge of distributed computing architectures helps contextualize where ML inferencing occurs
  • Basic statistics concepts: Understanding mean, variance, and standard deviation helps with feature engineering

~15 min | Beginner | P10.C02.U01

Key Concepts

  • Supervised learning: Training a model on labelled examples (sensor readings paired with known outcomes) to predict labels on new, unseen data — requires labelled IoT datasets which are often expensive to obtain.
  • Unsupervised learning: Finding patterns in unlabelled sensor data — clustering similar device behaviours, detecting anomalies — without requiring labelled training examples.
  • Bias-variance trade-off: The fundamental tension between model simplicity (high bias, underfitting) and model complexity (high variance, overfitting); finding the right balance is the core model selection challenge in IoT ML.
  • Cross-validation: A technique for estimating model generalisation performance by training and evaluating on multiple different splits of the available data, producing a more reliable accuracy estimate than a single train/test split.
  • Class imbalance: The situation where anomalous or failure events are far less common than normal events in training data (0.1% vs 99.9%), requiring techniques like oversampling, undersampling, or class-weighted loss functions.
  • Model generalisation: The ability of a trained model to perform accurately on sensor data from deployment conditions not seen during training — the ultimate goal of IoT ML model development.

Chapter Series: Modeling and Inferencing

This is the first chapter in a series on IoT Machine Learning:

  1. ML Fundamentals (this chapter) - Core concepts, training vs inference
  2. Mobile Sensing & Activity Recognition - HAR, transportation detection
  3. IoT ML Pipeline - 7-step pipeline, best practices
  4. Edge ML & Deployment - TinyML, quantization
  5. Audio Feature Processing - MFCC, keyword recognition
  6. Feature Engineering - Feature design and selection
  7. Production ML - Monitoring, anomaly detection

Geometric visualization of the data science pipeline for IoT showing stages from raw sensor data through feature engineering, model training, validation, and deployment with feedback loops for continuous improvement
Figure 3.1: The data science pipeline for IoT follows a systematic progression from raw sensor streams to deployed models.

3.3 Getting Started (For Beginners)

Machine Learning is like teaching your sensors to become super-smart detectives!

3.3.1 The Sensor Squad Adventure: The Pattern Patrol

It was a quiet Tuesday when Motion Mo noticed something strange. “Hey team, I keep seeing the same pattern every day! The humans wake up at 7am, eat breakfast at 7:30, and leave for work at 8:15.”

Thermo the Temperature Sensor nodded excitedly. “I see patterns too! The house gets warm around 6pm when everyone comes home, and it cools down at 11pm when they go to bed.”

Signal Sam gathered everyone for an important announcement. “Sensors, you’ve just discovered something AMAZING. You’re doing what scientists call Machine Learning - finding patterns in data and using them to make smart predictions!”

Sam drew a picture to explain:

Step 1 - COLLECT: “First, we gather lots of examples. Motion Mo, you’ve recorded 10,000 mornings of people waking up.”

Step 2 - LEARN: “Then we show a computer all those examples. ‘See how the motion always starts in the bedroom, then moves to the bathroom, then to the kitchen? THAT’S the wake-up pattern!’”

Step 3 - PREDICT: “Now the smart part! When Motion Mo sees that same pattern starting, the computer can predict: ‘Someone’s waking up! Better start warming up the coffee maker!’”

3.3.2 Key Words for Kids

| Word | What It Means |
|---|---|
| Machine Learning | Teaching computers to find patterns and make predictions |
| Pattern | Something that happens the same way over and over |
| Training | Showing the computer LOTS of examples so it can learn |
| Prediction | A smart guess about what will happen next |
| Model | The “brain” that the computer builds after learning |
| Inference | When the trained model looks at NEW data and makes a prediction |

3.3.3 Try This at Home!

Be a Pattern Detective:

  1. For one week, write down what time you go to bed and wake up
  2. Also note if it’s a school day or weekend
  3. After a week, look at your data - can you find the pattern?
  4. Now make a PREDICTION: What time will you wake up next Saturday?

Congratulations - you just did Machine Learning with your brain!

3.4 IoT Machine Learning Lifecycle Overview

The IoT ML lifecycle involves distinct phases, from data collection to deployed model maintenance:

Flowchart showing IoT machine learning lifecycle with six phases: Data Collection from sensors, Feature Engineering for pattern extraction, Model Training in cloud, Model Compression for edge deployment, Edge Inference for real-time predictions, and Monitoring for continuous improvement with feedback loop

Key Phases:

| Phase | Purpose | Location | Output |
|---|---|---|---|
| Data Collection | Gather sensor readings | Edge devices | Raw time-series data |
| Feature Engineering | Extract meaningful patterns | Edge or Cloud | Feature vectors |
| Model Training | Learn from historical data | Cloud (GPUs) | Trained model weights |
| Compression | Reduce model size | Cloud | Optimized model |
| Edge Inference | Apply model to new data | Edge devices | Predictions |
| Monitoring | Track model performance | Cloud | Drift alerts |

3.5 What is Machine Learning for IoT?

Simple Explanation

Analogy: ML for IoT is like teaching a smart assistant to recognize patterns.

Imagine you have a fitness tracker. Instead of just showing raw numbers (steps: 5000, heart rate: 120), it can tell you:

  • “You’re running” (not walking)
  • “Your workout intensity is high”
  • “You might be getting sick” (unusual heart patterns)

That’s machine learning—turning raw sensor data into meaningful insights!

Diagram showing machine learning transformation: raw accelerometer values (9.8, 0.2, 0.1 m/s squared) flow through an ML model to produce human-readable outputs like activity state equals running and intensity equals high
Figure 3.2: Machine Learning Transforms Raw Sensor Numbers into Actionable Insights

3.6 Training vs Inference: Two Phases

Machine learning for IoT follows a distinct two-phase workflow that separates learning from application:

Phase 1: Training (Cloud, One-Time)

  1. Data Collection: Gather labeled examples—10,000 hours of walking/running/sitting accelerometer data from diverse users
  2. Feature Extraction: Transform raw sensor streams into statistical features (mean, variance, FFT peaks)
  3. Model Training: Feed features into algorithm (Random Forest, Neural Network) that learns patterns distinguishing classes
  4. Validation: Test on held-out data to measure accuracy and tune hyperparameters
  5. Compression: Quantize weights (float32 → int8) and prune weak connections to fit edge device constraints
  6. Output: A trained model file (weights + architecture) ready for deployment

Phase 2: Inference (Edge Device, Real-Time)

  1. Sensor Input: Accelerometer provides live data stream (50 Hz)
  2. Feature Extraction: Same feature pipeline as training (must match exactly)
  3. Model Execution: Forward pass through trained model using extracted features
  4. Prediction Output: Classification label (e.g., “running”) or confidence scores
  5. Action: Trigger response (update dashboard, log event, adjust system)

Key Separation: Training happens once in the cloud with powerful GPUs over days; inference runs continuously on devices (ESP32, smartphone) with millisecond latency. The model learned during training is frozen during inference—no learning happens on the device.
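
The two phases can be sketched with scikit-learn (a minimal illustration only: the two-feature windows, cluster centres, and labels below are synthetic stand-ins, not real accelerometer data):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# --- Phase 1: Training (cloud, one-time) ---
rng = np.random.default_rng(0)
# Synthetic feature vectors: [mean, variance] per 2-second window
X_train = np.vstack([
    rng.normal([9.8, 0.1], 0.05, (100, 2)),   # sitting: low variance
    rng.normal([9.8, 4.0], 0.5, (100, 2)),    # running: high variance
])
y_train = np.array([0] * 100 + [1] * 100)     # 0=sitting, 1=running

model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X_train, y_train)                   # learning happens here, once

# --- Phase 2: Inference (edge, real-time) ---
new_window = np.array([[9.8, 3.8]])           # live features, high variance
print(model.predict(new_window))              # -> [1] ("running")
```

Note that `fit` is never called in the inference phase: the model is frozen, exactly as described above.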

| Phase | What Happens | Where | Example |
|---|---|---|---|
| Training | Learn patterns from data | Cloud (powerful computers) | Analyze 10,000 hours of walking/running data |
| Inference | Apply learned patterns | Edge/Device (real-time) | Detect current activity from live sensor |

Diagram showing two-phase ML workflow: Training phase in cloud with powerful GPUs processes historical data to create a model, then the compressed model is deployed to edge devices for real-time inference on live sensor data
Figure 3.3: Cloud Training to Edge Inference Deployment Pipeline

3.7 Model Compression for IoT

IoT devices have limited resources. Large models must be compressed before deployment:

Model compression pipeline showing original model (100MB, 95% accuracy) undergoing quantization (float32 to int8), pruning (removing weak connections), and knowledge distillation to produce compressed model (2MB, 89% accuracy) suitable for edge deployment
Figure 3.4: Model compression enables 50x size reduction with only 6% accuracy loss, making real-time inference possible on resource-constrained IoT devices.

Common Compression Techniques:

| Technique | How It Works | Size Reduction | Accuracy Impact |
|---|---|---|---|
| Quantization | Reduce precision (float32 to int8) | 4x | 1-3% loss |
| Pruning | Remove near-zero weights | 2-10x | 1-5% loss |
| Knowledge Distillation | Train small model to mimic large one | 10-100x | 2-10% loss |
| Architecture Search | Find efficient model structure | Variable | Can improve! |

Model Compression for ESP32 Deployment:

Compressing a neural network for embedded inference involves quantifying memory and latency trade-offs.

Original model (32-bit floating point):

\[ \text{Model size} = N_{\text{params}} \times 4 \text{ bytes} \]

For a HAR model with \(N_{\text{params}} = 50{,}000\) parameters:

\[ \text{Size}_{\text{float32}} = 50{,}000 \times 4 = 200{,}000 \text{ bytes} \approx 195 \text{ KB} \]

Quantized model (8-bit integers):

\[ \text{Size}_{\text{int8}} = 50{,}000 \times 1 = 50{,}000 \text{ bytes} \approx 49 \text{ KB} \]

Compression ratio: \(195 / 49 \approx 4\times\) size reduction.

Inference latency improvement: INT8 operations on the ESP32 (240 MHz) are roughly 3× faster than FP32:

\[ \text{Latency}_{\text{int8}} \approx \frac{\text{Latency}_{\text{float32}}}{3} = \frac{150 \text{ ms}}{3} = 50 \text{ ms} \]

Accuracy degradation is typically <2% for quantized models, making this trade-off acceptable for edge deployment where 10 Hz inference (100 ms budget) is sufficient.
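
The arithmetic above can be packaged as a quick back-of-envelope estimator (a sketch; `compression_estimate` is a hypothetical helper, and the 3× INT8 speedup is the rule of thumb from the text, not a measured figure):

```python
def compression_estimate(n_params, latency_fp32_ms, int8_speedup=3.0):
    """Estimate the size and latency effect of float32 -> int8 quantization."""
    size_fp32_kb = n_params * 4 / 1024   # 4 bytes per float32 weight
    size_int8_kb = n_params * 1 / 1024   # 1 byte per int8 weight
    return {
        "size_fp32_kb": round(size_fp32_kb, 1),
        "size_int8_kb": round(size_int8_kb, 1),
        "compression_ratio": size_fp32_kb / size_int8_kb,   # always 4x
        "latency_int8_ms": latency_fp32_ms / int8_speedup,
    }

# The HAR model from the text: 50,000 parameters, 150 ms FP32 inference
est = compression_estimate(50_000, 150.0)
print(est)  # ~195 KB -> ~49 KB (4x), latency 150 ms -> 50 ms
```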

3.7.1 Explore: Model Compression Calculator

Adjust model parameters to see how quantization and pruning affect size, latency, and device fit.

3.8 Feature Extraction: What the Model Actually Sees

ML models don’t understand raw sensor readings. We extract features—meaningful statistics:

| Raw Data | Extracted Features | Why It Matters |
|---|---|---|
| 1000 accelerometer samples | Mean, variance, peak frequency | Running has higher variance than sitting |
| Heart rate over 1 minute | Average, min, max, variability | Exercise vs rest patterns |
| Temperature readings | Rate of change, trend | Fire detection, HVAC optimization |

Flowchart showing feature extraction pipeline: raw accelerometer samples (1000 data points) flow through a sliding window into statistical feature extraction producing mean, variance, peak frequency, and zero crossings, which then feed into the ML model for activity classification
Figure 3.5: Feature Extraction Pipeline for Activity Recognition
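
A minimal sketch of the extraction step in Figure 3.5, assuming a 2-second window sampled at 50 Hz (`window_features` is a hypothetical helper; a real pipeline would add overlapping windows and filtering):

```python
import numpy as np

def window_features(window, fs=50.0):
    """Extract the statistical features from Figure 3.5 for one window."""
    window = np.asarray(window, dtype=float)
    # Dominant (peak) frequency via FFT, after removing the DC component
    spectrum = np.abs(np.fft.rfft(window - window.mean()))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    peak_freq = freqs[np.argmax(spectrum)]
    # Zero crossings of the mean-centred signal
    centred = window - window.mean()
    zero_crossings = int(np.sum(np.diff(np.sign(centred)) != 0))
    return {
        "mean": window.mean(),
        "variance": window.var(),
        "peak_freq_hz": peak_freq,
        "zero_crossings": zero_crossings,
    }

# 2-second window at 50 Hz: a 2.5 Hz sinusoid mimicking running cadence
t = np.arange(100) / 50.0
feats = window_features(9.8 + 2.0 * np.sin(2 * np.pi * 2.5 * t))
print(feats["peak_freq_hz"])  # -> 2.5
```

The model never sees the 100 raw samples, only this handful of numbers per window.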

Key Takeaway

In one sentence: The hardest part of IoT machine learning is not the algorithm - it is extracting the right features from noisy sensor data and compressing models small enough to run on constrained devices.

Remember this: Feature engineering contributes more to model accuracy than algorithm choice - spend 80% of your time on features (domain knowledge) and 20% on model selection.

3.9 Edge ML: Running AI on Tiny Devices

IoT devices have limited resources. The following decision tree helps you choose between edge, cloud, or hybrid deployment:

Decision tree diagram for choosing between Edge ML, Cloud ML, or Hybrid deployment based on factors like latency requirements, connectivity reliability, privacy concerns, and model complexity

Edge ML Overview

Edge ML means running models directly on devices rather than in the cloud:

| Where | Pros | Cons | Example |
|---|---|---|---|
| Cloud | Powerful, complex models | Needs internet, latency | Voice assistants (Alexa) |
| Edge | Fast, works offline | Limited model size | Fall detection on smartwatch |
| Hybrid | Best of both | Complex architecture | Process locally, train in cloud |

Why Edge ML?

  • Fast: No network delay (critical for safety)
  • Private: Data never leaves device
  • Cheap: No cloud costs
  • Efficient: Process only what’s needed

Tradeoff: Edge ML vs Cloud ML

Option A (Edge ML): Deploy compressed models directly on IoT devices (microcontrollers, smartphones, gateways). Inference latency <10ms, works offline, data stays private. Model size limited to device memory (50KB-50MB typically).

Option B (Cloud ML): Send sensor data to cloud servers for processing with large, accurate models. Can use state-of-the-art architectures (transformers, large CNNs) with no size constraints. Requires reliable connectivity, adds 50-500ms network latency.

Decision Factors: Choose Edge ML when latency is critical (fall detection, collision avoidance), privacy is paramount (health data, location tracking), connectivity is unreliable (remote industrial sites, mobile applications), or operating costs must be minimized (millions of devices, metered data plans). Choose Cloud ML when model accuracy is paramount and simpler edge models are insufficient, when models need frequent updates with new training data, or when device constraints are severe (<10KB available memory). Most production systems use a hybrid approach: edge handles time-critical inferences and pre-filtering while cloud handles complex analytics, model training, and aggregated insights.

3.10 Common Misconception: Accuracy Metrics

“95% Accuracy Means My Model Is Great!”

The Trap: Many IoT developers celebrate achieving 95% accuracy, assuming this guarantees production success. However, accuracy alone is misleading for imbalanced datasets.

Real-World Example: Smart Factory Anomaly Detection

A manufacturing company deployed a 95% accurate anomaly detector:

  • Dataset: 10,000 normal operations + 500 anomalies (95% normal, 5% anomalies)
  • Naive model: Always predict “normal” - 95% accuracy! (Detects 0% of anomalies)
  • 95% accurate model: Catches 90% of anomalies BUT falsely flags 10% of normal operations (false positives)

The Financial Impact:

Production line: 1,000,000 checks/day
False positives: 950,000 normal x 10% = 95,000 false alarms/day
Cost: 95,000 false alarms x $50 investigation = $4.75M/day wasted

The Fix: Right Metrics for IoT

Instead of accuracy, use:

  1. Precision: True positives / (true positives + false positives)
  2. Recall: True positives / actual anomalies
  3. F1-Score: Harmonic mean balancing precision/recall
  4. Specificity: True negatives / actual normals

Key Lesson: For rare events (falls, machine failures, security breaches), optimize for high specificity (99.9%+) and high recall (95%+), NOT overall accuracy.
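
The four formulas can be checked against the factory scenario with a few lines of plain Python (a small sketch; `confusion_metrics` is a hypothetical helper):

```python
def confusion_metrics(tp, fp, tn, fn):
    """The four metrics from this section, computed from a confusion matrix."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,    # flagged -> correct?
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,       # anomalies caught
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,  # normals left alone
    }

# The naive "always predict normal" model on 10,000 normal + 500 anomalies:
# tp=0 (no anomalies caught), fn=500, tn=10,000, fp=0
naive = confusion_metrics(tp=0, fp=0, tn=10_000, fn=500)
print(f"accuracy={naive['accuracy']:.1%}, recall={naive['recall']:.1%}")
# -> accuracy=95.2%, recall=0.0%: high accuracy, zero anomalies caught
```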

3.11 Worked Example: Building a Simple Activity Classifier

Worked Example: Smartphone Activity Recognition

Scenario: You are building a fitness app that detects whether the user is walking, running, or stationary using the smartphone’s accelerometer.

Given:

  • Accelerometer sampling at 50 Hz (50 readings per second)
  • 3-axis data: X (left-right), Y (forward-back), Z (up-down)
  • Target activities: Stationary, Walking, Running
  • Training data: 100 users, 10 minutes each activity x 3 activities = 3000 minutes total
  • Deployment target: Android smartphone, must run in background with <5% battery impact

Step-by-Step Solution:

  1. Window the data: Group samples into 2-second windows (100 samples per window)

    • Why 2 seconds? Long enough to capture a walking stride (~1s), short enough for responsive detection
  2. Extract features per window (for each axis and magnitude):

    | Feature | Formula | Why It Helps |
    |---|---|---|
    | Mean | sum(x)/n | Detects orientation changes |
    | Standard Deviation | sqrt(sum((x-mean)^2)/n) | Higher for running than walking |
    | Peak-to-Peak | max(x) - min(x) | Running has larger amplitude |
    | Zero Crossing Rate | count(sign changes)/n | Higher for rhythmic activities |

    Total: 4 features x 4 (X, Y, Z, magnitude) = 16 features per window

  3. Train a Random Forest classifier:

    • Input: 16 features
    • Output: Stationary (0), Walking (1), Running (2)
    • Why Random Forest? Handles mixed feature types, fast inference, no normalization needed
  4. Evaluate using stratified 5-fold cross-validation:

    | Activity | Precision | Recall | F1-Score |
    |---|---|---|---|
    | Stationary | 99.2% | 99.5% | 99.3% |
    | Walking | 94.1% | 92.8% | 93.4% |
    | Running | 96.5% | 97.3% | 96.9% |
  5. Deploy with optimizations:

    • Model size: 45 KB (100 trees, max depth 8)
    • Inference time: 2ms per window
    • Battery: Sample 5s, sleep 5s = 50% duty cycle, ~3% battery/hour

Result: 96.5% macro F1-score (the mean of the three per-class F1 scores) with 3% hourly battery consumption. Walking/Running confusion (6% of walking detected as running) occurs during fast walking - acceptable for fitness tracking.

Key Insight: Simple features (mean, std) outperformed complex frequency features in this case. Always start simple - add complexity only when needed.
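
Steps 1-2 of this example can be sketched as follows (illustrative only; `har_features` is a hypothetical helper and the input window is random noise, not real accelerometer data):

```python
import numpy as np

def har_features(ax, ay, az):
    """Step 2 of the worked example: 4 features x 4 channels = 16 features."""
    mag = np.sqrt(ax**2 + ay**2 + az**2)     # magnitude channel
    feats = []
    for ch in (ax, ay, az, mag):
        centred = ch - ch.mean()
        feats += [
            ch.mean(),                        # detects orientation changes
            ch.std(),                         # higher for running than walking
            ch.max() - ch.min(),              # peak-to-peak amplitude
            np.sum(np.diff(np.sign(centred)) != 0) / len(ch),  # zero-crossing rate
        ]
    return np.array(feats)

# One 2-second window at 50 Hz (100 samples per axis), synthetic data
rng = np.random.default_rng(1)
window = rng.normal(0.0, 1.0, (3, 100))      # rows: X, Y, Z
f = har_features(*window)
print(f.shape)  # -> (16,)
```

Each 2-second window thus collapses from 300 raw samples to a 16-element feature vector before it ever reaches the Random Forest.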

3.12 Worked Example: Decision Tree for Sensor Fault Classification

Worked Example: Detecting Faulty Sensors in a Building HVAC System

Scenario: A commercial building has 50 temperature sensors monitoring HVAC zones. Over time, sensors develop faults – stuck readings, drift, erratic spikes, or gradual failure. Currently, a technician manually inspects sensor logs weekly, taking 4 hours. You want to automate fault detection using a decision tree classifier.

Given:

  • 50 temperature sensors, sampling every 60 seconds
  • 6 months of labeled data (technician flagged 312 confirmed faults)
  • Fault categories: Normal (0), Stuck (1), Drift (2), Spike (3), Dead (4)
  • Goal: Flag faulty sensors within 1 hour of fault onset

Step 1: Engineer features from raw temperature streams

For each sensor, compute features over a 1-hour sliding window (60 readings):

| Feature | Formula | What It Detects |
|---|---|---|
| Variance | var(readings) | Stuck sensors have near-zero variance |
| Range | max - min | Dead sensors have range = 0 |
| Delta from neighbors | abs(mean_self - mean_nearest_3) | Drifted sensors diverge from nearby sensors |
| Spike count | count(abs(diff) > 3 * std) | Erratic sensors have frequent large jumps |
| Trend slope | linear_regression_slope(readings) | Drifting sensors show consistent positive or negative slope |
| Flatline ratio | count(consecutive_identical) / n | Stuck sensors repeat the same value |

Step 2: Examine the decision boundaries

A decision tree learns rules like these from labeled data:

                  [variance < 0.01?]
                  /                \
               YES                  NO
            [range == 0?]       [spike_count > 5?]
            /          \        /              \
         YES           NO    YES               NO
        DEAD         STUCK  SPIKE     [neighbor_delta > 4C?]
                                      /                 \
                                   YES                   NO
                                  DRIFT               NORMAL

Why a decision tree and not a neural network? Three reasons specific to this IoT use case:

  1. Interpretability: When a technician receives a “Sensor 23: DRIFT” alert, they need to understand WHY. A decision tree can report: “Flagged because variance=0.34 (OK), but neighbor_delta=5.2C (threshold: 4C).” A neural network cannot explain its reasoning this clearly.

  2. Small training set: 312 fault examples across 5 classes is modest. Decision trees perform well with small datasets. Deep learning typically needs 10,000+ examples per class.

  3. Fast inference: The tree above requires 3-4 comparisons per prediction. On an ESP32, this takes microseconds. A neural network would take milliseconds and consume more memory.

Step 3: Train and evaluate with Python

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import classification_report
import numpy as np

# Feature matrix: 50 sensors x 6 months x (24 windows/day)
# = ~219,000 samples with 6 features each
# Labels: 0=Normal, 1=Stuck, 2=Drift, 3=Spike, 4=Dead

X = np.load("sensor_features.npy")   # shape: (219000, 6)
y = np.load("sensor_labels.npy")     # shape: (219000,)

# Class distribution (highly imbalanced):
# Normal: 218,200 (99.6%), Stuck: 380 (0.17%),
# Drift: 220 (0.10%), Spike: 150 (0.07%), Dead: 50 (0.02%)

# Train decision tree with depth limit to prevent overfitting
clf = DecisionTreeClassifier(
    max_depth=6,           # Limit complexity
    min_samples_leaf=20,   # Require 20+ samples per leaf
    class_weight="balanced" # Upweight rare fault classes
)

# Stratified 5-fold cross-validation
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    clf.fit(X[train_idx], y[train_idx])
    y_pred = clf.predict(X[test_idx])
    print(f"Fold {fold+1}:")
    print(classification_report(
        y[test_idx], y_pred,
        target_names=["Normal","Stuck","Drift","Spike","Dead"]
    ))

Step 4: Evaluate results

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Normal | 99.9% | 99.7% | 99.8% | 218,200 |
| Stuck | 87.2% | 94.5% | 90.7% | 380 |
| Drift | 78.4% | 82.3% | 80.3% | 220 |
| Spike | 91.0% | 86.7% | 88.8% | 150 |
| Dead | 96.0% | 96.0% | 96.0% | 50 |

Interpreting the results:

  • Normal: 99.7% recall means only 0.3% of normal readings get falsely flagged. With 218,000 normal windows, that is ~654 false alarms per 6 months, or ~3.6 per day. Acceptable for a 50-sensor building.
  • Stuck: 94.5% recall catches nearly all stuck sensors. The 87.2% precision means some normal low-variance periods (nighttime, stable weather) get flagged – the technician sees ~1 false stuck alert per week.
  • Drift: 82.3% recall is the weakest. Slow drift is genuinely hard to distinguish from real temperature changes. Adding HVAC schedule as a feature would improve this.
  • Dead: 96% both ways. Dead sensors are obvious (range=0, variance=0) and easy to classify.

Step 5: Deploy to edge gateway

The trained tree has max_depth=6, meaning at most 64 leaf nodes. Exported as a series of if-else statements, the model is under 2 KB:

# Auto-generated from sklearn export_text
def classify_sensor(variance, range_val, neighbor_delta,
                     spike_count, trend_slope, flatline_ratio):
    if variance < 0.01:
        if range_val < 0.001:
            return 4  # Dead
        if flatline_ratio > 0.85:
            return 1  # Stuck
        return 1      # Stuck (low variance)
    if spike_count > 5.5:
        if spike_count > 12:
            return 3  # Spike (severe)
        if variance > 8.2:
            return 3  # Spike
        return 0      # Normal (occasional spikes OK)
    if neighbor_delta > 3.8:
        if trend_slope > 0.02 or trend_slope < -0.02:
            return 2  # Drift (diverging with trend)
        return 0      # Normal (sensor in different zone)
    return 0          # Normal

Result: The building now detects sensor faults within 1 hour instead of 1 week. The 4-hour weekly manual inspection is replaced by reviewing ~4 alerts per day (under 10 minutes). Over 50 sensors, this catches 2-3 faults per month that previously went undetected for days, preventing HVAC energy waste estimated at $200-400 per undetected drift fault.

Key Insight: For tabular IoT data (sensor features arranged in rows and columns), decision trees and random forests consistently match or outperform deep learning while being faster, more interpretable, and easier to deploy on edge devices. Reserve neural networks for unstructured data like images, audio, or raw time-series where learned features outperform hand-engineered ones.

3.13 Self-Check Questions

Test Your Understanding

Before continuing, can you answer:

  1. What’s the difference between training and inference?
    • Hint: One learns, one applies
  2. Why do we extract features from raw sensor data?
    • Hint: Can an ML model understand “X: 0.2, Y: 9.8”?
  3. Why would you run ML on the edge instead of the cloud?
    • Hint: Think about speed, privacy, and connectivity
  4. Why is 95% accuracy potentially misleading?
    • Hint: What if 95% of your data is one class?

3.14 See Also

ML Fundamentals serves as the foundation for the entire IoT ML series. The core trade-off – cloud training vs. edge inference – appears throughout IoT systems: training requires powerful compute and large datasets, while inference must run on constrained devices with millisecond latency.


3.15 Try It Yourself

Hands-On Challenge: Experience the training vs inference phases with a simple temperature anomaly detector

Task: Build a threshold-based “ML lite” model that learns normal temperature patterns:

  1. Training Phase (Simulate Learning):
    • Collect 100 temperature readings during normal operation (simulate: 20-25°C with random noise)
    • Calculate the mean and standard deviation of training data
    • Define anomaly threshold: mean ± 3 standard deviations
    • Save these learned parameters (mean, std) as your “model”
  2. Inference Phase (Apply Model):
    • Generate 20 new test readings (18 normal 20-25°C, 2 anomalies at 35°C and 10°C)
    • For each reading, check if it falls outside mean ± 3σ threshold
    • Count how many anomalies you detected correctly

What to Observe:

  • Training computes statistics once; inference applies them repeatedly
  • The “model” is just two numbers (mean, std) but encodes learned normal behavior
  • False positives occur if normal data has outliers; false negatives if anomalies are close to normal range

Extension: Try with smaller training sets (10 samples vs 100)—notice how model quality depends on training data quantity.
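
A minimal sketch of the challenge in pure Python (the uniform noise model, the seed, and the variable names are arbitrary choices, not part of the task specification):

```python
import random

random.seed(42)

# --- Training phase: learn normal behaviour from 100 readings ---
train = [random.uniform(20.0, 25.0) for _ in range(100)]
mean = sum(train) / len(train)
std = (sum((x - mean) ** 2 for x in train) / len(train)) ** 0.5
lo, hi = mean - 3 * std, mean + 3 * std   # the entire "model": two numbers

# --- Inference phase: apply the frozen thresholds to new readings ---
test = [random.uniform(20.0, 25.0) for _ in range(18)] + [35.0, 10.0]
anomalies = [x for x in test if not (lo <= x <= hi)]
print(len(anomalies))  # -> 2: both injected anomalies detected
```

Note how the thresholds are computed once (training) and then applied repeatedly without change (inference).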

Common Pitfalls

A model evaluated on the same data it was trained on will appear highly accurate due to memorisation rather than generalisation. Always evaluate on a held-out test set or with cross-validation.

Random shuffling of time-series IoT data before splitting produces train/test contamination because future information leaks into training. Always split chronologically: train on earlier data, test on later data.

A model trained on data with 0.1% anomalies will learn to always predict ‘normal’ and achieve 99.9% accuracy. Address class imbalance explicitly with oversampling (SMOTE), undersampling, or class-weighted loss functions.

Laboratory cross-validation uses historical data; real deployments encounter distribution shift (new equipment, changed processes, seasonal variation). Always run a shadow deployment period where the model makes predictions alongside human decisions before trusting it for automated action.
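
The second pitfall (shuffling time series before splitting) can be avoided with a helper like this (a sketch; `chronological_split` is a hypothetical function, not a library API):

```python
import numpy as np

def chronological_split(X, y, timestamps, train_frac=0.8):
    """Split time-series data by time: train on the past, test on the future."""
    order = np.argsort(timestamps)        # sort by time; never shuffle
    X, y = np.asarray(X)[order], np.asarray(y)[order]
    cut = int(len(X) * train_frac)
    return X[:cut], X[cut:], y[:cut], y[cut:]

# 10 hourly readings: the last 2 (the "future") become the test set
ts = np.arange(10)
X = ts.reshape(-1, 1).astype(float)
y = (ts % 2).astype(int)
X_tr, X_te, y_tr, y_te = chronological_split(X, y, ts)
print(X_te.ravel())  # -> [8. 9.]
```

A random `train_test_split` with `shuffle=True` would instead scatter future readings into the training set, leaking information the deployed model will never have.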

3.16 Summary

This chapter introduced the fundamentals of machine learning for IoT:

  • Training vs Inference: Training learns patterns in the cloud; inference applies them on edge devices
  • Feature Extraction: Raw sensor data must be transformed into meaningful statistics
  • Model Compression: Techniques like quantization and pruning enable deployment on constrained devices
  • Accuracy Metrics: Use precision, recall, and F1-score instead of just accuracy for imbalanced IoT data

3.17 What’s Next

| Direction | Chapter | Link |
|---|---|---|
| Next | Mobile Sensing and Activity Recognition | modeling-mobile-sensing.html |
| Related | IoT ML Pipeline | modeling-pipeline.html |
| Related | Edge ML and TinyML Deployment | modeling-edge-deployment.html |
| Related | Feature Engineering | modeling-feature-engineering.html |