3 IoT Machine Learning Fundamentals
3.1 Learning Objectives
By the end of this chapter, you will be able to:
- Distinguish Training from Inference: Differentiate between training and inference phases in machine learning
- Explain Feature Extraction: Describe why raw sensor data must be transformed into meaningful features
- Evaluate Accuracy Metrics: Assess why accuracy alone is misleading for imbalanced IoT datasets
- Compare Edge vs Cloud ML: Analyze the trade-offs between running ML on devices vs in the cloud
Minimum Viable Understanding: IoT Machine Learning
Core Concept: Machine learning transforms raw sensor data (numbers like “9.8 m/s^2”) into meaningful insights (“user is running”). Models learn patterns from historical data (training), then apply those patterns to new data (inference).
Why It Matters: IoT generates billions of data points daily. Without ML, you’d need humans to interpret every reading. With ML, sensors become intelligent - detecting falls, predicting equipment failures, and recognizing activities automatically.
Key Takeaway: Feature engineering (extracting meaningful statistics from raw data) contributes more to model accuracy than algorithm choice. Spend 80% of your time on features and 20% on model selection.
3.2 Prerequisites
Before diving into this chapter, you should be familiar with:
- Data Storage and Databases: Understanding how IoT data is collected, stored, and accessed provides foundation for building machine learning models
- Edge and Fog Computing: Knowledge of distributed computing architectures helps contextualize where ML inferencing occurs
- Basic statistics concepts: Understanding mean, variance, and standard deviation helps with feature engineering
Key Concepts
- Supervised learning: Training a model on labelled examples (sensor readings paired with known outcomes) to predict labels on new, unseen data — requires labelled IoT datasets which are often expensive to obtain.
- Unsupervised learning: Finding patterns in unlabelled sensor data — clustering similar device behaviours, detecting anomalies — without requiring labelled training examples.
- Bias-variance trade-off: The fundamental tension between model simplicity (high bias, underfitting) and model complexity (high variance, overfitting); finding the right balance is the core model selection challenge in IoT ML.
- Cross-validation: A technique for estimating model generalisation performance by training and evaluating on multiple different splits of the available data, producing a more reliable accuracy estimate than a single train/test split.
- Class imbalance: The situation where anomalous or failure events are far less common than normal events in training data (0.1% vs 99.9%), requiring techniques like oversampling, undersampling, or class-weighted loss functions.
- Model generalisation: The ability of a trained model to perform accurately on sensor data from deployment conditions not seen during training — the ultimate goal of IoT ML model development.
Chapter Series: Modeling and Inferencing
This is the first chapter in a series on IoT Machine Learning:
- ML Fundamentals (this chapter) - Core concepts, training vs inference
- Mobile Sensing & Activity Recognition - HAR, transportation detection
- IoT ML Pipeline - 7-step pipeline, best practices
- Edge ML & Deployment - TinyML, quantization
- Audio Feature Processing - MFCC, keyword recognition
- Feature Engineering - Feature design and selection
- Production ML - Monitoring, anomaly detection
3.3 Getting Started (For Beginners)
For Kids: Meet the Sensor Squad!
Machine Learning is like teaching your sensors to become super-smart detectives!
3.3.1 The Sensor Squad Adventure: The Pattern Patrol
It was a quiet Tuesday when Motion Mo noticed something strange. “Hey team, I keep seeing the same pattern every day! The humans wake up at 7am, eat breakfast at 7:30, and leave for work at 8:15.”
Thermo the Temperature Sensor nodded excitedly. “I see patterns too! The house gets warm around 6pm when everyone comes home, and it cools down at 11pm when they go to bed.”
Signal Sam gathered everyone for an important announcement. “Sensors, you’ve just discovered something AMAZING. You’re doing what scientists call Machine Learning - finding patterns in data and using them to make smart predictions!”
Sam drew a picture to explain:
Step 1 - COLLECT: “First, we gather lots of examples. Motion Mo, you’ve recorded 10,000 mornings of people waking up.”
Step 2 - LEARN: “Then we show a computer all those examples. ‘See how the motion always starts in the bedroom, then moves to the bathroom, then to the kitchen? THAT’S the wake-up pattern!’”
Step 3 - PREDICT: “Now the smart part! When Motion Mo sees that same pattern starting, the computer can predict: ‘Someone’s waking up! Better start warming up the coffee maker!’”
3.3.2 Key Words for Kids
| Word | What It Means |
|---|---|
| Machine Learning | Teaching computers to find patterns and make predictions |
| Pattern | Something that happens the same way over and over |
| Training | Showing the computer LOTS of examples so it can learn |
| Prediction | A smart guess about what will happen next |
| Model | The “brain” that the computer builds after learning |
| Inference | When the trained model looks at NEW data and makes a prediction |
3.3.3 Try This at Home!
Be a Pattern Detective:
- For one week, write down what time you go to bed and wake up
- Also note if it’s a school day or weekend
- After a week, look at your data - can you find the pattern?
- Now make a PREDICTION: What time will you wake up next Saturday?
Congratulations - you just did Machine Learning with your brain!
3.4 IoT Machine Learning Lifecycle Overview
The IoT ML lifecycle involves distinct phases, from data collection to deployed model maintenance:
Key Phases:
| Phase | Purpose | Location | Output |
|---|---|---|---|
| Data Collection | Gather sensor readings | Edge devices | Raw time-series data |
| Feature Engineering | Extract meaningful patterns | Edge or Cloud | Feature vectors |
| Model Training | Learn from historical data | Cloud (GPUs) | Trained model weights |
| Compression | Reduce model size | Cloud | Optimized model |
| Edge Inference | Apply model to new data | Edge devices | Predictions |
| Monitoring | Track model performance | Cloud | Drift alerts |
3.5 What is Machine Learning for IoT?
Simple Explanation
Analogy: ML for IoT is like teaching a smart assistant to recognize patterns.
Imagine you have a fitness tracker. Instead of just showing raw numbers (steps: 5000, heart rate: 120), it can tell you:
- “You’re running” (not walking)
- “Your workout intensity is high”
- “You might be getting sick” (unusual heart patterns)
That’s machine learning—turning raw sensor data into meaningful insights!
3.6 Training vs Inference: Two Phases
Machine learning for IoT follows a distinct two-phase workflow that separates learning from application:
Phase 1: Training (Cloud, One-Time)
- Data Collection: Gather labeled examples—10,000 hours of walking/running/sitting accelerometer data from diverse users
- Feature Extraction: Transform raw sensor streams into statistical features (mean, variance, FFT peaks)
- Model Training: Feed features into algorithm (Random Forest, Neural Network) that learns patterns distinguishing classes
- Validation: Test on held-out data to measure accuracy and tune hyperparameters
- Compression: Quantize weights (float32 → int8) and prune weak connections to fit edge device constraints
- Output: A trained model file (weights + architecture) ready for deployment
Phase 2: Inference (Edge Device, Real-Time)
- Sensor Input: Accelerometer provides live data stream (50 Hz)
- Feature Extraction: Same feature pipeline as training (must match exactly)
- Model Execution: Forward pass through trained model using extracted features
- Prediction Output: Classification label (e.g., “running”) or confidence scores
- Action: Trigger response (update dashboard, log event, adjust system)
Key Separation: Training happens once in the cloud with powerful GPUs over days; inference runs continuously on devices (ESP32, smartphone) with millisecond latency. The model learned during training is frozen during inference—no learning happens on the device.
| Phase | What Happens | Where | Example |
|---|---|---|---|
| Training | Learn patterns from data | Cloud (powerful computers) | Analyze 10,000 hours of walking/running data |
| Inference | Apply learned patterns | Edge/Device (real-time) | Detect current activity from live sensor |
3.7 Model Compression for IoT
IoT devices have limited resources. Large models must be compressed before deployment:
Common Compression Techniques:
| Technique | How It Works | Size Reduction | Accuracy Impact |
|---|---|---|---|
| Quantization | Reduce precision (float32 to int8) | 4x | 1-3% loss |
| Pruning | Remove near-zero weights | 2-10x | 1-5% loss |
| Knowledge Distillation | Train small model to mimic large one | 10-100x | 2-10% loss |
| Architecture Search | Find efficient model structure | Variable | Can improve! |
Putting Numbers to It
Model Compression for ESP32 Deployment:
Compressing a neural network for embedded inference involves quantifying memory and latency trade-offs.
Original model (32-bit floating point): \[ \text{Model size} = N_{\text{params}} \times 4 \text{ bytes} \] For a HAR model with \(N_{\text{params}} = 50{,}000\) parameters: \[ \text{Size}_{\text{float32}} = 50{,}000 \times 4 = 200{,}000 \text{ bytes} = 195 \text{ KB} \]
Quantized model (8-bit integers): \[ \text{Size}_{\text{int8}} = 50{,}000 \times 1 = 50{,}000 \text{ bytes} = 49 \text{ KB} \] Compression ratio: \(195 / 49 \approx 4\times\) size reduction.
Inference latency improvement: INT8 operations on ESP32 (240 MHz) are ~3× faster than FP32: \[ \text{Latency}_{\text{int8}} \approx \frac{\text{Latency}_{\text{float32}}}{3} = \frac{150 \text{ ms}}{3} = 50 \text{ ms} \]
Accuracy degradation is typically <2% for quantized models, making this trade-off acceptable for edge deployment where 10 Hz inference (100 ms budget) is sufficient.
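The arithmetic above can be checked with a short script. This is a minimal sketch using the chapter’s example values (50,000 parameters, 150 ms FP32 latency, ~3× INT8 speedup); the function name and structure are illustrative:

```python
def compression_stats(n_params: int, fp32_latency_ms: float,
                      bytes_per_weight: int = 1, speedup: float = 3.0):
    """Estimate model size and latency after int8 quantization."""
    size_fp32_kb = n_params * 4 / 1024            # float32 = 4 bytes/weight
    size_int8_kb = n_params * bytes_per_weight / 1024
    return {
        "size_fp32_kb": round(size_fp32_kb),
        "size_int8_kb": round(size_int8_kb),
        "compression_ratio": round(size_fp32_kb / size_int8_kb),
        "latency_int8_ms": fp32_latency_ms / speedup,
    }

stats = compression_stats(n_params=50_000, fp32_latency_ms=150.0)
print(stats)
# {'size_fp32_kb': 195, 'size_int8_kb': 49, 'compression_ratio': 4, 'latency_int8_ms': 50.0}
```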
3.7.1 Explore: Model Compression Calculator
Adjust model parameters to see how quantization and pruning affect size, latency, and device fit.
3.8 Feature Extraction: What the Model Actually Sees
ML models don’t understand raw sensor readings. We extract features—meaningful statistics:
| Raw Data | Extracted Features | Why It Matters |
|---|---|---|
| 1000 accelerometer samples | Mean, variance, peak frequency | Running has higher variance than sitting |
| Heart rate over 1 minute | Average, min, max, variability | Exercise vs rest patterns |
| Temperature readings | Rate of change, trend | Fire detection, HVAC optimization |
Key Takeaway
In one sentence: The hardest part of IoT machine learning is not the algorithm - it is extracting the right features from noisy sensor data and compressing models small enough to run on constrained devices.
Remember this: Feature engineering contributes more to model accuracy than algorithm choice - spend 80% of your time on features (domain knowledge) and 20% on model selection.
3.9 Edge ML: Running AI on Tiny Devices
IoT devices have limited resources. The following decision tree helps you choose between edge, cloud, or hybrid deployment:
Edge ML Overview
Edge ML means running models directly on devices rather than in the cloud:
| Where | Pros | Cons | Example |
|---|---|---|---|
| Cloud | Powerful, complex models | Needs internet, latency | Voice assistants (Alexa) |
| Edge | Fast, works offline | Limited model size | Fall detection on smartwatch |
| Hybrid | Best of both | Complex architecture | Process locally, train in cloud |
Why Edge ML?
- Fast: No network delay (critical for safety)
- Private: Data never leaves device
- Cheap: No cloud costs
- Efficient: Process only what’s needed
Tradeoff: Edge ML vs Cloud ML
Option A (Edge ML): Deploy compressed models directly on IoT devices (microcontrollers, smartphones, gateways). Inference latency <10ms, works offline, data stays private. Model size limited to device memory (50KB-50MB typically).
Option B (Cloud ML): Send sensor data to cloud servers for processing with large, accurate models. Can use state-of-the-art architectures (transformers, large CNNs) with no size constraints. Requires reliable connectivity, adds 50-500ms network latency.
Decision Factors: Choose Edge ML when latency is critical (fall detection, collision avoidance), privacy is paramount (health data, location tracking), connectivity is unreliable (remote industrial sites, mobile applications), or operating costs must be minimized (millions of devices, metered data plans).

Choose Cloud ML when model accuracy is paramount and simpler edge models are insufficient, when models need frequent updates with new training data, or when device constraints are severe (<10KB available memory).

Most production systems use a hybrid approach: edge handles time-critical inference and pre-filtering, while cloud handles complex analytics, model training, and aggregated insights.
3.10 Common Misconception: Accuracy Metrics
“95% Accuracy Means My Model Is Great!”
The Trap: Many IoT developers celebrate achieving 95% accuracy, assuming this guarantees production success. However, accuracy alone is misleading for imbalanced datasets.
Real-World Example: Smart Factory Anomaly Detection
A manufacturing company deployed a 95% accurate anomaly detector:
- Dataset: 10,000 normal operations + 500 anomalies (95% normal, 5% anomalies)
- Naive model: Always predict “normal” - 95% accuracy! (Detects 0% of anomalies)
- 95% accurate model: Catches 90% of anomalies BUT generates 10% false positives
The Financial Impact:
- Production line: 1,000,000 checks/day
- False positives: 950,000 normal x 10% = 95,000 false alarms/day
- Cost: 95,000 false alarms x $50 investigation = $4.75M/day wasted
The Fix: Right Metrics for IoT
Instead of accuracy, use:
- Precision: True positives / (true positives + false positives)
- Recall: True positives / actual anomalies
- F1-Score: Harmonic mean balancing precision/recall
- Specificity: True negatives / actual normals
Key Lesson: For rare events (falls, machine failures, security breaches), optimize for high specificity (99.9%+) and high recall (95%+), NOT overall accuracy.
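These metrics all follow directly from confusion-matrix counts. A minimal sketch, using the definitions above with illustrative counts (not measurements from a real deployment), shows how a high-accuracy model can still have poor precision:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the four metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)           # a.k.a. sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, specificity, f1

# Naive "always normal" model on 10,000 normal + 500 anomalies:
# it never predicts anomaly (tp=0, fn=500), so accuracy = 95.2%
# while recall = 0 — exactly the trap described above.

# A model that catches 90% of anomalies but false-alarms on ~10%
# of normals (illustrative counts):
p, r, spec, f1 = classification_metrics(tp=450, fp=1000, fn=50, tn=9000)
print(f"precision={p:.2f} recall={r:.2f} specificity={spec:.3f} f1={f1:.2f}")
# precision=0.31 recall=0.90 specificity=0.900 f1=0.46
```

Note how recall looks excellent (90%) while precision collapses to 31% — most alarms are false, which is what drives the investigation costs in the factory example.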
3.11 Worked Example: Building a Simple Activity Classifier
Worked Example: Smartphone Activity Recognition
Scenario: You are building a fitness app that detects whether the user is walking, running, or stationary using the smartphone’s accelerometer.
Given:
- Accelerometer sampling at 50 Hz (50 readings per second)
- 3-axis data: X (left-right), Y (forward-back), Z (up-down)
- Target activities: Stationary, Walking, Running
- Training data: 100 users x 10 minutes per activity x 3 activities = 3,000 minutes total
- Deployment target: Android smartphone, must run in background with <5% battery impact
Step-by-Step Solution:
Window the data: Group samples into 2-second windows (100 samples per window)
- Why 2 seconds? Long enough to capture a walking stride (~1s), short enough for responsive detection
Extract features per window (for each axis and magnitude):
| Feature | Formula | Why It Helps |
|---|---|---|
| Mean | sum(x)/n | Detects orientation changes |
| Standard Deviation | sqrt(sum((x-mean)^2)/n) | Higher for running than walking |
| Peak-to-Peak | max(x) - min(x) | Running has larger amplitude |
| Zero Crossing Rate | count(sign changes)/n | Higher for rhythmic activities |

Total: 4 features x 4 signals (X, Y, Z, magnitude) = 16 features per window
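The per-window extraction above can be sketched in pure NumPy. This is a minimal illustration, assuming the window is a (100, 3) array of X/Y/Z samples; the function name is not from any library:

```python
import numpy as np

def window_features(win: np.ndarray) -> np.ndarray:
    """Extract 4 features from each of X, Y, Z, and magnitude.

    win: (100, 3) array — one 2-second window sampled at 50 Hz.
    Returns a 16-element feature vector.
    """
    mag = np.linalg.norm(win, axis=1)   # overall acceleration magnitude
    signals = [win[:, 0], win[:, 1], win[:, 2], mag]
    feats = []
    for s in signals:
        centered = s - s.mean()
        # Zero crossing rate: fraction of sign changes in the centered signal
        zcr = np.count_nonzero(np.diff(np.sign(centered)) != 0) / len(s)
        feats += [s.mean(), s.std(), s.max() - s.min(), zcr]
    return np.array(feats)

# Example: one window of simulated 50 Hz accelerometer data (Z near gravity)
rng = np.random.default_rng(0)
window = rng.normal([0.0, 0.0, 9.8], 1.0, size=(100, 3))
print(window_features(window).shape)  # (16,)
```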
Train a Random Forest classifier:
- Input: 16 features
- Output: Stationary (0), Walking (1), Running (2)
- Why Random Forest? Handles mixed feature types, fast inference, no normalization needed
Evaluate using stratified 5-fold cross-validation:
| Activity | Precision | Recall | F1-Score |
|---|---|---|---|
| Stationary | 99.2% | 99.5% | 99.3% |
| Walking | 94.1% | 92.8% | 93.4% |
| Running | 96.5% | 97.3% | 96.9% |

Deploy with optimizations:
- Model size: 45 KB (100 trees, max depth 8)
- Inference time: 2ms per window
- Battery: Sample 5s, sleep 5s = 50% duty cycle, ~3% battery/hour
Result: 95.2% macro F1-score with 3% hourly battery consumption. Walking/Running confusion (6% of walking detected as running) occurs during fast walking - acceptable for fitness tracking.
Key Insight: Simple features (mean, std) outperformed complex frequency features in this case. Always start simple - add complexity only when needed.
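The train-and-evaluate steps above can be sketched with scikit-learn. Synthetic random features stand in for the real 3,000-window dataset here, so the printed score is meaningless except as a demonstration of the API:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in: 3,000 windows x 16 features, 3 activity classes
rng = np.random.default_rng(42)
X = rng.normal(size=(3000, 16))
y = rng.integers(0, 3, size=3000)   # 0=Stationary, 1=Walking, 2=Running
X[y == 2] += 2.0                    # shift "Running" so the demo has signal

# Same hyperparameters as the worked example: 100 trees, max depth 8
clf = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0)

# Stratified 5-fold cross-validation with macro F1 (not plain accuracy)
scores = cross_val_score(
    clf, X, y,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="f1_macro",
)
print(f"macro F1: {scores.mean():.2f} ± {scores.std():.2f}")
```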
3.12 Worked Example: Decision Tree for Sensor Fault Classification
Worked Example: Detecting Faulty Sensors in a Building HVAC System
Scenario: A commercial building has 50 temperature sensors monitoring HVAC zones. Over time, sensors develop faults – stuck readings, drift, erratic spikes, or gradual failure. Currently, a technician manually inspects sensor logs weekly, taking 4 hours. You want to automate fault detection using a decision tree classifier.
Given:
- 50 temperature sensors, sampling every 60 seconds
- 6 months of labeled data (technician flagged 312 confirmed faults)
- Fault categories: Normal (0), Stuck (1), Drift (2), Spike (3), Dead (4)
- Goal: Flag faulty sensors within 1 hour of fault onset
Step 1: Engineer features from raw temperature streams
For each sensor, compute features over a 1-hour sliding window (60 readings):
| Feature | Formula | What It Detects |
|---|---|---|
| Variance | var(readings) | Stuck sensors have near-zero variance |
| Range | max - min | Dead sensors have range = 0 |
| Delta from neighbors | abs(mean_self - mean_nearest_3) | Drifted sensors diverge from nearby sensors |
| Spike count | count(abs(diff) > 3 * std) | Erratic sensors have frequent large jumps |
| Trend slope | linear_regression_slope(readings) | Drifting sensors show consistent positive or negative slope |
| Flatline ratio | count(consecutive_identical) / n | Stuck sensors repeat the same value |
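The six features in the table can be computed with NumPy over each 1-hour window. A sketch, assuming one sensor’s 60 readings plus the mean readings of its 3 nearest neighbors; the 3σ spike threshold follows the table:

```python
import numpy as np

def fault_features(readings: np.ndarray, neighbor_means: np.ndarray) -> dict:
    """Compute the six fault-detection features for one sensor.

    readings: (60,) array — one hour of 60-second temperature samples.
    neighbor_means: mean readings of the 3 nearest sensors.
    """
    diffs = np.diff(readings)
    std = readings.std()
    # Least-squares slope of temperature vs sample index (trend)
    slope = np.polyfit(np.arange(len(readings)), readings, deg=1)[0]
    return {
        "variance": readings.var(),
        "range": readings.max() - readings.min(),
        "neighbor_delta": abs(readings.mean() - neighbor_means.mean()),
        "spike_count": int(np.count_nonzero(np.abs(diffs) > 3 * std)) if std > 0 else 0,
        "trend_slope": slope,
        "flatline_ratio": np.count_nonzero(diffs == 0) / len(readings),
    }

# A stuck sensor: identical readings, so near-zero variance and high flatline ratio
stuck = np.full(60, 21.5)
f = fault_features(stuck, neighbor_means=np.array([22.0, 21.8, 22.3]))
print(f["variance"], round(f["flatline_ratio"], 2))  # 0.0 0.98
```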
Step 2: Examine the decision boundaries
A decision tree learns rules like these from labeled data:
```
              [variance < 0.01?]
               /              \
             YES               NO
      [range == 0?]      [spike_count > 5?]
        /       \           /         \
      YES        NO       YES          NO
     DEAD      STUCK    SPIKE   [neighbor_delta > 4C?]
                                    /         \
                                  YES          NO
                                DRIFT        NORMAL
```
Why a decision tree and not a neural network? Three reasons specific to this IoT use case:
Interpretability: When a technician receives a “Sensor 23: DRIFT” alert, they need to understand WHY. A decision tree can report: “Flagged because variance=0.34 (OK), but neighbor_delta=5.2C (threshold: 4C).” A neural network cannot explain its reasoning this clearly.
Small training set: 312 fault examples across 5 classes is modest. Decision trees perform well with small datasets. Deep learning typically needs 10,000+ examples per class.
Fast inference: The tree above requires 3-4 comparisons per prediction. On an ESP32, this takes microseconds. A neural network would take milliseconds and consume more memory.
Step 3: Train and evaluate with Python
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import classification_report
import numpy as np

# Feature matrix: 50 sensors x 6 months x (24 windows/day)
# = ~219,000 samples with 6 features each
# Labels: 0=Normal, 1=Stuck, 2=Drift, 3=Spike, 4=Dead
X = np.load("sensor_features.npy")  # shape: (219000, 6)
y = np.load("sensor_labels.npy")    # shape: (219000,)

# Class distribution (highly imbalanced):
# Normal: 218,200 (99.6%), Stuck: 380 (0.17%),
# Drift: 220 (0.10%), Spike: 150 (0.07%), Dead: 50 (0.02%)

# Train decision tree with depth limit to prevent overfitting
clf = DecisionTreeClassifier(
    max_depth=6,              # Limit complexity
    min_samples_leaf=20,      # Require 20+ samples per leaf
    class_weight="balanced",  # Upweight rare fault classes
)

# Stratified 5-fold cross-validation
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    clf.fit(X[train_idx], y[train_idx])
    y_pred = clf.predict(X[test_idx])
    print(f"Fold {fold+1}:")
    print(classification_report(
        y[test_idx], y_pred,
        target_names=["Normal", "Stuck", "Drift", "Spike", "Dead"],
    ))
```

Step 4: Evaluate results
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Normal | 99.9% | 99.7% | 99.8% | 218,200 |
| Stuck | 87.2% | 94.5% | 90.7% | 380 |
| Drift | 78.4% | 82.3% | 80.3% | 220 |
| Spike | 91.0% | 86.7% | 88.8% | 150 |
| Dead | 96.0% | 96.0% | 96.0% | 50 |
Interpreting the results:
- Normal: 99.7% recall means only 0.3% of normal readings get falsely flagged. With 218,000 normal windows, that is ~654 false alarms per 6 months, or ~3.6 per day. Acceptable for a 50-sensor building.
- Stuck: 94.5% recall catches nearly all stuck sensors. The 87.2% precision means some normal low-variance periods (nighttime, stable weather) get flagged – the technician sees ~1 false stuck alert per week.
- Drift: 82.3% recall is the weakest. Slow drift is genuinely hard to distinguish from real temperature changes. Adding HVAC schedule as a feature would improve this.
- Dead: 96% both ways. Dead sensors are obvious (range=0, variance=0) and easy to classify.
Step 5: Deploy to edge gateway
The trained tree has max_depth=6, meaning at most 64 leaf nodes. Exported as a series of if-else statements, the model is under 2 KB:
```python
# Auto-generated from sklearn export_text
def classify_sensor(variance, range_val, neighbor_delta,
                    spike_count, trend_slope, flatline_ratio):
    if variance < 0.01:
        if range_val < 0.001:
            return 4  # Dead
        if flatline_ratio > 0.85:
            return 1  # Stuck
        return 1  # Stuck (low variance)
    if spike_count > 5.5:
        if spike_count > 12:
            return 3  # Spike (severe)
        if variance > 8.2:
            return 3  # Spike
        return 0  # Normal (occasional spikes OK)
    if neighbor_delta > 3.8:
        if trend_slope > 0.02 or trend_slope < -0.02:
            return 2  # Drift (diverging with trend)
        return 0  # Normal (sensor in different zone)
    return 0  # Normal
```

Result: The building now detects sensor faults within 1 hour instead of 1 week. The 4-hour weekly manual inspection is replaced by reviewing ~4 alerts per day (under 10 minutes). Over 50 sensors, this catches 2-3 faults per month that previously went undetected for days, preventing HVAC energy waste estimated at $200-400 per undetected drift fault.
Key Insight: For tabular IoT data (sensor features arranged in rows and columns), decision trees and random forests consistently match or outperform deep learning while being faster, more interpretable, and easier to deploy on edge devices. Reserve neural networks for unstructured data like images, audio, or raw time-series where learned features outperform hand-engineered ones.
3.13 Self-Check Questions
Test Your Understanding
Before continuing, can you answer:
- What’s the difference between training and inference?
- Hint: One learns, one applies
- Why do we extract features from raw sensor data?
- Hint: Can an ML model understand “X: 0.2, Y: 9.8”?
- Why would you run ML on the edge instead of the cloud?
- Hint: Think about speed, privacy, and connectivity
- Why is 95% accuracy potentially misleading?
- Hint: What if 95% of your data is one class?
3.14 See Also
ML Fundamentals serves as the foundation for the entire IoT ML series. The core trade-off – cloud training vs. edge inference – appears throughout IoT systems: training requires powerful compute and large datasets, while inference must run on constrained devices with millisecond latency.
Within This Series:
- Mobile Sensing & Activity Recognition - Applies these concepts to real-world human activity recognition
- IoT ML Pipeline - Formalizes the 7-step process from data to deployment
- Edge ML & Deployment - Deepens model compression techniques introduced here
- Feature Engineering - Expands on feature extraction concepts
Cross-Module:
- Data Storage and Databases - How training data is collected and stored
- Edge and Fog Computing - Where ML models are deployed in tiered architectures
- Data Quality Monitoring - Ensuring training data quality
External Resources:
- TensorFlow Lite for Microcontrollers: tensorflow.org/lite/microcontrollers
- “TinyML” by Pete Warden and Daniel Situnayake (O’Reilly, 2019)
- Scikit-learn ML Basics: scikit-learn.org/stable/tutorial/basic/tutorial.html
3.15 Try It Yourself
Hands-On Challenge: Experience the training vs inference phases with a simple temperature anomaly detector
Task: Build a threshold-based “ML lite” model that learns normal temperature patterns:
- Training Phase (Simulate Learning):
- Collect 100 temperature readings during normal operation (simulate: 20-25°C with random noise)
- Calculate the mean and standard deviation of training data
- Define anomaly threshold: mean ± 3 standard deviations
- Save these learned parameters (mean, std) as your “model”
- Inference Phase (Apply Model):
- Generate 20 new test readings (18 normal 20-25°C, 2 anomalies at 35°C and 10°C)
- For each reading, check if it falls outside mean ± 3σ threshold
- Count how many anomalies you detected correctly
What to Observe:
- Training computes statistics once; inference applies them repeatedly
- The “model” is just two numbers (mean, std) but encodes learned normal behavior
- False positives occur if normal data has outliers; false negatives if anomalies are close to normal range
Extension: Try with smaller training sets (10 samples vs 100)—notice how model quality depends on training data quantity.
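The whole challenge fits in a few lines of standard-library Python. A sketch with simulated readings — the threshold rule is exactly the mean ± 3σ described above, and the “model” really is just two learned numbers:

```python
import random
import statistics

random.seed(1)

# --- Training phase: learn normal behavior once ---
train = [random.uniform(20.0, 25.0) for _ in range(100)]
mean = statistics.mean(train)
std = statistics.stdev(train)
model = (mean - 3 * std, mean + 3 * std)   # the entire "model": two numbers
print(f"learned normal range: {model[0]:.1f}°C .. {model[1]:.1f}°C")

# --- Inference phase: apply the frozen model to new data repeatedly ---
test = [random.uniform(20.0, 25.0) for _ in range(18)] + [35.0, 10.0]
anomalies = [t for t in test if not model[0] <= t <= model[1]]
print(f"flagged {len(anomalies)} anomalies: {[round(a, 1) for a in anomalies]}")
# flagged 2 anomalies: [35.0, 10.0]
```

Notice the separation: training computes statistics once, then the frozen parameters are applied to every new reading — the same split as cloud training vs edge inference, in miniature.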
Common Pitfalls
1. Evaluating models only on training data
A model evaluated on the same data it was trained on will appear highly accurate due to memorisation rather than generalisation. Always evaluate on a held-out test set or with cross-validation.
2. Ignoring temporal order in train/test splits for time-series data
Random shuffling of time-series IoT data before splitting produces train/test contamination because future information leaks into training. Always split chronologically: train on earlier data, test on later data.
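A chronological split is a one-liner once the data is sorted by timestamp. A sketch (array names and the 80/20 fraction are illustrative):

```python
import numpy as np

def chronological_split(X, y, timestamps, train_frac=0.8):
    """Split time-series data so training data strictly precedes test data."""
    order = np.argsort(timestamps)   # sort by time — never shuffle time-series
    X, y = X[order], y[order]
    cut = int(len(X) * train_frac)
    return X[:cut], X[cut:], y[:cut], y[cut:]

# 1,000 hourly readings: train on the first 800 hours, test on the last 200
ts = np.arange(1000)
X = np.random.default_rng(0).normal(size=(1000, 6))
y = np.random.default_rng(1).integers(0, 2, size=1000)
X_tr, X_te, y_tr, y_te = chronological_split(X, y, ts)
print(X_tr.shape, X_te.shape)  # (800, 6) (200, 6)
```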
3. Treating class imbalance as a minor issue
A model trained on data with 0.1% anomalies will learn to always predict ‘normal’ and achieve 99.9% accuracy. Address class imbalance explicitly with oversampling (SMOTE), undersampling, or class-weighted loss functions.
4. Deploying the model with the best cross-validation score without field testing
Laboratory cross-validation uses historical data; real deployments encounter distribution shift (new equipment, changed processes, seasonal variation). Always run a shadow deployment period where the model makes predictions alongside human decisions before trusting it for automated action.
3.16 Summary
This chapter introduced the fundamentals of machine learning for IoT:
- Training vs Inference: Training learns patterns in the cloud; inference applies them on edge devices
- Feature Extraction: Raw sensor data must be transformed into meaningful statistics
- Model Compression: Techniques like quantization and pruning enable deployment on constrained devices
- Accuracy Metrics: Use precision, recall, and F1-score instead of just accuracy for imbalanced IoT data
3.17 What’s Next
| Direction | Chapter | Link |
|---|---|---|
| Next | Mobile Sensing and Activity Recognition | modeling-mobile-sensing.html |
| Related | IoT ML Pipeline | modeling-pipeline.html |
| Related | Edge ML and TinyML Deployment | modeling-edge-deployment.html |
| Related | Feature Engineering | modeling-feature-engineering.html |