%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#95A5A6', 'clusterBkg': '#ECF0F1', 'clusterBorder': '#2C3E50'}}}%%
flowchart TB
subgraph Good["Good Feature: Mean Acceleration Magnitude"]
G1["<b>Walking</b><br/>mean = 2.5 m/s2<br/>std = 0.3<br/>(Low variance)"]
G2["<b>Running</b><br/>mean = 5.0 m/s2<br/>std = 0.4<br/>(Low variance)"]
G3["<b>Clear Separation</b><br/>2.5 gap between means<br/>Easy classification<br/>90%+ accuracy"]
G1 -.-> G3
G2 -.-> G3
end
subgraph Bad["Bad Feature: Instantaneous Sample"]
B1["<b>Walking</b><br/>mean = 2.5 m/s2<br/>std = 2.0<br/>(High noise)"]
B2["<b>Running</b><br/>mean = 3.0 m/s2<br/>std = 2.5<br/>(High noise)"]
B3["<b>Overlapping Distributions</b><br/>0.5 mean difference<br/>Poor classification<br/>60% accuracy"]
B1 -.-> B3
B2 -.-> B3
end
style G1 fill:#2C3E50,stroke:#16A085,color:#fff,stroke-width:3px
style G2 fill:#16A085,stroke:#2C3E50,color:#fff,stroke-width:3px
style G3 fill:#27AE60,stroke:#2C3E50,color:#fff,stroke-width:3px
style B1 fill:#E67E22,stroke:#E74C3C,color:#fff,stroke-width:3px
style B2 fill:#E74C3C,stroke:#E67E22,color:#fff,stroke-width:3px
style B3 fill:#95A5A6,stroke:#7F8C8D,color:#fff,stroke-width:3px
1347 Feature Engineering for IoT Machine Learning
1347.1 Learning Objectives
By the end of this chapter, you will be able to:
- Design Discriminative Features: Create features that separate classes effectively
- Apply Domain Knowledge: Use physics and domain expertise to engineer powerful features
- Perform Feature Selection: Identify and remove redundant or uninformative features
- Optimize for Edge Deployment: Balance feature quality with computational cost
1347.2 Prerequisites
- ML Fundamentals: Feature extraction concepts
- IoT ML Pipeline: 7-step ML pipeline
- Basic statistics (mean, variance, correlation)
This is part 6 of the IoT Machine Learning series:
- ML Fundamentals - Core concepts
- Mobile Sensing - HAR, transportation
- IoT ML Pipeline - 7-step pipeline
- Edge ML & Deployment - TinyML
- Audio Feature Processing - MFCC
- Feature Engineering (this chapter) - Feature design and selection
- Production ML - Monitoring
1347.3 What Makes a Good Feature?
Feature engineering is often more impactful than algorithm selection. A simple Decision Tree with well-engineered features can outperform a deep neural network with raw sensor data.
1347.3.1 Good vs Bad Features: Visual Comparison
The diagram at the start of this chapter contrasts the two cases: the windowed mean acceleration magnitude gives tight, well-separated walking and running distributions (90%+ accuracy), while a single instantaneous sample gives noisy, overlapping distributions (about 60% accuracy).
1347.3.2 Good Feature Characteristics
- High inter-class variance (separates classes well)
  - Walking vs running: mean acceleration differs by 2-5 m/s²
- Low intra-class variance (consistent within class)
  - Walking stays ~2.5 m/s², regardless of user height/weight
- Robust to noise and sensor drift
  - Accelerometer calibration errors (±5%) don't flip predictions
- Computationally cheap
  - Mean calculation: O(n), minimal CPU/battery impact
- Interpretable (helps debugging)
  - "Variance increased 3×" → clearly indicates running vs walking
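These two properties can be checked numerically before any model is trained. Below is a minimal sketch of a Fisher-style separability score (the helper fisher_score and the sample counts are our own); the example values mirror the walking/running numbers in the diagram.

import numpy as np

def fisher_score(class_a, class_b):
    """Separability of a single feature between two classes:
    (difference of means)^2 / (sum of variances). Higher is better."""
    mean_diff = np.mean(class_a) - np.mean(class_b)
    return mean_diff ** 2 / (np.var(class_a) + np.var(class_b) + 1e-12)

rng = np.random.default_rng(0)
# Good feature: windowed mean magnitude (low intra-class spread)
walk_good = rng.normal(2.5, 0.3, 500)
run_good = rng.normal(5.0, 0.4, 500)
# Bad feature: instantaneous sample (high noise, heavy overlap)
walk_bad = rng.normal(2.5, 2.0, 500)
run_bad = rng.normal(3.0, 2.5, 500)

print(f"good feature score: {fisher_score(walk_good, run_good):.1f}")  # roughly 25
print(f"bad feature score:  {fisher_score(walk_bad, run_bad):.3f}")    # roughly 0.02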
1347.3.3 Bad Feature Characteristics
- Overlapping class distributions
  - Temperature (20-25 °C) is the same for walking and running → 50% accuracy
- High sensitivity to noise
  - Instantaneous accelerometer sample: ±2 m/s² jitter
- Expensive to compute
  - Full FFT on a 100-sample window: 50ms on a Cortex-M4
- Correlated with other features (redundant)
  - Mean X + Mean Y + Mean Z vs magnitude: magnitude captures all three
- Not domain-relevant
  - Battery level for activity recognition: random performance
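The accuracy gap shown in the diagram can be reproduced with a toy midpoint-threshold classifier. This is a sketch under the same assumed means and standard deviations, not a measurement on real sensor data.

import numpy as np

def threshold_accuracy(walk, run):
    """Classify using the midpoint of the two class means as the decision threshold."""
    thr = (walk.mean() + run.mean()) / 2
    correct = np.sum(walk < thr) + np.sum(run >= thr)
    return correct / (len(walk) + len(run))

rng = np.random.default_rng(1)
# Windowed mean magnitude: tight, well-separated distributions
acc_good = threshold_accuracy(rng.normal(2.5, 0.3, 10_000), rng.normal(5.0, 0.4, 10_000))
# Instantaneous sample: wide, overlapping distributions
acc_bad = threshold_accuracy(rng.normal(2.5, 2.0, 10_000), rng.normal(3.0, 2.5, 10_000))

print(f"windowed mean: {acc_good:.0%}")  # close to 100%
print(f"raw sample:    {acc_bad:.0%}")   # around 55%, near the ~60% in the diagram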
1347.4 Sensor-Specific Feature Engineering
Different sensors require different strategies:
| Sensor Type | Good Features | Bad Features | Why Good Works |
|---|---|---|---|
| Accelerometer | Mean magnitude, Variance, Peak count | Raw samples, Individual axes | Noise reduction, orientation-independent |
| Temperature | Rate of change, Slope | Absolute value | Location-independent, detects events |
| Audio | MFCCs, Spectral energy | Raw waveform | 1000× compression, noise robust |
| Current Sensor | RMS, Peak, Crest factor | Instantaneous reading | Load identification, filters transients |
| GPS | Speed, Heading change rate | Latitude, Longitude | Context-independent, captures motion |
| Gyroscope | Angular velocity variance | Raw rotation matrix | Captures turning behavior |
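As an example of one table row in code, here is a minimal sketch of RMS, peak, and crest factor for a current-sensor window (the helper current_features and the 50 Hz test signal are our own).

import numpy as np

def current_features(samples):
    """RMS, peak, and crest factor for one window of current samples (amps)."""
    rms = np.sqrt(np.mean(samples ** 2))    # Effective (heating) current
    peak = np.max(np.abs(samples))          # Largest instantaneous draw
    crest = peak / rms if rms > 0 else 0.0  # Peak-to-RMS ratio; high for spiky loads
    return rms, peak, crest

# One 50 Hz mains cycle sampled at 1 kHz, 2 A amplitude
t = np.arange(0, 0.02, 0.001)
rms, peak, crest = current_features(2.0 * np.sin(2 * np.pi * 50 * t))
print(f"RMS={rms:.2f} A, peak={peak:.2f} A, crest factor={crest:.2f}")  # crest ≈ 1.41 for a pure sine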
1347.5 HAR Feature Engineering Code
import numpy as np

def extract_har_features(accel_window):
    """
    Extract 12 efficient features for activity recognition.
    Input: accel_window (N, 3) - N samples of [X, Y, Z] acceleration
    Output: 12-element feature vector
    """
    # Compute magnitude (orientation-independent)
    mag = np.linalg.norm(accel_window, axis=1)
    # Time-domain features (cheap to compute)
    features = [
        np.mean(mag),               # Mean magnitude
        np.std(mag),                # Standard deviation (separates sit/walk/run)
        np.min(mag),                # Minimum (detects stationary periods)
        np.max(mag),                # Maximum (detects impacts)
        np.max(mag) - np.min(mag),  # Range (motion intensity)
        # Zero crossings of the mean-removed signal (periodicity indicator)
        np.sum(np.diff(np.sign(mag - np.mean(mag))) != 0),
        # Signal Magnitude Area (total energy)
        np.sum(np.abs(accel_window[:, 0])) +
        np.sum(np.abs(accel_window[:, 1])) +
        np.sum(np.abs(accel_window[:, 2])),
        # Vertical component (stairs detection)
        np.std(accel_window[:, 2]),  # Z-axis standard deviation
        # Frequency estimation (no FFT needed)
        estimate_dominant_frequency(mag),
        # Statistical moments
        np.percentile(mag, 75) - np.percentile(mag, 25),  # IQR
        np.sum((mag - np.mean(mag)) ** 3) / len(mag),     # Third central moment (unnormalized skewness)
        np.sum((mag - np.mean(mag)) ** 4) / len(mag),     # Fourth central moment (unnormalized kurtosis)
    ]
    return np.array(features)

def estimate_dominant_frequency(signal, sample_rate=50):
    """Estimate the dominant frequency without an FFT (autocorrelation method)."""
    autocorr = np.correlate(signal - np.mean(signal),
                            signal - np.mean(signal), mode='full')
    autocorr = autocorr[len(autocorr) // 2:]
    # Local maxima strictly after lag 0
    peaks = (autocorr[1:-1] > autocorr[:-2]) & (autocorr[1:-1] > autocorr[2:])
    if np.any(peaks):
        period = np.argmax(peaks) + 1  # Lag (in samples) of the first local maximum
        return sample_rate / period    # Hz
    return 0.0

1347.5.1 Computational Cost Comparison
| Feature Approach | Features | Time | Accuracy | Model Size |
|---|---|---|---|---|
| Raw samples (100) | 100 | <1ms | 65% | 200 KB (CNN) |
| Time-domain only | 6 | 2ms | 82% | 15 KB |
| Time + freq (12) | 12 | 8ms | 90% | 25 KB |
| Full FFT + MFCCs | 39 | 45ms | 92% | 80 KB |
Sweet spot: 12 time+freq features balance accuracy (90%) with speed (8ms).
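For a rough sanity check of the timing column, the extractor above can be exercised on a synthetic window. This is a sketch (the synthetic 100-sample window and the 1000-iteration timing loop are our own), and times measured on a desktop host will be far lower than the Cortex-M-class figures in the table.

import time
import numpy as np

# Hypothetical 2 s window at 50 Hz: 100 samples of [X, Y, Z] acceleration
window = np.random.default_rng(2).normal(0.0, 1.0, size=(100, 3)) + [0.0, 0.0, 9.81]

n_runs = 1000
start = time.perf_counter()
for _ in range(n_runs):
    feats = extract_har_features(window)
per_window_ms = (time.perf_counter() - start) * 1000 / n_runs
print(f"{len(feats)} features, {per_window_ms:.3f} ms per window on this host")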
1347.6 Feature Selection Decision Flowchart
Step 1: Start with statistical features (always compute these first)
- Mean, standard deviation, min, max, range
- Cost: O(n), single pass over the window
- Baseline: 70-80% accuracy

Step 2: Add domain-specific features (if accuracy < 85%)
- Accelerometer: zero crossings, peak count
- Audio: MFCCs, spectral energy
- Temperature: rate of change, slope
- Accuracy boost: +10-15%

Step 3: Consider the frequency domain (only if needed)
- FFT dominant frequency, spectral entropy
- When: periodic signals (walking, rotating machinery)
- Cost: 20-50ms for a 100-sample FFT
- Skip if: non-periodic data or a tight latency budget

Step 4: Correlation analysis to remove redundancy (see the sketch after this list)
- Drop one of any pair of features with correlation r > 0.9
- Redundant features waste compute and model capacity

Step 5: Test on held-out data
- 80/20 split by users (not by time!)
- Ensure test users never appear in the training set
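A minimal sketch of the Step 4 correlation pruning, assuming a feature matrix X with one column per feature (the helper prune_correlated and the toy columns are our own); the 0.9 threshold follows the rule of thumb above.

import numpy as np

def prune_correlated(X, feature_names, threshold=0.9):
    """Greedily drop the later feature of any pair with |r| > threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # (n_features, n_features)
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return X[:, keep], [feature_names[j] for j in keep]

# Toy example: 'mean_x' is a near-duplicate of 'magnitude'
rng = np.random.default_rng(3)
mag = rng.normal(3.0, 1.0, 200)
X = np.column_stack([mag, mag + rng.normal(0, 0.05, 200), rng.normal(0, 1.0, 200)])
X_sel, kept = prune_correlated(X, ['magnitude', 'mean_x', 'zero_crossings'])
print(kept)  # ['magnitude', 'zero_crossings'] - the redundant column is dropped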
1347.7 Common Feature Engineering Mistakes
Mistake 1: Using absolute values instead of relative ones
- Bad: absolute temperature (22 °C) → location-dependent
- Good: temperature rate of change (2 °C/min) → detects events

Mistake 2: Including metadata as features
- Bad: device ID, battery %, Wi-Fi SSID → not causal
- Good: motion variance, audio energy → physics-based

Mistake 3: Computing expensive features when cheap ones work
- Bad: full FFT (50ms) on non-periodic data
- Good: variance + zero crossings (2ms)

Mistake 4: Ignoring correlation between features
- Bad: Mean X, Mean Y, Mean Z (correlated) → 3 redundant features
- Good: magnitude sqrt(x² + y² + z²) → 1 orientation-independent feature

Mistake 5: Training and testing on the same user's data (see the split sketch after this list)
- Bad: User A: 80% train, 20% test → 95% accuracy (overfitting)
- Good: Users A+B for training, User C for testing → 85% accuracy (generalizes)
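For Mistake 5, scikit-learn's GroupShuffleSplit produces a user-wise split directly. The sketch below uses placeholder arrays and user IDs, so only the splitting pattern carries over to a real HAR dataset.

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Placeholder data: 300 windows x 12 features from three users
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 12))
y = rng.integers(0, 3, size=300)            # activity labels
user_ids = np.repeat(['A', 'B', 'C'], 100)  # which user produced each window

splitter = GroupShuffleSplit(n_splits=1, test_size=0.33, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=user_ids))

print(sorted(set(user_ids[train_idx])), sorted(set(user_ids[test_idx])))
# e.g. ['A', 'C'] ['B'] - the held-out user never appears in training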
1347.8 Worked Example: Smart Agriculture Soil Monitoring
Scenario: Agricultural IoT with 12 sensors per station, ESP32 edge device (320KB RAM), <50ms inference budget.
Initial: 36 features → 2.8 MB model → 180ms inference (too slow!)
Feature Selection Process:
| Step | Features | Accuracy | Model Size | Inference |
|---|---|---|---|---|
| All 36 features | 36 | 91.2% | 2.8 MB | 180ms |
| Top 15 by importance | 15 | 90.8% | 1.1 MB | 95ms |
| Remove correlated | 8 | 89.7% | 420 KB | 52ms |
| Top 6 uncorrelated | 6 | 88.9% | 180 KB | 38ms |
| Final (top 5) | 5 | 87.2% | 95 KB | 28ms |
Final Feature Set:
1. soil_moisture_30cm (primary indicator)
2. soil_temp_15cm (evaporation driver)
3. moisture_rate_24h (trend)
4. humidity (atmospheric demand)
5. evapotranspiration (physics-based derived feature)
Key Insight: The top 5 features captured 76% of the predictive power. Correlation analysis was essential: soil moisture at 15 cm and 30 cm were correlated at r = 0.89, so keeping both wastes model capacity.
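A minimal sketch of the moisture_rate_24h trend feature, assuming hourly volumetric-moisture readings (the 24-sample window and least-squares slope are our own choices); the evapotranspiration feature would come from a standard model such as FAO-56 Penman-Monteith and is not reproduced here.

import numpy as np

def moisture_rate_24h(moisture_hourly):
    """Average change in soil moisture (% per hour) over the last 24 hourly samples."""
    window = np.asarray(moisture_hourly[-24:])
    hours = np.arange(len(window))
    # Least-squares slope is more noise-robust than (last - first) / 24
    slope, _ = np.polyfit(hours, window, 1)
    return slope

readings = 32.0 - 0.1 * np.arange(24) + np.random.default_rng(5).normal(0, 0.05, 24)
print(f"{moisture_rate_24h(readings):+.2f} %/h")  # about -0.10 %/h: a steady drying trend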
1347.9 Feature Importance Analysis
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Train model (X_train, y_train: windowed feature matrix and activity labels)
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# Get feature importances
importances = model.feature_importances_
feature_names = ['mean', 'std', 'min', 'max', 'range', 'zero_crossings',
                 'sma', 'z_std', 'freq', 'iqr', 'skew', 'kurtosis']

# Sort by importance
indices = np.argsort(importances)[::-1]
print("Feature ranking:")
for i in range(len(feature_names)):
    print(f"{i+1}. {feature_names[indices[i]]}: {importances[indices[i]]:.3f}")

# Output example:
# 1. std: 0.342   <- Standard deviation most discriminative
# 2. freq: 0.251  <- Dominant frequency second
# 3. sma: 0.158   <- Signal magnitude area
# 4. mean: 0.089  <- Mean contributes, but less
# ... (remaining < 5% each)

# Drop features with < 2% importance
important_features = [f for i, f in enumerate(feature_names)
                      if importances[i] > 0.02]
# Result: 12 features -> 7 features, 90% -> 89% accuracy, 40% faster

1347.10 Knowledge Check
Question 1: A smartwatch activity classifier achieves only 72% accuracy using these features: Mean X/Y/Z acceleration, Device brand, Battery level, GPS speed (40% missing). Which strategy would MOST improve accuracy?
Answer: Replace the metadata features (device brand, battery level) with motion-specific features such as acceleration variance, zero crossings, and dominant frequency.
Explanation: Motion-specific features directly capture biomechanical differences. Variance distinguishes walking (1-2 m/s²) from running (3-5 m/s²), and the FFT reveals cadence differences. These physics-based features achieve 90%+ accuracy versus 72% with metadata.
1347.11 Summary
This chapter covered feature engineering for IoT ML:
- Good Features: High inter-class variance, low intra-class variance, cheap to compute
- Domain Knowledge: Physics-based features (MFCC for audio, variance for motion) outperform generic statistics
- Feature Selection: Use importance analysis and correlation pruning to reduce 36 → 5 features
- Edge Optimization: Balance accuracy vs computational cost for deployment
Key Insight: Feature engineering contributes more to accuracy than algorithm choice—spend 80% of time on features, 20% on model selection.
1347.12 What’s Next
Continue to Production ML to learn monitoring, anomaly detection, and debugging strategies for deployed IoT ML systems.