1347  Feature Engineering for IoT Machine Learning

1347.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Design Discriminative Features: Create features that separate classes effectively
  • Apply Domain Knowledge: Use physics and domain expertise to engineer powerful features
  • Perform Feature Selection: Identify and remove redundant or uninformative features
  • Optimize for Edge Deployment: Balance feature quality with computational cost

1347.2 Prerequisites

Note: Chapter Series: Modeling and Inferencing

This is part 6 of the IoT Machine Learning series:

  1. ML Fundamentals - Core concepts
  2. Mobile Sensing - HAR, transportation
  3. IoT ML Pipeline - 7-step pipeline
  4. Edge ML & Deployment - TinyML
  5. Audio Feature Processing - MFCC
  6. Feature Engineering (this chapter) - Feature design and selection
  7. Production ML - Monitoring

1347.3 What Makes a Good Feature?

Feature engineering is often more impactful than algorithm selection. A simple Decision Tree with well-engineered features can outperform a deep neural network with raw sensor data.

1347.3.1 Good vs Bad Features: Visual Comparison

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#95A5A6', 'clusterBkg': '#ECF0F1', 'clusterBorder': '#2C3E50'}}}%%
flowchart TB
    subgraph Good["Good Feature: Mean Acceleration Magnitude"]
        G1["<b>Walking</b><br/>mean = 2.5 m/s2<br/>std = 0.3<br/>(Low variance)"]
        G2["<b>Running</b><br/>mean = 5.0 m/s2<br/>std = 0.4<br/>(Low variance)"]
        G3["<b>Clear Separation</b><br/>2.5 gap between means<br/>Easy classification<br/>90%+ accuracy"]
        G1 -.-> G3
        G2 -.-> G3
    end

    subgraph Bad["Bad Feature: Instantaneous Sample"]
        B1["<b>Walking</b><br/>mean = 2.5 m/s2<br/>std = 2.0<br/>(High noise)"]
        B2["<b>Running</b><br/>mean = 3.0 m/s2<br/>std = 2.5<br/>(High noise)"]
        B3["<b>Overlapping Distributions</b><br/>0.5 mean difference<br/>Poor classification<br/>60% accuracy"]
        B1 -.-> B3
        B2 -.-> B3
    end

    style G1 fill:#2C3E50,stroke:#16A085,color:#fff,stroke-width:3px
    style G2 fill:#16A085,stroke:#2C3E50,color:#fff,stroke-width:3px
    style G3 fill:#27AE60,stroke:#2C3E50,color:#fff,stroke-width:3px
    style B1 fill:#E67E22,stroke:#E74C3C,color:#fff,stroke-width:3px
    style B2 fill:#E74C3C,stroke:#E67E22,color:#fff,stroke-width:3px
    style B3 fill:#95A5A6,stroke:#7F8C8D,color:#fff,stroke-width:3px

Figure 1347.1: Good versus Bad ML Features for Activity Classification

1347.3.2 Good Feature Characteristics

Tip: High-Quality IoT Features
  1. High inter-class variance (separates classes well)
    • Walking vs Running: Mean acceleration differs by 2-5 m/s2
  2. Low intra-class variance (consistent within class)
    • Walking always ~2.5 m/s2, regardless of user height/weight
  3. Robust to noise and sensor drift
    • Accelerometer calibration errors (±5%) don’t flip predictions
  4. Computationally cheap
    • Mean calculation: O(n), minimal CPU/battery impact
  5. Interpretable (helps debugging)
    • “Variance increased 3×” → clearly indicates running vs walking
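
The first two properties can be quantified with a simple ratio of between-class to within-class variance (a Fisher-style score). The sketch below is a minimal illustration on synthetic walking/running windows; the fisher_score helper and the distributions are assumptions for this example, not part of any library.

import numpy as np

def fisher_score(values, labels):
    """Ratio of between-class to within-class variance for one feature.
    Larger values mean the feature separates the classes better."""
    classes = np.unique(labels)
    overall_mean = values.mean()
    between = sum((values[labels == c].mean() - overall_mean) ** 2
                  for c in classes) / len(classes)
    within = sum(values[labels == c].var() for c in classes) / len(classes)
    return between / (within + 1e-12)

# Illustrative synthetic data: mean acceleration magnitude per window
rng = np.random.default_rng(0)
walking = rng.normal(2.5, 0.3, 200)   # assumed walking distribution
running = rng.normal(5.0, 0.4, 200)   # assumed running distribution
values = np.concatenate([walking, running])
labels = np.array([0] * 200 + [1] * 200)

print(f"Fisher score: {fisher_score(values, labels):.1f}")  # large value -> well-separated classes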

1347.3.3 Bad Feature Characteristics

Warning: Features to Avoid
  1. Overlapping class distributions
    • Temperature (20-25 C) same for walking/running → 50% accuracy
  2. High sensitivity to noise
    • Instantaneous accelerometer sample: ±2 m/s2 jitter
  3. Expensive to compute
    • Full FFT on 100-sample window: 50ms on Cortex-M4
  4. Correlated with other features (redundant)
    • Mean X + Mean Y + Mean Z vs Magnitude: Magnitude captures all
  5. Not domain-relevant
    • Battery level for activity recognition: random performance

1347.4 Sensor-Specific Feature Engineering

Different sensors require different strategies:

| Sensor Type | Good Features | Bad Features | Why Good Works |
|---|---|---|---|
| Accelerometer | Mean magnitude, Variance, Peak count | Raw samples, Individual axes | Noise reduction, orientation-independent |
| Temperature | Rate of change, Slope | Absolute value | Location-independent, detects events |
| Audio | MFCCs, Spectral energy | Raw waveform | 1000× compression, noise robust |
| Current Sensor | RMS, Peak, Crest factor | Instantaneous reading | Load identification, filters transients |
| GPS | Speed, Heading change rate | Latitude, Longitude | Context-independent, captures motion |
| Gyroscope | Angular velocity variance | Raw rotation matrix | Captures turning behavior |
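
To make a couple of rows in the table above concrete, here is a minimal sketch of sensor-specific features: temperature rate of change (slope) and current-sensor RMS, peak, and crest factor. The function names, window lengths, and sample values are assumptions for illustration only.

import numpy as np

def temperature_rate_of_change(temps, sample_period_s=60.0):
    """Slope of a temperature window in degrees C per minute (least-squares fit)."""
    t = np.arange(len(temps)) * sample_period_s / 60.0  # minutes
    slope, _ = np.polyfit(t, temps, 1)
    return slope

def current_features(samples):
    """RMS, peak, and crest factor of a current-sensor window."""
    rms = np.sqrt(np.mean(samples ** 2))
    peak = np.max(np.abs(samples))
    return rms, peak, peak / (rms + 1e-12)

# Illustrative synthetic windows (values are assumptions, not real sensor data)
temps = np.array([21.0, 21.4, 21.9, 22.5, 23.0])          # one sample per minute
current = 2.0 * np.sin(np.linspace(0, 10 * np.pi, 500))   # 2 A sinusoidal load

print(f"Temp rate of change: {temperature_rate_of_change(temps):.2f} C/min")
rms, peak, crest = current_features(current)
print(f"Current RMS={rms:.2f} A, peak={peak:.2f} A, crest factor={crest:.2f}")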

1347.5 HAR Feature Engineering Code

import numpy as np

def extract_har_features(accel_window):
    """
    Extract 12 efficient features for activity recognition
    Input: accel_window (N, 3) - N samples of [X, Y, Z] acceleration
    Output: 12-element feature vector
    """
    # Compute magnitude (orientation-independent)
    mag = np.linalg.norm(accel_window, axis=1)

    # Time-domain features (cheap to compute)
    features = [
        np.mean(mag),              # Mean magnitude
        np.std(mag),               # Standard deviation (separates sit/walk/run)
        np.min(mag),               # Minimum (detects stationary)
        np.max(mag),               # Maximum (detects impacts)
        np.max(mag) - np.min(mag), # Range (motion intensity)

        # Zero crossings (periodicity indicator)
        np.sum(np.diff(np.sign(mag - np.mean(mag))) != 0),

        # Signal Magnitude Area (total energy)
        np.sum(np.abs(accel_window[:, 0])) +
        np.sum(np.abs(accel_window[:, 1])) +
        np.sum(np.abs(accel_window[:, 2])),

        # Vertical component (stairs detection)
        np.std(accel_window[:, 2]),  # Z-axis standard deviation

        # Frequency estimation (no FFT needed)
        estimate_dominant_frequency(mag),

        # Statistical moments
        np.percentile(mag, 75) - np.percentile(mag, 25),  # IQR
        np.mean((mag - np.mean(mag)) ** 3) / (np.std(mag) ** 3 + 1e-12),  # Skewness
        np.mean((mag - np.mean(mag)) ** 4) / (np.std(mag) ** 4 + 1e-12),  # Kurtosis
    ]

    return np.array(features)

def estimate_dominant_frequency(signal, sample_rate=50):
    """Estimate frequency without FFT (autocorrelation method)"""
    autocorr = np.correlate(signal - np.mean(signal),
                            signal - np.mean(signal), mode='full')
    autocorr = autocorr[len(autocorr)//2:]

    # Find first peak after lag 0
    peaks = (autocorr[1:-1] > autocorr[:-2]) & (autocorr[1:-1] > autocorr[2:])
    if np.any(peaks):
        period = np.argmax(peaks) + 1
        return sample_rate / period  # Hz
    return 0
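
A quick usage sketch for extract_har_features, run in the same module as the code above. The accelerometer window is synthetic (gravity on Z plus a 2 Hz gait-like oscillation sampled at 50 Hz), so the exact values are illustrative only.

# Synthetic 2-second window at 50 Hz: gravity on Z plus a 2 Hz gait-like oscillation
t = np.arange(100) / 50.0
accel_window = np.column_stack([
    0.5 * np.sin(2 * np.pi * 2 * t),        # X
    0.3 * np.cos(2 * np.pi * 2 * t),        # Y
    9.8 + 1.0 * np.sin(2 * np.pi * 2 * t),  # Z (gravity + motion)
])

features = extract_har_features(accel_window)
print(features.shape)                                          # (12,)
print(f"Estimated dominant frequency: {features[8]:.1f} Hz")   # ~2 Hz expected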

1347.5.1 Computational Cost Comparison

| Feature Approach | Features | Time | Accuracy | Model Size |
|---|---|---|---|---|
| Raw samples (100) | 100 | <1ms | 65% | 200 KB (CNN) |
| Time-domain only | 6 | 2ms | 82% | 15 KB |
| Time + freq | 12 | 8ms | 90% | 25 KB |
| Full FFT + MFCCs | 39 | 45ms | 92% | 80 KB |

Sweet spot: 12 time+freq features balance accuracy (90%) with speed (8ms).

1347.6 Feature Selection Decision Flowchart

Step 1: Start with statistical features (always compute first)
  • Mean, Standard deviation, Min, Max, Range
  • Cost: O(n) single pass
  • Baseline: 70-80% accuracy

Step 2: Add domain-specific features (if accuracy < 85%)
  • Accelerometer: Zero crossings, Peak count
  • Audio: MFCCs, Spectral energy
  • Temperature: Rate of change, Slope
  • Accuracy boost: +10-15%

Step 3: Consider frequency domain (only if needed)
  • FFT dominant frequency, Spectral entropy
  • When: Periodic signals (walking, rotating machinery)
  • Cost: 20-50ms for a 100-sample FFT
  • Skip if: Non-periodic data or tight latency budget

Step 4: Correlation analysis (remove redundancy); see the sketch below
  • Drop one of any feature pair with |r| > 0.9 correlation
  • Redundant features waste compute and model capacity
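
A minimal sketch of that pruning step, assuming a feature matrix X of shape (n_windows, n_features) and a matching list of feature names; the greedy drop_correlated_features helper is an illustration, not a library function.

import numpy as np

def drop_correlated_features(X, feature_names, threshold=0.9):
    """Greedily keep a feature only if it is not strongly correlated with one already kept."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return X[:, keep], [feature_names[j] for j in keep]

# Illustrative example: a near-duplicate feature gets dropped
rng = np.random.default_rng(1)
mag = rng.normal(3.0, 1.0, 500)
X = np.column_stack([mag, mag + rng.normal(0, 0.05, 500), rng.normal(0, 1, 500)])
X_pruned, kept = drop_correlated_features(X, ['magnitude', 'mean_x', 'zero_crossings'])
print(kept)  # ['magnitude', 'zero_crossings'] -- 'mean_x' dropped as redundant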

Step 5: Test on held-out data (a user-wise split sketch follows below)
  • 80/20 split by users (not by time!)
  • Ensure test users do not appear in the training set
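
One way to implement the user-wise split is scikit-learn's GroupShuffleSplit, grouping windows by user ID so that no user appears in both sets. A minimal sketch with placeholder arrays (shapes and values are assumptions):

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Placeholder data: 1000 feature windows from 10 users (shapes are assumptions)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))          # 12 features per window
y = rng.integers(0, 3, size=1000)        # 3 activity classes
user_ids = rng.integers(0, 10, size=1000)

# 80/20 split by user: test users never appear in training
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=user_ids))

assert set(user_ids[train_idx]).isdisjoint(user_ids[test_idx])
print(f"{len(train_idx)} train windows, {len(test_idx)} test windows")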

1347.7 Common Feature Engineering Mistakes

Caution: Mistakes to Avoid

Mistake 1: Using absolute values instead of relative ones
  • Bad: Absolute temperature (22 C) → location-dependent
  • Good: Temperature rate of change (2 C/min) → detects events

Mistake 2: Including metadata as features
  • Bad: Device ID, Battery %, Wi-Fi SSID → not causal
  • Good: Motion variance, Audio energy → physics-based

Mistake 3: Computing expensive features when cheap ones work
  • Bad: Full FFT (50ms) for non-periodic data
  • Good: Variance + Zero crossings (2ms)

Mistake 4: Ignoring correlation between features
  • Bad: Mean X, Mean Y, Mean Z (correlated) → 3 redundant features
  • Good: Magnitude sqrt(x² + y² + z²) → 1 orientation-independent feature

Mistake 5: Training and testing on the same user's data
  • Bad: User A: 80% train, 20% test → 95% accuracy (overfits)
  • Good: Users A+B for training, User C for testing → 85% accuracy (generalizes)

1347.8 Worked Example: Smart Agriculture Soil Monitoring

Scenario: Agricultural IoT with 12 sensors per station, ESP32 edge device (320KB RAM), <50ms inference budget.

Initial: 36 features → 2.8 MB model → 180ms inference (too slow!)

Feature Selection Process:

| Step | Features | Accuracy | Model Size | Inference |
|---|---|---|---|---|
| All 36 features | 36 | 91.2% | 2.8 MB | 180ms |
| Top 15 by importance | 15 | 90.8% | 1.1 MB | 95ms |
| Remove correlated | 8 | 89.7% | 420 KB | 52ms |
| Top 6 uncorrelated | 6 | 88.9% | 180 KB | 38ms |
| Final (top 5) | 5 | 87.2% | 95 KB | 28ms |

Final Feature Set:
  1. soil_moisture_30cm (primary indicator)
  2. soil_temp_15cm (evaporation driver)
  3. moisture_rate_24h (trend)
  4. humidity (atmospheric demand)
  5. evapotranspiration (physics-based derived feature)

Key Insight: The top 5 features captured 76% of the predictive power. Correlation analysis was essential: soil moisture at 15cm and 30cm were correlated at r = 0.89, so keeping both wastes model capacity.

1347.9 Feature Importance Analysis

from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Train model
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# Get feature importances
importances = model.feature_importances_
feature_names = ['mean', 'std', 'min', 'max', 'range', 'zero_crossings',
                 'sma', 'z_std', 'freq', 'iqr', 'skew', 'kurtosis']

# Sort by importance
indices = np.argsort(importances)[::-1]

print("Feature ranking:")
for i in range(len(feature_names)):
    print(f"{i+1}. {feature_names[indices[i]]}: {importances[indices[i]]:.3f}")

# Output example:
# 1. std: 0.342        <- Standard deviation most discriminative
# 2. freq: 0.251       <- Frequency second
# 3. sma: 0.158        <- Signal magnitude area
# 4. mean: 0.089       <- Mean contributes but less
# ... (remaining < 5% each)

# Drop features with < 2% importance
important_features = [f for i, f in enumerate(feature_names)
                      if importances[i] > 0.02]
# Result: 12 features -> 7 features, 90% -> 89% accuracy, 40% faster
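
As a follow-up to the snippet above, a sketch of retraining on only the retained columns to confirm the accuracy cost is acceptable. It assumes the X_train/y_train arrays from above plus corresponding held-out arrays X_test/y_test, and that the column order of X_train matches feature_names.

# Follow-up (same session as above): keep only columns clearing the 2% threshold.
# X_test / y_test are assumed to be the corresponding held-out arrays.
keep_idx = [i for i, imp in enumerate(importances) if imp > 0.02]

model_small = RandomForestClassifier(n_estimators=100, random_state=0)
model_small.fit(X_train[:, keep_idx], y_train)

print(f"Reduced from {X_train.shape[1]} to {len(keep_idx)} features")
print(f"Held-out accuracy: {model_small.score(X_test[:, keep_idx], y_test):.3f}")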

1347.10 Knowledge Check

Question 1: A smartwatch activity classifier achieves only 72% accuracy using these features: mean X/Y/Z acceleration, device brand, battery level, and GPS speed (40% missing). Which strategy would MOST improve accuracy?

Answer: Replace the metadata features (device brand, battery level) with motion-specific ones such as acceleration variance and dominant frequency.

Explanation: Motion-specific features directly capture biomechanical differences. Variance distinguishes walking (1-2 m/s2) from running (3-5 m/s2), and the FFT reveals cadence differences. These physics-based features achieve 90%+ accuracy versus 72% with metadata.
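
For reference, the FFT-based dominant frequency mentioned in the explanation can be computed as below; the synthetic walking/running signals and the 50 Hz sample rate are assumptions for illustration.

import numpy as np

def fft_dominant_frequency(signal, sample_rate=50):
    """Dominant frequency via FFT (contrast with the autocorrelation estimate earlier)."""
    spectrum = np.abs(np.fft.rfft(signal - np.mean(signal)))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

# Walking cadence ~2 Hz vs running ~3 Hz (illustrative synthetic signals)
t = np.arange(200) / 50.0
walking = np.sin(2 * np.pi * 2.0 * t)
running = np.sin(2 * np.pi * 3.0 * t)
print(f"walking: {fft_dominant_frequency(walking):.1f} Hz, "
      f"running: {fft_dominant_frequency(running):.1f} Hz")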

1347.11 Summary

This chapter covered feature engineering for IoT ML:

  • Good Features: High inter-class variance, low intra-class variance, cheap to compute
  • Domain Knowledge: Physics-based features (MFCC for audio, variance for motion) outperform generic statistics
  • Feature Selection: Use importance analysis and correlation pruning to reduce 36 → 5 features
  • Edge Optimization: Balance accuracy vs computational cost for deployment

Key Insight: Feature engineering contributes more to accuracy than algorithm choice—spend 80% of time on features, 20% on model selection.

1347.12 What’s Next

Continue to Production ML to learn monitoring, anomaly detection, and debugging strategies for deployed IoT ML systems.