47  Edge Sampling & Compression

47.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Apply Nyquist Theorem: Calculate appropriate sampling rates for different sensor types
  • Implement Data Reduction Techniques: Use aggregation, compression, event-based reporting, and delta encoding
  • Select Compression Algorithms: Choose optimal algorithms based on data type and edge device constraints
  • Avoid Common Pitfalls: Prevent sampling aliasing, buffer overflow, and rate mismatch errors

In 60 Seconds

Adaptive sampling and on-device compression are the two most powerful techniques for reducing IoT data volume at the edge — adaptive sampling matches collection frequency to signal dynamics, while compression exploits redundancy to shrink the data that is transmitted. Together they can reduce bandwidth requirements by 90–99% without meaningful loss of analytical value.

Edge sampling and compression reduce the amount of data IoT devices need to transmit. Think of sending a friend the highlights of a movie instead of the entire film. By transmitting only important changes or compressed summaries, devices save battery power and network bandwidth while preserving the information that matters most.

47.2 Prerequisites

Before diving into this chapter, you should be familiar with:

  • Edge Data Acquisition: Architecture: Understanding device categories and data generation patterns
  • Basic signal processing concepts: Familiarity with frequency and time-domain representations
  • Python programming: Code examples use Python for data processing

Minimum Viable Understanding: Data Reduction at the Edge

Core Concept: Transform raw sensor data into actionable information at the source - send summaries, statistics, and alerts rather than every reading.

Why It Matters: Transmitting data costs 10-100x more energy than processing it locally. A sensor sending 1000 samples/minute to the cloud uses 100x more bandwidth than one sending minute-averages - with identical analytical value for most applications.

Key Takeaway: Apply the 90% rule - if 90% of your data is “normal” readings that will never be analyzed individually, aggregate them locally. Send statistical summaries (min, max, mean, std) at lower frequency, and only transmit raw data when anomalies are detected. This extends battery life from days to years.
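The 90% rule can be sketched in a few lines (a minimal illustration; the 3-sigma anomaly gate and the tuple format are assumptions, not a prescribed protocol):

```python
def report(buffer: list[float], threshold_std: float = 3.0):
    """Send a summary for 'normal' windows; fall back to raw samples
    when any reading is anomalous (here: beyond threshold_std sigma)."""
    mean = sum(buffer) / len(buffer)
    std = (sum((x - mean) ** 2 for x in buffer) / len(buffer)) ** 0.5
    if std > 0 and any(abs(x - mean) > threshold_std * std for x in buffer):
        return ("raw", list(buffer))  # anomaly: keep full resolution
    return ("summary", {"min": min(buffer), "max": max(buffer),
                        "mean": mean, "std": std})
```

Normal windows transmit four statistics; only anomalous windows cost full bandwidth.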

47.3 Nyquist Sampling Rate

Time: ~8 min | Difficulty: Intermediate | Reference: P10.C08.U03

Key Concepts

  • Adaptive sampling: Dynamically adjusting the sensor sampling rate based on signal variance or event rate — increasing frequency when the signal changes rapidly and decreasing it during quiet periods.
  • Nyquist-Shannon theorem: The fundamental sampling principle stating that a signal must be sampled at least twice its highest frequency component to be reconstructed accurately — the minimum sampling rate for any IoT sensor.
  • Delta encoding: A compression technique transmitting only the change between consecutive readings rather than absolute values, highly effective for slowly varying sensors.
  • Run-length encoding (RLE): Compressing sequences of identical values into a count-value pair — very effective for binary event streams or sensors with frequent identical readings.
  • Lossless compression: Compression that allows perfect reconstruction of the original data — required for financial billing data, safety-critical readings, and regulatory compliance.
  • Lossy compression: Compression that discards some information to achieve higher compression ratios — acceptable for analytics workloads where small accuracy loss is tolerable.

To accurately capture a signal, the sampling rate must be at least twice the highest frequency component of interest:

\[f_{sample} \geq 2 \times f_{max}\]
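Expressed in code (a trivial helper; the 2.5x default safety factor follows the practice discussed later in this chapter):

```python
def min_sample_rate(f_max_hz: float, safety_factor: float = 2.5) -> float:
    """Nyquist minimum is 2 * f_max; real designs add headroom for
    anti-aliasing filter roll-off."""
    return safety_factor * 2 * f_max_hz

# A 720 Hz bearing harmonic -> 3600 Hz recommended sampling rate
rate = min_sample_rate(720)
```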

Practical examples:

| Signal Type | Max Frequency | Min Sample Rate | Typical Rate |
|---|---|---|---|
| Temperature | 0.1 Hz (slow changes) | 0.2 Hz | 1 sample/minute |
| Vibration | 500 Hz | 1 kHz | 2-5 kHz |
| Audio | 20 kHz | 40 kHz | 44.1 kHz |
| Motion (IMU) | 50 Hz | 100 Hz | 100-200 Hz |

Common Pitfall: Sampling Aliasing

The mistake: Sampling signals below the Nyquist rate (2x the highest frequency), causing phantom patterns (aliasing) that don’t exist in the original signal.

Symptoms:

  • Vibration analysis shows unexpected low-frequency patterns
  • Motor speed readings fluctuate despite constant RPM
  • Temperature data shows oscillations that don’t match physical reality
  • Frequency analysis reveals false peaks at wrong frequencies
  • Bearing fault detection produces false positives

Why it happens: Engineers apply “common sense” sampling rates without frequency analysis. Underestimating signal bandwidth - a 60 Hz motor generates harmonics at 120 Hz, 180 Hz, etc. Cost pressure drives lower sampling rates. Copy-pasting configurations between different sensor types.

The fix: Always sample at >2x the highest frequency of interest:

| Signal Type | Max Frequency | Minimum Sample Rate | Recommended |
|---|---|---|---|
| Room temperature | 0.01 Hz | 0.02 Hz | 1/minute |
| HVAC response | 0.1 Hz | 0.2 Hz | 1/second |
| Motor vibration | 500 Hz | 1 kHz | 2.5 kHz |
| Bearing analysis | 5 kHz | 10 kHz | 25 kHz |

Prevention: Perform frequency analysis on representative signals before deployment. Use anti-aliasing filters (low-pass hardware filters) before the ADC. When in doubt, oversample then downsample digitally with proper filtering.
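The oversample-then-downsample advice can be sketched with crude block averaging (a real design would use a proper FIR decimation filter, e.g. scipy.signal.decimate; this stdlib-only version just illustrates the idea):

```python
import math

def downsample_with_filter(x: list[float], factor: int) -> list[float]:
    """Average each block of `factor` samples (a crude low-pass),
    then keep one value per block, to suppress aliasing."""
    blocks = [x[i:i + factor] for i in range(0, len(x) - factor + 1, factor)]
    return [sum(b) / len(b) for b in blocks]

# 10 kHz capture decimated to an effective 1 kHz
fs = 10_000
t = [n / fs for n in range(fs)]  # 1 second of samples
slow = downsample_with_filter([math.sin(2 * math.pi * 50 * ti) for ti in t], 10)
```

A 50 Hz tone survives the 10x decimation almost unchanged because it sits well below the new 500 Hz Nyquist limit.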

What sampling rate is needed to detect bearing faults in an 1800 RPM motor?

Given:

  • Motor speed: 1800 RPM = 30 Hz (revolutions per second)
  • Bearing has 8 rolling elements
  • Bearing fault frequency: each of the 8 rolling elements passes once per revolution, so \(8 \times 30 = 240\) Hz
  • Harmonics: 2nd harmonic at 480 Hz, 3rd at 720 Hz
  • Highest frequency of interest: 3rd harmonic = 720 Hz

Nyquist calculation:

\[f_{sample} \geq 2 \times f_{max} = 2 \times 720 \text{ Hz} = 1440 \text{ Hz minimum}\]

Practical safety factor (2.5x above Nyquist):

\[f_{sample} = 2.5 \times 1440 = 3600 \text{ Hz} \approx 4 \text{ kHz}\]

If sampled at only 100 Hz (common mistake):

  • Nyquist frequency = 50 Hz, but bearing fault is at 240 Hz
  • 240 Hz aliases to \(|240 - \text{round}(240/100) \times 100| = |240 - 200| = 40\) Hz
  • Result: False 40 Hz pattern appears instead of the real 240 Hz bearing fault
  • Impact: Bearing failure goes undetected until catastrophic failure
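The aliasing arithmetic above generalizes to a one-line helper (a sketch of the standard frequency-folding formula):

```python
def aliased_frequency(f_signal: float, f_sample: float) -> float:
    """Apparent frequency of f_signal after sampling at f_sample
    (folds the true frequency back toward the 0..f_sample/2 band)."""
    return abs(f_signal - round(f_signal / f_sample) * f_sample)

aliased_frequency(240.0, 100.0)   # the phantom 40 Hz pattern from above
aliased_frequency(240.0, 4000.0)  # unchanged: 240 Hz is captured correctly
```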

Memory and bandwidth check:

  • 4 kHz sampling × 2 bytes per sample = 8 KB/s raw data
  • 1-second FFT windows → 8 KB per analysis
  • Send top 10 frequency peaks → 120 bytes per second (67× reduction)
  • Conclusion: Edge FFT compression mandatory for battery-powered vibration monitoring

47.3.1 Interactive: Nyquist Sampling Rate Calculator

Use this calculator to explore how motor speed, harmonic order, and safety factor affect the required sampling rate and resulting data volume.

47.4 Edge Data Reduction Techniques

Time: ~12 min | Difficulty: Intermediate | Reference: P10.C08.U03b

Before transmitting data to the cloud, edge devices can apply several reduction strategies:

  1. Aggregation: Compute statistics over time windows (mean, min, max, variance)
  2. Compression: Apply lossless (ZIP) or lossy (threshold-based) compression
  3. Event-based reporting: Only transmit when values exceed thresholds
  4. Delta encoding: Send only changes from previous values
# Example: Edge aggregation for temperature sensor
class EdgeAggregator:
    def __init__(self, window_size=60):  # 60 samples = 1 minute at 1 Hz
        self.window_size = window_size
        self.buffer = []

    def add_sample(self, value):
        self.buffer.append(value)
        if len(self.buffer) >= self.window_size:
            return self.compute_summary()
        return None

    def compute_summary(self):
        summary = {
            "min": min(self.buffer),
            "max": max(self.buffer),
            "mean": sum(self.buffer) / len(self.buffer),
            "samples": len(self.buffer)
        }
        self.buffer = []
        return summary

# Usage: Send 1 summary per minute instead of 60 raw samples
aggregator = EdgeAggregator(window_size=60)
for temp_reading in sensor_stream:
    summary = aggregator.add_sample(temp_reading)
    if summary:
        transmit_to_cloud(summary)  # 60x bandwidth reduction
Try It: Edge Aggregation Data Reduction

Adjust the sensor sampling rate and aggregation window size to see how edge aggregation reduces data volume. The widget calculates the bandwidth reduction and shows what information is preserved versus lost.
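Technique 4 above, delta encoding, is equally compact. This sketch works on integer readings (e.g. raw ADC counts), where small differences pack into fewer bits than absolute values:

```python
def delta_encode(values: list[int]) -> list[int]:
    """First value is absolute; every later entry is the change
    from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas: list[int]) -> list[int]:
    """Running sum reconstructs the original sequence exactly."""
    out: list[int] = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out

# 10-bit ADC temperature counts -> mostly 0/±1 deltas
counts = [512, 512, 513, 513, 512, 514]
encoded = delta_encode(counts)  # [512, 0, 1, 0, -1, 2]
```

Because decoding is a running sum, delta encoding is lossless; pairing it with a variable-length integer format is what yields the bandwidth savings.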

Battery life impact of edge aggregation vs raw transmission:

Scenario: Temperature sensor with 2500 mAh battery, 1 reading/minute

Option A: Raw transmission (every reading sent):

  • LoRa TX: 120 mA for 1.5s per transmission
  • Transmissions per hour: 60
  • TX energy per hour: \((120 \text{ mA} \times 1.5 \text{ s} \times 60) / 3600 = 3.0 \text{ mAh}\)
  • Plus sleep: 0.01 mAh/hour
  • Total: 3.01 mAh/hour → battery life = 2500 / 3.01 = 831 hours = 35 days

Option B: Edge aggregation (1 summary every 15 minutes):

  • Transmissions per hour: 4
  • TX energy per hour: \((120 \text{ mA} \times 1.5 \text{ s} \times 4) / 3600 = 0.20 \text{ mAh}\)
  • Plus sleep: 0.01 mAh/hour
  • Total: 0.21 mAh/hour → battery life = 2500 / 0.21 = 11,905 hours = 496 days = 1.4 years

Battery life improvement: \(\frac{496}{35} = 14.2\times\) longer

Data volume comparison:

  • Raw: 60 packets/hour × 20 bytes = 1.2 KB/hour = 28.8 KB/day
  • Aggregated: 4 packets/hour × 32 bytes (min/max/mean/count) = 128 bytes/hour = 3.1 KB/day
  • Bandwidth reduction: \(\frac{28.8}{3.1} = 9.3\times\)

Key insight: Transmission dominates power budget. Aggregation provides 14× battery improvement with minimal information loss for trend analysis.
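The battery arithmetic above can be wrapped in a helper for quick what-if comparisons (same figures as the worked example; the 0.01 mAh/hour sleep draw is carried over as-is):

```python
def battery_life_hours(capacity_mah: float, tx_ma: float, tx_seconds: float,
                       tx_per_hour: float,
                       sleep_mah_per_hour: float = 0.01) -> float:
    """Battery life in hours given transmit current, duration, and frequency."""
    tx_mah_per_hour = tx_ma * tx_seconds * tx_per_hour / 3600
    return capacity_mah / (tx_mah_per_hour + sleep_mah_per_hour)

raw_hours = battery_life_hours(2500, 120, 1.5, 60)  # ~831 h (~35 days)
agg_hours = battery_life_hours(2500, 120, 1.5, 4)   # ~11,905 h (~1.4 years)
```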

47.4.1 Interactive: Battery Life vs. Aggregation Calculator

Adjust the parameters below to see how edge aggregation affects battery life and bandwidth for a LoRa-connected sensor.

Common Pitfall: Sampling Rate Mismatch

The mistake: Combining data from sensors with different sampling rates without proper resampling, leading to incorrect correlations and temporal misalignment.

Symptoms:

  • Correlation analysis shows unexpected null or spurious relationships
  • Merged datasets have many NaN/missing values at certain timestamps
  • Time-series plots show “jagged” or misaligned signals
  • ML models perform poorly despite good individual sensor data

Why it happens: Teams often assume all sensors operate at the same rate. A temperature sensor at 1 Hz combined with a vibration sensor at 100 Hz creates 99 missing values per temperature reading. Naive timestamp matching drops 99% of vibration data.

The fix: Use proper resampling/interpolation before combining:

import pandas as pd

# Downsample high-frequency data to match the low-frequency sensor
# (assumes both series use a sorted DatetimeIndex)
vibration_1hz = vibration_100hz.resample('1s').mean()
# Or upsample low-frequency data with interpolation
temp_100hz = temp_1hz.resample('10ms').interpolate(method='linear')
# Then merge on the aligned index (nearest match within tolerance)
merged = pd.merge_asof(vibration_1hz, temp_1hz,
                       left_index=True, right_index=True,
                       tolerance=pd.Timedelta('500ms'))

Prevention: Document sampling rates in sensor metadata. Create a data alignment layer that resamples all sources to a common time base before analysis.

Common Pitfall: Edge Buffer Overflow

The mistake: Configuring edge device buffers without considering worst-case scenarios, causing data loss during network outages or traffic spikes.

Symptoms:

  • Gaps in time-series data after network recovery
  • “Buffer full, dropping oldest data” warnings in device logs
  • Critical events missing during high-activity periods
  • Inconsistent data counts between edge and cloud
  • Post-incident analysis reveals missing sensor readings

Why it happens: Buffer sizes calculated for average conditions, not peak loads. Network outage duration underestimated. Sensor burst rates during events (motion, vibration) exceed steady-state assumptions. Memory constraints on edge devices force small buffers.

The fix: Size buffers for worst-case, not average:

# Buffer sizing calculation
samples_per_second = 10
max_outage_duration_seconds = 3600  # 1 hour
safety_margin = 1.5

min_buffer_size = samples_per_second * max_outage_duration_seconds * safety_margin
# = 10 * 3600 * 1.5 = 54,000 samples

# If memory-constrained, implement tiered retention:
# - Last 5 minutes: Full resolution
# - 5-60 minutes: 10x downsampled
# - Beyond 60 minutes: Statistical summary only
Try It: Buffer Sizing for Network Outages

Calculate the required buffer size for your edge device based on sampling rate, expected outage duration, and available memory. See how tiered retention can help when memory is constrained.

Prevention: Monitor buffer utilization as a health metric. Alert at 70% capacity. Implement graceful degradation (reduce resolution before dropping data). Test with simulated network outages lasting 2x your expected maximum.
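The tiered-retention idea from the sizing comment can be sketched with two deques (the capacities and 10x downsample factor are illustrative, mirroring the comment above):

```python
from collections import deque

class TieredBuffer:
    """Keep recent samples at full resolution; average evicted samples
    into a coarser tier instead of dropping them outright."""
    def __init__(self, full_capacity: int = 3000, factor: int = 10,
                 down_capacity: int = 3300):
        self.full = deque(maxlen=full_capacity)  # e.g. last 5 min at 10 Hz
        self.down = deque(maxlen=down_capacity)  # older data, 10x coarser
        self.factor = factor
        self._pending: list[float] = []

    def add(self, value: float) -> None:
        if len(self.full) == self.full.maxlen:
            # Oldest full-resolution sample falls into the coarse tier
            self._pending.append(self.full.popleft())
            if len(self._pending) == self.factor:
                self.down.append(sum(self._pending) / self.factor)
                self._pending.clear()
        self.full.append(value)

# 150 samples through a 100-sample full tier: the oldest 50 are
# compacted into 5 coarse averages instead of being lost
buf = TieredBuffer(full_capacity=100, factor=10)
for i in range(150):
    buf.add(float(i))
```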

47.5 Compression Algorithms Deep Dive

Time: ~20 min | Difficulty: Advanced | Reference: P10.C08.U03c

Edge devices face a fundamental trade-off: transmit less data (save power, bandwidth, cost) while preserving information needed for downstream analytics. This deep dive compares compression techniques across three dimensions: compression ratio, computational cost, and information preservation.

47.5.1 Compression Algorithm Categories

| Category | Compression Ratio | CPU Cost | Information Loss | Best For |
|---|---|---|---|---|
| Lossless | 2:1 - 4:1 | Medium | None | Critical data, audit logs |
| Lossy Statistical | 10:1 - 100:1 | Low | Controlled | Trend analysis, dashboards |
| Lossy Transform | 50:1 - 500:1 | High | Controlled | Pattern detection, ML features |
| Semantic | 100:1 - 1000:1 | Low-Medium | Significant | Event detection, alerts |

47.5.2 Lossless Compression: DEFLATE/GZIP

Standard lossless compression works well for structured IoT data:

import gzip
import json

def compress_batch(readings: list[dict]) -> bytes:
    """
    Compress a batch of sensor readings losslessly.
    Typical compression: 3-5x for JSON sensor data.
    """
    json_str = json.dumps(readings)
    compressed = gzip.compress(json_str.encode('utf-8'), compresslevel=6)
    return compressed

# Example: 100 temperature readings
readings = [{"ts": 1704067200 + i, "v": 22.5 + (i % 10) * 0.1} for i in range(100)]
raw_size = len(json.dumps(readings).encode())       # ~4,500 bytes
compressed_size = len(compress_batch(readings))     # ~1,200 bytes
# Compression ratio: 3.75:1
Try It: GZIP Compression Estimator

Explore how batch size, data format, and compression level affect lossless GZIP compression of IoT sensor readings.

Performance characteristics:

| Metric | GZIP Level 1 | GZIP Level 6 | GZIP Level 9 |
|---|---|---|---|
| Compression ratio | 2.5:1 | 3.5:1 | 4:1 |
| Compress speed | 50 MB/s | 20 MB/s | 5 MB/s |
| Decompress speed | 100 MB/s | 100 MB/s | 100 MB/s |
| Edge CPU impact | Low | Medium | High |

When to use: Audit trails, compliance data, any data that may be queried in original form. Do not use for real-time streams on constrained MCUs.

47.5.3 Lossy Statistical: Aggregation Windows

Compute statistics over time windows, discard raw samples:

import statistics
from dataclasses import dataclass
from typing import Optional

@dataclass
class AggregatedWindow:
    timestamp: int      # Window start
    count: int          # Number of samples
    mean: float
    min_val: float
    max_val: float
    std_dev: float
    p95: Optional[float] = None  # Optional percentile

class WindowAggregator:
    def __init__(self, window_seconds: int = 60):
        self.window_seconds = window_seconds
        self.buffer: list[float] = []
        self.window_start: Optional[int] = None

    def add_sample(self, timestamp: int, value: float) -> Optional[AggregatedWindow]:
        if self.window_start is None:
            self.window_start = timestamp

        # Check if window complete
        if timestamp - self.window_start >= self.window_seconds:
            result = self._compute_aggregate()
            self.buffer = [value]
            self.window_start = timestamp
            return result

        self.buffer.append(value)
        return None

    def _compute_aggregate(self) -> AggregatedWindow:
        sorted_buffer = sorted(self.buffer)
        p95_idx = int(len(sorted_buffer) * 0.95)

        return AggregatedWindow(
            timestamp=self.window_start,
            count=len(self.buffer),
            mean=statistics.mean(self.buffer),
            min_val=min(self.buffer),
            max_val=max(self.buffer),
            std_dev=statistics.stdev(self.buffer) if len(self.buffer) > 1 else 0,
            p95=sorted_buffer[p95_idx] if p95_idx < len(sorted_buffer) else None
        )

# Example: 1 Hz sensor, 60-second windows
# Input: 60 samples x 8 bytes = 480 bytes
# Output: 1 aggregate x 48 bytes = 48 bytes
# Compression ratio: 10:1
# Preserved: Trend (mean), anomaly detection (min/max/std), health (count)
Try It: Statistical Window Aggregation Calculator

Configure the sensor rate and aggregation window to see how statistical summaries compress continuous data while preserving trend and anomaly detection capability.

Information loss analysis:

| What’s Preserved | What’s Lost |
|---|---|
| Average value (trend) | Individual sample timing |
| Min/max (bounds) | Exact sequence of values |
| Standard deviation (stability) | Sub-window patterns |
| Sample count (health) | Correlation with other sensors at sample level |

When to use: Temperature, humidity, air quality - any slowly changing signal where trends matter more than exact samples.

47.5.4 Lossy Transform: FFT-Based Compression

Transform to frequency domain, keep only significant components:

import numpy as np
from dataclasses import dataclass

@dataclass
class FFTCompressed:
    timestamp: int
    sample_rate: float
    duration: float
    frequencies: list[float]    # Top N frequency components
    magnitudes: list[float]     # Corresponding magnitudes
    phases: list[float]         # Phase angles for reconstruction

def fft_compress(samples: np.ndarray, sample_rate: float,
                 timestamp: int, top_n: int = 10) -> FFTCompressed:
    """
    Compress time-series data using FFT, keeping top N frequency components.

    Typical compression: 100:1 to 500:1 depending on signal complexity.
    Best for: Vibration, audio, periodic signals.
    """
    # Compute FFT
    fft_result = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), 1/sample_rate)

    # One-sided amplitudes with 2/N scaling, excluding the DC component,
    # so that mag * cos(2*pi*f*t + phase) reconstructs original amplitude
    magnitudes = np.abs(fft_result[1:]) * 2 / len(samples)
    phases = np.angle(fft_result[1:])
    freqs = freqs[1:]

    # Select top N by magnitude
    top_indices = np.argsort(magnitudes)[-top_n:]

    return FFTCompressed(
        timestamp=timestamp,
        sample_rate=sample_rate,
        duration=len(samples) / sample_rate,
        frequencies=freqs[top_indices].tolist(),
        magnitudes=magnitudes[top_indices].tolist(),
        phases=phases[top_indices].tolist()
    )

def fft_decompress(compressed: FFTCompressed, num_samples: int) -> np.ndarray:
    """
    Reconstruct signal from FFT components (lossy reconstruction).
    """
    t = np.linspace(0, compressed.duration, num_samples)
    signal = np.zeros(num_samples)

    for freq, mag, phase in zip(compressed.frequencies,
                                 compressed.magnitudes,
                                 compressed.phases):
        signal += mag * np.cos(2 * np.pi * freq * t + phase)

    return signal

# Example: Vibration sensor, 1 second at 1000 Hz
samples = np.sin(2*np.pi*50*np.linspace(0, 1, 1000))  # 50 Hz signal
samples += 0.3 * np.sin(2*np.pi*150*np.linspace(0, 1, 1000))  # 150 Hz harmonic

# Input: 1000 samples x 4 bytes = 4000 bytes
# Output: 10 freq-mag-phase tuples x 12 bytes = 120 bytes + 20 bytes metadata
# Compression ratio: ~30:1

compressed = fft_compress(samples, 1000.0, 1704067200, top_n=10)
reconstructed = fft_decompress(compressed, 1000)

# Reconstruction error for this example: ~5% RMS
# Bearing fault detection: Still works (frequency peaks preserved)
Try It: FFT Compression Explorer

Explore how FFT-based compression trades off between keeping more frequency components (higher fidelity) and achieving higher compression. Adjust the signal parameters and number of retained components.

When to use: Vibration analysis, acoustic monitoring, any signal where frequency content matters more than exact waveform.

47.5.5 Semantic Compression: Event Extraction

Highest compression, but requires domain knowledge:

from dataclasses import dataclass
from enum import Enum
from typing import Optional

class EventType(Enum):
    THRESHOLD_EXCEEDED = "threshold_exceeded"
    ANOMALY_DETECTED = "anomaly_detected"
    STATE_CHANGE = "state_change"
    PERIODIC_SUMMARY = "periodic_summary"

@dataclass
class SemanticEvent:
    timestamp: int
    device_id: str
    event_type: EventType
    value: float
    context: dict  # Additional info (threshold, previous state, etc.)

class SemanticCompressor:
    def __init__(self, device_id: str, threshold_high: float,
                 threshold_low: float, anomaly_std_factor: float = 3.0):
        self.device_id = device_id
        self.threshold_high = threshold_high
        self.threshold_low = threshold_low
        self.anomaly_std_factor = anomaly_std_factor
        self.history: list[float] = []
        self.last_state: Optional[str] = None
        self.summary_count = 0
        self.summary_sum = 0.0

    def process_sample(self, timestamp: int, value: float) -> list[SemanticEvent]:
        """
        Process a sample and return events (if any).
        Most samples produce NO events - that's the compression.
        """
        events = []

        # Update history for anomaly detection
        self.history.append(value)
        if len(self.history) > 100:
            self.history.pop(0)

        # Track for periodic summary
        self.summary_count += 1
        self.summary_sum += value

        # Check threshold crossing
        current_state = "normal"
        if value > self.threshold_high:
            current_state = "high"
        elif value < self.threshold_low:
            current_state = "low"

        if current_state != self.last_state and self.last_state is not None:
            events.append(SemanticEvent(
                timestamp=timestamp,
                device_id=self.device_id,
                event_type=EventType.STATE_CHANGE,
                value=value,
                context={
                    "previous_state": self.last_state,
                    "new_state": current_state
                }
            ))
        self.last_state = current_state

        # Check for statistical anomaly
        if len(self.history) >= 20:
            mean = sum(self.history) / len(self.history)
            std = (sum((x - mean)**2 for x in self.history) / len(self.history)) ** 0.5
            if std > 0 and abs(value - mean) > self.anomaly_std_factor * std:
                events.append(SemanticEvent(
                    timestamp=timestamp,
                    device_id=self.device_id,
                    event_type=EventType.ANOMALY_DETECTED,
                    value=value,
                    context={
                        "mean": mean,
                        "std": std,
                        "z_score": (value - mean) / std
                    }
                ))

        return events

    def get_periodic_summary(self, timestamp: int) -> SemanticEvent:
        """Call every N minutes to send a heartbeat/summary."""
        avg = self.summary_sum / self.summary_count if self.summary_count > 0 else 0
        event = SemanticEvent(
            timestamp=timestamp,
            device_id=self.device_id,
            event_type=EventType.PERIODIC_SUMMARY,
            value=avg,
            context={
                "sample_count": self.summary_count,
                "period_seconds": 300  # 5 minutes
            }
        )
        self.summary_count = 0
        self.summary_sum = 0.0
        return event

# Example: Temperature sensor, 1 sample/second
# Normal operation: 0 events per sample
# State change: 1 event (~100 bytes)
# 5-minute summary: 1 event (~80 bytes)
#
# Input: 300 samples x 8 bytes = 2400 bytes per 5 minutes
# Output: 1 summary + maybe 0-2 events = 80-280 bytes
# Compression ratio: 10:1 to 30:1 (varies by activity)
Try It: Semantic Event Compression Simulator

Configure thresholds and signal behavior to see how semantic compression extracts only meaningful events from a continuous sensor stream. Most samples produce zero events – that is the compression.

When to use: Monitoring systems where “nothing happening” is the common case. Alarm systems, threshold monitoring, sparse event streams.

47.5.6 Algorithm Selection Decision Tree

Decision tree guiding selection of compression algorithms for edge IoT devices based on signal type, data fidelity requirements, and computational constraints
Figure 47.1: Edge Data Compression Algorithm Selection Decision Tree

47.5.7 Benchmark Results: ESP32 Edge Device

Real measurements on ESP32-WROOM-32 (240 MHz, 520KB RAM):

Algorithm 1000 Samples Compress Time Output Size Power (mJ)
Raw JSON - 15 ms (serialize) 28,000 bytes 2.4
GZIP-6 28,000 bytes 85 ms 8,200 bytes 8.5
Window Agg 8 bytes/sample 2 ms 48 bytes 0.4
FFT Top-10 4 bytes/sample 45 ms 140 bytes 5.0
Semantic 8 bytes/sample 3 ms 0-100 bytes 0.5

Key insight: For battery-powered edge devices, window aggregation offers the best power efficiency. FFT is valuable when frequency content matters, but the CPU cost is significant. Semantic compression is ideal for sparse event streams.

47.5.8 Memory Constraints on Edge Devices

Compression algorithms have memory overhead. Consider carefully on constrained devices:

| Algorithm | RAM Required | Notes |
|---|---|---|
| GZIP | 32-64 KB | Sliding window + Huffman tables |
| Window Agg | <1 KB | Just buffer for current window |
| FFT (1024 pt) | 16 KB | Complex float buffer + twiddle factors |
| FFT (4096 pt) | 64 KB | May not fit on small MCUs |
| Semantic | 2-4 KB | History buffer + state |

ESP32 recommendation: Use window aggregation or semantic compression as primary strategy. Reserve FFT for specific signals where frequency analysis is required.

47.6 Common Compression Pitfalls

Pitfall: Over-Aggressive Lossy Compression

The Mistake: Applying high compression ratios uniformly across all sensor data without understanding which information is critical for downstream analytics, permanently destroying signals needed for root cause analysis.

Why It Happens: Bandwidth costs drive aggressive compression targets. Teams optimize for average case without considering anomaly detection requirements. Compression algorithms are chosen based on benchmark performance rather than domain-specific information preservation. The “we can always collect more data later” assumption fails for non-reproducible events.

The Fix: Profile your analytics requirements before choosing compression. For predictive maintenance, preserve frequency-domain information (use FFT compression, not just statistics). For threshold alerting, min/max preservation is critical. For trend analysis, mean and standard deviation suffice. Implement tiered compression: full resolution for anomalies detected locally, heavy compression for steady-state readings. Always retain enough information to answer “why did this alert trigger?” after the fact.

Pitfall: Compression Without Metadata

The Mistake: Compressing sensor data without preserving the metadata needed to decompress or interpret it correctly, creating files that cannot be decoded weeks or months later.

Why It Happens: Metadata seems redundant during development when context is fresh. Schema documentation is maintained separately and drifts over time. Edge device memory constraints pressure developers to strip every unnecessary byte. Compression parameters are hardcoded rather than embedded in output.

The Fix: Always include compression metadata in the payload or use self-describing formats. For FFT compression, include sample rate, window size, and which frequency bins are transmitted. For statistical aggregation, include sample count, window duration, and timestamp precision. Use envelope formats that version the compression scheme: {"compression": "fft-v2", "params": {...}, "data": [...]}. Maintain a compression schema registry that maps version identifiers to decompression algorithms.
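A minimal self-describing envelope might look like this (the field names and the "gzip-v1" identifier are illustrative, not a standard):

```python
import base64
import gzip
import json

def make_envelope(compression: str, params: dict, payload: bytes) -> str:
    """Wrap compressed bytes with the metadata needed to decode them later."""
    return json.dumps({
        "compression": compression,  # looked up in a schema registry
        "params": params,            # e.g. {"level": 6} or FFT settings
        "data": base64.b64encode(payload).decode("ascii"),
    })

raw = b'{"ts": 1704067200, "v": 22.5}'
envelope = make_envelope("gzip-v1", {"level": 6}, gzip.compress(raw, 6))

# A decoder months later can recover the payload from the envelope alone
parsed = json.loads(envelope)
restored = gzip.decompress(base64.b64decode(parsed["data"]))
```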

Pitfall: Ignoring Compression Computation Cost

The Mistake: Selecting compression algorithms based purely on compression ratio without accounting for CPU time and energy cost on battery-powered edge devices, resulting in net-negative energy savings.

Why It Happens: Compression benchmarks on desktop hardware show impressive ratios with negligible CPU time. The 1000x difference in computational efficiency between an ESP32 and a laptop is underestimated. Energy cost of computation versus transmission varies by network type (Wi-Fi is cheap to transmit, LoRa is expensive). Algorithm selection copied from cloud/server contexts.

The Fix: Measure end-to-end energy consumption: E_total = E_compute + E_transmit. For LoRaWAN devices where transmission costs 100+ mJ per packet, aggressive compression (even expensive algorithms) saves energy. For Wi-Fi devices where transmission costs 1-5 mJ per packet, simple aggregation beats complex compression. Profile specific algorithms on your target MCU: GZIP on ESP32 consumes 8.5 mJ for 1000 samples versus 0.4 mJ for window aggregation. Choose the algorithm that minimizes total energy, not just bytes transmitted.
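The E_total comparison is easy to automate. The compute-energy and output-size figures below come from the ESP32 benchmark table earlier in this chapter; the per-byte radio costs are rough assumptions for illustration only:

```python
def total_energy_mj(e_compute_mj: float, bytes_out: int,
                    mj_per_byte: float) -> float:
    """E_total = E_compute + E_transmit for one 1000-sample batch."""
    return e_compute_mj + bytes_out * mj_per_byte

LORA_MJ_PER_BYTE = 0.5    # assumed: expensive long-range radio
WIFI_MJ_PER_BYTE = 0.001  # assumed: cheap local radio

# Over LoRa, every byte saved matters, so heavy compression pays off
gzip_lora = total_energy_mj(8.5, 8200, LORA_MJ_PER_BYTE)  # 4108.5 mJ
agg_lora = total_energy_mj(0.4, 48, LORA_MJ_PER_BYTE)     # 24.4 mJ
# Over Wi-Fi the radio is cheap, so cheap aggregation still wins
gzip_wifi = total_energy_mj(8.5, 8200, WIFI_MJ_PER_BYTE)  # 16.7 mJ
agg_wifi = total_energy_mj(0.4, 48, WIFI_MJ_PER_BYTE)     # ~0.45 mJ
```

Under these assumptions window aggregation minimizes total energy on both radios; the gap to GZIP just narrows dramatically on the cheap Wi-Fi link.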

47.7 Understanding Check: Industrial Edge Data Pipeline Design

Scenario: Factory Vibration Monitoring System

Your manufacturing plant monitors 50 critical machines using vibration sensors to detect bearing failures before catastrophic breakdown. Each sensor must detect frequencies up to 200 Hz (bearing defects manifest at 50-200 Hz harmonics).

System constraints:

  • Sensor: MEMS accelerometer (+/-16g range)
  • Edge compute: ESP32 gateway with 4MB flash, 520KB RAM
  • Network: 4G cellular with 10 GB/month data cap ($0.10/GB overage)
  • Requirement: Detect anomalies within 1 minute, minimize bandwidth costs

Current naive approach:

  • Sample at 500 Hz (meets Nyquist: 2 x 200 Hz)
  • Stream raw data to cloud continuously
  • Result: 500 samples/sec x 2 bytes x 50 sensors = 50 KB/s = 129 GB/month ($11.90 overage!)
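The naive approach's monthly volume checks out (a quick back-of-envelope script assuming a 30-day month):

```python
samples_per_sec = 500
bytes_per_sample = 2
sensors = 50
seconds_per_month = 86_400 * 30

bytes_per_month = (samples_per_sec * bytes_per_sample
                   * sensors * seconds_per_month)
gb_per_month = bytes_per_month / 1e9        # ~129.6 GB
overage_usd = (gb_per_month - 10) * 0.10    # ~$11.96 at $0.10/GB over cap
```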

47.7.1 Think About: Data Reduction Strategy Trade-offs

Which edge processing strategy best balances anomaly detection accuracy, bandwidth costs, and latency?

| Strategy | Data Transmitted | Bandwidth Cost | Detection Latency | Information Loss |
|---|---|---|---|---|
| A. Raw streaming | 5000 samples/10s | 129 GB/month ($11.90) | Real-time (<1s) | None (full fidelity) |
| B. Downsample to 100 Hz | 1000 samples/10s | 26 GB/month ($1.60) | Real-time (<1s) | Aliases all content above the new 50 Hz Nyquist limit |
| C. Time-domain stats | 6 values/10s (min/max/mean/std/peak/RMS) | 0.15 GB/month ($0) | 10 seconds | Loses frequency info (can’t detect bearing harmonics) |
| D. FFT + compression | 10 FFT bins/10s | 0.26 GB/month ($0) | 10 seconds | Preserves frequency info (50-200 Hz) |

47.7.2 Key Insight: Edge FFT for Bandwidth Reduction

Option D (FFT + compression) achieves 500x bandwidth reduction while preserving anomaly detection capability:

How it works:

import numpy as np

# Edge processing pipeline (runs on ESP32 every 10 seconds)
def vibration_pipeline():
    # 1. Collect 10 seconds of data
    samples = collect_samples(rate=500, duration=10)  # 5000 samples

    # 2. Apply FFT (frequency-domain analysis)
    fft_result = np.fft.rfft(samples)  # -> 2501 frequency bins

    # 3. Extract magnitudes at the critical frequency bins
    #    Bin width = sample rate / num samples = 500 / 5000 = 0.1 Hz
    #    Bin index = frequency / bin_width
    bin_width = 500.0 / 5000  # 0.1 Hz per bin
    target_freqs = [50, 80, 110, 140, 170, 200]  # Hz
    bins = [abs(fft_result[int(f / bin_width)]) for f in target_freqs]
    # bins at indices [500, 800, 1100, 1400, 1700, 2000]
    # + 4 more bins for comprehensive coverage

    # 4. Transmit 10 values instead of 5000
    transmit_to_cloud(bins)  # 20 bytes vs 10,000 bytes

    return bins

# Data reduction: 5000 samples -> 10 FFT bins = 500x compression

Why this works for anomaly detection:

  1. Bearing failure signatures live in frequency domain: Healthy bearing = smooth spectrum. Failing bearing = spikes at harmonics (80 Hz, 160 Hz for 2400 RPM machine).

  2. No information loss for anomaly detection: The cloud ML model is trained on FFT bins, not raw waveforms. Accuracy is 94% with FFT bins vs 96% with raw samples; the 2-point gain does not justify 500x more bandwidth.

  3. Latency acceptable: 10-second aggregation + 2-second transmission = 12 seconds total (well under 1-minute requirement).

  4. Cost savings: 0.26 GB/month stays under data cap! (vs $11.90 overage for raw streaming).

Scenario: A manufacturing facility wants to detect bearing faults in motors running at 1800 RPM. Bearing defects produce vibration frequencies at harmonics of the motor speed. You need to determine the minimum sampling rate to capture fault signatures.

Given:

  • Motor speed: 1800 RPM = 30 Hz (revolutions per second)
  • Bearing fault frequencies:
    • 1x: 30 Hz (fundamental, imbalance)
    • 2x: 60 Hz (misalignment)
    • 3x: 90 Hz (looseness)
    • 5x: 150 Hz (bearing outer race defect)
    • 7x: 210 Hz (bearing inner race defect)
  • Highest frequency of interest: 210 Hz (7th harmonic)

Question: What is the minimum sampling rate required, and what sampling rate should you actually use in practice?

Solution:

Step 1: Apply Nyquist theorem

Minimum sampling rate = 2 × highest frequency

f_sample_min = 2 × 210 Hz = 420 Hz

Step 2: Calculate practical sampling rate with safety margin

Industry practice: Use 2.5x to 3x Nyquist for anti-aliasing filter roll-off

f_sample_recommended = 2.5 × 420 Hz = 1,050 Hz
f_sample_practical = 3 × 420 Hz = 1,260 Hz

Round to a convenient power-of-two or decade value: 1,280 Hz (≈3x Nyquist) or 1,000 Hz (≈2.4x, still comfortably above the 420 Hz minimum)

Step 3: Verify no aliasing occurs

Check if any harmonic would alias into the measurement band:

  • At 1,000 Hz sampling, Nyquist frequency = 500 Hz
  • All fault frequencies (30-210 Hz) are below 500 Hz ✓
  • No aliasing! All harmonics are correctly captured.
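Step 3's check can be scripted. This small helper (a sketch) folds any tone into the observable band [0, fs/2]; a frequency is captured faithfully exactly when it folds onto itself:

```python
def aliased_frequency(f, fs):
    """Apparent frequency of a tone at f Hz when sampled at fs Hz,
    folded into the observable band [0, fs/2]."""
    f_mod = f % fs
    return min(f_mod, fs - f_mod)

# At 1,000 Hz sampling, every fault harmonic folds onto itself:
for f in (30, 60, 90, 150, 210):
    assert aliased_frequency(f, 1000) == f  # below Nyquist: no aliasing

# Counter-example: at 100 Hz sampling, a 150 Hz fault folds to 50 Hz
aliased_frequency(150, 100)  # -> 50
```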

Step 4: Calculate data volume

Single sensor:

  • Sample rate: 1,000 Hz
  • Data size: 2 bytes per sample (16-bit ADC)
  • Data rate: 1,000 × 2 = 2,000 bytes/sec = 2 KB/sec
  • Daily data: 2 KB/sec × 86,400 sec = 172.8 MB/day

For 100 motors:

  • Daily data: 100 × 172.8 MB = 17.28 GB/day
  • Monthly data: 17.28 × 30 = 518.4 GB/month
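The Step 4 volumes reduce to a few lines of arithmetic (a sketch using the scenario's 16-bit samples and decimal MB/GB):

```python
def daily_mb(sample_rate_hz, bytes_per_sample=2):
    """Raw data volume per sensor per day, in decimal MB."""
    return sample_rate_hz * bytes_per_sample * 86_400 / 1e6

per_motor_mb = daily_mb(1000)              # 172.8 MB/day per motor
fleet_gb_day = 100 * per_motor_mb / 1000   # 17.28 GB/day for 100 motors
fleet_gb_month = fleet_gb_day * 30         # 518.4 GB/month
```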

Step 5: Apply edge FFT compression

Rather than stream raw waveforms to cloud:

# Edge processing every 10 seconds
samples_per_window = 1000 * 10  # 1,000 Hz × 10 s = 10,000 samples

# Perform real FFT: 10,000 samples -> 5,001 frequency bins
fft_result = np.fft.rfft(samples)

# Extract only the bins of interest (fault frequencies)
fault_freqs = [30, 60, 90, 150, 210]  # Hz
bin_width = 1000 / 10_000  # sample rate / window size = 0.1 Hz per bin
selected_bins = [int(f / bin_width) for f in fault_freqs]
# Result: bins [300, 600, 900, 1500, 2100]

# Transmit only these 5 bins (10 bytes) instead of 20,000 bytes
compressed_data = [fft_result[b] for b in selected_bins]
compression_ratio = 20_000 / 10  # = 2,000x

Step 6: Calculate bandwidth savings

Without compression (from Step 4):

  • 100 motors: 17.28 GB/day = 518.4 GB/month

With edge FFT compression (2,000x):

  • 17.28 GB/day / 2,000 = 8.64 MB/day

Cost savings:

  • Cloud ingress: $0.09/GB
  • Uncompressed: 17.28 GB/day × $0.09 = $1.56/day ≈ $568/year
  • Compressed: 0.00864 GB/day × $0.09 ≈ $0.28/year
  • Savings: ≈ $567/year per 100 motors, plus staying under per-device data caps and avoiding overage fees entirely

Key Insight: For vibration analysis, sample at 2.5-3x Nyquist (not just 2x minimum) to allow for anti-aliasing filter roll-off. Then apply edge FFT compression by transmitting only the frequency bins of interest (harmonics), achieving 1,000-10,000x data reduction while preserving all fault detection capability. The edge gateway does the heavy computation; the cloud receives only the diagnostic features.

Choose the appropriate compression strategy based on signal characteristics, edge compute capabilities, and analytical requirements:

| Signal Type | Recommended Compression | Typical Ratio | Edge CPU | Bandwidth | Information Loss | Best For |
|---|---|---|---|---|---|---|
| Slowly changing continuous (temperature, humidity) | Statistical aggregation (min/max/mean/std) | 100-1000x | Very Low (1% CPU) | 99%+ reduction | Loses individual samples, keeps trends | Environmental monitoring, agriculture |
| Periodic vibration (motors, pumps) | FFT + top-N frequency bins | 100-5000x | High (50% CPU) | 99%+ reduction | Loses waveform, keeps frequency spectrum | Predictive maintenance, bearing analysis |
| Event-driven sparse (motion, door switches) | Event logging (timestamp + state change only) | 1000-10000x | Very Low | 99.9%+ reduction | Loses "no event" periods (acceptable) | Security, occupancy, access control |
| High-frequency transient (acoustics, ultrasound) | Triggered capture + FFT | 50-500x | High | 98%+ reduction | Loses non-trigger periods | Leak detection, acoustic monitoring |
| Bounded range analog (pressure, flow) | Delta encoding + GZIP | 3-10x | Medium (10% CPU) | 70-90% reduction | None (lossless) | Critical measurements requiring full fidelity |
| Audit trail / compliance (access logs, alarms) | GZIP compression only | 2-5x | Low (5% CPU) | 50-80% reduction | None (lossless) | Regulatory compliance, security logs |

Decision Tree:

  1. Is the signal event-driven (state changes only)?
    • YES → Use event logging (transmit only state changes)
    • NO → Continue to step 2
  2. Do you need to preserve the exact waveform for audit/compliance?
    • YES → Use lossless compression only (GZIP, DEFLATE)
    • NO → Continue to step 3
  3. Does the signal have strong frequency-domain features?
    • YES (vibration, acoustics) → Use FFT + top-N bins
    • NO → Continue to step 4
  4. Is the signal slowly changing (< 1% per sample)?
    • YES → Use statistical aggregation over time windows
    • NO → Continue to step 5
  5. Is the signal bounded with low variance?
    • YES → Use delta encoding + lossless compression
    • NO → Use adaptive sampling rate based on rate-of-change
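The decision tree can be captured as a small selector function (a sketch; the boolean flags are assumed ways of characterizing a signal, checked in the tree's order so the first match wins):

```python
def select_compression(event_driven=False, needs_exact_waveform=False,
                       frequency_features=False, slowly_changing=False,
                       bounded_low_variance=False):
    """Walk the decision tree top to bottom; the first match wins."""
    if event_driven:
        return "event logging"
    if needs_exact_waveform:
        return "lossless (GZIP/DEFLATE)"
    if frequency_features:
        return "FFT + top-N bins"
    if slowly_changing:
        return "statistical aggregation"
    if bounded_low_variance:
        return "delta encoding + lossless"
    return "adaptive sampling rate"

select_compression(event_driven=True)        # -> "event logging"
select_compression(frequency_features=True)  # -> "FFT + top-N bins"
```

Encoding the tree as code keeps the selection auditable: a fleet-management service can log which branch fired for each sensor type.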

Example: Multi-Sensor System with Different Compression Strategies:

class EdgeCompressionPipeline:
    def __init__(self):
        self.strategies = {
            'temperature': StatisticalAggregator(window_sec=300),     # 5-min windows
            'vibration': FFTCompressor(top_n=10, window_sec=10),      # Top 10 freq bins
            'motion': EventLogger(),                                  # State changes only
            'pressure': [DeltaEncoder(), GZIPCompressor()],           # Lossless delta, then GZIP
            'door': EventLogger(),                                    # State changes only
        }

    def compress_sensor_data(self, sensor_id, raw_samples):
        sensor_type = self.get_sensor_type(sensor_id)
        strategy = self.strategies[sensor_type]
        stages = strategy if isinstance(strategy, list) else [strategy]
        data = raw_samples
        for stage in stages:  # apply chained stages in order
            data = stage.compress(data)
        return data

# Usage example:
pipeline = EdgeCompressionPipeline()

# Temperature: 300 samples → 4 summary values (min/max/mean/std)
temp_compressed = pipeline.compress_sensor_data("temp_01", temp_samples)
# Compression: 300 samples × 2 bytes = 600 bytes → 16 bytes (4 floats)
# Ratio: 37.5x

# Vibration: 10,000 samples → 10 FFT bins
vib_compressed = pipeline.compress_sensor_data("vib_01", vib_samples)
# Compression: 10,000 × 2 bytes = 20 KB → 40 bytes (10 float32 magnitudes)
# Ratio: 500x

# Motion: 1000 samples (mostly "no motion") → 3 events ("motion detected" at 3 timestamps)
motion_compressed = pipeline.compress_sensor_data("motion_01", motion_samples)
# Compression: 1000 × 1 byte = 1 KB → 24 bytes (3 events × 8 bytes each)
# Ratio: 42x
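The strategy classes in the pipeline above are placeholders. As one concrete example, a minimal StatisticalAggregator might pack the four summary statistics as 32-bit floats (a sketch, assuming samples arrive as a list of numbers):

```python
import struct
from statistics import mean, pstdev

class StatisticalAggregator:
    """Reduce a window of samples to min/max/mean/std packed as float32."""

    def __init__(self, window_sec=300):
        self.window_sec = window_sec

    def compress(self, samples):
        summary = (min(samples), max(samples), mean(samples), pstdev(samples))
        return struct.pack("<4f", *summary)  # 4 x 4 bytes = 16 bytes

payload = StatisticalAggregator().compress([20.1, 20.3, 20.2, 20.5])
len(payload)  # 16 bytes, regardless of how many samples were in the window
```

The fixed 16-byte output is what makes the compression ratio scale with window size: a 5-minute window at 1 Hz yields 300 samples in, 16 bytes out.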

Buffer Sizing for Each Strategy:

| Strategy | RAM Required (per sensor) | Latency Added | Notes |
|---|---|---|---|
| Statistical Aggregation | 1-2 KB (circular buffer) | 5-60 seconds (window duration) | Minimal memory, acceptable latency |
| FFT Compression | 16-64 KB (FFT working memory) | 1-10 seconds (FFT window) | High memory, fast processing |
| Event Logging | < 1 KB (state machine) | None (immediate) | Minimal resources, real-time |
| Delta Encoding | 4-8 KB (recent history) | < 1 second | Low memory, minimal latency |
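The FFT row's memory figure can be estimated from the window size (a rough sketch: a 16-bit capture buffer plus float32 real/imaginary working buffers; actual numbers vary with the FFT library and in-place strategies):

```python
def fft_working_memory_bytes(window_samples, sample_bytes=2, float_bytes=4):
    """Rough per-sensor RAM estimate for FFT compression on an MCU."""
    capture = window_samples * sample_bytes      # raw 16-bit ADC samples
    working = window_samples * 2 * float_bytes   # complex (real+imag) intermediates
    return capture + working

fft_working_memory_bytes(5000)  # 10 KB capture + 40 KB working = 50,000 bytes
```

A 10-second, 500 Hz window therefore needs roughly 50 KB, which is why FFT compression lands on gateways or larger MCUs rather than the smallest sensor nodes.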


Common Mistake: Undersampling Harmonics in Vibration Monitoring

The Mistake: Sampling vibration data at only 2x the motor’s fundamental frequency, missing critical high-frequency bearing fault signatures that appear at 5x-7x harmonics.

Real-World Example: A factory deployed vibration sensors with 100 Hz sampling on 30 Hz motors (thinking “2x motor speed is enough”):

Motor: 30 Hz
Bearing outer race fault frequency: 5 × 30 = 150 Hz

At 100 Hz sampling:
- Nyquist = 50 Hz
- 150 Hz aliases to |150 - round(150/100) × 100| = |150 - 200| = 50 Hz
- At exactly the Nyquist frequency, the signal is severely distorted

The bearing fault appeared as an unreliable artifact at the Nyquist boundary.
ML model missed 8 bearing failures before they became catastrophic.
One failure caused $500K in downtime.

Correct Implementation:

def calculate_vibration_sampling_rate(motor_rpm, bearing_type="ball"):
    """
    Calculate sampling rate for bearing fault detection.
    Accounts for all possible fault harmonics.
    """
    motor_hz = motor_rpm / 60

    # Bearing fault frequency multipliers
    fault_harmonics = {
        "ball": [1, 2, 3, 4, 5, 6, 7, 8],      # Ball bearings: up to 8x
        "roller": [1, 2, 3, 4, 5],              # Roller bearings: up to 5x
        "sleeve": [1, 2, 3],                    # Sleeve bearings: up to 3x
    }

    highest_harmonic = max(fault_harmonics[bearing_type])
    highest_frequency = motor_hz * highest_harmonic

    # Safety factor: 3x Nyquist for anti-aliasing filter
    recommended_rate = 3 * 2 * highest_frequency

    return {
        'motor_hz': motor_hz,
        'highest_fault_hz': highest_frequency,
        'nyquist_min': 2 * highest_frequency,
        'recommended': recommended_rate,
    }

# Example usage:
rate_info = calculate_vibration_sampling_rate(motor_rpm=1800, bearing_type="ball")
# Motor: 30 Hz | Highest fault: 240 Hz | Recommended: 1440 Hz (3x Nyquist)

Warning Signs: Vibration analysis shows only the fundamental frequency with no harmonics. Bearing failures occur with “no warning” despite continuous monitoring. FFT spectrum looks suspiciously clean.

Prevention: Always analyze the FULL harmonic series for rotating machinery. Sample at 3x Nyquist (not just 2x) to allow for anti-aliasing filter roll-off. Verify by plotting the FFT spectrum and confirming all expected fault harmonics are visible.
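The "plot the FFT spectrum and confirm the harmonics" verification can also be scripted (a sketch against a simulated signal; the 10x-median peak threshold is an arbitrary assumption, not a standard):

```python
import numpy as np

def harmonic_peaks_visible(fs=1000, duration=2.0, motor_hz=30,
                           harmonics=(1, 2, 3, 5, 7)):
    """Simulate a vibration signal containing the fault harmonics and
    confirm each one stands out as a peak in the FFT spectrum."""
    t = np.arange(0, duration, 1 / fs)
    signal = sum(np.sin(2 * np.pi * motor_hz * h * t) for h in harmonics)
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(t), 1 / fs)
    checks = []
    for h in harmonics:
        idx = int(np.argmin(np.abs(freqs - motor_hz * h)))
        # each expected harmonic should dominate the spectrum's noise floor
        checks.append(spectrum[idx] > 10 * np.median(spectrum))
    return all(checks)
```

Running the same check at an undersampled rate (or on field data) and seeing missing harmonics is exactly the warning sign described above.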

47.8 Knowledge Check

47.9 Practice Exercises

Objective: Determine optimal sampling rates for different sensor types.

Tasks:

  1. Identify signal characteristics for 4 sensors: temperature (max 0.1 Hz), vibration (max 500 Hz), audio (max 20 kHz), motion IMU (max 50 Hz)
  2. Apply Nyquist theorem: calculate minimum sampling rates
  3. Implement with margin and measure data rate impact

Expected Outcome: Understand the relationship between signal bandwidth and sampling requirements.
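As a starting point for Tasks 1-2, the Nyquist minimums for the four sensors can be tabulated in a few lines (a sketch; the 2.5x margin follows the chapter's rule of thumb, and real audio systems typically standardize on 44.1 or 48 kHz rather than a computed margin):

```python
sensors_max_hz = {
    "temperature": 0.1,   # Hz: slow thermal drift
    "motion_imu": 50,     # Hz: human motion dynamics
    "vibration": 500,     # Hz: machine fault harmonics
    "audio": 20_000,      # Hz: audible band
}

for name, f_max in sensors_max_hz.items():
    nyquist_min = 2 * f_max           # absolute minimum (Nyquist)
    recommended = 2.5 * nyquist_min   # practical margin for filter roll-off
    print(f"{name}: minimum {nyquist_min:g} Hz, recommended ~{recommended:g} Hz")
```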

Objective: Implement multiple data reduction techniques and compare bandwidth savings.

Tasks:

  1. Collect 1 minute of high-rate sensor data (100 Hz × 60 s = 6,000 samples)
  2. Apply 4 reduction strategies: downsampling, statistical aggregation, delta encoding, event-based
  3. Transmit reduced data and compare bandwidth
  4. Validate: can you detect a 1 °C temperature spike with each method?

Expected Outcome: Understand trade-offs between compression ratio and information preservation.
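Of the four strategies in Task 2, delta encoding is the one not sketched elsewhere in this chapter; it takes only a few lines (a sketch on integer samples):

```python
def delta_encode(samples):
    """First value absolute, then successive differences; near-constant
    signals become runs of small numbers that GZIP shrinks well."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def delta_decode(deltas):
    """Invert delta_encode by running-summing the differences."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

delta_encode([100, 101, 101, 103, 102])  # -> [100, 1, 0, 2, -1]
```

Because decode exactly inverts encode, this step is lossless; the bandwidth win comes from the entropy coder applied afterwards.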

Sammy learns the Nyquist rule!

Sammy the Sensor was monitoring a washing machine’s vibrations. He was taking measurements really slowly – only once every 5 seconds.

“Something is wrong!” said Max the Microcontroller. “The machine is shaking like crazy, but your readings look perfectly calm!”

Lila the LED, who loved science, explained: “Sammy, you are measuring too slowly! The machine vibrates hundreds of times per second, but you only check every 5 seconds. It is like trying to watch a hummingbird by opening your eyes once every minute – you would never see its wings move!”

“There is a rule called the Nyquist rule,” Lila continued. “You need to measure at LEAST twice as fast as the thing is changing. If the machine vibrates 100 times per second, you need to measure at least 200 times per second!”

Sammy sped up his measurements, and suddenly the vibration patterns appeared clearly. But now Bella the Battery was worried: “That is SO much data! I cannot send all of it to the cloud!”

Max had the solution: “We can COMPRESS the data! Instead of sending every single measurement, let us calculate a summary – the average vibration, the biggest shake, and the overall pattern. We send 10 numbers instead of 5,000!”

The lesson: Sample fast enough to catch what is happening (Nyquist rule), then compress smartly to save energy and bandwidth!

Key Takeaway

Always sample at 2x or higher than the highest frequency of interest (Nyquist theorem) to avoid aliasing artifacts. Then apply edge data reduction – aggregation for slow-changing signals, FFT compression for vibration analysis, or semantic event extraction for sparse event streams – to reduce bandwidth by 10-1000x while preserving the information needed for downstream analytics.

47.10 Summary

Edge data acquisition requires careful balance between data fidelity and resource constraints:

  • Nyquist compliance: Sample at 2x or higher than your highest frequency of interest to avoid aliasing
  • Reduction techniques: Aggregation (10-50x), FFT compression (50-500x), and semantic extraction (100-1000x) each serve different use cases
  • Algorithm selection: Match compression to downstream analytics needs - lossless for audit, statistical for trends, FFT for vibration, semantic for events
  • Resource awareness: Consider CPU time and memory on constrained edge devices, not just compression ratio

47.11 Concept Relationships

Sampling and compression determine data fidelity, bandwidth, and power consumption trade-offs:

Sampling Theory (This chapter):

  • Nyquist theorem: sample at 2x highest frequency to avoid aliasing (vibration monitoring: sample at 2.5-3x Nyquist for anti-aliasing filter)
  • Under-sampling causes phantom patterns (150 Hz bearing fault aliased to 30 Hz when sampled at 60 Hz)

Compression Strategies (This chapter):

  • Lossless (GZIP): 2-4x reduction, preserves all data (audit trails, compliance)
  • Lossy statistical (aggregation): 10-100x reduction, preserves trends (environmental monitoring)
  • FFT-based: 50-500x reduction, preserves frequency spectrum (vibration analysis)
  • Semantic (event extraction): 100-1000x reduction, preserves state changes (threshold monitoring)

Key Insight: Compression algorithm selection depends on analytics requirements, not just compression ratio. Vibration monitoring needs FFT compression (preserves frequency info for bearing fault detection) even though aggregation yields higher ratios. Choosing wrong compression permanently destroys the signal needed for analysis.

47.12 What’s Next

| If you want to… | Read this |
|---|---|
| Understand the acquisition architecture that applies these strategies | Edge Acquisition Architecture |
| Learn about power management for low-duty-cycle sampling | Edge Acquisition Power and Gateways |
| Study edge compute patterns built on efficient sampling | Edge Compute Patterns |
| Apply to the broader edge data acquisition context | Edge Data Acquisition |
| Return to the module overview | Big Data Overview |
