47  Edge Sampling & Compression

47.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Apply Nyquist Theorem: Calculate appropriate sampling rates for different sensor types
  • Implement Data Reduction Techniques: Use aggregation, compression, event-based reporting, and delta encoding
  • Select Compression Algorithms: Choose optimal algorithms based on data type and edge device constraints
  • Avoid Common Pitfalls: Prevent sampling aliasing, buffer overflow, and rate mismatch errors

In 60 Seconds

Adaptive sampling and on-device compression are the two most powerful techniques for reducing IoT data volume at the edge — adaptive sampling matches collection frequency to signal dynamics, while compression exploits redundancy to shrink the data that is transmitted. Together they can reduce bandwidth requirements by 90–99% without meaningful loss of analytical value.

Edge sampling and compression reduce the amount of data IoT devices need to transmit. Think of sending a friend the highlights of a movie instead of the entire film. By transmitting only important changes or compressed summaries, devices save battery power and network bandwidth while preserving the information that matters most.

47.2 Prerequisites

Before diving into this chapter, you should be familiar with:

  • Edge Data Acquisition: Architecture: Understanding device categories and data generation patterns
  • Basic signal processing concepts: Familiarity with frequency and time-domain representations
  • Python programming: Code examples use Python for data processing

Minimum Viable Understanding: Data Reduction at the Edge

Core Concept: Transform raw sensor data into actionable information at the source - send summaries, statistics, and alerts rather than every reading.

Why It Matters: Transmitting data costs 10-100x more energy than processing it locally. A sensor sending 1000 samples/minute to the cloud uses 100x more bandwidth than one sending minute-averages - with identical analytical value for most applications.

Key Takeaway: Apply the 90% rule - if 90% of your data is “normal” readings that will never be analyzed individually, aggregate them locally. Send statistical summaries (min, max, mean, std) at lower frequency, and only transmit raw data when anomalies are detected. This extends battery life from days to years.
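The 90% rule can be sketched in a few lines (a minimal illustration; the 3-sigma anomaly gate and the tuple format are assumptions, not a prescribed protocol):

```python
def report(buffer: list[float], threshold_std: float = 3.0):
    """Send a summary for 'normal' windows; fall back to raw samples
    when any reading is anomalous (here: beyond threshold_std sigma)."""
    mean = sum(buffer) / len(buffer)
    std = (sum((x - mean) ** 2 for x in buffer) / len(buffer)) ** 0.5
    if std > 0 and any(abs(x - mean) > threshold_std * std for x in buffer):
        return ("raw", list(buffer))  # anomaly: keep full resolution
    return ("summary", {"min": min(buffer), "max": max(buffer),
                        "mean": mean, "std": std})
```

Normal windows transmit four statistics; only anomalous windows cost full bandwidth.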

47.3 Nyquist Sampling Rate

Time: ~8 min | Difficulty: Intermediate | Reference: P10.C08.U03

Key Concepts

  • Adaptive sampling: Dynamically adjusting the sensor sampling rate based on signal variance or event rate — increasing frequency when the signal changes rapidly and decreasing it during quiet periods.
  • Nyquist-Shannon theorem: The fundamental sampling principle stating that a signal must be sampled at least twice its highest frequency component to be reconstructed accurately — the minimum sampling rate for any IoT sensor.
  • Delta encoding: A compression technique transmitting only the change between consecutive readings rather than absolute values, highly effective for slowly varying sensors.
  • Run-length encoding (RLE): Compressing sequences of identical values into a count-value pair — very effective for binary event streams or sensors with frequent identical readings.
  • Lossless compression: Compression that allows perfect reconstruction of the original data — required for financial billing data, safety-critical readings, and regulatory compliance.
  • Lossy compression: Compression that discards some information to achieve higher compression ratios — acceptable for analytics workloads where small accuracy loss is tolerable.

To accurately capture a signal, the sampling rate must be at least twice the highest frequency component of interest:

\[f_{sample} \geq 2 \times f_{max}\]
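Expressed in code (a trivial helper; the 2.5x default safety factor follows the practice discussed later in this chapter):

```python
def min_sample_rate(f_max_hz: float, safety_factor: float = 2.5) -> float:
    """Nyquist minimum is 2 * f_max; real designs add headroom for
    anti-aliasing filter roll-off."""
    return safety_factor * 2 * f_max_hz

# A 720 Hz bearing harmonic -> 3600 Hz recommended sampling rate
rate = min_sample_rate(720)
```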

Practical examples:

| Signal Type | Max Frequency | Min Sample Rate | Typical Rate |
|---|---|---|---|
| Temperature | 0.1 Hz (slow changes) | 0.2 Hz | 1 sample/minute |
| Vibration | 500 Hz | 1 kHz | 2-5 kHz |
| Audio | 20 kHz | 40 kHz | 44.1 kHz |
| Motion (IMU) | 50 Hz | 100 Hz | 100-200 Hz |

Common Pitfall: Sampling Aliasing

The mistake: Sampling signals below the Nyquist rate (2x the highest frequency), causing phantom patterns (aliasing) that don’t exist in the original signal.

Symptoms:

  • Vibration analysis shows unexpected low-frequency patterns
  • Motor speed readings fluctuate despite constant RPM
  • Temperature data shows oscillations that don’t match physical reality
  • Frequency analysis reveals false peaks at wrong frequencies
  • Bearing fault detection produces false positives

Why it happens: Engineers apply “common sense” sampling rates without frequency analysis. Underestimating signal bandwidth - a 60 Hz motor generates harmonics at 120 Hz, 180 Hz, etc. Cost pressure drives lower sampling rates. Copy-pasting configurations between different sensor types.

The fix: Always sample at >2x the highest frequency of interest:

| Signal Type | Max Frequency | Minimum Sample Rate | Recommended |
|---|---|---|---|
| Room temperature | 0.01 Hz | 0.02 Hz | 1/minute |
| HVAC response | 0.1 Hz | 0.2 Hz | 1/second |
| Motor vibration | 500 Hz | 1 kHz | 2.5 kHz |
| Bearing analysis | 5 kHz | 10 kHz | 25 kHz |

Prevention: Perform frequency analysis on representative signals before deployment. Use anti-aliasing filters (low-pass hardware filters) before the ADC. When in doubt, oversample then downsample digitally with proper filtering.
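The oversample-then-downsample advice can be sketched with crude block averaging (a real design would use a proper FIR decimation filter, e.g. scipy.signal.decimate; this stdlib-only version just illustrates the idea):

```python
import math

def downsample_with_filter(x: list[float], factor: int) -> list[float]:
    """Average each block of `factor` samples (a crude low-pass),
    then keep one value per block, to suppress aliasing."""
    blocks = [x[i:i + factor] for i in range(0, len(x) - factor + 1, factor)]
    return [sum(b) / len(b) for b in blocks]

# 10 kHz capture decimated to an effective 1 kHz
fs = 10_000
t = [n / fs for n in range(fs)]  # 1 second of samples
slow = downsample_with_filter([math.sin(2 * math.pi * 50 * ti) for ti in t], 10)
```

A 50 Hz tone survives the 10x decimation almost unchanged because it sits well below the new 500 Hz Nyquist limit.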

What sampling rate is needed to detect bearing faults in an 1800 RPM motor?

Given:

  • Motor speed: 1800 RPM = 30 Hz (revolutions per second)
  • Bearing has 8 rolling elements
  • Bearing fault frequency: each of the 8 rolling elements passes once per revolution, so \(8 \times 30 = 240\) Hz
  • Harmonics: 2nd harmonic at 480 Hz, 3rd at 720 Hz
  • Highest frequency of interest: 3rd harmonic = 720 Hz

Nyquist calculation:

\[f_{sample} \geq 2 \times f_{max} = 2 \times 720 \text{ Hz} = 1440 \text{ Hz minimum}\]

Practical safety factor (2.5x above Nyquist):

\[f_{sample} = 2.5 \times 1440 = 3600 \text{ Hz} \approx 4 \text{ kHz}\]

If sampled at only 100 Hz (common mistake):

  • Nyquist frequency = 50 Hz, but bearing fault is at 240 Hz
  • 240 Hz aliases to \(|240 - \text{round}(240/100) \times 100| = |240 - 200| = 40\) Hz
  • Result: False 40 Hz pattern appears instead of the real 240 Hz bearing fault
  • Impact: Bearing failure goes undetected until catastrophic failure
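The aliasing arithmetic above generalizes to a one-line helper (a sketch of the standard frequency-folding formula):

```python
def aliased_frequency(f_signal: float, f_sample: float) -> float:
    """Apparent frequency of f_signal after sampling at f_sample
    (folds the true frequency back toward the 0..f_sample/2 band)."""
    return abs(f_signal - round(f_signal / f_sample) * f_sample)

aliased_frequency(240.0, 100.0)   # the phantom 40 Hz pattern from above
aliased_frequency(240.0, 4000.0)  # unchanged: 240 Hz is captured correctly
```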

Memory and bandwidth check:

  • 4 kHz sampling × 2 bytes per sample = 8 KB/s raw data
  • 1-second FFT windows → 8 KB per analysis
  • Send top 10 frequency peaks → 120 bytes per second (67× reduction)
  • Conclusion: Edge FFT compression mandatory for battery-powered vibration monitoring

47.3.1 Interactive: Nyquist Sampling Rate Calculator

Use this calculator to explore how motor speed, harmonic order, and safety factor affect the required sampling rate and resulting data volume.

47.4 Edge Data Reduction Techniques

Time: ~12 min | Difficulty: Intermediate | Reference: P10.C08.U03b

Before transmitting data to the cloud, edge devices can apply several reduction strategies:

  1. Aggregation: Compute statistics over time windows (mean, min, max, variance)
  2. Compression: Apply lossless (ZIP) or lossy (threshold-based) compression
  3. Event-based reporting: Only transmit when values exceed thresholds
  4. Delta encoding: Send only changes from previous values
# Example: Edge aggregation for temperature sensor
class EdgeAggregator:
    def __init__(self, window_size=60):  # 60 samples = 1 minute at 1 Hz
        self.window_size = window_size
        self.buffer = []

    def add_sample(self, value):
        self.buffer.append(value)
        if len(self.buffer) >= self.window_size:
            return self.compute_summary()
        return None

    def compute_summary(self):
        summary = {
            "min": min(self.buffer),
            "max": max(self.buffer),
            "mean": sum(self.buffer) / len(self.buffer),
            "samples": len(self.buffer)
        }
        self.buffer = []
        return summary

# Usage: Send 1 summary per minute instead of 60 raw samples
aggregator = EdgeAggregator(window_size=60)
for temp_reading in sensor_stream:
    summary = aggregator.add_sample(temp_reading)
    if summary:
        transmit_to_cloud(summary)  # 60x bandwidth reduction
Try It: Edge Aggregation Data Reduction

Adjust the sensor sampling rate and aggregation window size to see how edge aggregation reduces data volume. The widget calculates the bandwidth reduction and shows what information is preserved versus lost.
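Technique 4 above, delta encoding, is equally compact. This sketch works on integer readings (e.g. raw ADC counts), where small differences pack into fewer bits than absolute values:

```python
def delta_encode(values: list[int]) -> list[int]:
    """First value is absolute; every later entry is the change
    from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas: list[int]) -> list[int]:
    """Running sum reconstructs the original sequence exactly."""
    out: list[int] = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out

# 10-bit ADC temperature counts -> mostly 0/±1 deltas
counts = [512, 512, 513, 513, 512, 514]
encoded = delta_encode(counts)  # [512, 0, 1, 0, -1, 2]
```

Because decoding is a running sum, delta encoding is lossless; pairing it with a variable-length integer format is what yields the bandwidth savings.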

Battery life impact of edge aggregation vs raw transmission:

Scenario: Temperature sensor with 2500 mAh battery, 1 reading/minute

Option A: Raw transmission (every reading sent):

  • LoRa TX: 120 mA for 1.5s per transmission
  • Transmissions per hour: 60
  • TX energy per hour: \((120 \text{ mA} \times 1.5 \text{ s} \times 60) / 3600 = 3.0 \text{ mAh}\)
  • Plus sleep: 0.01 mAh/hour
  • Total: 3.01 mAh/hour → battery life = 2500 / 3.01 = 831 hours = 35 days

Option B: Edge aggregation (1 summary every 15 minutes):

  • Transmissions per hour: 4
  • TX energy per hour: \((120 \text{ mA} \times 1.5 \text{ s} \times 4) / 3600 = 0.20 \text{ mAh}\)
  • Plus sleep: 0.01 mAh/hour
  • Total: 0.21 mAh/hour → battery life = 2500 / 0.21 = 11,905 hours = 496 days = 1.4 years

Battery life improvement: \(\frac{496}{35} = 14.2\times\) longer

Data volume comparison:

  • Raw: 60 packets/hour × 20 bytes = 1.2 KB/hour = 28.8 KB/day
  • Aggregated: 4 packets/hour × 32 bytes (min/max/mean/count) = 128 bytes/hour = 3.1 KB/day
  • Bandwidth reduction: \(\frac{28.8}{3.1} = 9.3\times\)

Key insight: Transmission dominates power budget. Aggregation provides 14× battery improvement with minimal information loss for trend analysis.
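The battery arithmetic above can be wrapped in a helper for quick what-if comparisons (same figures as the worked example; the 0.01 mAh/hour sleep draw is carried over as-is):

```python
def battery_life_hours(capacity_mah: float, tx_ma: float, tx_seconds: float,
                       tx_per_hour: float,
                       sleep_mah_per_hour: float = 0.01) -> float:
    """Battery life in hours given transmit current, duration, and frequency."""
    tx_mah_per_hour = tx_ma * tx_seconds * tx_per_hour / 3600
    return capacity_mah / (tx_mah_per_hour + sleep_mah_per_hour)

raw_hours = battery_life_hours(2500, 120, 1.5, 60)  # ~831 h (~35 days)
agg_hours = battery_life_hours(2500, 120, 1.5, 4)   # ~11,905 h (~1.4 years)
```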

47.4.1 Interactive: Battery Life vs. Aggregation Calculator

Adjust the parameters below to see how edge aggregation affects battery life and bandwidth for a LoRa-connected sensor.

Common Pitfall: Sampling Rate Mismatch

The mistake: Combining data from sensors with different sampling rates without proper resampling, leading to incorrect correlations and temporal misalignment.

Symptoms:

  • Correlation analysis shows unexpected null or spurious relationships
  • Merged datasets have many NaN/missing values at certain timestamps
  • Time-series plots show “jagged” or misaligned signals
  • ML models perform poorly despite good individual sensor data

Why it happens: Teams often assume all sensors operate at the same rate. A temperature sensor at 1 Hz combined with a vibration sensor at 100 Hz creates 99 missing values per temperature reading. Naive timestamp matching drops 99% of vibration data.

The fix: Use proper resampling/interpolation before combining:

import pandas as pd

# Downsample high-frequency data to match the low-frequency sensor
# (assumes both series use a sorted DatetimeIndex)
vibration_1hz = vibration_100hz.resample('1s').mean()
# Or upsample low-frequency data with interpolation
temp_100hz = temp_1hz.resample('10ms').interpolate(method='linear')
# Then merge on the aligned index (nearest match within tolerance)
merged = pd.merge_asof(vibration_1hz, temp_1hz,
                       left_index=True, right_index=True,
                       tolerance=pd.Timedelta('500ms'))

Prevention: Document sampling rates in sensor metadata. Create a data alignment layer that resamples all sources to a common time base before analysis.

Common Pitfall: Edge Buffer Overflow

The mistake: Configuring edge device buffers without considering worst-case scenarios, causing data loss during network outages or traffic spikes.

Symptoms:

  • Gaps in time-series data after network recovery
  • “Buffer full, dropping oldest data” warnings in device logs
  • Critical events missing during high-activity periods
  • Inconsistent data counts between edge and cloud
  • Post-incident analysis reveals missing sensor readings

Why it happens: Buffer sizes calculated for average conditions, not peak loads. Network outage duration underestimated. Sensor burst rates during events (motion, vibration) exceed steady-state assumptions. Memory constraints on edge devices force small buffers.

The fix: Size buffers for worst-case, not average:

# Buffer sizing calculation
samples_per_second = 10
max_outage_duration_seconds = 3600  # 1 hour
safety_margin = 1.5

min_buffer_size = samples_per_second * max_outage_duration_seconds * safety_margin
# = 10 * 3600 * 1.5 = 54,000 samples

# If memory-constrained, implement tiered retention:
# - Last 5 minutes: Full resolution
# - 5-60 minutes: 10x downsampled
# - Beyond 60 minutes: Statistical summary only
Try It: Buffer Sizing for Network Outages

Calculate the required buffer size for your edge device based on sampling rate, expected outage duration, and available memory. See how tiered retention can help when memory is constrained.

Prevention: Monitor buffer utilization as a health metric. Alert at 70% capacity. Implement graceful degradation (reduce resolution before dropping data). Test with simulated network outages lasting 2x your expected maximum.
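The tiered-retention idea from the sizing comment can be sketched with two deques (the capacities and 10x downsample factor are illustrative, mirroring the comment above):

```python
from collections import deque

class TieredBuffer:
    """Keep recent samples at full resolution; average evicted samples
    into a coarser tier instead of dropping them outright."""
    def __init__(self, full_capacity: int = 3000, factor: int = 10,
                 down_capacity: int = 3300):
        self.full = deque(maxlen=full_capacity)  # e.g. last 5 min at 10 Hz
        self.down = deque(maxlen=down_capacity)  # older data, 10x coarser
        self.factor = factor
        self._pending: list[float] = []

    def add(self, value: float) -> None:
        if len(self.full) == self.full.maxlen:
            # Oldest full-resolution sample falls into the coarse tier
            self._pending.append(self.full.popleft())
            if len(self._pending) == self.factor:
                self.down.append(sum(self._pending) / self.factor)
                self._pending.clear()
        self.full.append(value)

# 150 samples through a 100-sample full tier: the oldest 50 are
# compacted into 5 coarse averages instead of being lost
buf = TieredBuffer(full_capacity=100, factor=10)
for i in range(150):
    buf.add(float(i))
```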

47.5 Compression Algorithms Deep Dive

Time: ~20 min | Difficulty: Advanced | Reference: P10.C08.U03c

Edge devices face a fundamental trade-off: transmit less data (save power, bandwidth, cost) while preserving information needed for downstream analytics. This deep dive compares compression techniques across three dimensions: compression ratio, computational cost, and information preservation.

47.5.1 Compression Algorithm Categories

| Category | Compression Ratio | CPU Cost | Information Loss | Best For |
|---|---|---|---|---|
| Lossless | 2:1 - 4:1 | Medium | None | Critical data, audit logs |
| Lossy Statistical | 10:1 - 100:1 | Low | Controlled | Trend analysis, dashboards |
| Lossy Transform | 50:1 - 500:1 | High | Controlled | Pattern detection, ML features |
| Semantic | 100:1 - 1000:1 | Low-Medium | Significant | Event detection, alerts |

47.5.2 Lossless Compression: DEFLATE/GZIP

Standard lossless compression works well for structured IoT data:

import gzip
import json

def compress_batch(readings: list[dict]) -> bytes:
    """
    Compress a batch of sensor readings losslessly.
    Typical compression: 3-5x for JSON sensor data.
    """
    json_str = json.dumps(readings)
    compressed = gzip.compress(json_str.encode('utf-8'), compresslevel=6)
    return compressed

# Example: 100 temperature readings
readings = [{"ts": 1704067200 + i, "v": 22.5 + (i % 10) * 0.1} for i in range(100)]
raw_size = len(json.dumps(readings).encode())       # ~4,500 bytes
compressed_size = len(compress_batch(readings))     # ~1,200 bytes
# Compression ratio: 3.75:1
Try It: GZIP Compression Estimator

Explore how batch size, data format, and compression level affect lossless GZIP compression of IoT sensor readings.

Performance characteristics:

| Metric | GZIP Level 1 | GZIP Level 6 | GZIP Level 9 |
|---|---|---|---|
| Compression ratio | 2.5:1 | 3.5:1 | 4:1 |
| Compress speed | 50 MB/s | 20 MB/s | 5 MB/s |
| Decompress speed | 100 MB/s | 100 MB/s | 100 MB/s |
| Edge CPU impact | Low | Medium | High |

When to use: Audit trails, compliance data, any data that may be queried in original form. Do not use for real-time streams on constrained MCUs.

47.5.3 Lossy Statistical: Aggregation Windows

Compute statistics over time windows, discard raw samples:

import statistics
from dataclasses import dataclass
from typing import Optional

@dataclass
class AggregatedWindow:
    timestamp: int      # Window start
    count: int          # Number of samples
    mean: float
    min_val: float
    max_val: float
    std_dev: float
    p95: Optional[float] = None  # Optional percentile

class WindowAggregator:
    def __init__(self, window_seconds: int = 60):
        self.window_seconds = window_seconds
        self.buffer: list[float] = []
        self.window_start: Optional[int] = None

    def add_sample(self, timestamp: int, value: float) -> Optional[AggregatedWindow]:
        if self.window_start is None:
            self.window_start = timestamp

        # Check if window complete
        if timestamp - self.window_start >= self.window_seconds:
            result = self._compute_aggregate()
            self.buffer = [value]
            self.window_start = timestamp
            return result

        self.buffer.append(value)
        return None

    def _compute_aggregate(self) -> AggregatedWindow:
        sorted_buffer = sorted(self.buffer)
        p95_idx = int(len(sorted_buffer) * 0.95)

        return AggregatedWindow(
            timestamp=self.window_start,
            count=len(self.buffer),
            mean=statistics.mean(self.buffer),
            min_val=min(self.buffer),
            max_val=max(self.buffer),
            std_dev=statistics.stdev(self.buffer) if len(self.buffer) > 1 else 0,
            p95=sorted_buffer[p95_idx] if p95_idx < len(sorted_buffer) else None
        )

# Example: 1 Hz sensor, 60-second windows
# Input: 60 samples x 8 bytes = 480 bytes
# Output: 1 aggregate x 48 bytes = 48 bytes
# Compression ratio: 10:1
# Preserved: Trend (mean), anomaly detection (min/max/std), health (count)
Try It: Statistical Window Aggregation Calculator

Configure the sensor rate and aggregation window to see how statistical summaries compress continuous data while preserving trend and anomaly detection capability.

Information loss analysis:

| What’s Preserved | What’s Lost |
|---|---|
| Average value (trend) | Individual sample timing |
| Min/max (bounds) | Exact sequence of values |
| Standard deviation (stability) | Sub-window patterns |
| Sample count (health) | Correlation with other sensors at sample level |

When to use: Temperature, humidity, air quality - any slowly changing signal where trends matter more than exact samples.

47.5.4 Lossy Transform: FFT-Based Compression

Transform to frequency domain, keep only significant components:

import numpy as np
from dataclasses import dataclass

@dataclass
class FFTCompressed:
    timestamp: int
    sample_rate: float
    duration: float
    frequencies: list[float]    # Top N frequency components
    magnitudes: list[float]     # Corresponding magnitudes
    phases: list[float]         # Phase angles for reconstruction

def fft_compress(samples: np.ndarray, sample_rate: float,
                 timestamp: int, top_n: int = 10) -> FFTCompressed:
    """
    Compress time-series data using FFT, keeping top N frequency components.

    Typical compression: 100:1 to 500:1 depending on signal complexity.
    Best for: Vibration, audio, periodic signals.
    """
    # Compute FFT
    fft_result = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), 1/sample_rate)

    # One-sided amplitudes with 2/N scaling, excluding the DC component,
    # so that mag * cos(2*pi*f*t + phase) reconstructs original amplitude
    magnitudes = np.abs(fft_result[1:]) * 2 / len(samples)
    phases = np.angle(fft_result[1:])
    freqs = freqs[1:]

    # Select top N by magnitude
    top_indices = np.argsort(magnitudes)[-top_n:]

    return FFTCompressed(
        timestamp=timestamp,
        sample_rate=sample_rate,
        duration=len(samples) / sample_rate,
        frequencies=freqs[top_indices].tolist(),
        magnitudes=magnitudes[top_indices].tolist(),
        phases=phases[top_indices].tolist()
    )

def fft_decompress(compressed: FFTCompressed, num_samples: int) -> np.ndarray:
    """
    Reconstruct signal from FFT components (lossy reconstruction).
    """
    t = np.linspace(0, compressed.duration, num_samples)
    signal = np.zeros(num_samples)

    for freq, mag, phase in zip(compressed.frequencies,
                                 compressed.magnitudes,
                                 compressed.phases):
        signal += mag * np.cos(2 * np.pi * freq * t + phase)

    return signal

# Example: Vibration sensor, 1 second at 1000 Hz
samples = np.sin(2*np.pi*50*np.linspace(0, 1, 1000))  # 50 Hz signal
samples += 0.3 * np.sin(2*np.pi*150*np.linspace(0, 1, 1000))  # 150 Hz harmonic

# Input: 1000 samples x 4 bytes = 4000 bytes
# Output: 10 freq-mag-phase tuples x 12 bytes = 120 bytes + 20 bytes metadata
# Compression ratio: ~30:1

compressed = fft_compress(samples, 1000.0, 1704067200, top_n=10)
reconstructed = fft_decompress(compressed, 1000)

# Reconstruction error for this example: ~5% RMS
# Bearing fault detection: Still works (frequency peaks preserved)
Try It: FFT Compression Explorer

Explore how FFT-based compression trades off between keeping more frequency components (higher fidelity) and achieving higher compression. Adjust the signal parameters and number of retained components.

When to use: Vibration analysis, acoustic monitoring, any signal where frequency content matters more than exact waveform.

47.5.5 Semantic Compression: Event Extraction

Highest compression, but requires domain knowledge:

from dataclasses import dataclass
from enum import Enum
from typing import Optional

class EventType(Enum):
    THRESHOLD_EXCEEDED = "threshold_exceeded"
    ANOMALY_DETECTED = "anomaly_detected"
    STATE_CHANGE = "state_change"
    PERIODIC_SUMMARY = "periodic_summary"

@dataclass
class SemanticEvent:
    timestamp: int
    device_id: str
    event_type: EventType
    value: float
    context: dict  # Additional info (threshold, previous state, etc.)

class SemanticCompressor:
    def __init__(self, device_id: str, threshold_high: float,
                 threshold_low: float, anomaly_std_factor: float = 3.0):
        self.device_id = device_id
        self.threshold_high = threshold_high
        self.threshold_low = threshold_low
        self.anomaly_std_factor = anomaly_std_factor
        self.history: list[float] = []
        self.last_state: Optional[str] = None
        self.summary_count = 0
        self.summary_sum = 0.0

    def process_sample(self, timestamp: int, value: float) -> list[SemanticEvent]:
        """
        Process a sample and return events (if any).
        Most samples produce NO events - that's the compression.
        """
        events = []

        # Update history for anomaly detection
        self.history.append(value)
        if len(self.history) > 100:
            self.history.pop(0)

        # Track for periodic summary
        self.summary_count += 1
        self.summary_sum += value

        # Check threshold crossing
        current_state = "normal"
        if value > self.threshold_high:
            current_state = "high"
        elif value < self.threshold_low:
            current_state = "low"

        if current_state != self.last_state and self.last_state is not None:
            events.append(SemanticEvent(
                timestamp=timestamp,
                device_id=self.device_id,
                event_type=EventType.STATE_CHANGE,
                value=value,
                context={
                    "previous_state": self.last_state,
                    "new_state": current_state
                }
            ))
        self.last_state = current_state

        # Check for statistical anomaly
        if len(self.history) >= 20:
            mean = sum(self.history) / len(self.history)
            std = (sum((x - mean)**2 for x in self.history) / len(self.history)) ** 0.5
            if std > 0 and abs(value - mean) > self.anomaly_std_factor * std:
                events.append(SemanticEvent(
                    timestamp=timestamp,
                    device_id=self.device_id,
                    event_type=EventType.ANOMALY_DETECTED,
                    value=value,
                    context={
                        "mean": mean,
                        "std": std,
                        "z_score": (value - mean) / std
                    }
                ))

        return events

    def get_periodic_summary(self, timestamp: int) -> SemanticEvent:
        """Call every N minutes to send a heartbeat/summary."""
        avg = self.summary_sum / self.summary_count if self.summary_count > 0 else 0
        event = SemanticEvent(
            timestamp=timestamp,
            device_id=self.device_id,
            event_type=EventType.PERIODIC_SUMMARY,
            value=avg,
            context={
                "sample_count": self.summary_count,
                "period_seconds": 300  # 5 minutes
            }
        )
        self.summary_count = 0
        self.summary_sum = 0.0
        return event

# Example: Temperature sensor, 1 sample/second
# Normal operation: 0 events per sample
# State change: 1 event (~100 bytes)
# 5-minute summary: 1 event (~80 bytes)
#
# Input: 300 samples x 8 bytes = 2400 bytes per 5 minutes
# Output: 1 summary + maybe 0-2 events = 80-280 bytes
# Compression ratio: 10:1 to 30:1 (varies by activity)
Try It: Semantic Event Compression Simulator

Configure thresholds and signal behavior to see how semantic compression extracts only meaningful events from a continuous sensor stream. Most samples produce zero events – that is the compression.

When to use: Monitoring systems where “nothing happening” is the common case. Alarm systems, threshold monitoring, sparse event streams.

47.5.6 Algorithm Selection Decision Tree

Decision tree guiding selection of compression algorithms for edge IoT devices based on signal type, data fidelity requirements, and computational constraints
Figure 47.1: Edge Data Compression Algorithm Selection Decision Tree

47.5.7 Benchmark Results: ESP32 Edge Device

Real measurements on ESP32-WROOM-32 (240 MHz, 520KB RAM):

Algorithm 1000 Samples Compress Time Output Size Power (mJ)
Raw JSON - 15 ms (serialize) 28,000 bytes 2.4
GZIP-6 28,000 bytes 85 ms 8,200 bytes 8.5
Window Agg 8 bytes/sample 2 ms 48 bytes 0.4
FFT Top-10 4 bytes/sample 45 ms 140 bytes 5.0
Semantic 8 bytes/sample 3 ms 0-100 bytes 0.5

Key insight: For battery-powered edge devices, window aggregation offers the best power efficiency. FFT is valuable when frequency content matters, but the CPU cost is significant. Semantic compression is ideal for sparse event streams.

47.5.8 Memory Constraints on Edge Devices

Compression algorithms have memory overhead. Consider carefully on constrained devices:

| Algorithm | RAM Required | Notes |
|---|---|---|
| GZIP | 32-64 KB | Sliding window + Huffman tables |
| Window Agg | <1 KB | Just buffer for current window |
| FFT (1024 pt) | 16 KB | Complex float buffer + twiddle factors |
| FFT (4096 pt) | 64 KB | May not fit on small MCUs |
| Semantic | 2-4 KB | History buffer + state |

ESP32 recommendation: Use window aggregation or semantic compression as primary strategy. Reserve FFT for specific signals where frequency analysis is required.

47.6 Common Compression Pitfalls

Pitfall: Over-Aggressive Lossy Compression

The Mistake: Applying high compression ratios uniformly across all sensor data without understanding which information is critical for downstream analytics, permanently destroying signals needed for root cause analysis.

Why It Happens: Bandwidth costs drive aggressive compression targets. Teams optimize for average case without considering anomaly detection requirements. Compression algorithms are chosen based on benchmark performance rather than domain-specific information preservation. The “we can always collect more data later” assumption fails for non-reproducible events.

The Fix: Profile your analytics requirements before choosing compression. For predictive maintenance, preserve frequency-domain information (use FFT compression, not just statistics). For threshold alerting, min/max preservation is critical. For trend analysis, mean and standard deviation suffice. Implement tiered compression: full resolution for anomalies detected locally, heavy compression for steady-state readings. Always retain enough information to answer “why did this alert trigger?” after the fact.

Pitfall: Compression Without Metadata

The Mistake: Compressing sensor data without preserving the metadata needed to decompress or interpret it correctly, creating files that cannot be decoded weeks or months later.

Why It Happens: Metadata seems redundant during development when context is fresh. Schema documentation is maintained separately and drifts over time. Edge device memory constraints pressure developers to strip every unnecessary byte. Compression parameters are hardcoded rather than embedded in output.

The Fix: Always include compression metadata in the payload or use self-describing formats. For FFT compression, include sample rate, window size, and which frequency bins are transmitted. For statistical aggregation, include sample count, window duration, and timestamp precision. Use envelope formats that version the compression scheme: {"compression": "fft-v2", "params": {...}, "data": [...]}. Maintain a compression schema registry that maps version identifiers to decompression algorithms.
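A minimal self-describing envelope might look like this (the field names and the "gzip-v1" identifier are illustrative, not a standard):

```python
import base64
import gzip
import json

def make_envelope(compression: str, params: dict, payload: bytes) -> str:
    """Wrap compressed bytes with the metadata needed to decode them later."""
    return json.dumps({
        "compression": compression,  # looked up in a schema registry
        "params": params,            # e.g. {"level": 6} or FFT settings
        "data": base64.b64encode(payload).decode("ascii"),
    })

raw = b'{"ts": 1704067200, "v": 22.5}'
envelope = make_envelope("gzip-v1", {"level": 6}, gzip.compress(raw, 6))

# A decoder months later can recover the payload from the envelope alone
parsed = json.loads(envelope)
restored = gzip.decompress(base64.b64decode(parsed["data"]))
```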

Pitfall: Ignoring Compression Computation Cost

The Mistake: Selecting compression algorithms based purely on compression ratio without accounting for CPU time and energy cost on battery-powered edge devices, resulting in net-negative energy savings.

Why It Happens: Compression benchmarks on desktop hardware show impressive ratios with negligible CPU time. The 1000x difference in computational efficiency between an ESP32 and a laptop is underestimated. Energy cost of computation versus transmission varies by network type (Wi-Fi is cheap to transmit, LoRa is expensive). Algorithm selection copied from cloud/server contexts.

The Fix: Measure end-to-end energy consumption: E_total = E_compute + E_transmit. For LoRaWAN devices where transmission costs 100+ mJ per packet, aggressive compression (even expensive algorithms) saves energy. For Wi-Fi devices where transmission costs 1-5 mJ per packet, simple aggregation beats complex compression. Profile specific algorithms on your target MCU: GZIP on ESP32 consumes 8.5 mJ for 1000 samples versus 0.4 mJ for window aggregation. Choose the algorithm that minimizes total energy, not just bytes transmitted.
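The E_total comparison is easy to automate. The compute-energy and output-size figures below come from the ESP32 benchmark table earlier in this chapter; the per-byte radio costs are rough assumptions for illustration only:

```python
def total_energy_mj(e_compute_mj: float, bytes_out: int,
                    mj_per_byte: float) -> float:
    """E_total = E_compute + E_transmit for one 1000-sample batch."""
    return e_compute_mj + bytes_out * mj_per_byte

LORA_MJ_PER_BYTE = 0.5    # assumed: expensive long-range radio
WIFI_MJ_PER_BYTE = 0.001  # assumed: cheap local radio

# Over LoRa, every byte saved matters, so heavy compression pays off
gzip_lora = total_energy_mj(8.5, 8200, LORA_MJ_PER_BYTE)  # 4108.5 mJ
agg_lora = total_energy_mj(0.4, 48, LORA_MJ_PER_BYTE)     # 24.4 mJ
# Over Wi-Fi the radio is cheap, so cheap aggregation still wins
gzip_wifi = total_energy_mj(8.5, 8200, WIFI_MJ_PER_BYTE)  # 16.7 mJ
agg_wifi = total_energy_mj(0.4, 48, WIFI_MJ_PER_BYTE)     # ~0.45 mJ
```

Under these assumptions window aggregation minimizes total energy on both radios; the gap to GZIP just narrows dramatically on the cheap Wi-Fi link.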

47.7 Understanding Check: Industrial Edge Data Pipeline Design

Scenario: Factory Vibration Monitoring System

Your manufacturing plant monitors 50 critical machines using vibration sensors to detect bearing failures before catastrophic breakdown. Each sensor must detect frequencies up to 200 Hz (bearing defects manifest at 50-200 Hz harmonics).

System constraints:

  • Sensor: MEMS accelerometer (+/-16g range)
  • Edge compute: ESP32 gateway with 4MB flash, 520KB RAM
  • Network: 4G cellular with 10 GB/month data cap ($0.10/GB overage)
  • Requirement: Detect anomalies within 1 minute, minimize bandwidth costs

Current naive approach:

  • Sample at 500 Hz (meets Nyquist: 2 x 200 Hz)
  • Stream raw data to cloud continuously
  • Result: 500 samples/sec x 2 bytes x 50 sensors = 50 KB/s = 129 GB/month ($11.90 overage!)
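The naive approach's monthly volume checks out (a quick back-of-envelope script assuming a 30-day month):

```python
samples_per_sec = 500
bytes_per_sample = 2
sensors = 50
seconds_per_month = 86_400 * 30

bytes_per_month = (samples_per_sec * bytes_per_sample
                   * sensors * seconds_per_month)
gb_per_month = bytes_per_month / 1e9        # ~129.6 GB
overage_usd = (gb_per_month - 10) * 0.10    # ~$11.96 at $0.10/GB over cap
```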

47.7.1 Think About: Data Reduction Strategy Trade-offs

Which edge processing strategy best balances anomaly detection accuracy, bandwidth costs, and latency?

| Strategy | Data Transmitted | Bandwidth Cost | Detection Latency | Information Loss |
|---|---|---|---|---|
| A. Raw streaming | 5000 samples/10s | 129 GB/month ($11.90) | Real-time (<1s) | None (full fidelity) |
| B. Downsample to 100 Hz | 1000 samples/10s | 26 GB/month ($1.60) | Real-time (<1s) | Aliases all content above the new 50 Hz Nyquist limit |
| C. Time-domain stats | 6 values/10s (min/max/mean/std/peak/RMS) | 0.15 GB/month ($0) | 10 seconds | Loses frequency info (can’t detect bearing harmonics) |
| D. FFT + compression | 10 FFT bins/10s | 0.26 GB/month ($0) | 10 seconds | Preserves frequency info (50-200 Hz) |

47.7.2 Key Insight: Edge FFT for Bandwidth Reduction

Option D (FFT + compression) achieves 500x bandwidth reduction while preserving anomaly detection capability:

How it works:

import numpy as np

# Edge processing pipeline (runs on ESP32 every 10 seconds)
def vibration_pipeline():
    # 1. Collect 10 seconds of data
    samples = collect_samples(rate=500, duration=10)  # 5000 samples

    # 2. Apply FFT (frequency-domain analysis)
    fft_result = np.fft.rfft(samples)  # -> 2501 frequency bins

    # 3. Extract magnitudes at the critical frequency bins
    #    Bin width = sample rate / num samples = 500 / 5000 = 0.1 Hz
    #    Bin index = frequency / bin_width
    bin_width = 500.0 / 5000  # 0.1 Hz per bin
    target_freqs = [50, 80, 110, 140, 170, 200]  # Hz
    bins = [abs(fft_result[int(f / bin_width)]) for f in target_freqs]
    # bins at indices [500, 800, 1100, 1400, 1700, 2000]
    # + 4 more bins for comprehensive coverage

    # 4. Transmit 10 values instead of 5000
    transmit_to_cloud(bins)  # 20 bytes vs 10,000 bytes

    return bins

# Data reduction: 5000 samples -> 10 FFT bins = 500x compression

Why this works for anomaly detection:

  1. Bearing failure signatures live in frequency domain: Healthy bearing = smooth spectrum. Failing bearing = spikes at harmonics (80 Hz, 160 Hz for 2400 RPM machine).

  2. No information loss for anomaly detection: The cloud ML model is trained on FFT bins, not raw waveforms. Accuracy is 94% with FFT bins vs 96% with raw samples; the 2-point gain does not justify 500x more bandwidth.

  3. Latency acceptable: 10-second aggregation + 2-second transmission = 12 seconds total (well under 1-minute requirement).

  4. Cost savings: 0.26 GB/month stays under data cap! (vs $11.90 overage for raw streaming).

Scenario: A manufacturing facility wants to detect bearing faults in motors running at 1800 RPM. Bearing defects produce vibration frequencies at harmonics of the motor speed. You need to determine the minimum sampling rate to capture fault signatures.

Given:

  • Motor speed: 1800 RPM = 30 Hz (revolutions per second)
  • Bearing fault frequencies:
    • 1x: 30 Hz (fundamental, imbalance)
    • 2x: 60 Hz (misalignment)
    • 3x: 90 Hz (looseness)
    • 5x: 150 Hz (bearing outer race defect)
    • 7x: 210 Hz (bearing inner race defect)
  • Highest frequency of interest: 210 Hz (7th harmonic)

Question: What is the minimum sampling rate required, and what sampling rate should you actually use in practice?

Solution:

Step 1: Apply Nyquist theorem

Minimum sampling rate = 2 × highest frequency

f_sample_min = 2 × 210 Hz = 420 Hz

Step 2: Calculate practical sampling rate with safety margin

Industry practice: Use 2.5x to 3x Nyquist for anti-aliasing filter roll-off

f_sample_recommended = 2.5 × 420 Hz = 1,050 Hz
f_sample_practical = 3 × 420 Hz = 1,260 Hz

Round to a convenient power-of-two or decade value: 1,280 Hz (≈3x Nyquist) or 1,000 Hz (≈2.4x, still comfortably above the 420 Hz minimum)

Step 3: Verify no aliasing occurs

Check if any harmonic would alias into the measurement band:

  • At 1,000 Hz sampling, Nyquist frequency = 500 Hz
  • All fault frequencies (30-210 Hz) are below 500 Hz ✓
  • No aliasing! All harmonics are correctly captured.
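Step 3's check can be scripted. This small helper (a sketch) folds any tone into the observable band [0, fs/2]; a frequency is captured faithfully exactly when it folds onto itself:

```python
def aliased_frequency(f, fs):
    """Apparent frequency of a tone at f Hz when sampled at fs Hz,
    folded into the observable band [0, fs/2]."""
    f_mod = f % fs
    return min(f_mod, fs - f_mod)

# At 1,000 Hz sampling, every fault harmonic folds onto itself:
for f in (30, 60, 90, 150, 210):
    assert aliased_frequency(f, 1000) == f  # below Nyquist: no aliasing

# Counter-example: at 100 Hz sampling, a 150 Hz fault folds to 50 Hz
aliased_frequency(150, 100)  # -> 50
```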

Step 4: Calculate data volume

Single sensor:

  • Sample rate: 1,000 Hz
  • Data size: 2 bytes per sample (16-bit ADC)
  • Data rate: 1,000 × 2 = 2,000 bytes/sec = 2 KB/sec
  • Daily data: 2 KB/sec × 86,400 sec = 172.8 MB/day

For 100 motors:

  • Daily data: 100 × 172.8 MB = 17.28 GB/day
  • Monthly data: 17.28 × 30 = 518.4 GB/month
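The Step 4 volumes reduce to a few lines of arithmetic (a sketch using the scenario's 16-bit samples and decimal MB/GB):

```python
def daily_mb(sample_rate_hz, bytes_per_sample=2):
    """Raw data volume per sensor per day, in decimal MB."""
    return sample_rate_hz * bytes_per_sample * 86_400 / 1e6

per_motor_mb = daily_mb(1000)              # 172.8 MB/day per motor
fleet_gb_day = 100 * per_motor_mb / 1000   # 17.28 GB/day for 100 motors
fleet_gb_month = fleet_gb_day * 30         # 518.4 GB/month
```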

Step 5: Apply edge FFT compression

Rather than stream raw waveforms to cloud:

# Edge processing every 10 seconds
samples_per_window = 1000 * 10  # 1,000 Hz × 10 s = 10,000 samples

# Perform real FFT: 10,000 samples -> 5,001 frequency bins
fft_result = np.fft.rfft(samples)

# Extract only the bins of interest (fault frequencies)
fault_freqs = [30, 60, 90, 150, 210]  # Hz
bin_width = 1000 / 10_000  # sample rate / window size = 0.1 Hz per bin
selected_bins = [int(f / bin_width) for f in fault_freqs]
# Result: bins [300, 600, 900, 1500, 2100]

# Transmit only these 5 bins (10 bytes) instead of 20,000 bytes
compressed_data = [fft_result[b] for b in selected_bins]
compression_ratio = 20_000 / 10  # = 2,000x

Step 6: Calculate bandwidth savings

Without compression (from Step 4):

  • 100 motors: 17.28 GB/day = 518.4 GB/month

With edge FFT compression (2,000x):

  • 17.28 GB/day / 2,000 = 8.64 MB/day

Cost savings:

  • Cloud ingress: $0.09/GB
  • Uncompressed: 17.28 GB/day × $0.09 = $1.56/day ≈ $568/year
  • Compressed: 0.00864 GB/day × $0.09 ≈ $0.28/year
  • Savings: ≈ $567/year per 100 motors, plus staying under per-device data caps and avoiding overage fees entirely

Key Insight: For vibration analysis, sample at 2.5-3x Nyquist (not just 2x minimum) to allow for anti-aliasing filter roll-off. Then apply edge FFT compression by transmitting only the frequency bins of interest (harmonics), achieving 1,000-10,000x data reduction while preserving all fault detection capability. The edge gateway does the heavy computation; the cloud receives only the diagnostic features.

Choose the appropriate compression strategy based on signal characteristics, edge compute capabilities, and analytical requirements:

| Signal Type | Recommended Compression | Typical Ratio | Edge CPU | Bandwidth | Information Loss | Best For |
|---|---|---|---|---|---|---|
| Slowly changing continuous (temperature, humidity) | Statistical aggregation (min/max/mean/std) | 100-1000x | Very Low (1% CPU) | 99%+ reduction | Loses individual samples, keeps trends | Environmental monitoring, agriculture |
| Periodic vibration (motors, pumps) | FFT + top-N frequency bins | 100-5000x | High (50% CPU) | 99%+ reduction | Loses waveform, keeps frequency spectrum | Predictive maintenance, bearing analysis |
| Event-driven sparse (motion, door switches) | Event logging (timestamp + state change only) | 1000-10000x | Very Low | 99.9%+ reduction | Loses "no event" periods (acceptable) | Security, occupancy, access control |
| High-frequency transient (acoustics, ultrasound) | Triggered capture + FFT | 50-500x | High | 98%+ reduction | Loses non-trigger periods | Leak detection, acoustic monitoring |
| Bounded range analog (pressure, flow) | Delta encoding + GZIP | 3-10x | Medium (10% CPU) | 70-90% reduction | None (lossless) | Critical measurements requiring full fidelity |
| Audit trail / compliance (access logs, alarms) | GZIP compression only | 2-5x | Low (5% CPU) | 50-80% reduction | None (lossless) | Regulatory compliance, security logs |

Decision Tree:

  1. Is the signal event-driven (state changes only)?
    • YES → Use event logging (transmit only state changes)
    • NO → Continue to step 2
  2. Do you need to preserve the exact waveform for audit/compliance?
    • YES → Use lossless compression only (GZIP, DEFLATE)
    • NO → Continue to step 3
  3. Does the signal have strong frequency-domain features?
    • YES (vibration, acoustics) → Use FFT + top-N bins
    • NO → Continue to step 4
  4. Is the signal slowly changing (< 1% per sample)?
    • YES → Use statistical aggregation over time windows
    • NO → Continue to step 5
  5. Is the signal bounded with low variance?
    • YES → Use delta encoding + lossless compression
    • NO → Use adaptive sampling rate based on rate-of-change
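The decision tree can be captured as a small selector function (a sketch; the boolean flags are assumed ways of characterizing a signal, checked in the tree's order so the first match wins):

```python
def select_compression(event_driven=False, needs_exact_waveform=False,
                       frequency_features=False, slowly_changing=False,
                       bounded_low_variance=False):
    """Walk the decision tree top to bottom; the first match wins."""
    if event_driven:
        return "event logging"
    if needs_exact_waveform:
        return "lossless (GZIP/DEFLATE)"
    if frequency_features:
        return "FFT + top-N bins"
    if slowly_changing:
        return "statistical aggregation"
    if bounded_low_variance:
        return "delta encoding + lossless"
    return "adaptive sampling rate"

select_compression(event_driven=True)        # -> "event logging"
select_compression(frequency_features=True)  # -> "FFT + top-N bins"
```

Encoding the tree as code keeps the selection auditable: a fleet-management service can log which branch fired for each sensor type.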

Example: Multi-Sensor System with Different Compression Strategies:

class EdgeCompressionPipeline:
    def __init__(self):
        self.strategies = {
            'temperature': StatisticalAggregator(window_sec=300),     # 5-min windows
            'vibration': FFTCompressor(top_n=10, window_sec=10),      # Top 10 freq bins
            'motion': EventLogger(),                                  # State changes only
            'pressure': [DeltaEncoder(), GZIPCompressor()],           # Lossless delta, then GZIP
            'door': EventLogger(),                                    # State changes only
        }

    def compress_sensor_data(self, sensor_id, raw_samples):
        sensor_type = self.get_sensor_type(sensor_id)
        strategy = self.strategies[sensor_type]
        stages = strategy if isinstance(strategy, list) else [strategy]
        data = raw_samples
        for stage in stages:  # apply chained stages in order
            data = stage.compress(data)
        return data

# Usage example:
pipeline = EdgeCompressionPipeline()

# Temperature: 300 samples → 4 summary values (min/max/mean/std)
temp_compressed = pipeline.compress_sensor_data("temp_01", temp_samples)
# Compression: 300 samples × 2 bytes = 600 bytes → 16 bytes (4 floats)
# Ratio: 37.5x

# Vibration: 10,000 samples → 10 FFT bins
vib_compressed = pipeline.compress_sensor_data("vib_01", vib_samples)
# Compression: 10,000 × 2 bytes = 20 KB → 40 bytes (10 float32 magnitudes)
# Ratio: 500x

# Motion: 1000 samples (mostly "no motion") → 3 events ("motion detected" at 3 timestamps)
motion_compressed = pipeline.compress_sensor_data("motion_01", motion_samples)
# Compression: 1000 × 1 byte = 1 KB → 24 bytes (3 events × 8 bytes each)
# Ratio: 42x
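The strategy classes in the pipeline above are placeholders. As one concrete example, a minimal StatisticalAggregator might pack the four summary statistics as 32-bit floats (a sketch, assuming samples arrive as a list of numbers):

```python
import struct
from statistics import mean, pstdev

class StatisticalAggregator:
    """Reduce a window of samples to min/max/mean/std packed as float32."""

    def __init__(self, window_sec=300):
        self.window_sec = window_sec

    def compress(self, samples):
        summary = (min(samples), max(samples), mean(samples), pstdev(samples))
        return struct.pack("<4f", *summary)  # 4 x 4 bytes = 16 bytes

payload = StatisticalAggregator().compress([20.1, 20.3, 20.2, 20.5])
len(payload)  # 16 bytes, regardless of how many samples were in the window
```

The fixed 16-byte output is what makes the compression ratio scale with window size: a 5-minute window at 1 Hz yields 300 samples in, 16 bytes out.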

Buffer Sizing for Each Strategy:

| Strategy | RAM Required (per sensor) | Latency Added | Notes |
|---|---|---|---|
| Statistical Aggregation | 1-2 KB (circular buffer) | 5-60 seconds (window duration) | Minimal memory, acceptable latency |
| FFT Compression | 16-64 KB (FFT working memory) | 1-10 seconds (FFT window) | High memory, fast processing |
| Event Logging | < 1 KB (state machine) | None (immediate) | Minimal resources, real-time |
| Delta Encoding | 4-8 KB (recent history) | < 1 second | Low memory, minimal latency |
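The FFT row's memory figure can be estimated from the window size (a rough sketch: a 16-bit capture buffer plus float32 real/imaginary working buffers; actual numbers vary with the FFT library and in-place strategies):

```python
def fft_working_memory_bytes(window_samples, sample_bytes=2, float_bytes=4):
    """Rough per-sensor RAM estimate for FFT compression on an MCU."""
    capture = window_samples * sample_bytes      # raw 16-bit ADC samples
    working = window_samples * 2 * float_bytes   # complex (real+imag) intermediates
    return capture + working

fft_working_memory_bytes(5000)  # 10 KB capture + 40 KB working = 50,000 bytes
```

A 10-second, 500 Hz window therefore needs roughly 50 KB, which is why FFT compression lands on gateways or larger MCUs rather than the smallest sensor nodes.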


Common Mistake: Undersampling Harmonics in Vibration Monitoring

The Mistake: Sampling vibration data at only 2x the motor’s fundamental frequency, missing critical high-frequency bearing fault signatures that appear at 5x-7x harmonics.

Real-World Example: A factory deployed vibration sensors with 100 Hz sampling on 30 Hz motors (thinking “2x motor speed is enough”):

Motor: 30 Hz
Bearing outer race fault frequency: 5 × 30 = 150 Hz

At 100 Hz sampling:
- Nyquist = 50 Hz
- 150 Hz aliases to |150 - round(150/100) × 100| = |150 - 200| = 50 Hz
- At exactly the Nyquist frequency, the signal is severely distorted

The bearing fault appeared as an unreliable artifact at the Nyquist boundary.
ML model missed 8 bearing failures before they became catastrophic.
One failure caused $500K in downtime.

Correct Implementation:

def calculate_vibration_sampling_rate(motor_rpm, bearing_type="ball"):
    """
    Calculate sampling rate for bearing fault detection.
    Accounts for all possible fault harmonics.
    """
    motor_hz = motor_rpm / 60

    # Bearing fault frequency multipliers
    fault_harmonics = {
        "ball": [1, 2, 3, 4, 5, 6, 7, 8],      # Ball bearings: up to 8x
        "roller": [1, 2, 3, 4, 5],              # Roller bearings: up to 5x
        "sleeve": [1, 2, 3],                    # Sleeve bearings: up to 3x
    }

    highest_harmonic = max(fault_harmonics[bearing_type])
    highest_frequency = motor_hz * highest_harmonic

    # Safety factor: 3x Nyquist for anti-aliasing filter
    recommended_rate = 3 * 2 * highest_frequency

    return {
        'motor_hz': motor_hz,
        'highest_fault_hz': highest_frequency,
        'nyquist_min': 2 * highest_frequency,
        'recommended': recommended_rate,
    }

# Example usage:
rate_info = calculate_vibration_sampling_rate(motor_rpm=1800, bearing_type="ball")
# Motor: 30 Hz | Highest fault: 240 Hz | Recommended: 1440 Hz (3x Nyquist)

Warning Signs: Vibration analysis shows only the fundamental frequency with no harmonics. Bearing failures occur with “no warning” despite continuous monitoring. FFT spectrum looks suspiciously clean.

Prevention: Always analyze the FULL harmonic series for rotating machinery. Sample at 3x Nyquist (not just 2x) to allow for anti-aliasing filter roll-off. Verify by plotting the FFT spectrum and confirming all expected fault harmonics are visible.
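The "plot the FFT spectrum and confirm the harmonics" verification can also be scripted (a sketch against a simulated signal; the 10x-median peak threshold is an arbitrary assumption, not a standard):

```python
import numpy as np

def harmonic_peaks_visible(fs=1000, duration=2.0, motor_hz=30,
                           harmonics=(1, 2, 3, 5, 7)):
    """Simulate a vibration signal containing the fault harmonics and
    confirm each one stands out as a peak in the FFT spectrum."""
    t = np.arange(0, duration, 1 / fs)
    signal = sum(np.sin(2 * np.pi * motor_hz * h * t) for h in harmonics)
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(t), 1 / fs)
    checks = []
    for h in harmonics:
        idx = int(np.argmin(np.abs(freqs - motor_hz * h)))
        # each expected harmonic should dominate the spectrum's noise floor
        checks.append(spectrum[idx] > 10 * np.median(spectrum))
    return all(checks)
```

Running the same check at an undersampled rate (or on field data) and seeing missing harmonics is exactly the warning sign described above.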

47.8 Knowledge Check

47.9 Practice Exercises

Objective: Determine optimal sampling rates for different sensor types.

Tasks:

  1. Identify signal characteristics for 4 sensors: temperature (max 0.1 Hz), vibration (max 500 Hz), audio (max 20 kHz), motion IMU (max 50 Hz)
  2. Apply Nyquist theorem: calculate minimum sampling rates
  3. Implement with margin and measure data rate impact

Expected Outcome: Understand the relationship between signal bandwidth and sampling requirements.
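As a starting point for Tasks 1-2, the Nyquist minimums for the four sensors can be tabulated in a few lines (a sketch; the 2.5x margin follows the chapter's rule of thumb, and real audio systems typically standardize on 44.1 or 48 kHz rather than a computed margin):

```python
sensors_max_hz = {
    "temperature": 0.1,   # Hz: slow thermal drift
    "motion_imu": 50,     # Hz: human motion dynamics
    "vibration": 500,     # Hz: machine fault harmonics
    "audio": 20_000,      # Hz: audible band
}

for name, f_max in sensors_max_hz.items():
    nyquist_min = 2 * f_max           # absolute minimum (Nyquist)
    recommended = 2.5 * nyquist_min   # practical margin for filter roll-off
    print(f"{name}: minimum {nyquist_min:g} Hz, recommended ~{recommended:g} Hz")
```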

Objective: Implement multiple data reduction techniques and compare bandwidth savings.

Tasks:

  1. Collect 1 minute of high-rate sensor data (100 Hz × 60 s = 6,000 samples)
  2. Apply 4 reduction strategies: downsampling, statistical aggregation, delta encoding, event-based
  3. Transmit reduced data and compare bandwidth
  4. Validate: can you detect a 1 °C temperature spike with each method?

Expected Outcome: Understand trade-offs between compression ratio and information preservation.
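Of the four strategies in Task 2, delta encoding is the one not sketched elsewhere in this chapter; it takes only a few lines (a sketch on integer samples):

```python
def delta_encode(samples):
    """First value absolute, then successive differences; near-constant
    signals become runs of small numbers that GZIP shrinks well."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def delta_decode(deltas):
    """Invert delta_encode by running-summing the differences."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

delta_encode([100, 101, 101, 103, 102])  # -> [100, 1, 0, 2, -1]
```

Because decode exactly inverts encode, this step is lossless; the bandwidth win comes from the entropy coder applied afterwards.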

Sammy learns the Nyquist rule!

Sammy the Sensor was monitoring a washing machine’s vibrations. He was taking measurements really slowly – only once every 5 seconds.

“Something is wrong!” said Max the Microcontroller. “The machine is shaking like crazy, but your readings look perfectly calm!”

Lila the LED, who loved science, explained: “Sammy, you are measuring too slowly! The machine vibrates hundreds of times per second, but you only check every 5 seconds. It is like trying to watch a hummingbird by opening your eyes once every minute – you would never see its wings move!”

“There is a rule called the Nyquist rule,” Lila continued. “You need to measure at LEAST twice as fast as the thing is changing. If the machine vibrates 100 times per second, you need to measure at least 200 times per second!”

Sammy sped up his measurements, and suddenly the vibration patterns appeared clearly. But now Bella the Battery was worried: “That is SO much data! I cannot send all of it to the cloud!”

Max had the solution: “We can COMPRESS the data! Instead of sending every single measurement, let us calculate a summary – the average vibration, the biggest shake, and the overall pattern. We send 10 numbers instead of 5,000!”

The lesson: Sample fast enough to catch what is happening (Nyquist rule), then compress smartly to save energy and bandwidth!

Key Takeaway

Always sample at 2x or higher than the highest frequency of interest (Nyquist theorem) to avoid aliasing artifacts. Then apply edge data reduction – aggregation for slow-changing signals, FFT compression for vibration analysis, or semantic event extraction for sparse event streams – to reduce bandwidth by 10-1000x while preserving the information needed for downstream analytics.

47.10 Summary

Edge data acquisition requires careful balance between data fidelity and resource constraints:

  • Nyquist compliance: Sample at 2x or higher than your highest frequency of interest to avoid aliasing
  • Reduction techniques: Aggregation (10-50x), FFT compression (50-500x), and semantic extraction (100-1000x) each serve different use cases
  • Algorithm selection: Match compression to downstream analytics needs - lossless for audit, statistical for trends, FFT for vibration, semantic for events
  • Resource awareness: Consider CPU time and memory on constrained edge devices, not just compression ratio

47.11 Concept Relationships

Sampling and compression determine data fidelity, bandwidth, and power consumption trade-offs:

Sampling Theory (This chapter):

  • Nyquist theorem: sample at 2x highest frequency to avoid aliasing (vibration monitoring: sample at 2.5-3x Nyquist for anti-aliasing filter)
  • Under-sampling causes phantom patterns (150 Hz bearing fault aliased to 30 Hz when sampled at 60 Hz)

Compression Strategies (This chapter):

  • Lossless (GZIP): 2-4x reduction, preserves all data (audit trails, compliance)
  • Lossy statistical (aggregation): 10-100x reduction, preserves trends (environmental monitoring)
  • FFT-based: 50-500x reduction, preserves frequency spectrum (vibration analysis)
  • Semantic (event extraction): 100-1000x reduction, preserves state changes (threshold monitoring)

Key Insight: Compression algorithm selection depends on analytics requirements, not just compression ratio. Vibration monitoring needs FFT compression (preserves frequency info for bearing fault detection) even though aggregation yields higher ratios. Choosing wrong compression permanently destroys the signal needed for analysis.

47.12 What’s Next

| If you want to… | Read this |
|---|---|
| Understand the acquisition architecture that applies these strategies | Edge Acquisition Architecture |
| Learn about power management for low-duty-cycle sampling | Edge Acquisition Power and Gateways |
| Study edge compute patterns built on efficient sampling | Edge Compute Patterns |
| Apply to the broader edge data acquisition context | Edge Data Acquisition |
| Return to the module overview | Big Data Overview |
