19  Signal Processing & Filtering

Learning Objectives

After completing this chapter, you will be able to:

  • Calculate minimum sampling rates using the Nyquist theorem and diagnose aliasing artefacts
  • Apply digital filters (moving average, EMA, median) to sensor data
  • Implement hardware and software signal conditioning
  • Detect and compensate for sensor calibration drift
  • Select appropriate filters for different noise types and justify your choice
In 60 Seconds

Raw sensor readings always contain noise: random variation from electrical interference, temperature fluctuations, and ADC quantization. Filtering reduces that noise: a moving average smooths continuous data, a median filter rejects spike outliers, and an exponential moving average (EMA) balances noise rejection with response speed. Sample at more than 2x your signal’s highest frequency (Nyquist criterion) to avoid aliasing, the false frequencies that appear when the sampling rate is too low.

19.1 MVU: Minimum Viable Understanding

If you only have 5 minutes, here’s what you need to know about sensor signal processing:

  1. Sample at more than 2x the highest frequency - The Nyquist theorem states you must sample faster than twice the signal’s highest frequency to avoid aliasing (false patterns)
  2. Filter your data - Raw sensor readings are noisy; use moving average for smooth data, median filter for outlier rejection
  3. Calibrate regularly - Sensors drift over time; factory calibration degrades and needs periodic recalibration
  4. Match voltage levels - Connecting a 5V sensor output directly to a 3.3V microcontroller input can damage it; use level shifters

Bottom line: Good signal processing separates professional IoT systems from unreliable prototypes. These fundamentals prevent false alarms, noisy dashboards, and wasted debugging time.

Raw sensor readings are almost always noisy – they jump around even when nothing is changing, like a bathroom scale that flickers between 69 and 71 kg while you stand still. Signal processing cleans up this noise using techniques like averaging (take several readings and compute their mean) and median filtering (take several readings and keep the middle one, which ignores sudden spikes that are clearly wrong). These simple techniques turn unreliable raw data into smooth, trustworthy measurements.

19.2 Prerequisites

Learning Resources:

  • Quizzes Hub - Test your signal processing knowledge with interactive questions on sampling, filtering, and calibration
  • Simulations Hub - Explore the Signal Processing Workbench to visualize filter effects in real-time
  • Labs Hub - Practice ADC sampling and digital filtering on ESP32 with Wokwi simulations
  • Knowledge Gaps Hub - Address common misconceptions about sampling rates and filter selection
  • Knowledge Map - See how signal processing connects to sensor calibration, ADC fundamentals, and data quality
Key Takeaway

In one sentence: Signal processing transforms noisy raw sensor data into clean, reliable measurements through proper sampling, filtering, and calibration.

Remember this rule: Sample at >2x your highest frequency of interest, filter with the right algorithm for your noise type (moving average for Gaussian, median for outliers), and recalibrate sensors periodically to maintain accuracy.

Meet the Clean-Up Crew!

Imagine you’re trying to hear your friend whisper in a noisy cafeteria. That’s what sensors deal with every day - they’re trying to measure the real temperature or distance, but there’s lots of “noise” getting in the way!

The Signal Processing Superheroes:

  • Sammy the Sampler takes pictures of the signal really fast - like taking 100 photos per second of a spinning wheel so you can see each spoke clearly!
  • Fiona the Filter is like a picky eater who only keeps the good readings and throws away the weird ones
  • Cal the Calibrator makes sure your ruler starts at exactly zero, not at 1 inch!

Why does this matter?

Think of your thermometer app. Without signal processing:

  • It might jump from 70 to 250 and back to 72 (scary but not real!)
  • It might show a temperature that happened 5 minutes ago
  • It might always read 3 degrees too high

Real-world example: When you use a fitness tracker, it filters out the noise from your wrist moving around and only counts actual steps. That’s signal processing in action!

Fun Fact: Your phone’s screen has a filter that ignores accidental touches when you’re holding it - that’s why it doesn’t go crazy when your palm touches the screen!

19.3 Sensor Data Acquisition Pitfalls

These common mistakes cause incorrect sensor readings, false alarms, and wasted development time. Understanding each pitfall – and its fix – will save you hours of debugging.

19.3.1 Ignoring Nyquist Sampling Rate (Aliasing)

The Mistake

Sampling signals below twice their highest frequency component, causing false low-frequency patterns to appear.

Symptoms:

  • False patterns appear in data that don’t exist in the real signal
  • High-frequency events (vibrations, transients) are missed entirely
  • Frequency analysis shows phantom signals at incorrect frequencies

Why it happens: Developers choose sample rates based on storage/bandwidth constraints rather than signal characteristics, and the Nyquist requirement (sample at >2x the highest frequency) is overlooked or misunderstood.

How to diagnose:

  1. Identify the highest frequency component in your signal of interest
  2. Compare your sample rate to 2x that frequency
  3. Look for patterns that vary with sample rate changes
  4. Use an oscilloscope or high-speed capture to see the true signal

The fix:

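import time

# `sensor` and `adc` are placeholder driver objects for illustration.

# WRONG: Sampling 60Hz vibration at 50Hz
# Result: False 10Hz signal appears (aliasing)
while True:
    reading = sensor.read()
    time.sleep(0.02)  # 50Hz sample rate - causes aliasing!

# CORRECT: Sample at more than 2x the highest frequency of interest
# For 60Hz signal, sample above 120Hz (200Hz recommended)
while True:
    reading = sensor.read()
    time.sleep(0.005)  # ~200Hz sample rate - safe (sleep-based timing is approximate)

# BEST: Use anti-aliasing filter before ADC
# A hardware low-pass (e.g. RC) filter sits between the sensor and the ADC pin
# and attenuates frequencies above Nyquist (sampling_rate / 2) before conversion.
# It cannot be applied in software after sampling, so it is shown as a signal path:
#   sensor output -> low-pass filter (cutoff <= sampling_rate / 2) -> ADC input
adc_value = adc.read()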

Prevention: Sample at >2x the highest frequency of interest (4-10x is better practice). Use hardware anti-aliasing filters (low-pass RC filter) before the ADC.
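
The 10 Hz phantom in the example above follows from frequency folding. As a minimal sketch (the helper name `alias_frequency` is illustrative, not from any library, and it assumes a single pure tone), the apparent frequency after sampling can be computed like this:

```python
def alias_frequency(signal_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone after sampling at sample_rate_hz."""
    # Fold the tone into the first Nyquist zone [0, fs/2].
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


for tone, fs in [(60, 50), (60, 200), (50, 60)]:
    apparent = alias_frequency(tone, fs)
    print(f"{tone} Hz tone sampled at {fs} Hz appears at {apparent:g} Hz")
```

A tone sampled fast enough comes back at its true frequency; an undersampled tone comes back at the folded frequency, which is indistinguishable from a real signal once the data is digitized.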

Scenario: You are designing a predictive maintenance system for industrial motors. Bearing failures produce vibration signatures between 10 Hz (slow bearing wear) and 5 kHz (bearing race defects). What sampling rate do you need?

Step 1: Identify Highest Frequency Component

Lowest frequency of interest: 10 Hz (slow wear patterns)
Highest frequency of interest: 5000 Hz (bearing race defects)

Step 2: Apply Nyquist Theorem

The Nyquist-Shannon sampling theorem states:

Minimum sampling rate = 2 × highest frequency
fs_min = 2 × 5000 Hz = 10,000 Hz (10 kHz)

The Nyquist theorem requires \(f_s > 2 f_{\max}\) to avoid aliasing. For bearing defects at 5 kHz:

\[f_{s,\min} = 2 \times 5000 \text{ Hz} = 10 \text{ kHz}\]

But real anti-aliasing filters have gradual rolloff. A 2nd-order Butterworth with a 5 kHz cutoff attenuates by only about 3 dB at 5 kHz and about 12 dB at 10 kHz (one octave above cutoff), where it still passes about 25% of the signal amplitude. If you sample at exactly 10 kHz, content just above the 5 kHz Nyquist frequency folds back into the band of interest almost unattenuated. By sampling at 20 kHz instead, the Nyquist frequency becomes 10 kHz, and only content above 15 kHz (20 kHz minus the 5 kHz band edge) can alias into the 0 to 5 kHz band of interest, where the same filter provides roughly 19 dB of attenuation and more at higher frequencies. This is why we use \(f_s = 4\text{--}10 \times f_{\max}\) in practice, not just \(2 \times f_{\max}\).
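
The rolloff figures for a 2nd-order Butterworth filter can be checked numerically. This is a sketch using the ideal magnitude response \(|H(f)| = 1/\sqrt{1 + (f/f_c)^{2n}}\); the function names are illustrative, and a real analog filter will deviate with component tolerances.

```python
import math


def butterworth_gain(freq_hz, cutoff_hz, order=2):
    """Magnitude response |H(f)| of an ideal low-pass Butterworth filter."""
    return 1.0 / math.sqrt(1.0 + (freq_hz / cutoff_hz) ** (2 * order))


def attenuation_db(freq_hz, cutoff_hz, order=2):
    """Attenuation in dB (positive numbers mean the signal is reduced)."""
    return -20.0 * math.log10(butterworth_gain(freq_hz, cutoff_hz, order))


CUTOFF_HZ = 5_000
for f in (5_000, 10_000, 15_000, 50_000):
    gain = butterworth_gain(f, CUTOFF_HZ)
    print(f"{f:>6} Hz: gain {gain:.3f}, attenuation {attenuation_db(f, CUTOFF_HZ):.1f} dB")
```

The 10 kHz row reproduces the roughly 12 dB (about 25% amplitude) figure, and the 50 kHz row shows the 40 dB/decade slope one decade above cutoff.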

Step 3: Add Safety Margin

In practice, sample at 4-10× the highest frequency of interest for anti-aliasing margin:

Recommended sampling rate = 4 × 5000 Hz = 20 kHz

Why the margin? Real-world anti-aliasing filters have gradual rolloff (a 2nd-order filter rolls off at 40 dB/decade, or 12 dB/octave). Sampling at exactly 2× the highest frequency leaves no margin for the filter’s transition band, allowing aliased energy to leak through.

Step 4: Choose ADC

| ADC | Max Sample Rate | Resolution | Cost | Suitable? |
|---|---|---|---|---|
| ESP32 built-in | 200 kSPS | 12-bit | $5 | ✅ Yes (overkill) |
| ADS1115 (I2C) | 860 SPS | 16-bit | $8 | ❌ Too slow |
| ADS1256 (SPI) | 30 kSPS | 24-bit | $20 | ✅ Yes (high precision) |
| MCP3008 (SPI) | 200 kSPS | 10-bit | $3 | ✅ Yes (budget) |

Choice: ESP32 built-in ADC at 20 kHz sampling rate

Step 5: Calculate Data Storage Requirements

Sample rate: 20,000 samples/second
Sample size: 2 bytes (12-bit ADC stored in 16-bit)
Data rate: 20,000 × 2 = 40,000 bytes/sec = 40 KB/sec

Per hour: 40 KB/s × 3600 s = 144 MB/hour
Per day: 144 MB/h × 24 h = 3.5 GB/day

Optimization: Store only when vibration exceeds threshold (event-driven recording):

Typical: 1% of time above threshold
Storage: 3.5 GB/day × 0.01 = 35 MB/day ✅ Manageable
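
The storage arithmetic can be parameterized so it is easy to rerun for other sample rates. This is a sketch with an illustrative helper name, using decimal units (1 MB = 10^6 bytes) as in the figures above; the 1% duty factor is the same assumption made in the text.

```python
def storage_bytes(sample_rate_hz, bytes_per_sample, seconds, duty=1.0):
    """Bytes produced over `seconds`, recording only a `duty` fraction of the time."""
    return sample_rate_hz * bytes_per_sample * seconds * duty


RATE_HZ, SAMPLE_BYTES = 20_000, 2  # 12-bit samples stored in 16 bits

per_second = storage_bytes(RATE_HZ, SAMPLE_BYTES, 1)
per_hour = storage_bytes(RATE_HZ, SAMPLE_BYTES, 3600)
per_day = storage_bytes(RATE_HZ, SAMPLE_BYTES, 86_400)
event_day = storage_bytes(RATE_HZ, SAMPLE_BYTES, 86_400, duty=0.01)

print(f"Per second: {per_second / 1e3:.0f} KB")
print(f"Per hour:   {per_hour / 1e6:.0f} MB")
print(f"Per day:    {per_day / 1e9:.2f} GB")
print(f"Event-driven (1% duty): {event_day / 1e6:.1f} MB/day")
```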

Step 6: Power Consumption

Continuous 20 kHz sampling is power-intensive:

ESP32 ADC active: 80 mA
Daily charge: 80 mA × 24 h = 1920 mAh/day
2000 mAh battery lasts: 2000 / 1920 = 1.04 days ❌ Impractical

Solution: Sample in bursts (1 second every 60 seconds):

Active: 80 mA × 1 sec = 0.022 mAh
Sleep: 0.01 mA × 59 sec = 0.00016 mAh
Per minute: 0.022 mAh
Per day: 0.022 × 1440 = 31.7 mAh/day
Battery life: 2000 / 31.7 = 63 days ✅ Much better
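
The same estimate can be written as a duty-cycle average. This is a rough sketch with illustrative function names; it ignores wake-up overhead, radio activity, and battery self-discharge, so real lifetimes will be shorter.

```python
def average_current_ma(active_ma, sleep_ma, active_s, period_s):
    """Time-weighted average current over one burst period."""
    sleep_s = period_s - active_s
    return (active_ma * active_s + sleep_ma * sleep_s) / period_s


def battery_life_days(capacity_mah, avg_ma):
    """Days of operation from a battery of the given capacity."""
    return capacity_mah / (avg_ma * 24)


continuous_days = battery_life_days(2000, 80)
burst_ma = average_current_ma(active_ma=80, sleep_ma=0.01, active_s=1, period_s=60)
burst_days = battery_life_days(2000, burst_ma)

print(f"Continuous sampling: {continuous_days:.2f} days")
print(f"1 s burst per minute: {burst_ma:.2f} mA average, {burst_days:.0f} days")
```

Without intermediate rounding the burst figure comes out at about 62 days, in line with the hand calculation above, which rounds the per-minute charge to 0.022 mAh.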

Real-World Implementation:

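#define SAMPLE_RATE 20000        // 20 kHz
#define SAMPLES_PER_BURST 20000  // 1 second of data
#define BURST_INTERVAL 60000     // 60 seconds (one burst per minute)
#define ADC_PIN 34               // assumed ADC1 input pin; adjust for your board
#define THRESHOLD 100.0f         // placeholder RMS alert level; tune for your sensor

// 40 KB buffer: keep it static so it does not overflow the loop task's stack
static uint16_t buffer[SAMPLES_PER_BURST];

void loop() {
  // Capture 1 second of vibration data at roughly 20 kHz
  for (int i = 0; i < SAMPLES_PER_BURST; i++) {
    buffer[i] = analogRead(ADC_PIN);  // ~10us on ESP32 (measure on your board)
    delayMicroseconds(40);  // 10us read + 40us delay ≈ 50us period (20 kHz)
  }

  // Analyze data (FFT, RMS, peak detection).
  // calculateRMS() and sendAlert() are application-specific helpers not shown here.
  float rms = calculateRMS(buffer, SAMPLES_PER_BURST);

  if (rms > THRESHOLD) {
    // Store data or send alert
    sendAlert(rms);
  }

  // Deep sleep for 59 seconds (argument is in microseconds)
  esp_sleep_enable_timer_wakeup(59ULL * 1000000ULL);
  esp_deep_sleep_start();
}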

Key Insights:

  1. Always sample at 4-10× the highest frequency — not just 2× — to allow for anti-aliasing filter rolloff
  2. High sample rates generate massive data — use threshold-based recording or edge processing (FFT) to reduce storage
  3. Continuous high-speed ADC drains batteries fast — use burst sampling with sleep intervals for battery-powered deployments
  4. FFT on-device saves bandwidth — transmit vibration frequency spectrum (100 bytes) instead of raw waveform (40 KB/second)

19.3.1.1 Nyquist Sampling Rate Calculator

Use this calculator to determine the minimum and recommended sampling rates for your signal.
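
Where the interactive calculator is not available, the same calculation can be sketched in a few lines. The function names and the default 4x margin are illustrative choices; the ADC rates are copied from the comparison table earlier in this section.

```python
def sampling_rates(f_max_hz, margin=4.0):
    """Return (Nyquist bound, recommended rate) in Hz for a given bandwidth.

    The sampling rate must exceed the first value; the second applies the
    practical 4-10x margin discussed in the text.
    """
    if margin < 2.0:
        raise ValueError("margin below 2x cannot satisfy the Nyquist criterion")
    return 2.0 * f_max_hz, margin * f_max_hz


def suitable_adcs(adc_max_sps, required_sps):
    """Names of ADCs whose maximum rate meets the required rate."""
    return [name for name, sps in adc_max_sps.items() if sps >= required_sps]


# Maximum sample rates as listed in the ADC comparison table.
ADC_MAX_SPS = {"ESP32 built-in": 200_000, "ADS1115": 860,
               "ADS1256": 30_000, "MCP3008": 200_000}

bound, recommended = sampling_rates(5_000)  # bearing-defect example
print(f"Nyquist bound: {bound:.0f} Hz, recommended: {recommended:.0f} Hz")
print("Fast enough:", suitable_adcs(ADC_MAX_SPS, recommended))
```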

19.3.2 Ignoring Sensor Calibration Drift

The Mistake

Using factory calibration forever, assuming sensors maintain accuracy over time.

Symptoms:

  • Gradual accuracy degradation over weeks/months
  • Systematic bias in readings (consistently high or low)
  • False anomaly detections as sensors drift past thresholds

Why it happens: Sensors drift due to aging, temperature cycling, humidity exposure, and contamination. Factory calibration is a snapshot that degrades over time.

The fix:

from datetime import datetime

# WRONG: Using factory calibration forever
def read_temperature():
    return adc.read() * FACTORY_SCALE + FACTORY_OFFSET

# CORRECT: Track and compensate for drift
class CalibratedSensor:
    def __init__(self, drift_rate_per_day=0.01):
        # drift_rate_per_day: expected upward drift of the reading, in output
        # units per day (use a negative value if the sensor drifts low)
        self.last_calibration = datetime.now()
        self.drift_rate = drift_rate_per_day
        self.scale = FACTORY_SCALE
        self.offset = FACTORY_OFFSET

    def read_raw(self):
        return adc.read()

    def read_temperature(self):
        raw = self.read_raw()
        elapsed = datetime.now() - self.last_calibration
        days_since_cal = elapsed.total_seconds() / 86400  # fractional days
        drift_compensation = self.drift_rate * days_since_cal
        # Subtract the estimated drift accumulated since the last calibration
        return raw * self.scale + self.offset - drift_compensation

    def recalibrate(self, reference_value):
        """Call with known reference temperature"""
        actual = self.read_raw()
        self.offset = reference_value - (actual * self.scale)
        self.last_calibration = datetime.now()

Prevention: Implement periodic recalibration procedures (quarterly for many sensors). Use reference sensors for cross-validation.

19.3.3 Using Raw Sensor Data Without Filtering

The Mistake

Using raw ADC values directly for decisions, displays, and storage without noise filtering.

Symptoms:

  • Noisy, jumpy dashboards and charts
  • False anomaly alerts from noise spikes
  • Incorrect trend analysis due to noise obscuring patterns
  • Unstable control loops that oscillate

The fix:

from collections import deque

# WRONG: Using raw ADC values directly
while True:
    temp = adc.read()
    if temp > THRESHOLD:
        alert()  # Noise spike causes false alert!
    display(temp)  # Noisy, jumpy display

# CORRECT: Apply moving average filter
readings = deque(maxlen=10)

def read_filtered():
    raw = adc.read()
    readings.append(raw)
    return sum(readings) / len(readings)

# LOW MEMORY: Exponential moving average (single stored value; higher alpha responds faster)
class EMAFilter:
    def __init__(self, alpha=0.1):
        self.alpha = alpha  # 0.1 = smooth, 0.5 = responsive
        self.ema = None

    def filter(self, value):
        if self.ema is None:
            self.ema = value
        else:
            self.ema = self.alpha * value + (1 - self.alpha) * self.ema
        return self.ema

# FOR SPIKE NOISE: Median filter for outlier rejection
def median_filter(new_value, buffer, size=5):
    buffer.append(new_value)
    if len(buffer) > size:
        buffer.pop(0)
    return sorted(buffer)[len(buffer) // 2]
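
To see why the filter should match the noise type, here is a small comparison on made-up readings containing a single spike. It is a sketch using only the Python standard library; `run_filter` is an illustrative helper, not part of the code above.

```python
from collections import deque
from statistics import mean, median


def run_filter(samples, reducer, window=5):
    """Apply `reducer` (e.g. mean or median) over a sliding window."""
    buf = deque(maxlen=window)
    out = []
    for sample in samples:
        buf.append(sample)
        out.append(reducer(buf))
    return out


# Hypothetical readings: steady around 20 with one spike of 95.
readings = [20, 21, 20, 95, 20, 21, 20, 20, 21, 20]

averaged = run_filter(readings, mean)
medianed = run_filter(readings, median)

print("Moving average peak:", max(averaged))
print("Median filter peak: ", max(medianed))
```

The moving average spreads the spike across every window that contains it, so the output stays elevated for several samples, while the median output never leaves the range of the genuine readings.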

19.4 Digital Filter Selection Guide

Choosing the right filter depends on your noise characteristics and application requirements:

Digital filter selection decision tree based on noise type: random Gaussian uses moving average, spikes use median filter, low memory uses exponential MA, with advanced options Kalman, IIR, and complementary filters

Figure 19.1: Decision tree for selecting the appropriate digital filter

Quick Reference Table:

| Noise Type | Best Filter | Parameters | Use Case |
|---|---|---|---|
| Random Gaussian | Moving Average | Window = 10-20 | Temperature, humidity |
| Spikes/Outliers | Median | Window = 5-7 | Distance sensors, IR |
| High-frequency | IIR Low-pass | fc = max signal freq | Vibration, audio |
| Known statistics | Kalman | Q, R from data | IMU, tracking |
| 50/60 Hz interference | Notch filter | fn = 50 or 60 Hz | Analog sensors near AC |

19.4.1 EMA Smoothing Factor Explorer

Adjust the EMA alpha parameter to see how it affects the time constant and responsiveness. A lower alpha gives smoother output but slower response to real changes.
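
The relationship between alpha and responsiveness can also be explored numerically. For an EMA updated every sample, the time constant in samples is \(\tau = -1/\ln(1 - \alpha)\). The sketch below (illustrative function names, step input from 0 to 1) computes it and counts how many samples the filter needs to reach 95% of a step change.

```python
import math


def ema_time_constant(alpha, sample_period_s=1.0):
    """Time constant of an EMA; with the default period it is in samples."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return -sample_period_s / math.log(1.0 - alpha)


def samples_to_settle(alpha, fraction=0.95):
    """Number of updates for a 0 -> 1 step to reach `fraction` of its final value."""
    state, count = 0.0, 0
    while state < fraction:
        state += alpha * (1.0 - state)  # same update as the EMA formula
        count += 1
    return count


for alpha in (0.1, 0.5):
    tau = ema_time_constant(alpha)
    print(f"alpha={alpha}: tau = {tau:.2f} samples, "
          f"95% settled after {samples_to_settle(alpha)} samples")
```

Multiplying the sample counts by your sampling period gives the delay in seconds, which is the number to compare against how quickly the measured quantity really changes.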

19.5 Signal Conditioning Chain

The complete signal conditioning chain for a sensor transforms raw physical measurements into clean digital data:

Complete signal conditioning chain showing six stages from raw millivolt sensor through amplification, anti-alias filter, level shifting, ADC conversion to clean calibrated output in engineering units

Figure 19.2: Complete signal conditioning chain from raw sensor to clean output

Stage-by-Stage Explanation:

| Stage | Purpose | Key Parameters |
|---|---|---|
| Raw Sensor | Physical measurement | Sensitivity, range |
| Amplification | Scale signal to ADC range | Gain (1x-1000x) |
| Analog Filter | Remove frequencies above Nyquist | Cutoff frequency |
| Sample & Hold | Freeze signal during conversion | Acquisition time |
| ADC | Convert to digital | Resolution (bits), sample rate |
| Digital Filter | Remove noise, smooth data | Filter type, window size |
| Decimation | Reduce data rate | Decimation factor |

Real-Time Implementation for Microcontrollers:

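// ESP32/Arduino optimized filter implementations
#include <stdint.h>
#include <string.h>  // memcpy

// Fixed-point EMA (no floating point)
typedef struct {
    int32_t state;
    uint8_t shift;  // alpha = 1/(2^shift), e.g., shift=3 -> alpha=0.125
} ema_filter_t;

int32_t ema_filter_update(ema_filter_t *f, int32_t input) {
    // Fixed-point EMA: state = state + ((input - state) >> shift)
    // Assumes an arithmetic right shift for negative differences (true for GCC).
    // Truncation can leave the state up to 2^shift - 1 counts below a constant input.
    f->state += (input - f->state) >> f->shift;
    return f->state;
}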

// Ring buffer median filter
typedef struct {
    int16_t buffer[5];
    uint8_t index;
} median_filter_t;

int16_t median_filter_update(median_filter_t *f, int16_t input) {
    f->buffer[f->index++] = input;
    if (f->index >= 5) f->index = 0;

    // Sort (optimized for small arrays)
    int16_t sorted[5];
    memcpy(sorted, f->buffer, sizeof(sorted));
    for (int i = 0; i < 4; i++) {
        for (int j = i+1; j < 5; j++) {
            if (sorted[i] > sorted[j]) {
                int16_t tmp = sorted[i];
                sorted[i] = sorted[j];
                sorted[j] = tmp;
            }
        }
    }
    return sorted[2];  // Middle element
}

Performance Comparison (ESP32, 240MHz):

| Filter | Execution Time | RAM Usage | Latency (samples) |
|---|---|---|---|
| Moving Average (N=10) | 0.8 us | 40 bytes | 5 |
| EMA (fixed-point) | 0.2 us | 8 bytes | ~3 |
| Median (N=5) | 1.5 us | 10 bytes | 2 |
| IIR Low-pass | 0.3 us | 8 bytes | ~2 |
| Kalman (1D) | 2.0 us | 16 bytes | ~1 |

19.5.1 Voltage Level Mismatch

Pitfall: Voltage Level Mismatch Between Sensor and Microcontroller

The Mistake: Connecting 5V sensor outputs directly to 3.3V microcontroller inputs, or powering 3.3V sensors from 5V rails.

The Fix: Always verify voltage compatibility and use level shifting when needed:

  • 5V sensor to 3.3V MCU: Voltage divider (R1 = 10k in series, R2 = 20k to ground gives about 3.33V from 5V; suitable for one-way signals only) or bidirectional level shifter (BSS138-based)
  • 3.3V sensor to 5V MCU: Usually OK for digital, but use level shifter for reliable operation
  • I2C level shifting: Use dedicated I2C level shifters (PCA9306, TXS0102) – avoid TXB0104 which is push-pull only and incompatible with I2C open-drain signaling

Specific examples:

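  • ESP32 GPIO absolute max: 3.6V. A 5V input exceeds this rating and can permanently damage the pin or the chip
  • Raspberry Pi GPIO: 3.3V max. 5V input can damage the SoC
  • Arduino Uno: runs at 5V logic natively, so 5V sensors connect directly; the default analog reference is also 5V, so a 3.3V sensor uses only about two-thirds of the ADC range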

19.5.1.1 Voltage Divider Calculator

Design a resistive voltage divider for level shifting. The output voltage is \(V_{out} = V_{in} \times \frac{R_2}{R_1 + R_2}\).
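
A minimal sketch of the divider formula, with R1 taken as the resistor between the input and the output node and R2 as the resistor from the output node to ground. The function names are illustrative, and the calculation assumes an unloaded divider (the MCU input draws negligible current).

```python
def divider_output(v_in, r1_ohm, r2_ohm):
    """V_out of a divider with R1 on top (to V_in) and R2 to ground."""
    return v_in * r2_ohm / (r1_ohm + r2_ohm)


def divider_current_ma(v_in, r1_ohm, r2_ohm):
    """Standing current drawn from the source, in mA (unloaded divider)."""
    return 1000.0 * v_in / (r1_ohm + r2_ohm)


v_out = divider_output(5.0, 10_000, 20_000)
current = divider_current_ma(5.0, 10_000, 20_000)

print(f"V_out = {v_out:.2f} V, standing current = {current:.3f} mA")
print("Within the 3.6 V ESP32 limit:", v_out <= 3.6)
```

Larger resistor values reduce the standing current but slow down signal edges, so high-value dividers suit slow signals better than fast digital buses.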

19.5.2 Self-Heating Errors

Pitfall: Sensor Self-Heating Causing Temperature Errors

The Mistake: Continuously powering temperature sensors and taking rapid readings without accounting for self-heating.

The Fix: Implement duty-cycled sensing with thermal recovery time:

  • DHT22: Power consumption 1.5 mW during measurement. Allow minimum 2 seconds between readings (datasheet requirement). Self-heating error: ~0.3 °C with continuous polling
  • DS18B20: 1.5 mA active current at 5 V = 7.5 mW. Use 750 ms conversion time, then power down. Self-heating: ~0.1 °C with 1 Hz sampling
  • NTC Thermistors: Self-heating = \(I^2 \times R\). With 10 k\(\Omega\) thermistor at 100 \(\mu\)A: P = 0.1 mW (negligible). At 1 mA: P = 10 mW (significant)

Best practice: Power sensor only during measurement. If continuous monitoring needed, use 10-second intervals minimum for temperature sensors.

19.5.2.1 Self-Heating Power Calculator

Calculate the self-heating power dissipation for resistive sensors (thermistors, RTDs, strain gauges).
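
The same calculation as a short sketch. The power formula \(P = I^2 R\) is from the text; the dissipation constant used to convert power into a temperature rise (2 mW/°C here) is an assumed placeholder that should be replaced with the value from your sensor's datasheet.

```python
def self_heating_mw(current_a, resistance_ohm):
    """Power dissipated in a resistive sensor, in milliwatts (P = I^2 * R)."""
    return current_a ** 2 * resistance_ohm * 1000.0


def temperature_rise_c(power_mw, dissipation_mw_per_c):
    """Approximate self-heating error for a given dissipation constant."""
    return power_mw / dissipation_mw_per_c


ASSUMED_DISSIPATION_MW_PER_C = 2.0  # placeholder; check the datasheet

for current_a in (100e-6, 1e-3):
    power = self_heating_mw(current_a, 10_000)  # 10 kOhm thermistor
    rise = temperature_rise_c(power, ASSUMED_DISSIPATION_MW_PER_C)
    print(f"{current_a * 1e3:.1f} mA: {power:.1f} mW, about {rise:.2f} °C rise")
```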

19.6 Summary

Six key signal processing concepts: sampling and Nyquist theorem, filtering types, calibration, quantization, oversampling for noise reduction, and data rate management

Figure 19.3: Key concepts in sensor signal processing

Key signal processing takeaways:

| Concept | Rule | Related To | Why It Matters |
|---|---|---|---|
| Sampling | Sample at >2x highest frequency | Aliasing | Prevents false low-frequency patterns |
| Outlier Rejection | Use median filter (window 5-7) | Spike noise | Sorts values, picks middle, rejects spikes completely |
| Noise Smoothing | Use moving average or EMA | Gaussian noise | Reduces random noise while preserving trends |
| Calibration | Recalibrate periodically (quarterly) | Sensor aging | Temperature cycling and contamination degrade accuracy |
| Voltage Matching | Use level shifters for 5V→3.3V | GPIO protection | BSS138 or resistor divider prevents MCU damage |
Critical Mistakes to Avoid
  1. Never sample below Nyquist rate – You will see phantom frequencies that do not exist
  2. Never use raw ADC values for control – Noise causes oscillation and false triggers
  3. Never connect 5V sensors to 3.3V MCUs without protection – Damage can be immediate and permanent
  4. Never assume factory calibration lasts forever – Drift is inevitable; recalibrate quarterly

19.7 Try It Yourself

Test your understanding by implementing a filter from scratch:

Challenge: Write a median filter function that removes outlier spikes from ultrasonic distance sensor readings.

Given: An ultrasonic sensor outputs: [5, 5, 250, 5, 6, 5, 180, 5, 6, 5] cm

Your task: Implement a median filter with window size 5 that removes the 250cm and 180cm spikes.

Solution:
from collections import deque

class MedianFilter:
    def __init__(self, window_size=5):
        self.buffer = deque(maxlen=window_size)

    def filter(self, value):
        self.buffer.append(value)
        if len(self.buffer) < self.buffer.maxlen:
            return value  # Not enough data yet
        sorted_buffer = sorted(self.buffer)
        return sorted_buffer[len(sorted_buffer) // 2]

# Test
readings = [5, 5, 250, 5, 6, 5, 180, 5, 6, 5]
mf = MedianFilter(window_size=5)

for reading in readings:
    filtered = mf.filter(reading)
    print(f"Raw: {reading:3d} cm  →  Filtered: {filtered:3d} cm")

# Output:
# Raw:   5 cm  →  Filtered:   5 cm  (buffer filling)
# Raw:   5 cm  →  Filtered:   5 cm  (buffer filling)
# Raw: 250 cm  →  Filtered: 250 cm  (buffer filling - not yet filtering)
# Raw:   5 cm  →  Filtered:   5 cm  (buffer filling)
# Raw:   6 cm  →  Filtered:   5 cm  ← Buffer full, median rejects 250!
# Raw:   5 cm  →  Filtered:   5 cm
# Raw: 180 cm  →  Filtered:   6 cm  ← Spike mostly rejected
# Raw:   5 cm  →  Filtered:   5 cm
# Raw:   6 cm  →  Filtered:   6 cm
# Raw:   5 cm  →  Filtered:   5 cm
Key insight: Once the buffer is full, the median filter sorts [5, 5, 250, 5, 6] into [5, 5, 5, 6, 250] and returns the middle value (5), completely rejecting the 250 cm outlier. Note that during the fill-up phase (first N-1 readings), unfiltered values pass through – in production code, you may want to wait until the buffer is full before acting on readings.

Common Pitfalls

A moving average window sized for a slow temperature sensor will blur rapid vibration or impact events into meaningless flat lines. Match the filter window to the expected rate of change of the measured quantity — use a short window (3-5 samples) for fast signals and a longer window (10-50 samples) for slow, noisy signals.

Sampling a 50 Hz vibration signal at 60 Hz (below the 2x = 100 Hz Nyquist minimum) creates a false 10 Hz component in the output that does not exist in the real signal. Always sample at more than 2x the highest frequency component present in the sensor signal, and use an anti-aliasing filter before the ADC.

Some calibration algorithms (like two-point linear calibration) should be applied to raw ADC values before filtering, while others work better on filtered values. Define and document which stage of the processing pipeline calibration occurs at, and be consistent between calibration capture and normal operation.
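
As an illustration of two-point linear calibration applied to raw ADC values, here is a minimal sketch. The raw counts and the ice-bath/boiling-water reference temperatures are hypothetical example numbers, and the function names are illustrative.

```python
def two_point_calibration(raw_low, ref_low, raw_high, ref_high):
    """Return (scale, offset) mapping raw readings onto reference values."""
    scale = (ref_high - ref_low) / (raw_high - raw_low)
    offset = ref_low - scale * raw_low
    return scale, offset


def apply_calibration(raw, scale, offset):
    """Convert a raw reading to engineering units."""
    return scale * raw + offset


# Hypothetical capture: 410 counts in an ice bath (0 °C),
# 3690 counts in boiling water (100 °C).
scale, offset = two_point_calibration(410, 0.0, 3690, 100.0)

for raw in (410, 2050, 3690):
    print(f"raw {raw:4d} -> {apply_calibration(raw, scale, offset):.1f} °C")
```

Whichever stage you choose, capture the calibration points at that same stage: coefficients derived from raw counts should be applied to raw counts, and coefficients derived from filtered values to filtered values.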

The median filter requires storing a window of recent samples to find the middle value. Implementing it without a properly sized ring buffer causes it to compare only newly arrived samples, defeating its outlier rejection purpose. Pre-allocate the full filter window buffer and initialize it before beginning normal sensor operation.

19.8 What’s Next

Now that you can apply signal processing techniques to sensor data:

| Chapter | Focus | Connection to Signal Processing |
|---|---|---|
| Sensor Classification | Sensor categories and output types | Different sensor types require different filtering strategies |
| Sensor Specifications | Response time, resolution, accuracy | Specifications determine sampling rate and filter parameters |
| Calibration Techniques | Hands-on calibration methods | Compensate for the drift discussed in this chapter |
| Common Mistakes | Top 10 sensor pitfalls | Voltage mismatch and sampling errors expanded further |
| Hands-On Labs | ESP32 filter implementations | Build and test the filters covered here on real hardware |
