15  Signal Processing & Filtering

Learning Objectives

After completing this chapter, you will be able to:

  • Calculate minimum sampling rates using the Nyquist theorem and diagnose aliasing artefacts
  • Apply digital filters (moving average, EMA, median) to sensor data
  • Implement hardware and software signal conditioning
  • Detect and compensate for sensor calibration drift
  • Select appropriate filters for different noise types and justify your choice

In 60 Seconds

Raw sensor readings always contain noise: random variation from electrical interference, temperature fluctuations, and ADC quantization. Filtering reduces that noise. A moving average smooths continuous data, a median filter rejects spike outliers, and an exponential moving average (EMA) balances noise rejection against response speed. Sample at more than 2x your signal's highest frequency (the Nyquist criterion) to avoid aliasing, the false frequencies that appear when the sampling rate is too low.

MVU: Minimum Viable Understanding

If you only have 5 minutes, here’s what you need to know about sensor signal processing:

  1. Sample above 2x the highest frequency - The Nyquist theorem states you must sample at more than twice the signal's highest frequency to avoid aliasing (false patterns)
  2. Filter your data - Raw sensor readings are noisy; use a moving average for smooth data and a median filter for outlier rejection
  3. Calibrate regularly - Sensors drift over time; factory calibration degrades and needs periodic recalibration
  4. Match voltage levels - 5V sensor outputs connected directly to 3.3V microcontroller inputs can cause damage; use level shifters or voltage dividers

Bottom line: Good signal processing separates professional IoT systems from unreliable prototypes. These fundamentals prevent false alarms, noisy dashboards, and wasted debugging time.

Raw sensor readings are almost always noisy: they jump around even when nothing is changing, like a bathroom scale that flickers between 69 and 71 kg while you stand still. Signal processing cleans up this noise using techniques like averaging (take several readings and use their mean) and median filtering (take the middle value, which ignores sudden spikes that are clearly wrong). These simple techniques turn unreliable raw data into smooth, trustworthy measurements.

15.1 Prerequisites

Learning Resources:

  • Quizzes Hub - Test your signal processing knowledge with interactive questions on sampling, filtering, and calibration
  • Simulations Hub - Explore the Signal Processing Workbench to visualize filter effects in real-time
  • Labs Hub - Practice ADC sampling and digital filtering on ESP32 with Wokwi simulations
  • Knowledge Gaps Hub - Address common misconceptions about sampling rates and filter selection
  • Knowledge Map - See how signal processing connects to sensor calibration, ADC fundamentals, and data quality

Key Takeaway

In one sentence: Signal processing transforms noisy raw sensor data into clean, reliable measurements through proper sampling, filtering, and calibration.

Remember this rule: Sample at >2x your highest frequency of interest, filter with the right algorithm for your noise type (moving average for Gaussian, median for outliers), and recalibrate sensors periodically to maintain accuracy.

Meet the Clean-Up Crew!

Imagine you’re trying to hear your friend whisper in a noisy cafeteria. That’s what sensors deal with every day - they’re trying to measure the real temperature or distance, but there’s lots of “noise” getting in the way!

The Signal Processing Superheroes:

  • Sammy the Sampler takes pictures of the signal really fast - like taking 100 photos per second of a spinning wheel so you can see each spoke clearly!
  • Fiona the Filter is like a picky eater who only keeps the good readings and throws away the weird ones
  • Cal the Calibrator makes sure your ruler starts at exactly zero, not at 1 inch!

Why does this matter?

Think of your thermometer app. Without signal processing:

  • It might jump from 70 to 250 and back to 72 (scary but not real!)
  • It might show a temperature that happened 5 minutes ago
  • It might always read 3 degrees too high

Real-world example: When you use a fitness tracker, it filters out the noise from your wrist moving around and only counts actual steps. That’s signal processing in action!

Fun Fact: Your phone’s screen has a filter that ignores accidental touches when you’re holding it - that’s why it doesn’t go crazy when your palm touches the screen!

15.2 Sensor Data Acquisition Pitfalls

These common mistakes cause incorrect sensor readings, false alarms, and wasted development time. Understanding each pitfall – and its fix – will save you hours of debugging.

15.2.1 Ignoring Nyquist Sampling Rate (Aliasing)

The Mistake

Sampling signals at or below twice their highest frequency component, which causes false low-frequency patterns to appear.

Symptoms:

  • False patterns appear in data that don’t exist in the real signal
  • High-frequency events (vibrations, transients) are missed entirely
  • Frequency analysis shows phantom signals at incorrect frequencies

Why it happens: Developers choose sample rates based on storage or bandwidth constraints rather than on signal characteristics, and the Nyquist requirement (sample at >2x the highest frequency) is often misunderstood.

How to diagnose:

  1. Identify the highest frequency component in your signal of interest
  2. Compare your sample rate to 2x that frequency
  3. Change the sample rate slightly and watch whether the pattern's frequency shifts: real components stay put, aliases move
  4. Use an oscilloscope or high-speed capture to see the true signal
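Diagnosis step 3 works because an alias moves when the sample rate changes, while a real component does not. The apparent frequency can be predicted by folding the true frequency around multiples of the sample rate. This minimal sketch assumes a single pure tone and an ideal sampler with no anti-aliasing filter. The helper name `alias_frequency` is hypothetical.

```python
def alias_frequency(f_signal_hz: float, fs_hz: float) -> float:
    """Apparent frequency of a pure tone f_signal_hz sampled at fs_hz.

    Assumes an ideal sampler and no anti-aliasing filter. The result
    always lies between 0 and the Nyquist frequency fs_hz / 2.
    """
    if fs_hz <= 0:
        raise ValueError("sample rate must be positive")
    folded = f_signal_hz % fs_hz           # fold into [0, fs)
    return min(folded, fs_hz - folded)     # reflect into [0, fs/2]


# A 60 Hz vibration seen at three different sample rates
for fs in (50.0, 55.0, 200.0):
    print(f"fs = {fs:5.0f} Hz -> appears at {alias_frequency(60.0, fs):.0f} Hz")
```

The apparent frequency jumps from 10 Hz to 5 Hz when the rate changes from 50 Hz to 55 Hz, which marks it as an alias. At 200 Hz the tone appears at its true 60 Hz.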

The fix: Sample at more than twice the highest frequency you care about, then block frequencies above the Nyquist limit (half the sample rate) before they reach the ADC.

import time

# WRONG: sampling a 60 Hz vibration at 50 Hz
# The 60 Hz tone aliases to |60 - 50| = 10 Hz: a false 10 Hz signal appears
def sample_at_50hz(sensor):
    while True:
        reading = sensor.read()
        time.sleep(0.02)  # ~50 Hz sample rate - causes aliasing!

# CORRECT: sample above 2x the highest frequency of interest
# For a 60 Hz signal, sample above 120 Hz (200 Hz recommended)
def sample_at_200hz(sensor):
    while True:
        reading = sensor.read()
        time.sleep(0.005)  # ~200 Hz; sleep() ignores read time, use a hardware timer for exact rates

# BEST: add an anti-aliasing low-pass filter in HARDWARE, before the ADC.
# Software cannot do this step: once sampled, an alias is indistinguishable
# from a real signal. A first-order RC filter has cutoff fc = 1 / (2*pi*R*C);
# place fc above the signal band and below the Nyquist frequency (fs / 2).

Prevention: Sample at >2x the highest frequency of interest (4-10x is better practice). Use hardware anti-aliasing filters (low-pass RC filter) before the ADC.

Scenario: You are designing a predictive maintenance system for industrial motors. Bearing failures produce vibration signatures between 10 Hz (slow bearing wear) and 5 kHz (bearing race defects). What sampling rate do you need?

Step 1: Identify Highest Frequency Component

  • Lowest frequency of interest: 10 Hz for slow wear patterns.
  • Highest frequency of interest: 5000 Hz for bearing race defects.

Step 2: Apply Nyquist Theorem

The Nyquist-Shannon sampling theorem requires the sampling rate to be greater than twice the highest frequency. For bearing defects at 5 kHz, the theoretical minimum is 2 x 5000 Hz = 10 kHz.

Step 3: Add Safety Margin

In practice, sample at 4-10× the highest frequency of interest for anti-aliasing margin:

Recommended sampling rate: 4 x 5000 Hz = 20 kHz.

Why the margin? Real anti-aliasing filters have gradual rolloff. A 2nd-order filter falls off at only 40 dB/decade (12 dB/octave). A 2nd-order Butterworth filter with a 5 kHz cutoff attenuates by only about 12 dB at 10 kHz, so it still passes about 25% of the signal amplitude.

Sampling at exactly 10 kHz leaves no room for this transition band. Energy just above 5 kHz is attenuated by only about 3 dB before it folds back into the band of interest. Sampling at 20 kHz moves the Nyquist frequency to 10 kHz. Only content above 15 kHz can then fold back into the 0-5 kHz band, and the same filter attenuates that content by roughly 19 dB or more. This is why practical systems use 4x to 10x the highest frequency, not only the mathematical 2x minimum.
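The margin argument can be checked numerically. For a sample rate f_s, the lowest frequency that folds back into the 0 to f_max band is f_s - f_max. The filter's attenuation at that frequency therefore bounds how much out-of-band energy leaks in. This sketch assumes the ideal 2nd-order Butterworth magnitude response |H(f)| = 1/sqrt(1 + (f/f_c)^(2n)) with f_c = 5 kHz. The function names are hypothetical.

```python
import math

def butterworth_gain(f_hz: float, fc_hz: float, order: int = 2) -> float:
    """Magnitude of an ideal Butterworth low-pass: 1 / sqrt(1 + (f/fc)^(2n))."""
    return 1.0 / math.sqrt(1.0 + (f_hz / fc_hz) ** (2 * order))

def fold_back_attenuation_db(fs_hz: float, f_max_hz: float, fc_hz: float,
                             order: int = 2) -> float:
    """Filter gain (dB) at fs - f_max, the lowest frequency that aliases
    into the 0..f_max band of interest."""
    f_fold = fs_hz - f_max_hz
    return 20.0 * math.log10(butterworth_gain(f_fold, fc_hz, order))

# Bearing example: band of interest up to 5 kHz, 2nd-order filter at 5 kHz
for fs in (10_000, 20_000, 50_000):
    print(f"fs = {fs} Hz: {fold_back_attenuation_db(fs, 5000, 5000):.1f} dB")
```

This prints about -3.0 dB, -19.1 dB, and -38.2 dB. Each step up in sample rate buys more rejection from the same filter.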

Step 4: Choose ADC

ADC | Max Sample Rate | Resolution | Cost | Suitable?
ESP32 built-in | 200 kSPS | 12-bit | $5 | Yes, overkill
ADS1115 (I2C) | 860 SPS | 16-bit | $8 | No, too slow
ADS1256 (SPI) | 30 kSPS | 24-bit | $20 | Yes, high precision
MCP3008 (SPI) | 200 kSPS | 10-bit | $3 | Yes, budget

Choice: ESP32 built-in ADC at 20 kHz sampling rate

Step 5: Calculate Data Storage Requirements

  • Sample rate: 20,000 samples per second.
  • Sample size: 2 bytes per sample, storing a 12-bit ADC value in 16 bits.
  • Data rate: 20,000 x 2 = 40,000 bytes/s, or about 40 KB/s.
  • Per hour: 40 KB/s x 3600 s = 144 MB.
  • Per day: 144 MB/h x 24 h = 3.5 GB.

Optimization: Store only when vibration exceeds threshold (event-driven recording):

If the machine is above the vibration threshold only 1% of the time, storage falls from 3.5 GB/day to about 35 MB/day.
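The storage arithmetic above can be captured in a small helper for trying other rates or duty cycles. This sketch assumes decimal units (1 KB = 1000 bytes), matching the figures above, and ignores timestamps and file-system overhead. The function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400

def bytes_per_day(sample_rate_hz: float, bytes_per_sample: int,
                  duty: float = 1.0) -> float:
    """Raw storage per day; duty is the fraction of time actually recorded."""
    return sample_rate_hz * bytes_per_sample * SECONDS_PER_DAY * duty

continuous = bytes_per_day(20_000, 2)               # always recording
event_driven = bytes_per_day(20_000, 2, duty=0.01)  # above threshold 1% of the time
print(f"continuous: {continuous / 1e9:.2f} GB/day")
print(f"event-driven: {event_driven / 1e6:.1f} MB/day")
```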

Step 6: Power Consumption

Continuous 20 kHz sampling is power-intensive:

  • ESP32 current draw while sampling: about 80 mA.
  • Daily charge: 80 mA x 24 h = 1920 mAh/day.
  • A 2000 mAh battery lasts roughly 2000 / 1920 = 1.04 days, which is impractical.

Solution: Sample in bursts (1 second every 60 seconds):

  • Active charge per minute: 80 mA x 1 s = 80 mA·s, or 80 / 3600 ≈ 0.022 mAh.
  • Sleep current for the remaining 59 seconds is negligible by comparison.
  • Daily charge: 80 / 3600 mAh x 1440 bursts = 32 mAh/day.
  • Battery life: 2000 / 32 ≈ 62 days, roughly 60 times longer than continuous sampling.
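The duty-cycle calculation generalizes to any "active, then sleep" cycle. This sketch reuses the 80 mA figure from above. The optional `sleep_ma` argument defaults to 0, matching the "negligible" assumption; pass a sleep current from your board's datasheet for a more realistic estimate. The function name is hypothetical.

```python
def daily_charge_mah(active_ma: float, active_s: float, period_s: float,
                     sleep_ma: float = 0.0) -> float:
    """Average daily charge for a repeating 'active then sleep' cycle."""
    duty = active_s / period_s
    average_ma = active_ma * duty + sleep_ma * (1.0 - duty)
    return average_ma * 24.0

BATTERY_MAH = 2000.0
continuous = daily_charge_mah(80.0, 1.0, 1.0)   # duty = 100%
burst = daily_charge_mah(80.0, 1.0, 60.0)       # 1 s every 60 s
print(f"continuous: {continuous:.0f} mAh/day, {BATTERY_MAH / continuous:.2f} days")
print(f"burst: {burst:.1f} mAh/day, {BATTERY_MAH / burst:.1f} days")
```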

Real-World Implementation:

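// Arduino-style sketch. captureBurst, calculateRMS, sendAlert, sleepSeconds,
// buffer, ADC_PIN and THRESHOLD are application-defined, not Arduino core APIs.
void loop() {
  captureBurst(ADC_PIN, 20000, 20000);  // 1 s burst: 20,000 samples at 20 kHz
  float rms = calculateRMS(buffer);     // vibration level of this burst
  if (rms > THRESHOLD) sendAlert(rms);  // transmit only abnormal bursts
  sleepSeconds(59);  // on ESP32, light sleep resumes here; deep sleep reboots into setup()
}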
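`calculateRMS` is not defined in the loop above. The sketch below shows one plausible version, written in Python for readability. It subtracts the burst mean first. The assumption is that the vibration sensor's output sits on a DC bias (here a hypothetical mid-scale value of 2048 counts). Without removing that bias, the RMS would mostly measure the bias rather than the vibration.

```python
import math

def calculate_rms(buffer, remove_dc=True):
    """Root-mean-square of a burst of samples.

    With remove_dc=True the burst mean (DC bias) is subtracted first,
    so the result reflects only the AC (vibration) component.
    """
    if not buffer:
        raise ValueError("empty buffer")
    n = len(buffer)
    mean = sum(buffer) / n if remove_dc else 0.0
    return math.sqrt(sum((s - mean) ** 2 for s in buffer) / n)

# Example: a +/-100-count sine riding on an assumed mid-scale bias of 2048
burst = [2048 + 100 * math.sin(2 * math.pi * k / 20) for k in range(200)]
print(round(calculate_rms(burst), 1))                # AC RMS only (about 100 / sqrt(2))
print(round(calculate_rms(burst, remove_dc=False)))  # dominated by the bias
```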

Key Insights:

  1. Always sample at 4-10× the highest frequency — not just 2× — to allow for anti-aliasing filter rolloff
  2. High sample rates generate massive data — use threshold-based recording or edge processing (FFT) to reduce storage
  3. Continuous high-speed ADC drains batteries fast — use burst sampling with sleep intervals for battery-powered deployments
  4. FFT on-device saves bandwidth — transmit vibration frequency spectrum (100 bytes) instead of raw waveform (40 KB/second)
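Insight 4 can be sketched as a "spectrum summary" built in three steps:

1. Compute the magnitude spectrum of a burst.
2. Keep only the peak amplitude in each of a fixed number of frequency bands.
3. Quantize each band to 16 bits.

With 50 bands, the payload is 50 x 2 = 100 bytes, which is one way to reach the ~100-byte figure in the insight. The helper names, band count, and full-scale value are assumptions. The naive O(N^2) DFT is used only for clarity; an on-device version would use an optimized FFT routine instead.

```python
import cmath
import math
import struct

def magnitude_spectrum(samples):
    """Single-sided amplitude spectrum via a naive DFT (bins 0 .. N/2 - 1)."""
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]              # remove DC bias
    mags = []
    for k in range(n // 2):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(2.0 * abs(acc) / n)          # amplitude of a sine in bin k
    return mags

def band_summary(mags, n_bands):
    """Peak amplitude in each of n_bands equal-width frequency bands."""
    width = len(mags) // n_bands
    return [max(mags[b * width:(b + 1) * width]) for b in range(n_bands)]

def pack_bands(bands, full_scale):
    """Quantize each band to an unsigned 16-bit value (2 bytes per band)."""
    q = [min(65535, round(b / full_scale * 65535)) for b in bands]
    return struct.pack(f"<{len(q)}H", *q)

# Example: 100 Hz tone, fs = 1 kHz, 200 samples -> 100 bins of 5 Hz
fs, n = 1000, 200
tone = [math.sin(2 * math.pi * 100 * t / fs) for t in range(n)]
bands = band_summary(magnitude_spectrum(tone), n_bands=50)
payload = pack_bands(bands, full_scale=2.0)
print(len(payload), "bytes; peak in band", bands.index(max(bands)))
```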

15.2.1.1 Nyquist Sampling Rate Calculator

Use this calculator to determine the minimum and recommended sampling rates for your signal.
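The same calculation in code is a minimal sketch that follows the rules above. The Nyquist minimum is 2x the highest frequency and must be strictly exceeded. The recommended rate applies a 4-10x margin. The function names are hypothetical.

```python
def sampling_rates(f_max_hz: float, margin: float = 4.0):
    """Return (Nyquist minimum, recommended rate) in Hz.

    The sampling rate must be strictly greater than the Nyquist minimum;
    margin (typically 4-10) leaves room for anti-aliasing filter rolloff.
    """
    if f_max_hz <= 0:
        raise ValueError("highest frequency must be positive")
    if margin <= 2:
        raise ValueError("margin must exceed the 2x Nyquist factor")
    return 2.0 * f_max_hz, margin * f_max_hz

def meets_nyquist(fs_hz: float, f_max_hz: float) -> bool:
    """True if fs satisfies the Nyquist criterion fs > 2 * f_max."""
    return fs_hz > 2.0 * f_max_hz

nyq, rec = sampling_rates(5000.0)   # bearing example
print(nyq, rec, meets_nyquist(10_000, 5000), meets_nyquist(20_000, 5000))
```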

15.2.2 Ignoring Sensor Calibration Drift

The Mistake

Using factory calibration forever, assuming sensors maintain accuracy over time.

Symptoms:

  • Gradual accuracy degradation over weeks/months
  • Systematic bias in readings (consistently high or low)
  • False anomaly detections as sensors drift past thresholds

Why it happens: Sensors drift due to aging, temperature cycling, humidity exposure, and contamination. Factory calibration is a snapshot that degrades over time.

The fix: Treat calibration as a repeating maintenance process, not a one-time factory setting.

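from datetime import datetime

# Example calibration state (values are placeholders for illustration)
scale = 0.01             # degC per ADC count
offset = 0.0             # degC, updated by recalibrate()
DRIFT_C_PER_DAY = 0.002  # estimated drift rate; positive = sensor reads high
last_calibration = datetime.now()

def read_temperature(adc):
    raw = adc.read()
    days = (datetime.now() - last_calibration).days
    drift = DRIFT_C_PER_DAY * days       # error accumulated since last calibration
    return raw * scale + offset - drift  # subtract it to compensate

def recalibrate(adc, reference_c):
    """Re-zero against a trusted reference and restart the drift clock."""
    global offset, last_calibration
    offset = reference_c - adc.read() * scale
    last_calibration = datetime.now()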

Prevention: Implement periodic recalibration procedures (quarterly for many sensors). Use reference sensors for cross-validation.
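Cross-validation against a reference sensor can be turned into a drift estimate. At each check, log the difference between the field sensor and a trusted reference. Then fit a straight line and project when the error will exceed your tolerance. Linear drift, the 0.5 °C tolerance, and the monthly readings below are illustrative assumptions, and the function names are hypothetical.

```python
def fit_drift(days, errors_c):
    """Least-squares line through (day, sensor - reference) error samples.

    Returns (drift_c_per_day, offset_c).
    """
    n = len(days)
    if n < 2:
        raise ValueError("need at least two comparisons")
    mean_d = sum(days) / n
    mean_e = sum(errors_c) / n
    sxx = sum((d - mean_d) ** 2 for d in days)
    if sxx == 0:
        raise ValueError("comparisons must span more than one day")
    sxy = sum((d - mean_d) * (e - mean_e) for d, e in zip(days, errors_c))
    slope = sxy / sxx
    return slope, mean_e - slope * mean_d

def days_until_out_of_tolerance(drift_c_per_day, offset_c, tolerance_c):
    """Day on which the fitted error line reaches +/- tolerance_c."""
    if drift_c_per_day == 0:
        return float("inf")
    target = tolerance_c if drift_c_per_day > 0 else -tolerance_c
    return (target - offset_c) / drift_c_per_day

# Hypothetical log: sensor minus reference thermometer, checked monthly
days = [0, 30, 60, 90]
errors = [0.00, 0.06, 0.12, 0.18]
drift, offset = fit_drift(days, errors)
limit_day = days_until_out_of_tolerance(drift, offset, 0.5)
print(f"{drift:.4f} degC/day, recalibrate by day {limit_day:.0f}")
```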

15.2.3 Using Raw Sensor Data Without Filtering

The Mistake

Using raw ADC values directly for decisions, displays, and storage without noise filtering.

Symptoms:

  • Noisy, jumpy dashboards and charts
  • False anomaly alerts from noise spikes
  • Incorrect trend analysis due to noise obscuring patterns
  • Unstable control loops that oscillate

The fix: Choose a filter based on the kind of noise you see. Use a moving average for random noise, an exponential average for low memory, and a median filter for spikes.

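from collections import deque

window = deque(maxlen=10)       # moving-average history (last 10 samples)
spike_window = deque(maxlen=5)  # median-filter history (last 5 samples)
ema = None                      # EMA state; None until the first sample

def moving_average(raw):
    """Mean of the last 10 samples: suits random (Gaussian) noise."""
    window.append(raw)
    return sum(window) / len(window)

def exponential_average(raw, alpha=0.1):
    """EMA with one state variable; smaller alpha = smoother but slower."""
    global ema
    ema = raw if ema is None else alpha * raw + (1 - alpha) * ema
    return ema

def median_filter(raw):
    """Middle of the last 5 samples: rejects isolated spikes."""
    spike_window.append(raw)
    ordered = sorted(spike_window)
    return ordered[len(ordered) // 2]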

15.3 Digital Filter Selection Guide

Choosing the right filter depends on your noise characteristics and application requirements:

Figure 15.1: Decision tree for selecting the appropriate digital filter. Random Gaussian noise calls for a moving average, spikes for a median filter, and tight memory limits for an exponential moving average. Advanced options include Kalman, IIR, and complementary filters.

Quick Reference Table:

Noise Type | Best Filter | Parameters | Use Case
Random Gaussian | Moving Average | Window = 10-20 | Temperature, humidity
Spikes/Outliers | Median | Window = 5-7 | Distance sensors, IR
High-frequency | IIR Low-pass | fc = max signal freq | Vibration, audio
Known statistics | Kalman | Q, R from data | IMU, tracking
50/60 Hz interference | Notch filter | fn = 50 or 60 Hz | Analog sensors near AC
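For the 50/60 Hz row, a common choice is a second-order IIR notch. This sketch uses the notch coefficients from Robert Bristow-Johnson's Audio EQ Cookbook in direct form I. The class name, the Q of 10, and the 1 kHz sample rate in the example are assumptions. A higher Q gives a narrower notch but takes longer to settle.

```python
import math

class NotchFilter:
    """Second-order IIR notch (Audio EQ Cookbook coefficients, direct form I)."""

    def __init__(self, f0_hz: float, fs_hz: float, q: float = 10.0):
        w0 = 2.0 * math.pi * f0_hz / fs_hz
        alpha = math.sin(w0) / (2.0 * q)
        cos_w0 = math.cos(w0)
        a0 = 1.0 + alpha
        # Coefficients normalized so that a0 == 1
        self.b0 = 1.0 / a0
        self.b1 = -2.0 * cos_w0 / a0
        self.b2 = 1.0 / a0
        self.a1 = -2.0 * cos_w0 / a0
        self.a2 = (1.0 - alpha) / a0
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0   # past inputs/outputs

    def process(self, x: float) -> float:
        y = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y

fs = 1000.0
hum = NotchFilter(50.0, fs)   # remove 50 Hz mains pickup
signal = [math.sin(2 * math.pi * 5 * t / fs) + 0.5 * math.sin(2 * math.pi * 50 * t / fs)
          for t in range(3000)]
clean = [hum.process(s) for s in signal]
```

Once the filter settles, the 5 Hz component passes almost unchanged while the 50 Hz pickup is removed. For 60 Hz mains, construct the filter with `f0_hz=60.0`.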

15.3.1 EMA Smoothing Factor Explorer

Adjust the EMA alpha parameter to see how it affects the time constant and responsiveness. A lower alpha gives smoother output but slower response to real changes.
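The relationship the explorer visualizes follows from the EMA step response. After n samples, the output has covered 1 - (1 - alpha)^n of a step change. The sketch below turns that into three quantities:

- a time constant (63.2% of the step)
- a 95% settling time
- the inverse mapping, which picks alpha for a target time constant

The function names are hypothetical.

```python
import math

def ema_time_constant_samples(alpha: float) -> float:
    """Samples for a step response to reach 63.2% (1 - 1/e) of its final value."""
    return -1.0 / math.log(1.0 - alpha)

def ema_settling_samples(alpha: float, fraction: float = 0.95) -> int:
    """Samples until a step response reaches `fraction` of its final value."""
    return math.ceil(math.log(1.0 - fraction) / math.log(1.0 - alpha))

def alpha_for_time_constant(tau_s: float, fs_hz: float) -> float:
    """Alpha giving a time constant of tau_s seconds at sample rate fs_hz."""
    return 1.0 - math.exp(-1.0 / (tau_s * fs_hz))

for a in (0.05, 0.1, 0.3):
    print(f"alpha={a}: tau={ema_time_constant_samples(a):.1f} samples, "
          f"95% after {ema_settling_samples(a)} samples")
```

For example, `alpha_for_time_constant(2.0, 10.0)` gives about 0.049, which is the alpha for a 2-second time constant when sampling at 10 Hz.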

15.4 Signal Conditioning Chain

The complete signal conditioning chain for a sensor transforms raw physical measurements into clean digital data:

Figure 15.2: Complete signal conditioning chain from raw sensor to clean output. Its six stages take a raw millivolt-level sensor signal through amplification, an anti-alias filter, level shifting, and ADC conversion to clean, calibrated output in engineering units.

Stage-by-Stage Explanation:

Stage | Purpose | Key Parameters
Raw Sensor | Physical measurement | Sensitivity, range
Amplification | Scale signal to ADC range | Gain (1x-1000x)
Analog Filter | Remove frequencies above Nyquist | Cutoff frequency
Sample & Hold | Freeze signal during conversion | Acquisition time
ADC | Convert to digital | Resolution (bits), sample rate
Digital Filter | Remove noise, smooth data | Filter type, window size
Decimation | Reduce data rate | Decimation factor

Real-Time Implementation for Microcontrollers:

For small microcontrollers, keep filter state tiny:

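  • EMA needs one state variable and one shift value, because choosing alpha = 1/2^shift turns the multiply into a bit shift. The simplest update is state += (input - state) >> shift, but it truncates small differences. Storing the state scaled by 2^shift keeps those fractional bits.
  • A five-sample median filter needs only a five-value ring buffer: sort a copy and return the middle value.
  • Avoid floating point if the MCU is slow; fixed-point integer arithmetic is usually enough for sensor smoothing.

#include <stdint.h>
#define EMA_SHIFT 3            // alpha = 1/2^3 = 1/8
static int32_t ema_state = 0;  // filtered value scaled by 2^EMA_SHIFT

int32_t ema_update(int32_t input) {
  ema_state += input - (ema_state >> EMA_SHIFT);  // keeps fractional bits
  return ema_state >> EMA_SHIFT;                  // back to input units
}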
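The plain shift-based update `state += (input - state) >> shift` can lose resolution. Once a positive difference is smaller than 2^shift, the shifted increment is zero, so the filter can stall short of the true value. This Python simulation mirrors the C integer arithmetic and compares that update with a variant that stores the state scaled by 2^shift. The function names are hypothetical.

```python
def plain_fixed_ema(samples, shift=3):
    """state += (input - state) >> shift, with state in input units."""
    state = 0
    for x in samples:
        state += (x - state) >> shift
    return state

def scaled_fixed_ema(samples, shift=3):
    """Accumulator holds state * 2**shift, so fractional bits are kept."""
    acc = 0
    for x in samples:
        acc += x - (acc >> shift)
    return acc >> shift

step = [1000] * 200   # constant input of 1000 ADC counts
print(plain_fixed_ema(step), scaled_fixed_ema(step))
```

The plain update stops 7 counts short at 993, because (1000 - 993) >> 3 is 0. The scaled version reaches 1000.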

Performance Comparison (ESP32, 240MHz):

Filter | Execution Time | RAM Usage | Latency (samples)
Moving Average (N=10) | 0.8 µs | 40 bytes | 5
EMA (fixed-point) | 0.2 µs | 8 bytes | ~3
Median (N=5) | 1.5 µs | 10 bytes | 2
IIR Low-pass | 0.3 µs | 8 bytes | ~2
Kalman (1D) | 2.0 µs | 16 bytes | ~1

15.4.1 Voltage Level Mismatch

Pitfall: Voltage Level Mismatch Between Sensor and Microcontroller

The Mistake: Connecting 5V sensor outputs directly to 3.3V microcontroller inputs, or powering 3.3V sensors from 5V rails.

The Fix: Always verify voltage compatibility and use level shifting when needed:

  • 5V sensor to 3.3V MCU: Voltage divider (10k + 20k gives 3.3V from 5V) or bidirectional level shifter (BSS138-based)
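  • 3.3V sensor to 5V MCU: Often works for digital signals, but 3.3V sits close to many 5V inputs' logic-high threshold (for example 0.6 × VCC = 3.0V on the ATmega328P), so use a level shifter for reliable operation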
  • I2C level shifting: Use dedicated I2C level shifters (PCA9306, TXS0102) – avoid TXB0104 which is push-pull only and incompatible with I2C open-drain signaling

Specific examples:

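  • ESP32: supply and GPIO absolute maximum is about 3.6V, so a 5V input risks permanent damage
  • Raspberry Pi GPIO: 3.3V logic, not 5V tolerant; a 5V input can damage the SoC
  • Arduino Uno: 5V logic, so 5V sensors connect directly; its default analog reference is also 5V, so a 3.3V sensor uses only about two-thirds of the ADC range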

15.4.1.1 Voltage Divider Calculator

Design a resistive voltage divider for level shifting. The output voltage is \(V_{out} = V_{in} \times \frac{R_2}{R_1 + R_2}\), where \(R_1\) connects to \(V_{in}\) and \(R_2\) connects to ground.
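The divider formula can be used in both directions: to check a given pair of resistors, or to pick R1 for a target output. This sketch also reports the divider's Thevenin resistance (R1 in parallel with R2), which is what the input pin sees. A high value can slow how quickly an ADC input settles, so check your MCU datasheet for the recommended source impedance. A divider suits only one-way signals from sensor to MCU. The function names are hypothetical.

```python
def divider_vout(vin: float, r1: float, r2: float) -> float:
    """Vout = Vin * R2 / (R1 + R2), with R1 to Vin and R2 to ground."""
    return vin * r2 / (r1 + r2)

def r1_for_target(vin: float, vout: float, r2: float) -> float:
    """Top resistor R1 that produces vout for a chosen bottom resistor R2."""
    if not 0 < vout < vin:
        raise ValueError("need 0 < vout < vin")
    return r2 * (vin / vout - 1.0)

def source_resistance(r1: float, r2: float) -> float:
    """Thevenin resistance seen by the input pin: R1 in parallel with R2."""
    return r1 * r2 / (r1 + r2)

print(f"{divider_vout(5.0, 10e3, 20e3):.2f} V")   # the 10k + 20k example
print(f"R1 = {r1_for_target(5.0, 3.0, 20e3):.0f} ohm for 3.0 V")
print(f"source resistance = {source_resistance(10e3, 20e3):.0f} ohm")
```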

15.4.2 Self-Heating Errors

Pitfall: Sensor Self-Heating Causing Temperature Errors

The Mistake: Continuously powering temperature sensors and taking rapid readings without accounting for self-heating.

The Fix: Implement duty-cycled sensing with thermal recovery time:

  • DHT22: Draws about 1-1.5 mA during measurement (up to about 7.5 mW at 5 V). Allow at least 2 seconds between readings (datasheet requirement). Self-heating error: ~0.3 °C with continuous polling
  • DS18B20: 1.5 mA active current at 5 V = 7.5 mW. Use 750 ms conversion time, then power down. Self-heating: ~0.1 °C with 1 Hz sampling
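  • NTC Thermistors: Self-heating power is \(P = I^2 \times R\). With a 10 k\(\Omega\) thermistor at 100 \(\mu\)A: P = 0.1 mW (negligible). At 1 mA: P = 10 mW (significant). The resulting temperature error is roughly P divided by the thermistor's dissipation constant (mW/°C, from the datasheet)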

Best practice: Power the sensor only during measurement. If continuous monitoring is needed, use intervals of at least 10 seconds for temperature sensors.

15.4.2.1 Self-Heating Power Calculator

Calculate the self-heating power dissipation for resistive sensors (thermistors, RTDs, strain gauges).
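In code, the self-heating estimate is P = I^2 × R, and the steady-state temperature rise is P divided by the sensor's dissipation constant. The 1.5 mW/°C dissipation constant below is a hypothetical datasheet value for a small NTC in still air, and the function names are hypothetical. The duty-cycled average power is a good proxy for the temperature rise only when the on-time is short compared with the sensor's thermal time constant.

```python
def self_heating_power_w(current_a: float, resistance_ohm: float) -> float:
    """Power dissipated in a resistive sensor: P = I^2 * R."""
    return current_a ** 2 * resistance_ohm

def self_heating_error_c(power_w: float, dissipation_mw_per_c: float) -> float:
    """Steady-state temperature rise: P divided by the dissipation constant."""
    return power_w * 1e3 / dissipation_mw_per_c

def duty_cycled_power_w(power_w: float, on_s: float, period_s: float) -> float:
    """Average power when the sensor is energized for on_s each period."""
    return power_w * on_s / period_s

DISSIPATION_MW_PER_C = 1.5   # hypothetical datasheet value
for i in (100e-6, 1e-3):
    p = self_heating_power_w(i, 10e3)
    rise = self_heating_error_c(p, DISSIPATION_MW_PER_C)
    print(f"{i * 1e3:.1f} mA: {p * 1e3:.2f} mW, ~{rise:.2f} degC rise")
```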

15.5 Summary

Figure 15.3: Key concepts in sensor signal processing: sampling and the Nyquist theorem, filtering types, calibration, quantization, oversampling for noise reduction, and data rate management.

Key signal processing takeaways:

Concept | Rule | Related To | Why It Matters
Sampling | Sample at >2x highest frequency | Aliasing | Prevents false low-frequency patterns
Outlier Rejection | Use median filter (window 5-7) | Spike noise | Sorts values and picks the middle, rejecting isolated spikes (fewer than half the window)
Noise Smoothing | Use moving average or EMA | Gaussian noise | Reduces random noise while preserving trends
Calibration | Recalibrate periodically (quarterly) | Sensor aging | Temperature cycling and contamination degrade accuracy
Voltage Matching | Use level shifters for 5V to 3.3V | GPIO protection | BSS138 or resistor divider prevents MCU damage

Critical Mistakes to Avoid
  1. Never sample at or below the Nyquist rate – You will see phantom frequencies that do not exist
  2. Never use raw ADC values for control – Noise causes oscillation and false triggers
  3. Never connect 5V sensors to 3.3V MCUs without protection – You risk permanent damage
  4. Never assume factory calibration lasts forever – Drift is inevitable; recalibrate periodically (quarterly is a common starting point)

15.6 Try It Yourself

Test your understanding by implementing a filter from scratch:

Challenge: Write a median filter function that removes outlier spikes from ultrasonic distance sensor readings.

Given: An ultrasonic sensor outputs: [5, 5, 250, 5, 6, 5, 180, 5, 6, 5] cm

Your task: Implement a median filter with window size 5 that removes the 250 cm and 180 cm spikes.

Solution:
from collections import deque

class MedianFilter:
    def __init__(self, window_size=5):
        self.buffer = deque(maxlen=window_size)  # ring buffer of recent samples

    def filter(self, value):
        self.buffer.append(value)
        ordered = sorted(self.buffer)
        # Middle of the samples so far. While the buffer is still filling
        # with an even count, take the lower middle so one high spike cannot win
        return ordered[(len(ordered) - 1) // 2]

readings = [5, 5, 250, 5, 6, 5, 180, 5, 6, 5]
mf = MedianFilter(window_size=5)

for reading in readings:
    print(reading, "->", mf.filter(reading))
# Every output is 5 or 6: both spikes are rejected
Key insight: Once the buffer is full, the median filter sorts [5, 5, 250, 5, 6] into [5, 5, 5, 6, 250] and returns the middle value (5), rejecting the 250 cm outlier. A five-sample window can reject up to two spikes; three or more would outvote the real readings. While the buffer is still filling, this version returns the median of the samples collected so far, so the early 250 cm spike is suppressed too. Production code may still prefer to wait until the buffer is full before acting on readings.

Common Pitfalls

A moving average window sized for a slow temperature sensor will blur rapid vibration or impact events into meaningless flat lines. Match the filter window to the expected rate of change of the measured quantity — use a short window (3-5 samples) for fast signals and a longer window (10-50 samples) for slow, noisy signals.

Sampling a 50 Hz vibration signal at 60 Hz (below the 2x = 100 Hz Nyquist minimum) creates a false 10 Hz component in the output that does not exist in the real signal. Always sample above 2x the highest frequency component present in the sensor signal, and use an anti-aliasing filter before the ADC.

Some calibration algorithms (like two-point linear calibration) should be applied to raw ADC values before filtering, while others work better on filtered values. Define and document which stage of the processing pipeline calibration occurs at, and be consistent between calibration capture and normal operation.

The median filter requires storing a window of recent samples to find the middle value. Implementing it without a properly sized ring buffer causes it to compare only newly arrived samples, defeating its outlier rejection purpose. Pre-allocate the full filter window buffer and initialize it before beginning normal sensor operation.

15.7 What’s Next

Now that you can apply signal processing techniques to sensor data:

Chapter | Focus | Connection to Signal Processing
Calibration Techniques | Hands-on calibration methods | Compensate for the drift discussed in this chapter
Calibration Lab | Interactive two-point calibration practice | Turn filtering and calibration concepts into an experiment
Common Mistakes | Top 10 sensor pitfalls | Voltage mismatch and sampling errors expanded further
Hands-On Labs | ESP32 filter implementations | Build and test the filters covered here on real hardware
Power Management | Energy-aware sensing | Choose sampling and sleep strategies that preserve battery life

Continue to Sensor Calibration Techniques ->