19 Signal Processing & Filtering
Learning Objectives
After completing this chapter, you will be able to:
- Calculate minimum sampling rates using the Nyquist theorem and diagnose aliasing artefacts
- Apply digital filters (moving average, EMA, median) to sensor data
- Implement hardware and software signal conditioning
- Detect and compensate for sensor calibration drift
- Select appropriate filters for different noise types and justify your choice
For Beginners: Signal Processing and Filtering
Raw sensor readings are almost always noisy – they jump around even when nothing is changing, like a bathroom scale that flickers between 69 and 71 kg while you stand still. Signal processing cleans up this noise using techniques like averaging (take several readings and use their mean) and filtering (ignore sudden spikes that are clearly wrong). These simple techniques turn unreliable raw data into smooth, trustworthy measurements.
19.2 Prerequisites
- Sensor Introduction: Basic sensor concepts
- Sensor Specifications: Accuracy, precision, resolution
Cross-Hub Connections
Learning Resources:
- Quizzes Hub - Test your signal processing knowledge with interactive questions on sampling, filtering, and calibration
- Simulations Hub - Explore the Signal Processing Workbench to visualize filter effects in real-time
- Labs Hub - Practice ADC sampling and digital filtering on ESP32 with Wokwi simulations
- Knowledge Gaps Hub - Address common misconceptions about sampling rates and filter selection
- Knowledge Map - See how signal processing connects to sensor calibration, ADC fundamentals, and data quality
Key Takeaway
In one sentence: Signal processing transforms noisy raw sensor data into clean, reliable measurements through proper sampling, filtering, and calibration.
Remember this rule: Sample at >2x your highest frequency of interest, filter with the right algorithm for your noise type (moving average for Gaussian, median for outliers), and recalibrate sensors periodically to maintain accuracy.
Sensor Squad: Signal Processing for Kids!
Meet the Clean-Up Crew!
Imagine you’re trying to hear your friend whisper in a noisy cafeteria. That’s what sensors deal with every day - they’re trying to measure the real temperature or distance, but there’s lots of “noise” getting in the way!
The Signal Processing Superheroes:
- Sammy the Sampler takes pictures of the signal really fast - like taking 100 photos per second of a spinning wheel so you can see each spoke clearly!
- Fiona the Filter is like a picky eater who only keeps the good readings and throws away the weird ones
- Cal the Calibrator makes sure your ruler starts at exactly zero, not at 1 inch!
Why does this matter?
Think of your thermometer app. Without signal processing:
- It might jump from 70 to 250 and back to 72 (scary but not real!)
- It might show a temperature that happened 5 minutes ago
- It might always read 3 degrees too high
Real-world example: When you use a fitness tracker, it filters out the noise from your wrist moving around and only counts actual steps. That’s signal processing in action!
Fun Fact: Your phone’s screen has a filter that ignores accidental touches when you’re holding it - that’s why it doesn’t go crazy when your palm touches the screen!
19.3 Sensor Data Acquisition Pitfalls
These common mistakes cause incorrect sensor readings, false alarms, and wasted development time. Understanding each pitfall – and its fix – will save you hours of debugging.
19.3.1 Ignoring Nyquist Sampling Rate (Aliasing)
The Mistake
Sampling signals below twice their highest frequency component, causing false low-frequency patterns to appear.
Symptoms:
- False patterns appear in data that don’t exist in the real signal
- High-frequency events (vibrations, transients) are missed entirely
- Frequency analysis shows phantom signals at incorrect frequencies
Why it happens: Developers choose sample rates based on storage/bandwidth constraints rather than signal characteristics. The Nyquist theorem requirement (sample at >2x the highest frequency) isn’t understood.
How to diagnose:
- Identify the highest frequency component in your signal of interest
- Compare your sample rate to 2x that frequency
- Look for patterns that vary with sample rate changes
- Use an oscilloscope or high-speed capture to see the true signal
The fix:
import time

# WRONG: Sampling a 60 Hz vibration at 50 Hz
# Result: a false 10 Hz signal appears (aliasing)
while True:
    reading = sensor.read()
    time.sleep(0.02)   # 50 Hz sample rate - causes aliasing!

# CORRECT: Sample at more than 2x the highest frequency of interest
# For a 60 Hz signal, sample above 120 Hz (200 Hz recommended)
while True:
    reading = sensor.read()
    time.sleep(0.005)  # 200 Hz sample rate - safe

# BEST: Use an anti-aliasing filter before the ADC
# (conceptual: the low-pass filter is analog hardware, not a software call)
# It attenuates frequencies above Nyquist before they are sampled
filtered_signal = low_pass_filter(raw_signal, cutoff=sampling_rate/2)
adc_value = adc.read(filtered_signal)

Prevention: Sample at >2x the highest frequency of interest (4-10x is better practice). Use hardware anti-aliasing filters (low-pass RC filter) before the ADC.
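The 10 Hz figure in the example above follows from frequency folding: a tone sampled too slowly shows up at its distance from the nearest integer multiple of the sample rate. The sketch below assumes a single pure tone; the function name `alias_frequency` is illustrative and not part of any library.

```python
def alias_frequency(f_signal_hz, f_sample_hz):
    """Apparent frequency of a pure tone after sampling at f_sample_hz."""
    # Distance from the nearest integer multiple of the sample rate
    nearest_multiple = round(f_signal_hz / f_sample_hz) * f_sample_hz
    return abs(f_signal_hz - nearest_multiple)


print(alias_frequency(60, 50))   # 60 Hz tone sampled at 50 Hz  -> 10
print(alias_frequency(60, 200))  # 60 Hz tone sampled at 200 Hz -> 60
```

When the result equals the input frequency, the tone is represented correctly; any other value is an alias.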
Worked Example: Calculating Sampling Rate for Vibration Monitoring
Scenario: You are designing a predictive maintenance system for industrial motors. Bearing failures produce vibration signatures between 10 Hz (slow bearing wear) and 5 kHz (bearing race defects). What sampling rate do you need?
Step 1: Identify Highest Frequency Component
Lowest frequency of interest: 10 Hz (slow wear patterns)
Highest frequency of interest: 5000 Hz (bearing race defects)
Step 2: Apply Nyquist Theorem
The Nyquist-Shannon sampling theorem states:
Minimum sampling rate = 2 × highest frequency
fs_min = 2 × 5000 Hz = 10,000 Hz (10 kHz)
Putting Numbers to It
The Nyquist theorem requires \(f_s > 2 f_{\max}\) to avoid aliasing. For bearing defects at 5 kHz:
\[f_{s,\min} = 2 \times 5000 \text{ Hz} = 10 \text{ kHz}\]
But real anti-aliasing filters have gradual rolloff. A 2nd-order Butterworth with a 5 kHz cutoff attenuates by only about 12 dB at 10 kHz (one octave above cutoff), still passing about 25% of the signal amplitude. If you sample at 10 kHz, everything between 5 kHz and 10 kHz folds back into the band of interest with that little attenuation. By sampling at 20 kHz instead, the Nyquist frequency becomes 10 kHz, and the only content that can fold onto the 0 to 5 kHz band lies between 15 kHz and 20 kHz, where the same filter attenuates by roughly 19 to 24 dB. This is why we use \(f_s = 4\text{--}10 \times f_{\max}\) in practice, not just \(2 \times f_{\max}\).
Step 3: Add Safety Margin
In practice, sample at 4-10× the highest frequency of interest for anti-aliasing margin:
Recommended sampling rate = 4 × 5000 Hz = 20 kHz
Why the margin? Real-world anti-aliasing filters have gradual rolloff (a 2nd-order filter rolls off at 40 dB/decade, or 12 dB/octave). Sampling at exactly 2× the highest frequency leaves no margin for the filter’s transition band, allowing aliased energy to leak through.
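The rolloff figures can be checked numerically. This sketch assumes an ideal Butterworth low-pass magnitude response, \(|H(f)|^2 = 1 / (1 + (f/f_c)^{2n})\); real filters built from components with tolerances will deviate somewhat, and the function name is illustrative.

```python
import math


def butterworth_attenuation_db(f_hz, cutoff_hz, order):
    """Attenuation (positive dB) of an ideal Butterworth low-pass at f_hz."""
    return 10 * math.log10(1 + (f_hz / cutoff_hz) ** (2 * order))


# 2nd-order filter with the 5 kHz cutoff used in this worked example
for f in (5_000, 10_000, 15_000, 20_000):
    att = butterworth_attenuation_db(f, cutoff_hz=5_000, order=2)
    remaining = 10 ** (-att / 20)  # amplitude ratio that still gets through
    print(f"{f:>6} Hz: {att:5.1f} dB attenuation, {remaining:.0%} of amplitude remains")
```

The output shows about 12 dB at 10 kHz, which is why sampling at exactly 2x leaves so little protection against aliased energy.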
Step 4: Choose ADC
| ADC | Max Sample Rate | Resolution | Cost | Suitable? |
|---|---|---|---|---|
| ESP32 built-in | 200 kSPS | 12-bit | $5 | ✅ Yes (overkill) |
| ADS1115 (I2C) | 860 SPS | 16-bit | $8 | ❌ Too slow |
| ADS1256 (SPI) | 30 kSPS | 24-bit | $20 | ✅ Yes (high precision) |
| MCP3008 (SPI) | 200 kSPS | 10-bit | $3 | ✅ Yes (budget) |
Choice: ESP32 built-in ADC at 20 kHz sampling rate
Step 5: Calculate Data Storage Requirements
Sample rate: 20,000 samples/second
Sample size: 2 bytes (12-bit ADC stored in 16-bit)
Data rate: 20,000 × 2 = 40,000 bytes/sec = 40 KB/sec
Per hour: 40 KB/s × 3600 s = 144 MB/hour
Per day: 144 MB/h × 24 h = 3.5 GB/day
Optimization: Store only when vibration exceeds threshold (event-driven recording):
Typical: 1% of time above threshold
Storage: 3.5 GB/day × 0.01 = 35 MB/day ✅ Manageable
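The storage arithmetic in this step can be wrapped in a small helper so other sample rates are easy to try. This is a sketch: `daily_storage_bytes` and its `duty_fraction` parameter are names introduced here for illustration, and it uses decimal units (1 MB = 10^6 bytes) to match the figures above.

```python
def daily_storage_bytes(sample_rate_hz, bytes_per_sample, duty_fraction=1.0):
    """Bytes written per day; duty_fraction is the share of time spent recording."""
    seconds_per_day = 24 * 3600
    return sample_rate_hz * bytes_per_sample * seconds_per_day * duty_fraction


continuous = daily_storage_bytes(20_000, 2)
event_driven = daily_storage_bytes(20_000, 2, duty_fraction=0.01)

print(f"Continuous:   {continuous / 1e9:.2f} GB/day")
print(f"Event-driven: {event_driven / 1e6:.1f} MB/day")
```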
Step 6: Power Consumption
Continuous 20 kHz sampling is power-intensive:
ESP32 ADC active: 80 mA
Daily energy: 80 mA × 24 h = 1920 mAh/day
2000 mAh battery lasts: 2000 / 1920 = 1.04 days ❌ Impractical
Solution: Sample in bursts (1 second every 60 seconds):
Active: 80 mA × 1 sec = 0.022 mAh
Sleep: 0.01 mA × 59 sec = 0.00016 mAh
Per minute: 0.022 mAh
Per day: 0.022 × 1440 = 31.7 mAh/day
Battery life: 2000 / 31.7 = 63 days ✅ Much better
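The same duty-cycle estimate can be written as a function of average current. This sketch ignores wake-up overhead, regulator losses, and battery self-discharge, so treat the result as an upper bound; the function name and parameters are illustrative. Without rounding the intermediate values it gives about 62 days, in line with the roughly 63 days computed above.

```python
def battery_life_days(capacity_mah, active_ma, active_s, sleep_ma, sleep_s):
    """Estimated battery life from the time-weighted average current."""
    cycle_s = active_s + sleep_s
    avg_ma = (active_ma * active_s + sleep_ma * sleep_s) / cycle_s
    return capacity_mah / (avg_ma * 24)  # mAh divided by mAh per day


continuous = battery_life_days(2000, active_ma=80, active_s=1, sleep_ma=80, sleep_s=0)
burst = battery_life_days(2000, active_ma=80, active_s=1, sleep_ma=0.01, sleep_s=59)

print(f"Continuous sampling: {continuous:.2f} days")
print(f"1 s burst per 60 s:  {burst:.1f} days")
```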
Real-World Implementation:
#define SAMPLE_RATE 20000        // 20 kHz
#define SAMPLES_PER_BURST 20000  // 1 second of data
#define BURST_INTERVAL 60000     // 60 seconds
// ADC_PIN and THRESHOLD are assumed to be defined elsewhere in the sketch

// Static storage: a 40 KB local array would not fit on the loop task's stack
static uint16_t buffer[SAMPLES_PER_BURST];

void loop() {
  // Capture 1 second of vibration data at 20 kHz
  for (int i = 0; i < SAMPLES_PER_BURST; i++) {
    buffer[i] = analogRead(ADC_PIN);  // ~10us on ESP32
    delayMicroseconds(40);            // 10us read + 40us delay ≈ 50us period (20 kHz)
  }

  // Analyze data (FFT, RMS, peak detection)
  float rms = calculateRMS(buffer, SAMPLES_PER_BURST);
  if (rms > THRESHOLD) {
    // Store data or send alert
    sendAlert(rms);
  }

  // Deep sleep for 59 seconds (the wakeup time is given in microseconds)
  esp_sleep_enable_timer_wakeup(59ULL * 1000000ULL);
  esp_deep_sleep_start();
}

Key Insights:
- Always sample at 4-10× the highest frequency — not just 2× — to allow for anti-aliasing filter rolloff
- High sample rates generate massive data — use threshold-based recording or edge processing (FFT) to reduce storage
- Continuous high-speed ADC drains batteries fast — use burst sampling with sleep intervals for battery-powered deployments
- FFT on-device saves bandwidth — transmit vibration frequency spectrum (100 bytes) instead of raw waveform (40 KB/second)
19.3.1.1 Nyquist Sampling Rate Calculator
Use this calculator to determine the minimum and recommended sampling rates for your signal.
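Where the interactive calculator is not available, the same numbers can be produced with a few lines of Python. This is a minimal sketch of the rule used in this chapter (Nyquist limit at 2x, recommended range 4x to 10x); the function name and margin parameters are illustrative.

```python
def sampling_rates(f_max_hz, margin_low=4, margin_high=10):
    """Return (nyquist_limit, recommended_low, recommended_high) in Hz."""
    return 2 * f_max_hz, margin_low * f_max_hz, margin_high * f_max_hz


nyquist, rec_low, rec_high = sampling_rates(5_000)  # bearing defects up to 5 kHz
print(f"Nyquist limit: sample above {nyquist} Hz")
print(f"Recommended:   {rec_low} to {rec_high} Hz")
```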
19.3.2 Ignoring Sensor Calibration Drift
The Mistake
Using factory calibration forever, assuming sensors maintain accuracy over time.
Symptoms:
- Gradual accuracy degradation over weeks/months
- Systematic bias in readings (consistently high or low)
- False anomaly detections as sensors drift past thresholds
Why it happens: Sensors drift due to aging, temperature cycling, humidity exposure, and contamination. Factory calibration is a snapshot that degrades over time.
The fix:
from datetime import datetime

# WRONG: Using factory calibration forever
def read_temperature():
    return adc.read() * FACTORY_SCALE + FACTORY_OFFSET

# CORRECT: Track and compensate for drift
class CalibratedSensor:
    def __init__(self, drift_rate_per_day=0.01):
        self.last_calibration = datetime.now()
        # Correction added per day since calibration;
        # use a negative value if the sensor drifts high
        self.drift_rate = drift_rate_per_day
        self.scale = FACTORY_SCALE
        self.offset = FACTORY_OFFSET

    def read_raw(self):
        return adc.read()

    def read_temperature(self):
        raw = self.read_raw()
        # Fractional days, so compensation also applies within the first day
        elapsed = datetime.now() - self.last_calibration
        days_since_cal = elapsed.total_seconds() / 86400
        drift_compensation = self.drift_rate * days_since_cal
        return raw * self.scale + self.offset + drift_compensation

    def recalibrate(self, reference_value):
        """Call with known reference temperature"""
        actual = self.read_raw()
        self.offset = reference_value - (actual * self.scale)
        self.last_calibration = datetime.now()

Prevention: Implement periodic recalibration procedures (quarterly for many sensors). Use reference sensors for cross-validation.
19.3.3 Using Raw Sensor Data Without Filtering
The Mistake
Using raw ADC values directly for decisions, displays, and storage without noise filtering.
Symptoms:
- Noisy, jumpy dashboards and charts
- False anomaly alerts from noise spikes
- Incorrect trend analysis due to noise obscuring patterns
- Unstable control loops that oscillate
The fix:
from collections import deque

# WRONG: Using raw ADC values directly
while True:
    temp = adc.read()
    if temp > THRESHOLD:
        alert()        # Noise spike causes false alert!
    display(temp)      # Noisy, jumpy display

# CORRECT: Apply moving average filter
readings = deque(maxlen=10)

def read_filtered():
    raw = adc.read()
    readings.append(raw)
    return sum(readings) / len(readings)

# ALTERNATIVE: Exponential moving average (constant memory, weights recent samples most)
class EMAFilter:
    def __init__(self, alpha=0.1):
        self.alpha = alpha  # 0.1 = smooth, 0.5 = responsive
        self.ema = None

    def filter(self, value):
        if self.ema is None:
            self.ema = value  # seed with the first reading
        else:
            self.ema = self.alpha * value + (1 - self.alpha) * self.ema
        return self.ema

# FOR SPIKES: Median filter for outlier rejection
def median_filter(new_value, buffer, size=5):
    buffer.append(new_value)
    if len(buffer) > size:
        buffer.pop(0)  # drop the oldest sample
    return sorted(buffer)[len(buffer) // 2]

19.4 Digital Filter Selection Guide
Choosing the right filter depends on your noise characteristics and application requirements:
Quick Reference Table:
| Noise Type | Best Filter | Parameters | Use Case |
|---|---|---|---|
| Random Gaussian | Moving Average | Window = 10-20 | Temperature, humidity |
| Spikes/Outliers | Median | Window = 5-7 | Distance sensors, IR |
| High-frequency | IIR Low-pass | fc = max signal freq | Vibration, audio |
| Known statistics | Kalman | Q, R from data | IMU, tracking |
| 50/60 Hz interference | Notch filter | fn = 50 or 60 Hz | Analog sensors near AC |
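To see why the table pairs the median filter with spike noise, compare what a mean and a median do to a single window containing one bad reading. The window values below are illustrative distance readings in cm, not measured data.

```python
from statistics import mean, median

window = [5, 5, 250, 5, 6]  # cm, with one spurious echo at 250

print(f"Moving average: {mean(window):.1f} cm")  # -> 54.2 cm, dragged up by the spike
print(f"Median:         {median(window)} cm")    # -> 5 cm, spike ignored
```

A single outlier shifts the mean by a fifth of its size, while the median is unaffected as long as fewer than half the samples in the window are outliers.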
19.4.1 EMA Smoothing Factor Explorer
Adjust the EMA alpha parameter to see how it affects the time constant and responsiveness. A lower alpha gives smoother output but slower response to real changes.
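The relationship between alpha and responsiveness can also be computed directly. For the update rule `ema = alpha * value + (1 - alpha) * ema`, a step input is tracked as \(1 - (1 - \alpha)^n\) after \(n\) samples, so the time constant (about 63% of the step) is \(-1 / \ln(1 - \alpha)\) samples. The sketch below uses illustrative function names; multiply by the sample period to convert samples to seconds.

```python
import math


def ema_time_constant_samples(alpha):
    """Samples needed for an EMA to cover about 63% of a step change."""
    return -1.0 / math.log(1.0 - alpha)


def ema_step_response(alpha, n_samples):
    """Fraction of a step change reached after n_samples updates."""
    return 1.0 - (1.0 - alpha) ** n_samples


for alpha in (0.05, 0.1, 0.25, 0.5):
    tau = ema_time_constant_samples(alpha)
    reached = ema_step_response(alpha, 10)
    print(f"alpha={alpha:<4} tau={tau:4.1f} samples, after 10 samples: {reached:.0%} of step")
```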
19.5 Signal Conditioning Chain
The complete signal conditioning chain for a sensor transforms raw physical measurements into clean digital data:
Stage-by-Stage Explanation:
| Stage | Purpose | Key Parameters |
|---|---|---|
| Raw Sensor | Physical measurement | Sensitivity, range |
| Amplification | Scale signal to ADC range | Gain (1x-1000x) |
| Analog Filter | Remove frequencies above Nyquist | Cutoff frequency |
| Sample & Hold | Freeze signal during conversion | Acquisition time |
| ADC | Convert to digital | Resolution (bits), sample rate |
| Digital Filter | Remove noise, smooth data | Filter type, window size |
| Decimation | Reduce data rate | Decimation factor |
Real-Time Implementation for Microcontrollers:
// ESP32/Arduino optimized filter implementations
#include <stdint.h>
#include <string.h>  // memcpy

// Fixed-point EMA (no floating point)
typedef struct {
  int32_t state;
  uint8_t shift;  // alpha = 1/(2^shift), e.g., shift=3 -> alpha=0.125
} ema_filter_t;

int32_t ema_filter_update(ema_filter_t *f, int32_t input) {
  // Fixed-point EMA: state = state + ((input - state) >> shift)
  // Assumes arithmetic right shift for negative differences (as GCC provides)
  f->state += (input - f->state) >> f->shift;
  return f->state;
}

// Ring buffer median filter
typedef struct {
  int16_t buffer[5];
  uint8_t index;
} median_filter_t;

int16_t median_filter_update(median_filter_t *f, int16_t input) {
  // Overwrite the oldest sample and advance the ring index
  f->buffer[f->index++] = input;
  if (f->index >= 5) f->index = 0;

  // Sort a copy so the ring buffer keeps its arrival order
  int16_t sorted[5];
  memcpy(sorted, f->buffer, sizeof(sorted));
  for (int i = 0; i < 4; i++) {
    for (int j = i + 1; j < 5; j++) {
      if (sorted[i] > sorted[j]) {
        int16_t tmp = sorted[i];
        sorted[i] = sorted[j];
        sorted[j] = tmp;
      }
    }
  }
  return sorted[2];  // Middle element
}

Performance Comparison (ESP32, 240MHz):
| Filter | Execution Time | RAM Usage | Latency (samples) |
|---|---|---|---|
| Moving Average (N=10) | 0.8 us | 40 bytes | 5 |
| EMA (fixed-point) | 0.2 us | 8 bytes | ~3 |
| Median (N=5) | 1.5 us | 10 bytes | 2 |
| IIR Low-pass | 0.3 us | 8 bytes | ~2 |
| Kalman (1D) | 2.0 us | 16 bytes | ~1 |
19.5.1 Voltage Level Mismatch
Pitfall: Voltage Level Mismatch Between Sensor and Microcontroller
The Mistake: Connecting 5V sensor outputs directly to 3.3V microcontroller inputs, or powering 3.3V sensors from 5V rails.
The Fix: Always verify voltage compatibility and use level shifting when needed:
- 5V sensor to 3.3V MCU: Voltage divider (10k in series and 20k to ground gives about 3.3V from 5V; one-directional signals only) or bidirectional level shifter (BSS138-based)
- 3.3V sensor to 5V MCU: Usually OK for digital, but use level shifter for reliable operation
- I2C level shifting: Use dedicated I2C level shifters (PCA9306, TXS0102) – avoid TXB0104 which is push-pull only and incompatible with I2C open-drain signaling
Specific examples:
- ESP32 GPIO absolute max: 3.6V. 5V input = instant damage
- Raspberry Pi GPIO: 3.3V max. 5V input damages SOC
- Arduino Uno: native 5V logic, so 5V sensors connect directly; the default analog reference is also 5V
19.5.1.1 Voltage Divider Calculator
Design a resistive voltage divider for level shifting. The output voltage is \(V_{out} = V_{in} \times \frac{R_2}{R_1 + R_2}\).
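The formula is easy to evaluate in code when the interactive calculator is not at hand. This sketch assumes an unloaded divider (the MCU input draws negligible current), with `r1_ohms` between the input and the output node and `r2_ohms` between the output node and ground; the function name is illustrative.

```python
def divider_vout(v_in, r1_ohms, r2_ohms):
    """Unloaded divider output: r1 from input to output, r2 from output to ground."""
    return v_in * r2_ohms / (r1_ohms + r2_ohms)


v_out = divider_vout(5.0, 10_000, 20_000)
print(f"{v_out:.2f} V")  # -> 3.33 V
```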
19.5.2 Self-Heating Errors
Pitfall: Sensor Self-Heating Causing Temperature Errors
The Mistake: Continuously powering temperature sensors and taking rapid readings without accounting for self-heating.
The Fix: Implement duty-cycled sensing with thermal recovery time:
- DHT22: Power consumption 1.5 mW during measurement. Allow minimum 2 seconds between readings (datasheet requirement). Self-heating error: ~0.3 °C with continuous polling
- DS18B20: 1.5 mA active current at 5 V = 7.5 mW. Use 750 ms conversion time, then power down. Self-heating: ~0.1 °C with 1 Hz sampling
- NTC Thermistors: Self-heating = \(I^2 \times R\). With 10 k\(\Omega\) thermistor at 100 \(\mu\)A: P = 0.1 mW (negligible). At 1 mA: P = 10 mW (significant)
Best practice: Power the sensor only during measurement. If continuous monitoring is needed, use intervals of at least 10 seconds for temperature sensors.
19.5.2.1 Self-Heating Power Calculator
Calculate the self-heating power dissipation for resistive sensors (thermistors, RTDs, strain gauges).
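A minimal sketch of the calculation, using \(P = I^2 R\) as in the thermistor example above. Converting power to a temperature error requires the sensor's dissipation constant (in mW/°C) from its datasheet; the 2 mW/°C used below is a placeholder assumption for illustration, not a value for any specific part.

```python
def self_heating_mw(current_a, resistance_ohms):
    """Power dissipated in a resistive sensor, in milliwatts (P = I^2 * R)."""
    return current_a ** 2 * resistance_ohms * 1000


def temperature_rise_c(power_mw, dissipation_mw_per_c):
    """Approximate self-heating error for a given dissipation constant."""
    return power_mw / dissipation_mw_per_c


ASSUMED_DISSIPATION_MW_PER_C = 2.0  # placeholder; take the real value from the datasheet

for current_a in (100e-6, 1e-3):
    p_mw = self_heating_mw(current_a, 10_000)  # 10 kOhm thermistor
    rise = temperature_rise_c(p_mw, ASSUMED_DISSIPATION_MW_PER_C)
    print(f"I = {current_a * 1e3:.1f} mA: P = {p_mw:.1f} mW, about {rise:.2f} °C rise")
```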
19.6 Summary
Key signal processing takeaways:
| Concept | Rule | Related To | Why It Matters |
|---|---|---|---|
| Sampling | Sample at >2x highest frequency | Aliasing | Prevents false low-frequency patterns |
| Outlier Rejection | Use median filter (window 5-7) | Spike noise | Sorts values, picks middle, rejects spikes completely |
| Noise Smoothing | Use moving average or EMA | Gaussian noise | Reduces random noise while preserving trends |
| Calibration | Recalibrate periodically (quarterly) | Sensor aging | Temperature cycling and contamination degrade accuracy |
| Voltage Matching | Use level shifters for 5V→3.3V | GPIO protection | BSS138 or resistor divider prevents MCU damage |
Critical Mistakes to Avoid
- Never sample below Nyquist rate – You will see phantom frequencies that do not exist
- Never use raw ADC values for control – Noise causes oscillation and false triggers
- Never connect 5V sensors to 3.3V MCUs without protection – Instant, permanent damage
- Never assume factory calibration lasts forever – Drift is inevitable; recalibrate quarterly
19.7 Try It Yourself
Test your understanding by implementing a filter from scratch:
Exercise: Implement a Median Filter
Challenge: Write a median filter function that removes outlier spikes from ultrasonic distance sensor readings.
Given: An ultrasonic sensor outputs: [5, 5, 250, 5, 6, 5, 180, 5, 6, 5] cm
Your task: Implement a median filter with window size 5 that removes the 250cm and 180cm spikes.
Solution

from collections import deque

class MedianFilter:
    def __init__(self, window_size=5):
        self.buffer = deque(maxlen=window_size)

    def filter(self, value):
        self.buffer.append(value)
        if len(self.buffer) < self.buffer.maxlen:
            return value  # Not enough data yet
        sorted_buffer = sorted(self.buffer)
        return sorted_buffer[len(sorted_buffer) // 2]

# Test
readings = [5, 5, 250, 5, 6, 5, 180, 5, 6, 5]
mf = MedianFilter(window_size=5)
for reading in readings:
    filtered = mf.filter(reading)
    print(f"Raw: {reading:3d} cm → Filtered: {filtered:3d} cm")

# Output:
# Raw:   5 cm → Filtered:   5 cm (buffer filling)
# Raw:   5 cm → Filtered:   5 cm (buffer filling)
# Raw: 250 cm → Filtered: 250 cm (buffer filling - not yet filtering)
# Raw:   5 cm → Filtered:   5 cm (buffer filling)
# Raw:   6 cm → Filtered:   5 cm ← Buffer full, median rejects 250!
# Raw:   5 cm → Filtered:   5 cm
# Raw: 180 cm → Filtered:   6 cm ← Spike mostly rejected
# Raw:   5 cm → Filtered:   5 cm
# Raw:   6 cm → Filtered:   6 cm
# Raw:   5 cm → Filtered:   5 cm

The median filter sorts the window [5, 5, 250, 5, 6] into [5, 5, 5, 6, 250] and returns the middle value (5), completely rejecting the 250 cm outlier. Note that during the fill-up phase (first N-1 readings), unfiltered values pass through – in production code, you may want to wait until the buffer is full before acting on readings.
Common Pitfalls
1. Filter Window Too Large for Fast-Changing Signals
A moving average window sized for a slow temperature sensor will blur rapid vibration or impact events into meaningless flat lines. Match the filter window to the expected rate of change of the measured quantity — use a short window (3-5 samples) for fast signals and a longer window (10-50 samples) for slow, noisy signals.
2. Aliasing from Insufficient Sampling Rate
Sampling a 50 Hz vibration signal at 60 Hz (below the 2x = 100 Hz Nyquist minimum) creates a false 10 Hz component in the output that does not exist in the real signal. Always sample at more than 2x the highest frequency component present in the sensor signal, and use an anti-aliasing filter before the ADC.
3. Calibrating After Filtering When the Opposite is Required
Some calibration algorithms (like two-point linear calibration) should be applied to raw ADC values before filtering, while others work better on filtered values. Define and document which stage of the processing pipeline calibration occurs at, and be consistent between calibration capture and normal operation.
4. Applying Median Filter to Streaming Data Without a Buffer
The median filter requires storing a window of recent samples to find the middle value. Implementing it without a properly sized ring buffer causes it to compare only newly arrived samples, defeating its outlier rejection purpose. Pre-allocate the full filter window buffer and initialize it before beginning normal sensor operation.
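One way to follow the advice in pitfall 4 is to pre-fill the window with the first reading, so the filter is active from the first sample. This is a sketch of that one approach; the class name `PrimedMedianFilter` is illustrative, and it assumes the first reading is itself plausible (if the first sample is a spike, the output stays wrong until the window flushes).

```python
from collections import deque


class PrimedMedianFilter:
    """Median filter whose window is initialized from the first sample."""

    def __init__(self, window_size=5):
        self.window_size = window_size
        self.buffer = None  # allocated when the first sample arrives

    def filter(self, value):
        if self.buffer is None:
            # Prime the whole window with the first reading
            self.buffer = deque([value] * self.window_size, maxlen=self.window_size)
        else:
            self.buffer.append(value)  # oldest sample drops out automatically
        return sorted(self.buffer)[self.window_size // 2]


readings = [5, 5, 250, 5, 6, 5, 180, 5, 6, 5]
mf = PrimedMedianFilter(window_size=5)
filtered = [mf.filter(r) for r in readings]
print(filtered)  # -> [5, 5, 5, 5, 5, 5, 6, 5, 6, 5]
```

With this priming, the 250 cm spike in the third reading never reaches the output.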
19.8 What’s Next
Now that you can apply signal processing techniques to sensor data:
| Chapter | Focus | Connection to Signal Processing |
|---|---|---|
| Sensor Classification | Sensor categories and output types | Different sensor types require different filtering strategies |
| Sensor Specifications | Response time, resolution, accuracy | Specifications determine sampling rate and filter parameters |
| Calibration Techniques | Hands-on calibration methods | Compensate for the drift discussed in this chapter |
| Common Mistakes | Top 10 sensor pitfalls | Voltage mismatch and sampling errors expanded further |
| Hands-On Labs | ESP32 filter implementations | Build and test the filters covered here on real hardware |