551 Sensor Data Processing
Filtering, Calibration, and Signal Conditioning
sensor filtering, calibration, moving average, Kalman filter, signal conditioning, noise reduction
551.1 Learning Objectives
By the end of this chapter, you will be able to:
- Implement moving average and Kalman filters to reduce sensor noise
- Perform two-point calibration to correct sensor offset and gain errors
- Design data validation pipelines that detect anomalies and outliers
- Choose appropriate filtering strategies based on signal characteristics
- Store and retrieve calibration coefficients for production deployments
551.2 Introduction
Raw sensor data is rarely perfect. Environmental noise, electrical interference, and manufacturing variations all affect measurement accuracy. This chapter covers the essential techniques for transforming noisy, uncalibrated sensor readings into reliable, accurate measurements.
Imagine a thermometer that always reads 2 degrees too high, and sometimes jumps around randomly. That is what raw sensor data often looks like. Data processing is like training that thermometer to be accurate and stable. We use “filters” to smooth out the jumps and “calibration” to correct the 2-degree error. Without these steps, your IoT system would make decisions based on wrong information.
551.3 Filtering Noisy Sensor Data
Sensor noise comes from many sources: electrical interference, quantization errors, and environmental factors. Filters attenuate this noise while preserving as much of the true signal as possible.
551.3.1 Moving Average Filter
The moving average filter is the simplest and most common approach. It averages the last N readings to smooth out random variations.
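```cpp
// Moving average filter: averages the last N readings in a circular buffer.
// A running sum keeps each update O(1) regardless of window size.
class MovingAverageFilter {
private:
  float* buffer;
  int size;
  int index;
  int count;  // samples received so far, capped at size
  float sum;
public:
  MovingAverageFilter(int windowSize) {
    size = windowSize;
    buffer = new float[size];
    index = 0;
    count = 0;
    sum = 0;
    for (int i = 0; i < size; i++) {
      buffer[i] = 0;
    }
  }
  ~MovingAverageFilter() {
    delete[] buffer;
  }
  // Copying would make two filters share (and double-free) one buffer
  MovingAverageFilter(const MovingAverageFilter&) = delete;
  MovingAverageFilter& operator=(const MovingAverageFilter&) = delete;

  float filter(float value) {
    sum -= buffer[index];   // remove the oldest sample
    buffer[index] = value;  // store the newest sample
    sum += value;
    index = (index + 1) % size;
    if (count < size) count++;
    // Divide by the number of real samples so the first N outputs
    // are not dragged toward the zero-filled buffer
    return sum / count;
  }
};
```

551.3.2 Kalman Filter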
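The Kalman filter blends a model prediction with each new measurement and weights each by its estimated uncertainty. When the model is correct and the process noise (Q) and measurement noise (R) are set correctly, it is the optimal estimator for a linear system with Gaussian noise. The simple 1D version below assumes the true value stays roughly constant between samples (a random-walk model).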
```cpp
// Kalman filter (simple 1D implementation)
// Model: the true value is roughly constant between samples (random walk),
// so the prediction step keeps x and only grows the uncertainty p.
class KalmanFilter {
private:
  float q;  // Process noise covariance (how fast the true value may change)
  float r;  // Measurement noise covariance (sensor noise variance)
  float x;  // Estimated value
  float p;  // Estimation error covariance
  float k;  // Kalman gain
public:
  KalmanFilter(float processNoise, float measurementNoise, float initialValue) {
    q = processNoise;
    r = measurementNoise;
    x = initialValue;
    p = 1;  // initial uncertainty; larger values trust the first readings more
    k = 0;
  }
  float filter(float measurement) {
    // Prediction: state unchanged, uncertainty grows by q
    p = p + q;
    // Update: blend prediction and measurement according to the gain
    k = p / (p + r);
    x = x + k * (measurement - x);
    p = (1 - k) * p;
    return x;
  }
};

// Usage example (Arduino)
// readTemperature() is a placeholder for your sensor driver's read function
MovingAverageFilter maFilter(10);       // 10-sample window
KalmanFilter kFilter(0.01, 0.1, 25.0);  // Process noise, measurement noise, initial value

void setup() {
  Serial.begin(115200);
}

void loop() {
  float rawTemp = readTemperature();
  float filteredMA = maFilter.filter(rawTemp);
  float filteredKalman = kFilter.filter(rawTemp);
  Serial.print("Raw: ");
  Serial.print(rawTemp);
  Serial.print(" | Moving Avg: ");
  Serial.print(filteredMA);
  Serial.print(" | Kalman: ");
  Serial.println(filteredKalman);
  delay(100);
}
```

551.3.3 Median Filter for Spike Removal
When sensor data has occasional spike errors (outliers), a median filter is more effective than averaging.
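```cpp
#include <string.h>  // memcpy

// Median of a window of readings. Use an odd window size so there is a
// single middle value; only the first MAX_MEDIAN_WINDOW readings are used.
const int MAX_MEDIAN_WINDOW = 15;

float medianFilter(const float* buffer, int size) {
  if (size > MAX_MEDIAN_WINDOW) size = MAX_MEDIAN_WINDOW;
  float sorted[MAX_MEDIAN_WINDOW];  // fixed size: standard C++ has no variable-length arrays
  memcpy(sorted, buffer, size * sizeof(float));
  // Simple bubble sort for small arrays
  for (int i = 0; i < size - 1; i++) {
    for (int j = 0; j < size - i - 1; j++) {
      if (sorted[j] > sorted[j + 1]) {
        float temp = sorted[j];
        sorted[j] = sorted[j + 1];
        sorted[j + 1] = temp;
      }
    }
  }
  return sorted[size / 2];  // middle value (upper middle for even sizes)
}
// Example: [22, 55, 23] -> sorted: [22, 23, 55] -> median: 23
// The spike (55) is completely ignored!
```

| Property | Option A: Moving Average (N = 10) | Option B: Kalman Filter (1D) |
|---|---|---|
| Memory | 40 bytes (10 floats) | 20 bytes (5 floats of state) |
| Work per update | 2 additions/subtractions, 1 division, plus a modulo index update | 5 additions/subtractions, 2 multiplications, 1 division |
| Latency | Fixed: group delay of (N-1)/2 = 4.5 samples; a step fully settles after 10 samples | Adaptive, set by the Q/R ratio: roughly 1-3 samples for the example values Q = 0.01, R = 0.1 |
| Noise reduction (white noise) | sqrt(N) = 3.16x lower standard deviation | Set by the Q/R ratio: about 2.5x for Q = 0.01, R = 0.1; a smaller ratio smooths more but responds more slowly |
| Implementation complexity | Low (about 10 lines of code) | Medium (about 30 lines) |
| Tuning | Window size N only | Requires Q and R |

Decision Factors: For stationary signals with Gaussian noise (for example, averaging a room temperature), a moving average is simpler and nearly as effective. For tracking changing signals where latency matters (position, velocity, acceleration), a Kalman filter whose state model includes those dynamics responds faster and rejects noise better. The 1D random-walk version above still lags behind steady trends. Kalman performance depends on knowing the process noise (Q) and measurement noise (R), and wrong values degrade it. On resource-constrained 8-bit MCUs (ATmega328), a moving average can run on integer ADC counts with no floating-point math, which saves flash and runs faster. The ESP32's hardware single-precision floating-point unit makes a float Kalman filter practical.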
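In the 1D filter, Q and R are constants, so the gain k settles to a fixed value after a few updates. From then on, the filter is equivalent to an exponential moving average with smoothing factor α = k.

The sketch below computes that steady-state gain in two ways:

- by iterating the same predict/update equations as `KalmanFilter::filter()`;
- from the closed-form root of the steady-state covariance equation.

The function names are hypothetical. This is a tuning aid under the random-walk model, not part of the original filter.

```cpp
#include <cmath>

// Iterate the scalar covariance recursion used by KalmanFilter::filter()
// (predict: p += q; update: k = p / (p + r), p = (1 - k) * p)
// until the gain stops changing. Returns the steady-state gain.
float steadyStateKalmanGain(float q, float r, int maxIterations = 1000) {
  float p = 1.0f;  // same initial covariance as the KalmanFilter class
  float k = 0.0f;
  for (int i = 0; i < maxIterations; i++) {
    float pPred = p + q;                // prediction step
    float kNew = pPred / (pPred + r);   // Kalman gain
    p = (1.0f - kNew) * pPred;          // covariance after the update
    if (std::fabs(kNew - k) < 1e-7f) {
      return kNew;
    }
    k = kNew;
  }
  return k;
}

// Closed form: the steady-state predicted covariance P solves
// P^2 - q*P - q*r = 0, so P = (q + sqrt(q^2 + 4qr)) / 2 and k = P / (P + r).
float steadyStateKalmanGainClosedForm(float q, float r) {
  float pPred = (q + std::sqrt(q * q + 4.0f * q * r)) / 2.0f;
  return pPred / (pPred + r);
}
```

For the example values Q = 0.01 and R = 0.1, both functions return a gain of about 0.27. After start-up, each new reading therefore contributes roughly 27% of the output.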
551.4 Sensor Calibration
Calibration corrects systematic errors in sensor readings. Two-point calibration addresses both offset (zero-point shift) and gain (sensitivity) errors.
551.4.1 Two-Point Calibration
```cpp
// Two-point calibration for linear sensors
#include <EEPROM.h>

struct CalibrationData {
  float rawLow;      // ADC reading at low point
  float rawHigh;     // ADC reading at high point
  float actualLow;   // Actual value at low point
  float actualHigh;  // Actual value at high point
};

CalibrationData cal = {
  .rawLow = 512,
  .rawHigh = 3584,
  .actualLow = 0.0,
  .actualHigh = 100.0
};

float calibrate(float rawValue) {
  // Linear interpolation through the two reference points
  float slope = (cal.actualHigh - cal.actualLow) / (cal.rawHigh - cal.rawLow);
  return cal.actualLow + slope * (rawValue - cal.rawLow);
}

// Store calibration in EEPROM. EEPROM.begin(size) and EEPROM.commit() are
// required on ESP32/ESP8266, whose EEPROM is emulated in flash.
void saveCalibration() {
  EEPROM.begin(512);
  EEPROM.put(0, cal);
  EEPROM.commit();
  Serial.println("Calibration saved");
}

void loadCalibration() {
  EEPROM.begin(512);
  CalibrationData stored;
  EEPROM.get(0, stored);
  // Unwritten EEPROM typically reads as 0xFF bytes, which is NaN as a float;
  // keep the defaults instead of loading garbage
  if (isnan(stored.rawLow) || isnan(stored.rawHigh) || stored.rawHigh == stored.rawLow) {
    Serial.println("No valid calibration found; using defaults");
    return;
  }
  cal = stored;
  Serial.println("Calibration loaded");
}
```
551.5 Worked Example: Calibrating an Analog Sensor
Scenario: You are deploying a soil moisture monitoring system for a greenhouse. The capacitive soil moisture sensor outputs an analog voltage (0-3.3V) that varies with soil moisture content. However, the raw ADC readings do not correspond to meaningful moisture percentages. The sensor reads approximately 3000 ADC counts in completely dry soil and 1200 in saturated soil. You need accurate readings to trigger irrigation at 30% moisture.
Goal: Develop and implement a two-point calibration procedure to convert raw ADC readings into calibrated moisture percentages (0-100%).
What we do: Measure the sensor’s output range and behavior.
Initial measurements:
| Condition | ADC Reading (12-bit, 0-4095) | Expected Moisture |
|---|---|---|
| Air (no soil) | 3450 | ~0% (baseline) |
| Bone-dry soil (oven-dried) | 3000 | 0% |
| Field capacity (well-watered) | 1800 | ~60-70% |
| Saturated soil (standing water) | 1200 | 100% |
Observations:
- Output decreases as moisture increases (higher moisture = lower ADC value)
- Range spans approximately 1200-3000 ADC units for the usable moisture range
- Response is approximately linear in the 20-80% moisture range
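These initial measurements also suggest simple plausibility checks that can run before calibration. The sketch below applies two rules:

- A reading above the midpoint between dry soil (about 3000) and open air (about 3450) is flagged as a probe that is likely out of the soil.
- A reading pinned at 0 or 4095 is flagged as a possible wiring or supply fault.

The midpoint threshold, the rail checks, and the names `checkReading` and `ReadingStatus` are illustrative assumptions. Confirm the thresholds against your own sensor.

```cpp
// Outcome of a plausibility check on one raw soil-moisture ADC reading
enum class ReadingStatus { Ok, OutOfSoil, RailFault };

// Illustrative thresholds from the initial measurements (12-bit ADC):
// dry soil reads about 3000 and open air about 3450, so readings above
// the midpoint most likely mean the probe is not in soil.
const int kOutOfSoilThreshold = (3000 + 3450) / 2;  // 3225
const int kAdcMax = 4095;                           // 12-bit full scale

ReadingStatus checkReading(int adc) {
  // A reading pinned at either rail can indicate a wiring or supply fault
  if (adc <= 0 || adc >= kAdcMax) return ReadingStatus::RailFault;
  // Wetter soil gives lower readings, so only the high side needs this check
  if (adc > kOutOfSoilThreshold) return ReadingStatus::OutOfSoil;
  return ReadingStatus::Ok;
}
```

A reading that fails the check can be logged and skipped rather than passed to the calibration step.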
What we do: Establish known moisture levels using gravimetric method.
Gravimetric calibration procedure:
- Prepare soil samples: Collect 5 containers of identical soil (200g each)
- Create moisture levels:
  - Sample A: Oven-dry at 105C for 24 hours (0% moisture)
  - Sample B: Add 10g water (5% moisture by weight)
  - Sample C: Add 30g water (15% moisture)
  - Sample D: Add 60g water (30% moisture, the irrigation trigger)
  - Sample E: Saturate and drain (field capacity, ~60%)
- Record calibration data:
| Sample | Added Water (g) | Calculated Moisture (%) | ADC Reading |
|---|---|---|---|
| A | 0 | 0% | 2988 |
| B | 10 | 5% | 2855 |
| C | 30 | 15% | 2500 |
| D | 60 | 30% | 2030 |
| E | ~120 (saturated) | 60% | 1515 |
What we do: Fit a linear equation to the calibration data.
Two-point calibration (using dry and field capacity points):
- Point 1 (Low): ADC = 2988, Moisture = 0%
- Point 2 (High): ADC = 1515, Moisture = 60%
Calculate slope (m) and offset (b):
\[m = \frac{Y_2 - Y_1}{X_2 - X_1} = \frac{60 - 0}{1515 - 2988} = \frac{60}{-1473} = -0.0407\]
\[b = Y_1 - m \times X_1 = 0 - (-0.0407 \times 2988) = 121.6\]
Calibration equation:
\[\text{Moisture \%} = -0.0407 \times \text{ADC} + 121.6\]
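The equation above passes exactly through two of the five gravimetric samples and ignores the other three. A common alternative is an ordinary least-squares line through all five samples, which spreads the fitting error across the range. It is sketched here as an option, not as the chapter's procedure.

`LinearFit` and `leastSquaresFit` are hypothetical names. The sums are centered on the means to avoid cancellation with large ADC values.

```cpp
#include <cstddef>

// Straight line y = slope * x + offset
struct LinearFit {
  double slope;
  double offset;
};

// Ordinary least-squares fit through n (x, y) pairs, here x = ADC reading
// and y = gravimetric moisture (%). Requires n >= 2 with distinct x values.
LinearFit leastSquaresFit(const double* x, const double* y, std::size_t n) {
  double meanX = 0, meanY = 0;
  for (std::size_t i = 0; i < n; i++) {
    meanX += x[i];
    meanY += y[i];
  }
  meanX /= n;
  meanY /= n;
  // Centered sums of squares and cross-products
  double sxx = 0, sxy = 0;
  for (std::size_t i = 0; i < n; i++) {
    sxx += (x[i] - meanX) * (x[i] - meanX);
    sxy += (x[i] - meanX) * (y[i] - meanY);
  }
  LinearFit fit;
  fit.slope = sxy / sxx;
  fit.offset = meanY - fit.slope * meanX;
  return fit;
}
```

Called with the five ADC and moisture pairs from the table, it returns a negative slope (ADC falls as moisture rises) and an offset, in the same form as the two-point equation.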
What we do: Create production-ready calibration code.
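```cpp
#include <EEPROM.h>
#define SOIL_PIN 34           // ESP32 ADC1 pin, 12-bit (0-4095)
#define NUM_SAMPLES 10        // readings per measurement (even count)
#define CAL_MAGIC 0xCAFEBABE  // marks a valid stored calibration
struct Calibration {
  uint32_t magic;
  float dryADC;       // ADC reading at the dry reference point
  float wetADC;       // ADC reading at the wet reference point
  float dryMoisture;  // moisture (%) at the dry reference point
  float wetMoisture;  // moisture (%) at the wet reference point
};
// Defaults: the two reference points used in the calibration equation
Calibration cal = {
  .magic = CAL_MAGIC,
  .dryADC = 2988.0,
  .wetADC = 1515.0,
  .dryMoisture = 0.0,
  .wetMoisture = 60.0
};
// Median of NUM_SAMPLES readings, to reject ADC spikes
float readADCFiltered() {
  int samples[NUM_SAMPLES];
  for (int i = 0; i < NUM_SAMPLES; i++) {
    samples[i] = analogRead(SOIL_PIN);
  }
  // Insertion sort (small array)
  for (int i = 1; i < NUM_SAMPLES; i++) {
    int v = samples[i];
    int j = i - 1;
    while (j >= 0 && samples[j] > v) {
      samples[j + 1] = samples[j];
      j--;
    }
    samples[j + 1] = v;
  }
  // Even count: average the two middle values
  return (samples[NUM_SAMPLES / 2 - 1] + samples[NUM_SAMPLES / 2]) / 2.0f;
}
float getMoisturePercent() {
  float adcValue = readADCFiltered();
  // Two-point linear interpolation (ADC falls as moisture rises)
  float moisture = cal.dryMoisture +
                   (cal.wetMoisture - cal.dryMoisture) *
                   (cal.dryADC - adcValue) / (cal.dryADC - cal.wetADC);
  // Clamp to the valid 0-100% range
  if (moisture < 0.0) moisture = 0.0;
  if (moisture > 100.0) moisture = 100.0;
  return moisture;
}
// Use a stored calibration only if it carries the magic marker
void loadCalibration() {
  EEPROM.begin(sizeof(Calibration));
  Calibration stored;
  EEPROM.get(0, stored);
  if (stored.magic == CAL_MAGIC) cal = stored;
}
```

Outcome: The soil moisture sensor is calibrated to within about 2% in the 20-40% irrigation trigger zone, with larger errors toward the wet end.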
Accuracy achieved:
| Moisture Range | Calibration Error | Acceptable? |
|---|---|---|
| 0-20% (dry) | 1.5% | Yes |
| 20-40% (trigger zone) | 2.0% | Yes |
| 40-60% (moist) | 3.0% | Yes |
| 60-100% (wet) | 5.0% | Yes |
Maintenance schedule:
- Recalibrate every 6 months or after sensor replacement
- Verify with known moisture sample monthly during growing season
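The five gravimetric samples are not exactly collinear. Plugging sample D (ADC 2030, the irrigation trigger) into the two-point equation gives roughly 39% instead of the 30% reference.

One way to use all the samples is a multi-point calibration that interpolates linearly between neighboring points, as sketched below. Notes on the sketch:

- The table values come from the calibration data above.
- `CalPoint`, `kCalTable`, and `moistureFromADC` are hypothetical names.
- Readings beyond the wettest sample are clamped at 60%, because the table has no reference point past field capacity.

```cpp
#include <cstddef>

// One calibration point: raw ADC reading and reference moisture (%)
struct CalPoint {
  float adc;
  float moisture;
};

// Gravimetric samples A-E, ordered by decreasing ADC
// (the sensor reads lower as moisture rises)
const CalPoint kCalTable[] = {
  {2988.0f, 0.0f},
  {2855.0f, 5.0f},
  {2500.0f, 15.0f},
  {2030.0f, 30.0f},
  {1515.0f, 60.0f},
};
const std::size_t kCalCount = sizeof(kCalTable) / sizeof(kCalTable[0]);

// Piecewise-linear interpolation between neighboring calibration points.
// Readings outside the table are clamped to the end points.
float moistureFromADC(float adc) {
  if (adc >= kCalTable[0].adc) return kCalTable[0].moisture;
  if (adc <= kCalTable[kCalCount - 1].adc) return kCalTable[kCalCount - 1].moisture;
  for (std::size_t i = 0; i + 1 < kCalCount; i++) {
    const CalPoint& hi = kCalTable[i];      // higher ADC, drier
    const CalPoint& lo = kCalTable[i + 1];  // lower ADC, wetter
    if (adc <= hi.adc && adc >= lo.adc) {
      float t = (hi.adc - adc) / (hi.adc - lo.adc);  // 0 at hi, 1 at lo
      return hi.moisture + t * (lo.moisture - hi.moisture);
    }
  }
  return kCalTable[kCalCount - 1].moisture;  // not reached for a sorted table
}
```

Each calibration sample maps back to its own reference value. For example, ADC 2030 returns 30%, and a reading halfway between samples C and D returns 22.5%.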
551.6 Knowledge Check
Knowledge Check: Data Processing and Calibration
Test Your Understanding
Question 1: You have a noisy temperature sensor that occasionally produces spike values (e.g., 22C, 23C, 55C, 22C). Which filter is better for removing these spikes: moving average or median filter?
Answer: A median filter is better for removing spike noise. It takes the middle value of a window, effectively ignoring outliers. Example with window=3: [22, 55, 23] sorted: [22, 23, 55] median: 23C (spike ignored). A moving average would give (22+55+23)/3 = 33.3C, still affected by the spike. Use median filters for spike/impulse noise and moving averages for Gaussian noise.
Question 2: Your temperature sensor reads 1.2C in an ice bath (should be 0C) and 98.8C in boiling water (should be 100C). What are the calibration slope and offset?
Answer: Slope = 1.025, Offset = -1.23C. Calculation: slope = (100 - 0) / (98.8 - 1.2) = 100 / 97.6 = 1.025. Offset = 0 - (1.025 x 1.2) = -1.23. Calibrated value = slope x raw + offset = 1.025 x raw - 1.23. For example, raw reading of 23.5C would become: 1.025 x 23.5 - 1.23 = 22.86C (corrected).
Question 3: What is the main advantage of a Kalman filter over a simple moving average filter?
Answer: A Kalman filter weights each new measurement against a model prediction according to their estimated uncertainties (process noise Q and measurement noise R). This gives the optimal estimate for a linear system with Gaussian noise when Q and R are set correctly. A moving average treats all samples in its window equally, with fixed weights. Kalman filters are well suited to tracking changing values (such as a moving object) because they predict the next state and adjust based on measurement confidence. As a result, they can respond faster to real changes while still filtering noise.
Question 4: Why is regular calibration important for IoT sensors deployed in the field?
Answer: Sensors experience drift over time due to aging, environmental exposure (temperature cycles, humidity, contamination), mechanical stress, and component degradation. Drift causes systematic errors where the sensor gradually becomes less accurate. Regular calibration (every 6-12 months for precision applications) corrects this drift and maintains measurement accuracy. Without calibration, a sensor that was initially 0.5C accurate might drift to 3C over a year, making data unreliable for critical applications.

551.7 Common Processing Pitfalls
The Mistake: Using a large moving average window (N=32 or N=64 samples) to filter sensor data that changes rapidly, introducing unacceptable lag that makes control systems sluggish or miss transient events entirely.
Why It Happens: Moving average is simple to implement, and tutorials often recommend larger windows for “smoother” data. For slowly changing signals (room temperature sampled at 1Hz), a 10-second window works well. Fast signals are different (accelerometer at 100Hz, current sensing for motor control). There, the same approach adds about N/2 samples of delay: a 32-sample filter at 100Hz introduces about 160ms of lag, which can make closed-loop control unstable.
The Fix: Match filter characteristics to signal dynamics:
- Slow signals (temperature, humidity): Moving average N=8-32 at 1Hz sampling, giving 4-16 seconds of delay and 8-32 seconds to fully settle after a step
- Medium signals (distance, pressure): Exponential moving average (EMA) with alpha=0.1-0.3, responds faster while filtering noise
- Fast signals (motor current, vibration): Use IIR filters (Butterworth, Chebyshev) designed for specific cutoff frequency
EMA Formula:
\[\text{filtered} = \alpha \times \text{new\_value} + (1-\alpha) \times \text{previous\_filtered}\]
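The EMA formula above needs only one stored value, which is why it suits medium-speed signals on small MCUs. A minimal sketch follows; `EmaFilter` and `emaAlphaForCutoff` are hypothetical names.

- The filter seeds itself with the first sample, to avoid a slow ramp up from zero.
- The helper uses the standard first-order RC discretization α = Δt / (RC + Δt) with RC = 1/(2π f_c). This approximates the requested cutoff well only when f_c is far below the sample rate.

```cpp
// Exponential moving average (first-order IIR low-pass):
// filtered = alpha * newValue + (1 - alpha) * previousFiltered
class EmaFilter {
public:
  explicit EmaFilter(float alpha) : alpha_(alpha), value_(0.0f), initialized_(false) {}

  float filter(float newValue) {
    if (!initialized_) {
      value_ = newValue;  // seed with the first sample instead of ramping from 0
      initialized_ = true;
    } else {
      value_ = alpha_ * newValue + (1.0f - alpha_) * value_;
    }
    return value_;
  }

private:
  float alpha_;
  float value_;
  bool initialized_;
};

// Approximate alpha for a desired cutoff frequency (Hz) at a given sample
// rate (Hz), using the RC low-pass discretization alpha = dt / (RC + dt)
float emaAlphaForCutoff(float cutoffHz, float sampleRateHz) {
  const float kPi = 3.14159265f;
  float dt = 1.0f / sampleRateHz;
  float rc = 1.0f / (2.0f * kPi * cutoffHz);
  return dt / (rc + dt);
}
```

With α = 0.2, a step change reaches about 89% of its final value after 10 samples (1 - 0.8^10 ≈ 0.89).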
The Mistake: Calibrating a thermistor, pH sensor, or photodiode at only one reference point (e.g., room temperature, pH 7, or ambient light), then assuming the calibration applies across the entire measurement range.
Why It Happens: Single-point calibration is quick: adjust the offset so the reading matches one known value, and ship. This works for sensors with a linear response and negligible gain error. But many sensors are inherently non-linear:
- Thermistor resistance varies roughly exponentially with temperature, as modeled by the Steinhart-Hart equation.
- pH electrodes have a temperature-dependent Nernst slope.
- Photodiode output becomes non-linear at high light intensity, and the open-circuit voltage is logarithmic in light level.
The Fix: Use two-point calibration minimum for linear sensors, three or more points for non-linear sensors:
- Linear sensors (RTD, 4-20mA transmitters): Calibrate at 10% and 90% of range
- Thermistor (NTC): Use Steinhart-Hart equation with three calibration points (0C, 25C, 100C)
- pH sensor: Calibrate at pH 4.0, 7.0, and 10.0 buffers
Warning Signs: You need multi-point calibration if: (1) sensor datasheet shows non-linear response curve, (2) accuracy degrades significantly away from single calibration point, (3) sensor type is known to be non-linear.
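The three-point thermistor calibration listed above amounts to solving three linear equations for the Steinhart-Hart coefficients in 1/T = A + B ln R + C (ln R)^3, with T in kelvin. The sketch below uses the standard closed-form solution; `fitSteinhartHart` and `thermistorCelsius` are hypothetical names.

The code uses `double` because C is very small. On 8-bit AVR boards, `double` is only 32 bits wide, so the coefficients are often computed on a PC and stored as constants.

```cpp
#include <cmath>

// Steinhart-Hart model: 1/T = A + B*ln(R) + C*ln(R)^3 (T in kelvin, R in ohms)
struct SteinhartHart {
  double a;
  double b;
  double c;
};

// Solve for A, B, C from three (resistance, temperature in C) points
SteinhartHart fitSteinhartHart(double r1, double t1C,
                               double r2, double t2C,
                               double r3, double t3C) {
  double l1 = std::log(r1), l2 = std::log(r2), l3 = std::log(r3);
  double y1 = 1.0 / (t1C + 273.15);
  double y2 = 1.0 / (t2C + 273.15);
  double y3 = 1.0 / (t3C + 273.15);
  double g2 = (y2 - y1) / (l2 - l1);
  double g3 = (y3 - y1) / (l3 - l1);
  SteinhartHart sh;
  sh.c = (g3 - g2) / (l3 - l2) / (l1 + l2 + l3);
  sh.b = g2 - sh.c * (l1 * l1 + l1 * l2 + l2 * l2);
  sh.a = y1 - (sh.b + sh.c * l1 * l1) * l1;
  return sh;
}

// Convert a measured thermistor resistance (ohms) to degrees Celsius
double thermistorCelsius(const SteinhartHart& sh, double rOhms) {
  double l = std::log(rOhms);
  double invT = sh.a + sh.b * l + sh.c * l * l * l;
  return 1.0 / invT - 273.15;
}
```

A quick sanity check on the measured points: fit with resistances taken at 0C, 25C, and 100C, then convert each of those resistances back. The result should be the original three temperatures.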
551.8 Summary
This chapter covered essential sensor data processing techniques:
- Moving Average Filter: Simple noise reduction by averaging N samples, best for slow-changing signals
- Kalman Filter: Optimal adaptive filtering that balances predictions with measurements
- Median Filter: Spike/outlier removal by selecting the middle value
- Two-Point Calibration: Corrects offset and gain errors using two reference points
- Multi-Point Calibration: Handles non-linear sensors with piecewise interpolation
- EEPROM Storage: Persists calibration across power cycles
551.9 What’s Next
The next chapter covers Sensor Networks and Power Management, including multi-sensor data aggregation, low-power sleep modes, and battery life optimization strategies for wireless sensor nodes.
Related chapters:
- Sensor Communication Protocols - I2C, SPI, UART interfaces
- Multi-Sensor Fusion - Combining multiple sensors
- Sensor Calibration Lab - Hands-on Wokwi calibration workshop