551 Sensor Data Processing

Filtering, Calibration, and Signal Conditioning

sensing

data-processing

filtering

calibration

Author

IoT Textbook

Published

January 19, 2026

Keywords

sensor filtering, calibration, moving average, Kalman filter, signal conditioning, noise reduction

551.1 Learning Objectives

By the end of this chapter, you will be able to:

Implement moving average and Kalman filters to reduce sensor noise
Perform two-point calibration to correct sensor offset and gain errors
Design data validation pipelines that detect anomalies and outliers
Choose appropriate filtering strategies based on signal characteristics
Store and retrieve calibration coefficients for production deployments

551.2 Introduction

Raw sensor data is rarely perfect. Environmental noise, electrical interference, and manufacturing variations all affect measurement accuracy. This chapter covers the essential techniques for transforming noisy, uncalibrated sensor readings into reliable, accurate measurements.

For Beginners: Why Process Sensor Data?

Imagine a thermometer that always reads 2 degrees too high, and sometimes jumps around randomly. That is what raw sensor data often looks like. Data processing is like training that thermometer to be accurate and stable. We use “filters” to smooth out the jumps and “calibration” to correct the 2-degree error. Without these steps, your IoT system would make decisions based on wrong information.

551.3 Filtering Noisy Sensor Data

20 min | Intermediate | P06.C09.U02a

Sensor noise comes from many sources: electrical interference, quantization errors, and environmental factors. Filters reduce this noise while preserving as much of the true signal as possible.

551.3.1 Moving Average Filter

The moving average filter is the simplest and most common approach. It averages the last N readings to smooth out random variations.

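// Moving average filter (fixed-size circular buffer)
class MovingAverageFilter {
  private:
    float* buffer;
    int size;
    int index;
    int count;   // Number of valid samples stored so far
    float sum;

  public:
    MovingAverageFilter(int windowSize) {
      size = (windowSize > 0) ? windowSize : 1;  // Guard against a zero-length window
      buffer = new float[size];
      index = 0;
      count = 0;
      sum = 0;

      for(int i = 0; i < size; i++) {
        buffer[i] = 0;
      }
    }

    // Release the buffer allocated in the constructor
    ~MovingAverageFilter() {
      delete[] buffer;
    }

    float filter(float value) {
      sum -= buffer[index];    // Drop the oldest sample from the running sum
      buffer[index] = value;   // Overwrite it with the newest sample
      sum += value;

      index = (index + 1) % size;
      if (count < size) count++;

      // Divide by the samples actually received so the output
      // is not pulled toward zero while the buffer fills
      return sum / count;
    }
};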

551.3.2 Kalman Filter

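The Kalman filter estimates the true value by combining a model of how the signal evolves with each new measurement. For linear systems with Gaussian noise it is the optimal estimator. Its gain adapts to the balance between process noise (Q) and measurement noise (R).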

// Kalman filter (simple 1D implementation)
class KalmanFilter {
  private:
    float q;  // Process noise covariance
    float r;  // Measurement noise covariance
    float x;  // Estimated value
    float p;  // Estimation error covariance
    float k;  // Kalman gain

  public:
    KalmanFilter(float processNoise, float measurementNoise, float initialValue) {
      q = processNoise;
      r = measurementNoise;
      x = initialValue;
      p = 1;
    }

    float filter(float measurement) {
      // Prediction
      p = p + q;

      // Update
      k = p / (p + r);
      x = x + k * (measurement - x);
      p = (1 - k) * p;

      return x;
    }
};

// Usage example
MovingAverageFilter maFilter(10);  // 10-sample window
KalmanFilter kFilter(0.01, 0.1, 25.0);  // Process noise, measurement noise, initial value

void setup() {
  Serial.begin(115200);  // Required before any Serial.print call
}

void loop() {
  // readTemperature() is a placeholder: replace it with your sensor's read function
  float rawTemp = readTemperature();

  float filteredMA = maFilter.filter(rawTemp);
  float filteredKalman = kFilter.filter(rawTemp);

  Serial.print("Raw: ");
  Serial.print(rawTemp);
  Serial.print(" | Moving Avg: ");
  Serial.print(filteredMA);
  Serial.print(" | Kalman: ");
  Serial.println(filteredKalman);

  delay(100);
}
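
Because q and r are constant in the implementation above, the gain k settles to a fixed value after a few updates, at which point the filter behaves like an exponential moving average with alpha = k. The sketch below iterates the same covariance recursion to find that value. `steadyStateGain` is a hypothetical helper name, not part of any library, and the default iteration count is an arbitrary assumption chosen to be generous.

```cpp
#include <cassert>

// Iterate the 1D Kalman covariance recursion until the gain settles.
// q: process noise covariance, r: measurement noise covariance (both > 0).
float steadyStateGain(float q, float r, int iterations = 200) {
    assert(q > 0.0f && r > 0.0f);
    float p = 1.0f;  // Same initial error covariance as the class above
    float k = 0.0f;
    for (int i = 0; i < iterations; i++) {
        p = p + q;           // Prediction step
        k = p / (p + r);     // Gain for this update
        p = (1.0f - k) * p;  // Update step
    }
    return k;
}
```

With the values from the usage example (q = 0.01, r = 0.1) the gain converges to roughly 0.27, so each output moves about 27% of the way toward the newest measurement. Lowering q or raising r lowers the gain, which smooths more but responds more slowly.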

551.3.3 Median Filter for Spike Removal

When sensor data has occasional spike errors (outliers), a median filter is more effective than averaging.

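#include <string.h>  // memcpy

#define MEDIAN_MAX_WINDOW 15  // Largest supported window (scratch array lives on the stack)

float medianFilter(const float* buffer, int size) {
    if (size <= 0) return 0.0f;  // Nothing to filter
    if (size > MEDIAN_MAX_WINDOW) size = MEDIAN_MAX_WINDOW;  // Only the first 15 samples are used

    // Fixed-size copy: variable-length arrays are not standard C++
    float sorted[MEDIAN_MAX_WINDOW];
    memcpy(sorted, buffer, size * sizeof(float));

    // Simple bubble sort for small arrays
    for (int i = 0; i < size - 1; i++) {
        for (int j = 0; j < size - i - 1; j++) {
            if (sorted[j] > sorted[j+1]) {
                float temp = sorted[j];
                sorted[j] = sorted[j+1];
                sorted[j+1] = temp;
            }
        }
    }

    return sorted[size / 2];  // Middle value (upper middle when size is even)
}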

// Example: [22, 55, 23] -> sorted: [22, 23, 55] -> median: 23
// The spike (55) is completely ignored!
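
For the three-sample window used in the example above, sorting is not strictly necessary. The following is a minimal sketch, assuming a window of exactly three readings; `median3` is a hypothetical helper name.

```cpp
#include <algorithm>  // std::min, std::max

// Median of three values using only min/max comparisons
// (no sorting and no buffer copy).
float median3(float a, float b, float c) {
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}
```

Calling `median3(22, 55, 23)` selects 23, matching the sorted example. For larger windows the sort-based function remains the simpler choice.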

Tradeoff: Moving Average vs. Kalman Filter for Noise Reduction

Option A: Moving Average (N=10 samples): Memory usage about 40 bytes for the buffer (10 floats) plus a few bytes of state, constant work per update (one subtract, one add, one divide), delay of (N-1)/2 = 4.5 samples and 10 samples to settle fully after a step, noise reduction sqrt(N) = 3.16x for uncorrelated noise, implementation complexity low, one parameter to choose (the window size N)

Option B: Kalman Filter (1D): Memory usage 20 bytes (5 floats for state), a few multiplies and one divide per update, latency that depends on the gain (typically a few samples once the gain has settled), noise reduction set by the Q/R ratio (a smaller ratio smooths more but responds more slowly), implementation complexity medium (about 30 lines), requires Q and R tuning

Decision Factors: For stationary signals with Gaussian noise (temperature averaging), moving average is simpler and nearly as effective. For tracking changing signals (position, velocity, acceleration) where latency matters, a well-tuned Kalman filter can respond faster for the same noise rejection. Kalman requires estimates of process noise (Q) and measurement noise (R), and wrong values degrade performance. On resource-constrained 8-bit MCUs without a floating-point unit (ATmega328), a moving average can be written with integer-only math, which saves flash and runs faster. The ESP32’s hardware floating-point unit makes a Kalman filter practical.
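
The integer-only variant mentioned for 8-bit MCUs can be sketched as follows, assuming raw ADC counts that fit in 16 bits and a power-of-two window so the division becomes a shift. `IntMovingAverage` and `WINDOW_SHIFT` are hypothetical names chosen for this sketch.

```cpp
#include <cstdint>

// Integer-only moving average over 2^WINDOW_SHIFT samples (here 8).
// Assumes unsigned 16-bit inputs such as raw ADC counts.
class IntMovingAverage {
  public:
    static const uint8_t WINDOW_SHIFT = 3;
    static const uint8_t WINDOW = 1 << WINDOW_SHIFT;

    uint16_t filter(uint16_t value) {
        sum -= buffer[index];                // Remove the oldest sample
        buffer[index] = value;               // Store the newest sample
        sum += value;
        index = (index + 1) & (WINDOW - 1);  // Wrap without a modulo
        if (count < WINDOW) count++;
        // During warm-up divide by the samples received; afterwards shift
        return (count < WINDOW) ? (uint16_t)(sum / count)
                                : (uint16_t)(sum >> WINDOW_SHIFT);
    }

  private:
    uint16_t buffer[WINDOW] = {0};
    uint32_t sum = 0;   // 32 bits: 8 x 65535 does not fit in 16
    uint8_t index = 0;
    uint8_t count = 0;
};
```

The power-of-two window is a design choice rather than a requirement: it trades flexibility in N for a shift and a bit mask in place of a division and a modulo.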

551.4 Sensor Calibration

25 min | Intermediate | P06.C09.U02b

Calibration corrects systematic errors in sensor readings. Two-point calibration addresses both offset (zero-point shift) and gain (sensitivity) errors.

551.4.1 Two-Point Calibration

// Two-point calibration for linear sensors
#include <EEPROM.h>

struct CalibrationData {
  float rawLow;
  float rawHigh;
  float actualLow;
  float actualHigh;
};

CalibrationData cal = {
  .rawLow = 512,      // ADC reading at low point
  .rawHigh = 3584,    // ADC reading at high point
  .actualLow = 0.0,   // Actual value at low point
  .actualHigh = 100.0 // Actual value at high point
};

float calibrate(float rawValue) {
  float span = cal.rawHigh - cal.rawLow;
  if (span == 0) return cal.actualLow;  // Avoid division by zero on bad calibration data

  // Linear interpolation
  float slope = (cal.actualHigh - cal.actualLow) / span;
  float calibratedValue = cal.actualLow + slope * (rawValue - cal.rawLow);

  return calibratedValue;
}

// Store calibration in EEPROM.
// EEPROM.begin(size) and EEPROM.commit() belong to the ESP32/ESP8266
// EEPROM emulation; on AVR boards (e.g. Arduino Uno) omit both calls.
void saveCalibration() {
  EEPROM.begin(512);
  EEPROM.put(0, cal);
  EEPROM.commit();
  Serial.println("Calibration saved");
}

void loadCalibration() {
  EEPROM.begin(512);
  EEPROM.get(0, cal);  // Copies whatever bytes are stored, valid or not
  Serial.println("Calibration loaded");
}
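
Loading calibration blindly is risky on a fresh device, where the storage holds whatever bytes it shipped with. One common approach is to store a magic number and a checksum next to the coefficients and fall back to defaults when either check fails. The sketch below illustrates the idea in portable C++, with a plain byte array standing in for EEPROM. `StoredCalibration`, `CAL_MAGIC`, `sumBytes`, `packCalibration`, and `unpackCalibration` are hypothetical names, and the additive checksum is an assumption chosen for brevity (a CRC would catch more errors).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

const uint32_t CAL_MAGIC = 0xCA11B8ED;  // Arbitrary marker for a valid record

struct StoredCalibration {
    uint32_t magic;
    float rawLow, rawHigh, actualLow, actualHigh;
    uint32_t checksum;  // Sum of all preceding bytes
};

// Simple additive checksum over the first len bytes
uint32_t sumBytes(const uint8_t* data, std::size_t len) {
    uint32_t sum = 0;
    for (std::size_t i = 0; i < len; i++) sum += data[i];
    return sum;
}

// Serialize a record into storage, stamping magic and checksum
void packCalibration(StoredCalibration rec, uint8_t* storage) {
    rec.magic = CAL_MAGIC;
    uint8_t bytes[sizeof(StoredCalibration)];
    std::memcpy(bytes, &rec, sizeof(rec));
    rec.checksum = sumBytes(bytes, offsetof(StoredCalibration, checksum));
    std::memcpy(storage, &rec, sizeof(rec));
}

// Returns true and fills out only if the stored record passes every check
bool unpackCalibration(const uint8_t* storage, StoredCalibration& out) {
    StoredCalibration rec;
    std::memcpy(&rec, storage, sizeof(rec));
    if (rec.magic != CAL_MAGIC) return false;
    if (rec.checksum != sumBytes(storage, offsetof(StoredCalibration, checksum))) return false;
    if (rec.rawHigh == rec.rawLow) return false;  // Would divide by zero
    out = rec;
    return true;
}
```

On real hardware the byte array would be replaced by reads and writes to the board's non-volatile storage, and a failed `unpackCalibration` would trigger either the compiled-in defaults or a prompt to recalibrate.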

551.5 Worked Example: Calibrating an Analog Sensor

Scenario: You are deploying a soil moisture monitoring system for a greenhouse. The capacitive soil moisture sensor outputs an analog voltage (0-3.3V) that varies with soil moisture content. However, the raw ADC readings do not correspond to meaningful moisture percentages. The sensor reads approximately 3000 (ADC units) in completely dry soil and 1200 in saturated soil. You need accurate readings to trigger irrigation at 30% moisture.

Goal: Develop and implement a two-point calibration procedure to convert raw ADC readings into calibrated moisture percentages (0-100%).

What we do: Measure the sensor’s output range and behavior.

Initial measurements:

Condition	ADC Reading (12-bit, 0-4095)	Expected Moisture
Air (no soil)	3450	~0% (baseline)
Bone-dry soil (oven-dried)	3000	0%
Field capacity (well-watered)	1800	~60-70%
Saturated soil (standing water)	1200	100%

Observations:

Output is inversely proportional to moisture (higher moisture = lower ADC value)
Range spans approximately 1200-3000 ADC units for the usable moisture range
Response is approximately linear in the 20-80% moisture range

What we do: Establish known moisture levels using gravimetric method.

Gravimetric calibration procedure:

Prepare soil samples: Collect 5 containers of identical soil (200g dry mass each)
Create moisture levels (moisture % by weight = added water mass / dry soil mass x 100):
- Sample A: Oven-dry at 105C for 24 hours (0% moisture)
- Sample B: Add 10g water (5% moisture by weight)
- Sample C: Add 30g water (15% moisture)
- Sample D: Add 60g water (30% moisture - irrigation trigger)
- Sample E: Saturate and drain (field capacity, ~60%)
Record calibration data:

Sample	Added Water (g)	Calculated Moisture (%)	ADC Reading
A	0	0%	2988
B	10	5%	2855
C	30	15%	2500
D	60	30%	2030
E	~120 (saturated)	60%	1515

What we do: Fit a linear equation to the calibration data.

Two-point calibration (using dry and field capacity points):

Point 1 (Low): ADC = 2988, Moisture = 0%
Point 2 (High): ADC = 1515, Moisture = 60%

Calculate slope (m) and offset (b), where X is the ADC reading and Y is the moisture percentage:

\[m = \frac{Y_2 - Y_1}{X_2 - X_1} = \frac{60 - 0}{1515 - 2988} = \frac{60}{-1473} = -0.0407\]

\[b = Y_1 - m \times X_1 = 0 - (-0.0407 \times 2988) = 121.6\]

Calibration equation:

\[\text{Moisture \%} = -0.0407 \times \text{ADC} + 121.6\]
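
A quick way to sanity-check a two-point fit is to evaluate it at the intermediate gravimetric samples that were not used to build it. The sketch below does this for the calibration table above, computing the slope and offset from the two endpoints rather than from the rounded constants. `Line`, `fitTwoPoint`, and `soilFit` are hypothetical names.

```cpp
// Straight line y = m * x + b
struct Line {
    double m;
    double b;
    double at(double x) const { return m * x + b; }
};

// Fit a line through two calibration points (x1, y1) and (x2, y2); x1 != x2
Line fitTwoPoint(double x1, double y1, double x2, double y2) {
    double m = (y2 - y1) / (x2 - x1);
    return Line{m, y1 - m * x1};
}

// Samples A and E from the calibration table: ADC reading -> moisture (%)
const Line soilFit = fitTwoPoint(2988.0, 0.0, 1515.0, 60.0);
```

Evaluating `soilFit.at()` at the remaining table rows gives about 5.4% for sample B (2855), 19.9% for sample C (2500), and 39.0% for sample D (2030), compared with gravimetric values of 5%, 15%, and 30%. Residuals of that size near the irrigation trigger suggest the response is not perfectly linear over this range, which is where a multi-point calibration becomes useful.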

What we do: Create production-ready calibration code.

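#include <EEPROM.h>

#define SOIL_PIN 34
#define NUM_SAMPLES 10

struct Calibration {
    uint32_t magic;     // Marks a valid record when stored in EEPROM
    float dryADC;
    float wetADC;
    float dryMoisture;
    float wetMoisture;
};

// Values match the two-point fit above: sample A (dry) and sample E (field capacity)
Calibration cal = {
    .magic = 0xCAFEBABE,
    .dryADC = 2988.0,
    .wetADC = 1515.0,
    .dryMoisture = 0.0,
    .wetMoisture = 60.0
};

// Read NUM_SAMPLES values and return their median to reject spikes
float readADCFiltered() {
    float samples[NUM_SAMPLES];
    for (int i = 0; i < NUM_SAMPLES; i++) {
        samples[i] = analogRead(SOIL_PIN);
        delay(2);  // Short pause between conversions
    }
    // Insertion sort (small fixed-size array)
    for (int i = 1; i < NUM_SAMPLES; i++) {
        float key = samples[i];
        int j = i - 1;
        while (j >= 0 && samples[j] > key) {
            samples[j + 1] = samples[j];
            j--;
        }
        samples[j + 1] = key;
    }
    return samples[NUM_SAMPLES / 2];
}

float getMoisturePercent() {
    // Read with median filtering
    float adcValue = readADCFiltered();

    float span = cal.dryADC - cal.wetADC;
    if (span == 0.0) return cal.dryMoisture;  // Guard against invalid calibration

    // Linear interpolation between the two calibration points
    float moisture = cal.dryMoisture +
        (cal.wetMoisture - cal.dryMoisture) *
        (cal.dryADC - adcValue) / span;

    // Clamp to valid range (readings wetter than sample E are extrapolated)
    if (moisture < 0.0) moisture = 0.0;
    if (moisture > 100.0) moisture = 100.0;

    return moisture;
}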

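Outcome: The sensor now reports calibrated moisture percentages. The two-point line is exact only at samples A and E: at samples C and D it gives about 19.9% and 39.0% against gravimetric values of 15% and 30%, so verify the figures below against your own reference samples before relying on the 30% irrigation trigger.

Reported accuracy: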

Moisture Range	Calibration Error	Acceptable?
0-20% (dry)	1.5%	Yes
20-40% (trigger zone)	2.0%	Yes
40-60% (moist)	3.0%	Yes
60-100% (wet)	5.0%	Yes

Maintenance schedule:

Recalibrate every 6 months or after sensor replacement
Verify with known moisture sample monthly during growing season

551.6 Knowledge Check

Question 1: You have a noisy temperature sensor that occasionally produces spike values (e.g., 22C, 23C, 55C, 22C). Which filter is better for removing these spikes: moving average or median filter?

Answer: Median filter is better for removing spike noise. A median filter takes the middle value of a window, effectively ignoring outliers. Example with window=3: [22, 55, 23] sorted: [22, 23, 55] median: 23C (spike ignored). A moving average would give (22+55+23)/3 = 33.3C, still affected by the spike. Use median filters for spike/impulse noise, moving average for Gaussian noise.

Question 2: Your temperature sensor reads 1.2C in an ice bath (should be 0C) and 98.8C in boiling water (should be 100C). What are the calibration slope and offset?

Answer: Slope = 1.025, Offset = -1.23C. Calculation: slope = (100 - 0) / (98.8 - 1.2) = 100 / 97.6 = 1.025. Offset = 0 - (1.025 x 1.2) = -1.23. Calibrated value = slope x raw + offset = 1.025 x raw - 1.23. For example, raw reading of 23.5C would become: 1.025 x 23.5 - 1.23 = 22.86C (corrected).

Question 3: What is the main advantage of a Kalman filter over a simple moving average filter?

Answer: Kalman filters adapt dynamically based on measurement uncertainty and process noise, providing optimal estimates that balance between sensor measurements and predictions. Moving average treats all samples equally with fixed weights. Kalman filters are ideal for tracking changing values (like tracking a moving object) because they predict the next state and adjust based on measurement confidence. They also respond faster to real changes while still filtering noise effectively.

Question 4: Why is regular calibration important for IoT sensors deployed in the field?

Answer: Sensors experience drift over time due to aging, environmental exposure (temperature cycles, humidity, contamination), mechanical stress, and component degradation. Drift causes systematic errors where the sensor gradually becomes less accurate. Regular calibration (every 6-12 months for precision applications) corrects this drift and maintains measurement accuracy. Without calibration, a sensor that was initially 0.5C accurate might drift to 3C over a year, making data unreliable for critical applications.

551.7 Common Processing Pitfalls

Pitfall: Applying Moving Average Filter to Non-Stationary Signals

The Mistake: Using a large moving average window (N=32 or N=64 samples) to filter sensor data that changes rapidly, introducing unacceptable lag that makes control systems sluggish or miss transient events entirely.

Why It Happens: Moving average is simple to implement and tutorials recommend larger windows for “smoother” data. For slowly-changing signals (room temperature sampled at 1Hz), a 10-second window works well. But applying the same approach to fast signals (accelerometer at 100Hz, current sensing for motor control) adds about N/2 samples of delay ((N-1)/2 exactly): a 32-sample filter at 100Hz introduces roughly 155ms of lag, which can make closed-loop control unstable.

The Fix: Match filter characteristics to signal dynamics:

Slow signals (temperature, humidity): Moving average N=8-32 at 1Hz sampling, roughly 4-16 seconds of delay (8-32 seconds to settle fully after a step)
Medium signals (distance, pressure): Exponential moving average (EMA) with alpha=0.1-0.3, responds faster while filtering noise
Fast signals (motor current, vibration): Use IIR filters (Butterworth, Chebyshev) designed for specific cutoff frequency

EMA Formula: filtered = alpha x new_value + (1-alpha) x previous_filtered
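
The formula translates directly into a few lines of code. The following is a minimal sketch, assuming the first sample seeds the filter so the output does not start from zero; `ExponentialMovingAverage` is a hypothetical class name.

```cpp
// Exponential moving average: filtered = alpha * new + (1 - alpha) * previous
class ExponentialMovingAverage {
  public:
    explicit ExponentialMovingAverage(float alpha) : alpha_(alpha) {}

    float filter(float value) {
        if (!seeded_) {          // First call: start at the first reading
            filtered_ = value;
            seeded_ = true;
        } else {
            filtered_ = alpha_ * value + (1.0f - alpha_) * filtered_;
        }
        return filtered_;
    }

  private:
    float alpha_;            // Smoothing factor in (0, 1]; larger reacts faster
    float filtered_ = 0.0f;
    bool seeded_ = false;
};
```

Unlike a moving average, the state here is a single float regardless of how much smoothing is applied, which is part of why the EMA suits small microcontrollers.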

Pitfall: Single-Point Calibration for Non-Linear Sensors

The Mistake: Calibrating a thermistor, pH sensor, or photodiode at only one reference point (e.g., room temperature, pH 7, or ambient light), then assuming the calibration applies across the entire measurement range.

Why It Happens: Single-point calibration is quick: adjust the offset so the reading matches one known value and ship. This works for sensors with linear response and negligible gain error. But many sensors are inherently non-linear: thermistors follow the Steinhart-Hart equation (resistance falls roughly exponentially as temperature rises), pH electrodes have a temperature-dependent Nernst slope, and photodiodes read in photovoltaic (open-circuit voltage) mode respond logarithmically to light intensity.

The Fix: Use at least two calibration points for linear sensors and three or more for non-linear sensors:

Linear sensors (RTD, 4-20mA transmitters): Calibrate at 10% and 90% of range
Thermistor (NTC): Use Steinhart-Hart equation with three calibration points (0C, 25C, 100C)
pH sensor: Calibrate at pH 4.0, 7.0, and 10.0 buffers
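
For the thermistor case, the Steinhart-Hart conversion itself is short once the three coefficients are known. The sketch below assumes the thermistor resistance has already been derived from the ADC reading. `thermistorCelsius` is a hypothetical function name, and the default coefficients are illustrative values often quoted for a generic 10 kOhm NTC part, not the result of the three-point calibration described above.

```cpp
#include <cmath>

// Steinhart-Hart: 1/T = A + B*ln(R) + C*ln(R)^3, with T in kelvin and R in ohms.
// The default coefficients are illustrative assumptions for a generic
// 10 kOhm NTC thermistor; replace them with values fitted to your own sensor.
double thermistorCelsius(double resistanceOhms,
                         double A = 1.009249522e-3,
                         double B = 2.378405444e-4,
                         double C = 2.019202697e-7) {
    double lnR = std::log(resistanceOhms);
    double invT = A + B * lnR + C * lnR * lnR * lnR;
    return 1.0 / invT - 273.15;  // Kelvin to Celsius
}
```

With these assumed coefficients a resistance of 10 kOhm maps to roughly 25C, and lower resistances map to higher temperatures, as expected for an NTC device.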

Warning Signs: You need multi-point calibration if: (1) sensor datasheet shows non-linear response curve, (2) accuracy degrades significantly away from single calibration point, (3) sensor type is known to be non-linear.
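
When a sensor needs more than two points, one simple option is a lookup table with piecewise-linear interpolation between neighbouring calibration points. The sketch below applies this to the five gravimetric samples from the soil moisture worked example. `CalPoint`, `SOIL_TABLE`, and `interpolateMoisture` are hypothetical names, and clamping outside the table (rather than extrapolating) is an assumption.

```cpp
#include <cstddef>

struct CalPoint {
    float adc;       // Raw reading
    float moisture;  // Reference value (%)
};

// Gravimetric samples A-E from the worked example, ordered by descending ADC
const CalPoint SOIL_TABLE[] = {
    {2988.0f, 0.0f}, {2855.0f, 5.0f}, {2500.0f, 15.0f},
    {2030.0f, 30.0f}, {1515.0f, 60.0f}
};
const std::size_t SOIL_POINTS = sizeof(SOIL_TABLE) / sizeof(SOIL_TABLE[0]);

// Piecewise-linear interpolation; readings outside the table clamp to its ends
float interpolateMoisture(float adc) {
    if (adc >= SOIL_TABLE[0].adc) return SOIL_TABLE[0].moisture;
    if (adc <= SOIL_TABLE[SOIL_POINTS - 1].adc) return SOIL_TABLE[SOIL_POINTS - 1].moisture;

    for (std::size_t i = 1; i < SOIL_POINTS; i++) {
        const CalPoint& hi = SOIL_TABLE[i - 1];  // Drier neighbour (higher ADC)
        const CalPoint& lo = SOIL_TABLE[i];      // Wetter neighbour (lower ADC)
        if (adc >= lo.adc) {
            float t = (hi.adc - adc) / (hi.adc - lo.adc);  // 0 at hi, 1 at lo
            return hi.moisture + t * (lo.moisture - hi.moisture);
        }
    }
    return SOIL_TABLE[SOIL_POINTS - 1].moisture;  // Not reached
}
```

By construction this reproduces every calibration sample exactly, so an ADC reading of 2030 returns 30%. The cost is a small table in flash and a short search per reading.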

551.8 Summary

This chapter covered essential sensor data processing techniques:

Moving Average Filter: Simple noise reduction by averaging N samples, best for slow-changing signals
Kalman Filter: Optimal adaptive filtering that balances predictions with measurements
Median Filter: Spike/outlier removal by selecting the middle value
Two-Point Calibration: Corrects offset and gain errors using two reference points
Multi-Point Calibration: Handles non-linear sensors with piecewise interpolation
EEPROM Storage: Persists calibration across power cycles

551.9 What’s Next

The next chapter covers Sensor Networks and Power Management, including multi-sensor data aggregation, low-power sleep modes, and battery life optimization strategies for wireless sensor nodes.

Related Chapters

Sensor Communication Protocols - I2C, SPI, UART interfaces
Multi-Sensor Fusion - Combining multiple sensors
Sensor Calibration Lab - Hands-on Wokwi calibration workshop