15 Sensor Data Processing

Filtering, Calibration, and Signal Conditioning

sensing

data-processing

filtering

calibration

Author

IoT Textbook

Published

July 22, 2026

Keywords

sensor filtering, calibration, moving average, Kalman filter, signal conditioning, noise reduction

15.1 Start With the Measurement Story

A reading becomes information only after noise, drift, outliers, scaling, and timing are handled. Start with the claim the processed value must support, then choose the smallest processing step that makes that claim reliable.

Chapter Roadmap

This chapter has four passes through the same measurement story:

First you separate raw readings from trustworthy readings by naming noise, drift, offset, gain, and calibration.
Then you compare moving average, median, EMA, Kalman, and frequency-aware filters against the failure mode each one solves.
Next you turn raw ADC counts into engineering units with two-point calibration and a soil-moisture worked example.
Finally you test the whole pipeline with quizzes, matching, ordering, labeling, code, and the companion sampling-aliasing boundary.

Checkpoints summarize what you should be able to use immediately. Optional C++ patterns and deep calculations are there when you need implementation detail.

15.2 Learning Objectives

By the end of this chapter, you will be able to:

Implement and compare moving average, Kalman, and median filters to reduce sensor noise
Execute two-point calibration procedures to correct sensor offset and gain errors
Design data validation pipelines that detect anomalies and reject outliers
Select and justify appropriate filtering strategies based on signal characteristics and resource constraints
Persist calibration coefficients in EEPROM and restore them reliably across power cycles

15.3 In 60 Seconds

Raw sensor readings contain noise (random variation from electrical interference) and systematic errors (consistent offsets or gain errors from manufacturing). Filtering — moving average, Kalman, or median — reduces noise. Calibration — one-point for offset, two-point for offset plus gain — corrects systematic errors. Store calibration coefficients in EEPROM so they persist across power cycles.

15.4 Prerequisites

Before diving into this chapter, you should be familiar with:

Sensor Fundamentals: Understanding of sensor types and output characteristics
ADC Fundamentals: How analog signals are converted to digital values
Electricity Basics: Basic understanding of voltage, resistance, and signal levels
Basic Statistics (general knowledge): Mean, median, and variance concepts

15.5 Key Concepts

Noise: Random variations in sensor readings caused by electrical interference, temperature fluctuations, or quantization errors
Filtering: Mathematical techniques to remove noise while preserving the true signal
Calibration: Process of adjusting sensor output to match known reference values
Offset Error: Constant shift in readings across the entire measurement range (zero-point shift)
Gain Error: Proportional error that increases with measured value (sensitivity error)
Drift: Gradual change in sensor accuracy over time due to aging or environmental factors
EEPROM: Non-volatile memory for storing calibration data that persists across power cycles

15.6 Quick Check: Processing Purpose

15.7 Introduction

Raw sensor data is rarely perfect. Environmental noise, electrical interference, and manufacturing variations all affect measurement accuracy. This chapter covers the essential techniques for transforming noisy, uncalibrated sensor readings into reliable, accurate measurements.

15.8 Sensor Data Processing

Core Concept: Raw sensor readings contain noise (random variations) and systematic errors (consistent offsets). Processing transforms unreliable raw data into accurate, usable measurements through filtering and calibration.

Why It Matters: An uncalibrated temperature sensor reading “25.3°C” might actually mean anything from 23°C to 28°C. For an IoT system controlling HVAC or monitoring medical equipment, this uncertainty is unacceptable. Processing ensures your decisions are based on reality, not sensor artifacts.

Key Takeaway: Apply a moving average filter for slow-changing signals (temperature, humidity), use median filters to remove spikes, and always perform two-point calibration before deployment. Store calibration coefficients in EEPROM so they survive power cycles.

15.9 Making Sensors Tell Truth

Meet our friends: Sammy the Sensor, Lila the Light, Max the Microcontroller, and Bella the Battery!

Sammy says: “I try my best to measure temperature, but sometimes I get a little jittery - like when you try to draw a straight line but your hand wobbles! That’s called NOISE.”

Lila explains: “Imagine you’re trying to count how many people are in a room, but every time you count, you get a slightly different number - 23, 25, 22, 24. A FILTER is like asking 5 friends to count and taking the average. Much more accurate!”

Real-world example: Think about your bathroom scale. If you step on it three times and get 65 kg, 64 kg, and 66 kg, you’d probably say you weigh about 65 kg - that’s filtering! And if you know your scale always reads 1 kg too high (your friend’s scale says 64 kg), you’d subtract 1 kg - that’s calibration!

Max’s tip: “Here’s how to make sensors tell the truth:

1. Filter the noise - Take several readings and average them (like asking multiple people to measure)
2. Calibrate for accuracy - Compare against a known reference and adjust (like using a ruler you KNOW is correct)
3. Save the settings - Store calibration values so you don’t have to do it every time you turn on!”

Bella adds: “Good filtering means we don’t need to send as many messages to fix bad readings - that saves MY energy so your IoT device lasts longer on batteries!”

15.10 Moving Average Noise Reduction

Raw sensor readings: 24.8°C, 25.2°C, 24.9°C, 25.1°C, 25.0°C

Standard deviation (noise): \(\sigma = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n}} = \sqrt{\frac{0.04+0.04+0.01+0.01+0}{5}} = \sqrt{0.02} = 0.14°\text{C}\)

After 5-sample moving average: \(\bar{x} = \frac{24.8+25.2+24.9+25.1+25.0}{5} = 25.0°\text{C}\)

Noise reduction: \(\frac{\sigma}{\sqrt{n}} = \frac{0.14}{\sqrt{5}} = 0.063°\text{C}\) — approximately a 2.24x improvement.

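For N samples of uncorrelated noise, the standard deviation of the average falls by a factor of \(\sqrt{N}\). This is the mathematical basis for why averaging multiple readings improves precision beyond that of a single reading. Averaging does not remove offset or gain error, so accuracy still depends on calibration.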
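The arithmetic above can be reproduced with a short C++ sketch. The function names (`meanOf`, `populationStdDev`, `noiseAfterAveraging`) are illustrative and not part of any library, and the sketch assumes at least one sample and uncorrelated noise.

```cpp
#include <cmath>
#include <cstddef>

// Arithmetic mean of n readings (assumes n > 0).
float meanOf(const float* x, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; i++) sum += x[i];
    return sum / static_cast<float>(n);
}

// Population standard deviation: divides by n, matching the formula above.
float populationStdDev(const float* x, std::size_t n) {
    const float m = meanOf(x, n);
    float sumSq = 0.0f;
    for (std::size_t i = 0; i < n; i++) {
        const float d = x[i] - m;
        sumSq += d * d;
    }
    return std::sqrt(sumSq / static_cast<float>(n));
}

// Expected noise after averaging n uncorrelated samples: sigma / sqrt(n).
float noiseAfterAveraging(float sigma, std::size_t n) {
    return sigma / std::sqrt(static_cast<float>(n));
}
```

Feeding it the five readings from this section gives a mean of 25.0, a standard deviation of about 0.14, and about 0.063 after averaging five samples.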

Try It: Noise Reduction Calculator

viewof numSamples = Inputs.range([1, 50], {
  value: 5,
  step: 1,
  label: "Number of samples (N)"
})

viewof originalNoise = Inputs.range([0.05, 1.0], {
  value: 0.14,
  step: 0.01,
  label: "Original noise (σ in °C)"
})

reducedNoise = originalNoise / Math.sqrt(numSamples)
improvementFactor = reducedNoise > 0 ? originalNoise / reducedNoise : 0

html`<div style="background: var(--bs-light, #f8f9fa); padding: 20px; border-radius: 8px; border-left: 4px solid #16A085; margin-top: 15px; font-family: Arial, sans-serif;">
  <h4 style="color: #2C3E50; margin-top: 0;">Noise Reduction Results</h4>
  <table style="width: 100%; border-collapse: collapse; margin-top: 10px;">
    <tr style="background: #e8f5f2;">
      <td style="padding: 10px; border-bottom: 1px solid #d4e9e2; font-weight: bold; color: #2C3E50;">Original Noise (σ)</td>
      <td style="padding: 10px; border-bottom: 1px solid #d4e9e2; color: #2C3E50;">${originalNoise.toFixed(3)}°C</td>
    </tr>
    <tr>
      <td style="padding: 10px; border-bottom: 1px solid #d4e9e2; font-weight: bold; color: #2C3E50;">Samples Averaged (N)</td>
      <td style="padding: 10px; border-bottom: 1px solid #d4e9e2; color: #2C3E50;">${numSamples}</td>
    </tr>
    <tr style="background: #e8f5f2;">
      <td style="padding: 10px; border-bottom: 1px solid #d4e9e2; font-weight: bold; color: #2C3E50;">Reduced Noise (σ/√N)</td>
      <td style="padding: 10px; border-bottom: 1px solid #d4e9e2; color: #16A085; font-weight: bold;">${reducedNoise.toFixed(3)}°C</td>
    </tr>
    <tr>
      <td style="padding: 10px; font-weight: bold; color: #2C3E50;">Improvement Factor</td>
      <td style="padding: 10px; color: #E67E22; font-weight: bold;">${improvementFactor.toFixed(2)}×</td>
    </tr>
  </table>
  <p style="margin-top: 15px; color: #555; font-size: 14px; font-style: italic;">
    Formula: Reduced Noise = σ / √N. Note that doubling improvement requires 4× samples (e.g., 2× improvement needs N=4, 3× needs N=9).
  </p>
</div>`

Checkpoint: Measurement Meaning

You now know:

Processing is only useful when it preserves the measurement claim instead of smoothing away the evidence.
A 5-sample average can turn readings near 25°C into a 25.0°C estimate while reducing noise from 0.14°C to 0.063°C.
Averaging improves as sqrt(N): the example’s 5 samples give about a 2.24x improvement, not a free accuracy guarantee.

15.11 Filtering Noisy Sensor Data

20 min | Intermediate | P06.C09.U02a

Sensor noise comes from many sources: electrical interference, quantization errors, and environmental factors. Filters remove this noise while preserving the true signal.

The roadmap now shifts from “why process?” to “which process?” Start with the noise pattern you can observe, because the wrong filter can make a clean-looking but misleading signal.

15.12 Folded Sampling And Aliasing Notes

Filtering starts before the ADC. If the signal contains energy above half the sample rate (that is, if it is sampled at less than twice its highest frequency), that high-frequency content can appear as a false low-frequency pattern. Diagnose aliasing by naming the highest signal component, comparing it with the sample rate, changing the sample rate during a test, or checking the real waveform with faster capture.

Use the Nyquist limit as a minimum, not a complete design. Practical systems often sample faster than two times the highest frequency because analog filters roll off gradually. Place an anti-aliasing low-pass filter before the ADC when unwanted higher-frequency energy could enter the sampled data, then choose the digital filter after the sampled record preserves the signal you actually care about.

Phoebe’s Field Notes: Where Vibration Nyquist Meets the Sensor’s Own Step Size

Phoebe the physics guide

Phoebe’s Why

The “Vibration Filter Selection” scenario further down this page samples an ADXL345 at 100 Hz to catch bearing wear before it fails – but 100 Hz is chosen at the acquisition stage, before any of this page’s median or moving-average filters ever run. Nyquist sets a hard ceiling on what that choice can see: nothing above half the sample rate survives without folding into a lower, wrong frequency. Underneath that timing limit sits a second, physical one – the accelerometer converts g’s into digital counts in fixed-size steps, so even a perfectly timed sample still carries a small built-in rounding uncertainty from the transduction step itself.

The Derivation

Nyquist criterion:

\[f_s \geq 2 f_{max}\]

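An undersampled component, as described in the Folded Sampling notes above, folds to

\[f_{alias} = |f_{signal} - k f_s|\]

where \(k\) is the integer nearest to \(f_{signal}/f_s\), so that \(f_{alias}\) lands between 0 and \(f_s/2\).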

Transduction quantization step (sensitivity per code) and its noise floor:

\[q = \text{sensitivity per LSB}, \qquad \sigma_q = \frac{q}{\sqrt{12}}\]

Worked Numbers: This Page’s Own ADXL345 Scenario

100 Hz sampling (this page’s own figure) gives Nyquist \(f_{max} = 100/2 = 50.0\) Hz. Any real bearing-defect harmonic above 50 Hz would alias into the passband before either of this page’s own filter stages ever sees it – exactly why an anti-alias filter belongs ahead of the ADC, not inside the two-stage median-then-moving-average filter this page designs.
The ADXL345 full-resolution mode has a catalog-typical fixed step of \(3.9\) mg/LSB \(= 0.0039\) g regardless of range, so the quantization noise floor is \(\sigma_q = 0.0039/\sqrt{12} = 0.00113\) g RMS.
Compare that to this page’s own noise sources for this scenario: the \(\pm 0.3\) g electrical-interference spikes are about \(0.3/0.00113 = 266\times\) (48.5 dB) larger than \(\sigma_q\), and the \(\pm 0.15\) g slow drift is about \(133\times\) larger. The two-stage filter this page selects is earning its keep against real electrical and mechanical noise, not against the transducer’s own quantization floor, which was never the dominant term.
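The worked numbers above can be checked with a minimal C++ sketch. The helper names are illustrative, and `k` is taken as the integer nearest to the signal-to-sample-rate ratio, which is an assumption about how the folding formula is meant to be applied. The 70 Hz tone in the usage note is hypothetical.

```cpp
#include <cmath>

// Apparent frequency of a tone at fSignal when sampled at fs:
// |fSignal - k*fs|, with k the integer nearest to fSignal / fs.
double aliasFrequency(double fSignal, double fs) {
    const double k = std::round(fSignal / fs);
    return std::fabs(fSignal - k * fs);
}

// RMS quantization noise for a uniform step of size q: q / sqrt(12).
double quantizationNoiseRms(double q) {
    return q / std::sqrt(12.0);
}

// Amplitude ratio expressed in decibels (20 * log10).
double ratioToDb(double ratio) {
    return 20.0 * std::log10(ratio);
}
```

With these helpers, a 70 Hz component sampled at 100 Hz shows up at 30 Hz, `quantizationNoiseRms(0.0039)` is about 0.00113 g, and the 0.3 g spikes sit roughly 48.5 dB above that floor.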

15.12.1 Sensor Data Processing Pipeline

The following diagram shows the complete sensor data processing pipeline from raw readings to calibrated output:

Sensor data processing pipeline showing stages from raw ADC input through outlier rejection, smoothing, calibration, to clean output in engineering units — Sensor data processing pipeline from raw readings to calibrated output

15.12.2 Filter Selection Decision Tree

Choosing the right filter depends on your signal characteristics and noise type:

Filter Selection Decision Tree

Start with the failure mode you see in real readings, then choose the lightest filter that solves that problem.

Occasional impossible spikes?

Use a median filter first. It rejects outliers without averaging the bad value into later readings.

Slow signal with random jitter?

Use a moving average. Temperature, humidity, and pressure often only need a short rolling window.

Need low memory or faster response?

Use exponential moving average. Tune alpha: lower is smoother, higher follows changes faster.

Tracking motion or dynamic state?

Use a Kalman filter when you can estimate process noise and measurement noise.

Known frequency interference?

Use an IIR, Butterworth, or notch filter when you know the unwanted frequency band.

Still biased after filtering?

Filtering cannot fix offset or gain error. Calibrate with one, two, or multiple reference points.

Practical filter selection for sensor data processing

15.12.3 Moving Average Filter

The moving average filter is the simplest and most common approach. It averages the last N readings to smooth out random variations.

Use it when the measured quantity changes slowly and the problem is random jitter, not sudden spikes. For a beginner, the behavior matters more than the implementation: a larger window gives a smoother line but reacts more slowly.

15.12.4 Moving-Average Behavior

Keep the last N readings.
Remove the oldest reading when the window is full.
Add the newest reading.
Return the sum divided by the number of valid readings.

15.12.5 Optional C++ Pattern

// Moving average filter (template-based, fixed-size buffer)
// Template parameter N sets the window size at compile time, so the buffer
// lives inside the object and no heap allocation is needed, which suits
// memory-constrained MCUs.
template<int N>
class MovingAverageFilter {
  private:
    float buffer[N];  // Fixed-size member array, size known at compile time
    int index;
    int count;         // Tracks samples received (for cold-start)
    float sum;

  public:
    MovingAverageFilter() {
      index = 0;
      count = 0;
      sum = 0;

      for(int i = 0; i < N; i++) {
        buffer[i] = 0;
      }
    }

    float filter(float value) {
      sum -= buffer[index];
      buffer[index] = value;
      sum += value;

      index = (index + 1) % N;

      if (count < N) count++;  // Track fill level

      return sum / count;  // Divide by actual samples, not window size
    }
};

15.12.6 Kalman Filter

The Kalman filter provides optimal noise reduction for linear systems with Gaussian noise by modeling the system dynamics. It adapts based on measurement uncertainty and process noise. (For non-linear sensors such as thermistors or pH electrodes, Extended or Unscented Kalman Filters are needed instead.)

For most beginner IoT projects, do not start with Kalman. Start with the interactive comparison below, then choose Kalman only when the signal represents a changing state such as position, velocity, or orientation and you can estimate process and measurement noise.

15.12.7 Optional C++ Pattern

// Kalman filter (simple 1D implementation)
class KalmanFilter {
  private:
    float q;  // Process noise covariance
    float r;  // Measurement noise covariance
    float x;  // Estimated value
    float p;  // Estimation error covariance
    float k;  // Kalman gain

  public:
    KalmanFilter(float processNoise, float measurementNoise, float initialValue) {
      q = processNoise;
      r = measurementNoise;
      x = initialValue;
      p = 1;
    }

    float filter(float measurement) {
      // Prediction
      p = p + q;

      // Update
      k = p / (p + r);
      x = x + k * (measurement - x);
      p = (1 - k) * p;

      return x;
    }
};

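// Usage example
// readTemperature() is assumed to be provided elsewhere in the sketch
// and to return the latest reading in °C.
float readTemperature();

MovingAverageFilter<10> maFilter;  // 10-sample window (fixed-size buffer, no heap)
KalmanFilter kFilter(0.01, 0.1, 25.0);  // Process noise, measurement noise, initial value

void setup() {
  Serial.begin(115200);  // Open the serial port; the baud rate here is an arbitrary choice
}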

void loop() {
  float rawTemp = readTemperature();

  float filteredMA = maFilter.filter(rawTemp);
  float filteredKalman = kFilter.filter(rawTemp);

  Serial.print("Raw: ");
  Serial.print(rawTemp);
  Serial.print(" | Moving Avg: ");
  Serial.print(filteredMA);
  Serial.print(" | Kalman: ");
  Serial.println(filteredKalman);

  delay(100);
}

15.12.8 Median Filter for Spike Removal

When sensor data has occasional spike errors (outliers), a median filter is more effective than averaging.

The key idea is simple: sort a short window and keep the middle value. A single bad spike is ignored instead of being averaged into future readings.

15.12.9 Optional C++ Pattern

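// Median filter with fixed-size buffer (no VLA, portable C++)
#include <string.h>  // memcpy

const int MAX_MEDIAN_SIZE = 16;

float medianFilter(float* buffer, int size) {
    // Safety checks: an empty window has no median (0 is returned as a
    // placeholder); larger windows are clamped to the supported maximum
    if (size <= 0) return 0.0f;
    if (size > MAX_MEDIAN_SIZE) size = MAX_MEDIAN_SIZE;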

    float sorted[MAX_MEDIAN_SIZE];
    memcpy(sorted, buffer, size * sizeof(float));

    // Bubble sort — O(N^2), but acceptable for small N typical
    // in sensor filtering (N=3 to N=9). For larger windows,
    // consider insertion sort or std::nth_element for O(N) median.
    for (int i = 0; i < size - 1; i++) {
        for (int j = 0; j < size - i - 1; j++) {
            if (sorted[j] > sorted[j+1]) {
                float temp = sorted[j];
                sorted[j] = sorted[j+1];
                sorted[j+1] = temp;
            }
        }
    }

    // For even N, returns upper-median (standard median averages
    // the two middle values, but single-value is simpler for embedded)
    return sorted[size / 2];
}

// Example: [22, 55, 23] -> sorted: [22, 23, 55] -> median: 23
// The spike (55) is completely ignored!
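For the smallest useful window, N=3, the median can be found with three comparisons and no scratch buffer. The following is a minimal sketch of that alternative, not the chapter's pattern; `median3` and `Median3Filter` are illustrative names. During the first two calls the filter has too little history and passes readings through unchanged.

```cpp
// Median of three values using comparisons only (no sort, no buffer copy).
float median3(float a, float b, float c) {
    if (a > b) { float t = a; a = b; b = t; }  // now a <= b
    if (b > c) { float t = b; b = c; c = t; }  // now c is the largest
    if (a > b) { float t = a; a = b; b = t; }  // now a <= b <= c
    return b;
}

// Remembers the two previous readings and returns the median of the last three.
class Median3Filter {
  public:
    float update(float value) {
        // Fewer than two stored readings: pass the raw value through.
        float out = (primed_ < 2) ? value : median3(prev2_, prev1_, value);
        prev2_ = prev1_;
        prev1_ = value;
        if (primed_ < 2) primed_++;
        return out;
    }

  private:
    float prev1_ = 0.0f;
    float prev2_ = 0.0f;
    int primed_ = 0;
};
```

Feeding 22, 55, 23 into a `Median3Filter` returns 23 on the third call, matching the example above.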

15.12.10 Filter Behavior Comparison

The following diagram compares moving average and median filter responses to the same noisy input with a spike. Kalman filter response varies based on Q and R parameters and is explored in the interactive calculator below.

Filter comparison chart showing how raw noisy signal, moving average, and median filter each respond to the same noisy input with a spike, demonstrating median filter's spike rejection ability — Filter comparison showing response to noisy input with spike

15.12.11 Filter Comparison Summary

Filter Type	Memory (N samples)	CPU Cost	Latency	Best For	Weakness
Moving Average	N floats (4N bytes)	O(1) with circular buffer	N/2 samples	Gaussian noise, slow signals	Passes spikes, fixed response
Median Filter	N floats (4N bytes)	O(N²) bubble sort; O(N) possible	N/2 samples	Spike/impulse noise	CPU intensive for large N
Exponential MA	1 float (4 bytes)	O(1)	(1-alpha)/alpha samples	Memory-constrained devices	Parameter tuning required
Kalman Filter	5 floats (20 bytes)	O(1) with multiply/divide	Adaptive	Tracking, state estimation	Requires noise characterization
IIR/Butterworth	Order x 2 floats	O(order)	Phase-dependent	Known frequency noise	Design complexity (beyond this chapter’s scope)

15.12.12 Exponential Moving Average (EMA)

The EMA is the most memory-efficient filter – it requires only a single float (4 bytes) of state. Unlike a moving average that weights all N samples equally, the EMA gives exponentially decreasing weight to older samples. The smoothing factor alpha (0 to 1) controls responsiveness: lower alpha means smoother output but slower response.

Think of alpha as a trust knob: high alpha trusts the newest reading, low alpha trusts the previous filtered value.

15.12.13 EMA Behavior

filtered = alpha x new_reading + (1 - alpha) x previous_filtered

High alpha follows new readings quickly; low alpha produces a smoother, slower response.

15.12.14 Optional C++ Pattern

// Exponential Moving Average: one float (4 bytes) of filter state, plus alpha and an init flag
class EMAFilter {
  private:
    float filtered;
    float alpha;
    bool initialized;

  public:
    EMAFilter(float smoothingFactor) {
      alpha = smoothingFactor;
      filtered = 0;
      initialized = false;
    }

    float filter(float newValue) {
      if (!initialized) {
        filtered = newValue;  // First sample: no smoothing
        initialized = true;
      } else {
        filtered = alpha * newValue + (1.0f - alpha) * filtered;  // 1.0f keeps the math in single precision
      }
      return filtered;
    }
};

// Usage: alpha=0.1 for heavy smoothing, alpha=0.5 for fast response
EMAFilter emaFilter(0.1);

void loop() {
  float raw = readTemperature();
  float smoothed = emaFilter.filter(raw);  // O(1) time, one float of filter state
  delay(100);
}
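Choosing alpha is easier with two relations: the lag figure from the comparison table, (1 - alpha) / alpha samples, and the usual equivalent-window rule N = 2/alpha - 1. The helpers below are a sketch with illustrative names, and they assume uncorrelated noise.

```cpp
#include <cmath>

// Alpha whose noise reduction roughly matches an N-sample moving average:
// N = 2/alpha - 1, so alpha = 2 / (N + 1).
float alphaForWindow(float n) {
    return 2.0f / (n + 1.0f);
}

// Average lag of the EMA in samples: (1 - alpha) / alpha.
float emaLagSamples(float alpha) {
    return (1.0f - alpha) / alpha;
}

// Noise reduction factor for uncorrelated noise: sqrt(2/alpha - 1).
float emaNoiseReduction(float alpha) {
    return std::sqrt(2.0f / alpha - 1.0f);
}
```

For example, alpha = 0.1 behaves roughly like a 19-sample moving average: about 9 samples of lag and about 4.4x noise reduction.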

Try It: Filter Latency and EMA Explorer

viewof filterWindowSize = Inputs.range([2, 64], {
  value: 10,
  step: 1,
  label: "Moving average window size (N)"
})

viewof sampleRateHz = Inputs.range([1, 1000], {
  value: 100,
  step: 1,
  label: "Sample rate (Hz)"
})

viewof emaAlpha = Inputs.range([0.01, 0.99], {
  value: 0.1,
  step: 0.01,
  label: "EMA smoothing factor (alpha)"
})

maLatencySamples = filterWindowSize / 2
maLatencyMs = (maLatencySamples / sampleRateHz) * 1000
emaTimeSamples = (1 - emaAlpha) / emaAlpha
emaLatencyMs = (emaTimeSamples / sampleRateHz) * 1000
maNoiseReduction = Math.sqrt(filterWindowSize)
emaEquivWindow = 2 / emaAlpha - 1
emaNoiseReduction = Math.sqrt(emaEquivWindow)

html`<div style="background: var(--bs-light, #f8f9fa); padding: 20px; border-radius: 8px; border-left: 4px solid #3498DB; margin-top: 15px; font-family: Arial, sans-serif;">
  <h4 style="color: #2C3E50; margin-top: 0;">Filter Comparison Results</h4>
  <table style="width: 100%; border-collapse: collapse; margin-top: 10px;">
    <tr style="background: #eaf2f8;">
      <td style="padding: 8px; border-bottom: 1px solid #d4e5f7; font-weight: bold; color: #2C3E50;" colspan="3">Moving Average (N=${filterWindowSize})</td>
    </tr>
    <tr>
      <td style="padding: 8px; border-bottom: 1px solid #eee; color: #2C3E50;">Latency</td>
      <td style="padding: 8px; border-bottom: 1px solid #eee; color: #3498DB; font-weight: bold;">${maLatencySamples.toFixed(1)} samples = ${maLatencyMs.toFixed(1)} ms</td>
    </tr>
    <tr>
      <td style="padding: 8px; border-bottom: 1px solid #eee; color: #2C3E50;">Noise reduction</td>
      <td style="padding: 8px; border-bottom: 1px solid #eee; color: #3498DB; font-weight: bold;">${maNoiseReduction.toFixed(2)}x</td>
    </tr>
    <tr>
      <td style="padding: 8px; border-bottom: 1px solid #eee; color: #2C3E50;">Memory</td>
      <td style="padding: 8px; border-bottom: 1px solid #eee; color: #2C3E50;">${filterWindowSize} floats = ${filterWindowSize * 4} bytes</td>
    </tr>
    <tr style="background: #eaf2f8;">
      <td style="padding: 8px; border-bottom: 1px solid #d4e5f7; font-weight: bold; color: #2C3E50;" colspan="3">Exponential MA (alpha=${emaAlpha.toFixed(2)})</td>
    </tr>
    <tr>
      <td style="padding: 8px; border-bottom: 1px solid #eee; color: #2C3E50;">Time constant</td>
      <td style="padding: 8px; border-bottom: 1px solid #eee; color: #16A085; font-weight: bold;">${emaTimeSamples.toFixed(1)} samples = ${emaLatencyMs.toFixed(1)} ms</td>
    </tr>
    <tr>
      <td style="padding: 8px; border-bottom: 1px solid #eee; color: #2C3E50;">Equivalent MA window</td>
      <td style="padding: 8px; border-bottom: 1px solid #eee; color: #16A085; font-weight: bold;">~${emaEquivWindow.toFixed(0)} samples</td>
    </tr>
    <tr>
      <td style="padding: 8px; border-bottom: 1px solid #eee; color: #2C3E50;">Noise reduction</td>
      <td style="padding: 8px; border-bottom: 1px solid #eee; color: #16A085; font-weight: bold;">~${emaNoiseReduction.toFixed(2)}x</td>
    </tr>
    <tr>
      <td style="padding: 8px; color: #2C3E50;">Memory</td>
      <td style="padding: 8px; color: #2C3E50;">1 float = 4 bytes</td>
    </tr>
  </table>
  <p style="margin-top: 15px; color: #555; font-size: 13px; font-style: italic;">
    MA latency = N/2 samples. EMA time constant = (1-alpha)/alpha samples. Lower alpha = smoother but slower. Try alpha=0.5 for fast response, alpha=0.05 for heavy smoothing.
  </p>
</div>`

15.12.15 Moving Average vs Kalman Filters

Option A: Moving Average (N=10 samples): Memory usage 40 bytes (10 floats), CPU cycles ~20 per update, latency N/2 = 5 samples (fixed delay), noise reduction sqrt(N) = 3.16x, implementation complexity low (10 lines of code), only the window size N to choose

Option B: Kalman Filter (1D): Memory usage 20 bytes (5 floats for state), CPU cycles ~50 per update (multiply/divide), latency and noise reduction both set by the Q/R ratio (with Q=0.01 and R=0.1 the steady-state gain is about 0.27, which gives roughly 2.7 samples of lag and about 2.5x noise reduction), implementation complexity medium (30 lines), requires Q and R tuning

Decision Factors: For stationary signals with Gaussian noise (temperature averaging), moving average is simpler and nearly as effective. For tracking changing signals (position, velocity, acceleration) where latency matters, a Kalman filter whose model matches the motion can respond faster for the same noise rejection. Kalman requires knowing process noise (Q) and measurement noise (R); wrong values degrade performance. Note that Kalman optimality assumes linear dynamics and Gaussian noise; for non-linear sensors, consider Extended or Unscented Kalman Filters. On resource-constrained 8-bit MCUs (ATmega328), a moving average can run on integer-only math over raw ADC counts, which saves flash and runs faster than software floating point. ESP32’s floating-point unit makes Kalman practical.
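The integer-only option for 8-bit MCUs might look like the sketch below. It is a variant of the moving-average idea, not the chapter's float pattern: `IntMovingAverage` is an illustrative name, the window is restricted to a power of two so the division becomes a shift, and samples are assumed to be raw ADC counts that fit in 16 bits.

```cpp
#include <stdint.h>

// Integer-only moving average over raw ADC counts.
// Window size is 2^SHIFT so the full-window average is a right shift.
template <uint8_t SHIFT>
class IntMovingAverage {
    static_assert(SHIFT >= 1 && SHIFT <= 8, "window limited to 2..256 samples");

  public:
    static constexpr uint16_t N = 1u << SHIFT;

    uint16_t update(uint16_t sample) {
        sum_ -= buffer_[index_];          // drop the oldest sample
        buffer_[index_] = sample;         // store the newest
        sum_ += sample;
        index_ = static_cast<uint16_t>((index_ + 1) & (N - 1));  // wrap
        if (count_ < N) count_++;
        // While the window is filling, divide by the samples seen so far.
        return (count_ == N) ? static_cast<uint16_t>(sum_ >> SHIFT)
                             : static_cast<uint16_t>(sum_ / count_);
    }

  private:
    uint16_t buffer_[N] = {};
    uint32_t sum_ = 0;      // 256 samples of 65535 still fit in 32 bits
    uint16_t index_ = 0;
    uint16_t count_ = 0;
};
```

Because the running sum is an integer, it does not accumulate rounding error the way a long-running float sum can.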

Try It: Kalman Filter Gain Explorer

Adjust Q (process noise) and R (measurement noise) to see how the Kalman gain converges. Higher Q/R ratio means the filter trusts measurements more; lower ratio means it trusts its own predictions more.
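The gain recursion used by the chapter's 1D Kalman pattern can be iterated in C++ and compared with its closed-form limit. This is a sketch under the same scalar model (constant Q and R); the function names are illustrative. The closed form follows from setting the predicted covariance P equal to its own next value, which gives P² - Q·P - Q·R = 0.

```cpp
#include <cmath>

// Gain after `steps` predict/update cycles, starting from covariance p0.
float kalmanGainAfter(float q, float r, float p0, int steps) {
    float p = p0;
    float k = 0.0f;
    for (int i = 0; i < steps; i++) {
        p = p + q;            // prediction: uncertainty grows by Q
        k = p / (p + r);      // gain: share of trust given to the measurement
        p = (1.0f - k) * p;   // update: uncertainty shrinks after measuring
    }
    return k;
}

// Steady-state gain: positive root of P^2 - q*P - q*r = 0, then K = P / (P + r).
float kalmanSteadyStateGain(float q, float r) {
    const float pPred = 0.5f * (q + std::sqrt(q * q + 4.0f * q * r));
    return pPred / (pPred + r);
}
```

With Q=0.01 and R=0.1 both functions settle near 0.27, so in steady state this scalar filter weights each new measurement at about 27%, much like an EMA with alpha close to 0.27.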

viewof kalmanQ = Inputs.range([0.001, 1.0], {
  value: 0.01,
  step: 0.001,
  label: "Process noise Q"
})

viewof kalmanR = Inputs.range([0.01, 5.0], {
  value: 0.1,
  step: 0.01,
  label: "Measurement noise R"
})

kalmanSteps = {
  let steps = [];
  let p = 1.0;  // Initial estimation error
  for (let i = 0; i < 20; i++) {
    p = p + kalmanQ;  // Prediction
    let k = p / (p + kalmanR);  // Update gain
    p = (1 - k) * p;  // Update covariance
    steps.push({step: i + 1, gain: k, covariance: p});
  }
  return steps;
}

steadyStateGain = kalmanSteps[kalmanSteps.length - 1].gain
trustPercent = (steadyStateGain * 100)

html`<div style="background: var(--bs-light, #f8f9fa); padding: 20px; border-radius: 8px; border-left: 4px solid #9B59B6; margin-top: 15px; font-family: Arial, sans-serif;">
  <h4 style="color: #2C3E50; margin-top: 0;">Kalman Gain Convergence</h4>
  <table style="width: 100%; border-collapse: collapse; margin-top: 10px;">
    <tr style="background: #f3ecf8;">
      <td style="padding: 10px; border-bottom: 1px solid #e2d5ed; font-weight: bold; color: #2C3E50;">Q/R Ratio</td>
      <td style="padding: 10px; border-bottom: 1px solid #e2d5ed; color: #2C3E50;">${(kalmanQ / kalmanR).toFixed(4)}</td>
    </tr>
    <tr>
      <td style="padding: 10px; border-bottom: 1px solid #eee; font-weight: bold; color: #2C3E50;">Steady-State Gain (K)</td>
      <td style="padding: 10px; border-bottom: 1px solid #eee; color: #9B59B6; font-weight: bold;">${steadyStateGain.toFixed(4)}</td>
    </tr>
    <tr style="background: #f3ecf8;">
      <td style="padding: 10px; border-bottom: 1px solid #e2d5ed; font-weight: bold; color: #2C3E50;">Measurement Trust</td>
      <td style="padding: 10px; border-bottom: 1px solid #e2d5ed; color: #9B59B6; font-weight: bold;">${trustPercent.toFixed(1)}% measurement, ${(100 - trustPercent).toFixed(1)}% prediction</td>
    </tr>
    <tr>
      <td style="padding: 10px; font-weight: bold; color: #2C3E50;">Interpretation</td>
      <td style="padding: 10px; color: #2C3E50;">${steadyStateGain > 0.5 ? "Filter favors measurements (responsive but noisier)" : steadyStateGain > 0.2 ? "Balanced between measurements and predictions" : "Filter favors predictions (smooth but slower to respond)"}</td>
    </tr>
  </table>
  <p style="margin-top: 15px; color: #555; font-size: 13px; font-style: italic;">
    Gain K converges after ~5-10 steps. K near 1.0 = follows raw data closely. K near 0.0 = heavy smoothing, ignores measurements. Try Q=0.001, R=1.0 for heavy smoothing; Q=0.5, R=0.1 for responsive tracking.
  </p>
</div>`

15.13 Vibration Filter Selection

Scenario: A food processing plant needs to monitor vibration on 12 conveyor belt motors. Excessive vibration indicates bearing wear requiring maintenance within 2-4 weeks.

Given:

Sensor: ADXL345 accelerometer on each motor (100 Hz sampling)
Normal vibration: 0.5-2.0 g RMS
Bearing wear threshold: >3.5 g RMS sustained for 10+ minutes
Noise sources: (1) electrical interference from motor drives (+/- 0.3 g spikes), (2) adjacent machine vibration (+/- 0.15 g slow drift)
MCU: ESP32 (240 MHz, FPU, 320 KB RAM)
Reporting interval: every 60 seconds to cloud via MQTT

Filter evaluation:

Filter	Spike Removal	Drift Rejection	RAM (per motor)	CPU/sample	Verdict
Moving Average (N=20)	Poor – spike averages into output	Good at N=20 (0.2s window)	80 bytes	0.2 us	Passes spikes
Median Filter (N=7)	Excellent – spike completely rejected	Moderate	28 bytes	1.5 us	Good for spikes
Kalman (Q=0.01, R=0.1)	Good – spike dampened in 2-3 samples	Excellent	20 bytes	0.8 us	Best overall tracking
Median + Moving Avg	Excellent	Excellent	108 bytes	1.7 us	Best accuracy

Selected approach: Two-stage filter (median then moving average).

Stage 1 – Median filter (N=5): Removes electrical interference spikes. At 100 Hz, a 5-sample window covers 50 ms – fast enough to preserve real vibration changes.
Stage 2 – Moving average (N=50): Smooths the median output over 0.5 seconds. Reports stable RMS value for threshold comparison.
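The two stages can be chained in a single object. The class below is a minimal sketch, assuming the input is already a per-sample vibration magnitude (how that magnitude or RMS value is derived from the accelerometer axes is outside the sketch). `TwoStageFilter` and its member names are illustrative.

```cpp
#include <cstddef>

// Stage 1: short median window removes isolated spikes.
// Stage 2: longer moving average smooths what is left.
template <std::size_t MEDIAN_N, std::size_t AVG_N>
class TwoStageFilter {
  public:
    float update(float sample) {
        return smooth(despike(sample));
    }

  private:
    // Median of the most recent MEDIAN_N samples (fewer while filling).
    float despike(float sample) {
        med_[medHead_] = sample;
        medHead_ = (medHead_ + 1) % MEDIAN_N;
        if (medCount_ < MEDIAN_N) medCount_++;

        float sorted[MEDIAN_N];
        for (std::size_t i = 0; i < medCount_; i++) {  // insertion sort
            std::size_t j = i;
            while (j > 0 && sorted[j - 1] > med_[i]) {
                sorted[j] = sorted[j - 1];
                j--;
            }
            sorted[j] = med_[i];
        }
        return sorted[(medCount_ - 1) / 2];
    }

    // Running-sum moving average of the last AVG_N median outputs.
    float smooth(float value) {
        sum_ -= avg_[avgHead_];
        avg_[avgHead_] = value;
        sum_ += value;
        avgHead_ = (avgHead_ + 1) % AVG_N;
        if (avgCount_ < AVG_N) avgCount_++;
        return sum_ / static_cast<float>(avgCount_);
    }

    float med_[MEDIAN_N] = {};
    float avg_[AVG_N] = {};
    float sum_ = 0.0f;
    std::size_t medHead_ = 0, medCount_ = 0;
    std::size_t avgHead_ = 0, avgCount_ = 0;
};
```

Declaring one `TwoStageFilter<5, 50>` per motor matches the window sizes chosen above; a single 9.0 spike in a stream of 1.0 readings never reaches the output.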

Resource budget for 12 motors:

RAM: 12 motors x (20 + 200) bytes = 2,640 bytes (0.8% of ESP32 RAM)
CPU: 12 motors x 100 samples/s x 1.7 us = 2,040 us/s = 0.2% CPU utilization (conservative estimate)
Headroom: 99.2% RAM and 99.8% CPU available for MQTT, WiFi stack, and other tasks

Result: The two-stage filter detects bearing wear threshold crossings within approximately 0.5 seconds (the N=50 averaging window fill time at 100 Hz) while completely rejecting electrical interference spikes. The 10-minute sustained-threshold requirement is evaluated by comparing consecutive 0.5-second RMS averages over a sliding evaluation period. False alarm rate: 0.1% (vs. 12% with moving average alone). Missed detection rate: 0% for sustained threshold exceedances over 5 minutes.
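One simple way to evaluate a sustained-threshold rule is a timer that restarts whenever the filtered value dips back to or below the limit. The sketch below is one possible interpretation, stricter than a sliding evaluation period; `SustainedThreshold` is an illustrative name and timestamps are assumed to come from a millisecond counter such as Arduino's `millis()`.

```cpp
#include <stdint.h>

// Reports true only after the value has stayed above the threshold
// for at least holdMs milliseconds without interruption.
class SustainedThreshold {
  public:
    SustainedThreshold(float threshold, uint32_t holdMs)
        : threshold_(threshold), holdMs_(holdMs), above_(false), sinceMs_(0) {}

    // Call once per filtered reading with the current time in milliseconds.
    bool update(float value, uint32_t nowMs) {
        if (value <= threshold_) {
            above_ = false;        // any dip restarts the timer
            return false;
        }
        if (!above_) {
            above_ = true;
            sinceMs_ = nowMs;      // first reading above the threshold
        }
        // Unsigned subtraction stays correct across a counter rollover.
        return (nowMs - sinceMs_) >= holdMs_;
    }

  private:
    float threshold_;
    uint32_t holdMs_;
    bool above_;
    uint32_t sinceMs_;
};
```

For the scenario above, `SustainedThreshold detector(3.5f, 600000UL)` corresponds to 3.5 g RMS held for 10 minutes.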

Key Insight: For vibration monitoring, always use a median filter as the first stage to eliminate electrical spikes. A moving average alone will spread spike energy across the window, potentially triggering false bearing-wear alerts. The two-stage approach costs negligible additional resources on modern MCUs.

Checkpoint: Filter Choice

You now know:

Moving average is simple but adds N/2 samples of delay; with N=10 at 100 Hz, that becomes 50 ms.
EMA keeps only 1 float, or 4 bytes, while a 10-sample moving average stores 40 bytes.
Median filtering is the first stage for spikes, while Kalman filtering belongs where Q, R, and changing state are meaningful.

15.14 Sensor Calibration

25 min | Intermediate | P06.C09.U02b

Filtering removes random noise, but it cannot fix systematic errors built into the sensor itself. A sensor that consistently reads 2 degrees too high will still read 2 degrees too high after filtering – just with less jitter. Calibration corrects these systematic errors. Two-point calibration addresses both offset (zero-point shift) and gain (sensitivity) errors.

You have cleaned random variation. The next failure mode is bias: a value can be stable and still be wrong.

15.14.1 Calibration Error Types

Understanding the two main calibration errors helps you design effective correction strategies:

Figure: Sensor measurement error types. Offset error appears as a constant bias, gain error as a sensitivity scaling, and random noise as scatter, shown with target-style visualizations and the correction strategy for each.

15.14.2 Two-Point Calibration

Two-point calibration creates a linear correction by measuring at two known reference points:

Figure: Two-point calibration process. Three steps: record the low reference point, record the high reference point, then calculate gain and offset, with verification examples.

The calculation is the same one performed by the interactive calculator in this section:

15.14.3 Two-Point Calibration Math

slope = (actual_high - actual_low) / (raw_high - raw_low)
calibrated = actual_low + slope x (raw_value - raw_low)
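
As a quick check of the formula, here is a worked example using the reference points that appear throughout this section (raw 512 at an actual value of 0, raw 3584 at an actual value of 100) and a test reading of 2048:

\[\text{slope} = \frac{100 - 0}{3584 - 512} = \frac{100}{3072} \approx 0.03255\]

\[\text{calibrated} = 0 + 0.03255 \times (2048 - 512) = 0.03255 \times 1536 \approx 50.0\]

The test reading sits exactly halfway between the two raw reference points, so the calibrated value lands halfway between the two actual values.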

15.14.4 Optional C++ Pattern

// Two-point calibration for linear sensors
struct CalibrationData {
  float rawLow;
  float rawHigh;
  float actualLow;
  float actualHigh;
};

CalibrationData cal = {
  .rawLow = 512,      // ADC reading at low point
  .rawHigh = 3584,    // ADC reading at high point
  .actualLow = 0.0,   // Actual value at low point
  .actualHigh = 100.0 // Actual value at high point
};

float calibrate(float rawValue) {
  // Linear interpolation with division guard
  if (cal.rawHigh == cal.rawLow) {
    return cal.actualLow;  // Cannot calibrate with identical reference points
  }

  float slope = (cal.actualHigh - cal.actualLow) / (cal.rawHigh - cal.rawLow);
  float calibratedValue = cal.actualLow + slope * (rawValue - cal.rawLow);

  return calibratedValue;
}

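// Store calibration in EEPROM (ESP32/ESP8266 Arduino core API)
#include <EEPROM.h>

void saveCalibration() {
  EEPROM.begin(512);
  EEPROM.put(0, cal);
  // commit() writes the buffered data to flash and reports success
  if (EEPROM.commit()) {
    Serial.println("Calibration saved");
  } else {
    Serial.println("Calibration save failed");
  }
}

void loadCalibration() {
  EEPROM.begin(512);
  CalibrationData stored;
  EEPROM.get(0, stored);

  // Erased flash reads back as 0xFF bytes, which decode to NaN floats.
  // Keep the compiled-in defaults unless the stored values are usable.
  if (!isnan(stored.rawLow) && !isnan(stored.rawHigh) &&
      !isnan(stored.actualLow) && !isnan(stored.actualHigh) &&
      stored.rawHigh != stored.rawLow) {
    cal = stored;
    Serial.println("Calibration loaded");
  } else {
    Serial.println("Stored calibration invalid; using defaults");
  }
}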

Try It: Two-Point Calibration Calculator

viewof rawLow = Inputs.range([0, 4095], {
  value: 512,
  step: 1,
  label: "Raw ADC at low reference point"
})

viewof rawHigh = Inputs.range([0, 4095], {
  value: 3584,
  step: 1,
  label: "Raw ADC at high reference point"
})

viewof actualLow = Inputs.range([-100, 200], {
  value: 0,
  step: 0.1,
  label: "Actual value at low point"
})

viewof actualHigh = Inputs.range([-100, 200], {
  value: 100,
  step: 0.1,
  label: "Actual value at high point"
})

viewof testRaw = Inputs.range([0, 4095], {
  value: 2048,
  step: 1,
  label: "Test raw ADC reading"
})

slope = (rawHigh - rawLow) !== 0
  ? (actualHigh - actualLow) / (rawHigh - rawLow)
  : 0

calibratedValue = slope !== 0
  ? actualLow + slope * (testRaw - rawLow)
  : 0

html`<div style="background: var(--bs-light, #f8f9fa); padding: 20px; border-radius: 8px; border-left: 4px solid #E67E22; margin-top: 15px; font-family: Arial, sans-serif;">
  <h4 style="color: #2C3E50; margin-top: 0;">Calibration Results</h4>
  <table style="width: 100%; border-collapse: collapse; margin-top: 10px;">
    <tr style="background: #fef5ec;">
      <td style="padding: 10px; border-bottom: 1px solid #f5e5d3; font-weight: bold; color: #2C3E50;">Calibration Slope (m)</td>
      <td style="padding: 10px; border-bottom: 1px solid #f5e5d3; color: #2C3E50;">${slope.toFixed(6)}</td>
    </tr>
    <tr>
      <td style="padding: 10px; border-bottom: 1px solid #f5e5d3; font-weight: bold; color: #2C3E50;">Test Raw Reading</td>
      <td style="padding: 10px; border-bottom: 1px solid #f5e5d3; color: #2C3E50;">${testRaw}</td>
    </tr>
    <tr style="background: #fef5ec;">
      <td style="padding: 10px; font-weight: bold; color: #2C3E50;">Calibrated Value</td>
      <td style="padding: 10px; color: #E67E22; font-weight: bold; font-size: 18px;">${slope !== 0 ? calibratedValue.toFixed(2) : "N/A"}</td>
    </tr>
  </table>
  <p style="margin-top: 15px; color: #555; font-size: 14px; font-style: italic;">
    Formula: Calibrated = actualLow + slope × (raw - rawLow), where slope = (actualHigh - actualLow) / (rawHigh - rawLow)
  </p>
  ${(rawHigh - rawLow) === 0 ? `<p style="color: #E74C3C; font-weight: bold; margin-top: 10px;">⚠️ Error: rawHigh must differ from rawLow to calculate slope!</p>` : ''}
</div>`

15.15 Soil Moisture Calibration

Scenario: You are deploying a soil moisture monitoring system for a greenhouse. The capacitive soil moisture sensor outputs an analog voltage (0-3.3V) that varies with soil moisture content. However, the raw ADC readings do not correspond to meaningful moisture percentages. The sensor reads approximately 3000 (ADC units) in completely dry soil and 1200 in saturated soil. You need accurate readings to trigger irrigation at 30% moisture.

Goal: Develop and implement a two-point calibration procedure to convert raw ADC readings into calibrated moisture percentages (0-100%).

What we do: Measure the sensor’s output range and behavior.

Initial measurements:

Condition	ADC Reading (12-bit, 0-4095)	Expected Moisture
Air (no soil)	3450	~0% (baseline)
Bone-dry soil (oven-dried)	3000	0%
Field capacity (well-watered)	1800	~60-70%
Saturated soil (standing water)	1200	100%

Observations:

Output falls as moisture rises (higher moisture = lower ADC value)
Range spans approximately 1200-3000 ADC units for the usable moisture range
Response is approximately linear in the 20-80% moisture range

What we do: Establish known moisture levels using gravimetric method.

Gravimetric calibration procedure:

Prepare soil samples: Collect 5 containers of identical soil (200g each) and oven-dry all of them at 105C for 24 hours, so each container holds 200g of dry soil
Create moisture levels (moisture % by weight = added water / dry soil mass):
- Sample A: Add no water (0% moisture)
- Sample B: Add 10g water (5% moisture by weight)
- Sample C: Add 30g water (15% moisture)
- Sample D: Add 60g water (30% moisture - irrigation trigger)
- Sample E: Saturate and drain (field capacity, ~60%)
Record calibration data:

Sample	Added Water (g)	Calculated Moisture (%)	ADC Reading
A	0	0%	2988
B	10	5%	2865
C	30	15%	2619
D	60	30%	2251
E	~120 (saturated)	60%	1515

What we do: Fit a linear equation to the calibration data.

Two-point calibration (using dry and field capacity points):

Point 1 (Low): ADC = 2988, Moisture = 0%
Point 2 (High): ADC = 1515, Moisture = 60%

Calculate slope (m) and offset (b):

\[m = \frac{Y_2 - Y_1}{X_2 - X_1} = \frac{60 - 0}{1515 - 2988} = \frac{60}{-1473} = -0.0407\]

\[b = Y_1 - m \times X_1 = 0 - (-0.0407 \times 2988) = 121.6\]

Calibration equation:

\[\text{Moisture \%} = -0.0407 \times \text{ADC} + 121.6\]

What we do: Apply the Step 3 equation using the same endpoints (dry at ADC=2988 for 0%, field capacity at ADC=1515 for 60%). The firmware action is just: read ADC, calculate moisture, clamp the answer to a valid range.

15.15.0.1 Firmware Behavior

Read the filtered ADC value.
Map the dry and field-capacity endpoints onto the 0-60% range.
Clamp the result to the valid 0-100% moisture range.

15.15.0.2 Optional C++ Pattern

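#include <Arduino.h>
#include <EEPROM.h>

#define SOIL_PIN 34
#define NUM_SAMPLES 10

struct Calibration {
    uint32_t magic;      // marks a valid record when persisted to EEPROM
    float dryADC;
    float wetADC;
    float dryMoisture;
    float wetMoisture;
};

// Two-point calibration: dry (0%) to field capacity (60%)
// Matches the gravimetric reference points from Step 2
Calibration cal = {
    .magic = 0xCAFEBABE,
    .dryADC = 2988.0,
    .wetADC = 1515.0,
    .dryMoisture = 0.0,
    .wetMoisture = 60.0
};

// Assumed helper (not defined in the original listing): median of
// NUM_SAMPLES raw readings, which rejects isolated spikes.
float readADCFiltered() {
    int samples[NUM_SAMPLES];
    for (int i = 0; i < NUM_SAMPLES; i++) {
        samples[i] = analogRead(SOIL_PIN);
        delay(2);
    }
    // Insertion sort is adequate for a window this small
    for (int i = 1; i < NUM_SAMPLES; i++) {
        int key = samples[i];
        int j = i - 1;
        while (j >= 0 && samples[j] > key) {
            samples[j + 1] = samples[j];
            j--;
        }
        samples[j + 1] = key;
    }
    // Even window size: average the two middle values
    return (samples[NUM_SAMPLES / 2 - 1] + samples[NUM_SAMPLES / 2]) / 2.0f;
}

float getMoisturePercent() {
    // Read with median filtering
    float adcValue = readADCFiltered();

    // Guard against identical reference points (division by zero)
    if (cal.dryADC == cal.wetADC) {
        return cal.dryMoisture;
    }

    // Linear interpolation between the two reference points
    float moisture = cal.dryMoisture +
        (cal.wetMoisture - cal.dryMoisture) *
        (cal.dryADC - adcValue) / (cal.dryADC - cal.wetADC);

    // Clamp to valid range
    if (moisture < 0.0) moisture = 0.0;
    if (moisture > 100.0) moisture = 100.0;

    return moisture;
}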

Verification against calibration data:

Sample	ADC	Measured (%)	Model Prediction (%)	Error
A (dry)	2988	0%	0.0%	0.0%
B	2865	5%	5.0%	0.0%
C	2619	15%	15.0%	0.0%
D	2251	30%	30.0%	0.0%
E (field cap.)	1515	60%	60.0%	0.0%

The data points fall closely on the linear model, confirming this sensor has good linearity in the 0-60% range. For readings beyond 60% (saturated soil), the linear model extrapolates but accuracy degrades – use multi-point calibration if the full 0-100% range is needed.
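
The verification table can be reproduced with a few lines of portable C++. The sketch below is only an illustration: the names `SamplePoint`, `predictMoisture`, and `maxResidual` are hypothetical, and the sample values are copied from the gravimetric table in this example.

```cpp
#include <cmath>
#include <cstddef>

struct SamplePoint {
    float adc;       // raw ADC reading
    float moisture;  // gravimetric moisture, percent
};

// Calibration samples A-E from the gravimetric table.
const SamplePoint kSamples[] = {
    {2988.0f, 0.0f}, {2865.0f, 5.0f}, {2619.0f, 15.0f},
    {2251.0f, 30.0f}, {1515.0f, 60.0f},
};
const std::size_t kNumSamples = sizeof(kSamples) / sizeof(kSamples[0]);

// Two-point model through the first (dry) and last (field capacity) samples.
float predictMoisture(float adc) {
    const SamplePoint& dry = kSamples[0];
    const SamplePoint& wet = kSamples[kNumSamples - 1];
    const float slope = (wet.moisture - dry.moisture) / (wet.adc - dry.adc);
    return dry.moisture + slope * (adc - dry.adc);
}

// Largest absolute difference between the model and the measured samples.
float maxResidual() {
    float worst = 0.0f;
    for (std::size_t i = 0; i < kNumSamples; ++i) {
        const float err =
            std::fabs(predictMoisture(kSamples[i].adc) - kSamples[i].moisture);
        if (err > worst) worst = err;
    }
    return worst;
}
```

With these samples the largest residual is a few hundredths of a percentage point, which rounds to the 0.0% errors listed in the table. Running the same check on a new batch of samples is a cheap way to confirm that a two-point fit is still adequate.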

Outcome: Successfully calibrated soil moisture sensor with two-point linear calibration covering 0-60% moisture.

Accuracy achieved (with repeated measurements at each point):

Moisture Range	Typical Error	Acceptable?
0-20% (dry)	+/- 1.5%	Yes
20-40% (trigger zone)	+/- 2.0%	Yes – sufficient for 30% irrigation trigger
40-60% (moist)	+/- 2.5%	Yes
>60% (saturated)	>5% (extrapolated)	Use multi-point calibration

Maintenance schedule:

Recalibrate every 6 months or after sensor replacement
Verify with known moisture sample monthly during growing season

Checkpoint: Calibration

You now know:

Two-point calibration uses a low and high reference so the firmware can correct both offset and gain.
The interactive example maps raw ADC endpoints 512 and 3584 onto actual values 0 and 100.
The soil example maps dry ADC 2988 and field-capacity ADC 1515 onto 0-60% moisture, then checks the 30% irrigation trigger.

15.16 Common Processing Pitfalls

Before testing your knowledge, review these common mistakes that trip up even experienced engineers.

15.16.1 Moving Average on Changing Signals

The Mistake: Using a large moving average window (N=32 or N=64 samples) to filter sensor data that changes rapidly, introducing unacceptable lag that makes control systems sluggish or miss transient events entirely.

Why It Happens: Moving average is simple to implement and tutorials recommend larger windows for “smoother” data. For slowly-changing signals (room temperature sampled at 1Hz), a 10-second window works well. But applying the same approach to fast signals (accelerometer at 100Hz, current sensing for motor control) adds N/2 samples of delay - a 32-sample filter at 100Hz introduces 160ms lag, making closed-loop control unstable.

The Fix: Match filter characteristics to signal dynamics:

Slow signals (temperature, humidity): Moving average N=8-32 at 1Hz sampling, about 4-16 seconds of delay (N/2 samples)
Medium signals (distance, pressure): Exponential moving average (EMA) with alpha=0.1-0.3, responds faster while filtering noise
Fast signals (motor current, vibration): Use IIR filters (Butterworth, Chebyshev) designed for specific cutoff frequency

EMA Formula: filtered = alpha x new_value + (1-alpha) x previous_filtered

15.16.2 Single-Point Nonlinear Calibration

The Mistake: Calibrating a thermistor, pH sensor, or photodiode at only one reference point (e.g., room temperature, pH 7, or ambient light), then assuming the calibration applies across the entire measurement range.

Why It Happens: Single-point calibration is quick - adjust the offset so the reading matches one known value and ship. This works for sensors with a linear response and negligible gain error. But many sensors are inherently non-linear: thermistor resistance follows the Steinhart-Hart equation (roughly exponential in inverse temperature), pH electrodes have a temperature-dependent Nernst slope, and photodiodes have a logarithmic response at high intensity.

The Fix: Use at least two-point calibration for linear sensors, and three or more points for non-linear sensors:

Linear sensors (RTD, 4-20mA transmitters): Calibrate at 10% and 90% of range
Thermistor (NTC): Use Steinhart-Hart equation with three calibration points (0C, 25C, 100C)
pH sensor: Calibrate at pH 4.0, 7.0, and 10.0 buffers

Warning Signs: You need multi-point calibration if: (1) sensor datasheet shows non-linear response curve, (2) accuracy degrades significantly away from single calibration point, (3) sensor type is known to be non-linear.

15.17 Knowledge Check

Test your understanding of sensor data processing concepts with these questions.

The questions below are the checkpoint loop for the whole chapter: filter type, calibration math, latency, drift, and nonlinear sensors.

15.18 Match: Data Processing Concepts

15.19 Sensor Processing Pipeline

15.20 Label the Diagram

15.21 Code Challenge

15.22 Sampling and Aliasing Limits

The main chapter above shows how to smooth, calibrate, validate, and publish sensor readings after acquisition. The companion page checks the acquisition boundary itself: sample rate, Nyquist limits, alias frequencies, analog anti-alias filtering, and oversampling trade-offs decide whether the data entering those filters is trustworthy.

Checkpoint: Pipeline Readiness

You now know:

A complete processing pipeline moves through acquisition, cleaning, transformation, analysis, validation, and publishing.
The label quiz’s 4 hotspots make the same sequence visible: acquisition, cleaning, transformation, and analysis.
Filtering and calibration are downstream decisions; the companion page checks whether sampling captured the physical signal first.

15.23 Next Acquisition Practice

Continue with Sampling, Aliasing, and Anti-Alias Boundaries to verify that a sensor pipeline captures the physical signal before firmware filtering or calibration begins.

15.24 Summary

This chapter covered essential sensor data processing techniques:

Moving Average Filter: Simple noise reduction by averaging N samples, best for slow-changing signals
Kalman Filter: Adaptive filtering (optimal for linear systems with Gaussian noise) that balances predictions with measurements
Median Filter: Spike/outlier removal by selecting the middle value
Two-Point Calibration: Corrects offset and gain errors using two reference points
Multi-Point Calibration: Handles non-linear sensors with piecewise interpolation
EEPROM Storage: Persists calibration across power cycles

Common Pitfalls

15.24.1 Filtering Out Real Signal Variations

A moving average window that is too large smooths out genuine rapid changes in the measured quantity. A temperature spike from a briefly opened oven door may be a real event, not noise. Size the filter window to be shorter than the fastest legitimate change you need to detect.

15.24.2 Reference Accuracy in Calibration

The calibrated sensor can never be more accurate than the reference standard you calibrated against. Using a budget thermometer as a reference for a precision sensor is self-defeating. Use references at least 4x more accurate than your target accuracy.

15.24.3 Not Validating After Calibration

After applying calibration coefficients, test the sensor at several intermediate values — not just the two reference points. Nonlinearity errors will not be visible at the calibration endpoints but will appear at intermediate values.

15.24.4 EEPROM Calibration Loss

Flashing new firmware can erase EEPROM calibration data depending on memory layout. Always check coefficients on startup and alert the user if values read back as 0xFF (erased flash) or are physically implausible.
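
The startup check can be sketched in portable C++ as shown below. It is an illustration under stated assumptions: `StoredCalibration`, `kCalMagic`, and `loadCalibrationChecked` are hypothetical names, the magic value reuses the marker from the soil-moisture listing, and the byte buffer stands in for whatever the platform's EEPROM read returns.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct StoredCalibration {
    std::uint32_t magic;  // marker written together with valid data
    float rawLow;
    float rawHigh;
    float actualLow;
    float actualHigh;
};

const std::uint32_t kCalMagic = 0xCAFEBABE;  // assumed marker value

// Returns true and fills `out` only when the stored bytes look like a
// complete, plausible calibration record.
bool loadCalibrationChecked(const std::uint8_t* bytes, std::size_t len,
                            StoredCalibration& out) {
    if (len < sizeof(StoredCalibration)) return false;

    StoredCalibration tmp;
    std::memcpy(&tmp, bytes, sizeof(tmp));

    // Erased flash (all 0xFF) or unrelated data fails the marker check.
    if (tmp.magic != kCalMagic) return false;

    // Reject NaN or infinite coefficients.
    if (!std::isfinite(tmp.rawLow) || !std::isfinite(tmp.rawHigh) ||
        !std::isfinite(tmp.actualLow) || !std::isfinite(tmp.actualHigh)) {
        return false;
    }

    // Identical raw points would make the slope divide by zero.
    if (tmp.rawHigh == tmp.rawLow) return false;

    out = tmp;
    return true;
}
```

When the function returns false, the firmware can fall back to compiled-in defaults and raise an alert instead of silently publishing uncalibrated values. Application-specific range checks, such as raw values within the ADC span, would be a natural extension.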

15.25 What’s Next

Chapter	Focus
Sampling, Aliasing, and Anti-Alias Boundaries	Sample-rate, Nyquist, aliasing, anti-alias filtering, and oversampling checks before firmware filtering
Sensor Power Management	Low-power sleep modes, duty cycling strategies, and battery life optimization for wireless sensor nodes
Sensor Interfacing Protocols	I2C, SPI, and UART communication between sensors and microcontrollers
Sensor Calibration Lab	Hands-on Wokwi simulation workshop applying two-point calibration to real sensors
Multi-Sensor Data Fusion	Combining readings from multiple sensors using weighted averaging and Kalman fusion
Sensor Fundamentals and Types	Core sensor types, transduction principles, and selection criteria