24 Sensor Fusion Best Practices
Learning Objectives
After completing this chapter, you will be able to:
- Identify and avoid common sensor fusion mistakes
- Implement proper validation and outlier rejection
- Design robust fusion systems with graceful degradation
- Apply calibration and synchronization best practices
Key Concepts
- Sensor heterogeneity management: The practice of normalizing sensor outputs to common units, timestamps, and confidence levels before fusion, ensuring no single sensor dominates due to scale differences.
- Fault detection and exclusion (FDE): A process that identifies malfunctioning sensors whose outputs deviate significantly from the fused estimate and excludes them from contributing to the fusion result.
- Temporal alignment: Synchronizing sensor readings to a common timeline before fusion, compensating for different sampling rates, transmission delays, and clock skew across sensors.
- Covariance consistency: The requirement that a fusion algorithm’s estimated uncertainty (covariance) accurately reflects the true error, preventing overconfident or underconfident state estimates.
- Graceful degradation: Designing fusion systems to continue operating (with reduced accuracy) when one or more sensors fail, rather than failing completely.
- Validation gate: A statistical test that rejects sensor measurements too far from the predicted value before incorporating them into the fusion estimate, protecting against outlier corruption.
For Beginners: Sensor Fusion Best Practices
Sensor fusion means combining readings from multiple sensors to get a better result than any single sensor could give. Think of it like asking several friends for directions – each one might be slightly wrong, but by combining their answers intelligently, you get a much more accurate route. This chapter covers the most common mistakes people make when building these systems, and how to avoid them. Even if the math behind fusion is solid, simple oversights like ignoring clock differences between sensors or assuming factory calibration never changes can cause the whole system to fail.
24.1 Common Mistakes in Sensor Fusion
The following seven pitfalls trip up even experienced engineers. Understanding these failure modes is essential before deploying any sensor fusion system.
Pitfall 1: Blindly Trusting Fused Output
The Mistake: Sending fused output directly to actuators without validation.
```python
# WRONG: blindly trusting fused output
fused_position = kalman_filter.update(gps, imu)
send_to_autopilot(fused_position)  # Hope it's right!
```
Why it’s wrong:
- What if GPS gives a wildly wrong reading (multipath error)?
- What if the IMU fails (stuck at zero)?
- Fusion amplifies garbage-in unless sanity checks catch it!
The Fix: Multi-Layer Validation
Essential validation checks:
- Innovation consistency: Mahalanobis distance < chi-squared threshold
- Physics limits: Velocity < max_velocity
- Rate of change: Position can’t jump > 10m in 1 second
- Cross-validation: Compare fused output against raw sensors
- Uncertainty monitoring: If covariance P grows unbounded, filter is diverging
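These checks can be combined into a simple pre-actuation gate. The sketch below is illustrative: the function name and the thresholds (`MAX_SPEED`, `MAX_JUMP`, `MAX_TRACE_P`) are assumptions to be tuned per system.

```python
import math

# Illustrative thresholds for a slow ground robot; tune per system.
MAX_SPEED = 5.0      # m/s, physics limit
MAX_JUMP = 10.0      # m per second, rate-of-change limit
MAX_TRACE_P = 100.0  # m^2, covariance-trace divergence alarm

def validate_fused_output(prev_pos, new_pos, dt, cov_trace):
    """Return True only if the fused position passes every sanity layer."""
    if any(math.isnan(c) for c in new_pos):
        return False                      # numerical failure in the filter
    dist = math.dist(prev_pos, new_pos)
    if dt > 0 and dist / dt > MAX_SPEED:
        return False                      # physically impossible speed
    if dt > 0 and dist > MAX_JUMP * dt:
        return False                      # rate-of-change limit
    if cov_trace > MAX_TRACE_P:
        return False                      # covariance unbounded: diverging
    return True

# A 50 m jump in one second fails the physics check:
print(validate_fused_output((0.0, 0.0), (50.0, 0.0), 1.0, 2.0))  # False
print(validate_fused_output((0.0, 0.0), (1.0, 0.5), 1.0, 2.0))   # True
```

Only estimates that pass every layer should reach the actuators; rejected estimates should trigger a fallback to the last known-safe state.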
Real-world example: Boeing 737 MAX MCAS system failed because it relied on a single angle-of-attack sensor without cross-checking the redundant sensor on the other side of the aircraft, and lacked adequate validation – contributing to two fatal crashes. Lesson: Always validate, never trust blindly!
Pitfall 2: Ignoring Sensor Timestamp Synchronization
The Mistake: Fusing sensor data using arrival timestamps instead of event timestamps.
Why It Happens: Network protocols deliver data “as it arrives,” and most database systems timestamp on insertion. Developers focus on fusion algorithms and forget that a GPS reading from 200ms ago cannot be directly fused with an accelerometer reading from 10ms ago.
The Fix: Implement proper time synchronization at three levels:
- Clock sync: Use NTP or PTP to synchronize all sensor node clocks to <10ms accuracy
- Event timestamping: Record when measurement occurred, not when received
- Temporal alignment: Use interpolation to align sensor readings before fusion
Warning sign: If your fused position oscillates rapidly despite smooth motion, check timestamp alignment first.
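The third level, temporal alignment, is often just linear interpolation. A minimal 1-D sketch (the GPS timestamps and values below are made up for illustration):

```python
def interpolate_at(samples, t):
    """Linearly interpolate a sensor stream at event time t.

    samples: list of (timestamp, value) pairs sorted by timestamp.
    Clamps to the first/last sample outside the covered interval.
    """
    if t <= samples[0][0]:
        return samples[0][1]
    if t >= samples[-1][0]:
        return samples[-1][1]
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            return v0 + w * (v1 - v0)

# GPS at ~1 Hz, aligned to a 100 Hz IMU event time of t = 1.5 s:
gps = [(0.0, 10.0), (1.0, 12.0), (2.0, 14.5)]
print(interpolate_at(gps, 1.5))  # 13.25, halfway between 12.0 and 14.5
```

Linear interpolation works for positions and scalar quantities; orientations need spherical interpolation (slerp) instead of a linear blend.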
Pitfall 3: Assuming All Sensors Are Properly Calibrated
The Mistake: Deploying sensor fusion systems without verifying calibration, assuming factory calibration is sufficient.
The Fix: Implement a three-stage calibration verification:
- Incoming inspection: Test each sensor batch against reference standard
- In-situ calibration: Run 24-48 hour burn-in comparing against known references
- Runtime monitoring: Track sensor bias using fusion residuals
A temperature sensor with a +2 °C uncorrected bias propagates through every fusion calculation, creating systematic errors that averaging cannot fix.
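The third stage, runtime bias monitoring, can be sketched with an exponentially weighted average of fusion residuals. The class name, smoothing factor, and alarm threshold here are illustrative assumptions:

```python
class BiasMonitor:
    """Track a running estimate of sensor bias from fusion residuals.

    residual = raw sensor reading - fused estimate. A persistent
    nonzero mean indicates calibration drift.
    """
    def __init__(self, alpha=0.01, bias_limit=1.0):
        self.alpha = alpha            # EWMA smoothing factor
        self.bias_limit = bias_limit  # alarm threshold, sensor units
        self.bias = 0.0

    def update(self, raw_reading, fused_estimate):
        residual = raw_reading - fused_estimate
        self.bias = (1 - self.alpha) * self.bias + self.alpha * residual
        return abs(self.bias) > self.bias_limit  # True -> recalibrate

monitor = BiasMonitor(alpha=0.1, bias_limit=1.0)
# A sensor reading consistently +2 degrees above the fused estimate:
for _ in range(50):
    alarm = monitor.update(raw_reading=22.0, fused_estimate=20.0)
print(round(monitor.bias, 2), alarm)  # 1.99 True
```

The bias estimate converges toward the true +2 °C offset and trips the recalibration alarm long before averaging could hide the systematic error.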
Pitfall 4: Wrong Q and R Noise Parameters
The Mistake: Using arbitrary values for process noise Q and measurement noise R.
Symptoms:
- Q too small: Filter ignores measurements, overconfident in model
- Q too large: Jerky estimates, follows noise
- R too small: Jerky estimates, overweights measurements
- R too large: Slow response, ignores sensor data
The Fix:
- For R: Measure sensor variance empirically (stationary readings variance)
- For Q: Model physical system dynamics, tune based on expected disturbances
- Adaptive: Use innovation covariance monitoring to detect parameter mismatch
Putting Numbers to It
Tuning Q and R parameters with real data: Consider a GPS/IMU fusion system. Collect 100 GPS readings with the device stationary. Measured variance: \(\sigma^2_{GPS} = 9.2\) m² (standard deviation 3.03 m). Set \(R_{GPS} = 9.2\) m².
For process noise \(Q\), model the acceleration uncertainty. A walking robot has max acceleration \(a_{max} = 2\) m/s², but typical acceleration is lower. Model acceleration as zero-mean random with variance \(\sigma^2_a = 0.5\) (m/s²)². The position uncertainty over time step \(\Delta t = 0.1\) s grows as:
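Measuring \(R\) this way takes only a few lines. The readings below are made up for illustration, so the resulting variance differs from the 9.2 m² in the text:

```python
import statistics

def measure_R(stationary_readings):
    """Estimate measurement noise R as the sample variance of
    readings logged while the sensor is known to be stationary."""
    return statistics.variance(stationary_readings)

# e.g. GPS x-coordinates logged with the receiver fixed in place
readings = [3.1, 2.9, 3.4, 2.7, 3.0, 3.2, 2.8, 3.3, 2.95, 3.05]
R = measure_R(readings)
print(f"R = {R:.4f} m^2")  # about 0.0477 m^2 for this made-up sample
```

In practice, collect at least a few hundred samples and repeat under the operating conditions (temperature, sky view, vibration) the deployment will see.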
\[Q = \begin{bmatrix} \frac{\Delta t^4}{4} \sigma^2_a & \frac{\Delta t^3}{2} \sigma^2_a \\ \frac{\Delta t^3}{2} \sigma^2_a & \Delta t^2 \sigma^2_a \end{bmatrix} = \begin{bmatrix} 0.0000125 & 0.00025 \\ 0.00025 & 0.005 \end{bmatrix}\]
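A small sketch evaluating this \(Q\) formula for the nominal values (change the time step or acceleration variance to see how the entries scale):

```python
def process_noise_Q(dt, sigma_a_sq):
    """Discrete white-noise-acceleration process noise for a
    [position, velocity] state, per the formula above."""
    return [[dt**4 / 4 * sigma_a_sq, dt**3 / 2 * sigma_a_sq],
            [dt**3 / 2 * sigma_a_sq, dt**2 * sigma_a_sq]]

# Nominal values from the text: dt = 0.1 s, sigma_a^2 = 0.5 (m/s^2)^2
Q = process_noise_Q(dt=0.1, sigma_a_sq=0.5)
for row in Q:
    print(row)  # matches the matrix above
```

Note how strongly the entries depend on \(\Delta t\): halving the time step shrinks the position-variance term by a factor of 16.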
With \(Q\) too small (e.g., \(Q = 0.0001 \times\) nominal), the filter becomes overconfident in its predictions and ignores measurements. With \(Q\) too large (e.g., \(Q = 100 \times\) nominal), the filter follows measurement noise and gives jerky estimates.
Validation via innovation: The innovation (measurement residual) \(y = z - H\hat{x}^-\) should have covariance \(S = HP^-H^T + R\). Check that \(y^T S^{-1} y < \chi^2_{\alpha, n}\) where \(n\) is measurement dimension and \(\chi^2\) is the chi-squared threshold. For 95% confidence with \(n=2\) (x,y position), \(\chi^2_{0.05, 2} = 5.99\). If innovation consistently exceeds 5.99, your \(Q\) or \(R\) is wrong.
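This gate test needs no matrix library for a 2-D measurement; the 2×2 inverse can be written out by hand. A minimal sketch:

```python
def chi2_gate(y, S, threshold=5.99):
    """Validation gate: accept if y^T S^-1 y < chi-squared threshold
    (5.99 = 95% confidence, 2 degrees of freedom).

    y: 2-vector innovation, S: 2x2 innovation covariance.
    Returns (accepted, mahalanobis_squared)."""
    a, b = S[0]
    c, d = S[1]
    det = a * d - b * c
    # Closed-form inverse of a 2x2 matrix, then the quadratic form
    inv = [[d / det, -b / det], [-c / det, a / det]]
    m = sum(y[i] * inv[i][j] * y[j] for i in range(2) for j in range(2))
    return m < threshold, m

# Innovation of (2 m, 1 m) with S = diag(9.2, 9.2) m^2:
ok, m2 = chi2_gate([2.0, 1.0], [[9.2, 0.0], [0.0, 9.2]])
print(ok, round(m2, 2))  # True 0.54
```

Here the squared Mahalanobis distance is \(5/9.2 \approx 0.54\), comfortably inside the 5.99 gate; a run of values above 5.99 is the signal that \(Q\) or \(R\) is mistuned.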
Pitfall 5: Not Handling Sensor Failures
The Mistake: Assuming all sensors always work correctly.
The Fix:
- Monitor sensor health: Check for stuck values, excessive noise, timeout
- Graceful degradation: Switch to reduced-sensor mode when failures detected
- Fault isolation: Identify which specific sensor failed
- Recovery: Reinitialize filter state when sensor recovers
Example: GPS failure -> switch to IMU-only dead reckoning with increased uncertainty.
Pitfall 6: Correlated Sensor Errors
The Mistake: Assuming sensor errors are independent when they share common error sources.
Examples:
- Multiple Wi-Fi APs affected by same multipath
- GPS and GLONASS sharing ionospheric errors
- Temperature sensors on same PCB sharing thermal drift
The Fix:
- Add different sensor types (diversity beats redundancy)
- Model cross-correlation in covariance matrices
- Use decorrelation techniques before fusion
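The cost of ignoring correlation is easy to quantify: the variance of the average of \(n\) equally noisy sensors with pairwise error correlation \(\rho\) is \(\frac{\sigma^2}{n}\left(1 + (n-1)\rho\right)\). A quick sketch:

```python
def fused_variance(var, rho, n=2):
    """Variance of the average of n equally noisy sensors whose
    errors share pairwise correlation rho. With rho = 0, averaging
    divides the variance by n; with rho = 1, averaging buys nothing."""
    return var / n * (1 + (n - 1) * rho)

var = 4.0  # each sensor: sigma^2 = 4 m^2
print(fused_variance(var, rho=0.0))  # 2.0 -- independent: variance halves
print(fused_variance(var, rho=0.9))  # 3.8 -- correlated: almost no gain
```

With \(\rho = 0.9\), adding a second identical sensor cuts the variance by only 5%, which is why sensor diversity beats sensor redundancy when error sources are shared.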
Pitfall 7: Forgetting About Latency
The Mistake: Fusing high-latency sensors with low-latency sensors without compensation.
Example: Camera (100 ms processing) + IMU (1 ms) – by the time the camera result is ready, the IMU has already produced many newer readings.
The Fix:
- Timestamp all measurements with event time, not processing time
- Use state augmentation or measurement delay compensation
- Predict camera measurement to current time using IMU data
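The third fix, predicting the delayed camera measurement forward using IMU data, can be sketched by integrating buffered IMU velocity samples taken after the camera's event timestamp. The buffer format `(t, vx, vy)` is an assumption for illustration:

```python
def compensate_latency(camera_pos, camera_time, imu_buffer, now):
    """Propagate a delayed camera position fix to the current time by
    integrating buffered IMU velocity samples newer than the camera's
    event timestamp. imu_buffer: list of (t, vx, vy), sorted by t."""
    x, y = camera_pos
    prev_t = camera_time
    for t, vx, vy in imu_buffer:
        if camera_time < t <= now:
            dt = t - prev_t
            x += vx * dt
            y += vy * dt
            prev_t = t
    return (x, y)

# Camera fix from 100 ms ago; robot moving at ~1 m/s in x:
imu = [(0.92, 1.0, 0.0), (0.96, 1.0, 0.0), (1.00, 1.0, 0.0)]
pos = compensate_latency((5.0, 2.0), camera_time=0.90, imu_buffer=imu, now=1.00)
print(pos)  # roughly (5.1, 2.0): 100 ms of 1 m/s motion added back
```

This is the lightweight alternative to full state augmentation; it works well when the delay is short relative to how quickly the motion changes.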
24.2 Validation Checklist
Before deploying any sensor fusion system, verify:
| Check | Pass Criteria |
|---|---|
| Calibration verified | All sensors within 2-sigma of spec |
| Timestamp alignment | Event times synchronized <10ms |
| Outlier rejection | Mahalanobis test implemented |
| Physics constraints | Impossible states rejected |
| Graceful degradation | Works with sensor failures |
| Uncertainty tracking | Covariance bounded and meaningful |
| Cross-validation | Fused output between raw sensor values |
24.3 Code Example: Robust Sensor Fusion with Validation
This Python example implements a sensor fusion pipeline with the validation checks described above, combining a GPS sensor and an IMU for position tracking:
```python
import math

class RobustFusion:
    """GPS + IMU fusion with multi-layer validation.

    Implements Pitfall 1 (validation), Pitfall 5 (failure handling),
    and Pitfall 7 (latency compensation).
    """

    def __init__(self, max_speed_mps=5.0):
        self.position = [0.0, 0.0]  # [x, y] meters
        self.velocity = [0.0, 0.0]
        self.max_speed = max_speed_mps
        self.last_update = 0.0
        self.initialized = False    # first valid GPS fix seeds the state
        self.gps_healthy = True
        self.imu_healthy = True

    def validate_gps(self, gps_reading, timestamp):
        """Multi-layer validation for GPS measurements."""
        x, y = gps_reading
        # Check 1: Physics constraint (speed limit); skipped before the
        # first fix, since the initial state is arbitrary
        dt = timestamp - self.last_update
        if self.initialized and dt > 0:
            dx = x - self.position[0]
            dy = y - self.position[1]
            speed = math.sqrt(dx*dx + dy*dy) / dt
            if speed > self.max_speed * 3:  # 3x safety margin
                print(f"REJECTED: GPS jump {speed:.1f} m/s exceeds limit")
                return False
        # Check 2: Reasonable coordinates (not NaN or extreme)
        if math.isnan(x) or math.isnan(y):
            return False
        if abs(x) > 1e7 or abs(y) > 1e7:  # Earth bounds
            return False
        # Check 3: Not a stuck reading (same value repeated)
        if self.initialized and x == self.position[0] and y == self.position[1]:
            self.gps_healthy = False  # Sensor may be stuck
            return False
        self.gps_healthy = True
        return True

    def update(self, gps_reading, imu_accel, timestamp):
        """Fuse GPS + IMU with validation and fallback."""
        dt = timestamp - self.last_update
        if dt <= 0:
            return self.position
        # IMU prediction (always available, drifts over time)
        predicted_x = self.position[0] + self.velocity[0] * dt
        predicted_y = self.position[1] + self.velocity[1] * dt
        if gps_reading and self.validate_gps(gps_reading, timestamp):
            if not self.initialized:
                # First valid fix: seed the state directly from GPS
                self.position = [gps_reading[0], gps_reading[1]]
                self.initialized = True
            else:
                # GPS available and valid: blend with IMU prediction
                alpha = 0.7  # Trust GPS more (0.7) than IMU (0.3)
                self.position[0] = alpha * gps_reading[0] + (1 - alpha) * predicted_x
                self.position[1] = alpha * gps_reading[1] + (1 - alpha) * predicted_y
        else:
            # GPS unavailable or invalid: IMU-only dead reckoning
            self.position = [predicted_x, predicted_y]
            print("WARNING: GPS degraded, using IMU dead reckoning")
        # Update velocity from IMU acceleration
        self.velocity[0] += imu_accel[0] * dt
        self.velocity[1] += imu_accel[1] * dt
        self.last_update = timestamp
        return self.position

# Usage
fusion = RobustFusion(max_speed_mps=2.0)  # Walking-speed robot
# Normal GPS reading: accepted (seeds the filter state)
pos = fusion.update(gps_reading=(10.0, 20.0), imu_accel=(0.1, 0.0), timestamp=1.0)
# GPS multipath error (50 m jump): REJECTED, falls back to IMU
pos = fusion.update(gps_reading=(60.0, 20.0), imu_accel=(0.1, 0.0), timestamp=2.0)
# Prints: "REJECTED: GPS jump 50.0 m/s exceeds limit",
# then the dead-reckoning warning
```
Validation layers in this example:
| Layer | Check | Protects Against |
|---|---|---|
| Physics constraint | Speed < 3x max | GPS multipath jumps |
| Coordinate bounds | No NaN, within Earth | Sensor hardware failure |
| Stuck detection | Values changing | Frozen sensor |
| Graceful degradation | IMU fallback | Complete GPS loss |
24.4 Chapter Summary
Even well-designed sensor fusion systems can fail when common data-quality pitfalls are ignored. A systematic approach to identifying and preventing these pitfalls is essential for reliable IoT deployments.
Key takeaways:
- Validate before acting: Never send fused output to actuators without physics-based sanity checks
- Synchronize timestamps: Use event timestamps, not arrival timestamps, and align sensors temporally before fusion
- Verify calibration continuously: Factory calibration drifts – implement runtime bias monitoring using fusion residuals
- Tune noise parameters empirically: Measure R from stationary sensor data and model Q from system dynamics, then validate via innovation consistency
- Handle sensor failures gracefully: Design for degraded-mode operation from the start
- Account for correlated errors: Sensor diversity beats sensor redundancy when errors share common sources
- Compensate for latency differences: High-latency and low-latency sensors need temporal alignment before fusion
Multi-sensor fusion is fundamental to building robust IoT systems that can make reliable decisions even when individual sensors are noisy or unreliable.
For Kids: Meet the Sensor Squad!
Even the best sensor team can make mistakes if they don’t follow the rules!
24.4.1 The Sensor Squad Adventure: The Four Silly Mistakes
The Sensor Squad was excited about their new Smart Garden project. But on the very first day, things went wrong!
Mistake 1 – Trusting Without Checking: Max the Microcontroller said, “Temperature Terry says it is 200 degrees! Turn on ALL the sprinklers!” Bella the Battery shouted, “WAIT! That is impossible for a garden! Terry must be broken. Let us CHECK if the number makes sense before doing anything!”
Mistake 2 – Mixed-Up Clocks: Sammy the Sensor reported rain at 3:00 PM, but by the time the message arrived, it was already 3:05 PM. Lila the LED turned on the “It is raining NOW” sign at 3:05, but actually the rain had already stopped! “We need to write down WHEN things happened, not when we heard about them!” said Sammy.
Mistake 3 – Never Calibrating: Humidity Hank had been working for a whole year without a checkup. He kept saying “50% humidity” even when it was pouring rain! “Sensors need regular checkups, just like you go to the doctor!” reminded Bella.
Mistake 4 – Not Preparing for Failures: When Wind Wendy’s battery died, the whole weather station stopped working. “We should have a backup plan!” said Max. “If one sensor breaks, the others should keep going!”
The Sensor Squad learned: Always check, always sync, always calibrate, and always have a backup plan!
24.4.2 Key Words for Kids
| Word | What It Means |
|---|---|
| Validation | Checking if a sensor reading makes sense before using it |
| Calibration | Making sure a sensor gives the right answer by comparing it to something you trust |
| Synchronization | Making sure all sensors agree on what time it is |
| Graceful Degradation | Keeping things working even when some parts break |
24.5 Worked Example: Autonomous Forklift Navigation Fusion Audit
Worked Example: Diagnosing Sensor Fusion Failures in a Warehouse Robot
Scenario: Amazon Robotics deploys autonomous forklifts in a fulfillment center in Rugeley, UK. The forklifts use sensor fusion (LiDAR + wheel odometry + UWB beacons) for indoor navigation. After 3 months of operation, the forklifts increasingly deviate from planned paths, requiring human intervention 12 times per day (up from 1 per day at deployment).
Given:
- Navigation sensors: Hokuyo UTM-30LX LiDAR (40m range, 25ms scan), wheel encoders (1,000 ticks/revolution), 8 UWB anchors (Decawave DW1000, +/-10cm accuracy)
- Fusion method: Extended Kalman Filter (EKF) combining all three sources
- Warehouse: 15,000 m², concrete floor, steel racking up to 12 m high
- Operating speed: 2 m/s loaded, 3 m/s unloaded
- Error pattern: Deviations occur primarily near racking aisles and loading docks
Step 1 – Audit each pitfall systematically:
| Pitfall | Check | Finding | Severity |
|---|---|---|---|
| 1. Blind trust | Is output validated before motor commands? | No physics check on velocity or acceleration limits | HIGH |
| 2. Timestamp sync | Are sensor clocks aligned? | LiDAR: NTP synced. Wheel encoders: local clock. UWB: NTP synced. Encoder drift: +3ms after 8 hours | MEDIUM |
| 3. Calibration drift | When were sensors last calibrated? | LiDAR: factory (3 months ago). Encoders: never recalibrated. UWB: anchors shifted during racking install | HIGH |
| 4. Wrong noise params | Are Q and R matrices tuned to actual noise? | R matrix uses datasheet values, not measured in-situ values. Steel racking causes LiDAR multipath | HIGH |
| 5. Failure handling | What happens if one sensor fails? | EKF continues with stale UWB if beacon blocked by forklift. No timeout on measurement staleness | HIGH |
| 6. Correlated errors | Are errors independent? | Wheel encoders on same axle share slip on wet concrete | MEDIUM |
| 7. Latency alignment | Are measurements time-aligned? | LiDAR scan (25ms) fused with UWB (2ms) and encoder (1ms) without latency compensation | LOW |
Step 2 – Quantify the impact of each issue:
- Wheel encoder drift (Pitfall 3): Tires worn 2mm in 3 months, changing effective circumference by 0.3%. At 2 m/s, this accumulates 0.6 cm/second of position error = 36 cm/minute. Without correction, position estimate drifts 2.16m in 6 minutes of straight-line travel.
- UWB anchor displacement (Pitfall 3): Racking installation shifted 3 of 8 anchors by 5-15 cm. UWB position estimates near those anchors have systematic bias of 8-12 cm.
- LiDAR multipath (Pitfall 4): Steel racking reflects laser beams, creating phantom obstacles. Datasheet accuracy (+/-3 cm) assumes non-reflective surfaces. Measured accuracy near racking: +/-15 cm.
- Stale UWB measurements (Pitfall 5): When a forklift body blocks line-of-sight to anchors, EKF uses last UWB reading for up to 30 seconds. At 2 m/s, the forklift moves 60m before EKF realizes UWB is stale.
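The encoder-drift arithmetic above can be verified in a few lines:

```python
# Verify the wheel-encoder drift numbers from the audit.
circumference_error = 0.003  # 0.3% change in effective circumference
speed = 2.0                  # m/s operating speed, loaded

drift_per_s = speed * circumference_error  # m of position error per second
print(round(drift_per_s * 100, 2), "cm/s")        # 0.6 cm/s
print(round(drift_per_s * 60 * 100, 1), "cm/min")  # 36.0 cm/min
print(round(drift_per_s * 6 * 60, 2), "m in 6 min")  # 2.16 m
```

The same one-liner style is useful during an audit for sanity-checking every quantified failure mode before committing to a fix.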
Step 3 – Implement fixes:
| Fix | Implementation | Expected Improvement |
|---|---|---|
| Physics validation gate | Reject any position jump >0.5m between 100ms updates (corresponds to 5 m/s, above max speed) | Eliminates 90% of large deviations |
| Encoder recalibration | Monthly tire measurement, auto-calibrate circumference using UWB ground truth during straight runs | Reduces odometry drift from 0.3% to 0.05% |
| UWB anchor survey | Re-survey all 8 anchor positions with total station. Schedule annual re-survey. | Eliminates 8-12 cm systematic bias |
| In-situ noise tuning | Measure actual LiDAR noise near racking (R = 0.15m, not datasheet 0.03m). Increase process noise Q near aisles. | Reduces false confidence in noisy measurements |
| Staleness timeout | Reject UWB measurements older than 500ms. Switch to LiDAR + odometry only when UWB unavailable. | Prevents 60m stale-data drifts |
| Wet floor detection | Monitor wheel slip via encoder disagreement between left and right wheels. Increase Q when slip detected. | Prevents correlated encoder errors on wet concrete |
Result: After implementing all fixes, human interventions dropped from 12/day to 0.8/day (93% reduction). The dominant remaining failure mode is occasional LiDAR blindness from shrink-wrap on pallets (reflective material). Total cost of fixes: 3 engineer-weeks of software updates + EUR 800 for anchor re-survey. No new hardware required.
Key Insight: The fusion algorithm (EKF) was fine – all problems were in the data practices surrounding it. This matches the general rule that 80% of sensor fusion failures come from data quality issues (calibration, synchronization, noise modeling, failure handling), not from the algorithm itself. A systematic pitfall audit following these 7 categories diagnoses most fusion problems within hours.
See Also
Data Quality:
- Data Validation - Pre-fusion validation checks
- Outlier Detection - Mahalanobis distance testing
- Calibration - Sensor calibration procedures
System Reliability:
- Testing and Validation - System-level validation
- Failure Modes - Graceful degradation design
- Edge Reliability - Fault tolerance patterns
Deployment:
- Edge Compute Patterns - Deploying fusion at the edge
- Production ML - ML system best practices
24.6 What’s Next
| If you want to… | Read this |
|---|---|
| See these best practices applied in real applications | Data Fusion Applications |
| Study specific fusion algorithms in detail | Kalman Filter for IoT |
| Understand fusion system architectures | Data Fusion Architectures |
| Practice fusion concepts in exercises | Data Fusion Exercises |
| Return to the module overview | Data Fusion Introduction |