6 Edge ML and TinyML Deployment
6.1 Learning Objectives
By the end of this chapter, you will be able to:
- Decide Edge vs Cloud: Apply the decision framework to determine optimal ML deployment location
- Implement TinyML: Deploy quantized neural networks on microcontrollers
- Optimize Models: Use pruning, quantization, and knowledge distillation techniques
- Design Real-Time Systems: Build predictive control systems with edge ML
Key Concepts
- Model quantisation: Converting ML model weights from 32-bit floats to 8-bit integers (INT8) or 4-bit values, reducing model size by 4–8× and improving inference speed by 2–4× with minimal accuracy loss.
- Model pruning: Removing connections (weights near zero) or entire neurons from a trained neural network, reducing model size and inference cost while preserving most of the model’s accuracy.
- Knowledge distillation: Training a small ‘student’ model to mimic the outputs of a larger ‘teacher’ model, producing a compact model that captures the teacher’s knowledge in a fraction of the parameters.
- TensorFlow Lite (TFLite): A lightweight ML inference framework designed for microcontrollers and mobile devices, supporting quantised models with a binary footprint as small as 16 KB.
- TinyML: The field of running ML inference on microcontrollers and other extremely resource-constrained devices (< 1 MB RAM), enabling on-device intelligence without cloud connectivity.
- ONNX (Open Neural Network Exchange): A cross-framework ML model format enabling models trained in PyTorch or TensorFlow to be exported and run in optimised edge inference runtimes.
For Beginners: Edge ML and TinyML Deployment
Normally, machine learning runs on powerful cloud servers. But what if your device is in a remote forest, a moving vehicle, or a patient’s wrist – with no reliable internet? Edge ML means running the intelligence directly on the device itself. TinyML takes this even further, squeezing models into microcontrollers with less memory than a single photo on your phone. Think of it like carrying a pocket calculator instead of calling a math professor every time you need an answer – it is less powerful, but instant and always available.
6.2 Prerequisites
- ML Fundamentals: Training vs inference concepts
- IoT ML Pipeline: 7-step ML pipeline
- Basic understanding of microcontrollers (ESP32, Cortex-M4)
Chapter Series: Modeling and Inferencing
This is part 4 of the IoT Machine Learning series:
- ML Fundamentals - Core concepts
- Mobile Sensing - HAR, transportation
- IoT ML Pipeline - 7-step pipeline
- Edge ML & Deployment (this chapter) - TinyML, quantization
- Audio Feature Processing - MFCC
- Feature Engineering - Feature design
- Production ML - Monitoring
6.3 Edge vs Cloud ML: Decision Framework
Deciding when to process data locally (at the edge) versus sending it to the cloud is a critical architectural decision:
6.3.1 Decision Factors
| Factor | Choose Edge | Choose Cloud |
|---|---|---|
| Latency | < 100ms required (safety-critical) | 1-5 seconds acceptable |
| Privacy | Sensitive data (health, financial) | Anonymous/aggregated data |
| Bandwidth | High data rate (video, audio) | Low data rate (temperature) |
| Connectivity | Intermittent or offline | Always connected |
| Model Complexity | Simple models (< 1MB) | Complex models (> 10MB) |
| Cost | High cloud costs, many devices | Expensive edge hardware |
6.3.2 Example Scenarios
| Application | Best Deployment | Reasoning |
|---|---|---|
| Fall Detection | Edge | < 100ms latency critical for safety |
| Voice Assistant Wake Word | Edge | Privacy (always listening) |
| Full Voice Command | Cloud | Complex NLU requires large models |
| Industrial Anomaly Detection | Hybrid | Real-time alerts (edge) + root cause (cloud) |
| Smart Thermostat | Edge | Works offline, simple model |
| Fleet-wide Predictive Maintenance | Cloud | Cross-device learning required |
Tradeoff: Model Accuracy vs Edge Deployability
Complex models (deep neural networks, large ensembles) achieve 95-98% accuracy but require megabytes of RAM and GPU/NPU acceleration. Lightweight models (decision trees, quantized CNNs) achieve 88-93% accuracy but fit in less than 256KB and run on basic MCUs. For battery-powered wearables requiring 5+ day battery life, choose lightweight. For edge AI accelerators (Jetson, Coral) with consistent power, choose complex models. The sweet spot is INT8 quantization: 4x size reduction with only 1-2% accuracy loss.
6.4 TinyML and Model Quantization
TinyML enables machine learning on devices with < 1MB RAM and < 1MB storage. The key technique is quantization—reducing numerical precision.
6.4.1 Quantization Overview
| Precision | Bits | Size Reduction | Accuracy Loss | Use Case |
|---|---|---|---|---|
| FP32 | 32 | Baseline | 0% | Training, cloud inference |
| FP16 | 16 | 2x | < 1% | GPU inference |
| INT8 | 8 | 4x | 1-3% | Edge devices, ESP32 |
| INT4 | 4 | 8x | 3-8% | MCUs, Cortex-M4 |
Putting Numbers to It
Quantization: Trading Precision for Efficiency
Quantization maps floating-point values to fixed-point integers. The math behind INT8 quantization shows the trade-off:
FP32 to INT8 Mapping: \[ \text{INT8 range} = [-128, 127] \quad \text{(256 discrete values)} \] \[ \text{FP32 range} = [w_{\min}, w_{\max}] \quad \text{(from model weights)} \]
Scale Factor: \[ S = \frac{w_{\max} - w_{\min}}{255} \quad \text{(quantization step size)} \]
Zero Point (offset to handle asymmetric ranges): \[ Z = -\text{round}\left(\frac{w_{\min}}{S}\right) \quad \text{(the INT8 value that represents FP32 zero)} \]
Quantization Formula: \[ w_{\text{INT8}} = \text{clip}\left(\text{round}\left(\frac{w_{\text{FP32}}}{S}\right) + Z, -128, 127\right) \]
De-quantization (for inference): \[ w_{\text{FP32}} \approx S \times (w_{\text{INT8}} - Z) \]
Example: Model weight range [-0.5, 0.8] \[ S = \frac{0.8 - (-0.5)}{255} = 0.0051 \quad Z = -\text{round}\left(\frac{-0.5}{0.0051}\right) = 98 \] \[ w = -0.2 \rightarrow \text{clip}\!\left(\text{round}\left(\frac{-0.2}{0.0051}\right) + 98, -128, 127\right) = \text{clip}(59, -128, 127) = 59 \quad \text{(INT8)} \]
Quantization Error: \[ \epsilon = S \times 0.5 = 0.0026 \quad \text{(maximum per-weight error)} \]
With \(n = 1000\) weights, the Central Limit Theorem keeps the cumulative error bounded: \[ \epsilon_{\text{total}} \sim \mathcal{N}(0, \frac{S}{\sqrt{12}} \times \sqrt{n}) = \mathcal{N}(0, 0.046) \quad \text{(standard deviation; < 5% typical)} \]
This explains why INT8 maintains 97-99% accuracy despite 4x compression—quantization errors are zero-mean and largely cancel out across layers.
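The formulas above can be checked with a few lines of plain Python. This is a minimal sketch of per-tensor asymmetric INT8 quantization, reproducing the [-0.5, 0.8] worked example:

```python
def int8_params(w_min, w_max):
    """Scale and zero point for asymmetric INT8 quantization."""
    scale = (w_max - w_min) / 255          # quantization step size S
    zero_point = -round(w_min / scale)     # Z: the INT8 value for FP32 zero
    return scale, zero_point

def quantize(w, scale, zero_point):
    """FP32 -> INT8, clipped to [-128, 127]."""
    return max(-128, min(127, round(w / scale) + zero_point))

def dequantize(q, scale, zero_point):
    """INT8 -> approximate FP32."""
    return scale * (q - zero_point)

scale, zp = int8_params(-0.5, 0.8)   # scale ~ 0.0051, zp = 98
q = quantize(-0.2, scale, zp)        # 59, matching the worked example
w_approx = dequantize(q, scale, zp)  # ~ -0.1988; error < scale/2 ~ 0.0026
```

Note that the reconstruction error stays below half a quantization step, as the error bound above predicts.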
6.4.2 Try It: Quantization Explorer
Use the sliders below to experiment with INT8 quantization. Adjust the weight range and input weight to see how the scale factor, zero point, quantized value, and reconstruction error change in real time.
6.4.3 TensorFlow Lite Quantization
import numpy as np
import tensorflow as tf

# Train full-precision model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(27,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(5, activation='softmax')  # 5 activity classes
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit(X_train, y_train, epochs=50, validation_data=(X_val, y_val))

# Convert to TensorFlow Lite with INT8 quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Representative dataset for calibration (required for INT8)
def representative_data_gen():
    for sample in X_calibration[:100]:
        yield [sample.reshape(1, -1).astype(np.float32)]

converter.representative_dataset = representative_data_gen
# Restrict the converter to integer-only ops
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# Convert and save
tflite_model = converter.convert()
with open('activity_classifier_int8.tflite', 'wb') as f:
    f.write(tflite_model)
# Result: 500KB → 125KB (4x reduction)

6.4.4 Quantization Calibration
INT8 quantization requires a representative dataset to calibrate the mapping from FP32 to INT8 ranges:
import numpy as np

# Calibration determines min/max values for each layer
# Using 100-1000 representative samples works well
def representative_data_gen():
    # Use validation data (NOT training data) for calibration
    for i in range(min(100, len(X_val))):
        yield [X_val[i:i+1].astype(np.float32)]

Calibration Best Practices:
- Use 100-1000 samples from validation set
- Include diverse examples covering all classes
- Do NOT use training data (causes overfit to training distribution)
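What the converter's calibration pass computes can be sketched in a few lines of plain Python. The activation values here are illustrative, using the hypothetical Layer-1 range [-2.3, 4.7] discussed later in this chapter:

```python
def calibrate(observed_activations):
    """Derive per-tensor INT8 scale/zero-point from calibration activations."""
    a_min = min(observed_activations)
    a_max = max(observed_activations)
    scale = (a_max - a_min) / 255
    zero_point = -round(a_min / scale)
    return scale, zero_point

# Illustrative: a layer whose calibration activations span [-2.3, 4.7]
scale, zp = calibrate([-2.3, 0.0, 1.2, 4.7])
# At run time, activations outside this observed range are clipped;
# that clipping is the main source of INT8 accuracy loss
```

This is why diverse calibration samples matter: they must cover the activation extremes the model will actually see.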
6.5 Model Optimization Techniques
6.5.1 1. Pruning
Remove unimportant weights (set to zero):
import tensorflow_model_optimization as tfmot

# Apply pruning during training
prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude
pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.30,
        final_sparsity=0.70,
        begin_step=1000,
        end_step=5000
    )
}
model_for_pruning = prune_low_magnitude(model, **pruning_params)
model_for_pruning.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Train with pruning (the callback updates the sparsity masks each step)
model_for_pruning.fit(
    X_train, y_train, epochs=10,
    callbacks=[tfmot.sparsity.keras.UpdatePruningStep()]
)
# Result: 70% of weights become zero → ~3x compression with <2% accuracy loss

6.5.2 2. Knowledge Distillation
Train small “student” model to mimic large “teacher” model:
Benefits:
- Student achieves 95% accuracy vs 90% when trained directly on labels
- Soft labels from teacher contain more information than hard labels
- Enables deployment of small models with large-model accuracy
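The soft-label idea can be sketched in plain Python. This is a minimal illustration of the standard temperature-scaled distillation loss; the logits, temperature T=4, and weighting alpha=0.7 are hypothetical values, not from this chapter:

```python
import math

def softmax(logits, T=1.0):
    """Softmax with temperature T (T > 1 softens the distribution)."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.7):
    """Weighted sum of soft (teacher) and hard (label) cross-entropy.

    The T**2 factor compensates for the smaller gradients of softened targets.
    """
    soft_t = softmax(teacher_logits, T)   # softened teacher probabilities
    soft_s = softmax(student_logits, T)
    soft_loss = -sum(t * math.log(s) for t, s in zip(soft_t, soft_s))
    hard_loss = -math.log(softmax(student_logits)[label])
    return alpha * (T ** 2) * soft_loss + (1 - alpha) * hard_loss

# Toy example: softened teacher output reveals class similarities
# that a one-hot label discards
loss = distillation_loss([2.0, 0.5, -1.0], [3.0, 0.2, -2.0], label=0)
```

Raising T flattens the teacher's output, exposing which wrong classes the teacher considers "almost right"; that relational information is what the student learns beyond the hard labels.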
6.6 Worked Example: HVAC Predictive Control with Edge LSTM
Scenario: A smart building requires temperature predictions 60 minutes ahead for efficient HVAC pre-conditioning. The edge gateway has 512KB RAM and must work offline.
6.6.1 System Architecture
6.6.2 Step 1: Data Preparation
Raw Data Collection (30 days of historical readings):
| Metric | Value | Notes |
|---|---|---|
| Total samples | 43,200 | 30 days × 24 hours × 60 minutes |
| Sampling rate | 1 minute | Standard HVAC sensor frequency |
| Sensors | 4 | Indoor temp, outdoor temp, occupancy, HVAC state |
| Storage | ~2 MB | CSV format with timestamps |
Train/Validation/Test Split (time-series aware):
| Split | Samples | Percentage | Date Range |
|---|---|---|---|
| Training | 30,240 | 70% | Days 1-21 |
| Validation | 6,480 | 15% | Days 22-25.5 |
| Test | 6,480 | 15% | Days 25.5-30 |
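The chronological split above reduces to simple index arithmetic over time-ordered samples. A sketch, with the 70/15/15 fractions and 43,200-sample count taken from the tables above:

```python
def chronological_split(n, train_frac=0.70, val_frac=0.15):
    """Split time-ordered samples without shuffling, to avoid leakage."""
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    # Each split is a (start, end) index pair over the sorted samples
    return (0, train_end), (train_end, val_end), (val_end, n)

train, val, test = chronological_split(43_200)
# train covers days 1-21, val days 22-25.5, test days 25.5-30;
# a random shuffle here would leak future readings into training
```

Validation and test always come strictly *after* training in time, mirroring how the deployed model will see data.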
6.6.3 Step 2: Feature Engineering
Feature Set (14 features from 4 raw sensors):
| Feature Name | Type | Importance | Rationale |
|---|---|---|---|
| temp_lag_1h | Numeric | 0.35 | Strong autocorrelation |
| temp_lag_30m | Numeric | 0.22 | Recent trend indicator |
| outdoor_temp | Numeric | 0.18 | Heat transfer driver |
| outdoor_delta_1h | Numeric | 0.08 | Predicts future heat load |
| hour_sin | Cyclic | 0.05 | Daily temperature cycle |
| hour_cos | Cyclic | 0.03 | Daily cycle phase |
| occupancy | Binary | 0.04 | Body heat, door openings |
Why Cyclic Encoding for Time?
Problem: Hour of day (0-23) is circular (11 PM is close to 1 AM), but a plain numeric encoding hides this.
Bad approach: Raw hour value (0-23) → the model thinks hours 0 and 23 are far apart!
Good approach: Encode each hour as a (sin, cos) pair:
- Hour 0: sin=0, cos=1
- Hour 6: sin=1, cos=0
- Hour 12: sin=0, cos=-1
- Hour 23: sin≈-0.26, cos≈0.97 (close to hour 0!)
This preserves the circular relationship.
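The encoding is a standard mapping onto the unit circle; a short sketch:

```python
import math

def encode_hour(hour):
    """Map hour-of-day (0-23) onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)

# Hours 23 and 0 are adjacent after encoding, but far apart as raw values
sin23, cos23 = encode_hour(23)   # ~ (-0.259, 0.966)
sin0, cos0 = encode_hour(0)      # (0.0, 1.0)
```

The same pattern works for any cyclic variable (day of week, month, wind direction): divide by the period, multiply by 2π, and take (sin, cos).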
6.6.4 Step 3: Model Selection
| Model | MAE (deg C) | Size (KB) | Inference (ms) | Fits 512KB? |
|---|---|---|---|---|
| Linear Regression | 1.8 | 2 | 0.1 | Yes |
| Random Forest (100 trees) | 0.9 | 450 | 25 | Yes (tight) |
| LSTM (2 layers, 32 units) | 0.7 | 200 | 50 | Yes |
| Quantized LSTM (INT8) | 0.8 | 50 | 15 | Yes |
Winner: Quantized LSTM - Only 0.1 deg C accuracy loss, 4x smaller, 3x faster
6.6.5 Step 4: Quantization
import numpy as np
import tensorflow as tf

# Train full-precision LSTM
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, return_sequences=True, input_shape=(60, 14)),
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1)  # Predict temperature
])
model.compile(optimizer='adam', loss='mse')
model.fit(X_train, y_train, epochs=50, validation_data=(X_val, y_val))

# Representative dataset must match the LSTM input shape (1, 60, 14)
def representative_data_gen():
    for sample in X_val[:100]:
        yield [sample.reshape(1, 60, 14).astype(np.float32)]

# Convert to TensorFlow Lite with INT8 quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert()

# Save quantized model
with open('hvac_predictor_int8.tflite', 'wb') as f:
    f.write(tflite_model)
# Result: 200KB → 50KB (75% reduction)

6.6.6 Step 5: Results
Deployment Validation:
| Metric | Target | Achieved | Status |
|---|---|---|---|
| Model size | < 512KB | 50KB | Pass |
| Inference latency | < 1000ms | 15ms | Pass (66x margin) |
| MAE accuracy | < 1.0 deg C | 0.8 deg C | Pass |
| Works offline | Required | Yes | Pass |
Energy Savings:
Pre-deployment (reactive HVAC):
- HVAC cycles: 24 per day (start/stop at setpoint)
- Energy waste: HVAC runs until room reaches setpoint, then overshoots
Post-deployment (predictive HVAC):
- HVAC pre-starts 30-45 min before needed
- Reduces cycling: 24 → 18 cycles/day (25% fewer)
Monthly energy comparison:
- Baseline: 1,200 kWh/month
- Predictive: 1,020 kWh/month
- Savings: 180 kWh/month (15% reduction)
Annual savings (10-story office building):
- 180 kWh × 12 months × $0.12/kWh = $259/floor/year
- 10 floors = $2,590/year
- Edge gateway cost: $150 one-time
- Payback period: 0.7 months
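The savings arithmetic above can be verified directly; all figures come from the lists in this step:

```python
monthly_savings_kwh = 1_200 - 1_020                          # 180 kWh/month
annual_savings_per_floor = monthly_savings_kwh * 12 * 0.12   # $259.20/floor/year
building_annual_savings = annual_savings_per_floor * 10      # ~ $2,592/year
payback_months = 150 / (building_annual_savings / 12)        # ~ 0.69 months
```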
6.7 Key Takeaways from HVAC Example
- Feature engineering drives accuracy: Lag features contributed 57% of model importance
- Quantization is nearly free: INT8 reduced size by 75% with only 0.1 deg C loss
- Time-series requires chronological splits: Random splits cause data leakage
- Cyclic encoding for time: sin/cos encoding preserves circular relationships
- Business value justifies complexity: 15% energy savings pays for development in < 1 month
6.8 Real-World Deployment: Sony’s Spresense and Edge AI for Wildlife Conservation
Wildlife Conservation Society (WCS) deployed edge ML on Sony’s Spresense microcontroller board (Cortex-M4F, 768 KB SRAM, $25/unit) to detect illegal chainsaw activity in tropical rainforests. The system illustrates the full edge ML deployment pipeline in a resource-constrained, real-world environment.
Problem: Illegal logging accounts for 15-30% of global timber trade. Traditional monitoring uses satellite imagery with 3-7 day update cycles – by the time deforestation is detected, the loggers are gone. Audio monitoring can detect chainsaws in real-time, but requires ML inference in remote locations with no internet and solar-only power.
Model optimization journey:
| Stage | Model | Size | Accuracy | Inference Time | Power |
|---|---|---|---|---|---|
| Cloud prototype | ResNet-18 (PyTorch) | 44 MB | 97.2% | 180 ms (GPU) | N/A |
| Compressed | MobileNetV2 (TF Lite) | 3.4 MB | 95.8% | 420 ms (RPi 4) | 3.2 W |
| Quantized INT8 | MobileNetV2 (TF Lite Micro) | 680 KB | 94.1% | 280 ms (Spresense) | 45 mW |
| Pruned + quantized | Custom CNN (TF Lite Micro) | 210 KB | 91.6% | 85 ms (Spresense) | 32 mW |
The final 210 KB model fits comfortably in the Spresense’s 768 KB SRAM with room for audio buffers and firmware. The 5.6% accuracy reduction from the cloud prototype was acceptable because the system’s value comes from real-time detection (minutes, not days) rather than perfect classification.
Deployment results (Peruvian Amazon, 12 months):
- 50 solar-powered nodes covering 2,500 hectares
- Battery life: 14 months on 3,000 mAh LiPo + 1W solar panel (duty-cycled: 5 seconds of audio analysis every 30 seconds)
- True positive rate: 89% for chainsaws at distances up to 300 meters
- False positive rate: 2.1 per node per day (mainly heavy rain on metal roofs)
- Response time: Alert transmitted via LoRa to ranger station within 90 seconds of detection
- Outcome: 7 illegal logging operations intercepted, estimated 180 hectares of forest saved
The key lesson: a 91.6% accurate model running in real-time on a $25 microcontroller delivered more conservation value than a 97.2% accurate model that would have required $500 in gateway hardware, cellular connectivity, and cloud processing per node – making the 50-node deployment financially impossible.
6.9 How It Works: INT8 Quantization
INT8 quantization transforms a floating-point neural network into an integer-only version, reducing model size by 4x with minimal accuracy loss. Here is the complete process (see the “Putting Numbers to It” callout above for the detailed math):
Step 1: Calibration – Feed 100-1000 representative samples from the validation set through the FP32 model, recording min/max activation values for each layer. For an activity classifier: Layer 1 activations range [-2.3, 4.7], Layer 2 ranges [-1.8, 3.2], etc.
Step 2: Scale and zero-point calculation – For each layer, compute scale = (max - min) / 255 and zero_point = -round(min / scale). These map the continuous FP32 range onto 256 discrete INT8 values.
Step 3: Weight quantization – Convert each FP32 weight to INT8: q_weight = clip(round(fp32_weight / scale) + zero_point, -128, 127). Each weight shrinks from 4 bytes to 1 byte.
Step 4: Runtime inference – All arithmetic uses integer operations. Multiply-accumulate becomes: q_output = (q_input * q_weight) >> shift where shift adjusts for accumulated scaling. No floating-point operations – this enables deployment on microcontrollers without an FPU.
Step 5: Dequantization (output layer only) – Convert INT8 output back to FP32: fp32_output = (q_output - zero_point) * scale. For classification, apply temperature scaling to calibrate softmax probabilities.
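Steps 3 to 5 can be illustrated end-to-end with a toy dot product in plain Python. This sketch uses symmetric quantization (zero point fixed at 0) to keep the rescaling readable; the weights, inputs, output range, and 15-bit shift are illustrative choices, and production kernels (e.g. CMSIS-NN) use per-channel scales and rounding-aware shifts:

```python
def quantize_symmetric(values, num_bits=8):
    """Symmetric per-tensor quantization: zero point fixed at 0."""
    scale = max(abs(v) for v in values) / 127
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

# Toy neuron: four weights, four inputs
weights = [0.5, -0.25, 0.75, 0.1]
inputs = [1.2, -0.8, 0.4, 2.0]
q_w, s_w = quantize_symmetric(weights)
q_x, s_x = quantize_symmetric(inputs)

# Step 4: integer-only multiply-accumulate (32-bit accumulator on an MCU)
acc = sum(qx * qw for qx, qw in zip(q_x, q_w))

# Rescale with a fixed-point multiplier and right shift; no FPU needed.
# Output scale chosen to cover roughly [-2, 2] (illustrative).
s_out = 2.0 / 127
shift = 15
multiplier = round((s_x * s_w / s_out) * (1 << shift))
q_out = (acc * multiplier) >> shift

# Step 5: dequantize the output and compare with the float reference
out_fp = q_out * s_out                                   # ~ 1.29
float_ref = sum(w * x for w, x in zip(weights, inputs))  # = 1.3
```

The integer result lands within about 1% of the floating-point dot product, despite every runtime operation being integer arithmetic.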
Why it works: Neural networks are inherently robust to quantization noise. Precise floating-point values (e.g., 1.35472891) contribute no more accuracy than their rounded INT8 equivalents. The 1-3% typical accuracy loss comes from extreme activation values outside the calibration range, which get clipped.
Hardware acceleration: ARM Cortex-M processors include SIMD instructions (SMLAD, SMUAD) that perform 4x INT8 multiplications in a single cycle, making quantized inference 3-5x faster than FP32 on the same hardware.
6.10 Knowledge Check
Common Pitfalls
1. Quantising a model without post-quantisation accuracy evaluation
INT8 quantisation typically reduces accuracy by 1–3%, but for some model architectures or data distributions it can drop by 10–20%. Always evaluate quantised model accuracy on a representative test set before deploying.
2. Designing the ML pipeline without considering the target edge hardware first
A model designed for a high-end GPU and then ‘ported’ to a microcontroller is almost always too large and too slow. Start with the edge hardware constraints (RAM, flash, MIPS) and design the model architecture within those limits from the beginning.
3. Treating model deployment as a one-time task
IoT environments change (new noise sources, seasonal variation, equipment ageing) causing model accuracy to degrade. Plan for periodic model retraining and OTA model updates from the start of deployment.
4. Ignoring inference latency requirements in model selection
A 100 ms inference time is fine for batch analysis but unacceptable for real-time motor control requiring <10 ms response. Profile inference latency on target hardware during model development, not after deployment.
6.11 Summary
This chapter covered edge ML and TinyML deployment:
- Edge vs Cloud Decision: Use edge for low latency, privacy, and offline requirements
- Quantization: INT8 reduces model size by 4x with 1-3% accuracy loss
- Pruning: Remove 70% of weights with <2% accuracy loss
- Knowledge Distillation: Train small models to match large model performance
- HVAC Example: Quantized LSTM achieves 15% energy savings with 0.7-month payback
Key Insight: Start with the simplest model that meets constraints. Quantization is nearly free—always try it before abandoning edge deployment.
Key Takeaway
INT8 quantization is the single most impactful optimization for edge ML – it reduces model size by 4x and inference latency by 3x with only 1-3% accuracy loss. Always try quantization before concluding a model cannot run on edge hardware. The HVAC example demonstrates that even a 50KB quantized LSTM on a $150 edge gateway with only 512KB of RAM can deliver 15% energy savings with a payback period under one month.
For Kids: Meet the Sensor Squad!
Can a tiny computer be as smart as a big one? The Sensor Squad finds out!
Max the Microcontroller has a problem. He wants to predict when a room will get too hot so he can turn on the air conditioning early. But the super-smart brain (neural network) that can do this is TOO BIG to fit in his tiny memory!
“I only have 512 kilobytes of memory!” Max sighs. “That is like having a bookshelf that only fits 5 books, but the brain needs 200 books!”
Sammy the Sensor has an idea: “What if we make the brain SMALLER?”
They try three tricks:
Trick 1 - Shrink the Numbers (Quantization) Instead of using really precise numbers like 3.14159265, Max rounds everything to simpler numbers like 3. This makes the brain 4 times smaller!
“It is like drawing with 8 crayons instead of 32,” explains Lila the LED. “You lose a tiny bit of detail, but the picture still looks great!”
Trick 2 - Remove the Lazy Parts (Pruning) Some parts of the brain do almost nothing. Max removes 70% of the lazy connections, and the brain still works almost as well!
Trick 3 - Learn from the Expert (Distillation) A big expert brain teaches a tiny student brain. The student learns the shortcuts and becomes almost as smart!
After all three tricks, the brain goes from 200KB down to 50KB – it fits perfectly in Max’s memory! And it only takes 15 milliseconds to make a prediction.
Bella the Battery cheers: “And because the brain is so small, I barely use any energy running it! I can last for DAYS!”
The building saves 15% on electricity because Max can predict when to turn on the AC ahead of time, instead of waiting until the room is already too hot.
6.11.1 Try This at Home!
Draw a detailed picture of your pet (or favorite animal) using 32 colored pencils. Now try drawing the SAME picture using only 8 colors. It still looks like your pet, right? That is quantization! You used fewer colors (less precision) but kept the important information. Computers do the same thing with numbers to make smart brains fit in tiny devices.
6.12 Concept Relationships
Edge ML builds on:
- ML Fundamentals - Training vs inference separation enables edge deployment
- IoT ML Pipeline - Stages 4-5 (optimize, deploy) focus on edge preparation
- Edge Computing Architecture - Edge vs cloud decision framework
Edge ML enables:
- Audio Processing - Wake word detection on Cortex-M4 using quantized models
- Production ML - Real-time anomaly detection without cloud dependency
Parallel concepts:
- Model quantization (FP32 → INT8) ↔︎ Audio compression (PCM → MP3): Both sacrifice precision for size with minimal quality loss
- Pruning ↔︎ Feature selection: Both remove low-value components to improve efficiency
- Knowledge distillation (teacher → student) ↔︎ Transfer learning: Both leverage large models to improve small models
6.13 See Also
Chapter series:
- ML Fundamentals - Core concepts and edge vs cloud decision
- IoT ML Pipeline - Complete 7-step pipeline
- Audio Feature Processing - MFCC for voice recognition
- Production ML - Monitoring deployed models
Edge deployment resources:
- TensorFlow Lite - Official quantization and deployment guide
- TensorFlow Lite Micro - MCU deployment
- ARM CMSIS-NN - Optimized neural network kernels for Cortex-M
- Edge Impulse - End-to-end TinyML platform
Hardware platforms:
- ESP32 - $5 MCU with 520KB SRAM, Wi-Fi/Bluetooth
- Raspberry Pi 4 - $35-75 SBC with 4-8GB RAM, quad-core ARM
- NVIDIA Jetson Nano - $99 GPU accelerator with 4GB RAM
- Google Coral Dev Board - $149 Edge TPU accelerator
6.14 What’s Next
| Direction | Chapter | Link |
|---|---|---|
| Next | Audio Feature Processing | modeling-audio-features.html |
| Previous | IoT ML Pipeline | modeling-pipeline.html |
| Related | Feature Engineering | modeling-feature-engineering.html |
| Related | Production ML | modeling-production.html |