20  Pipeline Processing and Formatting

In 60 Seconds

Stages 4-6 are where raw counts become useful payloads. Firmware calibrates and filters readings, encoding turns values into JSON, CBOR, or binary layouts, and packet assembly wraps the payload in transport headers. Most IoT efficiency wins happen here: local processing usually saves more energy than shaving a few bytes, but the wrong format or protocol can still make a tiny reading balloon into an expensive packet.

20.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Calibrate Raw ADC Values: Convert raw digital counts into meaningful engineering units using offset correction and scaling formulas
  • Compare Data Encoding Formats: Evaluate JSON, CBOR, and custom binary trade-offs for payload size, debuggability, and schema evolution
  • Analyze Protocol Stack Overhead: Calculate per-layer header costs and overall transmission efficiency for MQTT, CoAP, LoRaWAN, and BLE stacks
  • Design Edge Processing Strategies: Implement change detection, threshold alerting, and delta compression to reduce transmission volume by 90-99%
  • Estimate Battery Life Impact: Quantify how format selection and protocol overhead affect energy consumption and device longevity

20.2 Prerequisites

Before diving into this chapter, you should be familiar with stages 1-3 of the sensing pipeline – how a physical signal is sensed, conditioned, and converted by the ADC into a raw digital value.

After the ADC converts your sensor reading to a digital number (like 3174), that raw value isn’t ready for the network yet. It needs three more transformations:

  1. Processing (Stage 4): Convert 3174 → 25.6°C using calibration
  2. Formatting (Stage 5): Encode as {"temp": 25.6} or binary
  3. Packet Assembly (Stage 6): Wrap with addresses and headers

Cooking Analogy: You’ve prepared your ingredients (stages 1-3). Now you need to:

  • Cook them into a dish (processing)
  • Plate and garnish (formatting)
  • Package for delivery (packet assembly)

This chapter covers these three middle stages of the pipeline.

MVU: Data Transformation Efficiency

Core Concept: Stages 4-6 determine how efficiently your sensor data travels the network. Processing reduces data volume through edge intelligence. Formatting trades readability for compactness. Packet assembly adds unavoidable protocol overhead.

Why It Matters: A typical IoT deployment transmits the same data thousands of times per day. Choosing JSON over binary might seem trivial for one reading, but multiplied by 1,000 sensors × 24 hours × 60 readings/hour = 1.44 million transmissions daily. That’s the difference between 72 MB and 2.8 MB of data transfer per day.

Key Takeaway: Optimize for the common case. If 95% of readings are “no change,” implement change-based transmission in Stage 4 to eliminate 95% of network traffic before worrying about Stage 5 format efficiency.


20.3 Stage 4: Digital Processing

⏱️ ~9 min | ⭐⭐ Intermediate | 📋 P02.C04.U05

Key Concepts

  • Process before transport: Filtering, thresholding, and aggregation often save more energy than choosing a denser payload format alone.
  • Calibration chain: Raw ADC counts need scale, offset, and unit conversion before alarms or analytics can trust them.
  • Schema discipline: Every payload format needs explicit units, field order, and versioning so decoders survive future changes.
  • Readable vs compact: JSON is easiest to inspect, CBOR balances efficiency and interoperability, and custom binary is smallest but most rigid.
  • Header dominance: For tiny sensor readings, protocol headers can outweigh the payload several times over.
  • Gateway translation: Constrained links may use binary or CBOR on the device side, then re-encode to JSON on Ethernet or cloud links.
  • Battery math: Measure the full radio transaction, including wake-up, acknowledgements, and receive windows, not only payload bytes.

How It Works: Digital Processing

The big picture: Firmware converts raw ADC counts into calibrated engineering units and applies intelligence to reduce data transmission.

Step-by-step breakdown:

  1. Calibration: ADC value 3174 → Temperature = (3174/4095) × 100°C = 77.5°C. Real example: a raw reading spanning 0-4095 (12-bit ADC) maps to a 0-100°C range.
  2. Offset correction: 77.5°C − 51.9°C = 25.6°C. Real example: the sensor has a known 51.9°C offset at room temperature from factory calibration.
  3. Change detection: compare 25.6°C to the previous reading 25.4°C; the 0.2°C difference is below the 1°C threshold, so nothing is transmitted. Real example: transmitting only when the change exceeds 1°C saves ~99% of transmissions for slowly varying temperature.

Why this matters: Edge processing reduces network traffic by 90-99% for slowly-changing sensors, extending battery life from months to years. Processing locally also enables real-time alerts without cloud round-trip latency.

20.3.1 Firmware Processing

Firmware processes raw ADC values:

Calibration:

Raw ADC: 3174 (12-bit ADC: 0-4095 range)
Temperature = (3174 / 4095) × 100°C = 77.5°C
Apply calibration offset: 77.5 - 51.9 = 25.6°C
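The calibration chain above can be sketched as a short function. This is a minimal illustration (not production firmware), assuming the chapter's 12-bit ADC, 0-100°C span, and 51.9°C factory offset; the function and parameter names are hypothetical:

```python
def calibrate(raw_adc, adc_max=4095, full_scale_c=100.0, offset_c=51.9):
    """Convert a raw ADC count to calibrated degrees Celsius.

    Scale the count onto the sensor's 0-100 degC span, then subtract
    the factory-calibrated offset (values from the chapter's example).
    """
    scaled = (raw_adc / adc_max) * full_scale_c   # 3174 -> 77.5 degC
    return scaled - offset_c                      # 77.5 - 51.9 -> 25.6 degC

print(round(calibrate(3174), 1))  # 25.6
```

Keeping the scale and offset as parameters (rather than hard-coding them) makes it easy to load per-device calibration constants from flash at boot.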

Filtering:

  • Moving average: Smooth noisy readings by averaging the last N samples (e.g., N=8), reducing random noise by a factor of \(\sqrt{N}\)
  • Median filter: Remove spike outliers by selecting the middle value from a sorted window of samples – unlike averaging, a single extreme reading cannot distort the output
  • Kalman filter: Optimal linear state estimation that combines a prediction model with noisy measurements, weighting each by their uncertainty
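The first two filters can be sketched in a few lines. This is a simplified illustration assuming batch processing over a fixed window, not streaming firmware:

```python
from collections import deque
from statistics import median

class MovingAverage:
    """Average the last N samples; random noise shrinks by roughly sqrt(N)."""
    def __init__(self, n=8):
        self.window = deque(maxlen=n)

    def update(self, sample):
        self.window.append(sample)
        return sum(self.window) / len(self.window)

def median_filter(samples, width=5):
    """Slide a window over samples; a single spike cannot reach the output."""
    out = []
    for i in range(len(samples)):
        lo = max(0, i - width + 1)
        out.append(median(samples[lo:i + 1]))
    return out

readings = [25.4, 25.5, 99.9, 25.6, 25.5]   # 99.9 is a spike outlier
print(median_filter(readings))              # the spike never appears
```

Note the contrast: a moving average would drag the 99.9 spike into several outputs, while the median simply discards it.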

Edge Processing:

  • Threshold detection: Alert if temp > 30°C
  • Change detection: Only transmit if changed > 1°C
  • Compression: Send delta from previous value

Output: Meaningful sensor value (25.6°C, not 3174)
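The edge-processing rules above combine into a simple transmit gate. A hypothetical sketch using the chapter's 1°C change threshold and 30°C alert level:

```python
def should_transmit(value, last_sent, change_threshold=1.0, alert_above=30.0):
    """Transmit only on a meaningful change or an alert condition.

    last_sent is None before the first transmission; thresholds mirror
    the chapter's 1 degC change / 30 degC alert examples.
    """
    if last_sent is None or value > alert_above:
        return True
    return abs(value - last_sent) > change_threshold

sent = None
transmitted = []
for reading in [25.6, 25.7, 25.8, 27.1, 27.3, 31.0]:
    if should_transmit(reading, sent):
        transmitted.append(reading)
        sent = reading
print(transmitted)  # [25.6, 27.1, 31.0] -- 3 of 6 readings sent
```

Comparing against the last *transmitted* value (not the last sample) prevents a slow drift from slipping through in sub-threshold steps.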


20.3.2 Real-World Example: Vibration Analysis

From our industrial vibration sensor example:

4. Processing:

  • Firmware collects 1024 samples (1.024s window)
  • Fast Fourier Transform (FFT) reveals frequency spectrum
  • Peak detected at 127Hz with amplitude 0.8g
  • Edge AI algorithm recognizes: “bearing defect signature”
  • Decision: Transmit alert (not routine data)

Key Insight: Edge processing (FFT analysis) reduced transmission from 288 packets/day (routine data every 5 min) to 1 packet when anomaly detected, extending battery life from 6 months to 8 years.

Battery Life Calculation with Change Detection: Soil moisture sensor (2000 mAh battery):

Scenario A - Transmit Every Reading (no change detection):

  • Sampling: 1/hour × 24 = 24 readings/day
  • LoRaWAN TX cycle energy (wake + TX + RX1 + RX2): 45 mA × 1.5 s = 67.5 mAs = 0.019 mAh per cycle
  • Daily TX energy: \(E_{tx} = 24 \times 0.019 = 0.456\) mAh/day
  • Sleep current: \(10\text{ µA} \times 24\text{ h} = 0.24\) mAh/day
  • Battery life: \(\frac{2000\text{ mAh}}{0.696\text{ mAh/day}} = 2,874\) days = ~7.9 years

Scenario B - No Change Detection, Higher Frequency (every 5 minutes):

  • Transmissions per day: \(\frac{24 \times 60}{5} = 288\)
  • Daily TX energy: \(288 \times 0.019 = 5.47\) mAh/day
  • Sleep + TX: \(5.47 + 0.24 = 5.71\) mAh/day
  • Battery life: \(\frac{2000}{5.71} = 350\) days = ~11.5 months

Scenario C - Change Detection on High-Frequency Sampling (sample every 5 min, transmit if change > 2%):

  • Soil moisture changes slowly: ~5% change per week
  • Transmissions per week: \(\frac{5\%}{2\%} = 2.5 \approx 3\) transmissions
  • Weekly TX energy: \(3 \times 0.019 = 0.057\) mAh/week
  • Weekly sleep: \(10\text{ µA} \times 168\text{ h} = 1.68\) mAh/week
  • Battery life: \(\frac{2000}{1.737\text{ mAh/week}} = 1,151\) weeks = ~22 years (limited in practice by battery self-discharge to ~10 years)

Change detection extends effective battery life from 11.5 months (Scenario B) to 10+ years (Scenario C) at the same sampling rate – a ~10x improvement.
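The scenario arithmetic above can be reproduced with a small calculator. This sketch uses the chapter's figures (0.019 mAh per LoRaWAN TX cycle, 10 µA sleep current) and ignores self-discharge:

```python
def battery_life_days(battery_mah, tx_per_day, mah_per_tx=0.019, sleep_ua=10.0):
    """Battery life from daily transmit energy plus the sleep-current floor.

    mah_per_tx covers a full radio transaction (wake + TX + RX windows),
    per the chapter's 45 mA x 1.5 s LoRaWAN cycle.
    """
    sleep_mah_per_day = sleep_ua / 1000.0 * 24.0          # 0.24 mAh/day
    daily_mah = tx_per_day * mah_per_tx + sleep_mah_per_day
    return battery_mah / daily_mah

print(round(battery_life_days(2000, 24)))    # Scenario A: 2874 days
print(round(battery_life_days(2000, 288)))   # Scenario B: 350 days
```

Doubling `tx_per_day` roughly halves battery life once transmit energy dominates the sleep floor, which is why change detection pays off so dramatically.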



20.4 Stage 5: Data Formatting

⏱️ ~10 min | ⭐⭐ Intermediate | 📋 P02.C04.U06

20.4.1 Encoding Strategies

Convert processed value to network-ready format:

JSON (Human-readable):

{"temp": 25.6, "unit": "C", "time": 1702834567}
  • Size: 47 bytes
  • Easy to debug
  • High overhead

CBOR (Binary-efficient):

A3 64 74 65 6D 70 F9 4E 66 ...  (map of 3 entries; "temp" key, then 25.6 as a half-precision float)
  • Size: 26 bytes
  • Compact
  • Harder to debug

Custom Binary:

0x01 0x00  (temperature × 10 as a big-endian 16-bit integer: 25.6°C → 256)
  • Size: 2 bytes
  • Maximum efficiency
  • Requires documentation

Trade-off: Readability vs. bandwidth
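The size gap can be measured with the standard library alone. This sketch compares the JSON and custom-binary encodings above (CBOR is omitted because it typically requires a third-party library such as cbor2):

```python
import json
import struct

reading = {"temp": 25.6, "unit": "C", "time": 1702834567}

# JSON: human-readable, largest on the wire
json_payload = json.dumps(reading).encode()
print(len(json_payload))  # 47 bytes

# Custom binary: temperature x 10 as a big-endian 16-bit signed integer
binary_payload = struct.pack(">h", round(25.6 * 10))
print(len(binary_payload), binary_payload.hex())  # 2 0100
```

Decoding the binary form is one line (`struct.unpack(">h", payload)[0] / 10`), but everything the JSON spells out – field name, unit, timestamp – must now live in external documentation.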

A chapter-specific comparison shows one processed temperature reading encoded as JSON, CBOR, and custom binary. Each option shows example payload, bytes on the wire, key strengths, and where it fits best.
Figure 20.1

20.4.2 Format Selection Guide

  • JSON: Best for prototyping, debugging, REST APIs, and any link where humans regularly inspect payloads. Avoid it on battery-powered, high-frequency, or severely bandwidth-limited links.
  • CBOR: Best for production IoT payloads that still need self-description and schema evolution. Avoid it only when every byte is constrained or the toolchain is locked to older parsers.
  • Custom binary: Best for LoRaWAN, NB-IoT, or device-to-gateway links where byte budget is tight and the message schema is stable. Avoid it when teams need ad hoc debugging or the payload structure changes often.

A smart building has 100 temperature sensors reporting to a central gateway every minute. The gateway aggregates data and sends to the cloud.

Three format options being considered:

  • JSON: 45-byte sensor payload, 4.5 KB per minute for 100 sensors, about 6.5 MB per day.
  • CBOR: 18-byte sensor payload, 1.8 KB per minute, about 2.6 MB per day.
  • Binary: 4-byte sensor payload, 400 bytes per minute, about 576 KB per day.

Question: Which format would you recommend and why?

Recommendation: CBOR (with Binary as alternative for extreme constraints)

Analysis by factor:

1. Bandwidth/Cost:

  • Binary is smallest (576 KB/day) but CBOR (2.6 MB/day) is acceptable for Wi-Fi
  • If using cellular IoT (paid per MB), Binary saves ~$2/month vs CBOR

2. Development Time:

  • JSON: Fastest development, easy debugging
  • CBOR: Moderate (libraries available for most platforms)
  • Binary: Longest development, custom parsing code needed

3. Interoperability:

  • JSON: Universal, any system can parse
  • CBOR: Standardized (RFC 8949), growing IoT adoption
  • Binary: Proprietary, requires documentation

4. Debuggability:

  • JSON: Human-readable, easy to troubleshoot
  • CBOR: Can convert to JSON for debugging
  • Binary: Requires hex dump analysis, error-prone

5. Schema Evolution:

  • JSON/CBOR: Easy to add new fields without breaking parsers
  • Binary: Version byte needed, all endpoints must update together

Key insight: Format choice isn’t just about bytes – consider the full development lifecycle including debugging, maintenance, and future requirements.


20.5 Stage 6: Packet Assembly

⏱️ ~8 min | ⭐⭐⭐ Advanced | 📋 P02.C04.U07

20.5.1 Protocol Encapsulation

Wrap formatted data in protocol headers:

Example: MQTT Packet

[ MQTT fixed header (2 bytes) ]
[ Topic "sensors/temp1" + 2-byte length prefix (15 bytes) ]
[ Payload: {"temp":25.6} (13 bytes) ]
[ Total: 30 bytes ]

Protocol layers stack:

Application: MQTT (30 bytes)
  Transport: TCP header (20 bytes)
    Network: IP header (20 bytes)
      Link: Wi-Fi frame (30 bytes)
Total: 100 bytes to send 13 bytes of data!

Output: Complete network packet ready for transmission

A packet-budget graphic compares four protocol stacks for the same 13-byte temperature reading and shows how much of each transmitted packet is payload versus overhead.
Figure 20.2

20.5.2 Protocol Overhead Comparison

  • MQTT/TCP/IP/Wi-Fi: 13-byte payload plus 87 bytes of headers, 100 bytes total, about 13% efficient.
  • CoAP/UDP/IP/Wi-Fi: 13-byte payload plus 58 bytes of headers, 71 bytes total, about 18% efficient.
  • LoRaWAN: 13-byte payload plus 13 bytes of headers, 26 bytes total, about 50% efficient.
  • BLE GATT: 13-byte payload plus 7 bytes of headers, 20 bytes total, about 65% efficient.

Key Insight: For small payloads, protocol overhead dominates. This is why IoT-specific protocols (CoAP, LoRaWAN) were designed with minimal headers.
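The pattern behind this insight – a fixed header cost against a variable payload – is easy to verify. A sketch using LoRaWAN's 13-byte overhead as the example:

```python
def efficiency(payload_bytes, header_bytes):
    """Fraction of transmitted bytes that is actual payload."""
    return payload_bytes / (payload_bytes + header_bytes)

# Same 13 bytes of LoRaWAN overhead, three payload sizes
for payload in (2, 13, 100):
    print(f"{payload:3d}-byte payload: {efficiency(payload, 13):.0%} efficient")
```

A 2-byte binary reading is only ~13% efficient on this link, while a 100-byte payload is ~88% efficient – shrinking the payload below the header size yields rapidly diminishing returns.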

Protocol Overhead Impact on Battery Life: 10-byte sensor payload, 1 transmission/minute:

MQTT over TCP/IP/Wi-Fi:

  • Payload: 10 bytes, Total on-wire: 102 bytes (with all headers)
  • Wi-Fi TX cycle (wake + associate + TX + ACK): ~300 ms active at 150 mA
  • Energy per cycle: \(150\text{ mA} \times 0.3\text{ s} = 45\text{ mAs} = 0.0125\) mAh
  • Daily energy (1440 TX/day): \(1440 \times 0.0125 = 18\) mAh/day
  • Add sleep current (\(10\text{ µA} \times 24\text{ h} = 0.24\) mAh/day)
  • Battery life: \(\frac{2000}{18.24} = 110\) days = ~3.6 months

LoRaWAN (SF7):

  • Payload: 10 bytes, Total on-wire: 23 bytes (with LoRaWAN header)
  • Airtime @ SF7 (5.47 kbps): \(T_{air} = \frac{23 \times 8}{5470} = 33.6\) ms
  • Full TX cycle (TX + RX1 + RX2): ~1.5 s active at 45 mA = \(67.5\text{ mAs} = 0.019\) mAh
  • LoRaWAN's 1% duty-cycle limit allows one 33.6 ms transmission roughly every 3.4 s, so 1 TX/min is comfortably within the SF7 budget
  • Daily energy: \(1440 \times 0.019 = 27.4\) mAh/day
  • Battery life: \(\frac{2000}{27.6} = 72\) days = ~2.4 months (at 1 TX/min)
  • At 1 TX/hour: \((24 \times 0.019) + 0.24 = 0.70\) mAh/day → ~7.8 years

Key insight: Wi-Fi energy is dominated by the radio association overhead (~300 ms), not data transmission time (~0.8 ms). LoRaWAN energy is dominated by long RX windows. Both protocols benefit from reducing transmission frequency rather than payload size.
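The airtime and energy arithmetic above can be reproduced directly. This is a simplified model that ignores LoRa preamble and coding-rate details and uses the chapter's 5.47 kbps SF7 figure:

```python
def lora_airtime_ms(total_bytes, bitrate_bps=5470):
    """Simplified on-air time: bytes * 8 / bitrate (preamble ignored)."""
    return total_bytes * 8 / bitrate_bps * 1000

def tx_cycle_mah(active_s=1.5, active_ma=45.0):
    """Energy for a full TX + RX1 + RX2 cycle, converted to mAh."""
    return active_ma * active_s / 3600

print(round(lora_airtime_ms(23), 1))   # 33.6 ms for 23 bytes at SF7
print(round(tx_cycle_mah(), 3))        # 0.019 mAh per cycle
```

Note how little the 33.6 ms of actual airtime contributes to the 1.5 s cycle: the receive windows, not the payload, set the energy budget.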


20.5.3 Real-World Packaging Example

From the industrial vibration sensor:

5. Packaging:

  • CBOR format: {id:M7, freq:127, amp:0.8, alert:1, time:1702834567}
  • Compressed to 12 bytes (vs 65 bytes JSON)

6. Transmission:

  • LoRaWAN uplink: 12-byte payload + 13-byte header + 26-byte PHY = 51 bytes total
  • Spreading Factor 7, 125kHz bandwidth
  • Airtime: 41ms at 20mW = 0.82 mJ energy

Complete Pipeline Performance:

  • Total latency: 2.3 seconds (sensor to cloud)
  • Energy consumption: 45mJ per reading (1024 samples + FFT + transmit)
  • Data reduction: 1024 raw samples (2 KB at 16 bits/sample) → 51 bytes transmitted = 97.5% reduction

20.5.4 Worked Example: Norsk Hydro Aluminium Smelter Data Format Selection

Scenario: Norsk Hydro operates an aluminium smelter in Sunndal, Norway with 300 electrolysis pots, each monitored by 8 sensors (pot voltage, cathode current, bath temperature, alumina level, noise amplitude, anode effect flag, crust break status, feed rate). Data is transmitted via LoRaWAN from pot controllers to the plant gateway.

Given:

  • 300 pots x 8 sensors = 2,400 sensor values per transmission cycle
  • Transmission interval: every 60 seconds
  • LoRaWAN maximum payload: 51 bytes at SF12 (worst-case indoor range)
  • Each sensor value: 4 bytes (float32) = 32 bytes per pot (8 sensors x 4 bytes)
  • Gateway has Ethernet backhaul to cloud (no bandwidth constraint)

Step 1: Calculate raw data per pot per cycle

  • JSON: {"v":4.23,"i":182.5,...} is about 156 bytes per pot, roughly 3x over the LoRaWAN limit, so it does not fit.
  • CBOR: A binary-encoded map is about 52 bytes per pot, which still misses the 51-byte limit.
  • Custom binary: 6 x uint16 + 1 x uint8 + 2-byte pot ID is 15 bytes per pot, fits comfortably, and allows 3 pots per packet.

Step 2: Design custom binary encoding

  • Pot ID: 2-byte uint16, range 1-300, resolution of 1 pot.
  • Pot voltage: 2-byte uint16, encode (V - 3.0) x 10,000, resolution 0.1 mV.
  • Cathode current: 2-byte uint16, encode kA x 100, resolution 10 A.
  • Bath temperature: 2-byte uint16, encode (T - 900) x 100, resolution 0.01 C.
  • Alumina level: 2-byte uint16, encode % x 1000, resolution 0.001%.
  • Noise amplitude: 2-byte uint16, encode mV x 100, resolution 0.01 mV.
  • Feed rate: 2-byte uint16, encode grams per minute, resolution 1 g/min.
  • Status flags: 1-byte uint8, with bit 0 for anode effect, bit 1 for crust break, and 6 reserved bits.
  • Total: 15 bytes per pot, which is compact enough to pack 3 pots inside a 51-byte LoRaWAN frame.
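The 15-byte layout above maps naturally onto a struct format string. A hypothetical sketch of the encoder – the field scalings follow the list above, but the example sensor values are invented for illustration:

```python
import struct

POT_FMT = ">7HB"   # 7 x big-endian uint16 + 1 x uint8 = 15 bytes

def encode_pot(pot_id, volts, current_ka, bath_c, alumina_pct,
               noise_mv, feed_g_min, anode_effect=False, crust_break=False):
    """Pack one pot record into the 15-byte scaled-integer layout."""
    flags = (anode_effect << 0) | (crust_break << 1)
    return struct.pack(
        POT_FMT,
        pot_id,
        round((volts - 3.0) * 10_000),   # 0.1 mV resolution
        round(current_ka * 100),         # 10 A resolution
        round((bath_c - 900.0) * 100),   # 0.01 C resolution
        round(alumina_pct * 1000),       # 0.001 % resolution
        round(noise_mv * 100),           # 0.01 mV resolution
        round(feed_g_min),               # 1 g/min resolution
        flags,
    )

record = encode_pot(17, 4.23, 182.5, 962.5, 2.5, 1.25, 180, anode_effect=True)
print(len(record))  # 15 -> three pots fit in one 51-byte LoRaWAN frame
```

The decoder is the mirror image: `struct.unpack(">7HB", record)` followed by the inverse scalings, which keeps both ends trivial on a microcontroller.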

Step 3: Calculate protocol efficiency

  • JSON over MQTT/TCP/Wi-Fi: 156-byte application payload, 88 bytes of transport and link headers, 244 bytes total on the wire, 1 pot per packet, 300 packets per cycle.
  • Custom binary over LoRaWAN: 15-byte application payload, 13 bytes of LoRaWAN MAC overhead, 28 bytes total on the wire, 3 pots per packet, 100 packets per cycle.
  • Efficiency comparison: JSON over Wi-Fi looks numerically efficient at 64% because the payload is large, but it still cannot fit the constrained LoRaWAN link. The compact binary payload is 54% efficient and actually deployable.
  • Cycle timing: 100 LoRaWAN packets at 1.8 seconds each consume about 180 seconds of airtime inside a 300-second window.

Result: Custom binary encoding fits 3 pots per LoRaWAN packet, reducing the required 300 transmissions to 100 per cycle. The 15-byte per-pot payload preserves measurement resolution that exceeds sensor accuracy (0.1 mV voltage resolution vs 5 mV sensor accuracy). JSON is used only at the gateway-to-cloud stage where Ethernet bandwidth is unconstrained and debuggability matters.

Key Insight: Format selection should be different at each pipeline stage. Use the most compact format (custom binary) on the constrained link (LoRaWAN), then re-encode to the most debuggable format (JSON) on the unconstrained link (Ethernet). The gateway is the natural translation point, and designing the binary encoding around uint16 scaled integers keeps both encoding and decoding trivial on microcontrollers.

Concept Relationships: Pipeline Stages 4-6
  • Edge processing -> transmission volume: Change detection, thresholding, and aggregation cut whole packets before the radio ever turns on.
  • Format choice -> schema evolution: JSON and CBOR handle field additions more gracefully than tightly packed binary payloads.
  • Protocol headers -> small-payload efficiency: Once the reading is only a few bytes, header overhead often dominates the wire cost.
  • Gateway translation -> link constraints: A gateway can terminate a compact device-side format and re-encode into a debuggable cloud-side format.

Cross-module connection: Data Formats for IoT provides comprehensive comparison of encoding strategies.

Common Pitfalls

If one device sends raw ADC counts, another sends calibrated Celsius, and a third quietly switches to Fahrenheit, the payload may decode perfectly while the system makes the wrong decisions. Define the units and conversion rules at Stage 4 before the message format is treated as stable.

JSON is pleasant on a laptop and expensive on a coin cell. Put human-readable formats where humans actually inspect them, usually at the gateway or cloud boundary, and use tighter encodings on low-power radio hops when the schema is stable enough.

Shrinking a payload from 40 bytes to 12 bytes helps, but suppressing 95% of routine transmissions helps far more. Measure whether your system is dominated by packet count, radio wake-up time, or payload size before spending effort on binary packing.

20.6 Summary

Stages 4-6 transform digital values into network-ready packets:

  1. Digital Processing (Stage 4): Apply intelligence to raw values
    • Calibration converts ADC values to meaningful units
    • Filtering removes noise and outliers
    • Edge processing reduces transmission volume
  2. Data Formatting (Stage 5): Encode for transmission
    • JSON: Human-readable, high overhead
    • CBOR: Compact, self-describing, moderate overhead
    • Binary: Maximum efficiency, requires documentation
  3. Packet Assembly (Stage 6): Wrap in protocol headers
    • Each protocol layer adds overhead
    • IoT protocols minimize header size
    • Small payloads suffer most from overhead

Key Takeaway

Edge processing (Stage 4) provides the biggest efficiency gains in the pipeline – reducing transmissions through local intelligence saves far more energy than optimizing payload format. For data formatting (Stage 5), choose JSON for debugging and human readability, CBOR for a balance of compactness and self-description, or raw binary for maximum efficiency. Protocol overhead (Stage 6) can exceed payload size for small sensor readings, so optimize in order: reduce transmissions first, then reduce payload size.

20.7 See Also

Max the Microcontroller was the chef today, cooking up data in three stages!

“First, I process the raw number,” Max explained, converting ADC value 3174 into a temperature. “Calibration says 3174 equals 25.6 degrees Celsius – that’s Stage 4, like cooking raw ingredients into a meal!”

Lila the LED was in charge of Stage 5: “Now I plate the meal! I can write it fancy like {"temp": 25.6} – that’s JSON, easy to read but takes lots of space. Or I can pack it tight in binary – just 2 bytes instead of 17!”

“And I wrap it for delivery!” said Sammy the Sensor, adding protocol headers. “The address label, tracking number, and packaging sometimes weigh more than the food inside! A 2-byte temperature reading needs 40+ bytes of headers.”

Bella the Battery gasped. “That’s like using a giant shipping box for a single cookie! We need to be smart about packaging to save my energy.”

The lesson: Processing, formatting, and packaging transform raw numbers into network-ready data – and smart choices at each stage save power and bandwidth!


20.8 What’s Next