59  Pipeline Processing and Formatting

59.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Process Digital Sensor Data: Apply calibration, filtering, and edge intelligence
  • Choose Data Formats: Select appropriate encoding (JSON, CBOR, binary) for your use case
  • Assemble Network Packets: Understand protocol encapsulation and overhead
  • Calculate Overhead: Quantify the cost of different format and protocol choices

This Series:
  • Sensor Pipeline Index: Series overview
  • Pipeline Overview and Signal Acquisition: Stages 1-3
  • Transmission and Optimization: Stage 7 and complete journey

Fundamentals:
  • Data Formats for IoT: JSON, CBOR, and binary formats deep dive
  • Packet Structure and Framing: Protocol encapsulation details

Architecture:
  • Edge Computing: Data processing at the edge

59.2 Prerequisites

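Before diving into this chapter, you should be familiar with Stages 1-3 of the sensor pipeline (signal acquisition through analog-to-digital conversion), covered in Pipeline Overview and Signal Acquisition.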

After the ADC converts your sensor reading to a digital number (like 3174), that raw value isn’t ready for the network yet. It needs three more transformations:

  1. Processing (Stage 4): Convert 3174 → 25.6°C using calibration
  2. Formatting (Stage 5): Encode as {"temp": 25.6} or binary
  3. Packet Assembly (Stage 6): Wrap with addresses and headers

Cooking Analogy: You have prepared your ingredients (stages 1-3). Now you need to:
  • Cook them into a dish (processing)
  • Plate and garnish (formatting)
  • Package for delivery (packet assembly)

This chapter covers these three middle stages of the pipeline.

Tip (MVU): Data Transformation Efficiency

Core Concept: Stages 4-6 determine how efficiently your sensor data travels the network. Processing reduces data volume through edge intelligence. Formatting trades readability for compactness. Packet assembly adds unavoidable protocol overhead.

Why It Matters: A typical IoT deployment transmits the same data thousands of times per day. Choosing JSON over binary might seem trivial for one reading, but 1,000 sensors × 24 hours × 60 readings/hour = 1.44 million transmissions daily. At roughly 50 bytes per JSON reading versus 2 bytes in binary, that is the difference between 72 MB and 2.9 MB of data transfer per day.

Key Takeaway: Optimize for the common case. If 95% of readings are “no change,” implement change-based transmission in Stage 4 to eliminate 95% of network traffic before worrying about Stage 5 format efficiency.
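The arithmetic behind these figures can be sketched in a few lines. This is a rough model, not a measurement: the 50-byte and 2-byte payload sizes are assumptions implied by the totals above, and `daily_megabytes` is a hypothetical helper name.

```python
SENSORS = 1_000
READINGS_PER_HOUR = 60
HOURS_PER_DAY = 24

def daily_megabytes(bytes_per_reading: int, sent_fraction: float = 1.0) -> float:
    """Payload volume per day in MB (1 MB = 1,000,000 bytes).

    sent_fraction is the share of readings actually transmitted,
    e.g. 0.05 when 95% of readings are suppressed as "no change".
    """
    transmissions = SENSORS * HOURS_PER_DAY * READINGS_PER_HOUR * sent_fraction
    return transmissions * bytes_per_reading / 1_000_000

print(f"JSON, every reading:    {daily_megabytes(50):.2f} MB/day")
print(f"Binary, every reading:  {daily_megabytes(2):.2f} MB/day")
print(f"JSON, changes only:     {daily_megabytes(50, 0.05):.2f} MB/day")
```

Under these assumptions, sending only the 5% of readings that changed brings verbose JSON close to the volume of binary sent on every reading, which is why reducing transmissions comes first.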


59.3 Stage 4: Digital Processing

⏱️ ~9 min | ⭐⭐ Intermediate | 📋 P02.C04.U05

59.3.1 Firmware Processing

Firmware processes raw ADC values:

Calibration:

Raw ADC: 3174
Temperature = (3174 / 4096) × 100°C = 77.5°C
Apply calibration offset: 77.5 - 51.9 = 25.6°C
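The same calibration can be written as a small function. This is a minimal sketch that assumes the 12-bit range, 100°C span, and 51.9°C offset from the worked example; the constant and function names are illustrative, and a real sensor would take these values from its datasheet or a calibration run.

```python
ADC_COUNTS = 4096      # 12-bit ADC, as in the worked example
FULL_SCALE_C = 100.0   # span used in the worked example
OFFSET_C = 51.9        # calibration offset from the worked example

def adc_to_celsius(raw: int) -> float:
    """Convert a raw ADC count to degrees Celsius, rounded to 0.1 °C."""
    if not 0 <= raw < ADC_COUNTS:
        raise ValueError(f"raw ADC value {raw} is outside 0..{ADC_COUNTS - 1}")
    uncalibrated = raw / ADC_COUNTS * FULL_SCALE_C
    return round(uncalibrated - OFFSET_C, 1)

print(adc_to_celsius(3174))  # 25.6
```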

Filtering:
  • Moving average: Smooth noisy readings
  • Median filter: Remove outliers
  • Kalman filter: Optimal estimation
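To see how the first two filters differ, the sketch below runs both over the same sliding window. The window size of 5, the sample values, and the `SlidingFilter` class are assumptions chosen for illustration; a Kalman filter is not shown because it needs a model of the sensor and its noise.

```python
from collections import deque
from statistics import median

class SlidingFilter:
    """Keeps the most recent samples and reports their mean and median."""

    def __init__(self, size: int = 5):
        self.window = deque(maxlen=size)

    def update(self, sample: float) -> tuple[float, float]:
        self.window.append(sample)
        mean = sum(self.window) / len(self.window)
        return mean, median(self.window)

# One spurious spike (40.0) among otherwise steady readings.
readings = [25.5, 25.6, 25.7, 40.0, 25.6]
filt = SlidingFilter(size=5)
for value in readings:
    mean, med = filt.update(value)

print(f"moving average: {mean:.2f}  median: {med:.2f}")
```

With the spike in the window, the moving average is pulled up by several degrees while the median stays at a typical reading, which is the sense in which a median filter removes outliers.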

Edge Processing:
  • Threshold detection: Alert if temp > 30°C
  • Change detection: Only transmit if changed > 1°C
  • Compression: Send delta from previous value
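Threshold and change detection amount to a small decision function run on every reading. The sketch below uses the 30°C and 1°C limits from the list; the `ChangeReporter` class and its return values are hypothetical, and delta compression is left out for brevity.

```python
ALERT_THRESHOLD_C = 30.0   # alert above this temperature
MIN_CHANGE_C = 1.0         # otherwise send only if changed by more than this

class ChangeReporter:
    """Decides, per reading, whether anything should be transmitted."""

    def __init__(self):
        self.last_sent = None

    def decide(self, temp_c: float):
        """Return "alert", "update", or None (nothing to send)."""
        if temp_c > ALERT_THRESHOLD_C:
            self.last_sent = temp_c
            return "alert"
        if self.last_sent is None or abs(temp_c - self.last_sent) > MIN_CHANGE_C:
            self.last_sent = temp_c
            return "update"
        return None

reporter = ChangeReporter()
decisions = [reporter.decide(t) for t in [25.6, 25.9, 26.8, 26.9, 31.2]]
print(decisions)  # ['update', None, 'update', None, 'alert']
```

Comparing against the last transmitted value, rather than the previous sample, keeps a slow drift from going unreported.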

Output: Meaningful sensor value (25.6°C, not 3174)

59.3.2 Real-World Example: Vibration Analysis

From our industrial vibration sensor example:

4. Processing:
  • Firmware collects 1024 samples (1.024s window)
  • Fast Fourier Transform (FFT) reveals frequency spectrum
  • Peak detected at 127Hz with amplitude 0.8g
  • Edge AI algorithm recognizes: “bearing defect signature”
  • Decision: Transmit alert (not routine data)

Key Insight: Edge processing (FFT analysis) reduced transmission from 288 packets/day (routine data every 5 min) to a single packet sent only when an anomaly is detected, extending battery life from 6 months to 8 years.
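The spectral step can be illustrated with a synthetic signal. This sketch substitutes a naive O(N²) discrete Fourier transform for the FFT so that it runs with the standard library alone; firmware would normally call an optimized FFT routine. The test signal (0.8 g at 127 Hz plus a weaker 50 Hz component) and the function name are assumptions, and the defect classification step is not reproduced.

```python
import cmath
import math

SAMPLE_RATE_HZ = 1000   # 1024 samples in a 1.024 s window
N = 1024

def dominant_frequency(samples, sample_rate_hz):
    """Return (frequency_hz, amplitude) of the strongest non-DC component."""
    n = len(samples)
    # Precompute the n complex roots of unity once and index into them.
    twiddle = [cmath.exp(-2j * math.pi * i / n) for i in range(n)]
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):          # skip DC, stop below Nyquist
        acc = sum(x * twiddle[(k * i) % n] for i, x in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    # Bin index -> Hz; scale magnitude back to a sine amplitude.
    return best_bin * sample_rate_hz / n, 2 * best_mag / n

signal = [0.8 * math.sin(2 * math.pi * 127 * i / SAMPLE_RATE_HZ)
          + 0.1 * math.sin(2 * math.pi * 50 * i / SAMPLE_RATE_HZ)
          for i in range(N)]

peak_hz, peak_g = dominant_frequency(signal, SAMPLE_RATE_HZ)
print(f"peak near {peak_hz:.1f} Hz, about {peak_g:.2f} g")
```

The frequency resolution is the sample rate divided by the window length (just under 1 Hz here), so the reported peak lands on the bin nearest 127 Hz rather than on 127 Hz exactly.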


59.4 Stage 5: Data Formatting

⏱️ ~10 min | ⭐⭐ Intermediate | 📋 P02.C04.U06

59.4.1 Encoding Strategies

Convert processed value to network-ready format:

JSON (Human-readable):

{"temp": 25.6, "unit": "C", "time": 1702834567}
  • Size: 47 bytes as written (42 with whitespace removed)
  • Easy to debug
  • High overhead

CBOR (Binary-efficient):

A3 64 74 65 6D 70 FA 41 CC CC CD ...
  • Size: 28 bytes (temperature stored as a single-precision float)
  • Compact
  • Harder to debug

Custom Binary:

0x01 0x00  (temperature × 10 = 256, as a big-endian 16-bit integer)
  • Size: 2 bytes
  • Maximum efficiency
  • Requires documentation

Trade-off: Readability vs. bandwidth

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1', 'clusterBkg': '#fff', 'clusterBorder': '#2C3E50', 'edgeLabelBackground':'#fff'}}}%%
%%{init: {'themeVariables': {'xyChart': {'backgroundColor': '#ffffff'}}}}%%
xychart-beta horizontal
    title "Data Format Size Comparison (bytes)"
    x-axis [JSON, CBOR, Binary]
    y-axis "Payload Size (bytes)" 0 --> 60
    bar [47, 28, 2]

Figure 59.1: Data format size comparison for the same temperature reading (25.6°C): JSON is human-readable but uses 47 bytes, CBOR is compact binary at 28 bytes, and custom binary achieves maximum efficiency at 2 bytes
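Payload sizes are easy to measure directly. The sketch below encodes the example reading three ways using only the Python standard library. The CBOR bytes are assembled by hand for this one map (a real project would use a CBOR library), and the single-precision float, the big-endian signed 16-bit integer, and the helper names are all assumptions made for illustration.

```python
import json
import struct

reading = {"temp": 25.6, "unit": "C", "time": 1702834567}

def cbor_text(s: str) -> bytes:
    """CBOR text string, short form (major type 3, length < 24)."""
    data = s.encode("utf-8")
    if len(data) >= 24:
        raise ValueError("short-form helper only handles strings under 24 bytes")
    return bytes([0x60 | len(data)]) + data

def cbor_uint32(n: int) -> bytes:
    """CBOR unsigned integer with a 4-byte argument (initial byte 0x1A)."""
    return b"\x1a" + struct.pack(">I", n)

def cbor_float32(x: float) -> bytes:
    """CBOR single-precision float (initial byte 0xFA)."""
    return b"\xfa" + struct.pack(">f", x)

json_bytes = json.dumps(reading).encode("utf-8")
compact_json = json.dumps(reading, separators=(",", ":")).encode("utf-8")

cbor_bytes = (
    bytes([0xA0 | len(reading)])            # map header: 3 key/value pairs
    + cbor_text("temp") + cbor_float32(reading["temp"])
    + cbor_text("unit") + cbor_text(reading["unit"])
    + cbor_text("time") + cbor_uint32(reading["time"])
)

# Temperature only, in tenths of a degree, big-endian signed 16-bit.
binary_bytes = struct.pack(">h", round(reading["temp"] * 10))

for name, blob in [("JSON", json_bytes), ("JSON (compact)", compact_json),
                   ("CBOR", cbor_bytes), ("Binary", binary_bytes)]:
    print(f"{name:15} {len(blob):3d} bytes  {blob.hex(' ')}")
```

With Python's default separators the JSON text is 47 bytes (42 with compact separators), the hand-built CBOR map is 28 bytes, and the packed integer is 2 bytes. Counts shift with whitespace, key length, and float width, so it is worth measuring the payload you actually send.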

59.4.2 Format Selection Guide

Format Best For Avoid When
JSON Prototyping, debugging, REST APIs Battery-powered, high-frequency
CBOR Production IoT, CoAP, LwM2M Extreme constraints, legacy systems
Binary LoRaWAN, NB-IoT, max efficiency Frequent schema changes, team debugging

59.5 Stage 6: Packet Assembly

⏱️ ~8 min | ⭐⭐⭐ Advanced | 📋 P02.C04.U07

59.5.1 Protocol Encapsulation

Wrap formatted data in protocol headers:

Example: MQTT Packet

[ MQTT Header (2 bytes) ]
[ Topic: "sensors/temp1" (16 bytes) ]
[ Payload: {"temp":25.6} (17 bytes) ]
[ Total: 35 bytes ]

Protocol layers stack:

Application: MQTT (35 bytes)
  Transport: TCP header (20 bytes)
    Network: IP header (20 bytes)
      Link: Wi-Fi frame (30 bytes)
Total: 105 bytes to send 17 bytes of data!

Output: Complete network packet ready for transmission
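Encapsulation is simply each layer prepending its own header to whatever the layer above handed down. The sketch below mimics that with zero-filled placeholders of the sizes quoted above; these are not real protocol headers, and `encapsulate` is a hypothetical helper.

```python
def encapsulate(inner: bytes, header_len: int) -> bytes:
    """Prepend a zero-filled placeholder header of header_len bytes."""
    return bytes(header_len) + inner

payload = bytes(17)                        # stands in for the 17-byte reading
mqtt_msg = encapsulate(payload, 2 + 16)    # MQTT header + topic
tcp_segment = encapsulate(mqtt_msg, 20)    # TCP header
ip_packet = encapsulate(tcp_segment, 20)   # IP header
wifi_frame = encapsulate(ip_packet, 30)    # Wi-Fi frame header

for name, unit in [("MQTT message", mqtt_msg), ("TCP segment", tcp_segment),
                   ("IP packet", ip_packet), ("Wi-Fi frame", wifi_frame)]:
    print(f"{name:13} {len(unit):3d} bytes")

overhead = len(wifi_frame) - len(payload)
print(f"overhead: {overhead} bytes for {len(payload)} bytes of data")
```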

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1', 'clusterBkg': '#fff', 'clusterBorder': '#2C3E50', 'edgeLabelBackground':'#fff'}}}%%
flowchart TB
    subgraph L4["Link Layer (Wi-Fi)"]
        A[Wi-Fi Frame Header<br/>30 bytes]
    end

    subgraph L3["Network Layer (IP)"]
        B[IP Header<br/>20 bytes]
    end

    subgraph L2["Transport Layer (TCP)"]
        C[TCP Header<br/>20 bytes]
    end

    subgraph L1["Application Layer (MQTT)"]
        D[MQTT Header<br/>2 bytes]
        E[Topic Name<br/>16 bytes]
        F["Payload<br/>17 bytes<br/>temp: 25.6"]
    end

    A --> B
    B --> C
    C --> D
    D --> E
    E --> F

    style A fill:#7F8C8D,stroke:#2C3E50,stroke-width:2px,color:#fff
    style B fill:#2C3E50,stroke:#16A085,stroke-width:2px,color:#fff
    style C fill:#16A085,stroke:#2C3E50,stroke-width:2px,color:#fff
    style D fill:#E67E22,stroke:#2C3E50,stroke-width:2px,color:#fff
    style E fill:#E67E22,stroke:#2C3E50,stroke-width:2px,color:#fff
    style F fill:#16A085,stroke:#2C3E50,stroke-width:3px,color:#fff

    style L4 fill:#f9f9f9,stroke:#2C3E50,stroke-width:2px
    style L3 fill:#f9f9f9,stroke:#2C3E50,stroke-width:2px
    style L2 fill:#f9f9f9,stroke:#2C3E50,stroke-width:2px
    style L1 fill:#f9f9f9,stroke:#2C3E50,stroke-width:2px

Figure 59.2: Protocol stack overhead showing layer-by-layer encapsulation: 17 bytes of actual temperature data requires 88 bytes of headers (Wi-Fi + IP + TCP + MQTT), resulting in 105 total bytes transmitted

59.5.2 Protocol Overhead Comparison

Protocol Stack Payload Headers Total Efficiency
MQTT/TCP/IP/Wi-Fi 17 bytes 88 bytes 105 bytes 16%
CoAP/UDP/IP/Wi-Fi 17 bytes 58 bytes 75 bytes 23%
LoRaWAN 17 bytes 13 bytes 30 bytes 57%
BLE GATT 17 bytes 7 bytes 24 bytes 71%

Key Insight: For small payloads, protocol overhead dominates. This is why IoT-specific protocols (CoAP, LoRaWAN) were designed with minimal headers.
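The efficiency column is payload divided by total bytes on the wire. A short sketch, using the header sizes from the table as given and a hypothetical `efficiency` helper, shows how strongly the result depends on payload size:

```python
HEADER_BYTES = {
    "MQTT/TCP/IP/Wi-Fi": 88,
    "CoAP/UDP/IP/Wi-Fi": 58,
    "LoRaWAN": 13,
    "BLE GATT": 7,
}

def efficiency(payload: int, headers: int) -> float:
    """Fraction of transmitted bytes that carry application data."""
    return payload / (payload + headers)

for payload in (2, 17, 100):
    print(f"payload = {payload} bytes")
    for stack, headers in HEADER_BYTES.items():
        print(f"  {stack:18} {efficiency(payload, headers):5.0%}")
```

For the 17-byte payload this reproduces the table; at 2 bytes every stack drops sharply, and at 100 bytes even the MQTT stack carries more data than headers.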

59.5.3 Real-World Packaging Example

From the industrial vibration sensor:

5. Packaging:
  • CBOR format: {id:M7, freq:127, amp:0.8, alert:1, time:1702834567}
  • Compressed to 12 bytes (vs 65 bytes JSON)

6. Transmission:
  • LoRaWAN uplink: 12-byte payload + 13-byte header + 26-byte PHY = 51 bytes total
  • Spreading Factor 7, 125kHz bandwidth
  • Airtime: 41ms at 20mW = 0.82 mJ energy

Complete Pipeline Performance:
  • Total latency: 2.3 seconds (sensor to cloud)
  • Energy consumption: 45mJ per reading (1024 samples + FFT + transmit)
  • Data reduction: 4096 raw samples (8KB) → 51 bytes transmitted = 99.4% reduction
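The transmission figures can be checked with simple arithmetic. The sketch below takes the byte counts, airtime, and power as stated and treats 8KB as 8,192 bytes, which is an assumption about how the raw buffer was sized.

```python
PAYLOAD_B, HEADER_B, PHY_B = 12, 13, 26   # LoRaWAN uplink parts, in bytes
AIRTIME_S = 0.041                         # 41 ms on air
TX_POWER_W = 0.020                        # 20 mW
RAW_BYTES = 8 * 1024                      # 8 KB of raw samples (assumed 8,192 bytes)

total_bytes = PAYLOAD_B + HEADER_B + PHY_B
tx_energy_mj = AIRTIME_S * TX_POWER_W * 1000   # joules -> millijoules
reduction = 1 - total_bytes / RAW_BYTES

print(f"bytes on air: {total_bytes}")
print(f"transmit energy: {tx_energy_mj:.2f} mJ")
print(f"data reduction: {reduction:.1%}")
```

Transmit energy here is under 1 mJ of the 45 mJ quoted per reading, so in this example sampling and the FFT account for most of the energy, not the radio.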


59.6 Knowledge Check: Format Selection

A smart building has 100 temperature sensors reporting to a central gateway every minute. The gateway aggregates the data and sends it to the cloud.

Three format options being considered:

Format Sensor Payload 100 Sensors/min Daily Total
JSON 45 bytes 4.5 KB 6.5 MB
CBOR 18 bytes 1.8 KB 2.6 MB
Binary 4 bytes 400 bytes 576 KB

Question: For this Wi-Fi-connected smart building, which format is the best default recommendation (balancing size, interoperability, and debuggability), and why?

Explanation: CBOR. It is substantially smaller than JSON while remaining self-describing and easier to evolve and debug than a custom binary format.

Recommendation: CBOR (with Binary as alternative for extreme constraints)

Analysis by factor:

  1. Bandwidth/Cost:
    • Binary is smallest (576 KB/day), but CBOR (2.6 MB/day) is acceptable for Wi-Fi
    • If using cellular IoT (paid per MB), Binary saves ~$2/month vs CBOR
  2. Development Time:
    • JSON: Fastest development, easy debugging
    • CBOR: Moderate (libraries available for most platforms)
    • Binary: Longest development, custom parsing code needed
  3. Interoperability:
    • JSON: Universal, any system can parse
    • CBOR: Standardized (RFC 8949), growing IoT adoption
    • Binary: Proprietary, requires documentation
  4. Debuggability:
    • JSON: Human-readable, easy to troubleshoot
    • CBOR: Can convert to JSON for debugging
    • Binary: Requires hex dump analysis, error-prone
  5. Schema Evolution:
    • JSON/CBOR: Easy to add new fields without breaking parsers
    • Binary: Version byte needed, all endpoints must update together

Key insight: Format choice isn’t just about bytes—consider the full development lifecycle including debugging, maintenance, and future requirements.
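The schema-evolution point is easiest to see in code. The sketch below shows one way a custom binary format might carry a version byte; both layouts, the added humidity field, and the `decode` function are hypothetical and not taken from any deployed format.

```python
import struct

# Hypothetical layouts, big-endian with no padding:
#   v1: version (uint8), temperature x10 (int16)
#   v2: version (uint8), temperature x10 (int16), humidity % (uint8)
V1 = struct.Struct(">Bh")
V2 = struct.Struct(">BhB")

def decode(frame: bytes) -> dict:
    """Decode a frame by dispatching on its leading version byte."""
    version = frame[0]
    if version == 1:
        _, temp_x10 = V1.unpack(frame)
        return {"temp": temp_x10 / 10}
    if version == 2:
        _, temp_x10, humidity = V2.unpack(frame)
        return {"temp": temp_x10 / 10, "humidity": humidity}
    raise ValueError(f"unsupported format version {version}")

print(decode(V1.pack(1, 256)))       # {'temp': 25.6}
print(decode(V2.pack(2, 256, 40)))   # {'temp': 25.6, 'humidity': 40}
```

A decoder that only knows version 1 has to reject version 2 frames outright, whereas a JSON or CBOR parser can simply ignore a key it does not recognize. That is the practical cost behind "all endpoints must update together".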


59.7 Summary

Stages 4-6 transform digital values into network-ready packets:

  1. Digital Processing (Stage 4): Apply intelligence to raw values
    • Calibration converts ADC values to meaningful units
    • Filtering removes noise and outliers
    • Edge processing reduces transmission volume
  2. Data Formatting (Stage 5): Encode for transmission
    • JSON: Human-readable, high overhead
    • CBOR: Compact, self-describing, moderate overhead
    • Binary: Maximum efficiency, requires documentation
  3. Packet Assembly (Stage 6): Wrap in protocol headers
    • Each protocol layer adds overhead
    • IoT protocols minimize header size
    • Small payloads suffer most from overhead

Key Takeaways:
  • Edge processing (Stage 4) often provides the biggest efficiency gains
  • Format choice (Stage 5) trades readability for compactness
  • Protocol overhead can exceed payload size for small data
  • Optimize stages in order: reduce transmissions first, then reduce payload size


59.8 What’s Next

In the next chapter, Transmission and Optimization, we’ll explore Stage 7 (network transmission), trace the complete journey from sensor to cloud, and learn optimization strategies for latency, power, and bandwidth.
