42  Binary Data Formats for IoT

42.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Implement CBOR encoding: Encode and decode sensor data using CBOR’s binary format
  • Design Protocol Buffer schemas: Create .proto files for efficient typed data serialization
  • Build custom binary formats: Design byte-level encodings for ultra-constrained devices
  • Choose between binary formats: Select CBOR, Protobuf, or custom binary based on requirements
  • Handle schema evolution: Plan for future changes without breaking compatibility

This is part of a series on IoT Data Formats:

  1. IoT Data Formats Overview - Introduction and text formats
  2. Binary Data Formats (this chapter) - CBOR, Protobuf, custom binary
  3. Data Format Selection - Decision guides and real-world examples
  4. Data Formats Practice - Scenarios, quizzes, worked examples

Technical Deep Dives: - Data Representation - Binary and hexadecimal encoding - Packet Structure and Framing - Protocol headers and framing - CoAP - CBOR’s primary protocol

42.2 Prerequisites

Before starting this chapter, you should be familiar with:


42.3 CBOR - Compact Binary Object Representation

CBOR is β€œbinary JSON” - same data model, much smaller size.

Same data in CBOR:

A4                      # Map with 4 pairs
  68 646576696365496420 # "deviceId" (8-byte string)
  6A 73656E736F722D303031 # "sensor-001" (10-byte string)
  64 74656D70          # "temp"
  F9 4BBB              # 23.5 (float16)
  68 68756D6964697479  # "humidity"
  18 41                # 65 (uint8)
  69 74696D657374616D70 # "timestamp"
  1A 657F8A57          # 1702834567 (uint32)

Size: ~50 bytes (47% smaller than JSON!)

Pros:

  • Much smaller than JSON (30-60% reduction)
  • Faster parsing than JSON
  • Same data model as JSON (easy migration)
  • IETF standard (RFC 8949)

Cons:

  • Not human-readable (need hex viewer + parser)
  • Smaller ecosystem than JSON
  • Still includes field names (overhead)

Best for: CoAP, MQTT over LoRaWAN, NB-IoT

TipMinimum Viable Understanding: Serialization Fundamentals

Core Concept: Serialization converts in-memory data structures (objects, arrays, numbers) into a sequence of bytes that can be transmitted over a network or stored on disk, and deserialization reverses this process - the choice of serialization format determines message size, parsing speed, and cross-platform compatibility.

Why It Matters: Serialization is the bridge between your code and the network. A temperature reading stored as a 32-bit float (4 bytes) in memory becomes anywhere from 2 bytes (custom binary) to 50+ bytes (JSON with metadata) when serialized. This overhead multiplies across every message, every device, every day. For a 10,000-device fleet sending hourly updates, the difference between JSON and CBOR serialization can mean 500 GB/year of saved bandwidth and proportional reductions in cellular data costs and battery consumption.

Key Takeaway: Match serialization format to your encoding/decoding location. If both sender and receiver are microcontrollers (embedded-to-embedded), use compact binary formats like CBOR or custom binary. If data flows to cloud services (embedded-to-cloud), CBOR or Protobuf balance efficiency with ecosystem support. If humans need to debug or inspect data (any-to-dashboard), keep JSON for at least the final hop. The encoding cost is paid once per message; choose based on who needs to read it.

CBOR’s power comes from its rich type system that goes beyond JSON’s limited types.

Major Type Encoding (first 3 bits of initial byte):

Type Range Description Example
0 0x00-0x1F Unsigned integer 0x17 = 23
1 0x20-0x3F Negative integer 0x37 = -24
2 0x40-0x5F Byte string 0x44 + 4 bytes
3 0x60-0x7F Text string 0x64 + β€œtemp”
4 0x80-0x9F Array 0x82 = 2-item array
5 0xA0-0xBF Map 0xA4 = 4-pair map
6 0xC0-0xDF Tagged value 0xC1 = epoch time
7 0xE0-0xFF Special/float 0xF9 = float16

Compact integer encoding:

  • 0-23: Single byte (0x00 to 0x17)
  • 24-255: Two bytes (0x18 + value)
  • 256-65535: Three bytes (0x19 + 2-byte value)
  • Larger: 0x1A (4 bytes) or 0x1B (8 bytes)

IoT-specific tags (RFC 8949):

Tag Meaning Use Case
0 Date/time string ISO 8601 timestamps
1 Epoch timestamp Unix time (compact)
2 Positive bignum Large sensor IDs
32 URI Resource identifiers
55799 Self-describe CBOR Magic number for detection

Float precision selection:

  • 0xF9 + 2 bytes: float16 (3-4 significant digits)
  • 0xFA + 4 bytes: float32 (7 significant digits)
  • 0xFB + 8 bytes: float64 (15 significant digits)

IoT optimization tip: Use float16 for sensor readings (temp, humidity) where 3 digits of precision is sufficient. Saves 2-6 bytes per value!

Debugging CBOR: Use cbor2diag tool to convert binary to diagnostic notation:

echo "A2 64 74656D70 F9 4BC0 68 68756D6964697479 18 41" | xxd -r -p | cbor2diag
# Output: {"temp": 23.5, "humidity": 65}

Three-panel flowchart showing JSON to CBOR encoding transformation. Top panel (orange): JSON Input showing 95-byte structure with deviceId sensor-001, temp 23.5, humidity 65, timestamp 1702834567. Middle panel (gray/navy): CBOR Encoding Process flows from Parse JSON to Map header A4 (4 pairs), then branches to four parallel field encodings: Field 1 deviceId string to value string (10 bytes), Field 2 temp string to value float16 F9 4BBB (3 bytes), Field 3 humidity string to value uint8 18 41 (2 bytes), Field 4 timestamp string to value uint32 1A 657F8A57 (5 bytes). Bottom panel (teal): CBOR Output showing 50-byte binary representation A4 68...6449 6A...303031 64...6D70 F9 4BBB 68...697479 18 41 69...616D70 1A 657F8A57. Side annotation (teal box): 47% Size Reduction, 95 bytes to 50 bytes, 45 bytes saved. Demonstrates bandwidth savings from compact binary encoding.

Three-panel flowchart showing JSON to CBOR encoding transformation. Top panel (orange): JSON Input showing 95-byte structure with deviceId sensor-001, temp 23.5, humidity 65, timestamp 1702834567. Middle panel (gray/navy): CBOR Encoding Process flows from Parse JSON to Map header A4 (4 pairs), then branches to four parallel field encodings: Field 1 deviceId string to value string (10 bytes), Field 2 temp string to value float16 F9 4BBB (3 bytes), Field 3 humidity string to value uint8 18 41 (2 bytes), Field 4 timestamp string to value uint32 1A 657F8A57 (5 bytes). Bottom panel (teal): CBOR Output showing 50-byte binary representation A4 68…6449 6A…303031 64…6D70 F9 4BBB 68…697479 18 41 69…616D70 1A 657F8A57. Side annotation (teal box): 47% Size Reduction, 95 bytes to 50 bytes, 45 bytes saved. Demonstrates bandwidth savings from compact binary encoding.
Figure 42.1: CBOR Encoding Visualization: Shows how a JSON message (95 bytes) is transformed into CBOR binary format (50 bytes). The encoding process parses the JSON structure, creates a map header (A4 = 4 key-value pairs), then encodes each field name and value using efficient binary representations. Field names remain as strings, but values use compact types: float16 for temperature (3 bytes vs 4 bytes), uint8 for humidity (2 bytes vs string), and uint32 for timestamp (5 bytes vs string). This results in a 47% size reduction while maintaining the same data model as JSON.

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#7F8C8D', 'clusterBkg': '#ECF0F1'}}}%%
graph LR
    subgraph "Analogy: Sending a Package"
        LETTER["<b>Letter (JSON)</b><br/>━━━━━━━━━━<br/>Dear recipient,<br/>I am sending you<br/>the temperature<br/>which is 23.5...<br/>━━━━━━━━━━<br/>Full sentences<br/>Large envelope"]

        TELEGRAM["<b>Telegram (CBOR)</b><br/>━━━━━━━<br/>TEMP 23.5 STOP<br/>HUM 65 STOP<br/>━━━━━━━<br/>Abbreviated<br/>Medium size"]

        BARCODE["<b>Barcode (Protobuf)</b><br/>━━━━<br/>barcode pattern<br/>(encoded data)<br/>━━━━<br/>Schema lookup<br/>Small label"]
    end

    subgraph "What Each Preserves"
        PRESERVE["<b>All contain same info:</b><br/>β€’ Device ID<br/>β€’ Temperature<br/>β€’ Humidity<br/>β€’ Timestamp<br/>━━━━━━━━━━<br/>Different packaging<br/>Same content!"]
    end

    LETTER -->|"Remove<br/>verbosity"| TELEGRAM
    TELEGRAM -->|"Use codes<br/>not words"| BARCODE

    LETTER -.-> PRESERVE
    TELEGRAM -.-> PRESERVE
    BARCODE -.-> PRESERVE

    style LETTER fill:#E67E22,stroke:#2C3E50,stroke-width:2px,color:#000
    style TELEGRAM fill:#16A085,stroke:#2C3E50,stroke-width:2px,color:#fff
    style BARCODE fill:#2C3E50,stroke:#16A085,stroke-width:2px,color:#fff
    style PRESERVE fill:#ECF0F1,stroke:#7F8C8D,stroke-width:1px,color:#000

Figure 42.2: Alternative view: Postal Analogy - Data formats are like different ways to mail information. JSON is like a formal letter with full sentences - readable but verbose. CBOR is like a telegram - abbreviated but still using words. Protobuf is like a barcode - meaningless without a scanner (schema) but ultra-compact. All three deliver the same information; they just package it differently. This analogy helps beginners understand that compression does not mean data loss - it means smarter encoding. {fig-alt=β€œAnalogy diagram comparing data formats to postal communication methods. Left section shows three packages: Letter (JSON) as orange box with full sentences like Dear recipient I am sending temperature 23.5, described as full sentences in large envelope. Telegram (CBOR) as teal box with abbreviated TEMP 23.5 STOP HUM 65 STOP in medium size. Barcode (Protobuf) as navy box with encoded barcode pattern, requires schema lookup, small label. Arrows show progression: Letter to Telegram (Remove verbosity), Telegram to Barcode (Use codes not words). Right section shows gray box labeled What Each Preserves listing Device ID, Temperature, Humidity, Timestamp with note Different packaging Same content. Dotted lines connect all three formats to the preserved content, emphasizing that all contain identical information despite different sizes.”}

42.4 Protocol Buffers (Protobuf)

Google’s binary format with schema definition.

Schema file (.proto):

message SensorReading {
  string deviceId = 1;
  float temp = 2;
  uint32 humidity = 3;
  uint64 timestamp = 4;
}

Size: ~22 bytes (77% smaller than JSON!)

Pros:

  • Extremely compact (no field names sent)
  • Very fast parsing (code generation)
  • Strong typing and schema evolution
  • Good tooling (protoc compiler)

Cons:

  • Requires schema file on both ends
  • Not self-describing (can’t parse without schema)
  • More complex setup

Best for: High-volume data pipelines, gRPC APIs, edge-to-cloud

Understanding how Protocol Buffers encodes data helps you estimate payload sizes and debug wire format issues.

Binary encoding breakdown:

0A 0A 73656E736F722D303031  # Field 1 (deviceId): "sensor-001"
15 0000BC41                  # Field 2 (temp): 23.5
18 41                        # Field 3 (humidity): 65
20 57 8A7F65                 # Field 4 (timestamp): 1702834567

Encoding rules:

Wire Type Meaning Used For
0 Varint int32, int64, uint32, uint64, bool, enum
1 64-bit fixed64, sfixed64, double
2 Length-delimited string, bytes, embedded messages
5 32-bit fixed32, sfixed32, float

Field tag format: (field_number << 3) | wire_type

  • Field 1 (string): 0A = (1 << 3) | 2 = 0x0A
  • Field 2 (float): 15 = (2 << 3) | 5 = 0x15
  • Field 3 (uint32): 18 = (3 << 3) | 0 = 0x18
  • Field 4 (uint64): 20 = (4 << 3) | 0 = 0x20

Varint encoding (for integers):

  • Uses 7 bits per byte, MSB indicates continuation
  • Small values are compact (1-2 bytes)
  • Large values expand (up to 10 bytes for uint64)

Schema evolution rules:

  • New fields: Add with new field numbers (old clients ignore)
  • Removed fields: Mark as reserved (never reuse numbers)
  • Type changes: Only compatible types (int32 <-> int64)

42.5 Custom Binary Formats

For ultimate efficiency, define your own binary format.

Example: Same data in 16 bytes

Byte layout:
[0-9]:   deviceId "sensor-001" (10 bytes, no null terminator)
[10]:    temp = 235 (uint8, value x 10)
[11]:    humidity = 65 (uint8)
[12-15]: timestamp (uint32, seconds since epoch)

Total: 16 bytes

Pros:

  • Smallest possible size
  • Fastest parsing (no overhead)
  • Complete control

Cons:

  • No tooling, DIY everything
  • No schema evolution (breaking changes)
  • Not self-describing
  • Maintenance burden

Best for: Extremely constrained devices (Sigfox, ultra-low-power)

When standard formats are too large, custom binary encoding becomes necessary. Here are proven patterns for designing efficient custom formats.

Pattern 1: Fixed-Point Encoding

Instead of floating-point (4 bytes), use scaled integers:

Value Range Encoding Bytes Example
Temp: -40.0 to 85.0C int8 + 40 1 23.5C -> 64
Humidity: 0-100% uint8 1 65% -> 65
Voltage: 0-5.0V uint8 x 50 1 3.3V -> 165
GPS lat/lng int32 x 10^6 4 37.7749 -> 37774900

Pattern 2: Bit Packing

Combine multiple small values into single bytes:

// Pack 3 values into 1 byte:
// - Direction (4 bits: 0-15 -> N, NE, E, SE, S, SW, W, NW, ...)
// - Quality (2 bits: 0-3 -> Poor, Fair, Good, Excellent)
// - Alert (2 bits: 0-3 -> None, Low, Medium, High)
uint8_t packed = (direction << 4) | (quality << 2) | alert;

Pattern 3: Delta Encoding

For sequential readings, send differences instead of absolute values:

First message:  [timestamp][temp][humidity][...]  = 16 bytes
Delta messages: [delta_t (1 byte)][delta_temp (1 byte)][...]  = 4 bytes

Savings: 75% for time-series data!

Pattern 4: Enum Compression

Replace strings with numeric codes:

String Code Bytes Saved
β€œtemperature” 0x01 10 bytes
β€œhumidity” 0x02 7 bytes
β€œsensor-001” 0x0001 8 bytes

Versioning strategy (critical for future changes):

Byte 0: Version/Type field
  [0xV0-0xVF]: Version 0-15
  [0x01]: Sensor reading v1
  [0x81]: Sensor reading v1 with extended fields

Bytes 1-N: Payload (format depends on version)

Common pitfalls to avoid:

  1. Endianness: Always document byte order (prefer little-endian for ARM)
  2. Alignment: Ensure 2-byte values start at even offsets
  3. Overflow: Validate input ranges before encoding
  4. Magic numbers: Include a sync byte (0xAA, 0x55) for framing

42.6 Size Comparison Visualization

Horizontal comparison diagram showing three groups. Left group (gray): Sensor Reading description box (4 fields: string, float, int, timestamp). Center group (Format Comparison): Four boxes showing size progression. JSON (orange, 95 bytes, ten dashes, 100% baseline), CBOR (teal, 50 bytes, five dashes, 53% of JSON, 47% reduction), Protobuf (teal, 22 bytes, two dashes, 23% of JSON, 77% reduction), Custom Binary (navy, 16 bytes, one dash, 17% of JSON, 83% reduction). Right group (Key Trade-offs): Four boxes describing trade-offs. JSON (Universal ecosystem, Human-readable, Easy debugging), CBOR (50% size savings, Same data model, Good balance), Protobuf (77% savings, Schema required, Strong typing), Custom (83% savings, No tooling, Maintenance burden). Dotted lines connect each format to its trade-off description. Visual bar graph shows diminishing physical size from left to right, representing efficiency gains versus complexity costs.

Horizontal comparison diagram showing three groups. Left group (gray): Sensor Reading description box (4 fields: string, float, int, timestamp). Center group (Format Comparison): Four boxes showing size progression. JSON (orange, 95 bytes, ten dashes, 100% baseline), CBOR (teal, 50 bytes, five dashes, 53% of JSON, 47% reduction), Protobuf (teal, 22 bytes, two dashes, 23% of JSON, 77% reduction), Custom Binary (navy, 16 bytes, one dash, 17% of JSON, 83% reduction). Right group (Key Trade-offs): Four boxes describing trade-offs. JSON (Universal ecosystem, Human-readable, Easy debugging), CBOR (50% size savings, Same data model, Good balance), Protobuf (77% savings, Schema required, Strong typing), Custom (83% savings, No tooling, Maintenance burden). Dotted lines connect each format to its trade-off description. Visual bar graph shows diminishing physical size from left to right, representing efficiency gains versus complexity costs.
Figure 42.3: Format Size Comparison: Visual comparison of payload sizes for identical sensor data across four common IoT data formats. JSON serves as the 100% baseline (95 bytes) with maximum compatibility and readability. CBOR reduces size by 47% (50 bytes) while maintaining JSON’s flexibility. Protocol Buffers achieves 77% reduction (22 bytes) through schema-based encoding and field number references instead of names. Custom binary reaches 83% reduction (16 bytes) by eliminating all metadata, but requires DIY implementation. The progression shows diminishing returns: each step provides smaller incremental savings while increasing complexity and reducing flexibility.

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#7F8C8D', 'clusterBkg': '#ECF0F1'}}}%%
graph TB
    subgraph "Where Do the Bytes Go?"
        direction TB
        subgraph JSON["JSON: 95 bytes total"]
            J_SYNTAX["Syntax overhead<br/>braces, quotes, commas<br/>~15 bytes"]
            J_NAMES["Field names<br/>deviceId, temp, humidity, ts<br/>~35 bytes"]
            J_VALUES["Actual data values<br/>sensor-001, 23.5, 65, 1702834567<br/>~45 bytes"]
        end

        subgraph CBOR["CBOR: 50 bytes total"]
            C_HEADER["Type markers<br/>A4, 68, F9, 18, 1A<br/>~5 bytes"]
            C_NAMES["Field names (binary)<br/>Same names, compact encoding<br/>~25 bytes"]
            C_VALUES["Compact values<br/>float16, uint8, uint32<br/>~20 bytes"]
        end

        subgraph PROTO["Protobuf: 22 bytes total"]
            P_TAGS["Field tags<br/>1, 2, 3, 4 (not names)<br/>~4 bytes"]
            P_VALUES["Typed values<br/>Efficient encoding<br/>~18 bytes"]
        end

        subgraph CUSTOM["Custom: 16 bytes total"]
            X_VALUES["Pure data only<br/>No overhead at all<br/>16 bytes"]
        end
    end

    J_SYNTAX --> J_NAMES --> J_VALUES
    C_HEADER --> C_NAMES --> C_VALUES
    P_TAGS --> P_VALUES
    X_VALUES

    style J_SYNTAX fill:#7F8C8D,stroke:#2C3E50,stroke-width:1px,color:#fff
    style J_NAMES fill:#E67E22,stroke:#2C3E50,stroke-width:1px,color:#000
    style J_VALUES fill:#16A085,stroke:#2C3E50,stroke-width:1px,color:#fff
    style C_HEADER fill:#7F8C8D,stroke:#2C3E50,stroke-width:1px,color:#fff
    style C_NAMES fill:#E67E22,stroke:#2C3E50,stroke-width:1px,color:#000
    style C_VALUES fill:#16A085,stroke:#2C3E50,stroke-width:1px,color:#fff
    style P_TAGS fill:#7F8C8D,stroke:#2C3E50,stroke-width:1px,color:#fff
    style P_VALUES fill:#16A085,stroke:#2C3E50,stroke-width:1px,color:#fff
    style X_VALUES fill:#16A085,stroke:#2C3E50,stroke-width:2px,color:#fff

Figure 42.4: Alternative view: Byte Breakdown Stack - This diagram answers β€œwhere do the bytes actually go?” JSON spends ~50 bytes on overhead (syntax + field names) and only ~45 bytes on actual data. CBOR eliminates syntax overhead and compresses values, but keeps field names. Protobuf replaces field names with numeric tags. Custom binary is pure data with zero overhead. This layered view helps students understand that format efficiency comes from removing different types of overhead, not just β€œcompression.” {fig-alt=β€œStacked layer diagram showing byte allocation within 4 data formats. JSON (95 bytes): Three stacked layers - Syntax overhead (gray, braces quotes commas, ~15 bytes), Field names (orange, deviceId temp humidity ts, ~35 bytes), Actual data values (teal, sensor-001 23.5 65 1702834567, ~45 bytes). CBOR (50 bytes): Three layers - Type markers (gray, A4 68 F9 18 1A, ~5 bytes), Field names binary (orange, same names compact encoding, ~25 bytes), Compact values (teal, float16 uint8 uint32, ~20 bytes). Protobuf (22 bytes): Two layers - Field tags (gray, 1 2 3 4 not names, ~4 bytes), Typed values (teal, efficient encoding, ~18 bytes). Custom (16 bytes): Single layer - Pure data only (teal with thick border, no overhead at all, 16 bytes). Vertical stacks show overhead reduction: JSON has most overhead (gray+orange layers), custom has none. Actual data (teal) is similar size across all formats; the difference is overhead elimination.”}

42.7 Performance Benchmarks

Beyond payload size, parsing speed and memory usage vary significantly across formats and libraries.

Parsing Performance (ESP32, 240 MHz, typical IoT payload):

Format Library Parse Time Memory Notes
JSON ArduinoJson 1.2 ms 512 B Dynamic allocation
JSON cJSON 0.8 ms 384 B Simpler, lighter
CBOR tinycbor 0.3 ms 128 B Streaming parser
CBOR cn-cbor 0.4 ms 256 B Tree-based
Protobuf nanopb 0.1 ms 64 B Static allocation
Custom (manual) 0.05 ms 0 B No parsing overhead

Memory allocation strategies:

  1. Static allocation (Protobuf/nanopb): Pre-allocate based on schema
    • Pros: Predictable, no fragmentation
    • Cons: Wastes memory for variable-length fields
  2. Dynamic allocation (JSON/ArduinoJson): Allocate on parse
    • Pros: Flexible for variable payloads
    • Cons: Heap fragmentation risk, slower
  3. Streaming (CBOR/tinycbor): Process as bytes arrive
    • Pros: Minimal memory, real-time processing
    • Cons: No random access to fields

Library recommendations by platform:

Platform JSON CBOR Protobuf
ESP32/ESP8266 ArduinoJson tinycbor nanopb
STM32 cJSON cn-cbor nanopb
Raspberry Pi nlohmann/json libcbor protobuf-c
Python (cloud) json (stdlib) cbor2 protobuf
Node.js JSON.parse cbor protobufjs

Energy impact (NB-IoT transmission at 23 dBm):

Format Bytes TX Time Energy Battery Impact
JSON 95 47 ms 9.4 mJ Baseline
CBOR 50 25 ms 5.0 mJ 47% savings
Protobuf 22 11 ms 2.2 mJ 77% savings
Custom 16 8 ms 1.6 mJ 83% savings

Real-world impact: For a 5-year battery target with 96 messages/day, format choice can mean the difference between 4-year and 6-year battery life.


42.8 Format Trade-off Summary

TipTradeoff: JSON vs Binary Formats (CBOR/Protobuf)

Decision context: When choosing between human-readable JSON and compact binary formats for IoT data serialization

Factor JSON CBOR Protobuf
Battery impact High (large payloads) Medium (47% smaller) Low (77% smaller)
Bandwidth High (~95 bytes typical) Medium (~50 bytes) Low (~22 bytes)
Latency Higher (parsing overhead) Medium Low (fast decode)
Readability Excellent (text editor) Requires tools Requires schema + tools
Flexibility Excellent (schemaless) Good (self-describing) Moderate (schema required)
Schema evolution Easy (add fields anytime) Good Good (with planning)
Tooling Universal Growing Strong (protoc, gRPC)
Development speed Fastest Moderate Slower (schema first)

Choose JSON when:

  • Bandwidth is not constrained (Wi-Fi, Ethernet, LTE)
  • Development speed and debugging are priorities
  • Schema may change frequently during prototyping
  • Integrating with web services and REST APIs
  • Small deployments (<100 devices) where data costs are negligible

Choose CBOR when:

  • Bandwidth is limited but you need JSON-like flexibility (LoRaWAN, NB-IoT)
  • Migrating from JSON with minimal code changes
  • Self-describing format needed (no schema coordination required)
  • CoAP protocol usage (CBOR is the standard payload format)
  • Balance between efficiency and maintainability

Choose Protobuf when:

  • High-volume deployments (>1000 devices) where bandwidth savings matter
  • Strong typing and schema enforcement are requirements
  • Building gRPC-based microservices architecture
  • Long-term API contracts with multiple teams
  • Maximum efficiency needed with acceptable schema management overhead

Default recommendation: Start with JSON for prototyping, migrate to CBOR when bandwidth becomes a concern, use Protobuf for high-scale production systems with stable schemas


42.9 Summary

Key Points:

  • CBOR: Binary JSON with 47% size reduction, same data model, IETF standard
  • Protobuf: Schema-based with 77% reduction, strong typing, requires setup
  • Custom Binary: Maximum efficiency (83%) but high maintenance burden
  • Match serialization format to your sender/receiver types
  • Consider parsing speed and memory, not just payload size

Quick Reference:

Format Size Complexity Schema Best For
CBOR 50 bytes Low None LoRaWAN, NB-IoT, CoAP
Protobuf 22 bytes Medium Required gRPC, high-volume
Custom 16 bytes High DIY Sigfox, ultra-low-power

42.10 What’s Next

Now that you understand binary data formats and their trade-offs:

Continue to Data Format Selection β†’