9  Binary Data Formats for IoT

In 60 Seconds

Binary data formats (CBOR, Protocol Buffers, MessagePack) encode IoT sensor readings in 2-10× fewer bytes than JSON. A 4-field temperature/humidity/CO2/timestamp reading takes 43 bytes in CBOR versus 98 bytes in JSON — the 55-byte saving multiplied by 1 million daily readings saves 55 MB/day in bandwidth.

9.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Implement CBOR encoding: Encode and decode sensor data using CBOR’s compact binary format and type system
  • Design Protocol Buffer schemas: Create .proto files for efficient typed data serialization with field numbering
  • Build custom binary formats: Construct byte-level encodings using fixed-point, bit packing, and delta encoding for ultra-constrained devices
  • Evaluate binary format trade-offs: Assess CBOR, Protobuf, and custom binary against bandwidth, battery, and maintainability requirements
  • Analyze schema evolution strategies: Identify forward-compatible and backward-compatible changes in Protobuf schemas
  • Calculate payload size reductions: Compute byte-level savings when migrating from JSON to CBOR, Protobuf, or custom binary
Key Concepts
  • CBOR: Concise Binary Object Representation — self-describing binary format based on JSON data model, 2-5× smaller than JSON
  • Protocol Buffers: Google’s schema-defined binary serialization — fastest parsing but requires schema distribution
  • MessagePack: Schema-free binary JSON replacement — 30-50% smaller than JSON with compatible type system
  • TLV Encoding: Type-Length-Value format used in low-level protocols: 1-byte type + length + raw bytes
  • Varint Encoding: Variable-length integer encoding in Protocol Buffers — small values use 1 byte, large values up to 10 bytes
  • Schema Evolution: Ability to add/remove fields without breaking existing producers/consumers — critical for long-lived IoT deployments
  • Endianness: Byte order for multi-byte values — big-endian (network byte order) vs. little-endian (x86); must match between sender and receiver

9.2 Most Valuable Understanding (MVU)

Binary data formats shrink IoT messages by encoding data as compact bytes instead of human-readable text, saving 50-80% in bandwidth and battery life.

This is the single most important concept in this chapter. When your sensor sends “temperature: 23.5”, that’s 16 characters (16 bytes). In CBOR, it’s just 3 bytes. In custom binary, it’s 2 bytes. Over thousands of messages per day across thousands of devices, this difference determines whether your deployment succeeds or fails.

Remember: Binary formats trade human readability for efficiency. Choose based on whether machines or humans need to read the data.

This is part of a series on IoT Data Formats:

  1. IoT Data Formats Overview - Introduction and text formats
  2. Binary Data Formats (this chapter) - CBOR, Protobuf, custom binary
  3. Data Format Selection - Decision guides and real-world examples
  4. Data Formats Practice - Scenarios, quizzes, worked examples

Technical Deep Dives:

9.3 Prerequisites

Before starting this chapter, you should be familiar with:


Hey there, future IoT engineer! Let’s learn about binary data formats with the Sensor Squad!

Sammy the Sensor needs to send a temperature reading to the cloud. But there’s a problem - Sammy runs on a tiny battery and can only send small messages through LoRaWAN!

Think of it like packing for a trip with a tiny suitcase:

  • JSON is like packing with all the original boxes and labels - “This is a TEMPERATURE reading and it equals TWENTY-THREE POINT FIVE DEGREES” - takes lots of space!
  • CBOR is like removing the boxes but keeping small labels - more efficient!
  • Protobuf is like using a packing list where you just write “Item #2 = 23.5” - super compact!
  • Custom Binary is like vacuum-sealing everything - smallest possible, but only YOU know what’s inside!

The Sensor Squad Challenge:

Sammy has to send: Temperature = 23.5, Humidity = 65

Format What Sammy Sends Size
JSON {"temp":23.5,"humidity":65} 28 characters
CBOR A2 64 74656D70 F9 4BC0 68... 15 bytes
Custom EB 41 2 bytes!

Lila the Light says: “Wow! Custom binary is 14 times smaller than JSON!”

Max the Motor adds: “But remember - only Sammy and Cloudy know what EB 41 means. It’s their secret code!”

Fun Fact: A LoRaWAN device can only send about 50 bytes at a time. With JSON, Sammy could only send 2-3 readings. With custom binary, Sammy could send 25 readings in one message!

Try This: Write down your age and your best friend’s age as regular numbers. Now try writing them as single bytes (0-255). How much space did you save?


Analogy: Binary Formats are Like Shorthand Notes

Imagine you’re a reporter taking notes during a fast-paced interview:

  • Full sentences (JSON): “The temperature reading from sensor number one is twenty-three point five degrees Celsius” - accurate but slow to write
  • Abbreviations (CBOR): “Temp s1 = 23.5C” - faster, still readable with context
  • Personal shorthand (Protobuf): “T1:23.5” - very fast, needs your codebook
  • Stenography (Custom Binary): Special symbols only you understand - fastest possible

Why does this matter in IoT?

Challenge How Binary Formats Help
Limited bandwidth Smaller messages = more data through narrow pipes (LoRaWAN, Sigfox)
Battery life Less data to transmit = radio on for shorter time = longer battery
Data costs Cellular IoT charges per byte - smaller = cheaper
Processing speed Binary parses faster than text on microcontrollers

Key Terms:

Term Definition
Serialization Converting data to bytes for transmission
Deserialization Converting bytes back to usable data
Schema A blueprint describing data structure
Self-describing Format that includes field names (like JSON/CBOR)
Schema-dependent Format that needs external schema (like Protobuf)
Endianness Byte order: big-endian (MSB first) or little-endian (LSB first)

9.4 Binary Format Decision Tree

Before diving into each format, here’s how to choose between them:

Decision tree for choosing between binary data formats (CBOR, MessagePack, Protocol Buffers, FlatBuffers) based on constraints like schema requirements, message size, and processing power

This flowchart helps you choose the right binary format based on your constraints. Now let’s explore each format in detail.


9.5 CBOR - Compact Binary Object Representation

CBOR is “binary JSON” - same data model, much smaller size.

Same data in CBOR:

A4                      # Map with 4 pairs
  68 646576696365496420 # "deviceId" (8-byte string)
  6A 73656E736F722D303031 # "sensor-001" (10-byte string)
  64 74656D70          # "temp"
  F9 4BC0              # 23.5 (float16)
  68 68756D6964697479  # "humidity"
  18 41                # 65 (uint8)
  69 74696D657374616D70 # "timestamp"
  1A 657F8A57          # 1702834567 (uint32)

Size: ~50 bytes (47% smaller than JSON!)

Pros:

  • Much smaller than JSON (30-60% reduction)
  • Faster parsing than JSON
  • Same data model as JSON (easy migration)
  • IETF standard (RFC 8949)

Cons:

  • Not human-readable (need hex viewer + parser)
  • Smaller ecosystem than JSON
  • Still includes field names (overhead)

Best for: CoAP, MQTT over LoRaWAN, NB-IoT

Scenario: A parking sensor fleet (1000 devices) sends status updates every 5 minutes over NB-IoT. Current JSON payload is 85 bytes. Cellular data costs $0.01/MB. Can CBOR reduce this?

Step 1: Calculate current usage

  • Messages per device per month: 12/hour × 24 hours × 30 days = 8,640
  • Total data: 1000 devices × 8,640 messages × 85 bytes = 700.68 MB/month
  • Cost: 700.68 MB × $0.01/MB = $7.01/month

Step 2: Estimate CBOR size

  • JSON overhead for this payload: ~40 bytes (field names, syntax)
  • CBOR overhead: ~15 bytes (type markers, field names in binary)
  • Expected CBOR size: 85 - 40 + 15 = 60 bytes (~30% reduction)

Step 3: Projected savings

  • New data: 1000 × 8,640 × 60 bytes = 494.38 MB/month
  • Cost: 494.38 MB × $0.01/MB = $4.94/month
  • Savings: $2.07/month = $24.84/year

Implementation effort: 2 weeks (encode library on devices, decode on server)

ROI: Payback in < 1 month of deployment.

Interactive Calculator:

Common Mistake: Not Testing CBOR Library Memory Usage

Problem: Assuming CBOR will work on ultra-constrained devices (e.g., 2KB RAM).

Reality: Some CBOR libraries (like cn-cbor) allocate tree structures during parsing, consuming 256-512 bytes RAM even for tiny payloads. On an ATtiny with 2KB RAM, this can cause stack overflow.

Fix: Use streaming parsers (like tinycbor) that process data incrementally, requiring only 64-128 bytes RAM. Test on actual hardware before committing to CBOR.

What to observe: Monitor heap fragmentation with ESP.getFreeHeap() (ESP32) or freeMemory() (Arduino). If available RAM drops below 20% after parsing, switch to a lighter library or custom binary.

Minimum Viable Understanding: Serialization Fundamentals

Core Concept: Serialization converts in-memory data structures (objects, arrays, numbers) into a sequence of bytes that can be transmitted over a network or stored on disk, and deserialization reverses this process - the choice of serialization format determines message size, parsing speed, and cross-platform compatibility.

Why It Matters: Serialization is the bridge between your code and the network. A temperature reading stored as a 32-bit float (4 bytes) in memory becomes anywhere from 2 bytes (custom binary) to 50+ bytes (JSON with metadata) when serialized. This overhead multiplies across every message, every device, every day. For a 10,000-device fleet sending hourly updates, the difference between JSON and CBOR serialization can mean 500 GB/year of saved bandwidth and proportional reductions in cellular data costs and battery consumption.

Key Takeaway: Match serialization format to your encoding/decoding location. If both sender and receiver are microcontrollers (embedded-to-embedded), use compact binary formats like CBOR or custom binary. If data flows to cloud services (embedded-to-cloud), CBOR or Protobuf balance efficiency with ecosystem support. If humans need to debug or inspect data (any-to-dashboard), keep JSON for at least the final hop. The encoding cost is paid once per message; choose based on who needs to read it.

Given: 10K devices, hourly updates, JSON (95B) vs CBOR (50B)

\[\text{Daily JSON} = 10{,}000 \times 24 \times 95\text{B} = 21.74\text{ MB/day}\] \[\text{Daily CBOR} = 10{,}000 \times 24 \times 50\text{B} = 10.99\text{ MB/day}\] \[\text{Annual savings} = (21.74 - 10.99) \times 365 = 3{,}923.75\text{ MB/year}\]

**At \(0.10/MB cellular data**:\)\(\text{Cost difference} = 3{,}923.75\text{ MB} \times \$0.10 = \$392.38/\text{year}\)$

For battery-powered sensors, the 47% payload reduction extends battery life proportionally - CBOR messages take 47% less transmission time, saving milliwatt-hours per message.

CBOR’s power comes from its rich type system that goes beyond JSON’s limited types.

Major Type Encoding (first 3 bits of initial byte):

Type Range Description Example
0 0x00-0x1F Unsigned integer 0x17 = 23
1 0x20-0x3F Negative integer 0x37 = -24
2 0x40-0x5F Byte string 0x44 + 4 bytes
3 0x60-0x7F Text string 0x64 + “temp”
4 0x80-0x9F Array 0x82 = 2-item array
5 0xA0-0xBF Map 0xA4 = 4-pair map
6 0xC0-0xDF Tagged value 0xC1 = epoch time
7 0xE0-0xFF Special/float 0xF9 = float16

Compact integer encoding:

  • 0-23: Single byte (0x00 to 0x17)
  • 24-255: Two bytes (0x18 + value)
  • 256-65535: Three bytes (0x19 + 2-byte value)
  • Larger: 0x1A (4 bytes) or 0x1B (8 bytes)

IoT-specific tags (RFC 8949):

Tag Meaning Use Case
0 Date/time string ISO 8601 timestamps
1 Epoch timestamp Unix time (compact)
2 Positive bignum Large sensor IDs
32 URI Resource identifiers
55799 Self-describe CBOR Magic number for detection

Float precision selection:

  • 0xF9 + 2 bytes: float16 (3-4 significant digits)
  • 0xFA + 4 bytes: float32 (7 significant digits)
  • 0xFB + 8 bytes: float64 (15 significant digits)

IoT optimization tip: Use float16 for sensor readings (temp, humidity) where 3 digits of precision is sufficient. Saves 2-6 bytes per value!

Debugging CBOR: Use cbor2diag tool to convert binary to diagnostic notation:

echo "A2 64 74656D70 F9 4BC0 68 68756D6964697479 18 41" | xxd -r -p | cbor2diag
# Output: {"temp": 23.5, "humidity": 65}
CBOR figure review replacement

CBOR Encoding Visualization

The same four-field sensor reading stays semantically identical, but the wire format drops most of the textual overhead.

1. JSON input

95 bytes
{
  "deviceId": "sensor-001",
  "temp": 23.5,
  "humidity": 65,
  "timestamp": 1702834567
}
  • Readable field names on the wire
  • Text digits for every numeric value
  • Extra braces, quotes, commas, and colons

2. Encoding decisions

Structure kept, bytes trimmed
Map header A4 says "4 key-value pairs" in one byte.
Compact value types temp -> float16, humidity -> uint8, timestamp -> uint32.
Field names remain CBOR still stores deviceId, temp, humidity, and timestamp.

3. CBOR output

50 bytes, 47% smaller
A4
68 6465766963654964
6A 73656E736F722D303031
64 74656D70 F9 4BC0
68 68756D6964697479 18 41
69 74696D657374616D70 1A 657F8A57
  • 45 bytes saved on each message
  • Same data model as JSON
  • Much smaller radio airtime on constrained links

Figure 1. CBOR reduces payload size by swapping human-readable punctuation and decimal strings for typed binary values while preserving the original JSON-style map structure.

Figure 9.1
Alternative mental model

Postal Analogy for Binary Formats

All three formats deliver the same facts. The difference is how much packaging overhead each one carries.

JSON = formal letter

Payload: "Dear cloud, device sensor-001 reports temp 23.5 and humidity 65."

Why teams like it: Anyone can open and read it with no extra tooling.

CBOR = telegram

Payload: "TEMP 23.5 STOP HUM 65 STOP TS 1702834567"

Why teams like it: Same message, fewer characters, still self-describing.

Protobuf = barcode label

Payload: field 1, field 2, field 3, field 4 encoded by number.

Why teams like it: Very compact, but the receiver must have the schema.

What stays the same

  • deviceId = sensor-001
  • temp = 23.5
  • humidity = 65
  • timestamp = 1702834567

Figure 2. Compression here is not data loss. Each format keeps the same sensor facts and changes only how much metadata rides along with them.

Figure 9.2

9.6 Protocol Buffers (Protobuf)

Google’s binary format with schema definition.

Schema file (.proto):

message SensorReading {
  string deviceId = 1;
  float temp = 2;
  uint32 humidity = 3;
  uint64 timestamp = 4;
}

Size: ~22 bytes (77% smaller than JSON!)

Pros:

  • Extremely compact (no field names sent)
  • Very fast parsing (code generation)
  • Strong typing and schema evolution
  • Good tooling (protoc compiler)

Cons:

  • Requires schema file on both ends
  • Not self-describing (can’t parse without schema)
  • More complex setup

Best for: High-volume data pipelines, gRPC APIs, edge-to-cloud

Scenario: A factory has 5,000 vibration sensors on motors, sending readings every 10 seconds (3 axes × float, device ID, timestamp). Current CBOR implementation uses 45 bytes per message. Can Protobuf improve efficiency?

Calculate current load:

  • Messages per hour: 5,000 sensors × 360 readings/hour = 1,800,000 messages/hour
  • Hourly data: 1.8M × 45 bytes = 81 MB/hour = 1.94 GB/day = 58.2 GB/month

Design Protobuf schema:

message VibrationReading {
  uint32 deviceId = 1;    // Field numbers replace names
  float accel_x = 2;
  float accel_y = 3;
  float accel_z = 4;
  uint64 timestamp = 5;
}

Expected size: 4 bytes (deviceId) + 12 bytes (3 floats) + 5 bytes (timestamp) + 5 bytes (field tags) = 26 bytes (42% reduction from CBOR)

Projected savings:

  • New monthly data: 58.2 GB × (26/45) = 33.6 GB/month
  • Savings: 24.6 GB/month = 25,190 MB/month
  • At $0.01/MB: $251.90/month = $3,022.80/year

Implementation effort: 1 week (schema design, generate code, integrate)

ROI: Payback in 2 months if development costs $5,000.

Key insight: Protobuf shines in high-message-volume scenarios where setup cost is amortized across millions of messages.

Interactive Calculator:

Understanding how Protocol Buffers encodes data helps you estimate payload sizes and debug wire format issues.

Binary encoding breakdown:

0A 0A 73656E736F722D303031  # Field 1 (deviceId): "sensor-001"
15 0000BC41                  # Field 2 (temp): 23.5
18 41                        # Field 3 (humidity): 65
20 57 8A7F65                 # Field 4 (timestamp): 1702834567

Encoding rules:

Wire Type Meaning Used For
0 Varint int32, int64, uint32, uint64, bool, enum
1 64-bit fixed64, sfixed64, double
2 Length-delimited string, bytes, embedded messages
5 32-bit fixed32, sfixed32, float

Field tag format: (field_number << 3) | wire_type

  • Field 1 (string): 0A = (1 << 3) | 2 = 0x0A
  • Field 2 (float): 15 = (2 << 3) | 5 = 0x15
  • Field 3 (uint32): 18 = (3 << 3) | 0 = 0x18
  • Field 4 (uint64): 20 = (4 << 3) | 0 = 0x20

Varint encoding (for integers):

  • Uses 7 bits per byte, MSB indicates continuation
  • Small values are compact (1-2 bytes)
  • Large values expand (up to 10 bytes for uint64)

Schema evolution rules:

  • New fields: Add with new field numbers (old clients ignore)
  • Removed fields: Mark as reserved (never reuse numbers)
  • Type changes: Only compatible types (int32 <-> int64)

9.7 Custom Binary Formats

For ultimate efficiency, define your own binary format.

Example: Same data in 16 bytes

Byte layout:
[0-9]:   deviceId "sensor-001" (10 bytes, no null terminator)
[10]:    temp = 235 (uint8, value x 10)
[11]:    humidity = 65 (uint8)
[12-15]: timestamp (uint32, seconds since epoch)

Total: 16 bytes

Pros:

  • Smallest possible size
  • Fastest parsing (no overhead)
  • Complete control

Cons:

  • No tooling, DIY everything
  • No schema evolution (breaking changes)
  • Not self-describing
  • Maintenance burden

Best for: Extremely constrained devices (Sigfox, ultra-low-power)

When standard formats are too large, custom binary encoding becomes necessary. Here are proven patterns for designing efficient custom formats.

Pattern 1: Fixed-Point Encoding

Instead of floating-point (4 bytes), use scaled integers:

Value Range Encoding Bytes Example
Temp: -40.0 to 85.0C int8 + 40 1 23.5C -> 64
Humidity: 0-100% uint8 1 65% -> 65
Voltage: 0-5.0V uint8 x 50 1 3.3V -> 165
GPS lat/lng int32 x 10^6 4 37.7749 -> 37774900

Pattern 2: Bit Packing

Combine multiple small values into single bytes:

// Pack 3 values into 1 byte:
// - Direction (4 bits: 0-15 -> N, NE, E, SE, S, SW, W, NW, ...)
// - Quality (2 bits: 0-3 -> Poor, Fair, Good, Excellent)
// - Alert (2 bits: 0-3 -> None, Low, Medium, High)
uint8_t packed = (direction << 4) | (quality << 2) | alert;

Pattern 3: Delta Encoding

For sequential readings, send differences instead of absolute values:

First message:  [timestamp][temp][humidity][...]  = 16 bytes
Delta messages: [delta_t (1 byte)][delta_temp (1 byte)][...]  = 4 bytes

Savings: 75% for time-series data!

Pattern 4: Enum Compression

Replace strings with numeric codes:

String Code Bytes Saved
“temperature” 0x01 10 bytes
“humidity” 0x02 7 bytes
“sensor-001” 0x0001 8 bytes

Versioning strategy (critical for future changes):

Byte 0: Version/Type field
  [0xV0-0xVF]: Version 0-15
  [0x01]: Sensor reading v1
  [0x81]: Sensor reading v1 with extended fields

Bytes 1-N: Payload (format depends on version)

Common pitfalls to avoid:

  1. Endianness: Always document byte order (prefer little-endian for ARM)
  2. Alignment: Ensure 2-byte values start at even offsets
  3. Overflow: Validate input ranges before encoding
  4. Magic numbers: Include a sync byte (0xAA, 0x55) for framing
Quick Check: Match the Binary Format to Its Characteristic


Question: A fleet of 500 delivery trucks sends GPS location, speed, fuel level, and engine diagnostics every 30 seconds to a central dispatch system. Current JSON payloads average 180 bytes. Cellular data costs $0.02/MB. Which format selection justifies its implementation effort?

A) Keep JSON - universal compatibility and debugging ease outweigh data costs at this scale

B) Migrate to CBOR - 50% size reduction pays back development costs in bandwidth savings within 3 months

C) Implement Protobuf - 75% size reduction and strong typing prevent data corruption in critical dispatch systems

D) Build custom binary - maximum efficiency is required for cellular networks

Correct: C) Implement Protobuf

Calculate the impact:

  • Messages per day: 500 trucks × 2 msgs/min × 60 min/hour × 24 hours = 1,440,000 messages/day
  • Daily data (JSON): 1.44M × 180 bytes = 247.19 MB/day = 7,415.77 MB/month
  • Monthly cost (JSON): 7,415.77 MB × $0.02/MB = $148.32/month

CBOR (50% reduction):

  • Size: 90 bytes/message
  • Monthly cost: $74.16/month
  • Savings: $74.16/month = $889.92/year
  • ROI: Pays back ~2 weeks of development

Protobuf (75% reduction):

  • Size: 45 bytes/message
  • Monthly cost: $37.08/month
  • Savings: $111.24/month = $1,334.88/year
  • Strong typing catches field mismatches at compile time (critical for dispatch)
  • ROI: Pays back 1-2 months of development

Why not custom binary?

  • Savings over Protobuf: Only $10-15/month additional (~10% reduction)
  • Cost: DIY maintenance, no tooling, version management burden
  • Risk: Data corruption without schema enforcement

Key insight: At 1.4M messages/day, Protobuf’s schema enforcement provides both cost savings AND reliability improvements. The strong typing prevents sending wrong data types (e.g., string instead of integer for fuel level) which could cause dispatch errors. Custom binary’s marginal efficiency gain does not justify losing type safety.

Interactive Format Comparison Calculator:


9.8 Size Comparison Visualization

Payload size comparison

How much wire overhead each format carries

Use JSON as the readable baseline, then measure what each binary option saves and what coordination cost it adds.

JSON

95 B
  • 100% baseline size
  • Best for debugging and browser tools
  • Highest bandwidth and battery cost

CBOR

50 B
  • 47% smaller than JSON
  • Same map-style data model
  • Good default when you still want self-description

Protobuf

22 B
  • 77% smaller than JSON
  • Strong typing and schema contracts
  • Best when producers and consumers can share a schema

Custom binary

16 B
  • 83% smaller than JSON
  • No field-name overhead at all
  • Only worth it when you also own parser, tests, and migration burden

Figure 3. The biggest savings come from dropping field names and textual syntax. The later jumps buy fewer bytes while increasing implementation and maintenance cost.

Figure 9.3
Alternative byte view

Where the bytes actually go

Most of JSON's size is not the sensor values themselves. It is the punctuation, field names, and repeated textual representation around those values.

JSON

95 B
Syntax ~15 B
Field names ~35 B
Actual values ~45 B
  • Great for humans, expensive for radios

CBOR

50 B
Type markers ~5 B
Field names ~25 B
Compact values ~20 B
  • Keeps names, compresses how values are stored

Protobuf

22 B
Field tags ~4 B
Typed values ~18 B
  • Schema replaces field names with tag numbers

Custom binary

16 B
Pure packed data ~16 B
  • Smallest wire size, highest long-term maintenance burden

Figure 4. Binary efficiency comes from removing different layers of overhead. JSON pays for readability, CBOR trims syntax, Protobuf trims names, and custom binary removes nearly all metadata.

Figure 9.4

9.9 Performance Benchmarks

Beyond payload size, parsing speed and memory usage vary significantly across formats and libraries.

Parsing Performance (ESP32, 240 MHz, typical IoT payload):

Format Library Parse Time Memory Notes
JSON ArduinoJson 1.2 ms 512 B Dynamic allocation
JSON cJSON 0.8 ms 384 B Simpler, lighter
CBOR tinycbor 0.3 ms 128 B Streaming parser
CBOR cn-cbor 0.4 ms 256 B Tree-based
Protobuf nanopb 0.1 ms 64 B Static allocation
Custom (manual) 0.05 ms 0 B No parsing overhead

Memory allocation strategies:

  1. Static allocation (Protobuf/nanopb): Pre-allocate based on schema
    • Pros: Predictable, no fragmentation
    • Cons: Wastes memory for variable-length fields
  2. Dynamic allocation (JSON/ArduinoJson): Allocate on parse
    • Pros: Flexible for variable payloads
    • Cons: Heap fragmentation risk, slower
  3. Streaming (CBOR/tinycbor): Process as bytes arrive
    • Pros: Minimal memory, real-time processing
    • Cons: No random access to fields

Library recommendations by platform:

Platform JSON CBOR Protobuf
ESP32/ESP8266 ArduinoJson tinycbor nanopb
STM32 cJSON cn-cbor nanopb
Raspberry Pi nlohmann/json libcbor protobuf-c
Python (cloud) json (stdlib) cbor2 protobuf
Node.js JSON.parse cbor protobufjs

Energy impact (NB-IoT transmission at 23 dBm):

Format Bytes TX Time Energy Battery Impact
JSON 95 47 ms 9.4 mJ Baseline
CBOR 50 25 ms 5.0 mJ 47% savings
Protobuf 22 11 ms 2.2 mJ 77% savings
Custom 16 8 ms 1.6 mJ 83% savings

Real-world impact: For a 5-year battery target with 96 messages/day, format choice can mean the difference between 4-year and 6-year battery life.


9.10 Format Trade-off Summary

Tradeoff: JSON vs Binary Formats (CBOR/Protobuf)

Decision context: When choosing between human-readable JSON and compact binary formats for IoT data serialization

Comparison at a glance:

  • Battery impact: JSON = high (large payloads); CBOR = medium (47% smaller); Protobuf = low (77% smaller)
  • Bandwidth: JSON = high (~95 bytes typical); CBOR = medium (~50 bytes); Protobuf = low (~22 bytes)
  • Latency: JSON = higher (parsing overhead); CBOR = medium; Protobuf = low (fast decode)
  • Readability: JSON = excellent (text editor); CBOR = requires tools; Protobuf = requires schema and tools
  • Flexibility: JSON = excellent (schemaless); CBOR = good (self-describing); Protobuf = moderate (schema required)
  • Schema evolution: JSON = easy (add fields anytime); CBOR = good; Protobuf = good (with planning)
  • Tooling: JSON = universal; CBOR = growing; Protobuf = strong (protoc, gRPC)
  • Development speed: JSON = fastest; CBOR = moderate; Protobuf = slower (schema first)

Choose JSON when:

  • Bandwidth is not constrained (Wi-Fi, Ethernet, LTE)
  • Development speed and debugging are priorities
  • Schema may change frequently during prototyping
  • Integrating with web services and REST APIs
  • Small deployments (<100 devices) where data costs are negligible

Choose CBOR when:

  • Bandwidth is limited but you need JSON-like flexibility (LoRaWAN, NB-IoT)
  • Migrating from JSON with minimal code changes
  • Self-describing format needed (no schema coordination required)
  • CoAP protocol usage (CBOR is the standard payload format)
  • Balance between efficiency and maintainability

Choose Protobuf when:

  • High-volume deployments (>1000 devices) where bandwidth savings matter
  • Strong typing and schema enforcement are requirements
  • Building gRPC-based microservices architecture
  • Long-term API contracts with multiple teams
  • Maximum efficiency needed with acceptable schema management overhead

Default recommendation: Start with JSON for prototyping, migrate to CBOR when bandwidth becomes a concern, use Protobuf for high-scale production systems with stable schemas


9.11 Summary

Key Points:

  • CBOR: Binary JSON with 47% size reduction, same data model, IETF standard
  • Protobuf: Schema-based with 77% reduction, strong typing, requires setup
  • Custom Binary: Maximum efficiency (83%) but high maintenance burden
  • Match serialization format to your sender/receiver types
  • Consider parsing speed and memory, not just payload size

Quick Reference:

  • CBOR: 50 bytes; low complexity; no schema; best for LoRaWAN, NB-IoT, and CoAP
  • Protobuf: 22 bytes; medium complexity; schema required; best for gRPC and high-volume systems
  • Custom binary: 16 bytes; high complexity; DIY schema/offset management; best for Sigfox and ultra-low-power deployments

9.12 Knowledge Check

9.13 What’s Next

Now that you can implement and evaluate binary data formats, explore these related topics:

  • Data Format Selection: decision guides with real-world deployment examples Apply the CBOR vs Protobuf vs Custom trade-offs to concrete scenarios.
  • Data Formats Practice: hands-on scenarios and worked exercises Reinforce binary encoding skills with guided practice problems.
  • CoAP Fundamentals: constrained application protocol architecture See CBOR as CoAP’s native payload format for constrained networks.
  • Packet Structure and Framing: protocol headers and byte-level framing Extend custom binary design to full protocol packet layouts.
  • Data Representation: binary and hexadecimal encoding fundamentals Deepen your understanding of the byte-level encodings used by CBOR and Protobuf.

Continue to Data Format Selection –>