%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#7F8C8D', 'clusterBkg': '#ECF0F1'}}}%%
graph LR
subgraph "Analogy: Sending a Package"
LETTER["<b>Letter (JSON)</b><br/>━━━━━━━━━━<br/>Dear recipient,<br/>I am sending you<br/>the temperature<br/>which is 23.5...<br/>━━━━━━━━━━<br/>Full sentences<br/>Large envelope"]
TELEGRAM["<b>Telegram (CBOR)</b><br/>━━━━━━━<br/>TEMP 23.5 STOP<br/>HUM 65 STOP<br/>━━━━━━━<br/>Abbreviated<br/>Medium size"]
BARCODE["<b>Barcode (Protobuf)</b><br/>━━━━<br/>barcode pattern<br/>(encoded data)<br/>━━━━<br/>Schema lookup<br/>Small label"]
end
subgraph "What Each Preserves"
PRESERVE["<b>All contain same info:</b><br/>• Device ID<br/>• Temperature<br/>• Humidity<br/>• Timestamp<br/>━━━━━━━━━━<br/>Different packaging<br/>Same content!"]
end
LETTER -->|"Remove<br/>verbosity"| TELEGRAM
TELEGRAM -->|"Use codes<br/>not words"| BARCODE
LETTER -.-> PRESERVE
TELEGRAM -.-> PRESERVE
BARCODE -.-> PRESERVE
style LETTER fill:#E67E22,stroke:#2C3E50,stroke-width:2px,color:#000
style TELEGRAM fill:#16A085,stroke:#2C3E50,stroke-width:2px,color:#fff
style BARCODE fill:#2C3E50,stroke:#16A085,stroke-width:2px,color:#fff
style PRESERVE fill:#ECF0F1,stroke:#7F8C8D,stroke-width:1px,color:#000
42 Binary Data Formats for IoT
42.1 Learning Objectives
By the end of this chapter, you will be able to:
- Implement CBOR encoding: Encode and decode sensor data using CBOR's binary format
- Design Protocol Buffer schemas: Create .proto files for efficient typed data serialization
- Build custom binary formats: Design byte-level encodings for ultra-constrained devices
- Choose between binary formats: Select CBOR, Protobuf, or custom binary based on requirements
- Handle schema evolution: Plan for future changes without breaking compatibility
This is part of a series on IoT Data Formats:
- IoT Data Formats Overview - Introduction and text formats
- Binary Data Formats (this chapter) - CBOR, Protobuf, custom binary
- Data Format Selection - Decision guides and real-world examples
- Data Formats Practice - Scenarios, quizzes, worked examples
Technical Deep Dives:
- Data Representation - Binary and hexadecimal encoding
- Packet Structure and Framing - Protocol headers and framing
- CoAP - CBOR's primary protocol
42.2 Prerequisites
Before starting this chapter, you should be familiar with:
- IoT Data Formats Overview: Understanding why data formats matter
- Data Representation: Binary encoding and byte operations
42.3 CBOR - Compact Binary Object Representation
CBOR is "binary JSON" - same data model, much smaller size.
Same data in CBOR:
A4 # Map with 4 pairs
68 6465766963654964 # "deviceId" (8-byte string)
6A 73656E736F722D303031 # "sensor-001" (10-byte string)
64 74656D70 # "temp"
F9 4DE0 # 23.5 (float16)
68 68756D6964697479 # "humidity"
18 41 # 65 (uint8)
69 74696D657374616D70 # "timestamp"
1A 657F3187 # 1702834567 (uint32)
Size: ~50 bytes (47% smaller than JSON!)
Pros:
- Much smaller than JSON (30-60% reduction)
- Faster parsing than JSON
- Same data model as JSON (easy migration)
- IETF standard (RFC 8949)
Cons:
- Not human-readable (need hex viewer + parser)
- Smaller ecosystem than JSON
- Still includes field names (overhead)
Best for: CoAP, MQTT over LoRaWAN, NB-IoT
Core Concept: Serialization converts in-memory data structures (objects, arrays, numbers) into a sequence of bytes that can be transmitted over a network or stored on disk, and deserialization reverses this process - the choice of serialization format determines message size, parsing speed, and cross-platform compatibility.
Why It Matters: Serialization is the bridge between your code and the network. A temperature reading stored as a 32-bit float (4 bytes) in memory becomes anywhere from 2 bytes (custom binary) to 50+ bytes (JSON with metadata) when serialized. This overhead multiplies across every message, every device, every day. For a 10,000-device fleet sending hourly updates, moving from ~95-byte JSON to ~50-byte CBOR messages saves roughly 45 bytes on each of 87.6 million messages, or about 4 GB/year of payload traffic, with proportional reductions in cellular data costs and battery consumption.
Key Takeaway: Match serialization format to your encoding/decoding location. If both sender and receiver are microcontrollers (embedded-to-embedded), use compact binary formats like CBOR or custom binary. If data flows to cloud services (embedded-to-cloud), CBOR or Protobuf balance efficiency with ecosystem support. If humans need to debug or inspect data (any-to-dashboard), keep JSON for at least the final hop. The encoding cost is paid once per message; choose based on who needs to read it.
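The fleet estimate can be checked with a few lines of arithmetic. This is a rough sketch that counts payload bytes only, assuming this chapter's ~95-byte JSON and ~50-byte CBOR messages and one message per device per hour. Protocol headers and retransmissions are ignored, and the constant names are illustrative.

```python
# Back-of-the-envelope fleet bandwidth estimate (payload bytes only).
# Assumptions: ~95-byte JSON and ~50-byte CBOR messages, one message
# per device per hour, no protocol overhead or retransmissions.
DEVICES = 10_000
MESSAGES_PER_DAY = 24
DAYS_PER_YEAR = 365
JSON_BYTES = 95
CBOR_BYTES = 50


def yearly_payload_bytes(bytes_per_message: int) -> int:
    """Total payload bytes the whole fleet sends in one year."""
    return DEVICES * MESSAGES_PER_DAY * DAYS_PER_YEAR * bytes_per_message


json_total = yearly_payload_bytes(JSON_BYTES)
cbor_total = yearly_payload_bytes(CBOR_BYTES)
saved = json_total - cbor_total

print(f"JSON:  {json_total / 1e9:.2f} GB/year")
print(f"CBOR:  {cbor_total / 1e9:.2f} GB/year")
print(f"Saved: {saved / 1e9:.2f} GB/year")
```

Changing `MESSAGES_PER_DAY` or the per-message sizes shows how quickly the total scales with reporting rate.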
CBOR's power comes from its rich type system that goes beyond JSON's limited types.
Major Type Encoding (first 3 bits of initial byte):
| Type | Range | Description | Example |
|---|---|---|---|
| 0 | 0x00-0x1F | Unsigned integer | 0x17 = 23 |
| 1 | 0x20-0x3F | Negative integer | 0x37 = -24 |
| 2 | 0x40-0x5F | Byte string | 0x44 + 4 bytes |
| 3 | 0x60-0x7F | Text string | 0x64 + "temp" |
| 4 | 0x80-0x9F | Array | 0x82 = 2-item array |
| 5 | 0xA0-0xBF | Map | 0xA4 = 4-pair map |
| 6 | 0xC0-0xDF | Tagged value | 0xC1 = epoch time |
| 7 | 0xE0-0xFF | Special/float | 0xF9 = float16 |
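To make the table concrete, the sketch below splits an initial byte into its major type (top 3 bits) and additional information (low 5 bits). The function and dictionary names are illustrative and do not come from any CBOR library.

```python
# Split a CBOR initial byte into major type and additional information.
MAJOR_TYPES = {
    0: "unsigned integer",
    1: "negative integer",
    2: "byte string",
    3: "text string",
    4: "array",
    5: "map",
    6: "tagged value",
    7: "special/float",
}


def split_initial_byte(byte: int) -> tuple[int, int]:
    """Return (major_type, additional_info) for a CBOR initial byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("initial byte must be in 0..255")
    return byte >> 5, byte & 0x1F


for b in (0x17, 0x37, 0x64, 0xA4, 0xC1, 0xF9):
    major, info = split_initial_byte(b)
    print(f"0x{b:02X}: major type {major} ({MAJOR_TYPES[major]}), "
          f"additional info {info}")
```

For major type 1 the encoded value is -1 minus the additional information, which is why 0x37 (additional info 23) means -24.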
Compact integer encoding:
- 0-23: Single byte (0x00 to 0x17)
- 24-255: Two bytes (0x18 + value)
- 256-65535: Three bytes (0x19 + 2-byte value)
- Larger: 0x1A (4-byte value) or 0x1B (8-byte value)
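The size thresholds above can be sketched as a small hand-written encoder using only Python's standard library. The helper names (`encode_head`, `encode_uint`, `encode_text`) are illustrative; a real project would normally rely on a CBOR library such as cbor2 instead.

```python
import struct


def encode_head(major_type: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus its argument in the shortest form."""
    if value < 0:
        raise ValueError("argument must be non-negative")
    prefix = major_type << 5
    if value <= 23:
        return bytes([prefix | value])                          # fits in the initial byte
    if value <= 0xFF:
        return bytes([prefix | 24, value])                      # 0x18 + 1 byte
    if value <= 0xFFFF:
        return bytes([prefix | 25]) + struct.pack(">H", value)  # 0x19 + 2 bytes
    if value <= 0xFFFFFFFF:
        return bytes([prefix | 26]) + struct.pack(">I", value)  # 0x1A + 4 bytes
    if value <= 0xFFFFFFFFFFFFFFFF:
        return bytes([prefix | 27]) + struct.pack(">Q", value)  # 0x1B + 8 bytes
    raise ValueError("argument does not fit in 64 bits")


def encode_uint(value: int) -> bytes:
    """Unsigned integer (major type 0)."""
    return encode_head(0, value)


def encode_text(text: str) -> bytes:
    """UTF-8 text string (major type 3): length head, then the bytes."""
    data = text.encode("utf-8")
    return encode_head(3, len(data)) + data


for sample in (23, 65, 1000, 1702834567):
    print(sample, "->", encode_uint(sample).hex().upper())
print("temp ->", encode_text("temp").hex().upper())
```

The same head encoding serves every major type, so string lengths and map sizes shrink by the same rules as integers.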
Tags useful for IoT (RFC 8949):
| Tag | Meaning | Use Case |
|---|---|---|
| 0 | Date/time string | ISO 8601 timestamps |
| 1 | Epoch timestamp | Unix time (compact) |
| 2 | Positive bignum | Large sensor IDs |
| 32 | URI | Resource identifiers |
| 55799 | Self-describe CBOR | Magic number for detection |
Float precision selection:
- 0xF9 + 2 bytes: float16 (3-4 significant digits)
- 0xFA + 4 bytes: float32 (7 significant digits)
- 0xFB + 8 bytes: float64 (15 significant digits)
IoT optimization tip: Use float16 for sensor readings (temp, humidity) where 3 digits of precision is sufficient. Saves 2-6 bytes per value!
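A minimal sketch of float width selection, using the half-precision format code `e` in Python's `struct` module. `encode_float` picks the shortest width that round-trips exactly, while `encode_float16_lossy` forces float16 and accepts the rounding error, as the tip suggests. Both function names are illustrative.

```python
import struct


def encode_float(value: float) -> bytes:
    """Encode as the shortest CBOR float that round-trips exactly."""
    for marker, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # value is out of range for this width
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([marker]) + packed
    return bytes([0xFB]) + struct.pack(">d", value)


def encode_float16_lossy(value: float) -> bytes:
    """Force float16, accepting the loss of precision."""
    return bytes([0xF9]) + struct.pack(">e", value)


def decode_float16(encoded: bytes) -> float:
    """Decode the 3-byte float16 form produced above."""
    return struct.unpack(">e", encoded[1:3])[0]


print(encode_float(23.5).hex().upper())   # 23.5 is exact in float16
print(len(encode_float(23.4)))            # 23.4 needs the 9-byte float64 form
print(decode_float16(encode_float16_lossy(23.4)))
```

Whether the rounded float16 value is acceptable depends on the sensor's real accuracy, so the lossy path is a per-field decision rather than a safe default.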
Debugging CBOR: Use the cbor2diag tool to convert binary to diagnostic notation:
echo "A2 64 74656D70 F9 4DE0 68 68756D6964697479 18 41" | xxd -r -p | cbor2diag
# Output: {"temp": 23.5, "humidity": 65}
42.4 Protocol Buffers (Protobuf)
Google's binary format with schema definition.
Schema file (.proto):
syntax = "proto3";

message SensorReading {
  string deviceId = 1;
  float temp = 2;
  uint32 humidity = 3;
  uint64 timestamp = 4;
}
Size: ~22 bytes (77% smaller than JSON!)
Pros:
- Extremely compact (no field names sent)
- Very fast parsing (code generation)
- Strong typing and schema evolution
- Good tooling (protoc compiler)
Cons:
- Requires schema file on both ends
- Not self-describing (can't parse without schema)
- More complex setup
Best for: High-volume data pipelines, gRPC APIs, edge-to-cloud
Understanding how Protocol Buffers encodes data helps you estimate payload sizes and debug wire format issues.
Binary encoding breakdown:
0A 0A 73656E736F722D303031 # Field 1 (deviceId): "sensor-001"
15 0000BC41 # Field 2 (temp): 23.5
18 41 # Field 3 (humidity): 65
20 87E3FCAB06 # Field 4 (timestamp): 1702834567 (varint)
Encoding rules:
| Wire Type | Meaning | Used For |
|---|---|---|
| 0 | Varint | int32, int64, uint32, uint64, bool, enum |
| 1 | 64-bit | fixed64, sfixed64, double |
| 2 | Length-delimited | string, bytes, embedded messages |
| 5 | 32-bit | fixed32, sfixed32, float |
Field tag format: (field_number << 3) | wire_type
- Field 1 (string): 0A = (1 << 3) | 2 = 0x0A
- Field 2 (float): 15 = (2 << 3) | 5 = 0x15
- Field 3 (uint32): 18 = (3 << 3) | 0 = 0x18
- Field 4 (uint64): 20 = (4 << 3) | 0 = 0x20
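The tag formula is easy to check in code. The sketch below builds and splits tag values for the four fields of `SensorReading`; the function names are illustrative and are not part of the protobuf library.

```python
# Build and split Protobuf field tags: (field_number << 3) | wire_type.
WIRE_TYPES = {0: "varint", 1: "64-bit", 2: "length-delimited", 5: "32-bit"}


def make_tag(field_number: int, wire_type: int) -> int:
    """Combine a field number and wire type into a tag value."""
    if wire_type not in WIRE_TYPES:
        raise ValueError(f"unsupported wire type {wire_type}")
    if field_number < 1:
        raise ValueError("field numbers start at 1")
    return (field_number << 3) | wire_type


def split_tag(tag: int) -> tuple[int, int]:
    """Return (field_number, wire_type) for a decoded tag value."""
    return tag >> 3, tag & 0x07


for number, wire_type in ((1, 2), (2, 5), (3, 0), (4, 0)):
    tag = make_tag(number, wire_type)
    print(f"field {number} ({WIRE_TYPES[wire_type]}): 0x{tag:02X}")
```

On the wire the tag is itself written as a varint, so field numbers 1 to 15 fit in a single byte and larger numbers take two or more. That is a reason to give the most frequently sent fields the lowest numbers.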
Varint encoding (for integers):
- Uses 7 bits per byte, MSB indicates continuation
- Small values are compact (1-2 bytes)
- Large values expand (up to 10 bytes for uint64)
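A minimal varint encoder and decoder, written by hand to show the 7-bits-per-byte rule. This is an illustrative sketch, not the protobuf library's implementation, and it does no bounds checking beyond Python's own `IndexError` on truncated input.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf varint."""
    if value < 0:
        raise ValueError("this sketch handles unsigned values only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)   # more bytes follow: set the MSB
        else:
            out.append(low7)          # last byte: MSB clear
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode one varint; return (value, offset of the next byte)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


for sample in (65, 300, 1702834567):
    print(sample, "->", encode_varint(sample).hex().upper())
```

The least significant 7-bit group is written first, which is why the timestamp's bytes look unrelated to its big-endian hex form.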
Schema evolution rules:
- New fields: Add with new field numbers (old clients ignore)
- Removed fields: Mark as reserved (never reuse numbers)
- Type changes: Only compatible types (int32 <-> int64)
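The first rule works because the wire type alone tells a decoder how many bytes to skip, even for a field number it has never seen. The sketch below is a hand-written reader, not generated protobuf code. It parses a newer message that carries a hypothetical field 5 (a battery reading invented for this example) while knowing only fields 1 to 4.

```python
import struct


def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one varint starting at pos; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_known_fields(data: bytes, known: set[int]) -> dict[int, object]:
    """Decode fields whose numbers are in `known`; skip everything else."""
    fields: dict[int, object] = {}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:                       # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 1:                     # fixed 64-bit
            value, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 2:                     # length-delimited
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:                     # fixed 32-bit
            value, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if number in known:
            fields[number] = value               # unknown numbers are dropped
    return fields


# SensorReading as encoded in this chapter, fields 1-4.
V1 = (bytes.fromhex("0a0a") + b"sensor-001"
      + bytes.fromhex("150000bc41" "1841" "2087e3fcab06"))
# Hypothetical newer message: adds field 5 (varint 3300).
V2 = V1 + bytes.fromhex("28e419")

old_client = parse_known_fields(V2, known={1, 2, 3, 4})
print(old_client[1], struct.unpack("<f", old_client[2])[0],
      old_client[3], old_client[4])
```

Reusing a retired field number would defeat this mechanism, because an old client would interpret the new value with the old field's meaning.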
42.5 Custom Binary Formats
For ultimate efficiency, define your own binary format.
Example: Same data in 16 bytes
Byte layout:
[0-9]: deviceId "sensor-001" (10 bytes, no null terminator)
[10]: temp = 235 (uint8, value x 10; covers 0.0-25.5 C only)
[11]: humidity = 65 (uint8)
[12-15]: timestamp (uint32, seconds since epoch)
Total: 16 bytes
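The layout above maps directly onto Python's `struct` module. This sketch assumes little-endian byte order for the timestamp, which the layout itself does not specify, and the function names are illustrative.

```python
import struct

# '<' = little-endian, no padding; 10s = 10-byte ID, B = uint8, I = uint32.
# Byte order is an assumption; the layout above does not state it.
READING = struct.Struct("<10sBBI")


def pack_reading(device_id: str, temp_c: float,
                 humidity: int, timestamp: int) -> bytes:
    """Pack one reading into the 16-byte custom layout."""
    ident = device_id.encode("ascii")
    if len(ident) != 10:
        raise ValueError("device ID must be exactly 10 ASCII characters")
    temp_tenths = round(temp_c * 10)
    if not 0 <= temp_tenths <= 255:
        raise ValueError("temperature outside the 0.0-25.5 C range of uint8 x 10")
    return READING.pack(ident, temp_tenths, humidity, timestamp)


def unpack_reading(frame: bytes) -> tuple[str, float, int, int]:
    """Reverse pack_reading."""
    ident, temp_tenths, humidity, timestamp = READING.unpack(frame)
    return ident.decode("ascii"), temp_tenths / 10, humidity, timestamp


frame = pack_reading("sensor-001", 23.5, 65, 1702834567)
print(len(frame), frame.hex().upper())
print(unpack_reading(frame))
```

The range check matters here: a uint8 scaled by 10 tops out at 25.5, so a wider temperature range needs an offset, a coarser step, or a second byte.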
Pros:
- Smallest possible size
- Fastest parsing (no overhead)
- Complete control
Cons:
- No tooling, DIY everything
- No schema evolution (breaking changes)
- Not self-describing
- Maintenance burden
Best for: Extremely constrained devices (Sigfox, ultra-low-power)
When standard formats are too large, custom binary encoding becomes necessary. Here are proven patterns for designing efficient custom formats.
Pattern 1: Fixed-Point Encoding
Instead of floating-point (4 bytes), use scaled integers:
| Value Range | Encoding | Bytes | Example |
|---|---|---|---|
| Temp: -40.0 to 85.0C | uint8 = T + 40 (1C steps) | 1 | 23.5C -> 64 (rounded) |
| Humidity: 0-100% | uint8 | 1 | 65% -> 65 |
| Voltage: 0-5.0V | uint8 x 50 | 1 | 3.3V -> 165 |
| GPS lat/lng | int32 x 10^6 | 4 | 37.7749 -> 37774900 |
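The table rows can be sketched as encode/decode pairs. The function names are illustrative, and rounding half up for the temperature row is an assumption chosen to reproduce the table's 23.5C -> 64 example.

```python
import math


def encode_temp(temp_c: float) -> int:
    """-40..85 C as one byte: offset by 40, rounded half up to 1 C steps."""
    if not -40.0 <= temp_c <= 85.0:
        raise ValueError("temperature out of range")
    return math.floor(temp_c + 40 + 0.5)


def decode_temp(raw: int) -> int:
    return raw - 40


def encode_voltage(volts: float) -> int:
    """0..5 V as one byte: scaled by 50, giving 0.02 V steps."""
    if not 0.0 <= volts <= 5.0:
        raise ValueError("voltage out of range")
    return round(volts * 50)


def decode_voltage(raw: int) -> float:
    return raw / 50


def encode_coord(degrees: float) -> int:
    """Latitude/longitude as int32: scaled by 10^6 (micro-degrees)."""
    if not -180.0 <= degrees <= 180.0:
        raise ValueError("coordinate out of range")
    return round(degrees * 1_000_000)


def decode_coord(raw: int) -> float:
    return raw / 1_000_000


print(encode_temp(23.5), decode_temp(encode_temp(23.5)))
print(encode_voltage(3.3), decode_voltage(encode_voltage(3.3)))
print(encode_coord(37.7749), decode_coord(encode_coord(37.7749)))
```

The scale factor sets the resolution: the one-byte temperature loses the half degree, while the voltage and coordinate encodings keep every digit shown in the table.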
Pattern 2: Bit Packing
Combine multiple small values into single bytes:
// Pack 3 values into 1 byte:
// - Direction (4 bits: 0-15 -> N, NE, E, SE, S, SW, W, NW, ...)
// - Quality (2 bits: 0-3 -> Poor, Fair, Good, Excellent)
// - Alert (2 bits: 0-3 -> None, Low, Medium, High)
// Masks keep an out-of-range value from spilling into a neighboring field.
uint8_t packed = (uint8_t)(((direction & 0x0F) << 4) | ((quality & 0x03) << 2) | (alert & 0x03));
Pattern 3: Delta Encoding
For sequential readings, send differences instead of absolute values:
First message: [timestamp][temp][humidity][...] = 16 bytes
Delta messages: [delta_t (1 byte)][delta_temp (1 byte)][...] = 4 bytes
Savings: 75% for time-series data!
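A simplified sketch of the idea for a single series of readings: the first value is sent in full as a uint16 and each later value as a signed one-byte difference. The layout and function names are assumptions for illustration and are simpler than the multi-field message outlined above.

```python
import struct


def encode_deltas(values: list[int]) -> bytes:
    """First value as uint16, then one signed byte per difference."""
    if not values:
        return b""
    out = bytearray(struct.pack("<H", values[0]))
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        if not -128 <= delta <= 127:
            raise ValueError(f"delta {delta} does not fit in int8")
        out += struct.pack("<b", delta)
    return bytes(out)


def decode_deltas(data: bytes) -> list[int]:
    """Rebuild the absolute values by accumulating the differences."""
    if not data:
        return []
    (value,) = struct.unpack_from("<H", data, 0)
    values = [value]
    for delta in struct.unpack(f"<{len(data) - 2}b", data[2:]):
        value += delta
        values.append(value)
    return values


temps_tenths = [235, 236, 236, 234, 237]   # 23.5, 23.6, 23.6, 23.4, 23.7 C
encoded = encode_deltas(temps_tenths)
print(len(encoded), encoded.hex().upper())
print(decode_deltas(encoded))
```

A lost message breaks every later value in the chain, so real deployments usually resend an absolute reading at regular intervals.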
Pattern 4: Enum Compression
Replace strings with numeric codes:
| String | Code | Bytes Saved |
|---|---|---|
| "temperature" | 0x01 | 10 bytes |
| "humidity" | 0x02 | 7 bytes |
| "sensor-001" | 0x0001 | 8 bytes |
Versioning strategy (critical for future changes):
Byte 0: Version/Type field
[0xV0-0xVF]: Version 0-15
[0x01]: Sensor reading v1
[0x81]: Sensor reading v1 with extended fields
Bytes 1-N: Payload (format depends on version)
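One way to act on the version byte is a dispatch table keyed by byte 0. The payload layouts below (temperature in tenths, humidity, and an extra battery field for 0x81) are hypothetical and exist only to make the sketch runnable.

```python
import struct


def parse_v1(payload: bytes) -> dict:
    """Hypothetical v1 payload: temp in tenths (uint8), humidity (uint8)."""
    temp_tenths, humidity = struct.unpack_from("<BB", payload, 0)
    return {"temp": temp_tenths / 10, "humidity": humidity}


def parse_v1_extended(payload: bytes) -> dict:
    """Hypothetical extended payload: v1 fields plus battery in mV (uint16)."""
    reading = parse_v1(payload)
    (battery_mv,) = struct.unpack_from("<H", payload, 2)
    reading["battery_mv"] = battery_mv
    return reading


# Byte 0 selects the parser; unknown values are rejected, not guessed at.
PARSERS = {0x01: parse_v1, 0x81: parse_v1_extended}


def parse_message(message: bytes) -> dict:
    if not message:
        raise ValueError("empty message")
    parser = PARSERS.get(message[0])
    if parser is None:
        raise ValueError(f"unknown version/type byte 0x{message[0]:02X}")
    return parser(message[1:])


print(parse_message(bytes([0x01, 235, 65])))
print(parse_message(bytes([0x81, 235, 65]) + struct.pack("<H", 3300)))
```

Rejecting unknown version bytes lets a receiver fail loudly when firmware in the field is newer than the decoder.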
Common pitfalls to avoid:
- Endianness: Always document byte order (prefer little-endian for ARM)
- Alignment: Ensure 2-byte values start at even offsets
- Overflow: Validate input ranges before encoding
- Magic numbers: Include a sync byte (0xAA, 0x55) for framing
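The framing and byte-order points can be sketched together. The frame layout here (two sync bytes, a one-byte length, the payload, and a one-byte additive checksum) is an assumption for illustration. Only the sync bytes come from the list above.

```python
import struct
from typing import Optional

SYNC = b"\xAA\x55"


def build_frame(payload: bytes) -> bytes:
    """SYNC + length (uint8) + payload + additive checksum (uint8)."""
    if len(payload) > 255:
        raise ValueError("payload too long for a 1-byte length field")
    checksum = sum(payload) & 0xFF
    return SYNC + bytes([len(payload)]) + payload + bytes([checksum])


def find_frame(stream: bytes) -> Optional[bytes]:
    """Return the first payload whose length and checksum are consistent."""
    start = stream.find(SYNC)
    while start != -1:
        if start + 3 <= len(stream):
            length = stream[start + 2]
            end = start + 3 + length
            body = stream[start + 3:end]
            if end < len(stream) and (sum(body) & 0xFF) == stream[end]:
                return body
        start = stream.find(SYNC, start + 1)   # false sync: keep scanning
    return None


# Explicit byte order: '<' is little-endian, '>' is big-endian.
payload = struct.pack("<HB", 3300, 65)          # battery mV, humidity
noise = b"\xAA\x55\x02\x01"                     # looks like a sync, is not a frame
stream = noise + build_frame(payload)

print(struct.pack("<H", 3300).hex(), struct.pack(">H", 3300).hex())
print(find_frame(stream) == payload)
```

The sync bytes alone are not enough, since the same pattern can occur inside data. The length and checksum let the receiver discard false matches and resynchronize.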
42.6 Size Comparison Visualization
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#7F8C8D', 'clusterBkg': '#ECF0F1'}}}%%
graph TB
subgraph "Where Do the Bytes Go?"
direction TB
subgraph JSON["JSON: 95 bytes total"]
J_SYNTAX["Syntax overhead<br/>braces, quotes, commas<br/>~15 bytes"]
J_NAMES["Field names<br/>deviceId, temp, humidity, ts<br/>~35 bytes"]
J_VALUES["Actual data values<br/>sensor-001, 23.5, 65, 1702834567<br/>~45 bytes"]
end
subgraph CBOR["CBOR: 50 bytes total"]
C_HEADER["Type markers<br/>A4, 68, F9, 18, 1A<br/>~5 bytes"]
C_NAMES["Field names (binary)<br/>Same names, compact encoding<br/>~25 bytes"]
C_VALUES["Compact values<br/>float16, uint8, uint32<br/>~20 bytes"]
end
subgraph PROTO["Protobuf: 22 bytes total"]
P_TAGS["Field tags<br/>1, 2, 3, 4 (not names)<br/>~4 bytes"]
P_VALUES["Typed values<br/>Efficient encoding<br/>~18 bytes"]
end
subgraph CUSTOM["Custom: 16 bytes total"]
X_VALUES["Pure data only<br/>No overhead at all<br/>16 bytes"]
end
end
J_SYNTAX --> J_NAMES --> J_VALUES
C_HEADER --> C_NAMES --> C_VALUES
P_TAGS --> P_VALUES
X_VALUES
style J_SYNTAX fill:#7F8C8D,stroke:#2C3E50,stroke-width:1px,color:#fff
style J_NAMES fill:#E67E22,stroke:#2C3E50,stroke-width:1px,color:#000
style J_VALUES fill:#16A085,stroke:#2C3E50,stroke-width:1px,color:#fff
style C_HEADER fill:#7F8C8D,stroke:#2C3E50,stroke-width:1px,color:#fff
style C_NAMES fill:#E67E22,stroke:#2C3E50,stroke-width:1px,color:#000
style C_VALUES fill:#16A085,stroke:#2C3E50,stroke-width:1px,color:#fff
style P_TAGS fill:#7F8C8D,stroke:#2C3E50,stroke-width:1px,color:#fff
style P_VALUES fill:#16A085,stroke:#2C3E50,stroke-width:1px,color:#fff
style X_VALUES fill:#16A085,stroke:#2C3E50,stroke-width:2px,color:#fff
42.7 Performance Benchmarks
Beyond payload size, parsing speed and memory usage vary significantly across formats and libraries.
Parsing Performance (ESP32, 240 MHz, typical IoT payload):
| Format | Library | Parse Time | Memory | Notes |
|---|---|---|---|---|
| JSON | ArduinoJson | 1.2 ms | 512 B | Dynamic allocation |
| JSON | cJSON | 0.8 ms | 384 B | Simpler, lighter |
| CBOR | tinycbor | 0.3 ms | 128 B | Streaming parser |
| CBOR | cn-cbor | 0.4 ms | 256 B | Tree-based |
| Protobuf | nanopb | 0.1 ms | 64 B | Static allocation |
| Custom | (manual) | 0.05 ms | 0 B | No parsing overhead |
Memory allocation strategies:
- Static allocation (Protobuf/nanopb): Pre-allocate based on schema
- Pros: Predictable, no fragmentation
- Cons: Wastes memory for variable-length fields
- Dynamic allocation (JSON/ArduinoJson): Allocate on parse
- Pros: Flexible for variable payloads
- Cons: Heap fragmentation risk, slower
- Streaming (CBOR/tinycbor): Process as bytes arrive
- Pros: Minimal memory, real-time processing
- Cons: No random access to fields
Library recommendations by platform:
| Platform | JSON | CBOR | Protobuf |
|---|---|---|---|
| ESP32/ESP8266 | ArduinoJson | tinycbor | nanopb |
| STM32 | cJSON | cn-cbor | nanopb |
| Raspberry Pi | nlohmann/json | libcbor | protobuf-c |
| Python (cloud) | json (stdlib) | cbor2 | protobuf |
| Node.js | JSON.parse | cbor | protobufjs |
Energy impact (NB-IoT transmission at 23 dBm):
| Format | Bytes | TX Time | Energy | Battery Impact |
|---|---|---|---|---|
| JSON | 95 | 47 ms | 9.4 mJ | Baseline |
| CBOR | 50 | 25 ms | 5.0 mJ | 47% savings |
| Protobuf | 22 | 11 ms | 2.2 mJ | 77% savings |
| Custom | 16 | 8 ms | 1.6 mJ | 83% savings |
Real-world impact: For a 5-year battery target with 96 messages/day, format choice can mean the difference between 4-year and 6-year battery life.
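The energy table can be approximated from first principles. This sketch assumes transmit time scales linearly with payload size from the JSON row (95 bytes in 47 ms) and that the radio draws its full 23 dBm output power while sending. Fixed per-transmission overhead and amplifier efficiency are ignored, so the numbers are indicative only.

```python
# Approximate TX time and energy per message from payload size.
TX_POWER_DBM = 23
JSON_BYTES, JSON_TX_MS = 95, 47          # reference row from the table


def tx_power_watts(dbm: float) -> float:
    """Convert dBm to watts: P(mW) = 10^(dBm / 10)."""
    return 10 ** (dbm / 10) / 1000


def tx_cost(payload_bytes: int) -> tuple[float, float]:
    """Return (tx_time_ms, energy_mJ), scaled linearly from the JSON row."""
    tx_ms = payload_bytes * JSON_TX_MS / JSON_BYTES
    return tx_ms, tx_ms * tx_power_watts(TX_POWER_DBM)   # ms x W = mJ


def saving_percent(payload_bytes: int) -> float:
    """Size reduction relative to the JSON baseline."""
    return 100 * (1 - payload_bytes / JSON_BYTES)


for name, size in (("JSON", 95), ("CBOR", 50), ("Protobuf", 22), ("Custom", 16)):
    ms, mj = tx_cost(size)
    print(f"{name:9s} {size:3d} B  {ms:5.1f} ms  {mj:4.1f} mJ  "
          f"{saving_percent(size):4.0f}% smaller")
```

Because energy per message is proportional to size under these assumptions, the percentage saved in bytes carries straight through to the radio's share of the battery budget.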
42.8 Format Trade-off Summary
Decision context: When choosing between human-readable JSON and compact binary formats for IoT data serialization
| Factor | JSON | CBOR | Protobuf |
|---|---|---|---|
| Battery impact | High (large payloads) | Medium (47% smaller) | Low (77% smaller) |
| Bandwidth | High (~95 bytes typical) | Medium (~50 bytes) | Low (~22 bytes) |
| Latency | Higher (parsing overhead) | Medium | Low (fast decode) |
| Readability | Excellent (text editor) | Requires tools | Requires schema + tools |
| Flexibility | Excellent (schemaless) | Good (self-describing) | Moderate (schema required) |
| Schema evolution | Easy (add fields anytime) | Good | Good (with planning) |
| Tooling | Universal | Growing | Strong (protoc, gRPC) |
| Development speed | Fastest | Moderate | Slower (schema first) |
Choose JSON when:
- Bandwidth is not constrained (Wi-Fi, Ethernet, LTE)
- Development speed and debugging are priorities
- Schema may change frequently during prototyping
- Integrating with web services and REST APIs
- Small deployments (<100 devices) where data costs are negligible
Choose CBOR when:
- Bandwidth is limited but you need JSON-like flexibility (LoRaWAN, NB-IoT)
- Migrating from JSON with minimal code changes
- Self-describing format needed (no schema coordination required)
- CoAP protocol usage (CBOR is the standard payload format)
- Balance between efficiency and maintainability
Choose Protobuf when:
- High-volume deployments (>1000 devices) where bandwidth savings matter
- Strong typing and schema enforcement are requirements
- Building gRPC-based microservices architecture
- Long-term API contracts with multiple teams
- Maximum efficiency needed with acceptable schema management overhead
Default recommendation: Start with JSON for prototyping, migrate to CBOR when bandwidth becomes a concern, and use Protobuf for high-scale production systems with stable schemas.
42.9 Summary
Key Points:
- CBOR: Binary JSON with 47% size reduction, same data model, IETF standard
- Protobuf: Schema-based with 77% reduction, strong typing, requires setup
- Custom Binary: Maximum efficiency (83%) but high maintenance burden
- Match serialization format to your sender/receiver types
- Consider parsing speed and memory, not just payload size
Quick Reference:
| Format | Size | Complexity | Schema | Best For |
|---|---|---|---|---|
| CBOR | 50 bytes | Low | None | LoRaWAN, NB-IoT, CoAP |
| Protobuf | 22 bytes | Medium | Required | gRPC, high-volume |
| Custom | 16 bytes | High | DIY | Sigfox, ultra-low-power |
42.10 What's Next
Now that you understand binary data formats and their trade-offs:
- Next: Data Format Selection - Decision guides with real-world examples to help you choose
- Practice: Data Formats Practice - Work through scenarios and quizzes
- Apply: CoAP Fundamentals - See CBOR in action with CoAP