Use the format decision tree to choose between JSON, CBOR, Protobuf, and custom binary. If bandwidth is not constrained (Wi-Fi/LTE), use JSON. If constrained (LoRa/Sigfox), evaluate volume and byte limits to select CBOR, Protobuf, or custom binary. Always consider total cost of ownership – not just payload size.
10.1 Learning Objectives
By the end of this chapter, you will be able to:
Apply the format decision tree: Systematically evaluate constraints to select the right format for a given IoT scenario
Calculate total cost of ownership: Quantify bandwidth, battery, development, and maintenance costs for competing format choices
Classify IoT applications by format suitability: Map real-world use cases to JSON, CBOR, Protobuf, or custom binary based on network and power constraints
Design format migration strategies: Architect a phased transition from JSON to binary formats while maintaining backward compatibility
Justify format decisions to stakeholders: Present trade-off analyses with concrete data on cost, battery life, and schema evolution
CBOR: Binary self-describing format — best choice when schema distribution is impractical and size matters
Protocol Buffers: Schema-enforced binary format — optimal for high-throughput pipelines where schema management is acceptable
Schema Registry: Central repository storing format versions — prevents incompatibility between producers and consumers
Compression: gzip/zstd applied over any format further reduces size 60-80% — beneficial for text-heavy payloads
Self-Description: Format carrying its own type information (JSON, CBOR) versus requiring external schema (protobuf, Avro)
Parsing Overhead: CPU cost to deserialize: JSON (slow, text parsing) < CBOR (fast, binary) < Protobuf (fastest, compiled)
10.2 For Beginners: Data Format Selection
Choosing a data format is like deciding how to pack a suitcase. JSON is like packing in clear labeled bags – easy to see what is inside, but takes more space. Binary formats like CBOR are like vacuum-sealing everything – much smaller, but harder to inspect. For IoT devices with limited battery and slow connections, picking the right format can mean the difference between a sensor lasting one year or five years.
Flowchart decision tree starting with navy Start box asking Choose Data Format. First decision (gray): Bandwidth Constrained? (LoRa, Sigfox, NB-IoT). No path (Wi-Fi, Ethernet, LTE) leads to orange JSON box (Simple, Universal, Best for Wi-Fi/LTE). Yes path (LoRa, Sigfox) leads to second decision: Need Human Readability? (Debugging, Dev). Yes path leads to orange JSON or CBOR with tools box. No path leads to third decision: High Volume? (>1000 devices, >1 msg/min). No path (Small deployment) leads to teal CBOR box (Balance efficiency and flexibility). Yes path (>1000 devices) leads to fourth decision: Every Byte Critical? (e.g., 12 byte limit). No path leads to teal Protobuf or CBOR box (Typed, efficient pipelines). Yes path (Sigfox, ultra-LP) leads to navy Custom Binary box (Ultimate efficiency). Color coding: Navy for start/custom binary, Gray for decision points, Orange for JSON options, Teal for CBOR/Protobuf options.
Figure 10.1: Format Selection Decision Tree: Interactive flowchart for choosing the right IoT data format based on network constraints and application requirements. The tree starts by evaluating bandwidth constraints (Wi-Fi/LTE vs LoRa/Sigfox). High-bandwidth networks can use JSON for simplicity. Constrained networks require further evaluation: human readability needs (debugging vs production), message volume (small deployment vs high-scale fleet), and byte-level criticality (standard protocols vs ultra-constrained like Sigfox’s 12-byte limit). Each path leads to the optimal format choice balancing efficiency, maintainability, and ecosystem support.
10.4.1 Step-by-Step Decision Process
Question 1: Is bandwidth severely constrained?
No (Wi-Fi, Ethernet, LTE): Use JSON for simplicity
Yes (LoRa, Sigfox, NB-IoT): Continue to Q2
Question 2: Do you need human readability for debugging?
Yes: Use JSON or CBOR with tooling
No: Continue to Q3
Question 3: Do you have many devices with high message volume?
Yes (>1000 devices, >1 msg/min): Use Protobuf or CBOR
No (small deployment): Use CBOR for balance
Question 4: Is every byte critical? (Sigfox: 12 bytes max)
Yes: Custom binary format
No: Protobuf or CBOR
Place the format selection decision steps in the correct order:
10.4.2 Interactive Format Cost Calculator
Try this calculator to compare data format costs for your own IoT deployment:
Real-World Application Map - Instead of abstract decision criteria, this diagram shows concrete IoT applications grouped by their ideal data format. Smart home devices with Wi-Fi use JSON for easy debugging. Agricultural sensors on LoRaWAN benefit from CBOR’s balance. Industrial IoT at scale demands Protobuf’s efficiency. Only ultra-constrained scenarios like Sigfox (12-byte limit) or satellite links justify custom binary. Students can identify their project type and immediately see the recommended format.
Figure 10.2
10.5.1 Application-Format Reference Table
Application
Protocol
Format
Rationale
Smart home thermostat
Wi-Fi + MQTT
JSON
Bandwidth plentiful, debugging important
Agricultural soil sensor
LoRaWAN
CBOR
Bandwidth limited, need flexibility
City parking sensor
Sigfox
Custom binary
12-byte limit, fixed message structure
Industrial gateway
Ethernet + gRPC
Protobuf
High volume, strong typing needed
Wearable fitness tracker
BLE
Custom binary
Power-sensitive, fixed data structure
Match each IoT application to its recommended data format:
10.6 Learning Scenario: Soil Moisture Network
Learning Scenario: Choosing a Format for Soil Moisture Network
Your Challenge: You’re designing a precision agriculture system for a large vineyard. You need to monitor soil moisture to optimize irrigation and prevent crop damage.
System Requirements:
100 battery-powered sensor nodes placed throughout vineyard
Cellular backhaul (NB-IoT): $0.01/KB data cost, 250 KB/month included per SIM
Readings every 15 minutes (96 readings/day)
Battery life target: 5 years on 2x AA batteries
Sensors measure: Soil moisture (0-100%), temperature (-20 to 60C), battery voltage (2.0-3.6V), GPS location (once/day)
Your Mission: Choose the optimal data format and calculate the real costs.
10.6.1Step 1: Calculate Message Volume
Think: How many messages per year per node?
Click to reveal calculation
Readings per day: 96 (every 15 minutes)
Days per year: 365
Total messages/year/node: 96 x 365 = 35,040 messages
Fleet total: 35,040 x 100 nodes = 3,504,000 messages/year
10.6.2Step 2: Compare Format Options
You’re considering three formats. Let’s analyze each:
Cons: Largest size, more battery drain for transmission
10.6.2.2Option B: CBOR (Balanced)
Binary encoding with same structure: - Size: 42 bytes per message (46% smaller than JSON) - Pros: 50% bandwidth savings, standard format, easier debugging than custom binary - Cons: Less ecosystem than JSON, requires CBOR library
Battery capacity: 2x AA batteries = 2 x 2500 mAh x 3V = 15 Wh total
Transmission as % of battery (assuming other circuitry uses 50% of battery): - JSON: 7.6 Wh / 7.5 Wh available = 101% of available budget - 4.9 year battery life - CBOR: 4.1 Wh / 7.5 Wh = 55% of budget - 5+ year target achieved - Custom: 1.3 Wh / 7.5 Wh = 17% of budget - 5+ year target easily met
Verdict: JSON fails the 5-year battery life requirement!
10.6.6Step 6: Final Recommendation
Compare all factors:
Factor
JSON
CBOR
Custom Binary
Summer data cost
$1,040/year overage
$0
$0
Battery life
4.9 years (misses target)
5+ years
5+ years
Debugging ease
Excellent (text editor)
Requires CBOR tools
Custom parser needed
Schema evolution
Easy (add fields)
Moderate (CBOR flexible)
Rigid (breaking changes)
Development time
1 day (JSON libs everywhere)
2-3 days (CBOR setup)
1-2 weeks (custom parser)
Maintenance burden
Low (standard format)
Medium (CBOR docs)
High (DIY everything)
Total 5-year TCO
$5,200 (overage costs)
$0 (no overage)
$0 (no overage)
10.6.7Your Recommendation: CBOR (Option B)
Rationale:
Meets battery life target (5+ years) with 46% size reduction vs JSON
Zero overage costs even with summer spike (1.09 MB < 3 MB limit)
Flexible schema - Can add new sensor types without breaking existing nodes
Standard format - CBOR libraries exist for embedded C, Python, cloud processing
Reasonable debugging - Tools like cbor2diag convert binary to human-readable
Moderate setup - 2-3 days to integrate CBOR library, but well-documented
Why not custom binary?
Custom saves only 0.63 MB/year (1.47 - 0.46 MB) per node vs CBOR
Zero cost benefit (both are under data cap)
Battery savings: 2.8 Wh/year = 6 extra months of battery life
Trade-off: 6 months extra battery vs 2 weeks development + ongoing maintenance burden
Verdict: Not worth it unless battery life is absolutely critical
Why not JSON?
Fails 5-year battery life target (4.9 years)
$1,040/year overage costs during summer spike
Total 5-year cost: $5,200 vs $0 for CBOR
10.6.8Real-World Lesson
Key Insight: Don’t just optimize for bytes - optimize for total cost of ownership (TCO): - Data costs - Battery replacement costs (labor + materials) - Development time costs - Maintenance burden costs
In this case, CBOR provides 80% of the efficiency of custom binary with 20% of the engineering effort. The sweet spot!
Bonus: With CBOR, you can easily add new fields (soil pH, nutrient levels) next season without touching deployed hardware - just update the cloud parser. With custom binary, that’s a breaking change requiring firmware updates or protocol versioning.
10.7 Fleet Tracking Quiz
Quiz: Format Selection for Fleet Tracking
Scenario: You’re building a fleet tracking system for 500 delivery trucks. Each truck sends GPS updates:
Cost analysis: Since all formats fit within the $5/month plan, costs are identical: $5 x 500 = $2,500/month
BUT - what if you want to upgrade update frequency to every 10 seconds?
10-second updates (3x more frequent):
JSON: 4.32 MB/month - Still under 50 MB
CBOR: 2.22 MB/month - Still under 50 MB
Custom: 0.96 MB/month - Still under 50 MB
Best choice: CBOR (Option B)
52% smaller than JSON (bandwidth efficient)
Self-describing format (schema flexibility)
Standard libraries available
Easy debugging with CBOR tools
Fits well within data plan even with future growth
Why not custom binary?
Saves only 0.42 MB/month per truck (1% of data plan)
Loses flexibility for schema changes
Harder to debug in production
Not worth the maintenance burden for minimal savings
Real-world lesson: Choose formats based on flexibility and maintainability, not just raw size. CBOR provides 80% of custom binary’s efficiency with 20% of the complexity. In this case, all formats fit comfortably within the data budget, so optimize for developer productivity, not bytes.
Putting Numbers to It: Fleet Tracking Cost Reality
For a cellular-connected fleet tracker, format overhead directly impacts your bill:
The $350/year CBOR→Custom savings seems significant, but amortize the $15K engineering cost: breakeven takes 43 years. For most deployments, CBOR’s sweet spot (half the bandwidth of JSON, standard tooling) wins.
Decision Framework: Format Selection for Your IoT Project
Use this decision framework to choose the right data format based on your specific constraints:
10.7.1 Step 1: Identify Your Primary Constraint
Constraint
Go to Step
Bandwidth-limited (LoRa, Sigfox, NB-IoT)
Step 2
Battery-critical (10+ year lifespan)
Step 2
Wi-Fi/Ethernet (bandwidth plentiful)
Use JSON (stop here)
Rapid prototyping (speed over efficiency)
Use JSON (stop here)
10.7.2 Step 2: Calculate Your Byte Budget
Total available bytes per message = MTU - Protocol Headers
Network
MTU
Headers
Available for Payload
LoRaWAN SF12
51
13
38 bytes
Sigfox
12
0
12 bytes
NB-IoT
1500
48 (IPv6 + UDP)
1452 bytes
BLE (default ATT MTU)
23
3
20 bytes
Example: LoRaWAN → 38 bytes available for your sensor data
10.7.3 Step 3: Evaluate Format Options Against Your Payload
For your specific sensor readings, estimate size with each format:
Example: Temperature (16-bit), Humidity (8-bit), Battery (8-bit), Timestamp (32-bit) = 8 bytes raw
Format
Size
Fits in 38 bytes?
Efficiency
JSON
~45 bytes
NO - exceeds budget
N/A
CBOR
~18 bytes
Yes
44% (8/18)
Protobuf
~12 bytes
Yes
67% (8/12)
Custom binary
8 bytes
Yes
100% (8/8)
10.7.4 Step 4: Apply Decision Rules
Scenario
Recommended Format
Why
Payload fits in JSON within budget
JSON
Simplest, most debuggable
Payload doesn’t fit, but deployment is small (<100 devices)
CBOR
Balance of efficiency and flexibility
Large deployment (1000+ devices), stable schema
Protobuf
Best tooling, type safety, ecosystem
Ultra-constrained (Sigfox, every byte critical)
Custom binary
Maximum efficiency
Schema changes frequently
CBOR or JSON
Schema-less flexibility
Strong typing required (multi-team)
Protobuf
Compile-time validation
10.7.5 Step 5: Validate Your Choice
Checklist before committing:
Example Decision:
“We’re deploying 500 agricultural sensors on LoRaWAN SF12. Our payload is 10 bytes raw. JSON doesn’t fit (45 bytes). We’re a 2-person team without Protobuf experience. Schema may evolve (adding soil pH sensor next season). Decision: CBOR – fits in 18 bytes, we can debug with cbor2diag, schema-less flexibility, and our Python backend has native CBOR support.”
Inline Concept Check: Format Selection Mid-Chapter Review
Question 1: A smart building deploys 2,000 temperature sensors over WiFi (100 Mbps available bandwidth). Each sensor reports every 60 seconds. The development team is small (3 engineers) and needs to iterate quickly. Which format should they choose?
Custom binary – maximum efficiency for 2,000 devices
JSON – bandwidth is plentiful, development speed matters more
Protocol Buffers – 2,000 devices justifies the schema investment
CBOR – best compromise between size and flexibility
Show Answer
b) JSON – bandwidth is plentiful, development speed matters more
Reasoning:
WiFi bandwidth: 100 Mbps = 12.5 MB/sec. Even if all 2,000 sensors send simultaneously, that’s ~150 KB (assuming 75-byte JSON payloads). This is 0.001% of available bandwidth – bandwidth is NOT the constraint.
Development speed: Small team (3 engineers) needs fast iteration. JSON requires no schema definition, no code generation, no special tooling. Changes to data structure take minutes, not hours.
Debugging: JSON is human-readable in logs, browser dev tools, and command-line debugging.
Scale: 2,000 devices is significant but not massive. The efficiency gains from binary formats don’t justify the development overhead when bandwidth is unconstrained.
When would the answer change?
If connectivity were LoRaWAN/NB-IoT → CBOR
If scale were 100,000+ devices → Protocol Buffers
If payloads were >1KB → Consider compression or binary formats
Question 2: Your deployed sensors currently use JSON over LoRaWAN (48-byte payload, barely fits in 51-byte limit). You need to add two new sensor readings (4 bytes each). What’s the best migration path?
Compress the JSON with gzip before transmission
Migrate to CBOR (self-describing, gradual rollout possible)
Switch to Protocol Buffers (most efficient)
Remove less important existing fields to make room
Show Answer
b) Migrate to CBOR (self-describing, gradual rollout possible)
Reasoning:
Current state: JSON = 48 bytes, barely fits in 51-byte LoRaWAN limit
Adding 2 fields: JSON would be ~62 bytes (exceeds limit by 21%)
CBOR encoding: Same data = ~25 bytes (fits comfortably with room to grow)
Migration safety: CBOR is self-describing, so cloud can accept both JSON and CBOR during gradual firmware rollout
Library footprint: CBOR libraries are small (~10-20 KB), fit in typical MCU flash
Why not the alternatives?
gzip compression (a): Adds CPU overhead on battery-powered MCU, decompression complexity in cloud, and compressed JSON might still exceed 51 bytes for this payload.
Protocol Buffers (c): Requires schema definition, code generation, larger library footprint. The schema-based approach is harder to evolve gracefully during a live migration.
Remove fields (d): Defeats the purpose (you need MORE data, not less).
Key lesson: CBOR is the ideal migration path from JSON when you need size reduction but want to maintain schema flexibility and enable gradual deployment.
10.8 Try It Yourself: Format Selection Decision
Scenario: You’re designing an industrial vibration monitoring system with these requirements:
Given Data:
50 sensors per factory floor
3 factories (150 sensors total)
Each sensor: 6-axis accelerometer (3 axes × 2 readings each)
Sampling rate: 1 kHz (1,000 samples/second)
Data resolution: 16-bit integers per axis reading (±8g range, 0.001g precision)
Connectivity: Wired Ethernet (1 Gbps available) from sensor to edge gateway
Edge-to-cloud: 4G LTE cellular (10 Mbps uplink)
Development timeline: 6 months to production
Expected deployment lifespan: 15 years
Team size: 8 engineers (4 embedded, 2 backend, 2 data science)
Your Task (Step-by-Step):
Part A: Sensor-to-Edge Format Selection
Calculate the raw data rate per sensor:
Samples/sec: _________
Axes: _________
Bytes per sample (2 bytes × 6 axes): _________
Data rate per sensor: _________ bytes/sec = _________ KB/sec
Calculate total sensor-to-edge bandwidth (150 sensors):
Total data rate: _________ KB/sec = _________ Mbps
Compare this to available Ethernet bandwidth (1 Gbps = 125 MB/sec). Is bandwidth constrained?
Yes / No
Given the latency requirement (<10ms for anomaly detection), which format should you choose for sensor-to-edge?
No. 14.4 Mbps is only 1.4% of 1 Gbps Ethernet capacity.
Recommended format: FlatBuffers or Custom Binary
Reasoning:
Bandwidth is NOT constrained (only 1.4% of capacity)
Latency IS constrained (<10ms for anomaly detection)
At 1 kHz sampling, parsing overhead matters more than transmission size
FlatBuffers enables zero-copy access (edge gateway can read accelerometer values directly from buffer without deserializing)
Custom binary (just 12 raw bytes per sample) is even simpler but lacks schema evolution
Best choice: FlatBuffers – provides zero-copy performance for real-time processing while maintaining schema evolution capability for the 15-year lifespan.
15-year lifespan drove Protobuf choice: Schema evolution is inevitable over that timeline
Common Pitfalls
1. Encoding Sensor Readings as JSON Strings Instead of Numbers
Sending {‘temperature’: ‘23.5’} as a string instead of {‘temperature’: 23.5} as a number forces consumers to parse strings, breaks numeric queries, and increases message size. Validate that all numeric fields are encoded as JSON numbers — this is the most common IoT data format error.
2. Ignoring Schema Evolution When Updating Firmware
Adding a new field to a JSON/CBOR message immediately after a firmware update means old consumers (not yet updated) receive unexpected fields. Use additive-only schema changes (never remove or rename fields), version your schemas, and handle unknown fields gracefully in all consumers.
3. Using Floating Point for All Numeric Fields
IEEE 754 float64 uses 8 bytes per value — a 10-field sensor reading becomes 80 bytes just for numbers. Integer-scaled fixed point (temperature × 100 as int16) reduces the same reading to 2 bytes per value with identical precision for typical IoT ranges.
🏷️ Label the Diagram
10.9 Summary
Key Decision Factors:
Bandwidth constraints - The primary driver (Wi-Fi vs LoRaWAN)
Scale - Format efficiency matters more at >1000 devices
Battery life - Payload size directly affects transmission energy
Development time - Binary formats require more setup
Schema evolution - How often will your data structure change?
Total cost of ownership - Not just data costs, but development and maintenance
Quick Decision Guide:
Your Situation
Recommended Format
Prototyping anything
JSON
Wi-Fi/Ethernet, any scale
JSON
LoRaWAN, NB-IoT, moderate scale
CBOR
1000+ devices, stable schema
Protobuf
Sigfox, extreme battery constraints
Custom binary
For Kids: Meet the Sensor Squad!
Sammy the Sensor is at the post office, trying to choose how to send a letter.
“I have four choices,” Sammy explains to the Squad:
Lila the LED reads the options:
JSON = Writing a full letter – “Dear Cloud, my temperature is 23.5 degrees Celsius. Yours truly, Sammy.” Clear, but uses lots of paper!
CBOR = A postcard – Same information, but smaller. Like writing in neat tiny print.
Protobuf = A coded telegram – “TEMP=235 STOP” – short and structured, but you need the codebook.
Custom binary = A number on a stamp – Just “235” – the tiniest possible, but only works if everyone knows the secret.
Max the Microcontroller pulls out a chart. “Here’s my decision trick:”
“Got a BIG mailbox? (Wi-Fi)” – Write a full letter (JSON)!
“Tiny mailbox? (LoRaWAN)” – Use a postcard (CBOR) or code (binary)!