%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#7F8C8D', 'clusterBkg': '#ECF0F1'}}}%%
graph TB
subgraph JSON["JSON - Human Readable"]
J1["Smart Home<br/>Wi-Fi thermostat<br/>1 msg/min"]
J2["Building HVAC<br/>Ethernet sensors<br/>Debug-friendly"]
J3["Mobile App<br/>REST APIs<br/>Rapid prototyping"]
end
subgraph CBOR["CBOR - Balanced"]
C1["Agriculture<br/>LoRaWAN soil sensors<br/>6 msgs/day"]
C2["Fleet GPS<br/>NB-IoT tracking<br/>100 vehicles"]
C3["Smart Grid<br/>Meters, moderate volume<br/>Schema flexibility"]
end
subgraph PROTO["Protobuf - High Scale"]
P1["Industrial IoT<br/>Factory floor<br/>10,000 sensors"]
P2["Cloud Pipeline<br/>gRPC microservices<br/>Typed contracts"]
P3["Analytics<br/>High-volume streams<br/>ML pipelines"]
end
subgraph CUSTOM["Custom Binary - Maximum Efficiency"]
X1["Sigfox<br/>12-byte limit<br/>Parking sensors"]
X2["Wearables<br/>BLE, battery life<br/>Fitness trackers"]
X3["Satellite<br/>Extreme cost/byte<br/>Remote monitoring"]
end
style J1 fill:#E67E22,stroke:#2C3E50,stroke-width:1px,color:#000
style J2 fill:#E67E22,stroke:#2C3E50,stroke-width:1px,color:#000
style J3 fill:#E67E22,stroke:#2C3E50,stroke-width:1px,color:#000
style C1 fill:#16A085,stroke:#2C3E50,stroke-width:1px,color:#fff
style C2 fill:#16A085,stroke:#2C3E50,stroke-width:1px,color:#fff
style C3 fill:#16A085,stroke:#2C3E50,stroke-width:1px,color:#fff
style P1 fill:#16A085,stroke:#2C3E50,stroke-width:1px,color:#fff
style P2 fill:#16A085,stroke:#2C3E50,stroke-width:1px,color:#fff
style P3 fill:#16A085,stroke:#2C3E50,stroke-width:1px,color:#fff
style X1 fill:#2C3E50,stroke:#16A085,stroke-width:1px,color:#fff
style X2 fill:#2C3E50,stroke:#16A085,stroke-width:1px,color:#fff
style X3 fill:#2C3E50,stroke:#16A085,stroke-width:1px,color:#fff
45 Data Format Selection Guide
45.1 Learning Objectives
By the end of this chapter, you will be able to:
- Apply the format decision tree: Systematically evaluate constraints to choose the right format
- Calculate total cost of ownership: Consider bandwidth, battery, development, and maintenance costs
- Match formats to applications: Identify optimal formats for common IoT use cases
- Plan format migrations: Design strategies for upgrading from JSON to binary formats
- Justify format decisions: Explain trade-offs to stakeholders with concrete data
This is part of a series on IoT Data Formats:
- IoT Data Formats Overview - Introduction and text formats
- Binary Data Formats - CBOR, Protobuf, custom binary
- Data Format Selection (this chapter) - Decision guides and real-world examples
- Data Formats Practice - Scenarios, quizzes, worked examples
Related Decision Tools:
- Protocol Selector Wizard - Interactive protocol selection
- Architecture Planner - System architecture decisions
45.2 Prerequisites
Before starting this chapter, you should be familiar with:
- IoT Data Formats Overview: Basic understanding of JSON and why formats matter
- Binary Data Formats: CBOR, Protobuf, and custom binary basics
45.3 Format Selection Decision Tree
45.3.1 Step-by-Step Decision Process
Question 1: Is bandwidth severely constrained?
- No (Wi-Fi, Ethernet, LTE): Use JSON for simplicity
- Yes (LoRa, Sigfox, NB-IoT): Continue to Q2
Question 2: Do you need human readability for debugging?
- Yes: Use JSON or CBOR with tooling
- No: Continue to Q3
Question 3: Do you have many devices with high message volume?
- Yes (>1000 devices, >1 msg/min): Lean toward Protobuf or CBOR, then check Q4
- No (small deployment): Lean toward CBOR for balance, then check Q4
Question 4: Is every byte critical? (Sigfox: 12 bytes max)
- Yes: Custom binary format
- No: Keep the Q3 choice (Protobuf or CBOR)
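The four questions can be sketched as a small function. This is only one reading of the tree: the parameter names are hypothetical, and Q4 (every byte critical) is treated as overriding the Q3 answer, which is an assumption of this sketch rather than something the questions spell out.

```python
def choose_format(bandwidth_constrained: bool,
                  needs_readability: bool,
                  every_byte_critical: bool,
                  large_high_volume_fleet: bool) -> str:
    """Return a format recommendation following the four questions above."""
    # Q1: plentiful bandwidth (Wi-Fi, Ethernet, LTE) -> keep it simple.
    if not bandwidth_constrained:
        return "JSON"
    # Q2: human readability is still required on a constrained link.
    if needs_readability:
        return "JSON or CBOR with tooling"
    # Q4 (assumed to override Q3): hard payload limits such as Sigfox's 12 bytes.
    if every_byte_critical:
        return "Custom binary"
    # Q3: large, chatty fleets vs. small deployments.
    if large_high_volume_fleet:
        return "Protobuf or CBOR"
    return "CBOR"


# Example: a Sigfox parking sensor with a fixed message structure.
print(choose_format(True, False, True, False))
```

Encoding the tree this way makes the order of the checks explicit, which is useful when a team needs to agree on which constraint wins when two of them conflict.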
45.4 Real-World Application Map
45.4.1 Application-Format Reference Table
| Application | Protocol | Format | Rationale |
|---|---|---|---|
| Smart home thermostat | Wi-Fi + MQTT | JSON | Bandwidth plentiful, debugging important |
| Agricultural soil sensor | LoRaWAN | CBOR | Bandwidth limited, need flexibility |
| City parking sensor | Sigfox | Custom binary | 12-byte limit, fixed message structure |
| Industrial gateway | Ethernet + gRPC | Protobuf | High volume, strong typing needed |
| Wearable fitness tracker | BLE | Custom binary | Power-sensitive, fixed data structure |
45.5 Learning Scenario: Soil Moisture Network
Your Challenge: You’re designing a precision agriculture system for a large vineyard. You need to monitor soil moisture to optimize irrigation and prevent crop damage.
System Requirements:
- 100 battery-powered sensor nodes placed throughout the vineyard
- Cellular backhaul (NB-IoT): $0.01/KB data cost, 250 KB/month included per SIM
- Readings every 15 minutes (96 readings/day)
- Battery life target: 5 years on 2x AA batteries
- Sensors measure: Soil moisture (0-100%), temperature (-20 to 60 °C), battery voltage (2.0-3.6 V), GPS location (once/day)
Your Mission: Choose the optimal data format and calculate the real costs.
45.5.1 Step 1: Calculate Message Volume
Think: How many messages per year per node?
- Readings per day: 96 (every 15 minutes)
- Days per year: 365
- Total messages/year/node: 96 x 365 = 35,040 messages
- Fleet total: 35,040 x 100 nodes = 3,504,000 messages/year
45.5.2 Step 2: Compare Format Options
You’re considering three formats. Let’s analyze each:
45.5.2.1 Option A: JSON (Readable)
{"id":"V001","moist":45.2,"temp":18.5,"batt":3.1,"lat":38.5,"lng":-122.4}
- Size: 78 bytes per message
- Pros: Easy debugging, universal tools, cloud-friendly
- Cons: Largest size, more battery drain for transmission
45.5.2.2 Option B: CBOR (Balanced)
Binary encoding with same structure:
- Size: 42 bytes per message (46% smaller than JSON)
- Pros: About 46% bandwidth savings, standard format, easier debugging than custom binary
- Cons: Smaller ecosystem than JSON, requires a CBOR library
45.5.2.3 Option C: Custom Binary (Optimized)
[2 bytes id] [1 byte moist x 2] [1 byte temp+20] [1 byte (batt - 2.0) x 100] [4 bytes lat] [4 bytes lng]
Total: 13 bytes
- Size: 13 bytes per message (83% smaller than JSON)
- Pros: Smallest size, lowest power consumption
- Cons: No tooling, rigid schema, difficult debugging
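A minimal sketch of how such a 13-byte layout might be packed with Python's standard `struct` module. Several details are assumptions not stated in the scenario: big-endian byte order, a numeric node id (1 standing in for "V001"), 32-bit floats for latitude and longitude, and battery voltage stored as an offset from 2.0 V so that the 2.0-3.6 V range fits in one byte (3.1 V x 100 would exceed 255).

```python
import struct

# Hypothetical layout: big-endian, no padding -> 2 + 1 + 1 + 1 + 4 + 4 = 13 bytes.
_LAYOUT = struct.Struct(">HBBBff")


def pack_reading(node_id, moist_pct, temp_c, batt_v, lat, lng):
    """Quantize a reading and pack it into 13 bytes."""
    return _LAYOUT.pack(
        node_id,                      # 2 bytes: numeric node id
        round(moist_pct * 2),         # 1 byte: 0-100 % in 0.5 % steps
        round(temp_c + 20),           # 1 byte: -20..60 C shifted to 0..80
        round((batt_v - 2.0) * 100),  # 1 byte: 2.0-3.6 V as 0..160 (assumed offset)
        lat,                          # 4 bytes: float32
        lng,                          # 4 bytes: float32
    )


def unpack_reading(payload):
    """Reverse pack_reading; values come back at the quantized resolution."""
    node_id, moist, temp, batt, lat, lng = _LAYOUT.unpack(payload)
    return {
        "id": node_id,
        "moist": moist / 2,
        "temp": temp - 20,
        "batt": 2.0 + batt / 100,
        "lat": lat,
        "lng": lng,
    }


payload = pack_reading(1, 45.2, 18.5, 3.1, 38.5, -122.4)
print(len(payload), unpack_reading(payload))
```

The decoder has to mirror the encoder field for field, which is the "rigid schema" cost in practice: any change to the layout must be rolled out to both sides at once.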
45.5.3 Step 3: Calculate Annual Data Usage
Calculate: Data usage per node per year for each format.
Per node per year:
- JSON: 35,040 messages x 78 bytes = 2.73 MB/year
- CBOR: 35,040 messages x 42 bytes = 1.47 MB/year
- Custom: 35,040 messages x 13 bytes = 0.46 MB/year
Fleet total (100 nodes):
- JSON: 2.73 MB x 100 = 273 MB/year
- CBOR: 1.47 MB x 100 = 147 MB/year
- Custom: 0.46 MB x 100 = 46 MB/year
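The same arithmetic as a short script, which makes it easy to rerun with a different reporting interval or payload size. It assumes 1 MB = 1,000,000 bytes and counts payload bytes only (no protocol headers), matching the figures above; the constant and function names are illustrative.

```python
READINGS_PER_DAY = 96          # one reading every 15 minutes
DAYS_PER_YEAR = 365
NODES = 100
PAYLOAD_BYTES = {"JSON": 78, "CBOR": 42, "Custom": 13}


def annual_mb_per_node(payload_bytes: int) -> float:
    """Payload volume per node per year, in MB (1 MB = 1,000,000 bytes)."""
    messages = READINGS_PER_DAY * DAYS_PER_YEAR  # 35,040 messages
    return messages * payload_bytes / 1_000_000


for name, size in PAYLOAD_BYTES.items():
    per_node = annual_mb_per_node(size)
    print(f"{name}: {per_node:.2f} MB/node/year, {per_node * NODES:.0f} MB/fleet/year")
```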
45.5.4 Step 4: Cost Analysis
Data plan: $0.01/KB, 250 KB/month included per SIM
Calculate: Will you exceed the included data allowance? What are the overage costs?
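Included data per node per year: 250 KB/month x 12 months = 3 MB/year (this treats the monthly allowance as pooled over the year; a strict per-month cap would need a month-by-month check)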
Overage analysis:
| Format | Usage/year | Included | Overage | Cost/node/year | Fleet cost/year |
|---|---|---|---|---|---|
| JSON | 2.73 MB | 3 MB | 0 MB | $0 | $0 |
| CBOR | 1.47 MB | 3 MB | 0 MB | $0 | $0 |
| Custom | 0.46 MB | 3 MB | 0 MB | $0 | $0 |
Surprising result: All formats fit within the included 3 MB/year allowance! No overage costs.
BUT WAIT - what about peak months? (summer = more frequent irrigation adjustments)
Summer scenario (June-August): Increase to every 5 minutes = 288 msgs/day x 90 days = 25,920 messages
Summer data usage (3 months):
- JSON: 25,920 x 78 bytes = 2.02 MB (67% of the 3 MB annual allowance in 3 months!)
- CBOR: 25,920 x 42 bytes = 1.09 MB (36% of allowance)
- Custom: 25,920 x 13 bytes = 0.34 MB (11% of allowance)
Annual with summer spike:
- JSON: 2.05 MB (9 months @ 15min) + 2.02 MB (3 months @ 5min) = 4.07 MB/year - Exceeds 3 MB! Overage: 1.07 MB = 1,070 KB x $0.01 = $10.70/node, or $1,070/year for the fleet
- CBOR: 1.10 MB + 1.09 MB = 2.19 MB/year - Under limit
- Custom: 0.35 MB + 0.34 MB = 0.69 MB/year - Under limit
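A rough sketch of the overage calculation, assuming the allowance is pooled over the year and that the summer spike lasts exactly 90 days out of 365. Because it uses exact day counts instead of rounded megabytes, the JSON overage comes out about a percent higher than the rounded figure above. Names are illustrative.

```python
PAYLOAD_BYTES = {"JSON": 78, "CBOR": 42, "Custom": 13}
INCLUDED_KB_PER_MONTH = 250
PRICE_PER_KB = 0.01                      # USD per KB beyond the allowance
NORMAL_DAYS, SUMMER_DAYS = 275, 90       # assumed split of the year
NORMAL_PER_DAY, SUMMER_PER_DAY = 96, 288  # every 15 min vs. every 5 min


def annual_overage_usd(payload_bytes: int) -> float:
    """Overage cost per node per year, with the allowance pooled annually."""
    messages = NORMAL_DAYS * NORMAL_PER_DAY + SUMMER_DAYS * SUMMER_PER_DAY
    used_kb = messages * payload_bytes / 1000   # 1 KB = 1,000 bytes here
    included_kb = INCLUDED_KB_PER_MONTH * 12    # 3,000 KB per year
    return max(0.0, used_kb - included_kb) * PRICE_PER_KB


for name, size in PAYLOAD_BYTES.items():
    print(f"{name}: ${annual_overage_usd(size):.2f} per node per year")
```

If the carrier enforces the 250 KB cap month by month instead, the same function can be applied per month with that month's message count.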
45.5.5 Step 5: Battery Life Impact
Radio power consumption (NB-IoT):
- Transmit power: 23 dBm (200 mW)
- Transmission time: ~50 ms/byte (including protocol overhead)
- Energy per byte: 200 mW x 50 ms = 10 mJ/byte
Calculate: How much battery energy does each format consume?
Energy per message (transmission only):
- JSON: 78 bytes x 10 mJ/byte = 780 mJ
- CBOR: 42 bytes x 10 mJ/byte = 420 mJ
- Custom: 13 bytes x 10 mJ/byte = 130 mJ
Annual energy (35,040 messages/year):
- JSON: 35,040 x 780 mJ = 27.3 kJ/year = 7.6 Wh/year
- CBOR: 35,040 x 420 mJ = 14.7 kJ/year = 4.1 Wh/year
- Custom: 35,040 x 130 mJ = 4.6 kJ/year = 1.3 Wh/year
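The transmission-energy figures can be reproduced with a few lines, using the scenario's simplified radio model (200 mW transmit power, roughly 50 ms per byte). Real NB-IoT energy use depends heavily on coverage, repetitions, and connection overhead, so this is only the scenario's model, not a measurement; the names are illustrative.

```python
TX_POWER_W = 0.200          # 23 dBm, about 200 mW
SECONDS_PER_BYTE = 0.050    # scenario assumption, includes protocol overhead
MESSAGES_PER_YEAR = 96 * 365


def energy_per_message_j(payload_bytes: int) -> float:
    """Transmission energy for one message, in joules."""
    return TX_POWER_W * SECONDS_PER_BYTE * payload_bytes


def annual_energy_wh(payload_bytes: int) -> float:
    """Transmission energy per node per year, in watt-hours (1 Wh = 3600 J)."""
    return energy_per_message_j(payload_bytes) * MESSAGES_PER_YEAR / 3600


for name, size in {"JSON": 78, "CBOR": 42, "Custom": 13}.items():
    print(f"{name}: {energy_per_message_j(size) * 1000:.0f} mJ/msg, "
          f"{annual_energy_wh(size):.1f} Wh/year")
```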
Battery capacity: 2x AA batteries = 2 x 2500 mAh x 3V = 15 Wh total
Transmission as % of battery (assuming other circuitry uses 50% of battery):
- JSON: 7.6 Wh / 7.5 Wh available = 101% of available budget - 4.9 year battery life
- CBOR: 4.1 Wh / 7.5 Wh = 55% of budget - 5+ year target achieved
- Custom: 1.3 Wh / 7.5 Wh = 17% of budget - 5+ year target easily met
Verdict: JSON fails the 5-year battery life requirement!
45.5.6 Step 6: Final Recommendation
Compare all factors:
| Factor | JSON | CBOR | Custom Binary |
|---|---|---|---|
| Summer data cost | $1,070/year overage | $0 | $0 |
| Battery life | 4.9 years (misses target) | 5+ years | 5+ years |
| Debugging ease | Excellent (text editor) | Requires CBOR tools | Custom parser needed |
| Schema evolution | Easy (add fields) | Moderate (CBOR flexible) | Rigid (breaking changes) |
| Development time | 1 day (JSON libs everywhere) | 2-3 days (CBOR setup) | 1-2 weeks (custom parser) |
| Maintenance burden | Low (standard format) | Medium (CBOR docs) | High (DIY everything) |
| Total 5-year TCO | $5,350 (overage costs) | $0 (no overage) | $0 (no overage) |
45.5.7 Your Recommendation: CBOR (Option B)
Rationale:
1. Meets battery life target (5+ years) with 46% size reduction vs JSON
2. Zero overage costs even with summer spike (2.19 MB/year < 3 MB limit)
3. Flexible schema - Can add new sensor types without breaking existing nodes
4. Standard format - CBOR libraries exist for embedded C, Python, cloud processing
5. Reasonable debugging - Tools like cbor2diag convert binary to human-readable
6. Moderate setup - 2-3 days to integrate CBOR library, but well-documented
Why not custom binary?
- Custom saves only 1.01 MB/year (1.47 - 0.46 MB) per node vs CBOR
- Zero cost benefit (both are under data cap)
- Battery savings: 2.8 Wh/year = 6 extra months of battery life
- Trade-off: 6 months extra battery vs 2 weeks development + ongoing maintenance burden
- Verdict: Not worth it unless battery life is absolutely critical
Why not JSON?
- Fails 5-year battery life target (4.9 years)
- $1,070/year overage costs during summer spike
- Total 5-year cost: $5,350 vs $0 for CBOR
45.5.8 Real-World Lesson
Key Insight: Don’t just optimize for bytes - optimize for total cost of ownership (TCO):
- Data costs
- Battery replacement costs (labor + materials)
- Development time costs
- Maintenance burden costs
In this case, CBOR captures more than half of custom binary’s size savings (46% vs 83% smaller than JSON) for a fraction of the engineering effort (2-3 days vs 1-2 weeks). The sweet spot!
Bonus: With CBOR, you can add new fields (soil pH, nutrient levels) next season without breaking nodes already in the field: new firmware sends the extra keys, older nodes keep sending the original ones, and the cloud parser accepts both. With custom binary, that’s a breaking change requiring coordinated firmware updates or protocol versioning.
45.6 Fleet Tracking Quiz
Scenario: You’re building a fleet tracking system for 500 delivery trucks. Each truck sends GPS updates:
- Data: Latitude (float), Longitude (float), Speed (km/h, 0-120), Heading (degrees, 0-359), Timestamp (Unix epoch)
- Frequency: Every 30 seconds while moving (8 hours/day average)
- Network: Cellular (NB-IoT)
- Data plan: $5/month per truck for 50 MB
You’re considering three formats:
Option A: JSON
{"lat":37.7749,"lng":-122.4194,"spd":45,"hdg":180,"ts":1702834567}
Size: 68 bytes
Option B: CBOR
Binary encoding of same structure
Size: 35 bytes
Option C: Custom Binary
[4 bytes lat] [4 bytes lng] [1 byte spd] [2 bytes hdg] [4 bytes ts]
Size: 15 bytes
Think about:
1. How many messages per truck per month? (30s intervals, 8 hours/day, 22 workdays)
2. What’s the monthly data usage for 500 trucks with each format?
3. Will you exceed the 50 MB/month data plan with any format?
4. What’s the total annual cost difference between formats?
Key Insights:
Messages per truck per month:
- Updates: Every 30s for 8 hours/day
- Per day: (8 hours x 3600s) / 30s = 960 messages/day
- Per month: 960 x 22 workdays = 21,120 messages/month
Monthly data usage per truck:
- JSON: 21,120 x 68 bytes = 1.44 MB/month
- CBOR: 21,120 x 35 bytes = 0.74 MB/month
- Custom: 21,120 x 15 bytes = 0.32 MB/month
Fleet total (500 trucks):
- JSON: 1.44 x 500 = 720 MB/month
- CBOR: 0.74 x 500 = 370 MB/month
- Custom: 0.32 x 500 = 160 MB/month
Data plan analysis (50 MB/month per truck):
- JSON: 1.44 MB < 50 MB - Under limit (3% utilization)
- CBOR: 0.74 MB < 50 MB - Under limit (1.5% utilization)
- Custom: 0.32 MB < 50 MB - All formats work!
Cost analysis: Since all formats fit within the $5/month plan, costs are identical: $5 x 500 = $2,500/month
BUT - what if you want to upgrade update frequency to every 10 seconds?
10-second updates (3x more frequent, 63,360 messages/month):
- JSON: 4.31 MB/month - Still under 50 MB
- CBOR: 2.22 MB/month - Still under 50 MB
- Custom: 0.95 MB/month - Still under 50 MB
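The quiz arithmetic can be checked, or rerun for other update intervals, with a short script. It assumes 1 MB = 1,000,000 bytes and counts payload bytes only; the function and constant names are illustrative.

```python
HOURS_PER_DAY = 8
WORKDAYS_PER_MONTH = 22
PLAN_MB = 50
PAYLOAD_BYTES = {"JSON": 68, "CBOR": 35, "Custom": 15}


def messages_per_month(interval_s: int) -> int:
    """GPS updates per truck per month at the given reporting interval."""
    per_day = (HOURS_PER_DAY * 3600) // interval_s
    return per_day * WORKDAYS_PER_MONTH


def monthly_mb(payload_bytes: int, interval_s: int) -> float:
    """Payload volume per truck per month, in MB (1 MB = 1,000,000 bytes)."""
    return messages_per_month(interval_s) * payload_bytes / 1_000_000


for interval in (30, 10):
    for name, size in PAYLOAD_BYTES.items():
        mb = monthly_mb(size, interval)
        print(f"{interval}s {name}: {mb:.2f} MB/month "
              f"({mb / PLAN_MB:.1%} of the {PLAN_MB} MB plan)")
```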
Best choice: CBOR (Option B)
- 49% smaller than JSON (bandwidth efficient)
- Self-describing format (schema flexibility)
- Standard libraries available
- Easy debugging with CBOR tools
- Fits well within data plan even with future growth
Why not custom binary?
- Saves only 0.42 MB/month per truck (1% of data plan)
- Loses flexibility for schema changes
- Harder to debug in production
- Not worth the maintenance burden for minimal savings
Real-world lesson: Choose formats based on flexibility and maintainability, not just raw size. CBOR delivers most of custom binary’s size savings (49% vs 78% smaller than JSON) at a fraction of the complexity. In this case, all formats fit comfortably within the data budget, so optimize for developer productivity, not bytes.
45.7 Summary
Key Decision Factors:
- Bandwidth constraints - The primary driver (Wi-Fi vs LoRaWAN)
- Scale - Format efficiency matters more at >1000 devices
- Battery life - Payload size directly affects transmission energy
- Development time - Binary formats require more setup
- Schema evolution - How often will your data structure change?
- Total cost of ownership - Not just data costs, but development and maintenance
Quick Decision Guide:
| Your Situation | Recommended Format |
|---|---|
| Prototyping anything | JSON |
| Wi-Fi/Ethernet, any scale | JSON |
| LoRaWAN, NB-IoT, moderate scale | CBOR |
| 1000+ devices, stable schema | Protobuf |
| Sigfox, extreme battery constraints | Custom binary |
45.8 What’s Next
Now that you understand how to select the right data format:
- Practice: Data Formats Practice - Work through detailed scenarios, quizzes, and worked examples
- Apply: Protocol Selector Wizard - Interactive tool combining protocol and format selection
- Implement: MQTT Fundamentals - See JSON/CBOR payloads in action