45  Data Format Selection Guide

45.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Apply the format decision tree: Systematically evaluate constraints to choose the right format
  • Calculate total cost of ownership: Consider bandwidth, battery, development, and maintenance costs
  • Match formats to applications: Identify optimal formats for common IoT use cases
  • Plan format migrations: Design strategies for upgrading from JSON to binary formats
  • Justify format decisions: Explain trade-offs to stakeholders with concrete data

This is part of a series on IoT Data Formats:

  1. IoT Data Formats Overview - Introduction and text formats
  2. Binary Data Formats - CBOR, Protobuf, custom binary
  3. Data Format Selection (this chapter) - Decision guides and real-world examples
  4. Data Formats Practice - Scenarios, quizzes, worked examples

Related Decision Tools:

  • Protocol Selector Wizard - Interactive protocol selection
  • Architecture Planner - System architecture decisions

45.2 Prerequisites

Before starting this chapter, you should be familiar with:


45.3 Format Selection Decision Tree

Flowchart decision tree starting with navy Start box asking Choose Data Format. First decision (gray): Bandwidth Constrained? (LoRa, Sigfox, NB-IoT). No path (Wi-Fi, Ethernet, LTE) leads to orange JSON box (Simple, Universal, Best for Wi-Fi/LTE). Yes path (LoRa, Sigfox) leads to second decision: Need Human Readability? (Debugging, Dev). Yes path leads to orange JSON or CBOR with tools box. No path leads to third decision: High Volume? (>1000 devices, >1 msg/min). No path (Small deployment) leads to teal CBOR box (Balance efficiency and flexibility). Yes path (>1000 devices) leads to fourth decision: Every Byte Critical? (e.g., 12 byte limit). No path leads to teal Protobuf or CBOR box (Typed, efficient pipelines). Yes path (Sigfox, ultra-LP) leads to navy Custom Binary box (Ultimate efficiency). Color coding: Navy for start/custom binary, Gray for decision points, Orange for JSON options, Teal for CBOR/Protobuf options.

Figure 45.1: Format Selection Decision Tree: Interactive flowchart for choosing the right IoT data format based on network constraints and application requirements. The tree starts by evaluating bandwidth constraints (Wi-Fi/LTE vs LoRa/Sigfox). High-bandwidth networks can use JSON for simplicity. Constrained networks require further evaluation: human readability needs (debugging vs production), message volume (small deployment vs high-scale fleet), and byte-level criticality (standard protocols vs ultra-constrained like Sigfox’s 12-byte limit). Each path leads to the optimal format choice balancing efficiency, maintainability, and ecosystem support.

45.3.1 Step-by-Step Decision Process

Question 1: Is bandwidth severely constrained?

  • No (Wi-Fi, Ethernet, LTE): Use JSON for simplicity
  • Yes (LoRa, Sigfox, NB-IoT): Continue to Q2

Question 2: Do you need human readability for debugging?

  • Yes: Use JSON or CBOR with tooling
  • No: Continue to Q3

Question 3: Do you have many devices with high message volume?

  • Yes (>1000 devices, >1 msg/min): Continue to Q4
  • No (small deployment): Use CBOR for balance

Question 4: Is every byte critical? (Sigfox: 12 bytes max)

  • Yes: Custom binary format
  • No: Protobuf or CBOR
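
The four questions can be read as a chain of early returns. The sketch below is one way to write that down in Python; the function name and the boolean parameters are illustrative and not part of any library, and deciding what counts as "constrained" or "high volume" is left to the caller. It follows the flowchart in Figure 45.1, where a high-volume deployment moves on to the every-byte-critical question.

```python
def choose_format(bandwidth_constrained: bool,
                  need_human_readable: bool,
                  high_volume: bool,
                  every_byte_critical: bool) -> str:
    """Walk the four questions in order and return the suggested format."""
    if not bandwidth_constrained:      # Q1: Wi-Fi, Ethernet, LTE
        return "JSON"
    if need_human_readable:            # Q2: debugging / development
        return "JSON or CBOR with tooling"
    if not high_volume:                # Q3: small deployment
        return "CBOR"
    if every_byte_critical:            # Q4: e.g. Sigfox 12-byte payload
        return "Custom binary"
    return "Protobuf or CBOR"


print(choose_format(False, False, False, False))  # Wi-Fi thermostat
print(choose_format(True, False, False, False))   # small LoRaWAN deployment
print(choose_format(True, False, True, True))     # large Sigfox fleet
```

Because each question returns as soon as it is decided, later flags are ignored once an earlier answer settles the choice, which mirrors how the tree is meant to be read.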

45.4 Real-World Application Map

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#7F8C8D', 'clusterBkg': '#ECF0F1'}}}%%
graph TB
    subgraph JSON["JSON - Human Readable"]
        J1["Smart Home<br/>Wi-Fi thermostat<br/>1 msg/min"]
        J2["Building HVAC<br/>Ethernet sensors<br/>Debug-friendly"]
        J3["Mobile App<br/>REST APIs<br/>Rapid prototyping"]
    end

    subgraph CBOR["CBOR - Balanced"]
        C1["Agriculture<br/>LoRaWAN soil sensors<br/>6 msgs/day"]
        C2["Fleet GPS<br/>NB-IoT tracking<br/>100 vehicles"]
        C3["Smart Grid<br/>Meters, moderate volume<br/>Schema flexibility"]
    end

    subgraph PROTO["Protobuf - High Scale"]
        P1["Industrial IoT<br/>Factory floor<br/>10,000 sensors"]
        P2["Cloud Pipeline<br/>gRPC microservices<br/>Typed contracts"]
        P3["Analytics<br/>High-volume streams<br/>ML pipelines"]
    end

    subgraph CUSTOM["Custom Binary - Maximum Efficiency"]
        X1["Sigfox<br/>12-byte limit<br/>Parking sensors"]
        X2["Wearables<br/>BLE, battery life<br/>Fitness trackers"]
        X3["Satellite<br/>Extreme cost/byte<br/>Remote monitoring"]
    end

    style J1 fill:#E67E22,stroke:#2C3E50,stroke-width:1px,color:#000
    style J2 fill:#E67E22,stroke:#2C3E50,stroke-width:1px,color:#000
    style J3 fill:#E67E22,stroke:#2C3E50,stroke-width:1px,color:#000
    style C1 fill:#16A085,stroke:#2C3E50,stroke-width:1px,color:#fff
    style C2 fill:#16A085,stroke:#2C3E50,stroke-width:1px,color:#fff
    style C3 fill:#16A085,stroke:#2C3E50,stroke-width:1px,color:#fff
    style P1 fill:#16A085,stroke:#2C3E50,stroke-width:1px,color:#fff
    style P2 fill:#16A085,stroke:#2C3E50,stroke-width:1px,color:#fff
    style P3 fill:#16A085,stroke:#2C3E50,stroke-width:1px,color:#fff
    style X1 fill:#2C3E50,stroke:#16A085,stroke-width:1px,color:#fff
    style X2 fill:#2C3E50,stroke:#16A085,stroke-width:1px,color:#fff
    style X3 fill:#2C3E50,stroke:#16A085,stroke-width:1px,color:#fff

Figure 45.2: Real-World Application Map - Instead of abstract decision criteria, this diagram shows concrete IoT applications grouped by their ideal data format. Smart home devices with Wi-Fi use JSON for easy debugging. Agricultural sensors on LoRaWAN benefit from CBOR’s balance. Industrial IoT at scale demands Protobuf’s efficiency. Only ultra-constrained scenarios like Sigfox (12-byte limit) or satellite links justify custom binary. Students can identify their project type and immediately see the recommended format.

45.4.1 Application-Format Reference Table

| Application | Protocol | Format | Rationale |
|---|---|---|---|
| Smart home thermostat | Wi-Fi + MQTT | JSON | Bandwidth plentiful, debugging important |
| Agricultural soil sensor | LoRaWAN | CBOR | Bandwidth limited, need flexibility |
| City parking sensor | Sigfox | Custom binary | 12-byte limit, fixed message structure |
| Industrial gateway | Ethernet + gRPC | Protobuf | High volume, strong typing needed |
| Wearable fitness tracker | BLE | Custom binary | Power-sensitive, fixed data structure |

45.5 Learning Scenario: Soil Moisture Network

Your Challenge: You’re designing a precision agriculture system for a large vineyard. You need to monitor soil moisture to optimize irrigation and prevent crop damage.

System Requirements:

  • 100 battery-powered sensor nodes placed throughout the vineyard
  • Cellular backhaul (NB-IoT): $0.01/KB data cost, 250 KB/month included per SIM
  • Readings every 15 minutes (96 readings/day)
  • Battery life target: 5 years on 2x AA batteries
  • Sensors measure: Soil moisture (0-100%), temperature (-20 to 60 °C), battery voltage (2.0-3.6 V), GPS location (once/day)

Your Mission: Choose the optimal data format and calculate the real costs.


45.5.1 Step 1: Calculate Message Volume

Think: How many messages per year per node?

  • Readings per day: 96 (every 15 minutes)
  • Days per year: 365
  • Total messages/year/node: 96 x 365 = 35,040 messages
  • Fleet total: 35,040 x 100 nodes = 3,504,000 messages/year

45.5.2 Step 2: Compare Format Options

You’re considering three formats. Let’s analyze each:

45.5.2.1 Option A: JSON (Readable)

{"id":"V001","moist":45.2,"temp":18.5,"batt":3.1,"lat":38.5,"lng":-122.4}
  • Size: 78 bytes per message
  • Pros: Easy debugging, universal tools, cloud-friendly
  • Cons: Largest size, more battery drain for transmission

45.5.2.2 Option B: CBOR (Balanced)

Binary encoding with the same structure:

  • Size: 42 bytes per message (46% smaller than JSON)
  • Pros: Nearly halves bandwidth, standard format, easier debugging than custom binary
  • Cons: Smaller ecosystem than JSON, requires a CBOR library

45.5.2.3 Option C: Custom Binary (Optimized)

[2 bytes id] [1 byte moist x 2] [1 byte temp+20] [1 byte (batt-2.0) x 100] [4 bytes lat] [4 bytes lng]
Total: 13 bytes
  • Size: 13 bytes per message (83% smaller than JSON)
  • Pros: Smallest size, lowest power consumption
  • Cons: No tooling, rigid schema, difficult debugging
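
To make the size difference concrete, the sketch below encodes the sample reading both as compact JSON and as a 13-byte packed record using Python's standard `struct` module. The byte order, the use of the numeric part of the ID, the float32 coordinates, and the `pack_custom` helper itself are assumptions made for illustration. Battery voltage is stored as an offset from 2.0 V, because 3.6 V x 100 = 360 would not fit in a single byte. Note that the abbreviated sample comes to 73 bytes as compact JSON, slightly under the 78-byte figure used in the calculations.

```python
import json
import struct

reading = {"id": "V001", "moist": 45.2, "temp": 18.5,
           "batt": 3.1, "lat": 38.5, "lng": -122.4}

# Option A: compact JSON (no whitespace after separators)
json_bytes = json.dumps(reading, separators=(",", ":")).encode("utf-8")


def pack_custom(r: dict) -> bytes:
    """Pack one reading into a 13-byte record.

    Assumptions (not from the chapter): little-endian byte order, the
    numeric part of the ID as a uint16, float32 coordinates, and battery
    voltage stored as (V - 2.0) * 100 so that 2.0-3.6 V fits in one byte.
    """
    return struct.pack(
        "<HBBBff",
        int(r["id"][1:]),                # "V001" -> 1
        round(r["moist"] * 2),           # 0-100 % in 0.5 % steps
        round(r["temp"] + 20),           # -20..60 C -> 0..80, 1 C steps
        round((r["batt"] - 2.0) * 100),  # 2.0..3.6 V -> 0..160
        r["lat"],
        r["lng"],
    )


custom_bytes = pack_custom(reading)
print(len(json_bytes), len(custom_bytes))  # prints: 73 13
```

The packed record is small precisely because it carries no field names and limited resolution (temperature is rounded to whole degrees here), which is the rigidity listed under Cons.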

45.5.3 Step 3: Calculate Annual Data Usage

Calculate: Data usage per node per year for each format.

Per node per year:

  • JSON: 35,040 messages x 78 bytes = 2.73 MB/year
  • CBOR: 35,040 messages x 42 bytes = 1.47 MB/year
  • Custom: 35,040 messages x 13 bytes = 0.46 MB/year

Fleet total (100 nodes):

  • JSON: 2.73 MB x 100 = 273 MB/year
  • CBOR: 1.47 MB x 100 = 147 MB/year
  • Custom: 0.46 MB x 100 = 46 MB/year
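
The same arithmetic can be checked in a few lines of Python. This is a minimal sketch that counts payload bytes only and uses decimal megabytes (1 MB = 1,000,000 bytes), as the figures above do; the constant and function names are illustrative.

```python
READINGS_PER_DAY = 96        # one reading every 15 minutes
DAYS_PER_YEAR = 365
NODES = 100
SIZES = {"JSON": 78, "CBOR": 42, "Custom": 13}   # bytes per message

MSGS_PER_NODE = READINGS_PER_DAY * DAYS_PER_YEAR  # 35,040 messages/year


def annual_mb(bytes_per_msg: int, nodes: int = 1) -> float:
    """Annual payload volume in decimal megabytes."""
    return MSGS_PER_NODE * bytes_per_msg * nodes / 1_000_000


for name, size in SIZES.items():
    print(f"{name:7s} {annual_mb(size):.2f} MB/node  "
          f"{annual_mb(size, NODES):.0f} MB/fleet")
```

Changing `READINGS_PER_DAY` or a size in `SIZES` shows immediately how sensitive the yearly total is to the reporting interval compared with the format choice.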


45.5.4 Step 4: Cost Analysis

Data plan: $0.01/KB, 250 KB/month included per SIM

Calculate: Will you exceed the included data allowance? What are the overage costs?

Included data per node per year: 250 KB/month x 12 months = 3 MB/year (this treats the monthly allowance as pooled across the year)

Overage analysis:

| Format | Usage/year | Included | Overage | Cost/node/year | Fleet cost/year |
|---|---|---|---|---|---|
| JSON | 2.73 MB | 3 MB | 0 MB | $0 | $0 |
| CBOR | 1.47 MB | 3 MB | 0 MB | $0 | $0 |
| Custom | 0.46 MB | 3 MB | 0 MB | $0 | $0 |

Surprising result: All formats fit within the included 3 MB/year allowance! No overage costs.

BUT WAIT - what about peak months? (summer = more frequent irrigation adjustments)

Summer scenario (June-August): Increase to every 5 minutes = 288 msgs/day x 90 days = 25,920 messages

Summer data usage (3 months):

  • JSON: 25,920 x 78 bytes = 2.02 MB (67% of the annual allowance in 3 months!)
  • CBOR: 25,920 x 42 bytes = 1.09 MB (36% of allowance)
  • Custom: 25,920 x 13 bytes = 0.34 MB (11% of allowance)

Annual with summer spike:

  • JSON: (9 months @ 15 min) + (3 months @ 5 min) = 2.05 MB + 2.02 MB = 4.07 MB/year. Exceeds 3 MB! The 1.07 MB overage is 1,070 KB x $0.01/KB = $10.70/node, or $1,070/year for the fleet
  • CBOR: 1.10 MB + 1.09 MB = 2.19 MB/year. Under limit
  • Custom: 0.35 MB + 0.34 MB = 0.69 MB/year. Under limit
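
The overage calculation is easy to get wrong by hand, so here is a small sketch of it. It follows the simplifications used above: the 250 KB/month allowance is pooled into 3,000 KB/year, 1 KB is taken as 1,000 bytes, the nine normal months are counted as 9/12 of the yearly message total, and summer is 90 days at 288 messages/day. The function names are illustrative. Unrounded, JSON comes out at about $10.72 per node rather than the rounded $10.70.

```python
ALLOWANCE_KB = 250 * 12      # pooled annual allowance per SIM (simplification)
PRICE_PER_KB = 0.01          # USD per KB of overage
NODES = 100

# 9 months at 96 msgs/day (as 9/12 of 35,040) plus 90 summer days at 288 msgs/day
MESSAGES_PER_YEAR = 35_040 * 9 / 12 + 288 * 90


def annual_kb(bytes_per_msg: int) -> float:
    """Yearly payload per node in KB, including the summer spike."""
    return MESSAGES_PER_YEAR * bytes_per_msg / 1000


def overage_usd(bytes_per_msg: int) -> float:
    """Per-node overage cost for one year; zero when under the allowance."""
    return max(0.0, annual_kb(bytes_per_msg) - ALLOWANCE_KB) * PRICE_PER_KB


for name, size in {"JSON": 78, "CBOR": 42, "Custom": 13}.items():
    print(f"{name:7s} {annual_kb(size) / 1000:.2f} MB/year  "
          f"${overage_usd(size):.2f}/node  ${overage_usd(size) * NODES:,.0f}/fleet")
```

If the operator bills the allowance strictly month by month instead of pooling it, the comparison would need to be redone per month; that variant is not covered here.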


45.5.5 Step 5: Battery Life Impact

Radio power consumption (NB-IoT):

  • Transmit power: 23 dBm (200 mW)
  • Transmission time: ~50 ms/byte (including protocol overhead)
  • Energy per byte: 200 mW x 50 ms = 10 mJ/byte

Calculate: How much battery energy does each format consume?

Energy per message (transmission only):

  • JSON: 78 bytes x 10 mJ/byte = 780 mJ
  • CBOR: 42 bytes x 10 mJ/byte = 420 mJ
  • Custom: 13 bytes x 10 mJ/byte = 130 mJ

Annual energy (35,040 messages/year):

  • JSON: 35,040 x 780 mJ = 27.3 kJ/year = 7.6 Wh/year
  • CBOR: 35,040 x 420 mJ = 14.7 kJ/year = 4.1 Wh/year
  • Custom: 35,040 x 130 mJ = 4.6 kJ/year = 1.3 Wh/year
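
The unit conversions (mW x ms to mJ, then J to Wh) are where slips usually creep in, so the sketch below reproduces the per-message and annual figures. It takes the 200 mW and ~50 ms/byte values above as given; they are the scenario's simplifying assumptions rather than measured radio behaviour, and the function names are illustrative.

```python
TX_POWER_W = 0.200           # 23 dBm is roughly 200 mW
SECONDS_PER_BYTE = 0.050     # scenario assumption: ~50 ms/byte incl. overhead
MSGS_PER_YEAR = 96 * 365     # 35,040 messages


def energy_per_message_mj(n_bytes: int) -> float:
    """Transmit energy for one message in millijoules."""
    return TX_POWER_W * SECONDS_PER_BYTE * n_bytes * 1000   # J -> mJ


def annual_energy_wh(n_bytes: int) -> float:
    """Transmit energy for one year of messages in watt-hours."""
    joules = energy_per_message_mj(n_bytes) / 1000 * MSGS_PER_YEAR
    return joules / 3600     # 1 Wh = 3600 J


for name, size in {"JSON": 78, "CBOR": 42, "Custom": 13}.items():
    print(f"{name:7s} {energy_per_message_mj(size):.0f} mJ/msg  "
          f"{annual_energy_wh(size):.1f} Wh/year")
```

Because energy scales linearly with payload size in this model, the ratios between formats are the same as the ratios between their message sizes.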

Battery capacity: 2x AA batteries = 2 x 2500 mAh x 3V = 15 Wh total

Transmission as % of battery (assuming other circuitry uses 50% of battery): - JSON: 7.6 Wh / 7.5 Wh available = 101% of available budget - 4.9 year battery life - CBOR: 4.1 Wh / 7.5 Wh = 55% of budget - 5+ year target achieved - Custom: 1.3 Wh / 7.5 Wh = 17% of budget - 5+ year target easily met

Verdict: JSON fails the 5-year battery life requirement!


45.5.6 Step 6: Final Recommendation

Compare all factors:

| Factor | JSON | CBOR | Custom Binary |
|---|---|---|---|
| Summer data cost | $1,070/year overage | $0 | $0 |
| Battery life | 4.9 years (misses target) | 5+ years | 5+ years |
| Debugging ease | Excellent (text editor) | Requires CBOR tools | Custom parser needed |
| Schema evolution | Easy (add fields) | Moderate (CBOR flexible) | Rigid (breaking changes) |
| Development time | 1 day (JSON libs everywhere) | 2-3 days (CBOR setup) | 1-2 weeks (custom parser) |
| Maintenance burden | Low (standard format) | Medium (CBOR docs) | High (DIY everything) |
| Total 5-year TCO | $5,350 (overage costs) | $0 (no overage) | $0 (no overage) |

45.5.7 Your Recommendation: CBOR (Option B)

Rationale:

  1. Meets battery life target (5+ years) with 46% size reduction vs JSON
  2. Zero overage costs even with the summer spike (2.19 MB/year < 3 MB allowance)
  3. Flexible schema - Can add new sensor types without breaking existing nodes
  4. Standard format - CBOR libraries exist for embedded C, Python, cloud processing
  5. Reasonable debugging - Tools like cbor2diag convert binary to human-readable
  6. Moderate setup - 2-3 days to integrate a CBOR library, but well-documented

Why not custom binary?

  • Custom saves only 1.01 MB/year (1.47 - 0.46 MB) per node vs CBOR
  • Zero cost benefit (both are under data cap)
  • Battery savings: 2.8 Wh/year = 6 extra months of battery life
  • Trade-off: 6 months extra battery vs 2 weeks development + ongoing maintenance burden
  • Verdict: Not worth it unless battery life is absolutely critical

Why not JSON?

  • Fails 5-year battery life target (4.9 years)
  • $1,070/year overage costs during summer spike
  • Total 5-year cost: $5,350 vs $0 for CBOR


45.5.8 Real-World Lesson

Key Insight: Don’t just optimize for bytes - optimize for total cost of ownership (TCO):

  • Data costs
  • Battery replacement costs (labor + materials)
  • Development time costs
  • Maintenance burden costs

In this case, CBOR captures a little over half of custom binary’s size reduction (46% vs 83% relative to JSON) for a fraction of the engineering effort (2-3 days vs 1-2 weeks). The sweet spot!

Bonus: With CBOR, you can add new fields (soil pH, nutrient levels) next season without reflashing nodes already in the field: new nodes simply send the extra keys, and only the cloud parser has to learn them. With custom binary, that’s a breaking change requiring firmware updates or protocol versioning.
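
A short sketch of why a keyed format tolerates this kind of change: the cloud parser looks fields up by name, so old and new nodes can coexist. The decoded CBOR map is modeled here as a plain Python dict, since no particular CBOR library is assumed, and the `ph` key and `parse_reading` function are hypothetical names chosen for illustration.

```python
REQUIRED = ("id", "moist", "temp", "batt")


def parse_reading(decoded: dict) -> dict:
    """Normalize a decoded map from either an old or a new node.

    Required keys must be present; the optional 'ph' key (hypothetical,
    added in a later firmware) defaults to None when a node omits it.
    """
    missing = [key for key in REQUIRED if key not in decoded]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    reading = dict(decoded)
    reading.setdefault("ph", None)
    return reading


old_node = {"id": "V001", "moist": 45.2, "temp": 18.5, "batt": 3.1}
new_node = {"id": "V101", "moist": 44.0, "temp": 19.0, "batt": 3.2, "ph": 6.5}

print(parse_reading(old_node)["ph"])   # None
print(parse_reading(new_node)["ph"])   # 6.5
```

A fixed-offset binary record has no equivalent of a missing key, which is why the same change there needs a version byte or a new message type.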


45.6 Fleet Tracking Quiz

Scenario: You’re building a fleet tracking system for 500 delivery trucks. Each truck sends GPS updates:

  • Data: Latitude (float), Longitude (float), Speed (km/h, 0-120), Heading (degrees, 0-359), Timestamp (Unix epoch)
  • Frequency: Every 30 seconds while moving (8 hours/day average)
  • Network: Cellular (NB-IoT)
  • Data plan: $5/month per truck for 50 MB

You’re considering three formats:

Option A: JSON

{"lat":37.7749,"lng":-122.4194,"spd":45,"hdg":180,"ts":1702834567}

Size: 68 bytes

Option B: CBOR

Binary encoding of same structure

Size: 35 bytes

Option C: Custom Binary

[4 bytes lat] [4 bytes lng] [1 byte spd] [2 bytes hdg] [4 bytes ts]

Size: 15 bytes
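
As a rough illustration of Option C, the sketch below packs and unpacks the 15-byte record with Python's standard `struct` module. The little-endian byte order and the choice to store latitude and longitude as signed 32-bit integers in units of 1e-7 degrees are assumptions; the layout above only fixes the field widths. The function names are illustrative.

```python
import struct

# lat, lng: int32 (degrees * 1e7); spd: uint8; hdg: uint16; ts: uint32
GPS_LAYOUT = "<iiBHI"


def pack_gps(lat: float, lng: float, spd: int, hdg: int, ts: int) -> bytes:
    """Encode one GPS update into a 15-byte record."""
    return struct.pack(GPS_LAYOUT, round(lat * 1e7), round(lng * 1e7),
                       spd, hdg, ts)


def unpack_gps(payload: bytes) -> dict:
    """Decode a 15-byte record back into named fields."""
    lat, lng, spd, hdg, ts = struct.unpack(GPS_LAYOUT, payload)
    return {"lat": lat / 1e7, "lng": lng / 1e7,
            "spd": spd, "hdg": hdg, "ts": ts}


payload = pack_gps(37.7749, -122.4194, 45, 180, 1702834567)
print(len(payload), unpack_gps(payload))
```

Both ends must agree on `GPS_LAYOUT` out of band; nothing in the 15 bytes says which field is which, which is the maintenance cost weighed in the answer to this quiz.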

Think about:

  1. How many messages per truck per month? (30s intervals, 8 hours/day, 22 workdays)
  2. What’s the monthly data usage for 500 trucks with each format?
  3. Will you exceed the 50 MB/month data plan with any format?
  4. What’s the total annual cost difference between formats?

Key Insights:

Messages per truck per month:

  • Updates: Every 30s for 8 hours/day
  • Per day: (8 hours x 3600s) / 30s = 960 messages/day
  • Per month: 960 x 22 workdays = 21,120 messages/month

Monthly data usage per truck:

  • JSON: 21,120 x 68 bytes = 1.44 MB/month
  • CBOR: 21,120 x 35 bytes = 0.74 MB/month
  • Custom: 21,120 x 15 bytes = 0.32 MB/month

Fleet total (500 trucks):

  • JSON: 1.44 x 500 = 720 MB/month
  • CBOR: 0.74 x 500 = 370 MB/month
  • Custom: 0.32 x 500 = 160 MB/month

Data plan analysis (50 MB/month per truck):

  • JSON: 1.44 MB < 50 MB - Under limit (3% utilization)
  • CBOR: 0.74 MB < 50 MB - Under limit (1.5% utilization)
  • Custom: 0.32 MB < 50 MB - Under limit

All formats work!

Cost analysis: Since all formats fit within the $5/month plan, costs are identical: $5 x 500 = $2,500/month

BUT - what if you want to increase the update frequency to every 10 seconds?

10-second updates (3x more frequent):

  • JSON: 4.32 MB/month - Still under 50 MB
  • CBOR: 2.22 MB/month - Still under 50 MB
  • Custom: 0.96 MB/month - Still under 50 MB
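
To explore other update intervals, the calculation can be wrapped in two small functions. This is a sketch using the scenario's numbers (8 hours/day, 22 workdays, 50 MB plan, decimal megabytes); the function names are illustrative, and because it works from unrounded values the last digit can differ slightly from the hand-rounded figures above.

```python
PLAN_MB = 50
SIZES = {"JSON": 68, "CBOR": 35, "Custom": 15}   # bytes per message


def messages_per_month(interval_s: int, hours_per_day: int = 8,
                       workdays: int = 22) -> int:
    """Number of GPS updates one truck sends in a month."""
    return hours_per_day * 3600 // interval_s * workdays


def monthly_mb(bytes_per_msg: int, interval_s: int) -> float:
    """Monthly payload per truck in decimal megabytes."""
    return messages_per_month(interval_s) * bytes_per_msg / 1_000_000


for interval in (30, 10):
    for name, size in SIZES.items():
        mb = monthly_mb(size, interval)
        print(f"{interval:2d}s {name:7s} {mb:.2f} MB/month "
              f"({mb / PLAN_MB:.1%} of plan)")
```

Even at a 10-second interval every format stays below a tenth of the plan, which is why the answer turns on maintainability rather than size.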

Best choice: CBOR (Option B)

  • 49% smaller than JSON (bandwidth efficient)
  • Self-describing format (schema flexibility)
  • Standard libraries available
  • Easy debugging with CBOR tools
  • Fits well within data plan even with future growth

Why not custom binary?

  • Saves only 0.42 MB/month per truck (1% of data plan)
  • Loses flexibility for schema changes
  • Harder to debug in production
  • Not worth the maintenance burden for minimal savings

Real-world lesson: Choose formats based on flexibility and maintainability, not just raw size. CBOR captures most of custom binary’s size savings (49% vs 78% smaller than JSON) at a fraction of the complexity. In this case, all formats fit comfortably within the data budget, so optimize for developer productivity, not bytes.


45.7 Summary

Key Decision Factors:

  1. Bandwidth constraints - The primary driver (Wi-Fi vs LoRaWAN)
  2. Scale - Format efficiency matters more at >1000 devices
  3. Battery life - Payload size directly affects transmission energy
  4. Development time - Binary formats require more setup
  5. Schema evolution - How often will your data structure change?
  6. Total cost of ownership - Not just data costs, but development and maintenance

Quick Decision Guide:

| Your Situation | Recommended Format |
|---|---|
| Prototyping anything | JSON |
| Wi-Fi/Ethernet, any scale | JSON |
| LoRaWAN, NB-IoT, moderate scale | CBOR |
| 1000+ devices, stable schema | Protobuf |
| Sigfox, extreme battery constraints | Custom binary |

45.8 What’s Next

Now that you understand how to select the right data format:

Continue to Data Formats Practice →