%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#7F8C8D', 'clusterBkg': '#ECF0F1', 'edgeLabelBackground':'#ffffff'}}}%%
flowchart TD
START["Choose Your Data Format"]
Q1{"What is your<br/>PRIMARY constraint?"}
BW["Bandwidth<br/>(LoRa, Sigfox, NB-IoT)"]
DEV["Development Speed<br/>(Prototype, MVP)"]
SCALE["Scale<br/>(10,000+ devices)"]
DEBUG["Debuggability<br/>(Production monitoring)"]
JSON_REC["<b>JSON</b><br/>━━━━━━━━<br/>• 95 bytes typical<br/>• Human-readable<br/>• Universal tools<br/>• Fastest development"]
CBOR_REC["<b>CBOR</b><br/>━━━━━━━━<br/>• 50 bytes typical<br/>• 47% smaller than JSON<br/>• Self-describing<br/>• Good balance"]
PROTO_REC["<b>Protobuf</b><br/>━━━━━━━━<br/>• 22 bytes typical<br/>• 77% smaller than JSON<br/>• Schema-enforced<br/>• Strong typing"]
CUSTOM_REC["<b>Custom Binary</b><br/>━━━━━━━━<br/>• 16 bytes typical<br/>• 83% smaller than JSON<br/>• Zero overhead<br/>• Maximum efficiency"]
START --> Q1
Q1 -->|"Need to ship fast"| DEV
Q1 -->|"Every byte counts"| BW
Q1 -->|"Massive deployment"| SCALE
Q1 -->|"Easy troubleshooting"| DEBUG
DEV --> JSON_REC
DEBUG --> JSON_REC
BW --> CBOR_REC
SCALE --> PROTO_REC
BW -->|"Extreme limits<br/>(12-byte max)"| CUSTOM_REC
style START fill:#2C3E50,stroke:#16A085,stroke-width:3px,color:#fff
style Q1 fill:#7F8C8D,stroke:#2C3E50,stroke-width:2px,color:#fff
style BW fill:#E67E22,stroke:#2C3E50,stroke-width:2px,color:#000
style DEV fill:#E67E22,stroke:#2C3E50,stroke-width:2px,color:#000
style SCALE fill:#E67E22,stroke:#2C3E50,stroke-width:2px,color:#000
style DEBUG fill:#E67E22,stroke:#2C3E50,stroke-width:2px,color:#000
style JSON_REC fill:#16A085,stroke:#2C3E50,stroke-width:2px,color:#fff
style CBOR_REC fill:#16A085,stroke:#2C3E50,stroke-width:2px,color:#fff
style PROTO_REC fill:#16A085,stroke:#2C3E50,stroke-width:2px,color:#fff
style CUSTOM_REC fill:#2C3E50,stroke:#16A085,stroke-width:2px,color:#fff
43 IoT Data Formats Overview
43.1 Learning Objectives
By the end of this chapter, you will be able to:
- Understand why data formats matter: Explain how format choice impacts bandwidth, battery, and development
- Compare human-readable formats: Evaluate JSON and XML for IoT applications
- Recognize format trade-offs: Identify when text formats are appropriate vs. binary formats
- Calculate payload overhead: Measure the cost of format metadata in typical sensor messages
This is part of a series on IoT Data Formats:
- IoT Data Formats Overview (this chapter) - Introduction and text formats
- Binary Data Formats - CBOR, Protobuf, custom binary
- Data Format Selection - Decision guides and real-world examples
- Data Formats Practice - Scenarios, quizzes, worked examples
Fundamentals:
- Data Representation - Binary and hexadecimal encoding
- Packet Structure and Framing - How data is wrapped
- Sensor to Network Pipeline - End-to-end data flow
Networking:
- MQTT - Messaging protocol with JSON/binary
- CoAP - Constrained protocol with CBOR
43.2 Prerequisites
Before starting this chapter, you should be familiar with:
- Data Representation Fundamentals: Binary encoding and byte operations
- Networking Basics: Basic idea of packets and payload size constraints
43.3 For Kids: How Do Devices Talk to Each Other?
Imagine you’re sending a letter to a friend in another country…
43.3.1 The Language Problem
If you wrote a letter in English and sent it to someone who only speaks Japanese, they wouldn’t understand it! The same thing happens with computers and IoT devices.
When a sensor wants to tell a computer “It’s 23 degrees outside,” it needs to say it in a way the computer understands. That’s what data formats are - they’re the languages that devices use to talk!
43.3.2 Different Ways to Say the Same Thing
Let’s say Temperature Terry wants to tell a computer it’s warm outside:
| Language | How Terry Says It | Good For |
|---|---|---|
| English (JSON) | “The temperature is 23 degrees” | People who need to read it |
| Shorthand (CBOR) | “T:23” | When you want to save space |
| Number Code (Binary) | “10111” | When computers talk fast |
43.3.3 A Story: The Three Messengers
Once upon a time, three messengers needed to deliver the same message: “It’s sunny and 25 degrees.”
Messenger 1 (JSON) wrote a beautiful letter: “Dear Computer, Today the weather is sunny. The temperature is exactly 25 degrees Celsius. Have a nice day!” It was easy to read but took a long time to write and a lot of paper!
Messenger 2 (CBOR) wrote a quick note: “Sun. 25.” It was shorter and faster, but harder to read unless you knew the code!
Messenger 3 (Binary) sent dots and dashes like Morse code: “.- .--. .-..” The fastest of all, but only machines could understand it!
43.3.4 Which Language Is Best?
It depends on what you need!
| If You Need… | Use This | Why |
|---|---|---|
| People to read it | JSON | It’s like regular writing |
| To save battery | CBOR or Binary | Smaller messages use less energy |
| Super fast | Binary | Computers love numbers |
43.3.5 Real Life Example: Your Fitness Tracker
When your fitness tracker counts your steps and sends them to your phone:
- The tracker measures: “I counted 5,000 steps today!”
- It picks a language: Usually a small format like CBOR (to save battery)
- It sends the message: Through Bluetooth to your phone
- Your phone translates: Turns it into words and pictures you can see!
43.3.6 Key Words for Kids
| Word | What It Means |
|---|---|
| Data | Information, like numbers and words |
| Format | The way information is organized |
| JSON | A popular way to write data that people can read |
| Binary | Data written in 0s and 1s (computer language) |
| Message | Data being sent from one place to another |
43.3.7 Try This at Home!
Play “Data Format” with a friend:
1. Think of a simple message like “I have 3 apples”
2. Long way (JSON): “I am holding three apples in my basket”
3. Short way (CBOR): “3 apples”
4. Code way: Hold up 3 fingers (no words at all!)
All three say the same thing, but in different ways!
43.4 For Beginners: Why Data Formats Matter
The Problem: How do you send sensor data so another device can understand it?
Sensor reads: Temperature = 23.5°C, Humidity = 65%
How to send it?
Option 1: "23.5,65" <- Which is which? What units?
Option 2: "temp=23.5;hum=65" <- Better, but custom format
Option 3: {"temp":23.5,"humidity":65} <- JSON (standard!)
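To make the difference concrete, here is a minimal Python sketch that reads Option 1 and Option 3 using only the standard library. The variable names are illustrative and not taken from any device SDK.

```python
import json

# Option 1: bare CSV. The receiver must already know the field order and units.
csv_payload = "23.5,65"
temp_csv, hum_csv = (float(x) for x in csv_payload.split(","))

# Option 3: JSON. Field names travel with the values, so any JSON parser works.
json_payload = '{"temp":23.5,"humidity":65}'
reading = json.loads(json_payload)

print(temp_csv, hum_csv)                     # meaning depends on position only
print(reading["temp"], reading["humidity"])  # looked up by name
```

The CSV version is shorter, but swapping the two fields on the sender would silently corrupt the data. The JSON version survives reordering because each value is found by its name.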
Analogy: Languages for Data
Data formats are like languages—both sides must speak the same one:
| Human Languages | Data Formats |
|---|---|
| English | JSON |
| Chinese | XML |
| Morse Code | Binary/CBOR |
Trade-offs:
| Format | Human-Readable | Size | Parse Speed | Ecosystem |
|---|---|---|---|---|
| JSON | Excellent | Large | Medium | Universal |
| XML | Good | Huge | Slow | Legacy systems |
| CBOR | Binary | Compact | Fast | Growing |
| MessagePack | Binary | Compact | Fast | Moderate |
| Protobuf | Binary | Very compact | Very fast | Strong (gRPC) |
| Custom Binary | None | Smallest | Fastest | DIY only |
In one sentence: Choose your data format based on bandwidth constraints and debugging needs - JSON for prototyping and high-bandwidth networks, CBOR for most IoT deployments, and custom binary only when every byte matters.
Remember this rule: CBOR gives you 80% of custom binary’s efficiency with 20% of the engineering effort - it’s the sweet spot for most constrained IoT applications.
Core Concept: IoT data formats exist on a spectrum from human-readable (JSON, on the order of 100 bytes per reading) to machine-optimized (custom binary, on the order of 10 bytes), and the IETF standardized CBOR (RFC 8949) as a compact, self-describing middle ground for constrained IoT networks.
Why It Matters: Data format directly impacts three critical costs: bandwidth (cellular IoT charges per KB), battery life (larger payloads = longer radio-on time), and development time (binary formats require custom parsers). A smart thermostat sending JSON over Wi-Fi costs nothing extra, but the same thermostat on NB-IoT cellular could cost $50/year more in data fees versus CBOR.
Key Takeaway: A useful rule of thumb is the “50-byte rule”: if your typical JSON payload exceeds 50 bytes, consider switching to CBOR. If it exceeds 100 bytes on LPWAN, consider Protobuf or custom binary. For small readings (most sensors produce under 20 bytes of actual data), format overhead matters more than the data itself: a 10-byte reading can grow to 50+ bytes as JSON, while CBOR with short keys can keep it under 20.
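The rule of thumb above can be written down as a tiny helper. This is only a sketch of the thresholds quoted in the text; the function name `suggest_format` and its return strings are hypothetical, and real projects should decide from measured traffic instead of fixed cut-offs.

```python
def suggest_format(json_payload_bytes: int, on_lpwan: bool = False) -> str:
    """Apply the 50-byte rule of thumb to a measured JSON payload size."""
    if on_lpwan and json_payload_bytes > 100:
        return "protobuf-or-custom-binary"
    if json_payload_bytes > 50:
        return "cbor"
    return "json"


print(suggest_format(40))                  # small payload, plenty of bandwidth
print(suggest_format(64))                  # above 50 bytes
print(suggest_format(120, on_lpwan=True))  # large payload on a constrained link
```

The input is the size of the message as JSON, so the first step is always to measure what you actually send.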
43.5 The Format Spectrum: Human-Readable to Binary
Alternative View:
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#7F8C8D'}}}%%
timeline
title IoT Project Format Evolution Journey
section Prototype Phase
10 Devices : JSON
: Easy debugging
: Rapid development
: 95 bytes per message
section Pilot Deployment
100 Devices : JSON or CBOR?
: First bandwidth concerns
: Cloud costs appearing
: Consider optimization
section Production Scale
1,000 Devices : CBOR
: 50 bytes per message
: 47% bandwidth savings
: Acceptable complexity
section High-Scale Fleet
10,000+ Devices : Protobuf
: 22 bytes per message
: Schema enforcement
: 77% savings critical
section Ultra-Constrained
LPWAN/Battery : Custom Binary
: 16 bytes per message
: Maximum efficiency
: Only when necessary
43.6 Real-World Example: Temperature Reading Comparison
Let’s compare how a simple temperature sensor reading is encoded in different formats. This is the actual data sent over the network:
Sensor data: Temperature = 23.5°C, Device ID = “sensor-001”, Timestamp = 1702732800
43.6.1 Format 1: JSON (Human-Readable)
{"deviceId":"sensor-001","temp":23.5,"unit":"C","ts":1702732800}
- Size: 64 bytes
- Hex dump:
7B 22 64 65 76 69 63 65 49 64 22 3A 22 73 65 6E 73 6F 72 2D 30 30 31 22 2C 22 74 65 6D 70 22 3A 32 33 2E 35 2C 22 75 6E 69 74 22 3A 22 43 22 2C 22 74 73 22 3A 31 37 30 32 37 33 32 38 30 30 7D
- Overhead: Field names (“deviceId”, “temp”, “unit”, “ts”) plus JSON syntax (braces, colons, commas, quotes) = 39 of the 64 bytes; only 25 bytes are the values themselves
- Readable: Yes, you can read it in a text editor
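You can check the size and overhead yourself. The sketch below uses only Python's standard `json` module; the way it counts "value bytes" (the printed length of each value, without quotes) is an assumption made for this illustration.

```python
import json

reading = {"deviceId": "sensor-001", "temp": 23.5, "unit": "C", "ts": 1702732800}

# separators=(",", ":") removes the spaces json.dumps inserts by default.
payload = json.dumps(reading, separators=(",", ":")).encode("utf-8")

# Bytes that carry the values themselves, without names, quotes or punctuation.
value_bytes = sum(len(str(v)) for v in reading.values())
overhead_bytes = len(payload) - value_bytes

print(len(payload), value_bytes, overhead_bytes)  # 64 25 39
print(payload.hex(" ").upper())                   # same bytes as the hex dump
```

More than half of the message is naming and punctuation, which is exactly what the binary formats below try to remove.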
43.6.2 Format 2: CBOR (Binary JSON)
A4 # Map with 4 pairs
68 6465766963654964 # "deviceId" (8 chars)
6A 73656E736F722D303031 # "sensor-001" (10 chars)
64 74656D70 # "temp" (4 chars)
F9 4DE0 # 23.5 as float16
64 756E6974 # "unit" (4 chars)
61 43 # "C" (1 char)
62 7473 # "ts" (2 chars)
1A 657DA400 # 1702732800 as uint32
- Size: 44 bytes (31% smaller than the 64-byte JSON encoding)
- Overhead: Still includes field names, but uses efficient binary encoding
- Readable: No, requires CBOR parser
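To see where each CBOR byte comes from, here is a hand-rolled sketch that encodes just this one message with the standard library. The helper names (`cbor_head`, `cbor_text`, `cbor_uint`, `cbor_float16`) are made up for this example and cover only the types used here; a real project would use a maintained CBOR library.

```python
import struct


def cbor_head(major: int, n: int) -> bytes:
    """Initial byte (major type in the top 3 bits) plus the length/value argument."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 2**8:
        return bytes([(major << 5) | 24, n])
    if n < 2**16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 2**32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", n)


def cbor_text(s: str) -> bytes:
    data = s.encode("utf-8")
    return cbor_head(3, len(data)) + data   # major type 3 = text string


def cbor_uint(n: int) -> bytes:
    return cbor_head(0, n)                  # major type 0 = unsigned integer


def cbor_float16(x: float) -> bytes:
    return b"\xf9" + struct.pack(">e", x)   # 0xF9 = half-precision float


encoded = (
    cbor_head(5, 4)                         # major type 5 = map with 4 pairs
    + cbor_text("deviceId") + cbor_text("sensor-001")
    + cbor_text("temp") + cbor_float16(23.5)
    + cbor_text("unit") + cbor_text("C")
    + cbor_text("ts") + cbor_uint(1702732800)
)

print(len(encoded))             # 44
print(encoded.hex(" ").upper())
```

The value 23.5 fits in a 16-bit float without loss, which is why the short `F9` form is safe here. Values that do not round-trip exactly need the 32-bit or 64-bit float forms.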
43.6.3 Format 3: Protocol Buffers (Schema-Based)
Schema file (shared ahead of time, never sent with the messages):
message SensorReading {
  string deviceId = 1;
  float temp = 2;
  string unit = 3;
  uint64 ts = 4;
}
Binary message:
0A 0A 73656E736F722D303031 # Field 1: "sensor-001"
15 0000BC41 # Field 2: 23.5 (float32, little-endian)
1A 01 43 # Field 3: "C"
20 80C8F6AB06 # Field 4: 1702732800 (varint)
- Size: 26 bytes (59% smaller than the 64-byte JSON encoding)
- Overhead: Field numbers (1, 2, 3, 4) instead of names
- Readable: No, requires schema + protoc
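The wire bytes can be reproduced by hand to show what Protobuf actually transmits. This is a minimal sketch of the encoding rules for the field types in this schema only; `varint`, `tag` and `string_field` are illustrative helpers, not part of the protobuf library, and in practice you would use code generated by protoc.

```python
import struct

VARINT, LEN, I32 = 0, 2, 5   # wire types needed for this message


def varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, low group first, high bit means 'more'."""
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    return varint((field_number << 3) | wire_type)


def string_field(field_number: int, s: str) -> bytes:
    data = s.encode("utf-8")
    return tag(field_number, LEN) + varint(len(data)) + data


message = (
    string_field(1, "sensor-001")              # deviceId
    + tag(2, I32) + struct.pack("<f", 23.5)    # temp as little-endian float32
    + string_field(3, "C")                     # unit
    + tag(4, VARINT) + varint(1702732800)      # ts as varint
)

print(len(message))             # 26
print(message.hex(" ").upper())
```

Note that the timestamp takes five bytes as a varint, one more than a fixed 4-byte integer would. Varints only save space for small numbers.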
43.6.4 Format 4: Custom Binary (DIY)
Byte layout:
[0-9]: deviceId "sensor-001" (10 bytes, ASCII)
[10-11]: temp = 235 (uint16, big-endian, temperature x 10, so 235 = 23.5)
[12]: unit = 0 (enum: 0=C, 1=F)
[13-16]: timestamp (uint32, big-endian, seconds since epoch)
Hex: 73656E736F722D30303100EB00657DA400
- Size: 17 bytes (73% smaller than JSON)
- Overhead: Zero! Every byte is data
- Readable: No, requires custom parser
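A fixed layout like this maps directly onto Python's `struct` module. The sketch below assumes big-endian byte order and a device ID of exactly 10 ASCII characters; `pack_reading` and `unpack_reading` are hypothetical names for this example.

```python
import struct

# 10-byte ASCII id, uint16 temperature x 10, uint8 unit, uint32 timestamp.
# ">" selects big-endian with no padding (the byte order is an assumption).
LAYOUT = struct.Struct(">10sHBI")


def pack_reading(device_id: str, temp_c: float, unit: int, ts: int) -> bytes:
    return LAYOUT.pack(device_id.encode("ascii"), round(temp_c * 10), unit, ts)


def unpack_reading(frame: bytes):
    raw_id, temp_x10, unit, ts = LAYOUT.unpack(frame)
    return raw_id.decode("ascii"), temp_x10 / 10, unit, ts


frame = pack_reading("sensor-001", 23.5, 0, 1702732800)

print(len(frame))             # 17
print(frame.hex().upper())
print(unpack_reading(frame))  # ('sensor-001', 23.5, 0, 1702732800)
```

Both ends must agree on this layout out of band. Adding a field or widening one changes the frame for every device, which is the maintenance cost that comes with zero overhead.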
43.6.5 Size Comparison Summary
| Format | Bytes | Reduction | Messages per GB | Cost per 1M messages @ $0.01/KB |
|---|---|---|---|---|
| JSON | 64 | 0% (baseline) | 15.6M | $640 |
| CBOR | 44 | 31% | 22.7M | $440 |
| Protobuf | 26 | 59% | 38.5M | $260 |
| Custom | 17 | 73% | 58.8M | $170 |
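The columns of a table like this follow from the message size alone. Here is a short sketch of the arithmetic, assuming 1 KB = 1000 bytes, the illustrative $0.01/KB tariff, and payload sizes of 64, 44, 26 and 17 bytes for the four encodings of the example message. Protocol headers are ignored.

```python
SIZES = {"JSON": 64, "CBOR": 44, "Protobuf": 26, "Custom": 17}  # bytes per message
USD_PER_KB = 0.01          # illustrative tariff, 1 KB = 1000 bytes
GB = 1_000_000_000


def table_row(name: str, size: int, baseline: int) -> dict:
    return {
        "format": name,
        "bytes": size,
        "reduction_pct": round(100 * (1 - size / baseline)),
        "msgs_per_gb_millions": round(GB / size / 1e6, 1),
        "usd_per_million_msgs": round(size * 1_000_000 / 1000 * USD_PER_KB),
    }


rows = [table_row(name, size, SIZES["JSON"]) for name, size in SIZES.items()]
for row in rows:
    print(row)
```

Swap in your own measured sizes and tariff to rebuild the comparison for a real deployment.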
Real-world impact: For 100 sensors sending data every 60 seconds (144,000 messages/day) over cellular at $0.01/KB:
- JSON: $92/day data cost
- CBOR: $63/day (31% savings = $29/day)
- Protobuf: $37/day (59% savings = $55/day)
- Custom: $24/day (73% savings = $68/day)
Key insight: The savings multiply with scale. For a 10,000-sensor deployment at the same rate and tariff, choosing Protobuf over JSON saves roughly $5,500/day (about $2 million/year) in data costs alone!
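Fleet cost is just messages times bytes times tariff, so it is easy to rerun for your own numbers. The function below is a sketch under the same assumptions as the text (payload bytes only, 1 KB = 1000 bytes, $0.01/KB); `fleet_cost_usd` is a hypothetical helper name, and the payload sizes are those of the example message (64, 44, 26 and 17 bytes).

```python
SECONDS_PER_DAY = 86_400


def fleet_cost_usd(bytes_per_msg: int, devices: int, interval_s: int,
                   days: int = 1, usd_per_kb: float = 0.01) -> float:
    """Data cost for a fleet, counting payload bytes only."""
    msgs = devices * (SECONDS_PER_DAY // interval_s) * days
    return msgs * bytes_per_msg / 1000 * usd_per_kb


sizes = {"JSON": 64, "CBOR": 44, "Protobuf": 26, "Custom": 17}
daily = {name: fleet_cost_usd(size, devices=100, interval_s=60)
         for name, size in sizes.items()}

# Yearly difference between JSON and Protobuf for a 10,000-device fleet.
yearly_saving_10k = (fleet_cost_usd(64, 10_000, 60, days=365)
                     - fleet_cost_usd(26, 10_000, 60, days=365))

print(daily)
print(round(yearly_saving_10k))
```

With `interval_s=60` each device sends 1,440 messages per day, so even small per-message differences add up quickly on a per-KB tariff.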
43.7 JSON - The Universal Choice
JavaScript Object Notation (JSON) is the most widely used text format for IoT data.
Example:
{
  "deviceId": "sensor-001",
  "temp": 23.5,
  "humidity": 65,
  "timestamp": 1702834567
}
Size: ~95 bytes as formatted above; 74 bytes with whitespace removed
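How much of a JSON message is whitespace depends on how it is serialized. As a quick sketch using the standard `json` module, the same reading can be dumped both ways and measured. The two-space indent is an assumption; wider indents produce a few more bytes.

```python
import json

reading = {
    "deviceId": "sensor-001",
    "temp": 23.5,
    "humidity": 65,
    "timestamp": 1702834567,
}

pretty = json.dumps(reading, indent=2)                # what you read while debugging
compact = json.dumps(reading, separators=(",", ":"))  # what you would put on the wire

pretty_bytes = len(pretty.encode("utf-8"))
compact_bytes = len(compact.encode("utf-8"))

print(pretty_bytes, compact_bytes)  # 91 74
```

Minifying before transmission is the cheapest optimization available: it needs no new format and every parser accepts both forms.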
Pros:
- Human-readable, easy to debug
- Universal support (every language, platform, tool)
- Self-describing (field names included)
- Easy schema evolution
Cons:
- Large overhead (field names, quotes, braces)
- Inefficient for bandwidth-constrained networks
- Parsing requires more CPU/memory than binary
Best for: Wi-Fi, Ethernet, cellular IoT where bandwidth isn’t critical
Myth: “JSON is too large and slow for IoT systems - you should always use binary formats.”
Reality: It depends on your constraints!
43.7.1 When JSON is Perfect for IoT:
- Wi-Fi/Ethernet/LTE networks: Bandwidth is plentiful (megabits/sec), so a 60-byte JSON payload is negligible
- Development/debugging: JSON is human-readable, reducing debugging time by hours
- Small deployments: For <100 devices, the total bandwidth difference is often <10GB/year
- Rapid prototyping: JSON libraries exist in every language, accelerating development
- Cloud integration: Most cloud IoT platforms (AWS IoT, Azure IoT) default to JSON
43.7.2 When to Consider Binary Formats:
- LPWAN networks (LoRaWAN, Sigfox, NB-IoT): Bandwidth measured in bytes/sec, not megabits/sec
- High message volume: >1000 devices sending >1 msg/min adds up to tens of GB/year even at 60 bytes per message
- Data cost constraints: Cellular data at $0.01/KB x 1 million messages = $640 (64-byte JSON) vs $170 (17-byte custom binary)
- Power-critical devices: Transmitting 60 bytes vs 17 bytes means roughly 3.5x more payload on the air, and radio energy grows with airtime
43.7.3 Real-World Data Point:
Smart thermostat (Wi-Fi, 1 message/5 minutes):
- JSON: 60 bytes x 12 msgs/hour x 24 hours x 365 days = 6.3 MB/year
- Custom binary: 17 bytes x same = 1.8 MB/year
- Savings: 4.5 MB/year per device, which costs nothing extra on unmetered home Wi-Fi (at a metered $0.01/KB it would be $45/year per device)
Verdict: For a 1000-home Wi-Fi deployment, JSON moves about 4.5 GB/year more than custom binary in total, which is negligible on broadband. Is that worth the engineering complexity of maintaining a custom format? Usually no!
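The yearly volume for the thermostat is simple to recompute for other reporting intervals. This is a minimal sketch that counts payload bytes only; `yearly_bytes` is a hypothetical helper name.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def yearly_bytes(bytes_per_msg: int, interval_s: int) -> int:
    """Payload bytes sent in one year at a fixed reporting interval."""
    return bytes_per_msg * (SECONDS_PER_YEAR // interval_s)


json_year = yearly_bytes(60, 300)    # one 60-byte JSON message every 5 minutes
binary_year = yearly_bytes(17, 300)  # the same reading as 17-byte custom binary
saved_mb = (json_year - binary_year) / 1e6

print(json_year, binary_year, round(saved_mb, 1))  # 6307200 1787040 4.5
```

Try `interval_s=1` to see how quickly the picture changes for chatty devices.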
Key Lesson: Don’t optimize prematurely. Start with JSON, measure your actual bandwidth usage, then optimize if needed. Many production IoT systems run JSON happily for years before hitting bandwidth limits.
When binary formats actually matter:
1. Agricultural soil sensor (LoRaWAN): 12 readings/day x 60 bytes JSON = 720 bytes/day. At low data rates this can exceed payload-size and airtime limits (which vary by region and network). Use CBOR or custom binary.
2. City parking sensor (Sigfox): 12-byte message limit. Must use custom binary, no choice.
3. Fitness tracker (BLE): 1 reading/sec x 60 bytes x 3600 sec/hour = 216 KB/hour. Drains battery! Must use an efficient binary format.
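A hard payload limit such as the 12-byte Sigfox case is easy to test for before deployment. The sketch below uses a made-up parking-sensor reading (occupied flag, battery in millivolts, timestamp); the field choice, the short JSON keys and the `fits` helper are all assumptions for illustration.

```python
import json
import struct

SIGFOX_MAX_UPLINK = 12  # bytes, the limit quoted in the text


def fits(payload: bytes, limit: int = SIGFOX_MAX_UPLINK) -> bool:
    return len(payload) <= limit


# Hypothetical parking-sensor reading.
occupied, battery_mv, ts = True, 3300, 1702732800

as_json = json.dumps({"occ": occupied, "bat": battery_mv, "ts": ts},
                     separators=(",", ":")).encode("utf-8")
# uint8 flag, uint16 millivolts, uint32 epoch seconds, big-endian, no padding.
as_binary = struct.pack(">BHI", int(occupied), battery_mv, ts)

print(len(as_json), fits(as_json))      # 39 False
print(len(as_binary), fits(as_binary))  # 7 True
```

Even with three-letter keys the JSON version is more than three times the limit, while the packed version leaves five bytes to spare.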
Bottom line: Use JSON by default. Switch to binary formats when you have actual evidence (measurements, not assumptions) that bandwidth or power consumption is a problem.
43.8 Summary
Key Points:
- Data formats are the “languages” devices use to communicate
- JSON is human-readable but verbose (~95 bytes for typical sensor data)
- Binary formats (CBOR, Protobuf) reduce size by roughly 30-75% depending on the message (31% and 59% in this chapter's worked example)
- Format choice impacts bandwidth costs, battery life, and development time
- Start with JSON for prototyping, optimize later if needed
Format Overview Table:
| Format | Size | Readability | Best For |
|---|---|---|---|
| JSON | Large (baseline) | Excellent | Wi-Fi, prototyping, debugging |
| XML | Very large | Good | Legacy systems |
| CBOR | 31% smaller | Binary | LoRaWAN, NB-IoT, CoAP |
| Protobuf | 59% smaller | Binary | High-volume, gRPC |
| Custom | 73% smaller | Binary | Sigfox, ultra-low-power |
43.9 What’s Next
Now that you understand why data formats matter and how JSON compares to binary alternatives:
- Next: Binary Data Formats - Deep dive into CBOR, Protocol Buffers, and custom binary encoding
- Alternative: Data Format Selection - Jump to the decision guide if you need to choose a format now
- Practice: Data Formats Practice - Work through scenarios and quizzes