IoT data formats determine how sensor readings are encoded for transmission, directly impacting bandwidth, battery life, and development speed. The four main formats range from human-readable JSON (best for prototyping and Wi-Fi networks) to CBOR (47% smaller, ideal for LoRaWAN and CoAP), Protocol Buffers (77% smaller, best for high-volume stable APIs), and custom binary (83% smaller, for extreme constraints like Sigfox). On a LoRaWAN sensor, switching from JSON (56 bytes) to custom binary (8 bytes) cuts radio energy per message by about 6.8x; counting radio energy alone, that stretches a single 2,400 mAh battery from 12.5 years to 85.6 years, although sleep current and battery self-discharge limit real-world lifetimes to far less.
Putting Numbers to It
Data format choice directly impacts battery life through radio transmission time:
Battery life with a 2,400 mAh battery (radio energy only): JSON = 2,400,000 uAh / 526 uAh per day = 4,563 days (12.5 years). Binary = 2,400,000 uAh / 76.8 uAh per day = 31,250 days (85.6 years). The 86% size reduction from binary encoding (56 bytes down to 8 bytes) extends radio-only battery life by 6.8x.
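The daily charge figures above can be reproduced with a short calculation. The sketch below assumes the radio parameters from the worked example in section 12.9 (250 bit/s at SF12, 44 mA transmit current, 24 transmissions per day) and counts payload bits only; the function and parameter names are illustrative, not part of any library.

```python
def radio_battery_life_years(payload_bytes, bitrate_bps=250, tx_current_ma=44,
                             tx_per_day=24, battery_mah=2400):
    """Radio-only battery life estimate; ignores sleep current and self-discharge."""
    airtime_s = payload_bytes * 8 / bitrate_bps      # seconds on air per message
    charge_uah = tx_current_ma * airtime_s / 3.6     # mA*s -> uAh (1 uAh = 3.6 mA*s)
    daily_uah = charge_uah * tx_per_day              # charge drawn per day
    days = battery_mah * 1000 / daily_uah            # battery capacity in uAh
    return days / 365

json_years = radio_battery_life_years(56)
binary_years = radio_battery_life_years(8)
print(f"JSON:   {json_years:.1f} years")
print(f"Binary: {binary_years:.1f} years")
print(f"Ratio:  {binary_years / json_years:.1f}x")
```

Because this version does not round intermediate values, the ratio equals the payload-size ratio (56 / 8 = 7) and the binary estimate lands slightly above 85.6 years. The 6.8x quoted above comes from rounding the per-transmission charge to 3.2 uAh.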
By the end of this chapter series, you will be able to:
Compare data formats: Evaluate JSON, CBOR, Protobuf, and custom binary based on size, readability, and protocol compatibility
Calculate payload overhead: Quantify how format choice impacts bandwidth consumption and battery life using airtime and energy equations
Apply selection frameworks: Use decision trees to choose the optimal format for specific IoT constraints
Design efficient payloads: Construct compact message formats for bandwidth-constrained networks such as LoRaWAN and Sigfox
Implement encoding and decoding: Write serialization code in Python for JSON, CBOR, and custom binary formats
Evaluate migration strategies: Assess format transition approaches for production IoT deployments with thousands of deployed devices
Key Concepts
JSON: Human-readable key-value format — default IoT API format but 2-5× larger than binary alternatives
CBOR: Concise Binary Object Representation — binary superset of JSON data model, no schema required
Protocol Buffers (Protobuf): Schema-defined binary serialization that achieves compact encoding and fast parsing through code-generated parsers
MessagePack: Binary-JSON bridge: no schema needed, 30-50% smaller than JSON, drop-in replacement for JSON libraries
Serialization: Converting in-memory data structures to bytes for transmission or storage
Deserialization: Reconstructing data structures from bytes — must match the serialization format exactly
Data Format Trade-off: Human readability (JSON) vs. size efficiency (CBOR/Protobuf) vs. schema flexibility (self-describing vs. compiled)
12.2 MVU: Minimum Viable Understanding
In 60 seconds, understand IoT data formats:
Data formats are the “languages” IoT devices use to communicate. The choice directly impacts three critical resources:
Resource impact at a glance:
Bandwidth: Smaller formats mean more messages per second. Example: 95 bytes (JSON) vs 16 bytes (binary).
Battery: Less data means less radio time and longer life. Example: roughly 6x efficiency gain with binary.
Development: Human-readable formats are easier to debug. Example: JSON shows {"temp": 23} vs 0x17.
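To make the readability trade-off concrete, the sketch below encodes the same reading (23) both ways using only Python's standard library. The key name `temp` comes from the example above; packing the value into a single unsigned byte is an assumption that only holds for readings between 0 and 255.

```python
import json
import struct

reading = 23

as_json = json.dumps({"temp": reading}).encode("utf-8")
as_binary = struct.pack("B", reading)   # one unsigned byte

print(as_json, len(as_json))            # human-readable text
print(as_binary.hex(), len(as_binary))  # raw byte, needs documentation to interpret
```

The JSON form carries its own field name, while the single byte is meaningless without a separate description of the layout.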
The four main formats (from human-readable to compact):
| Format | Size Reduction | When to Use |
|---|---|---|
| JSON | Baseline (0%) | Wi-Fi networks, prototyping, debugging |
| CBOR | ~47% smaller | LoRaWAN, NB-IoT, CoAP - the sweet spot |
| Protobuf | ~77% smaller | High-volume APIs, stable schemas |
| Custom Binary | ~83% smaller | Sigfox, extreme constraints |
The golden rule: Start with JSON for development, move to CBOR for production IoT, use custom binary only when every byte counts.
Read on for detailed comparisons, or jump to the Decision Framework to pick your format.
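One way to express the golden rule in code is to pick the most readable format whose encoded message still fits the network's payload limit. This is a minimal sketch, not the chapter's Decision Framework: `pick_format` and its preference order are hypothetical, and the sizes are the illustrative figures quoted in this chapter (95-byte JSON, 50-byte CBOR, 16-byte binary).

```python
# Formats ordered from most to least human-readable (assumed preference order).
READABILITY_ORDER = ["JSON", "CBOR", "Custom Binary"]

def pick_format(encoded_sizes, payload_limit_bytes):
    """Return the most readable format that fits, or None if nothing fits."""
    for name in READABILITY_ORDER:
        size = encoded_sizes.get(name)
        if size is not None and size <= payload_limit_bytes:
            return name
    return None

sizes = {"JSON": 95, "CBOR": 50, "Custom Binary": 16}
print(pick_format(sizes, 51))   # LoRaWAN limit at the slowest data rate
print(pick_format(sizes, 12))   # Sigfox uplink limit
```

A `None` result for the 12-byte case signals that even the 16-byte binary layout would need tighter bit-packing before it fits.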
12.3 For Kids: The Secret Languages of Smart Devices!
Sensor Squad: How Do Devices Send Messages?
Hey there, future inventor! Have you ever wondered how smart devices talk to each other?
12.3.1 Meet the Message Messengers!
Imagine you need to send a message to a friend far away. You could:
Write a long letter (like JSON) - Easy to read but takes lots of paper!
Use abbreviations (like CBOR) - “Temp=23” instead of “The temperature is 23 degrees”
Send a secret code (like Binary) - Super fast but only computers understand!
12.3.2 A Sensor Squad Story
Temperature Terry the sensor measured 25 degrees. He needed to tell Cloudy the cloud computer!
Terry’s choices:
Letter (JSON): “Dear Cloudy, Today the temperature is exactly 25 degrees Celsius. From Terry.”
95 letters long! Uses lots of battery to send!
Quick Note (CBOR): “T:25”
Only 50 letters! Faster to send!
Secret Code (Binary): “00011001”
Just 16 letters! Super fast but hard to read!
Terry chose CBOR because it saved battery but his friend the programmer could still read it if something went wrong!
12.3.3 The Message Size Game
| Message Type | Size | Like… |
|---|---|---|
| JSON | BIG | Writing a full story |
| CBOR | MEDIUM | Writing a quick text |
| Binary | TINY | Drawing one emoji |
12.3.4 Why Does Size Matter?
Battery Bella the sensor says:
> “Every time I send a message, I use battery power! If I send shorter messages, my battery lasts LONGER! That’s why I love compact formats!”
12.3.5 Real-World Examples
| Device | Format Used | Why |
|---|---|---|
| Your phone apps | JSON | Easy for people to fix bugs |
| Smart farm sensors | CBOR | Saves battery in fields |
| Satellite trackers | Binary | Every byte costs money! |
Fun Challenge: If JSON uses 95 letters and Binary uses 16, how many times smaller is Binary? (Answer: About 6 times smaller!)
Data formats determine how IoT devices encode and exchange information. The choice of format impacts bandwidth usage, battery life, parsing speed, and development complexity. This series covers everything from human-readable JSON to ultra-compact custom binary encodings.
For Beginners: Why Data Formats Matter
Think of data formats like different ways to pack a suitcase:
JSON is like packing with labels on everything: “This is my blue shirt, this is my toothbrush” - easy to find things but takes up space!
CBOR is like vacuum-sealing with small labels - saves space while still being organized
Binary is like compressing everything into the smallest possible bag - maximum efficiency but you need to remember where everything is!
Why does this matter for IoT?
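Constrained networks: At its slowest data rate, LoRaWAN allows only 51 bytes per message - a typical 95-byte JSON reading won’t fit!
Battery life: A sensor sending 6x less data spends about 6x less energy on radio transmissions, which can noticeably extend battery life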
Test your understanding of data format fundamentals:
Question 1: Format Selection
Scenario: You’re designing a smart agriculture system with 500 soil moisture sensors using LoRaWAN (51-byte payload limit). Each sensor sends: device ID (6 chars), moisture (0-100%), temperature (-10 to 50C), battery (0-100%), and timestamp.
Which format would you choose?
A) JSON - for easy debugging
B) XML - for compatibility with enterprise systems
C) CBOR - for good balance of size and debuggability
D) Custom binary - for maximum efficiency
Answer: C) CBOR
CBOR is the best choice.
Why not the others?
JSON (A): Would exceed the 51-byte LoRaWAN limit (~95 bytes)
XML (B): Even larger than JSON (~120 bytes)
Custom binary (D): Works but adds engineering complexity; CBOR fits comfortably and is easier to debug
CBOR fits (~45 bytes) and allows field technicians to decode messages for troubleshooting.
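The size estimates in this answer can be sanity-checked by encoding one reading. The sketch below uses hypothetical field names and values, Python's standard `json` module, and a fixed `struct` layout as the custom-binary comparison. Exact byte counts depend on key names and values, so they will not match the approximate figures above.

```python
import json
import struct

LORAWAN_DR0_LIMIT = 51  # bytes, from the scenario

reading = {             # hypothetical example values
    "device_id": "SM0042",
    "moisture": 45,
    "temperature": 21,
    "battery": 87,
    "timestamp": 1700000000,
}

# Compact separators remove the optional spaces JSON allows
json_bytes = json.dumps(reading, separators=(",", ":")).encode("utf-8")

# 6-char ID, uint8 moisture, int8 temperature, uint8 battery, uint32 timestamp
binary_bytes = struct.pack(">6sBbBI", reading["device_id"].encode("ascii"),
                           reading["moisture"], reading["temperature"],
                           reading["battery"], reading["timestamp"])

for name, data in [("JSON", json_bytes), ("binary", binary_bytes)]:
    fits = "fits" if len(data) <= LORAWAN_DR0_LIMIT else "too large"
    print(f"{name}: {len(data)} bytes ({fits})")
```

Even with whitespace stripped, the JSON message exceeds the limit because every field name travels with every reading.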
Question 2: Battery Impact Calculation
Scenario: A sensor sends 100 messages per day. The radio uses 100 mA during transmission at 50 kbps.
If switching from JSON (95 bytes) to CBOR (50 bytes), approximately how much less battery is used per day?
A) No change - format doesn’t affect battery
B) About 47% less (proportional to size reduction)
C) About 90% less
D) Depends on the processor speed
Answer: B) About 47% less
This is proportional to the size reduction.
Calculation:
JSON: 95 bytes x 100 messages = 9,500 bytes/day
CBOR: 50 bytes x 100 messages = 5,000 bytes/day
Reduction: (9,500 - 5,000) / 9,500 = 47%
To a first approximation, radio transmission time (and therefore transmit energy) is proportional to data size; fixed per-packet overhead such as preambles and headers is ignored here. Smaller messages = less radio-on time = longer battery life.
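The same result follows from the transmit-charge relation, assuming (as the question does) that only payload bits are counted. The symbols are introduced here for illustration: $N$ is payload bytes per message, $n$ is messages per day, $R$ is the bit rate, and $I$ is the transmit current.

$$
Q_{\text{day}} = I \cdot \frac{8 N n}{R}
$$

$$
Q_{\text{JSON}} = 100\,\text{mA} \cdot \frac{8 \cdot 95 \cdot 100}{50{,}000\,\text{bit/s}} = 152\,\text{mA·s} \approx 42.2\,\mu\text{Ah}
$$

$$
Q_{\text{CBOR}} = 100\,\text{mA} \cdot \frac{8 \cdot 50 \cdot 100}{50{,}000\,\text{bit/s}} = 80\,\text{mA·s} \approx 22.2\,\mu\text{Ah}
$$

$$
1 - \frac{Q_{\text{CBOR}}}{Q_{\text{JSON}}} = 1 - \frac{50}{95} \approx 47\%
$$

Since $I$, $n$, and $R$ cancel in the ratio, the saving depends only on the two payload sizes.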
Question 3: When to Use Custom Binary
Which scenario justifies the engineering effort of custom binary encoding?
A) A home automation hub with Wi-Fi connectivity
B) A fitness tracker syncing to a smartphone via Bluetooth
C) A Sigfox-based asset tracker with 12-byte payload limit
D) A smart factory with Ethernet-connected PLCs
Answer: C) A Sigfox-based asset tracker with 12-byte payload limit
Why?
Sigfox’s 12-byte limit is so restrictive that even CBOR may not fit a complete message
Custom binary allows precise bit-packing (e.g., latitude/longitude in 3 bytes each instead of 8)
The high engineering cost is justified by the extreme constraints
Why not the others?
A, B, D: All have sufficient bandwidth for CBOR or even JSON, making custom binary an unnecessary complexity.
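The 3-byte coordinate packing mentioned above can be sketched as follows. The linear scaling to a 24-bit unsigned integer and the helper names are assumptions for illustration; a real tracker may use a different layout.

```python
def pack_coord(value, min_value, max_value):
    """Scale a coordinate into 24 bits and return 3 big-endian bytes."""
    scale = (2**24 - 1) / (max_value - min_value)
    return round((value - min_value) * scale).to_bytes(3, "big")

def unpack_coord(data, min_value, max_value):
    """Inverse of pack_coord."""
    scale = (2**24 - 1) / (max_value - min_value)
    return int.from_bytes(data, "big") / scale + min_value

def pack_position(lat, lon):
    """Latitude then longitude, 3 bytes each."""
    return pack_coord(lat, -90, 90) + pack_coord(lon, -180, 180)

payload = pack_position(48.8566, 2.3522)   # example coordinates
lat = unpack_coord(payload[:3], -90, 90)
lon = unpack_coord(payload[3:], -180, 180)
print(len(payload), round(lat, 4), round(lon, 4))
```

The position takes 6 of Sigfox's 12 bytes. With 24 bits per axis the quantization step is about 0.00001 degrees of latitude and 0.00002 degrees of longitude, roughly one to two metres on the ground.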
12.9 Worked Example: LoRaWAN Payload Design
Designing a 51-Byte LoRaWAN Payload for Agricultural Sensors
Scenario: A vineyard deploys 200 soil monitoring nodes. Each node measures: soil moisture (0-100%), soil temperature (-10 to 60C), air temperature (-20 to 50C), battery voltage (2.5-4.2V), and includes a device ID. LoRaWAN DR0 (SF12) allows a maximum 51-byte payload.
CBOR size: ~47 bytes – fits within the 51-byte limit, but barely. No room for future fields.
Step 3: Custom binary encoding (optimal for LoRaWAN)
```python
import struct

def encode_vineyard_payload(device_id, soil_moisture, soil_temp, air_temp, battery_mv):
    """Pack sensor data into 8 bytes for LoRaWAN transmission."""
    # Version:       1 byte (message format version)
    # Device ID:     2 bytes (0-65535)
    # Soil moisture: 1 byte (0-200 = 0.0-100.0%, 0.5% steps)
    # Soil temp:     1 byte (offset by 40: value 0-127 maps to -40 to 87C)
    # Air temp:      1 byte (offset by 40: value 0-127 maps to -40 to 87C)
    # Battery:       1 byte (0-255 maps to 2.0V-4.55V in 10mV steps)
    # Reserved:      1 byte (future sensors)
    payload = struct.pack('>B H B B B B B',
        0x01,                           # format version 1
        device_id,
        int(soil_moisture * 2),         # 0.5% resolution
        int(soil_temp + 40),            # offset encoding
        int(air_temp + 40),             # offset encoding
        int((battery_mv - 2000) / 10),  # 10mV resolution
        0x00                            # reserved
    )
    return payload  # 8 bytes total

# Example
payload = encode_vineyard_payload(42, 67.3, 18.5, 22.1, 3820)
print(f"Payload: {payload.hex()}")      # 01002a863a3eb600
print(f"Size: {len(payload)} bytes")    # 8 bytes
```
Result: 8 bytes – 86% smaller than JSON (56 bytes), 83% smaller than CBOR (47 bytes).
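Decoding on the server side mirrors the packing step. The sketch below is an assumed counterpart to the encoder above (the function name and the returned dictionary keys are illustrative); it reverses the same offsets and scale factors and rejects unknown format versions.

```python
import struct

def decode_vineyard_payload(payload):
    """Unpack the 8-byte vineyard payload back into engineering units."""
    version, device_id, moisture, soil_t, air_t, batt, _reserved = struct.unpack(
        ">BHBBBBB", payload)
    if version != 0x01:
        raise ValueError(f"unsupported format version: {version}")
    return {
        "device_id": device_id,
        "soil_moisture_pct": moisture / 2,   # 0.5% steps
        "soil_temp_c": soil_t - 40,          # offset encoding
        "air_temp_c": air_t - 40,            # offset encoding
        "battery_mv": batt * 10 + 2000,      # 10 mV steps above 2.0 V
    }

# Hex string taken from the encoder example above
decoded = decode_vineyard_payload(bytes.fromhex("01002a863a3eb600"))
print(decoded)
```

The decoded values (67.0%, 18 C, 22 C) differ slightly from the inputs (67.3%, 18.5 C, 22.1 C) because the encoder truncates to 0.5% and 1 C steps. That precision loss is the price of the one-byte fields.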
Step 4: Battery life impact calculation
LoRaWAN SF12 data rate: 250 bits/second
Radio current draw: 44 mA (SX1276 at +14 dBm)
JSON (56 bytes = 448 bits): DOES NOT FIT in DR0
CBOR (47 bytes = 376 bits): 376 / 250 = 1.50 seconds airtime
Binary (8 bytes = 64 bits): 64 / 250 = 0.26 seconds airtime
Per transmission energy:
CBOR: 44 mA x 1.50 s = 66.0 mAs = 18.3 uAh
Binary: 44 mA x 0.26 s = 11.4 mAs = 3.2 uAh
Daily (24 transmissions):
CBOR: 24 x 18.3 uAh = 439 uAh/day
Binary: 24 x 3.2 uAh = 76.8 uAh/day
Extra battery life from binary vs CBOR:
Savings: 439 - 76.8 = 362 uAh/day saved
On 2400 mAh battery (radio only):
CBOR: 2,400,000 / 439 = 5,467 days = 15.0 years
Binary: 2,400,000 / 76.8 = 31,250 days = 85.6 years
Note: total device lifetime limited by sleep current and
battery self-discharge (~2%/year), not radio time alone.
Realistic lifetime difference: ~8 years (CBOR) vs ~10 years (binary).
Conclusion: For this vineyard application, custom binary is the clear winner. The 8-byte payload leaves 43 bytes of headroom for future sensor additions, and the reduced airtime saves battery and reduces LoRaWAN duty cycle consumption.
12.10 Code Example: Encoding and Decoding in Python
What to observe: Run this code and note how the same data shrinks from 74 bytes (JSON) to 49 bytes (CBOR) to just 10 bytes (binary). The trade-off is that JSON is readable as text, CBOR is debuggable with a CBOR decoder, but binary requires your custom format documentation to interpret.
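The comparison can be sketched with the standard library alone. This is not the chapter's original listing: the reading and its field names are hypothetical, so the byte counts differ from the figures quoted above. The CBOR encoder is a hand-rolled subset (integers, floats, text strings, maps) written only so the example runs without dependencies; a maintained library such as `cbor2` would normally be used instead.

```python
import json
import struct

def cbor_head(major, n):
    """Encode a CBOR initial byte plus its length/value argument (RFC 8949)."""
    if n < 24:
        return bytes([major << 5 | n])
    for info, fmt, limit in ((24, ">B", 2**8), (25, ">H", 2**16),
                             (26, ">I", 2**32), (27, ">Q", 2**64)):
        if n < limit:
            return bytes([major << 5 | info]) + struct.pack(fmt, n)
    raise ValueError("integer too large")

def cbor_encode(obj):
    """Minimal CBOR encoder: ints, floats, text strings, and dicts only."""
    if isinstance(obj, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(obj, int):
        return cbor_head(0, obj) if obj >= 0 else cbor_head(1, -1 - obj)
    if isinstance(obj, float):
        return b"\xfb" + struct.pack(">d", obj)   # always float64 here
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        return cbor_head(3, len(raw)) + raw
    if isinstance(obj, dict):
        return cbor_head(5, len(obj)) + b"".join(
            cbor_encode(k) + cbor_encode(v) for k, v in obj.items())
    raise TypeError(f"unsupported type: {type(obj).__name__}")

reading = {"id": "sensor-01", "temp": 23.5, "hum": 61, "batt": 3820}  # hypothetical

json_bytes = json.dumps(reading, separators=(",", ":")).encode("utf-8")
cbor_bytes = cbor_encode(reading)
# Custom binary: sensor number, temperature in 0.1 C steps, humidity %, battery mV
binary_bytes = struct.pack(">HhBH", 1, round(reading["temp"] * 10),
                           reading["hum"], reading["batt"])

# Decoding: JSON is self-describing; binary needs the layout documented above
assert json.loads(json_bytes) == reading
sensor_no, temp_x10, hum, batt = struct.unpack(">HhBH", binary_bytes)

for name, data in (("JSON", json_bytes), ("CBOR", cbor_bytes), ("binary", binary_bytes)):
    print(f"{name:6} {len(data):3d} bytes  {data.hex()}")
```

In the hex output, the CBOR bytes still contain the ASCII field names, which is why a generic decoder can interpret them. The binary bytes contain only values.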
12.11 Prerequisites
Before starting this series, you should be familiar with:
Common Pitfalls
1. Encoding Sensor Readings as JSON Strings Instead of Numbers
Sending {"temperature": "23.5"} (a string) instead of {"temperature": 23.5} (a number) forces consumers to parse strings, breaks numeric queries, and increases message size. Validate that all numeric fields are encoded as JSON numbers; this is a common IoT data format error.
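A consumer-side check for this pitfall might look like the sketch below. The function name and the list of expected numeric fields are hypothetical; the check relies only on Python's standard `json` module.

```python
import json

def non_numeric_fields(message, numeric_fields):
    """Return the names of fields that are missing or not JSON numbers."""
    data = json.loads(message)
    bad = []
    for field in numeric_fields:
        value = data.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            bad.append(field)
    return bad

print(non_numeric_fields('{"temperature": "23.5"}', ["temperature"]))  # string: flagged
print(non_numeric_fields('{"temperature": 23.5}', ["temperature"]))    # number: passes
```

Running a check like this at the ingestion boundary catches mis-typed readings before they reach storage or queries.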
2. Ignoring Schema Evolution When Updating Firmware
Adding a new field to a JSON/CBOR message immediately after a firmware update means old consumers (not yet updated) receive unexpected fields. Use additive-only schema changes (never remove or rename fields), version your schemas, and handle unknown fields gracefully in all consumers.
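Handling unknown fields gracefully can be as simple as the sketch below. The version key `v`, the field names, and the supported-version set are all hypothetical choices made for illustration.

```python
import json

KNOWN_FIELDS = {"v", "temp", "hum"}   # fields this consumer understands (hypothetical)
SUPPORTED_VERSIONS = {1, 2}

def parse_reading(message):
    """Parse a versioned reading, ignoring fields added by newer firmware."""
    data = json.loads(message)
    version = data.get("v", 1)        # treat a missing version as v1
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported schema version: {version}")
    unknown = sorted(set(data) - KNOWN_FIELDS)
    known = {k: data[k] for k in data if k in KNOWN_FIELDS}
    return known, unknown

# A v2 device added "co2"; this older consumer keeps working and can log the extra field.
known, unknown = parse_reading('{"v": 2, "temp": 21.5, "hum": 40, "co2": 415}')
print(known, unknown)
```

Returning the unknown field names instead of raising lets operators see that newer firmware is in the field without breaking ingestion.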
3. Using Floating Point for All Numeric Fields
IEEE 754 float64 uses 8 bytes per value, so a 10-field sensor reading becomes 80 bytes just for numbers. Integer-scaled fixed point (temperature × 100 as int16) cuts this to 2 bytes per value while keeping 0.01-unit resolution over a range of -327.68 to 327.67, which is sufficient for typical IoT sensor ranges.
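The fixed-point scaling described above takes only a few lines with Python's `struct` module. The sample temperature and the little-endian byte order are assumptions for illustration.

```python
import struct

temperature_c = 23.57

as_float64 = struct.pack("<d", temperature_c)              # 8 bytes
as_int16 = struct.pack("<h", round(temperature_c * 100))   # 2 bytes, 0.01 C steps

# The receiver divides by the same scale factor to recover the reading
decoded = struct.unpack("<h", as_int16)[0] / 100
print(len(as_float64), len(as_int16), decoded)
```

Both sides must agree on the scale factor, and values outside the int16 range make `struct.pack` raise an error rather than silently wrapping.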
12.13 Summary
Key Takeaways:
Format choice is a trade-off: Human readability vs. efficiency - there’s no perfect format for all cases
CBOR is the IoT sweet spot: 47% size reduction with self-describing structure for debugging
Match format to constraints: LoRaWAN/Sigfox need compact formats; Wi-Fi can use JSON freely
Consider total cost: Development time, debugging ease, and battery life all factor into format selection
Start simple, optimize later: Begin with JSON for prototyping, switch to CBOR/binary for production