12  Data Formats for IoT

In 60 Seconds

IoT data formats determine how sensor readings are encoded for transmission, directly affecting bandwidth, battery life, and development speed. The four main formats span a spectrum: human-readable JSON (best for prototyping and Wi-Fi networks), CBOR (about 47% smaller than JSON; ideal for LoRaWAN and CoAP), Protocol Buffers (about 77% smaller; best for high-volume, stable APIs), and custom binary (about 83% smaller; for extreme constraints such as Sigfox). In a simplified radio-only model of a LoRaWAN sensor, switching from JSON (56 bytes) to custom binary (8 bytes) stretches the transmit-energy budget of a single 2,400 mAh battery by about 6.8x, from 12.5 years to 85.6 years; in practice, sleep current and battery self-discharge keep real lifetimes far shorter.

Data format choice directly impacts battery life through radio transmission time:

\[\text{Transmission Time} = \frac{\text{Payload Size (bytes)} \times 8 \text{ bits/byte}}{\text{Data Rate (bits/sec)}}\]

\[\text{Battery Drain} = \text{Radio Current (mA)} \times \text{Transmission Time (sec)}\]

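Worked example: LoRaWAN sensor at SF12 (250 bits/sec data rate), 44 mA radio current, transmitting 24 times/day. This is a simplified model: it counts only the application payload's bits and only radio transmit energy, whereas real LoRaWAN airtime also includes the preamble, PHY header, and 13 bytes of MAC overhead. Note also that the 56-byte JSON payload would actually exceed the 51-byte maximum at SF12 (DR0), so the JSON figures below are a theoretical comparison.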

JSON payload (56 bytes): (56 × 8) / 250 = 1.79 seconds airtime. Energy: 44 mA × 1.79s = 78.8 mAs = 21.9 µAh per transmission. Daily: 24 × 21.9 = 526 µAh/day.

Custom binary (8 bytes): (8 × 8) / 250 = 0.26 seconds airtime. Energy: 44 mA × 0.26s = 11.4 mAs = 3.2 µAh per transmission. Daily: 24 × 3.2 = 76.8 µAh/day.

Battery life with 2,400 mAh battery: JSON = 2,400,000 / 526 = 4,563 days (12.5 years). Binary = 2,400,000 / 76.8 = 31,250 days (85.6 years). The 86% size reduction from binary encoding extends battery life by 6.8x.
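The arithmetic above can be reproduced with a short Python sketch of the same simplified model (payload bits only, radio transmit energy only, no sleep current or protocol overhead). The function name and its default arguments are illustrative choices taken from the worked example's parameters, not part of any library.

```python
def radio_battery_budget(payload_bytes, data_rate_bps=250, radio_ma=44.0,
                         tx_per_day=24, battery_mah=2400.0):
    """Simplified model from the text: airtime = bits / data rate,
    energy = current x airtime. Ignores LoRa preamble/headers, sleep
    current, and battery self-discharge."""
    airtime_s = payload_bytes * 8 / data_rate_bps
    uah_per_tx = radio_ma * airtime_s / 3600 * 1000   # mA*s -> mAh -> uAh
    uah_per_day = uah_per_tx * tx_per_day
    years = battery_mah * 1000 / uah_per_day / 365
    return airtime_s, uah_per_tx, uah_per_day, years

for name, size in [("JSON", 56), ("Custom binary", 8)]:
    airtime, per_tx, per_day, years = radio_battery_budget(size)
    print(f"{name:13s} {size:3d} B  airtime {airtime:.3f} s  "
          f"{per_tx:5.1f} uAh/tx  {per_day:6.1f} uAh/day  {years:5.1f} years")
```

Because this sketch does not round the intermediate airtime, the binary case comes out at about 75 µAh/day and 87.6 years, and the ratio is exactly 56/8 = 7x; the 6.8x figure above reflects rounding the binary airtime from 0.256 s to 0.26 s.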


12.1 Learning Objectives

By the end of this chapter series, you will be able to:

  • Compare data formats: Evaluate JSON, CBOR, Protobuf, and custom binary based on size, readability, and protocol compatibility
  • Calculate payload overhead: Quantify how format choice impacts bandwidth consumption and battery life using airtime and energy equations
  • Apply selection frameworks: Use decision trees to choose the optimal format for specific IoT constraints
  • Design efficient payloads: Construct compact message formats for bandwidth-constrained networks such as LoRaWAN and Sigfox
  • Implement encoding and decoding: Write serialization code in Python for JSON, CBOR, and custom binary formats
  • Evaluate migration strategies: Assess format transition approaches for production IoT deployments with thousands of deployed devices
Key Concepts
  • JSON: Human-readable key-value format — default IoT API format but 2-5× larger than binary alternatives
  • CBOR: Concise Binary Object Representation (RFC 8949), a binary encoding whose data model is a superset of JSON's; no schema required
  • Protocol Buffers (Protobuf): Schema-defined binary serialization achieving maximum efficiency with code-generated parsers
  • MessagePack: Schema-less binary alternative to JSON, typically 30-50% smaller; most libraries mirror JSON's encode/decode API, so it is often close to a drop-in replacement
  • Serialization: Converting in-memory data structures to bytes for transmission or storage
  • Deserialization: Reconstructing data structures from bytes — must match the serialization format exactly
  • Data Format Trade-off: Human readability (JSON) vs. size efficiency (CBOR/Protobuf) vs. schema flexibility (self-describing vs. compiled)
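To make the point that deserialization must match serialization exactly, here is a small standard-library sketch (an illustration, not from the original text): the same two bytes decode to a different number if the reader assumes the wrong byte order.

```python
import struct

# Serialize device ID 42 as an unsigned 16-bit big-endian integer ('>H').
wire = struct.pack('>H', 42)
print(wire.hex())                      # 002a

# Deserializing with the matching format recovers the value...
print(struct.unpack('>H', wire)[0])    # 42

# ...but reading the same bytes as little-endian ('<H') silently yields garbage.
print(struct.unpack('<H', wire)[0])    # 10752
```

Nothing in the bytes themselves flags the mistake, which is why binary formats need their layout (field order, widths, byte order) documented as carefully as the code.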

12.2 MVU: Minimum Viable Understanding

In 60 seconds, understand IoT data formats:

Data formats are the “languages” IoT devices use to communicate. The choice directly impacts three critical resources:

Resource impact at a glance:

  • Bandwidth: Smaller formats fit more messages into the same bandwidth. Example: 95 bytes (JSON) vs 16 bytes (binary).
  • Battery: Less data means less radio time and longer life. Example: roughly 6x less radio airtime with binary.
  • Development: Human-readable formats are easier to debug. Example: JSON shows {"temp": 23} vs 0x17.

The four main formats (from human-readable to compact):

Format         Size reduction  When to use
-------------  --------------  --------------------------------------
JSON           Baseline (0%)   Wi-Fi networks, prototyping, debugging
CBOR           ~47% smaller    LoRaWAN, NB-IoT, CoAP - the sweet spot
Protobuf       ~77% smaller    High-volume APIs, stable schemas
Custom binary  ~83% smaller    Sigfox, extreme constraints

The golden rule: Start with JSON for development, move to CBOR for production IoT, use custom binary only when every byte counts.

Read on for detailed comparisons, or jump to the Decision Framework to pick your format.


12.3 For Kids: The Secret Languages of Smart Devices!

Hey there, future inventor! Have you ever wondered how smart devices talk to each other?

12.3.1 Meet the Message Messengers!

Imagine you need to send a message to a friend far away. You could:

  1. Write a long letter (like JSON) - Easy to read but takes lots of paper!
  2. Use abbreviations (like CBOR) - “Temp=23” instead of “The temperature is 23 degrees”
  3. Send a secret code (like Binary) - Super fast but only computers understand!

12.3.2 A Sensor Squad Story

Temperature Terry the sensor measured 25 degrees. He needed to tell Cloudy the cloud computer!

Terry’s choices:

  • Letter (JSON): “Dear Cloudy, Today the temperature is exactly 25 degrees Celsius. From Terry.”
    • A real JSON sensor message is about 95 letters (bytes) long! Uses lots of battery to send!
  • Quick Note (CBOR): “T:25”
    • A real CBOR message is only about 50 letters! Faster to send!
  • Secret Code (Binary): “00011001”
    • A real binary message is just 16 letters! Super fast but hard to read!

Terry chose CBOR because it saved battery, but his friend the programmer could still read it with a decoder tool if something went wrong!

12.3.3 The Message Size Game

Message type  Size    Like…
------------  ------  --------------------
JSON          BIG     Writing a full story
CBOR          MEDIUM  Writing a quick text
Binary        TINY    Drawing one emoji

12.3.4 Why Does Size Matter?

Battery Bella the sensor says: “Every time I send a message, I use battery power! If I send shorter messages, my battery lasts LONGER! That’s why I love compact formats!”

12.3.5 Real-World Examples

Device              Format used  Why
------------------  -----------  ---------------------------
Your phone apps     JSON         Easy for people to fix bugs
Smart farm sensors  CBOR         Saves battery in fields
Satellite trackers  Binary       Every byte costs money!

Fun Challenge: If JSON uses 95 letters and Binary uses 16, how many times smaller is Binary? (Answer: About 6 times smaller!)




12.4 Overview

Data formats determine how IoT devices encode and exchange information. The choice of format impacts bandwidth usage, battery life, parsing speed, and development complexity. This series covers everything from human-readable JSON to ultra-compact custom binary encodings.

Think of data formats like different ways to pack a suitcase:

  • JSON is like packing with labels on everything: “This is my blue shirt, this is my toothbrush” - easy to find things but takes up space!
  • CBOR is like vacuum-sealing with small labels - saves space while still being organized
  • Binary is like compressing everything into the smallest possible bag - maximum efficiency but you need to remember where everything is!

Why does this matter for IoT?

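  1. Constrained networks: At its slowest, longest-range data rates (e.g., SF10-SF12 in the EU868 band), LoRaWAN carries only 51 bytes of application payload per message - a typical JSON reading won’t fit!
  2. Battery life: A sensor sending 6x less data spends about 6x less energy on radio transmission, which can greatly extend battery life when the radio dominates the power budget
  3. Cost: Cellular IoT plans often bill by data volume - smaller messages = lower bills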

The simple rule: Use the most human-readable format your constraints allow.
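The size side of this rule can be sketched in a few lines: walk the formats from most to least readable and stop at the first one whose payload fits the link's limit. The per-format sizes are the example figures from this chapter's Quick Reference; the function name is hypothetical, and a real decision would also weigh debugging needs and schema stability, which this sketch ignores.

```python
# Formats ordered from most to least human-readable, with the example
# payload sizes quoted in this chapter for one sensor reading (bytes).
FORMATS = [
    ("JSON", 95),
    ("CBOR", 50),
    ("Protobuf", 22),
    ("Custom binary", 16),
]

def most_readable_fit(max_payload_bytes, formats=FORMATS):
    """Return the most readable format whose payload fits, or None."""
    for name, size in formats:
        if size <= max_payload_bytes:
            return name
    return None

print(most_readable_fit(1500))   # JSON: plenty of room
print(most_readable_fit(51))     # CBOR: LoRaWAN at its slowest data rates
print(most_readable_fit(12))     # None: even 16 bytes exceeds Sigfox's 12-byte limit
```

The `None` result for 12 bytes shows why Sigfox payloads need tighter bit-packing than this 16-byte example reading.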


12.5 The Data Format Spectrum

Understanding the trade-offs between formats is essential for making the right choice:

Data format spectrum showing JSON, XML, CBOR, Protocol Buffers, and custom binary arranged from human-readable and larger on the left to machine-efficient and smaller on the right, with each format annotated by typical payload size and best-fit deployment

12.5.1 Format Selection Flow

Decision flow for selecting an IoT data format by asking whether the link is constrained, whether the schema changes frequently, whether multiple services share a stable contract, and whether every byte is critical, leading to JSON, CBOR, Protocol Buffers, or custom binary


12.6 Chapter Guide

This topic has been organized into four focused chapters:

12.6.1 1. IoT Data Formats Overview

Difficulty: Foundational | Time: ~20 minutes

Start here to understand why data formats matter. Covers:

  • Why format choice impacts bandwidth, battery, and development
  • JSON as the universal default
  • The format spectrum from human-readable to binary
  • Real-world size comparisons (62-byte JSON vs 17-byte custom binary)
  • Common misconceptions about JSON in IoT

Best for: Beginners, anyone starting a new IoT project


12.6.2 2. Binary Data Formats

Difficulty: Intermediate | Time: ~25 minutes

Deep dive into efficient binary encodings. Covers:

  • CBOR: Binary JSON with 47% size reduction
  • Protocol Buffers: Schema-based with 77% reduction
  • Custom Binary: Maximum efficiency patterns
  • Type systems, encoding details, and performance benchmarks
  • Library recommendations by platform

Best for: Developers implementing bandwidth-constrained systems


12.6.3 3. Data Format Selection

Difficulty: Intermediate | Time: ~20 minutes

Decision frameworks for choosing the right format. Covers:

  • Step-by-step decision tree
  • Real-world application mapping (12 use cases)
  • Detailed cost analysis scenario (soil moisture network)
  • Fleet tracking quiz with calculations
  • Total cost of ownership considerations

Best for: Technical leads, architects making format decisions


12.6.4 4. Data Formats Practice

Difficulty: Intermediate | Time: ~30 minutes

Hands-on scenarios and assessments. Covers:

  • Smart meter deployment scenario (10,000 devices)
  • Agricultural sensor design (LoRaWAN constraints)
  • Format migration strategy (50,000 deployed sensors)
  • Industrial monitoring (two-tier format strategy)
  • Worked example: LoRaWAN payload design with battery calculations
  • Knowledge check questions

Best for: Students preparing for assessments, practitioners validating knowledge


12.7 Quick Reference

Format snapshot:

  • JSON: 95 bytes; baseline size; best for Wi-Fi, prototyping, and debugging
  • CBOR: 50 bytes; 47% smaller; best for LoRaWAN, NB-IoT, and CoAP
  • Protobuf: 22 bytes; 77% smaller; best for high-volume systems, gRPC, and stable schemas
  • Custom binary: 16 bytes; 83% smaller; best for Sigfox, ultra-low-power, and extreme constraints

12.7.1 Real-World Size Comparison

For a typical sensor reading: device_id=ABC123, temperature=23.5, humidity=67, timestamp=1706400000

Size comparison chart for one sensor reading showing JSON at 95 bytes, CBOR at 50 bytes, Protocol Buffers at 22 bytes, and custom binary at 16 bytes, with relative monthly traffic totals and savings compared with JSON
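Part of any JSON size figure is formatting. The standard-library sketch below serializes this example reading three ways. The 95-byte figure is consistent with pretty-printing at a two-space indent, while compact separators bring the same reading down to 78 bytes; which formatting the chart assumed is an inference, not something the text states.

```python
import json

reading = {"device_id": "ABC123", "temperature": 23.5,
           "humidity": 67, "timestamp": 1706400000}

variants = {
    "pretty (indent=2)": json.dumps(reading, indent=2),
    "default separators": json.dumps(reading),
    "compact separators": json.dumps(reading, separators=(',', ':')),
}
for label, text in variants.items():
    print(f"{label:20s} {len(text.encode('utf-8')):3d} bytes")
# pretty (indent=2)     95 bytes
# default separators    85 bytes
# compact separators    78 bytes
```

Measured against the 78-byte compact form, the binary formats save less than the percentages quoted against 95 bytes, so it is worth stating which JSON baseline a comparison uses.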



12.8 Knowledge Check

Test your understanding of data format fundamentals:

Scenario: You’re designing a smart agriculture system with 500 soil moisture sensors using LoRaWAN (51-byte payload limit). Each sensor sends: device ID (6 chars), moisture (0-100%), temperature (-10 to 50C), battery (0-100%), and timestamp.

Which format would you choose?

  1. JSON - for easy debugging
  2. XML - for compatibility with enterprise systems
  3. CBOR - for good balance of size and debuggability
  4. Custom binary - for maximum efficiency

Answer: C) CBOR

CBOR is the best choice.

Why not the others?

  • JSON (A): Would exceed the 51-byte LoRaWAN limit (~95 bytes)
  • XML (B): Even larger than JSON (~120 bytes)
  • Custom binary (D): Works but adds engineering complexity; CBOR fits comfortably and is easier to debug

CBOR fits (~45 bytes) and allows field technicians to decode messages for troubleshooting.

Scenario: A sensor sends 100 messages per day. The radio uses 100mA during transmission at 50kbps.

If switching from JSON (95 bytes) to CBOR (50 bytes), approximately how much less battery is used per day?

  1. No change - format doesn’t affect battery
  2. About 47% less (proportional to size reduction)
  3. About 90% less
  4. Depends on the processor speed

Answer: B) About 47% less

This is proportional to the size reduction.

Calculation:

  • JSON: 95 bytes x 100 messages = 9,500 bytes/day
  • CBOR: 50 bytes x 100 messages = 5,000 bytes/day
  • Reduction: (9,500 - 5,000) / 9,500 = 47%

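Radio transmission time, and therefore transmit energy, is directly proportional to payload size, so the energy spent transmitting drops by the same 47%. (This ignores fixed per-message costs such as protocol headers and radio wake-up, which make the real saving somewhat smaller.) Smaller messages = less radio-on time = longer battery life.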

Which scenario justifies the engineering effort of custom binary encoding?

  1. A home automation hub with Wi-Fi connectivity
  2. A fitness tracker syncing to a smartphone via Bluetooth
  3. A Sigfox-based asset tracker with 12-byte payload limit
  4. A smart factory with Ethernet-connected PLCs

Answer: C) A Sigfox-based asset tracker with 12-byte payload limit

Why?

  • Sigfox’s 12-byte limit is so restrictive that even CBOR may not fit a complete message
  • Custom binary allows precise bit-packing (e.g., latitude/longitude in 3 bytes each instead of 8)
  • The high engineering cost is justified by the extreme constraints

Why not the others?

  • A, B, D: All have sufficient bandwidth for CBOR or even JSON, making custom binary an unnecessary complexity.
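To illustrate the 3-bytes-per-coordinate idea, here is a minimal sketch that scales latitude and longitude linearly onto 24-bit unsigned integers. The linear mapping and the function names are assumptions for illustration; real trackers may use other schemes. With 24 bits, one step is about 1.07e-5 degrees of latitude (roughly 1.2 m) and 2.15e-5 degrees of longitude (about 2.4 m at the equator).

```python
MAX_24BIT = (1 << 24) - 1   # largest value that fits in 3 bytes

def encode_latlon(lat, lon):
    """Pack latitude/longitude into 6 bytes: 3 bytes each, big-endian."""
    lat_u = round((lat + 90.0) / 180.0 * MAX_24BIT)    # -90..90   -> 0..MAX_24BIT
    lon_u = round((lon + 180.0) / 360.0 * MAX_24BIT)   # -180..180 -> 0..MAX_24BIT
    return lat_u.to_bytes(3, 'big') + lon_u.to_bytes(3, 'big')

def decode_latlon(data):
    """Inverse of encode_latlon, accurate to within one quantization step."""
    lat_u = int.from_bytes(data[0:3], 'big')
    lon_u = int.from_bytes(data[3:6], 'big')
    return lat_u / MAX_24BIT * 180.0 - 90.0, lon_u / MAX_24BIT * 360.0 - 180.0

packed = encode_latlon(48.8584, 2.2945)
lat, lon = decode_latlon(packed)
print(len(packed), round(lat, 4), round(lon, 4))   # 6 48.8584 2.2945
```

A full position in 6 bytes leaves the other 6 of Sigfox's 12 bytes for status, battery, or other fields.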


12.9 Worked Example: LoRaWAN Payload Design

Designing a 51-Byte LoRaWAN Payload for Agricultural Sensors

Scenario: A vineyard deploys 200 soil monitoring nodes. Each node measures: soil moisture (0-100%), soil temperature (-10 to 60C), air temperature (-20 to 50C), battery voltage (2.5-4.2V), and includes a device ID. In the EU868 region, LoRaWAN DR0 (SF12) allows a maximum application payload of 51 bytes.

Step 1: JSON encoding (will it fit?)

{"id":"VINE042","sm":67.3,"st":18.5,"at":22.1,"bv":3.82}

Size: 56 bytes – exceeds the 51-byte limit. JSON is out.

Step 2: CBOR encoding

CBOR replaces JSON’s quotes, colons, and commas with compact binary headers. In this encoding the keys are also shortened to single characters, and each number uses the shortest IEEE 754 float that represents it exactly (RFC 8949 “preferred serialization”):

A5                        # map(5)
  61 69                   # text(1) "i"
  67 56494E45303432       # text(7) "VINE042"
  61 6D                   # text(1) "m"
  FB 4050D33333333333     # float64(67.3)
  61 73                   # text(1) "s"
  F9 4CA0                 # float16(18.5), exact in half precision
  61 74                   # text(1) "t"
  FB 403619999999999A     # float64(22.1)
  61 62                   # text(1) "b"
  FB 400E8F5C28F5C28F     # float64(3.82)

Size: 49 bytes – fits, but barely, leaving only 2 bytes for future fields. (If all four readings were sent as float64, the map would be 55 bytes and would not fit.)

Step 3: Custom binary encoding (optimal for LoRaWAN)

import struct

def encode_vineyard_payload(device_id, soil_moisture, soil_temp,
                            air_temp, battery_mv):
    """Pack sensor data into 8 bytes for LoRaWAN transmission."""
    # Version:       1 byte (message format version)
    # Device ID:     2 bytes (0-65535)
    # Soil moisture: 1 byte (0-200 = 0.0-100.0%, 0.5% steps)
    # Soil temp:     1 byte (offset by 40: value 0-127 maps to -40 to 87C)
    # Air temp:      1 byte (offset by 40: value 0-127 maps to -40 to 87C)
    # Battery:       1 byte (0-255 maps to 2.0V-4.55V in 10mV steps)
    # Reserved:      1 byte (future sensors)

    # round() picks the nearest quantization step; int() would always truncate down
    payload = struct.pack('>B H B B B B B',
        0x01,                              # format version 1
        device_id,
        round(soil_moisture * 2),          # 0.5% resolution
        round(soil_temp + 40),             # offset encoding
        round(air_temp + 40),              # offset encoding
        round((battery_mv - 2000) / 10),   # 10mV resolution
        0x00                               # reserved
    )
    return payload  # 8 bytes total

# Example
payload = encode_vineyard_payload(42, 67.3, 18.5, 22.1, 3820)
print(f"Payload: {payload.hex()}")   # Payload: 01002a873a3eb600
print(f"Size: {len(payload)} bytes") # Size: 8 bytes

Result: 8 bytes – 86% smaller than JSON, 84% smaller than CBOR.

Step 4: Battery life impact calculation

LoRaWAN SF12 data rate: 250 bits/second
Radio current draw: 44 mA (SX1276 at +14 dBm)
Simplified model: application payload bits only (real airtime also includes
the preamble, PHY header, and 13-byte LoRaWAN MAC overhead)

JSON (56 bytes = 448 bits):  DOES NOT FIT in DR0
CBOR (49 bytes = 392 bits):  392 / 250 = 1.57 seconds airtime
Binary (8 bytes = 64 bits):  64 / 250  = 0.26 seconds airtime

Per transmission energy:
  CBOR:   44 mA x 1.57 s = 69.1 mAs = 19.2 uAh
  Binary: 44 mA x 0.26 s = 11.4 mAs = 3.2 uAh

Daily (24 transmissions):
  CBOR:   24 x 19.2 uAh = 461 uAh/day
  Binary: 24 x 3.2 uAh  = 76.8 uAh/day

Extra battery life from binary vs CBOR:
  Savings: 461 - 76.8 = 384 uAh/day saved
  On 2400 mAh battery (radio only):
    CBOR:   2,400,000 / 461 = 5,206 days = 14.3 years
    Binary: 2,400,000 / 76.8 = 31,250 days = 85.6 years

  Note: total device lifetime is limited by sleep current and
  battery self-discharge (~2%/year), not by radio time alone.
  Realistic lifetime difference: ~8 years (CBOR) vs ~10 years (binary).

Conclusion: For this vineyard application, custom binary is the clear winner. The 8-byte payload leaves 43 bytes of headroom for future sensor additions, and the reduced airtime saves battery and reduces LoRaWAN duty cycle consumption.
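On the receiving side (for example, a network-server payload decoder), the 8-byte layout must be unpacked with exactly the same field order, offsets, and scale factors. Below is a minimal Python sketch assuming the version-1 layout from Step 3; the function name is hypothetical, and the sample bytes 01 00 2a 86 3a 3e b6 00 are just an example payload.

```python
import struct

def decode_vineyard_payload(payload):
    """Decode the 8-byte version-1 vineyard payload into engineering units."""
    if len(payload) != 8:
        raise ValueError(f"expected 8 bytes, got {len(payload)}")
    (version, device_id, moisture_raw, soil_raw,
     air_raw, battery_raw, _reserved) = struct.unpack('>BHBBBBB', payload)
    if version != 0x01:
        raise ValueError(f"unsupported payload version {version}")
    return {
        "device_id": device_id,
        "soil_moisture": moisture_raw / 2,       # 0.5% steps
        "soil_temp": soil_raw - 40,              # remove +40 offset
        "air_temp": air_raw - 40,                # remove +40 offset
        "battery_mv": 2000 + battery_raw * 10,   # 10 mV steps above 2.0 V
    }

print(decode_vineyard_payload(bytes.fromhex("01002a863a3eb600")))
# {'device_id': 42, 'soil_moisture': 67.0, 'soil_temp': 18, 'air_temp': 22, 'battery_mv': 3820}
```

Checking the version byte first is what lets a later format (version 2, say) be introduced without old decoders misreading it.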

12.10 Code Example: Encoding and Decoding in Python

import json
import struct
# pip install cbor2
import cbor2

# Sample sensor reading
reading = {
    "device_id": "SENS042",
    "temperature": 23.5,
    "humidity": 67,
    "timestamp": 1706400000
}

# --- JSON encoding (compact separators, no whitespace) ---
json_bytes = json.dumps(reading, separators=(',', ':')).encode('utf-8')
print(f"JSON:   {len(json_bytes)} bytes")
# Output: JSON:   79 bytes

# --- CBOR encoding ---
cbor_bytes = cbor2.dumps(reading)
print(f"CBOR:   {len(cbor_bytes)} bytes")
# Output: CBOR:   66 bytes (cbor2 encodes 23.5 as an 8-byte float64 by default)

# --- Custom binary encoding ---
def encode_binary(device_id, temp_x10, humidity, timestamp):
    """Pack into 10 bytes: 2B device + 2B temp + 1B humidity + 4B time + 1B checksum"""
    payload = struct.pack('>HhBI',
        device_id,    # uint16
        temp_x10,     # int16 (temp x 10 for 0.1C resolution)
        humidity,     # uint8 (0-100%)
        timestamp     # uint32
    )
    checksum = sum(payload) & 0xFF  # simple 8-bit additive checksum (not a CRC)
    return payload + bytes([checksum])

binary_bytes = encode_binary(42, 235, 67, 1706400000)
print(f"Binary: {len(binary_bytes)} bytes")
# Output: Binary: 10 bytes

# --- Decoding binary ---
def decode_binary(data):
    """Unpack a 10-byte message after verifying its trailing checksum."""
    if len(data) != 10:
        raise ValueError(f"expected 10 bytes, got {len(data)}")
    if sum(data[:9]) & 0xFF != data[9]:
        raise ValueError("checksum mismatch: message corrupted")
    device_id, temp_x10, humidity, timestamp = struct.unpack('>HhBI', data[:9])
    return {
        "device_id": device_id,
        "temperature": temp_x10 / 10.0,
        "humidity": humidity,
        "timestamp": timestamp
    }

decoded = decode_binary(binary_bytes)
print(f"Decoded: {decoded}")
# Output: Decoded: {'device_id': 42, 'temperature': 23.5, 'humidity': 67, ...}

# --- Size comparison ---
print("\nSize comparison for one reading:")
print(f"  JSON:   {len(json_bytes):3d} bytes (baseline)")
print(f"  CBOR:   {len(cbor_bytes):3d} bytes ({100-len(cbor_bytes)*100//len(json_bytes)}% smaller)")
print(f"  Binary: {len(binary_bytes):3d} bytes ({100-len(binary_bytes)*100//len(json_bytes)}% smaller)")

What to observe: Run this code and note how the same data shrinks from 79 bytes (JSON) to 66 bytes (CBOR) to just 10 bytes (binary). CBOR saves only about 17% here because the long text keys still travel in full and the temperature is sent as a float64; shorter keys or integer-scaled values would widen the gap. The trade-off is that JSON is readable as text, CBOR is debuggable with a CBOR decoder, but binary requires your custom format documentation to interpret.
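The one-byte sum used above is a checksum, not a CRC: it cannot detect two bytes being swapped, for example. If a true CRC is wanted in the same single byte, a bitwise CRC-8 is short enough for most microcontrollers. The sketch below assumes the common polynomial 0x07 with a zero initial value (the variant catalogued as CRC-8/SMBUS); the chapter itself does not specify a CRC, and the function name is illustrative.

```python
def crc8(data, poly=0x07, init=0x00):
    """Bitwise CRC-8: MSB-first, no input/output reflection, no final XOR."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc

# Standard check value for this CRC-8 variant
print(hex(crc8(b"123456789")))                                   # 0xf4

# A simple sum cannot tell swapped bytes apart; the CRC can.
print((sum(b"\x01\x02") & 0xFF) == (sum(b"\x02\x01") & 0xFF))    # True
print(crc8(b"\x01\x02") == crc8(b"\x02\x01"))                    # False
```

Swapping `checksum = sum(payload) & 0xFF` for `crc8(payload)` (and the matching check in the decoder) keeps the message at 10 bytes.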

12.11 Prerequisites

Before starting this series, you should be familiar with:


Common Pitfalls

Sending {"temperature": "23.5"} (a string) instead of {"temperature": 23.5} (a number) forces consumers to parse strings, breaks numeric queries, and increases message size. Validate that all numeric fields are encoded as JSON numbers; this is one of the most common IoT data format errors.
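A lightweight guard against this mistake is to flag string values that parse as numbers before a message is published. This is a heuristic sketch with a hypothetical function name, not a schema validator: a legitimately numeric-looking string, such as a device ID "042", would also be flagged, so a production system would check each field against a declared schema instead.

```python
import json

def find_stringified_numbers(message):
    """Return keys whose values are strings that parse as numbers."""
    suspicious = []
    for key, value in message.items():
        if isinstance(value, str):
            try:
                float(value)
            except ValueError:
                continue          # genuinely textual value, e.g. "SENS042"
            suspicious.append(key)
    return suspicious

bad = {"device_id": "SENS042", "temperature": "23.5"}
good = {"device_id": "SENS042", "temperature": 23.5}

print(find_stringified_numbers(bad))                   # ['temperature']
print(find_stringified_numbers(good))                  # []
print(len(json.dumps(bad)) - len(json.dumps(good)))    # 2 (the extra quotes)
```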

Adding a new field to a JSON/CBOR message immediately after a firmware update means old consumers (not yet updated) receive unexpected fields. Use additive-only schema changes (never remove or rename fields), version your schemas, and handle unknown fields gracefully in all consumers.
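Handling unknown fields gracefully can be as simple as keeping only the fields a consumer understands and defaulting optional ones it may not receive. A minimal JSON sketch follows; the `schema_version` field, the field names, and the function name are hypothetical.

```python
import json

KNOWN_FIELDS = {"device_id", "temperature", "humidity"}

def parse_reading(raw):
    """Tolerant consumer: ignore unknown fields, default missing optional ones."""
    msg = json.loads(raw)
    version = msg.get("schema_version", 1)        # hypothetical version field
    reading = {k: v for k, v in msg.items() if k in KNOWN_FIELDS}
    reading.setdefault("humidity", None)           # optional field, may be absent
    return version, reading

# Newer firmware adds a field ("pressure") this consumer does not know yet.
new_msg = ('{"schema_version": 2, "device_id": "SENS042", '
           '"temperature": 23.5, "pressure": 1013}')
print(parse_reading(new_msg))
# (2, {'device_id': 'SENS042', 'temperature': 23.5, 'humidity': None})
```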

IEEE 754 float64 uses 8 bytes per value, so a 10-field sensor reading needs 80 bytes just for its numbers. Integer-scaled fixed point (for example, temperature × 100 stored as an int16) needs only 2 bytes per value and still resolves 0.01 units, which is ample precision for typical IoT sensor ranges.
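The saving is easy to check with Python's `struct` module. This sketch assumes a scale factor of 100 and a signed 16-bit field, which limits the representable range to -327.68 to 327.67; other ranges need a different scale or width.

```python
import struct

temperature = 23.47

as_float64 = struct.pack('>d', temperature)               # IEEE 754 double: 8 bytes
as_fixed = struct.pack('>h', round(temperature * 100))    # int16, 0.01 steps: 2 bytes

print(len(as_float64), len(as_fixed))                     # 8 2
decoded = struct.unpack('>h', as_fixed)[0] / 100
print(decoded)                                            # 23.47
```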

12.13 Summary

Key Takeaways:

  1. Format choice is a trade-off: Human readability vs. efficiency - there’s no perfect format for all cases
  2. CBOR is the IoT sweet spot: About 47% size reduction in this chapter’s reference example, with self-describing structure for debugging
  3. Match format to constraints: LoRaWAN/Sigfox need compact formats; Wi-Fi can use JSON freely
  4. Consider total cost: Development time, debugging ease, and battery life all factor into format selection
  5. Start simple, optimize later: Begin with JSON for prototyping, switch to CBOR/binary for production

12.14 What’s Next