11  Data Formats Practice and Assessment

In 60 Seconds

Practice choosing IoT data formats through real-world scenarios: smart meter deployments, LoRaWAN agriculture sensors, format migration strategies, and industrial monitoring. Learn to calculate payload sizes, compare annual costs, and design custom binary encodings that maximize battery life.

11.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Solve format selection scenarios: Apply decision frameworks to real-world IoT cases involving smart meters, agriculture sensors, and industrial monitoring
  • Calculate payload sizes and costs: Compute byte-level encodings and annual bandwidth expenses for JSON, CBOR, Protobuf, and custom binary formats
  • Debug encoding issues: Diagnose and fix common serialization problems such as Base64 overhead, endianness mismatches, and precision loss
  • Design migration strategies: Plan phased format upgrades from JSON to binary without service disruption or data loss
  • Evaluate battery-life trade-offs: Quantify how payload size affects transmission energy and device longevity for LoRaWAN deployments
  • Justify format recommendations: Defend format choices with quantitative cost-benefit analysis across the full sensor-to-cloud pipeline

Key concepts:

  • JSON: Human-readable key-value format; the default for IoT APIs, but 2-5× larger than binary alternatives
  • CBOR: Concise Binary Object Representation — binary superset of JSON data model, no schema required
  • Protocol Buffers (Protobuf): Schema-defined binary serialization achieving maximum efficiency with code-generated parsers
  • MessagePack: Binary-JSON bridge: no schema needed, 30-50% smaller than JSON, drop-in replacement for JSON libraries
  • Serialization: Converting in-memory data structures to bytes for transmission or storage
  • Deserialization: Reconstructing data structures from bytes — must match the serialization format exactly
  • Data Format Trade-off: Human readability (JSON) vs. size efficiency (CBOR/Protobuf) vs. schema flexibility (self-describing vs. compiled)

11.2 For Beginners: Data Formats Practice

This chapter is a hands-on exercise where you practice choosing the best way to package sensor data for different situations. Think of data formats like choosing between sending a text message, an email, or a letter – each has trade-offs in size, speed, and cost. You will work through real scenarios (like a farm with thousands of sensors) and learn to calculate which format saves the most battery and bandwidth.

This is part of a series on IoT Data Formats:

  1. IoT Data Formats Overview - Introduction and text formats
  2. Binary Data Formats - CBOR, Protobuf, custom binary
  3. Data Format Selection - Decision guides and real-world examples
  4. Data Formats Practice (this chapter) - Scenarios, quizzes, worked examples

11.3 Prerequisites

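Before starting this chapter, you should have completed the earlier chapters in this series listed above: IoT Data Formats Overview, Binary Data Formats, and Data Format Selection.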


11.4 Knowledge Check Scenarios

Test your understanding of IoT data formats with these scenario-based questions.

Situation: You’re deploying 10,000 smart electricity meters across a city using NB-IoT cellular connectivity. Each meter sends readings every 15 minutes, including:

  • Current power consumption (float)
  • Voltage (float)
  • Meter ID (8-character string)
  • Timestamp (Unix epoch)

The cellular plan costs $0.02 per MB. You’re evaluating JSON vs CBOR vs Protocol Buffers.

Question: Which format would you choose, and why? Calculate the annual data cost difference between JSON and your recommended format.

Recommended: Protocol Buffers (or CBOR)

Calculation:

  • Messages per meter per year: 4/hour x 24 hours x 365 days = 35,040 messages
  • Total messages: 35,040 x 10,000 meters = 350.4 million messages/year

Format size estimates for this payload:

  • JSON: ~75 bytes (field names, quotes, braces)
  • CBOR: ~40 bytes (binary encoding, field names still present)
  • Protobuf: ~20 bytes (field numbers instead of names)

Annual data costs:

  • JSON: 350.4M x 75 bytes = 26.3 GB/year = $526/year
  • CBOR: 350.4M x 40 bytes = 14.0 GB/year = $280/year
  • Protobuf: 350.4M x 20 bytes = 7.0 GB/year = $140/year

Savings with Protobuf vs JSON: $386/year (73% reduction)
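
The arithmetic above can be reproduced with a short script. This is a minimal sketch, not a billing model: the function name `annual_cost_usd` and its parameters are illustrative, it assumes 1 MB = 1,000,000 bytes, and it counts payload bytes only (no protocol overhead), using the per-message size estimates from this scenario.

```python
def annual_cost_usd(devices, msgs_per_hour, payload_bytes, usd_per_mb):
    """Annual cellular data cost, counting payload bytes only."""
    msgs_per_year = devices * msgs_per_hour * 24 * 365
    megabytes = msgs_per_year * payload_bytes / 1_000_000  # decimal MB (assumption)
    return megabytes * usd_per_mb


# Size estimates taken from the scenario text
sizes = {"JSON": 75, "CBOR": 40, "Protobuf": 20}
costs = {name: annual_cost_usd(10_000, 4, size, 0.02) for name, size in sizes.items()}

for name, cost in costs.items():
    saving = 1 - cost / costs["JSON"]
    print(f"{name:9s} ${cost:7.2f}/year  ({saving:.0%} less than JSON)")
```

Changing the device count, reporting interval, or tariff shows how quickly the absolute savings grow or shrink, which helps when weighing them against the engineering cost of a schema-based format.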

Why Protobuf over CBOR? At this scale (10,000 devices), the additional upfront schema definition cost is justified by:

  1. Stronger typing prevents data corruption bugs
  2. Efficient code generation for both meters and cloud
  3. Schema evolution handled gracefully
  4. Industry standard for high-volume IoT pipelines

Situation: You’re adding a new soil nutrient sensor to an existing LoRaWAN deployment. The sensor measures:

  • pH level (0-14, one decimal precision)
  • Nitrogen content (mg/kg, integer 0-1000)
  • Phosphorus content (mg/kg, integer 0-1000)
  • Potassium content (mg/kg, integer 0-1000)
  • Soil temperature (C, one decimal, -20 to 60)

The sensor sits at the edge of the gateway's range and must use SF12, which limits the payload to 51 bytes. Readings are sent every 4 hours.

Question: Can you use JSON for this sensor? If not, design a custom binary format that fits within the 51-byte limit.

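JSON fits, but only just, and is not recommended here.

Sample JSON: {"pH":7.2,"N":450,"P":320,"K":280,"temp":18.5} = 46 bytes

This fits under the limit, but:

  1. Almost no headroom: worst-case values within the stated ranges (pH 14.0, 1000 mg/kg for each nutrient, -20.0C) grow the message to exactly 51 bytes
  2. No room for device ID or timestamp
  3. JSON serialization on a constrained MCU costs extra flash, RAM, and CPU time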

Custom Binary Format (9 bytes total):

| Field | Encoding | Bytes | Range | Notes |
|---|---|---|---|---|
| pH | uint8 (value x 10) | 1 | 0-140 -> 0.0-14.0 | Divide by 10 to decode |
| Nitrogen | uint16 | 2 | 0-1000 | Direct value |
| Phosphorus | uint16 | 2 | 0-1000 | Direct value |
| Potassium | uint16 | 2 | 0-1000 | Direct value |
| Temperature | int16 (C x 10) | 2 | -200 to 600 -> -20.0 to 60.0C | Divide by 10 to decode |
| Total | | 9 bytes | | ~80% smaller than JSON |

Benefits:

  • 9 bytes vs 46 bytes = 80% bandwidth savings
  • 42 bytes remaining for device ID, timestamp, battery voltage
  • Lower transmission energy = longer battery life
  • Simpler parsing on constrained MCU

Alternative: CBOR also fits and is more flexible than custom binary: roughly 20 bytes with one-byte integer keys and scaled integer values, closer to 40 bytes with text keys and floating-point values.
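
The 9-byte layout can be sketched with Python's `struct` module, for example in a decoder running on the network server. The names `SOIL_FORMAT`, `encode_soil`, and `decode_soil` are illustrative, and big-endian byte order is an assumption (the scenario does not specify one). The script also measures the compact JSON encoding of the same reading for comparison.

```python
import json
import struct

# Assumed layout: pH x10 (uint8), N, P, K (uint16), temperature x10 (int16), big-endian
SOIL_FORMAT = ">BHHHh"


def encode_soil(ph, n, p, k, temp_c):
    """Pack one reading into 9 bytes using scaled integers."""
    return struct.pack(SOIL_FORMAT, round(ph * 10), n, p, k, round(temp_c * 10))


def decode_soil(payload):
    """Reverse the scaling applied by encode_soil."""
    ph10, n, p, k, t10 = struct.unpack(SOIL_FORMAT, payload)
    return ph10 / 10, n, p, k, t10 / 10


reading = {"pH": 7.2, "N": 450, "P": 320, "K": 280, "temp": 18.5}
binary = encode_soil(7.2, 450, 320, 280, 18.5)
compact_json = json.dumps(reading, separators=(",", ":"))

print(len(binary), binary.hex())
print(len(compact_json), compact_json)
print(decode_soil(binary))
```

Using a signed `h` field for temperature keeps negative soil temperatures valid without an offset; the `>` prefix also disables alignment padding, so the packed size is exactly the sum of the field sizes.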

Situation: Your company has 50,000 deployed sensors sending JSON over MQTT. Management wants to reduce cellular data costs by 50%. The sensors have limited flash memory (32KB available for firmware updates) and are deployed in remote locations (firmware updates are expensive to verify).

Question: What format would you migrate to, and what is your migration strategy to avoid bricking devices or losing data during the transition?

Recommended Format: CBOR

Why CBOR over Protobuf for this migration:

  1. Same data model as JSON - minimal code changes required
  2. Self-describing - cloud can decode without schema synchronization
  3. Smaller library footprint - fits in 32KB flash constraint
  4. No breaking changes - can decode both JSON and CBOR during transition

Migration Strategy (Zero-Downtime):

Phase 1: Cloud Preparation (Week 1)

  • Update cloud ingest to accept both JSON and CBOR
  • Add format detection (for example the MQTT 5 content-type property, application/json vs application/cbor, or a check of the first payload byte)
  • Validate CBOR decoding matches JSON semantics

Phase 2: Gradual Firmware Rollout (Weeks 2-8)

  • Push firmware update to 1% of devices (500 sensors)
  • Monitor for decode errors, data quality issues
  • If successful, expand to 10%, 50%, 100%
  • New firmware sends CBOR but can fall back to JSON if the cloud reports a decode error

Phase 3: Transition Monitoring (Weeks 8-12)

  • Track ratio of JSON vs CBOR messages
  • Identify any devices that failed to update
  • Cloud continues accepting both formats indefinitely for stragglers

Phase 4: Cost Verification

  • Measure actual bandwidth reduction (target: 40-50%)
  • Calculate ROI: savings vs firmware update costs

Critical Success Factors:

  • Never remove JSON support from cloud (some devices may never update)
  • Include version field in firmware to track migration status
  • Have rollback plan if CBOR library causes stability issues
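
One way to accept both formats during the transition is to inspect the first payload byte. The sketch below is an assumption-laden illustration, not a prescribed design: `detect_format` and `ingest` are hypothetical names, and the check only works if every message is a top-level JSON object or CBOR map. CBOR decoding itself is left to whichever library the backend already uses.

```python
import json


def detect_format(payload: bytes) -> str:
    """Guess the encoding of a payload whose top-level value is a map/object."""
    if not payload:
        raise ValueError("empty payload")
    first = payload[0]
    if first >> 5 == 5:                  # CBOR major type 5 (map): 0xA0-0xBF
        return "cbor"
    if payload.lstrip()[:1] == b"{":     # JSON object, optionally after whitespace
        return "json"
    raise ValueError(f"unrecognized payload, first byte 0x{first:02x}")


def ingest(payload: bytes):
    """Route a message to the right decoder and report which format it used."""
    kind = detect_format(payload)
    if kind == "json":
        return kind, json.loads(payload)
    # Hand the raw bytes to the backend's CBOR decoder (not shown here)
    return kind, payload


print(ingest(b'{"t":23.5}'))
print(ingest(bytes.fromhex("a161741a000000ff")))
```

Returning the detected format alongside the data makes it easy to count JSON versus CBOR messages, which is the ratio Phase 3 tracks.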

Situation: A factory floor has 200 vibration sensors monitoring machine health. Each sensor samples at 1 kHz and sends FFT results (256 frequency bins, each a 32-bit float) every second. The system requires:

  • Less than 100ms latency for anomaly detection
  • 99.99% reliability (equipment damage/downtime is extremely expensive)
  • Local edge processing before cloud upload

Question: What format would you use for sensor-to-edge communication vs edge-to-cloud communication? Justify your choices.

Two-Tier Format Strategy:

Sensor-to-Edge: Custom Binary or FlatBuffers

  • Payload: 256 floats x 4 bytes = 1,024 bytes per message
  • Aggregate rate: 200 sensors x 1 message/sec x 1,024 bytes = 204.8 KB/sec
  • Latency requirement: <100ms means format must be ultra-fast to parse

Why Custom Binary/FlatBuffers:

  • Zero-copy access - read floats directly from buffer, no parsing
  • Predictable latency - no variable-length decoding surprises
  • Minimal CPU overhead - critical when processing 200 streams simultaneously
  • FlatBuffers advantage - provides schema evolution if needed later

Edge-to-Cloud: Protocol Buffers with Delta Compression

After edge processing, only anomalies and aggregates are sent:

  • Normal operation: 1 aggregate per minute per sensor (summary stats)
  • Anomaly detected: Full FFT spectrum + alerts

Why Protobuf:

  • Schema enforcement - cloud and edge must agree on data format
  • Compression-friendly - repeated similar values compress well
  • Language-agnostic - edge might be C++, cloud is Python/Java
  • gRPC integration - natural fit for edge-to-cloud RPC patterns

Bandwidth comparison:

| Path | Raw Data | With Strategy | Savings |
|---|---|---|---|
| Sensor to Edge | 204.8 KB/s | 204.8 KB/s | 0% (local) |
| Edge to Cloud (normal) | 204.8 KB/s | 2 KB/s (aggregates) | 99% |
| Edge to Cloud (anomaly) | 204.8 KB/s | 10 KB/s (selected FFTs) | 95% |

Key insight: Format choice depends on communication path. Local high-speed links can afford larger payloads; cloud links need aggressive optimization.
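
The sensor-to-edge numbers, and the idea of reading a value directly from its offset, can be illustrated as follows. This is a sketch under assumptions: the frame is nothing but 256 little-endian float32 values with no header, and `read_bin` is a hypothetical helper. A real deployment would add at least a sensor ID and sequence number.

```python
import struct

BINS = 256
SENSORS = 200
FRAME_FORMAT = f"<{BINS}f"            # little-endian float32 (byte order is an assumption)
FRAME_SIZE = struct.calcsize(FRAME_FORMAT)


def read_bin(frame: bytes, index: int) -> float:
    """Read one frequency bin straight from its offset, without decoding the rest."""
    return struct.unpack_from("<f", frame, 4 * index)[0]


# Synthetic spectrum: a single peak in bin 50
spectrum = [0.0] * BINS
spectrum[50] = 1.5
frame = struct.pack(FRAME_FORMAT, *spectrum)

print(FRAME_SIZE)                 # bytes per message
print(SENSORS * FRAME_SIZE)       # bytes per second arriving at the edge node
print(read_bin(frame, 50))
```

Because every bin sits at a fixed offset (4 x index), the edge node can check a handful of bins of interest per frame in constant time, which is what keeps latency predictable.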


11.5 Worked Example: Payload Design for Battery-Constrained Sensor

11.6 Worked Example: Optimizing Payload Size for LoRaWAN Sensor

Scenario: You are designing a soil moisture sensor for precision agriculture. The sensor runs on 2x AA batteries and must operate for 2+ years. It connects via LoRaWAN (max payload 51 bytes in SF12 mode) and sends readings every 15 minutes. Each reading includes: soil moisture (0-100%), temperature (-40 to +60C), battery voltage (2.0-3.6V), and timestamp.

Goal: Design the most efficient payload format that fits LoRaWAN constraints while maximizing battery life.

What we do: Determine the precision and range needed for each data field.

Why: Choosing appropriate precision prevents wasting bytes on unnecessary accuracy.

Field analysis:

| Field | Range | Required Precision | Example Value |
|---|---|---|---|
| Soil moisture | 0-100% | 0.5% resolution | 42.5% |
| Temperature | -40 to +60C | 0.1C resolution | 23.7C |
| Battery voltage | 2.0-3.6V | 0.01V resolution | 3.24V |
| Timestamp | Unix epoch | 1-second resolution | 1702732800 |

Key insight: We don’t need floating point! All values can be encoded as integers with implied decimal places:

  • 42.5% becomes 425 (divide by 10 on receive)
  • 23.7C becomes 237 (divide by 10 on receive)
  • 3.24V becomes 324 (divide by 100 on receive)

What we do: Calculate payload size for JSON, CBOR, and custom binary formats.

Why: Size directly impacts battery life (transmission energy) and LoRaWAN feasibility.

JSON approach (human-readable):

{"m":42.5,"t":23.7,"v":3.24,"ts":1702732800}
  • Size: 44 bytes
  • Pros: Debuggable, self-describing
  • Cons: Quotes, colons, braces waste 20+ bytes

CBOR approach (binary JSON):

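A4                          # Map with 4 items
  61 6D                     # "m" (1 char)
  F9 5150                   # 42.5 as float16 (exact)
  61 74                     # "t" (1 char)
  F9 4DED                   # 23.7 as float16 (stored as 23.703125)
  61 76                     # "v" (1 char)
  F9 427B                   # 3.24 as float16 (stored as 3.2402)
  62 7473                   # "ts" (2 chars)
  1A 657DA400               # 1702732800 as uint32
  • Size: 24 bytes with float16 (23.7 and 3.24 are rounded slightly; an encoder that preserves them exactly falls back to float64, giving 36 bytes)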
  • Pros: Self-describing, standard format
  • Cons: Field names still consume bytes
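
A hand-written CBOR dump is easy to get wrong, so it is worth rebuilding the bytes programmatically. The sketch below hand-assembles the map using only the standard library; the helpers `cbor_text`, `cbor_float16`, and `cbor_uint32` are illustrative and cover just the cases this message needs, not the full CBOR format. Forcing float16 is a choice made here for compactness, and the loop at the end shows what precision that choice costs.

```python
import struct


def cbor_text(s: str) -> bytes:
    """CBOR text string, short form (length < 24)."""
    data = s.encode("utf-8")
    assert len(data) < 24
    return bytes([0x60 | len(data)]) + data


def cbor_float16(x: float) -> bytes:
    return b"\xf9" + struct.pack(">e", x)      # 0xF9 = half-precision float


def cbor_uint32(n: int) -> bytes:
    return b"\x1a" + struct.pack(">I", n)      # 0x1A = 4-byte unsigned integer


message = (
    b"\xa4"                                    # map with 4 pairs
    + cbor_text("m") + cbor_float16(42.5)
    + cbor_text("t") + cbor_float16(23.7)
    + cbor_text("v") + cbor_float16(3.24)
    + cbor_text("ts") + cbor_uint32(1702732800)
)

print(len(message), message.hex())

# float16 keeps only about three significant decimal digits
for value in (42.5, 23.7, 3.24):
    stored = struct.unpack(">e", struct.pack(">e", value))[0]
    print(value, "->", stored)
```

If the rounding is unacceptable, sending scaled integers inside CBOR (425, 237, 324) avoids floats altogether while staying self-describing.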

Custom binary approach (maximum efficiency):

Byte layout:
[0-1]:   moisture = 425 (uint16, value/10 = 42.5%)
[2-3]:   temperature = 637 (uint16 with offset, (val-400)/10 = 23.7C)
[4-5]:   battery = 324 (uint16, value/100 = 3.24V)
[6-9]:   timestamp (uint32, seconds since epoch)

Hex: 01A9 027D 0144 657DA400
  • Size: 10 bytes (77% smaller than JSON!)
  • Pros: Minimal size, maximum battery life
  • Cons: Not self-describing, requires documentation

What we do: Show exact byte-by-byte calculations for the custom binary format.

Why: Understanding the encoding enables implementation and debugging.

Custom binary encoding detail:

| Field | Raw Value | Encoding | Bytes | Hex |
|---|---|---|---|---|
| Moisture | 42.5% | 42.5 x 10 = 425 | 2 | 0x01A9 |
| Temperature | 23.7C | (23.7 + 40) x 10 = 637 | 2 | 0x027D |
| Battery | 3.24V | 3.24 x 100 = 324 | 2 | 0x0144 |
| Timestamp | 1702732800 | Direct uint32 | 4 | 0x657DA400 |
| Total | | | 10 | |

Temperature encoding trick: Adding 40 offset makes -40C = 0, allowing unsigned integer storage:

  • -40C becomes (-40+40) x 10 = 0
  • +60C becomes (60+40) x 10 = 1000
  • Range 0-1000 fits in uint16 (0-65535)

Decoding formula (receiver side):

import struct

def decode_sensor_payload(data):
    # Big-endian: three uint16 fields followed by one uint32
    m_raw, t_raw, v_raw, timestamp = struct.unpack('>HHHI', data[:10])
    moisture = m_raw / 10.0              # 425 -> 42.5%
    temperature = t_raw / 10.0 - 40.0    # 637 -> 23.7C
    battery = v_raw / 100.0              # 324 -> 3.24V
    return moisture, temperature, battery, timestamp
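
For testing a decoder it helps to have the matching encoder in the same language. The following round-trip sketch mirrors the byte layout described above; `encode_sensor_payload` and `decode_fields` are illustrative names, and `round()` is used so that values such as 3.24 x 100 do not fall one count short through floating-point truncation.

```python
import struct

PAYLOAD_FORMAT = ">HHHI"   # moisture, temperature, battery, timestamp (big-endian)


def encode_sensor_payload(moisture_pct, temp_c, battery_v, timestamp):
    return struct.pack(
        PAYLOAD_FORMAT,
        round(moisture_pct * 10),      # 42.5 % -> 425
        round((temp_c + 40) * 10),     # 23.7 C -> 637
        round(battery_v * 100),        # 3.24 V -> 324
        timestamp,
    )


def decode_fields(payload):
    m, t, v, ts = struct.unpack(PAYLOAD_FORMAT, payload)
    return m / 10, t / 10 - 40, v / 100, ts


payload = encode_sensor_payload(42.5, 23.7, 3.24, 1702732800)
print(len(payload), payload.hex())
print(decode_fields(payload))
```

Comparing `payload.hex()` against the hex string in the byte layout is a quick regression test for both the scaling factors and the byte order.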

What we do: Quantify how payload size affects battery life.

Why: For LoRaWAN devices, every transmitted byte costs energy, and payload size is one of the few terms in the power budget that the format designer controls directly.

Transmission energy scales with total packet size for LoRaWAN. The total packet includes payload plus protocol overhead (LoRaWAN header, CRC, etc.). For SF12 at 14dBm, approximate energy is \(E_{\text{tx}} = (n_{\text{total\_bytes}}) \times 0.5\,\text{mJ/byte}\) where total bytes includes ~60 bytes of protocol overhead plus your payload. Worked example: A 44-byte JSON payload becomes 104 total bytes (44 + 60 overhead) requiring \(104 \times 0.5 = 52\,\text{mJ}\) per transmission, while a 10-byte binary payload becomes 70 total bytes requiring \(70 \times 0.5 = 35\,\text{mJ}\), saving 33% energy per message. Over 2-3 years at 96 messages/day, this extends battery life by months.

LoRaWAN transmission energy (SF12, 14dBm):

  • Base overhead: ~60 bytes (header, CRC, etc.)
  • Energy per byte: ~0.5 mJ (millijoules)

Daily energy consumption:

| Format | Payload | Total TX | Energy/msg | Msgs/day | Energy/day |
|---|---|---|---|---|---|
| JSON | 44 bytes | 104 bytes | 52 mJ | 96 | 4,992 mJ |
| CBOR | 24 bytes | 84 bytes | 42 mJ | 96 | 4,032 mJ |
| Binary | 10 bytes | 70 bytes | 35 mJ | 96 | 3,360 mJ |

Battery life calculation (2x AA = ~2500 mAh = 27,000 J at 3V nominal):

| Format | Daily Energy | Battery Life | Improvement |
|---|---|---|---|
| JSON | 4,992 mJ | 5,409 days (14.8 years) | Baseline |
| CBOR | 4,032 mJ | 6,696 days (18.3 years) | +24% |
| Binary | 3,360 mJ | 8,036 days (22.0 years) | +49% |

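Reality check: These calculations count only TX energy. Real battery life is closer to 2-3 years once sensor, MCU, and leakage currents are included. The energy saved per message still applies, but the percentage gain in battery life will be smaller than the TX-only figures suggest.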
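
The TX-only model is simple enough to script, which makes it easy to try other payload sizes or reporting intervals. All constants below are the assumptions stated in this section (60 bytes of overhead, 0.5 mJ per byte, 27,000 J of battery energy); the function names are illustrative.

```python
OVERHEAD_BYTES = 60          # assumed protocol overhead per uplink, from the text
MJ_PER_BYTE = 0.5            # assumed energy cost at SF12, 14 dBm, from the text
MSGS_PER_DAY = 96            # one reading every 15 minutes
BATTERY_MJ = 27_000_000      # 2x AA: 2.5 Ah x 3 V x 3600 s = 27,000 J


def daily_tx_energy_mj(payload_bytes):
    """Energy spent on transmission per day, in millijoules."""
    return (payload_bytes + OVERHEAD_BYTES) * MJ_PER_BYTE * MSGS_PER_DAY


def tx_only_battery_days(payload_bytes):
    """Battery life if transmission were the only load (an upper bound)."""
    return BATTERY_MJ / daily_tx_energy_mj(payload_bytes)


for name, size in (("JSON", 44), ("Binary", 10)):
    days = tx_only_battery_days(size)
    print(f"{name:6s} {daily_tx_energy_mj(size):6.0f} mJ/day  {days:6.0f} days")
```

Adding a fixed daily term for sleep current and sensor sampling to `daily_tx_energy_mj` would turn this upper bound into a more realistic estimate, at the cost of needing measured figures for the specific hardware.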

Outcome: Custom binary format maximizes battery life within LoRaWAN constraints.

Final payload specification:

+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Byte 0 | Byte 1 | Byte 2 | Byte 3 | Byte 4 | Byte 5 | Byte 6 | Byte 7 | Byte 8 | Byte 9 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|    Moisture     |   Temperature   |    Battery      |           Timestamp             |
|    (uint16)     | (uint16+offset) |    (uint16)     |            (uint32)             |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+

Encoding:
- Moisture: value x 10 (42.5% becomes 425)
- Temperature: (value + 40) x 10 (23.7C becomes 637)
- Battery: value x 100 (3.24V becomes 324)
- Timestamp: Unix epoch seconds (big-endian)

Total: 10 bytes (fits LoRaWAN SF12 with room to spare)

Key design decisions:

  1. No field names - Documentation-based format (77% size reduction)
  2. Fixed-point integers - Avoid float overhead, preserve precision
  3. Temperature offset - Enables unsigned encoding of negative values
  4. Big-endian - Network byte order for consistency
  5. Timestamp included - Handles network delays, out-of-order delivery

Implementation note:

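// ESP32/Arduino encoding
#include <stdint.h>

void encode_payload(uint8_t* buf, float moisture, float temp, float battery, uint32_t ts) {
    // Scale to fixed point; adding 0.5 rounds to nearest instead of truncating
    uint16_t m = (uint16_t)(moisture * 10.0f + 0.5f);          // 42.5 % -> 425
    uint16_t t = (uint16_t)((temp + 40.0f) * 10.0f + 0.5f);    // 23.7 C -> 637
    uint16_t v = (uint16_t)(battery * 100.0f + 0.5f);          // 3.24 V -> 324
    // Big-endian (network byte order): most significant byte first
    buf[0] = m >> 8; buf[1] = m & 0xFF;
    buf[2] = t >> 8; buf[3] = t & 0xFF;
    buf[4] = v >> 8; buf[5] = v & 0xFF;
    buf[6] = ts >> 24; buf[7] = (ts >> 16) & 0xFF;
    buf[8] = (ts >> 8) & 0xFF; buf[9] = ts & 0xFF;
}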


11.9 How It Works: From Sensor Reading to Cloud Ingestion

Understanding the complete data format pipeline helps you make informed encoding decisions. Here’s how a sensor reading travels from measurement to cloud storage.

Step 1: Sensor Measurement

The temperature sensor produces an analog voltage (e.g., 2.35V representing 23.5°C). The ADC converts this to a digital value (e.g., 235 at 0.01V resolution).

Step 2: In-Memory Representation

The microcontroller stores this as a float variable temperature = 23.5 (4 bytes in RAM), or it can store uint16_t temp_raw = 235 instead (2 bytes, implied decimal).

Step 3: Format Selection Decision

The firmware chooses an encoding format based on:

  • Connectivity: LoRaWAN (51-byte limit at SF12) → custom binary. Wi-Fi (ample bandwidth) → JSON.
  • Scale: 10 sensors → JSON is fine. 10,000 sensors → CBOR or Protobuf cuts data volume, and therefore metered data cost, by roughly half to three quarters.
  • Lifespan: 6-month deployment → JSON. 10-year deployment → Protobuf for schema evolution.

Step 4: Serialization

The firmware serializes the reading:

  • JSON: {"device":"A1","temp":23.5,"ts":1702732800} → 43 bytes
  • CBOR: Binary encoding of the same structure (text keys, float16 temperature) → 27 bytes
  • Custom Binary: Device ID (2 bytes) + temp×10 (2 bytes) + timestamp (4 bytes) → 8 bytes

Step 5: Transmission

The serialized payload is transmitted via the chosen protocol (totals shown for the 8-byte binary payload):

  • LoRaWAN adds a 13-byte header → Total: 21 bytes
  • MQTT adds a variable header (~5 bytes) + topic name (~15 bytes) → Total: ~28 bytes
  • CoAP adds a 4-byte header → Total: 12 bytes

Step 6: Gateway Processing

The network gateway may transcode the format:

  • LoRaWAN gateway: Forwards the raw 8-byte binary unchanged (good)
  • Legacy REST API: Wraps the binary in Base64 (12 bytes) plus a JSON envelope → ~24 bytes (bad: 3x inflation)
  • Modern binary protocol: Keeps the binary encoding → 8 bytes (good)

Step 7: Cloud Ingestion

The cloud service deserializes and stores:

  • Time-series DB (InfluxDB, TimescaleDB): Optimized columnar storage, ~2-3 bytes/reading long-term
  • Document DB (MongoDB, DynamoDB): JSON storage, ~40-60 bytes/reading
  • Data lake (S3 Parquet): Highly compressed, ~1 byte/reading in aggregate files

Key Insight: The format you choose in Step 4 affects battery life (Step 5), gateway costs (Step 6), and cloud storage expenses (Step 7). An 8-byte binary payload that gets Base64-wrapped in JSON has grown to about 24 bytes by the time it reaches the cloud, so always trace the full pipeline.

Example End-to-End Bandwidth:

| Stage | JSON (43 bytes) | CBOR (27 bytes) | Binary (8 bytes) |
|---|---|---|---|
| Sensor→Gateway | 56 bytes (43+13 header) | 40 bytes (27+13) | 21 bytes (8+13) |
| Gateway→Cloud (if Base64-wrapped) | 43 bytes (JSON pass-through) | 36 bytes (Base64) | 12 bytes (Base64) |
| Cloud Storage (TimescaleDB) | ~40 bytes (JSON doc) | ~2.5 bytes (decomposed) | ~2.5 bytes (decomposed) |

What to Observe: Optimizing transmission (sensor→gateway) saves battery. Optimizing cloud ingestion (gateway→cloud) saves bandwidth costs. Optimizing storage format saves long-term costs. You need to consider ALL three.
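
The Base64 inflation in the gateway step is easy to measure. In this sketch the 8-byte payload, the `"data"` key of the JSON envelope, and the helper `base64_length` are all illustrative; a real gateway API will have its own envelope fields, which add further bytes.

```python
import base64
import json


def base64_length(n_bytes: int) -> int:
    """Length of padded Base64 text for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


payload = bytes.fromhex("00a100eb657da400")            # hypothetical 8-byte reading
encoded = base64.b64encode(payload).decode("ascii")
wrapped = json.dumps({"data": encoded}, separators=(",", ":"))   # "data" key is illustrative

print(len(payload), len(encoded), len(wrapped))
print(wrapped)
```

Base64 always emits 4 characters per 3 input bytes, rounded up, so small payloads suffer proportionally more from padding and envelope overhead than large ones.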

11.10 Concept Relationships

Understanding how data format concepts interrelate helps you navigate trade-offs:

| Primary Concept | Depends On | Enables | Common Confusion |
|---|---|---|---|
| JSON | Text encoding (UTF-8), key-value data model | Human-readable debugging, web APIs | “JSON is always slower” – false for small payloads with fast parsers |
| CBOR | Binary encoding, JSON data model | 30-60% size reduction, type safety (vs JSON) | “CBOR is the same as JSON” – no, it’s binary encoding WITH additional types |
| Protocol Buffers | Schema definition (proto files), code generation | Type safety, schema evolution, tooling | “Protobuf is too complex for IoT” – upfront cost pays off at scale (10K+ devices) |
| Custom Binary | Bit-level encoding, documentation | Maximum efficiency, domain-specific optimization | “Custom binary is always best” – ignores maintenance, debugging, interoperability costs |
| Base64 Encoding | ASCII representation of binary | Embedding binary in text protocols (JSON, XML) | “Base64 compresses data” – NO! It expands by 33% |
| Schema Evolution | Versioning strategy, backward compatibility | Long-term deployments (5-10 years) | “Add version byte to custom binary” – requires migration code for each version |
| Endianness | Byte order (big-endian/little-endian) | Multi-byte integer serialization | “Endianness only matters for multi-platform” – it matters for ANY binary format |
| Payload Size | Encoding format + data precision + metadata | Transmission energy, airtime costs | “Smaller is always better” – ignores parsing CPU cost on battery-powered MCU |

How These Concepts Work Together:

  1. Choose format family (text vs binary) based on bandwidth constraints
  2. Within binary formats, choose self-describing (CBOR) vs schema-based (Protobuf) vs custom based on scale and lifespan
  3. Account for Base64 encoding if crossing text-based APIs (adds 33% overhead)
  4. Plan schema evolution strategy upfront for deployments lasting >2 years
  5. Document endianness and precision decisions for any custom binary format
  6. Calculate total pipeline cost (sensor→gateway→cloud→storage), not just over-the-air size
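
The reason for documenting endianness (step 5) becomes obvious when sender and receiver disagree. This small sketch packs a scaled moisture value big-endian and then decodes it both ways; the variable names are illustrative.

```python
import struct

moisture_raw = 425                                   # 42.5 % scaled by 10
wire = struct.pack(">H", moisture_raw)               # sender: big-endian -> 01 a9

correct = struct.unpack(">H", wire)[0]
wrong = struct.unpack("<H", wire)[0]                 # receiver assumes little-endian

print(wire.hex(), correct / 10, wrong / 10)
```

The mismatched decode does not raise an error; it silently yields a plausible-looking integer, which is why a range check on decoded values (moisture must be 0-100%) is a cheap way to catch byte-order bugs early.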

Critical Decision Points:

| If Your Deployment Has… | Then Prioritize… | Because… |
|---|---|---|
| Tight bandwidth limits (LoRaWAN, NB-IoT) | Custom binary or CBOR | Payload size directly affects battery life |
| 10,000+ devices | Protocol Buffers or CBOR | Schema validation prevents data corruption at scale |
| 10-year lifespan | Protocol Buffers with versioning | Schema evolution is inevitable |
| WiFi/Ethernet connectivity | JSON or CBOR | Bandwidth is cheap, debugging ease matters more |
| Mixed platforms (Arduino → Python → Java) | Protobuf or CBOR | Language-agnostic formats prevent vendor lock-in |
| Frequent schema changes | JSON or CBOR | Self-describing formats tolerate ad-hoc fields |


Common Pitfalls

Sending {"temperature": "23.5"} as a string instead of {"temperature": 23.5} as a number forces consumers to parse strings, breaks numeric queries, and adds two bytes per value. Validate that all numeric fields are encoded as JSON numbers; this is one of the most common IoT data format errors.

Adding a new field to a JSON/CBOR message immediately after a firmware update means old consumers (not yet updated) receive unexpected fields. Use additive-only schema changes (never remove or rename fields), version your schemas, and handle unknown fields gracefully in all consumers.

IEEE 754 float64 uses 8 bytes per value, so a 10-field sensor reading becomes 80 bytes just for numbers. Integer-scaled fixed point (temperature × 100 as int16) reduces each value to 2 bytes, or 20 bytes for the same reading, while keeping 0.01-unit resolution, which is enough for typical IoT ranges.
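
Two of these pitfalls can be checked mechanically. The sketch below shows a type check that a consumer might run on incoming JSON and compares the size of ten float64 values with ten scaled int16 values; `is_json_number` is a hypothetical helper, not part of any library.

```python
import json
import struct

good = json.loads('{"temperature": 23.5}')
bad = json.loads('{"temperature": "23.5"}')


def is_json_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


print(is_json_number(good["temperature"]), is_json_number(bad["temperature"]))

# Ten readings as float64 versus ten scaled int16 values
print(struct.calcsize(">10d"), struct.calcsize(">10h"))

# Fixed-point round trip: 23.57 C stored as an int16 in hundredths of a degree
packed = struct.pack(">h", round(23.57 * 100))
print(len(packed), struct.unpack(">h", packed)[0] / 100)
```

An int16 scaled by 100 covers -327.68 to 327.67, so the scale factor should be chosen per field: a wider range needs either a smaller factor or a 32-bit integer.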

11.13 Summary

Key Takeaways from Practice:

  1. Context matters more than raw size - JSON is often fine for Wi-Fi deployments
  2. Calculate total cost of ownership - Include development, maintenance, and debugging costs
  3. Plan for evolution - Schema changes are inevitable over device lifetimes
  4. Use two-tier strategies - Different formats for local vs cloud communication
  5. Migration requires careful planning - Never remove support for old formats immediately

Practice Checklist:

  • Can you calculate payload sizes for JSON, CBOR, Protobuf, and custom binary?
  • Can you design a custom binary encoding for given sensor data?
  • Can you justify format choices with quantitative analysis?
  • Can you plan a migration strategy from JSON to binary formats?

Sammy the Sensor has a problem. “I need to send my temperature reading to the cloud, but I only have a tiny envelope to put it in!”

Max the Microcontroller grins. “That’s like choosing how to write a letter! You could write a long, fancy letter with beautiful handwriting (that’s JSON), or you could write a short coded message (that’s binary)!”

Lila the LED holds up examples:

  • “Dear Cloud, the temperature is 23.5 degrees” = 43 bytes (like JSON – easy to read but long!)
  • “T:23.5” = 6 bytes (like a text message – shorter!)
  • Just the number 235, stored as a binary number = 2 bytes (like a secret code – super tiny!)

Bella the Battery does a happy dance. “I LOVE the short version! The less Sammy has to transmit, the longer I last! With the tiny code, I can live for YEARS!”

“But wait,” says Sammy, “won’t the cloud be confused by just a number?”

“That’s why we agree on the code beforehand,” Max explains. “We tell the cloud: ‘the first two bytes are temperature times 10.’ It’s like having a secret decoder ring!”

The Squad’s Rule: Pick the format that fits your envelope! Big envelope (Wi-Fi)? Write fancy letters (JSON). Tiny envelope (LoRaWAN)? Use your secret code (binary)!


11.14 What’s Next

Now that you’ve practiced data format selection and design, explore these related topics:

| Topic | Chapter | Why It Matters |
|---|---|---|
| Protocol headers | Packet Structure and Framing | See how your chosen data format fits inside protocol packets and framing layers |
| MQTT messaging | MQTT Fundamentals | Implement JSON and CBOR payloads with the most widely used IoT messaging protocol |
| Protocol selection | Protocol Selector Wizard | Get interactive recommendations for your specific deployment scenario |
| Data formats overview | IoT Data Formats Overview | Review the foundational concepts of text-based formats like JSON and XML |
| Binary formats | Binary Data Formats | Revisit CBOR, Protocol Buffers, and custom binary encoding details |
| CoAP protocol | CoAP Fundamentals | Explore the lightweight protocol designed for constrained IoT devices using CBOR |