Data Serialisation: Converting in-memory data structures into a byte sequence for transmission and storage
JSON (JavaScript Object Notation): A human-readable text format for structured data; verbose but universally supported
CBOR (Concise Binary Object Representation): A binary encoding of JSON-like data structures (RFC 8949, which obsoletes RFC 7049); more compact than JSON (about 1.5× in this chapter's worked example) while remaining self-describing
Protocol Buffers (Protobuf): Google’s schema-defined binary serialisation format; very compact and fast but requires shared schema definition
MessagePack: A binary serialisation format compatible with JSON types; more compact than JSON, simpler than Protobuf
Endianness: The byte order in which multi-byte integers are stored or transmitted; big-endian is the network standard (network byte order)
Base64 Encoding: Encoding binary data as ASCII text for transport over text-only channels; adds 33% size overhead
38.1 In 60 Seconds
All digital communication reduces to bits (0s and 1s), grouped into bytes (8 bits). Network speeds are measured in bits per second (bps) while storage uses bytes – a critical distinction (1 MBps = 8 Mbps). IoT devices are severely constrained: LoRaWAN packets carry only 51-242 bytes, and microcontrollers may have just kilobytes of RAM. Understanding data sizing helps you design efficient payloads and calculate transfer times for IoT scenarios.
38.2 Learning Objectives
By the end of this section, you will be able to:
Explain Binary Representation: Explain how bits represent information using only 1s and 0s and convert binary values to decimal equivalents
Calculate Between Units: Convert between bytes, kilobytes, megabytes, gigabytes, and bits using correct multiplication factors
Differentiate Network Speeds: Distinguish between bits per second (network capacity) and bytes per second (storage throughput) and apply the 8x conversion factor
Apply to IoT Contexts: Calculate data transfer requirements, channel utilization, and battery life impact for common IoT scenarios
Compare Encoding Formats: Evaluate JSON, CBOR, Protocol Buffers, and custom binary encoding against IoT payload constraints
38.3 Prerequisites
Before diving into this chapter, you should be familiar with:
Networking Basics: Understanding fundamental networking concepts and terminology
Why Data Representation Matters for IoT
Every byte counts in IoT. Constrained devices have limited memory (often kilobytes, not megabytes), and wireless protocols like LoRaWAN support only 51-242 bytes per packet. Understanding how data is represented and sized helps you design efficient IoT systems that work within these constraints.
For Beginners: Understanding Bits and Bytes
At its core, all digital communication comes down to ones and zeros. A bit is the smallest unit of data – like a light switch that is either on (1) or off (0). When we group 8 bits together, we get a byte, which is enough to represent a single letter, number, or small value.
Think of it like this:
1 bit = A single yes/no answer
1 byte (8 bits) = One character like ‘A’ or ‘7’
1 kilobyte (1,024 bytes) = A short text message
1 megabyte (1,024 KB) = A photograph
1 gigabyte (1,024 MB) = A short video
Understanding these building blocks helps you estimate how much data your IoT sensors will generate and how long it takes to transmit.
Sensor Squad: Counting in Ones and Zeros!
“Everything I communicate is made of bits – tiny ones and zeros!” said Sammy the Sensor. “My temperature reading of 23.5 degrees? It is stored as a pattern of bits in my memory. A single bit is like a light switch – on or off, 1 or 0.”
“Eight bits make a byte,” explained Max the Microcontroller. “That is enough to store one character, like the letter ‘A’ or the number 42. My microcontroller has just 256 kilobytes of RAM – that is about 262,000 bytes. Sounds like a lot, but a single photo from a security camera is several megabytes!”
“Here is a tricky distinction,” warned Lila the LED. “Network speeds use bits per second (lowercase ‘b’), but file sizes use Bytes (uppercase ‘B’). A 100 Mbps Wi-Fi connection can transfer about 12.5 MBps. Do not mix them up or your calculations will be off by 8 times!”
“For IoT, every byte matters,” added Bella the Battery. “A LoRaWAN packet can only carry 51 to 242 bytes. My sensor reading is about 10 bytes, so it fits easily. But if you try to send a photo over LoRa? Forget it – you would need hundreds of packets! Always choose your data format and protocol based on how many bytes you actually need to send.”
38.4 Bits and Bytes: The Language of Networks
38.4.1 Binary Representation
Digital computers and communication systems represent all data as bits (binary digits), each of which has only two possible values: 1 (one) and 0 (zero). This representation is essential for understanding how IoT devices encode and transmit data.
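As a minimal sketch of the binary-to-decimal conversion described above, the helper names below (`bits_to_decimal`, `decimal_to_bits`) are illustrative and not part of any library. Each bit doubles the running value and adds its own 0 or 1.

```python
def bits_to_decimal(bits: str) -> int:
    """Convert a string of '0'/'1' characters to its unsigned decimal value."""
    value = 0
    for bit in bits:
        if bit not in "01":
            raise ValueError(f"not a binary digit: {bit!r}")
        # Shift the running value left by one place, then add the new bit.
        value = value * 2 + int(bit)
    return value


def decimal_to_bits(value: int, width: int = 8) -> str:
    """Format a non-negative integer as a zero-padded binary string."""
    return format(value, f"0{width}b")


print(bits_to_decimal("00101010"))  # 42
print(decimal_to_bits(65))          # 01000001, the ASCII code for 'A'
print(bits_to_decimal("11111111"))  # 255, the largest value one byte can hold
```

One byte therefore covers the range 0 to 255, which is why a humidity percentage fits in a single byte while a CO2 reading in ppm needs two.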
38.5 IoT Data Size Considerations
38.5.1 Constrained Device Memory
IoT devices operate with severely limited resources compared to traditional computers:
| Device Class | Typical RAM | Typical Flash | Example |
|---|---|---|---|
| Class 0 | < 10 KB | < 100 KB | Sensor tags |
| Class 1 | ~10 KB | ~100 KB | Basic sensors |
| Class 2 | ~50 KB | ~250 KB | Smart sensors |
| Gateway | 256 MB+ | 1 GB+ | Edge processors |
This means every byte in your protocol matters. A 1 KB JSON message that seems small on a web server represents 10% of available RAM on a Class 1 device.
38.5.2 Protocol Overhead Impact
When transmitting small sensor readings, protocol overhead becomes significant:
Temperature reading: 4 bytes (float) of useful data
Encoded as JSON: 13 bytes ({"temp":23.5})
+ MQTT fixed header: 2 bytes minimum (the topic name adds more)
+ TCP/IP headers: 40 bytes
+ Ethernet frame: 26 bytes
---------------------------------
Total: ~81 bytes on the wire for 4 bytes of data
Protocol efficiency: 4.9%
For IoT, binary protocols (CoAP, MQTT) and efficient encoding (CBOR, Protocol Buffers) can dramatically improve efficiency.
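The overhead arithmetic can be checked with a short sketch. It measures the JSON string directly rather than estimating it, so its total can differ by a few bytes from a hand estimate. The header sizes are the ones quoted above; the function name `efficiency` is an illustrative choice, and the MQTT figure counts only the minimum fixed header.

```python
# JSON text that carries the reading on the wire
payload = '{"temp":23.5}'.encode("ascii")

USEFUL_BYTES = 4  # the reading itself, as a 32-bit float

# Per-layer overhead in bytes, using the figures quoted in this section
overhead = {
    "MQTT fixed header (minimum)": 2,
    "TCP/IP headers": 40,
    "Ethernet frame": 26,
}


def efficiency(useful, payload_len, headers):
    """Return (total bytes on the wire, fraction that is useful data)."""
    total = payload_len + sum(headers.values())
    return total, useful / total


total, eff = efficiency(USEFUL_BYTES, len(payload), overhead)
print(f"{len(payload)} B JSON payload, {total} B on the wire, {eff:.1%} efficient")
```

Swapping in a different payload or header set shows how quickly fixed headers dominate when the useful data is only a few bytes.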
38.6 Worked Example: Data Encoding Formats for IoT
Different data encoding formats produce dramatically different payload sizes for the same sensor data. Choosing the right format directly impacts bandwidth, power consumption, and battery life.
Scenario: An environmental sensor sends temperature (23.5 C), humidity (67%), and CO2 (412 ppm) readings every 5 minutes over LoRaWAN (max 51-242 byte payload depending on spreading factor).
JSON size: 53 bytes – exceeds the LoRaWAN SF12 payload limit (51 bytes). Key names take up a large share of the payload, so this format spends bandwidth on field names that the receiver already knows. To fit within SF12, you would need to shorten the key names or drop the timestamp.
Custom binary size: 9 bytes – 83% smaller than JSON. Temperature is stored as 235 (23.5 x 10) in a signed 16-bit integer (2 bytes), humidity as a single byte, CO2 as an unsigned 16-bit integer (2 bytes), and the timestamp as an unsigned 32-bit integer (4 bytes).
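The two sizes can be reproduced with a short sketch using only the Python standard library. The key names (`temp`, `humidity`, `co2`, `ts`), the timestamp value, and the big-endian field order are assumptions chosen for illustration; they happen to yield the 53-byte and 9-byte figures quoted above, but the original payload definitions may differ.

```python
import json
import struct

# Hypothetical reading: key names and timestamp are illustrative assumptions.
reading = {"temp": 23.5, "humidity": 67, "co2": 412, "ts": 1700000000}

# Compact JSON: no spaces after separators.
json_bytes = json.dumps(reading, separators=(",", ":")).encode("utf-8")

# '>' = big-endian with no padding; h = int16, B = uint8, H = uint16, I = uint32
FMT = ">hBHI"


def encode(r):
    """Pack a reading into 9 bytes; temperature is scaled by 10 to keep one decimal."""
    return struct.pack(FMT, round(r["temp"] * 10), r["humidity"], r["co2"], r["ts"])


def decode(buf):
    """Reverse of encode(): unpack the 9 bytes and undo the temperature scaling."""
    temp_x10, humidity, co2, ts = struct.unpack(FMT, buf)
    return {"temp": temp_x10 / 10, "humidity": humidity, "co2": co2, "ts": ts}


binary = encode(reading)
print(len(json_bytes), len(binary))  # 53 9
print(decode(binary) == reading)     # True
```

The binary form is only decodable by a receiver that knows `FMT` and the scaling factor, which is the trade-off for dropping the field names.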
38.6.4 Encoding Comparison
| Format | Size | Efficiency | Self-Describing | LoRaWAN SF12 Compatible |
|---|---|---|---|---|
| JSON | 53 bytes | 1x (baseline) | Yes | No (51B limit) |
| CBOR | 35 bytes | 1.5x better | Yes | Yes |
| Protocol Buffers | 14 bytes | 3.8x better | Schema required | Yes |
| Custom Binary | 9 bytes | 5.9x better | No | Yes |
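The efficiency and compatibility columns follow directly from the payload sizes. The sketch below recomputes them, assuming the 51-byte SF12 limit quoted in this chapter; the function name `compare` is illustrative.

```python
SF12_LIMIT_BYTES = 51  # smallest LoRaWAN payload limit quoted in this chapter

# Payload sizes in bytes, taken from the comparison above
sizes = {"JSON": 53, "CBOR": 35, "Protocol Buffers": 14, "Custom Binary": 9}


def compare(sizes, baseline="JSON", limit=SF12_LIMIT_BYTES):
    """Map each format to (size ratio versus baseline, fits within limit)."""
    base = sizes[baseline]
    return {name: (round(base / size, 1), size <= limit) for name, size in sizes.items()}


table = compare(sizes)
for name, (ratio, fits) in table.items():
    print(f"{name:17s} {sizes[name]:3d} B  {ratio:.1f}x  fits SF12: {fits}")
```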
38.6.5 Battery Life Impact
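For a LoRaWAN sensor transmitting every 5 minutes (288 messages/day), the encoding choice directly affects radio on-time and battery life. The figures below use a simplified model: an effective data rate of 250 bps and payload bits only, with no preamble or LoRaWAN headers. Note that 250 bps is the nominal LoRaWAN rate for SF12 at 125 kHz; SF10 is closer to 980 bps, so absolute transmit times at SF10 would be shorter, while the ratio between formats in this model stays the same: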
JSON (53 bytes = 424 bits): 424 / 250 = 1.70 seconds TX time
Custom Binary (9 bytes = 72 bits): 72 / 250 = 0.29 seconds TX time
At 44 mA TX current and 2000 mAh battery:
JSON: 1.70s x 288 msgs/day x 44mA / 3600 = 5.98 mAh/day → 334 days
Binary: 0.29s x 288 msgs/day x 44mA / 3600 = 1.02 mAh/day → 1,961 days
Battery life ratio: 5.9x longer with binary encoding
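The same calculation can be sketched as a few functions, so that other payload sizes or data rates can be tried. This is a simplified model under the stated assumptions (payload bits only, transmit current only, no sleep or receive current); the function names and defaults are illustrative. Because it does not round the transmit time, it gives roughly 335 and 1,973 days rather than the rounded figures above.

```python
def tx_seconds(payload_bytes, bitrate_bps=250):
    """Time on air for the payload bits alone (no preamble or headers)."""
    return payload_bytes * 8 / bitrate_bps


def daily_mah(payload_bytes, bitrate_bps=250, msgs_per_day=288, tx_ma=44):
    """Charge used per day by transmission, in mAh (mA x seconds / 3600)."""
    return tx_seconds(payload_bytes, bitrate_bps) * msgs_per_day * tx_ma / 3600


def battery_days(payload_bytes, capacity_mah=2000, **kwargs):
    """Days until the battery is drained by transmission alone."""
    return capacity_mah / daily_mah(payload_bytes, **kwargs)


json_days = battery_days(53)
binary_days = battery_days(9)
print(f"JSON:   {daily_mah(53):.2f} mAh/day, {json_days:.0f} days")
print(f"Binary: {daily_mah(9):.2f} mAh/day, {binary_days:.0f} days")
print(f"Ratio:  {binary_days / json_days:.1f}x")
```

In this model the battery-life ratio equals the payload-size ratio (53 / 9), regardless of the data rate or current chosen.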
Key Insight: For constrained IoT devices, binary encoding is not an optimization – it is a requirement. The 5.9x battery life improvement from switching JSON to binary encoding can mean the difference between annual and multi-year battery replacement in a deployment of thousands of sensors.
38.7 Real-World Case Study: Smart Agriculture Payload Design
A precision agriculture company deploys 2,000 soil sensors across 500 acres. Each sensor measures moisture (0-100%), temperature (-20 to 60 C), and electrical conductivity (0-5000 uS/cm) every 15 minutes.
38.8 Decision Framework: Choosing an IoT Data Encoding Format
Selecting the right encoding format requires balancing payload size, decode complexity, and interoperability. This decision table provides quantitative criteria for common IoT scenarios.
| Criterion | JSON | CBOR | Protocol Buffers | Custom Binary |
|---|---|---|---|---|
| Payload size (3 sensor values + timestamp) | 53 bytes | 35 bytes | 14 bytes | 9 bytes |
| Decode CPU (ARM Cortex-M0) | 2.1 ms | 0.8 ms | 0.3 ms | 0.1 ms |
| RAM for decoder | 4-8 KB | 2-4 KB | 1-2 KB | 0.1-0.5 KB |
| Schema required | No | No | Yes (.proto file) | Yes (custom docs) |
| Human-readable | Yes | Partial (diagnostic) | No | No |
| Self-describing | Yes | Yes | No | No |
| Standard tooling | Excellent | Good | Excellent | None |
| Cross-platform | Universal | Growing | Excellent | Custom only |
When to use each format:
JSON: During prototyping, for cloud-to-cloud APIs, and when human readability in logs is essential. Avoid for constrained devices transmitting over LPWAN.
CBOR: When you need compact self-describing payloads – ideal for CoAP (IETF standard pairing) and heterogeneous device fleets where schema versioning is difficult.
Protocol Buffers: For medium-to-large IoT platforms where schema management is feasible, especially with gRPC backends. Google, Uber, and Netflix use this for fleet telemetry.
Custom Binary: For extreme-constraint scenarios (LoRaWAN SF12, Class 0 devices with <10 KB RAM). Requires tight coupling between sender and receiver firmware versions.
Real deployment example: Semtech’s reference LoRaWAN sensor uses custom binary encoding with a 7-byte payload (2 bytes temperature, 2 bytes humidity, 2 bytes pressure, 1 byte battery), achieving 18-month battery life at SF10. The same data in JSON would be approximately 85 bytes – exceeding the SF10 payload limit and forcing SF7 (shorter range, more gateways needed).
38.9 Knowledge Check
Common Pitfalls
1. Using JSON for High-Frequency Sensor Data on Constrained Networks
A JSON-encoded temperature reading ({"t":23.5}) is 10 bytes, whereas the bare value fits in 3 bytes as a CBOR half-precision float. At 100 readings/second from 100 sensors, that is 700 extra bytes/second per sensor, or 70 kB/s (560 kbps) across the fleet. Fix: use binary encoding (CBOR, Protobuf, or custom binary) for high-frequency data on constrained links.
2. Not Agreeing on a Shared Schema Before Deployment
Protobuf and similar schema-based formats require both sender and receiver to use the same schema version. Schema mismatches cause silent data corruption or parse failures. Fix: version your schemas and include the schema version in every message, with a compatibility policy for backward and forward compatibility.
3. Forgetting Endianness When Implementing Custom Binary Formats
A sensor sending a 16-bit temperature value in little-endian format will be misinterpreted by a gateway expecting big-endian (network byte order). Fix: document the byte order explicitly in the data format specification and use struct.pack('>H', value) (Python) or htons() (C) consistently.
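The endianness pitfall is easy to demonstrate with Python's `struct` module. In this sketch the value 235 (23.5 C scaled by 10) is an illustrative choice; the point is what a big-endian receiver sees when the sender used little-endian.

```python
import struct

value = 235  # 23.5 C scaled by 10, as an unsigned 16-bit integer

big = struct.pack(">H", value)     # network byte order: b'\x00\xeb'
little = struct.pack("<H", value)  # little-endian:      b'\xeb\x00'

# A gateway expecting big-endian decodes each byte string as follows.
right = struct.unpack(">H", big)[0]
wrong = struct.unpack(">H", little)[0]

print(big.hex(), little.hex())  # 00eb eb00
print(right, wrong)             # 235 60160
```

The mismatch produces no error, only a plausible-looking but wrong number, which is why the byte order belongs in the format specification.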
38.10 Summary
Bits are the foundation of all digital communication, representing data as 1s and 0s
8 bits = 1 byte, with data sizes scaling through KB, MB, GB, TB
Network speeds use bits (100 Mb/s) while storage uses bytes (100 MB) – remember to convert (divide bits by 8 to get bytes, so 100 Mb/s is 12.5 MB/s)
IoT devices are constrained – Class 0/1 devices have roughly 10 KB of RAM and 100 KB of flash, or less
Protocol overhead matters – small sensor readings can have 90%+ overhead from headers
Encoding format choice directly impacts battery life – binary encoding can extend battery life by 5-6x compared to JSON
38.11 What’s Next
Now that you can calculate data sizes, convert between bits and bytes, and select appropriate encoding formats for IoT payloads, the next section explores how this data is packaged into datagrams for transmission across networks. You will learn about packet structure, headers, payloads, and how large data is fragmented into manageable pieces.