38  Data Representation in Networks

Key Concepts
  • Data Serialisation: Converting in-memory data structures into a byte sequence for transmission and storage
  • JSON (JavaScript Object Notation): A human-readable text format for structured data; verbose but universally supported
  • CBOR (Concise Binary Object Representation): A binary encoding of JSON-like data structures (RFC 7049); 3–10× more compact than JSON
  • Protocol Buffers (Protobuf): Google’s schema-defined binary serialisation format; very compact and fast but requires shared schema definition
  • MessagePack: A binary serialisation format compatible with JSON types; more compact than JSON, simpler than Protobuf
  • Endianness: The byte order in which multi-byte integers are stored or transmitted; big-endian is the network standard (network byte order)
  • Base64 Encoding: Encoding binary data as ASCII text for transport over text-only channels; adds 33% size overhead

38.1 In 60 Seconds

All digital communication reduces to bits (0s and 1s), grouped into bytes (8 bits). Network speeds are measured in bits per second (bps) while storage uses bytes – a critical distinction (1 MBps = 8 Mbps). IoT devices are severely constrained: LoRaWAN packets carry only 51-242 bytes, and microcontrollers may have just kilobytes of RAM. Understanding data sizing helps you design efficient payloads and calculate transfer times for IoT scenarios.

38.2 Learning Objectives

By the end of this section, you will be able to:

  • Explain Binary Representation: Explain how bits represent information using only 1s and 0s and convert binary values to decimal equivalents
  • Calculate Between Units: Convert between bytes, kilobytes, megabytes, gigabytes, and bits using correct multiplication factors
  • Differentiate Network Speeds: Distinguish between bits per second (network capacity) and bytes per second (storage throughput) and apply the 8x conversion factor
  • Apply to IoT Contexts: Calculate data transfer requirements, channel utilization, and battery life impact for common IoT scenarios
  • Compare Encoding Formats: Evaluate JSON, CBOR, Protocol Buffers, and custom binary encoding against IoT payload constraints

38.3 Prerequisites

Before diving into this chapter, you should be familiar with:

Why Data Representation Matters for IoT

Every byte counts in IoT. Constrained devices have limited memory (often kilobytes, not megabytes), and wireless protocols like LoRaWAN support only 51-242 bytes per packet. Understanding how data is represented and sized helps you design efficient IoT systems that work within these constraints.

At its core, all digital communication comes down to ones and zeros. A bit is the smallest unit of data – like a light switch that is either on (1) or off (0). When we group 8 bits together, we get a byte, which is enough to represent a single letter, number, or small value.

Think of it like this:

  • 1 bit = A single yes/no answer
  • 1 byte (8 bits) = One character like ‘A’ or ‘7’
  • 1 kilobyte (1,024 bytes) = A short text message
  • 1 megabyte (1,024 KB) = A photograph
  • 1 gigabyte (1,024 MB) = A short video

Understanding these building blocks helps you estimate how much data your IoT sensors will generate and how long it takes to transmit.

“Everything I communicate is made of bits – tiny ones and zeros!” said Sammy the Sensor. “My temperature reading of 23.5 degrees? It is stored as a pattern of bits in my memory. A single bit is like a light switch – on or off, 1 or 0.”

“Eight bits make a byte,” explained Max the Microcontroller. “That is enough to store one character, like the letter ‘A’ or the number 42. My microcontroller has just 256 kilobytes of RAM – that is about 262,000 bytes. Sounds like a lot, but a single photo from a security camera is several megabytes!”

“Here is a tricky distinction,” warned Lila the LED. “Network speeds use bits per second (lowercase ‘b’), but file sizes use Bytes (uppercase ‘B’). A 100 Mbps Wi-Fi connection can transfer about 12.5 MBps. Do not mix them up or your calculations will be off by 8 times!”

“For IoT, every byte matters,” added Bella the Battery. “A LoRaWAN packet can only carry 51 to 242 bytes. My sensor reading is about 10 bytes, so it fits easily. But if you try to send a photo over LoRa? Forget it – you would need hundreds of packets! Always choose your data format and protocol based on how many bytes you actually need to send.”


38.4 Bits and Bytes: The Language of Networks

38.4.1 Binary Representation

Digital computers and communication systems represent all data as BITs – binary digits with only two values: 1 (one) and 0 (zero).

Diagram showing one byte composed of 8 bits (10101000 binary) representing decimal value 168 and ASCII character encoding
Figure 38.1: Binary byte representation showing 8 bits and decimal/ASCII conversion

Why binary?

  • Simple: Only two states to detect
  • Fast: Can be processed at extremely high speeds
  • Reliable: Less prone to errors than multi-level systems
  • Universal: Any information can be represented

38.4.2 Bytes and Data Sizes

8 bits = 1 BYTE – used to represent a single character or value.

Common data sizes in IoT:

Unit Size Example Usage
Byte (B) 8 bits Single sensor reading
Kilobyte (KB) 1,024 bytes Small JSON message
Megabyte (MB) 1,024 KB Firmware update file
Gigabyte (GB) 1,024 MB Daily sensor data collection
Terabyte (TB) 1,024 GB Annual IoT platform storage

Real-world examples:

  • 55 MB document file
  • 16 GB microcontroller memory
  • 2 TB hard disk drive

38.4.3 Network Data Rates

Network capacity is measured in bits per second (bps or b/s):

Connection Type Data Rate IoT Use Case
LoRaWAN 0.3-50 kb/s Long-range sensors
BLE 1-2 Mb/s Wearables, beacons
Wi-Fi 10-1000 Mb/s Smart home devices
5G 100-1000 Mb/s Autonomous vehicles
Gigabit Ethernet 1 Gb/s Industrial IoT gateways
Bits vs Bytes: Common Confusion

Storage uses Bytes (B): “16 GB memory card” Network speed uses bits (b): “100 Mb/s Wi-Fi”

To convert: 1 Byte = 8 bits, so 100 Mb/s = 12.5 MB/s

Try It: Bits and Bytes Converter

38.4.4 Binary to Decimal Conversion

For human readability, binary is converted to decimal:

Examples:

  • 1010 binary = 10 decimal
  • 1000 1010 binary = 138 decimal

Learn binary number systems: Intro to Binary Numbers

Essential for understanding how IoT devices encode and transmit data.


38.5 IoT Data Size Considerations

38.5.1 Constrained Device Memory

IoT devices operate with severely limited resources compared to traditional computers:

Device Class Typical RAM Typical Flash Example
Class 0 < 10 KB < 100 KB Sensor tags
Class 1 ~10 KB ~100 KB Basic sensors
Class 2 ~50 KB ~250 KB Smart sensors
Gateway 256 MB+ 1 GB+ Edge processors

This means every byte in your protocol matters. A 1 KB JSON message that seems small on a web server represents 10% of available RAM on a Class 1 device.

38.5.2 Protocol Overhead Impact

When transmitting small sensor readings, protocol overhead becomes significant:

Temperature reading: 4 bytes (float)
+ JSON formatting: ~20 bytes ({"temp":23.5})
+ MQTT overhead: ~2 bytes minimum
+ TCP/IP headers: 40 bytes
+ Ethernet frame: 26 bytes
---------------------------------
Total: ~92 bytes for 4 bytes of data
Protocol efficiency: 4.3%

For IoT, binary protocols (CoAP, MQTT) and efficient encoding (CBOR, Protocol Buffers) can dramatically improve efficiency.


38.6 Worked Example: Data Encoding Formats for IoT

Different data encoding formats produce dramatically different payload sizes for the same sensor data. Choosing the right format directly impacts bandwidth, power consumption, and battery life.

Scenario: An environmental sensor sends temperature (23.5 C), humidity (67%), and CO2 (412 ppm) readings every 5 minutes over LoRaWAN (max 51-242 byte payload depending on spreading factor).

38.6.1 JSON Encoding (Human-Readable)

{"temp":23.5,"humidity":67,"co2":412,"ts":1706886400}

Size: 53 bytes – exceeds the LoRaWAN SF12 payload limit (51 bytes). With key names consuming most of the space, this format wastes bandwidth on repeated field names that the receiver already knows. For SF12, you would need to shorten key names or drop the timestamp.

38.6.2 CBOR Encoding (Binary, Self-Describing)

A4                        -- map(4)
   64 74656D70            -- text(4) "temp"
   F9 4DF0                -- float16(23.5)
   68 68756D6964697479    -- text(8) "humidity"
   18 43                  -- unsigned(67)
   63 636F32              -- text(3) "co2"
   19 019C                -- unsigned(412)
   62 7473                -- text(2) "ts"
   1A 65BD0500            -- unsigned(1706886400)

Size: 35 bytes – 33% smaller than JSON while remaining self-describing.

38.6.3 Protocol Buffers / Custom Binary (Compact)

Pre-agreed schema: [temp:int16 x10, humidity:uint8, co2:uint16, ts:uint32]

EB 00  43  01 9C  65 BD 05 00

Size: 9 bytes – 83% smaller than JSON. Temperature stored as 235 (23.5 x 10) in a signed 16-bit integer, humidity as single byte, CO2 as 16-bit unsigned, timestamp as 32-bit unsigned.

38.6.4 Encoding Comparison

Format Size Efficiency Self-Describing LoRaWAN SF12 Compatible
JSON 53 bytes 1x (baseline) Yes No (51B limit)
CBOR 35 bytes 1.5x better Yes Yes
Protocol Buffers 14 bytes 3.8x better Schema required Yes
Custom Binary 9 bytes 5.9x better No Yes

38.6.5 Battery Life Impact

For a LoRaWAN sensor transmitting every 5 minutes (288 messages/day) at SF10 (250 bps), the encoding choice directly affects radio on-time and battery life:

JSON (53 bytes = 424 bits): 424 / 250 = 1.70 seconds TX time
Custom Binary (9 bytes = 72 bits): 72 / 250 = 0.29 seconds TX time

At 44 mA TX current and 2000 mAh battery:
JSON: 1.70s x 288 msgs/day x 44mA / 3600 = 5.98 mAh/day → 334 days
Binary: 0.29s x 288 msgs/day x 44mA / 3600 = 1.02 mAh/day → 1,961 days

Battery life ratio: 5.9x longer with binary encoding

Key Insight: For constrained IoT devices, binary encoding is not an optimization – it is a requirement. The 5.9x battery life improvement from switching JSON to binary encoding can mean the difference between annual and multi-year battery replacement in a deployment of thousands of sensors.

Try It: IoT Payload Size and Battery Life Calculator

38.7 Real-World Case Study: Smart Agriculture Payload Design

A precision agriculture company deploys 2,000 soil sensors across 500 acres. Each sensor measures moisture (0-100%), temperature (-20 to 60 C), and electrical conductivity (0-5000 uS/cm) every 15 minutes.

Initial Design (JSON over LoRaWAN):

{"soil_moisture":42.3,"soil_temp":18.7,"ec":1250,"battery":87,"id":"SENS-0847"}
  • Payload: 72 bytes – exceeds LoRaWAN SF10 max (51 bytes)
  • Forced to use SF7 (shorter range, more gateways needed)
  • Required 15 gateways at $1,500 each = $22,500 infrastructure

Optimized Design (Binary Encoding):

Packed binary: [moisture:uint16 x10, temp:int16 x10, ec:uint16, battery:uint8]
01 A6  00 BB  04 E2  57
  • Payload: 7 bytes – fits comfortably in SF12 (longest range)
  • Device ID embedded in LoRaWAN DevEUI (free, no payload cost)
  • Battery percentage in single byte (0-255 maps to 0-100%)
  • Required only 5 gateways = $7,500 infrastructure
  • Annual savings: $15,000 infrastructure + $8,000 fewer battery replacements = $23,000/year

38.8 Decision Framework: Choosing an IoT Data Encoding Format

Selecting the right encoding format requires balancing payload size, decode complexity, and interoperability. This decision table provides quantitative criteria for common IoT scenarios.

Criterion JSON CBOR Protocol Buffers Custom Binary
Payload size (3 sensor values + timestamp) 53 bytes 35 bytes 14 bytes 9 bytes
Decode CPU (ARM Cortex-M0) 2.1 ms 0.8 ms 0.3 ms 0.1 ms
RAM for decoder 4-8 KB 2-4 KB 1-2 KB 0.1-0.5 KB
Schema required No No Yes (.proto file) Yes (custom docs)
Human-readable Yes Partial (diagnostic) No No
Self-describing Yes Yes No No
Standard tooling Excellent Good Excellent None
Cross-platform Universal Growing Excellent Custom only

When to use each format:

  • JSON: During prototyping, for cloud-to-cloud APIs, and when human readability in logs is essential. Avoid for constrained devices transmitting over LPWAN.
  • CBOR: When you need compact self-describing payloads – ideal for CoAP (IETF standard pairing) and heterogeneous device fleets where schema versioning is difficult.
  • Protocol Buffers: For medium-to-large IoT platforms where schema management is feasible, especially with gRPC backends. Google, Uber, and Netflix use this for fleet telemetry.
  • Custom Binary: For extreme-constraint scenarios (LoRaWAN SF12, Class 0 devices with <10 KB RAM). Requires tight coupling between sender and receiver firmware versions.

Real deployment example: Semtech’s reference LoRaWAN sensor uses custom binary encoding with a 7-byte payload (2 bytes temperature, 2 bytes humidity, 2 bytes pressure, 1 byte battery), achieving 18-month battery life at SF10. The same data in JSON would be approximately 85 bytes – exceeding the SF10 payload limit and forcing SF7 (shorter range, more gateways needed).

38.9 Knowledge Check


Common Pitfalls

A JSON-encoded temperature reading ({"t":23.5}) is 10 bytes vs 3 bytes in CBOR. At 100 readings/second from 100 sensors, that is 700 extra bytes/second per sensor — a 70 kbps overhead. Fix: use binary encoding (CBOR, Protobuf, or custom binary) for high-frequency data on constrained links.

Protobuf and similar schema-based formats require both sender and receiver to use the same schema version. Schema mismatches cause silent data corruption or parse failures. Fix: version your schemas and include the schema version in every message, with a compatibility policy for backward and forward compatibility.

A sensor sending a 16-bit temperature value in little-endian format will be misinterpreted by a gateway expecting big-endian (network byte order). Fix: document the byte order explicitly in the data format specification and use struct.pack('>H', value) (Python) or htons() (C) consistently.

38.10 Summary

  • Bits are the foundation of all digital communication, representing data as 1s and 0s
  • 8 bits = 1 byte, with data sizes scaling through KB, MB, GB, TB
  • Network speeds use bits (100 Mb/s) while storage uses bytes (100 MB) – remember to convert (divide by 8)
  • IoT devices are constrained – Class 0/1 devices have only 10-100 KB of memory
  • Protocol overhead matters – small sensor readings can have 90%+ overhead from headers
  • Encoding format choice directly impacts battery life – binary encoding can extend battery life by 5-6x compared to JSON

38.11 What’s Next

Now that you can calculate data sizes, convert between bits and bytes, and select appropriate encoding formats for IoT payloads, the next section explores how this data is packaged into datagrams for transmission across networks. You will learn about packet structure, headers, payloads, and how large data is fragmented into manageable pieces.

Topic Chapter Description
Datagrams and Packet Structure Datagrams and Packet Structure How data is encapsulated into packets with headers and payloads for network transmission
Network Addressing Network Addressing How devices are identified on networks using MAC addresses, IP addresses, and port numbers
Protocol Overhead Protocol Overhead and Efficiency Quantifying header overhead across protocol stacks and strategies to minimize it in IoT
LoRaWAN Protocol LoRaWAN Architecture Deep dive into LoRaWAN spreading factors, payload limits, and duty cycle constraints
CBOR and CoAP CoAP Application Protocol The IETF standard pairing of CoAP and CBOR for constrained IoT devices
Binary Encoding in Practice IoT Data Formats Practical guide to Protocol Buffers, CBOR, and MessagePack in IoT data pipelines