4  Data Representation Fundamentals

In 60 Seconds

This chapter explains how IoT data is represented at the byte level: number systems (binary, decimal, hexadecimal), data sizes, text encoding, endianness, and bitwise operations. Understanding these concepts lets you read datasheets, decode protocol dumps, and make design decisions that balance performance, energy efficiency, and scalability in real-world deployments.

4.1 Learning Objectives

By the end of this topic, you will be able to:

  • Convert between number systems: Translate numbers between binary, decimal, and hexadecimal representations
  • Classify data sizing units: Differentiate bits, bytes, nibbles, and word sizes across processor architectures
  • Compare text encoding schemes: Evaluate ASCII, Unicode, and UTF-8 trade-offs for IoT applications
  • Apply endianness concepts: Identify big-endian vs little-endian byte ordering in cross-platform communication
  • Perform bitwise operations: Implement AND, OR, XOR, shifts, and masks for IoT register manipulation
  • Interpret protocol specifications: Decode hexadecimal values in datasheets, memory dumps, and protocol documentation
  • Troubleshoot data representation errors: Diagnose common bugs caused by sign extension, truncation, and encoding mismatches

Key Concepts

  • Binary is the ground truth: Sensors, registers, protocol fields, and memory dumps all resolve to bit patterns even when tools show them as decimal or hex.
  • Hex is the debugging shortcut: Each hex digit maps to exactly 4 bits, which makes packet dumps, register maps, and device addresses much easier to read than long binary strings.
  • Representation affects interoperability: Signed vs. unsigned types, endianness, and text encoding rules determine whether two systems interpret the same bytes the same way.
  • Compact encoding is often intentional: Constrained devices commonly use fixed-width integers or fixed-point scaling instead of verbose strings or floating-point values.
  • Most bugs are interpretation bugs: Common failures come from dropped leading zeros, wrong byte order, incorrect sign extension, or assuming ASCII when the payload is UTF-8 or binary.
  • Datasheets speak in bytes and bits: Reading real hardware documentation requires comfort with register addresses, masks, field widths, and protocol examples shown in hex.
  • The practical workflow is consistent: Observe the raw bytes, decode structure and byte order, apply scale factors, then verify the result against realistic sensor ranges.

4.2 Chapter Scope (Avoiding Duplicate Deep Dives)

This chapter gives the data representation foundation needed across IoT.

  • Use this chapter to understand number systems, encoding, and byte-level interpretation.
  • Use Data Formats for IoT for format trade-offs (JSON/CBOR/Protobuf).
  • Use Packet Structure and Framing for protocol-level encapsulation and framing details.

MVU: Data Representation in IoT

Core Concept: All IoT data - sensor readings, protocol headers, device addresses, and configuration settings - is ultimately stored and transmitted as patterns of binary digits (0s and 1s), with hexadecimal notation providing a human-readable shorthand where each hex digit represents exactly 4 binary bits.

Why It Matters: When you debug a temperature sensor reading of 0x1A3B over I2C, analyze a LoRaWAN packet starting with 0x40, or configure a device register at address 0xFF, you’re working directly with binary data. Engineers who can’t fluently read hex notation are effectively blind to what their devices are actually doing.

Key Takeaway: Master the “4-bit rule” - every hex digit (0-F) maps to exactly 4 binary bits. Once this becomes second nature, you can instantly decode any datasheet, protocol dump, or error log that shows hex values.
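The 4-bit rule can be sketched in a few lines of Python. This is a minimal illustration, not part of any library: the helper names `hex_to_nibbles` and `nibbles_to_hex` are made up for this example.

```python
def hex_to_nibbles(hex_str: str) -> str:
    """Expand each hex digit into its 4-bit binary group."""
    # Drop an optional 0x prefix (hypothetical helper for illustration)
    digits = hex_str[2:] if hex_str.lower().startswith("0x") else hex_str
    return " ".join(f"{int(d, 16):04b}" for d in digits)


def nibbles_to_hex(bits: str) -> str:
    """Collapse space-separated 4-bit groups back into hex digits."""
    return "0x" + "".join(f"{int(group, 2):X}" for group in bits.split())


print(hex_to_nibbles("0x1A3B"))       # 0001 1010 0011 1011
print(hex_to_nibbles("0xB7"))         # 1011 0111
print(nibbles_to_hex("1111 0000"))    # 0xF0
```

Each digit is converted on its own, which is exactly why the conversion can be done mentally: no digit affects its neighbors.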

No-One-Left-Behind Practice Loop
  1. Read one concept in decimal language first.
  2. Translate it to binary and hex manually once.
  3. Verify with one real packet/register example.
  4. Reinforce with one quiz or simulation before moving on.

4.3 Topic Overview

Data representation is the foundation of all IoT programming. Every sensor reading, protocol header, memory address, and configuration value is ultimately stored as patterns of 0s and 1s. Mastering these concepts enables you to read datasheets, debug protocols, and write efficient embedded code.

Scenario: Your temperature sensor (I2C address 0x48) returns garbled readings. A logic analyzer shows the bytes 0x91 0xA3.

Step 1: Convert hex to binary

  • 0x91 = 1001 0001 binary
  • 0xA3 = 1010 0011 binary

Step 2: Understand sensor datasheet format

Datasheet says: “Temperature = 12-bit value, MSB first, in bits [15:4], lower 4 bits unused”

Step 3: Extract temperature bits

  • Combine bytes: 1001000110100011 (16 bits)
  • Take bits [15:4]: 100100011010 (12 bits)
  • Convert to decimal: 2330

Step 4: Apply scale factor

Datasheet: “Temperature (°C) = raw_value × 0.0625”

  • 2330 × 0.0625 = 145.625°C

Problem identified: 145.625°C is impossible for a sensor at room temperature! The sensor actually uses two’s complement for negative values, and bit 15 (the sign bit) is 1.

Correct interpretation (two’s complement):

  • The raw value is negative: 2330 - 4096 = -1766 (12-bit two’s complement; equivalently, invert the 12 bits and add 1 to get the magnitude 1766)
  • Temperature: -1766 × 0.0625 = -110.375°C

Root cause: The sensor’s VDD power pin was not connected, so the sensor was reading its own internal voltage drift as extreme cold.

Key lesson: Understanding binary/hex representation is essential for hardware debugging. Without this knowledge, you can’t interpret sensor outputs correctly.
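The decoding steps in this scenario can be sketched as a small Python function. This is a minimal sketch that assumes the format quoted from the datasheet above (12-bit value in bits [15:4], MSB first, 0.0625 °C per count); the function name `decode_temp12` is hypothetical.

```python
def decode_temp12(msb: int, lsb: int, signed: bool = True) -> float:
    """Decode a 12-bit temperature stored in bits [15:4] of two bytes."""
    word = (msb << 8) | lsb      # combine MSB-first bytes into 16 bits
    raw = word >> 4              # keep bits [15:4], drop the 4 unused bits
    if signed and raw & 0x800:   # top bit of the 12-bit field is the sign
        raw -= 1 << 12           # two's complement: subtract 4096
    return raw * 0.0625          # scale factor from the datasheet


print(decode_temp12(0x91, 0xA3, signed=False))  # 145.625 (unsigned misreading)
print(decode_temp12(0x91, 0xA3))                # -110.375 (two's complement)
```

Running both interpretations side by side is a quick sanity check: if neither result falls in a realistic range, suspect the wiring or the bus rather than the arithmetic.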

Common Mistake: Confusing Hex Digits with Decimal

Problem: Engineer reads datasheet showing I2C address 0x50 and writes code to communicate with decimal address 50.

Why this fails:

  • 0x50 (hexadecimal) = 80 in decimal
  • Decimal 50 = 0x32 (hex)
  • Device at address 0x50 never responds because code is addressing wrong device

Real-world example: A student spent 6 hours debugging an EEPROM communication issue before realizing they used decimal 36 instead of hex 0x36 (54 decimal). The EEPROM was at address 54, not 36.

Fix: Always use 0x prefix for hex in code. If datasheet shows hex without prefix, add it: Datasheet “Address: 50h” → Code: 0x50.

What to observe: In debug output, verify addresses match datasheet: I2C address: 0x50 (80 decimal). If unsure, use a calculator app in programmer mode.
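A short Python sketch makes the difference concrete. The helper `parse_datasheet_hex` is a hypothetical name used only for this illustration; it accepts the two notations mentioned above ("50h" and "0x50").

```python
def parse_datasheet_hex(text: str) -> int:
    """Parse an address written as '50h' or '0x50' as a base-16 number."""
    cleaned = text.strip().lower()
    if cleaned.endswith("h"):
        cleaned = cleaned[:-1]   # drop the trailing 'h' suffix
    return int(cleaned, 16)      # int() accepts an optional 0x prefix in base 16


datasheet_addr = 0x50   # hex literal, as printed in the datasheet
wrong_addr = 50         # decimal literal: a different address entirely

print(datasheet_addr)                # 80
print(hex(wrong_addr))               # 0x32
print(parse_datasheet_hex("50h"))    # 80
print(f"I2C address: 0x{datasheet_addr:02X} ({datasheet_addr} decimal)")
```

Printing both forms in debug output, as in the last line, makes a hex/decimal mix-up visible at a glance.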

4.4 Concept Relationships

This Concept | Builds On | Leads To | Contrasts With
--- | --- | --- | ---
Binary | Boolean logic, digital circuits | Hexadecimal, bitwise operations | Decimal (base-10) human counting
Hexadecimal | Binary representation | Protocol specifications, memory addresses | Base-10 arithmetic
Bitwise Operations | Binary fundamentals | Embedded programming, register configuration | Arithmetic operations
Endianness | Binary byte ordering | Network byte order, protocol parsing | Architecture-agnostic formats
Text Encoding | Character sets | String handling, internationalization | Binary data encoding

The diagram below shows how the same value can be written in three different ways - just like how “twelve” and “12” and “XII” (Roman numerals) all mean the same number!

Diagram showing the three number systems used in IoT: decimal (base 10), binary (base 2), and hexadecimal (base 16), with arrows illustrating how the same value converts between each system

Figure 1: The Three Number Systems in IoT - Decimal is familiar to humans, binary is native to computers, and hexadecimal serves as the practical bridge between them. Each hex digit represents exactly 4 binary bits, making conversions quick and intuitive.

This topic is divided into three focused chapters:

4.4.1 Number Systems and Data Units

Difficulty: Beginner | ~18 minutes

Learn the three number systems essential for IoT development:

  • Binary (Base 2): How computers store and process all data
  • Decimal (Base 10): Human-friendly representation for debugging
  • Hexadecimal (Base 16): Compact notation where each digit = 4 binary bits
  • Bits, Bytes, Words: Understanding data sizing and processor architectures
  • Conversion Practice: Step-by-step methods for translating between systems

Start with Number Systems →


4.4.2 Text Encoding for IoT

Difficulty: Intermediate | ~10 minutes

Understand how text becomes numbers in IoT systems:

  • ASCII: The original 7-bit standard (128 characters)
  • Unicode and UTF-8: Universal character encoding for all languages
  • Data Format Efficiency: Comparing JSON, CBOR, and binary encodings
  • Common Pitfalls: Buffer sizing, encoding mismatches, and garbled text

Continue to Text Encoding →


4.4.3 Bitwise Operations and Endianness

Difficulty: Intermediate | ~21 minutes

Master the low-level skills for hardware programming:

  • Endianness: Big-endian (network) vs little-endian (Intel/ARM) byte ordering
  • Bitwise AND, OR, XOR, NOT: Fundamental operations with truth tables
  • Bit Manipulation Patterns: Setting, clearing, toggling, and checking bits
  • Real-World Examples: Sensor status bytes, LoRaWAN flag packing, cross-platform bugs
  • Python struct Module: Binary data parsing for server-side code

Master Bitwise Operations →


4.5 IoT Data Representation Ecosystem

The following diagram illustrates how data representation concepts connect across the entire IoT stack - from physical sensors to cloud analytics:

Flowchart of the IoT data representation pipeline showing data flowing from analog sensors through ADC conversion, device registers, network protocols, and cloud storage, with each stage labeled by its data format

Figure 2: The IoT Data Representation Pipeline - Data flows from analog sensors through digital conversion (ADC), device registers, network protocols, and finally to cloud storage. Each stage uses specific data representation formats: binary for hardware, hexadecimal for debugging, UTF-8/CBOR for network encoding.


4.6 Common Pitfalls

Avoid These Data Representation Mistakes

Pitfall 1: Ignoring Endianness (and Strict Aliasing)

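#include <stdint.h>
#include <string.h>     // memcpy
#include <arpa/inet.h>  // ntohs (POSIX; the header varies on embedded network stacks)

// WRONG: Assumes host byte order AND violates strict aliasing
// Reading a char buffer through a uint16_t* is undefined behavior in C
uint16_t value = *(uint16_t*)buffer;

// RIGHT: Use memcpy to avoid aliasing issues, then convert byte order
uint16_t value;
memcpy(&value, buffer, sizeof(value));
value = ntohs(value);  // network (big-endian) to host byte order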

Network protocols use big-endian (MSB first), but most ARM/Intel processors use little-endian. Always convert explicitly. Additionally, reading a char buffer through a uint16_t pointer violates C’s strict aliasing rule and may be a misaligned access on some MCUs. Use memcpy for type-safe byte reinterpretation.
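On the server side the same issue shows up in Python. The sketch below interprets one two-byte buffer both ways using `int.from_bytes` and the `struct` module; the buffer contents are an arbitrary example value.

```python
import struct

buffer = bytes([0x1A, 0x2B])  # two bytes in the order they arrived

# Same bytes, two interpretations
as_big = int.from_bytes(buffer, "big")        # first byte is most significant
as_little = int.from_bytes(buffer, "little")  # first byte is least significant

# struct makes the byte order part of the format string
(struct_big,) = struct.unpack(">H", buffer)     # '>' = big-endian
(struct_little,) = struct.unpack("<H", buffer)  # '<' = little-endian

print(f"big-endian:    0x{as_big:04X} = {as_big}")        # 0x1A2B = 6699
print(f"little-endian: 0x{as_little:04X} = {as_little}")  # 0x2B1A = 11034
```

Stating the byte order explicitly in every call, rather than relying on the host's native order, keeps the parser correct regardless of the machine it runs on.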

Pitfall 2: Truncating Hex Values

# WRONG: Loses leading zeros
hex_str = hex(15)  # Returns '0xf' not '0x0f'

# RIGHT: Format with leading zeros
hex_str = f'0x{15:02x}'  # Returns '0x0f'

Sensor protocols often expect fixed-width hex values. Dropping leading zeros causes parsing errors.
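The effect on a multi-byte payload can be sketched as follows. The three payload bytes are made-up example values; `bytes.hex()` and `bytes.fromhex()` are standard library methods.

```python
payload = bytes([0x0F, 0x00, 0xA3])  # example payload (illustrative values)

# Naive: hex() drops leading zeros, so byte boundaries are lost
naive = "".join(hex(b)[2:] for b in payload)

# Fixed width: always two hex digits per byte
fixed = "".join(f"{b:02x}" for b in payload)

print(naive)  # f0a3   (looks like the two bytes 0xF0 0xA3)
print(fixed)  # 0f00a3 (three bytes, unambiguous)

# The fixed-width string round-trips; the naive one decodes to other bytes
print(bytes.fromhex(fixed) == payload)  # True
print(bytes.fromhex(naive) == payload)  # False
```

For whole buffers, `payload.hex()` produces the same fixed-width string directly.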

Pitfall 3: Character Encoding Mismatches

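# WRONG: Forces ASCII, so any non-ASCII character raises UnicodeEncodeError
data = sensor_name.encode('ascii')

# RIGHT: Specify the encoding explicitly
data = sensor_name.encode('utf-8')

IoT devices may have names with special characters. In Python 3, str.encode() already defaults to UTF-8, but stating the encoding explicitly documents the wire format, and the receiver must decode with the same encoding.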
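A mismatch on the decoding side is just as damaging. This sketch, using an example label, shows the same UTF-8 bytes decoded three ways:

```python
label = "25°C"                 # example text containing a non-ASCII character
wire = label.encode("utf-8")   # bytes as they would travel over the network

print(wire)                    # b'25\xc2\xb0C'
print(wire.decode("utf-8"))    # 25°C  (sender and receiver agree)
print(wire.decode("latin-1"))  # 25Â°C (wrong codec: silent garbling)

try:
    wire.decode("ascii")       # strict ASCII rejects bytes above 0x7F
except UnicodeDecodeError as err:
    print(f"ASCII decode failed: {err.reason}")
```

The Latin-1 case is the more dangerous one in practice because it raises no error; the garbled text simply flows downstream.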

Pitfall 4: Forgetting Sign Extension

uint8_t sensor_byte = 0xF6;  // Raw byte from sensor (-10 as signed)

// WRONG: Widening uint8_t directly zero-extends instead of sign-extending
int16_t temp = (int16_t)sensor_byte;  // Gets 246, not -10!

// RIGHT: Cast through int8_t first to trigger sign extension
int16_t temp = (int8_t)sensor_byte;   // Gets -10 (correct)

Temperature sensors often send signed 8-bit values. When the raw byte is stored as uint8_t, casting directly to a wider type zero-extends (0xF6 becomes 246). Cast through int8_t first so the compiler sign-extends the value correctly.
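Python integers have no fixed width, so server-side code must apply the sign explicitly. The helper below is a minimal sketch (the name `sign_extend` is hypothetical); `int.from_bytes` with `signed=True` is the standard-library alternative for whole bytes.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign_bit = 1 << (bits - 1)
    # Magnitude bits count positively, the sign bit counts negatively
    return (value & (sign_bit - 1)) - (value & sign_bit)


print(sign_extend(0xF6, 8))    # -10
print(sign_extend(0x0A, 8))    # 10
print(sign_extend(0x91A, 12))  # -1766 (works for non-byte widths too)

# Standard-library equivalent for byte-aligned values
print(int.from_bytes(b"\xF6", "big", signed=True))  # -10
```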


4.7 Quick Reference: Number System Comparison

The table below summarizes when and why to use each number system in IoT development:

Number System | Base | Digits Used | IoT Use Cases | Example
--- | --- | --- | --- | ---
Decimal | 10 | 0-9 | Sensor readings, user displays, calculations | Temperature: 25 degrees
Binary | 2 | 0, 1 | Bit flags, hardware registers, protocol fields | Status: 00011001
Hexadecimal | 16 | 0-9, A-F | Memory addresses, MAC addresses, color codes | Device ID: 0xA3F2

4.8 Prerequisites

This topic is an early foundation. You can start with:

  • General familiarity with decimal numbers and basic arithmetic
  • No prior IoT, electronics, or programming experience required

4.10 Key Takeaway

In one sentence: All IoT data - sensor readings, addresses, protocol headers - is ultimately binary, and mastering hex notation lets you read it efficiently since each hex digit represents exactly 4 binary bits.

Remember this rule: When debugging IoT systems, think in hex (0x prefix) for compact notation, but remember endianness matters - network protocols use big-endian (MSB first) while most ARM/Intel processors use little-endian (LSB first).


4.11 Try It Yourself: Decode a Sensor Packet

Hands-On Exercise: Parse a Real IoT Sensor Reading

You receive this raw data from a temperature/humidity sensor over I2C:

Raw bytes: 0x01 0x5E 0x02 0x8A

Your task: Decode the temperature and humidity values.

Protocol specification (from the sensor datasheet):

  • Bytes 0-1: Temperature (big-endian, unsigned 16-bit, value = raw / 10.0 °C)
  • Bytes 2-3: Humidity (big-endian, unsigned 16-bit, value = raw / 10.0 %)

Step 1: Combine Temperature Bytes

Temperature bytes: 0x01 0x5E (big-endian)

Combine: 0x015E = (1 × 256) + (94 × 1) = 350 raw

Step 2: Convert to Celsius

Temperature = 350 / 10.0 = 35.0 °C

Step 3: Combine Humidity Bytes

Humidity bytes: 0x02 0x8A (big-endian)

Combine: 0x028A = (2 × 256) + (138 × 1) = 650 raw

Step 4: Convert to Percentage

Humidity = 650 / 10.0 = 65.0 %

Complete Python Solution

import struct

# Raw bytes from sensor
raw_data = bytes([0x01, 0x5E, 0x02, 0x8A])

# Unpack as two big-endian unsigned 16-bit values
temp_raw, humid_raw = struct.unpack('>HH', raw_data)

# Convert to physical units
temperature = temp_raw / 10.0  # 35.0 °C
humidity = humid_raw / 10.0    # 65.0 %

print(f"Temperature: {temperature}°C")
print(f"Humidity: {humidity}%")

Output:

Temperature: 35.0°C
Humidity: 65.0%

Putting Numbers to It

For a sensor hub reading 50 temperature/humidity sensors every 10 seconds over I2C:

Daily transactions = 50 sensors x (86,400 seconds / 10-second interval) = 432,000 sensor reads per day

Each transaction transfers 4 bytes. Assume the MCU runs at 80 MHz and each byte costs ~20 clock cycles (I2C overhead + byte parsing):

CPU time per day = 432,000 transactions x 4 bytes x (20 cycles / 80,000,000 cycles per second) = 0.432 seconds

That’s only 0.0005% of the MCU’s time! But if you misjudge endianness and add unneeded byte-swap logic on every read, you could double the processing cost (now 40 cycles/byte), extending active time from 0.432 to 0.864 seconds daily. For a battery-powered hub drawing 10 mA during processing, the doubled workload consumes 0.0024 mAh per day (0.864 s × 10 mA / 3600 s per hour), of which half, 0.0012 mAh, is pure overhead. That is about 0.44 mAh per year: negligible for one device with a 2,000 mAh battery. Multiplied across 10,000 devices in a smart building deployment, it adds up to roughly 4,400 mAh wasted annually, the capacity of about 2.2 such batteries.
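The arithmetic in this section can be reproduced with a short script. It is a sketch that uses only the figures assumed in the text (50 sensors, 10-second interval, 4 bytes per read, 20 or 40 cycles per byte, 80 MHz, 10 mA); the function names are illustrative.

```python
SENSORS = 50
INTERVAL_S = 10
BYTES_PER_READ = 4
CLOCK_HZ = 80_000_000
ACTIVE_MA = 10
SECONDS_PER_DAY = 86_400

reads_per_day = SENSORS * (SECONDS_PER_DAY // INTERVAL_S)


def cpu_seconds_per_day(cycles_per_byte: int) -> float:
    """Active CPU time per day for a given per-byte processing cost."""
    return reads_per_day * BYTES_PER_READ * cycles_per_byte / CLOCK_HZ


def charge_mah(active_seconds: float) -> float:
    """Charge drawn while active: seconds x mA, converted to mAh."""
    return active_seconds * ACTIVE_MA / 3600


baseline_s = cpu_seconds_per_day(20)
doubled_s = cpu_seconds_per_day(40)

print(f"Reads per day:  {reads_per_day:,}")                           # 432,000
print(f"Baseline time:  {baseline_s:.3f} s")                          # 0.432 s
print(f"Duty cycle:     {baseline_s / SECONDS_PER_DAY * 100:.4f} %")  # 0.0005 %
print(f"Doubled time:   {doubled_s:.3f} s")                           # 0.864 s
print(f"Charge per day: {charge_mah(baseline_s):.4f} mAh vs "
      f"{charge_mah(doubled_s):.4f} mAh")                             # 0.0012 vs 0.0024
```

Keeping the assumptions as named constants makes it easy to rerun the estimate for a different clock speed, polling interval, or fleet size.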


4.12 For Beginners: Why Should I Care About Binary and Hex?

Scenario 1: Debugging a Temperature Sensor

You connect a DHT22 temperature sensor to your ESP32, but the readings look wrong. The serial monitor shows: Raw data: 0x01 0x90 0x00 0xC8

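Without understanding hex and binary, this is meaningless. With this knowledge, you can decode it:

  • 0x0190 = 400 in decimal; divided by 10, that is 40.0
  • 0x00C8 = 200 in decimal; divided by 10, that is 20.0

Read as temperature then humidity, this gives 40.0 degrees Celsius and 20.0% relative humidity (wait, that seems too hot and too dry for a room…). The DHT22 transmits humidity first, so the correct reading is 40.0% relative humidity and 20.0 degrees Celsius.

Now you know to check the field order in your decoding code!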

Scenario 2: Reading a Datasheet

The datasheet for your accelerometer says: “Write 0x2D to register 0x1F to enable measurements.”

If you don’t understand hex, you can’t configure the sensor. With this knowledge, you know:

  • Register address: 0x1F = 31 in decimal
  • Configuration value: 0x2D = 00101101 in binary, where each bit enables different features
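To see which bits a configuration value turns on, a small sketch like the following helps. The helper name `set_bits` is hypothetical, and what each bit means depends on the specific datasheet, which this example does not assume.

```python
def set_bits(value: int, width: int = 8):
    """Return the positions (0 = least significant) of the bits that are 1."""
    return [bit for bit in range(width) if value & (1 << bit)]


register = 0x1F  # register address from the datasheet sentence
config = 0x2D    # value to write

print(register)           # 31
print(f"{config:08b}")    # 00101101
print(set_bits(config))   # [0, 2, 3, 5]
```

With the bit positions in hand, each one can be looked up in the register description table of the datasheet.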

Scenario 3: Analyzing Network Traffic

Your IoT device sends a packet: 48 65 6C 6C 6F

Using an ASCII table (covered in the text encoding chapter), you can decode this as “Hello” - crucial for debugging communication issues.
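The same lookup can be done in Python with the standard `bytes.fromhex` method, as in this minimal sketch:

```python
dump = "48 65 6C 6C 6F"          # hex dump copied from the packet capture

payload = bytes.fromhex(dump)    # whitespace between byte pairs is ignored
text = payload.decode("ascii")   # valid only if every byte is below 0x80

for byte in payload:
    print(f"0x{byte:02X} -> {chr(byte)!r}")

print(text)  # Hello
```

If `decode("ascii")` raises an error, the payload is probably UTF-8 text or binary data rather than plain ASCII.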


4.13 Learning Path Visualization

The diagram below shows how the three chapters in this topic build upon each other:

Learning path diagram with three connected boxes: Number Systems (foundation) leads to Text Encoding (intermediate) leads to Bitwise Operations and Endianness (advanced)

Figure 3: Learning path through data representation - Start with number systems (the foundation), then learn how text is encoded as numbers, and finally master the bit-level operations used in real IoT protocols.


4.14 Knowledge Check

Test your understanding of data representation fundamentals before diving into the detailed chapters.

Question 1: What is the hexadecimal representation of the binary number 11110000?

    A. 0xF0
    B. 0x0F
    C. 0xFF
    D. 0x240
Show Answer

Correct: A) 0xF0

Split the binary into groups of 4 bits:

  • 1111 = F (15 in decimal)
  • 0000 = 0

So 11110000 = 0xF0

Question 2: If a sensor sends the value 0x2A over I2C, what decimal number is this?

    A. 2
    B. 10
    C. 42
    D. 26
Show Answer

Correct: C) 42

0x2A = (2 x 16) + (A x 1) = (2 x 16) + (10 x 1) = 32 + 10 = 42

Fun fact: This is also “the answer to life, the universe, and everything” from The Hitchhiker’s Guide to the Galaxy!

Question 3: Why do IoT engineers prefer hexadecimal over decimal when reading memory dumps?

    A. Hexadecimal uses fewer digits
    B. Each hex digit maps exactly to 4 binary bits
    C. Hexadecimal is easier to pronounce
    D. Computers internally use hexadecimal
Show Answer

Correct: B) Each hex digit maps exactly to 4 binary bits

This 4-bit mapping makes it trivial to convert between hex and binary mentally. For example, 0xB7 immediately tells you the binary is 1011 0111 - just replace each hex digit with its 4-bit equivalent.

Question 4: A LoRaWAN device sends 0x1A2B as a 16-bit value. On a little-endian processor, which byte is stored at the lower memory address?

    A. 0x1A (the most significant byte)
    B. 0x2B (the least significant byte)
    C. Both bytes are stored at the same address
    D. It depends on the compiler
Show Answer

Correct: B) 0x2B (the least significant byte)

In little-endian systems (Intel, ARM), the least significant byte (LSB) is stored first (at the lower address). So 0x1A2B is stored as:

  • Address 0: 0x2B (LSB)
  • Address 1: 0x1A (MSB)

This is why cross-platform IoT communication must explicitly handle byte order conversion.

Question 5: Which bitwise operation would you use to check if bit 3 (value 8) is set in a status register?

    A. status | 0x08
    B. status & 0x08
    C. status ^ 0x08
    D. status >> 3
Show Answer

Correct: B) status & 0x08

The AND operation with a mask isolates specific bits:

  • If bit 3 is set: status & 0x08 returns 0x08 (non-zero/true)
  • If bit 3 is clear: status & 0x08 returns 0x00 (zero/false)

Common patterns: AND to check/clear, OR to set, XOR to toggle.
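These four patterns can be sketched as small Python helpers. The function names and the example status value are illustrative; the 8-bit register width is an assumption.

```python
def is_set(reg: int, bit: int) -> bool:
    """AND with a mask to test one bit."""
    return bool(reg & (1 << bit))


def set_bit(reg: int, bit: int) -> int:
    """OR with a mask to force one bit to 1."""
    return reg | (1 << bit)


def clear_bit(reg: int, bit: int, width: int = 8) -> int:
    """AND with the inverted mask to force one bit to 0."""
    # Mask to the register width because ~ yields a negative Python int
    return reg & ~(1 << bit) & ((1 << width) - 1)


def toggle_bit(reg: int, bit: int) -> int:
    """XOR with a mask to flip one bit."""
    return reg ^ (1 << bit)


status = 0b0001_0001                    # example register value, 0x11
print(is_set(status, 3))                # False
status = set_bit(status, 3)
print(f"0x{status:02X}")                # 0x19
print(is_set(status, 3))                # True
print(f"0x{clear_bit(status, 3):02X}")  # 0x11
print(f"0x{toggle_bit(status, 3):02X}") # 0x11
```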

Question 6: A UTF-8 encoded string “°C” (degree symbol + C) requires how many bytes?

    A. 2 bytes (one per character)
    B. 3 bytes (2 for ° + 1 for C)
    C. 4 bytes (2 for each character)
    D. 1 byte (ASCII compatible)
Show Answer

Correct: B) 3 bytes (2 for ° + 1 for C)

In UTF-8:

  • The degree symbol (°, U+00B0) requires 2 bytes: 0xC2 0xB0
  • The letter C (ASCII) requires 1 byte: 0x43
  • Total: 3 bytes

This is why buffer sizing for UTF-8 text must account for multi-byte characters - a 10-character string might need up to 40 bytes!
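A quick Python check confirms the byte counts. The helper `utf8_len` is a hypothetical name, and the thermometer emoji is just one example of a character that needs 4 bytes.

```python
MAX_UTF8_BYTES_PER_CHAR = 4  # upper bound for any Unicode code point in UTF-8


def utf8_len(text: str) -> int:
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


print(utf8_len("C"))    # 1
print(utf8_len("°"))    # 2
print(utf8_len("°C"))   # 3
print("°C".encode("utf-8").hex(" "))  # c2 b0 43

# Worst case: ten characters that each need 4 bytes
worst = "\U0001F321" * 10   # ten thermometer emoji (U+1F321)
print(len(worst), utf8_len(worst))    # 10 40
```

Sizing a buffer as the character limit times `MAX_UTF8_BYTES_PER_CHAR` is a safe upper bound when the character set is not known in advance.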

4.15 What’s Next

Topic | Chapter | Description
--- | --- | ---
Number Systems | Number Systems and Data Units | Convert between binary, decimal, and hexadecimal; classify bits, bytes, and word sizes
Text Encoding | Text Encoding for IoT | Compare ASCII, Unicode, and UTF-8 encoding schemes for IoT string handling
Bitwise Operations | Bitwise Operations and Endianness | Implement AND, OR, XOR, shifts, and masks; apply endianness conversion
Packet Framing | Packet Structure and Framing | Examine how binary data is packaged into protocol headers and payloads
Data Formats | Data Formats for IoT | Evaluate JSON, CBOR, and Protobuf trade-offs for constrained devices
Sensor Pipeline | Sensor to Network Pipeline | Trace data flow from analog sensor readings through digital encoding to cloud storage

4.16 Summary

Data representation is the invisible foundation that makes all IoT systems work. Every smart device - from a simple temperature sensor to a complex autonomous vehicle - relies on the same fundamental concepts:

  • Binary is the native language of all digital systems
  • Hexadecimal provides a compact, human-readable way to work with binary data
  • Text encoding (ASCII, Unicode) maps human-readable characters to numbers
  • Endianness determines how multi-byte values are stored and transmitted
  • Bitwise operations enable efficient manipulation of individual bits and flags

Mastering these concepts transforms you from someone who copies code samples into someone who truly understands what their IoT devices are doing at the most fundamental level.