4 Data Representation Fundamentals
4.1 Learning Objectives
By the end of this topic, you will be able to:
- Convert between number systems: Translate numbers between binary, decimal, and hexadecimal representations
- Classify data sizing units: Differentiate bits, bytes, nibbles, and word sizes across processor architectures
- Compare text encoding schemes: Evaluate ASCII, Unicode, and UTF-8 trade-offs for IoT applications
- Apply endianness concepts: Identify big-endian vs little-endian byte ordering in cross-platform communication
- Perform bitwise operations: Implement AND, OR, XOR, shifts, and masks for IoT register manipulation
- Interpret protocol specifications: Decode hexadecimal values in datasheets, memory dumps, and protocol documentation
- Troubleshoot data representation errors: Diagnose common bugs caused by sign extension, truncation, and encoding mismatches
Key Concepts
- Binary is the ground truth: Sensors, registers, protocol fields, and memory dumps all resolve to bit patterns even when tools show them as decimal or hex.
- Hex is the debugging shortcut: Each hex digit maps to exactly 4 bits, which makes packet dumps, register maps, and device addresses much easier to read than long binary strings.
- Representation affects interoperability: Signed vs. unsigned types, endianness, and text encoding rules determine whether two systems interpret the same bytes the same way.
- Compact encoding is often intentional: Constrained devices commonly use fixed-width integers or fixed-point scaling instead of verbose strings or floating-point values.
- Most bugs are interpretation bugs: Common failures come from dropped leading zeros, wrong byte order, incorrect sign extension, or assuming ASCII when the payload is UTF-8 or binary.
- Datasheets speak in bytes and bits: Reading real hardware documentation requires comfort with register addresses, masks, field widths, and protocol examples shown in hex.
- The practical workflow is consistent: Observe the raw bytes, decode structure and byte order, apply scale factors, then verify the result against realistic sensor ranges.
4.2 Chapter Scope (Avoiding Duplicate Deep Dives)
This chapter gives the data representation foundation needed across IoT.
- Use this chapter to understand number systems, encoding, and byte-level interpretation.
- Use Data Formats for IoT for format trade-offs (JSON/CBOR/Protobuf).
- Use Packet Structure and Framing for protocol-level encapsulation and framing details.
MVU: Data Representation in IoT
Core Concept: All IoT data - sensor readings, protocol headers, device addresses, and configuration settings - is ultimately stored and transmitted as patterns of binary digits (0s and 1s), with hexadecimal notation providing a human-readable shorthand where each hex digit represents exactly 4 binary bits.
Why It Matters: When you debug a temperature sensor reading of 0x1A3B over I2C, analyze a LoRaWAN packet starting with 0x40, or configure a device register at address 0xFF, you’re working directly with binary data. Engineers who can’t fluently read hex notation are effectively blind to what their devices are actually doing.
Key Takeaway: Master the “4-bit rule” - every hex digit (0-F) maps to exactly 4 binary bits. Once this becomes second nature, you can instantly decode any datasheet, protocol dump, or error log that shows hex values.
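To make the 4-bit rule concrete, the sketch below expands each hex digit of 0x1A3B (the I2C reading mentioned above) into its 4-bit group, then regroups a binary string back into hex. The function names are illustrative, not from any library, and the code assumes Python 3.9+ (for `str.removeprefix`).

```python
def hex_to_nibbles(hex_str: str) -> list[str]:
    """Expand each hex digit into its 4-bit binary group."""
    digits = hex_str.lower().removeprefix("0x")
    # format(..., "04b") turns one hex digit into exactly 4 bits
    return [format(int(d, 16), "04b") for d in digits]


def nibbles_to_hex(bits: str) -> str:
    """Group a binary string into 4-bit chunks and map each to one hex digit."""
    bits = bits.replace(" ", "")
    # Left-pad to a multiple of 4; leading zeros don't change the value
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "0x" + "".join(format(int(g, 2), "X") for g in groups)


print(" ".join(hex_to_nibbles("0x1A3B")))     # 0001 1010 0011 1011
print(nibbles_to_hex("0001 1010 0011 1011"))  # 0x1A3B
```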
No-One-Left-Behind Practice Loop
- Read one concept in decimal language first.
- Translate it to binary and hex manually once.
- Verify with one real packet/register example.
- Reinforce with one quiz or simulation before moving on.
Connect with Learning Hubs
Explore Further:
- Knowledge Map: See where data representation fits in the IoT landscape at Knowledge Map
- Simulations: Try the Protocol Comparison Tool to see format overhead differences
- Quizzes: Test your understanding with Fundamentals Quizzes
- Videos: Watch encoding/decoding demonstrations in the Video Gallery
4.3 Topic Overview
Data representation is the foundation of all IoT programming. Every sensor reading, protocol header, memory address, and configuration value is ultimately stored as patterns of 0s and 1s. Mastering these concepts enables you to read datasheets, debug protocols, and write efficient embedded code.
Worked Example: Debugging a Sensor I2C Communication
Scenario: Your temperature sensor (I2C address 0x48) returns garbled readings. Logic analyzer shows: 0x91 0xA3.
Step 1: Convert hex to binary
- 0x91 = 1001 0001 (binary)
- 0xA3 = 1010 0011 (binary)
Step 2: Understand the sensor datasheet format
Datasheet says: “Temperature = 12-bit value, MSB first, in bits [15:4], lower 4 bits unused”
Step 3: Extract the temperature bits
- Combine bytes: 1001 0001 1010 0011 (16 bits)
- Take bits [15:4]: 1001 0001 1010 (12 bits)
- Convert to decimal: 0x91A = 2330
Step 4: Apply the scale factor
Datasheet: “Temperature (°C) = raw_value × 0.0625”
- 2330 × 0.0625 = 145.625°C
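Problem identified: This is impossible for room temperature! The sensor actually uses two’s complement for negative values, and bit 15 (the sign bit of the 12-bit field) is 1.
Correct interpretation (two’s complement):
- Raw value is negative: 2330 - 4096 = -1766 (subtract 2^12 for a 12-bit field; equivalently, inverting the 12 bits and adding 1 gives the magnitude 1766)
- Temperature: -1766 × 0.0625 = -110.375°C
This value is still physically implausible, so the decoding is now correct but the hardware is not.
Root cause: The sensor’s VDD power pin was never connected, so the sensor was reading its own internal voltage drift as extreme cold.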
Key lesson: Understanding binary/hex representation is essential for hardware debugging. Without this knowledge, you can’t interpret sensor outputs correctly.
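The four decoding steps, plus the two’s complement correction, fit in a few lines of Python. This is a sketch assuming the datasheet layout quoted above (12-bit field in bits [15:4], 0.0625 °C per LSB); `decode_12bit_temp` is a hypothetical helper name.

```python
def decode_12bit_temp(msb: int, lsb: int, signed: bool = True) -> float:
    """Decode a 12-bit, MSB-first temperature stored in bits [15:4] of a 16-bit word."""
    word = (msb << 8) | lsb      # 0x91, 0xA3 -> 0x91A3
    raw = word >> 4              # keep bits [15:4] -> 0x91A = 2330
    if signed and raw & 0x800:   # bit 11 of the field is bit 15 of the word
        raw -= 1 << 12           # two's complement: subtract 2^12
    return raw * 0.0625          # datasheet scale factor


print(decode_12bit_temp(0x91, 0xA3, signed=False))  # 145.625
print(decode_12bit_temp(0x91, 0xA3))                # -110.375
```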
Common Mistake: Confusing Hex Digits with Decimal
Problem: Engineer reads datasheet showing I2C address 0x50 and writes code to communicate with decimal address 50.
Why this fails:
- 0x50 (hexadecimal) = 80 in decimal
- Decimal 50 = 0x32 (hex)
- The device at address 0x50 never responds because the code is addressing the wrong device
Real-world example: A student spent 6 hours debugging an EEPROM communication issue before realizing they used decimal 36 instead of hex 0x36 (54 decimal). The EEPROM was at address 54, not 36.
Fix: Always use 0x prefix for hex in code. If datasheet shows hex without prefix, add it: Datasheet “Address: 50h” → Code: 0x50.
What to observe: In debug output, verify addresses match datasheet: I2C address: 0x50 (80 decimal). If unsure, use a calculator app in programmer mode.
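A small parsing helper removes the guesswork when copying values out of a datasheet. This is a sketch: `parse_datasheet_hex` is a hypothetical name, and treating a bare value with no prefix or suffix as hex is an assumption you should check against the datasheet’s notation conventions.

```python
# The same digits mean different numbers in different bases
print(0x50, 0x32)  # 80 50


def parse_datasheet_hex(text: str) -> int:
    """Parse a datasheet hex value such as '50h', '0x50', or '50' into an int.

    Assumption: a bare value with no prefix or suffix is hex.
    """
    text = text.strip().lower()
    if text.endswith("h"):
        text = text[:-1]    # drop the trailing 'h' suffix
    return int(text, 16)    # base 16 also accepts an optional '0x' prefix


addr = parse_datasheet_hex("50h")
print(f"I2C address: 0x{addr:02X} ({addr} decimal)")  # I2C address: 0x50 (80 decimal)
```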
4.4 Concept Relationships
| This Concept | Builds On | Leads To | Contrasts With |
|---|---|---|---|
| Binary | Boolean logic, digital circuits | Hexadecimal, bitwise operations | Decimal (base-10) human counting |
| Hexadecimal | Binary representation | Protocol specifications, memory addresses | Base-10 arithmetic |
| Bitwise Operations | Binary fundamentals | Embedded programming, register configuration | Arithmetic operations |
| Endianness | Binary byte ordering | Network byte order, protocol parsing | Architecture-agnostic formats |
| Text Encoding | Character sets | String handling, internationalization | Binary data encoding |
See Also
Next Steps:
- Data Formats for IoT - Apply binary/hex to real format choices (JSON, CBOR, Protobuf)
- Packet Anatomy - See how binary data forms packet headers and payloads
- Bitwise Operations and Endianness - Deep dive into AND, OR, XOR, shifts, and masks
Protocol Applications:
- I2C Protocol - 7-bit addressing and hex register access
- SPI Communication - Binary commands and register writes
- LoRaWAN Network Architecture - Device identity and LPWAN framing context
Tools:
- Hex-to-Binary Converter - Online conversion tool
- IEEE 754 Floating Point Visualizer - See binary float representation
- ASCII Table Reference - Character encoding lookup
The diagram below shows how the same value can be written in three different ways - just like how “twelve” and “12” and “XII” (Roman numerals) all mean the same number!
Figure 1: The Three Number Systems in IoT - Decimal is familiar to humans, binary is native to computers, and hexadecimal serves as the practical bridge between them. Each hex digit represents exactly 4 binary bits, making conversions quick and intuitive.
This topic is divided into three focused chapters:
4.4.1 Number Systems and Data Units
Difficulty: Beginner | ~18 minutes
Learn the three number systems essential for IoT development:
- Binary (Base 2): How computers store and process all data
- Decimal (Base 10): Human-friendly representation for debugging
- Hexadecimal (Base 16): Compact notation where each digit = 4 binary bits
- Bits, Bytes, Words: Understanding data sizing and processor architectures
- Conversion Practice: Step-by-step methods for translating between systems
4.4.2 Text Encoding for IoT
Difficulty: Intermediate | ~10 minutes
Understand how text becomes numbers in IoT systems:
- ASCII: The original 7-bit standard (128 characters)
- Unicode and UTF-8: Universal character encoding for all languages
- Data Format Efficiency: Comparing JSON, CBOR, and binary encodings
- Common Pitfalls: Buffer sizing, encoding mismatches, and garbled text
4.4.3 Bitwise Operations and Endianness
Difficulty: Intermediate | ~21 minutes
Master the low-level skills for hardware programming:
- Endianness: Big-endian (network) vs little-endian (Intel/ARM) byte ordering
- Bitwise AND, OR, XOR, NOT: Fundamental operations with truth tables
- Bit Manipulation Patterns: Setting, clearing, toggling, and checking bits
- Real-World Examples: Sensor status bytes, LoRaWAN flag packing, cross-platform bugs
- Python struct Module: Binary data parsing for server-side code
4.5 IoT Data Representation Ecosystem
The following diagram illustrates how data representation concepts connect across the entire IoT stack - from physical sensors to cloud analytics:
Figure 2: The IoT Data Representation Pipeline - Data flows from analog sensors through digital conversion (ADC), device registers, network protocols, and finally to cloud storage. Each stage uses specific data representation formats: binary for hardware, hexadecimal for debugging, UTF-8/CBOR for network encoding.
4.6 Common Pitfalls
Avoid These Data Representation Mistakes
Pitfall 1: Ignoring Endianness (and Strict Aliasing)
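#include <stdint.h>
#include <string.h>     // memcpy
#include <arpa/inet.h>  // ntohs (POSIX; embedded SDKs may provide it elsewhere)
// WRONG: Assumes same byte order AND violates strict aliasing
// Dereferencing a char buffer through a uint16_t* is undefined behavior in C
uint16_t value = *(uint16_t*)buffer;
// RIGHT: Use memcpy to avoid aliasing issues, then convert byte order
uint16_t value;
memcpy(&value, buffer, sizeof(value));
value = ntohs(value);
Network protocols use big-endian (MSB first), but most ARM/Intel processors use little-endian. Always convert explicitly. Additionally, accessing a byte buffer through a uint16_t* violates C’s strict aliasing rule, and on some MCUs a misaligned 16-bit load can fault; memcpy performs a type-safe byte reinterpretation. If your toolchain lacks ntohs, assembling the value as (buffer[0] << 8) | buffer[1] is portable and avoids both problems.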
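Server-side Python has no aliasing rules to worry about, but byte order still has to be stated explicitly. This sketch decodes the same two bytes as big-endian and as little-endian, and also shows the manual shift-and-OR form that works in any language.

```python
buffer = bytes([0x01, 0x5E])  # two bytes in the order they arrived on the wire

big = int.from_bytes(buffer, byteorder="big")        # network order: 0x015E
little = int.from_bytes(buffer, byteorder="little")  # misread as 0x5E01
manual = (buffer[0] << 8) | buffer[1]                # explicit big-endian assembly

print(big, little, manual)  # 350 24065 350
```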
Pitfall 2: Truncating Hex Values
# WRONG: Loses leading zeros
hex_str = hex(15) # Returns '0xf' not '0x0f'
# RIGHT: Format with leading zeros
hex_str = f'0x{15:02x}' # Returns '0x0f'
Sensor protocols often expect fixed-width hex values. Dropping leading zeros causes parsing errors.
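The same fixed-width rule applies when printing whole packets. A minimal sketch: `hex_dump` is an illustrative helper, while `bytes.hex()` with a separator argument is standard from Python 3.8.

```python
def hex_dump(data: bytes) -> str:
    """Format bytes as two-digit uppercase hex, separated by spaces."""
    return " ".join(f"{b:02X}" for b in data)


packet = bytes([0x40, 0x0F, 0x00, 0xA3])
print(hex_dump(packet))             # 40 0F 00 A3
print(packet.hex(" ").upper())      # 40 0F 00 A3 (same result via bytes.hex)
print(hex(0x0F), f"0x{0x0F:02X}")   # 0xf 0x0F
```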
Pitfall 3: Character Encoding Mismatches
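# WRONG: Assumes ASCII for all text
data = sensor_name.encode('ascii') # Raises UnicodeEncodeError for names like 'Büro-Sensor'
# RIGHT: Specify encoding explicitly
data = sensor_name.encode('utf-8')
IoT devices may have names with special characters, so always specify UTF-8. Python 3’s str.encode() already defaults to UTF-8, but other defaults may not (open() uses a locale-dependent encoding, and other languages and libraries vary), so an explicit argument documents intent and prevents surprises.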
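Encoding choice also affects buffer sizing, because UTF-8 uses more than one byte for non-ASCII characters. A quick check, using an example sensor name chosen for illustration:

```python
name = "Büro-Sensor"  # example sensor name with one non-ASCII character

try:
    name.encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII failed:", err.reason)  # ASCII failed: ordinal not in range(128)

encoded = name.encode("utf-8")
print(len(name), len(encoded))          # 11 12 ('ü' takes 2 bytes in UTF-8)
print(encoded.decode("utf-8") == name)  # True: round-trips cleanly
```

A buffer sized by character count (11) would be one byte too small for the encoded payload (12).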
Pitfall 4: Forgetting Sign Extension
uint8_t sensor_byte = 0xF6; // Raw byte from sensor (-10 as signed)
// WRONG: Widening uint8_t directly zero-extends instead of sign-extending
int16_t temp = (int16_t)sensor_byte; // Gets 246, not -10!
// RIGHT: Cast through int8_t first to trigger sign extension
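int16_t temp = (int8_t)sensor_byte; // Gets -10 (correct)
Temperature sensors often send signed 8-bit values. When the raw byte is stored as uint8_t, casting directly to a wider type zero-extends (0xF6 becomes 246). Cast through int8_t first so the compiler sign-extends the value correctly. Strictly speaking, converting 246 to int8_t is implementation-defined in standard C, but mainstream compilers such as GCC and Clang reduce it modulo 256 to -10.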
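Python integers have no fixed width, so sign extension must be done explicitly. The XOR-and-subtract idiom below handles any field width, including the 12-bit sensor field from the worked example; `sign_extend` is an illustrative name, and `int.from_bytes(..., signed=True)` is the standard-library alternative for whole bytes.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1               # keep only the field's bits
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit   # flip, then subtract the sign weight


raw = 0xF6
print(raw)                     # 246 (unsigned reading)
print(sign_extend(raw, 8))     # -10
print(sign_extend(0x91A, 12))  # -1766 (the 12-bit field from the worked example)
print(int.from_bytes(bytes([raw]), "big", signed=True))  # -10
```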
4.7 Quick Reference: Number System Comparison
The table below summarizes when and why to use each number system in IoT development:
| Number System | Base | Digits Used | IoT Use Cases | Example |
|---|---|---|---|---|
| Decimal | 10 | 0-9 | Sensor readings, user displays, calculations | Temperature: 25 degrees |
| Binary | 2 | 0, 1 | Bit flags, hardware registers, protocol fields | Status: 00011001 |
| Hexadecimal | 16 | 0-9, A-F | Memory addresses, MAC addresses, color codes | Device ID: 0xA3F2 |
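Python’s built-in format(), bin(), hex(), and int() cover every conversion in the table above. A minimal sketch using values from the table:

```python
value = 25  # decimal, e.g. the temperature from the table

print(format(value, "08b"))    # 00011001 (binary, padded to 8 bits)
print(format(value, "02X"))    # 19 (hex, padded to 2 digits)
print(bin(value), hex(value))  # 0b11001 0x19 (built-ins drop leading zeros)

# int() converts back when given the base explicitly
print(int("00011001", 2), int("19", 16), int("0xA3F2", 16))  # 25 25 41970
```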
4.8 Prerequisites
This topic is an early foundation. You can start with:
- General familiarity with decimal numbers and basic arithmetic
- No prior IoT, electronics, or programming experience required
4.10 Key Takeaway
In one sentence: All IoT data - sensor readings, addresses, protocol headers - is ultimately binary, and mastering hex notation lets you read it efficiently since each hex digit represents exactly 4 binary bits.
Remember this rule: When debugging IoT systems, think in hex (0x prefix) for compact notation, but remember endianness matters - network protocols use big-endian (MSB first) while most ARM/Intel processors use little-endian (LSB first).
4.11 Try It Yourself: Decode a Sensor Packet
Hands-On Exercise: Parse a Real IoT Sensor Reading
You receive this raw data from a temperature/humidity sensor over I2C:
Raw bytes: 0x01 0x5E 0x02 0x8A
Your task: Decode the temperature and humidity values.
Protocol specification (from the sensor datasheet):
- Bytes 0-1: Temperature (big-endian, unsigned 16-bit, value = raw / 10.0 °C)
- Bytes 2-3: Humidity (big-endian, unsigned 16-bit, value = raw / 10.0 %)
Step 1: Combine Temperature Bytes
Temperature bytes: 0x01 0x5E (big-endian)
0x015E = (1 × 256) + (94 × 1) = 350 raw
Step 2: Convert to Celsius
Temperature = 350 / 10.0 = 35.0 °C
Step 3: Combine Humidity Bytes
Humidity bytes: 0x02 0x8A (big-endian)
0x028A = (2 × 256) + (138 × 1) = 650 raw
Step 4: Convert to Percentage
Humidity = 650 / 10.0 = 65.0 %
Complete Python Solution
import struct
# Raw bytes from sensor
raw_data = bytes([0x01, 0x5E, 0x02, 0x8A])
# Unpack as two big-endian unsigned 16-bit values
temp_raw, humid_raw = struct.unpack('>HH', raw_data)
# Convert to physical units
temperature = temp_raw / 10.0 # 35.0 °C
humidity = humid_raw / 10.0 # 65.0 %
print(f"Temperature: {temperature}°C")
print(f"Humidity: {humidity}%")
Output:
Temperature: 35.0°C
Humidity: 65.0%
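It is worth seeing what the wrong byte order produces, because it ties into the last step of the practical workflow: verifying results against realistic sensor ranges. This sketch decodes the same bytes as little-endian and applies a range check; the -40 to 125 °C and 0 to 100 %RH limits and the `plausible` helper are assumptions for illustration, not values from the datasheet.

```python
import struct

raw_data = bytes([0x01, 0x5E, 0x02, 0x8A])

# Same bytes, wrong format string: '<' means little-endian
temp_raw, humid_raw = struct.unpack('<HH', raw_data)
print(temp_raw / 10.0, humid_raw / 10.0)  # 2406.5 3533.0


def plausible(temp_c: float, humidity: float) -> bool:
    """Sanity check against an assumed range of -40..125 °C and 0..100 %RH."""
    return -40.0 <= temp_c <= 125.0 and 0.0 <= humidity <= 100.0


print(plausible(35.0, 65.0), plausible(temp_raw / 10.0, humid_raw / 10.0))  # True False
```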
Putting Numbers to It
For a sensor hub reading 50 temperature/humidity sensors every 10 seconds over I2C:
Daily transactions = 50 sensors x (86,400 seconds / 10-second interval) = 432,000 sensor reads per day
Each transaction transfers 4 bytes. Assuming the MCU processes at 80 MHz and each byte takes ~20 clock cycles (I2C overhead + byte parsing):
CPU time per day = 432,000 transactions x 4 bytes x (20 cycles / 80,000,000 cycles per second) = 0.432 seconds
That’s only about 0.0005% of the MCU’s day (0.432 s out of 86,400 s)! But if you misunderstand endianness and add unnecessary byte-swap logic on every read, you double the processing cost (now 40 cycles/byte), extending active time from 0.432 to 0.864 seconds daily. Only the extra 0.432 seconds is waste: for a battery-powered hub drawing 10 mA during processing, that is 0.0012 mAh per day (0.432 s × 10 mA / 3600 s/h), or about 0.44 mAh/year. That is negligible for one device with a 2,000 mAh battery, but multiplied across 10,000 devices in a smart building deployment it comes to about 4,380 mAh wasted annually, equivalent to roughly 2.2 extra battery replacements.
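Back-of-envelope energy estimates are easy to mis-add by hand, so this sketch recomputes the figures from the stated assumptions (50 sensors, 10 s interval, 4 bytes per read, 80 MHz, 20 vs 40 cycles/byte, 10 mA, 10,000 devices), counting only the extra CPU time as waste. The constant and function names are illustrative.

```python
# Assumptions taken from the estimate above
SENSORS = 50
INTERVAL_S = 10
BYTES_PER_READ = 4
CPU_HZ = 80_000_000
ACTIVE_CURRENT_MA = 10
SECONDS_PER_DAY = 86_400
FLEET_SIZE = 10_000

reads_per_day = SENSORS * SECONDS_PER_DAY // INTERVAL_S


def cpu_seconds_per_day(cycles_per_byte: int) -> float:
    """CPU time per day spent handling sensor bytes at a given cycle cost."""
    return reads_per_day * BYTES_PER_READ * cycles_per_byte / CPU_HZ


baseline = cpu_seconds_per_day(20)   # correct parsing
with_swap = cpu_seconds_per_day(40)  # unnecessary byte swap doubles the cost

# Only the extra CPU time is waste; mAh = seconds * mA / 3600
extra_mah_per_day = (with_swap - baseline) * ACTIVE_CURRENT_MA / 3600
extra_mah_per_year = extra_mah_per_day * 365

print(reads_per_day, baseline, with_swap)    # 432000 0.432 0.864
print(f"{extra_mah_per_day:.4f} mAh/day")    # 0.0012 mAh/day
print(f"{extra_mah_per_year:.2f} mAh/year")  # 0.44 mAh/year
print(f"{extra_mah_per_year * FLEET_SIZE:,.0f} mAh/year for the fleet")  # 4,380 mAh/year for the fleet
```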
4.12 For Beginners: Why Should I Care About Binary and Hex?
Real-World Scenarios Where This Matters
Scenario 1: Debugging a Temperature Sensor
You connect a DHT22 temperature sensor to your ESP32, but the readings look wrong. The serial monitor shows: Raw data: 0x01 0x90 0x00 0xC8
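Without understanding hex and binary, this is meaningless. With this knowledge, you can decode it:
- 0x0190 = 400 in decimal
- 0x00C8 = 200 in decimal
If you assume temperature comes first, you get 40.0 degrees Celsius and 20.0% relative humidity (wait, that seems too low…). The DHT22 actually sends humidity first, so the reading is 40.0% relative humidity and 20.0 degrees Celsius, which is perfectly plausible indoors.
Now you know to check your field order against the datasheet!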
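A minimal decoder sketch, assuming the commonly documented DHT22/AM2302 frame layout: humidity first, temperature second, then a checksum equal to the low byte of the sum of the first four bytes, with negative temperatures flagged by bit 15 (sign-magnitude). The serial monitor above showed only four bytes, so the checksum byte 0x59 here is computed for illustration, not captured; `decode_dht22` is a hypothetical name.

```python
def decode_dht22(frame: bytes) -> tuple[float, float]:
    """Decode a 5-byte DHT22 frame: RH high, RH low, T high, T low, checksum."""
    if len(frame) != 5:
        raise ValueError("DHT22 frames are 5 bytes")
    if (sum(frame[:4]) & 0xFF) != frame[4]:
        raise ValueError("checksum mismatch")
    humidity = ((frame[0] << 8) | frame[1]) / 10.0
    t_raw = (frame[2] << 8) | frame[3]
    # Sign-magnitude: bit 15 is the sign, bits 14..0 are the magnitude
    temperature = (t_raw & 0x7FFF) / 10.0
    if t_raw & 0x8000:
        temperature = -temperature
    return humidity, temperature


# The four bytes from the serial monitor plus the checksum they imply (0x59)
rh, temp = decode_dht22(bytes([0x01, 0x90, 0x00, 0xC8, 0x59]))
print(f"{rh}% RH, {temp} °C")  # 40.0% RH, 20.0 °C
```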
Scenario 2: Reading a Datasheet
The datasheet for your accelerometer says: “Write 0x2D to register 0x1F to enable measurements.”
If you don’t understand hex, you can’t configure the sensor. With this knowledge, you know: - Register address: 0x1F = 31 in decimal - Configuration value: 0x2D = 00101101 in binary, where each bit enables different features
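The accelerometer’s actual bit meanings come from its datasheet, which is not reproduced here, so this sketch only shows which bit positions 0x2D sets and how a read-modify-write would set a single bit. The read-back value 0x25 is hypothetical.

```python
REG_ADDR = 0x1F
CONFIG = 0x2D

print(REG_ADDR, format(CONFIG, "08b"))  # 31 00101101

# Which bit positions are set (bit 0 = least significant)
set_bits = [bit for bit in range(8) if CONFIG & (1 << bit)]
print(set_bits)  # [0, 2, 3, 5]

# Read-modify-write: set bit 3 without disturbing the other bits
current = 0x25               # hypothetical value read back from the register
updated = current | (1 << 3)
print(hex(updated))          # 0x2d
```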
Scenario 3: Analyzing Network Traffic
Your IoT device sends a packet: 48 65 6C 6C 6F
Using an ASCII table (covered in the text encoding chapter), you can decode this as “Hello” - crucial for debugging communication issues.
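Python can do the ASCII-table lookup directly. A short sketch decoding the captured bytes and re-encoding the text the way a capture tool would display it:

```python
packet_hex = "48 65 6C 6C 6F"
payload = bytes.fromhex(packet_hex)  # whitespace between byte pairs is ignored
print(payload.decode("ascii"))       # Hello
print([chr(b) for b in payload])     # ['H', 'e', 'l', 'l', 'o']

# And back again
print("Hello".encode("ascii").hex(" ").upper())  # 48 65 6C 6C 6F
```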
4.13 Learning Path Visualization
The diagram below shows how the three chapters in this topic build upon each other:
Figure 3: Learning path through data representation - Start with number systems (the foundation), then learn how text is encoded as numbers, and finally master the bit-level operations used in real IoT protocols.
4.14 Knowledge Check
Test your understanding of data representation fundamentals before diving into the detailed chapters.
4.15 What’s Next
| Topic | Chapter | Description |
|---|---|---|
| Number Systems | Number Systems and Data Units | Convert between binary, decimal, and hexadecimal; classify bits, bytes, and word sizes |
| Text Encoding | Text Encoding for IoT | Compare ASCII, Unicode, and UTF-8 encoding schemes for IoT string handling |
| Bitwise Operations | Bitwise Operations and Endianness | Implement AND, OR, XOR, shifts, and masks; apply endianness conversion |
| Packet Framing | Packet Structure and Framing | Examine how binary data is packaged into protocol headers and payloads |
| Data Formats | Data Formats for IoT | Evaluate JSON, CBOR, and Protobuf trade-offs for constrained devices |
| Sensor Pipeline | Sensor to Network Pipeline | Trace data flow from analog sensor readings through digital encoding to cloud storage |
4.16 Summary
Data representation is the invisible foundation that makes all IoT systems work. Every smart device - from a simple temperature sensor to a complex autonomous vehicle - relies on the same fundamental concepts:
- Binary is the native language of all digital systems
- Hexadecimal provides a compact, human-readable way to work with binary data
- Text encoding (ASCII, Unicode) maps human-readable characters to numbers
- Endianness determines how multi-byte values are stored and transmitted
- Bitwise operations enable efficient manipulation of individual bits and flags
Mastering these concepts transforms you from someone who copies code samples into someone who truly understands what their IoT devices are doing at the most fundamental level.