16  Error Detection: Checksums and CRC

In 60 Seconds

Error detection adds a calculated value (checksum or CRC) to packets so receivers can verify data integrity. Simple checksums (add all bytes) are fast but weak: they miss transposed bytes. CRC uses polynomial division to catch far more. A CRC-32 misses only about 1 in 4.3 billion randomly corrupted packets, which makes CRC the standard choice for Ethernet, USB, LoRaWAN, and safety-critical IoT systems.

16.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Calculate checksums and CRC values: Compute 8-bit checksums by hand and trace the XOR-based polynomial division used by CRC-16
  • Differentiate checksum from CRC: Classify error types (single-bit, burst, transposition) that each method detects or misses
  • Evaluate detection strength quantitatively: Compare undetected-error probabilities for 8-bit checksum, CRC-16, and CRC-32 in a given deployment scenario
  • Select an error-detection scheme: Justify the choice of checksum, CRC-16, or CRC-32 based on channel noise, safety requirements, and device constraints
  • Diagnose corrupted packets: Parse a hex dump, verify the CRC or checksum, and pinpoint the likely corruption source

When data travels wirelessly, bits can get flipped by interference – like static on a phone call garbling words. Error detection adds a small mathematical “fingerprint” to each message so the receiver can check if anything got corrupted along the way. It is similar to how a cashier adds up your items and checks the total – if the numbers do not match, something went wrong. This chapter covers the two main techniques: simple checksums and the more powerful CRC used in nearly all modern networks.


16.2 Error Detection: Checksums and CRC

Time: ~9 min | Difficulty: Intermediate | Unit: P02.C02.U03

Key Concepts

  • Integrity check: The sender calculates a value from the header and payload fields and places it in the trailer so the receiver can recompute it and reject corrupted frames.
  • Key metric: Detection strength is measured by what kinds of corruption go unnoticed, especially burst errors, bit flips, and byte transpositions.
  • Main trade-off: Simple checksums are cheap to compute but weak against structured errors, while CRCs cost more bits and logic but catch far more real channel faults.
  • Protocol pattern: Lightweight protocols may use addition-based checksums, but most modern link layers and industrial buses rely on CRC-16 or CRC-32.
  • Deployment consideration: The value of stronger error detection rises quickly when retransmissions are expensive, safety matters, or the channel is noisy.
  • Design checkpoint: Choose the weakest method only after checking both the error model and the cost of a missed corruption event.

Problem: Noise, interference, or hardware faults can corrupt data during transmission.

Solution: Add a calculated value in the trailer that the receiver can verify.

16.2.1 Simple Checksum

Add all bytes, take lowest 8 bits:

Payload bytes: [0x45, 0x3F, 0x12]
Checksum = (0x45 + 0x3F + 0x12) & 0xFF = 0x96
Trailer: [0x96]

Pros: Simple, fast
Cons: Weak error detection (can miss burst errors)
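A minimal sketch of the sender and receiver sides of this scheme, assuming the checksum covers only the payload and occupies a single trailing byte. The helper names `append_checksum` and `verify_checksum` are illustrative and not taken from any particular protocol.

```python
def checksum8(data: bytes) -> int:
    """Sum all bytes and keep the lowest 8 bits."""
    return sum(data) & 0xFF


def append_checksum(payload: bytes) -> bytes:
    """Sender side: payload followed by a one-byte checksum trailer."""
    return payload + bytes([checksum8(payload)])


def verify_checksum(frame: bytes) -> bool:
    """Receiver side: recompute over the payload and compare with the trailer."""
    payload, trailer = frame[:-1], frame[-1]
    return checksum8(payload) == trailer


frame = append_checksum(bytes([0x45, 0x3F, 0x12]))
print(frame.hex())                      # 453f1296

# One corrupted byte changes the sum, so it is caught.
single_error = bytes([0x45, 0x3E, 0x12, 0x96])
print(verify_checksum(single_error))    # False

# Two errors that cancel (+1 on one byte, -1 on another) slip through.
cancelling = bytes([0x46, 0x3E, 0x12, 0x96])
print(verify_checksum(cancelling))      # True
```

The last case shows why an additive checksum is weak: any set of changes whose sum is a multiple of 256 leaves the trailer valid.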

Try calculating checksums for different byte sequences and see how transposition affects the result.

Quick Check: Checksum Limitations

Concept: Why simple checksums are insufficient for reliable communication.

16.2.2 Cyclic Redundancy Check (CRC)

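Uses polynomial division for robust error detection:

  • CRC-16: 16-bit value, detects all single-bit and double-bit errors (for messages within the polynomial's rated length) and all bursts up to 16 bits
  • CRC-32: 32-bit value, misses only about 1 in 2^32 (roughly 4.3 billion) random corruptions
  • Used by: Ethernet, USB, LoRaWAN, Modbus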

Example: Ethernet Frame Check Sequence (FCS) uses CRC-32


16.3 How CRC Works

CRC treats the data as a polynomial and divides it by a generator polynomial. The remainder becomes the CRC value:

  1. Data as polynomial: Each bit position represents a coefficient (e.g., 0x45 = 0b01000101 = x^6 + x^2 + x^0)
  2. Generator polynomial: Standardized for each CRC variant (CRC-32 uses 0x04C11DB7)
  3. Division: XOR-based polynomial division (no carry, just XOR)
  4. Remainder: The final remainder is appended as the CRC
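The four steps above can be sketched as bit-by-bit long division. This illustration assumes the CRC-8 generator 0x07 (x^8 + x^2 + x + 1) with a zero initial value, chosen only because an 8-bit remainder is easy to trace by hand. `crc8_bitwise` is a hypothetical helper name.

```python
def crc8_bitwise(data: bytes, poly: int = 0x07) -> int:
    """Remainder of data(x) * x^8 divided by x^8 + x^2 + x + 1 over GF(2)."""
    remainder = 0
    for byte in data:
        remainder ^= byte              # bring the next 8 message bits in
        for _ in range(8):
            if remainder & 0x80:       # leading coefficient is 1: shift, then XOR the generator
                remainder = ((remainder << 1) ^ poly) & 0xFF
            else:                      # leading coefficient is 0: shift only
                remainder = (remainder << 1) & 0xFF
    return remainder


message = bytes([0x45, 0x3F, 0x12])
crc = crc8_bitwise(message)
print(f"CRC-8 = 0x{crc:02X}")

# Appending the remainder makes the whole frame divide evenly,
# so a receiver can check a frame in a single pass.
print(crc8_bitwise(message + bytes([crc])))   # 0
```

The "subtract" in each division step is an XOR because coefficients are single bits with no carry, which is what makes CRC cheap in hardware.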

Why CRC is better than checksum:

Error Type | 8-bit Checksum | CRC-16 | CRC-32
Single bit flip | 100% | 100% | 100%
Two bit flips | High but not guaranteed | 100% | 100%
Transposed adjacent bytes | 0% (undetected!) | 100% | 100%
Burst up to 16 bits | Not guaranteed | 100% | 100%
Burst of 17-32 bits | Not guaranteed | ~99.998% | 100%
Random multi-bit | ~99.6% (1 - 1/256) | ~99.998% (1 - 1/2^16) | ~99.99999998% (1 - 1/2^32)

Explore how CRC detects different types of errors that checksums miss.
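One way to explore the first two rows of the table empirically is to flip every possible single bit and every bit pair in a short message and count what each method misses. This is a sketch using an arbitrary 8-byte test message and a CRC-16 with polynomial 0x1021 and initial value 0xFFFF. The checksum counts depend on the message chosen.

```python
from itertools import combinations


def checksum8(data: bytes) -> int:
    return sum(data) & 0xFF


def crc16(data: bytes, poly: int = 0x1021, init: int = 0xFFFF) -> int:
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def flip_bits(data: bytes, positions) -> bytes:
    """Return a copy of data with the given bit positions inverted."""
    out = bytearray(data)
    for pos in positions:
        out[pos // 8] ^= 1 << (pos % 8)
    return bytes(out)


def count_missed(check, message: bytes, n_flips: int) -> int:
    """Count n-bit error patterns that leave the check value unchanged."""
    good = check(message)
    n_bits = len(message) * 8
    return sum(
        check(flip_bits(message, pattern)) == good
        for pattern in combinations(range(n_bits), n_flips)
    )


message = bytes.fromhex("453f12a5003c7e91")  # arbitrary test data
for n in (1, 2):
    print(f"{n}-bit errors missed: "
          f"checksum={count_missed(checksum8, message, n)}, "
          f"crc16={count_missed(crc16, message, n)}")
```

Both methods catch every single-bit flip, but the checksum misses some two-bit patterns, for example the top bit flipped in two different bytes, which changes the sum by a multiple of 256.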

Let’s quantify how CRC-32’s superior error detection translates to real-world IoT reliability.

Scenario: A smart city deploys 10,000 parking sensors, each transmitting occupancy status every 5 minutes.

Annual packet volume: \[N_{\text{packets/year}} = 10{,}000 \text{ sensors} \times \frac{60 \text{ min}}{5 \text{ min}} \times 24 \times 365 = 1{,}051{,}200{,}000 \text{ packets/year}\]

Bit Error Rate (BER) in urban RF environment: Typical \(\text{BER} = 10^{-5}\) (1 bit error per 100,000 bits transmitted)

Payload size: 32 bytes = 256 bits per packet

Expected corrupted packets per year: \[N_{\text{corrupted}} = 1{,}051{,}200{,}000 \times (256 \times 10^{-5}) = 2{,}691{,}072 \text{ corrupted packets/year}\]

Undetected errors with 8-bit checksum (\(2^8 = 256\) possible values): \[N_{\text{undetected, checksum}} = \frac{2{,}691{,}072}{256} \approx 10{,}512 \text{ bad packets accepted/year}\]

Undetected errors with CRC-32 (\(2^{32} = 4{,}294{,}967{,}296\) possible values): \[N_{\text{undetected, CRC-32}} = \frac{2{,}691{,}072}{4{,}294{,}967{,}296} \approx 0.000627 \text{ bad packets/year}\]

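Key insight: CRC-32 reduces undetected errors by 16,777,216x (factor of \(2^{24}\)) compared to 8-bit checksums. Over the system’s 10-year lifetime, the checksum approach would accept roughly 105,120 corrupted parking occupancy readings, potentially causing billing disputes or incorrect navigation guidance. These counts are upper bounds: they treat every corrupted packet as randomly scrambled, whereas both methods always catch single-bit errors, which dominate at this BER.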

Calculate the impact of bit error rates on your IoT deployment.
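The arithmetic above can be reproduced for other deployments with a short script. This sketch makes the same simplifying assumptions as the worked numbers: the packet corruption probability is approximated as bits × BER, and a corrupted packet slips past a k-bit check with probability 2^-k. The function and parameter names are illustrative.

```python
def undetected_per_year(sensors: int, interval_min: float, payload_bits: int,
                        ber: float, check_bits: int) -> dict:
    """Estimate yearly packet, corruption, and undetected-error counts."""
    packets = sensors * (60 / interval_min) * 24 * 365
    corrupted = packets * payload_bits * ber       # small-BER approximation
    undetected = corrupted / 2 ** check_bits       # random-corruption model
    return {"packets": packets, "corrupted": corrupted, "undetected": undetected}


for name, bits in [("8-bit checksum", 8), ("CRC-16", 16), ("CRC-32", 32)]:
    result = undetected_per_year(10_000, 5, 256, 1e-5, bits)
    print(f"{name:15s} corrupted={result['corrupted']:,.0f} "
          f"undetected={result['undetected']:.6g}")
```

Changing the BER or payload size shows how quickly the undetected count grows on noisier links or with longer packets.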


Figure 16.1: Comparison view showing how checksum, CRC-16, and CRC-32 process the same bytes and what each method can reliably catch. The diagram pairs a simple additive checksum workflow with a polynomial remainder workflow and a capability summary for transpositions, burst errors, and common protocol uses.
Figure 16.2: Timeline view of CRC in action. A transmitter computes the integrity trailer, the channel corrupts one byte, the receiver recalculates and detects the mismatch, and a retransmission succeeds when the recomputed CRC finally matches the received trailer.

16.4 Common CRC Polynomials

CRC Type | Polynomial | Size | Used By
CRC-8 | 0x07 | 1 byte | SMBus (over I2C), ATM header check
CRC-16-CCITT | 0x1021 | 2 bytes | Bluetooth, X.25
CRC-16-Modbus | 0x8005 | 2 bytes | Modbus RTU
CRC-32 | 0x04C11DB7 | 4 bytes | Ethernet, USB, Zip
CRC-32C | 0x1EDC6F41 | 4 bytes | iSCSI, SCTP
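To make two rows of the table concrete, the sketch below computes CRC-32 with Python's standard `zlib.crc32` and CRC-16-Modbus with a hand-written loop. Modbus processes bits least-significant first, so the loop uses 0xA001, the bit-reversed form of 0x8005. The input `123456789` is the conventional test string for CRC implementations, and `crc16_modbus` is an illustrative name.

```python
import zlib


def crc16_modbus(data: bytes) -> int:
    """CRC-16-Modbus: polynomial 0x8005 (reflected 0xA001), initial value 0xFFFF."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:                  # low bit set: shift right and XOR the polynomial
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


test = b"123456789"
print(f"CRC-16-Modbus: 0x{crc16_modbus(test):04X}")   # 0x4B37
print(f"CRC-32:        0x{zlib.crc32(test):08X}")     # 0xCBF43926
```

A polynomial alone does not define a CRC. Bit order, initial value, and final XOR must also match, which is why comparing against a known test string is a useful first check.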

16.5 Concept Relationships

This Concept | Builds On | Leads To | Contrasts With
Checksum | Binary addition, modulo arithmetic | Simple integrity checks | CRC (polynomial-based)
CRC (Cyclic Redundancy Check) | Polynomial division, Galois field math | Robust error detection | Cryptographic hashes (SHA, MD5)
Error Detection | Digital transmission theory | Forward Error Correction (FEC), ARQ protocols | Error Correction (rebuilds data)
Burst Error Detection | Signal processing, noise patterns | Reed-Solomon codes | Single-bit error detection
FCS (Frame Check Sequence) | CRC-32 | Ethernet MAC layer | Application-layer checksums

16.6 Error Detection vs. Error Correction

Error Detection (this chapter): Identifies that an error occurred, triggers retransmission

Error Correction (Forward Error Correction): Fixes errors without retransmission

Approach | Overhead | Latency | Use Case
Detection + Retransmit | Low (2-4 bytes) | Variable | Wi-Fi, TCP, most IoT
FEC (Reed-Solomon) | High (10-30%) | Fixed | Satellite, LoRa physical layer
Hybrid ARQ | Medium | Medium | LTE, 5G

For most IoT applications, error detection with retransmission is preferred because:

  1. Errors are rare (< 1% on good links)
  2. FEC overhead is expensive for constrained devices
  3. Retransmission latency is acceptable for sensor data
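A minimal sketch of the detection-plus-retransmit approach, using `zlib.crc32` as the check and a simulated channel that corrupts the first two attempts. The channel behaviour, the retry limit, and all function names are assumptions for illustration. A real link would add timeouts and acknowledgements.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 trailer."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def frame_ok(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the trailer."""
    payload, trailer = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == trailer


def noisy_channel(frame: bytes, attempt: int) -> bytes:
    """Simulated link: flips one bit on the first two attempts."""
    if attempt < 2:
        damaged = bytearray(frame)
        damaged[0] ^= 0x04
        return bytes(damaged)
    return frame


def send_with_retries(payload: bytes, max_attempts: int = 5) -> int:
    """Return the number of attempts needed, or raise if the limit is hit."""
    frame = make_frame(payload)
    for attempt in range(max_attempts):
        received = noisy_channel(frame, attempt)
        if frame_ok(received):
            return attempt + 1
    raise RuntimeError("link failed: retry limit reached")


print(send_with_retries(b"\x01\x19"))   # 3
```

The retry limit matters as much as the CRC: without it, a permanently bad link would keep a battery-powered sensor transmitting indefinitely.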


16.7 Knowledge Check: Error Detection

Knowledge Check: Error Detection Methods

Concept: Comparing checksum and CRC error detection.


16.8 Scenario-Based Practice

Situation: You’re designing a communication protocol for 1,000 pressure sensors in an oil refinery. Requirements:

  • Each sensor sends: Sensor ID (16-bit), pressure (32-bit float), temperature (16-bit), timestamp (32-bit)
  • Transmission medium: RS-485 serial bus (noisy industrial environment)
  • Messages must be detectable even if receiver joins mid-transmission
  • Critical safety system: undetected errors could cause explosions

Question: Design the packet structure including header, payload, and trailer. Justify your choice of framing method and error detection mechanism.

Recommended Packet Structure:

Field | Size | Value/Purpose
Start Delimiter | 2 bytes | 0x55 0xAA (unique pattern, unlikely in data)
Length | 1 byte | Total payload length (12 bytes for this message)
Sensor ID | 2 bytes | 16-bit sensor identifier
Pressure | 4 bytes | 32-bit IEEE 754 float
Temperature | 2 bytes | 16-bit signed integer (°C x 100)
Timestamp | 4 bytes | 32-bit Unix epoch (seconds)
CRC-32 | 4 bytes | Polynomial: 0x04C11DB7
End Delimiter | 1 byte | 0x7E
Total | 20 bytes |

Error Detection: CRC-32 (not just checksum)

Why CRC-32 for safety-critical systems:

  • Checksum weakness: Can miss errors where bytes are transposed (0x45 0x32 vs 0x32 0x45 have same sum)
  • CRC-32 detects: All single-bit errors, all double-bit errors, and all burst errors up to 32 bits at this packet size. (The 0x04C11DB7 polynomial has an odd number of terms, so it does not guarantee detection of every odd-bit error the way CRCs with an (x + 1) factor do.)
  • Safety margin: For random errors, undetected corruption is about 1 in 4.3 billion (about 1/2^32)

Real-world consideration: Many industrial protocols use shorter CRCs (Modbus RTU uses CRC-16, classic CAN uses a 15-bit CRC), which is often sufficient. CRC-32 adds 2 bytes of overhead compared with CRC-16 but provides extra safety margin for explosion-risk environments.
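The recommended layout can be sketched with Python's `struct` module and `zlib.crc32`, which uses the 0x04C11DB7 polynomial in its reflected, Ethernet-style form. The table does not fix the byte order or which fields the CRC covers, so this sketch assumes big-endian fields and a CRC over the length byte plus payload. `build_packet` and `parse_packet` are hypothetical names.

```python
import struct
import zlib

START = b"\x55\xAA"
END = b"\x7E"
PAYLOAD_FMT = ">HfhI"   # sensor ID, pressure, temperature x100, timestamp (assumed big-endian)


def build_packet(sensor_id: int, pressure: float, temp_c: float, timestamp: int) -> bytes:
    payload = struct.pack(PAYLOAD_FMT, sensor_id, pressure, round(temp_c * 100), timestamp)
    body = bytes([len(payload)]) + payload
    crc = struct.pack(">I", zlib.crc32(body))      # assumed coverage: length + payload
    return START + body + crc + END


def parse_packet(packet: bytes):
    """Return the decoded fields, or None if framing or CRC checks fail."""
    if len(packet) != 20 or packet[:2] != START or packet[-1:] != END:
        return None
    body, crc = packet[2:15], packet[15:19]
    if struct.pack(">I", zlib.crc32(body)) != crc:
        return None
    sensor_id, pressure, temp_raw, timestamp = struct.unpack(PAYLOAD_FMT, body[1:])
    return sensor_id, pressure, temp_raw / 100, timestamp


packet = build_packet(0x0102, 12.5, 85.25, 1_700_000_000)
print(len(packet))             # 20
print(parse_packet(packet))    # (258, 12.5, 85.25, 1700000000)

damaged = bytearray(packet)
damaged[5] ^= 0x01             # flip one bit inside the pressure field
print(parse_packet(bytes(damaged)))   # None
```

Covering the length byte with the CRC is a deliberate choice in this sketch: a corrupted length would otherwise make the receiver look for the trailer in the wrong place.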

Situation: Your smart home gateway received this hex dump from a Zigbee temperature sensor, but the reading seems wrong (showing 500°C instead of expected 25°C):

61 04 00 08 02 01 F4 01 48 2A

The expected packet format is:

  • Frame Control: 2 bytes
  • Sequence: 1 byte
  • Cluster ID: 2 bytes
  • Attribute ID: 2 bytes
  • Data Type: 1 byte
  • Value: 2 bytes (temperature x 100, little-endian)

Question: Parse the packet byte-by-byte and identify where the error might be. What temperature does the packet actually encode?

Byte-by-Byte Parsing:

Position | Hex | Field | Interpretation
0-1 | 61 04 | Frame Control | 0x0461 (ZCL Global, Server to Client)
2 | 00 | Sequence | Message #0
3-4 | 08 02 | Cluster ID | 0x0208 (should be 0x0402 for Temperature Measurement!)
5-6 | 01 F4 | Attribute ID | 0xF401 (should be 0x0000 for MeasuredValue!)
7 | 01 | Data Type | 0x01 (likely incorrect, should be 0x29 for int16)
8-9 | 48 2A | Value | 0x2A48 = 10,824 / 100 = 108.24°C

The actual temperature value:

Looking at bytes 8-9: 48 2A

  • Little-endian: 0x2A48 = 10,824
  • As signed int16: 10,824 / 100 = 108.24°C (still wrong!)

Root cause found:

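The bytes should be: C4 09 for 25.00°C (2500 = 0x09C4, sent little-endian)

But we have: 48 2A (0x2A48 = 10,824 = 108.24°C)

Likely causes:

  1. Sensor malfunction - reading garbage
  2. Byte corruption - multiple bit flips in transmission (0x2A48 and 0x09C4 differ in 6 bit positions, so a single bit flip cannot explain it)
  3. Wrong sensor type - maybe it’s humidity (0-100%) encoded differently
  4. Field misalignment - the cluster, attribute, and data type fields are also wrong, and bytes 6-7 (F4 01) read little-endian give 0x01F4 = 500, which would match the displayed 500 if the gateway decoded at the wrong offset without scaling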

Debug steps:

  1. Check CRC/FCS (not shown in dump) - was it valid?
  2. Request retransmission
  3. Check sensor wiring and calibration
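The byte-by-byte parse can be reproduced with `struct`, assuming every multi-byte field is little-endian, as the value field is stated to be. The variable names are illustrative. The last lines compare the received value with the expected 25.00 encoding to check whether a single bit flip could explain the difference.

```python
import struct

dump = bytes.fromhex("61 04 00 08 02 01 F4 01 48 2A")

# < = little-endian, no padding: frame control, sequence, cluster, attribute, type, value
frame_ctrl, seq, cluster, attr, dtype, raw = struct.unpack("<HBHHBh", dump)

print(f"frame control = 0x{frame_ctrl:04X}")   # 0x0461
print(f"cluster ID    = 0x{cluster:04X}")      # 0x0208
print(f"attribute ID  = 0x{attr:04X}")         # 0xF401
print(f"data type     = 0x{dtype:02X}")        # 0x01
print(f"temperature   = {raw / 100:.2f}")      # 108.24

expected = 2500                                 # 25.00 x 100 = 0x09C4
differing_bits = bin(raw ^ expected).count("1")
print(f"bits differing from expected value: {differing_bits}")   # 6
```

Six differing bits, together with three wrong header fields, point away from a single transmission error and toward a structural problem such as a misaligned or malformed frame.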

16.9 Code: Implementing Checksums and CRC in Python

def simple_checksum(data: bytes) -> int:
    """Simple 8-bit checksum: sum all bytes, keep lowest 8 bits."""
    return sum(data) & 0xFF

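def crc16_ccitt(data: bytes, poly: int = 0x1021, init: int = 0xFFFF) -> int:
    """CRC-16 with the CCITT polynomial 0x1021 (MSB-first, init 0xFFFF).

    Bluetooth and X.25 use this polynomial too, but with other init or bit-order settings.
    """
    crc = init
    for byte in data:
        crc ^= byte << 8              # Feed the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:          # Top bit set: shift and XOR the polynomial
                crc = (crc << 1) ^ poly
            else:                     # Top bit clear: shift only
                crc = crc << 1
            crc &= 0xFFFF  # Keep 16-bit
    return crc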

# --- Demo: checksum weakness ---
packet_a = bytes([0x45, 0x3F, 0x12])  # Original
packet_b = bytes([0x3F, 0x45, 0x12])  # Bytes 0 and 1 swapped

print("=== Checksum (weak) ===")
print(f"Original:  {packet_a.hex()} -> checksum = 0x{simple_checksum(packet_a):02X}")
print(f"Swapped:   {packet_b.hex()} -> checksum = 0x{simple_checksum(packet_b):02X}")
print(f"Same checksum? {simple_checksum(packet_a) == simple_checksum(packet_b)}")
# Output: Both = 0x96. Checksum MISSES the transposition error!

print("\n=== CRC-16 (robust) ===")
print(f"Original:  {packet_a.hex()} -> CRC-16 = 0x{crc16_ccitt(packet_a):04X}")
print(f"Swapped:   {packet_b.hex()} -> CRC-16 = 0x{crc16_ccitt(packet_b):04X}")
print(f"Same CRC?  {crc16_ccitt(packet_a) == crc16_ccitt(packet_b)}")
# Output: Different CRCs. CRC DETECTS the transposition error.

# --- Demo: single bit flip detection ---
print("\n=== Single bit flip ===")
corrupted = bytes([0x45, 0x3F, 0x13])  # Last byte: 0x12 -> 0x13 (1 bit flip)
print(f"Original:   CRC = 0x{crc16_ccitt(packet_a):04X}")
print(f"Corrupted:  CRC = 0x{crc16_ccitt(corrupted):04X}")
print(f"Detected?   {crc16_ccitt(packet_a) != crc16_ccitt(corrupted)}")

What to observe: Run this code to see that the simple checksum produces identical values for [0x45, 0x3F, 0x12] and [0x3F, 0x45, 0x12] (transposed bytes), while CRC-16 catches the error. This is exactly why CRC is required for reliable IoT communication.

16.10 Worked Example: Debugging a Corrupted LoRaWAN Packet

Situation: A LoRaWAN temperature sensor on a building roof reports 847°C. The sensor (SHT31) has a range of -40 to 125°C. What happened?

Received payload (hex): 03 4F 01 A2

Expected format: [msg_type(1B)] [temp_x100(2B, big-endian, signed)] [humidity(1B)]

Parsing the corrupt payload:

msg_type = 0x03       -> OK (sensor reading type)
temp_raw = 0x4F01     -> 20,225 / 100 = 202.25°C   (still wrong!)

Wait -- is this a byte order issue?
temp_raw = 0x014F     -> 335 / 100 = 3.35°C   (plausible for a roof!)

Root cause: The sensor firmware was updated from big-endian to little-endian encoding, but the server decoder was not updated. The bytes 4F 01 were decoded as big-endian (0x4F01 = 20,225) instead of little-endian (0x014F = 335 = 3.35°C).

But what about the 847°C report? That was a different packet where the CRC check passed but a framing error shifted the payload bytes by one position. The humidity byte (0x64 = 100% RH) was interpreted as the high byte of temperature.

Lesson: CRC detects bit-level corruption, but it cannot detect application-layer framing errors where bytes are valid but misinterpreted. Always include a message type or version byte so decoders can validate the packet structure.
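A sketch of the decoding mismatch described above, using `int.from_bytes` to read the same two bytes in both byte orders and a plausibility check against the SHT31 range given in the situation. `decode_temp` is a hypothetical helper.

```python
def decode_temp(payload: bytes, byteorder: str) -> float:
    """Decode bytes 1-2 of the payload as a signed temperature x100."""
    raw = int.from_bytes(payload[1:3], byteorder, signed=True)
    return raw / 100


payload = bytes.fromhex("034F01A2")

for order in ("big", "little"):
    temp = decode_temp(payload, order)
    plausible = -40 <= temp <= 125          # SHT31 measurement range
    print(f"{order:6s}: {temp:7.2f}  plausible={plausible}")
# big   :  202.25  plausible=False
# little:    3.35  plausible=True
```

A range check like this cannot replace the CRC, but it catches the class of error the CRC is blind to: bytes that arrived intact and were then interpreted incorrectly.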


16.11 Review Exercises

Common Pitfalls

Checksums and CRCs are not interchangeable just because both live in the trailer. Addition-based checksums miss structured errors such as byte transpositions that CRCs catch reliably, so calling them equivalent leads to silent corruption in real deployments.

Integrity checks only work if sender and receiver run the algorithm over the exact same byte sequence in the exact same order. A wrong initial value, reflected bit order, omitted header byte, or endian mistake makes good packets fail verification and sends debugging effort in the wrong direction.
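A small sketch of the initial-value pitfall: the same polynomial (0x1021) over the same bytes gives different results when only the starting register value differs. The two parameter sets shown are commonly catalogued as CRC-16/CCITT-FALSE (init 0xFFFF) and CRC-16/XMODEM (init 0x0000). The function name is illustrative.

```python
def crc16_1021(data: bytes, init: int) -> int:
    """MSB-first CRC-16 with polynomial 0x1021 and a configurable initial value."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


data = b"123456789"
sender = crc16_1021(data, init=0xFFFF)     # sender's configuration
receiver = crc16_1021(data, init=0x0000)   # receiver configured differently
print(f"sender=0x{sender:04X} receiver=0x{receiver:04X} match={sender == receiver}")
# sender=0x29B1 receiver=0x31C3 match=False
```

With this mismatch every valid packet fails verification, which looks like a very noisy channel until the CRC parameters themselves are compared.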

Detecting corruption is only half the system behavior. The real protocol must still decide whether to drop, retransmit, request a new sample, or raise an alarm. If recovery behavior is undefined, a strong CRC only tells you that the packet is bad, not what the system should do next.

16.12 Summary

Error detection ensures data integrity across noisy networks:

  • Checksums: Simple addition-based method, fast but weak detection
  • CRC: Polynomial-based method; a CRC-32 misses only about 1 in 2^32 random corruptions
  • CRC-16/CRC-32: Standard choices for IoT protocols
  • Trade-offs: More robust detection requires more computation and bytes

Key Takeaways:

  • CRC is much more reliable than simple checksums
  • Checksums can miss transposed bytes that CRC catches
  • Safety-critical systems should use CRC-32 or better
  • Error detection enables retransmission; error correction avoids it

16.13 What’s Next

Chapter | Why Read It Next
Protocol Overhead | Compare CRC and header overhead across Ethernet, LoRaWAN, Zigbee, and more
Frame Delimiters and Boundaries | See how CRC interacts with start/end delimiters and byte stuffing
TCP Reliability | Learn how TCP combines checksums with retransmission for end-to-end reliability
LoRaWAN Security and Joining | Explore the AES-CMAC Message Integrity Code that goes beyond CRC
Encryption Security Properties | Understand HMAC and CMAC, cryptographic integrity checks that also authenticate the sender

Sammy the Sensor sends a message: “Temperature is 25 degrees!” But oh no – a noisy radio wave garbles it to “Temperature is 95 degrees!”

Max the Microcontroller explains: “This is why we need ERROR DETECTION. It’s like adding a secret check to every message!”

Lila the LED shows two methods:

Method 1 – Checksum (Simple): “Add up all the numbers in your message. 2+5 = 7. Send ‘25’ plus the check ‘7’. The receiver adds 2+5 and checks: does it equal 7? YES! Message is good!”

“But,” Lila warns, “if the message changes from ‘25’ to ‘52’ (numbers swapped), the checksum is still 7! Oops – we missed the error!”

Method 2 – CRC (Super Smart): “CRC is like a magic math puzzle. It does fancy polynomial division (don’t worry, the computer does it automatically!) and catches almost EVERY error – even swapped numbers!”

Bella the Battery asks: “But doesn’t CRC use more energy?”

Max nods: “A little more math, but it catches 99.9999% of errors. For a sensor in a hospital or factory, that’s worth it! We don’t want wrong readings causing problems!”

The Squad’s Rule: Always add a check to your messages! Checksum is quick and easy. CRC is stronger and catches almost everything. For important data, always use CRC!