16  Error Detection: Checksums and CRC

In 60 Seconds

Error detection adds a calculated value (checksum or CRC) to each packet so the receiver can verify data integrity. Simple checksums (add all bytes) are fast but weak: they miss transposed bytes. CRC uses polynomial division and, in its 32-bit form, lets only about 1 in 4.3 billion randomly corrupted packets slip through, which makes it the standard choice for Ethernet, USB, LoRaWAN, and safety-critical IoT systems.

16.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Calculate checksums and CRC values: Compute 8-bit checksums by hand and trace the XOR-based polynomial division used by CRC-16
  • Differentiate checksum from CRC: Classify error types (single-bit, burst, transposition) that each method detects or misses
  • Evaluate detection strength quantitatively: Compare undetected-error probabilities for 8-bit checksum, CRC-16, and CRC-32 in a given deployment scenario
  • Select an error-detection scheme: Justify the choice of checksum, CRC-16, or CRC-32 based on channel noise, safety requirements, and device constraints
  • Diagnose corrupted packets: Parse a hex dump, verify the CRC or checksum, and pinpoint the likely corruption source

When data travels wirelessly, bits can get flipped by interference – like static on a phone call garbling words. Error detection adds a small mathematical “fingerprint” to each message so the receiver can check if anything got corrupted along the way. It is similar to how a cashier adds up your items and checks the total – if the numbers do not match, something went wrong. This chapter covers the two main techniques: simple checksums and the more powerful CRC used in nearly all modern networks.


16.2 Error Detection: Checksums and CRC

Time: ~9 min | Difficulty: Intermediate | Unit: P02.C02.U03

Key Concepts

  • Integrity check: The sender calculates a value over the header and payload fields and places it in the trailer so the receiver can recompute it and reject corrupted frames.
  • Key metric: Detection strength is measured by what kinds of corruption go unnoticed, especially burst errors, bit flips, and byte transpositions.
  • Main trade-off: Simple checksums are cheap to compute but weak against structured errors, while CRCs cost more bits and logic but catch far more real channel faults.
  • Protocol pattern: Lightweight protocols may use addition-based checksums, but most modern link layers and industrial buses rely on CRC-16 or CRC-32.
  • Deployment consideration: The value of stronger error detection rises quickly when retransmissions are expensive, safety matters, or the channel is noisy.
  • Design checkpoint: Choose the weakest method only after checking both the error model and the cost of a missed corruption event.

Problem: Noise, interference, or hardware faults can corrupt data during transmission.

Solution: Add a calculated value in the trailer that the receiver can verify.

16.2.1 Simple Checksum

Add all bytes, take lowest 8 bits:

Payload bytes: [0x45, 0x3F, 0x12]
Byte sum: 0x45 + 0x3F + 0x12 = 0x96
Checksum (low 8 bits): 0x96
Trailer: [0x96]
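
The trailer only helps if the receiver repeats the same calculation. The following is a minimal Python sketch of both sides; the helper names `checksum8`, `add_trailer`, and `verify` are illustrative choices, not part of any protocol.

```python
def checksum8(payload: bytes) -> int:
    """Sum all bytes and keep the lowest 8 bits."""
    return sum(payload) & 0xFF

def add_trailer(payload: bytes) -> bytes:
    """Sender side: append the checksum as a one-byte trailer."""
    return payload + bytes([checksum8(payload)])

def verify(frame: bytes) -> bool:
    """Receiver side: recompute over the payload and compare with the trailer."""
    payload, trailer = frame[:-1], frame[-1]
    return checksum8(payload) == trailer

frame = add_trailer(bytes([0x45, 0x3F, 0x12]))
print(frame.hex())      # 453f1296
print(verify(frame))    # True

# Flip one bit in the first payload byte (0x45 -> 0x44) and check again.
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]
print(verify(damaged))  # False
```

A single flipped bit changes the sum, so this case is caught; the weakness appears with errors that leave the sum unchanged.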

Pros: Simple, fast

Cons: Weak error detection (misses transposed bytes and can miss burst errors)


Quick Check: Checksum Limitations

Concept: Why simple checksums are insufficient for reliable communication.

16.2.2 Cyclic Redundancy Check (CRC)

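Uses polynomial division for robust error detection:

  • CRC-16: 16-bit value; detects all single-bit errors, all double-bit errors at typical frame lengths, and all burst errors up to 16 bits
  • CRC-32: 32-bit value; misses only about 1 in 2^32 (roughly 1 in 4.3 billion) random corruptions
  • Used by: Ethernet, USB, LoRaWAN, Modbus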

Example: Ethernet Frame Check Sequence (FCS) uses CRC-32


16.3 How CRC Works

CRC treats the data as a polynomial and divides it by a generator polynomial. The remainder becomes the CRC value:

  1. Data as polynomial: Each bit position represents a coefficient (e.g., 0x45 = 0b01000101 = x^6 + x^2 + x^0)
  2. Generator polynomial: Standardized for each CRC variant (CRC-32 uses 0x04C11DB7)
  3. Division: XOR-based polynomial division (no carry, just XOR)
  4. Remainder: The final remainder is appended as the CRC
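
The four steps above can be traced with a short Python sketch of modulo-2 long division on bit strings. It uses a toy 4-bit generator (1011, that is x^3 + x + 1) so the steps stay readable; the function name `mod2_remainder` is illustrative. Real CRC variants add details such as an initial register value, and constants like 0x1021 omit the generator's leading x^16 term, which this sketch writes out explicitly.

```python
def mod2_remainder(message_bits: str, generator_bits: str) -> str:
    """Return the remainder of message_bits divided by generator_bits (mod 2).

    Both arguments are strings of '0'/'1'. The message is first extended with
    len(generator_bits) - 1 zero bits, as in textbook CRC long division.
    """
    degree = len(generator_bits) - 1
    work = [int(b) for b in message_bits] + [0] * degree
    gen = [int(b) for b in generator_bits]

    # Slide the generator along the message; XOR wherever the leading bit is 1.
    for i in range(len(message_bits)):
        if work[i] == 1:
            for j, g in enumerate(gen):
                work[i + j] ^= g        # subtraction without carry is XOR

    return "".join(str(b) for b in work[-degree:])

message = "11010011101100"
generator = "1011"                      # toy generator: x^3 + x + 1
crc = mod2_remainder(message, generator)
print(crc)                              # 100

# Receiver check: message + CRC divides evenly, so the remainder is all zeros.
print(mod2_remainder(message + crc, generator))   # 000
```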

Why CRC is better than checksum:

  • Single bit flip: 8-bit checksum 100%, CRC-16 100%, CRC-32 100%
  • Two bit flips: 8-bit checksum high but not guaranteed, CRC-16 100%, CRC-32 100% (at typical frame lengths)
  • Transposed adjacent bytes: 8-bit checksum 0% (undetected!), CRC-16 100%, CRC-32 100%
  • Burst up to 16 bits: 8-bit checksum poor (no guarantee), CRC-16 100%, CRC-32 100%
  • Burst up to 32 bits: 8-bit checksum poor (no guarantee), CRC-16 about 99.998%, CRC-32 100%
  • Random multi-bit: 8-bit checksum about 99.6% (1 - 1/256), CRC-16 about 99.998% (1 - 1/2^16), CRC-32 more than 99.9999% (1 - 1/2^32)
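
The "random multi-bit" row for the 8-bit checksum can be checked empirically. The sketch below is a small Monte Carlo experiment under one assumption: the channel replaces the whole frame with random bytes. The seed, frame contents, and trial count are arbitrary.

```python
import random

def checksum8(data: bytes) -> int:
    """Sum all bytes and keep the lowest 8 bits."""
    return sum(data) & 0xFF

rng = random.Random(2024)               # fixed seed so runs are repeatable
original = bytes([0x45, 0x3F, 0x12, 0x00, 0x7A, 0x10, 0xC3, 0x5E])
expected = checksum8(original)

trials = 50_000
missed = 0
for _ in range(trials):
    garbage = bytes(rng.randrange(256) for _ in range(len(original)))
    if garbage != original and checksum8(garbage) == expected:
        missed += 1                     # corrupted frame that would be accepted

print(f"undetected fraction: {missed / trials:.4%} (1/256 = {1 / 256:.4%})")
```

The measured fraction should land close to 1/256, which is where the "about 99.6%" detection figure comes from.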


Let’s quantify how CRC-32’s superior error detection translates to real-world IoT reliability.

Scenario: A smart city deploys 10,000 parking sensors, each transmitting occupancy status every 5 minutes.

Annual packet volume: \[N_{\text{packets/year}} = 10{,}000 \text{ sensors} \times \frac{60 \text{ min}}{5 \text{ min}} \times 24 \times 365 = 1{,}051{,}200{,}000 \text{ packets/year}\]

Bit Error Rate (BER) in urban RF environment: Typical \(\text{BER} = 10^{-5}\) (1 bit error per 100,000 bits transmitted)

Payload size: 32 bytes = 256 bits per packet

Expected corrupted packets per year: \[N_{\text{corrupted}} = 1{,}051{,}200{,}000 \times (256 \times 10^{-5}) = 2{,}691{,}072 \text{ corrupted packets/year}\]

Undetected errors with 8-bit checksum (\(2^8 = 256\) possible values): \[N_{\text{undetected, checksum}} = \frac{2{,}691{,}072}{256} \approx 10{,}512 \text{ bad packets accepted/year}\]

Undetected errors with CRC-32 (\(2^{32} = 4{,}294{,}967{,}296\) possible values): \[N_{\text{undetected, CRC-32}} = \frac{2{,}691{,}072}{4{,}294{,}967{,}296} \approx 0.000627 \text{ bad packets/year}\]

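Key insight: CRC-32 reduces undetected errors by 16,777,216x (factor of \(2^{24}\)) compared to 8-bit checksums. Over the system’s 10-year lifetime, the checksum approach would accept roughly 105,120 corrupted parking occupancy readings, potentially causing billing disputes or incorrect navigation guidance. These figures assume every corrupted packet is randomly scrambled; both methods always catch single-bit errors, so actual counts will be lower, but the \(2^{24}\) ratio for random multi-bit corruption still holds.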
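
The arithmetic above can be reproduced in a few lines of Python. This sketch uses the same simplifying model as the text (packet error probability approximated as bits × BER, and a 1-in-2^k chance that a corrupted packet passes a k-bit check); the variable names are illustrative.

```python
sensors = 10_000
reports_per_hour = 60 // 5
packets_per_year = sensors * reports_per_hour * 24 * 365

ber = 1e-5                      # bit error rate assumed in the scenario
bits_per_packet = 32 * 8

# Small-probability approximation: P(packet corrupted) ~= bits * BER
corrupted_per_year = packets_per_year * bits_per_packet * ber

undetected_checksum = corrupted_per_year / 2**8
undetected_crc32 = corrupted_per_year / 2**32

print(f"{packets_per_year:,}")          # 1,051,200,000
print(f"{corrupted_per_year:,.0f}")     # 2,691,072
print(f"{undetected_checksum:,.0f}")    # 10,512
print(f"{undetected_crc32:.6f}")        # 0.000627
```

Changing `ber`, `bits_per_packet`, or the reporting interval shows how quickly the checksum figure grows on noisier links.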



Figure 16.1: Comparison view showing how checksum, CRC-16, and CRC-32 process the same bytes and what each method can reliably catch. The diagram pairs a simple additive checksum workflow with a polynomial remainder workflow and a capability summary for transpositions, burst errors, and common protocol uses.
Mobile Figure Summary: Checksum vs CRC

Checksum workflow

  • Input bytes: 45 3F 12
  • Add the bytes: 0x45 + 0x3F + 0x12 = 0x96
  • Append trailer byte 0x96
  • Fast, but weak against structured corruption

CRC workflow

  • Divide the data bitstream by generator polynomial 0x1021
  • Append the remainder as the trailer, for example 0x7F82
  • Costs more computation, but catches far more real transmission faults

Detection coverage (Yes = detection guaranteed)

  • Single-bit flips: checksum Yes, CRC-16 Yes, CRC-32 Yes
  • Byte swaps: checksum No, CRC-16 Yes, CRC-32 Yes
  • Burst errors under 16 bits: checksum No, CRC-16 Yes, CRC-32 Yes
  • Burst errors under 32 bits: checksum No, CRC-16 No, CRC-32 Yes
Figure 16.2: Timeline view of CRC in action. A transmitter computes the integrity trailer, the channel corrupts one byte, the receiver recalculates and detects the mismatch, and a retransmission succeeds when the recomputed CRC finally matches the received trailer.
Mobile Figure Summary: CRC Retransmission Timeline
  1. Sender builds a frame with payload 45 3F 12 and CRC-16 trailer 7F82 (an illustrative value).
  2. Noise corrupts one byte so the receiver sees 44 3F 12 while the trailer still says 7F82.
  3. Receiver recalculates the CRC and gets a different value (A1C4 in this illustration), which does not match the received trailer.
  4. Receiver rejects the frame and sends a NACK requesting retransmission.
  5. Sender retransmits the original frame, the receiver recalculates 7F82, and the clean packet is accepted.
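
The timeline can be simulated with a minimal stop-and-wait sketch. It relies on Python's `binascii.crc_hqx`, which implements the 0x1021 polynomial; the `noisy_channel` function, the initial value 0xFFFF, and the three-attempt limit are assumptions made for this illustration, so the CRC values it produces will not match the illustrative ones in the figure.

```python
import binascii

def crc16(data: bytes) -> int:
    # crc_hqx uses the CCITT polynomial 0x1021; 0xFFFF is the chosen start value.
    return binascii.crc_hqx(data, 0xFFFF)

def build_frame(payload: bytes) -> bytes:
    """Sender: append the CRC as a 2-byte big-endian trailer."""
    return payload + crc16(payload).to_bytes(2, "big")

def accept(frame: bytes) -> bool:
    """Receiver: recompute the CRC over the payload and compare with the trailer."""
    payload, trailer = frame[:-2], frame[-2:]
    return crc16(payload).to_bytes(2, "big") == trailer

def noisy_channel(frame: bytes, attempt: int) -> bytes:
    """Hypothetical channel: flips one bit of the first byte on attempt 1 only."""
    if attempt == 1:
        return bytes([frame[0] ^ 0x01]) + frame[1:]
    return frame

payload = bytes([0x45, 0x3F, 0x12])
log = []
for attempt in range(1, 4):                  # give up after 3 tries
    received = noisy_channel(build_frame(payload), attempt)
    if accept(received):
        log.append((attempt, "ACK"))
        break
    log.append((attempt, "NACK"))            # ask the sender to retransmit

print(log)   # [(1, 'NACK'), (2, 'ACK')]
```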

16.4 Common CRC Polynomials

  • CRC-8: polynomial 0x07, size 1 byte, used by SMBus (the I2C-based bus) and ATM header error control
  • CRC-16-CCITT: polynomial 0x1021, size 2 bytes, used by Bluetooth and X.25
  • CRC-16-Modbus: polynomial 0x8005, size 2 bytes, used by Modbus RTU
  • CRC-32: polynomial 0x04C11DB7, size 4 bytes, used by Ethernet, USB, and Zip
  • CRC-32C: polynomial 0x1EDC6F41, size 4 bytes, used by iSCSI and SCTP
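
A polynomial alone does not fully define a CRC: the initial value, bit reflection, and final XOR matter too. The sketch below is a slow but explicit bit-by-bit implementation with those parameters exposed. The parameter sets are the ones commonly catalogued for each variant and should be confirmed against the specification of the protocol in use; the function and dictionary names are illustrative.

```python
import zlib

def reflect(value: int, width: int) -> int:
    """Reverse the lowest `width` bits of value."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out

def crc(data: bytes, width: int, poly: int, init: int,
        refin: bool, refout: bool, xorout: int) -> int:
    """Bit-by-bit CRC with explicit parameters."""
    mask = (1 << width) - 1
    top = 1 << (width - 1)
    reg = init
    for byte in data:
        if refin:
            byte = reflect(byte, 8)
        reg ^= byte << (width - 8)
        for _ in range(8):
            reg = ((reg << 1) ^ poly) & mask if reg & top else (reg << 1) & mask
    if refout:
        reg = reflect(reg, width)
    return reg ^ xorout

# Commonly catalogued parameter sets (assumption: verify against your protocol spec).
VARIANTS = {
    "CRC-8 (poly 0x07)":  dict(width=8,  poly=0x07,       init=0x00,       refin=False, refout=False, xorout=0x00),
    "CRC-16/CCITT-FALSE": dict(width=16, poly=0x1021,     init=0xFFFF,     refin=False, refout=False, xorout=0x0000),
    "CRC-16/MODBUS":      dict(width=16, poly=0x8005,     init=0xFFFF,     refin=True,  refout=True,  xorout=0x0000),
    "CRC-32":             dict(width=32, poly=0x04C11DB7, init=0xFFFFFFFF, refin=True,  refout=True,  xorout=0xFFFFFFFF),
    "CRC-32C":            dict(width=32, poly=0x1EDC6F41, init=0xFFFFFFFF, refin=True,  refout=True,  xorout=0xFFFFFFFF),
}

check = b"123456789"                     # conventional CRC test string
results = {name: crc(check, **params) for name, params in VARIANTS.items()}
for name, value in results.items():
    print(f"{name:20s} 0x{value:X}")

# Cross-check the CRC-32 result against the standard library.
print(results["CRC-32"] == zlib.crc32(check))   # True
```

In production code a table-driven routine or a library call is usually preferred; the bitwise loop is shown because it maps directly onto the polynomial division described earlier.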

16.5 Concept Relationships

  • Checksum: builds on binary addition and modulo arithmetic; leads to simple integrity checks; contrasts with CRC (polynomial-based)
  • CRC (Cyclic Redundancy Check): builds on polynomial division and Galois field math; leads to robust error detection; contrasts with cryptographic hashes (SHA, MD5)
  • Error Detection: builds on digital transmission theory; leads to Forward Error Correction (FEC) and ARQ protocols; contrasts with error correction (rebuilds data)
  • Burst Error Detection: builds on signal processing and noise patterns; leads to Reed-Solomon codes; contrasts with single-bit error detection
  • FCS (Frame Check Sequence): builds on CRC-32; leads to the Ethernet MAC layer; contrasts with application-layer checksums


16.6 Error Detection vs. Error Correction

Error Detection (this chapter): Identifies that an error occurred, triggers retransmission

Error Correction (Forward Error Correction): Fixes errors without retransmission

  • Detection + Retransmit: overhead Low (2-4 bytes), latency Variable, best for Wi-Fi, TCP, and most IoT deployments
  • FEC (Reed-Solomon): overhead High (10-30%), latency Fixed, best for Satellite links and the LoRa physical layer
  • Hybrid ARQ: overhead Medium, latency Medium, best for LTE and 5G

For most IoT applications, error detection with retransmission is preferred because:

  1. Errors are rare (< 1% on good links)
  2. FEC overhead is expensive for constrained devices
  3. Retransmission latency is acceptable for sensor data


16.7 Knowledge Check: Error Detection

Knowledge Check: Error Detection Methods

Concept: Comparing checksum and CRC error detection.


16.8 Scenario-Based Practice

Situation: You’re designing a communication protocol for 1,000 pressure sensors in an oil refinery.

Requirements:

  • Each sensor sends: Sensor ID (16-bit), pressure (32-bit float), temperature (16-bit), timestamp (32-bit)
  • Transmission medium: RS-485 serial bus (noisy industrial environment)
  • Message boundaries must be detectable even if the receiver joins mid-transmission
  • Critical safety system: undetected errors could cause explosions

Question: Design the packet structure including header, payload, and trailer. Justify your choice of framing method and error detection mechanism.

Recommended Packet Structure:

  • Start delimiter: 2 bytes set to 0x55 0xAA so receivers can resynchronize mid-stream.
  • Length: 1 byte storing the total payload length (12 bytes for this example).
  • Sensor ID: 2 bytes for the 16-bit device identifier.
  • Pressure: 4 bytes as an IEEE 754 float.
  • Temperature: 2 bytes as a signed integer in °C × 100.
  • Timestamp: 4 bytes as Unix epoch seconds.
  • CRC-32: 4 bytes using polynomial 0x04C11DB7.
  • End delimiter: 1 byte set to 0x7E.
  • Total frame size: 20 bytes.

Error Detection: CRC-32 (not just checksum)

Why CRC-32 for safety-critical systems:

  • Checksum weakness: Can miss errors where bytes are transposed (0x45 0x32 vs 0x32 0x45 have the same sum)
  • CRC-32 detects: All single-bit errors, all double-bit errors at these frame lengths, and all burst errors up to 32 bits. (The 0x04C11DB7 polynomial has no (x + 1) factor, so it does not guarantee detection of every odd-bit error.)
  • Safety margin: For random errors, undetected corruption is about 1 in 4.3 billion (about 1/2^32)

Real-world consideration: Many industrial protocols use shorter CRCs (Modbus RTU uses CRC-16, classic CAN a 15-bit CRC), which are often sufficient. CRC-32 costs 2 more bytes than CRC-16 but provides extra safety margin for explosion-risk environments.
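
One way to turn the recommended layout into code is sketched below with Python's `struct` and `zlib` modules. Several details are assumptions the scenario leaves open: big-endian field order, a CRC computed over the length byte plus payload (delimiters excluded), and no byte stuffing, so 0x55 0xAA or 0x7E can still appear inside the payload. The function names are illustrative.

```python
import struct
import zlib

START, END = b"\x55\xAA", b"\x7E"
BODY = struct.Struct(">BHfhI")   # length, sensor ID, pressure, temp x100, timestamp

def build_frame(sensor_id: int, pressure: float, temp_c: float, timestamp: int) -> bytes:
    body = BODY.pack(12, sensor_id, pressure, round(temp_c * 100), timestamp)
    crc = zlib.crc32(body)                       # CRC-32, polynomial 0x04C11DB7
    return START + body + struct.pack(">I", crc) + END

def parse_frame(frame: bytes):
    """Return (sensor_id, pressure, temp_c, timestamp), or None if the frame is bad."""
    if len(frame) != 20 or frame[:2] != START or frame[-1:] != END:
        return None
    body, (crc,) = frame[2:15], struct.unpack(">I", frame[15:19])
    if zlib.crc32(body) != crc:
        return None                              # corrupted: reject the frame
    length, sensor_id, pressure, temp_raw, timestamp = BODY.unpack(body)
    if length != 12:
        return None
    return sensor_id, pressure, temp_raw / 100, timestamp

frame = build_frame(0x0102, 101.5, 25.5, 1_700_000_000)
print(len(frame))                 # 20
print(parse_frame(frame))         # (258, 101.5, 25.5, 1700000000)

# Flip one bit inside the pressure field; the CRC check rejects the frame.
corrupted = frame[:6] + bytes([frame[6] ^ 0x10]) + frame[7:]
print(parse_frame(corrupted))     # None
```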

Situation: Your smart home gateway received this hex dump from a Zigbee temperature sensor, but the reading seems wrong (showing 500°C instead of the expected 25°C):

61 04 00 08 02 01 F4 01 48 2A

The expected packet format is:

  • Frame Control: 2 bytes
  • Sequence: 1 byte
  • Cluster ID: 2 bytes
  • Attribute ID: 2 bytes
  • Data Type: 1 byte
  • Value: 2 bytes (temperature x 100, little-endian)

Question: Parse the packet byte-by-byte and identify where the error might be. What temperature does the packet actually encode?

Byte-by-Byte Parsing:

  • Bytes 0-1 (61 04): Frame Control = 0x0461 (ZCL Global, Server to Client).
  • Byte 2 (00): Sequence number = message #0.
  • Bytes 3-4 (08 02): Cluster ID = 0x0208, but it should be 0x0402 for Temperature Measurement.
  • Bytes 5-6 (01 F4): Attribute ID = 0xF401, but it should be 0x0000 for MeasuredValue.
  • Byte 7 (01): Data Type = 0x01, likely wrong because an int16 reading should use 0x29.
  • Bytes 8-9 (48 2A): Value = 0x2A48 = 10,824 / 100 = 108.24°C.

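The actual temperature value:

Looking at bytes 8-9 (48 2A):

  • Little-endian: 0x2A48 = 10,824
  • As signed int16: 10,824 / 100 = 108.24°C (still wrong!)

Comparison with the expected value:

For 25.0°C the value field should hold 2500 = 0x09C4, which is sent as C4 09 in little-endian order.

But we have: 48 2A (0x2A48 = 10,824 = 108.24°C)

The two byte pairs differ in six bit positions, and the cluster ID, attribute ID, and data type are also unexpected, so a single bit flip cannot explain this packet.

Likely causes:

  1. Sensor malfunction - reading garbage
  2. Byte corruption - multiple bit errors or a misaligned frame in transmission
  3. Wrong sensor type - maybe it’s humidity (0-100%) encoded differently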

Debug steps:

  1. Check CRC/FCS (not shown in dump) - was it valid?
  2. Request retransmission
  3. Check sensor wiring and calibration
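
The byte-by-byte parse can be reproduced with Python's `struct` module. This sketch follows the field layout stated in the scenario and reads every multi-byte field as little-endian, as the parse above does; the expected cluster ID, attribute ID, and data type are the values quoted in the text.

```python
import struct

dump = bytes.fromhex("61 04 00 08 02 01 F4 01 48 2A")

# '<' = little-endian: frame control (H), sequence (B), cluster ID (H),
# attribute ID (H), data type (B), value (h = signed 16-bit)
frame_control, seq, cluster_id, attr_id, data_type, value = struct.unpack("<HBHHBh", dump)

print(f"0x{frame_control:04X}")   # 0x0461
print(f"0x{cluster_id:04X}")      # 0x0208
print(f"0x{attr_id:04X}")         # 0xF401
print(f"0x{data_type:02X}")       # 0x01
print(f"{value / 100:.2f}")       # 108.24

# Compare against what the text expects for a temperature report.
problems = []
if cluster_id != 0x0402:
    problems.append("unexpected cluster ID")
if attr_id != 0x0000:
    problems.append("unexpected attribute ID")
if data_type != 0x29:
    problems.append("unexpected data type")
print(problems)
```

Three failed field checks in one frame point toward a structural problem (wrong format assumption or misalignment) rather than an isolated bit error in the value.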

16.9 Code: Implementing Checksums and CRC in Python

def simple_checksum(data: bytes) -> int:
    """Simple 8-bit checksum: sum all bytes, keep lowest 8 bits."""
    return sum(data) & 0xFF

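def crc16_ccitt(data: bytes, poly: int = 0x1021, init: int = 0xFFFF) -> int:
    """CRC-16/CCITT-FALSE: polynomial 0x1021, init 0xFFFF, no bit reflection, no final XOR.

    X.25 and Bluetooth use the same polynomial, but their initial value and bit order may differ.
    """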
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = (crc << 1) ^ poly
            else:
                crc = crc << 1
            crc &= 0xFFFF  # Keep 16-bit
    return crc

# --- Demo: checksum weakness ---
packet_a = bytes([0x45, 0x3F, 0x12])  # Original
packet_b = bytes([0x3F, 0x45, 0x12])  # Bytes 0 and 1 swapped

print("=== Checksum (weak) ===")
print(f"Original:  {packet_a.hex()} -> checksum = 0x{simple_checksum(packet_a):02X}")
print(f"Swapped:   {packet_b.hex()} -> checksum = 0x{simple_checksum(packet_b):02X}")
print(f"Same checksum? {simple_checksum(packet_a) == simple_checksum(packet_b)}")
# Output: Both = 0x96. Checksum MISSES the transposition error!

print("\n=== CRC-16 (robust) ===")
print(f"Original:  {packet_a.hex()} -> CRC-16 = 0x{crc16_ccitt(packet_a):04X}")
print(f"Swapped:   {packet_b.hex()} -> CRC-16 = 0x{crc16_ccitt(packet_b):04X}")
print(f"Same CRC?  {crc16_ccitt(packet_a) == crc16_ccitt(packet_b)}")
# Output: Different CRCs. CRC DETECTS the transposition error.

# --- Demo: single bit flip detection ---
print("\n=== Single bit flip ===")
corrupted = bytes([0x45, 0x3F, 0x13])  # Last byte: 0x12 -> 0x13 (1 bit flip)
print(f"Original:   CRC = 0x{crc16_ccitt(packet_a):04X}")
print(f"Corrupted:  CRC = 0x{crc16_ccitt(corrupted):04X}")
print(f"Detected?   {crc16_ccitt(packet_a) != crc16_ccitt(corrupted)}")

What to observe: Run this code to see that the simple checksum produces identical values for [0x45, 0x3F, 0x12] and [0x3F, 0x45, 0x12] (transposed bytes), while CRC-16 catches the error. This is exactly why CRC is required for reliable IoT communication.

16.10 Worked Example: Debugging a Corrupted LoRaWAN Packet

Situation: A LoRaWAN temperature sensor on a building roof reports 847°C. The sensor (SHT31) has a range of -40 to 125°C. What happened?

Received payload (hex): 03 4F 01 A2

Expected format: [msg_type(1B)] [temp_x100(2B, big-endian, signed)] [humidity(1B)]

Parsing the corrupt payload:

  • msg_type = 0x03: sensor reading type, so the first byte looks valid.
  • temp_raw = 0x4F01: big-endian decode gives 20,225 / 100 = 202.25°C, which is still impossible.
  • temp_raw = 0x014F: little-endian decode gives 335 / 100 = 3.35°C, which is plausible for a roof sensor.

Root cause: The sensor firmware was updated from big-endian to little-endian encoding, but the server decoder was not updated. The bytes 4F 01 were decoded as big-endian (0x4F01 = 20,225) instead of little-endian (0x014F = 335 = 3.35°C).

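But what about the 847°C report? That was a different packet where the CRC check passed but a framing error shifted the payload bytes by one position. The humidity byte (0x64 = 100% RH) was interpreted as the high byte of temperature. Note that a signed 16-bit temp_x100 field tops out at 327.67°C, so a displayed 847°C also means the decoder was not applying the scaling described above.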

Lesson: CRC detects bit-level corruption, but it cannot detect application-layer framing errors where bytes are valid but misinterpreted. Always include a message type or version byte so decoders can validate the packet structure.
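
A decoder can catch this kind of mismatch with a plausibility check on the decoded value. The sketch below decodes the two temperature bytes both ways and compares each result with the sensor range quoted above; the constant and function names are illustrative.

```python
payload = bytes.fromhex("03 4F 01 A2")
msg_type, temp_bytes = payload[0], payload[1:3]

SENSOR_MIN_C, SENSOR_MAX_C = -40.0, 125.0     # SHT31 range quoted in the text

def decode_temp(raw: bytes, byteorder: str) -> float:
    """Decode a signed 16-bit temperature stored as degrees C x 100."""
    return int.from_bytes(raw, byteorder, signed=True) / 100

for order in ("big", "little"):
    temp = decode_temp(temp_bytes, order)
    plausible = SENSOR_MIN_C <= temp <= SENSOR_MAX_C
    print(order, temp, plausible)
# big 202.25 False
# little 3.35 True
```

A range check does not replace a version or message-type byte, but it gives the server a cheap way to flag packets that pass the CRC and still make no physical sense.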


16.11 Review Exercises

Common Pitfalls

Checksums and CRCs are not interchangeable just because both live in the trailer. Addition-based checksums miss structured errors such as byte transpositions that CRCs catch reliably, so calling them equivalent leads to silent corruption in real deployments.

Integrity checks only work if sender and receiver run the algorithm over the exact same byte sequence in the exact same order. A wrong initial value, reflected bit order, omitted header byte, or endian mistake makes good packets fail and bad debugging assumptions spread quickly.
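
Two of these mismatches are easy to demonstrate with `binascii.crc_hqx` from the Python standard library, which computes a 16-bit CRC with polynomial 0x1021 from a caller-supplied initial value. The test string and the choice of which byte to skip are arbitrary.

```python
import binascii

frame = b"123456789"          # conventional CRC test string

# Same polynomial (0x1021), different initial value: the results disagree.
crc_init_zero = binascii.crc_hqx(frame, 0x0000)
crc_init_ones = binascii.crc_hqx(frame, 0xFFFF)
print(hex(crc_init_zero), hex(crc_init_ones))   # 0x31c3 0x29b1

# Same parameters, but one side skips the first (header) byte.
crc_full = binascii.crc_hqx(frame, 0xFFFF)
crc_no_header = binascii.crc_hqx(frame[1:], 0xFFFF)
print(crc_full == crc_no_header)                # False
```

In both cases every frame fails verification even though nothing was corrupted, which is why the CRC parameters and byte range covered belong in the protocol specification.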

Detecting corruption is only half the system behavior. The real protocol must still decide whether to drop, retransmit, request a new sample, or raise an alarm. If recovery behavior is undefined, a strong CRC only tells you that the packet is bad, not what the system should do next.

16.12 Summary

Error detection ensures data integrity across noisy networks:

  • Checksums: Simple addition-based method, fast but weak detection
  • CRC: Polynomial-based method; CRC-32 misses only about 1 in 2^32 random corruptions
  • CRC-16/CRC-32: Standard choices for IoT protocols
  • Trade-offs: More robust detection requires more computation and bytes

Key Takeaways:

  • CRC is much more reliable than simple checksums
  • Checksums can miss transposed bytes that CRC catches
  • Safety-critical systems should use CRC-32 or better
  • Error detection enables retransmission; error correction avoids it

16.13 What’s Next

Sammy the Sensor sends a message: “Temperature is 25 degrees!” But oh no – a noisy radio wave garbles it to “Temperature is 95 degrees!”

Max the Microcontroller explains: “This is why we need ERROR DETECTION. It’s like adding a secret check to every message!”

Lila the LED shows two methods:

Method 1 – Checksum (Simple): “Add up all the numbers in your message. 2+5 = 7. Send ‘25’ plus the check ‘7’. The receiver adds 2+5 and checks: does it equal 7? YES! Message is good!”

“But,” Lila warns, “if the message changes from ‘25’ to ‘52’ (numbers swapped), the checksum is still 7! Oops – we missed the error!”

Method 2 – CRC (Super Smart): “CRC is like a magic math puzzle. It does fancy polynomial division (don’t worry, the computer does it automatically!) and catches almost EVERY error – even swapped numbers!”

Bella the Battery asks: “But doesn’t CRC use more energy?”

Max nods: “A little more math, but it catches 99.9999% of errors. For a sensor in a hospital or factory, that’s worth it! We don’t want wrong readings causing problems!”

The Squad’s Rule: Always add a check to your messages! Checksum is quick and easy. CRC is stronger and catches almost everything. For important data, always use CRC!