20  Retries & Sequence Numbers

In 60 Seconds

When IoT packets are lost, exponential backoff prevents network collapse by doubling the retry interval after each failure (e.g., 1s, 2s, 4s, 8s) with random jitter to avoid synchronized retransmission storms. Sequence numbers (typically 16-bit) let receivers detect lost packets, reject duplicates, and reorder out-of-sequence arrivals. Handling wraparound (when sequence numbers exceed 65,535 and reset to 0) requires modular arithmetic comparison to avoid breaking the protocol.

Key Concepts
  • Sequence Number: A monotonically increasing integer assigned to each message, allowing the receiver to detect: missing messages (gap in sequence), duplicate messages (repeated sequence), and reordering (out-of-order sequence)
  • Sliding Window Protocol: Flow control mechanism where sender can have up to W unacknowledged messages in-flight simultaneously; receiver acknowledges with cumulative or selective ACK
  • Go-Back-N ARQ: Retransmission strategy: on NAK or timeout, retransmit from the first unacknowledged message; simple but inefficient — retransmits messages that may have been received
  • Selective Repeat ARQ: Retransmission strategy: only retransmit specifically NAKed or timed-out messages; requires receiver to buffer out-of-order messages; more efficient than Go-Back-N
  • MQTT QoS 0/1/2: QoS 0 = fire-and-forget (no guarantee); QoS 1 = at-least-once delivery (publisher retransmits until PUBACK received, duplicates possible); QoS 2 = exactly-once (4-message handshake, no duplicates)
  • Idempotent Operation: Operation that can be safely repeated multiple times without changing the result beyond the first execution; enables at-least-once retry without data corruption
  • Message Deduplication: Mechanism for detecting and discarding duplicate messages; approaches: sequence number tracking, message hash table, or UUID deduplication window
  • CoAP Confirmable Message: CoAP message type requiring ACK; sender retransmits with exponential backoff until ACK or MAX_RETRANSMIT (default 4) is reached; provides CoAP-level reliability over UDP
Learning Objectives

By the end of this section, you will be able to:

  • Implement exponential backoff: Design congestion-aware retry algorithms with proper timeout calculations
  • Apply random jitter: Prevent collision storms by spreading retry attempts across time
  • Configure sequence numbers: Select and apply sequence numbers to detect loss, duplication, and reordering
  • Construct wraparound handlers: Build modular arithmetic comparators for 16-bit sequence number comparison
  • Select retry parameters: Evaluate and choose appropriate timeout and retry limits for different IoT scenarios
  • Calculate retry overhead: Assess the cost of reliability in terms of latency and power consumption

When a message gets lost on a network, the sender waits and then tries again — that is a retry. Sequence numbers are like numbering the pages of a book so the receiver can put them back in order if they arrive jumbled. Together, these two simple ideas make unreliable networks work reliably.

“When a packet gets lost, the worst thing you can do is immediately blast out retries,” warned Max the Microcontroller. “That is like everyone in a crowded room shouting louder when they cannot be heard — it just makes the noise worse!”

“Exponential backoff is the answer,” explained Sammy the Sensor. “If my first try fails, I wait 1 second. If the second try fails, I wait 2 seconds. Then 4, then 8. Each doubling gives the network time to clear congestion. And I add random jitter so I do not retry at the exact same time as 500 other sensors.”

“Sequence numbers are like page numbers in a book,” added Lila the LED. “I number every message: 1, 2, 3, 4. If the receiver gets 1, 2, 4 — they know message 3 was lost and can request it. If they get 1, 2, 2, 3 — they know the second 2 is a duplicate and can ignore it.”

“Watch out for wraparound,” cautioned Bella the Battery. “With 16-bit sequence numbers, after 65,535 comes 0 again. Your code needs modular arithmetic to handle this correctly, or it will think sequence 0 is older than 65,535 and break everything!”

20.1 Prerequisites

Before diving into this chapter, you should be familiar with:

Why Retry Mechanisms Matter

Packet loss is inevitable in IoT networks. Radio interference, network congestion, sleeping receivers, and gateway reboots all cause packets to disappear. Error detection tells you when data is corrupted, but retry mechanisms recover from complete loss. Without proper retry logic, your sensor readings may never reach the cloud, or critical commands may fail to reach actuators.


20.2 The Problem with Naive Retry

Time: ~10 min | Level: Intermediate | Unit: P07.REL.U03

When a transmission fails, the obvious solution is to try again immediately. But naive retry can worsen the problem:

Timeline diagram showing multiple IoT devices all retrying at the same instant after a failure, resulting in overlapping transmissions and a collision storm that makes congestion worse.
Figure 20.1: Naive retry problem: all devices retry simultaneously, causing collision storms that worsen congestion.

The solution: Exponential backoff with random jitter.


20.3 Exponential Backoff Algorithm

Exponential backoff doubles the wait time after each failure, reducing network load during congestion:

Wait Time = min(Initial_Timeout * 2^retry_count, Max_Timeout) + random_jitter

Exponential backoff’s effectiveness comes from how quickly it spreads colliding devices. With \(k\) devices each selecting a random slot from \(2^n\) slots at retry attempt \(n\), the probability that any two specific devices choose the same slot is \(1/2^n\). As \(n\) grows, the collision window widens exponentially, rapidly separating devices.

For 100 devices colliding at t=0 with base timeout 500ms and jitter:

  • Retry 1 (500ms window): Devices spread randomly across 0–500ms — many succeed
  • Retry 2 (1000ms window): Remaining devices spread across 0–1000ms — more succeed
  • Retry 3 (2000ms window): Remaining devices spread across 0–2000ms

Approximate total packets sent across 4 attempts at 50% per-attempt collision rate:

\[\text{Packets} \approx N + N/2 + N/4 + N/8 = N \times \frac{15}{8} = 100 \times 1.875 \approx 188\]

By comparison, naive synchronized retry, where all devices retry at the same instant each round and keep colliding, sends about 400 packets (100 devices × 4 attempts). Under these assumptions, exponential spreading cuts total transmissions roughly in half.
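The estimate above can be reproduced in a few lines of Python. This is a minimal sketch that assumes, as the text does, a fixed collision rate per attempt; `total_packets` is an illustrative helper name, not part of any library.

```python
def total_packets(devices, attempts, collision_rate):
    """Expected transmissions when a fixed fraction of senders must retry each round."""
    total = 0.0
    remaining = float(devices)
    for _ in range(attempts):
        total += remaining           # every pending device transmits once
        remaining *= collision_rate  # this fraction collides and retries
    return total

with_backoff = total_packets(100, 4, 0.5)  # 100 + 50 + 25 + 12.5
synchronized = total_packets(100, 4, 1.0)  # every attempt collides
print(with_backoff, synchronized)          # 187.5 400.0
```

Real collision rates change from round to round, so treat the output as an order-of-magnitude comparison rather than a prediction.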

20.3.1 Backoff Timeline

Timeline diagram showing retry attempts spaced at exponentially increasing intervals: 500 ms after the first failure, 1000 ms after the second, 2000 ms after the third, and 4000 ms after the fourth, illustrating how backoff reduces network load.
Figure 20.2: Exponential backoff timeline: each retry doubles the wait time (500ms, 1000ms, 2000ms, 4000ms) until success or maximum retries.

20.3.2 Implementation

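#define INITIAL_TIMEOUT_MS  500
#define MAX_TIMEOUT_MS      8000
#define JITTER_PERCENT      25

int calculateBackoff(int retryCount) {
    // Clamp the exponent so the shift cannot overflow on large retry counts
    if (retryCount > 10) retryCount = 10;

    // Exponential: initial * 2^retry (long arithmetic avoids 16-bit int overflow)
    long baseTimeout = (long)INITIAL_TIMEOUT_MS * (1L << retryCount);

    // Cap at maximum
    if (baseTimeout > MAX_TIMEOUT_MS) baseTimeout = MAX_TIMEOUT_MS;

    // Add random jitter: random(n) returns 0..n-1, so up to 25% of the timeout
    long jitter = random((baseTimeout * JITTER_PERCENT) / 100);

    return (int)(baseTimeout + jitter);
}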

20.3.3 Backoff Calculation Examples

| Retry Count | Base Timeout | With Max Cap | With 25% Jitter |
|---|---|---|---|
| 0 | 500 ms | 500 ms | 500-625 ms |
| 1 | 1000 ms | 1000 ms | 1000-1250 ms |
| 2 | 2000 ms | 2000 ms | 2000-2500 ms |
| 3 | 4000 ms | 4000 ms | 4000-5000 ms |
| 4 | 8000 ms | 8000 ms | 8000-10000 ms |
| 5 | 16000 ms | 8000 ms (capped) | 8000-10000 ms |
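The rows of this table follow directly from the backoff formula. The sketch below recomputes them, assuming the same 500 ms initial timeout, 8000 ms cap, and 25% jitter; `backoff_bounds` is a hypothetical helper used only for illustration.

```python
def backoff_bounds(retry_count, initial_ms=500, max_ms=8000, jitter_pct=25):
    """Return (uncapped, capped, upper_bound) timeouts in ms for one retry."""
    uncapped = initial_ms * 2 ** retry_count
    capped = min(uncapped, max_ms)
    upper = capped + capped * jitter_pct // 100  # capped timeout plus maximum jitter
    return uncapped, capped, upper

for retry in range(6):
    uncapped, capped, upper = backoff_bounds(retry)
    print(f"retry {retry}: base {uncapped} ms, capped {capped} ms, "
          f"with jitter {capped}-{upper} ms")
```

Changing `max_ms` or `jitter_pct` shows how early the cap takes effect and how wide the jitter window becomes.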


20.4 Why Random Jitter is Essential

Without jitter, devices that experience simultaneous failures will retry at exactly the same time, causing collision storms:

Side-by-side timeline diagram: the left side shows ten devices all retrying at exactly the same moment with no jitter, causing a collision; the right side shows the same devices retrying at randomly distributed times within a window, avoiding collisions.
Figure 20.3: Jitter comparison: without jitter all retries collide; with jitter retries are spread across a time window.
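A small simulation can make the comparison in the figure concrete. This is a sketch under stated assumptions: two retries "collide" if they fall in the same 10 ms slot, and `count_collisions`, the slot width, and the seed are all illustrative choices.

```python
import random

def count_collisions(num_devices, base_ms, jitter_ms, slot_ms=10, seed=1):
    """Count devices whose retry lands in the same time slot as another device."""
    rng = random.Random(seed)
    slots = {}
    for _ in range(num_devices):
        offset = rng.randrange(jitter_ms) if jitter_ms else 0
        slot = (base_ms + offset) // slot_ms
        slots[slot] = slots.get(slot, 0) + 1
    # Only slots holding more than one device count as collisions
    return sum(n for n in slots.values() if n > 1)

print(count_collisions(10, 500, 0))    # 10: every device shares one slot
print(count_collisions(10, 500, 500))  # typically far fewer with a 500 ms jitter window
```

Varying `seed` and `jitter_ms` shows that a wider jitter window lowers the number of colliding devices on average, although any single run can still contain a collision.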

20.4.1 Jitter Implementation Patterns

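// Pattern 1: Percentage jitter (adds 0 to 25% of the timeout)
int jitter1 = random((timeout * 25L) / 100);

// Pattern 2: Fixed window jitter
int jitter2 = random(500);  // 0-499 ms random

// Pattern 3: Decorrelated jitter (described in the AWS Architecture Blog).
// Produces the whole wait time, not an additive jitter: a random value
// between the base timeout and 3x the previous wait, capped at the maximum.
// prevWait holds the wait used for the previous attempt (start at the base timeout).
long wait3 = min((long)MAX_TIMEOUT_MS, random(INITIAL_TIMEOUT_MS, prevWait * 3L));

// Pattern 4: Equal jitter (half fixed, half random); also a whole wait time
int wait4 = (timeout / 2) + random(timeout / 2);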
Best Practice: Jitter Parameters

For IoT applications, typical jitter parameters are:

  • Percentage jitter: 20-30% of base timeout
  • Minimum jitter: 50-100ms (to spread even short timeouts)
  • Seed randomness: Use hardware random (ADC noise, radio noise) not just millis()
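For comparison, here is a hedged Python sketch of three commonly cited jitter strategies. The names follow the AWS Architecture Blog write-up on backoff and jitter; the default values (500 ms base, 8000 ms cap) are assumptions carried over from this chapter, and the `rng` parameter exists only so a seeded generator can be injected.

```python
import random

def full_jitter(attempt, base_ms=500, cap_ms=8000, rng=random):
    """Wait anywhere between 0 and the capped exponential timeout."""
    return rng.uniform(0, min(cap_ms, base_ms * 2 ** attempt))

def equal_jitter(attempt, base_ms=500, cap_ms=8000, rng=random):
    """Keep half of the capped timeout fixed and randomize the other half."""
    half = min(cap_ms, base_ms * 2 ** attempt) / 2
    return half + rng.uniform(0, half)

def decorrelated_jitter(previous_ms, base_ms=500, cap_ms=8000, rng=random):
    """Draw the next wait between the base and three times the previous wait."""
    return min(cap_ms, rng.uniform(base_ms, previous_ms * 3))

wait = 500.0
for attempt in range(4):
    wait = decorrelated_jitter(wait)
    print(f"attempt {attempt}: full={full_jitter(attempt):.0f} ms, "
          f"equal={equal_jitter(attempt):.0f} ms, decorrelated={wait:.0f} ms")
```

Full jitter spreads retries most widely but can produce very short waits. Equal jitter guarantees at least half of the timeout, which may suit devices that must give the receiver time to recover.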

20.5 Retry Parameter Selection

Different IoT scenarios require different retry parameters:

20.5.1 Parameter Guidelines

| Scenario | Initial Timeout | Max Timeout | Max Retries | Jitter |
|---|---|---|---|---|
| Real-time control | 100-200 ms | 1 s | 2-3 | 10% |
| Sensor reporting | 500 ms-1 s | 30 s | 3-5 | 25% |
| Firmware update | 2-5 s | 60 s | 10+ | 30% |
| Low-power sleep | 5-10 s | 5 min | 3 | 50% |

20.5.2 Timeout Calculation

The initial timeout should be based on expected round-trip time (RTT) plus margin:

// Timeout = RTT + margin for processing + margin for variability
int timeout = expectedRTT * 2 + 100;  // 2x RTT + 100ms buffer
Horizontal bar diagram breaking the round-trip time into five sequential segments: sender transmits (50 ms), signal propagates to receiver (20 ms), receiver processes the packet (30 ms), receiver transmits ACK (5 ms), and ACK propagates back (20 ms), totalling 125 ms, with a recommended timeout of 250 ms including a 2x safety margin.
Figure 20.4: Timeout components: the initial timeout should cover the full round-trip time including transmission, propagation delay, receiver processing, ACK transmission, and return propagation, plus a safety margin.
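The timeout arithmetic from the figure can be sketched as follows. The component values are the ones shown in the figure; `initial_timeout_ms`, `safety_factor`, and `extra_margin_ms` are illustrative names, and the extra margin corresponds to the optional 100 ms buffer in the code snippet above.

```python
def initial_timeout_ms(tx_ms, propagation_ms, processing_ms, ack_tx_ms,
                       safety_factor=2.0, extra_margin_ms=0):
    """Return (rtt, timeout) in ms; propagation is counted once in each direction."""
    rtt = tx_ms + propagation_ms + processing_ms + ack_tx_ms + propagation_ms
    return rtt, rtt * safety_factor + extra_margin_ms

rtt, timeout = initial_timeout_ms(50, 20, 30, 5)
print(rtt, timeout)  # 125 250.0

_, padded = initial_timeout_ms(50, 20, 30, 5, extra_margin_ms=100)
print(padded)        # 350.0
```

A timeout shorter than the real round trip causes spurious retransmissions, so it is usually safer to err on the long side when RTT measurements are noisy.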


20.6 Sequence Numbers and Loss Detection

Time: ~10 min | Level: Intermediate | Unit: P07.REL.U04

Sequence numbers are unique identifiers assigned to each message. They solve three critical problems:

  1. Loss detection: Gaps in sequence indicate missing packets
  2. Duplicate detection: Same sequence number received twice indicates a retransmission (the original ACK was lost), so the duplicate copy can be safely discarded
  3. Ordering: Receiver can reassemble out-of-order packets correctly

20.6.1 Sequence Number Scenarios

Four sequence diagrams illustrating: (1) normal delivery with packets 1-4 arriving in order; (2) gap detection where packets 1, 2, 4 arrive and the receiver identifies packet 3 as missing; (3) duplicate detection where packet 2 arrives twice and the second copy is discarded; (4) reordering where packets arrive as 1, 3, 2 and the receiver buffers and reorders them.
Figure 20.5: Sequence number scenarios: normal ordered delivery, gap detection for loss, duplicate identification, and reordering correction.

20.6.2 Implementing Sequence Numbers

// Message layout assumed by this example
typedef struct {
    uint16_t sequenceNum;
    const char* payload;
} Message;

// Sender side
uint16_t nextSequenceNum = 0;

void sendMessage(const char* data) {
    Message msg;
    msg.sequenceNum = nextSequenceNum++;  // wraps from 65535 back to 0
    msg.payload = data;
    transmit(&msg);
}

// Receiver side
uint16_t expectedSequenceNum = 0;

void handleMessage(Message* msg) {
    // Signed 16-bit distance from the expected number; this stays correct
    // across wraparound (see Section 20.7)
    int16_t delta = (int16_t)(msg->sequenceNum - expectedSequenceNum);

    if (delta == 0) {
        // Normal in-order delivery
        processData(msg->payload);
        expectedSequenceNum++;
        sendAck(msg->sequenceNum);

    } else if (delta < 0) {
        // Duplicate - ACK but don't process again
        sendAck(msg->sequenceNum);

    } else {
        // Gap detected - request missing packets
        for (uint16_t seq = expectedSequenceNum; seq != msg->sequenceNum; seq++) {
            requestRetransmit(seq);
        }
        bufferForLater(msg);
    }
}
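The receiver logic can also be sketched in Python with an explicit reorder buffer, which makes the "buffer for later" step visible. This is a simplified illustration: the `Receiver` class is hypothetical, and it uses unbounded Python integers, so wraparound is deliberately left out.

```python
class Receiver:
    """Delivers payloads in order, buffering early arrivals and dropping duplicates."""

    def __init__(self):
        self.expected = 0    # next sequence number to deliver
        self.buffer = {}     # out-of-order messages keyed by sequence number
        self.delivered = []  # payloads handed to the application, in order

    def on_message(self, seq, payload):
        if seq == self.expected:
            self.delivered.append(payload)
            self.expected += 1
            # Drain any buffered messages that are now in order
            while self.expected in self.buffer:
                self.delivered.append(self.buffer.pop(self.expected))
                self.expected += 1
            return "delivered"
        if seq < self.expected or seq in self.buffer:
            return "duplicate"
        self.buffer[seq] = payload  # gap: hold until the missing messages arrive
        return "buffered"

rx = Receiver()
for seq, payload in [(0, "a"), (2, "c"), (2, "c"), (1, "b"), (1, "b")]:
    print(seq, rx.on_message(seq, payload))
print(rx.delivered)  # ['a', 'b', 'c']
```

A production receiver would also bound the buffer size, since a sender that skips far ahead could otherwise exhaust memory.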

20.7 Sequence Number Wraparound

With a 16-bit sequence number (0-65535), what happens after message 65535? The sequence “wraps around” to 0. Receivers must handle this using modular arithmetic:

20.7.1 The Wraparound Problem

Sequence progression: 65533, 65534, 65535, 0, 1, 2, ...

Naive comparison fails:

// WRONG! This breaks at wraparound
if (received_seq > expected_seq) {
    // Handle out of order
}
// When expected=65535 and received=0, the test 0 > 65535 is false,
// so the newer packet 0 is wrongly treated as older than 65535

20.7.2 Correct Wraparound Handling

// Check if seq_a comes before seq_b (handling wraparound)
bool isSequenceBefore(uint16_t seq_a, uint16_t seq_b) {
    // Interpret difference as signed value
    return (int16_t)(seq_a - seq_b) < 0;
}

// Example: seq_a=65535, seq_b=0
// 65535 - 0 = 65535, interpreted as signed = -1
// -1 < 0, so 65535 "comes before" 0 (correct!)
Wraparound Window

This approach works correctly as long as sequence numbers don’t differ by 32,768 or more (half the 16-bit range of 65,536). A difference of exactly 32,768 is ambiguous — the signed interpretation cannot determine which value comes first. For high-throughput applications that may exhaust the 16-bit space in seconds, use 32-bit sequence numbers.
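The same comparison can be explored in Python, where integers do not wrap on their own. The sketch below masks the difference to 16 bits and treats values of 0x8000 or more as negative, which mirrors the signed cast in the C version; `is_sequence_before` is an illustrative name.

```python
def is_sequence_before(seq_a, seq_b):
    """True if seq_a precedes seq_b in 16-bit serial-number order."""
    diff = (seq_a - seq_b) & 0xFFFF  # difference modulo 2^16
    return diff >= 0x8000            # top bit set means "negative" as int16

print(is_sequence_before(100, 200))    # True
print(is_sequence_before(65535, 0))    # True: wraparound handled
print(is_sequence_before(0, 32767))    # True: largest unambiguous distance
print(is_sequence_before(0, 32768))    # True
print(is_sequence_before(32768, 0))    # True: same pair reversed, hence ambiguous
print(is_sequence_before(0, 32769))    # False: beyond half the range the order flips
```

The two results for the distance of exactly 32,768 show the ambiguity directly: each value appears to come before the other.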



20.8 Selective Acknowledgment (SACK)

For high-throughput applications, acknowledging every packet is inefficient. Selective Acknowledgment allows the receiver to acknowledge multiple packets at once and specifically identify gaps:

Two-part sequence diagram comparing retransmission strategies: the top panel shows Go-Back-N where all packets from sequence 3 onward are resent after a gap, and the bottom panel shows Selective ACK where only the missing packet 3 is retransmitted while packets 4 and 5 are not repeated.
Figure 20.6: SACK comparison: without SACK the sender retransmits all packets from the gap onward (Go-Back-N); with SACK only the specifically missing packet is retransmitted.
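The difference between the two strategies in the figure comes down to which packets are resent. A minimal sketch, assuming the sender knows exactly which sequence numbers were lost; both function names are illustrative.

```python
def go_back_n_resends(sent, lost):
    """Go-Back-N resends everything from the first lost packet onward."""
    first_lost = min(lost)
    return [seq for seq in sent if seq >= first_lost]

def selective_resends(sent, lost):
    """Selective acknowledgment resends only the packets reported missing."""
    return [seq for seq in sent if seq in lost]

sent = [1, 2, 3, 4, 5]
lost = {3}
print(go_back_n_resends(sent, lost))  # [3, 4, 5]
print(selective_resends(sent, lost))  # [3]
```

The saving grows with the window size, at the cost of the receiver having to buffer packets 4 and 5 until packet 3 arrives.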

20.9 Retry Overhead Analysis

Reliability has costs. Understanding the overhead helps you choose appropriate parameters:

20.9.1 Time Overhead

Total delivery time = RTT + (retries * average_backoff)

Example (20% loss, initial=500ms, up to 3 retries, RTT=140ms):
- 80.0% success on first try:    140ms RTT
- 16.0% success on second try:   140ms + 500ms = 640ms
-  3.2% success on third try:    140ms + 500ms + 1000ms = 1640ms
-  0.64% success on fourth try:  140ms + 500ms + 1000ms + 2000ms = 3640ms

Weighted sum over the 99.84% of messages that are delivered:
  0.800*140 + 0.160*640 + 0.032*1640 + 0.0064*3640
= 112 + 102.4 + 52.5 + 23.3 ≈ 290 ms
Dividing by 0.9984 (averaging over delivered messages only) gives ≈ 291 ms.
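The weighted sum can be recomputed for other loss rates with a short script. This sketch assumes independent losses, a timeout that doubles on every retry, and no jitter; `expected_delivery_ms` is a hypothetical helper.

```python
def expected_delivery_ms(loss, rtt_ms, initial_timeout_ms, max_retries):
    """Return (weighted delivery time in ms, probability of delivery)."""
    weighted = 0.0
    delivered = 0.0
    waited = 0.0  # total timeout spent before the current attempt
    for attempt in range(max_retries + 1):
        p = (1 - loss) * loss ** attempt  # success exactly on this attempt
        weighted += p * (waited + rtt_ms)
        delivered += p
        waited += initial_timeout_ms * 2 ** attempt
    return weighted, delivered

weighted, delivered = expected_delivery_ms(0.20, 140, 500, 3)
print(f"{weighted:.1f} ms over {delivered:.2%} delivered")
print(f"{weighted / delivered:.1f} ms average for delivered messages")
```

Raising `loss` to 0.5 shows how quickly the tail of slow deliveries starts to dominate the average.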

20.9.2 Power Overhead

Assuming 3.3 V supply. Energy = Current × Voltage × Time.

| Component | Current | Duration | Energy (at 3.3 V) |
|---|---|---|---|
| TX attempt | 100 mA | 50 ms | 16.5 mJ |
| Wait (sleep) | 10 µA | 500 ms | 0.0165 mJ |
| RX window | 20 mA | 100 ms | 6.6 mJ |

With 20% loss, expected retries per message ≈ 0.25, so average transmissions = 1.25:

Energy per message = 1.25 * (16.5 + 0.0165 + 6.6) ≈ 28.9 mJ
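The per-message energy can be checked with the following sketch. It assumes the currents and durations from the table and unlimited retries (so the expected number of transmissions is 1 / (1 - loss)); `energy_mj` is an illustrative helper.

```python
def energy_mj(current_ma, voltage_v, duration_ms):
    """Energy in millijoules: mA x V x ms gives microjoules, so divide by 1000."""
    return current_ma * voltage_v * duration_ms / 1000.0

VOLTAGE = 3.3
tx = energy_mj(100, VOLTAGE, 50)      # TX attempt
sleep = energy_mj(0.01, VOLTAGE, 500) # 10 uA wait
rx = energy_mj(20, VOLTAGE, 100)      # RX window

loss = 0.20
avg_transmissions = 1 / (1 - loss)    # geometric series for unlimited retries
per_message = avg_transmissions * (tx + sleep + rx)
print(f"TX {tx:.1f} mJ, sleep {sleep:.4f} mJ, RX {rx:.1f} mJ")
print(f"Energy per message: {per_message:.1f} mJ")
```

The sleep term is three orders of magnitude smaller than the radio terms, which is why sleeping during backoff costs almost nothing per retry.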
Power Optimization

To minimize retry power cost:

  1. Use hardware CRC to avoid retransmits from corruption
  2. Increase TX power slightly to reduce loss rate
  3. Sleep during backoff rather than busy-waiting
  4. Use adaptive timeout based on recent RTT measurements



Worked Example: Designing Retry Parameters for a Smart Meter Network

Scenario: A utility company deploys 10,000 smart electricity meters using LoRaWAN. Each meter sends a 24-byte reading every hour. The LoRaWAN gateway experiences 8% packet loss during peak hours (6-9 AM) due to channel congestion. Design the retry strategy.

Step 1: Characterize the link

Round-trip time (RTT): 2 seconds (LoRaWAN Class A RX1 + RX2 windows)
Packet loss rate: 8% peak, 2% off-peak
Duty cycle limit: 1% (EU 868 MHz regulation)
TX time per message: 51 ms (SF7, 24 bytes)
Battery: 3.6V, 19 Ah (D-cell lithium thionyl chloride)
Target battery life: 15 years

Step 2: Calculate retry budget

Messages per day: 24 (hourly)
Max TX time per day (1% duty cycle): 864 seconds
TX time per message attempt: 51 ms
Max attempts per day: 864 / 0.051 = 16,941
Attempts per message: 16,941 / 24 = 705 (theoretical max)
Practical limit: 3 retries per message (energy budget)
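The duty-cycle budget in Step 2 is simple enough to script, which helps when the reporting interval or airtime changes. A minimal sketch using the scenario's numbers; `duty_cycle_budget` is a hypothetical helper, and it spreads the 1% limit over a whole day as the text does.

```python
def duty_cycle_budget(duty_cycle, tx_time_s, messages_per_day):
    """Return (airtime seconds per day, attempts per day, attempts per message)."""
    airtime_s = 86_400 * duty_cycle
    attempts_per_day = int(airtime_s / tx_time_s)
    return airtime_s, attempts_per_day, attempts_per_day // messages_per_day

airtime, per_day, per_message = duty_cycle_budget(0.01, 0.051, 24)
print(f"{airtime:.0f} s airtime, {per_day} attempts/day, {per_message} per message")
```

The theoretical limit is far above the three retries chosen, so the energy budget and channel congestion constrain this design, not the duty cycle.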

Step 3: Choose backoff parameters

| Parameter | Value | Rationale |
|---|---|---|
| Initial timeout | 3 seconds | RTT (2 s) + 1 s margin |
| Backoff multiplier | 2x | Standard exponential |
| Max timeout | 12 seconds | 4x initial, stays within duty cycle |
| Max retries | 3 | Energy budget allows it |
| Jitter | 50% | High jitter because 10,000 meters may collide |

Step 4: Analyze delivery probability

P(success on attempt k) = (1 - loss_rate) * loss_rate^(k-1)

Peak hours (8% loss):
  Attempt 1: 92.0% success
  Attempt 2: 7.36% (cumulative: 99.36%)
  Attempt 3: 0.59% (cumulative: 99.95%)
  Attempt 4: 0.047% (cumulative: 99.996%)

After 4 attempts: 0.004% failure = about 0.4 meters per 10,000 miss a reading in each hourly round
At the peak-hour loss rate, that is about 10 missed readings per day across the fleet, which is acceptable.
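The delivery probabilities follow a geometric distribution, so they can be tabulated for any loss rate. The sketch below assumes independent losses on every attempt; `delivery_table` is an illustrative name.

```python
def delivery_table(loss, attempts):
    """Return rows of (attempt, P(success on this attempt), cumulative P(success))."""
    rows = []
    cumulative = 0.0
    for k in range(1, attempts + 1):
        p = (1 - loss) * loss ** (k - 1)
        cumulative += p
        rows.append((k, p, cumulative))
    return rows

for k, p, cumulative in delivery_table(0.08, 4):
    print(f"attempt {k}: {p:.4%} (cumulative {cumulative:.4%})")

failure = 0.08 ** 4  # all four attempts lost
print(f"missed per round: {failure * 10_000:.2f}, per day: {failure * 10_000 * 24:.1f}")
```

Running it with the off-peak loss rate of 2% shows that a fourth attempt is almost never needed outside peak hours.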

Step 5: Calculate energy cost of retries

Energy per TX: 40 mA x 51 ms x 3.6V = 7.3 mJ
Average retries per message (8% loss): 0.08 + 0.08^2 + 0.08^3 = 0.087

Energy per message (with retries): 7.3 mJ x 1.087 = 7.9 mJ
Energy per day: 7.9 mJ x 24 = 189.6 mJ
Energy per year: 189.6 mJ x 365 = 69.2 J

Battery capacity: 19 Ah x 3.6V = 68.4 Wh = 246,240 J
Battery life: 246,240 / 69.2 = 3,558 years (TX only)

With sleep current (10 µA):
  Sleep energy per year: 10×10⁻⁶ A × 3.6V × 365 × 24 × 3600 s ≈ 1,135 J/year
  Total energy per year: 69.2 + 1,135 ≈ 1,205 J/year
  Battery life: 246,240 / 1,205 ≈ 204 years (sleep current dominates; in practice capped by battery shelf life ~15–20 years)

Result: 3 retries with 50% jitter and a 3-second initial timeout achieve about 99.996% delivery reliability while staying within the 1% duty cycle and supporting 15+ year battery life. The high jitter is critical: without it, all 10,000 meters retrying at 3-second intervals would create a synchronized collision storm.

20.10 Concept Relationships

Prerequisites:

Builds Upon:

  • Probability theory → Jitter uses randomness to decorrelate retry attempts
  • Queueing theory → Exponential backoff stabilizes overloaded networks
  • Modular arithmetic → Sequence number wraparound comparison

Enables:

Related Standards:

  • RFC 6298: TCP Retransmission Timeout (RTO) computation
  • RFC 7252: CoAP retransmission timing and backoff
  • IEEE 802.11: Wi-Fi exponential backoff for CSMA/CA
  • IEEE 802.3: Ethernet CSMA/CD uses truncated binary exponential backoff for collision resolution
  • ALOHA protocol: classic random-access analysis that motivates randomized backoff on shared channels

Advanced Topics:

20.11 Try It Yourself

Exercise 1: Visualize Exponential Backoff Spread

Plot how exponential backoff spreads retry attempts over time:

# exponential_backoff_visualization.py
import matplotlib.pyplot as plt
import random

def calculate_backoff(retry_count, initial=500, max_timeout=8000, jitter_pct=25):
    base = initial * (2 ** retry_count)
    base = min(base, max_timeout)
    jitter = random.randint(0, int(base * jitter_pct / 100))
    return base + jitter

# Simulate 100 devices
devices = []
for device_id in range(100):
    retry_times = [0]  # Initial attempt at t=0
    for retry in range(1, 6):
        retry_time = retry_times[-1] + calculate_backoff(retry - 1)
        retry_times.append(retry_time)
    devices.append(retry_times)

# Plot
fig, ax = plt.subplots(figsize=(12, 6))
for i, times in enumerate(devices):
    ax.scatter(times, [i] * len(times), s=10, alpha=0.6)

ax.set_xlabel('Time (ms)')
ax.set_ylabel('Device ID')
ax.set_title('Exponential Backoff: 100 Devices, 5 Retries')
ax.grid(alpha=0.3)
plt.tight_layout()
plt.savefig('exponential_backoff.png', dpi=150)
print("Saved exponential_backoff.png")

What to Observe:

  • Initial vertical line at t=0 (all devices try simultaneously)
  • Each retry round spreads horizontally across wider time window
  • Retry 4 lands between roughly 7,500 and 9,400 ms, so by then each device's attempts span most of a 0-10,000 ms range

Exercise 2: Sequence Number Wraparound Test

Verify modular arithmetic handles wraparound correctly:

// sequence_wraparound_test.ino (Arduino/ESP32)
bool isSequenceBefore(uint16_t seq_a, uint16_t seq_b) {
  return (int16_t)(seq_a - seq_b) < 0;
}

void setup() {
  Serial.begin(115200);
  delay(1000);

  Serial.println("Testing sequence number wraparound:");

  // Normal case: 100 comes before 200
  Serial.printf("100 before 200? %s (expect: YES)\n",
                isSequenceBefore(100, 200) ? "YES" : "NO");

  // Wraparound case: 65535 comes before 0
  Serial.printf("65535 before 0? %s (expect: YES)\n",
                isSequenceBefore(65535, 0) ? "YES" : "NO");

  // Wraparound case: 65500 comes before 100
  Serial.printf("65500 before 100? %s (expect: YES)\n",
                isSequenceBefore(65500, 100) ? "YES" : "NO");

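  // Edge case: 32767 is the largest distance that compares correctly
  Serial.printf("0 before 32767? %s (expect: YES)\n",
                isSequenceBefore(0, 32767) ? "YES" : "NO");
  // Exactly half the range is ambiguous: both directions report YES
  Serial.printf("0 before 32768? %s (expect: YES - ambiguous)\n",
                isSequenceBefore(0, 32768) ? "YES" : "NO");
  Serial.printf("32768 before 0? %s (expect: YES - ambiguous)\n",
                isSequenceBefore(32768, 0) ? "YES" : "NO");
  // Beyond half the range the order flips
  Serial.printf("0 before 32769? %s (expect: NO - too far)\n",
                isSequenceBefore(0, 32769) ? "YES" : "NO");
}

void loop() {
  // Test with real sequence progression
  static uint16_t seq = 65530;
  Serial.printf("Current seq: %u\n", seq);
  seq++;
  delay(1000);
}

Expected Output:

Testing sequence number wraparound:
100 before 200? YES (expect: YES)
65535 before 0? YES (expect: YES)
65500 before 100? YES (expect: YES)
0 before 32767? YES (expect: YES)
0 before 32768? YES (expect: YES - ambiguous)
32768 before 0? YES (expect: YES - ambiguous)
0 before 32769? NO (expect: NO - too far)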

Current seq: 65530
Current seq: 65531
...
Current seq: 65535
Current seq: 0    ← Wraparound handled correctly
Current seq: 1

What to Observe: The signed arithmetic comparison correctly handles wraparound as long as sequences don’t differ by more than 32,767 (half the range).


Exercise 3: Retry Budget Analysis

Calculate how many retries your battery budget allows:

# retry_budget_calculator.py
def retry_budget_analysis(
    battery_mAh=2000,
    target_years=5,
    messages_per_day=100,
    tx_current_mA=120,
    tx_time_ms=50,
    packet_loss_rate=0.05
):
    # Daily budget
    daily_budget_mAh = battery_mAh / (target_years * 365)

    # Energy per successful transmission
    tx_energy_mAh = (tx_current_mA * tx_time_ms) / 3_600_000

    # With retries, expected transmissions per message
    # E[transmissions] = 1 / (1 - loss_rate)
    avg_transmissions = 1 / (1 - packet_loss_rate)

    # Daily energy with retries
    daily_energy_mAh = messages_per_day * avg_transmissions * tx_energy_mAh

    # Check if within budget
    budget_ok = daily_energy_mAh < daily_budget_mAh

    print(f"=== Retry Budget Analysis ===")
    print(f"Battery: {battery_mAh} mAh, Target: {target_years} years")
    print(f"Messages/day: {messages_per_day}, Loss rate: {packet_loss_rate*100}%")
    print(f"")
    print(f"Daily budget: {daily_budget_mAh:.3f} mAh")
    print(f"Energy per TX: {tx_energy_mAh:.6f} mAh")
    print(f"Avg transmissions per message: {avg_transmissions:.2f}")
    print(f"Daily energy (with retries): {daily_energy_mAh:.3f} mAh")
    print(f"")
    print(f"Status: {'✓ WITHIN BUDGET' if budget_ok else '✗ EXCEEDS BUDGET'}")
    print(f"Headroom: {(daily_budget_mAh - daily_energy_mAh):.3f} mAh")

# Example scenarios
print("Scenario 1: Low loss, frequent transmissions")
retry_budget_analysis(packet_loss_rate=0.05, messages_per_day=100)

print("\n" + "="*50 + "\n")

print("Scenario 2: High loss, same transmissions")
retry_budget_analysis(packet_loss_rate=0.20, messages_per_day=100)

What to Observe: At 20% packet loss, average 1.25 transmissions per message (25% retry overhead). Adjust transmission frequency or improve link quality if budget exceeded.

Common Pitfalls

MQTT QoS 2 (exactly-once) requires a 4-message handshake (PUBLISH → PUBREC → PUBREL → PUBCOMP) for every single message. At 10 messages/second, this generates 40 MQTT packets/second, which is two full round trips per message (20 round trips/second). For 1 Hz temperature readings where occasional loss is acceptable, use QoS 0 or QoS 1. Reserve QoS 2 for truly business-critical transactions (financial events, safety commands) where both message loss and duplicate delivery are unacceptable.

Using MQTT QoS 1 or CoAP Confirmable for device commands (turn on valve, increment counter) without idempotency protection causes duplicate execution when retransmissions occur. A “turn on valve” command received twice is harmless because opening an already open valve changes nothing, but an “increment counter” command received twice incorrectly increments the counter twice. Include a command sequence number or UUID in every command message, and have the device track recently executed command IDs to detect and discard duplicates.
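Tracking recently executed command IDs can be sketched as a small bounded window. This is an illustrative example, not a protocol feature: `CommandDeduplicator`, the window size of 64, and the command ID format are all assumptions.

```python
from collections import OrderedDict

class CommandDeduplicator:
    """Remembers the most recent command IDs so retransmitted commands run once."""

    def __init__(self, window=64):
        self.window = window
        self.seen = OrderedDict()  # insertion order doubles as age order

    def should_execute(self, command_id):
        if command_id in self.seen:
            return False  # retransmission: acknowledge it, but do not run it again
        self.seen[command_id] = True
        if len(self.seen) > self.window:
            self.seen.popitem(last=False)  # forget the oldest ID
        return True

dedup = CommandDeduplicator(window=3)
counter = 0
for command_id in ["cmd-1", "cmd-1", "cmd-2", "cmd-2", "cmd-3"]:
    if dedup.should_execute(command_id):
        counter += 1
print(counter)  # 3
```

The window must be large enough to cover the longest period over which the sender may still retransmit. Otherwise a late duplicate arrives after its ID has been forgotten.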

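Default CoAP MAX_RETRANSMIT is 4, with exponential backoff starting from ACK_TIMEOUT = 2 s: the retransmission timeouts are at least 2 s, 4 s, 8 s, and 16 s, each stretched by a random factor of up to 1.5. The last retransmission therefore goes out 30 to 45 seconds after the first transmission, and the sender gives up after at most 93 seconds. For NB-IoT with PSM (device unreachable for hours), a server that exhausts its retransmissions in under two minutes fails long before the device wakes up. Match retry parameters to the actual device duty cycle: for hourly-reporting devices, set MAX_RETRANSMIT=1 with a 5-minute timeout, then queue the command for the next scheduled wakeup rather than burning retransmissions.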
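The CoAP timing bounds can be derived from the default transmission parameters in RFC 7252. The 2 s, 4 s, 8 s, 16 s figures are the minimum timeouts; the specification also multiplies the initial timeout by a random factor of up to ACK_RANDOM_FACTOR. A sketch of the two derived bounds, using the RFC's formulas with lower-case function names chosen for this example:

```python
ACK_TIMEOUT = 2.0        # seconds, RFC 7252 default
ACK_RANDOM_FACTOR = 1.5  # RFC 7252 default
MAX_RETRANSMIT = 4       # RFC 7252 default

def max_transmit_span(timeout=ACK_TIMEOUT, factor=ACK_RANDOM_FACTOR,
                      retransmits=MAX_RETRANSMIT):
    """Longest time from the first transmission to the last retransmission."""
    return timeout * (2 ** retransmits - 1) * factor

def max_transmit_wait(timeout=ACK_TIMEOUT, factor=ACK_RANDOM_FACTOR,
                      retransmits=MAX_RETRANSMIT):
    """Longest time from the first transmission until the sender gives up."""
    return timeout * (2 ** (retransmits + 1) - 1) * factor

print(max_transmit_span())  # 45.0
print(max_transmit_wait())  # 93.0
```

Either bound is orders of magnitude shorter than a multi-hour PSM sleep cycle, which is why queuing the command for the next wakeup is the more practical approach.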

IoT devices using 16-bit sequence numbers wrap around after 65,535 messages. A device sending 100 messages/day wraps in 655 days. If the receiver does not handle wraparound (e.g., it simply checks new_seq > last_seq), it will reject all messages after the wrap as duplicates. Implement modular arithmetic comparison for sequence numbers: a message is “new” if 0 < (new_seq - last_seq) mod 2^16 < 2^15, that is, if it is ahead of the last one by less than half the sequence space.

20.12 What’s Next

After mastering retry mechanisms and sequence numbers, continue with these related chapters:

| Chapter | What You Will Learn | Relevance |
|---|---|---|
| Connection State Lab | Build a complete reliable transport combining all five reliability pillars | Apply backoff and sequence numbers in working code |
| Transport Optimizations | Advanced reliability tuning: adaptive timeout, Karn’s algorithm, TCP RTO | Deepen and refine the retry strategies from this chapter |
| CoAP Features and Labs | CoAP Confirmable messages with built-in exponential backoff (max 4 retries) | See the exact retry mechanism from this chapter in a real protocol |
| MQTT QoS and Sessions | QoS 1 acknowledged delivery and QoS 2 exactly-once delivery with sequence tracking | Compare broker-side reliability to the device-side approach covered here |
| Transport Fundamentals | ACK patterns, timeout basics, stop-and-wait vs. sliding window | Review the foundation concepts that underpin exponential backoff |
| Error Detection | CRC and checksum-based corruption detection that triggers retries | Understand how errors are detected before retry mechanisms engage |