1189  MQTT Quality of Service

1189.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Understand QoS Levels: Master the three MQTT Quality of Service levels and their trade-offs
  • Calculate Message Overhead: Quantify bandwidth and message costs for each QoS level
  • Select Appropriate QoS: Choose the right reliability level for different IoT scenarios
  • Optimize for Battery Life: Balance reliability against power consumption

1189.2 Prerequisites

Before diving into this chapter, you should be familiar with:

1189.3 Quality of Service Levels

MQTT offers three reliability levels for message delivery:

QoS Level   Guarantee                        Speed     Use Case
QoS 0       At most once (fire and forget)   Fastest   Frequent sensor data
QoS 1       At least once (may duplicate)    Medium    Important alerts
QoS 2       Exactly once (guaranteed)        Slowest   Critical commands
Tip: Rule of Thumb

Start with QoS 1 for most IoT applications. Use QoS 0 for high-frequency telemetry and QoS 2 only for critical commands.

1189.3.1 QoS 0: Fire and Forget

Publisher                Broker                  Subscriber
    |                       |                         |
    |--- PUBLISH (QoS 0) -->|                         |
    |    [No Packet ID]     |                         |
    |                       |--- PUBLISH (QoS 0) ---->|
    |                       |    [No ACK needed]      |

Packet overhead: 1 message
Delivery guarantee: None (message may be lost)
Use case: Temperature sensor sending updates every 10 seconds

1189.3.2 QoS 1: At Least Once

Publisher                Broker                  Subscriber
    |                       |                         |
    |--- PUBLISH (QoS 1) -->|                         |
    |    Packet ID: 42      |                         |
    |                       |--- PUBLISH (QoS 1) ---->|
    |                       |    Packet ID: 99        |
    |<------ PUBACK --------|                         |
    |    Packet ID: 42      |<------- PUBACK ---------|
    |                       |    Packet ID: 99        |

Packet overhead: 2 messages (PUBLISH + PUBACK)
Delivery guarantee: At least once (may receive duplicates)
Use case: Motion detection alerts
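The difference between QoS 0 and QoS 1 can be illustrated with a small simulation. This is a deliberately simplified sketch, not how a real client is implemented: the function names and the per-packet `loss_rate` are assumptions made for illustration, and random packet loss stands in for the connection drops that cause loss in practice.

```python
import random

def send_qos0(messages, loss_rate, rng):
    """Fire and forget: each PUBLISH is sent once and may be lost."""
    return [m for m in messages if rng.random() >= loss_rate]

def send_qos1(messages, loss_rate, rng, max_attempts=50):
    """At least once: resend each PUBLISH until its PUBACK comes back."""
    received = []
    for m in messages:
        for _ in range(max_attempts):
            publish_arrived = rng.random() >= loss_rate
            if publish_arrived:
                received.append(m)   # receiver sees it (possibly again)
            puback_arrived = publish_arrived and rng.random() >= loss_rate
            if puback_arrived:
                break                # sender stops retransmitting
    return received

rng = random.Random(7)               # fixed seed for repeatable runs
msgs = list(range(1000))
got0 = send_qos0(msgs, 0.2, rng)
got1 = send_qos1(msgs, 0.2, rng)

print(f"QoS 0: {len(got0)} of {len(msgs)} delivered, no duplicates")
print(f"QoS 1: {len(set(got1))} unique delivered, "
      f"{len(got1) - len(set(got1))} duplicates")
```

With QoS 1 every message arrives, but a lost PUBACK triggers a retransmission, so the receiver must tolerate duplicates or filter them itself.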

1189.3.3 QoS 2: Exactly Once (4-Way Handshake)

Publisher                Broker                  Subscriber
    |                       |                         |
    |--- PUBLISH (QoS 2) -->|                         |
    |    Packet ID: 42      |                         |
    |<------ PUBREC --------|                         |
    |    Packet ID: 42      |--- PUBLISH (QoS 2) ---->|
    |                       |    Packet ID: 100       |
    |------ PUBREL -------->|                         |
    |    Packet ID: 42      |<------ PUBREC ----------|
    |                       |    Packet ID: 100       |
    |<------ PUBCOMP -------|                         |
    |    Packet ID: 42      |------ PUBREL ---------->|
    |                       |    Packet ID: 100       |
    |                       |<------ PUBCOMP ---------|
    |                       |    Packet ID: 100       |

Packet overhead: 4 messages per hop (8 total from publisher to subscriber)
Delivery guarantee: Exactly once (no duplicates, guaranteed delivery)
Use case: Critical commands (unlock door, dispense medication)

Why 4 steps?

  1. PUBLISH - "Here's a message"
  2. PUBREC - "I received it, but haven't processed yet"
  3. PUBREL - "OK, you can process it now"
  4. PUBCOMP - "Done processing, we're complete"
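The four steps can be sketched from the receiver's point of view. This is a minimal model under stated assumptions, not a real MQTT implementation: the class `Qos2Receiver` and its method names are hypothetical, and it follows the "store on PUBLISH, process on PUBREL" reading described above.

```python
class Qos2Receiver:
    """Receiver side of the QoS 2 handshake (store, then release)."""

    def __init__(self):
        self.pending = {}    # packet_id -> payload awaiting PUBREL
        self.delivered = []  # payloads handed to the application

    def on_publish(self, packet_id, payload):
        # A retransmitted PUBLISH reuses the packet ID, so it overwrites
        # the stored copy instead of creating a second one.
        self.pending[packet_id] = payload
        return ("PUBREC", packet_id)

    def on_pubrel(self, packet_id):
        # Process at most once; a repeated PUBREL finds nothing pending
        # but is still answered so the sender can finish.
        if packet_id in self.pending:
            self.delivered.append(self.pending.pop(packet_id))
        return ("PUBCOMP", packet_id)

rx = Qos2Receiver()
rx.on_publish(42, "unlock door")
rx.on_publish(42, "unlock door")   # retransmission after a lost PUBREC
rx.on_pubrel(42)
rx.on_pubrel(42)                   # retransmission after a lost PUBCOMP
print(rx.delivered)
```

Because the packet ID is the deduplication key, retransmissions at either stage do not cause the command to be processed twice.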

1189.4 QoS Trade-off Analysis

Warning (Tradeoff): QoS Level Selection

Option A: Use QoS 0 (fire-and-forget) for all telemetry data
Option B: Use QoS 1/2 for guaranteed delivery with acknowledgments

Decision Factors:

Factor               QoS 0      QoS 1           QoS 2
Message overhead     1 packet   2 packets       4 packets
Battery impact       Lowest     Medium (+25%)   Highest (+75%)
Delivery guarantee   None       At-least-once   Exactly-once
Duplicate messages   No         Possible        Never
Network traffic      Baseline   +12%            +29%
Broker load          Lowest     2x              4x

Choose QoS 0 when:

  • Readings are high-frequency and the next value supersedes any lost one
  • Battery life is a critical priority
  • Data redundancy exists (multiple sensors)
  • The network is generally reliable

Choose QoS 1 for:

  • Important notifications and alerts
  • Audit logging where completeness matters
  • Commands that are idempotent, so executing a duplicate is safe

Choose QoS 2 for:

  • Financial transactions or billing events
  • Security-critical commands
  • State changes where duplicate execution is dangerous

Default recommendation: QoS 0 for 95%+ of IoT telemetry; QoS 1 for alerts; QoS 2 only when exactly-once is required
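The selection rules above can be condensed into a small helper. This is only a sketch: the function `choose_qos` and its two boolean parameters are hypothetical names introduced here, and real deployments will weigh more factors (battery budget, network quality, broker limits).

```python
def choose_qos(loss_is_acceptable, duplicate_is_dangerous):
    """Map the two main decision factors onto a QoS level (0, 1 or 2)."""
    if loss_is_acceptable:
        return 0   # e.g. periodic telemetry superseded by the next reading
    if duplicate_is_dangerous:
        return 2   # e.g. billing events, door lock commands
    return 1       # e.g. alerts, audit logs, idempotent commands

print(choose_qos(loss_is_acceptable=True, duplicate_is_dangerous=False))
print(choose_qos(loss_is_acceptable=False, duplicate_is_dangerous=False))
print(choose_qos(loss_is_acceptable=False, duplicate_is_dangerous=True))
```

Checking loss tolerance first reflects the default recommendation: most telemetry never needs more than QoS 0.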

1189.5 Common Misconception: Always Use QoS 2

Warning (Myth): "Always Use QoS 2 for Important Data"

The Misconception: Many developers assume that "important" IoT data should always use QoS 2 because it provides the strongest guarantee.

The Reality: QoS 2 needs four packets per message where QoS 0 needs one (4x the message count), and it can reduce battery life by 70-80% in battery-powered devices.

Real-World Example - Smart Agriculture Deployment:

A large-scale agricultural IoT deployment initially configured 10,000 soil moisture sensors with QoS 2 for all readings because the data was considered "critical for irrigation decisions."

Results after 3 months:

  • Battery life: 4.2 months (expected 12-18 months with QoS 0)
  • Network congestion: 40% of messages delayed >10 seconds
  • AWS IoT costs: $14,800/month (vs projected $3,700/month)
  • Maintenance costs: $45,000 for emergency battery replacements

After switching to QoS 0 with 1-minute sampling:

  • Battery life: Extended to 14 months
  • Data quality: Only 0.03% message loss - acceptable for soil moisture
  • Network latency: Reduced to <2 seconds average
  • AWS IoT costs: Dropped to $3,200/month (78% savings)
  • Total annual savings: ~$175,000

Rule of Thumb:

  • QoS 0: Sensor readings every few seconds/minutes (99%+ of traffic)
  • QoS 1: Alerts, notifications (<1% of traffic)
  • QoS 2: Critical commands (<0.1% of traffic)

1189.6 Worked Example: Smart Building MQTT Traffic Analysis

Note: Problem Statement

Context: You are designing an MQTT-based monitoring system for a commercial office building:

  • 50 temperature sensors distributed across 5 floors
  • Each sensor publishes every 5 minutes (288 readings/day per sensor)
  • Payload: 20 bytes of JSON on average (e.g., {"temp":24.5,"unit":"C"}, which is 24 bytes)
  • Topic: Average 25 bytes (e.g., building/floor3/room12/temp)

Task: Calculate the daily network traffic and message overhead for QoS 0, QoS 1, and QoS 2.

1189.6.1 MQTT Packet Sizes (from MQTT 3.1.1 Specification)

Component                   Size           Notes
Fixed Header                2 bytes        Packet type and flags (1) + remaining length (1, for packets under 128 bytes)
Variable Header (PUBLISH)   Varies         Topic length (2) + Topic + Packet ID (2 for QoS>0)
PUBACK packet               4 bytes        Fixed header (2) + Packet ID (2)
PUBREC/PUBREL/PUBCOMP       4 bytes each   For QoS 2 handshake

1189.6.2 Step 1: Calculate Base PUBLISH Packet Size

QoS 0 PUBLISH packet:

Fixed Header:           2 bytes
Topic Length:           2 bytes
Topic Name:            25 bytes  (e.g., "building/floor3/room12/temp")
Packet ID:              0 bytes  (not used for QoS 0)
Payload:               20 bytes  (JSON data)
-----------------------------------------
Total QoS 0 PUBLISH:   49 bytes

QoS 1/2 PUBLISH packet:

Fixed Header:           2 bytes
Topic Length:           2 bytes
Topic Name:            25 bytes
Packet ID:              2 bytes  (required for QoS 1 and 2)
Payload:               20 bytes
-----------------------------------------
Total QoS 1/2 PUBLISH: 51 bytes
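The same arithmetic can be written as a function that also handles packets too large for a one-byte Remaining Length field. This is a sketch for MQTT 3.1.1 PUBLISH packets only; the function names are illustrative and not part of any library.

```python
def remaining_length_bytes(n):
    """Bytes needed to encode the Remaining Length field (1 to 4)."""
    if n < 0 or n > 268_435_455:
        raise ValueError("remaining length out of range")
    size = 1
    while n > 127:       # each encoded byte carries 7 bits of the value
        n //= 128
        size += 1
    return size

def publish_size(topic_len, payload_len, qos):
    """Total size in bytes of an MQTT 3.1.1 PUBLISH packet."""
    packet_id = 2 if qos > 0 else 0
    remaining = 2 + topic_len + packet_id + payload_len
    # 1 byte of packet type and flags, then the length field, then the rest
    return 1 + remaining_length_bytes(remaining) + remaining

print(publish_size(25, 20, qos=0))   # 49
print(publish_size(25, 20, qos=1))   # 51
```

For the sensor packets in this example the length field fits in one byte, which is why the fixed header is counted as 2 bytes.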

1189.6.3 Step 2: Calculate Total Bytes per Message

QoS 0: Fire and Forget

Messages per reading:  1 (PUBLISH only)
Bytes per reading:    49 bytes

QoS 1: At Least Once

Messages per reading:  2 (PUBLISH + PUBACK)
PUBLISH:              51 bytes
PUBACK:                4 bytes
-----------------------------------------
Total per reading:    55 bytes

QoS 2: Exactly Once

Messages per reading:  4 (PUBLISH + PUBREC + PUBREL + PUBCOMP)
PUBLISH:              51 bytes
PUBREC:                4 bytes
PUBREL:                4 bytes
PUBCOMP:               4 bytes
-----------------------------------------
Total per reading:    63 bytes
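The per-reading totals follow directly from which acknowledgement packets each QoS level adds. A minimal sketch, assuming one hop (publisher to broker) and no retransmissions; the names `HANDSHAKE` and `bytes_per_reading` are illustrative.

```python
ACK_BYTES = 4  # PUBACK, PUBREC, PUBREL and PUBCOMP are each 4 bytes

# Acknowledgement packets exchanged after the PUBLISH, per QoS level
HANDSHAKE = {
    0: [],
    1: ["PUBACK"],
    2: ["PUBREC", "PUBREL", "PUBCOMP"],
}

def packets_per_reading(qos):
    """Number of packets exchanged for one reading on one hop."""
    return 1 + len(HANDSHAKE[qos])

def bytes_per_reading(qos, publish_bytes):
    """Bytes on the wire for one reading on one hop, without retries."""
    return publish_bytes + ACK_BYTES * len(HANDSHAKE[qos])

for qos, publish_bytes in ((0, 49), (1, 51), (2, 51)):
    print(qos, packets_per_reading(qos), bytes_per_reading(qos, publish_bytes))
```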

1189.6.4 Step 3: Calculate Daily Traffic

Readings per day: 24 hours x 12 readings/hour = 288 readings/sensor/day

QoS Level   Bytes/Reading   Daily per Sensor   Calculation
QoS 0       49 bytes        14,112 bytes       49 x 288
QoS 1       55 bytes        15,840 bytes       55 x 288
QoS 2       63 bytes        18,144 bytes       63 x 288

1189.6.5 Step 4: Calculate Building-Wide Traffic (50 Sensors)

QoS Level   Daily per Sensor   Total Daily     Daily in KB
QoS 0       14,112 bytes       705,600 bytes   689 KB
QoS 1       15,840 bytes       792,000 bytes   773 KB
QoS 2       18,144 bytes       907,200 bytes   886 KB

1189.6.6 Step 5: Calculate Monthly Data and Message Count

QoS Level   Monthly Data   Monthly Messages   Data Overhead vs QoS 0
QoS 0       20.2 MB        432,000            Baseline
QoS 1       22.7 MB        864,000            +12.2%
QoS 2       25.9 MB        1,728,000          +28.6%

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#E67E22', 'secondaryColor': '#16A085', 'tertiaryColor': '#E67E22'}}}%%
graph LR
    subgraph "Monthly Data Transfer"
        Q0D["QoS 0<br/>20.2 MB"]
        Q1D["QoS 1<br/>22.7 MB"]
        Q2D["QoS 2<br/>25.9 MB"]
    end

    subgraph "Monthly Messages"
        Q0M["QoS 0<br/>432K"]
        Q1M["QoS 1<br/>864K"]
        Q2M["QoS 2<br/>1.73M"]
    end

    subgraph "Overhead vs QoS 0"
        Q0O["QoS 0<br/>Baseline"]
        Q1O["QoS 1<br/>+12%"]
        Q2O["QoS 2<br/>+29%"]
    end

    style Q0D fill:#16A085,color:#fff
    style Q1D fill:#E67E22,color:#fff
    style Q2D fill:#2C3E50,color:#fff
    style Q0M fill:#16A085,color:#fff
    style Q1M fill:#E67E22,color:#fff
    style Q2M fill:#2C3E50,color:#fff
    style Q0O fill:#16A085,color:#fff
    style Q1O fill:#E67E22,color:#fff
    style Q2O fill:#2C3E50,color:#fff

Figure 1189.1: QoS comparison for 50-sensor smart building deployment
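Steps 3 to 5 can be reproduced with a few lines of Python. This sketch assumes a 30-day month and binary units (1 MB = 1024 x 1024 bytes), which is what the tables appear to use; the constant and function names are illustrative. Printed values may differ from the tables in the last digit because of rounding.

```python
SENSORS = 50
READINGS_PER_DAY = 24 * 12          # one reading every 5 minutes
DAYS_PER_MONTH = 30                 # assumption for the monthly figures

BYTES_PER_READING = {0: 49, 1: 55, 2: 63}
PACKETS_PER_READING = {0: 1, 1: 2, 2: 4}

def monthly_traffic(qos):
    """Return (daily_bytes, monthly_mib, monthly_packets) for the building."""
    readings_per_day = SENSORS * READINGS_PER_DAY
    daily_bytes = readings_per_day * BYTES_PER_READING[qos]
    monthly_mib = daily_bytes * DAYS_PER_MONTH / 1024**2
    monthly_packets = (readings_per_day * DAYS_PER_MONTH
                       * PACKETS_PER_READING[qos])
    return daily_bytes, monthly_mib, monthly_packets

for qos in (0, 1, 2):
    daily, mib, packets = monthly_traffic(qos)
    overhead = BYTES_PER_READING[qos] / BYTES_PER_READING[0] - 1
    print(f"QoS {qos}: {daily:,} B/day, {mib:.1f} MB/month, "
          f"{packets:,} packets/month, {overhead:+.1%} bytes vs QoS 0")
```

Note that the packet count grows 2x and 4x while the byte count grows far less, because acknowledgement packets are only 4 bytes each.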

1189.6.7 Design Recommendation

Tip: Recommendation for This Scenario

Use QoS 0 for this deployment because:

  1. Temperature data is non-critical - If one reading is lost, the next arrives soon
  2. Redundancy from frequency - 288 readings/day means losing a few has minimal impact
  3. Lowest bandwidth cost - Saves 5.7 MB/month vs QoS 2
  4. Lowest broker load - 4x fewer messages than QoS 2

When would QoS 1/2 be justified?

Scenario                           Recommended QoS   Rationale
Temperature readings every 5 min   QoS 0             High frequency, non-critical
Fire alarm trigger                 QoS 1             Critical alert, duplicates acceptable
HVAC setpoint command              QoS 1             Important, duplicate won't cause harm
Door lock/unlock command           QoS 2             Security-critical, exactly once

1189.6.8 Cost Implications (AWS IoT Core Pricing)

QoS Level   Monthly Messages   Cost @ $1/million   Annual Cost
QoS 0       432,000            $0.43               $5.18
QoS 1       864,000            $0.86               $10.37
QoS 2       1,728,000          $1.73               $20.74

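Key insight: At a flat per-message rate, QoS 2 costs 4x as much as QoS 0 for the same sensor data. Treat these figures as illustrative: they count every protocol packet as a billable message, and AWS IoT Core itself supports only QoS 0 and QoS 1.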
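The cost figures are a straight multiplication, sketched below. The flat $1 per million rate is the one assumed in the table above, not a quote of any provider's current price list, and the function name is illustrative.

```python
PRICE_PER_MILLION = 1.00   # USD, the flat rate assumed in the table above

def messaging_cost(monthly_messages, price_per_million=PRICE_PER_MILLION):
    """Return (monthly_usd, annual_usd) for a given message count."""
    monthly = monthly_messages / 1_000_000 * price_per_million
    return monthly, monthly * 12

for qos, count in ((0, 432_000), (1, 864_000), (2, 1_728_000)):
    monthly, annual = messaging_cost(count)
    print(f"QoS {qos}: ${monthly:.2f}/month, ${annual:.2f}/year")
```

At this scale the absolute cost is small; the 4x ratio matters once the fleet grows to thousands of devices or the publish interval shrinks.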

1189.8 Session Management

Caution (Pitfall): Ignoring Clean Session Implications

The Mistake: Using clean_session=True without understanding that the broker discards all stored state, causing missed messages after reconnection.

Why It Happens: Many examples use clean_session=True for simplicity. When a device reconnects after network loss, it must re-subscribe, and messages published during disconnection are lost.

The Fix: For most IoT applications, use clean_session=False with a persistent client ID:

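import paho.mqtt.client as mqtt  # assumes paho-mqtt 1.x; 2.x also needs a callback_api_version argument

# BAD: Clean session loses all state on reconnect
client = mqtt.Client(client_id="device-ABC123", clean_session=True)
client.connect(broker)

# GOOD: Persistent session retains subscriptions and queued messages
client = mqtt.Client(client_id="device-ABC123", clean_session=False)
client.connect(broker)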

With persistent sessions:

  • Subscriptions survive reconnection
  • QoS 1/2 messages are queued while the client is offline
  • Session expiry interval (MQTT 5.0) controls how long the broker retains state
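The effect of a persistent session can be seen in a toy model of the broker's bookkeeping. This is a sketch under heavy simplification, not a real broker: `ToyBroker` and its methods are hypothetical, every session is assumed to be subscribed to the published topic, and session expiry is ignored.

```python
class ToyBroker:
    """Minimal model of per-client session state; not a real MQTT broker."""

    def __init__(self):
        self.sessions = {}   # client_id -> session dict

    def connect(self, client_id, clean_session):
        """Connect a client and return any messages queued while offline."""
        if clean_session or client_id not in self.sessions:
            self.sessions[client_id] = {"online": True, "queue": [],
                                        "persistent": not clean_session}
            return []
        session = self.sessions[client_id]
        session["online"] = True
        backlog, session["queue"] = session["queue"], []
        return backlog

    def disconnect(self, client_id):
        session = self.sessions[client_id]
        if session["persistent"]:
            session["online"] = False
        else:
            del self.sessions[client_id]   # clean session: state discarded

    def publish(self, client_id, message):
        """Queue a QoS 1/2 message if the client's session is offline."""
        session = self.sessions.get(client_id)
        if session and not session["online"]:
            session["queue"].append(message)

broker = ToyBroker()
broker.connect("device-ABC123", clean_session=False)
broker.disconnect("device-ABC123")
broker.publish("device-ABC123", "setpoint=21")
backlog = broker.connect("device-ABC123", clean_session=False)
print(backlog)
```

Repeating the same sequence with `clean_session=True` returns an empty backlog, because the broker dropped the session when the client disconnected.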

1189.9 Summary

Key QoS decisions:

Scenario                      Recommended QoS
High-frequency sensor data    QoS 0
Alerts and notifications      QoS 1
Critical actuator commands    QoS 2
Battery-constrained devices   QoS 0 (prefer)
Financial transactions        QoS 2

Remember:

  • QoS 2 uses 4x as many packets as QoS 0 and has the highest battery cost
  • Use persistent sessions (clean_session=False) for reliable delivery
  • Default to QoS 0 for 95%+ of IoT telemetry

1189.10 What's Next

Now that you understand MQTT Quality of Service: