16  MQTT Quality of Service

In 60 Seconds

MQTT offers three QoS levels that trade overhead for reliability: QoS 0 (fire-and-forget, 1 message) is fastest but may lose data; QoS 1 (at-least-once, 2 messages) guarantees delivery but may duplicate; and QoS 2 (exactly-once, 4 messages) prevents both loss and duplication at roughly 3x the radio energy of QoS 0. Start with QoS 1 for messages that must arrive, and use QoS 0 for high-frequency telemetry where the next reading replaces the last.

16.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Distinguish QoS Levels: Explain the differences between MQTT QoS 0, QoS 1, and QoS 2 and justify when each is appropriate
  • Calculate Message Overhead: Compute bandwidth, packet counts, and energy costs for each QoS level given real device parameters
  • Select Appropriate QoS: Evaluate IoT scenarios and apply the correct reliability level based on loss impact, duplicate impact, and battery constraints
  • Assess Battery Trade-offs: Analyze how QoS level selection affects device power consumption and calculate resulting battery life

Key Concepts

  • MQTT: Message Queuing Telemetry Transport, a pub/sub protocol optimized for constrained IoT devices on unreliable networks
  • Broker: Central server that routes messages from publishers to all matching subscribers by topic pattern
  • Topic: Hierarchical string (e.g., home/bedroom/temperature) used to route messages to interested subscribers
  • QoS Level: Quality of Service 0/1/2, trading message overhead for delivery guarantee
  • Retained Message: Last message on a topic, stored by the broker for immediate delivery to new subscribers
  • Last Will and Testament: Pre-configured message published by the broker when a client disconnects ungracefully
  • Persistent Session: Broker stores subscriptions and pending messages so that clients can resume after disconnection

16.2 For Beginners: MQTT Quality of Service

MQTT QoS (Quality of Service) lets you choose how reliably messages are delivered. QoS 0 is fire-and-forget (fast, no guarantee). QoS 1 guarantees at-least-once delivery. QoS 2 guarantees exactly-once delivery. It is like choosing between a text message, a delivery confirmation email, and a registered letter.

“I need all three QoS levels for different things!” said Sammy the Sensor. “My routine temperature readings use QoS 0 – if one gets lost every few minutes, no big deal. The next one is coming soon anyway.”

“But my door lock commands use QoS 1!” said Max the Microcontroller. “When someone says ‘lock the front door,’ that message MUST arrive. QoS 1 means the broker keeps trying until it gets an acknowledgment. Sure, the lock might get the same ‘lock’ command twice, but locking an already-locked door is harmless.”

Lila the LED raised the stakes: “My payment system uses QoS 2 – exactly once. Imagine charging someone’s credit card twice because of a duplicate message! The four-step handshake (PUBLISH, PUBREC, PUBREL, PUBCOMP) is slower, but it guarantees the message arrives exactly once.”

Bella the Battery summed it up: “QoS 0 costs 1 message. QoS 1 costs 2. QoS 2 costs 4. Choose the cheapest level that’s safe for your use case – my battery will thank you!”

16.3 Prerequisites

Before diving into this chapter, you should be familiar with:

16.4 Quality of Service Levels

MQTT offers three reliability levels for message delivery:

| QoS Level | Guarantee | Speed | Use Case |
|---|---|---|---|
| QoS 0 | At most once (fire and forget) | Fastest | Frequent sensor data |
| QoS 1 | At least once (may duplicate) | Medium | Important alerts |
| QoS 2 | Exactly once (guaranteed) | Slowest | Critical commands |

Rule of Thumb

Default to QoS 0 for high-frequency telemetry, QoS 1 for important alerts, and reserve QoS 2 for safety-critical or financial commands.

16.4.1 QoS 0: Fire and Forget

Publisher                Broker                  Subscriber
    |                       |                         |
    |--- PUBLISH (QoS 0) -->|                         |
    |    [No Packet ID]     |                         |
    |                       |--- PUBLISH (QoS 0) ---->|
    |                       |    [No ACK needed]      |

Packet overhead: 1 message
Delivery guarantee: None (message may be lost)
Use case: Temperature sensor sending updates every 10 seconds

For a sensor sending 1 message/minute over cellular (100ms RTT, 8 mA TX, 5 mA RX):

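QoS 0 energy per message: $ E_{\text{QoS0}} = E_{\text{TX}} = 8\text{ mA} \times 50\text{ ms} = 0.4\text{ mAs} $

QoS 1 energy per message (PUBLISH + PUBACK): $ E_{\text{QoS1}} = E_{\text{TX}} + E_{\text{RX}} = (8\text{ mA} \times 50\text{ ms}) + (5\text{ mA} \times 50\text{ ms}) = 0.65\text{ mAs} $

QoS 2 energy per message (4-way handshake): $ E_{\text{QoS2}} = 2E_{\text{TX}} + 2E_{\text{RX}} = 2(8\text{ mA} \times 50\text{ ms}) + 2(5\text{ mA} \times 50\text{ ms}) = 1.3\text{ mAs} $

Energy ratio: $ \frac{E_{\text{QoS2}}}{E_{\text{QoS0}}} = \frac{1.3}{0.4} = 3.25 $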

Battery life (230 mAh CR2032, 1 msg/min, 1-year target):

  • QoS 0: \(E_{\text{daily}} = 0.4 \times 1440 = 576\text{ mAs} = 0.16\text{ mAh/day}\) → 3.9 years
  • QoS 1: \(E_{\text{daily}} = 0.65 \times 1440 = 936\text{ mAs} = 0.26\text{ mAh/day}\) → 2.4 years
  • QoS 2: \(E_{\text{daily}} = 1.3 \times 1440 = 1872\text{ mAs} = 0.52\text{ mAh/day}\) → 1.2 years

Conclusion: QoS 2 uses 3.25× more energy than QoS 0. For battery devices, use QoS 0 for telemetry and QoS 1 for important events; reserve QoS 2 only for truly critical commands.
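The arithmetic above can be sketched in a few lines of Python. This is a minimal sketch under the same assumptions as the worked numbers (8 mA TX, 5 mA RX, one 50 ms slot per packet, messaging as the only load); the function names and the `slot_s` parameter are hypothetical, and sleep current is ignored.

```python
def charge_per_message_mas(qos, tx_ma=8.0, rx_ma=5.0, slot_s=0.05):
    """Charge per message in mA*s, counting one TX or RX slot per packet."""
    tx_packets = {0: 1, 1: 1, 2: 2}[qos]   # PUBLISH (+ PUBREL for QoS 2)
    rx_packets = {0: 0, 1: 1, 2: 2}[qos]   # PUBACK, or PUBREC + PUBCOMP
    return (tx_packets * tx_ma + rx_packets * rx_ma) * slot_s


def battery_life_years(qos, capacity_mah=230.0, msgs_per_day=1440):
    """Battery life in years if messaging were the only load."""
    daily_mah = charge_per_message_mas(qos) * msgs_per_day / 3600.0
    return capacity_mah / daily_mah / 365.0


for qos in (0, 1, 2):
    print(f"QoS {qos}: {battery_life_years(qos):.1f} years")
```

Changing `msgs_per_day` or `capacity_mah` shows how quickly the gap between QoS levels grows for chattier devices or smaller cells.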

16.4.2 QoS 1: At Least Once

Publisher                Broker                  Subscriber
    |                       |                         |
    |--- PUBLISH (QoS 1) -->|                         |
    |    Packet ID: 42      |                         |
    |                       |--- PUBLISH (QoS 1) ---->|
    |                       |    Packet ID: 99        |
    |<------ PUBACK --------|                         |
    |    Packet ID: 42      |<------- PUBACK ---------|
    |                       |    Packet ID: 99        |

Packet overhead: 2 messages (PUBLISH + PUBACK)
Delivery guarantee: At least once (may receive duplicates)
Use case: Motion detection alerts
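To see where QoS 1 duplicates come from, the sketch below models a single publish in which the PUBLISH always arrives but some PUBACKs are lost on the way back. It is a simplified illustration, not how any particular client library schedules retries; `send_qos1` and its parameters are hypothetical names.

```python
def send_qos1(puback_lost_on_attempts, max_retries=5):
    """Simulate one QoS 1 publish; return how many copies the receiver saw.

    puback_lost_on_attempts: set of 1-based attempt numbers whose PUBACK
    is dropped, forcing the sender to retransmit the PUBLISH.
    """
    delivered = 0
    for attempt in range(1, max_retries + 1):
        delivered += 1                      # PUBLISH reaches the receiver
        if attempt not in puback_lost_on_attempts:
            return delivered                # PUBACK arrived: sender stops
    raise TimeoutError("no PUBACK received after retries")


print(send_qos1(set()))    # no loss
print(send_qos1({1}))      # first PUBACK lost, so the message is resent
```

Every lost acknowledgment adds one extra copy at the receiver, which is why QoS 1 consumers should treat messages as idempotent or deduplicate them at the application level.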

16.4.3 QoS 2: Exactly Once (4-Way Handshake)

Publisher                Broker                  Subscriber
    |                       |                         |
    |--- PUBLISH (QoS 2) -->|                         |
    |    Packet ID: 42      |                         |
    |<------ PUBREC --------|                         |
    |    Packet ID: 42      |--- PUBLISH (QoS 2) ---->|
    |                       |    Packet ID: 100       |
    |------ PUBREL -------->|                         |
    |    Packet ID: 42      |<------ PUBREC ----------|
    |                       |    Packet ID: 100       |
    |<------ PUBCOMP -------|                         |
    |    Packet ID: 42      |------ PUBREL ---------->|
    |                       |    Packet ID: 100       |
    |                       |<------ PUBCOMP ---------|
    |                       |    Packet ID: 100       |

Packet overhead: 4 messages per hop (8 end to end, counting publisher-to-broker and broker-to-subscriber)
Delivery guarantee: Exactly once (no loss, no duplicates)
Use case: Critical commands (unlock door, dispense medication)

Why 4 steps?

  1. PUBLISH - “Here’s a message”
  2. PUBREC - “I received it, but haven’t processed yet”
  3. PUBREL - “OK, you can process it now”
  4. PUBCOMP - “Done processing, we’re complete”
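The four steps can be sketched from the receiver's point of view. The class below is a hypothetical illustration that follows the description above (hold the message on PUBLISH, release it on PUBREL); the MQTT specification also permits a receiver to deliver on PUBLISH and keep only the packet ID until PUBREL, so treat this as one possible shape rather than a reference implementation.

```python
class Qos2Receiver:
    """Receiver side of the QoS 2 handshake, keyed by packet ID."""

    def __init__(self):
        self.pending = {}     # packet_id -> payload held until PUBREL
        self.delivered = []   # payloads handed to the application

    def on_publish(self, packet_id, payload):
        # A retransmitted PUBLISH with a known ID is not stored twice.
        self.pending.setdefault(packet_id, payload)
        return ("PUBREC", packet_id)

    def on_pubrel(self, packet_id):
        # Deliver once, then forget the ID; a repeated PUBREL finds nothing.
        if packet_id in self.pending:
            self.delivered.append(self.pending.pop(packet_id))
        return ("PUBCOMP", packet_id)


rx = Qos2Receiver()
rx.on_publish(42, "unlock")
rx.on_publish(42, "unlock")   # duplicate PUBLISH after a lost PUBREC
rx.on_pubrel(42)
rx.on_pubrel(42)              # duplicate PUBREL after a lost PUBCOMP
print(rx.delivered)
```

Because the packet ID is tracked until the handshake completes, retransmissions at either stage leave the application with a single copy of the message.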

16.5 QoS Trade-off Analysis

Tradeoff: QoS Level Selection

Option A: Use QoS 0 (fire-and-forget) for all telemetry data
Option B: Use QoS 1/2 for guaranteed delivery with acknowledgments

Decision Factors:

| Factor | QoS 0 | QoS 1 | QoS 2 |
|---|---|---|---|
| Message overhead | 1 packet | 2 packets | 4 packets |
| Battery impact | Lowest | Medium (+63%) | Highest (+225%) |
| Delivery guarantee | None | At-least-once | Exactly-once |
| Duplicate messages | No | Possible | Never |
| Network traffic | Baseline | +12% | +29% |
| Broker load | Lowest | 2x | 4x |

Choose QoS 0 when:

  • High-frequency sensor readings where next value supersedes lost ones
  • Battery life is critical priority
  • Data redundancy exists (multiple sensors)
  • Network is generally reliable

Choose QoS 1 when:

  • Important notifications and alerts
  • Audit logging where completeness matters
  • Commands where idempotent execution is safe

Choose QoS 2 when:

  • Financial transactions or billing events
  • Security-critical commands
  • State changes where duplicate execution is dangerous

Default recommendation: QoS 0 for 95%+ of IoT telemetry; QoS 1 for alerts; QoS 2 only when exactly-once is required
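The selection rules above reduce to two questions: does a lost message matter, and does a duplicate cause harm? A minimal sketch, with a hypothetical function name and boolean inputs chosen for illustration:

```python
def recommend_qos(loss_is_costly, duplicate_is_harmful):
    """Return the cheapest QoS level that is safe for a message type."""
    if not loss_is_costly:
        return 0          # next reading supersedes a lost one
    if duplicate_is_harmful:
        return 2          # must arrive, and must arrive only once
    return 1              # must arrive; repeats are harmless


print(recommend_qos(loss_is_costly=False, duplicate_is_harmful=False))  # telemetry
print(recommend_qos(loss_is_costly=True, duplicate_is_harmful=False))   # alert
print(recommend_qos(loss_is_costly=True, duplicate_is_harmful=True))    # billing event
```

Battery or bandwidth constraints do not change the answer here; they are the reason to stop at the lowest level that passes both questions.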

16.6 Interactive Calculators

16.6.1 QoS Energy & Battery Life Calculator

Estimate how each QoS level affects battery life for a coin-cell or battery-powered IoT device.

16.6.2 QoS Bandwidth & Traffic Calculator

Calculate daily and monthly network traffic for your IoT fleet at each QoS level.

16.6.3 QoS Cloud Cost Calculator

Estimate monthly and annual cloud messaging costs (e.g., AWS IoT Core) for each QoS level.

16.6.4 QoS Selection Advisor

Answer three questions about your use case to get a QoS recommendation.

16.7 Common Misconception: Always Use QoS 2

Myth: “Always Use QoS 2 for Important Data”

The Misconception: Many developers assume that “important” IoT data should always use QoS 2 because it provides the strongest guarantee.

The Reality: QoS 2 has a 4x message overhead compared to QoS 0 and can reduce battery life by 70-80% in battery-powered devices.

Real-World Example - Smart Agriculture Deployment:

A large-scale agricultural IoT deployment initially configured 10,000 soil moisture sensors with QoS 2 for all readings because the data was considered “critical for irrigation decisions.”

Results after 3 months:

  • Battery life: 4.2 months (expected 12-18 months with QoS 0)
  • Network congestion: 40% of messages delayed >10 seconds
  • AWS IoT costs: $14,800/month (vs projected $3,700/month)
  • Maintenance costs: $45,000 for emergency battery replacements

After switching to QoS 0 with 1-minute sampling:

  • Battery life: Extended to 14 months
  • Data quality: Only 0.03% message loss - acceptable for soil moisture
  • Network latency: Reduced to <2 seconds average
  • AWS IoT costs: Dropped to $3,200/month (78% savings)
  • Total annual savings: ~$175,000

Rule of Thumb:

  • QoS 0: Sensor readings every few seconds/minutes (99%+ of traffic)
  • QoS 1: Alerts, notifications (<1% of traffic)
  • QoS 2: Critical commands (<0.1% of traffic)

16.8 Worked Example: Smart Building MQTT Traffic Analysis

Problem Statement

Context: You are designing an MQTT-based monitoring system for a commercial office building:

  • 50 temperature sensors distributed across 5 floors
  • Each sensor publishes every 5 minutes (288 readings/day per sensor)
  • Payload: about 20 bytes of JSON (e.g., {"temp":24.5,"unit":"C"})
  • Topic: Average 25 bytes (e.g., building/floor3/room12/temp)

Task: Calculate the daily network traffic and message overhead for QoS 0, QoS 1, and QoS 2.

16.8.1 MQTT Packet Sizes (from MQTT 3.1.1 Specification)

| Component | Size | Notes |
|---|---|---|
| Fixed Header | 2 bytes | Packet type + remaining length |
| Variable Header (PUBLISH) | Variable | Topic length (2) + Topic + Packet ID (2 for QoS>0) |
| PUBACK packet | 4 bytes | Fixed header (2) + Packet ID (2) |
| PUBREC/PUBREL/PUBCOMP | 4 bytes each | For QoS 2 handshake |

16.8.2 Step 1: Calculate Base PUBLISH Packet Size

QoS 0 PUBLISH packet:

Fixed Header:           2 bytes
Topic Length:           2 bytes
Topic Name:            25 bytes  (e.g., "building/floor3/room12/temp")
Packet ID:              0 bytes  (not used for QoS 0)
Payload:               20 bytes  (JSON data)
-----------------------------------------
Total QoS 0 PUBLISH:   49 bytes

QoS 1/2 PUBLISH packet:

Fixed Header:           2 bytes
Topic Length:           2 bytes
Topic Name:            25 bytes
Packet ID:              2 bytes  (required for QoS 1 and 2)
Payload:               20 bytes
-----------------------------------------
Total QoS 1/2 PUBLISH: 51 bytes

16.8.3 Step 2: Calculate Total Bytes per Message

QoS 0: Fire and Forget

Messages per reading:  1 (PUBLISH only)
Bytes per reading:    49 bytes

QoS 1: At Least Once

Messages per reading:  2 (PUBLISH + PUBACK)
PUBLISH:              51 bytes
PUBACK:                4 bytes
-----------------------------------------
Total per reading:    55 bytes

QoS 2: Exactly Once

Messages per reading:  4 (PUBLISH + PUBREC + PUBREL + PUBCOMP)
PUBLISH:              51 bytes
PUBREC:                4 bytes
PUBREL:                4 bytes
PUBCOMP:               4 bytes
-----------------------------------------
Total per reading:    63 bytes
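Steps 1 and 2 can be folded into two small functions. This is a sketch using the packet sizes listed above; the function names are hypothetical, and the 2-byte fixed header assumes the remaining length fits in one byte (under 128 bytes), which holds for these small packets.

```python
def publish_packet_bytes(topic_len, payload_len, qos):
    """Size of one PUBLISH packet in bytes."""
    fixed_header = 2                      # type byte + 1-byte remaining length
    topic_length_field = 2
    packet_id = 2 if qos > 0 else 0       # only QoS 1 and 2 carry a packet ID
    return fixed_header + topic_length_field + topic_len + packet_id + payload_len


def bytes_per_reading(topic_len, payload_len, qos):
    """PUBLISH plus the 4-byte acknowledgment packets for one hop."""
    ack_packets = {0: 0, 1: 1, 2: 3}[qos]  # PUBACK, or PUBREC + PUBREL + PUBCOMP
    return publish_packet_bytes(topic_len, payload_len, qos) + 4 * ack_packets


for qos in (0, 1, 2):
    print(qos, bytes_per_reading(topic_len=25, payload_len=20, qos=qos))
```

Because the acknowledgment packets are a fixed 4 bytes each, the relative overhead of QoS 1 and QoS 2 shrinks as topics and payloads get longer.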

16.8.4 Step 3: Calculate Daily Traffic

Readings per day: 24 hours x 12 readings/hour = 288 readings/sensor/day

| QoS Level | Bytes/Reading | Daily per Sensor | Calculation |
|---|---|---|---|
| QoS 0 | 49 bytes | 14,112 bytes | 49 x 288 |
| QoS 1 | 55 bytes | 15,840 bytes | 55 x 288 |
| QoS 2 | 63 bytes | 18,144 bytes | 63 x 288 |

16.8.5 Step 4: Calculate Building-Wide Traffic (50 Sensors)

| QoS Level | Daily per Sensor | Total Daily | Daily in KB |
|---|---|---|---|
| QoS 0 | 14,112 bytes | 705,600 bytes | 689 KB |
| QoS 1 | 15,840 bytes | 792,000 bytes | 773 KB |
| QoS 2 | 18,144 bytes | 907,200 bytes | 886 KB |

16.8.6 Step 5: Calculate Monthly Data and Message Count

| QoS Level | Monthly Data | Monthly Messages | Overhead vs QoS 0 |
|---|---|---|---|
| QoS 0 | 20.2 MB | 432,000 | Baseline |
| QoS 1 | 22.7 MB | 864,000 | +12.2% |
| QoS 2 | 26.0 MB | 1,728,000 | +28.6% |

Figure 16.1: QoS comparison for 50-sensor smart building deployment
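Steps 3 to 5 scale the per-reading sizes from Step 2 up to the whole building. The sketch below assumes a 30-day month and 1 MB = 1024 x 1024 bytes; `fleet_traffic` and its parameters are hypothetical names.

```python
BYTES_PER_READING = {0: 49, 1: 55, 2: 63}    # totals from Step 2
PACKETS_PER_READING = {0: 1, 1: 2, 2: 4}


def fleet_traffic(qos, sensors=50, readings_per_day=288, days=30):
    """Monthly data volume, packet count, and byte overhead versus QoS 0."""
    readings = sensors * readings_per_day * days
    total_bytes = readings * BYTES_PER_READING[qos]
    return {
        "monthly_mb": total_bytes / 1024**2,
        "monthly_packets": readings * PACKETS_PER_READING[qos],
        "overhead_pct": 100 * (BYTES_PER_READING[qos] / BYTES_PER_READING[0] - 1),
    }


for qos in (0, 1, 2):
    print(qos, fleet_traffic(qos))
```

Note that byte overhead grows far more slowly than packet count: QoS 2 quadruples the number of packets but adds less than a third to the bytes on the wire for this payload size.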

16.8.7 Design Recommendation

Recommendation for This Scenario

Use QoS 0 for this deployment because:

  1. Temperature data is non-critical - If one reading is lost, the next arrives soon
  2. Redundancy from frequency - 288 readings/day means losing a few has minimal impact
  3. Lowest bandwidth cost - Saves about 5.8 MB/month vs QoS 2
  4. Lowest broker load - 4x fewer messages than QoS 2

When would QoS 1/2 be justified?

| Scenario | Recommended QoS | Rationale |
|---|---|---|
| Temperature readings every 5 min | QoS 0 | High frequency, non-critical |
| Fire alarm trigger | QoS 1 | Critical alert, duplicates acceptable |
| HVAC setpoint command | QoS 1 | Important, duplicate won’t cause harm |
| Door lock/unlock command | QoS 2 | Security-critical, exactly once |

Scenario: You’re designing an MQTT system for a 50-acre strawberry farm with three types of data:

  1. Soil moisture sensors (20 sensors): Publish every 30 minutes → 960 messages/day
  2. Frost alerts (5 temperature sensors): Publish when temp < 2°C → ~10 alerts/day (seasonal)
  3. Irrigation valve commands: Control 8 zones → ~50 commands/day

Step 1: Analyze Message Criticality

| Data Type | Loss Impact | Duplicate Impact | Recommended QoS |
|---|---|---|---|
| Soil moisture | Low (next reading in 30 min) | None (idempotent data) | QoS 0 |
| Frost alert | High (crop damage in 30 min) | Low (duplicate alerts acceptable) | QoS 1 |
| Valve command | Medium (wasted water/drought) | High (a duplicated toggle command reverses itself, leaving the valve in the wrong state) | QoS 2 |

Step 2: Calculate Battery Impact

Assume a LoRaWAN gateway with a 2000 mAh battery and 120 mA TX current, with radio-on time of 50 ms per QoS 0 message, 150 ms per QoS 1 message, and 300 ms per QoS 2 message:

  • Soil moisture (QoS 0): 960 msg/day × 50 ms × 120 mA = 1.6 mAh/day
  • Frost alerts (QoS 1): 10 msg/day × 150 ms × 120 mA = 0.05 mAh/day
  • Valve commands (QoS 2): 50 msg/day × 300 ms × 120 mA = 0.5 mAh/day

Total daily consumption: 1.6 + 0.05 + 0.5 = 2.15 mAh/day
Battery life: 2000 mAh / 2.15 mAh/day = 930 days (~2.5 years)

What if we used QoS 2 for everything?

  • Total: (960 + 10 + 50) × 300 ms × 120 mA = 10.2 mAh/day
  • Battery life: 2000 / 10.2 = 196 days (6.5 months) - 4.7x worse!
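The comparison between the mixed-QoS design and the all-QoS-2 design can be reproduced with a short script. This is a sketch using the Step 2 assumptions (120 mA TX current and the stated radio-on time per message); `daily_mah` and the stream tuples are hypothetical names.

```python
TX_MA = 120.0
AIRTIME_S = {0: 0.050, 1: 0.150, 2: 0.300}   # radio-on time per message


def daily_mah(streams):
    """streams: iterable of (messages_per_day, qos) pairs."""
    mas = sum(n * AIRTIME_S[qos] * TX_MA for n, qos in streams)
    return mas / 3600.0                       # mA*s -> mAh


mixed = daily_mah([(960, 0), (10, 1), (50, 2)])
all_qos2 = daily_mah([(960, 2), (10, 2), (50, 2)])

print(f"mixed QoS: {mixed:.2f} mAh/day, {2000 / mixed:.0f} days")
print(f"all QoS 2: {all_qos2:.2f} mAh/day, {2000 / all_qos2:.0f} days")
```

The high-volume soil moisture stream dominates the budget, so moving it alone between QoS levels accounts for nearly all of the difference.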

Step 3: Design Decision

# Publisher configuration: topic -> publish options
TOPICS = {
    "farm/soil/moisture": {"qos": 0, "retain": True},    # Latest reading for new subscribers
    "farm/alerts/frost": {"qos": 1, "retain": False},    # Event, not state
    # "{zone}" is filled in per valve; the "+" wildcard is valid only in subscriptions
    "farm/valves/{zone}/command": {"qos": 2, "retain": True},  # Last command = current state
}

Key Insight: QoS 2 usage is <5% of messages but ensures critical commands never duplicate. QoS 0 for 94% of traffic optimizes battery life without compromising reliability.

16.8.8 Cost Implications (AWS IoT Core Pricing)

| QoS Level | Monthly Messages | Cost @ $1/million | Annual Cost |
|---|---|---|---|
| QoS 0 | 432,000 | $0.43 | $5.18 |
| QoS 1 | 864,000 | $0.86 | $10.37 |
| QoS 2 | 1,728,000 | $1.73 | $20.74 |

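Key insight: If every protocol packet is billed as a message, QoS 2 costs 4x more than QoS 0 for the same sensor data. Note that AWS IoT Core itself accepts only QoS 0 and QoS 1, so the QoS 2 row is illustrative and applies to brokers that support all three levels.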

16.10 Session Management

Pitfall: Ignoring Clean Session Implications

The Mistake: Using clean_session=True without understanding that the broker discards all stored state, causing missed messages after reconnection.

Why It Happens: Many examples use clean_session=True for simplicity. When a device reconnects after network loss, it must re-subscribe, and messages published during disconnection are lost.

The Fix: For most IoT applications, use clean_session=False with a persistent client ID:

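import paho.mqtt.client as mqtt  # assumes paho-mqtt 2.x; clean_session is a constructor argument

# BAD: Clean session loses all state on reconnect
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, clean_session=True)

# GOOD: Persistent session retains subscriptions and queued messages
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2,
                     client_id="device-ABC123", clean_session=False)
client.connect(broker)  # broker: hostname of the MQTT broker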

With persistent sessions:

  • Subscriptions survive reconnection
  • QoS 1/2 messages are queued while client is offline
  • Session expiry interval (MQTT 5.0) controls how long broker retains state

Common Pitfalls

Unencrypted MQTT exposes device credentials and sensor data to network eavesdroppers — in a building IoT deployment on shared WiFi, this means any connected device can read all sensor data. Always enable TLS 1.2+ on the broker and generate unique client certificates for each device class.

Without LWT, there is no automatic notification when a device disconnects ungracefully — missed timeout alarms and false-healthy device status are common consequences. Configure LWT on every device connection to publish an offline status message, enabling real-time fleet health monitoring.

A single MQTT connection serializes all publishes through one TCP socket — at 100 messages/second with QoS 1, TCP backpressure creates queuing latency. Use multiple parallel MQTT connections or partition topics across connection pools for throughput above 1,000 messages/second.

16.11 What’s Next

Now that you can distinguish MQTT QoS levels and calculate their overhead:

| Chapter | Focus | Why Read It |
|---|---|---|
| MQTT Security | TLS, authentication, and access control | Protect your QoS-configured MQTT deployment from unauthorised access |
| MQTT QoS and Session Management | Persistent sessions, message queuing, and session expiry | Go deeper on how the broker manages state across QoS levels |
| MQTT Labs | Hands-on MQTT client implementation | Apply QoS selection decisions in working Python and C++ clients |
| MQTT Publish-Subscribe Basics | Topics, wildcards, and broker architecture | Revisit the foundation if topics or broker routing are still unclear |
| CoAP Protocol Fundamentals | Confirmable vs non-confirmable messages in CoAP | Compare MQTT QoS with CoAP’s reliability model for constrained devices |