21  MQTT QoS Fundamentals

In 60 Seconds

MQTT Quality of Service (QoS) determines how reliably messages are delivered between clients and the broker. QoS 0 sends once with no confirmation (like shouting into a crowd), QoS 1 retries until acknowledged (like certified mail), and QoS 2 uses a four-step handshake for exactly-once delivery (like a bank transfer). For most IoT sensor readings, QoS 0 or 1 suffices; reserve QoS 2 for commands where duplication would cause harm, such as billing transactions or actuator controls.

21.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Explain QoS Semantics: Explain what Quality of Service means in MQTT messaging and justify why three distinct delivery guarantee levels exist
  • Distinguish QoS Levels: Distinguish between QoS 0, 1, and 2 by comparing their handshake sequences, overhead costs, and reliability guarantees
  • Analyze Session Trade-offs: Analyze the difference between clean and persistent sessions, assessing the impact on broker memory and offline message delivery
  • Apply Retained Messages: Apply retained messages correctly to device status topics and demonstrate when they provide value over real-time publishing
  • Select and Justify QoS Choices: Select appropriate QoS levels for common IoT scenarios and justify each selection based on criticality, idempotency, and battery constraints

Key Concepts

  • QoS 0 (At Most Once): Fire-and-forget delivery with no acknowledgment; lowest overhead, message loss possible
  • QoS 1 (At Least Once): ACK-based delivery ensuring the message arrives at least once; duplicates are possible if the PUBACK is lost
  • QoS 2 (Exactly Once): Four-message handshake (PUBLISH/PUBREC/PUBREL/PUBCOMP) guaranteeing exactly one delivery
  • PUBACK: Receiver's acknowledgment of a QoS 1 PUBLISH (sent by the broker to a publisher, or by a subscriber to the broker); on receipt, the sender removes the message from its retry queue
  • QoS Downgrade: The broker delivers at the subscription's QoS level if it is lower than the publisher's, without notifying either side; a common source of unexpected behavior
  • In-flight Messages: QoS 1/2 messages awaiting acknowledgment; the window size limits how many unacknowledged messages can be outstanding at once
  • Message Ordering: MQTT guarantees ordered delivery within a client session but not across sessions or publishers

21.2 For Beginners: MQTT QoS Fundamentals

Quality of Service in MQTT determines how hard the system tries to deliver each message. At the lowest level (QoS 0), messages are sent once with no confirmation. At the highest level (QoS 2), a four-step handshake ensures the message arrives exactly once. Choosing the right level is a trade-off between reliability and speed.

“Why can’t everything just be QoS 2?” asked Sammy the Sensor. “Then nothing would ever get lost!”

Bella the Battery shook her head. “Because QoS 2 uses FOUR messages for every single reading: PUBLISH, PUBREC, PUBREL, PUBCOMP. If you send temperature every 10 seconds, that’s 4 times the radio usage. My battery would drain in weeks instead of months!”

Max the Microcontroller drew it out: “QoS 0 is one arrow: you send it and forget. QoS 1 is two arrows: you send, the broker sends back PUBACK. If you don’t get PUBACK, you resend. QoS 2 is four arrows with a two-phase commit. Each step up doubles the overhead but strengthens the guarantee.”

“The key insight,” said Lila the LED, “is matching QoS to the MESSAGE, not the device. Sammy can use QoS 0 for routine temperature and QoS 1 for critical alerts – in the same connection. You don’t have to pick one QoS for everything. Be smart about it and Bella stays happy!”

Common Pitfalls

QoS 2’s four-message handshake sends four times as many packets as QoS 0 and takes two round trips on each hop (four end to end, counting publisher to broker and broker to subscriber). On a 2G connection this can mean 2-8 seconds per message. Reserve QoS 2 for commands that must not be repeated (actuator control, firmware triggers); use QoS 1 for telemetry where occasional duplicates are acceptable.

MQTT QoS governs publisher-to-broker and broker-to-subscriber segments independently — QoS 1 from publisher to broker combined with QoS 0 subscription results in at-most-once overall. Always set both the publish QoS AND the subscribe QoS to the required level.
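The interaction between the two segments can be sketched in a few lines of Python. This is an illustrative model rather than broker code, and the function name `effective_qos` is an assumption made for this example.

```python
def effective_qos(publish_qos: int, subscribe_qos: int) -> int:
    """Return the delivery guarantee a subscriber actually gets.

    The broker delivers at the lower of the publish QoS and the
    QoS of the subscription (QoS downgrade).
    """
    for qos in (publish_qos, subscribe_qos):
        if qos not in (0, 1, 2):
            raise ValueError(f"invalid QoS level: {qos}")
    return min(publish_qos, subscribe_qos)


# A QoS 1 publish delivered to a QoS 0 subscription is at-most-once overall.
print(effective_qos(1, 0))
```

Running the same check over every publisher and subscriber pair in a design is a quick way to spot a link that is weaker than intended.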

QoS 1 guarantees at-least-once delivery: when the sender (the publisher, or the broker on the subscriber leg) does not receive a PUBACK, it retransmits the message with the DUP flag set. Applications that store sensor readings will create duplicate database entries unless they check application-level message IDs or make their writes idempotent.
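One way to make writes idempotent is to key each stored reading by an identifier that the publisher puts in the payload. The sketch below assumes such an application-level `message_id` exists (it is not the MQTT packet identifier, which is reused over time), and `ReadingStore` is a hypothetical in-memory stand-in for a database table.

```python
class ReadingStore:
    """In-memory stand-in for a database table keyed by message ID."""

    def __init__(self) -> None:
        self._rows: dict[str, float] = {}

    def save(self, message_id: str, value: float) -> bool:
        """Store a reading once; return False if it was a duplicate."""
        if message_id in self._rows:
            return False  # redelivered copy: ignore it
        self._rows[message_id] = value
        return True

    def __len__(self) -> int:
        return len(self._rows)


store = ReadingStore()
store.save("sensor1-000042", 24.1)   # first delivery
store.save("sensor1-000042", 24.1)   # QoS 1 retransmission of the same reading
print(len(store))                    # one row, not two
```

In a real database the same effect usually comes from a unique constraint on the message ID column combined with an insert that ignores conflicts.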

21.3 What is QoS (Quality of Service)?

Analogy: QoS is like different shipping methods when sending a package.

Three QoS levels = Three shipping methods:

QoS Level | Shipping Analogy | Real Meaning | When to Use
QoS 0 | Drop in mailbox | Fire and forget (maybe lost) | Temperature readings every 10 sec
QoS 1 | Certified mail | At least once (signature required, might get duplicate) | Door sensor “opened” alert
QoS 2 | Registered mail | Exactly once (signed tracking, no duplicates) | Smart lock “UNLOCK” command

21.4 QoS 0: Fire and Forget

Simple explanation:

Sensor -> Broker: "Temperature: 24C"
Broker: (receives it... or doesn't, no confirmation)
Sensor: (doesn't care, sends next reading in 10 seconds anyway)
  • Fastest (no waiting for acknowledgment)
  • Lowest battery usage (minimal radio time)
  • Might be lost (no delivery guarantee)

21.5 QoS 1: At Least Once

Simple explanation:

Sensor -> Broker: "Door opened!"
Broker -> Sensor: "Got it! (PUBACK)"
Sensor: "OK, done"

(If PUBACK is lost, sensor retries)
Sensor -> Broker: "Door opened!" (again)
Broker: "Got it again!" (duplicate received)
  • Guaranteed delivery (will retry until acknowledged)
  • Might get duplicates (if acknowledgment is lost)
  • Slower (waits for PUBACK)
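
The retry behaviour above can be reproduced with a small simulation. This is a sketch under stated assumptions: `Broker`, `publish_qos1`, and the scripted `ack_lost` list are hypothetical names for this example, and packet loss is scripted rather than random so the result is repeatable.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Records every PUBLISH it receives as (payload, dup_flag)."""
    received: list = field(default_factory=list)

    def on_publish(self, payload: str, dup: bool) -> str:
        self.received.append((payload, dup))
        return "PUBACK"


def publish_qos1(broker: Broker, payload: str, ack_lost: list, max_attempts: int = 5) -> int:
    """Send until a PUBACK gets back; return the number of attempts used.

    ack_lost[i] is True when the PUBACK for attempt i is dropped by the
    network (a scripted stand-in for real packet loss).
    """
    for attempt in range(max_attempts):
        ack = broker.on_publish(payload, dup=attempt > 0)  # DUP set on retries
        lost = attempt < len(ack_lost) and ack_lost[attempt]
        if ack == "PUBACK" and not lost:
            return attempt + 1
    raise TimeoutError("no PUBACK received")


broker = Broker()
attempts = publish_qos1(broker, "Door opened!", ack_lost=[True, False])
print(attempts, broker.received)
```

With the first PUBACK lost, the broker ends up holding two copies of the message, and the second copy carries the DUP flag. That is the "at least once" guarantee in action.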

21.6 QoS 2: Exactly Once

Simple explanation:

Sensor -> Broker: "UNLOCK door!"
Broker -> Sensor: "Received (PUBREC)"
Sensor -> Broker: "Release it (PUBREL)"
Broker -> Sensor: "Complete! (PUBCOMP)"

Four-way handshake ensures exactly one delivery
  • Guaranteed exactly once (no duplicates, no losses)
  • Slowest (4-way handshake)
  • Highest battery usage (most radio time)
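
To see why the extra two messages remove duplicates, it helps to look at the receiver's bookkeeping. The sketch below is a simplified model, not a full protocol implementation: `Qos2Receiver` is a hypothetical class that hands the payload to the application on first receipt and remembers only the packet ID until PUBREL arrives.

```python
class Qos2Receiver:
    """Receiver side of the QoS 2 handshake (simplified)."""

    def __init__(self) -> None:
        self.pending_ids: set[int] = set()   # packet IDs seen but not yet released
        self.delivered: list[str] = []       # payloads handed to the application

    def on_publish(self, packet_id: int, payload: str) -> str:
        # Deliver only the first time this packet ID is seen.
        if packet_id not in self.pending_ids:
            self.pending_ids.add(packet_id)
            self.delivered.append(payload)
        return "PUBREC"

    def on_pubrel(self, packet_id: int) -> str:
        # The sender has stopped retransmitting; the ID can be forgotten.
        self.pending_ids.discard(packet_id)
        return "PUBCOMP"


lock = Qos2Receiver()
lock.on_publish(7, "UNLOCK door!")
lock.on_publish(7, "UNLOCK door!")   # retransmission after a lost PUBREC
lock.on_pubrel(7)
print(lock.delivered)
```

The retransmitted PUBLISH is acknowledged again but not delivered again, because the packet ID is still on record. PUBREL tells the receiver that the ID is safe to forget.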

Figure 21.1: MQTT Quality of Service levels with increasing reliability guarantees

Minimum Viable Understanding: QoS Semantics

Core Concept: QoS 0 delivers at most once (may lose messages), QoS 1 delivers at least once (may duplicate), and QoS 2 delivers exactly once (no loss, no duplicates). Each level adds handshake overhead for stronger guarantees.

Why It Matters: Choosing the wrong QoS wastes battery life (QoS 2 for temperature readings) or loses critical data (QoS 0 for door unlock commands). This single decision can determine whether your IoT deployment succeeds or fails.

Key Takeaway: Use QoS 0 for high-frequency telemetry where the next reading supersedes any lost one, QoS 1 for important alerts and state changes, and QoS 2 only for critical commands where duplicates would cause harm (like financial transactions or actuator commands).

21.7 Real-World Examples

Scenario 1: Temperature Sensor (Use QoS 0)

10:00:00 -> Sensor: "Temp: 24.0C" (QoS 0)
10:00:10 -> Sensor: "Temp: 24.1C" (QoS 0)
10:00:20 -> Sensor: "Temp: 24.1C" (QoS 0) <- This one gets lost!
10:00:30 -> Sensor: "Temp: 24.2C" (QoS 0)

Dashboard shows: 24.0, 24.1, 24.2
Missing one reading? No problem! Next one comes in 10 seconds.

Scenario 2: Motion Sensor Alert (Use QoS 1)

Motion detected!
Sensor -> Broker: "Motion detected in garage!" (QoS 1)
Network glitch... PUBACK lost...
Sensor waits 5 seconds, no ACK
Sensor -> Broker: "Motion detected in garage!" (retry, QoS 1)
Broker: "PUBACK" (acknowledges)

Result: Alert delivered (maybe duplicate, but that's OK)

Scenario 3: Smart Lock Command (Use QoS 2)

User presses "Unlock" in app
App -> Broker: "UNLOCK door #1" (QoS 2)
Broker -> Lock: "UNLOCK door #1" (QoS 2)

4-way handshake ensures lock receives exactly once
Lock unlocks (never unlocks twice, never missed)

21.8 What are Sessions? (Clean vs Persistent)

Analogy: Think of sessions as hotel check-in preferences.

21.8.1 Clean Session (true)

Guest: "I want a fresh room, no baggage from last visit"
Hotel: "Sure! Starting fresh."

(Guest leaves)
Hotel: "They checked out, throw away their stuff"

MQTT Translation:

  • Client connects with Clean Session = true
  • Broker forgets everything when client disconnects
  • No stored messages, no subscriptions remembered
  • Like checking into a hotel with no history

When to use: Temporary connections, testing, devices that don’t care about missed messages

21.8.2 Persistent Session (Clean Session = false)

Guest: "I'm a regular, keep my preferences!"
Hotel: "Welcome back! We saved your room setup, mail, messages"

(Guest leaves for a week)
Hotel: "Keeping their stuff safe..."

(Guest returns)
Hotel: "Here are the 15 letters that arrived while you were gone"

MQTT Translation:

  • Client connects with Clean Session = false
  • Broker remembers subscriptions even after disconnect
  • Broker stores messages sent while client was offline (if QoS 1 or 2)
  • Client gets “catch-up” messages when reconnecting

When to use: Devices with intermittent connectivity, battery-powered sensors that sleep, critical subscribers that can’t miss messages
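
The difference between the two session types can be modelled with a toy broker. This is a sketch, not how any particular broker is implemented: `SessionBroker` and its methods are hypothetical, topics are matched by exact name only, and queuing is decided from the publish QoS alone (a real broker also considers the subscription's QoS).

```python
from collections import defaultdict


class SessionBroker:
    """Toy broker that tracks per-client session state."""

    def __init__(self) -> None:
        self.subscriptions = defaultdict(set)   # client_id -> topics
        self.queues = defaultdict(list)         # client_id -> queued payloads
        self.online = set()
        self.persistent = set()                 # clients with Clean Session = false

    def connect(self, client_id: str, clean_session: bool) -> list:
        """Connect a client and return any messages queued while it was offline."""
        if clean_session:
            self.subscriptions.pop(client_id, None)
            self.queues.pop(client_id, None)
            self.persistent.discard(client_id)
        else:
            self.persistent.add(client_id)
        self.online.add(client_id)
        return self.queues.pop(client_id, [])

    def subscribe(self, client_id: str, topic: str) -> None:
        self.subscriptions[client_id].add(topic)

    def disconnect(self, client_id: str) -> None:
        self.online.discard(client_id)
        if client_id not in self.persistent:
            self.subscriptions.pop(client_id, None)   # clean session: forget everything

    def publish(self, topic: str, payload: str, qos: int) -> None:
        for client_id, topics in self.subscriptions.items():
            offline = client_id not in self.online
            if topic in topics and offline and qos >= 1:
                self.queues[client_id].append(payload)  # held for reconnect


broker = SessionBroker()
broker.connect("valve-7", clean_session=False)
broker.subscribe("valve-7", "cmd/valve-7")
broker.disconnect("valve-7")
broker.publish("cmd/valve-7", "CLOSE", qos=1)
print(broker.connect("valve-7", clean_session=False))
```

The persistent client receives the queued command on reconnect. Repeating the same steps with `clean_session=True` returns an empty list, because the subscription is discarded at disconnect.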

Figure 21.2: MQTT session lifecycle comparison showing Clean Session (temporary, discards everything on disconnect) vs Persistent Session (maintains subscriptions and queues messages for offline clients). Clean sessions are optimal for simple publishers, while persistent sessions are essential for devices with intermittent connectivity that cannot miss critical messages.

21.9 Retained Messages: Last-Known State

MQTT retained messages allow the broker to store the most recent message on a topic and deliver it immediately to new subscribers:

Figure 21.3: MQTT retained message workflow showing how brokers store the last published value and deliver it immediately to new subscribers. When a sensor publishes with retain=true, the broker stores that message and automatically sends it to any future subscribers, even if the sensor is offline. This is essential for status topics where clients need the current state immediately upon connection.
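
The retained-message workflow can be sketched as a last-value store per topic. `RetainBroker` is a hypothetical toy class for illustration; it matches topics by exact name and ignores live delivery to subscribers that are already connected.

```python
from typing import Optional


class RetainBroker:
    """Toy broker that keeps the last retained payload per topic."""

    def __init__(self) -> None:
        self.retained: dict[str, str] = {}

    def publish(self, topic: str, payload: str, retain: bool = False) -> None:
        if retain:
            if payload == "":
                self.retained.pop(topic, None)   # empty retained payload clears the topic
            else:
                self.retained[topic] = payload   # newer value replaces the old one

    def subscribe(self, topic: str) -> Optional[str]:
        """Return the retained payload a new subscriber would get at once."""
        return self.retained.get(topic)


broker = RetainBroker()
broker.publish("door/sensor1/status", "OK", retain=True)
# The sensor is now asleep; a dashboard subscribes later.
print(broker.subscribe("door/sensor1/status"))
```

Only one value is kept per topic, so a retained message describes the current state, not a history of past readings.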
Tradeoff: Persistent Session vs Clean Session

Decision context: When configuring an MQTT client, you must choose whether the broker should remember client state between connections.

Factor | Persistent Session | Clean Session
Memory usage | Higher (broker stores state) | Lower (no state stored)
Reconnect speed | Faster (subscriptions restored) | Slower (must resubscribe)
Offline messages | Queued (QoS 1/2) | Lost
Client ID | Must be unique and stable | Can be random/ephemeral
Broker scalability | Lower (state per client) | Higher (stateless)
Battery impact | Higher (catch-up data on reconnect) | Lower (fresh start)

Choose Persistent Session when:

  • Device has intermittent connectivity (sleep cycles, mobile networks)
  • Cannot afford to miss messages during disconnection
  • Subscribes to topics and needs subscriptions restored automatically
  • Receiving critical commands (e.g., firmware updates, emergency alerts)

Choose Clean Session when:

  • Device only publishes (no subscriptions to maintain)
  • Frequent sensor readings where missing a few is acceptable
  • Testing or development environments
  • Battery-powered devices that prioritize power savings over completeness
  • High-scale deployments where broker memory is constrained

Default recommendation: Clean Session unless your device subscribes to topics and cannot miss messages during offline periods.
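
The default recommendation reduces to a two-input rule, sketched below. The function name `choose_session` and its parameter names are assumptions for this example; the other factors in the table (broker memory, battery) still deserve a separate look in a real design.

```python
def choose_session(subscribes: bool, can_miss_offline_messages: bool) -> str:
    """Apply the default recommendation: clean unless the device
    subscribes and cannot miss messages while offline."""
    if subscribes and not can_miss_offline_messages:
        return "persistent"
    return "clean"


print(choose_session(subscribes=False, can_miss_offline_messages=True))   # publish-only sensor
print(choose_session(subscribes=True, can_miss_offline_messages=False))   # command receiver
```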

21.10 Quick Self-Check

Q: You have a battery-powered door sensor that:

  • Sleeps for 10 minutes to save battery
  • Wakes up, connects to the broker, and publishes “status: OK”
  • Disconnects and goes back to sleep

Someone subscribes to see the door sensor status while the sensor is asleep.

Should you use:

  • A) QoS 0, Clean Session = true
  • B) QoS 1, Clean Session = true
  • C) QoS 0, Clean Session = false
  • D) QoS 1, Clean Session = false

Answer: B) QoS 1, Clean Session = true

Analysis:

QoS Level:

  • QoS 1 is correct
    • “status: OK” is important (want delivery confirmation)
    • If message is lost, subscriber doesn’t know sensor is alive
    • QoS 1 ensures broker receives it
    • Possible duplicates are OK (idempotent message)
  • QoS 0 is risky
    • Message might be lost
    • Subscriber never knows if sensor is working
  • QoS 2 is overkill
    • “status: OK” duplicates are harmless
    • Wastes battery with 4-way handshake

Clean Session:

  • Clean Session = true is correct
    • Sensor doesn’t subscribe to anything (only publishes)
    • Sensor doesn’t care about messages from when it was offline
    • Saves broker memory (doesn’t store session state)
  • Clean Session = false is unnecessary
    • Sensor isn’t subscribing, so no benefit from saved subscriptions
    • Broker wastes memory storing empty session

Power consumption comparison:

QoS 0, Clean=true:  10 mA x 100ms = 0.28 uAh per wake cycle
QoS 1, Clean=true:  10 mA x 150ms = 0.42 uAh per wake cycle (Best balance)
QoS 1, Clean=false: 10 mA x 200ms = 0.56 uAh per wake cycle
QoS 2, Clean=false: 10 mA x 400ms = 1.11 uAh per wake cycle
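
The per-cycle figures above follow from charge = current x time. The short check below reproduces them; the 10 mA current and the active times are the illustrative values from the comparison, not measurements, and `charge_uah` is a helper name chosen for this example.

```python
def charge_uah(current_ma: float, active_ms: float) -> float:
    """Charge drawn in microamp-hours for a given current and active time."""
    seconds = active_ms / 1000.0
    mah = current_ma * seconds / 3600.0   # mA x hours
    return mah * 1000.0                   # mAh -> uAh


for label, ms in [("QoS 0, Clean=true", 100), ("QoS 1, Clean=true", 150),
                  ("QoS 1, Clean=false", 200), ("QoS 2, Clean=false", 400)]:
    print(f"{label}: {charge_uah(10, ms):.2f} uAh")
```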

What about the subscriber getting the status?

Separate concern: Use Retained Messages!

// Door sensor publishes with QoS 1 AND retain flag
mqttClient.publish("door/sensor1/status", "OK", 1, true);
//                   topic                  msg    QoS  retain

// Now when subscriber connects (even if sensor is asleep):
// Broker immediately sends retained message: "status: OK"

Optimal configuration:

  • Sensor: QoS 1, Clean Session = true, Retain = true
  • Subscriber: QoS 0, Clean Session = true (just wants current status)

Relative battery impact (QoS 0 = baseline):

  • QoS 0: 1x overhead (shortest active time per cycle)
  • QoS 1: ~1.5x overhead (one extra round-trip for PUBACK)
  • QoS 2: ~4x overhead (four-step handshake per message)

Pro tip: For battery-powered sensors, use QoS 1 + Retained + Clean Session = true. This ensures reliable delivery without the overhead of persistent sessions.

21.11 Real-World QoS Decision Framework

Selecting the right QoS level requires analyzing three dimensions of your data: criticality, frequency, and idempotency.

Worked Example: QoS Selection for a Smart Hospital

Scenario: A hospital deploys 2,000 IoT devices across three categories. Each category has different reliability requirements and message patterns.

Device inventory and QoS assignment:

Device Type | Count | Frequency | Critical? | Idempotent? | QoS | Rationale
Room temperature | 800 | Every 60s | No | Yes | 0 | Next reading replaces lost one; 800 x 1 msg/min = manageable
Patient heart rate | 600 | Every 5s | Yes | Yes | 1 | Must arrive; duplicate “HR: 72” is harmless
IV pump dosage | 400 | On command | Critical | No | 2 | Duplicate “administer 50mg” could be dangerous
Nurse call button | 200 | On event | Yes | Yes | 1 | Must arrive; duplicate alert is harmless

Battery and bandwidth impact calculation:

  • QoS 0 messages: 800 devices x 1/min = 800 msg/min, 1 packet each = 800 packets/min
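  • QoS 1 messages: 800 devices (600 heart-rate monitors plus 200 nurse call buttons, all conservatively counted at the heart-rate cadence) x 12/min = 9,600 msg/min, 2 packets each = 19,200 packets/min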
  • QoS 2 messages: 400 devices x ~2/hour = 800 msg/hour, 4 packets each = 53 packets/min

Total network load: 800 + 19,200 + 53 = 20,053 packets/min

If all devices used QoS 2: (800 + 9,600 + 13) x 4 ≈ 41,652 packets/min, about 2.1x the mixed-QoS traffic, with no benefit for the temperature sensors.

Key insight: The hospital saves ~52% of network traffic by matching QoS to data criticality rather than using a blanket “everything QoS 2” policy.

The packet count per message doubles with each QoS level. For \(n\) devices, where device \(i\) publishes \(r_i\) messages/min at QoS level \(q_i\), the total packet count is:

\[P_{\text{total}} = \sum_{i=1}^{n} r_i \cdot m_{q_i}\]

where \(m_q\) is the packet multiplier for QoS level \(q\): \(m_0 = 1\), \(m_1 = 2\), \(m_2 = 4\).

For this hospital with mixed QoS assignment:

  • QoS 0: \(P_0 = 800 \times 1 \times 1 = 800\) packets/min
  • QoS 1: \(P_1 = 800 \times 12 \times 2 = 19,200\) packets/min
  • QoS 2: \(P_2 = 400 \times (2/60) \times 4 = 53\) packets/min

Total: \(P = 800 + 19,200 + 53 = 20,053\) packets/min

If all used QoS 2: \(P'_{\text{all-QoS2}} = (800 \times 1 + 800 \times 12 + 400 \times 2/60) \times 4 \approx 41,652\) packets/min

Efficiency gain: \(\eta = 1 - P/P' = 1 - 20,053/41,652 \approx 52\%\) reduction.

This demonstrates that the packet overhead ratio is \(m_2 : m_1 : m_0 = 4 : 2 : 1\): each step up in QoS level doubles the number of packets sent per message.
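
The arithmetic in this worked example can be checked with a short script. The device groups and rates are taken from the example as stated (including its choice to count all 800 QoS 1 devices at 12 messages/min); the names `GROUPS` and `packets_per_minute` are assumptions for this sketch. Because the script keeps the fractional IV pump rate instead of rounding it to 13, its all-QoS-2 total can differ from the hand calculation by about one packet.

```python
PACKETS_PER_MESSAGE = {0: 1, 1: 2, 2: 4}

# (devices, messages per minute per device, QoS level), as used in the worked example
GROUPS = [
    (800, 1.0, 0),          # room temperature
    (800, 12.0, 1),         # QoS 1 devices, all counted at 12 msg/min
    (400, 2.0 / 60.0, 2),   # IV pumps, about 2 commands per hour
]


def packets_per_minute(groups, force_qos=None):
    """Sum devices x rate x packet multiplier over all device groups."""
    total = 0.0
    for devices, rate, qos in groups:
        level = qos if force_qos is None else force_qos
        total += devices * rate * PACKETS_PER_MESSAGE[level]
    return total


mixed = packets_per_minute(GROUPS)
all_qos2 = packets_per_minute(GROUPS, force_qos=2)
print(round(mixed), round(all_qos2), f"{1 - mixed / all_qos2:.0%}")
```

Changing the rates or QoS levels in `GROUPS` shows how sensitive the total is to the high-frequency devices, which dominate the packet count here.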

21.12 Interactive Calculators

21.12.1 QoS Level Advisor

Enter your scenario parameters to get a QoS level recommendation based on message criticality, frequency, and idempotency.

21.12.2 Network Packet Overhead Calculator

Model a mixed-QoS IoT deployment and compare total packet load against a blanket QoS 2 policy.

21.12.3 Message Delivery Probability Explorer

See how network reliability affects actual message delivery for each QoS level. QoS 0 delivers at most once; QoS 1 retries to improve success; QoS 2 adds a full handshake.
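
A rough version of this comparison can be computed directly. The model below is a deliberate simplification and not the logic of the interactive tool: it assumes each transmission attempt succeeds independently with probability `p_success`, that QoS 1 and QoS 2 give up after `max_attempts` tries, and it looks only at whether the message arrives, not at duplicates.

```python
def delivery_probability(qos: int, p_success: float, max_attempts: int = 3) -> float:
    """Probability that a message reaches the receiver at least once.

    p_success is the chance that a single PUBLISH packet gets through;
    attempts are assumed to fail independently.
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    if qos == 0:
        return p_success                              # one shot, no retry
    return 1.0 - (1.0 - p_success) ** max_attempts    # QoS 1 and 2 retransmit


for qos in (0, 1, 2):
    print(qos, round(delivery_probability(qos, 0.9), 4))
```

Under these assumptions QoS 1 and QoS 2 have the same arrival probability. What QoS 2 adds is protection against duplicates, which this model does not capture.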

21.12.4 Session Type Decision Tool

Answer three questions about your device to determine whether to use a clean or persistent MQTT session.

21.13 Summary

This chapter introduced the fundamentals of MQTT Quality of Service:

  • QoS 0 (At Most Once): Fire-and-forget delivery with no acknowledgment, fastest and most battery-efficient but messages may be lost
  • QoS 1 (At Least Once): Acknowledged delivery ensuring messages arrive but may create duplicates if acknowledgments are lost
  • QoS 2 (Exactly Once): Four-way handshake guaranteeing single delivery, slowest but essential for critical commands
  • Clean Sessions: Temporary connections where broker forgets everything on disconnect, ideal for simple publishers
  • Persistent Sessions: Broker remembers subscriptions and queues messages for offline clients, essential for command receivers
  • Retained Messages: Broker stores last message and delivers to new subscribers immediately, perfect for device status

21.14 Knowledge Check

21.15 Concept Relationships

MQTT QoS Fundamentals connect to:

QoS decision framework: Assess message criticality (can it be lost?) → assess idempotency (is duplicate safe?) → assess battery constraints → choose QoS 0 (telemetry), 1 (alerts), or 2 (commands).
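
The first two questions of this framework can be written as a small function and checked against the hospital table from the worked example. `recommend_qos` and its parameters are hypothetical names for this sketch, and the battery-constraint step is left out because it depends on the power budget of the specific device.

```python
def recommend_qos(can_be_lost: bool, duplicate_is_safe: bool) -> int:
    """Map the two main questions of the framework to a QoS level."""
    if can_be_lost:
        return 0          # telemetry: the next reading replaces a lost one
    if duplicate_is_safe:
        return 1          # alerts and state changes: must arrive, repeats are harmless
    return 2              # commands where a repeat would cause harm


hospital = {
    "room temperature": recommend_qos(can_be_lost=True, duplicate_is_safe=True),
    "patient heart rate": recommend_qos(can_be_lost=False, duplicate_is_safe=True),
    "IV pump dosage": recommend_qos(can_be_lost=False, duplicate_is_safe=False),
    "nurse call button": recommend_qos(can_be_lost=False, duplicate_is_safe=True),
}
print(hospital)
```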

21.16 See Also

21.17 What’s Next

Chapter | Focus | Why Read It
MQTT QoS Levels | Technical deep dive into QoS handshakes and message flow | Understand the byte-level protocol mechanics and timing of each QoS level
MQTT Session Management | Persistent sessions, message queuing, and reconnection strategies | Learn how brokers track client state and deliver offline messages reliably
MQTT Implementation | Coding QoS in Python and on ESP32 microcontrollers | Translate QoS theory into working publisher/subscriber code
MQTT Fundamentals | Core pub/sub model, topics, and broker architecture | Revisit the foundational concepts that QoS levels build upon
CoAP Features and Labs | CoAP confirmable messages as an alternative to MQTT QoS | Compare MQTT’s QoS model with CoAP’s reliability mechanism for constrained devices