21  MQTT QoS Fundamentals

In 60 Seconds

MQTT Quality of Service (QoS) determines how reliably messages are delivered between clients and the broker: QoS 0 sends once with no confirmation (like shouting into a crowd), QoS 1 retries until acknowledged (like certified mail), and QoS 2 uses a four-step handshake for exactly-once delivery (like a bank transfer). For most IoT sensor readings, QoS 0 or 1 suffices; reserve QoS 2 for commands where duplication would cause harm, such as billing transactions or actuator controls.

21.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Explain QoS Semantics: Explain what Quality of Service means in MQTT messaging and justify why three distinct delivery guarantee levels exist
  • Distinguish QoS Levels: Distinguish between QoS 0, 1, and 2 by comparing their handshake sequences, overhead costs, and reliability guarantees
  • Analyze Session Trade-offs: Analyze the difference between clean and persistent sessions, assessing the impact on broker memory and offline message delivery
  • Apply Retained Messages: Apply retained messages correctly to device status topics and demonstrate when they provide value over real-time publishing
  • Select and Justify QoS Choices: Select appropriate QoS levels for common IoT scenarios and justify each selection based on criticality, idempotency, and battery constraints

Key Concepts

  • QoS 0 (At Most Once): Fire-and-forget delivery with no acknowledgment; lowest overhead, message loss possible
  • QoS 1 (At Least Once): Acknowledged delivery ensuring the message arrives at least once; duplicates are possible if the PUBACK is lost
  • QoS 2 (Exactly Once): Four-message handshake (PUBLISH/PUBREC/PUBREL/PUBCOMP) guaranteeing exactly one delivery
  • PUBACK: The receiver’s acknowledgment of a QoS 1 PUBLISH; it lets the sender remove the message from its retry queue
  • QoS Downgrade: The broker silently delivers at the subscriber’s QoS level if it is lower than the publisher’s; a common source of unexpected behavior
  • In-flight Messages: QoS 1/2 messages awaiting acknowledgment; the in-flight window limits how many can be unacknowledged at once
  • Message Ordering: MQTT guarantees ordered delivery within a client session but not across sessions or publishers

21.2 For Beginners: MQTT QoS Fundamentals

Quality of Service in MQTT determines how hard the system tries to deliver each message. At the lowest level (QoS 0), messages are sent once with no confirmation. At the highest level (QoS 2), a four-step handshake ensures the message arrives exactly once. Choosing the right level is a trade-off between reliability and speed.

“Why can’t everything just be QoS 2?” asked Sammy the Sensor. “Then nothing would ever get lost!”

Bella the Battery shook her head. “Because QoS 2 uses FOUR messages for every single reading: PUBLISH, PUBREC, PUBREL, PUBCOMP. If you send temperature every 10 seconds, that’s 4 times the radio usage. My battery would drain in weeks instead of months!”

Max the Microcontroller drew it out: “QoS 0 is one arrow: you send it and forget. QoS 1 is two arrows: you send, the broker sends back PUBACK. If you don’t get PUBACK, you resend. QoS 2 is four arrows with a two-phase commit. Each step up doubles the overhead but strengthens the guarantee.”

“The key insight,” said Lila the LED, “is matching QoS to the MESSAGE, not the device. Sammy can use QoS 0 for routine temperature and QoS 1 for critical alerts – in the same connection. You don’t have to pick one QoS for everything. Be smart about it and Bella stays happy!”

Common Pitfalls

QoS 2’s four-message handshake sends roughly four times as many packets as QoS 0 and adds two round trips on each QoS 2 hop (publisher to broker, broker to subscriber), so 2-4 round trips in total. On a 2G connection this can mean 2-8 seconds per message. Reserve QoS 2 for irreplaceable commands (actuator control, firmware triggers); use QoS 1 for telemetry where occasional duplicates are acceptable.

MQTT QoS governs publisher-to-broker and broker-to-subscriber segments independently — QoS 1 from publisher to broker combined with QoS 0 subscription results in at-most-once overall. Always set both the publish QoS AND the subscribe QoS to the required level.

QoS 1 guarantees at-least-once delivery, not exactly-once. When the sender (the publisher, or the broker on the subscriber leg) does not receive a PUBACK, it retransmits the message with the DUP flag set. Applications storing sensor readings will create duplicate database entries unless they deduplicate on an application-level message ID carried in the payload or make their writes idempotent.
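The duplicate-entry pitfall can be illustrated with a minimal sketch of an idempotent write. The `ReadingStore` class and the `reading_id` format are hypothetical names chosen for this example; the sketch assumes the publisher embeds its own unique ID in each payload, because MQTT packet identifiers are reused and are not a reliable deduplication key.

```python
from dataclasses import dataclass, field


@dataclass
class ReadingStore:
    """Toy in-memory store that ignores readings it has already seen."""
    rows: dict = field(default_factory=dict)

    def save(self, reading_id: str, value: float) -> bool:
        """Store the reading once; return False for a redelivered duplicate."""
        if reading_id in self.rows:
            return False
        self.rows[reading_id] = value
        return True


store = ReadingStore()
first = store.save("sensor1-000042", 24.1)   # original delivery
second = store.save("sensor1-000042", 24.1)  # QoS 1 redelivery of the same reading
print(first, second, len(store.rows))
```

In a real database the same idea is usually expressed as a unique key on the reading ID, so that a repeated insert has no effect.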

21.3 What is QoS (Quality of Service)?

Analogy: QoS is like different shipping methods when sending a package.

Three QoS levels = Three shipping methods:

QoS 0
  • Shipping analogy: Drop in mailbox
  • Real meaning: Fire and forget; the message may be lost
  • Typical use: Temperature readings every 10 seconds
QoS 1
  • Shipping analogy: Certified mail
  • Real meaning: At least once; duplicate delivery is possible
  • Typical use: Door sensor "opened" alert
QoS 2
  • Shipping analogy: Registered mail
  • Real meaning: Exactly once; no loss and no duplicates
  • Typical use: Smart lock "UNLOCK" command

21.4 QoS 0: Fire and Forget

Simple explanation:

  • Sensor publishes "Temperature: 24C" to the broker.

  • The broker may receive it, or it may be lost in transit.

  • The sensor does not wait for any reply; it just sends the next reading 10 seconds later.

  • Fastest (no waiting for acknowledgment)

  • Lowest battery usage (minimal radio time)

  • Might be lost (no delivery guarantee)

21.5 QoS 1: At Least Once

Simple explanation:

  • Sensor publishes "Door opened!".

  • Broker replies with PUBACK to confirm receipt.

  • If that PUBACK is lost, the sensor retries and the broker may see the same alert twice.

  • Guaranteed delivery (will retry until acknowledged)

  • Might get duplicates (if acknowledgment is lost)

  • Slower (waits for PUBACK)

21.6 QoS 2: Exactly Once

Simple explanation:

  • Sensor publishes "UNLOCK door!".

  • Broker replies PUBREC to say it has received the message.

  • Sensor sends PUBREL to release the message for final processing.

  • Broker replies PUBCOMP to confirm the exchange is complete.

  • This four-step handshake ensures the command is processed exactly once.

  • Guaranteed exactly once (no duplicates, no losses)

  • Slowest (4-way handshake)

  • Highest battery usage (most radio time)

Figure 21.1: MQTT Quality of Service levels with increasing reliability guarantees
Minimum Viable Understanding: QoS Semantics

Core Concept: QoS 0 delivers at most once (may lose messages), QoS 1 delivers at least once (may duplicate), QoS 2 delivers exactly once (no loss, no duplicates) - each level adds handshake overhead for stronger guarantees.

Why It Matters: Choosing the wrong QoS wastes battery life (QoS 2 for temperature readings) or loses critical data (QoS 0 for door unlock commands) - this single decision can determine whether your IoT deployment succeeds or fails.

Key Takeaway: Use QoS 0 for high-frequency telemetry where the next reading supersedes any lost one, QoS 1 for important alerts and state changes, and reserve QoS 2 only for critical commands where duplicates would cause harm (like financial transactions or actuator commands).
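The handshake overhead behind each level can be written down as a small lookup table. This is only a sketch: the `HANDSHAKE` table and `packets_per_message` helper are names invented here, and the counts cover a single hop (sender to receiver) with no retransmissions.

```python
# Control packets exchanged on one hop (sender <-> receiver) per QoS level.
HANDSHAKE = {
    0: ["PUBLISH"],
    1: ["PUBLISH", "PUBACK"],
    2: ["PUBLISH", "PUBREC", "PUBREL", "PUBCOMP"],
}


def packets_per_message(qos: int) -> int:
    """Number of packets needed to deliver one message on one hop, with no retries."""
    return len(HANDSHAKE[qos])


for qos, packets in HANDSHAKE.items():
    print(f"QoS {qos}: {packets_per_message(qos)} packet(s): {' -> '.join(packets)}")
```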

21.7 Real-World Examples

Scenario 1: Temperature Sensor (Use QoS 0)

  • 10:00:00: Sensor publishes 24.0C at QoS 0.
  • 10:00:10: Sensor publishes 24.1C at QoS 0.
  • 10:00:20: One 24.1C reading is lost.
  • 10:00:30: Sensor publishes 24.2C at QoS 0.
  • Dashboard still shows a useful trend: 24.0, 24.1, 24.2.

Scenario 2: Motion Sensor Alert (Use QoS 1)

  • Motion is detected in the garage.
  • Sensor publishes "Motion detected in garage!" at QoS 1.
  • A network glitch loses the PUBACK.
  • Sensor waits, retries, and the broker acknowledges the alert on the second attempt.
  • Result: the alert is delivered, even if it appears twice.

Scenario 3: Smart Lock Command (Use QoS 2)

  • User presses "Unlock" in the app.
  • App sends UNLOCK door #1 using QoS 2.
  • Broker forwards the same command to the lock with QoS 2.
  • The four-step handshake guarantees the lock acts on that command exactly once.

21.8 What are Sessions? (Clean vs Persistent)

Analogy: Think of sessions as hotel check-in preferences.

21.8.1 Clean Session (true)

  • Guest says, “I want a fresh room, no baggage from last visit.”
  • Hotel replies, “Sure. Starting fresh.”
  • When the guest leaves, the hotel throws everything away.

MQTT Translation:

  • Client connects with Clean Session = true
  • Broker forgets everything when client disconnects
  • No stored messages, no subscriptions remembered
  • Like checking into a hotel with no history

When to use: Temporary connections, testing, devices that don’t care about missed messages

21.8.2 Persistent Session (false)

  • Guest says, “I’m a regular. Keep my preferences.”
  • Hotel replies, “Welcome back. We saved your room setup, mail, and messages.”
  • While the guest is away, the hotel keeps everything safe.
  • When the guest returns, the hotel hands over everything that arrived during the absence.

MQTT Translation:

  • Client connects with Clean Session = false
  • Broker remembers subscriptions even after disconnect
  • Broker stores messages sent while client was offline (if QoS 1 or 2)
  • Client gets “catch-up” messages when reconnecting

When to use: Devices with intermittent connectivity, battery-powered sensors that sleep, critical subscribers that can’t miss messages
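The difference between the two session types can be sketched with a toy in-memory model. `ToyBroker` is a hypothetical class written for this illustration, not a real broker or library API; it assumes every subscription is made at QoS 1 or higher and models only the offline queue, not live delivery.

```python
from collections import defaultdict


class ToyBroker:
    """Very small model of how a broker treats sessions; not a real MQTT broker."""

    def __init__(self):
        self.subscriptions = defaultdict(set)   # client_id -> subscribed topics
        self.offline_queue = defaultdict(list)  # client_id -> queued payloads
        self.online = set()
        self.persistent = set()  # clients that connected with Clean Session = false

    def connect(self, client_id, clean_session):
        if clean_session:
            # Start fresh: drop any state left over from earlier connections.
            self.subscriptions.pop(client_id, None)
            self.offline_queue.pop(client_id, None)
            self.persistent.discard(client_id)
        else:
            self.persistent.add(client_id)
        self.online.add(client_id)
        # Hand over anything queued while the client was away.
        return self.offline_queue.pop(client_id, [])

    def subscribe(self, client_id, topic):
        self.subscriptions[client_id].add(topic)

    def disconnect(self, client_id):
        self.online.discard(client_id)
        if client_id not in self.persistent:
            self.subscriptions.pop(client_id, None)  # clean session: forget everything

    def publish(self, topic, payload, qos):
        for client_id, topics in self.subscriptions.items():
            if topic not in topics or client_id in self.online:
                continue  # live delivery is not modelled here
            if qos >= 1:
                self.offline_queue[client_id].append(payload)


broker = ToyBroker()
for cid, clean in (("lock-clean", True), ("lock-persistent", False)):
    broker.connect(cid, clean_session=clean)
    broker.subscribe(cid, "home/lock/cmd")
    broker.disconnect(cid)

broker.publish("home/lock/cmd", "UNLOCK", qos=1)  # both clients are offline

clean_backlog = broker.connect("lock-clean", clean_session=True)
persistent_backlog = broker.connect("lock-persistent", clean_session=False)
print(clean_backlog, persistent_backlog)
```

Only the client that used a persistent session receives the command sent while it was away; the clean-session client reconnects with no subscriptions and no backlog.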

Figure 21.2: MQTT session lifecycle comparison showing Clean Session (temporary, discards everything on disconnect) vs Persistent Session (maintains subscriptions and queues messages for offline clients). Clean sessions are optimal for simple publishers, while persistent sessions are essential for devices with intermittent connectivity that cannot miss critical messages.

21.9 Retained Messages: Last-Known State

MQTT retained messages allow the broker to store the most recent message on a topic and deliver it immediately to new subscribers:

Figure 21.3: MQTT retained message workflow showing how brokers store the last published value and deliver it immediately to new subscribers. When a sensor publishes with retain=true, the broker stores that message and automatically sends it to any future subscribers, even if the sensor is offline. This is essential for status topics where clients need the current state immediately upon connection.
Tradeoff: Persistent Session vs Clean Session

Decision context: When configuring an MQTT client, you must choose whether the broker should remember client state between connections.

Persistent Session

  • Memory usage: Higher because the broker stores state
  • Reconnect speed: Faster because subscriptions are restored
  • Offline messages: Queued for QoS 1 and QoS 2
  • Client ID: Must be unique and stable
  • Broker scalability: Lower because state is stored per client
  • Battery impact: Higher because catch-up data may be delivered on reconnect

Clean Session

  • Memory usage: Lower because no state is stored
  • Reconnect speed: Slower because the client must resubscribe
  • Offline messages: Lost while the client is disconnected
  • Client ID: Can be random or ephemeral
  • Broker scalability: Higher because the broker stays stateless
  • Battery impact: Lower because the client starts fresh each time

Choose Persistent Session when:

  • Device has intermittent connectivity (sleep cycles, mobile networks)
  • Cannot afford to miss messages during disconnection
  • Subscribes to topics and needs subscriptions restored automatically
  • Receiving critical commands (e.g., firmware updates, emergency alerts)

Choose Clean Session when:

  • Device only publishes (no subscriptions to maintain)
  • Frequent sensor readings where missing a few is acceptable
  • Testing or development environments
  • Battery-powered devices that prioritize power savings over completeness
  • High-scale deployments where broker memory is constrained

Default recommendation: Clean Session unless your device subscribes to topics and cannot miss messages during offline periods.
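The default recommendation reduces to a two-input rule, sketched below. The function name and its boolean parameters are illustrative choices, and the rule deliberately ignores the secondary factors listed above (broker memory, battery budget).

```python
def choose_clean_session(subscribes: bool, can_miss_messages: bool) -> bool:
    """Return the Clean Session flag suggested by the default recommendation."""
    # Persistent (Clean Session = false) only when the device subscribes
    # AND cannot afford to miss messages published while it was offline.
    needs_persistent = subscribes and not can_miss_messages
    return not needs_persistent


publish_only_sensor = choose_clean_session(subscribes=False, can_miss_messages=True)
command_receiver = choose_clean_session(subscribes=True, can_miss_messages=False)
print(publish_only_sensor, command_receiver)
```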

21.10 Quick Self-Check

Q: You have a battery-powered door sensor that:

  • Sleeps for 10 minutes to save battery
  • Wakes up, connects to broker, publishes “status: OK”
  • Disconnects and goes back to sleep
  • Someone subscribes to see door sensor status while sensor is asleep

Should you use:

  • A) QoS 0, Clean Session = true
  • B) QoS 1, Clean Session = true
  • C) QoS 0, Clean Session = false
  • D) QoS 1, Clean Session = false


Answer: B) QoS 1, Clean Session = true

Analysis:

QoS Level:

  • QoS 1 is correct
    • “status: OK” is important (want delivery confirmation)
    • If message is lost, subscriber doesn’t know sensor is alive
    • QoS 1 ensures broker receives it
    • Possible duplicates are OK (idempotent message)
  • QoS 0 is risky
    • Message might be lost
    • Subscriber never knows if sensor is working
  • QoS 2 is overkill
    • “status: OK” duplicates are harmless
    • Wastes battery with 4-way handshake

Clean Session:

  • Clean Session = true is correct
    • Sensor doesn’t subscribe to anything (only publishes)
    • Sensor doesn’t care about messages from when it was offline
    • Saves broker memory (doesn’t store session state)
  • Clean Session = false is unnecessary
    • Sensor isn’t subscribing, so no benefit from saved subscriptions
    • Broker wastes memory storing empty session

Power consumption comparison:

  • QoS 0, Clean=true: 10 mA x 100 ms = 0.28 uAh per wake cycle
  • QoS 1, Clean=true: 10 mA x 150 ms = 0.42 uAh per wake cycle (the best balance here)
  • QoS 1, Clean=false: 10 mA x 200 ms = 0.56 uAh per wake cycle
  • QoS 2, Clean=false: 10 mA x 400 ms = 1.11 uAh per wake cycle
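The per-cycle figures follow from charge = current x time, converted from milliamp-seconds to microamp-hours. A short sketch of that arithmetic is given here; the helper name is made up for this example, and the 10 mA current and active times are the values quoted in the comparison.

```python
def charge_per_cycle_uah(current_ma: float, active_ms: float) -> float:
    """Charge drawn in one wake cycle, in microamp-hours."""
    milliamp_seconds = current_ma * (active_ms / 1000.0)
    return milliamp_seconds / 3600.0 * 1000.0  # mA*s -> mAh -> uAh


CASES = [
    ("QoS 0, Clean=true", 100),
    ("QoS 1, Clean=true", 150),
    ("QoS 1, Clean=false", 200),
    ("QoS 2, Clean=false", 400),
]
for label, active_ms in CASES:
    print(f"{label}: {charge_per_cycle_uah(10, active_ms):.2f} uAh")
```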

What about the subscriber getting the status?

Separate concern: Use Retained Messages!

Publish the current status as a retained message with these settings:

  • topic: door/sensor1/status
  • payload: OK
  • QoS: 1
  • retain flag: true

When a subscriber connects later, the broker immediately sends the retained "status: OK" value even if the sensor is asleep.
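As a rough sketch, the retained publish might look like the helper below. It assumes a client object whose `publish` method accepts `topic`, `payload`, `qos`, and `retain`, as the paho-mqtt `Client.publish` method does; `publish_status` and `RecordingClient` are hypothetical names, and the stand-in client exists only so the call can be shown without a live broker.

```python
def publish_status(client, status: str = "OK"):
    """Publish the sensor status as a retained QoS 1 message.

    `client` is assumed to expose publish(topic, payload, qos, retain),
    as the paho-mqtt Client does.
    """
    return client.publish("door/sensor1/status", payload=status, qos=1, retain=True)


class RecordingClient:
    """Stand-in client that records publish calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def publish(self, topic, payload=None, qos=0, retain=False):
        self.calls.append((topic, payload, qos, retain))


demo = RecordingClient()
publish_status(demo)
print(demo.calls)
```

With a real client, the same call would be made after connecting and before the sensor disconnects and returns to sleep.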

Optimal configuration:

  • Sensor: QoS 1, Clean Session = true, Retain = true
  • Subscriber: QoS 0, Clean Session = true (just wants current status)

Relative battery impact (QoS 0 = baseline):

  • QoS 0: 1x overhead (shortest active time per cycle)
  • QoS 1: ~1.5x overhead (one extra round-trip for PUBACK)
  • QoS 2: ~4x overhead (four-step handshake per message)

Pro tip: For battery-powered sensors, use QoS 1 + Retained + Clean Session = true. This ensures reliable delivery without the overhead of persistent sessions.

21.11 Real-World QoS Decision Framework

Selecting the right QoS level requires analyzing three dimensions of your data: criticality, frequency, and idempotency.

Worked Example: QoS Selection for a Smart Hospital

Scenario: A hospital deploys 2,000 IoT devices across four categories. Each category has different reliability requirements and message patterns.

Device inventory and QoS assignment:

Room temperature
  • Count: 800 devices
  • Frequency: every 60 seconds
  • Critical: no
  • Idempotent: yes
  • Recommended QoS: 0
  • Rationale: the next reading replaces any lost value
Patient heart rate
  • Count: 600 devices
  • Frequency: every 5 seconds
  • Critical: yes
  • Idempotent: yes
  • Recommended QoS: 1
  • Rationale: the data must arrive, but a duplicate heart-rate sample is harmless
IV pump dosage
  • Count: 400 devices
  • Frequency: on command
  • Critical: yes
  • Idempotent: no
  • Recommended QoS: 2
  • Rationale: duplicate "administer 50mg" commands could be dangerous
Nurse call button
  • Count: 200 devices
  • Frequency: on event
  • Critical: yes
  • Idempotent: yes
  • Recommended QoS: 1
  • Rationale: alerts must arrive, but duplicate alerts are still safe to handle

Battery and bandwidth impact calculation:

  • QoS 0 messages: 800 devices x 1/min = 800 msg/min, 1 packet each = 800 packets/min
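  • QoS 1 messages: 800 devices x 12/min = 9,600 msg/min, 2 packets each = 19,200 packets/min (an upper bound: this counts the 200 event-driven nurse call buttons at the same 12/min rate as the 600 heart-rate monitors)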
  • QoS 2 messages: 400 devices x ~2/hour = 800 msg/hour, 4 packets each = 53 packets/min

Total network load: 800 + 19,200 + 53 = 20,053 packets/min

If all devices used QoS 2: (800 + 9,600 + 13) x 4 ≈ 41,652 packets/min (about 2.1x the traffic, for no benefit on temperature sensors).

Key insight: The hospital saves ~52% of network traffic by matching QoS to data criticality rather than using a blanket “everything QoS 2” policy.

The packets needed per message double with each QoS level. For \(n\) devices, where device \(i\) publishes at rate \(r_i\) (messages/min) using QoS level \(q_i\), the total packet count is:

\[P_{\text{total}} = \sum_{i=1}^{n} r_i \cdot m_{q_i}\]

where \(m_q\) is the packet multiplier for QoS level \(q\): \(m_0 = 1\), \(m_1 = 2\), \(m_2 = 4\).

For this hospital with mixed QoS assignment:

  • QoS 0: \(P_0 = 800 \times 1 \times 1 = 800\) packets/min
  • QoS 1: \(P_1 = 800 \times 12 \times 2 = 19,200\) packets/min
  • QoS 2: \(P_2 = 400 \times (2/60) \times 4 = 53\) packets/min

Total: \(P = 800 + 19,200 + 53 = 20,053\) packets/min

If all used QoS 2: \(P'_{\text{all-QoS2}} = (800 \times 1 + 800 \times 12 + 400 \times 2/60) \times 4 \approx 41,652\) packets/min

Efficiency gain: \(\eta = 1 - P/P' = 1 - 20,053/41,652 \approx 52\%\) reduction.

This demonstrates that the packet overhead ratio is \(m_2 : m_1 : m_0 = 4 : 2 : 1\) — each QoS level doubles the network load.
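The worked example can be reproduced with a few lines of arithmetic. The sketch below uses the device counts and rates from the calculation above; `FLEETS` and `packets_per_minute` are names chosen for this illustration, and the ~2 commands/hour rate for QoS 2 devices is the same estimate used in the text.

```python
PACKETS_PER_MESSAGE = {0: 1, 1: 2, 2: 4}

# (label, device count, messages per minute per device, QoS) from the worked example.
FLEETS = [
    ("QoS 0 telemetry", 800, 1.0, 0),
    ("QoS 1 monitoring and alerts", 800, 12.0, 1),
    ("QoS 2 commands", 400, 2 / 60, 2),
]


def packets_per_minute(fleets, force_qos=None):
    """Total packets/min; force_qos overrides every fleet's QoS level."""
    total = 0.0
    for _label, devices, rate, qos in fleets:
        level = qos if force_qos is None else force_qos
        total += devices * rate * PACKETS_PER_MESSAGE[level]
    return total


mixed = packets_per_minute(FLEETS)
all_qos2 = packets_per_minute(FLEETS, force_qos=2)
saving = 1 - mixed / all_qos2
print(f"mixed: {mixed:.0f}, all QoS 2: {all_qos2:.0f}, saving: {saving:.0%}")
```

Because the code keeps the unrounded 13.3 msg/min for the QoS 2 fleet, its all-QoS-2 total can differ by a packet from the hand calculation, which rounds that rate to 13 before multiplying.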

21.12 Interactive Calculators

21.12.1 QoS Level Advisor

Enter your scenario parameters to get a QoS level recommendation based on message criticality, frequency, and idempotency.
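A recommendation rule of this kind can be approximated in a few lines. The sketch below is an assumption about the advisor's core logic, not its actual implementation: it uses only criticality and idempotency, leaves out message frequency and battery constraints, and checks itself against the hospital device categories from the worked example.

```python
def recommend_qos(critical: bool, idempotent: bool) -> int:
    """Suggest a QoS level from criticality and idempotency."""
    if not critical:
        return 0  # a lost message is tolerable; the next reading replaces it
    if idempotent:
        return 1  # must arrive, and a duplicate is harmless
    return 2      # must arrive and must not be applied twice


# (critical, idempotent) flags taken from the hospital device inventory.
HOSPITAL = {
    "room temperature": (False, True),
    "patient heart rate": (True, True),
    "IV pump dosage": (True, False),
    "nurse call button": (True, True),
}
recommendations = {name: recommend_qos(*flags) for name, flags in HOSPITAL.items()}
print(recommendations)
```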

21.12.2 Network Packet Overhead Calculator

Model a mixed-QoS IoT deployment and compare total packet load against a blanket QoS 2 policy.

21.12.3 Message Delivery Probability Explorer

See how network reliability affects actual message delivery for each QoS level. QoS 0 delivers at most once; QoS 1 retries to improve success; QoS 2 adds a full handshake.
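One very simplified way to reason about this is an independent-loss model, sketched below. Everything in it is an assumption made for illustration: each delivery attempt succeeds independently with probability `p_success`, QoS 1 and QoS 2 give up after `max_attempts` tries, and QoS 2 is treated like QoS 1 for delivery (its extra handshake removes duplicates rather than adding delivery chances). Real links, which run MQTT over TCP, behave less tidily.

```python
def delivery_probability(qos: int, p_success: float, max_attempts: int = 3) -> float:
    """Chance that a message gets through one hop under a simple loss model.

    Assumes each attempt succeeds independently with probability p_success,
    and that QoS 1 and QoS 2 give up after max_attempts tries.
    """
    if qos == 0:
        return p_success  # single attempt, no retry
    return 1 - (1 - p_success) ** max_attempts


for qos in (0, 1, 2):
    print(f"QoS {qos}: {delivery_probability(qos, 0.9):.3f}")
```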

21.12.4 Session Type Decision Tool

Answer three questions about your device to determine whether to use a clean or persistent MQTT session.

21.13 Summary

This chapter introduced the fundamentals of MQTT Quality of Service:

  • QoS 0 (At Most Once): Fire-and-forget delivery with no acknowledgment, fastest and most battery-efficient but messages may be lost
  • QoS 1 (At Least Once): Acknowledged delivery ensuring messages arrive but may create duplicates if acknowledgments are lost
  • QoS 2 (Exactly Once): Four-way handshake guaranteeing single delivery, slowest but essential for critical commands
  • Clean Sessions: Temporary connections where broker forgets everything on disconnect, ideal for simple publishers
  • Persistent Sessions: Broker remembers subscriptions and queues messages for offline clients, essential for command receivers
  • Retained Messages: Broker stores last message and delivers to new subscribers immediately, perfect for device status

21.14 Knowledge Check

21.15 Concept Relationships

MQTT QoS Fundamentals connect to:

QoS decision framework: Assess message criticality (can it be lost?) → assess idempotency (is duplicate safe?) → assess battery constraints → choose QoS 0 (telemetry), 1 (alerts), or 2 (commands).

21.16 See Also

21.17 What’s Next

  • MQTT QoS Levels: Technical deep dive into QoS handshakes and message flow. Read this next to understand the byte-level protocol mechanics and timing of each QoS level.
  • MQTT Session Management: Persistent sessions, message queuing, and reconnection strategies. Read this next to learn how brokers track client state and deliver offline messages reliably.
  • MQTT Implementation: Coding QoS in Python and on ESP32 microcontrollers. Read this next to translate QoS theory into working publisher/subscriber code.
  • MQTT Fundamentals: Core pub/sub model, topics, and broker architecture. Read this next to revisit the foundational concepts that QoS levels build upon.
  • CoAP Features and Labs: CoAP confirmable messages as an alternative to MQTT QoS. Read this next to compare MQTT’s QoS model with CoAP’s reliability mechanism for constrained devices.