MQTT Quality of Service (QoS) determines how reliably messages are delivered between clients and broker: QoS 0 sends once with no confirmation (like shouting into a crowd), QoS 1 retries until acknowledged (like certified mail), and QoS 2 uses a four-step handshake for exactly-once delivery (like a bank transfer). For most IoT sensor readings, QoS 0 or 1 suffices; reserve QoS 2 for commands where duplication would cause harm, such as billing transactions or actuator controls.
Explain QoS Semantics: Explain what Quality of Service means in MQTT messaging and justify why three distinct delivery guarantee levels exist
Distinguish QoS Levels: Distinguish between QoS 0, 1, and 2 by comparing their handshake sequences, overhead costs, and reliability guarantees
Analyze Session Trade-offs: Analyze the difference between clean and persistent sessions, assessing the impact on broker memory and offline message delivery
Apply Retained Messages: Apply retained messages correctly to device status topics and demonstrate when they provide value over real-time publishing
Select and Justify QoS Choices: Select appropriate QoS levels for common IoT scenarios and justify each selection based on criticality, idempotency, and battery constraints
Key Concepts
QoS 0 (At Most Once): Fire-and-forget delivery with no acknowledgment — lowest overhead, message loss possible
QoS 1 (At Least Once): ACK-based delivery ensuring message arrives at least once — duplicate possible if PUBACK lost
QoS 2 (Exactly Once): Four-step handshake (PUBLISH, PUBREC, PUBREL, PUBCOMP) ensuring the message is processed exactly once; highest overhead, no loss and no duplicates
Message Ordering: MQTT guarantees ordered delivery within a client session but not across sessions or publishers
21.2 For Beginners: MQTT QoS Fundamentals
Quality of Service in MQTT determines how hard the system tries to deliver each message. At the lowest level (QoS 0), messages are sent once with no confirmation. At the highest level (QoS 2), a four-step handshake ensures the message arrives exactly once. Choosing the right level is a trade-off between reliability and speed.
Sensor Squad: Understanding Delivery Guarantees
“Why can’t everything just be QoS 2?” asked Sammy the Sensor. “Then nothing would ever get lost!”
Bella the Battery shook her head. “Because QoS 2 uses FOUR messages for every single reading: PUBLISH, PUBREC, PUBREL, PUBCOMP. If you send temperature every 10 seconds, that’s 4 times the radio usage. My battery would drain in weeks instead of months!”
Max the Microcontroller drew it out: “QoS 0 is one arrow: you send it and forget. QoS 1 is two arrows: you send, the broker sends back PUBACK. If you don’t get PUBACK, you resend. QoS 2 is four arrows with a two-phase commit. Each step up doubles the overhead but strengthens the guarantee.”
“The key insight,” said Lila the LED, “is matching QoS to the MESSAGE, not the device. Sammy can use QoS 0 for routine temperature and QoS 1 for critical alerts – in the same connection. You don’t have to pick one QoS for everything. Be smart about it and Bella stays happy!”
Common Pitfalls
1. Using QoS 2 for All IoT Telemetry
QoS 2’s four-message handshake sends four packets for every one that QoS 0 sends and needs two round trips before the exchange completes; on a 2G connection this can mean 2-8 seconds per message. Reserve QoS 2 for irreplaceable commands (actuator control, firmware triggers); use QoS 1 for telemetry where occasional duplicates are acceptable.
2. Expecting QoS to Guarantee Subscriber Delivery
MQTT QoS governs publisher-to-broker and broker-to-subscriber segments independently — QoS 1 from publisher to broker combined with QoS 0 subscription results in at-most-once overall. Always set both the publish QoS AND the subscribe QoS to the required level.
3. Not Handling Duplicate Messages at QoS 1
QoS 1 guarantees at-least-once delivery: when the sender (the publishing client, or the broker when it forwards to a subscriber) does not receive PUBACK, it retransmits the message with the DUP flag set. Applications storing sensor readings will create duplicate database entries unless they check an application-level message ID or implement idempotent writes.
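One way to make writes idempotent is to key each stored reading on an ID chosen by the publisher, so a redelivered copy is ignored. The sketch below is a minimal illustration using Python's built-in sqlite3 module; the table layout, the `store_reading` helper, and the ID format are assumptions made for this example, not part of MQTT.

```python
import sqlite3

def store_reading(conn, message_id, sensor, value):
    """Insert a reading once; a redelivered copy with the same ID is ignored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO readings (message_id, sensor, value) VALUES (?, ?, ?)",
        (message_id, sensor, value),
    )
    conn.commit()
    return cur.rowcount == 1  # True only when a new row was written

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (message_id TEXT PRIMARY KEY, sensor TEXT, value REAL)"
)

# The ID is assumed to be generated by the publisher and carried in the payload.
first = store_reading(conn, "garage-temp-000123", "garage-temp", 24.1)
duplicate = store_reading(conn, "garage-temp-000123", "garage-temp", 24.1)  # QoS 1 redelivery
row_count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(first, duplicate, row_count)
```

The MQTT packet identifier is reused across messages, which is why this sketch relies on an application-level ID instead.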
21.3 What is QoS (Quality of Service)?
Analogy: QoS is like different shipping methods when sending a package.
Three QoS levels = Three shipping methods:
QoS 0
Shipping analogy: Drop in mailbox
Real meaning: Fire and forget; the message may be lost
Typical use: Temperature readings every 10 seconds
QoS 1
Shipping analogy: Certified mail
Real meaning: At least once; duplicate delivery is possible
Typical use: Door sensor "opened" alert
QoS 2
Shipping analogy: Registered mail
Real meaning: Exactly once; no loss and no duplicates
Typical use: Smart lock "UNLOCK" command
21.4 QoS 0: Fire and Forget
Simple explanation:
Sensor publishes "Temperature: 24C" to the broker.
The broker may receive it, or it may be lost in transit.
The sensor does not wait for any reply; it just sends the next reading 10 seconds later.
Fastest (no waiting for acknowledgment)
Lowest battery usage (minimal radio time)
Might be lost (no delivery guarantee)
21.5 QoS 1: At Least Once
Simple explanation:
Sensor publishes "Door opened!".
Broker replies with PUBACK to confirm receipt.
If that PUBACK is lost, the sensor retries and the broker may see the same alert twice.
Guaranteed delivery (will retry until acknowledged)
Might get duplicates (if acknowledgment is lost)
Slower (waits for PUBACK)
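The retry behaviour can be sketched as a small simulation. This is a simplified model, not a real MQTT client: the function name `publish_qos1` and the scripted set of lost acknowledgments are assumptions for illustration, and only the PUBACK is ever lost here.

```python
def publish_qos1(payload, ack_lost_on_attempts, max_attempts=5):
    """Simulate a QoS 1 sender; returns (attempts, copies seen by the receiver)."""
    received = []
    for attempt in range(1, max_attempts + 1):
        dup = attempt > 1                 # retransmissions carry the DUP flag
        received.append((payload, dup))   # PUBLISH reaches the receiver
        if attempt in ack_lost_on_attempts:
            continue                      # PUBACK lost: sender times out and retries
        return attempt, received          # PUBACK arrived: done
    raise TimeoutError("no PUBACK after %d attempts" % max_attempts)

# The first PUBACK is lost, so the receiver sees the alert twice.
attempts, received = publish_qos1("Door opened!", ack_lost_on_attempts={1})
print(attempts, received)
```

The second copy arrives with the DUP flag set, which is the duplicate the receiving application has to tolerate.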
21.6 QoS 2: Exactly Once
Simple explanation:
Sensor publishes "UNLOCK door!".
Broker replies PUBREC to say it has received the message.
Sensor sends PUBREL to release the message for final processing.
Broker replies PUBCOMP to confirm the exchange is complete.
This four-step handshake ensures the command is processed exactly once.
Guaranteed exactly once (no duplicates, no losses)
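The four steps above can be sketched from the receiver's side. This toy class is an assumption-laden illustration (the class and method names are invented here): it holds the message when PUBLISH arrives and hands it to the application only when PUBREL arrives, so retransmitted packets do not cause a second delivery.

```python
class QoS2Receiver:
    """Toy receiver: holds a message on PUBLISH, releases it once on PUBREL."""

    def __init__(self):
        self.pending = {}     # packet_id -> payload awaiting PUBREL
        self.delivered = []   # payloads handed to the application

    def on_publish(self, packet_id, payload):
        # A retransmitted PUBLISH with a known packet_id is not stored twice.
        self.pending.setdefault(packet_id, payload)
        return "PUBREC"

    def on_pubrel(self, packet_id):
        # Deliver only if the message is still pending; a repeated PUBREL
        # (sent because PUBCOMP was lost) finds nothing and delivers nothing.
        if packet_id in self.pending:
            self.delivered.append(self.pending.pop(packet_id))
        return "PUBCOMP"

lock = QoS2Receiver()
replies = [
    lock.on_publish(7, "UNLOCK door!"),
    lock.on_publish(7, "UNLOCK door!"),  # resent: PUBREC was lost
    lock.on_pubrel(7),
    lock.on_pubrel(7),                   # resent: PUBCOMP was lost
]
print(replies, lock.delivered)
```

Despite two PUBLISH packets and two PUBREL packets, the unlock command reaches the application once.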
Figure 21.1: MQTT Quality of Service levels with increasing reliability guarantees
Minimum Viable Understanding: QoS Semantics
Core Concept: QoS 0 delivers at most once (may lose messages), QoS 1 delivers at least once (may duplicate), QoS 2 delivers exactly once (no loss, no duplicates). Each level adds handshake overhead for stronger guarantees.
Why It Matters: Choosing the wrong QoS wastes battery life (QoS 2 for temperature readings) or loses critical data (QoS 0 for door unlock commands). This single decision can determine whether your IoT deployment succeeds or fails.
Key Takeaway: Use QoS 0 for high-frequency telemetry where the next reading supersedes any lost one, QoS 1 for important alerts and state changes, and reserve QoS 2 only for critical commands where duplicates would cause harm (like financial transactions or actuator commands).
21.7 Real-World Examples
Scenario 1: Temperature Sensor (Use QoS 0)
10:00:00: Sensor publishes 24.0C at QoS 0.
10:00:10: Sensor publishes 24.1C at QoS 0.
10:00:20: One 24.1C reading is lost.
10:00:30: Sensor publishes 24.2C at QoS 0.
Dashboard still shows a useful trend: 24.0, 24.1, 24.2.
Scenario 2: Motion Sensor Alert (Use QoS 1)
Motion is detected in the garage.
Sensor publishes "Motion detected in garage!" at QoS 1.
A network glitch loses the PUBACK.
Sensor waits, retries, and the broker acknowledges the alert on the second attempt.
Result: the alert is delivered, even if it appears twice.
Scenario 3: Smart Lock Command (Use QoS 2)
User presses "Unlock" in the app.
App sends UNLOCK door #1 using QoS 2.
Broker forwards the same command to the lock with QoS 2.
The four-step handshake guarantees the lock acts on that command exactly once.
21.8 What are Sessions? (Clean vs Persistent)
Analogy: Think of sessions as hotel check-in preferences.
21.8.1 Clean Session (true)
Guest says, “I want a fresh room, no baggage from last visit.”
Hotel replies, “Sure. Starting fresh.”
When the guest leaves, the hotel throws everything away.
MQTT Translation:
Client connects with Clean Session = true
Broker forgets everything when client disconnects
No stored messages, no subscriptions remembered
Like checking into a hotel with no history
When to use: Temporary connections, testing, devices that don’t care about missed messages
21.8.2 Persistent Session (Clean Session = false)
Guest says, “I’m a regular. Keep my preferences.”
Hotel replies, “Welcome back. We saved your room setup, mail, and messages.”
While the guest is away, the hotel keeps everything safe.
When the guest returns, the hotel hands over everything that arrived during the absence.
MQTT Translation:
Client connects with Clean Session = false
Broker remembers subscriptions even after disconnect
Broker stores messages sent while client was offline (if QoS 1 or 2)
Client gets “catch-up” messages when reconnecting
When to use: Devices with intermittent connectivity, battery-powered sensors that sleep, critical subscribers that can’t miss messages
Figure 21.2: MQTT session lifecycle comparison showing Clean Session (temporary, discards everything on disconnect) vs Persistent Session (maintains subscriptions and queues messages for offline clients). Clean sessions are optimal for simple publishers, while persistent sessions are essential for devices with intermittent connectivity that cannot miss critical messages.
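The difference between the two session types can be sketched with a toy in-memory broker. Everything here (`ToyBroker`, its methods, the client IDs) is hypothetical and heavily simplified: delivery to online clients is omitted, and topics are matched exactly with no wildcards.

```python
class ToyBroker:
    """In-memory sketch of clean vs persistent session handling."""

    def __init__(self):
        self.sessions = {}   # client_id -> {"clean": bool, "subs": set, "queue": list}
        self.online = set()

    def connect(self, client_id, clean_session):
        """Return the messages queued while the client was offline."""
        session = self.sessions.get(client_id)
        if clean_session or session is None:
            session = {"subs": set(), "queue": []}   # start fresh
        session["clean"] = clean_session
        self.sessions[client_id] = session
        self.online.add(client_id)
        backlog, session["queue"] = session["queue"], []
        return backlog

    def subscribe(self, client_id, topic):
        self.sessions[client_id]["subs"].add(topic)

    def disconnect(self, client_id):
        self.online.discard(client_id)
        if self.sessions[client_id]["clean"]:
            del self.sessions[client_id]             # broker forgets everything

    def publish(self, topic, payload, qos):
        # Only the offline-queueing path is modelled in this sketch.
        for client_id, session in self.sessions.items():
            if topic in session["subs"] and client_id not in self.online and qos >= 1:
                session["queue"].append((topic, payload))

broker = ToyBroker()
for client_id, clean in (("display-clean", True), ("display-persistent", False)):
    broker.connect(client_id, clean_session=clean)
    broker.subscribe(client_id, "door/alerts")
    broker.disconnect(client_id)

broker.publish("door/alerts", "Door opened!", qos=1)   # both clients are offline
broker.publish("door/alerts", "heartbeat", qos=0)      # QoS 0 is not queued

clean_backlog = broker.connect("display-clean", clean_session=True)
persistent_backlog = broker.connect("display-persistent", clean_session=False)
print(clean_backlog, persistent_backlog)
```

Only the persistent client receives the alert on reconnect, and the broker paid for that by keeping its subscription and queue in memory while it was away.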
21.9 Retained Messages: Last-Known State
MQTT retained messages allow the broker to store the most recent message on a topic and deliver it immediately to new subscribers:
Figure 21.3: MQTT retained message workflow showing how brokers store the last published value and deliver it immediately to new subscribers. When a sensor publishes with retain=true, the broker stores that message and automatically sends it to any future subscribers, even if the sensor is offline. This is essential for status topics where clients need the current state immediately upon connection.
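The retained-message workflow can be sketched as a dictionary holding the last retained payload per topic. The `RetainingBroker` class is a hypothetical stand-in for a real broker, with exact-match topics only; the battery topic is an invented example.

```python
class RetainingBroker:
    """Toy broker that keeps the last retained payload for each topic."""

    def __init__(self):
        self.retained = {}   # topic -> last payload published with retain=True

    def publish(self, topic, payload, retain=False):
        if not retain:
            return                            # non-retained messages are not stored
        if payload == "":
            self.retained.pop(topic, None)    # empty retained payload clears the topic
        else:
            self.retained[topic] = payload

    def subscribe(self, topic):
        """Return the retained payload a new subscriber would receive, if any."""
        return self.retained.get(topic)

broker = RetainingBroker()
broker.publish("door/sensor1/status", "OK", retain=True)
broker.publish("door/sensor1/battery", "87%")           # hypothetical topic, not retained

# The sensor is now asleep; a dashboard subscribes later.
late_status = broker.subscribe("door/sensor1/status")
late_battery = broker.subscribe("door/sensor1/battery")
print(late_status, late_battery)
```

The late subscriber gets the status immediately but nothing for the non-retained topic until the sensor publishes again.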
Tradeoff: Persistent Session vs Clean Session
Decision context: When configuring an MQTT client, you must choose whether the broker should remember client state between connections.
Persistent Session
Memory usage: Higher because the broker stores state
Reconnect speed: Faster because subscriptions are restored
Offline messages: Queued for QoS 1 and QoS 2
Client ID: Must be unique and stable
Broker scalability: Lower because state is stored per client
Battery impact: Higher because catch-up data may be delivered on reconnect
Clean Session
Memory usage: Lower because no state is stored
Reconnect speed: Slower because the client must resubscribe
Offline messages: Lost while the client is disconnected
Client ID: Can be random or ephemeral
Broker scalability: Higher because the broker stays stateless
Battery impact: Lower because the client starts fresh each time
Choose Persistent Session when:
Device has intermittent connectivity (sleep cycles, mobile networks)
Cannot afford to miss messages during disconnection
Subscribes to topics and needs subscriptions restored automatically
Choose Clean Session when:
Device only publishes (no subscriptions to maintain)
Frequent sensor readings where missing a few is acceptable
Testing or development environments
Battery-powered devices that prioritize power savings over completeness
High-scale deployments where broker memory is constrained
Default recommendation: Clean Session unless your device subscribes to topics and cannot miss messages during offline periods.
21.10 Quick Self-Check
Q: You have a battery-powered door sensor that:
Sleeps for 10 minutes to save battery
Wakes up, connects to broker, publishes “status: OK”
Disconnects and goes back to sleep
Meanwhile, someone subscribes to see the door sensor status while the sensor is asleep.
Should you use:
A) QoS 0, Clean Session = true
B) QoS 1, Clean Session = true
C) QoS 0, Clean Session = false
D) QoS 1, Clean Session = false
Answer: B) QoS 1, Clean Session = true
Analysis:
QoS Level:
QoS 1 is correct
“status: OK” is important (want delivery confirmation)
If message is lost, subscriber doesn’t know sensor is alive
QoS 1 ensures broker receives it
Possible duplicates are OK (idempotent message)
QoS 0 is risky
Message might be lost
Subscriber never knows if sensor is working
QoS 2 is overkill
“status: OK” duplicates are harmless
Wastes battery with 4-way handshake
Clean Session:
Clean Session = true is correct
Sensor doesn’t subscribe to anything (only publishes)
Sensor doesn’t care about messages from when it was offline
Saves broker memory (doesn’t store session state)
Clean Session = false is unnecessary
Sensor isn’t subscribing, so no benefit from saved subscriptions
Broker wastes memory storing empty session
Power consumption comparison:
QoS 0, Clean=true: 10 mA x 100 ms = 0.28 uAh per wake cycle
QoS 1, Clean=true: 10 mA x 150 ms = 0.42 uAh per wake cycle (the best balance here)
QoS 1, Clean=false: 10 mA x 200 ms = 0.56 uAh per wake cycle
QoS 2, Clean=false: 10 mA x 400 ms = 1.11 uAh per wake cycle
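The per-cycle figures follow from charge = current x time. The short sketch below reproduces them from the 10 mA current and the active times listed above; the function name is chosen for this example, and the daily estimate assumes roughly 144 wake cycles per day (one every 10 minutes).

```python
def charge_per_wake_uah(current_ma, active_ms):
    # mA x ms gives microamp-seconds; dividing by 3600 s/h gives microamp-hours.
    return current_ma * active_ms / 3600

active_time_ms = {
    "QoS 0, Clean=true": 100,
    "QoS 1, Clean=true": 150,
    "QoS 1, Clean=false": 200,
    "QoS 2, Clean=false": 400,
}

charge = {name: round(charge_per_wake_uah(10, ms), 2)
          for name, ms in active_time_ms.items()}

# Rough daily cost of the radio-active time only (sleep current not included).
wakes_per_day = 24 * 60 // 10
daily_qos1_uah = wakes_per_day * charge_per_wake_uah(10, 150)

for name, uah in charge.items():
    print(f"{name}: {uah} uAh per wake cycle")
print(f"QoS 1, Clean=true: about {daily_qos1_uah:.0f} uAh per day")
```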
What about the subscriber getting the status?
Separate concern: Use Retained Messages!
Publish the current status as a retained message with these settings:
topic: door/sensor1/status
payload: OK
QoS: 1
retain flag: true
When a subscriber connects later, the broker immediately sends the retained "status: OK" value even if the sensor is asleep.
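In code, the retained status publish might look like the sketch below. It assumes a client object whose `publish` method accepts `topic`, `payload`, `qos`, and `retain` arguments, as the Eclipse Paho Python client does; `RecordingClient` is a hypothetical stand-in so the example runs without a broker, and `publish_door_status` is a helper named for this illustration.

```python
class RecordingClient:
    """Hypothetical stand-in for a real MQTT client; it only records calls."""

    def __init__(self):
        self.calls = []

    def publish(self, topic, payload=None, qos=0, retain=False):
        self.calls.append((topic, payload, qos, retain))

def publish_door_status(client, status="OK"):
    # QoS 1 so the broker confirms receipt; retain=True so late subscribers
    # get the last status even while the sensor is asleep.
    client.publish("door/sensor1/status", payload=status, qos=1, retain=True)

client = RecordingClient()
publish_door_status(client)
print(client.calls)
```

With a real client, the same helper would be called after connecting and before the sensor disconnects and goes back to sleep.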
QoS 0: 1x overhead (shortest active time per cycle)
QoS 1: ~1.5x overhead (one extra round-trip for PUBACK)
QoS 2: ~4x overhead (four-step handshake per message)
Pro tip: For battery-powered sensors, use QoS 1 + Retained + Clean Session = true. This ensures reliable delivery without the overhead of persistent sessions.
21.11 Real-World QoS Decision Framework
Selecting the right QoS level requires analyzing three dimensions of your data: criticality, frequency, and idempotency.
Worked Example: QoS Selection for a Smart Hospital
Scenario: A hospital deploys 2,000 IoT devices of four types. Each type has different reliability requirements and message patterns.
Device inventory and QoS assignment:
Room temperature - Count: 800 devices - Frequency: every 60 seconds - Critical: no - Idempotent: yes - Recommended QoS: 0 - Rationale: the next reading replaces any lost value
Patient heart rate - Count: 600 devices - Frequency: every 5 seconds - Critical: yes - Idempotent: yes - Recommended QoS: 1 - Rationale: the data must arrive, but a duplicate heart-rate sample is harmless
IV pump dosage - Count: 400 devices - Frequency: on command - Critical: yes - Idempotent: no - Recommended QoS: 2 - Rationale: duplicate "administer 50mg" commands could be dangerous
Nurse call button - Count: 200 devices - Frequency: on event - Critical: yes - Idempotent: yes - Recommended QoS: 1 - Rationale: alerts must arrive, but duplicate alerts are still safe to handle
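The assignments above follow a simple rule on two of the dimensions: non-critical data gets QoS 0, critical and idempotent data gets QoS 1, and critical non-idempotent data gets QoS 2. A minimal sketch of that rule, with `recommend_qos` as an illustrative name and frequency left out for simplicity:

```python
def recommend_qos(critical, idempotent):
    """Map criticality and idempotency to a QoS level, as in the inventory above."""
    if not critical:
        return 0                      # the next reading replaces a lost one
    return 1 if idempotent else 2     # duplicates are harmless only if idempotent

# (critical, idempotent) flags taken from the device inventory
devices = {
    "Room temperature": (False, True),
    "Patient heart rate": (True, True),
    "IV pump dosage": (True, False),
    "Nurse call button": (True, True),
}

recommended = {name: recommend_qos(*flags) for name, flags in devices.items()}
for name, qos in recommended.items():
    print(f"{name}: QoS {qos}")
```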
Battery and bandwidth impact calculation:
QoS 0 messages: 800 devices x 1/min = 800 msg/min, 1 packet each = 800 packets/min
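QoS 1 messages: 800 devices x 12/min = 9,600 msg/min, 2 packets each = 19,200 packets/min (an upper bound: it counts the 200 event-driven nurse call buttons at the heart-rate frequency of 12/min)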
QoS 2 messages: 400 devices x ~2/hour = 800 msg/hour, 4 packets each = 53 packets/min
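The packet rates above can be reproduced with a few lines of arithmetic. The sketch uses the device counts and message rates exactly as listed, with 1, 2, and 4 packets per message for QoS 0, 1, and 2; the function and variable names are chosen for this example.

```python
PACKETS_PER_MESSAGE = {0: 1, 1: 2, 2: 4}

def packets_per_minute(devices, messages_per_minute, qos):
    return devices * messages_per_minute * PACKETS_PER_MESSAGE[qos]

load = {
    "QoS 0": packets_per_minute(800, 1, 0),
    "QoS 1": packets_per_minute(800, 12, 1),
    "QoS 2": packets_per_minute(400, 2 / 60, 2),   # ~2 commands per hour
}
total = sum(load.values())

for level, rate in load.items():
    print(f"{level}: {rate:,.0f} packets/min ({rate / total:.1%} of traffic)")
```

Under these figures the QoS 1 traffic dominates the broker load, while QoS 2 commands are a small fraction despite their four-packet handshake.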
See how network reliability affects actual message delivery for each QoS level. QoS 0 delivers at most once; QoS 1 retries to improve success; QoS 2 adds a full handshake.
Delivery Probability by QoS Level (interactive calculator)
QoS 0 (single try): 1 packet sent
QoS 1 (with retries): several attempts, up to the configured retry limit
QoS 2 (4-step, single attempt): 4 packets must all succeed
QoS 2 (with retries): the handshake is retried up to the same limit
Insight: At a given per-packet reliability, QoS 0 loses every message whose single packet is dropped. QoS 1 retries reduce that loss. QoS 2 single-attempt success is lower than QoS 0 because 4 packets must all succeed, but retries bring it back up.
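A rough model of these probabilities is sketched below. The formulas are assumptions inferred from the labels in the calculator, not its actual definitions: packets are treated as independent with success probability `p`, QoS 1 succeeds if any PUBLISH attempt arrives, and a QoS 2 handshake succeeds only if all four packets arrive.

```python
def delivery_probabilities(p, max_retries):
    """Assumed model: independent packets, whole exchange retried on failure."""
    attempts = max_retries + 1
    qos2_single = p ** 4                          # all four handshake packets arrive
    return {
        "qos0": p,                                # one packet, one chance
        "qos1": 1 - (1 - p) ** attempts,          # at least one attempt arrives
        "qos2_single": qos2_single,
        "qos2_retried": 1 - (1 - qos2_single) ** attempts,
    }

probs = delivery_probabilities(p=0.90, max_retries=3)
for level, value in probs.items():
    print(f"{level}: {value * 100:.2f}%")
```

Real clients retransmit individual packets within a handshake instead of restarting it, so actual QoS 2 delivery would likely be better than this whole-handshake retry model suggests.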
21.12.4 Session Type Decision Tool
Answer three questions about your device to determine whether to use a clean or persistent MQTT session.
// Three inputs describing the device
viewof sessSubscribes = Inputs.select(["yes", "no"], {value: "no", label: "Does the device subscribe to topics?"})
viewof sessMissOk = Inputs.select(["yes", "no"], {value: "yes", label: "Can it miss messages while offline?"})
viewof sessConnType = Inputs.select(["always_on", "intermittent", "deep_sleep"], {value: "intermittent", label: "Connection pattern"})
// A persistent session is recommended only for subscribers that either
// cannot miss messages or spend most of their time in deep sleep.
sessNeedPersist = (sessSubscribes === "yes" && sessMissOk === "no") || (sessSubscribes === "yes" && sessConnType === "deep_sleep")
sessRecommendation = sessNeedPersist ? "Persistent Session (clean_session=false)" : "Clean Session (clean_session=true)"
sessRecommendationTitle = sessNeedPersist ? "Persistent Session" : "Clean Session"
sessRecommendationCode = sessNeedPersist ? "clean_session=false" : "clean_session=true"
sessColor = sessNeedPersist ? "#E67E22" : "#16A085"
sessExplanation = sessNeedPersist
  ? "Your device subscribes to topics and cannot afford to miss messages. A persistent session lets the broker queue QoS 1/2 messages while the device is offline and restore subscriptions on reconnect."
  : "Your device either only publishes, or can tolerate missed messages. A clean session avoids broker-side state overhead and keeps connections lightweight."
MQTT QoS Levels: Technical deep dive into QoS handshakes and message flow. Read this next to understand the byte-level protocol mechanics and timing of each QoS level.
MQTT Session Management: Persistent sessions, message queuing, and reconnection strategies. Read this next to learn how brokers track client state and deliver offline messages reliably.
MQTT Implementation: Coding QoS in Python and on ESP32 microcontrollers. Read this next to translate QoS theory into working publisher/subscriber code.
MQTT Fundamentals: Core pub/sub model, topics, and broker architecture. Read this next to revisit the foundational concepts that QoS levels build upon.
CoAP Features and Labs: CoAP confirmable messages as an alternative to MQTT QoS. Read this next to compare MQTT’s QoS model with CoAP’s reliability mechanism for constrained devices.