Selecting the right MQTT QoS level requires analyzing each message type’s criticality, replacement frequency, and power budget. This chapter walks through real-world worked examples including fleet tracking (GPS at QoS 0, delivery confirmations at QoS 2), calculates the battery and bandwidth impact of each choice, and demonstrates adaptive QoS patterns that adjust reliability dynamically based on network conditions.
Apply QoS Selection Frameworks: Apply systematic decision criteria to assign QoS levels to real-world message types based on criticality and duplicate tolerance
Calculate Resource Impact: Calculate battery consumption, bandwidth overhead, and broker data costs for each QoS level across fleet-scale deployments
Design Session Strategies: Design and configure persistent MQTT sessions for sleeping and intermittently connected IoT devices
Implement Adaptive QoS Patterns: Implement store-and-forward and adaptive QoS strategies that respond to network conditions and device duty cycles
Evaluate Trade-offs: Evaluate and justify QoS selections by comparing reliability guarantees against energy and bandwidth costs with worked numerical examples
Distinguish QoS Semantics: Distinguish between publisher-to-broker and broker-to-subscriber QoS segments, and diagnose mismatches that silently downgrade delivery guarantees
Key Concepts
QoS 0 (At Most Once): Fire-and-forget delivery with no acknowledgment — lowest overhead, message loss possible
QoS 1 (At Least Once): ACK-based delivery ensuring message arrives at least once — duplicate possible if PUBACK lost
Message Ordering: MQTT guarantees ordered delivery within a client session but not across sessions or publishers
25.2 For Beginners: MQTT QoS Examples
Worked examples show MQTT QoS in action with realistic scenarios – a temperature sensor that can afford to miss occasional readings (QoS 0), a door lock that must confirm every command (QoS 1), and a billing system that cannot process the same transaction twice (QoS 2). Seeing the concepts in context makes them click.
Sensor Squad: Real QoS Scenarios
“Let me walk through three real scenarios,” said Max the Microcontroller. “Scenario 1: Sammy reports outdoor temperature every 5 minutes. If one reading gets lost, the next one comes 5 minutes later. Use QoS 0 – no acknowledgments needed, minimum battery drain.”
“Scenario 2,” said Sammy the Sensor. “A smart door lock receives ‘LOCK’ and ‘UNLOCK’ commands. Missing a command means the door stays in the wrong state. Use QoS 1 – the broker retries until the lock acknowledges. Getting ‘LOCK’ twice on an already-locked door? Harmless. Missing ‘LOCK’ entirely? Dangerous.”
Lila the LED presented the toughest case: “Scenario 3: an electric meter reports energy usage for billing. If the message arrives twice, the customer gets charged double. If it doesn’t arrive, the utility loses revenue. This demands QoS 2 – the four-step handshake guarantees exactly one delivery. Yes, it’s slower. But when money is on the line, you need that guarantee.”
Bella the Battery summarized: “Weather data = QoS 0. Safety commands = QoS 1. Financial transactions = QoS 2. The worked examples in this chapter show you exactly what happens at each step, with real packet diagrams!”
Scenario: You are designing the messaging system for a logistics company tracking 500 delivery trucks. Each truck sends various message types: GPS location updates, delivery confirmations, driver alerts, and fuel level readings. The system uses cellular connectivity, which has variable reliability (3G/4G coverage varies by location).
Goal: Select the optimal QoS level for each message type, balancing reliability, battery consumption, and data costs.
Takeaway: Using QoS 0 for GPS and QoS 2 only for critical confirmations saves $225/month ($2,700/year) and doubles battery life vs all-QoS-2.
What we do: Match each message type to the appropriate QoS level using the decision criteria.
Why: The framework ensures consistent, justified decisions rather than guessing.
Decision questions:
Can we afford to lose this message? (Yes -> QoS 0, No -> Continue)
Would duplicates cause problems? (No -> QoS 1, Yes -> QoS 2)
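The two decision questions can be written as a small function. This is a minimal sketch: the name `select_qos` and its boolean parameters are illustrative and not part of any MQTT library.

```python
def select_qos(loss_acceptable: bool, duplicates_harmful: bool) -> int:
    """Map the two decision questions to an MQTT QoS level (0, 1, or 2)."""
    if loss_acceptable:
        return 0  # Question 1: losing this message is acceptable
    if duplicates_harmful:
        return 2  # Question 2: duplicates would cause problems
    return 1      # Must arrive, but duplicates are tolerable


# Fleet message types from the scenario, answered per the two questions
print(select_qos(loss_acceptable=True, duplicates_harmful=False))   # GPS location
print(select_qos(loss_acceptable=False, duplicates_harmful=True))   # Delivery confirmed
print(select_qos(loss_acceptable=False, duplicates_harmful=False))  # Panic alert
```

Keeping the rule in one function makes each assignment auditable: every message type carries two explicit answers rather than a bare QoS number.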
Analysis per message type:
What we do: Quantify the battery, bandwidth, and cost impact of our QoS choices.
Why: Understanding the trade-offs helps justify decisions and optimize where possible.
Message overhead calculation:
| Message Type | Count/Day | QoS | Packets/Msg | Total Msgs/Day |
|---|---|---|---|---|
| GPS location | 2,880 | 0 | 1 | 2,880 |
| Delivery confirmed | 50 | 2 | 4 | 200 |
| Panic alert | ~0.1 | 1 | 2 | ~0.2 |
| Fuel level | 144 | 0 | 1 | 144 |
| Speed violation | ~10 | 1 | 2 | 20 |
| Total | 3,084 | - | - | 3,244 |
Overhead analysis:
If everything were QoS 0: 3,084 messages/day
With our QoS choices: 3,244 messages/day (+5.2% overhead)
If everything were QoS 2: 12,336 messages/day (+300% overhead)
Cellular data impact (assuming 100 bytes/message):
- Our design: 3,244 x 100 bytes ≈ 324 KB/day per truck
- 500 trucks x 324 KB x 30 days ≈ 4.9 GB/month
- All QoS 2: 18.5 GB/month (about 3.8x higher data cost)
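The overhead arithmetic can be reproduced with a short script. This is a sketch under the text's assumptions (a flat 100 bytes per packet, 500 trucks, 30-day months); the function names are illustrative, and it uses unrounded counts and decimal gigabytes, so results can differ slightly from hand-rounded figures.

```python
# Per-truck daily message counts and QoS levels, copied from the table above
MESSAGE_TYPES = {
    "GPS location": (2880, 0),
    "Delivery confirmed": (50, 2),
    "Panic alert": (0.1, 1),
    "Fuel level": (144, 0),
    "Speed violation": (10, 1),
}
PACKETS_PER_MESSAGE = {0: 1, 1: 2, 2: 4}  # PUBLISH plus its acknowledgment packets
BYTES_PER_PACKET = 100  # flat size assumed in the text
TRUCKS = 500
DAYS_PER_MONTH = 30


def packets_per_day(force_qos=None):
    """Daily packets per truck; force_qos overrides every message type's QoS."""
    total = 0.0
    for count, qos in MESSAGE_TYPES.values():
        level = qos if force_qos is None else force_qos
        total += count * PACKETS_PER_MESSAGE[level]
    return total


def fleet_gb_per_month(daily_packets):
    """Fleet-wide monthly volume in decimal gigabytes (1 GB = 10**9 bytes)."""
    return daily_packets * BYTES_PER_PACKET * TRUCKS * DAYS_PER_MONTH / 1e9


for label, qos in [("mixed", None), ("all QoS 0", 0), ("all QoS 2", 2)]:
    daily = packets_per_day(qos)
    print(f"{label}: {daily:,.0f} packets/day, {fleet_gb_per_month(daily):.1f} GB/month")
```

Because the message mix lives in one dictionary, changing a single QoS assignment shows its fleet-wide data cost immediately.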
What we do: Consider scenarios that might require QoS adjustments.
Why: Real-world conditions may differ from typical operation.
Edge case: Poor cellular coverage
When a truck enters an area with weak signal:

    # Adaptive QoS sketch: promote GPS updates when the link looks unreliable
    QOS_0, QOS_1 = 0, 1

    def get_gps_qos(signal_strength, time_since_last_update):
        """signal_strength in dBm, time_since_last_update in seconds."""
        if time_since_last_update > 300:   # 5 min since last successful update
            return QOS_1                   # Promote GPS to QoS 1 (need confirmation)
        elif signal_strength < -110:       # Very weak signal
            return QOS_1                   # Upgrade to ensure delivery
        else:
            return QOS_0                   # Normal operation
Edge case: Delivery confirmation retry
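If the QoS 2 handshake fails after 3 attempts:

    # Store-and-forward pattern (assumes a paho-mqtt style client, version 1.6 or later;
    # local_buffer is an application-provided store with a save() method)
    import time

    def confirm_delivery(delivery_id, confirmation_data, mqtt_client, local_buffer):
        topic = f"fleet/truck123/delivery/{delivery_id}/confirmed"
        for attempt in range(3):
            try:
                result = mqtt_client.publish(topic, payload=confirmation_data, qos=2)
                result.wait_for_publish(timeout=10)  # Wait up to 10 s for the handshake
                if result.is_published():
                    return True
            except (ValueError, RuntimeError):
                pass  # Queue full or client not connected: count as a failed attempt
            time.sleep(5 * (2 ** attempt))  # Exponential backoff: 5 s, 10 s, 20 s
        # Store locally for later sync
        local_buffer.save(delivery_id, confirmation_data)
        return False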
Outcome: Optimized QoS configuration balancing reliability and efficiency.
Final QoS assignments:
| Message Type | QoS | Rationale |
|---|---|---|
| GPS location | 0 | High frequency, loss acceptable, next update imminent |
| Delivery confirmed | 2 | Business-critical, duplicates cause billing issues |
| Speed violation | 1 | Compliance record needed, duplicates minor annoyance |
Key design decisions:
95% of messages use QoS 0 - GPS and fuel readings dominate volume
QoS 2 only where duplicates cause harm - Delivery confirmation (billing)
QoS 1 for safety events - Reliability without QoS 2 latency overhead
Adaptive QoS in poor coverage - Promote GPS to QoS 1 when stale
Local buffering - Store failed deliveries for eventual consistency
Cost-benefit summary:
Battery: 5.2% more messages than all-QoS-0 (acceptable for reliability)
Data cost: about 4.9 GB/month vs 18.5 GB if all-QoS-2 (roughly 74% savings)
Reliability: Critical messages guaranteed, non-critical efficiently sent
25.5 Worked Example: Smart Door Lock QoS Selection
Smart Door Lock Scenario
Scenario: A commercial building deploys 50 smart door locks that must respond to unlock commands from a mobile app. The security team needs to ensure commands are executed reliably while minimizing battery drain on battery-backup locks.
Given:
50 door locks, each with 4x AA battery backup (2,400 mAh total at 6V)
Unlock commands: ~200 per lock per day (employees entering/exiting)
Status updates: locks publish state every 30 seconds
Network: Enterprise Wi-Fi with 2% packet loss
Critical requirement: No duplicate unlock commands (security audit trail)
Battery backup duration: 51,840 J / 8.82 J/day = 5,877 days = 16 years
Result: Use QoS 2 for unlock commands (zero duplicates, audit compliance) and QoS 0 for status updates (high frequency, duplicates harmless). Daily energy cost is 8.82 J, providing 16 years of battery backup. The additional 0.96 J/day for QoS 2 commands (vs QoS 1) is negligible compared to status message energy.
Key Insight: QoS 2’s 2x energy overhead (9.6 mJ vs 4.8 mJ per command) seems expensive, but commands represent only 22% of daily energy (1.92 J / 8.82 J). Status updates dominate at 78%. For security-critical commands where duplicates cause audit violations, QoS 2’s guarantee is worth the modest energy cost. Never use QoS 1 for commands where duplicates have real-world consequences.
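The energy figures quoted in this example can be cross-checked in a few lines. This sketch takes the stated 8.82 J/day total and the per-command energies as inputs and treats everything that is not command energy as status-update energy; that split, and the variable names, are assumptions made for illustration.

```python
BATTERY_MAH = 2400
BATTERY_VOLTS = 6.0
COMMANDS_PER_DAY = 200
ENERGY_MJ = {1: 4.8, 2: 9.6}  # energy per unlock command by QoS, from the text
TOTAL_DAILY_J = 8.82          # stated daily total with QoS 2 commands

battery_j = BATTERY_MAH / 1000 * BATTERY_VOLTS * 3600  # Ah x V x 3600 s/h = joules
command_j = {qos: COMMANDS_PER_DAY * mj / 1000 for qos, mj in ENERGY_MJ.items()}
status_j = TOTAL_DAILY_J - command_j[2]  # remainder attributed to status updates


def backup_days(command_qos):
    """Days of battery backup when unlock commands use the given QoS level."""
    return battery_j / (status_j + command_j[command_qos])


print(f"Command share of daily energy: {command_j[2] / TOTAL_DAILY_J:.0%}")
print(f"Extra energy for QoS 2 vs QoS 1: {command_j[2] - command_j[1]:.2f} J/day")
print(f"Backup with QoS 2 commands: {backup_days(2) / 365:.1f} years")
print(f"Backup with QoS 1 commands: {backup_days(1) / 365:.1f} years")
```

Comparing the two backup durations makes the trade-off concrete: moving commands from QoS 1 to QoS 2 shortens backup time only modestly because status updates dominate the budget.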
25.6 Worked Example: Persistent Session Sizing
Fleet Session Sizing Scenario
Scenario: A logistics company manages 1,000 delivery trucks, each with an MQTT client that sleeps during overnight hours. The broker must queue messages for offline trucks and deliver them when trucks reconnect in the morning.
Given:
1,000 trucks, each offline 10 hours/night (10 PM to 8 AM)
During offline period, central system sends:
Route updates: 1 per truck, 2 KB payload
Delivery manifests: 5 per truck, 500 bytes each
System alerts: 20 total (broadcast), 100 bytes each
Disk spillover threshold: 100 MB (current 16.6 MB is 16% of threshold)
Result: Persistent sessions for 1,000 trucks require only 16.6 MB broker memory, handling 26,000 queued messages during 10-hour offline periods. The morning reconnection storm (50/minute) and the resulting message delivery (22 msg/sec) each use less than 1% of broker capacity. Set message expiry to 20 hours and disk spillover at 100 MB.
Key Insight: Persistent sessions seem expensive but scale efficiently - 1,000 trucks need only 16.6 MB total. The real cost is reconnection storms: when trucks reconnect simultaneously, message delivery spikes. Stagger reconnection times (e.g., random 0-10 minute delay) to spread load. Without persistent sessions, trucks would miss route updates and require manual re-sync, costing far more in operational overhead than the 16.6 MB memory investment.
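A rough sizing sketch for this scenario follows. The message counts and payload sizes come from the Given list; the per-message metadata (200 bytes) and per-session overhead (5 KB) are assumptions borrowed from the default inputs of the broker sizing calculator later in this chapter, so the memory estimate lands near, not exactly on, the 16.6 MB figure.

```python
TRUCKS = 1000
# (messages queued per truck, payload bytes each) during the offline window
QUEUED = {
    "route update": (1, 2048),
    "delivery manifest": (5, 500),
    "system alert": (20, 100),  # each broadcast alert is queued once per truck
}
METADATA_BYTES_PER_MSG = 200       # assumption: broker bookkeeping per message
SESSION_OVERHEAD_BYTES = 5 * 1024  # assumption: subscriptions and session state
RECONNECTS_PER_MINUTE = 50

msgs_per_truck = sum(count for count, _ in QUEUED.values())
payload_per_truck = sum(count * size for count, size in QUEUED.values())
bytes_per_truck = (payload_per_truck
                   + msgs_per_truck * METADATA_BYTES_PER_MSG
                   + SESSION_OVERHEAD_BYTES)

total_msgs = TRUCKS * msgs_per_truck
total_mb = TRUCKS * bytes_per_truck / 1e6
delivery_rate = RECONNECTS_PER_MINUTE * msgs_per_truck / 60  # msgs/sec at reconnect

print(f"Queued messages: {total_msgs:,}")
print(f"Estimated broker memory: {total_mb:.1f} MB")
print(f"Delivery rate during reconnection: {delivery_rate:.0f} msg/sec")
```

The estimate is dominated by message count times per-message cost, so broadcast alerts (20 per truck) matter more to memory than the single large route update.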
25.7 Worked Example: Medical Device Telemetry
Medical Telemetry Scenario
Scenario: A hospital deploys 100 patient monitors that transmit vital signs (heart rate, blood oxygen, blood pressure) to a central monitoring station. Nurses need real-time alerts for critical values, and all readings must be logged for medical records. You must select appropriate QoS levels balancing reliability with device battery life.
Regulatory requirement: all vital signs must be logged (no data loss for records)
Network: Hospital Wi-Fi with 0.5% packet loss
Device radio power: 120 mA transmit, 15 mA idle
Message transmission time: 8 ms for QoS 0, 25 ms for QoS 1, 50 ms for QoS 2
Steps:
Categorize message types by criticality:
Routine vitals (99% of messages): Regular readings within normal range
Critical alerts (<1% of messages): Out-of-range values requiring immediate attention
Acknowledgment requests (rare): Nurse confirms alert received
Analyze delivery requirements per message type:
Calculate energy consumption per QoS level:
QoS 0 energy: 120 mA x 8 ms = 0.96 mAs per message
QoS 1 energy: 120 mA x 25 ms = 3.0 mAs per message
QoS 2 energy: 120 mA x 50 ms = 6.0 mAs per message
Calculate daily energy with mixed QoS strategy:
Routine vitals (QoS 0): 3 msg/sec x 86,400 sec x 0.96 mAs = 248,832 mAs = 69.1 mAh
Critical alerts (QoS 1): 10 alerts/day x 3.0 mAs = 30 mAs = 0.008 mAh
Acknowledgments (QoS 2): 10/day x 6.0 mAs = 60 mAs = 0.017 mAh
Total daily: 69.1 mAh for MQTT transmission
Battery life (1,500 mAh, 50% for MQTT): 750 / 69.1 = 10.9 days
Compare with all-QoS-1 approach (regulatory conservative):
All messages QoS 1: 259,200 msg/day x 3.0 mAs = 777,600 mAs = 216 mAh
Battery life: 750 / 216 = 3.5 days (3x more frequent charging)
Address logging requirement (no data loss):
Use persistent session (Clean Session = false) for historian subscriber
Broker queues missed messages during historian restarts
Historian publishes acknowledgment after writing to database
If historian offline > 1 hour, alert IT staff (not QoS problem)
Result: Use QoS 0 for routine vital signs (99% of traffic), QoS 1 for critical alerts, and QoS 2 for nurse acknowledgments. This mixed strategy provides 10.9 days battery life versus 3.5 days with all-QoS-1. Regulatory logging is ensured by persistent sessions on the historian subscriber, not by upgrading publisher QoS.
Key Insight: QoS selection should be per-message-type, not per-device. Critical alerts representing <1% of traffic can use QoS 1 without significantly impacting battery life. The logging requirement (no data loss) is solved at the subscriber side with persistent sessions, not by forcing publishers to use higher QoS. Never use QoS 2 for high-frequency telemetry - the 6x energy cost destroys battery life. Reserve QoS 2 for rare, non-idempotent operations like acknowledgments where duplicate confusion matters.
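The battery comparison in the steps above can be recomputed directly. This sketch uses the transmit current and per-QoS transmission times given in the scenario; the function names are illustrative, and because it does not round intermediate values the printed battery life can differ from the hand calculation by about a tenth of a day.

```python
RADIO_MA = 120                    # transmit current from the scenario
TX_MS = {0: 8, 1: 25, 2: 50}      # transmission time per message by QoS
BATTERY_BUDGET_MAH = 1500 * 0.5   # half of the 1,500 mAh battery reserved for MQTT


def charge_mas(qos):
    """Charge drawn per message in mA-seconds."""
    return RADIO_MA * TX_MS[qos] / 1000


def daily_mah(msgs_per_day_by_qos):
    """Total daily charge in mAh for a {qos: messages_per_day} mapping."""
    total_mas = sum(n * charge_mas(qos) for qos, n in msgs_per_day_by_qos.items())
    return total_mas / 3600


routine_per_day = 3 * 86_400  # 3 msg/sec of routine vitals
mixed = daily_mah({0: routine_per_day, 1: 10, 2: 10})
all_qos1 = daily_mah({1: routine_per_day})

print(f"Mixed QoS: {mixed:.1f} mAh/day, {BATTERY_BUDGET_MAH / mixed:.1f} days")
print(f"All QoS 1: {all_qos1:.1f} mAh/day, {BATTERY_BUDGET_MAH / all_qos1:.1f} days")
```

The alert and acknowledgment terms contribute well under 0.1 mAh per day, which is why their QoS level can be chosen on reliability grounds alone.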
25.8 Worked Example: Sleep-Wake Sensor Sessions
Sleep-Wake Sensor Scenario
Scenario: An agricultural monitoring system uses 200 soil moisture sensors that sleep for 55 minutes, wake for 5 minutes to transmit data and receive commands, then return to sleep. Sensors must receive any pending irrigation commands issued while they were asleep.
Given:
200 sensors with solar-powered batteries
Sleep/wake cycle: 55 min sleep, 5 min active (5/60 duty cycle = 8.3%)
During active period: publish 3 readings, check for commands
Commands issued centrally: ~50 per day total across all sensors
Command types: calibrate (safe to duplicate), irrigate (must not duplicate)
Broker: Mosquitto with persistent message storage enabled
Maximum acceptable command delay: 60 minutes (one full cycle)
Steps:
Configure session persistence for each sensor:
    // ESP32 sensor MQTT configuration
    const char* clientId = "sensor-042";  // Stable ID (from chip MAC)
    bool cleanSession = false;            // Persistent session - broker saves subscriptions

    // Subscribe to command topic on every wake
    client.subscribe("farm/zone-b/sensor-042/command", 1);  // QoS 1
Central system issues command at 10:15 AM:
"farm/zone-b/sensor-042/command" = "{"action":"irrigate","duration":300}"
Sensor-042 timeline:
10:00 AM - Goes to sleep
10:15 AM - Command arrives, broker queues it (sensor offline)
10:55 AM - Sensor wakes, connects with same clientId
10:55 AM - Broker detects persistent session, sends queued command
10:55 AM - Sensor receives command, starts irrigation
10:56 AM - Sensor publishes confirmation, goes back to sleep
Latency: 40 minutes (acceptable, within 60-min requirement)
    # When replacing sensor hardware, clear the old session
    import paho.mqtt.client as mqtt

    def decommission_sensor(sensor_id, broker, port=1883):
        # Connect with the same clientId and a clean session to clear broker state
        # (paho-mqtt 1.x style constructor; 2.x also expects a callback_api_version argument)
        temp_client = mqtt.Client(client_id=sensor_id, clean_session=True)
        temp_client.connect(broker, port)
        temp_client.disconnect()
        # Old session cleared, queued messages discarded
        print(f"Session cleared for {sensor_id}")
Result: Configure sensors with Clean Session = false and stable client IDs derived from hardware MAC. Broker queues commands during 55-minute sleep periods, delivering them when sensors wake. Use QoS 1 for idempotent commands (calibrate) and QoS 2 for non-idempotent commands (irrigate). Set session expiry to 24 hours to handle extended low-battery sleep. Total broker memory for 200 sensors: ~1 MB.
Key Insight: Persistent sessions transform MQTT into a store-and-forward system for sleeping devices. The key requirements are: (1) stable client IDs - random IDs break session restoration, (2) subscribe on every wake - subscriptions persist but re-subscribing is harmless and handles broker restarts, (3) match QoS to command idempotency - irrigate twice wastes water while calibrate twice is harmless. Without persistent sessions, sensors would need to poll for commands or miss them entirely, requiring complex application-level queuing that MQTT already provides for free.
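The command latency in the Sensor-042 timeline follows directly from the sleep schedule. The sketch below is illustrative only: `command_latency` is a hypothetical helper, and it assumes the sensor reconnects and receives its queued commands immediately on waking.

```python
from datetime import datetime, timedelta


def command_latency(issued_at, sleep_start, sleep_minutes=55):
    """Delay before a command queued during sleep is delivered at the next wake."""
    wake = sleep_start + timedelta(minutes=sleep_minutes)
    if not sleep_start <= issued_at <= wake:
        raise ValueError("command was not issued during this sleep window")
    return wake - issued_at


sleep_start = datetime(2024, 1, 1, 10, 0)  # arbitrary date, 10:00 AM as in the timeline
issued_at = datetime(2024, 1, 1, 10, 15)   # command arrives at 10:15 AM

print(command_latency(issued_at, sleep_start))    # timeline case
print(command_latency(sleep_start, sleep_start))  # worst case: issued as sleep begins
```

The worst case equals the full sleep duration, so a 55-minute sleep keeps every command inside the 60-minute limit with five minutes of margin.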
Common Mistake: Assuming QoS Level is End-to-End
The mistake: Developers set QoS 2 on the publisher and assume messages are delivered exactly-once to all subscribers, regardless of subscriber QoS settings.
Why it’s wrong: MQTT QoS is NOT end-to-end – it operates independently on two segments:
1. Publisher → Broker (governed by publisher’s QoS)
2. Broker → Subscriber (governed by minimum of publisher QoS and subscriber QoS)
    # Publisher sets QoS 2 for critical command
    client.publish("factory/emergency_stop", "STOP_ALL", qos=2)

    # EVERY subscriber MUST match the required QoS
    plc_a.subscribe("factory/emergency_stop", qos=2)      # Critical safety system
    plc_b.subscribe("factory/emergency_stop", qos=2)      # NOT qos=0!
    dashboard.subscribe("factory/emergency_stop", qos=2)  # Display needs consistency
Key insight: Document required subscriber QoS in your API specification. Failing to match QoS on the subscriber side silently downgrades delivery guarantees, causing bugs that only appear under specific network conditions.
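The minimum rule for the broker-to-subscriber segment is easy to encode as a configuration check. This is a minimal sketch; `effective_qos` and `downgraded_subscribers` are hypothetical helper names, and the subscriber map is made-up example data.

```python
def effective_qos(publish_qos: int, subscribe_qos: int) -> int:
    """QoS used on the broker-to-subscriber segment: the lower of the two levels."""
    for level in (publish_qos, subscribe_qos):
        if level not in (0, 1, 2):
            raise ValueError(f"invalid QoS level: {level}")
    return min(publish_qos, subscribe_qos)


def downgraded_subscribers(publish_qos, subscriptions):
    """Names of subscribers whose subscription QoS weakens the publisher's guarantee."""
    return [name for name, sub_qos in subscriptions.items()
            if effective_qos(publish_qos, sub_qos) < publish_qos]


# Hypothetical subscription table for the emergency-stop topic
subscriptions = {"plc_a": 2, "plc_b": 0, "dashboard": 1}
print(downgraded_subscribers(2, subscriptions))
```

A check like this can run in deployment tests so that a subscriber configured below the documented QoS is caught before it surfaces as an intermittent delivery bug.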
25.9 Interactive Calculators
25.9.1 Fleet QoS Data Cost Calculator
Compare monthly cellular data costs for a fleet of vehicles under different QoS assignment strategies. Adjust the fleet size, message rates, and payload sizes to see how QoS choices impact your data budget.
Calculate how long a battery-powered IoT device will last under different QoS configurations. This calculator models the energy cost per MQTT transaction for each QoS level.
    viewof batCapacity = Inputs.range([100, 10000], {value: 3000, step: 100, label: "Battery capacity (mAh)"})
    viewof batVoltage = Inputs.range([1.8, 12], {value: 3.7, step: 0.1, label: "Supply voltage (V)"})
    viewof batRadioPower = Inputs.range([10, 500], {value: 120, step: 10, label: "Radio transmit current (mA)"})
    viewof batMsgRate = Inputs.range([0.01, 10], {value: 0.5, step: 0.01, label: "Messages per second"})
    viewof batTxTime = Inputs.range([1, 50], {value: 8, step: 1, label: "Tx time per packet (ms)"})
    viewof batQoSLevel = Inputs.select([0, 1, 2], {value: 0, label: "QoS level"})
    viewof batMqttFraction = Inputs.range([0.1, 1.0], {value: 0.5, step: 0.05, label: "Battery fraction for MQTT"})
Size the broker memory requirements for persistent MQTT sessions. Model how many devices, queued messages, and metadata overhead your broker must handle during offline periods.
    viewof brDevices = Inputs.range([10, 50000], {value: 1000, step: 10, label: "Number of devices"})
    viewof brOfflineHours = Inputs.range([1, 48], {value: 10, step: 1, label: "Offline period (hours)"})
    viewof brMsgsPerHour = Inputs.range([0.1, 100], {value: 2.6, step: 0.1, label: "Queued msgs per device/hour"})
    viewof brAvgPayload = Inputs.range([10, 5000], {value: 252, step: 10, label: "Avg message payload (bytes)"})
    viewof brSessionOverhead = Inputs.range([1, 20], {value: 5, step: 0.5, label: "Session state overhead (KB)"})
    viewof brMetadataPerMsg = Inputs.range([50, 500], {value: 200, step: 10, label: "Metadata per message (bytes)"})
    viewof brBrokerRAM = Inputs.range([1, 128], {value: 16, step: 1, label: "Broker RAM (GB)"})
Compare battery life when using a mixed QoS strategy (QoS 0 for routine data, QoS 1 for alerts) versus a uniform QoS approach. This models the medical telemetry scenario where message frequency varies by type.
    viewof mixBattery = Inputs.range([100, 5000], {value: 1500, step: 100, label: "Battery capacity (mAh)"})
    viewof mixRoutineRate = Inputs.range([0.1, 10], {value: 3, step: 0.1, label: "Routine messages per second"})
    viewof mixAlertRate = Inputs.range([1, 100], {value: 10, step: 1, label: "Alert messages per day"})
    viewof mixRadioCurrent = Inputs.range([10, 300], {value: 120, step: 10, label: "Radio current (mA)"})
    viewof mixQ0Time = Inputs.range([1, 30], {value: 8, step: 1, label: "QoS 0 transmit time (ms)"})
    viewof mixQ1Time = Inputs.range([5, 80], {value: 25, step: 1, label: "QoS 1 transmit time (ms)"})
    viewof mixQ2Time = Inputs.range([10, 150], {value: 50, step: 1, label: "QoS 2 transmit time (ms)"})
    viewof mixBudgetFraction = Inputs.range([0.1, 1.0], {value: 0.5, step: 0.05, label: "Battery budget for MQTT"})
Model the message queue requirements for duty-cycled sensors that sleep for extended periods. Calculate how many commands accumulate during sleep and the burst delivery rate at wake-up.
    viewof swSensors = Inputs.range([10, 10000], {value: 200, step: 10, label: "Number of sensors"})
    viewof swSleepMin = Inputs.range([1, 1440], {value: 55, step: 1, label: "Sleep duration (minutes)"})
    viewof swWakeMin = Inputs.range([1, 60], {value: 5, step: 1, label: "Wake duration (minutes)"})
    viewof swCmdsPerDay = Inputs.range([1, 500], {value: 50, step: 1, label: "Total commands per day (all sensors)"})
    viewof swCmdSize = Inputs.range([50, 2000], {value: 200, step: 10, label: "Average command size (bytes)"})
    viewof swSessionKB = Inputs.range([1, 20], {value: 5, step: 0.5, label: "Session overhead per sensor (KB)"})
    viewof swMaxQueuePerClient = Inputs.range([5, 100], {value: 10, step: 1, label: "Max queue per client"})
QoS 2’s four-message handshake consumes 4× the bandwidth of QoS 0 and adds 2-4 roundtrips of latency — on a 2G connection this can mean 2-8 seconds per message. Reserve QoS 2 for irreplaceable commands (actuator control, firmware triggers); use QoS 1 for telemetry where occasional duplicates are acceptable.
2. Expecting QoS to Guarantee Subscriber Delivery
MQTT QoS governs publisher-to-broker and broker-to-subscriber segments independently — QoS 1 from publisher to broker combined with QoS 0 subscription results in at-most-once overall. Always set both the publish QoS AND the subscribe QoS to the required level.
3. Not Handling Duplicate Messages at QoS 1
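QoS 1 guarantees at-least-once delivery: when the sender (publishing client or broker) does not receive a PUBACK, it retransmits the message with the DUP flag set. Applications storing sensor readings will create duplicate database entries unless they deduplicate on an application-level message ID or implement idempotent writes. MQTT packet identifiers are reused over time, so they are not a reliable deduplication key on their own.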
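One way to make writes idempotent is to key each stored reading on an application-level message ID. The sketch below uses an in-memory SQLite database; the table layout, the `store_reading` helper, and the idea that the publisher embeds a unique `message_id` in each payload are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (message_id TEXT PRIMARY KEY, sensor TEXT, value REAL)"
)


def store_reading(conn, message_id, sensor, value):
    """Insert a reading once; a redelivered duplicate is ignored. Returns True if stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO readings VALUES (?, ?, ?)",
        (message_id, sensor, value),
    )
    conn.commit()
    return cur.rowcount == 1


first = store_reading(conn, "sensor-042:000123", "sensor-042", 21.5)
duplicate = store_reading(conn, "sensor-042:000123", "sensor-042", 21.5)  # QoS 1 redelivery
rows = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(first, duplicate, rows)
```

Pushing deduplication into the primary key keeps the subscriber stateless: it does not need to remember which IDs it has seen, because the database rejects repeats.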
25.10 Summary
This chapter provided detailed worked examples for MQTT QoS and session configuration:
Fleet Tracking: 95% of messages use QoS 0 for efficiency, QoS 2 only for delivery confirmations where duplicates cause billing issues, QoS 1 for safety alerts where speed matters more than duplicate prevention
Smart Door Locks: QoS 2 for unlock commands (audit compliance, no duplicates), QoS 0 for status updates (high frequency, duplicates harmless), battery impact is dominated by status messages not commands
Fleet Session Sizing: 1,000 trucks need only 16.6 MB broker memory, reconnection storms are the real challenge, stagger reconnections with random delays
Medical Telemetry: Mixed QoS by message type provides 3x better battery life than uniform QoS 1, logging requirements are solved at subscriber side with persistent sessions
Sleep-Wake Sensors: Persistent sessions enable store-and-forward for sleeping devices, stable client IDs are critical, QoS matches command idempotency