1195 MQTT Session Management
1195.1 Learning Objectives
By the end of this chapter, you will be able to:
- Configure Session Persistence: Set up clean and persistent sessions for different device types
- Implement Secure MQTT: Apply TLS encryption and authentication for production deployments
- Avoid Common Pitfalls: Recognize and prevent session-related configuration errors
- Handle Reconnection Storms: Design systems that gracefully handle mass reconnection events
- Debug Session Issues: Diagnose message loss, orphaned sessions, and queue overflow problems
Foundations:
- MQTT QoS Fundamentals - Basic QoS and session concepts
- MQTT QoS Levels - Technical QoS handshake details
Deep Dives:
- MQTT QoS Worked Examples - Real-world QoS selection
- MQTT Comprehensive Review - Advanced patterns
Hands-On:
- MQTT Labs and Implementation - Build MQTT projects
1195.2 Interactive Lab: MQTT QoS Comparison
Let’s build an experiment that demonstrates the real differences between QoS 0, 1, and 2!
1195.2.1 QoS Comparison Simulation
Code Explanation:
#include <WiFi.h>
#include <PubSubClient.h>
const char* ssid = "Wokwi-GUEST";
const char* password = "";
const char* mqtt_server = "test.mosquitto.org";
WiFiClient espClient;
PubSubClient mqttClient(espClient);
// Statistics tracking
int qos0_sent = 0, qos0_acked = 0;
int qos1_sent = 0, qos1_acked = 0, qos1_duplicates = 0;
int qos2_sent = 0, qos2_acked = 0;
unsigned long qos0_time = 0, qos1_time = 0, qos2_time = 0;
// Simulate packet loss (20% chance)
bool simulatePacketLoss() {
return (random(100) < 20); // 20% packet loss
}
void callback(char* topic, byte* payload, unsigned int length) {
// Track received messages (for subscriber)
Serial.print("Received: ");
Serial.write(payload, length);  // payload is not null-terminated, so print exactly 'length' bytes
Serial.println();
}
void setup() {
Serial.begin(115200);
WiFi.begin(ssid, password);
while (WiFi.status() != WL_CONNECTED) {
delay(500);
Serial.print(".");
}
mqttClient.setServer(mqtt_server, 1883);
mqttClient.setCallback(callback);
while (!mqttClient.connected()) {
if (mqttClient.connect("ESP32_QoS_Test")) {
Serial.println("Connected to MQTT broker!");
} else {
delay(5000);
}
}
Serial.println("\n=== QoS Comparison Test ===\n");
runQoSTest();
}
void runQoSTest() {
Serial.println("Testing QoS 0 (Fire and Forget)...");
testQoS0();
delay(2000);
Serial.println("\nTesting QoS 1 (At Least Once)...");
testQoS1();
delay(2000);
Serial.println("\nTesting QoS 2 (Exactly Once)...");
testQoS2();
delay(2000);
printResults();
}
void testQoS0() {
unsigned long start = millis();
for (int i = 0; i < 100; i++) {
char msg[50];
snprintf(msg, sizeof(msg), "QoS0_Message_%d", i);
if (!simulatePacketLoss()) {
mqttClient.publish("test/qos0", msg, 0); // QoS 0
qos0_sent++;
qos0_acked++; // Assume success (no actual confirmation)
} else {
qos0_sent++;
Serial.printf("QoS0 Message %d lost (no retry)\n", i);
}
delay(10);
}
qos0_time = millis() - start;
Serial.printf("QoS 0 complete: %d sent, ~%d delivered, %d lost\n",
qos0_sent, qos0_acked, qos0_sent - qos0_acked);
}
void loop() {
mqttClient.loop();
// Test runs once in setup()
}
1195.2.2 Lab Results Analysis
Expected Results (with 20% simulated packet loss):
| QoS | Messages Delivered | Duplicates | Time (ms) | Battery Impact |
|---|---|---|---|---|
| 0 | ~80/100 (80%) | 0 | 1,000 | Baseline (100%) |
| 1 | 100/100 (100%) | ~4-6 | 1,500 | ~150% |
| 2 | 100/100 (100%) | 0 | 2,500 | ~250% |
Key Observations:
- QoS 0 loses ~20% of messages (matches packet loss rate)
- QoS 1 delivers all messages, but creates duplicates when PUBACK is lost
- QoS 2 delivers all messages exactly once, no duplicates
- QoS 2 takes 2.5x longer than QoS 0 (4-way handshake overhead)
- Battery impact scales with transmission time: QoS 2 uses roughly 2.5x the energy of QoS 0
1195.3 Security Considerations
Basic MQTT (port 1883) sends data unencrypted. For production:
- Use MQTT over TLS (port 8883)
- Enable authentication (username/password)
- Implement access control (topic-level permissions)
- Use private broker (don’t rely on public brokers)
Never expose plain MQTT on port 1883 in production environments. Unencrypted MQTT transmits credentials and data in plain text: network sniffers can capture usernames, passwords, and every sensor reading. An attacker on your Wi-Fi network can see every temperature reading, door lock command, and camera feed. Always use MQTTS (port 8883) with TLS 1.2+ and certificate-based authentication for production deployments. Public test brokers like test.mosquitto.org are fine for learning, but never for real applications.
Using public brokers (test.mosquitto.org, broker.hivemq.com) for real IoT deployments is dangerous:
- Anyone worldwide can subscribe to your topics and see all data
- No authentication means anyone can publish malicious commands
- Zero privacy for sensor readings or control commands
- Unreliable service with no guarantees
Deploy a private broker (Mosquitto, HiveMQ, AWS IoT Core) with TLS, authentication, and topic-level ACLs. The cost is minimal compared to the security risk.
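As a concrete illustration, here is a minimal ESP32 sketch fragment connecting over TLS with username/password authentication; the broker hostname, CA certificate, and credentials are placeholders for your own private broker.
#include <WiFiClientSecure.h>
#include <PubSubClient.h>
// Placeholder broker address and CA certificate - replace with your own
const char* mqtt_server = "mqtt.example.com";
const char* root_ca = R"EOF(
-----BEGIN CERTIFICATE-----
...your broker's CA certificate...
-----END CERTIFICATE-----
)EOF";
WiFiClientSecure secureClient;
PubSubClient mqttClient(secureClient);
void connectSecure() {
  secureClient.setCACert(root_ca);          // verify the broker's certificate
  mqttClient.setServer(mqtt_server, 8883);  // MQTTS port, not 1883
  while (!mqttClient.connected()) {
    // Authenticate with per-device credentials enforced by the broker's ACL
    if (mqttClient.connect("gateway_01", "device_user", "device_password")) {
      mqttClient.subscribe("home/commands/#", 1);
    } else {
      delay(5000);
    }
  }
}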
1195.4 Knowledge Check
Test your understanding of these MQTT security concepts.
Question 1: Why is using port 1883 for MQTT in production environments a security risk?
Explanation: Port 1883 transmits data in plain text without encryption, making it vulnerable to eavesdropping and man-in-the-middle attacks. Anyone with network access can intercept MQTT messages and read sensitive data (passwords, sensor values, commands). For production, use port 8883 which adds TLS/SSL encryption, protecting data in transit. Example: A smart door lock command sent over port 1883 could be intercepted and replayed by an attacker!
Question 2: A smart home system uses a public MQTT broker (test.mosquitto.org) for controlling door locks. What are the main security concerns?
Explanation: Major security risks: 1) Anyone worldwide can subscribe to your topics and see your lock commands, 2) No authentication means anyone can publish commands to unlock your door, 3) Shared infrastructure with unknown users, 4) No data encryption, 5) No topic-level access control. Solution: Use a private MQTT broker (AWS IoT Core, HiveMQ Cloud, or self-hosted Mosquitto) with TLS, authentication, and topic permissions. Never use public brokers for security-critical applications!
Question 3: What is the purpose of topic-level access control (ACL) in MQTT?
Explanation: Topic-level ACLs (Access Control Lists) restrict which clients can publish/subscribe to specific topics. Example: User “sensor_node_1” can only publish to sensors/node1/#, while “dashboard_app” can only subscribe to sensors/# but cannot publish. This prevents: unauthorized devices from sending commands, applications from accessing sensitive topics, and compromised devices from affecting the entire system. Implemented through broker configuration files or authentication plugins.
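For example, a minimal Mosquitto ACL file implementing the permissions described above might look like the following sketch (usernames and topics are illustrative):
# ACL file referenced from mosquitto.conf via the acl_file option
user sensor_node_1
topic write sensors/node1/#

user dashboard_app
topic read sensors/#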
Question 4: What security measures should be implemented for a production MQTT system controlling critical infrastructure?
Explanation: Essential security layers include: 1) TLS/SSL encryption (port 8883) for all connections, 2) Strong authentication (username/password minimum, preferably X.509 certificates), 3) Topic-level ACLs limiting publish/subscribe permissions, 4) Private broker (not public), 5) Network segmentation (IoT devices on separate VLAN), 6) Regular security audits and broker updates, 7) Message signing/validation for critical commands, 8) Connection monitoring and anomaly detection. Defense-in-depth approach is critical for infrastructure!
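As a broker-side starting point for the first few of these layers, a Mosquitto configuration sketch might look like this (certificate and file paths are placeholders):
# mosquitto.conf - TLS listener with authentication and ACLs
listener 8883
cafile /etc/mosquitto/certs/ca.crt
certfile /etc/mosquitto/certs/broker.crt
keyfile /etc/mosquitto/certs/broker.key
tls_version tlsv1.2
allow_anonymous false
password_file /etc/mosquitto/passwd
acl_file /etc/mosquitto/acl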
1195.5 Common Pitfalls
The Mistake: MQTT QoS levels (0, 1, 2) are often misunderstood. QoS 0 offers no delivery guarantee, QoS 1 guarantees at-least-once delivery (may duplicate), and QoS 2 guarantees exactly-once delivery. Using the wrong level leads to message loss or unnecessary overhead.
Symptoms:
- Message loss when QoS 0 used for critical data
- Duplicate processing when QoS 2 expected but QoS 1 used
- Battery drain from unnecessary QoS 2
- High latency from QoS 2 handshake
Wrong approach:
# Using QoS 0 for critical alerts - messages may be lost!
client.publish("alerts/fire", "Fire detected!", qos=0)
# Using QoS 2 for frequent sensor readings - wastes bandwidth
while True:
    client.publish("sensors/temp", read_temp(), qos=2)
    time.sleep(1)
Correct approach:
# Use QoS 1 or 2 for critical messages
client.publish("alerts/fire", "Fire detected!", qos=2)
# Use QoS 0 for high-frequency, non-critical data
client.publish("sensors/temp", read_temp(), qos=0)
# Use QoS 1 for important but duplicable data
client.publish("metrics/hourly", summary, qos=1)
How to avoid:
- Match QoS level to message criticality
- Use QoS 0 for high-frequency telemetry
- Use QoS 1 for commands and alerts
- Use QoS 2 only for exactly-once requirements
- Consider battery and bandwidth impact
The Mistake: Developers configure clean_session=false on publishing devices (sensors), expecting the broker to buffer their outbound messages when the network is down.
Why It Happens: The term “persistent session” suggests messages are persisted in both directions. Developers assume that if subscribers get queued messages, publishers should too. The MQTT spec is clear but often misread: persistent sessions only queue messages to clients, not from them.
The Fix: Implement local message buffering on the publisher side. When mqttClient.connected() returns false, store messages locally (SPIFFS, SD card, or RAM buffer) and publish them on reconnection.
// ESP32 local buffering pattern
#define BUFFER_SIZE 100
struct BufferedMessage {
char topic[64];
char payload[256];
uint8_t qos;
};
BufferedMessage buffer[BUFFER_SIZE];
int bufferIndex = 0;
void publishWithBuffer(const char* topic, const char* payload, uint8_t qos) {
  if (mqttClient.connected()) {
    // Flush buffered messages first (PubSubClient publishes at QoS 0 only;
    // the stored qos is kept for bookkeeping)
    for (int i = 0; i < bufferIndex; i++) {
      mqttClient.publish(buffer[i].topic, buffer[i].payload);
    }
    bufferIndex = 0;
    // Then publish the current message
    mqttClient.publish(topic, payload);
  } else if (bufferIndex < BUFFER_SIZE) {
    // Store locally when offline (copy with guaranteed null termination)
    strncpy(buffer[bufferIndex].topic, topic, sizeof(buffer[bufferIndex].topic) - 1);
    buffer[bufferIndex].topic[sizeof(buffer[bufferIndex].topic) - 1] = '\0';
    strncpy(buffer[bufferIndex].payload, payload, sizeof(buffer[bufferIndex].payload) - 1);
    buffer[bufferIndex].payload[sizeof(buffer[bufferIndex].payload) - 1] = '\0';
    buffer[bufferIndex].qos = qos;
    bufferIndex++;
  }
}
MQTT 3.1.1 Spec Reference: Section 3.1.2.4 states “If CleanSession is set to 0, the Server MUST resume communications with the Client based on state from the current Session.” The “state” includes subscriptions and in-flight messages to the client, not messages the client wants to send.
The Mistake: Developers enable persistent sessions (clean_session=false) but use auto-generated or random client IDs like ESP32_ + random() or allow the library to generate one.
Why It Happens: Many MQTT libraries generate unique client IDs automatically to avoid ID collisions. Developers don’t realize this breaks persistent session restoration. Each reconnection creates a new session with a different ID, so queued messages and subscriptions from the previous session are orphaned and eventually expire.
The Fix: Use a stable, unique client ID derived from hardware identifiers. For ESP32, use the MAC address or chip ID. For MQTT 5.0, you can also let the broker assign a persistent ID via Assigned Client Identifier.
// MQTT 3.1.1: Derive stable ID from hardware
char clientId[32];
uint64_t chipId = ESP.getEfuseMac(); // Unique per chip
snprintf(clientId, sizeof(clientId), "ESP32_%04X%08X",
(uint16_t)(chipId >> 32), (uint32_t)chipId);
// Connect with persistent session
if (mqttClient.connect(clientId, user, pass,
willTopic, willQos, willRetain, willMessage,
false)) { // clean_session = false
// Session restored - subscriptions and queued messages available
}
# MQTT 5.0: Let the broker assign the ID (paho-mqtt, Python)
import paho.mqtt.client as mqtt
from paho.mqtt.properties import Properties
from paho.mqtt.packettypes import PacketTypes
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2,
                     client_id="",          # Empty = broker assigns
                     protocol=mqtt.MQTTv5)
client.connect(broker, port, clean_start=False,
               properties=Properties(PacketTypes.CONNECT))
# Check the assigned ID in the CONNACK properties
Broker Configuration (Mosquitto): Set persistent_client_expiration to control how long orphaned sessions are kept. By default sessions never expire, which can consume broker memory if clients use random IDs.
# mosquitto.conf - expire orphaned sessions after 7 days
persistent_client_expiration 7d
The Mistake: Developers configure QoS 2 on the publisher side, expecting guaranteed exactly-once delivery to subscribers, but subscribers connect with QoS 0 or 1. They’re confused when messages are duplicated or lost at the subscriber despite using QoS 2 for publishing.
Why It Happens: MQTT QoS is not end-to-end; it applies separately to publisher-to-broker and broker-to-subscriber segments. The effective QoS for delivery is the minimum of the two. Publishing with QoS 2 to a subscriber with QoS 0 subscription results in QoS 0 delivery (fire-and-forget) to that subscriber.
The Fix: Match QoS levels across the entire message path. If exactly-once delivery is required, both publisher and subscriber must use QoS 2. Document QoS requirements in your API specification and validate them during system integration testing.
# Publisher: QoS 2 for critical command
client.publish("factory/line1/emergency_stop", "STOP", qos=2)
# Subscriber: MUST also use QoS 2 for exactly-once delivery
def on_connect(client, userdata, flags, reason_code, properties):
    # WRONG: QoS 0 subscription downgrades all deliveries
    # client.subscribe("factory/line1/emergency_stop", qos=0)
    # CORRECT: Match publisher QoS for end-to-end guarantee
    client.subscribe("factory/line1/emergency_stop", qos=2)
# Effective QoS = min(publisher_qos, subscriber_qos)
# QoS 2 publish + QoS 0 subscribe = QoS 0 delivery (NO guarantee!)
# QoS 2 publish + QoS 2 subscribe = QoS 2 delivery (exactly-once)
Real-World Impact: A manufacturing plant configured emergency stop commands with QoS 2 on HMI panels, but the PLC subscribers used default QoS 0 subscriptions. During a network glitch, the broker retransmitted the stop command (QoS 2 retry), but the PLCs, receiving at QoS 0, processed it as a new command, causing a production line to halt twice. Cost: 4 hours of downtime ($85,000).
The Mistake: Developers deploy hundreds of IoT devices with persistent sessions (clean_session=false) and long keep-alive intervals (300+ seconds). When the broker restarts or fails over, all devices attempt to reconnect simultaneously, overwhelming the broker with CONNECT packets and queued message delivery.
Why It Happens: Persistent sessions are designed to survive brief disconnections, but broker restarts trigger mass reconnection. With 1,000 devices each having 50 queued messages, the broker must deliver 50,000 messages within seconds while also handling 1,000 simultaneous CONNECT handshakes. Default broker configurations often can’t handle this “thundering herd” scenario.
The Fix: Implement staggered reconnection with exponential backoff and jitter. Configure the broker’s per-client queue limit (max_queued_messages in Mosquitto) to cap queue buildup. For MQTT 5.0, use the Session Expiry Interval to automatically clean up stale sessions.
import random
import time
import paho.mqtt.client as mqtt

class MQTTClientWithBackoff:
    def __init__(self, client_id):
        self.client_id = client_id
        # Underlying paho client (the original snippet leaves its construction implicit)
        self.client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, client_id=client_id)
        self.base_delay = 1.0    # 1 second base
        self.max_delay = 120.0   # 2 minute cap
        self.attempt = 0

    def connect_with_backoff(self, broker, port):
        while True:
            try:
                self.client.connect(broker, port)
                self.attempt = 0  # Reset on success
                return
            except Exception:
                self.attempt += 1
                # Exponential backoff: 1s, 2s, 4s, 8s... capped at 120s
                delay = min(self.base_delay * (2 ** (self.attempt - 1)), self.max_delay)
                # Add jitter: random 0-50% of delay to spread reconnections
                jitter = random.uniform(0, delay * 0.5)
                total_delay = delay + jitter
                print(f"Reconnect attempt {self.attempt} in {total_delay:.1f}s")
                time.sleep(total_delay)

# Broker configuration (mosquitto.conf)
# Limit queue buildup per client
# max_queued_messages 1000
# max_queued_bytes 1048576   # 1MB per client
#
# Expire stale persistent sessions after 1 hour (complements the MQTT 5.0 Session Expiry Interval)
# persistent_client_expiration 1h
Sizing Guide: For N devices with Q average queued messages and M bytes per message, a broker restart requires handling N x Q x M bytes almost immediately. Example: 1,000 devices x 100 messages x 500 bytes = 50 MB burst. Ensure broker memory can handle 2-3x this peak load.
1195.6 Summary
This chapter covered MQTT session management and security:
- Clean Sessions forget all state on disconnect, ideal for simple publishers that don’t need offline message queuing
- Persistent Sessions maintain subscriptions and queue QoS 1/2 messages for offline clients, essential for devices receiving commands during sleep
- Security requires TLS encryption (port 8883), authentication, topic-level ACLs, and private brokers for production deployments
- Common Pitfalls include expecting publishers to have messages queued, using random client IDs with persistent sessions, QoS mismatches, and reconnection storms
- Exponential Backoff with jitter prevents thundering herd problems when many devices reconnect simultaneously
- Broker Configuration must account for queue limits, session expiry, and memory requirements for large-scale deployments
1195.7 What’s Next
The next chapter, MQTT QoS Worked Examples, provides detailed real-world examples including fleet tracking QoS selection, smart door lock system design, fleet management session sizing, medical device telemetry, and sleep-wake IoT sensor configuration.