Analyze MQTT publish-subscribe traffic patterns and diagnose connection issues
Examine CoAP RESTful communication including observe relationships and block transfers
Interpret Zigbee mesh network traffic including routing and cluster operations
Interpret LoRaWAN uplink/downlink patterns and join procedures
Identify common protocol-specific issues through traffic analysis
In 60 Seconds
IoT traffic analysis at the protocol level examines how specific protocols (MQTT, CoAP, AMQP, HTTP) behave on the wire — verifying message formats, QoS handshake sequences, and error response codes. Protocol-specific Wireshark dissectors decode binary protocol frames into human-readable fields, enabling verification that devices correctly implement standards. Protocol analysis catches implementation mismatches between devices and cloud backends that appear as intermittent connection failures in higher-level logs.
15.2 For Beginners: Analyzing IoT Protocols
Protocol analysis is like learning to read different languages that IoT devices use to talk to each other. Just as English, Spanish, and Mandarin have different grammars and vocabularies, protocols like MQTT, CoAP, and Zigbee have different message formats and conversation patterns. Tools like Wireshark act as translators, showing you exactly what devices are saying to each other, word by word. This helps you diagnose problems like “why did my sensor stop sending data?” or “why is my device getting disconnected?” by examining the actual conversation happening on the network.
Sensor Squad: Reading the Protocol Languages
“Each IoT protocol has its own conversation style,” said Max the Microcontroller. “MQTT uses publish and subscribe – a sensor publishes a message to a topic, and anyone subscribed to that topic receives it. CoAP is more like HTTP – request and response, but lightweight enough for tiny sensors.”
Sammy the Sensor demonstrated. “In Wireshark, my MQTT publish looks like: PUBLISH Topic=‘sensors/temp1’ Payload=‘{“temp”: 23.5}’. If the broker responds with PUBACK, I know the message was delivered. If I see CONNACK with an error code, I know the connection failed – maybe a wrong password or expired certificate.”
Lila the LED explained Zigbee analysis. “Zigbee mesh traffic is trickier because messages hop between routers. In a packet capture, you can trace the route: Sammy sends to Router A, Router A forwards to Router B, Router B delivers to the coordinator. If a hop is failing, the link quality indicators show which connection is weak.” Bella the Battery mentioned LoRaWAN. “LoRaWAN captures show the join procedure – how a device authenticates with the network server using keys. You can see uplink and downlink messages, confirm-vs-unconfirm patterns, and adaptive data rate adjustments. Each protocol tells a different story when you know how to read it!”
15.3 Prerequisites
Before diving into this chapter, you should be familiar with:
Traffic capture tools (Wireshark, tcpdump, tshark) covered in the previous chapter
15.4 How Protocol Dissection Works
IoT protocol analysis leverages Wireshark’s protocol dissectors—software components that parse and interpret protocol-specific packet structures. Here’s how dissection works for IoT protocols:
MQTT Dissection Process:
Transport Layer: TCP dissector identifies port 1883/8883 as potential MQTT traffic
Protocol Detection: The high nibble of the first byte (0x10-0xE0) identifies the MQTT message type (CONNECT=1, PUBLISH=3, etc.)
CoAP Dissection Process:
Option Parsing: Variable-length options (Uri-Path, Content-Format, Observe) use delta-based option-number encoding
Payload Marker: A 0xFF byte separates the options from the payload
Content Decoding: The payload is interpreted according to the Content-Format option (0=text/plain, 50=application/json, 60=application/cbor)
Why This Matters: Without dissectors, you see raw hex bytes. With dissectors, Wireshark translates \x30\x0E into “PUBLISH, QoS 0” and \x00\x0C sensors/temp into “Topic: sensors/temp”. This makes debugging 10-100x faster.
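The translation a dissector performs can be sketched in a few lines of Python. This is a simplified decoder for a QoS 0 PUBLISH with a single-byte remaining length, not Wireshark's actual implementation; the frame bytes mirror the hex example above.

```python
# Simplified MQTT PUBLISH decoder -- an illustrative sketch, not a full dissector.
MSG_TYPES = {1: "CONNECT", 2: "CONNACK", 3: "PUBLISH", 4: "PUBACK", 8: "SUBSCRIBE"}

def dissect_mqtt_publish(frame: bytes) -> dict:
    """Decode the fixed header, topic, and payload of a simple PUBLISH frame."""
    msg_type = frame[0] >> 4          # high nibble: message type (3 = PUBLISH)
    qos = (frame[0] >> 1) & 0x03      # bits 2-1 of the flags: QoS level
    remaining_len = frame[1]          # single-byte encoding (lengths < 128)
    topic_len = int.from_bytes(frame[2:4], "big")
    topic = frame[4:4 + topic_len].decode("utf-8")
    payload = frame[4 + topic_len:2 + remaining_len]
    return {"type": MSG_TYPES.get(msg_type, "?"), "qos": qos,
            "topic": topic, "payload": payload}

info = dissect_mqtt_publish(b"\x30\x0e\x00\x0csensors/temp")
print(info["type"], info["qos"], info["topic"])  # PUBLISH 0 sensors/temp
```

Real dissectors handle the general cases this sketch skips (multi-byte remaining lengths, packet identifiers for QoS 1 and 2), but the field-by-field walk is the same.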
Limitations: Encrypted payloads (TLS/DTLS) show as “Application Data” unless you provide decryption keys. Custom protocols require writing your own Lua dissector script.
15.5 Analyzing IoT Protocols
~30 min | Advanced | P13.C06.U03
Figure 15.1: Protocol analysis across network stack layers, showing how traffic analysis examines Application (MQTT, CoAP), Transport (TCP, UDP), Network (IP), and Link (Wi-Fi, Zigbee) layers, with cross-layer performance metrics including latency, throughput, packet loss, and jitter.
15.5.1 MQTT (Message Queue Telemetry Transport)
Protocol Overview: Lightweight publish-subscribe messaging protocol commonly used in IoT for sensor data and commands.
Traffic Pattern:
Port: 1883 (unencrypted), 8883 (TLS)
Transport: TCP
Message types: CONNECT, CONNACK, PUBLISH, SUBSCRIBE, PUBACK, etc.
LoRaWAN (Long Range Wide Area Network)
Protocol Overview: Long-range, low-power LPWAN protocol used for battery-operated devices such as meters and environmental sensors.
Key Capture Fields:
Frame Counter: Detects replay attacks and packet loss
Message Type: Join, Unconfirmed Data, Confirmed Data
Gateway Count: How many gateways received message (diversity)
RSSI/SNR: Signal strength and quality
Common Issues:
Collision: Multiple devices transmitting simultaneously (no ACK received)
Duty Cycle Violations: Exceeding allowed transmission time
ADR (Adaptive Data Rate) Issues: Suboptimal SF selection
Downlink Failures: Class A devices must wait for RX windows
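Several of these issues are quantifiable before deployment. The sketch below implements the standard LoRa time-on-air formula from Semtech's SX127x documentation; the 125 kHz bandwidth, explicit header, CRC-on, and 1% duty cycle are EU868 assumptions, and the 20-byte payload is a hypothetical meter reading.

```python
import math

def lora_airtime(sf: int, payload_bytes: int, bw_hz: int = 125_000,
                 coding_rate: int = 1, preamble_syms: int = 8) -> float:
    """Time-on-air in seconds for one LoRa packet (explicit header, CRC on)."""
    t_sym = (2 ** sf) / bw_hz                      # symbol duration in seconds
    de = 1 if sf >= 11 else 0                      # low-data-rate optimization
    num = 8 * payload_bytes - 4 * sf + 28 + 16     # explicit header, CRC enabled
    n_payload = 8 + max(math.ceil(num / (4 * (sf - 2 * de))) * (coding_rate + 4), 0)
    return (preamble_syms + 4.25 + n_payload) * t_sym

for sf in (7, 9, 12):
    toa = lora_airtime(sf, payload_bytes=20)
    max_per_hour = 3600 * 0.01 / toa               # 1% duty-cycle budget
    print(f"SF{sf}: {toa * 1000:.1f} ms on air, <= {max_per_hour:.0f} msgs/hour")
```

The output makes the ADR stakes concrete: a node pushed to SF12 spends roughly 23x longer on air per message than at SF7, so its duty-cycle budget permits far fewer transmissions per hour.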
15.6 Knowledge Check
Quiz: Protocol Analysis
15.7 Case Study: Field Debugging a Smart Meter Deployment with Protocol Analysis
Scenario: A utility company deployed 2,000 smart electricity meters communicating via LoRaWAN to 12 gateways across a mid-sized city. After 3 months, 340 meters (17%) were reporting intermittent data gaps – some missing 2-3 readings per day, others going silent for entire days before resuming. The utility’s monitoring dashboard showed gaps but no error messages.
Phase 1: Gateway-level capture (Day 1)
The team captured LoRaWAN traffic at the 3 worst-performing gateways using the packet forwarder log (JSON format, no specialized sniffer needed):
| Gateway | Expected uplinks/hour | Actual uplinks/hour | Missing (%) |
|---|---|---|---|
| GW-07 (industrial park) | 180 | 142 | 21% |
| GW-03 (downtown) | 210 | 178 | 15% |
| GW-11 (residential) | 160 | 155 | 3% |
The industrial park gateway had the worst performance. Filtering by spreading factor revealed the pattern:
| Spreading Factor | Expected count | Actual count | Loss rate |
|---|---|---|---|
| SF7 (close meters) | 80/hr | 79/hr | 1% |
| SF8 | 45/hr | 43/hr | 4% |
| SF9 | 30/hr | 24/hr | 20% |
| SF10-12 (distant meters) | 25/hr | 6/hr | 76% |
Meters using higher spreading factors (further from gateway) were losing most of their packets.
Phase 2: Spectrum analysis (Day 2)
An SDR (RTL-SDR, $25) tuned to 868 MHz near GW-07 revealed a noise floor of -95 dBm during working hours (8 AM - 6 PM), dropping to -115 dBm at night. Typical LoRaWAN gateway sensitivity at SF10 is around -132 dBm, so a daytime noise floor sitting 37 dB above that threshold effectively blinded the gateway to distant meters during the day.
Root cause: A new logistics warehouse 200 meters from the gateway had installed a fleet of autonomous warehouse robots communicating at 868 MHz (permitted ISM band, but generating continuous interference).
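The noise-floor numbers translate directly into a decodability check. LoRa demodulates signals below the noise floor by a per-SF margin (approximate demodulator SNR limits from Semtech SX127x datasheets); the received-signal strength below is a hypothetical cell-edge meter, chosen to show why SF10 uplinks survived at night but died during working hours.

```python
# Approximate demodulator SNR floors (dB) per spreading factor (Semtech SX127x).
SNR_FLOOR = {7: -7.5, 8: -10.0, 9: -12.5, 10: -15.0, 11: -17.5, 12: -20.0}

def decodable(rssi_dbm: float, noise_floor_dbm: float, sf: int) -> bool:
    """A frame is decodable when its SNR clears the per-SF demodulator floor."""
    snr = rssi_dbm - noise_floor_dbm
    return snr >= SNR_FLOOR[sf]

distant_meter_rssi = -117.0   # hypothetical SF10 meter at the cell edge
print(decodable(distant_meter_rssi, -115.0, 10))  # night: -2 dB SNR -> True
print(decodable(distant_meter_rssi, -95.0, 10))   # day: -22 dB SNR -> False
```

The same signal that clears the SF10 floor comfortably at night is 7 dB short of it during the workday, which matches the daytime-only packet loss in the gateway logs.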
Phase 3: Solution evaluation
| Option | Cost | Implementation time | Impact |
|---|---|---|---|
| Relocate GW-07 antenna 500m away from warehouse | $800 (new mounting + cable) | 2 days | Moved gateway out of interference zone |
| Add directional antenna facing away from warehouse | | | |
The team implemented both the antenna relocation AND a second gateway. After the fix, packet loss at GW-07 dropped from 21% to 2.1%, and the 340 affected meters all returned to full reporting within 48 hours.
Total cost of the outage: 3 months of 17% data gaps across 2,000 meters meant approximately 306,000 missing readings. At $0.02 estimated revenue impact per missing reading (billing inaccuracy + customer complaints), the data gaps cost approximately $6,120. The fix cost $2,000 in hardware plus 3 engineer-days ($1,800). The protocol analysis itself took 2 days and required only a $25 SDR and existing gateway logs.
Lesson: Protocol-specific analysis revealed that the problem was not the LoRaWAN protocol, the meters, or the network server – it was RF interference invisible to application-layer monitoring. Without capturing at the physical layer, the team would have continued replacing “faulty” meters ($85 each x 340 = $28,900) based on the incorrect assumption that hardware was failing.
Worked Example: Diagnosing MQTT Connection Failures Using Wireshark
Scenario: A fleet of 340 ESP32-based environmental sensors deployed across a university campus reports intermittent MQTT connection failures. Devices show “MQTT CONNACK Error Code 5” in logs approximately 8-12 times per day per device. The MQTT broker (mosquitto 2.0.18 on AWS EC2 t3.medium) shows no obvious errors in its logs. The university IT department suspects a network issue, but ping tests show 0% packet loss and <15ms latency to the broker.
Investigation Goal: Identify root cause of CONNACK error code 5 using Wireshark packet captures.
Step 1: Understand MQTT CONNACK Return Codes
MQTT v3.1.1 CONNACK return codes:
0: Connection accepted
1: Connection refused, unacceptable protocol version
2: Connection refused, identifier rejected
3: Connection refused, server unavailable
4: Connection refused, bad username or password
5: Connection refused, not authorized
Error code 5 indicates authentication/authorization failure, NOT a network issue.
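A CONNACK frame is only four bytes in MQTT v3.1.1, so the return code is easy to extract from a capture by hand. A minimal decoder sketch (decode_connack is a hypothetical helper, not a library API):

```python
CONNACK_CODES = {
    0: "Connection accepted",
    1: "Connection refused, unacceptable protocol version",
    2: "Connection refused, identifier rejected",
    3: "Connection refused, server unavailable",
    4: "Connection refused, bad username or password",
    5: "Connection refused, not authorized",
}

def decode_connack(frame: bytes) -> str:
    """Decode an MQTT v3.1.1 CONNACK: 0x20, remaining length 2, flags, code."""
    if frame[0] != 0x20 or frame[1] != 0x02:
        raise ValueError("not a CONNACK frame")
    code = frame[3]  # frame[2] carries the session-present flag in bit 0
    return CONNACK_CODES.get(code, f"unknown code {code}")

print(decode_connack(b"\x20\x02\x00\x05"))  # Connection refused, not authorized
```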
Step 2: Capture MQTT Traffic on Failing Device
Set up packet capture on one affected ESP32 device (connected via USB serial + Wi-Fi):
```shell
# On laptop connected to ESP32
# Use tcpdump to capture traffic from ESP32's IP (192.168.1.145)
sudo tcpdump -i wlan0 host 192.168.1.145 and port 1883 -w mqtt_capture.pcap

# Trigger a connection attempt on ESP32 via serial command
screen /dev/ttyUSB0 115200
> connect_mqtt

# Wait for CONNACK error code 5
# Stop capture after 60 seconds
```
Step 3: Analyze Capture in Wireshark
Open mqtt_capture.pcap in Wireshark and apply filter: mqtt
Root Cause Identified: The ESP32 firmware is sending the correct password (“MySecurePassword123” = 19 bytes) but then appending 237 null bytes (0x00), padding the password field out to the full 256-byte buffer. The mosquitto broker sees this 256-byte blob as the password, which doesn’t match the stored credential “MySecurePassword123” (19 bytes).
The bug: sizeof(password_buffer) returns 256 (the allocated buffer size), not the actual string length. The MQTT library sends all 256 bytes, but only the first 19 are the actual password - the rest are uninitialized memory (in this case, zeros from .bss section).
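The mismatch is easy to reproduce outside the firmware. A few lines of Python mimic the sizeof-vs-strlen confusion (the 256-byte buffer size and the password string are taken from the analysis above):

```python
BUFFER_SIZE = 256  # mirrors char password_buffer[256] in the firmware

password = b"MySecurePassword123"
# What the buggy firmware sent: the entire buffer, zero-padded (sizeof semantics)
sent_by_firmware = password.ljust(BUFFER_SIZE, b"\x00")

print(len(sent_by_firmware))             # 256 bytes hit the wire
print(sent_by_firmware == password)      # False: broker rejects the padded blob
# The fix is the wire-level equivalent of strlen(): send only the real bytes
print(sent_by_firmware.rstrip(b"\x00") == password)  # True
```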
Step 7: Fix and Validate
```cpp
// File: mqtt_client.cpp (FIXED VERSION)
void MQTTClient::connect() {
    char password_buffer[256];
    strcpy(password_buffer, MQTT_PASSWORD);
    mqtt_connect(
        broker_address,
        CLIENT_ID,
        USERNAME,
        password_buffer,
        strlen(password_buffer)  // FIX: Use actual string length, not buffer size
    );
}
```
After reflashing the firmware with the fix, a new capture showed the CONNECT packet carrying exactly 19 password bytes and the broker responding with CONNACK return code 0 (connection accepted).
Resolution Timeline:
Initial report to root cause identified: 2 hours (including Wireshark analysis)
Firmware fix and testing: 4 hours
OTA rollout to 340 devices: 48 hours
Total: 56 hours from report to full resolution
Cost Impact:
Engineering time: 6 hours @ $120/hour = $720
Cloud costs (excess MQTT connection attempts before fix): $45/week
Customer support time saved: 47 tickets × 30 min/ticket × $45/hour = $1,057/week
The Lesson: Network-level packet capture with Wireshark revealed the bug in 2 hours, whereas application-level logging (“MQTT connection failed, error code 5”) gave no actionable information. The university IT department wasted 3 days investigating network issues before escalating to the IoT vendor, who solved it with packet analysis in 2 hours.
Key Wireshark Techniques Used:
Display filter mqtt to isolate MQTT traffic from other protocols
Follow TCP Stream to see human-readable MQTT message contents
Hex dump inspection to find non-printable characters and buffer issues
Packet detail pane to examine MQTT field lengths and detect oversized fields
Time column to measure broker response latency (187ms CONNECT to CONNACK - acceptable)
Decision Framework: Choosing the Right Traffic Analysis Tool for IoT Protocols
Recommended Starting Point (90% of IoT debugging):
Start with application logs (MQTT client library debug logs)
If logs show “connection failed” or “publish failed” but no details → use Wireshark
Filter by protocol: mqtt, coap, http, etc.
Look at CONNACK/response codes first (high-level success/failure)
If still unclear, inspect packet hex dumps (malformed data, buffer overflows)
Rule of Thumb: Use Wireshark when application logs say “it failed” but don’t explain WHY. Wireshark shows the actual bytes on the wire, revealing issues invisible to application code (wrong password length, malformed JSON, TLS certificate errors, etc.).
Common Mistake: Filtering Too Aggressively and Missing Root Cause
The Scenario: You’re debugging why your ESP32 weather station stops publishing MQTT messages after 24-48 hours of uptime. You capture MQTT traffic with Wireshark using filter mqtt.msgtype == 3 (only PUBLISH packets) to see when publishing stops.
Wireshark capture (filtered view):
| Time | Source | Destination | Info |
|---|---|---|---|
| 10:23:14 | 192.168.1.42 | broker | Publish Message (sensors/temp) |
| 10:24:14 | 192.168.1.42 | broker | Publish Message (sensors/temp) |
| 10:25:14 | 192.168.1.42 | broker | Publish Message (sensors/temp) |
| … 24 hours pass … | | | |
| 10:23:08 | 192.168.1.42 | broker | Publish Message (sensors/temp) |
| 10:24:08 | 192.168.1.42 | broker | Publish Message (sensors/temp) |
[24-hour gap - no more PUBLISH packets]
You conclude: “The ESP32 stops publishing after exactly 24 hours. Must be a timer overflow bug or memory leak.”
What You Missed:
You filtered out ALL non-PUBLISH MQTT traffic. If you had looked at the FULL capture (no filter), you would have seen:
| Time | Source | Destination | Protocol | Info |
|---|---|---|---|---|
| 10:24:08 | 192.168.1.42 | broker | MQTT | Publish Message (sensors/temp) |
| 10:24:22 | broker | 192.168.1.42 | MQTT | Disconnect (reason: keep alive timeout) |
| 10:24:22 | broker | 192.168.1.42 | TCP | [FIN, ACK] |
| 10:24:23 | 192.168.1.42 | broker | TCP | [RST] |
| 10:25:14 | 192.168.1.42 | broker | TCP | [SYN] - connection attempt |
| 10:25:14 | broker | 192.168.1.42 | TCP | [RST] - connection refused |
| 10:26:14 | 192.168.1.42 | broker | TCP | [SYN] |
| 10:26:14 | broker | 192.168.1.42 | TCP | [RST] |
… ESP32 keeps retrying connection, all refused …
The Real Problem: The MQTT broker disconnected the client due to a keep-alive timeout (the client didn’t send PINGREQ within the 60-second keep-alive window). After the disconnect, the ESP32 tries to reconnect but gets TCP RST (connection refused), likely because:
1. The broker has banned the client IP due to too many failed reconnection attempts
2. The broker hit its max connection limit and is refusing new connections
3. A firewall rule was triggered by the rapid reconnection pattern
See complete sequence: SYN → CONNECT → PUBLISH (x10) → PINGREQ → PINGRESP → DISCONNECT → RST
This reveals the PINGREQ/PINGRESP keep-alive mechanism stopped working, causing broker disconnect.
Phase 4: Find root cause in firmware
Knowing the keep-alive stopped, inspect ESP32 firmware:
```cpp
// mqtt_client.cpp (BUGGY VERSION)
void MQTTClient::loop() {
    if (mqtt.connected()) {
        mqtt.loop();  // Process incoming messages (and send keep-alive PINGREQ)
        // BUG: Only publish if sensor read succeeds
        if (read_sensor_success) {
            mqtt.publish("sensors/temp", sensor_data);
        }
    }
}
```
The Bug: The mqtt.loop() function handles keep-alive PINGREQ automatically, but it’s only called when mqtt.connected() is true. If the sensor read hangs for >60 seconds (e.g., I2C bus lockup), the loop() function is blocked and never calls mqtt.loop(), so PINGREQ is never sent. The broker times out and disconnects.
Lessons for Effective Capture and Filtering:
Start broad, filter narrow: Capture everything first, then apply filters to zoom in
Never filter out errors: RST, DISCONNECT, NAK packets are often the smoking gun
Use Flow Graph: Visualize packet sequence over time to see protocol state machine
Cross-layer analysis: MQTT app-layer problem (publish stops) was caused by TCP-layer disconnect, which was caused by network-layer keep-alive failure
Save full captures: When asking for help (StackOverflow, vendor support), provide full pcap file, not filtered view
Real-World Statistics: In a survey of 127 IoT engineers who use Wireshark: - 68% admitted to missing root causes by filtering too early - Average time wasted: 4.2 hours debugging with wrong filter before starting over - Most common missed issue: TCP RST packets (filtered out by focusing only on application layer)
The Golden Rule: When in doubt, capture ALL, filter NEVER (until you understand the full packet sequence). Disk space is cheap, your debugging time is expensive.
15.8 Common Pitfalls
Protocol Analysis Mistakes
1. Ignoring Timing and Sequence in Analysis
Mistake: Filtering for only MQTT PUBLISH packets to debug message delivery, missing the DISCONNECT that happened 50ms earlier which explains why publishes stopped
Why it happens: IoT protocols have complex state machines. Filtering too aggressively removes surrounding context
Solution: Start with broad captures and narrow filters gradually. Use Wireshark’s “Follow TCP Stream” to see complete conversations
2. Misinterpreting Encrypted Traffic
Mistake: Seeing TLS-encrypted MQTT traffic showing only “Application Data” and assuming the connection is working correctly, when actually authentication is failing inside the encrypted tunnel
Why it happens: TLS encryption hides payload contents, so developers see successful TCP handshakes but cannot observe MQTT-level errors
Solution: For debugging, temporarily use unencrypted connections on a test network. Configure Wireshark with TLS pre-master secrets if you control the client
3. Confusing QoS Behavior with Protocol Errors
Mistake: Seeing duplicate MQTT PUBLISH packets and assuming the broker is malfunctioning
Why it happens: Not understanding that QoS 1 “at least once” delivery intentionally retransmits if PUBACK is delayed
Solution: Check the DUP flag in duplicate packets–it’s set for legitimate retransmissions. Count retransmission rate to assess network quality
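Measuring the retransmission rate is straightforward once the PUBLISH packets are exported (for example as message-ID and DUP-flag columns via tshark). A sketch over hypothetical (message_id, dup_flag) pairs:

```python
def retransmission_rate(publishes: list[tuple[int, bool]]) -> float:
    """Fraction of QoS 1 messages that needed at least one DUP retransmission."""
    unique_ids = {msg_id for msg_id, _ in publishes}
    retransmitted = {msg_id for msg_id, dup in publishes if dup}
    return len(retransmitted) / len(unique_ids)

# Hypothetical capture: message 12 was retransmitted once (DUP set on the retry).
capture = [(10, False), (11, False), (12, False), (12, True), (13, False)]
print(f"{retransmission_rate(capture):.0%} of messages retransmitted")  # 25%
```

A rate near zero is normal on a healthy network; a rising rate points to network quality problems, not a malfunctioning broker.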
15.9 Concept Check
15.10 Concept Relationships
How This Concept Connects
Prerequisites (What You Need First):
Traffic Capture Tools: Using Wireshark, tcpdump, and tshark for capturing protocol traffic
Hands-On Exercise: Observing MQTT QoS Levels on the Wire
Capture MQTT traffic while publishing at QoS 0, QoS 1, and QoS 2, then:
Measure timing: How long between PUBLISH and final acknowledgment for each QoS level?
Simulate packet loss: Disconnect Wi-Fi during QoS 1 publish. Does it retry? How many times?
Compare message IDs: Are they sequential or random? Why?
Expected Outcome: Visual understanding of QoS trade-offs (speed vs. reliability). QoS 0 is fastest (1 packet), QoS 1 adds reliability (2 packets), QoS 2 guarantees exactly-once (4 packets).
Challenge Extension: Write a Python script that publishes 100 messages with QoS 1 and calculates the % of messages that required retransmission (DUP flag set).
Matching Exercise: Key Concepts
Order the Steps
Label the Diagram
💻 Code Challenge
15.13 Summary
MQTT analysis focuses on CONNECT/CONNACK sequences, QoS acknowledgments, and keep-alive patterns to diagnose connection and delivery issues
CoAP analysis examines confirmable vs. non-confirmable messages, observe relationships, and block transfers for constrained device communication
Zigbee analysis requires specialized sniffers and interprets mesh routing, link quality, and ZCL cluster commands
LoRaWAN analysis monitors join procedures, spreading factors, frame counters, and gateway diversity for long-range LPWAN troubleshooting
Protocol-specific filters in Wireshark (mqtt, coap, zbee_nwk, lorawan) enable focused analysis of each IoT protocol
15.14 What’s Next
Continue to Traffic Analysis Testing to learn systematic approaches for network testing, validation, and continuous monitoring with worked examples from production IoT deployments.