```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1', 'noteTextColor': '#2C3E50', 'noteBkgColor': '#fff9e6', 'textColor': '#2C3E50', 'fontSize': '14px'}}}%%
graph TB
subgraph VoIPStack["VoIP/SIP Protocol Stack"]
direction TB
subgraph Control["Signaling Layer"]
SIP["SIP<br/>Session Initiation Protocol<br/>RFC 3261"]
end
subgraph Media["Media Transport Layer"]
RTP["RTP<br/>Real-time Transport Protocol<br/>RFC 3550"]
RTCP["RTCP<br/>RTP Control Protocol<br/>Quality Feedback"]
end
subgraph Transport["Transport Layer"]
UDP["UDP<br/>User Datagram Protocol<br/>RFC 768"]
end
subgraph Network["Network Layer"]
IP["IP<br/>Internet Protocol"]
end
SIP --> UDP
RTP --> UDP
RTCP --> UDP
UDP --> IP
end
subgraph Security["Security Options"]
SRTP["SRTP<br/>Encrypted Media"]
TLS["TLS<br/>Encrypted Signaling"]
zRTP["zRTP<br/>Key Exchange"]
end
Security -.->|Protects| VoIPStack
style Control fill:#E67E22,stroke:#D35400,color:#fff
style Media fill:#16A085,stroke:#16A085,color:#fff
style Transport fill:#2C3E50,stroke:#2C3E50,color:#fff
style Network fill:#7F8C8D,stroke:#7F8C8D,color:#fff
style Security fill:#ecf0f1,stroke:#16A085,color:#2C3E50
```
1171 IoT Application Protocols: Real-time Protocols for Audio and Video
1171.1 Learning Objectives
By the end of this chapter, you will be able to:
- Understand VoIP/SIP/RTP Architecture: Explain protocols for audio/video IoT applications
- Compare RTP vs MQTT: Select appropriate protocol for real-time vs telemetry data
- Design Secure Doorbell Systems: Apply security best practices for IoT audio/video devices
- Apply Protocol Selection Principles: Choose protocols based on requirements and constraints
- Understand Visual References: Navigate protocol landscape diagrams
1171.2 Prerequisites
Before diving into this chapter, you should be familiar with:
- Protocol Overview and Comparison: Understanding MQTT, CoAP, and HTTP trade-offs
- Networking Fundamentals: UDP vs TCP, real-time communication challenges
- Transport Fundamentals: Latency, jitter, and packet loss considerations
1171.3 How This Chapter Fits
Chapter Series Navigation:
1. Introduction and Why Lightweight Protocols Matter
2. Protocol Overview and Comparison
3. REST API Design for IoT
4. Real-time Protocols (this chapter)
5. Worked Examples
This chapter extends the protocol discussion to real-time audio/video applications like smart doorbells, intercoms, and video surveillance systems.
1171.4 Real-time Protocols for IoT
While MQTT and CoAP handle most IoT data exchange scenarios, some applications require real-time audio and video streaming with strict latency requirements. Video doorbells, baby monitors, voice assistants, and intercom systems all need protocols designed specifically for continuous media streams.
1171.4.1 VoIP and SIP Architecture
Voice over IP (VoIP) enables real-time voice and video communication over IP networks. The protocol stack for VoIP consists of several layers working together:
VoIP Protocol Stack: SIP handles session control (call setup/teardown), while RTP carries the actual audio/video data over UDP. Security layers (SRTP, TLS, zRTP) protect both signaling and media streams.
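To make the split between signaling and media concrete, the sketch below builds a minimal SIP INVITE whose SDP body (the session description carried inside SIP) tells the callee which UDP port and codec to use for RTP. It is a hypothetical example: the function name, the addresses (from the 192.0.2.0/24 documentation range), and the Call-ID, tag, and branch values are placeholders, and a real user agent would also handle responses, authentication, and retransmission timers.

```python
def build_invite(caller: str, callee: str, local_ip: str, rtp_port: int,
                 call_id: str, tag: str, branch: str) -> str:
    """Build a minimal SIP INVITE (RFC 3261) carrying an SDP audio offer."""
    # SDP body: where the far end should send RTP, and which codec to use
    sdp = "\r\n".join([
        "v=0",
        f"o=- 0 0 IN IP4 {local_ip}",
        "s=Doorbell call",
        f"c=IN IP4 {local_ip}",
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 0",  # static payload type 0 = PCMU (G.711 mu-law)
        "a=rtpmap:0 PCMU/8000",
    ]) + "\r\n"
    user = caller.split("@")[0]
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:5060;branch={branch}",  # branch must start with z9hG4bK
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{local_ip}:5060>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode())}",
    ]
    # An empty line (CRLF CRLF) separates the header block from the SDP body
    return "\r\n".join(headers) + "\r\n\r\n" + sdp


invite = build_invite("doorbell@home.example", "phone@home.example", "192.0.2.10",
                      49170, "a84b4c76e66710@192.0.2.10", "1928301774", "z9hG4bK776asdhds")
print(invite.splitlines()[0])  # INVITE sip:phone@home.example SIP/2.0
```

Once the callee answers with its own SDP, both sides send RTP directly to the advertised ports; SIP does not carry the media itself. A secure deployment would send this request over TLS (Via `SIP/2.0/TLS`, port 5061) and offer `RTP/SAVP` (SRTP) instead of `RTP/AVP`.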
1171.4.2 Key Protocol Components
| Protocol | RFC | Purpose | IoT Relevance |
|---|---|---|---|
| SIP | RFC 3261 | Session Initiation Protocol - multimedia session control | Call setup for video doorbells, intercoms |
| RTP | RFC 3550 | Real-time Transport Protocol - media stream delivery | Audio/video streaming from cameras |
| RTCP | RFC 3550 | RTP Control Protocol - quality monitoring | Adaptive bitrate for constrained networks |
| UDP | RFC 768 | User Datagram Protocol - connectionless transport | Low-latency delivery (no handshake, no retransmission delays) |
| SRTP | RFC 3711 | Secure RTP - encrypted media | Privacy for baby monitors, doorbells |
| zRTP | RFC 6189 | Key exchange for SRTP | End-to-end encryption setup |
| TLS | RFC 5246 (TLS 1.2), RFC 8446 (TLS 1.3) | Transport Layer Security - encrypted signaling | Secure SIP (SIPS) on port 5061 |
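RTCP's quality feedback includes an interarrival jitter estimate in every receiver report. RFC 3550 (section 6.4.1) defines it as a running average of the change in transit time between consecutive packets, smoothed with a gain of 1/16. A minimal Python sketch, assuming arrival times in seconds and ignoring 32-bit timestamp wraparound; the function name is illustrative:

```python
def interarrival_jitter(packets, clock_rate: int) -> float:
    """RFC 3550 interarrival jitter, returned in RTP timestamp units.

    packets: (rtp_timestamp, arrival_time_seconds) pairs in arrival order.
    Simplification: timestamp wraparound is not handled.
    """
    jitter = 0.0
    prev_ts = prev_arrival = None
    for ts, arrival in packets:
        arrival_units = arrival * clock_rate  # convert wall-clock arrival to timestamp units
        if prev_ts is not None:
            # D = change in transit time between this packet and the previous one
            d = (arrival_units - prev_arrival) - (ts - prev_ts)
            jitter += (abs(d) - jitter) / 16  # exponential smoothing with gain 1/16
        prev_ts, prev_arrival = ts, arrival_units
    return jitter


# G.711 audio: 8000 Hz clock, 160 timestamp units per 20 ms packet; third packet arrives 10 ms late
j = interarrival_jitter([(0, 0.0), (160, 0.02), (320, 0.05)], clock_rate=8000)
print(j)  # 5.0 timestamp units
```

At an 8 kHz clock, 5 timestamp units is 0.625 ms. Because the estimate is smoothed, a single late packet moves it only slightly, while sustained variation (for example, a congested Wi-Fi link) pushes it up steadily, which is the signal a sender can use to adapt its bitrate.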
1171.4.3 SIP Ports and Security
| Port | Protocol | Security | Use Case |
|---|---|---|---|
| 5060 | SIP over UDP/TCP | Unencrypted | Internal/trusted networks |
| 5061 | SIPS over TLS | Encrypted | Internet-facing devices |
For IoT devices reachable from the internet (video doorbells, remote intercoms), always run SIP over TLS (SIPS, port 5061 by default) to prevent eavesdropping and session hijacking. TLS protects only the signaling; the audio/video media still needs SRTP.
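A minimal sketch of the client side of SIPS using Python's standard `ssl` module, assuming a hypothetical SIP server hostname. The settings that matter are certificate verification and hostname checking (both enabled by `ssl.create_default_context()`) plus a TLS 1.2 floor; the helper names are illustrative, and the sketch stops at the encrypted connection without sending any SIP messages.

```python
import socket
import ssl

SIPS_PORT = 5061  # default port for SIP over TLS (sips: URIs)


def make_sips_context() -> ssl.SSLContext:
    """TLS client context for SIP signaling with verification enabled."""
    # create_default_context() loads the system CA store and turns on
    # certificate verification (CERT_REQUIRED) and hostname checking
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse SSLv3, TLS 1.0, TLS 1.1
    return ctx


def open_sips_connection(host: str, port: int = SIPS_PORT) -> ssl.SSLSocket:
    """Open a verified TLS connection to a SIP server (host is a placeholder)."""
    raw = socket.create_connection((host, port), timeout=5)
    try:
        # server_hostname enables SNI and lets the context check the certificate name
        return make_sips_context().wrap_socket(raw, server_hostname=host)
    except OSError:  # includes ssl.SSLError and certificate verification failures
        raw.close()
        raise


# sock = open_sips_connection("sip.example.com")  # hypothetical server
```

Without `server_hostname` and certificate checks, TLS still encrypts but cannot stop a man-in-the-middle from impersonating the server, which defeats the purpose of moving signaling to port 5061.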
1171.4.4 Real-time IoT Applications
```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1', 'noteTextColor': '#2C3E50', 'noteBkgColor': '#fff9e6', 'textColor': '#2C3E50', 'fontSize': '14px'}}}%%
graph LR
subgraph Devices["IoT Devices with Real-time Audio/Video"]
Doorbell["🔔 Video Doorbell"]
Monitor["👶 Baby Monitor"]
Intercom["📞 Smart Intercom"]
Assistant["🎤 Voice Assistant"]
Camera["📹 Security Camera"]
end
subgraph Protocols["Protocol Selection"]
SIPbased["SIP + RTP/SRTP<br/>Two-way communication"]
RTSPbased["RTSP + RTP<br/>One-way streaming"]
WebRTC["WebRTC<br/>Browser-based"]
end
Doorbell --> SIPbased
Intercom --> SIPbased
Monitor --> RTSPbased
Camera --> RTSPbased
Assistant --> WebRTC
style Devices fill:#2C3E50,stroke:#2C3E50,color:#fff
style Protocols fill:#16A085,stroke:#16A085,color:#fff
```
Real-time Protocol Selection for IoT: Two-way communication devices (doorbells, intercoms) use SIP+RTP, while one-way streaming devices (cameras, monitors) often use RTSP+RTP. Voice assistants increasingly use WebRTC for browser compatibility.
Common IoT Use Cases:
| Device | Protocol | Why |
|---|---|---|
| Video Doorbell | SIP + SRTP | Two-way audio/video with visitor, needs secure encrypted stream |
| Baby Monitor | RTP (or RTSP) | One-way video stream, low latency critical for responsiveness |
| Smart Intercom | SIP + RTP | Full duplex audio between rooms, session-based communication |
| Voice Assistant | WebRTC or proprietary | Browser/app integration, cloud speech processing |
| Security Camera | RTSP + RTP | Continuous streaming, ONVIF standard interoperability |
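The selection rule in the table can be condensed into a few lines. This is a simplified, hypothetical helper that looks only at stream direction and client type; real products also weigh ecosystem requirements such as ONVIF, NAT traversal, and cloud relay design.

```python
def select_media_protocol(two_way: bool, browser_client: bool = False) -> str:
    """Map a device's media pattern to a protocol pairing (simplified rule of thumb)."""
    if browser_client:
        return "WebRTC"      # browser/app integration (e.g., voice assistant front ends)
    if two_way:
        return "SIP + SRTP"  # conversational audio/video: doorbell, intercom
    return "RTSP + RTP"      # one-way streaming: security camera, monitor


print(select_media_protocol(two_way=True))   # SIP + SRTP
print(select_media_protocol(two_way=False))  # RTSP + RTP
```

The helper returns SRTP rather than plain RTP for two-way devices because those streams typically leave the home network, where encryption is expected.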
1171.4.5 Comparison: Real-time vs Messaging Protocols
When should you use VoIP/RTP instead of MQTT/CoAP?
| Requirement | MQTT/CoAP | VoIP/RTP |
|---|---|---|
| Data Type | Sensor readings, commands, telemetry | Continuous audio/video streams |
| Latency Tolerance | 100 ms to seconds acceptable | < 150 ms one-way for conversation (ITU-T G.114 guideline) |
| Packet Loss | Retransmit (reliability critical) | Skip/interpolate (continuity critical) |
| Bandwidth | Low (bytes to KB per message) | High (64 kbps to 2 Mbps, continuous) |
| Connection Model | Message-based (discrete) | Session-based (continuous stream) |
| Typical Payload | JSON, CBOR, binary sensor data | PCM audio, H.264/H.265 video |
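The bandwidth row hides a per-packet cost: every RTP packet carries RTP, UDP, and IP headers on top of the codec payload. A worked sketch for G.711 audio (64 kbps, the low end of the table), assuming 20 ms of audio per packet and IPv4 without options; link-layer framing (Ethernet, Wi-Fi) is not counted, and the function name is illustrative:

```python
RTP_HEADER, UDP_HEADER, IPV4_HEADER = 12, 8, 20  # bytes per packet (IPv4 without options)


def rtp_stream_bitrate(codec_bps: int, ptime_ms: int) -> float:
    """IP-layer bitrate (bits/s) of one constant-bitrate RTP stream."""
    packets_per_second = 1000 / ptime_ms
    payload_bytes = codec_bps * ptime_ms / 1000 / 8  # codec bytes per packet
    packet_bytes = payload_bytes + RTP_HEADER + UDP_HEADER + IPV4_HEADER
    return packet_bytes * 8 * packets_per_second


print(rtp_stream_bitrate(64_000, 20))  # 80000.0 -> 80 kbps on the wire for a 64 kbps codec
print(rtp_stream_bitrate(64_000, 10))  # 96000.0 -> smaller packets, proportionally more header
```

Shorter packets reduce latency but raise header overhead, and a two-way call needs this bandwidth in each direction.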
Use RTP/SRTP when:
- Streaming continuous audio/video (doorbell live view)
- Two-way voice communication (intercom, doorbell talk)
- Latency under 150 ms is critical
- You need synchronized audio/video playback

Use MQTT when:
- Sending audio clips/recordings (doorbell motion event)
- Push notifications with audio alert
- Speech-to-text results from cloud processing
- Audio metadata (noise level, voice detection events)

Hybrid approach (common in smart doorbells):
1. Motion detected → MQTT notification to phone app
2. User opens app → SIP session established
3. Live video/audio → RTP/SRTP stream
4. User speaks → RTP audio to doorbell speaker
5. Call ends → SIP session terminated
6. Event recorded → Video clip stored, MQTT notification
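The hybrid flow above can be expressed as a tiny controller that records which protocol carries each step. `DoorbellSession` and its methods are hypothetical and generate no network traffic; the one rule the sketch enforces is that RTP media flows only inside an established SIP session, while MQTT notifications need no session at all.

```python
class DoorbellSession:
    """Hypothetical controller showing which protocol carries each step."""

    def __init__(self):
        self.in_call = False
        self.log = []  # (protocol, action) pairs in the order they happen

    def motion_detected(self):
        self.log.append(("MQTT", "notify phone: motion"))  # discrete event, no session

    def answer(self):
        self.log.append(("SIP", "INVITE / 200 OK / ACK"))  # session setup
        self.in_call = True

    def send_media(self, what: str):
        if not self.in_call:
            raise RuntimeError("RTP media requires an established SIP session")
        self.log.append(("RTP", what))

    def hang_up(self):
        self.log.append(("SIP", "BYE / 200 OK"))  # session teardown
        self.in_call = False

    def clip_saved(self):
        self.log.append(("MQTT", "notify phone: clip stored"))


d = DoorbellSession()
d.motion_detected()
d.answer()
d.send_media("live video/audio")
d.send_media("user voice to speaker")
d.hang_up()
d.clip_saved()
print([proto for proto, _ in d.log])  # ['MQTT', 'SIP', 'RTP', 'RTP', 'SIP', 'MQTT']
```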
1171.4.6 Security for Real-time IoT
Real-time audio and video streams require strong security to prevent eavesdropping:
| Security Layer | Protocol | Protects |
|---|---|---|
| Signaling Encryption | TLS (SIPS on port 5061) | Call setup, session details |
| Media Encryption | SRTP | Audio/video content |
| Key Exchange | zRTP, DTLS-SRTP | Secure key negotiation |
| Authentication | Digest auth, certificates | Caller/device identity |
Many consumer IoT devices have been compromised due to unencrypted streams. Always ensure:
- SRTP enabled - Not just RTP (encrypted vs plaintext audio/video)
- TLS for SIP - Port 5061, not 5060
- Strong authentication - Not default credentials
- End-to-end encryption - zRTP prevents cloud provider access to content
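The checklist above can be turned into an automated configuration audit. The configuration keys (`srtp_enabled`, `sip_transport`, `password`, `key_exchange`) and the default-password list are hypothetical, not any vendor's schema; adapt them to the device's real settings.

```python
# Illustrative list only; real audits should use a maintained credential blocklist
DEFAULT_PASSWORDS = {"admin", "password", "1234", "12345", ""}


def audit_media_config(cfg: dict) -> list:
    """Return findings for a hypothetical device config (keys are illustrative)."""
    findings = []
    if not cfg.get("srtp_enabled", False):
        findings.append("media not encrypted: enable SRTP")
    if cfg.get("sip_transport") != "tls":
        findings.append("signaling not encrypted: use SIP over TLS (port 5061)")
    if cfg.get("password", "") in DEFAULT_PASSWORDS:
        findings.append("default or empty credentials")
    if cfg.get("key_exchange") not in ("zrtp", "dtls-srtp"):
        findings.append("no media key exchange configured (zRTP or DTLS-SRTP)")
    return findings


weak = {"sip_transport": "udp", "password": "admin"}
for finding in audit_media_config(weak):
    print("-", finding)
```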
1171.4.7 RTP Packet Structure
Understanding RTP helps diagnose audio/video issues in IoT deployments:
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       Sequence Number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Synchronization Source (SSRC) Identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Contributing Source (CSRC) Identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Payload Data                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
| Field | Bits | Purpose |
|---|---|---|
| V | 2 | Version (always 2) |
| P | 1 | Padding flag |
| X | 1 | Extension header present |
| CC | 4 | CSRC count |
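| M | 1 | Marker (meaning set by the payload profile, e.g., last packet of a video frame) |
| PT | 7 | Payload type (codec identifier) |
| Sequence | 16 | Packet ordering/loss detection |
| Timestamp | 32 | Sampling instant in media clock units (e.g., 8 kHz clock for G.711 audio, 90 kHz for video) |
| SSRC | 32 | Stream identifier |
| CSRC | 0-15 × 32 | Contributing sources when a mixer combines streams (count given by CC) |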
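A minimal parser for the fixed header and CSRC list, using Python's `struct` module with network (big-endian) byte order. The bit masks follow the layout in the diagram; header extensions (X bit) and padding (P bit) are reported but not processed, and the class and function names are illustrative.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    csrcs: list


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header plus any CSRC identifiers."""
    if len(packet) < 12:
        raise ValueError("RTP header is at least 12 bytes")
    # B, B = first two bytes; H = 16-bit sequence; I, I = 32-bit timestamp and SSRC
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F  # low 4 bits of byte 0: CSRC count
    if len(packet) < 12 + 4 * cc:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack(f"!{cc}I", packet[12:12 + 4 * cc]))
    return RtpHeader(
        version=b0 >> 6,                # top 2 bits
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),         # top bit of byte 1
        payload_type=b1 & 0x7F,         # low 7 bits of byte 1
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
    )


# Version 2, marker set, payload type 0 (PCMU), seq 1000, timestamp 160000, hypothetical SSRC
pkt = struct.pack("!BBHII", 0x80, 0x80, 1000, 160000, 0x1234ABCD) + bytes(160)
hdr = parse_rtp_header(pkt)
print(hdr.version, hdr.marker, hdr.payload_type, hdr.sequence)  # 2 True 0 1000
```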
RTP Header Overhead: 12 bytes minimum (vs CoAP's 4 bytes, MQTT's 2 bytes), before the 8-byte UDP and 20-byte IPv4 headers that each packet also carries.

The larger header is justified for streaming media because:
- Sequence numbers detect packet loss and reordering
- Timestamps enable jitter buffering and synchronization
- SSRC identifies each stream, so several streams can share one session
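Because the sequence number is only 16 bits, it wraps from 65535 to 0 roughly every 22 minutes at 50 packets/s, so loss detection must compare numbers modulo 2^16. The sketch below counts losses the way RFC 3550 describes for RTCP reports (packets expected minus packets received), assuming the first packet received has the lowest sequence number; the function names are illustrative.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits and wrap from 65535 to 0


def seq_gap(prev_seq: int, seq: int) -> int:
    """Signed distance from prev_seq to seq, accounting for 16-bit wraparound."""
    delta = (seq - prev_seq) % SEQ_MOD
    return delta - SEQ_MOD if delta >= SEQ_MOD // 2 else delta


def count_lost(seqs) -> int:
    """Packets expected minus packets received, for arrival-ordered sequence numbers."""
    ext = 0       # unwrapped sequence number of the current packet, relative to the first
    highest = 0   # highest unwrapped sequence number seen so far
    prev = seqs[0]
    for s in seqs[1:]:
        ext += seq_gap(prev, s)  # negative for reordered (late) packets
        highest = max(highest, ext)
        prev = s
    expected = highest + 1
    return expected - len(seqs)


print(count_lost([65534, 65535, 0, 2]))  # 1 (packet 1 missing across the wrap)
print(count_lost([1, 3, 2, 4]))          # 0 (reordered, not lost)
```

A late packet is not counted as lost, and duplicated packets can make the result negative, which RFC 3550 also allows for its cumulative loss count.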
1171.5 Summary Table: Quick Reference
| Criterion | CoAP | MQTT | RTP/SIP |
|---|---|---|---|
| Best Use Case | Direct device queries | Event distribution | Audio/video streaming |
| Communication | Request-Response | Publish-Subscribe | Session-based streams |
| Transport | UDP (lightweight) | TCP (reliable) | UDP (low latency) |
| Power | Ultra-low | Low | Medium-High |
| Reliability | Optional | Built-in (QoS) | Lost packets skipped |
| Scalability | Good | Excellent | Per-session |
| Complexity | Low | Medium | High |
| Browser Support | Limited | Good (WebSockets) | WebRTC bridge |
| Setup | No broker needed | Requires broker | SIP server optional |
| Latency | Low | Medium | Ultra-low (<150ms) |
| Data Type | Sensor data | Events/telemetry | Continuous media |
1171.6 Key Takeaways
- No single protocol is always best - Evaluate based on specific requirements
- CoAP excels in constrained environments - Direct, lightweight, low power
- MQTT excels in event-driven systems - Reliable, scalable, many-to-many
- VoIP/RTP for real-time media - Video doorbells and intercoms typically use SIP+RTP, while voice assistants often use WebRTC or proprietary stacks
- Hybrid approaches are common - Use both where appropriate (MQTT for notifications, RTP for live streams)
- Consider the entire system - Not just protocol features, but network, power, and architecture
Protocol Deep Dives:
- MQTT: Pub/sub messaging
- CoAP: RESTful IoT
- AMQP: Enterprise messaging
- XMPP: Presence protocol

Protocol Selection:
- Protocol Selection Framework: Choosing protocols
- IoT Protocols Overview: Full landscape

Architecture:
- IoT Reference Models: Architecture layers
- Edge Fog Computing: Protocol placement

Interactive Tools:
- Simulations Hub: Protocol comparison tool

Learning Hubs:
- Quiz Navigator: Protocol quizzes

The following figures from the CP IoT System Design Guide provide alternative visual representations of IoT application protocol concepts covered in this chapter.
Application Protocols Overview:

CoAP vs MQTT Comparison:

Source: CP IoT System Design Guide, Chapter 4 - Application Protocols
1171.7 Visual Reference Gallery
This comprehensive overview shows how the major IoT application protocols relate to each other, helping guide protocol selection based on device constraints and communication patterns.
Understanding the fundamental differences between CoAP (RESTful, UDP-based) and MQTT (pub/sub, TCP-based) is essential for selecting the right protocol for specific IoT scenarios.
This architectural view shows how application protocols fit into the broader IoT network stack, clarifying dependencies and integration points for multi-protocol systems.
1171.8 Summary
This chapter covered application layer protocols for IoT - the languages devices speak to exchange data:
- MQTT: Publish-subscribe pattern over TCP, lightweight headers, QoS levels (0/1/2), ideal for event-driven telemetry and cloud connectivity
- CoAP: RESTful (GET/PUT/POST/DELETE) over UDP, binary headers, supports observe/multicast, perfect for constrained devices with request-response patterns
- HTTP/REST: Universal compatibility but high overhead, best for gateways and web integration
- AMQP: Enterprise-grade message queuing with guaranteed delivery, transactions, and complex routing
- VoIP/SIP/RTP: Real-time audio and video streaming over UDP with SIP session control (RFC 3261), RTP media transport (RFC 3550), and SRTP encryption for video doorbells, baby monitors, and voice assistants
- Hybrid Architectures: Combine protocols (CoAP at edge, MQTT to cloud, RTP for live streams) to leverage each protocol’s strengths
Understanding these trade-offs lets you match each communication pattern to a protocol that fits the device's capabilities.
1171.8.1 IoT Application Protocol Selection (Variant View)
This decision framework guides protocol selection based on application requirements, device constraints, and communication patterns:
```mermaid
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#E67E22', 'secondaryColor': '#16A085', 'tertiaryColor': '#7F8C8D'}}}%%
flowchart TD
START(["Application Protocol<br/>Selection"])
Q1{"Communication<br/>pattern?"}
Q2{"Device<br/>constraints?"}
Q3{"Reliability<br/>requirement?"}
Q4{"Many subscribers?"}
Q5{"Real-time<br/>media?"}
MQTT["MQTT<br/>Publish-Subscribe"]
COAP["CoAP<br/>Request-Response"]
HTTP["HTTP/REST<br/>Web Integration"]
AMQP["AMQP<br/>Enterprise Messaging"]
RTP["RTP/SIP<br/>Real-Time Media"]
MQTT_DETAILS["MQTT:<br/>• Publish-subscribe pattern<br/>• TCP transport<br/>• QoS 0/1/2 levels<br/>• 2-byte header minimum<br/>• Broker-based"]
COAP_DETAILS["CoAP:<br/>• Request-response (REST)<br/>• UDP transport<br/>• 4-byte header<br/>• Observe pattern<br/>• Proxy-friendly"]
HTTP_DETAILS["HTTP/REST:<br/>• Request-response<br/>• TCP transport<br/>• Large headers<br/>• Universal compatibility<br/>• Cacheable"]
AMQP_DETAILS["AMQP:<br/>• Publish-subscribe + queues<br/>• TCP transport<br/>• Guaranteed delivery<br/>• Transactions support<br/>• Enterprise routing"]
RTP_DETAILS["RTP/SIP:<br/>• SIP session setup<br/>• UDP transport<br/>• 12-byte RTP header<br/>• SRTP for encryption<br/>• Continuous media streams"]
START --> Q1
Q1 -->|"Events/Telemetry"| Q4
Q1 -->|"Resource Access"| Q2
Q1 -->|"Audio/Video"| Q5
Q4 -->|"Yes (fan-out)"| MQTT
Q4 -->|"No (point-to-point)"| Q3
Q2 -->|"Constrained MCU"| COAP
Q2 -->|"Gateway/PC"| HTTP
Q3 -->|"Critical (guaranteed)"| AMQP
Q3 -->|"Best effort OK"| MQTT
Q5 -->|"Yes"| RTP
Q5 -->|"No (clips, metadata)"| Q4
MQTT --> MQTT_DETAILS
COAP --> COAP_DETAILS
HTTP --> HTTP_DETAILS
AMQP --> AMQP_DETAILS
RTP --> RTP_DETAILS
style START fill:#7F8C8D,color:#fff
style Q1 fill:#2C3E50,color:#fff
style Q2 fill:#2C3E50,color:#fff
style Q3 fill:#2C3E50,color:#fff
style Q4 fill:#2C3E50,color:#fff
style Q5 fill:#2C3E50,color:#fff
style MQTT fill:#16A085,color:#fff
style COAP fill:#E67E22,color:#fff
style HTTP fill:#3498db,color:#fff
style AMQP fill:#9b59b6,color:#fff
style RTP fill:#c0392b,color:#fff
style MQTT_DETAILS fill:#d4efdf,color:#2C3E50
style COAP_DETAILS fill:#fdebd0,color:#2C3E50
style HTTP_DETAILS fill:#d6eaf8,color:#2C3E50
style AMQP_DETAILS fill:#ebdef0,color:#2C3E50
style RTP_DETAILS fill:#fadbd8,color:#2C3E50
```
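The same decision flow can be written as a function, which is handy when encoding protocol choices in design tooling or tests. The function and parameter names are illustrative; the branches mirror the flowchart's questions.

```python
def select_app_protocol(pattern: str, constrained_mcu: bool = False,
                        fan_out: bool = False, guaranteed: bool = False) -> str:
    """Walk the protocol-selection flowchart (simplified).

    pattern: "events" (telemetry), "resource" (resource access), or "media" (audio/video).
    """
    if pattern == "media":            # real-time audio/video
        return "RTP/SIP"
    if pattern == "resource":         # device constraints decide
        return "CoAP" if constrained_mcu else "HTTP/REST"
    if pattern == "events":
        if fan_out:                   # many subscribers
            return "MQTT"
        return "AMQP" if guaranteed else "MQTT"  # point-to-point: reliability requirement
    raise ValueError(f"unknown pattern: {pattern!r}")


print(select_app_protocol("resource", constrained_mcu=True))  # CoAP
print(select_app_protocol("events", guaranteed=True))         # AMQP
```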
1171.8.2 Protocol Overhead Comparison (Variant View)
This visualization compares message overhead and efficiency across IoT application protocols:
```mermaid
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#E67E22', 'secondaryColor': '#16A085', 'tertiaryColor': '#7F8C8D'}}}%%
graph TB
subgraph Header["Protocol Message Overhead (20-byte payload)"]
direction LR
H1["Lower Overhead"]
H2["→"]
H3["Higher Overhead"]
end
subgraph Minimal["Minimal Overhead"]
COAP_OH["CoAP:<br/>4-byte header<br/>24 bytes total<br/>83% efficiency"]
MQTT_OH["MQTT (QoS 0):<br/>2-byte header + topic<br/>~30 bytes total<br/>67% efficiency"]
end
subgraph Moderate["Moderate Overhead"]
MQTT_Q1["MQTT (QoS 1):<br/>~35 bytes total<br/>57% efficiency<br/>+ ACK message"]
AMQP_OH["AMQP:<br/>8-byte header + framing<br/>~50 bytes total<br/>40% efficiency"]
end
subgraph Heavy["Heavy Overhead"]
HTTP_OH["HTTP:<br/>~200+ bytes headers<br/>~220 bytes total<br/>9% efficiency"]
XMPP_OH["XMPP:<br/>~280+ bytes XML<br/>~300 bytes total<br/>7% efficiency"]
end
subgraph Impact["Battery/Bandwidth Impact"]
I1["Low power sensors:<br/>Use CoAP or MQTT QoS 0<br/>Minimize TX time"]
I2["Reliable delivery:<br/>MQTT QoS 1/2 or AMQP<br/>Accept overhead"]
I3["Web integration:<br/>HTTP at gateway<br/>Not on sensors"]
end
Minimal --> Impact
Moderate --> Impact
Heavy --> Impact
style Header fill:#f9f9f9,stroke:#2C3E50
style Minimal fill:#16A085,color:#fff
style Moderate fill:#E67E22,color:#fff
style Heavy fill:#c0392b,color:#fff
style Impact fill:#7F8C8D,color:#fff
style COAP_OH fill:#d4efdf,color:#2C3E50
style MQTT_OH fill:#d4efdf,color:#2C3E50
style MQTT_Q1 fill:#fdebd0,color:#2C3E50
style AMQP_OH fill:#fdebd0,color:#2C3E50
style HTTP_OH fill:#fadbd8,color:#2C3E50
style XMPP_OH fill:#fadbd8,color:#2C3E50
style I1 fill:#e8e8e8,color:#2C3E50
style I2 fill:#e8e8e8,color:#2C3E50
style I3 fill:#e8e8e8,color:#2C3E50
```
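The efficiency figures in the overhead comparison are simply payload divided by total message size. A quick sketch that reproduces them from the approximate totals shown (these byte counts are the diagram's estimates, not measurements) and, assuming one message per minute, shows the daily traffic for a single sensor:

```python
PAYLOAD = 20  # bytes of sensor data per message

# Approximate total message sizes in bytes (payload included), per protocol
TOTALS = {"CoAP": 24, "MQTT QoS 0": 30, "MQTT QoS 1": 35, "AMQP": 50, "HTTP": 220, "XMPP": 300}
MESSAGES_PER_DAY = 24 * 60  # assumption: one reading per minute


def efficiency(payload: int, total: int) -> float:
    """Fraction of transmitted bytes that are useful payload."""
    return payload / total


for name, total in TOTALS.items():
    kib_per_day = total * MESSAGES_PER_DAY / 1024
    print(f"{name:<11} {total:>4} B  {efficiency(PAYLOAD, total):.0%}  {kib_per_day:.1f} KiB/day")
```

Since radio transmit time dominates a battery-powered sensor's energy budget, the roughly 9x difference in bytes between CoAP and HTTP translates directly into airtime, which is why the diagram recommends keeping HTTP at the gateway.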
1171.9 Summary
This chapter explored real-time protocols for IoT audio and video applications:
Key topics:
- RTP (Real-time Transport Protocol): UDP-based protocol for audio/video streaming with timing and sequencing
- SIP (Session Initiation Protocol): Call setup, management, and teardown for VoIP systems
- Port assignments: SIP (5060), SIPS over TLS (5061), and RTP ports negotiated dynamically per session
- RTP vs MQTT comparison: When to use streaming protocols vs telemetry protocols
- Security best practices: SRTP, TLS, authentication, and network isolation for IoT doorbells
- Protocol selection principles: Quick reference tables and decision frameworks
1171.10 What’s Next?
Complete the application protocols series:
- Worked Examples: Agricultural sensor network protocol selection case study
For related topics:
- MQTT Fundamentals
- CoAP Fundamentals and Architecture
- Privacy and Security