%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#E67E22', 'signalColor': '#16A085', 'actorLineColor': '#2C3E50', 'fontSize': '11px'}}}%%
sequenceDiagram
participant S as Source<br/>(End Device)
participant R1 as Router 1
participant R2 as Router 2
participant D as Destination<br/>(Coordinator)
Note over S: No route to Destination
rect rgb(230, 126, 34)
Note over S,D: Phase 1: Route Request (RREQ) Flood
S->>R1: RREQ: "Looking for Coordinator"
S->>R2: RREQ (broadcast)
R1->>R2: RREQ (forward)
R1->>D: RREQ (forward)
R2->>D: RREQ (forward)
end
rect rgb(22, 160, 133)
Note over S,D: Phase 2: Route Reply (RREP) Unicast
D->>R1: RREP: "Route found, 1 hop"
R1->>S: RREP: "Route found, 2 hops"
end
Note over S: Route cached: S → R1 → D
rect rgb(44, 62, 80)
Note over S,D: Phase 3: Data Transmission
S->>R1: DATA "Temperature: 23°C"
R1->>D: DATA (forwarded)
D->>R1: ACK
R1->>S: ACK
end
985 Zigbee Routing and Self-Healing
AODV routing protocol, path discovery, and automatic mesh recovery
985.1 Learning Objectives
By the end of this chapter, you will be able to:
- Explain how AODV (Ad-hoc On-Demand Distance Vector) routing works in Zigbee
- Describe the route discovery process using RREQ and RREP messages
- Understand how Zigbee mesh networks self-heal when devices fail
- Calculate expected routing latency for multi-hop paths
- Design networks with adequate redundancy for reliable self-healing
985.2 Introduction
Zigbee’s mesh networking capability relies on sophisticated routing protocols to deliver messages across multiple hops. The primary routing mechanism is AODV (Ad-hoc On-Demand Distance Vector), which discovers routes only when needed, saving memory and reducing overhead on resource-constrained devices.
Imagine you need to send a package across the country, but there’s no direct route. Instead, the package travels through multiple cities:
Your city → City A → City B → City C → Destination
Each city is like a Zigbee Router, and the package is your message. Routing protocols figure out:
1. Which cities (routers) to use
2. What to do if a city (router) is unavailable
3. How to find the best path
AODV is the “GPS” of Zigbee - it finds routes when you need them.
985.3 AODV Routing Protocol
AODV (Ad-hoc On-Demand Distance Vector) is a reactive routing protocol - it discovers routes only when traffic needs to flow, rather than maintaining routes to all destinations proactively.
985.3.1 Why On-Demand Routing?
| Approach | Memory Usage | Network Traffic | Route Freshness |
|---|---|---|---|
| Proactive | High (all routes) | High (periodic updates) | Always fresh |
| Reactive (AODV) | Low (active routes only) | Low (on-demand) | Fresh when used |
For resource-constrained Zigbee devices with 8-32 KB of RAM, reactive routing is essential.
985.3.2 Route Discovery Process
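When a device needs to send a message to a destination for which it has no known route, discovery runs in three phases: the source floods a Route Request (RREQ), the destination answers with a unicast Route Reply (RREP), and the source then sends its data along the cached route.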
985.3.3 RREQ (Route Request)
When a device needs a route, it broadcasts an RREQ:
RREQ Message Contents:
- Source Address: 0x0023 (the sensor)
- Destination Address: 0x0000 (the coordinator)
- Sequence Number: 12345 (prevents loops)
- Hop Count: 0 (incremented at each hop)
- TTL: 5 (maximum hops allowed)
RREQ Propagation:
1. Source broadcasts RREQ to all neighbors
2. Each Router that receives the RREQ:
   - Checks if it’s the destination → If yes, send RREP
   - Checks if it has already seen this RREQ (by sequence number) → If yes, drop
   - Otherwise, increments the hop count and rebroadcasts
3. RREQ floods through the network until the destination is reached
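The propagation rules above can be sketched as a breadth-first flood. This is a simplified model, not a Zigbee stack: `flood_rreq` and `MESH` are hypothetical names, the topology follows the four-node example, and the address 0x0002 for Router 2 is an assumption.

```python
from collections import deque

def flood_rreq(neighbors, source, destination, ttl=5):
    """Simulate an RREQ flood; return {node: (previous_hop, hop_count)}."""
    seen = {source: (None, 0)}  # nodes that have already handled this RREQ
    queue = deque([source])
    while queue:
        node = queue.popleft()
        _, hops = seen[node]
        if node == destination or hops >= ttl:
            continue  # destination answers with an RREP; TTL exhausted means drop
        for nxt in neighbors[node]:
            if nxt in seen:
                continue  # duplicate RREQ: drop instead of rebroadcasting
            seen[nxt] = (node, hops + 1)  # remember where the RREQ came from
            queue.append(nxt)
    return seen

# Sensor 0x0023, Router 1 0x0001, Router 2 0x0002 (assumed), coordinator 0x0000
MESH = {
    0x0023: [0x0001, 0x0002],
    0x0001: [0x0023, 0x0002, 0x0000],
    0x0002: [0x0023, 0x0001, 0x0000],
    0x0000: [0x0001, 0x0002],
}
seen = flood_rreq(MESH, source=0x0023, destination=0x0000)
```

In this model the coordinator first hears the request via Router 1 at hop count 2, and lowering `ttl` to 1 stops the flood before it reaches the coordinator.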
985.3.4 RREP (Route Reply)
When the destination (or a router with a fresh route) receives the RREQ:
RREP Message Contents:
- Source Address: 0x0000 (coordinator)
- Destination Address: 0x0023 (original requester)
- Hop Count: 2 (hops from destination)
- Route Lifetime: 60 seconds (how long to cache route)
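RREP Propagation:
1. Destination sends the RREP back along the path on which the RREQ arrived (each Router recorded this reverse route toward the source during the flood)
2. Each Router that relays the RREP stores the forward route (toward the destination)
3. RREP travels unicast (not broadcast), which keeps the reply cheap
4. Source receives the RREP and caches the route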
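As a rough illustration of the reply phase, the sketch below walks an RREP hop by hop back to the source and records, at each node it passes, the next hop toward the destination. `send_rrep` and `reverse_hop` are hypothetical names; `reverse_hop` stands for the "where did the RREQ come from" pointers left behind by the flood.

```python
def send_rrep(reverse_hop, source, destination):
    """Unicast an RREP back to the source.

    reverse_hop maps each node to the neighbor its RREQ copy arrived from.
    Returns {node: (next_hop_toward_destination, hop_count)}.
    """
    forward = {}
    node, hops = destination, 0
    while node != source:
        prev = reverse_hop[node]      # one unicast hop back toward the source
        hops += 1                     # hop count grows as the RREP travels
        forward[prev] = (node, hops)  # prev now reaches the destination via node
        node = prev
    return forward

# Reverse pointers for the path Sensor (0x0023) -> Router 1 (0x0001) -> Coordinator (0x0000)
reverse_hop = {0x0000: 0x0001, 0x0001: 0x0023}
routes = send_rrep(reverse_hop, source=0x0023, destination=0x0000)
```

Router 1 ends up with a 1-hop route to the coordinator and the sensor with a 2-hop route via Router 1, matching the hop counts quoted in the RREP messages.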
985.3.5 Route Table Entries
After route discovery, each device stores route information:
Router 1 Routing Table:
| Destination | Next Hop | Hop Count | Lifetime |
|-------------|----------|-----------|----------|
| 0x0000 | 0x0000 | 1 | 60s |
| 0x0023 | 0x0023 | 1 | 60s |
Sensor Routing Table:
| Destination | Next Hop | Hop Count | Lifetime |
|-------------|----------|-----------|----------|
| 0x0000 | 0x0001 | 2 | 60s |
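One way to picture these tables in code is a dictionary keyed by destination address. The `RouteEntry` class and `next_hop` helper below are illustrative names, not part of any Zigbee API.

```python
from dataclasses import dataclass

@dataclass
class RouteEntry:
    next_hop: int    # 16-bit address of the neighbor to forward to
    hop_count: int   # hops remaining to the destination
    lifetime_s: int  # seconds before the entry expires

# The two tables shown above
router1_table = {
    0x0000: RouteEntry(next_hop=0x0000, hop_count=1, lifetime_s=60),
    0x0023: RouteEntry(next_hop=0x0023, hop_count=1, lifetime_s=60),
}
sensor_table = {
    0x0000: RouteEntry(next_hop=0x0001, hop_count=2, lifetime_s=60),
}

def next_hop(table, destination):
    """Return the next-hop address, or None when route discovery is needed."""
    entry = table.get(destination)
    return entry.next_hop if entry else None
```

A lookup that returns `None` is the trigger for the RREQ flood described earlier.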
985.4 Route Maintenance
Routes don’t last forever. AODV includes mechanisms for maintaining valid routes:
985.4.1 Route Expiration
Route Lifecycle:
1. Route discovered → Lifetime timer starts (60s typical)
2. Data transmitted → Timer reset to full value
3. No activity → Timer counts down
4. Timer expires → Route marked invalid
5. Next transmission → New route discovery required
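The lifecycle above amounts to a timer that is reset on every use. A minimal sketch, assuming the 60-second lifetime quoted above and passing the current time in explicitly so the behavior is easy to follow (`TimedRoute` is a hypothetical class):

```python
class TimedRoute:
    """Route entry whose lifetime timer restarts whenever data is sent."""

    def __init__(self, next_hop, now, lifetime_s=60.0):
        self.next_hop = next_hop
        self.lifetime_s = lifetime_s
        self.expires_at = now + lifetime_s  # step 1: timer starts on discovery

    def is_valid(self, now):
        return now < self.expires_at        # steps 3-4: expires when the timer runs out

    def use(self, now):
        """Return the next hop and reset the timer, or None if the route expired."""
        if not self.is_valid(now):
            return None                     # step 5: caller must rediscover
        self.expires_at = now + self.lifetime_s  # step 2: reset to full value
        return self.next_hop

route = TimedRoute(next_hop=0x0001, now=0.0)
hop_at_30s = route.use(now=30.0)  # traffic at t=30s pushes expiry to t=90s
```

Because the transmission at 30 seconds resets the timer, the route is still valid at 80 seconds and only lapses at 90 seconds.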
985.4.2 Route Error (RERR)
When a link fails (device offline, out of range), the detecting device sends RERR:
RERR Trigger Conditions:
- MAC-layer ACK not received after retries
- Neighbor timeout (no heartbeats)
- Explicit device leave notification
RERR Message:
- Unreachable Destination: 0x0001 (failed router)
- Affected Routes: List of destinations via failed router
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#E67E22', 'signalColor': '#E67E22', 'actorLineColor': '#2C3E50', 'fontSize': '11px'}}}%%
sequenceDiagram
participant S as Sensor
participant R1 as Router 1
participant R2 as Router 2
participant C as Coordinator
Note over R1: Router 1 FAILS!
S->>R1: DATA (attempt 1)
Note over S: No ACK...
S->>R1: DATA (attempt 2)
Note over S: No ACK...
S->>R1: DATA (attempt 3)
Note over S: No ACK - Link failed!
rect rgb(230, 126, 34)
Note over S,C: Route Error & Recovery
S->>S: Mark route via R1 invalid
S->>R2: RREQ: "New route to Coordinator?"
R2->>C: RREQ (forward)
C->>R2: RREP: "1 hop via me"
R2->>S: RREP: "2 hops"
end
Note over S: New route: S → R2 → C
S->>R2: DATA (via new route)
R2->>C: DATA (forwarded)
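The invalidation step can be sketched as a scan of the routing table for entries that use the failed neighbor. `handle_link_failure` is a hypothetical helper, and the addresses 0x0002, 0x0045 and 0x0067 are made up for the example; the returned list plays the role of the "Affected Routes" field in the RERR.

```python
def handle_link_failure(routing_table, failed_next_hop):
    """Drop every route via the failed neighbor and return the affected destinations.

    routing_table maps destination address -> next-hop address.
    """
    affected = sorted(
        dest for dest, hop in routing_table.items() if hop == failed_next_hop
    )
    for dest in affected:
        del routing_table[dest]  # route marked invalid; rediscovery needed
    return affected

# Two destinations are reached through Router 1 (0x0001), one through another router
table = {0x0000: 0x0001, 0x0045: 0x0001, 0x0067: 0x0002}
unreachable = handle_link_failure(table, failed_next_hop=0x0001)
```

After the call only the route that avoided Router 1 remains, and the next message to an affected destination starts a new RREQ.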
985.5 Self-Healing Mesh
Self-healing is one of Zigbee’s most valuable features for reliability-critical deployments.
985.5.1 Self-Healing Timeline
When a Router fails, the network recovers automatically:
T = 0: Router fails (power loss, damage, interference)
T = 0-300ms: Failure detection
- Devices sending to failed router don't receive ACKs
- 3 retries × 100ms timeout = 300ms
T = 300ms-3s: Route invalidation
- Devices mark routes via failed router as invalid
- RERR propagates to affected sources
T = 3-10s: Route rediscovery
- Affected devices broadcast new RREQs
- Parallel discovery for all affected routes
T = 10s: Network recovered
- All devices have new routes
- Traffic flows through alternate paths
985.5.2 Redundancy Design
For reliable self-healing, design networks with path redundancy:
Minimum Redundancy (N+1):
Every end device should have at least 2 routers in range
If Router A fails → Route through Router B
Recommended Redundancy (N+2 or N+3):
3-4 routers in range of each end device
Multiple alternate paths available
Faster recovery, better load balancing
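A redundancy rule like this is easy to check against site-survey data before deployment. The sketch below assumes you have a mapping from each end device to the routers it can hear; the device and router names are invented for illustration.

```python
def under_protected(routers_in_range, minimum=2):
    """Return end devices that hear fewer than `minimum` distinct routers."""
    return sorted(
        device
        for device, routers in routers_in_range.items()
        if len(set(routers)) < minimum
    )

# Hypothetical survey results: device -> routers within radio range
site_survey = {
    "sensor-hall": ["R1", "R2", "R3"],
    "sensor-lab": ["R2", "R4"],
    "sensor-attic": ["R4"],
}
needs_backup = under_protected(site_survey)            # N+1 check
needs_more = under_protected(site_survey, minimum=3)   # recommended level
```

Here only `sensor-attic` fails the minimum check, while `sensor-lab` also falls short of the recommended 3 routers.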
985.5.3 Self-Healing Verification
Test your network’s self-healing capability:
Test Procedure:
1. Monitor message delivery rate (baseline)
2. Disable one router (power off)
3. Measure recovery time (messages start flowing again)
4. Verify new routes in device tables
5. Re-enable router, verify mesh rebalances
Expected Results:
- Recovery time: 5-15 seconds
- Message loss during recovery: 1-5 messages
- Full restoration of delivery rate
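To sanity-check measured results against these expectations, the arithmetic can be written down directly. This is a back-of-the-envelope sketch: the retry count and timeout come from the self-healing timeline, while the reporting interval is an assumption you would replace with your device's real value.

```python
MAC_RETRIES = 3        # retries before the link is declared failed
ACK_TIMEOUT_MS = 100   # wait per attempt

def detection_time_ms(retries=MAC_RETRIES, timeout_ms=ACK_TIMEOUT_MS):
    """Time until a sender gives up on a silent neighbor."""
    return retries * timeout_ms

def messages_lost(recovery_s, report_interval_s):
    """Reports generated while no route exists (assumes no buffering or resend)."""
    return int(recovery_s // report_interval_s)

detect = detection_time_ms()
lost = messages_lost(recovery_s=10, report_interval_s=5)  # assumed 5 s reporting
```

With a 10-second recovery and one report every 5 seconds, about 2 messages are lost, which sits inside the expected 1-5 range.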
985.6 Latency Considerations
Multi-hop routing adds latency. Understanding this helps set appropriate expectations:
985.6.1 Per-Hop Latency
| Component | Time |
|---|---|
| CSMA/CA backoff | 5-20ms |
| Transmission (127 bytes @ 250 kbps) | ~4ms |
| Processing at router | 1-5ms |
| Total per hop | 10-30ms |
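The transmission figure follows from frame size and bit rate, and the per-hop total is the sum of the three components. A small sketch of that arithmetic (function names are illustrative):

```python
def tx_time_ms(frame_bytes=127, bitrate_bps=250_000):
    """Air time of one frame in milliseconds."""
    return frame_bytes * 8 / bitrate_bps * 1000

def per_hop_ms(backoff_ms, processing_ms, frame_bytes=127):
    """Backoff + transmission + router processing for a single hop."""
    return backoff_ms + tx_time_ms(frame_bytes) + processing_ms

best_case = per_hop_ms(backoff_ms=5, processing_ms=1)
worst_case = per_hop_ms(backoff_ms=20, processing_ms=5)
```

A full 127-byte frame takes about 4.06 ms on air, so the per-hop total lands at roughly 10 ms in the best case and 29 ms in the worst, in line with the 10-30ms row.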
985.6.2 End-to-End Latency Examples
| Path Length | Typical Latency | Worst Case |
|---|---|---|
| 1 hop | 10-30ms | 50ms |
| 2 hops | 20-60ms | 100ms |
| 3 hops | 30-90ms | 150ms |
| 5 hops | 50-150ms | 250ms |
985.6.3 First Message Latency
The first message to a new destination incurs route discovery overhead:
Route Discovery Time:
- RREQ flood: 100-500ms (depends on network size)
- RREP return: 50-200ms
- Total: 150-700ms (first message)
Subsequent Messages:
- Use cached route: 10-30ms per hop
Design Implication: For latency-sensitive applications, keep hop counts low (3-4 hops maximum) and consider pre-establishing routes during network formation.
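The figures above can be combined into a quick estimator. This sketch assumes the first message pays the full discovery time and then the normal per-hop delay, which is a simplification; `latency_range_ms` is a hypothetical helper.

```python
PER_HOP_MS = (10, 30)      # typical per-hop range
DISCOVERY_MS = (150, 700)  # RREQ flood plus RREP return

def latency_range_ms(hops, first_message=False):
    """Return (low, high) end-to-end latency in milliseconds."""
    low = hops * PER_HOP_MS[0]
    high = hops * PER_HOP_MS[1]
    if first_message:
        # Assumption: discovery completes before the data frame is sent
        low += DISCOVERY_MS[0]
        high += DISCOVERY_MS[1]
    return low, high

cached = latency_range_ms(3)
first = latency_range_ms(3, first_message=True)
```

For a 3-hop path this gives 30-90 ms with a cached route and 180-790 ms for the first message, which is why pre-establishing routes matters for latency-sensitive traffic.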
985.7 Alternative Routing Methods
While AODV is the primary routing protocol, Zigbee supports additional methods:
985.7.1 Tree Routing
Hierarchical routing based on network address structure:
Address-Based Routing:
- Coordinator: 0x0000
- Router A (child of Coordinator): 0x1000
- Router B (child of Router A): 0x1100
- End Device (child of Router B): 0x1110
Route to 0x1110:
0x0000 → 0x1000 → 0x1100 → 0x1110
(Follow address hierarchy)
Advantage: No route discovery needed - address implies route
Disadvantage: Rigid structure, no alternate paths
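To show how an address can imply a route, the sketch below derives the downstream path from the simplified one-hex-digit-per-level numbering used in this example. Real Zigbee tree addressing allocates address blocks differently, so treat `tree_path` as an illustration of the idea only.

```python
def tree_path(destination):
    """Path from the coordinator (0x0000) down to `destination`.

    Assumes each tree level fills in one more hex digit of the 16-bit
    address, as in the example addresses above.
    """
    path = [0x0000]
    for level in range(4):
        mask = (0xFFFF << (12 - 4 * level)) & 0xFFFF  # keep the top level+1 digits
        prefix = destination & mask                   # ancestor at this depth
        if prefix != path[-1]:
            path.append(prefix)
    return path

route = [hex(addr) for addr in tree_path(0x1110)]
```

The result lists the coordinator, then 0x1000, 0x1100 and 0x1110, the same hierarchy followed in the example route.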
985.7.2 Source Routing
Sender specifies the complete path in the packet:
Source Route Header:
- Path: [0x0001, 0x0002, 0x0003, 0x0000]
- Each router forwards to next in list
Advantage: Predictable path, no route lookups at routers
Disadvantage: Larger packet headers, sender must know full path
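Forwarding under source routing needs no table at all: each node finds itself in the path and hands the packet to the following entry. A minimal sketch with hypothetical helper names; the overhead estimate counts only the 16-bit addresses in the list and ignores any other header fields.

```python
PATH = [0x0001, 0x0002, 0x0003, 0x0000]  # path from the example header

def forward_source_routed(path, current):
    """Return the next hop for `current`, or None at the final destination."""
    i = path.index(current)
    return path[i + 1] if i + 1 < len(path) else None

def path_overhead_bytes(path):
    """Bytes used by the address list alone (2 bytes per 16-bit address)."""
    return 2 * len(path)

hop_after_r2 = forward_source_routed(PATH, 0x0002)
overhead = path_overhead_bytes(PATH)
```

The 8 bytes of addresses for this 4-entry path illustrate the header cost that grows with every extra hop.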
985.7.3 Many-to-One Routing
Optimized for sensor networks where all traffic flows to a central collector:
Coordinator broadcasts a many-to-one route request
Each router records a route back toward the coordinator
Devices send Route Record messages reporting their path to the coordinator
Coordinator builds a map of the reported paths
Routes from any device to coordinator pre-established
Advantage: Efficient for data collection scenarios
Disadvantage: Only optimizes traffic toward coordinator
985.8 Routing Best Practices
985.8.1 Network Design
- Limit hop count: Design for maximum 5-7 hops
- Ensure redundancy: 2-3 routers in range of each device
- Place routers strategically: Hallways, central locations
- Avoid bottlenecks: Multiple paths between network regions
985.8.2 Monitoring
Track these metrics to identify routing issues:
| Metric | Healthy | Warning | Critical |
|---|---|---|---|
| Average hop count | 2-3 | 4-5 | 6+ |
| Route discovery rate | < 1/min | 1-5/min | > 5/min |
| Route failures | < 1/hour | 1-5/hour | > 5/hour |
| Message delivery | > 99% | 95-99% | < 95% |
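These thresholds can be turned into a simple health check. The sketch below is one possible encoding: delivery is expressed as loss percentage so that higher is always worse, and values that fall between table bands (for example an average of 3.5 hops) are treated as the lower band, which is an assumption.

```python
# metric -> (warning starts at, critical above)
THRESHOLDS = {
    "avg_hops": (4, 5),             # 2-3 healthy, 4-5 warning, 6+ critical
    "discoveries_per_min": (1, 5),
    "failures_per_hour": (1, 5),
    "loss_percent": (1, 5),         # 100 minus message delivery percentage
}

def classify(metric, value):
    """Map a measured value to 'healthy', 'warning', or 'critical'."""
    warning_from, critical_above = THRESHOLDS[metric]
    if value > critical_above:
        return "critical"
    if value >= warning_from:
        return "warning"
    return "healthy"

report = {
    "avg_hops": classify("avg_hops", 2.5),
    "discoveries_per_min": classify("discoveries_per_min", 3),
    "failures_per_hour": classify("failures_per_hour", 8),
    "loss_percent": classify("loss_percent", 100 - 97),  # 97% delivery
}
```

In this example the hop count is healthy, discovery rate and delivery are at warning level, and the route failure rate is critical.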
985.8.3 Troubleshooting
Common routing problems and solutions:
| Symptom | Likely Cause | Solution |
|---|---|---|
| High latency | Too many hops | Add routers to reduce hop count |
| Frequent route changes | Marginal links | Improve router placement |
| Devices dropping offline | Insufficient redundancy | Add backup routers |
| Recovery too slow | Network too large | Segment into multiple PANs |
985.9 Summary
This chapter covered Zigbee routing and self-healing:
- AODV Protocol: On-demand route discovery saves memory and bandwidth
- Route Discovery: RREQ broadcasts find paths, RREP unicasts establish routes
- Route Maintenance: Lifetime timers and RERR messages keep routes valid
- Self-Healing: Automatic recovery in 5-15 seconds when paths fail
- Latency: 10-30ms per hop, plus discovery overhead for first messages
Key design principles:
- Plan for redundancy (2-3 routers per end device)
- Limit maximum hop count to 5-7
- Test self-healing before deployment
- Monitor routing metrics in production
985.10 What’s Next
In the next chapter, Zigbee Application Profiles, we explore how ZHA, ZLL, and Zigbee 3.0 profiles enable device interoperability across manufacturers.
- Zigbee Network Topologies - Star, tree, and mesh configurations
- Zigbee Network Formation - Device joining process
- Zigbee Security - Encrypted routing