AODV routing protocol, path discovery, and automatic mesh recovery
In 60 Seconds
Zigbee uses AODV (Ad-hoc On-Demand Distance Vector) routing to discover paths through the mesh network. When a device needs to send data, it broadcasts a Route Request (RREQ); intermediate routers forward it until the destination replies with a Route Reply (RREP), establishing the path. If a link breaks, the mesh self-heals by triggering a new route discovery. This on-demand approach conserves bandwidth (no periodic routing updates) but adds latency for the first message to a new destination.
20.1 Learning Objectives
By the end of this chapter, you will be able to:
Explain how AODV (Ad-hoc On-Demand Distance Vector) routing discovers paths on demand in Zigbee networks
Trace the route discovery process through RREQ broadcast, RREP unicast, and routing table creation
Analyse how Zigbee mesh networks detect link failures and self-heal through RERR propagation and route rediscovery
Calculate expected routing latency for multi-hop paths including route discovery overhead
Design networks with adequate path redundancy to ensure reliable self-healing under router failure
20.2 Introduction
Zigbee’s mesh networking capability relies on sophisticated routing protocols to deliver messages across multiple hops. The primary routing mechanism is AODV (Ad-hoc On-Demand Distance Vector), which discovers routes only when needed, saving memory and reducing overhead on resource-constrained devices.
For Beginners: What is Routing?
Imagine you need to send a package across the country, but there’s no direct route. Instead, the package travels through multiple cities:
Your city → City A → City B → City C → Destination
Each city is like a Zigbee Router, and the package is your message. Routing protocols figure out:
1. Which cities (routers) to use
2. What to do if a city (router) is unavailable
3. How to find the best path
AODV is the “GPS” of Zigbee - it finds routes when you need them.
20.3 AODV Routing Protocol
AODV (Ad-hoc On-Demand Distance Vector) is a reactive routing protocol - it discovers routes only when traffic needs to flow, rather than maintaining routes to all destinations proactively.
20.3.1 Why On-Demand Routing?
| Approach | Memory Usage | Network Traffic | Route Freshness |
|----------|--------------|-----------------|-----------------|
| Proactive | High (all routes) | High (periodic updates) | Always fresh |
| Reactive (AODV) | Low (active routes only) | Low (on-demand) | Fresh when used |
For resource-constrained Zigbee devices with 8-32KB RAM, reactive routing is essential.
20.3.2 Route Discovery Process
When a device needs to send a message to a destination without a known route:
Figure 20.1: AODV route discovery showing RREQ broadcast, RREP unicast, and subsequent data transmission
20.3.3 RREQ (Route Request)
When a device needs a route, it broadcasts an RREQ:
RREQ Message Contents:
- Source Address: 0x0023 (the sensor)
- Destination Address: 0x0000 (the coordinator)
- RREQ ID: 42 (unique per source, used with Source Address to detect duplicates)
- Hop Count: 0 (incremented at each hop)
- Radius: 5 (maximum hops allowed, called TTL in generic AODV)
RREQ Propagation:
1. Source broadcasts the RREQ to all neighbors
2. Each Router that receives the RREQ:
   - Checks whether it is the destination → if yes, sends an RREP
   - Checks whether it has already seen this RREQ (by Source Address + RREQ ID pair) → if yes, drops it
   - Otherwise, increments the hop count and rebroadcasts
3. The RREQ floods through the network until the destination is reached
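The per-router decision in the steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not a Zigbee stack's implementation: the `Rreq` class, the `handle_rreq` function, and its action strings are hypothetical names, and the radius rule (stop once the incremented hop count reaches the radius) is an assumed interpretation of "maximum hops allowed".

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rreq:
    source: int       # device that needs a route
    destination: int  # device being searched for
    rreq_id: int      # unique per source
    hop_count: int    # incremented at each hop
    radius: int       # maximum hops allowed


def handle_rreq(my_addr, rreq, seen):
    """Decide what one router does with a received RREQ.

    `seen` is this router's set of (source, rreq_id) pairs.
    Returns (action, rreq_to_rebroadcast_or_None).
    """
    if my_addr == rreq.destination:
        return "reply", None                 # answer with an RREP
    key = (rreq.source, rreq.rreq_id)
    if key in seen:
        return "drop", None                  # duplicate of an earlier copy
    seen.add(key)
    forwarded = replace(rreq, hop_count=rreq.hop_count + 1)
    if forwarded.hop_count >= rreq.radius:
        return "drop", None                  # assumed radius rule (see lead-in)
    return "rebroadcast", forwarded
```

Calling `handle_rreq(0x0001, Rreq(0x0023, 0x0000, 42, 0, 5), set())` yields a rebroadcast with the hop count raised to 1; a second copy of the same request at the same router is dropped.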
20.3.4 RREP (Route Reply)
When the destination (or a router with a fresh route) receives the RREQ:
RREP Message Contents:
- Source Address: 0x0000 (coordinator)
- Destination Address: 0x0023 (original requester)
- Hop Count: 2 (hops from destination)
- Route Lifetime: 60 seconds (how long to cache route)
RREP Propagation:
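Destination sends the RREP back along the path on which the RREQ arrived
Each Router on that path relays the RREP using the reverse route (toward the source) it recorded during the RREQ flood, and stores the forward route (toward the destination)
The RREP travels unicast (not broadcast), so it does not flood the network
Source receives the RREP and caches the route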
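To make the two phases concrete, here is a small Python simulation of discovery on a static topology. It is a sketch under simplifying assumptions: the function name `discover_route` and the `links` dictionary are hypothetical, every hop is assumed to take the same time (so the flood behaves like a breadth-first search), and the path metric is hop count only, whereas a real stack also weighs link cost.

```python
from collections import deque


def discover_route(links, source, destination):
    """Flood an RREQ from `source`, then unicast an RREP back.

    links: dict mapping each address to the neighbours in radio range.
    Returns {node: next_hop_toward_destination} for nodes on the path,
    or None if the destination cannot be reached.
    """
    # RREQ phase: each node remembers the neighbour it first heard the
    # request from; that pointer is its reverse route toward the source.
    reverse = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            break
        for neighbour in links[node]:
            if neighbour not in reverse:   # later copies are dropped
                reverse[neighbour] = node
                queue.append(neighbour)
    if destination not in reverse:
        return None

    # RREP phase: walk the reverse pointers from destination to source,
    # installing a forward route at every hop along the way.
    forward = {}
    node = destination
    while reverse[node] is not None:
        previous = reverse[node]
        forward[previous] = node
        node = previous
    return forward
```

With a sensor at 0x0023 that can reach the Coordinator 0x0000 either through router 0x0001 or through the longer chain 0x0002 → 0x0003, the function returns next hops 0x0023 → 0x0001 and 0x0001 → 0x0000, the shorter path.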
20.3.5 Route Table Entries
After route discovery, each device stores route information:
Router 1 Routing Table:
| Destination | Next Hop | Hop Count | Lifetime |
|-------------|----------|-----------|----------|
| 0x0000 | 0x0000 | 1 | 60s |
| 0x0023 | 0x0023 | 1 | 60s |
Sensor Routing Table:
| Destination | Next Hop | Hop Count | Lifetime |
|-------------|----------|-----------|----------|
| 0x0000 | 0x0001 | 2 | 60s |
20.4 Route Maintenance
Routes don’t last forever. AODV includes mechanisms for maintaining valid routes:
20.4.1 Route Expiration
Route Lifecycle:
1. Route discovered → Lifetime timer starts (60s typical)
2. Data transmitted → Timer reset to full value
3. No activity → Timer counts down
4. Timer expires → Route marked invalid
5. Next transmission → New route discovery required
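The lifecycle above amounts to a resettable expiry timer. A minimal Python sketch follows; the `RouteEntry` class and its method names are hypothetical, and time is passed in explicitly (in seconds) so the behaviour is easy to follow and test.

```python
class RouteEntry:
    """One cached route with a lifetime timer (60 s assumed as typical)."""

    def __init__(self, next_hop, lifetime_s=60.0, now=0.0):
        self.next_hop = next_hop
        self.lifetime_s = lifetime_s
        self.expires_at = now + lifetime_s   # step 1: timer starts

    def touch(self, now):
        """Step 2: data was sent over the route, reset the timer."""
        self.expires_at = now + self.lifetime_s

    def is_valid(self, now):
        """Steps 3-4: the route is invalid once the timer has run out."""
        return now < self.expires_at
```

A caller would check `is_valid(now)` before each transmission and start a new route discovery when it returns `False` (step 5).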
20.4.2 Route Error (RERR)
When a link fails (device offline, out of range), the detecting device sends RERR:
RERR Trigger Conditions:
- MAC-layer ACK not received after retries
- Neighbor timeout (no heartbeats)
- Explicit device leave notification
RERR Message:
- Unreachable Destination: 0x0001 (failed router)
- Affected Routes: List of destinations via failed router
Figure 20.2: Route error detection and automatic recovery through alternate path
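On receiving an RERR, a device drops every route that depends on the failed router. The Python sketch below illustrates that step; `apply_rerr` is a hypothetical helper, and the routing table is modelled as a plain dictionary from destination to next hop.

```python
def apply_rerr(routing_table, failed_next_hop):
    """Invalidate all routes that go through the failed router.

    routing_table: dict {destination: next_hop}, modified in place.
    Returns the sorted list of destinations that now need rediscovery.
    """
    affected = sorted(
        dest for dest, hop in routing_table.items() if hop == failed_next_hop
    )
    for dest in affected:
        del routing_table[dest]
    return affected
```

The returned list corresponds to the "Affected Routes" field: each destination in it triggers a fresh RREQ the next time the device has data to send.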
20.5 Self-Healing Mesh
Self-healing is one of Zigbee’s most valuable features for reliability-critical deployments.
20.5.1 Self-Healing Timeline
When a Router fails, the network recovers automatically:
T = 0: Router fails (power loss, damage, interference)
T = 0-300ms: Failure detection
- Devices sending to failed router don't receive ACKs
- 3 retries × 100ms timeout = 300ms
T = 300ms-3s: Route invalidation
- Devices mark routes via failed router as invalid
- RERR propagates to affected sources
T = 3-10s: Route rediscovery
- Affected devices broadcast new RREQs
- Parallel discovery for all affected routes
T ≈ 10s: Network recovered (typical; 5-15 seconds in practice)
- All devices have new routes
- Traffic flows through alternate paths
20.5.2 Redundancy Design
For reliable self-healing, design networks with path redundancy:
Minimum Redundancy (N+1):
Every end device should have at least 2 routers in range
If Router A fails → Route through Router B
Recommended Redundancy (N+2 or N+3):
3-4 routers in range of each end device
Multiple alternate paths available
Faster recovery, better load balancing
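A site survey can be checked against the minimum-redundancy rule with a few lines of Python. This is a sketch with hypothetical names: `under_protected` and the coverage dictionary are illustrative, and the coverage data would come from your own range measurements.

```python
def under_protected(routers_in_range, minimum=2):
    """Return the end devices that see fewer than `minimum` routers.

    routers_in_range: dict {end_device: list of router names in range}.
    """
    return sorted(
        device
        for device, routers in routers_in_range.items()
        if len(set(routers)) < minimum
    )
```

Raising `minimum` to 3 or 4 checks the same survey against the recommended redundancy level.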
20.5.3 Self-Healing Verification
Test your network’s self-healing capability:
Test Procedure:
1. Monitor message delivery rate (baseline)
2. Disable one router (power off)
3. Measure recovery time (messages start flowing again)
4. Verify new routes in device tables
5. Re-enable router, verify mesh rebalances
Expected Results:
- Recovery time: 5-15 seconds
- Message loss during recovery: 1-5 messages
- Full restoration of delivery rate
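Steps 1-3 of the test procedure can be scored from a delivery log. The Python sketch below assumes a hypothetical log format, a time-ordered list of `(timestamp_s, delivered)` pairs collected by your test harness, and reports the first outage only.

```python
def measure_recovery(samples):
    """Measure the first outage in a delivery log.

    samples: time-ordered list of (timestamp_s, delivered) pairs.
    Returns (recovery_time_s, lost_messages). recovery_time_s is None
    if delivery never resumed, and (0.0, 0) means no outage was seen.
    """
    first_failure = None
    lost = 0
    for timestamp, delivered in samples:
        if not delivered:
            if first_failure is None:
                first_failure = timestamp
            lost += 1
        elif first_failure is not None:
            return timestamp - first_failure, lost
    if first_failure is None:
        return 0.0, 0
    return None, lost
```

Comparing the result against the expected 5-15 seconds and 1-5 lost messages gives a simple pass/fail criterion for the test.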
20.6 Latency Considerations
Multi-hop routing adds latency. Understanding this helps set appropriate expectations:
The first message to a new destination incurs route discovery overhead:
Route Discovery Time:
- RREQ flood: 100-500ms (depends on network size)
- RREP return: 50-200ms
- Total: 150-700ms (first message)
Subsequent Messages:
- Use cached route: 10-30ms per hop
Design Implication: For latency-sensitive applications, keep hop counts low (3-4 hops maximum) and consider pre-establishing routes during network formation.
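The figures above combine into a simple latency budget. The Python sketch below uses the chapter's ranges (10-30 ms per hop, 150-700 ms for discovery) and assumes, as a simplification, that discovery time simply adds to the forwarding time; `latency_range_ms` is a hypothetical helper name.

```python
PER_HOP_MS = (10, 30)      # cached-route forwarding, per hop
DISCOVERY_MS = (150, 700)  # RREQ flood plus RREP return


def latency_range_ms(hops, first_message=False):
    """Return the (best, worst) end-to-end latency in milliseconds."""
    low = hops * PER_HOP_MS[0]
    high = hops * PER_HOP_MS[1]
    if first_message:
        low += DISCOVERY_MS[0]
        high += DISCOVERY_MS[1]
    return low, high
```

For a 4-hop path this gives 40-120 ms on a cached route and 190-820 ms for the first message.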
20.6.5 Why AODV and Not RPL? Zigbee’s Routing Protocol Choice
Zigbee adopted AODV (a reactive, on-demand protocol) rather than a proactive, DAG-based protocol such as RPL (used in many 6LoWPAN networks) for reasons rooted in the constraints of early 2000s hardware and Zigbee’s target use cases.
Memory constraints drove the decision. RPL requires each node to maintain a Directed Acyclic Graph (DAG) with parent sets, rank information, and Trickle timers – approximately 200-500 bytes of RAM per routing entry. In 2004 when Zigbee 1.0 was standardized, typical target platforms (Ember EM250, Freescale MC1322x) had 4-8 KB of available RAM after the protocol stack. AODV stores routing entries only for active destinations – a device communicating with 3 destinations uses approximately 36 bytes (12 bytes per entry), compared to RPL’s 200+ bytes for the same topology awareness.
Traffic pattern assumptions also mattered. RPL optimizes for many-to-one collection traffic (sensors reporting to a border router), which is the dominant pattern in industrial monitoring. Zigbee was designed for home automation where traffic patterns are more diverse: a light switch sends commands to specific lights (point-to-point), a remote control sends to a media center, and a door sensor reports to a hub. AODV handles arbitrary point-to-point traffic efficiently because it discovers only the routes actually needed.
The trade-off is visible in practice. AODV’s first-message latency (150-700 ms for route discovery) is noticeable when you press a Zigbee light switch for the first time after the route has expired. A network running a proactive protocol such as RPL avoids this discovery delay because the route is already established. However, AODV’s lower memory footprint allows Zigbee to run on cheaper, more constrained chips, a meaningful cost advantage at scale when deploying hundreds of sensors.
20.7 Alternative Routing Methods
While AODV is the primary routing protocol, Zigbee supports additional methods:
20.7.1 Tree Routing
Hierarchical routing based on network address structure:
Address-Based Routing:
- Coordinator: 0x0000
- Router A (child of Coordinator): 0x1000
- Router B (child of Router A): 0x1100
- End Device (child of Router B): 0x1110
Route to 0x1110:
0x0000 → 0x1000 → 0x1100 → 0x1110
(Follow address hierarchy)
Advantage: No route discovery needed - address implies route
Disadvantage: Rigid structure, no alternate paths
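Because the address encodes the position in the tree, the path can be computed without any routing table. The Python sketch below follows the simplified one-hex-digit-per-level addressing used in the example above; it is an illustration of the idea only, since real Zigbee stacks derive child address blocks from network parameters rather than from hex digits. The name `tree_path` is hypothetical.

```python
def tree_path(destination):
    """Return the hops from the Coordinator (0x0000) to `destination`.

    Assumes the illustrative scheme above: each tree level fills in
    the next hex digit of a 16-bit address, most significant first.
    """
    path = [0x0000]
    prefix = 0
    for shift in (12, 8, 4, 0):
        digit = (destination >> shift) & 0xF
        if digit == 0:          # no deeper level encoded in the address
            break
        prefix |= digit << shift
        path.append(prefix)
    return path
```

For destination 0x1110 the function reproduces the route shown above: 0x0000, 0x1000, 0x1100, 0x1110.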
20.7.2 Source Routing
Sender specifies the complete path in the packet:
Source Route Header:
- Path: [0x0001, 0x0002, 0x0003, 0x0000]
- Each router forwards to next in list
Advantage: Predictable path, no route lookups at routers
Disadvantage: Larger packet headers, sender must know full path
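Forwarding under source routing reduces to finding your own position in the list. A minimal Python sketch, assuming the path is carried as a plain list of addresses (actual frames track the position with an index field instead of searching); `next_in_source_route` is a hypothetical helper.

```python
def next_in_source_route(path, my_addr):
    """Return the address to forward to, or None at the end of the path.

    Raises ValueError if `my_addr` is not on the path at all.
    """
    position = path.index(my_addr)
    if position + 1 < len(path):
        return path[position + 1]
    return None
```

With the path `[0x0001, 0x0002, 0x0003, 0x0000]`, router 0x0002 forwards to 0x0003, and the final entry 0x0000 has nothing left to forward to.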
20.7.3 Many-to-One Routing
Optimized for sensor networks where most traffic flows toward a central collector (the concentrator, typically the Coordinator):
1. Coordinator broadcasts a Many-to-One Route Request
(a special RREQ with the many-to-one flag set)
2. Every router that receives it creates a routing table
entry pointing toward the Coordinator (reverse route)
3. When a device sends data to the Coordinator, each
router along the path appends its address to a
Route Record frame
4. The Coordinator uses collected Route Records to build
source routes for downlink (Coordinator → device) traffic
Advantage: Eliminates per-device RREQ floods for the dominant uplink traffic pattern; scales to large networks
Disadvantage: Only optimizes traffic toward the concentrator; downlink still requires source routing or on-demand AODV
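Steps 3 and 4 can be sketched in Python as follows. The function names and the `next_hop_to_concentrator` dictionary are hypothetical, and the sketch assumes the upstream routes installed in step 2 are loop-free.

```python
def collect_route_record(sender, next_hop_to_concentrator, concentrator=0x0000):
    """Follow upstream routes from `sender`, recording each relaying router."""
    record = []
    node = next_hop_to_concentrator[sender]
    while node != concentrator:
        record.append(node)                      # router appends its address
        node = next_hop_to_concentrator[node]
    return record


def downlink_source_route(record, sender):
    """Reverse the Route Record to get the Coordinator-to-device path."""
    return list(reversed(record)) + [sender]
```

A sensor at 0x0023 whose uplink passes through 0x0002 and then 0x0001 produces the record `[0x0002, 0x0001]`, which the Coordinator reverses into the downlink source route `[0x0001, 0x0002, 0x0023]`.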
20.8 Worked Example: Zigbee Routing in a Three-Floor Office Building
Scenario: An office building deploys a 120-device Zigbee occupancy-sensing network across three floors, with a single Coordinator on Floor 2. Each floor has 8 routers (mains-powered smart plugs) and 32 battery-powered occupancy sensors (end devices), for a total of 24 routers and 96 sensors.
Key insight: First message takes 4.5× longer than subsequent messages. For real-time control, pre-establish routes during network formation.
Step 3: Evaluate Self-Healing Capacity
Routers per floor: 8
Average neighbors per router: 3-4 (overlapping coverage)
Path redundancy: N+1 to N+2 (each sensor can reach 2-3 routers)
If 1 router fails on Floor 1:
Affected end devices: ~4 sensors (those closest to failed router)
Recovery time: 5-15 seconds (RERR + new RREQ/RREP)
Alternate paths available: 2-3 via neighboring routers
Message loss during recovery: 1-3 messages (at 30s reporting interval,
likely zero lost since recovery < reporting interval)
Step 4: Identify the Routing Method
| Traffic Pattern | Routing Method | Why |
|-----------------|----------------|-----|
| Sensor → Coordinator | Many-to-One | 95% of traffic flows to Coordinator; pre-established routes reduce discovery overhead |
| Coordinator → single sensor | AODV on-demand | Infrequent commands; route cached from uplink traffic |
| Firmware update to all | Tree Routing + broadcast | Hierarchical delivery via address-based forwarding |
Decision: Use Many-to-One routing as the primary method. The Coordinator periodically broadcasts Many-to-One Route Requests, and all 24 routers establish upstream paths. This eliminates RREQ floods for the dominant traffic pattern (sensor data collection), reducing network overhead by approximately 80% compared to pure AODV.
Key Insight: For data-collection IoT networks where traffic is predominantly many-to-one, configuring the Coordinator as the route concentrator dramatically reduces route discovery traffic. Reserve on-demand AODV for the rare downlink commands.
20.9 Routing Best Practices
20.9.1 Network Design
Limit hop count: Design for maximum 5-7 hops
Ensure redundancy: 2-3 routers in range of each device
Place routers strategically: Hallways, central locations
Avoid bottlenecks: Multiple paths between network regions
20.9.2 Monitoring
Track these metrics to identify routing issues:
| Metric | Healthy | Warning | Critical |
|--------|---------|---------|----------|
| Average hop count | 2-3 | 4-5 | 6+ |
| Route discovery rate | < 1/min | 1-5/min | > 5/min |
| Route failures | < 1/hour | 1-5/hour | > 5/hour |
| Message delivery | > 99% | 95-99% | < 95% |
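These thresholds are easy to encode in a monitoring script. The Python sketch below is one possible encoding; the metric keys and the `classify` function are hypothetical names, and values that fall between the listed bands (for example an average hop count of 3.5) are assigned to the nearest lower band as an assumption.

```python
def classify(metric, value):
    """Map a routing metric to 'healthy', 'warning', or 'critical'."""
    if metric == "avg_hop_count":
        if value < 4:
            return "healthy"
        return "warning" if value < 6 else "critical"
    if metric in ("route_discoveries_per_min", "route_failures_per_hour"):
        if value < 1:
            return "healthy"
        return "warning" if value <= 5 else "critical"
    if metric == "delivery_percent":
        if value > 99:
            return "healthy"
        return "warning" if value >= 95 else "critical"
    raise ValueError(f"unknown metric: {metric}")
```

A dashboard could call `classify` on each sampled metric and raise an alert whenever the result is not `"healthy"`.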
20.9.3 Troubleshooting
Common routing problems and solutions:
| Symptom | Likely Cause | Solution |
|---------|--------------|----------|
| High latency | Too many hops | Add routers to reduce hop count |
| Frequent route changes | Marginal links | Improve router placement |
| Devices dropping offline | Insufficient redundancy | Add backup routers |
| Recovery too slow | Network too large | Segment into multiple PANs |
Sensor Squad: Zigbee Routing
Sammy the Sensor needs to send a message: “How does my temperature reading find its way to the Coordinator across a big building?”
Max the Microcontroller explains: “We use AODV routing! When you need to send a message to someone you’ve never talked to, you broadcast a Route Request (RREQ) – like shouting ‘Does anyone know how to reach the Coordinator?’ Every Router passes the shout along until it reaches the destination.”
Lila the LED continues: “Then the Coordinator replies with a Route Reply (RREP) that comes back along the path, like leaving breadcrumbs. Now you know exactly which Routers to use!”
Bella the Battery adds the best part: “And if a Router breaks or gets unplugged? The mesh self-heals! The network notices the broken link and finds a new path around it – like water flowing around a rock in a stream.”
Key ideas for kids:
AODV = A way to discover the best path by asking neighbors
Route Request (RREQ) = Shouting “How do I get there?” through the network
Route Reply (RREP) = The answer coming back with directions
Self-healing = Automatically finding a new path when one breaks
20.10 Knowledge Check
Q1: What triggers AODV route discovery in a Zigbee network?
A) The Coordinator periodically broadcasts routing updates to all devices
B) A device needs to send data to a destination for which it has no routing entry
C) Every device discovers routes to all other devices during network formation
D) Route discovery runs on a fixed 30-second timer
Answer
B) A device needs to send data to a destination for which it has no routing entry – AODV is an on-demand (reactive) routing protocol. Routes are only discovered when needed, conserving bandwidth and memory. This differs from proactive protocols that maintain routes to all destinations continuously.
20.11 Knowledge Check
Q2: How does Zigbee’s mesh network self-heal when a Router fails?
A) The Coordinator immediately assigns a replacement Router
B) Devices detect the failed link and trigger new AODV route discoveries to find alternate paths
C) All devices restart and rejoin the network from scratch
D) End Devices switch to direct communication with the Coordinator
Answer
B) Devices detect the failed link and trigger new AODV route discoveries to find alternate paths – When a device fails to deliver a message (no acknowledgment), it marks the route as broken and initiates a new route discovery. The mesh topology ensures multiple paths exist, so traffic reroutes around the failure automatically.
Common Pitfalls
1. Route Discovery Storms
When many devices simultaneously lose their routes (e.g., after network partition resolution), a flood of RREQ messages can saturate the 802.15.4 channel and prevent normal traffic for several seconds. Implement exponential backoff for route discovery retries.
2. Not Accounting for Route Discovery Latency
The first packet to a new destination triggers route discovery, adding roughly 150–700 ms (RREQ flood plus RREP return) before delivery. Applications that need immediate first-message delivery should pre-discover routes at startup.
3. Routing Table Overflow in Dense Networks
Zigbee routers have fixed routing table sizes (typically 16–32 entries). Dense networks with many unique source-destination pairs overflow routing tables, causing recent entries to evict older ones. Monitor routing table utilization in large deployments.
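For the first pitfall, one common way to implement exponential backoff is to randomize each retry delay below a doubling ceiling, so that devices that lost their routes at the same moment do not retry in lockstep. The Python sketch below shows this; the function name, the 100 ms base, and the 5 s cap are illustrative assumptions, not values from the Zigbee specification.

```python
import random


def discovery_retry_delays(attempts, base_ms=100, cap_ms=5000, rng=None):
    """Return randomized delays (ms) to wait before each discovery retry.

    The ceiling doubles with every attempt (base, 2x base, 4x base, ...)
    up to `cap_ms`; each delay is drawn uniformly below its ceiling.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap_ms, base_ms * 2 ** attempt)
        delays.append(rng.uniform(0, ceiling))
    return delays
```

Passing a seeded `random.Random` instance as `rng` makes the schedule reproducible in tests.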
20.12 Summary
This chapter covered Zigbee routing and self-healing:
AODV Protocol: On-demand route discovery saves memory and bandwidth
Route Maintenance: Lifetime timers and RERR messages keep routes valid
Self-Healing: Automatic recovery in 5-15 seconds when paths fail
Latency: 10-30ms per hop, plus discovery overhead for first messages
Key design principles:
- Plan for redundancy (2-3 routers per end device)
- Limit maximum hop count to 5-7
- Test self-healing before deployment
- Monitor routing metrics in production
20.13 Knowledge Check
Quiz: Zigbee Routing and Self-Healing
Key Concepts
AODV (Ad Hoc On-Demand Distance Vector): The routing algorithm used in Zigbee mesh networks, discovering routes only when needed using RREQ/RREP message flooding.
Route Discovery: The process initiated by a Zigbee source node broadcasting RREQ messages to find a path to a destination; RREP messages trace back the route.
Route Table Entry: A stored record in a Zigbee router’s routing table that maps a destination to the next hop toward it, used to forward future packets without re-running discovery.
RREQ (Route Request): A broadcast message initiating Zigbee route discovery; each intermediate router forwards it while recording the reverse path for the subsequent RREP.
RREP (Route Reply): A unicast message tracing back from destination to source through the discovered path, establishing routing table entries at each hop.
Link Cost: A metric (based on link quality indicator and expected transmissions) used to evaluate path quality during AODV route selection.
20.14 Concept Relationships
| Concept | Related To | How They Connect |
|---------|------------|------------------|
| AODV Protocol | On-Demand Routing | Routes discovered only when needed, saving memory and bandwidth |
| RREQ/RREP | Route Discovery | Request floods network, Reply establishes path back to source |
| Route Lifetime | Memory Efficiency | Routes expire after timeout, freeing table space for active paths |
| RERR Messages | Self-Healing | Route Error triggers immediate invalidation and rediscovery |
| Hop Count | Latency Budget | 10-30ms per hop determines total end-to-end delay |
| Many-to-One Routing | Sensor Networks | Optimized for data collection scenarios where all traffic flows to hub |