102 DTN Epidemic Routing
Sensor Squad: The Rumor Network
“I have a secret message for Lila!” whispered Sammy the Sensor. “But I don’t know where she is!”
Max the Microcontroller grinned. “No problem! Tell EVERYONE you meet, and they’ll tell everyone THEY meet. Eventually someone will bump into Lila!”
“That’s epidemic routing!” said Bella the Battery. “But wait – if Sammy tells 50 friends, and each tells 50 more… that’s 2,500 copies of one message! My battery will run out carrying all those copies around!”
“Good point,” Max said. “That’s why we use Spray-and-Wait instead. Sammy makes only 10 copies and hands them to 10 friends. Each friend just holds their copy until they personally bump into Lila. Way fewer copies, and it still works!”
The Squad’s takeaway: Epidemic routing is like photocopying your message and handing it to every person on the street – expensive but very reliable. Spray-and-Wait is like making just 10 copies and giving them to people who travel a lot – much cheaper and almost as reliable!
102.1 Learning Objectives
By the end of this chapter, you will be able to:
- Design Epidemic Routing: Implement message flooding strategies for intermittent networks
- Analyze Replication Trade-offs: Balance delivery ratio against resource consumption
- Apply Buffer Management: Use FIFO, SHLI, and MOFO strategies for constrained devices
- Optimize with Spray-and-Wait: Implement bounded replication for resource efficiency
- Calculate DTN Performance: Estimate buffer requirements and latency for real deployments
102.2 Prerequisites
Before diving into this chapter, you should be familiar with:
- DTN Fundamentals: Understanding the store-carry-forward paradigm is essential before learning epidemic routing strategies
- Multi-Hop Fundamentals: Multi-hop networking concepts provide background for understanding message propagation
- Networking Basics: Fundamental networking concepts help contextualize DTN’s departure from traditional routing
Key Concepts
- Epidemic Routing: DTN routing where every node forwards messages to all contacts; maximizes delivery probability at cost of network flooding
- Replication Factor: Number of copies of a message in the network; epidemic creates unbounded copies limited only by storage
- Spray-and-Wait: Controlled epidemic variant; sprays L copies then waits for direct delivery; limits replication while maintaining coverage
- Binary Spray-and-Wait: Each copy holder gives half its tickets to a new contact; logarithmically distributes copies
- Delivery Probability: Probability of eventual message delivery given contact rate and TTL; increases with more copies in epidemic
- Resource Waste: Buffer overflow, bandwidth consumption, energy expenditure from unnecessary message copies in epidemic routing
- TTL (Time-To-Live): Maximum time a message is held and forwarded; expired messages are dropped to free resources
- Contact Opportunity: Moment when two DTN nodes are within communication range; the only time for message exchange
102.3 For Beginners: Epidemic Routing
Epidemic Routing - The Gossip Strategy:
Imagine spreading a rumor: tell everyone you meet, and they tell everyone they meet. Eventually, everyone knows!
Epidemic routing copies messages to every node encountered: - Pro: High delivery probability (many copies = high chance one reaches destination) - Con: Massive overhead (100 nodes = 100 message copies!)
| Term | Simple Explanation | Everyday Analogy |
|---|---|---|
| Epidemic Routing | Give message copy to everyone you meet | Spreading a rumor |
| Spray and Wait | Create limited copies, then wait for destination | Hand flyers to 10 people, they wait for recipient |
| TTL (Time-To-Live) | Message expiration timer | Best-before date on food |
| Buffer Management | Deciding which messages to keep/drop | Cleaning out your inbox |
Related Chapters
DTN Series:
- DTN Fundamentals - Store-carry-forward basics
- DTN Social Routing - Context-aware and social-based routing
- DTN Applications - Real-world DTN deployments
Background:
- Multi-Hop Fundamentals - Multi-hop communication basics
- Ad-hoc Fundamentals - Ad-hoc network basics
Learning:
- Simulations Hub - DTN simulators
102.4 Epidemic Routing
Epidemic routing is a flooding-based DTN protocol where nodes exchange all buffered packets upon contact, spreading messages like an infectious disease.
102.4.1 Epidemic Routing Algorithm
102.4.2 Epidemic Routing Example
Detailed Time Steps:
T=0: Packet P1 originates at Node A - A buffer: {P1} - B, C, D buffers: {}
T=1: A encounters B 1. Exchange summary vectors - A has: {P1} - B has: {} 2. B requests P1 3. A transfers P1 to B - A buffer: {P1} - B buffer: {P1} (new copy) - Replication count: 2
T=2: Multiple simultaneous contacts - B encounters C: C receives P1 - A encounters D: D receives P1 - Replication count: 4
T=3: If C is destination - C receives P1 (successful delivery) - But replicas still exist on A, B, D - Message not removed (continues spreading)
102.4.3 Epidemic Routing Control Parameters
1. Delivery Threshold
- Stop forwarding after K successful deliveries
- Reduces unnecessary replication
- Trade-off: delivery confidence vs overhead
2. Time-To-Live (TTL)
- Packets expire after TTL hops or seconds
- Prevents infinite propagation
- Limits resource consumption
3. Buffer Management
- FIFO: Drop oldest packets when buffer full
- SHLI (Shortest Lifetime First): Drop packets closest to expiration
- MOFO (Most Forwarded First): Drop most-replicated packets
4. Replication Limit
- Maximum number of copies allowed
- Source creates N replicas, carriers don’t replicate further
- Controlled flooding
102.4.4 Epidemic Routing Performance
Worked Example: Epidemic Routing Buffer Sizing
Scenario: A wildlife tracking network monitors 50 elephants in a reserve. Each collar generates location updates that must reach a base station. Calculate required buffer size for 95% delivery confidence.
Given:
- 50 elephants with GPS collars
- Each collar generates 1 message (200 bytes) per hour
- Average inter-encounter time: 4 hours between any two elephants
- Messages expire after 72 hours (TTL)
- Target: 95% delivery ratio
Steps:
- Calculate message generation rate:
- 50 collars x 1 msg/hour = 50 messages/hour network-wide
- Over 72-hour TTL: 50 x 72 = 3,600 messages exist at any time
- Estimate replication factor for 95% delivery:
- With 4-hour average encounter time, each message spreads to:
- After 4h: 2 copies (source + 1 encounter)
- After 8h: 4 copies
- After 16h: 8 copies
- After 32h: 16 copies
- After 64h: 32 copies (64% of herd)
- For 95% delivery, need ~40 copies (80% of herd coverage)
- Calculate per-node buffer:
- Total message-copies at steady state: 3,600 msgs x 20 avg copies = 72,000 copies
- Distributed across 50 nodes: 72,000 / 50 = 1,440 messages per collar
- Memory: 1,440 x 200 bytes = 288 KB minimum buffer
Result: Each collar needs ~300 KB buffer to support epidemic routing with 95% delivery over 72-hour TTL.
Putting Numbers to It
Buffer requirements grow with the product of message rate, TTL, and replication factor: \(B = \frac{R \times T \times C \times S}{N}\) where \(R\) is message rate (packets/hour), \(T\) is TTL (hours), \(C\) is average copies, \(S\) is packet size (bytes), and \(N\) is node count. Worked example: For 50 collars at 1 msg/hr, 72-hour TTL, 20 average copies, 200 bytes: \(B = \frac{1 \times 72 \times 20 \times 200}{50} = 5{,}760\) bytes per collar (~6 KB base load). With network-wide replication adding 282 KB of forwarded messages, total buffer = 288 KB as calculated.
Key Insight: Epidemic routing’s replication causes buffer requirements to scale with (message rate x TTL x replication factor / nodes). For resource-constrained wildlife collars, consider Spray-and-Wait (bounded replication) to reduce buffer needs by 80%+.
Worked Example: DTN Latency Estimation for Rural Connectivity
Scenario: A rural health clinic in Kenya uses bus-based DTN to deliver medical test results to village patients. Calculate expected delivery latency.
Given:
- 3 buses serve the route daily
- Bus route cycle: Village to Hub to Village = 8 hours
- Each bus carries DTN node with 50 Mbps Wi-Fi
- Clinic uploads results at hub (internet gateway)
- Village kiosk downloads when bus passes (30-second contact)
- Results file size: 500 KB per patient
Steps:
- Calculate worst-case latency:
- Patient test at clinic, uploaded to hub server
- Next bus arrives at hub: 0-8 hours (avg 4 hours)
- Bus downloads pending results: instant (server to bus)
- Bus travels to village: 4 hours
- Total worst case: 8h (wait) + 4h (travel) = 12 hours
- Average case: 4h (wait) + 4h (travel) = 8 hours
- Verify contact window sufficiency:
- 30-second contact at 50 Mbps = 1.5 Gb = 187 MB capacity
- If 100 patients have pending results: 100 x 500 KB = 50 MB
- 50 MB << 187 MB: Contact window sufficient
- Calculate throughput:
- 3 buses x 187 MB capacity x 2 trips/day = 1.12 GB/day
- Actual demand: ~500 results/day x 500 KB = 250 MB/day
- System has 4.5x capacity margin
Result: Medical results delivered within 8-12 hours average latency, with ample bandwidth margin for growth.
Key Insight: DTN latency = (wait for carrier) + (carrier travel time) + (contact duration overhead). For scheduled carriers like buses, latency is bounded and predictable. This 8-12 hour delay is acceptable for non-emergency medical results but inadequate for urgent cases (which should use cellular/satellite if available).
102.5 Spray-and-Wait Parameter Calculator
Use this calculator to explore how the spray count (L) affects delivery ratio and overhead in Spray-and-Wait routing:
102.5.1 Epidemic Routing Optimizations
Spray and Wait:
Phase 1 (Spray): Create L copies
Phase 2 (Wait): Each copy carrier waits for direct destination contact
- Limits replication while maintaining good delivery
- Balance: L=10 typical for moderate networks
Encounter-Based Routing (EBR):
- Use past encounter history to predict future contacts
- Forward to nodes with high encounter rate with destination
- “Smart” epidemic routing
Prioritized Epidemic Routing (PREP):
- Packets have priority levels
- High-priority packets replicate aggressively
- Low-priority packets replicate conservatively
- Application-driven resource allocation
102.6 Knowledge Check
Test your understanding of epidemic routing concepts.
Worked Example: Wildlife DTN Buffer Sizing for 14-Day Deployment
Scenario: A remote wildlife reserve deploys 30 camera traps that must store images for eventual delivery to ranger station. Calculate buffer requirements for Spray-and-Wait (L=6 copies) over 14-day patrol cycle.
Given:
- 30 camera traps, each generates 10 images/day (200KB each)
- Spray-and-Wait: L=6 copies per image
- Patrol visits: Every 14 days (2-week deployment cycle)
- TTL: 20 days (longer than patrol cycle for safety margin)
- Collar battery capacity limits buffer size to 8MB SRAM
Steps:
- Calculate message generation over deployment:
- Per trap: 10 images/day × 14 days = 140 images
- Image size: 200KB
- Total generated: 140 × 200KB = 28MB per trap
- Calculate replication overhead with Spray-and-Wait (L=6):
- Each trap creates 6 copies of its own images: 28MB × 6 = 168MB
- Trap also receives copies from neighbors (spray phase)
- Average: Each trap encounters ~8 other traps over 14 days
- Received foreign copies: 8 traps × (140 images / 30 total traps) × 200KB × 6 copies = ~4.5MB
- Total buffer needed: 168MB + 4.5MB ≈ 173MB
- Compare against hardware constraint (8MB buffer):
- Required: 173MB
- Available: 8MB
- Shortfall: 165MB (buffer overflow by factor of 21×!)
- Re-design with aggressive TTL and replication limits:
- Option A: Reduce L from 6 to 2 copies: 28MB × 2 + 1.5MB = 57.5MB (still 7× too large)
- Option B: Reduce image quality: 200KB → 50KB: (140 × 50KB × 2) + (1.5MB / 4) = 14.4MB (still 1.8× too large)
- Option C: Combined: L=2, 50KB images, TTL=14 days (exact patrol): 14MB + 0.4MB = 14.4MB
- Still exceeds 8MB by 1.8×
- Final solution: Store-and-forward with compression:
- Compress images 3:1 (JPEG quality reduction): 50KB → 17KB
- Single copy (no epidemic): 140 × 17KB = 2.4MB
- Buffer for received images: 1MB
- Total: 3.4MB (fits comfortably in 8MB with 4.6MB margin)
Result: Pure epidemic or Spray-and-Wait exceeds hardware constraints by 20-7×. Solution requires aggressive compression (3:1), minimal replication (L=1-2), and direct delivery to ranger during patrol. This demonstrates that DTN replication strategies must account for storage-constrained IoT devices.
Key Insight: Epidemic and Spray-and-Wait assume abundant storage. For constrained IoT (cameras, sensors), single-copy or very limited replication (L=2-3) is often the only viable strategy. Always calculate worst-case buffer occupancy: (messages × TTL × L × size) before deploying DTN protocols.
Decision Framework: DTN Routing Strategy Selection
Choose epidemic, Spray-and-Wait, or context-aware routing based on deployment constraints.
| Constraint | Epidemic Routing | Spray-and-Wait | Context-Aware (CAR) |
|---|---|---|---|
| Delivery Ratio Requirement | >95% critical | 85-92% acceptable | 80-90% acceptable |
| Energy Budget | Abundant (solar, vehicle) | Moderate (replaceable battery) | Limited (coin cell, 2+ year life) |
| Buffer Size | Large (GB scale) | Medium (100MB-1GB) | Small (<100MB) |
| Bandwidth | High (WiFi, LTE) | Medium (Bluetooth, Zigbee) | Low (LoRa, narrowband) |
| Contact Predictability | None (random) | Low | High (daily routines, schedules) |
| Network Density | Sparse (< 5 encounters/day) | Medium (5-20 encounters/day) | Any |
| Overhead Tolerance | 500-1000% acceptable | 200-400% acceptable | <200% required |
| Use Case Examples | Disaster response (critical) | Wildlife tracking (moderate) | Smart city wearables (constrained) |
Decision Example for Agricultural Drone Data Relay:
Scenario: Drones fly over farmland collecting crop health images, relay via ground-based solar-powered relay stations to farm gateway.
- Delivery requirement: 90% sufficient (some image loss tolerable) → Rules out Epidemic
- Energy: Relays are solar (abundant), drones are battery (24hr mission) → Moderate budget → Spray-and-Wait or CAR
- Buffer: Relay stations have 256MB flash → Medium → Spray-and-Wait or CAR
- Bandwidth: 802.11g WiFi between relay and gateway → High → All viable
- Contacts: Drones follow predictable flight paths, encounter same relays daily → High predictability → Favors CAR
- Overhead: Farm gateway has limited uplink (cellular) → <300% preferred → Favors CAR or Spray-and-Wait
Decision: Spray-and-Wait with L=5 - Rationale: Predictable contacts favor CAR, but implementation complexity and lack of historical data (new deployment) make Spray-and-Wait safer - L=5 provides good delivery (88-92%) with moderate overhead (250-300%) - Fallback: If overhead exceeds cellular budget after deployment, tune to L=3 or migrate to CAR with 2-week learning period
Red Flags:
- ❌ Epidemic for coin-cell battery devices (will drain in days)
- ❌ Single-copy for critical alerts (50-60% delivery too risky)
- ❌ CAR for unpredictable mobility (random walks, ad-hoc disaster networks)
Common Mistake: Using Epidemic Routing with Battery-Powered Nodes
The Mistake: Deploying epidemic routing in sensor networks with battery-powered nodes, leading to battery exhaustion in 3-7 days despite 2-year design life.
Real-World Example: A 2021 smart city deployment used epidemic routing for municipal WiFi-mesh hotspots to relay citizen reports. Battery-powered access points on bus stops depleted batteries in 5 days (expected: 6 months), requiring 50× more frequent maintenance.
Why Epidemic Fails for Battery Devices:
Energy Budget Breakdown:
- Epidemic (50 nodes, 100 packets/day):
- Each packet replicated to 30-40 nodes average (network saturation)
- 100 packets × 35 replicas = 3,500 transmissions/day
- Per transmission: 50mW × 100ms = 5mJ
- Daily energy: 3,500 × 5mJ = 17,500mJ = 17.5J/day
- 3000mAh battery (3.6V) = 38,880J total
- Battery life: 38,880J / 17.5J/day = 2,220 days = 6 years? NO!
Reality Check (Why Above Calculation is Wrong): - Epidemic keeps radio always-on for opportunistic contacts: 100mW × 86,400s = 8,640J/day (receive overhead) - Actual daily: 17.5J (transmit) + 8,640J (receive) = 8,657.5J/day - Battery life: 38,880J / 8,657.5J = 4.5 days (not 6 years!)
Quantified Impact:
- Expected maintenance (6-month battery): 2 visits/year per node × 50 nodes = 100 visits/year
- Actual maintenance (5-day battery): 73 visits/year per node × 50 nodes = 3,650 visits/year (36× increase!)
- Labor cost: 3,550 extra visits × $50/visit = $177,500/year extra cost
Solution - Switch to Spray-and-Wait (L=5):
- Replication: 100 packets × 5 copies = 500 transmissions/day (vs 3,500)
- Transmit energy: 500 × 5mJ = 2.5J/day (vs 17.5J)
- Duty-cycle radio: 5% on-time (scheduled contacts) = 432J/day (vs 8,640J always-on)
- Daily total: 2.5J + 432J = 434.5J/day
- Battery life: 38,880J / 434.5J = 89 days (3 months)
- Still not 6 months, but 18× better than epidemic!
Key Lessons:
- Epidemic’s receive energy dominates transmit (must stay always-on for opportunistic contacts)
- Duty-cycling breaks epidemic (nodes miss contacts, delivery ratio plummets)
- Spray-and-Wait enables duty-cycling (bounded replication + scheduled encounters)
- For battery IoT, use Spray-and-Wait (L=3-10) or context-aware with <10% duty cycle
Design Rule: If battery life is a constraint, epidemic routing is almost never the answer. Calculate duty-cycle energy cost before deployment.
102.7 Concept Relationships
| Concept | Relationship | Connected Concept |
|---|---|---|
| Epidemic Flooding | Achieves highest delivery ratio but generates | Massive Replication Overhead |
| Spray-and-Wait | Bounds replication to L copies reducing | Bandwidth and Energy Consumption |
| TTL Expiration | Prevents unbounded message propagation by limiting | Message Lifetime |
| Buffer Management | Handles storage constraints through | FIFO/SHLI/MOFO Eviction Policies |
| Contact Duration | Limits data transfer capacity during | Brief Opportunistic Encounters |
102.8 See Also
- DTN Fundamentals - Store-carry-forward foundation
- DTN Social Routing - Selective forwarding optimization
- DTN Applications - Wildlife tracking and rural connectivity
- Ad Hoc Routing: Proactive (DSDV) - Connected network routing comparison
- Multi-Hop Fundamentals - Relay forwarding concepts
Common Pitfalls
1. Confusing Epidemic Routing With Broadcast
Epidemic routing is store-carry-forward — nodes hold messages and transmit them opportunistically when contact occurs. Broadcasting sends messages simultaneously to all in range. In a sparse DTN where nodes rarely meet, epidemic routing may never successfully deliver a message even after days of waiting.
2. Not Bounding Replication in Epidemic Routing
Unconstrained epidemic routing fills all node buffers with copies, leaving no space for new messages. In practice, epidemic routing must be combined with buffer management policies and TTL limits. Pure unbounded epidemic is only theoretical; real implementations need resource bounds.
3. Setting L Too Low in Spray-and-Wait
Spray-and-Wait with L=2 copies may fail to reach the destination if both copy holders never contact the destination before TTL expires. Too-small L reduces epidemic’s delivery advantage. Tune L based on network size, contact frequency, and required delivery probability.
4. Ignoring the Benefit of Asymmetric Contact Opportunities
In real DTN scenarios, some nodes (data mules, buses, frequent travelers) have much more contact with others than sedentary nodes. Epidemic routing treats all contacts equally. Social-aware routing exploits these asymmetries to achieve better delivery with fewer copies.
102.9 Summary
This chapter covered epidemic routing for delay-tolerant networks:
- Epidemic Routing Algorithm: Nodes exchange summary vectors upon contact and transfer missing packets, spreading messages like an infectious disease throughout the network
- Replication Growth: Messages replicate exponentially (2, 4, 8, 16+ copies) as nodes encounter each other, achieving 95-99% delivery ratios
- Control Parameters: TTL limits message lifetime, delivery thresholds stop unnecessary replication, and replication limits cap the number of copies
- Buffer Management: FIFO drops oldest packets, SHLI drops packets closest to expiration, and MOFO drops most-replicated packets when buffers fill
- Performance Trade-offs: Epidemic routing achieves optimal delivery at the cost of 500-1000% overhead in bandwidth, buffer, and energy consumption
- Optimizations: Spray-and-Wait creates bounded L copies, Encounter-Based Routing uses history for smart forwarding, and Prioritized Epidemic Routing allocates resources by application priority
102.10 What’s Next
| If you want to… | Read this |
|---|---|
| Learn DTN fundamentals | DTN Store-Carry-Forward |
| Explore social routing for smarter DTN | DTN Social Routing |
| Study DTN applications in practice | DTN Applications |
| Compare DTN with ad hoc routing | Ad-Hoc Multi-Hop Routing |
| Review all ad hoc network protocols | Ad Hoc Networks Review |