31 Fog Scenarios & Pitfalls
Real-world failure modes, deployment pitfalls, and resilience strategies for fog computing
- Scenario Analysis: Evaluating architectural choices against concrete use cases with specific latency, bandwidth, and reliability requirements to validate design decisions
- Load Profile: Characterization of workload over time — peak events/second, average throughput, burst duration — used to size fog node resources
- Failure Mode Analysis: Systematic identification of how a fog system behaves when individual components fail (node crash, link failure, cloud disconnection)
- Latency-Bandwidth Trade-off: Increasing edge processing reduces bandwidth but requires more local compute; optimal balance depends on WAN cost vs. hardware cost
- Deployment Scenario: Concrete operational context (smart factory floor, hospital ward, autonomous farm) that defines constraints: available power, physical access, connectivity
- Graceful Degradation: System design ensuring that partial failures (one fog node down, WAN congested) reduce performance gracefully rather than causing total outage
- Capacity Planning: Estimating required fog node count, compute specifications, and storage based on device density, event rates, and retention requirements
- Edge Case Handling: Designing for rare but critical scenarios (simultaneous sensor failures, network storms) that stress-test fog architectures beyond normal operating parameters
31.1 Learning Objectives
By the end of this section, you will be able to:
- Analyze Real-World Applications: Evaluate fog computing deployments in autonomous vehicles and smart cities
- Diagnose Failure Modes: Explain the cascade failure sequence when fog nodes become overloaded
- Assess Common Mistakes: Recognize and avoid the seven most frequent pitfalls in fog computing deployments
- Design for Resilience: Implement graceful degradation and load management strategies
- Calculate Cost-Benefit Tradeoffs: Quantify bandwidth savings, latency improvements, and cost reductions from fog deployments
Fog computing scenarios show how fog processing is used in different real-world situations, along with common pitfalls to avoid. Think of them as case studies from experienced chefs who have already made the mistakes you want to avoid. Each scenario reveals practical lessons about when fog computing works well and when simpler approaches might be better.
31.2 Introduction
Fog computing promises to bring processing closer to data sources, reducing latency, bandwidth costs, and cloud dependency. However, the gap between theory and real-world deployment is vast. Many fog computing projects fail not because of bad technology, but because of poor architectural decisions, insufficient capacity planning, and ignored failure modes.
This chapter bridges that gap by walking through concrete scenarios with real numbers: an autonomous vehicle processing pipeline, a smart city under rush-hour overload, and seven deployment pitfalls observed across dozens of production fog systems. Each scenario includes quantitative analysis so you can apply these lessons to your own designs.
Before diving in, ensure you understand these prerequisites:
- Edge vs. Fog vs. Cloud: Edge processes data on-device (<10ms), fog aggregates at intermediate nodes (10-100ms), cloud provides elastic compute (>100ms). See Core Concepts.
- Latency budgets: Real-time systems need end-to-end latency guarantees. Safety-critical functions require <50ms; analytics can tolerate seconds.
- Data reduction: The key fog value proposition is processing data locally and sending only summaries upstream, reducing bandwidth by anywhere from 100× to 300,000× depending on the workload.
- Graceful degradation: Systems that lose non-critical features under stress while preserving critical functions, rather than failing entirely.
If any of these are unfamiliar, review the Fog Computing Introduction first.
31.3 Real-World Example: Autonomous Vehicle Processing
The Setup:
- 8 high-resolution cameras capturing video at 30 frames/second
- Each frame: 1920×1080 pixels × 3 color channels = 6.2 MB
- Total data rate: 8 cameras × 30 fps × 6.2 MB = 1.5 GB per second
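A quick back-of-the-envelope check of these figures, using only the numbers stated above:

```python
# Verify the camera data-rate figures from the setup above.
CAMERAS = 8
FPS = 30
WIDTH, HEIGHT, CHANNELS = 1920, 1080, 3   # pixels x one byte per RGB channel

bytes_per_frame = WIDTH * HEIGHT * CHANNELS       # 6,220,800 bytes
mb_per_frame = bytes_per_frame / 1_000_000        # ~6.2 MB per frame
total_mb_per_sec = CAMERAS * FPS * mb_per_frame   # ~1,493 MB/s ~ 1.5 GB/s

print(f"Frame size: {mb_per_frame:.1f} MB")
print(f"Total rate: {total_mb_per_sec / 1000:.1f} GB/s")
```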
31.3.1 Approach 1: Cloud-Only Processing (What NOT to Do)
Step 1: Upload all video to cloud
- 1.5 GB/sec × 60 sec/min = 90 GB per minute
- At 4G LTE speeds (50 Mbps): would take 14,400 seconds (240 minutes) to upload 1 minute of data!
- Monthly data cost: 90 GB/min × 60 min/hour × 720 hours/month = 3,888 TB/month
- At $0.10/GB: $388,800 per month per vehicle!
Step 2: Cloud processes video
- Detect pedestrians, road signs, lane markings
- Latency: Upload time (2,000ms+) + processing (200ms) + download (100ms) = 2,300ms+
Step 3: Send commands back to car
- “Pedestrian detected, brake now!”
- Result: At 60 mph (88 feet/second), the car travels 202 feet before responding
- Outcome: Crash!
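The latency numbers above translate directly into distance traveled before the car can react; a small sketch using the scenario's figures:

```python
# Distance traveled during the decision loop, using the scenario's numbers:
# 60 mph = 88 ft/s, cloud round-trip ~2.3 s, on-vehicle inference ~15 ms.
SPEED_FT_PER_SEC = 88
CLOUD_LATENCY_SEC = 2.3     # upload + processing + download
EDGE_LATENCY_SEC = 0.015    # 15 ms on-vehicle inference

cloud_travel = SPEED_FT_PER_SEC * CLOUD_LATENCY_SEC   # ~202 ft before braking starts
edge_travel = SPEED_FT_PER_SEC * EDGE_LATENCY_SEC     # ~1.3 ft

print(f"Cloud-only: car travels {cloud_travel:.0f} ft before reacting")
print(f"Edge:       car travels {edge_travel:.1f} ft before reacting")
```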
31.3.2 Approach 2: Fog/Edge Computing (The Right Way)
Edge Processing (In-Vehicle Computer):
What Gets Sent to Fog/Cloud:
Instead of 1.5 GB/second of raw video, send only:
- Metadata (sent to fog node - nearby cellular tower):
- “Detected: 2 pedestrians, 1 stop sign, 3 vehicles”
- “Current lane: Center, Speed: 45 mph”
- Data size: ~500 bytes every 100ms = 5 KB/second (300,000× reduction!)
- Anomaly reports (sent to cloud when available):
- “Near-miss incident at GPS coordinates”
- 5-second video clip of the incident
- Data size: ~150 MB per incident (only when something unusual happens)
- Aggregate statistics (sent to cloud daily):
- “Total miles driven: 247”
- “Pedestrians detected: 127”
- “Emergency brakes: 2”
- Data size: ~10 KB per day
31.3.3 The Results: Concrete Numbers
| Metric | Cloud-Only | Fog/Edge | Improvement |
|---|---|---|---|
| Response Time | 2,300ms | 15ms | 153× faster |
| Data Uploaded | 1.5 GB/sec | 5 KB/sec | 300,000× reduction |
| Monthly Cost | $388,800 | $15 | $388,785 saved |
| Works Offline? | No (fails in tunnels) | Yes (fully autonomous) | 100% uptime |
| Braking Distance | 202 feet (crash!) | 1.3 feet (safe stop) | Safety achieved |
The bandwidth reduction calculation reveals why edge computing is essential for autonomous vehicles:
\[\text{Cloud Upload} = 8 \text{ cameras} \times 30 \frac{\text{frames}}{\text{sec}} \times 6.2 \frac{\text{MB}}{\text{frame}} = 1{,}488 \frac{\text{MB}}{\text{sec}}\]
Over one month of driving (720 hours): \(1{,}488 \times 60 \times 60 \times 720 = 3{,}856{,}896{,}000 \text{ MB} \approx 3.86 \text{ PB}\). At $0.10/GB: roughly $386K/month. Edge processing sends only metadata at 5 KB/sec = 12.96 GB/month, about $1.30 in bandwidth (anomaly video clips, daily statistics, and connectivity overhead account for the rest of the ~$15/month figure in the table above). The data reduction factor is \(\frac{1{,}488{,}000}{5} = 297{,}600\times\), achieving 99.9997% bandwidth savings while meeting safety latency requirements.
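The reduction arithmetic can be verified in a few lines:

```python
# Reproduce the bandwidth-reduction calculation: raw upload vs edge metadata.
raw_kb_per_sec = 1_488 * 1000    # 1,488 MB/s expressed in KB/s
metadata_kb_per_sec = 5          # detections + vehicle state only

reduction = raw_kb_per_sec / metadata_kb_per_sec
savings_pct = (1 - metadata_kb_per_sec / raw_kb_per_sec) * 100

hours_per_month = 720
metadata_gb_month = metadata_kb_per_sec * 3600 * hours_per_month / 1_000_000

print(f"Reduction factor: {reduction:,.0f}x")                  # 297,600x
print(f"Bandwidth saved:  {savings_pct:.4f}%")                 # 99.9997%
print(f"Metadata volume:  {metadata_gb_month:.2f} GB/month")   # 12.96 GB
```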
31.3.4 Key Takeaway
By processing 99.9997% of data locally (at the edge) and sending only meaningful insights to the cloud, autonomous vehicles become:
- Safe (15ms response vs 2,300ms)
- Affordable ($15/month vs $389K/month)
- Reliable (works in tunnels, rural areas, anywhere)
- Private (video stays in the car, not uploaded to cloud)
This is fog/edge computing in action – processing where the data is generated, transmitting only what matters.
Hey Sensor Squad! Imagine you are riding in a self-driving car. The car has 8 cameras – like 8 pairs of eyes – all looking around at the same time. Every single second, those cameras create a mountain of pictures. So much data that if you tried to send it all to a faraway computer (the “cloud”), it would be like trying to pour an entire swimming pool through a garden hose!
What the car does instead:
- It has a super-fast mini-computer right inside, like a brain in the car itself
- This brain looks at the camera pictures and says “I see a person crossing the road!” in just 15 milliseconds – that is faster than you can blink!
- Instead of sending all those pictures, it just sends a tiny note: “Saw 2 people, 1 stop sign, going 45 mph” – that is like sending a postcard instead of the whole swimming pool
Why does this matter? If the car had to ask a faraway cloud computer “Should I brake?”, the answer would take over 2 seconds to come back. At driving speed, the car would travel 200 feet before it even started braking. That is almost the length of a soccer field! But with its own brain, the car brakes in just 1.3 feet. That is the length of your ruler!
Think about it: What other things need to react super fast and cannot wait for a faraway answer? (Hint: think about robots in factories, or a drone avoiding a bird!)
31.3.5 Fog Data Flow Architecture
The following diagram illustrates the three-tier data flow in the autonomous vehicle scenario, showing what gets processed where and how much data flows between tiers:
31.4 What Would Happen If: Fog Nodes Get Overloaded
The Situation:
- Smart city system with 10,000 traffic cameras, 5,000 air quality sensors, 500 smart parking lots
- Normal load: Fog gateway processes 50 MB/second of sensor data
- Rush hour (5-6 PM): All systems hit peak simultaneously
- Traffic cameras detect 10× more vehicles
- Parking sensors update 20× more frequently as cars search for spots
- Air quality spikes trigger emergency alerts
- Fog node capacity: Designed for 100 MB/second, now receiving 500 MB/second
31.4.1 What Happens: The Cascade Failure
Phase 1 Impact: Initial Overload (5:00-5:05 PM)
- Traffic light coordination delays cause intersections to gridlock
- Parking availability data lags by 2 minutes (drivers circle endlessly)
- Emergency vehicle route optimization fails (cannot process real-time traffic)
Phase 2 Impact: Memory Exhaustion (5:05-5:10 PM)
- Real-time becomes batch processing
- Critical alerts (air quality spikes) delayed by minutes
- System thrashing (spending 80% of CPU moving data, only 20% processing)
Phase 3 Impact: Cascading Failures (5:10-5:15 PM)
- Data loss: 5 minutes of traffic data permanently lost (cannot analyze accident causes)
- Service degradation: System falls back to “cloud direct mode” (adding 200ms latency)
- Snowball effect: Neighboring fog nodes now receive overflow traffic, they start to fail too
31.4.2 Real-World Consequences
Traffic Management:
- Intersection coordination fails → 45-minute gridlock vs normal 10-minute rush
- Economic cost: 50,000 vehicles × 35 minutes delay × $25/hour wage = $729,166 lost productivity
Emergency Response:
- Ambulance route optimization offline → takes local roads instead of optimized route
- Ambulance arrives 8 minutes past its 12-minute target — 20 minutes total, missing the <15-minute emergency services benchmark
Air Quality:
- Spike detection delayed by 12 minutes → asthma alert system fails to notify vulnerable residents
- Health consequences for 2,000+ people with respiratory conditions
31.4.3 The Solutions: How to Prevent Overload
Solution 1: Graceful Degradation
```python
# Priority-based load shedding for fog nodes (illustrative: fog_node and the
# helper functions are placeholders for your platform's APIs)
if fog_node.cpu_usage > 0.80:       # 80% CPU threshold
    # Drop low-priority data (parking updates every 30s instead of 5s)
    reduce_update_frequency(priority="low")

if fog_node.queue_depth > 5.0:      # 5 seconds of backlog
    # Shed load: process only critical data
    process_only(["emergency_vehicles", "air_quality_alerts"])
    drop_data(["traffic_stats", "parking_analytics"])
```
Result: Critical services (emergency routing, alerts) stay operational; non-critical services degrade gracefully
Solution 2: Dynamic Load Balancing
Result: No single point of failure, load spreads across available resources
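A minimal sketch of the least-loaded routing idea behind this solution. `FogNode`, its fields, and the 80% health cutoff are illustrative placeholders, not a real platform API:

```python
# Dynamic load balancing sketch: route each incoming batch to the
# least-loaded fog node that still has headroom.
from dataclasses import dataclass

@dataclass
class FogNode:
    name: str
    capacity_mb_s: float      # rated throughput
    load_mb_s: float = 0.0    # current ingest rate

    @property
    def utilization(self) -> float:
        return self.load_mb_s / self.capacity_mb_s

def route(batch_mb_s: float, nodes: list[FogNode]) -> FogNode:
    """Send the batch to the node with the most headroom (< 80% utilized)."""
    healthy = [n for n in nodes if n.utilization < 0.80]
    if not healthy:
        raise RuntimeError("all fog nodes saturated - fall back to cloud-direct mode")
    target = min(healthy, key=lambda n: n.utilization)
    target.load_mb_s += batch_mb_s
    return target

nodes = [FogNode("fog-a", 100, 70), FogNode("fog-b", 100, 40), FogNode("fog-c", 100, 90)]
print(route(10, nodes).name)   # fog-b (lowest utilization, well under the cutoff)
```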
Solution 3: Predictive Scaling
Result: System ready for predictable peaks, no overload occurs
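One way to sketch predictive scaling for the rush-hour pattern in this scenario; the schedule, headroom factor, and function name are assumptions for illustration:

```python
# Predictive scaling sketch: pre-provision capacity for known peaks
# (rush hour 5-6 PM hits 500 MB/s vs a 50 MB/s normal load, per the scenario).
import math

NORMAL_LOAD_MB_S = 50
PEAK_MULTIPLIER = 10       # rush hour = 10x normal ingest
NODE_CAPACITY_MB_S = 100
HEADROOM = 1.3             # keep 30% spare so bursts don't saturate a node

def nodes_needed(hour: int) -> int:
    """Predict fog nodes required for a given hour of day (24h clock)."""
    expected = NORMAL_LOAD_MB_S * (PEAK_MULTIPLIER if 17 <= hour < 18 else 1)
    return math.ceil(expected * HEADROOM / NODE_CAPACITY_MB_S)

print(nodes_needed(3))    # off-peak: 1 node
print(nodes_needed(17))   # rush hour: 7 nodes provisioned before the peak
```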
Solution 4: Edge Pre-Filtering
| Mode | Frame Rate | Data per Camera | 10,000 Cameras Total |
|---|---|---|---|
| Normal | 30 fps (all frames) | 6.2 MB/s | 62 GB/s |
| Overload | Changes only | 0.5 MB/s | 5 GB/s |
| Reduction | – | 92% less | 12x less data |
Result: Fog node receives 12x less data during peaks by shifting filtering responsibility to edge devices
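A toy version of change-only transmission: frames are modeled as flat byte arrays, and a frame is forwarded only when enough bytes differ from the last transmitted one. Real cameras would use motion detection or keyframe encoding, but the principle is the same:

```python
# Edge pre-filtering sketch: transmit a frame only when it differs enough
# from the last transmitted frame (change detection).

def changed_enough(prev: bytes, cur: bytes, threshold: float = 0.02) -> bool:
    """True if more than `threshold` fraction of bytes differ."""
    diffs = sum(a != b for a, b in zip(prev, cur))
    return diffs / len(cur) > threshold

def filter_stream(frames: list[bytes]) -> list[bytes]:
    sent = [frames[0]]                 # always send the first frame
    for frame in frames[1:]:
        if changed_enough(sent[-1], frame):
            sent.append(frame)
    return sent

static = bytes(100)                    # 100 identical zero bytes
moving = bytes([1] * 10 + [0] * 90)    # 10% of bytes changed
stream = [static, static, moving, moving, static]
print(len(filter_stream(stream)))      # 3 frames sent instead of 5
```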
31.4.4 Key Lessons
- Plan for 3-5× peak capacity, not average load (rush hour ≠ midnight traffic)
- Implement graceful degradation: Lose non-critical features, keep critical ones running
- Monitor and alert: CPU >70% = yellow, >85% = red, trigger autoscaling
- Test failure modes: Deliberately overload fog nodes in staging to verify degradation behavior
- Have a fallback: Cloud-direct mode for when fog nodes fail (slower but functional)
Real-world example: Google Cloud Load Balancer sheds 20% of low-priority traffic when overloaded to protect critical services. Your fog system should follow the same principle.
31.5 Common Mistakes When Deploying Fog Computing
31.5.1 Mistake 1: “Fog Nodes Should Process Everything Locally”
The Problem: Many teams try to replicate entire cloud functionality on fog nodes, leading to:
- Oversized, expensive fog hardware (trying to run full ML models on edge)
- Complex deployments that are hard to maintain
- Fog nodes that can't handle actual workloads
Example: Smart factory deploys $5,000 industrial PCs as fog nodes to run TensorFlow models with 500M parameters. Models take 30 seconds to run inference (useless for real-time). Hardware costs spiral.
The Right Approach:
- Edge/Fog: Run lightweight inference with quantized models (<50ms)
- Cloud: Train full models, update fog nodes weekly with optimized versions
- Use TensorFlow Lite or ONNX Runtime for fog nodes, not full frameworks
Rule of thumb: If it takes >100ms on fog hardware, move it to cloud or optimize the model.
31.5.2 Mistake 2: “Internet Outage = Fog System Keeps Working Perfectly”
The Problem: Assuming fog nodes are fully autonomous without planning for degraded functionality:
- No local user authentication (auth server in cloud)
- No local time synchronization (NTP servers unreachable)
- Certificate validation failures (can't check revocation lists)
Example: Hospital fog gateway continues monitoring patients during internet outage, but:
- Doctors can't log in (OAuth relies on cloud identity provider)
- Alerts don't reach pagers (notification service is cloud-only)
- Data timestamps drift by 10 seconds (NTP unreachable, local clock skew)
The Right Approach:
- Local authentication cache: Store recent user credentials with 7-day expiry
- Local NTP server: One fog node runs NTP, others sync to it
- Certificate caching: Cache OCSP responses for 24 hours
- Graceful degradation plan: Document what works offline vs what requires cloud
Test it: Disconnect internet for 4 hours in staging. What breaks? Fix it.
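A sketch of the local authentication cache described above, with the 7-day expiry. The user model and plain SHA-256 hashing are illustrative only; a production system should use a salted KDF such as bcrypt or scrypt:

```python
# Local authentication cache: store hashes of cloud-verified credentials
# so staff can still log in when the identity provider is unreachable.
import hashlib
import time

CACHE_TTL_SEC = 7 * 24 * 3600   # 7-day expiry, per the guideline above
_cache: dict[str, tuple[str, float]] = {}   # user -> (password_hash, cached_at)

def _hash(pw: str) -> str:
    return hashlib.sha256(pw.encode()).hexdigest()

def cache_credentials(user: str, password: str) -> None:
    """Call after a successful cloud login to enable later offline auth."""
    _cache[user] = (_hash(password), time.time())

def offline_login(user: str, password: str) -> bool:
    entry = _cache.get(user)
    if entry is None:
        return False
    pw_hash, cached_at = entry
    if time.time() - cached_at > CACHE_TTL_SEC:
        del _cache[user]            # stale entry - force cloud re-auth
        return False
    return pw_hash == _hash(password)

cache_credentials("dr_lee", "s3cret")
print(offline_login("dr_lee", "s3cret"))   # True
print(offline_login("dr_lee", "wrong"))    # False
```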
31.5.3 Mistake 3: “More Fog Nodes = Better Performance”
The Problem: Deploying too many fog nodes creates management overhead without benefits:
- 100 fog nodes for 100 sensors = 100× management complexity
- Update deployment takes hours (updating 100 nodes sequentially)
- Monitoring costs exceed hardware costs (100 nodes × $50/month monitoring)
Example: Retail chain deploys one fog node per store (500 stores). Software update pushes to 500 nodes take 8 hours, blocking bug fixes. Security patch deployment becomes a multi-day project.
The Right Approach:
- Fog node per site: 1 fog node per physical location (store, factory floor, building)
- Coverage rule: 1 fog node per 100-1,000 devices (balance management vs latency)
- Clustering: Use K8s/Docker Swarm to manage fog nodes as a fleet, not individuals
Calculate optimal fog nodes:

Formula A (device coverage):

```
Optimal_Nodes = Total_Devices / 500
```

Formula B (latency constraint):

```
Optimal_Nodes = Sites / (Max_Acceptable_Latency_ms × 0.1 ms/km)
```

Use whichever formula yields the greater number to satisfy both constraints.
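The sizing rules above can be sketched as a small calculator. Formula B's latency term depends on deployment geometry, so this sketch combines Formula A with the one-fog-node-per-site rule, taking whichever yields more nodes:

```python
# Fog node count sketch: at least one node per physical site, and roughly
# one node per 500 devices (midpoint of the 100-1,000 coverage rule).
import math

DEVICES_PER_NODE = 500

def optimal_fog_nodes(total_devices: int, sites: int) -> int:
    by_coverage = math.ceil(total_devices / DEVICES_PER_NODE)
    by_site = sites                    # one node per store/floor/building
    return max(by_coverage, by_site)

print(optimal_fog_nodes(total_devices=2_000, sites=3))   # 4 (coverage-bound)
print(optimal_fog_nodes(total_devices=400, sites=10))    # 10 (site-bound)
```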
31.5.4 Mistake 4: “Fog Nodes Don’t Need Monitoring—They’re Autonomous”
The Problem: Treating fog nodes as "set and forget" leads to silent failures:
- Disk full (100% storage, can't buffer data during outages)
- Thermal throttling (CPU at 90°C, performance degraded by 50%)
- Memory leaks (service using 95% RAM after 30 days uptime)
Example: Manufacturing fog node runs for 6 months without monitoring. Unknown memory leak causes it to crash every 14 days. Production line loses 4 hours of sensor data per crash. Root cause not discovered for months because “it reboots and works again.”
The Right Approach:
- Monitor key metrics: CPU, RAM, disk, network, temperature
- Set alerts: >80% CPU for 10 min, >90% disk, >75°C temperature
- Watchdog timers: Automatically reboot if service hangs for >5 minutes
- Health reporting: Fog nodes report status to cloud every 5 minutes
Minimum monitoring stack:
| Component | Role | License |
|---|---|---|
| Prometheus | Metrics collection & storage | Open source |
| Grafana | Dashboards & visualization | Open source |
| Alertmanager | Alert routing to ops team | Open source |
| Node Exporter | Hardware metrics (CPU, RAM, disk, temp) | Open source |
Cost: $0 (all open source), approximately 30 minutes setup time per node.
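A standard-library sketch of the health-report idea: disk usage via `shutil`, caller-supplied CPU/RAM readings (on a real node these would come from Node Exporter or `/proc`), and alerting against the thresholds listed above:

```python
# Fog-node health check sketch using only the standard library.
import shutil

THRESHOLDS = {"cpu_pct": 80.0, "ram_pct": 90.0, "disk_pct": 90.0}

def health_report(cpu_pct: float, ram_pct: float, path: str = "/") -> dict:
    """Gather key metrics and flag any that exceed their alert threshold."""
    usage = shutil.disk_usage(path)
    metrics = {
        "cpu_pct": cpu_pct,
        "ram_pct": ram_pct,
        "disk_pct": 100.0 * usage.used / usage.total,
    }
    metrics["alerts"] = sorted(k for k, v in metrics.items()
                               if k in THRESHOLDS and v > THRESHOLDS[k])
    return metrics

# A node pegged at 95% CPU should at minimum raise a CPU alert.
report = health_report(cpu_pct=95.0, ram_pct=40.0)
print(report["alerts"])
```

In practice this report would be pushed to the cloud every 5 minutes, with a watchdog rebooting the node if the reporting service itself hangs.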
31.5.5 Mistake 5: “5G/Fiber is Fast Enough, We Don’t Need Fog”
The Problem: Assuming fast networks eliminate the need for local processing:
- Physics can't be beaten: speed of light = 300 km/ms, so a data center 1,000 km away imposes a 3.3ms minimum
- Network congestion: 5G advertises 10ms latency; reality during peaks is 50-200ms
- Reliability: fiber gets cut (construction), 5G has dead zones
Example: Autonomous vehicle team: “5G has 10ms latency, we’ll process in the cloud!” Reality: Highway tunnel = no 5G signal. Car can’t see. Crash.
The Right Approach:
- Critical functions (collision avoidance): Always edge (on-device)
- Real-time functions (traffic coordination): Fog (roadside units)
- Non-critical (analytics, maps): Cloud with local caching
Network latency breakdown (advertised vs. reality):
| Latency Component | Advertised | Real-World (Dense Urban) |
|---|---|---|
| Radio access | ~2ms | 15ms |
| Backhaul | ~3ms | 20ms |
| Internet routing | ~3ms | 30ms |
| Server processing | ~2ms | 10ms |
| TOTAL | ~10ms | 75ms (7.5x worse) |
Rule: Never trust marketing numbers. Measure real-world latency in your specific deployment environment under realistic load conditions.
31.5.6 Mistake 6: “Security Doesn’t Matter—Fog Nodes are on Private Networks”
The Problem: Assuming physical/network isolation provides security:
- Fog nodes often have remote access (SSH, VPN) for maintenance
- Insider threats (disgruntled employees with physical access)
- Supply chain attacks (compromised firmware updates)
Example: Factory fog nodes on isolated OT network. Contractor connects laptop to maintenance port, laptop has malware. Malware spreads to fog nodes, then to PLCs. Production shutdown for 3 days, $2M loss.
The Right Approach:
- Encrypt everything: TLS for all communication, even on “private” networks
- Least privilege: Fog services run as non-root, separate user accounts
- Signed updates: Cryptographically verify firmware/software updates
- Network segmentation: Fog nodes on separate VLAN from corporate network
- Physical security: Lock fog hardware in secure cabinets
Minimum security checklist:
- TLS on every link, including "private" network segments
- Services running as non-root accounts with least privilege
- Cryptographic signatures verified on all firmware/software updates
- Fog nodes on a VLAN segmented from the corporate network
- Hardware locked in secure cabinets
31.5.7 Mistake 7: “We Can Update All Fog Nodes Simultaneously”
The Problem: Pushing updates to all fog nodes at once causes outages:
- Update has a bug → all fog nodes fail simultaneously
- Network congestion: a 500 MB update × 100 nodes = 50 GB download spike
- Rollback takes hours (re-downloading the old version)
Example: Smart city pushes traffic management update to 50 fog nodes at 5 PM (rush hour). Update has bug causing fog nodes to crash. All traffic lights revert to manual mode simultaneously. Citywide gridlock for 2 hours. Economic loss: $5M.
The Right Approach:
- Canary deployments: Update 5% of fog nodes first, wait 24 hours, check metrics
- Rolling updates: Update 10 nodes/hour, not all 100 simultaneously
- Rollback plan: Keep previous version on disk, one-command rollback
- Time windows: Update during off-peak (midnight, not rush hour)
Update strategy:
- Canary: update 5% of nodes and monitor for 24 hours
- Rolling: update the remainder in small batches (e.g., 10 nodes/hour)
- Keep the previous version on disk for one-command rollback
- Schedule every wave in an off-peak window
Real-world: Google Chrome updates 1% of users per day (canary), then 10%/day. Takes 10 days for 100% rollout. Your fog network should follow similar caution.
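The canary-then-rolling pattern can be sketched as a wave planner; the batch sizes and percentages follow the guidelines above:

```python
# Staged rollout sketch: a 5% canary wave first, then fixed-size batches.
import math

def rollout_plan(node_ids: list[str], canary_pct: float = 0.05,
                 batch_size: int = 10) -> list[list[str]]:
    """Return update waves: [canary_wave, batch1, batch2, ...]."""
    canary_n = max(1, math.ceil(len(node_ids) * canary_pct))
    canary, rest = node_ids[:canary_n], node_ids[canary_n:]
    batches = [rest[i:i + batch_size] for i in range(0, len(rest), batch_size)]
    return [canary] + batches

nodes = [f"fog-{i:03d}" for i in range(100)]
plan = rollout_plan(nodes)
print(len(plan[0]), [len(b) for b in plan[1:]])   # 5 canaries, then 10 per wave
```

Between waves, an operator (or automation) checks canary metrics and aborts if error rates rise, which is what makes the bug-in-update scenario above survivable.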
31.5.8 Pitfall Summary: Fog Computing Success Principles
DO:
- Design for graceful degradation (lose features, not entire system)
- Monitor everything (you cannot fix what you cannot see)
- Test offline mode (disconnect internet, verify critical functions work)
- Plan for 3-5x peak capacity (rush hour is not average load)
- Update incrementally (canary deployments, not all-at-once)
DO NOT:
- Assume fog nodes are autonomous (plan for degraded modes)
- Over-deploy fog nodes (balance management overhead vs latency)
- Trust network promises (measure real-world latency)
- Neglect security (encrypt everything, even on “private” networks)
- Skip monitoring (silent failures are the worst failures)
Remember: Fog computing is about intelligent distribution of workloads, not just “mini clouds everywhere.” Think carefully about what belongs at edge vs fog vs cloud.
31.6 Fog Deployment Decision Matrix
Use this decision framework to determine the correct processing tier for any IoT workload:

| Tier | Latency Band | Typical Workloads |
|---|---|---|
| Edge (on-device) | <10ms | Safety-critical control, collision avoidance |
| Fog (local node) | 10-100ms | Multi-device aggregation, real-time coordination |
| Cloud | >100ms | Model training, long-term analytics, elastic storage |

Beyond tier placement, several persistent misconceptions derail fog deployments:
Misconception 1: “Fog computing always reduces costs”
- Reality: Fog hardware and maintenance costs money. It only saves costs when bandwidth/cloud processing savings exceed fog infrastructure expenses. For small deployments (<100 devices), cloud-only is often cheaper.
Misconception 2: “Fog = Mini cloud that does everything locally”
- Reality: Fog nodes have limited resources (CPU, memory, storage). They should handle time-critical processing and filtering, not replicate full cloud ML models or analytics. Know when to process locally vs offload to cloud.
Misconception 3: “Edge and fog are the same thing”
- Reality: Edge = on-device processing (sensor, gateway). Fog = intermediate layer between edge and cloud (local servers, base stations). Edge handles <10ms critical tasks, fog handles 10-100ms aggregation/analytics.
Misconception 4: “Internet outage = Fog keeps everything working”
- Reality: Fog enables autonomous operation of critical functions, but many services depend on cloud (authentication, firmware updates, long-term storage). Design for graceful degradation, not full autonomy.
Misconception 5: “5G eliminates the need for fog”
- Reality: 5G reduces latency but cannot beat physics (speed of light = 300 km/ms). Data centers 1,000+ km away still have >3ms minimum latency. Real-world 5G latency: 15-75ms (not the advertised 1-10ms). Critical applications (<10ms) still need edge processing.
31.7 Summary and Key Takeaways
This chapter examined fog computing through the lens of real-world scenarios, failure modes, and deployment pitfalls. The key lessons are:
From the Autonomous Vehicle Scenario:
- Edge processing achieves 15ms response time vs 2,300ms for cloud-only (153x faster)
- Data reduction from 1.5 GB/s to 5 KB/s (300,000x) transforms economics from $389K/month to $15/month
- Safety-critical functions must always run at the edge, not in the cloud
From the Smart City Overload Scenario:
- Fog nodes must be designed for 3-5x peak capacity, not average load
- Cascade failures progress from CPU saturation to memory exhaustion to neighbor overload in minutes
- Four mitigation strategies: graceful degradation, dynamic load balancing, predictive scaling, and edge pre-filtering
From the Seven Deployment Pitfalls:
| Pitfall | Core Lesson |
|---|---|
| Process everything locally | Use lightweight models at fog; train full models in cloud |
| Assume full autonomy offline | Cache credentials, run local NTP, document degraded modes |
| More nodes = better | Balance coverage (1 per 100-1,000 devices) vs management overhead |
| Skip monitoring | Monitor CPU, RAM, disk, temperature; use watchdog timers |
| Fast networks replace fog | Physics limits network speed; measure real latency, not marketing |
| Ignore security on private nets | Encrypt all traffic; sign firmware updates; enforce least privilege |
| Update all nodes at once | Use canary deployments (5% first, then staged rollout) |
31.8 Worked Example: Fog Node Failure Impact Analysis for Smart City Traffic
Scenario: A city deploys 12 fog nodes managing 2,400 traffic signals (200 signals per fog node). Each fog node runs adaptive signal timing based on real-time camera feeds from 50 intersection cameras. One fog node fails at 8:15 AM during morning rush hour. What is the blast radius?
Step 1: Immediate Impact (First 60 Seconds)
| Metric | Normal Operation | During Fog Failure |
|---|---|---|
| Signals affected | 0 | 200 (8.3% of city) |
| Signal behavior | Adaptive timing (adjusts green/red based on queue length) | Falls back to fixed-time schedule (pre-programmed in signal controller) |
| Intersection camera analytics | Real-time vehicle counting, queue detection | Stops (no fog node to process video) |
| Adjacent fog nodes | Normal load | No change (fog nodes are independent – no cascade) |
Step 2: Traffic Impact Over Time
| Time Since Failure | Affected Corridor | Congestion Effect |
|---|---|---|
| 0-5 min | 200 signals on fixed timing | Minor – fixed timing is within 10% of optimal during moderate traffic |
| 5-15 min | Rush hour traffic builds | Queues extend 40% longer than with adaptive timing. Average intersection delay increases from 35 sec to 55 sec. |
| 15-30 min | Spillover to adjacent corridors | 3 adjacent fog zones see 15% increased traffic from diverted drivers. Their adaptive timing handles it. |
| 30-60 min | Peak rush hour at failed zone | Average delay: 72 sec (vs 35 sec normal). 4 intersections gridlocked. |
Step 3: Financial Impact Calculation
| Cost Category | Calculation | Amount |
|---|---|---|
| Commuter delay | 50,000 vehicles x 37 sec extra delay x ($25/hr wage / 3,600) | $12,847 |
| Fuel waste | 50,000 vehicles x 0.2L extra idle fuel x $1.50/L | $15,000 |
| Emergency response delay (if ambulance routed through zone) | 2-5 min additional response time | Unquantifiable (life safety risk) |
| Total cost of 1-hour fog failure | Sum of quantifiable costs above | ~$28,000 |
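The table's arithmetic, reproduced:

```python
# Financial impact of the 1-hour fog node failure, per the table above.
vehicles = 50_000
extra_delay_sec = 37            # 72 sec vs 35 sec normal intersection delay
wage_per_hour = 25.0
fuel_l_per_vehicle = 0.2
fuel_price_per_l = 1.50

commuter_cost = vehicles * extra_delay_sec * (wage_per_hour / 3600)
fuel_cost = vehicles * fuel_l_per_vehicle * fuel_price_per_l

print(f"Commuter delay: ${commuter_cost:,.0f}")                  # $12,847
print(f"Fuel waste:     ${fuel_cost:,.0f}")                      # $15,000
print(f"Total:          ${commuter_cost + fuel_cost:,.0f}")      # ~$28,000
```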
Step 4: Prevention vs Cost
| Mitigation | Cost | Downtime Reduced To |
|---|---|---|
| No redundancy (current) | $0 | 60-120 min (manual replacement) |
| Hot standby fog node (active-passive) | $2,400 ($200/node x 12 nodes) | 30-60 sec (automatic failover) |
| Paired fog nodes (active-active) | $28,800 ($2,400/node x 12 nodes) | 0 sec (instant failover) |
Result: A $200 hot standby per fog node prevents $28,000/hour in congestion costs. With 3 fog failures per year (typical for commodity hardware), the hot standby pays for itself in the first failure: $200 investment prevents $28,000 loss = 14,000% ROI. The active-active option ($2,400) is only justified if even 30-second failover is unacceptable (e.g., safety-critical autonomous vehicle corridors).
Key insight: Fog failure is localized (only 200 of 2,400 signals affected), unlike cloud failure which would affect all 2,400. This containment is a core advantage of fog architecture – but it also means each fog node is a single point of failure for its zone, making per-node redundancy essential.
Decision Framework: Use the tier placement decision tree – latency requirements and multi-device data needs determine whether a workload belongs at edge (<10ms), fog (<100ms, multi-device), or cloud (elastic, non-real-time).
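The decision tree can be sketched as a small function, with thresholds taken from the tiers defined in this chapter:

```python
# Tier-placement sketch: latency requirement and whether the workload needs
# data from multiple devices determine edge vs fog vs cloud.
def place_workload(max_latency_ms: float, multi_device: bool) -> str:
    if max_latency_ms < 10:
        return "edge"     # safety-critical, on-device
    if max_latency_ms < 100 and multi_device:
        return "fog"      # aggregation across nearby devices
    return "cloud"        # elastic, non-real-time

print(place_workload(5, multi_device=False))     # edge
print(place_workload(50, multi_device=True))     # fog
print(place_workload(5000, multi_device=True))   # cloud
```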
31.9 Knowledge Check
31.10 What’s Next
Continue your fog computing journey with:
| Topic | Chapter | Description |
|---|---|---|
| Core Concepts | Fog Computing Concepts | Fog computing theory and the paradigm shift from centralized to distributed processing |
| Requirements Analysis | Fog Requirements | Systematic methods for determining when to use fog vs edge vs cloud |
| Design Tradeoffs | Fog Design Tradeoffs | Architecture decisions, cost models, and optimization strategies |