Four critical fog tradeoffs: containers vs VMs (50-200 MB vs 512 MB-2 GB overhead), edge vs fog processing (<5ms vs 5-20ms latency), Active-Active vs Active-Passive redundancy (0-50ms vs 30-120s failover), and synchronous vs asynchronous replication (0 RPO vs 1-60s lag). Size fog nodes for 3x peak load, not average – 100 sensors reporting anomalies simultaneously will overwhelm under-provisioned hardware.
Key Concepts
Latency-Cost Trade-off: Closer edge processing reduces latency but increases CapEx (hardware at each site); cloud processing costs less per unit compute but adds network delay
Consistency vs. Availability: CAP theorem applied to fog — during network partition, fog nodes must choose between serving stale-but-available data vs. refusing requests to maintain consistency
Edge Intelligence vs. Maintenance Burden: More capable edge models improve local decision quality but require MLOps pipelines for continuous retraining and deployment
Centralization vs. Distribution: Centralized cloud architectures are easier to manage and update; distributed fog architectures are more resilient but harder to debug
Data Freshness: Trade-off between how current edge data is (continuous transmission) vs. bandwidth cost (batched transmission); real-time dashboards vs. hourly reports
Security Perimeter Trade-off: Cloud centralizes security controls; edge/fog distributes the attack surface requiring security hardening at each node
Vendor vs. Open Source: Proprietary edge platforms (AWS Greengrass, Azure IoT Edge) reduce integration effort but create lock-in; open-source (K3s, Eclipse Kura) requires more ops expertise
Build vs. Buy for Fog: Custom fog solutions optimize for specific workloads but require engineering resources; commercial fog platforms deploy faster but limit customization
32.1 Learning Objectives
By the end of this section, you will be able to:
Evaluate Design Tradeoffs: Compare containers vs VMs, edge vs fog processing, and redundancy models across latency, reliability, cost, and complexity dimensions
Calculate Capacity Requirements: Apply the 3x peak load sizing rule to determine fog node hardware specifications for a given sensor deployment
Design Redundancy Architectures: Select between Active-Active (0-50ms failover) and Active-Passive (30-120s failover) based on availability SLAs and team expertise
Implement Replication Strategies: Choose synchronous, asynchronous, or tiered replication based on RPO targets and offline operation requirements
Analyze Common Pitfalls: Identify fog node overload, orchestration complexity, over-engineering, and availability assumption failures from real deployment symptoms
Minimum Viable Understanding
Four critical tradeoffs: Every fog deployment must decide containers vs VMs (50-200 MB vs 512 MB-2 GB overhead), edge vs fog processing (<5ms vs 5-20ms latency), Active-Active vs Active-Passive redundancy (0-50ms vs 30-120s failover), and synchronous vs asynchronous replication (0 RPO vs 1-60s data lag)
Hybrid approaches dominate production: Real deployments combine options – edge for safety-critical threshold alerts, fog for cross-sensor analytics, tiered replication with sync for critical events and async for bulk telemetry
Size for 3x peak, not average: Fog nodes sized for average load fail when 100 sensors report anomalies simultaneously; use the formula: sensors x messages/sec x peak_multiplier x safety_margin to calculate required capacity
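The sizing formula above can be turned into a few lines of code. This is a minimal sketch; the function and dictionary names are illustrative, and the per-node throughput figures are the approximate benchmarks quoted later in this chapter.

```python
def required_capacity(sensors, msgs_per_sensor_per_sec, peak_multiplier=5, safety_margin=2):
    """Messages/sec a fog node must sustain: sensors x rate x peak x margin."""
    return sensors * msgs_per_sensor_per_sec * peak_multiplier * safety_margin

# Approximate per-node throughput figures quoted in this chapter.
BENCHMARKS_MSG_PER_SEC = {
    "raspberry_pi_4": 200,
    "intel_nuc_i5": 2000,
    "industrial_gateway": 5000,
}

def smallest_sufficient_node(capacity):
    """Return the smallest benchmarked node that meets the capacity, if any."""
    for name, limit in sorted(BENCHMARKS_MSG_PER_SEC.items(), key=lambda kv: kv[1]):
        if limit >= capacity:
            return name
    return None

# 100 sensors at 1 msg/sec, 5x anomaly bursts, 2x safety margin:
cap = required_capacity(100, 1)        # 1000 msg/sec
node = smallest_sufficient_node(cap)   # "intel_nuc_i5"
```

Note how a Raspberry Pi 4 (~200 msg/sec) handles the *average* load of 100 msg/sec but fails the peak-sized requirement of 1,000 msg/sec.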
Sensor Squad: Fog Design Tradeoffs
Fog computing has tricky choices to make – just like planning a party!
32.1.1 The Sensor Squad Adventure: The Great Party Planning Puzzle
Sammy the Sound Sensor was SO excited – the Smart Factory was throwing a big party for all 500 sensor friends! But there were so many decisions to make…
“Should we have the party in the SMALL room that’s really close,” asked Sammy, “or the BIG room that’s far away?”
Lila the Light Sensor thought about it: “The small room is fast to get to (like edge processing!), but we can only fit 50 friends. The big room can fit everyone, but it takes 20 minutes to walk there (like cloud processing!).”
Max the Motion Sensor had a brilliant idea: “What about the MEDIUM room down the hall? It fits 200 friends and only takes 2 minutes to get there!” That was the fog node – not too small, not too far, just right!
But then Bella the Bio Sensor asked the REALLY hard question: “What if the medium room’s door gets locked? Do we have a backup plan?”
That is exactly what this chapter is about – making smart choices and ALWAYS having a backup plan!
32.1.2 Key Words for Kids
| Word | What It Means |
|------|---------------|
| Tradeoff | When you choose one good thing, you might give up another good thing – like choosing between a fast car (edge) and a big truck (cloud) |
| Redundancy | Having a backup plan, like bringing an umbrella AND a raincoat just in case |
| Capacity Planning | Making sure the party room is big enough for all your friends, plus extra space for surprises |
| Failover | When Plan A breaks, automatically switching to Plan B so nothing stops working |
32.1.3 Try This at Home!
The Backup Plan Game: Think about your morning routine. What is your backup plan if… 1. Your alarm clock stops working? (Phone alarm = redundancy!) 2. The bus is late? (Walk, bike, or parent drives = failover!) 3. You forgot your lunch? (Cafeteria food = graceful degradation!)
Every fog system needs backup plans just like you do!
For Beginners: Fog Architecture Decisions
If terms like “active-active redundancy” or “synchronous replication” sound intimidating, don’t worry. Every fog design decision boils down to a simple question: What matters most for THIS specific use case?
Think of it like choosing transportation:
| Your Need | Best Choice | Tradeoff |
|-----------|-------------|----------|
| Get there FAST | Sports car (Edge) | Small trunk, expensive |
| Carry LOTS of stuff | Moving truck (Cloud) | Slow, needs highway |
| Balance of speed + capacity | SUV (Fog) | Not the fastest, not the biggest |
In fog computing, every decision works the same way. This chapter walks through four critical decisions with clear criteria for when to choose each option. You do not need to memorize every number – focus on understanding when each option makes sense and why.
32.2 Introduction
Designing a fog computing architecture requires navigating a series of interconnected decisions. Unlike cloud computing, where a single provider manages most infrastructure concerns, fog deployments distribute responsibility across edge devices, fog nodes, and cloud services – each with different capabilities, constraints, and failure modes.
This chapter examines four critical design tradeoffs that every fog architect must evaluate:
Packaging: containers vs virtual machines
Placement: edge vs fog processing
Redundancy: Active-Active vs Active-Passive
Replication: synchronous vs asynchronous
Each tradeoff involves balancing competing concerns: latency vs throughput, simplicity vs reliability, cost vs performance. The right choice depends on your specific requirements, constraints, and operational capacity.
32.3 Tradeoff 1: Containers vs Virtual Machines
The first decision in fog deployment is how to package and isolate services on fog nodes. Containers and virtual machines represent fundamentally different approaches to workload isolation.
Option A (Containers - Docker/Podman/K3s):
Startup time: 1-5 seconds (immediate service availability)
Resource overhead: 50-200 MB memory per container
Density: 10-50 services per fog node (e.g., 8 GB RAM Raspberry Pi)
Option B (Virtual Machines - KVM):
Startup time: 30-120 seconds (full guest OS boot)
Resource overhead: 512 MB-2 GB memory per VM
Density: 2-8 VMs per fog node
Worked comparison (4 GB fog node, four services): the VM approach spends \(512 + 512 + 512 + 512 = 2{,}048 \text{ MB}\) on guest-OS overhead alone, leaving only 1.95 GB (about 48%) for actual services – before running any workloads. The container approach spends \(4 \times 100 = 400 \text{ MB}\), leaving roughly 3.6 GB, about \(\frac{3{,}600}{1{,}950} \approx 1.85\times\) the headroom. On resource-constrained fog hardware, this is the difference between running 10-50 services and 2-8 VMs per node.
32.3.1 Interactive: Container vs VM Density Calculator
```{ojs}
viewof fog_ram_gb = Inputs.range([2, 64], {value: 8, step: 1, label: "Fog node RAM (GB)"})
viewof num_services = Inputs.range([1, 20], {value: 4, step: 1, label: "Number of services"})
viewof avg_service_mb = Inputs.range([50, 2000], {value: 300, step: 50, label: "Avg service memory (MB)"})
viewof container_overhead_mb = Inputs.range([20, 200], {value: 100, step: 10, label: "Container overhead per service (MB)"})
viewof vm_overhead_mb = Inputs.range([256, 2048], {value: 512, step: 128, label: "VM overhead per instance (MB)"})
```

```{ojs}
html`<div style="background: var(--bs-light, #f8f9fa); padding: 1rem; border-radius: 8px; border-left: 4px solid #3498DB; margin-top: 0.5rem;">
  <p><strong>Containers:</strong> ${density_calc.container_total} MB total (${density_calc.container_fits ? density_calc.container_headroom.toFixed(0) + "% headroom" : "DOES NOT FIT"})</p>
  <p><strong>VMs:</strong> ${density_calc.vm_total} MB total (${density_calc.vm_fits ? density_calc.vm_headroom.toFixed(0) + "% headroom" : "DOES NOT FIT"})</p>
  <p><strong>RAM saved by containers:</strong> ${density_calc.vm_total - density_calc.container_total} MB (${((density_calc.vm_total - density_calc.container_total) / density_calc.vm_total * 100).toFixed(0)}% less overhead)</p>
  <p style="color: ${density_calc.container_fits && !density_calc.vm_fits ? '#16A085' : '#2C3E50'}; font-weight: bold;">${
    !density_calc.container_fits ? "Neither approach fits — upgrade hardware or reduce services"
    : !density_calc.vm_fits ? "Only containers fit on this hardware — VMs exceed available RAM"
    : density_calc.container_headroom > 50 ? "Both fit. Containers recommended for better headroom."
    : "Both fit but tight. Consider upgrading hardware for 3x peak rule."}</p>
</div>`
```
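For readers working through this offline, the calculator's arithmetic can be reproduced in a few lines. This is a sketch only; the `density` function is a hypothetical helper whose defaults mirror the slider values above.

```python
def density(fog_ram_gb=8, num_services=4, avg_service_mb=300,
            container_overhead_mb=100, vm_overhead_mb=512):
    """Compare total RAM footprint of containers vs VMs for the same services."""
    ram_mb = fog_ram_gb * 1024
    container_total = num_services * (avg_service_mb + container_overhead_mb)
    vm_total = num_services * (avg_service_mb + vm_overhead_mb)
    return {
        "container_total_mb": container_total,   # 4 services -> 1600 MB
        "vm_total_mb": vm_total,                 # 4 services -> 3248 MB
        "container_fits": container_total <= ram_mb,
        "vm_fits": vm_total <= ram_mb,
        "ram_saved_by_containers_mb": vm_total - container_total,
    }

d = density()  # defaults: 8 GB node, 4 services of 300 MB each
```

On a 2 GB node (`density(fog_ram_gb=2)`), only the container packaging fits – the same conclusion the interactive widget reports.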
Decision Factors:
Choose Containers when: Resource-constrained fog hardware (Raspberry Pi, Jetson Nano), need rapid scaling and updates, microservices architecture, team has container expertise, deploying 10+ services per node
Choose VMs when: Regulatory requirements mandate strong isolation (healthcare, finance), running legacy Windows applications, need full OS customization, multi-tenant fog nodes serving different customers, security-critical workloads requiring separate kernels
Hybrid approach: VMs for tenant isolation, containers within each VM for application density - common in industrial fog deployments where each customer gets a VM with containerized services
32.4 Tradeoff 2: Edge Processing vs Fog Processing Placement
The second critical decision determines where computation happens: directly on the device (edge) or at a nearby shared server (fog). This is not an either/or choice – most production systems use both, with a clear split based on latency requirements and computational complexity.
Tradeoff: Edge Processing vs Fog Processing Placement
Option A (Edge Processing - On-Device/Gateway):
Latency: 1-5ms (no network hop)
Bandwidth to fog/cloud: Minimal (only alerts/aggregates sent upstream)
Option B (Fog Processing - Nearby Shared Server):
Latency: 5-20ms (one local network hop)
Cost per compute unit: $500-5,000 per fog node serving 100+ devices (amortized across the fleet)
Decision Factors:
Choose Edge when: Safety-critical with <5ms requirement (collision avoidance, emergency shutoff), battery-powered devices cannot tolerate network latency, privacy requires data never leave device (medical wearables), network unreliable (rural, mobile, satellite)
Choose Fog when: ML inference needs GPU acceleration (video analytics, speech recognition), aggregation across multiple sensors required (anomaly correlation), regulatory compliance needs audit logging (industrial, healthcare), devices too constrained for local processing (low-cost sensors)
Split strategy: Edge handles threshold-based alerts (temperature > 80C = local shutoff), fog handles complex analytics (predict failure in 2 hours based on vibration patterns) - this hybrid approach optimizes for both latency-critical safety and compute-intensive intelligence
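The split strategy can be sketched as a simple routing rule. The threshold and names below are illustrative (the 80 C shutoff is the example from the text; `fog_queue` stands in for a real upstream buffer):

```python
EDGE_SHUTOFF_C = 80.0  # safety threshold handled locally, no network hop

def route_reading(temp_c, fog_queue):
    """Edge handles the safety-critical threshold; everything else goes to fog."""
    if temp_c > EDGE_SHUTOFF_C:
        # Local action in <5 ms; works even when the fog node is unreachable.
        return "edge_shutoff"
    # Non-critical reading: forward for cross-sensor analytics at the fog tier.
    fog_queue.append(temp_c)
    return "forwarded_to_fog"

readings = []
route_reading(85.0, readings)  # local shutoff, nothing forwarded upstream
route_reading(42.0, readings)  # forwarded to the fog tier for correlation
```

The key design property: the safety path never depends on the network, so a fog outage degrades analytics but not protection.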
32.5 Tradeoff 3: Active-Active vs Active-Passive Redundancy
Fog nodes are physical hardware in uncontrolled environments – they fail. The question is not whether your fog node will fail, but how fast the system recovers when it does. This tradeoff determines the failover architecture.
Tradeoff: Active-Active vs Active-Passive Fog Node Redundancy
Option A (Active-Active Deployment):
Availability: 99.99% (4 nines) with two nodes, 99.999% with three
Failover time: 0-50ms (instant, no switchover needed - both nodes serve traffic)
Option B (Active-Passive Deployment):
Failover time: 30-120 seconds (standby must detect the failure, then assume the active role)
Hardware utilization: the standby node sits idle until failover
Choose Active-Active when: Zero-tolerance for failover latency (autonomous vehicles, industrial safety), need to maximize throughput from hardware investment, team has distributed systems expertise, workload is stateless or uses distributed state stores (Redis Cluster, CockroachDB)
Choose Active-Passive when: Simpler operations are priority (small team, limited expertise), stateful workloads difficult to synchronize (legacy applications, file-based state), cost of idle standby acceptable for operational simplicity, failover time of 30-120 seconds is tolerable for the use case
Hybrid approach: Active-Active for stateless API gateways and message routing, Active-Passive for stateful databases and ML model serving - this balances complexity with availability requirements
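The availability figures quoted for Active-Active follow from treating node failures as independent. This back-of-envelope sketch shows the model; note that the naive formula predicts six nines for three 99% nodes, while real deployments (and this chapter) quote lower figures because failures correlate across shared power and network.

```python
def cluster_availability(node_availability, nodes):
    """P(at least one node up), assuming independent node failures."""
    return 1 - (1 - node_availability) ** nodes

two_nodes = cluster_availability(0.99, 2)    # 0.9999 -> "four nines"
three_nodes = cluster_availability(0.99, 3)  # naive model: 0.999999
```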
32.6 Tradeoff 4: Synchronous vs Asynchronous Replication
The final critical tradeoff governs how data moves from fog nodes to the cloud. Synchronous replication guarantees consistency but blocks operations; asynchronous replication enables high throughput but risks data loss during failures.
Tradeoff: Synchronous vs Asynchronous Replication for Fog-to-Cloud Data
Option A (Synchronous Replication):
Data consistency: Strong - cloud has exact copy of fog data at all times
RPO (Recovery Point Objective): 0 seconds (zero data loss on fog node failure)
RTO (Recovery Time Objective): 10-60 seconds (cloud already has current state)
Write latency impact: +50-200ms per write (must wait for cloud acknowledgment)
Throughput ceiling: Limited by WAN bandwidth and latency (typically 100-1000 writes/sec)
Network dependency: Critical - fog operations block if cloud unreachable
Failure mode: Fog node stops accepting writes when cloud connection fails
Use cases: Financial transactions, safety-critical audit logs, compliance records
Option B (Asynchronous Replication):
Data consistency: Eventual - cloud may lag fog by seconds to minutes
RPO: 1-60 seconds typical (data in flight at time of failure may be lost)
RTO: 60-300 seconds (must replay queued data, potentially from backup)
Throughput ceiling: Limited only by fog node capacity (10,000+ writes/sec possible)
Network dependency: Low - fog continues operating during cloud outages
Failure mode: Data accumulates locally during outage, syncs when connection restores
Use cases: Telemetry, metrics, non-critical sensor data, ML training datasets
Decision Factors:
Choose Synchronous when: Regulatory compliance requires zero data loss (HIPAA, SOX), financial transactions where inconsistency means liability, safety systems where cloud must have real-time state for emergency coordination, data value exceeds latency cost
Choose Asynchronous when: High write throughput required (>1000 writes/sec), fog-to-cloud latency is high or variable (satellite, cellular), fog must operate autonomously during cloud outages, telemetry data where 30-second RPO is acceptable
Tiered approach: Synchronous for critical events (alerts, transactions) with separate high-priority queue, asynchronous for bulk telemetry - this optimizes both reliability and throughput within the same system
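A tiered replicator can be sketched with two paths: a blocking synchronous write for critical events and a local queue for bulk telemetry. The class and the list-based `cloud` below are in-memory stand-ins for a real WAN endpoint and a durable on-disk queue.

```python
import queue

class TieredReplicator:
    """Sync path for critical events, async queue for bulk telemetry (sketch)."""

    def __init__(self, cloud):
        self.cloud = cloud            # stand-in for the cloud endpoint
        self.backlog = queue.Queue()  # would be a durable queue in production

    def write(self, record):
        if record.get("critical"):
            # Synchronous: block until the cloud acknowledges (RPO = 0).
            self.cloud.append(record)
            return "acked"
        # Asynchronous: enqueue locally, ship later (RPO = queue lag).
        self.backlog.put(record)
        return "queued"

    def flush(self):
        """Drain the async backlog when connectivity allows; returns count shipped."""
        n = 0
        while not self.backlog.empty():
            self.cloud.append(self.backlog.get())
            n += 1
        return n

cloud = []
r = TieredReplicator(cloud)
r.write({"critical": True, "event": "overtemp_alert"})  # blocks until acked
r.write({"critical": False, "metric": "vibration"})     # queued locally
r.flush()                                               # ships the backlog
```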
32.7 Visual Reference Gallery
Fog Computing Visualizations
These AI-generated figures provide alternative visual representations of fog computing concepts covered in this chapter.
32.7.1 Fog Node Architecture
Fog Node Architecture
32.7.2 Cloud-Edge Integration
Cloud Edge
32.7.3 Continuum Architecture
Computing Continuum
32.8 Common Pitfalls and Misconceptions
Common Pitfalls and Misconceptions in Fog Architecture
Sizing for average load instead of peak: A Raspberry Pi 4 handles 50 sensors at 200 msg/sec, but when all 50 report anomalies simultaneously the burst can reach 1,000 msg/sec and crash the node. Always size fog hardware for 3x expected peak load using: sensors x msg_rate x peak_multiplier x safety_margin.
Treating fog nodes like cloud infrastructure: Fog nodes sit on factory floors, utility poles, and retail stores – not air-conditioned data centers. They face power outages, overheating, theft, and network partitions. Design every edge device to degrade gracefully when its fog node disappears.
Over-engineering the fog tier: 80% of IoT applications with >100ms latency tolerance work fine with direct edge-to-cloud. Adding a fog layer for a temperature monitoring system that reports every 60 seconds introduces unnecessary hardware cost, maintenance burden, and failure points without measurable benefit.
Manual SSH-based fleet management: Managing 50 fog nodes by hand works until version drift causes 12 nodes to run v2.1 while 38 run v2.3, and failed updates require expensive on-site visits. Adopt Ansible, Terraform, or Balena from day one with canary deployments (1 node, then 10%, then 90%).
Choosing pure synchronous or pure asynchronous replication: Synchronous-only blocks operations during cloud outages (ATMs become unusable). Asynchronous-only risks data loss if a fog node fails before sync completes (regulatory violations). Use tiered replication: synchronous to local persistent storage, asynchronous to cloud with guaranteed delivery queues.
Even with the right tradeoff decisions, fog deployments can fail due to operational mistakes. The following pitfalls are drawn from real production incidents and represent the most frequent causes of fog system failures.
Common Pitfall: Fog Node Overload
The mistake: Deploying fog nodes without capacity planning, leading to resource exhaustion when device counts grow or workloads spike.
Symptoms:
Fog node CPU pegged at 100% during peak hours
Message queue backlogs growing unbounded
Latency increases from 10ms to 500ms+ under load
Out-of-memory crashes causing data loss
Edge devices timing out waiting for fog responses
Why it happens: Teams size fog hardware for average load, not peak load. A Raspberry Pi handles 50 sensors fine, but struggles when all 50 report anomalies simultaneously. Growth from 50 to 200 sensors happens gradually until sudden failure.
The fix:
```yaml
# Fog Node Capacity Planning
hardware_sizing:
  rule_of_thumb: "Size for 3x expected peak load"
  example_calculation:
    sensors: 100
    messages_per_sensor_per_second: 1
    peak_multiplier: 5    # Anomaly events trigger bursts
    safety_margin: 2
    required_capacity: "100 * 1 * 5 * 2 = 1000 msg/sec"

hardware_benchmarks:
  raspberry_pi_4: "~200 msg/sec with local processing"
  intel_nuc_i5: "~2000 msg/sec with ML inference"
  industrial_gateway: "~5000 msg/sec with redundancy"

overload_protection:
  - Implement backpressure (reject new connections when queue > threshold)
  - Priority queuing (critical alerts processed first)
  - Load shedding (drop low-priority telemetry during overload)
  - Horizontal scaling (add fog nodes, partition by device groups)

monitoring:
  - CPU utilization (alert at 70%, critical at 85%)
  - Memory usage (alert at 75%)
  - Message queue depth (alert at 1000 messages)
  - Processing latency P99 (alert at 100ms)
```
Prevention: Benchmark fog node capacity before deployment using realistic traffic generators. Implement graceful degradation (shed load before crashing). Monitor resource utilization continuously and set alerts well below failure thresholds. Plan for horizontal scaling before vertical limits are reached.
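The overload protections above – backpressure and priority-based load shedding – can be sketched with a bounded priority queue. The class name and thresholds are illustrative; a production node would persist the queue and meter by bytes, not message count.

```python
import heapq

class BoundedPriorityQueue:
    """When full: shed the least-urgent message, or reject the newcomer."""

    def __init__(self, max_depth):
        self.max_depth = max_depth
        self._heap = []   # (priority, seq, msg); lower priority value = more urgent
        self._seq = 0

    def offer(self, priority, msg):
        """Accept a message, applying load shedding / backpressure at capacity."""
        if len(self._heap) >= self.max_depth:
            worst = max(self._heap)               # least-urgent queued message
            if (priority, self._seq, msg) >= worst:
                return False                      # backpressure: reject newcomer
            self._heap.remove(worst)              # load shedding: drop low priority
            heapq.heapify(self._heap)
        heapq.heappush(self._heap, (priority, self._seq, msg))
        self._seq += 1
        return True

    def take(self):
        return heapq.heappop(self._heap)[2]       # most urgent first

q = BoundedPriorityQueue(max_depth=2)
q.offer(5, "telemetry-a")
q.offer(5, "telemetry-b")
q.offer(0, "critical-alert")  # queue full: a telemetry message is shed
```

Critical alerts always get through; during overload it is bulk telemetry that is sacrificed, never safety traffic.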
Common Pitfall: Fog Orchestration Complexity
The mistake: Underestimating the operational complexity of managing distributed fog infrastructure, leading to configuration drift, update failures, and inconsistent behavior across nodes.
Symptoms:
Different fog nodes running different software versions
Configuration changes applied inconsistently across fleet
Failed updates leave nodes in broken states
No visibility into which nodes have which capabilities
Why it happens: Cloud infrastructure has mature tooling (Kubernetes, Terraform). Fog/edge environments lack equivalent maturity. Teams start with manual SSH-based management, which doesn’t scale past 10-20 nodes. Geographic distribution and unreliable connectivity complicate remote management.
The fix:
```yaml
# Fog Orchestration Strategy (Ansible example)
infrastructure_as_code:
  tool: "Ansible, Terraform, or Balena"
  principle: "Every fog node config is version-controlled"
  example_tasks:
    - service: name=docker state=started
    - docker_container:
        name: fog_processor
        image: "registry.local/fog:{{ version }}"
        restart_policy: always

update_strategy:
  approach: "Canary: 1 node -> 10% -> remaining 90%"
  rollback: "Automatic if health checks fail"

fleet_management:
  grouping: "By location, capability, criticality"
  health_checks: [heartbeat 60s, version, CPU/RAM, connectivity]

observability:
  logging: "Centralized aggregator"
  metrics: "Prometheus / Grafana"
  alerting: "PagerDuty for critical failures"
```
Prevention: Treat fog infrastructure with the same rigor as cloud infrastructure. Adopt configuration management tools from day one, not after scale problems emerge. Implement health monitoring and automated remediation. Design for nodes being unreachable (queued updates applied on reconnection). Test disaster recovery: what happens if 30% of fog nodes fail simultaneously?
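The canary pattern above (1 node, then 10%, then the remaining 90%) limits blast radius; one way to compute the wave sizes for a fleet is sketched below. The function name is illustrative.

```python
import math

def canary_waves(fleet_size):
    """Node counts per rollout wave: 1 canary, then ~10% of fleet, then the rest."""
    if fleet_size <= 1:
        return [fleet_size]
    wave1 = 1
    wave2 = min(max(math.ceil(fleet_size / 10), 1), fleet_size - wave1)
    wave3 = fleet_size - wave1 - wave2
    return [w for w in (wave1, wave2, wave3) if w > 0]

canary_waves(50)  # one canary, a 10% wave, then the remaining nodes
```

Pair each wave with the automatic-rollback rule from the YAML above: a failed health check in wave 1 stops the rollout before 49 nodes ever see the bad version.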
Pitfall: Over-Engineering Fog Tiers for Simple Workloads
The Mistake: Teams implement complex three-tier fog architectures (edge-fog-cloud) with sophisticated workload orchestration for applications that would work fine with simple edge-to-cloud connectivity, adding unnecessary latency hops, maintenance burden, and failure points.
Why It Happens: Fog computing papers and vendor marketing emphasize multi-tier architectures. Teams apply “best practice” templates without analyzing whether their specific latency, bandwidth, or autonomy requirements actually justify fog infrastructure. A temperature monitoring system with 1-minute reporting intervals doesn’t need sub-10ms fog processing.
The Fix: Right-size your architecture based on actual requirements:
Start simple: direct edge-to-cloud works for 80% of IoT applications with >100ms latency tolerance
Add fog nodes only when you can quantify the benefit: latency requirements <50ms, bandwidth savings >10x, offline autonomy >1 hour
Calculate total cost of ownership: fog nodes add hardware, power, maintenance, and networking costs
Evaluate cloud-edge hybrid options: modern cloud services (AWS Greengrass, Azure IoT Edge) provide fog-like capabilities without dedicated hardware
Design for horizontal scaling: add fog capacity as requirements grow, don’t pre-deploy for hypothetical scale
Pitfall: Assuming Fog Nodes Are Always Available
The Mistake: Architects design fog systems assuming fog nodes will have 99.9%+ uptime like cloud services, then experience cascading failures when fog hardware fails, loses power, or becomes unreachable due to network partitions.
Why It Happens: Cloud services achieve high availability through massive redundancy invisible to users. Fog nodes are physical hardware in less controlled environments: factory floors, utility poles, retail stores, vehicle compartments. They face power outages, hardware failures, theft, vandalism, environmental damage, and network isolation that cloud data centers are designed to prevent.
The Fix: Design for fog node failure as a normal operating condition:
Implement graceful degradation: edge devices should operate (possibly with reduced functionality) when their fog node is unreachable
Deploy N+1 redundancy for critical fog functions: if one fog node fails, another can assume its workload
Use quorum-based decisions: require 2 of 3 fog nodes to agree before taking critical actions
Buffer data at edge: if fog is unavailable, queue data locally until connectivity returns (with priority-based buffer management)
Monitor fog node health: detect failures in <60 seconds and alert operators or trigger automatic failover
Test failure scenarios: simulate fog node crashes, network partitions, and power loss during system validation
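The quorum rule above can be sketched directly. This is a minimal majority vote for illustration; real systems coordinate through a consensus protocol (e.g. Raft) rather than counting booleans.

```python
def quorum_decision(votes, quorum=2):
    """Take the critical action only if at least `quorum` fog nodes agree."""
    return sum(votes) >= quorum

# 2-of-3 agreement required before an emergency action:
quorum_decision([True, True, False])   # acted on: two nodes agree
quorum_decision([True, False, False])  # blocked: the lone node may be faulty
```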
32.9 Tradeoff Decision Matrix
Use this consolidated reference when evaluating fog architecture options:
| Tradeoff | Option A | Option B | Hybrid Approach |
|----------|----------|----------|-----------------|
| Packaging | Containers (lightweight, fast startup) | VMs (strong isolation, legacy support) | VMs for tenants, containers within |
| Processing | Edge (sub-5ms, single device) | Fog (5-20ms, multi-sensor) | Edge for safety, fog for analytics |
| Redundancy | Active-Active (instant failover) | Active-Passive (simple ops) | A-A for stateless, A-P for stateful |
| Replication | Synchronous (zero data loss) | Asynchronous (high throughput) | Sync for critical, async for bulk |
32.10 Summary
This chapter examined four critical design tradeoffs in fog computing architecture, along with common pitfalls that undermine fog deployments:
32.10.1 Key Tradeoffs
Containers vs Virtual Machines: Containers provide lightweight isolation (50-200 MB overhead, 1-5s startup) ideal for resource-constrained fog hardware like Raspberry Pi. VMs provide strong isolation (512 MB-2 GB overhead, 30-120s startup) required for regulatory compliance and multi-tenant environments. Most production deployments use a hybrid: VMs for tenant isolation, containers within each VM for application density.
Edge vs Fog Processing Placement: Edge processing achieves sub-5ms latency for safety-critical functions (collision avoidance, emergency shutoffs) on single devices. Fog processing provides 5-20ms latency with multi-sensor correlation for predictive analytics. The optimal split puts threshold-based alerts at the edge and compute-intensive intelligence at the fog layer.
Active-Active vs Active-Passive Redundancy: Active-Active provides 0-50ms failover with 99.99% availability but requires distributed state synchronization. Active-Passive offers simpler operations with 30-120s failover. Choose Active-Active for zero-downtime requirements; Active-Passive when operational simplicity outweighs failover speed.
Synchronous vs Asynchronous Replication: Synchronous replication guarantees zero data loss (RPO=0) but adds 50-200ms write latency and blocks during cloud outages. Asynchronous replication provides zero write latency impact and offline operation but risks 1-60s of data loss. A tiered approach (sync for critical events, async for bulk telemetry) balances both concerns.
32.11 Worked Example: Fog Node Sizing with the 3x Peak Rule
Worked Example: Container vs VM Density on a Retail Store Fog Gateway
Scenario: A retail chain deploys fog gateways (Intel NUC, 8 GB RAM, 4-core i5) in 200 stores. Each store has 40 BLE beacons (customer tracking), 12 IP cameras (loss prevention), and 8 POS terminals sending transaction events. The fog gateway runs three workloads: beacon aggregation, video thumbnail extraction, and real-time promotion engine.
Steps 1-2: Compare RAM footprints, then apply the 3x peak rule to CPU:

| Check | Calculation | Verdict | Notes |
|-------|-------------|---------|-------|
| Containers (Docker) | 1.75 GB workload + container/host overhead ≈ 2.45 GB | Fits, ~3.3x headroom | Lightweight. All three workloads fit with 5.55 GB free for burst/buffering. |
| VMs (KVM) | 1.75 GB workload + 3 x 512 MB VM overhead + 1 GB host OS = 4.29 GB | Yes, but only 1.9x headroom | Each VM adds 512 MB for guest OS. Less room for peak bursts. |
| 3x peak rule | Peak total = 4.0 cores. Need 3x = 12 cores. Have 4 cores. | Fails CPU test | 4 cores cannot handle 3x peak. |
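The RAM arithmetic can be checked in a few lines. One assumption is labeled explicitly: the 2.45 GB container total is derived from the quoted 5.55 GB free on the 8 GB gateway, since the text does not break down container overhead itself.

```python
RAM_GB = 8.0
WORKLOAD_GB = 1.75           # beacon aggregation + thumbnails + promo engine
VM_OVERHEAD_GB = 3 * 0.512   # one 512 MB guest OS per workload VM
HOST_OS_GB = 1.0             # KVM host

vm_total = WORKLOAD_GB + VM_OVERHEAD_GB + HOST_OS_GB  # ~4.29 GB
container_total = RAM_GB - 5.55                        # ~2.45 GB (from the quoted free RAM)
ram_saved = vm_total - container_total                 # ~1.84 GB, as stated in the result
vm_headroom = RAM_GB / vm_total                        # ~1.9x
```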
Step 3: Resolve the CPU Bottleneck
The 3x rule reveals that Black Friday peaks (4 cores needed) leave zero headroom on a 4-core NUC. Options:
| Solution | Cost | Result |
|----------|------|--------|
| Upgrade to 6-core NUC (i7) | +$120/store = $24,000 fleet | 1.5x headroom (still below 3x) |
| Cloud burst for video thumbnails | +$15/month/store = $36,000/year | Offload peak video to cloud. Fog handles beacon + promos. |
| Reduce video processing on peak days | $0 | Process every 3rd frame instead of every frame. Acceptable for loss prevention. |
Result: The team chooses containers (not VMs – saving 1.84 GB RAM) plus adaptive frame-skip during peaks. Cost: $0 additional hardware. On normal days, all 3 workloads run on the 4-core NUC at 38% utilization. On Black Friday, the video extractor drops to 1/3 frame rate, keeping total CPU under 2.5 cores (62% utilization with buffer). This avoids the $24,000 hardware upgrade while maintaining all three services.
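The frame-skip arithmetic can be sketched as follows. Note one loudly-labeled assumption: the text gives only the 4.0-core peak total, so the split between video and the other two workloads below is hypothetical (chosen to reproduce the quoted 2.5-core result).

```python
# Hypothetical split of the 4.0-core Black Friday peak (not given in the text):
VIDEO_PEAK_CORES = 2.25   # thumbnail extraction at full frame rate (assumed)
OTHER_PEAK_CORES = 1.75   # beacon aggregation + promotion engine (assumed)
SKIP_FACTOR = 3           # process every 3rd frame on peak days

peak_with_skip = OTHER_PEAK_CORES + VIDEO_PEAK_CORES / SKIP_FACTOR  # 2.5 cores
utilization = peak_with_skip / 4                                     # 62% of the 4-core NUC
```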
Key insight: The 3x rule is a sizing TARGET, not a hard requirement. When you cannot hit 3x, design graceful degradation (frame-skip, cloud burst) for the workload most tolerant of reduced quality.
32.11.1 Common Pitfalls
Fog Node Overload: Size hardware for 3x expected peak load, not average load. Implement backpressure, priority queuing, and load shedding before exhaustion.
Orchestration Complexity: Treat fog infrastructure with cloud-level rigor from day one. Use infrastructure-as-code (Ansible, Terraform) and canary deployments.
Over-Engineering: Start with direct edge-to-cloud for the 80% of IoT applications that tolerate >100ms latency. Add fog only when requirements justify the complexity.
Availability Assumptions: Design for fog node failure as a normal operating condition, not an exceptional event. Edge devices must degrade gracefully when their fog node is unreachable.
32.11.2 Self-Assessment Checklist
Before moving on, ensure you can:
Compare containers vs VMs, edge vs fog processing, and redundancy models across latency, reliability, cost, and complexity dimensions
Apply the 3x peak load sizing rule to determine fog node hardware specifications for a given sensor deployment
Select between Active-Active and Active-Passive redundancy based on availability SLAs and team expertise
Choose synchronous, asynchronous, or tiered replication based on RPO targets and offline operation requirements
Identify fog node overload, orchestration complexity, over-engineering, and availability assumption failures from real deployment symptoms
32.12 Knowledge Check
Quiz: Fog Design Tradeoffs
32.13 What’s Next
Apply your understanding of fog computing tradeoffs: