54 Edge Architecture Review
54.1 Learning Objectives
By the end of this chapter, you will be able to:
- Explain the Edge-Fog-Cloud Continuum: Describe how data flows through multiple processing tiers with increasing latency but decreasing bandwidth requirements
- Apply the Seven-Level IoT Reference Model: Map processing capabilities, latency characteristics, and use cases to each level
- Identify Processing Trade-offs: Evaluate latency, bandwidth, processing power, and cost at each architectural tier
- Design Tiered Architectures: Apply the golden rule of edge computing to determine optimal processing placement
Key Concepts
- Edge processing hierarchy: The ordered set of processing capabilities from sensor (simplest) to microcontroller to gateway to fog server to cloud (most powerful), each tier handling more complex tasks.
- Resource provisioning: Allocating sufficient memory, CPU, and network bandwidth at each edge tier for the processing tasks assigned to it, with safety margins for unexpected load spikes.
- Edge orchestration: Automated management of which tasks run on which edge nodes, including dynamic reallocation when nodes fail or become overloaded, typically using Kubernetes at the edge (K3s, KubeEdge).
- Stateless vs stateful edge processing: Stateless edge operations (format conversion, threshold check) can be run on any node without coordination; stateful operations (running averages, anomaly models) require state management and careful placement.
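The stateless/stateful distinction is easiest to see in code. The sketch below is illustrative (the names are made up, not from any particular framework): a stateless threshold check can run on any node with no coordination, while a stateful running average carries a window that must live on one specific node.

```python
from collections import deque


def threshold_check(reading_c: float, limit_c: float = 50.0) -> bool:
    """Stateless: any edge node can evaluate this, no coordination needed."""
    return reading_c > limit_c


class RunningAverage:
    """Stateful: the sample window lives on one node, so placement,
    migration, and failover all require explicit state management."""

    def __init__(self, window: int = 10):
        self.samples = deque(maxlen=window)  # oldest sample drops automatically

    def update(self, reading: float) -> float:
        self.samples.append(reading)
        return sum(self.samples) / len(self.samples)


avg = RunningAverage(window=3)
for r in (48.0, 50.0, 52.0):
    current = avg.update(r)
print(threshold_check(current))  # average is 50.0, not > 50.0 -> False
```

If the node holding `RunningAverage` fails, its window is lost unless the state is checkpointed or replicated; the stateless check can simply be re-run anywhere.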
54.2 Prerequisites
Before studying this chapter, complete:
- Edge Compute Patterns - Core edge architectures
- Edge Data Acquisition - Data collection at the edge
- IoT Reference Models - Architecture foundations
For Beginners: Understanding the Edge Computing Continuum
Think of edge computing like a postal system:
- Edge (Level 1-3): Your local post office - handles urgent letters quickly, sorts mail before sending
- Fog (Level 4-5): Regional distribution center - accumulates mail from multiple local offices, makes routing decisions
- Cloud (Level 6-7): National headquarters - handles complex logistics, long-term records, cross-country coordination
Data flows the same way: urgent processing happens locally, while complex analysis travels to centralized systems.
54.3 Edge-Fog-Cloud Architecture Overview
The following diagram illustrates the complete edge computing continuum, showing how data flows from sensors through multiple processing tiers to the cloud, with increasing latency but decreasing bandwidth requirements at each level.
54.4 Seven-Level IoT Reference Model
The following table summarizes the seven-level IoT reference model, mapping each level to its processing capabilities, latency characteristics, and typical use cases.
| Level | Name | Processing Capabilities | Latency | Data Volume | Typical Use Cases |
|---|---|---|---|---|---|
| Level 1 | Physical Devices & Controllers | Raw sensing, basic actuation, signal conditioning | <1 ms | Very High (GB/hour) | Sensor sampling, emergency shutoffs, real-time control |
| Level 2 | Connectivity | Protocol translation, network routing, device addressing | <1 ms | Very High | Data transmission, network protocols, device communication |
| Level 3 | Edge Computing (Fog) | Evaluation (filtering), Formatting (standardization), Distillation (aggregation), Assessment (thresholds) | 1-10 ms | High (MB/hour) | Data reduction (100-1000x), downsampling, statistical aggregation, anomaly detection |
| Level 4 | Data Accumulation | Time-series storage, buffer management, data persistence | 10-100 ms | Medium | Local databases, recent data cache, query processing |
| Level 5 | Data Abstraction | Data modeling, semantic integration, format conversion | 10-100 ms | Medium | Data normalization, schema mapping, API abstraction |
| Level 6 | Application | Business logic, analytics, ML inference | 100-500 ms | Low (MB/day) | Dashboards, reporting, predictive models |
| Level 7 | Collaboration & Processes | Cross-system integration, workflow automation, enterprise services | 100-500 ms | Very Low | ERP integration, business processes, multi-tenant services |
54.4.1 Key Data Reduction Example
Scenario: 500 vibration sensors, 1 kHz sampling, 16-byte readings
| Processing Stage | Data Rate | Cumulative Reduction | Operations Applied |
|---|---|---|---|
| Raw Sensors (Level 1) | 28.8 GB/hour | Baseline | None |
| After Downsampling (Level 3) | 288 MB/hour | 100x | Frequency: 1 kHz to 10 Hz |
| After Aggregation (Level 3) | 2 MB/hour | ~14,400x | Spatial grouping (100 sensors per group) + statistical summarization at 1 Hz |
Cost Impact: ~$25,000/year savings in cloud ingress costs (@$0.10/GB)
Putting Numbers to It
The ~14,400x data reduction ratio comes from chaining two processing stages at the edge gateway:
\[\text{Total Reduction} = \text{Downsampling Ratio} \times \text{Aggregation Ratio}\]
For the vibration sensor example:
Stage 1 - Temporal Downsampling (100x):
\[500 \text{ sensors} \times 1000 \text{ Hz} \times 16 \text{ B} = 8{,}000{,}000 \text{ B/s} = 28.8 \text{ GB/hr}\]
Downsample each sensor from 1000 Hz to 10 Hz:
\[500 \text{ sensors} \times 10 \text{ Hz} \times 16 \text{ B} = 80{,}000 \text{ B/s} = 288 \text{ MB/hr}\]
Stage 2 - Spatial Aggregation (~144x):
Group the 500 sensors into 5 groups of 100. For each group, compute a statistical summary (min, max, mean, std dev, median, peak frequency, count – 7 metrics at 16 bytes each = 112 bytes) once per second:
\[5 \text{ groups} \times 1 \text{ Hz} \times 112 \text{ B} = 560 \text{ B/s} \approx 2 \text{ MB/hr}\]
Each group reduces 100 sensors at 10 Hz (16,000 B/s input) to a 112-byte summary (112 B/s output), a ~143x reduction per group.
\[R_{total} = 100 \times 143 \approx 14{,}300\times\]
(The headline ~14,400x figure reflects the rounded 2 MB/hr output; the exact ratio is 8,000,000 B/s ÷ 560 B/s ≈ 14,286x.)
Annual cost impact at $0.10/GB cloud ingress:
- Without edge: \(28.8 \text{ GB/hr} \times 24 \text{ hr} \times 365 \text{ days} \times \$0.10 = \$25{,}228/\text{year}\)
- With edge: \(0.002 \text{ GB/hr} \times 24 \text{ hr} \times 365 \text{ days} \times \$0.10 = \$1.75/\text{year}\)
- Savings: ~$25,227/year (99.99% cost reduction)
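The whole chain can be checked with a few lines of arithmetic. This back-of-the-envelope script simply restates the chapter's numbers (decimal GB, 365-day year, and a flat $0.10/GB ingress rate are the assumptions):

```python
SENSORS, RAW_HZ, EDGE_HZ, SAMPLE_BYTES = 500, 1000, 10, 16
GROUPS, SUMMARY_HZ, SUMMARY_BYTES = 5, 1, 112  # 7 metrics x 16 B per group

raw_bps = SENSORS * RAW_HZ * SAMPLE_BYTES           # 8,000,000 B/s
downsampled_bps = SENSORS * EDGE_HZ * SAMPLE_BYTES  # 80,000 B/s (100x)
summary_bps = GROUPS * SUMMARY_HZ * SUMMARY_BYTES   # 560 B/s

total_reduction = raw_bps / summary_bps             # ~14,286x


def dollars_per_year(bytes_per_s: float, rate_per_gb: float = 0.10) -> float:
    # Decimal GB (1e9 bytes), flat ingress rate, 365-day year
    return bytes_per_s * 3600 * 24 * 365 / 1e9 * rate_per_gb


savings = dollars_per_year(raw_bps) - dollars_per_year(summary_bps)
print(f"{total_reduction:,.0f}x reduction, ~${savings:,.0f}/year saved")
# -> 14,286x reduction, ~$25,227/year saved
```

Real cloud pricing is tiered and varies by provider and region, so treat the dollar figure as an order-of-magnitude estimate, not a quote.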
54.4.2 Edge Data Reduction Calculator
Use this interactive calculator to explore how different sensor configurations affect data reduction and cost savings.
54.4.3 Processing Trade-off Summary
| Factor | Edge Layer | Fog Layer | Cloud Layer |
|---|---|---|---|
| Latency | <1 ms (Best) | 10-100 ms (Moderate) | 100-500 ms (Highest) |
| Bandwidth | Very High (Worst) | Medium | Low (Best) |
| Processing Power | Limited | Moderate | Unlimited |
| Data Retention | Seconds-Minutes | Hours-Days | Unlimited |
| Cost per Node | Low | Medium | High (centralized) |
| Scalability | Distributed | Regional | Global |
| Use Cases | Real-time control, safety | Analytics, ML inference | Training, long-term storage |
54.5 Architecture Design Principle
The Golden Rule of Edge Computing: Process data as close to the source as possible, but only as close as necessary.
- Edge (Level 1-3): Latency-critical operations, data reduction, real-time decisions
- Fog (Level 4-5): Regional analytics, ML inference, medium-term storage
- Cloud (Level 6-7): Deep analytics, model training, historical analysis, global coordination
54.6 Knowledge Check: Architecture Concepts
## Edge-Cloud Symbiosis: Why Hybrid Architectures Win {#edge-review-arch-hybrid}
A common misconception is that edge computing means devices process data completely independently without cloud connectivity. In reality, edge computing is about intelligent data reduction and latency-critical processing, not replacing the cloud.
The Misconception: Many students believe edge computing means fully autonomous devices that never need the cloud.
Reality – Hybrid Edge-Cloud Model:
- Edge processing: ~95% of raw data volume (filtered and aggregated locally)
- Cloud transmission: ~5% of data (summaries, anomalies, important events)
- Cloud computation: ~80% of ML training workload (requires historical fleet-wide data)
- Edge inference: ~20% of ML workload (simple threshold-based and lightweight model decisions)
When to Use Each:
- Edge: Real-time safety shutdowns (<10ms), data reduction (100-1000x), privacy filtering
- Cloud: ML model training, historical analytics, cross-site correlation, firmware updates
- Wrong approach: Trying to do all ML training on edge devices, or sending all raw sensor data to cloud
Cost Impact of Misconception: Companies over-investing in edge infrastructure waste $50,000-$200,000 per deployment site, while companies under-utilizing edge spend $25,000-$80,000/year in unnecessary cloud ingress costs.
The edge-cloud relationship is symbiotic, not competitive. The following table illustrates how hybrid approaches outperform either tier alone:
| Capability | Edge Strength | Cloud Strength | Hybrid Approach |
|---|---|---|---|
| Anomaly Detection | Rules-based, <1ms latency | ML-based, learns new patterns | Edge: immediate alerts; Cloud: model retraining |
| Model Accuracy | Fixed rules, ~85% accuracy | Continuous learning, ~98% accuracy | Edge runs cloud-trained models |
| Failure Modes | Detects known signatures only | Discovers unknown patterns | Cloud identifies new failure types, pushes to edge |
| Resource Usage | 2 GB RAM, 10W power | 128 GB RAM, 500W GPUs | Edge inference, cloud training |
| Data Requirements | Single site | Fleet-wide (1000+ sites) | Edge sends summaries, cloud correlates |
Case Study: Hybrid Predictive Maintenance
Phase 1 – Edge-Only Deployment:
A manufacturing company deployed edge-only predictive maintenance:
- Edge gateway: Rule-based anomaly detection
- 6 months operation: Detected 18 known failure modes successfully
- Problem: Missed 3 novel failure types (bearing cage fracture, new vibration signature)
- Cost of missed failures: $220,000 (unplanned downtime)
Phase 2 – Adding Cloud-in-the-Loop:
- Edge: Continues real-time detection (<1ms)
- Cloud: Trains on 12 months of fleet data (50 factories)
- Discovers 5 new failure signatures
- Pushes updated models to all edge gateways monthly
- Year 2: Detected 23 failure modes (5 discovered by cloud ML)
- Prevented failures: $180,000 saved
- Cloud training cost: $12,000/year
- Net benefit: $168,000/year
Quantified Benefits of Hybrid Edge-Cloud:
| Metric | Edge-Only | Hybrid Edge-Cloud | Improvement |
|---|---|---|---|
| Detection latency | <1ms | <1ms | No change (edge handles) |
| False positive rate | 15% | 4% | 73% reduction (cloud training) |
| Novel failure detection | 0% | 85% | New capability |
| Model staleness | Permanent | 30-day refresh | Continuous improvement |
| Development cost | $80K (rules) | $120K (ML pipeline) | +$40K |
| Annual value | $200K | $380K | +$180K |
Architecture Best Practices:
- Edge handles time-critical decisions: Sub-second response using current model
- Cloud handles model improvement: Train on months of data, push updates monthly/quarterly
- Edge sends representative samples: Not all data, just anomalies + 1% baseline
- Cloud correlates across sites: Patterns invisible at single-site level
- Graceful degradation: Edge continues working if cloud offline, using last-known-good model
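The "representative samples" practice above can be sketched as a simple uplink filter. This is a hypothetical illustration (the field names, z-score threshold, and 1% rate are made up for the example), not a production sampling policy:

```python
import random


def should_uplink(reading: dict, zscore_threshold: float = 3.0,
                  baseline_rate: float = 0.01) -> bool:
    """Always forward anomalies; forward ~1% of normal readings so the
    cloud still sees baseline behaviour for model retraining."""
    if abs(reading["zscore"]) > zscore_threshold:
        return True                            # anomaly: always send
    return random.random() < baseline_rate     # ~1% baseline sample


readings = [{"id": i, "zscore": z}
            for i, z in enumerate((0.2, 4.1, -0.5, -3.6, 0.9))]
uplinked = [r for r in readings if should_uplink(r)]
# Both anomalies (|z| > 3) are always in `uplinked`; normal readings
# appear only about 1% of the time.
```

A real deployment would typically add rate limiting and stratified sampling so the baseline sample stays representative across sensors and time of day.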
The Lesson: Edge computing is not about replacing the cloud – it is about intelligently dividing labor. Edge handles real-time decisions with local data. Cloud handles complex analysis with global data. Together, they create systems smarter than either alone.
Worked Example: Smart City Traffic Management Architecture
Scenario: A city deploys IoT for real-time traffic management across 200 intersections.
Requirements:
- Traffic light control: <100ms response time
- Pedestrian detection: Real-time video processing
- Traffic flow analytics: Historical pattern analysis
- Incident detection: Automated alerts for accidents/congestion
Data Generation:
- 200 intersections x 4 cameras = 800 cameras
- Each camera: 1080p @ 30 fps = ~8 Mbps (H.264 compressed)
- Total raw video: 800 x 8 Mbps = 6,400 Mbps = 6.4 Gbps
- Per day: 6.4 Gbps x 86,400 s / 8 bits per byte = 69,120 GB/day = ~69 TB/day
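The raw-volume figure is straightforward to verify (using decimal units and the 8 Mbps-per-camera assumption above):

```python
INTERSECTIONS, CAMERAS_PER_INTERSECTION = 200, 4
MBPS_PER_CAMERA = 8  # 1080p @ 30 fps, H.264 compressed

cameras = INTERSECTIONS * CAMERAS_PER_INTERSECTION   # 800
total_gbps = cameras * MBPS_PER_CAMERA / 1000        # 6.4 Gbps aggregate
gb_per_day = total_gbps * 1e9 * 86_400 / 8 / 1e9     # bits/s -> GB/day
print(f"{cameras} cameras, {total_gbps} Gbps, {gb_per_day:,.0f} GB/day")
# -> 800 cameras, 6.4 Gbps, 69,120 GB/day
```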
Architecture by Level:
Level 1 - Physical Devices:
- 800 cameras + 200 traffic light controllers
- Edge compute modules (NVIDIA Jetson) at each intersection
- Local object detection (pedestrians, vehicles)
- Raw data: ~69 TB/day
Level 2 - Connectivity:
- Intersection-to-gateway: Gigabit Ethernet
- Gateway-to-fog: Fiber optic (100 Gbps backbone)
- Fog-to-cloud: Internet uplink (10 Gbps)
Level 3 - Edge Computing (Per Intersection):
- Vehicle counting and classification
- Pedestrian detection
- License plate reading (when needed)
- Local traffic light optimization
- Data reduction: 8 Mbps (1 MB/s) video per camera reduced to 2 KB/s metadata
- Reduction factor per camera: ~500x
- Output: 800 cameras x 2 KB/s = 1.6 MB/s = 138 GB/day
Level 4 - Fog Node (5 Regional Hubs):
- Each hub aggregates 40 intersections
- Traffic flow optimization across corridors
- Incident detection via pattern matching
- 7-day data retention for local queries
- Further aggregation: 138 GB/day reduced to 15 GB/day sent to cloud
- Reduction factor: ~9.2x
Level 5 - Data Abstraction (Fog):
- Normalize traffic data formats
- Spatial-temporal correlation
- Generate traffic congestion heat maps
- API layer for city dashboards
Level 6 - Cloud Application:
- City-wide traffic analytics
- ML model training on months of data
- Long-term trend analysis
- Integration with public transit systems
- Final data: 15 GB/day (down from ~69 TB/day)
- Total reduction: ~4,600x
Level 7 - Collaboration:
- Integration with emergency services (ambulance routing)
- Public APIs for traffic apps
- Inter-city coordination for highway traffic
Latency by Tier:
| Decision Type | Processing Tier | Latency | Example |
|---|---|---|---|
| Traffic light change | Edge (Level 1) | <50ms | Pedestrian button pressed |
| Corridor optimization | Fog (Level 4) | 1-2 seconds | Adjust 10-light timing |
| City-wide rerouting | Cloud (Level 6) | 30-60 seconds | Major incident detected |
Cost Analysis:
| Component | Cost per Unit | Quantity | Total |
|---|---|---|---|
| Edge compute (Jetson Xavier) | $600 | 200 | $120,000 |
| Fog servers (regional hubs) | $15,000 | 5 | $75,000 |
| Network infrastructure | $50,000 | 1 | $50,000 |
| Cloud services (annual) | $180,000/year | - | $180,000 |
| Total (Year 1) | - | - | $425,000 |
Savings vs Cloud-Only Architecture:
If all video sent to cloud:
- Bandwidth: ~69,000 GB/day x 30 days x $0.05/GB = $103,500/month, or ~$1.24M/year
- Cloud processing: Equivalent compute for video analysis at scale = $600,000/year
- Total cloud-only cost: $1.84M/year
With edge-fog architecture:
- Edge hardware (amortized over 5 years): $245,000 / 5 = $49,000/year
- Cloud costs (reduced data): $180,000/year
- Total hybrid cost: $229,000/year
Annual savings: $1.84M - $229K = $1.61M (87.5% reduction)
Payback period: ~3 months
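The comparison can be reproduced in a few lines. Note that the chapter's bandwidth figure uses the rounded ~69,000 GB/day and 12 thirty-day months; the dollar rates are the chapter's assumptions, not provider quotes:

```python
INGRESS_PER_GB = 0.05      # assumed cloud ingress rate, $/GB
GB_PER_DAY_RAW = 69_000    # chapter rounds 69,120 GB/day down

# Cloud-only: ship all video to the cloud and process it centrally
ingress_annual = GB_PER_DAY_RAW * 30 * 12 * INGRESS_PER_GB  # ~$1.242M
cloud_only = ingress_annual + 600_000   # + at-scale video analysis compute

# Hybrid: amortized edge/fog hardware plus the reduced cloud footprint
hybrid = 245_000 / 5 + 180_000          # $49K/year + $180K/year = $229K

savings = cloud_only - hybrid
print(f"cloud-only ${cloud_only:,.0f}, hybrid ${hybrid:,.0f}, "
      f"saves ${savings:,.0f}/year")
# -> cloud-only $1,842,000, hybrid $229,000, saves $1,613,000/year
```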
Decision Framework: Workload Placement Across Edge-Fog-Cloud
Use this decision framework to determine optimal processing placement:
Step 1: Assess Latency Requirements
| Latency Requirement | Recommended Tier | Rationale |
|---|---|---|
| <10ms | Edge (Level 1-3) | Safety-critical, real-time control |
| 10-100ms | Fog (Level 4-5) | Local coordination, multi-device |
| 100ms-1s | Fog or Cloud | Real-time analytics |
| 1-10s | Cloud (Level 6-7) | Complex analytics |
| >10s | Cloud | Historical analysis, ML training |
Step 2: Evaluate Data Volume
| Daily Data Volume | Action | Reason |
|---|---|---|
| >1 TB/day per site | Edge processing mandatory | Bandwidth costs prohibitive |
| 100GB-1TB | Fog aggregation beneficial | Regional optimization |
| 10-100GB | Hybrid approach | Balance cost and capability |
| <10GB | Cloud-first acceptable | Bandwidth costs manageable |
Step 3: Consider Processing Complexity
| Processing Type | Recommended Tier | Example |
|---|---|---|
| Simple thresholds | Edge | Temperature >50C alert |
| Statistical aggregation | Edge/Fog | Hourly min/max/avg |
| Pattern recognition (rules) | Fog | Known anomaly signatures |
| ML inference (small model) | Edge/Fog | Classification, 1-10 classes |
| ML inference (large model) | Cloud | NLP, complex vision tasks |
| ML training | Cloud | Requires fleet-wide data |
Step 4: Evaluate Connectivity
| Connectivity Profile | Architecture | Notes |
|---|---|---|
| Always connected, reliable | Cloud-first | Leverage cloud scale |
| Intermittent (hourly drops) | Edge-fog hybrid | Buffer at edge |
| Frequently offline (daily) | Edge-primary | Local autonomy required |
| Rarely connected (weekly+) | Edge-only | Full local processing |
Decision Algorithm (Scoring):
```python
def recommend_processing_tier(latency_ms, data_volume_gb_day,
                              processing_complexity, connectivity):
    """
    Recommend optimal processing tier based on requirements.
    Returns: ("EDGE", "FOG", "CLOUD", or "HYBRID") with justification
    """
    score_edge = 0
    score_fog = 0
    score_cloud = 0

    # Latency scoring
    if latency_ms < 10:
        score_edge += 10
    elif latency_ms < 100:
        score_edge += 5
        score_fog += 5
    elif latency_ms < 1000:
        score_fog += 5
        score_cloud += 3
    else:
        score_cloud += 10

    # Data volume scoring
    if data_volume_gb_day > 1000:
        score_edge += 10
        score_fog += 5
    elif data_volume_gb_day > 100:
        score_edge += 5
        score_fog += 5
        score_cloud += 2
    elif data_volume_gb_day > 10:
        score_fog += 3
        score_cloud += 5
    else:
        score_cloud += 10

    # Processing complexity
    complexity_map = {
        'simple': {'edge': 10, 'fog': 5, 'cloud': 2},
        'moderate': {'edge': 5, 'fog': 8, 'cloud': 5},
        'complex': {'edge': 2, 'fog': 5, 'cloud': 10}
    }
    scores = complexity_map.get(processing_complexity, {})
    score_edge += scores.get('edge', 0)
    score_fog += scores.get('fog', 0)
    score_cloud += scores.get('cloud', 0)

    # Connectivity
    connectivity_map = {
        'always': {'edge': 2, 'fog': 5, 'cloud': 10},
        'intermittent': {'edge': 7, 'fog': 8, 'cloud': 3},
        'rare': {'edge': 10, 'fog': 5, 'cloud': 1}
    }
    scores = connectivity_map.get(connectivity, {})
    score_edge += scores.get('edge', 0)
    score_fog += scores.get('fog', 0)
    score_cloud += scores.get('cloud', 0)

    # Determine recommendation
    max_score = max(score_edge, score_fog, score_cloud)
    if abs(score_edge - score_fog) < 5 and \
       abs(score_edge - score_cloud) < 5:
        return ("HYBRID",
                f"Mixed requirements (E:{score_edge} "
                f"F:{score_fog} C:{score_cloud})")
    if score_edge == max_score:
        return ("EDGE",
                f"Best fit (E:{score_edge} "
                f"F:{score_fog} C:{score_cloud})")
    elif score_fog == max_score:
        return ("FOG",
                f"Best fit (E:{score_edge} "
                f"F:{score_fog} C:{score_cloud})")
    else:
        return ("CLOUD",
                f"Best fit (E:{score_edge} "
                f"F:{score_fog} C:{score_cloud})")


# Example: Industrial predictive maintenance
tier, reason = recommend_processing_tier(
    latency_ms=50,
    data_volume_gb_day=800,
    processing_complexity='moderate',
    connectivity='intermittent'
)
print(f"Recommendation: {tier}")
print(f"Reasoning: {reason}")
# Output: Recommendation: FOG
# Reasoning: Best fit (E:22 F:26 C:10)
```

54.6.1 Workload Placement Calculator
54.7 Chapter Summary
The Edge-Fog-Cloud Continuum provides progressive data processing where latency increases (under 1ms at edge to 100-500ms at cloud) while bandwidth requirements decrease dramatically through data reduction at each tier.
The Seven-Level Reference Model guides processing decisions: Levels 1-2 handle physical sensing and connectivity, Level 3 performs edge computing (filtering, aggregation, format standardization), Levels 4-5 provide fog-layer storage and abstraction, and Levels 6-7 enable cloud analytics and enterprise integration.
Processing trade-offs must balance latency requirements, bandwidth constraints, processing power availability, data retention needs, and cost considerations when determining where to place computation.
The Golden Rule states: process data as close to the source as possible, but only as close as necessary, based on latency requirements and processing complexity.
Edge-cloud symbiosis means edge computing reduces data volume and handles real-time decisions, while the cloud provides model training, historical analytics, and cross-site correlation. Neither tier replaces the other.
Key Takeaway
The Edge-Fog-Cloud continuum follows a golden rule: process data as close to the source as possible, but only as close as necessary. Edge handles latency-critical operations and data reduction (achieving ~14,000x compression), fog provides regional analytics and ML inference, and cloud delivers deep analytics and global coordination. This is a hybrid model – edge computing reduces data volume, not replaces cloud computing.
For Kids: Meet the Sensor Squad!
“The Relay Race of Data!”
Sammy the Sensor was collecting vibration readings from a factory floor – thousands every second! “I’m making SO much data!” Sammy cheered.
Max the Microcontroller, sitting in the edge gateway nearby, shook his head. “Sammy, you can’t send ALL of that to the cloud. That’s like trying to pour a swimming pool through a garden hose!”
“So what do we do?” asked Lila the LED, blinking nervously.
“It’s like a relay race!” Max explained. “Sammy runs the first leg – collecting data super fast. I run the second leg – I take Sammy’s thousands of readings and squeeze them into a tiny summary. Then the cloud runs the final leg – it takes my summaries and does the really brainy stuff, like predicting when a machine might break.”
“So each runner does what they’re best at?” Sammy asked.
“Exactly! I’m close to you so I can react in a flash – under one millisecond! The cloud is far away but super smart. Together, we’re an unstoppable team!”
Bella the Battery smiled. “And because Max filters out 99% of the data before sending, we don’t waste energy transmitting things nobody needs. That means I last WAY longer!”
The Sensor Squad learned: Edge, fog, and cloud each have their own superpower. The trick is giving each one the right job!
54.8 Concept Relationships
Edge architecture builds on:
- IoT Reference Models - Seven-level model defines the edge, fog, and cloud tiers
- Edge Fog Computing - Architecture principles for distributed processing
Edge architecture enables:
- Edge Data Reduction - Level 3 EFR pipeline achieves ~14,000x compression through the architectural framework
- Gateway Security - Gateway placement at Level 3 creates security perimeter for non-IP devices
- Power Optimization - Architecture decisions determine device power profiles
Parallel concepts:
- Edge-Fog-Cloud continuum and Tiered storage (hot/warm/cold): Both use hierarchical placement based on access patterns
- Golden rule of edge computing and Workload placement algorithms: both place processing as close to the source as possible, but only as close as necessary
54.9 See Also
Related review chapters:
- Edge Review: Data Reduction - Calculations and formulas
- Edge Review: Gateway Security - Protocol translation and fail-closed security
- Edge Review: Deployments - Real-world patterns and technology stack
Foundational chapters:
- Edge Compute Patterns - Processing patterns
- Edge Data Acquisition - Data collection methods
Interactive tools:
- Simulations Hub - Edge vs Cloud Latency Explorer
- Knowledge Map - Visual relationship mapping
54.10 What’s Next
| Current | Next |
|---|---|
| Edge Architecture Review | Edge Review: Data Reduction |
Related chapters in this review series:
| Chapter | Focus |
|---|---|
| Edge Review: Gateway and Security | Protocol translation and fail-closed security |
| Edge Review: Power Optimization | Deep sleep and battery life calculations |
| Edge Review: Storage and Economics | Tiered storage and ROI analysis |
Common Pitfalls
1. Designing an edge architecture review without distinguishing tiers
An architecture review that uses ‘edge’ to mean both microcontrollers and gateways conflates systems differing by 3–4 orders of magnitude in resources. Always specify which tier and its exact resource constraints.
2. Reviewing static architectures without considering operational dynamics
Edge deployments experience node failures, network partitions, and load spikes. Reviewing only the steady-state architecture without analysing failure scenarios leaves critical operational questions unanswered.
3. Focusing on the data path and ignoring the management path
The data path (sensor → cloud) is typically well-reviewed, but the management path (cloud → device configuration, firmware, diagnostics) is equally critical and equally complex. Include both in any architecture review.
4. Accepting ‘it works in the lab’ as sufficient for edge deployment
Lab conditions with stable power, Ethernet connectivity, and controlled temperature do not represent field conditions with intermittent wireless, temperature extremes, and physical vibration. Architecture reviews must include a field conditions stress test plan.