%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#7F8C8D', 'fontSize': '12px'}}}%%
graph TB
subgraph Scale["Device Scale"]
Small["<100 devices<br/>→ Centralized"]
Medium["100-10K devices<br/>→ Hierarchical"]
Large[">10K devices<br/>→ Distributed"]
end
subgraph Latency["Latency Requirements"]
Tolerant[">1s acceptable<br/>→ Cloud OK"]
Low["100ms-1s<br/>→ Fog/Gateway"]
Critical["<100ms<br/>→ Edge Required"]
end
subgraph Data["Data Volume"]
LowVol["<1 GB/day<br/>→ Cloud Storage"]
MedVol["1-100 GB/day<br/>→ Edge Filter"]
HighVol[">100 GB/day<br/>→ Edge Process"]
end
subgraph Architecture["Recommended Architecture"]
CloudCentric["Cloud-Centric<br/>AWS IoT, Azure IoT"]
Fog["Fog Computing<br/>Gateway Processing"]
EdgeCentric["Edge-Centric<br/>Local Processing"]
end
Small --> CloudCentric
Medium --> Fog
Large --> EdgeCentric
Tolerant --> CloudCentric
Low --> Fog
Critical --> EdgeCentric
LowVol --> CloudCentric
MedVol --> Fog
HighVol --> EdgeCentric
style CloudCentric fill:#7F8C8D,stroke:#16A085,color:#fff
style Fog fill:#E67E22,stroke:#2C3E50,color:#fff
style EdgeCentric fill:#16A085,stroke:#2C3E50,color:#fff
173 IoT Architecture Selection Framework
173.1 Learning Objectives
By the end of this chapter, you will be able to:
- Apply systematic criteria for selecting IoT reference architectures
- Evaluate device scale, latency, connectivity, and data volume requirements
- Match industry domains to appropriate architecture patterns
- Design multi-region architectures with data sovereignty compliance
173.2 Prerequisites
Before diving into this chapter, you should be familiar with:
- Key Reference Models: Understanding of ITU-T, IoT-A, and WSN architectures
- Introduction to Reference Architectures: Basic concepts and why reference architectures matter
173.3 Introduction
Core Concept: Three primary factors drive architecture selection: Device Scale (small/medium/large), Latency Requirements (tolerant/>1s, low/100ms-1s, critical/<100ms), and Data Volume (low/<1GB, medium/1-100GB, high/>100GB per day).
Why It Matters: Choosing based on familiarity or trends rather than requirements leads to over-engineered solutions (unnecessary edge infrastructure) or under-capable designs (cloud-only failing real-time requirements).
Key Takeaway: Map each use case to its scale, latency, data volume, and connectivity needs. Small scale + tolerant latency + low data = cloud-centric. Large scale + critical latency + high data = distributed edge. Start simple and add tiers only when requirements demand them.
Making informed architecture decisions requires evaluating multiple factors. This framework provides a systematic approach to selecting the appropriate IoT reference architecture for your deployment.
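The mapping from the three primary factors to an architecture can be sketched in a few lines of Python. The thresholds below are the ones stated in this chapter. The rule for combining factors (take the most demanding tier that any single factor asks for) is an assumption made for illustration, not something the framework prescribes, and the names `Requirements` and `recommend` are hypothetical.

```python
from dataclasses import dataclass

# Tiers ordered from least to most demanding.
TIERS = ["cloud-centric", "fog", "edge-centric"]


@dataclass
class Requirements:
    devices: int           # number of connected devices
    max_latency_ms: float  # slowest acceptable response time
    gb_per_day: float      # raw data generated per day


def tier_for_scale(devices: int) -> int:
    if devices < 100:
        return 0
    return 1 if devices <= 10_000 else 2


def tier_for_latency(max_latency_ms: float) -> int:
    if max_latency_ms > 1000:
        return 0
    return 1 if max_latency_ms >= 100 else 2


def tier_for_volume(gb_per_day: float) -> int:
    if gb_per_day < 1:
        return 0
    return 1 if gb_per_day <= 100 else 2


def recommend(req: Requirements) -> dict:
    """Return the tier suggested by each factor plus an overall suggestion.

    Assumption: the overall suggestion is the most demanding tier requested
    by any single factor. "mixed" flags cases where the factors disagree.
    """
    per_factor = {
        "scale": tier_for_scale(req.devices),
        "latency": tier_for_latency(req.max_latency_ms),
        "volume": tier_for_volume(req.gb_per_day),
    }
    return {
        "per_factor": {name: TIERS[t] for name, t in per_factor.items()},
        "overall": TIERS[max(per_factor.values())],
        "mixed": len(set(per_factor.values())) > 1,
    }
```

A result with `mixed` set to true is a hint that a single pattern may not fit and that a multi-tier design deserves a closer look.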
Option A (ITU-T Y.2060): Four-layer telecom-centric architecture (Device, Network, Service Support, Application). Standardized by international body, excellent for carrier integration. Best documentation for network-level concerns. Simpler conceptual model with clear layer boundaries.
Option B (IoT-A): Three-view enterprise architecture (Functional, Information, Deployment). Rich modeling of business entities and services. Better support for complex multi-stakeholder systems. More detailed security and interoperability cross-cutting concerns.
Decision Factors:
Choose ITU-T when: Building telecom-integrated IoT (5G/LTE-M), smart city infrastructure requiring carrier partnerships, systems where network layer is the primary complexity (protocol bridges, gateways), or teams with networking/telecom background who think in protocol stacks.
Choose IoT-A when: Enterprise systems with complex business logic (asset management, supply chain), multi-stakeholder deployments (hospitals, campuses) requiring access control modeling, systems where information models are critical (digital twins, virtual entities), or teams with enterprise architecture background (TOGAF, ArchiMate).
Practical guidance: For most IoT projects, start with ITU-T's simpler 4-layer model for initial design. Add IoT-A's Functional and Information views when you need to model complex business entities or multi-tenant access. Barcelona Smart City uses both: ITU-T for infrastructure, IoT-A for multi-department coordination.
Option A (Centralized Gateway): Single powerful gateway (Raspberry Pi 4, Intel NUC) aggregates all local sensors. Easier management (one device to update), unified protocol translation, simpler security perimeter. Typical capacity: 500-2,000 sensors, 10,000 messages/minute.
Option B (Distributed Edge): Multiple edge nodes (ESP32, industrial PLCs) each handle local processing. Better fault tolerance (no single point of failure), lower latency for local loops, scales horizontally. Typical capacity: 50-200 sensors per node, 1,000 messages/minute per node.
Decision Factors:
Choose Centralized Gateway when: Deployment area is compact (<500m radius), network is reliable (wired Ethernet or stable Wi-Fi), processing requirements are uniform across sensors, management simplicity is prioritized, or budget favors one capable device over many simple ones.
Choose Distributed Edge when: Latency requirements vary by zone (some <50ms, others tolerant), network partitions are possible (factory floors, multi-building campus), different sensor groups need different processing (vision in one area, vibration in another), or fault isolation is critical (one failed node shouldn't affect others).
Cost comparison for 1,000 sensors: Centralized gateway (1x Intel NUC at $500 + network switches at $300) = $800. Distributed edge (10x ESP32 at $50 each = $500, plus 10x local switches at $100 in total) = $600 in hardware, but expect roughly 5x the management overhead of a single gateway. Choose centralized unless you have specific distributed requirements.
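The hardware arithmetic in the cost comparison can be reproduced with a short sketch. The prices come from the comparison above. The per-device capacities (2,000 sensors per gateway, 100 sensors per edge node) are picked from the typical ranges quoted earlier, switch costs are treated as fixed, and the yearly management cost is a purely hypothetical input, so treat all function names and defaults as assumptions.

```python
import math


def centralized_hardware_cost(sensors: int, gateway_price: int = 500,
                              switch_cost: int = 300,
                              sensors_per_gateway: int = 2000) -> int:
    """Hardware cost when gateways aggregate all sensors (assumed capacity)."""
    gateways = math.ceil(sensors / sensors_per_gateway)
    return gateways * gateway_price + switch_cost


def distributed_hardware_cost(sensors: int, node_price: int = 50,
                              switch_cost: int = 100,
                              sensors_per_node: int = 100) -> int:
    """Hardware cost when small edge nodes each serve a group of sensors."""
    nodes = math.ceil(sensors / sensors_per_node)
    return nodes * node_price + switch_cost


def distributed_is_cheaper(sensors: int, yearly_mgmt_cost_centralized: float,
                           overhead_factor: float = 5.0) -> bool:
    """Compare first-year totals.

    Assumption: "5x management overhead" is read as a multiplier on the
    yearly management cost of the centralized option.
    """
    central = centralized_hardware_cost(sensors) + yearly_mgmt_cost_centralized
    distributed = (distributed_hardware_cost(sensors)
                   + overhead_factor * yearly_mgmt_cost_centralized)
    return distributed < central


print(centralized_hardware_cost(1000))  # 800
print(distributed_hardware_cost(1000))  # 600
```

Under these assumptions the $200 hardware saving disappears as soon as managing the single gateway costs more than $50 per year, which is one way to read the recommendation to default to the centralized option.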
173.4 Decision Criteria Explained
173.4.1 1. Device Scale
The number of devices fundamentally impacts architecture choices:
< 100 devices (Small): Simple centralized architectures work well. Direct cloud connectivity is feasible. Management overhead is minimal. Examples: home automation, small office monitoring.
100-10K devices (Medium): Requires gateway aggregation and hierarchical management. Network topology becomes important. Data aggregation needed. Examples: building management, campus deployments.
> 10K devices (Large): Demands distributed architecture with multiple coordination points. Scalability is critical. Automated provisioning essential. Examples: smart city, nationwide sensor networks.
173.4.2 2. Latency Requirements
Real-time responsiveness determines processing location:
< 100ms (Ultra-low latency): Edge computing mandatory. Local decision-making required. Cloud used only for analytics and coordination. Examples: industrial automation, autonomous vehicles.
100ms - 1s (Low latency): Hybrid architectures work well. Gateway can make decisions. Cloud handles non-time-critical tasks. Examples: smart building HVAC, traffic management.
> 1s acceptable (Standard latency): Cloud-centric is viable. Network delays acceptable. Simpler architecture possible. Examples: environmental monitoring, asset tracking.
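One way to reason about processing location is as a latency budget: pick the most centralized tier whose round trip plus processing time still fits. The sketch below does that. The worst-case round-trip figures are hypothetical placeholders chosen so the results line up with the bands above; they are not measurements and should be replaced with numbers from your own network.

```python
from typing import Optional

# Hypothetical worst-case round-trip times per tier, in milliseconds.
ASSUMED_ROUND_TRIP_MS = {"cloud": 1000.0, "fog": 100.0, "edge": 10.0}


def placement_for(budget_ms: float, processing_ms: float,
                  round_trip_ms: Optional[dict] = None) -> Optional[str]:
    """Return the most centralized tier that meets the latency budget.

    Tiers are tried from cloud to edge, so simpler placements win when
    they are fast enough. Returns None if even the edge cannot meet it.
    """
    round_trip_ms = round_trip_ms or ASSUMED_ROUND_TRIP_MS
    for tier in ("cloud", "fog", "edge"):
        if round_trip_ms[tier] + processing_ms <= budget_ms:
            return tier
    return None


print(placement_for(budget_ms=2000, processing_ms=20))  # cloud
print(placement_for(budget_ms=500, processing_ms=20))   # fog
print(placement_for(budget_ms=50, processing_ms=20))    # edge
```

Trying the cloud first reflects the chapter's advice to start simple and move processing toward the edge only when the budget forces it.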
173.4.3 3. Network Connectivity
Connection reliability shapes architecture resilience:
Reliable Internet: Cloud-first architecture. Continuous connectivity assumed. Centralized control and storage. Examples: urban deployments with fiber/cellular.
Intermittent connectivity: Fog computing for local intelligence. Store-and-forward capability. Eventual consistency models. Examples: rural areas, mobile deployments.
Offline periods expected: Edge autonomy required. Local data storage and processing. Synchronization when connected. Examples: maritime, remote locations.
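Store-and-forward, mentioned above for intermittent links, can be sketched as a bounded queue that keeps messages while the uplink is down and drains them in order once it returns. The class name, the `send` callback contract (return `True` on success), and the drop-oldest policy when the buffer is full are all assumptions for illustration.

```python
from collections import deque
from typing import Any, Callable


class StoreAndForwardBuffer:
    """Buffer messages locally and forward them when the uplink works."""

    def __init__(self, send: Callable[[Any], bool], max_items: int = 10_000):
        self._send = send
        # A bounded deque silently discards the oldest entry when full.
        self._queue = deque(maxlen=max_items)

    def publish(self, message: Any) -> int:
        """Queue a message, then try to drain. Returns how many were sent."""
        self._queue.append(message)
        return self.flush()

    def flush(self) -> int:
        """Send queued messages oldest first; stop at the first failure."""
        sent = 0
        while self._queue:
            if not self._send(self._queue[0]):
                break  # uplink still down, keep the message for later
            self._queue.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._queue)
```

A message is removed only after `send` reports success, so a failure mid-drain leaves the remaining messages queued in their original order. Whether dropping the oldest data is acceptable depends on the application; a deployment that cannot lose readings would need persistent storage instead of an in-memory queue.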
173.4.4 4. Data Volume
The amount of data determines processing strategies:
< 1 GB/day: Full cloud transmission feasible. Simple architectures sufficient. Cost-effective bandwidth use. Examples: meter reading, simple sensors.
1-100 GB/day: Edge filtering recommended. Pre-process and aggregate locally. Send summaries to cloud. Examples: video analytics, high-frequency sensors.
> 100 GB/day: Multi-tier processing essential. Distributed storage required. Hierarchical data reduction. Examples: video surveillance networks, continuous high-resolution sensing.
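To see which band a deployment falls into, and what edge filtering buys, a back-of-the-envelope estimate is often enough. The sketch below assumes 1 GB = 10^9 bytes and a fixed message size; the function names and the example fleet (1,000 devices sending 200-byte messages) are hypothetical.

```python
from statistics import mean


def daily_volume_gb(devices: int, bytes_per_message: int,
                    messages_per_minute: float) -> float:
    """Estimate raw data volume per day, in decimal gigabytes."""
    bytes_per_day = devices * bytes_per_message * messages_per_minute * 60 * 24
    return bytes_per_day / 1e9


def summarize(readings: list) -> dict:
    """Reduce a window of numeric readings to one summary record."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
    }


# Raw stream: one message per second from each of 1,000 devices.
raw = daily_volume_gb(devices=1000, bytes_per_message=200, messages_per_minute=60)
# After edge aggregation: one summary per minute per device.
filtered = daily_volume_gb(devices=1000, bytes_per_message=200, messages_per_minute=1)
print(f"{raw:.2f} GB/day raw, {filtered:.3f} GB/day after aggregation")
```

In this example the raw stream lands in the 1-100 GB/day band, while per-minute summaries bring it under 1 GB/day, assuming a summary record is about the same size as a raw message.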
173.4.5 5. Industry Domain
Domain-specific requirements guide reference model selection:
Industrial/Manufacturing: Follow ISA-95 or RAMI 4.0. Emphasis on deterministic control, safety, and interoperability with legacy systems.
Smart Home/Building: Use Matter, Thread, or Zigbee standards. Focus on user experience, interoperability, and energy efficiency.
Healthcare/Medical: HIPAA compliance mandatory. Follow HL7 FHIR standards. Priority on privacy, security, and regulatory compliance.
Agriculture: Sensor network architectures (WSN). Optimize for low power and wide-area coverage. Handle seasonal data patterns.
Smart City: Multi-stakeholder architecture. Open data standards. Scalability and public API access.
General Purpose: ITU-T Y.2060 or IoT-A provide flexible frameworks applicable across domains.
173.5 Multi-Region Architecture Patterns
Deploying IoT systems across multiple geographic regions introduces architectural challenges of its own: cross-region latency, data residency rules, and failover. The three patterns below are commonly used for global-scale IoT deployments.
Pattern 1: Regional Edge with Global Orchestration
                ┌──────────────────────────────────┐
                │       Global Orchestrator        │
                │  (Configuration, Analytics, ML)  │
                └────────────────┬─────────────────┘
                                 │
           ┌─────────────────────┼─────────────────────┐
           │                     │                     │
      ┌────▼─────┐          ┌────▼─────┐          ┌────▼─────┐
      │ US-WEST  │          │    EU    │          │   APAC   │
      │ Regional │          │ Regional │          │ Regional │
      │   Hub    │          │   Hub    │          │   Hub    │
      └────┬─────┘          └────┬─────┘          └────┬─────┘
           │                     │                     │
      Local Edge            Local Edge            Local Edge
Implementation:
class UnauthorizedError(Exception):
    """Raised when a cross-region command fails authorization."""


class RegionalHub:
    """Regional hub: store locally, act locally, sync aggregates globally.

    TimescaleDB and GlobalSync are not defined in this listing; treat them as
    placeholder client classes. The helpers apply_local_rules,
    notify_local_operators, compute_summary, and execute_command are likewise
    left for the deployment to supply.
    """

    def __init__(self, region: str, local_endpoints: list):
        self.region = region
        self.local_endpoints = local_endpoints
        self.local_db = TimescaleDB(f"{region}-tsdb.example.com")
        self.global_sync = GlobalSync("global-orchestrator.example.com")

    def process_device_data(self, device_id: str, data: dict):
        # Step 1: Store locally first (low latency)
        self.local_db.insert(device_id, data)

        # Step 2: Apply local rules (real-time)
        alerts = self.apply_local_rules(data)
        if alerts:
            self.notify_local_operators(alerts)

        # Step 3: Sync aggregates to global (async, eventual consistency)
        self.global_sync.queue_aggregate({
            "region": self.region,
            "device_id": device_id,
            "hourly_summary": self.compute_summary(data)
        })

    def handle_command(self, command: dict):
        # Commands can come from global or local
        if command["source"] == "global":
            # Verify authorization for cross-region commands
            if not self.global_sync.verify_command_auth(command):
                raise UnauthorizedError("Global command not authorized")

        # Execute locally
        return self.execute_command(command)

Pattern 2: Data Sovereignty Compliance
# Regional data handling configuration
regions:
  EU:
    data_residency: "eu-west-1"
    pii_handling: "gdpr"
    retention_days: 730
    cross_border_transfer: false
    encryption: "AES-256-GCM"
  US:
    data_residency: "us-east-1"
    pii_handling: "ccpa"
    retention_days: 365
    cross_border_transfer: true
    encryption: "AES-256-GCM"
  CHINA:
    data_residency: "cn-beijing"
    pii_handling: "pipl"  # Personal Information Protection Law; MLPS covers system security grading
    retention_days: 1095
    cross_border_transfer: false
    encryption: "SM4"  # Chinese national standard

# Aggregation rules for global analytics
global_analytics:
  allowed_data:
    - device_counts_per_region
    - anonymized_usage_patterns
    - aggregated_sensor_averages
  prohibited_data:
    - raw_sensor_readings
    - device_identifiers
    - user_pii

Pattern 3: Latency-Optimized Routing
import math


class GlobalRouter:
    """Route device connections to nearest regional hub."""

    def __init__(self):
        self.regions = {
            "us-west": {"endpoint": "iot.us-west.example.com", "lat": 37.7, "lng": -122.4},
            "us-east": {"endpoint": "iot.us-east.example.com", "lat": 40.7, "lng": -74.0},
            "eu-west": {"endpoint": "iot.eu-west.example.com", "lat": 51.5, "lng": -0.1},
            "apac": {"endpoint": "iot.apac.example.com", "lat": 35.7, "lng": 139.7},
        }

    @staticmethod
    def _haversine(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
        """Great-circle distance in kilometres between two (lat, lng) points."""
        lat1, lng1, lat2, lng2 = map(math.radians, (lat1, lng1, lat2, lng2))
        a = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
        return 2 * 6371.0 * math.asin(math.sqrt(a))

    def get_nearest_endpoint(self, device_lat: float, device_lng: float) -> str:
        """Return endpoint for nearest regional hub (distance as a latency proxy)."""
        min_distance = float('inf')
        nearest = None
        for region, info in self.regions.items():
            distance = self._haversine(device_lat, device_lng, info["lat"], info["lng"])
            if distance < min_distance:
                min_distance = distance
                nearest = info["endpoint"]
        return nearest

    def failover_endpoint(self, primary_region: str) -> str:
        """Return backup endpoint if primary is unavailable."""
        failover_map = {
            "us-west": "us-east",
            "us-east": "us-west",
            # EU failover to US-East (lowest latency); check data residency
            # rules before routing EU traffic outside the region.
            "eu-west": "us-east",
            "apac": "us-west"
        }
        return self.regions[failover_map[primary_region]]["endpoint"]

Cost-latency trade-offs:

| Deployment | Latency | Monthly Cost | Best For |
|---|---|---|---|
| Single region | 50-200ms globally | $1,000 | Small scale, single market |
| 3 regions | 20-80ms | $5,000 | Global consumer products |
| Edge + 3 regions | 5-30ms | $15,000 | Real-time industrial IoT |
| Full mesh (5+ regions) | <10ms everywhere | $50,000+ | Gaming, financial, critical |
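The table can be turned into a simple lookup: pick the cheapest deployment whose worst-case latency still meets the target. The sketch below uses the upper end of each latency range as the worst case and treats "<10ms" as 10 ms and "$50,000+" as a $50,000 floor. Those readings, and the function name, are assumptions.

```python
from typing import Optional

# (name, worst-case latency in ms, monthly cost in USD), from the table above.
DEPLOYMENTS = [
    ("Single region", 200, 1_000),
    ("3 regions", 80, 5_000),
    ("Edge + 3 regions", 30, 15_000),
    ("Full mesh (5+ regions)", 10, 50_000),
]


def cheapest_deployment(target_latency_ms: float,
                        monthly_budget: Optional[float] = None) -> Optional[str]:
    """Return the cheapest option meeting the latency target and budget."""
    candidates = [
        (name, cost)
        for name, latency, cost in DEPLOYMENTS
        if latency <= target_latency_ms
        and (monthly_budget is None or cost <= monthly_budget)
    ]
    if not candidates:
        return None  # nothing satisfies both constraints
    return min(candidates, key=lambda item: item[1])[0]


print(cheapest_deployment(100))                        # 3 regions
print(cheapest_deployment(30, monthly_budget=10_000))  # None
```

A `None` result signals that the latency target and the budget conflict, which usually means one of the two requirements has to be renegotiated.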
173.6 Summary
The architecture selection framework helps you make systematic decisions based on:
| Factor | Cloud-Centric | Fog/Hybrid | Edge-Centric |
|---|---|---|---|
| Device Scale | <100 | 100-10K | >10K |
| Latency | >1s acceptable | 100ms-1s | <100ms critical |
| Connectivity | Reliable | Intermittent | Offline expected |
| Data Volume | <1 GB/day | 1-100 GB/day | >100 GB/day |
Key insights:
- Mixed requirements demand multi-tier architectures - don't force a single pattern
- Start simple and add tiers only when requirements demand them
- Industry domain influences reference model selection
- Multi-region deployments require careful data sovereignty planning
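The last insight can be made concrete with a guard that a regional hub might run before anything leaves the region. The field names are taken from the `global_analytics` section of the Pattern 2 configuration. The function name and the fail-closed behaviour (raise on prohibited fields, silently drop unknown ones) are assumptions for illustration.

```python
ALLOWED_GLOBAL_FIELDS = {
    "device_counts_per_region",
    "anonymized_usage_patterns",
    "aggregated_sensor_averages",
}
PROHIBITED_GLOBAL_FIELDS = {
    "raw_sensor_readings",
    "device_identifiers",
    "user_pii",
}


def filter_for_global(payload: dict) -> dict:
    """Return only the fields that may be sent to global analytics.

    Raises ValueError if the payload contains an explicitly prohibited
    field, so that a misconfigured producer is noticed rather than masked.
    """
    leaked = PROHIBITED_GLOBAL_FIELDS.intersection(payload)
    if leaked:
        raise ValueError(f"Prohibited fields in global payload: {sorted(leaked)}")
    # Unknown fields are dropped: only the allow-list crosses the border.
    return {key: value for key, value in payload.items()
            if key in ALLOWED_GLOBAL_FIELDS}
```

Using an allow-list rather than only a block-list means a newly added field stays regional until someone decides it is safe to share.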
173.7 What's Next
Now that you understand how to select architectures:
- To apply it: Real-World Applications - See detailed examples across industries
- To avoid mistakes: Common Pitfalls - Learn what goes wrong and how to prevent it
- Related concept: Edge-Fog Computing - Deep dive into multi-tier processing