12  IoT Architecture Selection Framework

In 60 Seconds

Three factors drive architecture selection: Device Scale (10s vs 100,000s), Latency Requirements (tolerant vs sub-100ms), and Data Volume (KB vs GB per day). Start cloud-centric for small scale with tolerant latency; add edge/fog layers only when real-time processing or offline operation is required.

Minimum Viable Understanding
  • Three factors drive architecture selection: Device Scale (small/medium/large), Latency Requirements (tolerant/low/critical), and Data Volume (low/medium/high per day).
  • Mixed requirements demand multi-tier architectures – do not force a single pattern when different subsystems have different latency or scale needs.
  • Start simple and add tiers only when requirements demand them – cloud-centric works for small scale and tolerant latency; add edge/fog layers only for real-time or offline needs.

Sammy the Sensor has a problem. He collects temperature readings every second, but where should he send them?

“Send everything to the cloud!” says Lila the LED excitedly. But Max the Microcontroller shakes his head. “That is like mailing a letter to another country just to ask your neighbor a question. If you need a fast answer, keep it local!”

Bella the Battery agrees: “And all that sending uses my energy! If Sammy only sends important changes instead of every single reading, I last way longer.”

The lesson? Small and simple projects can send data straight to the cloud (like mailing a letter). But big, fast projects need local helpers (edge computers) to make quick decisions nearby – like having a smart friend right next door instead of calling someone far away!

12.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Apply a systematic decision matrix for selecting IoT reference architectures
  • Evaluate device scale, latency, connectivity, and data volume requirements quantitatively
  • Match industry domains to appropriate architecture patterns using weighted scoring
  • Design multi-region architectures that enforce data sovereignty compliance per jurisdiction

This chapter covers foundational concepts for designing IoT systems at scale. Think of IoT system design like city planning – you need to consider where devices go, how they communicate, where data is stored, and how everything stays secure. Reference architectures and design principles help you create systems that work reliably and can grow over time.

12.2 Prerequisites

Before diving into this chapter, you should be familiar with:

Key Concepts

  • Selection Framework: A structured decision process mapping IoT deployment requirements (scale, latency, connectivity, power, cost) to appropriate reference architecture patterns through systematic constraint elimination
  • Latency Budget: The maximum acceptable delay from sensor event to control response, determining whether processing must occur at the device, edge, or cloud — budgets below 100 ms typically mandate edge processing
  • Connectivity Tier: The network layer classification (PAN, LAN, WAN, LPWAN, cellular) appropriate for an IoT deployment based on range, bandwidth, power, and cost requirements
  • Scale Dimension: The number of devices, message rate, data volume, and geographic distribution characterizing IoT deployment size — each scale dimension influences architecture selection independently
  • Total Cost of Ownership (TCO): The complete 5-year cost of an IoT architecture including device hardware, network connectivity, cloud services, development, operations, and maintenance — used to compare architectural alternatives
  • Architecture Fitness Function: A measurable criterion evaluating how well an IoT architecture satisfies a specific quality attribute (latency: <10 ms, availability: 99.9%, cost: <$0.01/device/day) guiding selection between alternatives

12.3 Introduction

MVU: Architecture Selection Criteria

Core Concept: Three primary factors drive architecture selection: Device Scale (small/medium/large), Latency Requirements (tolerant/>1s, low/100ms-1s, critical/<100ms), and Data Volume (low/<1GB, medium/1-100GB, high/>100GB per day).

Why It Matters: Choosing based on familiarity or trends rather than requirements leads to over-engineered solutions (unnecessary edge infrastructure) or under-capable designs (cloud-only failing real-time requirements).

Key Takeaway: Map each use case to its latency, scale, and connectivity needs. Small scale + tolerant latency + low data = cloud-centric. Large scale + critical latency + high data = distributed edge. Start simple and add tiers only when requirements demand them.

Making informed architecture decisions requires evaluating multiple factors. This framework provides a systematic approach to selecting the appropriate IoT reference architecture for your deployment.

How It Works: The Architecture Selection Decision Process

Understanding how to systematically select an IoT architecture prevents costly mistakes. Here’s the step-by-step decision process:

Step 1: Quantify Your Requirements

Start by measuring concrete values, not assumptions:

  • Device Scale: Count actual sensors/actuators (e.g., “500 temperature sensors, 200 HVAC controllers”)
  • Latency Requirement: Define the maximum acceptable delay (e.g., “HVAC must respond within 2 seconds of occupancy change”)
  • Data Volume: Calculate daily data generation (e.g., “500 sensors × 10 bytes × 86,400 samples/day = 432 MB/day”)
  • Connectivity Pattern: Measure actual uptime (e.g., “99.9% uptime in building, 80% uptime in remote sites”)
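The data-volume arithmetic above is easy to script so it can be rerun as requirements change. A minimal sketch (the function name and unit convention are ours, not from any library):

```python
def daily_volume_mb(sensors: int, bytes_per_reading: int, readings_per_day: int) -> float:
    """Daily data generation in megabytes (decimal MB, as used in this chapter)."""
    return sensors * bytes_per_reading * readings_per_day / 1_000_000

# 500 sensors sampling once per second at 10 bytes per reading:
print(daily_volume_mb(500, 10, 86_400))  # → 432.0
```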

Step 2: Map to Decision Matrix

Use the three-factor decision matrix:

IF device_scale < 100 AND latency_acceptable > 1s AND data < 1GB/day:
    → Cloud-Centric Architecture
ELSE IF device_scale < 10,000 AND latency 100ms-1s AND data 1-100GB/day:
    → Fog/Hybrid Architecture (edge + cloud tiers)
ELSE IF device_scale > 10,000 OR latency < 100ms OR data > 100GB/day:
    → Edge-Centric Architecture (distributed processing)
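The same three-factor matrix can be expressed as a small function. This is a sketch of the logic above with the chapter's thresholds; the function name and the order of checks (extremes first, so any single edge-forcing factor wins) are ours:

```python
def select_architecture(devices: int, latency_s: float, gb_per_day: float) -> str:
    """Apply the three-factor decision matrix from Step 2."""
    if devices > 10_000 or latency_s < 0.1 or gb_per_day > 100:
        return "edge-centric"    # any extreme factor forces distributed processing
    if devices < 100 and latency_s > 1 and gb_per_day < 1:
        return "cloud-centric"   # all factors in the small/tolerant/low band
    return "fog-hybrid"          # mixed or medium requirements

print(select_architecture(750, 2.0, 6.5))  # smart building example → fog-hybrid
```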

Step 3: Validate Edge Necessity

Edge processing adds deployment complexity - only use it when required:

  • Latency: Can a cloud round-trip (200-500ms) meet your SLA? If no → edge mandatory
  • Bandwidth: Does edge aggregation save >50% of cloud traffic? If yes → edge justified
  • Offline: Must the system work during internet outages? If yes → edge mandatory
  • Privacy: Must sensitive data stay local? If yes → edge mandatory
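Step 3 can be captured as a checklist function. A sketch using our own names; the thresholds are the ones stated above:

```python
def edge_necessity(cloud_rtt_ms: float, sla_ms: float,
                   aggregation_savings: float, needs_offline: bool,
                   data_must_stay_local: bool) -> str:
    """Classify edge processing as mandatory, justified, or optional."""
    if sla_ms < cloud_rtt_ms or needs_offline or data_must_stay_local:
        return "mandatory"          # latency, offline, or privacy forces edge
    if aggregation_savings > 0.5:   # edge would save >50% of cloud traffic
        return "justified"
    return "optional"

# A 100ms SLA against a 300ms cloud round-trip:
print(edge_necessity(300, 100, 0.2, False, False))  # → mandatory
```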

Step 4: Select Reference Model

Choose based on domain requirements:

  • ITU-T Y.2060: Telecom integration (5G, NB-IoT, carrier partnerships)
  • IoT-A: Multi-stakeholder systems (smart cities, hospitals with multiple departments)
  • ISA-95/RAMI 4.0: Industrial automation (factories with PLCs, SCADA integration)

Example Walkthrough: Smart Building Selection

Requirements:

  • Scale: 750 devices (500 occupancy, 200 HVAC, 50 meters)
  • Latency: <2s for HVAC response to occupancy
  • Data: 750 readings/sec × 100 bytes × 86,400 sec/day ≈ 6.5 GB/day
  • Connectivity: Reliable building Wi-Fi/Ethernet

Decision Matrix Application:

  • Device scale: 750 → Medium (100-10K range)
  • Latency: <2s → Low-latency category (requires local processing)
  • Data: 6.5 GB/day → Medium (1-100 GB/day)
  • Connectivity: Reliable → but the latency requirement overrides

Architecture Selected: Fog/Hybrid with floor-level edge gateways

Why:

  • The <2s latency requirement forces HVAC control logic to the edge (a cloud round-trip would risk exceeding the SLA)
  • Edge aggregation reduces 6.5 GB/day to ~650 MB/day (90% reduction by sending summaries, not raw data)
  • Cloud handles historical analytics, long-term storage, and user dashboards
  • Floor-level gateways provide zone isolation (one floor failure doesn’t affect others)

Reference Model: IoT-A (multi-stakeholder: building management, tenants, energy consultants)

The 90% data reduction at the edge isn’t arbitrary – it’s mathematically driven by aggregation ratios. Let’s quantify the bandwidth savings:

Raw data at Layer 1: \[ \text{Daily Volume} = 750 \text{ readings/sec} \times 100 \text{ bytes} \times 86{,}400 \text{ sec/day} \approx 6.5 \text{ GB/day} \]

Edge aggregation at Layer 3 (5-minute summaries): \[ \text{Aggregation Factor} = \frac{300 \text{ sec}}{1 \text{ sec}} = 300{:}1 \]

The 300:1 reduction in sample count is partially offset by richer summary records (each 5-minute summary carries min/max/mean/count rather than a single reading), so the net reduction in bytes is roughly 10:1.

Reduced cloud traffic: \[ \text{Cloud Volume} = \frac{6.5 \text{ GB}}{10} = 650 \text{ MB/day} \]

Over cellular at $10/GB, this saves \((6.5 - 0.65) \times 10 = \$58.50\) per day, or $21,352 annually in bandwidth costs alone.
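The savings arithmetic above, as a quick sketch (all values are the chapter's example figures; variable names are ours):

```python
raw_gb_per_day = 6.5
reduced_gb_per_day = 0.65      # after the ~10:1 net edge aggregation
cellular_cost_per_gb = 10.0    # $/GB, the chapter's example cellular rate

daily_savings = (raw_gb_per_day - reduced_gb_per_day) * cellular_cost_per_gb
annual_savings = daily_savings * 365

print(f"${daily_savings:.2f}/day, ${annual_savings:,.2f}/year")
```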

What to Remember: The decision process is systematic - requirements drive architecture, not vice versa. Start with measurements (Step 1), use the matrix (Step 2), validate edge necessity (Step 3), then select reference model (Step 4). Never skip requirements quantification.

Figure 12.1: Architecture Selection Decision Matrix: Three key factors (device scale, latency requirements, data volume) determine whether a cloud-centric, fog computing, or edge-centric architecture is most appropriate for your IoT deployment.

Tradeoff: ITU-T Y.2060 vs IoT-A Reference Architecture

Option A (ITU-T Y.2060): Four-layer telecom-centric architecture (Device, Network, Service Support, Application). Standardized by international body, excellent for carrier integration. Best documentation for network-level concerns. Simpler conceptual model with clear layer boundaries.

Option B (IoT-A): Three-view enterprise architecture (Functional, Information, Deployment). Rich modeling of business entities and services. Better support for complex multi-stakeholder systems. More detailed security and interoperability cross-cutting concerns.

Decision Factors:

  • Choose ITU-T when: Building telecom-integrated IoT (5G/LTE-M), smart city infrastructure requiring carrier partnerships, systems where network layer is the primary complexity (protocol bridges, gateways), or teams with networking/telecom background who think in protocol stacks.

  • Choose IoT-A when: Enterprise systems with complex business logic (asset management, supply chain), multi-stakeholder deployments (hospitals, campuses) requiring access control modeling, systems where information models are critical (digital twins, virtual entities), or teams with enterprise architecture background (TOGAF, ArchiMate).

  • Practical guidance: For most IoT projects, start with ITU-T’s simpler 4-layer model for initial design. Add IoT-A’s Functional and Information views when you need to model complex business entities or multi-tenant access. Barcelona Smart City uses both: ITU-T for infrastructure, IoT-A for multi-department coordination.

Tradeoff: Centralized Gateway vs Distributed Edge Processing

Option A (Centralized Gateway): Single powerful gateway (Raspberry Pi 4, Intel NUC) aggregates all local sensors. Easier management (one device to update), unified protocol translation, simpler security perimeter. Typical capacity: 500-2,000 sensors, 10,000 messages/minute.

Option B (Distributed Edge): Multiple edge nodes (ESP32, industrial PLCs) each handle local processing. Better fault tolerance (no single point of failure), lower latency for local loops, scales horizontally. Typical capacity: 50-200 sensors per node, 1,000 messages/minute per node.

Decision Factors:

  • Choose Centralized Gateway when: Deployment area is compact (<500m radius), network is reliable (wired Ethernet or stable Wi-Fi), processing requirements are uniform across sensors, management simplicity is prioritized, or budget favors one capable device over many simple ones.

  • Choose Distributed Edge when: Latency requirements vary by zone (some <50ms, others tolerant), network partitions are possible (factory floors, multi-building campus), different sensor groups need different processing (vision in one area, vibration in another), or fault isolation is critical (one failed node shouldn’t affect others).

  • Cost comparison for 1,000 sensors: Centralized gateway (1× Intel NUC $500 + network switches $300) = $800. Distributed edge (10× ESP32 at $50 = $500, plus local switches $100) = $600, but expect roughly 5× the management overhead. Choose centralized unless you have specific distributed requirements.

Decision Criteria Explained

12.3.1 Device Scale

The number of devices fundamentally impacts architecture choices:

  • < 100 devices (Small): Simple centralized architectures work well. Direct cloud connectivity is feasible. Management overhead is minimal. Examples: home automation, small office monitoring.

  • 100-10K devices (Medium): Requires gateway aggregation and hierarchical management. Network topology becomes important. Data aggregation needed. Examples: building management, campus deployments.

  • > 10K devices (Large): Demands distributed architecture with multiple coordination points. Scalability is critical. Automated provisioning essential. Examples: smart city, nationwide sensor networks.

12.3.2 Latency Requirements

Real-time responsiveness determines processing location:

  • < 100ms (Ultra-low latency): Edge computing mandatory. Local decision-making required. Cloud used only for analytics and coordination. Examples: industrial automation, autonomous vehicles.

  • 100ms - 1s (Low latency): Hybrid architectures work well. Gateway can make decisions. Cloud handles non-time-critical tasks. Examples: smart building HVAC, traffic management.

  • > 1s acceptable (Standard latency): Cloud-centric is viable. Network delays acceptable. Simpler architecture possible. Examples: environmental monitoring, asset tracking.

12.3.3 Network Connectivity

Connection reliability shapes architecture resilience:

  • Reliable Internet: Cloud-first architecture. Continuous connectivity assumed. Centralized control and storage. Examples: urban deployments with fiber/cellular.

  • Intermittent connectivity: Fog computing for local intelligence. Store-and-forward capability. Eventual consistency models. Examples: rural areas, mobile deployments.

  • Offline periods expected: Edge autonomy required. Local data storage and processing. Synchronization when connected. Examples: maritime, remote locations.

12.3.4 Data Volume

The amount of data determines processing strategies:

  • < 1 GB/day: Full cloud transmission feasible. Simple architectures sufficient. Cost-effective bandwidth use. Examples: meter reading, simple sensors.

  • 1-100 GB/day: Edge filtering recommended. Pre-process and aggregate locally. Send summaries to cloud. Examples: video analytics, high-frequency sensors.

  • > 100 GB/day: Multi-tier processing essential. Distributed storage required. Hierarchical data reduction. Examples: video surveillance networks, continuous high-resolution sensing.

12.3.5 Industry Domain

Domain-specific requirements guide reference model selection:

  • Industrial/Manufacturing: Follow ISA-95 or RAMI 4.0. Emphasis on deterministic control, safety, and interoperability with legacy systems.

  • Smart Home/Building: Use Matter, Thread, or Zigbee standards. Focus on user experience, interoperability, and energy efficiency.

  • Healthcare/Medical: HIPAA compliance mandatory. Follow HL7 FHIR standards. Priority on privacy, security, and regulatory compliance.

  • Agriculture: Sensor network architectures (WSN). Optimize for low power and wide-area coverage. Handle seasonal data patterns.

  • Smart City: Multi-stakeholder architecture. Open data standards. Scalability and public API access.

  • General Purpose: ITU-T Y.2060 or IoT-A provide flexible frameworks applicable across domains.

Common Mistake: Treating All IoT Systems as High-Latency-Tolerant

The Scenario: A startup builds a smart home security system with door sensors, motion detectors, and cameras. Their architecture team designs a cloud-centric system where all sensor events go to AWS IoT Core for processing, then trigger alerts back to the mobile app. Latency measurements show 150-400ms round-trip times.

The Problem Emerges: Beta testers complain that “door open” notifications arrive 1-3 seconds after the door actually opens, making the security system feel unresponsive. Some testers abandon the product, saying “my $20 dumb doorbell is faster.”

What Went Wrong:

The team made three critical assumptions:

  1. “IoT = cloud” assumption: They assumed all IoT systems process data in the cloud because that’s what they learned from smart home tutorials that focused on data logging, not real-time control.

  2. Ignoring human perception: 150-400ms latency seems “fast” in absolute terms, but human perception of “instantaneous” is <100ms. Anything over 200ms feels laggy for safety/security applications.

  3. Missing the architecture selection framework: They never asked “what is our latency requirement?” If they had, they would have scored:

    • Latency requirement: < 100ms for responsive security alerts
    • Data volume: 500 devices × 10 events/day × 100 bytes = 0.5 MB/day (low)
    • Device scale: 500 beta units (small)

    Architecture selection matrix verdict: Small scale + low data + ultra-low latency = Edge processing mandatory

The Fix:

They redesigned with a hybrid architecture:

Edge Layer (local hub in home):

  • Zigbee coordinator receives door sensor events (15ms from sensor to hub)
  • Hub processes security rules locally: “If door opens AND system armed → trigger siren + send notification”
  • Immediate response: siren sounds in 30ms, notification sent to phone via local push (<50ms if phone on same Wi-Fi)

Cloud Layer (for non-time-critical functions):

  • Hub uploads event logs every 5 minutes for historical analysis
  • Cloud handles user account management, firmware updates, and long-term analytics
  • Alert history and video recordings streamed asynchronously

Before vs After:

| Metric | Cloud-Only | Hybrid Edge-Cloud |
|---|---|---|
| Door sensor → siren | 1,200 ms (AWS round-trip) | 30 ms (local hub) |
| Door sensor → phone notification | 800 ms (AWS push) | 50 ms (local, same Wi-Fi) or 300 ms (cellular) |
| User perception | “Laggy and untrustworthy” | “Instant and reliable” |
| Works during internet outage? | No (system unusable) | Yes (security functions continue) |
| Bandwidth cost (500 devices) | $25/month (AWS IoT Core) | $5/month (aggregated logs only) |

Cost Impact:

  • Hardware: Added $35 Zigbee hub per home (one-time)
  • Development: 2 extra months to implement edge logic ($60K)
  • But: Reducing churn from 40% to 8% retained 160 customers × $15/month = $2,400/month in recurring revenue ($57,600 over 24 months), recouping the $60K development cost in roughly 25 months

How to Avoid This:

  1. Map latency requirements early: Before writing code, create a latency SLA table:

| Feature | Human-Perceived Requirement | Technical Latency Budget | Edge or Cloud? |
|---|---|---|---|
| Security alert | “Instant” | < 100ms | Edge mandatory |
| Temperature display | “Current” | < 2 seconds | Cloud acceptable |
| Monthly usage report | “On-demand” | < 5 seconds | Cloud acceptable |

  2. Use the architecture selection framework: Don’t guess. Calculate:

    • If any feature requires < 100ms latency → Edge processing needed
    • If > 50% of data is time-critical → Edge reduces bandwidth costs
    • If reliability during internet outages matters → Edge provides autonomy

  3. Prototype both approaches: Build a minimal edge version AND a cloud version in parallel for 2-week sprints. Measure actual latency, not simulated. User testing reveals perception issues that specs miss.

  4. Benchmark competitor latency: Smart home security market leaders (Ring, SimpliSafe) achieve 50-150ms notification times. If your cloud-only approach is 2-4× slower, users will notice.

Industry Examples:

  • Google Nest Hub (correct): Local voice processing for “Hey Google” (<100ms), cloud only for intent understanding → feels responsive
  • Early Nest Thermostat (mistake): Cloud-only temperature scheduling → system unresponsive when internet dropped → redesigned with local automation
  • Philips Hue Bridge (correct): Edge processing for on/off/dimming (<30ms), cloud for scenes/automation → fast for immediate control

Lesson: Not all IoT is tolerant of cloud latency. Security, safety, and human-interface systems need <100ms response times that only edge processing can deliver. Cloud-centric is not a universal IoT architecture pattern — it’s one option in a spectrum from edge to cloud, selected based on latency requirements, not trends or familiarity.

12.4 Multi-Region Architecture Patterns

Deploying IoT systems across multiple geographic regions introduces unique architectural challenges. Here are proven patterns for global-scale IoT deployments.

Pattern 1: Regional Edge with Global Orchestration

          ┌────────────────────────────────────┐
          │         Global Orchestrator        │
          │  (Configuration, Analytics, ML)    │
          └────────────────┬───────────────────┘
                           │
     ┌─────────────────────┼─────────────────────┐
     │                     │                     │
┌────▼─────┐          ┌────▼─────┐          ┌────▼─────┐
│ US-WEST  │          │    EU    │          │   APAC   │
│ Regional │          │ Regional │          │ Regional │
│   Hub    │          │   Hub    │          │   Hub    │
└────┬─────┘          └────┬─────┘          └────┬─────┘
     │                     │                     │
 Local Edge            Local Edge            Local Edge

Implementation:

# Note: TimescaleDB and GlobalSync are illustrative client wrappers, not real libraries.
class RegionalHub:
    def __init__(self, region: str, local_endpoints: list):
        self.region = region
        self.local_endpoints = local_endpoints  # edge gateways served by this hub
        self.local_db = TimescaleDB(f"{region}-tsdb.example.com")
        self.global_sync = GlobalSync("global-orchestrator.example.com")

    def process_device_data(self, device_id: str, data: dict):
        # Step 1: Store locally first (low latency)
        self.local_db.insert(device_id, data)

        # Step 2: Apply local rules (real-time)
        alerts = self.apply_local_rules(data)
        if alerts:
            self.notify_local_operators(alerts)

        # Step 3: Sync aggregates to global (async, eventual consistency)
        self.global_sync.queue_aggregate({
            "region": self.region,
            "device_id": device_id,
            "hourly_summary": self.compute_summary(data)
        })

    def handle_command(self, command: dict):
        # Commands can come from global or local
        if command["source"] == "global":
            # Verify authorization for cross-region commands
            if not self.global_sync.verify_command_auth(command):
                raise UnauthorizedError("Global command not authorized")
        # Execute locally
        return self.execute_command(command)

Pattern 2: Data Sovereignty Compliance

# Regional data handling configuration
regions:
  EU:
    data_residency: "eu-west-1"
    pii_handling: "gdpr"
    retention_days: 730
    cross_border_transfer: false
    encryption: "AES-256-GCM"

  US:
    data_residency: "us-east-1"
    pii_handling: "ccpa"
    retention_days: 365
    cross_border_transfer: true
    encryption: "AES-256-GCM"

  CHINA:
    data_residency: "cn-beijing"
    pii_handling: "mlps"
    retention_days: 1095
    cross_border_transfer: false
    encryption: "SM4"  # Chinese national standard

# Aggregation rules for global analytics
global_analytics:
  allowed_data:
    - device_counts_per_region
    - anonymized_usage_patterns
    - aggregated_sensor_averages
  prohibited_data:
    - raw_sensor_readings
    - device_identifiers
    - user_pii
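A hedged sketch of how this configuration might be enforced in code. The `REGIONS` dict mirrors the YAML above; the function and field names are ours, not from any compliance library:

```python
# Mirrors the regional YAML configuration above; names are illustrative.
REGIONS = {
    "EU":    {"cross_border_transfer": False, "retention_days": 730},
    "US":    {"cross_border_transfer": True,  "retention_days": 365},
    "CHINA": {"cross_border_transfer": False, "retention_days": 1095},
}

# Pre-aggregated, anonymized categories permitted for global analytics.
GLOBAL_ANALYTICS_ALLOWED = {
    "device_counts_per_region",
    "anonymized_usage_patterns",
    "aggregated_sensor_averages",
}

def may_export(region: str, data_category: str) -> bool:
    """Can this data category leave its home region for global analytics?"""
    if data_category in GLOBAL_ANALYTICS_ALLOWED:
        return True  # aggregated/anonymized data is exportable everywhere
    return REGIONS[region]["cross_border_transfer"]

# Raw readings may not leave the EU, but aggregates may:
assert may_export("EU", "raw_sensor_readings") is False
assert may_export("EU", "aggregated_sensor_averages") is True
```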

Pattern 3: Latency-Optimized Routing

import math

class GlobalRouter:
    """Route device connections to nearest regional hub."""

    def __init__(self):
        self.regions = {
            "us-west": {"endpoint": "iot.us-west.example.com", "lat": 37.7, "lng": -122.4},
            "us-east": {"endpoint": "iot.us-east.example.com", "lat": 40.7, "lng": -74.0},
            "eu-west": {"endpoint": "iot.eu-west.example.com", "lat": 51.5, "lng": -0.1},
            "apac":    {"endpoint": "iot.apac.example.com", "lat": 35.7, "lng": 139.7},
        }

    @staticmethod
    def _haversine(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
        """Great-circle distance in kilometers between two (lat, lng) points."""
        r = 6371.0  # mean Earth radius, km
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dphi = math.radians(lat2 - lat1)
        dlng = math.radians(lng2 - lng1)
        a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlng / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def get_nearest_endpoint(self, device_lat: float, device_lng: float) -> str:
        """Return endpoint for nearest regional hub."""
        min_distance = float('inf')
        nearest = None

        for region, info in self.regions.items():
            distance = self._haversine(device_lat, device_lng, info["lat"], info["lng"])
            if distance < min_distance:
                min_distance = distance
                nearest = info["endpoint"]

        return nearest

    def failover_endpoint(self, primary_region: str) -> str:
        """Return backup endpoint if primary is unavailable."""
        failover_map = {
            "us-west": "us-east",
            "us-east": "us-west",
            "eu-west": "us-east",  # lowest-latency fallback; verify data-sovereignty rules permit it
            "apac": "us-west"
        }
        return self.regions[failover_map[primary_region]]["endpoint"]

Cost-latency trade-offs:

| Deployment | Latency | Monthly Cost | Best For |
|---|---|---|---|
| Single region | 50-200ms globally | $1,000 | Small scale, single market |
| 3 regions | 20-80ms | $5,000 | Global consumer products |
| Edge + 3 regions | 5-30ms | $15,000 | Real-time industrial IoT |
| Full mesh (5+ regions) | <10ms everywhere | $50,000+ | Gaming, financial, critical |

Apply the decision framework to this scenario:

Your Scenario: You’re building a smart warehouse with 800 temperature sensors, 200 asset tracking tags, and 50 security cameras across 3 buildings. Requirements:

  • Temperature alerts must trigger within 30 seconds
  • Asset tracking updates every 5 minutes are acceptable
  • Video archived for 90 days, rarely accessed
  • Network: Wi-Fi in buildings, but intermittent between buildings

Exercise Steps:

  1. Calculate data volume:

    • Temperature: 800 sensors × 10 bytes × 12/hour = 96 KB/hour
    • Asset tracking: 200 tags × 50 bytes × 12/hour = 120 KB/hour
    • Video: 50 cameras × 2 Mbps × 8 hours = 360 GB/day (2 Mbps ≈ 0.9 GB/hour per camera)
  2. Map to decision matrix:

    • Device scale: 1,050 devices → Medium (100-10K range)
    • Latency: 30 seconds for alerts → Tolerant (>1s) on its own, but alerts must still fire when the inter-building link is down
    • Connectivity: Intermittent between buildings → Fog/hybrid needed
    • Data volume: 360 GB/day → High (>100 GB/day)
  3. Architecture decision: What architecture tier should you select? (Hint: mixed requirements!)

  4. Edge processing question: Should temperature threshold checking happen at edge or cloud? Why?

  5. Storage strategy: Where should you store video vs temperature data?

What to Observe: Notice how the video cameras (high data volume) drive your storage architecture differently than the temperature sensors (low data volume). The 30-second alert deadline, combined with intermittent inter-building connectivity, forces temperature threshold checks to the edge, even though video can use cloud storage.

12.5 Concept Relationships

| IoT Architecture Concept | Relationship to Architecture Selection | Practical Impact |
|---|---|---|
| Device Scale | Determines gateway count and network topology | <100 devices = direct cloud; >10K = multi-tier with regional hubs |
| Latency Requirements | Drives edge vs cloud processing split | <100ms = mandatory edge; >1s = cloud acceptable |
| Connectivity Reliability | Influences store-and-forward buffering needs | Intermittent = fog with local intelligence; reliable = cloud-centric viable |
| Data Volume | Determines bandwidth costs and storage architecture | >100 GB/day = edge aggregation mandatory; <1 GB/day = full cloud transmission feasible |
| Industry Domain | Guides reference model selection (ITU-T vs IoT-A vs ISA-95) | Smart city = ITU-T; enterprise = IoT-A; factory = ISA-95/RAMI 4.0 |

Common Pitfalls

Teams often default to architectures they know (e.g., “we always use AWS IoT Core”) rather than evaluating whether it fits the project’s latency, connectivity, and cost constraints. A thorough requirements analysis — latency, data volume, connectivity, budget, regulatory — must drive architecture selection before technology choices are made.

Functional requirements (what the system does) are easy to specify; non-functional requirements (how well it does it) are critical but often neglected. A system that meets all functional requirements but suffers from high latency, poor reliability, or prohibitive operating costs will fail in production. Include NFRs explicitly in architecture scoring matrices.

Architecture selection often focuses on initial build cost while ignoring 5-year total cost of ownership: cloud egress fees, device firmware update complexity, battery replacement logistics, and support costs. A serverless architecture may cost less to build but much more to operate at scale than a self-hosted edge deployment.
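To make the 5-year TCO comparison concrete, here is a back-of-envelope sketch. All figures are placeholders chosen to illustrate the calculation, not real prices:

```python
def five_year_tco(build_cost: float, monthly_cloud: float, monthly_ops: float,
                  devices: int, per_device_maintenance_per_year: float) -> float:
    """5-year TCO: build cost + 60 months of cloud/ops + per-device maintenance."""
    months = 60
    return (build_cost
            + months * (monthly_cloud + monthly_ops)
            + 5 * devices * per_device_maintenance_per_year)

# Hypothetical comparison for 1,000 devices: serverless is cheaper to build,
# but recurring cloud fees dominate over five years.
serverless = five_year_tco(build_cost=50_000, monthly_cloud=4_000, monthly_ops=1_000,
                           devices=1_000, per_device_maintenance_per_year=5)
self_hosted_edge = five_year_tco(build_cost=120_000, monthly_cloud=500, monthly_ops=2_000,
                                 devices=1_000, per_device_maintenance_per_year=15)
print(serverless, self_hosted_edge)  # → 375000 345000
```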

Architecture decisions made purely on paper often rest on untested assumptions about network reliability, sensor accuracy, or protocol performance. Build a minimum viable prototype that stress-tests the riskiest architectural assumptions before committing to full development.

12.6 Summary

The architecture selection framework helps you make systematic decisions based on:

| Factor | Cloud-Centric | Fog/Hybrid | Edge-Centric |
|---|---|---|---|
| Device Scale | <100 | 100-10K | >10K |
| Latency | >1s acceptable | 100ms-1s | <100ms critical |
| Connectivity | Reliable | Intermittent | Offline expected |
| Data Volume | <1 GB/day | 1-100 GB/day | >100 GB/day |

Key insights:

  • Mixed requirements demand multi-tier architectures - don’t force a single pattern
  • Start simple and add tiers only when requirements demand them
  • Industry domain influences reference model selection
  • Multi-region deployments require careful data sovereignty planning

12.7 See Also

12.8 Knowledge Check

12.9 What’s Next

| If you want to… | Read this |
|---|---|
| See real-world application architectures | Architecture Applications |
| Study common architectural mistakes | Common Pitfalls and Best Practices |
| Work through a complete smart building example | Smart Building Worked Example |
| Learn about production architecture management | Production Architecture Management |
| Explore QoS and service levels | QoS and Service Management |