1273  CAP Theorem and Database Categories

Learning Objectives

After completing this chapter, you will be able to:

  • Explain the CAP theorem and its implications for distributed IoT systems
  • Choose between consistency and availability for different IoT data types
  • Select appropriate database categories for specific use cases
  • Design distributed database architectures that handle network partitions
MVU: CAP Theorem for IoT

  • Core Concept: In a distributed database you can guarantee only two of three properties: Consistency (all nodes see the same data), Availability (every request gets a response), and Partition tolerance (the system keeps working despite network failures).
  • Why It Matters: IoT systems span edge, fog, and cloud over unreliable networks, so partition tolerance is mandatory; you must choose between consistency (CP) for control commands and availability (AP) for sensor data.
  • Key Takeaway: Use AP (eventual consistency) for sensor telemetry, where a few out-of-order readings are acceptable; use CP (strong consistency) for device commands, firmware updates, and billing records, where conflicts could cause failures or financial errors.

1273.1 The CAP Theorem

The CAP theorem states that a distributed data store can simultaneously guarantee at most two of the following three properties:

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1'}}}%%
flowchart TB
    CAP[CAP Theorem<br/>Pick Any 2 of 3] --> C[Consistency<br/>All nodes see same data]
    CAP --> A[Availability<br/>Every request gets response]
    CAP --> P[Partition Tolerance<br/>Works despite network failures]

    C & A --> CA[CA System<br/>Traditional RDBMS<br/>PostgreSQL, MySQL]
    C & P --> CP[CP System<br/>Strong Consistency<br/>HBase, MongoDB]
    A & P --> AP[AP System<br/>High Availability<br/>Cassandra, DynamoDB]

    AP --> IoTSensor[IoT Sensor Data<br/>Eventual consistency OK]
    CP --> IoTControl[IoT Control Commands<br/>Must be consistent]

    style CAP fill:#E67E22,stroke:#2C3E50,color:#fff
    style C fill:#2C3E50,stroke:#16A085,color:#fff
    style A fill:#16A085,stroke:#2C3E50,color:#fff
    style P fill:#2C3E50,stroke:#16A085,color:#fff
    style CA fill:#7F8C8D,stroke:#2C3E50,color:#fff
    style CP fill:#E67E22,stroke:#2C3E50,color:#fff
    style AP fill:#27AE60,stroke:#2C3E50,color:#fff

Figure 1273.1: CAP Theorem Trade-offs: Distributed systems can guarantee only two of Consistency, Availability, and Partition Tolerance; IoT sensor data often accepts AP (eventual consistency), while control commands require CP (strong consistency).

1273.2 CAP Theorem Timeline Visualization

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1'}}}%%
flowchart LR
    subgraph Phase1["Phase 1: Normal Operation"]
        N1[Node A<br/>Data: v1] <-->|Sync OK| N2[Node B<br/>Data: v1]
        N1 --> C1[Consistent]
        N1 --> A1[Available]
    end

    subgraph Phase2["Phase 2: Network Partition"]
        P1[Node A<br/>Write v2?]
        P2[Node B<br/>Cannot sync]
        P1 -.->|No connection| P2

        P1 --> Choice{System Choice}
        Choice -->|CP Mode| CP[Reject write<br/>Stay consistent<br/>Unavailable]
        Choice -->|AP Mode| AP[Accept write<br/>v2 on A only<br/>Inconsistent]
    end

    subgraph Phase3["Phase 3: Recovery"]
        R1[Partition heals]
        R1 -->|CP Mode| RCP[No conflicts<br/>Data: v1 everywhere]
        R1 -->|AP Mode| RAP[Conflict resolution<br/>Merge v1 and v2]
    end

    Phase1 --> Phase2 --> Phase3

    style N1 fill:#27AE60,stroke:#2C3E50,color:#fff
    style N2 fill:#27AE60,stroke:#2C3E50,color:#fff
    style P1 fill:#E67E22,stroke:#2C3E50,color:#fff
    style P2 fill:#E67E22,stroke:#2C3E50,color:#fff
    style CP fill:#2C3E50,stroke:#16A085,color:#fff
    style AP fill:#16A085,stroke:#2C3E50,color:#fff
    style Choice fill:#7F8C8D,stroke:#2C3E50,color:#fff

Figure 1273.2: CAP Theorem Timeline: How consistency and availability trade-offs manifest during network partitions.
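
The same three phases can be captured in a short, self-contained Python sketch. This is a toy model, not a real replication protocol: the two-node setup, the mode flag, and the last-write-wins merge are illustrative assumptions chosen to mirror the diagram above.

import time
from dataclasses import dataclass, field

@dataclass
class Node:
    """A toy replica holding key -> (timestamp, value)."""
    name: str
    data: dict = field(default_factory=dict)

def write(primary: Node, replica: Node, key: str, value,
          partitioned: bool, mode: str) -> bool:
    """Phases 1-2: apply a write, honoring the CP-or-AP choice during a partition."""
    if partitioned and mode == "CP":
        return False                        # CP: reject the write, stay consistent
    record = (time.time(), value)
    primary.data[key] = record
    if not partitioned:
        replica.data[key] = record          # normal operation: replicate immediately
    return True                             # AP: accepted locally; replica may be stale

def heal(a: Node, b: Node) -> None:
    """Phase 3: after the partition heals, merge with last-write-wins (AP path)."""
    for key in set(a.data) | set(b.data):
        newest = max(n.data[key] for n in (a, b) if key in n.data)
        a.data[key] = b.data[key] = newest  # tuples compare by timestamp first

a, b = Node("A"), Node("B")
write(a, b, "setpoint", "v1", partitioned=False, mode="AP")  # both nodes hold v1
write(a, b, "setpoint", "v2", partitioned=True, mode="AP")   # only A holds v2
heal(a, b)                                                   # B converges to v2

In CP mode the second write would instead return False: the system stays consistent at the cost of rejecting the request, which is exactly the unavailability branch shown in Phase 2.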

1273.3 CAP Theorem for IoT Data Types

Different IoT data types have different consistency requirements:

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1'}}}%%
flowchart TB
    subgraph Data["IoT Data Types"]
        Sensor[Sensor Telemetry<br/>Temperature, Vibration<br/>1000s readings/sec]
        Command[Device Commands<br/>Actuator Control<br/>Config Updates]
        Meta[Device Metadata<br/>Inventory, Firmware<br/>Infrequent updates]
        Billing[Billing Records<br/>Usage Metering<br/>Financial data]
    end

    subgraph Priority["CAP Priority"]
        AP[AP Priority<br/>Availability First<br/>Eventual Consistency]
        CP[CP Priority<br/>Consistency First<br/>May Reject Writes]
        CA[CA Priority<br/>Single Datacenter<br/>Full ACID]
    end

    Sensor -->|"OK to lose a<br/>few readings"| AP
    Command -->|"Must not have<br/>conflicting commands"| CP
    Meta -->|"Partitions rare<br/>for metadata"| CA
    Billing -->|"Financial accuracy<br/>required"| CP

    AP --> APEx[Cassandra, DynamoDB<br/>InfluxDB]
    CP --> CPEx[MongoDB, HBase<br/>PostgreSQL Sync]
    CA --> CAEx[PostgreSQL<br/>MySQL Cluster]

    style Sensor fill:#16A085,stroke:#2C3E50,color:#fff
    style Command fill:#E67E22,stroke:#2C3E50,color:#fff
    style Meta fill:#2C3E50,stroke:#16A085,color:#fff
    style Billing fill:#E74C3C,stroke:#2C3E50,color:#fff
    style AP fill:#27AE60,stroke:#2C3E50,color:#fff
    style CP fill:#E67E22,stroke:#2C3E50,color:#fff
    style CA fill:#2C3E50,stroke:#16A085,color:#fff

1273.3.1 IoT Implications

  • Sensor data: Often AP (availability + partition tolerance) - eventual consistency is acceptable
  • Control commands: Often CP (consistency + partition tolerance) - strong consistency is needed
  • Edge deployments: Must handle partitions (network failures) - see the consistency-level sketch below
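
One concrete way to act on these priorities is per-request tunable consistency. The sketch below is a minimal example assuming a Cassandra cluster on localhost, the DataStax cassandra-driver package, and a hypothetical iot keyspace with telemetry and commands tables: telemetry is written at consistency level ONE (AP-leaning, any single replica suffices), while commands are written at QUORUM (CP-leaning, a majority of replicas must acknowledge).

from datetime import datetime, timezone

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Hypothetical cluster address, keyspace, and table names, for illustration only.
session = Cluster(["127.0.0.1"]).connect("iot")

# Sensor telemetry: availability first; a single replica acknowledgement is enough.
telemetry_insert = SimpleStatement(
    "INSERT INTO telemetry (device_id, ts, value) VALUES (%s, %s, %s)",
    consistency_level=ConsistencyLevel.ONE,
)
session.execute(telemetry_insert, ("sensor-42", datetime.now(timezone.utc), 21.5))

# Device commands: consistency first; require a majority of replicas to acknowledge.
command_insert = SimpleStatement(
    "INSERT INTO commands (device_id, issued_at, command) VALUES (%s, %s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(command_insert, ("actuator-7", datetime.now(timezone.utc), "OPEN_VALVE"))

During a partition, the ONE write keeps succeeding as long as any replica is reachable, while the QUORUM write fails fast when a majority cannot be reached, which is the behavior you want for potentially conflicting actuator commands.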

1273.4 Tradeoff: Synchronous vs Asynchronous Write-Ahead Log

Tradeoff: Synchronous vs Asynchronous WAL

Option A: Synchronous WAL (fsync on every write)

  • Write latency: 5-20ms per write (disk sync overhead)
  • Durability guarantee: zero data loss on crash (every write is persisted)
  • Write throughput: 5,000-20,000 writes/sec (limited by disk IOPS)
  • Hardware requirements: high-performance SSD (100K+ IOPS) recommended
  • Use cases: financial transactions, billing records, device commands, audit logs

Option B: Asynchronous WAL (batch fsync every 100ms-1s)

  • Write latency: 0.5-2ms per write (memory buffer only)
  • Durability guarantee: up to 1 second of data loss on crash
  • Write throughput: 100,000-500,000 writes/sec (memory-bound)
  • Hardware requirements: standard SSD sufficient
  • Use cases: high-velocity sensor telemetry, metrics, and logs where occasional loss is acceptable

Decision Factors:

  • Choose Synchronous WAL when: every record is critical (billing, safety alerts), regulatory compliance requires provable data integrity, the recovery point objective (RPO) is zero, or write volume is under 20K/sec
  • Choose Asynchronous WAL when: write volume exceeds 50K/sec, losing 1 second of sensor data is acceptable, latency-sensitive applications cannot wait for disk sync, or cost optimization requires commodity hardware
  • Hybrid approach: synchronous for commands and alerts (low volume, high value); asynchronous for telemetry (high volume, replaceable). PostgreSQL supports per-transaction synchronous_commit settings (see the sketch below).
  • Real-world numbers: a fleet of 10K sensors reporting at 1Hz requires 10K IOPS with a synchronous WAL; the same load needs only about 100 IOPS with an asynchronous WAL (a 100x reduction)
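
As noted above, PostgreSQL allows durability to be relaxed per transaction via synchronous_commit. The sketch below is a minimal example using psycopg2; the connection string and the telemetry and device_commands tables are hypothetical.

import psycopg2

conn = psycopg2.connect("dbname=iot")  # hypothetical connection string

# Telemetry: commit returns before the WAL is flushed to disk (async, this transaction only).
with conn, conn.cursor() as cur:
    cur.execute("SET LOCAL synchronous_commit = off")
    cur.execute(
        "INSERT INTO telemetry (device_id, ts, value) VALUES (%s, now(), %s)",
        ("sensor-42", 21.5),
    )

# Commands: leave synchronous_commit at its default (on) so the commit waits for fsync.
with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO device_commands (device_id, command) VALUES (%s, %s)",
        ("actuator-7", "OPEN_VALVE"),
    )

conn.close()

Because SET LOCAL applies only to the enclosing transaction, a crash can lose at most the recent asynchronous telemetry commits, while command writes retain the full durability guarantee.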

1273.5 Database Categories Comparison

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1'}}}%%
flowchart TB
    subgraph Header["Database Comparison for IoT"]
        H1[Write Speed]
        H2[Query Power]
        H3[Scalability]
    end

    subgraph Relational["Relational (SQL)"]
        R1[Medium Write Speed]
        R2[High Query Power<br/>Complex Joins]
        R3[Vertical Scaling]
        R4[Device metadata<br/>User accounts]
    end

    subgraph Document["Document (NoSQL)"]
        D1[High Write Speed]
        D2[Medium Query Power<br/>JSON queries]
        D3[Horizontal Scaling]
        D4[Configurations<br/>Event logs]
    end

    subgraph KeyValue["Key-Value"]
        K1[Very High Write Speed]
        K2[Low Query Power<br/>Get/Set only]
        K3[Horizontal Scaling]
        K4[Caching<br/>Latest state]
    end

    subgraph TimeSeries["Time-Series"]
        T1[Very High Write Speed]
        T2[Medium Query Power<br/>Time queries]
        T3[Horizontal Scaling]
        T4[Sensor readings<br/>Metrics]
    end

    subgraph Graph["Graph"]
        G1[Medium Write Speed]
        G2[High Query Power<br/>Traversals]
        G3[Horizontal Scaling]
        G4[Device topology<br/>Dependencies]
    end

    style Header fill:#2C3E50,stroke:#16A085,color:#fff
    style Relational fill:#ecf0f1,stroke:#2C3E50,color:#2C3E50
    style Document fill:#ecf0f1,stroke:#16A085,color:#2C3E50
    style KeyValue fill:#ecf0f1,stroke:#E67E22,color:#2C3E50
    style TimeSeries fill:#E8F4F8,stroke:#16A085,color:#2C3E50
    style Graph fill:#ecf0f1,stroke:#2C3E50,color:#2C3E50

Figure 1273.3: Database Comparison Matrix: Trade-offs between write speed, query complexity, and scalability for IoT workloads.

1273.5.1 Comparison Matrix

| Database Type    | Best For              | Scalability | Query Complexity      | Write Speed | Example Use Case                  |
|------------------|------------------------|-------------|-----------------------|-------------|-----------------------------------|
| Relational (SQL) | Structured data, ACID  | Vertical    | High (SQL)            | Medium      | Device inventory, user accounts   |
| Document (NoSQL) | Semi-structured        | Horizontal  | Medium (queries)      | High        | Device configurations, events     |
| Key-Value        | Simple lookups         | Horizontal  | Low (get/set)         | Very High   | Session data, caching             |
| Time-Series      | Time-stamped data      | Horizontal  | Medium (time queries) | Very High   | Sensor readings, metrics          |
| Wide-Column      | Large-scale analytics  | Horizontal  | Medium                | Very High   | Device logs, telemetry            |
| Graph            | Relationships          | Horizontal  | High (traversals)     | Medium      | Device topology, network analysis |
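
To make the example use cases concrete, the sketch below formats the same sensor reading for a time-series store (InfluxDB line protocol) and the matching device metadata as a relational insert. The measurement, table, and column names are illustrative assumptions.

def reading_as_line_protocol(device_id: str, location: str, celsius: float, ts_ns: int) -> str:
    """Time-series ingest: measurement,tags fields timestamp (InfluxDB line protocol)."""
    return f"temperature,device_id={device_id},location={location} value={celsius} {ts_ns}"

def device_as_sql(device_id: str, name: str, location: str) -> str:
    """Relational metadata: one row per device (illustration only; use parameterized queries in practice)."""
    return (
        "INSERT INTO devices (device_id, name, location) "
        f"VALUES ('{device_id}', '{name}', '{location}');"
    )

print(reading_as_line_protocol("sensor-42", "plant-a", 21.5, 1_700_000_000_000_000_000))
# temperature,device_id=sensor-42,location=plant-a value=21.5 1700000000000000000
print(device_as_sql("sensor-42", "Boiler inlet probe", "plant-a"))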

1273.6 Scaling Trade-offs

1273.6.1 Tradeoff: Horizontal Sharding vs Replication

Tradeoff: Horizontal Sharding vs Replication for IoT Scale

Option A: Horizontal Sharding

  • Distribute data across nodes by device_id hash or time range
  • Write throughput: 500K-2M writes/sec (scales linearly with nodes)
  • Storage per node: 1/N of the total data (e.g., 100GB per node across a 10-node, 1TB cluster)
  • Query latency: 5-50ms for single-shard queries; 50-500ms for cross-shard aggregations
  • Operational cost: $2,000-5,000/month for a 10-node cluster (cloud managed)

Option B: Read Replicas

  • Single primary with multiple read replicas
  • Write throughput: 50K-100K writes/sec (limited by the single primary)
  • Storage per node: full dataset on every replica (e.g., 1TB on each of 5 nodes)
  • Query latency: 2-10ms for local reads; 10-50ms replication lag
  • Operational cost: $1,500-3,000/month for a primary plus 4 replicas (cloud managed)

Decision Factors:

  • Choose Sharding when: write volume exceeds 100K/sec, the single-node storage limit is exceeded (>10TB), or queries are device-specific (90%+ hit a single shard)
  • Choose Replication when: read volume is 10x+ the write volume, writes stay under 100K/sec, you need geographic read distribution, or strong consistency is required for all reads
  • Hybrid approach: shard by device_id and replicate each shard for high availability; most production IoT systems use this (a shard-routing sketch follows below)
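
A minimal sketch of the routing step in Option A, assuming four hypothetical shards: hashing device_id keeps all of a device's readings on one shard, so the device-specific queries mentioned above stay single-shard.

import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical shard names

def shard_for_device(device_id: str) -> str:
    """Route by a stable hash of device_id so every reading from a device lands on one shard."""
    digest = hashlib.sha1(device_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for_device("sensor-42"))  # the same device always maps to the same shard
print(shard_for_device("sensor-43"))  # other devices spread across the cluster

Note that a plain modulo mapping reshuffles most keys when shards are added or removed; production systems typically layer consistent hashing or a shard directory on top of this idea.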

1273.6.2 Tradeoff: Hot vs Cold Storage

Tradeoff: Hot Storage (SSD/Memory) vs Cold Storage (Object Storage/HDD)

Option A: Hot Storage (NVMe SSD, In-Memory Cache)

  • Query latency: 1-10ms for any time range
  • Storage cost: $0.10-0.25/GB/month (cloud SSD), $500/TB (on-prem NVMe)
  • Throughput: 500K-2M IOPS (optimized for random access)
  • Durability: requires replication for redundancy
  • Best for: real-time dashboards, alerting, the last 7-30 days of data

Option B: Cold Storage (S3/GCS Object Storage, HDD)

  • Query latency: 100-500ms to first byte, 1-10 seconds for large scans
  • Storage cost: $0.01-0.03/GB/month (cloud object storage), $20/TB (on-prem HDD)
  • Throughput: 100-1000 MB/s sequential (poor random access)
  • Durability: 99.999999999% (11 nines) built-in redundancy
  • Best for: compliance archives, ML training data, data older than 30 days

Decision Factors:

  • Choose Hot Storage when: the query latency SLA is under 50ms, any data point may be accessed randomly and frequently, alerting requires instant historical context, or dashboards refresh every few seconds
  • Choose Cold Storage when: data is accessed less than once per month, batch analytics can tolerate multi-second latency, the storage budget is the primary constraint, or regulatory retention requires 7+ years
  • Tiered approach: Hot (last 7 days on SSD), Warm (7-90 days on cheaper SSD), Cold (90+ days on object storage); this implements automatic data lifecycle management (a sketch of the tier choice follows below)
  • Cost example: 10TB of sensor data costs $1,000/month on SSD vs $100/month on S3, a 10x saving for the 90% of data that is rarely accessed
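
A minimal sketch of the age-based tier choice behind the tiered approach; the 7-day and 90-day boundaries come from the list above, and the tier names are illustrative.

from datetime import datetime, timedelta, timezone
from typing import Optional

HOT_WINDOW = timedelta(days=7)     # NVMe SSD / in-memory cache
WARM_WINDOW = timedelta(days=90)   # cheaper SSD

def storage_tier(reading_ts: datetime, now: Optional[datetime] = None) -> str:
    """Pick hot/warm/cold storage based on the age of a reading."""
    age = (now or datetime.now(timezone.utc)) - reading_ts
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"                  # object storage (e.g., S3/GCS)

now = datetime.now(timezone.utc)
print(storage_tier(now - timedelta(days=3)))    # hot
print(storage_tier(now - timedelta(days=30)))   # warm
print(storage_tier(now - timedelta(days=400)))  # cold

In practice this decision is usually delegated to the storage layer itself (retention and downsampling policies, or object-storage lifecycle rules) rather than application code, but the boundary logic is the same.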

1273.6.3 Tradeoff: Normalized vs Denormalized Schema

Tradeoff: Normalized Schema vs Denormalized Schema for IoT

Option A: Normalized Schema (3NF, Foreign Keys)

  • Storage efficiency: high, with no duplicate data (device metadata stored once)
  • Write performance: fast, since metadata is updated in a single location
  • Query performance: slower, requiring JOINs across 3-5 tables
  • Schema changes: easy, by adding columns to dimension tables
  • Query latency: 10-100ms for typical dashboard queries (JOIN overhead)
  • Best for: device registry, user accounts, billing records

Option B: Denormalized Schema (Flat Tables, Embedded Data)

  • Storage efficiency: lower, since device name and location are repeated in every reading
  • Write performance: slower, because every insert must include all metadata
  • Query performance: faster, via a single table scan with no JOINs required
  • Schema changes: hard, requiring a backfill of millions of rows for new attributes
  • Query latency: 2-20ms for typical dashboard queries (no JOIN overhead)
  • Best for: high-velocity sensor telemetry, time-series analytics

Decision Factors:

  • Choose Normalized when: device metadata changes frequently (firmware versions, calibration), storage cost is critical, data integrity is paramount, or analytics require flexible ad-hoc joins
  • Choose Denormalized when: query latency is critical (<10ms SLA), metadata is stable (locations rarely change), write volume exceeds 100K/sec, or the primary access pattern is time-range scans
  • Hybrid approach: normalize the device registry (PostgreSQL) and denormalize telemetry with embedded tags (InfluxDB/TimescaleDB); join at query time when needed and pre-join for dashboards (see the sketch below)
  • Storage impact: 100 bytes/reading with embedded metadata vs 50 bytes without means 2x the storage, but it avoids 5-10ms of JOIN latency on every query
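
The sketch below contrasts the two shapes using SQLite from the Python standard library so it runs anywhere; the table and column names are hypothetical, and the two dashboard queries show the JOIN that the normalized shape pays for on every read.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- Normalized: metadata lives once in devices; readings carry only a key.
CREATE TABLE devices  (device_id TEXT PRIMARY KEY, name TEXT, location TEXT);
CREATE TABLE readings (device_id TEXT REFERENCES devices(device_id), ts TEXT, value REAL);

-- Denormalized: metadata repeated in every row; no JOIN needed at query time.
CREATE TABLE readings_flat (device_id TEXT, name TEXT, location TEXT, ts TEXT, value REAL);
""")

# Dashboard query against the normalized shape: a JOIN on every read.
normalized = """
SELECT d.location, AVG(r.value)
FROM readings r JOIN devices d ON d.device_id = r.device_id
WHERE r.ts >= ? GROUP BY d.location
"""

# The same dashboard query against the denormalized shape: a single table scan.
denormalized = """
SELECT location, AVG(value)
FROM readings_flat
WHERE ts >= ? GROUP BY location
"""

for query in (normalized, denormalized):
    print(cur.execute(query, ("2024-01-01",)).fetchall())  # [] on the empty tables; the shapes are the point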

1273.7 Summary

  • CAP theorem constrains distributed systems to two of three guarantees: Consistency, Availability, Partition tolerance
  • IoT sensor data often accepts eventual consistency (AP) for higher availability
  • Control commands require strong consistency (CP) to prevent conflicting actions
  • Synchronous WAL guarantees durability at cost of write latency; async WAL trades durability for throughput
  • Sharding enables horizontal write scaling; replication enables read scaling and high availability
  • Storage tiering (hot/warm/cold) reduces costs by 90%+ while meeting latency SLAs
  • Schema normalization trade-offs depend on query patterns, update frequency, and latency requirements

1273.8 What’s Next

Continue with Time-Series Databases to learn about optimizing storage for sensor telemetry with TimescaleDB and InfluxDB.

Related Chapters:

  • Database Selection Framework - choosing the right database
  • Sharding Strategies - a deep dive into horizontal scaling
  • Worked Examples - complete architecture case studies