%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1'}}}%%
flowchart TB
CAP[CAP Theorem<br/>Pick Any 2 of 3] --> C[Consistency<br/>All nodes see same data]
CAP --> A[Availability<br/>Every request gets response]
CAP --> P[Partition Tolerance<br/>Works despite network failures]
C & A --> CA[CA System<br/>Traditional RDBMS<br/>PostgreSQL, MySQL]
C & P --> CP[CP System<br/>Strong Consistency<br/>HBase, MongoDB]
A & P --> AP[AP System<br/>High Availability<br/>Cassandra, DynamoDB]
AP --> IoTSensor[IoT Sensor Data<br/>Eventual consistency OK]
CP --> IoTControl[IoT Control Commands<br/>Must be consistent]
style CAP fill:#E67E22,stroke:#2C3E50,color:#fff
style C fill:#2C3E50,stroke:#16A085,color:#fff
style A fill:#16A085,stroke:#2C3E50,color:#fff
style P fill:#2C3E50,stroke:#16A085,color:#fff
style CA fill:#7F8C8D,stroke:#2C3E50,color:#fff
style CP fill:#E67E22,stroke:#2C3E50,color:#fff
style AP fill:#27AE60,stroke:#2C3E50,color:#fff
1273 CAP Theorem and Database Categories
Learning Objectives
After completing this chapter, you will be able to:
- Explain the CAP theorem and its implications for distributed IoT systems
- Choose between consistency and availability for different IoT data types
- Select appropriate database categories for specific use cases
- Design distributed database architectures that handle network partitions
Core Concept: In distributed databases you can guarantee only two of three properties: Consistency (all nodes see the same data), Availability (every request gets a response), and Partition tolerance (the system keeps working despite network failures).
Why It Matters: IoT systems span edge, fog, and cloud over unreliable networks, so partition tolerance is mandatory; you must then choose between consistency (CP) for control commands and availability (AP) for sensor data.
Key Takeaway: Use AP (eventual consistency) for sensor telemetry, where a few out-of-order readings are acceptable; use CP (strong consistency) for device commands, firmware updates, and billing records, where conflicts could cause failures or financial errors.
1273.1 The CAP Theorem
The CAP theorem states that a distributed system can guarantee at most two of three properties: Consistency, Availability, and Partition tolerance. The diagram at the start of this chapter maps each pairing (CA, CP, AP) to representative databases and typical IoT workloads.
1273.2 CAP Theorem Timeline Visualization
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1'}}}%%
flowchart LR
subgraph Phase1["Phase 1: Normal Operation"]
N1[Node A<br/>Data: v1] <-->|Sync OK| N2[Node B<br/>Data: v1]
N1 --> C1[Consistent]
N1 --> A1[Available]
end
subgraph Phase2["Phase 2: Network Partition"]
P1[Node A<br/>Write v2?]
P2[Node B<br/>Cannot sync]
P1 -.->|No connection| P2
P1 --> Choice{System Choice}
Choice -->|CP Mode| CP[Reject write<br/>Stay consistent<br/>Unavailable]
Choice -->|AP Mode| AP[Accept write<br/>v2 on A only<br/>Inconsistent]
end
subgraph Phase3["Phase 3: Recovery"]
R1[Partition heals]
R1 -->|CP Mode| RCP[No conflicts<br/>Data: v1 everywhere]
R1 -->|AP Mode| RAP[Conflict resolution<br/>Merge v1 and v2]
end
Phase1 --> Phase2 --> Phase3
style N1 fill:#27AE60,stroke:#2C3E50,color:#fff
style N2 fill:#27AE60,stroke:#2C3E50,color:#fff
style P1 fill:#E67E22,stroke:#2C3E50,color:#fff
style P2 fill:#E67E22,stroke:#2C3E50,color:#fff
style CP fill:#2C3E50,stroke:#16A085,color:#fff
style AP fill:#16A085,stroke:#2C3E50,color:#fff
style Choice fill:#7F8C8D,stroke:#2C3E50,color:#fff
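The behavior across these three phases can be sketched in a few lines of code. The toy class below is purely illustrative (the class and method names are not from any real database): in CP mode it rejects writes while partitioned, and in AP mode it accepts them locally and reconciles on recovery.

```python
# Toy sketch of the partition timeline above. Assumptions: two nodes "A" and
# "B", a single value per node, and a trivial last-writer-wins merge on heal.
class TwoNodeStore:
    def __init__(self, mode: str):
        self.mode = mode                     # "CP" or "AP"
        self.data = {"A": "v1", "B": "v1"}   # Phase 1: both nodes hold v1
        self.partitioned = False

    def write(self, node: str, value: str) -> bool:
        other = "B" if node == "A" else "A"
        if self.partitioned:                 # Phase 2: nodes cannot sync
            if self.mode == "CP":
                return False                 # reject write: consistent but unavailable
            self.data[node] = value          # accept write: available but inconsistent
            return True
        self.data[node] = value              # normal operation: synchronous replication
        self.data[other] = value
        return True

    def heal(self) -> None:                  # Phase 3: partition heals
        self.partitioned = False
        if self.data["A"] != self.data["B"]:
            # AP mode needs conflict resolution; here node A's value wins.
            self.data["B"] = self.data["A"]


store = TwoNodeStore("AP")
store.partitioned = True
print(store.write("A", "v2"))   # True: v2 accepted on node A only
store.heal()
print(store.data)               # {'A': 'v2', 'B': 'v2'} after the merge
```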
1273.3 CAP Theorem for IoT Data Types
Different IoT data types have different consistency requirements:
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1'}}}%%
flowchart TB
subgraph Data["IoT Data Types"]
Sensor[Sensor Telemetry<br/>Temperature, Vibration<br/>1000s readings/sec]
Command[Device Commands<br/>Actuator Control<br/>Config Updates]
Meta[Device Metadata<br/>Inventory, Firmware<br/>Infrequent updates]
Billing[Billing Records<br/>Usage Metering<br/>Financial data]
end
subgraph Priority["CAP Priority"]
AP[AP Priority<br/>Availability First<br/>Eventual Consistency]
CP[CP Priority<br/>Consistency First<br/>May Reject Writes]
CA[CA Priority<br/>Single Datacenter<br/>Full ACID]
end
Sensor -->|"OK to lose a<br/>few readings"| AP
Command -->|"Must not have<br/>conflicting commands"| CP
Meta -->|"Partitions rare<br/>for metadata"| CA
Billing -->|"Financial accuracy<br/>required"| CP
AP --> APEx[Cassandra, DynamoDB<br/>InfluxDB]
CP --> CPEx[MongoDB, HBase<br/>PostgreSQL Sync]
CA --> CAEx[PostgreSQL<br/>MySQL Cluster]
style Sensor fill:#16A085,stroke:#2C3E50,color:#fff
style Command fill:#E67E22,stroke:#2C3E50,color:#fff
style Meta fill:#2C3E50,stroke:#16A085,color:#fff
style Billing fill:#E74C3C,stroke:#2C3E50,color:#fff
style AP fill:#27AE60,stroke:#2C3E50,color:#fff
style CP fill:#E67E22,stroke:#2C3E50,color:#fff
style CA fill:#2C3E50,stroke:#16A085,color:#fff
1273.3.1 IoT Implications
- Sensor data: typically AP (availability + partition tolerance); eventual consistency is acceptable
- Control commands: typically CP (consistency + partition tolerance); strong consistency is required
- Edge deployments: must tolerate partitions, because network failures between edge and cloud are routine
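These AP/CP choices are often tuned per request with replica quorums. The sketch below shows the standard rule: with N replicas, W write acknowledgements, and R read acknowledgements, a read is guaranteed to see the latest write only when R + W > N. The telemetry and command profiles are illustrative and not tied to any specific database.

```python
# Minimal sketch of Dynamo-style quorum reasoning.
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """True when every read quorum overlaps every write quorum (R + W > N)."""
    return r + w > n

# Telemetry profile: acknowledge on one replica, read from one replica.
# Maximizes availability; reads may briefly be stale (AP-leaning).
print(is_strongly_consistent(n=3, w=1, r=1))   # False -> eventual consistency

# Command profile: majority quorums on both writes and reads (CP-leaning).
# A partitioned minority cannot accept writes, so conflicting commands cannot occur.
print(is_strongly_consistent(n=3, w=2, r=2))   # True -> strong consistency
```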
1273.4 Tradeoff: Synchronous vs Asynchronous Write-Ahead Log
Option A: Synchronous WAL (fsync on every write)
- Write latency: 5-20ms per write (disk sync overhead)
- Durability guarantee: zero data loss on crash (every write persisted)
- Write throughput: 5,000-20,000 writes/sec (limited by disk IOPS)
- Hardware requirements: high-performance SSD (100K+ IOPS) recommended
- Use cases: financial transactions, billing records, device commands, audit logs

Option B: Asynchronous WAL (batched fsync every 100ms-1s)
- Write latency: 0.5-2ms per write (memory buffer only)
- Durability guarantee: up to 1 second of data loss on crash
- Write throughput: 100,000-500,000 writes/sec (memory-bound)
- Hardware requirements: standard SSD sufficient
- Use cases: high-velocity sensor telemetry, metrics, and logs where occasional loss is acceptable

Decision Factors:
- Choose synchronous WAL when: every record is critical (billing, safety alerts), regulatory compliance requires provable data integrity, the recovery point objective (RPO) is zero, and write volume stays under 20K/sec
- Choose asynchronous WAL when: write volume exceeds 50K/sec, losing 1 second of sensor data is acceptable, latency-sensitive applications cannot wait for disk sync, or cost optimization requires commodity hardware
- Hybrid approach: synchronous for commands and alerts (low volume, high value); asynchronous for telemetry (high volume, replaceable). PostgreSQL supports per-transaction synchronous_commit settings (see the sketch below)
- Real-world numbers: 10K sensors at 1Hz with a synchronous WAL require roughly 10K IOPS; the same load with an asynchronous WAL needs only about 100 IOPS (a 100x reduction)
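Because PostgreSQL allows synchronous_commit to be set per transaction, the hybrid approach is straightforward to implement. Below is a minimal sketch using psycopg2; the connection string and the telemetry and commands tables are placeholders assumed for illustration, not definitions from this chapter.

```python
# Sketch: relax WAL durability only for high-volume telemetry transactions.
import psycopg2

conn = psycopg2.connect("dbname=iot user=iot_app")  # placeholder DSN

def insert_telemetry(device_id: str, ts: float, value: float) -> None:
    """High-volume sensor write: asynchronous commit for this transaction only."""
    with conn.cursor() as cur:
        # SET LOCAL applies only until the current transaction ends.
        cur.execute("SET LOCAL synchronous_commit TO OFF")
        cur.execute(
            "INSERT INTO telemetry (device_id, ts, value) VALUES (%s, %s, %s)",
            (device_id, ts, value),
        )
    conn.commit()          # returns before the WAL record reaches disk

def insert_command(device_id: str, ts: float, action: str) -> None:
    """Low-volume, high-value write: keep the default synchronous commit."""
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO commands (device_id, ts, action) VALUES (%s, %s, %s)",
            (device_id, ts, action),
        )
    conn.commit()          # blocks until the WAL record is fsynced
```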
1273.5 Database Categories Comparison
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1'}}}%%
flowchart TB
subgraph Header["Database Comparison for IoT"]
H1[Write Speed]
H2[Query Power]
H3[Scalability]
end
subgraph Relational["Relational (SQL)"]
R1[Medium Write Speed]
R2[High Query Power<br/>Complex Joins]
R3[Vertical Scaling]
R4[Device metadata<br/>User accounts]
end
subgraph Document["Document (NoSQL)"]
D1[High Write Speed]
D2[Medium Query Power<br/>JSON queries]
D3[Horizontal Scaling]
D4[Configurations<br/>Event logs]
end
subgraph KeyValue["Key-Value"]
K1[Very High Write Speed]
K2[Low Query Power<br/>Get/Set only]
K3[Horizontal Scaling]
K4[Caching<br/>Latest state]
end
subgraph TimeSeries["Time-Series"]
T1[Very High Write Speed]
T2[Medium Query Power<br/>Time queries]
T3[Horizontal Scaling]
T4[Sensor readings<br/>Metrics]
end
subgraph Graph["Graph"]
G1[Medium Write Speed]
G2[High Query Power<br/>Traversals]
G3[Horizontal Scaling]
G4[Device topology<br/>Dependencies]
end
style Header fill:#2C3E50,stroke:#16A085,color:#fff
style Relational fill:#ecf0f1,stroke:#2C3E50,color:#2C3E50
style Document fill:#ecf0f1,stroke:#16A085,color:#2C3E50
style KeyValue fill:#ecf0f1,stroke:#E67E22,color:#2C3E50
style TimeSeries fill:#E8F4F8,stroke:#16A085,color:#2C3E50
style Graph fill:#ecf0f1,stroke:#2C3E50,color:#2C3E50
1273.5.1 Comparison Matrix
| Database Type | Best For | Scalability | Query Complexity | Write Speed | Example Use Case |
|---|---|---|---|---|---|
| Relational (SQL) | Structured data, ACID | Vertical | High (SQL) | Medium | Device inventory, user accounts |
| Document (NoSQL) | Semi-structured | Horizontal | Medium (queries) | High | Device configurations, events |
| Key-Value | Simple lookups | Horizontal | Low (get/set) | Very High | Session data, caching |
| Time-Series | Time-stamped data | Horizontal | Medium (time queries) | Very High | Sensor readings, metrics |
| Wide-Column | Large-scale analytics | Horizontal | Medium | Very High | Device logs, telemetry |
| Graph | Relationships | Horizontal | High (traversals) | Medium | Device topology, network analysis |
1273.6 Scaling Trade-offs
1273.6.1 Tradeoff: Horizontal Sharding vs Replication
Option A: Horizontal Sharding
- Distribute data across nodes by device_id hash or time range
- Write throughput: 500K-2M writes/sec (scales linearly with node count)
- Storage per node: 1/N of total data (e.g., 100GB per node on a 10-node, 1TB cluster)
- Query latency: 5-50ms for single-shard queries; 50-500ms for cross-shard aggregations
- Operational cost: $2,000-5,000/month for a 10-node managed cloud cluster

Option B: Read Replicas
- Single primary with multiple read replicas
- Write throughput: 50K-100K writes/sec (limited by the single primary)
- Storage per node: full dataset on each replica (e.g., 1TB on each of 5 nodes)
- Query latency: 2-10ms for local reads; 10-50ms replication lag
- Operational cost: $1,500-3,000/month for a primary plus 4 replicas (cloud managed)

Decision Factors:
- Choose sharding when: write volume exceeds 100K/sec, the single-node storage limit is exceeded (>10TB), or queries are device-specific (90%+ hit a single shard)
- Choose replication when: read volume is 10x+ the write volume, writes stay under 100K/sec, you need geographic read distribution, or strong consistency is required for all reads
- Hybrid approach: shard by device_id and replicate each shard for high availability (most production IoT systems use this)
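Sharding by device_id typically starts with a deterministic hash so that every writer and reader routes the same device to the same shard. The sketch below is illustrative (the shard count and function name are assumptions); production systems usually layer consistent hashing and per-shard replication on top.

```python
# Minimal sketch of hash-based shard routing by device_id.
import hashlib

NUM_SHARDS = 10

def shard_for(device_id: str) -> int:
    """Map a device_id to a shard index deterministically.

    Uses a stable digest rather than Python's built-in hash(), which is
    randomized per process and would route inconsistently across services.
    """
    digest = hashlib.md5(device_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# All readings and queries for sensor-0042 land on the same shard.
assert shard_for("sensor-0042") == shard_for("sensor-0042")
print("sensor-0042 ->", shard_for("sensor-0042"))
```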
1273.6.2 Tradeoff: Hot vs Cold Storage
Option A: Hot Storage (NVMe SSD, in-memory cache)
- Query latency: 1-10ms for any time range
- Storage cost: $0.10-0.25/GB/month (cloud SSD); about $500/TB (on-prem NVMe)
- Throughput: 500K-2M IOPS (optimized for random access)
- Durability: requires replication for redundancy
- Best for: real-time dashboards, alerting, the last 7-30 days of data

Option B: Cold Storage (S3/GCS object storage, HDD)
- Query latency: 100-500ms to first byte; 1-10 seconds for large scans
- Storage cost: $0.01-0.03/GB/month (cloud object storage); about $20/TB (on-prem HDD)
- Throughput: 100-1,000 MB/s sequential (poor random access)
- Durability: 99.999999999% (11 nines) built-in redundancy
- Best for: compliance archives, ML training data, data older than 30 days

Decision Factors:
- Choose hot storage when: the query latency SLA is under 50ms, any data point may be accessed randomly and frequently, alerting requires instant historical context, or dashboards refresh every few seconds
- Choose cold storage when: data is accessed less than once per month, batch analytics can tolerate multi-second latency, storage budget is the primary constraint, or regulatory retention requires 7+ years
- Tiered approach: hot (last 7 days on SSD), warm (7-90 days on cheaper SSD), cold (90+ days on object storage); this implements automatic data lifecycle management
- Cost example: 10TB of sensor data costs about $1,000/month on SSD versus $100/month on S3, a 10x saving for the 90% of data that's rarely accessed
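A tiered layout like the one above reduces to a simple age-based routing rule plus a cost lookup. The sketch below treats the tier boundaries and per-GB prices quoted in this section as illustrative constants, not vendor pricing.

```python
# Sketch: age-based hot/warm/cold tier selection plus the cost comparison above.
from datetime import datetime, timedelta, timezone

# (tier name, maximum data age, $/GB/month) -- figures from this section.
TIERS = [
    ("hot",  timedelta(days=7),        0.25),   # NVMe SSD
    ("warm", timedelta(days=90),       0.10),   # cheaper SSD
    ("cold", timedelta(days=365 * 10), 0.01),   # object storage
]

def tier_for(ts: datetime, now: datetime) -> str:
    """Return the storage tier a reading belongs to based on its age."""
    age = now - ts
    for name, max_age, _ in TIERS:
        if age <= max_age:
            return name
    return "cold"

now = datetime.now(timezone.utc)
print(tier_for(now - timedelta(days=3), now))     # hot
print(tier_for(now - timedelta(days=45), now))    # warm
print(tier_for(now - timedelta(days=200), now))   # cold

# Cost example from the text: 10TB kept entirely hot vs. entirely cold.
print(10_000 * 0.10)   # $1,000/month on SSD
print(10_000 * 0.01)   # $100/month on object storage
```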
1273.6.3 Tradeoff: Normalized vs Denormalized Schema
Option A: Normalized Schema (3NF, foreign keys)
- Storage efficiency: high; no duplicate data (device metadata stored once)
- Write performance: fast; metadata is updated in a single location
- Query performance: slower; requires JOINs across 3-5 tables
- Schema changes: easy; add columns to dimension tables
- Query latency: 10-100ms for typical dashboard queries (JOIN overhead)
- Best for: device registry, user accounts, billing records

Option B: Denormalized Schema (flat tables, embedded data)
- Storage efficiency: lower; device name and location are repeated in every reading
- Write performance: slower; every insert must include all metadata
- Query performance: faster; a single table scan, no JOINs required
- Schema changes: hard; new attributes require backfilling millions of rows
- Query latency: 2-20ms for typical dashboard queries (no JOIN overhead)
- Best for: high-velocity sensor telemetry, time-series analytics

Decision Factors:
- Choose normalized when: device metadata changes frequently (firmware versions, calibration), storage cost is critical, data integrity is paramount, or analytics require flexible ad-hoc joins
- Choose denormalized when: query latency is critical (<10ms SLA), metadata is stable (locations rarely change), write volume exceeds 100K/sec, or the primary access pattern is time-range scans
- Hybrid approach: normalize the device registry (PostgreSQL) and denormalize telemetry with embedded tags (InfluxDB/TimescaleDB); join at query time when needed and pre-join for dashboards
- Storage impact: roughly 100 bytes/reading with embedded metadata versus 50 bytes without, i.e., 2x storage, but it avoids 5-10ms of JOIN latency on every query
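The storage-impact point is easiest to see with the two row shapes side by side. The field names below are illustrative assumptions: the denormalized form embeds stable device metadata in every reading, roughly doubling row size but removing the JOIN from every query.

```python
# Sketch: normalized vs. denormalized reading records (illustrative fields).
from dataclasses import dataclass

@dataclass
class NormalizedReading:
    """References the device registry by key; metadata is joined at query time."""
    device_id: int
    ts: float
    value: float

@dataclass
class DenormalizedReading:
    """Embeds stable metadata with every reading; no JOIN needed to filter by site."""
    device_id: int
    ts: float
    value: float
    device_name: str
    site: str
    sensor_type: str

norm = NormalizedReading(42, 1718000000.0, 21.7)
denorm = DenormalizedReading(42, 1718000000.0, 21.7,
                             "pump-42", "plant-berlin", "temperature")
print(norm)
print(denorm)
```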
1273.7 Summary
- CAP theorem constrains distributed systems to two of three guarantees: Consistency, Availability, Partition tolerance
- IoT sensor data often accepts eventual consistency (AP) for higher availability
- Control commands require strong consistency (CP) to prevent conflicting actions
- Synchronous WAL guarantees durability at cost of write latency; async WAL trades durability for throughput
- Sharding enables horizontal write scaling; replication enables read scaling and high availability
- Storage tiering (hot/warm/cold) reduces costs by 90%+ while meeting latency SLAs
- Schema normalization trade-offs depend on query patterns, update frequency, and latency requirements
1273.8 What's Next
Continue with Time-Series Databases to learn about optimizing storage for sensor telemetry with TimescaleDB and InfluxDB.
Related Chapters:
- Database Selection Framework: choosing the right database
- Sharding Strategies: a deep dive into horizontal scaling
- Worked Examples: complete architecture case studies