%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#7F8C8D', 'secondaryColor': '#16A085', 'tertiaryColor': '#E67E22', 'fontSize': '14px'}}}%%
flowchart TB
subgraph L1["Level 1: Hash by device_id"]
direction LR
HASH["Hash Function"]
end
subgraph S1["Shard 1<br/>devices 0-999"]
direction TB
S1J["Jan 2024 chunk"]
S1F["Feb 2024 chunk"]
S1M["Mar 2024 chunk"]
end
subgraph S2["Shard 2<br/>devices 1000-1999"]
direction TB
S2J["Jan 2024 chunk"]
S2F["Feb 2024 chunk"]
S2M["Mar 2024 chunk"]
end
subgraph S3["Shard N<br/>devices N-range"]
direction TB
S3J["Jan 2024 chunk"]
S3F["Feb 2024 chunk"]
S3M["Mar 2024 chunk"]
end
HASH --> S1
HASH --> S2
HASH --> S3
style L1 fill:#2C3E50,stroke:#16A085,color:#fff
style S1 fill:#16A085,stroke:#2C3E50,color:#fff
style S2 fill:#16A085,stroke:#2C3E50,color:#fff
style S3 fill:#16A085,stroke:#2C3E50,color:#fff
style HASH fill:#E67E22,stroke:#2C3E50,color:#fff
1277 Sharding Strategies
Learning Objectives
After completing this chapter, you will be able to:
- Choose between time-based, device-based, and hybrid sharding strategies
- Implement distributed hypertables in TimescaleDB
- Design Cassandra partition keys for IoT workloads
- Handle hot spots and shard rebalancing
- Make informed decisions about sharding vs. replication
1277.1 Introduction
Sharding (horizontal partitioning) distributes data across multiple database nodes to handle scale beyond a single server’s capacity. For IoT time-series data, choosing the right sharding key and strategy is critical for both write performance and query efficiency.
1277.2 The Sharding Key Decision
The sharding key determines how data is distributed across nodes. For IoT, three primary strategies exist:
1277.2.1 1. Time-Based Sharding (Range Partitioning)
Shard 1: January 2024 data
Shard 2: February 2024 data
Shard 3: March 2024 data
...
Pros: Simple retention (drop entire shard), time-range queries hit single shard
Cons: Hot shard problem - all current writes go to the newest shard, creating write bottleneck
When to use: Read-heavy workloads with time-bounded queries; archival systems
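To ground the layout, here is a minimal sketch using plain PostgreSQL declarative range partitioning; table and partition names are illustrative, and in a sharded deployment each monthly range would live on its own node rather than inside one instance:
-- Minimal sketch: range partitioning by month (names are illustrative)
CREATE TABLE readings_by_time (
    time      TIMESTAMPTZ NOT NULL,
    device_id VARCHAR(50) NOT NULL,
    value     DOUBLE PRECISION
) PARTITION BY RANGE (time);

CREATE TABLE readings_2024_01 PARTITION OF readings_by_time
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE readings_2024_02 PARTITION OF readings_by_time
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Retention is cheap: dropping an expired month removes a whole partition
DROP TABLE readings_2024_01;
Note that all current writes land in the newest range, which is exactly the hot-shard behavior described above.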
1277.2.2 2. Device-Based Sharding (Hash Partitioning)
Shard 1: device_ids hashing to 0-999
Shard 2: device_ids hashing to 1000-1999
Shard 3: device_ids hashing to 2000-2999
...
Pros: Even write distribution, device-specific queries hit single shard
Cons: Cross-device aggregations require scatter-gather across all shards; retention requires per-shard cleanup
When to use: Write-heavy workloads; queries primarily filter by device
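A matching sketch with hash partitioning, again in plain PostgreSQL with illustrative names; each hash bucket stands in for a shard or node:
-- Minimal sketch: hash partitioning by device_id (names are illustrative)
CREATE TABLE readings_by_device (
    time      TIMESTAMPTZ NOT NULL,
    device_id VARCHAR(50) NOT NULL,
    value     DOUBLE PRECISION
) PARTITION BY HASH (device_id);

-- Four buckets; a sharded deployment maps each bucket to a separate node
CREATE TABLE readings_d0 PARTITION OF readings_by_device
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE readings_d1 PARTITION OF readings_by_device
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE readings_d2 PARTITION OF readings_by_device
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE readings_d3 PARTITION OF readings_by_device
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);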
1277.2.3 3. Hybrid Sharding (Two-Level)
Level 1 hashes device_id to choose the shard; level 2 partitions each shard into time chunks (the layout shown in the diagram at the top of this chapter).
Pros: Balanced writes (hash) + efficient retention (time chunks) + device queries stay local
Cons: Implementation complexity; cross-device queries still scatter
When to use: Large-scale IoT with mixed query patterns (the common case)
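A hedged sketch of the same two-level layout using plain PostgreSQL sub-partitioning; names are illustrative and only one hash bucket is shown:
-- Minimal sketch: hash by device_id, then range by time inside each bucket
CREATE TABLE readings_hybrid (
    time      TIMESTAMPTZ NOT NULL,
    device_id VARCHAR(50) NOT NULL,
    value     DOUBLE PRECISION
) PARTITION BY HASH (device_id);

-- Level 1: one hash bucket (repeat for REMAINDER 1..3), itself partitioned by time
CREATE TABLE readings_h0 PARTITION OF readings_hybrid
    FOR VALUES WITH (MODULUS 4, REMAINDER 0)
    PARTITION BY RANGE (time);

-- Level 2: time chunks inside the bucket
CREATE TABLE readings_h0_2024_01 PARTITION OF readings_h0
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
Retention then amounts to dropping expired time sub-partitions inside every bucket, while writes stay spread across the hash buckets.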
1277.3 Implementation: TimescaleDB Distributed Hypertables
TimescaleDB 2.0 introduced distributed hypertables, which implement hybrid sharding automatically (note that multi-node support was deprecated in later TimescaleDB releases, but the mechanism remains a clear illustration of hybrid sharding):
-- Create distributed hypertable across 3 data nodes
CREATE TABLE sensor_data (
time TIMESTAMPTZ NOT NULL,
device_id VARCHAR(50) NOT NULL,
value DOUBLE PRECISION,
quality SMALLINT
);
SELECT create_distributed_hypertable(
'sensor_data',
'time', -- Time dimension (chunking)
'device_id', -- Space dimension (sharding)
number_partitions => 16, -- Hash partitions across the space dimension (typically a multiple of the data-node count)
chunk_time_interval => INTERVAL '1 day'
);
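-- Optional sketch: a retention policy so expired chunks are dropped
-- automatically on each data node (the 90-day window is an assumed value)
SELECT add_retention_policy('sensor_data', INTERVAL '90 days');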
-- Query behavior:
-- Device-specific: routes to single node
SELECT * FROM sensor_data
WHERE device_id = 'sensor-42' AND time > NOW() - INTERVAL '1 hour';
-- Time-range aggregate: parallel across all nodes, merge results
SELECT date_trunc('hour', time), avg(value)
FROM sensor_data
WHERE time > NOW() - INTERVAL '7 days'
GROUP BY 1;
1277.4 Implementation: Cassandra Partition Design
Apache Cassandra uses partition keys for distribution and clustering columns for ordering within partitions:
-- Optimal IoT schema: partition by device + time bucket
CREATE TABLE sensor_readings (
device_id text,
time_bucket timestamp, -- Truncated to hour/day
reading_time timestamp,
value double,
PRIMARY KEY ((device_id, time_bucket), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);
-- Why time_bucket in partition key?
-- Without: device_id alone creates unbounded partition growth
-- With: bounded partitions (~100MB each), predictable performance
-- Query patterns enabled:
-- 1. Single device, recent data (hits 1 partition):
SELECT * FROM sensor_readings
WHERE device_id = 'sensor-42'
AND time_bucket = '2024-01-15 00:00:00';
-- 2. Single device, date range (hits multiple partitions):
SELECT * FROM sensor_readings
WHERE device_id = 'sensor-42'
AND time_bucket IN ('2024-01-14 00:00:00', '2024-01-15 00:00:00');
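Cassandra does not derive time_bucket for you; the writer must truncate the timestamp before inserting. A minimal sketch of such a write, assuming daily buckets and illustrative values:
-- Application-computed daily bucket (values are illustrative)
INSERT INTO sensor_readings (device_id, time_bucket, reading_time, value)
VALUES ('sensor-42', '2024-01-15 00:00:00', '2024-01-15 14:23:07', 21.4);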
1277.6 Performance Characteristics by Strategy
| Metric | Time-Based | Device-Based | Hybrid |
|---|---|---|---|
| Write throughput | Limited by hot shard | Linear with nodes | Linear with nodes |
| Device query latency | Scatter-gather | Single node | Single node |
| Time-range query | Single node | Scatter-gather | Scatter-gather |
| Retention cleanup | Drop shard (fast) | Per-shard scan (slow) | Drop chunks (fast) |
| Implementation | Simple | Simple | Complex |
1277.7 Edge Cases and Gotchas
1. Shard Key Cardinality: Too few unique device_ids (fewer than ~100) lead to uneven distribution. Solution: a compound key with sub-partitions (see the sketch after this list).
2. Query Routing Overhead: Every cross-shard query adds coordinator overhead. Design queries to minimize scatter when possible.
3. Transaction Boundaries: Distributed transactions across shards are expensive or unsupported. Design schemas so single-device operations stay within one shard.
4. Schema Changes: Adding columns to distributed tables requires coordination. Plan schema evolution carefully.
5. Backup Complexity: Point-in-time recovery across shards requires consistent snapshots. Use database-native distributed backup tools.
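For edge case 1 (and the related hot-device problem), a common fix is to fold a small, writer-computed bucket into the partition key so one device's traffic spreads over several partitions. A hedged sketch, reusing the illustrative Cassandra schema from above with a hypothetical sub_partition column:
-- Illustrative variant: sub_partition is computed by the writer,
-- e.g., hash(reading_time) % 4
CREATE TABLE sensor_readings_split (
    device_id text,
    time_bucket timestamp,
    sub_partition int,
    reading_time timestamp,
    value double,
    PRIMARY KEY ((device_id, time_bucket, sub_partition), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);

-- Reads for one device and bucket now fan out over the sub-partitions
SELECT * FROM sensor_readings_split
WHERE device_id = 'sensor-42'
  AND time_bucket = '2024-01-15 00:00:00'
  AND sub_partition IN (0, 1, 2, 3);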
1277.8 Decision Framework
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#7F8C8D', 'secondaryColor': '#16A085', 'tertiaryColor': '#E67E22', 'fontSize': '14px'}}}%%
flowchart TD
START["Start Here:<br/>What is your PRIMARY query pattern?"]
Q1["80%+ queries filter<br/>by device_id first"]
Q2["80%+ queries filter<br/>by time range first"]
Q3["Mixed patterns<br/>(common in IoT)"]
S1["Device-based sharding<br/>(hash by device_id)"]
S2["Time-based sharding<br/>(range by timestamp)"]
S3["Hybrid sharding<br/>(hash by device, partition by time)"]
R1["Add time-based partitioning<br/>within shards for retention"]
R2["Accept write hotspot OR<br/>use write buffering"]
R3["Use TimescaleDB distributed hypertables<br/>or Cassandra compound partition keys"]
START --> Q1
START --> Q2
START --> Q3
Q1 --> S1
Q2 --> S2
Q3 --> S3
S1 --> R1
S2 --> R2
S3 --> R3
style START fill:#2C3E50,stroke:#16A085,color:#fff
style Q1 fill:#16A085,stroke:#2C3E50,color:#fff
style Q2 fill:#16A085,stroke:#2C3E50,color:#fff
style Q3 fill:#16A085,stroke:#2C3E50,color:#fff
style S1 fill:#E67E22,stroke:#2C3E50,color:#fff
style S2 fill:#E67E22,stroke:#2C3E50,color:#fff
style S3 fill:#E67E22,stroke:#2C3E50,color:#fff
style R1 fill:#7F8C8D,stroke:#2C3E50,color:#fff
style R2 fill:#7F8C8D,stroke:#2C3E50,color:#fff
style R3 fill:#7F8C8D,stroke:#2C3E50,color:#fff
Rule of Thumb: For most IoT systems with >100K devices and >10K writes/sec, hybrid sharding with device-based distribution and time-based chunking provides the best balance of write performance, query flexibility, and operational simplicity.
1277.9 Knowledge Check
Scenario: Your IoT platform has 1 million devices generating 10 billion records/month. Query patterns: 80% filter by device_id, 20% aggregate across all devices by time range. Single PostgreSQL server maxes out at 5,000 writes/sec.
Think about:
1. How should you shard the data: by device_id, by timestamp, or hybrid?
2. What happens to cross-device aggregate queries with device_id sharding?
3. How many shards do you need for 10 billion records/month at 100 bytes/record?
Key Insight: Use hybrid sharding: shard by device_id for write distribution (1M devices across 10 shards = 100K devices/shard), then partition each shard by time for efficient retention. Trade-off: device-specific queries hit one shard (fast), but cross-device aggregates must query all 10 shards and merge results (slower, yet acceptable for 20% of queries). Storage: 10B records x 100 bytes = 1 TB/month; with 3-month retention, you need 3 TB distributed across 10 shards = 300 GB/shard.
Question: What is the key difference between horizontal partitioning (sharding) and vertical partitioning?
Horizontal partitioning (sharding) divides rows across servers, e.g., users 1-1M on server1 and 1M-2M on server2. It scales throughput and storage by distributing data.
Vertical partitioning splits columns, e.g., user profile (name, email) on server1 and user activity (clicks, views) on server2. It separates hot data from cold data.
IoT example: Shard sensor readings by device_id range (horizontal) to distribute 10TB across 10 servers (1TB each). Vertically partition metadata (device name, location) from metrics (temperature, pressure) since metadata is rarely updated.
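A minimal sketch of that vertical split, with illustrative table and column names:
-- Rarely-updated metadata, kept apart from the high-volume readings
CREATE TABLE device_metadata (
    device_id VARCHAR(50) PRIMARY KEY,
    name      TEXT,
    location  TEXT
);

-- High-volume metrics; this is the part that gets sharded horizontally
CREATE TABLE device_metrics (
    time        TIMESTAMPTZ NOT NULL,
    device_id   VARCHAR(50) NOT NULL,
    temperature DOUBLE PRECISION,
    pressure    DOUBLE PRECISION
);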
1277.10 Summary
- Time-based sharding is simple but creates hot spots for current writes
- Device-based sharding distributes writes evenly but complicates time-range aggregations
- Hybrid sharding (device hash + time partition) provides the best balance for most IoT workloads
- TimescaleDB distributed hypertables and Cassandra compound partition keys implement hybrid sharding natively
- Hot device problem requires compound sharding keys to spread high-frequency devices across shards
- Rebalancing strategies trade off downtime, data movement, and implementation complexity
1277.11 What’s Next
Continue with Worked Examples to see complete database architecture designs for fleet management and smart city data lakes.
Related Chapters:
- CAP Theorem: distributed systems trade-offs
- Time-Series Databases: TimescaleDB optimization
- Database Selection Framework: choosing the right database