Sharding distributes IoT data across multiple database nodes when a single server cannot handle the scale. The three main strategies are time-based sharding (simple but creates hot spots), device-based sharding (even writes but complicates aggregations), and hybrid sharding (best balance for most IoT workloads). For systems with over 100K devices, hybrid sharding with device-based distribution and time-based chunking is the recommended approach.
Learning Objectives
After completing this chapter, you will be able to:
Select between time-based, device-based, and hybrid sharding strategies based on query patterns
Implement distributed hypertables in TimescaleDB
Design Cassandra partition keys for IoT workloads
Identify hot spot causes and apply compound sharding keys to resolve them
Evaluate rebalancing strategies when adding or removing nodes
For Beginners: Database Sharding
Sharding is splitting a massive database across multiple servers when one machine cannot handle all the data. Think of a library that has grown so large it needs multiple buildings – you might put fiction in one building and non-fiction in another. For IoT, you might split data by time period or by device group, so searches stay fast even as data grows enormous.
6.1 Introduction
When an IoT deployment grows beyond what a single database server can handle – whether due to write throughput, storage capacity, or query latency – the solution is sharding: distributing data across multiple nodes. Unlike simple replication (which copies all data to every node for read scalability), sharding splits the dataset so each node holds a distinct portion. The key design decision is how to split: the sharding key determines which data goes where, and a poor choice can create bottlenecks worse than no sharding at all.
6.2 The Sharding Key Decision
The sharding key determines how data is distributed across nodes. For IoT, three primary strategies exist:
6.2.1 1. Time-Based Sharding (Range Partitioning)
Shard 1: January 2024 data
Shard 2: February 2024 data
Shard 3: March 2024 data
...
Pros: Simple retention (drop entire shard), time-range queries hit single shard
Cons: Hot shard problem - all current writes go to the newest shard, creating write bottleneck
When to use: Read-heavy workloads with time-bounded queries; archival systems
6.2.2 2. Device-Based Sharding (Hash Partitioning)
hash(device_id) mod N → shard assignment
Shard 0: devices where hash(device_id) mod 3 = 0
Shard 1: devices where hash(device_id) mod 3 = 1
Shard 2: devices where hash(device_id) mod 3 = 2
Pros: Even write distribution, device-specific queries hit single shard
Cons: Cross-device aggregations require scatter-gather across all shards; retention requires per-shard cleanup
When to use: Write-heavy workloads; queries primarily filter by device
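The hash-based routing rule above can be sketched in a few lines. This is an illustrative helper, not any database's actual routing code; `md5` stands in for whatever stable hash the system uses, since Python's built-in `hash()` is salted per process and unsuitable for shard routing:

```python
import hashlib

def shard_for_device(device_id: str, num_shards: int = 3) -> int:
    """Deterministically map a device_id to a shard via hash(device_id) mod N."""
    # Use a stable digest: Python's built-in hash() changes between processes.
    digest = hashlib.md5(device_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# All writes for one device land on one shard, so device-specific queries
# touch a single node; cross-device queries must scatter across all shards.
assignments = {d: shard_for_device(d) for d in ("sensor-01", "sensor-02", "sensor-42")}
```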
6.2.3 3. Hybrid Sharding (Two-Level)
Figure 6.1: Hybrid Sharding: Hash by device_id for write distribution, then partition by time within each shard for efficient retention.
With the strategy decision made, the next question is how to implement it. Two popular databases for IoT time-series workloads – TimescaleDB and Cassandra – each provide built-in mechanisms for hybrid sharding.
6.3 Implementation: TimescaleDB Distributed Hypertables
TimescaleDB 2.0+ supports distributed hypertables that implement hybrid sharding automatically. (Note: Timescale deprecated self-hosted multi-node in favor of their cloud-native architecture in later versions, but the concepts and API patterns remain relevant for understanding hybrid sharding.)
-- Create distributed hypertable across 3 data nodes
CREATE TABLE sensor_data (
    time      TIMESTAMPTZ NOT NULL,
    device_id VARCHAR(50) NOT NULL,
    value     DOUBLE PRECISION,
    quality   SMALLINT
);

SELECT create_distributed_hypertable(
    'sensor_data',
    'time',                       -- Time dimension (chunking)
    'device_id',                  -- Space dimension (sharding)
    number_partitions => 16,      -- Hash partitions per node
    chunk_time_interval => INTERVAL '1 day'
);

-- Query behavior:
-- Device-specific: routes to single node
SELECT * FROM sensor_data
WHERE device_id = 'sensor-42'
  AND time > NOW() - INTERVAL '1 hour';

-- Time-range aggregate: parallel across all nodes, merge results
SELECT date_trunc('hour', time), avg(value)
FROM sensor_data
WHERE time > NOW() - INTERVAL '7 days'
GROUP BY 1;
6.4 Implementation: Cassandra Partition Design
Apache Cassandra uses partition keys for distribution and clustering columns for ordering within partitions:
-- Optimal IoT schema: partition by device + time bucket
CREATE TABLE sensor_readings (
device_id text,
time_bucket timestamp, -- Truncated to hour/day
reading_time timestamp,
value double,
PRIMARY KEY ((device_id, time_bucket), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);
-- Why time_bucket in partition key?
-- Without: device_id alone creates unbounded partition growth
-- With: bounded partitions (typically <100MB each), predictable performance
-- Example: 1 reading/sec x 200 bytes x 86,400 sec/day = ~17MB per device-day
-- Query patterns enabled:
-- 1. Single device, recent data (hits 1 partition):
SELECT * FROM sensor_readings
WHERE device_id = 'sensor-42'
AND time_bucket = '2024-01-15 00:00:00';
-- 2. Single device, date range (hits multiple partitions):
SELECT * FROM sensor_readings
WHERE device_id = 'sensor-42'
AND time_bucket IN ('2024-01-14 00:00:00', '2024-01-15 00:00:00');
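To issue the multi-partition query above, a client must first compute the `time_bucket` values that cover the requested range. A minimal sketch of that client-side helper (assuming day-granularity buckets, as in the schema's comment; the function name is hypothetical):

```python
from datetime import datetime, timedelta

def day_buckets(start: datetime, end: datetime) -> list[datetime]:
    """List the day-granularity time_bucket values covering [start, end]."""
    bucket = start.replace(hour=0, minute=0, second=0, microsecond=0)
    buckets = []
    while bucket <= end:
        buckets.append(bucket)
        bucket += timedelta(days=1)
    return buckets

# The client builds the IN (...) list for the date-range query from these values.
buckets = day_buckets(datetime(2024, 1, 14, 8), datetime(2024, 1, 15, 20))
```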
6.5 Shard Rebalancing and Hot Spots
The Hot Device Problem: Some IoT devices generate 100x more data than others (high-frequency industrial sensors vs. daily weather stations). Hash-based sharding can still create hot spots.
Solution: Compound Sharding Key
import random

# Add randomization to spread hot devices across shards.
# Random assignment (rather than hashing) ensures uniform write
# distribution even when device_id hashes cluster. The trade-off
# is that reads must scatter-gather across all buckets for a device.
def generate_shard_key(device_id: str, num_buckets: int = 4) -> str:
    """
    Spread a single hot device across multiple shards.
    device-001 becomes one of: device-001:0, device-001:1,
    device-001:2, device-001:3
    """
    bucket = random.randint(0, num_buckets - 1)
    return f"{device_id}:{bucket}"

# Queries must now scatter across buckets:
# SELECT * FROM sensors WHERE device_id IN ('device-001:0', 'device-001:1', ...)
Rebalancing Strategies:

| Strategy | Downtime | Data Movement | Complexity |
|---|---|---|---|
| Add shard, rehash all | Hours | 100% | Low |
| Consistent hashing | Minutes | ~1 node's share | Medium |
| Virtual nodes (vnodes) | Seconds | Configurable | High |
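The consistent-hashing row is the key idea behind low-movement rebalancing: nodes and keys are hashed onto the same ring, and each key belongs to the first node clockwise from it. A minimal sketch (no virtual nodes; `md5` is an arbitrary stable hash chosen for illustration):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring: adding a node moves only ~1/N of keys."""

    def __init__(self, nodes):
        self._ring = sorted((self._hash(n), n) for n in nodes)
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        # Stable 64-bit hash position on the ring.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # First node clockwise from the key's position (wrapping around).
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[idx][1]
```

When a node is added, only the keys falling on the new node's arc move; every other key keeps its assignment, unlike rehash-all, which reshuffles nearly everything.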
6.6 Performance Characteristics by Strategy
| Metric | Time-Based | Device-Based | Hybrid |
|---|---|---|---|
| Write throughput | Limited by hot shard | Linear with nodes | Linear with nodes |
| Device query latency | Scatter-gather | Single node | Single node |
| Time-range query | Single node | Scatter-gather | Scatter-gather |
| Retention cleanup | Drop shard (fast) | Per-shard scan (slow) | Drop chunks (fast) |
| Implementation | Simple | Simple | Complex |
6.7 Worked Example: Fleet Tracking Platform Sharding
Scenario: A logistics company tracks 250,000 delivery vehicles. Each vehicle sends GPS coordinates every 5 seconds plus engine diagnostics every 30 seconds.
Data volume calculation:
GPS: 250,000 vehicles x 1 msg/5s = 50,000 writes/sec
Diagnostics: 250,000 vehicles x 1 msg/30s = 8,333 writes/sec
Total: ~58,000 writes/sec
Record size: GPS = 80 bytes, Diagnostics = 200 bytes
Daily volume: (50,000 x 80 + 8,333 x 200) x 86,400
= 346 GB/day GPS + 144 GB/day diagnostics = 490 GB/day
90-day retention = 44 TB total storage
Why a single server fails: PostgreSQL on a 64-core server handles ~15,000 writes/sec for indexed inserts. At 58,000 writes/sec, you need at least 4 nodes just for write throughput – but you want headroom for traffic spikes (Monday morning fleet dispatch doubles GPS frequency).
Try It: IoT Sharding Calculator
Adjust the parameters below to estimate sharding requirements for your own IoT fleet.
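The interactive calculator itself is not reproduced here, but its arithmetic can be sketched for a single message stream, using the chapter's sizing assumptions (15,000 writes/sec per node, a 2x peak factor for traffic spikes). Sum the results across streams for totals like the GPS-plus-diagnostics fleet above:

```python
import math

def sharding_requirements(devices: int, msgs_per_device_sec: float,
                          record_bytes: int, retention_days: int,
                          node_writes_per_sec: int = 15_000,
                          peak_factor: float = 2.0) -> dict:
    """Estimate node count and storage for one message stream."""
    writes_sec = devices * msgs_per_device_sec
    peak_writes = writes_sec * peak_factor
    nodes = math.ceil(peak_writes / node_writes_per_sec)   # size for peak load
    daily_gb = writes_sec * record_bytes * 86_400 / 1e9
    return {
        "writes_sec": writes_sec,
        "nodes": nodes,
        "total_storage_tb": daily_gb * retention_days / 1000,
    }

# GPS stream from the worked example: 250K vehicles, 1 msg/5s, 80-byte records
gps = sharding_requirements(250_000, 1 / 5, 80, retention_days=90)
```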
At peak load (Monday morning dispatch doubles GPS frequency), write throughput rises to 2 x 50,000 + 8,333 = 108,333 writes/sec. With single-node capacity of 15,000 writes/sec, required nodes = \(\lceil 108,333 / 15,000 \rceil = 8\) minimum. The 8-node hybrid design provides exactly the headroom needed for traffic spikes while keeping costs 34% lower than a single massive instance.
Hybrid sharding design:
-- TimescaleDB distributed across 8 data nodes

-- Step 1: Create table
CREATE TABLE vehicle_telemetry (
    time        TIMESTAMPTZ NOT NULL,
    vehicle_id  VARCHAR(20) NOT NULL,
    msg_type    VARCHAR(10) NOT NULL,  -- 'gps' or 'diag'
    latitude    DOUBLE PRECISION,
    longitude   DOUBLE PRECISION,
    speed_kmh   SMALLINT,
    engine_rpm  SMALLINT,
    fuel_pct    SMALLINT,
    diagnostics JSONB
);

-- Step 2: Distribute by vehicle_id, chunk by 6-hour intervals
SELECT create_distributed_hypertable(
    'vehicle_telemetry',
    'time',
    'vehicle_id',
    number_partitions => 32,                  -- 32 hash partitions across 8 nodes
    chunk_time_interval => INTERVAL '6 hours' -- 4 chunks per day per partition
);

-- Step 3: Continuous aggregate for fleet dashboard
CREATE MATERIALIZED VIEW fleet_summary_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS hour,
    count(*) AS total_readings,
    count(DISTINCT vehicle_id) AS active_vehicles,
    avg(speed_kmh) FILTER (WHERE msg_type = 'gps') AS avg_speed,
    -- Note: percentile_cont in continuous aggregates requires TimescaleDB 2.7+
    percentile_cont(0.95) WITHIN GROUP (ORDER BY speed_kmh)
        FILTER (WHERE msg_type = 'gps') AS p95_speed
FROM vehicle_telemetry
GROUP BY hour;
Query performance results:
| Query Type | Shards Hit | Latency | Example |
|---|---|---|---|
| Single vehicle, last hour | 1 of 8 | 12 ms | "Where is truck VH-4521?" |
| Single vehicle, 30-day history | 1 of 8 | 180 ms | "Route history for VH-4521" |
| Fleet-wide, last hour | All 8 | 85 ms | "How many vehicles are moving?" |
| Fleet-wide, 30-day trend | Pre-aggregated | 25 ms | "Average fleet utilization this month" |
Cost comparison (AWS, 90-day retention):
| Approach | Nodes | Monthly Cost | Writes/sec Headroom |
|---|---|---|---|
| Single RDS r6g.16xlarge | 1 | $8,200 | Fails at 15K (need 58K) |
| 4x RDS r6g.4xlarge (device shard) | 4 | $6,600 | 60K (tight) |
| 8x TimescaleDB r6g.2xlarge (hybrid) | 8 | $5,400 | 120K (2x headroom) |
The 8-node hybrid approach costs less than the single massive node because smaller instances have better price/performance, and 6-hour chunks enable efficient compression (typically 10-15x on GPS data).
6.8 Decision Framework
Figure 6.2: Sharding Strategy Decision: Choose based on primary query pattern, then add secondary partitioning for retention.
Rule of Thumb: For most IoT systems with >100K devices and >10K writes/sec, hybrid sharding with device-based distribution and time-based chunking provides the best balance of write performance, query flexibility, and operational simplicity.
6.9 Edge Cases and Gotchas
1. Shard Key Cardinality: Too few unique device_ids (< 100) leads to uneven distribution. Solution: compound key with sub-partitions.
2. Query Routing Overhead: Every cross-shard query adds coordinator overhead. Design queries to minimize scatter when possible.
3. Transaction Boundaries: Distributed transactions across shards are expensive or unsupported. Design schemas so single-device operations stay within one shard.
4. Schema Changes: Adding columns to distributed tables requires coordination. Plan schema evolution carefully.
5. Backup Complexity: Point-in-time recovery across shards requires consistent snapshots. Use database-native distributed backup tools.
6.10 Knowledge Check
Understanding Check: Sharding Strategy for IoT Scale
Scenario: Your IoT platform has 1 million devices generating 10 billion records/month. Query patterns: 80% filter by device_id, 20% aggregate across all devices by time range. Single PostgreSQL server maxes out at 5,000 writes/sec.
Think about:
What is the required write throughput? (10B records/month = ? writes/sec)
How should you shard the data: by device_id, by timestamp, or hybrid?
What happens to cross-device aggregate queries with device_id sharding?
How much storage do you need per shard?
Key Insight: First, calculate throughput: 10B records / (30 days x 86,400 sec/day) = ~3,858 writes/sec. A single PostgreSQL at 5,000 writes/sec could barely handle the average – but not peak loads (typically 2-3x baseline). So you need multiple nodes.
Hybrid sharding: Shard by device_id for write distribution (1M devices across 10 shards = 100K devices/shard), then partition each shard by time for efficient retention. Trade-off: Device-specific queries hit 1 shard (fast), but cross-device aggregates must query all 10 shards and merge results (slower but acceptable for 20% of queries). Storage: 10B records x 100 bytes = 1 TB/month. With 3-month retention, you need 3 TB distributed across 10 shards = 300 GB/shard.
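The throughput and storage figures in that answer can be checked with a few lines of arithmetic (the 3x peak factor is one point in the answer's "2-3x baseline" range):

```python
import math

records_per_month = 10_000_000_000
avg_writes_sec = records_per_month / (30 * 86_400)  # ~3,858 writes/sec average
peak_writes_sec = avg_writes_sec * 3                # assume 3x peak over baseline

# Throughput alone needs only a few 5K-writes/sec nodes; the answer
# chooses 10 shards to also spread storage and leave growth headroom.
min_nodes = math.ceil(peak_writes_sec / 5_000)

storage_tb_per_month = records_per_month * 100 / 1e12  # 100-byte records -> 1 TB
per_shard_gb = storage_tb_per_month * 3 * 1000 / 10    # 3-month retention, 10 shards
```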
For Kids: Meet the Sensor Squad!
Sharding is like splitting a giant library into smaller branch libraries so everyone can find books faster!
6.10.1 The Sensor Squad Adventure: The Overflowing Data Library
The Sensor Squad’s data library was in trouble. They had SO many sensor readings that the single librarian (the database server) couldn’t keep up!
“I’m receiving 10,000 readings every second!” cried the librarian. “I can’t file them all fast enough!”
Max the Microcontroller had an idea: “What if we split the library into SMALLER libraries?”
Plan A: Split by Time (The Calendar Method) “January readings go to Library 1, February to Library 2!”
“But wait,” said Sammy the Sensor, “ALL of today’s new readings still go to the SAME library – the newest one. That librarian is still overwhelmed!”
Plan B: Split by Sensor (The Name Method) “Sensors with names A-M go to Library 1, N-Z to Library 2!”
“Now the work is shared equally!” said Lila the LED. “But if I want ALL readings from last Tuesday across ALL sensors, I have to visit EVERY library. That’s slow!”
Plan C: Split by BOTH! (The Hybrid Method) “Split by sensor name for FILING, then organize by date WITHIN each library!”
Bella the Battery smiled. “Now filing is fast AND we can still find things by date!”
The librarians were happy because: - Each library only handles HALF the incoming readings (fast filing!) - Looking up one sensor’s data? Visit just ONE library! - Need to clean up old data? Just remove old date sections from each library!
“It’s like having the best of both worlds!” said Max.
6.10.2 Key Words for Kids
| Word | What It Means |
|---|---|
| Sharding | Splitting a big database into smaller pieces – like dividing a huge puzzle among friends |
| Hot Spot | When one piece gets way more work than others – like one checkout lane being super busy |
| Hybrid | Combining two approaches to get the best of both – like a bicycle that's also electric |
6.11 Summary
Time-based sharding is simple but creates hot spots for current writes
Device-based sharding distributes writes evenly but complicates time-range aggregations
Hybrid sharding (device hash + time partition) provides the best balance for most IoT workloads
Related Concepts - Explore these next:
CAP Theorem - Trade-offs between consistency and availability in sharded systems
Data Quality Monitoring - Quality checks in distributed architectures
Practical Applications:
Worked Examples - Fleet management sharding design (vehicle_id partitioning)
Stream Processing - Kafka partitioning aligns with database sharding
6.12 What’s Next
| If you want to… | Read this next |
|---|---|
| See sharding applied to real fleet and smart city deployments | |