6  Sharding Strategies

In 60 Seconds

Sharding distributes IoT data across multiple database nodes when a single server cannot handle the scale. The three main strategies are time-based sharding (simple but creates hot spots), device-based sharding (even writes but complicates aggregations), and hybrid sharding (best balance for most IoT workloads). For systems with over 100K devices, hybrid sharding with device-based distribution and time-based chunking is the recommended approach.

Learning Objectives

After completing this chapter, you will be able to:

  • Select between time-based, device-based, and hybrid sharding strategies based on query patterns
  • Implement distributed hypertables in TimescaleDB
  • Design Cassandra partition keys for IoT workloads
  • Identify hot spot causes and apply compound sharding keys to resolve them
  • Evaluate rebalancing strategies when adding or removing nodes

Sharding is splitting a massive database across multiple servers when one machine cannot handle all the data. Think of a library that has grown so large it needs multiple buildings – you might put fiction in one building and non-fiction in another. For IoT, you might split data by time period or by device group, so searches stay fast even as data grows enormous.

6.1 Introduction

When an IoT deployment grows beyond what a single database server can handle – whether due to write throughput, storage capacity, or query latency – the solution is sharding: distributing data across multiple nodes. Unlike simple replication (which copies all data to every node for read scalability), sharding splits the dataset so each node holds a distinct portion. The key design decision is how to split: the sharding key determines which data goes where, and a poor choice can create bottlenecks worse than no sharding at all.

6.2 The Sharding Key Decision

The sharding key determines how data is distributed across nodes. For IoT, three primary strategies exist:

6.2.1 Time-Based Sharding (Range Partitioning)

Shard 1: January 2024 data
Shard 2: February 2024 data
Shard 3: March 2024 data
...

Pros: Simple retention (drop entire shard), time-range queries hit single shard

Cons: Hot shard problem – all current writes go to the newest shard, creating a write bottleneck

When to use: Read-heavy workloads with time-bounded queries; archival systems
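The range-partitioning rule above can be sketched in a few lines. This is an illustrative Python sketch (the shard-naming scheme is an assumption, not from any particular database):

```python
from datetime import datetime

def shard_for_month(ts: datetime) -> str:
    # Range partitioning by month: the shard is a pure function of the
    # timestamp, so every current write lands on the newest shard --
    # exactly the hot-shard problem noted above.
    return f"shard-{ts.year}-{ts.month:02d}"

shard_for_month(datetime(2024, 1, 15))  # 'shard-2024-01'
```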

6.2.2 Device-Based Sharding (Hash Partitioning)

hash(device_id) mod N → shard assignment
Shard 0: devices where hash(device_id) mod 3 = 0
Shard 1: devices where hash(device_id) mod 3 = 1
Shard 2: devices where hash(device_id) mod 3 = 2

Pros: Even write distribution, device-specific queries hit single shard

Cons: Cross-device aggregations require scatter-gather across all shards; retention requires per-shard cleanup

When to use: Write-heavy workloads; queries primarily filter by device
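Hash routing is a one-line function, with one subtlety worth calling out: the hash must be stable across processes. A minimal sketch (function name is illustrative):

```python
import hashlib

def shard_for_device(device_id: str, num_shards: int = 3) -> int:
    # Use a stable digest rather than Python's built-in hash(), which is
    # salted per process and would route the same device to different
    # shards after a restart.
    digest = hashlib.md5(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Every writer and reader computes the same shard for a given device, with no coordination and no lookup table.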

6.2.3 Hybrid Sharding (Two-Level)

Two-level sharding architecture showing device IDs hashed to distribute across shards, with each shard further partitioned into time-based chunks for data retention
Figure 6.1: Hybrid Sharding: Hash by device_id for write distribution, then partition by time within each shard for efficient retention.

Pros: Balanced writes (hash) + efficient retention (time chunks) + device queries stay local

Cons: Implementation complexity; cross-device queries still scatter

When to use: Large-scale IoT with mixed query patterns (the common case)
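The two levels compose cleanly: the hash picks the shard, the timestamp picks the chunk within it. A minimal routing sketch (Python; the function and chunk-naming scheme are hypothetical, for illustration only):

```python
import hashlib
from datetime import datetime

def route(device_id: str, ts: datetime, num_shards: int = 8) -> tuple:
    # Level 1: hash the device id for even write distribution.
    shard = int.from_bytes(
        hashlib.md5(device_id.encode()).digest()[:8], "big") % num_shards
    # Level 2: time-chunk within the shard, so retention becomes
    # "drop old chunks" instead of a full-table delete.
    chunk = ts.strftime("%Y-%m-%d")
    return shard, chunk
```

Because the shard depends only on device_id, a device's entire history lives on one node; because the chunk depends only on time, expiring a day of data is a cheap per-shard chunk drop.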

6.3 Implementation: TimescaleDB Distributed Hypertables

With the strategy decision made, the next question is how to implement it. Two popular databases for IoT time-series workloads – TimescaleDB and Cassandra – each provide built-in mechanisms for hybrid sharding.

TimescaleDB 2.0+ supports distributed hypertables that implement hybrid sharding automatically. (Note: Timescale deprecated self-hosted multi-node in favor of their cloud-native architecture in later versions, but the concepts and API patterns remain relevant for understanding hybrid sharding.)

-- Create distributed hypertable across 3 data nodes
CREATE TABLE sensor_data (
    time        TIMESTAMPTZ NOT NULL,
    device_id   VARCHAR(50) NOT NULL,
    value       DOUBLE PRECISION,
    quality     SMALLINT
);

SELECT create_distributed_hypertable(
    'sensor_data',
    'time',                          -- Time dimension (chunking)
    'device_id',                     -- Space dimension (sharding)
    number_partitions => 16,         -- Hash partitions per node
    chunk_time_interval => INTERVAL '1 day'
);

-- Query behavior:
-- Device-specific: routes to single node
SELECT * FROM sensor_data
WHERE device_id = 'sensor-42' AND time > NOW() - INTERVAL '1 hour';

-- Time-range aggregate: parallel across all nodes, merge results
SELECT date_trunc('hour', time), avg(value)
FROM sensor_data
WHERE time > NOW() - INTERVAL '7 days'
GROUP BY 1;

6.4 Implementation: Cassandra Partition Design

Apache Cassandra uses partition keys for distribution and clustering columns for ordering within partitions:

-- Optimal IoT schema: partition by device + time bucket
CREATE TABLE sensor_readings (
    device_id    text,
    time_bucket  timestamp,      -- Truncated to hour/day
    reading_time timestamp,
    value        double,
    PRIMARY KEY ((device_id, time_bucket), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);

-- Why time_bucket in partition key?
-- Without: device_id alone creates unbounded partition growth
-- With: bounded partitions (typically <100MB each), predictable performance
-- Example: 1 reading/sec x 200 bytes x 86,400 sec/day = ~17MB per device-day

-- Query patterns enabled:
-- 1. Single device, recent data (hits 1 partition):
SELECT * FROM sensor_readings
WHERE device_id = 'sensor-42'
AND time_bucket = '2024-01-15 00:00:00';

-- 2. Single device, date range (hits multiple partitions):
SELECT * FROM sensor_readings
WHERE device_id = 'sensor-42'
AND time_bucket IN ('2024-01-14 00:00:00', '2024-01-15 00:00:00');
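On the client side, the time bucket is just the reading's timestamp truncated to the bucket boundary. A sketch of the write-path helper, assuming day-sized buckets (the helper name is illustrative):

```python
from datetime import datetime

def day_bucket(ts: datetime) -> datetime:
    # Truncate to midnight so every reading a device emits in one day
    # shares a partition key, keeping partitions bounded.
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

# Sizing check matching the schema comment:
# 1 reading/sec x 200 bytes x 86,400 sec/day
bytes_per_device_day = 1 * 200 * 86_400   # 17,280,000 bytes, ~17 MB
```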

6.5 Shard Rebalancing and Hot Spots

The Hot Device Problem: Some IoT devices generate 100x more data than others (high-frequency industrial sensors vs. daily weather stations). Hash-based sharding can still create hot spots.

Solution: Compound Sharding Key

import random

# Add randomization to spread a hot device across multiple shards.
# Each write picks a random bucket, so over time the device's data is
# distributed uniformly across num_buckets shards even when hashed
# device_ids cluster.  The trade-off is that reads must scatter-gather
# across all of a device's buckets.
def generate_shard_key(device_id: str, num_buckets: int = 4) -> str:
    """
    Spread a single hot device across multiple shards.
    Writes for device-001 are distributed across the keys:
    device-001:0, device-001:1, device-001:2, device-001:3
    """
    bucket = random.randint(0, num_buckets - 1)
    return f"{device_id}:{bucket}"

# Queries must now scatter across buckets
# SELECT * FROM sensors WHERE device_id IN ('device-001:0', 'device-001:1', ...)
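The read side must enumerate every bucket the writer may have chosen. A companion helper (illustrative) that generates the keys for the IN (...) scatter query:

```python
def bucket_keys(device_id: str, num_buckets: int = 4) -> list:
    # Enumerate all compound keys for one device so a read can
    # scatter-gather with a single IN (...) clause.
    return [f"{device_id}:{b}" for b in range(num_buckets)]

bucket_keys("device-001")
# ['device-001:0', 'device-001:1', 'device-001:2', 'device-001:3']
```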

Rebalancing Strategies:

| Strategy               | Downtime | Data Movement   | Complexity |
|------------------------|----------|-----------------|------------|
| Add shard, rehash all  | Hours    | 100%            | Low        |
| Consistent hashing     | Minutes  | ~1 node's share | Medium     |
| Virtual nodes (vnodes) | Seconds  | Configurable    | High       |
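Consistent hashing's "~1 node's share" property is easy to demonstrate: adding a node only claims keys for itself, it never reshuffles keys between existing nodes. A minimal ring sketch (Python; the class, its parameters, and the vnodes count are illustrative, not from any particular database):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring, for illustration only."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []            # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _pos(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        # Each node owns many small arcs of the ring (virtual nodes),
        # which evens out each physical node's share of the key space.
        for i in range(self.vnodes):
            self._ring.append((self._pos(f"{node}#{i}"), node))
        self._ring.sort()

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first vnode at or after the key's position.
        idx = bisect.bisect(self._ring, (self._pos(key),)) % len(self._ring)
        return self._ring[idx][1]
```

When a node is added, every key either stays where it was or moves to the new node, so data movement is bounded by the new node's share of the ring.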

6.6 Performance Characteristics by Strategy

| Metric               | Time-Based           | Device-Based          | Hybrid             |
|----------------------|----------------------|-----------------------|--------------------|
| Write throughput     | Limited by hot shard | Linear with nodes     | Linear with nodes  |
| Device query latency | Scatter-gather       | Single node           | Single node        |
| Time-range query     | Single node          | Scatter-gather        | Scatter-gather     |
| Retention cleanup    | Drop shard (fast)    | Per-shard scan (slow) | Drop chunks (fast) |
| Implementation       | Simple               | Simple                | Complex            |

6.7 Worked Example: Fleet Tracking Platform Sharding

Scenario: A logistics company tracks 250,000 delivery vehicles. Each vehicle sends GPS coordinates every 5 seconds plus engine diagnostics every 30 seconds.

Data volume calculation:

GPS: 250,000 vehicles x 1 msg/5s = 50,000 writes/sec
Diagnostics: 250,000 vehicles x 1 msg/30s = 8,333 writes/sec
Total: ~58,000 writes/sec

Record size: GPS = 80 bytes, Diagnostics = 200 bytes
Daily volume: (50,000 x 80 + 8,333 x 200) x 86,400
            = 346 GB/day GPS + 144 GB/day diagnostics = 490 GB/day

90-day retention = 44 TB total storage

Why a single server fails: PostgreSQL on a 64-core server handles ~15,000 writes/sec for indexed inserts. At 58,000 writes/sec, you need at least 4 nodes just for write throughput – but you want headroom for traffic spikes (Monday morning fleet dispatch doubles GPS frequency).

How much headroom do you actually need? With 58,000 writes/sec baseline and Monday spikes doubling GPS frequency (GPS = 50,000 of those 58K writes):

\[ \text{Peak load} = 8,333\text{ diagnostics} + (50,000 \times 2)\text{ GPS} = 108,333 \text{ writes/sec} \]

With single-node capacity of 15,000 writes/sec, required nodes = \(\lceil 108,333 / 15,000 \rceil = 8\) nodes minimum. The 8-node hybrid design covers the peak with modest headroom (120,000 writes/sec of aggregate capacity against a 108,333 writes/sec peak) while keeping costs 34% lower than a single massive instance.
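The capacity arithmetic above as a back-of-envelope script (the 15,000 writes/sec single-node figure is the assumption stated earlier):

```python
import math

gps = 250_000 // 5          # 50,000 GPS writes/sec baseline
diag = 250_000 // 30        # ~8,333 diagnostics writes/sec
peak = diag + gps * 2       # Monday dispatch doubles GPS frequency
nodes = math.ceil(peak / 15_000)
# peak = 108_333 writes/sec, nodes = 8
```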

Hybrid sharding design:

-- TimescaleDB distributed across 8 data nodes
-- Step 1: Create table
CREATE TABLE vehicle_telemetry (
    time         TIMESTAMPTZ NOT NULL,
    vehicle_id   VARCHAR(20) NOT NULL,
    msg_type     VARCHAR(10) NOT NULL,  -- 'gps' or 'diag'
    latitude     DOUBLE PRECISION,
    longitude    DOUBLE PRECISION,
    speed_kmh    SMALLINT,
    engine_rpm   SMALLINT,
    fuel_pct     SMALLINT,
    diagnostics  JSONB
);

-- Step 2: Distribute by vehicle_id, chunk by 6-hour intervals
SELECT create_distributed_hypertable(
    'vehicle_telemetry',
    'time',
    'vehicle_id',
    number_partitions => 32,              -- 32 hash partitions across 8 nodes
    chunk_time_interval => INTERVAL '6 hours'  -- 4 chunks per day per partition
);

-- Step 3: Continuous aggregate for fleet dashboard
CREATE MATERIALIZED VIEW fleet_summary_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS hour,
    count(*) AS total_readings,
    count(DISTINCT vehicle_id) AS active_vehicles,
    avg(speed_kmh) FILTER (WHERE msg_type = 'gps') AS avg_speed,
    -- Note: percentile_cont in continuous aggregates requires TimescaleDB 2.7+
    percentile_cont(0.95) WITHIN GROUP (ORDER BY speed_kmh)
        FILTER (WHERE msg_type = 'gps') AS p95_speed
FROM vehicle_telemetry
GROUP BY hour;

Query performance results:

| Query Type                     | Shards Hit     | Latency | Example                                |
|--------------------------------|----------------|---------|----------------------------------------|
| Single vehicle, last hour      | 1 of 8         | 12ms    | “Where is truck VH-4521?”              |
| Single vehicle, 30-day history | 1 of 8         | 180ms   | “Route history for VH-4521”            |
| Fleet-wide, last hour          | All 8          | 85ms    | “How many vehicles are moving?”        |
| Fleet-wide, 30-day trend       | Pre-aggregated | 25ms    | “Average fleet utilization this month” |

Cost comparison (AWS, 90-day retention):

| Approach                            | Nodes | Monthly Cost | Writes/sec Headroom     |
|-------------------------------------|-------|--------------|-------------------------|
| Single RDS r6g.16xlarge             | 1     | $8,200       | Fails at 15K (need 58K) |
| 4x RDS r6g.4xlarge (device shard)   | 4     | $6,600       | 60K (tight)             |
| 8x TimescaleDB r6g.2xlarge (hybrid) | 8     | $5,400       | 120K (2x headroom)      |

The 8-node hybrid approach costs less than the single massive node because smaller instances have better price/performance, and 6-hour chunks enable efficient compression (typically 10-15x on GPS data).

6.8 Decision Framework

Flowchart decision tree starting with primary query pattern, branching to time-based sharding for time-range-dominant workloads, device-based for device-dominant workloads, and hybrid sharding for mixed patterns, with secondary partitioning recommendations at each leaf
Figure 6.2: Sharding Strategy Decision: Choose based on primary query pattern, then add secondary partitioning for retention.

Rule of Thumb: For most IoT systems with >100K devices and >10K writes/sec, hybrid sharding with device-based distribution and time-based chunking provides the best balance of write performance, query flexibility, and operational simplicity.

6.9 Edge Cases and Gotchas

1. Shard Key Cardinality: Too few unique device_ids (fewer than ~100) lead to uneven distribution. Solution: a compound key with sub-partitions.

2. Query Routing Overhead: Every cross-shard query adds coordinator overhead. Design queries to minimize scatter when possible.

3. Transaction Boundaries: Distributed transactions across shards are expensive or unsupported. Design schemas so single-device operations stay within one shard.

4. Schema Changes: Adding columns to distributed tables requires coordination. Plan schema evolution carefully.

5. Backup Complexity: Point-in-time recovery across shards requires consistent snapshots. Use database-native distributed backup tools.

6.10 Knowledge Check

Scenario: Your IoT platform has 1 million devices generating 10 billion records/month. Query patterns: 80% filter by device_id, 20% aggregate across all devices by time range. Single PostgreSQL server maxes out at 5,000 writes/sec.

Think about:

  1. What is the required write throughput? (10B records/month = ? writes/sec)
  2. How should you shard the data – by device_id, by timestamp, or hybrid?
  3. What happens to cross-device aggregate queries with device_id sharding?
  4. How much storage do you need per shard?

Key Insight: First, calculate throughput: 10B records / (30 days x 86,400 sec/day) = ~3,858 writes/sec. A single PostgreSQL at 5,000 writes/sec could barely handle the average – but not peak loads (typically 2-3x baseline). So you need multiple nodes.

Hybrid sharding: Shard by device_id for write distribution (1M devices across 10 shards = 100K devices/shard), then partition each shard by time for efficient retention. Trade-off: Device-specific queries hit 1 shard (fast), but cross-device aggregates must query all 10 shards and merge results (slower but acceptable for 20% of queries). Storage: 10B records x 100 bytes = 1 TB/month. With 3-month retention, you need 3 TB distributed across 10 shards = 300 GB/shard.
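The same arithmetic in script form (the 100-byte record size is taken from the scenario):

```python
writes_per_sec = 10_000_000_000 / (30 * 86_400)   # ~3,858 writes/sec average
monthly_bytes = 10_000_000_000 * 100               # 1 TB/month
retained_tb = monthly_bytes * 3 / 1e12             # 3 TB over 3-month retention
per_shard_gb = retained_tb * 1_000 / 10            # 300 GB per shard (10 shards)
```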

Sharding is like splitting a giant library into smaller branch libraries so everyone can find books faster!

6.10.1 The Sensor Squad Adventure: The Overflowing Data Library

The Sensor Squad’s data library was in trouble. They had SO many sensor readings that the single librarian (the database server) couldn’t keep up!

“I’m receiving 10,000 readings every second!” cried the librarian. “I can’t file them all fast enough!”

Max the Microcontroller had an idea: “What if we split the library into SMALLER libraries?”

Plan A: Split by Time (The Calendar Method) “January readings go to Library 1, February to Library 2!”

“But wait,” said Sammy the Sensor, “ALL of today’s new readings still go to the SAME library – the newest one. That librarian is still overwhelmed!”

Plan B: Split by Sensor (The Name Method) “Sensors with names A-M go to Library 1, N-Z to Library 2!”

“Now the work is shared equally!” said Lila the LED. “But if I want ALL readings from last Tuesday across ALL sensors, I have to visit EVERY library. That’s slow!”

Plan C: Split by BOTH! (The Hybrid Method) “Split by sensor name for FILING, then organize by date WITHIN each library!”

Bella the Battery smiled. “Now filing is fast AND we can still find things by date!”

The librarians were happy because: - Each library only handles HALF the incoming readings (fast filing!) - Looking up one sensor’s data? Visit just ONE library! - Need to clean up old data? Just remove old date sections from each library!

“It’s like having the best of both worlds!” said Max.

6.10.2 Key Words for Kids

| Word     | What It Means                                                                             |
|----------|-------------------------------------------------------------------------------------------|
| Sharding | Splitting a big database into smaller pieces – like dividing a huge puzzle among friends   |
| Hot Spot | When one piece gets way more work than others – like one checkout lane being super busy    |
| Hybrid   | Combining two approaches to get the best of both – like a bicycle that’s also electric     |


6.11 Summary

  • Time-based sharding is simple but creates hot spots for current writes
  • Device-based sharding distributes writes evenly but complicates time-range aggregations
  • Hybrid sharding (device hash + time partition) provides the best balance for most IoT workloads
  • TimescaleDB distributed hypertables and Cassandra compound partition keys implement hybrid sharding natively
  • Hot device problem requires compound sharding keys to spread high-frequency devices across shards
  • Rebalancing strategies trade off downtime, data movement, and implementation complexity

Prerequisites – Read these first:

  • Time-Series Databases – Understanding distributed hypertables in TimescaleDB
  • Database Selection Framework – When to choose distributed vs single-node

Related Concepts – Explore these next:

  • CAP Theorem – Trade-offs between consistency and availability in sharded systems
  • Data Quality Monitoring – Quality checks in distributed architectures

6.12 What’s Next

| If you want to…                                                    | Read this next               |
|--------------------------------------------------------------------|------------------------------|
| See sharding applied to real fleet and smart city deployments      | Worked Examples              |
| Understand CAP trade-offs in sharded systems                       | CAP Theorem                  |
| Set up data retention policies that align with shard lifecycle     | Data Retention               |
| Choose the right database before designing your sharding strategy  | Database Selection Framework |
| Ensure data quality checks work across distributed nodes           | Data Quality Monitoring      |
Ensure data quality checks work across distributed nodes Data Quality Monitoring