9  Time-Series Databases

In 60 Seconds

Time-series databases like TimescaleDB and InfluxDB are purpose-built for storing timestamped IoT sensor data, achieving 10-100x better performance than generic SQL databases through automatic time-based partitioning, built-in compression (90-95% storage reduction), and native retention policies. For most IoT telemetry workloads, a time-series database is the optimal choice.

Learning Objectives

After completing this chapter, you will be able to:

  • Compare time-series database architectures (TimescaleDB, InfluxDB)
  • Implement efficient schemas for sensor data storage
  • Optimize queries with continuous aggregates and indexing
  • Configure retention policies and compression for cost optimization
  • Identify common pitfalls in time-series database design and their solutions

A time-series database is specially designed for data with timestamps, which is exactly what IoT sensors produce. Think of it as a diary optimized for entries like ‘temperature at 10:00 AM, temperature at 10:01 AM.’ Regular databases can store this too, but time-series databases are purpose-built to handle millions of such entries per second and answer time-based questions instantly.

9.1 Introduction

IoT sensors produce a relentless stream of timestamped readings – temperature every second, vibration every millisecond, GPS coordinates every 10 seconds. A generic relational database can store this data, but as volume grows into billions of rows, queries slow to a crawl and storage costs spiral. Time-series databases (TSDBs) solve this through three core optimizations: time-based partitioning, columnar compression, and native retention policies. This chapter focuses on TimescaleDB (a PostgreSQL extension) as the primary example, with principles that apply broadly to InfluxDB, QuestDB, and other TSDBs.

9.2 Time-Series Database Benefits

| Feature | Time-Series DB | Generic SQL |
|---|---|---|
| Write throughput | 100K-1M/sec | 10K-50K/sec |
| Time-range queries | Optimized (partition pruning) | Full table scan |
| Compression | 90-95% automatic | Manual, limited |
| Retention policies | Built-in | Custom scripts |
| Continuous aggregates | Native | Materialized views |

9.3 Interactive: IoT Storage Calculator

Use this calculator to estimate storage requirements for your IoT deployment. Adjust the sliders to model your specific scenario.
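For readers who want the arithmetic behind such a calculator, here is a minimal sketch in Python. The function name and the binary-GiB convention are illustrative choices, not part of any tool:

```python
# Sketch of the storage arithmetic behind an IoT sizing calculator.
# All names are illustrative; sizes use binary GiB (2^30 bytes).
def estimate_storage_gib(sensors, readings_per_sec, bytes_per_reading,
                         days, compression_ratio=1.0):
    """Estimate storage in GiB for a sensor fleet over `days` days."""
    rows = sensors * readings_per_sec * 86_400 * days
    raw_bytes = rows * bytes_per_reading
    return raw_bytes / compression_ratio / 2**30

# 1,000 sensors at 1 Hz with 32-byte readings for one year:
raw = estimate_storage_gib(1_000, 1, 32, 365)               # ~940 GiB
compressed = estimate_storage_gib(1_000, 1, 32, 365, 10.0)  # ~94 GiB
```

These are the same figures the compression example later in this chapter arrives at.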

9.4 TimescaleDB: PostgreSQL for Time-Series

TimescaleDB extends PostgreSQL with hypertables: automatically partitioned tables optimized for time-series data.

9.4.1 Hypertable Benefits

| Feature | Standard Table | Hypertable |
|---|---|---|
| Query recent data | Full table scan | Scan 1 chunk |
| Delete old data | Slow DELETE | Instant DROP chunk |
| Parallel processing | Limited | Multi-chunk concurrent |
| Index management | One large index | Per-chunk indexes |

9.4.2 Creating a Hypertable

-- Main sensor readings table
CREATE TABLE sensor_readings (
    time        TIMESTAMPTZ NOT NULL,
    device_id   VARCHAR(50) NOT NULL,
    value       DOUBLE PRECISION,
    quality     SMALLINT DEFAULT 100
);

-- Convert to hypertable with 1-day chunks
SELECT create_hypertable('sensor_readings', 'time',
    chunk_time_interval => INTERVAL '1 day');

-- Index for common queries
CREATE INDEX ON sensor_readings (device_id, time DESC);

9.4.3 Try It: Chunk Sizing

The chunk interval determines how much data goes into each partition. Too small means too many chunks to manage; too large means queries scan unnecessary data. Use this calculator to find the right balance.
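The trade-off can be worked out by hand. Here is an illustrative sketch (not a TimescaleDB API); the TimescaleDB documentation's rule of thumb is that one chunk plus its indexes should fit in roughly 25% of available memory:

```python
# Illustrative chunk-sizing arithmetic (not a TimescaleDB API).
def chunk_stats(sensors, hz, bytes_per_row, chunk_hours, retention_days):
    """Rows per chunk, chunk size in bytes, and total chunk count."""
    rows_per_chunk = int(sensors * hz * 3600 * chunk_hours)
    chunk_bytes = rows_per_chunk * bytes_per_row
    num_chunks = int(retention_days * 24 / chunk_hours)
    return rows_per_chunk, chunk_bytes, num_chunks

# 1,000 sensors at 1 Hz, ~100 bytes/row, 1-day chunks, 90-day retention:
rows, size, n = chunk_stats(1_000, 1, 100, 24, 90)
# rows = 86,400,000 per chunk; size = 8.64 GB; n = 90 chunks
```

With these inputs, each daily chunk holds roughly 8.6 GB: comfortable on a server with 32 GB or more of RAM, but a case for hourly chunks on a smaller machine.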

9.4.4 Compression Policy

-- Enable compression
ALTER TABLE sensor_readings SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id'
);

-- Compress data older than 7 days
SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');

Compression results: 90-95% storage reduction for typical sensor data.

How does compression translate to real cost savings? Adjust the parameters below to see how compression affects your storage costs.

Default example: 1,000 sensors at 1 Hz with 32-byte readings produce ~940 GB/year raw. With 10:1 compression, that drops to ~94 GB, saving ~$233/year at a typical object-storage rate of about $0.023/GB-month.
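The savings figure above can be reproduced directly. The storage rate below is an assumed object-storage price, and 10:1 compression is the assumption stated above:

```python
# Reproduces the ~$233/year savings arithmetic (rate is an assumption).
GIB = 2**30
raw_bytes = 1_000 * 1 * 32 * 86_400 * 365   # 1,000 sensors, 1 Hz, 32 B/reading
raw_gib = raw_bytes / GIB                   # ~940 GiB/year
compressed_gib = raw_gib / 10               # 10:1 compression -> ~94 GiB
rate_per_gib_month = 0.023                  # assumed cloud storage rate, $/GiB-month
annual_savings = (raw_gib - compressed_gib) * rate_per_gib_month * 12
# annual_savings ~ $233/year
```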

9.4.5 Retention Policy

-- Automatically drop chunks older than 90 days
SELECT add_retention_policy('sensor_readings', INTERVAL '90 days');

Why it’s fast: Dropping entire chunks is a metadata operation (milliseconds), not row-by-row deletion (hours for millions of rows).

9.4.6 Continuous Aggregates

Pre-compute common aggregations for dashboard queries:

-- Create hourly summary
CREATE MATERIALIZED VIEW hourly_summary
WITH (timescaledb.continuous) AS
SELECT
    device_id,
    time_bucket('1 hour', time) AS hour,
    AVG(value) AS avg_value,
    MIN(value) AS min_value,
    MAX(value) AS max_value,
    COUNT(*) AS reading_count
FROM sensor_readings
GROUP BY device_id, hour;

-- Auto-refresh every 30 minutes
SELECT add_continuous_aggregate_policy('hourly_summary',
    start_offset => INTERVAL '1 day',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '30 minutes');

Performance impact: Dashboard queries go from 45 seconds (500M rows) to 10ms (hourly aggregates).
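To see what the continuous aggregate is precomputing, here is the same hourly bucketing done by hand in Python. The `time_bucket` function below mimics the flooring behavior of TimescaleDB's function of the same name; all other names are illustrative:

```python
# Pure-Python illustration of what the hourly continuous aggregate stores.
from collections import defaultdict

def time_bucket(width_sec, epoch_sec):
    """Floor a Unix timestamp to its bucket start, like time_bucket()."""
    return epoch_sec - (epoch_sec % width_sec)

def hourly_summary(readings):
    """readings: iterable of (device_id, epoch_sec, value) tuples."""
    buckets = defaultdict(list)
    for device, ts, value in readings:
        buckets[(device, time_bucket(3600, ts))].append(value)
    return {
        key: {"avg": sum(vals) / len(vals), "min": min(vals),
              "max": max(vals), "count": len(vals)}
        for key, vals in buckets.items()
    }

raw = [("dev1", 0, 20.0), ("dev1", 1800, 22.0), ("dev1", 3600, 25.0)]
summary = hourly_summary(raw)
# ("dev1", 0): avg 21.0 over 2 readings; ("dev1", 3600): 1 reading
```

The database version does this incrementally in the background, so dashboards read the small summary table instead of rescanning raw rows.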

9.5 Common Pitfalls

Pitfall: Underestimating Cardinality in Time-Series Indexes

The Mistake: Maintaining a single global index on (device_id, timestamp) without accounting for the fact that every row adds an index entry. For high-frequency sensors the index grows with total row count, not device count, and its size can rival the data itself.

Why It Happens: Developers apply relational database indexing patterns to time-series data. In a traditional database, indexing a user_id column with 1,000 users creates 1,000 index entries. But in time-series, indexing (device_id, timestamp) for 1,000 devices with 1-second resolution creates 1,000 x 86,400 = 86.4 million entries per day. Over a year, that is 31.5 billion entries in a single global index.

The Fix: Use time-partitioned tables with partition-local indexes:

-- PostgreSQL/TimescaleDB: Partition by time, index within partitions
CREATE TABLE sensor_readings (
    time        TIMESTAMPTZ NOT NULL,
    device_id   TEXT NOT NULL,
    value       DOUBLE PRECISION
);

-- TimescaleDB automatically partitions by time
SELECT create_hypertable('sensor_readings', 'time',
    chunk_time_interval => INTERVAL '1 day');

-- Index is created per-chunk, not globally
CREATE INDEX ON sensor_readings (device_id, time DESC);

-- Each daily chunk has 86.4M rows for 1K devices at 1Hz
-- Per-chunk index: ~1.9 GB vs ~705 GB global; queries scan only relevant chunks

Cardinality calculation: For 1,000 devices at 1 Hz:

  • Daily rows: 1,000 x 86,400 = 86.4 million rows/day
  • Annual rows: 86.4M x 365 = 31.5 billion rows/year
  • Global B-tree index (annual): ~24 bytes/entry x 31.5B = ~705 GB index overhead
  • Chunked index (365 daily chunks): each chunk indexes 86.4M rows (~1.9 GB), but queries only scan relevant chunks instead of a 705 GB global index
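The same arithmetic in executable form (24 bytes/entry is the rough B-tree figure used above, and GiB are binary):

```python
# Cardinality arithmetic for a global vs. per-chunk (device_id, time) index.
devices, hz = 1_000, 1
daily_rows = devices * hz * 86_400          # 86,400,000 rows/day
annual_rows = daily_rows * 365              # 31,536,000,000 rows/year
entry_bytes = 24                            # rough B-tree entry size

global_index_gib = annual_rows * entry_bytes / 2**30   # ~705 GiB, one index
chunk_index_gib = daily_rows * entry_bytes / 2**30     # ~1.9 GiB per chunk
```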

Pitfall: Storing Timezone-Naive Timestamps

The Mistake: Using TIMESTAMP instead of TIMESTAMPTZ (timestamp with timezone), causing silent data corruption when devices or servers change timezones, or when DST transitions occur.

Why It Happens: TIMESTAMP (without timezone) is shorter to type and appears simpler. Developers assume “we’ll always use UTC” but don’t enforce it at the schema level.

The Fix: Always use timezone-aware timestamp types and enforce UTC:

-- WRONG: Timezone-naive timestamp
CREATE TABLE sensor_data (
    time        TIMESTAMP NOT NULL,  -- Ambiguous!
    device_id   TEXT,
    value       DOUBLE PRECISION
);

-- CORRECT: Timezone-aware timestamp
CREATE TABLE sensor_data (
    time        TIMESTAMPTZ NOT NULL,  -- Stores UTC, displays in session TZ
    device_id   TEXT,
    value       DOUBLE PRECISION
);

-- Enforce UTC on insert (PostgreSQL)
ALTER DATABASE iot_db SET timezone TO 'UTC';

Storage difference: Both TIMESTAMP and TIMESTAMPTZ use 8 bytes in PostgreSQL; there is no storage penalty for timezone awareness.
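The same discipline applies in application code: construct timestamps timezone-aware from the start. A Python sketch (the helper name is illustrative):

```python
# Always attach an explicit UTC timezone before handing timestamps to the DB.
from datetime import datetime, timezone

naive = datetime(2024, 3, 10, 2, 30)                       # meaning is ambiguous
aware = datetime(2024, 3, 10, 2, 30, tzinfo=timezone.utc)  # unambiguous instant

# Ordering comparisons between the two raise TypeError -- the application-level
# analogue of the silent corruption TIMESTAMP invites in the schema.
try:
    naive < aware
    mixed_ok = True
except TypeError:
    mixed_ok = False

def reading_timestamp():
    """Illustrative helper: a timezone-aware 'now' for sensor readings."""
    return datetime.now(timezone.utc)
```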

9.6 Worked Example: Building Management System

TimescaleDB Sizing for a 500-Sensor Smart Office Building

Scenario: A commercial property manager deploys 500 sensors across a 20-floor office building – temperature, humidity, CO2, occupancy, and lighting – reporting every 30 seconds. Data must be retained for regulatory compliance (energy audits require 3 years of history). Budget: $200/month for database hosting.

Step 1: Calculate raw data volume

| Metric | Value |
|---|---|
| Sensors | 500 |
| Readings per sensor per day | 86,400 / 30 = 2,880 |
| Total readings per day | 500 x 2,880 = 1,440,000 |
| Bytes per reading | ~120 (timestamp 8 B + device_id 20 B + 5 sensor values x 8 B + quality flag 2 B + overhead 50 B) |
| Raw data per day | 1,440,000 x 120 = 173 MB |
| Raw data per year | 173 MB x 365 = 63 GB |
| Raw data for 3 years | 189 GB |
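Step 1's arithmetic, reproduced in executable form (decimal MB/GB, matching the table):

```python
# Raw-volume arithmetic for the 500-sensor building.
sensors = 500
per_sensor_daily = 86_400 // 30               # 2,880 readings/sensor/day
daily_readings = sensors * per_sensor_daily   # 1,440,000 readings/day
bytes_per_reading = 120

daily_mb = daily_readings * bytes_per_reading / 1e6   # ~173 MB/day
annual_gb = daily_mb * 365 / 1e3                      # ~63 GB/year
three_year_gb = annual_gb * 3                         # ~189 GB
```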

Step 2: Apply TimescaleDB compression and downsampling

| Data Age | Strategy | Storage |
|---|---|---|
| 0-7 days | Uncompressed raw for real-time dashboards | 1.2 GB |
| 7-90 days | 10x compression for debugging/analysis | 1.4 GB |
| 90 days - 1 year | Raw dropped; only 5-min aggregates kept, compressed | 1.05 GB |
| Years 2-3 | 5-min aggregates retained for compliance | 2.1 GB |

Total storage for 3 years: ~5.8 GB (vs. 189 GB uncompressed = 97% reduction)
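The tier totals check out arithmetically. In the sketch below, the raw and compressed tiers are derived from the daily volume, while the two aggregate-tier sizes are taken as given from the table:

```python
# Tiered-storage arithmetic for the 3-year retention plan.
daily_mb = 173                               # from Step 1

tier_raw = 7 * daily_mb / 1e3                # 0-7 days uncompressed ~ 1.2 GB
tier_compressed = 83 * daily_mb / 10 / 1e3   # 7-90 days at 10x ~ 1.4 GB
tier_year1 = 1.05                            # 90 d - 1 yr, 5-min aggregates (table)
tier_years23 = 2.10                          # years 2-3, 5-min aggregates (table)

total_gb = tier_raw + tier_compressed + tier_year1 + tier_years23  # ~5.8 GB
reduction = 1 - total_gb / 189               # ~97% vs. 189 GB raw
```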

Step 3: Configure retention policy (SQL)

-- Raw data: keep 7 days uncompressed for real-time dashboards
SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');

-- Continuous aggregate: 5-minute summaries for trend analysis
-- (assumes a wide sensor_readings schema with one column per measurement,
--  unlike the single-value schema shown earlier)
CREATE MATERIALIZED VIEW readings_5min
WITH (timescaledb.continuous) AS
SELECT device_id, time_bucket('5 minutes', time) AS bucket,
       AVG(temperature) AS avg_temp, AVG(humidity) AS avg_humidity,
       AVG(co2) AS avg_co2, MAX(occupancy) AS max_occupancy
FROM sensor_readings
GROUP BY device_id, bucket;

-- Downsample: drop raw data after 90 days, keep 5-min aggregates 3 years
SELECT add_retention_policy('sensor_readings', INTERVAL '90 days');
SELECT add_retention_policy('readings_5min', INTERVAL '3 years');

Step 4: Select hosting tier

| Provider | Tier | Storage | CPU/RAM | Cost/month |
|---|---|---|---|---|
| Timescale Cloud | Dev | 10 GB | 0.5 CPU, 2 GB | $29 |
| Timescale Cloud | Standard | 25 GB | 1 CPU, 4 GB | $69 |
| AWS RDS (self-managed) | db.t3.medium | 50 GB GP3 | 2 vCPU, 4 GB | $85 |
| Self-hosted (Hetzner) | CX31 | 80 GB NVMe | 2 vCPU, 8 GB | $12 |

Winner: Timescale Cloud Dev at $29/month – 5.8 GB fits within 10 GB allocation, managed backups included, well within $200 budget.

Step 5: Verify query performance

| Query | Unpartitioned (189 GB) | With Hypertable + Compression |
|---|---|---|
| Last 24h for 1 sensor | 45 seconds | 12 ms (scans 1 daily chunk) |
| Weekly average for all sensors | 8 minutes | 85 ms (reads continuous aggregate) |
| 3-year trend for 1 floor | 4 minutes | 350 ms (reads downsampled aggregate) |

Key insight: The combination of hypertable partitioning, compression, and continuous aggregates turns a 189 GB database into 5.8 GB while making queries 1,000x faster. The regulatory 3-year retention requirement, which initially seemed expensive, costs only $29/month with proper time-series architecture.

Key Takeaway

Time-series databases achieve dramatic performance improvements through automatic time-based partitioning (queries only scan relevant chunks), built-in compression (90-95% storage reduction), native retention policies (instant data deletion by dropping entire chunks), and continuous aggregates (pre-computed dashboards in milliseconds). Always use timezone-aware timestamps and partition-local indexes to avoid performance pitfalls.

9.7 Knowledge Check Quizzes

Test your understanding of time-series database concepts, from architecture selection to query optimization.

A time-series database is like a super-organized diary that remembers everything by WHEN it happened!

9.7.1 The Sensor Squad Adventure: The Time Machine Filing System

The Sensor Squad was drowning in data. Every sensor sent a new reading every single second, and after a year they had BILLIONS of entries!

“Finding last Tuesday’s temperature takes FOREVER!” groaned Sammy the Sensor. “The regular filing cabinet has to search through EVERY SINGLE entry!”

Max the Microcontroller had a brilliant idea: “What if we organize everything by TIME, like a diary?”

Max built a special filing system called a Hypertable:

  • Monday’s readings go in Monday’s drawer
  • Tuesday’s readings go in Tuesday’s drawer
  • Wednesday’s readings… you get the idea!

“Now when someone asks for Tuesday’s temperature,” said Max, “I only open TUESDAY’S drawer instead of searching through the whole year! That’s 365 times faster!”

Bella the Battery added another trick: “Old drawers get SQUISHED DOWN really small. We can fit a whole week into the space that used to hold just ONE day!”

“That’s compression!” said Lila the LED. “We saved 90% of our storage space!”

And the best part? When data got really old, they could just THROW AWAY an entire drawer in one second, instead of erasing entries one by one (which took hours).

“Our Time Machine Filing System is AMAZING!” cheered the Squad. “Fast searches, tiny storage, and easy cleanup!”

9.7.2 Key Words for Kids

| Word | What It Means |
|---|---|
| Time-Series | Data organized by when it happened – like entries in a diary |
| Hypertable | A special table that automatically splits data by time – like a filing cabinet with a drawer for each day |
| Compression | Squishing data to take up less space – like vacuum-packing clothes to fit more in a suitcase |
| Retention Policy | Rules for when to delete old data – like cleaning out your closet at the end of each season |

Key Concepts
  • TimescaleDB: A PostgreSQL extension that transforms standard tables into time-partitioned hypertables, adding automatic chunk management, columnar compression, and continuous aggregates while preserving full SQL compatibility
  • InfluxDB: A purpose-built time-series database using a custom query language (Flux), optimized for extremely high cardinality data and providing built-in alerting and downsampling policies
  • Hypertable: A TimescaleDB table automatically partitioned into fixed-duration time chunks, enabling parallel queries, chunk exclusion for time-range queries, and per-chunk compression policies
  • Compression Policy: An automated background process that converts older time chunks from row format to columnar format, typically achieving 10-20x compression ratios on IoT telemetry with repeated float values
  • Continuous Aggregate: A TimescaleDB materialized view that automatically refreshes pre-computed aggregations as new raw data arrives, enabling sub-second dashboard queries over billions of historical rows
  • Retention Policy: A TimescaleDB or InfluxDB rule that automatically drops chunks older than a configured threshold, preventing unbounded storage growth while optionally cascading data to cold object storage
  • Chunk Exclusion: Query optimization where the planner uses time range predicates to skip chunks outside the query window, dramatically reducing I/O for recent-data queries on large hypertables
  • Downsampling: The process of replacing high-resolution raw data (1-second readings) with lower-resolution aggregates (1-minute averages) for long-term retention, preserving trend visibility while reducing storage by 60x
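The downsampling concept in the list above, sketched in Python (the function and its input/output shape are illustrative):

```python
# Collapse 1-second readings into 1-minute averages: a 60x row reduction.
def downsample(readings, bucket_sec=60):
    """readings: list of (epoch_sec, value) tuples, sorted by time."""
    out, acc, current = [], [], None
    for ts, value in readings:
        bucket = ts - ts % bucket_sec
        if bucket != current:
            if acc:                                  # flush previous bucket
                out.append((current, sum(acc) / len(acc)))
            acc, current = [], bucket
        acc.append(value)
    if acc:                                          # flush final bucket
        out.append((current, sum(acc) / len(acc)))
    return out

one_minute = [(t, 20.0) for t in range(60)]   # 60 one-second readings
# downsample(one_minute) -> [(0, 20.0)]       # one row instead of 60
```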

9.8 Summary

  • Time-series databases provide 10-100x performance improvement over generic SQL for sensor data
  • TimescaleDB hypertables automatically partition data by time, enabling efficient queries and retention
  • Compression reduces storage by 90-95% for older data
  • Retention policies implement multi-tier storage (raw -> aggregated -> archived)
  • Continuous aggregates pre-compute dashboards for millisecond response times
  • Proper indexing with time-partitioned tables avoids cardinality explosion
  • Timezone-aware timestamps prevent silent data corruption

9.9 Concept Relationships

Prerequisites - read these first:

  • Database Selection Framework – understanding when to use time-series databases
  • Big Data Overview – context for high-velocity IoT data challenges

Related Concepts - explore these next:

  • Time-Series Fundamentals – deep dive into TSDB architecture (LSM trees, columnar storage)
  • Time-Series Platforms – comparing InfluxDB, TimescaleDB, Prometheus
  • Data Quality Monitoring – quality metrics specific to time-series data

Practical Applications:

9.10 What’s Next

| If you want to… | Read this next |
|---|---|
| Validate data quality in your time-series database | Data Quality Monitoring |
| Dive deeper into TSDB internals (LSM trees, columnar storage) | Time-Series Fundamentals |
| Compare InfluxDB, TimescaleDB, and Prometheus in detail | Time-Series Platforms |
| Design retention and downsampling policies | Data Retention |
| Scale your TSDB across multiple nodes | Sharding Strategies |