Time-series databases like TimescaleDB and InfluxDB are purpose-built for storing timestamped IoT sensor data, achieving 10-100x better performance than generic SQL databases through automatic time-based partitioning, built-in compression (90-95% storage reduction), and native retention policies. For most IoT telemetry workloads, a time-series database is the optimal choice.
Learning Objectives
After completing this chapter, you will be able to:
Implement efficient schemas for sensor data storage
Optimize queries with continuous aggregates and indexing
Configure retention policies and compression for cost optimization
Identify common pitfalls in time-series database design and their solutions
For Beginners: Time-Series Databases
A time-series database is specially designed for data with timestamps, which is exactly what IoT sensors produce. Think of it as a diary optimized for entries like ‘temperature at 10:00 AM, temperature at 10:01 AM.’ Regular databases can store this too, but time-series databases are purpose-built to handle millions of such entries per second and answer time-based questions instantly.
9.1 Introduction
IoT sensors produce a relentless stream of timestamped readings – temperature every second, vibration every millisecond, GPS coordinates every 10 seconds. A generic relational database can store this data, but as volume grows into billions of rows, queries slow to a crawl and storage costs spiral. Time-series databases (TSDBs) solve this through three core optimizations: time-based partitioning, columnar compression, and native retention policies. This chapter focuses on TimescaleDB (a PostgreSQL extension) as the primary example, with principles that apply broadly to InfluxDB, QuestDB, and other TSDBs.
9.2 Time-Series Database Benefits
| Feature | Time-Series DB | Generic SQL |
|---|---|---|
| Write throughput | 100K-1M/sec | 10K-50K/sec |
| Time-range queries | Optimized (partition pruning) | Full table scan |
| Compression | 90-95% automatic | Manual, limited |
| Retention policies | Built-in | Custom scripts |
| Continuous aggregates | Native | Materialized views |
9.3 Interactive: IoT Storage Calculator
Use this calculator to estimate storage requirements for your IoT deployment. Adjust the sliders to model your specific scenario.
```js
viewof num_sensors = Inputs.range([10, 50000], {value: 5000, step: 10, label: "Number of sensors"})
viewof reading_interval = Inputs.select([1, 5, 10, 15, 30, 60, 300, 600], {value: 30, label: "Reading interval (seconds)", format: x => x < 60 ? `${x} sec` : `${x/60} min`})
viewof bytes_per_reading = Inputs.range([16, 512], {value: 200, step: 8, label: "Bytes per reading"})
viewof retention_years = Inputs.range([1, 10], {value: 1, step: 1, label: "Retention period (years)"})
viewof compression_pct = Inputs.range([0, 99], {value: 90, step: 5, label: "Compression savings (%)"})
```
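If you are reading a static copy of this page, the calculator's arithmetic can be reproduced offline. The following Python sketch mirrors the slider inputs above (the function name and structure are illustrative, not part of any library):

```python
def iot_storage_gb(num_sensors, reading_interval_s, bytes_per_reading,
                   retention_years, compression_pct):
    """Estimate raw and compressed storage (GB) for an IoT deployment."""
    readings_per_day = num_sensors * (86_400 / reading_interval_s)
    raw_bytes = readings_per_day * bytes_per_reading * 365 * retention_years
    compressed_bytes = raw_bytes * (1 - compression_pct / 100)
    return raw_bytes / 1e9, compressed_bytes / 1e9

# Slider defaults: 5,000 sensors, 30 s interval, 200 B/reading,
# 1-year retention, 90% compression savings
raw_gb, stored_gb = iot_storage_gb(5_000, 30, 200, 1, 90)
print(f"raw: {raw_gb:.0f} GB, after compression: {stored_gb:.0f} GB")
```

With the defaults this yields roughly 1,051 GB of raw data per year, dropping to about 105 GB after 90% compression.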
9.4 TimescaleDB Hypertables
TimescaleDB extends PostgreSQL with hypertables: automatically partitioned tables optimized for time-series data.
9.4.1 Hypertable Benefits
| Feature | Standard Table | Hypertable |
|---|---|---|
| Query recent data | Full table scan | Scan 1 chunk |
| Delete old data | Slow DELETE | Instant DROP chunk |
| Parallel processing | Limited | Multi-chunk concurrent |
| Index management | One large index | Per-chunk indexes |
9.4.2 Creating a Hypertable
```sql
-- Main sensor readings table
CREATE TABLE sensor_readings (
    time       TIMESTAMPTZ      NOT NULL,
    device_id  VARCHAR(50)      NOT NULL,
    value      DOUBLE PRECISION,
    quality    SMALLINT         DEFAULT 100
);

-- Convert to hypertable with 1-day chunks
SELECT create_hypertable('sensor_readings', 'time',
    chunk_time_interval => INTERVAL '1 day');

-- Index for common queries
CREATE INDEX ON sensor_readings (device_id, time DESC);
```
9.4.3 Try It: Chunk Sizing
The chunk interval determines how much data goes into each partition. Too small means too many chunks to manage; too large means queries scan unnecessary data. Use this calculator to find the right balance.
```js
viewof chunk_sensors = Inputs.range([100, 100000], {value: 10000, step: 100, label: "Number of sensors"})
viewof chunk_hz = Inputs.range([0.1, 10], {value: 1, step: 0.1, label: "Readings per second (Hz)"})
viewof chunk_row_bytes = Inputs.range([32, 256], {value: 64, step: 8, label: "Bytes per row"})
viewof chunk_interval = Inputs.select([1, 6, 12, 24, 48, 168], {value: 24, label: "Chunk interval", format: x => x < 24 ? `${x} hours` : x === 24 ? "1 day" : x === 48 ? "2 days" : "1 week"})
```
Default example: 1,000 sensors at 1 Hz with 32-byte readings produce ~940 GB/year raw. With 10:1 compression, that drops to ~94 GB, saving ~$233/year at standard cloud storage rates.
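The chunk-sizing trade-off can be made concrete with a little arithmetic. A minimal Python sketch using the slider defaults above; the 25%-of-memory budget is TimescaleDB's commonly cited rule of thumb (recent chunks plus their indexes should fit in about a quarter of available RAM), applied here as a heuristic:

```python
def chunk_size_mb(sensors, hz, row_bytes, interval_hours):
    """Approximate uncompressed size of one chunk (data only, no indexes)."""
    rows = sensors * hz * interval_hours * 3600
    return rows * row_bytes / 1e6

# Slider defaults: 10,000 sensors at 1 Hz, 64-byte rows, 1-day chunks
size = chunk_size_mb(10_000, 1, 64, 24)
print(f"one daily chunk: {size:,.0f} MB")

# Heuristic: on a 16 GB server the chunk budget is ~4 GB (25% of RAM),
# so a ~55 GB daily chunk suggests a much shorter chunk interval.
fits = size <= 0.25 * 16_000
```

At these rates a 1-day chunk holds 864 million rows (~55 GB), far over the heuristic budget, so a chunk interval of a few hours would be a better fit.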
9.4.5 Retention Policy
```sql
-- Automatically drop chunks older than 90 days
SELECT add_retention_policy('sensor_readings', INTERVAL '90 days');
```
Why it’s fast: Dropping entire chunks is a metadata operation (milliseconds), not row-by-row deletion (hours for millions of rows).
9.4.6 Continuous Aggregates
Pre-compute common aggregations for dashboard queries:
```sql
-- Create hourly summary
CREATE MATERIALIZED VIEW hourly_summary
WITH (timescaledb.continuous) AS
SELECT
    device_id,
    time_bucket('1 hour', time) AS hour,
    AVG(value) AS avg_value,
    MIN(value) AS min_value,
    MAX(value) AS max_value,
    COUNT(*) AS reading_count
FROM sensor_readings
GROUP BY device_id, hour;

-- Auto-refresh every 30 minutes
SELECT add_continuous_aggregate_policy('hourly_summary',
    start_offset => INTERVAL '1 day',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '30 minutes');
```
Performance impact: Dashboard queries go from 45 seconds (scanning 500M raw rows) to 10 ms (reading pre-computed hourly aggregates).
9.5 Common Pitfalls
Pitfall: Underestimating Cardinality in Time-Series Indexes
The Mistake: Creating composite indexes on (device_id, timestamp) without considering that each unique device creates a separate index entry per time interval, leading to index bloat that exceeds the data itself.
Why It Happens: Developers apply relational database indexing patterns to time-series data. In a traditional database, indexing a user_id column with 1,000 users creates 1,000 index entries. But in time-series, indexing (device_id, timestamp) for 1,000 devices with 1-second resolution creates 1,000 x 86,400 = 86.4 million entries per day. Over a year, that is 31.5 billion entries in a single global index.
The Fix: Use time-partitioned tables with partition-local indexes:
```sql
-- PostgreSQL/TimescaleDB: Partition by time, index within partitions
CREATE TABLE sensor_readings (
    time       TIMESTAMPTZ      NOT NULL,
    device_id  TEXT             NOT NULL,
    value      DOUBLE PRECISION
);

-- TimescaleDB automatically partitions by time
SELECT create_hypertable('sensor_readings', 'time',
    chunk_time_interval => INTERVAL '1 day');

-- Index is created per-chunk, not globally
CREATE INDEX ON sensor_readings (device_id, time DESC);

-- Each daily chunk has 86.4M rows for 1K devices at 1 Hz
-- Per-chunk index: ~1.9 GB vs ~705 GB global; queries scan only relevant chunks
```
Cardinality calculation: For 1,000 devices at 1 Hz:
Daily rows: 1,000 x 86,400 = 86.4 million rows/day
Annual rows: 86.4M x 365 = 31.5 billion rows/year
Global B-tree index (annual): ~24 bytes/entry x 31.5B = ~705 GB index overhead
Chunked index (365 daily chunks): each chunk indexes 86.4M rows (~1.9 GB), but queries only scan relevant chunks instead of a 705 GB global index
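The figures above follow directly from the cardinality arithmetic; this short Python sketch reproduces them (the 24 bytes per B-tree entry is the assumption stated in the bullet list, not a measured value):

```python
DEVICES = 1_000
HZ = 1
ENTRY_BYTES = 24  # assumed per-entry B-tree overhead, as in the estimate above

rows_per_day = DEVICES * HZ * 86_400       # 86.4 million rows/day
rows_per_year = rows_per_day * 365         # ~31.5 billion rows/year

global_index_gb = rows_per_year * ENTRY_BYTES / 2**30   # one global index
chunk_index_gb = rows_per_day * ENTRY_BYTES / 2**30     # one daily chunk's index
print(f"global: ~{global_index_gb:.0f} GB, per-chunk: ~{chunk_index_gb:.1f} GB")
```

This reproduces the ~705 GB global index versus ~1.9 GB per daily chunk: the chunked layout stores the same total entries, but a time-bounded query touches only the relevant chunks' indexes.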
Pitfall: Storing Timezone-Naive Timestamps
The Mistake: Using TIMESTAMP instead of TIMESTAMPTZ (timestamp with timezone), causing silent data corruption when devices or servers change timezones, or when DST transitions occur.
Why It Happens: TIMESTAMP (without timezone) is shorter to type and appears simpler. Developers assume “we’ll always use UTC” but don’t enforce it at the schema level.
The Fix: Always use timezone-aware timestamp types and enforce UTC:
```sql
-- WRONG: Timezone-naive timestamp
CREATE TABLE sensor_data (
    time       TIMESTAMP NOT NULL,  -- Ambiguous!
    device_id  TEXT,
    value      DOUBLE PRECISION
);

-- CORRECT: Timezone-aware timestamp
CREATE TABLE sensor_data (
    time       TIMESTAMPTZ NOT NULL,  -- Stores UTC, displays in session TZ
    device_id  TEXT,
    value      DOUBLE PRECISION
);

-- Enforce UTC on insert (PostgreSQL)
ALTER DATABASE iot_db SET timezone TO 'UTC';
```
Storage difference: Both TIMESTAMP and TIMESTAMPTZ use 8 bytes in PostgreSQL; there is no storage penalty for timezone awareness.
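The same pitfall exists on the application side. This Python sketch shows the naive/aware distinction that TIMESTAMPTZ enforces at the schema level (the example date is 2024-11-03, a US DST fall-back day, when naive local times like 01:30 occur twice):

```python
from datetime import datetime, timezone, timedelta

# Naive timestamp: no timezone attached -- ambiguous across servers and DST
naive = datetime(2024, 11, 3, 1, 30)

# Aware timestamp: an unambiguous instant, the analogue of TIMESTAMPTZ
aware_utc = datetime(2024, 11, 3, 1, 30, tzinfo=timezone.utc)

# A device reporting in UTC-5 converts cleanly to the same UTC instant
local = datetime(2024, 11, 2, 20, 30, tzinfo=timezone(timedelta(hours=-5)))
assert local == aware_utc  # same instant, different display

# Mixing naive and aware values fails loudly instead of silently corrupting order
try:
    naive < aware_utc
except TypeError as exc:
    print("comparison refused:", exc)
```

Python refuses to compare naive and aware datetimes; databases using plain TIMESTAMP offer no such guard, which is why the corruption is silent.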
9.6 Worked Example: Building Management System
TimescaleDB Sizing for a 500-Sensor Smart Office Building
Scenario: A commercial property manager deploys 500 sensors across a 20-floor office building – temperature, humidity, CO2, occupancy, and lighting – reporting every 30 seconds. Data must be retained for regulatory compliance (energy audits require 3 years of history). Budget: $200/month for database hosting.
Step 1: Calculate raw data volume
| Metric | Value |
|---|---|
| Sensors | 500 |
| Readings per sensor per day | 86,400 / 30 = 2,880 |
| Total readings per day | 500 x 2,880 = 1,440,000 |
| Bytes per reading | ~120 bytes (timestamp 8 B + device_id 20 B + 5 sensor values x 8 B + quality flag 2 B + overhead 50 B) |
| Raw data per day | 1,440,000 x 120 = 173 MB/day |
| Raw data per year | 173 MB x 365 = 63 GB/year |
| Raw data for 3 years | 189 GB |
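The Step 1 figures can be checked with a few lines of Python (constants taken directly from the scenario above):

```python
SENSORS = 500
INTERVAL_S = 30
BYTES_PER_READING = 120

readings_per_sensor_per_day = 86_400 // INTERVAL_S        # 2,880
readings_per_day = SENSORS * readings_per_sensor_per_day  # 1,440,000
mb_per_day = readings_per_day * BYTES_PER_READING / 1e6   # ~173 MB/day
gb_3_years = mb_per_day * 365 * 3 / 1e3                   # ~189 GB
print(f"{readings_per_day:,} readings/day, "
      f"{mb_per_day:.0f} MB/day, {gb_3_years:.0f} GB over 3 years")
```

This confirms the table: 1,440,000 readings and ~173 MB per day, or ~189 GB of raw data over the 3-year compliance window.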
Step 2: Apply TimescaleDB compression and downsampling
| Data Age | Strategy | Storage |
|---|---|---|
| 0-7 days (uncompressed) | Raw for real-time dashboards | 1.2 GB |
| 7-90 days (compressed) | 10x compression for debugging/analysis | 1.4 GB |
| 90 days - 1 year (5-min aggregates, compressed) | Raw dropped; only aggregates kept | 1.05 GB |
| Years 2-3 (5-min aggregates, compressed) | Aggregates retained for compliance | 2.1 GB |
Total storage for 3 years: ~5.8 GB (vs. 189 GB uncompressed = 97% reduction)
Step 3: Configure compression, aggregation, and retention (SQL)
```sql
-- Compress chunks older than 7 days (raw stays uncompressed for real-time dashboards)
SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');

-- Continuous aggregate: 5-minute summaries for trend analysis
CREATE MATERIALIZED VIEW readings_5min
WITH (timescaledb.continuous) AS
SELECT
    device_id,
    time_bucket('5 minutes', time) AS bucket,
    AVG(temperature) AS avg_temp,
    AVG(humidity) AS avg_humidity,
    AVG(co2) AS avg_co2,
    MAX(occupancy) AS max_occupancy
FROM sensor_readings
GROUP BY device_id, bucket;

-- Downsample: drop raw data after 90 days, keep 5-min aggregates 3 years
SELECT add_retention_policy('sensor_readings', INTERVAL '90 days');
SELECT add_retention_policy('readings_5min', INTERVAL '3 years');
```
Step 4: Select hosting tier
| Provider | Tier | Storage | CPU/RAM | Cost/month |
|---|---|---|---|---|
| Timescale Cloud | Dev | 10 GB | 0.5 CPU, 2 GB | $29 |
| Timescale Cloud | Standard | 25 GB | 1 CPU, 4 GB | $69 |
| AWS RDS (self-managed) | db.t3.medium | 50 GB GP3 | 2 vCPU, 4 GB | $85 |
| Self-hosted (Hetzner) | CX31 | 80 GB NVMe | 2 vCPU, 8 GB | $12 |
Winner: Timescale Cloud Dev at $29/month – 5.8 GB fits within 10 GB allocation, managed backups included, well within $200 budget.
Step 5: Verify query performance
| Query | Unpartitioned (189 GB) | With Hypertable + Compression |
|---|---|---|
| Last 24h for 1 sensor | 45 seconds | 12 ms (scans 1 daily chunk) |
| Weekly average for all sensors | 8 minutes | 85 ms (reads continuous aggregate) |
| 3-year trend for 1 floor | 4 minutes | 350 ms (reads downsampled aggregate) |
Key insight: The combination of hypertable partitioning, compression, and continuous aggregates turns a 189 GB database into 5.8 GB while making queries 1,000x faster. The regulatory 3-year retention requirement, which initially seemed expensive, costs only $29/month with proper time-series architecture.
Key Takeaway
Time-series databases achieve dramatic performance improvements through automatic time-based partitioning (queries only scan relevant chunks), built-in compression (90-95% storage reduction), native retention policies (instant data deletion by dropping entire chunks), and continuous aggregates (pre-computed dashboards in milliseconds). Always use timezone-aware timestamps and partition-local indexes to avoid performance pitfalls.
9.7 Knowledge Check Quizzes
Test your understanding of time-series database concepts, from architecture selection to query optimization.
Quiz 2: Indexing and Optimization
Quiz 3: Retention Policies
Quiz 4: TimescaleDB Hypertables
Quiz 5: Troubleshooting Query Performance
For Kids: Meet the Sensor Squad!
A time-series database is like a super-organized diary that remembers everything by WHEN it happened!
9.7.1 The Sensor Squad Adventure: The Time Machine Filing System
The Sensor Squad was drowning in data. Every sensor sent a new reading every single second, and after a year they had BILLIONS of entries!
“Finding last Tuesday’s temperature takes FOREVER!” groaned Sammy the Sensor. “The regular filing cabinet has to search through EVERY SINGLE entry!”
Max the Microcontroller had a brilliant idea: “What if we organize everything by TIME, like a diary?”
Max built a special filing system called a Hypertable:
Monday’s readings go in Monday’s drawer
Tuesday’s readings go in Tuesday’s drawer
Wednesday’s readings… you get the idea!
“Now when someone asks for Tuesday’s temperature,” said Max, “I only open TUESDAY’S drawer instead of searching through the whole year! That’s 365 times faster!”
Bella the Battery added another trick: “Old drawers get SQUISHED DOWN really small. We can fit a whole week into the space that used to hold just ONE day!”
“That’s compression!” said Lila the LED. “We saved 90% of our storage space!”
And the best part? When data got really old, they could just THROW AWAY an entire drawer in one second, instead of erasing entries one by one (which took hours).
“Our Time Machine Filing System is AMAZING!” cheered the Squad. “Fast searches, tiny storage, and easy cleanup!”
9.7.2 Key Words for Kids
| Word | What It Means |
|---|---|
| Time-Series | Data organized by when it happened – like entries in a diary |
| Hypertable | A special table that automatically splits data by time – like a filing cabinet with a drawer for each day |
| Compression | Squishing data to take up less space – like vacuum-packing clothes to fit more in a suitcase |
| Retention Policy | Rules for when to delete old data – like cleaning out your closet at the end of each season |
Key Concepts
TimescaleDB: A PostgreSQL extension that transforms standard tables into time-partitioned hypertables, adding automatic chunk management, columnar compression, and continuous aggregates while preserving full SQL compatibility
InfluxDB: A purpose-built time-series database using a custom query language (Flux), optimized for extremely high cardinality data and providing built-in alerting and downsampling policies
Hypertable: A TimescaleDB table automatically partitioned into fixed-duration time chunks, enabling parallel queries, chunk exclusion for time-range queries, and per-chunk compression policies
Compression Policy: An automated background process that converts older time chunks from row format to columnar format, typically achieving 10-20x compression ratios on IoT telemetry with repeated float values
Continuous Aggregate: A TimescaleDB materialized view that automatically refreshes pre-computed aggregations as new raw data arrives, enabling sub-second dashboard queries over billions of historical rows
Retention Policy: A TimescaleDB or InfluxDB rule that automatically drops chunks older than a configured threshold, preventing unbounded storage growth while optionally cascading data to cold object storage
Chunk Exclusion: Query optimization where the planner uses time range predicates to skip chunks outside the query window, dramatically reducing I/O for recent-data queries on large hypertables
Downsampling: The process of replacing high-resolution raw data (1-second readings) with lower-resolution aggregates (1-minute averages) for long-term retention, preserving trend visibility while reducing storage by 60x
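The downsampling idea in the last bullet can be sketched in a few lines of Python: a toy, in-memory version of what server-side `time_bucket` aggregation does (function and variable names here are illustrative):

```python
from statistics import mean

def downsample(readings, bucket_s=60):
    """Replace fine-grained readings with per-bucket averages.

    readings: list of (epoch_seconds, value) pairs.
    Returns one (bucket_start, avg_value) pair per bucket.
    """
    buckets = {}
    for ts, value in readings:
        buckets.setdefault(ts - ts % bucket_s, []).append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]

# 10 minutes of 1-second readings collapse to ten 1-minute averages: 60x fewer rows
raw = [(t, 20.0 + (t % 60) * 0.1) for t in range(600)]
agg = downsample(raw)
print(len(raw), "->", len(agg))  # 600 -> 10
```

The 60x storage reduction falls directly out of the bucket width: 60 one-second rows become a single one-minute row, at the cost of losing sub-minute detail.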
9.8 Summary
Time-series databases provide 10-100x performance improvement over generic SQL for sensor data
TimescaleDB hypertables automatically partition data by time, enabling efficient queries and retention
Compression reduces storage by 90-95% for older data
Continuous aggregates pre-compute dashboards for millisecond response times
Proper indexing with time-partitioned tables avoids cardinality explosion
Timezone-aware timestamps prevent silent data corruption
9.9 Concept Relationships
See Also
Prerequisites - Read these first:
- Database Selection Framework - Understanding when to use time-series databases
- Big Data Overview - Context for high-velocity IoT data challenges

Related Concepts - Explore these next:
- Time-Series Fundamentals - Deep dive into TSDB architecture (LSM trees, columnar storage)
- Time-Series Platforms - Comparing InfluxDB, TimescaleDB, Prometheus
- Data Quality Monitoring - Quality metrics specific to time-series data
Practical Applications:
Worked Examples - Fleet management, smart city data lake, and query performance examples