Implement efficient schemas for sensor data storage
Optimize queries with continuous aggregates and indexing
Configure retention policies and compression for cost optimization
Avoid common pitfalls in time-series database design
1278.1 Introduction
Time-series databases are purpose-built for storing and querying timestamped data: the dominant pattern in IoT sensor telemetry. They achieve 10-100x better performance than generic SQL databases for time-series workloads through specialized optimizations.
1278.2 Time-Series Database Benefits
| Feature | Time-Series DB | Generic SQL |
|---|---|---|
| Write throughput | 100K-1M/sec | 10K-50K/sec |
| Time-range queries | Optimized (partition pruning) | Full table scan |
| Compression | 90-95% automatic | Manual, limited |
| Retention policies | Built-in | Custom scripts |
| Continuous aggregates | Native | Materialized views |
1278.3 Interactive: IoT Storage Calculator
Use this calculator to estimate storage requirements for your IoT deployment:
Tip: Storage Estimator
```js
viewof numSensors = Inputs.range([10, 10000], {value: 100, step: 10, label: "Number of Sensors"})
viewof readingsPerHour = Inputs.range([1, 3600], {value: 60, step: 1, label: "Readings per Hour (per sensor)"})
viewof bytesPerReading = Inputs.range([10, 1000], {value: 50, step: 10, label: "Bytes per Reading"})
viewof retentionDays = Inputs.range([1, 365], {value: 30, step: 1, label: "Retention Period (Days)"})

// storageCalc (daily storage in GB) is computed in a cell not shown here
storageGrowthData = {
  const data = [];
  for (let day = 1; day <= retentionDays; day += Math.ceil(retentionDays / 30)) {
    data.push({day: day, storage_gb: storageCalc.dailyGB * day});
  }
  return data;
}

Plot.plot({
  title: "Storage Growth Over Time",
  width: 640,
  height: 300,
  x: {label: "Days", grid: true},
  y: {label: "Storage (GB)", grid: true},
  marks: [
    Plot.line(storageGrowthData, {x: "day", y: "storage_gb", stroke: "#16A085", strokeWidth: 2}),
    Plot.dot(storageGrowthData, {x: "day", y: "storage_gb", fill: "#2C3E50", r: 4})
  ]
})
```
```js
{
  const container = document.getElementById('kc-db-9');
  if (container && typeof InlineKnowledgeCheck !== 'undefined') {
    container.innerHTML = '';
    container.appendChild(InlineKnowledgeCheck.create({
      question: "You're estimating storage for a smart city deployment: 5,000 sensors, each reporting 1 KB every 30 seconds, with 1-year retention. What is the approximate annual storage requirement BEFORE compression?",
      options: [
        {text: "10 GB - a single server can handle this easily", correct: false,
         feedback: "Let's calculate: 5,000 sensors x 2 readings/min x 60 min x 24 hr x 365 days x 1,000 bytes = ~5.26 trillion bytes = ~5.3 TB per year."},
        {text: "500 GB - a moderately sized database", correct: false,
         feedback: "This underestimates by 10x. At 5,000 sensors with 2 readings/minute, you're generating significant data volume."},
        {text: "5 TB - requires careful capacity planning", correct: true,
         feedback: "Correct! 5,000 sensors x 2/min x 60 x 24 x 365 x 1,000 bytes = ~5.3 TB. With 90% compression (TimescaleDB), this reduces to ~530 GB. Always calculate raw storage first, then apply the compression factor."},
        {text: "50 TB - needs distributed storage system", correct: false,
         feedback: "This overestimates by 10x. At 1 KB per reading and 30-second intervals, the calculation yields ~5.3 TB before compression."}
      ],
      difficulty: "easy",
      topic: "iot-databases"
    }));
  }
}
```
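The same arithmetic is easy to sanity-check in plain SQL. The query below is a throwaway calculation (not part of any schema in this chapter) using the quiz's parameters:

```sql
-- Back-of-envelope check of the quiz arithmetic (plain PostgreSQL)
SELECT 5000::bigint           -- sensors (cast avoids 32-bit integer overflow)
     * 2 * 60 * 24 * 365      -- readings per sensor per year, at 2 per minute
     * 1000                   -- bytes per reading (1 KB)
     / 1e12 AS raw_terabytes; -- ~5.26 TB before compression
```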
1278.4 TimescaleDB: PostgreSQL for Time-Series
TimescaleDB extends PostgreSQL with hypertables: automatically partitioned tables optimized for time-series data.
1278.4.1 Hypertable Benefits
| Feature | Standard Table | Hypertable |
|---|---|---|
| Query recent data | Full table scan | Scan 1 chunk |
| Delete old data | Slow DELETE | Instant DROP chunk |
| Parallel processing | Limited | Multi-chunk concurrent |
| Index management | One large index | Per-chunk indexes |
1278.4.2 Creating a Hypertable
```sql
-- Main sensor readings table
CREATE TABLE sensor_readings (
    time      TIMESTAMPTZ NOT NULL,
    device_id VARCHAR(50) NOT NULL,
    value     DOUBLE PRECISION,
    quality   SMALLINT DEFAULT 100
);

-- Convert to hypertable with 1-day chunks
SELECT create_hypertable('sensor_readings', 'time',
    chunk_time_interval => INTERVAL '1 day');

-- Index for common queries
CREATE INDEX ON sensor_readings (device_id, time DESC);
```
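A minimal usage sketch, with a hypothetical device ID: once the hypertable exists, inserts and queries are ordinary SQL, and the time predicate lets the planner prune to the relevant chunks:

```sql
-- Insert a reading (device ID is illustrative)
INSERT INTO sensor_readings (time, device_id, value)
VALUES (now(), 'temp-001', 22.4);

-- The time predicate restricts the scan to recent chunks
SELECT device_id, avg(value) AS avg_value
FROM sensor_readings
WHERE time > now() - INTERVAL '24 hours'
GROUP BY device_id;
```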
1278.4.3 Compression Policy
```sql
-- Enable compression
ALTER TABLE sensor_readings SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id'
);

-- Compress data older than 7 days
SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');
```
Compression results: 90-95% storage reduction for typical sensor data.
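To confirm the savings on your own data, TimescaleDB 2.x exposes hypertable_compression_stats(); a quick check might look like this:

```sql
-- Compare on-disk size before and after compression (TimescaleDB 2.x)
SELECT pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes)  AS after
FROM hypertable_compression_stats('sensor_readings');
```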
1278.4.4 Retention Policy
```sql
-- Automatically drop chunks older than 90 days
SELECT add_retention_policy('sensor_readings', INTERVAL '90 days');
```
Why it’s fast: Dropping entire chunks is a metadata operation (milliseconds), not row-by-row deletion (hours for millions of rows).
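The policy runs in the background; the same chunk-drop can also be done on demand with drop_chunks():

```sql
-- One-off cleanup: drop every chunk containing only data older than 90 days
SELECT drop_chunks('sensor_readings', older_than => INTERVAL '90 days');
```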
1278.4.5 Continuous Aggregates
Pre-compute common aggregations for dashboard queries:
```sql
-- Create hourly summary
CREATE MATERIALIZED VIEW hourly_summary
WITH (timescaledb.continuous) AS
SELECT
    device_id,
    time_bucket('1 hour', time) AS hour,
    AVG(value) AS avg_value,
    MIN(value) AS min_value,
    MAX(value) AS max_value,
    COUNT(*) AS reading_count
FROM sensor_readings
GROUP BY device_id, hour;

-- Auto-refresh every 30 minutes
SELECT add_continuous_aggregate_policy('hourly_summary',
    start_offset => INTERVAL '1 day',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '30 minutes');
```
Performance impact: Dashboard queries go from 45 seconds (500M rows) to 10ms (hourly aggregates).
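A dashboard then queries the aggregate instead of the raw table; a typical query, with an illustrative device ID:

```sql
-- Reads pre-computed hourly rows instead of scanning raw readings
SELECT hour, avg_value, max_value
FROM hourly_summary
WHERE device_id = 'temp-001'
  AND hour > now() - INTERVAL '7 days'
ORDER BY hour;
```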
1278.5 Common Pitfalls
Pitfall: Underestimating Cardinality in Time-Series Indexes
The Mistake: Creating composite indexes on (device_id, timestamp) without considering that each unique device creates a separate index entry per time interval, leading to index bloat that exceeds the data itself.
Why It Happens: Developers apply relational database indexing patterns to time-series data. In a traditional database, indexing a user_id column with 100,000 users creates 100,000 index entries. But in time-series, indexing (device_id, timestamp) for 100,000 devices with 1-second resolution creates 100,000 x 86,400 = 8.64 billion entries per day.
The Fix: Use time-partitioned tables with partition-local indexes:
```sql
-- PostgreSQL/TimescaleDB: partition by time, index within partitions
CREATE TABLE sensor_readings (
    time      TIMESTAMPTZ NOT NULL,
    device_id TEXT NOT NULL,
    value     DOUBLE PRECISION
);

-- TimescaleDB automatically partitions by time
SELECT create_hypertable('sensor_readings', 'time',
    chunk_time_interval => INTERVAL '1 day');

-- Index is created per-chunk, not globally
CREATE INDEX ON sensor_readings (device_id, time DESC);

-- Each daily chunk holds ~8.64B rows for 100K devices at 1 Hz
-- Index per chunk: ~69 GB, vs ~25 TB for a single global index after 1 year
```
Cardinality calculation, for 100,000 devices at 1 Hz:

- Daily rows: 100,000 x 86,400 = 8.64 billion rows/day (about 3.15 trillion rows/year)
- Global B-tree index: at ~8 bytes/entry, a year of entries costs roughly 8 bytes x 3.15T = ~25 TB of index
- Chunked indexes (365 daily chunks): ~69 GB per chunk, so the total is comparable, but a time-range query only touches the indexes of the chunks inside its window
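To watch for index bloat on a live system, TimescaleDB 2.x can report per-chunk sizes via chunks_detailed_size(); a sketch:

```sql
-- Per-chunk table and index footprint (TimescaleDB 2.x)
SELECT chunk_name,
       pg_size_pretty(table_bytes) AS table_size,
       pg_size_pretty(index_bytes) AS index_size
FROM chunks_detailed_size('sensor_readings')
ORDER BY chunk_name;
```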
Pitfall: Storing Timezone-Naive Timestamps
The Mistake: Using TIMESTAMP instead of TIMESTAMPTZ (timestamp with timezone), causing silent data corruption when devices or servers change timezones, or when DST transitions occur.
Why It Happens: TIMESTAMP (without timezone) is shorter to type and appears simpler. Developers assume “we’ll always use UTC” but don’t enforce it at the schema level.
The Fix: Always use timezone-aware timestamp types and enforce UTC:
```sql
-- WRONG: Timezone-naive timestamp
CREATE TABLE sensor_data (
    time      TIMESTAMP NOT NULL,  -- Ambiguous!
    device_id TEXT,
    value     DOUBLE PRECISION
);

-- CORRECT: Timezone-aware timestamp
CREATE TABLE sensor_data (
    time      TIMESTAMPTZ NOT NULL,  -- Stores UTC, displays in session TZ
    device_id TEXT,
    value     DOUBLE PRECISION
);

-- Enforce UTC on insert (PostgreSQL)
ALTER DATABASE iot_db SET timezone TO 'UTC';
```
Storage difference: Both TIMESTAMP and TIMESTAMPTZ use 8 bytes in PostgreSQL; there's no storage penalty for timezone awareness.
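A quick psql demonstration of the behavior (the session timezone affects display only, not the stored instant):

```sql
-- The literal is interpreted in the session timezone on input
SET timezone = 'America/New_York';
SELECT TIMESTAMPTZ '2024-06-01 12:00:00' AS as_entered,
       TIMESTAMPTZ '2024-06-01 12:00:00' AT TIME ZONE 'UTC' AS utc_instant;

-- Changing the session timezone changes display, not what is stored
SET timezone = 'UTC';
```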
1278.6 Knowledge Check Quizzes
Quiz 1: Time-Series Databases
Question 1: Which database type is MOST appropriate for storing time-series sensor data with high write throughput requirements?
Time-series databases are purpose-built for sensor data: optimized for timestamp-ordered inserts, automatic downsampling, efficient time-range queries, and data retention policies. They achieve 10-20x better write performance than generic SQL for time-series workloads.
Quiz 2: Indexing and Optimization
Question 2: An IoT application queries sensor data primarily by device_id and timestamp. Which indexing strategy provides the best query performance?
Composite index (device_id, timestamp) optimizes queries like WHERE device_id = 'sensor1' AND timestamp > '2024-01-01'. The database can use the index for both filters in a single lookup. Order matters: the leading column (device_id) enables the index to be used, and timestamp then supports the range scan.
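As a sketch (table and index names are illustrative), the pattern plus a way to confirm the planner actually uses it:

```sql
-- Leading column handles equality; second column handles the range scan
CREATE INDEX idx_device_time ON sensor_data (device_id, timestamp DESC);

-- The plan should show an Index Scan using idx_device_time, not a Seq Scan
EXPLAIN
SELECT * FROM sensor_data
WHERE device_id = 'sensor1' AND timestamp > '2024-01-01';
```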
Quiz 3: Retention Policies
Question 3: A smart city deploys 10,000 sensors generating 1 reading/sec each. After 1 year, the database has 315 billion records. Which retention strategy is most practical?
Multi-tier retention balances detail and cost by keeping raw data only when it is most useful (debugging and model training) and downsampling older data for trends.
Record counts (10,000 sensors at 1 Hz):

- Raw forever: ~315B rows/year
- Raw for 30 days: ~25.9B rows
- Hourly aggregates for 1 year: ~87.6M rows
- Daily aggregates: ~3.65M rows/year
At a 10-year horizon, the policy stores roughly 26B rows versus ~3.15T rows if you retained all raw data (~120x more).
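In TimescaleDB terms, a multi-tier policy is simply a short retention policy on the raw hypertable plus a longer one on a continuous aggregate; a sketch reusing the objects defined earlier in this chapter:

```sql
-- Tier 1: raw readings kept for 30 days
SELECT add_retention_policy('sensor_readings', INTERVAL '30 days');

-- Tier 2: hourly rollups kept for 1 year
-- (retention policies on continuous aggregates require TimescaleDB 2.x)
SELECT add_retention_policy('hourly_summary', INTERVAL '365 days');
```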
Quiz 4: TimescaleDB Hypertables
Question 4: What is the primary advantage of TimescaleDB hypertables over standard PostgreSQL tables for IoT data?
Hypertables automatically partition data into time-based chunks (e.g., daily or weekly). Benefits:

1. Query optimization: Time-range queries only scan relevant chunks
2. Efficient retention: Drop old chunks instantly vs slow DELETE
3. Parallel processing: Multiple chunks processed concurrently
Example: 1 billion records over 365 days. Standard table: Full scan for recent data (slow). Hypertable with daily chunks: Scan 1 chunk (2.7M records) = 365x faster!
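You can inspect the chunking directly; show_chunks() lists the chunk tables backing a hypertable:

```sql
-- Each returned relation holds one time slice (one day in this example)
SELECT show_chunks('sensor_readings');
```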
Quiz 5: Troubleshooting Query Performance
Question 5: A temperature sensor table has 100 million records. A query for data from the last 24 hours scans the entire table. What’s the most likely problem?
Symptom: Query WHERE timestamp > NOW() - INTERVAL '1 day' performs full table scan.
Diagnosis:

1. No timestamp index: Database must read all 100M rows. Solution: CREATE INDEX ON sensor_data(timestamp)
2. Outdated statistics: Query planner doesn't know data distribution. Solution: ANALYZE sensor_data
After fixing: Index seeks directly to relevant records. 1 day of data (~100k records) vs 100M total = 1000x less data scanned.
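A troubleshooting sequence for this symptom (table name taken from the question) might be: inspect the plan, then apply both fixes:

```sql
-- Step 1: confirm the problem (look for "Seq Scan" in the output)
EXPLAIN ANALYZE
SELECT * FROM sensor_data
WHERE timestamp > NOW() - INTERVAL '1 day';

-- Step 2: add the missing index and refresh planner statistics
CREATE INDEX ON sensor_data (timestamp);
ANALYZE sensor_data;
```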
1278.7 Summary
Time-series databases provide 10-100x performance improvement over generic SQL for sensor data
TimescaleDB hypertables automatically partition data by time, enabling efficient queries and retention
Compression reduces storage by 90-95% for older data