---
title: "Time-Series Database Platforms"
difficulty: intermediate
---

12 Time-Series Database Platforms

12.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Compare InfluxDB, TimescaleDB, and Prometheus architectures and use cases
  • Evaluate platform trade-offs for specific IoT deployment requirements
  • Select the right time-series database for your application domain
  • Write basic queries in Flux (InfluxDB), SQL (TimescaleDB), and PromQL (Prometheus)
  • Differentiate the licensing, clustering, and ecosystem characteristics across platforms

In 60 Seconds

The major time-series platforms for IoT – InfluxDB, TimescaleDB, Prometheus, QuestDB, and Apache IoTDB – each optimize for different trade-offs: cardinality limits, query-language familiarity, edge deployment constraints, and integration ecosystem. InfluxDB excels at high-cardinality cloud deployments; TimescaleDB suits teams wanting SQL on PostgreSQL; Prometheus dominates infrastructure monitoring and alerting; QuestDB delivers maximum raw ingest speed; Apache IoTDB targets industrial IoT with hierarchical device modeling. Match platform to workload before committing to a schema, as migration between platforms is costly.

12.2 Key Concepts

  • InfluxDB 3.0: The latest InfluxDB version using Apache Arrow columnar storage and Parquet persistence, enabling SQL queries via Flight SQL while maintaining InfluxDB’s high-cardinality time-series strengths
  • Apache IoTDB: An open-source time-series database designed for industrial IoT with a hierarchical path-based data model (root.factory.line.machine.sensor), optimized for billions of time-series from physical assets
  • Prometheus: A pull-based monitoring time-series database optimized for cloud-native infrastructure metrics, using a scrape model where the server pulls metrics from instrumented services at configured intervals
  • OpenTSDB: A time-series database built on Apache HBase, enabling horizontal scaling to billions of data points by leveraging HBase’s distributed architecture – suited for large-scale IoT requiring HBase ecosystem integration
  • Grafana: A multi-source visualization platform that connects to virtually all time-series databases via plugins, providing dashboards, alerting, and annotation layers without requiring migration between databases
  • Telegraf: An agent-based data collection framework by InfluxData that collects metrics from hundreds of input sources (MQTT, Modbus, OPC-UA) and writes to multiple output destinations including InfluxDB and Prometheus
  • Flux Query Language: InfluxDB’s functional data scripting language enabling complex transformations, joins, and user-defined functions on time-series data – more powerful than InfluxQL but requires a learning curve
  • Vector Clock: A logical timestamp mechanism used in distributed time-series platforms to track causality between writes across replicas, enabling conflict detection without requiring perfectly synchronized physical clocks

12.3 MVU: Minimum Viable Understanding

Core concept: Choose InfluxDB for maximum write throughput (500K+ pts/sec), TimescaleDB when you need SQL compatibility and joins with business data, or Prometheus for Kubernetes-native infrastructure monitoring. Why it matters: The wrong platform choice locks you into its limitations: InfluxDB cannot join sensor data with customer records, Prometheus only retains 15-30 days, and TimescaleDB cannot match InfluxDB’s raw write speed. Key takeaway: Start with the question “Do I need to join time-series data with relational business data?” If yes, use TimescaleDB; if no and write volume exceeds 300K pts/sec, use InfluxDB; for infrastructure monitoring with alerting, use Prometheus.

A time-series database is specially designed for data with timestamps, which is exactly what IoT sensors produce. Think of it as a diary optimized for entries like ‘temperature at 10:00 AM, temperature at 10:01 AM.’ Regular databases can store this too, but time-series databases are purpose-built to handle millions of such entries per second and answer time-based questions instantly.

12.4 Introduction

Choosing the right time-series database is one of the most consequential architectural decisions in an IoT project. The database you select determines your write throughput ceiling, query capabilities, integration options, and operational complexity for years to come. Unlike general-purpose databases where migration is painful but feasible, time-series databases embed assumptions about data model, query patterns, and retention strategies that make late-stage switches extremely costly.

This chapter examines three dominant platforms – InfluxDB, TimescaleDB, and Prometheus – through the lens of real IoT deployment requirements. Rather than declaring a “winner,” we provide decision frameworks that match platform strengths to specific use cases.

12.5 Comparing Major Platforms

Estimated time: ~20 min | Difficulty: Advanced | Unit: P10.C15.U03

Three dominant platforms serve different IoT use cases: InfluxDB (pure time-series), TimescaleDB (PostgreSQL compatibility), and Prometheus (monitoring and alerting). Each makes different trade-offs between write performance, query flexibility, and operational simplicity.

12.5.1 InfluxDB: Pure Time-Series Platform

Architecture:

  • Purpose-built time-series database from the ground up
  • Custom query language (Flux) optimized for time-series operations
  • Built-in retention policies and continuous queries (downsampling)
  • Schema-less tags and fields model

Data Model:

measurement: temperature
tags: {sensor_id=temp_001, location=building_a, floor=3}
fields: {value=23.5, battery=85}
timestamp: 2025-12-15T10:00:00Z

  • Measurement: Like a table (e.g., “temperature”, “humidity”)
  • Tags: Indexed metadata (dimensions for grouping)
  • Fields: Actual values (not indexed)
  • Timestamp: Automatic time index
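To make the model concrete, here is a minimal pure-Python sketch that serializes one point into InfluxDB’s line protocol (the wire format behind the measurement/tags/fields/timestamp model above). The function name and the omission of special-character escaping are illustrative – production code should use an official client library, which handles escaping and batching.

```python
# Illustrative sketch only: build an InfluxDB line-protocol string from
# the measurement/tags/fields/timestamp model. Real clients also escape
# spaces, commas, and quotes in keys and values.

def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Serialize one point; tags and fields sorted for deterministic output."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))

    def fmt(v):
        # Line protocol: strings are quoted, ints carry an 'i' suffix,
        # booleans are true/false, floats are bare numbers.
        if isinstance(v, str):
            return f'"{v}"'
        if isinstance(v, bool):  # check bool before int (bool subclasses int)
            return "true" if v else "false"
        if isinstance(v, int):
            return f"{v}i"
        return repr(v)

    field_str = ",".join(f"{k}={fmt(v)}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {timestamp_ns}"

line = to_line_protocol(
    "temperature",
    {"sensor_id": "temp_001", "location": "building_a", "floor": "3"},
    {"value": 23.5, "battery": 85},
    1765792800000000000,  # 2025-12-15T10:00:00Z in nanoseconds
)
print(line)
```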

Strengths:

  • Extreme write performance: 500,000+ points/second single node
  • Excellent compression: 20:1 to 30:1 typical
  • Built-in downsampling: Automatic aggregation to lower resolutions
  • Time-aware functions: derivative(), moving_average(), rate()
  • Retention policies: Automatic data expiration

Trade-offs:

  • Custom query language (learning curve vs. SQL)
  • No joins with external data (pure time-series only)
  • Enterprise features require paid license (clustering, high availability)

Ideal Use Case:

High-velocity IoT metrics collection where all data is time-series and you need maximum write performance.

Example Flux Query:

from(bucket: "iot_sensors")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "temperature")
  |> filter(fn: (r) => r["location"] == "building_a")
  |> aggregateWindow(every: 5m, fn: mean)
  |> yield(name: "mean")

12.5.2 TimescaleDB: PostgreSQL-Compatible Time-Series

Architecture:

  • Extension to PostgreSQL (inherits full SQL support)
  • Automatic time-based chunking (hypertables)
  • Continuous aggregates for downsampling
  • Full PostgreSQL ecosystem compatibility

Data Model:

CREATE TABLE sensor_data (
  time        TIMESTAMPTZ NOT NULL,
  sensor_id   TEXT NOT NULL,
  location    TEXT,
  temperature DOUBLE PRECISION,
  humidity    DOUBLE PRECISION
);

SELECT create_hypertable('sensor_data', 'time');

  • Hypertable: Abstraction that automatically partitions by time
  • Chunks: Underlying tables (e.g., data_2025_12_week1, data_2025_12_week2)
  • Standard SQL: Use normal PostgreSQL queries
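To make chunking concrete, here is a toy sketch (not TimescaleDB’s internal algorithm) of how a row’s timestamp maps to a fixed-width, epoch-aligned chunk; the 7-day width mirrors TimescaleDB’s default chunk interval.

```python
# Toy illustration of hypertable-style time partitioning: each row is
# routed to the chunk whose time range contains its timestamp.
from datetime import datetime, timezone

CHUNK_SECONDS = 7 * 24 * 3600  # 7-day chunks (TimescaleDB's default interval)

def chunk_start(ts: datetime) -> datetime:
    """Return the epoch-aligned start of the chunk holding this timestamp."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % CHUNK_SECONDS, tz=timezone.utc)

row_time = datetime(2025, 12, 15, 10, 0, tzinfo=timezone.utc)
print(chunk_start(row_time))  # start of the 7-day chunk containing row_time
```

Because chunks are ordinary tables under the hood, dropping old data is a cheap `DROP TABLE` on expired chunks rather than a row-by-row delete.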

Strengths:

  • SQL compatibility: Existing PostgreSQL knowledge applies
  • Joins with relational data: Combine time-series with user accounts, device metadata
  • Rich ecosystem: Existing PostgreSQL tools (pgAdmin, Grafana, etc.)
  • Full ACID compliance: Transactions when needed
  • Good compression: 10:1 to 15:1 typical

Trade-offs:

  • Lower write throughput than pure time-series DBs (typically 200,000-300,000 points/second)
  • More complex setup (PostgreSQL tuning required)
  • Heavier resource usage (PostgreSQL overhead)

Ideal Use Case:

IoT systems that need to correlate time-series data with business data (customer accounts, product catalogs, billing information).

Example SQL Query:

SELECT
  time_bucket('5 minutes', time) AS bucket,
  sensor_id,
  AVG(temperature) as avg_temp,
  MAX(temperature) as max_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '1 hour'
  AND location = 'building_a'
GROUP BY bucket, sensor_id
ORDER BY bucket DESC;
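To see exactly what `time_bucket('5 minutes', time)` computes, here is the same bucketing in plain Python: floor each epoch timestamp to its 5-minute bucket, then aggregate per bucket. The sample readings are invented for illustration.

```python
# Plain-Python equivalent of the SQL above: bucket timestamps to
# 5-minute windows, then average temperatures per bucket.
from collections import defaultdict

def time_bucket(epoch_s: int, width_s: int = 300) -> int:
    """Floor an epoch timestamp to the start of its bucket."""
    return epoch_s - epoch_s % width_s

readings = [(1765792810, 23.1), (1765792990, 23.9), (1765793130, 24.5)]

buckets = defaultdict(list)
for ts, temp in readings:
    buckets[time_bucket(ts)].append(temp)

avg_per_bucket = {b: sum(v) / len(v) for b, v in buckets.items()}
print(avg_per_bucket)  # first two readings share a bucket, third starts a new one
```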

12.5.3 Prometheus: Monitoring and Alerting Platform

Architecture:

  • Pull-based metric collection (scrapes exporters)
  • Local time-series database (not designed for long-term storage)
  • Built-in alerting engine (Alertmanager)
  • Kubernetes-native integration

Data Model:

http_requests_total{method="GET", endpoint="/api/sensors", status="200"} 1547 @ 1639564800

  • Metric name: http_requests_total
  • Labels: {method="GET", endpoint="/api/sensors", status="200"}
  • Value: 1547
  • Timestamp: 1639564800 (Unix timestamp)

Strengths:

  • Pull-based: Resilient to client failures (server controls collection)
  • Service discovery: Automatically discovers targets in Kubernetes
  • Built-in alerting: Define rules, route to on-call engineers
  • Powerful query language: PromQL optimized for monitoring
  • Kubernetes standard: De facto observability for containerized systems

Trade-offs:

  • Short-term storage: Typically 15-30 days of local retention – excellent short-term compression (~1.37 bytes/sample via Gorilla encoding), but not designed for multi-year IoT history
  • No built-in downsampling: Use external systems (Thanos, Cortex)
  • Pull model limitation: Requires network access to scrape targets
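The ~1.37 bytes/sample figure rests partly on delta-of-delta timestamp encoding: with a regular scrape interval, second-order deltas are almost all zero, and zeros encode in a single bit. A toy sketch of the arithmetic (not the actual bit packing):

```python
# Toy sketch of the delta-of-delta idea behind Gorilla timestamp
# compression: regular scrape intervals make second-order deltas
# mostly zero, which is what makes them cheap to encode.

def delta_of_delta(timestamps):
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]

# Scrapes every 15 s, with one second of jitter on the fourth sample
ts = [1000, 1015, 1030, 1046, 1061]
print(delta_of_delta(ts))  # mostly zeros -> near-free to store
```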

Ideal Use Case:

Infrastructure monitoring, Kubernetes observability, short-term alerting on IoT gateway health.

Example PromQL Query:

rate(sensor_readings_total{location="building_a"}[5m])
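Under the hood, `rate()` computes the per-second increase of a monotonic counter over the window, treating any decrease as a counter reset (the process restarted and began counting from zero). This simplified sketch ignores PromQL’s window-boundary extrapolation, and the sample data is invented:

```python
# Simplified sketch of what rate(counter[5m]) computes: per-second
# increase over the window, with counter resets handled. PromQL also
# extrapolates to the window edges, which this sketch omits.

def counter_rate(samples):
    """samples: list of (unix_ts, counter_value) inside the window."""
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset; the new value is the increase
        # accumulated since the restart.
        increase += cur - prev if cur >= prev else cur
    span = samples[-1][0] - samples[0][0]
    return increase / span

window = [(0, 100), (60, 160), (120, 220), (180, 10), (240, 70)]
print(counter_rate(window))  # reset at t=180 is absorbed, not counted negative
```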

12.5.4 Platform Comparison Table

| Feature | InfluxDB | TimescaleDB | Prometheus |
|---------|----------|-------------|------------|
| Write/Ingestion Rate | 500K+ pts/sec (push) | 200-300K pts/sec (push) | ~10M samples/sec (pull-based scraping) |
| Query Language | Flux (custom) | SQL (PostgreSQL) | PromQL (custom) |
| Compression | 20:1 to 30:1 | 10:1 to 15:1 | ~12:1 (Gorilla double-delta encoding) |
| Built-in Downsampling | Yes | Yes (continuous aggregates) | No (external) |
| Long-term Storage | Excellent | Excellent | Limited (15-30 days) |
| Joins with Relational Data | No | Yes (full PostgreSQL) | No |
| Alerting | Via Kapacitor | Via external tools | Built-in (Alertmanager) |
| Best For | Pure IoT time-series | IoT + business data | Infrastructure monitoring |
| Open Source License | MIT (v2.x OSS) | Apache 2.0 | Apache 2.0 |
| Clustering/HA | Enterprise only | PostgreSQL replication | External (Thanos/Cortex) |

12.5.5 Selection Decision Tree

Figure 12.1: Time-Series Database Selection Decision Tree
Figure 12.2: Platform Comparison – InfluxDB vs. TimescaleDB vs. Prometheus strengths and trade-offs

This view maps IoT application domains to optimal database choices:

Figure: Decision matrix mapping IoT application domains (industrial, smart building, fleet tracking, agriculture) to recommended database platforms

Different IoT domains have distinct requirements that align with specific database strengths.

Figure 12.3: Time-Series Database Internal Architecture – from query parsing through indexing to LSM storage and compression

Scenario: Test 100,000 sensors writing 1 reading/second to measure actual throughput and resource usage.

Test Setup:

  • Hardware: 3-node cluster, 16 vCPU, 64 GB RAM per node
  • Workload: 100,000 points/second sustained for 1 hour
  • Measurement: Write latency P50/P95/P99, CPU, RAM, disk I/O
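The P50/P95/P99 latency figures in benchmark reports are typically computed with the nearest-rank method over collected samples. A minimal sketch, with made-up latency values:

```python
# Nearest-rank percentile: the value at position ceil(p/100 * n) in the
# sorted sample list. This is how P50/P95/P99 latency is usually reported.
import math

def percentile(samples, p):
    """Nearest-rank percentile (p in 0-100) of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [2.1, 2.3, 2.4, 3.0, 4.2, 5.5, 8.1, 9.0, 12.4, 15.2]
for p in (50, 95, 99):
    print(f"P{p}: {percentile(latencies_ms, p)} ms")
```

Note how P99 is dominated by the single slowest sample – which is exactly why benchmarks track tail latency separately from the mean.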

InfluxDB Results:

Write throughput: 520,000 points/sec achieved
Latency P50: 2.3 ms
Latency P95: 8.1 ms
Latency P99: 15.2 ms
CPU utilization: 45% average
RAM usage: 28 GB (stable)
Disk write: 85 MB/sec (compressed)
Compression ratio: 22:1

TimescaleDB Results:

Write throughput: 245,000 points/sec achieved
Latency P50: 4.1 ms
Latency P95: 12.5 ms
Latency P99: 24.3 ms
CPU utilization: 62% average
RAM usage: 42 GB (shared_buffers + connections)
Disk write: 180 MB/sec (before compression)
Compression ratio: 12:1 (after compression policy)

Analysis:

Metric InfluxDB TimescaleDB Winner Reasoning
Raw throughput 520K pts/sec 245K pts/sec InfluxDB Purpose-built for writes
Latency P99 15.2 ms 24.3 ms InfluxDB More predictable
Compression 22:1 12:1 InfluxDB Better for storage cost
CPU efficiency 45% 62% InfluxDB Lower overhead
SQL compatibility No Yes TimescaleDB Easier adoption
Geo-spatial queries No Yes (PostGIS) TimescaleDB Critical for fleet tracking

Decision for a 100K-sensor fleet: InfluxDB wins every raw-performance metric, but TimescaleDB’s SQL compatibility and geospatial support can outweigh that gap when the workload actually needs them.

Key Insight: Benchmark YOUR workload on YOUR hardware. Marketing claims don’t capture real-world trade-offs.

Step 1: Do you need SQL joins with business data?

  • Yes → TimescaleDB (PostgreSQL compatible)
  • No → Continue to Step 2

Step 2: Is write volume > 300K points/second?

  • Yes → InfluxDB (maximum throughput)
  • No → Continue to Step 3

Step 3: Are you monitoring Kubernetes infrastructure?

  • Yes → Prometheus (native K8s integration)
  • No → Continue to Step 4

Step 4: Do you need geospatial queries?

  • Yes → TimescaleDB with PostGIS
  • No → Continue to Step 5

Step 5: Is your team already SQL-expert?

  • Yes → TimescaleDB (familiar queries)
  • No → InfluxDB (purpose-built simplicity)
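The five steps above condense into a small function. The thresholds and outcomes come straight from the decision tree; the parameter names are illustrative.

```python
# The chapter's five-step selection decision tree as a function.
# Thresholds and platform outcomes mirror Steps 1-5 above.

def select_platform(needs_sql_joins: bool,
                    writes_per_sec: int,
                    monitoring_k8s: bool,
                    needs_geospatial: bool,
                    team_knows_sql: bool) -> str:
    if needs_sql_joins:
        return "TimescaleDB"   # Step 1: joins with business data
    if writes_per_sec > 300_000:
        return "InfluxDB"      # Step 2: maximum write throughput
    if monitoring_k8s:
        return "Prometheus"    # Step 3: native K8s integration
    if needs_geospatial:
        return "TimescaleDB"   # Step 4: PostGIS geospatial queries
    return "TimescaleDB" if team_knows_sql else "InfluxDB"  # Step 5

# Factory metrics: 500K writes/s, no joins, no K8s, no geo
print(select_platform(False, 500_000, False, False, False))  # InfluxDB
```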

Real-World Example Decisions:

| Use Case | Write Vol | Joins? | Geo? | K8s? | Choice | Reasoning |
|----------|-----------|--------|------|------|--------|-----------|
| Fleet tracking | 150K/s | Yes | Yes | No | TimescaleDB | Geo + joins essential |
| Factory metrics | 500K/s | No | No | No | InfluxDB | Pure speed needed |
| Container monitoring | 200K/s | No | No | Yes | Prometheus | K8s native |
| Smart building | 50K/s | Yes | No | No | TimescaleDB | Join with tenant data |
| Weather stations | 10K/s | No | Yes | No | TimescaleDB | Geo queries for maps |

Key Insight: No single “best” database exists – but in these examples TimescaleDB wins the most IoT use cases because it combines SQL, geospatial queries, and joins.

12.5.6 Platform Selection Scoring Tool

Score your IoT project requirements against each platform using the decision factors below to see which database best fits your deployment.

Tradeoff: InfluxDB vs TimescaleDB for IoT Deployments

Option A: InfluxDB - Purpose-built time-series database

  • Write throughput: 500K+ points/second (single node)
  • Compression ratio: 20:1 to 30:1 (excellent)
  • Query language: Flux (custom, learning curve)
  • SQL joins: Not supported (pure time-series only)
  • Licensing: MIT (v2.x Community), Enterprise paid for clustering

Option B: TimescaleDB - PostgreSQL with time-series extension

  • Write throughput: 200K-300K points/second (single node)
  • Compression ratio: 10:1 to 15:1 (good)
  • Query language: Standard SQL (familiar to most teams)
  • SQL joins: Full PostgreSQL compatibility
  • Licensing: Apache 2.0 (open source), replication included

Decision Factors:

  • Choose InfluxDB when: Write volume exceeds 300K pts/sec, all data is time-series (no joins needed), team willing to learn Flux, maximum compression is critical (storage-constrained)
  • Choose TimescaleDB when: Need to join sensor data with user accounts/billing/metadata, team has strong SQL skills, using PostgreSQL ecosystem tools (pgAdmin, Grafana PostgreSQL datasource), compliance requires familiar auditable database
  • Cost comparison (5,000 sensors, 1 reading/sec, 1 year, approximate 2025 pricing): InfluxDB Cloud ~$200/month; TimescaleDB Cloud ~$300/month; self-hosted either ~$150/month (3-node cluster). Check current vendor pricing, as cloud costs change frequently.

Pitfall: Tag Cardinality Explosion in InfluxDB

The Mistake: Using high-cardinality values as tags (device UUIDs, user IDs, session tokens) instead of fields, causing memory exhaustion and query performance collapse.

Why It Happens: Tags are indexed for fast filtering, so developers instinctively make every queryable attribute a tag. They don’t realize InfluxDB stores one time-series per unique tag combination in memory. With 100,000 devices as tags, you create 100,000 in-memory series–even if each device has only one reading.

The Fix:

# WRONG: device_id as tag creates 100K series
# (InfluxDB line protocol format)
temperature,device_id=uuid-12345-abcde value=23.5
# High cardinality tag ↑

# CORRECT: device_id as field, use coarse tags
temperature,region=us-west,building=hq value=23.5,device_id="uuid-12345-abcde"
# Low cardinality tags ↑           High cardinality field ↑

Cardinality limits to remember:

  • InfluxDB OSS: ~1 million series before severe degradation
  • InfluxDB Cloud: Charged per series (100K series = ~$50/month overhead)
  • Rule of thumb: Tags should have <10,000 unique values
  • Query by high-cardinality field: Use |> filter(fn: (r) => r.device_id == "uuid-...")
  • Monitor cardinality with a short Flux script: import "influxdata/influxdb" on its own line, then influxdb.cardinality(bucket: "sensors", start: -30d)

Try it – Cardinality Impact Calculator:

How quickly does cardinality explode with multiple tags? Consider an IoT deployment with seemingly reasonable tag cardinality:

\[ \begin{aligned} \text{Total series} &= \text{regions} \times \text{buildings} \times \text{floors} \times \text{sensor\_types} \\ &= 10 \times 50 \times 10 \times 4 = 20{,}000\text{ series} \end{aligned} \]

This works. But add device UUID as a tag with 100,000 unique devices:

\[ \begin{aligned} \text{Total series} &= \text{unique tag combinations actually observed} \\ &= 100{,}000\text{ devices} \times 4\text{ sensor types} = 400{,}000\text{ series} \end{aligned} \]

Each device lives in one region/building/floor, so those tags don’t multiply further. But if you also track per-endpoint metrics or add session IDs, cardinality compounds fast. With 100K devices, 4 sensor types, and 5 metric names:

\[ 100{,}000 \times 4 \times 5 = 2{,}000{,}000\text{ series} \]

Exceeds InfluxDB OSS’s ~1M series limit by 2x! Memory overhead: ~0.5-1 KB per series for the in-memory index = 1-2 GB RAM just for the series index, growing with each new tag combination. The fix: use device_id as a field, not a tag, keeping series count at \(4 \times 5 = 20\) per coarse tag group (requires minimal RAM).
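The cardinality arithmetic above reduces to a tiny helper: worst case, series count is the product of per-tag cardinalities, and the in-memory index costs very roughly 0.5-1 KB per series (1 KB assumed here).

```python
# The chapter's cardinality math as a reusable calculator.
# Worst case assumes every tag-value combination is actually observed.
from math import prod

def series_count(tag_cardinalities):
    """Upper bound on series: product of per-tag unique-value counts."""
    return prod(tag_cardinalities)

def index_ram_gb(series, bytes_per_series=1024):
    """Rough in-memory index cost at ~1 KB per series."""
    return series * bytes_per_series / 1024**3

safe = series_count([10, 50, 10, 4])      # regions x buildings x floors x types
risky = series_count([100_000, 4, 5])     # device UUID x sensor type x metric
print(safe, risky, f"{index_ram_gb(risky):.2f} GB")
```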

Common Pitfalls

InfluxDB dominates search results and tutorials, leading many teams to adopt it for workloads where TimescaleDB (SQL familiarity, complex joins) or QuestDB (maximum ingest speed) would be better fits. Benchmark your specific combination of write rate, cardinality, and query patterns with each platform before committing.

InfluxDB stores each unique tag combination as a separate time-series in an in-memory index. A fleet of 100,000 devices with 50 metrics each creates 5 million unique series – exceeding InfluxDB’s practical cardinality limit and causing memory exhaustion. Model high-cardinality identifiers (device serial numbers) as fields, not tags.

A platform that works perfectly in a cloud VM may be undeployable on a 512 MB RAM edge gateway. InfluxDB’s in-memory cardinality index can consume gigabytes at scale. For edge deployments, evaluate SQLite with time-series extensions, QuestDB’s embedded mode, or edge data frameworks like EdgeX Foundry before committing to a cloud-centric platform.

12.6 Summary

Platform selection is one of the most important decisions in time-series database architecture.

Key Takeaways:

  1. InfluxDB: Maximum write throughput (500K+ pts/sec), excellent compression (20-30x), purpose-built for pure time-series. Choose when write volume is extreme and no relational joins are needed.

  2. TimescaleDB: PostgreSQL compatibility with 200-300K pts/sec write capacity. Choose when you need SQL joins with business data or your team has strong SQL skills.

  3. Prometheus: Kubernetes-native monitoring with built-in alerting. Choose for infrastructure monitoring with 15-30 day retention needs.

  4. Avoid cardinality explosion: In InfluxDB, keep tag cardinality under 10,000 unique values. Use fields for high-cardinality identifiers.

  5. Domain alignment matters: Industrial IoT favors InfluxDB, smart buildings favor TimescaleDB, DevOps favors Prometheus. Match platform strengths to your use case.

12.6.1 Storage Estimation by Platform

Estimate your storage requirements across all three platforms from your deployment parameters: write rate, bytes per point, compression ratio, and retention period.
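As a starting point, here is a back-of-envelope sketch of that estimate in Python. The 30-byte raw point size and per-platform compression ratios are rough assumptions drawn from this chapter, not measurements of your workload:

```python
# Back-of-envelope storage estimator: raw bytes written per month,
# divided by each platform's typical compression ratio.

SECONDS_PER_MONTH = 30 * 24 * 3600

def monthly_storage_gb(points_per_sec, raw_bytes_per_point, compression_ratio):
    raw = points_per_sec * raw_bytes_per_point * SECONDS_PER_MONTH
    return raw / compression_ratio / 1024**3

# 5,000 sensors at 1 reading/sec, ~30 raw bytes per point (assumed)
for name, ratio in [("InfluxDB", 25), ("TimescaleDB", 12), ("Prometheus", 12)]:
    print(f"{name}: {monthly_storage_gb(5_000, 30, ratio):.1f} GB/month")
```

Multiply the monthly figure by your retention period (and a replication factor, if clustered) to size disks; actual on-disk usage also includes indexes and write-ahead logs.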

The Sensor Squad needs a special diary – but which one should they choose?

Sammy the Sensor measures temperature every second. After just one day, he has 86,400 readings! He needs a diary to store them all. But there are three different diaries to choose from!

Diary 1: InfluxDB (The Speed Writer) “I can write 500,000 measurements per second!” boasts InfluxDB. “I was born to handle time data, and I compress everything really small.”

Max the Microcontroller likes this: “Perfect for our factory with thousands of sensors all talking at once!”

But Lila the LED notices a problem: “What if we need to look up which CUSTOMER owns each sensor? InfluxDB cannot combine sensor data with customer records easily.”

Diary 2: TimescaleDB (The Swiss Army Knife) “I speak SQL – the language that almost every programmer already knows!” says TimescaleDB. “And I CAN combine sensor data with customer information because I am built on top of PostgreSQL, a regular database.”

“But you are a bit slower at writing,” notes Sammy. “Only 200,000 per second instead of 500,000.”

Diary 3: Prometheus (The Guardian) “I watch over computer systems!” says Prometheus. “I check on your servers every 15 seconds and shout an alarm if anything goes wrong. I come with built-in alert rules!”

Bella the Battery notices: “But you only remember the last 15-30 days. After that, the data is gone!”

The Verdict: “Choose based on what you NEED,” says Max:

  • Tons of sensors writing super fast? InfluxDB!
  • Need to mix sensor data with business data? TimescaleDB!
  • Watching over computers and want alerts? Prometheus!

12.6.2 Try This at Home!

Think about three types of diaries: (1) A quick sticky note pad (fast to write, hard to organize – like InfluxDB), (2) A school notebook with sections (organized, you can find things – like TimescaleDB), (3) A homework planner with alarms (reminds you of deadlines – like Prometheus). Different needs, different tools!

12.7 Concept Relationships

Prerequisites – Read these first:

  • Time-Series Fundamentals – TSDB architecture (LSM trees, columnar storage)
  • Database Selection Framework – General selection criteria


12.8 What’s Next

Now that you can compare and select between InfluxDB, TimescaleDB, and Prometheus, the next steps depend on where you want to deepen your expertise.

| If you want to… | Read this next |
|------------------|----------------|
| Implement retention policies and downsampling to reduce storage by 95-98% | Data Retention and Downsampling |
| Optimize query performance with indexing and time-bucketing strategies | Query Optimization |
| Practice platform benchmarking with hands-on write performance tests | Time-Series Practice |
| Review the general database selection framework for IoT projects | Database Selection Framework |
| Explore sharding and distributed deployment for your chosen platform | Sharding Strategies |