%%{init: {'theme': 'base', 'themeVariables': {'primaryColor':'#E8F4F8','primaryTextColor':'#2C3E50','primaryBorderColor':'#16A085','lineColor':'#16A085','secondaryColor':'#FEF5E7','tertiaryColor':'#F4ECF7','edgeLabelBackground':'#ffffff','textColor':'#2C3E50','fontSize':'14px'}}}%%
flowchart TD
A[Query Request] --> B{Time Range}
B -->|Recent<br/><1 hour| C[Query Raw Data<br/>Fast, small scan]
B -->|Medium<br/>1 hour - 7 days| D{Aggregation needed?}
B -->|Old<br/>>7 days| E[Query Downsampled Data<br/>Mandatory]
D -->|Yes| F[Use Continuous Aggregates<br/>Pre-computed]
D -->|No| G[Query Raw Data<br/>Larger scan]
C --> H[Result]
F --> H
G --> H
E --> H
style A fill:#E8F4F8
style C fill:#27AE60,color:#fff
style F fill:#16A085,color:#fff
style E fill:#E67E22,color:#fff
1292 Query Optimization for IoT Time-Series
1292.1 Learning Objectives
By the end of this chapter, you will be able to:
- Write efficient queries for common IoT sensor data patterns
- Implement last-value, time-range, and anomaly detection queries
- Apply query optimization best practices for time-series workloads
- Use pagination effectively for large IoT datasets
- Balance query caching with real-time data freshness requirements
Core Concept: Efficient time-series queries always specify time ranges, use appropriate data granularity, leverage pre-computed aggregates, and limit result sets to what the application can actually render.
Why It Matters: A poorly written dashboard query can scan millions of rows and take 30+ seconds; the same query optimized with time bounds, continuous aggregates, and result limits returns in under 100ms.
Key Takeaway: Always constrain queries by time range, use downsampled data for historical analysis, leverage continuous aggregates for dashboard queries, and cache aggressively (most dashboards tolerate 10-60 second staleness).
1292.2 Query Optimization for IoT
Efficient queries are essential for real-time dashboards, anomaly detection, and historical analysis.
Core Concept: Pagination divides large query results into manageable chunks (pages), allowing clients to request data incrementally rather than retrieving millions of records in a single response.
Why It Matters: IoT systems accumulate massive datasets - a single sensor with 1-second readings generates 86,400 records daily. Without pagination, requesting “all temperature readings” for a week returns 600,000+ records, overwhelming client memory, saturating network bandwidth, and causing API timeouts. Well-designed pagination keeps response times consistent regardless of total dataset size.
Key Takeaway: For time-series IoT data, prefer cursor-based pagination using timestamps (?after=2025-01-15T10:30:00Z&limit=1000) over offset-based (?offset=50000&limit=1000) because offset pagination becomes increasingly slow as offsets grow large, while timestamp cursors leverage time-based indexes efficiently.
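A minimal sketch of a cursor page query in SQL, assuming the sensor_data table used in the examples below and an index on time (the cursor value is the timestamp of the last row of the previous page):
SELECT sensor_id, time, temperature
FROM sensor_data
WHERE time > '2025-01-15T10:30:00Z'  -- cursor from the previous page
ORDER BY time
LIMIT 1000;
Each response carries the timestamp of its final row, which the client sends back as the next cursor; the index seek makes page 500 as cheap as page 1, whereas OFFSET must walk past every skipped row. If timestamps can collide, a compound (time, sensor_id) cursor avoids skipping or duplicating rows at page boundaries.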
1292.2.1 Common Query Patterns
1292.2.1.1 Last Value Queries
Scenario: Display current sensor readings on a dashboard.
InfluxDB:
from(bucket: "iot_sensors")
|> range(start: -5m)
|> filter(fn: (r) => r["_measurement"] == "temperature")
|> last()
TimescaleDB:
SELECT DISTINCT ON (sensor_id)
sensor_id,
time,
temperature
FROM sensor_data
WHERE time > NOW() - INTERVAL '5 minutes'
ORDER BY sensor_id, time DESC;
Optimization Tips:
- Query recent time range only (last 5-10 minutes)
- Use indexes on sensor_id + time (see the index sketch below)
- Cache results for 5-30 seconds (acceptable staleness)
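A sketch of the composite index these tips assume (the table name matches the examples; TimescaleDB hypertables index time by default, so this adds the per-sensor ordering):
CREATE INDEX IF NOT EXISTS idx_sensor_data_sensor_time
ON sensor_data (sensor_id, time DESC);
With this index in place, the planner can satisfy ORDER BY sensor_id, time DESC with an ordered index scan instead of sorting every row in the 5-minute window.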
1292.2.1.2 Time-Range Aggregations
Scenario: Show hourly averages for the last 24 hours.
InfluxDB:
from(bucket: "iot_sensors")
|> range(start: -24h)
|> filter(fn: (r) => r["_measurement"] == "temperature")
|> filter(fn: (r) => r["location"] == "building_a")
|> aggregateWindow(every: 1h, fn: mean)
TimescaleDB:
SELECT
time_bucket('1 hour', time) AS hour,
AVG(temperature) as avg_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
AND location = 'building_a'
GROUP BY hour
ORDER BY hour;
Optimization Tips:
- Use continuous aggregates/downsampled data if available (see the sketch below)
- Partition tables by time (automatic in TSDBs)
- Limit result set size (frontend can’t render 10,000 points)
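A minimal sketch of the first tip in TimescaleDB, assuming sensor_data is a hypertable (the view name is illustrative):
CREATE MATERIALIZED VIEW hourly_temperature
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS hour,
    location,
    AVG(temperature) AS avg_temp
FROM sensor_data
GROUP BY hour, location;
The 24-hour dashboard query then reads a few dozen pre-computed rows instead of re-aggregating raw data:
SELECT hour, avg_temp
FROM hourly_temperature
WHERE hour > NOW() - INTERVAL '24 hours'
  AND location = 'building_a'
ORDER BY hour;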
1292.2.1.3 Anomaly Detection Queries
Scenario: Detect sensors with values 2 standard deviations above mean.
InfluxDB (a simplified variability heuristic: it flags sensors whose smoothed readings spread widely, rather than computing per-reading z-scores as the SQL version below does):
from(bucket: "iot_sensors")
|> range(start: -24h)
|> filter(fn: (r) => r["_measurement"] == "temperature")
|> aggregateWindow(every: 5m, fn: mean)
|> movingAverage(n: 12) // smooth over 12 x 5m = 1-hour windows
|> stddev() // per-sensor spread of the smoothed series
|> filter(fn: (r) => r._value > 2.0) // flag sensors with spread above 2 degrees
TimescaleDB:
WITH stats AS (
SELECT
sensor_id,
AVG(temperature) as mean_temp,
STDDEV(temperature) as stddev_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY sensor_id
)
SELECT
s.sensor_id,
s.time,
s.temperature,
st.mean_temp,
(s.temperature - st.mean_temp) / st.stddev_temp as z_score
FROM sensor_data s
JOIN stats st ON s.sensor_id = st.sensor_id
WHERE s.time > NOW() - INTERVAL '1 hour'
AND ABS((s.temperature - st.mean_temp) / st.stddev_temp) > 2.0;
Optimization Tips:
- Pre-compute statistics in continuous aggregates (see the sketch below)
- Use window functions efficiently (avoid full table scans)
- Consider streaming anomaly detection (outside the database)
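A sketch of the first tip under the same schema assumptions: materialize per-sensor hourly statistics once, so the anomaly query joins a small pre-computed view instead of recomputing AVG and STDDEV on every run (view name illustrative):
CREATE MATERIALIZED VIEW sensor_hourly_stats
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS hour,
    sensor_id,
    AVG(temperature) AS mean_temp,
    STDDEV(temperature) AS stddev_temp
FROM sensor_data
GROUP BY hour, sensor_id;
The stats CTE in the query above can then be replaced by a SELECT from sensor_hourly_stats for the relevant hour.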
1292.2.1.4 Multi-Sensor Correlations
Scenario: Find when temperature and humidity both exceed thresholds.
TimescaleDB (easier with joins):
SELECT
t.time,
t.sensor_id as temp_sensor,
h.sensor_id as humidity_sensor,
t.temperature,
h.humidity
FROM sensor_data t
JOIN sensor_data h ON
t.time = h.time AND
t.location = h.location
WHERE t.time > NOW() - INTERVAL '1 hour'
AND t.sensor_type = 'temperature'
AND h.sensor_type = 'humidity'
AND t.temperature > 30
AND h.humidity > 80;
Optimization Tips:
- Co-locate related sensors in the same partition (by location)
- Use time-bucketing to reduce join complexity
- Consider denormalized schemas (store temperature and humidity in the same row; see the sketch below)
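A hypothetical denormalized schema for the third tip, which removes the self-join entirely by storing both readings in one row per location and timestamp:
CREATE TABLE environment_data (
    time        TIMESTAMPTZ NOT NULL,
    location    TEXT NOT NULL,
    temperature DOUBLE PRECISION,
    humidity    DOUBLE PRECISION
);
The correlation query then becomes a single-table scan:
SELECT time, location, temperature, humidity
FROM environment_data
WHERE time > NOW() - INTERVAL '1 hour'
  AND temperature > 30
  AND humidity > 80;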
1292.2.2 Query Performance Best Practices
This view shows a step-by-step decision process for optimizing time-series queries:
%% fig-alt: "Query optimization decision tree showing step-by-step process. Start with query request, first check if time range specified - if no, add explicit bounds. Then check data volume - if over 10000 points, apply downsampling. Check if aggregation is pre-computed - if yes use continuous aggregate, if no compute on-the-fly. Finally check if result is cacheable - if dashboard query cache for 30 seconds, if ad-hoc return directly. Shows performance impact at each decision point."
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor':'#E8F4F8','primaryTextColor':'#2C3E50','primaryBorderColor':'#16A085','lineColor':'#16A085','secondaryColor':'#FEF5E7','tertiaryColor':'#F4ECF7','edgeLabelBackground':'#ffffff','textColor':'#2C3E50','fontSize':'14px'}}}%%
flowchart TD
Start[Query Request] --> TimeCheck{Time range<br/>specified?}
TimeCheck -->|No| AddBounds[Add explicit<br/>time bounds]
TimeCheck -->|Yes| VolumeCheck{Data volume<br/>> 10K points?}
AddBounds --> VolumeCheck
VolumeCheck -->|Yes| Downsample[Apply<br/>downsampling]
VolumeCheck -->|No| AggCheck{Pre-computed<br/>aggregate?}
Downsample --> AggCheck
AggCheck -->|Yes| UseCA[Use continuous<br/>aggregate<br/>10x faster]
AggCheck -->|No| Compute[Compute<br/>on-the-fly]
UseCA --> CacheCheck{Cacheable<br/>result?}
Compute --> CacheCheck
CacheCheck -->|Dashboard| Cache[Cache 30s<br/>Reduce load]
CacheCheck -->|Ad-hoc| Return[Return<br/>directly]
Cache --> Return
style Start fill:#E8F4F8,stroke:#2C3E50
style AddBounds fill:#E67E22,stroke:#2C3E50,color:#fff
style Downsample fill:#E67E22,stroke:#2C3E50,color:#fff
style UseCA fill:#27AE60,stroke:#2C3E50,color:#fff
style Cache fill:#16A085,stroke:#2C3E50,color:#fff
style Return fill:#2C3E50,stroke:#16A085,color:#fff
Following this decision tree systematically can improve query performance by 10-100x.
Key Principles:
- Always specify time ranges: Never query unbounded time
- Use appropriate granularity: Don’t fetch 1-second data for a 1-year chart
- Leverage indexes: Filter on indexed tags/columns first
- Limit result sets: Return max 1,000-10,000 points (downsample if needed)
- Pre-aggregate when possible: Use continuous aggregates for dashboards
- Cache aggressively: Most dashboards tolerate 10-60 second staleness
Option A: Aggressive Caching (30-60 second TTL)
- Query latency: 1-5ms (cache hit), 50-500ms (cache miss)
- Database load: 10-50 QPS (reduced from 500+ QPS)
- Infrastructure cost: $200-500/month for a Redis cache layer
- Data freshness: 30-60 seconds stale (acceptable for dashboards)
- Cache hit rate: 80-95% for dashboard queries
- Best for: real-time dashboards with 5+ users, aggregation-heavy queries
Option B: Real-Time Queries (No Caching)
- Query latency: 50-500ms consistently (depends on data volume)
- Database load: 500+ QPS for active dashboards
- Infrastructure cost: $1,000-3,000/month for a larger database instance
- Data freshness: sub-second (true real-time)
- Cache hit rate: N/A
- Best for: alerting systems, control loops, single-user ad-hoc analysis
Decision Factors:
- Choose Aggressive Caching when: the dashboard refreshes every 10-60 seconds anyway; multiple users view the same data (shared cache); queries involve expensive aggregations (time_bucket, percentiles); the database is a cost bottleneck
- Choose Real-Time when: sub-second freshness is critical (safety systems, trading); each user queries unique data (no cache benefit); query latency is already <50ms; the system is event-driven (WebSocket push rather than polling)
- Hybrid approach: cache aggregations (hourly stats) aggressively and serve the latest readings directly from the database; this balances freshness for critical metrics with efficiency for historical views (see the sketch below)
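In TimescaleDB, the hybrid approach can be sketched with a real-time continuous aggregate: a background policy keeps the hourly view materialized, while queries transparently union in raw rows newer than the last refresh. The view name and intervals below are illustrative:
SELECT add_continuous_aggregate_policy('hourly_temperature',
    start_offset      => INTERVAL '3 days',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '30 minutes');
-- Real-time aggregation appends not-yet-materialized raw data
-- to query results on the fly, keeping the latest readings fresh
ALTER MATERIALIZED VIEW hourly_temperature
    SET (timescaledb.materialized_only = false);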
1292.3 Interactive Demonstration
Explore how sampling intervals, aggregation windows, and downsampling affect IoT time-series data storage and visualization. This simulation generates 24 hours of temperature sensor data and demonstrates key time-series database concepts.
Try These Experiments:
- See the impact of sampling rate: Set interval to “1 second”, then “1 hour” - notice the 3,600x difference in points
- Compare aggregation types: With “1 hour” aggregation, toggle between mean/max/min to see how each preserves different signal characteristics
- Understand downsampling: Set aggregation to “15 minutes”, then increase downsample ratio - notice how storage drops but visual quality degrades
- Visualize retention policy: Enable the retention toggle to see how multi-tier storage reduces total footprint by 95%+
1292.4 Summary
Efficient query design is critical for responsive IoT dashboards and analytics.
Key Takeaways:
Always specify time ranges: Unbounded queries scan entire tables and can take minutes.
Use appropriate data sources: Recent data from raw tables, historical data from continuous aggregates.
Limit result sets: Frontends cannot render 100,000 points; downsample to 1,000-10,000.
Cache strategically: Use hybrid caching, caching aggregates while serving the latest readings directly.
Pre-compute common aggregations: Continuous aggregates make dashboards 10-100x faster.
Optimize for your query patterns: Last-value, time-range, anomaly detection, and correlation queries each have different optimization strategies.
1292.5 What’s Next
In the next chapter on Time-Series Practice and Labs, we’ll apply these concepts through a real-world Tesla case study, hands-on ESP32 lab, and worked examples for smart grid query optimization.