1292  Query Optimization for IoT Time-Series

1292.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Write efficient queries for common IoT sensor data patterns
  • Implement last-value, time-range, and anomaly detection queries
  • Apply query optimization best practices for time-series workloads
  • Use pagination effectively for large IoT datasets
  • Balance query caching with real-time data freshness requirements
Tip: MVU (Minimum Viable Understanding)

Core Concept: Efficient time-series queries always specify time ranges, use appropriate data granularity, leverage pre-computed aggregates, and limit result sets to what the application can actually render.

Why It Matters: A poorly written dashboard query can scan millions of rows and take 30+ seconds; the same query optimized with time bounds, continuous aggregates, and result limits returns in under 100ms.

Key Takeaway: Always constrain queries by time range, use downsampled data for historical analysis, leverage continuous aggregates for dashboard queries, and cache aggressively (most dashboards tolerate 10-60 seconds of staleness).

1292.2 Query Optimization for IoT

Estimated time: ~15 min | Difficulty: Advanced | Unit: P10.C15.U05

Efficient queries are essential for real-time dashboards, anomaly detection, and historical analysis.

Tip: Understanding Pagination for IoT Data APIs

Core Concept: Pagination divides large query results into manageable chunks (pages), allowing clients to request data incrementally rather than retrieving millions of records in a single response.

Why It Matters: IoT systems accumulate massive datasets - a single sensor with 1-second readings generates 86,400 records daily. Without pagination, requesting “all temperature readings” for a week returns 600,000+ records, overwhelming client memory, saturating network bandwidth, and causing API timeouts. Well-designed pagination keeps response times consistent regardless of total dataset size.

Key Takeaway: For time-series IoT data, prefer cursor-based pagination using timestamps (?after=2025-01-15T10:30:00Z&limit=1000) over offset-based (?offset=50000&limit=1000) because offset pagination becomes increasingly slow as offsets grow large, while timestamp cursors leverage time-based indexes efficiently.
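A minimal keyset-pagination sketch in TimescaleDB-flavored SQL, assuming the sensor_data table used in the examples later in this chapter; the literal timestamp stands in for the ?after= cursor a client would supply:

-- One page, ascending by time from the cursor
SELECT time, sensor_id, temperature
FROM sensor_data
WHERE time > '2025-01-15T10:30:00Z'   -- the ?after= cursor
ORDER BY time ASC
LIMIT 1000;
-- The client sets the next cursor to the last time value it received.
-- Note: a strict > can skip rows that share the cursor timestamp; a
-- composite (time, sensor_id) cursor avoids that edge case.

Because the predicate walks the time index forward from the cursor, page 50 costs the same as page 1; OFFSET 50000 would force the database to scan and discard 50,000 rows first.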

1292.2.1 Common Query Patterns

1292.2.1.1 Last Value Queries

Scenario: Display current sensor readings on a dashboard.

InfluxDB:

from(bucket: "iot_sensors")
  |> range(start: -5m)
  |> filter(fn: (r) => r["_measurement"] == "temperature")
  |> last()

TimescaleDB:

SELECT DISTINCT ON (sensor_id)
  sensor_id,
  time,
  temperature
FROM sensor_data
WHERE time > NOW() - INTERVAL '5 minutes'
ORDER BY sensor_id, time DESC;

Optimization Tips:

  • Query the recent time range only (last 5-10 minutes)
  • Use an index on sensor_id + time (see the sketch below)
  • Cache results for 5-30 seconds (acceptable staleness)
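A sketch of the supporting index, assuming the sensor_data table from the example above; the (sensor_id, time DESC) ordering lets the DISTINCT ON query read only the newest row per sensor:

-- Composite index matching DISTINCT ON (sensor_id) ... ORDER BY sensor_id, time DESC
CREATE INDEX IF NOT EXISTS idx_sensor_data_sensor_time
  ON sensor_data (sensor_id, time DESC);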

1292.2.1.2 Time-Range Aggregations

Scenario: Show hourly averages for the last 24 hours.

InfluxDB:

from(bucket: "iot_sensors")
  |> range(start: -24h)
  |> filter(fn: (r) => r["_measurement"] == "temperature")
  |> filter(fn: (r) => r["location"] == "building_a")
  |> aggregateWindow(every: 1h, fn: mean)

TimescaleDB:

SELECT
  time_bucket('1 hour', time) AS hour,
  AVG(temperature) as avg_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
  AND location = 'building_a'
GROUP BY hour
ORDER BY hour;

Optimization Tips:

  • Use continuous aggregates or downsampled data if available (see the sketch below)
  • Partition tables by time (automatic in time-series databases)
  • Limit result set size (a frontend can’t render 10,000 points)
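A minimal TimescaleDB continuous-aggregate sketch for the hourly query above; the view name and refresh intervals are illustrative and would be tuned per deployment:

CREATE MATERIALIZED VIEW sensor_data_hourly
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 hour', time) AS hour,
  location,
  AVG(temperature) AS avg_temp
FROM sensor_data
GROUP BY time_bucket('1 hour', time), location;

-- Refresh in the background so dashboard reads never pay the aggregation cost
SELECT add_continuous_aggregate_policy('sensor_data_hourly',
  start_offset      => INTERVAL '3 hours',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '30 minutes');

The 24-hour dashboard query then reads roughly 24 pre-computed rows per location from sensor_data_hourly instead of rescanning raw data.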

1292.2.1.3 Anomaly Detection Queries

Scenario: Detect sensors with values 2 standard deviations above mean.

InfluxDB:

data = from(bucket: "iot_sensors")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "temperature")

// Per-sensor mean and standard deviation over the last hour
// (assumes sensor_id is a tag, matching the SQL schema below)
mean = data |> mean()
sd = data |> stddev()

// Per-sensor upper control limit: mean + 2 standard deviations.
// Flagging each raw reading against this band needs one more join;
// the SQL version below expresses the same z-score test directly.
join(tables: {m: mean, s: sd}, on: ["sensor_id"])
  |> map(fn: (r) => ({r with _value: r._value_m + 2.0 * r._value_s}))

TimescaleDB:

WITH stats AS (
  SELECT
    sensor_id,
    AVG(temperature) as mean_temp,
    STDDEV(temperature) as stddev_temp
  FROM sensor_data
  WHERE time > NOW() - INTERVAL '1 hour'
  GROUP BY sensor_id
)
SELECT
  s.sensor_id,
  s.time,
  s.temperature,
  st.mean_temp,
  (s.temperature - st.mean_temp) / NULLIF(st.stddev_temp, 0) as z_score
FROM sensor_data s
JOIN stats st ON s.sensor_id = st.sensor_id
WHERE s.time > NOW() - INTERVAL '1 hour'
  AND ABS((s.temperature - st.mean_temp) / NULLIF(st.stddev_temp, 0)) > 2.0;

Optimization Tips:

  • Pre-compute statistics in continuous aggregates
  • Use window functions efficiently to avoid repeated full-table scans (see the sketch below)
  • Consider streaming anomaly detection (outside the database)
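One way to phrase a rolling baseline with window functions, as a sketch against the same assumed sensor_data table; the 12-row frame is illustrative, and wrapping it in a subquery lets you filter on the z-score:

-- Per-sensor rolling mean/stddev over the 12 preceding readings,
-- computed in a single scan; excluding the current row keeps a spike
-- from inflating its own baseline
SELECT time, sensor_id, temperature,
       (temperature - AVG(temperature) OVER w)
         / NULLIF(STDDEV(temperature) OVER w, 0) AS z_score
FROM sensor_data
WHERE time > NOW() - INTERVAL '1 hour'
WINDOW w AS (PARTITION BY sensor_id ORDER BY time
             ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING);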

1292.2.1.4 Multi-Sensor Correlations

Scenario: Find when temperature and humidity both exceed thresholds.

TimescaleDB (easier with joins):

SELECT
  t.time,
  t.sensor_id as temp_sensor,
  h.sensor_id as humidity_sensor,
  t.temperature,
  h.humidity
FROM sensor_data t
JOIN sensor_data h ON
  t.time = h.time AND
  t.location = h.location
WHERE t.time > NOW() - INTERVAL '1 hour'
  AND t.sensor_type = 'temperature'
  AND h.sensor_type = 'humidity'
  AND t.temperature > 30
  AND h.humidity > 80;

Optimization Tips:

  • Co-locate related sensors in the same partition (by location)
  • Use time-bucketing to reduce join complexity
  • Consider a denormalized schema that stores temperature and humidity in the same row (see the sketch below)
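A sketch of the denormalized alternative; the env_readings table name and column set are hypothetical and would follow the actual sensor payload:

-- One row per (time, location) carries both readings,
-- so the threshold check needs no self-join
CREATE TABLE env_readings (
  time        TIMESTAMPTZ      NOT NULL,
  location    TEXT             NOT NULL,
  temperature DOUBLE PRECISION,
  humidity    DOUBLE PRECISION
);
SELECT create_hypertable('env_readings', 'time');

SELECT time, location, temperature, humidity
FROM env_readings
WHERE time > NOW() - INTERVAL '1 hour'
  AND temperature > 30
  AND humidity > 80;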

1292.2.2 Query Performance Best Practices

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor':'#E8F4F8','primaryTextColor':'#2C3E50','primaryBorderColor':'#16A085','lineColor':'#16A085','secondaryColor':'#FEF5E7','tertiaryColor':'#F4ECF7','edgeLabelBackground':'#ffffff','textColor':'#2C3E50','fontSize':'14px'}}}%%
flowchart TD
    A[Query Request] --> B{Time Range}

    B -->|Recent<br/><1 hour| C[Query Raw Data<br/>Fast, small scan]
    B -->|Medium<br/>1 hour - 7 days| D{Aggregation needed?}
    B -->|Old<br/>>7 days| E[Query Downsampled Data<br/>Mandatory]

    D -->|Yes| F[Use Continuous Aggregates<br/>Pre-computed]
    D -->|No| G[Query Raw Data<br/>Larger scan]

    C --> H[Result]
    F --> H
    G --> H
    E --> H

    style A fill:#E8F4F8
    style C fill:#27AE60,color:#fff
    style F fill:#16A085,color:#fff
    style E fill:#E67E22,color:#fff

Figure 1292.1: Query Routing Based on Time Range and Aggregation Needs

This view shows a step-by-step decision process for optimizing time-series queries:

%% fig-alt: "Query optimization decision tree showing step-by-step process. Start with query request, first check if time range specified - if no, add explicit bounds. Then check data volume - if over 10000 points, apply downsampling. Check if aggregation is pre-computed - if yes use continuous aggregate, if no compute on-the-fly. Finally check if result is cacheable - if dashboard query cache for 30 seconds, if ad-hoc return directly. Shows performance impact at each decision point."
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor':'#E8F4F8','primaryTextColor':'#2C3E50','primaryBorderColor':'#16A085','lineColor':'#16A085','secondaryColor':'#FEF5E7','tertiaryColor':'#F4ECF7','edgeLabelBackground':'#ffffff','textColor':'#2C3E50','fontSize':'14px'}}}%%
flowchart TD
    Start[Query Request] --> TimeCheck{Time range<br/>specified?}

    TimeCheck -->|No| AddBounds[Add explicit<br/>time bounds]
    TimeCheck -->|Yes| VolumeCheck{Data volume<br/>> 10K points?}
    AddBounds --> VolumeCheck

    VolumeCheck -->|Yes| Downsample[Apply<br/>downsampling]
    VolumeCheck -->|No| AggCheck{Pre-computed<br/>aggregate?}
    Downsample --> AggCheck

    AggCheck -->|Yes| UseCA[Use continuous<br/>aggregate<br/>10x faster]
    AggCheck -->|No| Compute[Compute<br/>on-the-fly]

    UseCA --> CacheCheck{Cacheable<br/>result?}
    Compute --> CacheCheck

    CacheCheck -->|Dashboard| Cache[Cache 30s<br/>Reduce load]
    CacheCheck -->|Ad-hoc| Return[Return<br/>directly]
    Cache --> Return

    style Start fill:#E8F4F8,stroke:#2C3E50
    style AddBounds fill:#E67E22,stroke:#2C3E50,color:#fff
    style Downsample fill:#E67E22,stroke:#2C3E50,color:#fff
    style UseCA fill:#27AE60,stroke:#2C3E50,color:#fff
    style Cache fill:#16A085,stroke:#2C3E50,color:#fff
    style Return fill:#2C3E50,stroke:#16A085,color:#fff

Following this decision tree systematically can improve query performance by 10-100x.

Key Principles:

  1. Always specify time ranges: Never query unbounded time
  2. Use appropriate granularity: Don’t fetch 1-second data for a 1-year chart
  3. Leverage indexes: Filter on indexed tags/columns first
  4. Limit result sets: Return max 1,000-10,000 points, downsampling if needed (see the sketch after this list)
  5. Pre-aggregate when possible: Use continuous aggregates for dashboards
  6. Cache aggressively: Most dashboards tolerate 10-60 second staleness
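Principles 2 and 4 in practice, as a sketch: derive the bucket width from the requested range so the result stays near the render budget (the 1,000-point target and sensor ID are illustrative):

-- One year at ~1,000 points: 365 days x 24 h / 1,000 ≈ 9-hour buckets
SELECT time_bucket('9 hours', time) AS bucket,
       AVG(temperature) AS avg_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '1 year'
  AND sensor_id = 'sensor_42'        -- hypothetical sensor ID
GROUP BY bucket
ORDER BY bucket;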
Warning: Tradeoff: Query Caching vs Real-Time Data Freshness

Option A: Aggressive Caching (30-60 second TTL)

  • Query latency: 1-5ms (cache hit), 50-500ms (cache miss)
  • Database load: 10-50 QPS (reduced from 500+ QPS)
  • Infrastructure cost: $200-500/month for a Redis cache layer
  • Data freshness: 30-60 seconds stale (acceptable for dashboards)
  • Cache hit rate: 80-95% for dashboard queries
  • Best for: real-time dashboards with 5+ users, aggregation-heavy queries

Option B: Real-Time Queries (No Caching)

  • Query latency: 50-500ms consistently (depends on data volume)
  • Database load: 500+ QPS for active dashboards
  • Infrastructure cost: $1,000-3,000/month for a larger database instance
  • Data freshness: sub-second (true real-time)
  • Cache hit rate: N/A
  • Best for: alerting systems, control loops, single-user ad-hoc analysis

Decision Factors:

  • Choose aggressive caching when: the dashboard refreshes every 10-60 seconds anyway; multiple users view the same data (shared cache); the query involves expensive aggregations (time_bucket, percentiles); the database is a cost bottleneck
  • Choose real-time when: sub-second freshness is critical (safety systems, trading); each user queries unique data (no cache benefit); query latency is already <50ms; the system is event-driven (WebSocket push vs polling)
  • Hybrid approach: cache aggregations (hourly stats) aggressively and serve the latest readings directly from the database; this balances freshness for critical metrics with efficiency for historical views (see the sketch below)
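TimescaleDB's real-time aggregation mode implements this hybrid directly: queries against the continuous aggregate union the materialized buckets with the not-yet-materialized raw rows, so history stays cheap while the newest data stays fresh. A sketch against the sensor_data_hourly view from earlier:

-- Serve materialized history plus the live tail from one view
ALTER MATERIALIZED VIEW sensor_data_hourly
  SET (timescaledb.materialized_only = false);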

1292.3 Interactive Demonstration

Explore how sampling intervals, aggregation windows, and downsampling affect IoT time-series data storage and visualization. This simulation generates 24 hours of temperature sensor data and demonstrates key time-series database concepts.

Try These Experiments:

  1. See the impact of sampling rate: Set interval to “1 second”, then “1 hour” - notice the 3,600x difference in points
  2. Compare aggregation types: With “1 hour” aggregation, toggle between mean/max/min to see how each preserves different signal characteristics (see the SQL sketch after this list)
  3. Understand downsampling: Set aggregation to “15 minutes”, then increase downsample ratio - notice how storage drops but visual quality degrades
  4. Visualize retention policy: Enable the retention toggle to see how multi-tier storage reduces total footprint by 95%+
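Experiment 2 can also be reproduced in SQL; a sketch that computes the three aggregates side by side over the same assumed sensor_data table:

-- mean smooths spikes, max preserves peaks, min preserves troughs
SELECT time_bucket('1 hour', time) AS hour,
       AVG(temperature) AS mean_temp,
       MAX(temperature) AS max_temp,
       MIN(temperature) AS min_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;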

1292.4 Summary

Efficient query design is critical for responsive IoT dashboards and analytics.

Key Takeaways:

  1. Always specify time ranges: Unbounded queries scan entire tables and can take minutes.

  2. Use appropriate data sources: Recent data from raw tables, historical data from continuous aggregates.

  3. Limit result sets: Frontends cannot render 100,000 points; downsample to 1,000-10,000.

  4. Cache strategically: Use hybrid caching: cache aggregates, serve latest readings directly.

  5. Pre-compute common aggregations: Continuous aggregates make dashboards 10-100x faster.

  6. Optimize for your query patterns: Last-value, time-range, anomaly detection, and correlation queries each have different optimization strategies.

1292.5 What’s Next

In the next chapter on Time-Series Practice and Labs, we’ll apply these concepts through a real-world Tesla case study, hands-on ESP32 lab, and worked examples for smart grid query optimization.