%%{init: {'theme': 'base', 'themeVariables': {'primaryColor':'#E8F4F8','primaryTextColor':'#2C3E50','primaryBorderColor':'#16A085','lineColor':'#16A085','secondaryColor':'#FEF5E7','tertiaryColor':'#F4ECF7','edgeLabelBackground':'#ffffff','textColor':'#2C3E50','fontSize':'14px'}}}%%
flowchart TD
A[Vehicle Sensors<br/>300M points/sec] --> B{On-Vehicle Filtering}
B -->|Critical Safety| C[Immediate Upload<br/>100% data]
B -->|Anomaly Detected| D[Burst Upload<br/>High-res context]
B -->|Normal Operation| E[Aggregate On-Vehicle<br/>1-minute averages]
C --> F[Tesla Cloud<br/>Time-Series DB]
D --> F
E --> F
F --> G[Hot Storage<br/>7 days full resolution]
G --> H[Warm Storage<br/>30 days, 1-min aggregates]
H --> I[Cold Storage<br/>1 year, 1-hour aggregates]
I --> J[Archive<br/>S3 Glacier, daily aggregates]
style A fill:#E8F4F8
style C fill:#E74C3C,color:#fff
style D fill:#E67E22,color:#fff
style E fill:#16A085,color:#fff
style F fill:#2C3E50,color:#fff
1291 Time-Series Practice and Labs
1291.1 Learning Objectives
By the end of this chapter, you will be able to:
- Apply time-series concepts through hands-on ESP32 lab exercises
- Analyze real-world scale challenges from the Tesla fleet telemetry case study
- Design complete time-series storage strategies using worked examples
- Implement circular buffers and downsampling on resource-constrained devices
- Calculate NTP synchronization intervals for distributed IoT deployments
Core Concept: Real-world time-series systems combine edge processing, adaptive sampling, and multi-tier retention to handle extreme scale while maintaining analytical capability.

Why It Matters: Tesla's 300 million points/second demonstrates that naive approaches fail catastrophically; only through edge aggregation, adaptive sampling, and tiered retention can such systems become practical.

Key Takeaway: Start with standard tools (InfluxDB, TimescaleDB), implement retention from day one, and add edge processing when cloud ingestion becomes a bottleneck.
1291.2 Real-World Case Study: Tesla Fleet Telemetry
Tesla operates one of the world’s largest IoT time-series systems, collecting data from over 1.5 million vehicles worldwide.
1291.2.1 Scale Challenge
Data Generation:
- Vehicles: 1.5 million active vehicles
- Sensors per vehicle: ~200 sensors (battery, motors, HVAC, GPS, cameras, etc.)
- Sampling rate: 1 Hz average (some sensors faster, some slower)
- Data points per second: 1.5M vehicles x 200 sensors x 1 Hz = 300 million points/second
- Daily data: 300M x 86,400 seconds = 25.9 trillion points/day
Storage Requirements (without optimization):
- Raw: 25.9T points x 32 bytes = 829 TB/day
- Annual: 829 TB/day x 365 = 302 PB/year
Storing this volume raw is economically infeasible. Tesla's approach:
1291.2.2 Tesla’s Time-Series Strategy
Key Optimizations:
- Edge Aggregation: Vehicles pre-process 90% of data locally
  - Only upload anomalies and aggregates
  - Reduces cloud ingestion to ~30M points/second (10x reduction)
- Adaptive Sampling: Sample rates adjust based on context (see the sketch after this list)
  - Parked: Sample every 5 minutes
  - Driving: Sample every second
  - Hard braking: Sample at 100 Hz for 10 seconds
- Multi-Tier Retention:
  - Hot (7 days): Full resolution for recent analysis
  - Warm (30 days): Downsampled for trend analysis
  - Cold (1 year): Aggregates for long-term patterns
  - Archive: Compliance and model training
- Custom Time-Series Engine:
  - Tesla built custom infrastructure (not off-the-shelf)
  - Columnar storage with extreme compression (50:1 ratios)
  - Distributed across data centers globally
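To make the adaptive-sampling idea concrete, here is a minimal C++ sketch that maps a vehicle state to a sampling interval using the rates quoted above. The state names, rates, and function name are illustrative assumptions, not Tesla's actual implementation.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical vehicle states used only for this illustration.
enum class VehicleState { Parked, Driving, HardBraking };

// Return the sampling interval in milliseconds for a given state,
// mirroring the rates above: every 5 minutes parked, every second
// driving, 100 Hz (10 ms) during a hard-braking burst.
uint32_t sampleIntervalMs(VehicleState state) {
    switch (state) {
        case VehicleState::Parked:      return 5u * 60u * 1000u;
        case VehicleState::Driving:     return 1000u;
        case VehicleState::HardBraking: return 10u;
    }
    return 1000u;  // defensive default
}

int main() {
    printf("Parked:       %u ms\n", (unsigned)sampleIntervalMs(VehicleState::Parked));
    printf("Driving:      %u ms\n", (unsigned)sampleIntervalMs(VehicleState::Driving));
    printf("Hard braking: %u ms\n", (unsigned)sampleIntervalMs(VehicleState::HardBraking));
}
```

The same pattern applies to any edge device: derive the sampling interval from current context instead of using one fixed rate.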
Results:
- Actual storage: ~15 PB/year (vs. 302 PB raw)
- Query latency: <100 ms for recent data analysis
- Powers Autopilot improvements, range predictions, and battery health monitoring
Lessons for IoT Architects:
- Edge processing is essential at scale: Don’t send everything to the cloud
- Adaptive strategies: Sample rates and retention policies should match data value
- Domain-specific compression: Tesla’s battery telemetry compresses 100:1 because voltage changes slowly
- Start with standard tools: Use InfluxDB or TimescaleDB initially, only build custom if you reach their limits
1291.3 Understanding Check
You’re deploying an IoT system for a 20-story office building with the following sensor network:
- Temperature sensors: 500 sensors (1 per room), report every 10 seconds
- Occupancy sensors: 500 sensors (motion detection), report on change (avg 1/minute)
- Energy meters: 50 meters (per floor + equipment), report every 30 seconds
- Air quality sensors: 100 sensors (CO2, VOC), report every 60 seconds
Each reading is 32 bytes (timestamp + sensor_id + value + metadata).
1291.3.1 Questions:
1. How many data points per day does this system generate?
2. What is the raw storage requirement per month (without compression)?
3. If you implement this retention policy, what is the storage after 1 year?
   - Tier 1: Raw data, 7 days retention
   - Tier 2: 1-minute averages, 30 days retention
   - Tier 3: 1-hour averages, 1 year retention
   - Tier 4: Daily aggregates, forever
4. Which database would you recommend and why?
1291.3.2 Solutions:
1. Data points per day:
- Temperature: 500 sensors x (86,400 sec/day / 10 sec) = 4,320,000 points/day
- Occupancy: 500 sensors x (1,440 min/day / 1 min) = 720,000 points/day
- Energy: 50 meters x (86,400 sec/day / 30 sec) = 144,000 points/day
- Air quality: 100 sensors x (86,400 sec/day / 60 sec) = 144,000 points/day
Total: 5,328,000 points/day
2. Raw storage per month:
- Daily: 5,328,000 points x 32 bytes = 170.5 MB/day
- Monthly: 170.5 MB x 30 = 5.1 GB/month raw
3. Storage after 1 year with retention policy:
Tier 1 (Raw, 7 days):
- 5,328,000 points/day x 7 days x 32 bytes = 1.19 GB
- With 10:1 compression: 119 MB

Tier 2 (1-minute averages, 30 days):
- Original: 5,328,000 points/day -> Downsampled: ~88,800 points/day (1-minute buckets)
- 88,800 x 30 days x 32 bytes = 85.2 MB
- With 10:1 compression: 8.5 MB

Tier 3 (1-hour averages, 1 year):
- 88,800 points/day -> 1,480 points/day (1-hour buckets)
- 1,480 x 365 days x 32 bytes = 17.3 MB
- With 10:1 compression: 1.7 MB

Tier 4 (Daily aggregates, forever):
- 1,480 points/day -> ~25 points/day (daily buckets, multiple aggregations)
- 25 x 365 days x 32 bytes = 0.3 MB/year
- Negligible growth
Total storage after 1 year: ~130 MB (a verification sketch follows these solutions)
vs. no retention policy: 5.1 GB/month x 12 = 61.2 GB raw (6.1 GB compressed)
Savings: 98% reduction
4. Database recommendation:
Recommended: TimescaleDB
Reasoning:
- Write throughput is manageable: 5.3M points/day / 86,400 seconds = ~62 writes/second (well within TimescaleDB capacity)
- Need for correlations: Building management systems need to join sensor data with:
  - Room assignments (which tenant, department)
  - Energy billing data
  - Maintenance schedules
  - Occupancy reservations
- SQL compatibility: Facilities team likely familiar with SQL, easier integration with existing building management software
- PostgreSQL ecosystem: Rich tooling for dashboards (Grafana), reporting, and analytics
Alternative: InfluxDB would work if:
- Write rates increased 10x (more sensors added)
- No need to correlate with relational business data
- Team willing to learn the Flux query language
Not recommended: Prometheus, which is designed for short-term infrastructure monitoring, not multi-year IoT data retention.
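If you want to double-check the arithmetic for questions 1-3, this short C++ sketch recomputes the tiered totals under the same assumptions used above (32-byte readings, 10:1 compression, and the downsampled point counts from the solution). It is only a verification aid.

```cpp
#include <cstdio>

int main() {
    const double bytesPerPoint = 32.0;
    const double compression   = 10.0;       // assumed 10:1 compression
    const double rawPerDay     = 5328000.0;  // points/day from Question 1
    const double minutePerDay  = 88800.0;    // 1-minute buckets, as used above
    const double hourPerDay    = 1480.0;     // 1-hour buckets
    const double dailyPerDay   = 25.0;       // daily aggregates

    const double MB = 1e6;
    double tier1 = rawPerDay    * 7   * bytesPerPoint / compression;  // raw, 7 days
    double tier2 = minutePerDay * 30  * bytesPerPoint / compression;  // 30 days
    double tier3 = hourPerDay   * 365 * bytesPerPoint / compression;  // 1 year
    double tier4 = dailyPerDay  * 365 * bytesPerPoint;                // tiny, uncompressed

    printf("Tier 1: %.1f MB\n", tier1 / MB);
    printf("Tier 2: %.1f MB\n", tier2 / MB);
    printf("Tier 3: %.1f MB\n", tier3 / MB);
    printf("Tier 4: %.1f MB\n", tier4 / MB);
    printf("Total : %.1f MB\n", (tier1 + tier2 + tier3 + tier4) / MB);
}
```

Running it reproduces the ~130 MB total claimed in solution 3.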
1291.5 Worked Example: Time-Series Query Optimization for Smart Grid
Scenario: A utility company operates a smart grid with 10,000 smart meters, each reporting energy consumption every 15 minutes. The operations team needs dashboards showing:
- Real-time consumption (last hour, per-meter detail)
- Daily peak demand (last 30 days, aggregated by region)
- Monthly trends (last 12 months, company-wide)
Goal: Design a query and storage strategy that keeps dashboard latency under 2 seconds while minimizing storage costs.
What we do: Estimate raw data ingestion and storage requirements.
Calculations:
Meters: 10,000
Readings per meter per day: 24 hours x 4 readings/hour = 96
Total readings per day: 10,000 x 96 = 960,000
Bytes per reading: ~50 bytes (timestamp + meter_id + kWh + voltage + metadata)
Daily raw data: 960,000 x 50 = 48 MB/day
Annual raw data: 48 MB x 365 = 17.5 GB/year (uncompressed)
Why: Data volume determines partition strategy, retention policies, and hardware requirements. At 960K writes/day (about 11 writes/second on average), even a basic TSDB handles ingestion easily, but dashboards querying across 350M+ annual rows need optimization.
What we do: Configure time-based partitioning to isolate recent vs historical data.
TimescaleDB Configuration:
-- Create hypertable with 1-day chunks
CREATE TABLE meter_readings (
time TIMESTAMPTZ NOT NULL,
meter_id INTEGER NOT NULL,
region_id INTEGER NOT NULL,
kwh DOUBLE PRECISION,
voltage DOUBLE PRECISION
);
SELECT create_hypertable('meter_readings', 'time',
chunk_time_interval => INTERVAL '1 day');
-- Add indexes for common query patterns
CREATE INDEX idx_meter_time ON meter_readings (meter_id, time DESC);
CREATE INDEX idx_region_time ON meter_readings (region_id, time DESC);

Why: Daily chunks mean queries for "last hour" scan only one partition (~40K rows) instead of the entire table. The meter_id and region_id indexes accelerate filtered queries without excessive write overhead.
What we do: Pre-compute hourly and daily aggregates to accelerate dashboard queries.
Continuous Aggregate Setup:
-- Hourly aggregates (for daily peak analysis)
CREATE MATERIALIZED VIEW meter_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', time) AS hour,
region_id,
COUNT(*) as reading_count,
AVG(kwh) as avg_kwh,
MAX(kwh) as peak_kwh,
SUM(kwh) as total_kwh
FROM meter_readings
GROUP BY hour, region_id
WITH NO DATA;
-- Refresh policy: update every 15 minutes, cover last 2 hours
SELECT add_continuous_aggregate_policy('meter_hourly',
start_offset => INTERVAL '2 hours',
end_offset => INTERVAL '15 minutes',
schedule_interval => INTERVAL '15 minutes');
-- Daily aggregates (for monthly trend analysis)
CREATE MATERIALIZED VIEW meter_daily
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 day', time) AS day,
region_id,
AVG(kwh) as avg_kwh,
MAX(kwh) as daily_peak_kwh,
SUM(kwh) as total_kwh
FROM meter_readings
GROUP BY day, region_id
WITH NO DATA;
SELECT add_continuous_aggregate_policy('meter_daily',
start_offset => INTERVAL '3 days',
end_offset => INTERVAL '1 day',
schedule_interval => INTERVAL '1 day');

Why: Continuous aggregates pre-compute results incrementally. A 30-day peak demand query now scans 30 rows per region (720 total for 24 regions) instead of 28.8M raw readings, a 40,000x reduction.
What we do: Implement tiered retention to balance detail vs storage cost.
Retention Policy:
-- Compress data older than 7 days (10:1 compression typical)
ALTER TABLE meter_readings SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'meter_id',
timescaledb.compress_orderby = 'time DESC'
);
SELECT add_compression_policy('meter_readings', INTERVAL '7 days');
-- Drop raw data older than 90 days (aggregates retained)
SELECT add_retention_policy('meter_readings', INTERVAL '90 days');
-- Keep hourly aggregates for 2 years
SELECT add_retention_policy('meter_hourly', INTERVAL '2 years');
-- Keep daily aggregates for 10 years
SELECT add_retention_policy('meter_daily', INTERVAL '10 years');

Why: This tiered approach provides:
- Last 7 days: Full resolution, uncompressed (fast per-meter queries)
- 7-90 days: Full resolution, compressed (10x storage reduction)
- 90+ days: Only aggregates retained (99% storage reduction)
What we do: Write queries that leverage partitions and aggregates.
Optimized Queries:
-- Dashboard 1: Real-time consumption (last hour)
-- Scans: ~40K rows from current chunk
SELECT meter_id, time, kwh
FROM meter_readings
WHERE time > NOW() - INTERVAL '1 hour'
AND region_id = 5
ORDER BY time DESC;
-- Latency: ~50ms
-- Dashboard 2: Daily peak demand (last 30 days)
-- Scans: ~720 rows from hourly aggregate
SELECT date_trunc('day', hour) AS day,
       MAX(peak_kwh) AS daily_peak
FROM meter_hourly
WHERE hour > NOW() - INTERVAL '30 days'
GROUP BY day
ORDER BY day;
-- Latency: ~20ms
-- Dashboard 3: Monthly trends (last 12 months)
-- Scans: ~288 rows from daily aggregate
SELECT date_trunc('month', day) as month,
SUM(total_kwh) as monthly_consumption
FROM meter_daily
WHERE day > NOW() - INTERVAL '12 months'
GROUP BY month
ORDER BY month;
-- Latency: ~15ms

Why: Each query hits the appropriate data tier: raw data for recent detail, hourly aggregates for medium-term analysis, daily aggregates for long-term trends. All queries complete in under 100ms.
Outcome: All three dashboard queries complete in under 100ms (well under the 2-second requirement).
Storage Summary:

| Data Tier | Retention | Size (Year 1) | Size (Year 5) |
|---|---|---|---|
| Raw (uncompressed) | 7 days | 336 MB | 336 MB |
| Raw (compressed) | 90 days | 430 MB | 430 MB |
| Hourly aggregates | 2 years | 15 MB | 30 MB |
| Daily aggregates | 10 years | 0.5 MB | 2.5 MB |
| Total | - | ~780 MB | ~800 MB |
Comparison without optimization: 87.5 GB after 5 years (109x more storage).
Key Decisions Made:
1. Daily chunks: Isolates recent data for fast queries
2. Continuous aggregates: Pre-computes common dashboard queries
3. Tiered retention: Keeps detail where needed, aggregates everywhere else
4. Compression after 7 days: Balances query speed vs storage
5. Index strategy: region_id + time indexes match query patterns
1291.6 Worked Examples: Time Synchronization
Scenario: A fleet management system tracks 5,000 vehicles using GPS sensors. Each vehicle reports position data to a central time-series database (InfluxDB). Due to cellular network variability, GPS timestamps from vehicles can drift from server time. The operations team needs to ensure data can be correctly ordered despite clock discrepancies.
Given: - Number of vehicles: 5,000 - Reporting interval: 10 seconds per vehicle - Vehicle GPS clock accuracy: +/-2 seconds (cellular NTP sync) - Server clock accuracy: +/-50 ms (GPS-disciplined NTP) - Data retention window for real-time view: 1 hour - Out-of-order arrival tolerance: Up to 30 seconds late
Steps:

1. Calculate timestamp uncertainty between any two vehicles:
   - Vehicle A clock: +/-2 seconds
   - Vehicle B clock: +/-2 seconds
   - Combined uncertainty = +/-4 seconds (worst case: A is +2s, B is -2s)
2. Determine minimum sampling interval for unambiguous ordering:
   - Events must be >4 seconds apart to guarantee correct ordering
   - Current interval (10 seconds) > 4 seconds: OK for ordering
3. Configure InfluxDB write buffer for late arrivals:
   - Maximum expected lateness: 30 seconds
   - Set `cache-max-memory-size` to handle 30 seconds of pending writes
   - Buffer size = 5,000 vehicles x 3 readings (30s / 10s) x 100 bytes = 1.5 MB
4. Design timestamp handling strategy:
   - Primary timestamp: Vehicle GPS timestamp (event time)
   - Secondary timestamp: Server receipt time (processing time)
   - Store both: `time` (GPS) and `received_at` (server)
   - Query by GPS time for correct route reconstruction
5. Configure retention policy for clock drift tolerance:
   - Hot tier: 1 hour at full resolution (for real-time dashboards)
   - Use 5-second `GROUP BY time()` windows to absorb +/-2 second drift variations
Result: Configure InfluxDB with 1.5 MB write cache, dual timestamps, and 5-second aggregation buckets. This absorbs +/-4 second clock drift while correctly ordering 98.5% of position updates.
Key Insight: In distributed IoT systems, clock drift is inevitable. Design your data model to store both event time (when it happened) and ingestion time (when you received it). Query by event time for analytics but use ingestion time for troubleshooting data pipeline issues.
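As a minimal sketch of this dual-timestamp approach, the C++ snippet below defines a position record that carries both the GPS event time and the server receipt time, and orders records by event time for route reconstruction. The struct and function names are illustrative, not part of any InfluxDB client API.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative record: both timestamps are stored, as recommended above.
struct PositionUpdate {
    uint32_t vehicleId;
    int64_t  eventTimeMs;   // GPS timestamp (when the fix was taken)
    int64_t  receivedAtMs;  // server receipt time (when it arrived)
    double   lat;
    double   lon;
};

// Order updates by event time for route reconstruction; ties are broken
// by receipt time so late duplicates stay deterministic.
void sortByEventTime(std::vector<PositionUpdate>& updates) {
    std::sort(updates.begin(), updates.end(),
              [](const PositionUpdate& a, const PositionUpdate& b) {
                  if (a.eventTimeMs != b.eventTimeMs)
                      return a.eventTimeMs < b.eventTimeMs;
                  return a.receivedAtMs < b.receivedAtMs;
              });
}

int main() {
    std::vector<PositionUpdate> updates = {
        {42, 1700000010000, 1700000012500, 37.7749, -122.4194},
        {42, 1700000000000, 1700000013000, 37.7740, -122.4190},  // arrived late
    };
    sortByEventTime(updates);  // route order now follows GPS event time
    return 0;
}
```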
Scenario: A smart agriculture company deploys 200 edge gateways across remote farms. Each gateway aggregates data from 50 soil sensors and forwards to the cloud every 5 minutes. The gateways have low-cost oscillators and intermittent cellular connectivity. The team must design NTP synchronization to ensure timestamp accuracy for cross-farm analysis.
Given: - Edge gateways: 200 devices - Gateway oscillator drift: +/-100 ppm (low-cost crystal) - Cellular connectivity: Available 80% of time (intermittent) - NTP round-trip time: 200-500 ms (cellular network) - Required timestamp accuracy: +/-500 ms for cross-farm comparison - Data upload interval: 5 minutes (300 seconds)
Steps:

1. Calculate maximum drift between NTP syncs:
   - Drift rate = 100 ppm = 100 microseconds per second = 0.1 ms/s
   - Over 5-minute upload interval: 0.1 ms/s x 300 s = 30 ms drift
   - Over 1 hour without sync: 0.1 ms/s x 3,600 s = 360 ms drift
2. Calculate NTP sync accuracy limits:
   - Network RTT asymmetry: Assume 10% asymmetry
   - Asymmetry error = RTT x asymmetry = 350 ms x 10% = 35 ms
   - Best achievable NTP accuracy: ~50-100 ms over cellular
3. Determine maximum sync interval:
   - Error budget: 500 ms target accuracy
   - NTP sync error: 100 ms (typical)
   - Available for drift: 500 - 100 = 400 ms
   - Max sync interval = 400 ms / 0.1 ms/s = 4,000 seconds (~66 minutes)
4. Account for connectivity gaps:
   - 20% downtime means potentially 20% longer gaps between syncs
   - Apply safety factor: 66 min x 0.8 = 53 minutes recommended
   - Round to 30 minutes for practical implementation
5. Configure NTP client for intermittent connectivity:
   - Primary: Cloud NTP server (time.google.com)
   - Secondary: GPS time from connected sensors (if available)
   - Retry interval on failure: 5 minutes
   - Maximum poll interval: 30 minutes
   - Store last-known offset for gap periods
Result: Configure edge gateways to sync NTP every 30 minutes. During connectivity gaps up to 66 minutes, clocks remain within 400 ms accuracy. Combined with NTP sync error (~100 ms), total accuracy stays within 500 ms target.
Key Insight: Low-cost IoT devices with 100 ppm oscillators need hourly NTP syncs for sub-second accuracy. For tighter requirements (<100 ms), either upgrade to TCXO oscillators (+/-2 ppm) or implement GPS-disciplined timing at edge gateways.
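The following C++ sketch recomputes the drift budget and maximum sync interval from the stated assumptions (100 ppm drift, 500 ms target accuracy, ~100 ms typical NTP error over cellular). It is a sanity check on the arithmetic above, not gateway firmware.

```cpp
#include <cstdio>

int main() {
    const double driftPpm         = 100.0;              // low-cost crystal
    const double driftMsPerSec    = driftPpm / 1000.0;  // 100 ppm = 0.1 ms/s
    const double targetAccuracyMs = 500.0;              // cross-farm requirement
    const double ntpErrorMs       = 100.0;              // typical over cellular

    double driftBudgetMs  = targetAccuracyMs - ntpErrorMs;  // 400 ms
    double maxSyncSeconds = driftBudgetMs / driftMsPerSec;  // 4,000 s
    double withMarginMin  = (maxSyncSeconds / 60.0) * 0.8;  // 20% downtime margin

    printf("Drift budget:       %.0f ms\n", driftBudgetMs);
    printf("Max sync interval:  %.0f s (%.0f min)\n", maxSyncSeconds, maxSyncSeconds / 60.0);
    printf("With safety factor: %.0f min (rounded to 30 min in practice)\n", withMarginMin);
}
```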
1291.7 Hands-On Lab: Time-Series Data Logger
In this hands-on lab, you will build a time-series data logger using an ESP32 microcontroller. You will learn how to collect timestamped sensor data, implement efficient storage using circular buffers, apply downsampling techniques, and query historical data through serial commands.
1291.7.1 Learning Objectives
By completing this lab, you will be able to:
- Collect timestamped sensor data from multiple sensors on an ESP32
- Implement a circular buffer for efficient memory management on resource-constrained devices
- Apply downsampling techniques to reduce storage requirements while preserving data trends
- Query historical data using custom serial commands
- Understand the trade-offs between data resolution and storage capacity
1291.7.2 Prerequisites
- Basic C/C++ programming knowledge
- Familiarity with Arduino IDE concepts
- Understanding of time-series data concepts (covered earlier in this chapter)
1291.7.3 Wokwi Simulator
Use the embedded simulator below to complete this lab. The ESP32 environment comes pre-configured with essential libraries.
- Click inside the simulator and press Ctrl+Shift+M (or Cmd+Shift+M on Mac) to open the Serial Monitor
- Use the temperature sensor on the virtual breadboard or simulate readings with the built-in random values
- You can save your project to Wokwi by creating a free account
1291.7.4 Step-by-Step Instructions
1291.7.4.1 Step 1: Set Up the Basic Data Structure
First, define the data structures for storing timestamped sensor readings. Copy this code into the simulator:
#include <Arduino.h>
#include <time.h>
// Configuration constants
#define BUFFER_SIZE 100 // Number of readings to store
#define SAMPLE_INTERVAL 1000 // Sample every 1 second (ms)
#define DOWNSAMPLE_FACTOR 5 // Average every 5 readings for storage
// Data point structure - 12 bytes per reading
struct DataPoint {
uint32_t timestamp; // Unix timestamp (seconds since epoch)
float temperature; // Temperature in Celsius
float humidity; // Relative humidity percentage
};
// Circular buffer for raw data (high resolution)
DataPoint rawBuffer[BUFFER_SIZE];
int rawHead = 0;
int rawCount = 0;
// Circular buffer for downsampled data (long-term storage)
DataPoint downsampledBuffer[BUFFER_SIZE];
int dsHead = 0;
int dsCount = 0;
// Accumulator for downsampling
float tempAccum = 0;
float humidAccum = 0;
int accumCount = 0;
// Timing
unsigned long lastSampleTime = 0;
uint32_t startTime = 0;

1291.7.4.2 Step 2: Implement the Circular Buffer Operations
Add these functions to manage the circular buffer efficiently:
// Add a data point to a circular buffer
void addToBuffer(DataPoint* buffer, int& head, int& count,
DataPoint point, int maxSize) {
buffer[head] = point;
head = (head + 1) % maxSize;
if (count < maxSize) {
count++;
}
}
// Get the index of a point at a given offset from newest
int getIndex(int head, int count, int offset, int maxSize) {
if (offset >= count) return -1;
return (head - 1 - offset + maxSize) % maxSize;
}
// Calculate storage usage
void printStorageStats() {
int rawBytes = rawCount * sizeof(DataPoint);
int dsBytes = dsCount * sizeof(DataPoint);
Serial.println("\n=== Storage Statistics ===");
Serial.printf("Raw buffer: %d/%d points (%d bytes)\n",
rawCount, BUFFER_SIZE, rawBytes);
Serial.printf("Downsampled: %d/%d points (%d bytes)\n",
dsCount, BUFFER_SIZE, dsBytes);
Serial.printf("Total memory: %d bytes\n", rawBytes + dsBytes);
Serial.printf("Compression ratio: %.1fx\n",
rawCount > 0 ? (float)(rawCount * sizeof(DataPoint)) /
(dsCount * sizeof(DataPoint) + 1) : 1.0);
}

1291.7.4.3 Step 3: Implement Sensor Reading and Downsampling
Add the sensor collection and downsampling logic:
// Simulate sensor readings (replace with real sensors in production)
DataPoint readSensors() {
DataPoint dp;
dp.timestamp = startTime + (millis() / 1000);
// Simulate temperature: 20-30C with some variation
dp.temperature = 25.0 + sin(millis() / 10000.0) * 5.0 +
random(-10, 10) / 10.0;
// Simulate humidity: 40-60% with some variation
dp.humidity = 50.0 + cos(millis() / 15000.0) * 10.0 +
random(-5, 5) / 10.0;
return dp;
}
// Process and store a new reading with downsampling
void processSensorReading() {
DataPoint reading = readSensors();
// Store in raw buffer (high resolution)
addToBuffer(rawBuffer, rawHead, rawCount, reading, BUFFER_SIZE);
// Accumulate for downsampling
tempAccum += reading.temperature;
humidAccum += reading.humidity;
accumCount++;
// When we have enough samples, create downsampled point
if (accumCount >= DOWNSAMPLE_FACTOR) {
DataPoint dsPoint;
dsPoint.timestamp = reading.timestamp;
dsPoint.temperature = tempAccum / accumCount;
dsPoint.humidity = humidAccum / accumCount;
addToBuffer(downsampledBuffer, dsHead, dsCount, dsPoint, BUFFER_SIZE);
// Reset accumulator
tempAccum = 0;
humidAccum = 0;
accumCount = 0;
}
// Print latest reading
Serial.printf("[%lu] Temp: %.2fC, Humidity: %.2f%%\n",
reading.timestamp, reading.temperature, reading.humidity);
}

1291.7.4.4 Step 4: Implement Query Commands
Add serial command processing for querying historical data:
// Query last N readings from specified buffer
void queryLastN(DataPoint* buffer, int head, int count,
int n, const char* bufferName) {
Serial.printf("\n=== Last %d readings from %s ===\n", n, bufferName);
Serial.println("Timestamp\tTemp(C)\tHumidity(%)");
Serial.println("----------------------------------------");
int toShow = min(n, count);
for (int i = 0; i < toShow; i++) {
int idx = getIndex(head, count, i, BUFFER_SIZE);
if (idx >= 0) {
Serial.printf("%lu\t\t%.2f\t\t%.2f\n",
buffer[idx].timestamp,
buffer[idx].temperature,
buffer[idx].humidity);
}
}
}
// Calculate statistics for a buffer
void calculateStats(DataPoint* buffer, int head, int count) {
if (count == 0) {
Serial.println("No data available.");
return;
}
float minTemp = 999, maxTemp = -999, sumTemp = 0;
float minHum = 999, maxHum = -999, sumHum = 0;
for (int i = 0; i < count; i++) {
int idx = getIndex(head, count, i, BUFFER_SIZE);
if (idx >= 0) {
sumTemp += buffer[idx].temperature;
sumHum += buffer[idx].humidity;
if (buffer[idx].temperature < minTemp)
minTemp = buffer[idx].temperature;
if (buffer[idx].temperature > maxTemp)
maxTemp = buffer[idx].temperature;
if (buffer[idx].humidity < minHum)
minHum = buffer[idx].humidity;
if (buffer[idx].humidity > maxHum)
maxHum = buffer[idx].humidity;
}
}
Serial.println("\n=== Statistics ===");
Serial.printf("Temperature - Min: %.2f, Max: %.2f, Avg: %.2f\n",
minTemp, maxTemp, sumTemp / count);
Serial.printf("Humidity - Min: %.2f, Max: %.2f, Avg: %.2f\n",
minHum, maxHum, sumHum / count);
}
// Process serial commands
void processCommand(String cmd) {
cmd.trim();
cmd.toUpperCase();
if (cmd == "HELP") {
Serial.println("\n=== Available Commands ===");
Serial.println("HELP - Show this help message");
Serial.println("RAW N - Show last N raw readings");
Serial.println("DS N - Show last N downsampled readings");
Serial.println("STATS - Show statistics for all data");
Serial.println("STORAGE - Show storage usage");
Serial.println("CLEAR - Clear all buffers");
}
else if (cmd.startsWith("RAW ")) {
int n = cmd.substring(4).toInt();
queryLastN(rawBuffer, rawHead, rawCount, n, "Raw Buffer");
}
else if (cmd.startsWith("DS ")) {
int n = cmd.substring(3).toInt();
queryLastN(downsampledBuffer, dsHead, dsCount, n, "Downsampled Buffer");
}
else if (cmd == "STATS") {
Serial.println("\n--- Raw Data Statistics ---");
calculateStats(rawBuffer, rawHead, rawCount);
Serial.println("\n--- Downsampled Data Statistics ---");
calculateStats(downsampledBuffer, dsHead, dsCount);
}
else if (cmd == "STORAGE") {
printStorageStats();
}
else if (cmd == "CLEAR") {
rawHead = rawCount = 0;
dsHead = dsCount = 0;
tempAccum = humidAccum = accumCount = 0;
Serial.println("All buffers cleared.");
}
else {
Serial.println("Unknown command. Type HELP for available commands.");
}
}

1291.7.4.5 Step 5: Setup and Main Loop
Complete the program with the setup and loop functions:
void setup() {
Serial.begin(115200);
delay(1000);
// Initialize pseudo-random time (in real deployment, use NTP)
startTime = 1704067200; // Jan 1, 2024 00:00:00 UTC
Serial.println("\n====================================");
Serial.println(" Time-Series Data Logger v1.0");
Serial.println("====================================");
Serial.println("Collecting sensor data...");
Serial.printf("Sample interval: %d ms\n", SAMPLE_INTERVAL);
Serial.printf("Downsample factor: %d\n", DOWNSAMPLE_FACTOR);
Serial.printf("Buffer size: %d readings\n", BUFFER_SIZE);
Serial.println("\nType HELP for available commands.\n");
}
void loop() {
// Collect sensor data at specified interval
unsigned long currentTime = millis();
if (currentTime - lastSampleTime >= SAMPLE_INTERVAL) {
lastSampleTime = currentTime;
processSensorReading();
}
// Process serial commands
if (Serial.available()) {
String command = Serial.readStringUntil('\n');
processCommand(command);
}
}

1291.7.5 Testing Your Implementation
- Start the simulator and open the Serial Monitor (Ctrl+Shift+M)
- Wait 10-15 seconds to collect some data
- Try these commands:
- `HELP` - View all available commands
- `RAW 10` - Show the last 10 raw readings
- `DS 5` - Show the last 5 downsampled readings
- `STATS` - View min/max/average statistics
- `STORAGE` - See memory usage and compression ratio
1291.7.6 Challenge Exercises
Modify the code to include a third sensor (e.g., pressure or light level). Update the DataPoint structure and all related functions.
Hint: You will need to update the struct, the readSensors() function, and all print statements.
Add a new command RANGE start end that queries data between two timestamps. For example, RANGE 1704067210 1704067220 should show all readings in that 10-second window.
Hint: Use the queryTimeRange() function pattern shown in the chapter.
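One possible shape for the RANGE query, reusing the lab's getIndex() helper and buffer layout; treat this as a starting sketch, and the command parsing shown in comments as just one option.

```cpp
// Challenge 2 sketch: print all readings whose timestamps fall in [startTs, endTs].
void queryTimeRange(DataPoint* buffer, int head, int count,
                    uint32_t startTs, uint32_t endTs) {
    Serial.printf("\n=== Readings between %lu and %lu ===\n", startTs, endTs);
    for (int i = count - 1; i >= 0; i--) {   // iterate oldest to newest
        int idx = getIndex(head, count, i, BUFFER_SIZE);
        if (idx < 0) continue;
        uint32_t ts = buffer[idx].timestamp;
        if (ts >= startTs && ts <= endTs) {
            Serial.printf("%lu\t%.2f\t%.2f\n",
                          ts, buffer[idx].temperature, buffer[idx].humidity);
        }
    }
}

// One way to hook it into processCommand():
// else if (cmd.startsWith("RANGE ")) {
//     int space = cmd.indexOf(' ', 6);                   // separates the two timestamps
//     uint32_t startTs = cmd.substring(6, space).toInt();
//     uint32_t endTs   = cmd.substring(space + 1).toInt();
//     queryTimeRange(rawBuffer, rawHead, rawCount, startTs, endTs);
// }
```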
Implement a three-tier storage system:
- Raw data: 1-second resolution (last 60 readings)
- Medium: 10-second averages (last 100 readings)
- Long-term: 1-minute averages (last 100 readings)
This mirrors how production time-series databases like InfluxDB implement retention policies.
Hint: Create three separate circular buffers and three downsampling accumulators.
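A possible skeleton for the three tiers, reusing the DataPoint struct and addToBuffer() helper from the lab. The buffer sizes, names, and cascade factors below are suggestions matching the resolutions listed in the challenge.

```cpp
// Challenge 3 skeleton: three tiers with cascading downsampling.
#define RAW_TIER_SIZE    60   // 1-second resolution, last 60 readings
#define MEDIUM_TIER_SIZE 100  // 10-second averages
#define LONG_TIER_SIZE   100  // 1-minute averages

DataPoint rawTier[RAW_TIER_SIZE], mediumTier[MEDIUM_TIER_SIZE], longTier[LONG_TIER_SIZE];
int rawTierHead = 0, rawTierCount = 0;
int medHead = 0, medCount = 0;
int longHead = 0, longCount = 0;

// Accumulators: raw -> medium every 10 samples, medium -> long every 6 averages.
float medTempAcc = 0, medHumAcc = 0;   int medAccN = 0;
float longTempAcc = 0, longHumAcc = 0; int longAccN = 0;

void storeTiered(DataPoint reading) {
    addToBuffer(rawTier, rawTierHead, rawTierCount, reading, RAW_TIER_SIZE);

    medTempAcc += reading.temperature; medHumAcc += reading.humidity; medAccN++;
    if (medAccN >= 10) {  // 10 x 1 s = 10-second average
        DataPoint m = { reading.timestamp, medTempAcc / medAccN, medHumAcc / medAccN };
        addToBuffer(mediumTier, medHead, medCount, m, MEDIUM_TIER_SIZE);
        medTempAcc = medHumAcc = 0; medAccN = 0;

        longTempAcc += m.temperature; longHumAcc += m.humidity; longAccN++;
        if (longAccN >= 6) {  // 6 x 10 s = 1-minute average
            DataPoint l = { m.timestamp, longTempAcc / longAccN, longHumAcc / longAccN };
            addToBuffer(longTier, longHead, longCount, l, LONG_TIER_SIZE);
            longTempAcc = longHumAcc = 0; longAccN = 0;
        }
    }
}
```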
Add automatic anomaly detection that prints a warning when temperature or humidity readings deviate more than 2 standard deviations from the running average.
Hint: Maintain running sum and sum-of-squares to calculate variance efficiently without storing all historical values.
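Here is one way to maintain a running mean and standard deviation from a sum and a sum of squares, as the hint suggests. The struct name, the 10-sample warm-up, and applying the 2-sigma rule only to temperature are illustrative choices.

```cpp
// Challenge 4 sketch: running mean/stddev anomaly check for one signal.
// Sum and sum-of-squares are enough; no history needs to be stored.
struct RunningStats {
    double sum = 0, sumSq = 0;
    long   n   = 0;

    void add(double x) { sum += x; sumSq += x * x; n++; }
    double mean() const { return n > 0 ? sum / n : 0.0; }
    double stddev() const {
        if (n < 2) return 0.0;
        double m   = mean();
        double var = (sumSq / n) - (m * m);  // population variance
        return var > 0 ? sqrt(var) : 0.0;
    }
};

RunningStats tempStats;

// Call after each reading; warns when the value deviates >2 standard deviations.
void checkTemperatureAnomaly(float temperature) {
    if (tempStats.n >= 10) {  // wait for a minimal baseline first
        double sd  = tempStats.stddev();
        double dev = fabs(temperature - tempStats.mean());
        if (sd > 0 && dev > 2.0 * sd) {
            Serial.printf("WARNING: temp %.2fC is %.1f sigma from mean %.2fC\n",
                          temperature, dev / sd, tempStats.mean());
        }
    }
    tempStats.add(temperature);
}
```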
1291.7.7 What You Learned
In this lab, you implemented core time-series database concepts on a microcontroller:
| Concept | Implementation |
|---|---|
| Timestamped data | Each reading includes a Unix timestamp |
| Circular buffer | Fixed-size buffer that overwrites oldest data |
| Downsampling | Averaging multiple readings to reduce storage |
| Query interface | Serial commands for data retrieval |
| Storage efficiency | Compression ratio tracking |
These same principles apply to production time-series databases like InfluxDB and TimescaleDB, which use similar techniques at much larger scale with additional optimizations like columnar compression and time-based partitioning.
Foundations:
- Big Data Overview - Data management fundamentals and IoT data challenges
- Data Storage and Databases - Storage concepts and database types
- Data in the Cloud - Cloud storage strategies and service models
Advanced Topics:
- Stream Processing - Real-time data pipelines with Kafka, Flink
- Edge Data Acquisition - Data collection patterns at the edge
- Interoperability - Data format standards and exchange protocols
Architecture:
- Edge Computing Patterns - Edge processing architectures
- Cloud Computing - Cloud infrastructure for IoT
1291.8 Summary
This chapter applied time-series concepts through practical examples and hands-on exercises.
Key Takeaways:
Edge processing is essential at scale: Tesla’s 98% data reduction comes from on-vehicle aggregation, not just cloud compression.
Adaptive strategies match data value: Sample faster during interesting events (hard braking), slower during routine operation (parked).
Standard tools handle most workloads: Start with InfluxDB or TimescaleDB; only build custom infrastructure when you exceed their limits.
Embedded systems can implement TSDB concepts: Circular buffers, downsampling, and time-based queries work on microcontrollers.
Time synchronization requires planning: Calculate drift rates, sync intervals, and design for connectivity gaps.
1291.9 What’s Next
In the next chapter on Stream Processing, we'll explore how to process IoT data in real time before it reaches the database. Learn how to use Apache Kafka, Apache Flink, and cloud streaming services to detect anomalies, trigger alerts, and perform complex event processing on sensor streams at millisecond latency, transforming raw time-series data into actionable insights.