8  Data Storage Worked Examples

In 60 Seconds

Real-world IoT database design requires matching storage architecture to specific workloads. A fleet management system with geospatial queries benefits from TimescaleDB with PostGIS, while a smart city data lake needs tiered storage (hot/warm/cold) to cut costs by as much as 99.9%. The key is quantifying your data volume, query patterns, and retention needs before selecting technologies.

Learning Objectives

After completing this chapter, you will be able to:

  • Apply database selection frameworks to real-world IoT scenarios
  • Design complete storage architectures with appropriate tiering
  • Calculate storage requirements and costs for large-scale deployments
  • Implement schema designs optimized for specific query patterns

These worked examples walk you through the process of designing data storage for real IoT scenarios, step by step. Think of them as cooking demonstrations where an experienced chef explains every decision while preparing a complex dish. By following along, you learn not just what to do, but why, building the judgment you need for your own projects.

8.1 Worked Example: Fleet Management System

8.2 Choosing a Database for 10,000-Vehicle Fleet Tracking

Scenario: A logistics company is building a fleet management system for 10,000 delivery vehicles. Each vehicle reports GPS location, speed, fuel level, engine diagnostics, and driver behavior every 10 seconds. The system must support real-time vehicle tracking on a map, historical route analysis, geofencing alerts, and monthly compliance reports. The data must be retained for 3 years for regulatory compliance.

Goal: Select and design the optimal database architecture, comparing InfluxDB, TimescaleDB, and MongoDB for this use case, considering query patterns, data volumes, and operational costs.

What we do: Calculate data ingestion rate, storage requirements, and classify query patterns.

Why: Database selection depends heavily on data characteristics and access patterns.

Data ingestion analysis:

| Metric | Value | Calculation |
|---|---|---|
| Vehicles | 10,000 | Fixed fleet size |
| Report interval | 10 seconds | GPS + telemetry |
| Messages per second | 1,000 | 10,000 / 10 |
| Payload size (avg) | 250 bytes | GPS + 8 sensor values + metadata |
| Bytes per second | 250 KB/s | 1,000 x 250 bytes |
| Daily volume (raw) | 21.6 GB | 250 KB/s x 86,400 |
| Monthly volume | 648 GB | 21.6 GB x 30 |
| 3-year retention | 23.3 TB | 648 GB x 36 |
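The figures in this table can be reproduced with a quick back-of-the-envelope script (a sketch, not production code; the constants mirror the assumptions above):

```python
# Back-of-the-envelope sizing for the 10,000-vehicle fleet.
VEHICLES = 10_000
REPORT_INTERVAL_S = 10
PAYLOAD_BYTES = 250  # GPS + 8 sensor values + metadata (average)

msgs_per_sec = VEHICLES / REPORT_INTERVAL_S       # 1,000 msg/s
bytes_per_sec = msgs_per_sec * PAYLOAD_BYTES      # 250 KB/s
daily_gb = bytes_per_sec * 86_400 / 1e9           # 21.6 GB/day
monthly_gb = daily_gb * 30                        # 648 GB/month
three_year_tb = monthly_gb * 36 / 1e3             # ~23.3 TB over 3 years

print(f"{msgs_per_sec:.0f} msg/s, {daily_gb:.1f} GB/day, "
      f"{three_year_tb:.1f} TB over 3 years")
```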

Query pattern classification:

| Query Type | Frequency | Latency Requirement | Data Range |
|---|---|---|---|
| Real-time map update | 100/sec | < 100ms | Last reading per vehicle |
| Vehicle history (24h) | 50/day | < 2s | 8,640 points per vehicle |
| Geofence check | 1,000/sec | < 50ms | Current position vs. polygons |
| Monthly fuel report | 10/month | < 30s | 30 days, all vehicles |
| Compliance audit | 5/year | < 5min | 1 year, full resolution |

Key insight: This workload combines time-series telemetry (95% of writes), real-time point queries (current state), and analytical aggregations (reports). A hybrid approach may be optimal.


What we do: Compare InfluxDB, TimescaleDB, and MongoDB against our requirements.

Why: Each database has strengths for different aspects of fleet management.

InfluxDB (Purpose-built time-series):

| Aspect | Rating | Notes |
|---|---|---|
| Write throughput | Excellent | Built for high-frequency metrics |
| Time-range queries | Excellent | Native time-series optimizations |
| Point queries (latest) | Good | Requires careful schema design |
| Geospatial queries | Poor | No native geo support |
| Compression | Excellent | 10-20x for numeric data |
| SQL compatibility | Limited | Flux (v2) or SQL (v3); ecosystem less mature than PostgreSQL |

TimescaleDB (PostgreSQL extension):

| Aspect | Rating | Notes |
|---|---|---|
| Write throughput | Very Good | 100K+ rows/sec with hypertables |
| Time-range queries | Excellent | Automatic time partitioning |
| Point queries (latest) | Good | Continuous aggregates help |
| Geospatial queries | Excellent | PostGIS integration |
| Compression | Very Good | 90-95% for time-series |
| SQL compatibility | Full | Standard PostgreSQL SQL |

MongoDB (Document database):

| Aspect | Rating | Notes |
|---|---|---|
| Write throughput | Very Good | Sharding scales horizontally |
| Time-range queries | Good | Requires proper indexing |
| Point queries (latest) | Excellent | Flexible document model |
| Geospatial queries | Very Good | Native geospatial indexes |
| Compression | Good | WiredTiger compression |
| SQL compatibility | None | MongoDB Query Language |

What we do: Design schemas for the recommended database (TimescaleDB) with alternatives.

Why: Schema design significantly impacts query performance and storage efficiency.

Recommendation: TimescaleDB - Best balance of time-series performance, geospatial capabilities, and SQL familiarity.

TimescaleDB schema design:

-- Hypertable: time-partitioned telemetry
CREATE TABLE vehicle_telemetry (
    time TIMESTAMPTZ NOT NULL, vehicle_id VARCHAR(20) NOT NULL,
    location GEOGRAPHY(POINT, 4326),  -- PostGIS
    speed_kmh SMALLINT, heading SMALLINT, fuel_pct SMALLINT,
    engine_rpm SMALLINT, engine_temp SMALLINT,
    odometer_km INTEGER, driver_id VARCHAR(20)
);
SELECT create_hypertable('vehicle_telemetry', 'time',
       chunk_time_interval => INTERVAL '1 day');

-- Compress after 7 days, segment by vehicle for efficient queries
ALTER TABLE vehicle_telemetry SET (
    timescaledb.compress, timescaledb.compress_segmentby = 'vehicle_id');
SELECT add_compression_policy('vehicle_telemetry', INTERVAL '7 days');

CREATE INDEX ON vehicle_telemetry (vehicle_id, time DESC);
CREATE INDEX ON vehicle_telemetry USING GIST (location);

-- Near-real-time position view (10-second buckets)
CREATE MATERIALIZED VIEW vehicle_latest
WITH (timescaledb.continuous) AS
SELECT time_bucket('10 seconds', time) AS bucket, vehicle_id,
       last(time, time) AS last_seen,
       last(location, time) AS location,
       last(speed_kmh, time) AS speed_kmh
FROM vehicle_telemetry GROUP BY bucket, vehicle_id;

-- Refresh policy keeps the view within ~10 seconds of live data
SELECT add_continuous_aggregate_policy('vehicle_latest',
       start_offset => INTERVAL '10 minutes',
       end_offset => INTERVAL '10 seconds',
       schedule_interval => INTERVAL '10 seconds');

Storage tier design (3-year retention with cost optimization):

| Tier | Age | Resolution | Storage | Monthly Cost |
|---|---|---|---|---|
| Hot | 0-7 days | 10-second | SSD | $50/TB |
| Warm | 7-90 days | 10-second (compressed) | HDD | $10/TB |
| Cold | 90 days - 3 years | 1-minute aggregates | Object storage | $2/TB |

Storage cost calculation:

| Tier | Raw Volume | Stored Volume | Storage Cost | Notes |
|---|---|---|---|---|
| Hot (7 days) | 151 GB | 151 GB | $7.55/month | Uncompressed, fast SSD |
| Warm (83 days) | 1.79 TB | 179 GB | $1.79/month | 90% compression |
| Cold (2.75 years) | 3.62 TB | 362 GB | $0.72/month | Downsampled 6x + 90% compression |
| Total | ~5.6 TB raw | ~692 GB stored | ~$10.06/month | vs. $1,165/mo all-SSD uncompressed |

The all-SSD cost assumes full-resolution 3-year retention. Tiered storage achieves lower cost partly through downsampling older data to 1-minute resolution.
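The tier arithmetic can be sanity-checked in a few lines (a sketch using the per-TB rates and compression ratios from the tables above):

```python
# Tiered storage cost for 3-year fleet retention.
DAILY_GB = 21.6
hot_gb  = DAILY_GB * 7                           # 7 days uncompressed
warm_gb = DAILY_GB * 83 * 0.10                   # 90% compression -> ~179 GB
cold_gb = DAILY_GB * (3 * 365 - 90) / 6 * 0.10   # 6x downsample + 90% compression

# Rates: $50/TB hot SSD, $10/TB warm HDD, $2/TB cold object storage.
cost = hot_gb / 1e3 * 50 + warm_gb / 1e3 * 10 + cold_gb / 1e3 * 2
all_ssd = 23.3 * 50                              # full-resolution SSD baseline
print(f"tiered ${cost:.2f}/mo vs all-SSD ${all_ssd:.0f}/mo "
      f"({(1 - cost / all_ssd) * 100:.1f}% saved)")
```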

What we do: Write optimized queries for each use case.

Why: Database choice is only half the solution; query design determines actual performance.

Real-time map update (< 100ms requirement):

-- Query the continuous aggregate for latest positions
-- Use the most recent bucket to get near-real-time data
SELECT vehicle_id, location, speed_kmh, last_seen
FROM vehicle_latest
WHERE bucket > NOW() - INTERVAL '2 minutes'
ORDER BY vehicle_id, bucket DESC;

-- Performance: < 10ms for 10,000 vehicles (indexed materialized view)

Vehicle 24-hour history (< 2s requirement):

-- Time-bounded query with vehicle filter
SELECT time, location, speed_kmh, fuel_pct
FROM vehicle_telemetry
WHERE vehicle_id = 'VEH-001'
  AND time > NOW() - INTERVAL '24 hours'
ORDER BY time;

-- Performance: < 500ms (8,640 rows from single day chunk)

Geofence check (< 50ms requirement):

-- Check if vehicle is within geofenced area
-- Use a subquery to get the latest bucket per vehicle
SELECT v.vehicle_id, g.zone_name
FROM (
    SELECT DISTINCT ON (vehicle_id) vehicle_id, location
    FROM vehicle_latest
    WHERE bucket > NOW() - INTERVAL '1 minute'
    ORDER BY vehicle_id, bucket DESC
) v
JOIN geofence_zones g ON ST_Within(v.location::geometry, g.boundary);

-- Performance: < 30ms with spatial index

Monthly fuel consumption report (< 30s requirement):

-- Aggregate fuel consumption per vehicle
SELECT vehicle_id,
       MIN(fuel_pct) AS min_fuel,
       MAX(fuel_pct) AS max_fuel,
       SUM(CASE WHEN fuel_pct_diff > 0 THEN fuel_pct_diff ELSE 0 END) AS refuel_total
FROM (
    SELECT vehicle_id, fuel_pct,
           fuel_pct - LAG(fuel_pct) OVER (PARTITION BY vehicle_id ORDER BY time) AS fuel_pct_diff
    FROM vehicle_telemetry
    WHERE time > NOW() - INTERVAL '30 days'
) sub
GROUP BY vehicle_id;

-- Performance: < 15s (parallelized across 30 day-chunks)

Outcome: TimescaleDB (a PostgreSQL extension) with PostGIS selected as the optimal database for fleet management, with a tiered storage strategy reducing costs by 99.1%.

Key decisions made and why:

| Decision | Choice | Rationale |
|---|---|---|
| Primary database | TimescaleDB | Best combination of time-series + geospatial + SQL |
| Why not InfluxDB | No geospatial | Geofencing is critical; would need secondary database |
| Why not MongoDB | Not ideal for time-series | Time-range aggregations less efficient |
| Storage strategy | 3-tier (hot/warm/cold) | 99.1% cost reduction while meeting latency SLAs |
| Real-time queries | Continuous aggregates | Pre-computed latest position; < 10ms response |

Quantified outcomes:

  • Write throughput: 1,000 messages/second sustained (headroom to 10K/s)
  • Real-time map latency: < 10ms (continuous aggregate)
  • 24-hour history query: < 500ms (single chunk access)
  • Monthly report generation: < 15s (parallel chunk processing)
  • Storage cost: ~$10/month (vs. $1,165/month all-SSD uncompressed)
  • 3-year retention: ~692 GB effective storage (99.1% cost reduction)

How does compression and tiering achieve such dramatic savings? Consider a smaller fleet of 500 trucks with GPS (1 Hz) and diagnostics (0.2 Hz) as a simplified example:

Assuming roughly 100-byte GPS messages and 50-byte diagnostic messages:

\((500 \times 1\,\text{Hz} \times 100\,\text{B}) + (500 \times 0.2\,\text{Hz} \times 50\,\text{B}) = 50 + 5 = 55\ \text{KB/s}\)

That is 1.73 TB/year raw. With TimescaleDB’s compression and tiered storage: hot (7 days, uncompressed) = 33 GB; warm (83 days, compressed at 90%) = 39 GB; cold (275 days, downsampled 6x and compressed at 90%) = 22 GB, for approximately 94 GB total in the first year.

Over 3 years: \(94 + 94 + 94 = 282\text{ GB} \approx 0.28\text{ TB}\) (~95% reduction vs. 5.19 TB raw). The same compression principles apply to the full 10,000-vehicle fleet, but at larger scale.
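A short script makes the 500-truck arithmetic explicit (the 100-byte GPS and 50-byte diagnostic payloads are assumptions chosen to match the 55 KB/s rate):

```python
# 500-truck simplified example: GPS at 1 Hz, diagnostics at 0.2 Hz.
# Payload sizes (100 B GPS, 50 B diagnostics) are assumptions.
rate_bps = 500 * 1 * 100 + 500 * 0.2 * 50    # 55,000 B/s = 55 KB/s
daily_gb = rate_bps * 86_400 / 1e9           # ~4.75 GB/day
yearly_tb = daily_gb * 365 / 1e3             # ~1.73 TB/year raw

hot  = daily_gb * 7                          # 7 days uncompressed  -> ~33 GB
warm = daily_gb * 83 * 0.10                  # 90% compression      -> ~39 GB
cold = daily_gb * 275 / 6 * 0.10             # 6x downsample + 90%  -> ~22 GB
print(f"{yearly_tb:.2f} TB/year raw, ~{hot + warm + cold:.0f} GB stored")
```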

The fleet management example illustrates how geospatial requirements can be the deciding factor in database selection. The next example tackles a different challenge: what happens when data volume, not query complexity, is the primary driver.


8.3 Worked Example: Smart City Data Lake

8.4 Designing a Data Lake for a Smart City with 49,000 Sensors

Scenario: A city government is deploying a smart city platform with 49,000 sensors generating diverse data types: traffic flow (video + counts), air quality (particulate matter, CO2, ozone), noise levels, weather stations, smart streetlights, and water quality monitors. The platform must support real-time dashboards for city operations, historical analysis for urban planning, ML model training for predictive services, and compliance with data retention regulations (7 years for environmental data).

Goal: Design a cost-optimized data lake architecture with appropriate storage tiers, partitioning strategy, query patterns, and schema evolution approach that balances performance, cost, and regulatory compliance.

What we do: Catalog all data sources, estimate volumes, and classify by access patterns and retention requirements.

Why: Storage tier selection and partitioning strategy depend heavily on data characteristics.

Data source inventory:

| Source | Count | Frequency | Payload | Daily Volume | Data Type |
|---|---|---|---|---|---|
| Traffic cameras | 2,000 | 30 fps video | 50 KB/frame | 259 TB | Unstructured |
| Traffic counters | 5,000 | 1 min | 100 bytes | 720 MB | Time-series |
| Air quality | 500 | 5 min | 200 bytes | 29 MB | Time-series |
| Noise sensors | 1,000 | 10 sec | 50 bytes | 432 MB | Time-series |
| Weather stations | 100 | 15 min | 500 bytes | 4.8 MB | Time-series |
| Smart streetlights | 40,000 | 1 hour | 150 bytes | 144 MB | Time-series |
| Water quality | 400 | 30 min | 300 bytes | 5.8 MB | Time-series |
| Total sensors | 49,000 | - | - | ~260 TB/day | Mixed |
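The inventory reduces to count x messages/day x payload; a sketch with the values copied from the table:

```python
# Daily volume per source: (name, device count, messages per day, payload bytes)
sources = [
    ("traffic_cameras", 2_000, 30 * 86_400, 50_000),  # 30 fps, 50 KB/frame
    ("traffic_counters", 5_000, 1_440, 100),
    ("air_quality", 500, 288, 200),
    ("noise", 1_000, 8_640, 50),
    ("weather", 100, 96, 500),
    ("streetlights", 40_000, 24, 150),
    ("water_quality", 400, 48, 300),
]
daily_gb = {name: c * m * p / 1e9 for name, c, m, p in sources}
total = sum(daily_gb.values())
non_video = total - daily_gb["traffic_cameras"]
print(f"total ~{total / 1e3:.0f} TB/day; non-video {non_video:.2f} GB/day")
```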

Key insight: Video data (99.5% of volume) dominates storage costs but is accessed rarely (incident investigation only). Time-series sensor data (0.5% of volume) is accessed frequently for dashboards and analytics. This split demands a tiered approach.

What we do: Define hot, warm, and cold storage tiers with appropriate technologies for each data type.

Why: A single storage tier cannot cost-effectively serve all access patterns.

The following diagram shows the flow from 49,000 sensors through Kafka to two storage tracks (time-series and object storage), each with hot, warm, and cold tiers.

Three-tier architecture:

[49,000 Sensors]
       |
       v
[Ingestion Layer: Kafka]
       |
   +---+---+
   |       |
   v       v
[Time-Series]  [Object Storage]
   |               |
   v               v
+--+--+       +---+---+
|     |       |   |   |
v     v       v   v   v
[Hot] [Warm]  [Hot][Warm][Cold]
 7d   90d     24h  30d  Archive

Tier specifications:

| Tier | Technology | Data | Retention | Storage Cost | Query Latency |
|---|---|---|---|---|---|
| Hot (TS) | TimescaleDB SSD | Sensor readings | 7 days | $0.10/GB/mo | < 50ms |
| Warm (TS) | TimescaleDB HDD | Sensor readings | 90 days | $0.02/GB/mo | < 500ms |
| Cold (TS) | S3 + Parquet | Aggregated sensors | 7 years | $0.004/GB/mo | < 10s |
| Hot (Video) | MinIO SSD | Recent clips | 24 hours | $0.10/GB/mo | < 1s |
| Warm (Video) | MinIO HDD | Investigation window | 30 days | $0.02/GB/mo | < 5s |
| Cold (Video) | S3 Glacier | Archive | 90 days | $0.004/GB/mo | 3-5 hours |

What we do: Design partition keys for optimal query performance across different access patterns.

Why: Poor partitioning causes full-table scans (slow, expensive) or hot partitions (write bottlenecks).

Time-series partitioning (TimescaleDB hypertables):

-- Main sensor readings hypertable
CREATE TABLE sensor_readings (
    time        TIMESTAMPTZ NOT NULL,
    sensor_id   VARCHAR(32) NOT NULL,
    sensor_type VARCHAR(20) NOT NULL,
    location_id VARCHAR(20) NOT NULL,
    value       DOUBLE PRECISION,
    unit        VARCHAR(10),
    quality     SMALLINT DEFAULT 100
);

-- Create hypertable with 1-hour chunks (optimized for high-frequency data)
SELECT create_hypertable('sensor_readings', 'time',
    chunk_time_interval => INTERVAL '1 hour',
    create_default_indexes => FALSE);

-- Compound index for dashboard queries: "last hour by sensor type in district"
CREATE INDEX idx_type_location_time ON sensor_readings
    (sensor_type, location_id, time DESC);

-- Compression policy: compress after 24 hours
ALTER TABLE sensor_readings SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id, sensor_type',
    timescaledb.compress_orderby = 'time DESC'
);
SELECT add_compression_policy('sensor_readings', INTERVAL '24 hours');

-- Retention policy: drop raw chunks after 90 days
-- (export/aggregation to S3 runs separately, before chunks are dropped)
SELECT add_retention_policy('sensor_readings', INTERVAL '90 days');

Object storage partitioning (video):

s3://smart-city-video/
├── raw/
│   └── year=2024/
│       └── month=01/
│           └── day=15/
│               └── camera_id=CAM-001/
│                   └── hour=00/
│                       ├── 00-00-00_00-05-00.mp4
│                       └── 00-05-00_00-10-00.mp4
├── processed/
│   ├── thumbnails/
│   ├── motion_clips/
│   └── analytics/
└── archive/ (Glacier, 90+ days)

What we do: Plan for schema changes as new sensor types are added and data requirements evolve.

Why: IoT deployments evolve constantly: new sensor types, additional fields, changed units.

Schema versioning approach:

-- Metadata table tracking schema versions
CREATE TABLE schema_versions (
    sensor_type     VARCHAR(20) NOT NULL,
    current_version INTEGER NOT NULL,
    schema_json     JSONB NOT NULL,
    effective_from  TIMESTAMPTZ NOT NULL,
    changelog       TEXT,
    PRIMARY KEY (sensor_type, current_version)
);

-- Example: Air quality schema evolution
INSERT INTO schema_versions VALUES
('air_quality', 1, '{
    "fields": ["pm25", "pm10", "co2", "temperature", "humidity"],
    "units": {"pm25": "ug/m3", "pm10": "ug/m3", "co2": "ppm"}
}', '2024-01-01', 'Initial deployment'),

('air_quality', 2, '{
    "fields": ["pm25", "pm10", "pm1", "co2", "no2", "o3", "temperature", "humidity", "pressure"],
    "units": {"pm25": "ug/m3", "pm1": "ug/m3", "no2": "ppb", "o3": "ppb", "pressure": "hPa"}
}', '2024-06-01', 'Added PM1, NO2, O3, pressure from new sensor model');

Forward compatibility rules:

  1. Adding fields: Always optional with defaults (NULL or computed)
  2. Changing units: Store original + converted, never overwrite historical
  3. Renaming fields: Create view alias, deprecate old name over 6 months
  4. Removing fields: Stop populating, retain historical for query compatibility
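One way to honor rule 1 when reading records written under different schema versions is to project every record onto the latest schema, padding missing fields with NULL. A minimal sketch (the `normalize` helper and the field lists are illustrative, not part of any library):

```python
# Sketch: normalizing readings written under different schema versions.
# Rule 1 above: fields added in later versions are optional with defaults,
# so a v1 record is padded with None rather than rejected.
SCHEMAS = {
    1: ["pm25", "pm10", "co2", "temperature", "humidity"],
    2: ["pm25", "pm10", "pm1", "co2", "no2", "o3",
        "temperature", "humidity", "pressure"],
}
LATEST = max(SCHEMAS)

def normalize(record: dict, version: int) -> dict:
    """Project a record onto the latest schema, defaulting missing fields."""
    known = set(SCHEMAS[version])
    return {f: record.get(f) if f in known else None for f in SCHEMAS[LATEST]}

v1_reading = {"pm25": 12.0, "pm10": 20.5, "co2": 410,
              "temperature": 21.3, "humidity": 55}
row = normalize(v1_reading, version=1)
print(row["pm25"], row["no2"])   # 12.0 None
```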

What we do: Calculate storage costs and implement aggressive cost reduction through compression, aggregation, and tiering.

Why: At 260 TB/day, naive storage costs would exceed $1M/month. Smart optimization reduces this by 95%+.

Storage cost analysis (before optimization):

Assuming a blended storage cost of $0.02/GB/month and full data retention (using 365-day year):

| Data Type | Daily Volume | Steady-State (1 yr) | Monthly Storage Cost |
|---|---|---|---|
| Video raw | 259 TB | 94,535 TB | $1,890,700 |
| Sensors | 1.4 GB | 511 GB | $10 |
| Total | 259 TB | 94,535 TB | $1,890,710/month |

Optimization strategies applied:

| Strategy | Target | Reduction | Implementation |
|---|---|---|---|
| Video motion detection | Keep only motion clips | 99% | Edge AI filters static footage; motion clips average ~1% of continuous recording duration |
| Video resolution tiering | 4K to 1080p to 480p by age | 75% | Transcode at 24h, 7d boundaries |
| Video retention policy | Delete after 90 days | 92% | Lifecycle policy (compliance exempt) |
| Sensor compression | TimescaleDB native | 92% | Segment by sensor_id |
| Sensor downsampling | 1-min to 1-hour to 1-day | 99% | Continuous aggregates for cold tier |

Cost after optimization:

| Tier | Data Type | Retention | Steady-State Volume | $/GB/month | Monthly Cost |
|---|---|---|---|---|---|
| Hot TS | Sensors | 7 days | 10 GB | $0.10 | $1.00 |
| Warm TS | Sensors | 90 days | 10 GB (compressed) | $0.02 | $0.20 |
| Cold TS | Aggregates | 7 years | 500 GB | $0.004 | $2.00 |
| Hot Video | Motion clips | 24 hours | 2.6 TB | $0.10 | $260 |
| Warm Video | Motion clips | 30 days | 78 TB | $0.02 | $1,560 |
| Cold Video | Compliance* | 90 days | 78 TB | $0.004 | $312 |
| Total | - | - | ~159 TB | - | ~$2,135/month |

*Environmental video exempt from 90-day deletion (compliance requirement)

Savings: $1,890,710/month to ~$2,135/month = 99.9% reduction (~$22.7M/year saved)
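The before/after bill follows directly from the two cost tables; a sketch reproducing the arithmetic:

```python
# Before/after storage bill for the smart city data lake.
BLENDED = 0.02  # $/GB/month, unoptimized blended rate
# 259 TB/day video + 1.4 GB/day sensors, held for a 365-day year.
unoptimized = 259e3 * 365 * BLENDED + 511 * BLENDED   # ~$1.89M/month

# Optimized tiers: (steady-state GB, $/GB/month) from the cost table.
tiers = [
    (10, 0.10), (10, 0.02), (500, 0.004),            # sensor hot/warm/cold
    (2_600, 0.10), (78_000, 0.02), (78_000, 0.004),  # video hot/warm/cold
]
optimized = sum(gb * rate for gb, rate in tiers)
print(f"${unoptimized:,.0f} -> ${optimized:,.0f}/month "
      f"({1 - optimized / unoptimized:.1%} reduction)")
```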


Outcome: A production-ready data lake architecture serving 49,000 sensors with sub-100ms dashboard latency, 7-year compliance retention, and ~$2,135/month storage cost (vs. $1.89M/month unoptimized).

Key decisions made and why:

| Decision | Choice | Rationale |
|---|---|---|
| Time-series database | TimescaleDB | SQL familiarity for city analysts, native compression, continuous aggregates |
| Object storage | MinIO + S3 | On-prem for high-bandwidth video, S3 for archive |
| Video processing | Edge motion detection | 99% volume reduction at source, keeps only relevant footage |
| Partitioning | Time-first, then sensor | Matches 95% of query patterns (time-range dashboards) |
| Schema evolution | JSONB + version table | Maximum flexibility without migration downtime |
| Cold storage format | Parquet | Columnar compression, Spark/Presto compatible for ML |

Figure 8.1: Smart city data lake architecture. Data flows from 49,000 sensors through a Kafka ingestion layer into tiered storage: TimescaleDB for time-series data (hot, warm, and cold tiers) and MinIO/S3 for video data (hot, warm, and Glacier archive).

Quantified outcomes:

  • Real-time dashboard latency: 45ms P99 (target: < 100ms)
  • Historical query (1-year range): 2.3s (target: < 5s)
  • Video clip retrieval: 8s hot, 25s warm (target: < 30s)
  • Monthly storage cost: ~$2,135 (99.9% reduction from naive approach)
  • Compliance coverage: 7 years environmental sensor data, 90 days video with compliance exemptions
  • Schema migrations: Zero downtime (additive changes only)

Key Takeaway

Successful IoT database architecture starts with quantifying your data: calculate ingestion rates, classify query patterns, estimate storage volumes, and define retention requirements. Then match technologies to workloads – a single “best” database rarely exists. Tiered storage (hot/warm/cold) combined with compression, aggregation, and edge processing can reduce storage costs by 95-99%+ while meeting all latency requirements.

Designing a database architecture is like planning the perfect kitchen for a restaurant – you need the right tools in the right places!

8.4.1 The Sensor Squad Adventure: Building the Perfect Data Kitchen

The Sensor Squad was hired to build data storage for two big projects: tracking 10,000 delivery trucks AND monitoring 49,000 city sensors!

“That’s a LOT of data!” said Sammy the Sensor. “How do we store it all without spending a fortune?”

Max the Microcontroller became the architect. “First, let’s figure out WHAT data we have and HOW MUCH.”

For the delivery trucks:

  • “Each truck reports its GPS location every 10 seconds. That’s 1,000 messages every second!”
  • “We need to show trucks on a MAP, so we need a database that understands geography!”

“That’s like needing a kitchen with BOTH an oven AND a grill!” said Lila the LED. “We picked TimescaleDB because it handles time-data AND maps!”

For the city sensors:

  • “49,000 sensors send data all day. Plus 2,000 cameras recording video!”
  • “The video takes up 99% of the storage space but is only needed when something goes wrong.”

Bella the Battery had a brilliant cost-saving idea: “Let’s organize storage like a CLOSET!”

  • Top shelf (HOT storage): This week’s data – fast to reach, but expensive space
  • Middle shelf (WARM storage): This month’s data – a bit slower, much cheaper
  • Bottom shelf (COLD storage): Old data in boxes – slow to unpack, but super cheap!

“And for video, let’s use AI to keep ONLY the interesting clips!” added Sammy. “That cuts storage by 99%!”

The result? Instead of spending $1.9 MILLION per month on storage, they spent only about $2,100 per month. The Squad saved 99.9%!

“The secret,” said Max, “is matching the RIGHT storage to each type of data, just like using the right container for each food in your fridge!”

8.4.2 Key Words for Kids

| Word | What It Means |
|---|---|
| Architecture | The plan for how a system is built – like the blueprint for a house |
| Tiered Storage | Storing data in different places based on how often you need it – like keeping snacks nearby but bulk food in the basement |
| Cost Optimization | Finding the cheapest way to do something well – like buying in bulk to save money |

8.5 Deep Dives

The following sections explore specific topics that arose in the worked examples above: how indexing transforms query performance, when to use continuous aggregates, and how to avoid costly database migrations.

Scenario: Fleet management system queries “show all vehicles in geofence polygon X in the last hour” 100 times/minute.

Without Spatial Index:

SELECT vehicle_id, location, speed_kmh, time
FROM vehicle_telemetry
WHERE time > NOW() - INTERVAL '1 hour'
  AND ST_Within(location::geometry, 'POLYGON((...))'::geometry);

-- Query plan: Sequential scan of 3.6M rows (10K vehicles x 360 readings/hour at 10s intervals)
-- Execution time: 8.5 seconds

With Spatial Index (GIST):

CREATE INDEX idx_location ON vehicle_telemetry USING GIST (location);

-- Query plan: Index scan of ~50 rows (vehicles in polygon)
-- Execution time: 12 milliseconds

Performance Calculation:

Queries per minute: 100
Without index: 100 queries x 8.5s = 850 seconds CPU (14 minutes)
With index: 100 queries x 0.012s = 1.2 seconds CPU

Speedup: 8.5s / 0.012s = 708x faster
CPU savings: 850s - 1.2s = 848.8 seconds/minute saved
Database load: 99.86% reduction
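The speedup and load-reduction figures follow from the per-query times; as a quick check:

```python
# CPU-time arithmetic behind the spatial-index comparison above.
qpm = 100                  # geofence queries per minute
without = qpm * 8.5        # 850 s of query CPU per minute -- oversubscribed
with_idx = qpm * 0.012     # 1.2 s of query CPU per minute
print(f"speedup {8.5 / 0.012:.0f}x, "
      f"load reduction {1 - with_idx / without:.2%}")
```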

Cost Analysis:

  • Index creation time: 45 seconds (one-time)
  • Index storage: 120 MB (0.02% of table size)
  • Index maintenance: 2% write overhead
  • Query improvement: 708x faster

Key Insight: Spatial indexes transform IoT geofencing from unusable (8.5s) to real-time (12ms) at negligible cost.

| Characteristic | Continuous Aggregate | Materialized View | Which to Choose? |
|---|---|---|---|
| Data freshness | Auto-refresh (5 min) | Manual REFRESH | Continuous if need recent data |
| Compute cost | Incremental (only new data) | Full recompute | Continuous for large datasets |
| Setup complexity | Medium (policies) | Low (simple SQL) | Materialized for prototypes |
| Use case | Real-time dashboards | Daily/weekly reports | Continuous for production |
| Storage | Stores aggregate only | Stores full result | Continuous saves space |

Example Decision:

Scenario: Fleet dashboard showing “Average speed by vehicle in last 24 hours”

Option A: Real-time query (no pre-aggregation):

SELECT vehicle_id, AVG(speed_kmh) FROM vehicle_telemetry
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY vehicle_id;
-- Scans 86.4M rows (10K vehicles x 8,640 readings/day at 10s intervals)
-- Query time: 4.2 seconds

Option B: Continuous aggregate (pre-computed 5-min buckets):

CREATE MATERIALIZED VIEW vehicle_speed_5min
WITH (timescaledb.continuous) AS
SELECT time_bucket('5 minutes', time) AS bucket, vehicle_id, AVG(speed_kmh) as avg_speed
FROM vehicle_telemetry GROUP BY bucket, vehicle_id;

-- Dashboard queries 2.88M pre-aggregated rows (10K vehicles x 288 buckets per day)
-- Query time: 18 milliseconds
-- Refresh cost: 1 second every 5 minutes (incremental)

Choice: Use continuous aggregate (30x fewer rows to scan, significantly faster query times, always fresh).

Key Insight: For frequently-run IoT queries, continuous aggregates pay for themselves after ~10 queries.
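The row-count arithmetic behind this choice, as a quick check:

```python
# Rows scanned by the 24-hour dashboard query, raw vs. pre-aggregated.
raw_rows = 10_000 * 8_640   # 86.4M raw readings per 24 h (10 s interval)
agg_rows = 10_000 * 288     # 2.88M 5-minute buckets per 24 h
print(f"{raw_rows // agg_rows}x fewer rows; "
      f"query {4.2 / 0.018:.0f}x faster")
```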

Common Mistake: Not Planning for Data Migration

The Error: Starting with InfluxDB for simplicity, then realizing you need SQL joins after storing 5 TB of data.

Migration Nightmare:

Month 0: Deploy InfluxDB (fast, works great)
Month 6: Need to join sensor data with customer billing
Month 7: Realize InfluxDB cannot join with PostgreSQL easily
Month 8: Proposal: Migrate 5 TB to TimescaleDB
Month 9: Export attempt: 2 weeks continuous export (InfluxDB chokes)
Month 10: Incremental export + live dual-write (complex, error-prone)
Month 12: Migration complete, 3 months of dev time, $50K cost

Prevention: Start with flexible architecture using message queues:

[Sensors] -> [Kafka/MQTT] -> [Stream Processor] +-> [InfluxDB] (time-series)
                                                +-> [PostgreSQL] (metadata)
                                                +-> [MongoDB] (events)

Benefits:

  • Add/remove databases without sensor changes
  • Replay history to new database if needed
  • Easy A/B testing of database options

Migration Example (if you must):

# Step 1: Export InfluxDB to CSV (chunked by time)
# Note: InfluxDB v2 uses Flux; v3 uses SQL. Adjust query syntax to your version.
for month in {1..12}; do
    start=$(printf "2024-%02d-01T00:00:00Z" $month)
    if [ $month -eq 12 ]; then
        stop="2025-01-01T00:00:00Z"
    else
        stop=$(printf "2024-%02d-01T00:00:00Z" $((month+1)))
    fi
    influx query \
      "from(bucket: \"sensors\") |> range(start: ${start}, stop: ${stop})" \
      --raw > export_2024_$(printf "%02d" $month).csv
done

# Step 2: Import to TimescaleDB (transform CSV headers as needed)
for file in export_*.csv; do
    psql -c "\\COPY sensor_readings FROM '${file}' CSV HEADER"
done

# Step 3: Validate row counts match between source and target
influx query 'from(bucket: "sensors") |> count()' --raw
psql -c "SELECT COUNT(*) FROM sensor_readings;"

Time estimate: 5 TB at 10 MB/s = 500K seconds = 6 days export + 6 days import = 12 days
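The time estimate, as a quick check:

```python
# Export-time arithmetic for the 5 TB migration above.
tb = 5
throughput_mb_s = 10
seconds = tb * 1e6 / throughput_mb_s   # 500,000 s
days = seconds / 86_400                # ~5.8 days each way
print(f"{seconds:,.0f} s ≈ {days:.1f} days export + {days:.1f} days import")
```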

Key Insight: Migrating databases costs 10-100x more than choosing correctly upfront. Plan for growth.

Key Concepts
  • Fleet Telemetry Schema: A time-series table design for vehicle IoT data using device ID + timestamp as the composite key, with partitioning by vehicle group to balance write distribution and query co-location
  • Smart City Data Lake: A multi-tier storage architecture combining real-time streaming ingestion (Kafka), hot time-series storage (TimescaleDB), and cold archival (S3/Parquet) for city-scale sensor networks
  • Ingestion Pipeline: The data path from IoT device through broker (MQTT/Kafka) to database, including validation, enrichment, and routing stages that determine data quality and storage destination
  • Continuous Aggregate Materialization: Pre-computing hourly and daily rollups in TimescaleDB that auto-refresh as new sensor data arrives, enabling fleet-wide dashboards to query aggregates rather than billions of raw rows
  • Polyglot Persistence Pattern: Using PostgreSQL for device registry (relational), TimescaleDB for telemetry (time-series), Redis for real-time state (key-value), and S3/Parquet for cold archival (object) in a single production system
  • Partition Pruning: The query optimizer technique of skipping irrelevant time chunks based on WHERE clause time predicates, essential for achieving sub-second query times on multi-year IoT datasets
  • Write-Ahead Log Sizing: Configuring PostgreSQL/TimescaleDB WAL buffer and checkpoint settings to absorb bursty IoT write patterns without excessive I/O stalls during checkpoint operations
  • Tiered Compression Ratio: The actual storage reduction measured after applying columnar compression to IoT telemetry – typically 10-20x for float sensor values, 5-8x for mixed types – used to size storage budgets

Common Pitfalls

Storing telemetry for 50 different device types (vehicles, HVAC, water meters) in a single wide table with nullable columns for type-specific fields creates sparse, inefficient storage and slow queries. Use inheritance or separate hypertables per device category, joined through a common device registry.

An IoT worked example that performs perfectly with 100 simulated devices may collapse with 10,000 real devices due to lock contention, connection pool exhaustion, or disk I/O saturation. Always load test with realistic device counts, message rates, and concurrent query loads before cutover.

Running real-time SUM/AVG across all raw sensor data for a fleet dashboard query causes full table scans on billions of rows. Create continuous aggregates (hourly, daily) during initial schema design – retrofitting them into a running production system requires careful migration to avoid query downtime.

8.6 Summary

  • Fleet management example demonstrates database selection trade-offs (TimescaleDB vs InfluxDB vs MongoDB) with geospatial requirements as the deciding factor
  • Smart city data lake example shows tiered storage architecture reducing costs by 99.9% through edge processing, compression, and lifecycle policies
  • Schema evolution strategies enable gradual changes without breaking existing devices or queries
  • Cost optimization requires understanding access patterns to place data in appropriate storage tiers
  • Query optimization through continuous aggregates and proper indexing achieves sub-100ms latency

8.7 Concept Relationships

Prerequisites - Complete these first:

Related Concepts - Connect with:

Practical Applications:

8.8 What’s Next

You have completed the Data Storage and Databases chapter series. The worked examples here demonstrated how to translate requirements into concrete architectures. Your next step depends on where you want to go from here:

| If you want to… | Read this next |
|---|---|
| Scale storage for massive IoT datasets | Big Data Overview |
| Process data at the edge before storing it | Edge Compute Patterns |
| Explore cloud-native storage architectures | Data in the Cloud |
| Connect heterogeneous data sources and formats | Interoperability |
| Build real-time data pipelines with stream processing | Stream Processing |

8.9 Resources

8.9.1 Databases

8.9.2 Tools and Libraries

8.9.3 Books

  • “Designing Data-Intensive Applications” by Martin Kleppmann
  • “Database Internals” by Alex Petrov
  • “NoSQL Distilled” by Pramod Sadalage and Martin Fowler