2 Data Storage Overview
Learning Objectives
After completing this chapter series, you will be able to:
- Select appropriate database types for different IoT use cases
- Analyze CAP theorem trade-offs and apply them to IoT system design
- Implement relational, NoSQL, and time-series databases for IoT data
- Design data partitioning and sharding strategies
- Configure data quality monitoring and retention policies
- Compare tiered storage architectures to optimize costs
MVU: Minimum Viable Understanding
Core concept: Database selection depends on your data shape (relational vs. document vs. time-series) and the CAP theorem trade-offs between consistency, availability, and partition tolerance. Why it matters: The wrong database choice leads to 10x higher costs, query timeouts, and scaling nightmares; the right choice enables real-time analytics at IoT scale. Key takeaway: Use relational databases for transactional data with relationships, time-series databases for sensor readings, and document stores for flexible device metadata–then implement multi-tier retention (hot/warm/cold) to control storage costs.
2.1 Prerequisites
Before diving into this chapter series, you should be familiar with:
- Basic programming concepts: Understanding of data structures, variables, and simple queries
- IoT Fundamentals: Basic understanding of IoT devices and data flow
- Edge, Fog, and Cloud Architecture: Where data is generated and processed in IoT systems
For Beginners: What is a Database?
Imagine your bedroom is full of LEGOs!
- Without organization: LEGOs are scattered everywhere. Finding a specific piece takes forever, and you might step on them!
- With a database: It’s like having a special LEGO organizer with labeled drawers. Red bricks go here, wheels go there, and special pieces have their own spot.
A database is like a super-smart organizer for computer information. When your smart home collects data from sensors (like how hot it is or when someone opened the door), it needs a place to store all that information so it can find it again quickly.
Think of it this way:
| Real Life | Database World |
|---|---|
| Photo album | Stores pictures from security cameras |
| Diary with dates | Stores temperature readings over time |
| Address book | Stores information about your smart devices |
| Toy box with sections | Different databases for different types of data |
Why does IoT need good databases? Your smart home might record the temperature every minute. That’s 1,440 readings per day, or over 525,000 per year–just from ONE sensor! Without a good database, finding “what was the temperature last Tuesday at 3pm?” would be like finding a specific LEGO in a huge pile.
Sensor Squad: The Data Library Adventure!
Sammy the Sensor says: “Hey friends! Let me tell you about the coolest library in the world–one that stores data from sensors!”
2.1.1 The Sensor Squad Builds a Weather Station
Sammy and the squad were building a weather station for their school. Soon they had a problem: too much data!
Lila checked the thermometer: “It takes the temperature every minute. That’s 1,440 readings per day!”
Max looked at the rain gauge: “And we have humidity, wind speed, and rainfall too. That’s thousands of numbers!”
Bella asked the big question: “Where do we put all this data so we can find it later?”
Sammy had an idea: “We need a database! It’s like a special library for numbers!”
2.1.2 Three Types of Data Libraries
The squad learned that different data needs different storage:
| Data Type | Best Storage | Real-World Example |
|---|---|---|
| Who owns which sensor | Spreadsheet-style (Relational) | Like a class roster with names and seat numbers |
| Sensor readings over time | Time-diary (Time-Series) | Like a captain’s log on a ship |
| Flexible notes and configs | Folder system (Document) | Like a folder with different papers inside |
Max figured it out: “So temperature readings go in the time-diary, but information about the sensor goes in the spreadsheet!”
Bella nodded: “And if we want to add new types of information later, the folder system is flexible!”
The Sensor Squad learned: Different data needs different homes, just like different toys need different organizers!
Try This at Home: Think about data in your house. Your family photos are like documents (flexible). Your calendar is like a time-diary (ordered by date). Your contact list is like a spreadsheet (organized rows and columns). What other examples can you find?
2.2 Introduction
IoT systems generate diverse data types requiring different storage strategies. Sensor readings demand time-series optimization, device metadata needs relational structure, and multimedia content requires object storage. Choosing the right database technology is crucial for performance, scalability, and cost-effectiveness.
This chapter series covers all aspects of IoT data storage:
2.2.1 Chapter Guide
| Chapter | Focus | Best For |
|---|---|---|
| Database Selection Framework | Choosing the right database type | Starting a new IoT project |
| CAP Theorem and Database Categories | Distributed systems trade-offs | Designing for scale and reliability |
| Time-Series Databases | TimescaleDB, InfluxDB optimization | Sensor data storage |
| Data Quality Monitoring | Quality metrics and validation | Production systems |
| Sharding Strategies | Horizontal scaling patterns | Large-scale deployments |
| Worked Examples | Fleet management, data lake design | Learning by example |
2.3 Key Terminology
Before exploring IoT data storage, familiarize yourself with these essential terms:
| Term | Definition | IoT Example |
|---|---|---|
| Database | Organized collection of data that can be easily accessed, managed, and updated | Storing all temperature readings from a smart thermostat |
| Query | A request for data from a database | “Show me all temperatures above 25C from last week” |
| Schema | Structure that defines how data is organized | Columns: timestamp, device_id, temperature, humidity |
| CRUD | Create, Read, Update, Delete–the four basic database operations | Adding a new sensor reading (Create), viewing historical data (Read) |
| ACID | Atomicity, Consistency, Isolation, Durability–properties ensuring reliable transactions | Ensuring a payment transaction completes fully or not at all |
| Partition | Dividing data into smaller, manageable pieces | Storing each month’s sensor data in separate sections |
| Replication | Copying data to multiple locations for redundancy | Keeping backups in different data centers |
| Time-series | Data points indexed in time order | Temperature readings every 5 minutes for a year |
2.3.1 IoT Storage Challenges
IoT systems face unique storage challenges compared to traditional applications:
- Scale: Billions of devices generating petabytes of data annually
- Velocity: High-frequency writes (thousands of writes/second aggregated across a fleet)
- Variety: Structured metadata, semi-structured logs, unstructured video
- Retention: Long-term compliance storage (years) vs real-time buffers (hours)
- Cost: Storage costs can exceed compute costs at IoT scale
- Access patterns: Write-heavy ingestion, time-range analytical queries
The 3 Vs of IoT Data
IoT data is often characterized by Volume (massive amounts), Velocity (high speed), and Variety (many formats). Traditional databases designed for business applications often struggle with these characteristics, which is why specialized IoT databases have emerged.
2.3.2 IoT Data Flow and Storage Architecture
In a typical pipeline, data flows from devices through edge/fog gateways to cloud ingestion services, which route each data type to the storage engines described below.
2.3.3 The Main Database Types for IoT
| Type | Analogy | Best For | Examples |
|---|---|---|---|
| Relational | Spreadsheet with strict columns | Device metadata, user accounts | PostgreSQL, MySQL |
| NoSQL (Document) | Flexible JSON file | Event logs, config data | MongoDB, DynamoDB |
| Key-Value / Cache | Dictionary for instant lookup | Latest sensor values, session data | Redis, Memcached |
| Time-Series | Optimized data log | Sensor readings, metrics | InfluxDB, TimescaleDB |
2.3.4 Quick Reference: Database Comparison
| Database Type | Best For | Scalability | Query Complexity | Write Speed |
|---|---|---|---|---|
| Relational (SQL) | Structured data, ACID | Vertical | High (SQL) | Medium |
| Document (NoSQL) | Semi-structured | Horizontal | Medium | High |
| Key-Value | Simple lookups | Horizontal | Low | Very High |
| Time-Series | Time-stamped data | Horizontal | Medium | Very High |
| Graph | Relationships | Horizontal | High (traversals) | Medium |
2.4 Where to Start
Choose your learning path based on your current needs:
Start here: Database Selection Framework
Learn the fundamentals of choosing the right database for your IoT use case. Covers SQL vs NoSQL, time-series databases, and decision frameworks.
Start here: CAP Theorem and Database Categories
Understand distributed systems trade-offs and how consistency, availability, and partition tolerance affect your architecture decisions.
Start here: Time-Series Databases
Deep dive into InfluxDB, TimescaleDB, and optimization techniques for high-frequency sensor data.
Start here: Worked Examples
Complete case studies including fleet management systems and smart city data lake architectures.
Worked Example: Designing Storage for 2,000-Sensor Agriculture System
Scenario: A smart agriculture company deploys 2,000 soil sensors (moisture, temperature, pH) reporting every 10 minutes across 500 hectares.
Note: The self-assessment above used 50 bytes for moisture-only readings. This expanded scenario includes moisture, temperature, and pH fields (80 bytes per record).
Step 1: Calculate Data Volume
Putting Numbers to It
To estimate storage needs, multiply sensors by readings per day by record size. For this agriculture deployment, \(\text{Daily volume} = 2{,}000 \times 144 \times 80 = 23{,}040{,}000\) bytes. Worked example: 2,000 sensors x 144 readings/day x 80 bytes = 23 MB/day, scaling to 8.4 GB/year before compression or aggregation.
Sensor count, reporting frequency, and record size are the three levers; try varying each to see how it affects storage requirements.
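A minimal Python sketch of the same estimate (the figures match the scenario above; the helper function is illustrative, not from any library):

```python
def daily_volume_bytes(sensors, readings_per_day, record_bytes):
    """Raw ingest volume per day, before compression or aggregation."""
    return sensors * readings_per_day * record_bytes

# Scenario: 2,000 sensors, one 80-byte record every 10 minutes (144/day)
daily = daily_volume_bytes(2_000, 144, 80)
yearly = daily * 365

print(f"Daily:  {daily / 1e6:.1f} MB")   # Daily:  23.0 MB
print(f"Yearly: {yearly / 1e9:.1f} GB")  # Yearly: 8.4 GB
```

Doubling the reporting frequency or the record size doubles the result, which is why per-record byte counts matter so much at fleet scale.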
Step 2: Identify Data Types and Access Patterns
| Data Type | Volume | Query Pattern | Retention |
|---|---|---|---|
| Sensor readings | 8.4 GB/year | Time-range, by field/sensor | 3 years |
| Sensor metadata | 500 KB | Lookup by ID, join with readings | Permanent |
| Irrigation events | 50 MB/year | Time-range, by field | 3 years |
| Weather data | 100 MB/year | Time-range correlation | 10 years |
| User accounts | 10 KB | Authentication, billing | Permanent |
Step 3: Select Databases (Polyglot Approach)
| Data Type | Database | Reasoning |
|---|---|---|
| Sensor readings | TimescaleDB | Time-series optimized, SQL for analytics, 10:1 compression |
| Sensor metadata | PostgreSQL | Relational (sensor → field → farm), ACID for config changes |
| Irrigation events | TimescaleDB | Same as readings for easy correlation |
| Weather data | TimescaleDB | Join with sensor data for ML models |
| User accounts | PostgreSQL | ACID transactions for billing |
Step 4: Design Tiered Storage
| Tier | Age | Data | Compression | Storage | Unit Cost ($/GB-month) | Monthly Cost |
|---|---|---|---|---|---|---|
| Hot | 0-30 days | 690 MB | 1:1 | 690 MB | $0.10 | $0.07 |
| Warm | 30-365 days | 7.7 GB | 10:1 | 770 MB | $0.02 | $0.02 |
| Cold | 1-3 years | 16.8 GB | 10:1 + downsample 6:1 | 280 MB | $0.004 | $0.001 |
| Total | 3 years | ~25 GB raw | ~1.7 GB effective | - | - | $0.09/month |
Note: The $0.09/month above covers storage costs only. The cost comparison in Step 6 includes managed database compute overhead ($1.08/month total).
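The Step 4 numbers can be reproduced in a few lines of Python. Tier boundaries, compression ratios, and per-GB prices are taken from the table above; treat them as planning assumptions, not provider quotes:

```python
DAILY_GB = 2_000 * 144 * 80 / 1e9  # ~0.023 GB/day raw ingest

tiers = [
    # (name, days retained in tier, effective compression ratio, $/GB-month)
    ("hot",  30,  1,  0.10),
    ("warm", 335, 10, 0.02),
    ("cold", 730, 60, 0.004),  # 10:1 compression x 6:1 downsampling
]

for name, days, ratio, price in tiers:
    stored_gb = days * DAILY_GB / ratio
    print(f"{name}: {stored_gb * 1000:.0f} MB -> ${stored_gb * price:.3f}/month")

total = sum(days * DAILY_GB / ratio * price for _, days, ratio, price in tiers)
print(f"total: ${total:.2f}/month storage-only")  # total: $0.09/month storage-only
```

The hot tier dominates the bill even though it holds the least history, which is the core argument for aging data into cheaper tiers.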
Step 5: Implement Schema
-- TimescaleDB hypertable for sensor readings
CREATE TABLE sensor_readings (
time TIMESTAMPTZ NOT NULL,
sensor_id TEXT NOT NULL, field_id TEXT NOT NULL,
moisture NUMERIC(5,2), temperature NUMERIC(4,1),
ph NUMERIC(3,1), battery_pct SMALLINT
);
SELECT create_hypertable('sensor_readings', 'time',
chunk_time_interval => INTERVAL '1 week');
-- Compress old data automatically
ALTER TABLE sensor_readings SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'sensor_id, field_id');
SELECT add_compression_policy('sensor_readings', INTERVAL '30 days');
-- Metadata tables (standard PostgreSQL)
CREATE TABLE fields (
field_id TEXT PRIMARY KEY, farm_id TEXT NOT NULL,
name TEXT, area_hectares NUMERIC(6,2));
CREATE TABLE sensors (
sensor_id TEXT PRIMARY KEY,
field_id TEXT NOT NULL REFERENCES fields(field_id),
location POINT, sensor_model TEXT);
-- Continuous aggregate for warm-tier daily summaries
CREATE MATERIALIZED VIEW daily_soil_summary
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', time) AS day, field_id,
AVG(moisture) as avg_moisture,
MIN(temperature) as min_temp, MAX(temperature) as max_temp
FROM sensor_readings GROUP BY day, field_id;
Step 6: Cost Comparison
| Approach | Storage Cost | Query Performance | Complexity | Total Annual Cost |
|---|---|---|---|---|
| All PostgreSQL (no optimization) | $30/month | Slow after 6 months | Low | $360 |
| TimescaleDB + compression | $5/month | Fast (time partitions) | Medium | $60 |
| TimescaleDB + tiered storage | $1.08/month | Fast hot, slower cold | High | $13 |
Costs include managed database compute + storage. Storage-only costs are lower (see Step 4 above).
Result: Tiered TimescaleDB architecture saves $347/year (96%) vs. naive PostgreSQL while maintaining fast queries on recent data.
Key Insight: Even for modest IoT deployments (2,000 sensors), proper database selection and tiering reduce costs by 95%+ without sacrificing performance.
Decision Framework: Database Type Quick Reference
Start here when designing IoT storage. Answer these questions to select your database types:
Question 1: Is your data timestamped and queried by time ranges?
- Yes → Time-series database (TimescaleDB, InfluxDB)
- No → Continue to Q2
Question 2: Does your data have fixed structure with relationships (foreign keys)?
- Yes → Relational database (PostgreSQL, MySQL)
- No → Continue to Q3
Question 3: Is your schema flexible or rapidly changing?
- Yes → Document database (MongoDB, DynamoDB)
- No → Continue to Q4
Question 4: Do you need sub-millisecond read access to latest values?
- Yes → Key-value cache (Redis, Memcached)
- No → Continue to Q5
Question 5: Are you storing large binary files (images, video)?
- Yes → Object storage (S3, Azure Blob, MinIO)
- No → Reconsider Q1-Q4
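The five questions can be encoded as a simple first-match function. This Python sketch mirrors the flow above; the function and parameter names are my own, not from any library:

```python
def pick_database(time_series=False, relational=False,
                  flexible_schema=False, hot_cache=False, binary=False):
    """Encode the Q1-Q5 decision framework: first 'yes' wins."""
    if time_series:      # Q1: timestamped, queried by time range?
        return "time-series (TimescaleDB, InfluxDB)"
    if relational:       # Q2: fixed structure with foreign keys?
        return "relational (PostgreSQL, MySQL)"
    if flexible_schema:  # Q3: flexible or rapidly changing schema?
        return "document (MongoDB, DynamoDB)"
    if hot_cache:        # Q4: sub-millisecond latest-value reads?
        return "key-value cache (Redis, Memcached)"
    if binary:           # Q5: large binary files?
        return "object storage (S3, Azure Blob, MinIO)"
    return "reconsider Q1-Q4"

# Smart-building examples from the mapping table below
print(pick_database(time_series=True))       # sensor readings
print(pick_database(relational=True))        # device registry
print(pick_database(flexible_schema=True))   # event logs
```

Because the questions are ordered, data that is both timestamped and flexible (like event logs) lands on the first matching answer; the footnoted exceptions in the mapping table show where human judgment overrides the default order.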
Real-World Mapping Example (smart building):
| Data | Q1 Time? | Q2 Relational? | Q3 Flexible? | Q4 Cache? | Q5 Binary? | Database |
|---|---|---|---|---|---|---|
| Sensor readings | ✓ | - | - | - | - | TimescaleDB |
| Device registry | ✗ | ✓ | - | - | - | PostgreSQL |
| Event logs | ✗1 | ✗ | ✓ | - | - | MongoDB |
| Latest temps | - | - | - | ✓ | - | Redis |
| Security footage | - | - | - | - | ✓ | S3 |
1 Event logs carry timestamps but are primarily queried by event type, severity, or content rather than time-range aggregation. Schema flexibility (Q3) is the dominant factor since log structures vary widely across device types.
Complexity Guide:
- 1 database type: OK for hobby projects (<100 devices)
- 2-3 database types: Standard for production IoT (1K-100K devices)
- 4-5 database types: Large-scale platforms (100K+ devices)
- 6+ database types: Complex ecosystems (avoid unless necessary)
2.4.1 Tiered Storage Cost Estimator
Explore how tiered storage reduces costs compared to storing all data in a single high-performance tier, adjusting the parameters to match your deployment.
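As a stand-in for an interactive estimator, a small sketch comparing a single hot tier against the three-tier layout from the worked example (unit prices and tier boundaries are illustrative assumptions):

```python
def storage_cost(daily_gb, retention_days, tiers):
    """Monthly storage cost for data retained across ordered tiers.

    tiers: list of (days_in_tier, compression_ratio, dollars_per_gb_month).
    Illustrative model; real pricing varies by provider.
    """
    cost, remaining = 0.0, retention_days
    for days, ratio, price in tiers:
        days = min(days, remaining)
        cost += days * daily_gb / ratio * price
        remaining -= days
    return cost

daily_gb = 0.023  # ~23 MB/day from the agriculture scenario
single = storage_cost(daily_gb, 1095, [(1095, 1, 0.10)])
tiered = storage_cost(daily_gb, 1095,
                      [(30, 1, 0.10), (335, 10, 0.02), (730, 60, 0.004)])
print(f"single tier: ${single:.2f}/mo, tiered: ${tiered:.2f}/mo")
# tiered storage costs a small fraction of the single-tier figure
```

Swap in your own daily volume, retention window, and tier prices to estimate savings for your deployment.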
2.5 Try It Yourself
Exercise: Design Storage for Your Smart Home
Scenario: You have a smart home with 15 sensors (temperature, humidity, motion, door/window contacts, energy meters). Design a storage architecture:
- Calculate daily data volume: Each sensor reports every 60 seconds, each reading is 40 bytes. How much data per day? Per year?
- Identify your data types: Which sensors produce time-series data (continuous readings)? Which produce event data (state changes only)?
- Select databases: Use the Q1-Q5 Decision Framework above to pick database types for each data category. Would you use one database or multiple?
- Design retention: How long do you need each type of data? What goes in hot storage vs cold storage? Does motion sensor data need the same retention as energy meter data?
Challenge: Calculate the monthly storage cost using the Tiered Storage Cost Estimator above. How much would you save with tiered storage vs storing everything in a single PostgreSQL database?
Bonus: Motion sensors and door contacts generate events (not continuous readings). How does this affect your database choice compared to temperature sensors that report every minute?
Common Mistake: Ignoring Time Zone Complexity
The Error: Storing sensor timestamps in local time without timezone offset, leading to daylight saving time bugs and aggregation errors.
What Goes Wrong:
Scenario: Smart home sensors across the US report temperature in local time without timezone info:
Sensor A (New York): 2024-11-03 01:30:00 (first occurrence, before 2 AM "fall back")
Sensor B (New York): 2024-11-03 01:30:00 (second occurrence, after 2 AM "falls back" to 1 AM)
Sensor C (Chicago): 2024-11-03 01:30:00 (first occurrence, DST transition hasn't happened yet)
Aggregation Query Problem:
SELECT AVG(temperature)
FROM sensor_readings
WHERE timestamp >= '2024-11-03 01:00:00'
AND timestamp < '2024-11-03 02:00:00';
What happens:
- The 1:00-2:00 AM hour occurs TWICE on DST “fall back” night (2 AM resets to 1 AM)
- Without timezone offsets, both occurrences look identical in the database
- Sensors report twice as many readings for the “ambiguous hour,” skewing averages
- Dashboard shows an inexplicable “spike” during DST transition
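The ambiguity is easy to demonstrate with Python's standard-library zoneinfo module (3.9+, requires system time-zone data or the tzdata package): the same wall-clock time maps to two different UTC instants, distinguished only by the fold attribute.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

ny = ZoneInfo("America/New_York")

# 01:30 on 2024-11-03 occurs twice in New York; `fold` picks which one.
first = datetime(2024, 11, 3, 1, 30, tzinfo=ny, fold=0)   # before fall back (EDT)
second = datetime(2024, 11, 3, 1, 30, tzinfo=ny, fold=1)  # after fall back (EST)

print(first.astimezone(timezone.utc))   # 2024-11-03 05:30:00+00:00
print(second.astimezone(timezone.utc))  # 2024-11-03 06:30:00+00:00
```

A database storing only the naive local string "2024-11-03 01:30:00" cannot tell these two instants apart, which is exactly the skew described above.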
Real-World Impact:
- HVAC system sees false temperature spike, triggers cooling → wasted energy
- Anomaly detection flags DST hours as “data quality issues” → 2x false alarms/year
- Historical analysis comparing Oct vs Nov has discontinuity → broken ML models
The Fix: ALWAYS store UTC, convert to local time only at presentation layer:
-- Correct: Store in UTC (TIMESTAMPTZ in PostgreSQL)
CREATE TABLE sensor_readings (
time TIMESTAMPTZ NOT NULL, -- Stores UTC internally
sensor_id TEXT,
temperature NUMERIC(4,1)
);
INSERT INTO sensor_readings VALUES
('2024-11-03 06:30:00+00'::TIMESTAMPTZ, 'sensor_A', 23.5); -- 01:30 AM EST = 06:30 UTC
-- Display in user's timezone
SELECT
time AT TIME ZONE 'America/New_York' as local_time,
temperature
FROM sensor_readings
WHERE time >= '2024-11-03 06:00:00+00'::TIMESTAMPTZ; -- 01:00 AM EST = 06:00 UTC
Best Practices:
- Sensors send UTC: Configure devices to use NTP and report in UTC (append ‘Z’)
- Database stores UTC: Use TIMESTAMPTZ (PostgreSQL) or store Unix epoch + UTC
- Application converts: Only convert to local time at the UI layer for display
- Never do timezone math manually: Use database functions (AT TIME ZONE) or libraries (Python zoneinfo/dateutil, JavaScript Intl.DateTimeFormat or luxon)
Test Your System:
- Insert test data spanning DST transitions (first Sunday of November, second Sunday of March in the US; e.g., Nov 3 and Mar 10 in 2024)
- Run aggregation queries across the transition
- Verify no duplicate/missing hours
Key Insight: UTC has no DST transitions. Store UTC, convert late.
2.6 Self-Assessment: Check Your Understanding
Common Pitfalls
1. Starting with a single relational database for all IoT data
PostgreSQL or MySQL can handle early prototypes, but inserting thousands of sensor readings per second quickly saturates write throughput due to index maintenance and WAL flushing. Plan for time-series storage from day one by using TimescaleDB (PostgreSQL-compatible) or InfluxDB, which are designed for this exact workload pattern.
2. Ignoring storage cost projections before deployment
IoT storage costs grow faster than expected. 100 sensors at 1-second intervals generate over 8.6 million rows per day, roughly 3 billion rows per year, approaching 1 TB once row overhead and indexes are included. Without storage tiering and retention policies in place from day one, storage costs compound rapidly. Calculate your data volume before choosing a storage tier strategy.
3. Treating all data with the same consistency requirements
Control commands for actuators (turn off motor) require strong consistency to prevent conflicting state; sensor telemetry (temperature readings) tolerates eventual consistency. Applying CP everywhere causes unnecessary availability loss; applying AP everywhere risks dangerous conflicting commands. Classify data by business impact of inconsistency.
Key Concepts
- Polyglot Persistence: Using multiple database types within one IoT system – relational for device metadata, time-series for sensor telemetry, document stores for event logs – to match each data type to its optimal storage engine
- Time-Series Database: A database engine optimized for high-velocity timestamped data, supporting automatic chunk partitioning, columnar compression, and continuous aggregates for IoT sensor streams
- Storage Tiering: A cost-optimization architecture dividing data into hot (SSD, recent days), warm (HDD, weeks), and cold (object storage, months/years) tiers based on access frequency
- CAP Theorem: The distributed systems constraint that limits databases to two of Consistency, Availability, and Partition tolerance – partition tolerance is mandatory in IoT, requiring a CP vs AP choice per data type
- Data Sharding: Horizontal partitioning of data across multiple database nodes using device ID, time range, or geography as the partition key to scale write throughput beyond single-node limits
- Retention Policy: An automated rule that compresses, downsamples, or deletes data after a configured age, preventing unbounded storage growth while preserving aggregate summaries of historical data
- Data Quality Dimension: A measurable attribute of data reliability – including completeness, accuracy, consistency, timeliness, and validity – used to score incoming sensor readings before storage
- Hypertable: A TimescaleDB abstraction that automatically partitions a table into time-based chunks, enabling efficient parallel queries and background compression without changing the SQL interface
2.7 Summary
This chapter introduced the fundamental concepts of IoT data storage:
- IoT data challenges include massive scale (billions of devices), high velocity (thousands of writes/second), data variety (structured, semi-structured, unstructured), and cost optimization (storage can exceed compute costs)
- Three main database types serve different IoT needs:
- Relational (SQL): Device metadata, user accounts, relationships (PostgreSQL, MySQL)
- NoSQL (Document/Key-Value): Flexible schemas, event logs, configuration (MongoDB, Redis)
- Time-Series: Timestamped sensor readings, metrics (InfluxDB, TimescaleDB)
- Database selection depends on data characteristics, write patterns, query patterns, and consistency requirements (CAP theorem trade-offs)
- Multi-tier storage (hot/warm/cold) balances performance with cost-effectiveness–keeping recent data in fast storage while archiving historical data to cheaper object storage
- Polyglot persistence uses multiple database technologies together, each optimized for specific data types
Quick Decision Framework
Use the Q1-Q5 Decision Framework in the collapsible section above to select your database type. The key questions address timestamped data, relational structure, schema flexibility, caching needs, and binary storage.
2.8 Concept Relationships
Data Storage Overview provides foundational context for all IoT data management:
- Foundation For: Database Selection Framework applies these concepts to technology choices
- Prerequisite For: Analytics & ML requires understanding of where training data is stored
- Enables: Stream Processing writes data to persistent storage
- Informs: Data Quality Monitoring validates data at storage layer
- Connects To: Edge/Fog/Cloud Architecture determines storage location
2.9 What’s Next
Your next step depends on what aspect of data storage you want to explore:
| Topic | Chapter | What You’ll Learn |
|---|---|---|
| Database selection | Database Selection Framework | How to choose SQL vs NoSQL vs time-series databases for specific IoT workloads |
| Distributed trade-offs | CAP Theorem and Database Categories | Consistency, availability, and partition tolerance trade-offs in distributed IoT systems |
| Time-series deep dive | Time-Series Databases | InfluxDB, TimescaleDB, and Prometheus for high-velocity sensor data |
| Data integrity | Data Quality Monitoring | Detecting and handling missing, duplicate, and out-of-range sensor readings |
| Horizontal scaling | Sharding Strategies | Partitioning data across nodes for millions of IoT devices |
| End-to-end design | Worked Examples | Fleet management and smart city data lake architecture case studies |
External Resources:
- IoT Data Management Best Practices: aws.amazon.com/iot/solutions/iot-data-management
- Time-Series Database Survey (2021): vldb.org/pvldb/vol14/p3294-kara.pdf
- “Designing Data-Intensive Applications” by Martin Kleppmann (O’Reilly, 2017) – Chapters 3, 5, 10