4  Stream Processing Architectures

Chapter Topics

  • Stream Processing: Overview of batch vs. stream processing for IoT
  • Fundamentals: Windowing strategies, watermarks, and event-time semantics
  • Architectures: Kafka, Flink, and Spark Streaming comparison
  • Pipelines: End-to-end ingestion, processing, and output design
  • Challenges: Late data, exactly-once semantics, and backpressure
  • Pitfalls: Common mistakes and production worked examples
  • Basic Lab: ESP32 circular buffers, windows, and event detection
  • Advanced Lab: CEP, pattern matching, and anomaly detection on ESP32
  • Game & Summary: Interactive review game and module summary
  • Interoperability: Four levels of IoT interoperability
  • Interop Fundamentals: Technical, syntactic, semantic, and organizational layers
  • Interop Standards: SenML, JSON-LD, W3C WoT, and oneM2M
  • Integration Patterns: Protocol adapters, gateways, and ontology mapping

Learning Objectives

After completing this chapter, you will be able to:

  • Compare Apache Kafka Streams, Apache Flink, and Spark Structured Streaming across latency, throughput, and deployment complexity
  • Select the appropriate stream processing architecture based on event volume, latency requirements, and team expertise
  • Evaluate push-based versus pull-based consumer models for IoT message queues
  • Apply key-based and round-robin partitioning strategies for Kafka topics based on processing requirements
  • Analyze trade-offs between managed and self-hosted Kafka deployments for IoT workloads

Stream processing architecture is the design for handling data that flows continuously, like a river of sensor readings. Unlike traditional systems where you store everything first and analyze later, stream processing examines data as it arrives – like a factory assembly line where each station processes items as they pass through.

In 60 Seconds

Apache Kafka, Apache Flink, and Spark Structured Streaming are the three dominant stream processing architectures for IoT. Kafka Streams is a lightweight library achieving 100K-500K events/sec per instance with 10-100ms latency and no separate cluster; Flink provides true event-by-event processing with 1-10ms latency and built-in CEP; Spark offers unified batch-plus-streaming with native ML integration. Most IoT projects should start with Kafka Streams for simplicity and migrate to Flink only when throughput exceeds 500K events/sec per instance or complex event patterns are needed.

Key Concepts
  • Lambda Architecture: Dual-path processing combining a batch layer (accurate, slow, Spark) with a speed layer (approximate, real-time, Storm/Flink) and serving layer merging results
  • Kappa Architecture: Simplified lambda variant using only a stream processing layer (Kafka + Flink) for both real-time and historical reprocessing, eliminating batch layer complexity
  • Micro-Batch Processing: Processing small batches (100ms-1s intervals) to balance throughput and latency — Apache Spark Structured Streaming, Storm Trident
  • True Streaming: Event-by-event processing with millisecond-scale latency — Apache Flink, Google Dataflow — providing the lowest possible end-to-end latency (Kafka Streams also processes record-by-record, but its commit interval typically pushes end-to-end latency to 10-100ms)
  • State Management: Maintaining aggregation state (running counts, sums, windows) across events in a fault-tolerant manner, with checkpointing for recovery
  • Windowing: Temporal grouping of events into bounded sets (tumbling, sliding, session windows) for aggregate computation over time periods
  • Event Time vs. Processing Time: Event time is when data was generated at the sensor; processing time is when it arrives at the processor — out-of-order delivery creates gaps between the two
  • Exactly-Once Processing: Guarantee that each event is processed exactly once even during failures, requiring distributed transactions or idempotent operations
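Several of these concepts can be made concrete in a few lines. The sketch below (plain Python, no framework; the function name and sample data are illustrative) groups out-of-order sensor readings into tumbling windows by event time rather than arrival time:

```python
from collections import defaultdict

def tumbling_windows(events, window_size_s):
    """Group (event_time_s, value) pairs into fixed, non-overlapping
    windows keyed by the window's start time (event-time semantics)."""
    windows = defaultdict(list)
    for event_time, value in events:
        window_start = (event_time // window_size_s) * window_size_s
        windows[window_start].append(value)
    # Aggregate each window (here: average) once grouping is complete.
    return {start: sum(vals) / len(vals) for start, vals in sorted(windows.items())}

# Readings arrive out of order (processing order != event order),
# but grouping by event time still assigns each to the right window.
readings = [(0, 20.0), (12, 21.0), (5, 22.0), (17, 23.0), (22, 19.0)]
print(tumbling_windows(readings, window_size_s=10))
# {0: 21.0, 10: 22.0, 20: 19.0}
```

Note that the reading with event time 5 arrives after the one with event time 12, yet still lands in the first window — the gap between event time and processing time that watermarks exist to manage.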

4.1 Stream Processing Architectures

⏱️ ~20 min | ⭐⭐⭐ Advanced | 📋 P10.C14.U03

4.1.1 Apache Kafka + Kafka Streams

Apache Kafka is a distributed streaming platform that combines message queuing with stream processing capabilities through Kafka Streams.

Tradeoff: Push-Based vs Pull-Based Consumer Models for IoT Message Queues

Option A (Push-Based Consumers - RabbitMQ/AMQP style):

  • Delivery model: Broker pushes messages to consumers immediately upon arrival
  • Latency: 1-5ms message delivery (minimal delay between publish and consume)
  • Consumer control: Limited - prefetch count only, broker controls pace
  • Throughput: 10K-50K msg/sec per consumer (limited by consumer processing speed)
  • Flow control: Prefetch limits prevent overwhelming slow consumers
  • Use cases: Real-time alerts, chat systems, command-and-control where immediate delivery is critical
  • Failure handling: Messages requeued on NACK, broker tracks delivery state

Option B (Pull-Based Consumers - Kafka style):

  • Delivery model: Consumers fetch batches of messages when ready
  • Latency: 10-100ms typical (batch accumulation + poll interval)
  • Consumer control: Full - consumers decide batch size, timing, and offset commits
  • Throughput: 100K-500K msg/sec per consumer (batch efficiency)
  • Flow control: Natural backpressure - consumers only pull what they can process
  • Use cases: Log aggregation, metrics collection, stream processing where throughput matters
  • Failure handling: Consumer rewinds offset, re-reads from log (no broker state)

Decision Factors:

  • Choose Push-Based when: Sub-10ms latency required (financial trading, emergency alerts), message ordering per consumer is critical, consumers are always connected and fast, simple request-reply patterns needed
  • Choose Pull-Based when: Throughput exceeds 50K msg/sec, consumers need replay capability (reprocess historical data), batch processing is acceptable, consumers may be temporarily slow or offline, at-least-once semantics with consumer-controlled commits needed
  • Hybrid approach: Use push (RabbitMQ) for real-time command channels, pull (Kafka) for telemetry streams - many IoT systems use both patterns for different data types
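A minimal sketch of the pull model can clarify why it yields natural backpressure and replay. The toy Log class below stands in for a single Kafka partition; it is not the Kafka client API, and all names are illustrative:

```python
class Log:
    """Toy append-only log standing in for one Kafka partition."""
    def __init__(self):
        self.messages = []

    def append(self, msg):
        self.messages.append(msg)

    def fetch(self, offset, max_batch):
        # Pull model: the consumer asks for up to max_batch messages at its
        # own offset; the broker keeps no per-consumer delivery state.
        return self.messages[offset:offset + max_batch]

log = Log()
for i in range(10):
    log.append(f"reading-{i}")

offset = 0   # consumer-owned position: rewinding it replays history
while offset < len(log.messages):
    batch = log.fetch(offset, max_batch=4)   # natural backpressure: pull
    for msg in batch:                        # only what we can process
        pass                                 # ... process msg ...
    offset += len(batch)                     # "commit" after the batch succeeds

print(offset)  # 10 — consumed in three pulls (4 + 4 + 2)
```

Because the consumer owns the offset, a crashed or slow consumer simply resumes (or rewinds) later — the push model would instead need the broker to track and requeue undelivered messages.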

Tradeoff: Key-Based Partitioning vs Round-Robin for Kafka Topics

Option A (Key-Based Partitioning - Same sensor always goes to same partition):

  • Routing: hash(sensor_id) % partition_count → deterministic partition assignment
  • Message ordering: Guaranteed per sensor (all readings from sensor X arrive in order)
  • Load distribution: Uneven if some sensors publish more than others (hot partitions)
  • Partition locality: Same consumer processes all data for a sensor (enables stateful processing)
  • Consumer scaling: Adding consumers may not help if load is concentrated in few partitions
  • Use cases: Sensor aggregations, session windowing, any stateful processing requiring per-key ordering

Option B (Round-Robin Partitioning - Messages distributed evenly across partitions):

  • Routing: Next message goes to next partition in sequence
  • Message ordering: None guaranteed (sensor X messages spread across all partitions)
  • Load distribution: Perfect even distribution regardless of publisher patterns
  • Partition locality: Each consumer sees random subset of sensors
  • Consumer scaling: Adding consumers linearly increases throughput
  • Use cases: Stateless processing (validation, enrichment), independent event handling, maximum throughput scenarios

Decision Factors:

  • Choose Key-Based when: Processing requires per-sensor state (running averages, anomaly detection), message ordering within a sensor matters (state machines, sequence detection), downstream systems expect sensor affinity (per-device dashboards)
  • Choose Round-Robin when: Each message is independent (fire-and-forget telemetry), maximum throughput is critical and ordering doesn’t matter, stateless transformations only (JSON parsing, format conversion), want to avoid hot partition problems from uneven sensor traffic
  • Metrics to monitor: Partition lag variance (key-based often shows 10x+ variance vs round-robin), consumer CPU imbalance across partitions, end-to-end latency p99 per partition

Example calculation: 100 sensors publishing 100,000 msg/sec in total, where 10% of the sensors generate 90% of the traffic, with 10 partitions:

  • Key-based: Hash collisions are likely to concentrate hot sensors onto fewer partitions, creating severe imbalance (some partitions handle 2-3 hot sensors while others sit nearly idle)
  • Round-robin: All 10 partitions run at a uniform ~10,000 msg/sec, but per-sensor ordering is lost
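The skew effect can be simulated directly. The sketch below (plain Python with illustrative names; md5 stands in for Kafka's murmur2 partitioner hash) routes the assumed 100K msg/sec through both strategies:

```python
import hashlib
from collections import Counter

def stable_hash(key):
    # Python's built-in hash() is salted per process; use a stable digest.
    # (Kafka's default partitioner actually uses murmur2 on the key bytes.)
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def partition_loads(partitions=10, sensors=100, total_rate=100_000):
    """Model the example: 10% of sensors carry 90% of the traffic."""
    hot = sensors // 10
    rates = {f"sensor-{i}": (0.9 * total_rate / hot if i < hot
                             else 0.1 * total_rate / (sensors - hot))
             for i in range(sensors)}

    # Key-based: hash(sensor_id) % partition_count — deterministic routing.
    key_based = Counter()
    for sensor, rate in rates.items():
        key_based[stable_hash(sensor) % partitions] += rate
    kb_loads = [key_based.get(p, 0.0) for p in range(partitions)]

    # Round-robin: perfectly even regardless of per-sensor skew.
    rr_loads = [total_rate / partitions] * partitions
    return kb_loads, rr_loads

kb, rr = partition_loads()
print("key-based   msg/sec per partition:", [round(x) for x in kb])   # typically uneven
print("round-robin msg/sec per partition:", [round(x) for x in rr])  # all 10000
```

Running this shows the key-based loads varying several-fold across partitions whenever hot sensors hash to the same partition, while round-robin stays flat at 10,000 msg/sec.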


Figure 4.1: Kafka Streams architecture showing sensor data ingestion through a distributed broker cluster with partitioned topics, a stream processing pipeline applying transformations and aggregations, and output routing to a time-series database and alert services

Figure 4.2: Alternative view: sequence diagram showing the temporal flow of a single sensor reading through the Kafka ecosystem. This timeline perspective illustrates the latency budget at each stage, from sensor publishing through broker replication, stream processing, and final routing to storage or alerting based on anomaly detection results.

Strengths:

  • Durability: All messages persisted to disk with configurable retention
  • Exactly-once semantics: Guarantees each message is processed exactly once (requires idempotent producers and transactional consumers; adds 3-5% throughput overhead)
  • High throughput: 1+ million messages per second per broker
  • Horizontal scalability: Add brokers and partitions as needed
  • Fault tolerance: Automatic failover with replication

Performance Metrics (Kafka Broker):

  • Broker throughput: 1-2 million messages/second per broker (producer-to-disk)
  • Broker latency: 2-10 milliseconds median (producer publish to consumer fetch)
  • Retention: Days to years of historical data
  • Partitions: Up to 200,000 per cluster (with KRaft, formerly limited by ZooKeeper)

Kafka Streams Processing Metrics (on top of broker):

  • Throughput: 100K-500K events/sec per application instance
  • End-to-end latency: 10-100 ms (depends on commit interval and processing complexity)
  • Scaling: Horizontal, by adding application instances (up to the partition count)

Best For: IoT applications requiring guaranteed delivery, historical replay capability, and integration with multiple downstream systems.

The Kafka architecture excels at decoupling producers from consumers, enabling independent scaling and resilience. This producer-consumer model is fundamental to modern stream processing systems.

4.1.1.1 How It Works: Consumer Group Rebalancing

Understanding how Kafka distributes partition consumption across consumers is essential for scaling IoT workloads:

Initial State (3 partitions, 1 consumer):

Topic: sensor-data (3 partitions)
├─ Partition 0 ───> Consumer A
├─ Partition 1 ───> Consumer A
└─ Partition 2 ───> Consumer A
Result: 100% CPU on Consumer A, no parallelism

Add Consumer (3 partitions, 2 consumers – triggers rebalance):

Rebalancing... (brief pause in consumption)
├─ Partition 0 ───> Consumer A
├─ Partition 1 ───> Consumer B
└─ Partition 2 ───> Consumer A
Result: Load distributed, ~2x throughput

Add Third Consumer (3 partitions, 3 consumers):

├─ Partition 0 ───> Consumer A
├─ Partition 1 ───> Consumer B
└─ Partition 2 ───> Consumer C
Result: Perfect 1:1 mapping, maximum parallelism for this topic

Add Fourth Consumer (3 partitions, 4 consumers – wasted capacity):

├─ Partition 0 ───> Consumer A
├─ Partition 1 ───> Consumer B
├─ Partition 2 ───> Consumer C
└─ (no partition) → Consumer D (IDLE!)
Result: Consumer D wastes resources, partition count limits parallelism

Key Insight: Maximum consumer parallelism equals the partition count. Over-provisioning consumers provides no benefit. Under-provisioning creates bottlenecks. This is why choosing the right partition count at topic creation is critical for IoT scaling.
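The rebalancing outcomes above follow from a simple assignment rule. This toy round-robin assignor (a simplification of Kafka's pluggable partition-assignment strategies) reproduces the four scenarios:

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment: partition i goes to consumer i mod N.
    (A simplified stand-in for Kafka's pluggable partition assignors.)"""
    assignment = {c: [] for c in consumers}
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

for group in (["A"], ["A", "B"], ["A", "B", "C"], ["A", "B", "C", "D"]):
    print(len(group), "consumer(s):", assign_partitions(3, group))
# With 4 consumers, D is assigned [] — idle, because 3 partitions cap
# the group's parallelism at 3.
```

The empty assignment for the fourth consumer is the code-level view of the "wasted capacity" scenario above.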

Tradeoff: Kafka Streams vs Apache Flink for IoT Stream Processing

Option A: Kafka Streams (Library-based)

  • Deployment model: Embedded library in your Java/Scala application
  • Infrastructure: No separate cluster - runs in your application containers
  • Operational cost: $200-800/month (Kafka cluster only, no separate Flink cluster)
  • State management: RocksDB embedded, state stored in Kafka topics
  • Throughput: 100K-500K events/sec per instance (scales with app instances)
  • Latency: 10-100ms typical (depends on commit interval)
  • Use cases: Stateful transformations, simple aggregations, microservice integration

Option B: Apache Flink (Distributed Framework)

  • Deployment model: Dedicated cluster with Job Manager and Task Managers
  • Infrastructure: Separate Flink cluster (3-10+ nodes typical)
  • Operational cost: $1,000-5,000/month (Kafka + Flink managed clusters)
  • State management: RocksDB or heap, checkpointed to S3/HDFS
  • Throughput: 1M-10M events/sec per cluster (horizontal scaling)
  • Latency: 1-10ms typical (true event-by-event processing)
  • Use cases: Complex event processing, ML inference, cross-stream joins, very high volume

Decision Factors:

  • Choose Kafka Streams when: Team already operates Kafka, processing logic fits in Java/Scala, 100K events/sec is sufficient, operational simplicity is critical, tight integration with Kafka ecosystem (Connect, Schema Registry)
  • Choose Flink when: Event volume exceeds 500K/sec, complex event patterns (CEP) needed, sub-10ms latency required, processing requires Python/SQL, cross-stream joins or sessionization needed
  • Real-world guidance: Start with Kafka Streams for most IoT projects - it handles 90% of use cases with simpler operations. Migrate to Flink when you hit throughput limits or need CEP capabilities.

Figure 4.3: Kafka stream architecture with distributed topic partitions and consumer groups, illustrating the log-based message ordering and parallel consumer scaling that enable high-throughput IoT data streaming

4.1.3 Apache Spark Structured Streaming

Spark Structured Streaming extends Spark’s batch processing capabilities to streaming data using a micro-batch approach.

Figure 4.7: Spark Structured Streaming architecture showing a driver program coordinating executor nodes that process micro-batches of IoT data, with DataFrame API, SQL query support, and ML pipeline integration for real-time predictions

Strengths:

  • Unified API: Same code for batch and streaming
  • ML integration: Native integration with MLlib for real-time predictions
  • SQL support: Query streams with SQL
  • Ecosystem: Leverage entire Spark ecosystem
  • Fault tolerance: Checkpoint-based recovery

Performance Metrics:

  • Latency: 100 ms to seconds (micro-batch mode); experimental continuous mode achieves ~1 ms but with limited operator support
  • Throughput: Millions of events per second per cluster
  • Trigger interval: Configurable from 100ms (high overhead) to 10+ seconds (efficient batching)
  • Scalability: Thousands of executors across hundreds of nodes

Best For: Applications requiring machine learning on streaming data, teams already using Spark, use cases where 1-second latency is acceptable.

4.1.4 Architecture Comparison

The following table consolidates the key differences between all three frameworks to support side-by-side evaluation:

Feature | Kafka Streams | Apache Flink | Spark Streaming
Processing Model | Record-by-record | Event-by-event | Micro-batch
Latency | 10-100 ms | 1-10 ms | 100 ms - 10 s
Throughput | 100K-500K events/sec (per instance) | Millions/sec (per cluster) | Millions/sec (per cluster)
Exactly-Once | Yes (via Kafka transactions) | Yes (via checkpoints) | Yes (via write-ahead log)
State Management | RocksDB / in-memory | RocksDB / heap | In-memory / HDFS / S3
Event Time | Yes | Advanced (watermarks, allowed lateness) | Yes
CEP Support | Limited (custom code) | Built-in (FlinkCEP library) | Limited
ML Integration | External | External (FlinkML experimental) | Native (MLlib)
Deployment | Library (no cluster) | Cluster (JobManager + TaskManagers) | Cluster (Driver + Executors)
Learning Curve | Medium | Steep | Medium
Best Use Case | Kafka-centric microservices | Complex patterns, low latency | ML + streaming, batch unification

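The decision guidance in this section can be condensed into a small heuristic. The function below is an illustrative sketch of those rules of thumb, not an official sizing tool:

```python
def recommend_architecture(events_per_sec, max_latency_ms,
                           needs_cep=False, needs_native_ml=False):
    """Heuristic encoding of this section's rules of thumb
    (an illustrative sketch, not an official sizing tool)."""
    # Flink: CEP patterns, sub-10ms latency, or >500K events/sec per instance.
    if needs_cep or max_latency_ms < 10 or events_per_sec > 500_000:
        return "Apache Flink"
    # Spark: native ML integration, when 100ms+ latency is acceptable.
    if needs_native_ml and max_latency_ms >= 100:
        return "Spark Structured Streaming"
    # Default: Kafka Streams covers most IoT workloads with simpler operations.
    return "Kafka Streams"

print(recommend_architecture(20_000, 5_000))                      # Kafka Streams
print(recommend_architecture(1_000_000, 5, needs_cep=True))       # Apache Flink
print(recommend_architecture(50_000, 500, needs_native_ml=True))  # Spark Structured Streaming
```

The ordering of the checks matters: hard constraints (CEP, latency, throughput) are tested before preferences (ML integration), mirroring the elimination steps used in the worked example later in this chapter.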

Beyond choosing a processing framework, teams must decide how to deploy and operate their Kafka infrastructure:

Tradeoff: Managed Kafka Services vs Self-Hosted Kafka for IoT

Option A (Managed - Confluent Cloud/AWS MSK/Azure Event Hubs):

  • Setup time: 15-30 minutes (console wizard, no infrastructure provisioning)
  • Availability SLA: 99.95% (Confluent), 99.9% (MSK), automatic broker replacement
  • Scaling: Automatic partition rebalancing, 10,000+ partitions supported
  • Throughput cost at 100MB/sec: ~$2,000-4,000/month (pay per throughput unit)
  • Ops overhead: Zero broker management, automated patches, built-in monitoring
  • Features included: Schema Registry, Connect, KSQL (Confluent), CDC connectors
  • Vendor lock-in: Moderate (Kafka protocol portable, but proprietary extensions)

Option B (Self-Hosted - Apache Kafka on EC2/Kubernetes):

  • Setup time: 1-3 days (cluster provisioning, KRaft metadata quorum or legacy ZooKeeper, security hardening)
  • Availability: Manual multi-AZ setup required, failover in 30-120 seconds
  • Scaling: Manual partition reassignment, rolling restarts for config changes
  • Throughput cost at 100MB/sec: ~$800-1,500/month (3-5 broker cluster on reserved instances)
  • Ops overhead: 10-20 hours/month (upgrades, monitoring, incident response, capacity planning)
  • Features included: Core Kafka only (Schema Registry, Connect require separate setup)
  • Vendor lock-in: None (full control, portable to any cloud or on-premises)

Decision Factors:

  • Choose Managed when: Team lacks Kafka expertise (learning curve is steep), time-to-market critical (<4 weeks), need enterprise features (RBAC, audit logs, tiered storage), throughput <500MB/sec (cost-effective range)
  • Choose Self-Hosted when: Throughput exceeds 500MB/sec (managed costs escalate), regulatory requirements mandate infrastructure control, need custom patches or bleeding-edge features, team has 2+ Kafka-experienced engineers
  • Break-even analysis: At 50MB/sec, managed wins. At 500MB/sec, self-hosted saves $20K-40K/year. Factor in $150K/year fully-loaded engineer cost for ops time.
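The break-even reasoning can be made explicit by pricing self-hosted ops time at the fully loaded engineer rate. The midpoint figures below are assumptions for illustration, not vendor pricing:

```python
def self_hosted_true_cost(infra_per_month, ops_hours_per_month,
                          engineer_cost_per_year=150_000):
    """Self-hosted 'sticker price' understates cost: add ops time priced
    at the fully loaded engineer rate (~2,080 working hours/year)."""
    hourly = engineer_cost_per_year / 2080
    return infra_per_month + ops_hours_per_month * hourly

# At 100 MB/s, using midpoints of the ranges above (an assumption):
managed = 3_000                                  # $2,000-4,000/month managed
self_hosted = self_hosted_true_cost(1_150, 15)   # $800-1,500 infra, 10-20 h ops
print(f"managed ~${managed:,.0f}/month, self-hosted ~${self_hosted:,.0f}/month")
# The gap narrows once ops hours are priced in; it widens in self-hosting's
# favor at higher throughput, where managed pricing scales with volume.
```

At this scale the two options land within roughly $800/month of each other, which is why the decision usually hinges on team expertise and throughput growth rather than raw infrastructure cost.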

Tradeoff: Real-Time Dashboards vs Batch Analytics for IoT Monitoring

Option A (Real-Time Dashboards - Sub-second refresh):

  • Data freshness: 1-10 seconds (live sensor values)
  • Query latency: 50-200ms (pre-aggregated metrics in Redis/TimescaleDB)
  • Infrastructure cost: $500-2,000/month (always-on stream processing + hot storage)
  • Development complexity: High (streaming pipeline, windowed aggregations, push updates)
  • Use cases: Operations centers, live fleet tracking, real-time alerting, trading floors
  • Accuracy: Approximate (windowed aggregates may miss late data)
  • User experience: Animated updates, live charts, immediate feedback

Option B (Batch Analytics - Scheduled refresh):

  • Data freshness: 1-24 hours (scheduled ETL jobs)
  • Query latency: 1-30 seconds (ad-hoc queries on data warehouse)
  • Infrastructure cost: $100-500/month (serverless batch + cold storage)
  • Development complexity: Lower (SQL transformations, scheduled jobs, standard BI tools)
  • Use cases: Executive reporting, trend analysis, compliance audits, capacity planning
  • Accuracy: Exact (complete dataset, no approximations)
  • User experience: Static reports, PDF exports, email digests

Decision Factors:

  • Choose Real-Time when: Operators need to react within minutes (factory floor, NOC, dispatch), users expect live feedback (ride-sharing, delivery tracking), anomaly detection must trigger immediate action, competitive advantage depends on speed
  • Choose Batch when: Insights drive strategic decisions (monthly reviews, budgeting), accuracy matters more than speed (financial reconciliation, compliance), users prefer scheduled reports (email digests, PDF exports), cost efficiency is critical and hourly latency is acceptable
  • Tiered approach: Real-time dashboards for operations (last 24 hours), batch analytics for history (30+ days) - most mature IoT platforms implement both, serving different user personas and use cases

4.2 Worked Example: Choosing an Architecture for Smart Factory Monitoring

With the three architectures and their tradeoffs now established, let us walk through a realistic decision process.

Scenario: A factory with 2,000 sensors (vibration, temperature, current, acoustic) needs a stream processing system. Requirements:

  • Throughput: 2,000 sensors x 10 readings/sec = 20,000 events/sec (200 bytes each = 4 MB/sec)
  • Latency: Vibration anomaly alerts must fire within 5 seconds. Daily reports can be hours old.
  • Patterns: Need to detect “vibration spike followed by temperature rise within 60 seconds” (bearing failure precursor)
  • Team: 3 backend engineers, no Kafka or Flink experience. Budget: $3,000/month infrastructure.
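The sizing arithmetic is worth verifying mechanically; the few lines below reproduce the throughput figures from the requirements:

```python
SENSORS = 2_000
READINGS_PER_SEC = 10
BYTES_PER_READING = 200

events_per_sec = SENSORS * READINGS_PER_SEC
mb_per_sec = events_per_sec * BYTES_PER_READING / 1_000_000
print(f"{events_per_sec:,} events/sec, {mb_per_sec:.1f} MB/sec")
# 20,000 events/sec, 4.0 MB/sec

# Headroom against Kafka Streams' 100K-500K events/sec per instance:
print(f"utilization: {events_per_sec / 500_000:.0%}-{events_per_sec / 100_000:.0%}")
# utilization: 4%-20%
```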


Step 1: Eliminate by throughput

All three architectures handle 20K events/sec easily. Kafka Streams handles 100K-500K, Flink handles 1M+, Spark handles millions. No elimination here.

Step 2: Eliminate by latency

Spark Structured Streaming can achieve sub-5s latency with short micro-batch intervals (100-500ms), so it is not strictly eliminated by the latency requirement alone. However, Spark’s micro-batch model adds scheduling overhead per batch (driver coordination, task serialization), making latency less predictable under load spikes compared to Kafka Streams (10-100ms) and Flink (1-10ms), which both process records continuously without batch boundaries.

Step 3: Evaluate pattern detection needs

The “vibration spike THEN temperature rise within 60 seconds” pattern is a temporal sequence – exactly what Flink CEP excels at. Kafka Streams can implement this with custom windowed joins and state stores, but the code is significantly more complex and error-prone. Spark Structured Streaming lacks built-in CEP and would require even more custom logic using flatMapGroupsWithState, making it the weakest option for this pattern.
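To make the comparison concrete, here is a sketch of the custom stateful logic this pattern requires when no CEP library is available. It is a toy per-sensor state machine in plain Python, not Kafka Streams code; thresholds and field names are illustrative:

```python
def detect_bearing_precursor(events, window_s=60,
                             vib_threshold=5.0, temp_rise=2.0):
    """Flag 'vibration spike THEN temperature rise within window_s'.
    Toy per-sensor state machine; a real Kafka Streams version would
    keep this state in a state store keyed by sensor_id."""
    last_spike = {}      # sensor_id -> time of most recent vibration spike
    baseline_temp = {}   # sensor_id -> last temperature seen
    alerts = []
    for t, sensor, kind, value in sorted(events):  # (time_s, id, type, value)
        if kind == "vibration" and value > vib_threshold:
            last_spike[sensor] = t
        elif kind == "temperature":
            prev = baseline_temp.get(sensor)
            spike_t = last_spike.get(sensor)
            if (prev is not None and spike_t is not None
                    and value - prev >= temp_rise
                    and t - spike_t <= window_s):
                alerts.append((sensor, spike_t, t))
            baseline_temp[sensor] = value
    return alerts

events = [
    (0,  "m1", "temperature", 40.0),
    (10, "m1", "vibration",   7.2),   # spike
    (45, "m1", "temperature", 43.5),  # +3.5 C within 35 s -> alert
    (5,  "m2", "vibration",   7.9),   # spike, but...
    (90, "m2", "temperature", 41.0),  # first temp reading: no baseline yet
]
print(detect_bearing_precursor(events))
# [('m1', 10, 45)]
```

Even this simplified version must track per-sensor state, handle missing baselines, and bound the match window — the bookkeeping that FlinkCEP's declarative pattern API (`A followedBy B within 60s`) would handle for you, which is why the table below estimates weeks of effort for the hand-rolled versions.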

Step 4: Factor in team expertise and operational cost

Factor | Kafka Streams | Apache Flink | Spark Streaming
Learning curve for 3 engineers | 2-3 weeks | 4-6 weeks | 3-4 weeks
Infrastructure | Managed Kafka only (~$400/month) | Managed Kafka + Flink cluster (~$1,800/month) | Managed Kafka + Spark cluster (~$1,500/month)
CEP implementation effort | 3-4 weeks custom code | 1 week using FlinkCEP library | 4-5 weeks custom stateful code
Ongoing ops burden | Low (library, no cluster) | Medium (cluster monitoring, upgrades) | Medium (cluster + driver monitoring)
Total Year-1 cost (infra + engineering time) | ~$5,000 infra + 6 weeks eng | ~$22,000 infra + 5 weeks eng | ~$18,000 infra + 7 weeks eng

Decision: Kafka Streams with custom pattern detection.

At 20K events/sec, the factory is using only 4-20% of Kafka Streams’ capacity. The pattern detection is implementable (one temporal sequence, not dozens). The ~$17,000/year infrastructure savings versus Flink and simpler operations (no separate cluster) outweigh the 3-4 weeks of additional engineering effort for custom CEP logic. Spark is ruled out because its CEP implementation would require the most custom code while adding cluster management overhead.

When to reconsider: If the factory scales to 10 plants (200K events/sec) or needs 10+ complex event patterns, migrate to Flink. The Kafka topic structure remains unchanged – only the processing layer changes.

4.3 Architectural Patterns: Combining Batch and Stream

In practice, many IoT systems do not choose exclusively between stream and batch processing. Two hybrid architectures address this:

Lambda architecture showing data flowing through both batch layer for comprehensive processing and speed layer for real-time results, merging in a serving layer for unified query access

Lambda architecture runs parallel pipelines: a batch layer (Spark, Hadoop) recomputes complete, accurate results periodically, while a speed layer (Flink, Kafka Streams) provides low-latency approximate results from recent data. A serving layer merges both views. The tradeoff is operational complexity – maintaining two codebases that must produce consistent results.

Stream processing architecture showing IoT data sources feeding into stream processing engines with windowing, aggregation, and output sinks for real-time analytics pipelines

Kappa architecture simplifies Lambda by using a single stream processing pipeline for both real-time and historical reprocessing (replaying the event log). This reduces operational complexity but requires a stream processor capable of handling both use cases efficiently. Kafka’s log retention plus Flink or Kafka Streams is a common Kappa implementation.

Comparison diagram showing batch processing with periodic large-volume jobs versus stream processing with continuous small-increment processing, highlighting latency, throughput, and use case differences

Batch processing excels at comprehensive historical analysis with high throughput and exact results, while stream processing provides low-latency real-time insights with approximate (windowed) results. The choice depends on whether your IoT use case prioritizes completeness or timeliness.

Key Takeaway

For most IoT projects, start with Kafka Streams (library-based, no separate cluster, 100K-500K events/sec) for simplicity and lower operational cost. Migrate to Apache Flink only when throughput exceeds 500K events/sec, sub-10ms latency is required, or complex event processing (CEP) patterns are needed. Choose Spark Streaming when ML integration is essential and 100ms-1s latency is acceptable. The biggest mistake teams make is adopting Flink prematurely and spending more on operations than on business logic.

Which data highway should the Sensor Squad use? They compare three amazing roads!

The Sensor Squad needs to send millions of sensor messages every second. But which delivery system should they use? They test three options!

Road 1: Kafka Highway (the Librarian) Sammy the Sensor tries Kafka first. “It is like a giant library where every message is written in a book. Anyone can come read the books whenever they want, and the books stay on the shelves for days!”

Max the Microcontroller adds: “And the best part – you do not need a special building! Kafka Streams is a library that lives RIGHT INSIDE your own program. It handles 500,000 messages per second!”

Road 2: Flink Expressway (the Speed Demon) Lila the LED tests Flink: “This is a SUPER-FAST highway with special lanes! It processes each message one by one in just 1-10 milliseconds. And it has a special detective feature called CEP that can spot patterns like ‘alarm, then temperature spike, then smoke’ automatically!”

But Flink needs its own special building (cluster) with managers and workers. “It is faster but more complicated to set up,” notes Lila.

Road 3: Spark Boulevard (the Brainy One) Bella the Battery tries Spark: “This road collects messages into small bundles and processes them together, like a mail truck that delivers packages in batches. It takes about 1 second per batch, but it is AMAZING at machine learning – it can predict things while processing!”

The Verdict: “For most of our missions,” says Max, “Kafka Highway is perfect – it is simple, fast enough, and easy to manage. We only need Flink Expressway when we have millions of messages or need that special detective feature. And Spark Boulevard is great when we need our smart brains (ML) to work on live data!”

4.3.1 Try This at Home!

Think about three ways to deliver a birthday party invitation: (1) Text message (instant, like Flink), (2) Email (fast, like Kafka – and you can re-read it later!), (3) Mail a letter (slower, like Spark – but you can include a beautiful card!). Different messages need different delivery methods, just like different IoT data needs different processing systems!

4.4 Concept Check

4.5 Concept Relationships

Stream processing architectures connect to broader system design:

These connections show how the choice of architecture is shaped by the broader system ecosystem — storage, transport, and downstream consumers all constrain which framework fits.

4.6 See Also

Official Documentation:

Benchmarks and Comparisons:

4.7 Common Pitfalls

Lambda architecture requires maintaining two separate processing codebases (batch and streaming) that must produce equivalent results. This doubles testing burden, operational complexity, and code surface area. Before choosing Lambda, verify that Kappa architecture (streaming-only with historical reprocessing) cannot meet requirements — it often can.

Spark Structured Streaming micro-batching introduces 100ms-1s batching delay by design. For IoT applications requiring <100ms end-to-end latency (real-time control, safety alerts), micro-batch architectures are architecturally incompatible. Use true streaming frameworks (Flink, Kafka Streams) that process events individually.

Streaming pipelines that consume faster than downstream systems can absorb cause unbounded memory growth as buffers fill. Without backpressure mechanisms, the pipeline OOMs or drops data silently. Always design streaming pipelines with explicit backpressure — slow down ingestion when downstream is overwhelmed rather than buffering indefinitely.
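The fix is a bounded buffer between stages. The minimal sketch below (illustrative names, standard library only) shows the mechanism: a full queue blocks the producer instead of letting memory grow.

```python
import queue
import threading

buf = queue.Queue(maxsize=100)   # bounded buffer = explicit backpressure

def producer(n):
    for i in range(n):
        buf.put(i)   # blocks while the buffer is full, throttling ingestion
                     # instead of letting memory grow without bound

def consumer(n, out):
    for _ in range(n):
        out.append(buf.get())    # a slow sink here simply slows the producer

out = []
t = threading.Thread(target=producer, args=(1_000,))
t.start()
consumer(1_000, out)
t.join()
print(len(out), buf.qsize())  # 1000 0 — nothing dropped, memory capped at 100 items
```

Stream frameworks implement the same idea at scale: Flink propagates backpressure through its network buffers, and Kafka's pull model makes it implicit (consumers fetch only what they can handle).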

IoT sensors in poor connectivity areas send events late — sometimes minutes or hours after they were generated. Streaming windows that close strictly at event-time boundaries will drop these late events silently. Use watermarks with configurable allowed lateness to handle late events gracefully without indefinitely holding all windows open.
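A watermark with allowed lateness can be sketched in a few lines. The toy, framework-free implementation below accepts moderately late readings and drops only those beyond the grace period:

```python
def windows_with_lateness(events, window_s=60, allowed_lateness_s=120):
    """Event-time tumbling windows with a watermark: a window closes only
    when the watermark (max event time seen minus allowed lateness)
    passes its end, so moderately late sensor readings still count."""
    open_windows = {}   # window_start -> list of values
    closed = {}
    dropped = 0
    watermark = float("-inf")
    for event_time, value in events:            # arrival order, not event order
        watermark = max(watermark, event_time - allowed_lateness_s)
        start = (event_time // window_s) * window_s
        if start + window_s <= watermark:
            dropped += 1                        # too late even for the grace period
            continue
        open_windows.setdefault(start, []).append(value)
        # Finalize every window whose end the watermark has passed.
        for s in [s for s in open_windows if s + window_s <= watermark]:
            closed[s] = open_windows.pop(s)
    return closed, open_windows, dropped

arrivals = [(10, "a"), (70, "b"), (50, "c"),   # "c" is 20 s late but accepted
            (400, "d"), (30, "e")]             # "e" is far too late: dropped
closed, still_open, dropped = windows_with_lateness(arrivals)
print(closed, dropped)
# {0: ['a', 'c'], 60: ['b']} 1
```

The design tension is visible in the two parameters: a longer `allowed_lateness_s` loses fewer events but holds windows open (and state in memory) longer — exactly the trade-off Flink exposes through watermarks and allowed lateness.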

4.8 What’s Next

If you want to… | Read this
Study the fundamentals of stream processing | Stream Processing Fundamentals
Learn about specific streaming challenges | Handling Stream Processing Challenges
Build streaming pipelines hands-on | Building IoT Streaming Pipelines
Practice in the advanced CEP lab | Hands-On Lab: Advanced CEP

Continue to Building IoT Streaming Pipelines to learn how to design and implement complete real-time IoT data pipelines with appropriate components and stages.