12  Lab: Stream Processing

Chapter | Topic
Stream Processing | Overview of batch vs. stream processing for IoT
Fundamentals | Windowing strategies, watermarks, and event-time semantics
Architectures | Kafka, Flink, and Spark Streaming comparison
Pipelines | End-to-end ingestion, processing, and output design
Challenges | Late data, exactly-once semantics, and backpressure
Pitfalls | Common mistakes and production worked examples
Basic Lab | ESP32 circular buffers, windows, and event detection
Advanced Lab | CEP, pattern matching, and anomaly detection on ESP32
Game & Summary | Interactive review game and module summary
Interoperability | Four levels of IoT interoperability
Interop Fundamentals | Technical, syntactic, semantic, and organizational layers
Interop Standards | SenML, JSON-LD, W3C WoT, and oneM2M
Integration Patterns | Protocol adapters, gateways, and ontology mapping

Learning Objectives

After completing this lab, you will be able to:

  • Implement continuous data streaming with circular buffers for memory-efficient sensor data management on ESP32
  • Build tumbling and sliding window aggregations for periodic statistics and moving averages
  • Design threshold-based and rate-of-change event detection systems with alert cooldown mechanisms
  • Calculate and monitor pipeline throughput metrics for capacity planning and bottleneck detection
  • Map embedded stream processing patterns to their production equivalents in Kafka, Flink, and Spark

This hands-on lab lets you build a working stream processing system for IoT data. Think of it as your first time actually cooking a recipe instead of just reading about it. You will take raw sensor data flowing in real time and learn to filter, transform, and analyze it on the fly.

In 60 Seconds

This hands-on lab builds a complete stream processing system on ESP32, demonstrating continuous 10 Hz sensor ingestion, 5-second tumbling windows for periodic aggregation, 10-sample sliding windows for moving averages, threshold-based event detection with alert cooldown, and circular buffer management for bounded memory. These embedded patterns map directly to production concepts: circular buffers parallel Kafka partition retention, tumbling windows parallel Flink hourly aggregations, and alert cooldowns are a simple form of real-world backpressure.

Key Concepts
  • Kafka Consumer Group: Set of Kafka consumers sharing partition assignments, enabling parallel processing of event streams with automatic rebalancing when consumers join or leave
  • Stream Processing Topology: Directed acyclic graph of processing operators (source, filter, transform, aggregate, sink) defining how events flow through a streaming pipeline
  • Windowed Aggregation: Computing aggregate values (count, sum, average, max) over events within a defined time window (5-minute tumbling, 1-minute sliding)
  • Consumer Offset: Kafka bookmark tracking which events a consumer group has processed, enabling exactly-once or at-least-once resumption after restart
  • Stream Join: Combining related events from two Kafka topics based on a common key (device ID) within a configurable time window
  • Kafka Streams DSL: High-level API for defining stream processing topologies in Java/Kotlin using functional operators (filter, map, groupBy, aggregate)
  • Serdes (Serializer/Deserializer): Components converting between Java objects and byte arrays for Kafka topic storage and retrieval
  • State Store: Local persistent storage (RocksDB) within a Kafka Streams application maintaining aggregation state with automatic changelog topic backup for recovery

12.1 Lab: Build a Real-Time Stream Processing System

~45 min | Intermediate-Advanced | P10.C14.LAB01

What You’ll Build

A complete stream processing system on ESP32 that reads temperature, light, and threshold data at 10 Hz, processes it through tumbling and sliding windows, detects anomalies with alert cooldowns, stores samples in a circular buffer, and displays real-time throughput metrics over serial. You will wire up the circuit, load the code, and experiment with each processing stage hands-on.

12.1.1 Stream Processing Concepts Demonstrated

This lab implements several key stream processing patterns at the embedded level:

Concept | Implementation | Real-World Analogy
Continuous Streaming | Sensors read every 100 ms | Kafka consumer polling
Tumbling Windows | Non-overlapping 5-second windows | Hourly billing aggregates
Sliding Windows | 10-sample moving average | Real-time trend smoothing
Event Detection | Threshold crossing alerts | Anomaly detection triggers
Circular Buffer | Fixed-size ring buffer | Bounded memory streaming
Throughput Stats | Events per second tracking | Pipeline monitoring

12.1.2 Wokwi Simulator

Use the embedded simulator below to build your stream processing system:

12.1.3 Circuit Setup

Connect multiple sensors and LEDs to the ESP32:

Component | ESP32 Pin | Purpose
Temperature Sensor (NTC) | GPIO 34 | Primary data stream
Light Sensor (LDR) | GPIO 35 | Secondary data stream
Red LED | GPIO 18 | High temperature alert
Yellow LED | GPIO 19 | Rate of change alert
Green LED | GPIO 21 | Normal operation indicator
Potentiometer | GPIO 32 | Adjustable threshold

Add this diagram.json configuration in Wokwi:

{
  "version": 1,
  "author": "IoT Class - Stream Processing Lab",
  "editor": "wokwi",
  "parts": [
    { "type": "wokwi-esp32-devkit-v1", "id": "esp", "top": 0, "left": 0 },
    { "type": "wokwi-ntc-temperature-sensor", "id": "temp1", "top": -100, "left": 100 },
    { "type": "wokwi-photoresistor-sensor", "id": "ldr1", "top": -100, "left": 220 },
    { "type": "wokwi-potentiometer", "id": "pot1", "top": -100, "left": 340 },
    { "type": "wokwi-led", "id": "led_red", "top": 150, "left": 100, "attrs": { "color": "red" } },
    { "type": "wokwi-led", "id": "led_yellow", "top": 150, "left": 150, "attrs": { "color": "yellow" } },
    { "type": "wokwi-led", "id": "led_green", "top": 150, "left": 200, "attrs": { "color": "green" } },
    { "type": "wokwi-resistor", "id": "r1", "top": 200, "left": 100, "attrs": { "value": "220" } },
    { "type": "wokwi-resistor", "id": "r2", "top": 200, "left": 150, "attrs": { "value": "220" } },
    { "type": "wokwi-resistor", "id": "r3", "top": 200, "left": 200, "attrs": { "value": "220" } }
  ],
  "connections": [
    ["esp:GND.1", "temp1:GND", "black", ["h0"]],
    ["esp:3V3", "temp1:VCC", "red", ["h0"]],
    ["esp:34", "temp1:OUT", "green", ["h0"]],
    ["esp:GND.1", "ldr1:GND", "black", ["h0"]],
    ["esp:3V3", "ldr1:VCC", "red", ["h0"]],
    ["esp:35", "ldr1:OUT", "orange", ["h0"]],
    ["esp:GND.1", "pot1:GND", "black", ["h0"]],
    ["esp:3V3", "pot1:VCC", "red", ["h0"]],
    ["esp:32", "pot1:SIG", "purple", ["h0"]],
    ["esp:18", "led_red:A", "red", ["h0"]],
    ["led_red:C", "r1:1", "black", ["h0"]],
    ["r1:2", "esp:GND.2", "black", ["h0"]],
    ["esp:19", "led_yellow:A", "yellow", ["h0"]],
    ["led_yellow:C", "r2:1", "black", ["h0"]],
    ["r2:2", "esp:GND.2", "black", ["h0"]],
    ["esp:21", "led_green:A", "green", ["h0"]],
    ["led_green:C", "r3:1", "black", ["h0"]],
    ["r3:2", "esp:GND.2", "black", ["h0"]]
  ]
}

12.1.4 Complete Arduino Code

Copy this code into the Wokwi editor. The code is organized into five key sections explained below.

1. Configuration and Data Structures:

// Pin definitions
const int TEMP_PIN = 34, LIGHT_PIN = 35, THRESHOLD_PIN = 32;
const int LED_RED = 18, LED_YELLOW = 19, LED_GREEN = 21;

// Stream processing parameters
const int SAMPLE_INTERVAL_MS = 100;      // 10 Hz sampling
const int TUMBLING_WINDOW_MS = 5000;     // 5-second windows
const int SLIDING_WINDOW_SIZE = 10;      // 10-sample sliding window
const int BUFFER_SIZE = 100;             // Circular buffer capacity

struct SensorReading {
  unsigned long timestamp;
  float temperature;
  float light;
};

SensorReading circularBuffer[BUFFER_SIZE];
int bufferHead = 0, bufferCount = 0;

2. Circular Buffer – memory-bounded stream storage (maps to Kafka partitions):

void addToCircularBuffer(unsigned long ts, float temp, float light) {
  circularBuffer[bufferHead] = {ts, temp, light};
  bufferHead = (bufferHead + 1) % BUFFER_SIZE;
  if (bufferCount < BUFFER_SIZE) bufferCount++;
}

3. Tumbling Window – non-overlapping 5-second aggregation:

struct TumblingWindow {
  unsigned long windowStart;
  float tempSum, lightSum, tempMin, tempMax;
  int sampleCount;
};

TumblingWindow currentWindow;  // reset at the start of every window

void updateTumblingWindow(float temp, float light) {
  currentWindow.tempSum += temp;
  currentWindow.lightSum += light;
  currentWindow.sampleCount++;
  if (temp < currentWindow.tempMin) currentWindow.tempMin = temp;
  if (temp > currentWindow.tempMax) currentWindow.tempMax = temp;
}

4. Event Detection – threshold and rate-of-change with cooldown:

unsigned long lastAlertTime = 0;
const unsigned long ALERT_COOLDOWN_MS = 2000;  // 2-second alert cooldown
float lastTempAvg = 0.0;

void detectEvents(float currentTemp, float slidingAvg, float threshold) {
  unsigned long now = millis();
  bool inCooldown = (now - lastAlertTime) < ALERT_COOLDOWN_MS;

  if (currentTemp > threshold) {
    digitalWrite(LED_RED, HIGH);
    if (!inCooldown) {
      Serial.print("ALERT: Temp ");
      Serial.print(currentTemp, 1);
      Serial.print("C > threshold ");
      Serial.println(threshold, 1);
      lastAlertTime = now;
    }
  }

  float rateOfChange = fabs(slidingAvg - lastTempAvg);  // fabs: float-safe absolute value
  if (rateOfChange > 5.0 && !inCooldown) {
    digitalWrite(LED_YELLOW, HIGH);
    lastAlertTime = now;
  }
  lastTempAvg = slidingAvg;
}
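
The main loop in the next section calls updateSlidingWindow() and getSlidingAverage(), which the excerpts above do not define. A minimal sketch, assuming a shared ring buffer per stream (the buffer layout and the partial-fill behavior are assumptions, not the lab's exact code):

```cpp
const int SLIDING_WINDOW_SIZE = 10;

// Ring buffers holding the most recent samples of each stream.
float tempSlidingBuffer[SLIDING_WINDOW_SIZE];
float lightSlidingBuffer[SLIDING_WINDOW_SIZE];
int slidingIndex = 0;  // next write position (shared by both buffers)
int slidingCount = 0;  // samples stored so far, saturating at window size

void updateSlidingWindow(float temp, float light) {
  tempSlidingBuffer[slidingIndex] = temp;
  lightSlidingBuffer[slidingIndex] = light;
  slidingIndex = (slidingIndex + 1) % SLIDING_WINDOW_SIZE;
  if (slidingCount < SLIDING_WINDOW_SIZE) slidingCount++;
}

// Average over however many samples are present, so the moving average
// is usable before the window has filled (and avoids divide-by-zero).
float getSlidingAverage(const float* buf) {
  if (slidingCount == 0) return 0.0f;
  float sum = 0.0f;
  for (int i = 0; i < slidingCount; i++) sum += buf[i];
  return sum / slidingCount;
}
```

Averaging over `slidingCount` rather than the full window size is a design choice: it trades a slightly noisier start-up average for immediate output during the first second of streaming.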

5. Main Loop – the streaming pipeline orchestration:

void loop() {
  unsigned long now = millis();
  if (now - lastSampleTime >= SAMPLE_INTERVAL_MS) {
    lastSampleTime = now;
    float temp = readTemperature();
    float light = readLight();
    float threshold = readThreshold();

    addToCircularBuffer(now, temp, light);   // Store
    updateTumblingWindow(temp, light);        // Aggregate
    updateSlidingWindow(temp, light);         // Moving avg
    float slidingAvg = getSlidingAverage(tempSlidingBuffer);
    detectEvents(temp, slidingAvg, threshold);// Alert
    updateThroughput();                       // Metrics

    if (now - currentWindow.windowStart >= TUMBLING_WINDOW_MS) {
      emitTumblingWindowResults();
      resetTumblingWindow();
    }
  }
}
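
The loop also calls emitTumblingWindowResults() and resetTumblingWindow(), which the excerpts omit. A minimal desktop sketch of what they might do (plain C++; Serial output replaced with printf, and the windowStart update simplified since millis() is Arduino-only):

```cpp
#include <cstdio>
#include <cfloat>

struct TumblingWindow {
  unsigned long windowStart;
  float tempSum, lightSum, tempMin, tempMax;
  int sampleCount;
};

TumblingWindow currentWindow;

// Re-arm the window: min/max start at sentinel extremes so the first
// sample of the new window always replaces them.
void resetTumblingWindow() {
  // On the ESP32, windowStart would come from millis(); 0 here for the sketch.
  currentWindow = {0, 0.0f, 0.0f, FLT_MAX, -FLT_MAX, 0};
}

// Emit the closed window's aggregates; returns the temperature average
// so a caller (or a test) can inspect it.
float emitTumblingWindowResults() {
  if (currentWindow.sampleCount == 0) return 0.0f;
  float avg = currentWindow.tempSum / currentWindow.sampleCount;
  printf("WINDOW: n=%d avg=%.1f min=%.1f max=%.1f\n",
         currentWindow.sampleCount, avg,
         currentWindow.tempMin, currentWindow.tempMax);
  return avg;
}
```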

The complete code combines all sections above with setup, display functions, and sensor reading helpers. Copy the full code from the Wokwi simulator embedded above or expand this section to view it.

12.1.5 Step-by-Step Instructions

12.1.5.1 Step 1: Set Up the Simulator

  1. Open the Wokwi simulator embedded above (or visit wokwi.com)
  2. Create a new ESP32 project
  3. Click the diagram.json tab and paste the circuit configuration
  4. Replace the default code with the complete Arduino code above

12.1.5.2 Step 2: Run and Observe Basic Streaming

  1. Click the Play button to start the simulation
  2. Open the Serial Monitor to see the stream output
  3. Observe the continuous data flow:
    • RAW readings update every 100ms (10 Hz)
    • SLIDING averages smooth the data
    • BUFFER fills up over time

12.1.5.3 Step 3: Experiment with Tumbling Windows

  1. Wait 5 seconds for the first tumbling window to complete
  2. Observe the window summary: min, max, average, sample count
  3. Note: Each window is non-overlapping - no sample counted twice
  4. Think about: Why is this useful for billing or hourly reports?

12.1.5.4 Step 4: Adjust Threshold and Trigger Alerts

  1. Click and drag the potentiometer to lower the threshold
  2. Click the NTC sensor and increase temperature to trigger RED LED
  3. Observe the alert message in Serial Monitor
  4. Note the cooldown: Alerts don’t flood (2-second cooldown)

12.1.5.5 Step 5: Observe Rapid Change Detection

  1. Quickly change the temperature sensor value (click and drag rapidly)
  2. Watch for YELLOW LED indicating rapid change detected
  3. This demonstrates: Rate-of-change anomaly detection
  4. Real-world analogy: Detecting equipment failure signatures

12.1.5.6 Step 6: Understand the Circular Buffer

  1. Watch the BUFFER FILL visualization grow
  2. After ~10 seconds, buffer reaches 100% full
  3. New data overwrites old data - fixed memory usage
  4. This is critical: Embedded devices have limited RAM

12.1.6 Understanding the Output

Figure 12.1: Stream processing lab architecture showing data flow from sensors through buffer and windowing stages to event detection and outputs

12.1.7 Key Stream Processing Concepts Explained

12.2 Concept 1: Tumbling vs Sliding Windows

Tumbling Windows (implemented in this lab):

  • Non-overlapping, fixed-duration windows (5 seconds in our lab)
  • Each event belongs to exactly ONE window
  • Results emitted when window closes
  • Use case: Hourly billing summaries, periodic aggregates

Sliding Windows (implemented in this lab):

  • Overlapping windows that “slide” with each new event
  • Each event may belong to MULTIPLE windows
  • Continuously updated averages
  • Use case: Moving averages, trend smoothing, real-time dashboards

Memory Trade-off:

  • Tumbling: O(1) memory per window (just aggregates)
  • Sliding: O(window_size) memory (must store all samples)

Why Circular Buffers?

  • Streams are infinite, memory is finite
  • Fixed memory allocation prevents heap fragmentation
  • Oldest data automatically evicted when buffer full
  • O(1) insert and O(1) access by offset

Implementation Pattern:

buffer[head] = newData;          // Write at head
head = (head + 1) % BUFFER_SIZE; // Wrap around

Real-World Analogy: Like a security camera that records over the oldest footage - you always have the last N hours, never running out of storage.
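
The two-line pattern above extends naturally to reading the buffer back in arrival order. A self-contained sketch (plain C++, not the lab's exact code; a small size is used so the wrap-around is easy to trace):

```cpp
const int RING_SIZE = 4;  // small so wrap-around is easy to see
int ringBuf[RING_SIZE];
int ringHead = 0;   // next write position
int ringCount = 0;  // valid entries, saturates at RING_SIZE

void ringPush(int value) {
  ringBuf[ringHead] = value;              // write at head
  ringHead = (ringHead + 1) % RING_SIZE;  // wrap around
  if (ringCount < RING_SIZE) ringCount++; // old data overwritten once full
}

// i-th element counting from the oldest entry still present: O(1) access.
int ringGetByAge(int i) {
  int oldest = (ringHead - ringCount + RING_SIZE) % RING_SIZE;
  return ringBuf[(oldest + i) % RING_SIZE];
}
```

After pushing 1 through 6 into a 4-slot buffer, 1 and 2 have been overwritten; ringGetByAge(0) yields 3 and ringGetByAge(3) yields 6, exactly the security-camera behavior described above.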

Stream Processing Buffer Sizing for a 10 Hz IoT Sensor

Given a circular buffer storing 100 samples at 10 Hz sampling rate:

Time Coverage: Each sample arrives every \(\frac{1}{10 \text{ Hz}} = 100 \text{ ms}\). Total buffer time window: \[T_{\text{buffer}} = 100 \text{ samples} \times 0.1 \text{ s} = 10 \text{ seconds}\]

Memory Footprint: Each SensorReading struct holds a 4-byte timestamp and two 4-byte floats: \[M_{\text{buffer}} = 100 \times (4 + 4 + 4) = 1{,}200 \text{ bytes} \approx 1.17 \text{ KB}\]

Tumbling Window Sample Count: 5-second window at 10 Hz: \[N_{\text{tumbling}} = 5 \text{ s} \times 10 \text{ Hz} = 50 \text{ samples per window}\]

Sliding Window Latency: 10-sample sliding window introduces: \[\text{Latency}_{\text{sliding}} = \frac{10 \text{ samples}}{10 \text{ Hz}} = 1 \text{ second lag}\]

Event Detection Rate: With 2-second cooldown, maximum alert frequency: \[f_{\text{alert,max}} = \frac{1}{2 \text{ s}} = 0.5 \text{ Hz} = 30 \text{ alerts/min}\]

Throughput Calculation: At 10 Hz sampling, daily event count: \[N_{\text{daily}} = 10 \text{ events/s} \times 86{,}400 \text{ s/day} = 864{,}000 \text{ events/day}\]

Real-world insight: This lab’s 1.17 KB buffer demonstrates bounded-memory streaming—the same principle Kafka uses with partition retention, just at microcontroller scale instead of cluster scale.
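
The arithmetic above can be reproduced directly from the lab's configuration constants (a sketch; the 12 bytes per sample assumes the ESP32's 4-byte unsigned long with no struct padding):

```cpp
// Derive the lab's capacity numbers from its configuration constants.
const int SAMPLE_RATE_HZ   = 10;
const int BUFFER_SAMPLES   = 100;
const int WINDOW_SECONDS   = 5;
const int BYTES_PER_SAMPLE = 12;  // 4-byte timestamp + two 4-byte floats

int bufferSeconds() { return BUFFER_SAMPLES / SAMPLE_RATE_HZ; }    // time coverage
int bufferBytes()   { return BUFFER_SAMPLES * BYTES_PER_SAMPLE; }  // memory footprint
int windowSamples() { return WINDOW_SECONDS * SAMPLE_RATE_HZ; }    // samples per window
long dailyEvents()  { return (long)SAMPLE_RATE_HZ * 86400L; }      // events per day
```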

Threshold-Based Detection:

  • Simple: value > threshold triggers alert
  • Fast: O(1) computation per event
  • Limitation: Doesn’t detect trends, only instantaneous values

Rate-of-Change Detection:

  • Compares current average to previous average
  • Detects rapid changes even within “normal” range
  • Example: Temperature rising 10 °C per minute (could indicate fire)

Alert Cooldown (Backpressure):

  • Prevents alert flooding during sustained anomalies
  • Real systems use exponential backoff
  • Balance: Too short = alert fatigue, Too long = missed events
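
The exponential backoff mentioned above could be sketched like this (the doubling policy, cap, and constants are illustrative assumptions, not the lab's code):

```cpp
unsigned long cooldownMs = 2000;  // starts at the lab's 2-second value
const unsigned long COOLDOWN_CAP_MS = 60000;

// Called each time an alert fires: the next alert must wait twice
// as long, capped so sustained anomalies still surface periodically.
void onAlertFired() {
  cooldownMs *= 2;
  if (cooldownMs > COOLDOWN_CAP_MS) cooldownMs = COOLDOWN_CAP_MS;
}

// Called when a reading returns to normal: reset to the base cooldown.
void onConditionCleared() {
  cooldownMs = 2000;
}
```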

Why Track Throughput?

  • Detect processing bottlenecks before they cause data loss
  • Capacity planning for production deployments
  • Alerting when throughput drops (sensor failure?)

Metrics to Monitor:

  • Events per second (current throughput)
  • Total events processed (lifetime counter)
  • Buffer fill percentage (approaching capacity?)
  • Processing latency (time from read to output)

In Production: Use Prometheus, Grafana, or cloud monitoring to track these metrics with alerting thresholds.

12.2.1 Challenge Exercises

Challenge 1: Add Session Windows

Modify the code to implement session windows that group events by activity periods:

  • Start a new session when first event arrives after 3+ seconds of inactivity
  • Close session and emit results after 3 seconds of no new events
  • Track: session duration, event count, min/max/avg values

Hint: Track lastEventTime and check for gaps exceeding SESSION_GAP_MS
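
One way to start on this challenge, following the hint (a sketch; millis() is replaced by an explicit now parameter so the logic can run and be tested off-device):

```cpp
const unsigned long SESSION_GAP_MS = 3000;

unsigned long sessionStart = 0, lastEventTime = 0;
int sessionEvents = 0;
bool sessionOpen = false;

// Feed every event through here; returns true when a session just closed
// (i.e., this event arrived after a gap, ending the previous session).
bool onEvent(unsigned long now) {
  bool closed = false;
  if (sessionOpen && now - lastEventTime > SESSION_GAP_MS) {
    // Gap exceeded: the previous session ends here. Its duration
    // (lastEventTime - sessionStart) and event count would be emitted
    // before a new session starts.
    closed = true;
    sessionOpen = false;
  }
  if (!sessionOpen) {
    sessionStart = now;
    sessionEvents = 0;
    sessionOpen = true;
  }
  sessionEvents++;
  lastEventTime = now;
  return closed;
}
```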

Challenge 2: Implement Late Data Handling

Add support for simulated “late arriving” data:

  1. Add a button that injects an event with a timestamp 10 seconds in the past
  2. Determine which tumbling window this event belongs to
  3. If window already closed, route to “late data” output
  4. Display count of late events separately

Hint: Compare event timestamp to currentWindow.windowStart
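
Following the hint, the routing decision reduces to one comparison (a sketch; the function name and side-output counter are illustrative):

```cpp
int lateEventCount = 0;

// Returns true if the event belongs to the currently open window.
// Anything older than windowStart missed its window: the window has
// already closed and emitted, so the event goes to the late-data output.
bool routeEvent(unsigned long eventTs, unsigned long windowStart) {
  if (eventTs < windowStart) {
    lateEventCount++;   // late: route to side output, count separately
    return false;
  }
  return true;          // on-time: normal aggregation path
}
```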

Challenge 3: Multi-Sensor Correlation

Extend event detection to correlate multiple sensors:

  • Alert when BOTH temperature is high AND light is low (equipment overheating in darkness)
  • Track how often both conditions occur simultaneously
  • Implement a “compound event” counter

Hint: Add correlatedAlertCount and check both conditions in detectEvents()

Challenge 4: Adaptive Thresholds

Replace the fixed potentiometer threshold with an adaptive threshold:

  1. Calculate the moving standard deviation of temperature
  2. Set threshold = sliding_average + (2 * standard_deviation)
  3. This creates an adaptive anomaly boundary
  4. Compare alert rates between fixed and adaptive thresholds

Hint: Track sum of squares to calculate variance: variance = (sumSq/n) - (sum/n)^2
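
The hint's sum-of-squares identity can be coded directly (a sketch; note that with large sums this form suffers float cancellation, and Welford's online algorithm is more numerically stable):

```cpp
#include <cmath>

float runningSum = 0.0f, runningSumSq = 0.0f;
int sampleN = 0;

void addSample(float x) {
  runningSum += x;
  runningSumSq += x * x;
  sampleN++;
}

// threshold = mean + 2 * stddev, per the challenge description
float adaptiveThreshold() {
  float mean = runningSum / sampleN;
  float variance = (runningSumSq / sampleN) - mean * mean;  // E[x^2] - E[x]^2
  return mean + 2.0f * sqrtf(variance);
}
```

For samples 10 and 20, the mean is 15, the variance is 25, and the adaptive threshold is 15 + 2·5 = 25.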

12.2.2 Common Issues and Solutions

Issue | Cause | Solution
No serial output | Baud rate mismatch | Ensure both code and monitor use 115200
LEDs not lighting | Wrong pin connection | Check diagram.json connections match code
Buffer always 100% | Expected behavior | Circular buffer maintains fixed size by design
No tumbling window output | Not enough time | Wait the full 5 seconds for the first window
Alerts firing constantly | Threshold too low | Adjust the potentiometer higher

12.2.3 From Lab to Production

Each pattern in this lab has a direct production counterpart. The 10 Hz polling loop mirrors a Kafka consumer, the circular buffer mirrors Kafka partition retention, tumbling and sliding windows map to Flink or Spark Structured Streaming operators, threshold alerts map to PagerDuty integrations, and throughput stats map to Prometheus pipeline metrics. Building these on an ESP32 first gives you hands-on intuition for the same abstractions at cloud scale.

Key Takeaway

This lab demonstrates that stream processing fundamentals – circular buffers, tumbling/sliding windows, event detection, and throughput monitoring – can be implemented on a $5 ESP32 microcontroller. The patterns are identical to production systems: circular buffers map to Kafka topic partitions with retention, tumbling windows map to Flink/Spark hourly aggregations, and alert cooldowns implement the same backpressure concept used in PagerDuty integrations. Understanding these patterns on embedded hardware builds intuition that transfers directly to enterprise stream processing.

Worked Example: Conveyor Belt Throughput Monitoring

A packaging factory uses ESP32 microcontrollers to monitor conveyor belt throughput. Each sensor detects package passings and must detect stalls (no packages for 10 seconds) and overloads (>30 packages/minute sustained).

Implementation Details:

  • Sampling rate: 10 Hz (checking optical sensor every 100 ms)
  • Tumbling window: 60 seconds for throughput calculation
  • Sliding window: 10-second moving average to smooth jitter
  • Circular buffer: 600 samples (60 seconds x 10 Hz = 600 readings)
  • Alert cooldown: 30 seconds to prevent operator fatigue

Sample Calculation: If 45 packages pass in a 60-second tumbling window, throughput = 45 packages/minute. The sliding window averages the last 100 samples (10 seconds at 10 Hz) to smooth out brief sensor jitter. When average inter-package time exceeds 10 seconds OR throughput exceeds 30/min for 3 consecutive windows, the system triggers alerts.

Production Scaling: This same logic deployed across 12 conveyor lines processes 1,440 package detections/minute total (120 detections/min x 12 lines), while raw sensor data totals 7,200 samples/minute (600 samples/min x 12 lines). At scale, Apache Kafka ingests these events, Flink applies identical windowing logic cluster-wide, and alerts route to PagerDuty with the same 30-second cooldown pattern.

Use Case | Tumbling Window | Sliding Window | Session Window | Best For
Hourly energy billing | 1-hour non-overlapping | - | - | Fixed-period aggregates, no overlap needed
Real-time anomaly detection | - | 5-min window, 30s slide | - | Continuous monitoring, smooth results
User interaction tracking | - | - | 10-min timeout | Activity bursts (e.g., clicks, app usage)
Temperature threshold alerts | - | 2-min window, 10s slide | - | Quick response to sustained conditions
Network traffic analysis | 1-min buckets | - | - | Periodic reports, traffic shaping

Selection Criteria:

  • Latency tolerance <1 second → Sliding (1s slide) or no windows
  • Storage constraints → Tumbling (lower memory, no overlap)
  • Event correlation → Session (groups related events)
  • Regulatory compliance → Tumbling (exactly-once per period)

Common Mistake: Confusing Buffer Size with Window Duration

What practitioners do wrong: They set the circular buffer size equal to the tumbling window sample count. For example, a 5-second tumbling window at 10 Hz sampling (50 samples) uses a 50-sample circular buffer.

Why it fails: If network latency delays processing by even 100ms, the buffer wraps around and overwrites unprocessed samples. The tumbling window then emits incomplete aggregations with missing data points.

Correct approach: Size the circular buffer to accommodate:

  1. Window duration samples (5 s × 10 Hz = 50 samples)
  2. Maximum expected processing delay (2 s × 10 Hz = 20 samples)
  3. Safety margin (30% = 21 samples)
  4. Total buffer size: 50 + 20 + 21 = 91 samples (round to 100)
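
That sizing rule fits in a small helper (a sketch; the 30% margin and the 2-second worst-case delay are the text's example numbers, and the ceiling division is an implementation choice):

```cpp
// Recommended circular buffer size: window samples + worst-case
// processing-delay samples + a 30% safety margin (rounded up).
int recommendedBufferSize(int windowSec, int hz, int maxDelaySec) {
  int windowSamples = windowSec * hz;       // e.g. 5 s * 10 Hz = 50
  int delaySamples  = maxDelaySec * hz;     // e.g. 2 s * 10 Hz = 20
  int base = windowSamples + delaySamples;  // 70
  int margin = (base * 30 + 99) / 100;      // ceil(30% of base) = 21
  return base + margin;                     // 91
}
```

For the lab's parameters (5-second window, 10 Hz, 2-second delay allowance) this yields 91 samples, which rounds up naturally to the lab's 100-sample buffer.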

Real-world consequence: In a 2019 smart grid deployment, undersized buffers caused 3-7% data loss during network congestion, leading to incorrect billing aggregates. The fix required OTA firmware updates to 40,000 deployed meters. Buffer sizing costs a few hundred bytes of RAM but prevents catastrophic data loss.

The Sensor Squad builds their first real-time data system – on a tiny computer!

Max the Microcontroller (an ESP32) is excited! Today he gets to build a stream processing system – just like the ones Netflix and banks use, but on a tiny computer that costs only $5!

“We have three missions,” says Max:

Mission 1: The Endless Notebook (Circular Buffer) Sammy the Sensor reads the temperature 10 times per second. But Max only has room for 100 readings in his notebook (memory).

“When my notebook is full, I erase the OLDEST reading and write the newest one on top!” says Max. “I always have the most recent 10 seconds of data.”

Lila the LED compares it to a clock: “The hands keep going around and around, always pointing to NOW! That is why it is called a CIRCULAR buffer!”

Mission 2: The 5-Second Summary (Tumbling Window) Every 5 seconds, Max looks at all the readings he collected and writes a summary: “Average temperature: 25.3C, Minimum: 24.1C, Maximum: 26.8C.”

“It is like a teacher grading papers every Friday,” explains Bella the Battery. “She does not grade DURING the week (that would be chaos), she waits until the end and grades them all at once. That is a tumbling window!”

Mission 3: The Alert System (Event Detection) If the temperature goes above a threshold: RED LED lights up! If the temperature changes really fast: YELLOW LED flashes! If everything is normal: GREEN LED glows peacefully.

“But we do not want the alarm screaming every second!” says Max. “So I add a cooldown – after one alert, I wait 2 seconds before alerting again. Otherwise Lila would burn out!”

After building it, the Sensor Squad checks their throughput: “We are processing 10 events per second on a computer smaller than a cookie! Real systems do millions per second, but the LOGIC is exactly the same!”

12.2.4 Try This at Home!

Get a piece of paper and draw 10 boxes in a row. This is your “circular buffer.” Now roll a dice every 5 seconds and write the number in the next box. When you reach box 10, go back to box 1 and write OVER the old number. After a few rounds, look at your buffer – you always have the last 10 rolls! That is how computers store streaming data with limited memory.

12.3 Concept Relationships

This lab demonstrates core concepts from earlier chapters: the embedded implementation builds intuition for the same abstractions in cloud-scale systems.

12.4 See Also

ESP32 Stream Processing:

Circular Buffer Implementations:

12.6 What’s Next

If you want to… | Read this
Advance to complex event processing lab | Hands-On Lab: Advanced CEP
Study stream processing architecture patterns | Stream Processing Architectures
Build complete IoT streaming pipelines | Building IoT Streaming Pipelines
Review common streaming pitfalls to avoid | Common Pitfalls and Worked Examples

Continue to Hands-On Lab: Advanced CEP for a 60-minute advanced lab covering pattern matching, complex event processing (CEP), session windows, and statistical aggregation.