12 Lab: Stream Processing
| Chapter | Topic |
|---|---|
| Stream Processing | Overview of batch vs. stream processing for IoT |
| Fundamentals | Windowing strategies, watermarks, and event-time semantics |
| Architectures | Kafka, Flink, and Spark Streaming comparison |
| Pipelines | End-to-end ingestion, processing, and output design |
| Challenges | Late data, exactly-once semantics, and backpressure |
| Pitfalls | Common mistakes and production worked examples |
| Basic Lab | ESP32 circular buffers, windows, and event detection |
| Advanced Lab | CEP, pattern matching, and anomaly detection on ESP32 |
| Game & Summary | Interactive review game and module summary |
| Interoperability | Four levels of IoT interoperability |
| Interop Fundamentals | Technical, syntactic, semantic, and organizational layers |
| Interop Standards | SenML, JSON-LD, W3C WoT, and oneM2M |
| Integration Patterns | Protocol adapters, gateways, and ontology mapping |
Learning Objectives
After completing this lab, you will be able to:
- Implement continuous data streaming with circular buffers for memory-efficient sensor data management on ESP32
- Build tumbling and sliding window aggregations for periodic statistics and moving averages
- Design threshold-based and rate-of-change event detection systems with alert cooldown mechanisms
- Calculate and monitor pipeline throughput metrics for capacity planning and bottleneck detection
- Map embedded stream processing patterns to their production equivalents in Kafka, Flink, and Spark
This hands-on lab lets you build a working stream processing system for IoT data. Think of it as your first time actually cooking a recipe instead of just reading about it. You will take raw sensor data flowing in real time and learn to filter, transform, and analyze it on the fly.
Key terms used in this module:
- Kafka Consumer Group: Set of Kafka consumers sharing partition assignments, enabling parallel processing of event streams with automatic rebalancing when consumers join or leave
- Stream Processing Topology: Directed acyclic graph of processing operators (source, filter, transform, aggregate, sink) defining how events flow through a streaming pipeline
- Windowed Aggregation: Computing aggregate values (count, sum, average, max) over events within a defined time window (5-minute tumbling, 1-minute sliding)
- Consumer Offset: Kafka bookmark tracking which events a consumer group has processed, enabling exactly-once or at-least-once resumption after restart
- Stream Join: Combining related events from two Kafka topics based on a common key (device ID) within a configurable time window
- Kafka Streams DSL: High-level API for defining stream processing topologies in Java/Kotlin using functional operators (filter, map, groupBy, aggregate)
- Serdes (Serializer/Deserializer): Components converting between Java objects and byte arrays for Kafka topic storage and retrieval
- State Store: Local persistent storage (RocksDB) within a Kafka Streams application maintaining aggregation state with automatic changelog topic backup for recovery
12.1 Lab: Build a Real-Time Stream Processing System
In this lab you will build a complete stream processing system on an ESP32 that reads temperature, light, and threshold data at 10 Hz, processes it through tumbling and sliding windows, detects anomalies with alert cooldowns, stores samples in a circular buffer, and displays real-time throughput metrics over serial. You will wire up the circuit, load the code, and experiment with each processing stage hands-on.
12.1.1 Stream Processing Concepts Demonstrated
This lab implements several key stream processing patterns at the embedded level:
| Concept | Implementation | Real-World Analogy |
|---|---|---|
| Continuous Streaming | Sensors read every 100ms | Kafka consumer polling |
| Tumbling Windows | Non-overlapping 5-second windows | Hourly billing aggregates |
| Sliding Windows | 10-sample moving average | Real-time trend smoothing |
| Event Detection | Threshold crossing alerts | Anomaly detection triggers |
| Circular Buffer | Fixed-size ring buffer | Bounded memory streaming |
| Throughput Stats | Events per second tracking | Pipeline monitoring |
12.1.2 Wokwi Simulator
Use the embedded simulator below to build your stream processing system:
12.1.3 Circuit Setup
Connect multiple sensors and LEDs to the ESP32:
| Component | ESP32 Pin | Purpose |
|---|---|---|
| Temperature Sensor (NTC) | GPIO 34 | Primary data stream |
| Light Sensor (LDR) | GPIO 35 | Secondary data stream |
| Red LED | GPIO 18 | High temperature alert |
| Yellow LED | GPIO 19 | Rate of change alert |
| Green LED | GPIO 21 | Normal operation indicator |
| Potentiometer | GPIO 32 | Adjustable threshold |
Add this diagram.json configuration in Wokwi:
{
"version": 1,
"author": "IoT Class - Stream Processing Lab",
"editor": "wokwi",
"parts": [
{ "type": "wokwi-esp32-devkit-v1", "id": "esp", "top": 0, "left": 0 },
{ "type": "wokwi-ntc-temperature-sensor", "id": "temp1", "top": -100, "left": 100 },
{ "type": "wokwi-photoresistor-sensor", "id": "ldr1", "top": -100, "left": 220 },
{ "type": "wokwi-potentiometer", "id": "pot1", "top": -100, "left": 340 },
{ "type": "wokwi-led", "id": "led_red", "top": 150, "left": 100, "attrs": { "color": "red" } },
{ "type": "wokwi-led", "id": "led_yellow", "top": 150, "left": 150, "attrs": { "color": "yellow" } },
{ "type": "wokwi-led", "id": "led_green", "top": 150, "left": 200, "attrs": { "color": "green" } },
{ "type": "wokwi-resistor", "id": "r1", "top": 200, "left": 100, "attrs": { "value": "220" } },
{ "type": "wokwi-resistor", "id": "r2", "top": 200, "left": 150, "attrs": { "value": "220" } },
{ "type": "wokwi-resistor", "id": "r3", "top": 200, "left": 200, "attrs": { "value": "220" } }
],
"connections": [
["esp:GND.1", "temp1:GND", "black", ["h0"]],
["esp:3V3", "temp1:VCC", "red", ["h0"]],
["esp:34", "temp1:OUT", "green", ["h0"]],
["esp:GND.1", "ldr1:GND", "black", ["h0"]],
["esp:3V3", "ldr1:VCC", "red", ["h0"]],
["esp:35", "ldr1:OUT", "orange", ["h0"]],
["esp:GND.1", "pot1:GND", "black", ["h0"]],
["esp:3V3", "pot1:VCC", "red", ["h0"]],
["esp:32", "pot1:SIG", "purple", ["h0"]],
["esp:18", "led_red:A", "red", ["h0"]],
["led_red:C", "r1:1", "black", ["h0"]],
["r1:2", "esp:GND.2", "black", ["h0"]],
["esp:19", "led_yellow:A", "yellow", ["h0"]],
["led_yellow:C", "r2:1", "black", ["h0"]],
["r2:2", "esp:GND.2", "black", ["h0"]],
["esp:21", "led_green:A", "green", ["h0"]],
["led_green:C", "r3:1", "black", ["h0"]],
["r3:2", "esp:GND.2", "black", ["h0"]]
]
}
12.1.4 Complete Arduino Code
Copy this code into the Wokwi editor. The code is organized into five key sections explained below.
1. Configuration and Data Structures:
// Pin definitions
const int TEMP_PIN = 34, LIGHT_PIN = 35, THRESHOLD_PIN = 32;
const int LED_RED = 18, LED_YELLOW = 19, LED_GREEN = 21;
// Stream processing parameters
const int SAMPLE_INTERVAL_MS = 100; // 10 Hz sampling
const int TUMBLING_WINDOW_MS = 5000; // 5-second windows
const int SLIDING_WINDOW_SIZE = 10; // 10-sample sliding window
const int BUFFER_SIZE = 100; // Circular buffer capacity
struct SensorReading {
unsigned long timestamp;
float temperature;
float light;
};
SensorReading circularBuffer[BUFFER_SIZE];
int bufferHead = 0, bufferCount = 0;
2. Circular Buffer – memory-bounded stream storage (maps to Kafka partitions):
void addToCircularBuffer(unsigned long ts, float temp, float light) {
circularBuffer[bufferHead] = {ts, temp, light};
bufferHead = (bufferHead + 1) % BUFFER_SIZE;
if (bufferCount < BUFFER_SIZE) bufferCount++;
}
3. Tumbling Window – non-overlapping 5-second aggregation:
struct TumblingWindow {
unsigned long windowStart;
float tempSum, lightSum, tempMin, tempMax;
int sampleCount;
};
TumblingWindow currentWindow;  // window currently being filled
void updateTumblingWindow(float temp, float light) {
currentWindow.tempSum += temp;
currentWindow.lightSum += light;
currentWindow.sampleCount++;
if (temp < currentWindow.tempMin) currentWindow.tempMin = temp;
if (temp > currentWindow.tempMax) currentWindow.tempMax = temp;
}
4. Event Detection – threshold and rate-of-change with cooldown:
void detectEvents(float currentTemp, float slidingAvg, float threshold) {
unsigned long now = millis();
bool inCooldown = (now - lastAlertTime) < ALERT_COOLDOWN_MS;
if (currentTemp > threshold) {
digitalWrite(LED_RED, HIGH);
if (!inCooldown) {
Serial.print("ALERT: Temp ");
Serial.print(currentTemp, 1);
Serial.print("C > threshold ");
Serial.println(threshold, 1);
lastAlertTime = now;
}
}
float rateOfChange = abs(slidingAvg - lastTempAvg);
if (rateOfChange > 5.0 && !inCooldown) {
digitalWrite(LED_YELLOW, HIGH);
lastAlertTime = now;
}
lastTempAvg = slidingAvg;
}
5. Main Loop – the streaming pipeline orchestration:
void loop() {
unsigned long now = millis();
if (now - lastSampleTime >= SAMPLE_INTERVAL_MS) {
lastSampleTime = now;
float temp = readTemperature();
float light = readLight();
float threshold = readThreshold();
addToCircularBuffer(now, temp, light); // Store
updateTumblingWindow(temp, light); // Aggregate
updateSlidingWindow(temp, light); // Moving avg
float slidingAvg = getSlidingAverage(tempSlidingBuffer);
detectEvents(temp, slidingAvg, threshold);// Alert
updateThroughput(); // Metrics
if (now - currentWindow.windowStart >= TUMBLING_WINDOW_MS) {
emitTumblingWindowResults();
resetTumblingWindow();
}
}
}
The complete code combines all sections above with setup, display functions, and sensor reading helpers. Copy the full code from the Wokwi simulator embedded above or expand this section to view it.
12.1.5 Step-by-Step Instructions
12.1.5.1 Step 1: Set Up the Simulator
- Open the Wokwi simulator embedded above (or visit wokwi.com)
- Create a new ESP32 project
- Click the diagram.json tab and paste the circuit configuration
- Replace the default code with the complete Arduino code above
12.1.5.2 Step 2: Run and Observe Basic Streaming
- Click the Play button to start the simulation
- Open the Serial Monitor to see the stream output
- Observe the continuous data flow:
- RAW readings update every 100ms (10 Hz)
- SLIDING averages smooth the data
- BUFFER fills up over time
12.1.5.3 Step 3: Experiment with Tumbling Windows
- Wait 5 seconds for the first tumbling window to complete
- Observe the window summary: min, max, average, sample count
- Note: Each window is non-overlapping - no sample counted twice
- Think about: Why is this useful for billing or hourly reports?
12.1.5.4 Step 4: Adjust Threshold and Trigger Alerts
- Click and drag the potentiometer to lower the threshold
- Click the NTC sensor and increase temperature to trigger RED LED
- Observe the alert message in Serial Monitor
- Note the cooldown: Alerts don’t flood (2-second cooldown)
12.1.5.5 Step 5: Observe Rapid Change Detection
- Quickly change the temperature sensor value (click and drag rapidly)
- Watch for YELLOW LED indicating rapid change detected
- This demonstrates: Rate-of-change anomaly detection
- Real-world analogy: Detecting equipment failure signatures
12.1.5.6 Step 6: Understand the Circular Buffer
- Watch the BUFFER FILL visualization grow
- After ~10 seconds, buffer reaches 100% full
- New data overwrites old data - fixed memory usage
- This is critical: Embedded devices have limited RAM
12.1.6 Understanding the Output
12.1.7 Key Stream Processing Concepts Explained
12.2 Concept 1: Tumbling vs Sliding Windows
Tumbling Windows (implemented in this lab):
- Non-overlapping, fixed-duration windows (5 seconds in our lab)
- Each event belongs to exactly ONE window
- Results emitted when window closes
- Use case: Hourly billing summaries, periodic aggregates
Sliding Windows (implemented in this lab):
- Overlapping windows that “slide” with each new event
- Each event may belong to MULTIPLE windows
- Continuously updated averages
- Use case: Moving averages, trend smoothing, real-time dashboards
Memory Trade-off:
- Tumbling: O(1) memory per window (just aggregates)
- Sliding: O(window_size) memory (must store all samples)
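The sliding-window trade-off can be sketched in standalone C++ (class and member names here are illustrative, not taken from the lab code): keeping a running sum makes each update O(1) while storage stays O(window_size).

```cpp
#include <cstddef>
#include <vector>

// Fixed-size sliding window with a running sum: O(1) per update,
// O(window_size) storage, matching the trade-off described above.
class SlidingAverage {
public:
    explicit SlidingAverage(std::size_t size) : size_(size), buf_(size, 0.0f) {}

    // Add a sample, evicting the oldest once the window is full.
    void add(float v) {
        sum_ += v - buf_[head_];      // subtract the value being overwritten
        buf_[head_] = v;
        head_ = (head_ + 1) % size_;
        if (count_ < size_) count_++;
    }

    float average() const { return count_ ? sum_ / count_ : 0.0f; }

private:
    std::size_t size_, head_ = 0, count_ = 0;
    std::vector<float> buf_;
    float sum_ = 0.0f;
};
```

Before the window fills, the average is taken over only the samples seen so far, which avoids a cold-start bias toward zero.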
Why Circular Buffers?
- Streams are infinite, memory is finite
- Fixed memory allocation prevents heap fragmentation
- Oldest data automatically evicted when buffer full
- O(1) insert and O(1) access by offset
Implementation Pattern:
buffer[head] = newData; // Write at head
head = (head + 1) % BUFFER_SIZE; // Wrap around
Real-World Analogy: Like a security camera that records over the oldest footage - you always have the last N hours, never running out of storage.
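The same pattern as a self-contained C++ sketch (the template and method names are mine, not from the lab code), mirroring addToCircularBuffer(): O(1) insert, fixed memory, oldest sample silently overwritten when full.

```cpp
#include <array>
#include <cstddef>

// Minimal ring buffer: fixed capacity N, newest sample overwrites oldest.
template <typename T, std::size_t N>
class RingBuffer {
public:
    void push(const T& v) {
        data_[head_] = v;
        head_ = (head_ + 1) % N;        // wrap around at capacity
        if (count_ < N) count_++;
    }

    // Element i steps back from the newest (0 = most recent sample).
    const T& recent(std::size_t i) const {
        return data_[(head_ + N - 1 - i) % N];
    }

    std::size_t size() const { return count_; }

private:
    std::array<T, N> data_{};
    std::size_t head_ = 0, count_ = 0;
};
```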
Stream Processing Buffer Sizing for a 10 Hz IoT Sensor
Given a circular buffer storing 100 samples at 10 Hz sampling rate:
Time Coverage: Each sample arrives every \(\frac{1}{10 \text{ Hz}} = 100 \text{ ms}\). Total buffer time window: \[T_{\text{buffer}} = 100 \text{ samples} \times 0.1 \text{ s} = 10 \text{ seconds}\]
Memory Footprint: Each SensorReading struct contains 1 timestamp (4 bytes), 2 floats (8 bytes): \[M_{\text{buffer}} = 100 \times (4 + 4 + 4) = 1,200 \text{ bytes} = 1.17 \text{ KB}\]
Tumbling Window Sample Count: 5-second window at 10 Hz: \[N_{\text{tumbling}} = 5 \text{ s} \times 10 \text{ Hz} = 50 \text{ samples per window}\]
Sliding Window Latency: 10-sample sliding window introduces: \[\text{Latency}_{\text{sliding}} = \frac{10 \text{ samples}}{10 \text{ Hz}} = 1 \text{ second lag}\]
Event Detection Rate: With 2-second cooldown, maximum alert frequency: \[f_{\text{alert,max}} = \frac{1}{2 \text{ s}} = 0.5 \text{ Hz} = 30 \text{ alerts/min}\]
Throughput Calculation: At 10 Hz sampling, daily event count: \[N_{\text{daily}} = 10 \text{ events/s} \times 86{,}400 \text{ s/day} = 864{,}000 \text{ events/day}\]
Real-world insight: This lab’s 1.17 KB buffer demonstrates bounded-memory streaming—the same principle Kafka uses with partition retention, just at microcontroller scale instead of cluster scale.
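The worked numbers above can be checked mechanically. This sketch simply re-derives them from the lab's configuration constants:

```cpp
// Re-derives the buffer-sizing worked example from the lab's constants.
constexpr int SAMPLE_HZ = 10;                 // 10 Hz sampling
constexpr int BUFFER_SAMPLES = 100;           // circular buffer capacity
constexpr int BYTES_PER_READING = 4 + 4 + 4;  // timestamp + two floats
constexpr int TUMBLING_WINDOW_S = 5;          // 5-second windows

constexpr double bufferSeconds = double(BUFFER_SAMPLES) / SAMPLE_HZ;  // time coverage
constexpr int bufferBytes = BUFFER_SAMPLES * BYTES_PER_READING;       // memory footprint
constexpr int samplesPerWindow = TUMBLING_WINDOW_S * SAMPLE_HZ;       // per tumbling window
constexpr long dailyEvents = long(SAMPLE_HZ) * 86400L;                // events per day
```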
Threshold-Based Detection:
- Simple: value > threshold triggers alert
- Fast: O(1) computation per event
- Limitation: Doesn’t detect trends, only instantaneous values
Rate-of-Change Detection:
- Compares current average to previous average
- Detects rapid changes even within “normal” range
- Example: Temperature rising 10C/minute (could indicate fire)
Alert Cooldown (Backpressure):
- Prevents alert flooding during sustained anomalies
- Real systems use exponential backoff
- Balance: Too short = alert fatigue, Too long = missed events
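A sketch of the exponential-backoff idea mentioned above: each alert doubles the quiet period up to a cap, and a long quiet stretch resets it. The class name and the 2-second base / 60-second cap are my choices for illustration, not values from the lab code.

```cpp
#include <cstdint>

// Alert gate with exponential backoff: repeated alerts stretch the
// cooldown (2 s -> 4 s -> ... capped at 60 s); a quiet period resets it.
class AlertGate {
public:
    // Returns true when an alert may fire at time nowMs.
    bool tryAlert(uint32_t nowMs) {
        if (fired_ && nowMs - lastAlertMs_ < cooldownMs_) return false;  // suppressed
        if (fired_ && nowMs - lastAlertMs_ > 2 * cooldownMs_)
            cooldownMs_ = BASE_MS;                                       // quiet: reset backoff
        else if (fired_)
            cooldownMs_ = cooldownMs_ * 2 < CAP_MS ? cooldownMs_ * 2 : CAP_MS;
        lastAlertMs_ = nowMs;
        fired_ = true;
        return true;
    }

private:
    static constexpr uint32_t BASE_MS = 2000, CAP_MS = 60000;
    uint32_t cooldownMs_ = BASE_MS, lastAlertMs_ = 0;
    bool fired_ = false;
};
```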
Why Track Throughput?
- Detect processing bottlenecks before they cause data loss
- Capacity planning for production deployments
- Alerting when throughput drops (sensor failure?)
Metrics to Monitor:
- Events per second (current throughput)
- Total events processed (lifetime counter)
- Buffer fill percentage (approaching capacity?)
- Processing latency (time from read to output)
In Production: Use Prometheus, Grafana, or cloud monitoring to track these metrics with alerting thresholds.
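A minimal events-per-second meter along these lines, bucketed by wall-clock second (a sketch; names are illustrative):

```cpp
#include <cstdint>

// One-second bucket throughput counter: count events in the current
// second; publish the previous second's total as "events per second".
class ThroughputMeter {
public:
    void record(uint32_t nowMs) {
        while (nowMs - bucketStartMs_ >= 1000) {  // close any elapsed buckets
            lastEps_ = bucketCount_;
            bucketCount_ = 0;
            bucketStartMs_ += 1000;
        }
        bucketCount_++;
        total_++;
    }
    uint32_t eventsPerSecond() const { return lastEps_; }  // current throughput
    uint64_t totalEvents() const { return total_; }        // lifetime counter
private:
    uint32_t bucketStartMs_ = 0, bucketCount_ = 0, lastEps_ = 0;
    uint64_t total_ = 0;
};
```

Closing every elapsed bucket in a loop means a silent sensor correctly drives the reported rate to zero instead of freezing at its last value, which is exactly the throughput-drop signal worth alerting on.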
12.2.1 Challenge Exercises
Modify the code to implement session windows that group events by activity periods:
- Start a new session when first event arrives after 3+ seconds of inactivity
- Close session and emit results after 3 seconds of no new events
- Track: session duration, event count, min/max/avg values
Hint: Track lastEventTime and check for gaps exceeding SESSION_GAP_MS
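One possible shape for this exercise, following the hint's SESSION_GAP_MS approach (a sketch, not a reference solution; struct and function names are mine):

```cpp
#include <cstdint>

// A gap of SESSION_GAP_MS or more since the last event closes the
// current session and starts a new one.
constexpr uint32_t SESSION_GAP_MS = 3000;

struct Session {
    uint32_t startMs = 0, lastEventMs = 0;
    int eventCount = 0;
};

// Returns true when the incoming event closed the previous session
// (the caller would emit that session's results before reuse).
bool onEvent(Session& s, uint32_t nowMs) {
    bool closedPrevious =
        s.eventCount > 0 && nowMs - s.lastEventMs >= SESSION_GAP_MS;
    if (closedPrevious || s.eventCount == 0) {
        s.startMs = nowMs;   // new session begins at this event
        s.eventCount = 0;
    }
    s.lastEventMs = nowMs;
    s.eventCount++;
    return closedPrevious;
}
```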
Add support for simulated “late arriving” data:
- Add a button that injects an event with a timestamp 10 seconds in the past
- Determine which tumbling window this event belongs to
- If window already closed, route to “late data” output
- Display count of late events separately
Hint: Compare event timestamp to currentWindow.windowStart
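A sketch of the window-assignment logic the hint points at: because tumbling windows tile time in fixed blocks, the owning window follows directly from integer division (function names are mine):

```cpp
#include <cstdint>

constexpr uint32_t WINDOW_MS = 5000;  // 5-second tumbling windows

// Tumbling windows tile time, so the window an event belongs to is
// just its timestamp rounded down to a window boundary.
uint32_t windowStartFor(uint32_t eventTs) {
    return (eventTs / WINDOW_MS) * WINDOW_MS;
}

// An event is "late" if its window started before the currently open
// one, i.e. that window has already closed and emitted its results.
bool isLate(uint32_t eventTs, uint32_t currentWindowStart) {
    return windowStartFor(eventTs) < currentWindowStart;
}
```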
Extend event detection to correlate multiple sensors:
- Alert when BOTH temperature is high AND light is low (equipment overheating in darkness)
- Track how often both conditions occur simultaneously
- Implement a “compound event” counter
Hint: Add correlatedAlertCount and check both conditions in detectEvents()
Replace the fixed potentiometer threshold with an adaptive threshold:
- Calculate the moving standard deviation of temperature
- Set threshold = sliding_average + (2 * standard_deviation)
- This creates an adaptive anomaly boundary
- Compare alert rates between fixed and adaptive thresholds
Hint: Track sum of squares to calculate variance: variance = (sumSq/n) - (sum/n)^2
12.2.2 Common Issues and Solutions
| Issue | Cause | Solution |
|---|---|---|
| No serial output | Baud rate mismatch | Ensure both code and monitor use 115200 |
| LEDs not lighting | Wrong pin connection | Check diagram.json connections match code |
| Buffer always 100% | Expected behavior | Circular buffer maintains fixed size by design |
| No tumbling window output | Not enough time | Wait full 5 seconds for first window |
| Alerts firing constantly | Threshold too low | Adjust potentiometer higher |
12.2.3 From Lab to Production
Each pattern in this lab has a direct production counterpart. The 10 Hz polling loop mirrors a Kafka consumer, the circular buffer mirrors Kafka partition retention, tumbling and sliding windows map to Flink or Spark Structured Streaming operators, threshold alerts map to PagerDuty integrations, and throughput stats map to Prometheus pipeline metrics. Building these on an ESP32 first gives you hands-on intuition for the same abstractions at cloud scale.
This lab demonstrates that stream processing fundamentals – circular buffers, tumbling/sliding windows, event detection, and throughput monitoring – can be implemented on a $5 ESP32 microcontroller. The patterns are identical to production systems: circular buffers map to Kafka topic partitions with retention, tumbling windows map to Flink/Spark hourly aggregations, and alert cooldowns implement the same backpressure concept used in PagerDuty integrations. Understanding these patterns on embedded hardware builds intuition that transfers directly to enterprise stream processing.
A packaging factory uses ESP32 microcontrollers to monitor conveyor belt throughput. Each sensor detects package passings and must detect stalls (no packages for 10 seconds) and overloads (>30 packages/minute sustained).
Implementation Details:
- Sampling rate: 10 Hz (checking optical sensor every 100 ms)
- Tumbling window: 60 seconds for throughput calculation
- Sliding window: 10-second moving average to smooth jitter
- Circular buffer: 600 samples (60 seconds x 10 Hz = 600 readings)
- Alert cooldown: 30 seconds to prevent operator fatigue
Sample Calculation: If 45 packages pass in a 60-second tumbling window, throughput = 45 packages/minute. The sliding window averages the last 100 samples (10 seconds at 10 Hz) to smooth out brief sensor jitter. When average inter-package time exceeds 10 seconds OR throughput exceeds 30/min for 3 consecutive windows, the system triggers alerts.
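The two detection rules in this scenario reduce to a pair of small predicates (a sketch under the stated thresholds; names are mine):

```cpp
#include <cstdint>

// Stall: no package detected for 10 seconds or more.
constexpr uint32_t STALL_GAP_MS = 10000;

bool beltStalled(uint32_t nowMs, uint32_t lastPackageMs) {
    return nowMs - lastPackageMs >= STALL_GAP_MS;
}

// Overload: throughput above 30 packages/min for 3 consecutive
// tumbling windows (counts passed oldest-first).
bool beltOverloaded(const int windowCounts[3]) {
    return windowCounts[0] > 30 && windowCounts[1] > 30 && windowCounts[2] > 30;
}
```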
Production Scaling: This same logic deployed across 12 conveyor lines processes 1,440 package detections/minute total (120 detections/min x 12 lines), while raw sensor data totals 7,200 samples/minute (600 samples/min x 12 lines). At scale, Apache Kafka ingests these events, Flink applies identical windowing logic cluster-wide, and alerts route to PagerDuty with the same 30-second cooldown pattern.
| Use Case | Tumbling Window | Sliding Window | Session Window | Best For |
|---|---|---|---|---|
| Hourly energy billing | 1-hour non-overlapping | - | - | Fixed-period aggregates, no overlap needed |
| Real-time anomaly detection | - | 5-min window, 30s slide | - | Continuous monitoring, smooth results |
| User interaction tracking | - | - | 10-min timeout | Activity bursts (e.g., clicks, app usage) |
| Temperature threshold alerts | - | 2-min window, 10s slide | - | Quick response to sustained conditions |
| Network traffic analysis | 1-min buckets | - | - | Periodic reports, traffic shaping |
Selection Criteria:
- Latency tolerance <1 second → Sliding (1s slide) or no windows
- Storage constraints → Tumbling (lower memory, no overlap)
- Event correlation → Session (groups related events)
- Regulatory compliance → Tumbling (exactly-once per period)
What practitioners do wrong: They set the circular buffer size equal to the tumbling window sample count. For example, a 5-second tumbling window at 10 Hz sampling (50 samples) uses a 50-sample circular buffer.
Why it fails: If network latency delays processing by even 100ms, the buffer wraps around and overwrites unprocessed samples. The tumbling window then emits incomplete aggregations with missing data points.
Correct approach: Size the circular buffer to accommodate:
1. Window duration samples (5 s × 10 Hz = 50 samples)
2. Maximum expected processing delay (2 s × 10 Hz = 20 samples)
3. Safety margin (30% = 21 samples)
4. Total buffer size: 50 + 20 + 21 = 91 samples (round to 100)
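The sizing rule can be captured in one helper (a sketch; the round-up-to-ten step mirrors the "round to 100" above):

```cpp
// Buffer size = window samples + worst-case delay samples, plus a 30%
// safety margin, rounded up to the next multiple of 10.
int sizedBuffer(int windowS, int sampleHz, int maxDelayS) {
    int base = (windowS + maxDelayS) * sampleHz;     // e.g. 50 + 20 = 70
    int withMargin = base + (base * 30 + 99) / 100;  // +30%, rounded up -> 91
    return ((withMargin + 9) / 10) * 10;             // round up -> 100
}
```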
Real-world consequence: In a 2019 smart grid deployment, undersized buffers caused 3-7% data loss during network congestion, leading to incorrect billing aggregates. The fix required OTA firmware updates to 40,000 deployed meters. Buffer sizing costs a few hundred bytes of RAM but prevents catastrophic data loss.
The Sensor Squad builds their first real-time data system – on a tiny computer!
Max the Microcontroller (an ESP32) is excited! Today he gets to build a stream processing system – just like the ones Netflix and banks use, but on a tiny computer that costs only $5!
“We have three missions,” says Max:
Mission 1: The Endless Notebook (Circular Buffer) Sammy the Sensor reads the temperature 10 times per second. But Max only has room for 100 readings in his notebook (memory).
“When my notebook is full, I erase the OLDEST reading and write the newest one on top!” says Max. “I always have the most recent 10 seconds of data.”
Lila the LED compares it to a clock: “The hands keep going around and around, always pointing to NOW! That is why it is called a CIRCULAR buffer!”
Mission 2: The 5-Second Summary (Tumbling Window) Every 5 seconds, Max looks at all the readings he collected and writes a summary: “Average temperature: 25.3C, Minimum: 24.1C, Maximum: 26.8C.”
“It is like a teacher grading papers every Friday,” explains Bella the Battery. “She does not grade DURING the week (that would be chaos), she waits until the end and grades them all at once. That is a tumbling window!”
Mission 3: The Alert System (Event Detection) If the temperature goes above a threshold: RED LED lights up! If the temperature changes really fast: YELLOW LED flashes! If everything is normal: GREEN LED glows peacefully.
“But we do not want the alarm screaming every second!” says Max. “So I add a cooldown – after one alert, I wait 2 seconds before alerting again. Otherwise Lila would burn out!”
After building it, the Sensor Squad checks their throughput: “We are processing 10 events per second on a computer smaller than a cookie! Real systems do millions per second, but the LOGIC is exactly the same!”
12.2.4 Try This at Home!
Get a piece of paper and draw 10 boxes in a row. This is your “circular buffer.” Now roll a dice every 5 seconds and write the number in the next box. When you reach box 10, go back to box 1 and write OVER the old number. After a few rounds, look at your buffer – you always have the last 10 rolls! That is how computers store streaming data with limited memory.
12.3 Concept Relationships
This lab demonstrates core concepts from earlier chapters:
- Applies: Stream Processing Fundamentals – tumbling and sliding windows in practice
- Implements: Stream Processing Architectures – Kafka/Flink patterns on embedded hardware
- Prepares for: Stream Processing Lab - Advanced – foundation for CEP and pattern matching
- Complements: Data Storage – circular buffers as bounded-memory storage
The embedded implementation builds intuition for cloud-scale systems.
12.4 See Also
ESP32 Stream Processing:
- ESP-IDF Event Loop Library – Event-driven programming on ESP32
- Arduino Stream Class – Base class for data streams
- PlatformIO ESP32 Examples – Production ESP32 patterns
Circular Buffer Implementations:
- Arduino CircularBuffer Library – Production-ready implementation
- Embedded Artistry Blog – Ring buffer theory and practice
12.5 What’s Next
| If you want to… | Read this |
|---|---|
| Advance to complex event processing lab | Hands-On Lab: Advanced CEP |
| Study stream processing architecture patterns | Stream Processing Architectures |
| Build complete IoT streaming pipelines | Building IoT Streaming Pipelines |
| Review common streaming pitfalls to avoid | Common Pitfalls and Worked Examples |
Continue to Hands-On Lab: Advanced CEP for a 60-minute advanced lab covering pattern matching, complex event processing (CEP), session windows, and statistical aggregation.