50 Edge Computing: Processing Patterns
50.1 Learning Objectives
By the end of this chapter, you will be able to:
- Apply Four Edge Processing Patterns: Implement Filter, Aggregate, Infer, and Store-Forward patterns for different IoT scenarios
- Select Optimal Patterns: Match processing patterns to application requirements based on latency, bandwidth, and reliability needs
- Evaluate Trade-offs: Compare edge ML inference versus cloud ML, and batch processing versus real-time streaming
- Avoid Common Mistakes: Understand when edge processing saves costs versus when it adds unnecessary complexity
Key Concepts
- In-network processing: Executing data transformations within the network infrastructure (on switches, gateways, or routers) rather than at endpoints, reducing end-to-end latency and backbone bandwidth usage.
- Stream operator: A discrete processing step in a streaming pipeline (filter, map, aggregate, join) that transforms one stream into another; composing stream operators builds a complete edge processing pipeline.
- Event windowing: Grouping stream events into finite sets for batch processing based on time (tumbling windows, sliding windows) or event count (count windows) before applying aggregation or detection operators.
- Stateful processing: Stream processing that maintains state between events (running averages, counters, session trackers) in contrast to stateless operations (format conversion, field extraction) that treat each event independently.
- Processing latency budget: The maximum time allowed for all edge processing steps combined before data must be acted upon or forwarded, constraining the complexity and number of processing operators that can be applied.
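As a minimal sketch of event windowing and stateful processing (class and method names here are illustrative, not from any standard library), a count-based tumbling window buffers events and emits one aggregate per full window:

```python
from dataclasses import dataclass, field

@dataclass
class TumblingWindow:
    """A stateful stream operator: groups events into fixed-size count windows."""
    size: int
    events: list = field(default_factory=list)

    def add(self, value):
        """Buffer an event; return the window's average when it fills, else None."""
        self.events.append(value)
        if len(self.events) == self.size:
            avg = sum(self.events) / self.size
            self.events.clear()  # tumbling: windows never overlap
            return avg
        return None

window = TumblingWindow(size=3)
results = [window.add(v) for v in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
print(results)  # [None, None, 2.0, None, None, 5.0] - one average per full window
```

A stateless operator (say, unit conversion) would need no `events` list at all; the buffered state is exactly what distinguishes stateful processing.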
For Beginners: Edge Data Processing
Edge data processing means analyzing IoT data right where it is collected, before sending anything to the cloud. Think of a store manager who handles routine customer questions on the spot instead of calling headquarters for every decision. This makes your IoT system faster, cheaper, and more reliable.
50.2 Prerequisites
Before diving into this chapter, you should be familiar with:
- Edge Compute Patterns Overview: Introduction to edge computing concepts
- IoT Reference Model: Understanding the seven-level architecture and Level 3 processing
50.3 Edge Processing Patterns
Edge computing employs four primary data processing patterns, each optimized for different IoT scenarios:
Alternative View: Pattern Selection by Use Case — match your primary constraint (bandwidth, latency, reliability, or privacy) to the optimal edge processing pattern.
Alternative View: Edge Processing Timeline — edge processing pipelines execute in milliseconds, while cloud synchronization happens asynchronously.
50.4 Pattern Selection Guide
| Pattern | Best For | Bandwidth Savings | Use Case Example |
|---|---|---|---|
| Filter | Threshold monitoring | 99%+ | Send alerts only when temperature exceeds safe limits |
| Aggregate | Trend analysis | 99%+ | Send hourly averages instead of per-second readings |
| Infer | Anomaly detection | 95%+ | Visual inspection - send defect alerts, not all images |
| Store & Forward | Intermittent connectivity | N/A | Remote sites with satellite links - sync when online |
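The selection table above can be encoded as a simple lookup — a sketch only; the constraint labels are illustrative, not a standard taxonomy:

```python
# Illustrative mapping from primary constraint to edge pattern,
# following the Section 50.4 selection table.
PATTERN_BY_CONSTRAINT = {
    "threshold_monitoring": "filter",
    "trend_analysis": "aggregate",
    "anomaly_detection": "infer",
    "intermittent_connectivity": "store_and_forward",
}

def select_pattern(primary_constraint):
    """Map a deployment's dominant requirement to the recommended pattern.

    Falling back to store_and_forward when unsure is an assumption here:
    buffering is the least destructive default, since no data is dropped."""
    return PATTERN_BY_CONSTRAINT.get(primary_constraint, "store_and_forward")

print(select_pattern("anomaly_detection"))  # -> infer
```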
50.4.1 Pattern 1: Filter at Edge
The filter pattern applies simple threshold checks to determine which data warrants transmission:
```python
# Example: Temperature threshold filter
# (now() stands in for a platform timestamp helper)
def filter_reading(temp_celsius, threshold=80.0):
    if temp_celsius > threshold:
        return {"alert": True, "value": temp_celsius, "timestamp": now()}
    return None  # Don't transmit normal readings
```

Best for:
- Alarm/alert systems where only exceptions matter
- High-frequency sensors where most readings are routine
- Bandwidth-constrained links where every byte counts
50.4.2 Pattern 2: Aggregate at Edge
The aggregate pattern computes statistics locally and sends summaries:
```python
# Example: Statistical aggregation
# (now() stands in for a platform timestamp helper)
import statistics

def aggregate_window(readings, window_minutes=60):
    return {
        "min": min(readings),
        "max": max(readings),
        "avg": sum(readings) / len(readings),
        "stddev": statistics.stdev(readings),
        "count": len(readings),
        "window_end": now()
    }
```
Putting Numbers to It
Consider 200 temperature sensors sampling at 1 Hz:
Without aggregation (transmit every reading): \[\text{Per sensor} = 1\text{ Hz} \times 4\text{ bytes} = 4\text{ bytes/s}\] \[\text{Fleet total} = 4\text{ bytes/s} \times 200 = 800\text{ bytes/s}\] \[\text{Daily bandwidth} = 800\text{ bytes/s} \times 86{,}400\text{ s} = 69.1\text{ MB/day}\]
With 1-hour aggregation (transmit min/max/avg/stddev every hour): \[\text{Per sensor/hour} = 4\text{ values} \times 4\text{ bytes} = 16\text{ bytes}\] \[\text{Per sensor/day} = 16\text{ bytes} \times 24\text{ hours} = 384\text{ bytes}\] \[\text{Fleet daily bandwidth} = 384\text{ bytes} \times 200 = 76{,}800\text{ bytes} = 75\text{ KB/day}\]
Reduction ratio: \[\frac{69.1\text{ MB} - 0.075\text{ MB}}{69.1\text{ MB}} = 99.89\% \text{ reduction}\]
Aggregation compresses 3,600 readings into 4 statistical values, achieving near-perfect bandwidth savings while preserving trend information for analytics.
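The arithmetic above is easy to reproduce; the constants come straight from the example (200 sensors, 1 Hz sampling, 4-byte readings, hourly 4-value summaries):

```python
SENSORS, BYTES_PER_VALUE = 200, 4

# Without aggregation: every 1 Hz reading is transmitted
raw_daily = SENSORS * 1 * BYTES_PER_VALUE * 86_400   # bytes/day
# With aggregation: 4 statistics (min/max/avg/stddev) per sensor per hour
agg_daily = SENSORS * 4 * BYTES_PER_VALUE * 24       # bytes/day

reduction = 1 - agg_daily / raw_daily
print(f"raw: {raw_daily / 1e6:.1f} MB/day")          # raw: 69.1 MB/day
print(f"aggregated: {agg_daily / 1e3:.1f} KB/day")   # aggregated: 76.8 KB/day
print(f"reduction: {reduction:.2%}")                 # reduction: 99.89%
```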
Best for:
- Trend analysis where individual readings are less important than patterns
- Environmental monitoring (temperature, humidity, air quality)
- Capacity planning and historical analysis
50.4.3 Pattern 3: Infer at Edge
The infer pattern runs machine learning models locally:
```python
# Example: Anomaly detection at edge
# (load_tflite_model stands in for a TFLite loading helper)
model = load_tflite_model("anomaly_detector.tflite")

def infer_anomaly(sensor_data):
    prediction = model.predict(sensor_data)
    if prediction["anomaly_score"] > 0.85:
        return {"anomaly": True, "score": prediction["anomaly_score"],
                "features": sensor_data}
    return None  # Normal - don't transmit
```

Best for:
- Visual inspection (defect detection in manufacturing)
- Predictive maintenance (vibration analysis)
- Safety systems requiring immediate response
50.4.4 Pattern 4: Store and Forward
The store-and-forward pattern handles intermittent connectivity:
```python
# Example: Store-and-forward buffer
# (network_available() and upload_batch() stand in for connectivity helpers)
class EdgeBuffer:
    def __init__(self, max_size_mb=100):
        self.buffer = []
        self.max_size = max_size_mb * 1024 * 1024

    def store(self, reading):
        self.buffer.append(reading)
        self.compact_if_needed()

    def compact_if_needed(self):
        # Evict oldest readings first (FIFO) when the buffer grows too large;
        # each reading is assumed here to occupy roughly 1 KB
        while len(self.buffer) * 1024 > self.max_size:
            self.buffer.pop(0)

    def forward_when_connected(self):
        if network_available():
            batch = self.buffer.copy()
            if upload_batch(batch):
                self.buffer.clear()
```

Best for:
- Remote sites (oil rigs, agricultural fields, offshore platforms)
- Mobile assets (vehicles, shipping containers, drones)
- Any deployment with unreliable connectivity
50.5 Trade-off Analysis
50.5.1 Edge ML Inference vs Cloud ML Inference
Tradeoff: Edge ML Inference vs Cloud ML Inference
Option A (Edge ML): Deploy 2-10 MB quantized models on gateways (TensorFlow Lite, ONNX Runtime), achieving 10-50 ms inference latency with 85-92% accuracy for classification tasks on Cortex-M4/ESP32 class devices.
Option B (Cloud ML): Run 100 MB - 1 GB full-precision models in cloud (TensorFlow Serving, AWS SageMaker), achieving 200-500 ms round-trip latency with 95-99% accuracy using GPUs for complex pattern recognition.
Decision Factors: Choose edge ML when latency requirements are under 100 ms (safety systems, real-time control), privacy mandates data cannot leave premises (healthcare, industrial IP), or connectivity is unreliable (remote assets, mobile equipment). Choose cloud ML when model complexity requires GPU acceleration (video analytics, NLP), training data continuously improves models (recommendation systems), or centralized management simplifies updates. Hybrid architectures run simple detection at edge (anomaly flags, threshold checks) while cloud handles deep analysis (root cause diagnosis, long-term forecasting). A factory safety system needs 20 ms edge response, but weekly predictive maintenance reports can use cloud-trained models updated monthly.
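The decision factors above can be condensed into a rough placement heuristic — the function and its inputs are illustrative, mirroring the numbers in this section:

```python
def choose_inference_tier(latency_budget_ms, data_must_stay_local,
                          connectivity_reliable, needs_gpu_scale_model):
    """Rough edge-vs-cloud ML placement heuristic from this section's factors."""
    if latency_budget_ms < 100 or data_must_stay_local or not connectivity_reliable:
        return "edge"    # safety/control loops, privacy mandates, unreliable links
    if needs_gpu_scale_model:
        return "cloud"   # video analytics, NLP, large full-precision models
    return "hybrid"      # edge flags anomalies, cloud does deep analysis

print(choose_inference_tier(20, False, True, False))  # factory safety -> edge
```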
50.5.2 Batch Processing vs Real-Time Streaming
Tradeoff: Batch Processing vs Real-Time Streaming at Edge
Option A (Batch Processing): Collect sensor data locally for 1-60 minutes, process in batches, upload aggregated results. Reduces compute cycles by 80-95%, extends battery life 3-5x on constrained devices, but introduces 1-60 minute detection latency.
Option B (Real-Time Streaming): Process each sensor reading immediately as it arrives with sub-second latency. Enables instant anomaly detection and immediate control responses, but requires 5-10x more edge compute power and continuous network connectivity for cloud integration.
Decision Factors: Choose batch processing for delay-tolerant analytics (hourly environmental reports, daily asset utilization), bandwidth-constrained links (satellite, cellular metered), and battery-powered devices where duty cycling extends deployment life from weeks to years. Choose real-time streaming for safety-critical monitoring (gas leaks, machine failures requiring less than 1 s response), control systems with tight feedback loops (HVAC, robotics), and applications where stale data has no value (live tracking, interactive systems). A smart meter can batch hourly readings, but a pipeline pressure sensor must stream in real-time to detect ruptures within seconds.
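As a sketch of Option A, a minimal micro-batcher (class name and flush interval are illustrative) collects readings locally and wakes the radio once per flush instead of once per reading — which is where the battery savings come from:

```python
import time

class MicroBatcher:
    """Collect readings locally and upload them in one batch (Option A).

    flush_interval_s is illustrative; real deployments batch 1-60 minutes."""
    def __init__(self, flush_interval_s=60.0, transmit=print):
        self.flush_interval_s = flush_interval_s
        self.transmit = transmit
        self.pending = []
        self.last_flush = time.monotonic()

    def add(self, reading):
        self.pending.append(reading)
        if time.monotonic() - self.last_flush >= self.flush_interval_s:
            self.flush()

    def flush(self):
        if self.pending:
            self.transmit(self.pending)  # one radio wake-up for many readings
            self.pending = []
        self.last_flush = time.monotonic()
```

Option B would simply call `transmit` inside `add` on every reading — simpler, lower latency, but one radio transaction per sample.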
50.6 Cost Analysis: When Edge Saves Money (and When It Doesn’t)
Common Misconception: “Edge Always Reduces Costs”
The Myth: “Processing at the edge always saves money compared to cloud processing.”
The Reality: Edge computing reduces bandwidth costs but introduces hardware and maintenance costs that can exceed cloud savings in many scenarios.
Real-World Example: Agricultural Soil Monitoring
A precision agriculture company deployed 10,000 soil moisture sensors across 5,000 acres:
Cloud-Only Approach (Initial Design):
- 10,000 sensors x 1 reading/hour x 24 hours x 30 days = 7.2M readings/month
- Data size: 7.2M readings x 50 bytes = 360 MB/month
- Cloud egress cost: 360 MB / 1,024 MB/GB x $0.09/GB = $0.03/month
- Cloud compute/storage: ~$50/month
- Total: ~$50/month
Edge Gateway Approach (Actual Deployment):
- 50 edge gateways ($200 each): $10,000 upfront capital
- Edge aggregation reduces cloud traffic by 90%: 36 MB/month
- Cloud costs: $5/month (minimal compute)
- Gateway maintenance: $100/month (cellular data, power, repairs)
- Amortized gateway cost over 3 years: $278/month
- Total: $383/month
The Hidden Costs:
- Hardware depreciation: $10,000 / 36 months = $278/month
- Cellular connectivity: 50 gateways x $2/month = $100/month
- Maintenance visits: 1 failed gateway/month x $150/visit = $150/month (later reduced with better hardware)
- Software updates: Edge devices require OTA update infrastructure
When Edge Actually Saved Money (Year 2): After initial deployment issues were resolved and maintenance costs dropped to $50/month:
- Edge total: $278 (amortized) + $100 (cellular) + $50 (maintenance) + $5 (cloud) = $433/month
- Cloud total: $50/month
Edge remained more expensive until the company expanded to 100,000 sensors in Year 3. At that scale, cloud compute costs grew to ~$500/month while edge gateways were already provisioned with spare capacity, making the per-sensor cost of edge processing lower than cloud.
Key Takeaway: Edge computing provides the greatest cost savings when:
- High data volumes overwhelm bandwidth costs (>10 GB/month)
- Hardware is amortized over multi-year deployments
- Maintenance is minimal (reliable hardware, remote updates)
- Latency/privacy requirements justify the investment regardless of cost
Decision Framework:
- Small deployments (<1,000 sensors, <1 GB/month): Cloud is usually cheaper
- Medium deployments (1,000-10,000 sensors): Hybrid (edge aggregation + cloud) often optimal
- Large deployments (>10,000 sensors, >10 GB/month): Edge pays for itself within 6-12 months
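The Year-2 arithmetic from the agricultural example can be checked with a short script; all dollar figures come from this section, and the function name is illustrative:

```python
def edge_monthly_cost(gateways, gateway_price, amortize_months,
                      cellular_per_gw, maintenance, cloud_residual):
    """Total monthly cost of the edge-gateway approach."""
    amortized = gateways * gateway_price / amortize_months
    return amortized + gateways * cellular_per_gw + maintenance + cloud_residual

# Year-2 numbers from the soil-monitoring example
edge = edge_monthly_cost(gateways=50, gateway_price=200, amortize_months=36,
                         cellular_per_gw=2, maintenance=50, cloud_residual=5)
cloud_only = 50  # $/month, cloud-only baseline
print(f"edge ~= ${edge:.0f}/month vs cloud-only ${cloud_only}/month")  # $433 vs $50
```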
50.6.1 Worked Example: Danfoss Supermarket Refrigeration Edge Strategy
Scenario: Danfoss, a Danish climate technology company, manages refrigeration controllers in 40,000 European supermarkets. Each store has 12 display cases with 3 sensors each (temperature, defrost status, compressor current), reporting every 30 seconds.
Given:
- 40,000 stores x 12 cases x 3 sensors = 1,440,000 sensors
- 30-second intervals = 86,400 s / 30 s = 2,880 readings/sensor/day
- Raw payload: 8 bytes per reading (timestamp + value)
- Cellular backhaul at EUR 0.50/MB
Step 1: Calculate raw data volume
| Metric | Value |
|---|---|
| Daily readings (total) | 1,440,000 x 2,880 = 4.15 billion |
| Daily raw data | 4.15 x 10^9 x 8 bytes = 33.2 GB |
| Daily cellular cost | 33,200 MB x EUR 0.50/MB = EUR 16,600 |
Step 2: Apply edge patterns per sensor type
| Sensor | Pattern | Logic | Reduction |
|---|---|---|---|
| Temperature | Filter | Transmit only if outside -22 to -18 C (freezer) or 2 to 4 C (chiller) | 97% (normal 97% of time) |
| Defrost status | Filter | Transmit only on state change (start/stop) | 99.5% (2 events/day vs 2,880) |
| Compressor current | Aggregate | Transmit 5-minute RMS + peak | 90% (12 summaries/hour vs 120 raw) |
Step 3: Calculate optimized volumes
Each sensor type accounts for one-third of the 1,440,000 sensors (480,000 sensors each):
| Metric | Raw | After Edge | Savings |
|---|---|---|---|
| Temperature data/day | 11.1 GB | 333 MB | 97% |
| Defrost data/day | 11.1 GB | 55 MB | 99.5% |
| Compressor data/day | 11.1 GB | 1.1 GB | 90% |
| Total daily | 33.2 GB | 1.49 GB | 95.5% |
| Cellular cost/day | EUR 16,600 | EUR 745 | EUR 15,855 saved |
Result: Edge processing reduces daily cellular costs from EUR 16,600 to EUR 745 – a 95.5% reduction. The fleet-wide gateway investment (40,000 gateways x EUR 120 = EUR 4.8 million) recovers in about 10 months through cellular savings of EUR 15,855 per day.
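The Danfoss volumes can be reproduced from the givens, with the reduction factors taken from the table above:

```python
SENSORS_PER_TYPE = 480_000        # 1,440,000 sensors split across 3 types
READINGS_PER_DAY = 86_400 // 30   # 2,880 at 30-second intervals
BYTES_PER_READING = 8
EUR_PER_MB = 0.50

raw_per_type_gb = SENSORS_PER_TYPE * READINGS_PER_DAY * BYTES_PER_READING / 1e9
reductions = {"temperature": 0.97, "defrost": 0.995, "compressor": 0.90}

raw_total_gb = raw_per_type_gb * 3
after_edge_gb = sum(raw_per_type_gb * (1 - r) for r in reductions.values())
print(f"raw: {raw_total_gb:.1f} GB/day, after edge: {after_edge_gb:.2f} GB/day")
print(f"daily cost after edge: EUR {after_edge_gb * 1000 * EUR_PER_MB:,.0f}")
```

This yields 33.2 GB/day raw and 1.49 GB/day after edge processing; the chapter's EUR 745 figure rounds the 1.49 GB before multiplying by the per-MB rate.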
Key Insight: The three sensor types in the same deployment use three different edge patterns. Temperature uses Filter (only exceptions matter), defrost uses Filter (only state changes matter), and compressor current uses Aggregate (trend data matters). Pattern selection is per-sensor, not per-deployment.
50.7 Summary
- Four edge processing patterns address different IoT requirements: Filter (threshold alerts), Aggregate (statistical summaries), Infer (ML-based detection), and Store-Forward (intermittent connectivity)
- Pattern selection depends on primary priority: bandwidth reduction, real-time response, reliability, or privacy
- Edge ML trade-offs balance latency (10-50 ms edge vs 200-500 ms cloud) against accuracy (85-92% edge vs 95-99% cloud)
- Batch vs streaming trade-offs balance power efficiency against detection latency
- Cost analysis shows edge saves money only at scale (>10,000 sensors) or when latency/privacy requirements justify hardware investment
- Hybrid architectures typically provide the best balance: edge for time-critical decisions, cloud for complex analytics
Key Takeaway
The four edge processing patterns – Filter, Aggregate, Infer, and Store-Forward – each solve a different problem. Filter reduces bandwidth by sending only exceptions. Aggregate computes local statistics for trend analysis. Infer runs ML models for intelligent detection. Store-Forward handles intermittent connectivity. Hybrid architectures combining edge speed with cloud analytics consistently outperform either approach alone, but edge only saves money at scale (>10,000 sensors) or when latency and privacy requirements justify hardware investment.
For Kids: Meet the Sensor Squad!
“The Four Superpowers of Edge Computing!”
The Sensor Squad was protecting a giant farm with thousands of soil sensors. But there was a problem – sending ALL that data to the Cloud was like trying to push an elephant through a garden hose!
“I have an idea!” said Max the Microcontroller. “I have four superpowers we can use!”
Superpower 1: The Filter! “Sammy, you check the soil temperature every second, but I only need to know when it gets dangerously hot or cold. So I will only pass along the alarm readings – that is like reading 1,000 pages of a book but only sending the 2 most exciting pages!”
Superpower 2: The Aggregator! “Instead of sending every single moisture reading, I will calculate the average, the highest, and the lowest for each hour. Three numbers instead of 3,600 – that is a 1,200x data diet!”
Superpower 3: The Brain! “I have a tiny AI brain that can look at sensor patterns and spot trouble. If the soil looks sick, I will send an alert. If everything is normal, I stay quiet.”
Superpower 4: The Memory! Bella the Battery added, “And when the internet goes down on the farm – which happens a lot – Max stores everything in his memory and sends it all when the connection comes back. No data lost!”
Lila the LED flashed in four colors: “Filter, Aggregate, Infer, Store-Forward – the four superpowers of edge computing!”
What did the Squad learn? There are four ways to handle data at the edge, and the best choice depends on what you need: saving bandwidth, spotting problems, running AI, or surviving without internet!
50.8 Concept Check
50.9 Concept Relationships
How This Concept Connects
Builds on:
- Edge Compute Patterns Overview - Foundation concepts
- Edge IoT Reference Model - Level 3 processing context
Four Patterns in Detail:
- Filter Pattern - 99%+ bandwidth reduction by sending only threshold violations
- Aggregate Pattern - Statistical summaries (min/max/avg) for trend analysis
- Infer Pattern - ML models run locally, send anomaly alerts only
- Store-Forward Pattern - Buffers data during outages, syncs when reconnected
Real-World Examples:
- Danfoss Supermarkets - 95.5% data reduction across 40,000 stores using three different patterns
- Agricultural Soil Monitoring - Cost analysis shows cloud-only cheaper until very large scale
50.10 See Also
Related Resources
Pattern Implementation:
- Edge Patterns Practical Guide - Interactive tools and worked examples
- Edge Cyber-Foraging - Opportunistic compute offloading
Architecture Context:
- Edge, Fog, and Cloud Overview - Three-tier architecture
- Multi-Sensor Data Fusion - Combining patterns for sensor fusion
Case Studies:
- Danfoss Case Study (in chapter) - Three patterns for three sensor types
- Agricultural Cost Analysis (in chapter) - When edge investment does not pay off
50.11 Try It Yourself
Hands-On Exercise: Implement the Four Edge Patterns
Scenario: Industrial motor monitoring with vibration sensors at 1 kHz sampling.
Pattern 1: Filter
```python
def filter_pattern(vibration_g, threshold=5.0):
    """Send only threshold violations"""
    if vibration_g > threshold:
        return {"alert": True, "value": vibration_g, "pattern": "filter"}
    return None  # Don't transmit normal readings

# Test: 8 samples, 3 exceed threshold (5.0)
readings = [2.1, 3.5, 2.8, 6.2, 4.1, 7.8, 3.2, 5.9]
alerts = [a for a in map(filter_pattern, readings) if a]
print(f"Filter: {len(readings)} readings -> {len(alerts)} alerts")
```

Pattern 2: Aggregate
```python
import statistics

def aggregate_pattern(readings_1sec):
    """Compute 1-second statistics from 1 kHz samples"""
    return {
        "min": min(readings_1sec),
        "max": max(readings_1sec),
        "avg": statistics.mean(readings_1sec),
        "stddev": statistics.stdev(readings_1sec),
        "pattern": "aggregate"
    }

# Test: 1000 samples -> 4 values = 250x reduction
samples_1khz = [2.1 + (i * 0.01) for i in range(1000)]
summary = aggregate_pattern(samples_1khz)
print("Aggregate: 1000 samples -> 4 values = 250x reduction")
```

Pattern 3: Infer (simplified)
```python
def infer_pattern(vibration_rms, threshold_trained=4.5):
    """Simplified ML inference: anomaly score based on trained threshold"""
    anomaly_score = vibration_rms / threshold_trained
    if anomaly_score > 1.2:  # 20% above normal
        return {"anomaly": True, "score": round(anomaly_score, 2),
                "pattern": "infer"}
    return None  # Normal operation

# Test: 5.8 g RMS is above the trained baseline of 4.5 g
result = infer_pattern(5.8)
print(f"Infer: {result}")
```

Pattern 4: Store-Forward
```python
class StoreForwardBuffer:
    def __init__(self, max_size_mb=100):
        self.buffer = []
        self.max_size = max_size_mb * 1024 * 1024

    def store(self, reading):
        self.buffer.append(reading)
        if len(self.buffer) > 1000:  # Simplified FIFO eviction
            self.buffer.pop(0)

    def forward(self, network_available):
        if network_available and self.buffer:
            print(f"Forwarding {len(self.buffer)} buffered readings")
            self.buffer.clear()
            return True
        return False

# Test: buffer during outage, then forward
buffer = StoreForwardBuffer()
for i in range(50):
    buffer.store({"reading": i})
buffer.forward(network_available=False)  # No sync
print(f"Buffer size: {len(buffer.buffer)} readings")
buffer.forward(network_available=True)   # Sync successful
print(f"After sync: {len(buffer.buffer)} readings")
```

What to Observe:
- Filter drastically reduces transmissions (99%+) but loses trend information
- Aggregate preserves statistics while achieving 250x reduction
- Infer requires trained model but enables intelligent detection
- Store-Forward ensures no data loss during outages
Extension Challenge: Combine patterns: Filter + Aggregate. Send hourly aggregates for normal operation, immediate alerts for threshold violations.
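One possible solution sketch for the extension challenge (class name, thresholds, and window size are illustrative): transmit violations immediately and fold normal readings into a rolling summary.

```python
import statistics

class FilterAggregate:
    """Combined pattern: alert on violations, aggregate normal readings."""
    def __init__(self, threshold=5.0, window_size=3600):
        self.threshold = threshold
        self.window_size = window_size
        self.normal = []

    def process(self, value):
        if value > self.threshold:
            return {"alert": True, "value": value}   # transmit immediately
        self.normal.append(value)
        if len(self.normal) == self.window_size:     # window full: summarize
            summary = {"avg": statistics.mean(self.normal),
                       "max": max(self.normal), "count": len(self.normal)}
            self.normal = []
            return summary
        return None                                  # hold for aggregation

combo = FilterAggregate(threshold=5.0, window_size=4)
out = [combo.process(v) for v in [2.0, 6.5, 3.0, 4.0, 1.0]]
print(out)  # one immediate alert (6.5), one summary once 4 normals accumulate
```

Note that the alert path bypasses the window entirely, so detection latency for violations stays at per-reading speed while routine data still gets the aggregation bandwidth savings.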
50.12 What’s Next
| Next Topic | Description |
|---|---|
| Cyber-Foraging and Caching | Opportunistic compute offloading and caching strategies |
| Edge Patterns Practical Guide | Interactive tools, worked examples, and common pitfalls |