31  Edge Processing for Big Data

In 60 Seconds

Edge processing transforms impossible IoT data volumes into manageable analytics by filtering 90% of redundant data locally, aggregating 9% into summaries, and sending only 1% of critical anomalies to the cloud. This reduces bandwidth costs by 98% or more and cuts latency from 500ms to under 100ms.

Learning Objectives

After completing this chapter, you will be able to:

  • Apply the 90/10 rule for IoT data reduction
  • Calculate bandwidth and cost savings from edge processing
  • Design edge-to-cloud data pipelines
  • Determine when to process at edge versus cloud

Key Concepts

  • Edge pre-aggregation: Computing summary statistics (min, max, mean, count) on raw sensor data at the source device before transmission, reducing bandwidth by orders of magnitude.
  • Fog computing: An intermediate tier between edge devices and the cloud that provides more computational resources than a sensor node but lower latency than a remote data centre.
  • Data thinning: Selectively transmitting only readings that exceed a threshold or change by a significant amount, drastically reducing the volume of data sent to the cloud.
  • Federated learning: A distributed ML training approach where models are trained locally on edge devices and only model weight updates (not raw data) are sent to a central server.
  • Streaming aggregation: The process of computing rolling statistics (e.g., 5-minute average) over a continuous data stream without storing individual readings.
  • Bandwidth budget: The maximum data transmission allowance for an IoT deployment, often constrained by wireless link capacity, data plan costs, or energy consumption.
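The streaming-aggregation concept above is simple to sketch. Here is a minimal rolling-window aggregator in Python; the class name and window size are illustrative, not from any specific library:

```python
from collections import deque

class RollingAggregator:
    """Rolling min/max/mean/count over the most recent readings,
    without retaining the full raw stream for the cloud."""

    def __init__(self, window_size: int):
        self.window = deque(maxlen=window_size)  # old readings fall off

    def add(self, reading: float) -> None:
        self.window.append(reading)

    def summary(self) -> dict:
        if not self.window:
            return {"count": 0}
        return {
            "count": len(self.window),
            "min": min(self.window),
            "max": max(self.window),
            "mean": sum(self.window) / len(self.window),
        }

agg = RollingAggregator(window_size=300)  # e.g. a 5-minute window at 1 Hz
for t in range(600):
    agg.add(72.0 + (t % 3) * 0.1)  # synthetic temperature readings
print(agg.summary()["count"], agg.summary()["mean"])
```

Only the four summary numbers ever need to leave the device, regardless of how many raw readings passed through the window.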

Edge processing for big data means analyzing IoT data where it is collected instead of sending everything to a central location. Think of a security guard who watches camera feeds on-site instead of streaming all video to a remote office. This local processing reduces network costs and enables faster responses.

31.1 Edge Computing: Making Big Data Manageable

Here’s a secret that changes everything: 90% of IoT data doesn’t need to leave the device.

31.1.1 The Traffic Camera Example

A single traffic camera generates 30 frames per second, 24/7:

Without Edge Processing:

  • Raw data: 30 fps x 2 MB/frame x 86,400 seconds/day = 5.2 TB per day per camera
  • A city with 10,000 cameras would generate 52 PB per day (52,000,000 GB!)
  • Network bandwidth required: 10,000 cameras x 60 MB/s = 600 GB/s = 4.8 Tbps continuous (impossible for any city)
  • Storage cost: 52 PB x $0.023/GB/month (AWS S3) = $1,196,000/month ($14M/year!)

With Edge Processing:

Instead of sending all video to the cloud:

  1. Camera processes video locally (edge computing using Nvidia Jetson or similar)
  2. Counts vehicles, detects accidents, identifies license plates
  3. Sends only summary: “Camera 4521: 847 vehicles, 2 accidents, peak at 8:15 AM”
  4. Daily data sent: ~1 KB instead of 5.2 TB (5 billion x reduction!)

Cost Comparison:

| Metric | No Edge Processing | With Edge Processing | Reduction |
|---|---|---|---|
| Data volume/day | 52 PB | 10 GB | 5,200,000x |
| Network bandwidth | 4.8 Tbps | 1 Mbps | 4,800,000x |
| Storage cost/year | $14,000,000 | $280 | 50,000x |
| Edge hardware cost | $0 | $200/camera x 10K = $2M | One-time cost |

ROI: Edge hardware pays for itself in 2 months of storage savings alone!

31.1.2 The 90/10 Rule for IoT Data

Figure 31.1: Edge Computing Reduces 5.2TB Daily Data to Manageable Summaries

Figure 31.2: Timeline showing how edge processing progressively reduces data volume in real-time

Edge Computing Data Reduction Pipeline: Edge processors filter 90% of redundant data locally, aggregate 9% into summaries, and send only 1% of critical anomalies to cloud – transforming an impossible 5.2 TB/day per camera into a manageable stream of summaries and alerts.

The 90/10 Rule Breakdown:

| Data Type | Processed At | Example | Sent to Cloud? |
|---|---|---|---|
| Raw readings (90%) | Edge only | Temperature: 72.1 F, 72.1 F, 72.2 F, 72.1 F… | No (redundant) |
| Aggregated summaries (9%) | Edge to Cloud | Average temp: 72 F, min: 68 F, max: 76 F | Yes (batch) |
| Anomalies/alerts (1%) | Edge to Cloud | Temperature spike: 95 F! | Yes (real-time) |

This transforms an impossible big data problem into a manageable analytics problem.
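The three tiers can be expressed as a small routing function. The thresholds below are hypothetical placeholders for values a real deployment would calibrate:

```python
def route_reading(value, last_sent,
                  alert_threshold=90.0, change_threshold=0.5):
    """Route one reading into the 90/9/1 tiers: 'discard' (redundant),
    'aggregate' (feeds a batched summary), or 'alert' (sent now).
    Thresholds are illustrative, not from the chapter."""
    if value >= alert_threshold:
        return "alert"       # ~1%: real-time anomaly
    if last_sent is None or abs(value - last_sent) >= change_threshold:
        return "aggregate"   # ~9%: changed enough to matter
    return "discard"         # ~90%: redundant, stays on the device

print(route_reading(72.1, 72.1))  # discard
print(route_reading(72.8, 72.1))  # aggregate
print(route_reading(95.0, 72.1))  # alert
```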

31.1.3 Interactive: Edge Data Reduction Calculator

Explore how edge processing reduces data volumes and costs. Adjust the parameters to model your own IoT deployment.

The data reduction ratio compounds across layers. For \(N\) sensors at rate \(f\) Hz, each \(b\) bytes:

Raw data rate: \(R_{raw} = N \times f \times b\)

After edge filtering (keep anomalies \(p_a\) + sample fraction \(p_s\)): \[R_{edge} = R_{raw} \times (p_a + p_s)\]

After fog aggregation (hourly summaries, forward anomalies): \[R_{fog} = \frac{N \times b_{summary}}{3600} + R_{raw} \times p_a\]

Example: 10,000 sensors, 1 Hz, 100 bytes, \(p_a = 0.01\), \(p_s = 0.05\), \(b_{summary} = 200\) bytes:

- \(R_{raw} = 10{,}000 \times 1 \times 100 = 1 \text{ MB/s}\)
- \(R_{edge} = 1 \text{ MB/s} \times 0.06 = 60 \text{ KB/s}\) (94% reduction)
- \(R_{fog} = \frac{10{,}000 \times 200}{3{,}600} + 1{,}000{,}000 \times 0.01 = 556 + 10{,}000 = 10{,}556 \text{ bytes/s} \approx 10.56 \text{ KB/s}\) (98.9% total reduction)

Daily cloud ingress: \(10.56 \text{ KB/s} \times 86,400 = 913 \text{ MB/day}\) vs. \(86.4 \text{ GB/day}\) raw.
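The formulas above translate directly into code. A short sketch that reproduces the worked numbers:

```python
def pipeline_rates(n_sensors, hz, bytes_per_reading,
                   p_anomaly, p_sample, summary_bytes):
    """Bytes/second at each tier: raw, after edge filtering,
    and after fog aggregation (hourly summaries + forwarded anomalies)."""
    r_raw = n_sensors * hz * bytes_per_reading
    r_edge = r_raw * (p_anomaly + p_sample)
    r_fog = n_sensors * summary_bytes / 3600 + r_raw * p_anomaly
    return r_raw, r_edge, r_fog

r_raw, r_edge, r_fog = pipeline_rates(10_000, 1, 100, 0.01, 0.05, 200)
print(r_raw, round(r_edge), round(r_fog))  # 1000000 60000 10556
```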

31.1.4 Progressive Data Reduction Example

Smart Building HVAC System (1,000 temperature sensors):

Level 1 - Sensors: 1,000 sensors x 1 reading/sec x 4 bytes = 4 KB/sec raw data

Level 2 - Edge Gateway (per floor, 10 floors):
  - Aggregate 100 sensors to 10 room averages
  - 10 floors x 10 rooms x 1 reading/sec x 4 bytes = 400 bytes/sec
  - Reduction: 4 KB to 400 bytes (10x smaller)

Level 3 - Building Controller:
  - Aggregate 100 rooms to 10 zone averages
  - 10 zones x 1 reading/sec x 4 bytes = 40 bytes/sec
  - Reduction: 400 bytes to 40 bytes (10x smaller)

Level 4 - Cloud:
  - Receive zone averages + anomaly alerts only
  - 40 bytes/sec normal + 1 alert/minute average
  - Total cloud data: ~3.5 MB/day, or ~1.3 GB/year (vs 126 GB/year without edge)

Final reduction: 100x less data sent to cloud, while preserving all important information!
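The four-level reduction above can be sketched as successive averaging steps over synthetic readings (10 sensors per room, 10 rooms per zone, as in the example):

```python
def mean(xs):
    return sum(xs) / len(xs)

# Level 1: 1,000 raw readings per second (synthetic temperatures)
sensors = [70.0 + (i % 10) * 0.5 for i in range(1000)]

# Level 2: each floor gateway averages 10 sensors into a room value
rooms = [mean(sensors[i:i + 10]) for i in range(0, 1000, 10)]   # 100 rooms

# Level 3: the building controller averages 10 rooms into a zone value
zones = [mean(rooms[i:i + 10]) for i in range(0, 100, 10)]      # 10 zones

# Level 4: only the 10 zone values (plus any alerts) go to the cloud
print(len(sensors), len(rooms), len(zones))  # 1000 100 10
```

Each level shrinks the per-second payload by 10x while preserving the averages the cloud actually needs.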

31.1.5 Real-World Example: Smart City Traffic

The Scenario: 10,000 traffic cameras generating data

Figure 31.3: Sequential versus Parallel Distributed Image Processing

Traditional vs Big Data Processing: Sequential processing on single server takes hours for 10,000 images; parallel distributed processing completes in seconds by dividing work across 1,000 nodes.

| Approach | Method | Time |
|---|---|---|
| Traditional | Process 1 image at a time | Hours |
| Big Data | Process 1,000 images simultaneously | Seconds |

31.2 Understanding Check: When to Process at Edge vs Cloud

Scenario: An industrial IoT deployment has 1,000 vibration sensors on factory equipment, each sampling at 10,000 Hz (10,000 readings per second). Each reading is 4 bytes. Engineers need to detect bearing failures in real-time (<100ms) while also storing long-term data for predictive maintenance models.

Think about: Why would you process at edge instead of sending all data to the cloud?

Key Insights:

  1. Bandwidth Problem:

    • Raw data: 1,000 sensors x 10,000 readings/sec x 4 bytes = 40 MB/second = 320 Mbps
    • Most factory internet: 10-100 Mbps upload
    • Real number: Edge processing reduces bandwidth by 99% (40 MB/s to 400 KB/s)

    Edge processing approach:

    Per sensor: 10,000 readings/sec -> Edge FFT analysis -> 1 summary/sec (400 bytes)
    Fleet: 40 MB/sec -> Process locally -> 400 KB/sec to cloud
    Bandwidth: 320 Mbps -> 3.2 Mbps (99% reduction!)
  2. Latency Problem:

    • Round-trip to cloud: 50-200ms (too slow for real-time failure detection)
    • Edge processing: <10ms (fast enough to trigger emergency shutdown)
    • Real number: Edge enables <100ms response time vs 200ms+ cloud processing
  3. Cost Problem:

    • Cloud storage: 40 MB/sec x 86,400 sec/day = 3.46 TB/day = 1.26 PB/year
    • AWS S3 Standard: 1,260 TB x $23/TB/month = $29,000/month = $348,000/year
    • Edge processing + summary storage: 400 KB/sec x 86,400 = 34.56 GB/day = 12.6 TB/year ≈ $290/month ($3,480/year)
    • Real number: Edge processing saves roughly $345,000/year (99% storage cost reduction)

Decision Rule:

Process at EDGE when:
- High sample rate (>100 Hz) generates massive data
- Real-time response required (<100ms)
- Limited bandwidth to cloud
- Data can be summarized without loss (FFT, aggregates)

Send to CLOUD when:
- Complex ML models need GPU clusters
- Historical analysis across all devices
- Data must be preserved in raw form
- Cost of edge processing > cloud storage
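The two checklists can be condensed into a helper function. Treating any single edge criterion as sufficient is a deliberate simplification of the lists above:

```python
def placement(sample_rate_hz, latency_ms, bandwidth_limited, summarizable):
    """'edge' if any edge criterion holds, else 'cloud' -- a simplified
    encoding of the chapter's decision rule."""
    edge_reasons = [
        sample_rate_hz > 100,   # high sample rate -> massive raw data
        latency_ms < 100,       # real-time response required
        bandwidth_limited,      # thin or costly link to the cloud
        summarizable,           # FFT/aggregates preserve what matters
    ]
    return "edge" if any(edge_reasons) else "cloud"

print(placement(10_000, 50, True, True))       # edge: vibration sensors
print(placement(1 / 60, 5_000, False, False))  # cloud: 1 reading/min
```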

Edge-Cloud Hybrid Architecture:

Edge Device:
- Raw sampling: 10,000 Hz vibration data
- FFT analysis: Extract frequency spectrum
- Anomaly detection: Compare to baseline
- Alert: Send if threshold exceeded (immediate)
- Summary: Send statistics every second (normal operation)

Cloud:
- Store: Summaries from all 1,000 sensors
- Train: ML models on historical patterns
- Predict: Equipment failures weeks in advance
- Deploy: Updated models back to edge devices
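The edge device's FFT-and-threshold step can be approximated in pure Python with the Goertzel algorithm, which computes signal power at a single frequency far more cheaply than a full FFT. The 3.2 kHz fault frequency, signal amplitudes, and alert threshold below are illustrative assumptions, not values from the scenario:

```python
import math

def goertzel_power(samples, sample_rate, target_hz):
    """Signal power at one frequency bin (Goertzel algorithm) --
    far cheaper than a full FFT on a small edge device."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)  # nearest DFT bin
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev**2 + s_prev2**2 - coeff * s_prev * s_prev2

RATE = 10_000                        # Hz, as in the scenario
t = [i / RATE for i in range(RATE)]  # one 1-second analysis window
healthy = [math.sin(2 * math.pi * 120 * ti) for ti in t]
faulty = [math.sin(2 * math.pi * 120 * ti)
          + 0.5 * math.sin(2 * math.pi * 3_200 * ti) for ti in t]

baseline = goertzel_power(healthy, RATE, 3_200)  # ~0: no fault energy
reading = goertzel_power(faulty, RATE, 3_200)
print(reading > 10 * (baseline + 1.0))  # True -> raise alert locally
```

Only the per-second band powers (or an alert flag) leave the device; the 10,000 raw samples per window never do.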

31.3 Edge vs Fog vs Cloud Decision Framework

Figure 31.4: Edge versus Fog versus Cloud Processing Decision Tree

31.3.1 Specific Numbers for Common IoT Scenarios

| Scenario | Devices | Data Rate | Recommended Stack | Monthly Cost | Key Metric |
|---|---|---|---|---|---|
| Smart Home | 10-50 | 1 KB/min per device | Cloud only (AWS IoT) | $5-20 | Simplicity |
| Building Automation | 100-1,000 | 100 bytes/min | Fog + Cloud (InfluxDB) | $50-200 | 10x storage compression |
| Smart City | 10,000-100,000 | 1 KB/min | Edge + Cloud (Kafka + S3) | $2,000-10,000 | 99% bandwidth reduction |
| Industrial Monitoring | 1,000-10,000 | 10 KB/sec (high-freq) | Edge + Time-Series DB | $500-5,000 | <100ms latency |
| Connected Vehicles | 1M+ vehicles | 1 MB/hour per vehicle | Batch processing (Hadoop) | $20,000+ | Petabyte-scale analytics |

31.3.2 Edge Processing Decision: Specific Bandwidth Numbers

Use Edge Processing When:

High-Frequency Sensors (saves 99%+ bandwidth):

Example: Vibration sensor at 10,000 Hz
Raw: 10,000 samples/sec x 4 bytes = 40 KB/sec = 320 kbps
Edge FFT: 1 summary/sec x 400 bytes = 400 bytes/sec = 3.2 kbps
Reduction: 99% bandwidth saved

Video Processing (saves 99.9%+ bandwidth):

Example: Security camera at 1080p 30fps
Raw: 30 frames/sec x 2 MB/frame = 60 MB/sec = 480 Mbps
Edge object detection: 1 event/min x 1 KB ≈ 17 bytes/sec ≈ 133 bps
Reduction: 99.9999% bandwidth saved

Limited Connectivity (cellular/satellite):

Cellular data: $10/GB typical
Raw streaming: 40 MB/sec x 86,400 sec/day = 3.46 TB/day = $34,600/day
Edge summarization: 400 KB/day = $0.004/day
Savings: $34,600/day to $0.004/day (99.9999% cost reduction)

Skip Edge Processing When:

Low-Rate Sensors (edge overhead not worth it):

Example: Temperature sensor at 1 reading/min
Data: 1 sample/min x 50 bytes = 50 bytes/min = 72 KB/day
Cost to cloud: $0.00072/day (negligible)
Edge device cost: $20 one-time + $2/year power
Not worth edge processing for such low data rates
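A crude break-even test captures this trade-off. The ~$22 figure is the example's first-year edge cost ($20 hardware plus $2 power), used here purely as an illustration:

```python
def worth_edge(bytes_per_day, cost_per_gb, edge_annual_cost):
    """Break-even test: add edge hardware only if a year of raw
    transmission costs more than running the edge device. Ignores
    latency, privacy, and offline reasons that can still justify edge."""
    annual_transmit_cost = bytes_per_day / 1e9 * cost_per_gb * 365
    return annual_transmit_cost > edge_annual_cost

# Temperature sensor: 72 KB/day over $10/GB cellular vs ~$22/yr edge cost
print(worth_edge(72_000, 10.0, 22.0))    # False -> skip edge
# Vibration fleet: 3.46 TB/day raw
print(worth_edge(3.46e12, 10.0, 22.0))   # True -> edge pays for itself
```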

31.4 Case Study: Rolls-Royce Engine Health Monitoring

Rolls-Royce’s TotalCare program monitors over 13,000 jet engines in service worldwide. Each engine has approximately 25 sensor channels recording parameters like fan speed (N1, N2), exhaust gas temperature (EGT), oil pressure, and vibration. During flight, sensors sample at rates from 1 Hz (temperatures) to 10 kHz (vibration accelerometers).

The raw data challenge:

  • Per engine per flight (average 6 hours): 25 channels x mixed rates = approximately 2.5 GB
  • Fleet daily: 13,000 engines x 2 flights average x 2.5 GB = 65 TB/day
  • Annual storage at cloud rates ($0.023/GB/month): $18 million/year just for storage

Edge processing solution (Engine Health Monitoring Unit onboard):

The onboard EHMU performs real-time analysis during flight:

  1. Snapshot extraction: Captures steady-state snapshots at specific flight phases (takeoff, climb, cruise, descent) – reduces continuous data to 8-12 key snapshots per flight
  2. Trend parameters: Computes delta-EGT (difference from baseline), vibration spectral peaks, oil consumption rate
  3. Exceedance detection: Flags any parameter exceeding operational limits for immediate post-flight review

Data reduction result:

| Metric | Raw | After Edge Processing | Reduction |
|---|---|---|---|
| Per engine per flight | 2.5 GB | 150 KB (snapshots + trends) | 99.994% |
| Fleet daily | 65 TB | 3.9 GB | 99.994% |
| Annual storage cost | $18M | $1,080 | 99.994% |

Why this matters beyond cost: The edge-processed trend data enables Rolls-Royce to predict EGT margin degradation weeks before it triggers a maintenance event. Airlines receive advance notice to schedule engine washes or blade replacements during planned ground time rather than expensive unscheduled diversions (which cost $150,000-$500,000 per event in delay penalties, passenger rebooking, and ferry flights).

Key Insight: Edge processing is not just a cost optimization – it makes the analytics possible. No airline or cloud provider could handle 65 TB/day from a single customer’s engines. The onboard EHMU turns an impossible data problem into a lightweight, actionable data stream.

31.5 Cloud-Only vs Edge Processing Economics

Figure 31.5: Cloud Only versus Edge Processing Cost and Latency Comparison

Cloud vs Edge Processing Economics: Sending 5 MB raw images to cloud costs $1,166/month with 500ms latency. Edge processing with local AI model sends only 100-byte alerts, reducing costs to $19.50/month (98% savings) and latency to 100ms (5x faster).

The Cost (Cloud-Only):

  • Bandwidth: 5 MB/second x 86,400 seconds/day = 432 GB/day
  • Cloud data transfer: 432 GB/day x $0.09/GB (AWS internet transfer rate) = $38.88/day = $1,166/month
  • Processing latency: Upload time + cloud processing = 500ms+ (too slow for real-time)

The Fix: Edge processing - process at the device

| Approach | Data Sent to Cloud | Bandwidth/Day | Monthly Cost | Latency | Savings |
|---|---|---|---|---|---|
| Cloud-only | 5 MB raw images | 432 GB | $1,166 | 500ms | Baseline |
| Edge processing | 100-byte alerts (only when person detected) | 7.2 GB | $19.50 | 100ms | 98% cost + 5x faster |

The Rule:

> "Process data as close to the source as possible. Only send insights to cloud, not raw data."

31.5.1 Interactive: Edge Processing ROI Calculator

Model the return on investment for deploying edge processing hardware in your IoT deployment.

Pitfall: Assuming Cloud Bandwidth Is Free

A common architectural mistake is designing an IoT system as if network bandwidth has zero cost and infinite capacity. As shown above, even a single device streaming at 5 MB/s generates 432 GB/day. At scale (thousands of devices), cloud-only architectures become physically impossible – no amount of budget can overcome bandwidth limits that exceed available network capacity. Always estimate your raw data rate before choosing an architecture.
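A back-of-envelope estimator makes that check trivial; here it is applied to the single 5 MB/s device from the example above:

```python
def raw_rate(devices, hz, bytes_per_sample):
    """Back-of-envelope raw data rate: (bytes/sec, GB/day).
    Run this before committing to a cloud-only architecture."""
    bps = devices * hz * bytes_per_sample
    return bps, bps * 86_400 / 1e9

# The single device streaming 5 MB every second:
bps, gb_day = raw_rate(devices=1, hz=1, bytes_per_sample=5_000_000)
print(gb_day)  # 432.0
```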

Sammy the Sensor had a BIG problem. He was a security camera watching the school entrance, and he was taking 30 pictures every single second!

“Max!” Sammy called out to Max the Microcontroller. “I have taken 2.6 MILLION pictures today! How do we send all of these to the cloud?”

Max did some quick math. “That would be 5.2 terabytes of data per day. Our internet connection is like a garden hose, and you are trying to push an entire swimming pool through it!”

Lila the LED had an idea. “What if we do not send ALL the pictures? Most of them look exactly the same – just an empty hallway!”

“Brilliant!” said Max. “That is called EDGE PROCESSING! Instead of sending every picture, I will look at each one right here. If nothing has changed, I will throw it away. If something interesting happens – like a person walking in – THEN I will send that picture to the cloud.”

Bella the Battery was thrilled. “So instead of sending 2.6 million pictures, we send maybe… 100?”

“Exactly!” said Max. “We go from 5.2 terabytes to about 50 megabytes. That is like turning a mountain of mail into a single postcard!”

“This is the 90/10 rule,” Max continued. “90% of sensor data is boring and repetitive. Only 10% is actually interesting. And of that 10%, only 1% is urgent. So we process data RIGHT HERE at the edge, and only send the important stuff to the cloud. It saves money, saves time, and saves battery!”

Bella smiled. “Saves battery? Now you are speaking MY language!”

Key lesson: Edge processing means analyzing data right where it is collected instead of sending everything to the cloud. It is like having a smart mail sorter that only forwards the important letters and throws away the junk mail!

Key Takeaway

The 90/10 rule is the most impactful concept in IoT data management: 90% of sensor data is redundant and can be filtered at the edge. By processing data as close to the source as possible and sending only summaries and alerts to the cloud, you reduce bandwidth costs by 98%, cut latency from seconds to milliseconds, and make otherwise impossible data volumes entirely manageable. Always ask: “Can this data be summarized before it leaves the device?”

A manufacturing facility deploys 1,000 vibration sensors (10 kHz sampling rate) on production equipment. Without edge processing, what are the bandwidth and storage costs?

Scenario Without Edge Processing:

  • Sensors: 1,000 units
  • Sample rate: 10,000 samples/second per sensor
  • Data size: 4 bytes per sample (float32)
  • Raw data rate: 1,000 × 10,000 × 4 = 40 MB/second = 320 Mbps
  • Daily data: 40 MB/s × 86,400 s = 3.46 TB/day
  • Annual data: 3.46 TB × 365 = 1,263 TB = 1.26 PB/year

Cloud Costs:

  • Data transfer (AWS): 3.46 TB/day × $0.09/GB = $311/day = $113,515/year
  • Storage (S3): 1,263 TB × $0.023/GB/month = $29,049/month = $348,588/year
  • Total annual cost: $462,103

With Edge Processing (FFT + Threshold Detection): An edge device (Raspberry Pi 4 or NVIDIA Jetson Nano) performs:

  1. Real-time FFT on 1-second windows → extract 20 frequency bins
  2. Compare to baseline profile → flag anomalies
  3. Send: 20 float values/second (normal) or full waveform if anomaly (1% of time)

Edge-Processed Data Rate:

  • Normal operation (99%): 1,000 sensors × 20 values/s × 4 bytes = 80 KB/s
  • Anomaly (1%): 1,000 × 10,000 × 4 bytes × 0.01 = 400 KB/s
  • Average: 80 + 400 = 480 KB/s
  • Daily: 480 KB × 86,400 = 41.5 GB/day
  • Annual: 41.5 × 365 = 15.1 TB/year

New Cloud Costs:

  • Data transfer: 41.5 GB/day × $0.09/GB = $3.74/day = $1,365/year
  • Storage: 15.1 TB × $0.023/GB/month = $347/month = $4,164/year
  • Total annual cost: $5,529

Edge Hardware Investment:

  • 10 edge gateways @ $400 each (100 sensors per gateway) = $4,000
  • One-time setup labor: $6,000
  • Total upfront: $10,000

ROI Calculation:

  • Annual savings: $462,103 - $5,529 = $456,574
  • Payback period: $10,000 / $456,574 = 0.022 years = 8 days
  • 5-year NPV: $2,282,870 savings - $10,000 investment = $2,272,870

Data Reduction: 1.26 PB → 15.1 TB = 99% reduction (83x compression)

Key Insight: Edge processing pays for itself in 8 days through bandwidth/storage savings alone, while enabling real-time anomaly detection (<100ms local response) impossible with cloud-only architecture.
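The ROI arithmetic above, expressed as a reusable sketch:

```python
def edge_roi(cloud_only_annual, edge_annual, hardware_upfront):
    """Annual savings and payback period in days, mirroring the
    worked example's arithmetic."""
    savings = cloud_only_annual - edge_annual
    payback_days = hardware_upfront / savings * 365
    return savings, payback_days

savings, payback = edge_roi(462_103, 5_529, 10_000)
print(savings, round(payback))  # 456574 8
```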

| Factor | Process at Edge | Send to Cloud | Decision Rule |
|---|---|---|---|
| Sample Rate | >100 Hz (continuous streams) | <1 Hz (periodic samples) | High-frequency data (kHz) generates MB/s - edge filter to summaries |
| Latency Requirement | <100ms (real-time control) | >1 second (analytics) | Safety systems need <50ms - edge only; dashboards tolerate seconds - cloud OK |
| Bandwidth Cost | >1 GB/day (cellular/satellite) | <100 MB/day (Wi-Fi/ethernet) | Cellular $10/GB makes raw streaming prohibitive; edge reduces to KB/day |
| Data Lifetime Value | Value decays rapidly (sensor health) | Long-term value (trends, ML training) | Streaming video: 30 fps has no value after 1 sec (edge detect events); temperature trends need years of history (cloud store) |
| Computational Complexity | Simple (FFT, thresholds, filters) | Complex (deep learning, cross-device correlation) | Edge handles signal processing; cloud handles models requiring GPU clusters |
| Connectivity Reliability | Intermittent (remote, mobile) | Always-on (urban, fixed) | Oil rigs with satellite uplinks - edge buffer outages; city deployments - cloud works |

Quick Decision Matrix:

| Data Volume | Latency Need | Complexity | Deploy At |
|---|---|---|---|
| >1 MB/s | <100ms | Low | Edge |
| >1 MB/s | <100ms | High | Edge + Cloud (edge preprocess, cloud train models) |
| >1 MB/s | >1s | Low | Edge (still cheaper to filter locally) |
| >1 MB/s | >1s | High | Edge + Cloud |
| <1 KB/s | <100ms | Low | Edge (latency dominates) |
| <1 KB/s | <100ms | High | Edge + Cloud |
| <1 KB/s | >1s | Low | Cloud (data volume small enough) |
| <1 KB/s | >1s | High | Cloud (no edge processing needed) |

Exceptions:

  • Even with low data volume, process at edge if: offline operation required, privacy critical (medical data), or power budget prohibits continuous transmission

Common Mistake: Sending All Video to Cloud for “AI Analysis”

The Error: A retail chain installs 500 security cameras (1080p, 30 fps) across 50 stores and streams all video to cloud for “AI-powered theft detection and customer analytics.”

The Math:

  • Per camera: 30 fps × 2 MB/frame = 60 MB/s = 480 Mbps
  • Per store (10 cameras): 600 MB/s = 4.8 Gbps
  • Fleet (500 cameras): 30 GB/s = 240 Gbps
  • Daily: 30 GB/s × 86,400 = 2.59 PB/day
  • Monthly: 2.59 PB × 30 = 77.7 PB/month

Cloud Costs (AWS):

  • Data transfer: 77.7 PB × $0.09/GB = $7.0 million/month
  • Storage (if kept 30 days): 77.7 PB × $0.023/GB = $1.78 million/month
  • AI inference (Rekognition): 2.59 PB/day / 2 MB/frame = 1.3 billion images/day x $0.001 = $1.3 million/day = $39 million/month
  • Total: $47.78 million/month = $573 million/year

Reality Check: Most stores don’t have 4.8 Gbps upload bandwidth (requires business fiber at $5K-15K/month).

Correct Approach - Edge AI: Deploy NVIDIA Jetson Xavier NX at each store ($400/unit):

  1. Run YOLO object detection locally (people, products)
  2. Count customers, detect shelf stockouts, flag suspicious behavior
  3. Send: Event logs (person entered at 10:15:22, shelf 3 empty), not raw video
  4. Send video only when alarm triggered (0.1% of time)

Edge Processing Results:

  • Normal: 10 cameras × 1 KB/s metadata = 10 KB/s per store
  • Alarms (0.1%): 10 cameras × 60 MB/s × 0.001 = 600 KB/s
  • Average per store: 610 KB/s
  • Fleet: 610 KB/s × 50 = 30.5 MB/s
  • Daily: 30.5 MB/s × 86,400 = 2.6 TB/day (vs 2.59 PB without edge)

New Cloud Costs:

  • Data transfer: 2.6 TB/day × 30 × $0.09/GB = $7,020/month
  • Storage: 2.6 TB/day × 30 × $0.023/GB = $1,794/month
  • Total: $8,814/month (vs $47.78M/month)

Edge Hardware:

  • 50 Jetson units @ $400 = $20,000
  • Installation/setup: $30,000
  • Total investment: $50,000

ROI:

  • Monthly savings: $47.78M - $8.8K = $47.77M/month
  • Payback: $50K / $47.77M = 0.001 months = less than 1 hour
  • Annual savings: $573 million

Key Lesson: Video, high-frequency sensor data, and other MB/s-scale streams should ALWAYS be processed at the edge. Only send compressed summaries, alerts, or rare events to cloud. “Send everything to cloud for AI” is a trillion-dollar mistake waiting to happen.

Common Pitfalls

Sending only 5-minute averages to the cloud prevents detecting millisecond transients that precede equipment failure. Retain high-resolution data locally for a rolling window (e.g., last 24 hours) even when transmitting only summaries.

A Raspberry Pi gateway and an STM32 microcontroller have vastly different compute budgets. Profile the target hardware with a representative workload before committing to an edge analytics design.

Edge nodes go offline due to power loss, network issues, or hardware faults. Design the cloud ingestion layer to detect gaps, request retransmission, and flag periods of missing data rather than silently accepting incomplete datasets.

Moving complex ML inference to the edge reduces cloud bandwidth costs but increases edge hardware costs and complicates model update deployments. Calculate the total cost of ownership across all tiers before deciding.

31.6 Summary

  • The 90/10 rule states that 90% of IoT data can be filtered or aggregated at the edge, sending only 10% (summaries and alerts) to the cloud.
  • Edge processing reduces costs by 98-99% by eliminating bandwidth charges, storage costs, and cloud compute expenses for redundant data.
  • Latency improves from 500ms to <100ms when processing happens locally at the edge instead of round-tripping to cloud.
  • Decision framework: Use edge for high-frequency sensors (>100 Hz), real-time requirements (<100ms), or limited bandwidth. Use cloud for ML training and historical analytics.

31.7 What’s Next

| If you want to… | Read this |
|---|---|
| Understand the full big data pipeline architecture | Big Data Pipelines |
| Explore cloud processing for large-scale analytics | Big Data Technologies |
| Apply anomaly detection at the edge | Anomaly Detection Overview |
| Study edge compute patterns in depth | Edge Compute Patterns |
| Return to the module overview | Big Data Overview |