31  Edge Processing for Big Data

In 60 Seconds

Edge processing transforms impossible IoT data volumes into manageable analytics by filtering 90% of redundant data locally, aggregating 9% into summaries, and sending only 1% of critical anomalies to the cloud. This reduces bandwidth costs by 98% or more and cuts latency from 500ms to under 100ms.

Learning Objectives

After completing this chapter, you will be able to:

  • Apply the 90/10 rule for IoT data reduction
  • Calculate bandwidth and cost savings from edge processing
  • Design edge-to-cloud data pipelines
  • Determine when to process at edge versus cloud

Key Concepts

  • Edge pre-aggregation: Computing summary statistics (min, max, mean, count) on raw sensor data at the source device before transmission, reducing bandwidth by orders of magnitude.
  • Fog computing: An intermediate tier between edge devices and the cloud that provides more computational resources than a sensor node but lower latency than a remote data centre.
  • Data thinning: Selectively transmitting only readings that exceed a threshold or change by a significant amount, drastically reducing the volume of data sent to the cloud.
  • Federated learning: A distributed ML training approach where models are trained locally on edge devices and only model weight updates (not raw data) are sent to a central server.
  • Streaming aggregation: The process of computing rolling statistics (e.g., 5-minute average) over a continuous data stream without storing individual readings.
  • Bandwidth budget: The maximum data transmission allowance for an IoT deployment, often constrained by wireless link capacity, data plan costs, or energy consumption.
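
As a minimal sketch of edge pre-aggregation and streaming aggregation, the class below keeps min, max, mean, and count for a window without storing the individual readings. The name `RunningSummary` and the sample temperatures are illustrative assumptions, not part of any specific device SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunningSummary:
    """Summary statistics for one window, updated one reading at a time."""
    count: int = 0
    total: float = 0.0
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def add(self, value: float) -> None:
        # Only four numbers are kept in memory, however long the window is.
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)

    @property
    def mean(self) -> float:
        if self.count == 0:
            raise ValueError("no readings in this window")
        return self.total / self.count


# Hypothetical window of temperature readings (degrees F).
window = RunningSummary()
for reading in [72.1, 72.1, 72.2, 72.1, 72.0]:
    window.add(reading)

print(window.count, window.minimum, window.maximum, round(window.mean, 2))
```

At the end of each window the device would transmit the four summary values and reset the object, which is what keeps the uplink small.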

Edge processing for big data means analyzing IoT data where it is collected instead of sending everything to a central location. Think of a security guard who watches camera feeds on-site instead of streaming all video to a remote office. This local processing reduces network costs and enables faster responses.

31.1 Edge Computing: Making Big Data Manageable

Here’s a secret that changes everything: 90% of IoT data doesn’t need to leave the device.

31.1.1 The Traffic Camera Example

A single traffic camera generates 30 frames per second, 24/7:

Without Edge Processing:

  • Raw data: 30 fps x 2 MB/frame x 86,400 seconds/day = 5.2 TB per day per camera
  • A city with 10,000 cameras would generate 52 PB per day (52,000,000 GB!)
  • Network bandwidth required: 10,000 cameras x 60 MB/s = 600 GB/s = 4.8 Tbps continuous (impossible for any city)
  • Storage cost: 52 PB x $0.023/GB/month (AWS S3) = $1,196,000/month ($14M/year!)
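
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch using the chapter's own figures and decimal units (1 TB = 1,000,000 MB); the variable names are illustrative.

```python
FPS = 30                      # frames per second
FRAME_MB = 2                  # megabytes per frame
SECONDS_PER_DAY = 86_400
CAMERAS = 10_000
S3_USD_PER_GB_MONTH = 0.023   # storage rate quoted in the text

per_camera_mb_s = FPS * FRAME_MB                               # 60 MB/s
per_camera_tb_day = per_camera_mb_s * SECONDS_PER_DAY / 1e6    # about 5.2 TB/day
city_pb_day = per_camera_tb_day * CAMERAS / 1e3                # about 52 PB/day
city_tbps = per_camera_mb_s * CAMERAS * 8 / 1e6                # megabits/s -> terabits/s

# One month of storage for a single day's footage, before rounding to 52 PB.
storage_usd_month = city_pb_day * 1e6 * S3_USD_PER_GB_MONTH

print(f"{per_camera_tb_day:.2f} TB/day per camera")
print(f"{city_pb_day:.1f} PB/day citywide, {city_tbps:.1f} Tbps")
print(f"${storage_usd_month:,.0f}/month")
```

The unrounded storage figure comes out slightly below the $1,196,000 quoted above because the text rounds 51.84 PB up to 52 PB before multiplying.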

With Edge Processing:

Instead of sending all video to the cloud:

  1. Camera processes video locally (edge computing using Nvidia Jetson or similar)
  2. Counts vehicles, detects accidents, identifies license plates
  3. Sends only summary: “Camera 4521: 847 vehicles, 2 accidents, peak at 8:15 AM”
  4. Daily data sent: ~1 KB instead of 5.2 TB (5 billion x reduction!)

Cost Comparison:

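  • Data volume/day: No edge processing sends 52 PB; edge processing sends about 10 GB, which budgets roughly 1 MB per camera per day for periodic summaries and alerts rather than a single ~1 KB daily summary; reduction is 5,200,000x.
  • Network bandwidth: No edge processing needs 4.8 Tbps; edge processing needs about 1 Mbps citywide; reduction is 4,800,000x.
  • Storage cost/year: No edge processing costs about $14,000,000; edge processing costs about $3 by the same calculation (10 GB x $0.023/GB/month x 12); reduction is about 5,200,000x.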
  • Edge hardware cost: No edge hardware costs $0; an edge rollout costs $200/camera x 10,000 = $2M as a one-time investment.

ROI: Edge hardware pays for itself in 2 months of storage savings alone!

31.1.2 The 90/10 Rule for IoT Data

Data reduction pipeline showing raw sensor data filtered at edge from 5.2 TB per day through three stages: 90 percent filtered locally, 9 percent aggregated into summaries, and 1 percent anomalies forwarded to cloud
Figure 31.1: Edge Computing Reduces 5.2TB Daily Data to Manageable Summaries
Timeline visualization showing progressive data volume reduction as IoT data passes through edge filtering, fog aggregation, and cloud ingestion stages over time
Figure 31.2: Timeline showing how edge processing progressively reduces data volume in real-time

Edge Computing Data Reduction Pipeline: Edge processors filter 90% of redundant data locally, aggregate 9% into summaries, and send only 1% of critical anomalies to cloud – transforming an impossible 5.2 TB/day per camera into a manageable stream of summaries and alerts.

Mobile diagram summary

  • The raw stream starts at 5.2 TB/day per camera, which is too large to ship directly to cloud systems.
  • Edge filtering strips out redundant frames first, then aggregation compresses the remaining signal into summaries and alerts.
  • The key timeline is: raw capture -> filtered stream -> aggregated summary -> transmitted result, all within milliseconds at the edge.
  • The design goal is simple: reduce traffic before transmission so the network carries insights instead of raw footage.

The 90/10 Rule Breakdown:

  • Raw readings (90%): processed at the edge only. Example: temperature readings such as 72.1 F, 72.1 F, 72.2 F, 72.1 F. Sent to cloud: no, because the stream is mostly redundant.
  • Aggregated summaries (9%): produced at the edge and forwarded to cloud. Example: average 72 F, minimum 68 F, maximum 76 F. Sent to cloud: yes, usually in batches.
  • Anomalies and alerts (1%): detected at the edge and forwarded immediately. Example: temperature spike to 95 F. Sent to cloud: yes, in real time.

This transforms an impossible big data problem into a manageable analytics problem.
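
One way to picture the three categories is a function that takes a window of raw readings, keeps the raw values local, and returns only a summary plus any alerts. This is a hedged sketch: the function name `process_window` and the 90 F alert threshold are assumptions chosen to match the temperature example.

```python
def process_window(readings, alert_above=90.0):
    """Split one window of readings into (summary, alerts).

    Raw readings never leave this function; only the summary dict
    (sent in batches) and the alert list (sent immediately) would be
    forwarded to the cloud.
    """
    alerts = [r for r in readings if r >= alert_above]
    normal = [r for r in readings if r < alert_above]
    summary = None
    if normal:
        summary = {
            "avg": sum(normal) / len(normal),
            "min": min(normal),
            "max": max(normal),
            "count": len(normal),
        }
    return summary, alerts


# Hypothetical window: four redundant readings and one spike.
summary, alerts = process_window([72.1, 72.1, 72.2, 95.0, 72.1])
print(summary)
print(alerts)
```

A real deployment would likely derive the threshold from a learned baseline rather than a fixed constant.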

31.1.3 Interactive: Edge Data Reduction Calculator

Explore how edge processing reduces data volumes and costs. Adjust the parameters to model your own IoT deployment.

The data reduction ratio compounds across layers. For \(N\) sensors at rate \(f\) Hz, each \(b\) bytes:

Raw data rate: \(R_{raw} = N \times f \times b\)

After edge filtering (keep anomalies \(p_a\) + sample fraction \(p_s\)): \[R_{edge} = R_{raw} \times (p_a + p_s)\]

After fog aggregation (hourly summaries, forward anomalies): \[R_{fog} = \frac{N \times b_{summary}}{3600} + R_{raw} \times p_a\]

Example: 10,000 sensors, 1 Hz, 100 bytes, \(p_a = 0.01\), \(p_s = 0.05\), \(b_{summary} = 200\) bytes:

  • \(R_{raw} = 10{,}000 \times 1 \times 100 = 1{,}000{,}000 \text{ bytes/s} = 1 \text{ MB/s}\)
  • \(R_{edge} = 1 \text{ MB/s} \times (0.01 + 0.05) = 60 \text{ KB/s}\) (94% reduction)
  • \(R_{fog} = \frac{10{,}000 \times 200}{3{,}600} + 1{,}000{,}000 \times 0.01 = 556 + 10{,}000 = 10{,}556 \text{ bytes/s} \approx 10.56 \text{ KB/s}\) (98.9% total reduction)

Daily cloud ingress: \(10{,}556 \text{ bytes/s} \times 86{,}400 \text{ s} \approx 912 \text{ MB/day}\) vs. \(86.4 \text{ GB/day}\) raw.
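
The three formulas translate directly into a small calculator. The sketch below assumes decimal units and hourly summaries, as in the formulas; the function and parameter names are illustrative stand-ins for \(N\), \(f\), \(b\), \(p_a\), \(p_s\), and \(b_{summary}\).

```python
def reduction_rates(n_sensors, rate_hz, bytes_per_reading,
                    p_anomaly, p_sample, summary_bytes,
                    summary_period_s=3600):
    """Return (R_raw, R_edge, R_fog) in bytes per second."""
    r_raw = n_sensors * rate_hz * bytes_per_reading
    r_edge = r_raw * (p_anomaly + p_sample)
    r_fog = n_sensors * summary_bytes / summary_period_s + r_raw * p_anomaly
    return r_raw, r_edge, r_fog


r_raw, r_edge, r_fog = reduction_rates(
    n_sensors=10_000, rate_hz=1, bytes_per_reading=100,
    p_anomaly=0.01, p_sample=0.05, summary_bytes=200,
)
daily_mb = r_fog * 86_400 / 1e6   # about 912 MB/day of cloud ingress

print(f"raw {r_raw:,.0f} B/s, edge {r_edge:,.0f} B/s, fog {r_fog:,.0f} B/s")
print(f"total reduction {100 * (1 - r_fog / r_raw):.1f}%, {daily_mb:.0f} MB/day")
```

Changing `p_anomaly` shows how strongly the final rate depends on the anomaly fraction: in this example the forwarded anomalies dominate the hourly summaries by roughly 18 to 1.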

31.1.4 Progressive Data Reduction Example

Smart Building HVAC System (1,000 temperature sensors):

  1. Sensors: 1,000 sensors x 1 reading/sec x 4 bytes = 4 KB/sec raw data.
  2. Edge gateways (10 floors): aggregate 100 sensors into 10 room averages.
    • Output: 10 floors x 10 rooms x 1 reading/sec x 4 bytes = 400 bytes/sec
    • Reduction: 4 KB/sec -> 400 bytes/sec (10x smaller)
  3. Building controller: aggregate 100 rooms into 10 zone averages.
    • Output: 10 zones x 1 reading/sec x 4 bytes = 40 bytes/sec
    • Reduction: 400 bytes/sec -> 40 bytes/sec (another 10x smaller)
  4. Cloud: receive zone averages plus anomaly alerts only.
    • Normal traffic: 40 bytes/sec
    • Alerts: roughly 1 alert/minute
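    • Total cloud data: ~3.5 GB/year (about 1.26 GB/year of zone averages plus alert payloads) versus 126 GB/year without edge processing

Final reduction: routine traffic to the cloud shrinks 100x (4 KB/sec -> 40 bytes/sec), or about 36x once alert traffic is included, while preserving the important information!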
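
The two aggregation stages can be sketched as repeated averaging over fixed-size groups. The synthetic sensor values and the helper name `chunk_means` are assumptions for illustration; the group sizes follow the example (10 sensors per room, 10 rooms per zone).

```python
from statistics import mean


def chunk_means(values, size):
    """Average consecutive groups of `size` values."""
    if len(values) % size != 0:
        raise ValueError("values must divide evenly into groups")
    return [mean(values[i:i + size]) for i in range(0, len(values), size)]


# Synthetic readings for 1,000 sensors (hypothetical values near 20 C).
sensors = [20.0 + (i % 10) * 0.1 for i in range(1000)]
rooms = chunk_means(sensors, 10)   # edge gateways: 100 room averages
zones = chunk_means(rooms, 10)     # building controller: 10 zone averages

BYTES_PER_VALUE = 4
rates = [len(level) * BYTES_PER_VALUE for level in (sensors, rooms, zones)]
print(rates)  # bytes per second at each tier, at 1 reading/sec
```

Each tier forwards one tenth of the values it receives, which is where the two successive 10x reductions come from.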

31.1.5 Real-World Example: Smart City Traffic

The Scenario: 10,000 traffic cameras generating data

Comparison diagram showing sequential single-server processing taking hours for 10,000 images versus parallel distributed processing across 1,000 nodes completing in seconds
Figure 31.3: Sequential versus Parallel Distributed Image Processing

Traditional vs Big Data Processing: Sequential processing on single server takes hours for 10,000 images; parallel distributed processing completes in seconds by dividing work across 1,000 nodes.

Mobile processing comparison

  • Sequential processing handles one image at a time, so a city-scale batch takes hours.
  • Distributed processing fans the same workload across many workers and finishes in seconds.
  • At smart-city scale, parallel processing is mandatory because one server becomes the bottleneck immediately.
  • Traditional approach: process one image at a time. Typical completion time: hours.
  • Big data approach: process 1,000 images simultaneously. Typical completion time: seconds.
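
The fan-out pattern can be illustrated on a single machine with a worker pool. This is only a sketch: `count_vehicles` is a hypothetical stand-in for real image analysis, and Python threads do not speed up CPU-bound work, so a production system would use processes or a cluster framework instead.

```python
from concurrent.futures import ThreadPoolExecutor


def count_vehicles(image_id: int) -> int:
    """Placeholder for per-image analysis; returns a fake vehicle count."""
    return image_id % 7


image_ids = range(10_000)

# Sequential: one image at a time.
sequential = [count_vehicles(i) for i in image_ids]

# Parallel: the same work fanned out across a pool of workers.
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(count_vehicles, image_ids))

print(parallel == sequential, sum(parallel))
```

`pool.map` returns results in input order, so the parallel output can be compared directly with the sequential one.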

31.2 Understanding Check: When to Process at Edge vs Cloud

Scenario: An industrial IoT deployment has 1,000 vibration sensors on factory equipment, each sampling at 10,000 Hz (10,000 readings per second). Each reading is 4 bytes. Engineers need to detect bearing failures in real-time (<100ms) while also storing long-term data for predictive maintenance models.

Think about: Why would you process at edge instead of sending all data to the cloud?

Key Insights:

  1. Bandwidth Problem:

    • Raw data: 1,000 sensors x 10,000 readings/sec x 4 bytes = 40 MB/second = 320 Mbps
    • Most factory internet: 10-100 Mbps upload
    • Real number: Edge processing reduces bandwidth by 99% (40 MB/s to 400 KB/s)

    Edge processing approach:

    Per sensor: 10,000 readings/sec -> Edge FFT analysis -> 1 summary/sec (40 KB/sec -> 400 bytes/sec)
    Fleet of 1,000 sensors: 40 MB/sec -> Process locally -> 400 KB/sec to cloud
    Bandwidth: 320 Mbps -> 3.2 Mbps (99% reduction!)
  2. Latency Problem:

    • Round-trip to cloud: 50-200ms (too slow for real-time failure detection)
    • Edge processing: <10ms (fast enough to trigger emergency shutdown)
    • Real number: Edge enables <100ms response time vs 200ms+ cloud processing
  3. Cost Problem:

    • Cloud storage: 40 MB/sec x 86,400 sec/day = 3.46 TB/day = 1.26 PB/year
    • AWS S3 Standard: 1,260 TB x $23/TB/month = $29,000/month = $348,000/year
    • Edge processing + summary storage: 400 KB/sec x 86,400 = 34.56 GB/day = 12.6 TB/year; at the same $23/TB/month that is about $290/month = $3,480/year
    • Real number: Edge processing saves about $344,500/year (99% cost reduction)
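
To make the "FFT at the edge" step concrete, the sketch below reduces a window of samples to a handful of spectral magnitudes. It uses a naive discrete Fourier transform so that it runs with the standard library alone; a real device would use an optimized FFT routine. The window size, bin count, and test tone are assumptions for illustration.

```python
import cmath
import math


def band_magnitudes(samples, n_bins):
    """Magnitudes of the first n_bins DFT bins of one window (naive DFT)."""
    n = len(samples)
    magnitudes = []
    for k in range(n_bins):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        magnitudes.append(abs(acc) / n)
    return magnitudes


# Hypothetical 64-sample window containing a pure tone at bin 5.
N = 64
window = [math.sin(2 * math.pi * 5 * i / N) for i in range(N)]

spectrum = band_magnitudes(window, 16)   # 64 samples -> 16 values
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(peak_bin, round(spectrum[peak_bin], 3))
```

Only the short spectrum, or just the peak bins, would be transmitted. Comparing those values against a stored baseline is what allows a bearing fault to be flagged locally within the latency budget.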

Decision Rule:

  • Process at the edge when:
    • high sample rates (>100 Hz) generate massive data volumes
    • response time must stay below 100 ms
    • bandwidth to the cloud is limited or expensive
    • the raw stream can be summarized without losing the important signal
  • Send data to the cloud when:
    • complex ML models need GPU clusters
    • you need historical analysis across all devices
    • the raw data must be preserved for compliance or future analysis
    • the cost of edge processing exceeds the cloud-storage benefit

Edge-Cloud Hybrid Architecture:

  • Edge device
    • raw sampling: 10,000 Hz vibration data
    • FFT analysis: extract the frequency spectrum
    • anomaly detection: compare current data to baseline
    • alerting: send immediate notifications when thresholds are exceeded
    • summaries: send statistics every second during normal operation
  • Cloud
    • store summaries from all 1,000 sensors
    • train ML models on historical patterns
    • predict equipment failures weeks in advance
    • deploy updated models back to edge devices

31.3 Edge vs Fog vs Cloud Decision Framework

Decision tree flowchart for choosing between edge, fog, and cloud processing based on latency requirements, data volume, computational complexity, and connectivity
Figure 31.4: Edge versus Fog versus Cloud Processing Decision Tree

Mobile decision-tree summary

  • Choose edge when latency must stay below 100 ms, raw data volume is high, or uplink bandwidth is limited.
  • Choose fog when nearby gateways can aggregate data from many devices and coordinate local decisions.
  • Choose cloud when workloads need historical data across the fleet, heavyweight ML training, or elastic compute.
  • Hybrid designs usually look like: edge filtering -> fog aggregation -> cloud analytics.
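
The summary above can be encoded as a small helper that returns the tiers a workload should use. This is a sketch of one possible reading of the decision tree, not a reproduction of the figure; the function name, parameters, and exact conditions are assumptions.

```python
def recommend_tiers(latency_ms, raw_rate_bps, uplink_bps,
                    devices_per_gateway=0, needs_fleet_analytics=False):
    """Return the processing tiers, ordered from device to cloud."""
    tiers = []
    # Edge: tight latency, or more raw data than the uplink can carry.
    if latency_ms < 100 or raw_rate_bps > uplink_bps:
        tiers.append("edge")
    # Fog: a nearby gateway aggregates several devices.
    if devices_per_gateway > 1:
        tiers.append("fog")
    # Cloud: fleet-wide history or training, or nothing else applies.
    if needs_fleet_analytics or not tiers:
        tiers.append("cloud")
    return tiers


# Factory vibration monitoring: 320 Mbps raw over a 100 Mbps uplink.
print(recommend_tiers(50, 320e6, 100e6,
                      devices_per_gateway=100, needs_fleet_analytics=True))
# Slow, low-volume telemetry with no gateway.
print(recommend_tiers(5000, 100, 1e6))
```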

31.3.1 Specific Numbers for Common IoT Scenarios

  • Smart Home: 10-50 devices, about 1 KB/min per device. Recommended stack: cloud only (AWS IoT). Typical monthly cost: $5-20. Key metric: simplicity.
  • Building Automation: 100-1,000 devices, about 100 bytes/min. Recommended stack: fog + cloud (InfluxDB). Typical monthly cost: $50-200. Key metric: 10x storage compression.
  • Smart City: 10,000-100,000 devices, about 1 KB/min. Recommended stack: edge + cloud (Kafka + S3). Typical monthly cost: $2,000-10,000. Key metric: 99% bandwidth reduction.
  • Industrial Monitoring: 1,000-10,000 devices, about 10 KB/sec high-frequency data. Recommended stack: edge + time-series database. Typical monthly cost: $500-5,000. Key metric: <100 ms latency.
  • Connected Vehicles: 1M+ vehicles, about 1 MB/hour per vehicle. Recommended stack: batch processing (Hadoop). Typical monthly cost: $20,000+. Key metric: petabyte-scale analytics.

31.3.2 Edge Processing Decision: Specific Bandwidth Numbers

Use Edge Processing When:

High-Frequency Sensors (saves 99%+ bandwidth):

  • Example: vibration sensor at 10,000 Hz
  • Raw stream: 10,000 samples/sec x 4 bytes = 40 KB/sec = 320 kbps
  • Edge FFT summary: 1 summary/sec x 400 bytes = 400 bytes/sec = 3.2 kbps
  • Reduction: 99% bandwidth saved

Video Processing (saves 99.9%+ bandwidth):

  • Example: security camera at 1080p, 30 fps
  • Raw stream: 30 frames/sec x 2 MB/frame = 60 MB/sec = 480 Mbps
  • Edge object detection: 1 event/min x 1 KB = about 17 bytes/sec = about 133 bps
  • Reduction: 99.9999% bandwidth saved

Limited Connectivity (cellular/satellite):

  • Typical cellular data price: $10/GB
  • Raw streaming (1,000 vibration sensors): 40 MB/sec x 86,400 sec/day = 3.46 TB/day = $34,600/day
  • Edge summarization: 400 KB/sec x 86,400 sec/day = 34.56 GB/day = about $346/day
  • Savings: $34,600/day -> about $346/day (99% cost reduction)

Skip Edge Processing When:

Low-Rate Sensors (edge overhead not worth it):

  • Example: temperature sensor at 1 reading/min
  • Data volume: 1 sample/min x 50 bytes = 50 bytes/min = 72 KB/day
  • Cloud cost: $0.00072/day (negligible)
  • Edge device cost: $20 one-time + $2/year power
  • Conclusion: edge processing is usually not worth it for such low data rates
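
A quick break-even check makes the contrast explicit. The sketch below compares a year of raw transmission at the $10/GB cellular price with the first-year cost of an edge device ($20 plus $2 power); the function name is illustrative, and the per-sensor vibration figure assumes the 40 KB/sec raw stream from the high-frequency example.

```python
def annual_transfer_usd(bytes_per_day, usd_per_gb=10.0):
    """Cost of sending `bytes_per_day` for one year at a flat per-GB price."""
    return bytes_per_day * 365 / 1e9 * usd_per_gb


EDGE_FIRST_YEAR_USD = 20 + 2   # device plus one year of power

temperature = annual_transfer_usd(50 * 60 * 24)       # 72 KB/day
vibration = annual_transfer_usd(40_000 * 86_400)      # 3.456 GB/day per sensor

print(f"temperature sensor: ${temperature:.2f}/year raw")
print(f"vibration sensor:   ${vibration:,.0f}/year raw")
print(f"edge device:        ${EDGE_FIRST_YEAR_USD}/year (first year)")
```

The temperature sensor costs well under a dollar per year to stream raw, so the edge device never pays for itself, while a single high-rate vibration sensor recovers the hardware cost in about a day.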

31.4 Case Study: Rolls-Royce Engine Health Monitoring

Rolls-Royce’s TotalCare program monitors over 13,000 jet engines in service worldwide. Each engine has approximately 25 sensor channels recording parameters like fan speed (N1, N2), exhaust gas temperature (EGT), oil pressure, and vibration. During flight, sensors sample at rates from 1 Hz (temperatures) to 10 kHz (vibration accelerometers).

The raw data challenge:

  • Per engine per flight (average 6 hours): 25 channels x mixed rates = approximately 2.5 GB
  • Fleet daily: 13,000 engines x 2 flights average x 2.5 GB = 65 TB/day
  • Annual storage at cloud rates ($0.023/GB/month): $18 million/year just for storage

Edge processing solution (Engine Health Monitoring Unit onboard):

The onboard EHMU performs real-time analysis during flight:

  1. Snapshot extraction: Captures steady-state snapshots at specific flight phases (takeoff, climb, cruise, descent) – reduces continuous data to 8-12 key snapshots per flight
  2. Trend parameters: Computes delta-EGT (difference from baseline), vibration spectral peaks, oil consumption rate
  3. Exceedance detection: Flags any parameter exceeding operational limits for immediate post-flight review
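
The trend and exceedance steps can be sketched as a comparison of one snapshot against a baseline and a set of limits. Everything specific here is hypothetical: the parameter names, units, baseline values, and limits are invented for illustration and do not describe the actual EHMU logic.

```python
def snapshot_report(snapshot, baseline, limits):
    """Return (deltas from baseline, sorted names of exceeded limits)."""
    deltas = {name: snapshot[name] - baseline[name]
              for name in baseline if name in snapshot}
    # Limits are treated as upper bounds only in this sketch.
    exceedances = sorted(name for name, limit in limits.items()
                         if name in snapshot and snapshot[name] > limit)
    return deltas, exceedances


# Hypothetical cruise snapshot; all numbers are made up.
baseline = {"egt_c": 650.0}
limits = {"egt_c": 700.0, "vibration_ips": 0.35}
snapshot = {"egt_c": 668.0, "vibration_ips": 0.40, "oil_pressure_psi": 55.0}

deltas, exceedances = snapshot_report(snapshot, baseline, limits)
print(deltas)
print(exceedances)
```

The delta values feed the long-term trend, while the exceedance list is the part flagged for immediate post-flight review.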

Data reduction result:

  • Per engine per flight: raw data is 2.5 GB; after edge processing it falls to 150 KB (snapshots plus trends); reduction is 99.994%.
  • Fleet daily: raw data is 65 TB; after edge processing it falls to 3.9 GB; reduction is 99.994%.
  • Annual storage cost: raw cost is $18M; after edge processing it falls to $1,080; reduction is 99.994%.

Why this matters beyond cost: The edge-processed trend data enables Rolls-Royce to predict EGT margin degradation weeks before it triggers a maintenance event. Airlines receive advance notice to schedule engine washes or blade replacements during planned ground time rather than expensive unscheduled diversions (which cost $150,000-$500,000 per event in delay penalties, passenger rebooking, and ferry flights).

Key Insight: Edge processing is not just a cost optimization – it makes the analytics possible. No airline or cloud provider could handle 65 TB/day from a single customer’s engines. The onboard EHMU turns an impossible data problem into a lightweight, actionable data stream.

31.5 Cloud-Only vs Edge Processing Economics

Side-by-side comparison of cloud-only architecture sending 5 MB raw images at 500ms latency costing 1166 dollars per month versus edge processing sending 100-byte alerts at 100ms latency costing 19.50 dollars per month
Figure 31.5: Cloud Only versus Edge Processing Cost and Latency Comparison

Cloud vs Edge Processing Economics: Sending 5 MB raw images to cloud costs $1,166/month with 500ms latency. Edge processing with local AI model sends only 100-byte alerts, reducing costs to $19.50/month (98% savings) and latency to 100ms (5x faster).

Mobile economics summary

  • Cloud-only streaming sends 432 GB/day, costs about $1,166/month, and adds roughly 500 ms latency.
  • Edge processing sends 100-byte alerts, cuts traffic to 7.2 GB/day, drops cost to about $19.50/month, and lowers latency to 100 ms.
  • The rule is to send events and summaries, not full raw streams, unless the cloud genuinely needs the original data.

The Cost (Cloud-Only):

  • Bandwidth: 5 MB/second x 86,400 seconds/day = 432 GB/day
  • Cloud ingress (AWS): 432 GB/day x $0.09/GB = $38.88/day = $1,166/month
  • Processing latency: Upload time + cloud processing = 500ms+ (too slow for real-time)

The Fix: Edge processing - process at the device

  • Cloud-only: send 5 MB raw images to the cloud, consume 432 GB/day, cost $1,166/month, and deliver about 500 ms latency. Savings status: baseline.
  • Edge processing: send 100-byte alerts only when a person is detected, consume 7.2 GB/day, cost $19.50/month, and deliver about 100 ms latency. Savings status: 98% lower cost and 5x faster.
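
The two monthly figures follow from the same formula. This sketch uses the $0.09/GB rate and the 30-day month from the text; the function name is an assumption.

```python
def monthly_transfer_usd(gb_per_day, usd_per_gb=0.09, days=30):
    """Monthly transfer cost at a flat per-GB rate."""
    return gb_per_day * usd_per_gb * days


cloud_only_gb_day = 5 * 86_400 / 1000                  # 5 MB/s -> 432 GB/day
cloud_only = monthly_transfer_usd(cloud_only_gb_day)   # about $1,166
edge = monthly_transfer_usd(7.2)                       # about $19.44
savings_pct = 100 * (1 - edge / cloud_only)

print(f"cloud-only ${cloud_only:,.2f}/month, edge ${edge:.2f}/month")
print(f"savings {savings_pct:.1f}%")
```

The edge figure comes out at $19.44, which the chapter rounds to $19.50.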

The Rule: “Process data as close to the source as possible. Only send insights to cloud, not raw data.”

31.5.1 Interactive: Edge Processing ROI Calculator

Model the return on investment for deploying edge processing hardware in your IoT deployment.
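
A minimal, non-interactive version of such a calculator might look like the sketch below. The function name and return fields are assumptions. The example inputs reuse the monthly costs from the cloud-only versus edge comparison ($1,166.40 and $19.44) and borrow the $200 per-device hardware price from the traffic camera example.

```python
def edge_roi(baseline_usd_per_year, edge_usd_per_year, upfront_usd, years=5):
    """Simple (undiscounted) payback and net-savings estimate."""
    annual_savings = baseline_usd_per_year - edge_usd_per_year
    payback_days = (upfront_usd / annual_savings * 365
                    if annual_savings > 0 else None)   # None: never pays back
    return {
        "annual_savings": annual_savings,
        "payback_days": payback_days,
        "net_savings": annual_savings * years - upfront_usd,
    }


result = edge_roi(1166.40 * 12, 19.44 * 12, upfront_usd=200)
print(f"annual savings: ${result['annual_savings']:,.2f}")
print(f"payback: {result['payback_days']:.1f} days")
print(f"5-year net savings: ${result['net_savings']:,.2f}")
```

The estimate ignores discounting, maintenance, and hardware replacement, so it should be read as an upper bound on the benefit.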

Pitfall: Assuming Cloud Bandwidth Is Free

A common architectural mistake is designing an IoT system as if network bandwidth has zero cost and infinite capacity. As shown above, even a single device streaming at 5 MB/s generates 432 GB/day. At scale (thousands of devices), cloud-only architectures become physically impossible – no amount of budget can overcome bandwidth limits that exceed available network capacity. Always estimate your raw data rate before choosing an architecture.

Sammy the Sensor had a BIG problem. He was a security camera watching the school entrance, and he was taking 30 pictures every single second!

“Max!” Sammy called out to Max the Microcontroller. “I have taken 2.6 MILLION pictures today! How do we send all of these to the cloud?”

Max did some quick math. “That would be 5.2 terabytes of data per day. Our internet connection is like a garden hose, and you are trying to push an entire swimming pool through it!”

Lila the LED had an idea. “What if we do not send ALL the pictures? Most of them look exactly the same – just an empty hallway!”

“Brilliant!” said Max. “That is called EDGE PROCESSING! Instead of sending every picture, I will look at each one right here. If nothing has changed, I will throw it away. If something interesting happens – like a person walking in – THEN I will send that picture to the cloud.”

Bella the Battery was thrilled. “So instead of sending 2.6 million pictures, we send maybe… 100?”

“Exactly!” said Max. “We go from 5.2 terabytes to about 200 megabytes. That is like turning a mountain of mail into a single postcard!”

“This is the 90/10 rule,” Max continued. “90% of sensor data is boring and repetitive. Only 10% is actually interesting, and only about 1% of everything is urgent. So we process data RIGHT HERE at the edge, and only send the important stuff to the cloud. It saves money, saves time, and saves battery!”

Bella smiled. “Saves battery? Now you are speaking MY language!”

Key lesson: Edge processing means analyzing data right where it is collected instead of sending everything to the cloud. It is like having a smart mail sorter that only forwards the important letters and throws away the junk mail!

Key Takeaway

The 90/10 rule is the most impactful concept in IoT data management: about 90% of sensor data is redundant and can be filtered at the edge. By processing data as close to the source as possible and sending only summaries and alerts to the cloud, you reduce bandwidth costs by 98% or more, cut latency from roughly 500 ms to about 100 ms or less, and make otherwise impossible data volumes manageable. Always ask: “Can this data be summarized before it leaves the device?”

A manufacturing facility deploys 1,000 vibration sensors (10 kHz sampling rate) on production equipment. Without edge processing, what are the bandwidth and storage costs?

Scenario Without Edge Processing:

  • Sensors: 1,000 units
  • Sample rate: 10,000 samples/second per sensor
  • Data size: 4 bytes per sample (float32)
  • Raw data rate: 1,000 × 10,000 × 4 = 40 MB/second = 320 Mbps
  • Daily data: 40 MB/s × 86,400 s = 3.46 TB/day
  • Annual data: 3.46 TB × 365 = 1,263 TB = 1.26 PB/year

Cloud Costs:

  • Ingress (AWS): 3.46 TB/day × $0.09/GB = $311/day = $113,515/year
  • Storage (S3): 1,263 TB × $0.023/GB/month = $29,049/month = $348,588/year
  • Total annual cost: $462,103

With Edge Processing (FFT + Threshold Detection): Edge device (Raspberry Pi 4 or NVIDIA Jetson Nano) performs:

  1. Real-time FFT on 1-second windows → extract 20 frequency bins
  2. Compare to baseline profile → flag anomalies
  3. Send: 20 float values/second (normal) or full waveform if anomaly (1% of time)

Edge-Processed Data Rate:

  • Normal operation (99%): 1,000 sensors × 20 values/s × 4 bytes = 80 KB/s
  • Anomaly (1%): 1,000 × 10,000 × 4 bytes × 0.01 = 400 KB/s
  • Average: 80 + 400 = 480 KB/s
  • Daily: 480 KB × 86,400 = 41.5 GB/day
  • Annual: 41.5 × 365 = 15.1 TB/year

New Cloud Costs:

  • Ingress: 41.5 GB/day × $0.09/GB = $3.74/day = $1,365/year
  • Storage: 15.1 TB × $0.023/GB/month = $347/month = $4,164/year
  • Total annual cost: $5,529

Edge Hardware Investment:

  • 10 edge gateways @ $400 each (100 sensors per gateway) = $4,000
  • One-time setup labor: $6,000
  • Total upfront: $10,000

ROI Calculation:

  • Annual savings: $462,103 - $5,529 = $456,574
  • Payback period: $10,000 / $456,574 = 0.022 years = 8 days
  • 5-year net savings (undiscounted): $2,282,870 savings - $10,000 investment = $2,272,870

Data Reduction: 1.26 PB → 15.1 TB = 98.8% reduction (about 83x smaller)

Key Insight: Edge processing pays for itself in 8 days through bandwidth/storage savings alone, while enabling real-time anomaly detection (<100ms local response) impossible with cloud-only architecture.
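
The edge-processed data rate in this worked example is a blend of two modes: compact summaries sent continuously, and full waveforms sent for the fraction of time an anomaly is active. The sketch below follows the example's simplification of not discounting the summaries by the 99% normal-time share, so it is a slight upper bound. The function name is an assumption.

```python
def blended_rate_bytes_s(n_sensors, summary_values, raw_rate_hz,
                         bytes_per_value, anomaly_fraction):
    """Average uplink rate: summaries plus occasional full waveforms."""
    normal = n_sensors * summary_values * bytes_per_value
    anomaly = n_sensors * raw_rate_hz * bytes_per_value * anomaly_fraction
    return normal + anomaly


raw_rate = 1_000 * 10_000 * 4                             # 40 MB/s
edge_rate = blended_rate_bytes_s(1_000, 20, 10_000, 4, 0.01)

daily_gb = edge_rate * 86_400 / 1e9
annual_tb = daily_gb * 365 / 1e3
print(f"{edge_rate / 1e3:.0f} KB/s, {daily_gb:.1f} GB/day, {annual_tb:.1f} TB/year")
print(f"{raw_rate / edge_rate:.1f}x smaller than raw")
```

Because the anomaly waveforms contribute 400 KB/s of the 480 KB/s total, lowering the anomaly fraction, or sending a shorter waveform excerpt, is the most effective way to shrink the uplink further.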

  • Sample rate: process at the edge for >100 Hz continuous streams; send to cloud for <1 Hz periodic samples. Decision rule: kHz-rate data quickly becomes MB/s, so edge filtering is usually mandatory.
  • Latency requirement: process at the edge for <100 ms real-time control; send to cloud for >1 second analytics. Decision rule: safety systems need sub-50 ms response, while dashboards can tolerate seconds.
  • Bandwidth cost: process at the edge for >1 GB/day cellular or satellite traffic; send to cloud for <100 MB/day over Wi-Fi or ethernet. Decision rule: at $10/GB, raw streaming becomes prohibitively expensive.
  • Data lifetime value: process at the edge when value decays quickly, such as sensor-health alerts; send to cloud when long-term trends or ML training matter more. Decision rule: video frames lose value in seconds, but temperature trends can remain useful for years.
  • Computational complexity: process at the edge when the task is simple, such as FFTs, thresholds, or filters; send to cloud when the workload needs deep learning or cross-device correlation. Decision rule: edge handles signal processing, cloud handles GPU-scale training.
  • Connectivity reliability: process at the edge when links are intermittent, such as oil rigs or mobile assets; send to cloud when connectivity is stable and always on. Decision rule: edge nodes can buffer through outages while the cloud waits for clean uplinks.

Quick Decision Matrix:

  • >1 MB/s, <100 ms, low complexity: Edge
  • >1 MB/s, <100 ms, high complexity: Edge + Cloud for local preprocessing plus cloud training
  • >1 MB/s, >1 s, low complexity: Edge because local filtering is still cheaper
  • >1 MB/s, >1 s, high complexity: Edge + Cloud
  • <1 KB/s, <100 ms, low complexity: Edge because latency dominates the decision
  • <1 KB/s, <100 ms, high complexity: Edge + Cloud
  • <1 KB/s, >1 s, low complexity: Cloud because the data volume is already small
  • <1 KB/s, >1 s, high complexity: Cloud because edge processing adds little benefit

Exceptions:

  • Even with low data volume, process at edge if: offline operation required, privacy critical (medical data), or power budget prohibits continuous transmission
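
The matrix and its exceptions fit naturally into a lookup table. In this sketch the three inputs are booleans (high volume means >1 MB/s, low latency means <100 ms); how to treat values between the matrix's bands, and what the exceptions return for high-complexity workloads, are assumptions the text does not settle.

```python
# (high_volume, low_latency, high_complexity) -> recommendation
MATRIX = {
    (True, True, False): "Edge",
    (True, True, True): "Edge + Cloud",
    (True, False, False): "Edge",
    (True, False, True): "Edge + Cloud",
    (False, True, False): "Edge",
    (False, True, True): "Edge + Cloud",
    (False, False, False): "Cloud",
    (False, False, True): "Cloud",
}


def quick_decision(high_volume, low_latency, high_complexity, *,
                   offline_required=False, privacy_critical=False,
                   power_limited=False):
    choice = MATRIX[(high_volume, low_latency, high_complexity)]
    # Exceptions: these constraints pull a cloud workload back to the edge.
    if choice == "Cloud" and (offline_required or privacy_critical
                              or power_limited):
        return "Edge + Cloud" if high_complexity else "Edge"
    return choice


print(quick_decision(True, True, False))
print(quick_decision(False, False, False))
print(quick_decision(False, False, False, privacy_critical=True))
```
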

Common Mistake: Sending All Video to Cloud for “AI Analysis”

The Error: A retail chain installs 500 security cameras (1080p, 30 fps) across 50 stores and streams all video to cloud for “AI-powered theft detection and customer analytics.”

The Math:

  • Per camera: 30 fps × 2 MB/frame = 60 MB/s = 480 Mbps
  • Per store (10 cameras): 600 MB/s = 4.8 Gbps
  • Fleet (500 cameras): 30 GB/s = 240 Gbps
  • Daily: 30 GB/s × 86,400 = 2.59 PB/day
  • Monthly: 2.59 PB × 30 = 77.7 PB/month

Cloud Costs (AWS):

  • Ingress: 77.7 PB × $0.09/GB = $7.0 million/month
  • Storage (if kept 30 days): 77.7 PB × $0.023/GB = $1.78 million/month
  • AI inference (Rekognition): 2.59 PB/day / 2 MB/frame = 1.3 billion images/day x $0.001 = $1.3 million/day = $39 million/month
  • Total: $47.78 million/month = $573 million/year

Reality Check: Most stores don’t have 4.8 Gbps upload bandwidth (requires business fiber at $5K-15K/month).

Correct Approach - Edge AI: Deploy NVIDIA Jetson Xavier NX at each store ($400/unit):

  1. Run YOLO object detection locally (people, products)
  2. Count customers, detect shelf stockouts, flag suspicious behavior
  3. Send: Event logs (person entered at 10:15:22, shelf 3 empty), not raw video
  4. Send video only when alarm triggered (0.1% of time)

Edge Processing Results:

  • Normal: 10 cameras × 1 KB/s metadata = 10 KB/s per store
  • Alarms (0.1%): 10 cameras × 60 MB/s × 0.001 = 600 KB/s
  • Average per store: 610 KB/s
  • Fleet: 610 KB/s × 50 = 30.5 MB/s
  • Daily: 30.5 MB/s × 86,400 = 2.6 TB/day (vs 2.59 PB without edge)

New Cloud Costs:

  • Ingress: 2.6 TB/day × 30 × $0.09/GB = $7,020/month
  • Storage: 2.6 TB/day × 30 × $0.023/GB = $1,794/month
  • Total: $8,814/month (vs $47.78M/month)

Edge Hardware:

  • 50 Jetson units @ $400 = $20,000
  • Installation/setup: $30,000
  • Total investment: $50,000

ROI:

  • Monthly savings: $47.78M - $8.8K = $47.77M/month
  • Payback: $50K / $47.77M = 0.001 months = less than 1 hour
  • Annual savings: $573 million

Key Lesson: Video, high-frequency sensor data, and other MB/s-scale streams should almost always be processed at the edge first. Send only compressed summaries, alerts, or rare events to cloud unless the raw data is genuinely required there. “Send everything to cloud for AI” is a hundred-million-dollar mistake waiting to happen.

Common Pitfalls

Sending only 5-minute averages to the cloud prevents detecting millisecond transients that precede equipment failure. Retain high-resolution data locally for a rolling window (e.g., last 24 hours) even when transmitting only summaries.
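
A bounded ring buffer is one simple way to keep a rolling high-resolution window on the device. The sketch below uses `collections.deque` with `maxlen`, which discards the oldest sample automatically; the class name and the tiny buffer size are illustrative.

```python
from collections import deque


class LocalHistory:
    """Fixed-size store of the most recent (timestamp, value) samples."""

    def __init__(self, max_samples):
        self._buffer = deque(maxlen=max_samples)

    def add(self, timestamp, value):
        self._buffer.append((timestamp, value))   # oldest entry drops off

    def since(self, start_ts):
        """Samples at or after start_ts, e.g. to upload after an alert."""
        return [(t, v) for t, v in self._buffer if t >= start_ts]


history = LocalHistory(max_samples=5)
for t in range(8):
    history.add(t, t * 0.1)

print([t for t, _ in history.since(0)])   # only the 5 newest remain
```

In practice `max_samples` would be sized from the sample rate and the desired retention period, subject to the device's memory or flash budget.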

A Raspberry Pi gateway and an STM32 microcontroller have vastly different compute budgets. Profile the target hardware with a representative workload before committing to an edge analytics design.

Edge nodes go offline due to power loss, network issues, or hardware faults. Design the cloud ingestion layer to detect gaps, request retransmission, and flag periods of missing data rather than silently accepting incomplete datasets.
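
On the ingestion side, gap detection can be as simple as checking the spacing between consecutive timestamps. This sketch assumes each edge node reports at a fixed, known interval; the function name and tolerance are illustrative.

```python
def find_gaps(timestamps, expected_interval_s, tolerance_s=0.5):
    """Return (last_seen, next_seen) pairs where reports are missing."""
    ordered = sorted(timestamps)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > expected_interval_s + tolerance_s:
            gaps.append((prev, cur))
    return gaps


# Hypothetical report times in seconds; the node was silent after t=120.
received = [0, 60, 120, 300, 360]
print(find_gaps(received, expected_interval_s=60))
```

Each returned pair identifies a period to mark as missing in the dataset and, if the node buffered locally, a range to request for retransmission.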

Moving complex ML inference to the edge reduces cloud bandwidth costs but increases edge hardware costs and complicates model update deployments. Calculate the total cost of ownership across all tiers before deciding.

31.6 Summary

  • The 90/10 rule states that about 90% of IoT data can be filtered out at the edge, sending only about 10% to the cloud (roughly 9% as aggregated summaries and 1% as anomaly alerts).
  • Edge processing reduces costs by 98-99% by eliminating bandwidth charges, storage costs, and cloud compute expenses for redundant data.
  • Latency improves from 500ms to <100ms when processing happens locally at the edge instead of round-tripping to cloud.
  • Decision framework: Use edge for high-frequency sensors (>100 Hz), real-time requirements (<100ms), or limited bandwidth. Use cloud for ML training and historical analytics.

31.7 What’s Next

  • If you want to understand the full big data pipeline architecture, read Big Data Pipelines.
  • If you want to explore cloud processing for large-scale analytics, read Big Data Technologies.
  • If you want to apply anomaly detection at the edge, read Anomaly Detection Overview.
  • If you want to study edge compute patterns in depth, read Edge Compute Patterns.
  • If you want to return to the module overview, read Big Data Overview.