12  Code Offloading & Computing

12.1 Code Offloading and Heterogeneous Computing

This section provides a stable anchor for cross-references to code offloading and heterogeneous computing across the curriculum.

12.2 Learning Objectives

By the end of this chapter, you will be able to:

  • Understand Code Offloading Decisions: Explain when to process locally versus offload to cloud based on energy profiles
  • Calculate Offloading Energy Costs: Compute transmission energy for Wi-Fi vs cellular networks
  • Apply MAUI Framework: Use the MAUI decision framework to make context-aware offloading decisions
  • Leverage Heterogeneous Cores: Match computational tasks to appropriate processors (CPU, GPU, DSP, NPU)
  • Design Energy-Preserving Sensing Plans: Find the cheapest sequence of operations to determine context

In 60 Seconds

Code offloading decides whether an IoT device should compute locally or send data to the cloud; the right choice depends on comparing radio transmission energy (0.1 mJ/KB over Wi-Fi, 1 mJ/KB over cellular) against local computation energy, using frameworks like MAUI to make this decision automatically at runtime.

Key Concepts

  • Code Offloading: Migrating a computation from the IoT device to a remote server (cloud or edge) to reduce local energy consumption
  • MAUI Framework: A system that automatically profiles computation and network state to decide whether local or remote execution is cheaper
  • Transmission Energy: The energy cost of sending data over a radio link; equals data_size × energy_per_byte, which varies by technology and signal strength
  • Heterogeneous Computing: Using multiple specialized processor types (CPU, GPU, DSP, NPU) on one chip, each optimized for different workload characteristics
  • NPU (Neural Processing Unit): A chip accelerator designed for neural network inference, achieving 10–100x better energy efficiency than GPU for AI workloads
  • Energy-Preserving Sensing Plan: A sequence of cheap sensor readings that can infer expensive context without directly measuring it
  • Local vs Cloud Breakeven: The computation complexity threshold at which local processing uses less energy than transmitting data to the cloud

Energy and power management determines how long your IoT device can operate between battery changes or charges. Think of packing for a camping trip with limited battery packs – every bit of power must be used wisely. Since many IoT sensors need to run for months or years unattended, power management is often the single most important engineering decision.

“Sometimes I have a really hard math problem to solve,” said Max the Microcontroller. “I COULD do it myself, but it would take forever and drain Bella’s battery. Or I could send the data to a powerful cloud server and let IT do the math. That is called code offloading.”

Sammy the Sensor asked, “But sending data uses energy too, right?” Max nodded, “Exactly! That is the trade-off. Sending data over Wi-Fi costs about 0.1 millijoules per kilobyte, but over cellular it costs about 10 times more. So sometimes it is cheaper to compute locally, and sometimes it is cheaper to offload. The MAUI framework helps you decide.”

Bella the Battery broke it down simply: “If the computation is small, do it locally. If the computation is huge and you have Wi-Fi, send it to the cloud. If you are on cellular with bad signal, definitely do it locally – transmitting over a weak signal wastes tons of my energy!” Lila the LED added, “Modern chips also have specialized processors – a GPU for graphics, a DSP for audio, an NPU for AI. Using the right processor for each job saves energy too!”

12.3 Prerequisites

Before diving into this chapter, you should be familiar with:

12.4 Energy-Preserving Sensing Plans

Flowchart showing energy-preserving sensing plan with cost-based option selection comparing direct sensing versus inference from cached attributes
Figure 12.1: Energy-Preserving Sensing Plan: Cost-Based Option Selection

The Sensing Planner finds the best sequence of proxy attributes to sense, considering:

  • Direct sensing cost
  • Inference possibilities from cached attributes
  • Confidence of inference rules
  • Overall energy minimization

Example: To determine “InOffice”, the options are:

  1. Sense directly (80 mW)
  2. If “Running=True” is cached, infer “InOffice=False” (0 mW)
  3. If “AtHome=True” is cached, infer “InOffice=False” (0 mW)

Choose the cheapest option!
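The cheapest-option rule can be sketched in a few lines of Python (a minimal sketch; the cache structure, rule format, and function name are assumptions, with the costs taken from the InOffice example above):

```python
def cheapest_sensing_option(cache, direct_cost_mw, inference_rules):
    """Return (method, cost_mW, inferred_value) for the cheapest way to
    obtain the target attribute.

    inference_rules: list of (cached_attr, cached_value, inferred_value),
    read as: if cache[cached_attr] == cached_value, the target can be
    inferred as inferred_value at zero sensing cost.
    """
    # Zero-cost options first: infer from attributes already in the cache.
    for attr, value, inferred in inference_rules:
        if cache.get(attr) == value:
            return (f"infer_from_{attr}", 0, inferred)
    # Fall back to direct sensing at full sensor cost.
    return ("sense_direct", direct_cost_mw, None)

# The "InOffice" example from the text:
rules = [
    ("Running", True, False),  # Running=True cached -> InOffice=False
    ("AtHome", True, False),   # AtHome=True cached  -> InOffice=False
]
print(cheapest_sensing_option({"Running": True}, 80, rules))
# ('infer_from_Running', 0, False)
print(cheapest_sensing_option({}, 80, rules))
# ('sense_direct', 80, None)
```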

12.5 Code Offloading Decisions

Decision tree diagram showing MAUI code offloading framework comparing local execution energy against network transfer plus remote compute energy based on network type and battery state
Figure 12.2: MAUI Code Offloading Decision: Local vs Remote Execution Energy Analysis

12.5.1 Interactive Offloading Energy Calculator

MAUI (Mobile Assistance Using Infrastructure): Framework that profiles code components in terms of energy to decide whether to run locally or remotely.

Considerations:

  • Costs related to transfer of code/data
  • Dynamic decisions based on network constraints
  • Latency requirements
  • Local vs remote execution energy

Example: With 3G, offloading may cost more energy due to high network transmission costs. With Wi-Fi, offloading can save significant energy.
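In place of the interactive calculator, the same comparison can be scripted (a minimal sketch; the power and timing figures are representative assumptions, with a cellular tail-energy term included):

```python
def offload_energy_mj(p_tx_mw, t_tx_s, p_rx_mw, t_rx_s,
                      p_idle_mw, t_cloud_s, tail_mj=0.0):
    """Device-side energy for one offloaded task: sum of E = P * t per phase."""
    return (p_tx_mw * t_tx_s          # radio transmit (upload)
            + p_rx_mw * t_rx_s        # radio receive (result download)
            + p_idle_mw * t_cloud_s   # idle wait while the cloud computes
            + tail_mj)                # cellular tail energy, if any

def local_energy_mj(p_cpu_mw, t_compute_s):
    """Energy to run the task on the device itself."""
    return p_cpu_mw * t_compute_s

# Representative figures (assumptions, not measurements):
e_local = local_energy_mj(50, 2.0)                        # MCU at 50 mW
e_wifi = offload_energy_mj(250, 0.5, 150, 0.02, 20, 0.1)  # Wi-Fi radio
e_cell = offload_energy_mj(800, 1.0, 400, 0.05, 20, 0.1,
                           tail_mj=1000)                  # cellular + tail
print(e_local, e_wifi, e_cell)  # cellular offloading loses badly here
```

With these inputs, local execution beats Wi-Fi offloading slightly and beats cellular offloading by an order of magnitude, mirroring the 3G-vs-Wi-Fi observation above.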

Common Misconception: “Cloud Processing Is Always More Energy Efficient”

The Misconception: “The cloud has powerful servers, so offloading computation always saves energy on my IoT device.”

The Reality: Network transmission energy often exceeds local computation energy, especially on cellular networks. The decision depends on network type, data size, and computation complexity.

Quantified Energy Comparison:

Task: Process 1MB sensor data with ML model (2 seconds computation on local device)

Option 1: Local Processing (ARM Cortex-M4 @ 80 MHz): \[E_{local} = P_{CPU} \times t_{compute} = 50 \text{ mW} \times 2 \text{ sec} = 100 \text{ mJ}\]

Option 2: Wi-Fi Offloading: \[E_{TX} = P_{TX} \times t_{TX} = 250 \text{ mW} \times 0.5 \text{ sec} = 125 \text{ mJ}\] \[E_{RX} = P_{RX} \times t_{RX} = 150 \text{ mW} \times 0.02 \text{ sec} = 3 \text{ mJ}\] \[E_{idle} = P_{idle} \times t_{cloud} = 20 \text{ mW} \times 0.1 \text{ sec} = 2 \text{ mJ}\] \[E_{WiFi} = 125 + 3 + 2 = 130 \text{ mJ (30\% worse than local)}\]

Option 3: LTE Offloading (with RRC tail energy): \[E_{ramp} = 500 \text{ mW} \times 0.5 \text{ sec} = 250 \text{ mJ}\] \[E_{TX} = 800 \text{ mW} \times 1.0 \text{ sec} = 800 \text{ mJ}\] \[E_{RX} = 400 \text{ mW} \times 0.05 \text{ sec} = 20 \text{ mJ}\] \[E_{tail} = 200 \text{ mW} \times 5 \text{ sec} = 1,000 \text{ mJ}\] \[E_{LTE} = 250 + 800 + 20 + 1,000 = 2,070 \text{ mJ (20× worse than local!)}\]

Key insight: LTE tail energy (5-10 sec radio-on after transmission) dominates. For short tasks under 30 seconds, local execution wins. For longer tasks (60+ seconds), offloading can save energy despite the tail penalty.

When Cloud Wins:

Task: Complex ML inference (60 seconds on local device)

  • Local: 50 mW × 60s = 3000 mJ
  • Wi-Fi offload: 130 mJ transmission + negligible remote = 130 mJ total
  • Energy savings: 23× improvement!
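These thresholds drop out of a one-line breakeven calculation (a sketch using the energy figures from this section):

```python
def breakeven_seconds(offload_energy_mj, p_cpu_mw):
    """Local energy grows as P_cpu * t while offload energy is roughly a
    fixed per-transfer cost, so offloading wins once t > E_offload / P_cpu.
    (mJ / mW = seconds)"""
    return offload_energy_mj / p_cpu_mw

P_CPU = 50     # mW, the Cortex-M4 figure used above
E_WIFI = 130   # mJ per transfer over Wi-Fi (TX + RX + idle wait)
E_LTE = 2070   # mJ per transfer over LTE, including ramp and tail energy

print(breakeven_seconds(E_WIFI, P_CPU))  # 2.6 s: Wi-Fi pays off quickly
print(breakeven_seconds(E_LTE, P_CPU))   # 41.4 s: LTE needs a long task
```

The ~41 s LTE breakeven is consistent with the rule of thumb above: under 30 seconds local execution wins, beyond roughly a minute offloading pays off even with the tail penalty.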

Decision Matrix:

Network Data Size Computation Time Recommendation
Wi-Fi <100 KB <5 sec Local (transmission overhead dominates)
Wi-Fi <100 KB >30 sec Offload (computation dominates)
Wi-Fi >1 MB >10 sec Offload (parallel advantage)
LTE Any <30 sec Local (tail energy kills savings)
LTE <500 KB >60 sec Offload (if battery >50%)

MAUI’s Context-Aware Approach:

  • Wi-Fi available + heavy computation → Offload (2-20× savings)
  • LTE only + light computation → Local (avoid 10-15× penalty)
  • Battery <20% → Always local (conserve energy)
  • Latency critical → Offload if Wi-Fi, local if LTE

Key Insight: The 5-10 second LTE “tail energy” (radio staying on after transmission) often consumes more energy than the entire local computation. Context-aware offloading decisions must consider network type, not just raw transmission costs.

12.6 Local Computation: Heterogeneous Cores

Block diagram showing heterogeneous mobile system-on-chip architecture with CPU GPU DSP and NPU cores and task scheduler assigning workloads to appropriate processors
Figure 12.3: Heterogeneous Mobile SoC Architecture: CPU, DSP, GPU, and NPU Task Scheduling

Modern mobile SoCs include heterogeneous cores:

  • CPU: General purpose, control flow
  • GPU: Massively parallel, graphics and compute
  • DSP: Low-power signal processing, audio/sensor data
  • NPU: Neural network acceleration, ML inference

Benefits:

  • Increase performance and power efficiency
  • Selected tasks shift to more efficient cores
  • Dynamic voltage/frequency scaling per core

Example - Keyword Spotting:

  • Optimized GPU is >6x faster than cloud
  • Optimized GPU is >21x faster than sequential CPU
  • Optimized GPU with batching outperforms cloud energy-wise
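Core selection reduces to comparing energy = power × time per processor (a sketch; the per-core power and runtime figures below are illustrative assumptions, not vendor measurements):

```python
def pick_core(profiles):
    """Pick the core with minimum energy, where energy = power_mW * time_s (mJ)."""
    return min(profiles.items(), key=lambda kv: kv[1][0] * kv[1][1])

# Hypothetical per-core profile of one keyword-spotting inference
# (power in mW, runtime in seconds -- illustrative numbers only):
kws_profiles = {
    "CPU": (800, 0.210),   # general purpose, slowest for this workload
    "GPU": (2400, 0.010),  # fast but power-hungry
    "DSP": (40, 0.120),    # low clock, very low power
    "NPU": (280, 0.008),   # purpose-built for NN inference
}
core, (p_mw, t_s) = pick_core(kws_profiles)
print(core, round(p_mw * t_s, 2), "mJ")  # the NPU wins on energy here
```

Note that the fastest core and the most energy-efficient core need not coincide: in this sketch the GPU is nearly as fast as the NPU but burns almost ten times the energy.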

12.7 Knowledge Check: Heterogeneous Computing

12.8 Code Offloading Energy Analysis Worksheet

Scenario: Image processing on wearable device - local vs cloud decision

12.8.1 Step 1: Local Processing Energy

Component Power Duration Energy
Image Capture 80 mA @ 3.7V = 296 mW 100 ms 29.6 mJ
CPU Processing 200 mA @ 3.7V = 740 mW 3000 ms 2,220 mJ
Total Local - - 2,249.6 mJ

12.8.2 Step 2: Cloud Offloading Energy (Wi-Fi)

Component Power Duration Energy
Image Capture 80 mA @ 3.7V = 296 mW 100 ms 29.6 mJ
Wi-Fi TX (upload 50KB) 250 mA @ 3.7V = 925 mW 400 ms 370 mJ
Wi-Fi RX (download 5KB) 150 mA @ 3.7V = 555 mW 50 ms 27.75 mJ
Idle Wait (remote processing) 15 mA @ 3.7V = 55.5 mW 500 ms 27.75 mJ
Total Cloud (Wi-Fi) - - 455.1 mJ

Wi-Fi Decision: Offload (saves 1,794 mJ = 80% energy reduction)

12.8.3 Step 3: Cloud Offloading Energy (LTE)

Component Power Duration Energy
Image Capture 80 mA @ 3.7V = 296 mW 100 ms 29.6 mJ
LTE TX (upload 50KB) 500 mA @ 3.7V = 1,850 mW 800 ms 1,480 mJ
LTE RX (download 5KB) 300 mA @ 3.7V = 1,110 mW 100 ms 111 mJ
RRC State Overhead 200 mA @ 3.7V = 740 mW 2000 ms 1,480 mJ
Total Cloud (LTE) - - 3,100.6 mJ

LTE Decision: Process locally (saves 851 mJ vs LTE offloading)
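The worksheet arithmetic in Steps 1-3 (mA at 3.7 V over some milliseconds, summed to mJ) can be checked with a short script (a sketch reproducing the tables above):

```python
def energy_mj(current_ma, voltage_v, duration_ms):
    """E = I * V * t; mA * V gives mW, and mW * ms / 1000 gives mJ."""
    return current_ma * voltage_v * duration_ms / 1000

# Step 1: local processing (capture + CPU)
e_local = energy_mj(80, 3.7, 100) + energy_mj(200, 3.7, 3000)
# Step 2: Wi-Fi offload (capture + TX + RX + idle wait)
e_wifi = (energy_mj(80, 3.7, 100) + energy_mj(250, 3.7, 400)
          + energy_mj(150, 3.7, 50) + energy_mj(15, 3.7, 500))
# Step 3: LTE offload (capture + TX + RX + RRC overhead)
e_lte = (energy_mj(80, 3.7, 100) + energy_mj(500, 3.7, 800)
         + energy_mj(300, 3.7, 100) + energy_mj(200, 3.7, 2000))

print(round(e_local, 1), round(e_wifi, 1), round(e_lte, 1))
# 2249.6, 455.1 and 3100.6 mJ, matching the three tables
```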

12.8.4 Step 4: MAUI Decision Framework

def maui_decision(wifi_available, energy_local, energy_cloud_wifi,
                  energy_cloud_cellular, battery_pct, latency_critical):
    if wifi_available and energy_cloud_wifi < energy_local:
        return "OFFLOAD_WIFI"
    elif energy_local < energy_cloud_cellular:
        return "PROCESS_LOCAL"
    elif battery_pct > 50 and latency_critical:
        return "OFFLOAD_CELLULAR"
    else:
        return "PROCESS_LOCAL"

12.8.5 Step 5: Context-Aware Adaptation

Context Network Battery Decision Energy Rationale
At Home Wi-Fi 80% Offload 455 mJ Wi-Fi cheap, fast
Outdoors LTE 80% Local 2,250 mJ LTE expensive
Outdoors LTE 15% Local 2,250 mJ Battery critical
At Office Wi-Fi 15% Offload 455 mJ Save battery with Wi-Fi

Your Turn: Calculate offloading decisions for your application!

12.9 Sensor Fusion Energy Optimization Worksheet

12.9.1 Interactive GPS vs Inference Energy Calculator

Scenario: Location tracking using GPS vs Wi-Fi/accelerometer inference

12.9.2 Step 1: Direct GPS Sensing

State Current Duration Energy per Hour
GPS Active 45 mA 30 sec 0.375 mAh
Processing 20 mA 2 sec 0.011 mAh
BLE TX 15 mA 1 sec 0.004 mAh
Sleep 10 µA 27 sec 0.000075 mAh

Per measurement (60 s cycle): 0.390 mAh
Per hour (60 measurements): 23.4 mAh
200 mAh battery life: 8.5 hours

12.9.3 Step 2: ACE Inference Strategy

Use cached GPS + accelerometer for motion detection

Scenario Method Current Duration Frequency
Stationary Cached GPS 10 µA 60 sec 59 min/hour
Moving (inferred) Accel check 0.5 mA 0.5 sec 59 times/hour
Verify Location GPS 45 mA 30 sec 1 time/hour

Energy per hour:

E_stationary = 59 × (10µA × 60s) / 3600 = 0.0098 mAh
E_accel_check = 59 × (0.5mA × 0.5s) / 3600 = 0.0041 mAh
E_gps_verify = 1 × (45mA × 30s + 20mA × 2s) / 3600 = 0.386 mAh
E_total = 0.40 mAh per hour

200mAh battery life: 500 hours = 20.8 days

Energy savings: 58.5× improvement over continuous GPS!
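The duty-cycle arithmetic behind that figure can be reproduced directly (a sketch; mA × s / 3600 converts to mAh, using the durations and currents from the tables above):

```python
def mah(current_ma, seconds, times_per_hour=1):
    """Charge drawn per hour: mA * s / 3600 -> mAh."""
    return current_ma * seconds * times_per_hour / 3600

# Step 1: continuous GPS, one 60 s cycle repeated 60 times per hour
gps_cycle = mah(45, 30) + mah(20, 2) + mah(15, 1) + mah(0.010, 27)
per_hour_gps = gps_cycle * 60

# Step 2: ACE inference -- cached GPS while still, cheap accelerometer
# checks, and one real GPS fix per hour to re-verify location
per_hour_ace = (mah(0.010, 60, 59)    # stationary, cached GPS
                + mah(0.5, 0.5, 59)   # accelerometer checks
                + mah(45, 30)         # hourly GPS verification
                + mah(20, 2))         # processing for that fix

print(round(per_hour_gps, 2), round(per_hour_ace, 3),
      round(per_hour_gps / per_hour_ace, 1))
```

The ratio comes out at roughly 58.5×; tiny differences from the worksheet totals are rounding in the intermediate values.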

12.9.4 Step 3: Association Rules for Inference

ACE learns these rules from history:

Rule Support Confidence Inference
Accel_Still=True → AtHome=True 25% 85% Skip GPS if still
Wi-Fi_SSID=Home → AtHome=True 30% 95% Use Wi-Fi instead of GPS
Time=Night AND Still → Sleeping=True 15% 90% 10× reduce all sampling

Optimized energy with rules:

  • 85% of requests served from cache/inference (0.01 mAh)
  • 15% require GPS sensing (0.39 mAh)
  • Average: 0.85 × 0.01 + 0.15 × 0.39 ≈ 0.067 mAh per request
  • Battery life: 200 mAh / 0.067 mAh ≈ 2,985 hours ≈ 124 days!

12.9.5 Step 4: Battery-Aware Adaptation

Battery Level Strategy GPS Frequency Avg Current
100-50% Normal Every 5 min 0.40 mA
50-20% Conservative Every 15 min 0.15 mA
20-15% Emergency Every 30 min 0.08 mA
<15% Critical Every 60 min 0.04 mA
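The adaptation table maps directly onto a threshold function (a minimal sketch; the function name is an assumption, the thresholds follow the table):

```python
def gps_interval_minutes(battery_pct):
    """Battery-aware adaptation: stretch the GPS interval as charge drops."""
    if battery_pct > 50:
        return 5       # Normal
    elif battery_pct > 20:
        return 15      # Conservative
    elif battery_pct > 15:
        return 30      # Emergency
    else:
        return 60      # Critical

for level in (80, 35, 18, 10):
    print(level, "% ->", gps_interval_minutes(level), "min")
```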

Your Turn: Design inference rules for your sensor fusion application!

12.10 Case Study: Google’s Adaptive Offloading in Pixel Phones

Google’s Pixel phones implement a real-world version of the MAUI framework for computational photography. The “Night Sight” feature requires processing 15-30 images through a multi-frame alignment and HDR+ pipeline – computationally equivalent to approximately 60 seconds of sustained CPU work.

The Offloading Decision in Practice

Condition Processing Location Why
Wi-Fi connected, charging Cloud (Google Photos) Zero energy penalty; cloud produces higher quality result
Wi-Fi connected, battery >50% Hybrid (edge denoise + cloud enhance) Balances quality with battery preservation
Cellular only, any battery Fully local (Tensor NPU) LTE upload of 30 raw images (~150 MB) costs 2,775 mJ vs 1,200 mJ local NPU processing
Airplane mode Fully local (Tensor NPU) No choice; queue cloud processing for later

Measured Energy Comparison

Night Sight processing (15 images, 12MP each):

Local CPU (Cortex-A76):    740 mW x 8.2 sec = 6,068 mJ
Local NPU (Tensor G3):    280 mW x 3.1 sec = 868 mJ   (7x more efficient)
Wi-Fi offload:             925 mW x 2.0 sec (upload) + 55 mW x 1.5 sec (wait)
                           + 555 mW x 0.3 sec (download) = 2,099 mJ
LTE offload:               1,850 mW x 3.5 sec (upload) + 740 mW x 5.0 sec (tail)
                           + 1,110 mW x 0.5 sec (download) = 10,730 mJ

Key insight: The NPU (868 mJ) beats even Wi-Fi offloading (2,099 mJ) for this workload because the data transfer overhead exceeds the computational savings. This contradicts the naive assumption that “cloud is always more energy efficient.” Specialized local hardware has fundamentally changed the offloading calculus – the MAUI framework must account for heterogeneous local processors, not just CPU vs cloud.

When Cloud Still Wins

For Google Photos’ “Magic Eraser” feature (removing objects from images), the ML model requires 3.2 GB of weights that cannot fit on device. Here, offloading is mandatory regardless of energy cost. The decision becomes: offload now (if on Wi-Fi) or defer until Wi-Fi is available (if on cellular).

12.12 Summary

Code offloading and heterogeneous computing are essential for energy-efficient IoT systems:

  1. Energy-Preserving Sensing Plans: Always choose the cheapest method to obtain context - cache, inference, then direct sensing
  2. MAUI Framework: Compare local execution energy against network transmission + idle wait + receive energy
  3. Network-Aware Decisions: Wi-Fi offloading often saves energy; LTE offloading often wastes energy due to tail power
  4. Heterogeneous Cores: Match tasks to appropriate processors - DSP for audio, GPU for parallel, NPU for ML
  5. Context-Aware Adaptation: Adjust offloading decisions based on battery level, network type, and latency requirements

The key insight is that offloading decisions are highly context-dependent. Simple rules like “always offload” or “always local” are suboptimal - intelligent systems adapt to current conditions.

A smart camera needs to classify images (dog/cat/person). Compare local NPU vs Wi-Fi cloud offloading.

Local NPU processing (Google Edge TPU):

  • Inference time: 15 ms
  • Power during inference: 2.5 W
  • Idle power: 0.1 W
  • Energy per classification: 2.5 W × 0.015 s = 37.5 mJ

Wi-Fi cloud offloading:

  • Image size: 200 KB (JPEG compressed)
  • Result size: 1 KB (JSON classification)
  • Wi-Fi upload: 200 KB at 5 Mbps = 320 ms at 250 mW = 80 mJ
  • Wi-Fi download: 1 KB at 10 Mbps = 0.8 ms at 150 mW = 0.12 mJ
  • Idle wait (cloud processing): 50 ms at 20 mW = 1 mJ
  • Total cloud energy: 80 + 0.12 + 1 = 81.2 mJ

Conclusion: Local NPU wins (37.5 mJ vs 81.2 mJ = 54% energy savings). Wi-Fi transmission overhead exceeds local inference cost.

When cloud wins: If classification requires a 500 MB model (won’t fit on device), offloading is mandatory. Or if the device uses an older CPU instead of NPU:

  • CPU inference: 800 mW × 2 seconds = 1,600 mJ
  • Cloud offload: 81.2 mJ (20× more efficient!)

This demonstrates MAUI’s context-aware principle: offloading decision depends on available local hardware AND network conditions.

Task Data Size Computation Network Battery Recommendation Energy Savings
Face detection (embedded NPU) 100 KB 20 ms Wi-Fi Any Local NPU 3× vs cloud
Face detection (CPU only) 100 KB 3 sec Wi-Fi >50% Cloud 5× vs local CPU
Voice recognition (keyword spotting) 5 KB 10 ms Any Any Local DSP 10× vs cloud
Voice recognition (full transcription) 500 KB 5 sec Wi-Fi >30% Cloud 2× vs local
Sensor data ML (simple model) 1 KB 5 ms LTE Any Local 50× vs LTE tail
Video analytics (complex model) 5 MB 10 sec Wi-Fi >70% Cloud Only option (model size)

Decision criteria (MAUI framework):

  1. Model fits on device? → If NO, must offload
  2. Network is Wi-Fi? → If YES, offload heavy computation; if LTE, process locally
  3. Battery <20%? → Always process locally (conserve energy)
  4. Specialized hardware available (NPU/DSP)? → Process locally (10-100× faster)
  5. Latency critical (<100 ms)? → Offload if Wi-Fi, local if LTE (tail latency)
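The five criteria can be folded into one decision routine (a sketch; the argument names, the rule ordering, and the DEFER_UNTIL_WIFI outcome for oversized models on cellular are assumptions drawn from the text):

```python
def offload_decision(model_fits, network, battery_pct,
                     has_npu_or_dsp, computation_heavy):
    # 1. Model too large for the device: offloading is the only option.
    if not model_fits:
        return "OFFLOAD" if network == "wifi" else "DEFER_UNTIL_WIFI"
    # 3. Battery critical: conserve energy, stay local.
    if battery_pct < 20:
        return "LOCAL"
    # 4. Specialized silicon (NPU/DSP) usually beats the radio.
    if has_npu_or_dsp:
        return "LOCAL"
    # 2. Heavy computation: ship it out on Wi-Fi, keep it local on LTE.
    if computation_heavy and network == "wifi":
        return "OFFLOAD"
    return "LOCAL"

print(offload_decision(True, "wifi", 80, True, True))   # LOCAL: NPU wins
print(offload_decision(True, "wifi", 80, False, True))  # OFFLOAD
print(offload_decision(False, "lte", 80, False, True))  # DEFER_UNTIL_WIFI
```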

Best For Your Project:

  • Real-time object detection on drone → Local (edge TPU, latency critical)
  • Batch image classification at home → Cloud (Wi-Fi available, no time pressure)
  • Wake word detection on wearable → Local (DSP ultra-low power)
  • Natural language queries → Cloud (models too large for edge)

Common Mistake: Forgetting GPU Power Consumption When Using Heterogeneous Cores

What they do wrong: Developers optimize a computer vision task to run on mobile GPU, achieving 5× speedup over CPU. They assume battery life improves proportionally: “5× faster means 5× less energy!”

Why it fails: GPUs consume 2-5× more power than CPUs even when delivering speedup. The energy equation is:

Energy = Power × Time

If GPU cuts time by 5× but uses 3× power, energy only improves 1.67× (not 5×).

Real calculation (mobile image processing):

  • CPU: 800 mW × 1,000 ms = 800 mJ
  • GPU: 2,400 mW × 200 ms (5× faster) = 480 mJ
  • Savings: 40% (not 80% as naively expected)

When GPU hurts energy: If the task is small (CPU takes 50 ms), GPU overhead dominates:

  • CPU: 800 mW × 50 ms = 40 mJ
  • GPU: 2,400 mW × 20 ms (2.5× speedup) + 1,200 mW × 15 ms (init) = 48 + 18 = 66 mJ (worse!)
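Both calculations are instances of E = P × t plus a fixed startup cost (a sketch using the same figures):

```python
def task_energy_mj(p_run_mw, t_run_ms, p_init_mw=0, t_init_ms=0):
    """E = P * t for the run phase plus any fixed startup (init) cost."""
    return (p_run_mw * t_run_ms + p_init_mw * t_init_ms) / 1000

cpu_big = task_energy_mj(800, 1000)             # 800 mJ
gpu_big = task_energy_mj(2400, 200)             # 480 mJ: 1.67x savings, not 5x
cpu_small = task_energy_mj(800, 50)             # 40 mJ
gpu_small = task_energy_mj(2400, 20, 1200, 15)  # 66 mJ: init cost dominates

print(gpu_big < cpu_big, gpu_small < cpu_small)  # True False
```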

Correct approach: Profile actual power during execution, not just time. Useful tools:

  • Android Battery Historian
  • iOS Instruments (Energy Log)
  • Embedded: INA219 power monitor on the VDD rail

Real-world example: A fitness app offloaded step counting to mobile GPU, expecting 10× battery improvement from 10× speedup. Actual battery life: 20% worse! GPU consumed 4.2 W during active processing vs 1.8 W for CPU, and the 30 ms task ran every second — GPU initialization overhead (50 ms at 2 W) consumed more energy than the computation saved. Switching back to CPU with NEON SIMD instructions delivered 3× speedup at 1.2× power = 2.5× net energy savings. Lesson: Speedup ≠ energy savings. Always measure power, not just time.

Common Pitfalls

Many engineers offload computation assuming the cloud is “free” energetically. But radio transmission — especially over cellular or at low signal strength — can consume 10–100× more energy than the computation itself. Always compute the breakeven point before deciding.

Routing a small task to the GPU may actually increase energy consumption because GPU initialization (50–100 ms at 2–3 W) exceeds the energy savings from faster execution. Only use GPU/NPU acceleration for tasks that take more than a few hundred milliseconds on the CPU.

Radio energy increases dramatically at low signal strength as the transmitter boosts power. A cellular link at -100 dBm can consume 10× more energy than at -80 dBm. Always measure transmission energy under real deployment signal conditions.

Offloading is not binary (local vs full cloud). Partial offloading — preprocessing on device to reduce data size, then sending compressed results — often provides the best energy tradeoff. Consider edge nodes as intermediate offload targets when cloud latency or transmission cost is too high.
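Partial offloading can be compared with the same energy accounting (a sketch; the compression ratio, preprocessing cost, and the per-kilobyte Wi-Fi rate below are illustrative assumptions, the rate loosely following the worksheet's 370 mJ per 50 KB upload):

```python
def best_strategy(raw_kb, compress_ratio, e_local_full_mj,
                  e_preprocess_mj, mj_per_kb):
    """Compare full-local, full-offload, and partial (preprocess-then-send)."""
    options = {
        "full_local": e_local_full_mj,
        "full_offload": raw_kb * mj_per_kb,
        "partial": e_preprocess_mj + (raw_kb / compress_ratio) * mj_per_kb,
    }
    return min(options, key=options.get), options

# Hypothetical numbers: a 500 KB frame, 10x on-device feature compression
strategy, costs = best_strategy(raw_kb=500, compress_ratio=10,
                                e_local_full_mj=900, e_preprocess_mj=120,
                                mj_per_kb=7.4)
print(strategy, {k: round(v, 1) for k, v in costs.items()})
```

Here partial offloading wins: preprocessing costs some energy, but shrinking the payload tenfold cuts the radio bill far more.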

12.13 What’s Next

If you want to… Read this
Practice with energy optimization worksheets Context Energy Optimization
Go back to duty cycling fundamentals Duty Cycling Fundamentals
Explore context-aware approaches ACE & Shared Context Sensing
Apply strategies to hardware design Hardware & Software Optimisation
Understand energy measurement tools Energy-Aware Measurement