1611  Code Offloading and Heterogeneous Computing

1611.1 Code Offloading and Heterogeneous Computing

This section provides a stable anchor for cross-references to code offloading and heterogeneous computing across the book.

1611.2 Learning Objectives

By the end of this chapter, you will be able to:

  • Understand Code Offloading Decisions: Explain when to process locally versus offload to cloud based on energy profiles
  • Calculate Offloading Energy Costs: Compute transmission energy for Wi-Fi vs cellular networks
  • Apply MAUI Framework: Use the MAUI decision framework to make context-aware offloading decisions
  • Leverage Heterogeneous Cores: Match computational tasks to appropriate processors (CPU, GPU, DSP, NPU)
  • Design Energy-Preserving Sensing Plans: Find the cheapest sequence of operations to determine context

1611.3 Prerequisites

Before diving into this chapter, you should be familiar with:

  • Basic energy arithmetic: energy = power × time (mW, mJ, mAh)
  • Wireless networking fundamentals: Wi-Fi and cellular (3G/LTE) and their power characteristics
  • General mobile/embedded architecture: CPUs, accelerators, and low-power sleep states

1611.4 Energy-Preserving Sensing Plans

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#fff'}}}%%
flowchart TD
    Start[App requests:<br/>AtHome status] --> SP[Sensing Planner]

    SP --> Eval[Evaluate options]

    Eval --> O1[Option 1: Direct GPS<br/>Cost: 100mW<br/>Accuracy: 100%]
    Eval --> O2[Option 2: Wi-Fi SSID<br/>Cost: 20mW<br/>Accuracy: 90%]
    Eval --> O3[Option 3: Infer from<br/>cached Driving=False<br/>Cost: 0mW<br/>Accuracy: 85%]

    O1 --> Best{Select cheapest<br/>above threshold}
    O2 --> Best
    O3 --> Best

    Best --> Choose[Choose O3:<br/>Infer from cache]
    Choose --> Verify{Confidence<br/>> threshold?}
    Verify -->|Yes| Return[Return inferred value]
    Verify -->|No| Fallback[Fall back to Wi-Fi O2]

    style Choose fill:#16A085,stroke:#2C3E50,color:#fff
    style Return fill:#16A085,stroke:#2C3E50,color:#fff
    style O1 fill:#E67E22,stroke:#2C3E50,color:#fff
```

Figure 1611.1: Energy-Preserving Sensing Plan: Cost-Based Option Selection

The Sensing Planner finds the best sequence of proxy attributes to sense, considering:

  • Direct sensing cost
  • Inference possibilities from cached attributes
  • Confidence of inference rules
  • Overall energy minimization

Example: To determine “InOffice”, the options are:

  1. Sense directly (80 mW)
  2. If “Running=True” is cached, infer “InOffice=False” (0 mW)
  3. If “AtHome=True” is cached, infer “InOffice=False” (0 mW)

Choose the cheapest option!
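A minimal planner sketch in Python, assuming hypothetical option records with a per-option cost and expected accuracy (an illustration of the idea, not the actual ACE implementation):

```python
def plan_sensing(options, accuracy_threshold):
    """Pick the cheapest sensing option whose accuracy clears the threshold.

    options: list of dicts with 'name', 'cost_mw', and 'accuracy' (0-1).
    """
    viable = [o for o in options if o["accuracy"] >= accuracy_threshold]
    if not viable:
        raise ValueError("no sensing option meets the accuracy threshold")
    return min(viable, key=lambda o: o["cost_mw"])

# Options from Figure 1611.1
options = [
    {"name": "GPS",              "cost_mw": 100, "accuracy": 1.00},
    {"name": "Wi-Fi SSID",       "cost_mw": 20,  "accuracy": 0.90},
    {"name": "Cached inference", "cost_mw": 0,   "accuracy": 0.85},
]
print(plan_sensing(options, accuracy_threshold=0.80)["name"])  # Cached inference
```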

1611.5 Code Offloading Decisions

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#fff'}}}%%
flowchart TD
    Task[Compute Task] --> Profile[MAUI Profiler]

    Profile --> Local[Local Execution<br/>Energy Cost]
    Profile --> Remote[Remote Execution<br/>Energy Cost]

    Local --> L1[CPU: 500mW × 2s = 1000mJ]
    Remote --> R1[Network TX: 200mW × 0.5s = 100mJ]
    Remote --> R2[Server: 0mW]
    Remote --> R3[Network RX: 100mW × 0.1s = 10mJ]
    R1 --> RTotal[Total: 110mJ]
    R2 --> RTotal
    R3 --> RTotal

    L1 --> Decision{Compare costs}
    RTotal --> Decision

    Decision --> Choose[Choose Remote<br/>110mJ < 1000mJ<br/>9× energy savings!]

    style Choose fill:#16A085,stroke:#2C3E50,color:#fff
    style L1 fill:#E67E22,stroke:#2C3E50,color:#fff
```

Figure 1611.2: MAUI Code Offloading Decision: Local vs Remote Execution Energy Analysis

MAUI (Mobile Assistance Using Infrastructure) is a framework that profiles code components in terms of energy to decide whether to run them locally or remotely.

Considerations:

  • Costs of transferring code and data
  • Dynamic decisions based on network constraints
  • Latency requirements
  • Local vs remote execution energy

Example: With 3G, offloading may cost more energy due to high network transmission costs. With Wi-Fi, offloading can save significant energy.
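A minimal sketch of this comparison in Python; the function name and the zero idle cost are our assumptions, while the power and timing figures come from Figure 1611.2:

```python
def should_offload(p_cpu_mw, t_local_s,
                   p_tx_mw, t_tx_s, p_rx_mw, t_rx_s,
                   p_idle_mw, t_wait_s):
    """Compare local vs remote execution energy (all energies in mJ)."""
    e_local = p_cpu_mw * t_local_s                   # energy to compute locally
    e_remote = (p_tx_mw * t_tx_s + p_rx_mw * t_rx_s  # radio energy
                + p_idle_mw * t_wait_s)              # idle while server computes
    return e_remote < e_local, e_local, e_remote

# Figure 1611.2: local 500 mW × 2 s vs TX 200 mW × 0.5 s + RX 100 mW × 0.1 s
offload, e_local, e_remote = should_offload(500, 2.0, 200, 0.5, 100, 0.1, 0, 0)
print(offload, e_local, e_remote)  # True 1000.0 110.0 -> ~9x savings
```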

Common Misconception: “Cloud Processing Is Always More Energy Efficient”

The Misconception: “The cloud has powerful servers, so offloading computation always saves energy on my IoT device.”

The Reality: Network transmission energy often exceeds local computation energy, especially on cellular networks. The decision depends on network type, data size, and computation complexity.

Quantified Comparison:

Task: Process 1MB sensor data with ML model (2 seconds computation)

Option 1: Local Processing (ARM Cortex-M4 @ 80 MHz)

  • CPU power: 50 mW × 2 s = 100 mJ
  • Total energy: 100 mJ

Option 2: Wi-Fi Offloading

  • Transmit 1 MB: 250 mW × 0.5 s = 125 mJ
  • Receive 10 KB result: 150 mW × 0.02 s = 3 mJ
  • Idle during remote compute: 20 mW × 0.1 s = 2 mJ
  • Total energy: 130 mJ (30% worse than local!)

Option 3: LTE Offloading

  • RRC state transition (radio ramp-up): 500 mW × 0.5 s = 250 mJ
  • Transmit 1 MB: 800 mW × 1.0 s = 800 mJ
  • Receive 10 KB: 400 mW × 0.05 s = 20 mJ
  • Tail energy (radio stays on): 200 mW × 5 s = 1,000 mJ
  • Total energy: 2,070 mJ (20× worse than local!)

When Cloud Wins:

Task: Complex ML inference (60 seconds on the local device)

  • Local: 50 mW × 60 s = 3,000 mJ
  • Wi-Fi offload: 130 mJ transmission + negligible remote = 130 mJ total
  • Energy savings: 23× improvement!
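One way to see when cloud wins is to compute the break-even local computation time, treating the offload-side energy as roughly fixed (a simplification, since idle-wait energy grows slowly with compute time):

```python
# Break-even analysis under the Wi-Fi numbers above: offloading pays off
# once local compute time exceeds e_offload / p_cpu.

e_offload_mj = 125 + 3 + 2   # TX + RX + idle from Option 2 = 130 mJ
p_cpu_mw = 50                # local CPU power from Option 1
t_breakeven_s = e_offload_mj / p_cpu_mw
print(t_breakeven_s)  # 2.6 s: a 2 s job stays local, a 60 s job offloads
```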

Decision Matrix:

| Network | Data Size | Computation Time | Recommendation |
|---------|-----------|------------------|----------------|
| Wi-Fi | <100 KB | <5 sec | Local (transmission overhead dominates) |
| Wi-Fi | <100 KB | >30 sec | Offload (computation dominates) |
| Wi-Fi | >1 MB | >10 sec | Offload (parallel advantage) |
| LTE | Any | <30 sec | Local (tail energy kills savings) |
| LTE | <500 KB | >60 sec | Offload (if battery >50%) |

MAUI’s Context-Aware Approach:

  • Wi-Fi available + heavy computation → Offload (2-20× savings)
  • LTE only + light computation → Local (avoid a 10-15× penalty)
  • Battery <20% → Always local (conserve energy)
  • Latency critical → Offload if Wi-Fi, local if LTE

Key Insight: The 5-10 second LTE “tail energy” (radio staying on after transmission) often consumes more energy than the entire local computation. Context-aware offloading decisions must consider network type, not just raw transmission costs.
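A back-of-the-envelope check of Option 3 with the tail term broken out; the power levels and durations are the example's illustrative assumptions, not measurements:

```python
# Tail-energy-aware LTE offload cost, component by component (mJ).
def lte_offload_energy_mj(e_promotion=500 * 0.5,   # RRC ramp-up: 500 mW × 0.5 s
                          e_tx=800 * 1.0,          # transmit 1 MB: 800 mW × 1.0 s
                          e_rx=400 * 0.05,         # receive 10 KB: 400 mW × 0.05 s
                          e_tail=200 * 5.0):       # radio tail: 200 mW × 5 s
    return e_promotion + e_tx + e_rx + e_tail

total = lte_offload_energy_mj()
print(total)               # 2070.0 mJ vs 100 mJ local -> ~20x worse
print(200 * 5.0 / total)   # tail alone is ~48% of the offload energy
```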

1611.6 Local Computation: Heterogeneous Cores

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#fff'}}}%%
flowchart LR
    Task[IoT Task] --> Scheduler[Task Scheduler]

    Scheduler --> CPU[CPU Core<br/>General Purpose<br/>2000 mW<br/>Fast & Flexible]
    Scheduler --> DSP[DSP Core<br/>Signal Processing<br/>500 mW<br/>Efficient for audio]
    Scheduler --> GPU[GPU Core<br/>Parallel Processing<br/>3000 mW<br/>Image processing]
    Scheduler --> NPU[NPU Core<br/>ML Inference<br/>200 mW<br/>AI acceleration]

    CPU --> Ex1[Control flow,<br/>networking]
    DSP --> Ex2[Audio filtering,<br/>voice detection]
    GPU --> Ex3[Image recognition,<br/>video processing]
    NPU --> Ex4[Neural networks,<br/>sensor fusion]

    style NPU fill:#16A085,stroke:#2C3E50,color:#fff
    style DSP fill:#16A085,stroke:#2C3E50,color:#fff
    style GPU fill:#E67E22,stroke:#2C3E50,color:#fff
    style CPU fill:#E67E22,stroke:#2C3E50,color:#fff
```

Figure 1611.3: Heterogeneous Mobile SoC Architecture: CPU, DSP, GPU, and NPU Task Scheduling

Modern mobile SoCs include heterogeneous cores:

  • CPU: General purpose, control flow
  • GPU: Massively parallel, graphics and compute
  • DSP: Low-power signal processing, audio/sensor data
  • NPU: Neural network acceleration, ML inference

Benefits:

  • Increased performance and power efficiency
  • Selected tasks shift to more efficient cores
  • Dynamic voltage/frequency scaling per core

Example: Keyword Spotting

  • Optimized GPU is >6× faster than cloud
  • Optimized GPU is >21× faster than sequential CPU
  • Optimized GPU with batching outperforms cloud energy-wise
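To make the matching idea concrete, here is a toy scheduler sketch; the core table and workload categories are illustrative assumptions, not a real SoC API. A real scheduler would compare energy-to-completion (power × expected runtime per core), not raw power:

```python
CORES = {
    # core: (active_power_mw, workloads it handles efficiently)
    "CPU": (2000, {"control", "networking"}),
    "DSP": (500,  {"audio", "sensor_stream"}),
    "GPU": (3000, {"image", "video", "parallel"}),
    "NPU": (200,  {"ml_inference", "sensor_fusion"}),
}

def pick_core(workload):
    """Among cores suited to the workload, pick the lowest-power one."""
    capable = {c: p for c, (p, kinds) in CORES.items() if workload in kinds}
    return min(capable, key=capable.get) if capable else "CPU"  # CPU fallback

print(pick_core("ml_inference"))  # NPU
print(pick_core("audio"))         # DSP
```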

1611.7 Knowledge Check: Heterogeneous Computing

Question 1: A mobile app needs to process images. With Wi-Fi available, remote execution draws 50 mW for 500 ms; local execution draws 200 mW for 2000 ms. What should MAUI decide?

Explanation: MAUI makes energy-based offloading decisions. Total energy = Power × Time. Remote: 50 mW × 0.5s = 25 millijoules. Local: 200 mW × 2s = 400 millijoules. Remote uses 16× less energy! With Wi-Fi, network transmission costs are low, making offloading worthwhile. However, with 3G, transmission overhead might reverse this decision. MAUI dynamically profiles and decides based on network quality, latency requirements, and energy trade-offs. For battery-powered devices, energy often matters more than speed.

Question 2: Why did optimized GPU keyword spotting achieve >21× speedup compared to sequential CPU and even outperform cloud offloading?

Explanation: Mobile GPUs excel at parallel tasks like audio/image processing. Keyword spotting involves many parallel FFTs and neural network computations. Sequential CPU processes samples one-by-one (slow), while GPU processes thousands simultaneously (fast). Batching multiple audio segments further amortizes initialization overhead. Local GPU avoids network transmission energy (150-300mW for Wi-Fi). This demonstrates heterogeneous computing benefits: match task characteristics (parallel vs serial) to appropriate core (GPU vs CPU vs DSP) for maximum efficiency.

Question 3: An IoT device profiles a task: local CPU execution uses 500 mW for 3 seconds. Cloud execution: transmit 100 KB + receive 10 KB over LTE (800 kbps, 0.8 mW/KB), with remote compute taking 50 ms at negligible local power. What’s the energy comparison?

Explanation: Local energy = 500 mW × 3s = 1500 mJ. Cloud transmission: (100+10) KB × 0.8 mW/KB = 88 mW power. Time = (110 KB × 8 bits) / 800 kbps ≈ 1.1 s. Transmission energy = 88 mW × 1.1s ≈ 97 mJ. But mobile radio overhead (RRC state transitions, retransmissions) adds ~10-15× multiplier, making real cost ~1000-1200 mJ. This demonstrates why cellular offloading often costs MORE energy than local execution despite faster remote compute. Wi-Fi (lower overhead) or LoRa (different use case) changes this calculus completely.
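A quick script to verify the Question 3 arithmetic under the exercise's simplified radio model (the ~12× overhead factor is one point in the stated 10-15× range):

```python
e_local = 500 * 3.0                # 500 mW × 3 s = 1500 mJ
p_radio = (100 + 10) * 0.8         # 110 KB × 0.8 mW/KB = 88 mW
t_tx = (110 * 8) / 800             # 880 kb / 800 kbps = 1.1 s
e_tx = p_radio * t_tx              # ~96.8 mJ before radio overhead
print(e_local, round(e_tx, 1))     # 1500.0 96.8
print(round(e_tx * 12, 0))         # ~1162 mJ with a ~12x cellular overhead factor
```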

1611.8 Code Offloading Energy Analysis Worksheet

Scenario: Image processing on wearable device - local vs cloud decision

1611.8.1 Step 1: Local Processing Energy

| Component | Power | Duration | Energy |
|-----------|-------|----------|--------|
| Image Capture | 80 mA @ 3.7 V = 296 mW | 100 ms | 29.6 mJ |
| CPU Processing | 200 mA @ 3.7 V = 740 mW | 3000 ms | 2,220 mJ |
| Total Local | - | - | 2,249.6 mJ |

1611.8.2 Step 2: Cloud Offloading Energy (Wi-Fi)

| Component | Power | Duration | Energy |
|-----------|-------|----------|--------|
| Image Capture | 80 mA @ 3.7 V = 296 mW | 100 ms | 29.6 mJ |
| Wi-Fi TX (upload 50 KB) | 250 mA @ 3.7 V = 925 mW | 400 ms | 370 mJ |
| Wi-Fi RX (download 5 KB) | 150 mA @ 3.7 V = 555 mW | 50 ms | 27.75 mJ |
| Idle Wait (remote processing) | 15 mA @ 3.7 V = 55.5 mW | 500 ms | 27.75 mJ |
| Total Cloud (Wi-Fi) | - | - | 455.1 mJ |

Wi-Fi Decision: Offload (saves 1,794 mJ = 80% energy reduction)

1611.8.3 Step 3: Cloud Offloading Energy (LTE)

| Component | Power | Duration | Energy |
|-----------|-------|----------|--------|
| Image Capture | 80 mA @ 3.7 V = 296 mW | 100 ms | 29.6 mJ |
| LTE TX (upload 50 KB) | 500 mA @ 3.7 V = 1,850 mW | 800 ms | 1,480 mJ |
| LTE RX (download 5 KB) | 300 mA @ 3.7 V = 1,110 mW | 100 ms | 111 mJ |
| RRC State Overhead | 200 mA @ 3.7 V = 740 mW | 2000 ms | 1,480 mJ |
| Total Cloud (LTE) | - | - | 3,100.6 mJ |

LTE Decision: Process locally (saves 851 mJ vs LTE offloading)
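The three worksheet totals can be reproduced with a small helper (the function name is ours; the current, voltage, and duration values come straight from Steps 1-3):

```python
V = 3.7  # battery voltage

def total_mj(rows):
    """rows: list of (current_mA, duration_ms); energy mJ = mA × V × ms / 1000."""
    return sum(i_ma * V * t_ms / 1000.0 for i_ma, t_ms in rows)

local = total_mj([(80, 100), (200, 3000)])                        # capture + CPU
wifi  = total_mj([(80, 100), (250, 400), (150, 50), (15, 500)])   # capture + TX/RX + idle
lte   = total_mj([(80, 100), (500, 800), (300, 100), (200, 2000)])
print(round(local, 1), round(wifi, 1), round(lte, 1))  # 2249.6 455.1 3100.6
```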

1611.8.4 Step 4: MAUI Decision Framework

```python
def maui_decision(wifi_available, energy_local, energy_cloud_wifi,
                  energy_cloud_cellular, battery_pct, latency_critical):
    """Context-aware offloading decision from the worksheet (energies in mJ)."""
    if wifi_available and energy_cloud_wifi < energy_local:
        return "OFFLOAD_WIFI"
    elif energy_local < energy_cloud_cellular:
        return "PROCESS_LOCAL"
    elif battery_pct > 50 and latency_critical:
        return "OFFLOAD_CELLULAR"
    else:
        return "PROCESS_LOCAL"
```
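Plugging in the worksheet totals (a usage sketch):

```python
# Steps 1-3 totals: local 2,249.6 mJ, Wi-Fi cloud 455.1 mJ, LTE cloud 3,100.6 mJ
print(maui_decision(True,  2249.6, 455.1, 3100.6, battery_pct=80,
                    latency_critical=False))  # OFFLOAD_WIFI
print(maui_decision(False, 2249.6, 455.1, 3100.6, battery_pct=80,
                    latency_critical=False))  # PROCESS_LOCAL
```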

1611.8.5 Step 5: Context-Aware Adaptation

| Context | Network | Battery | Decision | Energy | Rationale |
|---------|---------|---------|----------|--------|-----------|
| At Home | Wi-Fi | 80% | Offload | 455 mJ | Wi-Fi cheap, fast |
| Outdoors | LTE | 80% | Local | 2,250 mJ | LTE expensive |
| Outdoors | LTE | 15% | Local | 2,250 mJ | Battery critical |
| At Office | Wi-Fi | 15% | Offload | 455 mJ | Save battery with Wi-Fi |

Your Turn: Calculate offloading decisions for your application!

1611.9 Sensor Fusion Energy Optimization Worksheet

Scenario: Location tracking using GPS vs Wi-Fi/accelerometer inference

1611.9.1 Step 1: Direct GPS Sensing

| State | Current | Duration | Charge per Cycle |
|-------|---------|----------|------------------|
| GPS Active | 45 mA | 30 s | 0.375 mAh |
| Processing | 20 mA | 2 s | 0.011 mAh |
| BLE TX | 15 mA | 1 s | 0.004 mAh |
| Sleep | 10 µA | 27 s | 0.000075 mAh |

  • Per measurement (60 s cycle): 0.390 mAh
  • Per hour (60 measurements): 23.4 mAh
  • 200 mAh battery life: 8.5 hours
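A small duty-cycle calculator reproducing these numbers (rows taken from the table above; the helper name is ours):

```python
def mah_per_cycle(rows_ma_s):
    """rows: (current_mA, duration_s) pairs; charge mAh = mA × s / 3600."""
    return sum(i * t for i, t in rows_ma_s) / 3600.0

cycle = [(45, 30), (20, 2), (15, 1), (0.010, 27)]  # GPS, CPU, BLE, sleep (10 µA)
per_cycle = mah_per_cycle(cycle)
print(round(per_cycle, 3))               # ~0.390 mAh per 60 s measurement
print(round(per_cycle * 60, 1))          # ~23.4 mAh per hour
print(round(200 / (per_cycle * 60), 1))  # ~8.5 h on a 200 mAh battery
```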

1611.9.2 Step 2: ACE Inference Strategy

Use cached GPS + accelerometer for motion detection

| Scenario | Method | Current | Duration | Frequency |
|----------|--------|---------|----------|-----------|
| Stationary | Cached GPS | 10 µA | 60 s | 59 min/hour |
| Moving (inferred) | Accel check | 0.5 mA | 0.5 s | 59 times/hour |
| Verify Location | GPS | 45 mA | 30 s | 1 time/hour |

Energy per hour:

```
E_stationary  = 59 × (10 µA × 60 s) / 3600   = 0.0098 mAh
E_accel_check = 59 × (0.5 mA × 0.5 s) / 3600 = 0.0041 mAh
E_gps_verify  = 1 × (45 mA × 30 s + 20 mA × 2 s) / 3600 = 0.386 mAh
E_total       ≈ 0.40 mAh per hour
```

200 mAh battery life: 500 hours ≈ 20.8 days

Energy savings: 58.5× improvement over continuous GPS!

1611.9.3 Step 3: Association Rules for Inference

ACE learns these rules from history:

| Rule | Support | Confidence | Inference |
|------|---------|------------|-----------|
| Accel_Still=True → AtHome=True | 25% | 85% | Skip GPS if still |
| Wi-Fi_SSID=Home → AtHome=True | 30% | 95% | Use Wi-Fi instead of GPS |
| Time=Night AND Still → Sleeping=True | 15% | 90% | Reduce all sampling 10× |

Optimized energy with rules (computed below):

  • 85% of requests served from cache/inference (0.01 mAh)
  • 15% of requests require GPS sensing (0.39 mAh)
  • Average: 0.85 × 0.01 + 0.15 × 0.39 ≈ 0.067 mAh per request
  • Battery life (one request per hour): ~2,985 hours ≈ 124 days!
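The weighted average can be checked directly:

```python
# Expected per-request charge once the association rules are in place:
# weighted average of cheap cached/inferred answers and occasional GPS fixes.
p_cached, cost_cached = 0.85, 0.01   # mAh, served from cache/inference
p_gps,    cost_gps    = 0.15, 0.39   # mAh, full GPS sensing
avg = p_cached * cost_cached + p_gps * cost_gps
print(round(avg, 3))    # ~0.067 mAh per request
print(round(200 / avg)) # ~2985 hours on 200 mAh (~124 days at 1 request/hour)
```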

1611.9.4 Step 4: Battery-Aware Adaptation

| Battery Level | Strategy | GPS Frequency | Avg Current |
|---------------|----------|---------------|-------------|
| 100-50% | Normal | Every 5 min | 0.40 mA |
| 50-20% | Conservative | Every 15 min | 0.15 mA |
| 20-15% | Emergency | Every 30 min | 0.08 mA |
| <15% | Critical | Every 60 min | 0.04 mA |
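The adaptation table maps naturally onto a threshold lookup; a sketch (the thresholds and intervals mirror the table, the function itself is hypothetical):

```python
def gps_interval_min(battery_pct):
    """Return the GPS sampling interval (minutes) for a battery level (%)."""
    if battery_pct > 50:
        return 5     # Normal
    elif battery_pct > 20:
        return 15    # Conservative
    elif battery_pct > 15:
        return 30    # Emergency
    else:
        return 60    # Critical

for level in (80, 35, 18, 10):
    print(level, gps_interval_min(level))  # 5, 15, 30, 60 minutes
```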

Your Turn: Design inference rules for your sensor fusion application!

1611.10 Summary

Code offloading and heterogeneous computing are essential for energy-efficient IoT systems:

  1. Energy-Preserving Sensing Plans: Always choose the cheapest method to obtain context - cache, inference, then direct sensing
  2. MAUI Framework: Compare local execution energy against network transmission + idle wait + receive energy
  3. Network-Aware Decisions: Wi-Fi offloading often saves energy; LTE offloading often wastes energy due to tail power
  4. Heterogeneous Cores: Match tasks to appropriate processors - DSP for audio, GPU for parallel, NPU for ML
  5. Context-Aware Adaptation: Adjust offloading decisions based on battery level, network type, and latency requirements

The key insight is that offloading decisions are highly context-dependent. Simple rules like “always offload” or “always local” are suboptimal; intelligent systems adapt to current conditions.

1611.11 What’s Next

The next section covers Energy Optimization Worksheets and Assessment, which provides comprehensive calculation worksheets, detailed quizzes, and practical exercises for applying context-aware energy management techniques to real IoT systems.