1611  Code Offloading and Heterogeneous Computing

1611.1 Code Offloading and Heterogeneous Computing

This section provides a stable anchor for cross-references to code offloading and heterogeneous computing across the book.

1611.2 Learning Objectives

By the end of this chapter, you will be able to:

  • Understand Code Offloading Decisions: Explain when to process locally versus offload to cloud based on energy profiles
  • Calculate Offloading Energy Costs: Compute transmission energy for Wi-Fi vs cellular networks
  • Apply MAUI Framework: Use the MAUI decision framework to make context-aware offloading decisions
  • Leverage Heterogeneous Cores: Match computational tasks to appropriate processors (CPU, GPU, DSP, NPU)
  • Design Energy-Preserving Sensing Plans: Find the cheapest sequence of operations to determine context

1611.3 Prerequisites

Before diving into this chapter, you should be familiar with:

  • Basic energy arithmetic: energy = power × time (mW, mJ, mAh)
  • Wireless networking fundamentals: Wi-Fi and cellular (3G/LTE) and their power characteristics
  • General mobile/embedded architecture: CPUs, accelerators, and low-power sleep states

1611.4 Energy-Preserving Sensing Plans

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#fff'}}}%%
flowchart TD
    Start[App requests:<br/>AtHome status] --> SP[Sensing Planner]

    SP --> Eval[Evaluate options]

    Eval --> O1[Option 1: Direct GPS<br/>Cost: 100mW<br/>Accuracy: 100%]
    Eval --> O2[Option 2: Wi-Fi SSID<br/>Cost: 20mW<br/>Accuracy: 90%]
    Eval --> O3[Option 3: Infer from<br/>cached Driving=False<br/>Cost: 0mW<br/>Accuracy: 85%]

    O1 --> Best{Select cheapest<br/>above threshold}
    O2 --> Best
    O3 --> Best

    Best --> Choose[Choose O3:<br/>Infer from cache]
    Choose --> Verify{Confidence<br/>> threshold?}
    Verify -->|Yes| Return[Return inferred value]
    Verify -->|No| Fallback[Fall back to Wi-Fi O2]

    style Choose fill:#16A085,stroke:#2C3E50,color:#fff
    style Return fill:#16A085,stroke:#2C3E50,color:#fff
    style O1 fill:#E67E22,stroke:#2C3E50,color:#fff
```

Figure 1611.1: Energy-Preserving Sensing Plan: Cost-Based Option Selection

The Sensing Planner finds the best sequence of proxy attributes to sense, considering:

  • Direct sensing cost
  • Inference possibilities from cached attributes
  • Confidence of inference rules
  • Overall energy minimization

Example: To determine “InOffice”, the options are:

  1. Sense directly (80 mW)
  2. If “Running=True” is cached, infer “InOffice=False” (0 mW)
  3. If “AtHome=True” is cached, infer “InOffice=False” (0 mW)

Choose the cheapest option!
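A minimal planner sketch in Python, assuming hypothetical option records with a per-option cost and expected accuracy (an illustration of the idea, not the actual ACE implementation):

```python
def plan_sensing(options, accuracy_threshold):
    """Pick the cheapest sensing option whose accuracy clears the threshold.

    options: list of dicts with 'name', 'cost_mw', and 'accuracy' (0-1).
    """
    viable = [o for o in options if o["accuracy"] >= accuracy_threshold]
    if not viable:
        raise ValueError("no sensing option meets the accuracy threshold")
    return min(viable, key=lambda o: o["cost_mw"])

# Options from Figure 1611.1
options = [
    {"name": "GPS",              "cost_mw": 100, "accuracy": 1.00},
    {"name": "Wi-Fi SSID",       "cost_mw": 20,  "accuracy": 0.90},
    {"name": "Cached inference", "cost_mw": 0,   "accuracy": 0.85},
]
print(plan_sensing(options, accuracy_threshold=0.80)["name"])  # Cached inference
```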

1611.5 Code Offloading Decisions

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#fff'}}}%%
flowchart TD
    Task[Compute Task] --> Profile[MAUI Profiler]

    Profile --> Local[Local Execution<br/>Energy Cost]
    Profile --> Remote[Remote Execution<br/>Energy Cost]

    Local --> L1[CPU: 500mW × 2s = 1000mJ]
    Remote --> R1[Network TX: 200mW × 0.5s = 100mJ]
    Remote --> R2[Server: 0mW]
    Remote --> R3[Network RX: 100mW × 0.1s = 10mJ]
    R1 --> RTotal[Total: 110mJ]
    R2 --> RTotal
    R3 --> RTotal

    L1 --> Decision{Compare costs}
    RTotal --> Decision

    Decision --> Choose[Choose Remote<br/>110mJ < 1000mJ<br/>9× energy savings!]

    style Choose fill:#16A085,stroke:#2C3E50,color:#fff
    style L1 fill:#E67E22,stroke:#2C3E50,color:#fff
```

Figure 1611.2: MAUI Code Offloading Decision: Local vs Remote Execution Energy Analysis

MAUI (Mobile Assistance Using Infrastructure) is a framework that profiles code components in terms of energy to decide whether to run them locally or remotely.

Considerations:

  • Costs of transferring code and data
  • Dynamic decisions based on network constraints
  • Latency requirements
  • Local vs remote execution energy

Example: With 3G, offloading may cost more energy due to high network transmission costs. With Wi-Fi, offloading can save significant energy.
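A minimal sketch of this comparison in Python; the function name and the zero idle cost are our assumptions, while the power and timing figures come from Figure 1611.2:

```python
def should_offload(p_cpu_mw, t_local_s,
                   p_tx_mw, t_tx_s, p_rx_mw, t_rx_s,
                   p_idle_mw, t_wait_s):
    """Compare local vs remote execution energy (all energies in mJ)."""
    e_local = p_cpu_mw * t_local_s                   # energy to compute locally
    e_remote = (p_tx_mw * t_tx_s + p_rx_mw * t_rx_s  # radio energy
                + p_idle_mw * t_wait_s)              # idle while server computes
    return e_remote < e_local, e_local, e_remote

# Figure 1611.2: local 500 mW × 2 s vs TX 200 mW × 0.5 s + RX 100 mW × 0.1 s
offload, e_local, e_remote = should_offload(500, 2.0, 200, 0.5, 100, 0.1, 0, 0)
print(offload, e_local, e_remote)  # True 1000.0 110.0 -> ~9x savings
```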

Common Misconception: “Cloud Processing Is Always More Energy Efficient”

The Misconception: “The cloud has powerful servers, so offloading computation always saves energy on my IoT device.”

The Reality: Network transmission energy often exceeds local computation energy, especially on cellular networks. The decision depends on network type, data size, and computation complexity.

Quantified Comparison:

Task: Process 1MB sensor data with ML model (2 seconds computation)

Option 1: Local Processing (ARM Cortex-M4 @ 80 MHz)

  • CPU power: 50 mW × 2 s = 100 mJ
  • Total energy: 100 mJ

Option 2: Wi-Fi Offloading

  • Transmit 1 MB: 250 mW × 0.5 s = 125 mJ
  • Receive 10 KB result: 150 mW × 0.02 s = 3 mJ
  • Idle during remote compute: 20 mW × 0.1 s = 2 mJ
  • Total energy: 130 mJ (30% worse than local!)

Option 3: LTE Offloading

  • RRC state transition (radio ramp-up): 500 mW × 0.5 s = 250 mJ
  • Transmit 1 MB: 800 mW × 1.0 s = 800 mJ
  • Receive 10 KB: 400 mW × 0.05 s = 20 mJ
  • Tail energy (radio stays on): 200 mW × 5 s = 1,000 mJ
  • Total energy: 2,070 mJ (20× worse than local!)

When Cloud Wins:

Task: Complex ML inference (60 seconds on the local device)

  • Local: 50 mW × 60 s = 3,000 mJ
  • Wi-Fi offload: 130 mJ transmission + negligible remote = 130 mJ total
  • Energy savings: 23× improvement!
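One way to see when cloud wins is to compute the break-even local computation time, treating the offload-side energy as roughly fixed (a simplification, since idle-wait energy grows slowly with compute time):

```python
# Break-even analysis under the Wi-Fi numbers above: offloading pays off
# once local compute time exceeds e_offload / p_cpu.

e_offload_mj = 125 + 3 + 2   # TX + RX + idle from Option 2 = 130 mJ
p_cpu_mw = 50                # local CPU power from Option 1
t_breakeven_s = e_offload_mj / p_cpu_mw
print(t_breakeven_s)  # 2.6 s: a 2 s job stays local, a 60 s job offloads
```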

Decision Matrix:

| Network | Data Size | Computation Time | Recommendation |
|---------|-----------|------------------|----------------|
| Wi-Fi | <100 KB | <5 sec | Local (transmission overhead dominates) |
| Wi-Fi | <100 KB | >30 sec | Offload (computation dominates) |
| Wi-Fi | >1 MB | >10 sec | Offload (parallel advantage) |
| LTE | Any | <30 sec | Local (tail energy kills savings) |
| LTE | <500 KB | >60 sec | Offload (if battery >50%) |

MAUI’s Context-Aware Approach:

  • Wi-Fi available + heavy computation → Offload (2-20× savings)
  • LTE only + light computation → Local (avoid a 10-15× penalty)
  • Battery <20% → Always local (conserve energy)
  • Latency critical → Offload if Wi-Fi, local if LTE

Key Insight: The 5-10 second LTE “tail energy” (radio staying on after transmission) often consumes more energy than the entire local computation. Context-aware offloading decisions must consider network type, not just raw transmission costs.
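A back-of-the-envelope check of Option 3 with the tail term broken out; the power levels and durations are the example's illustrative assumptions, not measurements:

```python
# Tail-energy-aware LTE offload cost, component by component (mJ).
def lte_offload_energy_mj(e_promotion=500 * 0.5,   # RRC ramp-up: 500 mW × 0.5 s
                          e_tx=800 * 1.0,          # transmit 1 MB: 800 mW × 1.0 s
                          e_rx=400 * 0.05,         # receive 10 KB: 400 mW × 0.05 s
                          e_tail=200 * 5.0):       # radio tail: 200 mW × 5 s
    return e_promotion + e_tx + e_rx + e_tail

total = lte_offload_energy_mj()
print(total)               # 2070.0 mJ vs 100 mJ local -> ~20x worse
print(200 * 5.0 / total)   # tail alone is ~48% of the offload energy
```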

1611.6 Local Computation: Heterogeneous Cores

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#fff'}}}%%
flowchart LR
    Task[IoT Task] --> Scheduler[Task Scheduler]

    Scheduler --> CPU[CPU Core<br/>General Purpose<br/>2000 mW<br/>Fast & Flexible]
    Scheduler --> DSP[DSP Core<br/>Signal Processing<br/>500 mW<br/>Efficient for audio]
    Scheduler --> GPU[GPU Core<br/>Parallel Processing<br/>3000 mW<br/>Image processing]
    Scheduler --> NPU[NPU Core<br/>ML Inference<br/>200 mW<br/>AI acceleration]

    CPU --> Ex1[Control flow,<br/>networking]
    DSP --> Ex2[Audio filtering,<br/>voice detection]
    GPU --> Ex3[Image recognition,<br/>video processing]
    NPU --> Ex4[Neural networks,<br/>sensor fusion]

    style NPU fill:#16A085,stroke:#2C3E50,color:#fff
    style DSP fill:#16A085,stroke:#2C3E50,color:#fff
    style GPU fill:#E67E22,stroke:#2C3E50,color:#fff
    style CPU fill:#E67E22,stroke:#2C3E50,color:#fff
```

Figure 1611.3: Heterogeneous Mobile SoC Architecture: CPU, DSP, GPU, and NPU Task Scheduling

Modern mobile SoCs include heterogeneous cores:

  • CPU: General purpose, control flow
  • GPU: Massively parallel, graphics and compute
  • DSP: Low-power signal processing, audio/sensor data
  • NPU: Neural network acceleration, ML inference

Benefits:

  • Increased performance and power efficiency
  • Selected tasks shift to more efficient cores
  • Dynamic voltage/frequency scaling per core

Example: Keyword Spotting

  • Optimized GPU is >6× faster than cloud
  • Optimized GPU is >21× faster than sequential CPU
  • Optimized GPU with batching outperforms cloud energy-wise
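To make the matching idea concrete, here is a toy scheduler sketch; the core table and workload categories are illustrative assumptions, not a real SoC API. A real scheduler would compare energy-to-completion (power × expected runtime per core), not raw power:

```python
CORES = {
    # core: (active_power_mw, workloads it handles efficiently)
    "CPU": (2000, {"control", "networking"}),
    "DSP": (500,  {"audio", "sensor_stream"}),
    "GPU": (3000, {"image", "video", "parallel"}),
    "NPU": (200,  {"ml_inference", "sensor_fusion"}),
}

def pick_core(workload):
    """Among cores suited to the workload, pick the lowest-power one."""
    capable = {c: p for c, (p, kinds) in CORES.items() if workload in kinds}
    return min(capable, key=capable.get) if capable else "CPU"  # CPU fallback

print(pick_core("ml_inference"))  # NPU
print(pick_core("audio"))         # DSP
```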

1611.7 Knowledge Check: Heterogeneous Computing

Question 1: A mobile app needs to process images. With Wi-Fi available, remote execution draws 50 mW for 500 ms; local execution draws 200 mW for 2000 ms. What should MAUI decide?

Explanation: MAUI makes energy-based offloading decisions. Total energy = Power × Time. Remote: 50 mW × 0.5s = 25 millijoules. Local: 200 mW × 2s = 400 millijoules. Remote uses 16× less energy! With Wi-Fi, network transmission costs are low, making offloading worthwhile. However, with 3G, transmission overhead might reverse this decision. MAUI dynamically profiles and decides based on network quality, latency requirements, and energy trade-offs. For battery-powered devices, energy often matters more than speed.

Question 2: Why did optimized GPU keyword spotting achieve >21× speedup compared to sequential CPU and even outperform cloud offloading?

Explanation: Mobile GPUs excel at parallel tasks like audio/image processing. Keyword spotting involves many parallel FFTs and neural network computations. Sequential CPU processes samples one-by-one (slow), while GPU processes thousands simultaneously (fast). Batching multiple audio segments further amortizes initialization overhead. Local GPU avoids network transmission energy (150-300mW for Wi-Fi). This demonstrates heterogeneous computing benefits: match task characteristics (parallel vs serial) to appropriate core (GPU vs CPU vs DSP) for maximum efficiency.

Question 3: An IoT device profiles a task: local CPU execution uses 500 mW for 3 seconds. Cloud execution: transmit 100 KB + receive 10 KB over LTE (800 kbps, 0.8 mW/KB), with remote compute taking 50 ms at negligible local power. What’s the energy comparison?

Explanation: Local energy = 500 mW × 3s = 1500 mJ. Cloud transmission: (100+10) KB × 0.8 mW/KB = 88 mW power. Time = (110 KB × 8 bits) / 800 kbps ≈ 1.1 s. Transmission energy = 88 mW × 1.1s ≈ 97 mJ. But mobile radio overhead (RRC state transitions, retransmissions) adds ~10-15× multiplier, making real cost ~1000-1200 mJ. This demonstrates why cellular offloading often costs MORE energy than local execution despite faster remote compute. Wi-Fi (lower overhead) or LoRa (different use case) changes this calculus completely.
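A quick script to verify the Question 3 arithmetic under the exercise's simplified radio model (the ~12× overhead factor is one point in the stated 10-15× range):

```python
e_local = 500 * 3.0                # 500 mW × 3 s = 1500 mJ
p_radio = (100 + 10) * 0.8         # 110 KB × 0.8 mW/KB = 88 mW
t_tx = (110 * 8) / 800             # 880 kb / 800 kbps = 1.1 s
e_tx = p_radio * t_tx              # ~96.8 mJ before radio overhead
print(e_local, round(e_tx, 1))     # 1500.0 96.8
print(round(e_tx * 12, 0))         # ~1162 mJ with a ~12x cellular overhead factor
```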

1611.8 Code Offloading Energy Analysis Worksheet

Scenario: Image processing on wearable device - local vs cloud decision

1611.8.1 Step 1: Local Processing Energy

| Component | Power | Duration | Energy |
|-----------|-------|----------|--------|
| Image Capture | 80 mA @ 3.7 V = 296 mW | 100 ms | 29.6 mJ |
| CPU Processing | 200 mA @ 3.7 V = 740 mW | 3000 ms | 2,220 mJ |
| Total Local | - | - | 2,249.6 mJ |

1611.8.2 Step 2: Cloud Offloading Energy (Wi-Fi)

| Component | Power | Duration | Energy |
|-----------|-------|----------|--------|
| Image Capture | 80 mA @ 3.7 V = 296 mW | 100 ms | 29.6 mJ |
| Wi-Fi TX (upload 50 KB) | 250 mA @ 3.7 V = 925 mW | 400 ms | 370 mJ |
| Wi-Fi RX (download 5 KB) | 150 mA @ 3.7 V = 555 mW | 50 ms | 27.75 mJ |
| Idle Wait (remote processing) | 15 mA @ 3.7 V = 55.5 mW | 500 ms | 27.75 mJ |
| Total Cloud (Wi-Fi) | - | - | 455.1 mJ |

Wi-Fi Decision: Offload (saves 1,794 mJ = 80% energy reduction)

1611.8.3 Step 3: Cloud Offloading Energy (LTE)

| Component | Power | Duration | Energy |
|-----------|-------|----------|--------|
| Image Capture | 80 mA @ 3.7 V = 296 mW | 100 ms | 29.6 mJ |
| LTE TX (upload 50 KB) | 500 mA @ 3.7 V = 1,850 mW | 800 ms | 1,480 mJ |
| LTE RX (download 5 KB) | 300 mA @ 3.7 V = 1,110 mW | 100 ms | 111 mJ |
| RRC State Overhead | 200 mA @ 3.7 V = 740 mW | 2000 ms | 1,480 mJ |
| Total Cloud (LTE) | - | - | 3,100.6 mJ |

LTE Decision: Process locally (saves 851 mJ vs LTE offloading)
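The three worksheet totals can be reproduced with a small helper (the function name is ours; the current, voltage, and duration values come straight from Steps 1-3):

```python
V = 3.7  # battery voltage

def total_mj(rows):
    """rows: list of (current_mA, duration_ms); energy mJ = mA × V × ms / 1000."""
    return sum(i_ma * V * t_ms / 1000.0 for i_ma, t_ms in rows)

local = total_mj([(80, 100), (200, 3000)])                        # capture + CPU
wifi  = total_mj([(80, 100), (250, 400), (150, 50), (15, 500)])   # capture + TX/RX + idle
lte   = total_mj([(80, 100), (500, 800), (300, 100), (200, 2000)])
print(round(local, 1), round(wifi, 1), round(lte, 1))  # 2249.6 455.1 3100.6
```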

1611.8.4 Step 4: MAUI Decision Framework

```python
def maui_decision(wifi_available, energy_local, energy_cloud_wifi,
                  energy_cloud_cellular, battery_pct, latency_critical):
    """Context-aware offloading decision from the worksheet (energies in mJ)."""
    if wifi_available and energy_cloud_wifi < energy_local:
        return "OFFLOAD_WIFI"
    elif energy_local < energy_cloud_cellular:
        return "PROCESS_LOCAL"
    elif battery_pct > 50 and latency_critical:
        return "OFFLOAD_CELLULAR"
    else:
        return "PROCESS_LOCAL"
```
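Plugging in the worksheet totals (a usage sketch):

```python
# Steps 1-3 totals: local 2,249.6 mJ, Wi-Fi cloud 455.1 mJ, LTE cloud 3,100.6 mJ
print(maui_decision(True,  2249.6, 455.1, 3100.6, battery_pct=80,
                    latency_critical=False))  # OFFLOAD_WIFI
print(maui_decision(False, 2249.6, 455.1, 3100.6, battery_pct=80,
                    latency_critical=False))  # PROCESS_LOCAL
```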

1611.8.5 Step 5: Context-Aware Adaptation

| Context | Network | Battery | Decision | Energy | Rationale |
|---------|---------|---------|----------|--------|-----------|
| At Home | Wi-Fi | 80% | Offload | 455 mJ | Wi-Fi cheap, fast |
| Outdoors | LTE | 80% | Local | 2,250 mJ | LTE expensive |
| Outdoors | LTE | 15% | Local | 2,250 mJ | Battery critical |
| At Office | Wi-Fi | 15% | Offload | 455 mJ | Save battery with Wi-Fi |

Your Turn: Calculate offloading decisions for your application!

1611.9 Sensor Fusion Energy Optimization Worksheet

Scenario: Location tracking using GPS vs Wi-Fi/accelerometer inference

1611.9.1 Step 1: Direct GPS Sensing

| State | Current | Duration | Charge per Cycle |
|-------|---------|----------|------------------|
| GPS Active | 45 mA | 30 s | 0.375 mAh |
| Processing | 20 mA | 2 s | 0.011 mAh |
| BLE TX | 15 mA | 1 s | 0.004 mAh |
| Sleep | 10 µA | 27 s | 0.000075 mAh |

  • Per measurement (60 s cycle): 0.390 mAh
  • Per hour (60 measurements): 23.4 mAh
  • 200 mAh battery life: 8.5 hours
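A small duty-cycle calculator reproducing these numbers (rows taken from the table above; the helper name is ours):

```python
def mah_per_cycle(rows_ma_s):
    """rows: (current_mA, duration_s) pairs; charge mAh = mA × s / 3600."""
    return sum(i * t for i, t in rows_ma_s) / 3600.0

cycle = [(45, 30), (20, 2), (15, 1), (0.010, 27)]  # GPS, CPU, BLE, sleep (10 µA)
per_cycle = mah_per_cycle(cycle)
print(round(per_cycle, 3))               # ~0.390 mAh per 60 s measurement
print(round(per_cycle * 60, 1))          # ~23.4 mAh per hour
print(round(200 / (per_cycle * 60), 1))  # ~8.5 h on a 200 mAh battery
```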

1611.9.2 Step 2: ACE Inference Strategy

Use cached GPS + accelerometer for motion detection

| Scenario | Method | Current | Duration | Frequency |
|----------|--------|---------|----------|-----------|
| Stationary | Cached GPS | 10 µA | 60 s | 59 min/hour |
| Moving (inferred) | Accel check | 0.5 mA | 0.5 s | 59 times/hour |
| Verify Location | GPS | 45 mA | 30 s | 1 time/hour |

Energy per hour:

```
E_stationary  = 59 × (10 µA × 60 s) / 3600   = 0.0098 mAh
E_accel_check = 59 × (0.5 mA × 0.5 s) / 3600 = 0.0041 mAh
E_gps_verify  = 1 × (45 mA × 30 s + 20 mA × 2 s) / 3600 = 0.386 mAh
E_total       ≈ 0.40 mAh per hour
```

200 mAh battery life: 500 hours ≈ 20.8 days

Energy savings: 58.5× improvement over continuous GPS!

1611.9.3 Step 3: Association Rules for Inference

ACE learns these rules from history:

| Rule | Support | Confidence | Inference |
|------|---------|------------|-----------|
| Accel_Still=True → AtHome=True | 25% | 85% | Skip GPS if still |
| Wi-Fi_SSID=Home → AtHome=True | 30% | 95% | Use Wi-Fi instead of GPS |
| Time=Night AND Still → Sleeping=True | 15% | 90% | Reduce all sampling 10× |

Optimized energy with rules (computed below):

  • 85% of requests served from cache/inference (0.01 mAh)
  • 15% of requests require GPS sensing (0.39 mAh)
  • Average: 0.85 × 0.01 + 0.15 × 0.39 ≈ 0.067 mAh per request
  • Battery life (one request per hour): ~2,985 hours ≈ 124 days!
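The weighted average can be checked directly:

```python
# Expected per-request charge once the association rules are in place:
# weighted average of cheap cached/inferred answers and occasional GPS fixes.
p_cached, cost_cached = 0.85, 0.01   # mAh, served from cache/inference
p_gps,    cost_gps    = 0.15, 0.39   # mAh, full GPS sensing
avg = p_cached * cost_cached + p_gps * cost_gps
print(round(avg, 3))    # ~0.067 mAh per request
print(round(200 / avg)) # ~2985 hours on 200 mAh (~124 days at 1 request/hour)
```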

1611.9.4 Step 4: Battery-Aware Adaptation

| Battery Level | Strategy | GPS Frequency | Avg Current |
|---------------|----------|---------------|-------------|
| 100-50% | Normal | Every 5 min | 0.40 mA |
| 50-20% | Conservative | Every 15 min | 0.15 mA |
| 20-15% | Emergency | Every 30 min | 0.08 mA |
| <15% | Critical | Every 60 min | 0.04 mA |
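The adaptation table maps naturally onto a threshold lookup; a sketch (the thresholds and intervals mirror the table, the function itself is hypothetical):

```python
def gps_interval_min(battery_pct):
    """Return the GPS sampling interval (minutes) for a battery level (%)."""
    if battery_pct > 50:
        return 5     # Normal
    elif battery_pct > 20:
        return 15    # Conservative
    elif battery_pct > 15:
        return 30    # Emergency
    else:
        return 60    # Critical

for level in (80, 35, 18, 10):
    print(level, gps_interval_min(level))  # 5, 15, 30, 60 minutes
```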

Your Turn: Design inference rules for your sensor fusion application!

1611.10 Summary

Code offloading and heterogeneous computing are essential for energy-efficient IoT systems:

  1. Energy-Preserving Sensing Plans: Always choose the cheapest method to obtain context - cache, inference, then direct sensing
  2. MAUI Framework: Compare local execution energy against network transmission + idle wait + receive energy
  3. Network-Aware Decisions: Wi-Fi offloading often saves energy; LTE offloading often wastes energy due to tail power
  4. Heterogeneous Cores: Match tasks to appropriate processors - DSP for audio, GPU for parallel, NPU for ML
  5. Context-Aware Adaptation: Adjust offloading decisions based on battery level, network type, and latency requirements

The key insight is that offloading decisions are highly context-dependent. Simple rules like “always offload” or “always local” are suboptimal; intelligent systems adapt to current conditions.

1611.11 What’s Next

The next section covers Energy Optimization Worksheets and Assessment, which provides comprehensive calculation worksheets, detailed quizzes, and practical exercises for applying context-aware energy management techniques to real IoT systems.