Learning Objectives
- Identify the four key optimization dimensions (speed, size, power, energy) and explain how application requirements drive priorities
- Compare hardware acceleration options across the processor spectrum from CPU through DSP, FPGA, and ASIC
- Apply compiler optimization flags (-O0, -O2, -O3, -Os) and evaluate their trade-offs for flash-constrained devices
- Implement fixed-point arithmetic (Qn.m format) to replace floating-point operations on MCUs without an FPU
Hardware and software optimization in IoT means choosing the right tool for each task: selecting specialized processors (DSP, FPGA, ASIC) for compute-intensive work, applying compiler flags (-O2/-Os) to trade code size against speed, and using fixed-point arithmetic to replace costly floating-point operations. Together, these techniques can reduce energy consumption by 2–100× on resource-constrained devices.
Key Concepts
- Optimization Dimensions: Speed (latency), size (flash/RAM footprint), power (instantaneous draw), and energy (total consumption) — improving one often worsens another
- DSP (Digital Signal Processor): A specialized processor with SIMD instructions optimized for repetitive arithmetic on data streams; 2–10× more energy-efficient than CPU for signal processing
- FPGA (Field-Programmable Gate Array): Reconfigurable hardware that implements custom circuits in silicon; highly energy-efficient for specific algorithms but requires significant design effort
- ASIC (Application-Specific Integrated Circuit): Custom silicon designed for exactly one task; the most energy-efficient option but with high non-recurring engineering (NRE) cost
- Compiler Optimization Level: The -O flag tells the compiler how aggressively to optimize; -O2 balances speed and size, -Os minimizes code size, -O3 maximizes speed
- Fixed-Point Arithmetic: Representing fractional numbers as scaled integers to avoid FPU operations; 3–10× more energy-efficient than floating-point on MCUs without hardware FPU
- NRE Cost: Non-recurring engineering cost — the one-time design cost of custom hardware; must be amortized over production volume to justify ASIC or custom FPGA designs
Energy and power management determines how long your IoT device can operate between battery changes or charges. Think of packing for a camping trip with limited battery packs – every bit of power must be used wisely. Since many IoT sensors need to run for months or years unattended, power management is often the single most important engineering decision.
“Optimization means making things work better with less,” said Max the Microcontroller. “There are four things you can optimize: speed (how fast), size (how much memory), power (how much energy per second), and total energy (how long the battery lasts). But here is the catch – improving one often makes another worse!”
Sammy the Sensor gave an example: “If Max runs his processor at full speed, he finishes work fast but uses lots of power. If he slows down, he saves power but takes longer. The trick is finding the sweet spot where you finish fast enough but do not waste energy.”
“Hardware and software optimization work together,” explained Lila the LED. “On the hardware side, you might use a specialized chip that does one task really efficiently. On the software side, you use compiler tricks and clever code to squeeze more performance from the same hardware.” Bella the Battery concluded, “The golden rule: measure first, optimize second. Never guess where the bottleneck is – measure it, then fix the biggest problem first!”
Overview
Hardware and software optimization is essential for efficient IoT systems. This topic is covered in four focused chapters that progressively build your understanding from fundamentals through advanced techniques.
Start with Optimization Fundamentals to understand trade-offs, then proceed through hardware, software, and fixed-point chapters based on your needs.
Chapter Guide
Optimization Fundamentals
Estimated Time: 25 minutes | Level: Intermediate
Learn the core concepts of IoT optimization including:
- The four key dimensions: speed, size, power, and energy
- How application requirements drive optimization priorities
- Common misconceptions about speed vs. power optimization
- When to optimize and what to measure first
Key takeaway: For duty-cycled IoT devices, sleep efficiency matters more than processing speed.
Hardware Acceleration
Estimated Time: 30 minutes | Level: Advanced
Explore hardware acceleration options:
- Processor spectrum: CPU → DSP → FPGA → ASIC
- ASIC specialization dimensions (instruction set, memory, interconnect)
- Heterogeneous multicore architectures (ARM big.LITTLE)
- Memory hierarchy and DMA optimization
Key takeaway: Match hardware capabilities to application requirements considering NRE costs and production volume.
Software Optimization
Estimated Time: 35 minutes | Level: Advanced
Master software-level optimizations:
- Compiler flags: -O0, -O2, -O3, -Os trade-offs
- Code size strategies for flash-constrained devices
- SIMD vectorization for 4x+ speedup on array operations
- Function inlining and opportunistic sleeping patterns
Key takeaway: Profile before optimizing; typically 80% of execution time comes from 20% of the code.
Fixed-Point Arithmetic
Estimated Time: 25 minutes | Level: Advanced
Implement efficient numeric computation:
- Qn.m format representation and selection
- Converting floating-point algorithms to fixed-point
- Multiplication, division, and overflow handling
- int8 quantization for edge ML (4x speedup, 4x memory reduction)
Key takeaway: Fixed-point arithmetic offers 4x+ performance at a slight precision cost, making it essential for edge AI.
Quick Reference
| Chapter | When to Use | Key Benefit |
| --- | --- | --- |
| Fundamentals | Starting any optimization work | Avoid optimizing the wrong thing |
| Hardware | Selecting MCU/accelerator, high-volume products | Match hardware to workload |
| Software | Firmware development, code efficiency | Faster execution, smaller binaries |
| Fixed-Point | DSP, ML inference, FPU-less processors | 4x+ efficiency on integer hardware |
Prerequisites
Before diving into these optimization chapters, you should be familiar with:
Concept Relationships
Hardware-software optimization is a multi-layered discipline connecting across the IoT stack:
- Prerequisites: Requires Energy-Aware Design principles before optimization—must understand energy budget constraints
- Builds Upon: Extends Embedded Programming with performance/efficiency techniques
- Enables: Proper optimization is prerequisite for achieving Energy Harvesting viability (reduce consumption before adding solar)
- Measured By: All optimization claims must be validated with Energy Measurement—never assume, always measure
Layer dependencies: Hardware selection constrains software (no FPU → must use fixed-point). Software optimization reveals hardware needs (profiling shows 90% time in FFT → consider DSP). Iterate between layers.
Common Pitfalls
Guessing which part of the code is slow or power-hungry and optimizing it without measurement leads to wasted effort. Always profile first — measure actual execution time and current draw for each code section, then target the real bottleneck.
The -O3 flag enables aggressive optimizations like loop unrolling that significantly increase code size. On microcontrollers with 64–256 KB flash, -O3 can cause code to exceed flash capacity. Use -O2 or -Os for flash-constrained devices.
GPU acceleration is designed for throughput, not latency or energy efficiency in embedded contexts. For short tasks (<10 ms), GPU initialization overhead consumes more energy than the task itself. Use GPU only for sustained compute-intensive workloads.
Fixed-point multiplication can overflow if the result exceeds the register width. Always saturate intermediate results or use wider accumulators, and validate that your Qn.m format has sufficient integer bits for the maximum expected value.