25  OTA Updates and Security

25.1 Firmware Updates and Over-the-Air (OTA) Security

This chapter covers the critical topic of securely updating IoT device firmware, including OTA (Over-the-Air) update architectures, code signing, rollback protection, and strategies for managing updates across large device fleets.

25.2 Learning Objectives

By the end of this chapter, you will be able to:

  • Architect secure OTA update pipelines for IoT device fleets
  • Implement cryptographic code signing for firmware authenticity verification
  • Configure anti-rollback protection using persistent version counters
  • Devise staged update strategies for large-scale device deployments
  • Assess regulatory compliance requirements for IoT software updates

Key Concepts

  • Firmware signing: Using a cryptographic signature (RSA, ECDSA) to guarantee that firmware came from the legitimate manufacturer and has not been modified, verified by the device before applying any update.
  • Secure boot: A hardware-enforced process that verifies the cryptographic signature of each software component before executing it, ensuring only authenticated firmware runs on the device.
  • Delta update: An OTA update that transmits only the differences between the current and new firmware version rather than the complete image, dramatically reducing update size and transmission time for bandwidth-constrained devices.
  • Rollback protection: A mechanism preventing an attacker from downgrading device firmware to a vulnerable older version, typically implemented using a monotonic version counter stored in hardware.
  • Update manifest: A metadata document (signed by the manufacturer) describing a firmware update: version number, target device type, cryptographic hash, and rollout policy — verified before the device downloads the firmware binary.
  • SUIT (Software Updates for Internet of Things): An IETF effort to standardize OTA update manifests and procedures for constrained IoT devices, with the update architecture described in RFC 9019 and the manifest information model in RFC 9124, providing interoperability across vendors and platforms.

In 60 Seconds

Secure OTA (Over-the-Air) firmware updates are the mechanism by which IoT security vulnerabilities discovered after deployment can be patched — without OTA capability, a deployed fleet of vulnerable devices cannot be remediated except by physical replacement. The security of the OTA mechanism itself is critical: an insecure update channel can turn an OTA system into an attack vector for installing malicious firmware at scale.

Firmware security is about protecting the core software that runs on IoT devices and ensuring updates are delivered safely. Think of firmware as the operating system of your smart device – if an attacker can modify it, they control the device completely. Secure update mechanisms ensure that only genuine, untampered software gets installed.

“Great news, everyone! There is a new update for us!” Max the Microcontroller announced excitedly. “But wait – how do we know it is really from our manufacturer and not a trick from a bad guy?”

Sammy the Sensor raised a good point. “Imagine ordering a pizza. How do you know the delivery person is real and nobody swapped your pizza for something gross on the way? That is exactly the problem with over-the-air updates! The update travels across the internet, and someone could tamper with it.”

“That is why we use code signing!” Lila the LED explained. “The manufacturer puts a special digital seal on the update – like a wax stamp on a royal letter. When the update arrives, Max checks the seal. If it matches, the update is genuine. If the seal is broken or missing, Max refuses to install it. No fake updates getting past us!”

“And I make sure we have enough power to finish the whole update before we start,” Bella the Battery added. “Imagine if the power died halfway through installing – you would be left with half-old, half-new code that does not work! That is also why we keep a backup copy: if a new update has a bug, we can safely go back to the previous working version. (Rollback protection is a different safeguard: it stops a bad guy from forcing us back onto an old version with known holes.) Updating safely is just as important as updating at all!”

25.3 The Update Imperative

IoT devices face a critical challenge: they must be updateable throughout their 10-20 year lifespans, yet updates themselves can be attack vectors. A well-designed OTA system balances security with reliability.

25.3.1 Why Updates Matter

| Without Updates | With Secure Updates |
|---|---|
| Known vulnerabilities persist indefinitely | Patches deployed within days of disclosure |
| Device becomes part of botnets (Mirai) | Security improvements continuously applied |
| Compliance violations (GDPR, HIPAA) | Regulatory requirements met |
| Customer trust erodes | Product value increases over time |
| Liability exposure increases | Documented security posture |

25.3.2 OTA Update Architecture

Diagram showing the end-to-end OTA update architecture for IoT devices, including the build server that compiles firmware, the signing infrastructure that applies cryptographic signatures, the CDN distribution network, and the IoT devices that download, verify, and install updates using A/B partitioning.

OTA Update Architecture Overview

25.4 Code Signing for Firmware Updates

Code signing is the foundation of OTA security. It ensures that only firmware created by the authorized manufacturer can be installed on devices.

Diagram showing the code signing process: Developer creates software, generates hash of software, encrypts hash with private key to create signature, attaches signature to software. On verification side: User downloads software with signature, decrypts signature with public key to get original hash, computes hash of downloaded software, compares hashes - if match, software is authentic and unmodified.

Code Signing and Verification Flow

Source: NPTEL IoT Security Course

This diagram illustrates the fundamental code signing workflow used for firmware updates:

  1. Signing (Developer/Build Server):
    • Compute cryptographic hash (SHA-256) of firmware
    • Sign hash with private key to produce the digital signature
    • Attach signature to firmware package
  2. Verification (IoT Device):
    • Download firmware + signature
    • Compute hash of downloaded firmware
    • Check the signature against that hash using the public key (with RSA this amounts to recovering the signed hash and comparing; ECDSA verifies the pair mathematically without recovering a hash)
    • If the check passes, firmware is authentic and unmodified
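
The hash, sign, and verify steps above can be sketched in a few lines of standard-library Python. This is a toy, assuming textbook RSA without padding and two publicly known Mersenne primes as the "key"; the function names are hypothetical and none of it is secure. Production signing would use randomly generated keys held in an HSM, a padded scheme such as RSA-PSS or ECDSA, and a vetted cryptographic library.

```python
import hashlib

# TOY parameters for illustration only: two publicly known Mersenne primes.
# Anyone can factor N, so this offers no security whatsoever.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (stays on the build server)


def firmware_digest(firmware: bytes) -> int:
    """SHA-256 of the image, as an integer (always smaller than N here)."""
    return int.from_bytes(hashlib.sha256(firmware).digest(), "big")


def sign_firmware(firmware: bytes) -> int:
    """Build-server side: sign the hash with the private key."""
    return pow(firmware_digest(firmware), D, N)


def verify_firmware(firmware: bytes, signature: int) -> bool:
    """Device side: recompute the hash and check it against the signature."""
    return pow(signature, E, N) == firmware_digest(firmware)


image = b"\x7fFIRMWARE v2.0 payload"
sig = sign_firmware(image)
print(verify_firmware(image, sig))            # genuine image
print(verify_firmware(image + b"\x00", sig))  # image modified in transit
```

The device only ever needs `N` and `E`, which is why the public key can be baked into every unit while the private exponent never leaves the signing infrastructure.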

Visualization of the code signing process for IoT firmware updates showing the developer build system generating a cryptographic signature using a private key stored in HSM, the signature traveling with the firmware through distribution channels, and the IoT device verifying the signature using the public key before applying the update.

Figure 25.1: Code Signing Process - Ensuring firmware authenticity from build to device

Diagram of secure OTA update architecture showing staged rollout from build server through signing infrastructure to CDN distribution, with IoT devices performing signature verification, A/B partition switching, and automatic rollback on failure.

Figure 25.2: Secure OTA Architecture - End-to-end update pipeline with cryptographic verification

25.5 OTA Security Checklist

Secure OTA Update Requirements

Before Release:

Distribution:

Device-Side:

Post-Update:

25.6 Dual-Bank (A/B) Update Scheme

The A/B partition scheme ensures devices can always boot, even if an update fails:

Diagram illustrating the dual-bank A/B flash memory partition layout for OTA updates, showing two equal-sized firmware slots (A and B), a bootloader region, a partition table, and an OTA selection flag that determines which bank to boot from. The active partition runs current firmware while the inactive partition receives the new update.

A/B Partition Flash Layout

Update Process:

  1. New firmware downloaded to inactive bank (B)
  2. Signature verified
  3. Bootloader flag set to boot from B
  4. Device reboots
  5. If B boots successfully, mark B as active
  6. If B fails to boot, bootloader automatically reverts to A
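
The six steps above amount to a small state machine in the bootloader. The following is a minimal sketch of that logic, assuming a simplified model in which a bank either boots or does not; the `Bank` and `Device` classes and the `boots_ok` flag are hypothetical stand-ins, not any vendor's bootloader API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bank:
    version: int
    boots_ok: bool = True  # stand-in for "firmware starts and passes its self-test"


@dataclass
class Device:
    banks: dict = field(default_factory=dict)
    active: str = "A"              # last confirmed-good bank
    pending: Optional[str] = None  # bank to try on the next boot

    def stage_update(self, image: Bank, signature_valid: bool) -> bool:
        """Steps 1-3: write to the inactive bank, verify, set the boot flag."""
        if not signature_valid:
            return False  # never point the bootloader at unverified code
        inactive = "B" if self.active == "A" else "A"
        self.banks[inactive] = image
        self.pending = inactive
        return True

    def reboot(self) -> str:
        """Steps 4-6: try the pending bank once, then confirm it or fall back."""
        if self.pending is not None:
            trial, self.pending = self.pending, None
            if self.banks[trial].boots_ok:
                self.active = trial  # step 5: mark the new bank as active
            # step 6: otherwise the previously active bank is kept
        return self.active


dev = Device(banks={"A": Bank(version=1)})
dev.stage_update(Bank(version=2, boots_ok=False), signature_valid=True)
print(dev.reboot())  # faulty image: device stays on A
dev.stage_update(Bank(version=2), signature_valid=True)
print(dev.reboot())  # good image: device switches to B
```

The key design choice is that the active bank is only changed after a successful trial boot, so a failed update never leaves the device without a known-good image.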

25.7 Anti-Rollback Protection

Anti-rollback protection prevents attackers from flashing older, vulnerable firmware:

Diagram showing anti-rollback protection for OTA updates, illustrating how a monotonic counter stored in non-volatile memory (EEPROM or eFuse) prevents firmware downgrade attacks by rejecting any update with a version number lower than the stored minimum version.

Anti-Rollback Protection Mechanism
Common Mistake: Anti-Rollback Counters Stored in Volatile Memory

The Mistake: Developers store the anti-rollback counter in RAM or a variable that resets on reboot, believing the signature check alone prevents malicious firmware.

Flawed Implementation:

// Global variable (resets to 0 on every boot)
int currentFirmwareVersion = 0;

bool verifyFirmwareUpdate(byte* firmware, int newVersion) {
    // Check signature
    if (!verifySignature(firmware)) return false;

    // BROKEN: currentFirmwareVersion resets to 0 on reboot
    if (newVersion <= currentFirmwareVersion) {
        return false;  // Prevent rollback
    }

    currentFirmwareVersion = newVersion;
    return true;
}

Why This Fails:

  1. Attacker can downgrade after any reboot: Simply power-cycle the device and flash vulnerable firmware v1.0 (the counter has reset to 0, so the check 1 > 0 passes)
  2. Signature check insufficient: Old firmware v1.0 is still legitimately signed by the manufacturer, so signature verification passes
  3. Known vulnerabilities exploitable: Device can be forced back to firmware containing CVE-2023-1234, which a later release (v3.0 in the timeline below) patched

Attack Scenario (illustrative; the CVE identifier is a placeholder):

Timeline:
- Device ships with firmware v1.0 (no known vulnerabilities)
- Manufacturer releases v2.0 (adds features)
- Security researcher discovers CVE-2023-1234 in v1.0 (remote code execution)
- Manufacturer releases v3.0 (patches CVE-2023-1234)
- Attacker power-cycles device, flashes signed v1.0 firmware
- RAM counter resets to 0, so v1.0 > 0 check passes
- Attacker exploits CVE-2023-1234 to take full control

Correct Implementation (persistent EEPROM/NVS counter):

#include <Preferences.h>

Preferences prefs;
const char* ROLLBACK_KEY = "fw_min_ver";

bool verifyFirmwareUpdate(byte* firmware, int newVersion) {
    // Read minimum version from non-volatile storage
    prefs.begin("ota", false);
    int minVersion = prefs.getInt(ROLLBACK_KEY, 0);

    // Verify signature
    if (!verifySignature(firmware)) {
        prefs.end();
        return false;
    }

    // SECURE: Counter persists across reboots
    if (newVersion < minVersion) {
        Serial.printf("[ROLLBACK BLOCKED] New v%d < Min v%d\n",
                      newVersion, minVersion);
        prefs.end();
        return false;
    }

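    // Raise the minimum version only when it actually increases
    // (monotonic; also avoids needless flash writes). In production,
    // do this only after the new firmware has booted and been confirmed,
    // so a failed update can still fall back to the previous image.
    if (newVersion > minVersion) {
        prefs.putInt(ROLLBACK_KEY, newVersion);
    }
    prefs.end();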

    return true;
}

// One-time init at manufacturing
void burnAntiRollbackCounter(int initialVersion) {
    prefs.begin("ota", false);
    prefs.putInt(ROLLBACK_KEY, initialVersion);
    prefs.end();
}

Additional Hardening (eFuse for tamper-proof counter on ESP32):

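#include "esp_efuse.h"

// ESP-IDF encodes the secure version in eFuse as a count of burned bits,
// so the stored value can only increase. When app anti-rollback is
// enabled, the bootloader/OTA layer normally updates it once new firmware
// is marked valid; the direct calls below are for illustration only.

// Burn version into one-time-programmable eFuse
esp_err_t burnVersionToEFuse(uint32_t version) {
    return esp_efuse_update_secure_version(version);
}

uint32_t getMinVersionFromEFuse() {
    return esp_efuse_read_secure_version();
}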

Real-World Impact: A medical device manufacturer deployed 50,000 insulin pumps with volatile rollback counters. Security researchers discovered they could downgrade devices to vulnerable firmware by power-cycling during the update process. The FDA mandated a recall requiring hardware rework to add persistent counters in secure EEPROM. Cost: $8.5 million (field service visits + regulatory fines) vs. $0 to implement correctly during initial development.

Key Lesson: Anti-rollback counters MUST survive power loss, firmware corruption, and deliberate attacks. Store in non-volatile memory (EEPROM, NVS, or one-time-programmable eFuse for maximum security). Test by power-cycling device mid-update and verifying counter persists.

25.8 Case Study: Chrysler 2015 OTA Vulnerability

What Happened:

  • Researchers demonstrated remote control of Jeep Cherokee via cellular connection
  • Uconnect entertainment system had no authentication for incoming connections
  • No network segmentation between entertainment and vehicle control systems

The Fix:

  • Chrysler had to recall 1.4 million vehicles
  • USB-based firmware update (no OTA capability at the time)
  • Added network segmentation and authentication

Lesson: OTA update capability would have allowed rapid response without physical recalls.

Common Misconception: “We’ll Add Security Updates Later”

The Myth: “We can ship with basic update functionality and add security features in future firmware.”

The Reality:

  • If the bootloader doesn’t verify signatures, adding verification later doesn’t help - attackers can flash the old, unsigned bootloader
  • Anti-rollback counters must be in place from the start
  • Secure boot must be enabled at manufacturing time (eFuse burning)
  • Key rotation capability must be designed in from day one

The Truth: Security architecture decisions are made at design time. Retrofitting secure OTA is extremely difficult and often impossible without hardware recalls.

25.9 EU Cyber Resilience Act Requirements (2024)

The EU CRA mandates security update requirements for IoT products:

| Requirement | Description |
|---|---|
| Update Capability | Products must be designed for secure updates |
| Free Security Updates | For minimum 5 years or product lifetime |
| Vulnerability Disclosure | 24-hour notification to ENISA for exploited vulnerabilities |
| SBOM | Software Bill of Materials must be maintained |
| Default Security | Products secure out of the box |

25.10 Worked Example: Calculating OTA Update Costs and Risks

Scenario: A water utility manages 50,000 smart meters deployed across a city. A critical vulnerability is discovered that requires a firmware update. The meters connect via NB-IoT with 20 KB/s effective throughput. Current firmware is 256 KB.

Bandwidth and Time Analysis:

Firmware size: 256 KB
Delta update size: 48 KB (81% reduction using binary diff)
Full update per device: 256 KB / 20 KB/s = 12.8 seconds
Delta update per device: 48 KB / 20 KB/s = 2.4 seconds

Fleet-wide bandwidth (full): 50,000 x 256 KB = 12,500 MB = 12.2 GB
Fleet-wide bandwidth (delta): 50,000 x 48 KB = 2,344 MB = 2.3 GB

NB-IoT data cost: ~$0.01 per MB
Full update cost: 12,500 MB x $0.01 = $125
Delta update cost: 2,344 MB x $0.01 = $23.44
Annual savings from delta updates (4 updates/year): $406
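
The bandwidth and cost figures above can be reproduced with a short script, which also makes it easy to try other image sizes or tariffs. This is only a sketch of the arithmetic, using the scenario's stated inputs and 1 MB = 1,024 KB; the constant and function names are illustrative.

```python
FLEET = 50_000          # devices in the scenario
THROUGHPUT_KB_S = 20    # effective throughput stated in the scenario
COST_PER_MB = 0.01      # assumed data tariff in USD
UPDATES_PER_YEAR = 4


def update_cost(image_kb):
    """Per-device transfer time plus fleet-wide volume and cost for one update."""
    fleet_mb = FLEET * image_kb / 1024
    return {
        "seconds_per_device": image_kb / THROUGHPUT_KB_S,
        "fleet_mb": fleet_mb,
        "fleet_cost_usd": fleet_mb * COST_PER_MB,
    }


full = update_cost(256)   # full image
delta = update_cost(48)   # binary-diff update
annual_savings = (full["fleet_cost_usd"] - delta["fleet_cost_usd"]) * UPDATES_PER_YEAR

for name, r in (("full", full), ("delta", delta)):
    print(f"{name:<6} {r['seconds_per_device']:5.1f} s/device  "
          f"{r['fleet_mb']:9,.0f} MB  ${r['fleet_cost_usd']:,.2f}")
print(f"annual savings: ${annual_savings:,.2f}")
```

Note that the data tariff is a small part of the picture here; shorter radio-on time per device also matters for battery-powered meters.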

Staged Rollout Risk Calculation:

Assume the update has a 2% failure rate on hardware revision 3.1 (unknown before deployment). With 8,000 devices on rev 3.1 (16% of fleet):

| Rollout Strategy | Devices Affected Before Detection | Recovery Cost |
|---|---|---|
| 100% simultaneous | 160 devices (2% of 8,000 rev 3.1) | $16,000 (truck rolls at $100 each) |
| 1% canary, 24h pause | 2 devices (2% of 80 rev 3.1 in 1% sample) | $200 |
| 1% canary + HW-rev grouping | ~1 device (test rev 3.1 subset specifically) | $100 |

Key Insight: Staged rollout with hardware-revision-aware grouping reduces worst-case failure impact by 160x compared to simultaneous deployment.
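
The expected exposure behind this comparison is a product of four numbers, sketched below under the assumption that the canary sample contains the same 16% share of rev 3.1 hardware as the whole fleet. The function and constant names are illustrative.

```python
FLEET = 50_000
REV31_SHARE = 0.16     # 8,000 of 50,000 devices
FAILURE_RATE = 0.02    # applies to rev 3.1 hardware only
TRUCK_ROLL_USD = 100


def expected_exposure(rollout_fraction):
    """Expected failed devices and recovery cost for the first rollout wave."""
    failures = FLEET * rollout_fraction * REV31_SHARE * FAILURE_RATE
    return failures, failures * TRUCK_ROLL_USD


for label, fraction in (("100% simultaneous", 1.0), ("1% canary", 0.01)):
    failures, cost = expected_exposure(fraction)
    print(f"{label:<18} ~{failures:.1f} devices, ~${cost:,.0f}")
```

The canary case works out to a fractional expected value; the table rounds it up to whole devices.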

A/B Partition Flash Layout (real ESP32 example):

Address Range        Size    Content
0x001000 - 0x008FFF  32 KB   Bootloader (write-protected)
0x009000 - 0x00AFFF   8 KB   Partition table
0x00B000 - 0x00BFFF   4 KB   OTA data (which bank to boot)
0x00C000 - 0x01BFFF  64 KB   NVS (anti-rollback counter, config)
0x020000 - 0x1FFFFF  1.9 MB  App partition A (current firmware)
0x200000 - 0x3DFFFF  1.9 MB  App partition B (update target)
0x3E0000 - 0x3FFFFF 128 KB   Reserved (coredump, factory data)

Total flash required: 4 MB. The A/B scheme doubles firmware storage cost but greatly reduces the risk of bricking a device with a bad update.

25.11 Update Strategies for Large Fleets

25.11.1 Staged Rollout

| Stage | Percentage | Duration | Purpose |
|---|---|---|---|
| Canary | 1% | 48 hours | Detect critical bugs |
| Early Adopters | 10% | 1 week | Broader testing |
| Gradual | 25% | 2 weeks | Monitor for issues |
| Majority | 50% | 2 weeks | Wide deployment |
| Complete | 100% | Ongoing | Catch stragglers |

25.11.2 Update Scheduling

  • Maintenance Windows: Update during low-usage periods
  • Bandwidth Management: Limit concurrent downloads
  • Geographic Rollout: Start with specific regions
  • Device Grouping: Update by device type, firmware version, or customer

Run this Python code to simulate OTA update deployment across a device fleet. Compare simultaneous vs staged rollout strategies and see how staging catches bugs before fleet-wide impact.

import random

class OTASimulator:
    """Simulate OTA firmware rollout across a device fleet."""

    def __init__(self, fleet_size, hw_revisions):
        self.devices = []
        for i in range(fleet_size):
            rev = random.choice(hw_revisions)
            self.devices.append({
                "id": f"device-{i:05d}",
                "hw_rev": rev,
                "fw_version": "1.0.0",
                "status": "online",
            })

    def deploy_update(self, fw_version, failure_rates, strategy="simultaneous"):
        """Deploy update with per-hardware-revision failure rates."""
        results = {"success": 0, "failed": 0, "bricked": 0,
                   "not_updated": 0, "stages": []}

        if strategy == "simultaneous":
            targets = list(range(len(self.devices)))
            stage_result = self._update_batch(targets, fw_version,
                                               failure_rates)
            results["stages"].append(("All devices", stage_result))

        elif strategy == "staged":
            stages = [
                ("Canary (1%)", 0.01),
                ("Early adopters (10%)", 0.10),
                ("Gradual (25%)", 0.25),
                ("Majority (50%)", 0.50),
                ("Complete (100%)", 1.00),
            ]
            updated = set()
            abort = False
            for stage_name, pct in stages:
                if abort:
                    # Count the untouched devices once, not once per skipped stage
                    results["not_updated"] = len(self.devices) - len(updated)
                    results["stages"].append((stage_name, {"skipped": True,
                                              "reason": "Aborted"}))
                    continue

                target_count = int(len(self.devices) * pct)
                candidates = [i for i in range(len(self.devices))
                              if i not in updated]
                batch = candidates[:target_count - len(updated)]
                stage_result = self._update_batch(batch, fw_version,
                                                   failure_rates)
                updated.update(batch)
                results["stages"].append((stage_name, stage_result))

                # Abort if failures (rolled back + bricked) exceed the 1% threshold
                if stage_result["total"] >= 10:
                    bad = stage_result["failed"] + stage_result["bricked"]
                    fail_rate = bad / stage_result["total"]
                    if fail_rate > 0.01:
                        abort = True
                        results["stages"][-1][1]["abort_triggered"] = True

        # Tally results
        for _, sr in results["stages"]:
            if isinstance(sr, dict) and "skipped" not in sr:
                results["success"] += sr.get("success", 0)
                results["failed"] += sr.get("failed", 0)
                results["bricked"] += sr.get("bricked", 0)

        return results

    def _update_batch(self, indices, fw_version, failure_rates):
        result = {"success": 0, "failed": 0, "bricked": 0, "total": len(indices)}
        for i in indices:
            dev = self.devices[i]
            fail_rate = failure_rates.get(dev["hw_rev"], 0)
            if random.random() < fail_rate:
                if random.random() < 0.1:  # 10% of failures = brick
                    result["bricked"] += 1
                    dev["status"] = "bricked"
                else:
                    result["failed"] += 1
                    dev["fw_version"] = "1.0.0"  # Rollback
            else:
                result["success"] += 1
                dev["fw_version"] = fw_version
        return result

# Simulation setup
random.seed(42)
fleet_size = 10_000
hw_revisions = ["rev2.0", "rev2.1", "rev3.0", "rev3.1"]

# Bug: firmware 2.0 has 8% failure rate on rev3.1 hardware
failure_rates = {
    "rev2.0": 0.001,  # 0.1% - normal
    "rev2.1": 0.001,  # 0.1% - normal
    "rev3.0": 0.002,  # 0.2% - normal
    "rev3.1": 0.08,   # 8% - BUG on this hardware!
}

# Strategy 1: Simultaneous rollout
print("=== Strategy 1: Simultaneous Rollout ===")
sim1 = OTASimulator(fleet_size, hw_revisions)
r1 = sim1.deploy_update("2.0.0", failure_rates, strategy="simultaneous")
print(f"  Success: {r1['success']:,}")
print(f"  Failed (rolled back): {r1['failed']:,}")
print(f"  Bricked: {r1['bricked']:,}")
truck_cost_1 = r1["bricked"] * 100  # $100 per truck roll
print(f"  Truck roll cost: ${truck_cost_1:,}")

# Strategy 2: Staged rollout
print("\n=== Strategy 2: Staged Rollout (1% abort threshold) ===")
sim2 = OTASimulator(fleet_size, hw_revisions)
r2 = sim2.deploy_update("2.0.0", failure_rates, strategy="staged")
for stage_name, sr in r2["stages"]:
    if isinstance(sr, dict) and "skipped" not in sr:
        bad = sr["failed"] + sr["bricked"]  # count and percentage now agree
        fail_pct = bad / max(sr["total"], 1) * 100
        abort_msg = " ** ABORT TRIGGERED **" if sr.get("abort_triggered") else ""
        print(f"  {stage_name}: {sr['total']} devices, "
              f"{bad} failed or bricked ({fail_pct:.1f}%){abort_msg}")
    elif isinstance(sr, dict):
        print(f"  {stage_name}: SKIPPED ({sr.get('reason', '')})")

print(f"\n  Total success: {r2['success']:,}")
print(f"  Total failed: {r2['failed']:,}")
print(f"  Total bricked: {r2['bricked']:,}")
print(f"  Not updated (rollout aborted): {r2['not_updated']:,}")
truck_cost_2 = r2["bricked"] * 100
print(f"  Truck roll cost: ${truck_cost_2:,}")

# Comparison
print(f"\n--- Comparison ---")
print(f"{'Metric':<30} {'Simultaneous':>15} {'Staged':>15}")
print(f"{'Bricked devices':<30} {r1['bricked']:>15,} {r2['bricked']:>15,}")
print(f"{'Truck roll cost':<30} {'$'+str(truck_cost_1):>15} {'$'+str(truck_cost_2):>15}")
saved = truck_cost_1 - truck_cost_2
print(f"{'Savings from staging':<30} {'':>15} {'$'+str(saved):>15}")

What to Observe:

  • Simultaneous rollout deploys the buggy firmware to all 10,000 devices – bricking ~20 devices on rev3.1 hardware
  • Staged rollout detects the high failure rate in the canary/early adopter phase and aborts before reaching the majority
  • The abort threshold (1% failure rate) catches the rev3.1 bug early, protecting thousands of devices
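  • When the abort triggers in an early stage, staged rollout typically reduces truck roll costs by 80% or more compared to simultaneous deployment (exact figures vary with the random seed)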

Scenario: You’re designing a LoRa soil moisture sensor that will be deployed in 5,000 agricultural fields across remote regions. The device must support OTA firmware updates to fix bugs and add features over its 10-year lifespan. The microcontroller has only 1 MB of flash memory. Design an A/B partition scheme that allows safe updates while minimizing flash usage.

Constraints:

  • Total flash: 1 MB (1,048,576 bytes = 1,024 KB)
  • Current firmware size: 384 KB
  • Bootloader: 32 KB
  • Configuration/NVS: 16 KB
  • OTA metadata: 4 KB
  • Anti-rollback counter: 4 KB
  • Requirement: Must support full firmware rollback

Flash Layout Design:

Address         Size      Purpose                        Justification
0x000000        32 KB     Bootloader (write-protected)   Immutable, verifies A/B partitions
0x008000        4 KB      Partition table                Maps A/B locations
0x009000        4 KB      OTA selection                  Flags active partition (A or B)
0x00A000        4 KB      Anti-rollback counter          Prevents downgrade attacks
0x00B000        16 KB     NVS (config, calibration)      Persistent across updates
0x00F000        448 KB    Application A (current)        Primary firmware slot
0x07F000        448 KB    Application B (update target)  Update download slot
0x0EF000        68 KB     (Reserved for growth)          Future firmware expansion

Calculations:

Total allocated:
32 KB (bootloader) + 4 KB (partition table) + 4 KB (OTA select) + 4 KB (anti-rollback) + 16 KB (NVS)
+ 448 KB (App A) + 448 KB (App B) + 68 KB (reserved) = 1,024 KB

Remaining: 1,024 KB - 1,024 KB = 0 KB (fully utilized with reserved block as safety margin)

Firmware growth headroom:
Current: 384 KB -> Allocated: 448 KB -> Headroom: 64 KB (17% growth capacity)
Can accommodate firmware up to 448 KB before needing hardware redesign
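
A flash map like this is easy to get wrong by one sector, so it is worth checking mechanically. The sketch below, with a hypothetical `check_layout` helper, verifies that the regions in the table are contiguous, fill the 1 MB part exactly, and leave the stated headroom.

```python
KB = 1024
FLASH_SIZE = 1024 * KB

# (start address, size in KB, name), copied from the layout table
LAYOUT = [
    (0x000000, 32, "bootloader"),
    (0x008000, 4, "partition table"),
    (0x009000, 4, "OTA selection"),
    (0x00A000, 4, "anti-rollback counter"),
    (0x00B000, 16, "NVS"),
    (0x00F000, 448, "application A"),
    (0x07F000, 448, "application B"),
    (0x0EF000, 68, "reserved"),
]


def check_layout(layout, flash_size):
    """Return a list of problems; an empty list means the map is consistent."""
    problems = []
    expected_start = 0
    for start, size_kb, name in layout:
        if start != expected_start:
            problems.append(
                f"{name}: starts at {start:#08x}, expected {expected_start:#08x}")
        expected_start = start + size_kb * KB
    if expected_start != flash_size:
        problems.append(
            f"layout ends at {expected_start:#08x}, flash ends at {flash_size:#08x}")
    return problems


print(check_layout(LAYOUT, FLASH_SIZE) or "layout OK")

firmware_kb, slot_kb = 384, 448
headroom_kb = slot_kb - firmware_kb
print(f"headroom: {headroom_kb} KB ({headroom_kb / firmware_kb:.0%})")
```

Running a check like this in the build keeps the partition table and the documentation from drifting apart as the firmware grows.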

Update Process Flow:

  1. Download Phase:
    • Bootloader verifies current active partition (A)
    • Download new firmware to inactive partition (B): a full 448 KB slot @ 100 bytes/sec LoRa ≈ 4,588 s ≈ 1.27 hours
    • Calculate SHA-256 hash: hash(firmware_B) = 0xABCD...
  2. Verification Phase:
    • Verify RSA-2048 signature: 10 seconds on low-power MCU
    • Check anti-rollback counter: new_version >= stored_version (e.g., 23 >= 22)
    • Validate firmware size: 384 KB < 448 KB maximum
  3. Commit Phase:
    • Set OTA selection flag to boot from B
    • Reboot device
  4. First Boot from B:
    • Bootloader verifies signature of partition B
    • If boot succeeds for 5 minutes: Mark B as “confirmed” in OTA selection, then increment the anti-rollback counter (22 -> 23) in non-volatile storage
    • If boot fails (watchdog timeout, crash): Automatic rollback to A (still permitted, because the counter has not yet advanced)

Flash Wear Leveling Consideration:

Typical flash endurance: 10,000 erase cycles
OTA updates per year: 4 (quarterly security patches)
Years until sector wear-out: 10,000 cycles / 4 updates/year = 2,500 years

Anti-rollback counter writes:
4 updates/year x 10 years = 40 writes
Flash endurance: 10,000 writes -> Safe (99.6% lifespan remaining)

Key Design Decision: Why 448 KB per partition instead of 512 KB?

  • 512 KB x 2 = 1,024 KB leaves only 0 KB for bootloader, NVS, and metadata (impossible)
  • 448 KB x 2 = 896 KB leaves 128 KB for critical components (comfortable margin)
  • 17% firmware growth headroom balances future-proofing with current constraints

Real-World Outcome: With this design, the soil sensor deployment achieved 99.7% successful update rate across 5,000 devices over 3 years. The A/B rollback saved 42 devices from bricking when a buggy firmware was deployed (detected crashes, automatic rollback to partition A). The anti-rollback counter prevented 8 attempted downgrade attacks where compromised devices tried to revert to vulnerable firmware versions.

| Criterion | Full Image Update | Delta (Binary Diff) Update | Modular Component Update | Dual-Boot (A/B) Required? |
|---|---|---|---|---|
| Bandwidth Usage | High (100% firmware size) | Low (5-30% of full size) | Very Low (single module) | N/A (orthogonal choice) |
| Flash Requirements | 2x firmware size | 1.5x firmware size + patch buffer | 1.2x firmware size | Yes for all strategies |
| Update Complexity | Simple (replace entire image) | Complex (compute diff, apply patch) | Very Complex (dependency management) | Bootloader complexity +40% |
| Rollback Capability | Easy with dual-boot | Difficult (must reverse patch) | Module-specific only | Essential |
| Update Time | Slow (full download) | Fast (small diff) | Very Fast (one module) | N/A |
| Risk of Bricking | Medium (all-or-nothing) | High (partial patch failure) | Low (isolated module) | Dual-boot reduces 90% |
| Best For | First deployment, major version changes | Frequent small updates, bandwidth-limited | Microservices architecture, plugin systems | All production systems |

Decision Tree:

  1. Is device flash <512 KB?
    • YES -> Use Delta Updates (full image won’t fit in A/B partitions)
    • NO -> Continue to step 2
  2. Is network bandwidth expensive (cellular, satellite, LoRa)?
    • YES -> Use Delta Updates (save on data costs)
    • NO -> Continue to step 3
  3. Can device tolerate 10+ minute update downtime?
    • YES -> Full Image Update acceptable
    • NO -> Use Delta Updates (faster)
  4. Does firmware have plugin/module architecture?
    • YES -> Consider Modular Updates for flexible deployment
    • NO -> Use Full Image or Delta

Dual-Boot (A/B) Recommendation: ALWAYS use dual-boot for production devices, regardless of update strategy. The cost (2x flash, 1 day dev time) is negligible compared to bricking prevention.

Example Applications:

  • Smart meter (cellular, 2 MB flash, 10-year lifespan): Delta + Dual-Boot
    • Bandwidth cost: $0.01/MB -> Delta saves $0.08/update x 40 updates = $3.20/device x 10M devices = $32M savings
  • Wi-Fi camera (fast network, 4 MB flash, frequent updates): Full Image + Dual-Boot
    • Update frequency: Monthly -> Delta complexity not justified for <2 min download time
  • Industrial PLC (modular firmware, 16 MB flash, uptime-critical): Modular + Dual-Boot
    • Can update security modules independently without rebooting entire control system

Update success rate with automatic rollback capability

\[P_{\text{success}} = P_{\text{download}} \times P_{\text{verify}} \times P_{\text{install}} \times (1 - P_{\text{rollback}})\]

Working through an example:

Given: 50,000 smart meter deployment over NB-IoT with 256 KB firmware update

Step 1: Download success probability \(P_{\text{download}} = 1 - (0.02)^3 = 0.999992\) (up to 3 attempts, each failing independently with probability 2%)

Step 2: Signature verification (deterministic) \(P_{\text{verify}} = 1.0\) (ECDSA-P256 verification is deterministic)

Step 3: Flash write success \(P_{\text{install}} = 0.998\) (99.8% flash reliability)

Step 4: Rollback probability (critical bugs) \(P_{\text{rollback}} = 0.023\) (2.3% failure rate from example)

Step 5: Calculate overall success \(P_{\text{success}} = 0.999992 \times 1.0 \times 0.998 \times (1 - 0.023) \approx 0.97504\)

Step 6: Fleet-wide expected successful updates \(\text{Success count} = 50{,}000 \times 0.97504 \approx 48{,}752\) devices

Result: About 48,752 devices (97.50%) successfully update, with about 1,148 automatic rollbacks preventing bricked devices and about 100 (mostly flash-write failures) requiring manual intervention.

In practice: Without A/B partitioning and automatic rollback, the 2.3% failure rate would brick roughly 1,150 devices requiring field service (about $115,000 at $100/visit). Rollback recovers about 92% of failure cases.
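
Multiplying several probabilities by hand invites rounding slips, so it can help to recompute the chain directly from the stated inputs. The sketch below does that, assuming the stages are independent and that devices which fail to download or flash are the ones needing manual intervention; the variable names are illustrative.

```python
FLEET = 50_000
ATTEMPTS = 3
P_ATTEMPT_FAIL = 0.02   # per-attempt download failure probability

p_download = 1 - P_ATTEMPT_FAIL ** ATTEMPTS
p_verify = 1.0          # deterministic signature check on a genuine image
p_install = 0.998       # flash write reliability
p_rollback = 0.023      # share of installed updates that fail and roll back

# Probability that a device downloads, verifies, and flashes the image
p_reach_boot = p_download * p_verify * p_install
p_success = p_reach_boot * (1 - p_rollback)

successes = FLEET * p_success
rollbacks = FLEET * p_reach_boot * p_rollback
manual = FLEET - successes - rollbacks

print(f"P_success = {p_success:.4f}")
print(f"successful: {successes:,.0f}")
print(f"rolled back automatically: {rollbacks:,.0f}")
print(f"manual intervention: {manual:,.0f}")
print(f"share of failures recovered by rollback: {rollbacks / (rollbacks + manual):.0%}")
```

Splitting the failures this way shows which lever matters most: here nearly all failures come from the rollback term, which is exactly the case A/B partitioning is designed to absorb.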

25.12 Concept Relationships

How OTA Update Security Concepts Connect
| Core Concept | Builds On | Enables | Protects Against |
|---|---|---|---|
| Code Signing | Public key cryptography | Firmware authenticity verification | Malicious firmware installation |
| A/B Partitioning | Flash memory management | Safe rollback on failure | Device bricking from bad updates |
| Anti-Rollback Counters | OTP fuses or NVS | Version enforcement | Downgrade attacks to vulnerable versions |
| Staged Rollout | Fleet management | Early bug detection | Fleet-wide outages from buggy firmware |
| Delta Updates | Binary diff algorithms | Bandwidth efficiency | High data costs on cellular networks |

Integration Pattern: All five mechanisms work together – code signing verifies authenticity, A/B partitioning enables safe installation, anti-rollback prevents downgrade, staged rollout limits blast radius, and delta updates reduce operational costs.

25.13 See Also

Related Security Topics:

Industry Standards:

  • NIST SP 800-147B: BIOS Protection Guidelines (applicable to embedded bootloaders)
  • EU Cyber Resilience Act: Mandates secure OTA update capability for IoT products
  • ETSI EN 303 645: IoT security baseline requirements including software updates

Real-World Implementations:

  • ESP-IDF Secure Boot V2 documentation (practical OTA examples)
  • Android Verified Boot (A/B partition scheme inspiration)
  • Tesla OTA update system (automotive best practices)

Common Pitfalls

Sending firmware updates in cleartext without cryptographic signing allows network-positioned attackers to perform a man-in-the-middle attack, replacing the legitimate firmware with malicious code. Always sign firmware and verify the signature on device before applying.

Development signing keys stored on developer workstations are high risk for compromise. Use separate, HSM-protected production signing keys with strict access controls, and revoke development keys before devices leave the lab.

Without rollback protection, an attacker who can send OTA updates can downgrade devices to vulnerable firmware versions for which exploits are known. Implement a hardware monotonic counter that prevents rolling back to versions with lower version numbers.

IoT devices in the field often have intermittent, low-bandwidth, and high-latency connectivity. OTA designs that assume reliable high-bandwidth connections will fail to update devices in poor network conditions. Design for interrupted downloads, partial updates, and resumable transfers.
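
One way to design for interrupted downloads is to fetch the image in small chunks, always resume from the last byte received, and check the hash of the whole image at the end. The following is a minimal sketch of that idea; `fetch_chunk` is a hypothetical transport callback, and a real device would persist the offset and partial image in flash so that progress survives a reboot.

```python
import hashlib


def download_resumable(fetch_chunk, total_size, expected_sha256,
                       chunk_size=256, max_retries=5):
    """Fetch an image chunk by chunk, resuming after each dropped request.

    fetch_chunk(offset, length) returns bytes or raises ConnectionError.
    """
    received = bytearray()
    retries = 0
    while len(received) < total_size:
        length = min(chunk_size, total_size - len(received))
        try:
            received += fetch_chunk(len(received), length)
            retries = 0                      # progress made, reset the budget
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise                        # give up; caller can retry later
    if hashlib.sha256(received).hexdigest() != expected_sha256:
        raise ValueError("hash mismatch: image corrupted or tampered with")
    return bytes(received)


# Demo with a simulated link that drops every third request
IMAGE = bytes(range(256)) * 4                # 1 KB stand-in for a firmware image
calls = {"n": 0}


def flaky_fetch(offset, length):
    calls["n"] += 1
    if calls["n"] % 3 == 0:
        raise ConnectionError("link dropped")
    return IMAGE[offset:offset + length]


data = download_resumable(flaky_fetch, len(IMAGE), hashlib.sha256(IMAGE).hexdigest())
print(data == IMAGE)
```

The final hash check guards against corruption during transfer; it complements, and does not replace, signature verification before the image is installed.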

25.14 Summary

This chapter covered OTA update security:

  • Code Signing: ECDSA/RSA signatures ensure only authenticated firmware executes
  • A/B Partitioning: Dual-bank schemes enable reliable updates with rollback
  • Anti-Rollback: Monotonic counters in non-volatile memory prevent downgrade attacks
  • Staged Rollout: Gradual deployment catches issues before fleet-wide impact
  • Regulatory Compliance: EU CRA mandates secure update capability for 5+ years

25.15 Knowledge Check

25.16 What’s Next

The next chapter explores Hardware Vulnerabilities including hardware trojans, side-channel attacks, and supply chain security risks that threaten IoT devices at the physical level.
