219  State Machine Design Patterns for IoT

219.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Apply Connection Patterns: Design state machines for network connection management
  • Implement Duty Cycling: Create power-efficient sampling state machines
  • Design Safety Systems: Build actuator control with multiple protection layers
  • Choose Appropriate Patterns: Select the right state machine pattern for different IoT scenarios

219.2 Prerequisites

Before diving into this chapter, you should be familiar with:

219.3 Introduction

State machine patterns are reusable solutions to common IoT design challenges. Rather than designing from scratch, experienced developers apply proven patterns that already account for edge cases and failure modes. This chapter presents three essential patterns for IoT applications: connection management, duty-cycled sensor sampling, and actuator safety.

Time: ~15 min | Difficulty: Intermediate | Unit: P04.FSM.U03

219.4 Pattern 1: Connection State Machine

A common pattern for IoT devices managing network connections:

%% fig-alt: State diagram for IoT device network connection showing states for disconnected, connecting, connected, and reconnecting with appropriate retry logic and backoff timers
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#7F8C8D'}}}%%
stateDiagram-v2
    [*] --> DISCONNECTED

    DISCONNECTED --> CONNECTING : connect()
    CONNECTING --> CONNECTED : onConnect
    CONNECTING --> DISCONNECTED : onFail [retries exhausted]
    CONNECTING --> CONNECTING : onFail [retries remain]

    CONNECTED --> DISCONNECTED : onDisconnect
    CONNECTED --> RECONNECTING : onConnectionLost

    RECONNECTING --> CONNECTED : onReconnect
    RECONNECTING --> DISCONNECTED : timeout

    note right of CONNECTING : Exponential backoff
    note right of RECONNECTING : Faster retry

219.4.1 Key Features

| Feature | Implementation | Benefit |
|---|---|---|
| Exponential Backoff | Double wait time after each failure | Prevents network congestion |
| Retry Limits | Maximum attempts before giving up | Avoids infinite loops |
| Fast Reconnect | Shorter delays for lost connections | Maintains user experience |
| Connection State Tracking | Clear CONNECTED/DISCONNECTED states | Reliable data transmission |
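
The diagram and features above can be sketched as a transition table plus one guarded transition. This is a minimal illustration rather than a reference implementation: the class name `ConnectionFsm`, the `handle` method, and the choice to silently ignore invalid events are assumptions, and the actual network calls are left out.

```python
from enum import Enum, auto


class Conn(Enum):
    DISCONNECTED = auto()
    CONNECTING = auto()
    CONNECTED = auto()
    RECONNECTING = auto()


# (state, event) -> next state, for the transitions that carry no guard.
CONN_TRANSITIONS = {
    (Conn.DISCONNECTED, "connect"): Conn.CONNECTING,
    (Conn.CONNECTING, "onConnect"): Conn.CONNECTED,
    (Conn.CONNECTED, "onDisconnect"): Conn.DISCONNECTED,
    (Conn.CONNECTED, "onConnectionLost"): Conn.RECONNECTING,
    (Conn.RECONNECTING, "onReconnect"): Conn.CONNECTED,
    (Conn.RECONNECTING, "timeout"): Conn.DISCONNECTED,
}


class ConnectionFsm:
    """Hypothetical wrapper around the table; no real networking is done."""

    def __init__(self, max_retries=5):
        self.state = Conn.DISCONNECTED
        self.max_retries = max_retries
        self.retries = 0

    def handle(self, event):
        """Apply one event and return the resulting state."""
        if self.state is Conn.CONNECTING and event == "onFail":
            if self.retries < self.max_retries:
                # Guard [retries remain]: stay in CONNECTING and try again.
                self.retries += 1
            else:
                # Guard [retries exhausted]: give up.
                self.state = Conn.DISCONNECTED
        else:
            # Assumption: events not valid in the current state are ignored.
            self.state = CONN_TRANSITIONS.get((self.state, event), self.state)
        if self.state is not Conn.CONNECTING:
            self.retries = 0
        return self.state
```

Keeping the unguarded transitions in a dictionary makes it easy to compare the code against the diagram line by line; only the `onFail` self-loop needs explicit logic because it depends on the retry counter.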

219.4.2 Implementation Considerations

  1. Backoff Strategy: Start with 1 second, double each retry, cap at 60 seconds
  2. Retry Limits: 5 retries for initial connection, 10 for reconnection
  3. Connection Monitoring: Heartbeat or ping to detect silent failures
  4. State Persistence: Remember last-known-good credentials
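
The backoff strategy in item 1 can be sketched as a short function. The function names and parameters are illustrative assumptions; the optional jitter helper is an addition that the list above does not mention, included because randomizing delays is a common companion to exponential backoff.

```python
import random


def backoff_delays(max_retries=5, base_s=1.0, cap_s=60.0):
    """Return the wait before each retry: base, doubled each time, capped."""
    return [min(cap_s, base_s * 2 ** attempt) for attempt in range(max_retries)]


def with_jitter(delay_s, rng=random):
    """Optional addition (not from the text): random wait in [0, delay_s]."""
    return rng.uniform(0.0, delay_s)


print(backoff_delays(5))   # initial connection: 5 retries
print(backoff_delays(10))  # reconnection: 10 retries, later ones hit the cap
```

With the values from the list, five retries wait 1, 2, 4, 8, and 16 seconds; with ten retries the seventh and later waits are held at the 60-second cap.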

219.4.3 Example Use Cases

  • MQTT client connections
  • Wi-Fi station mode
  • Cellular modem management
  • WebSocket connections
  • BLE central device pairing

219.5 Pattern 2: Sensor Sampling State Machine

Efficient power management through state-based duty cycling:

%% fig-alt: Sensor sampling state machine showing cycle between deep sleep, wake, sample, process, transmit, and back to sleep states with timing annotations
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#7F8C8D'}}}%%
stateDiagram-v2
    [*] --> DEEP_SLEEP

    DEEP_SLEEP --> WAKING : timer/interrupt
    WAKING --> SAMPLING : peripherals ready
    SAMPLING --> PROCESSING : sample complete
    PROCESSING --> TRANSMITTING : data ready [connected]
    PROCESSING --> BUFFERING : data ready [disconnected]
    TRANSMITTING --> DEEP_SLEEP : tx complete
    BUFFERING --> DEEP_SLEEP : buffer stored

    note right of DEEP_SLEEP : 10uA
    note right of SAMPLING : 50mA
    note right of TRANSMITTING : 150mA

219.5.1 Power Consumption by State

| State | Current Draw | Duration | Charge per Cycle |
|---|---|---|---|
| DEEP_SLEEP | 10 uA | 60 seconds | 0.6 mAs |
| WAKING | 20 mA | 100 ms | 2 mAs |
| SAMPLING | 50 mA | 200 ms | 10 mAs |
| PROCESSING | 30 mA | 50 ms | 1.5 mAs |
| TRANSMITTING | 150 mA | 500 ms | 75 mAs |
| Total per cycle | - | ~61 seconds | ~89 mAs |
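
The totals in the table follow from multiplying each state's current by its duration. The sketch below repeats that arithmetic and extends it to an average current and a rough battery-life estimate. The 2000 mAh battery capacity is an assumption chosen only for illustration, and the estimate ignores self-discharge, regulator losses, and temperature effects.

```python
# (state, current in mA, duration in s), taken from the table above.
CYCLE = [
    ("DEEP_SLEEP", 0.010, 60.0),
    ("WAKING", 20.0, 0.100),
    ("SAMPLING", 50.0, 0.200),
    ("PROCESSING", 30.0, 0.050),
    ("TRANSMITTING", 150.0, 0.500),
]

charge_mas = sum(current_ma * seconds for _, current_ma, seconds in CYCLE)
period_s = sum(seconds for _, _, seconds in CYCLE)
average_ma = charge_mas / period_s

# Assumption for illustration only: an ideal 2000 mAh battery.
BATTERY_MAH = 2000.0
lifetime_days = BATTERY_MAH / average_ma / 24.0

print(f"charge per cycle: {charge_mas:.1f} mAs")
print(f"cycle period:     {period_s:.2f} s")
print(f"average current:  {average_ma:.2f} mA")
print(f"estimated life:   {lifetime_days:.0f} days")
```

Under these assumptions the average current is about 1.46 mA, which gives roughly 57 days on the assumed battery. Transmission accounts for 75 of the 89.1 mAs per cycle, so reducing how often the radio is used has the largest effect on battery life.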

219.5.2 Implementation Considerations

  1. Wake Source: RTC timer vs. external interrupt vs. both
  2. Peripheral Initialization: Lazy vs. eager initialization
  3. Buffering Strategy: Store locally when network unavailable
  4. Sample Aggregation: Reduce transmissions by batching data
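
One wake-to-sleep cycle, including the buffering and batching ideas in items 3 and 4, might look like the sketch below. The function `run_cycle`, its parameters, and the policy of sending the whole backlog on the next connected cycle are assumptions; hardware steps such as entering deep sleep are stubbed out so the control flow can be followed on its own.

```python
from collections import deque


def run_cycle(read_sensor, connected, buffer):
    """Run one cycle; return (states visited, samples sent this cycle).

    read_sensor is a caller-supplied function and buffer is a deque that
    persists between cycles. Both are stand-ins for real drivers/storage.
    """
    visited = ["WAKING", "SAMPLING"]
    sample = read_sensor()
    visited.append("PROCESSING")
    buffer.append(sample)
    if connected:
        visited.append("TRANSMITTING")
        sent = list(buffer)  # assumed policy: send the backlog as one batch
        buffer.clear()
    else:
        visited.append("BUFFERING")
        sent = []            # nothing leaves the device while offline
    visited.append("DEEP_SLEEP")
    return visited, sent


# A bounded deque drops the oldest sample once full (assumed policy).
store = deque(maxlen=100)
print(run_cycle(lambda: 21.5, connected=False, buffer=store))
print(run_cycle(lambda: 22.0, connected=True, buffer=store))
```

Using a bounded buffer keeps memory use predictable during long outages, at the cost of losing the oldest samples; a real device would choose that limit from its available storage.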

219.5.3 Example Use Cases

  • Environmental monitoring stations
  • Agricultural soil sensors
  • Asset tracking devices
  • Wildlife monitoring collars
  • Smart meter endpoints

219.6 Pattern 3: Actuator Safety State Machine

Safety-critical actuator control with multiple protection layers:

%% fig-alt: Safety state machine for actuator control showing normal operation, warning, and emergency states with failsafe transitions for motor or valve control
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#7F8C8D'}}}%%
stateDiagram-v2
    [*] --> SAFE_OFF

    state "Normal Operation" as normal {
        SAFE_OFF --> STARTING : start [all checks pass]
        STARTING --> RUNNING : ready
        RUNNING --> STOPPING : stop
        STOPPING --> SAFE_OFF : stopped
    }

    state "Warning Zone" as warning {
        RUNNING --> LIMITING : threshold exceeded
        LIMITING --> RUNNING : back to normal
        LIMITING --> EMERGENCY_STOP : critical threshold
    }

    state "Emergency" as emergency {
        EMERGENCY_STOP --> LOCKOUT : estop confirmed
        LOCKOUT --> SAFE_OFF : admin reset
    }

    RUNNING --> EMERGENCY_STOP : emergency button
    STARTING --> EMERGENCY_STOP : fault detected

219.6.1 Safety Layers

| Layer | Trigger | Response | Recovery |
|---|---|---|---|
| Normal | Operator command | Controlled start/stop | Automatic |
| Limiting | Threshold exceeded | Reduce power/speed | Automatic when safe |
| Emergency Stop | Critical fault or button | Immediate halt | Requires admin |
| Lockout | Confirmed emergency | Disable actuator | Manual reset only |

219.6.2 Implementation Considerations

  1. Fail-Safe Default: Always default to SAFE_OFF on unknown state
  2. Watchdog Timer: Force EMERGENCY_STOP if state machine hangs
  3. Dual-Channel Input: Redundant sensors for critical measurements
  4. Audit Logging: Record all state transitions with timestamps
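
The fail-safe default and audit logging items can be illustrated with a table-driven sketch of the diagram above. The names `ActuatorFsm`, `handle`, and `checks_pass`, and the event spellings, are assumptions made for this example. The watchdog timer and dual-channel inputs are hardware-specific and are not modeled here.

```python
import time
from enum import Enum


class Act(Enum):
    SAFE_OFF = "SAFE_OFF"
    STARTING = "STARTING"
    RUNNING = "RUNNING"
    STOPPING = "STOPPING"
    LIMITING = "LIMITING"
    EMERGENCY_STOP = "EMERGENCY_STOP"
    LOCKOUT = "LOCKOUT"


ACTUATOR_TRANSITIONS = {
    (Act.SAFE_OFF, "start"): Act.STARTING,  # guarded by checks_pass below
    (Act.STARTING, "ready"): Act.RUNNING,
    (Act.STARTING, "fault_detected"): Act.EMERGENCY_STOP,
    (Act.RUNNING, "stop"): Act.STOPPING,
    (Act.RUNNING, "threshold_exceeded"): Act.LIMITING,
    (Act.RUNNING, "emergency_button"): Act.EMERGENCY_STOP,
    (Act.STOPPING, "stopped"): Act.SAFE_OFF,
    (Act.LIMITING, "back_to_normal"): Act.RUNNING,
    (Act.LIMITING, "critical_threshold"): Act.EMERGENCY_STOP,
    (Act.EMERGENCY_STOP, "estop_confirmed"): Act.LOCKOUT,
    (Act.LOCKOUT, "admin_reset"): Act.SAFE_OFF,
}


class ActuatorFsm:
    """Hypothetical controller skeleton; it does not drive real hardware."""

    def __init__(self, clock=time.time):
        self.state = Act.SAFE_OFF
        self.audit_log = []  # (timestamp, from_state, event, to_state)
        self._clock = clock

    def handle(self, event, checks_pass=True):
        if not isinstance(self.state, Act):
            # Fail-safe default: an unknown or corrupted state goes to SAFE_OFF.
            nxt = Act.SAFE_OFF
        elif event == "start" and not checks_pass:
            nxt = self.state  # guard [all checks pass] failed
        else:
            nxt = ACTUATOR_TRANSITIONS.get((self.state, event), self.state)
        if nxt is not self.state:
            self.audit_log.append((self._clock(), self.state, event, nxt))
        self.state = nxt
        return nxt
```

Because LOCKOUT has only one outgoing entry in the table, a `start` command received in that state leaves the machine where it is; only `admin_reset` returns it to SAFE_OFF.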

219.6.3 Example Use Cases

  • Industrial motor control
  • HVAC damper actuators
  • Automated valve systems
  • Robotic arm controllers
  • Medical pump controllers

219.7 Choosing the Right Pattern

%% fig-alt: Decision tree for selecting appropriate state machine pattern based on IoT device requirements
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#7F8C8D'}}}%%
flowchart TD
    A["What is the primary concern?"] --> B["Network reliability"]
    A --> C["Power consumption"]
    A --> D["Safety/reliability"]

    B --> B1["Connection Pattern"]
    B1 --> B2["Add exponential backoff<br/>Track connection state<br/>Handle reconnection"]

    C --> C1["Sampling Pattern"]
    C1 --> C2["Duty cycle with sleep<br/>Batch transmissions<br/>Buffer when offline"]

    D --> D1["Safety Pattern"]
    D1 --> D2["Multiple protection layers<br/>Fail-safe defaults<br/>Lockout states"]

    style A fill:#2C3E50,stroke:#16A085,stroke-width:2px,color:#fff
    style B1 fill:#16A085,stroke:#2C3E50,stroke-width:2px,color:#fff
    style C1 fill:#E67E22,stroke:#2C3E50,stroke-width:2px,color:#fff
    style D1 fill:#7F8C8D,stroke:#2C3E50,stroke-width:2px,color:#fff

219.7.1 Pattern Combinations

Many real-world IoT devices combine multiple patterns:

  1. Smart Thermostat: Connection + Safety patterns
  2. Remote Sensor Node: Sampling + Connection patterns
  3. Industrial Controller: All three patterns

When combining patterns, use hierarchical state machines to keep the design manageable.
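
One way to picture a hierarchical arrangement is a parent machine that owns several flat sub-machines and handles device-wide events once. The sketch below is an assumption-heavy illustration: the `Child` and `Device` classes, the OPERATIONAL and FAULT super-states, and the `fault` and `reset` events are all hypothetical and do not appear in the patterns above.

```python
class Child:
    """Flat sub-machine: a transition table plus a current state."""

    def __init__(self, initial, table):
        self.initial = initial
        self.state = initial
        self.table = table  # (state, event) -> next state

    def handle(self, event):
        self.state = self.table.get((self.state, event), self.state)
        return self.state


class Device:
    """Parent machine; its OPERATIONAL super-state owns the children."""

    def __init__(self, children):
        self.state = "OPERATIONAL"
        self.children = children  # name -> Child

    def handle(self, event):
        if event == "fault":
            # Handled once at the parent level, whatever the children are doing.
            self.state = "FAULT"
            for child in self.children.values():
                child.state = child.initial
        elif event == "reset" and self.state == "FAULT":
            self.state = "OPERATIONAL"
        elif self.state == "OPERATIONAL":
            # Children only see events while the parent is OPERATIONAL.
            for child in self.children.values():
                child.handle(event)
        return self.state


connection = Child("DISCONNECTED", {("DISCONNECTED", "connect"): "CONNECTING"})
sampling = Child("DEEP_SLEEP", {("DEEP_SLEEP", "timer"): "WAKING"})
node = Device({"connection": connection, "sampling": sampling})
node.handle("connect")
node.handle("timer")
print(node.state, connection.state, sampling.state)
```

The benefit is that the fault transition is written once in the parent instead of being repeated in every state of every sub-machine.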

219.8 Summary

State machine patterns provide battle-tested solutions for common IoT challenges:

  • Connection Pattern: Manages network reliability with backoff and retry logic
  • Sampling Pattern: Optimizes power consumption through duty cycling
  • Safety Pattern: Ensures reliable actuator control with protection layers

Applying these patterns reduces development time and improves reliability, because each one already encodes failure handling (retry limits, offline buffering, lockout) that would otherwise have to be rediscovered in the field.

219.9 Knowledge Check

Why does the connection pattern use exponential backoff instead of fixed retry intervals?

  1. To reduce code complexity
  2. To prevent overwhelming the network during outages
  3. To make the code run faster
  4. Because it is required by Wi-Fi standards

Answer: 2. To prevent overwhelming the network during outages

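Exponential backoff spaces retry attempts progressively further apart, so a fleet of devices places less and less load on the network while an outage lasts instead of hammering it at a fixed rate. This is especially important for large-scale IoT deployments, where simultaneous reconnection attempts could cause secondary failures when the network recovers; adding random jitter to each delay further helps keep devices from retrying in lockstep.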

In the sensor sampling pattern, why is there a separate BUFFERING state instead of just staying in PROCESSING?

  1. To save memory
  2. To handle offline operation gracefully
  3. To make the code simpler
  4. Because sensors require buffering

Answer: 2. To handle offline operation gracefully

The BUFFERING state allows the device to store data locally when the network is unavailable, then return to sleep to conserve power. This ensures data is not lost during connectivity outages while maintaining the power efficiency of the duty cycling pattern.

What is the purpose of the LOCKOUT state in the actuator safety pattern?

  1. To save power
  2. To prevent unauthorized access
  3. To ensure dangerous faults require deliberate human intervention to clear
  4. To simplify the state machine

Answer: 3. To ensure dangerous faults require deliberate human intervention to clear

The LOCKOUT state prevents automatic recovery from serious faults. This ensures that a qualified person must physically or administratively reset the system, providing an opportunity to investigate and address the root cause before resuming operation.

219.10 What’s Next

Apply your state machine knowledge to related topics: