Three core IoT state machine patterns: connection management (DISCONNECTED -> CONNECTING -> CONNECTED with exponential backoff up to 5 minutes), duty cycling (SLEEP -> WAKE -> SAMPLE -> TRANSMIT with 99%+ sleep ratio for multi-year battery life), and safety actuator control (SAFE -> ARMED -> ACTIVE with hardware watchdog timeouts of 100-500ms). Always design transitions as one-way safety valves – any anomaly should default to the safest state.
77.1 Learning Objectives
By the end of this chapter, you will be able to:
Construct Connection Patterns: Design state machines for network connection management with exponential backoff
Implement Duty Cycling: Create power-efficient sampling state machines that maximise battery life
Architect Safety Systems: Build actuator control with multiple protection layers and hardware watchdog enforcement
Select Appropriate Patterns: Justify the choice of state machine pattern for different IoT scenarios
For Beginners: State Machine Patterns
State machine patterns are reusable templates for managing device behavior. Think of a vending machine that follows a clear sequence: waiting for coins, accepting selection, dispensing product, giving change. Each step is a state, and the rules for moving between states are clear. These patterns help you design reliable IoT device behavior that handles every situation predictably.
77.2 Prerequisites
Before diving into this chapter, you should be familiar with:
State machine patterns are reusable solutions to common IoT design challenges. Rather than designing from scratch, experienced developers apply proven patterns that handle edge cases and failure modes. This chapter presents three essential patterns for IoT applications.
Time: ~15 min | Difficulty: Intermediate | Unit: P04.FSM.U03
77.4 Pattern 1: Connection State Machine
A common pattern for IoT devices managing network connections:
77.4.1 Key Features
| Feature | Implementation | Benefit |
|---|---|---|
| Exponential Backoff | Double wait time after each failure | Prevents network congestion |
| Retry Limits | Maximum attempts before giving up | Avoids infinite loops |
| Fast Reconnect | Shorter delays for lost connections | Maintains user experience |
| Connection State Tracking | Clear CONNECTED/DISCONNECTED states | Reliable data transmission |
77.4.2 Implementation Considerations
Backoff Strategy: Start with 1 second, double each retry, cap at 60 seconds
Retry Limits: 5 retries for initial connection, 10 for reconnection
Connection Monitoring: Heartbeat or ping to detect silent failures
State Persistence: Remember last-known-good credentials
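The backoff and retry settings listed above can be sketched as a small generator. This is a minimal illustration, not a library API: the function name `backoff_delays` and its parameters are hypothetical, and the defaults (start at 1 second, cap at 60 seconds) are taken from the considerations above.

```python
def backoff_delays(max_retries, start_s=1, cap_s=60):
    """Yield the wait before each retry: start_s, doubled each time, capped at cap_s."""
    delay = start_s
    for _ in range(max_retries):
        yield delay
        delay = min(delay * 2, cap_s)


# 5 retries for an initial connection, 10 for a reconnection (values from the list above)
initial = list(backoff_delays(5))
reconnect = list(backoff_delays(10))
print(initial)    # [1, 2, 4, 8, 16]
print(reconnect)  # [1, 2, 4, 8, 16, 32, 60, 60, 60, 60]
```

With a 60-second cap, the schedule flattens from the seventh retry onward, so the cap only matters for the longer reconnection budget.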
77.4.3 Example Use Cases
MQTT client connections
Wi-Fi station mode
Cellular modem management
WebSocket connections
BLE central device pairing
77.5 Pattern 2: Sensor Sampling State Machine
Efficient power management through state-based duty cycling:
77.5.1 Power Consumption by State
| State | Current Draw | Duration | Energy per Cycle |
|---|---|---|---|
| DEEP_SLEEP | 10 uA | 60 seconds | 0.6 mAs |
| WAKING | 20 mA | 100 ms | 2 mAs |
| SAMPLING | 50 mA | 200 ms | 10 mAs |
| PROCESSING | 30 mA | 50 ms | 1.5 mAs |
| TRANSMITTING | 150 mA | 500 ms | 75 mAs |
| Total per cycle | - | ~61 seconds | ~89 mAs |
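The totals in the table can be checked with a few lines of Python. This is a sketch that only re-derives the per-cycle charge from the listed currents and durations; the variable names are illustrative, and the average current and sleep fraction are derived values that follow from the table rather than figures quoted in the text.

```python
# (state, current in mA, duration in s), copied from the table above
STATES = [
    ("DEEP_SLEEP", 0.010, 60.0),
    ("WAKING", 20.0, 0.1),
    ("SAMPLING", 50.0, 0.2),
    ("PROCESSING", 30.0, 0.05),
    ("TRANSMITTING", 150.0, 0.5),
]

charge_mas = sum(current * duration for _, current, duration in STATES)
cycle_s = sum(duration for _, _, duration in STATES)
avg_ma = charge_mas / cycle_s            # average current over one full cycle
sleep_fraction = STATES[0][2] / cycle_s  # share of the cycle spent in DEEP_SLEEP
tx_share = (150.0 * 0.5) / charge_mas    # share of the charge spent transmitting

print(f"charge per cycle: {charge_mas:.1f} mAs over {cycle_s:.2f} s")
print(f"average current:  {avg_ma:.3f} mA")
print(f"sleep fraction:   {sleep_fraction:.1%}")
print(f"transmit share:   {tx_share:.0%}")
```

Transmission accounts for most of the charge per cycle, which is why the batching and buffering strategies discussed for this pattern target the radio first.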
77.5.2 Implementation Considerations
Wake Source: RTC timer vs. external interrupt vs. both
Peripheral Initialization: Lazy vs. eager initialization
Buffering Strategy: Store locally when network unavailable
Sample Aggregation: Reduce transmissions by batching data
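The buffering and aggregation ideas above can be sketched as a transition function plus a one-cycle driver. The state names follow the power table and the BUFFERING state mentioned in the review questions; the exact transitions are an assumption, and `next_state` and `run_cycle` are hypothetical helper names.

```python
# Unconditional transitions; PROCESSING is handled separately because it branches
LINEAR = {
    "DEEP_SLEEP": "WAKING",
    "WAKING": "SAMPLING",
    "SAMPLING": "PROCESSING",
    "TRANSMITTING": "DEEP_SLEEP",
    "BUFFERING": "DEEP_SLEEP",
}


def next_state(state, network_up):
    """Return the successor state; PROCESSING branches on network availability."""
    if state == "PROCESSING":
        return "TRANSMITTING" if network_up else "BUFFERING"
    return LINEAR[state]


def run_cycle(reading, buffer, network_up):
    """Walk one wake cycle and return (state trace, readings sent this cycle)."""
    state, trace, sent = "DEEP_SLEEP", ["DEEP_SLEEP"], []
    while True:
        state = next_state(state, network_up)
        trace.append(state)
        if state == "BUFFERING":
            buffer.append(reading)        # keep the sample for a later cycle
        elif state == "TRANSMITTING":
            sent = buffer + [reading]     # flush the backlog with the new sample
            buffer.clear()
        elif state == "DEEP_SLEEP":
            return trace, sent


buffer = []
trace_offline, sent_offline = run_cycle(21.5, buffer, network_up=False)
trace_online, sent_online = run_cycle(21.7, buffer, network_up=True)
print(trace_offline[-2], sent_offline)  # BUFFERING []
print(trace_online[-2], sent_online)    # TRANSMITTING [21.5, 21.7]
```

Sending the backlog together with the fresh sample is one way to combine local buffering with batched transmissions; a real device would also bound the buffer size.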
77.5.3 Example Use Cases
Environmental monitoring stations
Agricultural soil sensors
Asset tracking devices
Wildlife monitoring collars
Smart meter endpoints
77.6 Pattern 3: Actuator Safety State Machine
Safety-critical actuator control with multiple protection layers:
77.6.1 Safety Layers
| Layer | Trigger | Response | Recovery |
|---|---|---|---|
| Normal | Operator command | Controlled start/stop | Automatic |
| Limiting | Threshold exceeded | Reduce power/speed | Automatic when safe |
| Emergency Stop | Critical fault or button | Immediate halt | Requires admin |
| Lockout | Confirmed emergency | Disable actuator | Manual reset only |
77.6.2 Implementation Considerations
Fail-Safe Default: Always default to SAFE_OFF on unknown state
Watchdog Timer: Force EMERGENCY_STOP if state machine hangs
Dual-Channel Input: Redundant sensors for critical measurements
Audit Logging: Record all state transitions with timestamps
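The considerations above can be sketched as a table-driven state machine. SAFE_OFF, EMERGENCY_STOP and LOCKOUT are named in this chapter; the RUNNING and LIMITING state names, every event name, and the class itself are hypothetical. The hardware watchdog is modelled here only as a `watchdog_timeout` event, which is an assumption about how its expiry would reach the software.

```python
import time

KNOWN_STATES = {"SAFE_OFF", "RUNNING", "LIMITING", "EMERGENCY_STOP", "LOCKOUT"}
FAULT_EVENTS = {"critical_fault", "estop_button", "watchdog_timeout"}

# (current state, event) -> next state; any pair not listed leaves the state unchanged
TRANSITIONS = {
    ("SAFE_OFF", "start_cmd"): "RUNNING",
    ("RUNNING", "stop_cmd"): "SAFE_OFF",
    ("RUNNING", "threshold_exceeded"): "LIMITING",
    ("LIMITING", "back_in_range"): "RUNNING",
    ("EMERGENCY_STOP", "admin_clear"): "SAFE_OFF",
    ("EMERGENCY_STOP", "emergency_confirmed"): "LOCKOUT",
    ("LOCKOUT", "manual_reset"): "SAFE_OFF",
}


class ActuatorSafetyFSM:
    def __init__(self):
        self.state = "SAFE_OFF"
        self.audit_log = []  # entries: (timestamp, previous, event, new)

    def handle(self, event):
        previous = self.state
        if previous not in KNOWN_STATES:
            new = "SAFE_OFF"            # fail-safe default on unknown state
        elif event in FAULT_EVENTS and previous not in ("EMERGENCY_STOP", "LOCKOUT"):
            new = "EMERGENCY_STOP"      # faults pre-empt every operating state
        else:
            new = TRANSITIONS.get((previous, event), previous)
        if new != previous:
            self.audit_log.append((time.time(), previous, event, new))
        self.state = new
        return new


fsm = ActuatorSafetyFSM()
for event in ("start_cmd", "critical_fault", "start_cmd",
              "emergency_confirmed", "manual_reset"):
    print(event, "->", fsm.handle(event))
```

Because unlisted pairs leave the state unchanged, a `start_cmd` received in EMERGENCY_STOP or LOCKOUT is ignored, and only `manual_reset` leaves LOCKOUT.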
77.6.3 Example Use Cases
Industrial motor control
HVAC damper actuators
Automated valve systems
Robotic arm controllers
Medical pump controllers
77.7 Choosing the Right Pattern
77.7.1 Pattern Combinations
Many real-world IoT devices combine multiple patterns:
When combining patterns, use hierarchical state machines to keep the design manageable.
For Kids: Meet the Sensor Squad!
State machine patterns are like recipe books for robot brains – proven recipes that engineers use again and again!
77.7.2 The Sensor Squad Adventure: Three Magic Recipes
The Sensor Squad discovered that most IoT devices need the same three “brain recipes” (patterns):
Recipe 1 – The Never-Give-Up Connector (Connection Pattern): Sammy the Sensor needed to connect to the Wi-Fi, but sometimes the signal was weak. Instead of trying again immediately (which would be like knocking on someone’s door 100 times per second – super annoying!), Sammy followed the “polite knock” recipe:
- First try: Wait 1 second
- Second try: Wait 2 seconds
- Third try: Wait 4 seconds
- Keep doubling, but never wait more than 5 minutes!
This way, Sammy keeps trying but does not overwhelm the router.
Recipe 2 – The Sleepy Sensor (Sampling Pattern): Bella the Battery had a problem. If she powered the temperature sensor ALL the time, the battery would die in 2 days. So she created a schedule:
- Sleep for 60 seconds (uses barely any power – like a hibernating bear!)
- Wake up for 0.1 seconds
- Read the temperature for 0.2 seconds
- Send the reading for 0.5 seconds
- Go back to Sleep!
Result: The battery lasts 5 YEARS instead of 2 days!
Recipe 3 – The Safety Guard (Safety Pattern): Lila the LED controlled a factory robot arm. Safety was the #1 priority! The recipe had LAYERS:
- Normal: Robot works fine
- Warning: “Slow down, something looks off”
- Emergency Stop: “FREEZE! Something is wrong!”
- Lockout: “No one touches this until a human checks it”
Max the Microcontroller said, “The most important rule? When in doubt, ALWAYS go to the safest state. A robot that stops is better than a robot that breaks things!”
77.7.3 Key Words for Kids
| Word | What It Means |
|---|---|
| Pattern | A proven recipe that solves a common problem – reuse it instead of starting from scratch |
| Exponential Backoff | Waiting longer and longer between tries, so you do not annoy the server |
| Duty Cycling | Sleeping most of the time and only waking up briefly to do work |
| Fail-Safe | Always defaulting to the safest option when something goes wrong |
Key Takeaway
In one sentence: Three essential state machine patterns – connection management (exponential backoff), duty cycling (sleep-wake-sample-transmit), and safety control (layered fail-safe states) – solve the vast majority of IoT device behavior challenges.
Remember this rule: Always design transitions as one-way safety valves – any anomaly should default to the safest state, and dangerous fault recovery should require deliberate human intervention.
77.8 Code Example: Connection State Machine with Exponential Backoff
This MicroPython implementation demonstrates the connection pattern from Pattern 1. The state machine manages Wi-Fi connectivity with exponential backoff and retry limits. The heartbeat fields are initialised, but the heartbeat check and the RECONNECTING branch are elided from this excerpt:
```python
import time
import network


class ConnectionFSM:
    """States: DISCONNECTED -> CONNECTING -> CONNECTED -> RECONNECTING

    Backoff doubles per failure: 1s, 2s, 4s, 8s... up to max_backoff_s.
    """

    DISCONNECTED, CONNECTING, CONNECTED, RECONNECTING = (
        "DISCONNECTED", "CONNECTING", "CONNECTED", "RECONNECTING")

    def __init__(self, ssid, password, max_retries=5, max_backoff_s=60):
        self.ssid, self.password = ssid, password
        self.max_retries, self.max_backoff_s = max_retries, max_backoff_s
        self.state, self.retries, self.backoff_s = self.DISCONNECTED, 0, 1
        self.wlan = network.WLAN(network.STA_IF)
        self.last_heartbeat, self.heartbeat_interval_s = 0, 30

    def _attempt_connect(self):
        self.wlan.active(True)
        self.wlan.connect(self.ssid, self.password)
        deadline = time.time() + 10
        while time.time() < deadline:
            if self.wlan.isconnected():
                return True
            time.sleep(0.5)
        return False

    def update(self):
        """Run one cycle of the state machine. Call in main loop."""
        if self.state == self.DISCONNECTED:
            self.state = self.CONNECTING
        elif self.state == self.CONNECTING:
            if self._attempt_connect():
                self.state, self.retries, self.backoff_s = self.CONNECTED, 0, 1
                self.last_heartbeat = time.time()
            else:
                self.retries += 1
                if self.retries >= self.max_retries:
                    self.state, self.retries, self.backoff_s = self.DISCONNECTED, 0, 1
                else:
                    time.sleep(self.backoff_s)
                    self.backoff_s = min(self.backoff_s * 2, self.max_backoff_s)
        elif self.state == self.CONNECTED:
            if not self.wlan.isconnected():
                self.state, self.backoff_s = self.RECONNECTING, 1
        # ... RECONNECTING mirrors CONNECTING with 2x patience
        return self.state


# Usage
fsm = ConnectionFSM("MyNetwork", "MyPassword")
while True:
    if fsm.update() == ConnectionFSM.CONNECTED:
        pass  # Safe to transmit sensor data
    time.sleep(1)
```
The exponential backoff (1s, 2s, 4s, 8s...) prevents network congestion when many devices reconnect simultaneously after an outage. The separate RECONNECTING state uses more patient retry limits because a previously-working connection is likely to recover.
Key Concepts
Finite State Machine (FSM): A computational model representing a device’s behavior as a finite set of states, with transitions triggered by events or conditions – the standard pattern for modeling IoT device lifecycle
State: A distinct mode of operation for an IoT device (IDLE, SENSING, TRANSMITTING, ERROR, SLEEP) where specific behaviors and valid transitions are defined
Transition: A directed change from one state to another triggered by an event or condition, optionally executing actions (entry/exit/transition actions) during the state change
Entry/Exit Action: Code executed automatically when entering or leaving a state, used to configure hardware (enable ADC on entry), release resources (disable radio on exit), or log state changes for debugging
Guard Condition: A boolean expression that must be true for a transition to occur, enabling conditional state changes based on sensor values, battery levels, or network connectivity status
Hierarchical State Machine (HSM): An FSM extension where states can contain substates, enabling shared behavior to be defined in parent states and overriding only specific behaviors in child states – reduces duplication in complex IoT device firmware
Watchdog Timer: A hardware timer that resets the device if not periodically cleared by software, ensuring the state machine always reaches a safe state even when firmware enters an unexpected condition
State Explosion: An FSM anti-pattern where the number of states and transitions grows combinatorially as features are added, making the model incomprehensible – addressed by using hierarchical state machines or orthogonal regions
Common Pitfalls
1. Using if-else chains instead of a proper state machine
Firmware that implements device behavior with nested if-else checks on global variables is an implicit state machine that becomes unmaintainable. When a new state (e.g., FIRMWARE_UPDATE) must be added, every if-else branch must be audited for correctness. An explicit state machine with defined states and transitions makes adding new states a localized change with clear boundary conditions.
2. Forgetting timeout transitions
IoT devices in the field encounter unexpected conditions (sensor hangs, lost connections) that prevent the state machine from receiving the expected event to progress. Every state that waits for an external event must have a timeout transition leading to a safe state (usually ERROR or RESET). Without timeouts, devices get permanently stuck in intermediate states, appearing ‘frozen’ without restarting.
3. Not logging state transitions in production
State machine transitions are invaluable for diagnosing field failures. A device that reaches ERROR state without a logged transition history forces engineers to reproduce complex sequences on the bench. Log every transition with timestamp, previous state, trigger event, and new state. This adds minimal overhead but transforms debugging from days to minutes.
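The timeout and logging pitfalls can be addressed together in a few lines. This is a minimal sketch, assuming a hypothetical `TimedFSM` class, hypothetical state names, and hypothetical per-state limits; it uses `time.monotonic` from standard Python, and a MicroPython port would need a tick-based clock passed in instead.

```python
import time


class TimedFSM:
    # Hypothetical per-state limits in seconds; states not listed may wait indefinitely
    TIMEOUTS_S = {"CONNECTING": 10, "SAMPLING": 2}

    def __init__(self, clock=time.monotonic):
        self.clock = clock            # injectable so the timeout can be tested
        self.state = "IDLE"
        self.entered_at = clock()
        self.log = []                 # (timestamp, previous, trigger, new)

    def transition(self, new_state, trigger):
        """Change state and record timestamp, previous state, trigger, new state."""
        now = self.clock()
        self.log.append((now, self.state, trigger, new_state))
        self.state, self.entered_at = new_state, now

    def tick(self):
        """Call from the main loop; forces ERROR when a waiting state overstays."""
        limit = self.TIMEOUTS_S.get(self.state)
        if limit is not None and self.clock() - self.entered_at >= limit:
            self.transition("ERROR", "timeout")


# Usage with a fake clock so the example runs instantly
fake_now = [0.0]
fsm = TimedFSM(clock=lambda: fake_now[0])
fsm.transition("CONNECTING", "connect_requested")
fake_now[0] = 10.0
fsm.tick()
print(fsm.state)    # ERROR
print(fsm.log[-1])  # (10.0, 'CONNECTING', 'timeout', 'ERROR')
```

Routing the timeout through the same `transition` method means the forced move to ERROR appears in the log with its trigger, which is the history a field diagnosis needs.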
77.9 Summary
State machine patterns provide battle-tested solutions for common IoT challenges:
Connection Pattern: Manages network reliability with backoff and retry logic
Sampling Pattern: Optimizes power consumption through duty cycling
Safety Pattern: Ensures reliable actuator control with protection layers
Applying these patterns reduces development time, improves reliability, and leverages lessons learned from thousands of production IoT deployments.
77.10 Pattern Review Questions
Question 1: Connection Pattern
Why does the connection pattern use exponential backoff instead of fixed retry intervals?
A) To reduce code complexity
B) To prevent overwhelming the network during outages
C) To make the code run faster
D) Because it is required by Wi-Fi standards
Answer: B) To prevent overwhelming the network during outages
Exponential backoff spreads out retry attempts, preventing thousands of devices from simultaneously hammering the network when it recovers from an outage. This is especially important for large-scale IoT deployments where simultaneous reconnection attempts could cause secondary failures.
Question 2: Sampling Pattern
In the sensor sampling pattern, why is there a separate BUFFERING state instead of just staying in PROCESSING?
A) To save memory
B) To handle offline operation gracefully
C) To make the code simpler
D) Because sensors require buffering
Answer: B) To handle offline operation gracefully
The BUFFERING state allows the device to store data locally when the network is unavailable, then return to sleep to conserve power. This ensures data is not lost during connectivity outages while maintaining the power efficiency of the duty cycling pattern.
Question 3: Safety Pattern
What is the purpose of the LOCKOUT state in the actuator safety pattern?
A) To save power
B) To prevent unauthorized access
C) To ensure dangerous faults require deliberate human intervention to clear
D) To simplify the state machine
Answer: C) To ensure dangerous faults require deliberate human intervention to clear
The LOCKOUT state prevents automatic recovery from serious faults. This ensures that a qualified person must physically or administratively reset the system, providing an opportunity to investigate and address the root cause before resuming operation.
77.12 Worked Example: Duty-Cycling State Machine Power Budget
Scenario: A soil moisture sensor deployed in a vineyard in Marlborough, New Zealand must operate for 3 years on a single 3.6V lithium thionyl chloride battery (19,000 mAh). The sensor samples soil moisture and transmits via LoRaWAN every 15 minutes.
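From the table: \(I_{avg} = \frac{(2.5\,\mu\text{A} \times 899.2\,\text{s}) + (8\,\text{mA} \times 0.12\,\text{s}) + \dots + (12\,\text{mA} \times 0.5\,\text{s})}{900\,\text{s}}\)
Simplifying: \(I_{avg} = \frac{2248\,\mu\text{A}\cdot\text{s} + 960\,\mu\text{A}\cdot\text{s} + \dots + 6000\,\mu\text{A}\cdot\text{s}}{900\,\text{s}} \approx 18.8\,\mu\text{A}\)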
Battery life (circuit only, ignoring self-discharge): \(\frac{19,000 mAh}{0.0188 mA} = 1,010,638\) hours \(\approx\)115 years. Including 1%/year self-discharge, practical lifetime is ~54 years – still vastly exceeding the 3-year target.
The 99.9% sleep time is critical: if the sensor transmitted continuously at 120 mA, battery life would be \(\frac{19,000}{120} = 158\) hours ≈ 6.6 days. Duty cycling extends life by roughly 6,400x (120 mA / 0.0188 mA, circuit drain only).
The state machine’s 99.9% time in DEEP_SLEEP means circuit drain uses only about 2.6% of the battery capacity over 3 years (0.0188 mA x 26,280 h ≈ 494 mAh of 19,000 mAh); adding 1%/year self-discharge brings the total to roughly 5.6%. The remaining capacity is available for additional features or simply provides decades of operation well beyond the 3-year requirement.
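The battery-life arithmetic in this worked example can be reproduced as a short script. It is a sketch using only the figures quoted in the scenario; the variable names are illustrative, and self-discharge is modelled as a constant 1% of the initial capacity per year, which is an assumption chosen because it reproduces the ~54-year figure.

```python
CAPACITY_MAH = 19_000
I_AVG_MA = 0.0188                  # 18.8 uA average from the duty-cycle calculation
I_TX_MA = 120                      # continuous-transmit comparison case
SELF_DISCHARGE_PER_YEAR = 0.01     # assumed linear, relative to initial capacity
HOURS_PER_YEAR = 8760

circuit_hours = CAPACITY_MAH / I_AVG_MA
circuit_years = circuit_hours / HOURS_PER_YEAR

circuit_mah_per_year = I_AVG_MA * HOURS_PER_YEAR
self_discharge_mah_per_year = CAPACITY_MAH * SELF_DISCHARGE_PER_YEAR
practical_years = CAPACITY_MAH / (circuit_mah_per_year + self_discharge_mah_per_year)

always_on_days = CAPACITY_MAH / I_TX_MA / 24
improvement = I_TX_MA / I_AVG_MA
three_year_fraction = 3 * circuit_mah_per_year / CAPACITY_MAH

print(f"circuit-only life:      {circuit_years:.0f} years")
print(f"with self-discharge:    {practical_years:.0f} years")
print(f"always transmitting:    {always_on_days:.1f} days")
print(f"duty-cycling gain:      {improvement:.0f}x")
print(f"3-year circuit usage:   {three_year_fraction:.1%}")
```

Changing `I_AVG_MA` shows how sensitive the result is to sleep current: once circuit drain falls below the self-discharge rate, the battery chemistry rather than the firmware sets the lifetime.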
Optimisation – Adding Alert Mode:
With the massive energy surplus, the vineyard owner adds a FROST_ALERT state:
| Added State | Trigger | Behaviour | Extra Energy |
|---|---|---|---|
| FROST_ALERT | Temperature < 2 °C | Sample every 60 sec, TX every 5 min | 14.6 mWh/day |
During frost season (June-August in NZ, ~90 days), frost alerts activate for ~6 hours/night:
Additional energy: 90 days x 6 hrs/day x (14.6/24) mWh/hr ≈ 328 mWh/season
3-year frost alert energy: 984 mWh (1.4% of battery)
Total 3-year usage: 1,770 + 984 = 2,754 mWh (4% of battery)
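The frost-alert budget above can be checked the same way. This sketch takes the 14.6 mWh/day and 1,770 mWh figures from the text as given and assumes the battery energy is simply capacity times nominal voltage (19,000 mAh x 3.6 V); the variable names are illustrative.

```python
BATTERY_MWH = 19_000 * 3.6          # about 68,400 mWh at nominal voltage (assumption)
EXTRA_MWH_PER_DAY = 14.6            # FROST_ALERT cost if active for a full 24 hours
FROST_DAYS = 90
ALERT_HOURS_PER_NIGHT = 6
BASELINE_3YR_MWH = 1_770            # 3-year duty-cycle usage quoted above

season_mwh = FROST_DAYS * ALERT_HOURS_PER_NIGHT * EXTRA_MWH_PER_DAY / 24
frost_3yr_mwh = 3 * season_mwh
total_3yr_mwh = BASELINE_3YR_MWH + frost_3yr_mwh

print(f"per season:   {season_mwh:.1f} mWh")
print(f"over 3 years: {frost_3yr_mwh:.1f} mWh ({frost_3yr_mwh / BATTERY_MWH:.1%})")
print(f"total:        {total_3yr_mwh:.1f} mWh ({total_3yr_mwh / BATTERY_MWH:.1%})")
```

The unrounded results differ from the listed figures by about 1 mWh because the text rounds the seasonal value down to 328 mWh before multiplying by three.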
Key Insight: The duty-cycling state machine is the single most impactful design pattern for battery-powered IoT. By spending 99.9% of time in DEEP_SLEEP (2.5 uA), the transmit state’s 120 mA draw (48,000x higher) becomes negligible in the average. This pattern transforms a 19,000 mAh battery from a 6.6-day power source (if always transmitting) into one lasting decades (over 50 years including self-discharge).
Try It: Duty Cycling Battery Life Calculator
Adjust the sensor parameters to explore how duty cycling affects battery life. See the dramatic difference between always-on and duty-cycled operation.
```js
viewof dcBattery = Inputs.range([500, 20000], {value: 19000, step: 500, label: "Battery capacity (mAh)"})
viewof dcVoltage = Inputs.range([1.5, 5.0], {value: 3.6, step: 0.1, label: "Battery voltage (V)"})
viewof dcSleepCurrent = Inputs.range([1, 50], {value: 2.5, step: 0.5, label: "Sleep current (uA)"})
viewof dcTxCurrent = Inputs.range([20, 300], {value: 120, step: 10, label: "Transmit current (mA)"})
viewof dcCyclePeriod = Inputs.range([60, 3600], {value: 900, step: 60, label: "Cycle period (sec)"})
viewof dcActiveTime = Inputs.range([0.1, 10], {value: 0.756, step: 0.1, label: "Active time per cycle (sec)"})
viewof dcSelfDischarge = Inputs.range([0, 5], {value: 1, step: 0.5, label: "Self-discharge (%/year)"})
```