156  Device Management Lab

Lab execution time can be estimated before starting runs:

\[ T_{\text{total}} = N_{\text{runs}} \times (t_{\text{setup}} + t_{\text{run}} + t_{\text{review}}) \]

Worked example: With 5 runs and per-run times of 4 min setup, 6 min execution, and 3 min review, total lab time is \(5\times(4+6+3)=65\) minutes. Estimating this total up front helps avoid under-scoping the session and makes it easier to schedule complete experimental cycles.
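
The estimate can also be expressed as a small helper, sketched here in portable C++ so the arithmetic can be checked for other run counts. The function name `totalLabMinutes` and its parameter names are illustrative and do not appear in the lab code.

```cpp
#include <cassert>

// Total lab time in minutes: N_runs * (t_setup + t_run + t_review).
// `runs` is a count; the remaining arguments are per-run durations in minutes.
// (Hypothetical helper, not part of the lab sketch.)
int totalLabMinutes(int runs, int setupMin, int runMin, int reviewMin) {
  // Negative counts or durations make no sense for a schedule estimate.
  assert(runs >= 0 && setupMin >= 0 && runMin >= 0 && reviewMin >= 0);
  return runs * (setupMin + runMin + reviewMin);
}
```

Calling `totalLabMinutes(5, 4, 6, 3)` reproduces the worked example and returns 65.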

In 60 Seconds

This ESP32 lab implements production device management patterns: device shadows with reported/desired state synchronization, OTA firmware updates with rollback capability, heartbeat monitoring with configurable intervals, and graceful degradation to local-only operation when cloud connectivity is lost.
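
To make the reported/desired idea concrete before reading the full listing, here is a minimal, platform-independent sketch of a shadow delta. It assumes a flat key/value shadow stored as strings. The names `ShadowState`, `computeDelta`, `applyDelta`, and `syncOnce` are hypothetical and are not part of the lab code, which uses typed struct fields instead.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical flat shadow: key -> value, both kept as strings for simplicity.
using ShadowState = std::map<std::string, std::string>;

// The delta holds every desired key whose value is missing from,
// or different in, the reported state.
ShadowState computeDelta(const ShadowState& desired, const ShadowState& reported) {
  ShadowState delta;
  for (const auto& entry : desired) {
    auto it = reported.find(entry.first);
    if (it == reported.end() || it->second != entry.second) {
      delta[entry.first] = entry.second;
    }
  }
  return delta;
}

// Copy the delta into the reported state, as a device would after
// acting on each requested change.
void applyDelta(ShadowState& reported, const ShadowState& delta) {
  for (const auto& entry : delta) {
    reported[entry.first] = entry.second;
  }
}

// One synchronization round: compute the delta, apply it, and check
// that nothing is left outstanding.
void syncOnce(const ShadowState& desired, ShadowState& reported) {
  applyDelta(reported, computeDelta(desired, reported));
  assert(computeDelta(desired, reported).empty());
}
```

Keys that exist only in the reported state (for example sensor readings) are left untouched, which mirrors how a device keeps reporting values the cloud never sets.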

156.1 Learning Objectives

By the end of this lab, you will be able to:

  • Implement device shadow patterns with reported and desired state delta synchronization on ESP32
  • Construct OTA firmware update mechanisms with version tracking and automatic rollback capability
  • Configure health monitoring systems using heartbeat signals and tunable interval parameters
  • Design graceful degradation logic for local-only operation when cloud connectivity is lost
  • Integrate command-and-control patterns with acknowledgment and queue management on ESP32

This chapter consolidates your understanding of IoT system design through practical exercises and realistic scenarios. Think of it as the practice round before the real game: working through examples and questions builds the confidence and skills you need to design actual IoT systems.

Key Concepts

  • Infrastructure as Code (IaC): Defining IoT cloud infrastructure (message brokers, databases, functions, networking) in declarative configuration files (Terraform, CloudFormation) that can be version-controlled, reviewed, and deployed reproducibly
  • GitOps: An operational framework using git repositories as the single source of truth for IoT infrastructure and application configuration, with automated deployment triggered by pull request merges
  • Load Testing: Simulating the expected production device load (message rate, connection count, payload size) against infrastructure to identify capacity limits, bottlenecks, and failure modes before launch
  • Chaos Engineering: Deliberately injecting failures (network partitions, node crashes, latency spikes) into staging or production IoT systems to verify that resilience mechanisms (circuit breakers, failover, retry) work as designed
  • Health Check Endpoint: An API endpoint returning system status (database connectivity, message broker lag, service version) used by load balancers and monitoring systems to route traffic away from degraded instances
  • Alert Routing: Configuration directing monitoring alerts to appropriate teams and channels based on severity and affected component. For example, P0 alerts page on-call engineers, while P3 alerts create tickets without interrupting anyone

156.2 Overview

This hands-on lab provides a comprehensive ESP32 simulation for learning production device management concepts including:

  • Device shadows and reported/desired state synchronization
  • OTA firmware updates with rollback capability
  • Health monitoring and heartbeat mechanisms
  • Command execution and acknowledgment
  • Degraded mode operation
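
The heartbeat and degraded-mode items above can be pictured as a simple timeout rule. The following is a minimal sketch in portable C++, assuming the link is considered lost once a configurable number of whole heartbeat intervals pass without an acknowledgment. The names `LinkState`, `missedHeartbeats`, and `classifyLink`, and the idea of an acknowledgment timestamp, are assumptions for illustration; the lab code tracks this with its own counters.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical link classification used only in this sketch.
enum class LinkState { Online, Offline };

// Number of whole heartbeat intervals elapsed since the last acknowledgment.
// Unsigned subtraction stays correct when a millis()-style 32-bit counter wraps.
int missedHeartbeats(uint32_t nowMs, uint32_t lastAckMs, uint32_t intervalMs) {
  assert(intervalMs > 0);  // a zero interval would divide by zero
  return static_cast<int>((nowMs - lastAckMs) / intervalMs);
}

// Declare the link offline once `maxMissed` intervals have passed unacknowledged.
// (Assumption: "offline" starts at exactly maxMissed missed intervals.)
LinkState classifyLink(uint32_t nowMs, uint32_t lastAckMs,
                       uint32_t intervalMs, int maxMissed) {
  return missedHeartbeats(nowMs, lastAckMs, intervalMs) >= maxMissed
             ? LinkState::Offline
             : LinkState::Online;
}
```

With a 10-second interval and a limit of 3, a device last acknowledged at time 0 is still `Online` at 25 seconds and becomes `Offline` at 30 seconds, at which point it could fall back to local-only operation.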

For the framework overview, see Production Architecture Management.

156.3 How to Use This Lab

  1. Copy the code from the code block below
  2. Paste it into the Wokwi editor (replace any existing code)
  3. Click the green Play button to start the simulation
  4. Open the Serial Monitor (bottom panel) to observe device management operations
  5. Experiment by modifying configuration values or adding new commands

156.3.1 Complete Device Management Code

Copy and paste this complete implementation into the Wokwi simulator.

/*
 * IoT Device Management Platform Simulation
 *
 * This comprehensive example demonstrates production-grade device management
 * concepts including:
 * - Device registration and provisioning
 * - Heartbeat and health monitoring
 * - Configuration management with versioning
 * - Command and control patterns
 * - Device shadow/twin state synchronization
 *
 * For educational purposes - simulates cloud connectivity locally
 */

#include <Arduino.h>
#include <WiFi.h>
#include <ArduinoJson.h>
#include <EEPROM.h>

// =============================================================================
// CONFIGURATION CONSTANTS
// =============================================================================

// Device identification
#define DEVICE_TYPE "ESP32_SENSOR"
#define FIRMWARE_VERSION "2.1.0"
#define HARDWARE_VERSION "1.0"

// Timing intervals (milliseconds)
#define HEARTBEAT_INTERVAL 10000      // 10 seconds between heartbeats
#define TELEMETRY_INTERVAL 30000      // 30 seconds between telemetry reports
#define CONFIG_CHECK_INTERVAL 60000   // 60 seconds between config sync
#define HEALTH_CHECK_INTERVAL 5000    // 5 seconds between health checks
#define SHADOW_SYNC_INTERVAL 15000    // 15 seconds between shadow syncs

// Thresholds
#define LOW_BATTERY_THRESHOLD 20      // Percentage
#define CRITICAL_BATTERY_THRESHOLD 10 // Percentage
#define MAX_MISSED_HEARTBEATS 3       // Before declaring offline
#define MEMORY_WARNING_THRESHOLD 80   // Heap usage percentage

// EEPROM addresses for persistent storage
#define EEPROM_SIZE 512
#define EEPROM_DEVICE_ID_ADDR 0
#define EEPROM_CONFIG_VERSION_ADDR 50
#define EEPROM_PROVISIONED_ADDR 100
#define EEPROM_TELEMETRY_INTERVAL_ADDR 104
#define EEPROM_REBOOT_COUNT_ADDR 108

// =============================================================================
// DATA STRUCTURES
// =============================================================================

/**
 * Device lifecycle states following AWS IoT / Azure IoT Hub patterns
 */
enum DeviceState {
  STATE_UNPROVISIONED,    // Factory default, needs registration
  STATE_PROVISIONING,     // Registration in progress
  STATE_ACTIVE,           // Normal operation
  STATE_DEGRADED,         // Operational but with issues
  STATE_MAINTENANCE,      // OTA update or scheduled maintenance
  STATE_OFFLINE,          // Lost connectivity
  STATE_DECOMMISSIONED    // End of life
};

/**
 * Health status indicators
 */
enum HealthStatus {
  HEALTH_GOOD,
  HEALTH_WARNING,
  HEALTH_CRITICAL
};

/**
 * Command types supported by the device
 */
enum CommandType {
  CMD_REBOOT,
  CMD_FACTORY_RESET,
  CMD_UPDATE_CONFIG,
  CMD_SET_TELEMETRY_INTERVAL,
  CMD_RUN_DIAGNOSTICS,
  CMD_ENTER_MAINTENANCE,
  CMD_EXIT_MAINTENANCE,
  CMD_BLINK_LED,
  CMD_READ_SENSOR,
  CMD_UNKNOWN
};

/**
 * Device shadow structure (AWS IoT Device Shadow / Azure Device Twin pattern)
 * Contains both reported (device -> cloud) and desired (cloud -> device) state
 */
struct DeviceShadow {
  // Reported state (device reports these values)
  struct {
    float temperature;
    float humidity;
    int batteryLevel;
    DeviceState state;
    HealthStatus health;
    unsigned long uptime;
    int freeHeap;
    int wifiRssi;
    String firmwareVersion;
    int rebootCount;
    unsigned long lastTelemetryTime;
    int configVersion;
  } reported;

  // Desired state (cloud sets these values)
  struct {
    int telemetryInterval;
    bool ledEnabled;
    float tempThresholdHigh;
    float tempThresholdLow;
    int configVersion;
    bool maintenanceMode;
  } desired;

  // Metadata
  unsigned long lastSyncTime;
  bool pendingSync;
};

/**
 * Device configuration structure
 */
struct DeviceConfig {
  int version;
  int telemetryIntervalMs;
  int heartbeatIntervalMs;
  float temperatureOffsetC;
  float humidityOffsetPercent;
  bool deepSleepEnabled;
  int deepSleepDurationSec;
  bool alertsEnabled;
  float alertTempHigh;
  float alertTempLow;
};

/**
 * Command structure for remote commands
 */
struct Command {
  int id;
  CommandType type;
  String payload;
  unsigned long timestamp;
  bool acknowledged;
};

/**
 * Telemetry data structure
 */
struct TelemetryData {
  float temperature;
  float humidity;
  int batteryLevel;
  int rssi;
  unsigned long timestamp;
  int sampleCount;
};

// =============================================================================
// GLOBAL VARIABLES
// =============================================================================

// Device identification
String deviceId = "";
String deviceSecret = "";  // In production, use secure element or TPM

// State management
DeviceState currentState = STATE_UNPROVISIONED;
HealthStatus currentHealth = HEALTH_GOOD;
DeviceShadow shadow;
DeviceConfig config;

// Timing trackers
unsigned long lastHeartbeatTime = 0;
unsigned long lastTelemetryTime = 0;
unsigned long lastConfigCheckTime = 0;
unsigned long lastHealthCheckTime = 0;
unsigned long lastShadowSyncTime = 0;
unsigned long bootTime = 0;

// Counters and metrics
int heartbeatsSent = 0;
int telemetrySent = 0;
int commandsReceived = 0;
int commandsExecuted = 0;
int missedHeartbeats = 0;
int configUpdates = 0;
int shadowSyncs = 0;
int rebootCount = 0;

// Command queue (simple implementation - production would use proper queue)
#define MAX_COMMAND_QUEUE 10
Command commandQueue[MAX_COMMAND_QUEUE];
int commandQueueHead = 0;
int commandQueueTail = 0;

// Simulated sensor values
float simulatedTemperature = 22.5;
float simulatedHumidity = 45.0;
int simulatedBattery = 100;

// LED for visual feedback
const int LED_PIN = 2;

// =============================================================================
// UTILITY FUNCTIONS
// =============================================================================

/**
 * Generate unique device ID based on ESP32 MAC address
 */
String generateDeviceId() {
  uint8_t mac[6];
  WiFi.macAddress(mac);
  char macStr[20];  // "ESP32_" (6 chars) + 12 hex digits + NUL terminator = 19 bytes
  snprintf(macStr, sizeof(macStr), "ESP32_%02X%02X%02X%02X%02X%02X",
           mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
  return String(macStr);
}

/**
 * Get state name as string for logging
 */
String getStateName(DeviceState state) {
  switch (state) {
    case STATE_UNPROVISIONED: return "UNPROVISIONED";
    case STATE_PROVISIONING:  return "PROVISIONING";
    case STATE_ACTIVE:        return "ACTIVE";
    case STATE_DEGRADED:      return "DEGRADED";
    case STATE_MAINTENANCE:   return "MAINTENANCE";
    case STATE_OFFLINE:       return "OFFLINE";
    case STATE_DECOMMISSIONED: return "DECOMMISSIONED";
    default: return "UNKNOWN";
  }
}

/**
 * Get health status as string
 */
String getHealthName(HealthStatus health) {
  switch (health) {
    case HEALTH_GOOD:     return "GOOD";
    case HEALTH_WARNING:  return "WARNING";
    case HEALTH_CRITICAL: return "CRITICAL";
    default: return "UNKNOWN";
  }
}

/**
 * Format uptime as human-readable string
 */
String formatUptime(unsigned long ms) {
  unsigned long seconds = ms / 1000;
  unsigned long minutes = seconds / 60;
  unsigned long hours = minutes / 60;
  unsigned long days = hours / 24;

  char buf[32];
  snprintf(buf, sizeof(buf), "%lud %02lu:%02lu:%02lu",
           days, hours % 24, minutes % 60, seconds % 60);
  return String(buf);
}

/**
 * Log message with timestamp and category
 */
void logMessage(const char* category, const char* message) {
  unsigned long uptime = millis() - bootTime;
  Serial.printf("[%s] [%s] %s\n", formatUptime(uptime).c_str(), category, message);
}

/**
 * Log formatted message
 */
void logMessageF(const char* category, const char* format, ...) {
  char buffer[256];
  va_list args;
  va_start(args, format);
  vsnprintf(buffer, sizeof(buffer), format, args);
  va_end(args);
  logMessage(category, buffer);
}

// =============================================================================
// PERSISTENT STORAGE FUNCTIONS
// =============================================================================

/**
 * Initialize EEPROM for persistent storage
 */
void initStorage() {
  EEPROM.begin(EEPROM_SIZE);
  logMessage("STORAGE", "EEPROM initialized");
}

/**
 * Save device configuration to EEPROM
 */
void saveConfig() {
  EEPROM.writeInt(EEPROM_CONFIG_VERSION_ADDR, config.version);
  EEPROM.writeInt(EEPROM_TELEMETRY_INTERVAL_ADDR, config.telemetryIntervalMs);
  EEPROM.commit();
  logMessageF("STORAGE", "Configuration saved (version %d)", config.version);
}

/**
 * Load device configuration from EEPROM
 */
void loadConfig() {
  config.version = EEPROM.readInt(EEPROM_CONFIG_VERSION_ADDR);
  config.telemetryIntervalMs = EEPROM.readInt(EEPROM_TELEMETRY_INTERVAL_ADDR);

  // Validate loaded values, use defaults if invalid
  if (config.version < 0 || config.version > 10000) {
    config.version = 1;
  }
  if (config.telemetryIntervalMs < 1000 || config.telemetryIntervalMs > 300000) {
    config.telemetryIntervalMs = TELEMETRY_INTERVAL;
  }

  logMessageF("STORAGE", "Configuration loaded (version %d, telemetry %dms)",
              config.version, config.telemetryIntervalMs);
}

/**
 * Save reboot counter
 */
void saveRebootCount() {
  EEPROM.writeInt(EEPROM_REBOOT_COUNT_ADDR, rebootCount);
  EEPROM.commit();
}

/**
 * Load reboot counter
 */
void loadRebootCount() {
  rebootCount = EEPROM.readInt(EEPROM_REBOOT_COUNT_ADDR);
  if (rebootCount < 0 || rebootCount > 100000) {
    rebootCount = 0;
  }
  rebootCount++;
  saveRebootCount();
  logMessageF("STORAGE", "Boot count: %d", rebootCount);
}

// =============================================================================
// DEVICE PROVISIONING
// =============================================================================

/**
 * Initialize default configuration values
 */
void initDefaultConfig() {
  config.version = 1;
  config.telemetryIntervalMs = TELEMETRY_INTERVAL;
  config.heartbeatIntervalMs = HEARTBEAT_INTERVAL;
  config.temperatureOffsetC = 0.0;
  config.humidityOffsetPercent = 0.0;
  config.deepSleepEnabled = false;
  config.deepSleepDurationSec = 60;
  config.alertsEnabled = true;
  config.alertTempHigh = 35.0;
  config.alertTempLow = 10.0;
}

/**
 * Initialize device shadow with default values
 */
void initShadow() {
  // Initialize reported state
  shadow.reported.temperature = 0.0;
  shadow.reported.humidity = 0.0;
  shadow.reported.batteryLevel = 100;
  shadow.reported.state = STATE_UNPROVISIONED;
  shadow.reported.health = HEALTH_GOOD;
  shadow.reported.uptime = 0;
  shadow.reported.freeHeap = ESP.getFreeHeap();
  shadow.reported.wifiRssi = -100;
  shadow.reported.firmwareVersion = FIRMWARE_VERSION;
  shadow.reported.rebootCount = rebootCount;
  shadow.reported.lastTelemetryTime = 0;
  shadow.reported.configVersion = config.version;

  // Initialize desired state
  shadow.desired.telemetryInterval = config.telemetryIntervalMs;
  shadow.desired.ledEnabled = true;
  shadow.desired.tempThresholdHigh = 35.0;
  shadow.desired.tempThresholdLow = 10.0;
  shadow.desired.configVersion = config.version;
  shadow.desired.maintenanceMode = false;

  // Metadata
  shadow.lastSyncTime = 0;
  shadow.pendingSync = true;

  logMessage("SHADOW", "Device shadow initialized");
}

/**
 * Simulate device registration with cloud platform
 * In production, this would involve:
 * - TLS mutual authentication
 * - X.509 certificate exchange
 * - Device attestation
 */
bool registerDevice() {
  logMessage("PROVISION", "Starting device registration...");
  currentState = STATE_PROVISIONING;

  // Generate device ID
  deviceId = generateDeviceId();
  logMessageF("PROVISION", "Device ID: %s", deviceId.c_str());

  // Simulate cloud registration handshake
  logMessage("PROVISION", "Connecting to device registry...");
  delay(500);  // Simulate network latency

  logMessage("PROVISION", "Exchanging authentication tokens...");
  delay(300);

  logMessage("PROVISION", "Receiving initial configuration...");
  delay(200);

  // Simulate successful registration
  logMessage("PROVISION", "Device registered successfully!");

  // Update state
  currentState = STATE_ACTIVE;
  shadow.reported.state = STATE_ACTIVE;

  return true;
}

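// Forward declaration: printCurrentConfig() is defined after provisionDevice()
void printCurrentConfig();

/**
 * Provision device with initial configuration from cloud
 */
void provisionDevice() {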
  logMessage("PROVISION", "Applying initial provisioning configuration...");

  // In production, this would receive configuration from cloud
  // For simulation, we use defaults with some variations
  config.telemetryIntervalMs = TELEMETRY_INTERVAL;
  config.heartbeatIntervalMs = HEARTBEAT_INTERVAL;
  config.alertsEnabled = true;
  config.alertTempHigh = 30.0;
  config.alertTempLow = 15.0;

  // Save provisioned state
  EEPROM.writeBool(EEPROM_PROVISIONED_ADDR, true);
  saveConfig();

  logMessage("PROVISION", "Device provisioned successfully");
  printCurrentConfig();
}

/**
 * Print current configuration
 */
void printCurrentConfig() {
  Serial.println("\n╔══════════════════════════════════════════════════════════════╗");
  Serial.println("║               CURRENT DEVICE CONFIGURATION                   ║");
  Serial.println("╠══════════════════════════════════════════════════════════════╣");
  Serial.printf("║  Config Version:      %d                                      \n", config.version);
  Serial.printf("║  Telemetry Interval:  %d ms                                   \n", config.telemetryIntervalMs);
  Serial.printf("║  Heartbeat Interval:  %d ms                                   \n", config.heartbeatIntervalMs);
  Serial.printf("║  Temp Offset:         %.2f °C                                 \n", config.temperatureOffsetC);
  Serial.printf("║  Humidity Offset:     %.2f %%                                 \n", config.humidityOffsetPercent);
  Serial.printf("║  Alerts Enabled:      %s                                      \n", config.alertsEnabled ? "Yes" : "No");
  Serial.printf("║  Alert Temp High:     %.1f °C                                 \n", config.alertTempHigh);
  Serial.printf("║  Alert Temp Low:      %.1f °C                                 \n", config.alertTempLow);
  Serial.println("╚══════════════════════════════════════════════════════════════╝\n");
}

// =============================================================================
// HEARTBEAT AND HEALTH MONITORING
// =============================================================================

/**
 * Send heartbeat to cloud platform
 * Heartbeats indicate device is alive and operational
 */
void sendHeartbeat() {
  StaticJsonDocument<256> heartbeat;

  heartbeat["deviceId"] = deviceId;
  heartbeat["type"] = "heartbeat";
  heartbeat["timestamp"] = millis();
  heartbeat["state"] = getStateName(currentState);
  heartbeat["health"] = getHealthName(currentHealth);
  heartbeat["uptime"] = millis() - bootTime;
  heartbeat["freeHeap"] = ESP.getFreeHeap();
  heartbeat["sequence"] = heartbeatsSent;

  // Serialize and "send" (log for simulation)
  String output;
  serializeJson(heartbeat, output);

  logMessageF("HEARTBEAT", "Sent #%d: %s", heartbeatsSent, output.c_str());

  heartbeatsSent++;
  lastHeartbeatTime = millis();
  missedHeartbeats = 0;  // Reset missed counter on successful send
}

/**
 * Perform comprehensive health check
 * Evaluates multiple metrics to determine overall device health
 */
HealthStatus performHealthCheck() {
  HealthStatus newHealth = HEALTH_GOOD;
  String issues = "";

  // Check 1: Memory usage
  int freeHeap = ESP.getFreeHeap();
  int totalHeap = ESP.getHeapSize();
  int heapUsagePercent = 100 - (freeHeap * 100 / totalHeap);

  if (heapUsagePercent > MEMORY_WARNING_THRESHOLD) {
    newHealth = HEALTH_WARNING;
    issues += "High memory usage; ";
  }

  // Check 2: Battery level (simulated)
  if (simulatedBattery < CRITICAL_BATTERY_THRESHOLD) {
    newHealth = HEALTH_CRITICAL;
    issues += "Critical battery; ";
  } else if (simulatedBattery < LOW_BATTERY_THRESHOLD) {
    if (newHealth < HEALTH_WARNING) newHealth = HEALTH_WARNING;
    issues += "Low battery; ";
  }

  // Check 3: WiFi signal strength (simulated)
  int rssi = -55 + random(-10, 10);  // Simulate varying signal
  if (rssi < -80) {
    if (newHealth < HEALTH_WARNING) newHealth = HEALTH_WARNING;
    issues += "Weak WiFi signal; ";
  }

  // Check 4: Temperature within operating range
  if (simulatedTemperature > 50 || simulatedTemperature < 0) {
    if (newHealth < HEALTH_WARNING) newHealth = HEALTH_WARNING;
    issues += "Temperature out of range; ";
  }

  // Check 5: Sensor data freshness
  unsigned long dataAge = millis() - lastTelemetryTime;
  if (lastTelemetryTime > 0 && dataAge > config.telemetryIntervalMs * 3) {
    if (newHealth < HEALTH_WARNING) newHealth = HEALTH_WARNING;
    issues += "Stale sensor data; ";
  }

  // Log health check results
  if (newHealth != currentHealth) {
    logMessageF("HEALTH", "Status changed: %s -> %s",
                getHealthName(currentHealth).c_str(),
                getHealthName(newHealth).c_str());
    if (issues.length() > 0) {
      logMessageF("HEALTH", "Issues: %s", issues.c_str());
    }

    // Update device state based on health
    if (newHealth == HEALTH_CRITICAL && currentState == STATE_ACTIVE) {
      currentState = STATE_DEGRADED;
      shadow.reported.state = STATE_DEGRADED;
    } else if (newHealth == HEALTH_GOOD && currentState == STATE_DEGRADED) {
      currentState = STATE_ACTIVE;
      shadow.reported.state = STATE_ACTIVE;
    }
  }

  currentHealth = newHealth;
  shadow.reported.health = newHealth;
  lastHealthCheckTime = millis();

  return newHealth;
}

/**
 * Print detailed health report
 */
void printHealthReport() {
  int freeHeap = ESP.getFreeHeap();
  int totalHeap = ESP.getHeapSize();
  int heapUsagePercent = 100 - (freeHeap * 100 / totalHeap);

  Serial.println("\n╔══════════════════════════════════════════════════════════════╗");
  Serial.println("║                    DEVICE HEALTH REPORT                       ║");
  Serial.println("╠══════════════════════════════════════════════════════════════╣");
  Serial.printf("║  Overall Health:    %s                                        \n", getHealthName(currentHealth).c_str());
  Serial.printf("║  Device State:      %s                                        \n", getStateName(currentState).c_str());
  Serial.printf("║  Uptime:            %s                                        \n", formatUptime(millis() - bootTime).c_str());
  Serial.println("╠══════════════════════════════════════════════════════════════╣");
  Serial.printf("║  Free Heap:         %d bytes (%d%% used)                      \n", freeHeap, heapUsagePercent);
  Serial.printf("║  Battery Level:     %d%%                                      \n", simulatedBattery);
  Serial.printf("║  WiFi RSSI:         %d dBm                                    \n", -55 + random(-10, 10));
  Serial.printf("║  Temperature:       %.1f °C                                   \n", simulatedTemperature);
  Serial.printf("║  Humidity:          %.1f %%                                   \n", simulatedHumidity);
  Serial.println("╠══════════════════════════════════════════════════════════════╣");
  Serial.printf("║  Heartbeats Sent:   %d                                        \n", heartbeatsSent);
  Serial.printf("║  Telemetry Sent:    %d                                        \n", telemetrySent);
  Serial.printf("║  Commands Executed: %d                                        \n", commandsExecuted);
  Serial.printf("║  Reboot Count:      %d                                        \n", rebootCount);
  Serial.println("╚══════════════════════════════════════════════════════════════╝\n");
}

// =============================================================================
// TELEMETRY AND DATA REPORTING
// =============================================================================

/**
 * Read sensor data (simulated for this example)
 * In production, this would read actual I2C/SPI sensors
 */
TelemetryData readSensors() {
  TelemetryData data;

  // Simulate realistic sensor variations
  simulatedTemperature += random(-10, 10) / 10.0;
  simulatedTemperature = constrain(simulatedTemperature, 15.0, 35.0);

  simulatedHumidity += random(-20, 20) / 10.0;
  simulatedHumidity = constrain(simulatedHumidity, 30.0, 70.0);

  // Simulate slow battery drain
  if (random(0, 100) < 5) {
    simulatedBattery = max(0, simulatedBattery - 1);
  }

  // Apply calibration offsets from config
  data.temperature = simulatedTemperature + config.temperatureOffsetC;
  data.humidity = simulatedHumidity + config.humidityOffsetPercent;
  data.batteryLevel = simulatedBattery;
  data.rssi = -55 + random(-10, 10);
  data.timestamp = millis();
  data.sampleCount = telemetrySent + 1;

  return data;
}

/**
 * Send telemetry data to cloud
 */
void sendTelemetry() {
  TelemetryData data = readSensors();

  // Build telemetry JSON
  StaticJsonDocument<512> telemetry;

  telemetry["deviceId"] = deviceId;
  telemetry["type"] = "telemetry";
  telemetry["timestamp"] = data.timestamp;
  telemetry["sequence"] = telemetrySent;

  JsonObject sensors = telemetry.createNestedObject("sensors");
  sensors["temperature"]["value"] = data.temperature;
  sensors["temperature"]["unit"] = "celsius";
  sensors["humidity"]["value"] = data.humidity;
  sensors["humidity"]["unit"] = "percent";

  JsonObject device = telemetry.createNestedObject("device");
  device["battery"] = data.batteryLevel;
  device["rssi"] = data.rssi;
  device["uptime"] = millis() - bootTime;
  device["freeHeap"] = ESP.getFreeHeap();

  // Serialize and "send"
  String output;
  serializeJson(telemetry, output);

  logMessageF("TELEMETRY", "Sent #%d: Temp=%.1f°C, Humidity=%.1f%%, Battery=%d%%",
              telemetrySent, data.temperature, data.humidity, data.batteryLevel);

  // Update shadow with reported values
  shadow.reported.temperature = data.temperature;
  shadow.reported.humidity = data.humidity;
  shadow.reported.batteryLevel = data.batteryLevel;
  shadow.reported.wifiRssi = data.rssi;
  shadow.reported.lastTelemetryTime = data.timestamp;
  shadow.pendingSync = true;

  // Check for threshold alerts
  if (config.alertsEnabled) {
    if (data.temperature > config.alertTempHigh) {
      logMessageF("ALERT", "Temperature ABOVE threshold: %.1f°C > %.1f°C",
                  data.temperature, config.alertTempHigh);
    }
    if (data.temperature < config.alertTempLow) {
      logMessageF("ALERT", "Temperature BELOW threshold: %.1f°C < %.1f°C",
                  data.temperature, config.alertTempLow);
    }
  }

  telemetrySent++;
  lastTelemetryTime = millis();
}

// =============================================================================
// DEVICE SHADOW / TWIN MANAGEMENT
// =============================================================================

/**
 * Synchronize device shadow with cloud
 * This implements the AWS IoT Device Shadow / Azure Device Twin pattern
 */
void syncShadow() {
  // Update reported state
  shadow.reported.uptime = millis() - bootTime;
  shadow.reported.freeHeap = ESP.getFreeHeap();
  shadow.reported.state = currentState;
  shadow.reported.health = currentHealth;
  shadow.reported.configVersion = config.version;

  // Build shadow document
  StaticJsonDocument<1024> shadowDoc;

  // Reported section (device -> cloud)
  JsonObject reported = shadowDoc.createNestedObject("state").createNestedObject("reported");
  reported["temperature"] = shadow.reported.temperature;
  reported["humidity"] = shadow.reported.humidity;
  reported["batteryLevel"] = shadow.reported.batteryLevel;
  reported["state"] = getStateName(shadow.reported.state);
  reported["health"] = getHealthName(shadow.reported.health);
  reported["uptime"] = shadow.reported.uptime;
  reported["freeHeap"] = shadow.reported.freeHeap;
  reported["wifiRssi"] = shadow.reported.wifiRssi;
  reported["firmwareVersion"] = shadow.reported.firmwareVersion;
  reported["rebootCount"] = shadow.reported.rebootCount;
  reported["configVersion"] = shadow.reported.configVersion;

  // Desired section (cloud -> device) - showing current desired state
  JsonObject desired = shadowDoc["state"].createNestedObject("desired");
  desired["telemetryInterval"] = shadow.desired.telemetryInterval;
  desired["ledEnabled"] = shadow.desired.ledEnabled;
  desired["tempThresholdHigh"] = shadow.desired.tempThresholdHigh;
  desired["tempThresholdLow"] = shadow.desired.tempThresholdLow;
  desired["configVersion"] = shadow.desired.configVersion;
  desired["maintenanceMode"] = shadow.desired.maintenanceMode;

  // Metadata
  shadowDoc["metadata"]["lastSync"] = millis();
  shadowDoc["version"] = shadowSyncs + 1;

  // Serialize and log
  String output;
  serializeJsonPretty(shadowDoc, output);

  logMessage("SHADOW", "Shadow synchronized:");
  Serial.println(output);

  shadow.lastSyncTime = millis();
  shadow.pendingSync = false;
  shadowSyncs++;
  lastShadowSyncTime = millis();
}

/**
 * Process shadow delta (differences between desired and reported)
 * This is called when cloud updates the desired state
 */
void processShadowDelta() {
  logMessage("SHADOW", "Processing shadow delta...");

  // Check telemetry interval change
  if (shadow.desired.telemetryInterval != config.telemetryIntervalMs) {
    logMessageF("SHADOW", "Telemetry interval changed: %d -> %d ms",
                config.telemetryIntervalMs, shadow.desired.telemetryInterval);
    config.telemetryIntervalMs = shadow.desired.telemetryInterval;
    saveConfig();
  }

  // Check LED state change
  if (shadow.desired.ledEnabled) {
    digitalWrite(LED_PIN, HIGH);
  } else {
    digitalWrite(LED_PIN, LOW);
  }

  // Check threshold changes
  if (shadow.desired.tempThresholdHigh != config.alertTempHigh) {
    config.alertTempHigh = shadow.desired.tempThresholdHigh;
    logMessageF("SHADOW", "High temp threshold updated: %.1f°C", config.alertTempHigh);
  }
  if (shadow.desired.tempThresholdLow != config.alertTempLow) {
    config.alertTempLow = shadow.desired.tempThresholdLow;
    logMessageF("SHADOW", "Low temp threshold updated: %.1f°C", config.alertTempLow);
  }

  // Check maintenance mode
  if (shadow.desired.maintenanceMode && currentState == STATE_ACTIVE) {
    currentState = STATE_MAINTENANCE;
    shadow.reported.state = STATE_MAINTENANCE;
    logMessage("SHADOW", "Entering maintenance mode");
  } else if (!shadow.desired.maintenanceMode && currentState == STATE_MAINTENANCE) {
    currentState = STATE_ACTIVE;
    shadow.reported.state = STATE_ACTIVE;
    logMessage("SHADOW", "Exiting maintenance mode");
  }

  // Mark for sync to update reported state
  shadow.pendingSync = true;
}

// =============================================================================
// COMMAND AND CONTROL
// =============================================================================

/**
 * Parse command type from string
 */
CommandType parseCommandType(const String& cmd) {
  if (cmd == "reboot") return CMD_REBOOT;
  if (cmd == "factory_reset") return CMD_FACTORY_RESET;
  if (cmd == "update_config") return CMD_UPDATE_CONFIG;
  if (cmd == "set_telemetry_interval") return CMD_SET_TELEMETRY_INTERVAL;
  if (cmd == "run_diagnostics") return CMD_RUN_DIAGNOSTICS;
  if (cmd == "enter_maintenance") return CMD_ENTER_MAINTENANCE;
  if (cmd == "exit_maintenance") return CMD_EXIT_MAINTENANCE;
  if (cmd == "blink_led") return CMD_BLINK_LED;
  if (cmd == "read_sensor") return CMD_READ_SENSOR;
  return CMD_UNKNOWN;
}

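/**
 * Add command to queue (the command is dropped if the queue is full)
 */
void queueCommand(int id, const String& type, const String& payload) {
  // Ring buffer keeps one slot free so that head == tail always means "empty"
  int nextTail = (commandQueueTail + 1) % MAX_COMMAND_QUEUE;
  if (nextTail == commandQueueHead) {
    logMessageF("COMMAND", "Queue full - dropping command #%d: %s", id, type.c_str());
    return;
  }

  Command cmd;
  cmd.id = id;
  cmd.type = parseCommandType(type);
  cmd.payload = payload;
  cmd.timestamp = millis();
  cmd.acknowledged = false;

  commandQueue[commandQueueTail] = cmd;
  commandQueueTail = nextTail;
  commandsReceived++;

  logMessageF("COMMAND", "Queued command #%d: %s", id, type.c_str());
}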

/**
 * Execute a single command
 */
void executeCommand(Command& cmd) {
  logMessageF("COMMAND", "Executing command #%d: type=%d", cmd.id, cmd.type);

  switch (cmd.type) {
    case CMD_REBOOT:
      logMessage("COMMAND", "Reboot requested - will reboot in 2 seconds");
      delay(2000);
      ESP.restart();
      break;

    case CMD_FACTORY_RESET:
      logMessage("COMMAND", "Factory reset requested");
      initDefaultConfig();  // Restore every field: version, intervals, offsets, alert thresholds
      saveConfig();
      logMessage("COMMAND", "Configuration reset to factory defaults");
      printCurrentConfig();
      break;

    case CMD_UPDATE_CONFIG:
      logMessage("COMMAND", "Configuration update received");
      config.version++;
      saveConfig();
      configUpdates++;
      printCurrentConfig();
      break;

    case CMD_SET_TELEMETRY_INTERVAL:
      {
        int newInterval = cmd.payload.toInt();
        if (newInterval >= 1000 && newInterval <= 300000) {
          config.telemetryIntervalMs = newInterval;
          shadow.desired.telemetryInterval = newInterval;
          saveConfig();
          logMessageF("COMMAND", "Telemetry interval set to %d ms", newInterval);
        } else {
          logMessage("COMMAND", "Invalid telemetry interval (must be 1000-300000 ms)");
        }
      }
      break;

    case CMD_RUN_DIAGNOSTICS:
      logMessage("COMMAND", "Running diagnostics...");
      printHealthReport();
      printCurrentConfig();
      break;

    case CMD_ENTER_MAINTENANCE:
      currentState = STATE_MAINTENANCE;
      shadow.reported.state = STATE_MAINTENANCE;
      shadow.desired.maintenanceMode = true;
      logMessage("COMMAND", "Entered maintenance mode");
      break;

    case CMD_EXIT_MAINTENANCE:
      currentState = STATE_ACTIVE;
      shadow.reported.state = STATE_ACTIVE;
      shadow.desired.maintenanceMode = false;
      logMessage("COMMAND", "Exited maintenance mode");
      break;

    case CMD_BLINK_LED:
      logMessage("COMMAND", "Blinking LED...");
      for (int i = 0; i < 5; i++) {
        digitalWrite(LED_PIN, HIGH);
        delay(200);
        digitalWrite(LED_PIN, LOW);
        delay(200);
      }
      break;

    case CMD_READ_SENSOR:
      {
        TelemetryData data = readSensors();
        logMessageF("COMMAND", "Sensor reading: Temp=%.1f°C, Humidity=%.1f%%",
                    data.temperature, data.humidity);
      }
      break;

    default:
      logMessageF("COMMAND", "Unknown command type: %d", cmd.type);
      break;
  }

  cmd.acknowledged = true;
  commandsExecuted++;
  shadow.pendingSync = true;
}

/**
 * Process all queued commands
 */
void processCommandQueue() {
  while (commandQueueHead != commandQueueTail) {
    Command& cmd = commandQueue[commandQueueHead];
    if (!cmd.acknowledged) {
      executeCommand(cmd);
    }
    commandQueueHead = (commandQueueHead + 1) % MAX_COMMAND_QUEUE;
  }
}

/**
 * Simulate receiving commands from cloud
 * In production, these would come via MQTT subscriptions
 */
void simulateIncomingCommands() {
  // Randomly simulate incoming commands for demonstration
  static unsigned long lastCommandTime = 0;
  static int commandIdCounter = 1;

  if (millis() - lastCommandTime > 45000) {  // Every 45 seconds
    int cmdType = random(0, 5);

    switch (cmdType) {
      case 0:
        queueCommand(commandIdCounter++, "run_diagnostics", "");
        break;
      case 1:
        queueCommand(commandIdCounter++, "read_sensor", "");
        break;
      case 2:
        queueCommand(commandIdCounter++, "blink_led", "");
        break;
      case 3:
        // Simulate config change via shadow
        shadow.desired.telemetryInterval = random(2, 6) * 10000;
        processShadowDelta();
        break;
      case 4:
        // Simulate threshold change
        shadow.desired.tempThresholdHigh = 25.0 + random(0, 15);
        processShadowDelta();
        break;
    }

    lastCommandTime = millis();
  }
}

// =============================================================================
// CONFIGURATION MANAGEMENT
// =============================================================================

/**
 * Check for configuration updates from cloud
 * In production, this would poll or subscribe to config topics
 */
void checkConfigUpdates() {
  // Simulate occasional config updates
  static int configCheckCount = 0;
  configCheckCount++;

  // Every 5th check, simulate a config update
  if (configCheckCount % 5 == 0) {
    logMessage("CONFIG", "Checking for configuration updates...");

    // Simulate receiving new config from cloud
    int cloudConfigVersion = config.version + 1;

    if (cloudConfigVersion > config.version) {
      logMessageF("CONFIG", "New configuration available: v%d -> v%d",
                  config.version, cloudConfigVersion);

      // Simulate applying new config
      config.version = cloudConfigVersion;

      // Random config changes for demonstration
      if (random(0, 2) == 0) {
        config.alertTempHigh = 25.0 + random(0, 10);
        logMessageF("CONFIG", "Updated high temp threshold: %.1f°C", config.alertTempHigh);
      }
      if (random(0, 2) == 0) {
        config.alertTempLow = 10.0 + random(0, 10);
        logMessageF("CONFIG", "Updated low temp threshold: %.1f°C", config.alertTempLow);
      }

      saveConfig();
      configUpdates++;
      shadow.reported.configVersion = config.version;
      shadow.pendingSync = true;

      printCurrentConfig();
    } else {
      logMessage("CONFIG", "Configuration is up to date");
    }
  }

  lastConfigCheckTime = millis();
}

// =============================================================================
// MAIN SETUP AND LOOP
// =============================================================================

/**
 * Print startup banner
 */
void printBanner() {
  Serial.println("\n");
  Serial.println("╔══════════════════════════════════════════════════════════════╗");
  Serial.println("║     IoT DEVICE MANAGEMENT PLATFORM SIMULATION                ║");
  Serial.println("║                                                              ║");
  Serial.println("║  Demonstrating production device management concepts:        ║");
  Serial.println("║  - Device registration and provisioning                      ║");
  Serial.println("║  - Heartbeat and health monitoring                           ║");
  Serial.println("║  - Configuration management                                  ║");
  Serial.println("║  - Command and control patterns                              ║");
  Serial.println("║  - Device shadow/twin concepts                               ║");
  Serial.println("╠══════════════════════════════════════════════════════════════╣");
  // Left-justify each value in a 41-character field so the right border lines up
  Serial.printf("║  Firmware Version:  %-41s║\n", FIRMWARE_VERSION);
  Serial.printf("║  Hardware Version:  %-41s║\n", HARDWARE_VERSION);
  Serial.printf("║  Device Type:       %-41s║\n", DEVICE_TYPE);
  Serial.println("╚══════════════════════════════════════════════════════════════╝\n");
}

/**
 * Print operational status summary
 */
void printStatusSummary() {
  Serial.println("\n━━━━━━━━━━━━━━━━━━ STATUS SUMMARY ━━━━━━━━━━━━━━━━━━━");
  Serial.printf("  State: %s | Health: %s | Uptime: %s\n",
                getStateName(currentState).c_str(),
                getHealthName(currentHealth).c_str(),
                formatUptime(millis() - bootTime).c_str());
  Serial.printf("  Heartbeats: %d | Telemetry: %d | Commands: %d | Shadow Syncs: %d\n",
                heartbeatsSent, telemetrySent, commandsExecuted, shadowSyncs);
  Serial.printf("  Config Version: %d | Battery: %d%% | Free Heap: %d bytes\n",
                config.version, simulatedBattery, ESP.getFreeHeap());
  Serial.println("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n");
}

/**
 * Arduino setup function
 */
void setup() {
  // Initialize serial communication
  Serial.begin(115200);
  while (!Serial) delay(10);

  bootTime = millis();

  // Initialize LED
  pinMode(LED_PIN, OUTPUT);
  digitalWrite(LED_PIN, LOW);

  // Print startup banner
  printBanner();

  // Initialize persistent storage
  initStorage();
  loadRebootCount();
  loadConfig();

  // Initialize default configuration
  initDefaultConfig();

  // Initialize device shadow
  initShadow();

  // Start device registration/provisioning
  logMessage("SYSTEM", "Starting device initialization...");

  if (registerDevice()) {
    provisionDevice();

    // Initial heartbeat
    sendHeartbeat();

    // Initial health check
    performHealthCheck();

    // Initial shadow sync
    syncShadow();

    logMessage("SYSTEM", "Device initialization complete!");
    Serial.println("\n");
    Serial.println("╔══════════════════════════════════════════════════════════════╗");
    Serial.println("║  Device is now ACTIVE and operational                        ║");
    Serial.println("║  Watch the serial monitor for device management events:      ║");
    Serial.println("║  - HEARTBEAT: Regular keep-alive messages                    ║");
    Serial.println("║  - TELEMETRY: Sensor data reports                            ║");
    Serial.println("║  - HEALTH: Device health checks                              ║");
    Serial.println("║  - SHADOW: Device twin synchronization                       ║");
    Serial.println("║  - COMMAND: Remote command execution                         ║");
    Serial.println("║  - CONFIG: Configuration updates                             ║");
    Serial.println("╚══════════════════════════════════════════════════════════════╝\n");
  } else {
    logMessage("SYSTEM", "ERROR: Device registration failed!");
    currentState = STATE_UNPROVISIONED;
  }
}

/**
 * Arduino main loop
 */
void loop() {
  unsigned long currentTime = millis();

  // Only run management tasks if device is provisioned
  if (currentState == STATE_UNPROVISIONED) {
    delay(1000);
    return;
  }

  // Task 1: Send heartbeat at regular intervals
  if (currentTime - lastHeartbeatTime >= HEARTBEAT_INTERVAL) {
    sendHeartbeat();
  }

  // Task 2: Send telemetry at configured interval
  if (currentTime - lastTelemetryTime >= config.telemetryIntervalMs) {
    sendTelemetry();
  }

  // Task 3: Perform health check
  if (currentTime - lastHealthCheckTime >= HEALTH_CHECK_INTERVAL) {
    performHealthCheck();
  }

  // Task 4: Sync device shadow
  if (currentTime - lastShadowSyncTime >= SHADOW_SYNC_INTERVAL || shadow.pendingSync) {
    syncShadow();
  }

  // Task 5: Check for configuration updates
  if (currentTime - lastConfigCheckTime >= CONFIG_CHECK_INTERVAL) {
    checkConfigUpdates();
  }

  // Task 6: Process command queue
  processCommandQueue();

  // Task 7: Simulate incoming commands (for demonstration)
  simulateIncomingCommands();

  // Task 8: Print status summary every minute
  static unsigned long lastSummaryTime = 0;
  if (currentTime - lastSummaryTime >= 60000) {
    printStatusSummary();
    lastSummaryTime = currentTime;
  }

  // Visual heartbeat indicator
  static unsigned long lastBlinkTime = 0;
  if (currentTime - lastBlinkTime >= 2000) {
    digitalWrite(LED_PIN, !digitalRead(LED_PIN));
    lastBlinkTime = currentTime;
  }

  // Small delay to prevent watchdog issues
  delay(10);
}
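
The command queue in the listing above is a fixed-size ring buffer indexed by commandQueueHead and commandQueueTail. The host-side sketch below (plain C++ with no Arduino headers, so it can be compiled on a desktop) isolates that data structure to show one way of telling "full" apart from "empty". The names RingQueue, push, and pop are hypothetical and not part of the lab code, and keeping one slot empty is only one of several common conventions.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-capacity ring buffer. N slots hold at most N - 1 items
// because one slot stays empty to distinguish "full" from "empty".
template <typename T, std::size_t N>
class RingQueue {
 public:
  bool empty() const { return head_ == tail_; }
  bool full() const { return (tail_ + 1) % N == head_; }

  // Returns false (and stores nothing) when the queue is full.
  bool push(const T& item) {
    if (full()) return false;
    slots_[tail_] = item;
    tail_ = (tail_ + 1) % N;
    return true;
  }

  // Copies the oldest item into `out`; returns false when the queue is empty.
  bool pop(T& out) {
    if (empty()) return false;
    out = slots_[head_];
    head_ = (head_ + 1) % N;
    return true;
  }

 private:
  std::array<T, N> slots_{};
  std::size_t head_ = 0;  // index of the oldest item
  std::size_t tail_ = 0;  // index of the next free slot
};
```

Rejecting a push when the buffer is full keeps older, unprocessed commands intact. Whether to drop the newest or the oldest command on overflow is a policy decision that depends on the application.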

156.3.2 Challenge Exercises

After running the simulation and observing the device management patterns, try these exercises to deepen your understanding:

Task: Change the heartbeat interval from 10 seconds to 5 seconds.

Steps:

  1. Find the HEARTBEAT_INTERVAL constant at the top of the code
  2. Change 10000 to 5000
  3. Observe how this affects the frequency of heartbeat messages

Learning Point: In production systems, heartbeat frequency is a tradeoff between responsiveness (detecting failures quickly) and bandwidth/power consumption.
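
The tradeoff can be put into rough numbers. The sketch below assumes the monitoring side flags a device as offline after a fixed number of consecutive missed heartbeats. That threshold, and the function names detectionDelayMs and heartbeatsPerDay, are illustrative assumptions rather than part of the lab code.

```cpp
#include <cstdint>

// Approximate time before a silent device is flagged offline, assuming the
// monitor raises an alert after `missedAllowed` consecutive missed heartbeats.
constexpr std::uint32_t detectionDelayMs(std::uint32_t intervalMs,
                                         std::uint32_t missedAllowed) {
  return intervalMs * missedAllowed;
}

// Number of heartbeat messages one device sends per day at this interval.
constexpr std::uint32_t heartbeatsPerDay(std::uint32_t intervalMs) {
  return 86400000u / intervalMs;  // milliseconds per day / interval
}
```

With an assumed threshold of 3 missed heartbeats, halving the interval from 10 s to 5 s cuts the detection delay from 30 s to 15 s but doubles the daily message count per device from 8,640 to 17,280.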

Task: Add a check for high humidity (above 80%) that triggers a WARNING status.

Steps:

  1. Find the performHealthCheck() function
  2. Add a new condition after the temperature check:
// Check humidity level
if (simulatedHumidity > 80.0) {
  if (newHealth < HEALTH_WARNING) newHealth = HEALTH_WARNING;
  issues += "High humidity; ";
}
  3. Run the simulation and observe when the humidity warning triggers

Learning Point: Health checks should cover all environmental and operational parameters that could affect device reliability.
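
As more parameters are added, the "only escalate, never downgrade" rule used in the humidity check is easy to factor into a helper. The following host-side sketch (plain C++, no Arduino headers) shows one possible shape. The enum values, the HealthResult struct, the report and evaluateHealth functions, and every threshold except the 80% humidity limit from the exercise are assumptions made for illustration.

```cpp
#include <string>

// Severity levels ordered so that a larger value means a worse state.
enum HealthLevel { HEALTH_OK = 0, HEALTH_WARNING = 1, HEALTH_CRITICAL = 2 };

struct HealthResult {
  HealthLevel level = HEALTH_OK;
  std::string issues;  // human-readable list of findings
};

// Raise the level only if the new finding is worse; always record the issue.
void report(HealthResult& r, HealthLevel found, const std::string& issue) {
  if (found > r.level) r.level = found;
  r.issues += issue + "; ";
}

// Thresholds are illustrative assumptions, apart from the 80% humidity
// limit taken from the exercise.
HealthResult evaluateHealth(float tempC, float humidityPct, int batteryPct) {
  HealthResult r;
  if (tempC > 35.0f) report(r, HEALTH_WARNING, "High temperature");
  if (humidityPct > 80.0f) report(r, HEALTH_WARNING, "High humidity");
  if (batteryPct < 10) {
    report(r, HEALTH_CRITICAL, "Battery critical");
  } else if (batteryPct < 20) {
    report(r, HEALTH_WARNING, "Battery low");
  }
  return r;
}
```

Because report never lowers the level, the order in which checks run does not change the final severity, only the order of the issue text.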

Task: Add a command to set the temperature alert threshold.

Steps:

  1. Add CMD_SET_TEMP_THRESHOLD to the CommandType enum
  2. Add the case to parseCommandType():
if (cmd == "set_temp_threshold") return CMD_SET_TEMP_THRESHOLD;
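  3. Add the execution case to executeCommand():
case CMD_SET_TEMP_THRESHOLD:
  {
    float newThreshold = cmd.payload.toFloat();
    if (newThreshold > 0 && newThreshold < 50) {
      config.alertTempHigh = newThreshold;
      saveConfig();  // persist so the threshold survives a reboot
      logMessageF("COMMAND", "Temperature threshold set to %.1f°C", newThreshold);
    } else {
      logMessage("COMMAND", "Invalid temperature threshold (must be between 0 and 50 °C)");
    }
  }
  break;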

Learning Point: Command frameworks should be extensible to support new device capabilities without major refactoring.
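
One way to get that extensibility is to replace the enum-plus-switch pair with a lookup table, so that adding a command means registering one handler. The host-side sketch below (plain C++, no Arduino headers) illustrates the idea. CommandRegistry, CommandHandler, add, and dispatch are hypothetical names, and this is an alternative design, not how the lab code is organized.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical handler type: receives the payload string and returns true
// when the command was applied.
using CommandHandler = std::function<bool(const std::string& payload)>;

class CommandRegistry {
 public:
  // Register (or replace) the handler for a command name.
  void add(const std::string& name, CommandHandler handler) {
    handlers_[name] = std::move(handler);
  }

  // Returns false for unknown commands or when the handler rejects the payload.
  bool dispatch(const std::string& name, const std::string& payload) const {
    auto it = handlers_.find(name);
    if (it == handlers_.end()) return false;
    return it->second(payload);
  }

 private:
  std::map<std::string, CommandHandler> handlers_;
};
```

A handler for set_temp_threshold would then be registered once at startup instead of touching both parseCommandType() and executeCommand(). On a memory-constrained device, a static array of name and function-pointer pairs may be preferable to std::map and std::function, which allocate from the heap.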

Scenario: A device has been offline for 2 hours. During the offline period, both the cloud and the device made changes that now conflict. Implement delta conflict resolution.

State Before Disconnect:

  • Cloud desired: {"telemetryInterval": 30, "ledEnabled": true, "tempThreshold": 25}
  • Device reported: {"telemetryInterval": 30, "ledEnabled": true, "tempThreshold": 25}
  • Synchronized

During 2-Hour Offline Period:

Cloud Changes (user updates via app):

  • Set telemetryInterval = 60 (slower updates to save bandwidth)
  • Set tempThreshold = 30 (adjust alert sensitivity)

Device Changes (local edge logic):

  • Set ledEnabled = false (battery saver mode triggered at 15%)
  • Set telemetryInterval = 120 (battery saver increases interval)

Reconnection - Conflicting State:

| Parameter | Cloud Desired | Device Reported | Conflict? |
|---|---|---|---|
| telemetryInterval | 60 | 120 | ✓ YES (both changed) |
| ledEnabled | true | false | ✓ YES (device changed, cloud unchanged) |
| tempThreshold | 30 | 25 | ✓ YES (cloud changed, device unchanged) |

Conflict Resolution Policy:

// applyValue(), logConflict(), and battery_level are assumed to be defined
// elsewhere in the sketch. Boolean parameters are passed as 0/1.
void resolveConflict(const String& param, int cloudValue, int deviceValue) {
    if (param == "telemetryInterval") {
        // Take MAX (favor bandwidth conservation)
        int resolved = max(cloudValue, deviceValue);  // 120
        applyValue(param, resolved);
        logConflict(param, cloudValue, deviceValue, resolved, "MAX policy");
    }
    else if (param == "ledEnabled") {
        // Device battery saver overrides cloud preference
        if (battery_level < 20) {
            applyValue(param, false);  // Device wins
            logConflict(param, cloudValue, deviceValue, false, "Battery saver override");
        } else {
            applyValue(param, cloudValue);  // Cloud wins
            logConflict(param, cloudValue, deviceValue, cloudValue, "Cloud authority");
        }
    }
    else if (param == "tempThreshold") {
        // Cloud always wins for configuration parameters
        applyValue(param, cloudValue);  // 30
        logConflict(param, cloudValue, deviceValue, cloudValue, "Cloud authority");
    }
}

Resolution Result:

| Parameter | Resolution Strategy | Final Value | Rationale |
|---|---|---|---|
| telemetryInterval | Take maximum (120) | 120 seconds | Favors bandwidth/battery conservation |
| ledEnabled | Device overrides if battery <20% | false | Safety: battery preservation critical |
| tempThreshold | Cloud wins | 30°C | User preference trumps default |

Synchronization Message:

{
  "state": {
    "reported": {
      "telemetryInterval": 120,
      "ledEnabled": false,
      "tempThreshold": 30,
      "conflictsResolved": 3,
      "lastSync": 1673924800
    }
  }
}

Key Insight: Shadow conflict resolution needs policy-driven logic. Safety-critical parameters (battery saver, hardware temperature limits) should have device authority; user preferences (alert thresholds, schedules) should have cloud authority. Always log conflicts for debugging.
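
The same rules can be expressed as data rather than as a chain of if/else branches. The host-side sketch below (plain C++, no Arduino headers) separates "which policy applies" from "what the policy does". The Policy enum, the Resolution struct, and the functions policyFor and resolve are hypothetical names. Defaulting unlisted parameters to cloud authority is an assumption, and booleans are passed as 0/1.

```cpp
#include <algorithm>
#include <string>

// Hypothetical per-parameter policies mirroring the scenario.
enum class Policy { CloudWins, DeviceWins, TakeMax };

struct Resolution {
  int value;           // value to apply and report
  const char* reason;  // text for the conflict log
};

// Choose a policy from the parameter name and current battery level.
// Unlisted parameters default to cloud authority (an assumption).
Policy policyFor(const std::string& param, int batteryPct) {
  if (param == "telemetryInterval") return Policy::TakeMax;
  if (param == "ledEnabled" && batteryPct < 20) return Policy::DeviceWins;
  return Policy::CloudWins;
}

// Apply the chosen policy to the two competing values.
Resolution resolve(Policy policy, int cloudValue, int deviceValue) {
  switch (policy) {
    case Policy::TakeMax:
      return {std::max(cloudValue, deviceValue), "MAX policy"};
    case Policy::DeviceWins:
      return {deviceValue, "Device authority"};
    case Policy::CloudWins:
    default:
      return {cloudValue, "Cloud authority"};
  }
}
```

For the scenario values at 15% battery, this yields 120 for telemetryInterval, 0 (false) for ledEnabled, and 30 for tempThreshold, matching the resolution table.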

| Scenario | Update Frequency | Battery Impact | Use Case |
|---|---|---|---|
| Real-time monitoring | Every 5-10 seconds | High (hours of battery life) | Live dashboards, operator control panels |
| Operational monitoring | Every 60-300 seconds | Moderate (days-weeks) | HVAC systems, industrial sensors |
| Periodic reporting | Every 15-60 minutes | Low (months-years) | Environmental sensors, asset tracking |
| Event-driven only | On change + daily heartbeat | Minimal (years) | Door sensors, motion detectors |

Rule: Update shadow only when state changes or after max interval (heartbeat). Never poll continuously.
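
This rule reduces to a single predicate, which mirrors the Task 4 condition in loop(). The sketch below is a minimal host-side version in plain C++. The name shouldSyncShadow and its parameters are hypothetical, and the tick values are assumed to come from a 32-bit millisecond counter such as Arduino's millis().

```cpp
#include <cstdint>

// Decide whether to publish the shadow now. Times are millisecond tick
// counts; unsigned subtraction stays correct across the 32-bit rollover
// (roughly every 49.7 days).
bool shouldSyncShadow(bool stateChanged, std::uint32_t nowMs,
                      std::uint32_t lastSyncMs, std::uint32_t maxIntervalMs) {
  if (stateChanged) return true;  // event-driven update
  // Otherwise sync only once the maximum quiet interval has elapsed.
  return static_cast<std::uint32_t>(nowMs - lastSyncMs) >= maxIntervalMs;
}
```

Comparing the elapsed time (now minus last) rather than absolute timestamps is what keeps the check valid when the counter wraps around.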

Common Mistake: Storing Telemetry Data in Device Shadow

The Mistake: An engineer uses the device shadow to store time-series sensor data: {"temperature": [22.1, 22.3, 22.2, 22.4, ...]}. The shadow document grows to 50 KB, causing out-of-memory errors on the ESP32 and expensive cloud shadow storage.

Why It’s Wrong:

  1. Device shadows are for state, not time-series data. State is configuration and status (what the device should do, what the device is doing). Telemetry is measurements over time.
  2. Shadow size limits: AWS IoT Device Shadow documents are limited to 8 KB. Storing 100 temperature readings at about 50 bytes each already takes 5 KB, which approaches that limit.
  3. Memory usage: The ESP32 has 520 KB of SRAM. Parsing a 50 KB JSON document can use 100+ KB of heap, leaving little room for application logic.

The Fix: Use separate channels:

  • Device Shadow: Store the latest state only: {"temperature": 22.4, "lastUpdate": 1673924800}
  • Telemetry Topic: Send time-series data to device/123/telemetry for ingestion into a time-series DB (InfluxDB, TimescaleDB)

Best Practice: Shadow document should be <2 KB (latest values + metadata). Historical data goes to purpose-built time-series storage.
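
A size check before publishing makes this budget enforceable. The host-side sketch below (plain C++, no Arduino headers) builds a latest-value-only shadow payload and tests it against the 2 KB guideline. The names kShadowBudgetBytes, buildShadowJson, and fitsShadowBudget are hypothetical. The payload layout follows the state/reported structure used earlier in this chapter, and a real sketch would more likely use a JSON library than snprintf.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Budget from the best practice: keep the shadow document under 2 KB.
constexpr std::size_t kShadowBudgetBytes = 2048;

// Build a shadow payload that carries only the latest reading and timestamp.
std::string buildShadowJson(float temperatureC, unsigned long lastUpdate) {
  char buf[128];
  std::snprintf(buf, sizeof(buf),
                "{\"state\":{\"reported\":{\"temperature\":%.1f,"
                "\"lastUpdate\":%lu}}}",
                temperatureC, lastUpdate);
  return std::string(buf);
}

// True when the document is small enough to publish as a shadow update.
bool fitsShadowBudget(const std::string& json) {
  return json.size() < kShadowBudgetBytes;
}
```

A payload that fails the check is a sign that history is leaking into the shadow and should be routed to the telemetry topic instead.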

Key Takeaway

Production device management requires five integrated systems working in concert: device provisioning (registration and identity), heartbeat monitoring (liveness detection), device shadows (reported/desired state synchronization), command execution (remote control with acknowledgment), and configuration management (versioned updates with rollback). Missing any one of these systems creates operational blind spots that compound at scale.

Managing IoT devices in a factory is like being the coach of a HUGE sports team with thousands of players!

156.3.3 The Sensor Squad Adventure: Coach Max’s Big Team

Max the Microcontroller was SO excited – he had just been promoted to coach of the biggest sensor team in the whole Smart City!

“Okay team, roll call!” Max announced. But there were SO many sensors, he could not keep track!

“I know!” said Bella the Battery. “We need a CHECK-IN system! Every player sends a heartbeat signal saying ‘I’m here and I’m okay!’”

Sammy the Sensor raised a hand. “What if I feel sick? My readings are getting wonky!”

Lila the LED blinked thoughtfully. “We need a HEALTH CHECK! Like going to the nurse’s office. Coach Max checks everyone’s temperature, battery level, and signal strength!”

Max set up three amazing systems:

  1. Roll Call (Heartbeat): Every 10 seconds, each sensor shouts “I’m alive!” If Max does not hear from someone three times in a row – ALERT! Send help!
  2. Health Report (Device Shadow): Each sensor has a digital twin – like a report card that shows what the sensor IS doing versus what it SHOULD be doing
  3. Coach’s Orders (Commands): Max can send instructions to any sensor: “Hey Sammy, start checking temperature every 5 seconds instead of 30!”

One day, Sammy’s battery got low. The health check caught it right away: “Sammy is at 15% battery – switching to power-save mode!” Sammy slowed down to conserve energy until Bella could be recharged.

“That’s device management!” cheered Lila. “Keeping track of thousands of teammates and making sure everyone is healthy and doing their job!”

156.3.4 Key Words for Kids

| Word | What It Means |
|---|---|
| Heartbeat | A regular “I’m alive!” signal, like raising your hand during roll call |
| Device Shadow | A digital copy of what a device is doing – like a report card |
| OTA Update | Updating a device’s brain over the air, like downloading an app update on your tablet |
| Provisioning | Setting up a new device for the first time, like registering a new student at school |

156.3.5 Try This at Home!

Play the Device Manager Game!

  1. Gather 5-10 toys or objects – these are your “IoT devices”
  2. Give each one a name tag and a health card (piece of paper with: battery level, status, last check-in time)
  3. Set a timer for 10 seconds – each device must “check in” (you tap it)
  4. If you miss a check-in, mark that device as “offline” and investigate!
  5. Try sending “commands” – “Teddy Bear, switch to sleep mode!” and update the health card

You are now an IoT Device Manager!

156.4 Concept Relationships

| Core Concept | Builds On | Enables | Contrasts With |
|---|---|---|---|
| Device Shadow/Twin | Reported vs desired state model, delta sync | Offline device configuration, eventual consistency | Direct device polling, synchronous commands |
| Heartbeat Monitoring | Periodic keep-alive signals, timeout detection | Liveness detection, offline alerts | Continuous polling, connection testing |
| OTA Update with Rollback | A/B partition scheme, health validation | Safe firmware updates at scale | Single-partition updates, manual recovery |
| Command-and-Control Pattern | Message queue, acknowledgment protocol | Remote device actuation, two-way communication | One-way telemetry, fire-and-forget commands |
| Graceful Degradation | Local fallback logic, store-and-forward | Network resilience, autonomous operation | Cloud-dependent architecture, fail-stop behavior |

156.5 See Also

Related Topics:

  • MQTT Protocol - MQTT for device shadows
  • Edge-Fog Computing - Local processing for graceful degradation
  • Security Threats - Securing device commands and OTA updates

Chapter Navigation
  1. Production Architecture Management - Framework overview, architecture components
  2. Production Case Studies - Worked examples and deployment pitfalls
  3. Device Management Lab (this page) - Hands-on ESP32 lab
  4. Production Resources - Quiz, summaries, visual galleries

156.6 What’s Next?

Continue to Production Resources for the comprehensive review quiz, chapter summary, and visual reference gallery.


Common Pitfalls

Running lab exercises with 10 simulated devices when production will have 10,000. Architectural issues (database connection pool exhaustion, broker queue overflow, load balancer timeout) only manifest at scale. Size lab tests to at least 10% of the production target.

Completing lab exercises without simulating network disconnection between IoT devices and the cloud. IoT systems must buffer data and reconnect gracefully when connectivity returns. Test partition recovery explicitly in every lab environment.
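
A bounded store-and-forward buffer is one common way to meet this requirement. The host-side sketch below (plain C++, no Arduino headers) queues messages while offline and flushes them oldest-first on reconnect. OfflineBuffer and its methods are hypothetical names, the send callback stands in for whatever publish call the device uses, and dropping the oldest entry when full is an assumed policy.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical bounded store-and-forward buffer. When full, the oldest
// message is discarded so the newest data survives a long outage.
class OfflineBuffer {
 public:
  explicit OfflineBuffer(std::size_t capacity) : capacity_(capacity) {}

  void store(const std::string& message) {
    if (capacity_ == 0) return;
    if (pending_.size() == capacity_) {
      pending_.pop_front();
      ++dropped_;
    }
    pending_.push_back(message);
  }

  // Sends pending messages oldest-first through `send`; stops at the first
  // failure so unsent messages stay queued. Returns how many were sent.
  template <typename SendFn>
  std::size_t flush(SendFn send) {
    std::size_t sent = 0;
    while (!pending_.empty() && send(pending_.front())) {
      pending_.pop_front();
      ++sent;
    }
    return sent;
  }

  std::size_t size() const { return pending_.size(); }
  std::size_t dropped() const { return dropped_; }

 private:
  std::size_t capacity_;
  std::size_t dropped_ = 0;
  std::deque<std::string> pending_;
};
```

A lab test for partition recovery can then assert that size() returns to zero after reconnection and that dropped() stays within an acceptable bound for the outage length being simulated.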

Prioritizing functionality over security in lab exercises to save time. Security misconfigurations discovered in labs are cheap to fix; those discovered in production (after a breach or audit) are not. Always include authentication, authorization, and encryption validation in lab testing.

Leaving lab cloud resources (EC2 instances, IoT device registrations, MQTT connections) running after the lab ends. Orphaned resources accumulate charges and may interfere with subsequent exercises. Define explicit teardown steps at the end of every lab.