4 Integration Testing for IoT Systems
4.1 Learning Objectives
By the end of this chapter, you will be able to:
- Test Hardware-Software Interfaces: Validate GPIO, I2C, SPI, and ADC interactions
- Test Protocol Implementations: Verify MQTT, CoAP, HTTP, and BLE protocol behavior
- Test Cloud Integration: Validate end-to-end device-to-cloud data flows
- Design Integration Test Strategies: Create comprehensive test plans for IoT subsystems
For Beginners: Integration Testing for IoT Systems
Integration testing verifies that different parts of your IoT system work correctly together. Think of it like a rehearsal where all the musicians play together for the first time – individual practice sounded fine, but playing together reveals timing and coordination issues. Integration tests catch problems at the boundaries between components.
Sensor Squad: Playing Together
“Unit tests check that each of us works alone,” said Max the Microcontroller. “But integration tests check that we work together! Can Sammy’s sensor data reach me correctly over the I2C bus? Does my MQTT message actually arrive at the cloud broker? Do Lila’s LED patterns respond correctly to my commands?”
Sammy the Sensor gave an example. “My I2C driver passes all unit tests – it sends the right bytes in the right order. But when we connect it to Max’s I2C bus with a real pull-up resistor, the timing is slightly different. Integration testing catches these interface problems that unit tests miss.”
Lila the LED described protocol testing. “We send an MQTT message with a specific payload, then verify the broker received it correctly, forwarded it to the subscriber, and the subscriber parsed it into the right data structure. That end-to-end chain has many potential failure points.” Bella the Battery summarized the principle. “If two components communicate, test their communication. If timing matters between components, test real timing. Integration tests live in the middle of the testing pyramid – more realistic than unit tests but faster than full system tests.”
4.2 Prerequisites
Before diving into this chapter, you should be familiar with:
- Testing Fundamentals: Understanding the testing pyramid
- Unit Testing Firmware: Mocking and unit test concepts
Key Takeaway
In one sentence: Integration tests validate that modules work together correctly, catching interface bugs that unit tests miss.
Remember this rule: If two modules communicate, test their interface. If timing matters, test real timing. If state persists, test state consistency.
4.3 Hardware-Software Integration
Integration tests bridge unit tests (logic only) and end-to-end tests (full system).
Testing hardware-software interfaces requires real hardware or high-fidelity simulators:
4.3.1 Key Integration Test Categories
| Category | What It Tests | Tools |
|---|---|---|
| GPIO | Pin state, timing, interrupts | Logic analyzer, oscilloscope |
| I2C/SPI | Sensor/peripheral communication | Protocol analyzer (Saleae, Bus Pirate) |
| ADC | Analog-to-digital conversion accuracy | Signal generator, calibrated reference |
| UART | Serial communication, parsing | Serial monitor, terminal emulator |
| Wi-Fi | Connection, reconnection, roaming | Network emulator, access point |
4.3.2 Example: Testing GPIO-based LED Control
// test_integration_gpio.c
#include "test_framework.h"
#include "gpio_driver.h"
#include "logic_analyzer.h" // Interface to external tool
void test_led_toggle_timing(void) {
// Arrange: Configure LED pin and start logic analyzer capture
gpio_configure(LED_PIN, GPIO_OUTPUT);
logic_analyzer_start_capture(LED_PIN);
// Act: Toggle LED with 100ms period
for (int i = 0; i < 10; i++) {
gpio_write(LED_PIN, HIGH);
delay_ms(50);
gpio_write(LED_PIN, LOW);
delay_ms(50);
}
// Assert: Verify timing via logic analyzer
TimingResult result = logic_analyzer_analyze_period(LED_PIN);
// Allow 10% tolerance for timing
TEST_ASSERT_WITHIN(10, 100, result.period_ms);
TEST_ASSERT_WITHIN(5, 50, result.high_time_ms);
TEST_ASSERT_EQUAL(10, result.cycle_count);
}
4.3.3 Example: Testing I2C Temperature Sensor
void test_i2c_sensor_read(void) {
// Arrange: Known calibrated temperature (thermal chamber)
float chamber_temp = 25.0; // Set by test fixture
thermal_chamber_set_temperature(chamber_temp);
delay_ms(30000); // Wait for stabilization
// Act: Read sensor
float sensor_temp = temperature_sensor_read();
// Assert: Within sensor accuracy spec (±0.5°C)
TEST_ASSERT_FLOAT_WITHIN(0.5, chamber_temp, sensor_temp);
}
void test_i2c_sensor_disconnect_detection(void) {
// Arrange: Disconnect sensor from I2C bus
i2c_bus_disconnect(SENSOR_ADDRESS);
// Act: Attempt to read
SensorStatus status = temperature_sensor_read_safe();
// Assert: Proper error handling
TEST_ASSERT_EQUAL(SENSOR_NOT_FOUND, status);
TEST_ASSERT_TRUE(error_logged(ERR_I2C_NACK));
}
4.4 Protocol Testing
Protocol tests validate that your firmware correctly implements communication protocols.
4.4.1 MQTT Client Testing
# test_mqtt_integration.py
import pytest
import paho.mqtt.client as mqtt
import time
import json
@pytest.fixture
def mqtt_broker():
"""Start a test MQTT broker."""
broker = start_mosquitto_broker(port=1883)
yield broker
broker.stop()
@pytest.fixture
def device(mqtt_broker):
"""Flash and boot device under test."""
flash_firmware("firmware.bin")
power_cycle_device()
wait_for_boot(timeout=30)
return DeviceInterface()
class TestMQTTPublish:
def test_sensor_data_published_on_interval(self, device, mqtt_broker):
"""Device should publish sensor data every 60 seconds."""
messages = []
def on_message(client, userdata, msg):
messages.append(json.loads(msg.payload))
client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("sensors/temperature")
client.loop_start()
# Wait for 3 publish intervals
time.sleep(180)
client.loop_stop()
# Assert: Should have ~3 messages
assert len(messages) >= 2
assert len(messages) <= 4
# Verify message format
for msg in messages:
assert "temperature" in msg
assert "timestamp" in msg
assert isinstance(msg["temperature"], float)
def test_publish_qos1_with_ack(self, device, mqtt_broker):
"""Device should retry QoS 1 messages until acknowledged."""
# Capture MQTT packets
pcap = start_mqtt_capture()
# Temporarily block PUBACK from broker
mqtt_broker.block_puback()
# Trigger device publish
device.trigger_sensor_read()
time.sleep(2)
# Verify retransmissions
packets = pcap.get_packets()
publish_count = sum(1 for p in packets if p.type == "PUBLISH")
assert publish_count >= 2 # Original + at least 1 retry
# Restore and verify eventual delivery
mqtt_broker.allow_puback()
time.sleep(5)
assert mqtt_broker.received_message_count() >= 1
class TestMQTTSubscribe:
def test_command_received_and_executed(self, device, mqtt_broker):
"""Device should execute commands received via MQTT."""
client = mqtt.Client()
client.connect("localhost", 1883)
# Send command
command = {"action": "set_led", "state": "on"}
client.publish("device/commands", json.dumps(command))
time.sleep(1)
# Verify device executed command
assert device.read_gpio_state(LED_PIN) == HIGH
def test_reconnect_after_broker_restart(self, device, mqtt_broker):
"""Device should reconnect after broker becomes available."""
# Verify initial connection
assert device.mqtt_connected()
# Restart broker
mqtt_broker.stop()
time.sleep(5)
mqtt_broker.start()
# Wait for reconnection
for _ in range(30):
if device.mqtt_connected():
break
time.sleep(1)
assert device.mqtt_connected()
4.4.2 Protocol Compliance Testing
Validate correct implementation of protocol specifications:
| Protocol | Key Tests | Tools |
|---|---|---|
| MQTT | QoS handling, will messages, keepalive, clean session | Wireshark, mqtt-spy |
| CoAP | Confirmable messages, block transfer, observe | libcoap test suite |
| HTTP | Status codes, headers, chunked encoding | curl, Postman |
| BLE | GATT services, advertising, pairing | nRF Connect, hcitool |
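For MQTT, compliance checks often start from raw packet captures (for example, Wireshark exports). Below is a minimal sketch of a fixed-header decoder; the packet-type codes and PUBLISH QoS bit positions come from the MQTT 3.1.1 specification, while the helper names are illustrative:

```python
# Minimal MQTT fixed-header decoder for compliance checks on captured traffic.
# Control-packet type is the high nibble of the first byte (MQTT 3.1.1, §2.2).
MQTT_PACKET_TYPES = {
    1: "CONNECT", 2: "CONNACK", 3: "PUBLISH", 4: "PUBACK",
    5: "PUBREC", 6: "PUBREL", 7: "PUBCOMP", 8: "SUBSCRIBE",
    9: "SUBACK", 10: "UNSUBSCRIBE", 11: "UNSUBACK",
    12: "PINGREQ", 13: "PINGRESP", 14: "DISCONNECT",
}

def classify_packet(first_byte: int) -> str:
    """Return the MQTT control-packet name for a raw fixed-header byte."""
    return MQTT_PACKET_TYPES.get(first_byte >> 4, "RESERVED")

def qos_of_publish(first_byte: int) -> int:
    """Extract the QoS level (bits 1-2) from a PUBLISH fixed header."""
    return (first_byte >> 1) & 0x03

# Example: 0x32 is a PUBLISH packet with QoS 1
print(classify_packet(0x32), qos_of_publish(0x32))  # PUBLISH 1
```

A decoder like this lets a test assert on retransmission counts or QoS flags directly, instead of eyeballing Wireshark output.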
Putting Numbers to It
MQTT QoS levels trade throughput for reliability. The protocol overhead directly impacts message delivery rate and battery consumption.
\[\text{Effective Throughput} = \frac{\text{Messages}}{\text{Time} \times (1 + \text{Overhead Factor})}\]
For 100 messages sent over 60 seconds:
\[ \begin{align} \text{QoS 0:} & \quad \frac{100}{60 \times 1.0} = 1.67\text{ msg/s (no ACK, potential loss)} \\ \text{QoS 1:} & \quad \frac{100}{60 \times 1.5} = 1.11\text{ msg/s (PUBACK adds 50% overhead)} \\ \text{QoS 2:} & \quad \frac{100}{60 \times 2.0} = 0.83\text{ msg/s (4-way handshake doubles overhead)} \end{align} \]
QoS 1 guarantees delivery with 50% overhead (PUBLISH + PUBACK). QoS 2 ensures exactly-once with 100% overhead (PUBLISH + PUBREC + PUBREL + PUBCOMP). Integration tests verify devices handle reconnection at the chosen QoS without message loss.
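The overhead model above can be turned into a quick calculator. This sketch uses the chapter's simplified overhead factors (0, 0.5, and 1.0); real-world overhead also depends on payload size, round-trip time, and broker behavior:

```python
# Effective-throughput calculator for the model above. The multiplier is
# (1 + overhead factor): 1.0 for QoS 0, 1.5 for QoS 1, 2.0 for QoS 2.
# These factors are the chapter's simplified model, not measured values.
OVERHEAD_FACTOR = {0: 0.0, 1: 0.5, 2: 1.0}

def effective_throughput(messages: int, seconds: float, qos: int) -> float:
    """Messages per second after accounting for QoS handshake overhead."""
    return messages / (seconds * (1 + OVERHEAD_FACTOR[qos]))

for qos in (0, 1, 2):
    print(f"QoS {qos}: {effective_throughput(100, 60, qos):.2f} msg/s")
# QoS 0: 1.67 msg/s, QoS 1: 1.11 msg/s, QoS 2: 0.83 msg/s
```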
4.5 Cloud Integration Testing
End-to-end tests validate the complete data path from sensor to cloud and back.
4.5.1 Testing Cloud API Integration
# test_cloud_integration.py
import pytest
import requests
import time
CLOUD_API = "https://api.staging.example.com"
DEVICE_ID = "TEST-001"
@pytest.fixture
def device():
"""Configure device for staging environment."""
flash_firmware_with_config({
"cloud_url": CLOUD_API,
"device_id": DEVICE_ID
})
power_cycle_device()
wait_for_boot(timeout=30)
return DeviceInterface()
class TestDeviceToCloud:
def test_telemetry_received_by_cloud(self, device):
"""Sensor data should appear in cloud API within 60s."""
# Trigger device to send data
device.trigger_sensor_read()
# Poll cloud API for data
for _ in range(60):
response = requests.get(
f"{CLOUD_API}/devices/{DEVICE_ID}/telemetry/latest"
)
if response.status_code == 200:
data = response.json()
assert "temperature" in data
return
time.sleep(1)
pytest.fail("Telemetry not received by cloud within 60s")
def test_device_registration(self, device):
"""New device should auto-register with cloud."""
# Check device appears in cloud registry
response = requests.get(f"{CLOUD_API}/devices/{DEVICE_ID}")
assert response.status_code == 200
data = response.json()
assert data["status"] == "online"
assert data["firmware_version"] == device.firmware_version()
class TestCloudToDevice:
def test_config_update_applied(self, device):
"""Config changes from cloud should be applied on device."""
# Set new config via cloud API
new_config = {"reporting_interval": 120}
requests.post(
f"{CLOUD_API}/devices/{DEVICE_ID}/config",
json=new_config
)
# Wait for device to fetch config
time.sleep(10)
# Verify device applied config
assert device.get_config("reporting_interval") == 120
def test_firmware_ota_update(self, device):
"""Device should accept and install OTA firmware update."""
original_version = device.firmware_version()
# Trigger OTA via cloud
requests.post(
f"{CLOUD_API}/devices/{DEVICE_ID}/ota",
json={"version": "2.0.0"}
)
# Wait for update (max 5 minutes)
for _ in range(60):
time.sleep(5)
if device.firmware_version() != original_version:
break
assert device.firmware_version() == "2.0.0"
assert device.is_operational() # Device still works
4.6 Test Environment Setup
Integration tests require controlled environments that simulate production conditions.
4.6.1 Docker Compose for Reproducible IoT Test Environments
Running integration tests on a developer’s laptop requires local instances of MQTT brokers, databases, and mock cloud services. Docker Compose makes this environment reproducible across the team:
# docker-compose.test.yml
# Run: docker compose -f docker-compose.test.yml up -d
# Then: pytest tests/integration/ -v
services:
mqtt-broker:
image: eclipse-mosquitto:2.0
ports:
- "1883:1883" # MQTT
- "9001:9001" # WebSocket
volumes:
- ./test-config/mosquitto.conf:/mosquitto/config/mosquitto.conf
healthcheck:
test: ["CMD", "mosquitto_sub", "-t", "$$SYS/#", "-C", "1", "-i", "healthcheck"]
interval: 5s
timeout: 3s
retries: 3
influxdb:
image: influxdb:2.7
ports:
- "8086:8086"
environment:
DOCKER_INFLUXDB_INIT_MODE: setup
DOCKER_INFLUXDB_INIT_USERNAME: admin
DOCKER_INFLUXDB_INIT_PASSWORD: testpassword
DOCKER_INFLUXDB_INIT_ORG: iot-test
DOCKER_INFLUXDB_INIT_BUCKET: sensor-data
grafana:
image: grafana/grafana:10.2
ports:
- "3000:3000"
depends_on:
- influxdb
# Mock cloud API for device registration and commands
mock-cloud:
build: ./test-fixtures/mock-cloud
ports:
- "8080:8080"
environment:
MQTT_BROKER: mqtt-broker:1883
EXPECTED_DEVICES: "TEST-001,TEST-002,TEST-003"
Why Docker Compose: Every developer gets an identical test environment. No “works on my machine” problems. CI/CD servers spin up the same stack. Tearing down after tests leaves no state behind.
Test workflow:
# Start test infrastructure and wait for healthchecks to pass
docker compose -f docker-compose.test.yml up -d --wait
# Run integration tests (pytest discovers tests automatically)
pytest tests/integration/ -v --timeout=120
# Tear down (clean state for next run)
docker compose -f docker-compose.test.yml down -v
4.6.2 Network Condition Simulation
# test_network_conditions.py
import netem # Network emulator
class TestNetworkResilience:
def test_high_latency_mqtt(self, device):
"""Device handles 500ms network latency."""
# Add 500ms latency
netem.add_delay("eth0", delay_ms=500)
try:
device.trigger_sensor_read()
time.sleep(5)
# Verify message eventually delivered
assert mqtt_broker.received_message_count() >= 1
finally:
netem.reset("eth0")
def test_packet_loss_recovery(self, device):
"""Device retries under 20% packet loss."""
netem.add_packet_loss("eth0", loss_percent=20)
try:
device.trigger_sensor_read()
time.sleep(30) # Allow retries
assert mqtt_broker.received_message_count() >= 1
finally:
netem.reset("eth0")
def test_intermittent_connectivity(self, device):
"""Device buffers data during network outage."""
# Disconnect for 60 seconds
netem.block_all_traffic("eth0")
time.sleep(60)
# Reconnect
netem.allow_all_traffic("eth0")
time.sleep(30)
# Verify buffered data was sent
messages = mqtt_broker.get_all_messages()
# Should have ~6 messages buffered (10s interval)
assert len(messages) >= 5
4.7 Knowledge Check
4.8 Real-World Integration Test Failure: Nest Thermostat Battery Drain (2016)
In January 2016, Nest Labs pushed a firmware update (version 5.1.3) to its Learning Thermostat that passed all unit tests and automated integration tests. Within days, thousands of customers reported their thermostats had died overnight, leaving homes without heating during a winter cold snap.
What happened: The firmware update introduced a bug in the software that managed the thermostat’s rechargeable lithium-ion battery. The thermostat normally charges its battery from the HVAC system’s 24V C-wire. The bug caused the charging circuit to fail to top up the battery during normal HVAC cycles. Over 48-72 hours, the battery drained completely, and the thermostat shut down.
Why integration tests missed it: Nest’s integration test environment used bench power supplies providing constant 5V USB power to the thermostats, bypassing the HVAC charging circuit entirely. The firmware bug specifically affected the charging handshake between the thermostat’s power management IC and the HVAC system’s transformer – a hardware interaction that the test environment did not replicate.
The integration testing gap:
| Test Level | What Was Tested | What Was Missed |
|---|---|---|
| Unit tests | Battery monitoring algorithm (simulated voltages) | Real charging circuit behavior |
| Software integration | HVAC scheduling + Wi-Fi + display | Power management IC communication |
| Lab integration | Thermostat on USB bench power | HVAC transformer charging handshake |
| Field validation | 2-week beta test on 500 devices | Battery discharge took 48-72 hours, beta period too short for some HVAC configs |
Lessons for IoT integration testing:
Test with production power sources: If the deployed device charges from an HVAC transformer, the integration test must use an HVAC transformer – not a USB power supply. Power path testing is as critical as data path testing.
Extend soak tests beyond one charge cycle: A 2-week beta test was insufficient because the bug manifested only after 2-3 charge/discharge cycles on specific HVAC configurations (those without a C-wire, relying on power-stealing from the R-wire). Soak tests should cover at least 5 full operational cycles of every subsystem.
Monitor subsystem health, not just user-facing functionality: The Nest appeared to work perfectly (displaying temperature, running schedules, responding to commands) while the battery was silently draining. Integration tests should monitor internal health metrics (battery voltage trend, charge current, power management state) alongside visible functionality.
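One way to encode that lesson in a soak test is to assert on the trend of a health metric rather than a single reading. The sketch below fits a least-squares slope to battery-voltage samples; the data and the slope threshold are illustrative assumptions, not Nest values:

```python
# Hedged sketch: health-metric trend check for a soak test. The samples would
# come from periodic reads of the device's fuel gauge; the -1.0 mV/h threshold
# is an illustrative assumption, not a real specification.
def voltage_trend_mv_per_hour(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (hours, millivolts) samples."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den

# During a multi-day soak test, fail early if the battery is trending down
# even while every user-facing check passes (hypothetical data below):
samples = [(0, 3900), (12, 3895), (24, 3890), (36, 3885)]
slope = voltage_trend_mv_per_hour(samples)
assert slope > -1.0, f"Battery draining at {slope:.2f} mV/h during soak test"
```

The key design point: a flat display and a responsive UI say nothing about the power subsystem; only an internal metric sampled over multiple charge cycles would have surfaced the Nest bug.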
Nest’s CEO Tony Fadell publicly apologized, and the company issued a manual reboot procedure. The incident cost Nest an estimated $5-10 million in customer support, replacement units, and brand damage – far exceeding what a properly instrumented integration test environment would have cost to build.
Worked Example: Integration Test Suite for Smart Thermostat (Device + Cloud + Mobile App)
System Architecture: ESP32-based thermostat reads DHT22 temperature/humidity sensor, controls relay for HVAC, publishes data to AWS IoT Core via MQTT, mobile app (React Native) subscribes to device shadow and sends control commands.
Test Objective: Validate end-to-end integration: sensor → firmware → MQTT → cloud → mobile app → command → relay actuation.
Step 1: Test Infrastructure Setup (Docker Compose)
# docker-compose.test.yml
services:
mosquitto:
image: eclipse-mosquitto:2.0
ports:
- "1883:1883"
volumes:
- ./test-config/mosquitto.conf:/mosquitto/config/mosquitto.conf
mock-aws-iot:
image: localstack/localstack:latest
environment:
- SERVICES=iot,iotdata
ports:
- "4566:4566"
test-runner:
build: ./tests/
depends_on:
- mosquitto
- mock-aws-iot
environment:
MQTT_BROKER: mosquitto:1883
AWS_ENDPOINT: http://mock-aws-iot:4566
Step 2: Hardware-in-the-Loop Test Fixture
Physical setup:
- ESP32 thermostat device under test (DUT) on bench
- USB connection for serial logging and power
- Relay output wired to LED (visible indicator instead of real HVAC)
- DHT22 sensor in controlled thermal chamber (can set precise temp/humidity)
- Ethernet connection to test network (isolated from production)
Step 3: Integration Test Cases (Pytest)
# test_integration_thermostat.py
import pytest
import paho.mqtt.client as mqtt
import requests
import serial
import time
import json
import os
# Test fixture: Device interface
class ThermostatDevice:
def __init__(self, serial_port="/dev/ttyUSB0"):
self.serial = serial.Serial(serial_port, 115200, timeout=1)
time.sleep(2) # Wait for ESP32 boot
def send_command(self, cmd):
self.serial.write(f"{cmd}\n".encode())
def read_log(self, timeout=5):
"""Read serial output until timeout"""
start = time.time()
output = []
while time.time() - start < timeout:
if self.serial.in_waiting:
line = self.serial.readline().decode().strip()
output.append(line)
print(f"[DEVICE] {line}")
return output
def set_target_temp(self, temp_celsius):
"""Simulate user setting target temperature"""
self.send_command(f"SET_TARGET:{temp_celsius}")
def get_relay_state(self):
"""Read GPIO state of relay pin"""
self.send_command("GET_RELAY")
logs = self.read_log(timeout=2)
for line in logs:
if "RELAY:" in line:
return "ON" if "HIGH" in line else "OFF"
return "UNKNOWN"
@pytest.fixture
def device():
"""Flash firmware and boot device"""
print("Flashing firmware...")
os.system("platformio run --target upload")
time.sleep(5)
return ThermostatDevice()
@pytest.fixture
def mqtt_client():
"""MQTT client connected to test broker"""
client = mqtt.Client()
client.connect("localhost", 1883)
client.loop_start()
yield client
client.loop_stop()
@pytest.fixture
def thermal_chamber():
"""Control test chamber temperature"""
# Interface to Espec environmental chamber via RS-232
chamber = EspecChamber(port="/dev/ttyUSB1")
chamber.set_temperature(22) # Default 22°C
yield chamber
chamber.set_temperature(22) # Reset after test
# Test 1: Sensor Reading → MQTT Publish
def test_sensor_data_published_to_mqtt(device, mqtt_client, thermal_chamber):
"""Device should read DHT22 and publish to MQTT every 60 seconds"""
# Set known temperature in chamber
thermal_chamber.set_temperature(25.0)
time.sleep(120) # Wait for chamber stabilization
# Subscribe to device telemetry topic
messages = []
def on_message(client, userdata, msg):
payload = json.loads(msg.payload)
messages.append(payload)
print(f"[MQTT] Received: {payload}")
mqtt_client.on_message = on_message
mqtt_client.subscribe("thermostat/device001/telemetry")
# Wait for at least 2 publish intervals (60s each)
time.sleep(130)
# Assertions
assert len(messages) >= 2, f"Expected >=2 messages, got {len(messages)}"
for msg in messages:
assert "temperature" in msg, "Message missing temperature field"
assert "humidity" in msg, "Message missing humidity field"
assert "timestamp" in msg, "Message missing timestamp"
# Temperature should match chamber setpoint ±1°C (DHT22 accuracy)
assert 24.0 <= msg["temperature"] <= 26.0, \
f"Temp {msg['temperature']}°C outside range [24-26]"
# Test 2: Cloud Command → Relay Control
def test_cloud_command_controls_relay(device, mqtt_client):
"""MQTT command to set target temp should actuate relay when needed"""
# Initial state: room temp 22°C, target 20°C → heating OFF
mqtt_client.publish("thermostat/device001/command", json.dumps({
"target_temperature": 20
}))
time.sleep(3)
assert device.get_relay_state() == "OFF", "Relay should be OFF when temp > target"
# Change target to 24°C → should turn ON heating
mqtt_client.publish("thermostat/device001/command", json.dumps({
"target_temperature": 24
}))
time.sleep(3)
assert device.get_relay_state() == "ON", "Relay should be ON when temp < target"
# Test 3: Network Interruption Recovery
def test_mqtt_reconnection_after_broker_restart(device, mqtt_client):
"""Device should reconnect to MQTT broker after network interruption"""
# Verify device is connected
logs = device.read_log(timeout=5)
assert any("MQTT Connected" in line for line in logs), "Device not initially connected"
# Kill MQTT broker
print("Stopping MQTT broker...")
os.system("docker-compose -f docker-compose.test.yml stop mosquitto")
time.sleep(10)
# Restart broker
print("Restarting MQTT broker...")
os.system("docker-compose -f docker-compose.test.yml start mosquitto")
time.sleep(5)
# Wait for device to reconnect (timeout 60s)
reconnected = False
for attempt in range(12): # 12 x 5s = 60s
logs = device.read_log(timeout=5)
if any("MQTT Connected" in line for line in logs):
reconnected = True
break
assert reconnected, "Device failed to reconnect within 60s"
# Test 4: End-to-End Latency
def test_command_to_actuation_latency(device, mqtt_client):
"""Measure time from MQTT command to relay actuation"""
latencies = []
for i in range(10):
start_time = time.time()
# Send command
mqtt_client.publish("thermostat/device001/command", json.dumps({
"target_temperature": 24 if i % 2 == 0 else 20
}))
# Wait for relay state change
previous_state = device.get_relay_state()
while time.time() - start_time < 5:
current_state = device.get_relay_state()
if current_state != previous_state:
latency_ms = (time.time() - start_time) * 1000
latencies.append(latency_ms)
print(f"Latency: {latency_ms:.1f} ms")
break
time.sleep(0.1)
assert len(latencies) == 10, "Some commands did not actuate relay"
avg_latency = sum(latencies) / len(latencies)
max_latency = max(latencies)
print(f"Average latency: {avg_latency:.1f} ms")
print(f"Max latency: {max_latency:.1f} ms")
assert avg_latency < 500, f"Average latency {avg_latency:.1f} ms exceeds 500 ms"
assert max_latency < 1000, f"Max latency {max_latency:.1f} ms exceeds 1000 ms"
# Test 5: Cloud Device Shadow Sync
def test_device_shadow_synchronization(device, mqtt_client):
"""Device state should sync to AWS IoT Device Shadow"""
# Mock AWS IoT Core shadow (using LocalStack)
shadow_endpoint = "http://localhost:4566"
# Device publishes state
time.sleep(65) # Wait for at least one telemetry publish
# Query device shadow from cloud
response = requests.get(
f"{shadow_endpoint}/things/device001/shadow",
headers={"Authorization": "Bearer test-token"}
)
assert response.status_code == 200, f"Shadow API returned {response.status_code}"
shadow = response.json()
reported_state = shadow["state"]["reported"]
assert "temperature" in reported_state, "Shadow missing temperature"
assert "target_temperature" in reported_state, "Shadow missing target"
assert "relay_state" in reported_state, "Shadow missing relay state"
# Verify shadow matches device state
device_relay = device.get_relay_state()
shadow_relay = reported_state["relay_state"]
assert device_relay == shadow_relay, \
f"Device relay ({device_relay}) != shadow relay ({shadow_relay})"
Step 4: Test Execution and Results
# Run integration test suite
docker-compose -f docker-compose.test.yml up -d
pytest tests/test_integration_thermostat.py -v --timeout=300
# Output:
# test_sensor_data_published_to_mqtt PASSED [20%]
# test_cloud_command_controls_relay PASSED [40%]
# test_mqtt_reconnection_after_broker_restart PASSED [60%]
# test_command_to_actuation_latency PASSED [80%]
# test_device_shadow_synchronization PASSED [100%]
#
# ============== 5 passed in 412.32s ==============
Test Results Summary:
| Test | Duration | Result | Key Metrics |
|---|---|---|---|
| Sensor → MQTT | 130s | PASS | 2 messages published, temp within ±0.8°C of chamber setpoint |
| Command → Relay | 6s | PASS | Relay actuated correctly for both ON and OFF commands |
| MQTT Reconnect | 42s | PASS | Reconnected in 28s after broker restart (well within 60s timeout) |
| End-to-End Latency | 85s | PASS | Avg latency 284 ms, max 612 ms (both within limits) |
| Cloud Shadow Sync | 68s | PASS | Shadow state matches device state |
Bugs Found During Integration Testing:
| Bug | Caught By | Impact | Fix |
|---|---|---|---|
| MQTT reconnect exponential backoff missing | Reconnect test | Device retries every 5s forever, flooding broker logs | Added exponential backoff: 5s → 10s → 20s → 40s, max 60s |
| Relay toggled during Wi-Fi reconnect | Latency test | GPIO state glitched LOW during Wi-Fi connection, turning relay off briefly | Added GPIO hold during Wi-Fi reconnect |
| Temperature rounding error | Sensor publish test | Device published temp as integer (23) instead of float (23.5), losing precision | Changed JSON serialization to 1 decimal place |
| Shadow update after EVERY sensor read | Shadow sync test | Sending 60 shadow updates/hour exceeded AWS IoT free tier, would cost $18/month/device | Changed to shadow update only when state changes |
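The reconnect-backoff fix from the first row of the bug table can be sketched as a simple schedule generator; the 5 s base and 60 s cap mirror the fix described above, while the function name is illustrative:

```python
# Sketch of the exponential-backoff fix: retry delays double from a 5 s base
# up to a 60 s ceiling, matching the fix described in the bug table.
def backoff_schedule(base_s: int = 5, cap_s: int = 60, attempts: int = 6) -> list[int]:
    """Reconnect delays: base, 2*base, 4*base, ... capped at cap_s."""
    return [min(base_s * (2 ** i), cap_s) for i in range(attempts)]

print(backoff_schedule())  # [5, 10, 20, 40, 60, 60]
```

Capping the delay keeps worst-case reconnect latency bounded while still cutting broker log spam by an order of magnitude compared to a fixed 5 s retry.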
Total Integration Test Investment:
- Test infrastructure setup: 2 days (Docker Compose, thermal chamber integration)
- Writing tests: 3 days (5 tests, averaging 6 hours each with debugging)
- Thermal chamber rental: $180/week
- Total cost: ~$3,800 (5 engineer-days + equipment)
Value Delivered:
- Caught 4 critical bugs before production deployment
- Prevented estimated $45,000 in field support costs (bug #2 alone would have required truck rolls to reset devices)
- Provides automated regression testing for future firmware updates (run full suite in 7 minutes on every commit)
- Latency measurements inform SLA promises to customers (“commands execute in <500ms”)
Decision Framework: Choosing Integration Test Scope and Depth
| System Complexity | Integration Test Scope | Test Infrastructure | Approximate Cost | Timeline |
|---|---|---|---|---|
| Simple (1-2 subsystems) | Sensor + firmware + MQTT broker | Local MQTT broker (mosquitto), manual testing | $500-$2K | 1-2 weeks |
| Medium (3-5 subsystems) | Device + protocol + cloud backend + database | Docker Compose test stack, automated pytest | $3K-$10K | 2-4 weeks |
| Complex (6-10 subsystems) | Device + gateway + edge compute + cloud + mobile app + 3rd-party APIs | Full staging environment, CI/CD pipeline, Hardware-in-the-Loop rigs | $15K-$50K | 6-12 weeks |
| Very Complex (>10 subsystems) | Multi-vendor ecosystem, mesh network, AI/ML pipeline, real-time analytics | Production-like staging, chaos engineering, continuous testing | $100K-$500K | 3-6 months |
Decision Criteria:
1. What is the cost of a field failure?
- <$100 per incident: Manual integration testing acceptable
- $100-$1,000: Automated integration tests for critical paths
- >$1,000: Full HIL testing with staging environment
2. How many devices will you deploy?
- <100: Manual integration testing before each deployment
- 100-1,000: Automated tests run on every firmware commit
- 1,000-10,000: Full CI/CD with staging environment matching production
- >10,000: Continuous integration testing + canary deployments + rollback automation
3. What are your integration interfaces?
| Interface Type | Test Approach | Tools |
|---|---|---|
| GPIO/I2C/SPI | Hardware-in-the-Loop with real sensors | Logic analyzer (Saleae), oscilloscope |
| MQTT/CoAP/HTTP | Protocol compliance + broker integration | Wireshark, mqtt-spy, Postman |
| Cloud API (AWS/Azure) | Staging environment + mock services | LocalStack (AWS mock), Azurite (Azure mock) |
| BLE/Zigbee/LoRa | RF chamber + protocol sniffer | nRF Connect, Zigbee sniffer (TI CC2531) |
| Mobile app | Automated UI tests + API mocking | Appium, Detox, Cypress |
4. What is your test pyramid balance?
The testing pyramid for IoT integration:
| Test Level | Test Count | Execution Time | Scope |
|---|---|---|---|
| Unit tests | 500-2,000 | <5 min | Individual functions, no I/O |
| Integration tests | 50-200 | 10-60 min | Subsystem interfaces, protocol compliance |
| End-to-end tests | 10-30 | 30-120 min | Full system workflow, cloud integration |
| Field trials | 5-15 scenarios | Days-weeks | Real deployment conditions |
Recommended Test Distribution:
- 70% unit tests (fast, catch logic errors)
- 20% integration tests (medium speed, catch interface mismatches)
- 10% end-to-end tests (slow, catch system-level issues)
5. Docker Compose vs. Kubernetes vs. Manual Setup
| Approach | Best For | Learning Curve | Cost |
|---|---|---|---|
| Manual setup | 1-2 developers, simple systems | Low | $0 |
| Docker Compose | 2-10 developers, medium complexity | Medium | $0 (runs on laptop) |
| Kubernetes | 10+ developers, microservices | High | $100-$500/month (cloud cluster) |
Default Recommendation: Docker Compose for 90% of IoT projects. Kubernetes adds complexity without benefit unless you’re running >20 microservices.
Minimal Viable Integration Test Suite (start here):
1. Device boots and connects to network (Wi-Fi/cellular/LoRa)
2. Sensor data publishes to cloud within expected interval
3. Cloud command received and actuator responds
4. Device reconnects after network interruption
5. End-to-end latency measured and within SLA
These 5 tests catch 80% of integration issues. Add more as system complexity grows.
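One way to bootstrap that suite is a table-driven plan mirroring the five checks. In this sketch the check callables run against a plain dictionary snapshot; in practice each would query your real device and broker harness (all names and thresholds here are hypothetical):

```python
# Hedged sketch: the five-test starter suite as a table-driven plan.
# Each entry: (name, timeout_s, check). The checks below inspect a plain
# dict snapshot; replace them with calls into your own test harness.
MINIMAL_SUITE = [
    ("boot_and_connect",       30,  lambda dut: dut["network"] == "up"),
    ("telemetry_published",    120, lambda dut: dut["messages"] >= 1),
    ("command_to_actuator",    10,  lambda dut: dut["relay"] in ("ON", "OFF")),
    ("reconnect_after_outage", 60,  lambda dut: dut["mqtt"] == "connected"),
    ("e2e_latency_in_sla",     30,  lambda dut: dut["latency_ms"] < 500),
]

def run_suite(dut: dict) -> dict:
    """Run every check and report pass/fail; never stop at the first failure."""
    return {name: check(dut) for name, _timeout, check in MINIMAL_SUITE}

# Example with a hypothetical device snapshot:
snapshot = {"network": "up", "messages": 3, "relay": "ON",
            "mqtt": "connected", "latency_ms": 284}
results = run_suite(snapshot)
assert all(results.values()), f"Failures: {[n for n, ok in results.items() if not ok]}"
```

Running all checks instead of stopping at the first failure gives a full picture per firmware build, which matters when one root cause (for example, a flaky network stack) fails several checks at once.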
Common Mistake: Testing Only the Happy Path in Integration
The Scenario: Your smart home security camera integration test suite has 15 tests. All 15 pass every time on CI. The tests verify:
- Camera connects to Wi-Fi ✓
- Video stream starts ✓
- Motion detection triggers ✓
- Notification sent to mobile app ✓
- Cloud storage uploads video clip ✓
You deploy to 500 beta testers. Within 48 hours, you get 87 support tickets:
- “Camera won’t reconnect after my router rebooted”
- “Video is corrupted when my internet is slow”
- “Motion alerts stopped working after 3 days uptime”
- “Camera drains my bandwidth - used 45 GB in one week”
Every single one of these issues was NOT caught by your integration tests, even though the tests have “100% pass rate.”
What Went Wrong:
Your integration tests only validated the happy path (perfect conditions):
- Test Wi-Fi is always available, strong signal
- Test internet connection is 100 Mbps, 0% packet loss
- Test runs last 5 minutes (motion alert bug appears after 72 hours uptime)
- Test doesn’t monitor bandwidth consumption
Real-World Integration Testing Must Cover Failure Modes:
| Happy Path Test (What You Tested) | Failure Mode Test (What You Missed) | Real-World Impact |
|---|---|---|
| Camera connects to Wi-Fi on boot | Camera reconnects after Wi-Fi dropout (5s, 30s, 5min outages) | 23% of customers have intermittent Wi-Fi (mesh networks, weak signal) |
| Video streams at 1080p | Video adapts bitrate when bandwidth drops (<1 Mbps available) | 15% of customers have <5 Mbps upload speed |
| Motion detection triggers instantly | Motion detection after 72-hour uptime (memory leak test) | Bug: motion detection buffer filled after 72 hours, stopped triggering alerts |
| Notification arrives in 2 seconds | Notification arrives even if cloud is temporarily unreachable | 8% of notifications lost when cloud API had latency spike >10s |
| Cloud storage uploads 10-second clip | Cloud storage handles 5-minute continuous motion (large file) | Uploads failed for files >50 MB (undocumented API limit) |
The Missing Tests - Negative and Stress Scenarios:
```python
# tests/test_integration_failure_modes.py
import os
import time


def test_wifi_reconnection_after_dropout(camera_device, router):
    """Camera should reconnect to Wi-Fi after temporary network loss"""
    # Verify initial connection
    assert camera_device.wifi_connected()

    # Simulate Wi-Fi dropout (disable router port via API)
    router.disable_port(camera_device.mac_address)
    time.sleep(10)  # 10-second outage

    # Re-enable Wi-Fi
    router.enable_port(camera_device.mac_address)

    # Camera should reconnect within 30 seconds
    for attempt in range(30):
        if camera_device.wifi_connected():
            break
        time.sleep(1)
    assert camera_device.wifi_connected(), \
        "Camera failed to reconnect after Wi-Fi dropout"


def test_video_quality_adaptation_under_bandwidth_constraint(camera_device):
    """Video bitrate should adapt when bandwidth is limited"""
    # Simulate bandwidth limit using tc (traffic control)
    os.system("tc qdisc add dev eth0 root tbf rate 512kbit burst 5kb latency 50ms")
    try:
        # Start video stream and record for 30 seconds
        stream = camera_device.start_stream()
        time.sleep(30)

        # Measure actual bitrate
        bitrate_kbps = stream.get_average_bitrate()
    finally:
        # Remove the bandwidth limit even if the test fails
        os.system("tc qdisc del dev eth0 root")

    # Should adapt to <512 kbps (allow ~10% overhead for protocol)
    assert bitrate_kbps < 550, \
        f"Bitrate {bitrate_kbps} kbps exceeds 512 kbps limit (no adaptation)"


def test_motion_detection_after_72_hour_uptime(camera_device):
    """Motion detection should work correctly after extended uptime"""
    # Accelerated aging: trigger motion detection 10,000 times
    # (simulates 72 hours of motion events at 1 per 25 seconds)
    for i in range(10000):
        camera_device.trigger_motion()
        if i % 100 == 0:
            print(f"Motion event {i}/10000")

    # Verify motion detection still works
    camera_device.trigger_motion()
    time.sleep(2)
    assert camera_device.last_motion_event_timestamp() > time.time() - 3, \
        "Motion detection failed after 10,000 events (memory leak suspected)"


def test_cloud_api_retry_on_transient_failure(camera_device, mock_cloud):
    """Cloud uploads should retry when API returns 503 Service Unavailable"""
    # Configure mock cloud API to return 503 for first 3 requests
    mock_cloud.set_failure_mode(status_code=503, failure_count=3)

    # Trigger video clip upload
    camera_device.trigger_motion()  # Creates 10-second clip
    time.sleep(15)

    # Verify clip eventually uploaded (after retries)
    uploaded_clips = mock_cloud.get_uploaded_clips()
    assert len(uploaded_clips) == 1, \
        f"Expected 1 clip uploaded after retries, got {len(uploaded_clips)}"

    # Verify retry attempts (3 failures + 1 success = at least 4 requests)
    assert mock_cloud.request_count >= 4, \
        f"Only {mock_cloud.request_count} API requests (expected >=4 with retries)"


def test_bandwidth_usage_under_continuous_motion(camera_device):
    """Camera should not exceed bandwidth budget even with continuous motion"""
    # Simulate 1 hour of continuous motion
    # (worst case: person walking in frame the entire time)
    camera_device.start_continuous_motion_simulation()

    # Monitor network traffic for 60 minutes (or use time compression)
    bandwidth_monitor = NetworkBandwidthMonitor(camera_device.ip_address)  # project-specific helper
    bandwidth_monitor.start()
    time.sleep(3600)  # 1-hour test
    total_mb_uploaded = bandwidth_monitor.get_total_megabytes()

    # Spec: camera should use <1 GB/day = 42 MB/hour (50 MB allows headroom)
    assert total_mb_uploaded < 50, \
        f"Camera uploaded {total_mb_uploaded} MB in 1 hour (exceeds 42 MB budget)"
```

Test Results After Adding Failure Mode Tests:
| Test | Result | Bug Found |
|---|---|---|
| Wi-Fi reconnection | FAIL | Camera retried only once, then gave up. Added exponential backoff retry (max 10 attempts). |
| Bitrate adaptation | FAIL | Camera always streamed at 2 Mbps regardless of available bandwidth. Implemented adaptive bitrate (ABR). |
| 72-hour uptime | FAIL | Motion detection buffer leaked 2 KB per event, crashed after ~8,000 events. Fixed memory leak. |
| Cloud API retry | FAIL | Camera discarded clip on first 503 error. Added retry logic with exponential backoff. |
| Bandwidth budget | FAIL | Camera uploaded 180 MB/hour (4.3 GB/day!). Implemented local motion filtering (only upload if motion >3s duration). |
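Two of the fixes above (Wi-Fi reconnection and cloud API retry) rely on exponential backoff. A minimal sketch of the pattern, with hypothetical names and tunable limits:

```python
import time

def retry_with_backoff(operation, max_attempts=10, base_delay_s=0.5, max_delay_s=30.0):
    """Retry `operation` until it returns a truthy result, doubling the
    delay between attempts (capped at max_delay_s). Returns None if all
    attempts fail -- the caller decides whether that is fatal.
    """
    delay = base_delay_s
    for attempt in range(1, max_attempts + 1):
        try:
            result = operation()
            if result:
                return result
        except (ConnectionError, TimeoutError):
            pass  # transient failure: fall through to the backoff sleep
        if attempt < max_attempts:
            time.sleep(delay)
            delay = min(delay * 2, max_delay_s)
    return None
```

Capping the delay matters on battery-powered devices: unbounded doubling can push the next retry hours into the future, which looks identical to "gave up" from the user's perspective.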
After Bug Fixes:
Re-ran beta test with 500 users. Support tickets dropped from 87 to 12 (86% reduction). Remaining tickets were legitimate feature requests, not integration failures.
The Lesson:
Integration tests must cover:
1. Happy path (basic functionality)
2. Sad path (expected errors: network timeout, API 4xx errors, sensor offline)
3. Bad path (unexpected errors: memory leaks, resource exhaustion, race conditions)
4. Stress path (sustained load, peak traffic, continuous operation)
5. Chaos path (random failures injected: kill processes, network partition, clock skew)
Rule of Thumb: For every 1 happy path test, write 2-3 failure mode tests. Integration bugs hide in edge cases, not the sunny-day scenario you tested 100 times on your desk.
4.9 Concept Relationships
How This Concept Connects
Builds on:
- Testing Fundamentals: Integration tests form the middle layer of the testing pyramid
- Unit Testing Firmware: Integration tests validate what unit tests mock
Relates to:
- HIL Testing: Specialized form of hardware-software integration testing
- Protocol Fundamentals: Understanding protocols helps test their implementations
- Cloud Platforms: Cloud integration tests validate device-to-cloud flows
Leads to:
- Environmental Testing: Physical condition tests after integration passes
- Field Testing: Real-world validation after successful integration
- Testing Automation: Automating integration tests in CI/CD
Part of:
- System Validation Strategy: Bridges unit tests (isolated logic) and end-to-end tests (full system)
4.10 See Also
Related Resources
Integration Test Frameworks:
- Pytest - Python integration testing with fixtures
- Docker Compose - Reproducible test environments
- Testcontainers - Docker containers for integration tests
- Paho MQTT Client - MQTT testing library
Network Simulation Tools:
- tc netem - Linux network emulator
- Toxiproxy - Network chaos testing
- Comcast - Simulated bad network conditions
Protocol Testing:
- Wireshark - Protocol analyzer for validating traffic
- mqtt-spy - MQTT protocol inspector
- Postman - HTTP/REST API testing
Cloud Mocking:
- LocalStack - Mock AWS services locally
- Azurite - Azure Storage emulator
- Mosquitto - Lightweight MQTT broker for testing
Case Studies:
- Nest Thermostat Battery Drain Incident (2016) - Integration testing gap analysis
4.11 Try It Yourself
Hands-On Challenge: Build an End-to-End Integration Test
Challenge: Create a complete integration test for an ESP32 temperature monitor that publishes to MQTT, stores data in InfluxDB, and accepts control commands.
Setup Required (90 minutes):
Docker Compose Test Environment:
```yaml
services:
  mosquitto:
    image: eclipse-mosquitto:2.0
    ports:
      - "1883:1883"
  influxdb:
    image: influxdb:2.7
    ports:
      - "8086:8086"
```

ESP32 Firmware Configuration:
- Connect to test MQTT broker (localhost:1883)
- Publish temperature every 10 seconds to `sensor/temp`
- Subscribe to `sensor/command` for control messages
Write Integration Tests (Python + Pytest):
Test Cases to Implement:
```python
def test_sensor_data_published_to_mqtt(device, mqtt_client):
    """Verify ESP32 publishes sensor data every 10 seconds"""
    # Subscribe to topic, wait 25s, assert >=2 messages received


def test_command_received_and_executed(device, mqtt_client):
    """Send MQTT command, verify ESP32 executes it (LED toggle)"""
    # Publish command, read GPIO state, assert changed


def test_data_stored_in_influxdb(device, mqtt_client, influxdb):
    """End-to-end: sensor → MQTT → InfluxDB"""
    # Trigger reading, wait, query InfluxDB, assert data present


def test_mqtt_reconnect_after_broker_restart(device):
    """Device reconnects after broker goes down"""
    # Stop mosquitto, wait 10s, restart, assert reconnected


def test_network_latency_500ms(device, mqtt_client):
    """Device handles high latency gracefully"""
    # Add 500ms delay with tc netem, verify messages still arrive
```

Your Deliverable:
- `docker-compose.test.yml` file
- 5 passing integration tests
- Test execution log showing results
- Screenshot of InfluxDB query showing stored data
Success Criteria:
- All tests pass on first run (reproducible)
- Tests complete in <120 seconds
- No manual intervention required
- Network simulation test correctly uses `tc netem`
Bonus Challenge:
- Add test for MQTT QoS 1 retry behavior
- Measure end-to-end latency (sensor reading → InfluxDB storage)
- Implement test for 20% packet loss scenario
4.12 Summary
Integration testing validates that IoT system components work together:
- Hardware-Software: Test GPIO, I2C, SPI with real hardware or high-fidelity simulators
- Protocol Testing: Validate MQTT, CoAP, HTTP, BLE compliance and edge cases
- Cloud Integration: End-to-end tests from sensor to cloud and back
- Network Simulation: Test resilience under latency, packet loss, and disconnection
- Environment Control: Use staging environments, not production, for integration tests
4.13 Knowledge Check
Common Pitfalls
1. Mocking Cloud APIs With Static Responses That Never Fail
Integration tests using mock cloud APIs that always return success in <10 ms do not validate behavior under realistic cloud conditions: API rate limiting (HTTP 429), temporary unavailability (HTTP 503), authentication expiry (HTTP 401), and slow responses (5+ second timeouts). Use configurable mock servers (WireMock, FastAPI) that can simulate: random failures (20% of requests return 503), slow responses (inject 5 second delay), and error sequences (succeed for 10 requests, then fail for 2, then succeed again). This validates retry logic, circuit breakers, and timeout handling.
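A minimal sketch of such a configurable mock, mirroring the `set_failure_mode`-style interface used earlier in this chapter; the class and method names are illustrative, not a real library:

```python
import random

class FlakyMockAPI:
    """Mock cloud endpoint that fails on a schedule: the first
    `failure_count` requests return `status_code`, and after that each
    request independently fails with probability `random_failure_rate`.
    """
    def __init__(self, status_code=503, failure_count=0,
                 random_failure_rate=0.0, seed=None):
        self.status_code = status_code
        self.failure_count = failure_count
        self.random_failure_rate = random_failure_rate
        self.request_count = 0
        self._rng = random.Random(seed)  # seeded for reproducible chaos runs

    def handle_request(self):
        """Return the HTTP status code the device under test would see."""
        self.request_count += 1
        if self.request_count <= self.failure_count:
            return self.status_code
        if self._rng.random() < self.random_failure_rate:
            return self.status_code
        return 200

# Usage: first 3 requests fail with 503, then the API recovers.
api = FlakyMockAPI(status_code=503, failure_count=3)
assert [api.handle_request() for _ in range(5)] == [503, 503, 503, 200, 200]
```

Seeding the random generator is what makes chaos tests debuggable: a failing CI run can be replayed with the exact same failure sequence.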
2. Testing Components Against Their Development Version, Not Their Production Version
Integration tests run against a local development instance of the cloud API that may diverge from production. A device that passes integration tests against the development API fails against production when: API version was updated in production but not development, authentication format changed, or response schema evolved. Maintain a production-equivalent staging environment for integration tests, and run a subset of integration tests against the actual production API in a non-destructive read-only mode after each deployment.
3. Not Testing Cross-Component State Consistency
IoT systems have distributed state: device NVM, cloud database, and potentially local gateway cache. Integration tests that only test individual component behaviors miss state consistency failures: device shows “valve open” while cloud database shows “valve closed” after a connectivity interruption. Write integration tests that verify state consistency across components after simulated failures: disconnect at various protocol states, reconnect, and assert that all system components agree on the current state.
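A minimal sketch of the consistency assertion, assuming each component can report its view of the same state field (the helper name and record shape are hypothetical):

```python
def assert_state_consistent(component_states):
    """Assert every component reports the same state value.

    `component_states` maps component name -> reported state, e.g.
    {"device_nvm": "valve_open", "cloud_db": "valve_open"}.
    Raises AssertionError showing all views, so the failure log names
    exactly which components diverged.
    """
    values = set(component_states.values())
    assert len(values) == 1, f"State divergence across components: {component_states}"

# Usage after a simulated disconnect/reconnect cycle:
assert_state_consistent({
    "device_nvm": "valve_open",
    "gateway_cache": "valve_open",
    "cloud_db": "valve_open",
})
```

Running this assertion after each injected fault (disconnect mid-command, reconnect, power cycle) turns "the valve state sometimes disagrees" from a field-reported mystery into a reproducible test failure.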
4. Skipping Integration Tests Because They Are Slow
Integration tests that require real network connections are slower than unit tests (10–60 seconds vs milliseconds), leading teams to skip them in CI. Instead, design integration tests for speed without sacrificing correctness: use local Docker-based service replicas (local MQTT broker, local CoAP server), parallelize independent test scenarios, and implement test data factories that create required state quickly. A 10-minute integration test suite running on every PR is infinitely better than no integration tests.
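The test data factory mentioned above can be as simple as a function that builds the record a test needs without replaying the real provisioning flow. A minimal sketch with hypothetical field names:

```python
def make_provisioned_device(device_id="test-device-001", firmware="1.2.3", **overrides):
    """Build a device record in the 'already provisioned' state, so a
    test can seed the cloud mock directly instead of driving the slow
    registration handshake. Keyword overrides customize single fields.
    """
    record = {
        "device_id": device_id,
        "firmware": firmware,
        "status": "online",
        "telemetry_interval_s": 10,
    }
    record.update(overrides)
    return record

# Usage: an offline-device scenario needs only one field changed.
offline = make_provisioned_device(status="offline")
assert offline["status"] == "offline"
assert offline["firmware"] == "1.2.3"
```

Skipping the registration handshake in setup can shave tens of seconds off each test, which is exactly what keeps a full integration suite runnable on every PR.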
4.14 What’s Next?
Continue your testing journey with these chapters:
- Hardware-in-the-Loop Testing: Automate firmware validation with simulated sensors
- Environmental Testing: Validate operation under temperature, humidity, EMC
- Testing Overview: Return to the complete testing guide
| Previous | Current | Next |
|---|---|---|
| Unit Testing for IoT Firmware | Integration Testing for IoT Systems | HIL Testing for IoT |