9 Test Automation and CI/CD for IoT
9.1 Learning Objectives
By the end of this chapter, you will be able to:
- Design CI/CD Pipelines: Build automated testing pipelines for IoT firmware
- Set Up Device Farms: Create infrastructure for automated hardware testing
- Implement Test Metrics: Track coverage, quality, and testing effectiveness
- Maintain Test Documentation: Create traceability matrices for compliance
For Beginners: Test Automation and CI/CD for IoT
Testing and validation ensure your IoT device works correctly and reliably in the real world, not just on your workbench. Think of it like test-driving a car in rain, snow, and heavy traffic before buying it. Thorough testing catches problems before your devices are deployed to thousands of locations where fixing them becomes expensive and disruptive.
Sensor Squad: Testing on Autopilot
“Manual testing is fine for one device, but what about 50 firmware builds a week?” asked Max the Microcontroller. “That is where test automation comes in! A CI/CD pipeline automatically compiles your code, runs unit tests, flashes a test device, and reports pass or fail – all without human intervention.”
Sammy the Sensor described the device farm. “Imagine a rack of test boards, each with different sensors and configurations. When a developer pushes code, the CI server automatically flashes each board, runs test scenarios, and checks the results. If a code change breaks my temperature reading, the pipeline catches it before it ever reaches production.”
Lila the LED emphasized metrics. “Code coverage tells you what percentage of your firmware has been tested. Aiming for 80% coverage means 80% of your code paths have at least one test. Critical safety code should have 100% coverage.” Bella the Battery added, “And traceability matrices map every requirement to a test. When a certification auditor asks ‘how do you know feature X works?’ you show them the automated test that verifies it. No manual testing logs to maintain – it is all in the CI pipeline!”
9.2 Prerequisites
Before diving into this chapter, you should be familiar with:
- Testing Fundamentals: The IoT testing pyramid
- Hardware-in-the-Loop Testing: Automated hardware testing
Key Takeaway
In one sentence: Automated testing catches bugs before they reach production and enables rapid iteration.
Remember this rule: If it’s not automated, it’s not tested. Manual testing doesn’t scale and doesn’t prevent regressions.
9.3 Continuous Integration for IoT
Automate testing on every code commit.
9.3.1 CI Pipeline Stages
A typical IoT firmware pipeline runs five stages on every commit: build, unit tests, simulation (e.g., QEMU), hardware-in-the-loop tests on a device farm, and security scanning. The workflow below implements each stage as a separate job.
9.3.2 GitHub Actions CI Pipeline
# .github/workflows/firmware-ci.yml
name: Firmware CI

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install PlatformIO
        run: |
          pip install platformio
      - name: Build firmware
        run: |
          pio run -e esp32dev
      - name: Upload build artifacts
        uses: actions/upload-artifact@v3
        with:
          name: firmware.bin
          path: .pio/build/esp32dev/firmware.bin

  unit-tests:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - uses: actions/checkout@v3
      - name: Install Unity test framework
        run: |
          git clone https://github.com/ThrowTheSwitch/Unity.git
      - name: Run unit tests
        run: |
          gcc --coverage test/*.c Unity/src/unity.c -o test_runner
          ./test_runner
      - name: Generate coverage report
        run: |
          gcov test/*.c
          lcov --capture --directory . --output-file coverage.info
          genhtml coverage.info --output-directory coverage_html
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        with:
          file: coverage.info

  qemu-simulation:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - uses: actions/checkout@v3
      - name: Download firmware
        uses: actions/download-artifact@v3
        with:
          name: firmware.bin
      - name: Install QEMU
        run: |
          sudo apt-get install qemu-system-xtensa
      - name: Run firmware in QEMU
        run: |
          timeout 60s qemu-system-xtensa \
            -M esp32 -kernel firmware.bin \
            -serial stdio > qemu_output.txt || true
      - name: Validate QEMU output
        run: |
          grep "Boot successful" qemu_output.txt
          grep "Wi-Fi connected" qemu_output.txt

  hardware-test:
    runs-on: self-hosted  # Requires device farm
    needs: build
    steps:
      - uses: actions/checkout@v3
      - name: Download firmware
        uses: actions/download-artifact@v3
        with:
          name: firmware.bin
      - name: Flash to test device
        run: |
          esptool.py --port /dev/ttyUSB0 write_flash 0x10000 firmware.bin
      - name: Run integration tests
        run: |
          pytest tests/integration/ -v --device /dev/ttyUSB0

  security-scan:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - uses: actions/checkout@v3
      - name: Download firmware
        uses: actions/download-artifact@v3
        with:
          name: firmware.bin
      - name: Scan for secrets
        run: |
          trufflehog filesystem . --json > secrets_report.json
      - name: Scan for vulnerabilities
        run: |
          binwalk -e firmware.bin
          firmwalker _firmware.bin.extracted/ > vulnerabilities.txt
      - name: Fail if secrets found
        run: |
          if [ -s secrets_report.json ]; then
            echo "Hardcoded secrets detected!"
            cat secrets_report.json
            exit 1
          fi

9.4 Device Farm for Hardware Testing
Problem: CI/CD pipelines run on virtual machines in the cloud, but meaningful firmware tests need real, physical hardware.
Solution: a device farm - racks of real devices wired into the CI/CD infrastructure.
9.4.1 Commercial Device Farms
| Service | Focus | Pricing |
|---|---|---|
| AWS Device Farm | Cloud-based testing on real devices | Pay per minute |
| Firebase Test Lab | Android/iOS app testing | Free tier available |
| Golioth Device Test Lab | IoT-specific testing | Enterprise |
9.4.2 DIY Device Farm Setup
| Component | Purpose | Example |
|---|---|---|
| USB hubs | Connect multiple devices | 20-port powered USB hub |
| Power relays | Reboot devices remotely | USB-controlled relay board |
| UART adapters | Serial console access | FTDI FT232 (x10 devices) |
| Wi-Fi access point | Isolated test network | Raspberry Pi 4 as AP |
| Test automation server | Run tests in parallel | Jenkins on Ubuntu server |
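The `DeviceFarm` helper used in the test example below is not a published library; it stands in for whatever controller you build around your farm. A minimal, simplified sketch of such a controller (an in-memory variant of the `DeviceFarm(config=...)` interface, with a hypothetical `esptool.py` invocation for flashing) might look like this:

```python
import subprocess
from dataclasses import dataclass, field

@dataclass
class Device:
    id: str
    port: str           # e.g., /dev/ttyUSB0
    status: str = "idle"

@dataclass
class DeviceFarm:
    devices: list = field(default_factory=list)

    def get_available_devices(self):
        """Return devices not currently allocated to a test run."""
        return [d for d in self.devices if d.status == "idle"]

    def flash_device(self, device, firmware_path):
        """Flash firmware over serial (hypothetical esptool.py call)."""
        device.status = "busy"
        subprocess.run(
            ["esptool.py", "--port", device.port,
             "write_flash", "0x10000", firmware_path],
            check=True,
        )

    def release(self, device):
        """Return a device to the idle pool after a test run."""
        device.status = "idle"
```

A real controller would also drive the power relays for remote reboots and read test output over the UART adapters listed above.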
9.4.3 Device Farm Test Example
# device_farm_test.py
import pytest
from device_farm import DeviceFarm

@pytest.fixture(scope="module")
def farm():
    return DeviceFarm(config="farm_config.yml")

def test_firmware_on_all_devices(farm):
    """Flash and test firmware on all available devices"""
    devices = farm.get_available_devices()  # Returns 10 ESP32 dev boards
    assert len(devices) >= 10, "Not enough devices in farm"
    results = []
    for device in devices:
        # Flash firmware
        farm.flash_device(device, "firmware.bin")
        # Reboot
        farm.reboot_device(device)
        # Run test suite
        test_result = farm.run_tests(device, timeout=300)
        results.append({
            'device_id': device.id,
            'passed': test_result.passed,
            'failed': test_result.failed,
        })
    # All devices must pass
    failures = [r for r in results if r['failed'] > 0]
    assert len(failures) == 0, f"Devices failed: {failures}"

9.5 Test Metrics and Documentation
9.5.1 Key Test Metrics
Track these metrics to measure test effectiveness:
| Metric | Formula | Target | Purpose |
|---|---|---|---|
| Code Coverage | (Lines executed / Total lines) x 100% | 80%+ | Ensure adequate test breadth |
| Defect Density | Bugs found / 1000 lines of code | <5 per KLOC | Measure code quality |
| Mean Time to Detect (MTTD) | Average time from bug intro to detection | <1 week | Measure test effectiveness |
| Test Pass Rate | (Passed tests / Total tests) x 100% | >95% | Identify flaky tests |
| Field Failure Rate | Failures in field / Devices deployed | <1% first year | Validate pre-release testing |
9.5.2 Metrics Dashboard Example
Firmware Version 2.3.1 - Test Metrics
Code Coverage:
Unit tests: 87% (target: 80%) PASS
Integration tests: 42% (additional coverage)
Total coverage: 92%
Defect Density:
Total LOC: 15,000
Bugs found in testing: 23
Density: 1.5 bugs/KLOC PASS (target: <5)
Test Execution:
Unit tests: 1,247 tests, 1,247 passed (100%) PASS
Integration tests: 89 tests, 84 passed (94%) WARNING
End-to-end tests: 12 tests, 10 passed (83%) FAIL
MTTD:
Average: 4.2 days PASS (target: <7 days)
Longest: 18 days (memory leak in sleep mode)
Field Metrics (from v2.3.0):
Deployed devices: 10,000
Field failures: 47 (0.47%) PASS
Top failure: Wi-Fi reconnection timeout (18 devices)
9.6 Requirements Traceability
Why traceability matters:
- Regulatory compliance: FDA, automotive (ISO 26262) require proof of testing
- Audit trail: Understand why tests exist, what they validate
- Regression prevention: Ensure tests cover all requirements
9.6.1 Traceability Matrix
| Requirement ID | Requirement | Test ID | Test Type | Status |
|---|---|---|---|---|
| REQ-001 | Device boots in <5s | UT-015, IT-003 | Unit, Integration | Pass |
| REQ-002 | Wi-Fi reconnects after dropout | IT-022, E2E-005 | Integration, E2E | Pass |
| REQ-003 | Battery life >2 years | IT-030, SOAK-001 | Integration, Field | Pending |
| REQ-004 | Firmware signed with RSA-2048 | ST-012 | Security | Pass |
| REQ-005 | Temperature range: -40C to +85C | ENV-001, ENV-002 | Environmental | Pass |
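A matrix like this can also be checked mechanically in CI: every requirement must map to at least one test, and the build fails on gaps. A sketch, assuming requirements and test links are exported to a simple mapping (the data structure and the REQ-006 entry are illustrative):

```python
def find_untraced_requirements(matrix: dict) -> list:
    """Return requirement IDs with no linked test; these should fail the CI gate."""
    return [req for req, tests in matrix.items() if not tests]

requirements_to_tests = {
    "REQ-001": ["UT-015", "IT-003"],
    "REQ-002": ["IT-022", "E2E-005"],
    "REQ-006": [],  # hypothetical new requirement, no test linked yet
}
```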
9.6.2 Traceability Tools
| Tool | Purpose | Integration |
|---|---|---|
| JIRA + Xray | Link requirements to test cases | CI/CD reporting |
| TestRail | Test management with requirements linking | Automation APIs |
| Polarion | Full ALM (Application Lifecycle Management) | Enterprise |
9.7 Test Strategy Optimization
9.7.1 Tiered Testing Approach
Not all tests should run on every commit:
| Tier | Trigger | Duration | Tests |
|---|---|---|---|
| Tier 1 (Commit) | Every commit | <10 min | Unit tests, lint, build |
| Tier 2 (PR) | Pull request | <30 min | Integration, security scan |
| Tier 3 (Nightly) | Scheduled | <4 hours | HIL, extended integration |
| Tier 4 (Release) | Release candidate | <24 hours | Soak test, full regression |
9.7.2 Test Selection Optimization
# Intelligent test selection based on changed files
def select_tests(changed_files):
    tests_to_run = set()
    for file in changed_files:
        if file.startswith("src/wifi/"):
            tests_to_run.add("tests/unit/test_wifi.py")
            tests_to_run.add("tests/integration/test_wifi_connectivity.py")
        elif file.startswith("src/sensor/"):
            tests_to_run.add("tests/unit/test_sensor.py")
            tests_to_run.add("tests/integration/test_sensor_accuracy.py")
        elif file.startswith("src/mqtt/"):
            tests_to_run.add("tests/unit/test_mqtt.py")
            tests_to_run.add("tests/integration/test_mqtt_publish.py")
            tests_to_run.add("tests/integration/test_mqtt_subscribe.py")
    # Always run critical path tests
    tests_to_run.add("tests/unit/test_boot.py")
    tests_to_run.add("tests/integration/test_critical_path.py")
    return list(tests_to_run)

9.8 Knowledge Check
Worked Example: Optimizing CI/CD Pipeline Runtime for 9x Faster Feedback
Scenario: Your IoT firmware team has 350 unit tests taking 8 seconds, 80 integration tests taking 45 minutes, and 25 HIL tests taking 4 hours. Developers complain that waiting 45 minutes for integration test results kills productivity. They commit code, go to lunch, and find failures when they return - by which point context is lost.
Given:
- Team size: 8 developers making 15 commits/day total
- Current pipeline: Unit (8s) → Integration (45 min) → HIL (4 hrs) - all sequential
- Integration test breakdown: 30 MQTT tests (20 min), 25 Wi-Fi tests (15 min), 15 CoAP tests (8 min), 10 security tests (2 min)
- Problem: 45-minute integration feedback loop causes developers to context-switch away
Analysis:
Current Pipeline (sequential):
└─ Unit tests: 8s
└─ All integration tests: 45 min (blocking)
└─ HIL tests: 4 hrs
Total time to full results: ~5 hours
Developer impact:
- Commit at 10am → integration results at 10:45am
- Developer starts new work at 10:10am (context switched)
- At 10:45am, must recall what was committed 35 minutes ago
- If integration fails, must stop current work to fix old commit
- Typical productivity loss: 20-30 minutes per failed integration
Solution: Parallel Test Execution + Tiered Gating
# Optimized pipeline with parallel stages (GitHub Actions-style pseudocode)
jobs:
  gate-commit:
    runs-on: ubuntu-latest
    steps:
      - Unit tests (8s)
      - Static analysis (3s)
      - Build verification (2s)
    # TOTAL: ~15 seconds (pass/fail gate for commit)

  gate-critical-integration:  # Runs in parallel with other stages
    needs: gate-commit
    runs-on: [self-hosted, device-farm]
    strategy:
      matrix:
        test-suite: [mqtt-critical, wifi-critical, coap-critical]
    steps:
      - Run critical path tests (5 min per suite)
    # 3 suites × 5 min = 5 min parallel (not 15 min sequential)

  gate-full-integration:  # Runs nightly, not per-commit
    schedule: "0 2 * * *"  # 2 AM daily; in real GitHub Actions, cron triggers
                           # live under "on: schedule" in a separate workflow
    runs-on: [self-hosted, device-farm]
    strategy:
      matrix:
        test-suite: [mqtt-full, wifi-full, coap-full, security-full]
    steps:
      - Run comprehensive integration tests
    # 45 minutes, but not blocking commits

  gate-hil:  # Runs weekly
    schedule: "0 2 * * 0"  # 2 AM Sunday
    runs-on: [self-hosted, device-farm]
    steps:
      - Run full HIL regression suite (4 hours)

Results:
| Pipeline Stage | Before | After | Improvement |
|---|---|---|---|
| Commit feedback | 45 min | 5 min | 9x faster |
| Critical bug detection | 45 min | 5 min | Same quality |
| Full integration coverage | 45 min | Nightly | Deferred |
| Developer context loss | High | Minimal | 5-min window |
| Daily pipeline minutes | 675 min (15 commits × 45 min) | 75 min (15 × 5 min) | 89% reduction |
Critical insight: Not all integration tests are equally important. The 15 “critical path” tests (MQTT connection, Wi-Fi reconnection, CoAP request/response, TLS handshake, firmware signature validation) catch 80% of integration bugs. Running these 15 tests (5 minutes parallel) on every commit catches most issues fast, while comprehensive testing (45 minutes) runs nightly to catch the remaining 20%.
Putting Numbers to It
Pipeline optimization follows the 80/20 rule — critical tests catch most bugs while taking minimal time. Calculate daily pipeline cost to justify optimization:
\[\text{Daily Pipeline Cost} = \text{Commits/day} \times \text{Test Duration} \times \text{Compute Cost/min}\]
For 15 daily commits with 45-minute full test runs at $0.10/minute compute cost:
\[\text{Before: } 15 \times 45 \times 0.10 = \$67.50\text{/day} = \$2,025\text{/month}\]
With tiered testing (5-minute critical tests per commit, 45-minute comprehensive nightly):
\[\text{After: } (15 \times 5 + 1 \times 45) \times 0.10 = \$12.00\text{/day} = \$360\text{/month}\]
The optimization saves $1,665/month while maintaining quality (critical tests catch 80% of bugs within 5 minutes vs 45-minute wait).
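The cost arithmetic above is easy to reproduce and to re-run with your own team's numbers; a quick sanity check in Python (the function name is illustrative):

```python
def daily_pipeline_cost(commits_per_day: int, minutes_per_commit_run: float,
                        cost_per_minute: float, nightly_minutes: float = 0) -> float:
    """Daily compute cost: per-commit runs plus an optional nightly run."""
    return (commits_per_day * minutes_per_commit_run + nightly_minutes) * cost_per_minute

# Before: full 45-minute suite on all 15 daily commits
before = daily_pipeline_cost(15, 45, 0.10)
# After: 5-minute critical gate per commit, 45-minute comprehensive run nightly
after = daily_pipeline_cost(15, 5, 0.10, nightly_minutes=45)
monthly_savings = (before - after) * 30
```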
Key Insight: Fast feedback trumps comprehensive feedback for developer productivity. A 5-minute critical test gate catches most bugs immediately, while nightly comprehensive tests catch the rest without blocking flow.
Decision Framework: Selecting Test Automation Strategy for Your IoT Project
Use this framework to determine which testing layers to automate and how frequently to run them based on project constraints.
| Project Characteristic | Automation Strategy | Rationale |
|---|---|---|
| Team size: 1-3 developers | Unit tests on commit (100%), Integration on PR merge (80%), Manual HIL (20%) | Small teams can’t maintain complex automation - prioritize unit tests |
| Team size: 4-10 developers | Unit + critical integration on commit, Full integration nightly, HIL weekly | Balance fast feedback with comprehensive coverage |
| Team size: 10+ developers | Tiered pipeline: Unit (commit), Critical integration (commit), Full integration (nightly), HIL (nightly), Soak (weekly) | Large teams need sophisticated gating to prevent broken main branch |
| Safety-critical (medical, automotive) | 100% automated testing with requirements traceability matrix, mandatory HIL on every PR | Regulatory compliance requires documented testing of every requirement |
| Consumer IoT (non-critical) | Unit + integration automated, Manual HIL before releases | Cost-effectiveness - automate common cases, manual test edge cases |
| Rapid prototyping phase | Unit tests only (60% coverage target) | Speed over perfection - automate critical paths, manual test new features |
| Mature product maintenance | Comprehensive automation (unit 85%, integration 90%, HIL 80%) | Prevent regressions - automate everything to protect stable codebase |
| Budget <$10K/year | GitHub Actions (free tier), DIY device farm (10 devices, $500), Manual HIL | Maximize free/low-cost tools |
| Budget $10K-$50K/year | GitHub Actions (paid), Cloud device farm (AWS, Firebase), Semi-automated HIL | Professional tools with managed infrastructure |
| Budget >$50K/year | Enterprise CI/CD (Jenkins, GitLab), Dedicated device farm (50+ devices), Full automation | Build custom infrastructure for maximum control |
Decision tree for test execution frequency:
Is the test deterministic and fast (<30 seconds)?
├─ YES: Run on every commit
└─ NO: Is it critical for basic functionality?
├─ YES: Run on pull request merge
└─ NO: Does it require expensive hardware or >5 minutes?
├─ YES: Run nightly or weekly
└─ NO: Run on pull request merge
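One way to make this decision tree actionable is to encode it as a helper that tags each test with its trigger, e.g. when generating CI configuration. A sketch (the tag names are illustrative):

```python
def execution_frequency(deterministic_and_fast: bool,
                        critical: bool,
                        expensive_or_slow: bool) -> str:
    """Map the decision tree above onto a CI trigger tag."""
    if deterministic_and_fast:
        return "every-commit"
    if critical:
        return "pr-merge"
    if expensive_or_slow:
        return "nightly-or-weekly"
    return "pr-merge"
```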
When to run different test types:
| Test Type | Trigger | Duration Budget | Coverage Target |
|---|---|---|---|
| Unit tests | Every commit | <10 seconds | 85% line coverage |
| Static analysis | Every commit | <5 seconds | 100% of codebase |
| Integration (critical) | Every commit | <5 minutes | 100% of critical paths |
| Integration (full) | Pull request or nightly | <30 minutes | 90% of subsystem interactions |
| HIL (smoke test) | Pull request | <15 minutes | Top 10 hardware scenarios |
| HIL (regression) | Nightly | <4 hours | 80% of hardware interactions |
| Soak testing | Weekly | 72-168 hours | 100% of long-running scenarios |
| Security scanning | Pull request | <10 minutes | 100% of codebase + dependencies |
| Performance testing | Pull request (if affected) | <30 minutes | Baseline + stress scenarios |
Key Insight: Test automation is not all-or-nothing. Start with unit tests (highest ROI), add critical integration tests, then expand based on pain points and failure patterns.
Common Mistake: Treating All Tests Equally in CI/CD Pipelines
The Mistake: Running all 455 tests (unit + integration + HIL) sequentially on every commit, resulting in 5-hour feedback loops. Developers commit code, go home, discover failures the next morning.
Why It Happens:
- Misunderstanding of the testing pyramid - treating all tests as equally important
- “If some testing is good, more testing is better” fallacy
- Not measuring the cost of slow feedback vs. comprehensive coverage
- CI/CD tools default to sequential execution
Real-World Impact:
Scenario: Developer commits Wi-Fi reconnection fix at 4:30 PM
├─ 5:00 PM: Developer leaves office (tests still running)
├─ 9:00 PM: CI pipeline completes - integration test failure found
├─ 9:00 AM next day: Developer arrives, sees failure notification
└─ Cost: 16 hours of latency, broken main branch overnight
Second-Order Effects:
- Developers stop running CI locally (too slow), pushing broken code more often
- Teams create “CI skip” commit messages to bypass slow pipelines
- Integration bugs reach production because comprehensive testing is too painful
- Developer morale suffers from constant context switching
The Fix:
Tiered testing with different triggers:
- Fast tests (unit + lint): Every commit, <10 seconds
- Critical integration: Every commit, <5 minutes
- Full integration: Pull request or nightly
- Expensive tests (HIL, soak): Nightly or weekly
Parallel execution:
# Run integration suites in parallel, not sequentially
strategy:
  matrix:
    suite: [wifi, mqtt, coap, ble, security]
# 5 suites × 5 min = 5 min parallel (not 25 min sequential)

Smart test selection:

# Only run affected tests for targeted commits
if any(f.startswith("src/wifi/") for f in changed_files):
    run_tests(["unit/wifi", "integration/wifi"])

Fail-fast strategy:

# Stop remaining matrix jobs immediately on first failure
strategy:
  fail-fast: true
Verification: Your pipeline is well-designed if:
- ✅ Developers get commit feedback within 5-10 minutes
- ✅ Critical integration tests run on every commit
- ✅ Full regression suite runs daily (catches remaining edge cases)
- ✅ Developers never bypass CI/CD because it’s “too slow”
- ✅ Main branch breaks <1% of the time (fast critical tests catch most issues)
Key Insight: The goal of CI/CD is not to run all tests on every commit - it’s to provide fast, actionable feedback that prevents bad code from reaching production. A 5-minute critical test suite that catches 80% of bugs is more valuable than a 5-hour comprehensive suite that catches 95%.
9.9 Summary
Test automation enables quality at scale:
- CI/CD Pipelines: Automated build, test, and deployment on every commit
- Device Farms: Real hardware testing integrated into CI/CD
- Tiered Testing: Fast tests per-commit, comprehensive tests nightly/weekly
- Metrics: Track coverage, defect density, MTTD, and field failure rate
- Traceability: Link requirements to tests for regulatory compliance
9.10 Knowledge Check
9.11 Concept Relationships
How This Connects
Builds on: Testing Fundamentals establishes the testing pyramid; HIL Testing provides automated hardware validation.
Relates to: Integration Testing defines what to automate; Environmental Testing includes production test automation.
Leads to: Field Testing validates automation coverage; continuous deployment pipelines.
Part of: DevOps for IoT - automating the full develop-test-deploy cycle.
9.12 See Also
CI/CD Platforms:
- GitHub Actions for IoT: docs.github.com/actions/iot
- GitLab CI for Embedded: docs.gitlab.com/ee/ci/embedded
- Jenkins Pipeline Examples: Community IoT repos
Test Frameworks:
- Unity Test for C: throwtheswitch.org/unity
- Pytest for Python: docs.pytest.org
- Robot Framework for integration: robotframework.org
Metrics and Reporting:
- Codecov for coverage tracking
- SonarQube for code quality
- TestRail for requirements traceability
9.13 Try It Yourself
Build a CI/CD Pipeline for ESP32 Firmware
Task: Create GitHub Actions workflow that compiles, tests, and flashes firmware on every commit.
Implementation (2 hours):
1. Create workflow file .github/workflows/firmware-ci.yml
2. Add stages:
   - Compile with PlatformIO
   - Run unit tests (Unity framework)
   - Static analysis (cppcheck)
   - Flash to self-hosted HIL device
   - Run integration tests (pytest)
3. Configure self-hosted runner on Raspberry Pi connected to ESP32 test board
4. Add badge to README showing build status
What to Observe:
- Every commit triggers full test suite automatically
- Failed builds block PR merges
- Test results visible in GitHub UI
- Hardware-in-the-loop tests run on physical device
Expected Outcome: From “works on my machine” to “works in CI” - catching integration bugs before they reach main branch.
Deliverable: Working GitHub repo with green CI badge and test history.
Common Pitfalls
1. Automating Only Unit Tests and Skipping Integration Tests
IoT firmware unit tests that mock all hardware dependencies and external services validate logic in isolation but miss the most common production failures: incorrect HAL usage, timing-dependent race conditions between RTOS tasks, sensor communication errors, and cloud API contract violations. A test suite with 90% unit test coverage but no integration tests provides false confidence. Balance the test pyramid: 60% unit, 30% integration, 10% E2E — do not skip the integration layer.
2. Writing Tests That Depend on Test Execution Order
Automated test suites where Test B relies on state created by Test A fail when tests are run in parallel or Test A is disabled. Each test must be: independent (no shared mutable state), idempotent (produces same result on repeated runs), and isolated (creates and cleans up its own fixtures). Use setup/teardown functions (setUp/tearDown in Unity, conftest fixtures in pytest) to establish a clean state for each test rather than relying on execution order.
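As a concrete illustration of the setup/teardown pattern, here is a minimal sketch (file names and contents are illustrative): each test gets a freshly created scratch state, and cleanup runs regardless of whether the test passed.

```python
import json
import os
import shutil
import tempfile

def create_test_state():
    """Set up an isolated scratch directory with a fresh device config."""
    workdir = tempfile.mkdtemp(prefix="iot_test_")
    config_path = os.path.join(workdir, "device.json")
    with open(config_path, "w") as f:
        json.dump({"wifi_ssid": "test-ap", "boot_count": 0}, f)
    return workdir, config_path

def destroy_test_state(workdir):
    """Tear down: remove everything the test created."""
    shutil.rmtree(workdir, ignore_errors=True)

# pytest wiring: a conftest.py fixture would yield between the two calls:
#
# @pytest.fixture
# def device_config():
#     workdir, path = create_test_state()
#     yield path                     # test body runs here
#     destroy_test_state(workdir)    # cleanup runs even if the test failed
```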
3. Ignoring Flaky Tests Instead of Fixing Them
Flaky tests (intermittently pass/fail due to timing, resource contention, or non-deterministic behavior) erode CI trustworthiness — teams start ignoring red builds “because it’s probably just flaky.” Treat flaky tests as production bugs: quarantine flaky tests immediately (separate suite, not blocking merge), assign ownership for investigation, and require resolution within one sprint. Track flaky test rate as a team quality metric; keep it below 0.5% of total test count.
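Tracking that flaky-rate budget is a one-liner worth wiring into the metrics dashboard; a sketch against the 0.5% threshold suggested above (the quarantine counts are illustrative):

```python
def flaky_rate_ok(flaky_count: int, total_tests: int, budget: float = 0.005) -> bool:
    """True if the share of quarantined flaky tests stays within budget (0.5% default)."""
    return (flaky_count / total_tests) <= budget
```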
4. Not Testing Error Recovery and Exception Paths
IoT firmware test suites with 80% coverage often miss error-handling code paths entirely. The most critical production failures are in error recovery paths: what happens after a sensor read timeout? After an I2C bus lock? After a network connection failure with a full data buffer? Write explicit tests for each error scenario: inject simulated hardware failures using mock objects, verify the device enters a safe state, and validate that recovery returns the system to normal operation correctly.
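For example, a sensor-timeout recovery path can be exercised by injecting failures with a mock. The driver interface and function below are hypothetical, but the pattern (fail N times, then verify the safe-state fallback) applies to any error path:

```python
from unittest import mock

class SensorTimeout(Exception):
    """Raised when a sensor read exceeds its deadline (hypothetical driver error)."""

def read_temperature(sensor, retries=3, fallback=None):
    """Return a reading, retrying on timeout and falling back to a safe value."""
    for _ in range(retries):
        try:
            return sensor.read()
        except SensorTimeout:
            continue
    return fallback  # safe state: report the fallback instead of crashing

# Inject failures: first two reads time out, third succeeds
flaky_sensor = mock.Mock()
flaky_sensor.read.side_effect = [SensorTimeout(), SensorTimeout(), 21.5]

# A dead sensor that times out on every read
dead_sensor = mock.Mock()
dead_sensor.read.side_effect = SensorTimeout()
```

The same `side_effect` technique can simulate I2C bus locks or network drops, letting CI verify recovery behavior without any hardware fault injection.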
9.14 What’s Next?
Continue your testing journey with these chapters:
- Testing Overview: Return to the complete testing guide
- Field Testing and Deployment: Beta programs and real-world validation
| Previous | Current | Next |
|---|---|---|
| Security Testing for IoT Devices | Test Automation and CI/CD for IoT | Hardware Simulation Fundamentals |