Organize Code Effectively: Structure firmware into modular, maintainable components
Manage Configuration: Separate secrets from code and enable runtime configuration
Optimize Power Consumption: Implement sleep modes and duty cycling
Handle Errors Gracefully: Build robust error recovery and watchdog protection
Avoid Common Mistakes: Recognize and prevent the most frequent IoT firmware bugs
For Beginners: Software Best Practices
This hands-on chapter gives you practical prototyping experience with real (or simulated) IoT hardware. Think of it as workshop time – reading about prototyping is useful, but the real learning happens when you pick up the tools and start building. Each exercise builds skills you will use in your own IoT projects.
Sensor Squad: Rules for Writing Good Code
“Let me share the mistakes I have made so you do not repeat them!” said Max the Microcontroller with a sigh. “Rule one: organize your code into modules. Sensor code in one file, communication in another, configuration in a third. When everything is in one giant file, debugging becomes a nightmare.”
Sammy the Sensor shared his tip. “Never hard-code your Wi-Fi password or API keys in the firmware! Put them in a separate configuration file or use environment variables. If you accidentally share your code, you do not want secrets exposed.” Lila the LED added, “And always add logging! Print debug messages with severity levels – INFO for normal events, WARNING for concerning things, and ERROR for failures. When something breaks at 3 AM, those log messages are your only clue.”
Bella the Battery had the most important advice. “Use sleep modes! After reading a sensor and sending data, put the microcontroller into deep sleep until the next reading is due. I have seen projects waste 100 times more power than necessary just because the code forgot to sleep. And always set up a watchdog timer – if the code crashes or hangs, the watchdog automatically reboots the device. Without it, a frozen sensor stays frozen forever!”
Key Concepts
Microcontroller Unit (MCU): Integrated circuit combining CPU, RAM, flash, and peripherals optimised for embedded control applications.
Microprocessor Unit (MPU): High-performance processor requiring external RAM, storage, and peripherals, used in Linux-based IoT devices like Raspberry Pi.
Schematic: Electrical diagram showing component connections using standardised symbols, used to guide PCB layout.
PCB (Printed Circuit Board): Fiberglass substrate with etched copper traces connecting electronic components into a permanent assembly.
ESD Protection: Diodes and resistors protecting sensitive IC pins from electrostatic discharge during handling and in-field use.
Decoupling Capacitor: Small capacitor placed close to IC power pins to suppress high-frequency noise on the supply rail.
Design Rule Check (DRC): Automated PCB verification ensuring trace widths, clearances, and drill sizes meet the fabrication process constraints.
24.2 Prerequisites
Before diving into this chapter, you should be familiar with:
Power Budget Results (interactive calculator output): energy per cycle (mAh), cycle duration (seconds), average current draw (mA), battery life with deep sleep (days), battery life in continuous operation (days), and improvement factor (how many times longer the battery lasts with deep sleep).
Tip: Adjust the wake and sleep times to see how duty cycling affects battery life. Even small reductions in wake time or active current can have dramatic impacts!
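The figures in the results panel follow from simple duty-cycle arithmetic. Here is a minimal sketch in portable C++, assuming an idealized battery (no self-discharge, voltage sag, or regulator losses) and a two-state wake/sleep cycle. The struct and function names are illustrative and only mirror the panel's row labels.

```cpp
#include <cassert>

// Idealized duty-cycle power budget. All names here are illustrative.
struct PowerBudget {
    double energy_per_cycle_mAh;      // charge used by one wake + sleep cycle
    double cycle_time_s;              // wake time + sleep time
    double avg_current_mA;            // time-weighted average current
    double lifetime_days;             // battery life with deep sleep
    double continuous_lifetime_days;  // battery life if the device never sleeps
    double improvement_factor;        // lifetime_days / continuous_lifetime_days
};

PowerBudget computePowerBudget(double battery_mAh,
                               double wake_mA, double wake_s,
                               double sleep_mA, double sleep_s) {
    assert(battery_mAh > 0 && wake_mA > 0 && wake_s > 0);
    assert(sleep_mA >= 0 && sleep_s >= 0);

    PowerBudget r{};
    r.cycle_time_s = wake_s + sleep_s;

    // Charge per cycle in mA*s; dividing by 3600 converts to mAh.
    const double charge_mAs = wake_mA * wake_s + sleep_mA * sleep_s;
    r.energy_per_cycle_mAh = charge_mAs / 3600.0;
    r.avg_current_mA = charge_mAs / r.cycle_time_s;

    // Capacity (mAh) / current (mA) = hours; divide by 24 for days.
    r.lifetime_days = battery_mAh / r.avg_current_mA / 24.0;
    r.continuous_lifetime_days = battery_mAh / wake_mA / 24.0;
    r.improvement_factor = r.lifetime_days / r.continuous_lifetime_days;
    return r;
}
```

For example, a 2400 mAh battery with 100 mA for 1 s awake and an idealized 0 mA for 9 s asleep averages 10 mA, which gives about 10 days instead of 1 day of continuous operation.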
Test with Power Profiling Tools Early
A critical oversight is developing firmware without measuring actual power consumption until deployment. Prototype testing on USB power masks the reality of battery operation.
Example: A “low power” environmental sensor prototype ran fine on USB for months, but deployed units with 2000 mAh batteries lasted only 3 days instead of the projected 6 months. The Wi-Fi radio wasn’t sleeping properly and kept drawing 80 mA continuously.
Solution: Use power profilers (Nordic Power Profiler Kit, Joulescope, or simple INA219 breakout) during development. Measure current in all states: active, idle, sleep, transmission.
24.6 Error Handling
Graceful Degradation:
bool sendData() {
  int retries = 3;
  while (retries > 0) {
    if (WiFi.status() == WL_CONNECTED) {
      if (client.publish(topic, data)) {
        return true;
      }
    }
    retries--;
    delay(1000);
  }
  // Store data locally for later transmission
  storeDataLocally(data);
  return false;
}
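The `storeDataLocally()` call above is not defined in this chapter. One possible shape for it is a fixed-size ring buffer, sketched here in portable C++ under the assumption that readings are small and that losing the oldest sample is acceptable when storage fills up. The class name `OfflineBuffer` is hypothetical.

```cpp
#include <array>
#include <cstddef>

// Fixed-capacity queue for readings that could not be sent.
// When full, the oldest reading is overwritten so memory use stays bounded.
template <typename T, std::size_t N>
class OfflineBuffer {
public:
    void push(const T& value) {
        // Next write slot; when the buffer is full this is the oldest entry.
        items_[(head_ + count_) % N] = value;
        if (count_ < N) {
            ++count_;
        } else {
            head_ = (head_ + 1) % N;  // oldest entry was overwritten
        }
    }

    // Copies the oldest reading into `out`; returns false if the buffer is empty.
    bool pop(T& out) {
        if (count_ == 0) return false;
        out = items_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }

private:
    std::array<T, N> items_{};
    std::size_t head_ = 0;   // index of the oldest entry
    std::size_t count_ = 0;  // number of stored entries
};
```

Because the capacity is a compile-time constant, the buffer never allocates from the heap, so it cannot fragment or leak memory while the device is offline.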
Watchdog Timer:
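#include <esp_task_wdt.h>

void setup() {
  // 30 second watchdog, panic (reset) on timeout.
  // This is the Arduino-ESP32 2.x / ESP-IDF 4.x signature;
  // ESP-IDF 5.x takes a pointer to a config struct instead.
  esp_task_wdt_init(30, true);
  esp_task_wdt_add(NULL);  // Watch the current task
}

void loop() {
  // Feed watchdog to prevent reset
  esp_task_wdt_reset();
  // Your code
  doWork();
}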
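To see why feeding matters, the watchdog's behaviour can be modelled on a desktop machine. This is a simplified software model, not the ESP32 hardware watchdog or its API: the class name and the caller-supplied timestamps are assumptions made so the logic can be tested without a board.

```cpp
#include <cstdint>

// Host-side model of a watchdog: it expires when it has not been fed
// for at least timeoutMs milliseconds.
class SoftWatchdog {
public:
    SoftWatchdog(std::uint32_t timeoutMs, std::uint32_t nowMs)
        : timeoutMs_(timeoutMs), lastFedMs_(nowMs) {}

    // Record that the main loop is still alive.
    void feed(std::uint32_t nowMs) { lastFedMs_ = nowMs; }

    // Unsigned subtraction stays correct when the millisecond counter wraps.
    bool expired(std::uint32_t nowMs) const {
        return static_cast<std::uint32_t>(nowMs - lastFedMs_) >= timeoutMs_;
    }

private:
    std::uint32_t timeoutMs_;
    std::uint32_t lastFedMs_;
};
```

A real watchdog resets the chip when it expires; in this model a test simply checks `expired()` after simulating a hang.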
Watch how an ESP32’s heap memory degrades over time with and without proper memory management. Adjust the allocation size and loop interval to see how quickly memory runs out.
Memory Usage Over Time (interactive chart): free heap (KB) on the vertical axis against loop iterations on the horizontal axis, with a shaded “Crash zone” covering the last 10 KB of free heap. The panel reports the selected strategy, the number of loops until crash, and the time to crash in seconds (or “Stable”).
Insight: With a leaking strategy, the device crashes after a short time. In production, this means random reboots that are extremely hard to debug. Switch to “delete[] after use” or “Stack allocation” to see stable behavior: memory remains stable across all iterations, which is the pattern you should use in production IoT firmware.
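The leaking case in the simulation reduces to one line of arithmetic. Here is a rough model, assuming every loop leaks a fixed number of bytes, nothing else allocates, and the device fails once free heap drops below a reserve (the chart marks the last 10 KB as the crash zone). The function name and parameters are illustrative.

```cpp
#include <cstdint>

// Number of complete loop iterations a leaking device survives before
// free heap would fall below `reserveBytes`.
// Returns UINT32_MAX when nothing leaks (the device never runs out).
std::uint32_t loopsUntilCrash(std::uint32_t totalHeapBytes,
                              std::uint32_t leakPerLoopBytes,
                              std::uint32_t reserveBytes) {
    if (leakPerLoopBytes == 0) return UINT32_MAX;   // stable: no leak
    if (totalHeapBytes <= reserveBytes) return 0;   // already in the crash zone
    return (totalHeapBytes - reserveBytes) / leakPerLoopBytes;
}
```

With an assumed 320 KB heap, a 1 KB leak per loop, and a 10 KB reserve, the model gives 310 loops. At one loop per second that is a little over five minutes of uptime, which is why a leak that looks harmless on the bench shows up quickly in the field.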
24.8 Common Pitfalls and Solutions
24.8.1 Pitfall 1: Blocking Code
Problem:
// BAD: This blocks for 5 seconds!
void loop() {
  delay(5000);  // Device can't respond to anything
  readSensor();
}
Solution:
// GOOD: Non-blocking delay
unsigned long lastRead = 0;

void loop() {
  if (millis() - lastRead >= 5000) {
    readSensor();
    lastRead = millis();
  }
  // Can do other things while "waiting"
}
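The millis() pattern can be wrapped in a small reusable timer. The sketch below is portable C++ with the current time passed in by the caller (on a device you would pass millis()); `IntervalTimer` is a hypothetical name. Unsigned subtraction keeps the comparison correct when the 32-bit millisecond counter wraps, which happens after roughly 49.7 days.

```cpp
#include <cstdint>

// Periodic timer driven by a caller-supplied millisecond timestamp.
class IntervalTimer {
public:
    IntervalTimer(std::uint32_t intervalMs, std::uint32_t startMs)
        : intervalMs_(intervalMs), lastMs_(startMs) {}

    // Returns true once the interval has elapsed, then re-arms itself.
    bool due(std::uint32_t nowMs) {
        const std::uint32_t elapsed = static_cast<std::uint32_t>(nowMs - lastMs_);
        if (elapsed < intervalMs_) return false;
        lastMs_ = nowMs;
        return true;
    }

private:
    std::uint32_t intervalMs_;
    std::uint32_t lastMs_;
};
```

One timer per task (sensor read, status LED, network check) lets a single loop service all of them without any call to delay().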
Try It: Blocking vs Non-Blocking Timeline
Visualize how blocking delay() freezes the entire device versus how millis()-based timing lets the microcontroller handle multiple tasks concurrently. Adjust the number of tasks and timing to see the impact.
// Prototype runs fine, then hangs forever
void loop() {
  readSensor();
  sendToCloud();  // If this hangs, device freezes!
}
Real consequence: 500 environmental sensors were deployed in a remote rainforest, and 100 of them froze within the first week. The site had no easy physical access, so restarting the devices meant sending technicians on a 6-hour hike ($15,000 cost).
Solution:
#include <esp_task_wdt.h>

void setup() {
  esp_task_wdt_init(30, true);  // 30-second watchdog
  esp_task_wdt_add(NULL);
}

void loop() {
  esp_task_wdt_reset();  // Pet the dog every loop
  readSensor();
  sendToCloud();
  // If loop hangs for 30+ seconds, watchdog resets device
}
24.9 Requirements Pitfalls
Pitfall: Requirements Scope Creep
The mistake: Continuously adding features during prototyping without re-evaluating timeline, budget, or feasibility.
Symptoms:
Feature list grows after every stakeholder meeting
Original 3-month timeline becomes 9 months
Team working on multiple half-finished features simultaneously
The fix:
Define MVP before prototyping starts
Create a “feature parking lot” for post-MVP ideas
Require trade-off analysis: “What do we cut to add this?”
Use timeboxing: fixed end dates, not feature gates
Pitfall: Ignoring Non-Functional Requirements
The mistake: Focusing entirely on features while ignoring security, power consumption, and reliability.
Symptoms:
Prototype sends data over unencrypted HTTP
Battery life measured in hours instead of months
Device crashes after 24 hours due to memory leaks
No OTA update mechanism
The fix:
Security: Use TLS from the first prototype
Power: Measure current consumption weekly
Reliability: Implement watchdog timers in prototype code
Maintainability: Design OTA mechanism before field deployment
24.10 Summary Table
| Mistake | Symptom | Fix |
|---|---|---|
| Blocking code | Device unresponsive | Use millis() instead of delay() |
| No error handling | Random crashes | Check all return values |
| Hardcoded credentials | Can’t deploy to different sites | Store in EEPROM/NVS, use config portal (see detailed callout below) |
| Interrupt safety issues | Race conditions, corrupted data | Use volatile, atomic access with noInterrupts() |
| Memory leaks | Crashes after hours/days | Free allocated memory, use stack allocation |
| Not testing edge cases | Field failures | Test Wi-Fi drops, sensor errors, broker timeouts |
| No watchdog timer | Freezes require manual restart | Enable hardware watchdog, feed regularly |
Golden Rule: Every “I’ll fix it later” in your prototype becomes a $10,000 bug in production. Fix it NOW while it’s easy!
Try It: Firmware Quality Scorecard
Rate your IoT firmware project against the best practices covered in this chapter. Check each practice you have implemented to see your overall quality score and identify areas for improvement.
viewof fw_checks = Inputs.checkbox(
  [
    "Modular code (separate files for sensors, network, config)",
    "Secrets stored outside source code (NVS/EEPROM or secrets.h)",
    "Deep sleep / power management implemented",
    "Non-blocking timing (millis() instead of delay())",
    "Error handling on all sensor reads",
    "Error handling on all network operations",
    "Watchdog timer enabled",
    "Memory managed properly (no leaks, stack allocation preferred)",
    "Logging with severity levels (INFO/WARN/ERROR)",
    "Timeouts on all blocking operations",
    ".gitignore covers secrets and env files",
    "Tested with power profiler"
  ],
  {label: "Check each practice you have implemented:", value: []}
)
fw_score = {
  const total = 12;
  const checked = fw_checks.length;
  const pct = (checked / total) * 100;

  // Group the checklist items into categories for per-category scores
  const categories = [
    {name: "Code Organization", color: "#3498DB", items: [
      "Modular code (separate files for sensors, network, config)"]},
    {name: "Security", color: "#9B59B6", items: [
      "Secrets stored outside source code (NVS/EEPROM or secrets.h)",
      ".gitignore covers secrets and env files"]},
    {name: "Power", color: "#16A085", items: [
      "Deep sleep / power management implemented",
      "Tested with power profiler"]},
    {name: "Reliability", color: "#E67E22", items: [
      "Non-blocking timing (millis() instead of delay())",
      "Error handling on all sensor reads",
      "Error handling on all network operations",
      "Watchdog timer enabled",
      "Timeouts on all blocking operations"]},
    {name: "Maintainability", color: "#2C3E50", items: [
      "Memory managed properly (no leaks, stack allocation preferred)",
      "Logging with severity levels (INFO/WARN/ERROR)"]}
  ];

  const catScores = categories.map(cat => {
    const done = cat.items.filter(item => fw_checks.includes(item)).length;
    return {...cat, done, total: cat.items.length, pct: (done / cat.items.length) * 100};
  });

  // Letter grade from the overall percentage
  let grade, gradeColor;
  if (pct >= 90) { grade = "A"; gradeColor = "#16A085"; }
  else if (pct >= 75) { grade = "B"; gradeColor = "#3498DB"; }
  else if (pct >= 58) { grade = "C"; gradeColor = "#E67E22"; }
  else if (pct >= 40) { grade = "D"; gradeColor = "#E74C3C"; }
  else { grade = "F"; gradeColor = "#E74C3C"; }

  // Targeted advice for the highest-impact missing practices
  const missing = [];
  if (!fw_checks.includes("Watchdog timer enabled"))
    missing.push("Add a watchdog timer -- without it, a hung device stays hung forever.");
  if (!fw_checks.includes("Secrets stored outside source code (NVS/EEPROM or secrets.h)"))
    missing.push("Move credentials out of source code before someone accidentally pushes them to GitHub.");
  if (!fw_checks.includes("Error handling on all network operations"))
    missing.push("Add error handling to network calls -- network failures are the #1 cause of field issues.");
  if (!fw_checks.includes("Deep sleep / power management implemented"))
    missing.push("Implement sleep modes to extend battery life from days to months.");

  return { checked, total, pct, grade, gradeColor, catScores, missing };
}
Scenario: An ESP32 weather station randomly resets every 2-4 hours in the field, but never during bench testing. Serial logs show: rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT) — an RTC watchdog reset. How do you diagnose and fix this?
Step 1: Enable Detailed Logging
Add structured logging with timestamps and severity levels:
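One possible shape for such a logger, sketched in portable C++. The function and enum names are assumptions; on a device the uptime would come from millis() and the finished line would be written to the serial port. Macros such as LOG_INFO, LOG_WARN, and LOG_ERROR could be thin wrappers around a function like this.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

enum class LogLevel { Info, Warn, Error };

// Short text label for each severity level.
const char* levelName(LogLevel level) {
    switch (level) {
        case LogLevel::Info:  return "INFO";
        case LogLevel::Warn:  return "WARN";
        case LogLevel::Error: return "ERROR";
    }
    return "?";
}

// Builds one log line: "[<uptime> ms] [<LEVEL>] <message>".
std::string formatLog(LogLevel level, std::uint32_t uptimeMs,
                      const std::string& message) {
    char prefix[32];  // large enough for a 10-digit uptime and the longest label
    std::snprintf(prefix, sizeof(prefix), "[%lu ms] [%s] ",
                  static_cast<unsigned long>(uptimeMs), levelName(level));
    return std::string(prefix) + message;
}
```

Logging a line immediately before and after each major step (sensor read, Wi-Fi connect, MQTT publish) means the last line printed before a reset points at the operation that was running.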
Finding: Reset happens during MQTT publish, not during sensor reading or Wi-Fi connection. This suggests the MQTT publish is blocking for too long (>10 seconds), triggering the RTC watchdog.
Test: Disconnect the MQTT broker (unplug Ethernet cable) and watch the device behavior. After 10-15 seconds, device resets with same error.
Root cause: The mqtt.publish() call has no timeout. When the broker is unreachable, the underlying TCP socket blocks for 15+ seconds waiting for a response, triggering the watchdog.
Step 4: Apply the Fix
Add timeout to MQTT publish:
bool publishWithTimeout(const char* topic, const char* payload, int timeoutMs) {
  unsigned long startTime = millis();

  // Set socket timeout
  mqtt.setSocketTimeout(timeoutMs / 1000);

  // Attempt publish
  bool success = mqtt.publish(topic, payload);

  // Check if we exceeded timeout
  if (millis() - startTime > timeoutMs) {
    LOG_WARN("MQTT publish timed out!");
    mqtt.disconnect();  // Force disconnect hung connection
    return false;
  }
  return success;
}

void loop() {
  LOG_INFO("Sending to MQTT...");
  if (!publishWithTimeout("temp", String(temp).c_str(), 5000)) {
    LOG_ERROR("MQTT publish failed or timed out");
    // Store data locally for retry on next cycle
    storeToSD(temp);
  }
}
Step 5: Add Watchdog Timer Management
Feed the watchdog before any long-running operation:
#include <esp_task_wdt.h>

void setup() {
  esp_task_wdt_init(30, true);  // 30-second watchdog
  esp_task_wdt_add(NULL);       // Subscribe current task
}

void loop() {
  esp_task_wdt_reset();  // Pet the dog
  readSensor();

  esp_task_wdt_reset();  // Pet again before network ops
  connectWiFi();

  esp_task_wdt_reset();
  publishWithTimeout("temp", String(temp).c_str(), 5000);

  esp_task_wdt_reset();
  // Assumes a wake-up source (for example a timer) was configured earlier;
  // without one the device stays in deep sleep until an external reset.
  esp_deep_sleep_start();
}
Step 6: Field Test
Deployed with fix: 7 days, zero resets. Problem solved!
Lessons learned:
Structured logging is essential for debugging intermittent field issues
Put timeouts on all blocking operations (network, SD card, sensor reads)
Watchdog timers catch hangs, but you must feed them appropriately
Test failure scenarios (unplug broker, bad sensor, no SD card) before deployment
Decision Framework: When to Use Modular Architecture vs Monolithic
| Project Characteristic | Monolithic (Single main.cpp) | Modular (Multiple files, classes) |
|---|---|---|
| Code size | <500 lines | >500 lines |
| Number of features | 1-3 (read sensor, send data, sleep) | 5+ (sensors, network, display, OTA, config UI) |
| Team size | 1 developer | 2+ developers |
| Testing approach | Manual on-device testing | Unit tests + integration tests |
| Maintenance timeline | Prototype (1-3 months) | Production (1+ years) |
| Code reuse | None (one-off project) | Multiple similar projects |
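What makes the modular column unit-testable is that application logic depends on an interface instead of on hardware. Here is a minimal sketch of that idea, with all class and function names hypothetical.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Interface that hides the hardware behind a small contract.
class TemperatureSensor {
public:
    virtual ~TemperatureSensor() = default;
    virtual float readCelsius() = 0;  // returns NAN when the read fails
};

// Test double for host-side unit tests; no hardware required.
class FakeSensor : public TemperatureSensor {
public:
    explicit FakeSensor(float value) : value_(value) {}
    float readCelsius() override { return value_; }

private:
    float value_;
};

// Application logic sees only the interface, so it runs unchanged
// against a real driver on the device or a fake on a desktop.
std::string buildPayload(TemperatureSensor& sensor) {
    const float t = sensor.readCelsius();
    if (std::isnan(t)) return "{\"error\":\"sensor\"}";

    char buf[32];
    std::snprintf(buf, sizeof(buf), "{\"temp\":%.1f}", static_cast<double>(t));
    return std::string(buf);
}
```

On the device, a second class implementing `TemperatureSensor` would wrap the actual sensor driver. The failure path (a NAN reading) can then be exercised on a desktop without unplugging anything.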
Monolithic example (acceptable for simple projects):
Within 24 hours:
1. GitHub scanning bots find the API key
2. Credential stuffing attacks try the Wi-Fi password on other services
3. MQTT broker receives thousands of unauthorized connection attempts
Why it’s dangerous:
Git history is permanent: Even if you delete the file, credentials remain in commit history
Forks and clones: Once public, credentials spread uncontrollably
Credential reuse: Many users reuse passwords across services
In 60 Seconds
IoT prototyping best practices—version control for hardware and firmware, modular code architecture, hardware abstraction layers, and automated testing—dramatically reduce the time from concept to validated prototype.
The fix (secrets management strategy):
Option 1: Runtime Configuration (Best)
Store credentials in device EEPROM/NVS, configured via web interface:
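#include <Preferences.h>
#include <WiFi.h>
#include <WebServer.h>

Preferences prefs;
WebServer server(80);  // Assumed: the original listing does not show this declaration

void handleRoot();         // Serves the HTML form (defined elsewhere)
void handleSave();         // Writes submitted credentials to NVS (defined elsewhere)
void startConfigPortal();  // Forward declaration

void loadCredentials() {
  prefs.begin("wifi", true);  // Read-only
  String ssid = prefs.getString("ssid", "");
  String password = prefs.getString("password", "");
  prefs.end();

  if (ssid.length() == 0) {
    // First boot: start captive portal for configuration
    startConfigPortal();
  }
}

void startConfigPortal() {
  // Start Wi-Fi AP mode with web server
  WiFi.softAP("ESP32-Setup");
  server.on("/", handleRoot);      // HTML form for credentials
  server.on("/save", handleSave);  // Save to NVS
  server.begin();
}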
Option 2: Separate Secrets File (Good)
// config.h (committed to Git)
#include "secrets.h"  // Not committed
#define MQTT_SERVER "mqtt.example.com"
#define MQTT_PORT 1883

// secrets.h (in .gitignore)
#define WIFI_SSID "HomeNetwork"
#define WIFI_PASSWORD "MyPassword123"
#define MQTT_USER "admin"
#define MQTT_PASS "secretPassword"

// secrets.h.template (committed to Git as example)
#define WIFI_SSID "your_wifi_ssid"
#define WIFI_PASSWORD "your_wifi_password"
.gitignore:
secrets.h
*.env
.env.*
Option 3: Environment Variables (CI/CD)
For automated builds, use platform environment variables:
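// config.h
#ifndef WIFI_SSID
#define WIFI_SSID "default_ssid"  // Fallback for local dev
#endif

// Build command: `platformio run` has no -D option, so pass the defines through
// the PLATFORMIO_BUILD_FLAGS environment variable (or build_flags in platformio.ini).
// Exact quote escaping depends on your shell:
// PLATFORMIO_BUILD_FLAGS='-DWIFI_SSID=\"ActualSSID\" -DWIFI_PASSWORD=\"ActualPassword\"' platformio run -e esp32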
Damage control if credentials leaked:
Rotate all secrets immediately (new API keys, new passwords)
Revoke old credentials at the service provider
Rewrite Git history (for example with BFG Repo-Cleaner), though this does not remove the credentials from existing forks and clones
Notify affected users if customer data was exposed
Enable 2FA on all accounts with leaked credentials
Prevention checklist:
✅ Add secrets.h, .env, *.key to .gitignore before first commit
✅ Use pre-commit hooks to scan for patterns like password=, api_key=
✅ Store production credentials in device NVS, not firmware
✅ Never commit real credentials, even temporarily
✅ Use secret scanning tools (GitHub Advanced Security, GitGuardian)
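A pre-commit check of this kind boils down to pattern matching on each staged line. The sketch below shows the idea in C++ using only the two patterns named in the checklist. Real secret scanners use far broader rule sets, and the function name is hypothetical.

```cpp
#include <regex>
#include <string>

// Returns true when a line appears to assign a credential,
// for example `password=hunter2` or `API_KEY = "abc123"`.
// Matching is case-insensitive and requires a non-empty value after '='.
bool looksLikeSecret(const std::string& line) {
    static const std::regex pattern(R"((password|api_key)\s*=\s*\S+)",
                                    std::regex::icase);
    return std::regex_search(line, pattern);
}
```

A hook built on this would reject the commit when any added line matches. Note that this narrow pattern would not catch a `#define WIFI_PASSWORD "..."` line, which is one reason to rely on a dedicated scanning tool for production repositories.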
24.12 What’s Next
If you want to…
Read this
Apply best practices to a complete prototype project