828 Wi-Fi Mesh Lab and Self-Healing

828.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Build ESP32 Mesh Networks: Configure painlessMesh or ESP-Wi-Fi-MESH for multi-node communication
  • Understand Self-Healing: Demonstrate automatic rerouting when mesh nodes fail
  • Calculate Hop Impact: Analyze how hop count affects latency and throughput
  • Select Root Nodes: Choose appropriate power sources for mesh gateway nodes
  • Compare Architectures: Decide when to use infrastructure, mesh, or direct modes

828.2 Prerequisites

Before diving into this chapter, you should be familiar with:

Key Takeaway

In one sentence: Wi-Fi mesh networks automatically route around failed nodes (self-healing), but each additional hop increases latency and reduces effective bandwidth.

Remember this rule: Mesh relay nodes must stay awake (use mains/PoE power), while battery-powered devices should be leaf nodes that sleep aggressively.

828.3 Interactive Lab: ESP32 Wi-Fi Mesh Network

Let’s build a Wi-Fi mesh network using multiple ESP32 devices that automatically route messages and self-heal when nodes fail!

Lab Setup

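Hardware (Simulated):

  • 4x ESP32 DevKit v1 (mesh nodes)
  • Each node has a temperature sensor (simulated in software)
  • Root node connects to the Wi-Fi router (needed for the optional MQTT extension; the base painlessMesh sketch does not connect to a router)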

What This Lab Does:

  1. Creates a 4-node mesh network
  2. Each node broadcasts temperature data
  3. Messages auto-route through the mesh
  4. Demonstrates self-healing when a node fails
  5. (Optional extension) Bridges mesh data to an MQTT broker via a gateway/root node

828.4 Mesh Messaging Simulation (Arduino painlessMesh)

This section has two parts:

  1. A small interactive toy model to build intuition about hop count and airtime.
  2. A hardware-oriented ESP32 example using painlessMesh.

828.4.1 Interactive: Hop Count vs Airtime (Toy Model)
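
A minimal C++ sketch of such a toy model, assuming every hop retransmits the whole frame at the same PHY rate and adds a fixed per-hop overhead for channel access, acknowledgement, and forwarding. HopModel and the helper functions are illustrative names, and the model ignores preamble, MAC header, and ACK airtime, so treat its numbers as relative rather than absolute.

```cpp
#include <cassert>
#include <cstdio>

// Toy model (assumption): every hop re-sends the whole frame at the same PHY
// rate and adds a fixed per-hop overhead (channel access, ACK, forwarding).
struct HopModel {
    double payloadBytes;      // bytes carried per message
    double phyRateMbps;       // PHY data rate in Mbit/s
    double perHopOverheadMs;  // fixed per-hop overhead in ms (illustrative)
};

// Airtime of ONE transmission of the payload, in milliseconds.
double frameAirtimeMs(const HopModel& m) {
    assert(m.phyRateMbps > 0.0);
    // bits / (Mbit/s) gives microseconds; divide by 1000 for milliseconds.
    return (m.payloadBytes * 8.0) / m.phyRateMbps / 1000.0;
}

// Total channel time one message consumes end to end: one transmission per hop.
double totalAirtimeMs(const HopModel& m, int hops) {
    return hops * frameAirtimeMs(m);
}

// End-to-end latency: each hop pays the airtime plus the fixed overhead.
double latencyMs(const HopModel& m, int hops) {
    return hops * (frameAirtimeMs(m) + m.perHopOverheadMs);
}

// Print airtime and latency for 1..maxHops hops.
void printTable(const HopModel& m, int maxHops) {
    std::printf("hops  airtime_ms  latency_ms\n");
    for (int h = 1; h <= maxHops; ++h) {
        std::printf("%4d  %10.3f  %10.3f\n", h, totalAirtimeMs(m, h), latencyMs(m, h));
    }
}
```

For example, printTable(HopModel{100.0, 6.0, 1.0}, 5) (100-byte payload, 6 Mbit/s, 1 ms per-hop overhead, all assumed values) shows both columns growing linearly with hop count: every extra hop adds one more frame's worth of airtime to the shared channel.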

828.4.2 Hardware Lab: ESP32 Mesh Messaging (painlessMesh)

This code is intended for real ESP32 hardware (or a simulator that supports multi-node mesh libraries). Use it as a starting point for a physical lab build and adapt credentials/keys for your environment.

Code Explanation:

#include "painlessMesh.h"
#include <Arduino_JSON.h>

// Mesh network credentials
#define MESH_PREFIX     "IoT_Mesh_Network"
#define MESH_PASSWORD   "mesh_password_123"
#define MESH_PORT       5555

// Node identification
String nodeName = "TempSensor_";  // nodeNumber is appended in setup()
int nodeNumber = 0;  // Set uniquely for each node (0-3)

Scheduler userScheduler;
painlessMesh mesh;

// Task to send sensor data periodically
void sendMessage();
Task taskSendMessage(TASK_SECOND * 5, TASK_FOREVER, &sendMessage);

// Simulate temperature reading
float readTemperature() {
  // Each node returns slightly different temp
  return 20.0 + nodeNumber * 2.5 + random(-10, 10) / 10.0;
}

// Send temperature data to mesh
void sendMessage() {
  float temp = readTemperature();

  // Create JSON message
  JSONVar msg;
  msg["node"] = nodeName;  // e.g. "TempSensor_2"; receivers also get the sender's mesh ID as 'from'
  msg["type"] = "temperature";
  msg["value"] = temp;
  msg["unit"] = "C";
  msg["timestamp"] = millis();

  String str = JSON.stringify(msg);

  // Broadcast to all nodes in mesh
  mesh.sendBroadcast(str);

  Serial.printf("Sent: %s = %.1f C (NodeID: %u)\n",
                nodeName.c_str(), temp, mesh.getNodeId());
}

// Callback when message received
void receivedCallback(uint32_t from, String &msg) {
  Serial.printf("Received from %u: %s\n", from, msg.c_str());

  // Parse JSON
  JSONVar myObject = JSON.parse(msg);

  if (JSON.typeof(myObject) == "undefined") {
    Serial.println("JSON parsing failed");
    return;
  }

  // Cast string fields to const char*; JSON.stringify() would wrap them in quotes
  String node = (const char*) myObject["node"];
  String type = (const char*) myObject["type"];
  double value = myObject["value"];

  Serial.printf("   Node: %s, Type: %s, Value: %.1f\n",
                node.c_str(), type.c_str(), value);
}

// Callback when new node joins mesh
void newConnectionCallback(uint32_t nodeId) {
  Serial.printf("New node joined! NodeID: %u\n", nodeId);
  // getNodeList() excludes this node, hence +1; cast size_t to match %u
  Serial.printf("   Total nodes in mesh: %u\n", (unsigned) mesh.getNodeList().size() + 1);
}

// Callback when the mesh topology changes (a node joins or leaves)
void changedConnectionCallback() {
  Serial.printf("Mesh topology changed\n");
  Serial.printf("   Connected nodes: %u\n", (unsigned) mesh.getNodeList().size() + 1);

  // Print node list
  Serial.print("   NodeIDs: ");
  auto nodes = mesh.getNodeList();
  for (auto &&id : nodes) {
    Serial.printf("%u ", id);
  }
  Serial.println();
}

// Callback when mesh time is adjusted
void nodeTimeAdjustedCallback(int32_t offset) {
  Serial.printf("Time adjusted by %ld us\n", (long)offset);
}

void setup() {
  Serial.begin(115200);
  delay(1000);

  Serial.println("\n\n=== ESP32 Wi-Fi Mesh Node Starting ===");
  Serial.printf("Node: %s%d\n", nodeName.c_str(), nodeNumber);

  // Append node number to name
  nodeName += String(nodeNumber);

  // Set mesh debugging
  mesh.setDebugMsgTypes(ERROR | STARTUP | CONNECTION);

  // Initialize mesh network
  mesh.init(MESH_PREFIX, MESH_PASSWORD, &userScheduler, MESH_PORT);

  // Register callbacks
  mesh.onReceive(&receivedCallback);
  mesh.onNewConnection(&newConnectionCallback);
  mesh.onChangedConnections(&changedConnectionCallback);
  mesh.onNodeTimeAdjusted(&nodeTimeAdjustedCallback);

  // Add task to send messages
  userScheduler.addTask(taskSendMessage);
  taskSendMessage.enable();

  Serial.println("Mesh network initialized!");
  Serial.printf("   Node ID: %u\n", mesh.getNodeId());
}

void loop() {
  mesh.update();  // Maintain mesh connections
}

828.5 Interactive Challenges

828.5.1 Challenge 1: Mesh Topology - How Many Hops?

You have a mesh network with this topology:

Figure 828.1: mesh topology diagram (a linear chain: Sensor D to C to B to A to ROOT).

Question: A message from Sensor Node D needs to reach the Root Router. How many “hops” does the message make?

Hint:

A “hop” is one link traversal: each time a message is transmitted from one node to the next.

Count the arrows: D to C to B to A to Root. How many arrows are there?


Answer: 4 hops

Step-by-step message path:

Figure 828.2: message path diagram, D to C to B to A to Root (4 hops).

Why this matters:

Latency and airtime:

  • Each hop adds processing/queueing delay and consumes additional airtime.
  • More hops generally mean higher latency and lower effective throughput.

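Power consumption:

  • Each hop costs transmit and receive energy on the relaying node.
  • Traffic flows toward the root, so in this chain Node A relays B's, C's, and D's messages plus its own, while Node C relays only D's messages plus its own.
  • Relay nodes near the root drain batteries faster than edge nodes.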
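
To see how relay load accumulates toward the root, here is a small C++ sketch that counts upward transmissions per node. It assumes every sensor originates one message per reporting period and each node forwards everything from its descendants; the parent map and the "ROOT" label are hypothetical inputs, not painlessMesh data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical tree topology: node -> parent (next hop toward "ROOT").
using ParentMap = std::map<std::string, std::string>;

// Upward transmissions per reporting period for each non-root node, assuming
// every sensor originates one message per period and forwards all messages
// from its descendants toward the root.
std::map<std::string, int> uplinkLoad(const ParentMap& parent) {
    std::map<std::string, int> load;
    for (const auto& entry : parent) {
        // Walk from the originating node to the root; every node on the way
        // (including the originator) transmits this message once.
        std::string cur = entry.first;
        std::size_t steps = 0;
        while (cur != "ROOT") {
            ++load[cur];
            auto it = parent.find(cur);
            // Stop on malformed input: a missing parent or a loop in the map.
            if (it == parent.end() || ++steps > parent.size()) break;
            cur = it->second;
        }
    }
    return load;
}
```

For the chain in this challenge (A to ROOT, B to A, C to B, D to C), uplinkLoad returns 4 for A, 3 for B, 2 for C, and 1 for D: the node next to the root transmits four times as often as the edge sensor.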

Bandwidth impact:

  • Each hop re-uses the same wireless channel.
  • 4 hops = the message is transmitted 4 times over the air.
  • In shared-channel designs, multi-hop forwarding reduces the best-case capacity available to each flow: if every hop contends for the same channel (no spatial reuse), the end-to-end rate is at most about 1/hops of the single-hop rate.
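
The bound above is simple arithmetic; this C++ sketch states it explicitly, assuming all hops share one channel and only one node transmits at a time. The 20 Mbit/s figure in the usage note is illustrative, not a measured ESP32 rate.

```cpp
#include <cassert>

// Simple capacity bound (assumption): all hops share one channel and only one
// node transmits at a time (no spatial reuse). Each message is sent once per
// hop, so the end-to-end rate cannot exceed singleHopRate / hops.
double maxEndToEndRate(double singleHopRate, int hops) {
    assert(hops >= 1);
    return singleHopRate / hops;
}
```

maxEndToEndRate(20.0, 4) gives 5.0: a 4-hop path delivers at most a quarter of the single-hop rate, before any contention from other flows is counted.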

Optimal mesh design:

  • Minimize hops: Keep hop counts low for critical traffic
  • Place nodes strategically: Avoid long linear chains
  • Use multiple paths: Redundancy improves reliability

Better topology for same 5 nodes:

Figure 828.3: improved topology diagram for the same 5 nodes, rooted at ROOT.

Now any sensor reaches the root in fewer hops than a long linear chain.

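Pro tip: You don't configure routes manually. ESP-Wi-Fi-MESH nodes choose their parent automatically (generally preferring parents closer to the root with adequate signal strength), and painlessMesh also builds and repairs its routing automatically.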
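
Hop counts like these can be computed with a breadth-first search (BFS) from the root. In this C++ sketch the "better" topology is a hypothetical two-branch tree (ROOT to A and B, then A to C and B to D), which may differ from Figure 828.3; addLink and hopsFromRoot are illustrative helpers, not part of any mesh library.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Undirected "can hear each other" graph between mesh nodes.
using Graph = std::map<std::string, std::vector<std::string>>;

void addLink(Graph& g, const std::string& a, const std::string& b) {
    g[a].push_back(b);
    g[b].push_back(a);
}

// BFS from the root: minimum hop count to every reachable node.
std::map<std::string, int> hopsFromRoot(const Graph& g, const std::string& root) {
    assert(g.count(root) == 1);
    std::map<std::string, int> hops{{root, 0}};
    std::queue<std::string> frontier;
    frontier.push(root);
    while (!frontier.empty()) {
        std::string cur = frontier.front();
        frontier.pop();
        for (const auto& next : g.at(cur)) {
            if (hops.count(next) == 0) {  // first visit = shortest hop count
                hops[next] = hops[cur] + 1;
                frontier.push(next);
            }
        }
    }
    return hops;
}
```

With the linear chain, hopsFromRoot reports 4 hops for D; with the two-branch tree, no node is more than 2 hops from the root.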


828.5.2 Challenge 2: Self-Healing - What Happens When Node B Fails?

Original topology:

Figure 828.4: original topology diagram, rooted at ROOT.

Node B suddenly fails (battery dies, power loss, crash).

Question: What happens to messages from Sensor D? Can they still reach the Root?

Scenario options:

  • A) Messages lost forever (no route to Root)
  • B) Mesh automatically reroutes through an alternate path
  • C) Sensor D connects directly to Root (too far away)
  • D) All nodes restart and rebuild the mesh

Hint:

Remember: Mesh networks are “self-healing” - they automatically find alternate paths when nodes fail.

In a proper mesh, nodes should have multiple neighbor connections, not just a single linear path.

Realistic topology (with redundant paths):

Figure 828.5: topology diagram with redundant paths, rooted at ROOT.

Answer: B) Mesh automatically reroutes through alternate path

What happens step-by-step:

Before failure:

Figure 828.6: topology before the failure.

Node B fails:

Figure 828.7: topology after Node B fails.

Self-healing process (conceptual):

  1. Detection: Neighbors stop receiving expected frames/acks and mark the link down.
  2. Route selection: Nodes search for a new parent/next hop based on link quality and path cost.
  3. Rerouting: Traffic resumes along the new path once routing reconverges.

New topology:

Figure 828.8: new topology after rerouting.

Key points:

  • Messages can resume if an alternate path exists
  • No manual intervention (automatic rerouting)
  • No node restart required (only failed node is offline)
  • Some delay/loss is possible during reconvergence

In a well-designed mesh (with redundancy):

If topology had multiple paths from the start:

Figure 828.9: topology with multiple paths from the start.

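Then C already has an alternate parent (A) in range, so it can switch without searching from scratch and recovery is typically faster. How much faster depends on the stack, traffic load, and RF conditions.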

Code example - detecting topology changes:

void changedConnectionCallback() {
  Serial.printf("Mesh topology changed!\n");

  // List current connections
  auto nodes = mesh.getNodeList();  // excludes this node
  Serial.printf("   Connected nodes: %u\n", (unsigned) nodes.size() + 1);

  if (nodes.size() < 2) {
    Serial.println("   WARNING: Low node count, limited redundancy!");
  }

  // Mesh automatically reroutes - no action needed
}

Reality check: Recovery time varies widely by stack, topology, traffic load, and RF conditions. For critical IoT, design redundancy (multiple neighbor links) so a single node failure doesn’t isolate edge devices.
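
One way to check a planned layout for this is to take each node offline in turn and see whether every other node can still reach the root. The C++ sketch below does this with a breadth-first search; the graph helpers are hypothetical, and a link here means "these two nodes can hear each other reliably" according to your site survey.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Undirected "can hear each other reliably" graph from a site survey.
using Graph = std::map<std::string, std::vector<std::string>>;

void addLink(Graph& g, const std::string& a, const std::string& b) {
    g[a].push_back(b);
    g[b].push_back(a);
}

// Nodes still reachable from `root` while `offline` is down (BFS).
std::set<std::string> reachableWithout(const Graph& g, const std::string& root,
                                       const std::string& offline) {
    std::set<std::string> seen{root};
    std::queue<std::string> frontier;
    frontier.push(root);
    while (!frontier.empty()) {
        std::string cur = frontier.front();
        frontier.pop();
        for (const auto& next : g.at(cur)) {
            if (next != offline && seen.insert(next).second) frontier.push(next);
        }
    }
    return seen;
}

// Non-root nodes whose failure would cut at least one OTHER node off from root.
std::vector<std::string> singlePointsOfFailure(const Graph& g, const std::string& root) {
    assert(g.count(root) == 1);
    std::vector<std::string> spof;
    for (const auto& entry : g) {
        if (entry.first == root) continue;
        // Healthy case: everything except the offline node is still reachable.
        if (reachableWithout(g, root, entry.first).size() < g.size() - 1) {
            spof.push_back(entry.first);
        }
    }
    return spof;
}
```

For the linear chain ROOT-A-B-C-D, singlePointsOfFailure returns A, B, and C. Adding links A-C and ROOT-B leaves only C, because D still has just one neighbor.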


828.5.3 Challenge 3: Root Node Selection

In ESP-Wi-Fi-MESH, one node must be the “Root Node” that connects to the Wi-Fi router and internet.

Question: You have 4 nodes:

  • Node A: Battery-powered (small battery)
  • Node B: Powered by USB adapter / mains (always on)
  • Node C: Solar-powered (intermittent unless designed with storage)
  • Node D: Battery-powered (larger pack)

Which node should be the Root Node?

Hint:

The Root Node has special responsibilities:

  • Always stays awake (can't deep sleep)
  • Keeps a station link to the Wi-Fi router while serving child nodes as a SoftAP on the same radio, so it can't use aggressive power saving
  • Routes ALL traffic to/from the internet
  • Single point of failure (if the root dies, the mesh loses its internet connection)

Which power source is most reliable and has highest capacity?


Answer: Node B (USB powered, always on) is the best choice.

Why:

  • The root/gateway must stay awake, maintain the upstream link, and forward other nodes' traffic.
  • Battery-only roots drain quickly because they cannot deep-sleep like leaf sensors.
  • Solar can work, but only if engineered like an always-on gateway (panel sizing + storage + worst-case weather).

How ESP-Wi-Fi-MESH selects root automatically (high level):

// ESP-Wi-Fi-MESH (ESP-IDF) elects the root by voting: idle nodes share the
// router RSSI they observe, and the node with the strongest router signal wins.
// Power source is not part of the vote, so a battery node can win if it sits
// closest to the router. Other frameworks use different rules; consult your
// framework docs for the exact behavior.
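
To illustrate why the automatic choice may not match the power-aware answer above, this C++ sketch compares a simplified RSSI-only election (the node that hears the router best wins, the core idea behind ESP-Wi-Fi-MESH voting, without its thresholds and iterations) with a power-aware selection. The Candidate fields and the RSSI values in the usage note are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-node data for the four candidates in this challenge.
struct Candidate {
    std::string name;
    int routerRssiDbm;   // router signal as seen by this node (higher is better)
    bool alwaysOnPower;  // mains/USB/PoE, or an engineered always-on supply
};

// Simplified RSSI-only election: the node that hears the router best wins.
std::string electByRssi(const std::vector<Candidate>& nodes) {
    assert(!nodes.empty());
    const Candidate* best = &nodes.front();
    for (const auto& n : nodes)
        if (n.routerRssiDbm > best->routerRssiDbm) best = &n;
    return best->name;
}

// Power-aware choice: restrict to always-on nodes, then pick the best RSSI.
// Falls back to RSSI-only if no always-on node exists.
std::string choosePowerAware(const std::vector<Candidate>& nodes) {
    std::vector<Candidate> powered;
    for (const auto& n : nodes)
        if (n.alwaysOnPower) powered.push_back(n);
    return electByRssi(powered.empty() ? nodes : powered);
}
```

If battery Node D happens to hear the router at -55 dBm while Node B hears it at -62 dBm, the RSSI-only election picks D and the power-aware rule picks B. That is the situation where fixing the root manually pays off.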

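You can also fix the root manually. In painlessMesh (the library used in the lab sketch), call these after mesh.init():

// On Node B (mains/USB powered) only: make this node the root
mesh.setRoot(true);
// A root that bridges to the router usually also joins it as a station
// (STATION_SSID / STATION_PASSWORD are placeholders for your router):
// mesh.stationManual(STATION_SSID, STATION_PASSWORD);

// On EVERY node, including Node B: tell nodes the mesh contains a root
mesh.setContainsRoot(true);  // helps root discovery

// ESP-IDF ESP-Wi-Fi-MESH equivalent: esp_mesh_set_type(MESH_ROOT) on the root
// and esp_mesh_fix_root(true) on all nodes.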

Optimal mesh power design:

Figure 828.10: mesh power design diagram. For multi-year battery requirements at scale, consider lower-power radios (Thread/Zigbee/LPWAN) instead of Wi-Fi mesh.

Pro tip: Place root node where you have reliable AC power and good Wi-Fi router signal. All other mesh design decisions flow from root node placement.


828.5.4 Challenge 4: Mesh vs Infrastructure - When to Use Which?

You’re designing IoT deployments for three scenarios. Choose the best Wi-Fi architecture for each:

Scenario 1: Smart Home (100 m², 2-story house)

  • 25 smart devices (lights, sensors, cameras)
  • Good Wi-Fi router centrally located

Scenario 2: Industrial Warehouse (large indoor floor with metal shelving)

  • 80 temperature sensors spread across the site
  • Metal racks/aisles create dead zones and reflections

Scenario 3: Outdoor Smart Farm (large fields)

  • 50 soil moisture sensors spread across a wide area
  • Limited existing power infrastructure

For each scenario, choose:

  • A) Single Wi-Fi router (infrastructure mode)
  • B) Wi-Fi extenders (2-3 extenders + router)
  • C) Wi-Fi mesh network (multiple mesh nodes)

Hint:

Consider for each scenario:

  • Area coverage: Can a single router reach all devices?
  • Obstacles: Do walls, metal, or outdoor distance block Wi-Fi?
  • Backhaul: Do you have wired uplinks (best) or must you use wireless backhaul?
  • Roaming: Do devices move and need seamless handoff?
  • Power + maintenance: Can you power always-on nodes, and can you service them if they fail?


Answers:

Scenario 1: Smart Home - A) Single Wi-Fi router

Why: A centrally placed AP/router often covers a typical home, and a single SSID is easy to manage. If you discover dead zones, add an additional AP or move to mesh.

When to upgrade to mesh:

  • If you consistently have dead zones despite good placement
  • If you need seamless coverage across multiple floors/areas
  • If you can power additional nodes and want centralized management


Scenario 2: Industrial Warehouse - C) Wi-Fi mesh network

Why: A warehouse usually needs multiple RF points because racks/aisles create dead zones. Mesh is one way to extend coverage with a single SSID and some self-healing. In enterprise setups, multiple wired APs (controller-managed) can also be a strong option; among the choices here, mesh captures the “multi-node” requirement.

Pros:

  • Extends coverage across a large site
  • Can reroute around node failures (topology dependent)
  • Same SSID (sensors auto-connect to the nearest node)
  • Scalable (easily add more nodes later)

Cons:

  • More complex than a single AP (placement, backhaul, troubleshooting)
  • Requires powered nodes and ongoing management
  • Wireless hops can reduce capacity; validate with a site survey and real traffic


Scenario 3: Outdoor Smart Farm - C) Wi-Fi mesh network (but with special considerations)

Why: If you must use Wi-Fi over a wide outdoor area, you’ll likely need powered relay points (solar + storage or mains) and careful antenna placement. In practice, many farms choose LPWAN options (LoRaWAN, NB-IoT/LTE-M) because Wi-Fi’s association overhead and always-on relay requirement can be a poor fit for multi-year batteries.


828.5.5 Summary Table

Scenario | Best Wi-Fi choice (given options) | Key considerations
Smart Home | Single router (A) | Start simple; move to multi-node only if you have persistent dead zones
Warehouse | Mesh (C) | Plan RF placement and backhaul; prefer wired uplinks or dedicated backhaul where possible
Farm | Mesh (C), but reconsider Wi-Fi | Power availability and maintenance dominate; LPWAN is often a better match

Decision checklist:

  • If a single AP covers the space with good RSSI/SNR, start with infrastructure mode.
  • If you need multiple RF points and want one SSID with simpler management, use mesh (or controller-managed wired APs).
  • If you can't power always-on relays or need multi-year batteries at scale, consider LPWAN instead of Wi-Fi.
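
The checklist can be written as a small decision function. This C++ sketch is one possible encoding with hypothetical field names; it checks the multi-year battery constraint first because that argues against Wi-Fi regardless of coverage, which is a judgment call rather than a rule from any standard.

```cpp
#include <cassert>
#include <string>

// Hypothetical site-survey inputs for the decision checklist.
struct Site {
    bool singleApCovers;           // one AP gives good RSSI/SNR across the whole space
    bool canPowerAlwaysOnRelays;   // mains, PoE, or engineered solar + storage at relay points
    bool multiYearBatteryAtScale;  // many battery-only sensors that must last years
};

// Returns "lpwan", "infrastructure", or "mesh" following the checklist.
std::string recommendArchitecture(const Site& s) {
    if (s.multiYearBatteryAtScale) return "lpwan";      // Wi-Fi power cost dominates
    if (s.singleApCovers) return "infrastructure";      // simplest option that works
    if (!s.canPowerAlwaysOnRelays) return "lpwan";      // mesh relays cannot sleep
    return "mesh";                                      // or controller-managed wired APs
}
```

Run against the three scenarios, this returns "infrastructure" for the smart home, "mesh" for the warehouse, and "lpwan" for a farm that needs multi-year batteries, matching the summary table's advice to reconsider Wi-Fi there.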


828.6 Lab Takeaways

After completing this lab, you should understand:

  • Wi-Fi mesh networking - How multiple nodes self-organize and route messages
  • Self-healing - Automatic rerouting when nodes fail
  • Multi-hop communication - Messages relay through intermediate nodes
  • Root node selection - Importance of reliable power for root
  • Mesh vs infrastructure - When to use each architecture
  • ESP32 mesh frameworks - Build a simple mesh with painlessMesh (Arduino) and compare with ESP-IDF ESP-Wi-Fi-MESH

Next Steps:

  • Modify the code to add more mesh nodes (test scalability)
  • Simulate node failures (disconnect power to test self-healing)
  • Add MQTT integration (root node publishes to a broker)
  • Measure latency across different hop counts
  • Experiment with different mesh topologies (linear, star, grid)

828.7 Summary

This chapter covered hands-on Wi-Fi mesh implementation:

  • ESP32 Mesh Frameworks: painlessMesh (Arduino) provides easy mesh setup with automatic routing and self-healing
  • Hop Count Impact: Each hop increases latency and reduces effective bandwidth; minimize hops for critical traffic
  • Self-Healing Behavior: Mesh automatically reroutes around failed nodes, but recovery time varies by topology and stack
  • Root Node Requirements: Root/gateway nodes must be mains-powered; battery-powered devices should be leaf nodes
  • Architecture Selection: Choose mesh for large areas needing coverage; consider LPWAN for power-constrained outdoor deployments

828.8 What’s Next

The next chapter explores Wi-Fi MAC and Applications, covering CSMA/CA channel access, QoS traffic differentiation, and real-world IoT application examples including smart home, industrial, agriculture, and healthcare deployments.