439  Data Aggregation in WSN Routing

439.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Apply Data Aggregation: Implement in-network data processing to reduce communication overhead
  • Select Aggregation Functions: Choose appropriate functions (MIN, MAX, AVG, SUM) for different applications
  • Evaluate Aggregation Metrics: Measure accuracy, completeness, latency, and message overhead
  • Design Aggregation Trees: Structure networks for efficient data combination

439.2 Prerequisites

Before diving into this chapter, you should be familiar with:

439.3 Introduction

Figure 439.1: Data aggregation in sensor networks reducing traffic through in-network processing

Figure 439.2: Example of data aggregation showing multiple sensor readings combined at intermediate nodes

Data aggregation combines data from multiple sensors to reduce the number of transmissions, saving energy and bandwidth. This is one of the most effective energy-saving techniques in WSN routing.


439.4 Aggregation Benefits

439.4.1 Energy Savings

  • Fewer transmissions = less energy consumed
  • Particularly effective in dense networks
  • Can extend network lifetime by 70-90%

439.4.2 Bandwidth Efficiency

  • Reduces network congestion
  • Lowers collision probability
  • Can lower delivery latency in congested networks (fewer queued packets and retransmissions)

439.4.3 Data Quality

  • Averaging reduces noise
  • Outlier detection and filtering
  • Temporal and spatial correlation exploitation

%% fig-alt: "Data aggregation architecture showing 100 sensors organized into 5 aggregation groups, reducing transmission from 100 to 5 messages at sink"
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ECF0F1', 'fontSize': '16px'}}}%%
graph TD
    subgraph "Data Aggregation in WSN"
        SENSORS["100 Sensor<br/>Nodes"]

        AGG1["Aggregator 1:<br/>20 readings<br/>→ 1 summary"]
        AGG2["Aggregator 2:<br/>20 readings<br/>→ 1 summary"]
        AGG3["Aggregator 3:<br/>20 readings<br/>→ 1 summary"]
        AGG4["Aggregator 4:<br/>20 readings<br/>→ 1 summary"]
        AGG5["Aggregator 5:<br/>20 readings<br/>→ 1 summary"]

        SINK["Sink:<br/>5 aggregates<br/>vs 100 raw"]
    end

    SENSORS -->|"Transmit raw data"| AGG1
    SENSORS -->|"Transmit raw data"| AGG2
    SENSORS -->|"Transmit raw data"| AGG3
    SENSORS -->|"Transmit raw data"| AGG4
    SENSORS -->|"Transmit raw data"| AGG5

    AGG1 -->|"Min/Max/Avg"| SINK
    AGG2 -->|"Min/Max/Avg"| SINK
    AGG3 -->|"Min/Max/Avg"| SINK
    AGG4 -->|"Min/Max/Avg"| SINK
    AGG5 -->|"Min/Max/Avg"| SINK

    SINK -.-> BENEFIT["Energy Savings:<br/>95% fewer<br/>messages to sink"]
    SINK -.-> TRADE["Trade-off:<br/>Information loss<br/>vs efficiency"]

    style SENSORS fill:#2C3E50,stroke:#16A085,stroke-width:3px,color:#fff
    style AGG1 fill:#16A085,stroke:#2C3E50,stroke-width:2px,color:#fff
    style AGG2 fill:#16A085,stroke:#2C3E50,stroke-width:2px,color:#fff
    style AGG3 fill:#16A085,stroke:#2C3E50,stroke-width:2px,color:#fff
    style AGG4 fill:#16A085,stroke:#2C3E50,stroke-width:2px,color:#fff
    style AGG5 fill:#16A085,stroke:#2C3E50,stroke-width:2px,color:#fff
    style SINK fill:#E67E22,stroke:#2C3E50,stroke-width:3px,color:#fff
    style BENEFIT fill:#D5F4E6,stroke:#16A085,stroke-width:2px
    style TRADE fill:#FADBD8,stroke:#E67E22,stroke-width:2px

Figure 439.3: Data aggregation architecture showing 100 sensors organized into 5 aggregation groups, reducing transmission from 100 to 5 messages at sink

439.5 Aggregation Functions

439.5.1 Simple Aggregation

| Function | Description | Use Case |
|---|---|---|
| MIN | Minimum value | Lowest temperature, closest distance |
| MAX | Maximum value | Peak temperature, farthest reading |
| SUM | Total of all values | Total rainfall, cumulative count |
| COUNT | Number of readings | Active nodes, detection events |
| AVERAGE | Mean value | Average temperature across region |
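The simple functions above can be combined hop by hop if each aggregator forwards a small partial record instead of a finished answer. The sketch below is one possible way to do this, not a prescribed format: the `Partial` class and `aggregate` helper are hypothetical names introduced for illustration. It carries AVERAGE as a (sum, count) pair, because averaging two averages gives the wrong result when groups differ in size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    """Partial state record forwarded up the tree (hypothetical name)."""
    minimum: float
    maximum: float
    total: float
    count: int

    @classmethod
    def from_reading(cls, value: float) -> "Partial":
        return cls(value, value, value, 1)

    def merge(self, other: "Partial") -> "Partial":
        # Each field is combined independently, so merge order does not matter.
        return Partial(
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            self.total + other.total,
            self.count + other.count,
        )

    @property
    def average(self) -> float:
        return self.total / self.count


def aggregate(readings):
    """Fold a non-empty list of readings into one partial record."""
    state = Partial.from_reading(readings[0])
    for value in readings[1:]:
        state = state.merge(Partial.from_reading(value))
    return state


# Two aggregators summarise their own members, then a parent merges them.
left = aggregate([20.0, 22.0, 21.0])
right = aggregate([30.0, 26.0])
combined = left.merge(right)

# Averaging the two group averages ignores the different group sizes.
naive_average = (left.average + right.average) / 2

print(combined.minimum, combined.maximum, combined.count, combined.average)
print(naive_average)
```

The merged record reports the true mean of all five readings, while `naive_average` is biased toward the smaller group.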

439.5.2 Statistical Aggregation

| Function | Description | Use Case |
|---|---|---|
| MEDIAN | Middle value | Robust central tendency |
| VARIANCE | Spread of values | Variability assessment |
| STANDARD DEVIATION | Measure of dispersion | Quality control bounds |
| HISTOGRAM | Distribution of values | Pattern analysis |
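VARIANCE and STANDARD DEVIATION can also be merged in the network if each aggregator forwards a (count, mean, sum of squared deviations) triple. The following is a minimal sketch of the standard pairwise merge formula; the function names are illustrative assumptions. MEDIAN has no comparably small exact summary, so it usually needs the raw values or an approximation.

```python
import statistics


def summarize(values):
    """Return (count, mean, M2), where M2 is the sum of squared deviations."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return n, mean, m2


def merge(a, b):
    """Combine two (count, mean, M2) summaries without the raw values."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def population_variance(state):
    n, _, m2 = state
    return m2 / n


cluster_a = [20.0, 22.0, 21.0]
cluster_b = [30.0, 26.0]
merged = merge(summarize(cluster_a), summarize(cluster_b))

print(population_variance(merged))
print(statistics.pvariance(cluster_a + cluster_b))  # reference computed from raw data
```

Both printed values should agree up to floating-point rounding, which shows that the parent node never needs the individual readings.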

439.5.3 Duplicate-Sensitive Functions

| Function | Description | Use Case |
|---|---|---|
| DISTINCT COUNT | Count unique values | Unique detection events |
| SET UNION | Combine unique elements | Aggregate unique IDs |
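When the same event can reach the sink through more than one aggregator, adding up per-aggregator counts overstates the result. A small sketch, assuming hypothetical event IDs and two aggregators with overlapping coverage, shows how a set union removes the duplicate before counting:

```python
def merge_ids(*partials):
    """Set union: an ID reported by several aggregators is kept once."""
    merged = set()
    for partial in partials:
        merged |= partial
    return merged


# Event "e2" is heard by both aggregators (overlapping radio range).
agg_a = {"e1", "e2"}
agg_b = {"e2", "e3"}

naive_count = len(agg_a) + len(agg_b)           # counts "e2" twice
distinct_count = len(merge_ids(agg_a, agg_b))   # counts each event once

print(naive_count, distinct_count)
```

The cost is that the partial state grows with the number of unique items, unlike the fixed-size records used for MIN, MAX, SUM, and COUNT.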

439.5.4 Complex Queries

| Function | Description | Use Case |
|---|---|---|
| TOP-K | K largest/smallest values | Hot spots, anomalies |
| THRESHOLD | Values exceeding threshold | Alert generation |
| EVENT DETECTION | Specific patterns | Intrusion, fire detection |
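TOP-K can be computed in the network because the global top K values are always contained in the union of each cluster's local top K. The sketch below uses Python's `heapq.nlargest`; the helper names and the sample readings are assumptions for illustration.

```python
import heapq


def local_top_k(readings, k):
    """Each aggregator keeps only its k largest readings."""
    return heapq.nlargest(k, readings)


def merge_top_k(partials, k):
    """A parent merges the local lists and again keeps the k largest."""
    combined = [value for partial in partials for value in partial]
    return heapq.nlargest(k, combined)


cluster_1 = [21.0, 35.0, 22.0, 40.0]
cluster_2 = [19.0, 38.0, 20.0]
k = 2

partials = [local_top_k(cluster_1, k), local_top_k(cluster_2, k)]
in_network = merge_top_k(partials, k)
centralized = heapq.nlargest(k, cluster_1 + cluster_2)

print(in_network, centralized)
```

Each aggregator forwards at most K values, so the message size stays bounded regardless of cluster size.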

439.6 Aggregation Metrics

439.6.1 Accuracy

Relative difference between the aggregated result and the true result:

Error = |Aggregated_Value - True_Value| / True_Value

Factors affecting accuracy:

  • Packet loss (missing readings)
  • Sensor calibration errors
  • Temporal misalignment

439.6.2 Completeness

Percentage of readings included in the aggregate:

Completeness = Readings_Included / Total_Readings × 100%

Higher completeness gives a more representative aggregate, but it trades off against latency because the aggregator must wait longer for stragglers.
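Accuracy and completeness are linked: a lost packet lowers completeness and can also shift the aggregate. The following sketch applies both formulas to a made-up set of five readings in which one packet is lost. It uses the absolute value of the true result in the denominator, which is an assumption added here so that the error stays non-negative for negative readings.

```python
def relative_error(aggregated, true_value):
    """|aggregated - true| / |true|; undefined when the true value is zero."""
    return abs(aggregated - true_value) / abs(true_value)


def completeness(included, total):
    """Fraction of expected readings that reached the aggregate."""
    return included / total


# Hypothetical readings; the last (and hottest) one is lost in transit.
true_readings = [20.0, 21.0, 22.0, 23.0, 34.0]
received = true_readings[:4]

true_average = sum(true_readings) / len(true_readings)
aggregated_average = sum(received) / len(received)

error = relative_error(aggregated_average, true_average)
fraction_included = completeness(len(received), len(true_readings))

print(f"completeness = {fraction_included:.0%}, relative error = {error:.1%}")
```

Here a single missing reading removes 20% of the data but shifts the average by roughly 10%, because the lost value happened to be the outlier.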

439.6.3 Latency

Time from sensing to sink reception:

  • Aggregation introduces delay (waiting for multiple readings)
  • Must balance freshness vs. completeness
  • Application-dependent requirements

439.6.4 Message Overhead

Number of messages transmitted:

  • Primary benefit of aggregation
  • Measure: messages with vs. without aggregation
  • Typical target: 60-95% fewer messages
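Message overhead can be estimated by counting per-hop transmissions in the routing tree. The sketch below is a simplified model under stated assumptions: a hypothetical seven-node tree given as a child-to-parent map, one reading per node per round, one fixed-size summary per node when aggregating, and no retransmissions.

```python
def depth(node, parent):
    """Number of hops from a node to the sink (the root has no parent entry)."""
    hops = 0
    while node in parent:
        node = parent[node]
        hops += 1
    return hops


# Hypothetical tree: node 0 is the sink; keys are children, values are parents.
parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}

# Without aggregation every reading is relayed over each hop on its path.
messages_without = sum(depth(node, parent) for node in parent)

# With aggregation every node sends one combined message to its parent.
messages_with = len(parent)

reduction = 1 - messages_with / messages_without
print(messages_without, messages_with, f"{reduction:.0%}")
```

The reduction grows with tree depth, since deep nodes otherwise cause one transmission per hop for every reading.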

439.7 Application-Appropriate Aggregation

Pitfall: Over-Aggregating Critical Data

The Mistake: Applying aggressive data aggregation (averaging, compression) uniformly across all sensor data, then missing critical events because anomalies were smoothed away by spatial averaging.

Why It Happens: Aggregation dramatically reduces energy consumption (95% savings possible), so teams maximize it. However, if 1 out of 20 sensors detects a fire while 19 report normal, the cluster average temperature may remain below the alarm threshold.

The Fix: Use aggregation functions appropriate to your application:

  • For event detection: Use MAX or ANY-EXCEED-THRESHOLD instead of AVERAGE
  • For anomaly detection: Forward individual readings that exceed thresholds
  • Dual-path routing: Send aggregated summaries for efficiency, plus raw anomaly values on a separate path
  • Threshold-aware aggregation: Cluster heads forward individual readings that exceed critical thresholds, even when aggregating others

Example Decision Matrix:

| Application | Recommended Function | Rationale |
|---|---|---|
| Fire detection | MAX or THRESHOLD | Single hot spot must trigger alarm |
| Temperature monitoring | AVERAGE | Spatial average is meaningful |
| Intruder detection | ANY (boolean OR) | One detection = alert |
| Rainfall measurement | SUM | Total across region needed |
| Air quality | AVERAGE with outliers | Average, but flag anomalies |
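The fire-detection pitfall is easy to reproduce numerically. In this sketch, 19 sensors report 25 °C and one reports 90 °C; the alarm threshold of 60 °C, the node IDs, and the `summarize_cluster` function are all hypothetical values chosen for illustration. The summary keeps the average for efficiency but also forwards any individual reading above the threshold.

```python
ALARM_THRESHOLD_C = 60.0  # hypothetical alarm threshold


def summarize_cluster(readings, threshold=ALARM_THRESHOLD_C):
    """Threshold-aware summary of a cluster.

    readings maps node ID to temperature in degrees Celsius.
    """
    values = list(readings.values())
    exceedances = {node: v for node, v in readings.items() if v > threshold}
    return {
        "average": sum(values) / len(values),
        "maximum": max(values),
        "alarm": bool(exceedances),          # ANY-EXCEED-THRESHOLD
        "raw_exceedances": exceedances,      # forwarded unaggregated
    }


readings = {f"n{i}": 25.0 for i in range(19)}
readings["n19"] = 90.0  # the one sensor next to the fire

summary = summarize_cluster(readings)
print(summary["average"], summary["maximum"], summary["alarm"])
```

The average stays well below the threshold, so an AVERAGE-only cluster head would stay silent, while the MAX and threshold fields still raise the alarm.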

439.8 LEACH Clustering Demo

Explore how LEACH (Low-Energy Adaptive Clustering Hierarchy) balances energy consumption through randomized cluster head rotation. Watch energy depletion over multiple rounds and compare with direct transmission.

How LEACH Works:

  1. Cluster Head Election: Each round, every node independently decides whether to become a cluster head (CH), using a probability derived from p. Nodes that have recently served as CH are excluded until the other nodes have had their turn, ensuring rotation.

  2. Cluster Formation: Non-CH nodes join the nearest cluster head to minimize transmission distance.

  3. Data Aggregation: Members send data to their CH (short-distance, low energy). CHs aggregate data and transmit once to the sink (long-distance, high energy).

  4. Energy Balancing: By rotating the CH role, no single node bears the aggregation burden throughout network lifetime.
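The election step can be sketched with the threshold usually given for LEACH, T(n) = p / (1 - p × (r mod 1/p)) for nodes that have not yet served in the current cycle of 1/p rounds, and 0 otherwise. The function names and the bookkeeping dictionary below are illustrative assumptions, and the sketch ignores cluster formation and energy.

```python
import random


def leach_threshold(p, round_number, eligible):
    """Election threshold T(n) for one node in the given round."""
    if not eligible:
        return 0.0
    cycle = round(1 / p)
    return p / (1 - p * (round_number % cycle))


def elect_cluster_heads(node_ids, p, round_number, last_ch_round, rng):
    """Each node draws a random number and becomes CH if it falls below T(n)."""
    cycle = round(1 / p)
    heads = []
    for node in node_ids:
        last = last_ch_round.get(node)
        # Eligible if the node has not served in the current cycle.
        eligible = last is None or last // cycle < round_number // cycle
        if rng.random() < leach_threshold(p, round_number, eligible):
            heads.append(node)
            last_ch_round[node] = round_number
    return heads


rng = random.Random(42)
last_ch_round = {}
times_served = {node: 0 for node in range(100)}
heads_per_round = []

for r in range(10):  # one full cycle for p = 0.1
    heads = elect_cluster_heads(range(100), 0.1, r, last_ch_round, rng)
    heads_per_round.append(len(heads))
    for node in heads:
        times_served[node] += 1

print(heads_per_round)
```

The threshold rises as the cycle progresses and reaches 1 in the last round, so every node serves exactly once per cycle even though each individual round is random.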

Key Observations:

  • Color gradient shows node energy levels (green=healthy, red=depleted)
  • CH markers show current cluster heads and cluster regions
  • Energy bars compare LEACH vs. direct transmission
  • Run multiple rounds to see how CH role distributes and energy depletes evenly

Energy Comparison:

| Routing Method | Energy Profile |
|---|---|
| Direct Transmission | Nodes near sink deplete fast (hotspot) |
| Fixed Cluster Heads | CHs die quickly, others survive |
| LEACH (rotating) | Even depletion across all nodes |

Expected Energy Savings:

For a 100-node network with 10 clusters:

  • Direct: 100 long-distance transmissions
  • LEACH: 90 short (member to CH) + 10 long (CH to sink) transmissions
  • Savings: 40-60% depending on topology

Pitfall: Fixed Cluster Head Selection

The Mistake: Designating cluster heads once, based on initial battery levels or geographic position, then watching the network partition when all cluster heads die at nearly the same time while regular nodes still have 70% battery.

Why It Happens: Fixed cluster heads seem logical - “put the strongest nodes in charge.” But cluster heads consume far more energy than regular nodes because they receive from members, aggregate, and transmit long-range to the sink. The ratio is often quoted as 3-5x and reaches about 26x in the worked example in Section 439.9, where the long link to the base station dominates. Static assignment concentrates this drain on a few nodes.

The Fix: Implement cluster head rotation as in LEACH:

  • Each round, cluster heads are selected probabilistically
  • Every node serves approximately equal time as CH over network lifetime
  • Consider energy-weighted probability: nodes with more remaining energy have slightly higher CH probability
  • Monitor CH energy levels and trigger early rotation if a CH drops below 20% while others have 50%+

439.9 Worked Example: LEACH Energy Analysis

Worked Example: LEACH Cluster Head Selection and Energy Analysis

Scenario: A precision agriculture deployment monitors soil moisture across a 500m × 500m vineyard. The WSN uses LEACH protocol with 100 sensor nodes reporting to a base station located at coordinate (250m, 0m) at the vineyard edge.

Given:

| Parameter | Value |
|---|---|
| Total nodes | 100 |
| Initial energy per node | 2 J |
| Cluster head probability (p) | 0.1 (10%) |
| Energy for transmission (E_tx) | 50 nJ/bit |
| Energy for receiving (E_rx) | 50 nJ/bit |
| Energy for data aggregation | 5 nJ/bit/signal |
| Amplifier energy (E_amp) | 100 pJ/bit/m² |
| Data packet size | 4000 bits |
| Average distance to base station | 200 m |
| Average intra-cluster distance | 35 m |

Steps:

  1. Calculate expected cluster heads per round:
    • Expected CHs = n × p = 100 × 0.1 = 10 cluster heads
    • Average cluster size = 100 / 10 = 10 nodes per cluster (9 members + 1 CH)
  2. Calculate energy consumption for a regular node (cluster member):
    • Transmit to CH (short range): E_member = E_tx × k + E_amp × k × d²
    • E_member = (50 nJ/bit × 4000 bits) + (100 pJ/bit/m² × 4000 bits × 35² m²)
    • E_member = 200 µJ + 490 µJ = 690 µJ per round
  3. Calculate energy consumption for a cluster head:
    • Receive from 9 members: E_rx_total = 9 × E_rx × k = 9 × 50 nJ/bit × 4000 bits = 1,800 µJ
    • Aggregate 10 signals: E_agg = 10 × 5 nJ/bit/signal × 4000 bits = 200 µJ
    • Transmit to base station (long range): E_tx_bs = E_tx × k + E_amp × k × d²
    • E_tx_bs = (50 nJ/bit × 4000 bits) + (100 pJ/bit/m² × 4000 bits × 200² m²)
    • E_tx_bs = 200 µJ + 16,000 µJ = 16,200 µJ
    • Total CH energy = 1,800 + 200 + 16,200 = 18,200 µJ per round
  4. Calculate network energy per round:
    • 90 members × 690 µJ = 62,100 µJ
    • 10 CHs × 18,200 µJ = 182,000 µJ
    • Total network energy = 244,100 µJ = 244.1 mJ per round
  5. Estimate network lifetime:
    • Total initial energy = 100 nodes × 2 J = 200 J
    • Upper bound on rounds, assuming perfectly even depletion: 200 J / 244.1 mJ ≈ 820 rounds
    • CH rotation moves the network toward this bound by spreading the CH cost across all nodes

Result: With LEACH and p=0.1, the network consumes 244.1 mJ per data collection round. A cluster head consumes about 26× as much energy as a member (18,200 µJ vs 690 µJ), which is why rotation is critical. Without rotation, the 10 static CHs would die after ~110 rounds (2 J / 18.2 mJ), while the other nodes would have used only 110 × 690 µJ ≈ 76 mJ and would retain about 96% of their energy.

Key Insight: The LEACH probability p=0.1 creates 10 clusters optimized for this 100-node network. Increasing p to 0.2 would create 20 smaller clusters with shorter intra-cluster distances (less member energy) but more long-range CH-to-base transmissions (more total CH energy). For deployments where base station distance dominates, lower p values (larger clusters, fewer CHs) are more energy-efficient.
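The arithmetic in this worked example can be checked with a short script. It is a sketch that uses the parameter values from the table and the same energy model as the steps (electronics energy per bit plus an amplifier term proportional to d²); the constant and function names are chosen here for illustration.

```python
E_TX = 50e-9       # J/bit, transmit electronics
E_RX = 50e-9       # J/bit, receive electronics
E_AGG = 5e-9       # J/bit/signal, data aggregation
E_AMP = 100e-12    # J/bit/m^2, amplifier
K_BITS = 4000      # packet size in bits
N_NODES = 100
P_CH = 0.1         # cluster head probability
D_BS = 200.0       # m, average distance to base station
D_CLUSTER = 35.0   # m, average intra-cluster distance
E_INIT = 2.0       # J, initial energy per node


def tx_energy(bits, distance):
    """Energy to transmit `bits` over `distance` metres."""
    return E_TX * bits + E_AMP * bits * distance ** 2


num_ch = round(N_NODES * P_CH)
members_per_cluster = N_NODES // num_ch - 1

e_member = tx_energy(K_BITS, D_CLUSTER)
e_ch = (members_per_cluster * E_RX * K_BITS            # receive from members
        + (members_per_cluster + 1) * E_AGG * K_BITS   # aggregate all signals
        + tx_energy(K_BITS, D_BS))                     # send summary to base station

e_round = (N_NODES - num_ch) * e_member + num_ch * e_ch
rounds_even = N_NODES * E_INIT / e_round        # bound with perfectly even depletion
rounds_static_ch = E_INIT / e_ch                # lifetime of a never-rotated CH
member_remaining = 1 - rounds_static_ch * e_member / E_INIT

print(f"member: {e_member * 1e6:.0f} uJ, CH: {e_ch * 1e6:.0f} uJ")
print(f"network: {e_round * 1e3:.1f} mJ per round")
print(f"even-depletion bound: {rounds_even:.0f} rounds")
print(f"static CH dies after {rounds_static_ch:.0f} rounds; "
      f"members retain {member_remaining:.0%}")
```

Changing `P_CH`, `D_BS`, or `D_CLUSTER` gives a quick feel for the trade-off described in the Key Insight, although the intra-cluster distance would in practice change with the number of clusters.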


439.10 Knowledge Check

Question: In hierarchical routing protocols like LEACH, why do cluster heads perform data aggregation instead of simply forwarding all member data to the sink?

Explanation: Aggregation saves energy by cutting the number of long-range transmissions.

  • Without aggregation: The cluster head (CH) receives data from 20 member nodes and forwards 20 separate packets to the sink. CH transmissions: 20 packets.
  • With aggregation: The CH receives 20 readings, computes an aggregate (min/max/avg/count), and forwards 1 summary packet to the sink. CH transmissions: 1 packet.

Energy impact: Radio transmission is expensive; assume about 2 mJ per transmitted packet. Without aggregation: 20 packets × 2 mJ = 40 mJ. With aggregation: 1 packet × 2 mJ + aggregation computation (~0.1 mJ) = 2.1 mJ, a saving of about 95%.

Trade-offs:

  • Information loss: Individual readings cannot be reconstructed from the aggregate.
  • Latency: The CH must wait for its members before aggregating.

Question: What is the “completeness” metric in data aggregation, and why does it matter?

Explanation: Completeness = (Readings included in aggregate) / (Total readings expected) × 100%. It measures what fraction of sensor data successfully reached the aggregator and contributed to the result.

Example: In temperature monitoring with 100 sensors, the cluster head expects 100 readings. Due to packet losses, only 87 readings are received before the aggregation deadline. Completeness = 87/100 = 87%.

Why it matters:

  • Representativeness: An aggregate computed from 87 readings may not accurately represent the full area.
  • Decision quality: A fire detection system with 80% completeness might miss critical hot spots.
  • Trade-off with latency: Waiting longer for stragglers increases completeness but adds latency.

Question: In LEACH (Low-Energy Adaptive Clustering Hierarchy), how does the protocol distribute energy consumption evenly across all nodes?

Explanation: LEACH uses randomized rotation of cluster head roles:

  • Rounds: The network operates in rounds (e.g., 20 seconds each).
  • Probabilistic selection: Each node decides independently whether to become cluster head, based on probability P and the number of rounds since it last served as CH.
  • Even distribution: Over many rounds, every node serves as CH in approximately a fraction P of the rounds.
  • Energy balancing: Because the CH role consumes several times more energy than the member role (receiving from members, aggregation, longer transmission), rotation prevents any single node from dying early.

Math: With 100 nodes and P=5%, expect ~5 CHs per round. Each node becomes CH about once every 20 rounds (1/P).

Example impact: In an agricultural WSN with 200 nodes, LEACH extends lifetime from 3 months (fixed CHs) to 18 months (rotation).


439.11 What’s Next?

Now that you understand data aggregation techniques, the next chapter explores Link Quality Based Routing, which uses metrics like ETX and RSSI to select reliable paths.
