46 Edge Data Architecture

In 60 Seconds

Edge data acquisition is the process of collecting sensor data at the network periphery and deciding what to process locally versus what to send to the cloud. IoT devices fall into three categories – Big Things (servers), Small IP Things (smart cameras), and Non-IP Things (simple sensors needing gateways) – each requiring different acquisition strategies based on their connectivity and processing capabilities.

46.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Classify IoT Data Sources: Distinguish between Big Things, Small IP Things, and Non-IP Things in edge architectures
  • Explain Device Connectivity Paths: Describe how different device types connect to cloud infrastructure through direct IP or gateways
  • Analyze Data Generation Rates: Calculate data volumes across device categories and assess their implications for edge processing
  • Design Data Acquisition Strategies: Select and justify appropriate transmission schedules based on device capabilities and constraints

46.2 Prerequisites

Before diving into this chapter, you should be familiar with:

  • Edge, Fog, and Cloud Overview: Understanding the three-tier IoT architecture provides context for where edge data acquisition fits
  • Sensor Fundamentals: Knowledge of sensor types and characteristics helps understand data acquisition requirements

Think of edge data acquisition like a local newspaper reporter versus a national news network.

A local reporter (edge device) collects news from the neighborhood and decides what's important enough to send to national headquarters (cloud). They don't send everything, just the highlights. This saves time and money and keeps headquarters from being overwhelmed.

The “Edge” is simply where your sensors live:

Device | Location | Why "Edge"?
Your thermostat | Living room wall | At the edge of your network
Factory sensor | On a machine | Far from the central servers
Traffic camera | Roadside pole | Collecting data at the source

Three types of “Things” at the edge:

  1. Big Things - Computers, servers (they can talk to the internet directly)
  2. Small IP Things - Smart bulbs, webcams (they have their own internet connection)
  3. Non-IP Things - Simple sensors that need a “translator” (gateway) to reach the internet

Why process data at the edge instead of sending everything to the cloud?

Challenge | Without Edge | With Edge
Speed | Wait for cloud response | Instant local decisions
Battery | Constant transmission drains battery | Send only summaries, save power
Bandwidth | Network gets clogged | Only important data travels far
Privacy | All your data goes to remote servers | Sensitive data stays local

Real-world example: A security camera might generate 1 GB of video per hour. Instead of sending all of it to the cloud, edge processing detects motion locally and uploads only the 5-second clips that contain it.

46.3 Introduction to Edge Data Acquisition

Time: ~5 min | Difficulty: Foundational | Reference: P10.C08.U01

Key Concepts

  • Edge acquisition architecture: The hardware and software design of a system that captures, validates, and pre-processes sensor data at or near the source before transmission to higher processing tiers.
  • Sensor interface bus: The low-level communication link connecting sensors to a microcontroller or gateway. Common IoT interfaces include the I2C, SPI, and UART serial buses, plus analog sensors read through an ADC (analog-to-digital converter).
  • Data aggregation gateway: A device that collects raw readings from multiple nearby sensors, applies local processing (averaging, event detection), and forwards summarized data to the cloud.
  • Ring buffer: A circular fixed-size memory structure used in edge devices to store a rolling window of recent sensor readings without dynamic memory allocation.
  • Interrupt-driven sampling: A microcontroller technique where a hardware timer interrupt triggers sensor reads at precise intervals, ensuring consistent sample timing without busy-wait polling.
  • DMA (Direct Memory Access): A hardware mechanism allowing peripherals (ADC, sensor buses) to transfer data directly to memory without CPU intervention, freeing the processor for other tasks.
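The ring buffer described above can be sketched in a few lines. This is a minimal illustration, assuming a fixed capacity chosen up front; `RingBuffer` is an illustrative name. Python lists cannot truly avoid allocation, so the sketch only mirrors the index arithmetic that a C implementation on a microcontroller would use.

```python
class RingBuffer:
    """Fixed-size circular buffer holding the most recent readings.

    Slots are allocated once in __init__; afterwards a push only
    overwrites a slot and advances an index, as an MCU ring buffer would.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data = [0.0] * capacity  # pre-allocated slots
        self._capacity = capacity
        self._head = 0    # index where the next reading is written
        self._count = 0   # number of valid readings (<= capacity)

    def push(self, value):
        """Store a reading, overwriting the oldest one when full."""
        self._data[self._head] = value
        self._head = (self._head + 1) % self._capacity
        if self._count < self._capacity:
            self._count += 1

    def __len__(self):
        return self._count

    def snapshot(self):
        """Return the stored readings oldest-first."""
        start = (self._head - self._count) % self._capacity
        return [self._data[(start + i) % self._capacity]
                for i in range(self._count)]


buf = RingBuffer(capacity=4)
for reading in [21.0, 21.5, 22.0, 22.5, 23.0, 23.5]:
    buf.push(reading)
print(buf.snapshot())  # [22.0, 22.5, 23.0, 23.5]
```

The two oldest readings were overwritten once the buffer filled, which is exactly the "rolling window" behavior an edge device needs when it only cares about recent history.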

Edge data acquisition is the process of collecting, processing, and transmitting sensor data at the network periphery - where physical devices meet the digital infrastructure. This chapter explores the fundamental architecture and device categories that form the foundation of efficient data collection at the IoT edge.

Key Takeaway

In one sentence: Collect raw data at the edge, but only transmit what’s needed - 90% of IoT data is never analyzed.

Remember this rule: If you can’t name who will use the data and how, don’t collect it.

Why Edge Matters

Traditional cloud-centric architectures require all sensor data to travel to remote servers for processing. Edge data acquisition shifts some processing closer to the source, reducing:

  • Latency: Critical for time-sensitive applications (autonomous vehicles, industrial safety)
  • Bandwidth: Raw sensor streams can overwhelm network capacity
  • Energy: Transmitting data is 10-100x more power-intensive than local processing
  • Privacy: Sensitive data can be processed locally without cloud exposure

46.4 IoT Device Categories

Time: ~10 min | Difficulty: Intermediate | Reference: P10.C08.U02

The key sources of data in IoT are the 'Things': the physical devices and controllers located on Level 1 of the IoT Reference Model. Things can be accessed directly to send and receive data; however, to count as IoT 'Things', they must be connected to the Internet, either directly or through a gateway.

46.4.1 Three Categories of Things

Diagram showing three IoT device categories -- Big Things (servers, PLCs) with direct IP connectivity, Small IP Things (cameras, smart devices) with Wi-Fi or cellular links, and Non-IP Things (simple sensors) requiring gateways for protocol translation to reach cloud infrastructure.
Figure 46.1: IoT Device Categories and Gateway Connectivity Paths

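Big Things might be computers and databases. Small IP-enabled Things could include webcams, lights, and smartphones. Non-IP Things may need a gateway or other device to assist; examples include lights, temperature gauges, locks, and gates. Lights appear in both lists because the same function can be built either way: a Wi-Fi bulb is a Small IP Thing, while a Zigbee bulb needs a gateway.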

46.4.2 Device Characteristics Comparison

Category | Examples | Connectivity | Data Rate | Processing Capability
Big Things | Servers, industrial PLCs | Full IP stack | GB/day | High (full OS)
Small IP Things | Smart cameras, lights | Wi-Fi, cellular | MB/day (GB/day for video) | Medium (embedded)
Non-IP Things | Temperature sensors, door locks | Zigbee, BLE, Modbus | KB/day | Low (microcontroller)
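The table can be read as a two-question decision: does the device have its own IP stack, and how much processing does it have? A minimal sketch, assuming each device is described by two hypothetical attributes (`has_ip_stack`, and a `processing` level taken from the table's last column); `Device` and `classify` are illustrative names, not part of any library.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Device:
    name: str
    has_ip_stack: bool  # can the device speak TCP/IP on its own?
    processing: str     # "high", "medium" or "low", as in the table


def classify(device: Device) -> Tuple[str, str]:
    """Return (category, connectivity path) for one device."""
    if not device.has_ip_stack:
        # Zigbee, BLE, Modbus, ... devices need a gateway to reach the cloud
        return "Non-IP Thing", "sensor -> gateway (protocol translation) -> cloud"
    if device.processing == "high":
        return "Big Thing", "direct IP -> cloud"
    return "Small IP Thing", "Wi-Fi/cellular -> cloud"


fleet = [
    Device("historian server", has_ip_stack=True, processing="high"),
    Device("smart camera", has_ip_stack=True, processing="medium"),
    Device("Zigbee temperature sensor", has_ip_stack=False, processing="low"),
]
for d in fleet:
    category, path = classify(d)
    print(f"{d.name:26} {category:15} {path}")
```

Checking for the IP stack first reflects the fact that connectivity, not processing power, decides whether a gateway is mandatory.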

46.5 Data Generation Patterns

Time: ~8 min | Difficulty: Intermediate | Reference: P10.C08.U02b

Understanding data generation patterns is essential for designing efficient edge acquisition systems. Different device types produce vastly different data volumes and require different handling strategies.

Table showing data generation statistics for common IoT device categories. Big Things like computers generate megabytes to gigabytes per day with continuous connectivity. Small IP Things such as webcams and smart lights generate kilobytes to megabytes with periodic updates. Non-IP Things including temperature sensors and door locks generate bytes to kilobytes with event-triggered transmission. Columns include device type, typical data rate, transmission frequency, and connectivity requirements.
Figure 46.2: Data generation statistics for IoT devices
Comparison table of IoT data generation rates showing volume per device, aggregated network load, and storage requirements. Rows compare sensor types: environmental sensors at 1-10 samples per minute requiring minimal bandwidth, industrial vibration sensors at 1-5 kHz requiring significant bandwidth, and video cameras at 1-30 fps consuming megabits per second. The table highlights the gap of several orders of magnitude in data rates between simple sensors and rich media sources, emphasizing why edge processing and data reduction are essential for scalable IoT deployments.
Figure 46.3: Data generation rates and volumes comparison table

This view shows how data generation rates vary dramatically by device type, driving different edge processing strategies:

Figure: Comparison of data volumes generated by IoT device categories, from low-volume Non-IP sensors generating kilobytes per day to high-volume IP cameras generating gigabytes per day.

Different device types require different edge processing strategies based on their data generation rates and the value of raw versus processed data.
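To put numbers on these differences, daily raw volume is samples per second × bytes per sample × channels × 86,400 s. The sketch below applies that to three hypothetical sources; the rates are illustrative assumptions (the camera's ~0.9 Mbps equals the 6.75 MB/min per camera used in the Fonterra example), and sizes use decimal units.

```python
SECONDS_PER_DAY = 86_400


def daily_bytes(samples_per_sec, bytes_per_sample, channels=1):
    """Raw daily volume of a periodically sampled source, in bytes."""
    return samples_per_sec * bytes_per_sample * channels * SECONDS_PER_DAY


def human(n_bytes):
    """Format a byte count with decimal (SI) units."""
    for unit in ("B", "KB", "MB", "GB"):
        if n_bytes < 1000:
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1000
    return f"{n_bytes:.1f} TB"


# Hypothetical sources spanning the three device categories
sources = {
    "temperature probe (1 reading/10 s, int16)": daily_bytes(0.1, 2),
    "vibration sensor (1 kHz, 3 axes, int16)": daily_bytes(1000, 2, channels=3),
    "IP camera (~0.9 Mbps video stream)": daily_bytes(1, 0.9e6 / 8),
}
for name, volume in sources.items():
    print(f"{name:45} {human(volume)}/day")
```

The three results (about 17 KB, 518 MB, and 9.7 GB per day) span roughly six orders of magnitude, which is why one acquisition strategy cannot fit all device types.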

46.5.1 Inertial Measurement Example

High-frequency sensors like accelerometers and gyroscopes demonstrate why edge aggregation is critical:

Time-series plots showing raw accelerometer and gyroscope sensor data from a motion tracking device. Top panel displays 3-axis accelerometer readings in g-forces with X, Y, Z traces showing device orientation changes and movement events. Bottom panel shows 3-axis gyroscope readings in degrees per second capturing rotational velocity. Both plots span approximately 10 seconds of continuous sampling at 100 Hz, demonstrating the high-frequency nature of inertial measurement unit data and the need for efficient edge aggregation to reduce transmission bandwidth.
Figure 46.4: Accelerometer and gyroscope example sensor data

At 100 Hz sampling across 6 axes (3 accelerometer + 3 gyroscope), an IMU generates 600 samples/second. Transmitting raw data as 16-bit integers would require ~1.2 KB/s – unsustainable for battery-powered devices on LPWAN networks. Edge aggregation reduces this to statistical summaries at 1 Hz.

How much bandwidth does edge aggregation save for IMU data?

Raw transmission (no edge processing):

  • Sampling rate: 100 Hz per axis × 6 axes = 600 samples/sec
  • Data size: 2 bytes per sample (int16) × 600 = 1200 bytes/sec ≈ 1.2 KB/s
  • Daily volume: \(1200 \text{ B/s} \times 86400 \text{ s/day} \approx 104 \text{ MB/day}\)
  • LoRa constraint: 1% duty cycle at SF7 allows ~250 bytes/minute = 4.2 bytes/sec → Raw transmission exceeds capacity by 285×

Edge aggregation (1 Hz statistical summaries):

  • Window: 100 samples (1 second) per axis
  • Summary: 2 values per accel axis (RMS, peak) + 1 value per gyro axis (RMS) = 9 values
  • Data size: 2 bytes × 9 values = 18 bytes per second
  • Daily volume: \(18 \text{ bytes/s} \times 86400 = 1.56 \text{ MB/day}\)
  • Bandwidth reduction: \(\frac{1200 \text{ B/s}}{18 \text{ B/s}} = 67\times\)

For a LoRa deployment, even the 18 bytes/s of 1 Hz summaries exceeds the ~4.2 bytes/s LoRa budget by about 4×, so summaries must be sent less often. Assuming roughly 0.6 s of airtime per uplink at SF7 (a conservative figure; real airtime grows with payload size):

  • Send every 10 s: 18 bytes/s × 10 s = 180 bytes per uplink, 6 uplinks/minute = 360/hour → 360 × 0.6 s = 3.6 min of airtime per hour = 6% duty cycle, which exceeds the 1% limit.
  • Send every 60 s: compute one 9-value summary per 10 s window and batch six of them, 6 × 18 = 108 bytes per uplink, 1 uplink/minute → 0.6 s / 60 s = 1% duty cycle (borderline).
  • Send every 120 s: batch twelve 10 s summaries, 12 × 18 = 216 bytes per uplink, 1 uplink every 2 minutes → 0.6 s / 120 s = 0.5% duty cycle, comfortably within the limit.
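The airtime per uplink is the number the duty-cycle argument hinges on. The sketch below implements the standard LoRa time-on-air formula from Semtech's transceiver documentation, assuming typical LoRaWAN uplink settings (125 kHz bandwidth, coding rate 4/5, 8-symbol preamble, explicit header, CRC on) and a 13-byte LoRaWAN frame overhead; `lora_time_on_air` is an illustrative name. Under these assumptions it gives shorter airtimes than the ~0.6 s used above (about 0.31 s for the 180-byte payload), so that figure is conservative, but the conclusion holds: the 10 s schedule still exceeds a 1% duty cycle, while the 60 s and 120 s schedules fit.

```python
import math


def lora_time_on_air(payload_bytes, sf=7, bw_hz=125_000, cr=1,
                     preamble_symbols=8, explicit_header=True,
                     crc=True, low_dr_optimize=None):
    """Airtime in seconds of one LoRa packet (Semtech time-on-air formula).

    cr is 1..4 for coding rates 4/5..4/8. Low-data-rate optimization
    defaults to on when the symbol time exceeds 16 ms (SF11/SF12 at 125 kHz).
    """
    t_sym = (2 ** sf) / bw_hz
    if low_dr_optimize is None:
        low_dr_optimize = t_sym > 0.016
    de = 1 if low_dr_optimize else 0
    ih = 0 if explicit_header else 1
    crc_bit = 1 if crc else 0
    numerator = 8 * payload_bytes - 4 * sf + 28 + 16 * crc_bit - 20 * ih
    n_payload = 8 + max(math.ceil(numerator / (4 * (sf - 2 * de))) * (cr + 4), 0)
    t_preamble = (preamble_symbols + 4.25) * t_sym
    return t_preamble + n_payload * t_sym


LORAWAN_OVERHEAD = 13  # assumed: MHDR + FHDR (no FOpts) + FPort + MIC

for interval_s, app_bytes in [(10, 180), (60, 108), (120, 216)]:
    toa = lora_time_on_air(app_bytes + LORAWAN_OVERHEAD)
    print(f"every {interval_s:3d} s, {app_bytes} B: airtime {toa:.3f} s, "
          f"duty cycle {toa / interval_s:.2%}")
```

Batching lowers the duty cycle because the preamble and header cost is paid once per uplink rather than once per summary.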

Edge processing is mandatory for battery-powered IMU devices on LPWAN networks.

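46.5.2 Edge Aggregation Bandwidth Calculator

The same arithmetic generalizes to any IMU configuration: the raw rate is sampling rate × axes × bytes per sample, while the aggregated rate is summary values × bytes per value ÷ window length. Raw data exceeds LPWAN capacity quickly as the sampling rate or axis count grows, and the link budget sets the shortest aggregation window that fits.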
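A few lines of Python can stand in for a bandwidth calculator. This sketch assumes 2-byte values and uses the ~4.2 bytes/s LoRa budget quoted earlier; the function names are illustrative. The last function designs backwards from the link budget: it returns the shortest aggregation window whose 9-value summaries fit.

```python
LPWAN_BUDGET_BPS = 4.2  # LoRa SF7 at 1% duty cycle, figure quoted in the text


def raw_rate(sample_hz, axes, bytes_per_sample=2):
    """Raw stream rate in bytes/s."""
    return sample_hz * axes * bytes_per_sample


def aggregated_rate(summary_values, window_s, bytes_per_value=2):
    """Rate in bytes/s when one summary is sent per window."""
    return summary_values * bytes_per_value / window_s


def min_window_for_budget(summary_values, budget_bps, bytes_per_value=2):
    """Shortest window (s) whose summaries fit within the link budget."""
    return summary_values * bytes_per_value / budget_bps


for hz in (10, 100, 1000):
    raw = raw_rate(hz, axes=6)
    print(f"{hz:5d} Hz x 6 axes: {raw:6d} B/s raw, "
          f"{raw / LPWAN_BUDGET_BPS:5.0f}x the LoRa budget")

print(f"9-value summaries every 1 s: {aggregated_rate(9, 1):.0f} B/s")
window = min_window_for_budget(9, LPWAN_BUDGET_BPS)
print(f"shortest window that fits the budget: {window:.1f} s")
```

The ~4.3 s minimum window ignores per-packet overhead and regulatory airtime rules, so a real design would round up, as the 60 s and 120 s schedules above do.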

46.6 Power Budget Decision Framework

Time: ~5 min | Difficulty: Intermediate | Reference: P10.C08.U02c

Device capabilities directly impact acquisition strategies. The key decision point is power source:

  • Mains-powered devices (factory equipment, building systems): Can sample continuously and transmit frequently – edge processing focuses on bandwidth reduction, not power savings
  • Battery-powered devices (field sensors, wearables): Must duty-cycle both sampling and transmission – edge processing is essential to extend battery life from days to years
  • Energy-harvesting devices (solar-powered nodes): Operate with variable power budgets – edge processing must adapt to available energy

This decision tree visualizes how to select the optimal duty cycle based on power constraints:

Decision tree for IoT edge device power budget, showing branching paths from mains-powered versus battery-powered devices through duty cycle selection, sampling rate optimization, and transmission scheduling strategies.
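Why duty-cycling moves battery life from days to years follows from the average current, D × I_active + (1 − D) × I_sleep, for duty cycle D. The sketch below assumes a hypothetical node (2400 mAh cell, 20 mA while awake, 10 µA asleep) and ignores self-discharge and radio current peaks, so its figures are order-of-magnitude estimates only. With these numbers, always-on operation lasts about 5 days, while a 0.1% duty cycle stretches to roughly 9 years on paper.

```python
def battery_life_days(capacity_mah, active_ma, sleep_ma, duty_cycle):
    """Estimate battery life for a device awake a fraction of the time.

    Average current = D * I_active + (1 - D) * I_sleep. Ignores
    self-discharge, temperature effects and short transmit peaks.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    avg_ma = duty_cycle * active_ma + (1.0 - duty_cycle) * sleep_ma
    return capacity_mah / avg_ma / 24.0  # mAh / mA = hours -> days


# Hypothetical node: 2400 mAh cell, 20 mA awake, 10 uA asleep
for duty in (1.0, 0.1, 0.01, 0.001):
    days = battery_life_days(2400, active_ma=20, sleep_ma=0.010, duty_cycle=duty)
    print(f"duty cycle {duty:6.1%}: ~{days:,.0f} days")
```

Once the duty cycle is small, the sleep current starts to dominate the average, so a low-leakage sleep mode matters as much as infrequent sampling.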

46.7 Code Example: IMU Edge Aggregation Pipeline

This Python example demonstrates edge aggregation for the inertial measurement use case discussed above. A 100 Hz IMU produces 600 samples/second across 6 axes. Transmitting raw data is unsustainable, so the edge pipeline computes 1 Hz statistical summaries (RMS and peak per accelerometer axis, RMS per gyroscope axis), reducing bandwidth by ~67x:

import math
import time

class IMUEdgeAggregator:
    """Aggregate high-frequency IMU data into 1 Hz statistical summaries.

    Reduces 600 samples/second (100 Hz x 6 axes) to 9 summary values
    per second, cutting transmission from 1200 bytes/s to 18 bytes/s
    (67x reduction).
    """
    def __init__(self, sample_rate_hz=100, window_sec=1):
        self.sample_rate = sample_rate_hz
        self.window_size = sample_rate_hz * window_sec
        self.buffer_accel = {"x": [], "y": [], "z": []}
        self.buffer_gyro = {"x": [], "y": [], "z": []}

    def add_sample(self, ax, ay, az, gx, gy, gz):
        """Add one raw IMU sample (called at 100 Hz)."""
        self.buffer_accel["x"].append(ax)
        self.buffer_accel["y"].append(ay)
        self.buffer_accel["z"].append(az)
        self.buffer_gyro["x"].append(gx)
        self.buffer_gyro["y"].append(gy)
        self.buffer_gyro["z"].append(gz)

    def _rms(self, values):
        """Root mean square -- captures vibration energy."""
        if not values:
            return 0.0
        return math.sqrt(sum(v * v for v in values) / len(values))

    def _peak(self, values):
        """Peak absolute value -- detects impacts."""
        if not values:
            return 0.0
        return max(abs(v) for v in values)

    def window_ready(self):
        """Check if enough samples collected for one summary."""
        return len(self.buffer_accel["x"]) >= self.window_size

    def compute_summary(self):
        """Compute 1 Hz summary from buffered samples.

        Returns a dict with RMS and peak for each accelerometer axis and
        RMS for each gyroscope axis (9 values) -- enough to detect
        vibration anomalies without raw data.
        """
        summary = {"timestamp": int(time.time()), "samples": self.window_size}
        for axis in ["x", "y", "z"]:
            accel = self.buffer_accel[axis][:self.window_size]
            gyro = self.buffer_gyro[axis][:self.window_size]
            summary[f"accel_{axis}_rms"] = round(self._rms(accel), 4)
            summary[f"accel_{axis}_peak"] = round(self._peak(accel), 4)
            summary[f"gyro_{axis}_rms"] = round(self._rms(gyro), 2)

        # Clear processed samples
        for axis in ["x", "y", "z"]:
            self.buffer_accel[axis] = self.buffer_accel[axis][self.window_size:]
            self.buffer_gyro[axis] = self.buffer_gyro[axis][self.window_size:]
        return summary

    def estimate_bandwidth(self):
        """Compare raw vs aggregated data rates.

        Assumes each value is sent as a 2-byte int16; the timestamp and
        sample count in the summary dict are not counted.
        """
        raw_bytes_per_sec = self.sample_rate * 6 * 2  # 6 axes, 2 bytes each
        window_sec = self.window_size / self.sample_rate
        # 9 summary values x 2 bytes, sent once per window
        summary_bytes_per_sec = 9 * 2 / window_sec
        ratio = raw_bytes_per_sec / summary_bytes_per_sec
        return {
            "raw_bytes_per_sec": raw_bytes_per_sec,
            "summary_bytes_per_sec": summary_bytes_per_sec,
            "reduction_ratio": f"{ratio:.0f}x",
        }

# Simulate: factory motor vibration monitoring
agg = IMUEdgeAggregator(sample_rate_hz=100, window_sec=1)

# Feed 100 simulated samples (1 second of data)
import random
for _ in range(agg.window_size):
    # Normal vibration: small accelerations around 0 g with noise
    agg.add_sample(
        ax=random.gauss(0, 0.05), ay=random.gauss(0, 0.05),
        az=random.gauss(1.0, 0.03),  # 1 g gravity on Z
        gx=random.gauss(0, 2), gy=random.gauss(0, 2),
        gz=random.gauss(0, 1)
    )

if agg.window_ready():
    summary = agg.compute_summary()
    print("1-second summary (transmitted via LoRa):")
    for key, val in summary.items():
        print(f"  {key}: {val}")

bw = agg.estimate_bandwidth()
print(f"\nBandwidth: {bw['raw_bytes_per_sec']} B/s raw -> "
      f"{bw['summary_bytes_per_sec']:g} B/s summary = "
      f"{bw['reduction_ratio']} reduction")
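# Example output (samples are random and the timestamp is the current
# time, so the values differ from run to run):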
# 1-second summary (transmitted via LoRa):
#   timestamp: 1738900000
#   samples: 100
#   accel_x_rms: 0.0498
#   accel_x_peak: 0.1523
#   accel_y_rms: 0.0512
#   accel_y_peak: 0.1389
#   accel_z_rms: 1.0004
#   accel_z_peak: 1.0891
#   gyro_x_rms: 1.98
#   gyro_y_rms: 2.05
#   gyro_z_rms: 0.99
#
# Bandwidth: 1200 B/s raw -> 18 B/s summary = 67x reduction

The edge device transmits only RMS (vibration energy) and peak (impact detection) values at 1 Hz instead of raw waveforms at 100 Hz. A sudden rise in accel_x_rms from 0.05 g to 0.5 g, ten times its normal level, can flag a developing fault such as bearing wear without requiring cloud-side waveform analysis.
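That check can be as simple as comparing each new RMS value with a baseline built from recent summaries. A minimal sketch, assuming a median baseline and a hypothetical alarm factor of 5x; both are tuning choices for illustration, not values from the text, and `rms_alarm` is an illustrative name.

```python
import statistics


def rms_alarm(history, latest, factor=5.0, min_history=10):
    """Return True if `latest` RMS exceeds `factor` x the baseline.

    `history` holds earlier RMS values (e.g. accel_x_rms from previous
    1 Hz summaries). The baseline is their median, so a few spikes in
    the history do not drag it upward.
    """
    if len(history) < min_history:
        return False  # too little data for a trustworthy baseline
    baseline = statistics.median(history)
    return latest > factor * baseline


normal = [0.050, 0.048, 0.052, 0.049, 0.051, 0.050, 0.047, 0.053, 0.050, 0.049]
print(rms_alarm(normal, 0.06))  # False: within normal variation
print(rms_alarm(normal, 0.5))   # True: about 10x the 0.05 g baseline
```

Running this on the device itself would allow an event-triggered uplink: routine summaries can be batched, while an alarm is sent immediately.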

46.8 Knowledge Check

The three types of “Things” at the edge!

The Sensor Squad was setting up a smart garden, and they discovered three very different kinds of helpers:

Sammy the Sensor was a tiny temperature sensor with no internet connection. “I am a Non-IP Thing,” he explained. “I can only whisper my readings to Max using a special short-range language called Zigbee. I need Max to translate for me!”

Lila the LED was a smart camera with Wi-Fi built in. “I am a Small IP Thing! I can talk to the internet all by myself, but I am not as powerful as a big computer.”

Max the Microcontroller was connected to a powerful Raspberry Pi gateway. “I am like a Big Thing – I can run programs, store data, and talk directly to the cloud. My job is to listen to Sammy and help him get his messages to the internet.”

Bella the Battery reminded everyone: “Remember, sending data far away uses a LOT of my energy! So Max should only send the important stuff to the cloud – like when the temperature is too hot for the tomatoes. The normal readings can stay here.”

The lesson: Different devices have different abilities, and smart IoT systems match the right strategy to each type of device!

46.8.1 Worked Example: Fonterra Dairy Farm Edge Acquisition Design

Scenario: Fonterra, New Zealand’s largest dairy cooperative, deploys IoT sensors across 200 milking sheds to monitor milk quality and cow health in real-time. Each shed has a mix of device categories requiring different acquisition strategies.

Given:

  • 200 sheds, each with the following devices:
    • 1 SCADA controller (Big Thing) – monitors vat temperature, logs milk volume
    • 8 IP cameras (Small IP Things) – 1080p at 15 fps for mastitis detection via udder imaging
    • 40 Non-IP sensors per shed: 20 milk flow meters (Modbus RTU), 10 temperature probes (4-20 mA), 10 cow ID readers (RFID 134.2 kHz)
  • Rural connectivity: 4G LTE at NZD 12/GB, typical 15 Mbps downlink / 5 Mbps uplink
  • Power: Mains-powered shed, solar-powered paddock sensors

Step 1: Classify devices and estimate raw data

Device | Category | Count (per shed) | Raw Data Rate | Daily Raw (per shed)
SCADA controller | Big Thing | 1 | 50 KB/hour | 1.2 MB
IP cameras | Small IP Things | 8 | 6.75 MB/min each | 77.8 GB
Milk flow meters | Non-IP Things | 20 | 2 readings/sec x 4 bytes each | 14 MB
Temperature probes | Non-IP Things | 10 | 1 reading/10 sec x 2 bytes each | 0.17 MB
RFID readers | Non-IP Things | 10 | Event-based, ~400 events/day x 12 bytes each | 0.05 MB

Step 2: Design acquisition strategy per category

Category | Strategy | Edge Processing | Transmitted Data
Big Thing (SCADA) | Direct IP upload | None needed (data already structured) | 1.2 MB/day (as-is)
Small IP (cameras) | Edge ML inference | Run mastitis detection model on gateway; transmit only flagged frames + 10-sec clips | 780 MB/day (99% reduction)
Non-IP (flow meters) | Gateway aggregation | Aggregate per-cow milking session (start, end, total litres, peak flow) | 0.4 MB/day (97% reduction)
Non-IP (temp probes) | Gateway with threshold filter | Transmit only if outside 2-6 °C (milk safety range) | 0.008 MB/day (95% reduction)
Non-IP (RFID) | Gateway protocol translation | Translate RFID events to MQTT messages with cow ID + timestamp | 0.05 MB/day (as-is)
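The flow-meter strategy collapses a 2 Hz stream into one record per milking session. The sketch below assumes readings in litres per minute sampled every 0.5 s and a hypothetical 0.2 L/min threshold for deciding when a session is running; volume is integrated with a simple rectangle rule. Attaching the cow ID from the RFID reader is left out, and `MilkingSession` and `aggregate_sessions` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class MilkingSession:
    start_s: float
    end_s: float
    total_litres: float
    peak_flow_lpm: float


def aggregate_sessions(readings, dt_s=0.5, on_threshold_lpm=0.2):
    """Collapse flow readings (L/min, one every dt_s seconds from t=0)
    into per-session summaries: start, end, total litres, peak flow."""
    sessions = []
    current = None
    for i, flow in enumerate(readings):
        t = i * dt_s
        if flow > on_threshold_lpm:
            if current is None:  # flow just started: open a session
                current = MilkingSession(t, t, 0.0, 0.0)
            current.end_s = t
            current.total_litres += flow * dt_s / 60.0  # L/min x s -> L
            current.peak_flow_lpm = max(current.peak_flow_lpm, flow)
        elif current is not None:  # flow stopped: close the session
            sessions.append(current)
            current = None
    if current is not None:  # stream ended mid-session
        sessions.append(current)
    return sessions


# Hypothetical trace: idle, a short milking burst, idle
trace = [0.0, 0.0, 6.0, 12.0, 12.0, 12.0, 6.0, 0.0, 0.0]
for s in aggregate_sessions(trace):
    print(s)
```

Each session becomes four numbers regardless of how many raw readings it spanned, which is where the 97% reduction in the table comes from.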

Step 3: Calculate connectivity costs

Metric | Without Edge Processing | With Edge Processing | Savings
Daily data per shed | 77.8 GB | 782 MB | 99%
Monthly 4G cost per shed | NZD 28,000 | NZD 282 | NZD 27,718
Monthly cost (200 sheds) | NZD 5.6M | NZD 56,400 | NZD 5.54M
Gateway hardware (200 sheds) | - | NZD 180,000 one-time | Payback: 1 day
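The Step 3 figures can be reproduced from the daily volumes in Steps 1 and 2. This sketch assumes a 30-day month and decimal units (1 GB = 1000 MB), which is what the table's numbers imply; small differences from the table come from rounding. It also checks a point the tables leave implicit: at 6.75 MB/min each, eight raw camera streams need about 7.2 Mbps, more than the 5 Mbps uplink, so edge inference is needed before cost even enters the picture.

```python
MB_PER_GB = 1000          # decimal units, as in the tables
PRICE_NZD_PER_GB = 12
DAYS_PER_MONTH = 30       # assumed billing month
SHEDS = 200
GATEWAY_CAPEX_NZD = 180_000

# Daily MB per shed, from the Step 1 (raw) and Step 2 (transmitted) tables
raw_mb = {"scada": 1.2, "cameras": 8 * 6.75 * 60 * 24, "flow": 14,
          "temp": 0.17, "rfid": 0.05}
edge_mb = {"scada": 1.2, "cameras": 780, "flow": 0.4,
           "temp": 0.008, "rfid": 0.05}


def monthly_cost_nzd(daily_mb):
    """4G cost for one month at the flat per-GB tariff."""
    return daily_mb / MB_PER_GB * DAYS_PER_MONTH * PRICE_NZD_PER_GB


raw_daily = sum(raw_mb.values())
edge_daily = sum(edge_mb.values())
raw_cost, edge_cost = monthly_cost_nzd(raw_daily), monthly_cost_nzd(edge_daily)
fleet_savings = (raw_cost - edge_cost) * SHEDS  # NZD per month
payback_days = GATEWAY_CAPEX_NZD / fleet_savings * DAYS_PER_MONTH

print(f"per shed: {raw_daily / MB_PER_GB:.1f} GB/day -> {edge_daily:.0f} MB/day "
      f"({1 - edge_daily / raw_daily:.0%} less)")
print(f"4G per shed: NZD {raw_cost:,.0f} -> NZD {edge_cost:,.0f} per month")
print(f"fleet savings: NZD {fleet_savings / 1e6:.2f}M/month, "
      f"payback ~{payback_days:.1f} days")

# Uplink check: 6.75 MB/min per camera -> Mbit/s, for all 8 cameras
camera_mbps = 8 * 6.75 * 8 / 60
print(f"raw camera streams need {camera_mbps:.1f} Mbps vs 5 Mbps uplink")
```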


Result: A single Raspberry Pi 4 gateway (NZD 900 with edge ML accelerator) per shed handles all three device categories: protocol translation for Non-IP sensors, video analytics for IP cameras, and pass-through for the SCADA controller. The 99% data reduction makes rural 4G connectivity economically viable.

Key Insight: The three device categories in Fonterra's deployment map directly to three gateway functions: Big Things need routing (IP to IP), Small IP Things need edge inference (reduce high-bandwidth streams), and Non-IP Things need protocol translation (Modbus/4-20 mA/RFID to MQTT). A single edge gateway serves all three roles, and the dominant cost driver is typically the highest-bandwidth device category (cameras, in this case).

Common Pitfalls

Busy-loop polling wastes CPU cycles and prevents the processor from entering low-power sleep states. Use hardware timer interrupts or DMA to trigger sensor reads, allowing the MCU to sleep between samples.

The architecture must be designed backwards from the bandwidth constraint: start with the available link budget, determine how many bytes per second can be transmitted, then design sampling rates and pre-aggregation to fit within that budget.

When sensor readings from different buses are acquired at slightly different times due to software scheduling delays, fusing them without correcting for the time offset produces incorrect results. Use hardware timestamps from a shared timer source.
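One way to correct the offset is to resample one sensor onto the other's timestamps. A minimal sketch, assuming both sensors are stamped from the same hardware timer (microsecond ticks here) and that the signals change slowly enough for linear interpolation; other methods (nearest sample, splines) may suit other signals. `interpolate_at` is an illustrative name.

```python
from bisect import bisect_left


def interpolate_at(timestamps, values, t):
    """Linearly interpolate a sensor's reading at time t.

    `timestamps` must be sorted ascending and share a clock with t.
    Times outside the covered range are clamped to the nearest endpoint.
    """
    if not timestamps:
        raise ValueError("no samples")
    i = bisect_left(timestamps, t)  # first index with timestamps[i] >= t
    if i == 0:
        return values[0]
    if i == len(timestamps):
        return values[-1]
    t0, t1 = timestamps[i - 1], timestamps[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Accelerometer sampled at 0, 10, 20 ms; a pressure sensor on another bus
# was read 3 ms later each time because of scheduling delay.
accel_t = [0, 10_000, 20_000]           # microseconds, shared hardware timer
pressure_t = [3_000, 13_000, 23_000]
pressure_v = [101.30, 101.40, 101.50]   # kPa
aligned = [interpolate_at(pressure_t, pressure_v, t) for t in accel_t]
```

After alignment, each accelerometer sample is paired with a pressure estimate for the same instant (about 101.37 kPa at 10 ms rather than the raw 101.40 kPa read 3 ms later).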

In industrial deployments, sensors fail and are replaced while the system is running. Design the acquisition layer to detect new sensors at startup or during operation and handle their absence gracefully rather than crashing.
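A minimal sketch of an acquisition layer that keeps running when a sensor disappears. It assumes each sensor is wrapped in a read function that raises `OSError` when the device does not respond (as bus access from Python on Linux commonly does) and that some other mechanism, such as a periodic bus scan, calls `attach` and `detach`; `SensorRegistry` and the probe functions are hypothetical.

```python
import logging
from typing import Callable, Dict, Optional

log = logging.getLogger("acquisition")


class SensorRegistry:
    """Poll whichever sensors are currently attached; tolerate failures."""

    def __init__(self):
        self._readers: Dict[str, Callable[[], float]] = {}

    def attach(self, name: str, reader: Callable[[], float]) -> None:
        self._readers[name] = reader
        log.info("sensor %s attached", name)

    def detach(self, name: str) -> None:
        if self._readers.pop(name, None) is not None:
            log.warning("sensor %s removed", name)

    def poll(self) -> Dict[str, Optional[float]]:
        """Read every sensor; a failed read yields None instead of crashing."""
        results: Dict[str, Optional[float]] = {}
        for name, reader in self._readers.items():
            try:
                results[name] = reader()
            except OSError as exc:
                log.warning("sensor %s unavailable: %s", name, exc)
                results[name] = None
        return results


def healthy_probe():
    return 4.2


def unplugged_probe():
    raise OSError("no ACK from device")


reg = SensorRegistry()
reg.attach("temp_1", healthy_probe)
reg.attach("temp_2", unplugged_probe)
print(reg.poll())  # {'temp_1': 4.2, 'temp_2': None}
```

Returning `None` rather than raising lets the transmit step mark a reading as missing instead of taking the whole gateway down.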

46.9 Summary

Edge data acquisition architecture is built on understanding three fundamental device categories:

  • Big Things: Full-capability computers with direct cloud connectivity - minimal edge processing needed
  • Small IP Things: Embedded devices with IP connectivity - benefit from edge compression and filtering
  • Non-IP Things: Simple sensors requiring gateways - need edge aggregation for efficient transmission

The acquisition strategy must match device capabilities: high-volume devices (cameras) need compression, low-volume devices (temperature sensors) need aggregation, and non-IP devices need protocol translation through gateways.

46.10 Concept Relationships

This chapter establishes the foundational architecture for edge data collection:

Core Classification (This chapter):

  • Three device categories (Big Things, Small IP Things, Non-IP Things) determine connectivity paths and acquisition strategies
  • Data generation patterns vary by many orders of magnitude across categories (door sensors: bytes/day vs cameras: gigabytes/day)

Technical Implementation (Apply this foundation):

Processing Context:

  • Edge Compute Patterns - Where to process (edge/fog/cloud) depends on device capabilities established here
  • Edge Fog Computing - Big Things can participate in fog layer; Small/Non-IP Things need edge gateways

Data Quality Integration:

46.11 What’s Next

If you want to… | Read this
Understand power management for the architecture | Edge Acquisition Power and Gateways
Learn sampling and compression strategies | Edge Acquisition Sampling and Compression
Study the broader edge compute context | Edge Data Acquisition
Apply the architecture to compute patterns | Edge Compute Patterns
Return to the module overview | Big Data Overview