39 Edge Data Architecture

39.1 Start With the Story

Picture an IoT team using the ideas in Edge Data Architecture during a live operations review. A device has produced messy evidence, an analytic step is about to change an alert or control decision, and someone has to explain why the result should be trusted.

Read this page as that path from sensor evidence to accountable action. Start with what the system observes, keep the model or data treatment visible, and finish with the check that would convince an operator, maintainer, or auditor to act.

In 60 Seconds

Edge data acquisition is the process of collecting sensor data at the network periphery and deciding what to process locally versus what to send to the cloud. IoT devices fall into three categories – Big Things (servers), Small IP Things (smart cameras), and Non-IP Things (simple sensors needing gateways) – each requiring different acquisition strategies based on their connectivity and processing capabilities.

Phoebe’s Field Notes: What “0.05g Rising To 0.5g” Actually Moves

Phoebe’s Why

The chapter’s own bearing-fault signature – accel_x_rms rising from 0.05g to 0.5g – never touches a “g” directly. A capacitive MEMS accelerometer reports acceleration only because a microscopic suspended mass moves against its own springs, and that displacement changes a tiny sense capacitance the front end can measure. Every stage between that physical motion and the RMS value in the chapter’s Python example is a governing equation the reading has to invert: displacement from force, capacitance from displacement, a digital code from capacitance. Knowing the size of each step is what turns “the number went up 10x” into an evidence-backed claim about a real, physical fault.

The Derivation

Quasi-static mass-spring displacement, with the spring constant expressed through the sensing element’s own resonance (proof mass cancels):

\[x = \frac{ma}{k}, \qquad k = m(2\pi f_n)^2 \;\Rightarrow\; x = \frac{a}{(2\pi f_n)^2}\]

Differential half-bridge capacitance sensitivity for a parallel-plate sense gap \(d_0\):

\[\Delta C \approx C_0\,\frac{2x}{d_0}\]

Quantization step for an \(N\)-bit output over a \(\pm\) full-scale range:

\[q = \frac{\text{range}}{2^N}\]

Worked Numbers: This Chapter’s Own 0.05g to 0.5g Fault Signature

Using catalog-typical parameters for an industrial condition-monitoring MEMS part (wider bandwidth than a wearable accelerometer): resonance \(f_n=20{,}000\) Hz, nominal sense capacitance \(C_0=1.0\) pF, sense gap \(d_0=2.0\ \mu\)m.

Displacement: at the chapter’s own \(0.05g\) baseline, \(x=0.05\times9.81/(2\pi\times20{,}000)^2=31.1\) pm; at the \(0.5g\) fault threshold, \(x=311\) pm – both well under a nanometre, and exactly \(10\times\) apart because \(x\) is linear in \(a\).
Capacitance change: \(\Delta C = C_0\times2x/d_0\) gives \(31.1\) aF at baseline and \(311\) aF at the fault threshold – attofarad-scale signals, which is why MEMS accelerometer front ends are charge amplifiers, not simple voltage dividers.
Tying to quantization: a catalog-typical \(\pm4g\), 16-bit industrial vibration output gives \(q=8/65{,}536=122\ \mu g\)/LSB; the \(0.05g\) baseline sits at LSB \(410\) and the \(0.5g\) threshold at LSB \(4{,}096\), a gap of \(3{,}686\) LSBs – the digitizer is nowhere near its resolution floor, so a real bearing fault of this size is a converter-clean, unambiguous jump, not a borderline call.
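The three worked steps above can be reproduced with a short script. This is a minimal sketch under the same catalog-typical assumptions stated in this section (20 kHz resonance, 1.0 pF sense capacitance, 2.0 µm gap, ±4g 16-bit output); the function names are illustrative and do not come from any sensor vendor's API.

```python
import math

G = 9.81  # m/s^2, the value of g used in the worked numbers

def displacement_m(accel_g, f_n_hz):
    """Quasi-static proof-mass displacement: x = a / (2*pi*f_n)^2."""
    return accel_g * G / (2 * math.pi * f_n_hz) ** 2

def delta_capacitance_f(x_m, c0_f, d0_m):
    """Differential parallel-plate approximation: dC ~= C0 * 2x / d0."""
    return c0_f * 2 * x_m / d0_m

def lsb_size_g(full_scale_span_g, bits):
    """Quantization step: q = range / 2^N."""
    return full_scale_span_g / 2 ** bits

# Assumed catalog-typical parameters from the worked example
F_N = 20_000.0   # resonance, Hz
C0 = 1.0e-12     # nominal sense capacitance, F
D0 = 2.0e-6      # sense gap, m
q = lsb_size_g(8.0, 16)  # +/-4 g span over 16 bits

for a in (0.05, 0.5):
    x = displacement_m(a, F_N)
    dc = delta_capacitance_f(x, C0, D0)
    print(f"{a} g: x = {x * 1e12:.1f} pm, "
          f"dC = {dc * 1e18:.1f} aF, code = {a / q:.0f} LSB")
```

Changing `F_N` shows how strongly the displacement depends on resonance: halving the resonant frequency quadruples the displacement for the same acceleration.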

The RMS and peak values this chapter’s edge pipeline transmits are downstream summaries of a picometre-scale mechanical motion converted through an attofarad-scale capacitance change. That physical chain is what the pipeline quietly assumes every time a threshold like 0.5g is treated as ground truth.

Chapter Roadmap

This chapter follows the acquisition path:

First we classify the three kinds of Things before writing the data budget.
Then we compare data generation patterns and use the IMU example to see why raw streams do not scale.
Next we connect sampling, aggregation, power source, and gateway placement to the network constraint.
Finally we apply the same logic to Fonterra, then use the quizzes and pitfalls to check the design.

Checkpoints recap the path, and the calculators let you change the chapter’s rates and costs.

39.2 Learning Objectives

By the end of this chapter, you will be able to:

Classify IoT Data Sources: Distinguish between Big Things, Small IP Things, and Non-IP Things in edge architectures
Explain Device Connectivity Paths: Describe how different device types connect to cloud infrastructure through direct IP or gateways
Analyze Data Generation Rates: Calculate data volumes across device categories and assess their implications for edge processing
Design Data Acquisition Strategies: Select and justify appropriate transmission schedules based on device capabilities and constraints

39.3 Quick Check: Edge Data Architecture

39.4 Prerequisites

Before diving into this chapter, you should be familiar with:

Edge, Fog, and Cloud Overview: Understanding the three-tier IoT architecture provides context for where edge data acquisition fits
Sensor Fundamentals: Knowledge of sensor types and characteristics helps understand data acquisition requirements

Edge Data Acquisition Basics

Think of edge data acquisition like a local newspaper reporter versus a national news network.

A local reporter (edge device) collects news from the neighborhood and decides what’s important enough to send to the national headquarters (cloud). They don’t send everything - just the highlights. This saves time, money, and keeps headquarters from being overwhelmed.

The “Edge” is simply where your sensors live:

Example	Location	Why “Edge”?
Your thermostat	Living room wall	At the edge of your network
Factory sensor	On a machine	Far from the central servers
Traffic camera	Roadside pole	Collecting data at the source

Three types of “Things” at the edge:

Big Things - Computers, servers (they can talk to the internet directly)
Small IP Things - Smart bulbs, webcams (they have their own internet connection)
Non-IP Things - Simple sensors that need a “translator” (gateway) to reach the internet

Why process data at the edge instead of sending everything to the cloud?

Challenge	Without Edge	With Edge
Speed	Wait for cloud response	Instant local decisions
Battery	Constant transmission drains battery	Send only summaries, save power
Bandwidth	Network gets clogged	Only important data travels far
Privacy	All your data goes to remote servers	Sensitive data stays local

Real-world example: A security camera generates 1GB of video per hour. Instead of sending all that to the cloud, edge processing detects “motion” and only uploads the 5-second clips that matter.

39.5 Introduction to Edge Data Acquisition

Time: ~5 min | Difficulty: Foundational | Reference: P10.C08.U01

Key Concepts

Edge acquisition architecture: The hardware and software design of a system that captures, validates, and pre-processes sensor data at or near the source before transmission to higher processing tiers.
Sensor interface bus: The low-level communication protocol connecting sensors to a microcontroller or gateway — common IoT interfaces include I2C, SPI, UART, and ADC.
Data aggregation gateway: A device that collects raw readings from multiple nearby sensors, applies local processing (averaging, event detection), and forwards summarised data to the cloud.
Ring buffer: A circular fixed-size memory structure used in edge devices to store a rolling window of recent sensor readings without dynamic memory allocation.
Interrupt-driven sampling: A microcontroller technique where a hardware timer interrupt triggers sensor reads at precise intervals, ensuring consistent sample timing without busy-wait polling.
DMA (Direct Memory Access): A hardware mechanism allowing peripherals (ADC, sensor buses) to transfer data directly to memory without CPU intervention, freeing the processor for other tasks.
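To make the ring buffer concept concrete, here is a minimal sketch in Python. It preallocates its storage once and overwrites the oldest slot when full, which mirrors how a fixed array and a head index would be used on a microcontroller. The class name and the sample readings are illustrative assumptions, not part of any specific firmware.

```python
class RingBuffer:
    """Fixed-size circular buffer holding the most recent readings."""

    def __init__(self, capacity):
        self._slots = [0.0] * capacity  # allocated once, never resized
        self._head = 0                  # index of the next slot to overwrite
        self._count = 0                 # number of valid readings stored

    def push(self, reading):
        """Store a reading, overwriting the oldest one when full."""
        self._slots[self._head] = reading
        self._head = (self._head + 1) % len(self._slots)
        self._count = min(self._count + 1, len(self._slots))

    def snapshot(self):
        """Return the stored readings, oldest first."""
        size = len(self._slots)
        start = (self._head - self._count) % size
        return [self._slots[(start + i) % size] for i in range(self._count)]


buf = RingBuffer(capacity=4)
for reading in [20.0, 20.5, 21.0, 21.5, 22.0, 22.5]:
    buf.push(reading)

# Only the four most recent readings survive; the first two were overwritten.
print(buf.snapshot())
```

The rolling window is what a local aggregation step (averaging, RMS, event detection) would read from, while an interrupt handler or DMA transfer keeps filling it.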

Edge data acquisition is the process of collecting, processing, and transmitting sensor data at the network periphery - where physical devices meet the digital infrastructure. This chapter explores the fundamental architecture and device categories that form the foundation of efficient data collection at the IoT edge.

Key Takeaway

In one sentence: Collect raw data at the edge, but only transmit what’s needed; by commonly cited industry estimates, around 90% of IoT data is never analyzed.

Remember this rule: If you can’t name who will use the data and how, don’t collect it.

Why Edge Matters

Traditional cloud-centric architectures require all sensor data to travel to remote servers for processing. Edge data acquisition shifts some processing closer to the source, reducing:

Latency: Critical for time-sensitive applications (autonomous vehicles, industrial safety)
Bandwidth: Raw sensor streams can overwhelm network capacity
Energy: Transmitting data is 10-100x more power-intensive than local processing
Privacy: Sensitive data can be processed locally without cloud exposure

39.6 IoT Device Categories

Time: ~10 min | Difficulty: Intermediate | Reference: P10.C08.U02

The key sources of data in IoT are the ‘Things’: the physical devices and controllers located on Level 1 of the IoT Reference Model. Things can be accessed directly to send and receive data; however, to count as IoT ‘Things’, they must be connected to the Internet.

39.6.1 Three Categories of Things

Diagram showing three IoT device categories -- Big Things (servers, PLCs) with direct IP connectivity, Small IP Things (cameras, smart devices) with Wi-Fi or cellular links, and Non-IP Things (simple sensors) requiring gateways for protocol translation to reach cloud infrastructure. — Figure 39.1: IoT Device Categories and Gateway Connectivity Paths

39.6.1.1 Mobile diagram summary

Big Things: Servers and PLCs have full IP connectivity, so they can upload structured data directly to the cloud.
Small IP Things: Smart cameras and other embedded devices connect over Wi-Fi or cellular and can pre-filter data before upload.
Non-IP Things: Simple sensors use Zigbee, BLE, Modbus, or similar buses and rely on a nearby gateway for translation and aggregation.
Connectivity path: Non-IP sensor -> gateway -> cloud, while Big Things and Small IP Things can usually reach the cloud without protocol translation.

Big Things might be computers and databases. Small IP-enabled Things could include webcams, lights, and smartphones. Non-IP Things may need a Gateway or other device to assist - examples include lights, temperature gauges, locks, and gates.

39.6.2 Device Characteristics Comparison

Category	Examples	Connectivity	Data Rate	Processing Capability
Big Things	Servers, industrial PLCs	Full IP stack	GB/day	High (full OS)
Small IP Things	Smart cameras, lights	Wi-Fi, cellular	MB/day	Medium (embedded)
Non-IP Things	Temperature sensors, door locks	Zigbee, BLE, Modbus	KB/day	Low (microcontroller)
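The comparison table can be read as a two-question classification: does the device have its own IP stack, and does it run a full operating system? The sketch below encodes that reading. The `Device` fields and the two yes/no questions are a simplifying assumption for illustration; real devices may not fit so cleanly.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    has_ip_stack: bool   # can it reach the network without a translator?
    runs_full_os: bool   # full OS versus embedded firmware


def classify(device):
    """Map a device onto the chapter's three categories."""
    if not device.has_ip_stack:
        return "Non-IP Thing"
    return "Big Thing" if device.runs_full_os else "Small IP Thing"


def connectivity_path(category):
    """Non-IP Things need a gateway; the other two reach the cloud directly."""
    if category == "Non-IP Thing":
        return "device -> gateway -> cloud"
    return "device -> cloud"


fleet = [
    Device("industrial PLC", has_ip_stack=True, runs_full_os=True),
    Device("smart camera", has_ip_stack=True, runs_full_os=False),
    Device("Modbus temperature sensor", has_ip_stack=False, runs_full_os=False),
]

for d in fleet:
    category = classify(d)
    print(f"{d.name}: {category} ({connectivity_path(category)})")
```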

Checkpoint: Device Categories

You now know:

Big Things have a full IP stack and high processing capability, so they can usually upload structured data directly.
Small IP Things have their own Wi-Fi or cellular connectivity, but still benefit from local filtering when data rates rise.
Non-IP Things use links such as Zigbee, BLE, or Modbus and need a gateway for translation and aggregation.

Once the connectivity path is clear, ask how much evidence each path would move without edge filtering.

39.7 Data Generation Patterns

Time: ~8 min | Difficulty: Intermediate | Reference: P10.C08.U02b

Understanding data generation patterns is essential for designing efficient edge acquisition systems. Different device types produce vastly different data volumes and require different handling strategies.

Table showing data generation statistics for common IoT device categories. Big Things like computers generate megabytes to gigabytes per day with continuous connectivity. Small IP Things such as webcams and smart lights generate kilobytes to megabytes with periodic updates. Non-IP Things including temperature sensors and door locks generate bytes to kilobytes with event-triggered transmission. Columns include device type, typical data rate, transmission frequency, and connectivity requirements. — Figure 39.2: Data generation statistics for IoT devices

39.7.0.1 Mobile figure summary

Single sensor at 10 Hz: 20 bytes per sample becomes 200 bytes/sec, 12 KB/minute, 720 KB/hour, 17 MB/day, and 6.3 GB/year.
Fleet impact: 10 sensors generate about 63 GB/year, 100 sensors reach 630 GB/year, 1,000 sensors reach 6.3 TB/year, and 10,000 sensors reach 63 TB/year.
Design takeaway: Even a simple sensor becomes a storage problem at fleet scale, so edge filtering and aggregation matter early.

Table showing IoT data storage requirements by sampling frequency. Rows compare rates from 1 per hour (480 bytes per day) through 1 per minute (28.8 KB), every 10 seconds (172.8 KB), every 1 second (1.73 MB), to 100 ms at 10 Hz (17.3 MB per day). Calculations assume 20 bytes per sample including timestamp, sensor value, and metadata. A color-coded scale bar at the bottom illustrates the proportional growth in daily data volume as sampling frequency increases. — Figure 39.3: Data generation rates and volumes comparison table

39.7.0.2 Mobile comparison summary

1 sample per hour: 480 B/day.
1 sample per minute: 28.8 KB/day.
1 sample every 10 seconds: 172.8 KB/day.
1 sample every second: 1.73 MB/day.
1 sample every 100 ms (10 Hz): 17.3 MB/day.
Design takeaway: Storage and bandwidth scale linearly with sampling rate, so a 10x faster sampling rate means a 10x larger budget.
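The per-rate figures above follow from one multiplication. The sketch below recomputes them under the same assumption of 20 bytes per sample and decimal units (1 KB = 1,000 bytes); the helper names are illustrative.

```python
BYTES_PER_SAMPLE = 20  # timestamp + sensor value + metadata, as assumed above


def daily_bytes(samples_per_day, bytes_per_sample=BYTES_PER_SAMPLE):
    """Daily data volume in bytes for a single sensor."""
    return samples_per_day * bytes_per_sample


def human(n_bytes):
    """Format a byte count using decimal units (1 KB = 1000 B)."""
    for unit, size in (("GB", 1e9), ("MB", 1e6), ("KB", 1e3)):
        if n_bytes >= size:
            return f"{n_bytes / size:.2f} {unit}"
    return f"{n_bytes:.0f} B"


schedules = [
    ("1 per hour", 24),
    ("1 per minute", 1_440),
    ("1 per 10 s", 8_640),
    ("1 per second", 86_400),
    ("10 Hz", 864_000),
]

for label, samples in schedules:
    per_day = daily_bytes(samples)
    print(f"{label:>13}: {human(per_day)}/day, {human(per_day * 365)}/year")
```

Multiplying the yearly figure by the fleet size gives the fleet-level numbers, for example 1,000 sensors at 10 Hz is roughly 6.3 TB/year.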

Data Volume by Device Type

Data generation rates vary dramatically by device type, so each category needs its own edge processing strategy, chosen from its data rate and the value of raw versus processed data.

39.7.1 Inertial Measurement Example

High-frequency sensors like accelerometers and gyroscopes demonstrate why edge aggregation is critical:

Time-series plots showing raw accelerometer and gyroscope sensor data from a motion tracking device. Top panel displays 3-axis accelerometer readings in g-forces with X, Y, Z traces showing device orientation changes and movement events. Bottom panel shows 3-axis gyroscope readings in degrees per second capturing rotational velocity. Both plots span approximately 10 seconds of continuous sampling at 100 Hz, demonstrating the high-frequency nature of inertial measurement unit data and the need for efficient edge aggregation to reduce transmission bandwidth. — Figure 39.4: Accelerometer and gyroscope example sensor data

39.7.1.1 Mobile figure summary

Session snapshot: 2017-06-30, approximately 5 Hz, with 6 recent samples shown from a wearable IMU session.
Accelerometer axes: X, Y, and Z linear acceleration are tracked together to capture movement and gravity.
Gyroscope axes: X, Y, and Z angular velocity show how the device rotates over time.
Edge takeaway: High-frequency six-axis data is useful locally, but the transmission layer should send aggregated summaries instead of every raw sample.

At 100 Hz sampling across 6 axes (3 accelerometer + 3 gyroscope), an IMU generates 600 samples/second. Transmitting raw data as 16-bit integers would require ~1.2 KB/s – unsustainable for battery-powered devices on LPWAN networks. Edge aggregation reduces this to statistical summaries at 1 Hz.

Putting Numbers to It

How much bandwidth does edge aggregation save for IMU data?

Raw transmission (no edge processing):

Sampling rate: 100 Hz per axis × 6 axes = 600 samples/sec
Data size: 2 bytes per sample (int16) × 600 = 1200 bytes/sec ≈ 1.2 KB/s
Daily volume: \(1200 \text{ B/s} \times 86400 \text{ s/day} \approx 103.7 \text{ MB/day}\)
LoRa constraint: 1% duty cycle at SF7 allows ~250 bytes/minute ≈ 4.2 bytes/sec → Raw transmission exceeds capacity by about 286×

Edge aggregation (1 Hz statistical summaries):

Window: 100 samples (1 second) per axis
Summary: 2 values per accel axis (RMS, peak) + 1 value per gyro axis (RMS) = 9 values
Data size: 2 bytes × 9 values = 18 bytes per second
Daily volume: \(18 \text{ bytes/s} \times 86400 = 1.56 \text{ MB/day}\)
Bandwidth reduction: \(\frac{1200 \text{ B/s}}{18 \text{ B/s}} = 67\times\)

For LoRa deployment, batch the aggregated summaries and check each schedule against the 1% duty cycle, assuming roughly 0.6 s of airtime per transmission at SF7:

Every 10 seconds: 18 bytes/s × 10 s = 180 bytes per transmission, 6 transmissions/minute = 360 transmissions/hour. Airtime is 360 × 0.6 s = 3.6 min/hour, a 6% duty cycle that exceeds the 1% limit.
Every 60 seconds: 108 bytes per transmission (six 18-byte summaries, which implies coarsening the summary window to 10 s), 1 transmission/minute. Airtime is 0.6 s / 60 s = 1% duty cycle, which is borderline.
Every 120 seconds: 216 bytes per transmission (twelve 18-byte summaries), 0.5 transmissions/minute. Airtime is 0.6 s / 120 s = 0.5% duty cycle, well within the 1% limit.
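The duty-cycle check above is simple enough to script. This sketch treats the ~0.6 s airtime per transmission as a fixed assumption taken from the text; real airtime depends on payload size, spreading factor, bandwidth, and coding rate, so a deployment should compute it from the radio's time-on-air formula instead.

```python
import math

AIRTIME_S = 0.6     # assumed airtime per transmission at SF7 (figure from the text)
DUTY_LIMIT = 0.01   # 1% duty-cycle limit


def duty_cycle(interval_s, airtime_s=AIRTIME_S):
    """Fraction of time spent transmitting when sending once per interval."""
    return airtime_s / interval_s


def verdict(dc, limit=DUTY_LIMIT):
    """Classify a duty cycle against the limit, tolerating float rounding."""
    if math.isclose(dc, limit):
        return "borderline"
    return "within limit" if dc < limit else "exceeds limit"


for interval in (10, 60, 120):
    dc = duty_cycle(interval)
    print(f"every {interval:>3} s: duty cycle {dc:.1%} -> {verdict(dc)}")
```

Under these assumptions the shortest compliant interval is the airtime divided by the limit, about 60 s, which is why the 120 s schedule leaves comfortable headroom.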

Edge processing is mandatory for battery-powered IMU devices on LPWAN networks.

39.7.2 Edge Bandwidth Calculator

Use the sliders below to explore how sampling rate, number of axes, and aggregation window affect raw versus aggregated data rates. Observe how quickly raw data exceeds LPWAN capacity.

viewof edgeAcqSampleRate = Inputs.range([10, 1000], {value: 100, step: 10, label: "Sampling rate (Hz)"})
viewof edgeAcqAxes = Inputs.range([1, 12], {value: 6, step: 1, label: "Number of axes"})
viewof edgeAcqBytesPerSample = Inputs.select([2, 4, 8], {value: 2, label: "Bytes per sample"})
viewof edgeAcqSummaryValues = Inputs.range([1, 30], {value: 9, step: 1, label: "Summary values per window"})
viewof edgeAcqWindowSec = Inputs.range([1, 120], {value: 1, step: 1, label: "Aggregation window (sec)"})

edgeAcqCalc = {
  const rawBytesPerSec = edgeAcqSampleRate * edgeAcqAxes * edgeAcqBytesPerSample;
  const summaryBytesPerWindow = edgeAcqSummaryValues * edgeAcqBytesPerSample;
  const summaryBytesPerSec = summaryBytesPerWindow / edgeAcqWindowSec;
  const reductionRatio = rawBytesPerSec / summaryBytesPerSec;
  const rawDailyMB = (rawBytesPerSec * 86400) / 1e6;
  const summaryDailyMB = (summaryBytesPerSec * 86400) / 1e6;
  const loraCapacity = 4.2;
  const loraExceedFactor = rawBytesPerSec / loraCapacity;
  const loraAggExceed = summaryBytesPerSec / loraCapacity;

  return html`<div style="background: var(--bs-light, #f8f9fa); border: 1px solid var(--bs-border-color, #dee2e6); border-radius: 8px; padding: 1.2rem; color: var(--bs-body-color); font-family: Arial, sans-serif;">
    <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; margin-bottom: 1rem;">
      <div style="background: var(--bs-body-bg); border-radius: 6px; padding: 1rem; border-left: 4px solid #E74C3C;">
        <div style="font-size: 0.85rem; color: #7F8C8D; text-transform: uppercase;">Raw Data Rate</div>
        <div style="font-size: 1.6rem; font-weight: bold; color: #E74C3C;">${rawBytesPerSec.toLocaleString()} B/s</div>
        <div style="font-size: 0.85rem; color: var(--bs-body-color);">${rawDailyMB.toFixed(1)} MB/day</div>
        <div style="font-size: 0.8rem; color: #E74C3C; margin-top: 0.3rem;">LoRa: exceeds capacity by ${loraExceedFactor.toFixed(0)}x</div>
      </div>
      <div style="background: var(--bs-body-bg); border-radius: 6px; padding: 1rem; border-left: 4px solid #16A085;">
        <div style="font-size: 0.85rem; color: #7F8C8D; text-transform: uppercase;">Aggregated Data Rate</div>
        <div style="font-size: 1.6rem; font-weight: bold; color: #16A085;">${summaryBytesPerSec.toFixed(1)} B/s</div>
        <div style="font-size: 0.85rem; color: var(--bs-body-color);">${summaryDailyMB.toFixed(2)} MB/day</div>
        <div style="font-size: 0.8rem; color: ${loraAggExceed <= 1 ? '#16A085' : '#E67E22'}; margin-top: 0.3rem;">LoRa: ${loraAggExceed <= 1 ? 'fits within capacity' : `still exceeds by ${loraAggExceed.toFixed(1)}x`}</div>
      </div>
    </div>
    <div style="text-align: center; padding: 0.8rem; background: var(--bs-body-bg); border-radius: 6px;">
      <span style="font-size: 1.1rem; font-weight: bold; color: #2C3E50;">Bandwidth Reduction: </span>
      <span style="font-size: 1.4rem; font-weight: bold; color: #3498DB;">${reductionRatio.toFixed(0)}x</span>
      <span style="font-size: 0.85rem; color: #7F8C8D; margin-left: 0.5rem;">(${((1 - 1/reductionRatio) * 100).toFixed(1)}% less data)</span>
    </div>
  </div>`;
}

39.8 Power Budget Decision Framework

Time: ~5 min | Difficulty: Intermediate | Reference: P10.C08.U02c

Device capabilities directly impact acquisition strategies. The key decision point is power source:

Mains-powered devices (factory equipment, building systems): Can sample continuously and transmit frequently – edge processing focuses on bandwidth reduction, not power savings
Battery-powered devices (field sensors, wearables): Must duty-cycle both sampling and transmission – edge processing is essential to extend battery life from days to years
Energy-harvesting devices (solar-powered nodes): Operate with variable power budgets – edge processing must adapt to available energy
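The three cases above can be written as a small decision function. This is a sketch only: the `PowerSource` enum, the returned labels, and especially the 30% stored-energy cut-off for harvesting nodes are hypothetical values chosen for illustration, not thresholds from the chapter.

```python
from enum import Enum


class PowerSource(Enum):
    MAINS = "mains"
    BATTERY = "battery"
    HARVESTING = "energy_harvesting"


LOW_ENERGY_FRACTION = 0.3  # hypothetical cut-off for a harvesting node


def acquisition_plan(source, stored_energy_fraction=1.0):
    """Pick a sampling mode and the main goal of edge processing."""
    if source is PowerSource.MAINS:
        # Power is not the constraint, so edge processing targets bandwidth.
        return {"sampling": "continuous", "edge_goal": "bandwidth reduction"}
    if source is PowerSource.BATTERY:
        # Both sampling and transmission are duty-cycled to save energy.
        return {"sampling": "duty-cycled", "edge_goal": "battery life"}
    # Energy harvesting: adapt the rate to whatever energy is available.
    if stored_energy_fraction < LOW_ENERGY_FRACTION:
        sampling = "duty-cycled (reduced rate)"
    else:
        sampling = "duty-cycled (normal rate)"
    return {"sampling": sampling, "edge_goal": "adapt to available energy"}


for source in PowerSource:
    print(source.value, acquisition_plan(source, stored_energy_fraction=0.2))
```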

Power Budget Decision Tree

Figure: decision tree for selecting a duty cycle from the power constraints above (mains, battery, or energy harvesting).

39.9 IMU Edge Aggregation Pipeline

This Python example demonstrates edge aggregation for the inertial measurement use case discussed above. A 100 Hz IMU produces 600 samples/second across 6 axes. Transmitting raw data is unsustainable, so the edge pipeline computes 1 Hz statistical summaries (RMS and peak per accelerometer axis, RMS per gyroscope axis), reducing bandwidth by ~67x:

import math
import time

class IMUEdgeAggregator:
    """Aggregate high-frequency IMU data into 1 Hz statistical summaries.

    Reduces 600 samples/second (100 Hz x 6 axes) to 9 summary values
    per second, cutting transmission from 1200 bytes/s to 18 bytes/s
    (67x reduction).
    """
    def __init__(self, sample_rate_hz=100, window_sec=1):
        self.sample_rate = sample_rate_hz
        self.window_size = sample_rate_hz * window_sec
        self.buffer_accel = {"x": [], "y": [], "z": []}
        self.buffer_gyro = {"x": [], "y": [], "z": []}

    def add_sample(self, ax, ay, az, gx, gy, gz):
        """Add one raw IMU sample (called at 100 Hz)."""
        self.buffer_accel["x"].append(ax)
        self.buffer_accel["y"].append(ay)
        self.buffer_accel["z"].append(az)
        self.buffer_gyro["x"].append(gx)
        self.buffer_gyro["y"].append(gy)
        self.buffer_gyro["z"].append(gz)

    def _rms(self, values):
        """Root mean square -- captures vibration energy."""
        if not values:
            return 0.0
        return math.sqrt(sum(v * v for v in values) / len(values))

    def _peak(self, values):
        """Peak absolute value -- detects impacts."""
        if not values:
            return 0.0
        return max(abs(v) for v in values)

    def window_ready(self):
        """Check if enough samples collected for one summary."""
        return len(self.buffer_accel["x"]) >= self.window_size

    def compute_summary(self):
        """Compute 1 Hz summary from buffered samples.

        Returns dict with RMS and peak for each axis -- enough
        to detect vibration anomalies without raw data.
        """
        summary = {"timestamp": int(time.time()), "samples": self.window_size}
        for axis in ["x", "y", "z"]:
            accel = self.buffer_accel[axis][:self.window_size]
            gyro = self.buffer_gyro[axis][:self.window_size]
            summary[f"accel_{axis}_rms"] = round(self._rms(accel), 4)
            summary[f"accel_{axis}_peak"] = round(self._peak(accel), 4)
            summary[f"gyro_{axis}_rms"] = round(self._rms(gyro), 2)

        # Clear processed samples
        for axis in ["x", "y", "z"]:
            self.buffer_accel[axis] = self.buffer_accel[axis][self.window_size:]
            self.buffer_gyro[axis] = self.buffer_gyro[axis][self.window_size:]
        return summary

    def estimate_bandwidth(self):
        """Compare raw vs aggregated data rates."""
        raw_bytes_per_sec = self.sample_rate * 6 * 2  # 6 axes, 2 bytes each
        summary_bytes = 9 * 2  # 9 summary values, 2 bytes each, per window (equals B/s only when window_sec=1)
        ratio = raw_bytes_per_sec / summary_bytes
        return {
            "raw_bytes_per_sec": raw_bytes_per_sec,
            "summary_bytes_per_sec": summary_bytes,
            "reduction_ratio": f"{ratio:.0f}x",
        }

# Simulate: factory motor vibration monitoring
agg = IMUEdgeAggregator(sample_rate_hz=100, window_sec=1)

# Feed 100 simulated samples (1 second of data)
import random
for i in range(100):
    # Normal vibration: small accelerations around 0g with noise
    agg.add_sample(
        ax=random.gauss(0, 0.05), ay=random.gauss(0, 0.05),
        az=random.gauss(1.0, 0.03),  # 1g gravity on Z
        gx=random.gauss(0, 2), gy=random.gauss(0, 2),
        gz=random.gauss(0, 1)
    )

if agg.window_ready():
    summary = agg.compute_summary()
    print("1-second summary (transmitted via LoRa):")
    for key, val in summary.items():
        print(f"  {key}: {val}")

bw = agg.estimate_bandwidth()
print(f"\nBandwidth: {bw['raw_bytes_per_sec']} B/s raw -> "
      f"{bw['summary_bytes_per_sec']} B/s summary = "
      f"{bw['reduction_ratio']} reduction")
# Example output (input is random and unseeded, so exact values vary per run):
# 1-second summary (transmitted via LoRa):
#   timestamp: 1738900000
#   samples: 100
#   accel_x_rms: 0.0498
#   accel_x_peak: 0.1523
#   accel_y_rms: 0.0512
#   accel_y_peak: 0.1389
#   accel_z_rms: 1.0004
#   accel_z_peak: 1.0891
#   gyro_x_rms: 1.98
#   gyro_y_rms: 2.05
#   gyro_z_rms: 0.99
#
# Bandwidth: 1200 B/s raw -> 18 B/s summary = 67x reduction

The edge device transmits only RMS (vibration energy) and peak (impact detection) values at 1 Hz instead of raw waveforms at 100 Hz. A sudden increase in accel_x_rms from 0.05g to 0.5g flags a developing bearing fault without requiring cloud-side waveform analysis.
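A threshold check on the transmitted summaries might look like the sketch below. The 0.5g threshold comes from the text; requiring several consecutive windows above it is an added, hypothetical debounce step to avoid alerting on a single impact, and the function names are illustrative.

```python
FAULT_RMS_G = 0.5  # fault threshold from the text


def over_threshold(summary, axis="x", threshold_g=FAULT_RMS_G):
    """True when one window's RMS on the given axis reaches the threshold."""
    return summary[f"accel_{axis}_rms"] >= threshold_g


def sustained_fault(summaries, needed=3):
    """Alert only after `needed` consecutive windows over threshold.

    The consecutive-window requirement is an assumption for illustration.
    """
    streak = 0
    for summary in summaries:
        streak = streak + 1 if over_threshold(summary) else 0
        if streak >= needed:
            return True
    return False


normal = [{"accel_x_rms": 0.05}] * 5
single_spike = normal + [{"accel_x_rms": 0.6}] + normal
developing_fault = normal + [{"accel_x_rms": 0.52}] * 3

print(sustained_fault(normal))            # baseline vibration only
print(sustained_fault(single_spike))      # one window over threshold
print(sustained_fault(developing_fault))  # three windows in a row over threshold
```

Because the check runs on 1 Hz summaries, it can execute on the edge device or the gateway without access to the raw waveform.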

Checkpoint: Data Volume and Aggregation

You now know:

A 10 Hz sensor can become 17 MB/day and 6.3 GB/year, so fleet scale turns small samples into a storage budget.
A 100 Hz, 6-axis IMU creates 600 samples/second, or about 1.2 KB/s before local aggregation.
Summarising to 1 Hz cuts the IMU stream from 1200 B/s to 18 B/s, a 67x reduction, before any cloud upload.

Those rates only work if the device has enough energy and link budget.

39.10 Knowledge Check

39.11 Quiz: Device Categories

39.11.1 Fonterra Edge Acquisition

Scenario: Fonterra, New Zealand’s largest dairy cooperative, deploys IoT sensors across 200 milking sheds to monitor milk quality and cow health in real time. Each shed has a mix of device categories requiring different acquisition strategies.

Given:

200 sheds, each with the following devices:
- 1 SCADA controller (Big Thing) – monitors vat temperature, logs milk volume
- 8 IP cameras (Small IP Things) – 1080p at 15 fps for mastitis detection via udder imaging
- 40 Non-IP sensors per shed: 20 milk flow meters (Modbus RTU), 10 temperature probes (4-20 mA), 10 cow ID readers (RFID 134.2 kHz)
Rural connectivity: 4G LTE at NZD 12/GB, typical 15 Mbps downlink / 5 Mbps uplink
Power: Mains-powered shed, solar-powered paddock sensors

Step 1: Classify devices and estimate raw data

Device	Category	Count (per shed)	Raw Data Rate	Daily Raw (per shed)
SCADA controller	Big Thing	1	50 KB/hour	1.2 MB
IP cameras	Small IP Things	8	6.75 MB/min each	77.8 GB
Milk flow meters	Non-IP Things	20	2 readings/sec x 4 bytes	14 MB
Temperature probes	Non-IP Things	10	1 reading/10 sec x 2 bytes	0.17 MB
RFID readers	Non-IP Things	10	Event-based, ~400 events/day x 12 bytes	0.05 MB
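The daily raw volumes in the table can be recomputed from the stated rates. This sketch uses decimal units (1 MB = 1,000,000 bytes) and reads the RFID figure as roughly 400 events per day per reader, which is an interpretation of the table rather than something it states outright.

```python
SECONDS_PER_DAY = 86_400
MINUTES_PER_DAY = 1_440

# Daily raw volume per shed, in MB
raw_mb_per_day = {
    # 1 controller at 50 KB/hour
    "SCADA controller": 50e3 * 24 / 1e6,
    # 8 cameras at 6.75 MB/min each
    "IP cameras": 8 * 6.75 * MINUTES_PER_DAY,
    # 20 meters, 2 readings/s, 4 bytes per reading
    "Milk flow meters": 20 * 2 * 4 * SECONDS_PER_DAY / 1e6,
    # 10 probes, 1 reading every 10 s, 2 bytes per reading
    "Temperature probes": 10 * 2 * (SECONDS_PER_DAY / 10) / 1e6,
    # 10 readers, ~400 events/day each (assumed per reader), 12 bytes per event
    "RFID readers": 10 * 400 * 12 / 1e6,
}

total_mb = sum(raw_mb_per_day.values())
for name, mb in raw_mb_per_day.items():
    print(f"{name:<20} {mb:>12.3f} MB/day ({mb / total_mb:.2%} of total)")
print(f"{'Total':<20} {total_mb:>12.3f} MB/day")
```

The share column makes the point of the scenario visible: the cameras account for nearly all of the raw volume, so they are where edge processing pays off.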

Step 2: Design acquisition strategy per category

Category	Strategy	Edge Processing	Transmitted Data
Big Thing (SCADA)	Direct IP upload	None needed – data already structured	1.2 MB/day (as-is)
Small IP (cameras)	Edge ML inference	Run mastitis detection model on gateway; transmit only flagged frames plus 10-second clips	780 MB/day (99% reduction)
Non-IP (flow meters)	Gateway aggregation	Aggregate per-cow milking session (start, end, total litres, peak flow)	0.4 MB/day (97% reduction)
Non-IP (temp probes)	Gateway with threshold filter	Transmit only if outside 2-6 °C (milk safety range)	0.008 MB/day (95% reduction)
Non-IP (RFID)	Gateway protocol translation	Translate RFID events to MQTT messages with cow ID plus timestamp	0.05 MB/day (as-is)
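As one example of the gateway strategies in the table, the temperature threshold filter can be sketched in a few lines. The 2-6 °C range comes from the table; the function name, the `(timestamp, temp_c)` tuple layout, and the sample readings are hypothetical.

```python
SAFE_MIN_C = 2.0  # lower bound of the milk safety range
SAFE_MAX_C = 6.0  # upper bound of the milk safety range


def readings_to_transmit(readings):
    """Keep only (timestamp, temp_c) readings outside the safe range."""
    return [
        (ts, temp_c)
        for ts, temp_c in readings
        if not (SAFE_MIN_C <= temp_c <= SAFE_MAX_C)
    ]


# Illustrative vat readings taken every 10 seconds
readings = [
    (0, 4.1),
    (10, 4.3),
    (20, 6.4),   # too warm
    (30, 5.9),
    (40, 1.7),   # too cold
]

alerts = readings_to_transmit(readings)
print(alerts)
print(f"transmitted {len(alerts)} of {len(readings)} readings")
```

A production filter would likely also send a periodic heartbeat so that silence can be distinguished from a failed probe; that is a design suggestion, not part of the scenario.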

Step 3: Calculate connectivity costs

Metric	Without Edge Processing	With Edge Processing	Savings
Daily data per shed	77.8 GB	782 MB	99%
Monthly 4G cost per shed	NZD 28,000	NZD 282	NZD 27,718
Monthly cost (200 sheds)	NZD 5.6M	NZD 56,400	NZD 5.54M
Gateway hardware (200 sheds)	–	NZD 180,000 one-time	Payback: 1 day
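The cost table follows from the daily volumes, the NZD 12/GB tariff, and a 30-day month. The sketch below recomputes it; small differences from the table (for example NZD 56,304 versus NZD 56,400 for 200 sheds) come from the table rounding the per-shed cost before multiplying.

```python
COST_PER_GB_NZD = 12
DAYS_PER_MONTH = 30
SHEDS = 200
GATEWAY_COST_NZD = 900  # per shed, one-time


def monthly_cost(daily_gb, sheds=1):
    """Monthly 4G cost in NZD for a given daily volume per shed."""
    return daily_gb * DAYS_PER_MONTH * COST_PER_GB_NZD * sheds


raw_per_shed = monthly_cost(77.8)      # without edge processing
edge_per_shed = monthly_cost(0.782)    # with edge processing (782 MB/day)
raw_total = monthly_cost(77.8, SHEDS)
edge_total = monthly_cost(0.782, SHEDS)

daily_savings = (raw_total - edge_total) / DAYS_PER_MONTH
payback_days = GATEWAY_COST_NZD * SHEDS / daily_savings

print(f"Per shed:  NZD {raw_per_shed:,.0f} -> NZD {edge_per_shed:,.0f} per month")
print(f"200 sheds: NZD {raw_total:,.0f} -> NZD {edge_total:,.0f} per month")
print(f"Gateway payback: {payback_days:.2f} days")
```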

39.11.2 Edge Processing Savings Calculator

Adjust the parameters below to see how edge processing affects connectivity costs for a multi-shed IoT deployment. The dominant cost driver is typically the highest-bandwidth device (cameras).

viewof edgeCostSheds = Inputs.range([1, 500], {value: 200, step: 1, label: "Number of sheds"})
viewof edgeCostCameras = Inputs.range([0, 20], {value: 8, step: 1, label: "IP cameras per shed"})
viewof edgeCostCameraRateMB = Inputs.range([1, 20], {value: 6.75, step: 0.25, label: "Camera data rate (MB/min)"})
viewof edgeCostPerGB = Inputs.range([1, 50], {value: 12, step: 1, label: "Connectivity cost ($/GB)"})
viewof edgeCostReduction = Inputs.range([80, 99.9], {value: 99, step: 0.1, label: "Edge reduction (%)"})
viewof edgeCostGatewayPrice = Inputs.range([100, 3000], {value: 900, step: 50, label: "Gateway cost per shed ($)"})

edgeCostCalc = {
  const dailyRawGB = edgeCostCameras * edgeCostCameraRateMB * 60 * 24 / 1000;
  const dailyEdgeGB = dailyRawGB * (1 - edgeCostReduction / 100);
  const monthlyRawCost = dailyRawGB * 30 * edgeCostPerGB;
  const monthlyEdgeCost = dailyEdgeGB * 30 * edgeCostPerGB;
  const totalMonthlyRaw = monthlyRawCost * edgeCostSheds;
  const totalMonthlyEdge = monthlyEdgeCost * edgeCostSheds;
  const totalGatewayCost = edgeCostGatewayPrice * edgeCostSheds;
  const monthlySavings = totalMonthlyRaw - totalMonthlyEdge;
  const paybackDays = monthlySavings > 0 ? totalGatewayCost / (monthlySavings / 30) : Infinity;

  return html`<div style="background: var(--bs-light, #f8f9fa); border: 1px solid var(--bs-border-color, #dee2e6); border-radius: 8px; padding: 1.2rem; color: var(--bs-body-color); font-family: Arial, sans-serif;">
    <div class="edge-acq-metric-grid">
      <div class="edge-acq-metric-card" style="background: var(--bs-body-bg); border-radius: 6px; padding: 0.8rem; border-left: 4px solid #E74C3C;">
        <div style="font-size: 0.75rem; color: #7F8C8D; text-transform: uppercase;">Without Edge (monthly)</div>
        <div style="font-size: 1.3rem; font-weight: bold; color: #E74C3C; overflow-wrap: anywhere;">$${totalMonthlyRaw >= 1e6 ? (totalMonthlyRaw/1e6).toFixed(1) + 'M' : totalMonthlyRaw.toLocaleString(undefined, {maximumFractionDigits: 0})}</div>
        <div style="font-size: 0.75rem; color: var(--bs-body-color); overflow-wrap: anywhere;">${(dailyRawGB).toFixed(1)} GB/day/shed</div>
      </div>
      <div class="edge-acq-metric-card" style="background: var(--bs-body-bg); border-radius: 6px; padding: 0.8rem; border-left: 4px solid #16A085;">
        <div style="font-size: 0.75rem; color: #7F8C8D; text-transform: uppercase;">With Edge (monthly)</div>
        <div style="font-size: 1.3rem; font-weight: bold; color: #16A085; overflow-wrap: anywhere;">$${totalMonthlyEdge >= 1e6 ? (totalMonthlyEdge/1e6).toFixed(1) + 'M' : totalMonthlyEdge.toLocaleString(undefined, {maximumFractionDigits: 0})}</div>
        <div style="font-size: 0.75rem; color: var(--bs-body-color); overflow-wrap: anywhere;">${(dailyEdgeGB * 1000).toFixed(0)} MB/day/shed</div>
      </div>
      <div class="edge-acq-metric-card" style="background: var(--bs-body-bg); border-radius: 6px; padding: 0.8rem; border-left: 4px solid #3498DB;">
        <div style="font-size: 0.75rem; color: #7F8C8D; text-transform: uppercase;">Gateway Payback</div>
        <div style="font-size: 1.3rem; font-weight: bold; color: #3498DB; overflow-wrap: anywhere;">${paybackDays === Infinity ? 'N/A' : paybackDays < 1 ? '<1 day' : paybackDays.toFixed(0) + ' days'}</div>
        <div style="font-size: 0.75rem; color: var(--bs-body-color); overflow-wrap: anywhere;">$${totalGatewayCost.toLocaleString()} total HW</div>
      </div>
    </div>
    <div style="text-align: center; padding: 0.6rem; background: var(--bs-body-bg); border-radius: 6px;">
      <span style="font-size: 0.95rem; font-weight: bold; color: #2C3E50;">Monthly Savings (${edgeCostSheds} sheds): </span>
      <span style="font-size: 1.2rem; font-weight: bold; color: #E67E22;">$${monthlySavings >= 1e6 ? (monthlySavings/1e6).toFixed(1) + 'M' : monthlySavings.toLocaleString(undefined, {maximumFractionDigits: 0})}</span>
    </div>
  </div>`;
}

Result: A single Raspberry Pi 4 gateway (NZD 900 with edge ML accelerator) per shed handles all three device categories: protocol translation for Non-IP sensors, video analytics for IP cameras, and pass-through for the SCADA controller. The 99% data reduction makes rural 4G connectivity economically viable.

Key Insight: The three device categories in Fonterra’s deployment map directly to three gateway functions: Big Things need routing (IP to IP), Small IP Things need edge inference (to reduce high-bandwidth streams), and Non-IP Things need protocol translation (Modbus/4-20 mA/RFID to MQTT). A single edge gateway serves all three roles, and the dominant cost driver is the highest-bandwidth device category (cameras, in this case).
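The payback arithmetic behind this result can be sketched in a few lines. This is a minimal sketch, not the chapter's calculator: the function name `gateway_payback` and its parameter names are hypothetical, the 30-day month mirrors the widget code above, and the inputs are the chapter's Fonterra figures (200 sheds, 77.8 GB/day raw, 782 MB/day after edge processing, NZD 12/GB, NZD 900 per gateway).

```python
def gateway_payback(sheds, raw_gb_day, edge_gb_day, nzd_per_gb,
                    gateway_nzd, days_per_month=30):
    """Return monthly costs, savings, hardware cost, and payback in days."""
    monthly_raw = raw_gb_day * days_per_month * sheds * nzd_per_gb
    monthly_edge = edge_gb_day * days_per_month * sheds * nzd_per_gb
    monthly_savings = monthly_raw - monthly_edge
    hardware = gateway_nzd * sheds
    # No payback exists if edge processing does not reduce the bill.
    if monthly_savings > 0:
        payback_days = hardware / (monthly_savings / days_per_month)
    else:
        payback_days = float("inf")
    return {
        "monthly_raw": monthly_raw,
        "monthly_edge": monthly_edge,
        "monthly_savings": monthly_savings,
        "hardware": hardware,
        "payback_days": payback_days,
    }


# Chapter figures; 782 MB/day is entered as 0.782 GB/day (1 GB = 1000 MB).
result = gateway_payback(sheds=200, raw_gb_day=77.8, edge_gb_day=0.782,
                         nzd_per_gb=12, gateway_nzd=900)
print(result)
```

With these inputs the monthly costs land close to the chapter's rounded NZD 5.6M and NZD 56,400, and the payback comes out just under one day, which the calculator above displays as "<1 day".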

Checkpoint: Gateway Economics

You now know:

In the Fonterra scenario, 200 sheds combine SCADA, 8 IP cameras, and 40 Non-IP sensors per shed.
Edge processing cuts camera-heavy traffic from 77.8 GB/day per shed to 782 MB/day per shed, a reduction of about 99%.
At NZD 12/GB, the monthly cost drops from NZD 5.6M to NZD 56,400, so the NZD 180,000 of gateway hardware (200 sheds at NZD 900 each) pays back in about one day.

The quizzes now ask you to match the same categories, ordering, and gateway choices.

39.12 Interactive Quiz: Match Concepts

39.13 Interactive Quiz: Sequence the Steps

Common Pitfalls

Edge Acquisition Storage Tiers

Edge acquisition design does not stop at the gateway. Once data moves from “in motion” to “at rest”, define retention tiers and economics explicitly:

Tier	Typical retention	Stored form	Design purpose
Hot	Days to weeks	Raw edge records or short-window summaries	Dashboards, incident review, and replay.
Warm	Months to one year	Hourly or shift-level aggregates	Trend analysis and model features.
Cold	Multi-year	Daily aggregates or compressed event archives	Compliance, audit, and long-horizon planning.

The storage plan should be tied to a TCO calculation. Include gateway hardware, installation, cellular/cloud operations, replacements, and maintenance. Then compare those costs with avoided cloud ingestion, reduced truck rolls, lower battery replacement, and faster fault detection. If the edge system only shifts cloud cost into unmanaged local maintenance, the architecture is not actually cheaper.

Retention policies should be executable, not just documented. A robust pipeline states when to delete hot records, when to roll them into hourly aggregates, when to archive cold summaries, and how to answer queries from each tier without surprising latency.
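One way to make a retention policy executable is to express it as a function from record age to an action. The sketch below assumes specific limits (14 days hot, one year warm, seven years cold) purely for illustration, since the table gives only ranges; the action names and `retention_action` itself are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed limits for illustration; the tier table specifies ranges only.
HOT_LIMIT = timedelta(days=14)        # "days to weeks"
WARM_LIMIT = timedelta(days=365)      # "months to one year"
COLD_LIMIT = timedelta(days=365 * 7)  # "multi-year"


def retention_action(recorded_at, now):
    """Map a record's age to the action the pipeline should take."""
    age = now - recorded_at
    if age <= HOT_LIMIT:
        return "keep_hot"        # raw records stay queryable
    if age <= WARM_LIMIT:
        return "rollup_hourly"   # replace raw records with hourly aggregates
    if age <= COLD_LIMIT:
        return "archive_daily"   # keep daily aggregates or compressed events
    return "delete"


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
for days in (3, 60, 800, 3000):
    print(days, retention_action(now - timedelta(days=days), now))
```

A scheduled job can run this function over each partition and act on the result, so the documented policy and the enforced policy cannot drift apart.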

Use Sensor Interrupts

Busy-loop polling wastes CPU cycles and prevents the processor from entering low-power sleep states. Use hardware timer interrupts or DMA to trigger sensor reads, allowing the MCU to sleep between samples.

Define the Data Budget First

The architecture must be designed backwards from the bandwidth constraint: start with the available link budget, determine how many bytes per second can be transmitted, then design sampling rates and pre-aggregation to fit within that budget.
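Working backwards from the link can be sketched as two small calculations: the per-sensor sample rate the link allows, and the aggregation factor needed to fit a desired rate into it. Everything numeric here is an assumption for illustration (a 1000 B/s link, 50% headroom for protocol overhead and retries, 16-byte samples), apart from the 40 sensors per shed taken from the Fonterra scenario. The function names are hypothetical.

```python
import math


def max_sample_rate_hz(link_bytes_per_s, n_sensors, bytes_per_sample,
                       headroom=0.5):
    """Highest per-sensor transmit rate that fits the usable link budget."""
    usable = link_bytes_per_s * headroom  # reserve the rest for overhead
    return usable / (n_sensors * bytes_per_sample)


def aggregation_factor(desired_rate_hz, allowed_rate_hz):
    """How many raw samples must collapse into one transmitted value."""
    return max(1, math.ceil(desired_rate_hz / allowed_rate_hz))


allowed = max_sample_rate_hz(link_bytes_per_s=1000, n_sensors=40,
                             bytes_per_sample=16)
factor = aggregation_factor(desired_rate_hz=10, allowed_rate_hz=allowed)
print(allowed, factor)
```

Under these assumed numbers the link supports well under 1 Hz per sensor, so sampling at 10 Hz locally only works if the gateway pre-aggregates before transmitting.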

Plan Timestamp Accuracy

When sensor readings from different buses are acquired at slightly different times due to software scheduling delays, fusing them without correcting for the time offset produces incorrect results. Use hardware timestamps from a shared timer source.
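Once both streams carry timestamps from the same timer, one stream can be resampled onto the other's sample times before fusion. The sketch below uses linear interpolation, which is one possible correction and not necessarily the one the chapter intends. The tick values, stream contents, and function names are made up for illustration.

```python
from bisect import bisect_left


def interpolate_at(t, times, values):
    """Linearly interpolate (times, values) at t; times must be ascending."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the sampled range")
    i = bisect_left(times, t)
    if times[i] == t:
        return values[i]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def fuse(times_a, values_a, times_b, values_b):
    """Pair each sample of stream A with stream B resampled at A's time."""
    fused = []
    for t, a in zip(times_a, values_a):
        # Skip samples where B has no surrounding data to interpolate from.
        if times_b[0] <= t <= times_b[-1]:
            fused.append((t, a, interpolate_at(t, times_b, values_b)))
    return fused


# Hypothetical shared-timer ticks (microseconds); B lags A by 3 ms.
accel_t, accel_v = [0, 10_000, 20_000], [0.05, 0.06, 0.07]
temp_t, temp_v = [3_000, 13_000, 23_000], [20.0, 21.0, 22.0]
fused = fuse(accel_t, accel_v, temp_t, temp_v)
print(fused)
```

The first accelerometer sample is dropped rather than extrapolated, because no temperature reading precedes it.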

Design for Sensor Hot-Swapping

In industrial deployments, sensors fail and are replaced while the system is running. Design the acquisition layer to detect newly attached sensors at startup or during operation, and to handle a missing sensor gracefully rather than crashing.
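A minimal sketch of hot-swap tolerance follows, assuming the bus scan yields a mapping from sensor ID to a read function. The `SensorRegistry` class, the sensor IDs, and the use of `OSError` to signal an unplugged device are all hypothetical; a real driver may report absence differently.

```python
class SensorRegistry:
    """Tracks attached sensors and tolerates ones that disappear."""

    def __init__(self):
        self.active = {}  # sensor_id -> zero-argument read function

    def reconcile(self, discovered):
        """Apply a fresh bus scan; return (added, removed) sensor IDs."""
        added = sorted(set(discovered) - set(self.active))
        removed = sorted(set(self.active) - set(discovered))
        self.active = dict(discovered)
        return added, removed

    def read_all(self):
        """Read every sensor; a failed read yields None instead of raising."""
        readings = {}
        for sensor_id, reader in self.active.items():
            try:
                readings[sensor_id] = reader()
            except OSError:
                readings[sensor_id] = None  # mark missing, keep the loop alive
        return readings


def unplugged():
    raise OSError("device not responding")


registry = SensorRegistry()
first_scan = registry.reconcile({"temp-1": lambda: 21.5, "vib-1": unplugged})
readings = registry.read_all()
second_scan = registry.reconcile({"temp-1": lambda: 21.6, "vib-2": lambda: 0.05})
print(first_scan, readings, second_scan)
```

Calling `reconcile` on a periodic scan covers replacement during operation, while the `None` readings give downstream stages an explicit gap to handle instead of a crash.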

Checkpoint: Operational Contracts

You now know:

Edge storage needs hot, warm, and cold tiers so dashboards, trend analysis, and compliance queries do not compete for the same records.
The data budget should be defined before sampling rates, because available bytes per second constrain what can leave the gateway.
Sensor interrupts, hardware timestamps, and hot-swap handling turn the acquisition design into an operational contract instead of a diagram.

With the contracts in place, finish by checking the architecture tiers and timing-buffer companion.

39.14 Label the Diagram

39.15 Acquisition Timing and Buffers

For the deeper implementation contract behind source timestamps, clock synchronisation, bounded buffers, backpressure, drift, and replay metadata, continue to Acquisition Timing and Buffer Contracts.

39.16 Summary

Edge data acquisition architecture is built on understanding three fundamental device categories:

Big Things: Full-capability computers with direct cloud connectivity - minimal edge processing needed
Small IP Things: Embedded devices with IP connectivity - benefit from edge compression and filtering
Non-IP Things: Simple sensors requiring gateways - need edge aggregation for efficient transmission

The acquisition strategy must match device capabilities: high-volume devices (cameras) need compression, low-volume devices (temperature sensors) need aggregation, and non-IP devices need protocol translation through gateways.

39.17 Concept Relationships

This chapter establishes the foundational architecture for edge data collection:

Core Classification (This chapter):

Three device categories (Big Things, Small IP Things, Non-IP Things) determine connectivity paths and acquisition strategies
Data generation patterns vary by several orders of magnitude across categories (door sensors: bytes/day vs cameras: gigabytes/day)

Technical Implementation (Apply this foundation):

Edge Data Acquisition: Sampling and Compression - How to reduce data volume based on device category (cameras need compression, temperature sensors need aggregation)
Edge Data Acquisition: Power and Gateways - Battery constraints drive duty cycling; gateways bridge Non-IP Things to the cloud

Processing Context:

Edge Compute Patterns - Where to process (edge/fog/cloud) depends on device capabilities established here
Edge Fog Computing - Big Things can participate in the fog layer; Small IP Things and Non-IP Things need edge gateways

Data Quality Integration:

Data Quality and Preprocessing - Validating at edge acquisition catches errors at roughly 1% of the cost of fixing them in the cloud
Multi-Sensor Data Fusion - Fusing data from different device categories requires understanding their generation patterns

39.18 What’s Next

If you want to…	Read this
Control timing, buffering, and replay metadata	Acquisition Timing and Buffer Contracts
Understand power management for the architecture	Edge Acquisition Power and Gateways
Learn sampling and compression strategies	Edge Acquisition Sampling and Compression
Study the broader edge compute context	Edge Data Acquisition
Apply the architecture to compute patterns	Edge Compute Patterns
Return to the module overview	Big Data Overview