338  Edge AI and Machine Learning at the Edge

338.1 Overview

Edge AI brings machine learning to IoT devices, enabling real-time inference where data is created rather than sending everything to the cloud. This chapter series covers the techniques, hardware, and deployment patterns that make Edge AI possible.

Why Edge AI? Three critical drivers make edge AI essential for many IoT applications:

  1. Latency: 10-50ms for local inference versus 100-500ms for a cloud round trip, which matters for safety-critical systems
  2. Bandwidth: process data locally and send only alerts, cutting data transfer by roughly 99%
  3. Privacy: sensitive data never leaves the device, giving GDPR/HIPAA compliance by design

338.2 Chapter Series

This comprehensive topic is divided into focused chapters:

338.2.1 Edge AI Fundamentals

Why and when to use edge AI

  • The business case: bandwidth savings, latency requirements, privacy compliance
  • When edge AI is mandatory (the “Four Mandates”)
  • Decision framework for edge vs cloud AI
  • Real-world cost calculations and ROI analysis
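The bandwidth and cost argument above can be sketched with a quick back-of-the-envelope calculation. This is an illustrative model only: the frame size, frame rate, and alert volume below are made-up example figures, not measurements from any real deployment.

```python
# Illustrative cloud-vs-edge data-volume comparison.
# All figures (frame size, rates, alert counts) are hypothetical assumptions.

def monthly_cloud_bytes(frame_kb: float, fps: float) -> float:
    """Bytes per month if every camera frame is streamed to the cloud."""
    seconds_per_month = 30 * 24 * 3600
    return frame_kb * 1024 * fps * seconds_per_month

def monthly_edge_bytes(alert_kb: float, alerts_per_day: float) -> float:
    """Bytes per month if only small alert payloads leave the device."""
    return alert_kb * 1024 * alerts_per_day * 30

cloud = monthly_cloud_bytes(frame_kb=50, fps=1)           # 1 fps, 50 KB frames
edge = monthly_edge_bytes(alert_kb=1, alerts_per_day=20)  # 20 small alerts/day
reduction = 1 - edge / cloud
print(f"cloud: {cloud/1e9:.1f} GB/mo, edge: {edge/1e6:.2f} MB/mo, "
      f"reduction: {reduction:.2%}")
```

Even with these modest assumptions, streaming raw frames runs to triple-digit gigabytes per month while edge-filtered alerts stay under a megabyte, which is where the "99% reduction" claim comes from.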

338.2.2 TinyML: Machine Learning on Microcontrollers

Running ML on ultra-low-power devices

  • Hardware platforms: Arduino Nano 33 BLE, ESP32-S3, STM32L4, Nordic nRF52840
  • TensorFlow Lite Micro framework and deployment
  • Edge Impulse for end-to-end TinyML development
  • Memory budgeting and model size constraints
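Memory budgeting of the kind covered in that chapter can be approximated from parameter count and weight precision. The sketch below uses example limits (a 1 MB flash budget and a flat runtime overhead) that stand in for a real part's datasheet numbers; they are assumptions, not specifications of any listed board.

```python
# Rough memory-budget check for a TinyML target.
# flash_limit and overhead are illustrative placeholders, not device specs.

def model_flash_bytes(n_params: int, bits_per_weight: int) -> int:
    """Approximate flash needed to store the weights alone."""
    return n_params * bits_per_weight // 8

def fits(n_params: int, bits_per_weight: int,
         flash_limit: int = 1_000_000, overhead: int = 50_000) -> bool:
    """Weights plus a rough graph/runtime overhead vs a flash budget."""
    return model_flash_bytes(n_params, bits_per_weight) + overhead <= flash_limit

# A 500k-parameter model: float32 blows a 1 MB budget, int8 fits with room.
print(fits(500_000, 32))  # float32 weights: 2 MB of flash needed
print(fits(500_000, 8))   # int8 weights: ~550 KB total
```

A real budget would also account for activation RAM (the interpreter's tensor arena), which often dominates on the 128-512 KB RAM parts listed above.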

338.2.3 Model Optimization Techniques

Compressing models 10-100x for edge deployment

  • Quantization: float32 to int8 (4x size reduction, 2-4x speedup)
  • Pruning: removing 70-90% of weights with minimal accuracy loss
  • Knowledge distillation: teacher-student training
  • Combined optimization pipelines and worked examples
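The int8 quantization step can be illustrated in a few lines. This is a minimal sketch of symmetric per-tensor quantization using plain Python lists; production toolchains (e.g. the TensorFlow Lite converter) use calibrated per-tensor or per-channel schemes, but the arithmetic is the same idea. The weight values are arbitrary toy numbers.

```python
# Minimal sketch of symmetric int8 post-training quantization.
# Each float32 weight (32 bits) becomes one int8 code (8 bits): 4x smaller.

def quantize_int8(weights):
    """Map float weights to int8 codes with a single symmetric scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid zero scale
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 0.9, -0.07]          # toy weight tensor
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, s)  # int8 codes and the shared scale factor
```

The reconstruction error is bounded by half a quantization step, which is why int8 models usually lose little accuracy while shrinking 4x.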

338.2.4 Hardware Accelerators

Choosing NPUs, GPUs, TPUs, and FPGAs

  • Neural Processing Units (NPUs): Coral Edge TPU, Intel Movidius, Apple Neural Engine
  • Edge GPUs: NVIDIA Jetson family (Nano, Xavier NX, AGX Orin)
  • FPGAs for custom operations and deterministic latency
  • Hardware selection decision tree and TOPS vs GFLOPS comparison
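A decision tree like the one that chapter presents can be caricatured in code. The thresholds and categories below are illustrative simplifications for this overview, not vendor guidance; the real chapter weighs many more factors (power, cost, toolchain maturity).

```python
# Toy hardware-selection decision tree; thresholds are illustrative only.

def pick_accelerator(latency_ms: float, model_is_int8: bool,
                     custom_ops: bool) -> str:
    """Pick an accelerator class from simplified requirements."""
    if custom_ops and latency_ms < 10:
        return "FPGA"       # deterministic latency, custom pipelines
    if model_is_int8:
        return "NPU"        # e.g. Coral Edge TPU-class devices
    if custom_ops:
        return "edge GPU"   # e.g. Jetson-class, flexible CUDA kernels
    return "CPU"            # small or simple models may need no accelerator

print(pick_accelerator(5, False, True))   # tight latency + custom ops
print(pick_accelerator(50, True, False))  # quantized standard model
```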

338.2.5 Edge AI Applications and Deployment Pipeline

Real-world use cases and end-to-end workflows

  • Visual inspection for manufacturing quality control
  • Predictive maintenance with vibration analysis
  • Keyword spotting for always-on voice detection
  • Smart parking deployment pipeline (data collection to production)
  • Continuous learning and model retraining

338.2.6 Interactive Lab: TinyML Gesture Recognition

Hands-on practice with edge AI concepts

  • ESP32-based TinyML gesture recognition simulator
  • Neural network forward pass visualization
  • Quantization and pruning demonstrations
  • Challenge exercises for deeper learning
  • Wokwi simulator for browser-based experimentation
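The forward-pass visualization in the lab boils down to repeated dense-layer arithmetic. Here is a minimal standalone version with made-up toy weights, just to show the computation the simulator animates; the lab itself runs on an ESP32, not in Python.

```python
# A minimal dense-layer forward pass (the operation the lab visualizes).
# Weights, biases, and inputs below are arbitrary toy values.

def dense(x, W, b, act=lambda v: max(0.0, v)):  # ReLU activation by default
    """Compute y_j = act(sum_i x_i * W[i][j] + b[j]) for each output j."""
    return [act(sum(xi * wij for xi, wij in zip(x, col)) + bj)
            for col, bj in zip(zip(*W), b)]

x = [1.0, 0.5]                      # e.g. two accelerometer features
W = [[0.2, -0.4],                   # W[i][j]: input i -> neuron j
     [0.6, 0.1]]
b = [0.0, 0.3]
h = dense(x, W, b)
print(h)  # hidden activations after ReLU
```

Stacking a few of these layers and taking the largest output gives the gesture prediction; quantization (as demonstrated in the lab) replaces the float multiplies with int8 arithmetic.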

338.3 Quick Reference

  Topic                 Key Concept                                                        Learn More
  When to use Edge AI   Sub-100ms latency, >1GB/day data, privacy requirements             Fundamentals
  TinyML platforms      ESP32, Arduino Nano 33 BLE, STM32 with 128-512 KB RAM              TinyML
  Model compression     INT8 quantization = 4x smaller, pruning = 10x smaller              Optimization
  Hardware selection    NPU for int8, GPU for custom models, FPGA for <10ms latency        Hardware
  Deployment            Data collection -> training -> quantization -> deploy -> retrain   Applications
  Hands-on              Gesture recognition on ESP32 with quantization demo                Lab

338.4 Prerequisites

Before diving into this series, you should be familiar with:

338.5 What’s Next

Start with Edge AI Fundamentals to understand when and why to use edge AI, then progress through the series based on your learning goals:

  • Quick start: Fundamentals -> TinyML -> Lab
  • Deep dive: All chapters in sequence
  • Hardware focus: Fundamentals -> Hardware -> Lab
  • Production deployment: Fundamentals -> Optimization -> Applications