320  TinyML: Machine Learning on Microcontrollers

320.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Select TinyML Platforms: Choose appropriate microcontrollers based on model size, RAM, and power requirements
  • Understand Model Constraints: Calculate memory budgets for weights, activations, and runtime overhead
  • Use TensorFlow Lite Micro: Deploy ML models on microcontrollers using the TFLM framework
  • Apply Edge Impulse: Build end-to-end TinyML solutions from data collection to deployment

320.2 Introduction

TinyML brings machine learning to ultra-low-power microcontrollers with as little as a few kilobytes of RAM, enabling intelligent inference on battery-powered devices that run for months or years.

The challenge is fitting powerful neural networks into devices with orders of magnitude less memory and compute than a smartphone. This chapter explores the hardware platforms, software frameworks, and design patterns that make TinyML possible.

320.3 What Fits on a Microcontroller?

320.3.1 Model Size Constraints

Typical TinyML Model Budget:
- Model weights: 20-200 KB (int8 quantized)
- Activation tensors: 10-50 KB (intermediate layer outputs)
- Input buffer: 5-20 KB (sensor data window)
- Framework overhead: 50-100 KB (TensorFlow Lite Micro runtime)
TOTAL: Fits in 100-500 KB Flash + 50-150 KB RAM

Examples:
- Wake word detection: 18 KB model, 35 KB RAM
- Gesture recognition: 45 KB model, 65 KB RAM
- Anomaly detection: 30 KB model, 40 KB RAM
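As a worked check against these budgets, the compile-time sketch below sums assumed figures for the gesture-recognition example against a hypothetical part with 256 KB of flash and 128 KB of RAM; every number is illustrative, not a measurement:

// Hypothetical memory-budget check (all figures illustrative)
constexpr int kFlashBudget  = 256 * 1024;  // assumed MCU flash
constexpr int kRamBudget    = 128 * 1024;  // assumed MCU RAM

constexpr int kModelWeights = 45 * 1024;   // int8 weights (flash)
constexpr int kRuntimeCode  = 60 * 1024;   // TFLM runtime (flash)
constexpr int kActivations  = 50 * 1024;   // tensor arena (RAM)
constexpr int kInputBuffer  = 15 * 1024;   // sensor window (RAM)

static_assert(kModelWeights + kRuntimeCode <= kFlashBudget,
              "model + runtime must fit in flash");
static_assert(kActivations + kInputBuffer <= kRamBudget,
              "arena + input buffer must fit in RAM");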

Think of TinyML like a smart insect brain versus a human brain.

A human brain (cloud AI) has billions of neurons and consumes about 20 watts of power. An insect brain (TinyML) has orders of magnitude fewer neurons, yet it can fly, navigate, and avoid predators on a tiny fraction of that power.

TinyML lets tiny devices make smart decisions:

  • Your fitness tracker detects when you're running vs walking using a 30 KB model
  • Smart earbuds recognize "Hey Siri" using 18 KB of neural network
  • A wildlife camera classifies animals without internet using 150 KB of vision AI

The magic is compression: big neural networks trained in the cloud are shrunk 10-100x to fit on $5 microcontrollers. Accuracy drops a little (perhaps 95% instead of 98%), but the device works without an internet connection, without draining its battery, and without sending your data anywhere.
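Much of that shrinkage comes from int8 quantization: each float32 weight is mapped to an 8-bit integer through a scale and zero point (the affine scheme TensorFlow Lite uses), giving an immediate 4x size reduction; pruning and smaller architectures account for the rest. A minimal sketch of the mapping:

#include <algorithm>  // std::clamp (C++17)
#include <cmath>
#include <cstdint>

// Affine quantization: real_value ~= (q - zero_point) * scale
int8_t QuantizeToInt8(float x, float scale, int32_t zero_point) {
  int32_t q = static_cast<int32_t>(std::lround(x / scale)) + zero_point;
  return static_cast<int8_t>(std::clamp(q, int32_t{-128}, int32_t{127}));
}

float DequantizeFromInt8(int8_t q, float scale, int32_t zero_point) {
  return static_cast<float>(q - zero_point) * scale;
}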

320.4 TensorFlow Lite Micro

TensorFlow Lite Micro (TFLM) is a lightweight inference framework designed to run bare-metal on microcontrollers; no operating system is required.

320.4.1 Architecture

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1', 'fontSize': '14px'}}}%%
flowchart LR
    subgraph Cloud["Cloud Training"]
        TF["TensorFlow<br/>Model<br/>(100 MB)"]
    end

    subgraph Convert["Conversion & Optimization"]
        TFLite["TFLite<br/>Converter"]
        Quant["Quantization<br/>(int8)"]
    end

    subgraph Edge["Microcontroller"]
        Model["Model.tflite<br/>(200 KB)"]
        TFLM["TFLM<br/>Interpreter"]
        Arena["Tensor Arena<br/>(50 KB RAM)"]
    end

    TF --> TFLite
    TFLite --> Quant
    Quant --> Model
    Model --> TFLM
    TFLM --> Arena

    style Cloud fill:#7F8C8D,stroke:#2C3E50,color:#fff
    style Convert fill:#E67E22,stroke:#2C3E50,color:#fff
    style Edge fill:#16A085,stroke:#2C3E50,color:#fff

Figure 320.1: TensorFlow Lite Micro pipeline from cloud training to microcontroller deployment

How It Works:

  1. Cloud Training: Train a full TensorFlow model (typically 100 MB+ with float32 weights)
  2. Conversion: Use TFLite Converter to optimize for mobile/embedded
  3. Quantization: Convert float32 to int8 (4x size reduction)
  4. Deployment: Copy model bytes to microcontroller flash memory (see the sketch after this list)
  5. Inference: TFLM interpreter loads model, allocates tensor arena in RAM, runs inference
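In practice, step 4 means converting the .tflite file into a C array so the linker places the weights in flash, conventionally with xxd -i model.tflite > model.h. A sketch of what such a generated header looks like (array name, length, and byte values are illustrative):

// model.h -- generated with: xxd -i model.tflite > model.h
// (alignas added by hand: TFLM expects the model buffer to be aligned)
alignas(16) const unsigned char g_model[] = {
    0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33,  // "TFL3" FlatBuffer magic
    // ... remaining model bytes ...
};
const unsigned int g_model_len = 18432;  // ~18 KB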

320.4.2 Supported Operations (Subset)

  • Convolutional layers: Conv2D, DepthwiseConv2D
  • Activation functions: ReLU, ReLU6, Sigmoid, Tanh
  • Pooling: MaxPool2D, AveragePool2D
  • Fully connected: Dense
  • Normalization: BatchNormalization (typically folded into adjacent layers during conversion)
  • Utilities: Reshape, Concatenate, Add, Multiply
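Binaries stay small partly because you register only the operators a model actually uses, via MicroMutableOpResolver rather than the kitchen-sink AllOpsResolver. A sketch for a hypothetical small CNN needing just five of the ops above:

#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"

// The <5> capacity must cover every op in the model graph;
// unregistered kernels are simply not linked into the binary.
static tflite::MicroMutableOpResolver<5> resolver;

void RegisterOps() {
  resolver.AddConv2D();
  resolver.AddDepthwiseConv2D();
  resolver.AddMaxPool2D();
  resolver.AddFullyConnected();
  resolver.AddReshape();
}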

320.4.3 Example: Wake Word Detection

// TensorFlow Lite Micro - Arduino Example (Conceptual)
#include <TensorFlowLite.h>
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "model.h" // 18 KB wake word model (g_model byte array)

// Allocate memory for inference: the tensor arena holds input,
// output, and intermediate tensors during Invoke()
constexpr int kTensorArenaSize = 35 * 1024;
uint8_t tensor_arena[kTensorArenaSize];

tflite::AllOpsResolver ops_resolver;
tflite::MicroInterpreter* interpreter = nullptr;
TfLiteTensor* input = nullptr;

void setup() {
  Serial.begin(115200);

  // Map the model bytes stored in flash
  const tflite::Model* model = tflite::GetModel(g_model);

  // Initialize interpreter and allocate tensors from the arena
  static tflite::MicroInterpreter static_interpreter(
      model, ops_resolver, tensor_arena, kTensorArenaSize);
  interpreter = &static_interpreter;
  interpreter->AllocateTensors();

  // Input tensor (audio features: 40 MFCC coefficients x 49 frames)
  input = interpreter->input(0);
}

// Run inference on audio buffer
void detectWakeWord(const float* audio_features) {
  // Copy features to input tensor (40 x 49 = 1960 values)
  for (int i = 0; i < 1960; i++) {
    input->data.f[i] = audio_features[i];
  }

  // Invoke inference (takes ~20 ms on a typical Cortex-M class MCU)
  interpreter->Invoke();

  // Get output (probability of wake word)
  TfLiteTensor* output = interpreter->output(0);
  float confidence = output->data.f[0];

  if (confidence > 0.85f) {
    // Wake word detected! Start streaming to cloud
    Serial.println("Wake word detected!");
  }
}

void loop() {
  // Application code gathers audio, computes MFCC features,
  // and calls detectWakeWord() here
}
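One caveat: the sketch above assumes a float input tensor. Fully int8-quantized models, the usual choice at these sizes, expect inputs quantized with the scale and zero point stored on the tensor itself:

// For an int8-quantized input tensor (clamping omitted for brevity):
for (int i = 0; i < 1960; i++) {
  int32_t q = static_cast<int32_t>(audio_features[i] / input->params.scale)
              + input->params.zero_point;
  input->data.int8[i] = static_cast<int8_t>(q);
}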

320.5 Edge Impulse: End-to-End TinyML Platform

Edge Impulse provides a complete no-code/low-code platform for building TinyML solutions: data collection -> labeling -> training -> deployment.

320.5.1 Workflow

  1. Data Collection: Use smartphone or hardware to collect sensor data (audio, accelerometer, images)
  2. Data Labeling: Draw bounding boxes, segment audio, label time-series
  3. Signal Processing: Extract features (MFCC for audio, FFT for vibration, image resize)
  4. Model Training: AutoML selects architecture, trains on cloud, optimizes for edge
  5. Deployment: One-click export to Arduino, ESP32, Raspberry Pi, or custom hardware (see the sketch after this list)
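On Arduino-class targets, step 5 produces a library that wraps the trained impulse behind the Edge Impulse C++ SDK. A minimal sketch of invoking it, assuming an accelerometer project (the header name is generated from your project name, so treat it as a placeholder):

#include <your_project_inferencing.h>  // generated; name varies per project

// Raw sensor window, sized by the generated library
static float features[EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE];

void classifyWindow() {
  // Wrap the feature buffer in a signal_t the classifier streams from
  signal_t signal;
  numpy::signal_from_buffer(features, EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE,
                            &signal);

  // Run the DSP block and the neural network in one call
  ei_impulse_result_t result;
  run_classifier(&signal, &result, false /* debug */);

  // Print per-class confidence
  for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
    Serial.print(result.classification[i].label);
    Serial.print(": ");
    Serial.println(result.classification[i].value);
  }
}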

320.5.2 Example Use Cases

Application              Sensor          Model Type          Model Size   Accuracy
Keyword Spotting         Microphone      1D CNN              18 KB        92%
Gesture Recognition      Accelerometer   LSTM                45 KB        88%
Visual Inspection        Camera          MobileNetV2         150 KB       94%
Predictive Maintenance   Vibration       Anomaly Detection   30 KB        96%

320.5.3 Advantages

  • Rapid prototyping (hours instead of weeks)
  • Automatic feature engineering and model optimization
  • Integrated data versioning and experiment tracking
  • Hardware abstraction (same model runs on Arduino or ESP32)

320.5.4 TinyML Runtime Selection

Choosing the right runtime depends on your hardware constraints and model complexity:

Runtime                  Binary Size   Features                                Best For
TensorFlow Lite Micro    50-200 KB     Full inference, many ops                General TinyML, complex models
CMSIS-NN                 10-50 KB      ARM Cortex-M optimized kernels          Ultra-low power, ARM MCUs
X-CUBE-AI                50-100 KB     STM32 optimized, hardware accelerated   STM32 family devices
Edge Impulse SDK         30-100 KB     AutoML generated, optimized             Rapid prototyping, production
microTVM                 20-80 KB      Compiler-based, any target              Custom hardware, maximum efficiency

320.6 Summary

TinyML enables machine learning on ultra-low-power devices:

Hardware Options:

  • $4-25 microcontrollers with 128-512 KB RAM
  • Power consumption: 2-50 mW
  • Model budgets: 20-200 KB weights, 30-100 KB RAM

Software Frameworks:

  • TensorFlow Lite Micro for general TinyML
  • Edge Impulse for rapid prototyping
  • CMSIS-NN for ARM optimization

Typical Applications:

  • Keyword spotting: 18 KB model, 92% accuracy
  • Gesture recognition: 45 KB model, 88% accuracy
  • Visual inspection: 150 KB model, 94% accuracy

320.7 What's Next

Now that you understand TinyML platforms and frameworks, continue to: