320  TinyML: Machine Learning on Microcontrollers

320.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Select TinyML Platforms: Choose appropriate microcontrollers based on model size, RAM, and power requirements
  • Understand Model Constraints: Calculate memory budgets for weights, activations, and runtime overhead
  • Use TensorFlow Lite Micro: Deploy ML models on microcontrollers using the TFLM framework
  • Apply Edge Impulse: Build end-to-end TinyML solutions from data collection to deployment

320.2 Introduction

TinyML brings machine learning to ultra-low-power microcontrollers with as little as a few kilobytes of RAM, enabling intelligent inference on battery-powered devices that run for months or years.

The challenge is fitting powerful neural networks into devices with orders of magnitude less memory and compute than a smartphone. This chapter explores the hardware platforms, software frameworks, and design patterns that make TinyML possible.

320.3 What Fits on a Microcontroller?

320.3.1 Model Size Constraints

Typical TinyML Model Budget:
- Model weights: 20-200 KB (int8 quantized)
- Activation tensors: 10-50 KB (intermediate layer outputs)
- Input buffer: 5-20 KB (sensor data window)
- Framework overhead: 50-100 KB (TensorFlow Lite Micro runtime)
TOTAL: Fits in 100-500 KB Flash + 50-150 KB RAM

Examples:
- Wake word detection: 18 KB model, 35 KB RAM
- Gesture recognition: 45 KB model, 65 KB RAM
- Anomaly detection: 30 KB model, 40 KB RAM
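As a worked check against these budgets, the compile-time sketch below sums assumed figures for the gesture-recognition example against a hypothetical part with 256 KB of flash and 128 KB of RAM; every number is illustrative, not a measurement:

// Hypothetical memory-budget check (all figures illustrative)
constexpr int kFlashBudget  = 256 * 1024;  // assumed MCU flash
constexpr int kRamBudget    = 128 * 1024;  // assumed MCU RAM

constexpr int kModelWeights = 45 * 1024;   // int8 weights (flash)
constexpr int kRuntimeCode  = 60 * 1024;   // TFLM runtime (flash)
constexpr int kActivations  = 50 * 1024;   // tensor arena (RAM)
constexpr int kInputBuffer  = 15 * 1024;   // sensor window (RAM)

static_assert(kModelWeights + kRuntimeCode <= kFlashBudget,
              "model + runtime must fit in flash");
static_assert(kActivations + kInputBuffer <= kRamBudget,
              "arena + input buffer must fit in RAM");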

Think of TinyML like a smart insect brain versus a human brain.

A human brain (cloud AI) has billions of neurons and consumes about 20 watts of power. An insect brain (TinyML) has orders of magnitude fewer neurons, yet it can fly, navigate, and avoid predators on a tiny fraction of that power.

TinyML lets tiny devices make smart decisions:

  • Your fitness tracker detects when you're running vs walking using a 30 KB model
  • Smart earbuds recognize "Hey Siri" using 18 KB of neural network
  • A wildlife camera classifies animals without internet using 150 KB of vision AI

The magic is compression: big neural networks trained in the cloud are shrunk 10-100x to fit on $5 microcontrollers. Accuracy drops a little (perhaps 95% instead of 98%), but the device works without an internet connection, without draining its battery, and without sending your data anywhere.
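Much of that shrinkage comes from int8 quantization: each float32 weight is mapped to an 8-bit integer through a scale and zero point (the affine scheme TensorFlow Lite uses), giving an immediate 4x size reduction; pruning and smaller architectures account for the rest. A minimal sketch of the mapping:

#include <algorithm>  // std::clamp (C++17)
#include <cmath>
#include <cstdint>

// Affine quantization: real_value ~= (q - zero_point) * scale
int8_t QuantizeToInt8(float x, float scale, int32_t zero_point) {
  int32_t q = static_cast<int32_t>(std::lround(x / scale)) + zero_point;
  return static_cast<int8_t>(std::clamp(q, int32_t{-128}, int32_t{127}));
}

float DequantizeFromInt8(int8_t q, float scale, int32_t zero_point) {
  return static_cast<float>(q - zero_point) * scale;
}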

320.4 TensorFlow Lite Micro

TensorFlow Lite Micro (TFLM) is a lightweight inference framework designed to run bare-metal on microcontrollers; no operating system is required.

320.4.1 Architecture

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#2C3E50', 'primaryTextColor': '#fff', 'primaryBorderColor': '#16A085', 'lineColor': '#16A085', 'secondaryColor': '#E67E22', 'tertiaryColor': '#ecf0f1', 'fontSize': '14px'}}}%%
flowchart LR
    subgraph Cloud["Cloud Training"]
        TF["TensorFlow<br/>Model<br/>(100 MB)"]
    end

    subgraph Convert["Conversion & Optimization"]
        TFLite["TFLite<br/>Converter"]
        Quant["Quantization<br/>(int8)"]
    end

    subgraph Edge["Microcontroller"]
        Model["Model.tflite<br/>(200 KB)"]
        TFLM["TFLM<br/>Interpreter"]
        Arena["Tensor Arena<br/>(50 KB RAM)"]
    end

    TF --> TFLite
    TFLite --> Quant
    Quant --> Model
    Model --> TFLM
    TFLM --> Arena

    style Cloud fill:#7F8C8D,stroke:#2C3E50,color:#fff
    style Convert fill:#E67E22,stroke:#2C3E50,color:#fff
    style Edge fill:#16A085,stroke:#2C3E50,color:#fff

Figure 320.1: TensorFlow Lite Micro pipeline from cloud training to microcontroller deployment

How It Works:

  1. Cloud Training: Train a full TensorFlow model (typically 100 MB+ with float32 weights)
  2. Conversion: Use TFLite Converter to optimize for mobile/embedded
  3. Quantization: Convert float32 to int8 (4x size reduction)
  4. Deployment: Copy model bytes to microcontroller flash memory (see the sketch after this list)
  5. Inference: TFLM interpreter loads model, allocates tensor arena in RAM, runs inference
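In practice, step 4 means converting the .tflite file into a C array so the linker places the weights in flash, conventionally with xxd -i model.tflite > model.h. A sketch of what such a generated header looks like (array name, length, and byte values are illustrative):

// model.h -- generated with: xxd -i model.tflite > model.h
// (alignas added by hand: TFLM expects the model buffer to be aligned)
alignas(16) const unsigned char g_model[] = {
    0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33,  // "TFL3" FlatBuffer magic
    // ... remaining model bytes ...
};
const unsigned int g_model_len = 18432;  // ~18 KB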

320.4.2 Supported Operations (Subset)

  • Convolutional layers: Conv2D, DepthwiseConv2D
  • Activation functions: ReLU, ReLU6, Sigmoid, Tanh
  • Pooling: MaxPool2D, AveragePool2D
  • Fully connected: Dense
  • Normalization: BatchNormalization (typically folded into adjacent layers during conversion)
  • Utilities: Reshape, Concatenate, Add, Multiply
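Binaries stay small partly because you register only the operators a model actually uses, via MicroMutableOpResolver rather than the kitchen-sink AllOpsResolver. A sketch for a hypothetical small CNN needing just five of the ops above:

#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"

// The <5> capacity must cover every op in the model graph;
// unregistered kernels are simply not linked into the binary.
static tflite::MicroMutableOpResolver<5> resolver;

void RegisterOps() {
  resolver.AddConv2D();
  resolver.AddDepthwiseConv2D();
  resolver.AddMaxPool2D();
  resolver.AddFullyConnected();
  resolver.AddReshape();
}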

320.4.3 Example: Wake Word Detection

// TensorFlow Lite Micro - Arduino Example (Conceptual)
#include <TensorFlowLite.h>
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "model.h" // 18 KB wake word model (g_model byte array)

// Allocate memory for inference: the tensor arena holds input,
// output, and intermediate tensors during Invoke()
constexpr int kTensorArenaSize = 35 * 1024;
uint8_t tensor_arena[kTensorArenaSize];

tflite::AllOpsResolver ops_resolver;
tflite::MicroInterpreter* interpreter = nullptr;
TfLiteTensor* input = nullptr;

void setup() {
  Serial.begin(115200);

  // Map the model bytes stored in flash
  const tflite::Model* model = tflite::GetModel(g_model);

  // Initialize interpreter and allocate tensors from the arena
  static tflite::MicroInterpreter static_interpreter(
      model, ops_resolver, tensor_arena, kTensorArenaSize);
  interpreter = &static_interpreter;
  interpreter->AllocateTensors();

  // Input tensor (audio features: 40 MFCC coefficients x 49 frames)
  input = interpreter->input(0);
}

// Run inference on audio buffer
void detectWakeWord(const float* audio_features) {
  // Copy features to input tensor (40 x 49 = 1960 values)
  for (int i = 0; i < 1960; i++) {
    input->data.f[i] = audio_features[i];
  }

  // Invoke inference (takes ~20 ms on a typical Cortex-M class MCU)
  interpreter->Invoke();

  // Get output (probability of wake word)
  TfLiteTensor* output = interpreter->output(0);
  float confidence = output->data.f[0];

  if (confidence > 0.85f) {
    // Wake word detected! Start streaming to cloud
    Serial.println("Wake word detected!");
  }
}

void loop() {
  // Application code gathers audio, computes MFCC features,
  // and calls detectWakeWord() here
}
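One caveat: the sketch above assumes a float input tensor. Fully int8-quantized models, the usual choice at these sizes, expect inputs quantized with the scale and zero point stored on the tensor itself:

// For an int8-quantized input tensor (clamping omitted for brevity):
for (int i = 0; i < 1960; i++) {
  int32_t q = static_cast<int32_t>(audio_features[i] / input->params.scale)
              + input->params.zero_point;
  input->data.int8[i] = static_cast<int8_t>(q);
}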

320.5 Edge Impulse: End-to-End TinyML Platform

Edge Impulse provides a complete no-code/low-code platform for building TinyML solutions: data collection -> labeling -> training -> deployment.

320.5.1 Workflow

  1. Data Collection: Use smartphone or hardware to collect sensor data (audio, accelerometer, images)
  2. Data Labeling: Draw bounding boxes, segment audio, label time-series
  3. Signal Processing: Extract features (MFCC for audio, FFT for vibration, image resize)
  4. Model Training: AutoML selects architecture, trains on cloud, optimizes for edge
  5. Deployment: One-click export to Arduino, ESP32, Raspberry Pi, or custom hardware (see the sketch after this list)
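On Arduino-class targets, step 5 produces a library that wraps the trained impulse behind the Edge Impulse C++ SDK. A minimal sketch of invoking it, assuming an accelerometer project (the header name is generated from your project name, so treat it as a placeholder):

#include <your_project_inferencing.h>  // generated; name varies per project

// Raw sensor window, sized by the generated library
static float features[EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE];

void classifyWindow() {
  // Wrap the feature buffer in a signal_t the classifier streams from
  signal_t signal;
  numpy::signal_from_buffer(features, EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE,
                            &signal);

  // Run the DSP block and the neural network in one call
  ei_impulse_result_t result;
  run_classifier(&signal, &result, false /* debug */);

  // Print per-class confidence
  for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
    Serial.print(result.classification[i].label);
    Serial.print(": ");
    Serial.println(result.classification[i].value);
  }
}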

320.5.2 Example Use Cases

Application              Sensor          Model Type          Model Size   Accuracy
Keyword Spotting         Microphone      1D CNN              18 KB        92%
Gesture Recognition      Accelerometer   LSTM                45 KB        88%
Visual Inspection        Camera          MobileNetV2         150 KB       94%
Predictive Maintenance   Vibration       Anomaly Detection   30 KB        96%

320.5.3 Advantages

  • Rapid prototyping (hours instead of weeks)
  • Automatic feature engineering and model optimization
  • Integrated data versioning and experiment tracking
  • Hardware abstraction (same model runs on Arduino or ESP32)

320.5.4 TinyML Runtime Selection

Choosing the right runtime depends on your hardware constraints and model complexity:

Runtime                  Binary Size   Features                                Best For
TensorFlow Lite Micro    50-200 KB     Full inference, many ops                General TinyML, complex models
CMSIS-NN                 10-50 KB      ARM Cortex-M optimized kernels          Ultra-low power, ARM MCUs
X-CUBE-AI                50-100 KB     STM32 optimized, hardware accelerated   STM32 family devices
Edge Impulse SDK         30-100 KB     AutoML generated, optimized             Rapid prototyping, production
microTVM                 20-80 KB      Compiler-based, any target              Custom hardware, maximum efficiency

320.6 Summary

TinyML enables machine learning on ultra-low-power devices:

Hardware Options:

  • $4-25 microcontrollers with 128-512 KB RAM
  • Power consumption: 2-50 mW
  • Model budgets: 20-200 KB weights, 30-100 KB RAM

Software Frameworks:

  • TensorFlow Lite Micro for general TinyML
  • Edge Impulse for rapid prototyping
  • CMSIS-NN for ARM optimization

Typical Applications:

  • Keyword spotting: 18 KB model, 92% accuracy
  • Gesture recognition: 45 KB model, 88% accuracy
  • Visual inspection: 150 KB model, 94% accuracy

320.7 What's Next

Now that you understand TinyML platforms and frameworks, continue to: