315  Edge AI Applications and Deployment Pipeline

315.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Deploy Visual Inspection Systems: Implement edge AI for manufacturing quality control with 99%+ accuracy
  • Build Predictive Maintenance: Create vibration-based anomaly detection that predicts failures weeks in advance
  • Design Keyword Spotting: Implement always-on voice detection on ultra-low-power devices (<1 mW)
  • Construct End-to-End Pipelines: Build complete edge AI systems from data collection to production deployment

315.2 Visual Inspection: Manufacturing Quality Control

Problem: Detect defects in manufactured parts at 100 items/minute (600ms per item) with 99%+ accuracy.

315.2.1 Traditional vs Edge AI Approach

Traditional Approach:
- Human inspector: 30 items/minute, 95% accuracy, high labor cost, fatigue errors
- Cloud AI: Upload images (500 KB each); 200ms network + 100ms inference = 300ms (meets the speed target, but bandwidth is prohibitive for thousands of cameras)

Edge AI Solution:

Hardware: Industrial camera + NVIDIA Jetson Nano ($99) or Coral Edge TPU ($60)
Model: MobileNetV2 defect classifier (3.5 MB int8)
Training: 10,000 labeled images (good vs 5 defect types)
Inference: 5-10ms per image, 99.2% accuracy

Cost Analysis:
- Hardware: $100 per production line
- Training: One-time $5K (data labeling + compute)
- Operation: No cloud costs, just electricity (~$5/year)
- ROI: Replaces a $60K/year human inspector with 99%+ consistency

315.2.2 Real Deployment Code

# Defect detection inference loop
# (camera, model, reject_actuator, db, and cloud are the production line's
# hardware and service interfaces, initialized elsewhere)
import random
import time

import cv2
import numpy as np

while True:
    image = camera.capture()  # 1920x1080 RGB

    # Preprocessing (5ms)
    resized = cv2.resize(image, (224, 224))
    normalized = resized / 255.0
    batch = np.expand_dims(normalized, axis=0)  # model expects a batch axis

    # Inference on Edge TPU (5ms)
    prediction = model.predict(batch)[0]

    # Classes: [good, scratch, dent, crack, discolor, other]
    class_id = int(np.argmax(prediction))
    confidence = prediction[class_id]

    if class_id != 0 and confidence > 0.90:  # Defect detected
        # Trigger reject mechanism (pneumatic arm)
        reject_actuator.activate()

        # Log to local database for quality tracking
        thumbnail = cv2.resize(image, (192, 108))  # small copy for the log
        db.insert_defect(time.time(), class_id, confidence, thumbnail)

        # Only send defect images to cloud (not every image)
        if random.random() < 0.1:  # Sample 10% for continuous learning
            cloud.upload_for_retraining(image, class_id)

    # Total cycle time: 10-15ms (can handle 60-100 items/second)

315.3 Predictive Maintenance: Industrial Equipment

Problem: Predict bearing failure in industrial motors 2-4 weeks before catastrophic failure, avoiding $100K+ downtime.

315.3.1 Edge AI Solution

Hardware: Vibration sensor (accelerometer) + ESP32 microcontroller ($10)
Model: 1D CNN anomaly detection (30 KB TFLite)
Data: Vibration FFT features (frequency spectrum analysis)
Inference: 20ms per 1-second window, runs continuously

How It Works:
1. Accelerometer samples vibration at 10 kHz (10,000 samples/second)
2. Every 1 second, compute FFT (Fast Fourier Transform) to get frequency spectrum
3. Extract 64 frequency bins as features (e.g., energy in 10-100 Hz, 100-500 Hz, etc.)
4. CNN model classifies: Normal vs Early Warning vs Critical
5. Normal: Continue monitoring, Critical: Immediate alert to maintenance team
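
As a concrete sketch of steps 2-3, the NumPy snippet below converts one 1-second window into the 64 frequency-bin features; the sampling rate, window size, and bin count follow the pipeline above, while the windowing and log-energy scaling are illustrative choices.

# Feature extraction sketch for steps 2-3 (NumPy; names are illustrative)
import numpy as np

SAMPLE_RATE = 10_000   # 10 kHz accelerometer
WINDOW = SAMPLE_RATE   # 1-second window
N_BINS = 64            # frequency-bin features fed to the CNN

def extract_features(samples):
    """Turn one 1-second vibration window (10,000 samples) into 64 log-energy bins."""
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(WINDOW)))
    bands = np.array_split(spectrum, N_BINS)          # 64 equal-width bands
    energies = np.array([np.sum(b ** 2) for b in bands])
    return np.log1p(energies).astype(np.float32)      # compress dynamic range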

Training:
- Collect months of normal operation data (healthy baseline)
- Inject synthetic anomalies or use historical failure data
- Autoencoder or one-class SVM to detect "anything unusual"
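
As a concrete illustration of the autoencoder route, the minimal Keras sketch below trains only on healthy-baseline feature vectors and flags windows whose reconstruction error exceeds a calibrated threshold; `healthy_features` and `threshold` are placeholders for your own baseline data and calibration.

# Minimal autoencoder anomaly detector (sketch; healthy_features and
# threshold are placeholders for your baseline data and calibration)
import numpy as np
import tensorflow as tf

autoencoder = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(64,)),  # 64 FFT bins in
    tf.keras.layers.Dense(4, activation='relu'),    # bottleneck
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(64),                      # reconstruct the input
])
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(healthy_features, healthy_features, epochs=50, batch_size=32)

# High reconstruction error = vibration unlike anything seen in training
def is_anomalous(window, threshold):
    recon = autoencoder.predict(window[np.newaxis, :], verbose=0)[0]
    return np.mean((recon - window) ** 2) > threshold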

315.3.2 Vibration Feature Engineering

Normal bearing:
  Peak frequency: 60 Hz (motor rotation speed)
  Harmonics: 120 Hz, 180 Hz (expected)
  Amplitude: Stable ±10%

Failing bearing (early stage):
  New frequencies appear: 237 Hz, 412 Hz (bearing defect frequencies)
  Amplitude increases: +30% in high-frequency range (>1 kHz)
  Intermittent: Not constant; appears under load

Failing bearing (critical):
  Broad spectrum noise: Energy across all frequencies
  Amplitude spikes: +200% peaks
  Constant: Always present

The edge AI model detects these patterns in real time, raising alerts 2-4 weeks before failure.
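
For intuition only, a hand-rolled version of this check might look like the sketch below; the thresholds mirror the signatures above (+30% high-frequency energy, a couple of new spectral lines) but are illustrative rather than tuned values.

# Hand-rolled early-warning check (illustrative thresholds, not tuned)
import numpy as np

def early_warning(spectrum, baseline, freqs):
    """Flag the early-stage signatures: new peaks plus high-frequency growth."""
    hf = freqs > 1000.0                                # >1 kHz region
    hf_gain = spectrum[hf].sum() / baseline[hf].sum()
    new_peaks = np.sum((spectrum > 3 * baseline) & (spectrum > spectrum.mean()))
    return hf_gain > 1.3 or new_peaks >= 2             # +30% HF energy or new lines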

315.3.3 Deployment Results

50 motors monitored continuously:
- False positive rate: 5% (2-3 false alarms per year)
- True positive rate: 95% (detected 19 of 20 actual failures)
- Lead time: Average 18 days before failure
- Cost savings: $2M/year avoided downtime (vs $10K hardware investment)

315.4 Voice and Audio: Keyword Spotting

Problem: Continuously listen for a wake word ("Hey Device") on a battery-powered smart speaker, using <1 mW of power.

315.4.1 Two-Stage Pipeline

1. Always-On Detector (ultra-low-power DSP):
   - Runs a tiny 18 KB model continuously
   - Detects wake word with 85% accuracy, 5% false positive rate
   - Power: 0.5-1 mW

2. Verification Stage (main CPU):
   - Activates only when Stage 1 detects keyword
   - Runs a larger 200 KB model for confirmation (95% accuracy)
   - Power: 50 mW for 2 seconds (then back to sleep)

Why Two Stages?
- Stage 1 runs 24/7 on tiny power budget
- Stage 2 only activates occasionally (1-2 times/hour) to filter false positives
- Average power: 1 mW + (50 mW x 2 sec x 2 times/hour / 3600 sec/hour) = 1.06 mW
- Battery life: 1000 mAh / ~1 mA average draw = 1000 hours ≈ 40 days
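
The duty-cycle arithmetic generalizes to other wake rates; this small runnable snippet reproduces the 1.06 mW figure from above and makes it easy to test different assumptions.

# Duty-cycle power budget (values from the two-stage design above)
STAGE1_MW = 1.0        # always-on DSP detector
STAGE2_MW = 50.0       # main CPU during verification
WAKES_PER_HOUR = 2
SECONDS_AWAKE = 2

avg_mw = STAGE1_MW + STAGE2_MW * SECONDS_AWAKE * WAKES_PER_HOUR / 3600
print(f"average power: {avg_mw:.2f} mW")  # -> 1.06 mW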

315.4.2 Audio Feature Extraction

Raw audio: 16 kHz sample rate, 16-bit PCM
Window: 1 second = 16,000 samples

Preprocessing:
1. Pre-emphasis filter (boost high frequencies)
2. Frame audio into 25ms windows with 10ms stride (100 frames/second)
3. Compute MFCC (Mel-Frequency Cepstral Coefficients):
   - 40 MFCC coefficients per frame
   - Captures phonetic content of speech
4. Stack 49 frames (490ms of audio context)

Input tensor: 40 MFCC x 49 frames = 1960 features -> CNN -> [Wake Word Probability]
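
A front-end implementing this preprocessing might look like the following sketch, here using librosa (any MFCC library works equally well); the frame sizes follow the numbers above.

# MFCC front-end sketch (librosa assumed available)
import numpy as np
import librosa

SAMPLE_RATE = 16_000

def wake_word_features(audio):
    """40 MFCCs per 25 ms frame at a 10 ms stride, stacked over 49 frames."""
    emphasized = librosa.effects.preemphasis(audio)   # boost high frequencies
    mfcc = librosa.feature.mfcc(
        y=emphasized, sr=SAMPLE_RATE, n_mfcc=40,
        n_fft=400,         # 25 ms window at 16 kHz
        hop_length=160,    # 10 ms stride
    )
    return mfcc[:, :49]    # (40, 49) -> the 1960-feature CNN input

# Example: one second of audio produces a (40, 49) feature tensor
features = wake_word_features(np.zeros(16_000, dtype=np.float32))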

315.5 Building an End-to-End Edge AI Pipeline

Scenario: Deploy a smart parking space detector using computer vision on a solar-powered edge device.

315.5.1 Step 1: Data Collection

Equipment:
- Raspberry Pi 4 + Camera Module v2 (8MP, $25)
- Mount camera above parking lot, capturing 4 spaces per camera

Data Collection Strategy:
- Capture 1 image every 10 seconds for 2 weeks (120,000 images)
- Vary lighting conditions: morning, afternoon, night, rain, snow
- Capture different car types, angles, partial occupancy

Labeling:
- Use Label Studio or Roboflow to draw bounding boxes around cars
- Classes: [Empty, Occupied]
- 5,000 images manually labeled, 115,000 automatically using pre-trained model + manual review
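
The auto-labeling pass can be as simple as the sketch below: a pretrained classifier proposes labels, and anything below a confidence threshold is routed to manual review. The function name and threshold are illustrative, not a prescribed workflow.

# Auto-labeling sketch: accept confident predictions, queue the rest for review
import numpy as np

def auto_label(model, images, accept_threshold=0.95):
    """Return (auto_labels, review_queue) for a batch of images."""
    probs = model.predict(images)                  # shape: (N, 2) Empty/Occupied
    labels = np.argmax(probs, axis=1)
    confident = np.max(probs, axis=1) >= accept_threshold
    auto = [(i, int(labels[i])) for i in range(len(images)) if confident[i]]
    review = [i for i in range(len(images)) if not confident[i]]
    return auto, review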

315.5.2 Step 2: Model Training (Cloud)

# Transfer learning with MobileNetV2
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2

base_model = MobileNetV2(weights='imagenet', include_top=False,
                          input_shape=(224, 224, 3))
base_model.trainable = False  # Freeze pre-trained weights

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(2, activation='softmax')  # Empty vs Occupied
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train for 20 epochs on 5,000 labeled images
history = model.fit(train_dataset, epochs=20, validation_data=val_dataset)
# Result: 97.5% validation accuracy

315.5.3 Step 3: Quantization and Optimization

# Post-training quantization to int8
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Provide representative dataset for calibration
def representative_dataset():
    for i in range(100):
        # Calibration samples must be float32 to match the float model's input
        yield [train_images[i:i+1].astype('float32')]

converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

# Convert
tflite_model = converter.convert()

# Save quantized model
with open('parking_detector_int8.tflite', 'wb') as f:
    f.write(tflite_model)

# Output: Original ~14 MB -> Quantized 3.8 MB (3.7x smaller)

315.5.4 Step 4: Deploy to Edge Device

# Raspberry Pi inference script
import time

import cv2
import numpy as np
import tflite_runtime.interpreter as tflite

# Load quantized model
interpreter = tflite.Interpreter(model_path="parking_detector_int8.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Open the camera (device 0 = the Pi camera via V4L2)
camera = cv2.VideoCapture(0)

# Define parking space ROIs (regions of interest)
spaces = [
    {"id": "A1", "bbox": (100, 200, 300, 400)},
    {"id": "A2", "bbox": (350, 200, 550, 400)},
    {"id": "A3", "bbox": (600, 200, 800, 400)},
    {"id": "A4", "bbox": (850, 200, 1050, 400)}
]

def check_parking_space(image, bbox):
    """Run inference on cropped parking space"""
    x1, y1, x2, y2 = bbox
    crop = image[y1:y2, x1:x2]

    # Preprocess
    resized = cv2.resize(crop, (224, 224))
    input_data = np.expand_dims(resized, axis=0).astype(np.uint8)

    # Inference
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()
    output = interpreter.get_tensor(output_details[0]['index'])[0]

    # Classes: [Empty, Occupied]; the uint8 output maps 0-255 to 0.0-1.0
    confidence = output[1] / 255.0  # Occupied probability
    return confidence > 0.7  # Threshold

# Main loop: report only on status changes to minimize uplink traffic
last_occupancy = None
while True:
    ret, frame = camera.read()
    if not ret:
        continue

    # Check each parking space
    occupancy = {}
    for space in spaces:
        is_occupied = check_parking_space(frame, space["bbox"])
        occupancy[space["id"]] = is_occupied

    # Update cloud dashboard (only when status changes)
    # Reduces bandwidth: 4 spaces x 10 bytes/status = 40 bytes vs 500 KB image
    if occupancy != last_occupancy:
        send_status_update(occupancy)  # uplink helper, defined elsewhere
        last_occupancy = occupancy

    # Sleep 10 seconds (no need for 30fps monitoring)
    time.sleep(10)

315.5.5 Step 5: Continuous Monitoring and Retraining

Production Monitoring:
- Log inference confidence scores to detect model drift
- Sample 1% of images for manual review (quality assurance)
- Track false positives (marked occupied but actually empty) and false negatives
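
A minimal version of the confidence-based drift check might look like this sketch; the baseline mean and tolerance are illustrative placeholders you would calibrate from your own production logs.

# Confidence-logging drift check (baseline values are illustrative)
import collections
import statistics

recent = collections.deque(maxlen=1000)    # rolling window of confidence scores

def log_prediction(confidence, baseline_mean=0.93, tolerance=0.05):
    """Append a score and flag a sustained drop in mean confidence."""
    recent.append(confidence)
    if len(recent) == recent.maxlen:
        drifted = statistics.mean(recent) < baseline_mean - tolerance
        if drifted:
            print("ALERT: mean confidence dropped - possible model drift")
        return drifted
    return False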

Model Retraining (every 3 months):
- Collect edge cases from production logs (e.g., motorcycles, trucks, snow-covered)
- Add 500-1000 new labeled images to training set
- Retrain model with expanded dataset
- A/B test: Deploy to 10% of cameras, compare accuracy vs old model
- Full rollout if accuracy improves by >1%

Result:
- Initial accuracy: 97.5%
- After 6 months of continuous learning: 98.9%
- False positive rate: <2%

315.6 Knowledge Check

315.7 Summary

Key Applications:

| Application            | Hardware      | Model Size | Latency | ROI                          |
|------------------------|---------------|------------|---------|------------------------------|
| Visual Inspection      | Jetson/Coral  | 3.5 MB     | 5-10 ms | Replaces $60K/year inspector |
| Predictive Maintenance | ESP32         | 30 KB      | 20 ms   | $2M/year savings             |
| Keyword Spotting       | Low-power DSP | 18 KB      | 20 ms   | 40-day battery life          |

Pipeline Best Practices:
1. Data Collection: Capture diverse conditions (lighting, weather, variations)
2. Transfer Learning: Start with a pretrained model (MobileNetV2, EfficientNet)
3. Quantization: INT8 for 4x size reduction and speedup
4. Continuous Learning: Sample production data, retrain quarterly
5. Hybrid Architecture: Local for ~90% of cases, cloud for uncertain cases (see the sketch below)
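
As a closing illustration of the hybrid pattern, the sketch below keeps confident predictions on-device and escalates the rest; `cloud_classify` is a hypothetical endpoint, and the threshold is an assumption to tune.

# Hybrid routing sketch: confident predictions stay local, uncertain ones
# escalate to the cloud (cloud_classify is a hypothetical endpoint)
import numpy as np

def classify(image, local_model, cloud_classify, threshold=0.9):
    probs = local_model.predict(image[np.newaxis, ...])[0]
    if probs.max() >= threshold:        # ~90% of traffic stays on-device
        return int(np.argmax(probs))
    return cloud_classify(image)        # rare uncertain cases hit the network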

315.8 What’s Next

Now that you understand edge AI applications and deployment, continue to: