25  Edge AI Deployment Pipeline

What Edge AI Delivers: Edge AI enables real-time intelligent decision-making directly on devices without cloud dependency, cutting latency from the 200-500ms of a cloud round-trip to 5-50ms locally, reducing bandwidth costs by 90%+, and enabling operation in connectivity-constrained environments.

Investment Framework:

| Metric | Range | Key Consideration |
| --- | --- | --- |
| Hardware Cost | $10-$500 per device | ESP32 ($10) to Jetson AGX ($500) depending on complexity |
| Typical ROI | 3-18 months | Visual inspection and predictive maintenance lead |
| Bandwidth Savings | 90-99% | Process locally, send only insights |
| Latency Reduction | 10-100x | 5-50ms local vs 200-500ms cloud round-trip |

High-Impact Use Cases (ROI Leaders):

| Application | Hardware Investment | Annual Savings | Payback Period |
| --- | --- | --- | --- |
| Visual Inspection | $100/line | $60K labor replacement | 2-3 days |
| Predictive Maintenance | $10K/50 motors | $2M avoided downtime | 2 months |
| Voice/Keyword Spotting | $5/device | Battery life extension | 6 months |

When to Invest in Edge AI:

  • Latency requirements under 100ms (safety-critical, real-time control)
  • Bandwidth costs prohibitive (cameras, high-frequency sensors)
  • Privacy constraints require local processing (healthcare, industrial secrets)
  • Connectivity unreliable (remote sites, mobile equipment)
  • Regulatory requirements mandate data locality

Key Risk Factors: Model accuracy degradation over time (budget for continuous retraining), hardware obsolescence cycles (3-5 years), edge security vulnerabilities, and ML ops complexity requiring specialized skills.

25.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Implement Visual Inspection Systems: Configure edge AI for manufacturing quality control with 99%+ accuracy and calculate throughput requirements
  • Design Predictive Maintenance Pipelines: Analyze vibration-based anomaly detection that predicts failures weeks in advance and evaluate ROI
  • Calculate Power Budgets for Keyword Spotting: Apply two-stage pipeline design to achieve always-on voice detection on ultra-low-power devices (<1 mW)
  • Construct End-to-End Deployment Pipelines: Demonstrate complete edge AI systems from data collection through quantization to production monitoring

In 60 Seconds

Edge AI deploys trained machine learning models directly on IoT devices and gateways, eliminating cloud round-trips and achieving 5-50ms inference latency versus 200-500ms for cloud-based inference. Three high-impact applications drive adoption: visual inspection (99%+ defect detection replacing manual inspection), predictive maintenance (vibration-based failure prediction 2-4 weeks early), and keyword spotting (<1mW always-on voice detection). The deployment pipeline — data collection, training, quantization (4x size reduction), and production monitoring — must be designed end-to-end before hardware selection.

Key Concepts
  • Edge AI Inference: Running a trained ML model on local hardware (MCU, GPU, NPU) to classify or predict from sensor data without cloud connectivity
  • Visual Inspection System: Computer vision pipeline running on edge GPU that detects product defects in real-time at production line speeds (30-120 fps)
  • Predictive Maintenance: Vibration and temperature time-series analysis identifying patterns (bearing wear, imbalance) that precede equipment failure by days or weeks
  • Keyword Spotting: Always-on audio model (10-30KB) on microcontroller detecting wake words at <1mW, triggering a larger voice model only after detection
  • Edge Inference Latency: Time from sensor input to model output on local hardware — 1-10ms on NPU/GPU vs. 200-500ms for cloud round-trip
  • INT8 Quantization: Post-training technique converting 32-bit float model weights to 8-bit integers, achieving 4x size reduction with <2% accuracy loss
  • Deployment Pipeline: Sequence of steps from data labeling through training, optimization, packaging, deployment, and monitoring for edge ML applications
  • Production Monitoring: Continuous tracking of model accuracy, inference latency, and data distribution at edge to detect drift requiring retraining

Minimum Viable Understanding
  • Edge AI runs ML models directly on devices (ESP32, Jetson Nano, Coral TPU) achieving 5-50ms inference latency, compared to 200-500ms for cloud round-trips, making it essential for safety-critical and real-time applications like defect inspection and predictive maintenance.
  • The core tradeoff is model size vs. accuracy: INT8 quantization shrinks models by 4x (e.g., MobileNetV2 from 14 MB to 3.5 MB) with only 1-2% accuracy loss, while techniques like two-stage pipelines (18 KB always-on + 200 KB verification) keep power under 1.1 mW.
  • Right-size your deployment: A 30 KB TFLite model on a $10 ESP32 can detect bearing failures and save $2M/year; you do not always need a GPU. Match hardware to workload – use hybrid edge-cloud (90% local, 10% cloud) for continuous model improvement.

Hey young engineers! Did you know smart devices can have tiny brains that help them think? Sammy the Sensor and the squad are here to explain!

Lila says: “I found something amazing – a tiny computer chip called Eddie that’s smaller than your fingernail, but super smart!”

25.1.1 Meet Eddie the Edge AI Chip!

Eddie is a tiny computer chip smaller than your fingernail, but he’s super smart!

What Makes Eddie Special:

| Superpower | What It Means | Example |
| --- | --- | --- |
| Fast Thinker | Makes decisions in milliseconds | Spots a bad cookie before it reaches the box |
| Always Watching | Never gets tired or bored | Listens for “Hey Alexa” all day long |
| Power Sipper | Uses less energy than a small LED | Battery lasts for months! |
| Privacy Keeper | Keeps secrets on the device | Your voice stays in your home |

25.1.3 Why Edge AI is Like a Superhero

Cloud AI is like calling a superhero headquarters far away:

  • “Hello? There’s a problem!” (send data)
  • Wait… wait… wait… (network delay)
  • “Okay, here’s what to do!” (get response)
  • Total time: 200-500 milliseconds

Edge AI (Eddie) is like having a superhero RIGHT THERE:

  • “Problem spotted!” (local processing)
  • “Fixed!” (immediate action)
  • Total time: 5-50 milliseconds

When something moves fast (like cookies on a conveyor belt or a car on the road), you need Eddie-fast decisions, not phone-call-slow decisions!

25.1.4 Fun Fact Corner

Did you know? The tiny AI chip in a smart speaker that listens for “Hey Alexa” or “OK Google” uses less power than a small nightlight! It can run for YEARS on a tiny battery because it’s so energy-efficient.

Max asks: “But wait – why can’t Eddie just use the internet like we do?” Bella explains: “Imagine you’re playing catch. Would you rather catch the ball yourself (super fast!) or phone a friend far away and ask them what to do (too slow – the ball already hit the ground!)? That’s why Eddie thinks locally!”

Sammy adds: “And the best part? Eddie is so good at sipping power that a tiny battery can keep him running for months. He’s like a flashlight that never runs out!”

Challenge: Can you find 3 devices in your home that might have an “Eddie” inside? (Hint: smart speakers, doorbells, and game controllers!)

Simple Definition: AI that runs directly on devices instead of in the cloud

Edge AI brings artificial intelligence capabilities to the “edge” of the network - the devices themselves. Instead of sending data to powerful cloud servers for analysis, the device processes information locally using a small, efficient AI model.

Cloud AI vs. Edge AI Comparison:

| Aspect | Cloud AI | Edge AI |
| --- | --- | --- |
| Where it runs | Remote data centers | On the device itself |
| Latency | 200-500ms (network delay) | 5-50ms (local) |
| Internet required? | Yes, always | No, works offline |
| Privacy | Data sent to cloud | Data stays on device |
| Bandwidth cost | High (send all data) | Low (send only results) |
| Model size | Unlimited (GB to TB) | Limited (KB to MB) |
| Model accuracy | Highest possible | Good enough for task |

Everyday Edge AI Examples:

| Device | What the Edge AI Does | Why Local Matters |
| --- | --- | --- |
| Smart Speaker | Detects “Hey Alexa” wake word | Always listening, battery-powered |
| Phone Camera | Identifies faces for photos | Privacy - faces don’t leave phone |
| Smart Doorbell | Detects people vs. animals | Works without internet |
| Factory Camera | Spots defects in products | Too fast for cloud round-trip |
| Fitness Tracker | Recognizes exercise types | Battery lasts longer |

The Key Tradeoff:

Edge AI models must be SMALL to fit on limited hardware. This means:

  • Smaller = Faster but less accurate (might miss subtle patterns)
  • Larger = Slower but more accurate (catches more edge cases)

Engineers choose the “right size” model that’s accurate enough for the job while fitting the hardware and power constraints.

Why Edge AI Matters:

  1. Speed: A self-driving car can’t wait 500ms for cloud response
  2. Reliability: Factory inspection can’t stop if internet goes down
  3. Privacy: Healthcare data shouldn’t leave the device
  4. Cost: Sending video to cloud costs money; sending “1 person detected” is almost free

25.2 Visual Inspection: Manufacturing Quality Control

Problem: Detect defects in manufactured parts at 100 items/minute (600ms per item) with 99.5%+ accuracy.

25.2.1 Edge AI Visual Inspection Architecture

Figure: Edge AI visual inspection pipeline showing camera capture, preprocessing, neural network inference on Edge TPU, and actuator control for defect rejection.

25.2.2 Traditional vs Edge AI Approach

Traditional Approach:

  • Human inspector: 30 items/minute, 95% accuracy, high labor cost, fatigue errors
  • Cloud AI: Upload images (500 KB each), 200ms network + 100ms inference = 300ms (meets speed, but bandwidth prohibitive for 1000s of cameras)

Edge AI Solution:

| Component | Specification | Details |
| --- | --- | --- |
| Hardware | Industrial camera + NVIDIA Jetson Nano ($99) or Coral Edge TPU ($60) | Compact, industrial-grade |
| Model | MobileNetV2 defect classifier | 3.5 MB INT8 quantized |
| Training | 10,000 labeled images | Good vs 5 defect types |
| Inference | 5-10ms per image | 99.2% accuracy |
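
The figures above can be sanity-checked against the stated 100 items/minute line speed. A minimal sketch (timing figures from the tables in this section; the actuation margin is an illustrative assumption, not a measured value):

```python
# Throughput sanity check for the visual inspection line
ITEMS_PER_MINUTE = 100
budget_ms = 60_000 / ITEMS_PER_MINUTE         # 600 ms available per item

preprocess_ms = 5                             # resize + normalize
inference_ms = 10                             # worst-case Edge TPU inference
actuation_ms = 50                             # assumed reject-arm trigger + slack

cycle_ms = preprocess_ms + inference_ms + actuation_ms
headroom = budget_ms / cycle_ms
print(f"Budget: {budget_ms:.0f} ms/item, cycle: {cycle_ms} ms, headroom: {headroom:.1f}x")
```

Even with a generous actuation margin, one camera uses roughly a tenth of the 600 ms budget, which is why a single edge device can often cover multiple inspection points.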

Cost Analysis:

| Item | Cost | Notes |
| --- | --- | --- |
| Hardware | $100/line | One-time investment |
| Training | $5K one-time | Data labeling + compute |
| Operation | ~$5/year | Electricity only, no cloud costs |
| ROI | $60K/year savings | Replace human inspector |

25.2.3 Real Deployment Code

# Defect detection inference loop
# (camera, model, reject_actuator, db, and cloud handles are initialized elsewhere)
import time
import random
import cv2
import numpy as np

while True:
    image = camera.capture()  # 1920x1080 RGB

    # Preprocessing (5ms)
    resized = cv2.resize(image, (224, 224))
    normalized = np.expand_dims(resized / 255.0, axis=0)  # add batch dimension

    # Inference on Edge TPU (5ms)
    prediction = model.predict(normalized)[0]

    # Classes: [good, scratch, dent, crack, discolor, other]
    class_id = int(np.argmax(prediction))
    confidence = prediction[class_id]

    if class_id != 0 and confidence > 0.90:  # Defect detected
        # Trigger reject mechanism (pneumatic arm)
        reject_actuator.activate()

        # Log to local database for quality tracking
        thumbnail = cv2.resize(image, (160, 90))
        db.insert_defect(time.time(), class_id, confidence, thumbnail)

        # Only send defect images to cloud (not every image)
        if random.random() < 0.1:  # Sample 10% for continuous learning
            cloud.upload_for_retraining(image, class_id)

    # Total cycle time: 10-15ms (can handle 60-100 items/second)

25.3 Predictive Maintenance: Industrial Equipment

Problem: Predict bearing failure in industrial motors 2-4 weeks before catastrophic failure, avoiding $100K+ downtime.

25.3.1 Edge AI Solution

| Component | Specification | Details |
| --- | --- | --- |
| Hardware | Vibration sensor (accelerometer) + ESP32 | $10 total cost |
| Model | 1D CNN anomaly detection | 30 KB TFLite |
| Data | Vibration FFT features | Frequency spectrum analysis |
| Inference | 20ms per 1-second window | Runs continuously |

Figure: Predictive maintenance pipeline showing vibration data flowing from the accelerometer through FFT feature extraction to CNN anomaly detection, with a three-state output (Normal, Warning, Critical) feeding an alert system.

How It Works:

  1. Accelerometer samples vibration at 10 kHz (10,000 samples/second)
  2. Every 1 second, compute FFT (Fast Fourier Transform) to get frequency spectrum
  3. Extract 64 frequency bins as features (e.g., energy in 10-100 Hz, 100-500 Hz, etc.)
  4. CNN model classifies: Normal vs Early Warning vs Critical
  5. Normal: Continue monitoring | Critical: Immediate alert to maintenance team
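
Steps 2-3 above can be sketched in a few lines of NumPy (the 64-band averaging scheme here is one simple way to bin the spectrum; real deployments tune band edges to the machine being monitored):

```python
import numpy as np

FS = 10_000          # accelerometer sample rate (Hz)
N_BANDS = 64         # frequency-band features fed to the CNN

def extract_features(window: np.ndarray) -> np.ndarray:
    """1 second of vibration (10,000 samples) -> 64 band-energy features."""
    spectrum = np.abs(np.fft.rfft(window))     # magnitude spectrum, 0-5 kHz, 1 Hz bins
    bands = np.array_split(spectrum, N_BANDS)  # group bins into 64 bands
    return np.array([band.mean() for band in bands])

# Synthetic healthy motor: 60 Hz rotation plus harmonics
t = np.arange(FS) / FS
window = (np.sin(2 * np.pi * 60 * t)
          + 0.5 * np.sin(2 * np.pi * 120 * t)
          + 0.3 * np.sin(2 * np.pi * 180 * t))
features = extract_features(window)
print(features.shape, int(np.argmax(features)))   # 64 features; energy sits in band 0
```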

Training Strategy:

  • Collect months of normal operation data (healthy baseline)
  • Inject synthetic anomalies or use historical failure data
  • Autoencoder or one-class SVM to detect “anything unusual”

25.3.2 Vibration Feature Engineering

Figure: Timeline of bearing degradation stages from healthy to critical failure, with the corresponding frequency patterns and amplitude changes at each stage.

| Bearing State | Frequency Pattern | Amplitude | Detection Window |
| --- | --- | --- | --- |
| Healthy | Peak 60 Hz + harmonics (120, 180 Hz) | Stable +/-10% | Baseline reference |
| Early Warning | New defect frequencies (237, 412 Hz) | +30% high-freq | 2-4 weeks early |
| Critical | Broad spectrum noise | +200% spikes | Days before failure |

Key Insight: Edge AI Model detects these patterns in real-time, alerting maintenance teams 2-4 weeks before catastrophic failure.
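
This spectral shift is easy to demonstrate with synthetic data. The sketch below injects the table's defect tones (237 and 412 Hz, illustrative values) into a healthy 60 Hz baseline and measures the energy jump in the 200-500 Hz band:

```python
import numpy as np

FS = 10_000                        # sample rate (Hz); 1-second window, 1 Hz rfft bins
t = np.arange(FS) / FS
rng = np.random.default_rng(0)

def band_energy(signal: np.ndarray, lo_hz: int, hi_hz: int) -> float:
    """Mean spectral magnitude between lo_hz and hi_hz."""
    spectrum = np.abs(np.fft.rfft(signal))
    return float(spectrum[lo_hz:hi_hz].mean())

# Healthy: 60 Hz rotation + harmonics + a little broadband sensor noise
healthy = (np.sin(2 * np.pi * 60 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
           + 0.3 * np.sin(2 * np.pi * 180 * t) + 0.01 * rng.standard_normal(FS))

# Early warning: same baseline plus weak defect tones at 237 and 412 Hz
defective = (healthy + 0.2 * np.sin(2 * np.pi * 237 * t)
             + 0.2 * np.sin(2 * np.pi * 412 * t))

ratio = band_energy(defective, 200, 500) / band_energy(healthy, 200, 500)
print(f"200-500 Hz energy, defective vs healthy: {ratio:.1f}x")
```

A hand-set threshold on such band ratios is the ancestor of what the 1D CNN learns automatically from labeled spectra.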

25.3.3 Deployment Results

| Metric | Value | Impact |
| --- | --- | --- |
| Motors Monitored | 50 | Continuous 24/7 monitoring |
| False Positive Rate | 5% | 2-3 false alarms per year |
| True Positive Rate | 95% | Detected 19 of 20 actual failures |
| Lead Time | 18 days average | Time before predicted failure |
| Hardware Investment | $10K one-time | ESP32 + accelerometers |
| Annual Savings | $2M | Avoided unplanned downtime |
| ROI | 200x | $2M savings / $10K investment |

25.4 Voice and Audio: Keyword Spotting

Problem: Continuously listen for wake word (“Hey Device”) on battery-powered smart speaker, using <1 mW power.

25.4.1 Two-Stage Pipeline

Figure: Two-stage keyword spotting pipeline showing an always-on low-power DSP running an 18 KB model continuously at ~1 mW, triggering a verification stage with a 200 KB model at 50 mW only when the wake word is detected.

| Stage | Component | Model Size | Accuracy | Power | Duty |
| --- | --- | --- | --- | --- | --- |
| Stage 1 | Ultra-low-power DSP | 18 KB | 85% (5% false positive) | 0.5-1 mW | 100% (always on) |
| Stage 2 | Main CPU | 200 KB | 95% | 50 mW | ~0.1% (2 sec, 1-2x/hour) |

Why Two Stages?

  • Stage 1 runs 24/7 on tiny power budget (coarse filter)
  • Stage 2 activates only occasionally (1-2 times/hour) to filter false positives
  • Average power calculation:
    • Stage 1: 1 mW constant
    • Stage 2: 50 mW × 2 sec × 2 times/hour ÷ 3600 sec/hour = 0.056 mW
    • Total: ~1.06 mW
  • Battery life: at ~1 mW average (roughly 1 mA of draw from a low-voltage rail), a 1000 mAh battery lasts about 1000 hours = 40+ days
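
The duty-cycle arithmetic above generalizes to any two-stage design. A small helper (values from this section; the ~1 mA-per-mW battery conversion is the same rough assumption used above):

```python
def average_power_mw(stage1_mw: float = 1.0, stage2_mw: float = 50.0,
                     wake_seconds: float = 2.0, wakes_per_hour: float = 2.0) -> float:
    """Average power: an always-on stage 1 plus a duty-cycled stage 2."""
    stage2_duty = (wake_seconds * wakes_per_hour) / 3600.0   # fraction of time awake
    return stage1_mw + stage2_mw * stage2_duty

avg_mw = average_power_mw()                    # ~1.06 mW
battery_days = (1000 / avg_mw) / 24            # 1000 mAh at ~1 mA per mW (rough)
print(f"{avg_mw:.2f} mW average, ~{battery_days:.0f} days on 1000 mAh")
```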

25.4.3 Audio Feature Extraction

Figure: Signal processing pipeline showing input audio flowing through preprocessing stages, feature extraction, and neural network inference for classification.

| Step | Operation | Output |
| --- | --- | --- |
| 1. Input | 16 kHz, 16-bit PCM | 16,000 samples/second |
| 2. Pre-emphasis | Boost high frequencies | Enhanced speech |
| 3. Framing | 25ms windows, 10ms stride | 100 frames/second |
| 4. MFCC | Mel-Frequency Cepstral Coefficients | 40 coefficients/frame |
| 5. Stacking | 49 frames (490ms context) | Temporal context |
| 6. Tensor | 40 × 49 features | 1960-dimensional input |
| 7. CNN | Neural network classifier | Wake word probability |
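
Steps 3, 5, and 6 are pure bookkeeping and can be verified in a few lines (framing only; the MFCC computation itself, step 4, is omitted here):

```python
import numpy as np

SR = 16_000                    # 16 kHz PCM input
WIN = int(0.025 * SR)          # 25 ms window  -> 400 samples
HOP = int(0.010 * SR)          # 10 ms stride  -> 160 samples
N_MFCC = 40                    # coefficients kept per frame
CONTEXT = 49                   # stacked frames (~490 ms of context)

def frame_audio(pcm: np.ndarray) -> np.ndarray:
    """Slice PCM into overlapping 25 ms frames at a 10 ms stride."""
    n_frames = 1 + (len(pcm) - WIN) // HOP
    return np.stack([pcm[i * HOP : i * HOP + WIN] for i in range(n_frames)])

# Enough audio for exactly 49 frames: 48 hops plus one full window
pcm = np.zeros(HOP * (CONTEXT - 1) + WIN)      # 8,080 samples (~505 ms)
frames = frame_audio(pcm)
print(frames.shape, N_MFCC * CONTEXT)          # (49, 400) frames; 1960-dim model input
```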

25.5 Building an End-to-End Edge AI Pipeline

Scenario: Deploy a smart parking space detector using computer vision on a solar-powered edge device.

Figure: Edge AI deployment lifecycle showing five iterative stages: data collection feeds cloud training, which produces a model for quantization and optimization, followed by edge deployment, then production monitoring that loops back to data collection for continuous improvement.

25.5.1 Step 1: Data Collection

Equipment:
- Raspberry Pi 4 + Camera Module v2 (8MP, $25)
- Mount camera above parking lot, capturing 4 spaces per camera

Data Collection Strategy:
- Capture 1 image every 10 seconds for 2 weeks (120,000 images)
- Vary lighting conditions: morning, afternoon, night, rain, snow
- Capture different car types, angles, partial occupancy

Labeling:
- Use Label Studio or Roboflow to draw bounding boxes around cars
- Classes: [Empty, Occupied]
- 5,000 images manually labeled, 115,000 automatically using pre-trained model + manual review

25.5.2 Step 2: Model Training (Cloud)

# Transfer learning with MobileNetV2
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2

base_model = MobileNetV2(weights='imagenet', include_top=False,
                          input_shape=(224, 224, 3))
base_model.trainable = False  # Freeze pre-trained weights

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(2, activation='softmax')  # Empty vs Occupied
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train for 20 epochs on 5,000 labeled images
history = model.fit(train_dataset, epochs=20, validation_data=val_dataset)
# Result: 97.5% validation accuracy

25.5.3 Step 3: Quantization and Optimization

# Post-training quantization to int8
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Provide representative dataset for calibration
def representative_dataset():
    for i in range(100):
        yield [train_images[i:i+1]]  # Sample calibration data

converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

# Convert
tflite_model = converter.convert()

# Save quantized model
with open('parking_detector_int8.tflite', 'wb') as f:
    f.write(tflite_model)

# Output: Original ~14 MB -> Quantized 3.8 MB (3.7x smaller)

25.5.4 Step 4: Deploy to Edge Device

# Raspberry Pi parking space inference script
import tflite_runtime.interpreter as tflite
import cv2, time
import numpy as np

camera = cv2.VideoCapture(0)  # camera handle (USB/CSI via OpenCV)
# send_status_update() is the plant/gateway uplink, defined elsewhere

# Load quantized model
interpreter = tflite.Interpreter(model_path="parking_detector_int8.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Parking space regions of interest
spaces = [
    {"id": "A1", "bbox": (100, 200, 300, 400)},
    {"id": "A2", "bbox": (350, 200, 550, 400)},
    # ... additional spaces defined similarly
]

def check_parking_space(image, bbox):
    """Crop, resize, and classify a single parking space."""
    x1, y1, x2, y2 = bbox
    crop = cv2.resize(image[y1:y2, x1:x2], (224, 224))
    input_data = np.expand_dims(crop, axis=0).astype(np.uint8)
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()
    output = interpreter.get_tensor(output_details[0]['index'])[0]
    return output[1] > int(0.7 * 255)  # Occupied threshold; output is uint8 (0-255), 0.7 ~ 178

# Main loop: check spaces every 10 s, send only status changes
while True:
    ret, frame = camera.read()
    if not ret:
        continue
    occupancy = {s["id"]: check_parking_space(frame, s["bbox"])
                 for s in spaces}
    send_status_update(occupancy)  # 40 bytes vs 500 KB image
    time.sleep(10)

25.5.5 Step 5: Continuous Monitoring and Retraining

Production Monitoring:
- Log inference confidence scores to detect model drift
- Sample 1% of images for manual review (quality assurance)
- Track false positives (marked occupied but actually empty) and false negatives

Model Retraining (every 3 months):
- Collect edge cases from production logs (e.g., motorcycles, trucks, snow-covered)
- Add 500-1000 new labeled images to training set
- Retrain model with expanded dataset
- A/B test: Deploy to 10% of cameras, compare accuracy vs old model
- Full rollout if accuracy improves by >1%

Result:
- Initial accuracy: 97.5%
- After 6 months of continuous learning: 98.9%
- False positive rate: <2%

25.6 Common Pitfalls in Edge AI Deployment

Common Pitfalls

Pitfall 1 – Ignoring Data Drift in Production Models trained on lab data degrade in the field. A parking detector trained on summer images may drop from 97.5% to 89% accuracy in winter due to snow, ice reflections, and different lighting. Fix: Implement a continuous learning pipeline – sample 1% of production images, retrain quarterly, and A/B test before full rollout. Budget for ongoing data labeling, not just initial training.

Pitfall 2 – Oversizing the Model Engineers often default to large models (ResNet-50 at 100 MB) when a quantized MobileNetV2 (3.5 MB) achieves nearly identical accuracy for the target task. Larger models consume more power, require expensive hardware (GPU vs. $10 MCU), and increase inference latency without proportional accuracy gains on constrained classification tasks. Fix: Start with the smallest viable model and scale up only if accuracy is insufficient after proper training and quantization.

Pitfall 3 – Skipping Quantization Calibration Naive INT8 quantization without a representative calibration dataset can cause 5-10% accuracy drops instead of the expected 1-2%. This happens because the quantization range is not properly calibrated to the actual data distribution. Fix: Always provide 100-500 representative samples that cover the full range of expected inputs (lighting conditions, object variations, edge cases) during the quantization step.

Pitfall 4 – Pure Edge with No Feedback Loop Deploying edge AI as a “set and forget” system with no mechanism to identify uncertain predictions or collect edge cases means the model never improves. Fix: Use a hybrid architecture – process high-confidence predictions locally (90% of cases) and route uncertain predictions (confidence between 0.5-0.8) to the cloud for both immediate higher-accuracy inference and future retraining data collection.
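
The fix in Pitfall 4 reduces to a small routing function (thresholds and return labels here are illustrative, not from a specific library):

```python
def route_prediction(confidence: float, low: float = 0.5, high: float = 0.8) -> str:
    """Hybrid edge-cloud routing: act on confident predictions locally,
    escalate uncertain ones to the cloud for re-inference and retraining data."""
    if confidence >= high:
        return "edge"      # act locally, nothing uploaded (the common ~90% case)
    if confidence >= low:
        return "cloud"     # uncertain: cloud inference + log for retraining
    return "discard"       # below any actionable confidence

print(route_prediction(0.95), route_prediction(0.65), route_prediction(0.30))
# -> edge cloud discard
```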

Pitfall 5 – Underestimating Preprocessing Costs Teams focus on inference time (e.g., 5ms on Edge TPU) but overlook that image resizing, normalization, and feature extraction (e.g., FFT, MFCC) can take 5-20ms, sometimes exceeding inference time. For vibration analysis, the FFT alone on an ESP32 takes 10-15ms for 10,000 samples. Fix: Profile the entire pipeline end-to-end, not just the model inference step. Use hardware-accelerated preprocessing where available and pipeline preprocessing with inference for overlapping execution.

25.7 Knowledge Check

25.8 Worked Example: Edge AI ROI for Predictive Maintenance on a Motor Fleet

Worked Example: Vibration-Based Bearing Failure Detection for 200 Motors

Scenario: A beverage bottling plant has 200 electric motors (conveyor drives, pumps, compressors) averaging $8,000 replacement cost. Historically, 12 motors fail per year with unplanned downtime costing $15,000 per incident (lost production + emergency labor). The team evaluates deploying edge AI vibration sensors vs continuing with reactive maintenance.

Step 1: Current Cost of Reactive Maintenance

| Cost Category | Annual Cost |
| --- | --- |
| 12 unplanned motor replacements | $96,000 |
| Unplanned downtime (12 x $15,000) | $180,000 |
| Emergency overtime labor | $36,000 |
| Total reactive maintenance | $312,000/year |

Step 2: Edge AI System Design

| Component | Per Motor | Fleet (200 motors) |
| --- | --- | --- |
| MEMS accelerometer (ADXL345) | $3 | $600 |
| ESP32 MCU + TFLite Micro runtime | $5 | $1,000 |
| LoRa radio (intra-plant mesh) | $8 | $1,600 |
| Installation (mount + wire) | $25 | $5,000 |
| Gateway + dashboard server | — | $2,000 |
| ML model development (one-time) | — | $20,000 |
| Total deployment | $41/motor | $30,200 |

Step 3: Edge AI Model Specifications

| Parameter | Value |
| --- | --- |
| Model type | 1D-CNN on FFT vibration spectrum |
| Model size | 28 KB (INT8 quantized) |
| Input | 256-point FFT from 1-second vibration sample |
| Inference time on ESP32 | 18 ms |
| Sample frequency | Every 15 minutes |
| Detection accuracy | 94% (trained on 6 months of failure data) |
| False positive rate | 3% (6 false alerts/year across fleet) |
| Lead time before failure | 2-6 weeks advance warning |

Step 4: Predictive Maintenance Cost

| Cost Category | Annual Cost |
| --- | --- |
| Planned motor replacements (detect 11 of 12 failures at 94%) | $88,000 (same motors, but scheduled) |
| Planned downtime (scheduled during shift changes) | $11,000 (vs $180K unplanned) |
| 1 undetected failure (6% miss rate) | $23,000 |
| False positive investigations (6/year x $200 labor) | $1,200 |
| System maintenance (firmware, model retraining) | $5,000 |
| Total predictive maintenance | $128,200/year |

Step 5: ROI Calculation

| Metric | Value |
| --- | --- |
| Annual savings | $312,000 - $128,200 = $183,800 |
| System cost (one-time) | $30,200 |
| Payback period | 2 months |
| 3-year ROI | ($183,800 x 3 - $30,200) / $30,200 = 1,726% |
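
Steps 1-5 collapse into a few lines of arithmetic; all figures come from the tables above:

```python
# ROI arithmetic for the 200-motor worked example
reactive_annual = 96_000 + 180_000 + 36_000                     # Step 1 total
predictive_annual = 88_000 + 11_000 + 23_000 + 1_200 + 5_000    # Step 4 total
system_cost = 30_200                                             # Step 2 total

annual_savings = reactive_annual - predictive_annual
payback_months = system_cost / (annual_savings / 12)
roi_3yr_pct = 100 * (annual_savings * 3 - system_cost) / system_cost

print(f"${annual_savings:,}/yr saved, {payback_months:.1f}-month payback, "
      f"{roi_3yr_pct:.0f}% 3-year ROI")
# -> $183,800/yr saved, 2.0-month payback, 1726% 3-year ROI
```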

Why edge, not cloud? The ESP32 runs inference locally in 18 ms. Cloud inference would require streaming raw vibration data instead: at this chapter's 10 kHz, 16-bit sampling, continuous streaming from 200 motors is 200 x 10,000 samples/s x 2 bytes = 4 MB/s, roughly 345 GB/day. At $0.09/GB cellular, that is about $31/day (over $11K every year) in data transfer alone, versus a $30K one-time edge deployment, before counting cloud compute costs or the dependency on uninterrupted connectivity.

Predictive maintenance ROI scales with the failure costs avoided and with detection accuracy: \(\text{Annual savings} = C_{\text{reactive}} - C_{\text{predictive}}\). Worked example: $312,000 reactive baseline - $128,200 predictive program cost = $183,800 annual savings, giving a 2-month payback on the $30,200 system investment ($30,200 / ($183,800/12) ≈ 2.0 months).

25.9 Summary

Key Applications:

| Application | Hardware | Model Size | Latency | ROI |
| --- | --- | --- | --- | --- |
| Visual Inspection | Jetson/Coral | 3.5 MB | 5-10ms | Replace $60K/year inspector |
| Predictive Maintenance | ESP32 | 30 KB | 20ms | $2M/year savings |
| Keyword Spotting | Low-power DSP | 18 KB | 20ms | 40-day battery life |

Pipeline Best Practices:

  1. Data Collection: Capture diverse conditions (lighting, weather, variations)
  2. Transfer Learning: Start with pretrained model (MobileNetV2, EfficientNet)
  3. Quantization: INT8 for 4x size reduction and speedup
  4. Continuous Learning: Sample production data, retrain quarterly
  5. Hybrid Architecture: Local for 90%, cloud for uncertain cases

25.10 Knowledge Check

25.11 What’s Next

Now that you can implement edge AI applications and deployment pipelines, continue to:

| Topic | Chapter | Description |
| --- | --- | --- |
| TinyML Gesture Recognition | Edge AI Lab | Build a hands-on TinyML project with quantization, pruning, and inference |
| Edge-Fog Hierarchy | Edge-Fog Computing | Analyze how edge AI fits in the broader edge-fog-cloud architecture |
| Distributed Orchestration | Fog Production and Review | Evaluate orchestration strategies for distributed edge AI deployments |