45 Predictive Maintenance


45.1 Start With the Story

Start with a machine that has not failed yet, but is beginning to leave evidence in vibration, temperature, current, sound, or operating history. Predictive maintenance is the story of turning weak signals into a justified intervention before downtime, safety risk, or unnecessary replacement costs appear.

Phoebe’s Field Notes: Why a Bearing Fault Lands at 93 Hz, Not Some Other Number


Phoebe’s Why

An IEPE accelerometer turns bearing vibration into a voltage the same general way every piezoelectric sensor does: a seismic mass squeezes a crystal, and the crystal's charge output is proportional to that squeeze force. But the number this chapter actually cares about is not the sensor's sensitivity; it is the frequency at which the fault shows up, and that is pure mechanical kinematics, not electronics. A defect on the outer race is struck once each time a rolling element passes over it, and that passing rate is fixed by the bearing's own geometry and the shaft speed, not by anything in the signal chain. This chapter's own warning, “downsampling a 10 kHz vibration waveform to a 1 Hz dashboard trend can erase the bearing information”, only makes sense once you know what frequency that information actually lives at.

The Derivation

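The cage carrying the rolling elements turns with the ball centers. With pure rolling, each ball center moves at the average of its two contact velocities: zero at the stationary outer race and the inner-race surface speed at the inner contact. Dividing that center speed by the pitch radius gives the cage rate: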

\[f_{cage} = \frac{f_r}{2}\left(1 - \frac{B_d}{P_d}\cos\phi\right)\]

Each of $n$ balls strikes a fixed point on the outer race once per cage revolution, giving the outer-race ball-pass frequency directly, and the inner-race rate follows from the relative speed between the cage and the (rotating) inner race:

\[\mathrm{BPFO} = n\,f_{cage} = \frac{n}{2}f_r\left(1-\frac{B_d}{P_d}\cos\phi\right), \qquad \mathrm{BPFI} = \frac{n}{2}f_r\left(1+\frac{B_d}{P_d}\cos\phi\right)\]

Nyquist then sets the highest frequency any sample rate $f_s$ can represent without folding it back into the trusted band:

\[f_{max} = \frac{f_s}{2}\]

Worked Numbers: A Catalog-Typical Motor Bearing

The chapter names BPFO/BPFI and a “10 kHz vibration waveform” but not a specific motor, so take catalog-typical values: a 4-pole induction motor at 1780 rpm (60 Hz line, typical slip), $n=9$ balls, $B_d/P_d=0.3$, $\phi=0$ (deep-groove ball bearing).

Shaft rate: $f_r = 1780/60 \approx 29.67$ Hz
$\mathrm{BPFO} = (9/2)\times29.67\times(1-0.3) \approx 93.5$ Hz; $\mathrm{BPFI} = (9/2)\times29.67\times(1+0.3) \approx 173.6$ Hz
At the chapter’s own $f_s=10$ kHz: Nyquist limit $=5.00$ kHz – BPFO and BPFI sit comfortably inside the trusted band, fully resolvable.
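At the chapter's own downsampled “1 Hz dashboard trend”: Nyquist limit $=0.5$ Hz. If the waveform is simply decimated with no anti-alias filter, both fault tones alias: $93.5$ Hz folds to $|93.5 - 93\times1|=0.5$ Hz and $173.6$ Hz folds to $|173.6-174\times1|=0.4$ Hz. The fault does not vanish; it reappears disguised as a slow, ordinary-looking trend line. If the waveform is instead filtered or averaged before decimation, the tones are removed outright. Either way the frequency that identifies the damaged race is gone, which is exactly the failure mode this chapter's own sentence warns about, now with a number attached.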
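The worked numbers above can be reproduced with a short calculation. This is a minimal sketch in plain JavaScript, assuming the same catalog-typical bearing ($n=9$, $B_d/P_d=0.3$, $\phi=0$). The helper names `bearingFrequencies` and `aliasedFrequency` are hypothetical, and the alias rule assumes point decimation with no anti-alias filter.

```javascript
// Bearing defect frequencies from geometry, following the cage,
// BPFO, and BPFI formulas in the derivation above.
function bearingFrequencies({ shaftHz, balls, bdOverPd, contactAngleDeg = 0 }) {
  const ratio = bdOverPd * Math.cos((contactAngleDeg * Math.PI) / 180);
  const cageHz = (shaftHz / 2) * (1 - ratio);
  return {
    cageHz,
    bpfo: balls * cageHz,                      // n * f_cage
    bpfi: (balls / 2) * shaftHz * (1 + ratio), // n * (f_r - f_cage)
  };
}

// Apparent frequency of a tone sampled at fs with no anti-alias filter:
// its distance from the nearest integer multiple of fs.
function aliasedFrequency(f, fs) {
  return Math.abs(f - fs * Math.round(f / fs));
}

const shaftHz = 1780 / 60; // 4-pole motor at typical slip
const freqs = bearingFrequencies({ shaftHz, balls: 9, bdOverPd: 0.3 });

console.log(freqs.bpfo.toFixed(2), freqs.bpfi.toFixed(2)); // 93.45 173.55
for (const fs of [10000, 1]) {
  const nyquist = fs / 2;
  for (const [name, f] of [["BPFO", freqs.bpfo], ["BPFI", freqs.bpfi]]) {
    const shown = f < nyquist ? f : aliasedFrequency(f, fs);
    console.log(`fs=${fs} Hz: ${name} appears at ${shown.toFixed(2)} Hz`);
  }
}
```

Carrying $f_r$ unrounded puts both aliases near 0.45 Hz rather than the 0.5 Hz and 0.4 Hz obtained from the rounded values. The exact alias shifts with every small change in slip, which is part of why it reads as ordinary drift on a trend chart.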

45.2 Learning Objectives

After completing this chapter, you will be able to:

Apply predictive maintenance patterns using IoT sensor data
Compare reactive, preventive, and predictive maintenance strategies
Design vibration analysis systems for rotating machinery
Implement machine learning models for remaining useful life prediction
Calculate ROI for predictive maintenance investments

Chapter Roadmap

This chapter moves from maintenance problem to deployable IIoT program:

First we compare reactive, preventive, and predictive maintenance so the value case is clear.
Then we design the signal chain around named assets, fault signatures, operating context, and reviewable evidence.
Next we inspect vibration, thermal, and machine-learning methods before turning alerts into work orders.
Finally we test the economics, implementation phases, quizzes, and common pitfalls that decide whether technicians trust the system.

Checkpoints recap the operating decisions; deep calculations and interactives let you inspect the numbers without losing the main path.

For Beginners: Predictive Maintenance

Predictive maintenance is like visiting the doctor for a check-up before you feel sick. Instead of waiting for a factory machine to break down (which is expensive and dangerous), sensors listen to the machine's vibrations, temperature, and sounds to spot small warning signs, often weeks in advance. It is the same idea as a car's oil-life monitor, which tells you to change the oil based on how the engine has actually been driven rather than at a fixed mileage, except that IoT sensors watch the machine automatically, 24 hours a day.

45.3 Prerequisites

Before diving into this chapter, you should be familiar with:

Industry 4.0 Fundamentals: Core concepts of Industry 4.0 and smart manufacturing
Real-Time Requirements and ISA-95: Understanding of automation levels and data flows
Data Storage and Databases: Time-series storage for industrial data

Cross-Hub Connections

Learning Resources:

Content Hub - Search for predictive maintenance practice checks, ROI calculations, vibration analysis, and ML model selection resources
Simulations Hub - Explore vibration signal analysis simulations and FFT visualization tools
Videos Hub - Watch real-world predictive maintenance and smart-factory implementation examples
Knowledge Gaps Hub - Address common misconceptions about maintenance strategies and ROI calculations
Concept Navigator - Explore how predictive maintenance connects to IIoT, ML/AI, time-series databases, and digital twins

45.4 PdM Turns Signals into Work

Predictive maintenance is valuable when a signal changes a maintenance action before the asset fails. The goal is not to collect every possible waveform. The goal is to detect a developing fault early enough to order parts, schedule labor, protect safety, and avoid unplanned downtime.

The strongest candidates are assets with expensive failure modes and measurable degradation patterns: motors, pumps, compressors, fans, gearboxes, spindles, conveyors, chillers, turbines, and critical bearings. The signal may come from vibration, temperature, acoustic emission, motor current, pressure, oil debris, lubricant analysis, cycle counts, or control-system operating context.

Predictive maintenance pipeline from vibration sensing through edge FFT, machine-learning prediction, and maintenance alert.

A useful deployment starts with one named asset and one named fault. For example, a drive-end bearing on a cooling pump might use a triaxial accelerometer, motor current, and discharge pressure. The first proof is whether the system can distinguish normal load changes from a developing bearing defect, then raise an alert early enough for a planned inspection. If the alert cannot become a work order with parts, labor, and safe access, the analytics result has not yet created maintenance value.

The business case should also include what will not be predicted. Some faults arrive too suddenly, some assets lack repeatable operating cycles, and some failures are cheaper to repair after failure than to instrument continuously. A strong PdM program is selective: monitor the expensive, repeatable, observable failure modes first, then expand when technicians trust the alerts and the measured downtime reduction is real.

Asset criticality: Start where failure stops production, damages equipment, creates safety risk, or causes high emergency repair cost.
Fault signature: Choose sensors because they expose a known degradation mechanism, not because they are easy to install.
Maintenance action: Define the work order, part, inspection, slowdown, shutdown, or operator response that the alert should trigger.
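The criteria above can be turned into a first-pass shortlist. This is a minimal sketch, assuming the team rates each asset from 1 to 5 on a few consequence criteria. The weights, asset names, and ratings are hypothetical, and the `observableSignature` flag stands in for the fault-signature check.

```javascript
// Hypothetical weights; agree real ones with operations, safety, and maintenance.
const WEIGHTS = { production: 0.35, safety: 0.3, repairCost: 0.2, partLeadTime: 0.15 };

// Weighted consequence score from 1-5 ratings.
function criticality(ratings) {
  return Object.entries(WEIGHTS).reduce((sum, [key, w]) => sum + w * ratings[key], 0);
}

// Keep only assets whose failure mode leaves a measurable precursor,
// then rank by consequence. A critical asset that fails suddenly, with
// no observable degradation, is not a PdM candidate.
function rankCandidates(assets) {
  return assets
    .filter((a) => a.observableSignature)
    .map((a) => ({ id: a.id, score: criticality(a.ratings) }))
    .sort((a, b) => b.score - a.score);
}

const ranked = rankCandidates([
  { id: "cooling-pump-DE-bearing", observableSignature: true,
    ratings: { production: 5, safety: 3, repairCost: 4, partLeadTime: 4 } },
  { id: "office-HVAC-fan", observableSignature: true,
    ratings: { production: 1, safety: 1, repairCost: 2, partLeadTime: 1 } },
  { id: "conveyor-shear-pin", observableSignature: false,
    ratings: { production: 4, safety: 2, repairCost: 1, partLeadTime: 1 } },
]);
console.log(ranked); // pump bearing first; the shear pin is excluded
```

A weighted score only orders the conversation; the team still has to confirm that each shortlisted fault can become a planned work order.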

45.5 Design the Signal Chain First

A practical PdM pilot starts with failure modes and effects analysis, asset history, and maintenance records. If bearing wear is the target, the sensor placement, sampling rate, mounting method, machine speed, load state, and baseline period matter more than the dashboard. A poorly mounted accelerometer can produce clean-looking data that is useless for diagnosis.

Useful features depend on the asset. Rotating equipment may use RMS vibration, peak acceleration, crest factor, velocity, FFT bands, envelope spectra, bearing defect frequencies such as BPFO and BPFI, temperature trend, and motor current signature analysis. Process equipment may use pressure differential, flow, valve position, energy use, startup time, or compressor discharge temperature.

For a first pilot, collect enough context to make each alert reviewable by a maintenance engineer. Record the sensor location, axis orientation, mounting method, sampling rate, machine speed, load, product recipe, recent repair history, alarm threshold, and who inspected the asset after the alert. Then compare the alert to a physical finding: looseness, imbalance, misalignment, cavitation, lubrication failure, clogged filter, damaged bearing race, or no fault found. This evidence record keeps the project out of demo mode and gives technicians a reason to trust or challenge the model.

Pick the failure mode. Name the component, mechanism, and business consequence: bearing wear, imbalance, misalignment, cavitation, lubrication loss, clogged filter, or belt degradation.
Capture operating context. Store speed, load, recipe, product, duty cycle, ambient temperature, maintenance event, and asset state with the sensor values.
Choose the detection method. Start with thresholds and trend rules where physics is clear; add anomaly detection, random forest, XGBoost, or sequence models only when labeled history and drift controls justify them.
Close the workflow. Route alerts to CMMS/EAM systems such as IBM Maximo, SAP PM, or maintenance ticketing, and track whether technicians confirmed the fault.
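The detection-method step above recommends thresholds and trend rules first, and even a simple rule can respect operating context. This is a minimal sketch, assuming one scalar feature per window (for example vibration RMS) tagged with an operating state. The state names, the $k=3$ standard-deviation margin, and the three-window persistence requirement are illustrative assumptions, not tuned values.

```javascript
// Mean and sample standard deviation of a feature, per operating state.
function buildBaseline(samples) {
  const byState = new Map();
  for (const { state, value } of samples) {
    if (!byState.has(state)) byState.set(state, []);
    byState.get(state).push(value);
  }
  const baseline = new Map();
  for (const [state, values] of byState) {
    const mean = values.reduce((a, v) => a + v, 0) / values.length;
    const variance = values.reduce((a, v) => a + (v - mean) ** 2, 0) / (values.length - 1);
    baseline.set(state, { mean, std: Math.sqrt(variance) });
  }
  return baseline;
}

// Alert only after `persist` consecutive windows exceed mean + k*std for
// their own state. Windows from a state with no baseline never count as
// exceedances here; in practice they should be routed for review.
function evaluate(windows, baseline, k = 3, persist = 3) {
  let run = 0;
  return windows.map(({ state, value }) => {
    const ref = baseline.get(state);
    const exceeded = ref !== undefined && value > ref.mean + k * ref.std;
    run = exceeded ? run + 1 : 0;
    return run >= persist;
  });
}

const baseline = buildBaseline([
  ...[2.0, 2.1, 1.9, 2.0, 2.05, 1.95].map((value) => ({ state: "full-load", value })),
  ...[1.0, 1.1, 0.9, 1.0, 1.05, 0.95].map((value) => ({ state: "half-load", value })),
]);

// 1.9 is normal at full load but far above the half-load baseline.
const alerts = evaluate(
  [
    { state: "full-load", value: 2.1 },
    { state: "half-load", value: 1.9 },
    { state: "half-load", value: 1.95 },
    { state: "half-load", value: 2.0 },
    { state: "full-load", value: 2.05 },
  ],
  baseline,
);
console.log(alerts); // [ false, false, false, true, false ]
```

A single global threshold would either miss the half-load excursion or fire on every full-load window; keying the baseline to operating state avoids both.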

45.6 PdM Needs Physics and Workflow

Condition data needs a traceable path from sensor to action. An IEPE accelerometer, MEMS sensor, current transformer, oil particle counter, or temperature probe may connect to an edge DAQ, PLC, or gateway. The gateway may compute FFT features locally, forward time-series data through OPC UA or MQTT, and store results in a historian such as AVEVA PI System, a time-series database such as InfluxDB or TimescaleDB, or a cloud data lake.

Sampling choices change what faults can be seen. Nyquist limits, anti-alias filtering, window length, spectral resolution, tachometer references, sensor orientation, mounting stiffness, and unit conversion all affect the signal. Downsampling a 10 kHz vibration waveform to a 1 Hz dashboard trend can erase the bearing information that maintenance needed.
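To see what a 1 Hz trend keeps and loses, the following minimal sketch reduces a synthetic 10 kHz waveform to one point per second. The signal, a 29.67 Hz shaft tone plus a smaller fault tone at 93.5 Hz or 173.6 Hz, is an assumption chosen to match the bearing example earlier in the chapter. `blockTrend` is a hypothetical helper.

```javascript
const fs = 10000; // waveform sample rate, Hz
const seconds = 10;

// Synthetic waveform (assumption): 1x shaft tone plus a smaller fault tone.
function waveform(faultHz) {
  return Array.from({ length: fs * seconds }, (_, n) => {
    const t = n / fs;
    return Math.sin(2 * Math.PI * 29.67 * t) + 0.2 * Math.sin(2 * Math.PI * faultHz * t);
  });
}

// Collapse each block of samples into one dashboard point (mean and RMS).
function blockTrend(x, blockSize) {
  const points = [];
  for (let start = 0; start + blockSize <= x.length; start += blockSize) {
    let sum = 0;
    let sumSq = 0;
    for (let i = start; i < start + blockSize; i++) {
      sum += x[i];
      sumSq += x[i] * x[i];
    }
    points.push({ mean: sum / blockSize, rms: Math.sqrt(sumSq / blockSize) });
  }
  return points;
}

const outerRace = blockTrend(waveform(93.5), fs);  // BPFO-like fault tone
const innerRace = blockTrend(waveform(173.6), fs); // BPFI-like fault tone
for (let s = 0; s < 3; s++) {
  console.log(
    `t=${s}s mean=${outerRace[s].mean.toFixed(3)}`,
    `rms(outer)=${outerRace[s].rms.toFixed(3)} rms(inner)=${innerRace[s].rms.toFixed(3)}`,
  );
}
```

The block means sit near zero because each one-second block averages many cycles of both tones. The block RMS is essentially the same whichever race is damaged, so the trend still shows that energy exists but can no longer say where in the spectrum it came from.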

Model operations are part of the design. Alerts need severity, confidence, asset id, feature values, baseline comparison, recent maintenance state, and recommended inspection. Teams must handle false positives, false negatives, seasonal operation, new product mixes, replaced parts, firmware changes, sensor drift, and concept drift. A PdM system that cannot learn from technician outcomes becomes an expensive alarm list.

The data model should keep raw windows, derived features, and maintenance events connected. A feature such as vibration RMS or envelope energy should carry the source sensor, asset id, timestamp, window size, filter settings, firmware version, and operating state. The work-order outcome should then link back to the same feature window. That linkage lets the team retrain models, audit false alarms, compare edge firmware versions, and prove whether the warning arrived before the spare-part lead time and planned maintenance window.

Feature lineage: Keep sampling rate, window, filter, unit, sensor location, and firmware/configuration version with derived features.
Alert economics: Tune thresholds against downtime cost, inspection cost, spare-part lead time, and technician trust.
Feedback loop: Capture technician disposition, part condition, root cause, and return-to-service date so the model and maintenance plan improve.
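The lineage and feedback ideas above can be expressed as plain records. This is a minimal sketch; every field name, identifier, date, and the 14-day spare-part lead time are hypothetical placeholders, not a historian or CMMS schema. It checks that a feature carries its lineage and that the linked work order arrived with enough lead time to get the part.

```javascript
// Hypothetical field names; map them to your own historian and CMMS schema.
const REQUIRED_LINEAGE = [
  "assetId", "sensorId", "sensorLocation", "timestamp", "samplingRateHz",
  "windowSamples", "filter", "unit", "firmwareVersion", "operatingState",
];

// List the lineage fields a derived feature is missing (empty = reviewable).
function missingLineage(feature) {
  return REQUIRED_LINEAGE.filter((key) => feature[key] === undefined || feature[key] === null);
}

// Days between the alert and the planned repair it made possible.
function leadDays(alertIso, repairIso) {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  return (Date.parse(repairIso) - Date.parse(alertIso)) / MS_PER_DAY;
}

const feature = {
  windowId: "PUMP-07/ACC-DE-1/2024-03-01T08:00:00Z",
  assetId: "PUMP-07", sensorId: "ACC-DE-1", sensorLocation: "drive-end, radial",
  timestamp: "2024-03-01T08:00:00Z", samplingRateHz: 10000, windowSamples: 8192,
  filter: "high-pass 10 Hz", unit: "g", firmwareVersion: "1.4.2",
  operatingState: "full-load", name: "envelopeEnergy", value: 0.42,
};

// The work-order outcome links back to the exact feature window.
const workOrder = {
  featureWindowId: feature.windowId,
  disposition: "confirmed", finding: "outer race spall",
  plannedRepairAt: "2024-03-22T08:00:00Z",
};

const sparePartLeadDays = 14; // assumed supplier lead time
const days = leadDays(feature.timestamp, workOrder.plannedRepairAt);
console.log(missingLineage(feature)); // []
console.log(days, days >= sparePartLeadDays); // 21 true
```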

Checkpoint: Signal Chain Design

You now know:

Predictive maintenance starts with one named asset and one named fault, not with every possible waveform.
Reviewable alerts need sensor location, axis orientation, sampling rate, speed, load, baseline period, and recent maintenance state.
The data model should keep raw windows, derived features, and work-order outcomes connected so false alarms and confirmed faults can improve the system.

45.7 Introduction

One of the highest-value applications of Industrial IoT is predictive maintenance. By continuously monitoring equipment health through vibration, temperature, and other sensors, manufacturers can often detect developing faults weeks before failure and schedule repairs during planned downtime rather than suffering costly unplanned outages.

Predictive Maintenance ROI Basics

Core Concept: Predictive maintenance uses condition signals such as vibration, temperature, acoustic, current, oil, pressure, and operating context to detect degradation early enough for planned maintenance.
Why It Matters: The value comes from converting uncertain failure risk into scheduled work. A useful program reduces emergency repair, lost production, safety exposure, and wasted preventive replacement, but the economics depend on the asset, failure mode, spare-part lead time, labor availability, and measured alert quality.
Key Takeaway: Start with high-criticality assets and known fault signatures. Define the sensor, sampling rate, baseline period, alert threshold, work-order path, and evidence record before claiming a prediction window or ROI.

Sammy Listens to Machines

Hey there, young engineer! Let’s learn about predictive maintenance with the Sensor Squad!

Sammy the Sensor has a new job at a candy factory! His mission: keep the big machines running so they can make chocolate bars all day long.

The Problem: The giant chocolate mixer broke down yesterday! Now there’s no chocolate, and everyone is sad. The repair took 3 days because nobody knew it was about to break.

Sammy’s Solution: Be a Machine Doctor!

Sammy decides to become like a doctor who listens to your heartbeat. But instead of a stethoscope, Sammy uses special sensors:

Vibration Sensor (like feeling a cat purr): Sammy sticks to the mixer and feels how it shakes. If it starts shaking funny, something’s wrong!
Temperature Sensor (like checking for a fever): If the mixer gets too hot, it might be getting sick
Sound Sensor (like hearing a squeaky wheel): Machines make different sounds when they’re healthy vs unhealthy

How Sammy Saves the Day:

Monday: Sammy notices the mixer is shaking a tiny bit more than usual
Tuesday: The shaking gets worse, and the temperature goes up a little
Wednesday: Sammy sends an alert: “Hey! Fix me this weekend before I break!”
Saturday: The maintenance team replaces a worn bearing in just 2 hours
Monday: The mixer is back to making chocolate perfectly!

The Magic: Instead of waiting for the machine to break (and losing 3 days of chocolate!), Sammy helped fix it during the weekend when nobody needed it anyway. That’s called predictive maintenance - predicting problems before they happen!

Sensor Squad Memory Trick:

Vibration = Feeling the machine’s “heartbeat”
Temperature = Checking for “fever”
Prediction = Being a fortune teller for machines
Maintenance = Giving machines their medicine before they get really sick

45.8 Maintenance Strategies Comparison

Time: ~15 min | Difficulty: Advanced | Unit: P03.C06.U06

Key Concepts

Asset Criticality: Ranking equipment by production impact, safety consequence, repair cost, spare-part lead time, and whether failure stops a constrained process.
Fault Signature: A measurable condition pattern, such as 1x vibration for imbalance, 2x vibration for misalignment, BPFO/BPFI bearing bands, rising temperature, current imbalance, or pressure drift.
Baseline Profile: A reference record of normal behavior under known speed, load, recipe, environment, and maintenance state.
Feature Extraction: Converting raw windows into reviewable values such as RMS vibration, crest factor, FFT bands, envelope energy, thermal rise, or motor-current harmonics.
Remaining Useful Life (RUL): An estimate of time or cycles until a failure threshold is likely, valid only for the modeled fault mode and operating context.
CMMS/EAM Integration: Routing alerts into maintenance systems so inspection, parts, labor, and technician disposition are captured.
Alert Precision and Recall: Measures of whether alerts correspond to real faults and whether important faults are missed.
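Alert precision and recall can be computed directly from technician dispositions once alerts and failures are linked to work orders. This is a minimal sketch; the counts are hypothetical, and a missed fault here means a confirmed failure with no preceding alert.

```javascript
// Precision: share of alerts that found a real fault.
// Recall: share of real faults that were preceded by an alert.
function alertQuality({ confirmed, noFaultFound, missedFailures }) {
  const ratio = (num, den) => (den === 0 ? null : num / den);
  return {
    precision: ratio(confirmed, confirmed + noFaultFound),
    recall: ratio(confirmed, confirmed + missedFailures),
  };
}

// Hypothetical quarter: 24 alerts, 18 confirmed, 6 no fault found;
// 2 failures occurred with no prior alert.
const quality = alertQuality({ confirmed: 18, noFaultFound: 6, missedFailures: 2 });
console.log(quality); // { precision: 0.75, recall: 0.9 }
```

Recall needs failure records as well as alert records, which is one more reason to route every breakdown through the same CMMS/EAM workflow as the alerts.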


Illustrative comparison of three maintenance strategies: reactive work waits for failure, preventive work follows a schedule, and predictive work uses condition evidence to schedule intervention before an expected fault.

Equipment Lifecycle Comparison

This scenario timeline contrasts how the same equipment can behave under three maintenance regimes. The dollar values are illustrative inputs, not universal benchmarks.

Timeline comparing three maintenance strategies across 12 months. Reactive section: Months 1-11 show equipment running with no monitoring or investment, Month 12 shows catastrophic failure with 3 days production stop, $50K emergency repair, and $150K lost production. Preventive section: Months 1-6 normal operation with scheduled checks, Month 6 shows planned replacement even if working costing $15K parts and 8 hours downtime, Months 7-12 new parts installed that may fail anyway. Predictive section: Months 1-11 IoT sensors active with vibration trending up and ML predicting failure, Month 11 shows early warning 30 days out, parts ordered for $5K, 4-hour scheduled repair, zero unplanned downtime.

45.8.1 Planning Cost Comparison

Strategy	Example Cost Pattern	Budget Pattern	Unplanned Downtime Pattern
Reactive	Highest emergency repair exposure	Large unplanned share	Highest
Preventive	Scheduled replacement and inspection cost	Larger planned share	Lower, but parts may be changed early
Predictive	Sensor, analytics, and review workflow cost	Planned around condition evidence	Lower when alerts are trusted and actionable


viewof maintenanceStrategy = Inputs.radio(
  ["Reactive", "Preventive", "Predictive"],
  {
    label: "Maintenance Strategy:",
    value: "Reactive"
  }
)

viewof totalHP = Inputs.range([100, 10000], {
  label: "Total Facility Horsepower:",
  step: 100,
  value: 5000
})

viewof downtimeCostPerHour = Inputs.range([1000, 50000], {
  label: "Downtime Cost ($/hour):",
  step: 1000,
  value: 5000
})

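strategyCosts = {
  // Illustrative planning factors, not benchmarks: costPerHP is annual
  // maintenance spend per installed HP; downtime is a relative factor.
  const costs = {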
    "Reactive": {
      costPerHP: 12.5,
      downtime: 0.40,
      description: "Run-to-failure approach"
    },
    "Preventive": {
      costPerHP: 8,
      downtime: 0.20,
      description: "Time-based scheduled maintenance"
    },
    "Predictive": {
      costPerHP: 4,
      downtime: 0.05,
      description: "Sensor-based condition monitoring"
    }
  };
  
  const selected = costs[maintenanceStrategy];
  const annualCost = totalHP * selected.costPerHP;
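  // Assumed reactive baseline: 8 failures/year x 12 h = 96 downtime hours,
  // scaled by this strategy's downtime factor relative to reactive (0.40).
  const avgFailuresPerYear = 8;
  const hoursDowntimePerFailure = 12;
  const totalDowntimeHours = avgFailuresPerYear * hoursDowntimePerFailure * selected.downtime / 0.40;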
  const downtimeCost = totalDowntimeHours * downtimeCostPerHour;
  const totalCost = annualCost + downtimeCost;
  
  return {
    strategy: maintenanceStrategy,
    costPerHP: selected.costPerHP,
    downtime: selected.downtime,
    description: selected.description,
    annualMaintenanceCost: annualCost,
    annualDowntimeCost: downtimeCost,
    totalAnnualCost: totalCost,
    downtimeHours: totalDowntimeHours
  };
}

html`<div style="background: #f8f9fa; padding: 20px; border-radius: 8px; border-left: 4px solid ${maintenanceStrategy === 'Reactive' ? '#E74C3C' : maintenanceStrategy === 'Preventive' ? '#E67E22' : '#16A085'};">
  <h3 style="margin-top: 0; color: #2C3E50;">${maintenanceStrategy} Maintenance</h3>
  <p style="color: #7F8C8D; margin: 5px 0;">${strategyCosts.description}</p>
  <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 15px; margin-top: 15px;">
    <div>
      <div style="color: #7F8C8D; font-size: 0.9em;">Cost per HP</div>
      <div style="font-size: 1.5em; font-weight: bold; color: #2C3E50;">$${strategyCosts.costPerHP}/HP/year</div>
    </div>
    <div>
      <div style="color: #7F8C8D; font-size: 0.9em;">Downtime Factor</div>
      <div style="font-size: 1.5em; font-weight: bold; color: #2C3E50;">${(strategyCosts.downtime * 100).toFixed(0)}%</div>
    </div>
    <div>
      <div style="color: #7F8C8D; font-size: 0.9em;">Annual Maintenance Cost</div>
      <div style="font-size: 1.3em; font-weight: bold; color: #2C3E50;">$${strategyCosts.annualMaintenanceCost.toLocaleString()}</div>
    </div>
    <div>
      <div style="color: #7F8C8D; font-size: 0.9em;">Annual Downtime Cost</div>
      <div style="font-size: 1.3em; font-weight: bold; color: #E74C3C;">$${strategyCosts.annualDowntimeCost.toLocaleString()}</div>
    </div>
  </div>
  <div style="margin-top: 15px; padding-top: 15px; border-top: 2px solid #ddd;">
    <div style="color: #7F8C8D; font-size: 0.9em;">Total Annual Cost</div>
    <div style="font-size: 1.8em; font-weight: bold; color: #2C3E50;">$${strategyCosts.totalAnnualCost.toLocaleString()}</div>
    <div style="color: #7F8C8D; font-size: 0.85em; margin-top: 5px;">
      ${strategyCosts.downtimeHours.toFixed(1)} hours downtime per year
    </div>
  </div>
</div>`


comparison = {
  const reactive = {
    costPerHP: 12.5,
    downtime: 0.40
  };
  const preventive = {
    costPerHP: 8,
    downtime: 0.20
  };
  const predictive = {
    costPerHP: 4,
    downtime: 0.05
  };
  
  const avgFailures = 8;
  const hoursPerFailure = 12;
  
  const calc = (strategy) => {
    const mainCost = totalHP * strategy.costPerHP;
    const downHours = avgFailures * hoursPerFailure * strategy.downtime / 0.40;
    const downCost = downHours * downtimeCostPerHour;
    return mainCost + downCost;
  };
  
  const reactiveCost = calc(reactive);
  const preventiveCost = calc(preventive);
  const predictiveCost = calc(predictive);
  
  return [
    {strategy: "Reactive", cost: reactiveCost, savings: 0},
    {strategy: "Preventive", cost: preventiveCost, savings: reactiveCost - preventiveCost},
    {strategy: "Predictive", cost: predictiveCost, savings: reactiveCost - predictiveCost}
  ];
}

Plot.plot({
  marginLeft: 100,
  marginBottom: 60,
  x: {
    label: "Annual Cost ($)",
    grid: true
  },
  y: {
    label: null
  },
  marks: [
    Plot.barX(comparison, {
      y: "strategy",
      x: "cost",
      fill: d => d.strategy === "Reactive" ? "#E74C3C" : d.strategy === "Preventive" ? "#E67E22" : "#16A085",
      tip: true,
      title: d => `${d.strategy}\nCost: $${d.cost.toLocaleString()}\nSavings vs Reactive: $${d.savings.toLocaleString()}`
    }),
    Plot.text(comparison, {
      y: "strategy",
      x: "cost",
      text: d => `$${(d.cost/1000).toFixed(0)}K`,
      dx: -30,
      fill: "white",
      fontSize: 14,
      fontWeight: "bold"
    }),
    Plot.ruleX([0])
  ],
  color: {
    legend: false
  }
})

Putting Numbers to It

Use the calculator above as an editable scenario model. For example, a plant might compare an assumed current-state maintenance cost rate with an assumed condition-based rate for a group of motors:

Current-state annual maintenance cost: \[\text{Cost} = 5{,}000 \text{ HP} \times \$12.50/\text{HP} = \$62{,}500/\text{year}\]

Condition-based annual maintenance cost: \[\text{Cost} = 5{,}000 \text{ HP} \times \$4/\text{HP} = \$20{,}000/\text{year}\]

Example maintenance-cost difference: \[\$62{,}500 - \$20{,}000 = \$42{,}500/\text{year}\]

If the predictive maintenance system (sensors, installation, software, and integration) costs $180,000 upfront with $36,000 annual operating costs, this narrow maintenance-cost model produces: \[\text{Net Savings} = \$42{,}500 - \$36{,}000 = \$6{,}500/\text{year}\]

The narrow maintenance-cost payback would be: \[\text{Payback} = \frac{\$180{,}000}{\$6{,}500/\text{year}} \approx 27.7 \text{ years}\]

On maintenance cost alone, a payback of nearly 28 years would not justify the project. It shows why the business case must include the local downtime rate, fault probability, spare-part lead time, false-positive cost, and whether the alert actually arrives early enough for planned work.
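Adding avoided downtime changes the picture sharply. This is a minimal sketch, assuming the calculator's own illustrative defaults (8 failures per year at 12 hours each, downtime factors of 0.40 for reactive and 0.05 for predictive, and $5,000 per downtime hour) plus the $180,000 upfront and $36,000 annual costs above. `paybackScenario` is a hypothetical helper, and none of these inputs are measured plant data.

```javascript
// Scenario only: every input mirrors the calculator's illustrative defaults.
function paybackScenario(s) {
  // Narrow model from the text: maintenance-cost difference only.
  const maintenanceSaving = s.horsepower * (s.reactiveCostPerHP - s.predictiveCostPerHP);

  // Downtime model from the calculator: reactive baseline hours scaled
  // by the predictive factor relative to the reactive factor.
  const baselineHours = s.failuresPerYear * s.hoursPerFailure;
  const predictiveHours = baselineHours * (s.predictiveDowntimeFactor / s.reactiveDowntimeFactor);
  const downtimeSaving = (baselineHours - predictiveHours) * s.downtimeCostPerHour;

  const narrowNet = maintenanceSaving - s.annualOperatingCost;
  const fullNet = maintenanceSaving + downtimeSaving - s.annualOperatingCost;
  return {
    narrowPaybackYears: narrowNet > 0 ? s.upfrontCost / narrowNet : Infinity,
    fullPaybackMonths: fullNet > 0 ? (12 * s.upfrontCost) / fullNet : Infinity,
    avoidedDowntimeHours: baselineHours - predictiveHours,
  };
}

const result = paybackScenario({
  horsepower: 5000, reactiveCostPerHP: 12.5, predictiveCostPerHP: 4,
  failuresPerYear: 8, hoursPerFailure: 12,
  reactiveDowntimeFactor: 0.40, predictiveDowntimeFactor: 0.05,
  downtimeCostPerHour: 5000, upfrontCost: 180000, annualOperatingCost: 36000,
});
console.log(result.narrowPaybackYears.toFixed(1)); // 27.7
console.log(result.fullPaybackMonths.toFixed(1));  // 5.1
```

Under these assumed inputs, avoided downtime rather than maintenance spend carries the case. With a lower downtime rate or fewer avoidable failures the answer moves quickly, which is why those inputs need local evidence.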

45.9 Predictive Maintenance Pipeline

The strategy comparison showed why condition evidence matters. The next question is how that evidence travels from a physical asset to a maintenance decision without losing context.

Predictive maintenance data pipeline from condition sensing to maintenance action

Predictive maintenance data pipeline with four stages: condition sensors collect asset evidence, an edge device computes features, analytics estimate severity or remaining useful life, and a maintenance workflow turns the alert into reviewable work.

Alternative View: Example Data Flow

This diagram uses concrete example values to show where data volume changes. Treat the numbers as a design scenario that must be replaced by measured asset data in a real deployment.

Predictive maintenance pipeline with example data volumes from sensing through edge processing, analytics, and maintenance action.

45.10 Vibration Analysis

Rotating machinery (motors, pumps, fans) reveals health through vibration signatures:

Vibration analysis workflow from sensing to defect detection

Vibration analysis workflow showing sensing (3-axis accelerometers at 100-1000 Hz) feeding both time-domain analysis (RMS, peak, crest factor) and frequency-domain analysis (FFT, order analysis, envelope analysis) to detect specific defects like imbalance, misalignment, and bearing faults.

45.10.1 Common Defects and Frequencies

Defect	Frequency Signature	Typical Actionable Lead Time
Imbalance	1x shaft speed	1-2 weeks
Misalignment	2x shaft speed (axial and radial)	Immediate
Bearing defects	BPFO, BPFI, BSF, FTF and their harmonics	2-4 weeks
Gear mesh	Tooth count x shaft speed (gear mesh frequency)	1-3 weeks
Looseness	Multiple harmonics, sometimes 0.5x sub-harmonics, random spikes	1-2 weeks

45.10.2 Analysis Techniques

Time-domain analysis:

RMS: Overall vibration level
Peak: Maximum amplitude
Crest factor: Peak-to-RMS ratio (indicates impulsive events)
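The three time-domain features above can be computed in a few lines. This is a minimal sketch, assuming one waveform window in consistent units, with any DC offset removed by subtracting the mean. The test signals (a clean 50 Hz sine and the same sine with a spike every 100 samples) are synthetic.

```javascript
// RMS, peak, and crest factor of one waveform window.
function timeDomainFeatures(x) {
  const n = x.length;
  const mean = x.reduce((a, v) => a + v, 0) / n; // remove DC offset
  let sumSq = 0;
  let peak = 0;
  for (const v of x) {
    const d = v - mean;
    sumSq += d * d;
    peak = Math.max(peak, Math.abs(d));
  }
  const rms = Math.sqrt(sumSq / n);
  return { rms, peak, crestFactor: peak / rms };
}

const fs = 10000;
const clean = Array.from({ length: fs }, (_, i) => Math.sin((2 * Math.PI * 50 * i) / fs));
// Same sine plus a sharp impact every 100 samples (100 impacts per second).
const impulsive = clean.map((v, i) => (i % 100 === 0 ? v + 3 : v));

console.log(timeDomainFeatures(clean).crestFactor.toFixed(2)); // 1.41
console.log(timeDomainFeatures(impulsive).crestFactor.toFixed(2));
```

A pure sine has a crest factor of about 1.41; short impacts barely move the RMS but raise the peak, so the crest factor climbs, which is why it is read as a sign of impulsive events.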

Frequency-domain analysis:

FFT: Fast Fourier Transform identifies specific defect frequencies
Order analysis: Tracks frequency components relative to shaft speed
Spectral trending: Monitors changes in specific frequency bands over time

Advanced techniques:

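Envelope analysis: Band-pass filters around a structural resonance excited by bearing impacts, then demodulates the signal so the impact repetition rate (such as BPFO or BPFI) appears in the envelope spectrum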
Wavelet analysis: Time-frequency analysis for transient events
Cepstrum analysis: Detects periodic patterns in spectrum (gear families)
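Envelope analysis is easiest to see on a synthetic signal. This is a minimal sketch, assuming outer-race impacts at the BPFO from the worked bearing example (93.45 Hz before rounding), each ringing an assumed 3 kHz structural resonance that decays with a 1 ms time constant. It compares the amplitude at BPFO in the raw spectrum with the amplitude at BPFO after a simplified envelope step (full-wave rectification and mean removal). A production chain would band-pass around the resonance first and typically use a Hilbert transform or a low-pass filter, which this sketch omits.

```javascript
const fs = 20000;   // sample rate, Hz
const bpfo = 93.45; // Hz, unrounded BPFO from the worked bearing example
const fRes = 3000;  // Hz, assumed structural resonance
const tau = 0.001;  // s, assumed ring-down time constant

// One second of synthetic data: every impact rings the resonance and decays.
const x = Array.from({ length: fs }, (_, n) => {
  const sinceImpact = (n / fs) % (1 / bpfo);
  return Math.exp(-sinceImpact / tau) * Math.sin(2 * Math.PI * fRes * sinceImpact);
});

// Single-sided amplitude at one frequency via a direct DFT sum.
function amplitudeAt(signal, f) {
  let re = 0;
  let im = 0;
  for (let n = 0; n < signal.length; n++) {
    const angle = (2 * Math.PI * f * n) / fs;
    re += signal[n] * Math.cos(angle);
    im -= signal[n] * Math.sin(angle);
  }
  return (2 * Math.hypot(re, im)) / signal.length;
}

// Simplified envelope: full-wave rectify, then remove the mean (DC).
const rectified = x.map((v) => Math.abs(v));
const dc = rectified.reduce((a, v) => a + v, 0) / rectified.length;
const envelope = rectified.map((v) => v - dc);

const rawAtBpfo = amplitudeAt(x, bpfo);
const envAtBpfo = amplitudeAt(envelope, bpfo);
console.log(`raw: ${rawAtBpfo.toFixed(4)}, envelope: ${envAtBpfo.toFixed(4)}`);
```

The raw spectrum has little energy at BPFO because each impact's energy sits near the 3 kHz resonance. Rectifying turns the ringing bursts into a pulse train at the impact repetition rate, so the BPFO line is several times larger in the envelope.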

45.10.3 Detection Timeline

Defect Type	Early Detection	Actionable Alert	Critical
Bearing wear	6-8 weeks	2-4 weeks	<1 week
Imbalance	2-4 weeks	1-2 weeks	Days
Misalignment	Immediate	Immediate	N/A
Lubrication	4-6 weeks	2-3 weeks	Days


viewof shaftSpeed = Inputs.range([10, 100], {
  label: "Shaft Speed (Hz):",
  step: 5,
  value: 30
})

viewof defectType = Inputs.select(
  ["Imbalance", "Misalignment", "Bearing - BPFO", "Bearing - BPFI", "Looseness"],
  {
    label: "Defect Type:",
    value: "Imbalance"
  }
)

bearingParams = {
  // Example bearing geometry (illustrative values)
  const ballDiameter = 0.5; // inches
  const pitchDiameter = 3.0; // inches
  const ballCount = 8;
  const contactAngle = 0; // degrees
  
  // Defect frequencies as multiples of shaft speed (orders).
  // BPFO uses (1 - ratio) and BPFI uses (1 + ratio), as in the derivation above.
  const ratio = (ballDiameter / pitchDiameter) * Math.cos(contactAngle * Math.PI / 180);
  const BPFO = (ballCount / 2) * (1 - ratio);
  const BPFI = (ballCount / 2) * (1 + ratio);
  const BSF = (pitchDiameter / (2 * ballDiameter)) * (1 - ratio * ratio);
  
  return { BPFO, BPFI, BSF };
}

vibrationSignal = {
  const samples = 1000;
  const samplingRate = shaftSpeed * 100;
  const time = Array.from({length: samples}, (_, i) => i / samplingRate);
  
  let frequency;
  let amplitude = 1.0;
  let harmonics = [];
  
  switch(defectType) {
    case "Imbalance":
      frequency = shaftSpeed;
      harmonics = [{freq: shaftSpeed, amp: 1.0}];
      break;
    case "Misalignment":
      frequency = 2 * shaftSpeed;
      harmonics = [
        {freq: shaftSpeed, amp: 0.3},
        {freq: 2 * shaftSpeed, amp: 1.0},
        {freq: 3 * shaftSpeed, amp: 0.2}
      ];
      break;
    case "Bearing - BPFO":
      frequency = shaftSpeed * bearingParams.BPFO;
      harmonics = [
        {freq: shaftSpeed * bearingParams.BPFO, amp: 1.0},
        {freq: shaftSpeed * bearingParams.BPFO * 2, amp: 0.5},
        {freq: shaftSpeed * bearingParams.BPFO * 3, amp: 0.3}
      ];
      break;
    case "Bearing - BPFI":
      frequency = shaftSpeed * bearingParams.BPFI;
      harmonics = [
        {freq: shaftSpeed * bearingParams.BPFI, amp: 1.0},
        {freq: shaftSpeed * bearingParams.BPFI * 2, amp: 0.6},
        {freq: shaftSpeed * bearingParams.BPFI * 3, amp: 0.4}
      ];
      break;
    case "Looseness":
      harmonics = [
        {freq: shaftSpeed, amp: 0.8},
        {freq: 2 * shaftSpeed, amp: 0.6},
        {freq: 3 * shaftSpeed, amp: 0.4},
        {freq: 5 * shaftSpeed, amp: 0.3},
        {freq: 0.5 * shaftSpeed, amp: 0.5}
      ];
      break;
  }
  
  const signal = time.map(t => {
    let value = 0;
    harmonics.forEach(h => {
      value += h.amp * Math.sin(2 * Math.PI * h.freq * t);
    });
    // Add noise
    value += (Math.random() - 0.5) * 0.1;
    return value;
  });
  
  return time.map((t, i) => ({time: t, amplitude: signal[i]}));
}

fftData = {
  const signal = vibrationSignal.map(d => d.amplitude);
  const n = signal.length;
  const fs = shaftSpeed * 100; // must match the sampling rate in vibrationSignal
  const frequencies = Array.from({length: Math.floor(n/2)}, (_, i) => i * fs / n);
  
  // Direct DFT: correlate with both cosine and sine so sine-phase
  // components are not lost, then scale to single-sided amplitude.
  const magnitudes = frequencies.map(f => {
    let re = 0;
    let im = 0;
    for (let k = 0; k < n; k++) {
      const angle = 2 * Math.PI * f * k / fs;
      re += signal[k] * Math.cos(angle);
      im -= signal[k] * Math.sin(angle);
    }
    return 2 * Math.sqrt(re * re + im * im) / n;
  });
  
  return frequencies.map((f, i) => ({frequency: f, magnitude: magnitudes[i]}))
    .filter(d => d.frequency <= shaftSpeed * 15); // show up to 15x shaft speed (covers 3x BPFI)
}

html`<div style="display: grid; grid-template-columns: 1fr 1fr; gap: 20px; margin: 20px 0;">
  <div>
    <h4 style="color: #2C3E50; margin-bottom: 10px;">Time Domain</h4>
    ${Plot.plot({
      width: 400,
      height: 250,
      x: {label: "Time (seconds)", grid: true},
      y: {label: "Amplitude", domain: [-2, 2]},
      marks: [
        Plot.line(vibrationSignal.slice(0, 200), {x: "time", y: "amplitude", stroke: "#16A085", strokeWidth: 1.5}),
        Plot.ruleY([0])
      ]
    })}
  </div>
  <div>
    <h4 style="color: #2C3E50; margin-bottom: 10px;">Frequency Domain (FFT)</h4>
    ${Plot.plot({
      width: 400,
      height: 250,
      x: {label: "Frequency (Hz)", grid: true},
      y: {label: "Magnitude"},
      marks: [
        Plot.line(fftData, {x: "frequency", y: "magnitude", stroke: "#E67E22", strokeWidth: 1.5}),
        Plot.ruleX([shaftSpeed], {stroke: "#2C3E50", strokeDasharray: "4 4"}),
        Plot.text([{x: shaftSpeed, y: 0}], {
          x: "x",
          y: "y",
          text: ["1x"],
          dy: -10,
          fill: "#2C3E50",
          fontSize: 10
        })
      ]
    })}
  </div>
</div>`


defectInfo = {
  const info = {
    "Imbalance": {
      signature: `1x shaft speed (${shaftSpeed} Hz)`,
      cause: "Uneven mass distribution on rotor",
      severity: "Rising 1x radial amplitude; alarm limits depend on machine class and mounting (see ISO 10816/20816 severity zones)"
    },
    "Misalignment": {
      signature: `2x shaft speed (${(2*shaftSpeed).toFixed(1)} Hz) dominant`,
      cause: "Motor and load shafts not properly aligned",
      severity: "High axial and radial vibration"
    },
    "Bearing - BPFO": {
      signature: `BPFO = ${(shaftSpeed * bearingParams.BPFO).toFixed(1)} Hz (${bearingParams.BPFO.toFixed(2)}x shaft speed)`,
      cause: "Ball Pass Frequency Outer race - defect on outer race",
      severity: "Harmonics at 2x and 3x BPFO indicate advanced wear"
    },
    "Bearing - BPFI": {
      signature: `BPFI = ${(shaftSpeed * bearingParams.BPFI).toFixed(1)} Hz (${bearingParams.BPFI.toFixed(2)}x shaft speed)`,
      cause: "Ball Pass Frequency Inner race - defect on inner race",
      severity: "Look for 1x shaft-speed sidebands around BPFI; the longer transmission path can make amplitude understate severity"
    },
    "Looseness": {
      signature: "Multiple harmonics including sub-harmonics",
      cause: "Loose mounting bolts or bearing fit",
      severity: "Random amplitude variations indicate structural looseness"
    }
  };
  
  return info[defectType];
}

html`<div style="background: #f8f9fa; padding: 15px; border-radius: 8px; border-left: 4px solid #3498DB; margin: 20px 0;">
  <h4 style="margin-top: 0; color: #2C3E50;">${defectType} Analysis</h4>
  <div style="display: grid; gap: 10px;">
    <div>
      <strong style="color: #7F8C8D;">Frequency Signature:</strong>
      <div style="color: #2C3E50;">${defectInfo.signature}</div>
    </div>
    <div>
      <strong style="color: #7F8C8D;">Probable Cause:</strong>
      <div style="color: #2C3E50;">${defectInfo.cause}</div>
    </div>
    <div>
      <strong style="color: #7F8C8D;">Severity Indicator:</strong>
      <div style="color: #2C3E50;">${defectInfo.severity}</div>
    </div>
  </div>
</div>`

Checkpoint: Vibration Signals

You now know:

Vibration analysis uses time-domain features such as RMS, peak, and crest factor plus frequency-domain methods such as FFT, order analysis, and envelope analysis.
Common clues include 1x shaft speed for imbalance, 2x shaft speed for misalignment, and BPFO/BPFI bearing bands for race defects.
Detection timelines differ: bearing wear may show early signs 6-8 weeks out, while misalignment can be immediate.
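The BPFO and BPFI bands in this checkpoint come from the bearing-geometry formulas at the start of the chapter. A minimal sketch, assuming rolling without slip; the bearing dimensions in the example are hypothetical, and measured peaks on real bearings often sit slightly off the calculated values because of slip and load-dependent contact angle. Dividing each result by `fr` gives shaft-order multipliers of the kind `bearingParams.BPFO` and `bearingParams.BPFI` appear to hold in the interactive cell.

```javascript
// Kinematic bearing defect frequencies (no-slip assumption).
// fr: shaft speed in Hz; n: number of rolling elements;
// Bd, Pd: ball and pitch diameters in the same units; phiDeg: contact angle in degrees.
function bearingFrequencies({ fr, n, Bd, Pd, phiDeg = 0 }) {
  const ratio = (Bd / Pd) * Math.cos((phiDeg * Math.PI) / 180);
  const ftf = (fr / 2) * (1 - ratio);                     // cage (fundamental train)
  const bpfo = n * ftf;                                    // outer-race ball pass
  const bpfi = (n / 2) * fr * (1 + ratio);                 // inner-race ball pass
  const bsf = (Pd / (2 * Bd)) * fr * (1 - ratio * ratio);  // ball spin
  return { ftf, bpfo, bpfi, bsf };
}

// Hypothetical bearing: 8 balls, 10 mm balls on a 40 mm pitch circle, 30 Hz shaft.
const demoBearing = bearingFrequencies({ fr: 30, n: 8, Bd: 10, Pd: 40 });
console.log(demoBearing); // { ftf: 11.25, bpfo: 90, bpfi: 150, bsf: 56.25 }
```

A quick sanity check on any geometry: BPFO + BPFI always equals n × fr, because the two ball-pass rates split the same total number of ball passages between the stationary and rotating races.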

45.11 Thermal Imaging

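Infrared cameras and fixed infrared sensors measure surface temperature without contact, so thermal anomalies can be found while equipment is energized and under load.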

45.11.1 Thermal Monitoring Architecture

Thermal monitoring architecture: infrared sensing (handheld cameras, fixed-mount sensors, drone surveys) feeds an analysis engine for baseline comparison, trending, and anomaly detection across electrical systems (connections, transformers, switchgear), mechanical systems (bearings, belts, couplings), and process equipment (heat exchangers, furnaces, vessels).

45.11.2 Electrical Applications

Hot spots on connections indicate high resistance
Overheated components indicate overload
Phase imbalance in motors
Can detect some problems 6-12 months in advance, depending on load and defect type

45.11.3 Mechanical Applications

Bearing overheating (friction)
Belt misalignment (heat buildup)
Lubrication issues (dry bearings)
Coupling problems

45.11.4 Temperature Thresholds

Component	Normal	Warning	Critical
Motor bearings	<70°C	70-85°C	>85°C
Electrical connections	<40°C rise	40-70°C rise	>70°C rise
Gearbox oil	<80°C	80-95°C	>95°C
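The thresholds table can be applied as a small lookup. A sketch, assuming the table's boundaries (Warning includes both endpoints, Critical is strictly above the upper value) and that the electrical-connection limits are temperature rise above ambient; the limits are the table's illustrative values, not ratings for any particular device, and the names `THERMAL_LIMITS` and `classifyTemperature` are hypothetical.

```javascript
// Illustrative limits from the table; replace with the component's actual rating.
const THERMAL_LIMITS = {
  "Motor bearings": { warning: 70, critical: 85, basis: "absolute" },
  "Electrical connections": { warning: 40, critical: 70, basis: "rise" },
  "Gearbox oil": { warning: 80, critical: 95, basis: "absolute" },
};

// Returns the compared value (°C, or °C rise above ambient) and a status:
// Normal below `warning`, Warning from `warning` to `critical`, Critical above it.
function classifyTemperature(component, measuredC, ambientC) {
  const limits = THERMAL_LIMITS[component];
  if (!limits) throw new Error(`No limits defined for ${component}`);
  const value = limits.basis === "rise" ? measuredC - ambientC : measuredC;
  let status = "Normal";
  if (value > limits.critical) status = "Critical";
  else if (value >= limits.warning) status = "Warning";
  return { value, status };
}

console.log(classifyTemperature("Electrical connections", 80, 25)); // { value: 55, status: 'Warning' }
```

Keeping the basis (absolute or rise) next to each limit avoids the common mistake of comparing a connection's absolute temperature against a rise threshold on a hot day.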


viewof componentType = Inputs.select(
  ["Motor Bearing", "Electrical Connection", "Gearbox Oil"],
  {
    label: "Component Type:",
    value: "Motor Bearing"
  }
)

viewof ambientTemp = Inputs.range([15, 35], {
  label: "Ambient Temperature (°C):",
  step: 1,
  value: 25
})

viewof measuredTemp = Inputs.range([20, 150], {
  label: "Measured Temperature (°C):",
  step: 1,
  value: 75
})

thermalAnalysis = {
  const thresholds = {
    "Motor Bearing": {
      normal: 70,
      warning: 85,
      critical: 85,
      useAbsolute: true
    },
    "Electrical Connection": {
      normal: 40,
      warning: 70,
      critical: 70,
      useAbsolute: false
    },
    "Gearbox Oil": {
      normal: 80,
      warning: 95,
      critical: 95,
      useAbsolute: true
    }
  };
  
  const config = thresholds[componentType];
  const value = config.useAbsolute ? measuredTemp : (measuredTemp - ambientTemp);
  
  let status, color, recommendation;
  
  if (value < config.normal) {
    status = "Normal";
    color = "#16A085";
    recommendation = "Continue routine monitoring";
  } else if (value < config.warning) {
    status = "Warning";
    color = "#E67E22";
    recommendation = "Schedule repair within 1-4 weeks during planned downtime";
  } else {
    status = "Critical";
    color = "#E74C3C";
    recommendation = "Immediate action required - schedule emergency repair";
  }
  
  return {
    value,
    status,
    color,
    recommendation,
    displayValue: config.useAbsolute ? `${value}°C` : `${value}°C rise above ambient`,
    normalThreshold: config.normal,
    warningThreshold: config.warning
  };
}

html`<div style="background: ${thermalAnalysis.color}15; padding: 20px; border-radius: 8px; border-left: 4px solid ${thermalAnalysis.color}; margin: 20px 0;">
  <div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 15px;">
    <h3 style="margin: 0; color: #2C3E50;">${componentType}</h3>
    <div style="background: ${thermalAnalysis.color}; color: white; padding: 8px 16px; border-radius: 20px; font-weight: bold;">
      ${thermalAnalysis.status}
    </div>
  </div>
  <div style="display: grid; gap: 15px;">
    <div>
      <div style="color: #7F8C8D; font-size: 0.9em;">Current Temperature</div>
      <div style="font-size: 2em; font-weight: bold; color: ${thermalAnalysis.color};">
        ${thermalAnalysis.displayValue}
      </div>
    </div>
    <div style="background: white; padding: 10px; border-radius: 4px;">
      <div style="color: #7F8C8D; font-size: 0.85em; margin-bottom: 5px;">Temperature Range</div>
      <div style="display: flex; gap: 10px; align-items: center;">
        <div style="flex: 1; height: 20px; background: linear-gradient(to right, #16A085 0%, #16A085 ${(thermalAnalysis.normalThreshold / 150 * 100)}%, #E67E22 ${(thermalAnalysis.normalThreshold / 150 * 100)}%, #E67E22 ${(thermalAnalysis.warningThreshold / 150 * 100)}%, #E74C3C ${(thermalAnalysis.warningThreshold / 150 * 100)}%); border-radius: 10px; position: relative;">
          <div style="position: absolute; left: ${(Math.min(Math.max(thermalAnalysis.value, 0), 150) / 150 * 100)}%; top: -5px; width: 3px; height: 30px; background: #2C3E50;"></div>
        </div>
      </div>
      <div style="display: flex; justify-content: space-between; margin-top: 5px; font-size: 0.8em; color: #7F8C8D;">
        <span>0°C</span>
        <span>${thermalAnalysis.normalThreshold}°C</span>
        <span>${thermalAnalysis.warningThreshold}°C</span>
        <span>150°C</span>
      </div>
    </div>
    <div style="background: white; padding: 12px; border-radius: 4px;">
      <div style="color: #7F8C8D; font-size: 0.85em; margin-bottom: 5px;">Recommended Action</div>
      <div style="color: #2C3E50; font-weight: 500;">${thermalAnalysis.recommendation}</div>
    </div>
  </div>
</div>`

45.12 Machine Learning Models

Vibration and thermal rules handle many faults directly. When conditions vary by load, recipe, season, and maintenance history, ML can help, but only if the model choice matches the evidence you actually have.

Depending on the data available, ML models can learn normal behavior to flag anomalies, classify known fault modes, or forecast degradation.

45.12.1 ML Model Selection Decision Tree

ML model selection decision tree for predictive maintenance: with labeled failure data, use supervised learning (Random Forest, XGBoost, neural networks) for classification; without labels, use unsupervised learning (autoencoders, Isolation Forest, one-class SVM) for anomaly detection; when time-to-failure prediction is needed, use time-series forecasting (LSTM, Prophet, Gaussian process).

Use this decision tree to select the appropriate ML approach based on your available data and prediction goals.

45.12.2 Supervised Learning

Approach: Requires labeled failure data to train classifiers.

Algorithms:

Random Forest, XGBoost for classification
Neural networks for complex patterns

Output: “Will this bearing fail in the next 30 days?” (Yes/No with probability)

Requirements:

Historical failure data (dozens to hundreds of examples)
Consistent sensor data leading up to failures
Domain expertise to label failure modes
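Turning failure history into labels for a question like “Will this bearing fail in the next 30 days?” is mostly bookkeeping. A minimal sketch, assuming hypothetical record shapes (`assetId`, `endMs`, `timeMs`) and confirmed failure dates from the maintenance history: a window is positive if its asset fails within the horizon after the window ends, negative only if the whole horizon was observed without a failure, and left unlabeled otherwise so that recent windows are not mislabeled as healthy. Windows between a failure and its repair usually need separate handling, which this sketch leaves out.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// windows:   [{ assetId, endMs, ...features }]  (hypothetical shape)
// failures:  [{ assetId, timeMs }]              confirmed failure times
// dataEndMs: end of the observed history
function labelWindows(windows, failures, horizonDays, dataEndMs) {
  const horizonMs = horizonDays * DAY_MS;
  return windows.map((w) => {
    const failsWithinHorizon = failures.some(
      (f) => f.assetId === w.assetId && f.timeMs > w.endMs && f.timeMs <= w.endMs + horizonMs
    );
    let label;
    if (failsWithinHorizon) label = 1;                       // failure follows the window
    else if (w.endMs + horizonMs > dataEndMs) label = null;  // horizon not fully observed
    else label = 0;                                          // observed horizon, no failure
    return { ...w, label };
  });
}

// Example: one pump, failure on day 60, history ends on day 100, 30-day horizon.
const day = (d) => d * DAY_MS;
const labeled = labelWindows(
  [{ assetId: "P1", endMs: day(10) }, { assetId: "P1", endMs: day(40) }, { assetId: "P1", endMs: day(90) }],
  [{ assetId: "P1", timeMs: day(60) }],
  30,
  day(100)
);
console.log(labeled.map((w) => w.label)); // [ 0, 1, null ]
```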

45.12.3 Unsupervised Learning

Approach: Learns normal operation without failure labels.

Algorithms:

Autoencoders (reconstruction error indicates anomaly)
Isolation Forests (outliers are isolated in fewer random splits)
One-class SVM

Output: “Is this vibration signature abnormal?” (Anomaly score)

Advantages:

Works without historical failures
Can flag novel failure modes that were never labeled
Good for rare events
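Autoencoders and Isolation Forests need a library and tuning, but the underlying idea of learning normal operation without failure labels can be shown with a simpler stand-in: per-feature statistics from healthy windows and a z-score anomaly score. This is a swapped-in baseline technique, not one of the algorithms listed above; the feature names are hypothetical, and the threshold of 3 is a starting assumption that engineer review should tune.

```javascript
// Learn mean and sample standard deviation of each feature from healthy windows.
function fitBaseline(healthyRows, featureNames) {
  const stats = {};
  for (const name of featureNames) {
    const values = healthyRows.map((r) => r[name]);
    const mean = values.reduce((a, b) => a + b, 0) / values.length;
    const variance =
      values.reduce((a, v) => a + (v - mean) ** 2, 0) / Math.max(values.length - 1, 1);
    stats[name] = { mean, std: Math.sqrt(variance) || 1e-9 }; // guard against zero spread
  }
  return stats;
}

// Anomaly score = largest absolute z-score across the fitted features.
function anomalyScore(row, stats) {
  return Math.max(
    ...Object.entries(stats).map(([name, { mean, std }]) => Math.abs(row[name] - mean) / std)
  );
}

// Hypothetical healthy history for one pump: RMS velocity and crest factor.
const healthy = [
  { rms: 1.0, crest: 3.0 }, { rms: 1.1, crest: 3.1 }, { rms: 0.9, crest: 2.9 },
  { rms: 1.0, crest: 3.0 }, { rms: 1.0, crest: 3.0 },
];
const baseline = fitBaseline(healthy, ["rms", "crest"]);
const THRESHOLD = 3; // starting assumption; tune with engineer review
console.log(anomalyScore({ rms: 1.02, crest: 3.0 }, baseline) < THRESHOLD); // true
console.log(anomalyScore({ rms: 1.5, crest: 4.2 }, baseline) > THRESHOLD);  // true
```

Five healthy windows is far too few for a real baseline; the point is the shape of the workflow: fit on normal data only, score new data, and send high scores to a person rather than straight to a work order.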

45.12.4 Time-Series Forecasting

Approach: Predicts remaining useful life (RUL) based on degradation trends.

Algorithms:

LSTM neural networks
Prophet (trend + seasonality)
Gaussian Process Regression

Output: “How many hours/days until failure?” (RUL estimate with confidence interval)

Key metrics:

Mean Absolute Error (MAE)
Root Mean Square Error (RMSE)
Percentage within 10%/20% tolerance
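These three RUL metrics are straightforward to compute from paired actual and predicted remaining life. A sketch, assuming the tolerance is relative to the actual RUL (for example, 10% of 100 hours is ±10 hours); some evaluation schemes penalize late predictions more than early ones, which this symmetric sketch does not do. The function name `rulMetrics` is hypothetical.

```javascript
// actual, predicted: remaining useful life in the same unit (hours or days).
// tolerance: fraction of the actual RUL counted as "close enough" (0.1 = ±10%).
function rulMetrics(actual, predicted, tolerance = 0.1) {
  if (actual.length === 0 || actual.length !== predicted.length) {
    throw new Error("actual and predicted must be non-empty and the same length");
  }
  let absSum = 0;
  let sqSum = 0;
  let within = 0;
  for (let i = 0; i < actual.length; i++) {
    const err = predicted[i] - actual[i];
    absSum += Math.abs(err);
    sqSum += err * err;
    if (Math.abs(err) <= tolerance * actual[i]) within++;
  }
  const n = actual.length;
  return { mae: absSum / n, rmse: Math.sqrt(sqSum / n), withinTolerance: within / n };
}

const m = rulMetrics([100, 50, 20], [91, 53, 30]);
console.log(m.mae.toFixed(2), m.rmse.toFixed(2), m.withinTolerance.toFixed(2)); // 7.33 7.96 0.67
```

The example shows why a relative tolerance matters: a 10-hour error is acceptable at 100 hours of remaining life but misses badly at 20 hours, when the planning window is already short.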


viewof mlModelType = Inputs.select(
  ["Supervised - Classification", "Unsupervised - Anomaly Detection", "Time-Series - RUL Prediction"],
  {
    label: "ML Model Type:",
    value: "Supervised - Classification"
  }
)

viewof trainingDataSize = Inputs.range([10, 500], {
  label: "Training Examples:",
  step: 10,
  value: 100
})

viewof failureRate = Inputs.range([0.01, 0.20], {
  label: "Historical Failure Rate:",
  step: 0.01,
  value: 0.05,
  format: x => `${(x * 100).toFixed(0)}%`
})

mlPerformance = {
  // Illustrative starting figures for this demo, not measured benchmark results
  const baseMetrics = {
    "Supervised - Classification": {
      accuracy: 0.85,
      precision: 0.82,
      recall: 0.88,
      f1: 0.85,
      trainingTime: "4-8 hours",
      inference: "< 100ms",
      dataRequirement: "Labeled failure examples"
    },
    "Unsupervised - Anomaly Detection": {
      accuracy: 0.75,
      precision: 0.70,
      recall: 0.92,
      f1: 0.79,
      trainingTime: "1-2 hours",
      inference: "< 50ms",
      dataRequirement: "Normal operation only"
    },
    "Time-Series - RUL Prediction": {
      accuracy: 0.80,
      precision: 0.78,
      recall: 0.85,
      f1: 0.81,
      trainingTime: "8-16 hours",
      inference: "< 200ms",
      dataRequirement: "Degradation curves to failure"
    }
  };
  
  const metrics = baseMetrics[mlModelType];
  
  // Illustrative heuristic: metrics improve with training data, saturating at 200 examples
  const dataFactor = Math.min(trainingDataSize / 200, 1.0);
  const adjustedAccuracy = metrics.accuracy * (0.7 + 0.3 * dataFactor);
  const adjustedPrecision = metrics.precision * (0.7 + 0.3 * dataFactor);
  const adjustedRecall = metrics.recall * (0.75 + 0.25 * dataFactor);
  const adjustedF1 = 2 * (adjustedPrecision * adjustedRecall) / (adjustedPrecision + adjustedRecall);
  
  // Expected outcomes for 100 assets (illustrative)
  const totalAssets = 100;
  const expectedFailures = totalAssets * failureRate;
  const truePositives = expectedFailures * adjustedRecall;
  const falseNegatives = expectedFailures - truePositives;
  // Precision = TP / (TP + FP), so FP = TP * (1 - precision) / precision
  const falsePositives = truePositives * (1 - adjustedPrecision) / adjustedPrecision;
  
  return {
    accuracy: adjustedAccuracy,
    precision: adjustedPrecision,
    recall: adjustedRecall,
    f1: adjustedF1,
    trainingTime: metrics.trainingTime,
    inference: metrics.inference,
    dataRequirement: metrics.dataRequirement,
    truePositives: truePositives.toFixed(1),
    falseNegatives: falseNegatives.toFixed(1),
    falsePositives: falsePositives.toFixed(1)
  };
}

html`<div style="background: #f8f9fa; padding: 20px; border-radius: 8px; border-left: 4px solid #3498DB; margin: 20px 0;">
  <h3 style="margin-top: 0; color: #2C3E50;">${mlModelType}</h3>
  <div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(200px, 1fr)); gap: 15px; margin: 20px 0;">
    <div style="background: white; padding: 15px; border-radius: 6px;">
      <div style="color: #7F8C8D; font-size: 0.85em;">Accuracy</div>
      <div style="font-size: 1.8em; font-weight: bold; color: #16A085;">
        ${(mlPerformance.accuracy * 100).toFixed(1)}%
      </div>
    </div>
    <div style="background: white; padding: 15px; border-radius: 6px;">
      <div style="color: #7F8C8D; font-size: 0.85em;">Precision</div>
      <div style="font-size: 1.8em; font-weight: bold; color: #16A085;">
        ${(mlPerformance.precision * 100).toFixed(1)}%
      </div>
    </div>
    <div style="background: white; padding: 15px; border-radius: 6px;">
      <div style="color: #7F8C8D; font-size: 0.85em;">Recall</div>
      <div style="font-size: 1.8em; font-weight: bold; color: #16A085;">
        ${(mlPerformance.recall * 100).toFixed(1)}%
      </div>
    </div>
    <div style="background: white; padding: 15px; border-radius: 6px;">
      <div style="color: #7F8C8D; font-size: 0.85em;">F1 Score</div>
      <div style="font-size: 1.8em; font-weight: bold; color: #16A085;">
        ${(mlPerformance.f1 * 100).toFixed(1)}%
      </div>
    </div>
  </div>
  <div style="background: white; padding: 15px; border-radius: 6px; margin-top: 15px;">
    <h4 style="margin-top: 0; color: #2C3E50;">Expected Performance (100 assets, ${(failureRate * 100).toFixed(0)}% failure rate)</h4>
    <div style="display: grid; grid-template-columns: 1fr 1fr 1fr; gap: 15px;">
      <div>
        <div style="color: #16A085; font-size: 0.85em;">True Positives</div>
        <div style="font-size: 1.5em; font-weight: bold; color: #16A085;">${mlPerformance.truePositives}</div>
        <div style="color: #7F8C8D; font-size: 0.75em;">Correctly predicted failures</div>
      </div>
      <div>
        <div style="color: #E74C3C; font-size: 0.85em;">False Negatives</div>
        <div style="font-size: 1.5em; font-weight: bold; color: #E74C3C;">${mlPerformance.falseNegatives}</div>
        <div style="color: #7F8C8D; font-size: 0.75em;">Missed failures</div>
      </div>
      <div>
        <div style="color: #E67E22; font-size: 0.85em;">False Positives</div>
        <div style="font-size: 1.5em; font-weight: bold; color: #E67E22;">${mlPerformance.falsePositives}</div>
        <div style="color: #7F8C8D; font-size: 0.75em;">Unnecessary alerts</div>
      </div>
    </div>
  </div>
  <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 15px; margin-top: 15px;">
    <div style="background: white; padding: 12px; border-radius: 6px;">
      <div style="color: #7F8C8D; font-size: 0.85em;">Training Time</div>
      <div style="font-weight: bold; color: #2C3E50;">${mlPerformance.trainingTime}</div>
    </div>
    <div style="background: white; padding: 12px; border-radius: 6px;">
      <div style="color: #7F8C8D; font-size: 0.85em;">Inference Time</div>
      <div style="font-weight: bold; color: #2C3E50;">${mlPerformance.inference}</div>
    </div>
  </div>
  <div style="background: #E67E2210; padding: 12px; border-radius: 6px; margin-top: 15px;">
    <div style="color: #7F8C8D; font-size: 0.85em; margin-bottom: 5px;">Data Requirement</div>
    <div style="color: #2C3E50; font-weight: 500;">${mlPerformance.dataRequirement}</div>
  </div>
</div>`

Checkpoint: Model Choice

You now know:

Supervised learning can answer whether a bearing will fail in the next 30 days, but it needs labeled failure examples.
Unsupervised anomaly detection can start from normal operation only, but it still needs engineer review before action.
RUL and time-series models estimate time to failure only for the asset class, fault mode, and operating context represented in the training data.

45.13 Alert to Work Order

Time: ~8 min | Difficulty: Intermediate | Unit: P03.C06.U07

An automotive smart-factory maintenance program usually succeeds or fails at the handoff between analytics and maintenance execution. The useful case-study pattern is not “AI predicted a fault” by itself; it is a closed loop from condition evidence to work order, inspection, repair, and model feedback.

Example scope:

Critical conveyors, robots, compressors, pumps, spindles, and drives are ranked by failure impact.
Vibration, current, temperature, cycle-count, and controller-state data are captured with asset id, speed/load context, and maintenance history.
Edge gateways compute features near the machine while historians, OPC UA servers, MQTT brokers, or MES/CMMS connectors move selected evidence upward.
Maintenance planners review severity, confidence, spare-part lead time, and production windows before scheduling work.

What to measure:

Alert precision and missed-fault rate by failure mode.
Time from first warning to confirmed inspection.
Planned work percentage versus emergency work percentage.
Downtime avoided, repair hours, spare-part waste, and technician trust.

Predictive maintenance workflow from condition evidence to planned repair and feedback.

The workflow must leave an auditable trail. Each alert should identify the asset, feature values, baseline comparison, severity, recommended inspection, technician disposition, replaced part, and return-to-service result.
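The audit fields and the alert-quality measures listed earlier can share one record. A sketch with hypothetical field names (this is not a CMMS or vendor schema): one function reports which audit fields are still empty before an alert is closed, and another computes alert precision and missed-fault rate per failure mode from technician dispositions, counting only alerts that have actually been reviewed.

```javascript
// Audit fields named in the workflow text; keys are hypothetical.
const AUDIT_FIELDS = [
  "assetId", "featureValues", "baselineComparison", "severity",
  "recommendedInspection", "technicianDisposition", "replacedPart", "returnToService",
];

// Fields still missing before an alert can be closed ("none" is a valid replacedPart).
function missingAuditFields(alert) {
  return AUDIT_FIELDS.filter((field) => alert[field] === undefined || alert[field] === null);
}

// alerts: [{ failureMode, technicianDisposition: "confirmed" | "no-fault" | "pending" }]
// unalertedFaults: [{ failureMode }] real faults found without any prior alert
function alertQualityByMode(alerts, unalertedFaults) {
  const modes = new Map();
  const entry = (mode) => {
    if (!modes.has(mode)) modes.set(mode, { reviewed: 0, confirmed: 0, missed: 0 });
    return modes.get(mode);
  };
  for (const a of alerts) {
    const d = a.technicianDisposition;
    if (d !== "confirmed" && d !== "no-fault") continue; // not yet reviewed
    const e = entry(a.failureMode);
    e.reviewed++;
    if (d === "confirmed") e.confirmed++;
  }
  for (const f of unalertedFaults) entry(f.failureMode).missed++;

  const result = {};
  for (const [mode, e] of modes) {
    const realFaults = e.confirmed + e.missed;
    result[mode] = {
      precision: e.reviewed ? e.confirmed / e.reviewed : null,
      missedFaultRate: realFaults ? e.missed / realFaults : null,
    };
  }
  return result;
}

const quality = alertQualityByMode(
  [
    { failureMode: "bearing", technicianDisposition: "confirmed" },
    { failureMode: "bearing", technicianDisposition: "no-fault" },
    { failureMode: "bearing", technicianDisposition: "pending" },
    { failureMode: "misalignment", technicianDisposition: "confirmed" },
  ],
  [{ failureMode: "bearing" }]
);
// bearing: precision 0.5, missedFaultRate 0.5; misalignment: precision 1, missedFaultRate 0
console.log(quality);
```

Reporting by failure mode keeps a strong bearing detector from hiding a weak misalignment detector inside one plant-wide precision number.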

Lesson learned: Success requires maintenance adoption, not just analytics. Technicians need enough evidence to challenge bad alerts, confirm real faults, and feed the outcome back into thresholds and models.

Automated Electronics Plant Pattern

Highly automated electronics plants often use the same building blocks discussed in this chapter: PLCs and PROFINET or industrial Ethernet at the machine layer, OPC UA or historian interfaces for operations data, RFID or traceability records for product context, and analytics that compare equipment behavior against known-good baselines.

Integration pattern:

Production equipment emits process values, alarms, cycle counts, and quality results.
Asset health features are tied to product, recipe, shift, maintenance event, and environmental context.
Quality, maintenance, and operations teams review the same asset history rather than separate dashboards.
Digital twin or simulation work is used for what-if planning, not as a replacement for measured condition data.

Business impact to verify locally:

Fewer emergency repairs on critical bottleneck assets.
Higher planned-maintenance ratio without excessive part replacement.
Lower false-alarm burden for technicians.
Better root-cause records for repeat failures.

Key Success Factor: The plant treats PdM as a maintenance decision system with measured outcomes, not as a standalone ML demo.

45.14 ROI Calculation Framework

45.14.1 Cost Components

Investment costs (illustrative ranges; replace with local quotes):

Sensors: $100-500 per motor (vibration, temperature)
Gateways: $500-2,000 per zone
Software: $50,000-500,000 (depending on scale)
Integration: 2-5x hardware cost for brownfield
Training: $1,000-5,000 per technician

Operating costs (illustrative ranges; replace with local contracts):

Platform licensing: $10-50 per asset/month
Connectivity: $5-20 per gateway/month
Data storage: $0.02-0.05 per GB/month
Analyst time: $50,000-100,000/year for dedicated resources

45.14.2 Benefit Categories

Direct savings:

Reduced emergency repairs (labor + parts + expediting)
Extended equipment life (deferred replacement)
Lower spare parts inventory (order when needed)
Reduced energy consumption (efficient equipment)

Indirect savings:

Avoided production losses (unplanned downtime)
Improved quality (equipment in specification)
Reduced safety incidents (early warning of hazards)
Better capital planning (known equipment condition)

45.14.3 Sample ROI Calculation

Scenario: 100-motor manufacturing plant using editable planning assumptions.

Item	Value
Average motor replacement cost	$15,000
Historical failures per year	8
Average downtime per failure	12 hours
Downtime cost per hour	$5,000
Annual failure cost	$600,000

With predictive maintenance:

Item	Value
Investment (sensors, software, integration)	$180,000
Annual operating cost	$36,000
Failure prediction rate	85%
Prevented failures	6.8 per year
Annual savings (gross)	$510,000
Net annual savings (after operating cost)	$474,000
Payback period	≈4.6 months
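Working the sample figures through the ROI formula, with cost per failure taken as replacement cost plus downtime hours times the hourly rate, makes the operating-cost step explicit. Dividing the investment by gross savings instead would give about 4.2 months, which leaves out the $36,000 annual operating cost.

\[
\begin{aligned}
\text{Cost per failure} &= \$15{,}000 + 12\,\text{h} \times \$5{,}000/\text{h} = \$75{,}000\\
\text{Annual failure cost} &= 8 \times \$75{,}000 = \$600{,}000\\
\text{Gross annual savings} &= 8 \times 0.85 \times \$75{,}000 = \$510{,}000\\
\text{Net annual savings} &= \$510{,}000 - \$36{,}000 = \$474{,}000\\
\text{Payback} &= \frac{\$180{,}000}{\$474{,}000} \times 12\ \text{months} \approx 4.6\ \text{months}
\end{aligned}
\]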


viewof motorCount = Inputs.range([10, 500], {
  label: "Number of Motors:",
  step: 10,
  value: 100
})

viewof avgReplacementCost = Inputs.range([5000, 50000], {
  label: "Average Motor Replacement Cost ($):",
  step: 1000,
  value: 15000
})

viewof historicalFailures = Inputs.range([1, 50], {
  label: "Historical Failures per Year:",
  step: 1,
  value: 8
})

viewof downtimeHours = Inputs.range([4, 72], {
  label: "Average Downtime per Failure (hours):",
  step: 4,
  value: 12
})

viewof downtimeCost = Inputs.range([1000, 20000], {
  label: "Downtime Cost per Hour ($):",
  step: 1000,
  value: 5000
})

viewof sensorCostPerMotor = Inputs.range([100, 1000], {
  label: "Sensor Cost per Motor ($):",
  step: 50,
  value: 300
})

viewof softwareCost = Inputs.range([10000, 200000], {
  label: "Software Platform Cost ($):",
  step: 10000,
  value: 50000
})

viewof integrationMultiplier = Inputs.range([1, 5], {
  label: "Integration Cost Multiplier (x hardware):",
  step: 0.5,
  value: 2,
  format: x => `${x}x`
})

viewof annualOperating = Inputs.range([10000, 100000], {
  label: "Annual Operating Cost ($):",
  step: 5000,
  value: 36000
})

viewof predictionRate = Inputs.range([0.50, 0.95], {
  label: "Failure Prediction Rate:",
  step: 0.05,
  value: 0.85,
  format: x => `${(x * 100).toFixed(0)}%`
})

roiCalculation = {
  // Current state costs
  const replacementCosts = historicalFailures * avgReplacementCost;
  const downtimeCosts = historicalFailures * downtimeHours * downtimeCost;
  const annualFailureCost = replacementCosts + downtimeCosts;
  
  // Investment costs
  const hardwareCost = motorCount * sensorCostPerMotor;
  const integrationCost = hardwareCost * integrationMultiplier;
  const totalInvestment = hardwareCost + softwareCost + integrationCost;
  
  // Savings
  const preventedFailures = historicalFailures * predictionRate;
  const savedReplacementCost = preventedFailures * avgReplacementCost;
  const savedDowntimeCost = preventedFailures * downtimeHours * downtimeCost;
  const annualSavings = savedReplacementCost + savedDowntimeCost;
  const netAnnualSavings = annualSavings - annualOperating;
  
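  // ROI metrics (payback is undefined when net savings are not positive)
  const paybackMonths = netAnnualSavings > 0 ? totalInvestment / netAnnualSavings * 12 : Infinity;
  const roi = (netAnnualSavings / totalInvestment) * 100;
  // Undiscounted 5-year net benefit; not a true NPV because no discount rate is applied
  const fiveYearNPV = netAnnualSavings * 5 - totalInvestment;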
  
  return {
    annualFailureCost,
    totalInvestment,
    annualSavings,
    netAnnualSavings,
    paybackMonths,
    roi,
    fiveYearNPV,
    preventedFailures
  };
}

html`<div style="background: linear-gradient(135deg, #2C3E50 0%, #3498DB 100%); color: white; padding: 25px; border-radius: 10px; margin: 20px 0;">
  <h3 style="margin-top: 0;">ROI Analysis Results</h3>
  <div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(200px, 1fr)); gap: 15px; margin-top: 20px;">
    <div style="background: rgba(255,255,255,0.15); padding: 15px; border-radius: 8px;">
      <div style="opacity: 0.9; font-size: 0.85em;">Annual Failure Cost (Current)</div>
      <div style="font-size: 1.6em; font-weight: bold; margin-top: 5px;">
        $${roiCalculation.annualFailureCost.toLocaleString()}
      </div>
    </div>
    <div style="background: rgba(255,255,255,0.15); padding: 15px; border-radius: 8px;">
      <div style="opacity: 0.9; font-size: 0.85em;">Total Investment</div>
      <div style="font-size: 1.6em; font-weight: bold; margin-top: 5px;">
        $${roiCalculation.totalInvestment.toLocaleString()}
      </div>
    </div>
    <div style="background: rgba(255,255,255,0.15); padding: 15px; border-radius: 8px;">
      <div style="opacity: 0.9; font-size: 0.85em;">Net Annual Savings</div>
      <div style="font-size: 1.6em; font-weight: bold; margin-top: 5px; color: #16A085;">
        $${roiCalculation.netAnnualSavings.toLocaleString()}
      </div>
    </div>
  </div>
  <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 15px; margin-top: 15px;">
    <div style="background: rgba(255,255,255,0.2); padding: 20px; border-radius: 8px;">
      <div style="opacity: 0.9; font-size: 0.9em;">Payback Period</div>
      <div style="font-size: 2.5em; font-weight: bold; margin-top: 5px;">
        ${roiCalculation.paybackMonths.toFixed(1)} mo
      </div>
      <div style="opacity: 0.8; font-size: 0.85em; margin-top: 5px;">
        ${(roiCalculation.paybackMonths / 12).toFixed(1)} years
      </div>
    </div>
    <div style="background: rgba(255,255,255,0.2); padding: 20px; border-radius: 8px;">
      <div style="opacity: 0.9; font-size: 0.9em;">Annual ROI (net savings / investment)</div>
      <div style="font-size: 2.5em; font-weight: bold; margin-top: 5px; color: #16A085;">
        ${roiCalculation.roi.toFixed(0)}%
      </div>
      <div style="opacity: 0.8; font-size: 0.85em; margin-top: 5px;">
        Return on investment
      </div>
    </div>
  </div>
  <div style="background: rgba(255,255,255,0.15); padding: 15px; border-radius: 8px; margin-top: 15px;">
    <div style="display: flex; justify-content: space-between; align-items: center;">
      <div>
        <div style="opacity: 0.9; font-size: 0.85em;">5-Year Net Benefit (undiscounted)</div>
        <div style="font-size: 1.8em; font-weight: bold; margin-top: 5px;">
          $${roiCalculation.fiveYearNPV.toLocaleString()}
        </div>
      </div>
      <div style="text-align: right;">
        <div style="opacity: 0.9; font-size: 0.85em;">Prevented Failures/Year</div>
        <div style="font-size: 1.8em; font-weight: bold; margin-top: 5px;">
          ${roiCalculation.preventedFailures.toFixed(1)}
        </div>
      </div>
    </div>
  </div>
</div>`
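The calculator's `fiveYearNPV` multiplies net annual savings by five and subtracts the investment, so it is an undiscounted total rather than a net present value. A discounted version is sketched below, assuming the investment is paid at year 0, equal net savings arrive at the end of each year, and an 8% discount rate chosen only for illustration; substitute your organization's hurdle rate. The function name `npv` is hypothetical.

```javascript
// Net present value of an up-front investment followed by equal annual net savings.
function npv({ investment, netAnnualSavings, years, discountRate }) {
  let value = -investment;
  for (let t = 1; t <= years; t++) {
    value += netAnnualSavings / (1 + discountRate) ** t; // discount the year-t cash flow
  }
  return value;
}

// Calculator defaults: $140,000 investment, $474,000 net savings per year.
const undiscounted = npv({ investment: 140000, netAnnualSavings: 474000, years: 5, discountRate: 0 });
const discounted = npv({ investment: 140000, netAnnualSavings: 474000, years: 5, discountRate: 0.08 });
console.log(undiscounted, Math.round(discounted)); // 2230000, and about 1.75 million when discounted
```

With a payback measured in months the discount rate barely changes the decision, but it matters for marginal projects whose savings arrive late or shrink over time.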


roiBreakdown = [
  {category: "Hardware", cost: motorCount * sensorCostPerMotor, percent: (motorCount * sensorCostPerMotor / roiCalculation.totalInvestment * 100).toFixed(1)},
  {category: "Software", cost: softwareCost, percent: (softwareCost / roiCalculation.totalInvestment * 100).toFixed(1)},
  {category: "Integration", cost: motorCount * sensorCostPerMotor * integrationMultiplier, percent: (motorCount * sensorCostPerMotor * integrationMultiplier / roiCalculation.totalInvestment * 100).toFixed(1)}
]

html`<div style="margin: 20px 0;">
  <h4 style="color: #2C3E50;">Investment Breakdown</h4>
  ${Plot.plot({
    marginLeft: 100,
    x: {label: "Cost ($)", grid: true},
    y: {label: null},
    marks: [
      Plot.barX(roiBreakdown, {
        y: "category",
        x: "cost",
        fill: "#3498DB",
        tip: true,
        title: d => `${d.category}: $${d.cost.toLocaleString()} (${d.percent}%)`
      }),
      Plot.text(roiBreakdown, {
        y: "category",
        x: "cost",
        text: d => `$${(d.cost/1000).toFixed(0)}K`,
        dx: -30,
        fill: "white",
        fontSize: 12,
        fontWeight: "bold"
      }),
      Plot.ruleX([0])
    ]
  })}
</div>`

Checkpoint: Economics and Rollout

You now know:

The sample 100-motor case uses 8 historical failures, 12 hours per failure, $5,000 per downtime hour, and 85% prediction; after subtracting the $36,000 annual operating cost, payback is about 4.6 months.
The separate chemical-plant example pays back in about 9 months after subtracting $50,000/year operating cost from prevented-failure savings.
Phase 1 should select 10-20 high-criticality assets, prove at least one previously undetected issue, and expand only after technician-confirmed alert quality.

45.15 Implementation Roadmap

Phased implementation roadmap for predictive maintenance

Implementation timeline showing three phases: Pilot (months 1-6) focuses on critical asset selection and baseline establishment, Expansion (months 7-18) scales coverage and adds ML capabilities, and Optimization (months 19-36) achieves full facility coverage with automated workflows.

45.15.1 Phase 1: Pilot (Months 1-6)

Select 10-20 critical assets
Deploy basic vibration and temperature sensors
Establish data collection infrastructure
Create baseline normal operation profiles
Success metric: Detect at least one previously undetected issue

45.15.2 Phase 2: Expansion (Months 7-18)

Expand to 50-100 assets
Implement ML-based anomaly detection
Integrate with CMMS for work order generation
Train maintenance technicians on new tools
Success metric: measured reduction in emergency work on pilot asset classes

45.15.3 Phase 3: Optimization (Months 19-36)

Full facility coverage (all critical assets)
Remaining useful life predictions
Automated parts ordering
Continuous model improvement
Success metric: sustained planned-maintenance ratio and technician-confirmed alert quality

Predictive Maintenance Concepts

Concept	Relates To	Relationship
Vibration Analysis	FFT/Signal Processing	Time-domain vibration data transformed to frequency domain to identify bearing defect harmonics
RUL Prediction	Time-Series ML Models	LSTM networks forecast remaining useful life by learning degradation patterns from historical sensor data
OPC UA	IIoT Data Collection	Industrial protocol extracts vibration, temperature, and power data from PLCs for predictive models
ROI Calculation	Business Cases	Payback period = Investment / (Prevented_Failures × Failure_Cost - Operating_Cost)

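Cross-module connection: Data Storage and Databases explains time-series database design for storing high-frequency vibration data. Bearing and envelope analysis often use waveforms sampled at several kHz, so per-sample timestamps need sub-millisecond resolution, or each waveform can be stored as a block with a start time and sample rate.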

Common Pitfalls

1. Name the Failure Mode First

Installing sensors on every machine creates data but not necessarily maintenance value. Start with the asset, component, fault mechanism, consequence, detection method, and action that the alert should trigger.

2. Losing Operating Context

Vibration, temperature, current, and pressure all change with speed, load, recipe, ambient condition, and recent maintenance. Store that context with each feature window or the model will confuse normal operating changes with faults.
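One way to keep operating context attached is to compare each feature window only with history from a similar speed and load. A minimal sketch, assuming hypothetical field names (`context.speedRpm`, `context.loadPct`, `rms`), fixed bin widths of 300 rpm and 25% load, and a minimum of three comparable windows before any comparison is made; real deployments may also bin by recipe, ambient condition, or time since maintenance.

```javascript
// Bucket a window's operating context; bin widths are illustrative assumptions.
function contextKey({ speedRpm, loadPct }, speedBinRpm = 300, loadBinPct = 25) {
  return `${Math.floor(speedRpm / speedBinRpm)}:${Math.floor(loadPct / loadBinPct)}`;
}

// Mean RMS per context bin, built from windows judged healthy.
function buildBaselines(healthyWindows) {
  const sums = new Map();
  for (const w of healthyWindows) {
    const key = contextKey(w.context);
    const s = sums.get(key) || { total: 0, count: 0 };
    s.total += w.rms;
    s.count += 1;
    sums.set(key, s);
  }
  const baselines = new Map();
  for (const [key, s] of sums) baselines.set(key, { mean: s.total / s.count, count: s.count });
  return baselines;
}

// Ratio of a window's RMS to the baseline for the same context,
// or null when there is not enough comparable history to judge.
function ratioToBaseline(featureWindow, baselines, minCount = 3) {
  const b = baselines.get(contextKey(featureWindow.context));
  if (!b || b.count < minCount) return null;
  return featureWindow.rms / b.mean;
}

const history = [
  ...[1.0, 1.0, 1.0].map((rms) => ({ rms, context: { speedRpm: 1500, loadPct: 30 } })),
  ...[2.0, 2.0, 2.0].map((rms) => ({ rms, context: { speedRpm: 3000, loadPct: 90 } })),
];
const baselines = buildBaselines(history);
// The same reading is near normal at high speed and load but double the low-load baseline.
console.log(ratioToBaseline({ rms: 2.1, context: { speedRpm: 3000, loadPct: 90 } }, baselines)); // 1.05
console.log(ratioToBaseline({ rms: 2.1, context: { speedRpm: 1500, loadPct: 30 } }, baselines)); // 2.1
```

Returning `null` for unseen contexts is deliberate: a missing comparison is safer than a comparison against the wrong operating state.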

3. Treating Alerts as the Finish Line

A high anomaly score is not a maintenance outcome. PdM needs a closed loop: alert review, work-order creation, technician disposition, part condition, return-to-service record, and model or threshold update.


45.16 Summary

Predictive maintenance is one of the highest-value Industrial IoT patterns when it is tied to observable failure modes and closed maintenance workflows.


Key Takeaways

Strategy comparison: Reactive, preventive, and predictive maintenance make different tradeoffs between emergency repair, planned replacement, condition evidence, and downtime risk.
Sensing technologies: Vibration, thermal, acoustic, current, pressure, oil, and controller-state signals are useful only when they expose the target fault under the asset’s operating conditions.
ML approaches: Supervised models need labeled outcomes, unsupervised models need disciplined baseline review, and RUL models need degradation histories for the specific asset class and fault mode.
Implementation: Start with a small critical-asset pilot, prove alert quality against technician findings, then scale only after the maintenance workflow and economics are measured.
Success factors: Technology is necessary but not sufficient; cultural change, technician training, and organizational commitment are equally important.

Predictive Maintenance Reference

45.16.1 Planning Inputs To Localize

Input	Why It Matters
Failure cost	Sets the value ceiling for prevented failures
Downtime hours and rate	Converts a technical failure into business impact
Sensor and installation cost	Determines whether the asset is worth instrumenting
False-positive burden	Controls technician trust and inspection workload
Spare-part lead time	Defines how early the warning must arrive

45.16.2 Vibration Frequency Signatures

Defect	Common Frequency Clue	Design Note
Imbalance	1× shaft speed	Compare against speed/load baseline
Misalignment	Often strong 2× component	Confirm with axial/radial measurements
Bearing defects	BPFO/BPFI bands and harmonics	Requires bearing geometry and good mounting
Gear mesh	Tooth count × shaft speed	Sidebands and load context matter
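The frequency clues in the table can be computed before any data arrives, which is why bearing geometry is listed as a requirement. This sketch uses the cage, BPFO, and BPFI formulas from the start of the chapter. It assumes no rolling-element slip, so measured peaks often sit slightly away from the computed values. The bearing dimensions, tooth count, and 1770 rpm shaft speed in the usage lines are hypothetical.

```python
import math
from typing import Dict, Optional


def fault_frequencies(shaft_hz: float, n_balls: int, ball_d: float, pitch_d: float,
                      contact_deg: float = 0.0,
                      gear_teeth: Optional[int] = None) -> Dict[str, float]:
    """Characteristic fault frequencies in Hz for one shaft speed (no-slip kinematics)."""
    ratio = (ball_d / pitch_d) * math.cos(math.radians(contact_deg))
    cage = shaft_hz / 2 * (1 - ratio)
    freqs = {
        "1x": shaft_hz,                                # imbalance
        "2x": 2 * shaft_hz,                            # misalignment
        "cage": cage,                                  # cage (fundamental train) rate
        "BPFO": n_balls * cage,                        # outer-race ball pass
        "BPFI": n_balls / 2 * shaft_hz * (1 + ratio),  # inner-race ball pass
    }
    if gear_teeth is not None:
        # Mesh frequency for a gear with this many teeth mounted on this shaft.
        freqs["gear_mesh"] = gear_teeth * shaft_hz
    return freqs


# Hypothetical 9-ball bearing (7.94 mm balls, 39.04 mm pitch diameter) on a 1770 rpm shaft.
f = fault_frequencies(shaft_hz=1770 / 60, n_balls=9, ball_d=7.94, pitch_d=39.04,
                      gear_teeth=23)
for name, hz in f.items():
    print(f"{name:>9}: {hz:8.1f} Hz")
```

The "BPFO/BPFI bands and harmonics" row refers to the practice of searching the spectrum (or envelope spectrum) for peaks near these values and their integer multiples. A useful sanity check is that BPFO + BPFI always equals the ball count times shaft speed.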

45.16.3 Temperature Review

Check	Why It Matters
Rise above ambient	Separates equipment heating from room-temperature change
Phase-to-phase imbalance	Flags electrical connection or load asymmetry
Trend slope	Identifies whether the condition is stable or worsening
Component limit	Keeps decisions tied to the actual device rating
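The four checks can be combined into one review function. In this sketch the phase readings, the rated limit, and the flag thresholds (`imbalance_limit_c`, `slope_limit_c_per_h`) are hypothetical placeholders. Replace them with the ratings and site guidance for the actual equipment.

```python
from typing import Dict, List, Sequence


def slope_per_hour(hours: Sequence[float], temps_c: Sequence[float]) -> float:
    # Ordinary least-squares slope of temperature against time (°C per hour).
    n = len(hours)
    mx, my = sum(hours) / n, sum(temps_c) / n
    num = sum((x - mx) * (y - my) for x, y in zip(hours, temps_c))
    den = sum((x - mx) ** 2 for x in hours)
    return num / den


def review_temperature(phase_c: Sequence[float], ambient_c: float,
                       hours: Sequence[float], hottest_history_c: Sequence[float],
                       rated_limit_c: float, imbalance_limit_c: float = 10.0,
                       slope_limit_c_per_h: float = 0.5) -> Dict[str, object]:
    hottest = max(phase_c)
    rise = hottest - ambient_c                # equipment heating, not room-temperature change
    imbalance = hottest - min(phase_c)        # connection or load asymmetry between phases
    slope = slope_per_hour(hours, hottest_history_c)  # stable or worsening?
    margin = rated_limit_c - hottest          # distance to the actual device rating
    flags: List[str] = []
    if imbalance > imbalance_limit_c:
        flags.append("phase imbalance")
    if slope > slope_limit_c_per_h:
        flags.append("worsening trend")
    if margin <= 0:
        flags.append("at or above rated limit")
    return {"rise_above_ambient_c": rise, "phase_imbalance_c": imbalance,
            "trend_c_per_h": slope, "margin_to_limit_c": margin, "flags": flags}


r = review_temperature(phase_c=[62.0, 60.0, 75.0], ambient_c=30.0,
                       hours=[0, 1, 2, 3], hottest_history_c=[70.0, 72.0, 73.0, 75.0],
                       rated_limit_c=90.0)
print(r["flags"])
```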

45.16.4 ML Model Selection

Labeled failures -> supervised classification or regression
Few or no labels -> anomaly detection plus engineer review
Degradation histories -> RUL or time-series forecasting
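The selection rules above amount to a small decision function. The minimum case count is an assumed placeholder, not a published threshold. A suitable number depends on the fault mode and on how varied the operating conditions are.

```python
from typing import List


def candidate_approaches(labeled_failure_events: int, degradation_histories: int,
                         min_cases: int = 10) -> List[str]:
    """Map available training evidence to model families (min_cases is an assumption)."""
    approaches: List[str] = []
    if labeled_failure_events >= min_cases:
        approaches.append("supervised classification or regression")
    if degradation_histories >= min_cases:
        approaches.append("RUL or time-series forecasting")
    if not approaches:
        # Too little labeled history: learn normal behavior and let engineers review alerts.
        approaches.append("anomaly detection plus engineer review")
    return approaches


print(candidate_approaches(labeled_failure_events=2, degradation_histories=0))
```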

45.16.5 ROI Formula

Annual Savings = (Failures × Detection_Rate × Failure_Cost) - Operating_Cost
Payback_Period = Investment / Annual_Savings
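Applying the two formulas to a hypothetical fleet shows how sensitive the case is to detection rate and operating cost. All numbers in the usage line are illustrative. The formula credits each detected failure with its full failure cost. A more conservative estimate would also subtract the cost of the planned repair that replaces each failure. It is also reasonable to fold the false-positive inspection burden from the planning table into `operating_cost`.

```python
import math
from typing import Tuple


def pdm_roi(failures_per_year: float, detection_rate: float, failure_cost: float,
            operating_cost: float, investment: float) -> Tuple[float, float]:
    """Annual savings and payback period in years, following the formulas above."""
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be between 0 and 1")
    annual_savings = failures_per_year * detection_rate * failure_cost - operating_cost
    # With zero or negative savings the investment never pays back.
    payback_years = investment / annual_savings if annual_savings > 0 else math.inf
    return annual_savings, payback_years


savings, payback = pdm_roi(failures_per_year=6, detection_rate=0.7, failure_cost=40_000,
                           operating_cost=60_000, investment=150_000)
print(f"annual savings ${savings:,.0f}, payback {payback:.1f} years")
```

Dropping `detection_rate` to 0.4 in the same call cuts savings to $36,000 a year and stretches payback past four years. Alert quality therefore deserves proof in the pilot before the project scales.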

Baseline Data Before Alerts

The Error: A factory installs vibration sensors on 50 motors and immediately expects anomaly alerts. After 2 weeks the team has seen zero alerts and assumes the system is broken. Or worse, the team raises sensitivity so high that false alarms overwhelm maintenance.

Why It Happens: Machine learning models need to learn “normal” before detecting “abnormal.” Each motor has a unique vibration signature based on its age, mounting, load, and environment. Without baseline data, the model has no reference.

Real Example: A food processing plant deployed predictive maintenance sensors on 30 pumps. They expected immediate failure predictions. Instead, they got alerts on pumps that had run the same way for 10 years. The “anomalies” were just normal operating characteristics the model hadn’t seen yet.

The Fix:

Run in learning mode for 4-8 weeks to establish a baseline for each motor
Capture the full operating envelope: startup, shutdown, light load, heavy load, and seasonal variation
Label known-good steady-state periods in the training data (exclude startup transients and maintenance events)
Tune thresholds only after the baseline exists, and start conservative (flag only extreme deviations)
Keep retraining as equipment ages, because bearing wear shifts the baseline
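The learning-mode step can be sketched as a per-motor object that only collects data until the baseline window closes, then alerts on large deviations. The sketch makes several assumptions:

- a single RMS feature;
- operating-state labels supplied by the controller (hypothetical names);
- a conservative `k_sigma` of 6 as the starting threshold;
- transient states skipped for both training and scoring, because a steady-state baseline does not describe them.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev
from typing import List

TRANSIENT_STATES = {"startup", "shutdown", "maintenance"}  # hypothetical state labels


@dataclass
class MotorBaseline:
    motor_id: str
    k_sigma: float = 6.0  # start conservative; lower only after technician feedback
    learning: bool = True
    samples: List[float] = field(default_factory=list)
    mu: float = 0.0
    sigma: float = 0.0

    def observe(self, rms: float, state: str) -> bool:
        """Record or score one feature value; return True only if an alert should fire."""
        if state in TRANSIENT_STATES:
            return False  # outside the steady-state baseline: neither trained on nor scored
        if self.learning:
            self.samples.append(rms)  # passive collection: no alerts while learning
            return False
        return abs(rms - self.mu) > self.k_sigma * self.sigma

    def finish_learning(self, min_samples: int = 100) -> None:
        # Freeze the per-motor baseline once enough steady-state data exists.
        if len(self.samples) < min_samples:
            raise ValueError(f"{self.motor_id}: baseline too short ({len(self.samples)} samples)")
        self.mu, self.sigma = mean(self.samples), stdev(self.samples)
        self.learning = False


m = MotorBaseline("motor-12")
for i in range(200):
    m.observe(1.0 if i % 2 else 1.2, "running")
m.observe(5.0, "startup")  # ignored: transient state
m.finish_learning()
print(m.observe(1.3, "running"), m.observe(2.5, "running"))
```

In this sketch, retraining would mean setting `learning` back to `True` with a fresh sample list, either on a schedule or after maintenance. Timing that step is an engineering decision. Retraining while a fault is developing would absorb the fault into "normal."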

Timeline:

Weeks 1-4: Passive data collection, no alerts enabled
Weeks 5-8: Model training on baseline data, internal validation
Week 9: Enable alerts at conservative thresholds (low sensitivity)
Weeks 10-16: Adjust thresholds based on technician feedback
Month 4+: Once alerts consistently match technician findings, raise sensitivity gradually

Key Insight: Rushing to production without baseline data causes alert fatigue (“boy who cried wolf”) that destroys user trust. Technicians who ignore 10 false alarms will ignore the 11th real one. The 4-8 week investment in baseline data pays for itself by preventing trust erosion.

45.17 See Also

Vibration Analysis Sensors — MEMS accelerometer specifications for industrial predictive maintenance (100-1000 Hz sampling, ±50g range)
Time-Series Databases — InfluxDB and TimescaleDB design for storing high-frequency sensor data with millisecond precision
LSTM Neural Networks — Recurrent architecture for remaining useful life forecasting with time-series sensor data
Digital Twins — Virtual equipment replicas that combine real-time sensor data with physics-based models for advanced failure prediction

In 60 Seconds

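Predictive maintenance turns vibration, temperature, current, acoustic, and operating-context signals into justified interventions before failure. It delivers value only under three conditions. Each sensor must target a known failure mode. Each asset needs a learned baseline before alerts are enabled. Every alert must close the loop through review, a work order, and technician disposition.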

45.18 What’s Next

Direction	Chapter	Description
Related	Industry 4.0 Fundamentals	Core concepts and technologies
Related	OPC-UA Standard	Industrial interoperability for data collection
Deep Dive	Data Storage and Databases	Time-series storage for industrial data
Index	Industry 4.0 Fundamentals	Overview of all IIoT topics