Scenario: Thames Water operates the Beckton Sewage Treatment Works in east London, processing 900 million litres of wastewater daily. The plant monitors chlorine residual, turbidity, pH, flow rate, and dissolved oxygen across 45 treatment stages. A contamination event must be detected within 60 seconds (safety-critical), while gradual process degradation should be caught within hours.
Given:
- 280 sensors across 45 treatment stages
- Sampling rates: Chlorine/pH at 1 Hz, turbidity at 0.5 Hz, flow at 0.1 Hz
- Total data rate: 280 sensors × 0.5 Hz average = 140 readings/second
- Safety requirement: Chlorine deviation >0.5 mg/L detected within 60 seconds
- Process optimization: Detect sludge thickener degradation within 4 hours
- Historical data: 3 years of sensor readings with 47 labeled contamination events
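The aggregate data rate is easy to sanity-check from the figures above; a quick back-of-the-envelope in Python (the daily-volume extrapolation is mine, not from the brief):

```python
# Sanity-check the aggregate data rate from the "Given" figures.
SENSORS = 280
AVG_RATE_HZ = 0.5  # average across 1 Hz chlorine/pH, 0.5 Hz turbidity, 0.1 Hz flow

readings_per_second = SENSORS * AVG_RATE_HZ
readings_per_day = readings_per_second * 86_400  # seconds per day

print(f"{readings_per_second:.0f} readings/s, {readings_per_day:,.0f} readings/day")
```

At roughly 12 million readings a day, three years of history lands in the ~10-billion-point range used for cloud training later.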
Step 1 – Deploy edge detection (safety-critical, <60 second response):
Each treatment stage has a PLC (Siemens S7-1500) performing edge detection:
| Rule | Method | Threshold | Detection time | Action |
|---|---|---|---|---|
| Chlorine out of range | Hard bounds | <0.2 or >4.0 mg/L | <1 second | Emergency shutdown of dosing system |
| pH extreme | Hard bounds | <5.0 or >10.0 | <1 second | Divert flow to holding tank |
| Chlorine rate-of-change | Delta check | >0.3 mg/L in 30 seconds | 30 seconds | Alert operator + increase sampling |
| Turbidity spike | Z-score (window=60) | Z > 4.0 | 60 seconds | Alert + trigger grab sample |
Edge detection catches: instantaneous sensor failures, sudden contamination events, dosing equipment malfunctions.
Edge detection misses: gradual fouling, seasonal baseline shifts, multi-sensor correlated degradation.
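The three rule types above (hard bounds, delta check, windowed z-score) can be sketched in Python for illustration; on the plant they run as structured-text logic on the S7-1500 PLCs, so this is a model of the rules, not deployment code. Thresholds are taken from the table:

```python
def hard_bounds(value, lo, hi):
    """Out-of-range check (e.g. chlorine <0.2 or >4.0 mg/L): trips in <1 s."""
    return value < lo or value > hi

def delta_check(window, limit):
    """Rate-of-change check: has the reading moved more than `limit`
    across the sliding window (30 samples at 1 Hz = 30 seconds)?"""
    return max(window) - min(window) > limit

def z_score_spike(window, value, z_limit=4.0):
    """Windowed z-score (window=60): flag the newest reading if it sits
    more than z_limit standard deviations from the window mean."""
    n = len(window)
    mean = sum(window) / n
    std = (sum((x - mean) ** 2 for x in window) / n) ** 0.5 or 1e-9
    return abs(value - mean) / std > z_limit

# Chlorine climbing 0.35 mg/L over 30 s trips the delta check
chlorine_30s = [1.0 + 0.35 * i / 29 for i in range(30)]
# Turbidity window with normal scatter around 2.1 NTU
turbidity = [2.0, 2.2] * 30

print(hard_bounds(4.2, 0.2, 4.0))       # True: emergency shutdown
print(delta_check(chlorine_30s, 0.3))   # True: alert operator
print(z_score_spike(turbidity, 3.0))    # True: z = 9, trigger grab sample
```

Note how none of these rules carries state beyond a short window, which is what makes them fit on a PLC and also exactly why they miss gradual fouling and baseline drift.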
Step 2 – Deploy gateway detection (process anomalies, ~5 minute response):
A gateway server (Dell PowerEdge, on-premises) runs Isolation Forest models:
- Input: 5-minute rolling features from 280 sensors (mean, std, min, max, trend slope)
- Model: 45 Isolation Forest models (one per treatment stage), trained on 2 years of normal operation
- Detection targets: Multi-sensor anomalies where individual readings are in-range but the combination is abnormal
- Example: Turbidity normal (2.1 NTU), pH normal (7.2), but dissolved oxygen dropping from 6.0 to 4.5 mg/L while flow is constant, indicating biological treatment failure
| Detector | Features | Retraining | False-positive rate |
|---|---|---|---|
| Multi-sensor correlation | 5-minute aggregates | Monthly retrain | 2.1% (acceptable for operator review) |
| Diurnal pattern deviation | 1-hour features vs time-of-day profile | Weekly profile update | 1.8% |
| Inter-stage anomaly | Difference between stage N and stage N+1 | Monthly | 0.9% |
Step 3 – Deploy cloud detection (long-term degradation, hours to days):
Azure cloud runs LSTM autoencoders on historical patterns:
- Training data: 3 years, 47 labeled events, ~10 billion data points
- Model: LSTM autoencoder per treatment stage, 168-hour (1-week) input window
- Detection: Reconstruction error exceeding 99th percentile of training distribution
- Targets: Membrane fouling (gradual over 2-4 weeks), sludge thickener degradation (5-10 day progression), seasonal algae bloom effects
Retraining: Quarterly, incorporating newly labeled events, with A/B testing of candidate models against the incumbent before promotion to production.
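The alerting logic on top of the autoencoder can be sketched independently of the model itself. The reconstruction errors below are synthetic stand-ins for real model output, and the consecutive-window debounce is my assumption rather than something stated in the design; the 99th-percentile threshold is from the description above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Reconstruction errors from the (omitted) LSTM autoencoder on its own
# training windows -- synthetic values standing in for real model output.
train_errors = rng.gamma(shape=2.0, scale=0.05, size=10_000)

# Alert threshold: 99th percentile of the training-error distribution.
threshold = np.percentile(train_errors, 99)

def flag_degradation(window_errors, threshold, min_consecutive=3):
    """Flag slow degradation when reconstruction error stays above the
    threshold for several consecutive windows. The debounce suppresses
    one-off spikes the faster tiers already handle (an assumption here)."""
    run = 0
    for err in window_errors:
        run = run + 1 if err > threshold else 0
        if run >= min_consecutive:
            return True
    return False

# Membrane-fouling-like scenario: errors ramp up over successive windows.
fouling = np.concatenate([rng.gamma(2.0, 0.05, 50), np.linspace(0.3, 0.9, 10)])
print(flag_degradation(fouling, threshold))  # True
```

A sustained excursion past the 99th percentile is what distinguishes 2-4 week fouling from the point anomalies the edge and gateway tiers cover.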
Step 4 – Calculate detection coverage:
| Anomaly type | Edge | + Gateway | + Cloud | Full pipeline |
|---|---|---|---|---|
| Sudden contamination (47 events in 3 years) | 43/47 (91%) | 46/47 (98%) | 47/47 (100%) | 47/47 (100%) |
| Gradual degradation (23 events) | 2/23 (9%) | 14/23 (61%) | 21/23 (91%) | 21/23 (91%) |
| Sensor malfunction (156 events) | 148/156 (95%) | 155/156 (99%) | 156/156 (100%) | 156/156 (100%) |
| Weighted detection rate | 77% | 90% | 96% | 98% |
Cost of the pipeline:
| Tier | Hardware cost | Software | Annual cost |
|---|---|---|---|
| Edge (45 PLCs, existing) | GBP 0 (already installed) | Configuration only | GBP 5,000 |
| Gateway (1 server) | GBP 4,000 (one-time) | Open-source ML stack | GBP 2,000 |
| Cloud (Azure) | – | Azure ML + storage | GBP 14,000 |
| Total annual | | | GBP 21,000 |
Value: A single undetected contamination event that results in a Drinking Water Inspectorate prosecution costs GBP 250,000-500,000 in fines plus remediation. The pipeline caught 3 near-miss events in its first year of operation.
Result: The three-tier architecture achieves a 98% weighted detection rate at GBP 21,000/year. The edge tier alone covers 77% of anomalies with sub-minute response; adding the gateway lifts weighted coverage to 90% by catching multi-sensor correlations, the cloud tier lifts it to 96% by catching long-term degradation, and the full pipeline reaches 98%. Each tier catches anomaly types the tiers below it miss.
Key Insight: The three tiers are not redundant; they detect fundamentally different anomaly types, and removing any tier creates blind spots. Edge catches sudden failures (critical for safety). Gateway catches cross-sensor correlations (critical for process quality). Cloud catches slow degradation (critical for maintenance planning). Cost also scales with the layer: edge detection reuses existing PLCs (configuration cost only), the gateway costs GBP 2,000/year to run, and the cloud tier GBP 14,000/year.