Scenario: A manufacturing facility deploys 250 vibration sensors on production machinery, each reporting at 10 Hz. Raw data volume threatens to overwhelm storage capacity and increase cloud costs by 300%. Design a retention strategy that balances operational needs with cost constraints.
Given:
- Sensors: 250 vibration sensors
- Sample rate: 10 Hz (10 readings/second per sensor)
- Data size: 16 bytes per reading (timestamp + sensor_id + vibration_amplitude)
- Current cloud storage cost: $0.023 per GB-month
- Operational requirements: Real-time anomaly detection (last 24 hours), weekly trend analysis, annual compliance reporting
Step 1: Calculate baseline storage requirements
Daily raw data = 250 sensors × 10 readings/sec × 86,400 sec/day × 16 bytes
= 250 × 864,000 × 16 = 3,456,000,000 bytes/day
= 3.22 GB/day raw (taking 1 GB = 2^30 bytes)
Annual raw storage = 3.22 GB/day × 365 days = 1,175 GB/year (1.15 TB)
Monthly cost after one year (no retention policy) = 1,175 GB × $0.023 = $27.03/month, and rising as data keeps accumulating
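The Step 1 arithmetic can be reproduced with a short script. This is a minimal sketch: the constant and function names are illustrative, and it assumes binary gigabytes (2^30 bytes), which is the convention that yields the 3.22 GB/day figure.

```python
# Inputs taken from the "Given" list.
SENSORS = 250
SAMPLE_RATE_HZ = 10
BYTES_PER_READING = 16
SECONDS_PER_DAY = 86_400
PRICE_PER_GB_MONTH = 0.023
BYTES_PER_GB = 2**30  # assumption: binary gigabytes


def daily_raw_bytes(sensors: int, rate_hz: int, bytes_per_reading: int) -> int:
    """Bytes of raw data produced by the whole fleet in one day."""
    return sensors * rate_hz * SECONDS_PER_DAY * bytes_per_reading


daily_bytes = daily_raw_bytes(SENSORS, SAMPLE_RATE_HZ, BYTES_PER_READING)
daily_gb = daily_bytes / BYTES_PER_GB
annual_gb = daily_gb * 365
# Monthly bill once a full year of raw data is sitting in storage.
monthly_cost_after_one_year = annual_gb * PRICE_PER_GB_MONTH

print(f"{daily_bytes:,} bytes/day")
print(f"{daily_gb:.2f} GB/day")
print(f"{annual_gb:,.0f} GB after one year")
print(f"${monthly_cost_after_one_year:.2f}/month after one year")
```

Because the script carries unrounded intermediate values, its last figure comes out one cent below the $27.03 in the text, which multiplies the already rounded 1,175 GB.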
Step 2: Design multi-tier retention policy
Tier 1 (Hot): Raw 10 Hz data, 24-hour retention
Storage = 3.22 GB/day × 1 day = 3.22 GB
Use case: Real-time anomaly detection (FFT analysis requires high-resolution data)
Tier 2 (Warm): 1-second averages (10→1 Hz downsampling), 7-day retention
Reduction = 10:1 temporal × 2:1 compression = 20:1 total
Storage = 3.22 GB/day ÷ 10 × 7 days × 0.5 (compression) = 1.13 GB
Use case: Recent operational diagnostics
Tier 3 (Cool): 1-minute averages, 90-day retention
Downsampling = 600:1 (from raw 10 Hz)
Storage = 3.22 GB/day ÷ 600 × 90 days × 0.5 = 0.24 GB
Use case: Weekly trend analysis
Tier 4 (Archive): Daily aggregates (min/max/avg/p95), 7-year retention
Downsampling = 864,000:1 (one summary record per sensor per day, assumed to fit in a single 16-byte record)
Storage = 3.22 GB/day ÷ 864,000 × 2,555 days × 0.5 = 0.005 GB (5 MB)
Use case: Annual compliance reports
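The rollups behind Tiers 2 to 4 can be sketched in plain Python. This is an illustrative sketch, not a production pipeline: the function names, the (timestamp, amplitude) tuple layout, and the nearest-rank definition of p95 are all assumptions, and a real deployment would more likely use the aggregation features of its time-series database.

```python
from statistics import mean


def downsample_mean(readings, bucket_seconds):
    """Average (timestamp_seconds, amplitude) readings into fixed-width buckets.

    Returns a list of (bucket_start_seconds, mean_amplitude) sorted by time.
    """
    buckets = {}
    for ts, amplitude in readings:
        bucket_start = int(ts // bucket_seconds) * bucket_seconds
        buckets.setdefault(bucket_start, []).append(amplitude)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


def daily_summary(amplitudes):
    """min/max/avg/p95 for one sensor-day (nearest-rank p95, an assumption)."""
    ordered = sorted(amplitudes)
    rank = max(1, -(-95 * len(ordered) // 100))  # ceil(0.95 * n) in integer math
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "avg": mean(ordered),
        "p95": ordered[rank - 1],
    }


# Two seconds of synthetic 10 Hz data for one hypothetical sensor.
raw = [(i / 10, float(i)) for i in range(20)]

one_second = downsample_mean(raw, bucket_seconds=1)   # Tier 2 style rollup
summary = daily_summary([a for _, a in raw])          # Tier 4 style rollup

print(one_second)  # [(0, 4.5), (1, 14.5)]
print(summary)
```

Passing bucket_seconds=60 to the same function gives the 1-minute averages of Tier 3.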
Step 3: Calculate total storage and cost savings
Total storage with retention policy:
Tier 1: 3.22 GB
Tier 2: 1.13 GB
Tier 3: 0.24 GB
Tier 4: 0.005 GB (negligible)
Total: 4.59 GB
Monthly cost = 4.59 GB × $0.023 = $0.11/month
Savings vs. no policy (after one year of raw data) = ($27.03 - $0.11) / $27.03 = 99.6% cost reduction
Storage efficiency: 1,175 GB (annual raw) vs. 4.59 GB (steady state) = 256:1 reduction
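The tier sizes and the Step 3 totals follow from one formula applied four times, which a small script makes explicit. This is a sketch under the worked example's own inputs: the tuple layout and names are illustrative, and the 0.5 compression factor is the example's assumption, not a measured value.

```python
RAW_GB_PER_DAY = 3.22        # from Step 1
ANNUAL_RAW_GB = 1_175        # from Step 1
PRICE_PER_GB_MONTH = 0.023   # from the "Given" list

# (name, downsampling ratio vs. raw, retention in days, compression factor)
TIERS = [
    ("hot", 1, 1, 1.0),
    ("warm", 10, 7, 0.5),
    ("cool", 600, 90, 0.5),
    ("archive", 864_000, 2_555, 0.5),
]


def tier_gb(ratio: int, days: int, compression: float) -> float:
    """Steady-state size of a tier: raw rate / ratio x retention x compression."""
    return RAW_GB_PER_DAY / ratio * days * compression


sizes = {name: tier_gb(ratio, days, comp) for name, ratio, days, comp in TIERS}
total_gb = sum(sizes.values())
monthly_cost = total_gb * PRICE_PER_GB_MONTH
baseline_cost = ANNUAL_RAW_GB * PRICE_PER_GB_MONTH
savings_pct = (baseline_cost - monthly_cost) / baseline_cost * 100

for name, gb in sizes.items():
    print(f"{name:8s} {gb:.3f} GB")
print(f"total    {total_gb:.2f} GB")
print(f"cost     ${monthly_cost:.2f}/month")
print(f"savings  {savings_pct:.1f}%")
print(f"ratio    {ANNUAL_RAW_GB / total_gb:.0f}:1")
```

Changing a retention window or compression factor in TIERS shows immediately how sensitive the total is to each tier. The hot tier dominates, so shortening it has far more effect than trimming the archive.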
Step 4: Validate against operational requirements
- ✓ Real-time anomaly detection: 24-hour hot tier at full 10 Hz resolution
- ✓ Weekly trend analysis: 90-day cool tier with 1-minute granularity
- ✓ Annual compliance: 7-year archive with daily summaries
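The validation can also be written as an automated check that every requirement is covered by at least one tier. In this sketch the numeric thresholds attached to each requirement (coarsest acceptable resolution and minimum history) are assumptions chosen to mirror the checklist above; the scenario states the requirements only in words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    resolution_s: float   # seconds between stored points
    retention_days: int


@dataclass(frozen=True)
class Requirement:
    name: str
    max_resolution_s: float  # coarsest acceptable spacing (assumed)
    min_history_days: int    # how far back the analysis must reach (assumed)


TIERS = [
    Tier("hot", 0.1, 1),
    Tier("warm", 1, 7),
    Tier("cool", 60, 90),
    Tier("archive", 86_400, 2_555),
]

REQUIREMENTS = [
    Requirement("real-time anomaly detection", 0.1, 1),
    Requirement("weekly trend analysis", 60, 7),
    Requirement("annual compliance reporting", 86_400, 365),
]


def serving_tier(req, tiers):
    """Return the coarsest (cheapest to query) tier that satisfies req, or None."""
    candidates = [
        t for t in tiers
        if t.resolution_s <= req.max_resolution_s
        and t.retention_days >= req.min_history_days
    ]
    return max(candidates, key=lambda t: t.resolution_s, default=None)


coverage = {}
for req in REQUIREMENTS:
    tier = serving_tier(req, TIERS)
    coverage[req.name] = tier.name if tier else None

for name, tier_name in coverage.items():
    print(f"{name}: {tier_name or 'NOT COVERED'}")
```

A requirement that maps to None signals a gap in the policy, for example if compliance rules later demanded hourly rather than daily summaries.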
Result: The factory cuts storage cost by 99.6% relative to keeping one year of raw data, while meeting all operational and compliance requirements. The multi-tier retention strategy holds steady-state storage at 4.59 GB, a 256:1 reduction against a year of raw data, without losing analytical capability.
Key Insight: Retention policies must align with data access patterns. High-resolution data is only valuable for recent analysis; historical trends require far less granularity. Always implement retention policies from day one, because retrofitting them on existing petabyte-scale deployments is prohibitively expensive.