29 Cloud Data: Platforms and Services

29.1 Start With the Story

Picture an IoT team applying the ideas in this chapter during a live operations review. A device has produced messy evidence, an analytic step is about to change an alert or control decision, and someone has to explain why the result should be trusted.

Read this page as that path from sensor evidence to accountable action. Start with what the system observes, keep the model or data treatment visible, and finish with the check that would convince an operator, maintainer, or auditor to act.

29.2 Cloud IoT as Data Plane

Cloud IoT platforms combine a device control plane with a data plane. The control plane manages identities, certificates, device registry records, firmware status, policies, and fleet operations. The data plane ingests telemetry, routes events, stores evidence, runs stream and batch jobs, feeds dashboards, and sends selected data to analytics or machine-learning systems.

The service model changes who operates each layer. With IaaS, the team manages virtual machines, operating systems, brokers, databases, and failover. With PaaS, services such as AWS IoT Core, Azure IoT Hub, Google Cloud Pub/Sub, Azure Event Hubs, Amazon Kinesis, Google Cloud Dataflow, managed Flink or Spark offerings, and managed databases take over much of the plumbing. With SaaS, the team buys a complete application and accepts less control over the pipeline.

Platform choice should start from responsibilities rather than product names: device identity, ingestion throughput, stream processing, storage lifecycle, query latency, governance, operations skills, and exit strategy. A defensible choice gathers workload and ownership evidence for each of these before selecting IaaS, PaaS, SaaS, or a hybrid boundary.

Layer	What it covers
Device Control	Registry, certificates, twins or shadows, command policies, OTA state, and fleet health.
Ingestion	MQTT, HTTPS, AMQP, broker rules, schema validation, throttling, partitioning, and dead-letter routing.
Processing	Rules engines, stream processors, serverless functions, container jobs, batch pipelines, and model scoring.
Storage	Hot time-series stores, lakehouse tables, object storage, warehouse tables, lifecycle rules, and archives.
Operations	Access control, cost tags, logs, metrics, alerts, deployment history, replay evidence, and runbooks.
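
The ingestion layer's schema validation and dead-letter routing can be illustrated with a small sketch. This is not any provider's API: the `Router` class, the `REQUIRED_FIELDS` contract, and the field names `device_id`, `ts`, and `temp_c` are hypothetical, chosen only to show how a bad message might be quarantined with a reason instead of being dropped silently.

```python
import json
from dataclasses import dataclass, field

# Hypothetical device contract: required field name -> accepted type(s).
REQUIRED_FIELDS = {"device_id": str, "ts": (int, float), "temp_c": (int, float)}


@dataclass
class Router:
    """Splits incoming payloads into accepted events and quarantined ones."""
    accepted: list = field(default_factory=list)
    dead_letter: list = field(default_factory=list)

    def _quarantine(self, raw: str, reason: str) -> bool:
        # Keep the raw payload and the reason so the message can be replayed later.
        self.dead_letter.append({"raw": raw, "reason": reason})
        return False

    def ingest(self, raw: str) -> bool:
        """Validate one raw payload; return True if it was accepted."""
        try:
            msg = json.loads(raw)
        except json.JSONDecodeError as exc:
            return self._quarantine(raw, f"invalid JSON: {exc.msg}")
        if not isinstance(msg, dict):
            return self._quarantine(raw, "payload is not an object")
        for name, types in REQUIRED_FIELDS.items():
            value = msg.get(name)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, types):
                return self._quarantine(raw, f"bad or missing field: {name}")
        self.accepted.append(msg)
        return True


router = Router()
router.ingest('{"device_id": "truck-17", "ts": 1700000000, "temp_c": 4.5}')
router.ingest('{"device_id": "truck-18", "ts": "yesterday", "temp_c": 4.1}')
router.ingest("not json")
print(len(router.accepted), "accepted,", len(router.dead_letter), "quarantined")
```

In a managed platform the same idea usually lives in broker rules or a stream job; the point of the sketch is that every rejected message keeps its raw form and a reason, which is what makes later replay and auditing possible.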

Model	Provider Operates	Team Operates	IoT Fit
IaaS	Virtual machines, block storage, virtual networks, and regional infrastructure.	Broker software, OS patching, database backup, scaling, failover, and security hardening.	Specialized workloads, custom networking, or teams with strong infrastructure operations.
PaaS	Managed hubs, event buses, stream services, databases, storage classes, scaling, and service availability.	Device contracts, schemas, access policy, pipeline code, retention, cost controls, and application logic.	Most fleet telemetry systems where speed to production and managed reliability matter.
SaaS	The complete application, data model, dashboards, upgrades, hosting, and common workflows.	Configuration, integrations, data export, process fit, vendor controls, and user adoption.	Standard asset monitoring or operations workflows where customization is limited.
Hybrid edge-cloud	Cloud control, model distribution, selected analytics services, and central governance.	Edge gateways, offline behavior, local control, site networking, and raw evidence retention.	Latency-sensitive, bandwidth-limited, private, or intermittently connected deployments.

29.3 Compare Full Service Chain

Cloud platform selection should compare the whole data chain, not only the published message price for an IoT hub. The actual stack includes identity integration, device provisioning, broker limits, stream processing, serverless or container workloads, storage tiers, analytics query engines, dashboards, observability, support, and the cost of the team operating those parts.

Real platforms differ most in surrounding services and team fit. AWS IoT Core often pairs with AWS IoT Greengrass, Kinesis, Lambda, Timestream, S3, Glue, Athena, Redshift, and SageMaker. Azure IoT Hub often pairs with Azure IoT Edge, Event Hubs, Stream Analytics, Data Explorer, Blob Storage, Synapse, Fabric, Power BI, and Microsoft Entra ID. Google Cloud architectures often use Pub/Sub, Dataflow, Cloud Run, BigQuery, Cloud Storage, Looker, and Vertex AI, with device connectivity supplied through MQTT brokers, gateways, or partner services. Open or edge-first stacks may use Eclipse Mosquitto, EMQX, ThingsBoard, ClearBlade, Kafka, Flink, Spark, TimescaleDB, InfluxDB, ClickHouse, or object storage.

Worked example: fleet telemetry sizing
fleet: 10,000 delivery vehicles
reporting: every 30 seconds
payload: 200 bytes per message
operating window: 12 hours/day, 26 days/month

messages per vehicle per day:
12 hours * 60 minutes/hour * 2 messages/minute = 1,440 messages

daily message count:
10,000 * 1,440 = 14,400,000 messages/day

monthly message count:
14,400,000 * 26 = 374,400,000 messages/month

daily data volume:
14,400,000 * 200 bytes = 2,880,000,000 bytes = 2.88 GB/day

monthly data volume:
2.88 GB/day * 26 = 74.88 GB/month

90-day raw retention (approximated as three 26-day operating months):
74.88 GB/month * 3 = 224.64 GB

two-year aggregate retention at 10:1 compression:
74.88 GB/month * 24 / 10 = 179.712 GB

example message charge:
if the contracted rate is $0.80 per million messages,
374.4 million * $0.80 = $299.52/month for messaging.

design reading:
The message charge is only one line item. The platform choice must also account
for identity integration, stream processing, map or route services, dashboards,
storage lifecycle, monitoring, support, and operating effort.
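
The arithmetic above can be kept as a small script so the assumptions stay visible and can be changed during a review. The sketch below simply reproduces the worked example; the `FleetWorkload` class and its method names are illustrative, and the $0.80 per million rate is the example's placeholder, not a quoted price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FleetWorkload:
    """Inputs of the fleet telemetry sizing example."""
    vehicles: int
    report_interval_s: int
    payload_bytes: int
    hours_per_day: int
    days_per_month: int

    def messages_per_vehicle_per_day(self) -> int:
        return self.hours_per_day * 3600 // self.report_interval_s

    def messages_per_day(self) -> int:
        return self.vehicles * self.messages_per_vehicle_per_day()

    def messages_per_month(self) -> int:
        return self.messages_per_day() * self.days_per_month

    def gb_per_month(self) -> float:
        # Decimal gigabytes (1 GB = 1e9 bytes), matching the worked example.
        return self.messages_per_month() * self.payload_bytes / 1e9

    def message_charge(self, usd_per_million: float) -> float:
        return self.messages_per_month() / 1e6 * usd_per_million


fleet = FleetWorkload(vehicles=10_000, report_interval_s=30, payload_bytes=200,
                      hours_per_day=12, days_per_month=26)

# Retention figures follow the example: three months raw, 24 months at 10:1.
raw_90_day_gb = fleet.gb_per_month() * 3
aggregate_2_year_gb = fleet.gb_per_month() * 24 / 10

print(f"{fleet.messages_per_month():,} messages/month")
print(f"{fleet.gb_per_month():.2f} GB/month")
print(f"{raw_90_day_gb:.2f} GB raw, {aggregate_2_year_gb:.3f} GB aggregate")
print(f"${fleet.message_charge(0.80):.2f}/month messaging")
```

Changing one input, such as the reporting interval or the operating window, recomputes every downstream figure, which is the main reason to keep the model in code rather than in a slide.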

A low message price can lose to a better-integrated platform if identity, dashboarding, data export, compliance, or operations work becomes expensive.

Dimension	Question	Evidence to Request	Common Mistake
Device onboarding	How are identities, certificates, provisioning batches, revocation, and ownership transfer handled?	Provisioning logs, certificate policy, device registry schema, and recovery workflow.	Counting message cost while ignoring device lifecycle labor.
Ingestion and routing	Can the hub handle peak burst rate, topic schema, backpressure, and bad-message quarantine?	Broker limits, partition plan, schema registry, dead-letter queue, and replay test.	Designing for average traffic and failing after a reconnect storm.
Storage and analytics	Which data is hot, warm, cold, downsampled, indexed, archived, or deleted?	Lifecycle rules, table layout, query scan bytes, retention policy, and restore test.	Keeping all raw data in a fast tier because it is easier on day one.
Operations fit	Can the team monitor, deploy, secure, budget, and debug this stack with available skills?	Runbooks, ownership matrix, cost tags, dashboards, alert history, and training plan.	Choosing the most feature-rich stack without matching the team’s operations capacity.
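
The gap between average traffic and a reconnect storm can be estimated before asking a provider for broker limits. The sketch below reuses the 10,000-vehicle, 30-second fleet from the sizing example; the 10-minute outage, the 60-second reconnect window, and the assumption that every device buffers and then flushes its full backlog are illustrative guesses, not measured behavior.

```python
def average_rate(devices: int, interval_s: float) -> float:
    """Steady-state ingest rate in messages per second."""
    return devices / interval_s


def reconnect_storm_rate(devices: int, interval_s: float,
                         outage_s: float, reconnect_window_s: float) -> float:
    """Rough peak rate when devices flush buffered messages after an outage.

    Assumes every device buffered one message per interval during the outage
    and that the whole backlog arrives evenly across the reconnect window,
    on top of normal traffic.
    """
    backlog_messages = devices * (outage_s / interval_s)
    return backlog_messages / reconnect_window_s + average_rate(devices, interval_s)


avg = average_rate(devices=10_000, interval_s=30)
storm = reconnect_storm_rate(devices=10_000, interval_s=30,
                             outage_s=600, reconnect_window_s=60)

print(f"average: {avg:.0f} msg/s")
print(f"reconnect storm: {storm:.0f} msg/s ({storm / avg:.1f}x average)")
```

With these inputs the storm rate is about eleven times the average, which is the number to compare against broker throughput limits, partition plans, and throttling rules. Real fleets reconnect unevenly, so a replay test remains the stronger evidence.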

29.4 Cloud Cost as Operations

Cloud cost is not a single billable unit. IoT systems pay for messages, broker throughput, stream processing, serverless duration, always-on containers, databases, object storage, data transfer, warehouse scans, observability, backups, support, and engineering time. Cost control is therefore an operating model: tag resources, set budgets, tie alerts to owners, tier data, shut down idle compute, reserve only predictable capacity, and test restore and replay before deleting anything.

Latency and sovereignty can be more important than cost. A cloud round trip may be too slow for machine protection or local control. Some data may be legally or contractually required to stay in a region or site. In those cases, edge processing, regional routing, private networking, or hybrid storage are platform requirements rather than optimization extras.

Worked example: always-on compute versus autoscaling
always-on deployment:
3 small workers at $0.05/hour
2 medium workers at $0.10/hour
month length: 30 days = 720 hours

always-on compute:
(3 * $0.05 + 2 * $0.10) * 720
($0.15 + $0.20) * 720 = $252/month

non-compute add-ons:
storage, network, logs, monitoring, and backups add 40 percent of compute
$252 * 0.40 = $100.80/month
baseline total = $352.80/month

reserved or committed compute at 30 percent discount:
compute = $252 * 0.70 = $176.40
total = $176.40 + $100.80 = $277.20/month
savings = $75.60/month

autoscaling for a bursty workload active only 25 percent of the month:
compute = $252 * 0.25 = $63.00
total = $63.00 + $100.80 = $163.80/month

design reading:
Committed discounts help predictable loads. Autoscaling helps bursty loads.
The right platform must expose enough metrics and controls to prove which
pattern is true for the workload.
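
The same comparison can be expressed as a short script, which makes it easy to test other duty cycles or discounts. This sketch mirrors the worked example, including its simplification that non-compute add-ons stay fixed at 40 percent of the always-on compute cost in every scenario; the function and constant names are illustrative.

```python
HOURS_PER_MONTH = 720          # 30-day month, as in the worked example
WORKERS = [(3, 0.05), (2, 0.10)]  # (count, USD per hour)
ADDON_FRACTION = 0.40          # storage, network, logs, monitoring, backups


def always_on_compute(workers=WORKERS, hours=HOURS_PER_MONTH) -> float:
    """Monthly compute cost if every worker runs all month."""
    return sum(count * rate for count, rate in workers) * hours


def monthly_total(compute_factor: float) -> float:
    """Total cost when compute is scaled by a factor and add-ons stay fixed."""
    baseline = always_on_compute()
    return baseline * compute_factor + baseline * ADDON_FRACTION


baseline_total = monthly_total(1.00)      # always on, on-demand rates
committed_total = monthly_total(0.70)     # 30 percent committed discount
autoscaled_total = monthly_total(0.25)    # active 25 percent of the month

print(f"always-on:  ${baseline_total:.2f}")
print(f"committed:  ${committed_total:.2f}")
print(f"autoscaled: ${autoscaled_total:.2f}")
```

Under these assumptions autoscaling is cheaper than the 30 percent commitment whenever the workload is active less than 70 percent of the month, because both options simply scale the same $252 compute line. Measured duty cycle is therefore the figure the platform's metrics need to supply.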

Responsibility	What the team must own
Security Responsibility	Cloud providers secure their services; the team still owns device credentials, topic policy, IAM, network access, and data classification.
Lifecycle Responsibility	Retention, downsampling, archive restore, deletion, and legal hold rules must be explicit and tested.
Observability Responsibility	Message drops, throttling, dead-letter growth, stream lag, late events, query scans, and cost anomalies need owners.
Portability Responsibility	Use standard MQTT topics, schema registries, exportable table formats, and documented replay paths where exit risk matters.
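
Lifecycle rules are easier to test when they are written down as a function rather than described in prose. The sketch below is one possible shape: the 90-day raw and two-year aggregate limits come from the sizing example in Section 29.3, while the 7-day hot window, the tier names, and the `lifecycle_action` function are hypothetical.

```python
from datetime import date

RAW_RETENTION_DAYS = 90          # raw telemetry limit from the sizing example
AGGREGATE_RETENTION_DAYS = 730   # two-year aggregate limit from the same example
HOT_DAYS = 7                     # hypothetical window for the fast query tier


def lifecycle_action(kind: str, recorded: date, today: date,
                     legal_hold: bool = False) -> str:
    """Return the tier or action for one record under the assumed policy."""
    if kind not in ("raw", "aggregate"):
        raise ValueError(f"unknown record kind: {kind}")
    age_days = (today - recorded).days
    limit = RAW_RETENTION_DAYS if kind == "raw" else AGGREGATE_RETENTION_DAYS
    if age_days > limit:
        # A legal hold overrides deletion; the record stays in the archive.
        return "archive (legal hold)" if legal_hold else "delete"
    return "hot" if age_days <= HOT_DAYS else "cold"


today = date(2025, 6, 30)
print(lifecycle_action("raw", date(2025, 6, 28), today))
print(lifecycle_action("raw", date(2025, 1, 1), today))
print(lifecycle_action("raw", date(2025, 1, 1), today, legal_hold=True))
print(lifecycle_action("aggregate", date(2024, 1, 1), today))
```

Because the policy is a pure function of record kind, age, and hold status, it can be unit-tested against the written retention policy before any lifecycle rule is applied to real storage.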

When a platform review is complete, the output should be more than a selected vendor. It should include a service map, a responsibility matrix, a cost model with assumptions, a retention policy, an access-control model, a replay plan, and a list of constraints that require edge or hybrid processing. Without those artifacts, the platform choice cannot be operated or audited.

29.5 Summary

Cloud IoT platforms combine a device control plane with a telemetry data plane.
IaaS, PaaS, SaaS, and hybrid edge-cloud models trade operational control for managed reliability and speed of delivery.
Platform selection should compare the full service chain: identity, ingestion, routing, stream processing, storage, analytics, governance, observability, support, and team skills.
Workload math should include message count, payload volume, retention tiers, query patterns, compute duty cycle, and non-compute add-ons.
Cost control is an operating model: resource tags, budgets, lifecycle rules, autoscaling, committed capacity, replay plans, and owned alerts.

29.6 Key Takeaway

Choose cloud IoT services by operational fit, not by feature count or headline message price. A defensible platform decision names the device contract, expected rates, processing path, storage lifecycle, security responsibility, replay evidence, cost assumptions, and the team that will operate each layer.

29.7 Common Pitfalls

Comparing only IoT hub message pricing while ignoring stream processing, storage, dashboards, observability, support, and staffing.
Building firmware around provider-specific APIs without a migration or gateway strategy.
Designing for average telemetry rate while ignoring reconnect storms and peak bursts.
Keeping all raw data hot because retention and lifecycle rules were not planned.
Treating cloud security as fully provider-owned instead of a shared responsibility.

29.8 See Also

If you want to…	Read this
Place cloud services in the upper IoT reference-model layers	Cloud Data IoT Reference Model
Secure cloud pipelines and govern sensitive telemetry	Cloud Data Quality and Security
Design ingestion and transformation paths before choosing a platform	Big Data Pipelines
Operate replay, freshness, retention, and cost controls	Big Data Operations
Decide which work should stay near devices	Edge Processing for Big Data