144 SOA and Microservices Fundamentals
144.1 Learning Objectives
By the end of this chapter, you will be able to:
- Evaluate SOA versus Microservices: Distinguish the evolution from Service-Oriented Architecture to microservices and justify when each approach is appropriate for IoT systems
- Decompose IoT Platforms: Break down IoT platforms into independent, loosely coupled services using domain-driven design principles
- Determine Service Boundaries: Apply the two-pizza rule and domain analysis to establish optimal service granularity
144.2 Prerequisites
Before diving into this chapter, you should be familiar with:
- Cloud Computing for IoT: Understanding cloud service models (IaaS, PaaS, SaaS) and deployment patterns provides essential context for where microservices run
- IoT Reference Architectures: Knowledge of layered IoT architectures helps understand how services map to device, gateway, and cloud tiers
- Communication and Protocol Bridging: Understanding protocol translation is essential for services that bridge device protocols to standard APIs
- MQTT Fundamentals: MQTT is the primary messaging protocol for IoT microservices communication
Core Concept: Service architecture is a spectrum from monoliths (one codebase, simple) to microservices (many codebases, complex). The right choice depends on team size, scale requirements, and whether you’re integrating existing systems or building new ones.
Why It Matters: Premature microservices adoption is the #1 architecture mistake in IoT startups. A 5-person team managing 30 microservices spends 60% of time on infrastructure instead of features. Conversely, a 100-person team with a monolith faces deployment bottlenecks and scaling limits. Architecture must match organizational reality.
Key Takeaway: Follow the “Team Topology Rule”: start with a monolith until you have 3+ teams needing independent deployment. Use SOA for enterprise integration of legacy systems. Use microservices for cloud-native greenfield development at scale. Most IoT projects should start monolithic and extract services only when team coordination or scaling forces the split.
Microservices are like having a team of specialists where each friend does ONE job really well!
144.2.1 The Sensor Squad Adventure: The Pizza Restaurant Problem
Imagine the Sensor Squad wanted to open a pizza restaurant! At first, Sunny the Light Sensor tried to do EVERYTHING alone: take orders, make dough, add toppings, bake pizzas, AND deliver them!
Poor Sunny was exhausted and pizzas were slow. “I can only make 3 pizzas per hour doing everything myself!” Sunny complained.
Then Motion Mo had a brilliant idea: “What if we each do just ONE thing we’re really good at?”
- Sunny became the Order Taker (just takes orders, nothing else!)
- Thermo became the Oven Master (just bakes, knows exactly when pizzas are done!)
- Pressi became the Dough Maker (presses and stretches dough perfectly!)
- Droppy became the Topping Artist (adds just the right amount of cheese!)
- Signal Sam became the Delivery Driver (knows all the fastest routes!)
Now they could make 20 pizzas per hour! And when the restaurant got REALLY busy, they just added more Sunnys to take orders, without needing more of everyone else. That’s microservices!
144.2.2 Key Words for Kids
| Word | What It Means |
|---|---|
| Service | One helper that does just ONE specific job really well |
| Microservice | A tiny service that only knows how to do one thing (like ONLY taking orders) |
| API | The special language services use to talk to each other (“Hey Thermo, bake pizza #5!”) |
| Container | A special box that has everything a service needs to do its job |
144.2.3 Try This at Home!
The Restaurant Game: Play restaurant with your family! Give each person ONE job only:

- One person takes orders (writes them down)
- One person “cooks” (counts to 10 for each order)
- One person “delivers” (brings the paper to the customer)
Time how long it takes to complete 5 orders. Now try it with one person doing ALL jobs. Which was faster? That’s why computers use microservices!
144.3 Getting Started (For Beginners)
144.3.1 What is a Service? (Simple Explanation)
Think of a service as a specialized worker in a factory:
| Traditional App | Service-Based App |
|---|---|
| One giant program does everything | Many small programs, each doing one thing |
| Like one person running a restaurant | Like a team with chef, waiter, cashier |
| Change one thing, redeploy everything | Change one service, deploy just that |
Real IoT Example: Instead of one monolithic IoT platform, a service-based design runs separate small programs - a device registry, a telemetry ingestion service, an analytics engine, and an alert service - each doing one thing well.
144.3.2 SOA vs Microservices: The Evolution
Service-Oriented Architecture (SOA) came first (2000s):

- Services share a common Enterprise Service Bus (ESB)
- Centralized governance and contracts
- Designed for enterprise integration

Microservices evolved from SOA (2010s):

- Each service is fully independent
- Decentralized data management
- Designed for cloud-native deployment
| Aspect | SOA | Microservices |
|---|---|---|
| Communication | Enterprise Service Bus (ESB) | Lightweight APIs (REST, gRPC) |
| Data | Shared database common | Database per service |
| Size | Larger, coarse-grained | Smaller, fine-grained |
| Governance | Centralized | Decentralized |
| Deployment | Often shared servers | Independent containers |
| Best For | Enterprise integration | Cloud-native apps |
144.4 The Business Case for Decomposition
When does the overhead of microservices pay off? Here is a concrete comparison.
Scenario: IoT platform supporting 50,000 devices, growing 30% yearly.
| Factor | Monolith (Year 1) | Monolith (Year 2) | Microservices (Year 2) |
|---|---|---|---|
| Team size | 8 developers | 18 developers | 18 developers (3 teams of 6) |
| Deploy frequency | 3x/week | 1x/week (merge conflicts) | 3x/week per team (9x total) |
| Deploy coordination | 15 min standup | 2-hour release meeting | No cross-team coordination |
| Time to ship feature | 2 weeks | 5 weeks (testing bottleneck) | 2 weeks (independent testing) |
| Incident blast radius | Entire platform | Entire platform | Single service |
| Scaling cost | $800/mo (3 large VMs) | $3,200/mo (12 large VMs) | $1,800/mo (scale only what needs it) |
The tipping point: At 8 developers, the monolith worked well. At 18, coordination overhead consumed 40% of engineering time. After decomposition into 5 services, each team of 6 operated independently, and the coordination overhead dropped to under 10%.
Conway’s Law states that system architecture mirrors team structure. The coordination overhead for microservices scales quadratically:
\[C_{coord} = k \times \frac{N(N-1)}{2} \times \frac{1}{T}\]
where \(N\) = number of services, \(T\) = number of teams, \(k\) = hours per cross-service integration per month.
Example: 30 services managed by 3 teams, 1 hour per integration pair per month:
- Total possible dependencies: \(\frac{30 \times 29}{2} = 435\) pairs
- Coordination cost: \(1 \times \frac{435}{3} = 145\) hours/month
- At $150/hour: $21,750/month in coordination overhead alone
The “two-pizza rule” (5-8 people per service) keeps each service small enough to be owned by a single independent team, growing \(T\) alongside \(N\) and reining in this quadratic explosion.
Rule of thumb: Expect microservices overhead to cost ~20% of engineering effort (CI/CD, service mesh, monitoring). This pays off when monolith coordination overhead exceeds 25%, which typically happens at 12-15 developers or when deploy frequency drops below weekly.
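The coordination-cost formula above can be sketched as a quick calculator (the function name and hourly rate are illustrative):

```python
def coordination_cost(n_services: int, n_teams: int, k_hours_per_pair: float = 1.0) -> float:
    """C_coord = k * N(N-1)/2 * 1/T, in hours per month."""
    pairs = n_services * (n_services - 1) / 2  # total possible service dependency pairs
    return k_hours_per_pair * pairs / n_teams

# Worked example from the text: 30 services, 3 teams, 1 hour per pair per month
hours = coordination_cost(30, 3)   # 435 pairs / 3 teams = 145.0 hours/month
dollars = hours * 150              # $21,750/month at $150/hour
```

Plugging in your own service and team counts makes the quadratic growth concrete: doubling services roughly quadruples the pair count.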
144.5 Service Decomposition Strategies
Breaking a system into services requires careful thought. Poor decomposition leads to a “distributed monolith” - all the complexity of microservices with none of the benefits.
144.5.1 Decomposition Approaches
1. Decompose by Business Capability
Align services with business functions:
| Business Capability | IoT Service | Responsibility |
|---|---|---|
| Device Lifecycle | Device Registry | Provisioning, updates, retirement |
| Data Collection | Telemetry Ingestion | Receive, validate, route sensor data |
| Analysis | Analytics Engine | Process, aggregate, detect patterns |
| User Notification | Alert Service | Trigger and deliver alerts |
| Billing | Usage Metering | Track consumption, generate invoices |
2. Decompose by Subdomain (Domain-Driven Design)
- Core Domain: Your competitive advantage. Build custom, invest heavily.
- Supporting Domain: Necessary but not unique. Build or customize.
- Generic Domain: Same everywhere. Buy off-the-shelf or use SaaS.
For IoT platforms, telemetry processing and analytics are typically core domains, while authentication and billing are generic.
144.5.2 Service Boundaries: The Two-Pizza Rule
Amazon’s “two-pizza rule”: If a service team can’t be fed by two pizzas, the service is too big.
Signs your service is too large:
- Multiple teams work on it
- Deployments require coordination
- Changes frequently cause unrelated breakages
- Different parts have different scaling needs
Signs your service is too small:
- Excessive inter-service communication
- Simple operations require multiple service calls
- Shared data requires distributed transactions
- Team manages 10+ services
144.6 Common Patterns and Anti-Patterns
Understanding what works and what fails in IoT service architectures helps avoid costly mistakes.
144.6.1 Patterns for IoT Success
| Pattern | Description | IoT Application |
|---|---|---|
| API Gateway | Single entry point for all clients | Unified interface for devices, mobile, and web |
| Database-per-Service | Each service owns its data | Device service uses PostgreSQL, telemetry uses InfluxDB |
| Event Sourcing | Store events, not state | Complete audit trail of all sensor readings |
| CQRS | Separate read and write paths | High-throughput ingestion, separate query optimization |
| Saga Pattern | Distributed transactions via events | Device provisioning across multiple services |
144.6.2 Anti-Patterns to Avoid
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Shared Database | Services coupled through data; changes break multiple services | Each service owns its database |
| Synchronous Chains | Service A calls B calls C; if C fails, A fails | Use async messaging (Kafka, MQTT) |
| Distributed Monolith | Microservices that must deploy together | Ensure independent deployability |
| God Service | One service does too much | Decompose by business capability |
144.7 Practical Example: Smart Building IoT Platform
Let’s trace how a sensor reading flows through a microservices architecture.
Key Points in This Flow:
- API Gateway: Single entry point handles auth and rate limiting
- Async Processing: Telemetry Service doesn’t wait for analytics; publishes event and returns
- Event-Driven: Message broker decouples services; adding new consumers is easy
- Parallel Processing: Analytics and notifications happen simultaneously
- Database Ownership: Telemetry Service owns time-series DB; Analytics Service queries via API
Notice that the Telemetry Service doesn’t call the Analytics Service directly. It publishes an event to Kafka. This means:

- If Analytics Service is down, telemetry still gets stored
- New services can subscribe to events without changing Telemetry Service
- Services scale independently based on their own load
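This decoupling can be illustrated with a tiny in-process publish/subscribe sketch, a stand-in for Kafka (all names here are hypothetical):

```python
from collections import defaultdict

class Broker:
    """Toy message broker: consumers subscribe to topics; publishers never know who consumes."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # A real broker delivers asynchronously; this toy delivers in-line.
        # Either way, publishing succeeds even with zero consumers registered.
        for handler in self._subscribers[topic]:
            handler(message)

broker = Broker()
analytics_seen, alerts_seen = [], []
broker.subscribe("telemetry", analytics_seen.append)  # Analytics Service consumer
broker.subscribe("telemetry", alerts_seen.append)     # Alert Service added without touching the publisher

# Telemetry Service publishes one event and moves on
broker.publish("telemetry", {"device": "t-001", "temp_c": 22.0})
```

Note that adding the alert consumer required no change to the publishing code, which is exactly the property the text describes.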
Common Pitfalls
Teams split an IoT platform into services by technical layer (UI service, API service, DB service) rather than by business domain (telemetry service, device management service, alerting service). The result is services that must all deploy together because a change in the API layer requires coordinated changes in the UI and DB layers. Decompose by domain boundary, not technical layer.
Microservices multiply operational complexity: each service needs its own CI/CD pipeline, monitoring, logging, and on-call rotation. A 3-person team building an IoT MVP should start with a well-structured modular monolith. Extract the first service only when a specific component needs to scale independently or be owned by a separate team. Following the ‘monolith-first’ rule ships MVPs 3-6 months faster.
Telemetry processing chains (ingest → validate → enrich → store → alert) implemented as synchronous REST calls create brittle chains where one slow service blocks all downstream processing. Use asynchronous messaging (MQTT, Kafka, NATS) for data flow between processing stages, reserving synchronous REST for queries that require immediate responses.
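A queue-based version of such a chain might look like this single-process sketch (stage names follow the text; the valid temperature range is an assumption for illustration):

```python
import queue
import threading

ingest_q, store_q = queue.Queue(), queue.Queue()
stored = []

def validate_stage():
    """Reads raw readings from its queue, drops invalid ones, forwards the rest."""
    while True:
        reading = ingest_q.get()
        if reading is None:          # shutdown sentinel
            store_q.put(None)
            return
        if -40 <= reading["temp_c"] <= 85:   # plausible sensor range (assumed)
            store_q.put(reading)

def store_stage():
    while True:
        reading = store_q.get()
        if reading is None:
            return
        stored.append(reading)       # stand-in for a time-series DB write

threads = [threading.Thread(target=validate_stage), threading.Thread(target=store_stage)]
for t in threads:
    t.start()
ingest_q.put({"device": "t-001", "temp_c": 22.0})
ingest_q.put({"device": "t-002", "temp_c": 999.0})   # invalid, silently dropped
ingest_q.put(None)
for t in threads:
    t.join()
```

Because each stage reads from a buffer instead of calling the next stage, a slow store stage backs up its queue without blocking ingestion, which is the core advantage over a synchronous REST chain.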
144.8 Summary
This chapter introduced the fundamental concepts of service-oriented architectures for IoT:
144.8.1 Key Concepts Covered
| Concept | Key Insight |
|---|---|
| SOA vs Microservices | SOA for enterprise integration; microservices for cloud-native greenfield |
| Service Decomposition | Align with business capabilities; use two-pizza rule for team size |
| Monolith-First | Start simple; extract services when scaling or team coordination requires |
| Database-per-Service | Each service owns its data; communicate via APIs and events |
| Event-Driven Architecture | Async messaging decouples services and improves resilience |
144.8.2 Architecture Selection Summary
Scenario: 200K-line monolithic IoT platform supporting 100K devices needs decomposition due to 18-developer team coordination bottlenecks.
Monolith Structure (before):

- Device management: registration, authentication, firmware updates (35K lines)
- Telemetry ingestion: MQTT broker interface, validation, storage (45K lines)
- Rules engine: event processing, condition evaluation, action dispatch (50K lines)
- Analytics: aggregation, dashboards, ML inference (40K lines)
- Notifications: email, SMS, push, webhooks (30K lines)
Decomposition Analysis:
| Component | Deploy Frequency Need | Scaling Profile | Team Size | Extract? |
|---|---|---|---|---|
| Device Mgmt | 1x/month (stable APIs) | Steady | 3 devs | No (low churn) |
| Telemetry | 2x/week (high iteration) | Spiky (10x daytime) | 5 devs | YES (independent scaling) |
| Rules Engine | 1x/week (customer requests) | Steady | 4 devs | YES (feature velocity) |
| Analytics | 2x/month (research projects) | Batch jobs | 3 devs | No (batch, not latency-sensitive) |
| Notifications | 1x/week (new channels) | Bursty | 3 devs | YES (isolation from failures) |
Recommended Decomposition (3 services extracted):
Service 1: Telemetry Ingestion
- Reason: 10x scaling need (50 msg/sec nights → 500 msg/sec daytime peaks)
- Team: 5 developers (largest team, highest iteration)
- Technology: TimescaleDB (optimized for time-series), Kafka for buffering
- Benefit: Independent HPA scaling, deploys 2x/week without coordinating
Service 2: Rules Engine
- Reason: Customer-facing feature requests drive weekly releases
- Team: 4 developers
- Technology: Same PostgreSQL, event-driven via Kafka
- Benefit: Deploy rule changes without touching telemetry or notifications
Service 3: Notification Service
- Reason: Failures in email/SMS should NOT crash telemetry ingestion
- Team: 3 developers
- Technology: Redis for queuing, circuit breakers for external APIs
- Benefit: Bulkhead isolation – email provider outage doesn’t affect platform
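The circuit breaker mentioned for Service 3 can be sketched minimally as follows (a production service would use a resilience library; thresholds and names here are illustrative):

```python
import time

class CircuitBreaker:
    """Stops calling a failing dependency (e.g., an email API) after repeated errors."""
    def __init__(self, failure_threshold=3, reset_timeout_s=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed (calls allowed)

    def call(self, func, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout_s:
                raise RuntimeError("circuit open: skipping call to failing dependency")
            # Half-open: allow one trial call; one more failure re-opens the circuit
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()   # trip the breaker
            raise
        self.failures = 0   # success resets the failure count
        return result
```

Wrapping each Twilio or SendGrid call in a breaker like this is what turns a provider outage into fast, contained failures instead of threads piling up on timeouts.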
Retained in Monolith: Device Management + Analytics (6 developers, 75K lines)

- Lower change frequency (1-2x/month)
- Shared data model (devices table)
- No independent scaling need
Migration Timeline:
- Month 1: Extract Telemetry (highest priority, scaling pain)
- Month 3: Extract Rules Engine (deploy velocity pain)
- Month 5: Extract Notifications (failure isolation)
- Result: 3 services + 1 managed monolith (4 deployable units vs 1)
Key Insight: Don’t extract EVERYTHING – only extract components with clear independent needs (scaling, deploy velocity, fault isolation). Keep related functionality together.
Use this table to evaluate whether to extract a component as a separate service:
| Factor | Threshold for Extraction | Weight | Score Method |
|---|---|---|---|
| Deploy frequency mismatch | >2x difference from platform average | High (3x) | Component deploys / Platform avg deploys |
| Scaling profile difference | >5x peak-to-average ratio difference | High (3x) | Component peak/avg vs Platform peak/avg |
| Team ownership | ≥1 full team (5-8 devs) | Medium (2x) | Team dedicated to component? |
| Technology mismatch | Requires different datastore or runtime | Medium (2x) | PostgreSQL vs TimescaleDB, JVM vs Node.js |
| Failure isolation need | Component failures affect critical path | High (3x) | Email down → telemetry fails? |
| External dependencies | Depends on slow/unreliable 3rd party APIs | Medium (2x) | Calls external weather/payment/SMS APIs |
| Change frequency | Changes >1x/month | Low (1x) | Commits per month to component |
Scoring System:
- Multiply each factor score by weight
- Sum weighted scores
- Score >15: Strong candidate for extraction
- Score 8-15: Consider extraction if team coordination pain exists
- Score <8: Keep in monolith
Example Calculation (Notification Service):
Deploy frequency: 4x/month vs 2x/month avg = 2x difference × 3 weight = 6
Scaling: Same profile (1x) × 3 weight = 3
Team: 3 devs (not full team) × 2 weight = 0
Technology: Same stack × 2 weight = 0
Failure isolation: Email down blocks alerts × 3 weight = 9
External deps: Calls Twilio, SendGrid × 2 weight = 4
Change frequency: 15 commits/month × 1 weight = 1
TOTAL: 6 + 3 + 0 + 0 + 9 + 4 + 1 = 23 (Strong extraction candidate)
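The scoring walkthrough above can be automated. Weights follow the table; the raw per-factor values (before weighting) are assumed to be small integers like those in the example:

```python
WEIGHTS = {
    "deploy_frequency": 3, "scaling_profile": 3, "team_ownership": 2,
    "technology_mismatch": 2, "failure_isolation": 3,
    "external_dependencies": 2, "change_frequency": 1,
}

def extraction_score(raw_scores: dict) -> int:
    """Sum of raw factor scores multiplied by the table's weights."""
    return sum(WEIGHTS[factor] * raw for factor, raw in raw_scores.items())

def recommendation(score: int) -> str:
    if score > 15:
        return "Strong candidate for extraction"
    if score >= 8:
        return "Consider extraction if coordination pain exists"
    return "Keep in monolith"

# Notification Service example from the text
notification = {
    "deploy_frequency": 2,       # 4x/month vs 2x/month average
    "scaling_profile": 1,        # similar profile to the platform
    "team_ownership": 0,         # 3 devs, not a full team
    "technology_mismatch": 0,    # same stack
    "failure_isolation": 3,      # email outage must not block alerts
    "external_dependencies": 2,  # Twilio, SendGrid
    "change_frequency": 1,       # ~15 commits/month
}
```

Running `extraction_score(notification)` reproduces the total of 23 computed above, landing in the "strong candidate" band.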
Advice: Extract services with scores >15 and team >5 developers. Resist extracting low-score components just for “microservices purity.”
The Error: Splitting monolith horizontally (API layer, business logic layer, data layer) instead of vertically (by domain).
Real Example:
- Team decomposes IoT platform into 3 “microservices”:
- API Gateway Service: All REST endpoints (device, telemetry, rules, analytics)
- Business Logic Service: All processing logic
- Data Access Service: All database queries
Why This Fails (Distributed Monolith):
Adding new feature "Custom Alert Rules":
1. Update API Gateway (add POST /rules endpoint) → Deploy Service 1
2. Update Business Logic (rule validation logic) → Deploy Service 2
3. Update Data Access (new table queries) → Deploy Service 3
Result: Feature requires coordinated deployment of ALL 3 services (same as monolith)
The Problem: Technical layering creates coupling across service boundaries. Every feature change touches all layers.
Correct Approach (Vertical Slicing by Domain):
Decompose by business capability (device management, telemetry, rules, notifications):
Adding "Custom Alert Rules":
1. Update Rules Service only (API + logic + data all in one service)
2. Single deploy, no coordination
Result: True independent deployability
Comparison:
| Aspect | Layered (Anti-Pattern) | Domain-Based (Correct) |
|---|---|---|
| Change scope | Changes touch all layers | Changes isolated to one service |
| Deploy coordination | 3 services for 1 feature | 1 service for 1 feature |
| Team ownership | No clear owner (“frontend team”, “backend team”) | Clear owner (“rules team”) |
| Database | Shared across services (tight coupling) | Database per service (loose coupling) |
| Failure isolation | API layer down = everything down | Rules service down ≠ telemetry down |
How to Avoid:
- Decompose by business capability (device, telemetry, rules, notifications), NOT by technical layer
- Each service owns full stack: API + logic + data for its domain
- Team owns end-to-end: One team responsible for rules service from API to database
- Database-per-service: Rules Service has rules_db, Telemetry Service has telemetry_db
Lesson: “Microservices” split horizontally are just a distributed monolith with network latency. True microservices split vertically by business domain, enabling independent deployment.
In one sentence: Choose your service architecture based on team size, existing systems, and scale requirements - not based on what’s trendy.
Remember this rule: For small teams (<10 developers) and moderate scale (<50K devices), a well-structured monolith ships faster and is easier to maintain than premature microservices.
144.8.3 Checklist Before Choosing Microservices
Before adopting microservices, verify you can answer “yes” to these questions:

- Do you have 3+ teams that need to deploy independently?
- Can each candidate service own its own database, communicating only via APIs and events?
- Do you have per-service CI/CD, monitoring, and logging in place (or budget for the ~20% operational overhead)?
- Does at least one component have a clearly different scaling profile or failure-isolation need?

If you answered “no” to most questions, start with a modular monolith.
144.9 Knowledge Check
Challenge: Analyze an existing IoT codebase and identify service boundaries using the decision framework from this chapter.
Scenario: You have access to a GitHub repository for an open-source IoT platform (choose one):

- ThingsBoard - IoT platform
- Mainflux - Industrial IoT
- Home Assistant - Home automation
Tasks:
- Clone the repository and explore the code structure
- Identify modules/packages - what are the major functional areas?
- Apply scoring framework:
- Deploy frequency difference (check git commit history per module)
- Scaling profile (which modules are compute vs I/O heavy?)
- Technology mismatch (does one module need a different database?)
- Failure isolation needs (which failures should not cascade?)
- Calculate scores for top 3 extraction candidates
- Justify - would you extract these services? Why or why not?
What to observe:
- Do high scores correspond to clear architectural boundaries?
- Are there modules that score high but are tightly coupled (anti-pattern)?
- How does the existing module structure compare to your service boundaries?
Deliverable:
- Table of candidate services with scores
- Recommended extraction order with justification
- Architectural diagram showing proposed service boundaries
Extension: Fork the repo and create a branch that separates one service (even if you don’t implement it fully, show the boundary).
144.11 What’s Next
| If you want to… | Read this |
|---|---|
| Design RESTful APIs and service discovery for IoT microservices | SOA API Design |
| Build fault-tolerant IoT services with resilience patterns | SOA Resilience Patterns |
| Deploy containerized IoT services with Kubernetes | SOA Container Orchestration |
| Model complex IoT device behavior with state machines | State Machine Patterns |
| Understand IoT reference architecture layers and patterns | IoT Reference Models and Patterns |