Building Services That Scale
When we hear the word service, it sounds simple, almost obvious. In practice, defining what a service is can be surprisingly nuanced. A service isn’t just code running on a server, and it isn’t interchangeable with a web or mobile application. A service is a self-contained piece of functionality with clear boundaries, responsibilities, and relationships to the rest of the system.
The term service is overloaded in the world of software engineering. For example, we’re not talking about the object-oriented programming convention where a class is named Service to encapsulate business logic.
How we define a service shapes everything that follows: architecture choices, scalability, resilience, and even team structure. Here we’ll explore what makes a service distinct, why it usually works behind the scenes without a user interface, and how it interacts with other services and with web and mobile applications.
What is a Service?
Understanding what a service is, as well as what it is not, is critical before we dive into design decisions. I learnt a lot about services from the book Microservices Patterns by Chris Richardson. Before going into microservices in depth, Richardson first explains what a plain old service is:
A service is a standalone, independently deployable software component that implements some useful functionality.
It exposes an API that provides clients access to its functionality, typically through:
- Commands - operations that perform actions and update data, e.g. createOrder()
- Queries - operations that retrieve data, e.g. findOrderById()
- Events - notifications about changes, e.g. OrderCreated
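As a rough sketch, here’s how that API surface might look in TypeScript. The interface and type names are illustrative, not taken from the book:

```typescript
// A service's public surface: commands, queries, and events.
// Everything behind this interface stays private to the service.
interface OrderService {
  // Command: performs an action and updates data.
  createOrder(customerId: string, items: OrderItem[]): Promise<string>;

  // Query: retrieves data without side effects.
  findOrderById(orderId: string): Promise<Order | undefined>;

  // Event: lets clients react to changes without polling.
  onOrderCreated(handler: (event: OrderCreated) => void): void;
}

interface OrderItem { productId: string; quantity: number }
interface Order { id: string; customerId: string; items: OrderItem[] }
interface OrderCreated { orderId: string; occurredAt: Date }
```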
Richardson goes on to explain that the API encapsulates the internal implementation, meaning clients cannot bypass it to access internal classes. This encourages modularity and loose coupling. Each service can have its own architecture and technology stack, but the essential requirement is that it:
- Has a well-defined API,
- Is independently deployable,
- Implements a cohesive set of responsibilities.
Core Characteristics of a Service
Services differ from web and mobile applications in that they do not include a Graphical User Interface (GUI). They operate behind the scenes, performing work such as data processing, scheduled jobs (including cron jobs), and integrations with other systems.
Services rarely operate in isolation. They usually interact with other parts of a system. This can include talking to other services, exchanging data with web or mobile applications, or using message queues to handle tasks in the background or fire events. Services often call APIs to share information or trigger actions.
Data Ownership and Persistence
Services often have their own persistence layer because they need autonomy and control over their data. A dedicated database allows a service to manage its schema, optimize queries for its specific workload, and evolve independently without being constrained by other applications. This isolation also improves fault tolerance: issues in one service’s data store are less likely to impact others.
You may think it’s a good idea for services to share a database, and that this will simplify data consistency and reduce duplication, but it introduces tight coupling between services. Changes to the schema or performance tuning may require coordination across teams, slowing down development. A shared database also increases the risk of cascading failures if it becomes a bottleneck or is unavailable.
As I explain in Duplicate Data in Microservices, my preference is for each service to have its own database. Where necessary, common data can be shared between services and their respective databases to maintain consistency without creating tight coupling.
Before we dive into architectural styles like monoliths and microservices, it’s important to understand why the definition of a service matters. The way we structure services influences how we scale, maintain, and evolve systems over time. With that foundation in place, let’s explore how different architectures approach these challenges.
Architectural Choices: Monolith vs. Microservices
The purpose of any architecture is to support the business. If we don’t understand what the business actually does (its capabilities), we risk designing an architecture that looks good technically but doesn’t map well to real-world needs. That leads to brittle systems and unnecessary complexity.
A Business Capability is a distinct, cohesive function within an organisation that delivers specific value to customers or the business. Examples include Order Management, Payment Processing, or Customer Profile Management. These capabilities represent stable, business oriented responsibilities rather than technical concerns. They form the foundation for defining service boundaries because they align software components with real world business operations.
I’ve borrowed the term Business Capability from Microservices Patterns. Equally you could use business needs or requirements. I prefer business capabilities because it helps me think about what the service should do in the context of what the business does to provide value to its customers.
Monoliths
In a monolith, all business capabilities are packaged together in a single, unified service. The entire system runs as one process, sharing the same codebase and database. This approach simplifies development and deployment early on because there is only one service to manage. It can also make certain changes easier. For example, if you need to add a new property to a data model, such as a middleName field on a User object, it’s usually straightforward in a monolith. All components share the same codebase and database, so the change can propagate easily without coordinating across multiple services.
However, as the system grows, a monolith becomes harder to maintain and scale. Changes in one capability can ripple through the entire application, making updates risky and time-consuming. Breaking a monolith into smaller services typically requires changes across multiple layers, releasing new libraries, and deploying several components, which adds complexity.
Distributed Monoliths (What Not to Do)
A distributed monolith is an architectural anti-pattern that combines the worst aspects of monoliths and microservices. On the surface, it looks like a microservices architecture with multiple deployable components, separate repositories, and network calls, but in practice, it behaves like a monolith. Services are tightly coupled, changes ripple across boundaries, and deployments require coordination across teams.
This situation often arises when teams split a monolith without redefining boundaries, or when they share state and logic between services. Instead of gaining autonomy and scalability, you inherit the complexity of distributed systems without the benefits. If you find yourself deploying multiple services but needing to coordinate every release, you’re probably in distributed monolith territory.
For example, imagine you have three services in an e-commerce system:
- Order Service
- Inventory Service
- Payment Service
They are deployed separately and communicate over HTTP, so it looks like microservices.
However:
- All three services share the same database schema.
- A change in the Order table requires updates in Inventory and Payment code.
- Deploying a new feature means all three services must be released together because they depend on each other’s internal logic.
So, even though they are distributed, they behave like a single monolith with tight coupling, shared state, and coordinated deployments.
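As a hedged sketch of what that coupling looks like in code, using the Node pg client and hypothetical table names: the Inventory service reads the Order service’s tables directly instead of consuming its API or events.

```typescript
// Anti-pattern: two "services" sharing one physical database.
import { Pool } from "pg";

// Both Order and Inventory point at the same connection string.
const sharedDb = new Pool({ connectionString: process.env.SHARED_DB_URL });

export async function reserveStock(orderId: string): Promise<void> {
  // Inventory queries the Order service's table directly, so any
  // change to the orders schema breaks this service too, forcing
  // coordinated releases: a distributed monolith.
  const { rows } = await sharedDb.query(
    "SELECT product_id, quantity FROM orders WHERE id = $1",
    [orderId]
  );
  for (const row of rows) {
    await sharedDb.query(
      "UPDATE inventory SET reserved = reserved + $1 WHERE product_id = $2",
      [row.quantity, row.product_id]
    );
  }
}
```

The fix is to put the orders table behind the Order service’s API and have Inventory call that API or consume an OrderCreated event instead.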
Microservices
A microservice architecture divides a system into smaller, independent services, each responsible for a single business capability. These services communicate through APIs or messaging queues and are deployed separately. Each microservice can have its own technology stack and database, which enables teams to develop, scale, and release features independently.
This approach offers significant benefits in terms of flexibility and resilience. Teams can work autonomously, choose the best tools for their service, and deploy changes without impacting the entire system. However, it introduces a different type of complexity compared to monolithic architectures. While monoliths concentrate complexity in a single codebase, microservices distribute it across multiple services. This means challenges shift toward interservice communication, data consistency, and deployment orchestration.
To manage these complexities, teams often adopt platform engineering practices or rely on a dedicated platform engineering team. Cloud computing helps by offloading part of this complexity to providers, offering managed services for networking, storage, and scaling. Kubernetes is a prime example: it orchestrates containerised workloads, handles service discovery, and simplifies deployment at scale. Still, even with these tools, designing and operating a microservices ecosystem requires careful planning and robust automation.
Sizing a Microservice
One of the most common questions when adopting microservices is: How big should a microservice be? Chris Richardson provides a useful definition in Microservices Patterns:
A microservice is an independently deployable service that is small enough to be developed and managed by a small team, and that focuses on a single business capability.
-- Microservices Patterns, Chris Richardson
This definition emphasizes two key principles: independence and focus. A microservice should be deployable on its own and should encapsulate a single business capability. But what does that mean in practice?
In The Goldilocks Zone: Finding the Best Size for Microservices, Jaime Nagase explains:
While there’s no perfect size for every microservice, aiming for 100 to 1,000 lines of code is a good start. By focusing on this and always improving, businesses can make the most out of microservices, creating applications that are easy to scale, change, and keep up to date.
-- The Goldilocks Zone, Jaime Nagase
While this heuristic can be helpful, it’s not a hard rule. The right size depends on the product you are building and how it aligns with the wider business. For example, in the same organisation, you might have a service that handles creating, modifying, and fulfilling orders, and another that simply sends emails. Both are valid microservices, even though their complexity and size differ significantly.
Ultimately, there is no universal answer to the question of size. It’s a judgment call for engineers, guided by principles rather than strict metrics. A good starting point is to ensure that each microservice represents a single business capability, is small enough for one team to manage end to end, and can be deployed independently without creating tight coupling with other services.
Shared Code and Cross-Cutting Concerns
Shared features and code play an important role in maintaining consistency across services, but they must be handled carefully to avoid tight coupling. Cross-cutting concerns such as authentication, logging, observability, and resilience patterns should live in shared, versioned libraries, rather than being reimplemented in every service. This approach ensures uniform behaviour without forcing teams into lockstep deployments.
Domain-specific functionality, on the other hand, should never be shared as internal code. Instead, it should be exposed through well-defined APIs or events. When other services need access to a capability like order management or payment processing, they should consume it via the owning service’s contract, not by importing its implementation or database schema. This preserves autonomy, scalability, and clear ownership boundaries while reducing hidden dependencies.
There are a couple of exceptions. Where data structures such as User, Address, or Order are the same across services, these can also be shared via a versioned library. And services, such as those which send emails, may publish a versioned client library to make integrating them into client services simpler.
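As an illustration, a shared contracts package might look like this; the package name and version are hypothetical:

```typescript
// @acme/contracts (hypothetical), published as an independently
// versioned library that services pin and upgrade at their own pace.
export interface Address {
  line1: string;
  line2?: string;
  city: string;
  postcode: string;
}

export interface User {
  id: string;
  firstName: string;
  middleName?: string;
  lastName: string;
  email: string;
  address?: Address;
}
```

Because consumers pin a version (for example "@acme/contracts": "^1.2.0"), a breaking change ships as a new major version rather than forcing every service to deploy in lockstep.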
Monolith vs. Microservices
Choosing the right architecture is a critical decision that impacts scalability, maintainability, and development speed. The summary below compares monolithic and microservices approaches to help guide that choice.
Monolith
- Coupling: Tightly coupled
- Structure: All functionality bundled together
- Pros: Easy to start, simpler deployment
- Cons: Harder to scale and maintain over time
- When to use: Small applications, early-stage projects, or when team size and complexity are low
Microservices
- Coupling: Loosely coupled
- Structure: Distributed across multiple services
- Pros: Better scalability and agility
- Cons: Requires sophisticated infrastructure, including networking, monitoring, and fault tolerance
- When to use: Large, complex systems needing independent scaling and frequent deployments
In summary: monoliths favour simplicity at the beginning, while microservices prioritise flexibility and scalability for complex, evolving systems.
I prefer to start with microservices, unless there’s a clear reason not to, such as hardware limitations or cost constraints. However, in the early stages, boundaries can be hard to define, especially in a new domain where the business is still evolving. Services should also be under constant review to ensure they remain fit for purpose, and restructured or broken down into smaller, more manageable components when necessary to improve maintainability, scalability, and performance. Over time, the goal is a balanced approach: internal communication stays internal, while maintaining enough modularity to adapt as things change.
That said, while monoliths are often seen as the opposite of microservices, there’s a middle ground: the modular monolith. This approach keeps the system as a single deployable unit but enforces strong internal boundaries between modules. Each module represents a distinct business capability, similar to how microservices would, but without the overhead of distributed systems.
The key idea is designing for escape: if you start with a monolith, make it easy to break apart later. By organising code into vertical slices, where each slice owns its domain logic and data access, you reduce coupling and make future decomposition straightforward.
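A sketch of what such a vertical slice might look like, with hypothetical module names. Other modules import only the slice’s public interface, never its internals:

```typescript
// modules/orders/index.ts: the only file other modules may import.
// The internals (repositories, domain logic, tables) stay private.
export interface OrdersApi {
  createOrder(customerId: string, items: string[]): Promise<string>;
  findOrderById(orderId: string): Promise<Order | undefined>;
}

export interface Order {
  id: string;
  customerId: string;
  items: string[];
}

// modules/orders/internal/* holds the implementation. If the slice is
// later extracted into a microservice, OrdersApi becomes its REST or
// gRPC contract and callers swap an import for an HTTP client.
```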
The architecture we choose doesn’t just shape how we build services, it also determines how they scale.
Scaling
Expected and unexpected traffic surges, such as those caused by major events, can push services to their CPU and memory limits, resulting in degraded performance and poor scalability. Two primary strategies exist to address this: vertical scaling and horizontal scaling.

Vertical scaling involves increasing the capacity of a single machine to handle more load. This can include adding CPU cores, RAM, faster storage, improving network bandwidth, or other optimisations that reduce resource constraints. This approach is simple and transparent to the software, often improving performance metrics such as latency, throughput, or consistency because requests are processed within one system without requiring architectural changes. However, it has hard limits. Performance gains diminish as hardware costs rise, and eventually you hit a ceiling where adding more resources becomes impractical. Vertical scaling also creates a single point of failure: if that machine goes down, the entire service is impacted.
Horizontal scaling distributes workload across multiple instances of a service, increasing capacity by adding additional instances that share the load. When designed well, this approach can deliver near-linear improvements in throughput and, in some architectures, improve resilience by isolating failures to individual instances. However, these benefits are not guaranteed, as they depend on deliberate architectural choices. Services must be stateless or handle distributed state gracefully, incorporate load balancing, and manage coordination, partitioning, and replication. Some systems prioritise consistency, latency, or throughput over fault tolerance, which means scaling out may introduce complexity without necessarily improving reliability. Unlike vertical scaling, horizontal scaling is not something you can bolt on later; it needs to be considered from the start if you expect unpredictable or massive growth.
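One concrete prerequisite from the paragraph above is statelessness. A minimal sketch, assuming Express and a Redis instance at REDIS_URL, of keeping per-user state out of process memory so any instance can serve any request:

```typescript
import express from "express";
import { createClient } from "redis";

const app = express();
const redis = createClient({ url: process.env.REDIS_URL });

app.get("/cart/:userId", async (req, res) => {
  // State lives in a shared store, not in this instance's memory,
  // so the load balancer is free to route the next request anywhere.
  const cart = await redis.get(`cart:${req.params.userId}`);
  res.json(cart ? JSON.parse(cart) : { items: [] });
});

async function main() {
  await redis.connect();
  app.listen(3000);
}

main().catch(console.error);
```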
In practice, vertical scaling is often the first step because it’s quick and easy. But for long term resilience and scalability, horizontal scaling is essential. Engineers should plan for it early, addressing challenges like state management and distributed consistency upfront to avoid costly redesigns later.
If you’re not as familiar with the difference between horizontal and vertical scaling, I’ve written a rudimentary guide called Scream If You Want to Go Faster.
As we design for scalability, especially with horizontal scaling and distributed systems, the need for deep visibility into system behavior becomes critical. Observability provides that visibility, enabling teams to understand why a system behaves the way it does, not just whether it is running. It ensures predictable operations even as infrastructure grows and evolves.
Observability: Seeing Inside Your Services
Modern distributed systems demand more than basic monitoring. Monitoring simply tells you if the system is running, whereas observability explains why it behaves the way it does. It provides engineers with the ability to understand internal states from external outputs without deploying new code. In microservices, where complexity is distributed, observability becomes a critical differentiator. Without it, diagnosing issues across multiple services and dynamic infrastructure can feel like searching for a needle in a haystack. Observability enables faster root cause analysis, proactive anomaly detection, and better capacity planning.
Observability rests on three pillars:
- Metrics - quantitative measurements such as latency, throughput, error rates, and resource utilization, helping teams monitor performance trends and detect anomalies.
- Logs - detailed, structured events that explain what happened and why; they should include correlation IDs and context for easier debugging.
- Traces - end-to-end visibility of requests across services, allowing engineers to follow a request through multiple components and identify bottlenecks.

Platforms like Datadog and Sumo Logic are popular choices for building observability into modern systems. Datadog offers comprehensive monitoring for metrics, traces, and application performance, while Sumo Logic focuses on log aggregation, analytics, and security insights.
Implementing observability requires careful instrumentation and integration. Standardising telemetry with frameworks like OpenTelemetry ensures consistency, while context propagation allows trace IDs to flow across services. Use structured logging instead of free text, and apply sampling to capture enough traces without overwhelming storage. Automate dashboards and alerts for real-time visibility. Observability is foundational in microservices; designing for it from day one avoids costly retrofits and enables resilient, scalable systems.
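A minimal sketch of manual instrumentation with the OpenTelemetry API for Node.js; it assumes an SDK and exporter are configured elsewhere, and the span and attribute names are illustrative:

```typescript
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("order-service");

export async function createOrder(customerId: string): Promise<string> {
  // startActiveSpan makes this span current, so the trace ID
  // propagates to downstream calls made inside the callback.
  return tracer.startActiveSpan("createOrder", async (span) => {
    try {
      span.setAttribute("customer.id", customerId);
      const orderId = await persistOrder(customerId); // hypothetical helper
      return orderId;
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}

// Hypothetical persistence call, stubbed for the sketch.
declare function persistOrder(customerId: string): Promise<string>;
```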
With observability in place, the next step is to ensure that external consumers interact with your system through well defined, consistent contracts. This is where public APIs come in. They provide clarity, stability, and predictability as your services scale and evolve.
Public API
The public API defines the external contract for your service, forming the single, reliable entry point for consumers. One of the most common ways to implement a public API is through REST (Representational State Transfer), which provides a simple, standardised approach for defining and consuming service contracts.
Alternatives such as gRPC can also be used, offering benefits like efficient binary serialisation and strong typing. To make these contracts explicit and machine readable, specifications such as OpenAPI for REST or Protocol Buffers (mandatory for gRPC) are employed, ensuring clarity and consistency for both providers and consumers.
RESTful API Principles
A REST API provides a clear contract between a service and its consumers, defining how clients interact in a consistent, predictable, and secure manner. A well-designed API ensures interoperability, scalability, and ease of integration for third-party applications. It should expose only what is necessary for external consumers, avoiding internal implementation details while maintaining a stable boundary:
Consistency in resource naming, error handling, and response formats reduces integration complexity, and changes must be managed through explicit versioning to maintain backward compatibility.
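A brief Express sketch of those principles: the version is explicit in the path, errors share one format, and only the external contract is returned. Route and field names are assumptions, not a prescribed standard:

```typescript
import express from "express";

const app = express();

// /v1 is the stable contract; breaking changes go to /v2.
app.get("/v1/orders/:id", async (req, res) => {
  const order = await findOrderById(req.params.id); // hypothetical lookup
  if (!order) {
    // The same error shape is used by every endpoint.
    res.status(404).json({
      error: { code: "ORDER_NOT_FOUND", message: "Order not found" },
    });
    return;
  }
  // Expose only the contract, never internal rows or classes.
  res.json({ id: order.id, status: order.status, total: order.total });
});

interface Order { id: string; status: string; total: number }
declare function findOrderById(id: string): Promise<Order | undefined>;

app.listen(3000);
```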
I have written about RESTful interface behaviour, endpoint structure, HTTP methods, and payload formats in the RESTful Behavior Guide.
Reducing Coupling Through Orchestration
Orchestration is the process of coordinating and managing interactions between multiple services to execute a business workflow or achieve a specific outcome. It ensures that these services work together in the correct sequence and handle dependencies consistently.
To minimize coupling and keep orchestration internal, a web or mobile application should ideally interact with a single backend service, which then orchestrates other services within the system boundary. This prevents exposing internal service topology and reduces complexity for clients. It makes the system more robust as the client does not need to handle distributed transactions, rollback logic, or partial failures. Those concerns stay within the system boundary where they can be managed consistently.
This approach improves security as it centralises orchestration, transactions are harder to intercept or manipulate, and sensitive credentials or tokens are never exposed to multiple endpoints. It enhances performance by reducing network chatter and eliminating multiple round trips between the client and various services, resulting in faster response times. Latency within the system boundary is typically far lower than over the public internet, so consolidating calls through a single backend service delivers a smoother, more efficient user experience.
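A sketch of that single entry point, with hypothetical internal hostnames; in practice these would come from service discovery or configuration:

```typescript
import express from "express";

const app = express();
app.use(express.json());

// The client makes one call; sequencing, retries, and failure
// handling all stay inside the system boundary.
app.post("/v1/checkout", async (req, res) => {
  try {
    const order = await post("http://orders.internal/v1/orders", req.body);
    await post("http://inventory.internal/v1/reservations", { orderId: order.id });
    const payment = await post("http://payments.internal/v1/payments", {
      orderId: order.id,
    });
    res.status(201).json({ orderId: order.id, paymentId: payment.id });
  } catch {
    // The client never sees partial internal state, just one outcome.
    res.status(502).json({ error: { code: "CHECKOUT_FAILED" } });
  }
});

async function post(url: string, body: unknown): Promise<any> {
  const resp = await fetch(url, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!resp.ok) throw new Error(`${url} returned ${resp.status}`);
  return resp.json();
}

app.listen(3000);
```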
Security & Authentication
Security is a fundamental requirement. As a rule, all endpoints enforce authentication and authorisation so that sensitive data is protected. The API must also be resilient, degrading gracefully under load and providing meaningful error messages when limits are reached.
At the system boundary, trust in the client is established through authentication and authorisation. For app-to-service communication, use short-lived tokens (such as OAuth2 access tokens) rather than static API keys to avoid leakage. Passing these tokens between internal services is problematic, especially for background tasks without a user context, so internal service-to-service calls should rely on internal system authentication, such as API keys, rather than propagating user credentials.
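A sketch of that boundary, assuming Express, a hypothetical verifyAccessToken helper at the user-facing edge, and an internal API key for the service-to-service hop:

```typescript
import express from "express";

const app = express();

// Edge: every request must carry a valid, short-lived access token.
app.use(async (req, res, next) => {
  const token = req.headers.authorization?.replace(/^Bearer /, "");
  if (!token || !(await verifyAccessToken(token))) {
    res.status(401).json({ error: { code: "UNAUTHENTICATED" } });
    return;
  }
  next();
});

app.get("/v1/profile", async (_req, res) => {
  // Internal hop: authenticate as the service itself rather than
  // forwarding the user's token, so background jobs without a user
  // context can use exactly the same path.
  const resp = await fetch("http://profiles.internal/v1/profiles/me", {
    headers: { "x-api-key": process.env.INTERNAL_API_KEY ?? "" },
  });
  res.status(resp.status).json(await resp.json());
});

// Hypothetical: validate signature and expiry with the identity provider.
declare function verifyAccessToken(token: string): Promise<boolean>;

app.listen(3000);
```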
While RESTful APIs handle synchronous interactions, many workflows benefit from asynchronous communication for resilience and scalability. This is where messaging queues come in, enabling services to exchange events without tight coupling or blocking dependencies.
Asynchronous Communication
Modern distributed systems increasingly rely on asynchronous communication to achieve resilience, scalability, and loose coupling between services. Unlike synchronous interactions, where a service waits for a response before proceeding, asynchronous patterns allow services to continue processing without being blocked. This approach is particularly valuable in high-throughput environments or workflows that involve long-running tasks, as it prevents bottlenecks and improves overall system responsiveness.
The most common mechanism for asynchronous communication is message queues. These act as intermediaries, decoupling producers from consumers and enabling reliable delivery even when one component is temporarily unavailable. For example, when an order service publishes an OrderCreated event, downstream services such as inventory or billing can consume that event at their own pace without impacting the order service’s performance. This design not only improves fault tolerance but also supports horizontal scaling, as multiple consumers can process messages in parallel.
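A minimal sketch of that flow with RabbitMQ via the amqplib package; the queue name and payload shape are assumptions:

```typescript
import amqp from "amqplib";

interface OrderCreated {
  orderId: string;
  items: { productId: string; quantity: number }[];
}

// Producer: the order service publishes and moves on immediately.
export async function publishOrderCreated(event: OrderCreated): Promise<void> {
  const conn = await amqp.connect(process.env.AMQP_URL ?? "amqp://localhost");
  const ch = await conn.createChannel();
  await ch.assertQueue("order-created", { durable: true });
  ch.sendToQueue("order-created", Buffer.from(JSON.stringify(event)), {
    persistent: true,
  });
  await ch.close();
  await conn.close();
}

// Consumer: inventory or billing processes events at its own pace.
export async function consumeOrderCreated(): Promise<void> {
  const conn = await amqp.connect(process.env.AMQP_URL ?? "amqp://localhost");
  const ch = await conn.createChannel();
  await ch.assertQueue("order-created", { durable: true });
  await ch.consume("order-created", (msg) => {
    if (!msg) return;
    const event: OrderCreated = JSON.parse(msg.content.toString());
    // e.g. reserve stock for event.orderId, then acknowledge.
    ch.ack(msg);
  });
}
```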
Event-driven architecture is a natural extension of this pattern. By broadcasting domain events rather than invoking direct API calls, services reduce dependencies and avoid tight coupling. This makes the system more adaptable to change. New consumers can subscribe to events without requiring modifications to the producer. However, asynchronous communication introduces its own challenges, such as ensuring message ordering, handling duplicates, and maintaining eventual consistency across distributed data stores. These concerns must be addressed through idempotent operations, retry policies, and careful schema evolution.
While asynchronous messaging removes temporal coupling, the structure of the message itself introduces semantic coupling between producer and consumer. Changes to the event schema can break consumers unless managed through versioning and backward compatible design. Keeping payloads minimal and using clear event semantics helps reduce this risk.
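A small sketch of backward-compatible evolution: new fields are optional, existing ones are never removed or repurposed, and a version field lets consumers branch only where behaviour differs. The field names are illustrative:

```typescript
// v1 consumers keep working: nothing they read has changed.
interface OrderCreatedV1 {
  version: 1;
  orderId: string;
}

// v2 only adds optional data, so it remains backward compatible.
interface OrderCreatedV2 {
  version: 2;
  orderId: string;
  currency?: string; // new and optional: old consumers ignore it
}

type OrderCreatedEvent = OrderCreatedV1 | OrderCreatedV2;

function handle(event: OrderCreatedEvent): void {
  // Branch on version only where behaviour genuinely differs.
  const currency = event.version === 2 ? event.currency ?? "GBP" : "GBP";
  console.log(event.orderId, currency);
}
```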
In practice, asynchronous communication complements synchronous APIs rather than replacing them entirely. While REST calls are ideal for immediate, transactional operations, messaging systems like RabbitMQ, Kafka, or AWS SQS excel at handling background tasks, integrations, and workflows that tolerate latency. A balanced approach, combining synchronous and asynchronous patterns, creates systems that are both responsive and resilient, capable of scaling gracefully under unpredictable load.
Finally
Designing services that scale isn’t just about picking the right architecture or technology. It's about making deliberate choices that align with business needs and anticipate future growth. Every decision, from defining service boundaries to planning for resilience, security and observability, shapes how systems evolve over time. The goal is not perfection on day one, but adaptability. Build services that can respond to change without collapsing under complexity. Keep reviewing, keep refining, and remember that scalability is as much about clarity and discipline as it is about infrastructure. The best systems are those that stand the test of time because they were designed with both purpose and pragmatism.
Build for today, design for tomorrow. Services that scale start with clarity and purpose.
Acknowledgements
Thank you to my reviewers:
Uzma Ali
Dom Davis
Laurent Douchy
Orsolya Kardos
Russel King
Alex Leslie
Jon Moore
Hoyt Reid
Kevin Richardson
Kristina Safonova
Nayana Solanki
References
- Microservices Patterns, Chris Richardson, ISBN-13: 978-1617294549
- The Goldilocks Zone: Finding the Best Size for Microservices, Jaime Nagase



