Scalability
A deep dive into horizontal and vertical scaling patterns and how to handle steeply growing resource demand.
What you'll learn
- Concrete Load Metrics
- Performance Under Load Curve
- Vertical Scaling (Scale Up)
- Horizontal Scaling (Scale Out)
- Stateless vs. Stateful Services
Visual System Topology
Scalability Evolution (0 to 10M+ Users)
Everything (application code and database files) resides on a single machine: low complexity, but high contention for resources.
A load balancer distributes incoming requests across independent, stateless server nodes; session state is kept in a shared store rather than on any one node.
Databases are sharded horizontally by key, so write throughput is divided across shards and grows as shards are added.
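Key-based sharding as described in the last stage can be sketched as follows. This is a minimal illustration, not a production router: the helper `shard_for`, the function `route_write`, and the shard names in `SHARDS` are hypothetical, and simple hash-modulo placement is assumed.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to database hosts.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    A cryptographic digest is used instead of Python's built-in hash(),
    which is randomized per process for strings.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route_write(user_id: str) -> str:
    """Pick the shard that owns all writes for this user."""
    return SHARDS[shard_for(user_id, len(SHARDS))]


if __name__ == "__main__":
    for uid in ("user-1", "user-2", "user-3"):
        print(uid, "->", route_write(uid))
```

One trade-off of hash-modulo placement is that changing the number of shards remaps most keys; consistent hashing is a common alternative when shards are added or removed often.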
Concept Overview
As an application grows, the load on it grows too: more users, more data, and more requests per second. A design that worked for a thousand users may not work for a million, and a database that served a hundred queries per second may fail at ten thousand.
This is where scalability becomes critical.
What is Scalability? Scalability is the ability of a system to handle increased load by adding resources. The key word here is ability: a scalable system can grow to meet demand without requiring a complete architectural overhaul. It is one of the foundational concerns of system design, because many other properties of a production system depend on the choices made at the scaling layer.
Key Architectural Pillars
Concrete Load Metrics
Before scaling, you must measure. Load is described along concrete dimensions:
- Requests per second (RPS)
- Concurrent users (users active at the same time)
- Data volume stored (e.g. TB or PB)
- Network throughput (GB/s)
- Database query rate (QPS)
- Queue message processing rate
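As a rough sketch of how some of these numbers might be derived from a request log, the example below computes RPS, distinct users in a window, and latency percentiles. The `Request` record and all function names are hypothetical. Distinct users in a window is used only as a simple approximation of concurrent users.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    timestamp: float   # seconds since the start of the measurement window
    user_id: str
    latency_ms: float


def requests_per_second(requests, window_seconds):
    """Average request rate over the window."""
    return len(requests) / window_seconds


def distinct_users(requests):
    """Number of different users seen in the window."""
    return len({r.user_id for r in requests})


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    log = [
        Request(0.1, "a", 20.0),
        Request(0.4, "b", 35.0),
        Request(1.2, "a", 25.0),
        Request(1.9, "c", 400.0),
    ]
    latencies = [r.latency_ms for r in log]
    print("RPS:", requests_per_second(log, window_seconds=2.0))
    print("users:", distinct_users(log))
    print("p50 ms:", percentile(latencies, 50))
    print("p95 ms:", percentile(latencies, 95))
```

Percentiles are reported here rather than the mean because a single slow request (400 ms in the sample) is visible at p95 but is blurred by an average.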
Performance Under Load Curve
Good scaling maintains predictable performance as load multiplies. Ideally, degradation is linear or sublinear: doubling the load at most doubles response times, and preferably increases them by much less. When response times grow faster than the load (superlinear degradation, often seen as a sudden spike) or requests start to time out, you have hit a scalability wall.
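One way to make this distinction concrete is to compare how fast latency grows with how fast load grows between consecutive load-test measurements. The sketch below does this; the function names, the 10% tolerance, and the sample numbers are illustrative assumptions, not measurements from a real system.

```python
def degradation_ratios(samples):
    """For consecutive (load, latency) pairs sorted by load, return
    latency growth factor divided by load growth factor.

    A ratio near 1 means latency grows in step with load (linear),
    below 1 means sublinear, above 1 means superlinear.
    """
    ratios = []
    for (load0, lat0), (load1, lat1) in zip(samples, samples[1:]):
        ratios.append((lat1 / lat0) / (load1 / load0))
    return ratios


def classify(ratio, tolerance=0.1):
    """Label a ratio; the tolerance band is an arbitrary choice."""
    if ratio > 1 + tolerance:
        return "superlinear"
    if ratio < 1 - tolerance:
        return "sublinear"
    return "linear"


if __name__ == "__main__":
    # (requests per second, average latency in ms): made-up load-test results
    samples = [(100, 20.0), (200, 24.0), (400, 30.0), (800, 240.0)]
    for (load, _), ratio in zip(samples[1:], degradation_ratios(samples)):
        print(f"up to {load} RPS: {classify(ratio)} (ratio {ratio:.2f})")
```

In the sample data the last step doubles the load but multiplies latency by eight, which is the kind of jump the text calls a scalability wall.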
Vertical Scaling (Scale Up)
Vertical scaling means upgrading the hardware of an existing machine: adding CPU cores, increasing RAM, adopting faster SSDs, or upgrading network cards. It requires no architectural changes, but it has a hard hardware ceiling, offers no redundancy (the machine remains a single point of failure), and becomes disproportionately expensive at the high end.
Horizontal Scaling (Scale Out)
Horizontal scaling means adding more commodity servers to the pool rather than upgrading existing ones. A load balancer distributes incoming traffic across the servers, which provides fault tolerance, elasticity, and capacity that keeps growing as nodes are added, as long as shared dependencies such as the database can scale too.
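The distribution step can be illustrated with a small round-robin balancer that skips servers marked as unhealthy. This is a simplified sketch: the class `RoundRobinBalancer` and the server names are hypothetical, and real load balancers add health probes, weights, and connection tracking.

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers in rotation, skipping any marked as down."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._cycle = itertools.cycle(self._servers)

    def mark_down(self, server):
        self._healthy.discard(server)

    def mark_up(self, server):
        if server in self._servers:
            self._healthy.add(server)

    def next_server(self):
        # Try each server at most once per call before giving up.
        for _ in range(len(self._servers)):
            candidate = next(self._cycle)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
    print([lb.next_server() for _ in range(4)])
    lb.mark_down("app-2")  # simulate a failed node
    print([lb.next_server() for _ in range(3)])
```

Because a failed node is simply skipped, the remaining servers keep serving traffic, which is the fault-tolerance property described above.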
Stateless vs. Stateful Services
To scale out horizontally, application services must be stateless: no session data is kept locally on a server. Instead, user sessions live in a shared store like Redis, or session data travels in tokens (such as JWTs) that the client holds and sends with each request, and files are stored in object storage (like S3). This leaves the load balancer free to route any request to any server.
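The following sketch shows why externalized session state lets any server handle any request. The `SessionStore` class is an in-memory stand-in for a shared store such as Redis, and `AppServer` and its methods are hypothetical names chosen for the illustration.

```python
import secrets


class SessionStore:
    """In-memory stand-in for a shared session store (assumption for this
    sketch; in practice this would be a network service such as Redis)."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id)

    def set(self, session_id, value):
        self._data[session_id] = value


class AppServer:
    """Stateless application server: it holds a reference to the shared
    store but keeps no session data on the instance itself."""

    def __init__(self, name, store):
        self.name = name
        self._store = store

    def login(self, username):
        session_id = secrets.token_hex(16)
        self._store.set(session_id, {"user": username, "cart": []})
        return session_id

    def add_to_cart(self, session_id, item):
        session = self._store.get(session_id)
        if session is None:
            raise KeyError("unknown session")
        session["cart"].append(item)
        self._store.set(session_id, session)
        return f"{self.name}: cart={session['cart']}"


if __name__ == "__main__":
    store = SessionStore()
    server_a = AppServer("app-1", store)
    server_b = AppServer("app-2", store)

    sid = server_a.login("ada")               # first request lands on app-1
    print(server_b.add_to_cart(sid, "book"))  # next request lands on app-2
    print(server_a.add_to_cart(sid, "pen"))   # and back on app-1
```

If the cart were stored on the `AppServer` instance instead, the second request would only work when routed back to the same server.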
