Core Concepts
The technology-agnostic building blocks that show up in nearly every design problem — the vocabulary interviewers assume you have before you start scaling Instagram or designing Uber. This page is the quick what/why/when for each; the hard mechanics live in the dedicated pages linked throughout.
Networking
Section titled “Networking”Default to HTTP over TCP — it covers ~90% of cases and is the assumed choice unless you have a reason to deviate.
- SSE vs WebSockets — SSE is unidirectional (server pushes over a client-opened HTTP connection: live scores, notifications); WebSockets are bidirectional (chat, collaboration). SSE is simpler and rides standard HTTP infra; reach for WebSockets only when the client must push frequently. Both are stateful — they don’t sit behind a plain L7 balancer, and you must plan for connection persistence and server failure with thousands of live connections.
- gRPC — binary + HTTP/2, much faster than JSON/HTTP; use it for internal service-to-service calls. Not for public APIs (browsers need gRPC-Web + a proxy). Common pattern: REST externally, gRPC internally.
- L7 vs L4 load balancing — L7 routes on HTTP content (send API vs page requests to different services); L4 distributes TCP connections, faster but content-blind. WebSockets typically need L4 (persistent TCP).
- Geography & latency — NY↔London is ~80ms minimum just from light through fiber, before any processing. Global low latency forces regional deployments and geo-replication; this is why CDNs serve static content from the edge.
A common mistake: proposing WebSockets when long-polling or SSE would do. They’re “real-time” but add real stateful-connection complexity — only when you genuinely need bidirectional push.
API design
Section titled “API design”Sketch 4–5 endpoints and move on — most interviewers want reasonable APIs, not perfect ones. Spending 10 minutes here is going too deep.
- REST by default — resources to URLs, HTTP verbs to actions (
GET /users/{id},POST /events/{id}/bookings). - Pagination — cursor- vs offset-based: cursor for real-time data where items get inserted frequently, offset for most cases.
- Auth — JWT for user sessions, API keys for service-to-service.
- Rate limiting — mention it if abuse/bots are plausible; don’t go deep unless asked.
Data modeling
Section titled “Data modeling”The schema choices have massive downstream effects on performance and scalability.
- Relational vs NoSQL — relational (Postgres) for structured data with clear relationships, complex queries (SQL joins), transactions, and constraints. NoSQL (DynamoDB, MongoDB) for flexible schemas or horizontal scale without joins.
- Normalize vs denormalize — normalized splits data across tables (consistent, but reads need joins that get expensive at scale); denormalized duplicates data to kill joins (fast reads, but every copy must be updated). Default: start normalized, denormalize specific hot paths once you’ve identified a read problem — don’t denormalize upfront.
- NoSQL is access-pattern-first — you design the partition/sort key around your queries.
user_idas partition key makes “posts for user X” a fast single-partition lookup but makes “posts with hashtag Y” a full scan. You must know the queries up front.
Indexing
Section titled “Indexing”Without an index, finding a user by email scans every row; with one, it’s a millisecond jump. See Postgres Internals for the mechanics.
- B-tree — the default; supports exact lookups and range queries. Hash is faster for exact matches only (no ranges). Specialized: full-text for search, geospatial for location.
- Index what you query frequently — email for auth lookups,
user_idon the orders table. Compound index for multi-column filters (“events in SF on Dec 25” → index on(city, date)). - External indexes — Elasticsearch for full-text, PostGIS for geo in Postgres. They sync from the primary via CDC, so reads lag slightly — stale by a small amount, almost always fine for search.
Caching
Section titled “Caching”Comes up the moment your database is read-bound. Store hot data in fast memory (Redis) and skip the DB for most reads — also offloading the DB so it can take more writes. (Caching vocabulary.)
- Cache-aside + Redis is the 90% pattern: check cache → on miss, query DB, populate with a TTL, return.
- Invalidation is the hard part — on write, delete/update the cached copy or you serve stale data. Tools: invalidate-on-write, short TTLs, or both.
- Stampede — when a hot key expires, concurrent misses pile onto the DB. Defend with request coalescing/locks, early recomputation, or jittered TTLs. A full Redis outage is distinct: defend with a small in-process fallback, circuit breakers, or graceful degradation.
- Don’t cache everything — only frequently-read, rarely-changed data; caching per-request-changing data just adds latency. Profile first.
CDN caching (static assets at the edge) and in-process caching (small rarely-changed values like config/flags) are different tools; external Redis is the default for core app data.
Replication
Section titled “Replication”Keep copies of data on multiple nodes — for read scaling (fan reads across replicas) and durability/HA (survive a node loss). The first lever to pull before sharding. (Replication entry.)
- Leader-follower is the common shape: the leader takes writes, followers serve reads and stand by to take over. Scales reads, not writes (every write still hits the leader).
- Sync vs async — sync replication waits for follower acks (no data loss on failover, higher write latency); async acks immediately (fast, but a leader crash can lose the last writes). Async is the default.
- Replication lag — followers trail the leader, so reads can be stale. This is the source of the read-your-writes problem: a user updates, then reads a lagging replica and sees the old value. Fix by reading your own writes from the leader, or pinning the session briefly.
- Failover — on leader loss, a follower is promoted (manual or via leader election). Watch for split-brain (two leaders) and data loss of unreplicated writes.
Replication scales reads and gives HA; sharding scales writes and storage. They compose — production systems usually shard and replicate each shard.
Sharding
Section titled “Sharding”Split data across independent servers once a single DB is outgrown — storage (TB+), write throughput (tens of thousands of writes/s), or read load replicas can’t absorb. (Sharding/partitioning.)
- Shard key decides everything —
user_idkeeps a user’s data on one shard (fast user-scoped queries) but makes global queries (“trending across all users”) hit every shard and aggregate. State the key and the tradeoff. - Strategies — hash-based (hash key, modulo to a shard) distributes evenly, avoids hot spots, most common; range-based works when access naturally partitions (multi-tenant) but risks hot ranges; directory-based (lookup table) is flexible but adds a dependency and latency — rarely worth it.
- Don’t shard too early — a tuned single DB with read replicas goes far. Do the capacity math first; 10K writes/s and 100GB doesn’t need it.
- New problems — cross-shard transactions are near-impossible (design boundaries to avoid them, else sagas/distributed transactions); hot spots (one shard hammered); resharding is painful data movement.
Consistent hashing
Section titled “Consistent hashing”Fixes the resize problem in distributed caches and sharded stores. Arrange servers and keys on a virtual ring: a key belongs to the next server clockwise, so adding/removing a server only moves the keys in that one arc. Used by Redis Cluster, Memcached, Cassandra, DynamoDB, some CDNs and load balancers. In interviews it’s usually enough to name it — bring it up when discussing elastic scaling. See Distributed Systems.
Internally: wrap the hash space (0 … 2^m−1) into a circle and place both nodes and keys with the same hash — pos = hash(x) mod 2^m. A key is owned by the first node clockwise (smallest pos(node) ≥ pos(key), wrapping), found by binary search in O(log N). Add/remove a node and only its one arc moves. Virtual nodes — each node hashed to ~100–200 ring points — even out lopsided arcs and spread a departing node’s load across many neighbors.
CAP & PACELC
Section titled “CAP & PACELC”Under a network partition you get two of three: Consistency, Availability, Partition tolerance. Partitions are unavoidable, so the real choice is C or A. (CAP entry.)
- Choose consistency → some nodes refuse requests during a partition rather than serve stale data (correct but may be down).
- Choose availability → every node keeps serving, nodes may diverge until the partition heals.
- Default: availability + eventual consistency — feeds, recommendations, analytics tolerate slightly stale data but not downtime.
- Strong consistency when stale data costs money: inventory (overselling), banking (balances), booking (double-booked seats). You can mix per subsystem — eventually-consistent reviews, strongly-consistent inventory.
- PACELC extends it: during a Partition choose A/C; Else (healthy network) choose Latency/Consistency. Even healthy, strong consistency costs latency because nodes coordinate before responding.
Numbers to know
Section titled “Numbers to know”You don’t do back-of-envelope math up front — you do it when a decision needs it (“shard or not?”, “one Redis enough?”). Modern hardware is bigger than most assume; 2010-era numbers make you shard and cache far too early.
Latency ladder (the gaps that drive decisions): memory access ≈ nanoseconds · SSD reads ≈ microseconds · intra-datacenter network ≈ 1–10 ms · cross-continent ≈ tens–hundreds of ms.
| Component | Rough capacity | Scale triggers |
|---|---|---|
| Cache (Redis) | ~1 ms latency · 100K+ ops/s · memory-bound (up to ~1TB) | hit rate < 80% · latency > 1ms · memory > 80% · thrashing |
| Database | up to ~50K txns/s · sub-5ms cached reads · 64 TiB+ storage | writes > 10K TPS · uncached read latency > 5ms · geo-distribution needs |
| App server | 100K+ concurrent connections · 8–64 cores · 64–512GB RAM (up to 2TB) | CPU > 70% · latency > SLA · connections near 100K/instance · memory > 80% |
| Message queue | up to ~1M msgs/s per broker · sub-5ms end-to-end · up to ~50TB | throughput near 800K msgs/s · ~200K partitions/cluster · growing consumer lag |
A single Postgres instance handles a few TB comfortably — don’t shard until tens of TB. Proposing sharding at 500GB adds complexity for nothing.