Skip to content

Non-Functional Requirements

NFRs are the qualities a system must have (“the system should be…”), as opposed to features (“users can…”). In an interview, pick 3–4 that actually drive the design and quantify them — “low latency” is a wish every system shares; “home timeline renders in < 200 ms” is a requirement that names a target.

Pull your 3–4 from these. Mnemonic: FCC + SLEDS.

NFRThe question to ask
Fault toleranceHow well must it keep working through failures?
CAPConsistency or availability under a network partition? (Partition tolerance is a given.)
ComplianceLegal / regulatory / industry standards?
ScalabilityDAU/MAU, QPS, bursty traffic, read:write ratio?
LatencyHow fast, on which operations specifically?
EnvironmentMobile battery, limited memory/bandwidth?
DurabilityHow bad is data loss? (Social feed vs. bank ledger.)
SecurityData protection, access control?

Availability, CAP & fault tolerance — untangled

Section titled “Availability, CAP & fault tolerance — untangled”

This cluster gets conflated constantly. Keep them separate:

  • Availability / uptime — a real, quantifiable NFR target: % of time the system is up, in nines (99.9% ≈ 8.7 h down/yr, 99.99% ≈ 52 min/yr). This is the goal.
  • CAP’s “A” is not this. CAP-availability means “every request still gets a response during a network partition” — a behavior-under-partition property, not a percentage of uptime. Same word, different concept; don’t state your uptime target as “the A in CAP.”
  • High availability and fault tolerance are the approach, not the target — how you hit an uptime number by surviving failures:
    • High availability (HA)minimizes downtime; on failure a standby is promoted, so there’s a brief interruption.
    • Fault tolerance (FT) — runs continuously through a failure with no noticeable gap, because redundant components serve in parallel.
    • The difference is the failover gap: HA has a small one, FT has none.
  • Redundancy, failover, recovery are tools — the mechanisms that implement HA/FT. Not requirements in themselves.

So on a strict list the genuine NFRs are the targets (uptime %, durability, latency…). “Fault tolerance” is either the means to the uptime target or a sharper restatement of it — “survive an AZ loss with no gap” — not a separate axis.

HA and FT aren’t a binary — they’re the top rungs of a redundancy ladder, each shortening the failover gap at higher cost. The line: HA still has a promote step, FT doesn’t.

PostureBackup stateFailover
No redundancy (SPOF)nonefull outage
Cold standbyoffminutes–hours (boot + restore)
Warm standbyrunning, not servingseconds–minutes (promote)HA
Hot standby (active-passive)live, readysub-second (promote)HA
Active-activeall nodes servingnone — just lost capacityFT
Lockstep + voterparallel, in syncnone — fault maskedFT