Message Queues
A message queue is a durable buffer that hands each message to one worker and deletes it on success. That one sentence is the whole difference from a log: a queue is deliver-then-forget, built for tasks — units of work to run once, retry on failure, and scale across workers. It’s the counterpart to the Kafka page; reach for it when the message is a command (“do this”), not an event for many to observe (see event vs task).
Message vs event vs command. Message is the umbrella — the envelope you send through any broker, queue, or log. An event states a fact, a command/task requests an action, and a query asks for data: same wire format, different intent. So all events and tasks are messages; a queue just happens to be the natural home for command messages.
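A minimal sketch of "same wire format, different intent". It assumes a hypothetical `Message` envelope whose only intent marker is a `kind` field. Real brokers usually carry intent in the message name or headers, so treat these field names as illustrative, not as any broker's API:

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, Literal

# Hypothetical envelope: every message shares this shape; only `kind` (the intent) differs.
Kind = Literal["event", "command", "query"]


@dataclass(frozen=True)
class Message:
    kind: Kind               # "event" = a fact, "command" = do this, "query" = give me data
    name: str                # e.g. "OrderPlaced", "SendInvoice", "GetOrder" (illustrative)
    payload: Dict[str, Any]
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def belongs_on_a_queue(msg: Message) -> bool:
    """Commands ("do this") are the natural fit for a queue; events for many readers go to a log."""
    return msg.kind == "command"


messages = [
    Message("event", "OrderPlaced", {"order_id": 42}),
    Message("command", "SendInvoice", {"order_id": 42}),
    Message("query", "GetOrder", {"order_id": 42}),
]
routed = [m.name for m in messages if belongs_on_a_queue(m)]  # ["SendInvoice"]
```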
The model: competing consumers
Many workers read the same queue, but each message goes to exactly one of them, so adding workers scales throughput. The lifecycle of a message is the part worth knowing:
Diagram: Producer → Queue → one of several Workers per message. Ack → message deleted; failure → redelivered after the visibility timeout; after N failures → dead-letter queue.
- Ack-and-delete — a worker pulls a message, processes it, and acknowledges; only then is it removed. No ack → it comes back.
- Visibility timeout — while a worker holds a message it’s hidden from others; if the worker dies or doesn’t ack in time, the message reappears and another worker retries. This is what makes a queue survive worker crashes — and why processing must be idempotent (at-least-once delivery means retries can double-fire).
- Dead-letter queue (DLQ) — after N failed deliveries, the message is shunted to a DLQ instead of poisoning the queue forever — inspect and replay later.
- Knobs a queue gives you (and a log doesn’t): per-message retry/backoff, priority, delay/TTL, selective ack. These are exactly what job processing needs.
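To see how ack, visibility timeout, and the DLQ fit together, here is a toy in-memory model. It is a sketch under stated assumptions, not how SQS or RabbitMQ are implemented: `ToyQueue`, `max_receives`, and the explicit `now` clock are hypothetical, and priority, delay, and backoff are left out. The handler shows why idempotency matters: a worker that does the work but dies before acking will see the same message again.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class _Entry:
    msg_id: int
    body: str
    receive_count: int = 0
    visible_at: float = 0.0  # hidden from other workers until this time


class ToyQueue:
    """Hypothetical in-memory queue: ack-and-delete, visibility timeout, DLQ.
    The caller passes the current time so the behaviour is deterministic."""

    def __init__(self, visibility_timeout: float = 30.0, max_receives: int = 3) -> None:
        self.visibility_timeout = visibility_timeout
        self.max_receives = max_receives
        self._entries: Dict[int, _Entry] = {}
        self.dead_letters: List[str] = []
        self._ids = itertools.count(1)

    def send(self, body: str) -> int:
        msg_id = next(self._ids)
        self._entries[msg_id] = _Entry(msg_id, body)
        return msg_id

    def receive(self, now: float) -> Optional[Tuple[int, str]]:
        """Hand one visible message to the caller and hide it; None if nothing is visible."""
        for entry in list(self._entries.values()):
            if entry.visible_at > now:
                continue  # another worker still holds it
            if entry.receive_count >= self.max_receives:
                # Delivered max_receives times without an ack: park it in the DLQ.
                del self._entries[entry.msg_id]
                self.dead_letters.append(entry.body)
                continue
            entry.receive_count += 1
            entry.visible_at = now + self.visibility_timeout
            return entry.msg_id, entry.body
        return None

    def ack(self, msg_id: int) -> None:
        # Only an ack deletes; without one the message reappears after the timeout.
        self._entries.pop(msg_id, None)


# Idempotent handler: a redelivered message must not fire its side effect twice.
processed: Set[int] = set()
side_effects: List[str] = []


def handle(msg_id: int, body: str) -> None:
    if msg_id in processed:
        return
    side_effects.append(body)
    processed.add(msg_id)


q = ToyQueue(visibility_timeout=10, max_receives=2)
q.send("charge order 1")
q.send("poison")

msg_id, body = q.receive(now=0)   # worker 1 gets "charge order 1" ...
handle(msg_id, body)              # ... does the work, then crashes before acking
q.receive(now=0)                  # worker 2 gets "poison" and fails (no ack)
hidden = q.receive(now=5)         # None: both messages are still invisible
msg_id, body = q.receive(now=11)  # "charge order 1" is redelivered after the timeout
handle(msg_id, body)              # idempotent: no second charge
q.ack(msg_id)                     # acked, so it is deleted
q.receive(now=11)                 # "poison" redelivered (2nd receive), fails again
q.receive(now=22)                 # None: "poison" hit max_receives and moved to the DLQ
```

This toy lets any ack delete a message by ID; SQS instead identifies each delivery with a receipt handle. In practice the "already processed" record also has to be stored durably alongside the side effect, not in a process-local set.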
Ordering
Most queues offer only best-effort ordering, which is fine for independent jobs. Strict order costs throughput: FIFO modes (SQS FIFO, a single-consumer RabbitMQ queue) guarantee order within a group but cap parallelism, since strict ordering and competing consumers pull against each other. Need strict per-key order and high throughput? That’s a log’s partition, not a queue.
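A minimal sketch of "order within a group", loosely modeled on SQS FIFO message groups: messages in one group are handed out strictly in order, and a group's next message is held back while an earlier one is still unacked. `GroupedFifo` and its methods are hypothetical names, and visibility timeouts are omitted for brevity.

```python
from __future__ import annotations

from collections import deque
from typing import Deque, Dict, Optional, Set, Tuple


class GroupedFifo:
    """Hypothetical per-group FIFO: strict order within a group,
    at most one in-flight message per group, different groups in parallel."""

    def __init__(self) -> None:
        self._groups: Dict[str, Deque[str]] = {}
        self._in_flight: Set[str] = set()  # groups that currently have an unacked message

    def send(self, group: str, body: str) -> None:
        self._groups.setdefault(group, deque()).append(body)

    def receive(self) -> Optional[Tuple[str, str]]:
        for group, pending in self._groups.items():
            if pending and group not in self._in_flight:
                self._in_flight.add(group)
                return group, pending[0]  # head stays queued until it is acked
        return None

    def ack(self, group: str) -> None:
        self._groups[group].popleft()
        self._in_flight.discard(group)


q = GroupedFifo()
q.send("order-1", "created")
q.send("order-1", "paid")
q.send("order-2", "created")

first = q.receive()      # ("order-1", "created")
second = q.receive()     # ("order-2", "created"): another group, can run in parallel
blocked = q.receive()    # None: order-1's "paid" waits until "created" is acked
q.ack("order-1")
after_ack = q.receive()  # ("order-1", "paid")
```

The parallelism cap is visible here: at most one message per group is in flight, so throughput scales with the number of active groups, and a single hot group processes one message at a time.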
The landscape
| System | What it is | Notes |
|---|---|---|
| RabbitMQ | Self-hosted AMQP broker | Exchanges (direct / topic / fanout / headers) route to queues via bindings — flexible routing, can do work-queues and pub/sub. Push-based, acks, dead-letter exchanges. |
| Amazon SQS | Fully-managed queue | Standard (at-least-once, best-effort order, near-infinite throughput) vs FIFO (ordered, exactly-once processing, limited throughput). Visibility timeout, DLQ, long polling. Pair with SNS for fanout. |
| Task queues (Celery, Sidekiq, BullMQ) | App-level job frameworks | Sit on top of a broker (Redis for Sidekiq and BullMQ; RabbitMQ or Redis for Celery) and add scheduling, retries, concurrency, result tracking: the everyday “background job” workhorse. |
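For the RabbitMQ row, here is a sketch of how a topic exchange decides which bound queues receive a message, following RabbitMQ's documented topic rules: routing keys are dot-separated words, `*` in a binding matches exactly one word, and `#` matches zero or more. The function names, queue names, and bindings are hypothetical. This reimplements the matching rule for illustration and is not RabbitMQ's code or client API.

```python
from typing import Iterable, List, Set, Tuple


def topic_matches(pattern: str, routing_key: str) -> bool:
    """Does a topic binding pattern match a routing key?"""
    return _match(pattern.split("."), routing_key.split("."))


def _match(pattern: List[str], key: List[str]) -> bool:
    if not pattern:
        return not key  # match only if the key is used up too
    head, rest = pattern[0], pattern[1:]
    if head == "#":
        # '#' may swallow zero or more words: try every split point.
        return any(_match(rest, key[i:]) for i in range(len(key) + 1))
    if not key:
        return False  # the pattern still needs a word
    if head == "*" or head == key[0]:
        return _match(rest, key[1:])
    return False


def route(bindings: Iterable[Tuple[str, str]], routing_key: str) -> Set[str]:
    """Every queue whose binding matches gets a copy of the message."""
    return {q_name for pattern, q_name in bindings if topic_matches(pattern, routing_key)}


bindings = [
    ("orders.*.created", "billing"),
    ("orders.#", "audit"),
    ("*.eu.*", "eu-reporting"),
]
eu_created = route(bindings, "orders.eu.created")  # {"billing", "audit", "eu-reporting"}
us_shipped = route(bindings, "orders.us.shipped")  # {"audit"}
```

Because one routing key can match several bindings, the same mechanism serves pub/sub (several queues get a copy) as well as work queues (one queue, many competing workers).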
Returning a result (and why it gets brittle)
A queue is one-way: producer → worker. The moment the sender needs the result back, you bolt on a reply queue plus a correlation ID to match each response to its request. For a single round-trip that’s fine.
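A minimal request/reply sketch, using Python's standard `queue.Queue` as a stand-in for broker queues. The names `requests`, `replies`, `reply_queues`, `pending`, `send_request`, `worker_step`, and `collect_reply` are hypothetical. On a real broker the correlation ID and reply-queue name would typically travel as message properties or attributes rather than dict keys.

```python
import queue
import uuid
from typing import Dict, Tuple

requests = queue.Queue()
replies = queue.Queue()                # the bolted-on reply queue
reply_queues = {"replies": replies}    # lets the worker honour a "reply_to" name
pending: Dict[str, str] = {}           # correlation_id -> what was asked


def send_request(n: int) -> str:
    corr_id = str(uuid.uuid4())
    pending[corr_id] = f"square({n})"
    requests.put({"correlation_id": corr_id, "reply_to": "replies", "n": n})
    return corr_id


def worker_step() -> None:
    req = requests.get_nowait()
    # The worker echoes the correlation ID so the caller can match the answer.
    reply_queues[req["reply_to"]].put(
        {"correlation_id": req["correlation_id"], "result": req["n"] ** 2}
    )


def collect_reply() -> Tuple[str, int]:
    rep = replies.get_nowait()
    # Match by ID, not arrival order; a KeyError here means a stray or duplicate reply.
    asked = pending.pop(rep["correlation_id"])
    return asked, rep["result"]


send_request(3)
send_request(4)
worker_step()
worker_step()
results = dict(collect_reply() for _ in range(2))  # {"square(3)": 9, "square(4)": 16}
```

The caller's `pending` map is the part that grows: every in-flight request needs an entry, and a lost reply leaves one behind until something times it out.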
It falls apart at multi-step flows. Call A, wait for A’s result, then call B, handle B’s failure, compensate: now you’re hand-maintaining correlation IDs, per-step state, timeouts, and retries scattered across several queues, with no single place that owns the flow. That’s the “queue glue” trap. It works in a demo and rots in production: impossible to see end-to-end, painful to debug, scary to change. Replies also blur intent: each one travels through the broker like an event, yet it is really the answer to a specific command, and nobody owns the conversation.
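To make the trap concrete, here is a hypothetical sketch of the per-flow state a caller ends up hand-maintaining for "call A, then B, compensate A if B fails" when every step is a queue round-trip. `FlowState`, `start`, `on_reply`, and the queue names in `outbox` are illustrative. The sketch still ignores timeouts, retries, duplicate replies, and persisting `flows` across restarts, which is exactly the work that piles up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FlowState:
    step: str = "waiting_for_A"
    log: List[str] = field(default_factory=list)


flows: Dict[str, FlowState] = {}    # correlation_id -> state (persisting this is on you)
outbox: List[Tuple[str, str]] = []  # (queue name, correlation_id) commands to send


def start(corr_id: str) -> None:
    flows[corr_id] = FlowState()
    outbox.append(("do_A", corr_id))


def on_reply(corr_id: str, step: str, ok: bool) -> None:
    """Every reply from every step lands here; the 'flow' exists only as these branches."""
    state = flows[corr_id]
    state.log.append(f"{step}:{'ok' if ok else 'failed'}")
    if state.step == "waiting_for_A" and step == "A":
        if ok:
            state.step = "waiting_for_B"
            outbox.append(("do_B", corr_id))
        else:
            state.step = "failed"
    elif state.step == "waiting_for_B" and step == "B":
        if ok:
            state.step = "done"
        else:
            # B failed: compensate A, then wait for the compensation's own reply.
            state.step = "compensating_A"
            outbox.append(("undo_A", corr_id))
    elif state.step == "compensating_A" and step == "undo_A":
        state.step = "compensated" if ok else "stuck"  # and now what?
    # Any other combination (late, duplicate, out-of-order reply) is only logged.


start("flow-1")
on_reply("flow-1", "A", ok=True)
on_reply("flow-1", "B", ok=False)
on_reply("flow-1", "undo_A", ok=True)
```

Each new step or failure mode adds another branch to `on_reply`, and nothing outside this function shows the flow end to end.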
That brittleness is the signal you’ve outgrown the queue: when the hard part is tracking the state of a process rather than delivering a message, hand it to an orchestrator — the workflow awaits each result durably and owns the whole flow.
Queue vs log vs orchestrator
Three tools, three jobs. Don’t force one into another’s role:
- Queue (this page) — do this once: a task for one worker, with retry/visibility/priority/DLQ. (SQS, RabbitMQ.)
- Log (Kafka) — this happened: a retained, replayable event stream many consumers read at their own offset.
- Orchestrator (Temporal) — run this multi-step process: durable state, the workflow owns the flow and the result. When you’re hand-rolling retries + correlation + state across queue messages, you’ve outgrown the queue.
When to use — and not
Use it for: background jobs, task distribution across workers, decoupling a slow step from the request path, buffering/smoothing bursty work, and rate-limited processing.
Avoid it for: broadcasting an event to many independent consumers or needing replay/history (use a log); strict ordering at high throughput (a log’s partition); and multi-step stateful processes with compensation (use an orchestrator).
These are working notes — the queue half of the messaging picture, paired with the Kafka log page. The one idea to keep: a queue delivers a unit of work to one worker and forgets it on ack — everything else (visibility timeout, DLQ, FIFO) follows from making that reliable. One-liners in the Study List.