Queue Worker Capacity Calculator

Queue name:

Shown in exports so the capacity run stays traceable.

Arrival rate:

Use the sustained peak interval, not a full-day average, when traffic is bursty.

msg/s

Current workers:

Enter the active consumer count you want to compare against the target worker plan.

workers

Per-worker throughput:

Use measured completion rate per worker from logs or queue metrics.

msg/s

Current backlog:

Enter visible queue depth, pending jobs, or delayed messages that must be drained.

messages

Target drain time:

The worker recommendation must keep pace with arrivals and clear this backlog inside the target.

min

Utilization ceiling:

Common production targets sit around 70-85% when arrival bursts and retries are possible.

Arrival burst reserve:

Leave 0 for the measured average; raise it when queue metrics show frequent short spikes.

Retry or overhead reserve:

Leave 0 when the per-worker throughput already includes normal retry and coordination overhead.

Worker start delay:

Leave 0 when current workers are already active or scale-out is immediate.

sec

Fixed safety workers:

Use 0 for pure math; add workers when downstream variance or rollout risk deserves a fixed buffer.

workers

Worker cap:

Leave 0 for no cap; set a limit to flag when the plan cannot be reached.

workers

Background queues make user-facing work look finished before the hidden work has actually cleared. A checkout page, upload screen, webhook endpoint, CI trigger, or report request may hand work to a queue and return quickly, but email sends, image processing, retries, builds, imports, and stream events still need enough workers to finish at the rate they arrive. Queue worker capacity measures that hidden service rate against incoming demand, waiting backlog, and the delay target the system is expected to protect.

The backlog number alone is easy to overread. A queue with 10,000 waiting messages can be healthy when workers complete 1,000 more messages per minute than arrive, and dangerous when the net drain is only a few messages per second. A service-level target makes the capacity question concrete because it changes "will this eventually drain?" into "will this drain inside the required window while new work keeps entering the queue?"

Arrival rate: Incoming jobs, messages, tasks, or events per second during the interval being modeled.
Service rate: Completed work per second. For a worker pool, this is worker count multiplied by effective throughput per worker.
Net drain: Service rate minus arrival rate. Positive net drain shrinks backlog; zero or negative net drain lets waiting work persist or grow.
Utilization: Arrival rate divided by service capacity. High utilization leaves little room for bursts, retries, slow downstream calls, or uneven message sizes.
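
The definitions above reduce to a few lines of arithmetic. The following Python sketch is illustrative only; the function names are hypothetical and are not taken from the tool.

```python
def service_rate(workers: int, per_worker_rate: float) -> float:
    """Completed messages per second for the whole pool."""
    return workers * per_worker_rate


def net_drain(workers: int, per_worker_rate: float, arrival_rate: float) -> float:
    """Positive values shrink the backlog; zero or negative values do not."""
    return service_rate(workers, per_worker_rate) - arrival_rate


def utilization(workers: int, per_worker_rate: float, arrival_rate: float) -> float:
    """Arrival rate as a fraction of service capacity."""
    capacity = service_rate(workers, per_worker_rate)
    if capacity <= 0:
        raise ValueError("service capacity must be positive")
    return arrival_rate / capacity


# 8 workers at 2.4 msg/s each against 12 msg/s of arrivals
print(round(service_rate(8, 2.4), 2))        # 19.2
print(round(net_drain(8, 2.4, 12.0), 2))     # 7.2
print(round(utilization(8, 2.4, 12.0), 3))   # 0.625
```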

Busy-period measurement matters more than a full-day average. Many queues are quiet for hours and then receive concentrated work from a campaign, cron job, payment provider, CI run, migration, or batch import. Sizing from the whole day can make a worker pool look safe even though the hour that matters has a higher arrival rate, more retries, and slower downstream calls.

Worker pools also need slack. At 95% utilization, a small slowdown in a database, API, object store, lock manager, or network dependency can turn a stable queue into a growing queue. Retry work has the same effect because repeated messages consume capacity without representing new user demand. A fixed worker cap, partition limit, function-concurrency limit, or downstream quota can block a mathematically valid target even when extra workers would clear the backlog.

Capacity planning gives a sizing target, not a guarantee. The estimate is strongest when workers are roughly interchangeable, throughput comes from recent completed-work metrics, and the modeled interval resembles the traffic pattern being planned for. Priority queues, large job-size differences, poisoned messages, retry storms, and shared downstream bottlenecks need live operational checks alongside the calculation.

How to Use This Tool:

Model one queue, topic, stream, or worker group at a time. The main decision is whether the current pool can drain the starting backlog inside the target window while staying under the utilization ceiling.

Enter Queue name as the traceable label for the capacity run.
Set Arrival rate to the sustained busy-period messages per second from enqueue, publish, ingress, or trigger metrics.
Use the busiest representative interval. A full-day average can hide the burst that actually drives worker count.
Enter Current workers and Per-worker throughput. Throughput should mean completed messages per worker per second after normal downstream service time is included.
Set Current backlog, Target drain time, and Utilization ceiling. These fields decide the catch-up worker floor and the steady-state headroom floor.
Open Advanced when measured averages need adjustment. Use Arrival burst reserve for short spikes, Retry or overhead reserve for duplicate or coordination work, Worker start delay for autoscaler lag, Fixed safety workers for a whole-worker buffer, and Worker cap for quotas or downstream limits.
If Worker cap is positive and lower than the recommendation, the result reports the capped state instead of pretending the target is reachable.
Review Queue Metrics and Worker Plan before using the charts. Check whether Recommended worker target, Current drain time, and Current utilization agree with the scale-out decision.
Use Backlog Burn-Down, Worker Sensitivity, and Capacity Brief to verify the plan. If a validation warning appears, correct the named field before relying on any worker target.

Interpreting Results:

Recommended worker target is the larger of two floors: workers needed to clear the backlog by the target time and workers needed to keep steady utilization at or below the ceiling. Fixed safety workers are added after that comparison. If a positive worker cap is below the target, the summary switches to the capped outcome and flags the target as blocked.

Current drain time depends on positive net drain. If current service capacity is less than or equal to modeled arrivals, the backlog will not shrink while arrivals continue, so the result should be read as a capacity deficit rather than a late completion estimate.
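
One way to keep a capacity deficit from being misread as a late finish is to return no drain time at all when net drain is not positive. This is a minimal sketch with hypothetical function names; the tool's own handling may differ.

```python
from typing import Optional


def current_drain_seconds(backlog: float, workers: int,
                          mu_eff: float, lam_eff: float) -> Optional[float]:
    """Seconds to clear the backlog, or None when the pool cannot catch up."""
    net = workers * mu_eff - lam_eff
    if net <= 0:
        return None  # capacity deficit: backlog holds steady or grows
    return backlog / net


def format_drain(seconds: Optional[float]) -> str:
    """Render a drain time, naming the deficit case explicitly."""
    if seconds is None:
        return "capacity deficit"
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs} sec"


print(format_drain(current_drain_seconds(18_000, 8, 2.4, 12.0)))  # 41 min 40 sec
print(format_drain(current_drain_seconds(18_000, 4, 2.4, 12.0)))  # capacity deficit
```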

Queue worker capacity result interpretation
Output	Read it as	What to verify
`Modeled arrival rate`	Base arrival rate after the optional burst reserve.	Check that the interval reflects the busy period, not a diluted daily average.
`Per-worker effective throughput`	Useful per-worker completion rate after retry or overhead reserve.	Confirm the measured rate includes locks, downstream throttling, setup work, and normal retries.
`Backlog at added-worker start`	Starting backlog adjusted for the worker start delay.	Use a realistic autoscaler, deployment, warm-up, or cold-start delay.
`Queueing estimate`	An M/M/c approximation of wait probability, average wait, and expected queue depth when the model is stable.	Compare the estimate with observed age, wait, or latency metrics before treating it as an SLA forecast.
`Worker cap outcome`	The modeled state when a quota or concurrency cap prevents the recommended target.	Decide whether to raise the cap, relax the drain target, lower arrivals, or reduce downstream cost per message.

A passing table result can still fail in production when traffic is uneven, workers are not identical, messages have priority classes, a downstream system throttles, or retries spike. Treat the recommendation as a sizing brief and compare it with oldest-message age, retry rate, worker saturation, error rate, and downstream latency.

Technical Details:

The capacity model treats the queue as a single stream of work served by whole workers. Arrival rate and per-worker throughput are measured in messages per second. Worker counts are whole numbers, so every worker floor is rounded up rather than to the nearest integer.

Two constraints govern the target. The drain constraint adds enough service rate to process new arrivals and remove the adjusted starting backlog within the target time. The utilization constraint keeps steady load below the selected ceiling. This separation matters because an empty queue can still need more workers for headroom, while a stable queue with a large starting backlog can still miss the drain target.

Formula Core:

The formulas use messages per second for rates and seconds for time. Burst reserve raises arrivals; retry or overhead reserve lowers effective per-worker throughput.

\begin{matrix}
\lambda_{eff} & = & \lambda \times \left(1 + \frac{b}{100}\right) \\
\mu_{eff} & = & \mu \times \left(1 - \frac{h}{100}\right) \\
\mathrm{net}(c) & = & c \times \mu_{eff} - \lambda_{eff} \\
Q_{start} & = & \max\left(0,\; Q_{0} - \mathrm{net}(c_{current}) \times d\right) \\
c_{drain} & = & \left\lceil \frac{\lambda_{eff} + \frac{Q_{start}}{T}}{\mu_{eff}} \right\rceil \\
c_{util} & = & \left\lceil \frac{\lambda_{eff}}{\mu_{eff} \times u} \right\rceil \\
c_{target} & = & \max\left(c_{drain},\, c_{util}\right) + s
\end{matrix}

Here lambda is the entered arrival rate, mu is raw per-worker throughput, b is burst reserve percent, h is retry or overhead reserve percent, c is workers, Q0 is the current backlog, d is worker start delay in seconds, T is target drain time in seconds (the minutes entered in the form, converted), u is utilization ceiling as a fraction, and s is fixed safety workers. A positive worker cap does not change the unconstrained target; it changes the planned state used for capped warnings and charts.
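
The formula core can be sketched in Python as follows. This is an illustration of the equations above, not the tool's source: the names `Plan`, `plan_workers`, and `ceil_workers` are hypothetical, the small tolerance in `ceil_workers` is an added guard against floating-point noise, and the 80% ceiling in the usage lines is an assumed value.

```python
import math
from dataclasses import dataclass


@dataclass
class Plan:
    lam_eff: float   # modeled arrival rate, msg/s
    mu_eff: float    # effective per-worker throughput, msg/s
    q_start: float   # backlog when added workers start
    c_drain: int     # drain floor
    c_util: int      # utilization floor
    c_target: int    # recommended worker target


def ceil_workers(x: float) -> int:
    # Assumed guard: subtract a tiny tolerance so float noise
    # (e.g. 10.000000001) does not add a whole worker.
    return max(0, math.ceil(x - 1e-9))


def plan_workers(lam: float, mu: float, c_current: int, q0: float,
                 target_s: float, util: float, burst_pct: float = 0.0,
                 overhead_pct: float = 0.0, delay_s: float = 0.0,
                 safety: int = 0) -> Plan:
    lam_eff = lam * (1 + burst_pct / 100)
    mu_eff = mu * (1 - overhead_pct / 100)
    if mu_eff <= 0 or target_s <= 0 or not 0 < util <= 1:
        raise ValueError("invalid throughput, target time, or ceiling")
    # Backlog shrinks or grows at the current net drain during the start delay.
    net_current = c_current * mu_eff - lam_eff
    q_start = max(0.0, q0 - net_current * delay_s)
    c_drain = ceil_workers((lam_eff + q_start / target_s) / mu_eff)
    c_util = ceil_workers(lam_eff / (mu_eff * util))
    return Plan(lam_eff, mu_eff, q_start, c_drain, c_util,
                max(c_drain, c_util) + safety)


# Form units: target time is entered in minutes, so convert to seconds.
plan = plan_workers(lam=12, mu=2.4, c_current=8, q0=18_000,
                    target_s=30 * 60, util=0.80)
print(plan.c_drain, plan.c_util, plan.c_target)  # 10 7 10
```

Keeping the two floors as separate fields makes it visible which constraint drives the target before the safety workers are added.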

For the default values, 12 messages per second arrive, 8 workers each complete 2.4 messages per second, and 18,000 messages are waiting. Current capacity is 19.2 messages per second, so net drain is 7.2 messages per second and the current drain time is about 41 minutes 40 seconds. The 30-minute target requires 10 workers because the pool must cover 12 arrivals per second plus 10 backlog messages per second.

Queue worker capacity rule boundaries
Rule	Boundary	Effect on the plan
Drain floor	Workers must cover modeled arrivals plus `backlog / target time`.	Prevents a stable but slow catch-up plan.
Utilization floor	Modeled arrivals divided by service capacity must be at or below the ceiling.	Preserves headroom for bursts, retries, and slower downstream calls.
Scale delay	Backlog changes at the current net drain rate until added workers are assumed active.	Captures autoscaler delay, deployment delay, warm-up time, or cold starts.
Safety workers	Whole workers are added after the drain and utilization floors are compared.	Raises the recommendation without changing measured arrival or service rates.
Worker cap	A positive cap below the target constrains the planned state.	Shows the capped outcome and warns that the target is blocked.

The queueing estimate uses the Erlang C M/M/c approximation when service capacity exceeds arrivals and the worker count is within the supported range. The approximation assumes random arrivals, exponentially distributed service times, identical workers, one shared queue, no abandonment, and no priority classes. Those assumptions are useful for a rough wait estimate, but they are weaker for batchy traffic, long-tail job durations, and systems where one downstream dependency becomes the real bottleneck.

\begin{matrix}
a & = & \frac{\lambda_{eff}}{\mu_{eff}} \\
\rho & = & \frac{a}{c} < 1 \\
W_{q} & = & \frac{C(c, a)}{c \times \mu_{eff} - \lambda_{eff}} \\
L_{q} & = & \lambda_{eff} \times W_{q}
\end{matrix}
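
In these formulas, C(c, a) is the Erlang C probability that an arriving message has to wait. A minimal Python sketch follows, assuming the standard Erlang B recursion as the way to evaluate it; the function names are hypothetical and the tool may compute the value differently.

```python
def erlang_c(c: int, a: float) -> float:
    """Probability of waiting in an M/M/c queue with offered load a = lam / mu."""
    if c < 1 or a < 0 or a >= c:
        raise ValueError("requires c >= 1 and 0 <= a < c (stable queue)")
    # Erlang B recursion: B(0) = 1, B(k) = a*B(k-1) / (k + a*B(k-1)).
    b = 1.0
    for k in range(1, c + 1):
        b = a * b / (k + a * b)
    rho = a / c
    # Convert Erlang B to Erlang C.
    return b / (1 - rho * (1 - b))


def queue_estimate(c: int, lam_eff: float, mu_eff: float):
    """Return (wait probability, average wait in seconds, expected queue depth)."""
    p_wait = erlang_c(c, lam_eff / mu_eff)
    w_q = p_wait / (c * mu_eff - lam_eff)
    l_q = lam_eff * w_q  # Little's Law applied to the waiting line
    return p_wait, w_q, l_q


p_wait, w_q, l_q = queue_estimate(c=10, lam_eff=12.0, mu_eff=2.4)
print(f"P(wait)={p_wait:.3f}  Wq={w_q:.3f}s  Lq={l_q:.3f}")
```

The recursion avoids large factorials, which keeps the calculation numerically stable for bigger worker counts.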

Backlog Burn-Down samples backlog over a modeled horizon. Before added workers start, it uses the current worker state. After the start delay, it uses the planned worker state, which may be the capped state if a cap blocks the target. Worker Sensitivity evaluates nearby whole-worker counts so the drain-time cost of one fewer or one more worker is visible.
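
Both views can be approximated with a simple fixed-step model. The sketch below assumes constant rates within each step and uses hypothetical function names, step size, and sensitivity spread; it is not the tool's charting code.

```python
from typing import List, Optional, Tuple


def burn_down(q0: float, lam_eff: float, mu_eff: float, c_current: int,
              c_planned: int, delay_s: int, horizon_s: int,
              step_s: int = 60) -> List[Tuple[int, float]]:
    """Sample (time in seconds, backlog) at fixed steps over the horizon."""
    points = []
    backlog = float(q0)
    for t in range(0, horizon_s + 1, step_s):
        points.append((t, backlog))
        # Current pool until the start delay elapses, planned pool afterwards.
        workers = c_current if t < delay_s else c_planned
        net = workers * mu_eff - lam_eff
        backlog = max(0.0, backlog - net * step_s)
    return points


def worker_sensitivity(q_start: float, lam_eff: float, mu_eff: float,
                       c_planned: int,
                       spread: int = 2) -> List[Tuple[int, Optional[float]]]:
    """Drain time in seconds for nearby worker counts; None means no drain."""
    rows = []
    for c in range(max(1, c_planned - spread), c_planned + spread + 1):
        net = c * mu_eff - lam_eff
        rows.append((c, q_start / net if net > 0 else None))
    return rows


for t, backlog in burn_down(18_000, 12.0, 2.4, 8, 10, 300, 1800, 300):
    print(t, round(backlog))
for workers, drain in worker_sensitivity(18_000, 12.0, 2.4, 10):
    print(workers, None if drain is None else round(drain))
```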

Limitations, Privacy, and Accuracy Notes:

The model assumes one queue, identical workers, constant average arrival and service rates, no priority classes, and no message abandonment.
Per-worker throughput should come from completed work, not only worker availability. Locks, API throttling, database contention, retries, and per-message setup time all reduce useful throughput.
Large bursts can violate a result that passes on a sustained average. Use the burst reserve or model the busiest interval separately when spikes are operationally important.
Erlang C wait estimates are skipped or marked unstable when the modeled service capacity is invalid, the worker count is very large, or arrivals are at or above capacity.
The capacity math runs in the browser from the values on the page. Queue names, rates, backlog counts, and worker settings are not sent to a dedicated calculation service.

Use the exported tables and JSON as a planning record, then compare the recommendation with live queue age, oldest-message age, retry rate, worker CPU or memory saturation, and downstream latency before changing production limits.

Worked Examples:

Default queue needs extra catch-up workers

With 12 msg/s arrivals, 8 current workers, 2.4 msg/s per worker, 18,000 messages waiting, and a 30 min drain target, current service capacity is 19.2 msg/s. Net drain is 7.2 msg/s, so the backlog clears in about 41 min 40 sec (18,000 / 7.2 = 2,500 sec). Meeting the target means removing 18,000 messages in 1,800 sec, or 10 msg/s of existing backlog on top of 12 msg/s of new arrivals. That is 22 msg/s, and 22 / 2.4 is about 9.17, so the drain floor rounds up to 10 workers.

Burst and overhead reserves expose a cap problem

At 20 msg/s arrivals, a 25% burst reserve raises modeled arrivals to 25 msg/s. If each worker handles 2 msg/s but a 10% overhead reserve applies, effective throughput is 1.8 msg/s. With 30,000 messages waiting and a 20 min target, the backlog adds another 25 msg/s (30,000 / 1,200 sec), so the drain floor is 50 / 1.8, about 27.8, which rounds up to 28 workers. A 14-worker cap supplies only 25.2 msg/s, barely above modeled arrivals, which turns the result into a quota or downstream concurrency problem.
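
The capped outcome in this example can be checked with a short Python sketch. It covers only the drain floor and the cap, ignoring the utilization floor and start delay for brevity, and the helper name `apply_cap` is hypothetical.

```python
import math
from typing import Tuple


def apply_cap(target: int, cap: int) -> Tuple[int, bool]:
    """Return (planned workers, blocked flag); a cap of 0 means no cap."""
    if cap > 0 and cap < target:
        return cap, True
    return target, False


lam_eff = 20.0 * (1 + 25 / 100)        # modeled arrivals, msg/s
mu_eff = 2.0 * (1 - 10 / 100)          # effective per-worker throughput, msg/s
backlog_rate = 30_000 / (20 * 60)      # backlog to remove per second
c_drain = math.ceil((lam_eff + backlog_rate) / mu_eff)

planned, blocked = apply_cap(c_drain, cap=14)
capped_net = planned * mu_eff - lam_eff  # net drain at the cap, msg/s

print(c_drain, planned, blocked)       # 28 14 True
print(round(capped_net, 2))            # 0.2
```

At the cap the pool drains only about 0.2 msg/s, so the 30,000-message backlog would take roughly 150,000 seconds to clear, far outside the 20 min target.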

Zero backlog still needs a steady-state check

A queue with no starting backlog can still need more workers when the utilization ceiling is tight. If arrivals are 40 msg/s, per-worker throughput is 5 msg/s, and the ceiling is 80%, the utilization floor is 10 workers. The burn-down chart may stay flat at zero, but the worker plan still protects headroom.
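
The steady-state check in this example needs only the utilization floor. A minimal sketch, assuming a hypothetical helper name and a small tolerance to absorb floating-point noise:

```python
import math


def utilization_floor(lam_eff: float, mu_eff: float, ceiling: float) -> int:
    """Smallest whole worker count that keeps utilization at or below the ceiling."""
    if mu_eff <= 0 or not 0 < ceiling <= 1:
        raise ValueError("invalid throughput or ceiling")
    return math.ceil(lam_eff / (mu_eff * ceiling) - 1e-9)


floor = utilization_floor(40.0, 5.0, 0.80)
print(floor)                           # 10
print(round(40.0 / (9 * 5.0), 3))      # 0.889, one worker fewer breaks the ceiling
print(round(40.0 / (floor * 5.0), 3))  # 0.8
```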

FAQ:

What arrival rate should I enter?

Use the sustained busy-period arrival rate for the interval you are sizing. If traffic has short spikes inside that interval, enter the measured busy-period average and add Arrival burst reserve rather than hiding the spike in a full-day average.

Why can utilization pass while drain time fails?

Utilization checks steady load. Drain time also includes the starting backlog. A queue can have enough workers for future arrivals and still need temporary catch-up capacity to clear old messages before the target time.

What does retry or overhead reserve do?

Retry or overhead reserve reduces Per-worker effective throughput. Use it when duplicate work, lock contention, throttling, cold setup, or coordination overhead means raw completion rate overstates useful service rate.

Why does worker start delay change the recommendation?

Added workers are not always active immediately. During a scale-out, deploy, warm-up, or cold-start delay, the current worker pool continues to drain or grow the backlog. The recommendation uses the backlog expected when added workers begin processing.

Why is the queueing estimate skipped?

The wait estimate is skipped for invalid service capacity or very large worker counts, and it is marked unstable when arrivals are at or above modeled service capacity. In those cases, the drain and utilization results are more useful than an average wait estimate.

Can I use this for serverless or autoscaled workers?

Yes, when each concurrent function instance, pod, process, or consumer can be represented as a worker with an average completion rate. Use Worker start delay for cold starts or autoscaler lag, and use Worker cap for concurrency limits.

Glossary:

Arrival rate: The modeled incoming work rate in messages per second.
Per-worker throughput: The average completed messages per second for one worker. The entered value is the raw rate; effective per-worker throughput is that rate after the retry or overhead reserve is applied.
Service capacity: Worker count multiplied by effective per-worker throughput.
Net drain: Service capacity minus modeled arrivals; positive net drain reduces backlog.
Utilization ceiling: The maximum modeled worker occupancy allowed for steady-state headroom.
Erlang C: A queueing approximation for wait probability and average wait in an M/M/c queue.

References:

Little's Law, Wolfram MathWorld.
ErlangC, Wolfram Language & System Documentation.
Scaling based on Amazon SQS, AWS Auto Scaling User Guide.