WebSocket Connection Capacity Calculator
Size WebSocket node pools from peak sockets, memory, egress, descriptor limits, and N+ reserve, with bottleneck and scale-out checks.
Introduction:
WebSocket capacity planning is different from sizing ordinary request traffic because the expensive part is not only how many messages arrive in a second. A WebSocket service keeps client connections open for minutes, hours, or whole workdays. Each connected client can hold memory, file descriptors, TLS state, subscriptions, heartbeat timers, buffers, and load balancer state even while it appears idle.
That long-lived shape changes the planning question. A short HTTP endpoint may care most about requests per second and response latency. A WebSocket fleet also has to survive open-socket count, outbound fanout, reconnect waves, node drains, operating system limits, proxy timeouts, and broker behavior. A chat presence service, a multiplayer lobby, a market data feed, and an IoT dashboard can all show the same concurrent connection count while stressing completely different resources.
The useful capacity number is the smallest safe ceiling across several independent limits. A server can pass a high connection-count test but still run out of memory when session state grows. It can have plenty of memory and still hit a descriptor limit. It can hold idle sockets cleanly and then saturate outbound egress when one event is pushed to thousands of clients. The planning reserve matters because production failures rarely happen at a calm average load.
| Term | Plain meaning | Why it changes capacity |
|---|---|---|
| Peak concurrent sockets | The largest number of open client connections expected at one time. | It sets the base demand before reserve is added. |
| Fanout | One event or update sent to many connected clients. | Outbound traffic can become the bottleneck even when inbound traffic is small. |
| Per-connection memory | Memory held for each socket, subscription, buffer, and session record. | Small per-client state becomes large when multiplied by tens of thousands of sockets. |
| N+ reserve | Extra nodes held aside for node loss, deploy drains, or uneven connection spread. | It prevents a plan from depending on every node being healthy and fully usable. |
WebSocket capacity should therefore be treated as a pool design rather than a single-server brag number. The practical answer combines measured load-test evidence with resource math, then asks how many serving nodes remain after the planned reserve is removed. That gives a more conservative basis for autoscaling targets, launch readiness, and capacity reviews.
The result still needs real testing. Kernel tuning, runtime garbage collection, load balancer affinity, TLS termination, compression, message broker fanout, and client reconnect behavior can all move the real ceiling. A calculator can organize the model; it cannot replace a soak test that uses the same payloads, subscriptions, heartbeat settings, and failure patterns as production.
How to Use This Tool:
- Choose the Workload preset closest to the traffic shape. Use it only as a starting point, then replace the sample values with measurements from your load test or production forecast.
- Enter Peak concurrent sockets, Current WebSocket nodes, and Tested connection cap per node. The tested cap should stop before latency, heartbeat misses, reconnects, memory growth, or event-loop delay becomes unacceptable.
- Add the per-node resource limits: Memory per connection, Node RAM budget, File descriptor limit, and Node egress ceiling. Use the application budget for RAM and egress, not the theoretical host maximum, if other services share the node.
- Describe outbound fanout with Outbound message rate, Average outbound payload, and Protocol overhead. Use compressed payload sizes only when compression is consistently enabled and included in the overhead assumption.
- Set Planning reserve for growth, bursty reconnects, deploy drains, and uneven distribution. Then set N+ node loss reserve to the whole-node cushion you need after the serving fleet is sized.
- Open Advanced when you need to adjust memory utilization, network utilization, file descriptors, protocol overhead, node-loss reserve, or result precision.
- Review Capacity Snapshot first. The result is ready to share when the snapshot has no input warnings, the Constraint Ledger shows a believable bottleneck, and the Scale-Out Plan matches the failover posture you intend to run.
Use the copy, CSV, document, chart, brief, and JSON outputs for handoff only after the numbers reflect a representative environment. Defaults are useful for orientation, not procurement decisions.
Interpreting Results:
Recommended pool size is the total node count after the serving nodes are calculated and the N+ reserve is added. Safe sockets per node is the smallest finite capacity among the tested runtime cap, memory budget, file descriptor budget, and egress bandwidth model. If that bottleneck is unrealistic, fix the input behind it before trusting the node recommendation.
- Reserved connection target is peak concurrent sockets increased by the planning reserve, that is, peak × (1 + reserve). A 25% reserve multiplies the peak by 1.25.
- Current failover capacity removes the N+ reserve from the current node count, then multiplies the remaining serving nodes by safe sockets per node.
- Node gap is the added node count needed for the recommended plan. A zero gap means the current pool clears the model, not that it is proven under failure.
- Target egress and Egress per planned node help distinguish connection-count capacity from message fanout capacity.
- Capacity Bottleneck Chart compares the per-node ceilings, while Scale-Out Curve Chart shows how failover capacity changes as nodes are added.
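The pool-level results above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own code: the function name `pool_plan`, its parameter names, and the choice to clamp the node gap at zero are assumptions made for the example. The safe cap per node is taken as an input here.

```python
import math

def pool_plan(peak_sockets, planning_reserve, safe_cap_per_node,
              current_nodes, reserve_nodes):
    """Pool-level results from an already-known safe cap per node.

    All names are hypothetical; planning_reserve is a decimal fraction.
    """
    # Planning reserve inflates demand before any node is counted.
    target = peak_sockets * (1 + planning_reserve)
    # A partial node of demand becomes a whole serving node.
    serving_needed = math.ceil(target / safe_cap_per_node)
    # N+ reserve is added after the serving count is known.
    recommended_pool = serving_needed + reserve_nodes
    # Reserve nodes are removed before current capacity is measured.
    current_serving = max(current_nodes - reserve_nodes, 0)
    failover_capacity = current_serving * safe_cap_per_node
    # Assumption: the gap is never reported as negative.
    node_gap = max(recommended_pool - current_nodes, 0)
    return {
        "reserved_target": target,
        "serving_needed": serving_needed,
        "recommended_pool": recommended_pool,
        "failover_capacity": failover_capacity,
        "node_gap": node_gap,
    }

# Chat and presence figures: 75,000 peak, 25% reserve, 30,000 safe cap,
# three current nodes, N+1.
plan = pool_plan(75_000, 0.25, 30_000, current_nodes=3, reserve_nodes=1)
print(plan)
```

With these inputs the current failover capacity is 60,000 sockets against a 93,750 target, which is why the gap is two nodes rather than one.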
A green result should still be checked against reconnect storms, sticky-session distribution, health-check draining, autoscaling lag, TLS handshakes, broker replay, and regional failover. WebSocket fleets often fail during transitions, not during steady idle load.
Technical Details:
WebSocket node sizing starts with reserved demand and then evaluates per-node ceilings. The resource model is intentionally conservative because the safe fleet size is driven by the tightest constraint. A memory-rich node does not help if the descriptor limit is lower, and a high descriptor limit does not help if outbound fanout saturates the egress budget.
Planning reserve and node-loss reserve serve different jobs. Planning reserve increases the socket target before the serving-node count is calculated. Node-loss reserve is added after that serving-node count is known, so those reserve nodes are not counted as serving capacity when failover capacity is checked.
Formula Core:
The model expresses percentages as decimal fractions, so a 25% planning reserve is written as 0.25 and a 65% network ceiling is written as 0.65.
| Symbol | Meaning | Visible input or result |
|---|---|---|
| P | Observed or forecasted peak open sockets. | Peak concurrent sockets |
| T | Peak sockets after planning reserve. | Reserved connection target |
| G, M, u | Node memory in GB, memory per socket in KB, and allowed memory utilization. | Node RAM budget, Memory per connection, Memory utilization ceiling |
| F | Total descriptor limit before the 1,024 descriptor runtime reserve is subtracted. | File descriptor limit |
| S, m, o | Average payload bytes, messages per connection per minute, and protocol overhead. | Average outbound payload, Outbound message rate, Protocol overhead |
| E, n | Node egress ceiling in Mbps and allowed network utilization. | Node egress ceiling, Network utilization ceiling |
| L | Whole nodes reserved for failover or planned loss. | N+ node loss reserve |
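One way to write the model in terms of these symbols is shown below. It is a reconstruction that agrees with the worked figures on this page, not a copy of the tool's source. Two symbols are assumptions because the table does not define them: r for the planning reserve and C for the tested connection cap per node. The binary unit conversion (1 GB = 1024³ bytes, 1 KB = 1024 bytes) and 1 Mbps = 10⁶ bits per second are also assumptions, chosen because they reproduce the quoted ceilings.

```latex
\begin{aligned}
T &= P\,(1 + r) \\
N_{\text{mem}} &= \left\lfloor \frac{u \cdot G \cdot 1024^{3}}{M \cdot 1024} \right\rfloor \\
N_{\text{fd}} &= F - 1024 \\
b &= \frac{8 \cdot S \cdot (1 + o) \cdot m}{60} \quad \text{(bits per second per socket)} \\
N_{\text{net}} &= \left\lfloor \frac{n \cdot E \cdot 10^{6}}{b} \right\rfloor \quad \text{(only when } b > 0\text{)} \\
N_{\text{safe}} &= \min\left(C,\; N_{\text{mem}},\; N_{\text{fd}},\; N_{\text{net}}\right) \\
N_{\text{serve}} &= \left\lceil \frac{T}{N_{\text{safe}}} \right\rceil \\
N_{\text{pool}} &= N_{\text{serve}} + L
\end{aligned}
```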
With the default chat and presence values, 75,000 peak sockets with a 25% reserve become 93,750 target sockets. The memory ceiling is about 734,003 sockets per node, the descriptor ceiling is 64,512 sockets per node (a 65,536 limit minus the 1,024 reserve), and the egress ceiling is about 765,065 sockets per node. The tested runtime cap of 30,000 is lower than those derived limits, so safe capacity is 30,000 sockets per node. The base serving requirement is four nodes (93,750 / 30,000 = 3.125, rounded up), and the N+1 reserve makes the recommended pool five nodes.
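The per-node ceilings quoted above can be reproduced with a short Python sketch. Several inputs are not stated on this page and are back-solved assumptions that happen to match the quoted figures: an 8 GB RAM budget with 8 KB per connection at a 70% memory ceiling (any equal GB-to-KB ratio gives the same result), and a 900-byte payload at 6 messages per minute with 18% overhead on a 1,000 Mbps node at a 65% network ceiling. The function names and binary unit conversion are likewise illustrative.

```python
import math

FD_RESERVE = 1_024  # descriptors held back for the runtime

def memory_cap(ram_gb, kb_per_conn, mem_util):
    # Assumption: binary units (1 GB = 1024**3 bytes, 1 KB = 1024 bytes).
    return math.floor(ram_gb * 1024**3 * mem_util / (kb_per_conn * 1024))

def descriptor_cap(fd_limit):
    return fd_limit - FD_RESERVE

def egress_cap(egress_mbps, net_util, payload_bytes, msgs_per_min, overhead):
    # Outbound bits per second for one socket, including protocol overhead.
    bits_per_socket = payload_bytes * (1 + overhead) * 8 * msgs_per_min / 60
    return math.floor(egress_mbps * 1_000_000 * net_util / bits_per_socket)

# Back-solved, hypothetical inputs (see the paragraph above).
caps = {
    "tested": 30_000,
    "memory": memory_cap(8, 8, 0.70),
    "descriptors": descriptor_cap(65_536),
    "egress": egress_cap(1_000, 0.65, 900, 6, 0.18),
}
bottleneck = min(caps, key=caps.get)
safe_cap = caps[bottleneck]
print(caps, bottleneck, safe_cap)
```

Taking the minimum by key also names the bottleneck, which is the tested cap in this case.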
| Boundary | Rule | Interpretation |
|---|---|---|
| Per-node caps | Derived socket counts are rounded down before comparison. | Fractional sockets are not counted as usable capacity. |
| Node count | The serving-node requirement is rounded up. | A partial node demand becomes a full added node. |
| Descriptor reserve | 1,024 descriptors are reserved before client sockets are counted. | Runtime files, upstream sockets, logs, and proxy work need handles too. |
| Bandwidth disabled | If message rate or payload size is zero, egress is not a finite bottleneck. | Idle-only models should be rerun with realistic push traffic before launch. |
| Validation pause | Peak sockets, tested cap, memory, RAM, egress, and descriptor inputs must clear minimum checks. | Warnings mean the exported plan should not be used yet. |
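The boundary rules in this table can be illustrated with a small Python sketch. It is an assumption-laden example rather than the tool's implementation: the function names are hypothetical, a disabled bandwidth model is represented as infinity, and the validation pause is represented as a `ValueError`.

```python
import math

FD_RESERVE = 1_024  # descriptors reserved before client sockets are counted

def egress_cap(egress_mbps, net_util, payload_bytes, msgs_per_min, overhead):
    bits_per_socket = payload_bytes * (1 + overhead) * 8 * msgs_per_min / 60
    if bits_per_socket <= 0:
        # Zero message rate or payload: egress is not a finite bottleneck.
        return math.inf
    # Derived caps are rounded down; fractional sockets are not usable.
    return math.floor(egress_mbps * 1_000_000 * net_util / bits_per_socket)

def descriptor_cap(fd_limit):
    if fd_limit <= FD_RESERVE:
        # Stand-in for the validation pause described in the table.
        raise ValueError("descriptor limit must exceed the runtime reserve")
    return fd_limit - FD_RESERVE

def safe_cap(*caps):
    # Only finite ceilings compete for the bottleneck.
    return min(c for c in caps if math.isfinite(c))

idle_egress = egress_cap(1_000, 0.65, payload_bytes=900, msgs_per_min=0, overhead=0.18)
print(idle_egress)                                            # inf
print(safe_cap(30_000, idle_egress, descriptor_cap(65_536)))  # 30000
```

An infinite egress ceiling in a result is a prompt to rerun the model with realistic push traffic, not evidence that bandwidth is unlimited.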
The model assumes a reasonably even connection distribution across serving nodes after the N+ reserve is removed. Sticky sessions, least-connections balancing, zone imbalance, and slow drains can violate that assumption. Compare the recommended per-node load with the actual highest-loaded node during tests, not only the average across the pool.
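A quick way to check that assumption against test data is to compare the hottest node with the safe cap rather than the pool average. The sketch below uses made-up per-node counts and a hypothetical function name; it only illustrates the comparison.

```python
def skew_report(per_node_connections, safe_cap_per_node):
    """Compare the hottest node, not just the average, with the safe cap."""
    hottest = max(per_node_connections)
    average = sum(per_node_connections) / len(per_node_connections)
    return {
        "hottest": hottest,
        "average": average,
        "skew_ratio": hottest / average,
        "hottest_over_cap": hottest > safe_cap_per_node,
    }

# Hypothetical observation: 93,750 sockets spread unevenly over four nodes.
report = skew_report([22_000, 24_000, 31_000, 16_750], safe_cap_per_node=30_000)
print(report)
```

In this made-up sample the average is 23,437.5 sockets per node, comfortably under the 30,000 cap, while the hottest node is already over it.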
Accuracy and Privacy Notes:
This is a capacity model, not a production guarantee. Treat it as a planning and review aid that must be validated with WebSocket-specific load tests, soak tests, controlled deploy drains, reconnect drills, and regional failover exercises.
- Measure per-connection memory with the real authentication state, subscription fanout, compression setting, TLS location, heartbeat cadence, and buffer behavior.
- Check operating system limits such as file descriptors, ephemeral ports, TCP keepalive, accept queues, and proxy or load balancer idle timeouts.
- Do not assume the tested cap survives churn. Opening, closing, reconnecting, and resubscribing clients can be more stressful than a stable idle socket count.
- The calculation runs in the browser and does not need to upload the entered capacity plan. Copied briefs, CSV files, document exports, JSON, and saved URLs can still reveal private architecture assumptions if shared.
Worked Examples:
Chat and presence default. A 75,000-socket peak with 25% planning reserve produces 93,750 target sockets. With a 30,000 tested cap per node, four serving nodes are needed, and N+1 reserve raises the recommendation to five total nodes. A current three-node pool has only two serving nodes after reserve, so the node gap is two.
Push-heavy fanout. Keep the same 75,000-socket peak, but raise outbound traffic to 600 messages per socket per minute with 1,200-byte payloads and 25% overhead. Each socket then needs about 120 kbps of egress, so the egress model falls to about 5,416 safe sockets per node on a 1,000 Mbps node with a 65% network ceiling. The recommendation jumps to 19 nodes under N+1 (18 serving nodes plus one reserve node) because bandwidth, not socket count, drives the plan.
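The push-heavy arithmetic can be checked with a few lines of Python. The inputs are the ones stated in the example; the variable names and the conversion of 1 Mbps to 1,000,000 bits per second are assumptions of this sketch.

```python
import math

# Inputs from the push-heavy example.
target_sockets = 75_000 * 1.25          # 25% planning reserve
payload_bytes, overhead = 1_200, 0.25
msgs_per_min = 600
egress_mbps, net_util = 1_000, 0.65
tested_cap, reserve_nodes = 30_000, 1

# Bits per second each socket pushes out, including protocol overhead.
bits_per_socket = payload_bytes * (1 + overhead) * 8 * msgs_per_min / 60
egress_cap = math.floor(egress_mbps * 1_000_000 * net_util / bits_per_socket)

safe_cap = min(tested_cap, egress_cap)
serving_nodes = math.ceil(target_sockets / safe_cap)
pool_nodes = serving_nodes + reserve_nodes

print(bits_per_socket, egress_cap, serving_nodes, pool_nodes)
```

The egress ceiling is well under the 30,000 tested cap, so adding memory or raising descriptor limits would not change this plan.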
IoT telemetry default. A 160,000-device dashboard with 30% reserve targets 208,000 sockets. The default N+2 posture recommends seven total nodes: five serving nodes plus two reserve nodes. The node gap is one when the current pool has six nodes, even though the target egress is only about 27.5 Mbps, because the tested per-node runtime cap still controls the model.
Invalid descriptor input. If the file descriptor limit is set to 1,024 or lower, the model pauses because the descriptor limit must exceed the reserved runtime allowance. Raise the limit or correct the input before copying the plan.
FAQ:
Why include a tested cap when memory, bandwidth, and descriptors are already modeled?
The tested cap captures runtime behavior that simple resource math misses, including event-loop delay, garbage collection, heartbeat stability, TLS work, proxy behavior, and reconnect churn.
Should reserve be added before or after node count?
Planning reserve is added before serving-node count because it increases demand. N+ reserve is added after serving-node count because those nodes are held back for failover, drains, or uneven distribution.
Why can bandwidth show as not limiting?
If outbound message rate or payload size is zero, the bandwidth formula has no finite per-socket traffic to divide into node egress. Enter realistic fanout values for push-heavy systems.
Can a single number size a global WebSocket service?
Use the model per region, shard, or traffic cell. Global systems also need traffic steering, regional failover assumptions, subscription replication, and connection distribution checks.
What should be tested after the recommended node count looks acceptable?
Run a test at the planned per-node connection count, then add churn, deploy draining, reconnect waves, broker replay, and the largest expected fanout event. The steady-state number is only one part of the capacity review.
Glossary:
- WebSocket: A persistent, full-duplex connection that starts with an HTTP Upgrade handshake and then carries framed messages between client and server.
- Fanout: Sending one event or update to many connected clients, often the main outbound bandwidth driver.
- Descriptor: An operating system handle used for sockets, files, pipes, logs, and related resources.
- Soak test: A long-running load test that exposes leaks, slow memory growth, heartbeat misses, and connection churn behavior.
- Failover capacity: The connection capacity left after reserve nodes are removed from the node pool.
- N+ reserve: Whole nodes held aside so the fleet can keep serving after a node loss, deploy drain, or similar capacity reduction.