Token billing is the basic meter behind most large language model APIs. Providers usually quote separate prices for input and output tokens, and your own traffic volume determines what those rates mean in practice. This calculator turns that pricing structure into per-request, daily, and monthly cost estimates.
That matters when you are sizing a prototype, checking whether a launch fits a team budget, or comparing one prompt design against another. A request that looks cheap in isolation can become expensive once retries, fixed fees, and billing days are added back in.
The package combines prompt tokens, completion tokens, requests per day, billing days, and either a preset or custom rate card. It then layers in cached prompt pricing, retry overhead, margin uplift, fixed monthly fees, growth, and an optional budget cap so the estimate matches operational planning better than a bare price-table lookup.
A modest change in output length can move spend as much as a headline rate change, and a retry multiplier can quietly turn 10,000 successful requests into 12,000 billable attempts. The reverse is also true: shorter completions or a strong cache hit rate can materially reduce monthly totals without shrinking traffic.
These figures are planning estimates, not invoice guarantees. The built-in prompt estimator is heuristic, provider tokenizers and caching rules differ, and the bundled preset rates can drift from current vendor price sheets. Use the result for budgeting and comparison, not procurement approval or financial advice.
Start with the workload you can describe with the least guesswork: typical prompt tokens, typical completion tokens, and successful requests per day. If you already know your contract rates, switch to Custom or override the preset values in Advanced. If you do not, a preset is a reasonable first-pass baseline.
- Use Prompt draft to rough in Prompt tokens, not to certify an invoice number. The package auto-fills prompt tokens until you manually override that field.
- Set Retry multiplier early if fallbacks, streaming reconnects, or transient errors are common. A multiplier of 1.20 means the daily volume is treated like 20% more attempts.
- Treat Cache hit rate as prompt-side relief only. Completion tokens never receive a cached discount in this model.
- Compare Monthly total (tokens + fees) in Summary with Budget headroom / overage in Efficiency before changing assumptions.

After the first pass, compare Summary and Components instead of focusing only on Per request. Most budget surprises come from traffic, retries, or fixed fees rather than from the single-request figure.
The calculator tracks two token buckets per request: prompt tokens and completion tokens. Each bucket is billed against a dollars-per-1,000-token rate. Prompt tokens can be split between a normal rate and a cached rate according to Cache hit rate, which gives the package an effective prompt price before any request-volume scaling happens.
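The per-request step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the package's source: the function and parameter names are hypothetical, rates are in USD per 1,000 tokens, and Cache hit rate is taken as a percentage from 0 to 100.

```python
def effective_prompt_rate(prompt_rate, cached_rate, cache_hit_pct):
    """Blend the normal and cached prompt rates (USD per 1K tokens).

    Names are hypothetical; cache_hit_pct is assumed to be a 0-100 percentage.
    """
    hit = cache_hit_pct / 100.0
    return (1.0 - hit) * prompt_rate + hit * cached_rate


def per_request_token_cost(prompt_tokens, completion_tokens,
                           prompt_rate, cached_rate, completion_rate,
                           cache_hit_pct=0.0):
    """Token cost of one request, before margin and volume scaling."""
    blended = effective_prompt_rate(prompt_rate, cached_rate, cache_hit_pct)
    prompt_cost = prompt_tokens / 1000.0 * blended
    # Completion tokens never receive the cached discount in this model.
    completion_cost = completion_tokens / 1000.0 * completion_rate
    return prompt_cost + completion_cost
```

Only the prompt bucket passes through the blended rate, so raising the cache hit rate leaves the completion share of the cost unchanged.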
Volume is then expanded in two stages. Requests per day becomes effective daily traffic after the lower-bounded Retry multiplier is applied, and monthly usage multiplies that traffic by Billing days per month. Margin uplift multiplies the token cost, while Fixed monthly fees are added only after token spend has been computed.
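A rough Python sketch of the two-stage volume expansion follows. The function names are hypothetical, and treating Margin uplift as a percentage is an assumption; the text only says that it multiplies the token cost.

```python
def effective_daily_requests(requests_per_day, retry_multiplier):
    """Successful requests per day scaled by the retry multiplier (floored at 1)."""
    return requests_per_day * max(retry_multiplier, 1.0)


def monthly_total(token_cost_per_request, requests_per_day, retry_multiplier,
                  billing_days, margin_uplift_pct=0.0, fixed_monthly_fees=0.0):
    """Monthly estimate: token spend with margin, then fixed fees on top.

    ASSUMPTION: margin_uplift_pct is a percentage (20 means +20%).
    """
    daily_requests = effective_daily_requests(requests_per_day, retry_multiplier)
    token_spend = token_cost_per_request * daily_requests * billing_days
    with_margin = token_spend * (1.0 + margin_uplift_pct / 100.0)
    # Fixed fees are added only after token spend has been computed.
    return with_margin + fixed_monthly_fees
```

For example, `effective_daily_requests(10_000, 1.20)` gives roughly 12,000 billable attempts, and a multiplier below 1 is treated as 1.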
The scenario outputs reuse the same per-request cost model. The growth line multiplies daily requests by 1 + growthPercent / 100. The peak line uses whichever is larger: that growth multiplier plus 0.25, or a flat 1.5. In other words, the chart is a stress view of traffic, not a forecasting engine.
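The scenario multipliers can be sketched as follows. The function name and the dictionary keys are hypothetical, and the peak rule is read as the maximum of (growth multiplier + 0.25) and 1.5.

```python
def scenario_multipliers(growth_percent):
    """Traffic multipliers for the base, growth, and peak scenario lines."""
    growth = 1.0 + growth_percent / 100.0
    # Peak is the larger of (growth multiplier + 0.25) and a flat 1.5.
    peak = max(growth + 0.25, 1.5)
    return {"base": 1.0, "growth": growth, "peak": peak}
```

Under this reading, the 1.5 floor sets the peak line until growth exceeds 25%. At 10% growth the peak multiplier is 1.5, and at 50% growth it is 1.75.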
These equations describe the billing model used for the main summary outputs.
Here, P is prompt tokens, O is completion tokens, H is cache-hit ratio from 0 to 1, Q is successful requests per day, S is retry multiplier, D is billing days, M is margin uplift, and F is fixed monthly fees.
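Using those symbols, the model can be written out as below. This is a reconstruction from the prose description rather than a quoted formula: the rate symbols $r_p$ (prompt), $r_c$ (cached prompt), and $r_o$ (completion), all in USD per 1,000 tokens, are assumed names, and $M$ is assumed to be expressed as a fraction (0.20 for a 20% uplift).

```latex
\begin{aligned}
r_{\text{eff}}   &= (1 - H)\, r_p + H\, r_c \\
C_{\text{req}}   &= \left( \frac{P}{1000}\, r_{\text{eff}} + \frac{O}{1000}\, r_o \right) (1 + M) \\
C_{\text{day}}   &= C_{\text{req}} \cdot Q \cdot \max(S, 1) \\
C_{\text{month}} &= C_{\text{day}} \cdot D + F
\end{aligned}
```

With $H = 0$, $S = 1$, $M = 0$, and $F = 0$, the last line reduces to the per-request token cost multiplied by $Q \cdot D$.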
When you paste text into Prompt draft, the package estimates prompt size from two rough lenses: characters and words. It computes a character-based estimate with approximately 4 characters per token, computes a word-based estimate at about 1.32 tokens per word, averages the two, rounds to the nearest whole token, and writes that number back to Prompt tokens until you manually edit that field.
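A minimal Python sketch of that heuristic is shown below. The function name is hypothetical, and two details are assumptions: characters are counted with whitespace included, and words are split on whitespace. Python's `round` also resolves exact .5 ties to the even integer, which may differ from the package's rounding.

```python
def estimate_prompt_tokens(text):
    """Heuristic token estimate: average of a character lens and a word lens."""
    by_chars = len(text) / 4.0            # about 4 characters per token
    by_words = len(text.split()) * 1.32   # about 1.32 tokens per word
    return round((by_chars + by_words) / 2.0)
```

For the string `"hello world"`, the character lens gives 2.75 and the word lens gives 2.64, so the estimate rounds to 3 tokens.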
| Output Field | Meaning | Reading Note |
|---|---|---|
| Per request | Unit cost for one successful request after margin | Does not include daily volume by itself |
| Monthly total (tokens + fees) | Projected month cost including fixed fees | Primary budget review figure |
| Cache savings (...) | Prompt-side cost reduction from cached pricing | Appears only when cache hit rate is above zero |
| Budget headroom / overage | Difference between budget cap and monthly total | Positive means headroom, negative means overage |
| Effective cost per 1K tokens | Blended cost intensity for your configured workload | Useful for comparing setups, not invoices |
The guardrails are simple and deterministic. Negative numeric values are coerced to zero where applicable, Cache hit rate is clamped to 0 to 100, and Retry multiplier is floored at 1. That means a monthly total of zero often signals missing traffic inputs rather than a broken rate formula.
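The guardrails can be pictured with a small sketch like the one below. The function name and the choice of fields are hypothetical; the package may apply the same rules to more inputs than the four shown here.

```python
def sanitize_inputs(requests_per_day, billing_days, cache_hit_pct, retry_multiplier):
    """Apply the described guardrails to a few representative inputs."""
    return {
        # Negative numeric values are coerced to zero.
        "requests_per_day": max(requests_per_day, 0.0),
        "billing_days": max(billing_days, 0.0),
        # Cache hit rate is clamped to the 0-100 range.
        "cache_hit_pct": min(max(cache_hit_pct, 0.0), 100.0),
        # Retry multiplier is floored at 1.
        "retry_multiplier": max(retry_multiplier, 1.0),
    }
```

Calling `sanitize_inputs(-5, 30, 140, 0.8)` yields zero requests per day, a 100% cache hit rate, and a retry multiplier of 1, which is how a negative traffic entry ends up as a zero monthly total.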
Use this flow when you want a cost number that survives a quick sanity check.
1. Choose a Preset that is close to your model family, or choose Custom if you already know the exact rate card you need.
2. Enter Prompt tokens and Completion tokens. If prompt size is still rough, paste representative text into Prompt draft and let the package seed the prompt-token field.
3. Set Requests per day and Billing days per month. These two fields decide whether your unit cost stays a small lab number or becomes an operating number.
4. Open Advanced for the billing assumptions that change spend most: Prompt rate, Prompt cached rate, Cache hit rate, Completion rate, Retry multiplier, Margin uplift, Fixed monthly fees, Growth scenario, and Monthly budget cap.
5. Read Summary first. Confirm that Per request, the daily row, and Monthly total (tokens + fees) all move in the direction you expect.
6. Review Components and Efficiency to see what is driving spend, then open Cost Forecast for the base, growth, and peak scenarios.

If the summary still looks odd, check request volume and billing days before you change prices. Those two inputs explain many zero or unexpectedly tiny totals.
Treat Per request as unit cost, not total impact. The number that usually matters operationally is Monthly total (tokens + fees), because that is where token size, effective traffic, retries, and fixed fees all meet.
- A positive Budget headroom / overage means the current month estimate is still inside the cap. A negative value means the package projects the cap will be exceeded.
- A lower Effective cost per 1K tokens does not guarantee a lower real invoice if your pasted prompt omits system text, retrieval context, or provider-specific tokenization behavior.
- Cache savings (...) only reflects prompt-side discounts. A long completion can still dominate spend even when the cache hit rate is high.
- If Per request looks reasonable but daily or monthly totals are zero, verify Requests per day and Billing days per month before trusting the forecast.

Use Summary for the decision, Efficiency for the sanity check, and the chart as a scenario view after the tabular totals already make sense.
The package defaults are Prompt tokens = 1400, Completion tokens = 600, Requests per day = 240, Billing days per month = 30, and the GPT-4o preset rates. With those inputs, the Summary table returns Per request = $ 0.0095, Daily (240 requests incl. retries) = $ 2.28, and Monthly total (tokens + fees) = $ 68.40. In Efficiency, Effective cost per 1K tokens is $ 0.0048. That is a clean baseline for comparing later tweaks.
Keep the same workload, set Cache hit rate = 60, and add Monthly budget cap = $60. The blended prompt rate falls, Per request drops to $ 0.0085, Cache savings (60% hit) shows -$ 7.56 for the month, and Monthly total (tokens + fees) lands at $ 60.84. The budget badge still reports overage because Budget headroom / overage is -$ 0.84. This is the useful edge case: the package shows that an efficiency win can still miss the cap.
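The two scenarios above can be reproduced with a short Python sketch. The three rates are an assumption: they are values in USD per 1,000 tokens that are consistent with the stated results, not figures quoted from the package's preset table. The function name is hypothetical, and retries, margin, and fixed fees are left at their neutral values.

```python
# ASSUMPTION: rates inferred from the stated results, in USD per 1K tokens.
PROMPT_RATE = 0.0025
CACHED_RATE = 0.00125
COMPLETION_RATE = 0.01


def monthly_estimate(prompt_tokens, completion_tokens, requests_per_day,
                     billing_days, cache_hit_pct=0.0):
    """Per-request cost, monthly cost, and cache savings for one workload."""
    hit = cache_hit_pct / 100.0
    blended = (1 - hit) * PROMPT_RATE + hit * CACHED_RATE
    completion_cost = completion_tokens / 1000 * COMPLETION_RATE
    per_request = prompt_tokens / 1000 * blended + completion_cost
    uncached = prompt_tokens / 1000 * PROMPT_RATE + completion_cost
    requests = requests_per_day * billing_days
    return {
        "per_request": per_request,
        "monthly": per_request * requests,
        "cache_savings": (uncached - per_request) * requests,
    }


baseline = monthly_estimate(1400, 600, 240, 30)
cached = monthly_estimate(1400, 600, 240, 30, cache_hit_pct=60)
headroom = 60.0 - cached["monthly"]  # budget cap of $60

print(f"baseline monthly: {baseline['monthly']:.2f}")      # 68.40
print(f"cached monthly:   {cached['monthly']:.2f}")        # 60.84
print(f"cache savings:    {cached['cache_savings']:.2f}")  # 7.56
print(f"headroom:         {headroom:.2f}")                 # -0.84
```

Under these assumed rates the unrounded per-request cost with caching is $0.00845, which is why 7,200 monthly requests come to $60.84 rather than the $61.20 that the rounded $0.0085 display would suggest.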
Suppose the token fields and rates are filled, but Requests per day stays at 0. The tool still calculates Per request, yet the daily row and Monthly total (tokens + fees) stay at $ 0.00 because there is no traffic to multiply. Once Requests per day is restored to a real workload, the monthly forecast populates immediately. That is a correction case, not a pricing bug.
Use this tool for cost planning, scenario comparison, and early budget conversations. Do not treat it as an invoice, an accounting record, or proof that a deployment will stay within contract terms once provider discounts, taxes, rounding policy, or tiered pricing are applied.
If a decision carries real procurement or financial risk, verify the assumptions against the provider's current rate card and your own traffic logs before committing spend.
The prompt estimate can differ from a provider's token count because the package uses a rough blend of character count and word count. Real tokenizers vary by model, language, and formatting, so the estimator is a starting point rather than a billing-grade counter.
Completion tokens do not receive the cached discount. In this model the discount applies only to prompt tokens, and completion tokens always use Completion rate.
When the monthly total is zero, the usual causes are Requests per day = 0 or Billing days per month = 0. The unit cost can still exist even when traffic volume is missing.
The package does not enforce the budget cap. It only compares the projected month against your cap and reports headroom or overage.
Inputs are not sent to a server. This tool has no server-side processing path, so the calculations stay in the browser.
The presets are not live vendor prices. They are bundled package values. Check vendor pricing pages and update the advanced rate fields when accuracy matters.