Introduction:

Large language model billing usually starts with a simple rule and ends with a messy planning question. Providers commonly charge one rate for prompt tokens and another for completion tokens, but the real month total also depends on how many requests you serve, how often retries happen, whether repeated prompt text is cached, and which fixed platform costs sit on top of token spend. This calculator turns those moving parts into one workload model so you can estimate per-request cost, daily run-rate, and monthly spend from the same assumptions.

That makes it useful well before an invoice exists. You can size a prototype, compare one model card against another, check whether longer answers are still affordable, or test whether a monthly cap survives growth. The tool accepts either a public pricing preset or custom rates, then layers in traffic, retry overhead, cached prompt reads, margin uplift, fixed monthly fees, a growth scenario, and an optional budget cap.

The result is not just a headline number. You get a summary table, a component breakdown, a usage table that separates successful requests from billed attempts, a unit-economics view, ranked savings suggestions in Spend Pressure, a cross-provider Model Ladder, a Scenario Burn chart, and a structured JSON export. Table exports support copy, CSV download, and DOCX export, while the chart tabs add image and CSV downloads for the visual views.

  • Prompt tokens: input size per request
  • Completion tokens: output size per request
  • Rate card: preset or custom, with cached prompt reads
  • Workload: requests, billing days, retries, growth, and peak stress
  • Variable spend: per-request cost times billed attempts
  • Monthly total: variable spend plus fixed monthly fees
  • Budget check: headroom or overage, plus safe request envelopes
The calculator keeps token size, traffic, and budget review separate so you can see which lever is actually driving the month total.

All calculations stay in the browser. The tool has no server-side pricing path, so pasted prompt drafts and manual assumptions are processed locally on the page.

Technical Details:

The calculator builds monthly spend from four layers. First, it prices prompt tokens and completion tokens separately: prompt tokens can be blended between the normal prompt rate and the cached-read rate according to the cache hit percentage, while completion tokens always use the completion rate. Second, it converts successful daily traffic into billed attempts by multiplying requests per day by the retry multiplier. Third, it applies any margin uplift to the variable token cost. Fourth, it adds fixed monthly fees after the usage-driven cost has been calculated.

The main summary rows use one baseline formula set. Growth and peak views do not introduce a new pricing method; they rerun the same per-request economics at higher billed-attempt counts. The growth month applies your configured growth percentage, while the peak month applies whichever multiplier is larger: the growth percentage plus another 25 percentage points, or 1.50 times the base billed-attempt rate. That makes the chart a workload stress test rather than a forecasting engine.
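As a sketch of that rule (the function name is hypothetical; the page does not publish its internals), the scenario multipliers depend only on the configured growth percentage:

```typescript
// Growth and peak months reuse the same per-request economics; only the
// billed-attempt multiplier changes (hypothetical sketch of the rule).
function scenarioMultipliers(growthPct: number): { growth: number; peak: number } {
  const growth = 1 + growthPct / 100;
  // Peak: growth plus another 25 percentage points, floored at 1.50x base.
  const peak = Math.max(1 + (growthPct + 25) / 100, 1.5);
  return { growth, peak };
}
```

At a 20% growth setting the 1.50× peak floor wins; at 35% growth the peak multiplier rises to 1.60×.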

R_prompt,eff = R_prompt × (1 − H) + R_cached × H
A_billed = Q × max(S, 1)
C_request = (P/1000 × R_prompt,eff + O/1000 × R_out) × (1 + M/100)
C_month = C_request × A_billed × D + F

In those expressions, P is prompt tokens per request, O is completion tokens per request, H is the cache-hit ratio from 0 to 1, Q is successful requests per day, S is the retry multiplier, M is the margin percentage, D is billing days per month, and F is fixed monthly fees.
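Under those definitions, the whole pricing chain fits in one small function. This is a sketch of the stated formulas, not the page's actual source; all names are hypothetical:

```typescript
interface CostInputs {
  promptTokens: number;      // P: prompt tokens per request
  completionTokens: number;  // O: completion tokens per request
  promptRate: number;        // R_prompt: $ per 1K prompt tokens
  cachedRate: number;        // R_cached: $ per 1K cached-read tokens
  completionRate: number;    // R_out: $ per 1K completion tokens
  cacheHitRate: number;      // H: 0..1 share billed at the cached rate
  requestsPerDay: number;    // Q: successful requests per day
  retryMultiplier: number;   // S: billed attempts per successful request
  marginPct: number;         // M: margin uplift in percent
  billingDays: number;       // D: billing days per month
  fixedFees: number;         // F: fixed monthly fees in $
}

function monthlyCost(i: CostInputs) {
  // Layer 1: blend the prompt rate with the cached-read rate.
  const effPromptRate =
    i.promptRate * (1 - i.cacheHitRate) + i.cachedRate * i.cacheHitRate;
  // Layer 2: successful traffic becomes billed attempts.
  const billedPerDay = i.requestsPerDay * Math.max(i.retryMultiplier, 1);
  // Layer 3: per-request variable cost with margin uplift.
  const perRequest =
    ((i.promptTokens / 1000) * effPromptRate +
     (i.completionTokens / 1000) * i.completionRate) *
    (1 + i.marginPct / 100);
  // Layer 4: fixed fees land after the usage-driven total.
  const monthly = perRequest * billedPerDay * i.billingDays + i.fixedFees;
  return { perRequest, billedPerDay, monthly };
}
```

Running it with the baseline workload from the worked examples below reproduces the $0.0095 per-request and $68.40 monthly figures.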

The prompt draft box is only an estimator. The page normalizes the pasted text, counts both characters and words, estimates tokens from roughly four characters per token and 1.32 tokens per word, averages the two estimates, then rounds to the nearest whole token. That is a practical starting point for English-like text, but real tokenizer counts still vary by model, language, punctuation, hidden system text, and retrieved context.
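A minimal version of that heuristic, assuming a simple whitespace normalization (the page's exact normalization is not documented):

```typescript
// Rough token estimate: average a character-based and a word-based guess.
function estimateTokens(text: string): number {
  const normalized = text.trim().replace(/\s+/g, " ");
  if (normalized.length === 0) return 0;
  const byChars = normalized.length / 4;            // ~4 characters per token
  const byWords = normalized.split(" ").length * 1.32; // ~1.32 tokens per word
  return Math.round((byChars + byWords) / 2);
}
```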

Main cost levers in the LLM usage cost calculator
Lever | How the calculator uses it | What to keep in mind
Prompt tokens | Priced at the effective prompt rate after any cached-read blend | Long system prompts and retrieval context often hide here
Completion tokens | Priced only at the completion rate | Shorter answers often cut cost faster than model swaps
Cache hit rate | Moves a share of prompt tokens onto the cached-read price | This tool does not add cache-write or cache-storage charges
Retry multiplier | Turns successful requests into billed attempts | Retry cost can rise even when user-facing traffic stays flat
Fixed monthly fees | Added after the variable token total is calculated | Useful for platform, observability, or support costs
Budget cap | Compares the monthly total against a ceiling and derives safe envelopes | It does not change the cost math; it only evaluates the result

The preset ladder covers public rate cards from OpenAI, Anthropic, Google, Mistral, and Cohere. If you switch to Custom, the live rate fields stay editable and the ladder still compares your current rates against the verified public presets. Some rows may also carry provider status notes, which is useful when a public card is already deprecated but still helpful as a comparison point.
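The ladder amounts to repricing one fixed workload across rate cards and sorting. A minimal sketch with invented cards and names:

```typescript
// Reprice one fixed workload across rate cards; only the card changes.
interface Card { label: string; promptRate: number; completionRate: number } // $ per 1K tokens

function rankCards(cards: Card[], promptTokens: number, completionTokens: number) {
  return cards
    .map(c => ({
      label: c.label,
      perRequest: (promptTokens / 1000) * c.promptRate +
                  (completionTokens / 1000) * c.completionRate,
    }))
    .sort((a, b) => a.perRequest - b.perRequest); // cheapest first
}
```

The real ladder also carries retries, margin, cache blend, and fixed fees through each row, but the ranking principle is the same: the workload stays fixed while the rates change.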

Everyday Use & Decision Guide:

Start with the parts of the workload you can measure directly. Typical prompt size, typical answer size, successful requests per day, and active billing days usually give a better first estimate than trying to guess a perfect model choice up front. Once those numbers look realistic, move to the pricing preset or custom card that best matches the workload you are planning.

If you are trying to reduce spend, it helps to separate three different questions. Token-size questions live in Prompt tokens and Completion tokens. Reliability questions live in Retry multiplier. Budget-floor questions live in Fixed monthly fees. Mixing those together can make a cost problem feel harder than it is, so the tool keeps them in separate tables and recommendations.

  • Use Prompt draft when you need a fast estimate, but replace it with observed token counts when you move from rough planning to launch budgeting.
  • Set Retry multiplier early if your flow can fan out, reconnect, or fall back. A small retry increase can matter more than a minor price-card change.
  • Treat Cache hit rate as prompt-side relief only. In this model, completion tokens never get a cache discount.
  • Add Fixed monthly fees if you want the result to reflect the real operating floor rather than token spend alone.
  • Set Monthly budget cap when you need to answer capacity questions such as how many requests per day or how many tokens per request fit inside a monthly target.

When a preset shows a status note, treat that row as a planning reference, not an automatic recommendation. A deprecated public card may still be worth comparing for cost history, but it is rarely the right answer for fresh production work.

Step-by-Step Guide:

  1. Choose a preset that matches the provider card you want to test, or switch to Custom if you already know the exact prompt, cached-read, and completion rates.
  2. Enter Prompt tokens and Completion tokens. If prompt size is still rough, paste representative text into Prompt draft and let the estimator seed the prompt field.
  3. Set Requests per day and Billing days per month. These two fields turn a unit-cost idea into an operating-cost estimate.
  4. Open Advanced and add the assumptions that change planning decisions most: cached-read pricing, cache hit rate, retry multiplier, margin uplift, fixed monthly fees, growth scenario, and budget cap.
  5. Read Summary first to confirm the per-request, daily, and monthly totals move the way you expect.
  6. Use Components, Usage, Unit Economics, Spend Pressure, Model Ladder, and Scenario Burn to figure out whether the next move should be shorter prompts, shorter answers, fewer retries, a different model card, or a tighter budget target.

Interpreting Results:

Per request is the unit price of the current mix. It matters, but it is usually not the decision number. The operational number is Monthly total (tokens + fees), because that is where token size, billed attempts, margin, and fixed fees finally meet. A cheap-looking request can still miss a cap once it is repeated enough times.

Components needs a careful read. Those rows are diagnostic layers, not a strict additive ledger. Cached-read relief and retry overhead describe pressure already folded into the billed total, so you should use them to explain the current month rather than sum them again by hand.

How to read the main result areas in the LLM usage cost calculator
Result area | Best question it answers | Common misread
Summary | What will this workload cost per request, per day, and per month? | Treating per-request cost as the whole budget story
Components | Which billing layer is pushing the total upward? | Adding diagnostic rows together as if they were separate charges
Usage | How much traffic is successful volume versus retry overhead? | Ignoring the difference between requests and billed attempts
Unit Economics | How expensive is the current mix per 1K or 1M tokens, and what fits under the cap? | Assuming a lower effective token cost automatically means a lower invoice under every workload
Spend Pressure | Which single change saves the most money from the current baseline? | Taking a model switch as a quality recommendation instead of a cost-only comparison
Model Ladder | How does the same workload price out across verified preset cards? | Forgetting that only the rate card changes while all workload assumptions stay fixed
Scenario Burn | What happens if prompt size, answer size, cache share, growth, or peak demand changes? | Reading it as a long-range forecast instead of a nearby stress test

Budget results are straightforward. Positive headroom means the configured month still fits under the cap. Overage means it does not. The derived request and token ceilings are useful planning aids, but they are only as good as the workload assumptions already entered above them.
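Those derived ceilings can be sketched as follows (hypothetical names; the envelope here divides the cap, net of fixed fees, by per-request cost and billed attempts):

```typescript
// Compare the month against the cap and derive a safe requests/day envelope.
function budgetFit(
  monthlyTotal: number, cap: number, perRequest: number,
  retryMultiplier: number, billingDays: number, fixedFees: number,
): { headroom: number; maxRequestsPerDay: number } {
  const headroom = cap - monthlyTotal; // positive = fits, negative = overage
  // Only the variable part of the cap can buy billed attempts.
  const variableBudget = Math.max(cap - fixedFees, 0);
  const maxBilledPerDay = variableBudget / (perRequest * billingDays);
  const maxRequestsPerDay = Math.floor(maxBilledPerDay / Math.max(retryMultiplier, 1));
  return { headroom, maxRequestsPerDay };
}
```

With the cached-read worked example below ($0.00845 per request against a $60 cap over 30 days, no retries or fixed fees), the envelope works out to 236 safe requests per day.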

Worked Examples:

1. Baseline GPT-4o planning run

With the default setup, the workload uses 1,400 prompt tokens, 600 completion tokens, 240 requests per day, 30 billing days, and the GPT-4o preset card. The calculator returns Per request = $0.0095, Daily = $2.28, and Monthly total (tokens + fees) = $68.40 across 14.4 million monthly tokens. The growth scenario rises to $82.08 and the peak month rises to $102.60. That is a clean baseline because the rate card, token mix, and traffic are all visible in one place.
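Assuming the GPT-4o preset bills $2.50 per million prompt tokens and $10.00 per million completion tokens, the baseline numbers reproduce directly:

```typescript
// Baseline: 1,400 prompt + 600 completion tokens, 240 requests/day, 30 days.
// Rates are the assumed GPT-4o card, expressed in $ per 1K tokens.
const promptRate = 0.0025;   // $2.50 per 1M prompt tokens
const completionRate = 0.01; // $10.00 per 1M completion tokens
const perRequest = (1400 / 1000) * promptRate + (600 / 1000) * completionRate; // $0.0095
const daily = perRequest * 240;                // $2.28
const monthly = daily * 30;                    // $68.40
const monthlyTokens = (1400 + 600) * 240 * 30; // 14,400,000 tokens
```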

2. Strong prompt caching still misses a tight cap

Keep the same baseline workload, set Cache hit rate to 60%, and add a Monthly budget cap of $60. The blended prompt card drops enough to move Per request to $0.00845, and the summary shows about $7.56 in monthly cached-read relief. Even so, the monthly total lands at $60.84, so the budget result still shows overage. This is a good example of why efficiency wins and budget fit are related but not identical questions.
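With an assumed cached-read rate of $1.25 per million tokens, the blended numbers check out:

```typescript
// Same baseline workload, 60% cache hit rate, cached reads at an assumed
// $1.25 per 1M tokens ($0.00125 per 1K).
const promptRate = 0.0025, cachedRate = 0.00125, completionRate = 0.01;
const hit = 0.6;
const effPrompt = promptRate * (1 - hit) + cachedRate * hit; // $0.00175 per 1K
const perRequest = 1.4 * effPrompt + 0.6 * completionRate;   // $0.00845
const monthly = perRequest * 240 * 30;                       // $60.84
// Relief = what the uncached month would have cost minus the cached month.
const relief = ((1.4 * promptRate + 0.6 * completionRate) - perRequest) * 240 * 30; // ~$7.56
const overage = monthly - 60; // still ~$0.84 over the $60 cap
```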

3. Retry overhead plus fixed fees in a heavier custom workload

Suppose a custom workflow averages 2,200 prompt tokens, 900 completion tokens, 1,200 successful requests per day, 22 billing days, 40% cached prompt reads, a 1.12 retry multiplier, 15% margin uplift, and $250 in fixed monthly fees. The calculator treats that as 1,344 billed attempts per day and produces a monthly total of $705.64, of which $250 is the fixed-fee floor. With a 35% growth scenario the month rises to $865.12, and the peak month reaches $979.03. This is the kind of setup where Usage, Components, and Spend Pressure usually matter more than the headline rate card.
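Working backward from the stated totals (a consistency check rather than the tool's internal math):

```typescript
// Back out the pieces of the heavier custom workload from the stated totals.
const billedPerDay = Math.round(1200 * 1.12); // 1,344 billed attempts per day
const variableSpend = 705.64 - 250;           // token spend above the $250 fixed floor
const growthMonth = variableSpend * 1.35 + 250;      // ~$865.12 at 35% growth
const peakMult = Math.max(1 + (35 + 25) / 100, 1.5); // 1.60: growth + 25 points wins
const peakMonth = variableSpend * peakMult + 250;    // ~$979.03
```

Note that only the variable spend scales under growth and peak; the $250 fixed floor is re-added unchanged.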

FAQ:

Why does the prompt draft estimate differ from provider token counts?

Because the page uses a blended character-and-word heuristic, not a provider tokenizer. It is meant to get you close enough for planning, not to reproduce a billing ledger exactly.

Does cache hit rate reduce completion cost too?

No. In this model, cached pricing only affects prompt tokens. Completion tokens always use the completion rate.

Are cache-write or cache-storage fees included?

No. The calculator models discounted cached reads only. If your provider charges separate cache writes, storage, search grounding, tool use, taxes, or regional surcharges, you need to account for those outside this page.

Does the budget cap change the spend calculation?

No. The cap is a comparison layer. It reports headroom or overage and derives request or token envelopes, but it does not alter the core cost formula.

What does the model ladder actually compare?

It keeps prompt size, completion size, traffic, retries, margin, cache hit rate, and fixed fees constant. Only the preset rate card changes, so the ladder is a price comparison under the same workload.

Are my prompt drafts and inputs sent anywhere?

No. This tool has no server-side processing path, so the calculations and prompt estimate stay in the browser.

Are preset prices guaranteed to stay current?

No. Public pricing pages can change. Use the presets as a checked starting point, then refresh the rates or switch to custom values when accuracy matters for procurement or launch approval.

Glossary:

Prompt tokens
Input tokens sent to the model before it generates a response.
Completion tokens
Output tokens generated by the model in the response.
Billed attempts
Successful requests plus retry overhead after the retry multiplier is applied.
Cache hit rate
The share of prompt tokens that are billed at the cached-read price instead of the normal prompt price.
Fixed monthly fees
Costs added after token spend, such as platform, monitoring, or support charges.
Budget headroom or overage
The difference between the configured monthly cap and the projected monthly total.