Large Language Model (LLM) Usage Cost Calculator


Tags: Automation, Devtools, Finance


Token billing is the basic meter behind most large language model APIs. Providers usually quote separate prices for input and output tokens, and your own traffic volume then determines what those rates mean in practice. This calculator turns that pricing structure into per-request, daily, and monthly cost estimates.

That matters when you are sizing a prototype, checking whether a launch fits a team budget, or comparing one prompt design against another. A request that looks cheap in isolation can become expensive once retries, fixed fees, and billing days are added back in.

The package combines prompt tokens, completion tokens, requests per day, billing days, and either a preset or custom rate card. It then layers in cached prompt pricing, retry overhead, margin uplift, fixed monthly fees, growth, and an optional budget cap so the estimate matches operational planning better than a bare price-table lookup.

A modest change in output length can move spend as much as a headline rate change, and a retry multiplier can quietly turn 10,000 successful requests into 12,000 billable attempts. The reverse is also true: shorter completions or a strong cache hit rate can materially reduce monthly totals without shrinking traffic.

These figures are planning estimates, not invoice guarantees. The built-in prompt estimator is heuristic, provider tokenizers and caching rules differ, and the bundled preset rates can drift from current vendor price sheets. Use the result for budgeting and comparison, not procurement approval or financial advice.

Everyday Use & Decision Guide:

Start with the workload you can describe with the least guesswork: typical prompt tokens, typical completion tokens, and successful requests per day. If you already know your contract rates, switch to Custom or override the preset values in Advanced. If you do not, a preset is a reasonable first-pass baseline.

Use Prompt draft to rough in Prompt tokens, not to certify an invoice number. The package auto-fills prompt tokens until you manually override that field.
Set Retry multiplier early if fallbacks, streaming reconnects, or transient errors are common. A multiplier of 1.20 means the daily volume is treated as 20% more billable attempts.
Treat Cache hit rate as prompt-side relief only. Completion tokens never receive a cached discount in this model.
Stop and verify when the budget badge flips from headroom to overage. Compare Monthly total (tokens + fees) in Summary with Budget headroom / overage in Efficiency before changing assumptions.

After the first pass, compare Summary and Components instead of focusing only on Per request. Most budget surprises come from traffic, retries, or fixed fees rather than from the single-request figure.

Technical Details:

The calculator tracks two token buckets per request: prompt tokens and completion tokens. Each bucket is billed against a dollars-per-1,000-token rate. Prompt tokens can be split between a normal rate and a cached rate according to Cache hit rate, which gives the package an effective prompt price before any request-volume scaling happens.

Volume is then expanded in two stages. Requests per day becomes effective daily traffic once the Retry multiplier, floored at 1, is applied, and monthly usage multiplies that traffic by Billing days per month. Margin uplift scales the token cost by 1 + margin / 100, while Fixed monthly fees are added only after monthly token spend has been computed.

The scenario outputs reuse the same per-request cost model. The growth line multiplies daily requests by 1 + growthPercent / 100, and the peak line uses whichever is larger: that growth multiplier plus 0.25, or 1.5. In other words, the chart is a stress view of traffic, not a forecasting engine.
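The scenario rule can be sketched in a few lines of Python. This is only an illustration: the function name `scenario_multipliers` is hypothetical, and the peak rule is read here as the larger of (growth multiplier + 0.25) and 1.5.

```python
def scenario_multipliers(growth_percent: float) -> dict[str, float]:
    """Traffic multipliers for the base, growth, and peak chart lines (assumed reading)."""
    growth = 1 + growth_percent / 100
    # Peak never drops below 1.5x, even when the growth scenario is small.
    peak = max(growth + 0.25, 1.5)
    return {"base": 1.0, "growth": growth, "peak": peak}


low = scenario_multipliers(20)   # growth 1.2, so the 1.5 floor sets the peak
high = scenario_multipliers(50)  # growth 1.5, so growth + 0.25 sets the peak
print(low["peak"], high["peak"])
```

Under this reading, the 1.5 floor sets the peak line whenever the growth scenario is 25% or less.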

Formula Core:

These equations describe the billing model used for the main summary outputs.

\begin{array}{lll}
R_{prompt,eff} & = & R_{prompt} \times (1 - H) + R_{cached} \times H \\
Q_{eff} & = & Q \times \max(S, 1) \\
C_{request} & = & \left( \frac{P}{1000} \times R_{prompt,eff} + \frac{O}{1000} \times R_{output} \right) \times \left( 1 + \frac{M}{100} \right) \\
C_{day} & = & C_{request} \times Q_{eff} \\
C_{month} & = & C_{day} \times D + F
\end{array}

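Here, P is prompt tokens and O is completion tokens per request; R_{prompt}, R_{cached}, and R_{output} are the Prompt rate, Prompt cached rate, and Completion rate in dollars per 1,000 tokens; H is the cache-hit ratio from 0 to 1 (Cache hit rate divided by 100); Q is successful requests per day; S is the retry multiplier; D is billing days per month; M is margin uplift in percent; and F is fixed monthly fees in dollars.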
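For readers who prefer code to notation, the equations above can be transcribed roughly as follows. This is a minimal sketch, not the calculator's actual source: the `Workload` class, the `project_costs` function, and the example rates are all hypothetical names and placeholder values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    prompt_tokens: float            # P
    completion_tokens: float        # O
    requests_per_day: float         # Q, successful requests
    billing_days: float             # D
    prompt_rate: float              # R_prompt, USD per 1K tokens
    cached_rate: float              # R_cached, USD per 1K tokens
    completion_rate: float          # R_output, USD per 1K tokens
    cache_hit_ratio: float = 0.0    # H, 0 to 1
    retry_multiplier: float = 1.0   # S
    margin_percent: float = 0.0     # M
    fixed_fees: float = 0.0         # F, USD per month


def project_costs(w: Workload) -> dict[str, float]:
    """Return per-request, daily, and monthly cost following the equations above."""
    effective_prompt_rate = (
        w.prompt_rate * (1 - w.cache_hit_ratio) + w.cached_rate * w.cache_hit_ratio
    )
    effective_requests = w.requests_per_day * max(w.retry_multiplier, 1)
    per_request = (
        w.prompt_tokens / 1000 * effective_prompt_rate
        + w.completion_tokens / 1000 * w.completion_rate
    ) * (1 + w.margin_percent / 100)
    per_day = per_request * effective_requests
    per_month = per_day * w.billing_days + w.fixed_fees
    return {"per_request": per_request, "per_day": per_day, "per_month": per_month}


# Placeholder rates for illustration only; they are not taken from any price sheet.
example = Workload(
    prompt_tokens=1000, completion_tokens=500, requests_per_day=100, billing_days=30,
    prompt_rate=0.002, cached_rate=0.001, completion_rate=0.008,
    retry_multiplier=1.2, margin_percent=10, fixed_fees=25,
)
costs = project_costs(example)
print(f"{costs['per_request']:.4f} {costs['per_day']:.2f} {costs['per_month']:.2f}")
```

Note that margin is applied per request, so it scales with traffic and retries, while fixed fees are added once per month and are not marked up.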

Prompt Draft Estimator:

When you paste text into Prompt draft, the package estimates prompt size from two rough lenses: characters and words. It computes a character-based estimate with approximately 4 characters per token, computes a word-based estimate at about 1.32 tokens per word, averages the two, rounds to the nearest whole token, and writes that number back to Prompt tokens until you manually edit that field.
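The blend described above might look like the following in Python. Treat it as a sketch under stated assumptions: the function name `estimate_prompt_tokens` is hypothetical, words are assumed to be whitespace-separated, and half values are assumed to round up. The source does not specify how the package splits words or breaks rounding ties.

```python
import math


def estimate_prompt_tokens(text: str) -> int:
    """Rough token estimate: average of a character-based and a word-based guess."""
    char_estimate = len(text) / 4              # about 4 characters per token
    word_estimate = len(text.split()) * 1.32   # about 1.32 tokens per word (whitespace split assumed)
    blended = (char_estimate + word_estimate) / 2
    return math.floor(blended + 0.5)           # round to nearest, halves up (assumption)


print(estimate_prompt_tokens("Summarize the attached report in three bullet points."))
```

Because both lenses are heuristics, the result is best used to seed Prompt tokens and then replaced with a real tokenizer count when one is available.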

LLM usage cost result fields
Output Field	Meaning	Reading Note
`Per request`	Unit cost for one successful request after margin	Does not include daily volume by itself
`Monthly total (tokens + fees)`	Projected month cost including fixed fees	Primary budget review figure
`Cache savings (...)`	Prompt-side cost reduction from cached pricing	Appears only when cache hit rate is above zero
`Budget headroom / overage`	Difference between budget cap and monthly total	Positive means headroom, negative means overage
`Effective cost per 1K tokens`	Blended cost intensity for your configured workload	Useful for comparing setups, not invoices

The guardrails are simple and deterministic. Negative numeric values are coerced to zero where applicable, Cache hit rate is clamped to 0 to 100, and Retry multiplier is floored at 1. That means a monthly total of zero often signals missing traffic inputs rather than a broken rate formula.
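A compact way to picture these guardrails is a single normalization step applied before any cost math. The sketch below is illustrative only; `sanitize_inputs` and its parameter names are hypothetical and do not come from the package.

```python
def sanitize_inputs(
    cache_hit_percent: float, retry_multiplier: float, requests_per_day: float
) -> dict[str, float]:
    """Apply the guardrails described above to three representative inputs."""
    return {
        # Clamp the percentage to 0..100, then convert to the 0..1 ratio H.
        "cache_hit_ratio": min(max(cache_hit_percent, 0.0), 100.0) / 100.0,
        # Retries can only add attempts, so the multiplier is floored at 1.
        "retry_multiplier": max(retry_multiplier, 1.0),
        # Negative traffic is coerced to zero.
        "requests_per_day": max(requests_per_day, 0.0),
    }


print(sanitize_inputs(cache_hit_percent=150, retry_multiplier=0.5, requests_per_day=-10))
```

With out-of-range inputs like these, the requests value collapses to zero, which is why a zero monthly total usually points at the traffic fields rather than the rates.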

Step-by-Step Guide:

Use this flow when you want a cost number that survives a quick sanity check.

Pick a Preset that is close to your model family, or choose Custom if you already know the exact rate card you need.
Enter Prompt tokens and Completion tokens. If prompt size is still rough, paste representative text into Prompt draft and let the package seed the prompt-token field.
Set Requests per day and Billing days per month. These two fields decide whether your unit cost stays a small lab number or becomes an operating number.
Open Advanced for the billing assumptions that change spend most: Prompt rate, Prompt cached rate, Cache hit rate, Completion rate, Retry multiplier, Margin uplift, Fixed monthly fees, Growth scenario, and Monthly budget cap.
Read Summary first. Confirm that Per request, the daily row, and Monthly total (tokens + fees) all move in the direction you expect.
Use Components and Efficiency to see what is driving spend, then open Cost Forecast for the base, growth, and peak scenarios.

If the summary still looks odd, check request volume and billing days before you change prices. Those two inputs explain many zero or unexpectedly tiny totals.

Interpreting Results:

Treat Per request as unit cost, not total impact. The number that usually matters operationally is Monthly total (tokens + fees), because that is where token size, effective traffic, retries, and fixed fees all meet.

A positive Budget headroom / overage means the current month estimate is still inside the cap. A negative value means the package projects the cap will be exceeded.
A lower Effective cost per 1K tokens does not guarantee a lower real invoice if your pasted prompt omits system text, retrieval context, or provider-specific tokenization behavior.
Cache savings (...) only reflects prompt-side discounts. A long completion can still dominate spend even when the cache hit rate is high.
If Per request looks reasonable but daily or monthly totals are zero, verify Requests per day and Billing days per month before trusting the forecast.

Use Summary for the decision, Efficiency for the sanity check, and the chart as a scenario view after the tabular totals already make sense.

Worked Examples:

Default planning snapshot

The package defaults are Prompt tokens = 1400, Completion tokens = 600, Requests per day = 240, Billing days per month = 30, and the GPT-4o preset rates. With those inputs, the Summary table returns Per request = $ 0.0095, Daily (240 requests incl. retries) = $ 2.28, and Monthly total (tokens + fees) = $ 68.40. In Efficiency, Effective cost per 1K tokens is $ 0.0048. That is a clean baseline for comparing later tweaks.

Caching helps, but the budget still misses

Keep the same workload, set Cache hit rate = 60, and add Monthly budget cap = $60. The blended prompt rate falls, Per request drops to $ 0.0085, Cache savings (60% hit) shows -$ 7.56 for the month, and Monthly total (tokens + fees) lands at $ 60.84. The budget badge still reports overage because Budget headroom / overage is -$ 0.84. This is the useful edge case: the package shows that an efficiency win can still miss the cap.
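The arithmetic behind this example can be traced step by step. The rates below are an assumption: the source does not list the GPT-4o preset values, but a prompt rate of 0.0025, a cached rate of 0.00125, and a completion rate of 0.01 dollars per 1K tokens reproduce every figure quoted in both examples, with no retries, margin, or fixed fees applied.

\begin{array}{lll}
R_{prompt,eff} & = & 0.0025 \times (1 - 0.6) + 0.00125 \times 0.6 = 0.00175 \\
C_{request} & = & \frac{1400}{1000} \times 0.00175 + \frac{600}{1000} \times 0.01 = 0.00845 \\
C_{day} & = & 0.00845 \times 240 = 2.028 \\
C_{month} & = & 2.028 \times 30 + 0 = 60.84 \\
\text{Cache savings} & = & 68.40 - 60.84 = 7.56 \\
\text{Headroom} & = & 60 - 60.84 = -0.84
\end{array}

The displayed Per request value of $ 0.0085 is the four-decimal rounding of 0.00845, and the savings figure is shown with a minus sign because it reduces the monthly total.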

A zero month caused by missing traffic

Suppose the token fields and rates are filled, but Requests per day stays at 0. The tool still calculates Per request, yet the daily row and Monthly total (tokens + fees) stay at $ 0.00 because there is no traffic to multiply. Once Requests per day is restored to a real workload, the monthly forecast populates immediately. That is a correction case, not a pricing bug.

Responsible Use Note:

Use this tool for cost planning, scenario comparison, and early budget conversations. Do not treat it as an invoice, an accounting record, or proof that a deployment will stay within contract terms once provider discounts, taxes, rounding policy, or tiered pricing are applied.

If a decision carries real procurement or financial risk, verify the assumptions against the provider's current rate card and your own traffic logs before committing spend.

FAQ:

Why does the prompt draft estimate not match a provider token count exactly?

Because the package uses a rough blend of character count and word count. Real tokenizers vary by model, language, and formatting, so the estimator is a starting point rather than a billing-grade counter.

Does cache hit rate reduce completion cost too?

No. In this model the cached discount applies only to prompt tokens. Completion tokens always use Completion rate.

Why are my daily and monthly totals still zero?

The usual causes are Requests per day = 0 or Billing days per month = 0. The unit cost can still exist even when traffic volume is missing.

Does the budget cap change the cost calculation?

No. It only compares the projected month against your cap and reports headroom or overage.

Are my prompts or assumptions sent to a server?

No. This tool has no server-side processing path, so the calculations stay in the browser.

Are the preset rates guaranteed to be current?

No. The presets are bundled package values. Check vendor pricing pages and update the advanced rate fields when accuracy matters.

Glossary:

Prompt tokens: Input tokens billed when you send text to the model.
Completion tokens: Output tokens billed when the model generates text.
Cached rate: Lower prompt-side price used for cached input tokens.
Retry multiplier: Average attempts per successful request.
Budget headroom: Remaining budget before the projected month reaches the cap.
