
Introduction

LLM spending is a volume problem disguised as a tiny unit price. One request may cost less than a cent, but a support assistant, search workflow, coding agent, report generator, or document summarizer repeats that request pattern all month. The bill grows from the token mix, the number of attempts, the model rate card, and the fixed services wrapped around the model.

Tokens are the working unit. A token is a chunk of text or other model-readable content, and providers usually bill input and output tokens at different rates. Input covers the prompt, system instructions, retrieved context, tool results, and any repeated conversation history. Output covers the model's answer and, depending on the provider and model family, may include billed generated or reasoning categories that are not obvious from the visible reply alone.

Input tokens
Prompt, system text, retrieved context, tool results, and other content sent into the model.
Output tokens
Visible answer text and, for some models, billed reasoning or generated content categories.
Cached reads
Repeated input tokens billed at a reduced read price when the provider recognizes reusable context.
Billed attempts
The successful request volume plus expected retry, fallback, reconnect, or duplicate-call overhead.

Rate cards are usually quoted per million tokens because the per-token price is too small to read comfortably. Budget planning often works better at the per-request level: convert the rate into the unit used by the estimate, multiply by the average prompt and answer size, then multiply again by daily traffic. A small unit mistake can change the estimate by a factor of 1,000, especially when a published per-million price is entered into a per-thousand field.
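The Python below is a minimal sketch of that per-request conversion. It turns published per-1M prices into per-1K rates and then prices one request. The rates ($2.50 and $10.00 per 1M) and token counts match the worked examples later on this page. The function names are hypothetical and not part of the tool.

```python
def per_1m_to_per_1k(rate_per_1m: float) -> float:
    """Convert a published $/1M-token price into the $/1K unit the rate fields use."""
    return rate_per_1m / 1000.0


def request_cost(prompt_tokens: float, completion_tokens: float,
                 prompt_rate_1k: float, completion_rate_1k: float) -> float:
    """Token cost of one request; both rates are in $ per 1K tokens."""
    return (prompt_tokens / 1000.0) * prompt_rate_1k + (completion_tokens / 1000.0) * completion_rate_1k


prompt_rate = per_1m_to_per_1k(2.50)       # $2.50 per 1M  -> $0.0025 per 1K
completion_rate = per_1m_to_per_1k(10.00)  # $10.00 per 1M -> $0.0100 per 1K

per_request = request_cost(1400, 600, prompt_rate, completion_rate)
per_day = per_request * 240                # 240 successful requests per day
print(f"per request: ${per_request:.4f}, per day: ${per_day:.2f}")
```

The whole unit conversion is a single division by 1,000. Skipping that step is the factor-of-1,000 error described above.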

Common LLM cost planning situations
| Workload pattern | Cost driver to watch | Common mistake |
|---|---|---|
| Customer support chat | Repeated system prompt and answer length | Counting only the user's short message. |
| Retrieval-augmented answers | Retrieved context and cache hit rate | Ignoring documents injected before the model call. |
| Agent or tool workflow | Retries, fallback calls, tool schemas, and tool results | Pricing one visible answer as if it came from one request. |
| Batch report writing | Long output tokens and peak-month volume | Assuming input price matters more than generated text. |
Figure: LLM API spend diagram (input tokens, output tokens, rate cards, billed attempts, monthly fees, and budget cap). The same request can become more expensive through longer output, cache misses, retries, or fixed monthly costs.

Good estimates start from representative traffic, not from a clean demo prompt. A production trace may include hidden instructions, policy text, retrieved documents, function results, tool schemas, reconnects, and automatic retries. Measured token counts from real requests give a stronger baseline, then the same workload can be tested under changes such as shorter answers, higher cache reuse, higher request volume, or a different model price card.
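The Python below sketches one way to turn measured traffic into baseline inputs. The record layout (prompt_tokens, completion_tokens, ok) is a hypothetical stand-in for a provider usage export. The sketch also treats every logged attempt as billed, which is an assumption you should check against your provider's rules.

```python
from statistics import mean

# Hypothetical usage records; real exports use provider-specific field names.
usage = [
    {"prompt_tokens": 1320, "completion_tokens": 540, "ok": True},
    {"prompt_tokens": 1510, "completion_tokens": 700, "ok": True},
    {"prompt_tokens": 1370, "completion_tokens": 560, "ok": True},
    {"prompt_tokens": 1400, "completion_tokens": 0, "ok": False},  # failed attempt, retried later
]


def baseline(records):
    """Average prompt/completion tokens per successful request, plus attempts per success."""
    successful = [r for r in records if r["ok"]]
    if not successful:
        raise ValueError("need at least one successful request")
    avg_prompt = mean(r["prompt_tokens"] for r in successful)
    avg_completion = mean(r["completion_tokens"] for r in successful)
    # Assumes every logged attempt was billed; a first guess for Retry multiplier.
    attempts_per_success = len(records) / len(successful)
    return avg_prompt, avg_completion, attempts_per_success


prompt_avg, completion_avg, retry_guess = baseline(usage)
print(prompt_avg, completion_avg, round(retry_guess, 2))
```

A real baseline needs far more than four records. Looking at a high percentile alongside the mean helps catch long-tail prompts.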

A cost estimate cannot decide model quality. A cheaper row in a price comparison can identify pressure, but production selection still depends on latency, accuracy, context-window fit, safety behavior, regional terms, data-handling requirements, and the provider's current billing rules.

How to Use This Tool:

Start with one representative request, then widen the assumptions after the base request cost looks believable.

  1. Choose Preset for a built-in planning rate card, or choose Custom when a dashboard, invoice, enterprise quote, region, or batch plan gives different prices.
  2. Enter Prompt tokens and Completion tokens for an average successful request. Use Prompt draft for a rough input estimate when no tokenizer count is available.
  3. Set Requests per day and Billing days per month. The request count should represent successful user-facing work before retry overhead is added.
  4. Open Advanced to set prompt, cached-read, and completion rates, then add Cache hit rate, Retry multiplier, Margin uplift, Fixed monthly fees, Growth scenario, and Monthly budget cap.
  5. Read Usage Cost Brief for the monthly total and cap status. Use Cost Components and Token Usage Ledger to find the part of the bill that moved.
  6. Use Unit Economics, Spend Pressure Moves, Model Price Ladder, and Scenario Burn Curve for pricing comparisons, savings moves, and growth checks without losing the original workload shape.

If the total is off by hundreds or thousands of times, check the rate units first. Public pages often quote dollars per 1M tokens, while the editable rate fields use dollars per 1K tokens.

Interpreting Results:

Monthly total (tokens + fees) is the operating estimate. Per request is better for comparing prompt designs, but the budget decision depends on repeated volume, retry overhead, margin, fixed fees, and active billing days.

Cost Components is the best place to diagnose pressure. Cached-read relief lowers only the prompt side of the bill. Retry overhead raises billed attempts without adding user-facing volume. Fixed fees create a monthly floor that remains even when token counts shrink.
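The sketch below splits one month into those pieces for the cached, retried workload used in the worked examples: 60% cache hits, 1.15x retries, 15% margin, and $49 fixed fees. The attribution order, with retry overhead priced before margin, is an assumption made for illustration. The tool's own Cost Components table may group lines differently.

```python
def monthly_components(prompt_tokens, completion_tokens, requests_per_day, days,
                       r_prompt, r_cached, r_out, hit_pct, retry, margin_pct, fixed_fees):
    """Split a modeled month into additive USD components (rates in $ per 1K tokens)."""
    successful = requests_per_day * days                        # user-facing requests per month
    h = hit_pct / 100
    prompt_list = prompt_tokens / 1000 * r_prompt * successful  # prompt side at the normal rate
    # Negative when cached reads are cheaper; it touches the prompt side only.
    cache_relief = prompt_tokens / 1000 * h * (r_cached - r_prompt) * successful
    completion = completion_tokens / 1000 * r_out * successful
    base = prompt_list + cache_relief + completion
    retry_overhead = base * (max(retry, 1.0) - 1.0)             # extra billed attempts, no extra users
    margin = (base + retry_overhead) * margin_pct / 100         # margin applies to variable spend only
    return {
        "prompt (normal rate)": prompt_list,
        "cached-read relief": cache_relief,
        "completion": completion,
        "retry overhead": retry_overhead,
        "margin": margin,
        "fixed fees": fixed_fees,                               # monthly floor
    }


parts = monthly_components(1400, 600, 240, 30, 0.0025, 0.00125, 0.0100, 60, 1.15, 15, 49.0)
for name, value in parts.items():
    print(f"{name:>22}: {value:9.2f}")
print(f"{'monthly total':>22}: {sum(parts.values()):9.2f}")
```

Shortening requests shrinks every line except fixed fees. The 49.00 fixed-fee line stays the same however short the requests become.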

LLM usage cost output fields and interpretation cues
| Output field | Boundary or cue | How to read it |
|---|---|---|
| Budget headroom | Headroom >= 0 | The modeled month fits inside the configured cap. |
| Budget overage | Headroom < 0 | The workload misses the cap; reduce token size, attempts, rates, margin, or fixed fees. |
| Billed attempts | Higher than successful requests when Retry multiplier is above 1.00 | Retries, reconnects, or fallback calls are adding cost. |
| Effective prompt card | Falls as Cache hit rate rises | Prompt caching is improving input cost, but completion tokens are unchanged. |
| Model Price Ladder | Same workload under different built-in rates | Use it for cost pressure, not as proof that a cheaper model is suitable. |

Do not treat the prompt draft estimate as invoice-grade tokenization. Verify high-stakes budgets against provider usage data, including hidden system text, retrieved context, tool calls, retries, cache-write charges, and generated output categories that your provider bills separately.

Technical Details:

LLM usage cost is a variable-cost model with fixed add-ons. Variable cost comes from token counts and token prices. Fixed cost comes from platform, logging, support, observability, marketplace, or internal chargeback amounts that do not shrink when an individual request becomes shorter.

Prompt and completion tokens are priced separately, then multiplied by billed attempts. Cached prompt reads are represented as a blended prompt rate. The cache hit ratio shifts a share of prompt tokens from the normal prompt rate to the cached-read rate, while completion tokens continue to use the completion rate. Cache writes, storage duration, batch discounts, regional multipliers, or tool-specific charges may exist in a provider contract, so the custom rate fields should reflect the actual billing terms being modeled.

Formula Core:

The main calculation blends the prompt rate, converts successful traffic into billed attempts, applies margin to variable token spend, and adds fixed fees at the end.

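$$R_{\text{prompt,eff}} = R_{\text{prompt}}\,(1 - H) + R_{\text{cached}}\,H$$

$$A_{\text{billed}} = Q \times \max(S, 1)$$

$$C_{\text{request}} = \left(\frac{P}{1000}\,R_{\text{prompt,eff}} + \frac{O}{1000}\,R_{\text{out}}\right)\left(1 + \frac{M}{100}\right)$$

$$C_{\text{month}} = C_{\text{request}} \times A_{\text{billed}} \times D + F$$

$$\text{headroom} = B - C_{\text{month}}$$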
LLM usage cost formula variables
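| Symbol | Meaning | Unit or handling |
|---|---|---|
| P | Prompt tokens per successful request | Tokens |
| O | Completion tokens per successful request | Tokens |
| H | Cache hit rate | Percent converted to a ratio from 0 to 1 |
| Q | Successful requests per day | Requests/day |
| S | Retry multiplier | Clamped to at least 1.00 |
| M | Margin uplift | Percent added to variable token spend |
| D | Billing days per month | Days/month |
| F | Fixed monthly fees | USD/month, added after variable token spend |
| B | Monthly budget cap | USD/month, used for headroom and cap-envelope outputs |
| R_prompt, R_cached, R_out | Prompt, cached-read, and completion rates | USD per 1K tokens |
| R_prompt,eff | Blended prompt rate after cache hits | USD per 1K tokens |
| A_billed | Billed attempts per day | Attempts/day |
| C_request, C_month | Cost per billed attempt and per month | USD |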

With 1,400 prompt tokens, 600 completion tokens, 60% cached prompt reads, prompt rate $0.0025 per 1K, cached prompt rate $0.00125 per 1K, and output rate $0.0100 per 1K, the effective prompt rate is $0.00175 per 1K. The base token cost is $0.00845 before margin. Adding 15% margin gives about $0.0097 per request; at 240 requests/day, 1.15x retry multiplier, and 30 days, monthly variable spend is about $80.46 before fixed fees.
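The Python below transcribes the Formula Core directly and reproduces the figures in the paragraph above. It is a sketch, not the tool's source, and the function names are hypothetical. Clamping the cache percentage mirrors the 0-to-1 ratio in the variable table.

```python
def effective_prompt_rate(r_prompt: float, r_cached: float, hit_pct: float) -> float:
    """Blend normal and cached-read prompt rates ($ per 1K) by the cache hit ratio H."""
    h = min(max(hit_pct / 100, 0.0), 1.0)  # percent -> ratio in [0, 1]
    return r_prompt * (1 - h) + r_cached * h


def request_cost(p: float, o: float, r_prompt_eff: float, r_out: float, margin_pct: float) -> float:
    """C_request = (P/1000 * R_prompt,eff + O/1000 * R_out) * (1 + M/100)."""
    return (p / 1000 * r_prompt_eff + o / 1000 * r_out) * (1 + margin_pct / 100)


def monthly_total(c_request: float, q: float, s: float, d: float, f: float) -> float:
    """C_month = C_request * A_billed * D + F, with A_billed = Q * max(S, 1)."""
    billed_attempts_per_day = q * max(s, 1.0)
    return c_request * billed_attempts_per_day * d + f


r_eff = effective_prompt_rate(0.0025, 0.00125, 60)
c_req = request_cost(1400, 600, r_eff, 0.0100, 15)
variable_month = monthly_total(c_req, 240, 1.15, 30, 0.0)
print(f"R_prompt,eff = {r_eff:.5f}, C_request = {c_req:.7f}, variable month = {variable_month:.2f}")
```

To get the full monthly total, pass the fixed fees as `f`. Headroom is then the budget cap minus that total.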

Scenario and budget calculation rules
| Derived result | Rule | Interpretation limit |
|---|---|---|
| Growth month | Uses the configured growth percentage on billed attempts. | Token size and rates stay constant. |
| Peak month | Uses the larger of growth plus 25% or a 1.5x attempt multiplier. | It is a stress test, not a long-range forecast. |
| Max tokens per request at cap | Divides the variable monthly budget by current request volume and effective per-token cost. | Fixed fees reduce the variable budget first. |
| Max prompt tokens at cap | Holds completion cost fixed and solves the remaining per-request budget against effective prompt rate. | Only valid while request volume, completion tokens, rates, and margin remain unchanged. |
| Max completion tokens at cap | Holds prompt cost fixed and solves the remaining per-request budget against completion rate. | Only valid while request volume, prompt tokens, rates, and margin remain unchanged. |
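The rules in this table can be sketched as follows. Two readings in the sketch are assumptions rather than documented behavior:
- "Growth plus 25%" is taken to mean (growth + 25) percentage points more attempts.
- "Effective per-token cost" is taken as the blended per-request cost divided by P + O, so the max-tokens result scales prompt and completion together.

The 20% growth figure is illustrative.

```python
def scenario_and_caps(p, o, q, s, d, f, b, r_prompt_eff, r_out, margin_pct, growth_pct):
    """Growth/peak months and cap-solving limits; rates in $ per 1K tokens."""
    m = 1 + margin_pct / 100
    c_request = (p / 1000 * r_prompt_eff + o / 1000 * r_out) * m
    attempts_month = q * max(s, 1.0) * d

    growth_mult = 1 + growth_pct / 100
    # Assumed reading of "larger of growth plus 25% or a 1.5x attempt multiplier".
    peak_mult = max(1 + (growth_pct + 25) / 100, 1.5)

    # Fixed fees come out of the cap first; the rest is shared across billed attempts.
    per_request_budget = (b - f) / attempts_month
    cost_per_token = c_request / (p + o)  # assumed blended per-token cost, margin included
    prompt_cost = p / 1000 * r_prompt_eff * m
    completion_cost = o / 1000 * r_out * m
    return {
        "growth month": c_request * attempts_month * growth_mult + f,
        "peak month": c_request * attempts_month * peak_mult + f,
        "max tokens per request": per_request_budget / cost_per_token,
        # A negative result means the held-fixed side already uses up the per-request budget.
        "max prompt tokens": (per_request_budget - completion_cost) / (r_prompt_eff / 1000 * m),
        "max completion tokens": (per_request_budget - prompt_cost) / (r_out / 1000 * m),
    }


limits = scenario_and_caps(1400, 600, 240, 1.15, 30, 49.0, 500.0, 0.00175, 0.0100, 15, 20)
for name, value in limits.items():
    print(f"{name:>24}: {value:,.2f}")
```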

The prompt draft estimator is deliberately rough. It normalizes whitespace, estimates from character count and word count, averages those two estimates, and rounds to a whole-token value. Provider tokenizers can differ by language, punctuation, code, hidden messages, tool schemas, images, and model family.
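Below is a minimal sketch of that kind of estimator. The divisors are common rules of thumb for English text: about 4 characters per token and about 0.75 words per token. They are assumptions here, because this page does not state the tool's own constants.

```python
import re

CHARS_PER_TOKEN = 4.0   # assumption: common English rule of thumb
WORDS_PER_TOKEN = 0.75  # assumption: roughly three quarters of a word per token


def estimate_tokens(text: str) -> int:
    """Average a character-based and a word-based estimate, rounded to whole tokens."""
    normalized = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    if not normalized:
        return 0
    by_chars = len(normalized) / CHARS_PER_TOKEN
    by_words = len(normalized.split(" ")) / WORDS_PER_TOKEN
    return round((by_chars + by_words) / 2)


print(estimate_tokens("Summarize the attached ticket in three bullet points."))
```

Because whitespace is normalized first, extra spaces and line breaks do not change the estimate.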

Displayed currency is rounded for readability, but the underlying arithmetic uses the numeric field values. Small rounding differences are normal when comparing the page to a provider invoice that reports more decimal places or separates extra billing categories.
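Here is a quick illustration of how rounded display values drift. Recomputing the monthly variable spend from the four-decimal per-request figure gives a different total than the unrounded value does. The numbers come from the Formula Core example, and the four-decimal display precision is an assumption.

```python
c_request = 0.0097175             # unrounded per-request cost from the Formula Core example
attempts_month = 240 * 1.15 * 30  # billed attempts per month (about 8,280)
displayed = round(c_request, 4)   # a four-decimal display shows 0.0097

from_underlying = c_request * attempts_month
from_displayed = displayed * attempts_month
print(f"underlying: ${from_underlying:.2f}, from display: ${from_displayed:.2f}, "
      f"gap: ${from_underlying - from_displayed:.2f}")
```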

Pricing Accuracy:

Built-in presets are a planning snapshot, not a live provider contract. The public-card refresh date is March 12, 2026, and provider pages can change model names, tokenizers, batch discounts, cache-write charges, cache-read discounts, data-residency multipliers, tool-call pricing, and deprecation status after that date.

Use Custom when a provider dashboard, invoice, procurement quote, cloud marketplace, region, batch mode, or enterprise agreement gives a different rate. For production budgets, reconcile the estimate against real usage exports before setting alerts, customer-facing prices, or internal chargebacks.

Worked Examples:

A report generator with no cache discount. A workload with 1,400 prompt tokens, 600 completion tokens, 240 requests/day, 30 billing days, prompt rate $0.0025 per 1K, and completion rate $0.0100 per 1K has a Per request cost of $0.0095. With no retry overhead (Retry multiplier 1.00), no margin, and no fixed fees, Monthly total (tokens + fees) is $68.40.

A cached workload with retry overhead. Keep the same token counts and rates, then add 60% cache hits at a cached prompt rate of $0.00125 per 1K, 15% margin, $49 fixed monthly fees, and a 1.15x retry multiplier. The monthly total comes to about $129.46: roughly $80.46 in variable token spend plus $49 in fees. With a $500 budget cap, Budget headroom remains positive at roughly $370.54. However, Billed attempts rises to 276 per day against 240 successful requests.
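Both worked examples can be checked with a small Python model. The Scenario class and its field names are hypothetical. The cached prompt rate of $0.00125 per 1K comes from the Formula Core example, and it only matters once cache hits are above zero.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    prompt_tokens: int = 1400
    completion_tokens: int = 600
    requests_per_day: float = 240
    days: int = 30
    r_prompt: float = 0.0025   # $ per 1K
    r_cached: float = 0.00125  # $ per 1K
    r_out: float = 0.0100      # $ per 1K
    hit_pct: float = 0.0
    retry: float = 1.0
    margin_pct: float = 0.0
    fixed: float = 0.0

    def billed_attempts_per_day(self) -> float:
        return self.requests_per_day * max(self.retry, 1.0)

    def monthly_total(self) -> float:
        h = self.hit_pct / 100
        r_eff = self.r_prompt * (1 - h) + self.r_cached * h  # blended prompt rate
        c_req = (self.prompt_tokens / 1000 * r_eff
                 + self.completion_tokens / 1000 * self.r_out) * (1 + self.margin_pct / 100)
        return c_req * self.billed_attempts_per_day() * self.days + self.fixed


report = Scenario()
cached = Scenario(hit_pct=60, retry=1.15, margin_pct=15, fixed=49.0)
cap = 500.0
for name, s in [("report generator", report), ("cached + retries", cached)]:
    total = s.monthly_total()
    print(f"{name}: ${total:.2f}/month, headroom ${cap - total:.2f}, "
          f"{s.billed_attempts_per_day():.0f} billed attempts/day")
```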

A rate-unit mistake during budget review. If a provider rate of $2.50 per 1M input tokens is pasted as 2.50 in the per-1K prompt-rate field, the prompt cost is overstated by a factor of 1,000; the correct per-1K entry is 0.0025. Check a surprising Budget overage against Effective prompt card and Completion card before changing the model choice.
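The factor of 1,000 is easy to confirm. The sketch below prices the 1,400-token prompt both ways and adds a crude guard. The $1.00 per 1K threshold (equal to $1,000 per 1M) is a hypothetical cutoff chosen for illustration, not a rule from any provider.

```python
SUSPICIOUS_RATE_PER_1K = 1.00  # hypothetical cutoff: $1.00 per 1K = $1,000 per 1M


def prompt_cost(prompt_tokens: float, rate_per_1k: float) -> float:
    """Prompt-side cost of one request at a $ per 1K rate."""
    return prompt_tokens / 1000 * rate_per_1k


def looks_like_per_1m(rate_per_1k: float) -> bool:
    """Flag per-1K entries large enough to be a per-1M price pasted unchanged."""
    return rate_per_1k >= SUSPICIOUS_RATE_PER_1K


correct = prompt_cost(1400, 2.50 / 1000)  # $2.50 per 1M converted to $0.0025 per 1K
pasted = prompt_cost(1400, 2.50)          # per-1M number typed into the per-1K field
print(f"correct ${correct:.4f}, pasted ${pasted:.2f}, ratio {pasted / correct:,.0f}x")
print(looks_like_per_1m(2.50), looks_like_per_1m(0.0025))
```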

FAQ:

Should I use preset pricing or custom rates?

Use presets for early comparison. Use Custom for current quotes, invoices, regional pricing, cloud marketplace pricing, batch pricing, or enterprise terms.

Does cache hit rate discount completion tokens?

No. The cache hit rate blends the prompt-token rate only. Completion tokens still use the completion rate.

Why are billed attempts higher than requests per day?

Retry multiplier turns successful requests into billed attempts. A value above 1.00 means repeated calls, fallback calls, reconnects, or retries are expected to be billed.

Why does the prompt draft estimate differ from my provider bill?

The draft estimator uses text length and word count. Provider tokenizers, system messages, retrieved context, tool schemas, images, reasoning tokens, and hidden billing categories can change the final count.

What should I check when the budget cap fails?

Check Completion tokens, Retry multiplier, Fixed monthly fees, and the per-1K rate fields first. Then use Spend Pressure Moves to see which single adjustment saves the most.
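One way to compare single adjustments is to re-price the workload once per change and sort the results by savings, as sketched below. The four moves and their sizes are hypothetical examples, not the tool's built-in Spend Pressure Moves. The workload is the cached, retried example from the worked examples.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Workload:
    prompt_tokens: float = 1400
    completion_tokens: float = 600
    requests_per_day: float = 240
    days: int = 30
    r_prompt: float = 0.0025   # $ per 1K
    r_cached: float = 0.00125  # $ per 1K
    r_out: float = 0.0100      # $ per 1K
    hit_pct: float = 60
    retry: float = 1.15
    margin_pct: float = 15
    fixed_fees: float = 49.0


def monthly_total(w: Workload) -> float:
    h = w.hit_pct / 100
    r_eff = w.r_prompt * (1 - h) + w.r_cached * h
    c_req = (w.prompt_tokens / 1000 * r_eff
             + w.completion_tokens / 1000 * w.r_out) * (1 + w.margin_pct / 100)
    return c_req * w.requests_per_day * max(w.retry, 1.0) * w.days + w.fixed_fees


def rank_moves(base: Workload, moves: dict) -> list:
    """Apply each change on its own and sort by monthly savings, largest first."""
    baseline = monthly_total(base)
    savings = [(name, baseline - monthly_total(replace(base, **changes)))
               for name, changes in moves.items()]
    return sorted(savings, key=lambda item: item[1], reverse=True)


base = Workload()
moves = {  # hypothetical single adjustments
    "trim completion tokens 20%": {"completion_tokens": base.completion_tokens * 0.8},
    "cut retries to 1.00x": {"retry": 1.0},
    "trim prompt tokens 20%": {"prompt_tokens": base.prompt_tokens * 0.8},
    "raise cache hits to 80%": {"hit_pct": 80},
}
for name, saved in rank_moves(base, moves):
    print(f"{name:<28} saves ${saved:6.2f}/month")
```

Each move is priced on its own. The retry multiplier scales every token cost, so the savings from combined moves can be smaller than the sum of their individual savings.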

Glossary:

Prompt tokens
Input tokens sent to the model, including user text and any added context.
Completion tokens
Output tokens generated by the model for a request.
Cached reads
Repeated input tokens billed at a reduced read rate when provider caching applies.
Billed attempts
Successful request volume after retry and fallback overhead is included.
Retry multiplier
The factor that converts successful requests into expected billed attempts.
Margin uplift
A percentage added to variable token spend for chargeback, markup, or risk buffer.
Fixed monthly fees
Monthly costs added after token spend, such as platform or observability fees.
Budget headroom
The amount left between the modeled monthly total and the budget cap.
