AWS Lambda Cost Calculator

Workload preset:

Start from a common Lambda pattern, then tune requests, duration, memory, and provisioned concurrency.

Architecture:

Changing architecture updates the editable rate card unless Custom pricing is selected.

Region price profile:

Use US East for AWS public examples, or pick a premium/custom profile before editing exact regional rates.

Custom regional multiplier:

Set 1.00 for US East parity; raise it when your target region carries a published premium.

Monthly invocations:

Use billed invocations from CloudWatch, Cost Explorer, or your forecast.

Average billed duration:

Enter milliseconds per invoke after runtime and extension behavior are included.

Memory allocation:

AWS Lambda supports 128 MB to 10,240 MB in 1 MB increments.

Execution model:

Use On-demand for ordinary pay-per-use functions; use Provisioned window for launch events or latency-sensitive aliases.

Provisioned concurrency:

This drives GB-second capacity charges for the whole enabled window.

Provisioned hours per month:

Use 744 for always-on, or the exact event hours for a launch window.

hours

Requests during provisioned window (%):

Monthly budget cap:

This is a planning cap before taxes, credits, CloudWatch logs, data transfer, API Gateway, and support charges.

$ / mo

Ephemeral storage allocation:

Raise this when functions use /tmp above the included 512 MB.

Model HTTP response streaming bytes

Adds Lambda response streaming charges for bytes beyond the first 6 MB per streamed request.

Model asynchronous event payload request units

Adds request-charge units for async events larger than the included 256 KB request payload window.

Apply Lambda free-tier allowances

Free-tier allowance is applied to request charges, on-demand duration, and response streaming bytes when modeled. Provisioned capacity and provisioned duration remain fully billable.

Free requests allowance:

requests

Free on-demand compute:

GB-s

Request rate:

$ / 1M requests

On-demand duration rate:

$ / GB-s

Provisioned duration rate:

$ / GB-s

Provisioned capacity rate:

$ / GB-s

Ephemeral storage rate:

$ / GB-s

Compute Savings Plans discount (%):

A single Lambda invoke can look almost costless in a test log, yet a monthly bill is built from meters that stack together. Requests add count-based charges, billed duration becomes compute usage, configured memory controls the GB-second multiplier, and optional features such as Provisioned Concurrency, extra temporary storage, and response streaming can add separate lines. Small assumptions become visible when traffic reaches millions of invokes or when a launch window keeps capacity warm for hours.

GB-seconds are the central compute unit. The meter multiplies billed time by configured memory, so a 512 MB function running for 200 ms uses half as many GB-seconds as a 1,024 MB function running for the same time. Memory is also a performance setting. More memory gives Lambda proportionally more CPU, and the faster CPU can shorten billed duration enough to offset the higher memory multiplier for CPU-bound or dependency-heavy functions. Functions that spend most of their time waiting on the network usually gain less, because the wait does not shrink with extra CPU.
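The time-times-memory rule can be sketched in a few lines of Python. The function name `gb_seconds` is illustrative and is not taken from the calculator; it only restates the relationship described above.

```python
def gb_seconds(invocations: int, billed_ms: float, memory_mb: int) -> float:
    """Return compute usage as invocations x billed seconds x memory in GB."""
    return invocations * (billed_ms / 1000) * (memory_mb / 1024)


# One invoke at 200 ms, first at 512 MB and then at 1,024 MB.
small = gb_seconds(1, 200, 512)
large = gb_seconds(1, 200, 1024)
print(small, large)  # the 1,024 MB figure is twice the 512 MB figure
```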

Lambda estimates are usually needed before a deployment, a traffic launch, or a cost review. Teams compare architectures, check whether an API backend still fits a monthly cap, decide whether Provisioned Concurrency is worth the latency premium, or find out why a small-looking function grew expensive after a batch workload started sending millions of events. The useful answer is rarely one number alone. It is the billable total plus the reason that total changed.

Provisioned Concurrency changes both latency behavior and billing shape. It keeps execution environments ready for a selected version or alias, which can reduce cold starts for interactive traffic. That readiness has a capacity charge for the configured concurrency and enabled time. Invocations that land inside the provisioned window use provisioned-duration pricing rather than the ordinary on-demand duration path.

Account-level allowances and shared pricing tiers are common sources of mistaken forecasts. The public Lambda free tier includes monthly request and compute allowances that can be consumed by other functions before the workload being estimated gets any benefit. On-demand duration tiers are also aggregated by architecture, region, and account or organization. A single-function estimate should therefore be read as a planning model, not a promise that AWS will apply every allowance to that one function.
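A minimal sketch of how a shared allowance changes a single-function estimate follows. The helper name, the 400,000 GB-second allowance, and the 300,000 GB-seconds consumed by other functions are assumptions for illustration; substitute the figures that apply to your account.

```python
def billable_after_allowance(usage: float, allowance: float,
                             used_elsewhere: float = 0.0) -> float:
    """Subtract only the part of the allowance that other functions have not consumed."""
    remaining = max(0.0, allowance - used_elsewhere)
    return max(0.0, usage - remaining)


usage_gbs = 2_250_000  # GB-seconds for the function being estimated (assumed)

# Reading the allowance as if this function were alone in the account.
isolated = billable_after_allowance(usage_gbs, 400_000)

# Same function when other workloads already used 300,000 GB-seconds (assumed).
shared = billable_after_allowance(usage_gbs, 400_000, used_elsewhere=300_000)

print(isolated, shared)
```

The gap between the two results is the amount a single-function estimate would understate if it assumed the full allowance.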

A useful Lambda cost estimate starts with billed duration from real measurements, request units that include retries and event-source behavior, the deployed memory setting, current regional rates, and a clear boundary around exclusions. API Gateway, CloudWatch Logs, data transfer, queues, databases, storage, taxes, credits, and private agreements can outweigh the Lambda charge itself, so the Lambda number belongs inside a wider architecture budget.

How to Use This Tool:

Start with a workload preset only as a shortcut. Then replace the preset values with the traffic, duration, memory, and rate assumptions for the function or forecast you need to price.

Choose the workload preset, architecture, and region price profile. Switch to custom pricing when you have exact regional rates, private terms, or a rate card from a billing export.
Enter monthly invocations, average billed duration, and memory allocation. Use billed duration from Lambda reports or a conservative forecast, not only handler timing from local tests.
Select the execution model. Use on-demand for ordinary pay-per-use functions, or choose a Provisioned Concurrency mode when a fixed warm window is part of the plan.
Enter provisioned concurrency, enabled hours, and the request share expected to land during the warm window when provisioned modeling is active. Use 744 hours only for always-on capacity.
Set a monthly budget cap when the summary should flag budget pressure for the Lambda slice of the architecture.
Open advanced settings for extra ephemeral storage, response streaming, asynchronous event payload request units, free-tier allowances, editable request and GB-second rates, and Compute Savings Plans discount.
Review the summary and Cost Ledger. If validation reports a range error, correct it before using Usage Units, Tuning Brief, charts, or JSON for a budget note.

Interpreting Results:

Total monthly cost combines request charges, on-demand duration, provisioned duration, provisioned capacity, extra ephemeral storage, response streaming overage, and any entered Compute Savings Plans adjustment. It excludes surrounding AWS services and account charges that are not part of the Lambda meters shown in the ledger.

Cost per million invokes is useful for traffic planning because it normalizes the current estimate into a unit that teams can discuss. It is not a fixed marginal rate. Free-tier exhaustion, provisioned capacity, async payload request units, streaming bytes, and memory changes can make the per-million figure rise or fall as volume changes.
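The effect of free-tier exhaustion on the per-million figure can be sketched as below. This is an on-demand-only simplification, not the calculator's code, and the rates and allowances are placeholder assumptions that should be checked against the current rate card.

```python
def monthly_cost(invocations, billed_ms, memory_mb, request_rate, duration_rate,
                 free_requests=0, free_gb_seconds=0.0):
    """On-demand only: request charge plus duration charge after allowances."""
    billable_requests = max(0, invocations - free_requests)
    gbs = invocations * (billed_ms / 1000) * (memory_mb / 1024)
    billable_gbs = max(0.0, gbs - free_gb_seconds)
    return billable_requests / 1_000_000 * request_rate + billable_gbs * duration_rate


def cost_per_million(invocations, **kwargs):
    """Normalize the monthly estimate to a per-million-invokes figure."""
    return monthly_cost(invocations, **kwargs) / (invocations / 1_000_000)


# Placeholder inputs (assumptions, not authoritative prices).
params = dict(billed_ms=180, memory_mb=512,
              request_rate=0.20,           # $ per 1M requests
              duration_rate=0.0000166667,  # $ per GB-second
              free_requests=1_000_000, free_gb_seconds=400_000)

for n in (1_000_000, 5_000_000, 25_000_000, 50_000_000):
    print(f"{n:>12,} invokes -> ${cost_per_million(n, **params):.4f} per million")
```

With these inputs the per-million figure starts at zero while usage sits inside the allowances and climbs as volume grows, which is why it should not be read as a fixed marginal rate.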

AWS Lambda cost result interpretation
Result area	What it shows	Question to ask before budgeting
`Cost Ledger`	Line-item cost for requests, duration, capacity, storage, streaming, discount, and total.	Do the visible rates match the target region, architecture, and account terms?
`Usage Units`	Raw invocations, request units, GB-seconds, free-tier use, extra storage, and budget variance.	Are retries, asynchronous payload chunks, and streamed-response bytes represented?
`Tuning Brief`	What-if deltas for shorter duration, memory right-sizing, architecture, provisioned capacity, storage, free-tier headroom, streaming, and budget cap.	Can the suggested lever be tested safely with production-like traffic?
`Lambda Cost Mix`	A chart of which Lambda meter contributes most to the monthly total.	Is the largest slice a real Lambda problem, or is another service likely to dominate total application cost?
`Invocation Cost Curve`	Monthly cost across traffic levels up to about twice the current invocation volume.	Does the curve reveal a free-tier cliff, budget crossing, or fixed provisioned-capacity floor?

Treat a low Lambda estimate as a narrow finding. A function that reads from a queue, writes logs, calls a database, serves through API Gateway, or moves data across networks can still create costs outside the ledger. Use the JSON and table exports to keep the Lambda assumptions visible when you combine them with a broader cloud estimate.

Technical Details:

Lambda duration billing multiplies elapsed billed time by configured memory. AWS bills duration in millisecond units, and billed duration can include runtime initialization, handler work, extension shutdown, and other lifecycle behavior depending on the function configuration. Because average billed duration is entered directly, rounding and lifecycle effects should already be represented in that value.

Request pricing starts with invocation count, then can expand for asynchronous event payloads. The async model counts one request unit for the first 256 KB of an event and one additional request unit for each extra 64 KB chunk, up to a 1 MB event-size boundary. Response streaming is a separate byte meter that begins after the first 6 MB per streamed request and after any entered monthly streaming allowance.
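The two rules above can be sketched as small helpers. The function names are hypothetical, and two details are assumptions rather than statements from this page: sizes use binary units (1 KB = 1,024 bytes), and a partial 64 KB chunk is rounded up to a full request unit.

```python
KB = 1024
MB = 1024 * KB


def async_request_units(payload_bytes: int) -> int:
    """One unit for the first 256 KB, plus one per extra 64 KB chunk (rounded up)."""
    if payload_bytes > 1 * MB:
        raise ValueError("payload exceeds the 1 MB boundary modeled here")
    extra = max(0, payload_bytes - 256 * KB)
    chunk = 64 * KB
    return 1 + (extra + chunk - 1) // chunk  # integer ceiling division


def streaming_overage_bytes(avg_response_bytes: int, streamed_requests: int,
                            monthly_allowance_bytes: int = 0) -> int:
    """Bytes above 6 MB per streamed request, less any monthly allowance."""
    per_request = max(0, avg_response_bytes - 6 * MB)
    return max(0, per_request * streamed_requests - monthly_allowance_bytes)


print(async_request_units(384 * KB))           # 256 KB + two 64 KB chunks
print(streaming_overage_bytes(8 * MB, 1_000))  # 2 MB of overage per request
```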

Provisioned Concurrency creates two compute meters. Configured warm capacity is billed for enabled time and memory size; AWS applies its own rounding to enabled time, which the hour value entered here does not reproduce. Invocations served inside that window use provisioned-duration pricing. Invocations outside the window remain on the on-demand duration path and can use the entered on-demand compute allowance.

Formula Core:

The main cost drivers can be reduced to a small set of monthly usage units. The variable names below are only for the formulas: N is monthly invocations, split into N_{onDemand} (served outside the provisioned window) and N_{pc} (served inside it); D is billed duration in seconds; M is configured memory in GB; and PC is provisioned concurrency.

\begin{array}{lcl}
M & = & \frac{\text{memory (MB)}}{1024} \\
D & = & \frac{\text{billed duration (ms)}}{1000} \\
\mathrm{GBs}_{onDemand} & = & N_{onDemand} \times D \times M \\
\mathrm{GBs}_{pcDuration} & = & N_{pc} \times D \times M \\
\mathrm{GBs}_{pcCapacity} & = & PC \times \text{enabled hours} \times 3600 \times M \\
\mathrm{GBs}_{tmp} & = & N \times D \times \frac{\max(0,\ \text{ephemeral (MB)} - 512)}{1024}
\end{array}
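The usage-unit formulas translate directly into code. The sketch below is not the calculator's implementation; the names `UsageUnits` and `usage_units` are hypothetical, and splitting invocations by a simple `pc_share` fraction is an assumption about how the provisioned-window request share is applied.

```python
from dataclasses import dataclass


@dataclass
class UsageUnits:
    on_demand_gbs: float
    pc_duration_gbs: float
    pc_capacity_gbs: float
    tmp_gbs: float


def usage_units(invocations, billed_ms, memory_mb, pc_share=0.0,
                provisioned_concurrency=0, enabled_hours=0.0,
                ephemeral_mb=512) -> UsageUnits:
    """Compute the four GB-second meters from the formula core."""
    m = memory_mb / 1024          # M: memory in GB
    d = billed_ms / 1000          # D: billed duration in seconds
    n_pc = invocations * pc_share  # invocations inside the provisioned window
    n_od = invocations - n_pc      # invocations on the on-demand path
    extra_tmp_gb = max(0, ephemeral_mb - 512) / 1024
    return UsageUnits(
        on_demand_gbs=n_od * d * m,
        pc_duration_gbs=n_pc * d * m,
        pc_capacity_gbs=provisioned_concurrency * enabled_hours * 3600 * m,
        tmp_gbs=invocations * d * extra_tmp_gb,
    )


# API backend shape: 25M invokes, 180 ms, 512 MB, all on demand.
api = usage_units(25_000_000, 180, 512)

# Launch window: 100 environments for 8 hours at 1.5 GB, capacity meter only.
launch = usage_units(0, 0, 1536, provisioned_concurrency=100, enabled_hours=8)

print(api.on_demand_gbs, launch.pc_capacity_gbs)
```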

Request cost is billable request units divided by one million and multiplied by the request rate. On-demand compute cost is billable on-demand GB-seconds after the entered compute allowance. Provisioned duration, provisioned capacity, and extra ephemeral storage use separate GB-second rates. The Compute Savings Plans discount is applied to duration and provisioned-capacity charges only in this model, and the slider is capped at 17%. In the total below, C_{compute} is the sum of the on-demand duration, provisioned duration, and provisioned capacity charges, and C_{savings} is the discount taken on that sum.

C_{total} = C_{requests} + C_{compute} - C_{savings} + C_{tmp} + C_{streaming}
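A minimal sketch of the total follows, assuming that the compute term is the sum of the on-demand duration, provisioned duration, and provisioned capacity charges and that the savings term is a flat percentage of that sum. The function name and argument names are illustrative.

```python
def total_cost(requests_cost: float, on_demand_duration: float,
               pc_duration: float, pc_capacity: float,
               tmp_cost: float, streaming_cost: float,
               savings_discount_pct: float = 0.0) -> float:
    """Combine line items; the discount touches only duration and capacity charges."""
    if not 0 <= savings_discount_pct <= 17:
        raise ValueError("discount is capped at 17% in this model")
    compute = on_demand_duration + pc_duration + pc_capacity
    savings = compute * savings_discount_pct / 100
    return requests_cost + compute - savings + tmp_cost + streaming_cost


# Hypothetical line items in dollars, with a 10% discount entered.
example = total_cost(requests_cost=5.00, on_demand_duration=30.00,
                     pc_duration=0.0, pc_capacity=0.0,
                     tmp_cost=1.00, streaming_cost=0.0,
                     savings_discount_pct=10)
print(round(example, 2))
```

Requests, ephemeral storage, and streaming pass through undiscounted, which matches the stated scope of the discount in this model.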

AWS Lambda modeled meters and boundaries
Meter	Modeled rule	Boundary to verify
Requests	Total request units after optional async payload expansion, then reduced by the entered request allowance.	Retries, test invokes, fan-out, and event-source behavior can raise request units above business transactions.
On-demand duration	On-demand invocations multiplied by billed seconds and memory GB, then reduced by the entered on-demand compute allowance.	Provisioned-window invocations do not consume this compute allowance in the model.
Provisioned duration	Invocations assigned to the provisioned window multiplied by billed seconds, memory GB, and the provisioned-duration rate.	Traffic that exceeds configured provisioned capacity may run as ordinary on-demand execution in AWS.
Provisioned capacity	Configured concurrency multiplied by enabled seconds and memory GB.	AWS rounds enabled time up to the nearest five minutes, so enter short windows conservatively.
Additional ephemeral storage	Only storage above the included 512 MB is multiplied by duration and invocation count.	ETL, media, archive, and machine-learning workloads are more likely to make `/tmp` storage visible.
Response streaming	Streamed bytes above 6 MB per streamed request, reduced by the entered monthly streaming allowance, then multiplied by the per-GB rate.	Streaming can still bill full function duration even when a client disconnects.

For the default API backend shape of 25 million monthly requests, 180 ms billed duration, and 512 MB memory, raw on-demand compute is 25,000,000 x 0.18 x 0.5 = 2,250,000 GB-seconds before the entered compute allowance. Doubling memory to 1,024 MB doubles the memory factor, but total cost can still fall if the extra CPU cuts duration by more than half.
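The break-even point for a memory change can be computed directly. In the sketch below the helper name is hypothetical and the 80 ms duration at 1,024 MB is an assumed measurement, not a result from this page.

```python
def break_even_duration_ms(old_ms: float, old_mb: int, new_mb: int) -> float:
    """Duration at the new memory size that yields the same GB-seconds per invoke."""
    return old_ms * old_mb / new_mb


def compute_gbs(invocations: int, billed_ms: float, memory_mb: int) -> float:
    return invocations * (billed_ms / 1000) * (memory_mb / 1024)


target_ms = break_even_duration_ms(180, 512, 1024)  # half of 180 ms

base = compute_gbs(25_000_000, 180, 512)
faster = compute_gbs(25_000_000, 80, 1024)  # assumed: extra CPU brings 180 ms down to 80 ms

print(target_ms, base, faster)
```

Any measured duration below the break-even value at the larger size lowers the compute meter; anything above it raises the meter.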

Limitations and Accuracy Notes:

Cloud prices vary by region, architecture, date, account agreement, usage tier, and credits. The built-in rates are planning defaults and every rate is editable. Check the current AWS pricing page, Cost Explorer, billing exports, and private agreements before committing a budget or publishing a forecast.

The calculation uses values entered on the page. It does not connect to an AWS account, read CloudWatch metrics, discover consumed free tier, inspect function settings, or verify a private rate card. Enter any allowance already consumed elsewhere as a reduced allowance, or turn the allowance off for a conservative estimate.

Several Lambda-adjacent and newer pricing areas are outside this calculator, including SnapStart snapshot cache and restore charges, Lambda Managed Instances, durable functions, event-source mapping provisioned mode, CloudWatch Logs, API Gateway, NAT gateway, data transfer, queues, databases, storage, taxes, and support charges.

Worked Examples:

API backend on demand

At 25 million monthly invokes, 180 ms billed duration, and 512 MB memory, the compute meter is 25,000,000 x 0.18 x 0.5 = 2,250,000 GB-seconds. Request charges and free-tier allowances are then applied separately, so the ledger can show billable requests even when part of the compute meter is still covered by the entered allowance.

Launch window with Provisioned Concurrency

A launch event with 100 provisioned environments for 8 hours at 1.5 GB memory creates 100 x 8 x 3,600 x 1.5 = 4,320,000 provisioned-capacity GB-seconds before any request duration is added. That fixed line can dominate a short campaign, even when each invocation is fast.

Async event payload expansion

If an asynchronous event averages 384 KB, the first 256 KB counts as one request unit and the extra 128 KB spans two additional 64 KB chunks. Each async event is therefore modeled as three request units while compute invocations remain unchanged.

Large streamed responses

With response streaming enabled and an average streamed response of 8 MB, only 2 MB per streamed request is treated as overage before the entered monthly streaming allowance. Keeping average streamed responses at or below 6 MB avoids that modeled byte overage, although ordinary duration and request charges still apply.

FAQ:

Should I use average duration or a percentile duration?

Use the statistic that matches the decision. Average billed duration is usually best for reconciling monthly spend. A higher percentile or conservative blend is safer for prelaunch planning when cold starts, retries, or long-tail requests are not yet known.
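The difference between the two statistics is easy to see on a small sample. The durations below are made up for illustration, with one slow invoke standing in for a cold start; real values would come from your own Lambda reports.

```python
import statistics

# Hypothetical billed durations in ms; the 900 ms entry mimics a cold start.
billed_ms = [120, 130, 125, 140, 135, 128, 132, 900, 127, 131]

mean_ms = statistics.mean(billed_ms)
median_ms = statistics.median(billed_ms)

# quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
p95_ms = statistics.quantiles(billed_ms, n=20, method="inclusive")[18]

print(f"median={median_ms} mean={mean_ms} p95={p95_ms}")
```

The mean multiplied by invocation count reproduces the total billed time of the sample, which is why it suits spend reconciliation, while the higher percentile gives a more cautious planning input.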

Why can lower memory increase cost?

Lower memory reduces the GB-second multiplier, but it also reduces available CPU. If the function becomes much slower, the longer duration can offset or exceed the cheaper memory setting.

Does the estimate include surrounding AWS services?

No. The result is limited to the Lambda meters represented in the ledger. Add API Gateway, CloudWatch Logs, data transfer, event sources, storage, queues, databases, and support charges separately.

When should Provisioned Concurrency be modeled?

Model it when a version or alias must stay warm for latency-sensitive traffic, especially around launch windows or predictable peaks. Do not add it only to reduce cost, because it introduces a fixed capacity line.

Are AWS account details used?

No. The calculator uses the values entered on the page and does not fetch AWS usage, credentials, CloudWatch metrics, or private pricing.
