Cache TTL Hit Rate Calculator

Cache name:

Names the cache population in the exported plan.

Requests per minute:

Use the cacheable request rate after routing and bypass rules.

/min

Unique keys per minute:

Use distinct normalized cache keys, not raw URLs when query normalization applies.

keys/min

TTL:

Set the freshness window being considered for this cache rule.

min

Target hit rate:

Set the operating target for the guardrail ledger and current-TTL marker.

TTL behavior:

Choose the behavior closest to the cache being modeled.

Fixed expiry Sliding expiry

Origin change interval:

Used to flag freshness risk when TTL approaches update cadence.

min

Invalidation penalty:

Leave at 0 when purges are rare or already reflected in observed miss data.

Origin latency:

Use measured origin response latency for a representative miss.

Cache latency:

Use observed edge/cache latency for a representative hit.



Cache hit rate is usually discussed as a single percentage, but the percentage is only the visible result of several moving parts. A request can be cacheable and still miss because the object has expired, the cache key is too specific, a purge cleared the entry, or a rule bypassed the cache for that request. Time to live, usually shortened to TTL, is the freshness window that decides how long a stored response may be reused before the cache must treat it as expired or revalidate it.

The useful question is not simply whether a TTL is "long enough." The same 15 minute TTL can be generous for a popular product image and nearly useless for an API route whose requests are split by user, language, tracking query, device header, or authorization state. Hit rate improves when many requests ask for the same normalized key while the cached object is still fresh. It weakens when traffic spreads across many keys, when objects change faster than the TTL, or when operational purges erase entries before normal expiry.

Request volume: The number of cacheable requests that actually reach the cache rule in a minute.
Key reuse: How often those requests repeat the same normalized cache keys instead of creating one-off entries.
Freshness window: The TTL or expiry behavior that keeps an object reusable before it becomes stale.
Invalidation loss: Hit opportunity removed by purges, deploy clears, bypass rules, or other events outside normal expiry.

Fixed and sliding TTLs answer different operational questions. A fixed TTL starts when the cache stores the object and expires after that duration, which matches many HTTP and CDN freshness decisions. A sliding TTL refreshes the expiry after access, so a frequently used object can remain cached as long as requests keep arriving. Mixing those ideas can make a cache plan look healthier than the real system. A CDN behavior controlled by HTTP freshness headers is usually closer to fixed expiry, while some application-memory caches and session-like stores are closer to sliding expiry.
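The difference between the two expiry behaviors can be sketched with a toy in-memory cache. This is an illustration only, not how any particular CDN or library implements expiry; the `TtlCache` class, the `count_hits` helper, and the access times are all hypothetical, and time is passed in explicitly (in minutes) so the example stays deterministic.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Entry:
    value: str
    expires_at: float  # absolute time in minutes


class TtlCache:
    """Toy cache (hypothetical). With sliding=True every hit renews the expiry."""

    def __init__(self, ttl_minutes: float, sliding: bool = False) -> None:
        self.ttl = ttl_minutes
        self.sliding = sliding
        self._entries: Dict[str, Entry] = {}

    def put(self, key: str, value: str, now: float) -> None:
        # Fixed and sliding caches both start the window at fill time.
        self._entries[key] = Entry(value, now + self.ttl)

    def get(self, key: str, now: float) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None or now >= entry.expires_at:
            self._entries.pop(key, None)  # absent or expired: report a miss
            return None
        if self.sliding:
            entry.expires_at = now + self.ttl  # access renews the window
        return entry.value


def count_hits(cache: TtlCache, access_times: Iterable[float]) -> int:
    """Fill once at t=0, then replay accesses, refilling on every miss."""
    hits = 0
    cache.put("k", "v", now=0.0)
    for t in access_times:
        if cache.get("k", now=t) is None:
            cache.put("k", "v", now=t)  # miss: simulate an origin fetch
        else:
            hits += 1
    return hits


ACCESS_TIMES = [4.0, 8.0, 12.0, 16.0]  # one request every 4 minutes
fixed_hits = count_hits(TtlCache(5.0, sliding=False), ACCESS_TIMES)
sliding_hits = count_hits(TtlCache(5.0, sliding=True), ACCESS_TIMES)
print(fixed_hits, sliding_hits)
```

With a 5 minute TTL and one request every 4 minutes, the fixed cache alternates between hits and refills, while the sliding cache never lets the entry expire. That gap is why modeling a fixed-expiry CDN rule as sliding can overstate the hit rate.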

TTL planning also has a safety side. A high hit rate is useful only when the response is safe to share and remains acceptable until expiry. Private data, personalized HTML, authorization-dependent responses, and headers that vary meaningfully by user should not be forced into a shared cache just to improve the percentage. For public objects, a longer TTL may still require reliable purge coverage or versioned URLs so updates appear when they should.

A TTL hit-rate estimate is best treated as a planning model. It can show whether more reuse, better key normalization, lower purge churn, or a longer freshness window is likely to help, but production cache logs remain the authority for real traffic bursts, cold starts, regional cache tiers, eviction, and response-specific cache directives.

How to Use This Tool:

Name the cache population with Cache name. Use one route, CDN behavior, object class, or cache namespace so the inputs describe one consistent rule.
Enter Requests per minute for cacheable traffic that reaches that population after routing and bypass rules.
Enter Unique keys per minute after cache-key normalization. Count the keys the cache actually sees, not every raw URL if query strings or headers are collapsed.
Set the current TTL and choose Fixed expiry or Sliding expiry. Use fixed expiry for most HTTP and CDN freshness planning unless the cache refreshes expiry after each hit.
Set Target hit rate and Invalidation penalty. The penalty should represent would-be hits lost to purges, manual clears, bypasses, deployment churn, or other cache-clearing events.
Add Origin change interval, Origin latency, and Cache latency so the result can connect hit rate with freshness exposure and latency payoff.
Start with Cache Snapshot, compare candidate windows in TTL Scenario Grid, then check the TTL Guardrail Ledger before changing a live rule. Use TTL Hit Curve to see whether extra TTL still buys meaningful hit-rate or latency improvement.
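The comparison of candidate windows in the last step can be approximated offline. The sketch below applies the hit-rate formulas listed under Formula Core to a handful of TTL values; the function names and the candidate TTL list are assumptions chosen for illustration, and the tool's own grid may use different rows or rounding.

```python
import math
from typing import List, Sequence, Tuple


def modeled_hit_rate(lam: float, ttl_min: float, sliding: bool = False) -> float:
    """Hit rate before invalidation loss, for reuse intensity lam and TTL in minutes."""
    x = lam * ttl_min
    return 1.0 - math.exp(-x) if sliding else x / (1.0 + x)


def scenario_grid(
    requests_per_min: float,
    keys_per_min: float,
    ttls: Sequence[float],
    penalty: float = 0.0,
    sliding: bool = False,
) -> List[Tuple[float, float, float]]:
    """Return (ttl, effective hit rate, origin pulls per minute) for each TTL."""
    if keys_per_min <= 0:
        raise ValueError("unique keys per minute must be positive")
    lam = requests_per_min / keys_per_min
    rows = []
    for ttl in ttls:
        effective = modeled_hit_rate(lam, ttl, sliding) * (1.0 - penalty)
        rows.append((ttl, effective, requests_per_min * (1.0 - effective)))
    return rows


# Hypothetical candidate windows for a 12,000 req/min, 1,800 keys/min population.
grid = scenario_grid(12_000, 1_800, ttls=[1, 5, 15, 60], penalty=0.05)
for ttl, hit, pulls in grid:
    print(f"{ttl:>3} min  hit {hit:6.1%}  origin pulls {pulls:7.0f}/min")
```

Reading the rows top to bottom shows where extra TTL stops buying much: once the effective hit rate is close to the post-invalidation ceiling, longer windows mostly add freshness risk.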

The table, chart, DOCX, CSV, image, and JSON exports are useful for review notes, but the inputs should still be reconciled with real cache analytics before a production rollout.

Interpreting Results:

The headline percentage is the effective hit rate, which applies invalidation loss after the fixed or sliding TTL model. Compare it with the modeled hit rate in the snapshot to see whether the TTL is the limiting factor or whether purge and bypass loss are taking away otherwise available hits.

Origin pulls converts misses back into request volume. A route can show a respectable percentage and still send too many requests to origin when traffic is large.
Target runway explains whether the selected hit-rate goal is reachable under the current reuse and invalidation assumptions. If the target is blocked, extending TTL cannot solve the problem by itself.
Blended latency is a weighted average of cache-hit latency and origin-miss latency. It is good for comparing TTL options, not for predicting percentile latency.
Freshness exposure compares the TTL window with the origin change interval and effective hit rate. A high warning means the cache is depending heavily on stored objects while the origin may change.
Invalidation drag shows how many hit-rate points are lost after the TTL model. Large drag often points to purge design, deployment practice, or cache-key fragmentation before it points to a longer TTL.

Treat a good result as permission to test, not as proof that the cache rule is safe. Confirm that the key varies on every response-changing value, private responses are excluded, purge coverage matches real updates, and the result agrees with observed hit and miss logs.

Technical Details:

The model starts with reuse intensity: cacheable requests per minute divided by unique cache keys per minute. That value estimates how many requests, on average, arrive for the same key during each minute. A larger value means the same key is likely to be requested again before expiry. A smaller value means the cache spends more of its time filling objects that may not be reused.

Fixed TTL and sliding TTL use different probability curves. Under fixed expiry, an object has a set lifetime after fill, so the chance of a hit rises with reuse intensity and TTL but approaches its ceiling gradually. Under sliding expiry, every access renews the window, so repeated access can keep a hot object alive more aggressively. Both curves assume average request arrivals and do not model bursts, warm-up order, regional spread, or cache-size eviction.

Formula Core:

The core calculation is:

\[
\begin{array}{lcl}
\lambda & = & \dfrac{\text{requests per minute}}{\text{unique keys per minute}} \\
H_{\text{fixed}} & = & \dfrac{\lambda \times T}{1 + \lambda \times T} \\
H_{\text{sliding}} & = & 1 - e^{-\lambda \times T} \\
H_{\text{effective}} & = & H_{\text{modeled}} \times (1 - p) \\
\text{origin pulls} & = & \text{requests per minute} \times (1 - H_{\text{effective}}) \\
\text{blended latency} & = & H_{\text{effective}} \times L_{\text{cache}} + (1 - H_{\text{effective}}) \times L_{\text{origin}}
\end{array}
\]

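λ is requests per key per minute, T is TTL in minutes, and p is invalidation penalty as a fraction. H_modeled is H_fixed or H_sliding, depending on the selected TTL behavior. L_cache and L_origin are the cache-hit and origin-miss latencies. Effective hit rate is the result to compare against the target because it includes loss after normal TTL behavior.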
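The formulas above translate directly into a few lines of code. The following is a minimal sketch, not the tool's actual implementation; the `CacheEstimate` type, the `estimate` function, and its parameter names are hypothetical, and the input values are taken from the product-card worked example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheEstimate:
    reuse_intensity: float       # lambda, requests per key per minute
    modeled_hit_rate: float      # H_fixed or H_sliding
    effective_hit_rate: float    # after invalidation penalty
    origin_pulls_per_min: float
    blended_latency_ms: float


def estimate(
    requests_per_min: float,
    unique_keys_per_min: float,
    ttl_min: float,
    penalty: float,
    origin_ms: float,
    cache_ms: float,
    sliding: bool = False,
) -> CacheEstimate:
    """Apply the formula core; penalty is a fraction between 0 and 1."""
    if unique_keys_per_min <= 0:
        raise ValueError("unique keys per minute must be positive")
    lam = requests_per_min / unique_keys_per_min
    x = lam * ttl_min
    modeled = 1.0 - math.exp(-x) if sliding else x / (1.0 + x)
    effective = modeled * (1.0 - penalty)
    pulls = requests_per_min * (1.0 - effective)
    blended = effective * cache_ms + (1.0 - effective) * origin_ms
    return CacheEstimate(lam, modeled, effective, pulls, blended)


example = estimate(12_000, 1_800, ttl_min=15, penalty=0.05,
                   origin_ms=220, cache_ms=35)
print(f"{example.effective_hit_rate:.1%}, "
      f"{example.origin_pulls_per_min:.0f} pulls/min, "
      f"{example.blended_latency_ms:.0f} ms")
```

Passing `sliding=True` switches to the exponential curve while leaving the penalty, origin-pull, and latency steps unchanged.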

Cache TTL input effects and checks
Quantity	Effect on the model	Common measurement mistake
Requests per minute	Sets the total cacheable request stream and scales hit and miss counts.	Including non-cacheable, private, health-check, or intentionally bypassed traffic.
Unique keys per minute	Controls reuse intensity; fewer keys make each TTL minute more valuable.	Counting raw URLs when the cache normalizes query strings, cookies, or headers differently.
TTL	Extends or shrinks the reuse window. A zero-minute TTL produces no modeled hits.	Using browser-cache TTL when the question is edge-cache behavior, or the reverse.
Invalidation penalty	Caps effective hit rate after the TTL curve is calculated.	Leaving purge churn at zero when deploys or content updates regularly clear the cache.
Origin change interval	Feeds the freshness exposure guardrail by comparing update cadence with TTL.	Using a whole-site deploy interval for an object class that changes on a different schedule.
Latency values	Convert the hit and miss mix into blended latency and estimated milliseconds saved.	Mixing percentile latency with average hit-rate math, or measuring cache and origin paths from different clients.

Target TTL and Boundaries:

The target TTL is solved by reversing the selected hit-rate curve after accounting for invalidation loss. The target is unreachable when the requested effective hit rate is greater than or equal to the post-invalidation ceiling. For example, a 5% invalidation penalty leaves a maximum effective hit rate below 95%, so a 98% target is blocked no matter how long the TTL becomes.

\[
\begin{array}{lcl}
A & = & 1 - p \\
q & = & \dfrac{H_{\text{target}}}{A} \\
T_{\text{fixed}} & = & \dfrac{q}{(1 - q) \times \lambda} \\
T_{\text{sliding}} & = & \dfrac{-\ln(1 - q)}{\lambda}
\end{array}
\]

A is the effective hit-rate ceiling after invalidation loss. q is the target converted back to the theoretical curve before loss is applied. The reverse formula is used only when the target is below the ceiling and reuse intensity is greater than zero.
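The reverse calculation and its boundary check can be sketched as follows. The function name `target_ttl_minutes` and the choice to return `None` for a blocked target are assumptions made for this example; the tool may report a blocked target differently.

```python
import math
from typing import Optional


def target_ttl_minutes(
    target: float,
    penalty: float,
    lam: float,
    sliding: bool = False,
) -> Optional[float]:
    """TTL needed to reach an effective hit-rate target, or None if blocked.

    target and penalty are fractions; lam is requests per key per minute.
    """
    ceiling = 1.0 - penalty          # A: effective hit-rate ceiling
    if lam <= 0 or target >= ceiling:
        return None                  # unreachable at any TTL
    q = target / ceiling             # target on the pre-loss curve
    if sliding:
        return -math.log(1.0 - q) / lam
    return q / ((1.0 - q) * lam)


lam = 12_000 / 1_800
ttl_for_90 = target_ttl_minutes(0.90, 0.05, lam)   # fixed expiry
ttl_for_98 = target_ttl_minutes(0.98, 0.05, lam)   # above the 95% ceiling
print(ttl_for_90, ttl_for_98)
```

The early return mirrors the rule that the reverse formula applies only when the target is below the ceiling and reuse intensity is greater than zero.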

Freshness exposure thresholds
Freshness exposure	Label	Meaning
Below 35%	Freshness steady	The TTL is small relative to update cadence after hit-rate weighting.
35% to below 70%	Freshness watch	Further TTL increases should be checked against real content changes and purge coverage.
70% or above	High freshness pressure	The cache may be relying heavily on stored objects while the origin changes often enough to matter.
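The threshold table maps onto a simple lookup. The sketch below only labels a freshness exposure percentage that has already been computed; the text does not give the exposure formula itself, so the function takes the percentage as an input rather than guessing how it is derived. The name `freshness_label` is hypothetical.

```python
def freshness_label(exposure_pct: float) -> str:
    """Map a freshness exposure percentage to the label from the table."""
    if exposure_pct < 35:
        return "Freshness steady"
    if exposure_pct < 70:
        return "Freshness watch"
    return "High freshness pressure"


for pct in (10, 35, 69.9, 70):
    print(pct, freshness_label(pct))
```

Both boundaries are inclusive on the upper band, so exactly 35% reads as watch and exactly 70% reads as high pressure.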

HTTP cache behavior also depends on response directives, shared-cache rules, browser behavior, and provider-specific minimum, default, and maximum TTL settings. The model does not simulate validation requests, stale-while-revalidate behavior, collapsed forwarding, origin shielding, regional cache hierarchy, eviction pressure, or personalized response safety. Use it to narrow the likely TTL range, then confirm with production headers and cache analytics.

Limitations and Privacy:

The calculation runs in the browser from the values entered on the page. It does not need a server-side lookup to produce the hit-rate model, scenario rows, guardrail table, chart, or JSON report.

The estimate assumes average behavior for one cache population. It cannot detect unsafe cacheability, missing Vary behavior, accidental caching of private responses, cache eviction, traffic bursts, bot spikes, or provider-specific rule conflicts. For sensitive or user-specific responses, cache eligibility should be reviewed before any TTL tuning exercise.

Worked Examples:

Product-card edge cache. With 12,000 cacheable requests per minute, 1,800 unique keys per minute, a 15 minute fixed TTL, 5% invalidation penalty, 220 ms origin latency, and 35 ms cache latency, reuse intensity is about 6.67 requests per key per minute. The effective hit rate is about 94.1%, origin pulls are about 713/min, blended latency is about 46 ms, and the TTL needed for a 90% target is roughly 2.7 minutes.
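For readers who want to check the arithmetic, the product-card figures can be reproduced step by step from the formulas in Technical Details. Intermediate values are rounded for display, so the last digit may differ slightly from an unrounded calculation.

```latex
\[
\begin{aligned}
\lambda &= \frac{12000}{1800} \approx 6.67 \\
\lambda T &= \frac{12000}{1800} \times 15 = 100 \\
H_{\text{fixed}} &= \frac{100}{1 + 100} \approx 0.9901 \\
H_{\text{effective}} &= 0.9901 \times (1 - 0.05) \approx 0.9406 \\
\text{origin pulls} &= 12000 \times (1 - 0.9406) \approx 713 \text{ per minute} \\
\text{blended latency} &= 0.9406 \times 35 + 0.0594 \times 220 \approx 32.9 + 13.1 = 46.0 \text{ ms} \\
q &= \frac{0.90}{0.95} \approx 0.9474 \\
T_{\text{fixed}} &= \frac{0.9474}{(1 - 0.9474) \times 6.67} \approx 2.7 \text{ minutes}
\end{aligned}
\]
```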

High traffic with weak reuse. A route with 1,200 requests per minute, 6,000 unique keys per minute, the same 15 minute fixed TTL, and the same 5% invalidation penalty has a reuse intensity of only 0.20 requests per key per minute. The modeled hit rate is 75%, and the effective hit rate lands around 71.3% after invalidation loss, so key normalization, variant cleanup, or fill coalescing may matter more than simply adding more TTL.

Target blocked by invalidation. If the target hit rate is 98% but invalidation loss is 5%, the post-invalidation ceiling of 95% is below the target. The target can only become reachable by reducing purge or bypass loss, improving cache eligibility, or lowering the target.

FAQ:

Why can a busy route still have a poor cache hit rate?

Traffic helps only when requests repeat the same normalized keys. Tracking query strings, user-specific cookies, device variants, or unnecessary header variation can turn one busy route into many low-reuse objects.
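A small normalization sketch shows how much fragmentation tracking parameters can cause. This is an illustration under stated assumptions, not a description of any provider's cache-key rules: the `TRACKING_PARAMS` set, the `normalize_key` function, and the sample URLs are hypothetical, and the host is ignored for brevity.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical list of parameters that do not change the response.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}


def normalize_key(url: str) -> str:
    """Drop tracking parameters and sort the rest so equivalent URLs share a key."""
    parts = urlsplit(url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in TRACKING_PARAMS
    )
    query = urlencode(kept)
    return parts.path + ("?" + query if query else "")


raw_urls = [
    "/products/42?utm_source=mail&color=red",
    "/products/42?color=red&utm_source=ads",
    "/products/42?color=red",
    "/products/42?color=red&fbclid=abc123",
]
unique_raw = len(set(raw_urls))
unique_normalized = len({normalize_key(u) for u in raw_urls})
print(unique_raw, unique_normalized)
```

Four raw URLs collapse to one key here, which is the distinction the Unique keys per minute input is meant to capture. Only strip a parameter when the origin truly returns the same response without it.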

Should fixed TTL or sliding TTL be used for CDN planning?

Use fixed TTL for most HTTP and CDN cache planning because freshness usually starts when the object is stored. Use sliding TTL only when the cache actually renews expiry after access.

Can a longer TTL always reach the target hit rate?

No. Invalidation loss creates a ceiling. If the target is at or above the remaining post-invalidation opportunity, the target stays blocked until purge or bypass loss is reduced or the target is lowered.

Is blended latency the same as p95 or p99 latency?

No. Blended latency is an average-style comparison based on the hit and miss mix. Tail latency depends on traffic shape, origin behavior, network path, and cache provider behavior.
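A deterministic toy sample makes the gap concrete. The sketch assumes every hit takes exactly 35 ms and every miss exactly 220 ms, which real traffic never does; the sample counts and the `nearest_rank_percentile` helper are chosen for illustration only.

```python
import math
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# 1,000 requests: 94.1% hits at 35 ms, 5.9% misses at 220 ms (assumed fixed latencies).
samples = [35.0] * 941 + [220.0] * 59

blended = sum(samples) / len(samples)
p50 = nearest_rank_percentile(samples, 50)
p95 = nearest_rank_percentile(samples, 95)
print(f"blended {blended:.1f} ms, p50 {p50:.0f} ms, p95 {p95:.0f} ms")
```

Because more than 5% of requests miss, the 95th percentile sits at the origin latency even though the blended figure is close to the cache latency.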

Does freshness exposure prove users will see stale content?

No. It is a guardrail that highlights risky TTL choices. Actual stale exposure depends on object updates, validation, purge timing, versioning, and whether clients or shared caches already hold older responses.

Glossary:

Cache key: The normalized identifier a cache uses to decide whether two requests can share the same stored object.
TTL: Time to live, the freshness duration assigned to a cached object before it expires or needs validation.
Hit rate: The share of cacheable requests served from cache instead of fetched from origin.
Invalidation penalty: The modeled share of hit opportunity removed by purges, bypasses, clears, or similar events.
Freshness exposure: A guardrail that compares TTL with origin update cadence and weights the result by effective cache hits.
