Log Sampling Rate Calculator

Incoming log rate:

Use the unsampled event rate before agent filters, index exclusions, or storage tiering.

Average event size:

Use the measured average bytes per log line or the vendor-reported event size.

Target daily volume:

Enter the daily budget for retained logs from this stream.

Export label:

Optional stream, service, index, or environment label for handoff exports.

Full-fidelity reserve:

Use 0 when all logs can share the same probability; reserve space for errors, audit, or security logs when needed.

Burst multiplier:

Use 1 for average rate, or higher values such as 1.5 to size against peaks.

Index overhead:

Leave 0 when the average event size already includes storage overhead.

Log sampling reduces stored telemetry by keeping a planned share of events and dropping the rest before retention costs grow out of control. The number that matters is not the raw event count alone but the retained daily volume after event size, burst planning, storage overhead, and any always-kept logs are accounted for.

Sampling is most useful for high-volume streams whose routine messages are valuable for trends but too expensive to keep in full. Access logs, debug-heavy service logs, and repeated health-check noise are common candidates. Security, audit, error, and incident-specific records often need different treatment because losing a rare event can be more costly than storing extra data.

[Figure: flow diagram showing raw log volume split into an always-kept reserve and a sampled stream, then compared with a daily storage target.]

The main caution is statistical. A 10% sampling rate does not promise that every rare failure appears in the stored logs. It means roughly one event out of ten is retained from the sampleable portion over a large enough stream. Low-volume incidents, one-off security events, and audit trails need deterministic keep rules, a separate route, or a higher budget.

A good sampling plan therefore keeps two ideas separate in practice: what must always be retained, and what can be represented by a probability. The daily volume target should be checked after both parts are combined.
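The statistical caution can be made concrete with a short calculation. The sketch below assumes each sampleable event is kept independently with the same probability, which is an assumption about the sampler rather than something this page specifies. The function name `chance_all_missed` is illustrative.

```python
def chance_all_missed(keep_rate: float, occurrences: int) -> float:
    """Probability that none of `occurrences` events is retained.

    Assumes every event is sampled independently with probability `keep_rate`.
    """
    if not 0.0 <= keep_rate <= 1.0:
        raise ValueError("keep_rate must be between 0 and 1")
    if occurrences < 0:
        raise ValueError("occurrences must not be negative")
    return (1.0 - keep_rate) ** occurrences


# How likely is it that a rare failure leaves no trace at a 10% keep rate?
for lines_logged in (1, 5, 20, 50):
    missed = chance_all_missed(0.10, lines_logged)
    print(f"{lines_logged:>3} log lines: {missed:.1%} chance that none are kept")
```

Under that assumption, a failure that writes only five log lines is more likely than not to vanish entirely at a 10% keep rate, which is why rare classes belong under deterministic keep rules.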

Technical Details:

Daily log volume comes from the event rate multiplied by the average stored bytes per event and by the number of seconds in a day. A burst multiplier raises the modeled rate before sizing, and an index overhead percent increases the stored byte estimate when labels, indexes, or metadata are not already included in the average event size.

The full-fidelity reserve is handled before probability sampling. If 5% of the stream must always be kept, that portion consumes budget first. The remaining sampleable traffic gets the calculated keep rate. When the reserve alone is larger than the target daily volume, lowering the probabilistic rate cannot make the plan fit.

Formula Core

The calculation uses bytes internally. Displayed storage units use binary multiples, so 1 GiB equals 1,073,741,824 bytes.

\[
\begin{array}{lcl}
\text{Modeled events per second} & = & \text{incoming events per second} \times \text{burst multiplier} \\
\text{Raw daily bytes} & = & \text{modeled events per second} \times 86400 \times \text{average event bytes} \times \left(1 + \frac{\text{overhead percent}}{100}\right) \\
\text{Protected bytes} & = & \text{raw daily bytes} \times \frac{\text{full-fidelity reserve percent}}{100} \\
\text{Sampleable bytes} & = & \text{raw daily bytes} - \text{protected bytes} \\
\text{Sampleable keep percent} & = & \operatorname{clamp}\left(\frac{\text{target daily bytes} - \text{protected bytes}}{\text{raw daily bytes} - \text{protected bytes}} \times 100,\ 0,\ 100\right) \\
\text{Retained daily bytes} & = & \text{protected bytes} + \text{sampleable bytes} \times \frac{\text{sampleable keep percent}}{100}
\end{array}
\]

For the default values, 18,000 events/sec at 950 bytes per event produces about 1,375.97 GiB/day before sampling. A 120 GiB/day target with no reserve requires a sampleable keep rate of about 8.72%, which is an agent rate near 0.0872 and roughly 1 in 11.5 sampleable events.
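The formula core translates almost line for line into code. The following is a minimal Python sketch, not the calculator's own implementation; `plan_sampling`, `SamplingPlan`, and the parameter names are hypothetical, and the guard for an empty sampleable portion is an added assumption.

```python
from dataclasses import dataclass

GIB = 1024 ** 3          # binary unit: 1 GiB = 1,073,741,824 bytes
SECONDS_PER_DAY = 86_400


@dataclass
class SamplingPlan:
    raw_daily_bytes: float
    protected_bytes: float
    keep_percent: float
    retained_daily_bytes: float


def plan_sampling(events_per_sec: float, avg_event_bytes: float,
                  target_daily_bytes: float, reserve_percent: float = 0.0,
                  burst_multiplier: float = 1.0,
                  overhead_percent: float = 0.0) -> SamplingPlan:
    """Apply the formula core step by step, working in bytes throughout."""
    modeled_eps = events_per_sec * burst_multiplier
    raw = (modeled_eps * SECONDS_PER_DAY * avg_event_bytes
           * (1 + overhead_percent / 100))
    protected = raw * reserve_percent / 100
    sampleable = raw - protected
    if sampleable <= 0:
        keep = 0.0  # assumption: nothing left to sample
    else:
        keep = (target_daily_bytes - protected) / sampleable * 100
        keep = min(100.0, max(0.0, keep))  # clamp to 0..100
    retained = protected + sampleable * keep / 100
    return SamplingPlan(raw, protected, keep, retained)


# Default values from the text: 18,000 events/sec, 950 bytes, 120 GiB/day.
plan = plan_sampling(18_000, 950, 120 * GIB)
print(f"raw: {plan.raw_daily_bytes / GIB:.2f} GiB/day")
print(f"sampleable keep rate: {plan.keep_percent:.2f}%")
```

Working in bytes and converting only for display keeps the unit handling in one place, which mirrors the statement that the calculation uses bytes internally.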

Input handling for the log sampling rate calculation
Input	Accepted range or unit	How it affects the result
`Incoming log rate`	Greater than zero, entered as events/sec, events/min, events/hour, or events/day.	Converted to events/sec before burst sizing and daily volume math.
`Average event size`	Greater than zero, entered as bytes/event, KiB/event, or MiB/event.	Multiplies the event count to estimate raw stored bytes.
`Target daily volume`	Greater than zero, entered as MiB/day, GiB/day, or TiB/day.	Sets the retained storage budget that the sampling rate tries to meet.
`Full-fidelity reserve`	0% to 95%.	Consumes budget before probability sampling is applied to the rest of the stream.
`Burst multiplier`	Greater than zero.	Raises or lowers the modeled event rate for peak-aware sizing.
`Index overhead`	0% to 300%.	Adds storage overhead when average event size does not already include index or metadata cost.
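The accepted ranges and units in the table above could be enforced before any math runs. This is a sketch under the assumption that inputs arrive as a number plus a unit string; the dictionary names and `normalize_inputs` are hypothetical and do not describe the page's actual code.

```python
# Seconds covered by one unit of each rate option.
RATE_SECONDS = {"events/sec": 1, "events/min": 60,
                "events/hour": 3_600, "events/day": 86_400}
SIZE_BYTES = {"bytes/event": 1, "KiB/event": 1024, "MiB/event": 1024 ** 2}
VOLUME_BYTES = {"MiB/day": 1024 ** 2, "GiB/day": 1024 ** 3,
                "TiB/day": 1024 ** 4}


def normalize_inputs(rate, rate_unit, size, size_unit, target, target_unit,
                     reserve_percent=0.0, burst_multiplier=1.0,
                     overhead_percent=0.0):
    """Validate the documented ranges and convert to events/sec and bytes."""
    if min(rate, size, target, burst_multiplier) <= 0:
        raise ValueError("rate, size, target, and burst must be > 0")
    if not 0 <= reserve_percent <= 95:
        raise ValueError("full-fidelity reserve must be 0% to 95%")
    if not 0 <= overhead_percent <= 300:
        raise ValueError("index overhead must be 0% to 300%")
    return {
        "events_per_sec": rate / RATE_SECONDS[rate_unit],
        "avg_event_bytes": size * SIZE_BYTES[size_unit],
        "target_daily_bytes": target * VOLUME_BYTES[target_unit],
        "reserve_percent": reserve_percent,
        "burst_multiplier": burst_multiplier,
        "overhead_percent": overhead_percent,
    }


# 1,080,000 events/min is the same stream as 18,000 events/sec.
inputs = normalize_inputs(1_080_000, "events/min", 950, "bytes/event",
                          120, "GiB/day")
print(inputs["events_per_sec"], inputs["target_daily_bytes"])
```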

Important result states and boundaries
Result state	Boundary	Meaning
`full fidelity fits`	Sampleable keep rate is effectively 100%.	The modeled stream fits the target without probabilistic sampling.
`sampling planned`	Sampleable keep rate is at least 10% and below 100%.	The target can be met with a moderate sampling rate.
`tight budget`	Sampleable keep rate is below 10% and at least 1%.	Routine trends may remain visible, but rare events need explicit keep rules.
`sparse baseline`	Sampleable keep rate is below 1%.	The retained baseline is very thin and can miss low-frequency behavior.
`budget conflict`	Target daily volume is below the protected volume.	The reserve, target, or routing plan must change because probability sampling cannot fix it.
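The boundaries in the table above can be expressed as a small classifier. The sketch below makes two assumptions that the page does not spell out: "effectively 100%" is treated as a clamped keep rate of exactly 100, and the budget conflict check runs before the rate checks. The function name `classify_result` is hypothetical.

```python
def classify_result(keep_percent: float, target_bytes: float,
                    protected_bytes: float) -> str:
    """Map a computed plan onto the result states listed in the table."""
    if target_bytes < protected_bytes:
        return "budget conflict"      # the reserve alone exceeds the target
    if keep_percent >= 100.0:
        return "full fidelity fits"   # assumption: clamped value of 100
    if keep_percent >= 10.0:
        return "sampling planned"
    if keep_percent >= 1.0:
        return "tight budget"
    return "sparse baseline"


# Keep rates and volumes (in GiB/day) from the worked examples on this page.
print(classify_result(8.72, 120, 0))        # busy access stream
print(classify_result(40.16, 120, 13.90))   # protected errors example
print(classify_result(0.0, 20, 241.40))     # reserve larger than target
```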

Everyday Use & Decision Guide:

Use measured counters when possible. A vendor usage screen, collector metric, or short aggregation query is better than a guess because the result changes directly with event rate and average event size. If the average event size already reflects stored bytes, leave Index overhead at zero. If it is only the raw line or JSON payload size, add the expected overhead before using the result as a quota plan.

Set Target daily volume to the budget for this stream, not the account-wide quota unless the stream owns the whole quota. Add Full-fidelity reserve when specific categories must bypass sampling. Error logs, audit entries, security findings, and billing events usually need exact keep rules before a broad probability is applied to high-volume routine traffic.

Read Sampleable keep rate as the percent to apply after always-keep rules.
Use Agent sampling value when your collector expects a decimal rate such as 0.0872.
Use Overall retained rate for capacity planning because it includes the reserve plus sampled traffic.
Check Sampling Budget Curve when you need to explain how a different keep percent changes retained GiB/day.
Copy or download Sampling Budget, Keep Policy, or JSON when the recommendation needs to move into a change ticket or runbook.
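To explain how retained volume moves with the keep percent, the points behind a budget curve can be recomputed by hand. The sketch below assumes the curve is the reserve plus a linear share of the sampleable volume, as the formula core implies; `budget_curve` is a hypothetical name and the chart on the page may pick different points.

```python
def budget_curve(raw_gib: float, reserve_percent: float, keep_percents):
    """Return (keep percent, retained GiB/day) pairs for candidate rates."""
    protected = raw_gib * reserve_percent / 100
    sampleable = raw_gib - protected
    return [(k, protected + sampleable * k / 100) for k in keep_percents]


TARGET_GIB = 120
# Raw daily volume for the default stream, with no reserve.
for keep, retained in budget_curve(1375.97, 0.0, [1, 5, 8.72, 10, 25, 50, 100]):
    marker = "within target" if retained <= TARGET_GIB else "over target"
    print(f"{keep:>6}% keep -> {retained:9.2f} GiB/day ({marker})")
```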

Treat very low rates as a warning, not merely a cheaper setting. If the result says sparse baseline, use deterministic rules for rare classes or increase the target before relying on the sampled stream for incident investigation.

Step-by-Step Guide:

Start with the measured stream average, then adjust only the planning assumptions that are known for the rollout.

1. Enter the pre-sampling count in Incoming log rate and choose the matching unit. The summary should leave the "needs input" state once the value is greater than zero.
2. Enter the measured Average event size. If the usage source reports stored bytes, keep the same unit family and do not add duplicate overhead.
3. Set Target daily volume to the retained budget for this stream. The summary badge shows the target as a daily storage amount.
4. Open Advanced when the stream has protected categories, burst planning, or index overhead. Set Full-fidelity reserve, Burst multiplier, and Index overhead only when those assumptions are part of the plan.
5. Read the summary line and badge. If it reports "Protected logs exceed budget", lower the reserve, raise the target, or move protected logs to another route before using the sampling rate.
6. Open Sampling Budget for raw volume, retained volume, dropped volume, and the agent rate. Open Keep Policy for the operator action tied to reserve, sparse baseline risk, burst planning, and overhead.
7. Use Sampling Budget Curve to compare candidate keep rates against the target line. Download the chart or CSV when the review needs evidence.
8. Use JSON when another workflow needs the inputs, recommendation, policy rows, and chart points in a structured handoff.

A complete handoff includes the stream label, incoming rate, average event size, target volume, reserve percent, burst multiplier, sampleable keep rate, overall retained rate, and retained GiB/day.
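A handoff with those fields might look like the structure below. This is only an illustration: the key names, the `build_handoff` helper, and the `edge-access` label are hypothetical and are not the calculator's actual JSON export schema. Index overhead is left out to match the field list above, so the sketch assumes the average event size already includes it.

```python
import json

GIB = 1024 ** 3


def build_handoff(label, events_per_sec, avg_event_bytes, target_gib,
                  reserve_percent=0.0, burst_multiplier=1.0):
    """Assemble the handoff fields listed in the text (hypothetical keys)."""
    raw = events_per_sec * burst_multiplier * 86_400 * avg_event_bytes
    protected = raw * reserve_percent / 100
    sampleable = raw - protected
    keep = (target_gib * GIB - protected) / sampleable * 100
    keep = min(100.0, max(0.0, keep))
    retained = protected + sampleable * keep / 100
    return {
        "stream_label": label,
        "incoming_events_per_sec": events_per_sec,
        "avg_event_bytes": avg_event_bytes,
        "target_gib_per_day": target_gib,
        "reserve_percent": reserve_percent,
        "burst_multiplier": burst_multiplier,
        "sampleable_keep_percent": round(keep, 2),
        "overall_retained_percent": round(retained / raw * 100, 2),
        "retained_gib_per_day": round(retained / GIB, 2),
    }


handoff = build_handoff("edge-access", 18_000, 950, 120)
print(json.dumps(handoff, indent=2))
```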

Interpreting Results:

The most important result is Sampleable keep rate. It is the probability for traffic outside the reserve, not necessarily the percent of all raw logs retained. When a reserve is present, Overall retained rate is the better capacity number because it combines always-kept logs with sampled logs.

Raw daily volume shows what the stream would store before sampling under the selected burst and overhead assumptions.
Retained daily volume should be at or near the target unless the reserve exceeds the target or the raw volume already fits.
Dropped daily volume is expected storage avoided, not proof that the dropped events are safe to lose.
Sparse baseline risk matters when the sampleable keep rate is very low. Trend charts may still work, but rare events can disappear.
Peak allowance tells you whether the result was sized for average traffic or for the raised burst model.
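The gap between Sampleable keep rate and Overall retained rate follows directly from the formula core: the reserve is kept in full and only the remainder is scaled by the keep rate. A minimal sketch, with the hypothetical helper name `overall_retained_percent`:

```python
def overall_retained_percent(reserve_percent: float,
                             sampleable_keep_percent: float) -> float:
    """Share of all raw logs retained: the reserve plus the sampled remainder."""
    sampleable_share = 100.0 - reserve_percent
    return reserve_percent + sampleable_share * sampleable_keep_percent / 100.0


# With a 5% reserve and a 40.16% sampleable keep rate, more than 40.16%
# of the raw stream is stored, because the reserve is kept in full.
print(f"{overall_retained_percent(5, 40.16):.2f}%")
# With no reserve, the two rates are the same number.
print(f"{overall_retained_percent(0, 8.72):.2f}%")
```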

Do not treat a clean retained-volume result as validation that the sampling policy is operationally safe. Compare the proposed rate with real collector behavior, incident requirements, audit needs, and vendor billing counters after rollout.

Worked Examples:

Busy access stream under a 120 GiB/day target

An edge access stream sends 18,000 events/sec with an average size of 950 bytes/event. With no reserve, no burst uplift, and a 120 GiB/day target, Raw daily volume is about 1,375.97 GiB/day. Sampleable keep rate becomes about 8.72%, Agent sampling value is near 0.0872, and Dropped daily volume is about 1,255.97 GiB/day. The tight budget status means routine trends may be affordable, but rare classes should be kept separately.

Protected errors and peak planning

A service stream runs at 3,000 events/sec, averages 800 bytes/event, uses a 1.2x burst multiplier, adds 20% overhead, and reserves 5% for full-fidelity categories. The raw model is about 278.09 GiB/day, and the reserve alone is about 13.90 GiB/day. A 120 GiB/day target leaves enough room for a Sampleable keep rate around 40.16%, with Overall retained rate about 43.15%. The policy is easier to defend because protected errors remain available while the routine stream is reduced.

Reserve larger than the target

A stream at 10,000 events/sec and 2,000 bytes/event produces about 1,609.33 GiB/day. A 15% full-fidelity reserve alone is about 241.40 GiB/day, so a 20 GiB/day target produces budget conflict. Sampleable keep rate falls to 0%, but Retained daily volume still exceeds the target because the reserve is too large. The corrective path is to lower the reserve, raise the target, or route protected events elsewhere.
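The three worked examples can be reproduced with a few lines of Python. The sketch below restates the formula core in a compact helper; `plan` and its parameter names are hypothetical, and results are shown in GiB using binary units.

```python
GIB = 1024 ** 3


def plan(eps, size_bytes, target_gib, reserve_pct=0.0, burst=1.0,
         overhead_pct=0.0):
    """Return (raw GiB, protected GiB, keep %, retained GiB) per day."""
    raw = eps * burst * 86_400 * size_bytes * (1 + overhead_pct / 100)
    protected = raw * reserve_pct / 100
    sampleable = raw - protected
    keep = (target_gib * GIB - protected) / sampleable * 100
    keep = min(100.0, max(0.0, keep))
    retained = protected + sampleable * keep / 100
    return raw / GIB, protected / GIB, keep, retained / GIB


examples = {
    "busy access stream": plan(18_000, 950, 120),
    "protected errors and peak planning":
        plan(3_000, 800, 120, reserve_pct=5, burst=1.2, overhead_pct=20),
    "reserve larger than the target": plan(10_000, 2_000, 20, reserve_pct=15),
}

for name, (raw, protected, keep, retained) in examples.items():
    print(f"{name}: raw {raw:.2f} GiB/day, reserve {protected:.2f} GiB/day, "
          f"keep {keep:.2f}%, retained {retained:.2f} GiB/day")
```

In the third case the keep rate clamps to 0% while retained volume stays at the reserve, which is the budget conflict condition described above.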

FAQ:

What rate should I put into my collector?

Use Agent sampling value when your collector expects a decimal probability, and use Sampleable keep rate when it expects a percent. Apply it only after any full-fidelity rules represented by Full-fidelity reserve.
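The percent, decimal, and "1 in N" forms are the same rate written three ways. A small sketch of the conversion, using the hypothetical helper name `rate_forms`:

```python
def rate_forms(sampleable_keep_percent: float) -> dict:
    """Express one keep rate as a percent, a decimal probability, and 1-in-N."""
    if not 0 < sampleable_keep_percent <= 100:
        raise ValueError("keep percent must be above 0 and at most 100")
    decimal = sampleable_keep_percent / 100
    return {
        "percent": sampleable_keep_percent,
        "decimal": decimal,
        "one_in_n": 1 / decimal,
    }


forms = rate_forms(8.72)
print(f"{forms['percent']}% = {forms['decimal']:.4f} "
      f"= about 1 in {forms['one_in_n']:.1f}")
```

Which form to enter depends on the collector, so check its documentation before pasting a value.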

Why does the result say protected logs exceed budget?

The selected Full-fidelity reserve consumes more daily storage than Target daily volume. Probability sampling affects only the remaining sampleable traffic, so the fix is to reduce the reserve, increase the target, or separate the protected stream.

Does a 5% keep rate mean I keep every twentieth important event?

No. It means the sampleable stream is retained at about that probability across many events. Important low-frequency categories should be matched by deterministic rules instead of relying on the random sample.
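The difference between deterministic rules and a random sample can be sketched as a keep decision that checks protected categories first. This is an illustration only: the `level` field, the `ALWAYS_KEEP_LEVELS` set, and `should_keep` are hypothetical and do not correspond to any particular collector's configuration.

```python
import random
from typing import Optional

# Hypothetical categories that bypass sampling entirely.
ALWAYS_KEEP_LEVELS = {"error", "audit", "security"}


def should_keep(event: dict, keep_rate: float,
                rng: Optional[random.Random] = None) -> bool:
    """Keep protected categories deterministically and sample everything else."""
    if event.get("level") in ALWAYS_KEEP_LEVELS:
        return True  # deterministic rule: never left to chance
    roll = (rng or random).random()  # uniform value in [0.0, 1.0)
    return roll < keep_rate


rng = random.Random(42)  # seeded only so the demo run is repeatable
events = [{"level": "info"}] * 10_000 + [{"level": "error"}] * 3
kept = [e for e in events if should_keep(e, 0.05, rng)]
errors_kept = sum(1 for e in kept if e["level"] == "error")
print(f"kept {len(kept)} of {len(events)} events, errors kept: {errors_kept}")
```

All three error events survive regardless of the random draw, while only a fraction of the routine events is retained.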

Why are KiB, MiB, GiB, and TiB used for storage?

The calculation uses binary storage units: KiB is 1,024 bytes, MiB is 1,024 KiB, GiB is 1,024 MiB, and TiB is 1,024 GiB. Match the unit to the way your logging vendor reports usage before comparing budgets.
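When a vendor reports usage in decimal gigabytes rather than GiB, the same budget shows up as a different number. The sketch below converts between the two; the helper names are hypothetical, and whether a given vendor uses decimal or binary units is something to confirm in its billing documentation.

```python
GIB_BYTES = 1024 ** 3   # binary: 1 GiB = 1,073,741,824 bytes
GB_BYTES = 1000 ** 3    # decimal: 1 GB = 1,000,000,000 bytes


def gib_to_gb(gib: float) -> float:
    """Convert binary gibibytes to decimal gigabytes."""
    return gib * GIB_BYTES / GB_BYTES


def gb_to_gib(gb: float) -> float:
    """Convert decimal gigabytes to binary gibibytes."""
    return gb * GB_BYTES / GIB_BYTES


# A 120 GiB/day target is a larger number when expressed in decimal GB.
print(f"120 GiB/day = {gib_to_gb(120):.2f} GB/day")
print(f"120 GB/day  = {gb_to_gib(120):.2f} GiB/day")
```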

Are my log contents sent anywhere?

This calculator only asks for numeric rates, sizes, percentages, and an optional label, and the arithmetic runs in the browser after the page loads. Avoid secrets in Export label because changed inputs can be mirrored in the current URL for sharing.

Glossary:

Sampleable keep rate: The percent of non-reserved traffic to retain after full-fidelity rules are applied.
Agent sampling value: The decimal probability form of the sampleable keep rate, such as 0.0872.
Full-fidelity reserve: The share of the stream modeled as always kept before probabilistic sampling.
Raw daily volume: The estimated daily stored bytes before sampling, after burst and overhead assumptions.
Index overhead: Extra storage modeled for labels, indexes, metadata, or vendor storage expansion.
Sparse baseline: A very low sampleable keep rate that may be too thin for rare events.
