Log Sampling Rate Calculator
Calculate log sampling rates from event volume, average event size, retention budget, burst allowance, and full-fidelity reserve rules for storage and quota planning.
Log sampling reduces stored telemetry by keeping a planned share of events and dropping the rest before retention costs grow out of control. The useful number is not only the raw event count. It is the retained daily volume after event size, burst planning, storage overhead, and any always-kept logs are accounted for.
Sampling is most useful for high-volume streams whose routine messages are valuable for trends but too expensive to keep in full. Access logs, debug-heavy service logs, and repeated health-check noise are common candidates. Security, audit, error, and incident-specific records often need different treatment because losing a rare event can be more costly than storing extra data.
The main caution is statistical. A 10% sampling rate does not promise that every rare failure appears in the stored logs. It means roughly one event out of ten is retained from the sampleable portion over a large enough stream. Low-volume incidents, one-off security events, and audit trails need deterministic keep rules, a separate route, or a higher budget.
A good sampling plan therefore keeps two ideas separate in practice: what must always be retained, and what can be represented by a probability. The daily volume target should be checked after both parts are combined.
Technical Details:
Daily log volume comes from the event rate multiplied by the average stored bytes per event and by the number of seconds in a day. A burst multiplier raises the modeled rate before sizing, and an index overhead percent increases the stored byte estimate when labels, indexes, or metadata are not already included in the average event size.
The full-fidelity reserve is handled before probability sampling. If 5% of the stream must always be kept, that portion consumes budget first. The remaining sampleable traffic gets the calculated keep rate. When the reserve alone is larger than the target daily volume, lowering the probabilistic rate cannot make the plan fit.
Formula Core
The calculation uses bytes internally. Displayed storage units use binary multiples, so 1 GiB equals 1,073,741,824 bytes.
For the default values, 18,000 events/sec at 950 bytes per event produces about 1,375.97 GiB/day before sampling. A 120 GiB/day target with no reserve requires a sampleable keep rate of about 8.72%, which is an agent rate near 0.0872 and roughly 1 in 11.5 sampleable events.
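As a minimal sketch of that pipeline, the snippet below reproduces the default numbers in plain TypeScript. The function name and option names are illustrative, not the calculator's actual implementation.

```typescript
const GIB = 1024 ** 3;          // binary gibibyte: 1,073,741,824 bytes
const SECONDS_PER_DAY = 86_400;

function planSampling(opts: {
  eventsPerSec: number;    // incoming log rate, already converted to events/sec
  bytesPerEvent: number;   // average stored event size
  targetGiBPerDay: number; // retained daily budget
  reservePct?: number;     // full-fidelity reserve, 0 to 0.95
  burst?: number;          // burst multiplier, greater than zero
  overheadPct?: number;    // index overhead, 0 to 3.0
}) {
  const reserve = opts.reservePct ?? 0;
  const burst = opts.burst ?? 1;
  const overhead = opts.overheadPct ?? 0;

  // Raw daily volume: rate x burst x size x (1 + overhead) x seconds per day.
  const rawGiB =
    (opts.eventsPerSec * burst * opts.bytesPerEvent * (1 + overhead) *
      SECONDS_PER_DAY) / GIB;

  // The reserve consumes budget before probability sampling.
  const reservedGiB = rawGiB * reserve;
  const budgetLeft = opts.targetGiBPerDay - reservedGiB;
  if (budgetLeft <= 0) {
    // Reserve alone exceeds the target: no probabilistic rate can fix this.
    return { rawGiB, keepRate: null, overallRetained: null };
  }

  // Keep rate for the sampleable portion, capped at 100%.
  const keepRate = Math.min(1, budgetLeft / (rawGiB - reservedGiB));
  const overallRetained = reserve + (1 - reserve) * keepRate;
  return { rawGiB, keepRate, overallRetained };
}

// Default inputs: 18,000 events/sec at 950 bytes against 120 GiB/day.
const plan = planSampling({
  eventsPerSec: 18_000, bytesPerEvent: 950, targetGiBPerDay: 120,
});
// plan.rawGiB ~= 1375.97, plan.keepRate ~= 0.0872 (about 8.72%, 1 in ~11.5)
```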
| Input | Accepted range or unit | How it affects the result |
|---|---|---|
| Incoming log rate | Greater than zero, entered as events/sec, events/min, events/hour, or events/day. | Converted to events/sec before burst sizing and daily volume math. |
| Average event size | Greater than zero, entered as bytes/event, KiB/event, or MiB/event. | Multiplies the event count to estimate raw stored bytes. |
| Target daily volume | Greater than zero, entered as MiB/day, GiB/day, or TiB/day. | Sets the retained storage budget that the sampling rate tries to meet. |
| Full-fidelity reserve | 0% to 95%. | Consumes budget before probability sampling is applied to the rest of the stream. |
| Burst multiplier | Greater than zero. | Raises or lowers the modeled event rate for peak-aware sizing. |
| Index overhead | 0% to 300%. | Adds storage overhead when average event size does not already include index or metadata cost. |
| Result state | Boundary | Meaning |
|---|---|---|
| full fidelity fits | Sampleable keep rate is effectively 100%. | The modeled stream fits the target without probabilistic sampling. |
| sampling planned | Sampleable keep rate is at least 10% and below 100%. | The target can be met with a moderate sampling rate. |
| tight budget | Sampleable keep rate is below 10% and at least 1%. | Routine trends may remain visible, but rare events need explicit keep rules. |
| sparse baseline | Sampleable keep rate is below 1%. | The retained baseline is very thin and can miss low-frequency behavior. |
| budget conflict | Target daily volume is below the protected volume. | The reserve, target, or routing plan must change because probability sampling cannot fix it. |
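Read as code, those boundaries amount to a small classifier. The sketch below is an assumed TypeScript mapping; the calculator's own labels and internals may differ.

```typescript
// Assumed mapping from sampleable keep rate to the result states above.
// A null rate stands for the case where the reserve already exceeds the target.
function statusFor(keepRate: number | null): string {
  if (keepRate === null) return "budget conflict"; // reserve exceeds target
  if (keepRate >= 1) return "full fidelity fits";
  if (keepRate >= 0.1) return "sampling planned";  // 10% to just under 100%
  if (keepRate >= 0.01) return "tight budget";     // 1% to just under 10%
  return "sparse baseline";                        // below 1%
}
```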
Everyday Use & Decision Guide:
Use measured counters when possible. A vendor usage screen, collector metric, or short aggregation query is better than a guess because the result changes directly with event rate and average event size. If the average event size already reflects stored bytes, leave Index overhead at zero. If it is only the raw line or JSON payload size, add the expected overhead before using the result as a quota plan.
Set Target daily volume to the budget for this stream, not the account-wide quota unless the stream owns the whole quota. Add Full-fidelity reserve when specific categories must bypass sampling. Error logs, audit entries, security findings, and billing events usually need exact keep rules before a broad probability is applied to high-volume routine traffic.
- Read Sampleable keep rate as the percent to apply after always-keep rules.
- Use Agent sampling value when your collector expects a decimal rate such as 0.0872.
- Use Overall retained rate for capacity planning because it includes the reserve plus sampled traffic.
- Check Sampling Budget Curve when you need to explain how a different keep percent changes retained GiB/day.
- Copy or download Sampling Budget, Keep Policy, or JSON when the recommendation needs to move into a change ticket or runbook.
Treat very low rates as a warning, not merely a cheaper setting. If the result says sparse baseline, use deterministic rules for rare classes or increase the target before relying on the sampled stream for incident investigation.
Step-by-Step Guide:
Start with the measured stream average, then adjust only the planning assumptions that are known for the rollout.
- Enter the pre-sampling count in Incoming log rate and choose the matching unit. The summary should move away from needs input once the value is greater than zero.
- Enter the measured Average event size. If the usage source reports stored bytes, keep the same unit family and do not add duplicate overhead.
- Set Target daily volume to the retained budget for this stream. The summary badge shows the target as a daily storage amount.
- Open Advanced when the stream has protected categories, burst planning, or index overhead. Set Full-fidelity reserve, Burst multiplier, and Index overhead only when those assumptions are part of the plan.
- Read the summary line and badge. If it reports Protected logs exceed budget, lower the reserve, raise the target, or move protected logs to another route before using the sampling rate.
- Open Sampling Budget for raw volume, retained volume, dropped volume, and the agent rate. Open Keep Policy for the operator action tied to reserve, sparse baseline risk, burst planning, and overhead.
- Use Sampling Budget Curve to compare candidate keep rates against the target line. Download the chart or CSV when the review needs evidence.
- Use JSON when another workflow needs the inputs, recommendation, policy rows, and chart points in a structured handoff.
A complete handoff includes the stream label, incoming rate, average event size, target volume, reserve percent, burst multiplier, sampleable keep rate, overall retained rate, and retained GiB/day.
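One possible shape for that handoff, with hypothetical field names (the calculator's actual JSON export keys may differ):

```typescript
// Hypothetical handoff record; field names are illustrative only.
interface SamplingHandoff {
  streamLabel: string;
  incomingEventsPerSec: number;
  avgEventBytes: number;
  targetGiBPerDay: number;
  reservePct: number;          // full-fidelity reserve, e.g. 0.05
  burstMultiplier: number;
  sampleableKeepRate: number;  // e.g. 0.0872
  overallRetainedRate: number; // reserve plus sampled share
  retainedGiBPerDay: number;
}
```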
Interpreting Results:
The most important result is Sampleable keep rate. It is the probability for traffic outside the reserve, not necessarily the percent of all raw logs retained. When a reserve is present, Overall retained rate is the better capacity number because it combines always-kept logs with sampled logs.
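The relationship between the two rates is simple arithmetic. A hedged illustration, using the reserve and keep rate from the protected-errors worked example below:

```typescript
// Overall retained rate = reserve + (1 - reserve) x sampleable keep rate.
const reservePct = 0.05;
const keepRate = 0.4016;
const overallRetained = reservePct + (1 - reservePct) * keepRate; // ~0.4315
```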
- Raw daily volume shows what the stream would store before sampling under the selected burst and overhead assumptions.
- Retained daily volume should be at or near the target unless the reserve exceeds the target or the raw volume already fits.
- Dropped daily volume is expected storage avoided, not proof that the dropped events are safe to lose.
- Sparse baseline risk matters when the sampleable keep rate is very low. Trend charts may still work, but rare events can disappear.
- Peak allowance tells you whether the result was sized for average traffic or for the raised burst model.
Do not treat a clean retained-volume result as validation that the sampling policy is operationally safe. Compare the proposed rate with real collector behavior, incident requirements, audit needs, and vendor billing counters after rollout.
Worked Examples:
Busy access stream under a 120 GiB/day target
An edge access stream sends 18,000 events/sec with an average size of 950 bytes/event. With no reserve, no burst uplift, and a 120 GiB/day target, Raw daily volume is about 1,375.97 GiB/day. Sampleable keep rate becomes about 8.72%, Agent sampling value is near 0.0872, and Dropped daily volume is about 1,255.97 GiB/day. The tight budget status means routine trends may be affordable, but rare classes should be kept separately.
Protected errors and peak planning
A service stream runs at 3,000 events/sec, averages 800 bytes/event, uses a 1.2x burst multiplier, adds 20% overhead, and reserves 5% for full-fidelity categories. The raw model is about 278.09 GiB/day, and the reserve alone is about 13.90 GiB/day. A 120 GiB/day target leaves enough room for a Sampleable keep rate around 40.16%, with Overall retained rate about 43.15%. The policy is easier to defend because protected errors remain available while the routine stream is reduced.
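Under the same assumptions as the Formula Core sketch, this example can be reproduced directly:

```typescript
// Reproduces the protected-errors example with the planSampling sketch above.
const svc = planSampling({
  eventsPerSec: 3_000,
  bytesPerEvent: 800,
  targetGiBPerDay: 120,
  reservePct: 0.05,
  burst: 1.2,
  overheadPct: 0.2,
});
// svc.rawGiB ~= 278.09, svc.keepRate ~= 0.4016, svc.overallRetained ~= 0.4315
```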
Reserve larger than the target
A stream at 10,000 events/sec and 2,000 bytes/event produces about 1,609.33 GiB/day. A 15% full-fidelity reserve alone is about 241.40 GiB/day, so a 20 GiB/day target produces budget conflict. Sampleable keep rate falls to 0%, but Retained daily volume still exceeds the target because the reserve is too large. The corrective path is to lower the reserve, raise the target, or route protected events elsewhere.
FAQ:
What rate should I put into my collector?
Use Agent sampling value when your collector expects a decimal probability, and use Sampleable keep rate when it expects a percent. Apply it only after any full-fidelity rules represented by Full-fidelity reserve.
Why does the result say protected logs exceed budget?
The selected Full-fidelity reserve consumes more daily storage than Target daily volume. Probability sampling affects only the remaining sampleable traffic, so the fix is to reduce the reserve, increase the target, or separate the protected stream.
Does a 5% keep rate mean I keep every twentieth important event?
No. It means the sampleable stream is retained at about that probability across many events. Important low-frequency categories should be matched by deterministic rules instead of relying on the random sample.
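A hedged way to make that concrete: the chance that a rare class emitting n events leaves no trace in a p-probability sample is (1 - p)^n.

```typescript
// Probability that none of n events from a rare class survive a 5% sample.
const pKeep = 0.05;
const missAll = (n: number) => (1 - pKeep) ** n;
missAll(10); // ~0.60: a ten-event failure class vanishes about 60% of the time
missAll(90); // ~0.01: even ninety events still all vanish about 1% of the time
```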
Why are KiB, MiB, GiB, and TiB used for storage?
The calculation uses binary storage units: KiB is 1,024 bytes, MiB is 1,024 KiB, and GiB is 1,024 MiB. Match the unit to the way your logging vendor reports usage before comparing budgets.
Are my log contents sent anywhere?
No. The calculator only asks for numeric rates, sizes, percentages, and an optional label, and the arithmetic runs in the browser after the page loads. Avoid putting secrets in the Export label, because inputs can be mirrored in the current URL for sharing.
Glossary:
- Sampleable keep rate: the percent of non-reserved traffic to retain after full-fidelity rules are applied.
- Agent sampling value: the decimal probability form of the sampleable keep rate, such as 0.0872.
- Full-fidelity reserve: the share of the stream modeled as always kept before probabilistic sampling.
- Raw daily volume: the estimated daily stored bytes before sampling, after burst and overhead assumptions.
- Index overhead: extra storage modeled for labels, indexes, metadata, or vendor storage expansion.
- Sparse baseline: a very low sampleable keep rate that may be too thin for rare events.
References:
- Sampling, OpenTelemetry Authors, October 16, 2025.
- Sample Processor, Datadog.
- Sampling stage, Grafana Labs.
- Definitions of the SI units: The binary prefixes, National Institute of Standards and Technology.