Log sampling rate inputs

  • Incoming log rate before sampling: Use the unsampled event rate before agent filters, index exclusions, or storage tiering.
  • Average event size: Use measured average bytes per log line or vendor-reported event size.
  • Target daily volume: Enter the daily budget for retained logs from this stream.
  • Stream label: Optional stream, service, index, or environment label for handoff exports.
  • Full-fidelity reserve (%): Use 0 when all logs can share the same probability; reserve space for errors, audit, or security logs when needed.
  • Burst multiplier: Use 1 for average rate, or higher values such as 1.5 to size against peaks.
  • Index overhead (%): Leave 0 when the average event size already includes storage overhead.

Introduction

High-volume logging becomes a storage problem before it becomes a search problem. Access logs, debug messages, health checks, retries, and routine success events can arrive by the thousands every second, while the records people need during an incident may be only a small part of the stream. Sampling gives teams a way to keep baseline telemetry without paying to retain every routine event forever.

A log sampling rate is the share of eligible log events that is kept before the stream reaches the retained storage tier. A 10% rate keeps roughly one event in ten, and the observed share approaches that figure only across a large enough volume. It does not guarantee that every rare error, unusual user journey, audit event, or billing record appears in the saved data.
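To put a number on that last caveat, the sketch below computes the chance that a probabilistic sampler drops every copy of a rare event. It assumes each event is kept independently with the same probability; the function name is illustrative and not part of the calculator.

```python
def miss_probability(keep_rate: float, rare_events: int) -> float:
    """Chance that none of `rare_events` survives independent sampling."""
    if not 0.0 <= keep_rate <= 1.0:
        raise ValueError("keep_rate must be between 0 and 1")
    if rare_events < 0:
        raise ValueError("rare_events must not be negative")
    # Each event is dropped with probability (1 - keep_rate), independently.
    return (1.0 - keep_rate) ** rare_events


for n in (1, 5, 20):
    p = miss_probability(0.10, n)
    print(f"{n:>2} rare events at a 10% keep rate: {p:.1%} chance all are dropped")
```

Under this assumption, a single rare record is lost nine times out of ten at a 10% keep rate, which is why such records need always-keep rules rather than a probability.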

Common log volume controls and when they fit
Control | Typical use | Main caution
Full retention | Security, audit, billing, and incident-critical records that must be complete. | Cost grows directly with event count, event size, and index overhead.
Filtering | Drop known noise such as repetitive health checks or low-value debug messages. | A broad filter can remove the only clue for a future investigation.
Probabilistic sampling | Keep a representative baseline from routine high-volume logs. | Rare events can disappear unless they have separate always-keep rules.
Consistent sampling | Keep or drop related events together by trace ID, request ID, user, pod, or another stable attribute. | The same sampling key and rate must be used where consistent grouping matters.
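One common way to get consistent sampling is to hash the sampling key and compare the hash with the keep rate, so every event that shares the key gets the same decision. The sketch below is an illustration under that assumption, not the behavior of any particular collector; `keep_by_key` and the example key are hypothetical.

```python
import hashlib


def keep_by_key(sampling_key: str, keep_rate: float) -> bool:
    """Keep or drop every event that shares `sampling_key` as a group."""
    digest = hashlib.sha256(sampling_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Keys whose bucket falls below the threshold are kept.
    threshold = int(keep_rate * 2**64)
    return bucket < threshold


# Every log line from the same (hypothetical) trace gets the same answer.
decisions = {keep_by_key("trace-7f3a9c", 0.10) for _ in range(5)}
print("decisions for one trace:", decisions)
```

Because the decision depends only on the key and the rate, two collectors agree only when they hash the same attribute with the same rate, which is the caution in the last table row.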

The basic storage arithmetic is simple, but the result is sensitive to every input. Event rate decides how many records arrive in a day. Average event size controls the bytes attached to each record. Burst planning raises the modeled rate above the measured average. Indexes, labels, metadata, and vendor accounting can add overhead. A daily target then forces a choice: keep everything, lower the sampleable volume, or redesign which categories stay at full fidelity.
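The arithmetic in the paragraph above can be sketched in a few lines. This is a minimal illustration using the page's default stream (18,000 events/sec at 950 bytes/event); the function and parameter names are assumptions, and the burst and overhead values in the loop are example inputs only.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_GIB = 1024 ** 3


def raw_daily_gib(events_per_sec: float, bytes_per_event: float,
                  burst: float = 1.0, overhead: float = 0.0) -> float:
    """Modeled raw volume in GiB/day before any sampling."""
    modeled_rate = events_per_sec * burst            # burst raises the rate
    stored_bytes = bytes_per_event * (1.0 + overhead)  # overhead expands each record
    return modeled_rate * SECONDS_PER_DAY * stored_bytes / BYTES_PER_GIB


baseline = raw_daily_gib(18_000, 950)
print(f"baseline: {baseline:.2f} GiB/day")
for label, kwargs in (("1.5x burst", {"burst": 1.5}),
                      ("20% overhead", {"overhead": 0.20})):
    print(f"{label}: {raw_daily_gib(18_000, 950, **kwargs):.2f} GiB/day")
```

Each lever scales the result linearly, so a 1.5x burst multiplier or a 20% overhead moves the modeled volume by the same factor.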

Figure: Raw log volume split into always-kept records and sampleable records before retained volume is compared with a daily cap. A sampling percentage applies only after protected records are set aside.

Good sampling plans separate log categories before applying a probability. Routine access logs and repetitive success messages can tolerate sampling when aggregate trends are the goal. Security findings, audit trails, payment records, and low-volume errors usually need deterministic keep rules, because a statistically fair sample can still miss the single record that matters later.
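The separation described above can be sketched as a keep decision that checks deterministic rules first and falls back to probability only for routine records. The category names and the `should_keep` helper are hypothetical; a real pipeline would match on whatever fields its collector exposes.

```python
import random

# Hypothetical category names that bypass sampling entirely.
ALWAYS_KEEP = {"security", "audit", "payment", "error"}


def should_keep(category: str, keep_rate: float, rng: random.Random) -> bool:
    """Apply deterministic keep rules first, then the sampling probability."""
    if category in ALWAYS_KEEP:
        return True  # protected records never depend on chance
    return rng.random() < keep_rate


rng = random.Random(42)  # seeded only to make the demo repeatable
events = ["access"] * 1000 + ["audit"] * 3
kept = [c for c in events if should_keep(c, 0.10, rng)]
print("audit records kept:", kept.count("audit"), "of 3")
print("access records kept:", kept.count("access"), "of 1000")
```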

The most common mistake is treating the daily storage target as the only constraint. A plan also needs a realistic event size, a peak or average traffic assumption, and a clear definition of which records bypass sampling. Without those inputs, a keep rate can fit the storage budget while still producing an investigation gap.

How to Use This Tool:

Use the calculator when you know the approximate incoming rate and want a storage-sized sampling percentage for one stream, service, index, or environment.

  1. Enter Incoming log rate before sampling and choose the matching unit. Use events per second, minute, hour, or day from collector counters, usage pages, or recent log metrics.
  2. Enter Average event size in bytes, KiB, or MiB per event. If your vendor already reports stored bytes, keep Index overhead neutral unless you are deliberately modeling extra labels, indexes, or metadata.
  3. Set Target daily volume to the retained daily budget for this stream. Do not use an account-wide quota unless this stream owns the whole quota.
  4. Open Advanced for rollout assumptions. Use Full-fidelity reserve for always-kept records, Burst multiplier for peak planning, and Index overhead for stored-byte expansion not already included in event size.
  5. Check the summary badge and Sampleable keep rate. If the badge says needs input, make the incoming rate, event size, target volume, and burst multiplier positive values before trusting the output.
  6. Use Sampling Budget for retained and dropped volume, Keep Policy for rollout guidance, and Sampling Budget Curve to see how retained GiB/day changes across candidate keep rates.
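Steps 1 and 2 depend on matching each number to its unit. The sketch below shows the kind of normalization involved, converting a rate to events per second and a size to bytes. The dictionaries and function names are illustrative assumptions, not the tool's internals.

```python
SECONDS_PER_UNIT = {"second": 1, "minute": 60, "hour": 3_600, "day": 86_400}
BYTES_PER_UNIT = {"bytes": 1, "KiB": 1024, "MiB": 1024 ** 2}


def to_events_per_sec(count: float, per: str) -> float:
    """Convert an event count per time unit to events per second."""
    return count / SECONDS_PER_UNIT[per]


def to_bytes(size: float, unit: str) -> float:
    """Convert an average event size to bytes."""
    return size * BYTES_PER_UNIT[unit]


# The same stream expressed per minute and per second.
print(to_events_per_sec(1_080_000, "minute"), "events/sec")
print(to_bytes(1.2, "KiB"), "bytes/event")
```

A per-minute counter entered as a per-second rate inflates the modeled volume sixty-fold, which drives the keep rate far lower than the stream actually needs.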

For a handoff, record the stream label, input units, Sampleable keep rate, Agent sampling value, Overall retained rate, retained daily volume, and any protected categories that must stay outside the probability rule.

Interpreting Results:

Sampleable keep rate is the probability to apply to records outside the full-fidelity reserve. Overall retained rate is the total share of modeled bytes kept after the reserve and sampled portion are combined. When the reserve is greater than zero, these two percentages usually differ.
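The relationship between the two percentages follows from the definitions: the reserve is kept in full, and the keep rate applies to the rest. A minimal sketch, with an illustrative function name:

```python
def overall_retained_rate(reserve: float, keep_rate: float) -> float:
    """Share of modeled bytes kept once reserve and sampled portion are combined."""
    return reserve + (1.0 - reserve) * keep_rate


# With no reserve the two rates match; with a reserve they diverge.
print(f"{overall_retained_rate(0.00, 0.0872):.2%}")
print(f"{overall_retained_rate(0.05, 0.3240):.2%}")
```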

Log sampling status badges and decision boundaries
Status badge | Boundary | What to do next
needs input | Required numeric inputs are zero or invalid | Fix Incoming log rate, Average event size, Target daily volume, or Burst multiplier.
budget conflict | Target daily volume is less than protected volume | Lower the reserve, raise the target, or route protected records to a separate retention path.
full fidelity fits | Sampleable keep rate is at least 99.999% | Keep all modeled events unless another cost or privacy policy requires filtering.
sampling planned | >= 10% and below full fidelity | Use the rate after exact keep rules for errors, audit, security, and other protected categories.
tight budget | >= 1% and < 10% | Check whether routine trends still have enough events and protect rare incidents separately.
sparse baseline | < 1% | Treat the sample as a thin baseline; do not rely on it for rare-event discovery.
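The boundaries in the table can be read as an ordered set of checks. The sketch below encodes them under the assumption that input validity is checked first and budget conflict second; that ordering and the function signature are assumptions, since the page lists the boundaries but not the evaluation order.

```python
def status_badge(inputs_valid: bool, target_bytes: float,
                 protected_bytes: float, keep_rate: float) -> str:
    """Map a solved plan to the status badge named in the table."""
    if not inputs_valid:
        return "needs input"
    if target_bytes < protected_bytes:
        return "budget conflict"
    if keep_rate >= 0.99999:
        return "full fidelity fits"
    if keep_rate >= 0.10:
        return "sampling planned"
    if keep_rate >= 0.01:
        return "tight budget"
    return "sparse baseline"


# Volumes here are in GiB/day; any unit works if both values share it.
print(status_badge(True, 120.0, 0.0, 0.0872))
print(status_badge(True, 120.0, 137.60, 0.0))
```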

A retained-volume match is not proof that the policy is safe. After rollout, compare the modeled retained daily volume with actual ingest or index counters, then confirm that critical categories appear through deterministic rules rather than chance.
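The first half of that check, comparing modeled and observed volume, can be as simple as a relative difference. In the sketch below the observed figure and the 10% tolerance are made-up illustration values; pick a tolerance that matches your own counters and billing granularity.

```python
def relative_drift(modeled_gib: float, observed_gib: float) -> float:
    """Signed difference between observed and modeled volume, as a fraction."""
    if modeled_gib <= 0:
        raise ValueError("modeled_gib must be positive")
    return (observed_gib - modeled_gib) / modeled_gib


TOLERANCE = 0.10  # hypothetical review threshold, not a recommendation

drift = relative_drift(modeled_gib=120.0, observed_gib=131.5)  # example numbers
verdict = "review inputs" if abs(drift) > TOLERANCE else "within tolerance"
print(f"drift {drift:+.1%}: {verdict}")
```

A small drift says nothing about whether critical categories survived; that still has to be confirmed against the deterministic keep rules.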

Technical Details:

Daily log volume is the product of event frequency, bytes per event, and time. Sampling changes only the portion of the stream that is eligible for probability-based dropping. Records held in a full-fidelity reserve are counted before the probability is solved, so a large reserve can consume the target even when the sampleable traffic is dropped completely.
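The effect of a reserve is easiest to see by tabulating retained volume across candidate keep rates. The sketch below uses the page's default stream (about 1,375.97 GiB/day raw) with a 10% reserve; the function name and the list of candidate rates are illustrative.

```python
def retained_gib(raw_gib: float, reserve: float, keep_rate: float) -> float:
    """Retained GiB/day: the reserve in full plus the sampled remainder."""
    protected = raw_gib * reserve
    return protected + (raw_gib - protected) * keep_rate


RAW_GIB = 1375.97  # default stream, before sampling
for keep_rate in (0.0, 0.05, 0.10, 0.25, 1.0):
    volume = retained_gib(RAW_GIB, 0.10, keep_rate)
    print(f"keep {keep_rate:>5.0%}: {volume:8.2f} GiB/day")
```

Even at a 0% keep rate the retained volume never falls below the protected volume, about 137.60 GiB/day here, so a 120 GiB/day target cannot be met by changing the probability alone.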

The calculation uses binary storage units for display. 1 GiB is 1,073,741,824 bytes, and 1 TiB is 1,024 GiB. The burst multiplier is applied to the incoming event rate before daily volume is calculated, and index overhead expands the estimated stored bytes when the average event size does not already include that overhead.

Formula Core:

The keep rate is solved by letting the protected volume use the daily target first, then assigning the remaining budget to the sampleable volume.

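\[
\begin{aligned}
E_{\text{modeled}} &= E \times B \\
V_{\text{raw}} &= E_{\text{modeled}} \times 86400 \times S \times (1 + O) \\
V_{\text{protected}} &= V_{\text{raw}} \times R \\
K_{\text{sampleable}} &= \operatorname{clamp}\!\left(\frac{T - V_{\text{protected}}}{V_{\text{raw}} - V_{\text{protected}}},\ 0,\ 1\right) \\
V_{\text{retained}} &= V_{\text{protected}} + \left(V_{\text{raw}} - V_{\text{protected}}\right) \times K_{\text{sampleable}}
\end{aligned}
\]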
Log sampling formula variables and units
Symbol | Meaning | Unit or range
E | Incoming event rate after unit conversion | Events per second
B | Burst multiplier | Greater than 0
S | Average event size after unit conversion | Bytes per event
O | Index overhead as a fraction | 0 to 3, matching 0% to 300%
R | Full-fidelity reserve as a fraction | 0 to 0.95
T | Target daily volume after unit conversion | Bytes per day
K_sampleable | Sampleable keep rate before percent formatting | 0 to 1
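The formula and variable table above translate directly into code. The following is a minimal sketch of that calculation, not the calculator's actual source; the `Plan` dataclass, the function name, and the parameter names are assumptions chosen to mirror the symbols.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400
GIB = 1024 ** 3


@dataclass
class Plan:
    raw_bytes: float        # V_raw
    protected_bytes: float  # V_protected
    keep_rate: float        # K_sampleable, 0 to 1
    retained_bytes: float   # V_retained


def solve_keep_rate(rate: float, burst: float, size: float,
                    overhead: float, reserve: float, target: float) -> Plan:
    """Solve the sampleable keep rate from E, B, S, O, R, and T."""
    modeled = rate * burst                                    # E_modeled
    raw = modeled * SECONDS_PER_DAY * size * (1.0 + overhead)  # V_raw
    protected = raw * reserve                                 # V_protected
    sampleable = raw - protected
    # Protected volume uses the target first; the rest funds the sample.
    ratio = (target - protected) / sampleable if sampleable > 0 else 0.0
    keep = min(max(ratio, 0.0), 1.0)                          # clamp to [0, 1]
    return Plan(raw, protected, keep, protected + sampleable * keep)


plan = solve_keep_rate(rate=18_000, burst=1.0, size=950,
                       overhead=0.0, reserve=0.0, target=120 * GIB)
print(f"raw {plan.raw_bytes / GIB:.2f} GiB/day, keep rate {plan.keep_rate:.2%}")
```

The guard on `sampleable` only matters for degenerate inputs; with a reserve capped at 0.95 and positive rate and size, the denominator stays positive.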

With the default values, 18,000 events/sec at 950 bytes/event produces about 1,375.97 GiB/day before sampling. A 120 GiB/day target with no reserve gives 120 / 1,375.97 = 0.0872, so the Sampleable keep rate is about 8.72% and the Agent sampling value is 0.0872.

The one-in-N cue is 1 / K_sampleable when the keep rate is above zero. It is a planning shorthand, not a promise about the exact position of retained records. Hash-based and trace-aware samplers may keep related records together, while independent random sampling can split related events unless the collector supports a consistent sampling key.
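The same keep rate can be expressed three ways for a handoff: as a percentage, as a decimal probability, and as the one-in-N cue. The sketch below is illustrative; the dictionary keys are hypothetical, and the rounding (two decimals for the percentage, four for the decimal) is inferred from the worked figures on this page.

```python
from typing import Optional


def one_in_n(keep_rate: float) -> Optional[float]:
    """Return N for a 'one in N' cue, or None when nothing is kept."""
    return 1.0 / keep_rate if keep_rate > 0 else None


def handoff_values(keep_rate: float) -> dict:
    """Format one keep rate in the three forms used for a handoff."""
    n = one_in_n(keep_rate)
    return {
        "sampleable_keep_rate": f"{keep_rate:.2%}",   # for percent-based settings
        "agent_sampling_value": f"{keep_rate:.4f}",   # for decimal-probability settings
        "one_in_n": f"1 in {n:.1f}" if n is not None else "none kept",
    }


print(handoff_values(0.087211))
```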

Accuracy Notes:

The model is a capacity estimate, not a vendor bill. Real retained volume can differ because of compression, field indexing, parsing failures, ingest filters, archive routing, billing rounding, retention tiers, and traffic patterns that do not match the measured average.

  • Use a peak-aware Burst multiplier when the daily target must survive traffic spikes rather than average days.
  • Keep the same event-size method across comparisons; mixing line bytes, compressed bytes, and billed bytes can distort the keep rate.
  • Check low keep rates against incident response needs before rollout, especially below 10%.
  • Review full-fidelity rules separately when the status is budget conflict; changing the sampling probability cannot reduce protected volume.

Worked Examples:

Busy access stream. A stream at 18,000 events/sec with 950 bytes/event, no reserve, no burst uplift, and a 120 GiB/day target produces about 1.34 TiB/day before sampling. Sampleable keep rate is 8.72%, Agent sampling value is 0.0872, and the status is tight budget.

Reserve conflict. The same stream with a 10% Full-fidelity reserve protects about 137.60 GiB/day before any sampling. Because that protected volume is already above the 120 GiB/day target, the status changes to budget conflict and probability changes cannot make the modeled stream fit.

Peak and overhead plan. A service at 4,200 events/sec with 1.2 KiB/event, a 1.4x burst multiplier, 25% index overhead, a 5% reserve, and a 260 GiB/day target models about 726.75 GiB/day raw. Sampleable keep rate is about 32.40%, so the policy keeps roughly one in 3.1 sampleable events after the reserve.

Suspiciously low rate. A 75,000 events/sec stream at 1,400 bytes/event with 20% overhead and an 80 GiB/day target needs a 0.79% Sampleable keep rate. Before using that sparse baseline, recheck whether the event rate was entered in the right unit, whether the target belongs only to this stream, and whether low-volume error categories have deterministic keep rules.

FAQ:

What value should I put into a collector or agent?

Use Agent sampling value when the collector expects a decimal probability, and use Sampleable keep rate when it expects a percentage.

Why can protected logs exceed the target?

Full-fidelity reserve is counted before sampling. If that reserved volume is larger than Target daily volume, the calculator reports budget conflict.

Does the calculator upload my logs?

No. It works from the numbers and optional stream label you enter; it does not read, parse, or upload log files.

Why does my vendor bill not match retained daily volume exactly?

The estimate can differ from billing because vendors may apply compression, indexes, metadata charges, archive rules, ingest filters, daily quotas, or rounding that are not captured by a single overhead percentage.

Should errors and audit logs use the same sampling rate?

Usually no. Keep critical categories with deterministic rules, then apply the calculated probability to routine sampleable logs.

What does a needs-input status mean?

At least one required numeric value is missing or invalid. Make Incoming log rate, Average event size, Target daily volume, and Burst multiplier greater than zero.

Glossary:

Incoming log rate
The unsampled event frequency converted to events per second.
Average event size
The expected bytes per log event before optional overhead is added.
Target daily volume
The retained storage budget for the modeled stream.
Full-fidelity reserve
The share of raw volume kept before probability sampling is applied.
Sampleable keep rate
The probability applied to logs outside the full-fidelity reserve.
Overall retained rate
The total share of modeled raw volume kept after reserve and sampling are combined.
Agent sampling value
The sampleable keep rate expressed as a decimal probability.
Index overhead
Extra stored bytes for labels, metadata, indexes, or similar accounting overhead.
