Filebeat Grok Pattern

Introduction:

Grok patterns are reusable match expressions for log messages that turn unstructured lines into labeled fields you can search. Filebeat uses them to extract time, severity, paths, and other details from common web and system logs. Many teams look for a simple grok pattern generator for Nginx or Apache when they want fast results.

Provide a representative line and you get a proposed pattern with captured fields and a clear sense of coverage. You can start from an Apache format, an ingress gateway style, or a classic syslog sample and then refine the result.

Results are designed to be scanned quickly so you can decide what to keep and what to ignore. A short sample is fine for a first pass, and a larger paste gives a clearer picture of match quality.

For example, a line with a timestamp, a level of INFO, a path of /api/items, and a status of 200 yields captures for time, level, path, and status, plus a message tail. That makes it easy to route fields to dashboards without manual regex work.

Logs often carry surprises and a match can still miss a corner case, so review the coverage and make small edits until the fields reflect your intent. Prefer stable keys and consistent quoting to keep patterns sturdy.

Technical Details:

Log lines are text records that often contain timestamps, levels, IP addresses, Uniform Resource Locators (URLs), user agents, and free text. Field names in the output follow Elastic Common Schema conventions, for example http.request.method, url.path, and log.level.

The generator assembles a Grok expression by detecting a timestamp, scanning for key=value fragments, classifying positional tokens such as methods and status codes, and escaping the literal separators between them. Recognized keys map to canonical ECS fields, while unfamiliar keys can be normalized into a dotted form.

Each run reports a pattern string, an estimated confidence label, known‑field coverage, and a table of captured fields. Coverage indicates how many captures align with well‑known ECS concepts; higher values usually need fewer custom mappings.
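
As a rough illustration (assuming coverage is simply the share of captures that resolve to recognized ECS fields), a pattern with six captures of which four map to ECS concepts would report about 4/6 ≈ 67% known‑field coverage.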

Presets include Apache HTTP combined, Nginx ingress, and Syslog (RFC 3164). Timestamps are detected as ISO 8601 or RFC 3339 date‑times, HTTP date strings, classic syslog stamps, or epoch milliseconds depending on the hint you select.
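
For reference, the stamp styles those hints target look like the following (values are illustrative):

  ISO 8601 / RFC 3339    2024-05-01T12:00:00.123Z
  HTTP date              01/May/2024:12:00:00 +0000
  Syslog (RFC 3164)      May  1 12:00:00
  Epoch seconds/millis   1714564800 or 1714564800123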

Processing pipeline

  1. Find a timestamp using the selected hint and add a time capture.
  2. Detect a standalone level token such as INFO or ERROR if present.
  3. Parse key=value pairs, map known keys to ECS, and infer simple types.
  4. Sweep remaining tokens and classify method, status, IP, host, URL path, and durations.
  5. Capture any remaining tail as a message when no explicit message key exists.
  6. Apply an optional field prefix and ensure unique field names.
  7. Escape literal separators, anchor with ^ and $, and output the pattern.
  8. Optionally emit an ingest pipeline JSON with a grok processor for the chosen match field.
  9. Compile the pattern to a regular expression and evaluate sample lines for coverage.
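
The sketch below shows a much-simplified version of steps 1, 3, 5, and 7 in TypeScript. It is illustrative only: the function names are hypothetical, NOTSPACE and GREEDYDATA stand in for the generator's finer-grained token classification, and no ECS mapping is applied.

  // Minimal sketch: detect an ISO timestamp, lift key=value pairs,
  // escape literal separators, and anchor the result.
  const ISO_TS = /\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?/;
  const KV = /(\w+)=("[^"]*"|\S+)/g;

  function escapeLiteral(text: string): string {
    // Escape regex metacharacters so literal separators survive compilation.
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  }

  function buildGrok(sample: string): string {
    let pattern = "";
    let rest = sample;

    // Step 1: timestamp capture.
    const ts = rest.match(ISO_TS);
    if (ts && ts.index !== undefined) {
      pattern += escapeLiteral(rest.slice(0, ts.index)) + "%{TIMESTAMP_ISO8601:@timestamp}";
      rest = rest.slice(ts.index + ts[0].length);
    }

    // Step 3: key=value pairs become named captures; text between pairs stays literal.
    let cursor = 0;
    for (const m of rest.matchAll(KV)) {
      const at = m.index ?? cursor;
      pattern += escapeLiteral(rest.slice(cursor, at)) + `${m[1]}=%{NOTSPACE:${m[1]}}`;
      cursor = at + m[0].length;
    }

    // Step 5: capture any remaining tail as a message; step 7: anchor the pattern.
    if (rest.slice(cursor).trim()) pattern += "\\s*%{GREEDYDATA:message}";
    return "^" + pattern + "$";
  }

  // Prints: ^%{TIMESTAMP_ISO8601:@timestamp} level=%{NOTSPACE:level} path=%{NOTSPACE:path}
  //         status=%{NOTSPACE:status}\s*%{GREEDYDATA:message}$
  console.log(buildGrok("2024-05-01T12:00:00Z level=info path=/api/items status=200 request served"));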

Token classes & meaning

Token classes mapped to meanings
Token | Meaning | Datatype | Typical ECS Field
TIMESTAMP_ISO8601 | ISO 8601 or RFC 3339 timestamp | timestamp | @timestamp
HTTPDATE | HTTP date string | timestamp | @timestamp
SYSLOGTIMESTAMP | Syslog style timestamp | timestamp | @timestamp
EPOCHTIMESTAMP | Epoch seconds or millis | number | @timestamp
WORD / NUMBER | Alphanumeric or numeric token | text / int / float | http.request.method, http.response.status_code
IP / HOSTNAME | Address or host label | ip / text | source.ip, host.name
URIPATH / URIPATHPARAM | Path with optional query | text | url.path
QUOTEDSTRING / GREEDYDATA | Quoted fragment or message tail | text | user_agent.original, message
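
Put together, a pattern built from these classes for an Apache-style access line might look roughly like the expression below. This is an illustrative shape, not the generator's exact output, which depends on your sample and options.

  ^%{IP:source.ip} - - \[%{HTTPDATE:@timestamp}\] "%{WORD:http.request.method} %{URIPATHPARAM:url.path} HTTP/%{NUMBER:http.version}" %{NUMBER:http.response.status_code:int} %{NUMBER:http.response.body.bytes:int} %{QUOTEDSTRING:http.request.referrer} %{QUOTEDSTRING:user_agent.original}$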

Canonicalization examples

Examples of key normalization
Input Key | Canonical Field
level, severity, loglevel | log.level
status, status_code | http.response.status_code
method, verb, request_method | http.request.method
path, route | url.path
url, uri | url.original
agent, user_agent | user_agent.original
app, program | service.name or process.name
pid, process_id | process.pid
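
A minimal sketch of that normalization in TypeScript, assuming a plain lookup table with a custom.* fallback (the generator's alias list is likely broader):

  // Map common log keys to ECS names; unknown keys fall back to custom.<key>.
  const ECS_ALIASES: Record<string, string> = {
    level: "log.level", severity: "log.level", loglevel: "log.level",
    status: "http.response.status_code", status_code: "http.response.status_code",
    method: "http.request.method", verb: "http.request.method",
    path: "url.path", route: "url.path",
    url: "url.original", uri: "url.original",
    agent: "user_agent.original", user_agent: "user_agent.original",
    pid: "process.pid", process_id: "process.pid",
  };

  function canonicalize(key: string): string {
    const k = key.toLowerCase();
    return ECS_ALIASES[k] ?? `custom.${k.replace(/[^a-z0-9_]/g, "_")}`;
  }

  console.log(canonicalize("Status"));    // http.response.status_code
  console.log(canonicalize("tenant-id")); // custom.tenant_id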

Parameters

Parameters and behavior
Parameter | Meaning | Datatype | Typical Range | Notes
timestamp_hint | Bias timestamp detection | enum | auto, iso8601, httpdate, syslog, epoch | Improves disambiguation in mixed formats
normalize_fields | Apply ECS‑style field names | boolean | true / false | Unknown keys become custom.*
detect_pairs | Scan for key=value tokens | boolean | true / false | Quoted values are supported
infer_types | Cast numbers and booleans | boolean | true / false | Adds :int, :float, :long suffixes
field_prefix | Optional dotted prefix | string | e.g., event. | Not applied to @ fields
include_pipeline | Emit ingest pipeline JSON | boolean | true / false | Uses the generated pattern
match_field | Source field grok reads | string | default message | Used in pipeline export
module_tag | Optional module hint | string | e.g., custom | Adds event.module in export
batch | Process every non‑empty line | boolean | true / false | Up to 200 lines per run
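
As a rough illustration of how these options combine (the exact output shape may differ): with detect_pairs and infer_types enabled and field_prefix set to event., a fragment such as retries=3 req_id=ab12 might be captured along these lines:

  retries=%{NUMBER:event.custom.retries:int} req_id=%{NOTSPACE:event.custom.req_id}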

Units, precision & rounding

  • Coverage badges round to the nearest integer; sample table coverage displays one decimal place.
  • Average coverage and fields are kept to two decimals in the JSON view.
  • Duration tokens accept suffixes ms, s, m, h.

Limits & behavior

  • Batch analysis processes up to 200 lines; single‑sample mode analyzes only the first non‑empty line.
  • Confidence label thresholds: High ≥ 0.75, Medium ≥ 0.45, Low > 0.
  • Group names in the compiled regex use safe identifiers with non‑alphanumerics replaced by underscores.
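
A sketch of that group-name sanitization, assuming a simple character replacement:

  // Dotted field names are not valid regex group names, so non-alphanumerics
  // become underscores when the pattern is compiled for sample evaluation.
  function safeGroupName(field: string): string {
    return field.replace(/[^A-Za-z0-9]/g, "_");
  }

  console.log(safeGroupName("http.response.status_code")); // http_response_status_code
  console.log(safeGroupName("@timestamp"));                // _timestamp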

Networking & storage

Processing runs in the browser. Clipboard copy, file download, and local file reads do not transmit data to a server. A charting layer may be loaded by the host page.

Assumptions & limitations

  • Heuristics may misclassify unusual token layouts
  • IPv6 recognition is basic and may not handle all compressed notations
  • HTTP date parsing assumes a standard day/month/year layout
  • Syslog timestamps lack a year and may need pipeline enrichment
  • Key detection expects key=value with optional quotes, not nested pairs
  • Unknown keys normalize to custom.* when normalization is on
  • Numbers are cast to :int or :float by simple patterns
  • Durations recognize only common unit suffixes and not compound forms

Edge cases & error sources

  • Mismatched or nested quotes in values
  • Overlapping tokens preventing a clean capture range
  • Epoch values shorter than 10 or longer than 13 digits
  • Non‑ASCII whitespace or zero‑width characters in pasted lines
  • Paths containing embedded quotes or brackets
  • IPv6 zones or mixed IPv6/IPv4 forms
  • Ambiguous method‑like words in message text
  • Locale‑specific month names in dates
  • Very long lines causing truncated table snippets
  • Trailing punctuation altering token boundaries

Privacy & compliance

Parsing is browser‑based and no server‑side storage is performed. Logs may contain sensitive data; review organizational policies before sharing examples.

Standards & references

  • RFC 3339 for date‑time profiles of ISO 8601.
  • RFC 3164 for traditional syslog format.
  • Elastic Common Schema for field naming consistency.

Step‑by‑Step Guide

Filebeat grok patterns are built from a single representative line or from a batch of lines.

  1. Choose a preset that matches your source, or keep Custom.
  2. Paste one canonical line into Sample.
  3. Set Timestamp hint if the date style is ambiguous.
  4. Toggle Normalize field names and Detect key=value as needed.
  5. Enable Batch analysis to check many lines at once.
  6. When satisfied, enable Include ingest pipeline and set the match field.

Your pattern is now ready to use in an ingest pipeline or for further tuning.
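
The exported pipeline is a standard Elasticsearch ingest pipeline containing a grok processor. Its exact contents depend on your options, but it will look broadly like the following (the pattern shown is a placeholder):

  {
    "description": "Grok pipeline generated from a sample line",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "^%{TIMESTAMP_ISO8601:@timestamp} %{LOGLEVEL:log.level} %{URIPATH:url.path} %{NUMBER:http.response.status_code:int} %{GREEDYDATA:message}$"
          ]
        }
      }
    ]
  }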

FAQ

Is my data stored?

No. Parsing and evaluation run in the browser, and downloads use local file generation. Review the exported JSON or DOCX locally before sharing.

How accurate is the pattern?

Accuracy depends on the representativeness of your sample. Coverage and confidence help judge fit. Use batch analysis to expose edge cases early.

Which timestamp formats are supported?

ISO 8601 or RFC 3339 date‑times, HTTP date strings, syslog timestamps, and 10 to 13 digit epoch values can be detected.

Can I use a different source field?

Yes. Set the match field when exporting the ingest pipeline. The default is message.

How do I parse Nginx ingress logs?

Select the Nginx ingress preset, paste a typical line, and adjust options. Confirm that status, bytes, upstream address, and request ID are captured.
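
For reference, a line from the default ingress-nginx access log format looks roughly like the sample below (values are made up, and older controller versions omit some trailing fields); the status, bytes, upstream address, and request ID sit toward the end of the line:

  192.168.10.21 - - [01/May/2024:12:00:00 +0000] "GET /api/items HTTP/1.1" 200 612 "-" "curl/8.4.0" 421 0.012 [default-api-svc-80] [] 10.0.3.14:8080 612 0.011 200 2a9f04c6e1d84a55b7f3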

What does “borderline” coverage mean?

It indicates many fields are custom or weakly classified. Consider adding stable keys, turning on normalization, or simplifying the message tail.

Does it work offline?

Once loaded, the generator runs locally. Optional charting libraries loaded by the host page may require network access on first load.

Are there costs or licenses?

The fragment describes behavior only. Refer to your deployment’s licensing and terms for usage of the surrounding platform and any third‑party libraries.

Troubleshooting

  • No match in samples: confirm the first line represents your format and adjust the timestamp hint.
  • Few fields captured: enable key detection and normalization, and add a field prefix if needed.
  • Coverage is low: prefer explicit keys and consistent quoting for stable captures.
  • Confidence is low: reduce ambiguity and remove noisy tokens from the example line.
  • Doc export unavailable: your host may not include the exporter module.
  • Clipboard blocked: allow copy permissions in your browser.
  • Batch truncated: reduce input size or run multiple batches.

Advanced Tips

  • Add a short field prefix to keep custom captures organized.
  • Use batch mode to reveal rare tokens and refine the message tail.
  • Prefer quoted values for user agents and referers to avoid token splits.
  • Map business‑specific keys to ECS fields to improve coverage.
  • Set the match field to parse alternate sources such as event.original.
  • Keep one stable example per source to regenerate patterns consistently.

Glossary

Grok pattern
Named regex fragments combined to parse text.
ECS field
Standardized name for common event attributes.
Coverage
Share of captured fields mapped to known concepts.
Confidence
Heuristic score based on token certainty.
Message tail
Remaining text after structured parts.
Ingest pipeline
Processing steps that enrich and parse events.