Filebeat Grok Pattern Generator
Turn one representative log line into a Filebeat-ready Grok pattern with ECS field mapping, sample-match checks, and optional ingest pipeline JSON.
Introduction
Log files often compress a full event into one line of text. A web request, database warning, system login, or proxy error may carry a timestamp, host, process name, client address, URL, status code, duration, and human-readable message in a single string. That format is convenient in a terminal, but search, dashboards, alerts, and retention rules work better when those pieces become named fields.
Grok is a pattern language used in the Elastic stack to split log text into fields. It combines reusable tokens such as IP, HTTPDATE, WORD, NUMBER, and GREEDYDATA with capture names such as source.ip, @timestamp, and http.response.status_code. The pattern still has to match the literal punctuation, spaces, brackets, and quotes around those captures.
- Capture
- A named value extracted from the log line, such as a client IP address or HTTP status code.
- Literal text
- The fixed characters that must appear around captures, including spaces, quotes, brackets, and separators.
- ECS name
- An Elastic Common Schema field name that helps events from different sources use shared vocabulary.
- Ingest pipeline
- An Elasticsearch processing chain that can run a Grok processor before a document is indexed.
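To make the relationship between tokens, capture names, and literal text concrete, here is a minimal Python sketch that expands a Grok-style expression into a regular expression. The alias table is a hypothetical, heavily simplified stand-in for the real Grok definitions, and the helper names `grok_to_regex` and `grok_match` are illustrative only.

```python
import re

# Hypothetical, simplified alias table. Real Grok libraries define
# these tokens with much stricter regular expressions.
ALIASES = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "NUMBER": r"-?\d+(?:\.\d+)?",
    "GREEDYDATA": r".*",
}

# Matches %{TOKEN}, %{TOKEN:field}, or %{TOKEN:field:type}.
GROK_REF = re.compile(r"%\{(\w+)(?::([^}:]+))?(?::(\w+))?\}")


def grok_to_regex(pattern):
    """Expand Grok references; return (compiled regex, {group: field name})."""
    names = {}

    def expand(match):
        token, field = match.group(1), match.group(2)
        body = ALIASES[token]
        if field is None:
            return f"(?:{body})"
        # Python group names cannot contain dots or '@', so use an
        # internal group id and remember the real field name.
        group = f"g{len(names)}"
        names[group] = field
        return f"(?P<{group}>{body})"

    return re.compile(GROK_REF.sub(expand, pattern)), names


def grok_match(pattern, line):
    """Return {field: captured text}, or None when the line does not match."""
    regex, names = grok_to_regex(pattern)
    found = regex.match(line)
    if found is None:
        return None
    return {names[group]: value for group, value in found.groupdict().items()}


PATTERN = (
    "^%{IP:source.ip} %{WORD:http.request.method} "
    "%{NUMBER:http.response.status_code} %{GREEDYDATA:message}$"
)
print(grok_match(PATTERN, "192.0.2.10 GET 200 request served"))
```

Everything outside a `%{...}` reference is left untouched, which is why spaces and other literal characters in the expression still have to line up with the log line.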
Pattern quality depends on how representative the sample is. Access logs, syslog lines, and database logs often have stable shapes, while application logs may contain optional keys, embedded quotes, variable message tails, or multi-line stack traces. One successful match proves only that one example can be parsed.
The practical goal is a draft that is narrow enough to reject the wrong log family and flexible enough to accept normal variation inside the right one. Broad captures such as DATA and GREEDYDATA are useful for message text, but they can also hide fields that deserve separate names.
How to Use This Tool:
Start with one representative single-line event. The generator uses the first non-empty line for the pattern and sample analysis, so choose the line that best represents the log family you plan to parse.
- Choose Preset when the sample matches a known family such as Apache HTTP combined, Apache HTTP common, Apache error log, Nginx ingress, Nginx error log, HAProxy HTTP, IIS W3C, Postgres CSV log, or Syslog. Choose Custom for application-specific formats.
- Paste a Sample log line, use Browse log, or drop a LOG or TXT file. If the input contains several non-empty lines, the note shows how many extra lines were ignored for single-line pattern generation.
- Set Timestamp hint only when auto-detection could choose the wrong date shape. The hints cover ISO 8601 / RFC 3339, HTTP date, syslog timestamp, and epoch milliseconds.
- Leave Normalize field names on when captures should use ECS-style names. Use Detect key=value pairs for structured application logs, and turn it off when fixed-column text is being overcaptured.
- Use Infer numeric and boolean types when clear integers, decimals, and booleans should receive typed Grok suffixes such as :int, :float, or :boolean.
- Review the Grok field mapping editor. Rename generic fields such as segment.1, and fix blank, duplicate, whitespace-containing, or colon-containing field names before copying the pattern.
- Open Advanced only when you need the optional Pipeline Snippet, a custom Pipeline ID, a different Match field name, a field prefix, or a module tag for handoff.
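The field-name checks mentioned in the steps above can be sketched as a small validator. This is an assumption about how such a check might look, not the tool's own code; `find_field_name_problems` is a hypothetical helper. Colons are rejected because a Grok reference uses them to separate the token, the field name, and the type suffix.

```python
from collections import Counter


def find_field_name_problems(names):
    """Return {index: reason} for names that would break a Grok capture."""
    counts = Counter(name for name in names if name)
    problems = {}
    for index, name in enumerate(names):
        if not name:
            problems[index] = "blank"
        elif any(ch.isspace() for ch in name):
            problems[index] = "contains whitespace"
        elif ":" in name:
            # ':' separates token, field, and type in %{TOKEN:field:type}
            problems[index] = "contains colon"
        elif counts[name] > 1:
            problems[index] = "duplicate"
    return problems


names = ["source.ip", "", "user name", "http:status", "source.ip", "message"]
for index, reason in find_field_name_problems(names).items():
    print(index, repr(names[index]), reason)
```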
Interpreting Results:
The Grok Pattern tab contains the string that will be copied into Elastic tooling. The other tabs explain how that pattern was assembled, how the representative line matched, and which fields need review before the pattern becomes part of a live pipeline.
- Field Mapping shows each capture, its Grok token, sample value, inferred type, and field name. Generic field names should be renamed only when the value has a stable meaning across real logs.
- Pattern Metrics reports segment count, confidence score, known field coverage, literal escaped characters, and sample-analysis metrics.
- Sample Analysis should show Matched for the representative line. No match means the anchored expression did not match the line from start to finish.
- Pattern Types shows the mix of token categories. Heavy use of broad text tokens means the expression may accept more variation than intended.
- Pipeline Snippet appears only when ingest pipeline JSON is enabled. The snippet is a handoff artifact; it does not create a pipeline or configure Filebeat by itself.
A high confidence label means the preset or recognizers found familiar structure. It is still a draft. Test the copied pattern with successful events, failures, empty fields, IPv6 addresses, odd quoting, and other production variants in Elastic's Grok Debugger or ingest pipeline simulation before routing live Filebeat data through it.
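One way to prepare that wider test is to build a request body for the Elasticsearch simulate endpoint (`POST /_ingest/pipeline/_simulate`) from several production variants. The sketch below only assembles the JSON; sending it is left to whichever client you use. The helper name and the sample lines are illustrative assumptions.

```python
import json


def build_simulate_request(grok_pattern, sample_lines, match_field="message"):
    """Assemble a body for POST /_ingest/pipeline/_simulate."""
    return {
        "pipeline": {
            "processors": [
                {"grok": {"field": match_field, "patterns": [grok_pattern]}}
            ]
        },
        # One simulated document per sample line.
        "docs": [{"_source": {match_field: line}} for line in sample_lines],
    }


samples = [
    '192.0.2.10 - - [12/Oct/2024:17:42:03 +0000] "GET / HTTP/1.1" 200 1043',
    '2001:db8::1 - - [12/Oct/2024:17:42:04 +0000] "GET / HTTP/1.1" 500 0',
]
body = build_simulate_request("%{COMMONAPACHELOG}", samples)
print(json.dumps(body, indent=2))
```

Including failure responses and IPv6 clients in `samples` makes it easier to spot variants that a single-line draft never saw.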
Technical Details:
Grok expressions sit on top of regular-expression matching while adding reusable aliases and semantic names. The syntax %{IP:source.ip} captures an IP-shaped value into source.ip. A suffix such as :int asks the receiving processor to convert a captured number rather than store it as plain text.
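The three parts of a reference can be illustrated with a short sketch that splits `%{TOKEN:field:type}` and applies the type suffix to a captured string. The converter table is an assumption limited to the suffixes this page mentions; the helper names are hypothetical.

```python
def parse_reference(reference):
    """Split '%{NUMBER:field:int}' into (token, field, type)."""
    text = reference.strip()
    if not (text.startswith("%{") and text.endswith("}")):
        raise ValueError(f"not a Grok reference: {reference!r}")
    parts = text[2:-1].split(":")
    token = parts[0]
    field = parts[1] if len(parts) > 1 else None
    type_suffix = parts[2] if len(parts) > 2 else None
    return token, field, type_suffix


# Illustrative converters for the suffixes named on this page.
CONVERTERS = {
    "int": int,
    "float": float,
    "boolean": lambda text: text.lower() == "true",
}


def convert(value, type_suffix):
    """Apply the type suffix; without one the capture stays a string."""
    if type_suffix is None:
        return value
    return CONVERTERS[type_suffix](value)


token, field, type_suffix = parse_reference("%{NUMBER:http.response.status_code:int}")
print(token, field, convert("200", type_suffix))
```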
Filebeat can publish events to Elasticsearch, where an ingest pipeline runs processors before indexing. A Grok processor usually targets a text field such as message, applies one or more patterns, and writes named captures back into the event. The pattern therefore has two review surfaces: the text match itself and the schema fit of the field names.
Transformation Core
| Clue in the log | Typical Grok output | Field naming result | Main review point |
|---|---|---|---|
| Known preset shape. | Anchored family pattern with predefined tokens. | Mostly ECS-style names and source-specific fields. | A preset mismatch falls back to custom inference and should be treated as lower certainty. |
| ISO, HTTP, syslog, or epoch-like timestamp. | TIMESTAMP_ISO8601, HTTPDATE, SYSLOGTIMESTAMP, or numeric epoch capture. | Usually @timestamp. | The timestamp hint matters when several date shapes could be plausible. |
| key=value or key: value fragments. | Value-specific tokens such as NUMBER, IP, URI, WORD, or DATA. | Known keys map to ECS fields; unknown keys become custom dotted names when normalization is on. | Fixed-column logs can be distorted when pair detection treats positional text as keys. |
| HTTP methods, status codes, IP addresses, hosts, paths, quoted user agents, and durations. | Loose recognizer captures with confidence based on how specific the token is. | Common fields such as http.request.method, http.response.status_code, source.ip, and url.path. | Generic names such as segment.1 mean the value was recognized as text, not understood semantically. |
| Remaining message text. | GREEDYDATA message tail. | Usually message. | A greedy capture is useful for human text but can hide fields that deserve separate parsing. |
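The timestamp row can be illustrated with a rough shape detector. The regular expressions below are simplified assumptions, not the generator's recognizers, and `NUMBER` stands in for the numeric epoch capture. The optional `hint` argument plays the role of the Timestamp hint setting by restricting detection to one shape.

```python
import re

# Simplified, illustrative shapes paired with the Grok token they suggest.
TIMESTAMP_SHAPES = [
    ("TIMESTAMP_ISO8601", re.compile(
        r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?")),
    ("HTTPDATE", re.compile(
        r"\d{2}/[A-Z][a-z]{2}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}")),
    ("SYSLOGTIMESTAMP", re.compile(
        r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}")),
    ("NUMBER", re.compile(r"\b\d{13}\b")),  # epoch milliseconds
]


def detect_timestamp_token(line, hint=None):
    """Return the token for the earliest timestamp-like text, or None."""
    hits = []
    for token, regex in TIMESTAMP_SHAPES:
        if hint is not None and token != hint:
            continue  # a hint limits detection to a single shape
        match = regex.search(line)
        if match:
            hits.append((match.start(), token))
    return min(hits)[1] if hits else None


print(detect_timestamp_token("2024-10-20T14:52:11Z api-prod-1 app[1234]: started"))
print(detect_timestamp_token("Oct 12 17:42:03 web-1 sshd[311]: Accepted publickey"))
```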
Coverage and Confidence
Coverage and confidence describe different properties. Known field coverage measures how many captures map to recognized field names. Sample coverage measures how many generated fields received a value when the representative line was checked. Confidence is the average strength of the preset or heuristic recognizers used to build the fields.
If a pattern defines 10 captures and the representative line fills 9 of them, sample coverage is 90 percent. A failed full-line match reports no captured fields because the anchors did not accept the sample from beginning to end.
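The arithmetic is simple enough to sketch directly. The function name and the one-decimal rounding are assumptions made to mirror the percentage shown in the results; a failed match is modelled as zero captured fields.

```python
def sample_coverage(captured_count, defined_count):
    """Percent of defined captures that received a value from the sample."""
    if defined_count == 0:
        return 0.0
    return round(100.0 * captured_count / defined_count, 1)


print(sample_coverage(9, 10))   # 9 of 10 captures filled
print(sample_coverage(0, 10))   # failed full-line match: nothing captured
```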
| Signal | Boundary | Meaning |
|---|---|---|
| High confidence | At least 0.75. | A preset matched or most fields came from specific, known recognizers. |
| Medium confidence | At least 0.45 and below 0.75. | The draft has useful structure, but some captures are weaker or custom. |
| Low confidence | Above 0 and below 0.45. | The expression leans heavily on generic text captures. |
| Unknown | 0. | No meaningful confidence signal is available. |
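The boundaries in the table translate into a small classifier. This is a sketch of the thresholds as stated above; the averaging helper assumes per-field recognizer scores between 0 and 1, which is an illustrative assumption about the inputs.

```python
def average_confidence(field_scores):
    """Mean recognizer strength across fields; 0.0 when there are none."""
    if not field_scores:
        return 0.0
    return sum(field_scores) / len(field_scores)


def confidence_label(score):
    """Map a confidence score to the labels in the table above."""
    if score >= 0.75:
        return "High"
    if score >= 0.45:
        return "Medium"
    if score > 0:
        return "Low"
    return "Unknown"


score = average_confidence([1.0, 0.5])
print(score, confidence_label(score))
```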
The generated patterns are anchored with ^ and $. Anchoring prevents partial matches from looking successful, but it also means optional sections need deliberate handling. If production logs sometimes omit a referer, add an upstream field, change quote style, or switch date format, use multiple tested patterns or a more specific pipeline branch rather than one overloaded expression.
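The effect of anchoring, and the idea of trying several tested patterns in order, can be shown with plain regular expressions standing in for Grok. The Elasticsearch Grok processor accepts an ordered list of patterns and stops at the first one that matches; the simplified log shape and group names below are assumptions for illustration.

```python
import re

# Two anchored variants of one hypothetical log family.
WITH_REFERER = (
    r'^(?P<ip>\S+) "(?P<method>[A-Z]+) (?P<path>\S+)" '
    r'(?P<status>\d{3}) "(?P<referer>[^"]*)"$'
)
WITHOUT_REFERER = r'^(?P<ip>\S+) "(?P<method>[A-Z]+) (?P<path>\S+)" (?P<status>\d{3})$'
PATTERNS = [WITH_REFERER, WITHOUT_REFERER]

# The same shape without anchors, to show a misleading partial match.
UNANCHORED = r'(?P<ip>\S+) "(?P<method>[A-Z]+) (?P<path>\S+)" (?P<status>\d{3})'


def first_full_match(patterns, line):
    """Try anchored patterns in order; return (index, fields) or None."""
    for index, pattern in enumerate(patterns):
        match = re.match(pattern, line)
        if match:
            return index, match.groupdict()
    return None


odd_line = '192.0.2.10 "GET /index.html" 200 upstream=10.0.0.5'
print(re.search(UNANCHORED, odd_line) is not None)  # partial match succeeds
print(first_full_match(PATTERNS, odd_line))         # anchored variants reject it
print(first_full_match(PATTERNS, '192.0.2.10 "GET /index.html" 200'))
```

Keeping each variant anchored means an unexpected trailing section surfaces as a failure instead of being silently dropped.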
The optional ingest pipeline JSON wraps the pattern in a Grok processor for the selected match field, can add an event.module tag, and can include a pipeline ID. The JSON still has to be stored in Elasticsearch and referenced by Filebeat before it affects indexed events.
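A pipeline body along these lines can be sketched as follows. The exact JSON the generator emits is not shown on this page, so the structure here is an assumption built from standard `grok` and `set` processors; the pipeline ID `filebeat-custom-grok` and the module value are hypothetical.

```python
import json


def build_pipeline(grok_pattern, match_field="message", module=None):
    """Assemble an ingest pipeline body with a Grok processor."""
    processors = [{"grok": {"field": match_field, "patterns": [grok_pattern]}}]
    if module:
        # Optional module tag, written as an event.module field.
        processors.append({"set": {"field": "event.module", "value": module}})
    return {
        "description": "Draft Grok pipeline for Filebeat events",
        "processors": processors,
    }


pipeline_id = "filebeat-custom-grok"  # hypothetical ID
pipeline = build_pipeline(
    "^%{IP:source.ip} %{GREEDYDATA:message}$", module="custom_app"
)

# The ID is part of the request path, not the body:
#   PUT _ingest/pipeline/filebeat-custom-grok
# Filebeat then references it in filebeat.yml:
#   output.elasticsearch.pipeline: "filebeat-custom-grok"
print(f"PUT _ingest/pipeline/{pipeline_id}")
print(json.dumps(pipeline, indent=2))
```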
Accuracy and Privacy Notes:
Log parsing can look precise while still being fragile. A pattern that matches one line may fail on error paths, empty fields, alternate timestamps, IPv6 addresses, escaped quotes, unusual user agents, or multi-line stack traces.
- The in-page analysis uses the first non-empty line for single-line generation, even when pasted text or a file contains more lines.
- Broad tokens such as DATA, GREEDYDATA, and quoted-string captures should be reviewed because they can accept unintended text.
- Sample parsing and LOG/TXT file reading happen in the browser, but real logs can contain tokens, usernames, internal hosts, customer IDs, or request details. Redact sensitive values before sharing output.
- Elastic's Grok Debugger or ingest simulation remains the final check because custom pattern definitions, processor order, and pipeline failure handling are controlled by the running Elasticsearch environment.
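Redaction before sharing can be sketched as a few substitutions that keep the shape of each value, so a redacted line still exercises the same pattern. The rules below are illustrative assumptions and far from exhaustive; they swap in documentation-range values such as 192.0.2.1 and example.com rather than placeholders that would change the token type.

```python
import re

# Illustrative rules only; extend them for your own log family.
REDACTIONS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "192.0.2.1"),
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "user@example.com"),
    (re.compile(r"(?i)\b(token|password|api_key)=\S+"), r"\1=REDACTED"),
]


def redact(line):
    """Replace sensitive-looking values with shape-preserving stand-ins."""
    for regex, replacement in REDACTIONS:
        line = regex.sub(replacement, line)
    return line


print(redact("user=alice@corp.example ip=10.1.2.3 token=abc123 status=200"))
```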
Worked Examples:
Apache combined access log. A line such as 192.0.2.10 - - [12/Oct/2024:17:42:03 +0000] "GET /index.html HTTP/1.1" 200 1043 "referrer-page" "Mozilla/5.0" matches the Apache HTTP combined preset shape. Field Mapping should include source.ip, @timestamp, http.request.method, url.path, http.response.status_code, http.response.body.bytes, referer when present, and user_agent.original. Test normal responses, redirects, errors, blank referers, and unusual user agents before rollout.
Custom key-value application log. A line like 2024-10-20T14:52:11Z api-prod-1 app[1234]: level=INFO message="Started request" duration=35 path=/api/items status=200 lets custom inference detect a timestamp, log level, message, duration, path, and HTTP status. If Field Mapping includes generic segment.* fields, rename them only when the sample value has a stable meaning across real events.
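Pair detection on a line like this can be sketched with one regular expression and a small key-to-ECS table. The mapping table, the token choices, and the helper names are assumptions for illustration; the generator's own recognizers may pick different tokens or field names.

```python
import re

# key=value where the value is either a double-quoted string or a bare word.
PAIR = re.compile(r'(\w[\w.]*)=("(?:[^"\\]|\\.)*"|\S+)')

# Hypothetical mapping from known keys to ECS names.
ECS_NAMES = {
    "level": "log.level",
    "path": "url.path",
    "status": "http.response.status_code",
}


def pick_token(value):
    """Choose a Grok token and type suffix from the sample value."""
    if re.fullmatch(r"-?\d+", value):
        return "NUMBER", ":int"
    if re.fullmatch(r"-?\d+\.\d+", value):
        return "NUMBER", ":float"
    if re.fullmatch(r"\w+", value):
        return "WORD", ""
    return "NOTSPACE", ""


def pairs_to_grok(line):
    """Build the key=value portion of a Grok expression for one line."""
    parts = []
    for key, raw in PAIR.findall(line):
        field = ECS_NAMES.get(key, key)  # unknown keys keep their own name
        if raw.startswith('"'):
            parts.append(f'{key}="%{{DATA:{field}}}"')
        else:
            token, suffix = pick_token(raw)
            parts.append(f"{key}=%{{{token}:{field}{suffix}}}")
    return " ".join(parts)


line = ('2024-10-20T14:52:11Z api-prod-1 app[1234]: level=INFO '
        'message="Started request" duration=35 path=/api/items status=200')
print(pairs_to_grok(line))
```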
Preset mismatch recovery. If Nginx error log is selected but the sample is really an application log, the warning explains that fallback heuristics were applied. Choose Custom or a closer preset, then confirm Sample Analysis returns Matched before copying the pattern.
FAQ:
Can I paste several sample lines?
You can paste them, but the generated pattern is based on the first non-empty line. Use Elastic tooling with a larger sample set before production rollout.
Why did my preset fall back to custom inference?
The representative line did not match the selected preset's expected shape. Pick a closer preset, fix the sample format, or keep the custom draft and review every field name.
Why does Sample Analysis show No match?
The generated pattern did not match the representative line from start to finish. Check the timestamp hint, pair detection setting, field edits, and broad tokens, then regenerate or test a cleaner sample.
Should I keep normalized ECS names?
Keep Normalize field names on when parsed events should align with fields such as source.ip, url.path, and http.response.status_code. Turn it off only when your pipeline intentionally uses source-specific names.
Does the pipeline snippet configure Filebeat?
No. The Pipeline Snippet is JSON for an Elasticsearch ingest pipeline. You still need to create the pipeline and configure Filebeat to use that pipeline ID.
Are pasted logs uploaded?
The sample pattern generation and file reading happen in the browser. Still redact secrets before sharing copied patterns, JSON, or reports because examples can contain sensitive log values.
Glossary:
- Grok pattern
- A pattern expression that uses aliases and named captures to extract fields from log text.
- Named capture
- The field name attached to a matched token, such as source.ip or @timestamp.
- Elastic Common Schema
- A shared field naming model for logs, metrics, and other event data in Elasticsearch.
- Ingest pipeline
- An Elasticsearch processing chain that can parse, set, rename, or otherwise transform events before indexing.
- Greedy capture
- A broad capture such as GREEDYDATA that can consume the rest of a message.
- Sample coverage
- The percent of generated fields that received values when the representative line was checked.