Filebeat Grok Pattern

Introduction:

Grok patterns are reusable match expressions for log messages that turn unstructured lines into labeled fields you can search. Filebeat uses them to extract time, severity, paths, and other details from common web and system logs. Many teams look for a simple grok pattern generator for Nginx or Apache when they want fast results.

Provide a representative line and you get a proposed pattern with captured fields and a clear sense of coverage. You can start from an Apache format, an ingress gateway style, or a classic syslog sample and then refine the result.

Results are designed to be scanned quickly so you can decide what to keep and what to ignore. A short sample is fine for a first pass, and a larger paste gives a clearer picture of match quality.

For example, a line with a timestamp, a level of INFO, a path of /api/items, and a status of 200 yields captures for time, level, path, and status, plus a message tail. That makes it easy to route fields to dashboards without manual regex work.

Logs often carry surprises and a match can still miss a corner case, so review the coverage and make small edits until the fields reflect your intent. Prefer stable keys and consistent quoting to keep patterns sturdy.

Technical Details:

Log lines are text records that often contain timestamps, levels, IP addresses, Uniform Resource Locators (URLs), user agents, and free text. Field names in the output follow Elastic Common Schema conventions, for example http.request.method, url.path, and log.level.

The generator assembles a Grok expression by detecting a timestamp, scanning for key=value fragments, classifying positional tokens such as methods and status codes, and escaping the literal separators between them. Recognized keys map to canonical ECS fields, while unfamiliar keys can be normalized into a dotted form.

Each run reports a pattern string, an estimated confidence label, known‑field coverage, and a table of captured fields. Coverage indicates how many captures align with well‑known ECS concepts; higher values usually need fewer custom mappings.
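
As a rough illustration (assuming coverage is simply the share of captures that resolve to recognized ECS fields), a pattern with six captures of which four map to ECS concepts would report about 4/6 ≈ 67% known‑field coverage.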

Presets include Apache HTTP combined, Nginx ingress, and Syslog (RFC 3164). Timestamps are detected as ISO 8601 or RFC 3339 date‑times, HTTP date strings, classic syslog stamps, or epoch milliseconds depending on the hint you select.
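
For reference, the stamp styles those hints target look like the following (values are illustrative):

  ISO 8601 / RFC 3339    2024-05-01T12:00:00.123Z
  HTTP date              01/May/2024:12:00:00 +0000
  Syslog (RFC 3164)      May  1 12:00:00
  Epoch seconds/millis   1714564800 or 1714564800123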

Processing pipeline

  1. Find a timestamp using the selected hint and add a time capture.
  2. Detect a standalone level token such as INFO or ERROR if present.
  3. Parse key=value pairs, map known keys to ECS, and infer simple types.
  4. Sweep remaining tokens and classify method, status, IP, host, URL path, and durations.
  5. Capture any remaining tail as a message when no explicit message key exists.
  6. Apply an optional field prefix and ensure unique field names.
  7. Escape literal separators, anchor with ^ and $, and output the pattern.
  8. Optionally emit an ingest pipeline JSON with a grok processor for the chosen match field.
  9. Compile the pattern to a regular expression and evaluate sample lines for coverage.
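
The sketch below shows a much-simplified version of steps 1, 3, 5, and 7 in TypeScript. It is illustrative only: the function names are hypothetical, NOTSPACE and GREEDYDATA stand in for the generator's finer-grained token classification, and no ECS mapping is applied.

  // Minimal sketch: detect an ISO timestamp, lift key=value pairs,
  // escape literal separators, and anchor the result.
  const ISO_TS = /\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?/;
  const KV = /(\w+)=("[^"]*"|\S+)/g;

  function escapeLiteral(text: string): string {
    // Escape regex metacharacters so literal separators survive compilation.
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  }

  function buildGrok(sample: string): string {
    let pattern = "";
    let rest = sample;

    // Step 1: timestamp capture.
    const ts = rest.match(ISO_TS);
    if (ts && ts.index !== undefined) {
      pattern += escapeLiteral(rest.slice(0, ts.index)) + "%{TIMESTAMP_ISO8601:@timestamp}";
      rest = rest.slice(ts.index + ts[0].length);
    }

    // Step 3: key=value pairs become named captures; text between pairs stays literal.
    let cursor = 0;
    for (const m of rest.matchAll(KV)) {
      const at = m.index ?? cursor;
      pattern += escapeLiteral(rest.slice(cursor, at)) + `${m[1]}=%{NOTSPACE:${m[1]}}`;
      cursor = at + m[0].length;
    }

    // Step 5: capture any remaining tail as a message; step 7: anchor the pattern.
    if (rest.slice(cursor).trim()) pattern += "\\s*%{GREEDYDATA:message}";
    return "^" + pattern + "$";
  }

  // Prints: ^%{TIMESTAMP_ISO8601:@timestamp} level=%{NOTSPACE:level} path=%{NOTSPACE:path}
  //         status=%{NOTSPACE:status}\s*%{GREEDYDATA:message}$
  console.log(buildGrok("2024-05-01T12:00:00Z level=info path=/api/items status=200 request served"));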

Token classes & meaning

Token classes mapped to meanings
Token | Meaning | Datatype | Typical ECS Field
TIMESTAMP_ISO8601 | ISO 8601 or RFC 3339 timestamp | timestamp | @timestamp
HTTPDATE | HTTP date string | timestamp | @timestamp
SYSLOGTIMESTAMP | Syslog style timestamp | timestamp | @timestamp
EPOCHTIMESTAMP | Epoch seconds or millis | number | @timestamp
WORD / NUMBER | Alphanumeric or numeric token | text / int / float | http.request.method, http.response.status_code
IP / HOSTNAME | Address or host label | ip / text | source.ip, host.name
URIPATH / URIPATHPARAM | Path with optional query | text | url.path
QUOTEDSTRING / GREEDYDATA | Quoted fragment or message tail | text | user_agent.original, message
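
Put together, a pattern built from these classes for an Apache-style access line might look roughly like the expression below. This is an illustrative shape, not the generator's exact output, which depends on your sample and options.

  ^%{IP:source.ip} - - \[%{HTTPDATE:@timestamp}\] "%{WORD:http.request.method} %{URIPATHPARAM:url.path} HTTP/%{NUMBER:http.version}" %{NUMBER:http.response.status_code:int} %{NUMBER:http.response.body.bytes:int} %{QUOTEDSTRING:http.request.referrer} %{QUOTEDSTRING:user_agent.original}$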

Canonicalization examples

Examples of key normalization
Input Key | Canonical Field
level, severity, loglevel | log.level
status, status_code | http.response.status_code
method, verb, request_method | http.request.method
path, route | url.path
url, uri | url.original
agent, user_agent | user_agent.original
app, program | service.name or process.name
pid, process_id | process.pid
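
A minimal sketch of that normalization in TypeScript, assuming a plain lookup table with a custom.* fallback (the generator's alias list is likely broader):

  // Map common log keys to ECS names; unknown keys fall back to custom.<key>.
  const ECS_ALIASES: Record<string, string> = {
    level: "log.level", severity: "log.level", loglevel: "log.level",
    status: "http.response.status_code", status_code: "http.response.status_code",
    method: "http.request.method", verb: "http.request.method",
    path: "url.path", route: "url.path",
    url: "url.original", uri: "url.original",
    agent: "user_agent.original", user_agent: "user_agent.original",
    pid: "process.pid", process_id: "process.pid",
  };

  function canonicalize(key: string): string {
    const k = key.toLowerCase();
    return ECS_ALIASES[k] ?? `custom.${k.replace(/[^a-z0-9_]/g, "_")}`;
  }

  console.log(canonicalize("Status"));    // http.response.status_code
  console.log(canonicalize("tenant-id")); // custom.tenant_id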

Parameters

Parameters and behavior
Parameter | Meaning | Datatype | Typical Range | Notes
timestamp_hint | Bias timestamp detection | enum | auto, iso8601, httpdate, syslog, epoch | Improves disambiguation in mixed formats
normalize_fields | Apply ECS‑style field names | boolean | true / false | Unknown keys become custom.*
detect_pairs | Scan for key=value tokens | boolean | true / false | Quoted values are supported
infer_types | Cast numbers and booleans | boolean | true / false | Adds :int, :float, :long suffixes
field_prefix | Optional dotted prefix | string | e.g., event. | Not applied to @ fields
include_pipeline | Emit ingest pipeline JSON | boolean | true / false | Uses the generated pattern
match_field | Source field grok reads | string | default message | Used in pipeline export
module_tag | Optional module hint | string | e.g., custom | Adds event.module in export
batch | Process every non‑empty line | boolean | true / false | Up to 200 lines per run
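
As a rough illustration of how these options combine (the exact output shape may differ): with detect_pairs and infer_types enabled and field_prefix set to event., a fragment such as retries=3 req_id=ab12 might be captured along these lines:

  retries=%{NUMBER:event.custom.retries:int} req_id=%{NOTSPACE:event.custom.req_id}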

Units, precision & rounding

  • Coverage badges round to the nearest integer; sample table coverage displays one decimal place.
  • Average coverage and fields are kept to two decimals in the JSON view.
  • Duration tokens accept suffixes ms, s, m, h.

Limits & behavior

  • Batch analysis processes up to 200 lines; single‑sample mode analyzes only the first non‑empty line.
  • Confidence label thresholds: High ≥ 0.75, Medium ≥ 0.45, Low > 0.
  • Group names in the compiled regex use safe identifiers with non‑alphanumerics replaced by underscores.
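
A sketch of that group-name sanitization, assuming a simple character replacement:

  // Dotted field names are not valid regex group names, so non-alphanumerics
  // become underscores when the pattern is compiled for sample evaluation.
  function safeGroupName(field: string): string {
    return field.replace(/[^A-Za-z0-9]/g, "_");
  }

  console.log(safeGroupName("http.response.status_code")); // http_response_status_code
  console.log(safeGroupName("@timestamp"));                // _timestamp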

Networking & storage

Processing runs in the browser. Clipboard copy, file download, and local file reads do not transmit data to a server. A charting layer may be loaded by the host page.

Assumptions & limitations

  • Heuristics may misclassify unusual token layouts
  • IPv6 recognition is basic and may not handle all compressed notations
  • HTTP date parsing assumes a standard day/month/year layout
  • Syslog timestamps lack a year and may need pipeline enrichment
  • Key detection expects key=value with optional quotes, not nested pairs
  • Unknown keys normalize to custom.* when normalization is on
  • Numbers are cast to :int or :float by simple patterns
  • Durations recognize only common unit suffixes and not compound forms

Edge cases & error sources

  • Mismatched or nested quotes in values
  • Overlapping tokens preventing a clean capture range
  • Epoch values shorter than 10 or longer than 13 digits
  • Non‑ASCII whitespace or zero‑width characters in pasted lines
  • Paths containing embedded quotes or brackets
  • IPv6 zones or mixed IPv6/IPv4 forms
  • Ambiguous method‑like words in message text
  • Locale‑specific month names in dates
  • Very long lines causing truncated table snippets
  • Trailing punctuation altering token boundaries

Privacy & compliance

Parsing is browser‑based and no server‑side storage is performed. Logs may contain sensitive data; review organizational policies before sharing examples.

Standards & references

  • RFC 3339 for date‑time profiles of ISO 8601.
  • RFC 3164 for traditional syslog format.
  • Elastic Common Schema for field naming consistency.

Step‑by‑Step Guide

Filebeat grok patterns are built from a single representative line or from a batch of lines.

  1. Choose a preset that matches your source, or keep Custom.
  2. Paste one canonical line into Sample.
  3. Set Timestamp hint if the date style is ambiguous.
  4. Toggle Normalize field names and Detect key=value as needed.
  5. Enable Batch analysis to check many lines at once.
  6. When satisfied, enable Include ingest pipeline and set the match field.

Your pattern is now ready to use in an ingest pipeline or for further tuning.
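
The exported pipeline is a standard Elasticsearch ingest pipeline containing a grok processor. Its exact contents depend on your options, but it will look broadly like the following (the pattern shown is a placeholder):

  {
    "description": "Grok pipeline generated from a sample line",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "^%{TIMESTAMP_ISO8601:@timestamp} %{LOGLEVEL:log.level} %{URIPATH:url.path} %{NUMBER:http.response.status_code:int} %{GREEDYDATA:message}$"
          ]
        }
      }
    ]
  }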

FAQ

Is my data stored?

No. Parsing and evaluation run in the browser, and downloads use local file generation. Review the exported JSON or DOCX locally before sharing.

How accurate is the pattern?

Accuracy depends on the representativeness of your sample. Coverage and confidence help judge fit. Use batch analysis to expose edge cases early.

Which timestamp formats are supported?

ISO 8601 or RFC 3339 date‑times, HTTP date strings, syslog timestamps, and 10 to 13 digit epoch values can be detected.

Can I use a different source field?

Yes. Set the match field when exporting the ingest pipeline. The default is message.

How do I parse Nginx ingress logs?

Select the Nginx ingress preset, paste a typical line, and adjust options. Confirm that status, bytes, upstream address, and request ID are captured.
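
For reference, a line from the default ingress-nginx access log format looks roughly like the sample below (values are made up, and older controller versions omit some trailing fields); the status, bytes, upstream address, and request ID sit toward the end of the line:

  192.168.10.21 - - [01/May/2024:12:00:00 +0000] "GET /api/items HTTP/1.1" 200 612 "-" "curl/8.4.0" 421 0.012 [default-api-svc-80] [] 10.0.3.14:8080 612 0.011 200 2a9f04c6e1d84a55b7f3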

What does “borderline” coverage mean?

It indicates many fields are custom or weakly classified. Consider adding stable keys, turning on normalization, or simplifying the message tail.

Does it work offline?

Once loaded, the generator runs locally. Optional charting libraries loaded by the host page may require network access on first load.

Are there costs or licenses?

The fragment describes behavior only. Refer to your deployment’s licensing and terms for usage of the surrounding platform and any third‑party libraries.

Troubleshooting

  • No match in samples: confirm the first line represents your format and adjust the timestamp hint.
  • Few fields captured: enable key detection and normalization, and add a field prefix if needed.
  • Coverage is low: prefer explicit keys and consistent quoting for stable captures.
  • Confidence is low: reduce ambiguity and remove noisy tokens from the example line.
  • Doc export unavailable: your host may not include the exporter module.
  • Clipboard blocked: allow copy permissions in your browser.
  • Batch truncated: reduce input size or run multiple batches.

Advanced Tips

  • Add a short field prefix to keep custom captures organized.
  • Use batch mode to reveal rare tokens and refine the message tail.
  • Prefer quoted values for user agents and referers to avoid token splits.
  • Map business‑specific keys to ECS fields to improve coverage.
  • Set the match field to parse alternate sources such as event.original.
  • Keep one stable example per source to regenerate patterns consistently.

Glossary

Grok pattern
Named regex fragments combined to parse text.
ECS field
Standardized name for common event attributes.
Coverage
Share of captured fields mapped to known concepts.
Confidence
Heuristic score based on token certainty.
Message tail
Remaining text after structured parts.
Ingest pipeline
Processing steps that enrich and parse events.