HTTP Access Latency Log Analyzer
Analyze NGINX, Apache, or ALB access logs in your browser, compare endpoint P95/P99 and 5xx signals, and export SLA findings.
Introduction
Slow web requests rarely announce themselves as one neat failure. A checkout route may answer most users in under a second while a small share waits through cache misses, overloaded targets, retry storms, or a database queue. HTTP access logs are often the fastest evidence trail because they sit at the edge of the service and record what actually reached the server, proxy, or load balancer.
An access-log latency review turns individual request rows into a picture of service behavior. The useful row usually contains a method, path, status code, and one or more timing fields. Those fields may describe different parts of the request path: time spent at a reverse proxy, time waiting for an upstream application, target processing time behind a load balancer, or total elapsed request time.
- Endpoint
- A request method and path grouped together, such as `GET /api/orders/:id`, so similar requests can be compared as one route.
- Status class
- A broad HTTP response family such as `2xx`, `4xx`, or `5xx`. Latency and failures should be read together because a fast error is still a failed request.
- Tail latency
- The slower end of the sample, commonly summarized with P95 or P99 instead of an average.
Percentiles are more useful than averages for production latency work. P50 is the median, so half of the parsed requests are at or below that time. P95 marks the point where about 95 percent of parsed requests are at or below the value, leaving the slowest five percent in the tail. P99 is stricter again and often catches rare queueing, cold starts, retries, or overloaded dependencies that a median hides.
Timing basis is a common source of confusion. Backend or target timing focuses on the application side of the request. Edge or total timing includes more of the proxy, load balancer, and request handling path. Neither view is universally better. Backend timing is useful for service owners trying to find slow routes; edge timing is useful when the user-facing path, queueing, upload time, or load-balancer behavior is part of the question.
A reliable review also asks how much of the log was actually parsed. If format selection, timing units, or missing fields cause many rows to be skipped, the remaining percentile table can look precise while describing only part of the traffic. The safest reading combines endpoint percentiles, status-class mix, slow-request counts, and parse coverage before making an operational change.
Access logs are a starting point, not a full trace. They show request timing at the logging point, but they do not explain every database wait, downstream retry, cache decision, or client-network delay. Use them to narrow the question before checking traces, metrics, deploy history, target health, and application logs.
How to Use This Tool:
Start by matching the parser and timing basis to the log sample. The percentile and SLA findings are only useful when the rows are interpreted with the right format and unit.
- Enter a Service name so copied tables, downloaded files, and document exports carry useful context.
- Choose Log format for the rows you are pasting: NGINX key-value timing, Apache combined plus latency, or AWS ALB access log. Use the Apache option only when the access log includes a duration token such as `%D`, `%T`, `%{ms}T`, or a keyed value such as `duration=`.
- Select Latency basis. Choose backend or target time for application service review, edge or total time for the wider request path, or auto when backend timing should win whenever the parser finds it.
- Set the P95 target, then leave Group IDs in paths on when numeric IDs or long hashes would split one route into many low-count endpoints.
- Paste access log rows, browse for a LOG or TXT file, drag a file onto Access log lines, or choose Load sample. Use Normalize when pasted rows include extra trailing whitespace or inconsistent line endings.
- Review any Check source data warning before using the tables. Ignored-line warnings usually mean the selected parser, timing field, or Apache latency unit does not match the input. Open Parse Ledger to see the line-level parser note.
- Read Endpoint Latency and SLA Findings after the warning state is clear or understood. The summary badge shows parsed-line coverage, while the tables identify the hottest endpoint, 5xx pressure, slow-request count, and parse loss.
Interpreting Results:
Endpoint Latency ranks routes by P95 latency first, then request count. The State column moves to review when an endpoint P95 is greater than the configured P95 target or its 5xx rate is greater than the configured 5xx target.
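The ranking and review rule above can be sketched in a few lines of Python. This is an illustration only, not the tool's code: the row field names (`count`, `count_5xx`, `p95_ms`), the `pass` label, and the choice to sort both keys in descending order are assumptions made for the example.

```python
def rank_endpoints(rows, p95_target_ms, target_5xx_pct):
    """Rank endpoint rows by P95 (then request count) and attach a state.

    Each row is a dict with hypothetical keys: endpoint, count, count_5xx, p95_ms.
    """
    ranked = []
    for row in rows:
        # The 5xx rate uses the endpoint's parsed requests as the denominator.
        rate_5xx = row["count_5xx"] * 100 / row["count"] if row["count"] else 0.0
        # Review when either gate is strictly exceeded.
        breach = row["p95_ms"] > p95_target_ms or rate_5xx > target_5xx_pct
        ranked.append({
            **row,
            "rate_5xx_pct": rate_5xx,
            "state": "review" if breach else "pass",  # "pass" is an assumed label
        })
    # Assumption: highest P95 first, ties broken by the larger request count.
    ranked.sort(key=lambda r: (-r["p95_ms"], -r["count"]))
    return ranked


rows = [
    {"endpoint": "GET /api/orders/:id", "count": 120, "count_5xx": 0, "p95_ms": 180.0},
    {"endpoint": "POST /api/payments", "count": 40, "count_5xx": 1, "p95_ms": 220.0},
    {"endpoint": "GET /api/search", "count": 300, "count_5xx": 0, "p95_ms": 910.0},
]
ranked = rank_endpoints(rows, p95_target_ms=300.0, target_5xx_pct=1.0)
```

In this sample the payments route is under the latency target but still lands in review because one 5xx in 40 requests exceeds a 1% target.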
Status Class Mix shows whether the sample is mostly successful traffic, client-visible rejects, server-side failures, or rows with missing status. Treat this table as context for the latency table. A route with low latency but a high 5xx rate is not healthy, and a slow 4xx-heavy route may point to abusive clients, validation problems, or authentication flows rather than backend saturation.
| Signal | What it means | What to verify |
|---|---|---|
| P95 breach | The sample's 95th percentile is greater than the selected target. | Check whether one endpoint, status class, or timing basis owns the breach. |
| P99 tail latency | The 99th percentile is above the slow request cutoff. | Inspect raw rows around the slow endpoints and look for retries, queueing, or target saturation. |
| 5xx rate | Server-side or load-balancer failure responses exceed the selected target. | Correlate status mix with deploys, target health, upstream resets, and capacity events. |
| Parse loss | Some lines did not produce a usable endpoint and latency value. | Open Parse Ledger and confirm the log format, timing fields, and Apache unit. |
SLA Findings is the action list. It can flag an overall P95 breach, P99 tail latency, high 5xx rate, slow request count, ignored lines, and the hottest endpoint. A passing SLA posture means no configured gate was exceeded in the parsed sample; it does not prove the service met its full service-level objective across all traffic.
Small samples make percentiles jump. With only a handful of rows, one slow request can dominate P95 or P99. Use the result as a triage signal, then compare with a longer window before changing autoscaling, routing, cache rules, or timeout settings.
Technical Details:
Access-log latency depends on which clock wrote the timing field. NGINX commonly records total request time separately from upstream response time. Apache can log elapsed request time in seconds, milliseconds, or microseconds depending on the configured directive. Application Load Balancer rows split request processing, target processing, and response processing into separate second-based fields.
Before percentiles are meaningful, timing values must be normalized to one unit and grouped against a stable endpoint key. Milliseconds are a practical common unit because they preserve the detail in NGINX and ALB second fields while matching many operational SLOs and alert thresholds. Endpoint grouping removes query strings and can replace numeric IDs or long hexadecimal path segments so one route does not scatter across many low-count rows.
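As a rough sketch of the endpoint grouping step, the function below strips the query string and replaces ID-like path segments. The `:id` and `:hash` placeholders come from this page; the function name and the 16-character threshold for a "long" hexadecimal segment are assumptions, since the tool's exact cutoff is not stated here.

```python
import re

NUMERIC_SEGMENT = re.compile(r"^\d+$")
# Assumption: "long" means 16 or more hex characters.
HEX_SEGMENT = re.compile(r"^[0-9a-fA-F]{16,}$")


def endpoint_key(method, target, group_ids=True):
    """Build a stable 'METHOD /path' key from a request method and target."""
    path = target.split("?", 1)[0]  # drop the query string
    segments = path.split("/")
    if group_ids:
        normalized = []
        for segment in segments:
            if NUMERIC_SEGMENT.match(segment):
                normalized.append(":id")
            elif HEX_SEGMENT.match(segment):
                normalized.append(":hash")
            else:
                normalized.append(segment)
        segments = normalized
    return f"{method.upper()} {'/'.join(segments)}"
```

With grouping on, `endpoint_key("get", "/api/orders/42?expand=items")` and `endpoint_key("GET", "/api/orders/977")` produce the same key, so both requests count toward one route's percentiles.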
Parser Core
| Log format | Timing fields used | Boundary to remember |
|---|---|---|
| NGINX key-value timing | `urt` or `upstream_response_time` for backend timing, `uht` as a backend fallback, and `rt` or `request_time` for edge timing. | Values are read as seconds and converted to milliseconds. When multiple upstream values appear, the largest usable value is used. |
| Apache combined plus latency | Keyed duration fields such as `duration`, `latency`, `request_time`, `rt`, or `time_taken`, otherwise the last numeric timing token. | The Apache latency unit can force seconds, milliseconds, or microseconds; auto mode infers from suffixes and value shape. |
| AWS ALB access log | `target_processing_time` for backend timing and request plus target plus response processing time for edge timing. | Negative processing values are treated as unusable timing, so malformed or undispatched requests may be ignored for latency math. |
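The NGINX row of the table can be illustrated with a small Python sketch. It assumes a hypothetical `key=value` log line layout and is not the tool's parser; the function names, the regular expression, and the separator handling for multiple upstream values are all choices made for this example.

```python
import math
import re

KEY_VALUE = re.compile(r'(\w+)=("[^"]*"|\S+)')
BACKEND_KEYS = ("urt", "upstream_response_time", "uht")
EDGE_KEYS = ("rt", "request_time")


def largest_seconds(raw):
    """Return the largest usable (finite, non-negative) number in a field, or None."""
    best = None
    # Multiple upstream values may be separated by commas or colons.
    for token in re.split(r"[,:\s]+", raw.strip('"')):
        try:
            value = float(token)
        except ValueError:
            continue  # skips "-" and empty tokens
        if math.isfinite(value) and value >= 0 and (best is None or value > best):
            best = value
    return best


def first_usable(fields, keys):
    """Try each key in order and return the first usable value in seconds."""
    for key in keys:
        if key in fields:
            value = largest_seconds(fields[key])
            if value is not None:
                return value
    return None


def nginx_latency_ms(line, basis="auto"):
    """Return latency in milliseconds for the chosen basis, or None if unusable."""
    fields = dict(KEY_VALUE.findall(line))
    backend = first_usable(fields, BACKEND_KEYS)
    edge = first_usable(fields, EDGE_KEYS)
    if basis == "backend":
        seconds = backend
    elif basis == "edge":
        seconds = edge
    else:  # auto: backend timing wins whenever it is found
        seconds = backend if backend is not None else edge
    return None if seconds is None else seconds * 1000.0


# Hypothetical sample line; real log_format definitions vary.
sample = 'method=GET path=/api/orders/42 status=200 rt=0.250 urt="0.012, 0.340"'
```

For the sample line, auto mode picks the larger upstream value (about 340 ms), while the edge basis reads `rt` (250 ms). A line with `urt=-` has no usable backend timing, so auto mode falls back to the edge value.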
Formula Core
Percentiles are calculated from sorted latency values with linear interpolation. This method can return a value between two observed requests, especially in small samples, which is why a displayed P95 may not exactly match one raw log row.
For six backend latencies sorted as 18, 142, 205, 388, 711, and 1590 ms, the zero-based P95 rank is 0.95 × (6 − 1) = 4.75. The result sits three quarters of the way between the fifth value (711) and the sixth (1590), so P95 is 711 + 0.75 × 879 ≈ 1,370 ms.
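The interpolation can be reproduced with a short Python function. This is a minimal sketch of linear interpolation over a zero-based rank, which matches the six-value example above; the function name is illustrative and the tool's own implementation may differ in edge cases.

```python
def percentile(values, p):
    """Return the p-th percentile (0 to 100) using linear interpolation."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = (p / 100) * (len(ordered) - 1)  # zero-based fractional rank
    lower = int(rank)                      # index of the value at or below the rank
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


latencies_ms = [18, 142, 205, 388, 711, 1590]
p95 = percentile(latencies_ms, 95)  # about 1370.25 ms
p50 = percentile(latencies_ms, 50)  # halfway between 205 and 388
```

Neither result equals an observed latency, which is the behavior described above for small samples.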
| Finding | Boundary | Action cue |
|---|---|---|
| Endpoint review | Endpoint P95 is greater than target, or endpoint 5xx rate is greater than target. | Prioritize that route before lower-latency passing endpoints. |
| Overall P95 latency | Overall P95 is greater than P95 target. | Start with the highest-P95 endpoint and compare timing basis. |
| Tail latency | Overall P99 is greater than the effective slow cutoff. | Inspect raw rows around tail endpoints. |
| Slow request count | Any parsed request latency is at or above the effective slow cutoff. | Use the endpoint table to find which route owns the slow rows. |
| 5xx rate | Overall 5xx rate is greater than 5xx target. | Inspect target health, upstream resets, and deploy timing. |
| Parse loss | At least one input line cannot produce both an endpoint and a non-negative latency value. | Confirm parser selection, timing field names, and Apache latency units before trusting the percentages. |
Status classes are grouped as 1xx through 5xx plus unknown. The 5xx rate uses parsed requests as the denominator, so ignored rows can change the meaning of the percentage. Parse loss is reported as a finding because the cleanest-looking percentiles can still be incomplete when the parser missed part of the sample.
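The denominator effect is easier to see in code. The sketch below is an assumption-laden illustration: the function names and the returned dictionary keys are hypothetical, and it only shows how the 5xx rate is computed over parsed requests while ignored lines are tracked separately.

```python
from collections import Counter


def status_class(status):
    """Map a status value to '1xx'..'5xx', or 'unknown' when it is missing or invalid."""
    try:
        code = int(status)
    except (TypeError, ValueError):
        return "unknown"
    if 100 <= code <= 599:
        return f"{code // 100}xx"
    return "unknown"


def summarize(total_lines, parsed_statuses):
    """Summarize status mix, 5xx rate, and parse coverage for one sample."""
    classes = Counter(status_class(s) for s in parsed_statuses)
    parsed = len(parsed_statuses)
    return {
        "classes": dict(classes),
        # Parsed requests are the denominator, not the total input lines.
        "rate_5xx_pct": classes["5xx"] * 100 / parsed if parsed else 0.0,
        "coverage_pct": parsed * 100 / total_lines if total_lines else 0.0,
        "ignored_lines": total_lines - parsed,
    }


# Ten input lines, of which eight parsed; one parsed row has no status.
summary = summarize(10, [200, 200, 502, 404, "-", 200, 200, 200])
```

Here one 5xx among eight parsed rows gives 12.5%, not the 10% a reader might expect from ten input lines, and the two ignored lines are exactly what the parse-loss finding is meant to surface.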
Privacy Notes:
Access log text and browsed files are parsed in the browser. The JSON export summarizes the input line count and character count rather than repeating the full pasted log in the input summary, but result tables and parse ledgers can still contain endpoint paths, status evidence, and timing values. Review exports before sharing them outside your team.
Advanced Tips:
- Use the same Log format, Latency basis, path grouping, P95 target, and 5xx target when comparing before-and-after windows. Changing any of those settings changes the denominator, endpoint grouping, or breach test.
- Force Apache latency unit when you know the access-log directive. Use microseconds for `%D`, milliseconds for `%{ms}T`, and seconds for `%T` so P95 and slow-request counts are not off by a factor of 1,000.
- Raise Parse ledger limit when a file has many ignored rows. The visible ledger is capped for readability, so increasing the limit helps inspect parser problems without changing the aggregate counts.
- Open Endpoint Percentile Profile after the table review. The chart compares P50, P95, and P99 for the highest-latency endpoints and marks the selected P95 target.
- Export SLA Findings and Endpoint Latency when handing off a summary. Use the JSON export for an audit envelope, but remember it intentionally excludes the raw pasted log text.
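The Apache unit tip above can be illustrated with a small conversion sketch. The directive-to-unit mapping follows the tip; the function and dictionary names are hypothetical and this is not how the tool's auto mode infers units.

```python
# Directive-to-unit mapping as described in the tip above.
DIRECTIVE_UNIT = {
    "%D": "microseconds",
    "%{ms}T": "milliseconds",
    "%T": "seconds",
}


def apache_latency_ms(raw_value, unit):
    """Convert a raw Apache duration token to milliseconds for a forced unit."""
    value = float(raw_value)
    if unit == "seconds":
        return value * 1000
    if unit == "milliseconds":
        return value
    if unit == "microseconds":
        return value / 1000
    raise ValueError(f"unknown unit: {unit}")


# A %D token of 250000 is 250 ms; reading it as milliseconds would be 1,000 times too large.
correct_ms = apache_latency_ms("250000", DIRECTIVE_UNIT["%D"])
misread_ms = apache_latency_ms("250000", "milliseconds")
```

Forcing the unit removes that ambiguity, which matters most when the values are round numbers that could plausibly be read in more than one unit.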
Worked Examples:
These examples use the shipped labels so the visible warning or table row is easy to recognize.
Backend latency breach
Six NGINX rows with upstream times from about 18 to 1590 ms can produce a sample P95 above a 300 ms target. SLA Findings should name the hottest endpoint before lower-volume routes, and Endpoint Percentile Profile should show whether P95 or only P99 is carrying the tail.
5xx-heavy route
A POST /api/payments group with one 502 in a small sample may exceed a 1% 5xx target even when most request timings look acceptable. Treat the 5xx rate finding as a failure signal, then check deploy timing, target health, and upstream resets before treating the route as a latency-only problem.
Parser mismatch
Apache rows pasted while Log format is set to NGINX can show ignored lines and missing timing. Switch to Apache combined plus latency, set Apache latency unit when the log directive is known, and confirm Parse Ledger reports parsed timing instead of missing fields.
FAQ:
Should I use backend or edge latency?
Use backend or target time when you want to focus on upstream service behavior. Use edge or total time when proxy, load-balancer, upload, or broader request-path time is part of the question.
Why did P95 look high in a tiny sample?
Small samples make percentile values sensitive to one slow row. Use a larger window before changing capacity, cache, routing, or timeout settings.
Why were some log lines ignored?
Ignored rows did not produce a usable endpoint and latency value. Check the log format, timing field names, and Apache latency unit, then use Parse Ledger to see the specific parser note.
What does Group IDs in paths change?
It replaces numeric path segments with :id and long hexadecimal segments with :hash. This keeps similar routes together for endpoint percentile calculations.
Glossary:
- P50
- The median parsed latency for an endpoint or sample.
- P95
- The latency value near the 95th percentile, often used as a tail-latency target.
- P99
- The latency value near the 99th percentile, useful for spotting rare slow requests.
- 5xx rate
- The share of parsed requests with server-side or load-balancer failure status.
- Timing basis
- The chosen clock used for latency math, such as backend, target, edge, or total request time.
- SLA finding
- A displayed action item triggered by the selected latency target, slow cutoff, 5xx target, slow-request count, or parse-loss check.
References:
- Configuring Logging, NGINX.
- mod_log_config, Apache HTTP Server.
- Access logs for your Application Load Balancer, Amazon Web Services.
- RFC 9110: HTTP Semantics, IETF.
- How to configure detailed Apache logging, Simplified Guide.