Git Diff Risk Analyzer
Analyze git diff risk online from numstat or diffstat rows to flag large files, risky paths, deletion pressure, binary diffs, and review actions before merge.
Git diff risk is the review pressure hidden inside a set of changed files. A diff summary can show that one pull request changes five files and another changes fifty, but reviewers still need to know where the pressure sits: a database migration, authentication branch, payment gateway, generated lockfile, or deletion-heavy cleanup can all need different review attention before merge.
Line counts are useful because they make the review surface visible. They are also easy to overread. A large lockfile may mostly need a source-consistency check, while a small schema edit may need rollback planning and a domain owner. Deletions can be healthy cleanup, but a high deletion mix in migrations, permissions, or data paths should slow the review down until tests and recovery notes are clear.
A risk score is not a verdict on code quality. It is a way to decide which files deserve early attention, which review owners should be involved, and whether a pull request should be split before line-by-line review begins. The best use is a short triage pass before a reviewer opens the full diff, runs tests, or checks the deployment plan.
The most helpful output ties each changed path to its line delta, matched risk signals, severity level, and review note. That keeps review planning concrete instead of leaving the team with a raw diff summary and a guess about where the risky work might be.
Technical Details:
Git diff summaries compress changed files into counts and paths. --numstat is the most precise fit because each row carries additions, deletions, and a full pathname in a machine-friendly form. Binary files appear with dash counts, so they can be flagged without pretending that image, archive, or compiled artifacts have meaningful line totals.
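A minimal Python sketch of reading --numstat rows follows. It assumes whitespace-separated columns (Git emits tabs, but pasted text may not keep them) and uses a hypothetical FileDelta record. It illustrates the row shape and the binary dash case; it is not the page's own parser.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileDelta:
    """One changed path from a numstat summary (hypothetical record type)."""
    path: str
    additions: Optional[int]  # None when Git reports a binary file
    deletions: Optional[int]
    binary: bool


def parse_numstat_line(line: str) -> Optional[FileDelta]:
    """Parse 'added deleted path'; Git separates the columns with tabs."""
    stripped = line.strip()
    parts = stripped.split(maxsplit=2)  # keep spaces inside the path
    if len(parts) != 3:
        return None  # too few columns: not a numstat row
    added, deleted, path = parts
    if added == "-" and deleted == "-":
        # Git prints tab-separated dashes for binary files; requiring the
        # tabs avoids mistaking a removed patch line such as "- - item".
        if not stripped.startswith("-\t-\t"):
            return None
        return FileDelta(path, None, None, True)
    if not (added.isdecimal() and deleted.isdecimal()):
        return None  # e.g. a patch hunk line pasted by mistake
    return FileDelta(path, int(added), int(deleted), False)


def parse_numstat(text: str) -> List[FileDelta]:
    """Keep only the lines that parse as numstat rows."""
    rows = (parse_numstat_line(line) for line in text.splitlines())
    return [row for row in rows if row is not None]
```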
Diffstat rows are less exact but still useful. A path | total +--- row supplies the total changed line count and a visual mark pattern. Additions and deletions can be estimated from the relative count of plus and minus marks. If the marks are missing, the row still contributes a total, but deletion pressure becomes less informative.
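The estimate described above can be sketched as a proportional split of the total across the visible marks. The regular expression and the name parse_diffstat_line are assumptions for illustration. Because Git scales the bar to fit the available width, the split is only an estimate.

```python
import re
from typing import Optional, Tuple

# "path | total +++--"; the bar is optional and may be scaled by Git.
STAT_ROW = re.compile(
    r"^\s*(?P<path>\S.*?)\s+\|\s+(?P<total>\d+)\s*(?P<marks>[+-]*)\s*$"
)


def parse_diffstat_line(
    line: str,
) -> Optional[Tuple[str, int, Optional[int], Optional[int]]]:
    """Return (path, total, est_additions, est_deletions), or None."""
    match = STAT_ROW.match(line)
    if match is None:
        return None  # summary line, binary "Bin" row, or patch text
    path = match.group("path")
    total = int(match.group("total"))
    marks = match.group("marks")
    plus, minus = marks.count("+"), marks.count("-")
    if plus + minus == 0:
        # No bar: the total still counts, but the split is unknown.
        return path, total, None, None
    # Split the total in proportion to the visible marks (an estimate).
    additions = round(total * plus / (plus + minus))
    return path, total, additions, total - additions
```

For src/payment/retry.js | 50 +++++---, five plus marks and three minus marks split the 50 lines into an estimated 31 additions and 19 deletions.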
The scoring model combines changed-line volume with path and file-type signals. The wide-change threshold supplies the scale for the size term, while risk path terms, deletion pressure, generated-file terms, and binary rows add review weight. The final score is capped at 100 so very large files do not hide the other signals.
In the per-file score, riskTerms is the number of configured risk terms found in the path, and deletionHit, generatedHit, and binaryHit are indicator values of either 1 or 0. Deletion pressure is deletions / (additions + deletions) * 100, and the threshold comparison is inclusive: a file exactly at the configured percentage is treated as deletion-heavy.
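The page does not state its weights, so the sketch below uses placeholder coefficients (SIZE_WEIGHT, RISK_TERM_WEIGHT, and the others are hypothetical) to show how an additive score of this shape could be assembled and capped at 100. It also assumes that the size term saturates once churn reaches the wide-change threshold. Treat every number here as illustrative.

```python
from typing import Sequence

# Placeholder weights for illustration only; not the page's coefficients.
SIZE_WEIGHT = 40.0
RISK_TERM_WEIGHT = 20.0
DELETION_WEIGHT = 15.0
GENERATED_WEIGHT = 10.0
BINARY_WEIGHT = 15.0


def deletion_pressure(additions: int, deletions: int) -> float:
    """Share of changed lines that are deletions, as a percentage."""
    churn = additions + deletions
    return 0.0 if churn == 0 else deletions * 100 / churn


def file_score(
    path: str,
    additions: int,
    deletions: int,
    *,
    wide_threshold: int,
    risk_terms: Sequence[str],
    generated_terms: Sequence[str],
    deletion_pct: float,
    binary: bool = False,
) -> float:
    lower = path.lower()
    churn = additions + deletions
    # Assumed: the size term saturates once churn reaches the threshold.
    size_term = SIZE_WEIGHT * min(churn / wide_threshold, 1.0)
    risk_hits = sum(1 for term in risk_terms if term.lower() in lower)
    # Inclusive comparison: exactly deletion_pct counts as deletion-heavy.
    deletion_hit = 1 if deletion_pressure(additions, deletions) >= deletion_pct else 0
    generated_hit = 1 if any(t.lower() in lower for t in generated_terms) else 0
    binary_hit = 1 if binary else 0
    raw = (
        size_term
        + RISK_TERM_WEIGHT * risk_hits
        + DELETION_WEIGHT * deletion_hit
        + GENERATED_WEIGHT * generated_hit
        + BINARY_WEIGHT * binary_hit
    )
    return min(raw, 100.0)  # final score capped at 100
```

With these placeholder weights, a threshold of 12, a 35% deletion setting, and migration as a risk term, a migration file with 10 additions and 10 deletions scores 40 + 20 + 15 = 75.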
| Level | Boundary | Review meaning |
|---|---|---|
| Critical | score >= 70 | Focused owner review should happen before merge. |
| High | 45 <= score < 70 | Route the file to a reviewer who knows the touched domain. |
| Medium | 22 <= score < 45 | Use the normal pull-request checklist and verify the named signal. |
| Low | score < 22 | No strong line-count or path signal was found for that file. |
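The band boundaries in the severity table translate directly into a lookup. This small function is a hypothetical helper, not the page's code; it treats each lower bound as inclusive, so a score of exactly 45 is High.

```python
def risk_level(score: float) -> str:
    """Map a 0-100 file score to the documented severity bands."""
    # Lower bounds are inclusive: 70, 45, and 22 start each band.
    if score >= 70:
        return "Critical"
    if score >= 45:
        return "High"
    if score >= 22:
        return "Medium"
    return "Low"
```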
Separate queue rules catch pull-request level concerns that a file-by-file score can miss. A branch can have no single extreme file and still be broad enough to need splitting or extra reviewer coordination.
| Signal | Rule | What to check next |
|---|---|---|
| Risk path | Changed path contains one or more configured risk terms, such as migration, schema, auth, payment, infra, or lock. | Ask for a reviewer who owns that subsystem or release path. |
| Large file | additions + deletions >= wide-change threshold. | Confirm the file-level change is understandable and not mixing unrelated work. |
| Wide diff | changed file count >= wide-change threshold. | Consider splitting the pull request or adding a subsystem checklist. |
| Deletion pressure | File or overall deletion mix is at least the configured deletion percentage. | Confirm rollback, migration, data, and test coverage before approval. |
| Generated or lockfile | Changed path contains a generated-file term such as package-lock, yarn.lock, dist/, generated, or min.js. | Check that source and generated artifacts were produced by the same change. |
| Binary diff | Numstat reports dash counts instead of added and deleted lines. | Review the artifact source, provenance, and replacement reason directly. |
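The pull-request level rules can be sketched as a short list of checks over all rows. This sketch assumes a wide-change threshold of 12 (the value used in the broad-cleanup example), a 35% deletion setting, and the generated-file terms listed in the table. Row, review_queue, and the message strings are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence

DEFAULT_GENERATED = ("package-lock", "yarn.lock", "dist/", "generated", "min.js")


@dataclass
class Row:
    path: str
    additions: int = 0
    deletions: int = 0
    binary: bool = False  # numstat dash counts


def review_queue(
    rows: Sequence[Row],
    wide_threshold: int = 12,
    deletion_pct: float = 35.0,
    generated_terms: Sequence[str] = DEFAULT_GENERATED,
) -> List[str]:
    """Pull-request level actions that a per-file score can miss."""
    queue: List[str] = []
    if len(rows) >= wide_threshold:
        queue.append("Wide diff: consider splitting or a subsystem checklist")
    additions = sum(r.additions for r in rows)
    deletions = sum(r.deletions for r in rows)
    churn = additions + deletions
    # Overall deletion mix across every file, with an inclusive threshold.
    if churn and deletions * 100 / churn >= deletion_pct:
        queue.append("Deletion pressure: confirm rollback, data, and test coverage")
    if any(t.lower() in r.path.lower() for r in rows for t in generated_terms):
        queue.append("Generated or lockfile: check source and artifact match")
    if any(r.binary for r in rows):
        queue.append("Binary diff: review artifact source and provenance")
    return queue
```

Fourteen small files with the threshold at 12 produce only the wide-diff action, because neither the overall deletion mix nor any path crosses the other rules.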
Everyday Use & Decision Guide:
Start with one branch or pull request and use git diff --numstat base...head when you can. That format gives additions and deletions directly, which makes deletion pressure and the churn chart more reliable. Paste git diff --stat output only when a numeric summary is all you have, and treat its addition and deletion split as an estimate.
Set Branch or PR to the review label, then tune Risk path terms to your repository. The defaults are intentionally conservative for deployment-sensitive changes: migrations, schemas, authentication, payments, infrastructure, and lockfiles. Add local terms such as billing, permissions, terraform, or secrets if those paths deserve early owner review in your team.
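Comma-separated terms and case-insensitive path matching can be sketched as plain substring checks. The helper names and EXAMPLE_RISK_TERMS (taken from the example terms in the signal table) are assumptions, not the page's configuration.

```python
from typing import List

# Example terms from the signal table; the page's exact defaults may differ.
EXAMPLE_RISK_TERMS = "migration, schema, auth, payment, infra, lock"


def parse_terms(raw: str) -> List[str]:
    """Split a comma-separated setting, dropping blanks and case."""
    return [term.strip().lower() for term in raw.split(",") if term.strip()]


def matched_terms(path: str, terms: List[str]) -> List[str]:
    """Case-insensitive substring match of each term against the path."""
    lower = path.lower()
    return [term for term in terms if term in lower]
```

Substring matching is broad: auth also matches author.md, and lock matches clock.ts as well as package-lock.json, so narrower terms may reduce noise.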
- Use Wide-change threshold as a first-pass review-size gate. A low value catches broad pull requests sooner; a higher value reduces noise for repositories with many small generated files.
- Use Deletion pressure to surface patches where removals dominate the line mix. The default 35% is a useful prompt for rollback and data-review questions.
- Keep Generated or lockfile terms aligned with your build system so generated artifacts are checked for consistency without being mistaken for hand-written code.
- Open File Risk Ledger for path-by-path evidence, then use Review Queue for the short list of actions that should happen before merge.
- Use Churn Weight Map to spot the ten largest changed files by additions and deletions. The chart is a visual triage aid, not a substitute for reading the diff.
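A top-ten churn ranking like the one behind Churn Weight Map could be sketched as ordering rows by additions plus deletions. The tuple layout and the name churn_top are assumptions; binary rows without line counts would need to be filtered out first.

```python
import heapq
from typing import List, Sequence, Tuple

Row = Tuple[str, int, int]  # (path, additions, deletions)


def churn_top(rows: Sequence[Row], n: int = 10) -> List[Row]:
    """The n files with the largest additions + deletions, biggest first."""
    return heapq.nlargest(n, rows, key=lambda row: row[1] + row[2])
```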
This works best before a human review starts or when a release manager needs to decide whether a pull request can stay as one change. It is a poor fit for deciding whether the code is correct, whether tests are enough, or whether a security fix is complete. Those calls still require the full diff, test results, owners, and release context.
If the summary says owner review, start with the critical and high rows before reading the rest of the branch. If it says standard review, still scan generated files, binary diffs, and deletion mix before treating the change as routine.
Step-by-Step Guide:
Use one review target at a time so file counts, deletion mix, and queue rows stay attached to the right branch.
- Enter the branch, pull request, or change identifier in Branch or PR. The summary line and JSON context will use that label.
- Paste git diff --numstat rows into Git diff stat, drop a .diff, .patch, or .txt file onto the textarea, or use Browse diff/TXT. The source badge should show the loaded character count.
- Adjust Wide-change threshold to match the review size you want to flag. This threshold affects broad-diff queueing and the per-file size score.
- Update Risk path terms for sensitive paths in the repository. Terms are comma-separated and matched against changed paths without case sensitivity.
- Open Advanced only when the defaults need tuning. Deletion pressure controls deletion-heavy flags, and Generated or lockfile terms controls generated artifact detection.
- Read the summary first. A nonzero high risk count means at least one file is critical or high, and the deletion-mix badge gives the overall removal share.
- Move from File Risk Ledger to Review Queue. Use the ledger for evidence and the queue for actions such as owner review, splitting the pull request, checking rollback coverage, or verifying generated artifacts.
If the page reports Paste git diff --stat or --numstat output, replace the source with summary rows rather than a full patch body. If a browsed file is rejected for size, create a smaller text file containing only the stat or numstat output.
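A rough way to decide whether pasted text contains any stat rows is to test each line against numstat and diffstat shapes and warn when none match. This is a heuristic sketch with hypothetical patterns and helper names: a patch context line that happens to look like path | 3 can still slip through.

```python
import re
from typing import List, Optional

# Numstat: counts then path; binary rows use tab-separated dashes.
NUMSTAT = re.compile(r"^(\d+)\s+(\d+)\s+\S|^-\t-\t\S")
# Diffstat: "path | total" with an optional bar, or a "Bin" binary row.
DIFFSTAT = re.compile(r"^\s*\S.*\s\|\s+(?:\d+\s*[+-]*\s*$|Bin\b)")

WARNING = "Paste git diff --stat or --numstat output"


def is_stat_row(line: str) -> bool:
    if NUMSTAT.match(line):
        return True
    if line.startswith(("+", "-", "@@", "diff ", "index ")):
        return False  # unified-diff lines from a full patch body
    return DIFFSTAT.match(line) is not None


def input_warning(text: str) -> Optional[str]:
    """Return the warning text when no stat-style rows are found."""
    rows: List[str] = [line for line in text.splitlines() if is_stat_row(line)]
    return None if rows else WARNING
```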
Interpreting Results:
The primary summary counts critical and high files together as high risk. That count is the first triage cue, not the final review decision. A critical row means the score reached at least 70; a high row means it reached at least 45. Both deserve early owner attention.
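The two summary figures can be computed from per-file results as sketched below. The (score, additions, deletions) tuple layout and the name summary_counts are assumptions; the 45 cutoff is the High band's lower bound, so Critical and High rows are counted together.

```python
from typing import Sequence, Tuple

FileResult = Tuple[float, int, int]  # (score, additions, deletions)


def summary_counts(results: Sequence[FileResult]) -> Tuple[int, float]:
    """High-risk count (critical + high) and overall deletion mix in percent."""
    high_risk = sum(1 for score, _, _ in results if score >= 45)
    additions = sum(a for _, a, _ in results)
    deletions = sum(d for _, _, d in results)
    churn = additions + deletions
    mix = deletions * 100 / churn if churn else 0.0
    return high_risk, mix
```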
A low score does not prove the change is safe. It only means the available stat rows did not hit the configured size, path, deletion, generated, or binary signals strongly enough. Read the actual diff for intent, tests, behavior, security, and rollout risk before approving.
- The Risk signals column explains why a row was raised. Check those terms before assuming the score is about line count alone.
- The Review note column gives the next action for that file, from normal checklist review to focused owner review.
- Review Queue is pull-request level. It can list broad-diff, deletion-pressure, and generated-file actions that no single file row explains on its own.
- Churn Weight Map shows the ten largest changed files by total churn. Small but sensitive files can still matter even when they do not dominate the chart.
Worked Examples:
Payment retry branch with a lockfile. A branch labeled feature/payment-retry uses the default risk terms and pastes five numstat rows: payment source files, a database migration, a package lockfile, and a documentation file. The payment files and migration match configured risk terms, and the lockfile is both large and generated. File Risk Ledger ranks the payment files, migration, and lockfile as critical or high, while Review Queue calls for owner review and a generated-artifact consistency check.
Broad cleanup with many small files. A refactor changes fourteen files, each with fewer than twelve changed lines, and the Wide-change threshold remains at 12. Several file scores may stay medium or low, but Review Queue still lists Wide diff because the changed file count reaches the threshold. The practical response is to ask whether the pull request can be split by subsystem or reviewed with a short checklist for each area.
Full patch pasted by mistake. A reviewer pastes unified patch hunks that contain @@ markers and code lines but no stat rows. The summary changes to Check input and the warning says Paste git diff --stat or --numstat output. Running git diff --numstat base...head and pasting those rows gives the parser additions, deletions, and paths, so the ledger, queue, chart, and JSON can populate again.
FAQ:
Which Git output should I paste?
Use git diff --numstat when possible because it gives additions, deletions, and paths directly. git diff --stat rows with pipe bars are accepted too, but their addition and deletion split is estimated from the visible plus and minus marks.
Does a high score mean the change is bad?
No. A high or critical score means the diff has review pressure, such as large churn, sensitive path terms, deletion pressure, generated artifacts, or a binary row. The next step is focused review, not automatic rejection.
Why did a lockfile become high risk?
Lockfiles often have large line counts and match the default generated-file terms. That can raise the score and add a generated-file queue row so reviewers confirm the source dependency change and generated artifact belong together.
Are pasted diffs uploaded?
The analysis runs in the browser. Browsed text files are read locally, and the pasted diff summary is used to build the ledger, queue, chart, and JSON without sending the source text to a server for scoring.
Why do I only see a check-input warning?
The parser needs stat-style rows. Paste lines such as 42 8 src/payment/retry.js or src/payment/retry.js | 50 +++++---, not a full patch body. For file upload, keep the diff, patch, or text file under the browser-side size limit shown by the page.
Glossary:
- Diffstat: A compact Git summary that shows each changed path with a total and a plus/minus bar.
- Numstat: A machine-friendly Git summary with additions, deletions, and pathname columns.
- Churn: The total changed-line volume for a file, calculated as additions plus deletions.
- Deletion pressure: The share of changed lines that are deletions, used to flag cleanup, removal, migration, and rollback questions.
- Risk path term: A configured word or phrase that raises review attention when it appears in a changed path.
- Review Queue: The action list for broad diffs, high-risk files, deletion pressure, and generated-file checks.