Avro Schema Compatibility Validator
Compare released and proposed Avro schemas by compatibility mode, with field-level blockers, alias/default checks, and registry-ready notes.
Introduction:
Avro schema compatibility is about whether existing data and clients can keep working after an event contract changes. Avro data is written with one schema and read with another, so the question is not simply whether the proposed schema is valid JSON. The important question is whether a reader can resolve the writer's records, fields, names, unions, enums, fixed values, and logical annotations without losing the meaning consumers depend on.
This matters most in event streams, data lakes, and schema registries where many producers and consumers move at different speeds. A producer might add a field before every consumer is deployed. A consumer might replay old Kafka messages with a newer reader schema. A data platform might compare the latest proposed contract against several historical versions because long-lived topics can be replayed years after the first version was registered.
The familiar compatibility names describe upgrade promises. Backward compatibility means a new reader can read data written with an older schema. Forward compatibility means an older reader can read data written with the new schema. Full compatibility means both directions must hold. Transitive modes expand the comparison from the latest released version to every released version in the pasted history.
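The mode names above can be read as a recipe for which writer/reader pairs to compare. The following is a minimal sketch of that mapping, not the validator's actual code; the function name `comparison_pairs` and the tuple layout are assumptions made for illustration.

```python
from typing import List, Tuple

BASE_MODES = {"NONE", "BACKWARD", "FORWARD", "FULL"}

def comparison_pairs(mode: str, released: List[dict], proposed: dict) -> List[Tuple[str, dict, dict]]:
    """Return (direction, writer, reader) triples for a compatibility mode.

    `released` is ordered oldest to newest. Hypothetical helper for illustration.
    """
    base, sep, _ = mode.upper().partition("_TRANSITIVE")
    if base not in BASE_MODES:
        raise ValueError(f"unknown compatibility mode: {mode}")
    if base == "NONE" or not released:
        return []
    # Transitive modes look at every released version, others only at the latest.
    history = released if sep else released[-1:]
    pairs = []
    for old in history:
        if base in ("BACKWARD", "FULL"):
            pairs.append(("backward", old, proposed))  # old writer, new reader
        if base in ("FORWARD", "FULL"):
            pairs.append(("forward", proposed, old))   # new writer, old reader
    return pairs
```

With two released versions, `BACKWARD` yields one pair and `FULL_TRANSITIVE` yields four, which is why transitive modes can surface findings that latest-only modes never examine.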
Small Avro edits can have large effects. Adding a new reader field without a default breaks backward reads because old data has no value for that field. Removing an enum symbol can break any old event that still contains that symbol. Changing a fixed size breaks resolution even when the name is unchanged, because fixed types must match on both name and size. Logical types are also easy to underestimate: a timestamp stored as a long can carry different semantics depending on its annotation, and decimal precision or scale changes can alter what numbers mean.
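The missing-default case is the easiest of these to check mechanically. The sketch below assumes record schemas held as plain Python dicts and ignores aliases and nested records; `reader_fields_without_default` and the `Customer` schemas are hypothetical names used only for illustration.

```python
def reader_fields_without_default(writer: dict, reader: dict) -> list:
    """List reader fields that the writer never wrote and that have no default."""
    writer_names = {f["name"] for f in writer.get("fields", [])}
    return [
        f["name"]
        for f in reader.get("fields", [])
        # Test for the key, not the value: a default of null (None) is still a default.
        if f["name"] not in writer_names and "default" not in f
    ]

released = {"type": "record", "name": "Customer",
            "fields": [{"name": "id", "type": "string"}]}
risky = {"type": "record", "name": "Customer",
         "fields": [{"name": "id", "type": "string"},
                    {"name": "region", "type": "string"}]}
safe = {"type": "record", "name": "Customer",
        "fields": [{"name": "id", "type": "string"},
                   {"name": "region", "type": ["null", "string"], "default": None}]}

print(reader_fields_without_default(released, risky))  # ['region']
print(reader_fields_without_default(released, safe))   # []
```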
Aliases are Avro's controlled escape hatch for renames. They can let a reader map a released name to a new name, but they are not a general substitute for consumer testing. A compatibility pass means the schema resolution rules line up for the selected mode; it does not prove that every downstream database, generated class, analytics query, or business process will interpret the new contract correctly.
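As a rough illustration of how a reader alias maps a released field name to a renamed field, here is a small sketch. It assumes field definitions as plain dicts and covers only field names, not named types; `resolve_writer_name` and its `use_aliases` flag are hypothetical and only mirror the idea of switching between alias-aware and exact-name matching.

```python
from typing import Optional

def resolve_writer_name(reader_field: dict, writer_names: set,
                        use_aliases: bool = True) -> Optional[str]:
    """Return the writer field name that feeds this reader field, or None."""
    if reader_field["name"] in writer_names:
        return reader_field["name"]
    if use_aliases:
        # A reader alias names what the field used to be called in released data.
        for alias in reader_field.get("aliases", []):
            if alias in writer_names:
                return alias
    return None

renamed = {"name": "email_address", "type": "string", "aliases": ["email"]}
print(resolve_writer_name(renamed, {"id", "email"}))                     # email
print(resolve_writer_name(renamed, {"id", "email"}, use_aliases=False))  # None
```

A `None` result means the reader field is treated as new, so it then needs a default to stay backward compatible.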
The safest release review treats syntax, compatibility direction, history depth, defaults, aliases, logical types, and registry policy as separate questions. A proposed schema can parse cleanly and still be risky for replay, and a schema can be technically compatible while still requiring a coordinated producer or consumer rollout.
How to Use This Tool:
Paste the released history and the proposed Avro schema, then read the ledger as a release review rather than a generic JSON syntax check.
- Enter the Schema Registry subject so the verdict, generated request, and exported evidence match the subject you intend to update.
- Choose the Compatibility mode. Use BACKWARD for new readers against latest old data, FORWARD for old readers against new data, FULL for both directions, and the transitive variants when every pasted released version matters.
- Set Name and alias matching. Keep reader aliases on when a proposed schema intentionally renames fields or named types with Avro aliases. Use exact names when aliases should not be accepted during review.
- Set Logical type policy. Use strict review when decimal precision, decimal scale, timestamps, dates, UUIDs, and similar annotations are contract-sensitive.
- Paste Released schema history from oldest to newest. The field accepts a JSON array of schemas or separate schema documents split by a line containing only three hyphens.
- Paste the Proposed schema. Browse or drop AVSC, JSON, or TXT files when the source is already on disk, then use Format only after parsing is clean.
- Use Compatibility Ledger for field-level findings, Rule Summary for policy-level decisions, Compatibility Risk Mix for severity counts, and Registry Request when you want a starter command for a registry compatibility check.
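The two accepted input shapes for released history (a JSON array, or documents separated by a line containing only three hyphens) can be handled with a short parser. This is a sketch under the assumption that a top-level JSON array always means a list of versions; `parse_history` is a hypothetical name, not the tool's own function.

```python
import json

def parse_history(text: str) -> list:
    """Parse released schemas, oldest first, from either accepted input shape."""
    try:
        parsed = json.loads(text)
        # A top-level array is treated as the version list; anything else is one schema.
        return parsed if isinstance(parsed, list) else [parsed]
    except json.JSONDecodeError:
        pass  # Not a single JSON document, so try the hyphen-separated form.
    documents, current = [], []
    for line in text.splitlines():
        if line.strip() == "---":
            documents.append("\n".join(current))
            current = []
        else:
            current.append(line)
    documents.append("\n".join(current))
    # Blank chunks are skipped; a malformed chunk still raises JSONDecodeError.
    return [json.loads(doc) for doc in documents if doc.strip()]

separated = '{"type": "record", "name": "V1", "fields": []}\n---\n{"type": "record", "name": "V2", "fields": []}'
print([s["name"] for s in parse_history(separated)])  # ['V1', 'V2']
```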
Interpreting Results:
A passing result means the parsed schemas produced no critical or high-severity blocker for the selected policy. It does not mean every consumer is deployed, every code generator agrees, or every old message has been replay-tested. Treat the ledger paths as where to inspect the schema, and treat the release action as the concrete fix or rollout question.
| Severity | Typical meaning | Release response |
|---|---|---|
| Critical | Schema resolution can fail for a required direction, such as missing defaults, incompatible type changes, enum symbol loss, fixed-size drift, or unresolved references. | Do not register under the selected policy until the schema or rollout plan changes. |
| High | A logical type or default problem is serious enough to block a strict release review. | Keep the contract stable or deliberately move to a new subject or coordinated migration. |
| Medium | A fallback may exist, but the semantics need review, such as an enum default absorbing removed symbols. | Confirm the fallback is acceptable to consumers before registering. |
| Low | A non-blocking drift or advisory point deserves test coverage or reviewer attention. | Record the decision and keep contract tests around the changed path. |
| Info | The change is usually compatible, such as a promoted numeric primitive or ignored writer field. | Keep it intentional and verify downstream expectations. |
The main false-confidence trap is selecting a non-transitive mode for a topic that still replays older versions. If replay from early history matters, compare against every released schema or confirm that the registry policy enforces a transitive mode.
Technical Details:
Avro resolution compares a writer schema with a reader schema. The writer schema describes how the data was encoded. The reader schema describes what the consumer expects while decoding. Compatibility checks are therefore directional: swapping the writer and reader roles can change the answer even though the same two schema documents are involved.
Transformation Core
| Stage | Rule | Evidence in results |
|---|---|---|
| Parse history | Read released schemas from a JSON array or hyphen-separated schema documents. | Rule Summary reports released version count and parse status. |
| Validate structure | Check named references, record names, field arrays, duplicate field names, enum symbols, fixed sizes, union shape, and default/type alignment. | Compatibility Ledger includes parse and structure findings with schema paths. |
| Select comparison set | Use the latest released schema for non-transitive modes, or all released schemas for transitive modes. | Rule Summary shows latest-only or all-history scope. |
| Run direction checks | Backward compares released writer to proposed reader; forward compares proposed writer to released reader. | Ledger rows carry Backward, Forward, Structure, or Parse direction labels. |
| Sort and cap findings | Critical and high findings sort first, then lower-severity findings up to the selected render limit. | Input warnings note hidden findings when the rendered row limit is reached. |
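The last stage in the table, together with the pass rule from Interpreting Results, can be sketched as follows. The finding dict shape, the names `sort_and_cap` and `passes`, and the choice to apply the render limit to all severities are assumptions; the actual tool may cap differently.

```python
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}

def sort_and_cap(findings: list, limit: int):
    """Sort by severity (stable within a severity) and return (visible, hidden_count)."""
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])
    visible = ordered[:limit]
    return visible, len(ordered) - len(visible)

def passes(findings: list) -> bool:
    """A result passes when no critical or high-severity finding exists."""
    return not any(f["severity"] in ("critical", "high") for f in findings)

findings = [
    {"severity": "info", "path": "$.fields[0]"},
    {"severity": "critical", "path": "$.fields[2]"},
    {"severity": "medium", "path": "$.fields[1].type"},
]
visible, hidden = sort_and_cap(findings, limit=2)
print([f["severity"] for f in visible], hidden)  # ['critical', 'medium'] 1
print(passes(findings))                          # False
```

The verdict is computed from the full list rather than the visible rows, so a cap on rendering cannot hide a blocker from the pass/fail decision.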
Primitive compatibility follows Avro promotion rules where applicable. An int writer can be read as long, float, or double; long can promote to float or double; float can promote to double; and string and bytes can be promoted to each other. Other primitive changes are treated as incompatible unless a compatible union branch preserves the old shape.
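The promotion rules listed above fit in a small lookup table. This sketch covers bare primitive names only, without unions or logical types; `PROMOTIONS` and `primitive_readable` are illustrative names.

```python
# Writer type -> reader types it can be promoted to, per the rules listed above.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}

def primitive_readable(writer_type: str, reader_type: str) -> bool:
    """True if data written as writer_type can be read as reader_type."""
    return writer_type == reader_type or reader_type in PROMOTIONS.get(writer_type, set())

print(primitive_readable("int", "long"))  # True
print(primitive_readable("long", "int"))  # False
```

The asymmetry between the two calls is the directional point in practice: widening `int` to `long` is safe for backward reads but not for forward reads.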
| Schema element | Compatible pattern | Risky or blocking pattern |
|---|---|---|
| Reader field | New reader field has an Avro default, often with null first for nullable unions. | New reader field has no default, so old data cannot supply it. |
| Record, enum, fixed name | Name stays stable or the reader supplies an alias for the released name. | Name changes without a matching reader alias. |
| Enum | Reader accepts every writer symbol or supplies a valid enum default. | Writer symbol is missing from reader enum with no usable default. |
| Fixed | Name and byte width stay stable. | Fixed size changes, even if the name stays similar. |
| Logical type | Decimal precision and scale, timestamps, dates, and UUID semantics stay intentional under the selected policy. | Strict logical-type review finds drift that consumers may interpret differently. |
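The enum row of the table can be illustrated with a short check. It assumes enum schemas as plain dicts with `symbols` and an optional `default`, and it borrows the Critical and Medium labels from the severity table; `check_enum` is a hypothetical helper, not the tool's implementation.

```python
def check_enum(writer: dict, reader: dict):
    """Return (severity, missing_symbols) for a writer enum read by a reader enum."""
    missing = [s for s in writer["symbols"] if s not in reader["symbols"]]
    if not missing:
        return "ok", []
    default = reader.get("default")
    if default is not None and default in reader["symbols"]:
        # The reader's enum default absorbs unknown symbols, but the meaning needs review.
        return "medium", missing
    return "critical", missing

old = {"type": "enum", "name": "Tier", "symbols": ["FREE", "PRO", "LEGACY"]}
new = {"type": "enum", "name": "Tier", "symbols": ["FREE", "PRO"]}
new_with_default = {"type": "enum", "name": "Tier",
                    "symbols": ["UNKNOWN", "FREE", "PRO"], "default": "UNKNOWN"}

print(check_enum(old, new))               # ('critical', ['LEGACY'])
print(check_enum(old, new_with_default))  # ('medium', ['LEGACY'])
```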
Default values are checked against the first branch of a union because Avro uses that branch when resolving missing reader fields. This is why the common nullable pattern is ["null", "type"] with a null default, rather than placing null later in the union and hoping the reader will infer intent.
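A minimal sketch of the first-branch rule described above: it checks the JSON shape of a field default against the first union branch (or the field type itself when there is no union). It does not resolve named references or validate nested contents, and both function names are hypothetical.

```python
def _matches(schema, value) -> bool:
    """Shallow check that a JSON default has the right shape for a schema."""
    name = schema["type"] if isinstance(schema, dict) else schema
    if name == "null":
        return value is None
    if name == "boolean":
        return isinstance(value, bool)
    if name in ("int", "long"):
        return isinstance(value, int) and not isinstance(value, bool)
    if name in ("float", "double"):
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if name in ("string", "bytes", "enum", "fixed"):
        return isinstance(value, str)
    if name in ("record", "map"):
        return isinstance(value, dict)
    if name == "array":
        return isinstance(value, list)
    return True  # Named references are not resolved in this sketch.

def default_matches_first_branch(field: dict) -> bool:
    """True if the field has no default or its default fits the first union branch."""
    if "default" not in field:
        return True
    ftype = field["type"]
    first = ftype[0] if isinstance(ftype, list) else ftype
    return _matches(first, field["default"])

print(default_matches_first_branch({"name": "region", "type": ["null", "string"], "default": None}))  # True
print(default_matches_first_branch({"name": "region", "type": ["string", "null"], "default": None}))  # False
```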
Limitations and Privacy Notes:
The validator performs local schema parsing and rule checks for the visible report. It is not a substitute for the exact registry, serializer, language binding, or consumer test suite used in production.
- Generated registry commands are starter text and should be adapted to the real registry URL, authentication, subject naming, and release process.
- Compatibility rules can vary by registry product, format, and policy. Use the registry's own compatibility endpoint before registering a production schema.
- Schema text stays in the current browser page for the local analysis and exports.
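For the registry check mentioned above, a starter request might be built as in the sketch below. It assumes a Confluent-style Schema Registry REST API (a POST to `/compatibility/subjects/{subject}/versions/{version}` with the schema passed as a JSON-encoded string); the base URL, subject, and the helper name `build_compatibility_request` are placeholders, and other registry products may use different endpoints. The request is only constructed here, not sent, and authentication is left out.

```python
import json
from urllib.parse import quote
from urllib.request import Request

def build_compatibility_request(base_url: str, subject: str, proposed: dict,
                                version: str = "latest") -> Request:
    """Build (but do not send) a compatibility-check request for a proposed schema."""
    url = (f"{base_url.rstrip('/')}/compatibility/subjects/"
           f"{quote(subject, safe='')}/versions/{version}")
    # The registry expects the schema itself as a string inside the JSON body.
    body = json.dumps({"schema": json.dumps(proposed)}).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/vnd.schemaregistry.v1+json"})

proposed_schema = {"type": "record", "name": "Customer",
                   "fields": [{"name": "id", "type": "string"}]}
# Placeholder URL and subject; replace with the real registry details.
req = build_compatibility_request("http://localhost:8081", "customer-value", proposed_schema)
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` once the URL and credentials are real.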
Worked Examples:
Safe optional field. A proposed customer event adds region as a string with a default value. In backward mode, old events can be read by the proposed reader because the missing field has a value to use during decoding.
Breaking field reshape. A released email string becomes a nested record without a union or staged migration. The ledger reports a critical type change because consumers expecting the old string cannot resolve the new record shape in the selected direction.
Transitive surprise. The latest released schema is compatible, but an older version in the history writes an enum symbol that the proposed reader neither lists nor covers with an enum default. The issue appears only after switching from BACKWARD to BACKWARD_TRANSITIVE.
FAQ:
Is BACKWARD always the safest mode?
Backward compatibility is common for Kafka-style replay because new consumers can read old data. It is not automatically enough when old consumers must read new data or when every historical version must remain readable.
Why do defaults matter so much?
A default gives a reader a value when old data does not contain a newly added field. Without it, the reader has no value to resolve for that field.
Can aliases make any rename safe?
No. Aliases can map released names to reader names for schema resolution, but consumers, generated code, downstream storage, and business semantics still need review.
Why does NONE still show findings?
NONE turns off compatibility direction checks, but syntax and Avro structure checks still help catch malformed schema documents before they reach a registry or build step.
Glossary:
- Writer schema: The schema used when data was encoded or will be produced.
- Reader schema: The schema used by a consumer while decoding data.
- Backward compatibility: New readers can read data written with older schemas.
- Forward compatibility: Older readers can read data written with the new schema.
- Transitive compatibility: The new schema is compared against all released versions in scope, not just the latest one.