HA Pair Role Consistency Checker

High-availability pairs rely on a clear ownership contract: one node serves the workload, one peer stays ready, and both peers agree on the service address, health state, synchronization state, and election evidence. When those facts drift, redundancy can become misleading. A standby that is down, a second active owner, a stale sync state, or a mismatched virtual IP can make a planned failover unsafe even when the service still appears reachable.

Active/passive systems are common around firewalls, routers, load balancers, databases, and clustered services that move a virtual IP (VIP) or service role between peers. The basic reading is simple: the active owner should be up, the standby should be up, the peers should describe the same service, and the priority or preemption convention should match the owner you expect. The hard part is seeing all of that evidence together before maintenance, failover testing, or incident review.

Figure: HA pair role evidence diagram with active and standby nodes, shared VIP, heartbeat, sync state, and priority checks.

A role snapshot is still only evidence. It cannot prove that fencing is correct, quorum is healthy, routing has converged everywhere, or the replacement node will serve valid application traffic after a real failure. It helps catch obvious drift before that deeper validation begins.

Technical Details:

Active/passive role consistency has three main checks. The ownership check asks whether the expected number of nodes is both marked active and up. The protection check asks whether enough standby nodes are also up. The evidence check compares supporting fields such as VIP, heartbeat, sync status, priority, preempt setting, and duplicate node rows so the role count is not read in isolation.

Priority evidence is platform-dependent. Some systems elect the higher numeric priority as the preferred owner, while others use a lower value or ignore priority when roles are pinned by policy. Preemption can also change whether the preferred node should reclaim ownership after recovery. For that reason, priority is best treated as an ownership clue that must match the selected convention, not as a universal rule across every HA product.
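The convention choice can be illustrated with a minimal sketch. The function name `priority_mismatch` and the rule strings `"higher"`, `"lower"`, and `"ignore"` are hypothetical labels chosen for this example; they are not taken from the checker itself.

```python
from typing import Optional, Sequence

def priority_mismatch(active_priority: Optional[float],
                      peer_priorities: Sequence[float],
                      rule: str = "higher") -> bool:
    """Return True when some peer outranks the active owner under `rule`.

    `rule` is a hypothetical setting name: "higher" means the higher number
    should be active, "lower" means the lower number should be active, and
    "ignore" disables the check (for example, for manually pinned roles).
    """
    if rule == "ignore" or active_priority is None:
        return False
    if rule == "higher":
        return any(p > active_priority for p in peer_priorities)
    if rule == "lower":
        return any(p < active_priority for p in peer_priorities)
    raise ValueError(f"unknown priority rule: {rule!r}")

# Active node at 100, standby peer at 150.
print(priority_mismatch(100, [150], "higher"))  # True
print(priority_mismatch(100, [150], "lower"))   # False
print(priority_mismatch(100, [150], "ignore"))  # False
```

The same pair of numbers produces a warning under one convention and none under the other, which is why the convention has to be selected before the priority evidence means anything.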

Rule Core:

The checker turns each row into normalized role, state, sync, heartbeat, preempt, and priority evidence, then evaluates the rows by service group.

HA pair role consistency rule core
Rule	Clean evidence	Finding trigger
Active ownership	`Active-up` equals `Expected active-up`.	A different count is critical, including duplicate active ownership and missing active ownership.
Standby readiness	`Standby-up` is at least `Minimum standby-up`.	A lower count is critical because failover protection is not ready.
Active state	The active owner is also `up`.	An active node marked down, failed, unavailable, or similar is critical.
VIP consistency	Rows in one group point to the same service or VIP value.	Multiple service or VIP values create a warning to split or correct the evidence.
Priority ownership	The active node matches the selected higher-priority, lower-priority, or ignored convention.	A peer that outranks the active owner creates a warning when priority checking is enabled.
Sync freshness	Text says synced, or numeric lag is less than or equal to `Sync stale after`.	Text such as stale, lagging, out-of-sync, or a numeric lag greater than the threshold creates a warning.
Heartbeat path	Heartbeat reads up, on, enabled, healthy, or similar.	Heartbeat down or failed is critical because peer coordination evidence is broken.
Duplicate evidence	Each node appears once within a service group.	Duplicate node rows create a warning because counts may no longer represent the actual pair.
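
The count-based rules in the table above can be sketched roughly as follows. This is an illustration under stated assumptions, not the checker's implementation: rows are assumed to be already normalized into dictionaries with `group`, `node`, `role` (`"active"` or `"standby"`), and `state` (`"up"` or `"down"`), and the function name `evaluate_groups` is hypothetical.

```python
from collections import defaultdict

def evaluate_groups(rows, expected_active=1, min_standby=1):
    """Count active-up and standby-up per group and list critical findings."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["group"]].append(row)

    results = {}
    for name, members in groups.items():
        active_up = sum(1 for r in members
                        if r["role"] == "active" and r["state"] == "up")
        standby_up = sum(1 for r in members
                         if r["role"] == "standby" and r["state"] == "up")
        findings = []
        # Active ownership: the count must equal the expected value exactly.
        if active_up != expected_active:
            findings.append(("critical", "Active ownership",
                             f"{active_up}/{expected_active}"))
        # Standby readiness: the count must reach the minimum.
        if standby_up < min_standby:
            findings.append(("critical", "Standby readiness",
                             f"{standby_up}/{min_standby}"))
        # Active state: a node holding the active role must also be up.
        if any(r["role"] == "active" and r["state"] != "up" for r in members):
            findings.append(("critical", "Active state", "active node not up"))
        results[name] = {"active_up": active_up,
                         "standby_up": standby_up,
                         "findings": findings}
    return results

sample = [
    {"group": "edge-fw", "node": "edge-a", "role": "active", "state": "up"},
    {"group": "edge-fw", "node": "edge-b", "role": "active", "state": "up"},
    {"group": "edge-fw", "node": "edge-c", "role": "standby", "state": "down"},
]
result = evaluate_groups(sample)
print(result["edge-fw"]["active_up"], result["edge-fw"]["standby_up"])  # 2 0
```

The warning-level rules (VIP consistency, priority, sync, duplicates) are left out to keep the sketch short.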

Accepted Evidence:

Accepted HA role row evidence and normalization rules
Column	Recognized examples	How the result uses it
`group`	`group`, `cluster`, `pair`, `service`, `ha_group`	Rows with the same group are checked together as one service group.
`role`	`active`, `primary`, `master`, `standby`, `passive`, `backup`	Builds active-up and standby-up counts. Unknown or blank roles create warnings.
`state`	`up`, `ready`, `healthy`, `down`, `failed`, `degraded`, `maintenance`	Combines with role to decide whether the active or standby evidence is usable.
`sync`	`synced`, `stale`, `out-of-sync`, `lagging`, or numeric seconds	Marks stale state when text reports drift or numeric lag exceeds the chosen seconds threshold.
`heartbeat`	`up`, `on`, `enabled`, `down`, `failed`	Flags broken peer communication separately from service role and node state.
`priority`	Any numeric value, with commas ignored	Feeds the `Priority Role Ladder` and ownership mismatch check.
`preempt`	`on`, `off`, `enabled`, `disabled`, and similar on/off values	Mixed values create an informational finding because failback behavior may differ between peers.

The analysis is deterministic from the pasted rows and settings. It does not contact devices, query a cluster manager, test fencing, inspect quorum, or verify live traffic. Missing sync or heartbeat values are marked as unknown rather than treated as proof of failure, so add those columns when the operational decision depends on that evidence.
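
A minimal sketch of how role and sync values might be normalized, including the unknown case described above. The names `normalize_role` and `classify_sync` are hypothetical. Two details are assumptions made for this example rather than documented behavior: `primary` and `master` fold into active while `passive` and `backup` fold into standby, and a unit suffix such as `s` or `seconds` is accepted after the number.

```python
import re

# Assumed role folding; the word lists come from the table above.
ROLE_WORDS = {
    "active": "active", "primary": "active", "master": "active",
    "standby": "standby", "passive": "standby", "backup": "standby",
}
STALE_WORDS = {"stale", "out-of-sync", "lagging"}

def normalize_role(value):
    """Map a role word to 'active', 'standby', or 'unknown'."""
    return ROLE_WORDS.get((value or "").strip().lower(), "unknown")

def classify_sync(value, stale_after_seconds):
    """Return 'synced', 'stale', or 'unknown' for one sync cell."""
    text = (value or "").strip().lower()
    if not text:
        return "unknown"  # missing evidence is not proof of failure
    if text == "synced":
        return "synced"
    if text in STALE_WORDS:
        return "stale"
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(?:s|sec|secs|seconds)?", text)
    if match:
        lag = float(match.group(1))
        # Lag equal to the threshold still counts as fresh.
        return "stale" if lag > stale_after_seconds else "synced"
    return "unknown"

print(normalize_role("Primary"))   # active
print(classify_sync("45", 30))     # stale
print(classify_sync("30", 30))     # synced
print(classify_sync("", 30))       # unknown
```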

Everyday Use & Decision Guide:

Start with a header row when possible. Use one row per node and include group, node, role, priority, vip, state, sync, heartbeat, and preempt when those facts are available. Without a header, the legacy order is node, role, priority, VIP, and state, which is fine for a quick check but leaves sync and heartbeat evidence empty.
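
The header and legacy-order behavior can be sketched as below. Several details are assumptions for illustration only: comma-separated input, a first line treated as a header when it contains both `node` and `role`, and columns beyond the five legacy fields being dropped when no header is present. The function name `parse_rows` is hypothetical.

```python
import csv

LEGACY_ORDER = ["node", "role", "priority", "vip", "state"]

def parse_rows(text):
    """Parse pasted rows into dicts, using a header row when one is present."""
    lines = [line for line in text.splitlines() if line.strip()]
    table = list(csv.reader(lines))
    if not table:
        return []
    first = [cell.strip().lower() for cell in table[0]]
    if "node" in first and "role" in first:
        header, body = first, table[1:]
    else:
        # No header detected: fall back to the legacy column order.
        header, body = LEGACY_ORDER, table
    return [dict(zip(header, (cell.strip() for cell in row))) for row in body]

legacy = parse_rows("fw-a,active,200,10.0.0.1,up\nfw-b,standby,100,10.0.0.1,up")
print(legacy[0]["role"], legacy[1]["state"])  # active up

with_header = parse_rows("group,node,role,state,sync\ncorp-fw,fw-a,active,up,synced")
print(with_header[0]["sync"])  # synced
```

In the legacy case the resulting rows carry no `sync` or `heartbeat` keys, which matches the note that those fields stay empty without a header.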

For ordinary active/passive pairs, keep Expected active-up at 1 and Minimum standby-up at 1. Raise the standby value only when the same service group really has extra warm spares. Set Priority rule to the convention your platform uses, or choose Ignore priority ownership when roles are intentionally pinned and priority should not create a warning.

Use Pair Health Table first. It compresses each group into node count, active-up count, standby-up count, sync summary, and verdict.
Use Failover Findings when the verdict is Critical, Warning, or Review. The Evidence and Next step columns name the specific reason.
Use Node Role Ledger to find the row carrying stale sync, missing role, down heartbeat, duplicate-node evidence, or a priority flag.
Use Priority Role Ladder only when numeric priorities are present. It is helpful for spotting a standby that outranks the active owner.
Use JSON when another review needs the exact parameters, summary counts, pair health rows, node ledger, findings, and chart rows.

The best fit is a maintenance or incident-prep snapshot where you already trust the inventory source enough to paste it, but you want a fast consistency pass before touching the service. On its own, it is a poor fit for proving that a live failover is safe. A clean read still needs device logs, cluster status, fencing or quorum evidence, and a service-level readiness check.

Treat critical findings as stop-and-verify cues before planned failover. A warning may still be serious when it explains your exact risk, such as stale sync before a stateful handoff or priority drift on a preemptive pair.

Step-by-Step Guide:

Build the row set first, then tune the active, standby, priority, and sync assumptions before using the verdict.

Paste rows into HA role rows. If the alert area shows the message "Paste at least two HA node rows to check a pair.", add both peers or load Clean sample to inspect the expected shape.
Use a header row when your columns are not in the legacy order. If a row is missing node or role, the alert area or Failover Findings names the row that needs repair.
Set Expected active-up. The Pair Health Table Active-up column should match that count, such as 1/1 for a normal pair.
Set Minimum standby-up. Check the Standby-up column before assuming the service has a usable peer.
Choose Priority rule. Open Priority Role Ladder when priorities are numeric, then confirm the active owner is the node that should win under the selected convention.
Set Sync stale after to the lag threshold you consider acceptable. If Failover Findings reports state synchronization drift, confirm the source system or wait until the lag falls below that threshold.
Read the summary title and badges, then open Failover Findings for every non-clean group. Critical heartbeat, active ownership, and standby readiness findings should be cleared before a planned handoff.
Use Node Role Ledger and JSON only after the group-level verdict matches the inventory you intended to check.

Interpreting Results:

The first read is the group verdict in Pair Health Table. Clean means no active, standby, VIP, sync, heartbeat, duplicate, or priority inconsistency was found in the pasted evidence. Critical means the pair does not meet an active ownership, standby readiness, active state, or heartbeat requirement under the current settings. Warning means the row set deserves review before you rely on the pair.
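
One way to picture the verdict is as a roll-up of the worst finding in a group. The sketch below assumes that rule; the document does not state how the checker combines findings, and the name `group_verdict` is hypothetical. Informational findings and any Review verdict are left out.

```python
SEVERITY_RANK = {"warning": 1, "critical": 2}

def group_verdict(findings):
    """Return 'Clean', 'Warning', or 'Critical' from (severity, check) tuples."""
    worst = max((SEVERITY_RANK[severity] for severity, _ in findings), default=0)
    if worst == 2:
        return "Critical"
    if worst == 1:
        return "Warning"
    return "Clean"

print(group_verdict([]))                                   # Clean
print(group_verdict([("warning", "Sync freshness")]))      # Warning
print(group_verdict([("warning", "Sync freshness"),
                     ("critical", "Standby readiness")]))  # Critical
```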

Do not treat a clean verdict as proof that failover will succeed. It only says the pasted evidence is internally consistent. The corrective check is to compare the result with a live cluster or device source before maintenance, especially when the decision depends on fencing, quorum, traffic convergence, or data freshness.

How to interpret HA pair role consistency outputs
Output cue	Trust this for	Do not overread
`Active-up`	Whether the expected number of active nodes is up in that group.	It does not prove the active node owns traffic at every client path.
`Standby-up`	Whether enough standby peers appear ready by role and state.	It does not prove the standby has current state or will pass application readiness checks.
`Sync`	Whether reported lag is stale under the selected seconds threshold.	Unknown sync means no usable sync field was provided, not that sync is healthy.
`Priority ownership`	Whether the active owner matches the selected priority convention.	It does not account for platform-specific override, hold, maintenance, or manual pinning rules.
`Heartbeat`	Whether the pasted peer-link evidence says the HA control path is up.	It does not test the network or replace cluster-manager status.

A split-brain shape is the most urgent false-confidence case. If two active-up nodes appear where Expected active-up is 1, the service may still be responding somewhere, but ownership evidence is unsafe until the duplicate active state is resolved or explained by a supported multi-active design.

Worked Examples:

Two groups with one degraded pair:

The default rows contain corp-fw with fw-a active, fw-b standby, both up, both synced, and the same VIP. Pair Health Table reads Active-up as 1/1, Standby-up as 1/1, and the group can stay clean. The same input also contains branch-fw with stale sync on the active node and a standby peer marked down. That group shows Standby-up as 0/1, a critical verdict, and Failover Findings points to standby readiness plus state synchronization.

Split-brain sample before a firewall change:

Loading Split-brain sample creates edge-fw with edge-a and edge-b both active and up against the same VIP, while edge-c is standby but down. With Expected active-up set to 1 and Minimum standby-up set to 1, Pair Health Table shows Active-up as 2/1 and Standby-up as 0/1. Failover Findings should include critical active ownership and standby readiness rows before anyone treats the pair as protected.

Priority mismatch with otherwise healthy nodes:

A pair with `db-a,active,100,10.10.10.8,up,synced,up,on` and `db-b,standby,150,10.10.10.8,up,synced,up,on` can have clean role and state counts. If Priority rule is Higher priority should be active, the active owner has lower priority than its standby peer, so Failover Findings reports Priority ownership. If that pair is intentionally pinned, switch to Ignore priority ownership or document the platform override before using the warning in a runbook.

Bad row format during a quick paste:

If only one peer row is pasted, the validation area reports that at least two node rows are needed for a useful HA pair check. If a row has a node name but no role, the warning names the row with the missing role. The fix is to add the missing peer or role value, then recheck Pair Health Table before copying the result into an incident note.
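
The two input problems described above can be sketched as a small validation pass. The first message is the alert text quoted earlier on this page; the per-row messages and the function name `validate_rows` are hypothetical, and rows are assumed to be dictionaries keyed by column name.

```python
def validate_rows(rows):
    """Return a list of human-readable problems found in the parsed rows."""
    errors = []
    if len(rows) < 2:
        errors.append("Paste at least two HA node rows to check a pair.")
    for index, row in enumerate(rows, start=1):
        # Treat a missing key, None, and a blank string the same way.
        if not (row.get("node") or "").strip():
            errors.append(f"Row {index}: missing node")
        if not (row.get("role") or "").strip():
            errors.append(f"Row {index}: missing role")
    return errors

print(validate_rows([{"node": "fw-a", "role": "active"}]))
# ['Paste at least two HA node rows to check a pair.']

print(validate_rows([{"node": "fw-a", "role": "active"},
                     {"node": "fw-b", "role": ""}]))
# ['Row 2: missing role']
```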

FAQ:

Can this prove an HA pair is safe to fail over?

No. It checks consistency in the rows you paste. A real failover decision still needs live device or cluster status, fencing or quorum evidence when applicable, route or VIP confirmation, and service readiness checks.

What should I paste if my export has different column names?

Include a header row. Aliases such as cluster, host, ha_role, device_priority, floating_ip, config_sync, and peer_link are recognized for the main fields.
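
As a rough illustration, alias handling amounts to mapping each header cell onto a canonical field name before the rows are read. The pairings below are inferred from the alias names and are assumptions for this sketch; the page lists the aliases but does not state which field each one maps to. The function name `canonical_header` is hypothetical.

```python
# Assumed alias-to-field pairings, inferred from the alias names.
ALIASES = {
    "cluster": "group",
    "host": "node",
    "ha_role": "role",
    "device_priority": "priority",
    "floating_ip": "vip",
    "config_sync": "sync",
    "peer_link": "heartbeat",
}

def canonical_header(cells):
    """Lowercase each header cell and replace known aliases."""
    cleaned = [cell.strip().lower() for cell in cells]
    return [ALIASES.get(name, name) for name in cleaned]

print(canonical_header(["Cluster", "Host", "HA_Role", "State", "Peer_Link"]))
# ['group', 'node', 'role', 'state', 'heartbeat']
```

Unrecognized header names pass through unchanged, so a column the sketch does not know about is kept rather than silently renamed.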

Why is priority flagged when the active and standby counts look right?

The selected Priority rule says which numeric priority should own the service. If a standby outranks the active node under that rule, the checker reports priority ownership drift even when both nodes are up.

Why does sync show unknown?

The row did not include usable sync evidence, or the value was not recognized as synced, stale, lagging, out-of-sync, or numeric seconds. Add the sync column when state freshness matters for the review.

Does the checker send the HA rows to network devices?

No. The rows are evaluated in the page from the pasted text and settings. The result can still contain sensitive infrastructure names, VIPs, and priorities, so treat copied rows or JSON as operational data.

Glossary:

HA pair: A high-availability pair of peers intended to keep one service available when one node fails or is taken down.
Active-up: A node that is both assigned the active role and marked up in the pasted evidence.
Standby-up: A standby peer that is also marked up and therefore looks available for failover by role and state.
VIP: Virtual IP address, the service address that usually moves with, or is owned by, the active peer in the pair.
Split-brain: A failure shape where more than one peer appears active for a service that should have a single owner.
Preempt: A failback behavior where a preferred node may reclaim active ownership after it returns.
Sync lag: The reported freshness delay between peers, read as stale when it exceeds the selected threshold.
