ADR-0005: Confidence Ownership — Planner + Reconciler, Not Capabilities¶

Status: Accepted Date: 2026-06-22 Deciders: Paxman core team Supersedes: — Superseded by: —

Context and Problem Statement¶

Confidence is a critical part of Paxman's output. The question is: which subsystem(s) may assign confidence to a candidate or resolved value?

The earlier drafts of PACKAGE_STRUCTURE_draft.md (now superseded by PACKAGE_STRUCTURE.md) had contradictory statements: - §3 guardrails: "Planner owns confidence assignment (not capabilities)." - §4 guardrails: "Confidence is exclusively owned by the Planner and Reconciler." - §6 guardrails: "Reconciler is the only layer that assigns final confidence and final truth."

This ADR resolves the contradiction.

Decision Drivers¶

No confidence inflation — capabilities should not assign confidence because each capability would calibrate differently, leading to incomparable scores.
Globally comparable confidence — confidence must be a single scale that the Reconciler can compare across capabilities.
Determinism — confidence assignment must be deterministic given the same inputs.
Replayability — the same ConfidenceBand for the same field across replays.

Considered Options¶

Option A — Reconciler is the only confidence assigner (chosen)¶

The Reconciler is the sole subsystem that assigns a confidence score (float) and band to a candidate or resolved value. The Planner may emit a target_confidence per field (read from the field's confidence_threshold) but does not score candidates. Capabilities return candidates without confidence.

Pros:

Strongest structural guarantee: one place assigns confidence.
Globally comparable: all confidence comes from the same calibration.
Easy to test: confidence logic is centralized.
Replay is straightforward: the same candidates produce the same confidence.

Cons:

The Reconciler has more responsibility.
Capabilities can't provide a "self-assessed confidence" hint.

Option B — Capabilities return confidence; Reconciler picks the best¶

Each capability returns a confidence score; the Reconciler picks the highest-confidence candidate.

Pros:

Capabilities "know" how good their output is.

Cons:

Confidence inflation: each capability calibrates differently, leading to incomparable scores.
Hard to test deterministically across capabilities.
Replay is harder: capabilities may be non-deterministic.

Option C — Planner assigns an initial confidence; Reconciler finalizes¶

The Planner emits a target_confidence and the Reconciler re-scores candidates based on evidence quality.

Pros:

Combines the "field has a target" with "evidence is graded" intuitions.

Cons:

Two places assign confidence, complicating the rule.
The "initial" vs "final" distinction is not load-bearing.

Decision Outcome¶

Chosen option: A (Reconciler is the only confidence assigner). The Reconciler is the sole subsystem that assigns a confidence float and band. The Planner may emit a target_confidence per field, but this is a threshold, not a score. Capabilities return candidates without confidence.

Concretely:

CapabilityResult has no confidence field. Candidates are returned with value, evidence_refs, and diagnostics only.
FieldPlan has a target_confidence field (read from the field's confidence_threshold) that the Executor uses for early stop.
The Reconciler assigns the final confidence (float) and confidence_band on FieldResult.

This ADR supersedes the contradictory language in the earlier PACKAGE_STRUCTURE_draft.md §3, §4, §6. The new PACKAGE_STRUCTURE.md §7.4, §7.5, §11.3, §11.5 are consistent with this decision.

Consequences¶

Positive¶

Strongest structural guarantee: one place assigns confidence.
Globally comparable confidence.
Confidence calibration is centralized.
Replay is straightforward.

Negative¶

Capabilities can't "self-assess" their output.
The Reconciler must reason about evidence quality, not just trust capability output.

Neutral¶

The Planner still emits a target_confidence per field (read-only from the contract).

Validation¶

Static check: no module outside paxman.reconciler imports the ConfidenceBand constructor for assignment.
CapabilityResult schema has no confidence field. Capabilities that try to set confidence fail type-check.
Property test: same candidates + same contract + same evidence → same confidence (deterministic).
Property test: monotonic — strictly better evidence never lowers confidence.

References¶

PRD.md §7.3, §7.7
ARCHITECTURE.md §4.2, §4.5, §8
PACKAGE_STRUCTURE.md §4, §5, §7, §11.3, §11.5