Skip to content

Planning

Status: V1 Audience: Paxman users inspecting the plan; Paxman contributors extending the planner. Related docs: GLOSSARY.md §Plan, §Planner, ARCHITECTURE.md §4 Planner Subsystem, ADR-0001 (field-centric planning), ADR-0002 (rule-based V1 planner), docs/specs/capability-cost-model.md (scoring formula).

The planner is the deterministic, rule-based brain of Paxman. It synthesizes a per-field ExecutionPlan from a CanonicalContract, an InputProfile, a Budget, a Policy, and the capability registry. The planner is a pure function (per ADR-0002): the same inputs always produce the same plan, byte-for-byte.

This document explains the planner's role, the heuristic chain, the scoring formula, the budget and policy gates, and the determinism guarantees.


1. What planning is — and isn't

The planner only synthesizes the plan. It does not run capabilities, does not consume budget, does not produce evidence, and does not assign confidence. The Executor's job is to run the plan; the Reconciler's job is to grade the candidates.

                  ┌────────────────┐
                  │  Canonical     │
                  │  Contract      │
                  └───────┬────────┘
                  ┌───────▼────────┐
                  │  InputProfile  │ ← make_profile(input_data)
                  └───────┬────────┘
                          │  + Budget, Policy, Registry
                  ┌────────────────┐
                  │   Planner      │  ← pure function
                  │   (rule-based) │
                  └───────┬────────┘
                  ┌────────────────┐
                  │ ExecutionPlan  │ ← one FieldPlan per required field
                  └────────────────┘

The plan is a declarative artifact: a list of capability chains with their step configs and target_confidence thresholds. The Executor interprets the plan; the planner never sees the Executor.


2. Field-centric planning

Per ADR-0001, Paxman is field-centric, not document-centric. Each required field gets its own FieldPlan:

@attrs.frozen(slots=True)
class FieldPlan:
    field_id: str                                # CanonicalField.id
    capability_chain: tuple[FieldPlanStep, ...]  # ordered capability chain
    target_confidence: float = 0.8               # threshold for SUCCESS
    fallback_policy: ResolutionPolicy = ResolutionPolicy()  # per-field
    early_stop: bool = True                      # short-circuit on threshold met

A FieldPlanStep carries:

  • capability_id — e.g. "regex_extraction".
  • capability_version — the resolved version (e.g. "1.0").
  • config — frozen per-step config (e.g. the regex pattern for regex_extraction). Wrapped in types.MappingProxyType to prevent post-construction mutation.
  • note — optional human-readable note.

The Executor walks the FieldPlan in step order, invokes the capability at each step, and collects candidates. If early_stop is True (the default) and a step's candidate meets target_confidence, the Executor stops — there is no reason to keep spending budget.

The ExecutionPlan carries:

  • contract_id — the canonical contract's id.
  • input_content_hash — the SHA-256 of the input (matches the InputProfile.content_hash).
  • fieldstuple[FieldPlan, ...] in declaration order (not dict-iteration order).
  • diagnostics — planner-emitted notes (e.g. budget exclusions).
  • planner_version — the planner version (e.g. "1").

Declaration order matters: the Executor walks fields in plan order, not in dict-iteration order. The plan encodes order explicitly.


3. The 7-step heuristic chain

For each required field, the planner walks this chain in order (per ARCHITECTURE.md §4.2 and the Oracle M7 clarification):

  1. Explicit evidence. A planner rule on the InputProfile that decides whether the input already contains the field's value (e.g. a KEY: VALUE line in a text/plain payload for a STRING field tagged header).
  2. Local deterministic extraction. regex_extraction, validation (and any other LOCAL_DETERMINISTIC-tier capabilities registered for this output_type).
  3. Structured lookup. lookup (V1: in-memory dict backend).
  4. Derived computation. Formula over resolved fields (V2).
  5. Local inference. inference with a local model.
  6. Remote inference. inference with a remote provider.
  7. UNRESOLVED. Terminal; no chain. The field's CandidateResult.status is UNRESOLVED.

Each step picks the highest-scoring capability for the field's output_type from the available registry, subject to:

  • Policy gates. Policy.allow_remote_inference=False drops step 6. Policy.allow_local_inference=False drops step 5.
  • Budget gates. A Budget.max_total_cost_usd < Decimal("0.001") drops both 5 and 6 (inference's minimum USD cost is 0.001).
  • Capability availability. A step with no registered capability for the field's output_type is dropped.
  • Step config. For regex_extraction, the planner must have a pattern (read from the field's constraints or a regex semantic tag). If no pattern is available, the step is skipped.

The chain stops at the first step that produces a non-empty candidate set. If all steps are exhausted without producing a candidate, the plan emits an UNRESOLVED CandidateResult for the field.


4. Scoring

The planner scores every (capability, field) pair and picks the best within each tier. The full scoring formula is in docs/specs/capability-cost-model.md §4. The V1 weights are:

Weight Value Purpose
TIER_WEIGHT 10000 Tier penalty: lower tier → lower score.
USD_WEIGHT 1000000 USD cost dominates.
MS_WEIGHT 1 Latency is the tie-breaker.
score = tier_rank * TIER_WEIGHT + (usd * USD_WEIGHT) + (ms * MS_WEIGHT)

where tier_rank is the integer value of the CapabilityTier enum (LOCAL_DETERMINISTIC=1, STRUCTURED_LOOKUP=2, LOCAL_INFERENCE=3, REMOTE_INFERENCE=4). lower is better. The V1 calibration makes USD dominate ms dominate tier — a slightly slower deterministic capability beats a much cheaper non-deterministic one.

The score is computed once per (capability, field) pair during planning; the planner stores the result in the FieldPlanStep for replay (replay rehydrates the plan, but does not re-plan; the recorded score is part of the evidence).


5. Budget and policy gates

The planner respects Budget and Policy constraints when synthesizing the plan. The gates are applied at plan time (not at execute time):

Gate Effect on the plan
Policy.allow_remote_inference=False Step 6 (remote inference) is dropped from every FieldPlan.
Policy.allow_local_inference=False Step 5 (local inference) is dropped from every FieldPlan.
Budget.max_total_cost_usd < Decimal("0.001") Steps 5 and 6 are dropped (inference's minimum USD cost is 0.001).
Policy.unresolved_acceptable=True The Executor's post-loop status is PARTIAL_SUCCESS instead of UNRESOLVED if any required field is unresolved.
Policy.currency_policy=ALLOW_FX (no fx_rate field) The Reconciler rejects cross-currency MONEY candidates.

The Executor also applies dynamic budget gates during execution (the BudgetTracker short-circuits when the running total exceeds max_total_cost_usd). The planner cannot predict this; it only encodes the expected plan based on the call-site constraints.


6. Effective policy

The planner and reconciler compute an effective policy = derive_effective_policy(call_site_policy, contract_policy):

  • call_site_policy — the Policy passed to paxman.normalize().
  • contract_policy — the ContractPolicy carried by the canonical contract (per-contract overrides).

The contract's ContractPolicy takes precedence over the call-site Policy for fields in the contract. The combined EffectivePolicy is passed to build_field_plan(...) and to the Reconciler.

This means a contract can declare a higher confidence_floor than the call-site default, and the planner will respect the contract's value for those fields. See paxman.planner.policies for the full logic.


7. Determinism

The planner is a pure function (per ADR-0002). It:

  • Does not read the clock.
  • Does not read random state.
  • Does not perform I/O.
  • Does not invoke capabilities.

The only source of non-determinism in the planner is the capability registry's iteration order. Mitigations:

Property tests in tests/property/test_planner_determinism.py verify byte-equal plans for 100 random (contract, input, policy) tuples.


8. Key types

Type Module Purpose
InputProfile paxman.planner.input_profile Lightweight metadata about the raw input.
FieldPlan paxman.planner.field_plan Per-field plan (capability chain + target confidence).
FieldPlanStep paxman.planner.field_plan A single capability invocation.
ExecutionPlan paxman.planner.field_plan The planner's full output (one FieldPlan per required field).
PlanDiagnostic paxman.planner.field_plan A planner-emitted note (e.g. budget exclusion).
EffectivePolicy paxman.planner.policies The combined call-site + contract policy for one run.

The full public surface is re-exported from paxman.planner.__init__. The Planner module is a private subsystem — it is not a public API; do not import from it directly in user code.


9. Inspection and debugging

The plan is serializable via the stable JSON encoder. To inspect a plan:

from paxman import planner, profile
import json

plan = planner.plan(canonical_contract, profile(input_data), budget, policy, registry)
print(json.dumps(plan.to_dict(), indent=2))

For unit tests, use paxman.testing.plans() (a Hypothesis strategy that generates random ExecutionPlan instances with stable replay_hash).


10. Common pitfalls

Pitfall Why it bites Fix
Registering a capability with a non-canonical version (e.g. "v1") The registry's tie-breaker falls back to insertion order. Use a real semver ("1.0", "1.0.1").
Setting Policy.allow_remote_inference=True with Budget.max_total_cost_usd = 0 The plan is empty (every step is budget-excluded). Either set a positive budget or disable remote inference.
Calling planner.plan() with a Policy that has no currency_policy The Reconciler defaults to STRICT_MATCH and rejects MONEY candidates. Set currency_policy explicitly.
Expecting the planner to short-circuit on a regex_extraction hit The planner does not invoke capabilities; the Executor does. The Executor short-circuits per the EarlyStop policy.
Mixing confidence_floor per-field with the call-site Policy.confidence_floor The contract-level value overrides; the call-site value is ignored. Document the precedence in the contract's ContractPolicy.

11. See also