Paxman Package Architecture (V1)
Status: Stable v1 (promoted from draft; this is the source of truth for module-level design).
Audience: Engineers implementing or extending Paxman.
Related docs: ARCHITECTURE.md, GLOSSARY.md
This is the stable mental model for the codebase, derived from ARCHITECTURE.md. It reflects the V1 product definition: a contract-driven, field-centric, deterministic normalization engine.
It supersedes the earlier PACKAGE_STRUCTURE_draft.md. The numbering is preserved where possible to keep references stable.
1. Top-Level Structure (Conceptual Layers)
Paxman is split into seven strict subsystem zones:
paxman/
│
├── contract/ # canonicalization + validation of caller contracts
├── planner/ # field-centric plan synthesis (deterministic, rule-based)
├── capabilities/ # atomic, reusable operations (V1 surface)
├── executor/ # runs the plan, collects evidence, stops early
├── reconciler/ # merges candidates, assigns confidence, resolves truth
├── artifact/ # final normalized output + evidence + replay data
│
└── api/ # public surface (the ONLY thing users touch)
Everything else is derived from these boundaries. The api/ layer is a hard boundary enforced by tests; the rest are internal.
Core principle: Contracts define what output must look like. Planner defines what to do. Executor defines how to run it. Reconciler defines what is ultimately true. Artifact defines what was produced. API defines what the user can ask for.
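2. Module Dependency DAG (Hard Rule)
The dependency graph is strictly layered. Ordered from lowest to highest, the layers are `contract/`, `capabilities/`, `planner/`, `executor/`, `reconciler/`, `artifact/`, `api/`. A module may import from its own layer, from any lower layer, and from the cross-cutting modules at the bottom; it may never import from a higher layer. The forbidden-import list below states the rule layer by layer.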
               ┌──────────┐
               │   api    │
               └────┬─────┘
                    │
    ┌───────────────┼───────────────┐
    ▼               ▼               ▼
┌────────┐     ┌──────────┐     ┌────────────┐
│contract│ ──▶ │ planner  │ ──▶ │capabilities│
└────────┘     └────┬─────┘     └──────┬─────┘
                    │                  │
                    ▼                  ▼
               ┌──────────┐       ┌──────────┐
               │ executor │ ────▶ │reconciler│
               └────┬─────┘       └────┬─────┘
                    │                  │
                    └─────────┬────────┘
                              ▼
                         ┌─────────┐
                         │ artifact│
                         └─────────┘
┌────────────────────────────────────────┐
│ Cross-cutting (imported by all):       │
│   - errors      - types     - logging  │
│   - versioning  - protocols            │
└────────────────────────────────────────┘
Forbidden imports (enforced by CI / ruff / import-linter):
- `contract/` may NOT import from `planner/`, `executor/`, `reconciler/`, `artifact/`, `capabilities/`, or `api/`.
- `planner/` may NOT import from `executor/`, `reconciler/`, `artifact/`, or `api/`.
- `capabilities/` may NOT import from `planner/`, `executor/`, `reconciler/`, `artifact/`, or `api/`.
- `executor/` may NOT import from `reconciler/`, `artifact/`, or `api/`.
- `reconciler/` may NOT import from `artifact/` or `api/`.
- `artifact/` may NOT import from `api/`.
- `api/` may import from any internal layer.
- No subsystem may import from `capabilities/v1/*` directly; the registry is the only entry point.
Tooling: this DAG is enforced by import-linter configured in pyproject.toml.
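The forbidden-import list above can be read as a lookup table. The following is a minimal sketch of that reading, not Paxman code: `FORBIDDEN` and `is_import_allowed` are hypothetical names, and real enforcement is left to import-linter.

```python
# Hypothetical illustration of the forbidden-import rules. Paxman itself
# relies on import-linter; this table only mirrors the list above.
FORBIDDEN: dict[str, frozenset[str]] = {
    "contract": frozenset(
        {"planner", "executor", "reconciler", "artifact", "capabilities", "api"}
    ),
    "planner": frozenset({"executor", "reconciler", "artifact", "api"}),
    "capabilities": frozenset(
        {"planner", "executor", "reconciler", "artifact", "api"}
    ),
    "executor": frozenset({"reconciler", "artifact", "api"}),
    "reconciler": frozenset({"artifact", "api"}),
    "artifact": frozenset({"api"}),
    "api": frozenset(),  # api may import from any internal layer
}


def is_import_allowed(importer: str, imported: str) -> bool:
    """Return True if layer `importer` may import from layer `imported`."""
    if importer == imported:
        return True  # a layer may always import from itself
    return imported not in FORBIDDEN[importer]
```

Read this way, `planner` may reach `capabilities` (through the registry), while the reverse direction is rejected.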
3. contract/ — "Translation + Validation Boundary"
3.1 Meaning
contract/ is Paxman's adapter + validation subsystem. It is the only layer that knows about external contract formats.
It converts caller-provided contracts (Pydantic, JSON Schema, Dict DSL, OpenAPI) into Paxman's canonical internal representation, and rejects anything invalid.
3.2 Structure
contract/
├── canonical.py # CanonicalContract + CanonicalField data models
├── validator.py # rejects invalid contracts → INVALID_CONTRACT
├── semantics.py # semantic tag handling, structural vs semantic layers
├── registry.py # adapter lookup by contract format
├── _types.py # internal: FieldType enum, Constraint, ResolutionPolicy
│
└── adapters/
├── base.py # ContractAdapter protocol (the SPI)
├── pydantic.py # Pydantic model → canonical
├── json_schema.py # JSON Schema → canonical
├── dict_dsl.py # internal Dict DSL → canonical
└── openapi.py # Optional OpenAPI → canonical
3.3 Public surface of this module
| Symbol | Visibility | Notes |
|---|---|---|
| `CanonicalContract` | Re-exported in `api.types` | The frozen internal form |
| `CanonicalField` | Re-exported in `api.types` | Per-field canonical spec |
| `FieldType` | Re-exported in `api.types` | `STRING`, `INTEGER`, ... |
| `ContractAdapter` | Re-exported in `api.protocols` | SPI for adding new adapters |
| `ResolutionPolicy` | Re-exported in `api.types` | Per-field fallback policy |
| `register_adapter` | `api` | Register a third-party adapter at runtime |
| `pydantic.adapt(model_class)` | `api.normalize` accepts Pydantic directly | Convenience |
3.4 Internal invariants (must hold; tested)
- Adapters only produce canonical output; they never execute or plan.
- Validator is mandatory: invalid contracts fail immediately with `INVALID_CONTRACT`.
- Core types: `STRING`, `INTEGER`, `DECIMAL`, `BOOLEAN`, `DATE`, `ENUM`, `OBJECT`, `ARRAY`, `MONEY`.
- `MONEY` is first-class (precision, currency, ISO 4217, FX, rounding).
- `CanonicalField` carries: `id`, `path`, `name`, `type`, `required`, `critical`, `nullable`, `confidence_threshold`, `evidence_required`, `semantic_tags`, `fallback_policy`.
- `CanonicalContract` carries: `id`, `version`, `fields`, `constraints`, `policies`.
- No adapter may leak its source representation downstream; the only output of an adapter is a `CanonicalContract`.
- Adapters are pure functions: same input → same `CanonicalContract` (no random, no clock, no I/O).
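The invariants above can be illustrated with a reduced model. This is a sketch under stated assumptions: the Dict DSL shape (`{"id", "version", "fields": {name: {"type", "required"}}}`), the default `confidence_threshold`, the function name `adapt_dict_dsl`, and the use of stdlib dataclasses are all hypothetical, and `CanonicalField` is cut down to a few of the attributes listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Mapping


class FieldType(Enum):
    STRING = "STRING"
    INTEGER = "INTEGER"
    DECIMAL = "DECIMAL"
    BOOLEAN = "BOOLEAN"
    DATE = "DATE"
    ENUM = "ENUM"
    OBJECT = "OBJECT"
    ARRAY = "ARRAY"
    MONEY = "MONEY"


@dataclass(frozen=True)
class CanonicalField:
    # Reduced to a handful of the documented attributes, for brevity.
    id: str
    path: str
    type: FieldType
    required: bool = True
    confidence_threshold: float = 0.8  # assumed default, not from the spec


@dataclass(frozen=True)
class CanonicalContract:
    id: str
    version: str
    fields: tuple[CanonicalField, ...]


class InvalidContractError(ValueError):
    """Stand-in for the INVALID_CONTRACT failure."""


def adapt_dict_dsl(spec: Mapping[str, Any]) -> CanonicalContract:
    """Pure translation: no clock, no randomness, no I/O."""
    fields = []
    # Sort by name so the result does not depend on dict insertion order.
    for name in sorted(spec["fields"]):
        raw = spec["fields"][name]
        try:
            ftype = FieldType[raw["type"].upper()]
        except KeyError as exc:
            raise InvalidContractError(
                f"missing or unknown type for field {name!r}"
            ) from exc
        fields.append(
            CanonicalField(
                id=f"field_{name}",
                path=name,
                type=ftype,
                required=bool(raw.get("required", True)),
            )
        )
    return CanonicalContract(
        id=spec["id"], version=spec["version"], fields=tuple(fields)
    )
```

Because both models are frozen and the adapter reads nothing but its argument, calling it twice on the same spec yields equal contracts, which is the property the golden and property tests would pin down.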
3.5 Per-module testing strategy
- Unit tests — adapter output for representative Pydantic / JSON Schema / Dict DSL / OpenAPI contracts.
- Golden tests — frozen canonical snapshots for each adapter.
- Validator tests — every error path.
- Property tests — roundtrip: Pydantic → canonical → Pydantic should preserve the structure (within Pydantic v2's expressible subset).
- No I/O — adapter tests do not touch network, disk, or clock.
Contract = "the only place that knows about external schemas"
4. planner/ — "Field-Centric Plan Synthesis"
4.1 Meaning
The planner is the deterministic brain of Paxman. It does NOT execute anything.
It reads the canonical contract, analyzes the input profile, and produces a field-by-field execution plan — one plan per required field, not one plan per document.
4.2 Structure
planner/
├── planner.py # top-level: contract + input + budget → plan
├── heuristics.py # ordering rules (explicit evidence → ... → UNRESOLVED)
├── scoring.py # candidate cost / confidence / coverage scoring
├── policies.py # budget, accuracy, fallback policies
├── field_plan.py # FieldPlan + FieldPlanStep data models
├── input_profile.py # lightweight input classifier (no capability invocation)
└── _registry.py # internal: capability registry handle
4.3 Public surface
| Symbol | Visibility | Notes |
|---|---|---|
| `ExecutionPlan` | Internal | Not re-exported in `api/` |
| `FieldPlan` | Internal | Per-field plan |
| `FieldPlanStep` | Internal | One capability invocation in a chain |
| `Heuristic` | `api.protocols` | SPI for new heuristics |
| `register_heuristic` | `api` | Pluggable heuristic (post-V1) |
4.4 Internal invariants
- Field-centric, not document-centric. Each required field gets its own `FieldPlan`.
- Deterministic. Given the same canonical contract + input profile + config + capability set, the planner MUST produce the same plan.
- Rule-based in V1. No LLM planner, no agent planner, no AI-generated planning logic.
- Heuristic ordering (highest to lowest preference):
  - Explicit evidence
  - Local deterministic extraction
  - Structured lookup
  - Derived computation
  - Local inference
  - Remote inference
  - `UNRESOLVED`
- Planner never executes anything against raw input; it only emits `FieldPlan`s.
- Planner does NOT assign confidence. Capabilities return candidates; the Reconciler assigns final confidence. The Planner may emit a `target_confidence` (read from the field's `confidence_threshold`) but never scores a candidate. See ADR-0005.
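The ordering rule above can be sketched as a sort over preference tiers. Everything here is illustrative: the `Tier` enum, the reduced `FieldPlan` and `FieldPlanStep` shapes, the mapping of a capability to a tier, and the tie-break on `capability_id` are assumptions, not the planner's actual data model.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable


class Tier(IntEnum):
    """Preference order from the list above; lower value is tried first."""

    EXPLICIT_EVIDENCE = 1
    LOCAL_DETERMINISTIC_EXTRACTION = 2
    STRUCTURED_LOOKUP = 3
    DERIVED_COMPUTATION = 4
    LOCAL_INFERENCE = 5
    REMOTE_INFERENCE = 6
    UNRESOLVED = 7


@dataclass(frozen=True)
class FieldPlanStep:
    capability_id: str
    tier: Tier


@dataclass(frozen=True)
class FieldPlan:
    field_id: str
    target_confidence: float  # copied from the field, never a score
    steps: tuple[FieldPlanStep, ...]


def plan_field(
    field_id: str,
    confidence_threshold: float,
    available: Iterable[FieldPlanStep],
) -> FieldPlan:
    """Order candidate steps by tier; ties break on capability_id."""
    ordered = sorted(available, key=lambda s: (s.tier, s.capability_id))
    # Every plan ends in an explicit UNRESOLVED step rather than a guess.
    terminal = FieldPlanStep("unresolved", Tier.UNRESOLVED)
    return FieldPlan(field_id, confidence_threshold, (*ordered, terminal))
```

Sorting on a total key is what makes the output independent of the order in which the registry happens to list capabilities, which is the determinism property the golden tests rely on.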
4.5 Per-module testing strategy
- Unit tests: heuristic selection for each (input profile × contract shape) combination.
- Golden tests: frozen `ExecutionPlan` snapshots for representative inputs.
- Property tests: determinism, i.e. identical inputs → byte-identical `ExecutionPlan` JSON.
- Budget tests: planner respects `max_remote_inference_calls`, `max_total_cost_usd`, etc.
- No I/O: planner tests use fake capability registries and fake input profiles.
Planner = "synthesizes the cheapest trustworthy path per field"
5. capabilities/ — "Atomic Operations"
5.1 Meaning
Capabilities are the only executable primitives. They are reusable atomic operations with metadata describing input, output, cost, and determinism.
LLMs are providers behind inference capabilities, not capabilities themselves.
5.2 Structure
capabilities/
├── base.py # Capability protocol (the SPI)
├── spec.py # CapabilitySpec data model
├── result.py # Candidate, EvidenceRef, Diagnostic
├── registry.py # capability lookup + metadata
│
└── v1/
├── text_extraction.py # pull text out of raw input
├── regex_extraction.py # pattern-based local extraction
├── lookup.py # structured / retrieval-based extraction
├── inference.py # model-backed inference (LLM is a provider)
└── validation.py # verify a candidate value
5.3 Public surface
| Symbol | Visibility | Notes |
|---|---|---|
| `Capability` | `api.protocols` | SPI for new capabilities |
| `CapabilitySpec` | Internal | Metadata |
| `Candidate` | Internal | Single proposed value |
| `EvidenceRef` | Internal | Pointer to a piece of evidence |
| `Diagnostic` | Internal | Structured note |
| `register_capability` | `api` | Register a third-party capability at runtime |
| `get_capability(id, version)` | `api` | Look up a registered capability |
5.4 Internal invariants
- V1 capability surface is deliberately small:
  - `text_extraction`
  - `regex_extraction`
  - `lookup` (deterministic backend: yes; vector backend: no)
  - `inference` (always non-deterministic when model is remote)
  - `validation`
- Capabilities never assign confidence. They return candidates + evidence + diagnostics only.
- Confidence is exclusively owned by the Reconciler (the Planner may emit a `target_confidence` but never scores a candidate). This prevents confidence inflation. See ADR-0005.
- Every capability MUST expose a `CapabilitySpec` with input/output, cost, determinism flag.
- Capabilities are stateless and side-effect-free except for declared external effects (which MUST be captured in evidence).
- Capabilities never read the canonical contract directly; they receive a `CapabilityContext` built by the Executor.
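As an illustration of these invariants, here is a sketch of what a regex-based capability might look like. The field names on `CapabilitySpec`, `Candidate`, `CapabilityContext`, and `CapabilityResult` are assumptions made for the example (the real models are richer), and the `chars:start-end` evidence format is invented.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilitySpec:
    id: str
    version: str
    deterministic: bool
    cost_usd: float


@dataclass(frozen=True)
class Candidate:
    value: str
    evidence_ref: str  # where the value was found; note: no confidence field


@dataclass(frozen=True)
class CapabilityContext:
    # Built by the caller (the Executor in Paxman); hypothetical shape.
    field_id: str
    text: str
    pattern: str


@dataclass(frozen=True)
class CapabilityResult:
    candidates: tuple[Candidate, ...]
    diagnostics: tuple[str, ...] = ()


class RegexExtraction:
    """Stateless: everything it needs arrives through the context."""

    spec = CapabilitySpec(
        id="regex_extraction", version="1.0.0", deterministic=True, cost_usd=0.0
    )

    def invoke(self, ctx: CapabilityContext) -> CapabilityResult:
        candidates = tuple(
            Candidate(value=m.group(0), evidence_ref=f"chars:{m.start()}-{m.end()}")
            for m in re.finditer(ctx.pattern, ctx.text)
        )
        diagnostics = () if candidates else ("no match",)
        return CapabilityResult(candidates, diagnostics)
```

The capability reports what it found and where, and stops there. Leaving `confidence` off `Candidate` entirely is one way to make the "capabilities never assign confidence" rule structural rather than a convention.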
5.5 Per-module testing strategy
- Unit tests per capability: known input → known candidates + evidence.
- Determinism tests for deterministic capabilities (regex, validation): property test.
- Provider-mock tests for non-deterministic capabilities (text_extraction, inference): mock the provider, assert shape of `CapabilityResult`.
- No raw input in capability tests: fixtures only.
Capabilities = "pure atomic operations, never the source of truth"
6. executor/ — "Deterministic Runner"
6.1 Meaning
The Executor runs the plan produced by the Planner. It does not replan, reroute, or optimize.
It walks the per-field plans in order, invokes capabilities, collects evidence, and stops early when the contract is satisfied with acceptable confidence.
6.2 Structure
executor/
├── executor.py # top-level execution driver
├── field_runner.py # executes one FieldPlan
├── context.py # CapabilityContext builder
├── evidence.py # evidence + diagnostics collection
├── early_stop.py # short-circuit when contract is satisfied
├── budget_tracker.py # tracks cost / latency / invocations against Budget
└── execution_state.py # transient in-flight state (never authoritative)
6.3 Public surface
| Symbol | Visibility | Notes |
|---|---|---|
| `Executor` | Internal | Not re-exported |
| `CandidateResult` | Internal | Per-field output of execution |
6.4 Internal invariants
- Executor follows the plan exactly as the Planner defined it. No replanning, no rerouting, no structural retries.
- Executor passes context forward and never mutates the plan.
- Executor stops early when a field hits its `confidence_threshold` (the Executor knows the target because the Planner embeds it in the `FieldPlan`; the Executor does not score the candidate, the Reconciler does).
- Executor returns explicit `UNRESOLVED` candidates when it cannot satisfy a field; it never guesses silently.
- Executor never assigns final confidence; it only collects candidate evidence.
- Executor never reads raw input directly; it receives an opaque `InputData` handle and passes a `CapabilityContext` to each capability.
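A field runner that respects these invariants might look roughly like the sketch below. It uses the "first successful candidate" stop condition from the testing strategy and a plain invocation budget; `run_field_plan`, the `Step` callable type, and the reduced `Candidate` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    value: Optional[str]
    source: str


# Explicit marker returned when no step produces a candidate.
UNRESOLVED = Candidate(value=None, source="UNRESOLVED")

# A step takes the input text and returns a candidate or None.
Step = Callable[[str], Optional[Candidate]]


def run_field_plan(
    steps: Sequence[tuple[str, Step]],
    text: str,
    max_invocations: int,
) -> tuple[Candidate, list[str]]:
    """Walk the steps in plan order; never reorder, retry, or replan."""
    invoked: list[str] = []
    for name, step in steps:
        if len(invoked) >= max_invocations:
            break  # budget exhausted: stop without trying further steps
        invoked.append(name)
        candidate = step(text)
        if candidate is not None:
            return candidate, invoked  # early stop on first candidate
    return UNRESOLVED, invoked
```

The runner returns the invocation order alongside the candidate so a test can assert both the stop condition and that later steps were never called.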
6.5 Per-module testing strategy
- Unit tests: sequential execution of a 3-field plan; assertion on invocation order and stop conditions.
- Mocked capabilities: no real capabilities in Executor tests; capabilities are mocks.
- Budget tests: Executor short-circuits when budget is exhausted.
- Early-stop tests: Executor stops on first successful candidate per field.
- No planner mocking needed: Executor tests inject a fake `ExecutionPlan`.
Executor = "hands that execute the plan, never rewrite it"
7. reconciler/ — "Truth Resolution"
7.1 Meaning
The Reconciler is a first-class subsystem. It is where Paxman converts uncertainty into trustworthy normalized output.
It owns the final truth and the final confidence (the only subsystem that does).
7.2 Structure
reconciler/
├── reconciler.py # top-level: candidates → resolved truth
├── merge.py # candidate merging strategies
├── conflict.py # conflict detection
├── evidence_compare.py # compare evidence quality across candidates
├── confidence.py # confidence assignment (bands: CERTAIN/HIGH/MEDIUM/LOW/UNTRUSTED)
├── unresolved.py # explicit unresolved state handling
├── validation.py # apply Validation capability to candidates
├── money.py # MONEY arithmetic + currency policy
└── truth.py # TruthLayer data models (Contract / Candidate / Resolved)
7.3 Public surface
| Symbol | Visibility | Notes |
|---|---|---|
| `ResolvedResult` | Internal | Per-field resolved output |
| `TruthLayer` | Internal | Tagged-union of Contract / Candidate / Resolved |
| `ConfidenceBand` | Re-exported in `api.types` | `CERTAIN`, `HIGH`, `MEDIUM`, `LOW`, `UNTRUSTED` |
7.4 Internal invariants
- Reconciler is the ONLY layer that assigns final confidence and final truth. See ADR-0005.
- Three truth layers are explicit:
  - Contract Truth: what the caller requires
  - Candidate Truth: what capabilities discovered
  - Resolved Truth: what the Reconciler accepts into the artifact
- Reconciler never executes capabilities.
- Reconciler never reads raw input.
- Reconciler never sees external schemas, only `CanonicalContract`.
- Unresolved fields are explicit, never silent.
- Confidence bands are fixed: `CERTAIN`, `HIGH`, `MEDIUM`, `LOW`, `UNTRUSTED`. The internal float is `0.0–1.0`; the band is derived deterministically from the float and the field's `confidence_threshold`.
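The last invariant says the band is a deterministic function of the float and the field's threshold, but the cut-offs are not stated here. The sketch below therefore invents them: `CERTAIN_FLOOR` and the 0.75 and 0.5 multipliers are placeholders chosen only to show the shape of such a mapping, and `band_for` is a hypothetical name.

```python
from enum import Enum


class ConfidenceBand(Enum):
    CERTAIN = "CERTAIN"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"
    UNTRUSTED = "UNTRUSTED"


# Placeholder cut-off; the real value is not given in this document.
CERTAIN_FLOOR = 0.99


def band_for(score: float, confidence_threshold: float) -> ConfidenceBand:
    """Map an internal 0.0-1.0 score to a band, relative to the threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be within 0.0-1.0")
    if score >= max(CERTAIN_FLOOR, confidence_threshold):
        return ConfidenceBand.CERTAIN
    if score >= confidence_threshold:
        return ConfidenceBand.HIGH
    if score >= confidence_threshold * 0.75:  # placeholder multiplier
        return ConfidenceBand.MEDIUM
    if score >= confidence_threshold * 0.5:  # placeholder multiplier
        return ConfidenceBand.LOW
    return ConfidenceBand.UNTRUSTED
```

Whatever the real cut-offs are, a step function like this is monotonic in the score: a higher score can never land in a lower band.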
7.5 Per-module testing strategy
- Unit tests per merging strategy: union, intersection, prefer-by-evidence.
- Conflict tests: same field, two candidates, two evidence sources.
- Confidence calibration tests: fixed inputs → fixed confidence bands.
- Property tests: Reconciler is monotonic, i.e. a strictly better candidate (higher evidence quality) never lowers confidence.
- `MONEY` arithmetic tests: currency mismatch, FX, precision.
- No capability mocking needed: Reconciler takes candidates, not capabilities.
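The `MONEY` cases named above (currency mismatch, precision) can be sketched with `decimal.Decimal`. This is not the contents of `money.py`: the `Money` and `CurrencyMismatchError` names, the refusal to add across currencies, and banker's rounding are all assumptions for illustration, and FX conversion is left out.

```python
from dataclasses import dataclass
from decimal import ROUND_HALF_EVEN, Decimal


class CurrencyMismatchError(ValueError):
    """Raised when two amounts in different currencies are combined."""


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str  # ISO 4217 code, e.g. "EUR"

    def __add__(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            # No implicit FX: mixing currencies is an explicit error.
            raise CurrencyMismatchError(f"{self.currency} != {other.currency}")
        return Money(self.amount + other.amount, self.currency)

    def rounded(self, places: int = 2) -> "Money":
        # Decimal(1).scaleb(-2) is Decimal("0.01"), the quantization step.
        quantum = Decimal(1).scaleb(-places)
        return Money(
            self.amount.quantize(quantum, rounding=ROUND_HALF_EVEN),
            self.currency,
        )
```

Using `Decimal` rather than `float` keeps sums like 0.10 + 0.20 exact, which matters once the result feeds a byte-stable artifact.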
Reconciler = "where uncertainty becomes trustworthy normalized output"
8. artifact/ — "The Product + Replay Source"
8.1 Meaning
The artifact is the final output bundle returned by Paxman. It contains normalized data, evidence, diagnostics, unresolved fields, plan metadata, and replay data.
It is the only replay mechanism.
8.2 Structure
artifact/
├── artifact.py # ExecutionArtifact + FieldResult data models
├── confidence.py # confidence band mapping (float ↔ CERTAIN/HIGH/MEDIUM/LOW/UNTRUSTED)
├── replay.py # replay hash computation + rehydration
├── evidence.py # evidence references + provenance
├── diagnostics.py # structured diagnostics
├── statistics.py # execution statistics
├── serializer.py # stable JSON encoding (sorted keys, no whitespace)
└── _hash.py # replay hash internals
8.3 Public surface
| Symbol | Visibility | Notes |
|---|---|---|
| `ExecutionArtifact` | Re-exported in `api.types` | The product |
| `FieldResult` | Internal | Per-field result |
| `Status` | Re-exported in `api.types` | `SUCCESS`, `PARTIAL_SUCCESS`, `UNRESOLVED`, `INVALID_CONTRACT`, `EXECUTION_FAILED` |
| `replay(artifact, contract, paxman_version)` | `api` | Rehydrate an artifact |
| `ReplayError` family | `api.errors` | Replay-specific errors |
8.4 Internal invariants
- `ExecutionArtifact` is the product. It contains:
  - `normalized_data`
  - `field_results`
  - `unresolved_fields`
  - `evidence`
  - `diagnostics`
  - `execution_plan`
  - `replay_hash`
  - `statistics`
- `FieldResult` carries: `field_id`, `status`, `value`, `confidence`, `evidence_refs`.
- Statuses: `SUCCESS`, `PARTIAL_SUCCESS`, `UNRESOLVED`, `INVALID_CONTRACT`, `EXECUTION_FAILED`.
- Replay hash captures: canonical contract representation + input fingerprint + planner version + capability versions + configuration + constraints.
- Artifact is replayable without recomputation: rehydration only.
- No persistence in core. The caller stores the artifact.
- JSON serialization is deterministic: sorted keys, no whitespace, RFC 8785 (or equivalent).
- Schema version is embedded: `artifact.paxman_version`, `artifact.planner_version`, `artifact.capability_versions[]`.
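The serialization and hashing invariants can be approximated with the standard library. This is a sketch, not the `_hash.py` implementation: SHA-256 is an assumed digest, `stable_json` and `replay_hash` are hypothetical names, and `json.dumps` with sorted keys only approximates RFC 8785 (it does not apply that standard's number formatting or key-ordering rules for non-ASCII keys).

```python
import hashlib
import json
from typing import Any, Mapping


def stable_json(value: Any) -> str:
    """Encode with sorted keys and no insignificant whitespace."""
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )


def replay_hash(components: Mapping[str, Any]) -> str:
    """Hash the stable encoding of everything the replay depends on."""
    encoded = stable_json(components).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()
```

Because the encoding is independent of key insertion order, two processes that assemble the same components in a different order arrive at the same hash, while changing any single component (for example a capability version) changes it.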
8.5 Per-module testing strategy
- Replay equality tests: given an artifact, `replay(artifact, contract) == artifact` (byte-equal JSON).
- Hash determinism: same inputs → same hash (Hypothesis property test).
- Version mismatch tests: wrong Paxman version raises `VersionMismatchError`.
- Tamper tests: modifying any field changes the hash and is detected on replay.
- Schema migration tests: old artifacts replay cleanly on a compatible new version.
Artifact = "frozen, evidence-backed, replayable truth"
9. api/ — "The Only Thing Users See"
9.1 Meaning
The API is the public surface. It hides everything else. The complexity lives inside Planner, Executor, and Reconciler — not at the call site.
9.2 Structure
api/
├── normalize.py # paxman.normalize(input_data, contract, budget, policy)
├── replay.py # paxman.replay(artifact, contract)
├── types.py # re-exports: CanonicalContract, CanonicalField, FieldType,
│ # ExecutionArtifact, Status, ConfidenceBand,
│ # ResolutionPolicy, Budget, Policy
├── errors.py # public error types (re-export from errors.py)
├── protocols.py # public SPIs: ContractAdapter, Capability, Heuristic
├── registry.py # public registration: register_adapter, register_capability
└── version.py # __version__ string
9.3 Public surface (V1)
import paxman
# Top-level functions
paxman.normalize(input_data, contract, budget=None, policy=None) -> ExecutionArtifact
paxman.replay(artifact, contract) -> ExecutionArtifact
paxman.register_adapter(adapter: ContractAdapter) -> None
paxman.register_capability(capability: Capability) -> None
paxman.__version__ -> str
# Types
paxman.CanonicalContract
paxman.CanonicalField
paxman.FieldType
paxman.Status
paxman.ConfidenceBand
paxman.ResolutionPolicy
paxman.Budget
paxman.Policy
paxman.ExecutionArtifact
paxman.CurrencyPolicy
# Errors
paxman.PaxmanError
paxman.InvalidContractError
paxman.ExecutionError
paxman.CapabilityError
paxman.InferenceProviderError
paxman.BudgetExceededError
paxman.ReconciliationError
paxman.ReplayError
paxman.VersionMismatchError
paxman.HashMismatchError
paxman.ConfigurationError
# Protocols
paxman.ContractAdapter
paxman.Capability
9.4 Internal invariants
- Tiny surface. The whole API should be learnable in minutes.
- No internal concepts leak out (no `FieldPlan`, no `CapabilitySpec`, no `TruthLayer` exposed by name).
- No pipeline mutation allowed from API.
- No stage awareness exposed directly: the API is "normalize this against this contract under this budget".
- Stable and versioned. The artifact is the version contract, not the API.
- CI enforces the public surface: `tests/test_public_api.py` fails if anything is added without an ADR.
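A surface snapshot of the kind the last invariant describes could be as small as a set comparison. The sketch below assumes the check is driven by the module's `__all__`, which this document does not state; `EXPECTED` is a shortened stand-in for the real snapshot and `surface_diff` is a hypothetical helper.

```python
import types

# Shortened stand-in for the frozen snapshot of public names.
EXPECTED = frozenset(
    {"normalize", "replay", "register_adapter", "register_capability"}
)


def surface_diff(
    module: types.ModuleType, expected: frozenset[str]
) -> tuple[list[str], list[str]]:
    """Return (added, removed) names relative to the snapshot."""
    actual = frozenset(getattr(module, "__all__", ()))
    return sorted(actual - expected), sorted(expected - actual)
```

A test would then assert that both lists are empty, so that exposing something like `FieldPlan` by accident fails CI until the snapshot is updated deliberately.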
API = "stable, minimal, deterministic surface"
10. Cross-Cutting Concerns
These live as shared modules at the package root, not as layers. Every layer may import from them. They may not import from any layer.
paxman/
├── errors.py # PaxmanError hierarchy
├── types.py # shared enums, IDs, paths
├── protocols.py # shared Protocols (subset of api.protocols)
├── versioning.py # Paxman core version, capability versions, planner version
├── logging.py # structured, deterministic logging (no timestamps in replay path)
├── budget.py # Budget + Policy types
├── clock.py # injectable Clock (for deterministic tests)
├── ids.py # prefixed IDs (e.g., `field_`, `cap_`, `art_`)
└── serialization.py # stable JSON encoder (sorted keys, no whitespace)
| Module | Visibility | Notes |
|---|---|---|
| `errors.py` | Public (re-exported in `api.errors`) | Exception hierarchy |
| `types.py` | Internal | Shared enums only; public types live in `api.types` |
| `protocols.py` | Internal (subset re-exported in `api.protocols`) | `ContractAdapter`, `Capability`, etc. |
| `versioning.py` | Internal | Version constants |
| `logging.py` | Internal | Logger factory |
| `budget.py` | Public (re-exported in `api.types`) | `Budget`, `Policy` |
| `clock.py` | Internal | For test injection |
| `ids.py` | Internal | Prefixed IDs |
| `serialization.py` | Internal | Stable JSON encoder |
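To make "injectable Clock" concrete, here is one common shape for it. The protocol method `now`, the `SystemClock` and `FixedClock` classes, and the `stamp_statistics` helper are assumptions for illustration; `clock.py` may look different.

```python
from datetime import datetime, timezone
from typing import Protocol


class Clock(Protocol):
    def now(self) -> datetime: ...


class SystemClock:
    """Production clock: reads the real time in UTC."""

    def now(self) -> datetime:
        return datetime.now(timezone.utc)


class FixedClock:
    """Test clock: always returns the instant it was built with."""

    def __init__(self, instant: datetime) -> None:
        self._instant = instant

    def now(self) -> datetime:
        return self._instant


def stamp_statistics(clock: Clock) -> dict[str, str]:
    """Hypothetical consumer: time comes from the clock, never from datetime directly."""
    return {"finished_at": clock.now().isoformat()}
```

Code that takes a `Clock` argument can be exercised with `FixedClock` in tests and produce the same output on every run.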
11. System Boundary Rules (Hard Enforcement)
These are the real architecture enforcement rules. They are tested in CI.
11.1 Rule 1 — Contract Is the Only External-Schema Boundary
Adapters live inside contract/. Nothing else in Paxman knows about Pydantic, JSON Schema, OpenAPI, or any external schema language.
11.2 Rule 2 — Planner Never Executes
The planner emits FieldPlans. It never calls a capability, never reads raw input beyond a lightweight input profile, and never touches the artifact.
11.3 Rule 3 — Capabilities Never Assign Confidence
Capabilities return candidates + evidence + diagnostics. Confidence is exclusively owned by the Reconciler (the Planner may emit a target_confidence but does not score candidates). See ADR-0005.
11.4 Rule 4 — Executor Never Replans
The executor runs the plan exactly as the Planner defined it. It can stop early, but it cannot reroute, retry structurally, or reorder.
11.5 Rule 5 — Reconciler Is the Only Truth Authority
The final value and confidence on a FieldResult are set by the Reconciler — nowhere else.
11.6 Rule 6 — Artifact Is the Only Replay Source
Replay rehydrates the captured truth. It does not recompute, re-execute, or reinterpret.
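11.7 Rule 7 — API Hides Everything
The public surface exposes only `normalize`, `replay`, the registration functions (`register_adapter`, `register_capability`), public types, public errors, and the public protocols listed in §9.3. No subsystem names, no plan structure, no capability names leak out.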
11.8 Rule 8 — No Persistence in Core
Paxman returns an ExecutionArtifact and stops. Storage, queues, and databases are the caller's responsibility.
11.9 Rule 9 — Cross-Cutting Modules May Not Import From Layers
errors.py, types.py, protocols.py, versioning.py, logging.py, budget.py, clock.py, ids.py, serialization.py are at the bottom of the DAG. They may not import from any subsystem layer.
11.10 Enforcement
| Rule | Enforced by |
|---|---|
| 1, 2, 3, 4, 5, 6, 7, 9 | import-linter config in pyproject.toml |
| 7 (additional) | tests/test_public_api.py snapshots the public surface |
| 8 | Manual code review; no storage imports anywhere |
12. V1 Capability Set (Explicit)
V1 ships with exactly:
- `text_extraction`
- `regex_extraction`
- `lookup`
- `inference`
- `validation`
Everything else (capability marketplace, visual planners, graph execution, LLM planners, workflow orchestration, persistent execution, RAG, multi-agent coordination) is postponed to V2.
13. V1 Contract Adapter Set (Explicit)
Per ARCHITECTURE.md §4.1, V1 ships with:
- Pydantic Adapter (required)
- JSON Schema Adapter (required)
- Dict DSL Adapter (required)
- Optional OpenAPI Adapter (best-effort)
14. Determinism Guarantees
Determinism is required, not aspirational.
Given the same:
- Canonical contract
- Input profile
- Configuration
- Capability set
- Budget + policy
- Paxman core version
- Planner version
- Capability versions
Paxman MUST produce the same:
- `FieldPlan`s
- Execution order
- `ExecutionArtifact` (modulo declared non-deterministic capability outputs, which are recorded as evidence)
Replay reproduces the same artifact without recomputation.
See REPLAY_AND_DETERMINISM.md for the full replay model.
15. Public Protocols (SPIs)
These are the only stable extension points in V1.
15.1 ContractAdapter (in api.protocols)
class ContractAdapter(Protocol):
    """SPI: translate an external contract format to/from CanonicalContract."""

    @property
    def format_id(self) -> str:
        """Stable identifier (e.g., 'pydantic', 'json_schema:draft-2020-12')."""
        ...

    def adapt(self, external: Any) -> CanonicalContract: ...

    def export(self, canonical: CanonicalContract) -> Any: ...
15.2 Capability (in api.protocols)
class Capability(Protocol):
    """SPI: an atomic operation."""

    @property
    def spec(self) -> CapabilitySpec: ...

    def invoke(self, ctx: CapabilityContext) -> CapabilityResult: ...
15.3 Heuristic (in api.protocols, post-V1)
class Heuristic(Protocol):
    """SPI: a planner heuristic (post-V1)."""

    def select(self, field: CanonicalField, ctx: HeuristicContext) -> list[FieldPlanStep]: ...
15.4 InferenceProvider (internal, in capabilities/v1/inference.py)
class InferenceProvider(Protocol):
    """SPI: a model provider behind the inference capability (internal)."""

    def complete(self, request: CompletionRequest) -> Completion: ...
See EXTENDING.md for step-by-step guides.
16. Type Stub and Type-Checking Strategy
- The package ships `py.typed` (PEP 561 marker) at the package root.
- `mypy --strict` is the CI default for the public surface.
- The internal modules are type-checked with `mypy --strict` as well, but with allowance for `from __future__ import annotations` and `Protocol` use.
- `pyright` is run in CI for cross-validation.
- Type hints are mandatory on every public symbol and every internal symbol in `contract/`, `planner/`, `reconciler/`, `artifact/`.
17. Build and Packaging Strategy
17.1 Build backend
- Backend: `hatchling` (modern, fast, PEP 517 compliant).
- Source layout: `src/paxman/` (src-layout to prevent accidental imports from the project root).
- Wheel + sdist published to PyPI.
17.2 pyproject.toml layout (skeleton)
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "paxman"
version = "0.1.0" # dynamic from git
requires-python = ">=3.11"
license = { text = "MIT" } # or Apache-2.0 — final TBD
authors = [{ name = "Paxman team" }]
description = "Contract-driven deterministic normalization engine."
readme = "README.md"
# Core dependencies (intentionally small)
dependencies = [
"attrs>=23.0",
"typing-extensions>=4.0",
]
[project.optional-dependencies]
# Adapter extras
pydantic = ["pydantic>=2.5"]
json-schema = ["jsonschema>=4.20"]
openapi = ["openapi-spec-validator>=0.6"]
# Provider extras
inference = [] # V1: no remote provider by default
# Convenience
all = ["paxman[pydantic,json-schema,openapi]"]
# Dev extras (also as PEP 735 dependency groups)
dev = [
"pytest>=7.4",
"pytest-cov>=4.1",
"pytest-xdist>=3.3",
"hypothesis>=6.0",
"ruff>=0.4",
"mypy>=1.10",
"pyright>=1.1",
"import-linter>=2.0",
"interrogate>=1.7",
"structlog>=24.1",
]
[tool.hatch.build.targets.wheel]
packages = ["src/paxman"]
[tool.hatch.build.targets.wheel.force-include]
"src/paxman/py.typed" = "paxman/py.typed"
[tool.ruff]
line-length = 100
target-version = "py311"
extend-exclude = ["docs/"]
[tool.ruff.lint]
select = ["E", "F", "W", "I", "B", "UP", "ANN", "ASYNC", "S", "RUF"]
ignore = ["S101"] # asserts OK in tests
[tool.mypy]
strict = true
python_version = "3.11"
files = ["src/paxman"]
[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-ra -q --strict-markers"
markers = [
"deterministic: tests that verify deterministic behavior",
"replay: tests that verify replay equivalence",
"property: hypothesis property tests",
"slow: tests that take >1s",
]
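[tool.importlinter]
root_package = "paxman"

# Layers are listed highest first; a layer may import only from layers below it.
[[tool.importlinter.contracts]]
name = "Internal subsystem DAG"
type = "layers"
layers = [
    "paxman.api",
    "paxman.artifact",
    "paxman.reconciler",
    "paxman.executor",
    "paxman.planner",
    "paxman.capabilities",
    "paxman.contract",
]

# Cross-cutting modules (errors, types, ...) may not import from any subsystem.
[[tool.importlinter.contracts]]
name = "Cross-cutting modules import no subsystem"
type = "forbidden"
source_modules = ["paxman.errors", "paxman.types", "paxman.protocols", "paxman.versioning", "paxman.logging", "paxman.budget", "paxman.clock", "paxman.ids", "paxman.serialization"]
forbidden_modules = ["paxman.api", "paxman.artifact", "paxman.reconciler", "paxman.executor", "paxman.planner", "paxman.capabilities", "paxman.contract"]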
17.3 Dependency policy
| Category | Where declared | Examples |
|---|---|---|
| Core (always installed) | `[project].dependencies` | `attrs`, `typing-extensions` |
| Adapter extras | `[project.optional-dependencies]` | `pydantic`, `jsonschema`, `openapi-spec-validator` |
| Inference extras (V2+) | `[project.optional-dependencies]` | `openai`, `anthropic` (V1: no remote provider) |
| Dev extras | `[project.optional-dependencies]` (or PEP 735) | `pytest`, `mypy`, `ruff`, `import-linter`, `hypothesis` |
Hard rule: core dependencies must be ≤ 3 packages and no transitive heavyweight deps (no numpy, no pytorch, no requests by default). This keeps pip install paxman cheap and safe.
See DEPENDENCIES.md for the full policy.
18. Tooling Configuration
| Tool | Purpose | Config location | CI behavior |
|---|---|---|---|
| `ruff` | Lint + format | `pyproject.toml [tool.ruff]` | Required to pass |
| `mypy` | Static type checking | `pyproject.toml [tool.mypy]` | Required to pass on public surface; advisory on internals |
| `pyright` | Cross-validation type checking | `pyrightconfig.json` | Required to pass |
| `import-linter` | Enforce the module DAG | `pyproject.toml [tool.importlinter]` | Required to pass |
| `pytest` | Test runner | `pyproject.toml [tool.pytest.*]` | Required to pass with ≥ 90% coverage on core |
| `hypothesis` | Property-based testing | test files | Used for determinism and replay tests |
| `interrogate` | Docstring coverage | `pyproject.toml [tool.interrogate]` | Required: 100% on public surface |
| `pre-commit` | Git hooks | `.pre-commit-config.yaml` | Local; CI runs the same hooks |
19. Test Layout
tests/
├── unit/
│ ├── contract/
│ ├── planner/
│ ├── capabilities/
│ ├── executor/
│ ├── reconciler/
│ ├── artifact/
│ └── api/
├── integration/
│ ├── end_to_end/
│ └── cross_subsystem/
├── property/
│ ├── determinism/
│ └── replay/
├── fixtures/ # see tests/fixtures/README.md
│ ├── README.md
│ ├── DATASET_LICENSES.md # attribution for every vendored file
│ ├── contracts/ # LAYER 3: pydantic, json_schema, dict_dsl, openapi
│ ├── inputs/ # LAYER 4: vendored open datasets + LAYER 1: adversarial
│ ├── artifacts/ # LAYER 3: golden ExecutionArtifact JSON
│ └── generated/ # LAYER 2: programmatic (gitignored)
├── public_api/ # tests that the public surface is stable
└── conftest.py
The full test data policy, dataset catalog, and licensing rules are in docs/TEST_DATA.md. Briefly: Layer 1 = hand-written edge cases, Layer 2 = programmatic (factory_boy, faker, hypothesis), Layer 3 = curated fixtures with golden artifacts, Layer 4 = vendored open-dataset samples, Layer 5 = real production data (never committed).
See TESTING_STRATEGY.md for the test strategy and docs/TEST_DATA.md for the test data policy.
20. One-Line Mental Model
Paxman is a contract-driven deterministic normalization engine: the `contract` subsystem canonicalizes and validates the caller's schema, the `planner` synthesizes a field-by-field plan, `capabilities` execute the plan atomically, the `executor` collects candidate evidence, the `reconciler` assigns final confidence and truth, and the `artifact` freezes the evidence-backed result for replay behind a tiny public `api`, with the module DAG enforced by `import-linter` and the public surface enforced by snapshot tests.
21. References
- ARCHITECTURE.md — System architecture, ADRs, sequence diagram.
- GLOSSARY.md — Full domain vocabulary.
- REPLAY_AND_DETERMINISM.md — Replay model deep dive.
- SECURITY.md — Threat model and PII handling.
- TESTING_STRATEGY.md — Test seams and determinism tests.
- docs/TEST_DATA.md — Test data policy, dataset catalog, licensing rules.
- DEVELOPMENT.md — Local dev setup.
- EXTENDING.md — How to add a capability, adapter, or provider.
- DEPENDENCIES.md — Core vs optional dependencies.
- docs/adr/ — Architecture Decision Records.