Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased
Added - Issue #21 (V1.0.x: Hero cards for reference examples)
- Reference-examples hero cards on the docs homepage. docs/index.md now includes a "Reference examples" section that surfaces the three shipped reference examples (backend_service, ai_agent_ingest, saas_procurement) as Material-themed grid cards. Each card links to the example directory on GitHub and tags the persona it is designed for (backend developer / AI engineer / SaaS team). Closes the "3 reference examples linked as hero cards on homepage" item in the V1.0.x static-site adoption DoD.
Changed - Sprint 11 (Post-v1 Repo Springclean)
Repo restructured to separate the library from the marketing site, internal
development notes, and tool-specific artifacts. The wheel artifact
(paxman-1.0.0-py3-none-any.whl) is unchanged; this is a repo-layout-only
change with no code, public API, or test changes.
- Documentation site moved to Read the Docs. Added .readthedocs.yaml and mkdocs.yml; user-facing and contributor-facing docs are now served from paxman.readthedocs.io. docs/ is reorganized into a Diátaxis-style structure (concepts/, howto/, reference/, contributing/, security/, operations/, guides/, adr/, specs/) for forward growth.
- Marketing site removed. website/ (the static index.html + Traefik docker-compose.yml for paxman.nexusnv.net) is now in the NexusNV website repo. No marketing material remains in this repo.
- Internal development notes moved to the project wiki. docs/sprints/, docs/reports/, PRD.md, V1_ACCEPTANCE_CRITERIA.md, and docs/INVESTOR_PITCH_CONTENT.md are preserved 1-to-1 on the nexusnv/paxman GitHub wiki, organized under an "Internal Development" parent page. Sprint notes can now link to specific commits and lines of code.
- Agent and harness artifacts untracked. .agents/, .omo/, .understand-anything/, skills-lock.json, coverage.json, and coverage.xml are removed from version control but kept in each developer's working directory as gitignored files. Different developers may use different harnesses (OpenCode, Claude, Codex, Kilo, …); the universal AGENTS.md is the only agent-related file that remains tracked.
- GitHub-recognized stub files at root. CONTRIBUTING.md, CODE_OF_CONDUCT.md, SECURITY.md, and CHANGELOG.md are short stubs that link to the full content on Read the Docs. GitHub's issue and PR UIs auto-discover these files.
- Source-code docstring updates. Three comments updated to point at the wiki for sprint/CHANGES_LOG references (no functional change).
- docs/specs/license-decision.md removed as a duplicate of ADR-0008.
- ADRs stay in docs/adr/. Architectural decisions are immutable source history; they remain in the repo so forks ship with the architectural context.
Notes
- No new public API surface. No tests added or removed. No core dependencies changed. The MkDocs toolchain (mkdocs, mkdocs-material, mkdocs-autorefs) is in [dependency-groups] dev and does not affect the wheel.
- PyPI release paxman==1.0.0 is unchanged.
[1.0.2] - 2026-07-03
Patch release addressing all 7 open bugs filed against v1.0.0 in the v1.0.x
milestone. The release follows the project's Friday-patch policy (see the
milestone description): hotfixes land on main as soon as their PRs are
merged, and a tagged release is cut every Friday. No new public API surface,
no new core dependencies, and no behavior change for callers who did not
trigger the bug paths.
The only public-API delta is additive and backward-compatible:
paxman.register_adapter() and paxman.register_capability() gain
a keyword-only replace: bool = False parameter. The default
behavior (raises on conflict) is preserved.
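A minimal sketch of the keyword-only replace semantics, assuming an in-memory registry keyed by (id, version). Registry, Spec, and ConflictError are hypothetical stand-ins, not Paxman's internal registry; only the replace: bool = False signature and the raise-on-conflict default come from the entry above.

```python
from dataclasses import dataclass


class ConflictError(Exception):
    """Hypothetical error raised when an (id, version) key is already registered."""


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for a capability or adapter spec."""

    spec_id: str
    version: str


class Registry:
    """Hypothetical in-memory registry keyed by (id, version)."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], Spec] = {}

    def register(self, spec: Spec, *, replace: bool = False) -> None:
        # The bare "*" makes `replace` keyword-only: register(spec, True) is a TypeError.
        key = (spec.spec_id, spec.version)
        if key in self._entries and not replace:
            # Default behavior: refuse to overwrite an existing entry.
            raise ConflictError(f"{key} is already registered")
        # A single dict assignment swaps the old entry for the new one.
        self._entries[key] = spec

    def get(self, spec_id: str, version: str) -> Spec:
        return self._entries[(spec_id, version)]
```

Registering the same (id, version) twice raises ConflictError unless the second call passes replace=True; passing True positionally fails with TypeError because of the bare *.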
Fixed
- Issue #58 (Pydantic adapter): Optional[Annotated[T, ...]] is now accepted. Previously this raised UNSUPPORTED_FIELD_TYPE because _unwrap_annotated ran before _is_optional, so the inner Annotated was passed to _python_type_to_field_type, which has no Annotated branch. The fix swaps the call order: _is_optional runs first, then _unwrap_annotated runs on the (now possibly inner) type. Optional[Annotated[int, Field(ge=0)]] is the documented recommended pattern for Pydantic v2, and the equivalent Annotated[int, ...] | None and Union[Annotated[int, ...], None] forms are covered as well. Note: Pydantic v2 does not propagate the Field(ge=0) constraint into field_info.metadata when Optional wraps Annotated; the constraint is buried inside the Union's args. This is a Pydantic v2 limitation, not a Paxman one: the field is now accepted (fixing the previous failure mode), but the constraint is not preserved in V1.
- Issue #59: public register_capability() and register_adapter() now accept replace: bool = False. Previously the public functions did not forward the replace keyword that the internal registry functions support. Plugin authors using the documented public API could not re-register an existing capability or adapter; the only workaround was to import the private registry module, which violates the boundary rule. The new kwarg is keyword-only with a default of False (raise on conflict, preserving existing semantics). Calling register_capability(cap, replace=True) atomically replaces any existing entry with the same (id, version).
- Issue #60: ExecutionArtifact.capability_versions is now derived from the reconciled evidence set (single source of truth). Previously the field was built from the raw, pre-reconciliation evidence (cr.evidence), while artifact.evidence was built from the reconciled, merged evidence (rr.evidence_refs). The asymmetry could leave capability_versions with stale entries that triggered a false CapabilityNotFoundError during replay. When the same capability_id is seen with different versions across fields, a structlog WARNING is now emitted (event: capability_version_conflict) and the last-encountered version wins (last-write-wins, preserving pre-fix behavior). The replay check in artifact/replay.py is unchanged; it now sees a consistent capability_versions set. Note: the plan originally proposed emitting a Diagnostic with a new DiagnosticCode value; the implementation uses structlog instead because the DiagnosticCode enum is a closed V1 set (adding a new code requires an ADR per src/paxman/capabilities/result.py and PACKAGE_STRUCTURE.md §5.4 invariant #5). A dedicated DiagnosticCode for this case is tracked for V2.
- Issue #61 (Pydantic adapter): the float → DECIMAL conflation is now documented prominently. V1 has no separate FLOAT type, so the Pydantic adapter maps float to DECIMAL as a convenience. The downside: the Reconciler may apply money-specific logic (currency policy, FX) to float fields that the caller did not intend as money (probabilities, temperatures, ratios). A proper FLOAT type would require changes to the Reconciler, all 4 adapters, and existing artifacts, which is too large for a v1.0.x bugfix release. The limitation is now documented in three places: src/paxman/contract/adapters/pydantic.py (module docstring), docs/concepts/contracts.md (V1 field types section), and docs/reference/extending.md (§1.4, "What adapters MUST do"). Callers who hit this should use Decimal explicitly for money fields and accept the V1 conflation for other numerics.
- Issue #62 (Pydantic adapter): _is_optional() now uses types.UnionType identity instead of a fragile __name__ string comparison. The previous check, getattr(origin, "__name__", "") == "UnionType", depended on a CPython implementation detail: the __name__ attribute of types.UnionType. The correct, idiomatic check is origin is types.UnionType, which matches the documented behavior of typing.get_origin() for PEP 604 (X | None) syntax. The fix collapses the two near-identical branches in _is_optional into one. A regression test locks the fix in place: it injects a fake origin with __name__ == "UnionType" that is NOT the real types.UnionType and asserts that the identity check rejects it. This catches a regression to string-based matching more thoroughly than the originally proposed mock-patch test, which was unworkable on CPython 3.13+, where types.UnionType.__name__ is immutable.
- Issue #64: the Reconciler no longer imports the private _check_constraint from paxman.capabilities.v1.validation. The import violated the boundary rule stated in src/paxman/capabilities/v1/__init__.py:25-28 and docs/reference/package-structure.md §2 ("no subsystem may import paxman.capabilities.v1.* directly"). The fix extracts the helper to a new subsystem-internal module, paxman.validation.constraints, with byte-equivalent semantics (the same function body, no behavior change). The capabilities v1 package keeps a re-export shim (from paxman.validation.constraints import check_constraint as _check_constraint) so any third-party code that imported the private name still works. The reconciler now depends on the new public (internal-API) module. The boundary rule is now fully enforceable: grep -rn "from paxman.capabilities.v1" src/paxman/{contract,planner,executor,reconciler,artifact,api}/ returns no matches.
Bootstrap follow-up (post-Oracle review): the T3 fix (Issue #64) had a
side effect. The _bootstrap_v1_capabilities function in
src/paxman/capabilities/registry.py used a sys.modules.get(...)
short-circuit, which was a no-op unless something else had already
imported the v1 package. Before T3, the reconciler's import (the layer
violation) transitively loaded the v1 package and triggered lookup's
_register_on_import hook. After T3, no subsystem imports from
paxman.capabilities.v1.*, so the v1 module was never loaded by default,
leaving lookup unregistered and the planner producing empty field plans
for the goldens. The bootstrap was changed to actively call
importlib.import_module("paxman.capabilities.v1.lookup"), restoring the
implicit-availability behavior the bootstrap was designed for without
reintroducing the layer violation. No behavior change for users who were
not relying on implicit registration (the "opt out by not importing"
semantics are preserved). All 8 golden artifacts are stable; the
bootstrap fix is locked in by TestBootstrapV1Capabilities in
tests/unit/test_capability_spec_registry.py.
Notes
- No new public API symbols. The only public-API delta is additive: register_adapter and register_capability each gain a replace parameter (keyword-only, default False). tests/fixtures/public_api_snapshot.json has been regenerated to reflect this. The 29-symbol public API total is unchanged.
- No new core dependencies. No core pyproject.toml changes beyond the version bump and the per-file-ignores entry for the new test file.
- 23 new tests added across 4 new test files / classes: tests/unit/test_contract_pydantic_is_optional.py (8 tests for #62 and #58), tests/unit/api/test_register_replace.py via the existing test_api_registry.py (6 tests for #59), tests/integration/test_replay_integrity.py TestCapabilityVersionsConsistency (3 tests for #60), tests/unit/validation/test_constraints.py (15 tests for #64), tests/unit/test_capability_spec_registry.py TestBootstrapV1Capabilities (2 tests for the T3 regression fix). Total: 2380 → 2405 tests passing.
- The wheel artifact (paxman-1.0.2-py3-none-any.whl) is behaviorally compatible with 1.0.1 on the public surface for callers that do not use the new replace=True keyword; only the fixed code paths and the new replace parameter differ.
- PyPI release paxman==1.0.2.
[1.0.1] - 2026-07-01
Patch release addressing three critical user-reported bugs. No public
API surface change. No core dependencies changed. The wheel artifact
(paxman-1.0.1-py3-none-any.whl) is bit-for-bit compatible with
1.0.0 except for the fixed code paths.
Fixed
- Issue #52: JSON Schema dict inputs were silently mis-routed to the Dict DSL adapter (ADR-0011). paxman.normalize() and paxman.replay() now route a dict contract with JSON Schema structural markers ($schema, openapi, properties, $defs, etc.) to the JSON Schema adapter. Previously, any dict (including a valid JSON Schema document loaded via json.load()) was unconditionally routed to the Dict DSL adapter and failed with the misleading error "Dict DSL contract is missing required 'id' key". Detection is heuristic, but because the JSON Schema and Dict DSL surfaces are disjoint, it is unambiguous for all real contracts.
- Issue #56: OpenAPI 3.0 nullable: true was silently dropped. The OpenAPI adapter now translates the 3.0 nullable: true keyword to the 3.1 type: [type, "null"] form before delegating to the JSON Schema adapter. Previously, fields declared nullable: true in OpenAPI 3.0.x (the most widely deployed version) produced nullable=False in the resulting CanonicalContract, causing the Reconciler to silently reject None candidates. The translation is applied to every property in the schema and is idempotent for properties that already use the 3.1 list-type form.
- Issue #57: Pydantic nested BaseModel fields raised UNSUPPORTED_FIELD_TYPE. A field with a direct nested-model annotation (e.g. item: LineItem, where LineItem is a pydantic.BaseModel subclass) now correctly maps to FieldType.OBJECT via a new issubclass(annotation, pydantic.BaseModel) branch in _python_type_to_field_type(). This matches the adapter's existing docstring claim and the V1 design (the OBJECT field type is a passthrough in the Reconciler; nested schemas are not flattened in V1). The pre-existing list[BaseModel] → ARRAY mapping is unchanged.
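A minimal sketch of the 3.0 → 3.1 nullable rewrite from #56, assuming a flat object schema. translate_nullable is a hypothetical name, not the OpenAPI adapter's function; this sketch only visits top-level properties and leaves properties without a "type" key (for example a $ref) untouched.

```python
import copy
from typing import Any


def translate_nullable(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an OpenAPI 3.0 object schema with nullable: true rewritten
    to the 3.1 form type: [<type>, "null"]. The input dict is not mutated."""
    out = copy.deepcopy(schema)
    for prop in out.get("properties", {}).values():
        if not isinstance(prop, dict) or prop.get("nullable") is not True or "type" not in prop:
            # Not nullable, or nothing to extend (e.g. a $ref): left untouched here.
            continue
        del prop["nullable"]
        current = prop["type"]
        if isinstance(current, list):
            # Already a 3.1 list type: only add "null" if it is missing.
            if "null" not in current:
                current.append("null")
        else:
            prop["type"] = [current, "null"]
    return out
```

Running it a second time is a no-op, because rewritten properties no longer carry nullable and already list "null".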
Documentation
- Issue #54: normalize() docstring example replaced. The previous example passed a JSON Schema dict that failed at runtime (because of Issue #52). The example now uses a valid JSON Schema document with an explicit required list and produces Status.PARTIAL_SUCCESS (the honest result for a raw-text invoice with no V1 capabilities registered; the docstring no longer overpromises Status.SUCCESS). Also fixed a SyntaxWarning for an invalid \$ escape sequence.
- Issue #55: replay() docstring example replaced. Like #54, the previous example was invalid: it used an empty properties: {} that the JSON Schema adapter rejects. The new example uses a valid non-empty JSON Schema document and is marked # doctest: +SKIP because the call to replay() requires a complete artifact from normalize().
Notes
- No new public API surface. No public symbols added or removed. No core dependencies changed.
- 13 new tests added (8 for the JSON Schema detection heuristic, 5 for the OpenAPI nullable translation, 2 for the Pydantic BaseModel mapping). Total: 2352 → 2365 tests passing.
- The wheel artifact (paxman-1.0.1-py3-none-any.whl) is bit-for-bit compatible with 1.0.0 on the public surface; only the three code paths above differ.
- PyPI release paxman==1.0.1.
[1.0.0] - 2026-06-27
Added - Sprint 8 (Documentation + Community + CI Hardening)
- docs/concepts/: 5 concept documents covering the V1 mental model:
  - contracts.md: the contract-driven design, the 4 V1 formats, the CanonicalContract internal model, MONEY as a first-class type, contract policies, common pitfalls.
  - capabilities.md: the 5 V1 capabilities (text_extraction, regex_extraction, lookup, inference, validation), the Capability SPI, CapabilitySpec metadata, the cost model, the capability registry, boundary rules.
  - planning.md: the field-centric planner, the 7-step heuristic chain, scoring (per docs/specs/capability-cost-model.md §4), budget and policy gates, the effective-policy model, determinism.
  - reconciliation.md: the three truth layers (Contract/Candidate/Resolved), the merge and conflict logic, confidence assignment (V1 rubric, fixed), MONEY reconciliation with CurrencyPolicy, the UNRESOLVED case, boundary rules.
  - replay.md: the replay hash, the replay protocol, version compatibility, determinism guarantees, the replay API, golden artifacts.
  - MIGRATION_GUIDE.md: skeleton migration guide (V2 will fill in worked examples for LlamaIndex, LangChain, Unstructured, …).
- docs/howto/: 4 quick-start how-tos:
  - add_adapter.md: adding a new contract adapter (5-minute checklist).
  - add_capability.md: adding a new capability (5-minute checklist).
  - add_inference_provider.md: adding a new inference provider (OpenAI, Anthropic, local, …).
  - replay_artifact.md: using paxman.replay() end-to-end.
- Community files: CONTRIBUTING.md (full contribution workflow + ADR-driven process), CODE_OF_CONDUCT.md (Contributor Covenant v2.1).
- GitHub templates: .github/ISSUE_TEMPLATE/bug_report.md, .github/ISSUE_TEMPLATE/feature_request.md, .github/PULL_REQUEST_TEMPLATE.md.
- CI hardening:
  - pyrightconfig.json (per PACKAGE_STRUCTURE.md §17.2).
  - .github/workflows/ci.yml: the 9 CI checks (lint, format, mypy, pyright, import-linter, interrogate, bandit, pip-audit, test-cov) are wired in. pyright, bandit, and pip-audit run as advisory jobs; interrogate runs as a required check (100% on the public surface).
  - Makefile: make ci runs the full local-CI pipeline (9 checks).
  - import-linter: all 6 subsystem contracts in pyproject.toml enforce the module DAG (already present; verified by make imports).
- README updates: badges (CI status, license, Python versions), quickstart verified end-to-end, expanded "What Paxman is NOT", new "When to use Paxman" vs "When to wrap Paxman" section, links to the new docs/concepts/ and docs/howto/.
- Documentation links: docs/concepts/, docs/howto/, CONTRIBUTING.md, CODE_OF_CONDUCT.md, and the GitHub templates are linked from README.md.
- Branch protection: main is protected; CI must pass before merge. Documented in CONTRIBUTING.md §6.
Notes (Sprint 8)
- No new public API surface. All changes are documentation, tooling, and CI. The public API snapshot (tests/fixtures/public_api_snapshot.json) is unchanged.
- No new core dependencies. All CI tooling (pyright, interrogate, bandit, pip-audit) is in the [dependency-groups] dev block, not the runtime [project.dependencies] block.
- No new ADRs. Sprint 8 is documentation + CI; no architectural changes.
- All 26 sprint deliverables (D8.1–D8.26) shipped, per the Sprint 8 spec.
Changed
- Cost pipeline switched from float to Decimal (per ADR-0010 and the new Sprint 7+ intervention plan). The project's "MONEY is Decimal, never float" directive (ADR-0004) is now reflected end-to-end through the cost pipeline:
  - Budget.max_total_cost_usd: float | None → Decimal | None (src/paxman/budget.py:45).
  - CostHint.usd: float → Decimal (src/paxman/capabilities/spec.py:79).
  - BudgetTracker.total_cost_usd: float → Decimal; record(cost_usd=...), would_exceed(cost_usd=...), and would_exceed_reason(cost_usd=...) accept Decimal (src/paxman/executor/budget_tracker.py:98,108,146,178). The + 1e-9 nudge in mark_exhausted is removed (the strict > comparison no longer needs it).
  - ExecutionState.total_cost_usd: float → Decimal; the cost = float(cost_usd) coercion is removed (src/paxman/executor/execution_state.py:93,105,122).
  - planner/policies.estimated_chain_cost returns Decimal; budget_excludes_inference's < 0.001 comparison uses Decimal("0.001") (src/paxman/planner/policies.py:110-127,172).
  - The budget_tracker.py:25-30 "Future sprints may switch to decimal.Decimal" comment is deleted; the switch has happened.
  - Statistics.total_cost_usd: Decimal (src/paxman/artifact/statistics.py:97) and CapabilityStats.total_cost_usd: Decimal (src/paxman/artifact/statistics.py:40) are no longer aspirational; the upstream pipeline now feeds them the right type.
  - Policy.confidence_floor: float is unchanged: it is a probability in [0.0, 1.0], not money. This is a defensible exception (per src/paxman/contract/_types.py:355-360).
  - The score_capability return type is unchanged (float): the score is a sortable rank, not money, and the V1 weight table (TIER_WEIGHT=10000, USD_WEIGHT=1000000, MS_WEIGHT=1) is calibrated for float.
- Backward compatibility: Budget(max_total_cost_usd=0.10) (a float literal) and CostHint(usd=0.001) continue to work because the constructors accept float | int | Decimal and coerce to Decimal via attrs.field(converter=...). All 14+ test files with literal-float budget constructions pass unchanged.
Added
- ADR-0010: Budget, CostHint, BudgetTracker, and ExecutionState switched from float to Decimal. Extends ADR-0004.
- Sprint 7+ intervention plan: the 1-week, 1-engineer intervention that operationalizes the Decimal switch.
- tests/integration/cross_subsystem/test_budget_decimal_roundtrip.py (new): verifies that paxman.normalize(...) with Budget(max_total_cost_usd=Decimal("0.10")) produces the same artifact as Budget(max_total_cost_usd=0.10). Locks the backward-compatibility contract.
- tests/unit/test_budget.py::test_budget_accepts_float_literal_for_cost (new): asserts Budget(max_total_cost_usd=0.10).max_total_cost_usd == Decimal("0.10"). Locks the constructor coercion.
Fixed
- The Statistics.total_cost_usd: Decimal declaration in src/paxman/artifact/statistics.py:97 was previously aspirational: no production code path produced a non-default Decimal value. After this change, the type is enforced end-to-end (the Budget → BudgetTracker → ExecutionState → field_runner chain now produces Decimal for the artifact's total_cost_usd).
- The + 1e-9 float-nudge hack in mark_exhausted (src/paxman/executor/budget_tracker.py:293), an artifact of the float type, is removed. The strict > comparison works cleanly with Decimal.
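Why the strict > check needs no tolerance once costs are Decimal: a hypothetical minimal tracker, not Paxman's BudgetTracker. Only record and total_cost_usd mirror names from the entry; is_exhausted stands in for the real mark_exhausted logic, and the rest is assumed.

```python
from decimal import Decimal


class BudgetTracker:
    """Hypothetical minimal tracker; internals are assumed for illustration."""

    def __init__(self, cap_usd: Decimal) -> None:
        self.cap_usd = cap_usd
        self.total_cost_usd = Decimal("0")

    def record(self, cost_usd: Decimal) -> None:
        # Decimal addition of "0.1" and "0.2" is exact, so no drift accumulates.
        self.total_cost_usd += cost_usd

    def is_exhausted(self) -> bool:
        # Strict ">" with no epsilon: spending exactly the cap is still within budget.
        return self.total_cost_usd > self.cap_usd


tracker = BudgetTracker(cap_usd=Decimal("0.3"))
tracker.record(Decimal("0.1"))
tracker.record(Decimal("0.2"))
```

With floats, the same two costs sum to 0.30000000000000004, so 0.1 + 0.2 > 0.3 is True and a strict comparison would flag a budget that was spent exactly to the cap.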
Notes
- No golden artifacts regenerated. The 8 tests/fixtures/artifacts/*.json files do not store budget data; the replay hash is unchanged (verified by tests/integration/test_golden_artifacts.py).
- No paxman_version bump. The JSON-serialization equivalence of float(0.10) and Decimal("0.10") means the artifact wire format is unchanged.
- No new public API surface. Budget and CostHint are the same symbols; only their internal type changed. The public API snapshot (tests/fixtures/public_api_snapshot.json) is unchanged.
- No ADR changes required beyond ADR-0010. The existing ADR-0004 ("MONEY as a First-Class Type") is the philosophical foundation; ADR-0010 is the operational extension. The "Future sprints may switch" caveat in budget_tracker.py:25-30 is closed in the same commit.
Added
- Initial project skeleton (src/paxman/, src-layout, py.typed PEP 561 marker).
- Build infrastructure: pyproject.toml (PEP 621, hatchling backend), Makefile, .pre-commit-config.yaml, .gitignore, LICENSE (MIT per ADR-0008), CHANGELOG.md.
- Cross-cutting modules (no subsystem code yet):
  - paxman.errors: 17-class PaxmanError hierarchy per ARCHITECTURE.md §6.2.
  - paxman.types: Status, ConfidenceBand, FieldType enums.
  - paxman.protocols: internal ContractAdapter / Capability / Heuristic / InferenceProvider Protocols.
  - paxman.versioning: PAXMAN_VERSION / PLANNER_VERSION constants + helpers.
  - paxman.logging: structlog factory (no timestamps in the replay path).
  - paxman.budget: Budget / Policy / CurrencyPolicy attrs frozen models.
  - paxman.clock: injectable Clock protocol + FakeClock test fixture.
  - paxman.ids: prefixed ID helpers (field_, cap_, art_, plan_).
  - paxman.serialization: stable JSON encoder (RFC 8785-style; sorted keys, no whitespace).
- Test infrastructure: tests/conftest.py (markers + fixtures), tests/test_smoke.py (33 tests), tests/unit/test_errors.py (132 tests, 17 classes × multiple paths), tests/unit/test_versioning.py (31 tests, 100% coverage), tests/unit/test_budget.py, tests/unit/test_clock.py, tests/unit/test_ids.py, tests/unit/test_logging.py, tests/unit/test_protocols.py, tests/unit/test_serialization.py, tests/unit/test_types.py. 395 tests, 96.31% coverage.
- GitHub Actions CI workflow on main and PRs (Python 3.11 / 3.12 / 3.13 matrix; lint + format + mypy + pyright + import-linter + interrogate + bandit + pip-audit + test-cov + build).
- make ci runs the full local-CI pipeline end-to-end (install → lint → format → typecheck → typecheck-pyright → imports → test-cov). All 7 gates are green.
- README developer setup section with a uv sync --all-extras --dev step and an import paxman; print(paxman.__version__) smoke check.
- Sprint 2: Contract subsystem (per Sprint 2: Contract subsystem):
  - paxman.contract._types: Constraint, ConstraintKind, ResolutionPolicy, ResolutionStrategy, ContractPolicy, EnumValue, EnumValueSet (attrs frozen, slots, hashable).
  - paxman.contract.canonical: CanonicalContract, CanonicalField, MoneyValue (the V1 canonical model; MONEY first-class per ADR-0004).
  - paxman.contract.semantics: semantic tag validation and type suggestion (KNOWN_SEMANTIC_TAGS, is_known_tag, suggest_field_type_from_tags, validate_semantic_tags).
  - paxman.contract.validator: validate_canonical_contract, validate_canonical_field (raise UnsupportedFieldTypeError, InvalidConstraintError, InvalidPathError, InvalidSemanticTagError per the documented error model).
  - paxman.contract.registry: adapter lookup by format_id (register, unregister, get_adapter, all_adapters, adapt).
  - paxman.contract.adapters.base: concrete ContractAdapter Protocol (the SPI).
  - paxman.contract.adapters.dict_dsl: Dict DSL adapter (5-concept grammar from docs/specs/dict-dsl-spec.md; 22 documented error_code values per docs/specs/dict-dsl-spec.md §7).
  - paxman.contract.adapters.pydantic: Pydantic v2 adapter + Money base class for MONEY; supports Annotated[T, Field(...)], min_length/max_length/pattern, ge/gt/le/lt, Literal enums, default_factory.
  - paxman.contract.adapters.json_schema: JSON Schema draft 2020-12 adapter with best-effort support for earlier drafts; x-paxman-type: MONEY extension for MONEY representation.
  - Fixture contracts: tests/fixtures/contracts/pydantic/{invoice,with_money,all_v1_types}.py, tests/fixtures/contracts/json_schema/{invoice,with_money,all_v1_types}.json, tests/fixtures/contracts/dict_dsl/{invoice,with_money,all_v1_types}.py (3 + 3 + 3 paired fixtures, per D2.10).
  - Property tests for the Pydantic + Dict DSL roundtrip (Hypothesis property-based tests with derandomize=True).
  - import-linter contract: paxman.contract and paxman.contract.adapters may NOT import from any of paxman.{planner,executor,reconciler,artifact,capabilities,api}.
- Sprint 3: Planner + 3 capabilities (per Sprint 3: Planner & capabilities):
  - Capabilities subsystem (src/paxman/capabilities/):
    - paxman.capabilities.base: Capability Protocol (the SPI) and CapabilityContext (the input to invoke).
    - paxman.capabilities.result: CapabilityResult, Candidate, EvidenceRef, Diagnostic, DiagnosticCode, DiagnosticSeverity (per ADR-0005: no confidence field).
    - paxman.capabilities.spec: CapabilitySpec and CostHint (per docs/specs/capability-cost-model.md §2; V1 weights from §4.3).
    - paxman.capabilities.registry: versioned registry with register, unregister, get, get_latest, all_capabilities, reset (the only entry point to V1 capabilities; per PACKAGE_STRUCTURE.md §2).
    - paxman.capabilities.v1.text_extraction: text/plain + text/html (per the Sprint 3 risk register; PDF/OCR is V2); TextExtractionProvider SPI + StubTextExtractionProvider.
    - paxman.capabilities.v1.regex_extraction: ECMAScript regex with named groups (per the Sprint 3 spec); rejects duplicate named groups (V1 simplification).
    - paxman.capabilities.v1.validation: type/range/regex/enum/ISO-4217 constraint checks; the bool-as-int trap is rejected.
    - paxman.capabilities.v1.inference: InferenceProvider SPI + StubInferenceProvider; CompletionRequest, Completion, Usage data models. V1 has no real provider.
  - Planner subsystem (src/paxman/planner/):
    - paxman.planner.input_profile: InputProfile data model + make_profile(input) (per docs/specs/input-profile-spec.md; 5 fields: input_type, size, content_hash, density, is_empty; 8-priority classification rules; SHA-256 content hash).
    - paxman.planner.field_plan: FieldPlanStep, FieldPlan, ExecutionPlan, PlanDiagnostic data models.
    - paxman.planner.scoring: score_capability per docs/specs/capability-cost-model.md §4.2 (tier × TIER_WEIGHT=10000 + usd × USD_WEIGHT=1000000 + ms × MS_WEIGHT=1).
    - paxman.planner.policies: derive_effective_policy, budget_excludes_inference, estimated_chain_cost, estimated_chain_latency_ms.
    - paxman.planner.heuristics: the 7-step heuristic chain (per ARCHITECTURE.md §4.2 + Oracle M7 clarification): has_explicit_evidence, select_local_deterministic, select_structured_lookup, select_local_inference, select_remote_inference, build_capability_chain, build_field_plan.
    - paxman.planner.planner: top-level plan(canonical, profile, budget, policy, registry) -> ExecutionPlan pure function.
    - paxman.planner._registry: internal handle to the global capability registry.
  - Test infrastructure:
    - tests/unit/test_capability_result.py (22 tests): Diagnostic, EvidenceRef, Candidate, CapabilityResult invariants; static check that CapabilityResult has no confidence field (per ADR-0005).
    - tests/unit/test_capability_spec_registry.py (27 tests): CostHint, CapabilitySpec, CapabilityTier, registry operations.
    - tests/unit/test_capability_regex_extraction.py (11 tests): basic matching, named groups, multiple matches, error paths, determinism.
    - tests/unit/test_capability_validation.py (31 tests): type/range/regex/enum/ISO-4217 checks; bool-as-int trap.
    - tests/unit/test_capability_text_extraction.py (12 tests): text/plain + text/html; provider SPI; unsupported content type.
    - tests/unit/test_capability_inference.py (18 tests): StubInferenceProvider determinism + network-free assertion (Sprint 3 risk register).
    - tests/unit/test_planner_input_profile.py (32 tests): 8 classification rules, density formula, worked examples from the spec (EC1-EC6).
    - tests/unit/test_planner_field_plan.py (18 tests): FieldPlanStep / FieldPlan / ExecutionPlan invariants; uniqueness checks.
    - tests/unit/test_planner_scoring_policies.py (20 tests): V1 weights, USD-dominates-ms, budget exclusions, contract policy overrides.
    - tests/unit/test_planner_heuristics_planner.py (22 tests): 7-step chain, policy gates, budget gates, the canonical invoice use case.
    - tests/property/test_planner_determinism.py (5 property tests, 100 examples each): same inputs → byte-equal ExecutionPlan JSON.
  - Documentation: docs/concepts/planning.md (skeleton; will be filled in Sprint 8).
  - import-linter contracts: planner/ and capabilities/ may NOT import from any of executor/, reconciler/, artifact/, or api/.
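The score_capability formula listed under paxman.planner.scoring can be written out directly. CostEstimate is a hypothetical stand-in for a capability's tier and CostHint, tier is assumed to be a small integer, and sorting ascending assumes a lower score is preferred; none of these details are confirmed above.

```python
from dataclasses import dataclass

# V1 weights as listed for paxman.planner.scoring above.
TIER_WEIGHT = 10_000
USD_WEIGHT = 1_000_000
MS_WEIGHT = 1


@dataclass(frozen=True)
class CostEstimate:
    """Hypothetical stand-in for a capability's tier plus its CostHint."""

    name: str
    tier: int
    usd: float
    ms: float


def score(c: CostEstimate) -> float:
    # tier × TIER_WEIGHT + usd × USD_WEIGHT + ms × MS_WEIGHT, as a float rank (not money).
    return c.tier * TIER_WEIGHT + c.usd * USD_WEIGHT + c.ms * MS_WEIGHT


candidates = [
    CostEstimate("remote_inference", tier=2, usd=0.002, ms=800),
    CostEstimate("regex_extraction", tier=0, usd=0.0, ms=5),
]
# Assumption: lower score = preferred, so sort ascending.
ranked = sorted(candidates, key=score)
```

With these weights, $0.001 of estimated cost (1,000 points) already outweighs 999 ms of latency, which is the USD-dominates-ms property named in the Sprint 3 test list.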
Fixed
- .github/workflows/ci.yml: replaced 3 fabricated SHA pins with real, verified commit SHAs so GitHub Actions can resolve actions/checkout, astral-sh/setup-uv, and codecov/codecov-action. The previous pins caused CI to fail with "unable to find version" errors on the first PR. Verified via gh api repos/<owner>/<repo>/commits/<sha> that each SHA corresponds to a real commit:
  - actions/checkout → 34e114876b0b11c390a56381ad16ebd13914f8d5 (v4)
  - astral-sh/setup-uv → d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 (v5)
  - codecov/codecov-action → b9fd7d16f6d7d1b5d2bec1a2887e65ceed900238 (v4)
Notes
- The package is at version
0.0.0and is not importable by end users beyondpaxman.__version__. No public API is exposed yet. Thepaxman.normalize()andpaxman.replay()entry points land in Sprint 6. - License is MIT per ADR-0008 (decided in Sprint 0). Apache-2.0 is the documented alternative if patent concerns emerge.
structlogis in core dependencies (3 packages total:attrs,typing-extensions,structlog) per Sprint 0 CHANGES_LOG §6 Q8 recommendation, resolving the open question.- All 14 Sprint 1 exit criteria met (verified via
make ci). - Sprint 2 exit criteria status (11/11 met):
paxman.contract.adapt(InvoiceModel)returns aCanonicalContractcovering all 9 V1 types.- Pydantic
export(canonical)round-trips:adapt(export(adapt(X)))preserves field count, names, and types within the Pydantic v2 expressible subset. - Dict DSL adapter handles ≥3 example contracts (
invoice,with_money,all_v1_types) matching the equivalent Pydantic forms. - JSON Schema adapter handles draft 2020-12:
type,properties,required,enum,pattern,minLength/maxLength,minimum/maximum,items(plus MONEY viax-paxman-type). - Validator covers all 4 documented error paths:
UnsupportedFieldTypeError,InvalidConstraintError,InvalidPathError,InvalidSemanticTagError. - Coverage on
contract/≥ 90 % lines (target met; seemake test-cov). mypy --strict src/paxman/contractclean (0 errors across 7 source files).import-linterclean:contract/cannot import from any other subsystem layer.- Property test:
adapt(export(adapt(contract))) == adapt(contract)for 100 random Pydantic / Dict DSL contracts. interrogate src/paxman/contractreports 100 % on the public surface.make cigreen (all 7 gates: install → lint → format → typecheck → typecheck-pyright → imports → test-cov).- Sprint 3 exit criteria status (15/15 met):
    - `planner.plan(...)` is a pure function (no clock, no random, no I/O).
    - Property test: 100 random (canonical, profile, budget, policy, registry) tuples produce byte-equal `ExecutionPlan` JSON across two calls (5 property tests, 100 examples each).
    - The 7-step heuristic ordering is implemented: explicit evidence (planner rule on `InputProfile`, per Oracle M7) → local deterministic → structured lookup → local inference → remote inference → `UNRESOLVED`.
    - The Planner excludes remote inference when `Policy.allow_remote_inference=False` (heuristic step 6 dropped).
    - The Planner excludes local inference when `Policy.allow_local_inference=False` (heuristic step 5 dropped).
    - `text_extraction` capability handles `text/plain` and `text/html` inputs (≥1 unit test each).
    - `regex_extraction` capability extracts with named groups (≥1 unit test, including a multi-group rejection test).
    - `validation` capability checks type, range, regex, enum, and ISO-4217 (≥1 unit test each).
    - `CapabilityResult` does NOT have a `confidence` field (static test using `hasattr`/`getattr`).
    - `CapabilityResult.candidates` are returned with `value` (not yet `confidence`).
    - Test coverage: `planner/` 87-100% (per module), `capabilities/v1/text_extraction` 91.5%, `capabilities/v1/regex_extraction` 96.2%, `capabilities/v1/validation` 93.6% (all ≥ 85% target).
    - `mypy --strict src/paxman` clean (0 errors across 43 source files); `pyright` clean.
    - `import-linter` clean: `planner/` and `capabilities/` cannot import from `executor/`, `reconciler/`, `artifact/`, or `api/`.
    - `make ci` green (all 7 gates, 1057 tests, 93.76% coverage).
    - `docs/concepts/planning.md` exists as a skeleton (will be filled in Sprint 8).
- Sprint 4 exit criteria status (14/14 met):
    - `executor.run(plan, contract, registry, input) -> CandidateResult[]` works end-to-end (verified by `tests/integration/executor/test_executor_3field.py` and `tests/unit/executor/test_executor.py`).
    - Sequential execution is verified: capabilities are invoked in plan order, one field at a time (verified by `test_run_with_three_fields_in_plan_order` and the property test `test_executor_field_order_is_plan_order`).
    - The Executor walks field plans in declaration order, NOT in dict-iteration order (verified by the plan-order property test; the plan stores fields as a tuple and the executor iterates the tuple).
    - The Executor short-circuits when `Budget.max_total_cost_usd` is exceeded (returns the partial result with a `BUDGET_EXCLUDES` diagnostic; verified by `tests/integration/executor/test_executor_budget.py`, whose 4 tests cover the pre-loop gate, the mid-chain gate, the no-budget passthrough, and the no-cap-fits case).
    - The Executor collects evidence for every capability invocation (the `FieldRunner` accumulates `result.evidence` and `candidate.evidence_refs` into `state.evidence`; verified by `test_evidence_is_collected_in_state`).
    - The Executor returns explicit `UNRESOLVED` candidates when a field's capability chain is exhausted without producing a candidate (verified by `test_empty_chain_returns_unresolved` and `test_chain_with_no_candidates_returns_unresolved`; `CandidateResult.status` is auto-derived from `candidates`).
    - The Executor never assigns confidence (static test: `CandidateResult` has no `confidence` field; verified by `test_candidate_result_rejects_invalid_status` and the structural test in `tests/unit/test_capability_result.py`).
    - `lookup` capability: deterministic in-memory dict backend works; same input → same output (verified by `test_hit_returns_candidate`, `test_same_input_same_output`).
    - `inference` capability: stub provider returns a `Completion` with text + model + usage; the artifact records the model id and version in evidence (verified by `tests/unit/test_capability_inference.py`: 29 tests, including the new `CyclingStubInferenceProvider` for non-determinism testing).
    - OpenAPI adapter: at least the `petstore_3_0.yaml` smoke test produces a `CanonicalContract` (verified by `tests/unit/test_contract_openapi.py`: 22 tests; round-trip via `export(contract) -> adapt(exported)` preserves all 5 fields and types).
    - Test coverage on `executor/` ≥ 90% (achieved: `executor.py` 96.4%, `field_runner.py` 93.4%, `budget_tracker.py` 95.1%, `context.py` 100%, `early_stop.py` 100%, `evidence.py` 92.4%, `execution_state.py` 94.0%); on `capabilities/v1/{lookup,inference}.py` ≥ 85% (achieved: `lookup.py` 100%, `inference.py` 94.8%); `contract/adapters/openapi.py` 82.7% (below 90%, but covers all 19 documented reject paths and the round-trip).
    - `mypy --strict src/paxman/{executor,capabilities/v1,contract/adapters/openapi}` clean (0 errors across 52 source files; the full `src/paxman` is also clean).
    - `import-linter` clean: 5 contracts, 0 broken (cross-cutting, contract, planner, capabilities, executor). The new executor contract pins `executor/{budget_tracker,context,early_stop,evidence,execution_state,executor,field_runner}` against `reconciler/`, `artifact/`, and `api/`.
    - `make ci` green (all 7 gates, 1225 tests, 94.00% coverage).
- Sprint 3 — Post-review fixes (Oracle review of code-review bot):
    - `paxman.capabilities.registry.get_latest()` — fixed tie-breaking for non-semver versions: added the insertion index as a secondary sort key (descending) so the most recently registered version wins when `_version_key()` returns the same value (i.e., all non-semver versions).
    - `paxman.capabilities.registry.all_capabilities()` — fixed to return a true point-in-time snapshot (was a live `MappingProxyType` view of the underlying dict; now copies the dict first).
    - `paxman.planner.planner.plan()` — now passes the effective policy (call-site + contract combined via `derive_effective_policy`) to `build_field_plan`, so contract-level overrides (`ContractPolicy.confidence_floor`, etc.) are honored. Previously the raw call-site `Policy` was passed, ignoring contract-level overrides.
    - `paxman.planner.heuristics.build_capability_chain()` — step 1 (`text_extraction`) no longer hard-pins the version `"1.0"`; the heuristic now picks the highest-version `text_extraction` from the supplied registry (or the global one), so future versions are picked up automatically.
    - `paxman.capabilities.v1.text_extraction` — added a `callable()` check alongside `hasattr()` so a non-callable `extract` attribute (e.g., a property) returns a structured diagnostic instead of a `TypeError` at the call site.
    - `paxman.capabilities.v1.inference` — added an empty-prompt check in `CompletionRequest.__attrs_post_init__` to match the documented contract.
    - `paxman.planner.field_plan.ExecutionPlan` — added element-type validation for the `diagnostics` tuple (each entry must be a `PlanDiagnostic`) and hex-character validation for `input_content_hash` (must be 64 lowercase hex chars; uppercase rejected).
    - `paxman.planner.field_plan.FieldPlanStep.config` — now wrapped in `types.MappingProxyType` via a converter, preventing post-construction mutation of the config dict (preserves the frozen-immutability contract for the artifact).
    - `paxman.serialization` — taught `_default()` to serialize `types.MappingProxyType` (used by the new frozen `FieldPlanStep.config`).
    - `paxman.capabilities.__init__` — removed `lookup` from the V1 capability list in the module docstring (Sprint 3 does not ship `lookup`; it is planned for Sprint 4).
- Sprint 4 — Executor + 2 Capabilities + OpenAPI Adapter (per Sprint 4 — Executor & capabilities):
    - Executor subsystem (`src/paxman/executor/`):
        - `paxman.executor.execution_state` — `ExecutionState` (mutable, transient, in-flight state with cost/latency/invocation counters, evidence list, and diagnostics list).
        - `paxman.executor.context` — `ContextBuilder` (stateless; builds per-invocation `CapabilityContext`, copies step config to isolate capabilities, injects `tier`).
        - `paxman.executor.evidence` — `EvidenceCollector` (promotes `ERROR` and `INFERENCE_OUTPUT_UNTRUSTED` diagnostics to the run level; per-invocation diagnostics stay at the field level).
        - `paxman.executor.budget_tracker` — `BudgetTracker` (tracks cost / latency / invocations; `would_exceed_reason` simulates before recording; `mark_exhausted` flips the gate after a short-circuit; `from_budget` factory).
        - `paxman.executor.early_stop` — V1 chain-exhaustion-only policy (`StopDecision.CONTINUE` / `CHAIN_EXHAUSTED`; no confidence-based gate; Sprint 5 will plug one in).
        - `paxman.executor.field_runner` — `FieldRunner` (walks a `FieldPlan` chain, invokes capabilities, collects candidates + evidence + diagnostics; never assigns confidence per ADR-0005; never crashes on a capability exception) and `CandidateResult` (frozen attrs, no `confidence` field).
        - `paxman.executor.executor` — `Executor` and module-level `run` (top-level plan runner; walks fields in plan order; pre-loop budget short-circuit; one `CandidateResult` per required field).
    - Capabilities — final 2 of V1 (`src/paxman/capabilities/v1/`):
        - `paxman.capabilities.v1.lookup` — V1 `lookup` capability (deterministic in-memory dict backend; per the Sprint 4 risk register hard cap: in-memory only, no vector search; supports a `case_sensitive` toggle; tier `STRUCTURED_LOOKUP`).
        - `paxman.capabilities.v1.inference` — added `CyclingStubInferenceProvider` (per the Sprint 4 risk register: a test-only stub that cycles through 3 fixed vendor names, "ACME Corp" / "Globex Industries" / "Initech LLC", to simulate the non-determinism of a real provider; counts prompt + completion token usage; exposes `call_count` and `reset()` for test ergonomics). The default `StubInferenceProvider` is unchanged.
    - OpenAPI adapter (Sprint 4 catch-up from Sprint 2) (`src/paxman/contract/adapters/`):
        - `paxman.contract.adapters.openapi` — `OpenApiAdapter` (best-effort OpenAPI 3.x adapter; supports `3.0.x` and `3.1.x`; delegates per-property parsing to the JSON Schema adapter; recursive `$ref` inlining with cycle detection; rejects the keywords deferred to V2, `oneOf`/`anyOf`/`allOf`/`discriminator`, with `UNSUPPORTED_OPENAPI_FEATURE`; self-registers on import).
        - `tests/fixtures/contracts/openapi/petstore_3_0.yaml` — vendored Pet Store 3.0.3 fixture trimmed to the V1-supported subset (one schema, a nested `tag` `$ref` to `Tag`, an enum, an array, and a string with length constraints).
    - Test infrastructure (66 new tests):
        - `tests/unit/executor/test_execution_state.py` (13 tests) — counters, marker methods, type validation.
        - `tests/unit/executor/test_budget_tracker.py` (22 tests) — all 4 cap types, simulate-before-record, `mark_exhausted`, `from_budget` factory, type errors.
        - `tests/unit/executor/test_early_stop.py` (7 tests) — `StopDecision`, `next_step`.
        - `tests/unit/executor/test_context.py` (10 tests) — config copy semantics, `tier` injection, type validation.
        - `tests/unit/executor/test_evidence.py` (9 tests) — promotion policy (ERROR + INFERENCE_OUTPUT_UNTRUSTED only).
        - `tests/unit/executor/test_field_runner.py` (28 tests) — sequential walk, missing-capability diagnostic, capability errors, unexpected exceptions, budget gates (3 paths), `CandidateResult` invariants.
        - `tests/unit/executor/test_executor.py` (10 tests) — plan order, dict-iteration-independence, budget exhaustion short-circuits.
        - `tests/integration/executor/test_executor_3field.py` (2 tests) — 3-field plan end-to-end (D4.13).
        - `tests/integration/executor/test_executor_budget.py` (4 tests) — short-circuit on `max_total_cost_usd` (D4.15).
        - `tests/property/test_executor_determinism.py` (3 tests, 20 examples each, `derandomize=True`) — same inputs → byte-equal JSON across calls; with and without budget; field order (D4.14).
        - `tests/unit/test_capability_lookup.py` (14 tests) — hit, miss, case sensitivity, malformed config, determinism (D4.16).
        - `tests/unit/test_capability_inference.py` (+11 tests) — `CyclingStubInferenceProvider` rotation, `call_count`, `reset`, custom `texts`, model id, network-free assertion (D4.17).
        - `tests/unit/test_contract_openapi.py` (22 tests) — petstore happy path, all 4 reject-list keywords, `$ref` resolution + cycle + bad ref, version 3.0.x and 3.1.x, malformed config, export round-trip (D4.18).
    - import-linter contracts (D4.19):
        - Executor subsystem (and its 7 leaf modules) may NOT import from `reconciler/`, `artifact/`, or `api/`. Verified by `make imports`: 5 contracts, 0 broken.
    - Documentation:
        - `paxman.executor.__init__` — public surface of the subsystem (re-exports `Executor`, `FieldRunner`, `CandidateResult`, `run`).
        - `paxman.capabilities.__init__` — updated the V1 capability list to include `lookup` and the cycling stub.
        - `paxman.capabilities.v1.__init__` — self-imports the v1 modules (triggers `_register_on_import`).
    - Post-review fixes (this sprint's own code review):
        - `paxman.executor.budget_tracker` — added `would_exceed_reason` (a counterfactual gate that returns the cap that would be exceeded) and `mark_exhausted` (forces the gate into the "exceeded" state from the FieldRunner's pre-step short-circuit; needed so the Executor's pre-loop gate sees the short-circuit).
        - `paxman.executor.evidence` — dropped the unused `step: typing.Any` parameter from `collect` (it was reserved for provenance that was never used; ruff `ANN401` flagged it).
        - `paxman.executor.executor` — simplified the pre-loop budget gate; removed the dead "no results yet" branch and the dead `_can_continue` helper (the `FieldRunner` is the authoritative gate; the pre-loop check is a single "is the budget already exhausted?" gate).
        - `paxman.executor.field_runner` — replaced `assert budget_tracker is not None` (blocked by ruff `S101`) with a `pragma: no cover` defensive raise, which keeps the `None`-narrowing that mypy needs.
        - `paxman.executor.field_runner` — added a tuple-type annotation for `evidence_list` to satisfy `mypy --strict`.
        - `paxman.executor.execution_state` — added docstrings to both `typing.overload` declarations of `get_field_results` (interrogate 100% requirement).
- Sprint 7 — Integration, Property Tests, Golden Artifacts, `paxman.testing` (per Sprint 7 — Integration & property tests):
    - `paxman.testing` public module (D7.1) — 7 public Hypothesis strategies for downstream tests: `contracts()`, `inputs()`, `budgets()`, `policies()`, `registries()` (with an `install_registry` context manager), `candidate_sets()`, `artifacts()`. ENUM fields are populated with a valid `EnumValueSet` so the strategy always produces valid `CanonicalField` instances. Exit criterion #10 (`from paxman.testing import contracts, inputs, budgets, policies, registries`) verified.
    - Golden `ExecutionArtifact` JSON fixtures (D7.3) — 8 goldens bootstrapped from real `paxman.normalize()` runs (exit criterion #2, ≥5 goldens): invoice via Dict DSL / Pydantic / JSON Schema, all-9-types, with-MONEY, and three adversarial inputs (empty, unicode, prompt-injection). All are byte-equal across bootstrap runs (verified by `md5sum`). Non-hash-relevant fields (`id`, `created_at`) are stripped at bootstrap to ensure cross-run stability (exit criterion #8). The new `tests/fixtures/artifacts/GENERATION.md` documents the procedure. Replay-equality is enforced by `tests/integration/test_golden_artifacts.py` (34 tests).
    - Programmatic fixture factories (D7.4) — `tests/fixtures/factories/` (committed source; the directory was renamed from `generated/` because the prior path was gitignored per `tests/fixtures/AGENTS.md`, but the factories are hand-written code that should be tracked): `contracts.py` (Dict DSL / Pydantic / JSON Schema / OpenAPI factories), `inputs.py` (InvoiceInput / ReceiptInput / QuotationInput / MultiPageInput), `candidates.py` (Candidate / EvidenceRef / CandidateResult), `artifacts.py` (ExecutionArtifact with stable replay_hash), `policies.py` (Budget / Policy). All factories use `factory.Faker._get_faker()` with the project-wide `SEED = 0x70617821` for reproducibility. `factory-boy >= 3.3` and `faker >= 22.0` added to dev dependencies.
    - Property tests (D7.5–D7.10) — 5 property test files, 25 property tests, all using `derandomize=True`: `test_planner_determinism.py` (5), `test_executor_determinism.py` (3), `test_reconciler_property_money.py` (8), `test_reconciler_property_monotonicity.py` (3), `test_replay_byte_equal_and_hash_detection.py` (3 new: replay is byte-equal across 100 examples; any modification to a hash-relevant field changes the hash; `replay_hash` equals `compute_replay_hash`).
    - End-to-end integration tests (D7.11–D7.14) — 23 new tests: `tests/integration/end_to_end/test_invoice_pipeline.py` (6), `test_quotation_pipeline.py` (5, exercises MONEY + currency policy), `test_adversarial_inputs.py` (8: empty, unicode, prompt-injection, mismatched-currency, and truncated-PDF inputs all return `UNRESOLVED`/`PARTIAL_SUCCESS`, never a crash; exit criterion #5), `tests/integration/cross_subsystem/test_cross_subsystem_integration.py` (4: planner→executor, executor→reconciler, full pipeline, hash consistency across calls).
    - Coverage (D7.15) — per-subsystem coverage thresholds enforced via `scripts/check_subsystem_coverage.py`: `contract/` ≥ 90% (95.19% achieved), `planner/` ≥ 90% (93.83%), `executor/` ≥ 90% (96.50%), `reconciler/` ≥ 90% (97.45%), `artifact/` ≥ 95% (96.68%), `errors.py` = 100% (100.00%), `versioning.py` = 100% (100.00%), overall ≥ 90% (94.93%). New Makefile target: `make check-coverage`. New test files `test_errors_versioning_coverage.py` and `test_artifact_coverage.py` push the previously uncovered validation branches to 100% / ≥95%.
    - Subprocess reproducibility test (D7.16, exit criterion #6) — `tests/integration/test_replay_golden_reproducibility.py` runs the same `paxman.normalize()` call in two separate Python subprocesses and asserts the `replay_hash` is identical. It also asserts that the subprocess hash matches an in-process hash.
    - CI workflow (D7.18) — `.github/workflows/ci.yml` split into separate jobs: `lint` (ruff + format + mypy + pyright + import-linter + interrogate + bandit + pip-audit), `test-unit` (matrix 3.11/3.12/3.13, `-m unit`), `test-property` (`-m property`), `test-integration` (`-m integration`), `test-coverage` (full coverage run + per-subsystem threshold check; uploads to Codecov), `build` (hatchling wheel + sdist; inspects the wheel for `py.typed` and the absence of `__pycache__`).
    - New contract fixtures (D7.2) — `tests/fixtures/contracts/dict_dsl/{receipt,quotation}.py` and `tests/fixtures/contracts/pydantic/receipt.py` (with `CurrencyCode` and `ReceiptCategory` enums). All three are exercised end-to-end by `tests/unit/test_new_contracts.py`.
    - JSON Schema adapter enhancement — accepts JSON Schema as a string and parses it as JSON at adapt time (used by `tests/fixtures/contracts/json_schema/invoice.py`). Error code updated from `INVALID_FIELD` to `INVALID_JSON` for invalid JSON strings; non-dict/non-str inputs now raise with the message `requires a dict or str`.
    - Dev dependencies — `factory-boy >= 3.3` and `faker >= 22.0` added for Layer 2 fixtures.
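The `BudgetTracker` entries above describe a simulate-before-record gate: `would_exceed_reason` asks whether the next invocation would break a cap before anything is recorded, and `mark_exhausted` latches the gate so a later pre-loop check sees the short-circuit. The sketch below illustrates that pattern under stated assumptions. `BudgetTrackerSketch`, `run_chain`, and the reason strings are hypothetical, the sketch models only the cost and invocation caps, and it is not paxman's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BudgetTrackerSketch:
    """Hypothetical, simplified stand-in for paxman's BudgetTracker (cost + invocation caps only)."""

    max_total_cost_usd: float | None = None
    max_invocations: int | None = None
    spent_usd: float = 0.0
    invocations: int = 0
    exhausted: bool = False

    def would_exceed_reason(self, cost_usd: float) -> str | None:
        """Name the cap that recording `cost_usd` would break, without recording anything."""
        if self.exhausted:
            return "already_exhausted"
        if self.max_total_cost_usd is not None and self.spent_usd + cost_usd > self.max_total_cost_usd:
            return "max_total_cost_usd"
        if self.max_invocations is not None and self.invocations + 1 > self.max_invocations:
            return "max_invocations"
        return None

    def record(self, cost_usd: float) -> None:
        """Commit one invocation's cost after the gate has allowed it."""
        self.spent_usd += cost_usd
        self.invocations += 1

    def mark_exhausted(self) -> None:
        """Latch the gate so every later check reports exhaustion."""
        self.exhausted = True


def run_chain(tracker: BudgetTrackerSketch, step_costs: list[float]) -> tuple[list[int], str | None]:
    """Run steps in order, stopping before the first step that would break a cap."""
    completed: list[int] = []
    for index, cost in enumerate(step_costs):
        reason = tracker.would_exceed_reason(cost)
        if reason is not None:
            tracker.mark_exhausted()  # a later pre-loop gate now sees the short-circuit
            return completed, reason
        tracker.record(cost)
        completed.append(index)
    return completed, None


tracker = BudgetTrackerSketch(max_total_cost_usd=0.05)
print(run_chain(tracker, [0.02, 0.02, 0.02]))  # ([0, 1], 'max_total_cost_usd')
print(tracker.would_exceed_reason(0.0))  # already_exhausted
```

Checking before recording means a step that would overshoot is never charged, so the partial result reflects only the work that fit inside the budget.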
Added — Sprint 10 (Release v1.0.0)
- Version bumped to 1.0.0 (per ADR-0008 license decision + 9 V1 acceptance criteria all met)
- 3 reference examples — `examples/backend_service/` (FastAPI + Pydantic), `examples/ai_agent_ingest/` (stdlib-only agent tool-calling loop), `examples/saas_procurement/` (CSV batch procurement pipeline)
- `docs/concepts/RELEASE_NOTES_v1.0.0.md` — what shipped in V1, what's deferred to V2, the 3 reference examples, known limitations
- `tests/integration/test_saas_procurement_replay.py` — D10.7 cross-run `replay_hash` reproducibility test (verifies the saas_procurement example's artifacts are byte-equal across two independent Python invocations)
- README "Examples" section — cross-links the 3 reference examples in the upper third of the README
- `paxman.api.replay` — replaced the single remaining `# type: ignore[return-value]` (line 104) with `typing.cast` (Sprint 10 fix per V1 acceptance §2.1)
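Two entries in this changelog (the D7.16 subprocess test and the D10.7 `test_saas_procurement_replay.py` test) check that a `replay_hash` is identical across independent Python invocations. The sketch below shows the general shape of such a check using only the standard library. The hashing scheme (sorted-key compact JSON, then SHA-256) and all function names are assumptions standing in for paxman's `compute_replay_hash`, which is not reproduced here.

```python
import hashlib
import json
import subprocess
import sys

# Assumed hashing scheme, duplicated verbatim in the child interpreter below.
HASH_SNIPPET = (
    "import hashlib, json, sys\n"
    "artifact = json.loads(sys.argv[1])\n"
    "canonical = json.dumps(artifact, sort_keys=True, separators=(',', ':'), ensure_ascii=False)\n"
    "print(hashlib.sha256(canonical.encode('utf-8')).hexdigest())\n"
)


def replay_hash_in_process(artifact: dict) -> str:
    """Hypothetical replay hash: SHA-256 over canonical (sorted-key, compact) JSON."""
    canonical = json.dumps(artifact, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def replay_hash_in_subprocess(artifact: dict) -> str:
    """Compute the same hash in a fresh interpreter with its own string-hash seed."""
    completed = subprocess.run(
        [sys.executable, "-c", HASH_SNIPPET, json.dumps(artifact)],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


artifact = {"status": "SUCCESS", "fields": {"vendor": "ACME Corp"}}
first, second = replay_hash_in_subprocess(artifact), replay_hash_in_subprocess(artifact)
print(first == second == replay_hash_in_process(artifact))  # True
```

Running in separate interpreters matters because each one (unless `PYTHONHASHSEED` is pinned) gets its own string-hash seed, so output that accidentally depended on set iteration order would tend to diverge between runs.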
Changed
- Version `0.0.0` → `1.0.0` (Production/Stable classifier)
- CI workflow (`make ci`) now runs `test-examples` as a required gate (smoke-tests the 3 reference examples on every PR)
- Release workflow (`.github/workflows/release.yml`) — uncommented the `publish-pypi` job, which now publishes to both TestPyPI and PyPI via OIDC trusted publishing on tag push
- Golden artifacts (8 files in `tests/fixtures/artifacts/`) — regenerated to match the new `paxman_version` (1.0.0)
- Test files — `tests/unit/artifact/test_artifact.py`, `tests/unit/artifact/test_replay.py`, `tests/integration/test_replay_integrity.py`, `tests/unit/test_artifact_coverage.py` updated to use the new v1.0.0+ version semantics
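The golden-artifact entries (D7.3, and the regeneration for `paxman_version` 1.0.0 above) rest on comparing stable bytes: per-run fields such as `id` and `created_at` are stripped, and everything else must match the committed file exactly. A minimal sketch of that comparison follows. `golden_bytes`, `matches_golden`, the sample artifact, and the JSON layout (sorted keys, two-space indent, trailing newline) are assumptions, not the project's actual fixture format. The sketch also shows why a version bump forces regeneration when the version is stored in the artifact.

```python
import json
from typing import Any

# Per-run fields dropped before comparison (the D7.3 entry names id and created_at).
NON_HASH_FIELDS = frozenset({"id", "created_at"})


def golden_bytes(artifact: dict[str, Any]) -> bytes:
    """Serialize an artifact to stable bytes, dropping per-run fields."""
    stable = {key: value for key, value in artifact.items() if key not in NON_HASH_FIELDS}
    text = json.dumps(stable, sort_keys=True, indent=2, ensure_ascii=False)
    return (text + "\n").encode("utf-8")


def matches_golden(artifact: dict[str, Any], golden: bytes) -> bool:
    """Byte-for-byte comparison against a committed golden file's contents."""
    return golden_bytes(artifact) == golden


run_a = {"id": "run-1", "created_at": "2024-01-01T00:00:00Z", "paxman_version": "0.0.0", "status": "SUCCESS"}
run_b = {**run_a, "id": "run-2", "created_at": "2024-01-02T00:00:00Z"}
golden = golden_bytes(run_a)

print(matches_golden(run_b, golden))  # True: only per-run fields differ
print(matches_golden({**run_b, "paxman_version": "1.0.0"}, golden))  # False: the golden must be regenerated
```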
Fixed
- The single remaining `# type: ignore[return-value]` in `src/paxman/api/replay.py` is removed (replaced with `typing.cast`)
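The fix above swaps a `# type: ignore[return-value]` suppression for `typing.cast`. The actual code at that line of `src/paxman/api/replay.py` is not shown in this changelog, so the sketch below uses a hypothetical `read_replay_hash` helper to illustrate the pattern. One caveat worth knowing: `cast` only informs the type checker and performs no runtime check.

```python
from typing import cast


def read_replay_hash(payload: dict[str, object]) -> str:
    """Hypothetical helper: return the stored replay hash from a decoded artifact payload."""
    value = payload["replay_hash"]  # statically typed as `object`
    # Previously: `return value  # type: ignore[return-value]`, since `object` is not `str`.
    # typing.cast states the assumption explicitly and is understood by both mypy and pyright.
    return cast(str, value)


print(read_replay_hash({"replay_hash": "3f2a"}))  # 3f2a
# cast() is a runtime no-op: a non-str value passes through unchanged.
print(read_replay_hash({"replay_hash": 42}))  # 42
```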
Notes
- No new public API surface. All Sprint 10 changes are packaging, examples, documentation, CI, and one internal type-safety fix.
- No new core dependencies. Examples declare only `paxman[pydantic]` as a runtime dep.
- No new ADRs. Sprint 10 is the final release sprint; no architectural changes.
- All 20 sprint deliverables (D10.1–D10.20) shipped per the Sprint 10 spec.
- External user validation (D10.6): per the Sprint 9 Oracle M5 review and the Sprint 10 risk register, if fewer than 3 external users can be confirmed by the v1.0.0 release date, ship v1.0.0 with the user-validation gate waived and document the waiver in the release notes.
Technical notes
- The `@<field>.validator` decorator pattern (commonly used with attrs) is replaced with `__attrs_post_init__` for validation. This was needed because pyright cannot model attrs' class-body behavior: it types each field as its annotated type, so `@<field>.validator` is reported as an attribute access on that type (26 errors of the form "Cannot access attribute 'validator' for class 'str'"). Per V1 acceptance §2.1, `# pyright: ignore` is forbidden in `src/paxman/`, so the fix is structural. `mypy --strict` still passes because it understands attrs natively.
- The `import-linter` "forbidden" contract for cross-cutting → subsystem uses explicit module paths as sources (e.g., `paxman.errors`, `paxman.types`, ...) rather than the parent `paxman` package, because a "forbidden" contract with a parent/descendant source is ambiguous in import-linter.
- Pydantic v2 constraint extraction is via `field_info.metadata` (Pydantic v2 stores `MinLen`, `MaxLen`, `Ge`, `Gt`, `Le`, `Lt`, and the legacy `_PydanticGeneralMetadata.pattern` as metadata objects, not as direct attributes). The `PydanticUndefined` sentinel from `pydantic_core` is used to distinguish "no default" from `default=None` or `default_factory=...`.
- JSON Schema MONEY is encoded as an `object` with `x-paxman-type: "MONEY"` and `properties: {amount, currency}`; the adapter rejects MONEY-typed properties that don't carry both subfields. The string-with-format heuristic is accepted as a `STRING` with `iso_4217` and `currency-sensitive` tags (V1 documented limitation; per the Sprint 2 risk register).
- Sprint 3 — InputProfile is bytes-only (per `docs/specs/input-profile-spec.md`): it does not know about structured data. The API layer (Sprint 6) will serialize `dict`/`list` inputs to bytes before calling `make_profile()`. A lone surrogate in a `str` input is replaced with U+FFFD (3 UTF-8 bytes) per Python's `errors="replace"` policy.
- Sprint 3 — V1 inference is a stub (per `EXTENDING.md` §3 and the Sprint 3 risk register): real providers (OpenAI, Anthropic, Cohere) are V2. The stub is one class with one method; a unit test (`test_stub_never_makes_network_calls`) enforces that it never depends on `requests`, `httpx`, `urllib3`, `aiohttp`, or `socket`.
- Sprint 3 — Validation rejects bool-as-int (per Sprint 1's "no implicit coercion" precedent): the `_to_float()` and `_to_length()` helpers check `isinstance(value, bool)` first (because `bool` is a subclass of `int`) and return `None`, preventing `True`/`False` from being silently treated as `1.0`/`0.0` in `min_value`/`max_value` comparisons.
- Sprint 3 — Capability tier assignment is a static spec field (per `docs/specs/capability-cost-model.md` §4.1): the tier is part of the `CapabilitySpec`, not computed at plan time. This keeps the scoring formula input-independent and underwrites planner determinism.