Paxman
Contract-driven deterministic normalization engine for Python.
Paxman transforms arbitrary input (PDFs, scans, emails, spreadsheets, APIs, free text) into evidence-backed, replayable normalized artifacts that conform to caller-supplied contracts (Pydantic, JSON Schema, OpenAPI, or a built-in Dict DSL).
What is Paxman?
Paxman is a library that produces an evidence-backed, replayable normalized artifact. It is the normalization step in a larger system. If you find yourself wanting workflow, persistence, or agentic features inside Paxman, that is a signal to wrap Paxman from the outside.
- Contract-driven. You bring the contract. Paxman doesn't own your schema.
- Field-centric, deterministic planning. Each required field gets its own plan.
- Evidence-backed. Every resolved value carries provenance and confidence.
- Replayable. Rehydrate the artifact without recomputation.
- Honest. Unresolved fields are explicit, never silent.
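To make the "evidence-backed" and "honest" properties concrete, here is a minimal sketch of a per-field record that carries provenance and confidence and keeps unresolved fields explicit. `FieldResult` and `split_results` are hypothetical names invented for this illustration; they are not Paxman's actual types, and the real artifact layout may differ.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class FieldResult:
    """Hypothetical per-field record (not part of Paxman's API)."""

    name: str
    value: Optional[Any]        # None means the field could not be resolved
    source_span: Optional[str]  # provenance: the input text the value came from
    confidence: float           # assumed to be in the range 0.0 to 1.0

    @property
    def resolved(self) -> bool:
        return self.value is not None


def split_results(results):
    """Separate resolved values from explicitly unresolved field names."""
    normalized = {r.name: r.value for r in results if r.resolved}
    unresolved = [r.name for r in results if not r.resolved]
    return normalized, unresolved


results = [
    FieldResult("supplier_name", "ACME Corp", "ACME Corp", 0.97),
    FieldResult("currency_code", "USD", "USD", 0.99),
    FieldResult("due_date", None, None, 0.0),  # nothing in the input to support it
]
normalized, unresolved = split_results(results)
print(normalized)
print(unresolved)
```

The point of the sketch is that a missing value is reported by name instead of being dropped or filled with a guess.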
Install
pip install paxman              # core (no adapters)
pip install "paxman[pydantic]"  # + Pydantic adapter
pip install "paxman[all]"       # + all V1 adapters
5-minute quickstart
from decimal import Decimal

from pydantic import BaseModel

import paxman
import paxman.contract.adapters.pydantic  # self-registers the adapter


class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float


class Invoice(BaseModel):
    supplier_name: str
    total_amount: float
    currency_code: str
    line_items: list[LineItem] = []


artifact = paxman.normalize(
    input_data="ACME Corp — Invoice #1234 — Total: $1,234.56 USD",
    contract=Invoice,
    budget=paxman.Budget(max_total_cost_usd=Decimal("0.10")),
)

print(artifact.status)             # Status.SUCCESS or Status.PARTIAL_SUCCESS
print(artifact.normalized_data)    # {"supplier_name": "ACME Corp", ...}
print(artifact.unresolved_fields)  # [] (or list of fields Paxman could not resolve)
print(artifact.replay_hash)        # deterministic SHA-256 signature

# Later: replay the artifact without re-running the pipeline
rehydrated = paxman.replay(artifact, contract=Invoice)
assert rehydrated == artifact  # byte-equal
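This page does not specify how Paxman derives `replay_hash`. As a rough illustration of why a SHA-256 digest over normalized data can be deterministic, the sketch below assumes a canonical JSON serialization (sorted keys, fixed separators). `canonical_hash` is a hypothetical helper written for this example, not a Paxman function.

```python
import hashlib
import json


def canonical_hash(data: dict) -> str:
    """Hash a dict via canonical JSON so key order does not affect the digest."""
    canonical = json.dumps(
        data,
        sort_keys=True,          # stable key order
        separators=(",", ":"),   # no whitespace variation
        ensure_ascii=True,       # stable byte representation
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different insertion order. Amounts are kept as strings here
# to sidestep float formatting differences (an assumption of this sketch).
a = {"supplier_name": "ACME Corp", "total_amount": "1234.56", "currency_code": "USD"}
b = {"currency_code": "USD", "total_amount": "1234.56", "supplier_name": "ACME Corp"}
c = {**a, "total_amount": "1234.57"}

print(canonical_hash(a) == canonical_hash(b))  # same content, same digest
print(canonical_hash(a) == canonical_hash(c))  # changed value, different digest
```

Comparing digests like this across runs is one way to check that two normalizations of the same input produced identical data.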
Where to go next
| If you want to… | Start here |
|---|---|
| Understand the mental model | Concepts → Contracts |
| Add a new contract format (e.g. Avro) | How-to → Add a contract adapter |
| Add a new capability (e.g. table extraction) | How-to → Add a capability |
| Add a new inference provider (OpenAI, Anthropic, local) | How-to → Add an inference provider |
| Replay a stored artifact | How-to → Replay an artifact |
| Understand why a decision was made | Decision records (ADRs) |
| Migrate from LlamaIndex / LangChain / Unstructured | Migration guide |
| Contribute to Paxman | Contributing |
| Read the v1.0.0 release notes | Release notes v1.0.0 |
| Read the v1.0.1 release notes | Release notes v1.0.1 |
Reference examples
Paxman ships with three reference mini-packages that cover the three
target personas. Each is a standalone, runnable project. Clone the
repo, cd into the example, and follow its README.
- Backend service. A minimal FastAPI service exposing POST /normalize for contract-driven normalization. Accepts raw text and returns structured, evidence-backed JSON with a deterministic replay hash. Persona A: backend developer.
- AI agent ingest. A stdlib-only agent tool-calling loop that invokes paxman.normalize() as a tool. Zero framework dependencies; port the NormalizeTool to LangChain, LlamaIndex, or any custom agent. Persona B: AI engineer.
- SaaS procurement pipeline. A CSV-batch invoice / quotation pipeline. Reads a manifest of raw input files, normalizes each against a Pydantic contract, writes artifacts to disk, and verifies cross-run replay-hash reproducibility. Persona C: SaaS team.
Project links
- Source code: github.com/nexusnv/paxman
- PyPI: pypi.org/project/paxman
- Changelog: CHANGELOG.md on Read the Docs
- Issue tracker: github.com/nexusnv/paxman/issues
- Security disclosures: security policy
About
Paxman is developed by Nexus Envision Sdn Bhd. Released under the MIT License.