
Paxman

Contract-driven deterministic normalization engine for Python.


Paxman transforms arbitrary input (PDFs, scans, emails, spreadsheets, APIs, free text) into evidence-backed, replayable normalized artifacts that conform to caller-supplied contracts (Pydantic, JSON Schema, OpenAPI, or a built-in Dict DSL).


What is Paxman?

Paxman is a library that produces an evidence-backed, replayable normalized artifact. It is the normalization step in a larger system. If you find yourself wanting workflow, persistence, or agentic features inside Paxman, that is a signal to wrap Paxman from the outside.
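Wrapping Paxman from the outside can be as thin as a function that calls the normalizer and then handles persistence itself. The sketch below is hypothetical. `normalize_and_store`, the file layout, and the assumption that an artifact can be reduced to a JSON-serializable dict are illustrations, not part of Paxman's API. The normalizer is injected so the sketch runs without Paxman installed. In practice you would pass a small function that calls `paxman.normalize(...)`.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


# Hypothetical wrapper: persistence lives *outside* the normalizer.
# `normalize_fn` stands in for a call such as paxman.normalize(...) and is
# assumed here to return a JSON-serializable dict.
def normalize_and_store(
    normalize_fn: Callable[[str], dict[str, Any]],
    input_data: str,
    out_dir: Path,
    name: str,
) -> Path:
    artifact = normalize_fn(input_data)          # pure normalization step
    out_dir.mkdir(parents=True, exist_ok=True)   # storage is the caller's concern
    path = out_dir / f"{name}.json"
    path.write_text(json.dumps(artifact, sort_keys=True), encoding="utf-8")
    return path


# Stand-in normalizer so the sketch runs without Paxman.
def fake_normalize(text: str) -> dict[str, Any]:
    return {"normalized_data": {"raw": text}, "unresolved_fields": []}


with tempfile.TemporaryDirectory() as tmp:
    saved = normalize_and_store(fake_normalize, "ACME Corp", Path(tmp), "inv-1234")
    print(json.loads(saved.read_text(encoding="utf-8")))
```

Keeping storage in the wrapper means swapping disk for a database or object store never touches the normalization call.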

  • Contract-driven. You bring the contract. Paxman doesn't own your schema.
  • Field-centric, deterministic planning. Each required field gets its own plan.
  • Evidence-backed. Every resolved value carries provenance and confidence.
  • Replayable. Rehydrate the artifact without recomputation.
  • Honest. Unresolved fields are explicit, never silent.
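To make the field-centric, evidence-backed, and honest properties concrete, here is a toy model of the idea. It is a sketch under stated assumptions, not Paxman's internals. `Resolved`, `regex_plan`, and `resolve` are hypothetical names, each "plan" is just a regex, and `due_date` is an extra field invented so that one field stays unresolved.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical shapes for illustration only; not Paxman's types.
@dataclass(frozen=True)
class Resolved:
    value: str
    evidence: str      # exact source text the value came from (provenance)
    confidence: float  # 0.0 to 1.0


# One "plan" per required field: here, a regex with a fixed confidence.
Plan = Callable[[str], Optional[Resolved]]


def regex_plan(pattern: str, confidence: float) -> Plan:
    def run(text: str) -> Optional[Resolved]:
        m = re.search(pattern, text)
        if m is None:
            return None
        return Resolved(value=m.group(1), evidence=m.group(0), confidence=confidence)
    return run


def resolve(text: str, plans: dict[str, Plan]) -> tuple[dict[str, Resolved], list[str]]:
    resolved: dict[str, Resolved] = {}
    unresolved: list[str] = []
    for field in sorted(plans):          # fixed order keeps the run deterministic
        result = plans[field](text)
        if result is None:
            unresolved.append(field)     # reported explicitly, never dropped silently
        else:
            resolved[field] = result
    return resolved, unresolved


plans = {
    "currency_code": regex_plan(r"\b([A-Z]{3})\s*$", 0.9),
    "total_amount": regex_plan(r"Total:\s*\$([\d,]+\.\d{2})", 0.95),
    "due_date": regex_plan(r"Due:\s*(\d{4}-\d{2}-\d{2})", 0.9),  # hypothetical field
}
resolved, unresolved = resolve("ACME Corp | Invoice #1234 | Total: $1,234.56 USD", plans)
print(resolved["total_amount"].value)  # 1,234.56
print(unresolved)                      # ['due_date']
```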

Install

pip install paxman                            # core (no adapters)
pip install "paxman[pydantic]"                # + Pydantic adapter (quotes keep zsh from globbing the brackets)
pip install "paxman[all]"                     # + all V1 adapters
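The core install ships without adapters, and the quickstart imports `paxman.contract.adapters.pydantic` only for its side effect of registering the adapter. The general import-time self-registration pattern looks roughly like the sketch below. `ADAPTERS`, `register_adapter`, `get_adapter`, and the `"dict_dsl"` key are hypothetical names, and Paxman's real mechanism may differ.

```python
from typing import Any, Callable

# Hypothetical registry illustrating import-time self-registration.
ADAPTERS: dict[str, Callable[[Any], dict[str, Any]]] = {}


def register_adapter(kind: str):
    def decorator(fn: Callable[[Any], dict[str, Any]]):
        if kind in ADAPTERS:
            raise ValueError(f"adapter {kind!r} already registered")
        ADAPTERS[kind] = fn
        return fn
    return decorator


# In a real package this would live in the adapter module, so that merely
# importing the module runs the decorator and fills the registry.
@register_adapter("dict_dsl")
def dict_dsl_to_fields(contract: dict[str, type]) -> dict[str, Any]:
    # Map each field name to the name of its declared type.
    return {name: tp.__name__ for name, tp in contract.items()}


def get_adapter(kind: str) -> Callable[[Any], dict[str, Any]]:
    try:
        return ADAPTERS[kind]
    except KeyError:
        raise LookupError(
            f"no adapter registered for {kind!r}; is the extra installed and imported?"
        ) from None


print(get_adapter("dict_dsl")({"supplier_name": str, "total_amount": float}))
# {'supplier_name': 'str', 'total_amount': 'float'}
```

This is why a missing `import` shows up as "no adapter registered" rather than as an `ImportError`: the registry is only populated once the adapter module has run.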

5-minute quickstart

from decimal import Decimal
from pydantic import BaseModel

import paxman
import paxman.contract.adapters.pydantic  # self-registers the adapter


class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float


class Invoice(BaseModel):
    supplier_name: str
    total_amount: float
    currency_code: str
    line_items: list[LineItem] = []


artifact = paxman.normalize(
    input_data="ACME Corp — Invoice #1234 — Total: $1,234.56 USD",
    contract=Invoice,
    budget=paxman.Budget(max_total_cost_usd=Decimal("0.10")),
)

print(artifact.status)               # Status.SUCCESS or Status.PARTIAL_SUCCESS
print(artifact.normalized_data)      # {"supplier_name": "ACME Corp", ...}
print(artifact.unresolved_fields)    # []  (or list of fields Paxman could not resolve)
print(artifact.replay_hash)          # deterministic SHA-256 signature

# Later: replay the artifact without re-running the pipeline
rehydrated = paxman.replay(artifact, contract=Invoice)
assert rehydrated == artifact  # byte-equal
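A replay hash is only useful if identical content always hashes identically. One common way to get there, shown here as an assumption rather than Paxman's documented algorithm, is to serialize to canonical JSON (sorted keys, no incidental whitespace) and take a SHA-256 digest. `canonical_hash` is a hypothetical helper. A stored artifact can then be checked on reload by recomputing the digest and comparing it with the stored value.

```python
import hashlib
import json
from typing import Any


# Generic canonical-JSON SHA-256 digest (not Paxman's documented scheme).
def canonical_hash(payload: dict[str, Any]) -> str:
    canonical = json.dumps(
        payload,
        sort_keys=True,          # key order no longer affects the digest
        separators=(",", ":"),   # no incidental whitespace
        ensure_ascii=False,
        allow_nan=False,         # NaN/Infinity have no standard JSON form
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"supplier_name": "ACME Corp", "total_amount": "1234.56", "currency_code": "USD"}
b = {"currency_code": "USD", "total_amount": "1234.56", "supplier_name": "ACME Corp"}
assert canonical_hash(a) == canonical_hash(b)                               # same content, same hash
assert canonical_hash(a) != canonical_hash({**a, "currency_code": "EUR"})   # any change, new hash
print(len(canonical_hash(a)))  # 64 hex characters
```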

Where to go next

Pick a starting point based on what you want to do:

  • Understand the mental model: Concepts → Contracts
  • Add a new contract format (e.g. Avro): How-to → Add a contract adapter
  • Add a new capability (e.g. table extraction): How-to → Add a capability
  • Add a new inference provider (OpenAI, Anthropic, local): How-to → Add an inference provider
  • Replay a stored artifact: How-to → Replay an artifact
  • Understand why a decision was made: Decision records (ADRs)
  • Migrate from LlamaIndex / LangChain / Unstructured: Migration guide
  • Contribute to Paxman: Contributing
  • Read the v1.0.0 release notes: Release notes v1.0.0
  • Read the v1.0.1 release notes: Release notes v1.0.1

Reference examples

Paxman ships with three reference mini-packages that cover the three target personas. Each is a standalone, runnable project. Clone the repo, cd into the example, and follow its README.

  • Backend service


    A minimal FastAPI service exposing POST /normalize for contract-driven normalization. Accepts raw text, returns structured, evidence-backed JSON with a deterministic replay hash.

    Persona A — backend developer

    Read the example

  • AI agent ingest


    A stdlib-only agent tool-calling loop that invokes paxman.normalize() as a tool. Zero framework dependencies — port the NormalizeTool to LangChain, LlamaIndex, or any custom agent.

    Persona B — AI engineer

    Read the example

  • SaaS procurement pipeline


    A CSV-batch invoice / quotation pipeline. Reads a manifest of raw input files, normalizes each against a Pydantic contract, writes artifacts to disk, and verifies cross-run replay-hash reproducibility.

    Persona C — SaaS team

    Read the example
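The agent example calls `paxman.normalize()` as a tool with no framework. The dispatch side of such a loop can be sketched with the standard library alone. This is a hypothetical sketch: the `NormalizeTool` shape below, the `{"name": ..., "arguments": {...}}` tool-call format, and `fake_normalize` are assumptions, not the example's actual code.

```python
import json
from typing import Any, Callable


# Hypothetical tool wrapper in the spirit of the example's NormalizeTool.
class NormalizeTool:
    name = "normalize"
    description = "Normalize raw text against a named contract."

    def __init__(self, normalize_fn: Callable[[str, str], dict[str, Any]]):
        # normalize_fn stands in for a function that calls paxman.normalize(...)
        self._normalize = normalize_fn

    def __call__(self, input_data: str, contract: str) -> dict[str, Any]:
        return self._normalize(input_data, contract)


def dispatch(tool_call_json: str, tools: dict[str, Callable[..., dict[str, Any]]]) -> str:
    """Run one model-issued tool call and return its result as JSON text."""
    call = json.loads(tool_call_json)
    tool = tools.get(call["name"])
    if tool is None:
        return json.dumps({"error": f"unknown tool {call['name']!r}"})
    return json.dumps(tool(**call["arguments"]), sort_keys=True)


# Stand-in normalizer so the sketch runs without Paxman.
def fake_normalize(input_data: str, contract: str) -> dict[str, Any]:
    return {"contract": contract, "unresolved_fields": ["line_items"]}


tool = NormalizeTool(fake_normalize)
result = dispatch(
    '{"name": "normalize", "arguments": {"input_data": "ACME Corp", "contract": "Invoice"}}',
    {tool.name: tool},
)
print(result)  # {"contract": "Invoice", "unresolved_fields": ["line_items"]}
```

Because the tool is a plain callable with a name and description, porting it to LangChain, LlamaIndex, or another agent framework mostly means re-wrapping those three pieces.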

About

Paxman is developed by Nexus Envision Sdn Bhd and released under the MIT License.