Document pipelines

Unstructured documents. Audited, gated, structured output.

Every stage of the pipeline runs a Produce-Audit-Gate-Fix loop. The LLM produces. An auditor scores. A gate accepts or rejects. The output is not a summary — it is a traced specification that passed every gate to get there.

Forge output is designed to be ingested by Prism as typed knowledge graph nodes. Documents become nodes and edges, not summaries.

Go · multi-stage · PAGF loop · fanout · map · reduce

forge · requirements_processing · stage 3/8

S1 Normalize (map) → S2 Group (fanout) → S3 Enrich (map) → S4 Validate (map) → S5 Synthesize (reduce)

PAGF · produce → audit → gate → fix · 247 items

Multi-stage

Each pipeline is a sequence of typed stages: fanout, map, flatmap, or reduce. Stages compose: a fanout creates N items, downstream map stages process each one, and a reduce collapses them back into a single artifact.

PAGF quality

Every stage runs a Produce-Audit-Gate-Fix loop. The LLM produces. An auditor scores. A gate accepts or rejects. A fixer corrects and retries. No stage produces output that hasn't passed a gate.

Traced output

Every output artifact carries its lineage: which stage, which prompt version, which audit decision, which source document. Nothing is anonymous. Every claim is traceable to its origin.
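As a rough illustration of what "carries its lineage" could mean in code, here is a minimal Go sketch. The `Lineage` and `Artifact` types, their field names, and the sample values are assumptions made for this example, not Forge's actual schema; only the four lineage items (stage, prompt version, audit decision, source document) come from the text above.

```go
package main

import (
	"errors"
	"fmt"
)

// Lineage is a hypothetical record of where an artifact came from.
// The four fields mirror the four items named in the text; the names
// themselves are assumptions, not Forge's schema.
type Lineage struct {
	Stage          string // which stage produced the artifact
	PromptVersion  string // which prompt version was used
	AuditDecision  string // which audit decision let it through
	SourceDocument string // which source document it derives from
}

// Artifact pairs a payload with its lineage.
type Artifact struct {
	Content string
	Lineage Lineage
}

// Validate rejects "anonymous" artifacts: every lineage field must be set.
func (l Lineage) Validate() error {
	switch {
	case l.Stage == "":
		return errors.New("lineage: missing stage")
	case l.PromptVersion == "":
		return errors.New("lineage: missing prompt version")
	case l.AuditDecision == "":
		return errors.New("lineage: missing audit decision")
	case l.SourceDocument == "":
		return errors.New("lineage: missing source document")
	}
	return nil
}

func main() {
	// All values below are illustrative.
	a := Artifact{
		Content: "The system shall export reports as PDF.",
		Lineage: Lineage{
			Stage:          "S3 Enrich",
			PromptVersion:  "v1",
			AuditDecision:  "accept",
			SourceDocument: "requirements.xlsx",
		},
	}
	if err := a.Lineage.Validate(); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("traceable artifact from", a.Lineage.SourceDocument)
}
```

Checking lineage at the type level, rather than in free-text metadata, is one way to make an untraceable artifact an error instead of a silent gap.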

The problem

Documents contain knowledge. LLMs lose it.

A single-pass LLM summarizes. It compresses. It discards the structure, the edge cases, the contradictions, the implied constraints. You get a paragraph where you needed a specification.

Forge does not summarize. It transforms — stage by stage, with each transformation audited and gated before the next begins. The output is structured enough to be consumed by code, precise enough to be reviewed by engineers.

Stage types

Forge pipelines are composable. Each stage type handles one transformation pattern:

map: One input → one output. Process each item independently.

fanout: One input → N outputs. Decompose a document into items.

flatmap: N inputs → M outputs. Expand and filter in a single pass.

reduce: N inputs → one output. Synthesize a collection into one artifact.
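The four stage types can be pictured as four generic function shapes. The Go sketch below is an assumption-laden illustration, not Forge's API: the function names `Map`, `Fanout`, `FlatMap`, and `Reduce` and the toy document are invented for this example, and plain string functions stand in for the LLM calls.

```go
package main

import (
	"fmt"
	"strings"
)

// Map: one input -> one output, applied to each item independently.
func Map[A, B any](items []A, f func(A) B) []B {
	out := make([]B, 0, len(items))
	for _, it := range items {
		out = append(out, f(it))
	}
	return out
}

// Fanout: one input -> N outputs.
func Fanout[A, B any](doc A, f func(A) []B) []B {
	return f(doc)
}

// FlatMap: N inputs -> M outputs. f may return no items (filter)
// or several items (expand) for each input.
func FlatMap[A, B any](items []A, f func(A) []B) []B {
	var out []B
	for _, it := range items {
		out = append(out, f(it)...)
	}
	return out
}

// Reduce: N inputs -> one output.
func Reduce[A, B any](items []A, init B, f func(B, A) B) B {
	acc := init
	for _, it := range items {
		acc = f(acc, it)
	}
	return acc
}

func main() {
	doc := "login required\n\nexport to PDF\nexport to CSV"

	// fanout: decompose the document into lines.
	lines := Fanout(doc, func(d string) []string { return strings.Split(d, "\n") })

	// flatmap: drop blank lines, keep the rest.
	items := FlatMap(lines, func(l string) []string {
		if strings.TrimSpace(l) == "" {
			return nil
		}
		return []string{l}
	})

	// map: process each item independently.
	upper := Map(items, strings.ToUpper)

	// reduce: collapse the collection into one artifact.
	spec := Reduce(upper, "", func(acc, it string) string { return acc + "- " + it + "\n" })
	fmt.Print(spec)
}
```

Because every stage consumes and produces plain slices or single values, any output of one stage type can feed the next, which is the sense in which stages compose.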

PAGF quality loop

Every stage either meets the bar or fixes itself. No stage produces output that hasn't passed a gate.

At each stage: the LLM produces an artifact. An auditor evaluates it against explicit criteria. The gate accepts or rejects. On rejection, a fixer receives the artifact and the audit findings and retries. This loop runs until the artifact passes or the retry budget is exhausted.

Produce → Audit → Gate → Fix
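The loop described above can be sketched in a few lines of Go. Everything named here is hypothetical: the `Stage` and `Audit` types, the score-threshold gate rule, and the choice to return an error when the retry budget runs out are assumptions for illustration, since the page does not say how Forge represents them. Plain functions stand in for the LLM producer and fixer so the sketch runs offline.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Audit is the auditor's verdict on one artifact.
type Audit struct {
	Score    float64
	Findings []string
}

// Stage bundles the four PAGF roles (hypothetical shape).
type Stage struct {
	Produce   func(input string) string
	Audit     func(artifact string) Audit
	Fix       func(artifact string, findings []string) string
	Threshold float64 // assumed gate rule: accept when Score >= Threshold
	MaxFixes  int     // retry budget
}

// ErrBudgetExhausted reports that no artifact passed the gate in time.
var ErrBudgetExhausted = errors.New("pagf: retry budget exhausted")

// Run produces an artifact, then audits, gates, and fixes it until it
// passes or the budget is spent. It returns the accepted artifact and
// the number of fixes applied.
func (s Stage) Run(input string) (string, int, error) {
	artifact := s.Produce(input)
	for fixes := 0; ; fixes++ {
		audit := s.Audit(artifact)
		if audit.Score >= s.Threshold { // gate accepts
			return artifact, fixes, nil
		}
		if fixes == s.MaxFixes { // gate rejected, no budget left
			return "", fixes, ErrBudgetExhausted
		}
		// The fixer receives the artifact and the audit findings.
		artifact = s.Fix(artifact, audit.Findings)
	}
}

// demoStage is a toy stage: the "auditor" only checks a phrasing rule.
func demoStage(maxFixes int) Stage {
	const prefix = "The system shall "
	return Stage{
		Produce: strings.TrimSpace,
		Audit: func(a string) Audit {
			if strings.HasPrefix(a, prefix) {
				return Audit{Score: 1}
			}
			return Audit{Score: 0, Findings: []string{"missing prefix: " + prefix}}
		},
		Fix:       func(a string, _ []string) string { return prefix + a },
		Threshold: 1,
		MaxFixes:  maxFixes,
	}
}

func main() {
	out, fixes, err := demoStage(2).Run("  export reports as PDF ")
	fmt.Println(out, fixes, err)
}
```

Note that the rejected artifact is never returned: on an exhausted budget the caller gets an error rather than ungated output, which is one way to honor "no stage produces output that hasn't passed a gate".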

Implemented pipelines

The pipeline engine is general-purpose. Requirements processing is the first production use case.

requirements_processing · 8 stages

XLSX requirements document → grouped, summarized, capability-mapped specification

requirements_processing_enriched · 12 stages

Extends the base pipeline with domain knowledge indexing and per-requirement enrichment

requirement_analysis · 4 stages

Clarity audit, scope creep analysis, uncertainty analysis, effort estimation
