Encoding dense domain rules
Underwriting guidelines, tax rules, compliance controls, regulatory timing — this logic has to be exact, testable, and auditable, and you can’t get there by hoping a free-running LLM “knows the rules.” The recurring answer is to pull the rule out of the model: encode it as deterministic, version-controlled code (with the LLM drafting the encoding and proposing inputs), constrain the model’s output to a checkable shape, and make every decision carry the lineage that proves which rule fired.
Why it’s hard
The rules are dense and they change: too intricate to trust to a model’s latent knowledge, too fluid to bury in hand-written if-statements that drift out of date. And the cost of getting one wrong is asymmetric. Pylon “reps & warrants every loan,” so “an incorrectly modeled rule can cost the company tens of thousands of dollars on a single loan”; there’s no partial credit. On top of correctness, regulated domains demand auditability: a CPA, an examiner, or a lawyer has to see why a decision was made, which rule applied, and what data fed it. A confident answer with no traceable basis is a liability, not a feature.
Patterns
Rules-as-code, with the LLM as the draft layer — Compile the regulatory text into deterministic, executable logic and enforce it outside the model; let the LLM draft the encoding and propose inputs, but never let it be the final arbiter of the rule. Pylon “encode[s] natural language rules into code … with DSLs and novel techniques — including AI — to translate dense regulatory guidelines into executable logic,” built “side-by-side with mortgage experts.” — Pylon
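A minimal sketch of that split, assuming a made-up debt-to-income cap. The `Rule` class, the `HYPOTHETICAL-DTI` identifier, the 0.45 threshold, and the version string are all illustrative; none of this is Pylon's DSL or a real underwriting guideline. The rule is plain, versioned code, and a model would only propose the inputs.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Rule:
    """A deterministic, versioned rule (hypothetical, for illustration only)."""
    rule_id: str
    version: str
    max_dti: Decimal  # made-up threshold, not a real guideline

    def evaluate(self, monthly_debt: Decimal, monthly_income: Decimal) -> dict:
        if monthly_income <= 0:
            raise ValueError("monthly_income must be positive")
        dti = monthly_debt / monthly_income
        # The result carries the rule id and version so it can be audited later.
        return {
            "rule_id": self.rule_id,
            "version": self.version,
            "dti": dti,
            "passed": dti <= self.max_dti,
        }


DTI_RULE = Rule(rule_id="HYPOTHETICAL-DTI", version="2024.1", max_dti=Decimal("0.45"))

# Inputs as an LLM extraction step might propose them; the rule itself never
# passes through the model.
proposed = {"monthly_debt": Decimal("2000"), "monthly_income": Decimal("5000")}
decision = DTI_RULE.evaluate(**proposed)
```

Using `Decimal` rather than floats keeps the comparison at the threshold exact, which matters when a borderline case decides an outcome.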
Constrain generation to a schema — Where model output feeds a rule, force it into a structured shape so it’s machine-checkable rather than prose to parse. Comp AI generates policies and controls as Zod-structured output (57 generateObject call-sites vs. 7 free-text), with controls mapped many-to-many to frameworks so one control satisfies many regulations. — Comp AI
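Comp AI does this with Zod in TypeScript. As a rough standard-library stand-in for the same idea, the sketch below rejects any model output that is not exactly the expected shape. `CONTROL_SCHEMA`, `parse_control`, and the field names are assumptions made for illustration, not Comp AI's actual schema.

```python
import json

# Hypothetical schema: field name -> required Python type.
CONTROL_SCHEMA = {"control_id": str, "title": str, "frameworks": list}


def parse_control(raw: str) -> dict:
    """Parse model output and fail loudly unless it matches CONTROL_SCHEMA."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on prose
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    unknown = set(data) - set(CONTROL_SCHEMA)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    for name, expected in CONTROL_SCHEMA.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected):
            raise ValueError(f"{name} must be {expected.__name__}")
    if not all(isinstance(f, str) for f in data["frameworks"]):
        raise ValueError("frameworks must be a list of strings")
    return data


control = parse_control(
    '{"control_id": "C-1", "title": "Require MFA", "frameworks": ["SOC 2"]}'
)
```

Anything that fails validation never reaches the rule layer, so downstream code can assume the shape instead of re-parsing prose.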
Explainability as a ship gate: lineage, confidence, citation — A rule-bound output must carry why it fired, and go-live is gated on that, not just accuracy. Basis benchmarks “not just accuracy, but how clearly the model can explain its reasoning,” each output carrying “what data was used, why it was mapped that way, and how confident the system is”; Harvey emits sentence-level citations alongside its reasoning. — Basis, Harvey
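One way to make that gate mechanical is to treat a missing explanation as a failed check. The sketch below is an assumption about how such a record could look; the `Decision` fields and the `explainability_gaps` helper are hypothetical and are not taken from Basis or Harvey.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Decision:
    """A rule-bound output plus the evidence needed to explain it."""
    outcome: str
    rule_id: str = ""
    rule_version: str = ""
    inputs_used: dict = field(default_factory=dict)  # data lineage
    confidence: Optional[float] = None
    citations: tuple = ()


def explainability_gaps(d: Decision) -> list:
    """Return the names of missing explanation parts; an empty list means shippable."""
    gaps = []
    if not d.rule_id or not d.rule_version:
        gaps.append("rule lineage")
    if not d.inputs_used:
        gaps.append("input lineage")
    if d.confidence is None or not (0.0 <= d.confidence <= 1.0):
        gaps.append("confidence")
    if not d.citations:
        gaps.append("citations")
    return gaps


explained = Decision(
    outcome="approve",
    rule_id="HYPOTHETICAL-RULE",
    rule_version="1.0.0",
    inputs_used={"invoice_total": "120.00"},
    confidence=0.97,
    citations=("policy.md, section 3",),
)
unexplained = Decision(outcome="approve")
```

A release check could then refuse to ship any workflow whose sample outputs return a non-empty gap list.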
Author with experts, gate the hard cases to them — Domain experts write and validate the rules, and low-confidence or high-stakes items route to a human rather than auto-executing. Pylon builds its DSLs with mortgage experts; Confido routes low-confidence and high-dollar deductions to reviewers with “full traceability.” — Pylon, Confido
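The routing half of this pattern can be a few lines of deterministic code. The thresholds and names below (`CONFIDENCE_FLOOR`, `AMOUNT_CEILING`, `review_reasons`, `route`) are placeholders chosen for the sketch, not Confido's actual values.

```python
from decimal import Decimal

CONFIDENCE_FLOOR = 0.90            # hypothetical threshold
AMOUNT_CEILING = Decimal("10000")  # hypothetical dollar threshold


def review_reasons(confidence: float, amount: Decimal) -> list:
    """List every reason an item needs a human; an empty list means none."""
    reasons = []
    if confidence < CONFIDENCE_FLOOR:
        reasons.append("low_confidence")
    if amount >= AMOUNT_CEILING:
        reasons.append("high_dollar")
    return reasons


def route(confidence: float, amount: Decimal) -> str:
    """Send the item to a reviewer if any reason applies, otherwise auto-execute."""
    return "human_review" if review_reasons(confidence, amount) else "auto_execute"
```

Returning the reasons, not just the route, gives the reviewer the traceability the pattern calls for.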
Test the rulebook like code — Version the rules and run a regression suite over historical cases so a rule change can’t silently break past decisions, the same discipline you’d apply to any build. For Pylon, where a misencoded rule can cost tens of thousands of dollars on a single loan, that means golden-file/snapshot tests over historical loans gating DSL changes in CI. — Pylon
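A golden-file check can be sketched as follows, assuming the historical cases are stored as JSON with the outcome each one is expected to produce. The loan-to-value rule, the 80% cap, the case ids, and the `regressions` helper are all invented for the example; a real suite would load the cases from a versioned file.

```python
import json


def ltv_rule(case: dict) -> dict:
    """Hypothetical rule: loan-to-value must not exceed 80%."""
    # Integer arithmetic avoids float rounding exactly at the threshold.
    return {"passed": case["loan_amount"] * 100 <= case["property_value"] * 80}


def tightened_ltv_rule(case: dict) -> dict:
    """A proposed change to the rule: cap lowered to 75%."""
    return {"passed": case["loan_amount"] * 100 <= case["property_value"] * 75}


# Inlined here to keep the sketch self-contained.
GOLDEN = json.loads("""
[
  {"case_id": "loan-001", "input": {"loan_amount": 160000, "property_value": 200000}, "expected": {"passed": true}},
  {"case_id": "loan-002", "input": {"loan_amount": 156000, "property_value": 200000}, "expected": {"passed": true}},
  {"case_id": "loan-003", "input": {"loan_amount": 190000, "property_value": 200000}, "expected": {"passed": false}}
]
""")


def regressions(rule, golden: list) -> list:
    """Return the ids of historical cases whose outcome differs from the golden record."""
    return [g["case_id"] for g in golden if rule(g["input"]) != g["expected"]]


current = regressions(ltv_rule, GOLDEN)
proposed = regressions(tightened_ltv_rule, GOLDEN)
```

In CI, a non-empty list would fail the build unless the golden file is updated in the same reviewed change, which turns every behavioral shift into an explicit decision.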
Tools & popular choices
| Decision | Common choice | Notes |
|---|---|---|
| Encoding the rule | A domain DSL / rules-as-code; AI drafts → experts finalize | Pylon compiles regulatory English into executable, version-controlled DSLs. |
| Shaping model output | Schema-constrained generation (Zod / structured output) | Comp AI: structured from the start, not prose — 57 generateObject call-sites. |
| Proving the decision | Data lineage + confidence + sentence-level citations | Basis attaches lineage and confidence per output; Harvey cites at sentence level across 30k concurrent cells. |
| Mapping to regulation | Many-to-many control → framework mappings, version-controlled | Comp AI’s RequirementMap: one control satisfies many frameworks. |
| Authoring | SME-in-the-loop: experts write/validate; HITL on hard cases | Pylon with mortgage experts; Confido reviewers on low-confidence / high-$. |
| Testing | Golden-file / snapshot regression over historical cases in CI | Gates rule changes against real history — see Testing output that isn’t reproducible. |
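The many-to-many mapping in the table can be pictured as a list of (control, framework) pairs indexed in both directions. This is only an illustration of the data shape: the control ids and pairings are invented, and it is not Comp AI's RequirementMap.

```python
from collections import defaultdict

# Hypothetical pairings, for illustration only.
PAIRS = [
    ("CTRL-MFA", "SOC 2"),
    ("CTRL-MFA", "ISO 27001"),
    ("CTRL-BACKUP", "SOC 2"),
    ("CTRL-BACKUP", "HIPAA"),
]


def build_indexes(pairs):
    """Index the pairs both ways so either side can be looked up directly."""
    by_control, by_framework = defaultdict(set), defaultdict(set)
    for control, framework in pairs:
        by_control[control].add(framework)
        by_framework[framework].add(control)
    return by_control, by_framework


by_control, by_framework = build_indexes(PAIRS)
```

Keeping the pairs as data under version control means a mapping change shows up as a reviewable diff.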
Reference architecture
There are two planes: an authoring plane and a runtime plane. In authoring, domain experts and the LLM together turn regulation into a deterministic, version-controlled rule layer (a DSL or schema-defined policy set), and a regression suite over historical cases gates every change before it lands. At runtime, the LLM proposes a structured output, which a deterministic check validates against the rule layer and stamps with data lineage, confidence, and citations. Clear cases act and write to an audit trail; low-confidence or high-stakes cases route to an expert. The model is never the rule: it drafts the rule offline and proposes inputs online, and the deterministic layer is what actually decides.
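The runtime plane can be sketched end to end in a few lines. Everything named here is an assumption for illustration: the deduction-cap rule, the `handle` function, the 0.9 confidence floor, and the in-memory `AUDIT_LOG` stand in for whatever rule layer and audit store a real system would use.

```python
AUDIT_LOG = []  # stand-in for a durable audit trail


def deterministic_check(proposal: dict) -> bool:
    """Hypothetical rule: a deduction may not exceed the invoice amount."""
    return 0 <= proposal["deduction_cents"] <= proposal["invoice_cents"]


def handle(proposal: dict, confidence: float, floor: float = 0.9) -> str:
    """Validate a model proposal, stamp it with lineage, and route it."""
    passed = deterministic_check(proposal)
    record = {
        "rule_id": "HYPOTHETICAL-DEDUCTION-CAP",
        "rule_version": "1.0.0",
        "inputs": dict(proposal),  # lineage: the exact data the rule saw
        "confidence": confidence,
        "passed": passed,
        # Only clear cases act; failures and low confidence go to an expert.
        "route": "act" if passed and confidence >= floor else "expert_review",
    }
    AUDIT_LOG.append(record)
    return record["route"]


clear = handle({"deduction_cents": 500, "invoice_cents": 1000}, confidence=0.95)
unsure = handle({"deduction_cents": 500, "invoice_cents": 1000}, confidence=0.50)
invalid = handle({"deduction_cents": 2000, "invoice_cents": 1000}, confidence=0.99)
```

Note that high model confidence does not rescue the third case: the deterministic check decides, and the model's confidence only affects routing among cases that already pass.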
Mermaid source
flowchart LR
  classDef io fill:#eef2f8,stroke:#94a3b8,stroke-width:1.5px,color:#0f172a;
  classDef ai fill:#eef0fe,stroke:#6366f1,stroke-width:1.5px,color:#0f172a;
  classDef gate fill:#fef6e7,stroke:#d9a441,stroke-width:1.5px,color:#0f172a;
  classDef human fill:#fdecec,stroke:#e0564f,stroke-width:1.5px,color:#0f172a;

  subgraph authoring["Authoring plane"]
    Reg[("Regulation / policy<br/>+ domain experts")]:::io
    Encode("Encode as DSL / policy-as-code<br/>AI drafts → SME finalizes"):::ai
    Tests{"Regression suite over<br/>historical cases"}:::gate
    Rules[("Deterministic rule layer<br/>version-controlled")]:::io
  end

  subgraph runtime["Runtime plane"]
    LLM("LLM proposes<br/>structured output"):::ai
    Check{"Deterministic check<br/>+ lineage / citation / confidence"}:::gate
    Act("Act + audit trail"):::io
    Review("Expert review<br/>low-confidence / high-stakes"):::human
  end

  Reg --> Encode --> Tests
  Tests -->|gate changes| Rules
  Rules --> Check
  LLM --> Check
  Check -->|pass| Act
  Check -->|fail / low confidence| Review
  Review --> Act
Best practices
- Encode the rule, don’t hope the model knows it. Dense regulation belongs in deterministic, tested code: let the LLM draft the encoding and propose inputs, but enforce the rule outside the model.
- Constrain generation to a schema. Where output feeds a rule, force structure so it’s machine-checkable, not prose to parse after the fact.
- Make explainability a ship gate. A rule-bound decision must carry its lineage, confidence, and citations; if the system can’t say why, it doesn’t go live (see Graduating an agent from assistant to actor).
- Author with the experts, gate the hard cases to them. SMEs write and validate the rules; route low-confidence or high-stakes items to a human instead of auto-executing.
- Test the rulebook like code. Snapshot and regression suites over historical cases catch a rule change that silently breaks — version the rules and gate changes in CI.
Seen in
- Pylon — compiles dense mortgage-underwriting guidelines into executable, version-controlled DSLs (AI drafts, mortgage experts finalize), enforced deterministically and gated by economic liability: a misencoded rule can cost tens of thousands of dollars on a single loan.
- Basis — explainability is the gate, not accuracy alone: outputs carry data lineage and confidence, and models are benchmarked on how clearly they explain their reasoning before a workflow ships.
- Comp AI — compliance-as-code in the open: policies generated as Zod-structured output (not free text), controls mapped many-to-many to frameworks, every rule auditable on GitHub.
- Harvey / Confido — proof at the point of decision: Harvey emits sentence-level citations with reasoning; Confido gates low-confidence and high-dollar items to human reviewers with full traceability.