Method & trust

Why you can trust a skill — and exactly where its value comes from.

A confidently wrong legal answer is worse than no tool at all. Everything about how these skills are built follows from that one sentence.

The trust model

Two layers, two different trust standards.

The judgment layer is AI-drafted and expert-reviewed: the model is good at describing how to approach a task, and review catches problems of tone and gaps in coverage. The authoritative layer is sourced from real law and expert-verified, because that is where a hallucinated fact would live. The judgment layer’s instruction to “sound authoritative and cite sources” would make such a wrong fact more damaging, not less.

SKILL.md Judgment layer

How to approach the task, the way an election expert would.

Identify the governing jurisdiction first
Cite the controlling law for every claim
Flag genuine uncertainty — never smooth it over
Draft in plain language, for human review

AI-drafted · expert-reviewed

references/ Authoritative layer

The actual law — the facts the judgment grounds itself in.

Specific statutes & administrative rules
Deadlines, cure windows, exemptions
Dated to the day the law was verified
Never AI-generated from memory

Sourced from real law · expert-verified
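The properties listed for the authoritative layer (cited, dated, expert-verified) could be enforced at the data level. What follows is a minimal sketch, not the project's actual file format: the `ReferenceFact` name, its fields, and every value in the example are hypothetical placeholders, and none of it is real law.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReferenceFact:
    """One entry in references/ (hypothetical schema, assumed for illustration)."""
    claim: str          # the legal fact, stated plainly
    citation: str       # the statute or rule the fact was read from
    jurisdiction: str   # where the fact is true
    verified_on: date   # the day an expert checked it against the source
    verified_by: str    # the reviewing expert

    def __post_init__(self) -> None:
        # A fact with no source or no reviewer must not enter the layer.
        if not self.citation.strip():
            raise ValueError("missing citation: fact cannot be traced to a source")
        if not self.verified_by.strip():
            raise ValueError("missing reviewer: fact has not been expert-verified")


# Placeholder values only; this is not a real statute or deadline.
fact = ReferenceFact(
    claim="Placeholder cure window, for illustration",
    citation="Example Code § 0.00 (placeholder)",
    jurisdiction="Example State",
    verified_on=date(2025, 1, 15),
    verified_by="Reviewer A",
)
```

Rejecting an uncited entry at construction time means a fact written from memory has no way into the layer without someone supplying a source and a reviewer.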

Axis 1: Agnostic judgment + intentional grounding

Value type 1 — encoded methodology

Make the model approach the task the way an expert would, even when the facts still come from the model. This is achievable today. The output is polished, but it may still be wrong on hard legal specifics.

Good enough alone for method-dominant tasks, where exact local law isn’t make-or-break.

Value type 2 — authoritative grounding

Connect the model to verified, current, jurisdiction-specific facts. This is the real moat, and it is not achievable by instructions alone. It needs references carried with the skill or retrieved at runtime.

Required for fact-dominant tasks such as cure deadlines or records releasability. Without grounding, answers to those tasks become liabilities.
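The split between the two value types can be read as a routing rule. The sketch below is one possible encoding, assuming a task has already been classified. The `TaskKind` enum, the `plan_response` function, and the three return labels are hypothetical names introduced here for illustration.

```python
from enum import Enum


class TaskKind(Enum):
    METHOD_DOMINANT = "method"  # exact local law is not make-or-break
    FACT_DOMINANT = "fact"      # e.g. cure deadlines, records releasability


def plan_response(kind: TaskKind, has_verified_references: bool) -> str:
    """Decide how a skill should proceed (hypothetical policy sketch)."""
    if kind is TaskKind.METHOD_DOMINANT:
        # Encoded methodology alone is good enough here.
        return "draft"
    if has_verified_references:
        # Fact-dominant and grounded: answer from the references, with citations.
        return "draft_with_citations"
    # Fact-dominant without grounding: do not answer from model memory.
    return "flag_for_expert"
```

For example, `plan_response(TaskKind.FACT_DOMINANT, False)` returns `"flag_for_expert"`, which is the case the text describes as a liability if answered anyway.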

Axis 2: Single-shot vs. agentic verification loop

A loop drafts, checks against explicit criteria, revises, and converges. It’s worth it only when the verification step is more reliable than the generation step — the model can be a mediocre generator but a good checker.

Strong fit

Records-request response
Plain-language rewrite
Claim-grounding against sources

Weak fit

Open-ended judgment with no checkable target
“Warm tone” — single-shot suffices

A loop is only as good as the check it runs. We build loops only where a real verification signal exists, because self-consistency is not correctness.
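The draft, check, revise, converge cycle described above can be sketched in a few lines. This is an illustrative skeleton under stated assumptions, not the project's implementation: `verification_loop`, `fake_draft`, and `has_citation` are hypothetical names, the drafting function is a stand-in for a model call, and the single criterion (a source marker must be present) is a placeholder for a real verification signal.

```python
from typing import Callable, List, Tuple


def verification_loop(
    draft: Callable[[str, List[str]], str],
    check: Callable[[str], List[str]],
    task: str,
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Draft, check against explicit criteria, and revise until the check
    passes or the rounds run out. Returns the last draft and open failures."""
    failures: List[str] = []
    text = ""
    for _ in range(max_rounds):
        text = draft(task, failures)   # failures from the last round act as feedback
        failures = check(text)
        if not failures:               # converged: every criterion is met
            break
    return text, failures


def fake_draft(task: str, feedback: List[str]) -> str:
    # Stand-in for a model call: adds a source marker only after being told to.
    base = f"Response to: {task}."
    return base + " [source: placeholder]" if feedback else base


def has_citation(text: str) -> List[str]:
    # Explicit, checkable criterion: the draft must carry a source marker.
    return [] if "[source:" in text else ["missing source marker"]


final, open_failures = verification_loop(fake_draft, has_citation, "records request")
```

The loop returns any failures still open instead of hiding them, so a draft that never converges reaches a human with its problems listed.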

Production model

AI-drafted, expert-reviewed, expert-tested.

1. Scope the task & jurisdiction

Pick a language-heavy, judgment-driven task that consumes officials’ time, and decide what jurisdiction the facts must be true to.

2. AI drafts the judgment layer

The model writes the thin, agnostic SKILL.md: how to approach the task like an expert. AI is genuinely good at this part.

3. Source the authoritative layer

Every fact is pulled from real statute or guidance and cited. Never AI-generated from memory. This is the moat and the risk, so it gets the strictest standard.

4. Expert review

A subject-matter expert (SME) reviews tone, completeness, and, above all, the legal facts. The SME is the trust gate. The pipeline cannot skip them.

5. Adversarial testing

The expert tests the skill in a real tool against real questions, both careful ones and terse, messy, real-world ones, to see whether it holds up under pressure.

6. Version, date & publish

The skill carries the date its law was verified, so staleness is always visible. Then it ships.
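One way to keep staleness visible is to derive a banner from the verification date each time the skill is used. The sketch below is an assumption-laden example: the `staleness_banner` function is hypothetical, and the 365-day threshold is an arbitrary placeholder, not a figure from this method.

```python
from datetime import date


def staleness_banner(verified_on: date, today: date, max_age_days: int = 365) -> str:
    """Render the verification date and warn once it is older than max_age_days.

    max_age_days is an assumed threshold for illustration only.
    """
    age = (today - verified_on).days
    label = f"Law verified on {verified_on.isoformat()} ({age} days ago)"
    if age > max_age_days:
        return label + ". STALE: re-verify before relying on this skill."
    return label + "."


print(staleness_banner(date(2025, 1, 1), date(2025, 1, 31)))
# prints: Law verified on 2025-01-01 (30 days ago).
```

Passing `today` in explicitly keeps the function deterministic and easy to test.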

Non-negotiable gates

The lines the build never crosses.

No legal fact ships unverified against a real source.
Uncertainty is flagged, not smoothed.
Every skill is tested by an expert against real questions before publish.
Any fact that can’t be traced to a real source is flagged for human SME verification — never published as settled.

SMEs are the trust gate. The plan cannot skip them.
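The gates above can be expressed as a pre-publish check that returns every reason a build cannot ship. This is a minimal sketch with hypothetical names (`Fact`, `SkillBuild`, `publish_blockers`). It assumes each fact records its source as a string, with `None` meaning no traceable source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fact:
    claim: str
    source: Optional[str]  # None or empty means no traceable source


@dataclass
class SkillBuild:
    facts: List[Fact]
    expert_reviewed: bool
    expert_tested: bool


def publish_blockers(build: SkillBuild) -> List[str]:
    """Return every reason this build may not ship; an empty list means clear."""
    blockers: List[str] = []
    for fact in build.facts:
        if not fact.source:
            # Untraceable facts are flagged for SME verification, never published as settled.
            blockers.append(f"needs SME verification: {fact.claim}")
    if not build.expert_reviewed:
        blockers.append("expert review not completed")
    if not build.expert_tested:
        blockers.append("expert testing against real questions not completed")
    return blockers


build = SkillBuild(
    facts=[Fact("placeholder claim A", "Example Code § 0.00 (placeholder)"),
           Fact("placeholder claim B", None)],
    expert_reviewed=True,
    expert_tested=False,
)
blockers = publish_blockers(build)
```

Here `blockers` lists the unsourced claim and the missing expert test, so the build stays unpublished until both are resolved.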

Deliberately out of scope

What these skills are not.

No hosted platform, marketplace, or agent runtime to maintain. Distribution stays light.
No tasks that demand deterministic correctness — ballot proofing, tabulation, chain-of-custody. Those want software, not an LLM.
No auto-sending and no replacing an official’s judgment.
No covering all fifty states at once. Depth in the pilot jurisdiction first.