Kihagyás

SV B-8 — RSI Tier-2 real-LLM Critic (skeleton)#

2026-05-19 — first landing of the real-LLM Critic Layer 4 of the multi-layer safety-gate. Skeleton-grade: opt-in env-flag, 2-phase pending pattern reused from crystallize-pending, $0 cost via subagent-fanout.

Why#

The bmad-vault-bridge --apply safety-gate had 4 deterministic rules at Layer 4 (length / forbidden-target / git-hook / Critic-stub). The stub was a placeholder. This skeleton turns the stub into a real 5-dim LLM rubric that can refuse propagation on semantic grounds — but stays opt-in until calibration data accumulates.

Architecture#

                         ┌────────────────────────────────┐
candidate bullet ────────▶│  critic-review.py             │
                         │   Phase 1: write request.json │
                         └─────────────┬──────────────────┘
                         ┌────────────────────────────────┐
                         │  Claude subagent (crystallize- │
                         │  pending pattern — $0 cost)   │
                         │   reads request.json          │
                         │   writes response.json        │
                         └─────────────┬──────────────────┘
                         ┌────────────────────────────────┐
                         │  critic-review.py              │
                         │   Phase 3: parse response     │
                         │   apply threshold-policy      │
                         │   write audit-log JSONL       │
                         │   return verdict pass/fail    │
                         └────────────────────────────────┘

5-dim rubric (0.0 - 1.0 float each)#

Dim Measures 0.0 anchor 1.0 anchor
factuality Is the bullet verifiable + correct? False / unverifiable KO-DB match @ conf >= 0.95
novelty Genuinely new info? Exact-dup of existing wiki First-of-kind playbook
durability 6+ months relevant? Session-debug noise Architecture pattern
vault_fit Suits the target file? ADR-content in MEMORY.md Canonical fit
safety PII / kred / forbidden? Leaks credential Fully public-safe

safety is a hard gate >= 0.9 in every mode.

Threshold modes (VAULT_CRITIC_MODE)#

Mode Logic When to use
strict (default) all 5 dims >= 0.85 AND safety >= 0.9 Aggressive ramp
default mean >= 0.7 AND min >= 0.5 AND safety >= 0.9 Conservative ramp
lenient mean >= 0.5 AND safety >= 0.9 Shadow calibration

Mode is read from env at score time. Default is strict to stay conservative until enough shadow data accumulates for ramp-down.

File map#

File Lines Role
.vault-ko/safety/critic-review.py ~260 Runner: write request, parse response, apply threshold, audit-log
.vault-ko/prompts/critic-review-template.md ~180 5-dim rubric, anchor examples, strict-JSON output
.vault-ko/safety/git-pre-commit-hook.sh ~140 Layer 3 + Layer 4 — invokes runner when VAULT_CRITIC_ACTIVE=1
.vault-ko/tests/test_critic_review.py ~180 5 pytest tests (phase-1, parse, strict, safety-hard-gate, back-compat)

Activation#

# Stay on the deterministic 4-rule stub (default — back-compat)
git commit ...

# Activate real-LLM Critic for THIS commit
VAULT_CRITIC_ACTIVE=1 \
VAULT_CRITIC_MODE=strict \
CRITIC_BULLET="<the candidate bullet text>" \
CRITIC_TARGET="<path/of/target/file.md>" \
CRYSTALLIZE_APPLYING=1 \
  git commit ...

The runner expects a Claude subagent to have written .vault-ko/safety/pending/<hash>-response.json BEFORE the commit attempts to land (typically dispatched by bmad-vault-bridge itself during the apply pipeline, via the crystallize-pending skill).

Fail-closed semantics#

If no response.json appears within VAULT_CRITIC_TIMEOUT (default 300s) or the JSON is malformed / missing dims, the runner returns verdict=fail and the hook blocks the commit. There is no fail-open path — that is intentional for Layer-4 safety.

Audit log#

Every Critic verdict appends one JSONL row to 06-Audits/critic-review-log.jsonl:

{"ts":"2026-05-19T15:00:00Z","hash":"abc123","target_file":"11-wiki/foo.md","verdict":"pass","mode":"strict","scores":{...},"reasoning":"..."}

Stub-mode commits also log (with "mode":"stub") so the audit-log gives a single timeline view across both modes.

What is NOT done (next-step)#

  • No automatic subagent-spawn from the runner — the caller (bmad-vault-bridge --apply) still has to orchestrate the Claude subagent invocation. Mirrors the crystallize-pending pattern.
  • No kodb_context injection yet — the prompt template references it but the runner doesn't load it. Pending hook into the existing vault-ko-query --top-k bridge.
  • No threshold calibration data — modes are tuned from the design doc, not measured. Needs 50-100 shadow-mode bullets before any ramp decision.
  • No modified_bullet flow — the response schema supports it but the runner currently only consumes scores + reasoning. Wiring up the modify-pathway is the next iteration.

Cost#

$0 per Critic invocation — Claude subagent dispatch happens on the Claude Code subscription, no Anthropic API key required. Matches the crystallize-pending skill cost profile.