Kihagyás

Subagent-fanout context-aware classification#

Origin: Originally written in Hungarian as part of MyForge Vault 11.11 — Superintelligent Vault project. Source: subagent-fanout-context-aware-classification (Hungarian version).

The subagent-fanout pattern (claude-code-subagent-fanout) scales well for bulk LLM mutation ($0 cost in Claude Code), but classification tasks (entity typing, label injection, summary tagging, NER) have quality that varies drastically based on whether the subagent receives context for the item being classified. A blind classifier (item name only) typically yields ~5% false-positive rate and low recall; a context-aware classifier (subagent reads the source documents where the entity is mentioned) has FP-rate <2%, with recall +20-40pp higher.

TL;DR#

  • Blind fanout = subagent INPUT is only entity_name → fast, cheap, but FP ~5%, low recall
  • Context-aware fanout = subagent INPUT is entity_name + 1-2 source excerpts → slower (~2-3× the time), but FP <2%, recall +20-40pp
  • Hybrid strategy: blind fanout in iteration 1 → keep high-confidence items → re-run ambiguous ones via context-aware in iteration 2
  • Reusable: applies to every classification task (entity typing, doc categorization, sentiment, intent detection)

Background — entity-typing pipeline evolution#

An entity-typing project for a graph with 8,997 entities went through three iterations:

Iteration Pattern Coverage FP-rate (sample)
1 stand-in classifier (rule-based, entity-name regex only) 14.87% ~8% (manual sample, 4 wrong out of 50)
2 blind subagent-fanout (7 batches, entity-name only) 28.9% → 72.8%* ~5%
3 context-aware subagent-fanout (entity + source excerpt) >90% target <2% target

*The 28.9% → 72.8% jump was the fanout COMBINED with the mgclient.autocommit fix — see mgclient-autocommit-silent-rollback. The increase attributable purely to fanout was significant, but the autocommit bug masked its real magnitude in the first 4 batches.

The blind iteration produced 2 primary error types: 1. Homonym entities — e.g. "Cache" can be an HTTP cache pattern, a character name in a game-design GDD, or a React useMemo cache. Without context, the subagent picks the most common meaning → wrong label for a specific vault segment. 2. Domain-specific meanings — e.g. "Pocock" in the context of Sequential Validity statistical methods, NOT the British political historian — a general-knowledge subagent can't tell. The context-aware iteration would resolve this from an excerpt of Pocock-design-monitoring-statistics.md.

The pattern#

Blind iteration (Phase 1)#

# Fast, cheap, ~80% accuracy
def blind_classify(entity_name):
    prompt = f"""Classify this vault entity into one of:
    Concept, Decision, Pattern, Skill, Project, Person, Tool, Other.

    Entity name: {entity_name}

    Reply with ONLY the label, no explanation."""
    return subagent.run(prompt)

Context-aware iteration (Phase 2)#

# Slower, more expensive, ~95-98% accuracy
def context_aware_classify(entity_name, source_mds):
    """source_mds: list of (path, excerpt) where entity is mentioned"""
    context = "\n\n".join(
        f"### {p}\n{e[:500]}" for p, e in source_mds[:3]
    )
    prompt = f"""Classify this vault entity based on actual usage context:

    Entity name: {entity_name}

    Context from documents where it appears:
    {context}

    Choose ONE label: Concept, Decision, Pattern, Skill, Project, Person, Tool, Other.

    Reasoning (1 sentence) then label on last line."""
    return subagent.run(prompt)

Hybrid (production-grade)#

def hybrid_classify(entities):
    # Round 1: blind, high-throughput
    round1 = {e: blind_classify(e) for e in entities}

    # Confidence filter: concrete label, NOT "Other", NOT null
    confident = {e: l for e, l in round1.items() if l != "Other"}
    ambiguous = [e for e in entities if e not in confident]

    # Round 2: context-aware ONLY for ambiguous
    for e in ambiguous:
        sources = find_mentions(e, vault)
        round1[e] = context_aware_classify(e, sources)

    return round1

This handles 60-80% of items via the cheap blind path and spends the expensive context-aware budget on the rest. Vault-scale ROI is ~3-5× more significant than pure context-aware.

When blind, when context-aware#

Task Blind enough Context-aware mandatory
Unique global proper noun (person name, brand) rarely
General concept without polysemy (HTTP, REST) only <2% domain-specific
Domain-specific code-word / internal project name MANDATORY
Polysemous entity (Cache, Pipeline, Worker) MANDATORY
Newly coined slug / freshly minted convention MANDATORY
Spec-violation detection MANDATORY (the spec is the context)

Rule of thumb: if your corpus contains domain-specific jargon (and every serious long-form corpus does), the minimum is 2 iterations.

Anti-pattern: "blind-only" big-bang#

Do NOT run a large entity set (>5,000) with blind only classification and accept the result. The 5% FP inflates the typing coverage metric (false-positives look "typed"), but downstream features (search rerank by label, label-filtered graph traversal) train on noise. The FP-rate cascades into the next layer's quality, which is harder to measure.

Another anti-pattern: skipping manual sampling. Whether after a blind or context-aware iteration — a 50-100-item random sample for manual verify is mandatory. A high-confidence subagent report does NOT guarantee that the label is actually correct. LLM self-evaluation bias mitigation (g-eval-bias-mitigation-pattern) tells us that the LLM tends to overstate its own confidence.

Reusable rules#

Task type Recommended pattern
Entity typing (graph matrix) Hybrid: blind → ambiguous context-aware
Doc categorization (markdown → folder) Blind enough if taxonomy is stable
Tag suggestion Context-aware mandatory (project conventions)
Sentiment analysis Blind, except ironic/sarcastic domains
Intent detection (chat) Context-aware (last 2-3 turns injected)
NER (general) Blind (specialized model is materially better than LLM-fanout)
Spec-violation detection Context-aware (the spec is the context)

Cost model#

Subagent-fanout in Claude Code is $0 ($0 / agent call), but there's a time cost: - Blind: ~2-3 sec/entity (simple prompt) - Context-aware: ~6-10 sec/entity (3-5 MD excerpts read + classification) - Hybrid 80/20: ~4 sec/entity average

For 8,997 entities, hybrid is ~10 agent-hours (parallel 8 subagents → ~1.25 hours wall clock). For production bulk tasks this is acceptable.

Complementary patterns#

  • Self-verification — every subagent output is verified by ANOTHER subagent (two-phase quorum). More expensive but FP drops below 1%.
  • Ensemble — 3 subagents classify independently, majority vote → even lower FP
  • Active-learning loop — errors found in manual samples are added to the few-shot prompt for the next iteration
  • Source-quality weightwiki/ sources outweigh raw/ sources, context-injection priority
  • Confidence-threshold escalation — blind output includes "confidence 0-1"; <0.7 → context-aware re-run

Audio overview#

  • EN narration (Charon voice): [[.vault-nb/audio-overviews/subagent-fanout-context-aware-classification.en.mp3]]
  • HU narration (Kore voice): [[.vault-nb/audio-overviews/subagent-fanout-context-aware-classification.hu.mp3]]

Generated via Gemini 3.1 Flash TTS preview. ~1-2 minutes each. See gemini-3-1-flash-tts-pipeline for the pipeline.