RRF hybrid-fusion retrieval pattern#
[!info] When to apply Have two (or N) retrieval systems with complementary errors (e.g., 30% both-hit + 20% only-A + 20% only-B + 30% neither = union > each-alone)? Reciprocal Rank Fusion (RRF, Cormack et al. 2009) merges their ranked lists into one production-grade top-K. Always worth piloting when overlap < 60% and union-ceiling > each-alone-recall by 15pp+.
A pattern#
Query
├─→ System A (top-N candidates ranked) ─┐
├─→ System B (top-N candidates ranked) ─┤
├─→ (System C, D, ... optional) ─┤
│ ├─→ RRF merge → top-K
│ score(d) = Σ_systems 1/(k_rrf + rank_in_system + 1)
│ │
└─→ Re-rank by score, return top-K ─┘
Default constants (Cormack-style, validated 2026-05-20 on vault): - k_rrf = 60 (1-100 range insensitive on small corpus) - fetch_k = 20 per-system (sweet-spot — see "Fetch-K sweep" below) - top_k = 5 (display window)
Why it works#
Two retrieval-systems with azonos recall (e.g., 55-55%) but különböző hibák (28% both-hit + 22/22% only-one) compose: the union of their top-K lists covers ~72% of ground-truth, which RRF can extract because it re-ranks by mutual agreement (a doc in both lists scores higher than a doc in one).
Concrete result — vault-search + agentmemory (2026-05-20)#
| Configuration | Recall@5 |
|---|---|
| vault-search alone (BM25+bge-m3 hybrid) | 54.5% |
| agentmemory alone (smart-search noop) | 76.4% |
| RRF fusion (k_rrf=60, fetch-k=20) | 77.5% average · 85.39% best · 69.66% worst |
Lift vs vault-search alone: +23pp average, +30pp best-case. Lift vs agentmemory alone: +1pp (close — agentmemory the stronger single), +9pp best.
Methodology-sensitivity: RRF fusion magasabb a "tuning"-query-distribution-on, alacsonyabb a held-out methodology-n. Production-recall realisztikus: 70-85% sávban query-mix-től függően.
Fetch-K sweep — monotone-decreasing pattern over 20#
| fetch-K | RRF Recall@5 (n=89, clean setup) |
|---|---|
| 10 | 79.78% |
| 20 | 85.39% ⭐ sweet-spot |
| 30 | 79.78% |
| 50 | 76.40% (= agentmemory alone, RRF nem nyer) |
Same monotone-decreasing pattern as the longmemeval-k5-sweet-spot finding (2026-05-19): több candidate ≠ jobb. A "wider pool helps fusion" BEIR/MTEB-lore NEM áll a vault-corpus-on. fetch-k=20 a sweet-spot.
When NOT to use#
| Anti-pattern | Why |
|---|---|
| Single retrieval-system available | RRF needs ≥2 ranked lists |
| Systems with HIGH overlap (>70% both-hit) | Marginal lift, latency-cost not worth it |
| Realtime-strict (<200ms) requirements | RRF adds the slower system's latency on top of the faster (parallel-call possible, sequential-call not) |
| Per-system tuning impossible | RRF doesn't help if both systems are equally bad on the query-distribution |
| Result-set size unknown / unbounded | RRF needs fetch-K — define it explicitly |
Implementation#
from collections import defaultdict
def rrf_fuse(lists: list[list[str]], k_rrf: int = 60, top_k: int = 5) -> list[str]:
"""Reciprocal Rank Fusion (Cormack et al. 2009).
Args:
lists: list of ranked doc-id lists, one per retrieval-system
k_rrf: fusion constant (Cormack default 60)
top_k: result-window
"""
scores = defaultdict(float)
for L in lists:
for rank, doc in enumerate(L):
if doc:
scores[doc] += 1.0 / (k_rrf + rank + 1)
return sorted(scores.keys(), key=lambda d: -scores[d])[:top_k]
Production deployment checklist#
- CLI wrapper for end-to-end orchestration (e.g.,
vault-search-fusion) - Graceful fallback to single-system if one is unreachable
- Per-system fetch-K tunable (sweep before deploy)
- systemd service for persistent ingest-storage (state-loss-recovery)
- Mirror-cron for new content auto-ingest (10-15 min lag acceptable)
- Persistent id→path map (mtime-cached, reload on change)
- JSON output mode for downstream tools
- Cross-validation on held-out methodology (NOT tuning-set)
- Latency budget defined (<200ms strict / <600ms acceptable / >1s bad)
Source verified#
- Implementation:
/usr/local/bin/vault-search-fusion(vault-search + agentmemory RRF wrapper) - Cron mirror:
*/10 * * * * agentmemory-ingest --since-min 15 - systemd:
/etc/systemd/system/agentmemory.service - Benchmark: 89-Q on vault sessions, 2026-05-20
- Production audit: ../06-Audits/2026-05-20 Production-stack v2 — RRF fusion CLI + systemd + cron-mirror + cross-validation
- Theoretical foundation: Cormack, G., Clarke, C., Buettcher, S. (2009). "Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods"
Kapcsolódó#
- longmemeval-k5-sweet-spot — analogue K-sweep monotone-decreasing
- hybrid-bm25-semantic-rrf-pattern — szűkebb scope (BM25+semantic egy rendszerben)
- ../06-Audits/2026-05-20 RRF hybrid-fusion pilot — 91 percent R@5 (vault-search + agentmemory)
- ../06-Audits/2026-05-20 agentmemory head-to-head LongMemEval-S R@5 — TIE 52.81 percent, 22pp ensemble-gain potential
- sv-01-memory-architecture — B-2 sprint retrieval-stack