echo-core/docs/okf-navigation-plan.md
Marius Mutu 5c9748ffb4 feat(memory): hybrid retrieval — navigation index.md + RAG hardening
Expose a navigation layer to the agent and harden RAG, after analyzing the
OKF note and testing on the real KB.

- memory_search.search(): dedupe best-chunk-per-file (a relevant note can no
  longer be buried by another file's chunks) + keyword fallback tagged
  degraded:True when Ollama is unreachable (no more hard crash).
- update_notes_index.py: emit per-folder index.md + root router; prune empty
  folders; fix latent subcategory->project bug.
- Exclude generated index.md from RAG rglob (reindex/incremental) + indexer
  scans + heartbeat freshness check (prevents self-pollution / reindex thrash).
- CLAUDE.md: reframe memory as hybrid (navigation first, RAG for fuzzy recall).
- Delete stale orphan kb/youtube/index.json; correct the OKF source note.
- Tests: dedup, keyword fallback, index.md exclusion. Plan + review in docs/.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 17:52:27 +00:00


Plan: Navigation layer for agent memory (OKF-inspired)

Source: analysis of the note memory/kb/youtube/2026-06-27_google-open-knowledge-format.md, plus an empirical test on the real KB (151 youtube notes, 581 notes total).

Context / problem

The Echo agent searches its memory ONLY through RAG (src/memory_search.py: Ollama all-minilm, 384-dim, plus a cosine scan in SQLite). CLAUDE.md declares it the "single source of truth". Empirical test: RAG misses the relevant note when the query is a conceptual paraphrase (e.g. for "cum organizez un KB pt agenți să folosească mai puțini tokens", i.e. "how do I organize a KB so agents use fewer tokens", the OKF note does not appear in the top 8). memory/kb/index.json exists (581 notes, regenerated today) but is consumed ONLY by the web dashboard (notes-data/ paths), not by the agent, and it weighs 84k tokens. There is also a stale orphan, kb/youtube/index.json (8/151 notes, 5 months old).
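
For orientation, the cosine scan mentioned above amounts to comparing the query vector against every stored chunk vector. The following is only a minimal sketch: it assumes a hypothetical `chunks(path, embedding)` table with embeddings stored as JSON text, since the actual schema in src/memory_search.py is not shown in this plan.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cosine_scan(conn, query_vec, top_k=8):
    """Brute-force scan: score every stored chunk, return the best (score, path) pairs."""
    # Hypothetical schema: chunks(path TEXT, embedding TEXT holding a JSON list).
    rows = conn.execute("SELECT path, embedding FROM chunks").fetchall()
    scored = [(cosine(query_vec, json.loads(emb)), path) for path, emb in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

A scan like this ranks purely by vector similarity, which is why a query phrased very differently from the note text can fall outside the top-k.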

Goal

Give the agent a cheap, robust navigation layer that complements RAG (without replacing it), catches the paraphrases that embeddings miss, and works as a fallback when the remote Ollama host goes down.

Recommendations (proposed scope)

R1: Delete the orphan kb/youtube/index.json

Stale since Jan 30 (8/151 notes). This is the "a stale index is worse than no index" trap. Effort: trivial.

R2: Generate a slim per-folder index.md automatically

Extend tools/update_notes_index.py to emit, alongside index.json, one index.md per kb/ subfolder (title + one-line description + tags). Proven pilot: the youtube/ index.md is 11k tokens vs 259k for reading everything (24×) vs 84k for the global index (7.7×). Pitfall: the script scans *.md recursively, so it must explicitly exclude index.md to avoid treating it as a note (which would pollute index.json). Regenerate from heartbeat.py on every new note.
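
The exclusion pitfall can be illustrated with a small sketch. It is not the code in tools/update_notes_index.py: the function names are hypothetical, the title is assumed to be the first `# ` heading, and the one-line description and tags are left out for brevity.

```python
from pathlib import Path

INDEX_NAME = "index.md"  # generated artifact; must never be scanned as a note


def iter_notes(folder: Path):
    """Yield note files under `folder`, skipping generated index.md files."""
    for path in sorted(folder.rglob("*.md")):
        if path.name != INDEX_NAME:
            yield path


def first_heading(text: str, fallback: str) -> str:
    """Return the first '# ' heading (assumed title convention) or the fallback."""
    for line in text.splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return fallback


def build_folder_index(folder: Path) -> str:
    """Render one list entry per note: title linked to its relative path."""
    lines = [f"# Index: {folder.name}", ""]
    for note in iter_notes(folder):
        title = first_heading(note.read_text(encoding="utf-8"), note.stem)
        lines.append(f"- [{title}]({note.relative_to(folder).as_posix()})")
    return "\n".join(lines) + "\n"


def write_folder_index(folder: Path) -> Path:
    """Write (or overwrite) the folder's index.md and return its path."""
    out = folder / INDEX_NAME
    out.write_text(build_folder_index(folder), encoding="utf-8")
    return out
```

Because `iter_notes` filters by file name, running the generator a second time does not list the index inside itself.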

R3: Expose navigation to the agent (hybrid with RAG)

In memory_search, first load the target folder's slim index.md alongside the RAG top-k, and combine the two. This catches both the paraphrase and the keyword. Add an instruction to CLAUDE.md on how to use the index.

R4: Treat the remote Ollama host as a SPOF

RAG depends on a remote host (10.0.20.161:11434). If it goes down, search() raises ConnectionError and the agent's memory disappears. The per-folder index.md is a fallback that needs no Ollama. Add graceful degradation to memory_search.search().
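
One possible shape for that graceful degradation is sketched below. It is an illustration under assumptions, not the real memory_search.search(): `embed_search` stands in for the Ollama-backed path, `notes` is a plain path-to-text mapping, and the returned dict layout (including the `degraded` flag) is hypothetical.

```python
def keyword_search(query, notes, top_k=8):
    """Rank notes by how often the query terms occur in their text (no embeddings)."""
    terms = query.lower().split()
    scored = []
    for path, text in notes.items():
        lowered = text.lower()
        score = sum(lowered.count(term) for term in terms)
        if score:
            scored.append({"path": path, "score": score})
    scored.sort(key=lambda hit: (-hit["score"], hit["path"]))
    return scored[:top_k]


def search(query, notes, embed_search, top_k=8):
    """Try the embedding path first; fall back to keywords if the host is unreachable."""
    try:
        return {"results": embed_search(query, top_k), "degraded": False}
    except ConnectionError as exc:
        # Explicit signal: the caller can tell the user the results are shallow.
        return {
            "results": keyword_search(query, notes, top_k),
            "degraded": True,
            "reason": str(exc),
        }
```

Returning a flag instead of raising changes the contract of search(), so callers and tests that expect the exception would need updating.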

R5: Do NOT do a big-bang conversion to YAML front matter

Only 6/586 notes have YAML; update_notes_index.py already extracts metadata from the **Tags:**/**Data:** convention. Standardize only going forward, in the new-note template.
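
To show why no YAML migration is needed, here is a minimal sketch of reading the inline convention. The tag separator (commas or whitespace, optional leading `#`) and the output keys are assumptions; the actual extractor in update_notes_index.py may differ.

```python
import re

# Matches lines such as "**Tags:** #rag, #kb" or "**Data:** 2026-06-27".
FIELD_RE = re.compile(r"^\*\*(Tags|Data):\*\*[ \t]*(.+)$", re.MULTILINE)


def extract_metadata(text):
    """Return {'tags': [...], 'date': '...'} for whichever fields are present."""
    meta = {}
    for key, value in FIELD_RE.findall(text):
        value = value.strip()
        if key == "Tags":
            # Assumed format: tags split by commas and/or spaces, '#' prefix optional.
            meta["tags"] = [t.lstrip("#") for t in re.split(r"[,\s]+", value) if t]
        else:
            meta["date"] = value
    return meta
```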

R6: Correct the OKF note

Mark "Google launched OKF" as unverified (a single YouTube source; easily confused with the Open Knowledge Foundation). Update the "Relevanță" (Relevance) section: indexes are not missing; what is missing is a navigable index EXPOSED to the agent.

NOT in scope

  • HTML graph visualization of the KB (deprioritized: high effort, low value).
  • Replacing RAG with pure navigation (hybrid, not substitution).
  • ANN/vector-extension migration for RAG speed (separate effort).

GSTACK REVIEW REPORT (/autoplan)

Voices: Claude subagent only — codex missing on this host (all phases [subagent-only]). Phases run: CEO, Eng, DX. Design skipped (no UI scope — HTML viz is out of scope).

Cross-phase themes (flagged independently in 2-3 phases = high confidence)

| Theme | Phases | Severity |
|---|---|---|
| T1: R3 routing is undefined. "Load the target folder's index.md" requires already knowing the folder; that IS the navigation problem. The 11k figure holds only for youtube alone; loading all 13 folders ≈ 43-84k, erasing the win. | CEO, Eng, DX | CRITICAL |
| T2: Wrong consumer. The autonomous agent (Claude CLI in heartbeat.py) has filesystem access and never calls search(). Wiring R3 into memory_search.search() only changes the human /search command, not the agent. | Eng, DX | HIGH |
| T3: Staleness trap recreated. R1 deletes a stale index (proof these rot). R2 creates 13+ new generated artifacts triggered only on new note, not edits → silent drift. | CEO, Eng, DX | HIGH |
| T4: Self-pollution into RAG. memory_search.reindex()/incremental_index() do rglob("*.md") with no exclusion → index.md gets embedded and returned as fake "notes" in top-k. (Plan only flagged the index.json pollution, missed the RAG DB one.) | Eng | HIGH |
| T5: Token win vs strawman baseline. Comparison is against "read all 259k" (nobody does that). Real baseline = RAG top-k (~1-3k tokens). Against that, index.md is more tokens, justified only by recall. | CEO | HIGH |
| T6: Cheaper alternatives unexamined. init_config already supports ollama.model/embedding_dim → swapping all-minilm(384) for nomic/bge + reindex is a one-line change. Plus likely chunk-dedup recall bug, plus SQLite FTS5 hybrid (no new infra). All target "RAG misses paraphrases" directly. | CEO | CRITICAL |
| T7: R4 is the one sound, decoupled item. search() raises ConnectionError on Ollama outage with no fallback (real SPOF). Ship independently. BUT it's a breaking contract change (existing tests assert it raises). | CEO, Eng, DX | keep |
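
The chunk-dedup issue named in T6 arises when results are ranked per chunk: several chunks of one long note can fill the top-k and push another relevant note out. A minimal sketch of best-chunk-per-file deduplication follows, assuming hits are dicts with hypothetical `path` and `score` keys (the real result shape of search() is not shown in this plan).

```python
def dedupe_best_chunk(hits, top_k=8):
    """Keep only the highest-scoring chunk per file, then return the top_k files."""
    best = {}
    for hit in hits:
        current = best.get(hit["path"])
        if current is None or hit["score"] > current["score"]:
            best[hit["path"]] = hit
    ranked = sorted(best.values(), key=lambda h: h["score"], reverse=True)
    return ranked[:top_k]
```

Deduplication should run before the top-k cut; otherwise the second file may already have been dropped.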

CEO consensus (subagent-only)

  • Right problem? DISAGREE w/ plan — likely weak embedding model + chunk-dedup bug, not missing navigation.
  • Premises stated? No — one query is not enough evidence; token win is vanity baseline.
  • 6-month regret: 3 parallel stale metadata copies (SQLite, index.json, index.md).
  • Alternatives explored? No — BM25/FTS5 hybrid, reranker, better embedder never compared.
  • Prior art: OKF unverified/possibly nonexistent; bespoke format = zero portability gain.

Eng consensus (subagent-only)

  • Architecture: R3 unbuildable as written (no folder signal into search()). R2-in-update_notes_index acceptable reuse but keep separated from notes-data/ rewriting.
  • Edge cases: T4 self-pollution; heartbeat mtime thrash; projects/ (236 notes, nested) breaks flat per-folder assumption.
  • Tests: R4 breaks search() contract — existing tests assert raise; need rewrite + new coverage for R2/R3/T4.

DX consensus (subagent-only)

  • Discoverability: CLAUDE.md:138 calls RAG "single source of truth" — a soft new instruction loses to it; agent keeps defaulting to RAG.
  • Human workflow: edit-without-new-file → silent index.md drift.
  • Degradation signal (R4): must return mode="degraded_navigation_only" + tell user, never silent.
  • Latent bug to fix first: update_notes_index.py:244 references n['subcategory'], a key never set (extractor sets project).

Decision Audit Trail

| # | Phase | Decision | Class | Principle | Rationale |
|---|---|---|---|---|---|
| 1 | Eng | Add index.md exclusion to BOTH update_notes_index scan AND memory_search rglob (reindex/incremental) | Mechanical | P1 completeness | T4 is silent corruption; non-negotiable IF R2 ships |
| 2 | Eng | R4 split from R2/R3, shipped standalone | Mechanical | P6 action | Highest value/lowest risk, no dependency |
| 3 | DX | R4 returns structured degraded mode + user signal, not silent | Mechanical | P1 | Silent shallow results worse than error |
| 4 | CEO/DX | R3 (hybrid into search()) deferred until routing + consumer resolved (T1/T2) | Taste | P5 explicit | Unbuildable as written |
| 5 | CEO | Add "fix RAG first" track (model test + chunk-dedup + FTS5) before bespoke index | USER CHALLENGE | P3/P4 | Cheaper, reuses infra, targets same symptom, but user's call |
| 6 | all | R1 (delete orphan) + R6 (fix note) ship anytime | Mechanical | P6 | Trivial, independent |

REVISED scope (post-review)

  • Ship now (safe, independent): R1 delete orphan, R6 fix note, R4 graceful degradation (with explicit signal + test rewrite), fix latent bug update_notes_index.py:244, chunk-dedup in search().
  • Test before building (cheap, reversible): swap embedding model (nomic-embed-text/bge-m3) + reindex; re-run the failing paraphrase query; prototype SQLite FTS5 hybrid.
  • Build only if the above doesn't fix recall: R2 index.md (with T3/T4 lifecycle + exclusion fixes, per-category granularity for projects/), R3 hybrid (after routing + consumer T1/T2 designed).
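
For the FTS5 prototype in the "test before building" track, a rough sketch is given below. It assumes the local SQLite build includes FTS5, uses a hypothetical `notes_fts` table, and merges keyword and embedding rankings with reciprocal rank fusion, which is a choice made for this illustration and not something the plan specifies.

```python
import sqlite3


def build_fts(notes):
    """Create an in-memory FTS5 table from a {path: text} mapping."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE notes_fts USING fts5(path UNINDEXED, body)")
    conn.executemany("INSERT INTO notes_fts(path, body) VALUES (?, ?)", notes.items())
    return conn


def fts_search(conn, query, top_k=8):
    """Return paths ranked by BM25; terms are OR-ed so partial matches still surface."""
    # Quote each term so punctuation is not parsed as FTS5 query syntax.
    quoted = ['"' + t.replace('"', '""') + '"' for t in query.split()]
    if not quoted:
        return []
    rows = conn.execute(
        "SELECT path FROM notes_fts WHERE notes_fts MATCH ? "
        "ORDER BY bm25(notes_fts) LIMIT ?",  # bm25() is lower-is-better
        (" OR ".join(quoted), top_k),
    ).fetchall()
    return [row[0] for row in rows]


def rrf_merge(rankings, k=60, top_k=8):
    """Reciprocal rank fusion: sum 1/(k + rank) for each path across all rankings."""
    scores = {}
    for ranking in rankings:
        for rank, path in enumerate(ranking, start=1):
            scores[path] = scores.get(path, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]
```

In a prototype, `rrf_merge([rag_paths, fts_search(conn, query)])` would combine the two rankings; re-running the failing paraphrase query against that merged list is the cheap check the revised scope asks for.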