RAG is Dead Introducing Agenti

Limitations of Standard RAG #

Context Loss: Standard Retrieval-Augmented Generation (RAG) relies on semantic similarity and chunking, which causes the loss of global context and document structure.
Retrieval Failures: Traditional systems struggle with cross-document references and dependencies where information is spread across multiple files.
Semantic Limitations: Simple similarity searches often ignore specific document cross-references (common in legal or insurance documents) that a human reader would follow.

Agentic File Exploration Overview #

The Concept: A move away from "retrieval" toward "exploration," mimicking how humans navigate a large corpus of information.
No Indexing: Unlike RAG, this system does not use pre-built vector indices or embeddings for initial retrieval.
Architecture: Inspired by coding agents (like Claude Engineer), it uses an event-driven loop orchestration via LlamaIndex and a multi-layered backend.
Tool-Based Interaction: The agent uses six specific tools: folder scanning, document parsing (via Dockling), file previewing, reading, RegEx search, and path pattern finding.

Three-Phase Exploration Process #

Phase 1 (Scanning): The system converts PDFs/text into Markdown using Dockling and uses an LLM to scan document starts to identify potentially relevant files based on the query.
Phase 2 (Deep Dive): The agent reads identified documents in full. It identifies missing cross-references or headers that point to other documents.
Phase 3 (Backtracking & Collection): The agent utilizes a backtracking mechanism to fetch context from newly discovered references, aggregating all data to answer complex queries.

Project Implementation & Local Models #

Model Requirements: Small models (4B, 8B, 14B) failed to follow complex multi-step instructions and hallucinated tool use. Qwen 2.5 32B is the recommended minimum for reliable performance.
Local Support: While the original version used Gemini 1.5 Flash, a new branch supports local execution via Ollama.
Hardware (NVIDIA DGX Spark): The developer uses DGX Spark with 128GB of unified memory to support the 32B model and a 64,000-token context window.
Inference Speed: The system is not designed for real-time chatbot interaction; complex queries can take several minutes (e.g., 4 minutes for a multi-file risk assessment).

Summary #

Agentic File Exploration is an open-source alternative to RAG designed to solve the "lost in the middle" and context fragmentation issues of semantic search. By using an agentic workflow—where an LLM uses tools to browse, read, and follow cross-references within a document folder—the system can synthesize answers from disparate files that traditional chunking would miss. While slower than standard RAG and requiring more compute (ideally a 32B+ model and significant VRAM), it excels at generating comprehensive reports from complex, interconnected document sets like legal or financial records.

last updated: 2026-03-11