PageIndex

PageIndex uses an agentic system to find the documents relevant to answering a query, in contrast to classic RAG based on semantic similarity.

Traditional vector-based RAG relies on semantic similarity rather than true relevance. But similarity ≠ relevance — what we truly need in retrieval is relevance, and that requires reasoning. When working with professional documents that demand domain expertise and multi-step reasoning, similarity search often falls short.

Inspired by AlphaGo, we propose PageIndex — a vectorless, reasoning-based RAG system that builds a hierarchical tree index from long documents and uses LLMs to reason over that index for agentic, context-aware retrieval. It simulates how human experts navigate and extract knowledge from complex documents through tree search, enabling LLMs to think and reason their way to the most relevant document sections. PageIndex performs retrieval in two steps:

1. Generate a “Table-of-Contents” tree structure index of documents
2. Perform reasoning-based retrieval through tree search
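
The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not PageIndex's actual API: the `Node` class, `tree_search`, and `keyword_chooser` are hypothetical names, and the keyword-overlap chooser is a toy stand-in for the LLM call that would read the child titles and summaries and reason about which sections to open.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    """One entry of a table-of-contents style tree index (hypothetical schema)."""
    title: str
    summary: str
    pages: Tuple[int, int]  # inclusive page range covered by this section
    children: List["Node"] = field(default_factory=list)


Chooser = Callable[[str, List[Node]], List[Node]]


def tree_search(root: Node, query: str, choose: Chooser) -> List[Node]:
    """Walk the tree from the root; at each level `choose` decides which
    child sections are worth opening. Leaf sections reached this way are
    returned as the retrieved context."""
    results: List[Node] = []
    frontier = [root]
    while frontier:
        node = frontier.pop()
        if not node.children:
            results.append(node)
            continue
        frontier.extend(choose(query, node.children))
    return results


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_chooser(query: str, children: List[Node]) -> List[Node]:
    """Toy placeholder for the reasoning step: keep children whose title or
    summary shares a word with the query. A real system would instead prompt
    an LLM with the query and the children's titles and summaries."""
    q = _tokens(query)
    return [c for c in children if q & _tokens(c.title + " " + c.summary)]


# Step 1 (assumed already done): a small hand-built tree index.
root = Node("Annual Report", "", (1, 100), [
    Node("Financial Statements",
         "balance sheet with assets and liabilities, income, cash flow",
         (10, 60), [
             Node("Balance Sheet", "assets and liabilities", (10, 25)),
             Node("Cash Flow", "operating and investing cash flow", (26, 40)),
         ]),
    Node("Risk Factors", "market and regulatory risk", (61, 80)),
])

# Step 2: retrieval by descending the tree.
hits = tree_search(root, "What were total liabilities?", keyword_chooser)
print([(n.title, n.pages) for n in hits])
```

Passing the chooser in as a function keeps the traversal separate from the relevance decision, so the toy heuristic could be swapped for an LLM-backed one without touching `tree_search`.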

Matteo's notes