Focuses on providing a complete solution with sensible defaults, while still giving the knowledgeable user precise control over every step
Designed to be transparent, so you can inspect outputs at intermediate steps
{rdocdump} - Dump source code, documentation and vignettes of R packages into a single file. Supports installed packages, tar.gz archives, and package source directories
The output is a single plain text file or a character vector, which is useful for ingesting complete package documentation into a large language model (LLM) or for passing it on to other tools, such as {ragnar}, to create a Retrieval-Augmented Generation (RAG) workflow.
{RAGFlowChainR} - Brings Retrieval-Augmented Generation (RAG) capabilities to R, inspired by LangChain. It enables intelligent retrieval of documents from a local vector store (DuckDB), enhanced with optional web search, and seamless integration with Large Language Models (LLMs).
In RAG applications, the first step is fetching the correct information, and getting it right is important because you can’t overload an agent with irrelevant information.
To make retrieval precise, the chunks need to be quite small and relevant to the search query.
However, if you make the chunks too small, you risk giving the LLM too little context. With chunks that are too large, the search system may become imprecise.
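The chunk-size trade-off above can be explored with a small sketch. This is only an illustration, not how any of the packages mentioned here chunk text: the function name `chunk_text`, the word-based splitting, and the default sizes are all assumptions. Overlapping neighbouring chunks is one common way to keep chunks small while preserving some surrounding context.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words, so a sentence cut at a
    chunk boundary still appears with some context in the next chunk.
    (Hypothetical helper; the sizes are illustrative, not recommendations.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the text,
        # so we do not emit a trailing chunk that is pure overlap.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Smaller `chunk_size` values make each chunk more specific to a query, while a larger `overlap` gives the LLM more surrounding context at the cost of storing more duplicated text.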
Retrieval
Often benefits from hybrid search (supported by both Qdrant and LlamaIndex), although hybrid search alone may not be enough.
Semantic search can connect things that answer the question without using the exact wording.
Sparse methods can identify exact keywords, but sparse methods like BM25 are token-based by default, so plain BM25 won’t match substrings.
If you also want to search for substrings (parts of product IDs, for example), you need to add a search layer that supports partial matches as well.
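A rough sketch of why token-based matching misses substrings, and what a partial-match layer might look like. This is not BM25 and not the Qdrant or LlamaIndex API: `token_match` is a simple token-overlap stand-in for token-based sparse retrieval, and the character trigram index is just one assumed way to support partial matches. All function names and the sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_match(query: str, docs: list[str]) -> list[int]:
    """Return indices of docs sharing at least one whole token with the query.

    Stand-in for token-based sparse retrieval: only whole tokens can match.
    """
    query_tokens = set(tokenize(query))
    return [i for i, doc in enumerate(docs) if query_tokens & set(tokenize(doc))]


def ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of the lowercased text."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def build_ngram_index(docs: list[str], n: int = 3) -> dict[str, set[int]]:
    """Map each character n-gram to the set of doc indices containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for i, doc in enumerate(docs):
        for gram in ngrams(doc, n):
            index[gram].add(i)
    return index


def substring_search(query: str, docs: list[str],
                     index: dict[str, set[int]], n: int = 3) -> list[int]:
    """Return indices of docs that contain the query as a substring."""
    q = query.lower()
    grams = ngrams(q, n)
    if grams:
        # Candidates must contain every n-gram of the query.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # Query shorter than n characters: fall back to scanning all docs.
        candidates = set(range(len(docs)))
    # Verify candidates, since sharing n-grams does not guarantee a match.
    return sorted(i for i in candidates if q in docs[i].lower())


if __name__ == "__main__":
    docs = [
        "Order AB-12345-X shipped on Monday",
        "Return policy for item ZX-99881",
        "General shipping information",
    ]
    index = build_ngram_index(docs)
    print(token_match("2345", docs))              # partial ID, whole-token matching
    print(substring_search("2345", docs, index))  # partial ID, trigram layer
```

In a hybrid setup, results from a layer like this would be merged with the semantic and sparse results rather than replacing them.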