AgenticRAGTracer
AgenticRAGTracer: A Hop-Aware Benchmark for Diagnosing Multi-Step Retrieval Reasoning in Agentic RAG
📁 Repository Structure
AgenticRAGTracer
├── multihop_pipeline.py # Core multi-hop generation pipeline
├── multihop_run.py # Parallel runner for multi-hop QA generation
├── multihop_prompt.yaml # The prompts used in the multi-hop generation pipeline
├── evaluation.py # Agentic RAG evaluation script
├── retriever_serving.py # Retriever service
├── retriever_config.yaml # Retriever configuration
└── README.md
🔎 Retriever Service
Start the retriever service first. Set the options below in retriever_config.yaml, then launch retriever_serving.py.
# retriever_config.yaml
gpu_id: ""
retrieval_method: "e5"
retrieval_model_path: "e5-base-v2"
index_path: "e5_flat_inner.index"
faiss_gpu: False
corpus_path: "wiki18_100w.jsonl"
# retriever_serving.py
python retriever_serving.py \
--config retriever_config.yaml \
--port 8000
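The index file name above (e5_flat_inner.index) suggests a flat inner-product index, that is, an exhaustive search that scores every document embedding against the query embedding by dot product. The following is a minimal pure-Python sketch of that idea only. It is not the code in retriever_serving.py, the function name `flat_inner_product_search` is hypothetical, and the toy 2-dimensional vectors stand in for real e5 embeddings.

```python
from typing import List, Sequence, Tuple


def flat_inner_product_search(
    query: Sequence[float],
    doc_vectors: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (doc_index, score) pairs for the top_k highest inner products.

    Hypothetical helper for illustration: every document is scored,
    which is what "flat" (exhaustive) search means.
    """
    scored = []
    for idx, vec in enumerate(doc_vectors):
        if len(vec) != len(query):
            raise ValueError(f"dimension mismatch at document {idx}")
        score = sum(q * d for q, d in zip(query, vec))
        scored.append((idx, score))
    # Highest score first; ties are broken by the lower document index.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Toy embeddings (assumed values, not real e5 outputs).
docs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
hits = flat_inner_product_search([1.0, 0.0], docs, top_k=2)
print(hits)
```

A real deployment would delegate this scoring to the index library rather than a Python loop; the sketch is only meant to show what the retriever returns conceptually: document positions ranked by similarity to the query.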
🔄 Multi-hop Data Generation
Use multihop_run.py together with multihop_pipeline.py to generate the multi-hop QA data. Set the following variables in multihop_pipeline.py before running multihop_run.py.
# multihop_pipeline.py
API_URL = "" # The URL address of the LLM API you are using
API_KEY = "" # API key
DEFAULT_MODEL = "" # The model you want to use for generation
# multihop_run.py
python multihop_run.py
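The repository structure describes multihop_run.py as a parallel runner. As a rough sketch of what such a runner can look like, the example below fans work out over a thread pool and collects failures separately so one bad item does not stop the batch. It is an assumption-based illustration, not the actual script: `generate_one` is a hypothetical stand-in for the real pipeline call and makes no API request.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Tuple


def generate_one(seed_doc: str) -> Dict[str, str]:
    """Hypothetical stand-in for one multi-hop generation call."""
    if not seed_doc:
        raise ValueError("empty seed document")
    return {"seed": seed_doc, "question": f"placeholder question about {seed_doc}"}


def run_parallel(
    seeds: List[str], max_workers: int = 4
) -> Tuple[List[Dict[str, str]], List[Tuple[str, str]]]:
    """Run generate_one over all seeds in parallel threads.

    Returns (results, failures); failures holds (seed, error message) pairs.
    """
    results: List[Dict[str, str]] = []
    failures: List[Tuple[str, str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(generate_one, seed): seed for seed in seeds}
        for future in as_completed(futures):
            seed = futures[future]
            try:
                results.append(future.result())
            except Exception as exc:  # keep going if a single item fails
                failures.append((seed, repr(exc)))
    return results, failures


results, failures = run_parallel(["doc_a", "doc_b", ""], max_workers=2)
print(len(results), "succeeded;", len(failures), "failed")
```

Threads suit this workload because each task mostly waits on a remote LLM API; results arrive in completion order, so sort them afterwards if the original order matters.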
📊 Evaluation
Use evaluation.py to evaluate an LLM on the benchmark. Set the following variables in the script first.
MAX_WORKERS = # Maximum number of working threads
MODEL_ID = "" # The name of the LLM you wish to evaluate
API_BASE = "" # API base URL
API_URL = "" # Full URL of the API
API_KEY = "" # API key
DEFAULT_MODEL = "" # The name of the LLM used for the LLM judge
INPUT_JSONL_LIST = [] # Paths to the benchmark JSONL files
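INPUT_JSONL_LIST takes a list of JSONL files, meaning one JSON object per line. As a minimal sketch of how such a list can be read, the helper below streams records from several files and reports the file and line number of any malformed entry. The name `iter_jsonl` and the `id` field in the demo are hypothetical; the benchmark's actual record schema is not shown in this README.

```python
import json
import os
import tempfile
from typing import Dict, Iterator, List


def iter_jsonl(paths: List[str]) -> Iterator[Dict]:
    """Yield one parsed record per non-blank line across all given files."""
    for path in paths:
        with open(path, "r", encoding="utf-8") as handle:
            for line_no, line in enumerate(handle, start=1):
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                try:
                    yield json.loads(line)
                except json.JSONDecodeError as exc:
                    raise ValueError(f"{path}:{line_no}: invalid JSON") from exc


# Demo with a throwaway file; the records here are made up for illustration.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.jsonl")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write('{"id": 1}\n\n{"id": 2}\n')
    records = list(iter_jsonl([demo_path]))

print(records)
```

Streaming with a generator keeps memory use flat for large files; wrap the call in `list(...)` only when the whole benchmark needs to be held at once, for example to hand it to a thread pool sized by MAX_WORKERS.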
