
WikiChat

WikiChat is an improved retrieval-augmented generation (RAG) system. It stops large language models from hallucinating by retrieving data from a corpus.


<p align="center"> <img src="./public/img/logo_light.png" width="120px" alt="WikiChat Logo" style="display: block; margin: 0 auto;" /> <h1 align="center"> <b>WikiChat</b> <br> <a href="https://arxiv.org/abs/2305.14292"> <img src="https://img.shields.io/badge/cs.CL-2305.14292-b31b1b" alt="arXiv"> </a> <a href="https://github.com/stanford-oval/WikiChat/stargazers"> <img src="https://img.shields.io/github/stars/stanford-oval/WikiChat?style=social" alt="Github Stars"> </a> </h1> </p> <p align="center"> Stopping the Hallucination of Large Language Models </p> <p align="center"> <!-- <a href="https://stanford.edu" target="_blank"> <img src="./public/img/stanford.png" width="140px" alt="Stanford University" /> </a> --> </p> <p align="center"> Online demo: <a href="https://wikichat.genie.stanford.edu" target="_blank"> https://wikichat.genie.stanford.edu </a> <br> </p>

https://github.com/user-attachments/assets/3ac856ba-682c-4aed-9271-ce2f6a27cd5e


Introduction

Large language model (LLM) chatbots like ChatGPT and GPT-4 get things wrong a lot, especially if the information you are looking for is recent ("Tell me about the 2024 Super Bowl.") or about less popular topics ("What are some good movies to watch from [insert your favorite foreign director]?"). WikiChat uses Wikipedia and the following 7-stage pipeline to make sure its responses are factual. Each numbered stage involves one or more LLM calls.

<p align="center"> <img src="./public/img/pipeline.svg" width="700px" alt="WikiChat Pipeline" /> </p>

Check out our paper for more details: Sina J. Semnani, Violet Z. Yao*, Heidi C. Zhang*, and Monica S. Lam. 2023. WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia. In Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore. Association for Computational Linguistics.

🚨 Announcements

  • (April 29, 2025) WikiChat 2.1 is now available! Key updates include:

  • (August 22, 2024) WikiChat 2.0 is now available! Key updates include:

    • Multilingual Support: By default, retrieves information from 10 different Wikipedias: 🇺🇸 English, 🇨🇳 Chinese, 🇪🇸 Spanish, 🇵🇹 Portuguese, 🇷🇺 Russian, 🇩🇪 German, 🇮🇷 Farsi, 🇯🇵 Japanese, 🇫🇷 French, and 🇮🇹 Italian.

    • Improved Information Retrieval

      • Now supports retrieval from structured data such as tables, infoboxes, and lists, in addition to text.
      • Has the highest-quality public Wikipedia preprocessing scripts.
      • Uses the state-of-the-art multilingual retrieval model BGE-M3.
      • Uses Qdrant for scalable vector search.
      • Uses RankGPT to rerank search results.
    • Free Multilingual Wikipedia Search API: We offer a high-quality, free (but rate-limited) search API for access to 10 Wikipedias, encompassing over 180M vector embeddings.

    • Expanded LLM Compatibility: Supports 100+ LLMs through a unified interface, thanks to LiteLLM.

    • Optimized Pipeline: Option for a faster and more cost-effective pipeline by merging the "generate" and "extract claim" stages of WikiChat.

    • LangChain Compatibility: Fully compatible with LangChain 🦜️🔗.

    • And Much More!

  • (June 20, 2024) WikiChat won the 2024 Wikimedia Research Award!

    <blockquote class="twitter-tweet"><p lang="en" dir="ltr">The <a href="https://twitter.com/Wikimedia?ref_src=twsrc%5Etfw">@Wikimedia</a> Research Award of the Year 2024 goes to &quot;WikiChat: Stopping the hallucination of large language model chatbots by few-shot grounding on Wikipedia&quot; ⚡<br><br>📜 <a href="https://t.co/d2M8Qrarkw">https://t.co/d2M8Qrarkw</a> <a href="https://t.co/P2Sh47vkyi">pic.twitter.com/P2Sh47vkyi</a></p>&mdash; Wiki Workshop 2024 (@wikiworkshop) <a href="https://twitter.com/wikiworkshop/status/1803793163665977481?ref_src=twsrc%5Etfw">June 20, 2024</a></blockquote>
  • (May 16, 2024) Our follow-up paper "🍝 SPAGHETTI: Open-Domain Question Answering from Heterogeneous Data Sources with Retrieval and Semantic Parsing" is accepted to the Findings of ACL 2024. This paper adds support for structured data like tables, infoboxes and lists.

  • (January 8, 2024) Distilled LLaMA-2 models are released. You can run these models locally for a cheaper and faster alternative to paid APIs.

  • (December 8, 2023) We present our work at EMNLP 2023.

  • (October 27, 2023) The camera-ready version of our paper is now available on arXiv.

  • (October 06, 2023) Our paper is accepted to the Findings of EMNLP 2023.

Installation

Installing WikiChat involves the following steps:

  1. Install dependencies
  2. Configure the LLM of your choice. WikiChat supports over 100 LLMs, including models from OpenAI, Azure, Anthropic, Mistral, HuggingFace, Together.ai, and Groq.
  3. Select an information retrieval source. This can be any HTTP endpoint that conforms to the interface defined in retrieval/retriever_server.py. We provide instructions and scripts for the following options:
    1. Use our free, rate-limited API for Wikipedia in 25 languages.
    2. Download and host our provided Wikipedia index yourself.
    3. Create and run a new custom index from your own documents.
  4. Run WikiChat with your desired configuration.
  5. [Optional] Deploy WikiChat for multi-user access. We provide code to deploy a simple front-end and backend, as well as instructions to connect to an Azure Cosmos DB database for storing conversations.
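
Step 3 above allows any HTTP endpoint to act as the retrieval source. As a rough sketch of what a client call to such an endpoint could look like, the snippet below builds (but does not send) a JSON POST request using only the Python standard library. The URL, the `query` and `num_blocks` field names, and the `build_search_request` helper are all assumptions made for illustration; the actual contract is whatever retrieval/retriever_server.py defines.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the address of your own retriever server.
RETRIEVER_URL = "http://localhost:5100/search"


def build_search_request(queries, num_blocks=3, url=RETRIEVER_URL):
    """Build a JSON POST request for a retrieval endpoint.

    The payload shape (`query`, `num_blocks`) is an assumption for this sketch,
    not the documented interface of retrieval/retriever_server.py.
    """
    if not queries:
        raise ValueError("at least one query is required")
    payload = {"query": list(queries), "num_blocks": num_blocks}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(["Who won the 2024 Super Bowl?"])
# To actually send it, once a server is running:
#   with urllib.request.urlopen(request, timeout=10) as response:
#       results = json.load(response)
```

Keeping request construction separate from sending makes it easy to point the same code at the hosted API, a self-hosted index, or a custom index by changing only the URL.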

System Requirements

This project has been tested with Python 3.11 on Ubuntu 20.04 LTS (Focal Fossa), but it should be compatible with many other Linux distributions. If you plan to use this on Windows WSL or macOS, or with a different Python version, be prepared for potential troubleshooting during installation.
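
Before installing, it can help to check how far your setup is from the tested one. The following is a minimal sketch, not part of WikiChat itself; the `environment_warnings` helper and its messages are hypothetical, and it only compares the Python version and operating system against the tested configuration described above.

```python
import platform
import sys

# Tested configuration per the paragraph above: Python 3.11 on Linux.
TESTED_PYTHON = (3, 11)


def environment_warnings(version=None, system=None):
    """Return warnings for setups that differ from the tested configuration."""
    version = tuple(version or sys.version_info[:2])[:2]
    # Note: WSL reports itself as "Linux", so it is not flagged by this check.
    system = system or platform.system()
    warnings = []
    if version != TESTED_PYTHON:
        warnings.append(
            f"Python {version[0]}.{version[1]} is untested; 3.11 is the tested version."
        )
    if system != "Linux":
        warnings.append(f"{system} is untested; expect some troubleshooting.")
    return warnings


for message in environment_warnings():
    print("Warning:", message)
```

An empty list means the interpreter and operating system match the tested setup; it does not verify the Ubuntu release or any installed dependencies.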

Hardware requirements vary based on your intended use:

  1. Basic Usage: Running WikiChat with LLM APIs and our Wikipedia search API has minimal hardware requirements and should work on most systems.
