
QAnything

Question and Answer based on Anything.

<div align="center"> <a href="https://github.com/netease-youdao/QAnything"> <img src="docs/images/qanything_logo.png" alt="Logo" width="800"> </a>

Question and Answer based on Anything

<p align="center"> <a href="./README.md">English</a> | <a href="./README_zh.md">简体中文</a> </p> </div> <div align="center">

<a href="https://qanything.ai"><img src="https://img.shields.io/badge/try%20online-qanything.ai-purple"></a>      <a href="https://read.youdao.com#/home"><img src="https://img.shields.io/badge/try%20online-read.youdao.com-purple"></a>     

<a href="./LICENSE"><img src="https://img.shields.io/badge/license-AGPL--3.0-yellow"></a>      <a href="https://github.com/netease-youdao/QAnything/pulls"><img src="https://img.shields.io/badge/PRs-welcome-red"></a>      <a href="https://twitter.com/YDopensource"><img src="https://img.shields.io/badge/follow-%40YDOpenSource-1DA1F2?logo=twitter&style={style}"></a>     

<a href="https://discord.gg/5uNpPsEJz8"><img src="https://img.shields.io/discord/1197874288963895436?style=social&logo=discord"></a>     

</div>

🚀 Important Updates

<h1><span style="color:red;">Important things should be said three times.</span></h1>

[2024-08-23: QAnything updated to version 2.0.]

[2024-08-23: QAnything updated to version 2.0.]

[2024-08-23: QAnything updated to version 2.0.]

<h2>
  • <span style="color:green">This update brings improvements in various aspects such as usability, resource consumption, search results, question and answer results, parsing results, front-end effects, service architecture, and usage methods.</span>
  • <span style="color:green">At the same time, the old Docker version and Python version have been merged into a new unified version, using a single-line command with Docker Compose for one-click startup, ready to use out of the box.</span>
</h2>
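As of 2.0, the merged version starts with a single Docker Compose command. A hedged sketch of a typical startup (the exact compose file name and service layout are defined by the repository itself; check the repo for your platform):

```shell
# Fetch the repository and bring the stack up in the background.
# The compose configuration shipped with the repo defines the services
# (parsing, OCR, embed, rerank, web front end).
git clone https://github.com/netease-youdao/QAnything.git
cd QAnything
docker compose up -d   # one-line startup, per the 2.0 release notes
```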

Contributing

We appreciate your interest in contributing to our project. Whether you're fixing a bug, improving an existing feature, or adding something completely new, your contributions are welcome!

Thanks to all contributors for their efforts

<a href="https://github.com/netease-youdao/QAnything/graphs/contributors"> <img src="https://contrib.rocks/image?repo=netease-youdao/QAnything" /> </a>

Special thanks!

<h2><span style="color:red;">Please note: Our list of contributors is automatically updated, so your contributions may not appear immediately on this list.</span></h2> <h2><span style="color:red;">Special thanks!:@ikun-moxiaofei</span></h2> <h2><span style="color:red;">Special thanks!:@Ianarua</span></h2>

Business contact information:

010-82558901

What is QAnything?

QAnything (Question and Answer based on Anything) is a local knowledge base question-answering system designed to support a wide range of file formats and databases, allowing for offline installation and use.

With QAnything, you can simply drop any locally stored file of any format and receive accurate, fast, and reliable answers.

Currently supported formats include: PDF (pdf), Word (docx), PPT (pptx), XLS (xlsx), Markdown (md), Email (eml), TXT (txt), Image (jpg, jpeg, png), CSV (csv), Web links (html), and more formats coming soon…

Key features

  • Data security: supports fully offline installation and use — the network cable can stay unplugged for the entire process.
  • Supports multiple file types with a high parsing success rate; supports cross-language question answering, switching freely between Chinese and English regardless of the file's language.
  • Supports question answering over massive data using two-stage retrieval (embedding recall followed by reranking), which solves the retrieval degradation problem at scale: the more data, the better the results, with no limit on the number of uploaded files and fast retrieval.
  • Hardware friendly: runs in a pure CPU environment by default, supports Windows, Mac, and Linux, and has no dependencies other than Docker.
  • User-friendly: no cumbersome configuration, one-click installation and deployment, ready to use out of the box; every dependent component (PDF parsing, OCR, embed, rerank, etc.) is fully independent and freely replaceable.
  • Supports a Kimi-style quick start, a fileless chat mode, a retrieval-only mode, and a custom Bot mode.

Architecture

<div align="center"> <img src="docs/images/qanything_arch.png" width = "700" alt="qanything_system" align=center /> </div>

Why 2 stage retrieval?

In scenarios with a large volume of knowledge base data, the advantages of a two-stage approach are very clear. If only first-stage embedding retrieval is used, accuracy degrades as the data volume grows, as the green line in the graph below shows. After second-stage reranking, however, accuracy rises steadily: the more data, the better the performance.

<div align="center"> <img src="docs/images/two_stage_retrieval.jpg" width = "500" alt="two stage retrieval" align=center /> </div>
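The recall-then-rerank control flow described above can be sketched in a few lines. This is a toy illustration only: `embed_score` and `rerank_score` below are hypothetical stand-ins for bce-embedding-base_v1 and bce-reranker-base_v1, showing the two-stage structure rather than the real models.

```python
def embed_score(query, doc):
    """Stage 1 stand-in: cheap set-overlap score over the whole corpus."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q | d), 1)

def rerank_score(query, doc):
    """Stage 2 stand-in: a costlier per-pair score, applied to candidates only."""
    q = query.lower().split()
    d = doc.lower().split()
    return sum(d.count(t) for t in q) / max(len(d), 1)

def two_stage_retrieve(query, corpus, recall_k=3, top_k=1):
    # Stage 1: recall a small candidate set from the full corpus.
    candidates = sorted(corpus, key=lambda doc: embed_score(query, doc),
                        reverse=True)[:recall_k]
    # Stage 2: rerank only the candidates with the expensive scorer.
    return sorted(candidates, key=lambda doc: rerank_score(query, doc),
                  reverse=True)[:top_k]

corpus = [
    "QAnything answers questions over local files",
    "two stage retrieval uses embedding recall then reranking",
    "the weather is nice today",
    "reranking stabilizes accuracy as data grows",
]
print(two_stage_retrieve("how does two stage retrieval rerank", corpus))
```

The point of the split is cost: the stage-1 scorer touches every document, so it must be cheap, while the stage-2 scorer sees only `recall_k` candidates and can afford to be much more accurate.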

QAnything uses the retrieval component BCEmbedding, which is distinguished by its bilingual and cross-lingual proficiency. BCEmbedding excels at bridging Chinese and English linguistic gaps, achieving

  • A high performance on <a href="https://github.com/netease-youdao/BCEmbedding/tree/master?tab=readme-ov-file#evaluate-semantic-representation-by-mteb" target="_Self">Semantic Representation Evaluations in MTEB</a>;
  • A new benchmark in the realm of <a href="https://github.com/netease-youdao/BCEmbedding/tree/master?tab=readme-ov-file#evaluate-rag-by-llamaindex" target="_Self">RAG Evaluations in LlamaIndex</a>.

1st Retrieval(embedding)

| Model | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | Avg |
|:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
| bge-base-en-v1.5 | 37.14 | 55.06 | 75.45 | 59.73 | 43.05 | 37.74 | 47.20 |
| bge-base-zh-v1.5 | 47.60 | 63.72 | 77.40 | 63.38 | 54.85 | 32.56 | 53.60 |
| bge-large-en-v1.5 | 37.15 | 54.09 | 75.00 | 59.24 | 42.68 | 37.32 | 46.82 |
| bge-large-zh-v1.5 | 47.54 | 64.73 | 79.14 | 64.19 | 55.88 | 33.26 | 54.21 |
| jina-embeddings-v2-base-en | 31.58 | 54.28 | 74.84 | 58.42 | 41.16 | 34.67 | 44.29 |
| m3e-base | 46.29 | 63.93 | 71.84 | 64.08 | 52.38 | 37.84 | 53.54 |
| m3e-large | 34.85 | 59.74 | 67.69 | 60.07 | 48.99 | 31.62 | 46.78 |
| bce-embedding-base_v1 | 57.60 | 65.73 | 74.96 | 69.00 | 57.29 | 38.95 | 59.43 |

2nd Retrieval(rerank)

| Model | Reranking | Avg |
|:-------------------------------|:--------:|:--------:|
| bge-reranker-base | 57.78 | 57.78 |
| bge-reranker-large | 59.69 | 59.69 |
| bce-reranker-base_v1 | 60.06 | 60.06 |

RAG Evaluations in LlamaIndex(embedding and rerank)

<img src="https://github.com/netease-youdao/BCEmbedding/blob/master/Docs/assets/rag_eval_multiple_domains_summary.jpg">

NOTE:

  • In the WithoutReranker setting, our bce-embedding-base_v1 outperforms all other embedding models.
  • With the embedding model fixed, our bce-reranker-base_v1 achieves the best performance.
  • The combination of bce-embedding-base_v1 and bce-reranker-base_v1 is SOTA.
  • If you want to use the embedding and rerank models separately, please refer to BCEmbedding.

LLM

The open-source version of QAnything is based on QwenLM and has been fine-tuned on a large number of professional question-answering datasets, which greatly enhances its question-answering ability. If you need to use it for commercial purposes, please follow the QwenLM license. For more details, please refer to: QwenLM
