FacTool: Factuality Detection in Generative AI
Factuality Leaderboard | Installation | Quick Start | ChatGPT Plugin with FacTool | Citation |
This repository contains the source code and plugin configuration for our paper.
This repository also contains the resources for Halu-J, which introduces an open-source model for critique-based hallucination judgment.
FacTool is a tool-augmented framework for detecting factual errors in texts generated by large language models (e.g., ChatGPT). FacTool currently supports four tasks:
- knowledge-based QA: FacTool detects factual errors in knowledge-based QA.
- code generation: FacTool detects execution errors in generated code.
- mathematical reasoning: FacTool detects calculation errors in mathematical reasoning.
- scientific literature review: FacTool detects hallucinated references to scientific literature.
News
- [2023/09/25] 🔥 Congratulations to Baichuan2-53B on achieving SOTA performance among Chinese LLMs on the ChineseFactEval benchmark!
- [2023/09/13] We release ChineseFactEval, a factuality benchmark for Chinese LLMs.
- [2023/07/25] We introduce FacTool, a tool-augmented framework for detecting factual errors in texts generated by LLMs.
[Figure: Demo of knowledge-based QA]
Factuality Leaderboard
Our factuality leaderboard shows the factual accuracy of different chatbots evaluated by FacTool.
| LLMs | Weighted Claim-Level Accuracy (%) | Response-Level Accuracy (%) |
| -------- | -------- | -------- |
| GPT-4 | 75.60 | 43.33 |
| ChatGPT | 68.63 | 36.67 |
| Claude-v1 | 63.95 | 26.67 |
| Bard | 61.15 | 33.33 |
| Vicuna-13B | 50.35 | 21.67 |
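The two columns measure different things. Below is a minimal sketch of how they can diverge. It assumes that a response counts as factual only when every one of its claims is factual, and that "weighted" means pooling claims across all responses, so that responses with more claims weigh more. Both definitions are assumptions rather than FacTool's published evaluation code, and the function names are hypothetical.

```python
from typing import List

# Each inner list holds the per-claim verdicts (True = factual) for one response.
Verdicts = List[List[bool]]


def response_level_accuracy(results: Verdicts) -> float:
    """Percentage of responses in which every claim is judged factual (assumed definition)."""
    if not results:
        return 0.0
    return 100 * sum(all(claims) for claims in results) / len(results)


def weighted_claim_level_accuracy(results: Verdicts) -> float:
    """Percentage of factual claims pooled over all responses, so a response
    with more claims carries more weight (assumed weighting)."""
    total = sum(len(claims) for claims in results)
    if total == 0:
        return 0.0
    return 100 * sum(sum(claims) for claims in results) / total


# Hypothetical verdicts for three responses with 2, 4, and 1 claims.
results = [[True, True], [True, False, True, True], [False]]
print(f"{weighted_claim_level_accuracy(results):.2f}")  # 71.43
print(f"{response_level_accuracy(results):.2f}")        # 33.33
```

Under these assumptions a single wrong claim makes the whole response non-factual, which is why the response-level column is consistently lower than the claim-level one. Note that `all([])` is `True`, so a response with no extracted claims would count as factual here.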
Installation
- For general users:

  ```bash
  pip install factool
  ```

- For developers:

  ```bash
  git clone git@github.com:GAIR-NLP/factool.git
  cd factool
  pip install -e .
  ```
Quick Start
API Key Preparation
- Get your OpenAI API key. It is used in all scenarios (knowledge-based QA, code, math, and scientific literature review).
- Get your Serper API key. It is used only in knowledge-based QA.
- Get your Scraper API key. It is used only in scientific literature review.
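Because each task needs a different subset of keys, it can help to check the environment before running a batch. The sketch below encodes the list above as a lookup table keyed by the `category` values FacTool accepts (`kbqa`, `code`, `math`, `scientific`). `missing_api_keys` and `REQUIRED_KEYS` are hypothetical helpers, not part of the `factool` package.

```python
import os
from typing import Iterable, List, Mapping

# Environment variables each task category needs, following the list above.
# Hypothetical table; the category names match FacTool's "category" field.
REQUIRED_KEYS = {
    "kbqa": ("OPENAI_API_KEY", "SERPER_API_KEY"),
    "code": ("OPENAI_API_KEY",),
    "math": ("OPENAI_API_KEY",),
    "scientific": ("OPENAI_API_KEY", "SCRAPER_API_KEY"),
}


def missing_api_keys(categories: Iterable[str],
                     env: Mapping[str, str] = os.environ) -> List[str]:
    """Return, sorted, the environment variables that the given categories
    need but that are unset or empty. Hypothetical helper, not part of factool."""
    needed = set()
    for category in categories:
        if category not in REQUIRED_KEYS:
            raise ValueError(f"unknown category: {category!r}")
        needed.update(REQUIRED_KEYS[category])
    return sorted(key for key in needed if not env.get(key))
```

For example, `missing_api_keys({entry["category"] for entry in inputs})` run before `Factool.run` reports any key you still need to export.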
General Usage
You can also refer directly to ./example/example.py and example_inputs.jsonl for general usage.
<details> <summary>General Usage (click to toggle the content)</summary>

```bash
export OPENAI_API_KEY=...   # required in all tasks
export SERPER_API_KEY=...   # required only in knowledge-based QA
export SCRAPER_API_KEY=...  # required only in scientific literature review
```

```python
# Initialize a list of inputs. "entry_point" is only needed when the category is "code".
# See example_inputs.jsonl for example inputs in each category.
inputs = [
    {"prompt": "<prompt1>", "response": "<response1>", "category": "<category1>", "entry_point": "<entry_point_1>"},
    {"prompt": "<prompt2>", "response": "<response2>", "category": "<category2>", "entry_point": "<entry_point_2>"},
    ...
]
```
where

- `prompt`: the prompt given to the model to generate the response.
- `response`: the response generated by the model.
- `category`: the category of the task; one of `kbqa`, `code`, `math`, or `scientific`.
- `entry_point`: the name of the function in the response's code snippet to be fact-checked. It can be "null" when the category is not `code`.
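A minimal sketch of checking inputs against the field descriptions above before passing them to `Factool.run`. `validate_input` and `load_inputs` are hypothetical helpers; the sketch assumes `example_inputs.jsonl` holds one JSON object per line (the usual JSONL convention) and treats a missing or null `entry_point` as an error only for the `code` category.

```python
import json
from typing import Any, Dict, List

VALID_CATEGORIES = {"kbqa", "code", "math", "scientific"}


def validate_input(entry: Dict[str, Any]) -> None:
    """Raise ValueError if an input dict does not match the documented fields."""
    for field in ("prompt", "response", "category"):
        if not isinstance(entry.get(field), str):
            raise ValueError(f"missing or non-string field: {field}")
    if entry["category"] not in VALID_CATEGORIES:
        raise ValueError(f"unknown category: {entry['category']!r}")
    # entry_point is only meaningful (and required) for code generation.
    if entry["category"] == "code" and not entry.get("entry_point"):
        raise ValueError("'entry_point' is required when category is 'code'")


def load_inputs(path: str) -> List[Dict[str, Any]]:
    """Read a JSONL file (one object per line), skipping blank lines."""
    inputs = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue
            entry = json.loads(line)
            try:
                validate_input(entry)
            except ValueError as err:
                raise ValueError(f"{path}, line {line_no}: {err}") from err
            inputs.append(entry)
    return inputs
```

Reporting the file name and line number makes a malformed entry easy to locate before any API calls are spent on the rest of the batch.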
```python
from factool import Factool

# Initialize a Factool instance (the API keys are read from the environment
# variables exported above). foundation_model can be either "gpt-3.5-turbo" or "gpt-4".
factool_instance = Factool("gpt-4")

inputs = [
    {
        "prompt": "Introduce Graham Neubig",
        "response": "Graham Neubig is a professor at MIT",
        "category": "kbqa"
    },
    # ... more inputs
]
response_list = factool_instance.run(inputs)
print(response_list)
```
</details>
Knowledge-based QA
<details> <summary>Detailed usage of factool on knowledge-based QA (click to toggle the content)</summary>

```bash
export OPENAI_API_KEY=...
export SERPER_API_KEY=...
```

```python
from factool import Factool

# Initialize a Factool instance (the API keys are read from the environment
# variables exported above). foundation_model can be either "gpt-3.5-turbo" or "gpt-4".
factool_instance = Factool("gpt-4")

inputs = [
    {
        "prompt": "Introduce Graham Neubig",
        "response": "Graham Neubig is a professor at MIT",
        "category": "kbqa"
    },
]
response_list = factool_instance.run(inputs)
print(response_list)
```
The returned response_list is a dictionary of the following form:

```
{
    "average_claim_level_factuality": avg_claim_level_factuality,
    "average_response_level_factuality": avg_response_level_factuality,
    "detailed_information": [
        {
            'prompt': prompt_1,
            'response': response_1,
            'category': 'kbqa',
            'claims': [claim_11, claim_12, ..., claim_1n],
            'queries': [[query_111, query_112], [query_121, query_122], ..., [query_1n1, query_1n2]],
            'evidences': [[evidences_with_source_11], [evidences_with_source_12], ..., [evidences_with_source_1n]],
            'claim_level_factuality': [{claim_11, reasoning_11, error_11, correction_11, factuality_11}, {claim_12, reasoning_12, error_12, correction_12, factuality_12}, ..., {claim_1n, reasoning_1n, error_1n, correction_1n, factuality_1n}],
            'response_level_factuality': factuality_1
        },
        {
            'prompt': prompt_2,
            'response': response_2,
            'category': 'kbqa',
            'claims': [claim_21, claim_22, ..., claim_2n],
            'queries': [[query_211, query_212], [query_221, query_222], ..., [query_2n1, query_2n2]],
            'evidences': [[evidences_with_source_21], [evidences_with_source_22], ..., [evidences_with_source_2n]],
            'claim_level_factuality': [{claim_21, reasoning_21, error_21, correction_21, factuality_21}, {claim_22, reasoning_22, error_22, correction_22, factuality_22}, ..., {claim_2n, reasoning_2n, error_2n, correction_2n, factuality_2n}],
            'response_level_factuality': factuality_2
        },
        ...
    ]
}
```
In this case, you should get output similar to the following:
```python
{
    'average_claim_level_factuality': 0.0,
    'average_response_level_factuality': 0.0,
    'detailed_information': [
        {
            'prompt': 'Introduce Graham Neubig',
            'response': 'Graham Neubig is a professor at MIT',
            'category': 'kbqa', 'search_type': 'online',
            'claims': [{'claim': 'Graham Neubig is a professor at MIT'}],
            'queries': [['Graham Neubig current position', 'Is Graham Neubig a professor at MIT?']],
            'evidences': [{'evidence': 'I am an Associate Professor of Computer Science at Carnegie Mellon University and CEO of Inspired Cognition. My research and development focuses on AI and ...', 'source': 'https://www.linkedin.com/in/graham-neubig-10b41616b'}, {'evidence': 'Missing: position | Show results with:position', 'source': 'https://www.linkedin.com/in/graham-neubig-10b41616b'}, {'evidence': 'My research is concerned with language and its role in human communication. In particular, my long-term research goal is to break down barriers in ...', 'source': 'https://miis.cs.cmu.edu/people/222215657/graham-neubig'}, {'evidence': 'My research focuses on handling human languages (like English or Japanese) with computers -- natural language processing. In particular, I am interested in ...', 'source': 'http://www.phontron.com/'}, {'evidence': 'Missing: current | Show results with:current', 'source': 'http://www.phontron.com/'}, {'evidence': 'Graham Neubig. I am an Associate Professor at the Carnegie Mellon University Language Technology Institute in the School of Computer Science, and work with ...', 'source': 'http://www.phontron.com/'}, {'evidence': 'Missing: MIT? | Show results with:MIT?', 'source': 'http://www.phontron.com/'}, {'evidence': 'Associate Professor, Language Technology Institute, Carnegie Mellon University Affiliated Faculty, Machine Learning Department, Carnegie Mellon University', 'source': 'https://www.phontron.com/research.php'}, {'evidence': 'Missing: MIT? | Show results with:MIT?', 'source': 'https://www.phontron.com/research.php'}, {'evidence': 'MIT Embodied Intelligence ... About the speaker: Graham ...', 'source': 'https://youtube.com/watch?v=CtcP5bvODzY'}],
            'claim_level_factuality': [
                {
                    'reasoning': 'The given text is non-factual. The evidence provided clearly states that Graham Neubig is an Associate Professor of Computer Science at Carnegie Mellon University, not at MIT.',
                    'error': 'The error in the text is the incorrect affiliation of Graham Neubig. He is not a professor at MIT.',
                    'correction': 'Graham Neubig is a professor at Carnegie Mellon University.',
                    'factuality': False,
                    'claim': 'Graham Neubig is a professor at MIT'
                }
            ],
            'response_level_factuality': False
        }
    ]
}
```
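To act on these results, you typically want only the claims FacTool rejected, together with its suggested corrections. A minimal sketch that walks `detailed_information` using the key names shown in the output above; `non_factual_claims` is a hypothetical helper, and it deliberately flags only verdicts whose `factuality` is exactly `False`, skipping any other value rather than guessing at its meaning.

```python
from typing import Any, Dict, List


def non_factual_claims(result: Dict[str, Any]) -> List[Dict[str, str]]:
    """Collect every claim judged non-factual, with the reported error and correction."""
    flagged = []
    for item in result.get("detailed_information", []):
        for verdict in item.get("claim_level_factuality", []):
            # Only explicit False verdicts are flagged; other values are skipped.
            if verdict.get("factuality") is False:
                flagged.append({
                    "prompt": item.get("prompt", ""),
                    "claim": verdict.get("claim", ""),
                    "error": verdict.get("error", ""),
                    "correction": verdict.get("correction", ""),
                })
    return flagged


# Trimmed copy of the example output above (queries and evidences omitted).
example = {
    "average_claim_level_factuality": 0.0,
    "average_response_level_factuality": 0.0,
    "detailed_information": [{
        "prompt": "Introduce Graham Neubig",
        "response": "Graham Neubig is a professor at MIT",
        "category": "kbqa",
        "claim_level_factuality": [{
            "claim": "Graham Neubig is a professor at MIT",
            "error": "The error in the text is the incorrect affiliation of Graham Neubig. He is not a professor at MIT.",
            "correction": "Graham Neubig is a professor at Carnegie Mellon University.",
            "factuality": False,
        }],
        "response_level_factuality": False,
    }],
}

for flagged in non_factual_claims(example):
    print(f"{flagged['claim']!r} -> {flagged['correction']!r}")
```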
</details>