FacTool: Factuality Detection in Generative AI

This repository contains the source code and plugin configuration for our paper.

This repository also contains the resources for Halu-J, an open-source model that acts as a critique-based hallucination judge.

Project Website

FacTool is a tool-augmented framework for detecting factual errors in text generated by large language models (e.g., ChatGPT). FacTool currently supports four tasks:

- knowledge-based QA: FacTool detects factual errors in knowledge-based QA.
- code generation: FacTool detects execution errors in code generation.
- mathematical reasoning: FacTool detects calculation errors in mathematical reasoning.
- scientific literature review: FacTool detects hallucinated scientific literature.

News

- [2023/09/25] 🔥 Congratulations to Baichuan2-53B on achieving SOTA performance among Chinese LLMs on the ChineseFactEval benchmark!
- [2023/09/13] We release ChineseFactEval, a factuality benchmark for Chinese LLMs.
- [2023/07/25] We introduce FacTool, a tool-augmented framework for detecting factual errors in text generated by LLMs.

Demo of Knowledge-based QA:

[Image: demo of FacTool on knowledge-based QA]

Factuality Leaderboard

Our factuality leaderboard shows the factual accuracy of different chatbots evaluated by FacTool.

| LLMs | Weighted Claim-Level Accuracy | Response-Level Accuracy |
| -------- | -------- | -------- |
| GPT-4 | 75.60 | 43.33 |
| ChatGPT | 68.63 | 36.67 |
| Claude-v1 | 63.95 | 26.67 |
| Bard | 61.15 | 33.33 |
| Vicuna-13B | 50.35 | 21.67 |

Installation

For General Users

pip install factool

For Developers

git clone git@github.com:GAIR-NLP/factool.git
cd factool
pip install -e .

Quick Start

API Key Preparation

- Get your OpenAI API key from OpenAI. It is used in all scenarios (knowledge-based QA, code, math, scientific literature review).
- Get your Serper API key from Serper. It is used only in knowledge-based QA.
- Get your Scraper API key from ScraperAPI. It is used only in scientific literature review.
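
The key requirements above can be checked before any task is run. The following is a minimal sketch and not part of FacTool: the helper `missing_keys` and the mapping `REQUIRED_KEYS` are hypothetical names introduced here, while the environment variable names and the category labels are the ones used in the General Usage section.

```python
import os

# Hypothetical mapping (not part of FacTool): the environment variables each
# task category needs, following the key list above.
REQUIRED_KEYS = {
    "kbqa": ["OPENAI_API_KEY", "SERPER_API_KEY"],
    "code": ["OPENAI_API_KEY"],
    "math": ["OPENAI_API_KEY"],
    "scientific": ["OPENAI_API_KEY", "SCRAPER_API_KEY"],
}


def missing_keys(category, env=None):
    """Return the required variables that are unset or empty for a category."""
    env = os.environ if env is None else env
    if category not in REQUIRED_KEYS:
        raise ValueError(f"unknown category: {category!r}")
    return [name for name in REQUIRED_KEYS[category] if not env.get(name)]


if __name__ == "__main__":
    # Report, per category, which keys are still missing in this shell.
    for category in REQUIRED_KEYS:
        missing = missing_keys(category)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{category}: {status}")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.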

General Usage

You can also refer directly to ./example/example.py and example_inputs.jsonl for general usage.

<details> <summary>General Usage (click to toggle the content)</summary>

export OPENAI_API_KEY=... # required for all tasks
export SERPER_API_KEY=... # required only for knowledge-based QA
export SCRAPER_API_KEY=... # required only for scientific literature review

# Initialize a list of inputs. "entry_point" is only needed when the task is code generation.
# Please refer to example_inputs.jsonl for example inputs for each category.
inputs = [
    {"prompt": "<prompt1>", "response": "<response1>", "category": "<category1>", "entry_point": "<entry_point_1>"},
    {"prompt": "<prompt2>", "response": "<response2>", "category": "<category2>", "entry_point": "<entry_point_2>"},
    # ...
]

where

- prompt: The prompt given to the model to generate the response.
- response: The response generated by the model.
- category: The category of the task. It can be one of:
  - kbqa
  - code
  - math
  - scientific
- entry_point: The name of the function in the response's code snippet to be fact-checked. It can be "null" if the task category is not code.
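
The field rules above can be turned into a quick check on each record before it is passed on. This is a minimal sketch under the assumption that every field is a plain string; `validate_input` and `VALID_CATEGORIES` are hypothetical names, not part of the FacTool API, and FacTool itself may apply different or additional checks.

```python
# Category labels taken from the field list above.
VALID_CATEGORIES = {"kbqa", "code", "math", "scientific"}


def validate_input(record):
    """Return a list of problems found in one input record (empty if none)."""
    problems = []
    for field in ("prompt", "response", "category"):
        value = record.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty field: {field}")
    category = record.get("category")
    if isinstance(category, str) and category and category not in VALID_CATEGORIES:
        problems.append(f"unknown category: {category}")
    # entry_point is only needed for code generation, per the field list above.
    if category == "code" and not record.get("entry_point"):
        problems.append("entry_point is required when category is 'code'")
    return problems


if __name__ == "__main__":
    record = {"prompt": "<prompt1>", "response": "<response1>", "category": "code"}
    print(validate_input(record))
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in a batch of inputs at once.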

from factool import Factool

# Initialize a Factool instance (it uses the API keys exported above).
# The foundation model can be either "gpt-3.5-turbo" or "gpt-4".
factool_instance = Factool("gpt-4")

inputs = [
    {
        "prompt": "Introduce Graham Neubig",
        "response": "Graham Neubig is a professor at MIT",
        "category": "kbqa"
    },
    # ...
]
response_list = factool_instance.run(inputs)

print(response_list)
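
When the inputs live in a file such as example_inputs.jsonl, they can be loaded first and then passed to `run`. The sketch below assumes the file holds one JSON object per line with the fields described above (the usual JSON Lines layout; the actual file may differ). `parse_jsonl`, `load_inputs`, and `check_file` are hypothetical helpers; `Factool("gpt-4")` and `run` are used exactly as in the snippet above.

```python
import json


def parse_jsonl(text):
    """Parse JSON Lines text: one JSON object per non-blank line (assumed layout)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_inputs(path):
    """Read a JSON Lines file into a list of input records."""
    with open(path, encoding="utf-8") as handle:
        return parse_jsonl(handle.read())


def check_file(path, foundation_model="gpt-4"):
    """Run FacTool on every record in the file; needs the API keys exported above."""
    from factool import Factool  # imported lazily so the loaders work without it

    factool_instance = Factool(foundation_model)
    return factool_instance.run(load_inputs(path))
```

Keeping the parsing separate from the FacTool call means the loading step can be tested without network access or API keys.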

</details>

Knowledge-based QA

<details> <summary>Detailed usage of factool on knowledge-based QA (click to toggle the content)</summary>

export OPENAI_API_KEY=...
export SERPER_API_KEY=...

from factool import Factool

# Initialize a Factool instance (it uses the API keys exported above).
# The foundation model can be either "gpt-3.5-turbo" or "gpt-4".
factool_instance = Factool("gpt-4")

inputs = [
    {
        "prompt": "Introduce Graham Neubig",
        "response": "Graham Neubig is a professor at MIT",
        "category": "kbqa"
    },
]
response_list = factool_instance.run(inputs)

print(response_list)

The response_list has the following format:

{
  'average_claim_level_factuality': avg_claim_level_factuality,
  'average_response_level_factuality': avg_response_level_factuality,
  'detailed_information': [
    {
      'prompt': prompt_1,
      'response': response_1,
      'category': 'kbqa',
      'claims': [claim_11, claim_12, ..., claim_1n],
      'queries': [[query_111, query_112], [query_121, query_122], ..., [query_1n1, query_1n2]],
      'evidences': [[evidences_with_source_11], [evidences_with_source_12], ..., [evidences_with_source_1n]],
      'claim_level_factuality': [{claim_11, reasoning_11, error_11, correction_11, factuality_11}, {claim_12, reasoning_12, error_12, correction_12, factuality_12}, ..., {claim_1n, reasoning_1n, error_1n, correction_1n, factuality_1n}],
      'response_level_factuality': factuality_1
    },
    {
      'prompt': prompt_2,
      'response': response_2,
      'category': 'kbqa',
      'claims': [claim_21, claim_22, ..., claim_2n],
      'queries': [[query_211, query_212], [query_221, query_222], ..., [query_2n1, query_2n2]],
      'evidences': [[evidences_with_source_21], [evidences_with_source_22], ..., [evidences_with_source_2n]],
      'claim_level_factuality': [{claim_21, reasoning_21, error_21, correction_21, factuality_21}, {claim_22, reasoning_22, error_22, correction_22, factuality_22}, ..., {claim_2n, reasoning_2n, error_2n, correction_2n, factuality_2n}],
      'response_level_factuality': factuality_2
    },
    ...
  ]
}
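
A result in this format can be reduced to the claims that were judged non-factual. The following is a minimal sketch that assumes each entry of `claim_level_factuality` is a dictionary with `claim`, `correction`, and `factuality` keys, as in the knowledge-based QA output for the Graham Neubig input used in this section; other task categories may return different fields. `non_factual_claims` is a hypothetical helper, and `sample` is a trimmed copy of that output.

```python
def non_factual_claims(result):
    """Collect (prompt, claim, correction) for every claim judged non-factual."""
    flagged = []
    for item in result.get("detailed_information", []):
        for verdict in item.get("claim_level_factuality", []):
            if verdict.get("factuality") is False:
                flagged.append(
                    (item.get("prompt"), verdict.get("claim"), verdict.get("correction"))
                )
    return flagged


# Trimmed result for the Graham Neubig input used in this section.
sample = {
    "average_claim_level_factuality": 0.0,
    "average_response_level_factuality": 0.0,
    "detailed_information": [
        {
            "prompt": "Introduce Graham Neubig",
            "response": "Graham Neubig is a professor at MIT",
            "category": "kbqa",
            "claim_level_factuality": [
                {
                    "claim": "Graham Neubig is a professor at MIT",
                    "correction": "Graham Neubig is a professor at Carnegie Mellon University.",
                    "factuality": False,
                }
            ],
            "response_level_factuality": False,
        }
    ],
}

if __name__ == "__main__":
    for prompt, claim, correction in non_factual_claims(sample):
        print(f"{prompt!r}: {claim!r} -> {correction!r}")
```

Comparing with `is False` rather than plain truthiness keeps entries with a missing `factuality` value out of the flagged list.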

For the example input above, the output is:

{
    'average_claim_level_factuality': 0.0,  
    'average_response_level_factuality': 0.0, 
    'detailed_information': [
        {
          'prompt': 'Introduce Graham Neubig',
          'response': 'Graham Neubig is a professor at MIT', 
          'category': 'kbqa', 'search_type': 'online', 
          'claims': [{'claim': 'Graham Neubig is a professor at MIT'}], 
          'queries': [['Graham Neubig current position', 'Is Graham Neubig a professor at MIT?']], 
          'evidences': [{'evidence': 'I am an Associate Professor of Computer Science at Carnegie Mellon University and CEO of Inspired Cognition. My research and development focuses on AI and ...', 'source': 'https://www.linkedin.com/in/graham-neubig-10b41616b'}, {'evidence': 'Missing: position | Show results with:position', 'source': 'https://www.linkedin.com/in/graham-neubig-10b41616b'}, {'evidence': 'My research is concerned with language and its role in human communication. In particular, my long-term research goal is to break down barriers in ...', 'source': 'https://miis.cs.cmu.edu/people/222215657/graham-neubig'}, {'evidence': 'My research focuses on handling human languages (like English or Japanese) with computers -- natural language processing. In particular, I am interested in ...', 'source': 'http://www.phontron.com/'}, {'evidence': 'Missing: current | Show results with:current', 'source': 'http://www.phontron.com/'}, {'evidence': 'Graham Neubig. I am an Associate Professor at the Carnegie Mellon University Language Technology Institute in the School of Computer Science, and work with ...', 'source': 'http://www.phontron.com/'}, {'evidence': 'Missing: MIT? | Show results with:MIT?', 'source': 'http://www.phontron.com/'}, {'evidence': 'Associate Professor, Language Technology Institute, Carnegie Mellon University Affiliated Faculty, Machine Learning Department, Carnegie Mellon University', 'source': 'https://www.phontron.com/research.php'}, {'evidence': 'Missing: MIT? | Show results with:MIT?', 'source': 'https://www.phontron.com/research.php'}, {'evidence': 'MIT Embodied Intelligence ... About the speaker: Graham ...', 'source': 'https://youtube.com/watch?v=CtcP5bvODzY'}],
          'claim_level_factuality': [
              {
                'reasoning': 'The given text is non-factual. The evidence provided clearly states that Graham Neubig is an Associate Professor of Computer Science at Carnegie Mellon University, not at MIT.', 
                'error': 'The error in the text is the incorrect affiliation of Graham Neubig. He is not a professor at MIT.', 
                'correction': 'Graham Neubig is a professor at Carnegie Mellon University.', 
                'factuality': False, 
                'claim': 'Graham Neubig is a professor at MIT'
              }
          ], 
          'response_level_factuality': False
       }
    ]
}

</details>
