FacTool: Factuality Detection in Generative AI


Factuality Leaderboard | Installation | Quick Start | ChatGPT Plugin with FacTool | Citation |

This repository contains the source code and plugin configuration for our paper.

This repository also contains the resources for Halu-J, an open-source model that serves as a critique-based hallucination judge.

Project Website

Factool is a tool-augmented framework for detecting factual errors in text generated by large language models (e.g., ChatGPT). Factool currently supports 4 tasks:

  • knowledge-based QA: Factool detects factual errors in knowledge-based QA.
  • code generation: Factool detects execution errors in code generation.
  • mathematical reasoning: Factool detects calculation errors in mathematical reasoning.
  • scientific literature review: Factool detects hallucinated scientific literature.
<p align="center"> <img src="figs/factool.png" width="300"/> </p>


Factuality Leaderboard

Our factuality leaderboard shows the factual accuracy of different chatbots evaluated by FacTool.

| LLMs | Weighted Claim-Level Accuracy | Response-Level Accuracy |
| -------- | -------- | -------- |
| GPT-4 | 75.60 | 43.33 |
| ChatGPT | 68.63 | 36.67 |
| Claude-v1 | 63.95 | 26.67 |
| Bard | 61.15 | 33.33 |
| Vicuna-13B | 50.35 | 21.67 |

Installation

  • For general users

pip install factool

  • For developers

git clone git@github.com:GAIR-NLP/factool.git
cd factool
pip install -e .

Quick Start

API Key Preparation

  • Get an OpenAI API key. It is used in all scenarios (Knowledge-based QA, Code, Math, Scientific Literature Review).
  • Get a Serper API key. It is used only in Knowledge-based QA.
  • Get a Scraper API key. It is used only in Scientific Literature Review.
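The key requirements above can be checked before running anything. The following is a minimal sketch, not part of the factool package: the `REQUIRED_KEYS` mapping and the `missing_keys` helper are hypothetical names, and the mapping simply restates the list above using the category names (`kbqa`, `code`, `math`, `scientific`) and the environment variable names used in this README.

```python
import os

# Assumed mapping from task category to required environment variables,
# derived from the key descriptions in this README (not a factool API).
REQUIRED_KEYS = {
    "kbqa": ["OPENAI_API_KEY", "SERPER_API_KEY"],
    "code": ["OPENAI_API_KEY"],
    "math": ["OPENAI_API_KEY"],
    "scientific": ["OPENAI_API_KEY", "SCRAPER_API_KEY"],
}


def missing_keys(category, env=None):
    """Return the required variables that are unset or empty for a category."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS[category] if not env.get(name)]


if __name__ == "__main__":
    # Report, per category, which keys still need to be exported.
    for category in REQUIRED_KEYS:
        missing = missing_keys(category)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{category}: {status}")
```

Running the script prints one line per category, which makes it easy to see which tasks are usable with the keys currently exported.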

General Usage

You can also refer directly to ./example/example.py and example_inputs.jsonl for general usage.
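If you keep your own inputs in a JSON Lines file, a loader might look like the sketch below. It assumes that each non-empty line is one JSON object with the input fields this README describes; `load_inputs` is a hypothetical helper, and the file path in the comment is an assumption about your checkout rather than a documented location.

```python
import json


def load_inputs(lines):
    """Parse JSON Lines text into a list of input dicts, skipping blank lines."""
    inputs = []
    for line in lines:
        line = line.strip()
        if line:
            inputs.append(json.loads(line))
    return inputs


# Usage with a file (the path is an assumption; adjust it to your checkout):
# with open("example/example_inputs.jsonl", encoding="utf-8") as f:
#     inputs = load_inputs(f)

# Self-contained demonstration using in-memory lines instead of a file.
sample = [
    '{"prompt": "Introduce Graham Neubig", '
    '"response": "Graham Neubig is a professor at MIT", "category": "kbqa"}',
    "",
]
inputs = load_inputs(sample)
print(inputs[0]["category"])
```

The function accepts any iterable of strings, so an open file object and a plain list of lines both work.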

<details> <summary>General Usage (click to toggle the content)</summary>
export OPENAI_API_KEY=... # this is required in all tasks
export SERPER_API_KEY=... # this is required only in knowledge-based QA
export SCRAPER_API_KEY=... # this is required only in scientific literature review
# Initialize a list of inputs. "entry_point" is only needed when the task is "code generation"
# please refer to example_inputs.jsonl for example inputs for each category
inputs = [
            {"prompt": "<prompt1>", "response": "<response1>", "category": "<category1>", "entry_point": "<entry_point_1>"},
            {"prompt": "<prompt2>", "response": "<response2>", "category": "<category2>", "entry_point": "<entry_point_2>"},
          ...
        ]

where

  • prompt: The prompt for the model to generate the response.
  • response: The response generated by the model.
  • category: The category of the task. It can be one of:
    • kbqa
    • code
    • math
    • scientific
  • entry_point: The function name of the code snippet to be fact-checked in the response. It can be "null" if the task category is not code.
from factool import Factool

# Initialize a Factool instance with the specified keys. foundation_model could be either "gpt-3.5-turbo" or "gpt-4"
factool_instance = Factool("gpt-4")

inputs = [
            {
                "prompt": "Introduce Graham Neubig",
                "response": "Graham Neubig is a professor at MIT",
                "category": "kbqa"
            },
            ...
]
response_list = factool_instance.run(inputs)

print(response_list)
</details>
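Before sending inputs to Factool, it can help to check them against the field descriptions above. This is a minimal sketch under those descriptions only: `validate_input` and `VALID_CATEGORIES` are hypothetical names, not part of the factool package, and the library itself may accept or reject entries differently.

```python
# Category names as listed in this README.
VALID_CATEGORIES = {"kbqa", "code", "math", "scientific"}


def validate_input(entry):
    """Return a list of problems found in one input entry (empty if none)."""
    problems = []
    # prompt, response, and category are treated as required non-empty strings.
    for field in ("prompt", "response", "category"):
        if not isinstance(entry.get(field), str) or not entry[field]:
            problems.append(f"missing or empty field: {field}")
    category = entry.get("category")
    if isinstance(category, str) and category and category not in VALID_CATEGORIES:
        problems.append(f"unknown category: {category}")
    # entry_point is only needed for code generation.
    if category == "code" and not entry.get("entry_point"):
        problems.append("entry_point is required for category 'code'")
    return problems


entry = {"prompt": "Introduce Graham Neubig",
         "response": "Graham Neubig is a professor at MIT",
         "category": "kbqa"}
print(validate_input(entry))
```

Returning a list of problems instead of raising on the first one lets you report every issue in a batch of inputs at once.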

Knowledge-based QA

<details> <summary>Detailed usage of factool on knowledge-based QA (click to toggle the content)</summary>
export OPENAI_API_KEY=...
export SERPER_API_KEY=...
from factool import Factool

# Initialize a Factool instance with the specified keys. foundation_model could be either "gpt-3.5-turbo" or "gpt-4"
factool_instance = Factool("gpt-4")

inputs = [
            {
                "prompt": "Introduce Graham Neubig",
                "response": "Graham Neubig is a professor at MIT",
                "category": "kbqa"
            },
]
response_list = factool_instance.run(inputs)

print(response_list)

The response_list has the following format:

{
  "average_claim_level_factuality": avg_claim_level_factuality,
  "average_response_level_factuality": avg_response_level_factuality,
  "detailed_information": [
    {
      'prompt': prompt_1, 
      'response': response_1, 
      'category': 'kbqa', 
      'claims': [claim_11, claim_12, ..., claim_1n], 
      'queries': [[query_111, query_112], [query_121, query_122], ..., [query_1n1, query_1n2]], 
      'evidences': [[evidences_with_source_11], [evidences_with_source_12], ..., [evidences_with_source_1n]], 
      'claim_level_factuality': [{claim_11, reasoning_11, error_11, correction_11, factuality_11}, {claim_12, reasoning_12, error_12, correction_12, factuality_12}, ..., {claim_1n, reasoning_1n, error_1n, correction_1n, factuality_1n}], 
      'response_level_factuality': factuality_1
    },
    {
      'prompt': prompt_2, 
      'response': response_2, 
      'category': 'kbqa',
      'claims': [claim_21, claim_22, ..., claim_2n], 
      'queries': [[query_211, query_212], [query_221, query_222], ..., [query_2n1, query_2n2]], 
      'evidences': [[evidences_with_source_21], [evidences_with_source_22], ..., [evidences_with_source_2n]], 
      'claim_level_factuality': [{claim_21, reasoning_21, error_21, correction_21, factuality_21}, {claim_22, reasoning_22, error_22, correction_22, factuality_22}, ..., {claim_2n, reasoning_2n, error_2n, correction_2n, factuality_2n}],
      'response_level_factuality': factuality_2,
    },
    ...
  ]
}

In this case, you will get:

{
    'average_claim_level_factuality': 0.0,  
    'average_response_level_factuality': 0.0, 
    'detailed_information': [
        {
          'prompt': 'Introduce Graham Neubig',
          'response': 'Graham Neubig is a professor at MIT', 
          'category': 'kbqa', 'search_type': 'online', 
          'claims': [{'claim': 'Graham Neubig is a professor at MIT'}], 
          'queries': [['Graham Neubig current position', 'Is Graham Neubig a professor at MIT?']], 
          'evidences': [{'evidence': 'I am an Associate Professor of Computer Science at Carnegie Mellon University and CEO of Inspired Cognition. My research and development focuses on AI and ...', 'source': 'https://www.linkedin.com/in/graham-neubig-10b41616b'}, {'evidence': 'Missing: position | Show results with:position', 'source': 'https://www.linkedin.com/in/graham-neubig-10b41616b'}, {'evidence': 'My research is concerned with language and its role in human communication. In particular, my long-term research goal is to break down barriers in ...', 'source': 'https://miis.cs.cmu.edu/people/222215657/graham-neubig'}, {'evidence': 'My research focuses on handling human languages (like English or Japanese) with computers -- natural language processing. In particular, I am interested in ...', 'source': 'http://www.phontron.com/'}, {'evidence': 'Missing: current | Show results with:current', 'source': 'http://www.phontron.com/'}, {'evidence': 'Graham Neubig. I am an Associate Professor at the Carnegie Mellon University Language Technology Institute in the School of Computer Science, and work with ...', 'source': 'http://www.phontron.com/'}, {'evidence': 'Missing: MIT? | Show results with:MIT?', 'source': 'http://www.phontron.com/'}, {'evidence': 'Associate Professor, Language Technology Institute, Carnegie Mellon University Affiliated Faculty, Machine Learning Department, Carnegie Mellon University', 'source': 'https://www.phontron.com/research.php'}, {'evidence': 'Missing: MIT? | Show results with:MIT?', 'source': 'https://www.phontron.com/research.php'}, {'evidence': 'MIT Embodied Intelligence ... About the speaker: Graham ...', 'source': 'https://youtube.com/watch?v=CtcP5bvODzY'}],
          'claim_level_factuality': [
              {
                'reasoning': 'The given text is non-factual. The evidence provided clearly states that Graham Neubig is an Associate Professor of Computer Science at Carnegie Mellon University, not at MIT.', 
                'error': 'The error in the text is the incorrect affiliation of Graham Neubig. He is not a professor at MIT.', 
                'correction': 'Graham Neubig is a professor at Carnegie Mellon University.', 
                'factuality': False, 
                'claim': 'Graham Neubig is a professor at MIT'
              }
          ], 
          'response_level_factuality': False
       }
    ]
}

</details>
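To act on a result like the one above, you may want to pull out only the claims judged non-factual, together with their suggested corrections. The sketch below assumes the knowledge-based QA output structure shown in this section; `non_factual_claims` is a hypothetical helper, and other task categories may return different fields.

```python
def non_factual_claims(result):
    """Collect claims judged non-factual from a result dict shaped like the example."""
    found = []
    for item in result.get("detailed_information", []):
        for verdict in item.get("claim_level_factuality", []):
            # Only an explicit False counts; missing or None verdicts are skipped.
            if verdict.get("factuality") is False:
                found.append({
                    "prompt": item.get("prompt"),
                    "claim": verdict.get("claim"),
                    "error": verdict.get("error"),
                    "correction": verdict.get("correction"),
                })
    return found


# Abbreviated copy of the example output in this section.
result = {
    "average_claim_level_factuality": 0.0,
    "average_response_level_factuality": 0.0,
    "detailed_information": [
        {
            "prompt": "Introduce Graham Neubig",
            "response": "Graham Neubig is a professor at MIT",
            "category": "kbqa",
            "claim_level_factuality": [
                {
                    "error": "The error in the text is the incorrect affiliation "
                             "of Graham Neubig. He is not a professor at MIT.",
                    "correction": "Graham Neubig is a professor at Carnegie Mellon University.",
                    "factuality": False,
                    "claim": "Graham Neubig is a professor at MIT",
                }
            ],
            "response_level_factuality": False,
        }
    ],
}

for row in non_factual_claims(result):
    print(f"{row['claim']} -> {row['correction']}")
```

The helper reads every field with `.get`, so a result that lacks `error` or `correction` for some claim yields `None` for that field rather than raising.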
