RAGTruth

RAGTruth is a word-level hallucination corpus covering several tasks in the retrieval-augmented generation (RAG) setting, intended for both training and evaluation.

RAG has become a main technique for alleviating hallucinations in large language models (LLMs). Even with RAG, LLMs may still make claims that are unsupported by, or contradict, the retrieved content. Developing effective hallucination prevention strategies under RAG requires benchmark datasets that can measure the extent of hallucination. RAGTruth comprises nearly 18,000 naturally generated responses from diverse LLMs using RAG. These responses were meticulously annotated by hand at both the case level and the word level, including an assessment of hallucination intensity.

Updates

[2024/06] We released our training and evaluation code. Model weights can be found here
[2024/02] We updated the data with more annotated hallucinations and one new meta field, implicit_true.
[2024/01] We released the RAGTruth corpus.

Dataset

Because each piece of source information elicits six responses from different language models, we publish the source information and its corresponding responses in separate files.

response.jsonl

| Field Name | Value Type | Description |
| ----------- | ----------- | ------------------------------------------- |
| id | String | The index of the response. |
| source_id | String | The index of the source information. |
| model | String | The model that generated the response. |
| temperature | Float | The sampling temperature used when generating the response. |
| labels | List[Dict] | The hallucination spans. `text` is the hallucinated text itself; `start` and `end` give the position of the span in the response; `label_type` is the type of the hallucination span; `due_to_null` indicates whether the hallucination is caused by a null value; `implicit_true` indicates that the span is correct even though the information is not mentioned in the context; `meta` is the annotator's comment on the hallucination. |
| split | String | `train` or `test`. |
| quality | String | Indicates whether the answer has quality issues. `incorrect_refusal` means the model mistakenly refuses to answer despite the presence of relevant context. `truncated` means the response was truncated unexpectedly. |
| response | String | The LLM's response to the given instruction. |

Sample:

{
  "id": "1472",
  "source_id": "11316",
  "model": "mistral-7B-instruct",
  "temperature": 0.925,
  "labels": [
    {
      "start": 219,
      "end": 229,
      "text": "Gaza Strip",
      "meta": "HIGH INTRO OF NEW INFO\nIt is not mentioned in the original source that Gaza Strip is included in the occupied Palestinian territory.",
      "label_type": "Evident Baseless Info"
    }
  ],
  "split": "train",
  "quality": "good",
  "response": "The Palestinian Authority has officially become the 123rd member of the International Criminal Court (ICC), giving the court jurisdiction over alleged crimes in Palestinian territories. This includes East Jerusalem and Gaza Strip, which are occupied by Israel. The signing of Rome Statute by Palestinians in January 2021 had already established ICC's jurisdiction over alleged crimes committed \"since June 13, 2014\" in these areas. Now, the court can open a preliminary investigation or formal investigation into the situation in Palestinian territories, potentially leading to war crimes probes against Israeli individuals. However, this could also lead to counter-charges against Palestinians. The ICC welcomed Palestine's accession, while Israel and the US, who are not ICC members, opposed the move."
}
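
In the sample above, characters 219 to 229 of `response` spell "Gaza Strip", which suggests that `start` and `end` are 0-based character offsets with an exclusive end. The following is a minimal sketch, assuming that convention holds for every record, that reads the file and reports labels whose `text` does not match the indicated span. The helper names `load_jsonl` and `span_mismatches` are illustrative and are not part of the released code.

```python
import json


def load_jsonl(path):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def span_mismatches(record):
    """Return the labels whose `text` differs from response[start:end].

    Assumes 0-based character offsets with an exclusive `end`.
    """
    response = record["response"]
    return [
        label
        for label in record["labels"]
        if response[label["start"]:label["end"]] != label["text"]
    ]
```

Calling `span_mismatches` on each record returned by `load_jsonl("response.jsonl")` gives a quick alignment check before the spans are used for word-level training or evaluation.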

source_info.jsonl

| Field Name | Value Type | Description |
| ----------- | ----------- | ------------------------------------------- |
| source_id | String | The index of the source information. |
| task_type | String | The task type of the data: `QA`, `Data2txt`, or `Summary`. |
| source | String | The source of the original content. |
| source_info | String or Dict | The base content under the RAG setting. For summarization tasks, the value is a string; for data-to-text and QA (question-answering) tasks, the value is a dict. |
| prompt | String | The prompt we used to generate responses. For Llama and Mistral models, we used `<s>[INST] {prompt} [/INST]`. |
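
Because responses and source information live in separate files, a response has to be joined to its source record through `source_id`. The sketch below assumes both files have already been parsed into lists of dicts. The function names are illustrative and are not part of the released code, and `wrap_inst` only applies the Llama/Mistral template quoted in the table.

```python
def attach_sources(responses, sources):
    """Pair every response with the source record sharing its source_id.

    Raises KeyError if a response points at a source_id that is missing.
    """
    by_id = {src["source_id"]: src for src in sources}
    return [(resp, by_id[resp["source_id"]]) for resp in responses]


def wrap_inst(prompt):
    """Apply the template the table gives for Llama and Mistral models."""
    return f"<s>[INST] {prompt} [/INST]"
```

Building the lookup dict once keeps the join linear in the number of records, which matters because several responses share each source.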

QA sample:

{
  "source_id": "14312",
  "task_type": "QA",
  "source": "MARCO",
  "source_info": {
    "question": "how to prepare beets and beet greens",
    "passages": "passage 1:Procedures: 1  Preheat oven to 350 degrees Fahrenheit. 2  Wash beets thoroughly, leaving skins on. 3  Place beets in a small baking dish or roasting pan, toss with 2 tablespoons of coconut oil, cover and bake for 45 to 60 minutes or until tender.  For the greens: heat remaining coconut oil in a skillet over medium-low heat.\n\npassage 2:Serve with red wine vinegar or butter and salt and pepper. For the greens: heat remaining coconut oil in a skillet over medium-low heat. Add garlic and onion and cook for one minute. Tear the beet greens into 2 to 3 inch pieces, and add to skillet, stirring until wilted and tender. Season with salt and pepper.\n\npassage 3:Directions See How It's Made. 1  Wash the greens thoroughly several times in deep water. Cook in very little boiling salted water until just tender, a few minutes. 2  Submit a Correction.\n\n"
  },
  "prompt": "Briefly answer the following question:\nhow to prepare beets and beet greens\nBear in mind that your response should be strictly based on the following ten passages:\npassage 1:Procedures: 1  Preheat oven to 350 degrees Fahrenheit. 2  Wash beets thoroughly, leaving skins on. 3  Place beets in a small baking dish or roasting pan, toss with 2 tablespoons of coconut oil, cover and bake for 45 to 60 minutes or until tender.  For the greens: heat remaining coconut oil in a skillet over medium-low heat.\n\npassage 2:Serve with red wine vinegar or butter and salt and pepper. For the greens: heat remaining coconut oil in a skillet over medium-low heat. Add garlic and onion and cook for one minute. Tear the beet greens into 2 to 3 inch pieces, and add to skillet, stirring until wilted and tender. Season with salt and pepper.\n\npassage 3:Directions See How It's Made. 1  Wash the greens thoroughly several times in deep water. Cook in very little boiling salted water until just tender, a few minutes. 2  Submit a Correction.\n\nIn case the passages do not contain the necessary information to answer the question, please reply with: \"Unable to answer based on given passages.\"\noutput:"
}
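
In the QA sample above, `passages` is a single string in which each passage starts with `passage N:` and passages are separated by blank lines. A minimal sketch for splitting such a string into individual passages, assuming every QA record follows the layout of this one sample; `split_passages` is a hypothetical helper, not part of the released code.

```python
import re


def split_passages(passages):
    """Split a 'passage N:' delimited string into a list of passage texts.

    Assumes each passage starts at the beginning of the string or after a
    blank line, as in the sample record.
    """
    parts = re.split(r"(?:^|\n\n)passage \d+:", passages)
    return [part.strip() for part in parts if part.strip()]
```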

Data2txt sample:

{
  "source_id": "13661",
  "task_type": "Data2txt",
  "source": "Yelp",
  "source_info": {
    "name": "Subway",
    "address": "1940 Cliff Dr, Ste B-13",
    "city": "Santa Barbara",
    "state": "CA",
    "categories": "Restaurants, Sandwiches, Salad, Fast Food",
    "hours": {
      "Monday": "9:0-22:30",
      "Tuesday": "9:0-22:30",
      "Wednesday": "9:0-22:30",
      "Thursday": "9:0-22:30",
      "Friday": "9:0-22:30",
      "Saturday": "9:0-22:30",
      "Sunday": "11:0-22:0"
    },
    "attributes": {
      "BusinessParking": {
        "garage": false,
        "street": false,
        "validated": false,
        "lot": true,
        "valet": false
      },
      "RestaurantsReservations": false,
      "OutdoorSeating": true,
      "WiFi": "no",
      "RestaurantsTakeOut": true,
      "RestaurantsGoodForGroups": true,
      "Music": null,
      "Ambience": {
        "touristy": false,
        "hipster": false,
        "romantic": null,
        "divey": null,
        "intimate": null,
        "trendy": null,
        "upscale": null,
        "classy": null,
        "casual": null
      }
    },
    "business_stars": 3.0,
    "review_info": [
      {
        "review_stars": 1.0,
        "review_date": "2020-05-11 02:07:36",
        "review_text": "My husband and I came in earlier today for lunch after I ordered my sandwich my husband ordered a club and didn't think anything about it while the girl made it because he assumed she knew what went on a club.  Once we got in the car I looked at the receipt and realized she made him a turkey sandwich so we went back in to ask her to add the other meat and to refund us and recharge the correct price since a club is a little more. She was very rude about it and told us she wasn't going to do anything about and proceeded to call us liars and say he asked for a turkey sub. I told her she didn't have to be so rude so she told me to \"get the f**k out b***h\" and if I had a problem with it I could \"call her f***ing manager\". Also she proceeded to cuss us as we walked out of the store. It was quite unacceptable and inappropriate of an employee to be this unprofessional and aggressive."
      },
      {
        "review_stars": 3.0,
        "review_date": "2020-03-02 20:05:55",
        "review_text": "Small store, personnel not very well organized, store is only moderately clean. \nStaff is friendly sometimes, other times they will only barely recognize you."
      },
      {
        "review_stars": 5.0,
        "review_date": "2019-07-10 01:49:07",
        "review_text": "Nice and clean location. Toppings look fresh and well stocked. Joaquin and Odalis are always helpful and friendly."
      }
    ]
  },
  "prompt": "Instruction:\nWrite an objective overview about the following local business based only on the provided structured data in the JSON format. You should include details and cover the information mentioned in the customer
