VerifAI
VerifAI is an initiative to build an open-source, easy-to-deploy generative question-answering engine that can reference its answers and verify them for correctness (using an a posteriori model)
VerifAI
VerifAI is a Generative Search/Productivity engine with Verifiable answers. Please check our website at https://verifai-project.com/ or the deployed app at https://app.verifai-project.com/
<img src="https://github.com/nikolamilosevic86/verif.ai/assets/5192295/e95b4877-0847-4fa2-99e5-a6a7fc0003f8" width="500"/>
No more searches, just verifiably accurate answers.
Table of Contents
- Project Description
- Main Features
- Support This Project
- Installation and Start-up
- Developed Models and Datasets
- Using our APP
- Collaborators and Contributions
- Papers and Citations
- Funding
Project Description
VerifAI is a document-based question-answering system that aims to address the problem of hallucinations in generative large language models and generative search engines. We started with the biomedical domain, but have since expanded VerifAI to support indexing any documents in txt, md, docx, pptx, or pdf formats.
VerifAI is an AI system designed to answer users' questions by retrieving the most relevant documents, generating an answer with references to those documents, and verifying that the generated answer does not contain any hallucinations. At the core of the engine is a generative search engine powered by open technologies. However, generative models may hallucinate, so we have developed a second model that checks the sources cited by the generative model and flags any misinformation or misinterpretation of the source documents. This makes the answers created by the generative search engine completely verifiable.
The best part is that we are making it open source, so anyone can use it!
Check the article about the VerifAI project published on Towards Data Science
How does it work:
Main features
- Easy installation by running a single script
- Easy indexing of local files in PDF, EPUB, PPTX, DOCX, MD and TXT formats
- Combination of lexical and semantic search to find the most relevant documents
- Usage of any HuggingFace listed model for document embeddings
- Usage of any LLM that follows the OpenAI API standard (deployed using vLLM, Nvidia NIM, Ollama, or via commercial APIs, such as OpenAI or Azure)
- Supports large amounts of indexed documents (tested with over 200GB of data and 30 million documents)
- Shows the closest sentence in the document to the generated claim
- User registration and log-in
- Pleasant user interface developed in React.js
- Verification that generated text does not contain hallucinations by a specially fine-tuned model
- Possible single-sign-on with AzureAD (future plans to add other services, e.g. Google, GitHub, etc.)
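The feature list mentions combining lexical and semantic search but does not say how the two result lists are merged. As an illustration only, the sketch below shows one common approach: min-max normalising each score set and taking a weighted sum. The function names, the `lexical_weight` parameter and the sample scores are all hypothetical and are not taken from the VerifAI code base.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every hit as equally strong.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(lexical, semantic, lexical_weight=0.5):
    """Merge two score mappings into one ranking (assumed fusion rule).

    A document missing from one result list contributes 0 for that list.
    """
    lex = min_max(lexical)
    sem = min_max(semantic)
    combined = {}
    for doc_id in set(lex) | set(sem):
        combined[doc_id] = (
            lexical_weight * lex.get(doc_id, 0.0)
            + (1.0 - lexical_weight) * sem.get(doc_id, 0.0)
        )
    # Highest combined score first; ties are broken by document id.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Made-up scores: BM25-style values for lexical, similarity values for semantic.
lexical_hits = {"doc_a": 12.0, "doc_b": 8.0, "doc_c": 4.0}
semantic_hits = {"doc_b": 0.75, "doc_c": 0.5, "doc_d": 0.25}
ranking = hybrid_rank(lexical_hits, semantic_hits)
for doc_id, score in ranking:
    print(doc_id, round(score, 2))
```

With these sample scores, `doc_b` ranks first because it scores well in both lists, which is the usual motivation for a hybrid setup.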
⭐️ Support This Project
If you find this project helpful or interesting, please consider giving it a star on GitHub! Your support helps make this project more visible to others who might benefit from it.
<a href="https://github.com/nikolamilosevic86/verifAI/stargazers"> <img src="https://img.shields.io/github/stars/nikolamilosevic86/verifAI?style=social" alt="Star on GitHub"> </a>
By starring this repository, you'll also stay updated on new features and improvements. Thank you for your support! 🙏
Installation and start-up
VerifAI Core
- Clone the repository or download the latest release
- Create a Python virtual environment by running:
python -m venv verifai
source verifai/bin/activate
- In case you get errors when installing psycopg2, you may need to install PostgreSQL by running:
sudo apt install postgresql-server-dev-all
- On a clean instance you may need to run:
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt update
- Install the requirements by running:
pip install -r backend/requirements.txt
- Configure the system by modifying .env.local.example in the backend folder and renaming it to just .env. The configuration should look like this:
SECRET_KEY=6183db7b3c4f67439ad61d1b798224a035fe35c4113bf870
ALGORITHM=HS256
DBNAME=verifai_database
USER_DB=myuser
PASSWORD_DB=mypassword
HOST_DB=localhost
OPENSEARCH_IP=localhost
OPENSEARCH_USER=admin
OPENSEARCH_PASSWORD=admin
OPENSEARCH_PORT=9200
OPENSEARCH_USE_SSL=False
QDRANT_IP=localhost
QDRANT_PORT=6333
QDRANT_API=8da7725d78141e19a9bf3d878f4cb333fedb56eed9727904b46ce4b32e1ce085
QDRANT_USE_SSL=False
OPENAI_PATH=<path-to-openai/azure/vllm/nvidia_nim/ollama-interface>
OPENAI_KEY=<key-in-interface>
DEPLOYMENT_MODEL=GPT4o
MAX_CONTEXT_LENGTH=128000
SIMILARITY_METRIC=DOT
VECTOR_SIZE=768
EMBEDDING_MODEL="sentence-transformers/msmarco-bert-base-dot-v5"
INDEX_NAME_LEXICAL = 'myindex-lexical'
INDEX_NAME_SEMANTIC = "myindex-semantic"
USE_VERIFICATION=True
Please note that the value of SIMILARITY_METRIC can be either DOT (dot product) or COSINE (cosine similarity). If it is not set, it defaults to cosine similarity.
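To make the difference between the two SIMILARITY_METRIC values concrete, here is a small plain-Python sketch. It is not VerifAI's code (in the deployed system the vector store computes the metric); the `similarity` helper and the toy vectors are assumptions for illustration. It mirrors the fallback described above, where anything other than DOT resolves to cosine similarity.

```python
import math


def dot(a, b):
    """Dot product: grows with vector length as well as with alignment."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: the dot product of the length-normalised vectors."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot(a, b) / (norm_a * norm_b)


def similarity(a, b, metric=None):
    """Hypothetical helper: use DOT when asked, otherwise fall back to cosine."""
    if metric is not None and metric.upper() == "DOT":
        return dot(a, b)
    return cosine(a, b)


query = [1.0, 2.0, 2.0]
doc_short = [1.0, 2.0, 2.0]  # same direction and length as the query
doc_long = [2.0, 4.0, 4.0]   # same direction, twice the length

print(similarity(query, doc_short, "DOT"), similarity(query, doc_long, "DOT"))
print(similarity(query, doc_short, "COSINE"), similarity(query, doc_long, "COSINE"))
```

The two documents point in the same direction, so cosine similarity treats them as equally close to the query, while the dot product favours the longer vector. This is why the metric should match the one the chosen embedding model was trained for.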
- Run the install_datastores.py file. Running it requires Docker to be installed (with the daemon running). The script installs the necessary components, such as OpenSearch, Qdrant and PostgreSQL, and creates the database in PostgreSQL.
python install_datastore.py
- Index your files by running index_files.py and pointing it to the directory with the files you would like to index. It will recursively index all files in the directory.
python index_files.py <path-to-directory-with-files>
As an example, we provide some sample files in the test_data folder. You can index them by running:
python index_files.py test_data
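The internals of index_files.py are not shown here, so the following is only a sketch of what "recursively index all files in the directory" could involve at the file-discovery stage. The extension set is taken from the formats listed under Main features; the function name `find_indexable_files` is hypothetical.

```python
from pathlib import Path

# Formats named in the feature list; the real script may accept a different set.
SUPPORTED_EXTENSIONS = {".pdf", ".epub", ".pptx", ".docx", ".md", ".txt"}


def find_indexable_files(root):
    """Return every supported file under root, including nested folders, sorted."""
    root_path = Path(root)
    return sorted(
        path
        for path in root_path.rglob("*")  # rglob descends into all subdirectories
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Calling `find_indexable_files("test_data")` would list the files that a run over the sample folder would consider, ignoring anything with an unsupported extension.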
- Run the backend of VerifAI by running main.py in the backend folder:
python main.py
- Install React by following this guide, or by running the following commands:
sudo apt update
sudo apt install nodejs npm
sudo npm install -g create-react-app
- Install the React requirements for the front end in the client-gui/verifai-ui folder and run the front end:
cd ..
cd client-gui/verifai-ui
npm install
Create a .env file in the client-gui/verifai-ui folder with the following content (or base it on the .env.example file):
REACT_APP_BACKEND = http://127.0.0.1:5001/ # or your API url
REACT_APP_AZURE_CLIENT_ID=<your_azure_client_id>
REACT_APP_AZURE_TENANT_ID=<your_azure_tenant_id>
REACT_APP_AZURE_REDIRECT_URL=http://localhost:3000
If you do not configure REACT_APP_AZURE_CLIENT_ID and REACT_APP_AZURE_TENANT_ID, the app will not offer the option to log in with AzureAD. Your AzureAD application needs to be registered as a Single-Page Application in Azure. Change REACT_APP_AZURE_REDIRECT_URL to the redirect URL that matches the one registered in Azure.
Start the app by running:
npm start
- Go to http://localhost:3000 to see VerifAI in action.
You can check a tutorial on deploying VerifAI published on Towards Data Science
VerifAI BioMed
This is the biomedical version of VerifAI. It is designed to answer questions from the biomedical domain.
One requirement for running it locally is to have PostgreSQL installed. On a Mac, for example, you can install it by running brew install postgresql.
- Clone the repository
- Install the requirements by running:
pip install -r backend/requirements.txt
- Download Medline. You can do it by executing download_medline_data.sh for the core files for the current year and download_medline_data_update.sh for the current Medline update files.
- Install Qdrant following the guide here
- Run python medline2json.py to transform the MEDLINE XML files into JSON
- Run python json2selected.py to select the fields that should be imported into the index
- Run python abstarct_parser.py to concatenate abstract titles and abstracts and split the texts into parts of length 512 that can be indexed using a transformer model
- Run python embeddings_creation.py to create embeddings.
- Run python scripts/indexing_qdrant.py to create the Qdrant index. Make sure to point to the right folder created in the previous step and to the Qdrant instance.
- Install OpenSearch following the guide here
- Create the OpenSearch index by running python scripts/indexing_lexical_pmid.py. Make sure to configure access to OpenSearch and point the path variable to the folder created by the json2selected script.
- Set up syste
