CTrag
A cyber threat intelligence chatbot that ingested 2200+ reports from vx-underground.
<p align="center"> <img src="images/logo.webp" width="400"> </p>

CTrag is a cyber threat intelligence tool that uses local large language models (LLMs) and a vector database to answer your questions about cyber threats. It is built on top of LangChain, Ollama, Chroma, and PyPDF.
Credits
Most of the code for this project was adapted from amscotti's local-LLM-with-RAG project. It was then tailored to the cyber threat intelligence use case, and an initial vector database was built.
https://github.com/amscotti/local-LLM-with-RAG
Requirements
- Ollama version 0.1.26 or higher.
- The default model is `dolphin-mistral`. You can download other models with `ollama pull [MODEL_NAME]` so that the chatbot can use them.
Setup
- Clone this repository to your local machine using Git LFS to ensure you receive the already processed vector database with the 2200 reports. If you haven't installed Git LFS, please follow the instructions here.
- Create a Python virtual environment by running `python3 -m venv env`.
- Activate the virtual environment by running `source env/bin/activate` on Unix or macOS, or `.\env\Scripts\activate` on Windows.
- Install the required Python packages by running `pip install -r requirements.txt`.
Running the project
Note: The first time you run the project, it will download the necessary models from Ollama for the LLM and embeddings. This is a one-time setup process and may take some time depending on your internet connection.
- Ensure your virtual environment is activated.
- Run the Streamlit GUI with `streamlit run app.py`
Adding reports to the vector database
- Put the PDFs in the `reports` folder.
- Run the tool; it will automatically process the new files and add them to the database.
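The actual ingestion goes through LangChain's PDF loaders, but the "only process files that aren't in the database yet" pattern can be sketched with the standard library alone. This is an illustrative helper, not the project's code; the function name and the `seen_hashes` bookkeeping are hypothetical:

```python
import hashlib
from pathlib import Path

def find_new_reports(reports_dir: str, seen_hashes: set) -> list:
    """Return PDFs under reports_dir whose content hash is not yet indexed."""
    new_files = []
    for pdf in sorted(Path(reports_dir).glob("*.pdf")):
        # Hash the file contents so renamed duplicates are still skipped.
        digest = hashlib.sha256(pdf.read_bytes()).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)  # mark as indexed
            new_files.append(pdf)
    return new_files
```

On a second run with the same `seen_hashes` set, already-processed reports are skipped, which is why re-launching the tool only embeds newly added PDFs.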
Available commands
Here are the available command line arguments and their default values for running the CTrag Streamlit application:
streamlit run app.py [-m MODEL] [-e EMBEDDING_MODEL] [-p PATH] [--nb-docs NB_DOCS]
- `-m MODEL`, `--model MODEL`: The name of the LLM model to use. Default is `"dolphin-mistral"`.
- `-e EMBEDDING_MODEL`, `--embedding_model EMBEDDING_MODEL`: The name of the embedding model to use. Default is `"nomic-embed-text"`.
- `-p PATH`, `--path PATH`: The path to the directory containing documents to load. Default is `"reports"`.
- `--nb-docs NB_DOCS`: The number of documents to retrieve from the vector database. Default is `8`.
Example usage:
streamlit run app.py -m "dolphin-mistral" -e "nomic-embed-text" -p "/path/to/documents" --nb-docs 10
This command runs the CTrag Streamlit application using the "dolphin-mistral" LLM model, the "nomic-embed-text" embedding model, loads documents from the "/path/to/documents" directory, and retrieves 10 documents from the vector database.
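Inside the application, flags like these map naturally onto Python's `argparse`. The sketch below mirrors the documented options and defaults; the project's actual parser may be structured differently:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the CLI options documented above (illustrative)."""
    parser = argparse.ArgumentParser(description="CTrag Streamlit app")
    parser.add_argument("-m", "--model", default="dolphin-mistral",
                        help="Name of the LLM model to use")
    parser.add_argument("-e", "--embedding_model", default="nomic-embed-text",
                        help="Name of the embedding model to use")
    parser.add_argument("-p", "--path", default="reports",
                        help="Directory containing documents to load")
    parser.add_argument("--nb-docs", type=int, default=8,
                        help="Number of documents to retrieve from the vector DB")
    return parser
```

Note that `--nb-docs` is exposed on the parsed namespace as `nb_docs`, since argparse converts dashes to underscores.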
Source used to build the original vector database
The original vector database was built from the 2200+ reports published by vx-underground.
Technologies Used
- LangChain: A Python library for working with Large Language Models.
- Ollama: A platform for running Large Language models locally.
- Chroma: A vector database for storing and retrieving embeddings.
- PyPDF: A Python library for reading and manipulating PDF files.
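Under the hood, the retrieval step that `--nb-docs` controls is a nearest-neighbor search over embeddings, which Chroma handles for CTrag. As a conceptual sketch only (pure Python, cosine similarity; real vector databases use approximate indexes):

```python
import math

def top_k(query: list, vectors: dict, k: int) -> list:
    """Return ids of the k stored vectors most similar to query (cosine)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)
    # Score every stored vector and keep the k best matches.
    scored = sorted(vectors.items(), key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]
```

With `--nb-docs 10`, the equivalent of `k=10` report chunks would be pulled from the database and handed to the LLM as context.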
License
This project is licensed under the terms of the MIT license.