# PassLLM: AI-Based Targeted Password Guessing

World's most accurate password-guessing AI tool: a PyTorch implementation of PassLLM (USENIX 2025) that leverages PII and LoRA fine-tuning to outperform existing tools by up to 45% on consumer hardware.
## About The Project
PassLLM is the world's most accurate targeted password guessing framework, outperforming other models by 15% to 45% in most scenarios. It uses Personally Identifiable Information (PII), such as names, birthdays, phone numbers, emails, and previous passwords, to predict the specific passwords a target is most likely to use. The model fine-tunes 4B/7B-parameter LLMs on millions of leaked PII records using LoRA, enabling a private, high-accuracy framework that runs entirely on consumer PCs.
<img src="https://github.com/user-attachments/assets/00cafb1e-1c28-4c50-9e12-9e00ad33a32f" alt="PassLLM Demo" width="52%">

## Capabilities
- State-of-the-Art Accuracy: Achieves up to 45% higher success rates than leading benchmarks (RankGuess, TarGuess) in most scenarios.
- PII Inference: With sufficient information, it successfully guesses the passwords of 12.5% to 31.6% of typical users within just 100 guesses.
- Efficient Fine-Tuning: Custom training loop utilizing LoRA to lower VRAM usage without sacrificing model reasoning capabilities.
- Advanced Inference: Implements the paper's algorithm to maximize probability, prioritizing the most likely candidates over random sampling.
- Data-Driven: Can be trained on millions of real-world credentials to learn the deep statistical patterns of human password creation.
- Pre-trained Weights: Includes robust models pre-trained on millions of real-world records from major PII breaches (e.g., Post Millennial, ClixSense) combined with the COMB dataset.
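The "Advanced Inference" point above (prioritizing the most likely candidates rather than sampling randomly) can be sketched as a best-first search over a next-character model. The toy probability table below is invented purely for illustration; the real engine scores continuations with the fine-tuned LLM.

```python
import heapq
import math

# Toy next-character model, hard-coded so the search itself is runnable.
# "$" marks end-of-password. The real PassLLM model scores tokens with an LLM.
def next_char_probs(prefix: str) -> dict[str, float]:
    if prefix == "":
        return {"a": 0.6, "b": 0.4}
    if prefix in ("a", "b"):
        return {"1": 0.7, "$": 0.3}
    return {"$": 1.0}

def enumerate_candidates(max_candidates: int = 4) -> list[tuple[float, str]]:
    """Best-first search: always expand the highest-probability prefix,
    so complete passwords pop out in descending probability order."""
    heap = [(0.0, "")]  # (negative log-prob, prefix); heapq is a min-heap
    results = []
    while heap and len(results) < max_candidates:
        neg_logp, prefix = heapq.heappop(heap)
        if prefix.endswith("$"):  # complete password reached
            results.append((math.exp(-neg_logp), prefix[:-1]))
            continue
        for ch, p in next_char_probs(prefix).items():
            heapq.heappush(heap, (neg_logp - math.log(p), prefix + ch))
    return results

if __name__ == "__main__":
    for prob, pw in enumerate_candidates():
        print(f"{prob:.2%} | {pw}")  # e.g. "42.00% | a1"
```

Because candidates leave the heap in order of decreasing probability, a guess budget (e.g., 100 guesses) simply truncates the stream at its most valuable prefix.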
## Use Guide
Tip: You can run this tool instantly without any local installation by opening our Google Colab Demo, providing your target's PII, and simply executing each cell in order.
### Installation

Requirements:

- Python: 3.10+
- Password Guessing: Runs on any GPU (NVIDIA or AMD). A standard CPU or Mac (M1/M2) is also sufficient to run the pre-trained model.
- Training: NVIDIA GPU with CUDA (RTX 3090/4090 recommended; Google Colab's free tier is often enough).
```bash
# 1. Clone the repository
git clone https://github.com/tzohar/PassLLM.git
cd PassLLM

# 2. Install dependencies (choose one)
# Option A: Install from requirements (recommended)
pip install -r requirements.txt

# Option B: Manual install
pip install torch torch-directml "transformers<5.0.0" peft datasets bitsandbytes accelerate gradio
```
### Configuration
Download the trained weights (~126 MB) and place them in the models/ directory. Alternatively, via terminal:

```bash
curl -L https://github.com/Tzohar/PassLLM/releases/download/v1.3.0/PassLLM-Qwen3-4B-v1.0.pth -o models/PassLLM_LoRA_Weights.pth
```
Once installed and downloaded, adjust the settings in the WebUI or src/config.py to match your hardware.
| Hardware | OS | Device | 4-Bit Quantization | Torch DType | Inference Batch Size |
| --- | --- | --- | --- | --- | --- |
| NVIDIA | Any | cuda | ✅ On (Recommended) | bfloat16 | High (64+) |
| AMD | Windows | dml | ❌ Off | float16 | Low (8-16) |
| AMD (RDNA 3+) | Linux/WSL | cuda | ❌ Off | bfloat16 | Medium (64+) |
| AMD (Older) | Linux/WSL | cuda | ❌ Off | float16 | Low (8-16) |
| CPU | Any | cpu | ❌ Off | float32 | Low (1-4) |
Note (AMD on Linux/WSL): DirectML (`dml`) is Windows-only. For AMD GPUs on Linux or WSL, you must install ROCm and the ROCm build of PyTorch, then set `DEVICE = "cuda"`, since ROCm exposes the CUDA API. 4-bit quantization (bitsandbytes) is not officially supported on ROCm. Newer AMD GPUs (RDNA 3 / RX 7000 series, MI200/MI300) have native `bfloat16` support; use it for significant speed improvements.
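As a minimal sketch, the hardware table above could be encoded as a small helper. The function and field names below are hypothetical; the actual knobs live in `src/config.py`.

```python
# Hypothetical helper mirroring the hardware table above; the actual
# setting names in src/config.py may differ.
def pick_settings(vendor: str, os_name: str, rdna3_plus: bool = True) -> dict:
    if vendor == "nvidia":
        # 4-bit quantization recommended; bfloat16 on modern CUDA GPUs
        return {"device": "cuda", "quant_4bit": True, "dtype": "bfloat16"}
    if vendor == "amd" and os_name == "windows":
        # DirectML path: bitsandbytes unavailable, fp16 only
        return {"device": "dml", "quant_4bit": False, "dtype": "float16"}
    if vendor == "amd":
        # ROCm exposes the CUDA API, so the device string is still "cuda"
        dtype = "bfloat16" if rdna3_plus else "float16"
        return {"device": "cuda", "quant_4bit": False, "dtype": dtype}
    # CPU fallback
    return {"device": "cpu", "quant_4bit": False, "dtype": "float32"}
```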
Tip: Don't forget to customize the Min/Max Password Length, Character Bias, and Epsilon (search strictness) according to your specific target's needs!
### Password Guessing (Pre-Trained)
You can use the graphical interface (WebUI) or the command line to generate candidates.
#### Option A: WebUI (Recommended)
- Launch the Interface:

```bash
python webui.py
```

- Generate:
  - Open the local URL (e.g., http://127.0.0.1:7860).
  - Select Model: Choose the most recent model from the dropdown.
  - Enter PII: Fill in the target's Name, Email, Birth Year, etc., into the form.
  - Click Generate: The engine will stream ranked candidates in real time.
#### Option B: Command Line (CLI)

Best for automation or headless servers.
- Create a Target File:

  Create a `target.jsonl` file (or use the existing one) in the main folder. You can include any field defined in `src/config.py`.
```json
{
  "name": "Johan P.",
  "birth_year": "1966",
  "email": "johan66@gmail.com",
  "sister_pw": "Johan123"
}
```
- Run the Engine:

```bash
python app.py --file target.jsonl --weights models/PassLLM-Qwen3-4B-v1.0.pth --fast
```

  - `--file`: Path to your target PII file.
  - `--weights`: Path to your downloaded model weights (e.g., the .pth file).
  - `--fast`: Uses an optimized, shallow beam search (omit for full deep search).
  - `--superfast`: Very quick but less accurate; mainly for testing.
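If you want to wire PassLLM into your own scripts, the flag surface described above can be mirrored with `argparse`. This is a hypothetical sketch of the interface; the real `app.py` may parse its arguments differently.

```python
import argparse

# Hypothetical mirror of the app.py CLI described above; illustrative only.
def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="PassLLM targeted password guessing")
    p.add_argument("--file", required=True, help="Path to target PII .jsonl file")
    p.add_argument("--weights", help="Path to downloaded model weights (.pth)")
    # --fast and --superfast are alternative speed modes, so make them exclusive
    mode = p.add_mutually_exclusive_group()
    mode.add_argument("--fast", action="store_true",
                      help="Optimized, shallow beam search")
    mode.add_argument("--superfast", action="store_true",
                      help="Very quick but less accurate; mainly for testing")
    return p

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.file, args.weights, args.fast, args.superfast)
```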
### Training From Databases
To reproduce the paper's results or train on a new breach, you must provide a dataset of PII-to-Password pairs.
- Prepare Your Dataset: Create a file at `training/passllm_raw_data.jsonl`. Each line must be a valid JSON object containing a `pii` dictionary and the target `output` password.

  Example `passllm_raw_data.jsonl`:

```json
{"pii": {"name": "Alice", "birth_year": "1990"}, "output": "Alice1990!"}
{"pii": {"email": "bob@test.com", "sister_pw": "iloveyou"}, "output": "iloveyou2"}
```

  Note: Ensure your keys (e.g., `first_name`, `email`) match the schema defined in `src/config.py`.
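Converting in-memory (PII, password) pairs from a breach dump into this format is one `json.dumps` per record. The helper below is illustrative and not part of the repo; your PII keys must still match the schema in `src/config.py`.

```python
import json

# Hypothetical helper (not part of the repo): serialize (pii_dict, password)
# pairs into the passllm_raw_data.jsonl format: one JSON object per line
# with "pii" and "output" keys.
def to_jsonl_lines(records):
    lines = []
    for pii, password in records:
        clean = {k: v for k, v in pii.items() if v}  # drop empty PII fields
        lines.append(json.dumps({"pii": clean, "output": password}))
    return lines

if __name__ == "__main__":
    rows = [({"name": "Alice", "birth_year": "1990"}, "Alice1990!"),
            ({"email": "bob@test.com", "sister_pw": "iloveyou"}, "iloveyou2")]
    print("\n".join(to_jsonl_lines(rows)))
```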
- Configure Parameters: Edit `src/config.py` to match your hardware and dataset specifics:

```python
# Hardware Settings
TRAIN_BATCH_SIZE = 4     # Lower to 1 or 2 if hitting OOM on consumer GPUs
GRAD_ACCUMULATION = 16   # Simulates larger batches (Effective Batch = 4 * 16 = 64)

# Model Settings
LORA_R = 16              # Rank dimension (Keep at 16 for standard reproduction)
VOCAB_BIAS_DIGITS = -4.0 # Penalty strength for non-password patterns
```
- Start Training:

```bash
python train.py
```

  This script automates the full pipeline:
  - Freezes the base model (Mistral/Qwen).
  - Injects trainable LoRA adapters into the attention layers.
  - Masks the loss function so the model only learns to predict the password, not the PII.
  - Saves the lightweight adapter weights to `models/PassLLM_LoRA_Weights.pth`.
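The loss-masking step can be illustrated with plain lists of token ids (toy ids, no real tokenizer): label positions covering the PII prompt are set to -100, the ignore index used by PyTorch's cross-entropy loss, so gradient flows only through the password tokens.

```python
IGNORE_INDEX = -100  # PyTorch's CrossEntropyLoss skips targets with this value

def build_masked_labels(prompt_ids: list[int], password_ids: list[int]):
    """Concatenate PII prompt + password into one training sequence and mask
    the prompt portion of the labels, so loss covers only the password."""
    input_ids = prompt_ids + password_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(password_ids)
    return input_ids, labels

if __name__ == "__main__":
    # Toy example: ids 1-3 encode the PII prompt, 7-8 encode the password
    ids, labels = build_masked_labels([1, 2, 3], [7, 8])
    print(ids)     # [1, 2, 3, 7, 8]
    print(labels)  # [-100, -100, -100, 7, 8]
```

The real training loop shifts these labels by one position (standard causal-LM convention) before computing the loss, but the masking idea is the same.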
## Results & Demo
Target `{"name": "Marcus Thorne", "birth_year": "1976", "username": "mthorne88", "country": "Canada"}`:

```
$ python app.py --file target.jsonl --superfast

--- TOP CANDIDATES ---
CONFIDENCE | PASSWORD
------------------------------
1.96% | marcus1976
1.91% | thorne1976
1.20% | mthorne1976
1.19% | marc1976       (marc is a common diminutive of Marcus, used in many passwords)
1.18% | a123456        (a high-probability global baseline across users with similar PII)
1.16% | marci1976      (another common variation of Marcus)
1.01% | winniethepooh  (our training dataset demonstrated Winnie-related passwords to be common in Canada)
... (907 passwords generated)
```
Target `{"name": "Elena Rodriguez", "birth_year": "1995", "birth_month": "12", "birth_day": "04", "email": "elena1.rod51@gmail.com", "id": "489298321"}`:

```
$ python app.py --file target.jsonl --fast

--- TOP CANDIDATES ---
CONFIDENCE | PASSWORD
------------------------------
8.55% | elena1204    (all variations of name + birth date are naturally given very high probability)
8.16% | elena1995
7.77% | elena951204
6.29% | elena9512
5.37% | Elena1995
5.32% | elena1.rod51
5.00% | 120495
... (5,895 passwords generated)
```
Target `{"name": "Sophia M. Turner", "birth_year": "2001", "pet_name": "Fluffy", "username": "soph_t", "email": "sturner99@yahoo.com", "country": "England", "sister_pw": ["soph12345", "13rockm4n", "01mamamia"]}`:

```
$ python app.py --file target.jsonl --fast

--- TOP CANDIDATES ---
CONFIDENCE | PASSWORD
------------------------------
2.93% | sophia123      (a mix of the target's first name and the sister password "soph12345")
2.53% | mamamia01      (a simple variation of another sister password)
1.96% | sophia2001
1.78% | sophie123      (UK passwords often interchange between "sophie" and "sophia")
1.45% | 123456a        (a very common password, ranked high due to the "12345" pattern)
1.39% | soph
```
