# SOFT
This is the implementation of **SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks** (USENIX Security '25).
## Code Structure

```
mia_llms_benchmark/
├── README.md                 # This file
├── environment.yml           # Conda environment specification
├── config_finetune.yaml      # Training configuration
├── config_auc_tpr.yaml       # Evaluation configuration
├── finetune.py               # Main fine-tuning script
├── main.py                   # Evaluation script
├── utils.py                  # Utility functions
├── data/
│   ├── obfuscation.py        # Obfuscation implementations
│   └── prepare.py            # Dataset loading and tokenization
├── attacks/                  # MIA attack implementations
│   ├── __init__.py
│   ├── loss.py
│   ├── ratio.py
│   ├── mink.py
│   ├── minkplusplus.py
│   ├── zlib.py
│   ├── lowercase.py
│   ├── recall.py
│   ├── conrecall.py
│   ├── bag_of_words.py
│   ├── ensemble_classifier.py
│   └── utils.py              # Attack utilities
└── output/                   # Evaluation results
```
## Quick Start

### 1. Install Dependencies

```bash
# Create the conda environment
conda env create -f environment.yml
conda activate mia
```

### 2. Fine-tune the Model with the Defense

```bash
# Single-GPU training
python finetune.py --config config_finetune.yaml --select_ratio X

# Multi-GPU training with DeepSpeed
deepspeed --num_gpus=8 finetune.py --config config_finetune.yaml --select_ratio X
```

### 3. Evaluate Privacy Protection

The reported metrics are AUC-ROC, TPR@0.1FPR, and TPR@0.01FPR.

```bash
python main.py \
    -c config_auc_tpr.yaml \
    --run-all \
    --output "./output/" \
    --target-model "checkpoints/Llama-3.2-X/epoch-X" \
    --dataset "arxiv" \
    --split "ngram_13_0.8"
```
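For reference, these metrics can be computed directly from per-sample attack scores and membership labels. Below is a minimal NumPy-only sketch, not the repo's evaluation code; the function and variable names are illustrative, and 0.1 / 0.01 are read as absolute FPR values:

```python
import numpy as np

def mia_metrics(scores, labels):
    """Compute AUC-ROC and TPR at fixed FPR values from MIA scores.

    scores: higher = more member-like; labels: 1 = member, 0 = non-member.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores)                  # rank samples by descending score
    tps = np.cumsum(labels[order] == 1)          # true positives at each cutoff
    fps = np.cumsum(labels[order] == 0)          # false positives at each cutoff
    tpr = np.concatenate([[0.0], tps / max(tps[-1], 1)])
    fpr = np.concatenate([[0.0], fps / max(fps[-1], 1)])
    # Trapezoidal area under the ROC curve
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
    return {
        "auc": auc,
        "tpr@0.1fpr": float(np.interp(0.1, fpr, tpr)),
        "tpr@0.01fpr": float(np.interp(0.01, fpr, tpr)),
    }
```

A perfectly separating attack yields AUC = 1.0 and TPR = 1.0 at every FPR; a random one sits near AUC = 0.5.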
## Dataset Information

### Original Dataset

- **Source**: `iamgroot42/mimir`
- **Description**: Curated subset of The Pile dataset with membership labels
- **Splits**: Various n-gram and threshold combinations (e.g., `ngram_13_0.8`)
- **Domains**: ArXiv papers, Wikipedia, GitHub code, PubMed, and more

### Example of an Obfuscated Dataset

- **Source**: `LLM-MIA/editing-syn-pr0.5-mimir-arxiv-ngram_13_0.8`
- **Description**: Paraphrased version of the ArXiv subset produced with the text transformations described below
- **Usage**: Ready-to-use obfuscated data for immediate training
## Data Obfuscation

### Generate Your Own Obfuscated Data

The `data/obfuscation.py` module provides tools to create obfuscated datasets:

```bash
# Set up environment variables
export OPENAI_API_KEY="your-api-key"
export HF_TOKEN="your-huggingface-token"

# Use the OpenAI API for paraphrasing
python data/obfuscation.py
```

### Obfuscation Prompts

The framework supports different prompts for different content types.

**Text Paraphrasing Prompt:**

```python
message = [
    {"role": "system", "content": "You are a helpful text rewriting assistant."},
    {"role": "user", "content":
        f"Rewrite the following paragraph by replacing every word with an alternative term that does not share the same root or spelling. Preserve the same meaning and sentence structure as much as possible.\n\"\"\"\n{original_text}\n\"\"\""},
]
```

**Code Obfuscation Prompt:**

```python
message = f"Rewrite the following code so it preserves the same functionality and flow, but changes all variable names, function names, and comments. Maintain the same input-output behavior. Keep it in the same programming language.\n\"\"\"\n{original_text}\n\"\"\""
```
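To show how such a prompt might be assembled and sent, here is a minimal sketch using the official `openai` Python client. It is not the repo's `data/obfuscation.py`; the helper names are illustrative, the model name is an assumption, and `paraphrase` requires `OPENAI_API_KEY` in the environment:

```python
def build_paraphrase_messages(original_text: str) -> list:
    """Assemble the chat messages for the text-paraphrasing prompt."""
    return [
        {"role": "system", "content": "You are a helpful text rewriting assistant."},
        {"role": "user", "content": (
            "Rewrite the following paragraph by replacing every word with an "
            "alternative term that does not share the same root or spelling. "
            "Preserve the same meaning and sentence structure as much as possible."
            f'\n"""\n{original_text}\n"""'
        )},
    ]

def paraphrase(original_text: str, model: str = "gpt-4o-mini") -> str:
    """Send the prompt to the OpenAI chat API (model name is illustrative)."""
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=build_paraphrase_messages(original_text),
    )
    return resp.choices[0].message.content
```

The code-obfuscation variant would differ only in the prompt text passed as the user message.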
## Evaluation

### Available Attack Methods

The framework implements 10+ state-of-the-art MIA attacks:
| Attack Method | Description | Key Parameters |
|---------------|-------------|----------------|
| Loss | Basic loss-based attack | - |
| Zlib | Compression-based attack | - |
| Lowercase | Case-sensitivity attack | - |
| Min-K% Prob | Minimum k-probability attack | k |
| Min-K%++ | Enhanced MinK with calibration | k |
| Ratio | Loss ratio with reference model | reference_model_path |
| Bag of Words | Feature-based ML attack | - |
| ReCall | Prefix-based recall attack | n_shots, extra_non_member_dataset |
| CON-ReCall | Conditional recall attack | n_shots, extra_non_member_dataset |
| Ensemble | Combination of multiple attacks | - |
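To illustrate how these attacks score a sample, here is a minimal sketch of the Min-K% Prob statistic: average the lowest k fraction of per-token log-probabilities, so that samples the model finds uniformly easy (members) score higher. This is an illustration under that published definition, not the repo's `attacks/mink.py`:

```python
import numpy as np

def min_k_prob_score(token_logprobs, k: float = 0.2) -> float:
    """Min-K% Prob: mean of the lowest k fraction of token log-probs.

    token_logprobs: per-token log-probabilities from the target model.
    Higher (less negative) scores suggest membership.
    """
    lp = np.sort(np.asarray(token_logprobs, dtype=float))  # ascending order
    n = max(1, int(len(lp) * k))                           # at least one token
    return float(lp[:n].mean())
```

The score is then thresholded (or fed to the ROC computation) exactly like any other attack statistic in the table.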
### Custom Evaluation

```bash
# Evaluate specific attacks only
python main.py \
    -c config_auc_tpr.yaml \
    --attacks "loss,ratio,mink" \
    --target-model "path/to/model" \
    --dataset "arxiv" \
    --split "ngram_13_0.8"
```
## Citation

If you use this framework in your research, please cite:

```bibtex
@inproceedings{zhang2025soft,
  title     = {SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks},
  author    = {Zhang, Kaiyuan and Cheng, Siyuan and Guo, Hanxi and Chen, Yuetian and Su, Zian and An, Shengwei and Du, Yuntao and Fleming, Charles and Kundu, Ashish and Zhang, Xiangyu and Li, Ninghui},
  booktitle = {34th USENIX Security Symposium (USENIX Security 25)},
  year      = {2025},
  address   = {Seattle, WA},
  publisher = {USENIX Association}
}
```
## Acknowledgments

- The Mimir dataset for providing the evaluation benchmark
- The Pile for the underlying text corpus
- Hugging Face for the model and dataset hosting infrastructure
