# SOFT
This is the implementation of **SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks** (USENIX Security '25).
## Code Structure

```
mia_llms_benchmark/
├── README.md                 # This file
├── environment.yml           # Conda environment specification
├── config_finetune.yaml      # Training configuration
├── config_auc_tpr.yaml       # Evaluation configuration
├── finetune.py               # Main fine-tuning script
├── main.py                   # Evaluation script
├── utils.py                  # Utility functions
├── data/
│   ├── obfuscation.py        # Obfuscation implementations
│   └── prepare.py            # Dataset loading and tokenization
├── attacks/                  # MIA attack implementations
│   ├── __init__.py
│   ├── loss.py
│   ├── ratio.py
│   ├── mink.py
│   ├── minkplusplus.py
│   ├── zlib.py
│   ├── lowercase.py
│   ├── recall.py
│   ├── conrecall.py
│   ├── bag_of_words.py
│   ├── ensemble_classifier.py
│   └── utils.py              # Attack utilities
└── output/                   # Evaluation results
```
## Quick Start

### 1. Install Dependencies

```bash
# Create the conda environment
conda env create -f environment.yml
conda activate mia
```

### 2. Fine-tune the Model with the Defense

```bash
# Single-GPU training
python finetune.py --config config_finetune.yaml --select_ratio X

# Multi-GPU training with DeepSpeed
deepspeed --num_gpus=8 finetune.py --config config_finetune.yaml --select_ratio X
```

### 3. Evaluate Privacy Protection

The reported metrics are AUC-ROC, TPR@0.1FPR, and TPR@0.01FPR.

```bash
python main.py \
    -c config_auc_tpr.yaml \
    --run-all \
    --output "./output/" \
    --target-model "checkpoints/Llama-3.2-X/epoch-X" \
    --dataset "arxiv" \
    --split "ngram_13_0.8"
```
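For reference, these metrics can be computed directly from per-sample attack scores and membership labels. Below is a minimal NumPy-only sketch, not the repo's evaluation code; the function and variable names are illustrative, and 0.1 / 0.01 are read as absolute FPR values:

```python
import numpy as np

def mia_metrics(scores, labels):
    """Compute AUC-ROC and TPR at fixed FPR values from MIA scores.

    scores: higher = more member-like; labels: 1 = member, 0 = non-member.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores)                  # rank samples by descending score
    tps = np.cumsum(labels[order] == 1)          # true positives at each cutoff
    fps = np.cumsum(labels[order] == 0)          # false positives at each cutoff
    tpr = np.concatenate([[0.0], tps / max(tps[-1], 1)])
    fpr = np.concatenate([[0.0], fps / max(fps[-1], 1)])
    # Trapezoidal area under the ROC curve
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
    return {
        "auc": auc,
        "tpr@0.1fpr": float(np.interp(0.1, fpr, tpr)),
        "tpr@0.01fpr": float(np.interp(0.01, fpr, tpr)),
    }
```

A perfectly separating attack yields AUC = 1.0 and TPR = 1.0 at every FPR; a random one sits near AUC = 0.5.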
## Dataset Information

### Original Dataset

- **Source**: `iamgroot42/mimir`
- **Description**: Curated subset of The Pile dataset with membership labels
- **Splits**: Various n-gram and threshold combinations (e.g., `ngram_13_0.8`)
- **Domains**: ArXiv papers, Wikipedia, GitHub code, PubMed, and more

### Example of an Obfuscated Dataset

- **Source**: `LLM-MIA/editing-syn-pr0.5-mimir-arxiv-ngram_13_0.8`
- **Description**: Paraphrased version of the ArXiv subset produced with the text transformations described below
- **Usage**: Ready-to-use obfuscated data for immediate training
## Data Obfuscation

### Generate Your Own Obfuscated Data

The `data/obfuscation.py` module provides tools to create obfuscated datasets:

```bash
# Set up environment variables
export OPENAI_API_KEY="your-api-key"
export HF_TOKEN="your-huggingface-token"

# Use the OpenAI API for paraphrasing
python data/obfuscation.py
```

### Obfuscation Prompts

The framework supports different prompts for different content types.

**Text Paraphrasing Prompt:**

```python
message = [
    {"role": "system", "content": "You are a helpful text rewriting assistant."},
    {"role": "user", "content":
        f"Rewrite the following paragraph by replacing every word with an alternative term that does not share the same root or spelling. Preserve the same meaning and sentence structure as much as possible.\n\"\"\"\n{original_text}\n\"\"\""},
]
```

**Code Obfuscation Prompt:**

```python
message = f"Rewrite the following code so it preserves the same functionality and flow, but changes all variable names, function names, and comments. Maintain the same input-output behavior. Keep it in the same programming language.\n\"\"\"\n{original_text}\n\"\"\""
```
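To show how such a prompt might be assembled and sent, here is a minimal sketch using the official `openai` Python client. It is not the repo's `data/obfuscation.py`; the helper names are illustrative, the model name is an assumption, and `paraphrase` requires `OPENAI_API_KEY` in the environment:

```python
def build_paraphrase_messages(original_text: str) -> list:
    """Assemble the chat messages for the text-paraphrasing prompt."""
    return [
        {"role": "system", "content": "You are a helpful text rewriting assistant."},
        {"role": "user", "content": (
            "Rewrite the following paragraph by replacing every word with an "
            "alternative term that does not share the same root or spelling. "
            "Preserve the same meaning and sentence structure as much as possible."
            f'\n"""\n{original_text}\n"""'
        )},
    ]

def paraphrase(original_text: str, model: str = "gpt-4o-mini") -> str:
    """Send the prompt to the OpenAI chat API (model name is illustrative)."""
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=build_paraphrase_messages(original_text),
    )
    return resp.choices[0].message.content
```

The code-obfuscation variant would differ only in the prompt text passed as the user message.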
## Evaluation

### Available Attack Methods

The framework implements 10+ state-of-the-art MIA attacks:
| Attack Method | Description | Key Parameters |
|---------------|-------------|----------------|
| Loss | Basic loss-based attack | - |
| Zlib | Compression-based attack | - |
| Lowercase | Case-sensitivity attack | - |
| Min-K% Prob | Minimum k-probability attack | k |
| Min-K%++ | Enhanced MinK with calibration | k |
| Ratio | Loss ratio with reference model | reference_model_path |
| Bag of Words | Feature-based ML attack | - |
| ReCall | Prefix-based recall attack | n_shots, extra_non_member_dataset |
| CON-ReCall | Conditional recall attack | n_shots, extra_non_member_dataset |
| Ensemble | Combination of multiple attacks | - |
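To illustrate how these attacks score a sample, here is a minimal sketch of the Min-K% Prob statistic: average the lowest k fraction of per-token log-probabilities, so that samples the model finds uniformly easy (members) score higher. This is an illustration under that published definition, not the repo's `attacks/mink.py`:

```python
import numpy as np

def min_k_prob_score(token_logprobs, k: float = 0.2) -> float:
    """Min-K% Prob: mean of the lowest k fraction of token log-probs.

    token_logprobs: per-token log-probabilities from the target model.
    Higher (less negative) scores suggest membership.
    """
    lp = np.sort(np.asarray(token_logprobs, dtype=float))  # ascending order
    n = max(1, int(len(lp) * k))                           # at least one token
    return float(lp[:n].mean())
```

The score is then thresholded (or fed to the ROC computation) exactly like any other attack statistic in the table.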
### Custom Evaluation

```bash
# Evaluate specific attacks only
python main.py \
    -c config_auc_tpr.yaml \
    --attacks "loss,ratio,mink" \
    --target-model "path/to/model" \
    --dataset "arxiv" \
    --split "ngram_13_0.8"
```
## Citation

If you use this framework in your research, please cite:

```bibtex
@inproceedings{zhang2025soft,
  title     = {SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks},
  author    = {Zhang, Kaiyuan and Cheng, Siyuan and Guo, Hanxi and Chen, Yuetian and Su, Zian and An, Shengwei and Du, Yuntao and Fleming, Charles and Kundu, Ashish and Zhang, Xiangyu and Li, Ninghui},
  booktitle = {34th USENIX Security Symposium (USENIX Security 25)},
  year      = {2025},
  address   = {Seattle, WA},
  publisher = {USENIX Association}
}
```
## Acknowledgments

- The Mimir dataset for providing the evaluation benchmark
- The Pile for the underlying text corpus
- Hugging Face for the model and dataset hosting infrastructure
