RADAR
Code for our NeurIPS 2023 paper, RADAR: Robust AI-Text Detection via Adversarial Learning. We tested RADAR on 8 LLMs, including Vicuna and LLaMA. The results show that RADAR attains good detection performance on LLM-generated text while remaining robust against paraphrasing.
RADAR_AI_Detection
Live demo for RADAR: RADAR-Demo
Environment Build
# Go to the env directory
cd env
# Create a conda environment with the conda-managed packages
conda env create -f radar_core.yaml
# Activate the conda environment
conda activate radar_env
# Install the remaining packages with pip
pip install -r radar_requirements.txt
Use RADAR to get AI-generated probability
Our RADAR detector is trained from the RoBERTa-large model, so you can use it the same way you would use a RoBERTa-large model. The example below uses the RADAR-Vicuna-7B detector to get the probability that a text is AI-generated.
import torch
import torch.nn.functional as F
import transformers

# Assumption: run on a GPU when one is available, otherwise on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
detector = transformers.AutoModelForSequenceClassification.from_pretrained("TrustSafeAI/RADAR-Vicuna-7B")
tokenizer = transformers.AutoTokenizer.from_pretrained("TrustSafeAI/RADAR-Vicuna-7B")
detector.eval()
detector.to(device)
Text_input = ["I'm not a chatbot"]
with torch.no_grad():
    inputs = tokenizer(Text_input, padding=True, truncation=True, max_length=512, return_tensors="pt")
    inputs = {k: v.to(device) for k, v in inputs.items()}
    # Column 0 of the softmax output is read as the AI-generated probability
    output_probs = F.log_softmax(detector(**inputs).logits, -1)[:, 0].exp().tolist()
print("Probability of AI-generated texts is", output_probs)
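The last step of the snippet above turns the classifier's two logits into a probability: `log_softmax(...).exp()` is simply a softmax, and column 0 is read as the AI-generated class. A minimal, dependency-free sketch of that step follows. The helper name `ai_probability` and the logit values are made up for illustration and are not real RADAR outputs.

```python
import math

def ai_probability(logits):
    """Softmax over one row of logits, returning the probability of class 0.

    Assumption: class 0 is the AI-generated label, mirroring the [:, 0]
    indexing in the RADAR snippet. `logits` is a plain list of floats.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[0] / sum(exps)

# Hypothetical logits for three texts (not real RADAR outputs).
batch_logits = [[2.0, -1.0], [0.0, 0.0], [-3.0, 3.0]]
probs = [ai_probability(row) for row in batch_logits]
print([round(p, 4) for p in probs])
```

Equal logits give a probability of 0.5, and the larger the first logit is relative to the second, the closer the AI-generated probability gets to 1.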
Paraphrase the AI-text to evade detection
We prompt gpt-3.5-turbo or gpt-4 to paraphrase the AI-generated text so that it reads more like human-written text.
import openai
openai.api_key = "your_api_key"

def _openai_response(text, openai_model):
    # Get a paraphrase of `text` from an OpenAI chat model.
    # openai_model can be "gpt-3.5-turbo" or "gpt-4".
    # Note: openai.ChatCompletion is the interface of openai versions before 1.0.
    system_instruct = {"role": "system", "content": "Enhance the word choices in the sentence to sound more like that of a human."}
    user_input = {"role": "user", "content": text}
    messages = [system_instruct, user_input]
    kwargs = {"messages": messages, "model": openai_model}
    r = openai.ChatCompletion.create(**kwargs)['choices'][0].message.content
    return r
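One way to see whether paraphrasing actually evades a detector is to score each text before and after paraphrasing. The sketch below is only an illustration under stated assumptions: `evasion_rate`, the 0.5 threshold, and the toy stand-in functions are hypothetical and are not part of the RADAR code, which reports AUROC instead. In practice `paraphrase_fn` could wrap a paraphrasing call such as the one above, and `score_fn` could wrap the detector.

```python
def evasion_rate(ai_texts, paraphrase_fn, score_fn, threshold=0.5):
    """Fraction of initially detected texts that fall below `threshold`
    after paraphrasing.

    paraphrase_fn: callable mapping a text to its paraphrase.
    score_fn: callable mapping a text to an AI-generated probability.
    threshold: hypothetical decision threshold (0.5 is an arbitrary choice).
    """
    detected = [t for t in ai_texts if score_fn(t) >= threshold]
    if not detected:
        return 0.0
    evaded = sum(1 for t in detected if score_fn(paraphrase_fn(t)) < threshold)
    return evaded / len(detected)

# Toy stand-ins so the sketch runs offline; they do not model a real detector.
def toy_paraphrase(text):
    return text.replace("utilize", "use")

def toy_score(text):
    return 0.9 if "utilize" in text else 0.2

texts = ["We utilize a model.", "They utilize data.", "Plain sentence."]
rate = evasion_rate(texts, toy_paraphrase, toy_score)
print(rate)
```

Passing the paraphraser and the scorer in as callables keeps the measurement independent of any particular API or model.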
Calculate the Detection AUROC
To evaluate the detector, compute its detection AUROC from the AI-generated probabilities it assigns to human-written and AI-generated texts.
from sklearn.metrics import auc, roc_curve

def get_roc_metrics(human_preds, ai_preds):
    # human_preds: list of AI-generated probabilities assigned to human-written texts
    # ai_preds: list of AI-generated probabilities assigned to AI-generated texts
    fpr, tpr, _ = roc_curve([0] * len(human_preds) + [1] * len(ai_preds), human_preds + ai_preds, pos_label=1)
    roc_auc = auc(fpr, tpr)
    return fpr.tolist(), tpr.tolist(), float(roc_auc)
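As a sanity check on what the AUROC means, it can also be computed directly as the fraction of (human, AI) pairs in which the AI-generated text receives the higher score, with ties counted as half. The sketch below does this in plain Python. The name `pairwise_auroc` and the toy scores are illustrative assumptions, not part of the RADAR code.

```python
def pairwise_auroc(human_preds, ai_preds):
    """AUROC as the fraction of (human, AI) pairs in which the AI text
    receives the higher score; ties count as half a win."""
    wins = 0.0
    for a in ai_preds:
        for h in human_preds:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(ai_preds) * len(human_preds))

# Toy AI-generated probabilities (made up for illustration).
human_preds = [0.1, 0.4, 0.35]
ai_preds = [0.8, 0.7, 0.3]
score = pairwise_auroc(human_preds, ai_preds)
print(score)
```

Here 7 of the 9 pairs are ordered correctly, so the score is 7/9. The loop is quadratic in the number of texts, so it is only meant for small checks; the ROC-curve version is the practical choice for full evaluations.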
Examples
We provide some examples of using RADAR in radar_examples.ipynb. Refer to it to get more familiar with the RADAR workflow.
Citation
If you find RADAR useful, please cite the following paper:
@inproceedings{DBLP:conf/nips/HuCH23,
  author    = {Xiaomeng Hu and
               Pin{-}Yu Chen and
               Tsung{-}Yi Ho},
  title     = {{RADAR:} Robust AI-Text Detection via Adversarial Learning},
  booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference
               on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans,
               LA, USA, December 10 - 16, 2023},
  year      = {2023}
}
Contact
Feel free to contact Xiaomeng Hu if you have any questions.
