# JustAsk

**Curious Code Agents Reveal System Prompts in Frontier LLMs**
</div>

<p align="center">
  <a href="https://arxiv.org/abs/2601.21233"><img src="https://img.shields.io/badge/arXiv-2601.21233-b31b1b.svg" alt="arXiv"></a>
  <a href="https://x-zheng16.github.io/System-Prompt-Open/"><img src="https://img.shields.io/badge/Gallery-System_Prompt_Open-22D3BB.svg" alt="Gallery"></a>
  <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License"></a>
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/Python-3.11+-blue.svg" alt="Python"></a>
</p>

<p align="center">
  <a href="https://github.com/x-zheng16/JustAsk/stargazers"><img src="https://img.shields.io/github/stars/x-zheng16/JustAsk?style=social" alt="Stars"></a>
  <a href="https://github.com/x-zheng16/JustAsk/network/members"><img src="https://img.shields.io/github/forks/x-zheng16/JustAsk?style=social" alt="Forks"></a>
</p>

<h3 align="center">
  <a href="https://arxiv.org/abs/2601.21233">Paper</a> ·
  <a href="https://x-zheng16.github.io/System-Prompt-Open/">Gallery</a> ·
  <a href="https://github.com/x-zheng16/System-Prompt-Open">Gallery Data</a> ·
  <a href="https://github.com/x-zheng16/JustAsk/issues">Issues</a>
</h3>

> [!CAUTION]
> Research use only. JustAsk is released exclusively for academic safety research, responsible disclosure, and evaluation of LLM security. We do not condone or permit any use of this tool for unauthorized extraction, prompt theft, or exploitation of commercial systems.
<div align="center">
<a href="https://x-zheng16.github.io/">Xiang Zheng</a><sup>1</sup>, <a href="https://github.com/wuyoscar">Yutao Wu</a><sup>2</sup>, <a href="https://hanxunh.github.io/">Hanxun Huang</a><sup>3</sup>, <a href="https://github.com/bboylyg">Yige Li</a><sup>4</sup>, <a href="https://xingjunma.com/">Xingjun Ma</a><sup>5,†</sup>, <a href="https://aisecure.github.io/">Bo Li</a><sup>6</sup>, <a href="https://scholar.google.com/citations?user=f3_FP8AAAAAJ">Yu-Gang Jiang</a><sup>5</sup>, <a href="https://www.cs.cityu.edu.hk/~congwang/">Cong Wang</a><sup>1,†</sup>
<sup>1</sup>City University of Hong Kong, <sup>2</sup>Deakin University, <sup>3</sup>The University of Melbourne, <sup>4</sup>Singapore Management University, <sup>5</sup>Fudan University, <sup>6</sup>University of Illinois at Urbana-Champaign
<sup>†</sup>Corresponding authors
</div>

## Latest News
| Date    | Update                                                                |
|:--------|:----------------------------------------------------------------------|
| 2026-03 | Code and data open-sourced on GitHub                                  |
| 2026-03 | System Prompt Open Gallery launched with 45+ extracted system prompts |
| 2026-01 | Paper posted on arXiv                                                 |
## Overview
<div align="center"> <img src="assets/fig_framework.png" alt="JustAsk Framework" width="100%"> <p><em>JustAsk framework: a self-evolving agent that autonomously discovers extraction strategies through UCB-guided skill selection.</em></p> </div>

JustAsk is a self-evolving framework that autonomously discovers effective system prompt extraction strategies through interaction alone. Unlike prior prompt-engineering or dataset-based attacks, JustAsk requires no handcrafted prompts, labeled supervision, or privileged access beyond standard user interaction.
**Key Insight:** Autonomous code agents fundamentally expand the LLM attack surface. JustAsk treats each model interaction as a learning opportunity: the agent evolves its skill set organically through experience, not model fine-tuning.
## Results
<div align="center"> <img src="assets/fig_validation.png" alt="Extraction Results" width="100%"> <p><em>Validation: JustAsk's semantic extraction (left) closely matches the ground truth obtained via reverse engineering (right), confirming high extraction fidelity.</em></p> </div>

Browse the full extraction results at the System Prompt Open Gallery.
## Abstract
Autonomous code agents built on large language models are reshaping software and AI development through tool use, long-horizon reasoning, and self-directed interaction. However, this autonomy introduces a previously unrecognized security risk: agentic interaction fundamentally expands the LLM attack surface, enabling systematic probing and recovery of hidden system prompts that guide model behavior. We identify system prompt extraction as an emergent vulnerability intrinsic to code agents and present JustAsk, a self-evolving framework that autonomously discovers effective extraction strategies through interaction alone. Unlike prior prompt-engineering or dataset-based attacks, JustAsk requires no handcrafted prompts, labeled supervision, or privileged access beyond standard user interaction. It formulates extraction as an online exploration problem, using Upper Confidence Bound-based strategy selection and a hierarchical skill space spanning atomic probes and high-level orchestration. These skills exploit imperfect system-instruction generalization and inherent tensions between helpfulness and safety. Evaluated on 45 black-box commercial models across multiple providers, JustAsk consistently achieves full or near-complete system prompt recovery, revealing recurring design- and architecture-level vulnerabilities. Our results expose system prompts as a critical yet largely unprotected attack surface in modern agent systems.
## Method
**Skill Set Definition:**

```
Skill Set = Skills (fixed) + Rules (evolving) + Stats (evolving)
```
| Component | Role                                      | Evolves? |
|-----------|-------------------------------------------|----------|
| Skills    | Fixed vocabulary (L1-L14, H1-H15)         | No       |
| Rules     | Exploitation knowledge (long-term memory) | Yes      |
| Stats     | Exploration guidance (UCB)                | Yes      |
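The three components above can be sketched as a single container: the skill vocabulary is frozen, while rules and stats mutate as the agent gains experience. `SkillSet` and its `(successes, trials)` stats layout are illustrative names for this sketch, not the repo's actual data structures:

```python
from dataclasses import dataclass, field

@dataclass
class SkillSet:
    """Illustrative container: fixed skills, evolving rules and stats."""
    skills: tuple                                # fixed vocabulary, e.g. ("L1", ..., "H15")
    rules: list = field(default_factory=list)    # long-term exploitation knowledge
    stats: dict = field(default_factory=dict)    # skill -> (successes, trials), feeds UCB

    def record(self, skill, success):
        """Update per-skill counters after one extraction attempt."""
        s, n = self.stats.get(skill, (0, 0))
        self.stats[skill] = (s + int(success), n + 1)

ss = SkillSet(skills=("L1", "H1"))
ss.record("L1", True)
ss.record("L1", False)
ss.stats["L1"]  # (1, 2): one success in two trials
```

Only `stats` and `rules` are written back between episodes; `skills` never changes, which is what lets the UCB statistics stay comparable across runs.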
**Skill Selection (UCB):**

```
UCB(Ci) = success_rate(Ci) + c * sqrt(ln(N) / ni)
          ────────────────   ─────────────────────
          exploitation       exploration (curiosity)
```
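The selection rule above can be sketched in a few lines of Python. `ucb_score` and `select_skill` are illustrative names, not the repo's API (the real implementation lives in `src/ucb_ranking.py`); untried skills get an infinite score so every skill is explored at least once:

```python
import math

def ucb_score(successes, trials, total_trials, c=1.41):
    """UCB(Ci) = success_rate(Ci) + c * sqrt(ln(N) / ni)."""
    if trials == 0:
        return float("inf")                  # force one try of unexplored skills
    success_rate = successes / trials        # exploitation term
    bonus = c * math.sqrt(math.log(total_trials) / trials)  # exploration term
    return success_rate + bonus

def select_skill(stats, c=1.41):
    """Pick the skill with the highest UCB score.

    `stats` maps skill name -> (successes, trials), updated by the
    agent after every extraction attempt.
    """
    total = sum(t for _, t in stats.values()) or 1
    return max(stats, key=lambda s: ucb_score(*stats[s], total, c))

stats = {"L1": (3, 5), "H2": (1, 2), "H7": (0, 0)}
select_skill(stats)  # "H7" — never tried, so its infinite bonus wins
```

The `c` constant trades off curiosity against exploitation; larger values keep the agent sampling low-success skills for longer.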
Structure
.
├── config/
│ └── exp_config.yaml # Experiment configuration
├── docs/
│ ├── PAP.md # Persuasion templates & skill mappings
│ └── PAP_taxonomy.jsonl # 40 real-world persuasion patterns
└── src/
├── skill_evolving.py # Main extraction via OpenRouter
├── skill_testing.py # Controlled evaluation
├── skill_testing_controlled.py # Protection-level evaluation
├── ucb_ranking.py # UCB skill selection algorithm
├── knowledge.py # Knowledge persistence
├── validation.py # Cross-verify & self-consistency
└── ...
## Setup

```bash
conda create -n justask python=3.11
conda activate justask
pip install python-dotenv requests numpy
```

Create a `.env` file:

```
OPENROUTER_API_KEY=sk-or-v1-your-key-here
```
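As a quick sanity check that the key is picked up, the snippet below builds the headers and payload for OpenRouter's standard chat-completions endpoint. `build_openrouter_request` is a hypothetical helper for illustration, not part of this repo; the endpoint URL and `Bearer` header follow OpenRouter's published API:

```python
import os

try:
    from dotenv import load_dotenv  # optional: reads OPENROUTER_API_KEY from .env
    load_dotenv()
except ImportError:
    pass  # fall back to variables already exported in the shell

def build_openrouter_request(model, messages):
    """Assemble (url, headers, payload) for an OpenRouter chat completion.

    Hypothetical helper: shows the request shape only; send it with
    e.g. requests.post(url, headers=headers, json=payload).
    """
    key = os.environ["OPENROUTER_API_KEY"]  # raises KeyError if unset
    url = "https://openrouter.ai/api/v1/chat/completions"
    headers = {"Authorization": f"Bearer {key}", "Content-Type": "application/json"}
    payload = {"model": model, "messages": messages}
    return url, headers, payload
```

If `OPENROUTER_API_KEY` is missing from both `.env` and the environment, the `KeyError` surfaces immediately instead of failing later with an opaque 401.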
## Controlled Evaluation

```bash
python src/skill_testing.py --model openai/gpt-5.2
```
| Metric       | Description                        |
|--------------|------------------------------------|
| Semantic Sim | Embedding cosine similarity        |
| Secret Leak  | Fraction of injected secrets found |
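Both metrics are simple to state precisely. This minimal sketch assumes embeddings arrive as numeric vectors and secrets as canary substrings; `semantic_sim` and `secret_leak` are illustrative names, not the evaluation script's actual functions:

```python
import numpy as np

def semantic_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def secret_leak(extracted, secrets):
    """Fraction of injected canary secrets found verbatim in the extraction."""
    return sum(s in extracted for s in secrets) / len(secrets)

semantic_sim([1.0, 0.0], [1.0, 0.0])                     # 1.0: identical direction
secret_leak("...token=ABC123...", ["ABC123", "XYZ789"])  # 0.5: one of two canaries leaked
```

Substring matching makes Secret Leak an exact, embedding-free signal, which is why controlled runs inject secrets rather than relying on similarity scores alone.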
See `START.md` for detailed agent instructions.
## Related Projects
From the same team:
- System Prompt Open: Gallery of extracted system prompts from 45+ frontier models
- ISC-Bench: Internal Safety Collapse in Frontier LLMs
- Awesome-Embodied-AI-Safety: Safety in Embodied AI: Risks, Attacks, and Defenses
- Awesome-Large-Model-Safety: Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety
- XTransferBench: Super Transferable Adversarial Attacks on CLIP (ICML 2025)
- BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on LLMs (NeurIPS 2025)
- BackdoorAgent: Backdoor Attacks on LLM-based Agent Workflows
