HostileShop
A Quaint Hostel Shop with Sharp Tools
Install / Use
/learn @mikeperry-tor/HostileShopREADME
HostileShop: A Quaint Hostel Shop with Sharp Tools
HostileShop is a tool for generating prompt injections and jailbreaks against LLM agents. It creates asimulated web shopping agent environment where an attacker agent attempts to manipulate a target shopping agent into performing unauthorized actions, recording successful attack examples in the process.
The framework automatically and immediately detects success conditions for direct and indirect prompt injections, without using an LLM to judge success. This enables low cost in-context learning by the attacker agent via immediate success reporting, and long-term learning via novel injection example extraction.
HostileShop supports the entire agent-capable LLM frontier, and maintains working attack examples for all such LLMs.
HostileShop also supports adversarial evaluation of prompt filters, and has generated bypass injections for gpt-oss-safeguard, even with a custom HostileShop-adapted safeguard policy.
HostileShop also provides a Prompt Injection Assistant Mode, where the full set of injection examples are provided to an agent that has been given instructions to assist you in performing red team exercises against other agent systems. Prompt injections discovered by HostileShop can be adapted to other agentic systems with this mode, since they are issues with the underlying LLM, rather than any specific agent system or agent framework.
OpenAI GPT-OSS-20B Red Team Contest Winner
HostileShop was one of the ten prize winners in OpenAI's GPT-OSS-20B RedTeam Contest.
The official contest writeup for HostileShop contains more information specific to the attacks that HostileShop discovered against gpt-oss-20b.
The branch gpt-oss-20b-submission preserves the code used to generate contest findings, and includes reproduction instructions in its README.
This branch contains many new features and improvements since then.
Table of Contents
Attack Capabilities
The detailed results of the framework against gpt-oss-20b are documented in my contest writeup.
HostileShop has been expanded and enhanced since then. The high-level attack capabilities are as follows:
Context Window Structure Injection
The entire agent-capable frontier is currently vulnerable to attacks that render portions of the LLM's context window in common markup languages, such as XML, JSON, TOML, YAML, and Markdown.
HostileShop discovers examples of this vulerability by providing the attacker agent with a description of the context window of the target agent, along with instructions and curated examples on how to generate prompt injections that the target will recognize as if they were native context window tags.
This includes:
HostileShop provides utilities to automatically generate context window format documentation for both open-weight and closed-weight models.
Code Debugging and Social Engineering
The attacker agent discovered that code debugging questions are quite effective at causing secrets to be revealed. It also discovered that social engineering attacks are quite effective.
When used in combination, most of the LLM frontier will perform debugging that leaks confidential information as a side effect.
Jailbreak Mutation and Porting
With the introduction of externally sourced jailbreaks, HostileShop is able to mutate and enhance these jailbreaks so that they work again, to bypass filters, overcome model adaptation, or covercome additional prompt instructions.
Interestingly, old universal jailbreaks that have been fixed by the LLM provider or blocked by system prompt instructions will often work again when mutated or combined with other attacks.
Additionally, jailbreaks can be ported between models through this mutation.
Adversarial Prompt Filter Bypass
HostileShop is capable of evaluating arbitrary prompt filter systems. It has generated bypass injections for gpt-oss-safeguard, even with a custom HostileShop-adapted safeguard policy.
ParselTongue Obfuscation
HostileShop contains an implementation of ParselTongue as a single python multitool for the attacker agent to perform obfuscation of injections, by layering multiple transforms together. It contains a prompt fragment that is useful as a general jailbreaking and for bypassing prompt filters.
Attack Stacking
Attacks become even more reliable when all of the above are stacked together. This enables attacks to succeed against larger models, such as GPT-5, Claude-4.5, GLM-4.6, and Kimi-K2. It also enables bypass of policy-based prompt filters, by combining injections for the model with injections for the prompt filter.
For new models, or new jailbreaks, it takes some probing for the attacker to discover successful combinations, but once it does, it is usually able to make them quite reliable, especially to induce tool call invocation.
Installation
Setup
-
Clone the repository:
git clone https://github.com/mikeperry-tor/HostileShop.git cd HostileShop -
Install dependencies:
Choose one of the following methods to install the required Python packages:
Using pip:
pip install -r requirements.txtUsing conda:
# Create a new conda environment conda create -n HostileShop python=3.13 conda activate HostileShop # Install dependencies pip install -r requirements.txtUsing uv:
# Create a virtual environment uv venv # Activate the virtual environment source .venv/bin/activate # On Windows: .venv\Scripts\activate # Install dependencies uv pip install -r requirements.txt -
**
Related Skills
node-connect
347.2kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
108.0kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
347.2kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
347.2kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
