SkillAgentSearch skills...

HostileShop

A Quaint Hostel Shop with Sharp Tools

Install / Use

/learn @mikeperry-tor/HostileShop
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

HostileShop: A Quaint Hostel Shop with Sharp Tools

HostileShop is a tool for generating prompt injections and jailbreaks against LLM agents. It creates asimulated web shopping agent environment where an attacker agent attempts to manipulate a target shopping agent into performing unauthorized actions, recording successful attack examples in the process.

The framework automatically and immediately detects success conditions for direct and indirect prompt injections, without using an LLM to judge success. This enables low cost in-context learning by the attacker agent via immediate success reporting, and long-term learning via novel injection example extraction.

HostileShop supports the entire agent-capable LLM frontier, and maintains working attack examples for all such LLMs.

HostileShop also supports adversarial evaluation of prompt filters, and has generated bypass injections for gpt-oss-safeguard, even with a custom HostileShop-adapted safeguard policy.

HostileShop also provides a Prompt Injection Assistant Mode, where the full set of injection examples are provided to an agent that has been given instructions to assist you in performing red team exercises against other agent systems. Prompt injections discovered by HostileShop can be adapted to other agentic systems with this mode, since they are issues with the underlying LLM, rather than any specific agent system or agent framework.

OpenAI GPT-OSS-20B Red Team Contest Winner

HostileShop was one of the ten prize winners in OpenAI's GPT-OSS-20B RedTeam Contest.

The official contest writeup for HostileShop contains more information specific to the attacks that HostileShop discovered against gpt-oss-20b.

The branch gpt-oss-20b-submission preserves the code used to generate contest findings, and includes reproduction instructions in its README.

This branch contains many new features and improvements since then.

Table of Contents

Attack Capabilities

The detailed results of the framework against gpt-oss-20b are documented in my contest writeup.

HostileShop has been expanded and enhanced since then. The high-level attack capabilities are as follows:

Context Window Structure Injection

The entire agent-capable frontier is currently vulnerable to attacks that render portions of the LLM's context window in common markup languages, such as XML, JSON, TOML, YAML, and Markdown.

HostileShop discovers examples of this vulerability by providing the attacker agent with a description of the context window of the target agent, along with instructions and curated examples on how to generate prompt injections that the target will recognize as if they were native context window tags.

This includes:

HostileShop provides utilities to automatically generate context window format documentation for both open-weight and closed-weight models.

Code Debugging and Social Engineering

The attacker agent discovered that code debugging questions are quite effective at causing secrets to be revealed. It also discovered that social engineering attacks are quite effective.

When used in combination, most of the LLM frontier will perform debugging that leaks confidential information as a side effect.

Jailbreak Mutation and Porting

With the introduction of externally sourced jailbreaks, HostileShop is able to mutate and enhance these jailbreaks so that they work again, to bypass filters, overcome model adaptation, or covercome additional prompt instructions.

Interestingly, old universal jailbreaks that have been fixed by the LLM provider or blocked by system prompt instructions will often work again when mutated or combined with other attacks.

Additionally, jailbreaks can be ported between models through this mutation.

Adversarial Prompt Filter Bypass

HostileShop is capable of evaluating arbitrary prompt filter systems. It has generated bypass injections for gpt-oss-safeguard, even with a custom HostileShop-adapted safeguard policy.

ParselTongue Obfuscation

HostileShop contains an implementation of ParselTongue as a single python multitool for the attacker agent to perform obfuscation of injections, by layering multiple transforms together. It contains a prompt fragment that is useful as a general jailbreaking and for bypassing prompt filters.

Attack Stacking

Attacks become even more reliable when all of the above are stacked together. This enables attacks to succeed against larger models, such as GPT-5, Claude-4.5, GLM-4.6, and Kimi-K2. It also enables bypass of policy-based prompt filters, by combining injections for the model with injections for the prompt filter.

For new models, or new jailbreaks, it takes some probing for the attacker to discover successful combinations, but once it does, it is usually able to make them quite reliable, especially to induce tool call invocation.

Installation

Setup

  1. Clone the repository:

    git clone https://github.com/mikeperry-tor/HostileShop.git
    cd HostileShop
    
  2. Install dependencies:

    Choose one of the following methods to install the required Python packages:

    Using pip:

    pip install -r requirements.txt
    

    Using conda:

    # Create a new conda environment
    conda create -n HostileShop python=3.13
    conda activate HostileShop
    # Install dependencies
    pip install -r requirements.txt
    

    Using uv:

    # Create a virtual environment
    uv venv
    # Activate the virtual environment
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    # Install dependencies
    uv pip install -r requirements.txt
    
  3. **

Related Skills

View on GitHub
GitHub Stars71
CategoryDevelopment
Updated22d ago
Forks12

Languages

Python

Security Score

95/100

Audited on Mar 12, 2026

No findings