AgentRun

The easiest and fastest way to run AI-generated Python code safely

AgentRun: Run AI Generated Code Safely

AgentRun is a Python library that makes it easy to run Python code safely from large language models (LLMs) with a single line of code. Built on top of the Docker Python SDK and RestrictedPython, it provides a simple, transparent, and user-friendly API to manage isolated code execution.

AgentRun automatically installs and uninstalls dependencies with optional caching, limits resource consumption, checks code safety, and sets execution timeouts. It has 97% test coverage with full static typing and only two dependencies.

[!NOTE] Looking for a state of the art RAG API? Check out ColiVara, also from us.

Why?

Giving LLMs the ability to execute code is a massive upgrade. Consider the following user query: "What is 12345 * 54321?" Or something more ambitious, like "What was the average daily move of Apple stock during the last week?" With code execution, an LLM can answer both accurately by running code instead of guessing.

However, executing untrusted code is dangerous and full of potential footguns. For instance, without proper safeguards, an LLM might generate harmful code like this:

import os
# deletes all files and directories
os.system('rm -rf /')

This package gives code execution ability to any LLM in a single line of code, while preventing and guarding against dangerous code.

Key Features

  • Safe code execution: AgentRun checks the generated code for dangerous elements before execution.
  • Isolated environment: Code is executed in a fully isolated Docker container.
  • Configurable resource management: You can set how many compute resources the code can consume, with sane defaults.
  • Timeouts: Set time limits on how long a script can take to run.
  • Dependency management: Complete control over which dependencies can be installed.
  • Dependency caching: AgentRun lets you cache any dependency in the Docker container in advance to optimize performance.
  • Automatic cleanups: AgentRun cleans up any artifacts created by the generated code.
  • Comes with a REST API: Hate setting up Docker? AgentRun comes with an already configured Docker setup for self-hosting.
  • Transparent exception handling: AgentRun returns exactly the same output as running Python on your system, exceptions and tracebacks included. No cryptic Docker messages.
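
To make the "checks the generated code for dangerous elements" idea concrete, here is a minimal sketch of one possible static check: walking the syntax tree and flagging imports of modules on a denylist. This is an illustration only, not AgentRun's actual safety check (the README says it builds on RestrictedPython); the `RISKY_MODULES` set and the `find_risky_imports` helper are hypothetical names introduced for this example.

```python
import ast
from typing import List

# Hypothetical denylist for illustration; not AgentRun's actual rule set.
RISKY_MODULES = {"os", "subprocess", "shutil", "socket"}


def find_risky_imports(source: str) -> List[str]:
    """Return the sorted top-level module names in `source` that are denylisted."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # "os.path" counts as "os"
            if root in RISKY_MODULES:
                found.add(root)
    return sorted(found)


print(find_risky_imports("import os\nos.system('rm -rf /')"))  # ['os']
```

A denylist like this is easy to bypass (for example through `__import__`), which is why a static check is best treated as a first filter in front of container isolation rather than a replacement for it.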

If you want to use your own Docker configuration, install this package with pip and simply initialize AgentRun with a running Docker container. Additionally, you can use an already configured Docker Compose setup and API that is ready for self-hosting by cloning this repo.

Unless you are comfortable with Docker, we highly recommend using the REST API with the already configured Docker as a standalone service.

Getting Started

There are two ways to use AgentRun, depending on your needs: with pip and your own Docker setup, or as a standalone REST API service (recommended).

REST API

Clone the GitHub repository and start immediately with a standalone REST API.

git clone https://github.com/Jonathan-Adly/agentrun
cd agentrun/agentrun-api
cp .env.example .env.dev
docker-compose up -d --build

You now have a fully running code execution API: code in, output out.

fetch('http://localhost:8000/v1/run/', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json'
    },
    body: JSON.stringify({
        code: "print('hello, world!')"
    })
})
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error('Error:', error));

Or, if you prefer the terminal:

curl -X POST http://localhost:8000/v1/run/ -H "Content-Type: application/json" -d '{"code": "print(\"hello, world!\")"}'

pip install

Install AgentRun with a single command via pip (you will need to configure your own Docker setup):

pip install agentrun

Here is a simple example:

from agentrun import AgentRun

runner = AgentRun(container_name="my_container") # container should be running
code_from_llm = get_code_from_llm(prompt) # "print('hello, world!')"

result = runner.execute_code_in_container(code_from_llm)
print(result)
#> "hello, world!"

| Difference | Python Package | REST API |
| --- | --- | --- |
| Docker setup | You set it up | Already set up for you |
| Installation | Pip | Git clone |
| Ease of use | Easy | Super Easy |
| Requirements | A running Docker container | Docker installed |
| Customize | Fully | Partially |

Usage

Now, let's see AgentRun in action with something more complicated. We will combine function calling with AgentRun to have LLMs write and execute code on the fly to solve arbitrary tasks. You can find the full code under docs/examples/.

First, we will install the needed packages. We are using Mixtral via Groq here to keep things fast and with minimal dependencies, but AgentRun works with any LLM out of the box. All that's required is for the LLM to return a code snippet.

FYI: OpenAI assistant tool code_interpreter can execute code. AgentRun is a transparent, open-source version that can work with any LLM.

!pip install groq 
!pip install requests

Next, we will set up a function that executes the code and returns the output. We are using the API here, so make sure it is running before trying this.

Here are the steps to run the API:

git clone https://github.com/Jonathan-Adly/agentrun
cd agentrun/agentrun-api
cp .env.example .env.dev
docker-compose up -d --build

import requests


def execute_python_code(code: str) -> str:
    # Send the snippet to the local AgentRun API and return its output
    response = requests.post("http://localhost:8000/v1/run/", json={"code": code})
    output = response.json()["output"]
    return output
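
The `execute_python_code` helper above assumes the API is reachable and always returns JSON with an `output` key. As a hedged sketch of a more defensive variant, the version below adds a timeout and turns connection or parsing failures into a readable message the LLM can see. It swaps `requests` for the standard library's `urllib` so it has no extra dependencies; the names `build_run_request` and `execute_python_code_checked`, and the 30-second default, are assumptions made for this example and not part of AgentRun.

```python
import json
import urllib.request

# Endpoint taken from the README's examples.
API_URL = "http://localhost:8000/v1/run/"


def build_run_request(code: str, url: str = API_URL) -> urllib.request.Request:
    """Build the POST request carrying the code snippet as JSON."""
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute_python_code_checked(code: str, timeout: float = 30.0) -> str:
    """Hypothetical wrapper: return the API output, or an error message on failure."""
    try:
        with urllib.request.urlopen(build_run_request(code), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))["output"]
    except (OSError, ValueError, KeyError) as exc:
        # OSError covers URLError, refused connections, and timeouts;
        # ValueError covers malformed JSON; KeyError covers a missing "output" key.
        return f"Code execution failed: {exc}"
```

Returning the failure as a string, rather than raising, keeps the tool-calling loop alive so the model can report the problem or retry.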

Next, we will set up our LLM function-calling skeleton code. We need:

  1. An LLM client such as Groq, OpenAI, or Anthropic (alternatively, you can use litellm as a wrapper)
  2. The model you will use
  3. Our code execution tool, which encourages the LLM to send us Python code to execute reliably

from groq import Groq
import json

client = Groq(api_key="Your API Key")

MODEL = 'mixtral-8x7b-32768'

tools = [
    {
        "type": "function",
        "function": {
            "name": "execute_python_code",
            "description": "Sends a python code snippet to the code execution environment and returns the output. The code execution environment can automatically import any library or package by importing.",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "The code snippet to execute. Must be a valid python code. Must use print() to output the result.",
                    },
                },
                "required": ["code"],
            },
        },
    },
]
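
When the model responds with a tool call, the tool name and a JSON string of arguments have to be mapped back to a Python function. The sketch below shows one way to do that with a small registry keyed by the `name` declared in `tools`. The `dispatch_tool_call` helper and the registry are hypothetical, and the stand-in lambda replaces the real `execute_python_code` so the example runs without the API.

```python
import json
from typing import Callable, Dict


def dispatch_tool_call(
    name: str, arguments: str, registry: Dict[str, Callable[..., str]]
) -> str:
    """Look up a tool by name and call it with its JSON-encoded arguments."""
    if name not in registry:
        return f"Unknown tool: {name}"
    try:
        kwargs = json.loads(arguments)
    except json.JSONDecodeError as exc:
        return f"Invalid tool arguments: {exc}"
    return registry[name](**kwargs)


# Stand-in for the real execute_python_code, so this runs without the API.
registry = {"execute_python_code": lambda code: f"ran: {code}"}

print(dispatch_tool_call("execute_python_code", '{"code": "print(1)"}', registry))
# ran: print(1)
```

A registry keeps the dispatch logic unchanged if more tools are added to the `tools` list later.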

Next, we will set up a function to call our LLM of choice.

def chat_completion_request(messages, tools=None, tool_choice=None, model=MODEL):
    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=tools,
            tool_choice=tool_choice,
        )
        return response
    except Exception as e:
        print("Unable to generate ChatCompletion response")
        print(f"Exception: {e}")
        return e

Finally, we will set up a function that takes the user query and returns an answer, using AgentRun to execute code when the LLM determines that code execution is necessary to answer the question.

def get_answer(query):
    messages = []
    messages.append(
        {
            "role": "system",
            "content": """Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous.\n 
            Use the execute_python_code tool to run code if a question is better solved with code. You can use any package in the code snippet by simply importing. Like `import requests` would work fine.\n
            """,
        }
    )
    messages.append({"role": "user", "content": query})

    chat_response = chat_completion_request(messages, tools=tools)

    message = chat_response.choices[0].message
    # tool call versus content
    if message.tool_calls:
        tool_call = message.tool_calls[0]
        arg = json.loads(tool_call.function.arguments)["code"]
        print(f"Executing code: {arg}")
        answer = execute_python_code(arg)
        # Optional: call an LLM again to turn
