MPLSandbox

MPLSandbox is an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler and analysis tools for LLMs.

https://arxiv.org/abs/2410.23074

🔍 Introduction

We propose MPLSandbox, an out-of-the-box sandbox that provides unified compiler feedback across multiple programming languages. It also integrates traditional code analysis tools, giving LLMs comprehensive information about code from several perspectives. MPLSandbox simplifies code analysis for researchers and can be integrated into LLM training and application pipelines to improve LLM performance on a range of code-related tasks.

MPLSandbox consists of three core modules:

Multi-Programming Language Sandbox Environment

This module provides unified compiler feedback by compiling and executing the code. The code and unit test samples are sent to the sub-sandbox for the corresponding programming language, where they run in isolation to produce compiler feedback. The sandbox ensures that the program executes safely, without jeopardizing the external environment or interrupting the training process.

Code Analysis Module

This module combines multiple traditional analysis tools to produce a comprehensive analysis report. It covers static analysis (e.g., potential bug detection and code smell analysis) and dynamic analysis (e.g., fuzz testing and efficiency analysis). It can also assess inputs other than the code itself, for example by measuring how well the unit tests cover the code, which helps researchers improve the quality of those tests.

Information Integration Module

This module integrates the compilation feedback and the various analysis results and passes them to LLMs, improving the quality of generated code and the models' performance on a range of complex code-related tasks.
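To make the first module's idea concrete, the following is a minimal sketch of running a snippet against unit cases and collecting pass/fail feedback. It is not MPLSandbox's implementation: it runs the code in a local child Python process rather than an isolated Docker sub-sandbox, and the function name `run_unit_cases` and the shape of the returned records are assumptions made for illustration.

```python
import subprocess
import sys


def run_unit_cases(code: str, inputs: list[str], outputs: list[str],
                   timeout: float = 5.0) -> list[dict]:
    """Run `code` once per unit case and compare its stdout to the expected output.

    Illustration only: a child process offers no real isolation, unlike a container.
    """
    results = []
    for stdin_text, expected in zip(inputs, outputs):
        record = {"input": stdin_text, "expected": expected}
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            actual = proc.stdout.strip()
            record.update(
                actual=actual,
                stderr=proc.stderr.strip(),
                passed=proc.returncode == 0 and actual == expected.strip(),
            )
        except subprocess.TimeoutExpired:
            # Treat a timeout as a failed case instead of raising.
            record.update(actual="", stderr="timeout", passed=False)
        results.append(record)
    return results
```

Each record keeps the captured stderr, so a crash or a wrong answer can be reported back alongside the failing input.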

🛠️ Setup

Install MPLSandbox

Users can clone and install MPLSandbox with the following commands:

git clone git@github.com:Ablustrund/MPLSandbox.git
cd MPLSandbox
pip install .
# pip install -e . ## for editable mode
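After installation, a quick way to confirm that the package is visible to the current interpreter is to look it up without importing it. This is a small sketch using only the standard library; the helper name `is_installed` is hypothetical, and the module name `mplsandbox` is taken from the import statement used later in this README.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("mplsandbox installed:", is_installed("mplsandbox"))
```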

Prepare the Docker Images

First, users need to deploy the Docker images on the host machine. After extensive testing, we installed the necessary dependencies for each language in Docker containers and packaged these custom containers into the images listed below. We recommend using our open-source images directly, since this avoids most of the hassle of installing per-language dependencies.

Python: mplsandbox-python-3.9.19-v1

Java: mplsandbox-java-11.0.12-v1

JavaScript: mplsandbox-javascript-22-v1

C++: mplsandbox-cpp-11.2.0-v1

Go: mplsandbox-golang-1.17.0-v1

Ruby: mplsandbox-ruby-3.0.2-v1

TypeScript: mplsandbox-typescript-1-22-v1

Bash: mplsandbox-bash-v1

We recommend that users manually download these image files and then use the following command to import them into Docker:

docker load < <path_to_downloaded_image>
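With eight images to import, a small script can save some typing. The sketch below assumes the downloaded archives sit in a single directory and end in `.tar`; both the directory layout and the helper names are assumptions, not something the project prescribes. It uses `docker load -i <file>`, which reads the archive from a file instead of standard input.

```python
import subprocess
from pathlib import Path


def docker_load_command(archive: Path) -> list[str]:
    """Build the `docker load` command for one image archive."""
    return ["docker", "load", "-i", str(archive)]


def load_images(image_dir: str, dry_run: bool = True) -> list[list[str]]:
    """Load every *.tar archive in `image_dir`; with dry_run, only print the commands."""
    commands = [docker_load_command(p) for p in sorted(Path(image_dir).glob("*.tar"))]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if docker reports a failure.
            subprocess.run(cmd, check=True)
    return commands
```

Running with `dry_run=True` first makes it easy to check which archives would be loaded before touching the Docker daemon.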

If users wish to use custom images, we recommend modifying the DefaultImage class in /mplsandbox/const.py to define their own images.

📚 Usage

Use in the Project

Users can start mplsandbox and run it with the following lines of code:

from mplsandbox import MPLSANDBOX

data = {
    "question":"Define get_sum_of_two_numbers():\n    \"\"\"Write a function that takes two integers as input and returns their sum.\n\n    -----Input-----\n    \n    The input consists of multiple test cases. Each test case contains two integers $a$ and $b$ ($-10^9 \\le a, b \\le 10^9$).\n    \n    -----Output-----\n    \n    For each test case, print the sum of the two integers.\n    \n    -----Example-----\n    Input\n    3\n    1 2 ↵\n    -1 1 ↵\n    1000000000 1000000000\n    \n    Output\n    3\n    0\n    2000000000\n    \"\"\"",
    # Note: this sample prints a * b rather than a + b, so both unit cases below fail.
    "code": 'def get_sum_of_two_numbers():\n    a, b = map(int, input().split(" "))\n    print(a * b)\nget_sum_of_two_numbers()',
    "unit_cases": {
        "inputs": ["1 2", "3 4"],
        "outputs": ["3", "7"]
    },
    "lang": "python"
}  # or a JSON file path

executor = MPLSANDBOX(data)
result = executor.run(analysis_type="all")

The specific descriptions of all fields in the data are as follows:

| Field | Description |
|----------------|-------------|
| question | (Required) Specifies the problem description that the code is intended to solve. |
| code | (Required) Specifies the code to be executed. |
| unit_cases | (Required) Specifies the unit test cases, including inputs and expected outputs. |
| lang | (Optional) Specifies the language of the code. If the language is not known, it can be set to "AUTO" for automatic recognition. |
| libraries | (Optional) Specifies a list of dependency library names that need to be installed. |
| client | (Optional) Specifies the Docker client instance to be used. |
| image | (Optional) Specifies the Docker image used to run the code. |
| dockerfile | (Optional) Specifies the path to the Dockerfile used to build a custom Docker image. |
| keep_template | (Optional) If set to True, the template files are kept after the code is run. |
| verbose | (Optional) If set to True, verbose output is enabled to assist with debugging and diagnosing issues. |
| app | (Optional) If set to True, app mode is enabled, which facilitates deploying services on a server. |
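Before handing a record to the sandbox, it can help to check it against the required fields listed above. The following is a minimal sketch, not part of MPLSandbox: the function name `validate_data` is hypothetical, and the rule that `inputs` and `outputs` must have equal length is an assumption inferred from the paired example values.

```python
REQUIRED_FIELDS = ("question", "code", "unit_cases")


def validate_data(data: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record looks fine."""
    problems = [f"missing required field: {name}"
                for name in REQUIRED_FIELDS if name not in data]

    unit_cases = data.get("unit_cases")
    if isinstance(unit_cases, dict):
        inputs = unit_cases.get("inputs")
        outputs = unit_cases.get("outputs")
        if not isinstance(inputs, list) or not isinstance(outputs, list):
            problems.append("unit_cases needs list-valued 'inputs' and 'outputs'")
        elif len(inputs) != len(outputs):
            # Assumption: each input is paired with exactly one expected output.
            problems.append(
                f"unit_cases has {len(inputs)} inputs but {len(outputs)} outputs"
            )
    elif unit_cases is not None:
        problems.append("unit_cases should be a dict")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report everything wrong with a record in a single pass.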

Use from the Command Line

We also provide the following command-line interface to scan the data.json file and output the report to the report.txt file:

mplsandbox --data /path/to/your/data.json --report /path/to/your/report.txt
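If the data is assembled in Python, it can be written to disk and passed to the command-line interface from a script. This sketch assumes the JSON file holds the same fields as the `data` dictionary shown earlier; the helper names are hypothetical, and only the `--data` and `--report` flags come from the command above.

```python
import json
import subprocess
from pathlib import Path


def write_data_file(data: dict, path: str) -> Path:
    """Serialize the data record to a JSON file and return its path."""
    target = Path(path)
    target.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return target


def cli_command(data_path: str, report_path: str) -> list[str]:
    """Build the argument list for the mplsandbox command-line interface."""
    return ["mplsandbox", "--data", data_path, "--report", report_path]


def run_cli(data_path: str, report_path: str) -> None:
    """Invoke the CLI; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cli_command(data_path, report_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.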

Use as a Service

MPLSandbox often serves as a node for emitting code-related signals, so configuring the corresponding services is very important. We have provided a simple service demo in the scripts directory, and users can run this demo with the following command:

cd scripts
python ./app.py

Users can then access the service with curl or other methods. An example of the request format is in scripts/test_app.sh:

./test_app.sh

Providing feedback signals in RL

MPLSandbox can also provide stable compilation feedback signals for RLCF tasks. For specific implementation details, please refer to the mplsandbox_for_rl project.
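As a rough illustration of how compilation and unit-test feedback might be turned into a scalar signal, consider the sketch below. The reward values and the function name `reward_from_feedback` are hypothetical choices for this example and are not taken from the mplsandbox_for_rl project.

```python
def reward_from_feedback(compiled: bool, case_results: list[bool]) -> float:
    """Map feedback to a reward: -1.0 on compile failure, else the unit-case pass rate.

    Hypothetical shaping for illustration; real RLCF setups may differ.
    """
    if not compiled:
        return -1.0
    if not case_results:
        # No unit cases to check: treat as neutral rather than dividing by zero.
        return 0.0
    return sum(case_results) / len(case_results)


if __name__ == "__main__":
    print(reward_from_feedback(True, [True, False]))
```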

🧑‍💻 Developing

We are working to refactor and improve the open-source version of MPLSandbox so that it closely matches the functionality of the version used internally by the Meituan LLM Team. We are currently rebuilding the analysis tools for languages such as Go, JavaScript, and Ruby to achieve better code analysis and automated testing.

👀 Citation

@misc{dou2024MPLSandbox,
      title={Multi-Programming Language Sandbox for LLMs}, 
      author={Shihan Dou and Jiazheng Zhang and Jianxiang Zang and Yunbo Tao and Haoxiang Jia and Shichun Liu and Yuming Yang and Shenxi Wu and Shaoqing Zhang and Muling Wu and Changze Lv and Limao Xiong and Wenyu Zhan and Lin Zhang and Rongxiang Weng and Jingang Wang and Xunliang Cai and Yueming Wu and Ming Wen and Rui Zheng and Tao Ji and Yixin Cao and Tao Gui and Xipeng Qiu and Qi Zhang and Xuanjing Huang},
      year={2024},
      eprint={2410.23074},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2410.23074}, 
}

@article{dou2024s,
  title={What's Wrong with Your Code Generated by Large Language Models? An Extensive Study},
  author={Dou, Shihan and Jia, Haoxiang and Wu, Shenxi and Zheng, Huiyuan and Zhou, Weikang and Wu, Muling and Chai, Mingxu and Fan, Jessica and Huang, Caishuang and Tao, Yunbo and others},
  journal={arXiv preprint arXiv:2407.06153},
  year={2024}
}
