# AIOpsLab

A holistic framework to enable the design, development, and evaluation of autonomous AIOps agents.
🤖Overview | 🚀Quick Start | 📦Installation | ⚙️Usage | 📂Project Structure | 📄How to Cite
## 🤖 Overview
AIOpsLab is a holistic framework for the design, development, and evaluation of autonomous AIOps agents, and it additionally serves as a foundation for building reproducible, standardized, interoperable, and scalable benchmarks. AIOpsLab can deploy microservice cloud environments, inject faults, generate workloads, and export telemetry data, while orchestrating these components and providing interfaces for interacting with and evaluating agents.
Moreover, AIOpsLab provides a built-in benchmark suite with a set of problems to evaluate AIOps agents in an interactive environment. This suite can be easily extended to meet user-specific needs. See the problem list here.
## 📦 Installation

### Requirements
- Python >= 3.11
- Helm
- Poetry (recommended) or pip
- Additional requirements depend on the deployment option you select (see the Quick Start section below)
### Step 1: Install Python 3.11

```bash
sudo apt update
sudo apt install python3.11 python3.11-venv python3.11-dev -y
```
### Step 2: Install Poetry (Official Installer)

```bash
# Use the official installer (NOT apt - the apt version is outdated)
curl -sSL https://install.python-poetry.org | python3.11 -
export PATH="$HOME/.local/bin:$PATH"

# Add to your shell profile for persistence
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
```
**Warning:** Do NOT use `sudo apt install python3-poetry`: it installs an outdated version that may not work with the lock file.
### Step 3: Clone and Install

```bash
git clone --recurse-submodules <CLONE_PATH_TO_THE_REPO>
cd AIOpsLab
poetry env use python3.11
poetry install
eval $(poetry env activate)
```
**Troubleshooting:** If you get a "lock file not compatible" error, run `poetry lock` first, then `poetry install`.
Alternative installation with pip:

```bash
pip install -e .
```
## 🚀 Quick Start
<!-- TODO: Add instructions for both local cluster and remote cluster -->
Choose either a) or b) to set up your cluster and then proceed to the next steps.
### a) Local simulated cluster

AIOpsLab can be run on a local simulated cluster using kind on your local machine. Please look at this README for a list of prerequisites.

```bash
# For x86 machines
kind create cluster --config kind/kind-config-x86.yaml

# For ARM machines
kind create cluster --config kind/kind-config-arm.yaml
```
If you run into issues, consider building a Docker image for your machine by following this README; please also open an issue.
**[Tips]**

If you are running AIOpsLab behind a proxy, be mindful that the HTTP proxy is exported as `172.17.0.1`: when creating the kind cluster, all nodes in the cluster inherit the proxy settings from the host environment and the Docker container. The `172.17.0.1` address is used to communicate with the host machine. For more details, refer to the official guide: Configure Kind to Use a Proxy.

Additionally, Docker does not support SOCKS5 proxies directly. If you proxy via the SOCKS5 protocol, you may need Privoxy to forward SOCKS5 to HTTP.

If you're running vLLM and the LLM agent locally, Privoxy will by default proxy localhost, which causes errors. To avoid this issue, set the following environment variable:

```bash
export no_proxy=localhost
```
After finishing cluster creation, proceed to the next "Update config.yml" step.
### b) Remote cluster (Manual setup with Ansible)

AIOpsLab supports any remote Kubernetes cluster that your kubectl context points to, whether it's a cluster from a cloud provider or one you build yourself. We provide Ansible playbooks to set up clusters on providers like CloudLab as well as on our own machines. Follow this README to set up your own cluster, and then proceed to the next "Update config.yml" step.
### c) Azure VMs with Terraform + Ansible (Recommended for cloud)

A single command provisions VMs, sets up Kubernetes, and configures AIOpsLab:

```bash
# Mode B (AIOpsLab on laptop, remote kubectl):
python3 scripts/terraform/deploy.py --apply --resource-group <your-rg> --workers 2 --mode B

# Mode A (AIOpsLab on controller VM, full fault injection support):
python3 scripts/terraform/deploy.py --apply --resource-group <your-rg> --workers 2 --mode A
```

See the Terraform README for all options (`--allowed-ips`, `--dev`, `--setup-only`, etc.).

**Note:** Mode B is convenient for development, but some fault injectors (e.g., `VirtualizationFaultInjector`) require Docker on the local machine. Use Mode A for full functionality.
### Update config.yml

```bash
cd aiopslab
cp config.yml.example config.yml
```

Update your config.yml so that `k8s_host` is the hostname of the control plane node of your cluster, and `k8s_user` is your username on that node. If you are using a kind cluster, `k8s_host` should be `kind`. If you're running AIOpsLab on the cluster itself, `k8s_host` should be `localhost`.
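For a kind cluster, the result might look like the following sketch (field names taken from the description above, values illustrative; see config.yml.example for the authoritative set of options):

```yaml
# Hypothetical minimal config.yml for a local kind cluster
k8s_host: kind    # or "localhost" when running AIOpsLab on the cluster itself
k8s_user: alice   # your username on the control plane node
```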
### Running agents locally

**Human as the agent:**

```bash
python3 cli.py
```

```
(aiopslab) $ start misconfig_app_hotel_res-detection-1  # or choose any problem you want to solve
# ... wait for the setup ...
(aiopslab) $ submit("Yes")  # submit solution
```
**Run GPT-4 baseline agent:**

```bash
# Create a .env file in the project root (if it does not exist)
echo "OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>" > .env

# Add more API keys as needed:
# echo "QWEN_API_KEY=<YOUR_QWEN_API_KEY>" >> .env
# echo "DEEPSEEK_API_KEY=<YOUR_DEEPSEEK_API_KEY>" >> .env

python3 clients/gpt.py  # you can also change the problem to solve in the main() function
```
Our repository comes with a variety of pre-integrated agents, including agents that enable secure authentication with Azure OpenAI endpoints using identity-based access. Please check out Clients for a comprehensive list of all implemented clients.
The clients will automatically load API keys from your .env file.
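The key-loading step can be pictured with a minimal stdlib-only sketch (the repo's clients may well use a dedicated library such as python-dotenv instead; `load_env` here is a hypothetical illustration of the idea):

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Read KEY=VALUE lines from a .env file into os.environ."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        # setdefault: values already present in the environment win
        os.environ.setdefault(key.strip(), value.strip())
```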
You can conveniently check the running status of the cluster using k9s or other cluster monitoring tools.
To browse your logged session_id values in the W&B app as a table:

- Make sure you have W&B installed and configured.
- Set the `USE_WANDB` environment variable by adding it to your .env file: `echo "USE_WANDB=true" >> .env`
- In the W&B web UI, open any run and click Tables → Add Query Panel.
- In the key field, type `runs.summary` and click `Run`; the results will be displayed in a table format.
AIOpsLab can be used in the following ways:

### Running agents remotely

You can run AIOpsLab on a remote machine with larger computational resources. This section guides you through setting up and using AIOpsLab remotely.
1. On the remote machine, start the AIOpsLab service:

   ```bash
   SERVICE_HOST=<YOUR_HOST> SERVICE_PORT=<YOUR_PORT> SERVICE_WORKERS=<YOUR_WORKERS> python service.py
   ```

2. Test the connection from your local machine. You can test the connection to the remote AIOpsLab service using `curl`:

   ```bash
   # Check if the service is running
   curl http://<YOUR_HOST>:<YOUR_PORT>/health

   # List available problems
   curl http://<YOUR_HOST>:<YOUR_PORT>/problems

   # List available agents
   curl http://<YOUR_HOST>:<YOUR_PORT>/agents
   ```

3. Run vLLM on the remote machine (if using the vLLM agent). If you're using the vLLM agent, make sure to launch the vLLM server on the remote machine:

   ```bash
   # On the remote machine
   chmod +x ./clients/launch_vllm.sh
   ./clients/launch_vllm.sh
   ```

   You can customize the model by editing `launch_vllm.sh` before running it.

4. Run the agent. From your local machine, you can run the agent using the following command:

   ```bash
   curl -X POST http://<YOUR_HOST>:<YOUR_PORT>/simulate \
     -H "Content-Type: application/json" \
     -d '{
       "problem_id": "misconfig_app_hotel_res-mitigation-1",
       "agent_name": "vllm",
       "max_steps": 10,
       "temperature": 0.7,
       "top_p": 0.9
     }'
   ```
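When scripting many runs, the same request body can be built programmatically. This is a sketch only: `build_simulate_request` is a hypothetical helper, and the field names simply mirror the curl example above.

```python
import json

def build_simulate_request(problem_id: str, agent_name: str = "vllm",
                           max_steps: int = 10, temperature: float = 0.7,
                           top_p: float = 0.9) -> str:
    """Serialize the JSON body for a POST to the /simulate endpoint."""
    return json.dumps({
        "problem_id": problem_id,
        "agent_name": agent_name,
        "max_steps": max_steps,
        "temperature": temperature,
        "top_p": top_p,
    })
```

Send the resulting string as the POST body to `http://<YOUR_HOST>:<YOUR_PORT>/simulate` with a `Content-Type: application/json` header.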
### How to onboard your agent to AIOpsLab?

AIOpsLab makes it extremely easy to develop and evaluate your agents. You can onboard your agent to AIOpsLab in 3 simple steps:

1. **Create your agent.** You are free to develop agents using any framework of your choice. The only requirements are:

   - Wrap your agent in a Python class, say `Agent`.
   - Add an async method `get_action` to the class:

     ```python
     # given the current state, returns the agent's action
     async def get_action(self, state: str) -> str:
         # <your agent's logic here>
     ```

2. **Register your agent with AIOpsLab.** You can now register the agent with AIOpsLab's orchestrator. The orchestrator will manage the interaction between your agent and the environment.
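Putting the two requirements together, a minimal toy agent might look like the sketch below. The fixed-reply policy is a placeholder for your own logic, and `get_instructions()` is just an illustrative action string, not a documented AIOpsLab API.

```python
import asyncio

class Agent:
    """Toy agent satisfying the interface: an async get_action(state) -> str."""

    def __init__(self) -> None:
        self.history: list[str] = []  # keep observed states for context

    async def get_action(self, state: str) -> str:
        # Given the current state, return the agent's next action.
        self.history.append(state)
        # Placeholder policy; a real agent would query an LLM here.
        return "get_instructions()"

if __name__ == "__main__":
    agent = Agent()
    print(asyncio.run(agent.get_action("service hotel-res is unhealthy")))
```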