<div align="center"> <h1>AIOpsLab</h1>

🤖Overview | 🚀Quick Start | 📦Installation | ⚙️Usage | 📂Project Structure | 📄How to Cite

</div> <h2 id="🤖overview">🤖 Overview</h2>

AIOpsLab is a holistic framework for designing, developing, and evaluating autonomous AIOps agents. It also serves as a basis for building reproducible, standardized, interoperable, and scalable benchmarks. AIOpsLab can deploy microservice cloud environments, inject faults, generate workloads, and export telemetry data, while orchestrating these components and providing interfaces for interacting with and evaluating agents.

Moreover, AIOpsLab provides a built-in benchmark suite with a set of problems to evaluate AIOps agents in an interactive environment. This suite can be easily extended to meet user-specific needs. See the problem list here.

<h2 id="📦installation">📦 Installation</h2>

Requirements

  • Python >= 3.11
  • Helm
  • Poetry (recommended) or pip
  • Additional requirements depend on the deployment option selected, which is explained in the next section
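The requirements above can be checked before installing. The following is a minimal sketch, not part of AIOpsLab: the function name `missing_requirements` is hypothetical, and it assumes the tools are exposed on `PATH` under the names `helm`, `poetry`, and `pip`.

```python
import shutil
import sys
from typing import Callable, Optional, Sequence


def missing_requirements(
    version: Sequence[int] = sys.version_info[:2],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return descriptions of the requirements that are not met (hypothetical helper)."""
    problems = []
    if tuple(version) < (3, 11):
        problems.append("Python >= 3.11")
    # Helm is required; Poetry is recommended, with pip as the alternative.
    if which("helm") is None:
        problems.append("helm")
    if which("poetry") is None and which("pip") is None:
        problems.append("poetry or pip")
    return problems
```

Calling `missing_requirements()` with no arguments inspects the running interpreter and the current `PATH`; an empty list means the listed requirements were found.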

Step 1: Install Python 3.11

sudo apt update
sudo apt install python3.11 python3.11-venv python3.11-dev -y

Step 2: Install Poetry (Official Installer)

# Use the official installer (NOT apt - the apt version is outdated)
curl -sSL https://install.python-poetry.org | python3.11 -
export PATH="$HOME/.local/bin:$PATH"

# Add to your shell profile for persistence
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc

Warning: Do NOT use sudo apt install python3-poetry - it installs an outdated version that may not work with the lock file.

Step 3: Clone and Install

git clone --recurse-submodules <CLONE_PATH_TO_THE_REPO>
cd AIOpsLab
poetry env use python3.11
poetry install
eval $(poetry env activate)

Troubleshooting: If you get a "lock file not compatible" error, run poetry lock first, then poetry install.

Alternative installation with pip:

pip install -e .
<h2 id="🚀quickstart">🚀 Quick Start </h2> <!-- TODO: Add instructions for both local cluster and remote cluster -->

Choose one of a), b), or c) to set up your cluster, then proceed to the next steps.

a) Local simulated cluster

AIOpsLab can be run on a local simulated cluster using kind on your local machine. Please look at this README for a list of prerequisites.

# For x86 machines
kind create cluster --config kind/kind-config-x86.yaml

# For ARM machines
kind create cluster --config kind/kind-config-arm.yaml

If you're running into issues, consider building a Docker image for your machine by following this README. Please also open an issue.
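If you script the cluster creation, the choice between the two kind configs can be derived from the machine architecture. This is a sketch under assumptions: the helper names are hypothetical, and the architecture strings are the common values returned by `platform.machine()`, which may not cover every machine.

```python
import platform

# Config paths as given in the kind commands above.
KIND_CONFIGS = {
    "x86": "kind/kind-config-x86.yaml",
    "arm": "kind/kind-config-arm.yaml",
}


def kind_config_for(machine: str) -> str:
    """Map a platform.machine() string to the matching kind config path."""
    name = machine.lower()
    if name in ("x86_64", "amd64"):
        return KIND_CONFIGS["x86"]
    if name in ("arm64", "aarch64"):
        return KIND_CONFIGS["arm"]
    raise ValueError(f"unrecognized architecture: {machine!r}")


def kind_create_command(machine: str) -> list[str]:
    """Build the argument list for the 'kind create cluster' command."""
    return ["kind", "create", "cluster", "--config", kind_config_for(machine)]
```

Passing `kind_create_command(platform.machine())` to `subprocess.run(..., check=True)` from the repository root would run the same command as the shell lines above.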

[Tips]

If you run AIOpsLab behind a proxy, be careful to export the HTTP proxy address as 172.17.0.1. When the kind cluster is created, every node inherits the proxy settings of the host environment and the Docker container.

The 172.17.0.1 address is used to communicate with the host machine. For more details, refer to the official guide: Configure Kind to Use a Proxy.

Additionally, Docker doesn't support SOCKS5 proxies directly. If your proxy uses SOCKS5, you may need Privoxy to forward SOCKS5 to HTTP.

If you're running vLLM and the LLM agent locally, Privoxy will proxy localhost by default, which causes errors. To avoid this, set the following environment variable:

export no_proxy=localhost
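When the agent is launched from a Python script instead of a shell, the same setting can be applied in-process. The sketch below is an illustration only: `with_no_proxy` and `apply_no_proxy` are hypothetical names, and it assumes the HTTP client in use honors the lowercase `no_proxy` variable, as in the export above.

```python
import os


def with_no_proxy(env: dict[str, str], hosts: tuple[str, ...] = ("localhost",)) -> dict[str, str]:
    """Return a copy of env whose no_proxy value includes every host in hosts."""
    updated = dict(env)
    current = [h.strip() for h in updated.get("no_proxy", "").split(",") if h.strip()]
    for host in hosts:
        if host not in current:
            current.append(host)
    updated["no_proxy"] = ",".join(current)
    return updated


def apply_no_proxy() -> None:
    """Apply the setting to the current process; call before creating HTTP clients."""
    os.environ.update(with_no_proxy(dict(os.environ)))
```

Existing `no_proxy` entries are preserved, so the helper can be called more than once without duplicating hosts.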

After finishing cluster creation, proceed to the next "Update config.yml" step.

b) Remote cluster (Manual setup with Ansible)

AIOpsLab supports any remote Kubernetes cluster that your kubectl context points to, whether it comes from a cloud provider or is one you built yourself. We provide Ansible playbooks to set up clusters on providers like CloudLab and on our own machines. Follow this README to set up your own cluster, and then proceed to the next "Update config.yml" step.

c) Azure VMs with Terraform + Ansible (Recommended for cloud)

A single command provisions the VMs, sets up Kubernetes, and configures AIOpsLab:

# Mode B (AIOpsLab on laptop, remote kubectl):
python3 scripts/terraform/deploy.py --apply --resource-group <your-rg> --workers 2 --mode B

# Mode A (AIOpsLab on controller VM, full fault injection support):
python3 scripts/terraform/deploy.py --apply --resource-group <your-rg> --workers 2 --mode A

See Terraform README for all options (--allowed-ips, --dev, --setup-only, etc.).

Note: Mode B is convenient for development but some fault injectors (e.g., VirtualizationFaultInjector) require Docker on the local machine. Use Mode A for full functionality.
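If the deployment is driven from another Python script, the invocation can be assembled programmatically. This sketch uses only the flags shown above (`--apply`, `--resource-group`, `--workers`, `--mode`); the wrapper function `deploy_command` is hypothetical and not part of the repository.

```python
import subprocess


def deploy_command(resource_group: str, workers: int, mode: str) -> list[str]:
    """Build the deploy.py invocation shown above for mode "A" or "B"."""
    if mode not in ("A", "B"):
        raise ValueError("mode must be 'A' or 'B'")
    return [
        "python3", "scripts/terraform/deploy.py", "--apply",
        "--resource-group", resource_group,
        "--workers", str(workers),
        "--mode", mode,
    ]


def deploy(resource_group: str, workers: int = 2, mode: str = "A") -> None:
    """Run the deployment; raises CalledProcessError if deploy.py fails."""
    subprocess.run(deploy_command(resource_group, workers, mode), check=True)
```

Keeping the argument list separate from the `subprocess.run` call makes it possible to log or inspect the command before anything is provisioned.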

Update config.yml

cd aiopslab
cp config.yml.example config.yml

Update your config.yml so that k8s_host is the host name of the control plane node of your cluster, and k8s_user is your username on that node. If you are using a kind cluster, set k8s_host to kind. If you are running AIOpsLab on the cluster itself, set k8s_host to localhost.
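The rules for `k8s_host` can be summarized in a small helper. This is a sketch for illustration: the function names and the deployment labels `"kind"`, `"on-cluster"`, and `"remote"` are hypothetical, and only the `k8s_host` and `k8s_user` keys come from the text above.

```python
def resolve_k8s_host(deployment: str, control_plane_host: str = "") -> str:
    """Pick the k8s_host value for config.yml from a (hypothetical) deployment label."""
    if deployment == "kind":
        return "kind"
    if deployment == "on-cluster":
        return "localhost"
    if deployment == "remote":
        if not control_plane_host:
            raise ValueError("a remote cluster needs the control plane host name")
        return control_plane_host
    raise ValueError(f"unknown deployment: {deployment!r}")


def config_values(deployment: str, user: str, control_plane_host: str = "") -> dict[str, str]:
    """Return the two settings discussed above as a dictionary."""
    return {
        "k8s_host": resolve_k8s_host(deployment, control_plane_host),
        "k8s_user": user,
    }
```

For example, `config_values("kind", "alice")` yields the values to place in config.yml for a local kind cluster with the username `alice`.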

Running agents locally

Human as the agent:

python3 cli.py
(aiopslab) $ start misconfig_app_hotel_res-detection-1 # or choose any problem you want to solve
# ... wait for the setup ...
(aiopslab) $ submit("Yes") # submit solution

Run GPT-4 baseline agent:

# Create a .env file in the project root (if not exists)
echo "OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>" > .env
# Add more API keys as needed:
# echo "QWEN_API_KEY=<YOUR_QWEN_API_KEY>" >> .env
# echo "DEEPSEEK_API_KEY=<YOUR_DEEPSEEK_API_KEY>" >> .env

python3 clients/gpt.py # you can also change the problem to solve in the main() function

Our repository comes with a variety of pre-integrated agents, including agents that enable secure authentication with Azure OpenAI endpoints using identity-based access. Please check out Clients for a comprehensive list of all implemented clients.

The clients will automatically load API keys from your .env file.
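How the clients load the file is not shown here. As a rough illustration of the idea, the sketch below parses `KEY=VALUE` lines with the standard library only; `parse_env` and `load_env_file` are hypothetical names, and the actual clients may rely on a dedicated library instead.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Load variables from path into os.environ without overriding existing ones."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    values = parse_env(env_path.read_text())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

Because `setdefault` is used, a key that is already exported in the shell takes precedence over the value in the file.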

You can check the status of the cluster with k9s or any other cluster monitoring tool.

To browse your logged session_id values in the W&B app as a table:

  1. Make sure you have W&B installed and configured.
  2. Set the USE_WANDB environment variable:
    # Add to your .env file
    echo "USE_WANDB=true" >> .env
    
  3. In the W&B web UI, open any run and click Tables → Add Query Panel.
  4. In the key field, type runs.summary and click Run, then you will see the results displayed in a table format.
<h2 id="⚙️usage">⚙️ Usage</h2>

AIOpsLab can be used in the following ways:

Running agents remotely

You can run AIOpsLab on a remote machine with larger computational resources. This section guides you through setting up and using AIOpsLab remotely.

  1. On the remote machine, start the AIOpsLab service:

    SERVICE_HOST=<YOUR_HOST> SERVICE_PORT=<YOUR_PORT> SERVICE_WORKERS=<YOUR_WORKERS> python service.py
    
  2. Test the connection from your local machine using curl:

    # Check if the service is running
    curl http://<YOUR_HOST>:<YOUR_PORT>/health
    
    # List available problems
    curl http://<YOUR_HOST>:<YOUR_PORT>/problems
    
    # List available agents
    curl http://<YOUR_HOST>:<YOUR_PORT>/agents
    
  3. If you are using the vLLM agent, launch the vLLM server on the remote machine:

    # On the remote machine
    chmod +x ./clients/launch_vllm.sh
    ./clients/launch_vllm.sh
    

    You can customize the model by editing launch_vllm.sh before running it.

  4. Run the agent from your local machine with the following command:

    curl -X POST http://<YOUR_HOST>:<YOUR_PORT>/simulate \
      -H "Content-Type: application/json" \
      -d '{
        "problem_id": "misconfig_app_hotel_res-mitigation-1",
        "agent_name": "vllm",
        "max_steps": 10,
        "temperature": 0.7,
        "top_p": 0.9
      }'
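The curl calls in the steps above can also be issued from Python. The following is a minimal sketch using only the standard library: the endpoints and payload fields are the ones shown above, while the helper names are hypothetical and the response is returned as raw text because its schema is not described here.

```python
import json
import urllib.request


def base_url(host: str, port: int) -> str:
    return f"http://{host}:{port}"


def get_request(host: str, port: int, path: str) -> urllib.request.Request:
    """Build a GET request for /health, /problems, or /agents."""
    return urllib.request.Request(f"{base_url(host, port)}{path}", method="GET")


def simulate_request(
    host: str,
    port: int,
    problem_id: str,
    agent_name: str,
    max_steps: int = 10,
    temperature: float = 0.7,
    top_p: float = 0.9,
) -> urllib.request.Request:
    """Build the POST /simulate request with the JSON body shown above."""
    payload = {
        "problem_id": problem_id,
        "agent_name": agent_name,
        "max_steps": max_steps,
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        f"{base_url(host, port)}/simulate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send a request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `send(get_request(host, port, "/health"))` mirrors the health check, and `send(simulate_request(host, port, "misconfig_app_hotel_res-mitigation-1", "vllm"))` mirrors the final curl command. A long-running simulation may need a larger `timeout`.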
    

How to onboard your agent to AIOpsLab?

AIOpsLab makes it extremely easy to develop and evaluate your agents. You can onboard your agent to AIOpsLab in 3 simple steps:

  1. Create your agent: You are free to develop agents using any framework of your choice. The only requirements are:

    • Wrap your agent in a Python class, say Agent

    • Add an async method get_action to the class:

      # Given the current state, return the agent's action
      async def get_action(self, state: str) -> str:
          # <your agent's logic here>
          ...
      
  2. Register your agent with AIOpsLab: You can now register the agent with AIOpsLab's orchestrator. The orchestrator will manage the interac
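Step 1 above can be sketched as a complete class. This is a placeholder, not a useful agent: it ignores the state and always returns the `submit("Yes")` call from the human-agent example. The exact action format the orchestrator expects from an agent is an assumption here, and the `history` attribute and `demo` function are hypothetical.

```python
import asyncio


class Agent:
    """Hypothetical minimal agent: records each state and always submits "Yes"."""

    def __init__(self) -> None:
        self.history: list[str] = []

    async def get_action(self, state: str) -> str:
        # A real agent would call an LLM or a planner here.
        self.history.append(state)
        return 'submit("Yes")'


async def demo() -> str:
    """Call get_action once, the way an orchestrator loop might."""
    agent = Agent()
    return await agent.get_action("example state from the environment")
```

Running `asyncio.run(demo())` exercises the method outside AIOpsLab, which is a quick way to check the class before registering it.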
