
<p float="left"> <img src="img/CAP.jpg" width="51"/> <img src="img/efl.jpg" width="52" hspace="10"/> <img src="img/imperial-college-london.png" width="200"/> <img src="img/bath.gif" width="125" hspace="10"/> </p>

Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks (MAPDN)


| :warning: Update |
|:-----------------------------------------|
| 18/12/2025: We've updated the dependencies for this environment. We now use Python 3.10. Note that PyTorch has been updated to 2.9.0 and Pandapower has been updated to 3.3.0. The CUDA version we tested is 12.8. In environment.yml, we only provide the general build of PyTorch 2.9 with no GPU support. If you'd like to use GPU-accelerated PyTorch, you need to manually install PyTorch with the appropriate version of CUDA, e.g., `pip install torch==2.9.0+cu128 --index-url https://download.pytorch.org/whl/cu128`. |
| 17/12/2025: We've uploaded the large dataset that drives our environment to the Hugging Face platform, which should be friendlier for those who want to download the dataset directly on remote servers. All data download links have been updated: (1) voltage_control_data.zip: https://huggingface.co/datasets/hsvgbkhgbv/Multi-Agent-Power-Distribution-Networks/resolve/main/voltage_control_data.zip; (2) traditional_control_data.zip: https://huggingface.co/datasets/hsvgbkhgbv/Multi-Agent-Power-Distribution-Networks/resolve/main/traditional_control_data.zip. |
| 15/03/2024: We fixed a bug in assigning the p and q of PVs to the nodes equipped with an agent. Thanks to Yang Zhang, a PhD student from the Department of Automation, Shanghai Jiao Tong University, who found this bug and assisted us in fixing it. |

This is the implementation of the paper Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks.

MAPDN consists of an environment for distributed/decentralised active voltage control on power distribution networks, together with a batch of state-of-the-art multi-agent actor-critic algorithms that can be used for training.

The environment implementation follows the multi-agent environment framework provided in PyMARL. Therefore, all baselines that are compatible with that framework can be easily applied to this environment.

<br />

Summary of the Repository

This repository includes the following components.

  • An environment of active voltage control (decentralised and distributed);

  • A training framework for MARL;

  • 10 MARL algorithms;

  • 5 voltage barrier functions;

    • Bowl, L1, L2, Courant-Beltrami, and Bump.
  • Implementations of droop control and OPF in MATLAB.

<br />

A Brief Introduction of the Task

In this section, we give a brief introduction to the task so that users can easily understand the objective of this environment.

Objective: Each agent controls a PV inverter that generates reactive power so that the voltage of every bus is kept within the safety range defined as $0.95 \ p.u. \leq v_{k} \leq 1.05 \ p.u., \ \forall k \in V$, where $V$ is the set of buses of the whole system and $p.u.$ (per unit) is the unit in which voltage is measured. Since the agents' decisions influence one another through the physics of the power network, and not every bus is equipped with a PV, the agents must cooperate to control the voltage of all buses. Moreover, each agent only observes partial information about the system. This problem is therefore naturally a Dec-POMDP.

Action: The reactive power is constrained by the capacity of the equipment, and the capacity is related to the active power of the PV. As a result, the range of reactive power varies dynamically. Mathematically, the reactive power of each PV inverter is represented as $$q_{k}^{\scriptscriptstyle PV} = a_{k} \ \sqrt{(s_{k}^{\scriptscriptstyle \max})^{2} - (p_{k}^{\scriptscriptstyle PV})^{2}},$$ where $s_{k}^{\scriptscriptstyle \max}$ is the maximum apparent power of the $k\text{th}$ node, which depends on the physical capacity of the PV inverter, and $p_{k}^{\scriptscriptstyle PV}$ is the instantaneous PV active power. The action we control is the variable $0 \leq a_{k} \leq 1$, indicating the percentage of the instantaneous reactive-power capacity that is used. For this reason, the action is continuous in this task.
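For concreteness, here is a minimal sketch of this action-to-setpoint mapping; the function and variable names are ours for illustration, not part of the environment's API:

```python
import numpy as np

def pv_reactive_power(a_k: float, s_max: float, p_pv: float) -> float:
    """Compute q_k = a_k * sqrt(s_max^2 - p_pv^2) for one PV inverter."""
    assert 0.0 <= a_k <= 1.0, "the action is a percentage of available capacity"
    # Reactive-power headroom left by the instantaneous active power
    # under the apparent-power capacity; clamped in case p_pv > s_max.
    headroom = max(s_max ** 2 - p_pv ** 2, 0.0)
    return a_k * np.sqrt(headroom)

# Example: a 1.2 MVA inverter currently generating 1.0 MW of active power.
print(pv_reactive_power(a_k=0.5, s_max=1.2, p_pv=1.0))  # ~0.332 MVar
```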

Observation: Each agent observes the information of the zone to which it belongs. For example, in Figure 1 the agent on bus 25 can observe the information in Zone 3. Each agent's observation consists of the following variables within the zone:

  • Load Active Power,
  • Load Reactive Power,
  • PV Active Power,
  • PV Reactive Power,
  • Voltage.
<figure> <br /> <img src="img/case33.png" height="240" width="720"> <figcaption> Figure 1: Illustration of the 33-bus network. Each bus is indexed by a circle with a number. Four control regions are partitioned by the shortest path from the terminal to the main branch (buses 1-6). We control the voltages of buses 2-33, whereas buses 0-1 represent the substation with constant voltage and infinite active and reactive power capacity. G represents an external generator; the small Ls represent loads; and the sun emoji marks the location where a PV is installed. </figcaption> <br /> <br /> </figure>
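To make the observation structure concrete, here is a hypothetical sketch of how a zone observation could be assembled; the field names, shapes, and ordering are ours for illustration, not the environment's actual layout:

```python
import numpy as np

# One agent's view of its zone (toy numbers for a zone with 2 loads,
# 1 PV, and 2 monitored buses).
zone_obs = {
    "load_active_power":   np.array([0.12, 0.08]),  # MW
    "load_reactive_power": np.array([0.05, 0.03]),  # MVar
    "pv_active_power":     np.array([0.20]),        # MW
    "pv_reactive_power":   np.array([0.04]),        # MVar
    "voltage":             np.array([1.01, 0.99]),  # p.u.
}

# Flatten the five components into a single observation vector.
obs = np.concatenate(list(zone_obs.values()))
```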

Reward: The reward function is shown as follows:

$$\mathit{r} = - \frac{1}{|V|} \sum_{i \in V} l_{v}(v_{i}) - \alpha \cdot l_{q}(\mathbf{q}^{\scriptscriptstyle PV}),$$

where $l_{v}(\cdot)$ is a voltage barrier function that measures whether the voltage of a bus is within the safety range; $l_{q}(\mathbf{q}^{\scriptscriptstyle PV})=\frac{1}{|\mathcal{I}|} \Vert \mathbf{q}^{\scriptscriptstyle PV} \Vert_{1}$ can be seen as a simple approximation of the power loss, where $\mathbf{q}^{\scriptscriptstyle PV}$ is the vector of the agents' reactive powers, $\mathcal{I}$ is the set of agents, and $\alpha$ is a multiplier that adjusts the balance between voltage control and the generation of reactive power. In this work, we investigate different forms of $l_{v}(\cdot)$. In short, the aim of this reward function is to control the voltage while minimising the power loss, which is correlated with the economic loss.
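As an illustration of how this reward could be computed, here is a minimal sketch assuming an L1-style barrier $l_{v}(v) = |v - 1|$ and an illustrative $\alpha$; the repository's actual barrier functions (Bowl, L1, L2, Courant-Beltrami, and Bump) are defined in the code:

```python
import numpy as np

def reward(v: np.ndarray, q_pv: np.ndarray, alpha: float = 0.1) -> float:
    """Sketch of r = -(1/|V|) * sum_i l_v(v_i) - alpha * l_q(q_pv)."""
    l_v = np.abs(v - 1.0)        # assumed L1-style voltage barrier per bus
    l_q = np.mean(np.abs(q_pv))  # (1/|I|) * ||q_pv||_1, a proxy for power loss
    return -np.mean(l_v) - alpha * l_q

# Example: 4 monitored buses, 2 agents.
print(reward(v=np.array([1.02, 0.97, 1.06, 1.00]), q_pv=np.array([0.3, -0.1])))
```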

<br />

Installation of the Dependencies

  1. Install Anaconda.
  2. After cloning or downloading this repository, ensure that the current directory is [your own parent path]/MAPDN.
  3. If you are on Linux OS (e.g. Ubuntu), please execute the following command.
    conda env create -f environment.yml
    
    If you are on Windows OS, please execute the following command. Note that you should launch the Anaconda shell with Administrator permission.
    conda env create -f environment_win.yml
    
  4. If you'd like to use GPU-accelerated PyTorch, you need to manually install PyTorch with the appropriate version of CUDA (a quick sanity check for this is sketched after this list), e.g.,
    pip install torch==2.9.0+cu128 --index-url https://download.pytorch.org/whl/cu128
    
  5. Activate the installed virtual environment using the following command.
    conda activate mapdn
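
As an optional sanity check for step 4, you can confirm that the GPU build of PyTorch is active; the version suffix in the comment assumes the cu128 wheel from the command above:

```python
import torch

print(torch.__version__)          # e.g. 2.9.0+cu128 for the CUDA build
print(torch.cuda.is_available())  # True if PyTorch can see your GPU
```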
    
<br />

Downloading the Dataset

  1. Download the data via the download links given in the Update section above.

  2. Unzip the file, and you will see the following 3 folders:

    • case33_3min_final
    • case141_3min_final
    • case322_3min_final
  3. Go to the directory [Your own parent path]/MAPDN/environments/var_voltage_control/ and create a folder called data.

  4. Move the 3 folders extracted in step 2 into the directory [Your own parent path]/MAPDN/environments/var_voltage_control/data/. A scripted version of steps 3-4 is sketched below.
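For convenience, steps 3-4 can also be scripted. The sketch below is a hypothetical helper, assuming the repository was cloned to ./MAPDN and the zip was extracted to ./voltage_control_data; adjust both paths to your setup:

```python
from pathlib import Path
import shutil

# Assumed locations; change these to match your own machine.
repo = Path("MAPDN")
extracted = Path("voltage_control_data")

# Step 3: create the data folder inside the environment.
data_dir = repo / "environments" / "var_voltage_control" / "data"
data_dir.mkdir(parents=True, exist_ok=True)

# Step 4: move the three scenario folders into it.
for name in ("case33_3min_final", "case141_3min_final", "case322_3min_final"):
    shutil.move(str(extracted / name), str(data_dir / name))
```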

<br />

Two Modes of Tasks

Background

There are 2 modes of tasks included in this environment: distributed active voltage control and decentralised active voltage control. Distributed active voltage control is the task introduced in the paper Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks, whereas decentralised active voltage control is the task that most prior works considered. The primary difference between these 2 modes is that in decentralised active voltage control the equipment in each zone is controlled by a single agent, while in distributed active voltage control each piece of equipment is controlled by its own agent (see Figure 1).

How to use?

If you would like to try distributed active voltage control, set the argument of train.py and test.py as follows.

python train.py --mode distributed
python test.py --mode distributed

If you would like to try decentralised active voltage control, set the argument of train.py and test.py as follows.

python train.py --mode decentralised
python test.py --mode decentralised
<br />

Quick Start

Training Your Model

You can train a model on a power system by executing the following command.

python train.py --alg matd3 --alias 0 --mode distributed --scenario case33_3min_final --voltage-barrier-type l1