# CoSo

Official code for the paper "Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning".
## Installation
### (Option 1) Using Docker (Recommended)
Fast and isolated setup using the provided Dockerfile:

```bash
docker build -t coso .
docker run --name coso --gpus all --device /dev/kvm --group-add kvm --shm-size 2gb -it -v <repo_path>/CoSo:<repo_path>/CoSo coso
```
Installation is complete! Skip to Configuration.
### (Option 2) Using Conda (4 Steps)
#### 1. Create the Environment and Install Dependencies

```bash
conda create -n coso python=3.10
conda activate coso
git clone https://github.com/langfengQ/CoSo.git
cd CoSo
pip install -e .
```
#### 2. Environment Setup
The environment setup follows the same procedure as DigiRL. Please refer to the environment README. Before moving on, you should be able to view this screenshot by running this script.
#### 3. Download Model Checkpoints

Download the model:

```bash
wget https://huggingface.co/cooelf/Auto-UI/resolve/main/Auto-UI-Base.zip
unzip Auto-UI-Base.zip -d <path_to_autoui_dir>
```
The folder should contain:

```
Auto-UI-Base/
├── config.json
├── pytorch_model.bin
├── tokenizer.json
...
```
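As a quick sanity check (a minimal sketch, not part of the repository; it only assumes the folder layout shown above), you can verify that the required files were unzipped:

```python
import os

# Files expected in the unzipped Auto-UI-Base directory, per the listing above.
REQUIRED_FILES = ["config.json", "pytorch_model.bin", "tokenizer.json"]

def missing_checkpoint_files(ckpt_dir):
    """Return the required files that are absent from ckpt_dir."""
    return [f for f in REQUIRED_FILES
            if not os.path.isfile(os.path.join(ckpt_dir, f))]

# Point this at your <path_to_autoui_dir>/Auto-UI-Base directory.
missing = missing_checkpoint_files(os.path.expanduser("~/Auto-UI-Base"))
print("missing:", missing)  # an empty list means the checkpoint is complete
```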
#### 4. Pre-Collected Trajectories
Download from Google Drive:
| File Name | #Trajectories | Horizon | Size |
| --- | --- | --- | --- |
| general-off2on-sft-trajectories.pt | 608 | 10 | 95.5MB |
| general-offline-sft-trajectories.pt | 1552 | 10 | 243.9MB |
| webshop-off2on-sft-trajectories.pt | 528 | 20 | 115.2MB |
| webshop-offline-sft-trajectories.pt | 1296 | 20 | 297.5MB |
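For a rough sense of scale, the counts above bound the number of environment steps per file (each trajectory contains at most `horizon` steps). A small illustrative calculation:

```python
# (#trajectories, horizon) for each file, taken from the table above.
DATASETS = {
    "general-off2on-sft-trajectories.pt": (608, 10),
    "general-offline-sft-trajectories.pt": (1552, 10),
    "webshop-off2on-sft-trajectories.pt": (528, 20),
    "webshop-offline-sft-trajectories.pt": (1296, 20),
}

# Upper bound on environment steps: trajectories * horizon.
max_steps = {name: n * h for name, (n, h) in DATASETS.items()}
print(max_steps["general-off2on-sft-trajectories.pt"])  # 6080
```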
Store the files in `~/data/`:

```bash
mkdir -p ~/data
cp *.pt ~/data/
```
## Configuration
- Edit the main configuration file `scripts/config/main/default.yaml`:

  - Fill in API keys and project info:

    ```yaml
    huggingface_token: ''
    wandb_key: ''
    gemini_key: ''
    entity_name: ''
    project_name: ''
    ```

  - Define the asset path:

    ```yaml
    assets_path: '<repo_path>/CoSo/digirl/environment/android/assets/task_set'
    ```

  - (Only if using conda) Replace `/root/` with your own paths:

    ```yaml
    policy_lm: '/root/Auto-UI-Base'
    cache_dir: '/root/.cache'
    ```
- Edit the sub-configuration file:

  - Choose the appropriate sub-configuration depending on the training mode: `digirl_off2on.yaml`, `digirl_offline.yaml`, `digirl_online.yaml`, or `eval_only.yaml`.

  - (Only if using conda) Replace `/root/` with your own paths:

    ```yaml
    offline_data_path: "/root/data/webshop-off2on-sft-trajectories.pt"
    ```
## Run Examples
### 1. Run CoSo
Make sure `use_entropy` and `use_causal` are set to `True` in `default.yaml`:

```yaml
use_entropy: True
use_causal: True
```
Then run CoSo via:

```bash
cd scripts
python run.py --config-path config/main --config-name digirl_off2on
```
### 2. Run Naive Entropy

Modify `default.yaml`:

```yaml
use_entropy: True
use_causal: False
```
### 3. Run DigiRL Baseline

Modify `default.yaml`:

```yaml
use_entropy: False
use_causal: False
```
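The three run modes above differ only in these two flags. A hypothetical helper (for illustration only; not part of the codebase) summarizes the mapping:

```python
def method_name(use_entropy: bool, use_causal: bool) -> str:
    """Map the two default.yaml flags to the method they select."""
    if use_entropy and use_causal:
        return "CoSo"            # counterfactual soft RL (full method)
    if use_entropy:
        return "Naive Entropy"   # entropy regularization without the causal term
    return "DigiRL baseline"     # entropy disabled

print(method_name(True, True))  # CoSo
```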
<!-- ### Main Results Reproduction
To reproduce the results in Table 1 of our paper, first download the corresponding checkpoints as described above. As the results in the training set are obtained by randomly sampling tasks, we recommend reproducing the test results (which are obtained by sequentially sampling the first 96 trajectories).
To do this, modify the [`eval_only.yaml`](https://github.com/DigiRL-agent/digirl/blob/master/scripts/config/main/default.yaml) config file and its parent ['default.yaml'](https://github.com/DigiRL-agent/digirl/blob/master/scripts/config/main/default.yaml) config file to experiment settings. For instance, you can modify these configs for reproduction:
1. `default.yaml`
1. Set `task_split: "test"` and `eval_sample_mode: "sequential"`
2. Don't forget to increase `max_steps` to `20` if `task_set` is set to `webshop` (as the webshop tasks usually need more steps than the general tasks to complete).
2. `eval_only.yaml`
1. Make sure `rollout_size` (in `default.yaml`) * `eval_iterations` (in `eval_only.yaml`) = 96. For example, `rollout_size (16) * eval_iterations (6) = 96`. -->
## Citation

If you find CoSo useful in your research or applications, we would appreciate it if you could cite our work:
```bibtex
@article{feng2025towards,
  title={Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning},
  author={Feng, Lang and Tan, Weihao and Lyu, Zhiyi and Zheng, Longtao and Xu, Haiyang and Yan, Ming and Huang, Fei and An, Bo},
  journal={arXiv preprint arXiv:2505.03792},
  year={2025}
}
```
