GenMAC

[AAAI 2026] GenMAC for Compositional Text-to-Video Generation

Generate Convert Improve

Install / Use

/learn @Karine-Huang/GenMAC

About this skill

Quality Score

0/100

README

<img src="assets/genmac_logo.png" alt="Icon" width="30"/> GENMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration

Method Overview

This repository provides the inference code for GenMAC, enabling compositional text-to-video generation. Follow the steps below to set up and generate your videos.

🔨 Installation

Step 1: Install Environment

Set up the required environment using Anaconda:

conda create -n genmac python=3.9 -y
conda activate genmac
pip install -r requirements.txt

Step 2: Download Required Files

Pretrained Model
- Download the pretrained model from Hugging Face. Place the model in the following directory (checkpoints/base_512_v2/model.ckpt):
```
mkdir -p checkpoints/base_512_v2
huggingface-cli download VideoCrafter/VideoCrafter2 model.ckpt --local-dir checkpoints/base_512_v2
```

Tokenizer

Download the tokenizer from this link. Place it in the following directory (checkpoints/tokenizer):

huggingface-cli download ali-vilab/text-to-video-ms-1.7b \
   --include "tokenizer/*" \
   --local-dir checkpoints

💡 Quick Start

Step 1: Prepare API Key

GenMAC uses GPT-4o for its multi-agent collaboration. To enable access to the model:

Add your OpenAI API key to the file:
```
utils/api_key.py
```
Ensure that your system has internet access to connect to OpenAI's servers.

Step 2: Run Inference

To generate videos, use the following command:

python scripts/run_t2v.py

Note: This process requires approximately 78GB of GPU VRAM

Additional Configurations

Specify the Prompt

Update the prompt text in the following file:

assets/prompt/prompt.txt
Adjust Iterations

Modify the maximum iteration number (MAX_ITER) in Line 22 of the script (scripts/run_t2v.py):

e.g., MAX_ITER = 5
Set Seed

Set the seed value in Line 13 of the script (scripts/run_t2v.py):

e.g., seed = 12345678

Output

Generated videos will be saved in the following directory:

results/<seed>_<timestamp>/iter_<MAX_ITER-1>/video/results_t2v_baseline_0/base_512_v2/

Example: If the following parameters are used:

MAX_ITER = 5
seed = 12345678
Script executed on 2025-07-25-22-21-24 The generated videos will be located in:

results/12345678_2025-07-25-22-21-24/iter_4/video/results_t2v_baseline_0/base_512_v2/

⭐ Acknowledgements

We would like to thank the following great open-source projects and research works: LVD, VideoCrafter2.

📚 Citation

@article{huang2024genmaccompositionaltexttovideogeneration,
      title={GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration}, 
      author={Kaiyi Huang and Yukun Huang and Xuefei Ning and Zinan Lin and Yu Wang and Xihui Liu},
      year={2024},
      eprint={2412.04440},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.04440}, 
}

Related Skills

docs-writer

98.6k

`docs-writer` skill instructions As an expert technical writer and editor for the Gemini CLI project, you produce accurate, clear, and consistent documentation. When asked to write, edit, or revie

model-usage

329.0k

Use CodexBar CLI local cost usage to summarize per-model usage for Codex or Claude, including the current (most recent) model or a full model breakdown. Trigger when asked for model-level usage/cost data from codexbar, or when you need a scriptable per-model summary from codexbar cost JSON.

arscontexta

2.8k

Claude Code plugin that generates individualized knowledge systems from conversation. You describe how you think and work, have a conversation and get a complete second brain as markdown files you own.

Assume the personality of the Persona described in any of the document available in the @~/.ai/personas directory.

Karine-Huang

View profile

View on GitHub

GitHub Stars32

CategoryContent

Updated1mo ago

Forks1

Karine-Huang/GenMAC

Languages

Python

Security Score

75/100

Audited on Feb 2, 2026

No findings