<h1 align="center"> DreamEngine</h1> <p align="center"> <a href="https://arxiv.org/abs/2502.20172"> <img alt="Static Badge" src="https://img.shields.io/badge/Paper-arXiv-red"> </a> <a href="https://huggingface.co/leonardPKU/DreamEngine-ObjectFusion"> <img alt="Static Badge" src="https://img.shields.io/badge/Model-HuggingFace-yellow"> </a> </p> <img width="898" alt="截屏2025-02-23 22 38 04" src="https://github.com/user-attachments/assets/cdeb00d3-2c6a-459a-b884-7946a51075b8" />

DreamEngine is a unified framework that integrates multimodal encoders like QwenVL with diffusion models through a two-stage training approach, enabling advanced text-image interleaved control and achieving state-of-the-art performance in generating images with complex, concept-merged inputs.

https://github.com/user-attachments/assets/f070c0e8-4ea2-4294-b878-63f564da1b79

Updates:

2025-03-03: Release checkpoint and a demo for text-guided object fusion.

Run the Demo locally

bash setup.sh

# setup the paths in demo.py
python src/scripts/eval/demo.py

Model Structure

Training

Demos

Citation

If you feel the work helpful, please kindly cite

@misc{chen2025multimodalrepresentationalignmentimage,
      title={Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think}, 
      author={Liang Chen and Shuai Bai and Wenhao Chai and Weichu Xie and Haozhe Zhao and Leon Vinci and Junyang Lin and Baobao Chang},
      year={2025},
      eprint={2502.20172},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.20172}, 
}

DreamEngine

Install / Use

README

Run the Demo locally

Model Structure

Training

Demos

Citation