# StyleStudio

[CVPR 2025] Official implementation of **StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements**
## News and Updates

- [2024.12.12] 🔥🔥 We release the code.
- [2024.12.19] 📝📝 We have summarized recent developments in style transfer, and we will continue to update the list.
## Abstract
Text-driven style transfer aims to merge the style of a reference image with content described by a text prompt. Recent advancements in text-to-image models have improved the nuance of style transformations, yet significant challenges remain, particularly overfitting to reference styles, limited stylistic control, and misalignment with textual content. In this paper, we propose three complementary strategies to address these issues. First, we introduce a cross-modal Adaptive Instance Normalization (AdaIN) mechanism for better integration of style and text features, enhancing alignment. Second, we develop a Style-based Classifier-Free Guidance (SCFG) approach that enables selective control over stylistic elements, reducing irrelevant influences. Finally, we incorporate a teacher model during the early generation stages to stabilize spatial layouts and mitigate artifacts. Our extensive evaluations demonstrate significant improvements in style transfer quality and alignment with textual prompts. Furthermore, our approach can be integrated into existing style transfer frameworks without fine-tuning.
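As background for the cross-modal AdaIN mechanism mentioned above, the sketch below shows the standard AdaIN operation it builds on: re-centering and re-scaling content features so they carry the style features' statistics. The `adain` helper and the plain-list features are illustrative only, not the repository's implementation (the paper's cross-modal variant operates on text and style feature maps inside the diffusion model).

```python
from statistics import mean, pstdev

def adain(content, style, eps=1e-5):
    """Standard AdaIN: shift/scale content features to match the
    style features' mean and standard deviation.
    Illustrative stand-in for the paper's cross-modal variant."""
    mu_c, sigma_c = mean(content), pstdev(content)
    mu_s, sigma_s = mean(style), pstdev(style)
    return [sigma_s * (x - mu_c) / (sigma_c + eps) + mu_s for x in content]

# After AdaIN, the output carries the style statistics (mean 20 here).
stylized = adain([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
```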
## Getting Started
### 1. Clone the code and prepare the environment

```shell
git clone https://github.com/Westlake-AGI-Lab/StyleStudio
cd StyleStudio

# create the environment using conda
conda create -n StyleStudio python=3.10
conda activate StyleStudio

# install dependencies with pip
# for Linux and Windows users
pip install -r requirements.txt
```
### 2. Run StyleStudio

Please note: our approach is fine-tuning free and can be combined with different methods.
#### Parameter Explanation

| Parameter | Description |
| --- | --- |
| `adainIP` | use the cross-modal AdaIN |
| `fuSAttn` | hijack the Self-Attention Map in the Teacher Model |
| `fuAttn` | hijack the Cross-Attention Map in the Teacher Model |
| `end_fusion` | define when the Teacher Model stops participating |
| `prompt` | text prompt for generating the image |
| `style_path` | path to the style image or folder |
| `neg_style_path` | path to the negative style image |
### Integration with CSGO
Follow CSGO to download pre-trained checkpoints.
Usage example: as the value of `end_fusion` increases, the style gradually diminishes. If `num_inference_steps` is set to 50, we recommend setting `end_fusion` between 10 and 20. In general, `end_fusion` should fall within the first 1/5 to 1/3 of the total `num_inference_steps`.

If layout stability is unsatisfactory, consider increasing the duration of the Teacher Model's involvement.
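The schedule above can be sketched in a few lines: the Teacher Model participates only while the denoising step is below `end_fusion`, and the recommended band is the first 1/5 to 1/3 of the total steps. Both helper names below are hypothetical, chosen just to illustrate the rule.

```python
def teacher_active(step, end_fusion):
    """The Teacher Model participates only in the early denoising steps."""
    return step < end_fusion

def recommended_end_fusion(num_inference_steps):
    """Lower/upper bound for end_fusion: first 1/5 to 1/3 of the steps."""
    return (num_inference_steps // 5, num_inference_steps // 3)

lo, hi = recommended_end_fusion(50)  # band for the 50-step example above
```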
```shell
# Generate a single stylized image
# with a specific text prompt and style image path
python infer_StyleStudio.py \
    --prompt "A red apple" \
    --style_path "assets/style1.jpg" \
    --adainIP \              # Enable Cross-Modal AdaIN
    --fuSAttn \              # Enable Teacher Model with Self-Attention Map
    --end_fusion 20 \        # Define when the Teacher Model stops participating
    --num_inference_steps 50
```
```shell
# Check layout stability across different style images
# with the same text prompt and a set of style images
python infer_StyleStudio_layout_stability.py \
    --prompt "A red apple" \
    --style_path "path/to/style_images_folder" \
    --adainIP \              # Enable Cross-Modal AdaIN
    --fuSAttn \              # Enable Teacher Model with Self-Attention Map
    --end_fusion 20 \        # Define when the Teacher Model stops participating
    --num_inference_steps 50
```
#### Note

- As shown in Figure 15 of the paper, employing a Cross-Attention Map in the Teacher Model does not ensure layout stability. We nevertheless provide the `fuAttn` interface and encourage everyone to experiment with it.
- To ensure layout stability and consistency for the same prompt under different style images, keep the initial noise $z_0$ consistent across experiments. For details on this aspect, refer to `infer_StyleStudio_layout_stability.py`.
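The $z_0$ consistency point above boils down to seeding: draw the initial noise from a fixed-seed generator and reuse it for every style image, so layout differences come from the style alone. The sketch below uses the stdlib `random` module as a stand-in for the actual latent sampler; the `initial_noise` helper is illustrative, not the repo's code.

```python
import random

def initial_noise(seed, n=8):
    """Draw an initial noise vector z_0 from a seeded RNG
    (stand-in for sampling the diffusion latent with a fixed generator)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# Reusing the same seed yields the same z_0 for every style image,
# so any layout change is attributable to the style, not the noise.
z0_style_a = initial_noise(seed=42)
z0_style_b = initial_noise(seed=42)
```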
This is an example of using Style-based Classifier-Free Guidance (SCFG).

```shell
python infer_StyleStudio.py \
    --prompt "A red apple" \
    --style_path "assets/style2.jpg" \
    --neg_style_path "assets/neg_style2.jpg"
```
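To give intuition for what the negative style image does, the sketch below shows one plausible CFG-style combination: extrapolating the noise prediction away from the negative style's prediction toward the reference style's. This is an illustrative form by analogy with standard classifier-free guidance, not the paper's exact SCFG formulation; `scfg` and the scalar `scale` are hypothetical names.

```python
def scfg(eps_style, eps_neg_style, scale):
    """CFG-style extrapolation: push the prediction away from the
    negative-style branch toward the reference-style branch.
    Illustrative only; see the paper for the actual SCFG definition."""
    return [n + scale * (s - n) for s, n in zip(eps_style, eps_neg_style)]

guided = scfg([1.0, 2.0], [0.5, 1.0], scale=2.0)  # → [1.5, 3.0]
```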
Some recommendations for generating negative style images:

- You can use ControlNet Canny for generation.
- To make the generated images more realistic, you can use weights from Civitai or Hugging Face that are better suited to realistic image generation. We use RealVisXL_V4.0.
To generate negative style images, we provide a reference implementation in `example_create_neg_style.py`.
### Integration with InstantStyle
Follow InstantStyle to download pre-trained checkpoints.
```shell
python infer_InstantStyle.py \
    --prompt "A red apple" \
    --style_path "assets/style1.jpg" \
    --adainIP \              # Enable Cross-Modal AdaIN
    --fuSAttn \              # Enable Teacher Model with Self-Attention Map
    --end_fusion 20 \        # Define when the Teacher Model stops participating
    --num_inference_steps 50
```
### Integration with StyleCrafter
Follow StyleCrafter to download pre-trained checkpoints.
We encourage you to integrate the Teacher Model with StyleCrafter. This combination, as shown in our experiments, not only helps maintain layout stability but also effectively reduces content leakage.
```shell
cd stylecrafter_sdxl

python stylecrafter_teacherModel.py \
    --config config/infer/style_crafter_sdxl.yaml \
    --style_path "../assets/style1.jpg" \
    --prompt "A red apple" \
    --scale 0.5 \
    --num_samples 2 \
    --end_fusion 10          # Define when the Teacher Model stops participating
```
### 3. Demo

To run a local demo of the project, run the following:

```shell
python gradio/app.py
```
## Related Links
- Style Transfer with Diffusion Models: A paper collection of recent style transfer methods with diffusion models.
- CSGO: Content-Style Composition in Text-to-Image Generation
- InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
- StyleCrafter-SDXL
- IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
## BibTeX

If you find our repo helpful, please consider leaving a star or citing our paper :)

```bibtex
@inproceedings{lei2025stylestudio,
  title={StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements},
  author={Lei, Mingkun and Song, Xue and Zhu, Beier and Wang, Hao and Zhang, Chi},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={23443--23452},
  year={2025}
}
```
## 📭 Contact
If you have any comments or questions, feel free to contact Mingkun Lei.