[NeurIPS' 2025] JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

</div> <div align="center"> <p> <a href="https://lyl1015.github.io/">Yunlong Lin</a><sup>1*</sup>, <a href="https://github.com/iendi">Zixu Lin</a><sup>1*</sup>, <a href="https://github.com/kunjie-lin">Kunjie Lin</a><sup>1*</sup>, <a href="https://noyii.github.io/">Jinbin Bai</a><sup>5</sup>, <a href="https://paulpanwang.github.io/">Panwang Pan</a><sup>4</sup>, <a href="https://chenxinli001.github.io/">Chenxin Li</a><sup>3</sup>, <a href="https://haoyuchen.com/">Haoyu Chen</a><sup>2</sup>, <a href="https://zhongdao.github.io/">Zhongdao Wang</a><sup>6</sup>, <a href="https://scholar.google.com/citations?user=k5hVBfMAAAAJ&hl=zh-CN">Xinghao Ding</a><sup>1†</sup>, <a href="https://fenglinglwb.github.io/">Wenbo Li</a><sup>3♣</sup>, <a href="https://yanshuicheng.info/">Shuicheng Yan</a><sup>5†</sup> </p> </div> <div align="center"> <p> <sup>1</sup>Xiamen University, <sup>2</sup>The Hong Kong University of Science and Technology (Guangzhou), <sup>3</sup> The Chinese University of Hong Kong, <sup>4</sup>Bytedance, <sup>5</sup>National University of Singapore, <sup>6</sup>Tsinghua University </p>   </div> <details open><summary>💡 Our new work that may interest you ✨. </summary><p>

[CVPR' 2026] JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization <br> Yunlong Lin, Lingqing Wang, Zixu Lin, Kunjie Lin, et al. <br> <br>
</p></details>

📮 Updates

[2025.12.8] The evaluation set MMArt-Bench and the data construction scripts are now released! Check out Data Scripts.
[2025.12.7] Training (SFT & GRPO-R) and Evaluation scripts are now released! Check out Training Guide and Evaluation.
[2025.10.7] The local client now supports the Agent-to-Lightroom Protocol! See our Agent-to-Lightroom Protocol documentation for seamless AI agent integration with Adobe Lightroom.
[2025.10.1] MMArt-PPR10k is now live on Hugging Face Datasets! Built upon @PPR10K, this open-source dataset contains diverse user instructions, alongside Lightroom Lua/XMP files and corresponding original and edited images. It's released under the Apache 2.0 license.
[2025.9.18] Congratulations! JarvisArt is accepted to NeurIPS 2025.
[2025.7.14] Thanks to @pydemo for writing a helpful tutorial: Automate Your Lightroom Preset Creation with AI.
[2025.7.12] Inference code is now available! Check out our Inference documentation.
[2025.7.9] We're grateful to @AK for featuring JarvisArt on Twitter!
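[2025.7.4] See our Chinese blog for more details about JarvisArt: "Explainer in Chinese | The ChatGPT of photo retouching is born! JarvisArt: liberating human artistic creativity by directing 200+ professional tools with natural language."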
[2025.7.3] The Hugging Face online demo is now available. Try it here: JarvisArt-Preview.
[2025.6.28] Gradio demo and model weights are now available! Check out our Gradio Demo and Model Weights.
[2025.6.20] Paper is now available on arXiv.
[2025.6.16] Project page is live.

📝 Overview

<div align="center"> <img src="assets/teaser.jpg" alt="JarvisArt Teaser" width="800"/> <br> <em>JarvisArt workflow and results showcase</em> </div>

JarvisArt is an agent for intelligent photo retouching driven by a multi-modal large language model (MLLM). It is designed to liberate human creativity by understanding user intent, mimicking the reasoning of professional artists, and coordinating over 200 tools in Adobe Lightroom. JarvisArt is trained with a novel two-stage framework: Chain-of-Thought supervised fine-tuning builds foundational reasoning, and Group Relative Policy Optimization for Retouching (GRPO-R) then improves decision-making and tool proficiency. Supported by the newly created MMArt dataset (55K samples) and the MMArt-Bench evaluation set, JarvisArt outperforms GPT-4o with a 60% improvement in pixel-level metrics for content fidelity while maintaining comparable instruction-following capabilities.
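GRPO-R builds on Group Relative Policy Optimization, which scores each sampled response relative to the other responses drawn for the same prompt. This overview does not describe the retouching-specific rewards, so the snippet below is only a minimal sketch of the group-relative advantage normalization from standard GRPO, not JarvisArt's training code. The function name `group_relative_advantages` and the reward values are hypothetical, chosen for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    Standard GRPO replaces a learned value baseline with the group mean,
    and scales by the group standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four retouching plans sampled for one instruction.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Plans that score above the group mean receive a positive advantage and are reinforced; plans below the mean are discouraged. For the actual reward definitions and hyperparameters, refer to the Training Guide.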

🎬 Demo Videos

Global Retouching Case

Local Retouching Case

<div align="center"> <img src="assets/local_demo1.gif" alt="JarvisArt Demo" width="800px"> <p>JarvisArt supports multi-granularity retouching goals, ranging from scene-level adjustments to region-specific refinements. Users can perform intuitive, free-form edits through natural inputs such as text prompts and bounding boxes.</p> </div>