JarvisArt

[NeurIPS' 2025] JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

<div align="center"> <img src="assets/logo.png" alt="JarvisArt Icon" width="100"/>


<!-- **JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent** -->

<a href="https://arxiv.org/pdf/2506.17612"><img src="https://img.shields.io/badge/arXiv-2506.17612-b31b1b.svg" alt="Paper"></a> <a href="https://jarvisart.vercel.app/"><img src="https://img.shields.io/badge/Project%20Page-Visit-blue" alt="Project Page"></a> <a href="https://www.youtube.com/watch?v=Ol28DQj8wV8"><img src="https://img.shields.io/badge/YouTube-Watch-red" alt="YouTube"></a> <a href="https://www.bilibili.com/video/BV1Sd3nzREvP/?spm_id_from=333.1007.top_right_bar_window_history.content.click&vd_source=3939804dc1d27869e194605ae46329ec"><img src="https://img.shields.io/badge/BiliBili-哔哩哔哩-FF69B4" alt="BiliBili"></a>

<a href="https://huggingface.co/spaces/LYL1015/JarvisArt-Preview"><img src="https://img.shields.io/badge/🤗-HF Demo-yellow.svg" alt="Hugging Face Demo"></a> <a href="https://huggingface.co/papers/2506.17612"><img src="https://img.shields.io/badge/🤗-Daily%20Papers-ffbd00.svg" alt="Huggingface Daily Papers"></a> <a href="https://huggingface.co/JarvisArt/JarvisArt-1208"><img src="https://img.shields.io/badge/🤗-Model%20Weights-green.svg" alt="Model Weights"></a> <a href="https://huggingface.co/datasets/JarvisArt/MMArt-PPR10k"><img src="https://img.shields.io/badge/🤗-Dataset-blue.svg" alt="Dataset"></a> <a href="https://huggingface.co/datasets/JarvisArt/MMArt-Bench"><img src="https://img.shields.io/badge/🤗-MMArt--Bench-blueviolet.svg" alt="MMArt-Bench"></a>

<a href="https://x.com/ling_yunlong/status/1940010865627103419"><img src="https://img.shields.io/twitter/follow/LYL1015?style=social" alt="Twitter Follow"></a> <a href="https://github.com/LYL1015/JarvisArt"><img src="https://img.shields.io/github/stars/LYL1015/JarvisArt?style=social" alt="GitHub Stars"></a>

</div> <div align="center"> <p> <a href="https://lyl1015.github.io/">Yunlong Lin</a><sup>1*</sup>, <a href="https://github.com/iendi">Zixu Lin</a><sup>1*</sup>, <a href="https://github.com/kunjie-lin">Kunjie Lin</a><sup>1*</sup>, <a href="https://noyii.github.io/">Jinbin Bai</a><sup>5</sup>, <a href="https://paulpanwang.github.io/">Panwang Pan</a><sup>4</sup>, <a href="https://chenxinli001.github.io/">Chenxin Li</a><sup>3</sup>, <a href="https://haoyuchen.com/">Haoyu Chen</a><sup>2</sup>, <a href="https://zhongdao.github.io/">Zhongdao Wang</a><sup>6</sup>, <a href="https://scholar.google.com/citations?user=k5hVBfMAAAAJ&hl=zh-CN">Xinghao Ding</a><sup>1†</sup>, <a href="https://fenglinglwb.github.io/">Wenbo Li</a><sup>3♣</sup>, <a href="https://yanshuicheng.info/">Shuicheng Yan</a><sup>5†</sup> </p> </div> <div align="center"> <p> <sup>1</sup>Xiamen University, <sup>2</sup>The Hong Kong University of Science and Technology (Guangzhou), <sup>3</sup> The Chinese University of Hong Kong, <sup>4</sup>Bytedance, <sup>5</sup>National University of Singapore, <sup>6</sup>Tsinghua University </p> <!-- <sup>*</sup>Equal Contributions <sup>♣</sup>Project Leader <sup>†</sup>Corresponding Author --> <!-- <p>Accepted by CVPR 2025</p> --> </div> <details open><summary>💡 Our new work that may interest you ✨. </summary><p> <!-- may -->

[CVPR' 2026] JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization <br> Yunlong Lin, Lingqing Wang, Zixu Lin, Kunjie Lin, et al. <br>

</p></details>

📮 Updates

<!-- - **[2025.12.8]** for the complete data generation pipeline. -->


📝 Overview

<div align="center"> <img src="assets/teaser.jpg" alt="JarvisArt Teaser" width="800"/> <br> <em>JarvisArt workflow and results showcase</em> </div>

JarvisArt is an agent for intelligent photo retouching, driven by a multimodal large language model (MLLM). It aims to liberate human creativity by understanding user intent, mimicking the reasoning of professional artists, and coordinating over 200 tools in Adobe Lightroom. JarvisArt is trained with a two-stage framework: Chain-of-Thought supervised fine-tuning first establishes foundational reasoning, and Group Relative Policy Optimization for Retouching (GRPO-R) then improves decision-making and tool proficiency. Supported by the newly created MMArt dataset (55K samples) and MMArt-Bench, JarvisArt outperforms GPT-4o with a 60% improvement in pixel-level metrics for content fidelity while maintaining comparable instruction-following capability.
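As a rough illustration of the "group relative" idea behind GRPO-style training, the sketch below normalizes the rewards of several candidate outputs sampled for the same input against their own group statistics. It shows only the generic GRPO advantage computation, not the GRPO-R reward design, which this README does not describe. The function name `group_relative_advantages`, the example reward values, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and std of its own group.

    `rewards` holds one scalar per candidate sampled for the same input.
    `eps` avoids division by zero when every candidate scores the same.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate retouching plans that were
# sampled for the same photo and instruction.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.3f}")
```

Candidates that score above the group mean receive a positive advantage and are reinforced; those below receive a negative one. Because the baseline comes from the group itself, no separate value model is needed.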


🎬 Demo Videos

<!-- <div align="center"> <video width="800" controls> <source src="assets/demo.mp4" type="video/mp4"> Your browser does not support the video tag. </video> <p>JarvisArt Demo Video: Showcasing intelligent photo retouching capabilities</p> </div> --> <!-- <div align="center"> <img src="assets/demo1.gif" alt="JarvisArt Demo" width="800px"> <p>JarvisArt Interactive Retouching Demonstration</p> </div> <div align="center"> <img src="assets/demo2.gif" alt="JarvisArt Demo" width="800px"> <p>JarvisArt Multimodal Instruction Understanding and Execution</p> </div> -->

Global Retouching Case

<div align="center"> <img src="assets/global_demo1.gif" alt="JarvisArt Demo" width="800px"> <p></p> </div>

Local Retouching Case

<div align="center"> <img src="assets/local_demo1.gif" alt="JarvisArt Demo" width="800px"> <p>JarvisArt supports multi-granularity retouching goals, ranging from scene-level adjustments to region-specific refinements. Users can perform intuitive, free-form edits through natural inputs such as text prompts and bounding boxes.</p> </div>

💻 Getting Started

To run the Gradio demo, please follow:

For batch inference, please follow:

For the Agent-to-Lightroom Protocol, please follow:

For training (SFT & GRPO-R), please follow:

For the data construction pipeline (image pairs, instructions, CoT generation, and format conversion), please follow:

For evaluation, please follow:


🎪 Checklist

  • [x] Create repo and project page
  • [x] Release preview inference code and Gradio demo
  • [x] Release Hugging Face online demo
  • [x] Release preview model weights
  • [x] Release Agent-to-Lightroom Protocol
  • [x] Release MMArt-PPR10K dataset with an open license