SynFMC
[ICCV 2025] Free-Form Motion Control: Controlling the 6D Poses of Camera and Objects in Video Generation
🎯 Introduction
Controlling the movements of dynamic objects and the camera within generated videos is a meaningful yet challenging task. Due to the lack of datasets with comprehensive 6D pose annotations, existing text-to-video methods cannot simultaneously control the motions of both the camera and objects in a 3D-aware manner. We therefore introduce a Synthetic Dataset for Free-Form Motion Control (SynFMC). SynFMC includes diverse object and environment categories and covers various motion patterns generated according to specific rules, simulating common and complex real-world scenarios. The complete 6D pose information helps models learn to disentangle the motion effects of objects and the camera in a video. To provide precise 3D-aware motion control, we further propose Free-Form Motion Control (FMC), a method trained on SynFMC. FMC can control the 6D poses of objects and the camera independently or simultaneously, producing high-fidelity videos.
<img src='assets/teaser.svg' width='100%' /> <p align="left"> <b>Figure 1.</b> The rule-based video generation pipeline of the proposed Synthetic Dataset for Free-Form Motion Control (SynFMC). This example generates a synthetic video with three objects: (1) The environment asset and its matching object assets are selected as the scene elements. (2) The motion types of the objects and camera are randomly selected for trajectory generation. (3) The center region shows the resulting 3D animation sequence used for rendering. The rendered video and annotations are shown in the last row. </p> <br>
<img src='assets/network.svg' width='100%' /> <p align="left"> <b>Figure 2.</b> The architecture of FMC. In the first stage, we randomly sample images from the synthetic videos and update the parameters of the injected Domain LoRA. Next, the modules of the Camera Motion Controller (CMC) are learned. CMC consists of two parts, the Camera Encoder and the Camera Adapter, where the Camera Adapter is introduced into the temporal modules. Finally, we train the Object Encoder of the Object Motion Controller (OMC). It receives the 6D object pose features, which are repeated in the corresponding object region. We use a Gaussian blur kernel centered at the centroid to avoid the need for precise masks. Then, the output is multiplied by the coarse masks to modulate the features in the main branch. </p> <br>
⚙️ Quick Start
1. Setup
conda env create -f environment.yaml
conda activate fmc
2. Training
The training process of FMC consists of three stages.
2.1 Learn Domain LoRA
In the first stage, we randomly sample images from the synthetic videos and update the parameters of the injected Domain LoRA.
bash dist_run_lora.bash
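For readers unfamiliar with LoRA, the idea behind this stage can be sketched as a frozen weight matrix plus a trainable low-rank update. The snippet below is a generic, dependency-free illustration of that technique, not the implementation behind `dist_run_lora.bash`; the class name `LoRALinear`, the `rank` and `alpha` parameters, and the initialization scheme are assumptions made for the example.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Hypothetical linear layer: y = W x + (alpha / rank) * B (A x).

    W stays frozen; only the low-rank factors A and B would be trained.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight              # frozen pretrained weight
        self.scale = alpha / rank
        # A gets a small random init, B starts at zero, so the layer
        # initially behaves exactly like the frozen base layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]], rank=1)
initial = layer.forward([2.0, 3.0])   # identical to the base layer output

# Pretend training has updated the low-rank factors.
layer.A = [[1.0, 1.0]]
layer.B = [[1.0], [0.0]]
adapted = layer.forward([2.0, 3.0])
```

Because `B` starts at zero, injecting the adapter does not change the base model's output until training updates the low-rank factors.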
2.2 Learn Camera Motion Controller (CMC)
Next, the modules of CMC are learned. Inspired by CameraCtrl, CMC consists of two parts, the Camera Encoder and the Camera Adapter, where the Camera Adapter is introduced into the temporal modules.
bash dist_run_cam.bash
2.3 Learn Object Motion Controller (OMC)
Finally, we train the Object Encoder of OMC. It receives the 6D object pose features, which are repeated in the corresponding object region. We use a Gaussian blur kernel centered at the centroid to avoid the need for precise masks. Then, the output is multiplied by the coarse masks to modulate the features in the main branch.
bash dist_run_obj.bash
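The coarse-mask idea described above can be illustrated with a small, dependency-free sketch: a rough object region is used only to locate a centroid, a Gaussian centered there serves as the soft mask, and the pose vector is repeated per pixel and weighted by that mask. This is a simplified reading of the description, not the repository's code: the function names, the `sigma` value, the 6-value pose layout, and the omission of the Object Encoder itself are all assumptions.

```python
import math


def region_centroid(region):
    """Return the (row, col) centroid of the nonzero entries of a 2D region."""
    total = r_sum = c_sum = 0.0
    for r, row in enumerate(region):
        for c, value in enumerate(row):
            total += value
            r_sum += r * value
            c_sum += c * value
    if total == 0:
        raise ValueError("region is empty")
    return r_sum / total, c_sum / total


def gaussian_mask(height, width, center, sigma):
    """Soft mask that peaks at `center` and decays with distance."""
    cr, cc = center
    return [[math.exp(-((r - cr) ** 2 + (c - cc) ** 2) / (2 * sigma ** 2))
             for c in range(width)] for r in range(height)]


def pose_modulation(region, pose, sigma=1.0):
    """Repeat `pose` at every pixel and weight it by the Gaussian coarse mask.

    Returns a height x width x len(pose) nested list.
    """
    height, width = len(region), len(region[0])
    soft = gaussian_mask(height, width, region_centroid(region), sigma)
    return [[[soft[r][c] * p for p in pose] for c in range(width)]
            for r in range(height)]


# Rough 2x2 object region inside a 4x4 frame (hypothetical example data).
rough_region = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# Hypothetical 6D pose: three rotation values followed by three translation values.
pose = [0.1, 0.2, 0.3, 1.0, 2.0, 3.0]
modulation = pose_modulation(rough_region, pose, sigma=1.0)
```

Only the centroid of the rough region matters here, so the region boundary can be imprecise; pixels far from the centroid simply receive a smaller weight instead of a hard cutoff.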
<br>
<!-- ## 📋 TODO List
- [x] Upload training code of FMC.
- [ ] Upload SynFMC dataset (in progress).
- [ ] Upload the code of SynFMC.
- [ ] Upload inference code and model weights of FMC. -->
✒️ Citation
If you find our work useful for your research and applications, please kindly cite using this BibTeX:
@inproceedings{SynFMC,
title={{Free-Form Motion Control}: Controlling the 6D Poses of Camera and Objects in Video Generation},
author={Shuai, Xincheng and Ding, Henghui and Qin, Zhenyuan and Luo, Hao and Ma, Xingjun and Tao, Dacheng},
booktitle={ICCV},
year={2025}
}
