FineRS
Code for the paper "FineRS: Fine-grained Reasoning and Segmentation of Small Objects with Reinforcement Learning" (NeurIPS 2025).
📊 Benchmark
👉 🔥 Our Benchmark on HuggingFace


🛠️ Framework

📦 Installation
# Create environment
conda create -n finers python=3.10
conda activate finers
# Project requirements
pip install -r requirements.txt
# Install PyTorch (CUDA 11.8)
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118
# xFormers
pip install -U xformers==0.0.29 --index-url https://download.pytorch.org/whl/cu118
# Core dependencies
pip install bitsandbytes accelerate loguru pycocotools matplotlib sam2
pip install flash-attn --no-build-isolation # building can take a long time; alternatively, install a prebuilt wheel from the flash-attn GitHub releases
# Editable install
pip install -e .
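After installation, a quick check can catch a missing or failed package before a long training run. The sketch below is not part of the repository: `find_missing` is a hypothetical helper, and the import names in `EXPECTED_MODULES` are assumed from the packages installed above.

```python
import importlib.util

# Import names assumed for the packages installed above
# (for example, the flash-attn package is imported as flash_attn).
EXPECTED_MODULES = [
    "torch", "torchvision", "xformers", "bitsandbytes", "accelerate",
    "loguru", "pycocotools", "matplotlib", "sam2", "flash_attn",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(EXPECTED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All expected modules were found.")
```

This only checks that each module can be located. It does not import the modules or verify that CUDA is usable.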
🤖 Download Model
apt install git-lfs
mkdir ckpts && cd ckpts
git lfs install
git clone https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct
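If `git lfs install` did not take effect before cloning, the clone can contain small Git LFS pointer files instead of the real weights. The following sketch shows one way to detect that. It assumes the weights are stored as `*.safetensors` files at the top level of the clone; `find_lfs_pointers` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path

# First line of a Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(checkpoint_dir):
    """Return weight files that are still Git LFS pointers, sorted by name."""
    pointers = []
    for path in sorted(Path(checkpoint_dir).glob("*.safetensors")):
        # Only the first bytes are read, so large weight files stay cheap to check.
        with path.open("rb") as handle:
            head = handle.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(path.name)
    return pointers
```

If `find_lfs_pointers("ckpts/Qwen2.5-VL-7B-Instruct")` returns a non-empty list, running `git lfs pull` inside the cloned directory should fetch the actual files.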
▶️ Run
1️⃣ LR Data Processing
python data_process/data_converter_fixed_512_gt_crop_random_region.py
2️⃣ LR Training
bash training_scripts/final_lr_training.sh
3️⃣ HR Data Processing (Two Methods)
3.1 Faster Method
python data_process/data_converter_fixed_1920_qa.py
3.2 Paper Method (Search-Based)
# Step 1: region search based on LR model
bash data_process/data_convert_1920_with_best_region_by_LR_model.sh
# Step 2: HR conversion
python data_process/data_converter_fixed_1920_qa_with_best_region.py
4️⃣ HR Training
bash training_scripts/final_hr_training.sh
# or, if you used the faster data processing from 3.1:
bash training_scripts/final_hr_training_faster.sh
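The two HR routes above differ only in their data-processing and training scripts. As a rough sketch, the ordering can be written down as a small driver. It assumes the commands run from the repository root in the order listed; `build_pipeline` and `run_pipeline` are hypothetical names, not part of the repository.

```python
import subprocess

# Steps 1 and 2: shared by both HR routes.
LR_STEPS = [
    ["python", "data_process/data_converter_fixed_512_gt_crop_random_region.py"],
    ["bash", "training_scripts/final_lr_training.sh"],
]

# Steps 3 and 4: one entry per HR data-processing method.
HR_STEPS = {
    "faster": [
        ["python", "data_process/data_converter_fixed_1920_qa.py"],
        ["bash", "training_scripts/final_hr_training_faster.sh"],
    ],
    "search": [
        ["bash", "data_process/data_convert_1920_with_best_region_by_LR_model.sh"],
        ["python", "data_process/data_converter_fixed_1920_qa_with_best_region.py"],
        ["bash", "training_scripts/final_hr_training.sh"],
    ],
}


def build_pipeline(hr_method):
    """Return the ordered commands for the chosen HR method."""
    if hr_method not in HR_STEPS:
        raise ValueError(f"unknown HR method: {hr_method!r}")
    return LR_STEPS + HR_STEPS[hr_method]


def run_pipeline(hr_method, dry_run=True):
    """Print each command and, unless dry_run is set, run it and stop on failure."""
    for command in build_pipeline(hr_method):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

Calling `run_pipeline("search")` with the default `dry_run=True` only prints the commands, which is a cheap way to confirm the order before starting training.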
🔄 Model Conversion (HuggingFace Format)
python3 training_scripts/model_merger.py --local_dir workdir/xxx/global_step_xxx/actor
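The `xxx` parts of the path above are placeholders for your run name and checkpoint step. As an illustration, the sketch below picks the latest checkpoint in a run directory. It assumes checkpoints are saved as `global_step_<N>/actor` with an integer `N`, which is inferred from the path pattern above; `latest_actor_dir` is a hypothetical helper.

```python
import re
from pathlib import Path


def latest_actor_dir(run_dir):
    """Return the actor directory of the highest-numbered global_step_<N>, or None."""
    best_step, best_path = -1, None
    for child in Path(run_dir).iterdir():
        match = re.fullmatch(r"global_step_(\d+)", child.name)
        actor = child / "actor"
        # Compare step numbers as integers so that step 20 ranks above step 5.
        if match and actor.is_dir() and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), actor
    return best_path
```

The returned path can then be passed to `--local_dir`.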
🧪 Evaluation
bash eval.sh
🔥 Our Pretrained Models for Inference
https://huggingface.co/mycfhs/FineRS/tree/main
Acknowledgement
- Our repo is built on Seg-Zero, EasyR1 and veRL. We thank the authors for sharing their code.
- This work utilizes models from Qwen2.5-VL and SAM2.