EVP

[ECCV 2024] EVP model for metric depth estimation from a single image and referring segmentation

[ECCV 2024] EVP: Enhanced Visual Perception using Inverse Multi-Attentive Feature Refinement and Regularized Image-Text Alignment

<a href='https://lavreniuk.github.io/EVP'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2312.08548'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='https://huggingface.co/spaces/MykolaL/evp'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>

by Mykola Lavreniuk, Shariq Farooq Bhat, Matthias Müller, Peter Wonka

This repository contains the PyTorch implementation of the paper "EVP: Enhanced Visual Perception using Inverse Multi-Attentive Feature Refinement and Regularized Image-Text Alignment".

EVP (<ins>E</ins>nhanced <ins>V</ins>isual <ins>P</ins>erception) builds on VPD, earlier work that paved the way for using the Stable Diffusion network in computer vision tasks.

Installation

Clone this repository, then run:

git submodule init
git submodule update

Download the checkpoint of stable-diffusion (we use v1-5 by default) and put it in the checkpoints folder. Please also follow the instructions in stable-diffusion to install the required packages.

Referring Image Segmentation with EVP

EVP achieves 76.35 overall IoU and 77.61 mean IoU on the validation set of RefCOCO.
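The two numbers above differ in how they aggregate over samples. The sketch below shows the definitions commonly used for referring segmentation, which we assume (without having checked the evaluation script) also apply here: overall IoU divides the summed intersection by the summed union over the whole set, while mean IoU averages the per-sample IoU. The function names and the toy masks are hypothetical and are not taken from this repository.

```python
def iou_counts(pred, target):
    """Return (intersection, union) pixel counts for two flat binary masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter, union


def overall_and_mean_iou(pairs):
    """Compute (overall IoU, mean IoU) over (prediction, ground truth) pairs."""
    total_inter = 0
    total_union = 0
    per_sample = []
    for pred, target in pairs:
        inter, union = iou_counts(pred, target)
        total_inter += inter
        total_union += union
        # An empty union is scored as 0 here; this is an assumption.
        per_sample.append(inter / union if union else 0.0)
    overall = total_inter / total_union if total_union else 0.0
    mean = sum(per_sample) / len(per_sample) if per_sample else 0.0
    return overall, mean


# Toy data: two 4-pixel masks, flattened to lists of 0/1.
pairs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # intersection 1, union 2
    ([1, 1, 1, 1], [1, 1, 1, 1]),  # intersection 4, union 4
]
overall, mean = overall_and_mean_iou(pairs)
print(f"overall IoU = {overall:.4f}, mean IoU = {mean:.4f}")
```

Because overall IoU pools pixels across the set, large objects weigh more in it than in mean IoU, which is why the two figures need not agree.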

Please check refer.md for detailed instructions on training and inference.

Depth Estimation with EVP

EVP obtains 0.224 RMSE on the NYUv2 depth estimation benchmark, establishing a new state of the art.

|     | RMSE  | d1    | d2    | d3    | REL   | log_10 |
|-----|-------|-------|-------|-------|-------|--------|
| EVP | 0.224 | 0.976 | 0.997 | 0.999 | 0.061 | 0.027  |

EVP obtains 0.048 REL and 0.136 SqREL on the KITTI depth estimation benchmark, establishing a new state of the art.

|     | REL   | SqREL | RMSE  | RMSE log | d1    | d2    | d3    |
|-----|-------|-------|-------|----------|-------|-------|-------|
| EVP | 0.048 | 0.136 | 2.015 | 0.073    | 0.980 | 0.998 | 1.000 |
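For readers unfamiliar with the column names in the two tables, here is a minimal sketch of the standard monocular depth metrics as they are usually defined in the literature. It assumes these definitions match the ones used by this repository's evaluation code, which we have not verified. The function name `depth_metrics` and the sample values are hypothetical; inputs are flat lists of strictly positive depths over valid pixels only.

```python
import math


def depth_metrics(pred, gt):
    """Standard depth metrics for paired positive depths (assumed definitions)."""
    n = len(gt)
    # Threshold accuracy uses the larger of the two ratios per pixel.
    ratios = [max(p / g, g / p) for p, g in zip(pred, gt)]
    sq_err = [(p - g) ** 2 for p, g in zip(pred, gt)]
    return {
        "d1": sum(r < 1.25 for r in ratios) / n,
        "d2": sum(r < 1.25 ** 2 for r in ratios) / n,
        "d3": sum(r < 1.25 ** 3 for r in ratios) / n,
        # Absolute relative error.
        "REL": sum(abs(p - g) / g for p, g in zip(pred, gt)) / n,
        # Squared relative error.
        "SqREL": sum(e / g for e, g in zip(sq_err, gt)) / n,
        "RMSE": math.sqrt(sum(sq_err) / n),
        "RMSE log": math.sqrt(
            sum((math.log(p) - math.log(g)) ** 2 for p, g in zip(pred, gt)) / n
        ),
        "log_10": sum(
            abs(math.log10(p) - math.log10(g)) for p, g in zip(pred, gt)
        ) / n,
    }


# Toy data in metres; not taken from NYUv2 or KITTI.
gt = [1.0, 2.0, 4.0]
pred = [1.0, 2.2, 6.0]
m = depth_metrics(pred, gt)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Lower is better for REL, SqREL, RMSE, RMSE log and log_10; higher is better for d1, d2 and d3.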

Please check depth.md for detailed instructions on training and inference.

License

MIT License

Acknowledgements

This code is based on stable-diffusion, mmsegmentation, LAVT, MIM-Depth-Estimation, and VPD.

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{lavreniuk2024evp,
  title={EVP: Enhanced Visual Perception using Inverse Multi-Attentive Feature Refinement and Regularized Image-Text Alignment},
  author={Mykola Lavreniuk and Shariq Farooq Bhat and Matthias M{\"u}ller and Peter Wonka},
  booktitle={European Conference on Computer Vision Workshops (ECCVW)},
  pages={206--225},
  year={2024}
}
