165 skills found · Page 1 of 6
iMoonLab / Yolov13Implementation of "YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception".
xahidbuffon / FUnIE GANFast underwater image enhancement for Improved Visual Perception. #TensorFlow #PyTorch #RAL2020
shenyunhang / APE[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
OpenGVLab / Vision RWKV[ICLR 2025 Spotlight] Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
wl-zhao / VPD[ICCV 2023] VPD is a framework that leverages the high-level and low-level knowledge of a pre-trained text-to-image diffusion model to downstream visual perception tasks.
HKUST-Aerial-Robotics / OmniNxt[IROS'24 Oral] A Fully Open-source and Compact Aerial Robot with Omnidirectional Visual Perception
saccharomycetes / Mllms Know[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
JIA-Lab-research / VisionReasonerVisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
baaivision / DIVA[ICLR 2025] Diffusion Feedback Helps CLIP See Better
bytedance / WMPReproduction code of paper "World Model-based Perception for Visual Legged Locomotion"
nnnth / UFO[NeurIPS2025 Spotlight 🔥 ] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface"
kaanaksit / OdakScientific computing library for optics, computer graphics and visual perception.
prashnani / PerceptualImageErrorA metric for Perceptual Image-Error Assessment through Pairwise Preference (PieAPP at CVPR 2018).
facebookresearch / SWAGOfficial repository for "Revisiting Weakly Supervised Pre-Training of Visual Perception Models". https://arxiv.org/abs/2201.08371.
USTCPCS / CVPR2018 AttentionContext Encoding for Semantic Segmentation MegaDepth: Learning Single-View Depth Prediction from Internet Photos LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume On the Robustness of Semantic Segmentation Models to Adversarial Attacks SPLATNet: Sparse Lattice Networks for Point Cloud Processing Left-Right Comparative Recurrent Model for Stereo Matching Enhancing the Spatial Resolution of Stereo Images using a Parallax Prior Unsupervised CCA Discovering Point Lights with Intensity Distance Fields CBMV: A Coalesced Bidirectional Matching Volume for Disparity Estimation Learning a Discriminative Feature Network for Semantic Segmentation Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi- Supervised Semantic Segmentation Unsupervised Deep Generative Adversarial Hashing Network Monocular Relative Depth Perception with Web Stereo Data Supervision Single Image Reflection Separation with Perceptual Losses Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains EPINET: A Fully-Convolutional Neural Network for Light Field Depth Estimation by Using Epipolar Geometry FoldingNet: Interpretable Unsupervised Learning on 3D Point Clouds Decorrelated Batch Normalization Unsupervised Learning of Depth and Egomotion from Monocular Video Using 3D Geometric Constraints PU-Net: Point Cloud Upsampling Network Real-Time Monocular Depth Estimation using Synthetic Data with Domain Adaptation via Image Style Transfer Tell Me Where To Look: Guided Attention Inference Network Residual Dense Network for Image Super-Resolution Reflection Removal for Large-Scale 3D Point Clouds PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image Fully Convolutional Adaptation Networks for Semantic Segmentation CRRN: Multi-Scale Guided Concurrent Reflection Removal Network DenseASPP: Densely Connected Networks for Semantic Segmentation SGAN: An Alternative Training of Generative Adversarial Networks Multi-Agent Diverse Generative Adversarial Networks Robust Depth Estimation from Auto Bracketed Images AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation DeepMVS: Learning Multi-View Stereopsis GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation Single-Image Depth Estimation Based on Fourier Domain Analysis Single View Stereo Matching Pyramid Stereo Matching Network A Unifying Contrast Maximization Framework for Event Cameras, with Applications to Motion, Depth, and Optical Flow Estimation Image Correction via Deep Reciprocating HDR Transformation Occlusion Aware Unsupervised Learning of Optical Flow PAD-Net: Multi-Tasks Guided Prediciton-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing Surface Networks Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation TextureGAN: Controlling Deep Image Synthesis with Texture Patches Aperture Supervision for Monocular Depth Estimation Two-Stream Convolutional Networks for Dynamic Texture Synthesis Unsupervised Learning of Single View Depth Estimation and Visual Odometry with Deep Feature Reconstruction Left/Right Asymmetric Layer Skippable Networks Learning to See in the Dark
google-research-datasets / RxRRoom-across-Room (RxR) is a large-scale, multilingual dataset for Vision-and-Language Navigation (VLN) in Matterport3D environments. It contains 126k navigation instructions in English, Hindi and Telugu, and 126k navigation following demonstrations. Both annotation types include dense spatiotemporal alignments between the text and the visual perceptions of the annotators
nasa-jpl / Visual Perception EngineVisual Perception Engine: fast and flexible framework designed to run multiple perception models in an optimized and concurrent manner on NVIDIA Jetson
zli12321 / Vision SR1Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
baaivision / DenseFusionDenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
LeapLabTHU / AdaptiveNN[Nature Machine Intelligence 2025] Emulating Human-like Adaptive Vision for Efficient and Flexible Machine Visual Perception