om-ai-lab / VLM R1: Solve Visual Understanding with Reinforced VLMs
luigifreda / Pyslam: pySLAM is a hybrid Python/C++ Visual SLAM pipeline supporting monocular, stereo, and RGB-D cameras. It provides a broad set of modern local and global feature extractors, multiple loop-closure strategies, a volumetric reconstruction module, integrated depth-prediction models, and semantic segmentation capabilities for enhanced scene understanding.
DAMO-NLP-SG / Video LLaMA: [EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
eslint / Config Inspector: A visual tool for inspecting and understanding your ESLint flat configs.
PKU-YuanGroup / Chat UniVi: [CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
PKU-YuanGroup / UniWorld: UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
CircleRadon / Osprey: [CVPR 2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
photonlines / Intuitive Guide To Maxwells Equations: An intuitive and visual guide to understanding Maxwell's equations.
VUE / VUE: Visual Understanding Environment
FoundationVision / UniTok: [NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
OpenGVLab / All Seeing: [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World
jqtangust / Robust R1: 🔥🔥🔥 [AAAI 2026 Oral] Official Implementation of Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding
CoreyGinnivan / Whocanuse: WhoCanUse is a tool that brings attention and understanding to how color contrast can affect different people with visual impairments.
mit-han-lab / Vila U: [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
google-deepmind / Videoprism: Official repository for "VideoPrism: A Foundational Visual Encoder for Video Understanding" (ICML 2024)
VARGPT-family / VARGPT: VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
Open3DA / LL3DA: [CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.
shabie / Docformer: Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer-based architecture for the task of Visual Document Understanding (VDU)
SALT-NLP / LLaVAR: Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
jiweil / Visualizing And Understanding Neural Models In NLP: No description available