roboflow / Notebooks: A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like RF-DETR, YOLO11, SAM 3, and Qwen3-VL.
joanrod / Star Vector: StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textual inputs to produce high-quality SVG code with remarkable precision.
VainF / Torch Pruning: [CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs, Vision Foundation Models, etc.
mit-han-lab / Efficientvit: Efficient vision foundation models for high-resolution generation and perception.
OpenGVLab / InternImage: [CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
baaivision / Painter: Painter & SegGPT Series: Vision Foundation Models from BAAI
ByteDance-Seed / Seed1.5 VL: Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
uncbiag / Awesome Foundation Models: A curated list of foundation models for vision and language tasks
taokz / BiomedGPT: BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks
youquanl / Segment Any Point Cloud: [NeurIPS'23 Spotlight] Segment Any Point Cloud Sequences by Distilling Vision Foundation Models
rmaphoh / RETFound: Vision Foundation Models for Medical AI, including RETFound, DINOv2, DINOv3
ChenDelong1999 / RemoteCLIP: 🛰️ Official repository of paper "RemoteCLIP: A Vision Language Foundation Model for Remote Sensing" (IEEE TGRS)
JindongGu / Awesome Prompting On Vision Language Model: This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
google-research / Maxvit: [ECCV 2022] Official repository for "MaxViT: Multi-Axis Vision Transformer". SOTA foundation models for classification, detection, segmentation, image quality, and generative modeling...
mahmoodlab / CONCH: Vision-Language Pathology Foundation Model - Nature Medicine
UCSC-VLAA / OpenVision: OpenVision (ICCV 2025), OpenVision 2 (CVPR 2026), and OpenVision 3
ViTAE-Transformer / Remote Sensing RVSA: The official repo for [TGRS'22] "Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model"
microsoft / CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
w1oves / Rein: [CVPR 2024] Official implementation of "Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation"
RT-DETRs / RT DETRv4: Official implementation of RT-DETRv4: Painlessly Furthering Real-Time Object Detection with Vision Foundation Models