2,688 skills found
huggingface / transformers
  🤗 Transformers: the model-definition framework for state-of-the-art text, vision, audio, and multimodal models, for both inference and training. (A minimal usage sketch follows this list.)
mudler / LocalAI
  LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
pytorch / vision
  Datasets, transforms and models specific to computer vision.
getomni-ai / zerox
  OCR & document extraction using vision models.
vikhyat / moondream
  Tiny vision language model.
roboflow / notebooks
  A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like RF-DETR, YOLO11, SAM 3, and Qwen3-VL.
rednote-hilab / dots.ocr
  Multilingual document layout parsing in a single vision-language model.
GetStream / Vision-Agents
  Open Vision Agents by Stream. Build vision agents quickly with any model or video provider. Uses Stream's edge network for ultra-low latency.
apple / ml-fastvlm
  This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" (CVPR 2025).
QwenLM / Qwen-VL
  The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision language model proposed by Alibaba Cloud.
deepseek-ai / DeepSeek-VL2
  DeepSeek-VL2: Mixture-of-Experts vision-language models for advanced multimodal understanding.
unslothai / notebooks
  250+ fine-tuning & RL notebooks for text, vision, audio, embedding, and TTS models.
Deci-AI / super-gradients
  Easily train or fine-tune SOTA computer vision models with one open-source training library. The home of YOLO-NAS.
joanrod / star-vector
  StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textual inputs to produce high-quality SVG code with remarkable precision.
QwenLM / Qwen2.5-Omni
  Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.
hustvl / Vim
  [ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model.
NVlabs / VILA
  VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
MiniMax-AI / MiniMax-01
  The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on linear attention.
JIA-Lab-research / MGM
  Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models".
VainF / Torch-Pruning
  [CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs, vision foundation models, etc.
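Most of the entries above ship a Python API. As a quick orientation, here is a minimal sketch of running a vision model for image captioning through the Hugging Face Transformers pipeline API (the first entry in this list). The checkpoint id and the input filename are example placeholders, not part of any listing above; any image-to-text checkpoint from the Hub should slot in the same way.

    # Minimal sketch: image captioning via the Transformers pipeline API.
    # Assumes `pip install transformers pillow torch`.
    from transformers import pipeline

    captioner = pipeline(
        "image-to-text",
        model="Salesforce/blip-image-captioning-base",  # example checkpoint, an assumption
    )

    result = captioner("photo.jpg")  # placeholder: local path or URL to an image
    print(result[0]["generated_text"])  # the pipeline returns a list of dicts

The same pipeline call pattern (task string plus checkpoint id) covers many of the model families listed here, though the larger vision-language models typically document their own loading recipes in their repos.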