50 skills found
google-research / Big Vision: Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT, and more.
gokayfem / ComfyUI VLM Nodes: Custom ComfyUI nodes for vision-language models, large language models, image-to-music, text-to-music, and consistent and random creative prompt generation.
discus0434 / Aesthetic Predictor V2 5: SigLIP-based aesthetic score predictor.
merveenoyan / Siglip: Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗
qubvel / Transformers Notebooks: Inference and fine-tuning examples for vision models from 🤗 Transformers.
aimagelab / LLaVA MORE: [ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning.
rizavelioglu / Tryoffdiff: [CVPR'25 Demo] Official repository of "TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models".
marqo-ai / Marqo FashionCLIP: State-of-the-art CLIP/SigLIP embedding models fine-tuned for the fashion domain; +57% improvement in evaluation metrics vs. FashionCLIP 2.0.
MCG-NJU / AWT: [NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation.
MaviProject / MachineLearning: An application-oriented project spanning basic machine learning and deep learning through object detection up to the latest large models. It uses mature third-party libraries, open-source pretrained models, and the newest techniques from related papers, aiming to record the learning process and share it so that more people can use it directly.
dusty-nv / NanoDB: Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP.
miccunifi / Cross The Gap: [ICLR 2025] Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion.
encord-team / Text To Image Eval: Evaluate custom and Hugging Face text-to-image / zero-shot image classification models such as CLIP, SigLIP, DFN5B, and EVA-CLIP. Metrics include zero-shot accuracy, linear probe, image retrieval, and KNN accuracy.
PRITHIVSAKTHIUR / FineTuning SigLIP 2: Fine-tuning SigLIP 2, a vision-language encoder, for single- and multi-label image classification.
NikosEfth / Freedom: Official PyTorch implementation of the WACV 2025 oral paper "Composed Image Retrieval for Training-FREE DOMain Conversion".
marqo-ai / Marqo Ecommerce Embeddings: State-of-the-art embedding models fine-tuned for the e-commerce domain; +67% improvement in evaluation metrics vs. ViT-B-16-SigLIP.
rhysdg / Vision At A Clip: Low-latency ONNX- and TensorRT-based zero-shot classification and detection with contrastive language-image pretraining (CLIP) prompts.
NMS05 / DinoV2 SigLIP Phi3 LoRA VLM: No description available.
filipbasara0 / Simple Clip: A minimal but effective implementation of CLIP (Contrastive Language-Image Pretraining) in PyTorch.
awsaf49 / Flickr Dataset: Download the Flickr8k and Flickr30k image-caption datasets.
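Many of the repositories above build on SigLIP (Zhai et al., 2023), whose core idea is replacing CLIP's softmax contrastive loss with a pairwise sigmoid loss. A minimal NumPy sketch of that loss is below; the function name is ours, and the fixed `t` (temperature) and `b` (bias) values are illustrative stand-ins for parameters the paper actually learns during training.

```python
import numpy as np

def siglip_sigmoid_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Pairwise sigmoid loss in the style of SigLIP (Zhai et al., 2023).

    t and b are learned in the paper; the fixed values here are
    illustrative only.
    """
    # L2-normalize both sets of embeddings
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = t * img @ txt.T + b        # (N, N) pairwise similarity logits
    n = logits.shape[0]
    labels = 2.0 * np.eye(n) - 1.0      # +1 for matching pairs, -1 otherwise
    # -log sigmoid(labels * logits) = log(1 + exp(-labels * logits)),
    # summed over all N^2 pairs and averaged over the batch
    return np.sum(np.logaddexp(0.0, -labels * logits)) / n
```

Because every image-text pair contributes an independent binary term, the loss needs no batch-wide softmax normalization, which is what lets SigLIP scale to very large batches.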