17 skills found
LLaVA-VL / LLaVA NeXT - No description available.
Coobiw / MPP LLaVA - Personal project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-style MLLM on a 24 GB RTX 3090/4090.
RLHF-V / RLAIF V - [CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness.
xiaoachen98 / Open LLaVA NeXT - An open-source implementation for training LLaVA-NeXT.
zjysteven / Lmms Finetune - A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc.
mu-cai / Matryoshka Mm - Matryoshka Multimodal Models.
chuangchuangtan / LLaVA NeXT Image Llama3 Lora - LLaVA-NeXT-Image-Llama3-Lora, modified from https://github.com/arielnlee/LLaVA-1.6-ft.
hasanar1f / HiRED - [AAAI 2025] HiRED strategically drops visual tokens in the image-encoding stage to improve inference efficiency for high-resolution vision-language models (e.g., LLaVA-NeXT) under a fixed token budget.
Farzad-R / Finetune LLAVA NEXT - Code for fine-tuning the LLaVA-1.6-7b-mistral multimodal LLM.
justinsunyt / MultiAgent - Generative web-browsing chat agent with text + vision input. Powered by MultiOn, llama-3, llava, qwen, Next.js, FastAPI, and Supabase. Landed me an internship at MultiOn :)
Jorffy / NoteMR - [CVPR 2025] Code for "Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering".
zaiquanyang / LLaVA Next STVG - LLaVA-NeXT for STVG.
friedrichor / LLaVA NeXT Reproduced - Reproduced LLaVA-NeXT with training code and scripts.
Darren-greenhand / LLaVA Next - LLaVA_OpenVLA part 3: using LLaVA to train a stronger VLA model.
hari-huynh / ViVQA Voice Assistant - Voice assistant built on multimodal LLMs: a finetuned LLaVA-NeXT (Mistral 7B) and PhoWhisper.
alyakin314 / CNS Obsidian - CNS-Obsidian: a neurosurgical vision-language model built from scientific publications.
luxus180 / LLaVA OneVision 1.5 - 🛠️ Build and train multimodal models easily with LLaVA-OneVision 1.5, an open framework designed for seamless integration of vision and language tasks.