zhaochen0110 / Awesome Think With Images: Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
NVlabs / Eagle: Frontier Vision-Language Models with Data-Centric Strategies
YingqingHe / Awesome LLMs Meet Multimodal Generation: 🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
liudaizong / Awesome LVLM Attack: 😎 An up-to-date, curated list of papers, methods, and resources on attacks against Large Vision-Language Models.
Hon-Wong / VoRA: [Fully open] [Encoder-free MLLM] Vision as LoRA
zhaochen0110 / OpenThinkIMG: An end-to-end open-source framework that empowers LVLMs to think with images.
NishilBalar / Awesome LVLM Hallucination: An up-to-date, curated list of state-of-the-art research, papers, and resources on hallucination in large vision-language models.
MMStar-Benchmark / MMStar: [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
XuankunRong / Awesome LVLM Safety: A curated list of resources dedicated to the safety of Large Vision-Language Models. This repository aligns with our survey titled "A Survey of Safety on Large Vision-Language Models: Attacks, Defenses, and Evaluations".
LALBJ / PAI: [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs
Ice-wave / AttentionLens LVLM: A lightweight and extensible toolkit for visualizing attention flow in Large Vision-Language Models (LVLMs). It renders token-to-token attention maps, cross-modal attention paths, and layer–head attention dynamics, helping researchers diagnose abnormal attention behaviors.
ekonwang / VisuoThink: [ACL 2025] VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
opendatalab / HA DPO: Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
lhanchao777 / LVLM Hallucinations Survey: The first released survey paper on hallucinations of large vision-language models (LVLMs). To keep track of this field and continuously update our survey, we maintain this repository of relevant references.
thu-nics / FrameFusion: [ICCV'25] The official code of the paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models"
wang2226 / Awesome LLM Decoding: 📜 Paper list on decoding methods for LLMs and LVLMs
UMxYTL-AI-Labs / MalayMMLU: The first-ever Bahasa Melayu multitask benchmark designed to elevate the performance of Large Language Models (LLMs) and Large Vision-Language Models (LVLMs).
w1oves / HQ-CLIP: [ICCV 2025] HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets
bytedance / LVLM Interpretation: The official repo for "Where do Large Vision-Language Models Look at when Answering Questions?"
liuxuannan / FKA-Owl: [ACM MM 2024] FKA-Owl: Advancing Multimodal Fake News Detection through Knowledge-Augmented LVLMs