86 skills found
Tencent-Hunyuan / HunyuanVideo Foley - HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation.
HorizonWind2004 / Reconstruction Alignment - [ICLR 2026] Official repo of the paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.
shalfun / DriVerse - [ACMMM 2025] Official implementation of the paper "DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment"
GradientSpaces / CrossOver - [CVPR 2025, Highlight] CrossOver: 3D Scene Cross-Modal Alignment
Kwai-YuanQi / MM RLHF - The Next Step Forward in Multimodal LLM Alignment
RainBowLuoCS / OpenOmni - (NIPS 2025) OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-Time Self-Aware Emotional Speech Synthesis
chenllliang / DreamEngine - Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think!
thecharm / Mega - Code for the ACM MM 2021 paper "Multimodal Relation Extraction with Efficient Graph Alignment".
Max-Fu / Tvl - [ICML 2024] A Touch, Vision, and Language Dataset for Multimodal Alignment
kangverse / DALR - The implementation of our ACL 2025 paper "DALR: Dual-level Alignment Learning for Multimodal Sentence Representation Learning"
YuanLi95 / EEGA For JMERE - This is the code for Joint Multimodal Entity-Relation Extraction Based on Edge-enhanced Graph Alignment Network and Word-pair Relation Tagging (AAAI 2023)
aistudynow / Comfyui HunyuanFoley - ComfyUI Nodes for HunyuanVideo-Foley (Low VRAM): Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation.
OpenGVLab / TPO - Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
Yu-xm / ReVision - Modality Gap–Driven Subspace Alignment Training Paradigm for Multimodal Large Language Models
double125 / MADTP - MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
qinzzz / Multimodal Alignment Framework - Implementation for MAF: Multimodal Alignment Framework
tliby / UniFork - UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
codefuse-ai / GALLa - [ACL 2025] Graph Aligned Large Language Models for Improved Source Code Understanding
wwzhuang01 / Math PUMA - [AAAI 2025] Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning
ColinFX / Prot2Text V2 - Prot2Text-V2: Protein Function Prediction with Multimodal Contrastive Alignment