This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.

universal

multimodal-deep-learning, multimodal-interactions, multimodal-learning, +1

Updated 2d ago

Osilly / Vision DeepResearch

609

Multimodal deep-research MLLM and benchmark. The first long-horizon multimodal deep-research MLLM, extending the number of reasoning turns to dozens and the number of search-engine interactions to hundreds.

universal

Updated 1d ago

microsoft / Psi

570

Platform for Situated Intelligence

universal

artificial-intelligence, component-library, framework, +7

Updated 2mo ago

soujanyaporia / MUStARD

372

Multimodal Sarcasm Detection Dataset

universal

multimodal-deep-learning, multimodal-interactions, sarcasm, +1

Updated 2d ago

DjangoPeng / Agent Hub

326

This repository is a hub for AI Agent projects, including GitHub Sentinel, LanguageMentor, and ChatPPT, designed to enhance enterprise workflows, language learning, and multimodal interaction. Explore a growing family of agents geared towards revolutionizing various industries with cutting-edge AI solutions.

universal

Updated 5d ago

declare-lab / Awesome Emotion Recognition In Conversations

276

A comprehensive reading list for Emotion Recognition in Conversations

universal

conversational-ai, dialogue-systems, emotion-recognition, +4

Updated 4d ago

OriNachum / Autonomous Intelligence

223

Embodied AI system combining real-time multimodal perception, speech-to-speech interaction, and autonomous awareness on NVIDIA Jetson hardware.

universal

Updated 7d ago

mahmoodlab / SurvPath

165

Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction - CVPR 2024

universal

histology-transcriptomics, interpretability, mahmoodlab, +5

Updated 7d ago

declare-lab / Contextual Utterance Level Multimodal Sentiment Analysis

125

Context-Dependent Sentiment Analysis in User-Generated Videos

universal

keras, lstm, lstm-neural-networks, +3

Updated 11mo ago

cocacola-lab / MineLand

110

Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs

universal

ai-agents, large-language-models, minecraft, +1

Updated 6h ago

pliang279 / PID

[NeurIPS 2023, ICMI 2023] Quantifying & Modeling Multimodal Interactions

universal

Updated 1mo ago

zjr2000 / Awesome Multimodal Chatbot

Awesome Multimodal Assistant is a curated list of multimodal chatbots/conversational assistants that utilize various modes of interaction, such as text, speech, images, and videos, to provide a seamless and versatile user experience.

universal

awesome, chat-application, chatbot, +8

Updated 9d ago

umdsquare / Data At Hand Mobile

Mobile application for exploring fitness data using both speech and touch interaction.

universal

fitbit, fitness-tracker, mobile-app, +6

Updated 10mo ago

fxxJuses / MICFormer

Implementation of "Multimodal Information Interaction for Medical Image Segmentation."

universal

Updated 15d ago

Raina-Xin / I2MoE

[ICML 2025] I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts.

universal

Updated 14d ago

gokulkarthik / Hateclipper

Hate-CLIPper: Multimodal Hateful Meme Classification with Explicit Cross-modal Interaction of CLIP features - Accepted at EMNLP 2022 Workshop

universal

ai4good, computer-vision, memes, +2

Updated 6d ago

MaxiDonkey / DelphiGemini

Delphi wrapper for the Google Gemini API: stateless generation and agent workflows with multimodal, streaming, persistent interactions, structured outputs, vector search, batch, image/video generation, and grounding (Search/Maps).

gemini cli

agents, api-wrapper, deep-research, +12

Updated 15d ago

YingLv1106 / CAINet

CAINet: Context-Aware Interaction Network for RGB-T Semantic Segmentation, a multimodal semantic segmentation method.

universal

Updated 18d ago