92 skills found
InternLM / InternLM XComposer · InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
declare-lab / MELD · MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation
declare-lab / Multimodal Deep Learning · This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Osilly / Vision DeepResearch · Multimodal deep-research MLLM and benchmark. The first long-horizon multimodal deep-research MLLM, extending the number of reasoning turns to dozens and the number of search-engine interactions to hundreds.
microsoft / Psi · Platform for Situated Intelligence
soujanyaporia / MUStARD · Multimodal Sarcasm Detection Dataset
DjangoPeng / Agent Hub · This repository is a hub for AI Agent projects, including GitHub Sentinel, LanguageMentor, and ChatPPT, designed to enhance enterprise workflows, language learning, and multimodal interaction. Explore a growing family of agents geared towards revolutionizing various industries with cutting-edge AI solutions.
declare-lab / Awesome Emotion Recognition In Conversations · A comprehensive reading list for Emotion Recognition in Conversations
OriNachum / Autonomous Intelligence · Embodied AI system combining real-time multimodal perception, speech-to-speech interaction, and autonomous awareness on NVIDIA Jetson hardware.
mahmoodlab / SurvPath · Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction - CVPR 2024
declare-lab / Contextual Utterance Level Multimodal Sentiment Analysis · Context-Dependent Sentiment Analysis in User-Generated Videos
cocacola-lab / MineLand · Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs
pliang279 / PID · [NeurIPS 2023, ICMI 2023] Quantifying & Modeling Multimodal Interactions
zjr2000 / Awesome Multimodal Chatbot · Awesome Multimodal Assistant is a curated list of multimodal chatbots/conversational assistants that utilize various modes of interaction, such as text, speech, images, and videos, to provide a seamless and versatile user experience.
umdsquare / Data At Hand Mobile · Mobile application for exploring fitness data using both speech and touch interaction.
fxxJuses / MICFormer · Implementation of "Multimodal Information Interaction for Medical Image Segmentation."
Raina-Xin / I2MoE · [ICML 2025] I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts.
gokulkarthik / Hateclipper · Hate-CLIPper: Multimodal Hateful Meme Classification with Explicit Cross-modal Interaction of CLIP features - Accepted at EMNLP 2022 Workshop
MaxiDonkey / DelphiGemini · Delphi wrapper for the Google Gemini API: stateless generation and agent workflows with multimodal, streaming, persistent interactions, structured outputs, vector search, batch, image/video generation, and grounding (Search/Maps).
YingLv1106 / CAINet · This is a multimodal semantic segmentation method named CAINet: Context-Aware Interaction Network for RGB-T Semantic Segmentation.