28 skills found
baaivision / Painter - Painter & SegGPT Series: Vision Foundation Models from BAAI
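For context, Painter- and SegGPT-style models cast vision tasks as image inpainting: an example input/output pair and a query image are stitched into one canvas, and the model fills in the masked region as its prediction. Below is a minimal sketch of that canvas construction, assuming fixed-size RGB arrays; the 2x2 layout and the function name are illustrative, not the repository's exact pipeline.

    import numpy as np

    def build_visual_prompt(example_in, example_out, query_in, h=224, w=224):
        # 2x2 canvas: [example_in | example_out]
        #             [query_in   | masked     ]
        canvas = np.zeros((2 * h, 2 * w, 3), dtype=np.float32)
        canvas[:h, :w] = example_in    # top-left: example input
        canvas[:h, w:] = example_out   # top-right: example output
        canvas[h:, :w] = query_in     # bottom-left: query input
        # Bottom-right stays zero; it is the region the model inpaints.
        mask = np.zeros((2 * h, 2 * w), dtype=bool)
        mask[h:, w:] = True            # region to predict
        return canvas, mask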
UX-Decoder / DINOv - [CVPR 2024] Official implementation of the paper "Visual In-context Learning"
Atomic-man007 / Awesome Multimodel LLM - Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLMs). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.
ZhangYuanhan-AI / Visual Prompt Retrieval - [NeurIPS 2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?"
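The paper studies which examples make good visual prompts; a common unsupervised baseline is to retrieve the training example most visually similar to the query rather than picking one at random. Below is a minimal sketch of that nearest-neighbor retrieval, assuming image features (e.g., CLIP embeddings) are precomputed; all names are illustrative, not the repository's API.

    import numpy as np

    def retrieve_prompt(query_feat, candidate_feats):
        # Cosine similarity between the query and each candidate feature;
        # the best-matching candidate's image-label pair becomes the prompt.
        q = query_feat / np.linalg.norm(query_feat)
        c = candidate_feats / np.linalg.norm(candidate_feats, axis=1, keepdims=True)
        return int(np.argmax(c @ q))  # index of the nearest neighbor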
USTCPCS / CVPR2018 Attention - A collection of CVPR 2018 papers: Context Encoding for Semantic Segmentation; MegaDepth: Learning Single-View Depth Prediction from Internet Photos; LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation; PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume; On the Robustness of Semantic Segmentation Models to Adversarial Attacks; SPLATNet: Sparse Lattice Networks for Point Cloud Processing; Left-Right Comparative Recurrent Model for Stereo Matching; Enhancing the Spatial Resolution of Stereo Images using a Parallax Prior; Unsupervised CCA; Discovering Point Lights with Intensity Distance Fields; CBMV: A Coalesced Bidirectional Matching Volume for Disparity Estimation; Learning a Discriminative Feature Network for Semantic Segmentation; Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation; Unsupervised Deep Generative Adversarial Hashing Network; Monocular Relative Depth Perception with Web Stereo Data Supervision; Single Image Reflection Separation with Perceptual Losses; Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains; EPINET: A Fully-Convolutional Neural Network for Light Field Depth Estimation by Using Epipolar Geometry; FoldingNet: Interpretable Unsupervised Learning on 3D Point Clouds; Decorrelated Batch Normalization; Unsupervised Learning of Depth and Egomotion from Monocular Video Using 3D Geometric Constraints; PU-Net: Point Cloud Upsampling Network; Real-Time Monocular Depth Estimation using Synthetic Data with Domain Adaptation via Image Style Transfer; Tell Me Where To Look: Guided Attention Inference Network; Residual Dense Network for Image Super-Resolution; Reflection Removal for Large-Scale 3D Point Clouds; PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image; Fully Convolutional Adaptation Networks for Semantic Segmentation; CRRN: Multi-Scale Guided Concurrent Reflection Removal Network; DenseASPP: Densely Connected Networks for Semantic Segmentation; SGAN: An Alternative Training of Generative Adversarial Networks; Multi-Agent Diverse Generative Adversarial Networks; Robust Depth Estimation from Auto Bracketed Images; AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation; DeepMVS: Learning Multi-View Stereopsis; GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose; GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation; Single-Image Depth Estimation Based on Fourier Domain Analysis; Single View Stereo Matching; Pyramid Stereo Matching Network; A Unifying Contrast Maximization Framework for Event Cameras, with Applications to Motion, Depth, and Optical Flow Estimation; Image Correction via Deep Reciprocating HDR Transformation; Occlusion Aware Unsupervised Learning of Optical Flow; PAD-Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing; Surface Networks; Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation; TextureGAN: Controlling Deep Image Synthesis with Texture Patches; Aperture Supervision for Monocular Depth Estimation; Two-Stream Convolutional Networks for Dynamic Texture Synthesis; Unsupervised Learning of Single View Depth Estimation and Visual Odometry with Deep Feature Reconstruction; Left/Right Asymmetric Layer Skippable Networks; Learning to See in the Dark
libaolu312 / VFXMaster - VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning
edward3862 / Analogist - Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model (SIGGRAPH 2024)
showlab / VisInContext - Official implementation of "Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning"
sudraj2002 / AWRaCLe - PyTorch code for AWRaCLe: All-Weather Image Restoration using Visual In-Context Learning
syp2ysy / Prompt SelF - [TIP] Exploring Effective Factors for Improving Visual In-Context Learning
Jackieam / InMeMo - [WACV 2024] Instruct Me More! Random Prompting for Visual In-Context Learning
space-bacon / Semiotic Analysis Tool - The Semiotic Analysis Tool is a comprehensive Python-based application for analyzing sign systems within textual and visual data. It integrates multiple advanced NLP techniques, machine learning models, and external knowledge sources to provide in-depth analysis of the meaning and context of the input data.
LanqingL / SCS - "Visual Prompt Selection for In-Context Learning Segmentation Framework"
gimpong / CVPR25 Condenser - The code for the paper "Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning" (CVPR'25).
leomqyu / BraInCoRL - Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex (NeurIPS 2025)
smthemex / ComfyUI VisualCloze - This node is based on the VisualCloze method, "A Universal Image Generation Framework via Visual In-Context Learning".
chenxshuo / True Micl - Code for "True Multimodal In-Context Learning Needs Attention to the Visual Context" (COLM 2025)
Akella17 / Beta VAE - To learn and reason like humans, AI must first learn to factorise interpretable representations of independent data generative factors, preferably in an unsupervised manner. What does all this mean? Go through this tutorial for an overview of disentanglement in the context of unsupervised visual disentangled representation learning.
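For context, the Beta-VAE objective behind this tutorial is the standard VAE loss with the KL term up-weighted by a factor beta > 1, which pressures the latent dimensions toward independent, interpretable factors. A minimal sketch of the loss, assuming a diagonal-Gaussian encoder; tensor names and the default beta are illustrative.

    import torch
    import torch.nn.functional as F

    def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
        # Reconstruction term of the evidence lower bound.
        recon = F.mse_loss(x_recon, x, reduction="sum")
        # KL divergence between the diagonal-Gaussian posterior and N(0, I).
        kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
        # beta > 1 strengthens the independence pressure on the latent code.
        return recon + beta * kl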
Wu-Wenxiao / RH Partial2Global - This repository contains the official implementation for the NeurIPS 2025 paper "Towards Reliable and Holistic Visual In-Context Learning Prompt Selection".
RheoDesign / AAVS Beijing - TITLE: SU(PE)RREAL. Director: Li-Qun Zhao.
SuperReal is about the manipulation of mass information in the Big Data era. With the development of multimedia technologies, everyone is submerged in an ocean of data, and data can be generated by anything around us. Rather than generating forms and effects, the key question of SuperReal is how to parameterize the mapping of information, visible or invisible, through visual communication. Various multimedia tools will be used for data collection, processing, and presentation. The workshop begins with exercises in data mapping and visualization using parametric modelling tools. The surreal emerges when we represent and reproduce SuperReal data through multimedia, which promotes more interactive responses between clients and users. We understand the representation of SuperReal to be the project itself: iterative feedback from statistical databases to inspirational presentations will generate the design concepts. In this workshop we borrow techniques and knowledge from the film, animation, and game industries to produce super-real, surreal architecture in between virtual and real space.
The context of the workshop is the imagination of how people might use the Galaxy Soho in Beijing 50 years from now. The Galaxy Soho is a new icon among the most recognizable icons in the capital of China, all of which are designed to play against the human scale as a way of respecting humans. Applying the SuperReal and the Surreal through multimedia tools means re-occupying these macro anti/pro-human iconic buildings with micro, human-scale events inspired by the data-mapping outputs produced in the early stage.
Some of the most prominent features participants will be exposed to during AAVS Beijing include:
• Teaching team: AAVS Beijing tutors are selected from recent graduates and current tutors at the AA. Participants engage in an active learning environment where the low student-to-tutor ratio (5:1) allows for personalized tutorials and debates.
• Facilities: AAVS Beijing is based at Tsinghua University, which offers laser cutting, CNC milling, and 3D printing facilities.
• Computational skills: The toolset of AAVS Beijing includes advanced computational design tools such as Rhinoceros, Maya, Digital Project, Processing, Arduino, and Grasshopper. In line with this year's agenda, it also includes information-mapping and multimedia representation tools.
• Theoretical understanding: One of the major goals of AAVS Beijing is the dissemination of fundamental design techniques and relevant critical-thinking methodologies to participants through theoretical sessions and seminars.
• Professional awareness: AAVS Beijing simulates a professional environment through the priority given to a team-based design approach. Participants ranging from second-year students to PhD candidates and full-time professionals experience a highly focused, collaborative educational model that promotes research-based design and making.
• Fabrication: Depending on each year's agenda, anything from node models to one-to-one scale prototypes may be fabricated and assembled by the design teams.
• Lecture series: Drawing on its unique location in Beijing, AAVS Beijing creates a vibrant atmosphere with an intense lecture programme conveying the diverse expertise of professionals from some of the world's most exciting practices in urbanization and in regional and computational architecture design.