<div align="center"> <h1><a href="http://arxiv.org/abs/2503.06277">STiL: Semi-supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification (CVPR 2025)</a></h1>

Siyi Du, Xinzhe Luo, Declan P. O'Regan, and Chen Qin


</div>


<p align="center">Overall framework of STiL. STiL encodes image-tabular data using $\phi$, decomposes modality-shared and -specific information through DCC $\psi$ (a), and outputs predictions via multimodal and unimodal classifiers $f$. STiL generates pseudo-labels for unlabeled data using CGPL (b) and refines them with prototype similarity scores in PGLS (c). (d) Training pathways for labeled and unlabeled data.</p>

This is the official PyTorch implementation of STiL: Semi-supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification. The code builds on that of our prior ECCV 2024 paper, siyi-wind/TIP.

This repository also includes several baseline models for comparison: SimMatch, Multimodal SimMatch, CoMatch, Multimodal CoMatch, FreeMatch, Multimodal FreeMatch, MMatch, and Co-training. See the paper for details of these models.

Contact: s.du23@imperial.ac.uk (Siyi Du)

Give us a :star: if you find this repository helpful.

Updates

[12/03/2025] The arXiv paper and the code are released.

[21/02/2026] We have a new paper accepted at ICLR 2026, which proposes an inference-time dynamic modality selection framework (DyMo) for various missing data scenarios across multiple modalities. See siyi-wind/DyMo for details.

Our Multimodal Learning Research Line

This repository is part of our research line on multimodal learning.

TIP (ECCV 2024): An image-tabular pre-training framework for intra-modality missingness (siyi-wind/TIP)
STiL (CVPR 2025, this work): A semi-supervised image-tabular framework for modality heterogeneity and limited labeled data (siyi-wind/STiL)
DyMo (ICLR 2026): An inference-time dynamic modality selection framework for missing modalities (siyi-wind/DyMo)

Requirements
Data Preparation
Training & Testing
Checkpoints
License & Citation
Acknowledgements

Requirements

This code is implemented using Python 3.9.15, PyTorch 1.11.0, PyTorch Lightning 1.6.4, CUDA 11.3.1, and cuDNN 8.

cd STiL/
conda env create --file environment.yaml
conda activate stil
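After activating the environment, it can be useful to confirm that the installed packages match the versions listed above. The following is a minimal sketch and not part of the repository: the helper name `check_versions` is hypothetical, and the distribution names `torch` and `pytorch-lightning` are assumed to be the names under which the packages are installed.

```python
from importlib import metadata

# Versions stated in this README. The distribution names are assumptions.
EXPECTED = {"torch": "1.11.0", "pytorch-lightning": "1.6.4"}


def check_versions(expected=EXPECTED):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None  # package is not installed in this environment
        # Ignore local build tags such as "+cu113" when comparing
        ok = have is not None and have.split("+")[0] == want
        report[name] = (want, have, ok)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in check_versions().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {want}, found {have} -> {status}")
```

A mismatch does not necessarily mean the code will fail; it only flags that the environment differs from the one described here.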

Data Preparation

Download DVM data from here

Apply for the UKBB data here

Preparation

We follow the same data preprocessing pipeline as siyi-wind/TIP.

Training & Testing

Training

CUDA_VISIBLE_DEVICES=0 python -u run.py --config-name config_dvm_STiL dataset=dvm_all_server_reordered_SemiPseudo_0.01 exp_name=train evaluate=True checkpoint={YOUR_PRETRAINED_CKPT_PATH}

Testing

CUDA_VISIBLE_DEVICES=0 python -u run.py --config-name config_dvm_STiL dataset=dvm_all_server_reordered_SemiPseudo_0.01 exp_name=test test=True checkpoint={YOUR_TRAINED_CKPT_PATH}
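When launching several runs, the two commands above can be assembled programmatically. The sketch below is illustrative only: `build_command` is a hypothetical helper, not part of the repository, and it uses only the arguments that appear in the commands above. The checkpoint path remains a placeholder that you must supply.

```python
import shlex


def build_command(mode, checkpoint,
                  config_name="config_dvm_STiL",
                  dataset="dvm_all_server_reordered_SemiPseudo_0.01"):
    """Build the run.py argument list for a 'train' or 'test' run."""
    if mode == "train":
        overrides = ["exp_name=train", "evaluate=True"]
    elif mode == "test":
        overrides = ["exp_name=test", "test=True"]
    else:
        raise ValueError(f"mode must be 'train' or 'test', got {mode!r}")
    return [
        "python", "-u", "run.py",
        "--config-name", config_name,
        f"dataset={dataset}",
        *overrides,
        f"checkpoint={checkpoint}",
    ]


if __name__ == "__main__":
    # Placeholder path, as in the README commands
    cmd = build_command("test", "{YOUR_TRAINED_CKPT_PATH}")
    print("CUDA_VISIBLE_DEVICES=0 " + shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, with `CUDA_VISIBLE_DEVICES` set in the environment to select the GPU.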

Checkpoints

License & Citation

This repository is licensed under the Apache License, Version 2.0.

If you use this code in your research, please consider citing:

@inproceedings{du2025stil,
  title={{STiL}: Semi-supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification},
  author={Du, Siyi and Luo, Xinzhe and O'Regan, Declan P. and Qin, Chen},
  booktitle={Conference on Computer Vision and Pattern Recognition (CVPR) 2025},
  year={2025}}

Acknowledgements

We would like to thank the following repositories for their great work:
