DALR

The implementation of our ACL 2025 paper "DALR: Dual-level Alignment Learning for Multimodal Sentence Representation Learning"

License: MIT · Python 3.7+ · PRs Welcome

English | 中文

Overview

We propose DALR (Dual-level Alignment Learning for Multimodal Sentence Representation Learning).

To achieve fine-grained cross-modal alignment, we propose an alignment method that mitigates the cross-modal misalignment bias (CMB) issue. To alleviate the intra-modal semantic divergence (ISD) issue, we combine ranking distillation with global alignment learning to effectively align intra-modal representations.
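The ranking-distillation component can be illustrated with a minimal ListMLE sketch. This is not the paper's exact loss, just the standard ListMLE formulation it builds on, assuming the teacher's similarity scores define the target ranking:

```python
import torch

def listmle_loss(student_scores: torch.Tensor, teacher_scores: torch.Tensor) -> torch.Tensor:
    """ListMLE: negative log-likelihood of the teacher's ranking under
    a Plackett-Luce model parameterized by the student's scores."""
    # Target ranking: items sorted by teacher score, best first.
    order = torch.argsort(teacher_scores, dim=-1, descending=True)
    s = torch.gather(student_scores, -1, order)
    # log sum exp over the "remaining" items at each rank position
    # (logcumsumexp over the reversed sequence, then flipped back).
    lse = torch.logcumsumexp(s.flip(-1), dim=-1).flip(-1)
    return (lse - s).sum(-1).mean()
```

A student that reproduces the teacher's ordering incurs a lower loss than one that inverts it, which is the property the distillation objective relies on.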

The figure below illustrates the overall model architecture.

DALR model architecture


Table of Contents

- Overview
- Getting Started
- Quick Start: Use DALR
- Evaluation
- Train Your Own Models
- Project Structure
- Citation
- Contributing
Getting Started

Environment Setup

We recommend creating a virtual environment first:

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

Install PyTorch (CUDA 11.1):

pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 \
    -f https://download.pytorch.org/whl/torch_stable.html

For CUDA < 11 or CPU-only:

pip install torch==1.8.1

Then install the remaining dependencies:

pip install -r requirements.txt

Download Datasets

Download Flickr30k and MS-COCO from their official websites and organize them as follows:

REPO ROOT
├── data
│   ├── Flickr/
│   ├── MS-COCO/
│   └── wiki1m_for_simcse.txt
├── Model/
│   ├── bert-base-uncased/
│   ├── simcse/
│   ├── DiffCSE/
│   └── clip/
│       └── ViT-L-14.pt
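Before training, it can be worth confirming the layout above is in place. A small sanity-check helper (hypothetical, not part of the repo) might look like:

```python
from pathlib import Path

# Paths taken from the directory tree above.
REQUIRED = [
    "data/Flickr",
    "data/MS-COCO",
    "data/wiki1m_for_simcse.txt",
    "Model/bert-base-uncased",
    "Model/clip/ViT-L-14.pt",
]

def check_layout(root: str = ".") -> list:
    """Return the required paths that are missing under `root`."""
    base = Path(root)
    return [p for p in REQUIRED if not (base / p).exists()]

if __name__ == "__main__":
    missing = check_layout()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Layout OK")
```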

Wiki1M (used for text training):

wget https://huggingface.co/datasets/princeton-nlp/datasets-for-simcse/resolve/main/wiki1m_for_simcse.txt \
    -P data/

SentEval evaluation datasets (from SimCSE):

cd SentEval/data/downstream/
bash download_dataset.sh

Pretrained models (SimCSE, DiffCSE, BERT-base, CLIP ViT-L/14) can be downloaded from Hugging Face and placed in the Model/ directory.
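For example, checkpoints can be fetched with git-lfs clones from the Hugging Face Hub. The exact checkpoint variants below are assumptions (substitute whichever SimCSE/DiffCSE checkpoints you intend to use as teachers):

```
git lfs install
git clone https://huggingface.co/bert-base-uncased Model/bert-base-uncased
# Checkpoint names are assumptions -- pick the teacher variants you need:
git clone https://huggingface.co/princeton-nlp/sup-simcse-bert-base-uncased Model/simcse
```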


Quick Start: Use DALR

import torch
from scipy.spatial.distance import cosine
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Model/DALR")
model = AutoModel.from_pretrained("Model/DALR")

texts = [
    "There's a kid on a skateboard.",
    "A kid is skateboarding.",
    "A kid is inside the house.",
]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    embeddings = model(**inputs, output_hidden_states=True, return_dict=True).pooler_output

cosine_sim_0_1 = 1 - cosine(embeddings[0], embeddings[1])
cosine_sim_0_2 = 1 - cosine(embeddings[0], embeddings[2])

print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[1], cosine_sim_0_1))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[2], cosine_sim_0_2))
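If you prefer to stay in PyTorch and score all sentence pairs at once, an alternative sketch (not from the repo) computes the full cosine-similarity matrix in one matrix multiply:

```python
import torch
import torch.nn.functional as F

def pairwise_cosine(embeddings: torch.Tensor) -> torch.Tensor:
    """All-pairs cosine similarity for a batch of sentence embeddings
    of shape (n, d); returns an (n, n) similarity matrix."""
    normed = F.normalize(embeddings, p=2, dim=-1)  # unit-length rows
    return normed @ normed.T
```

With the `embeddings` tensor from the snippet above, `pairwise_cosine(embeddings)[0, 1]` matches `cosine_sim_0_1`.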

Evaluation

Run evaluation on SentEval benchmarks:

python src/evaluation.py \
    --model_name_or_path Model/DALR \
    --pooler cls_before_pooler \
    --task_set sts \
    --mode test

Additional evaluation scripts are provided in scripts/:

bash scripts/run_eval.sh        # STS evaluation
bash scripts/run_eval_coco.sh   # COCO retrieval evaluation

Train Your Own Models

Wiki + Flickr30k

bash scripts/run_wiki_flickr.sh

Wiki + MS-COCO

bash scripts/run_wiki_coco.sh

You can freely adjust hyperparameters (learning rate, batch size, margins, lambda, etc.) in the respective shell scripts. Key arguments:

| Argument | Description | Default |
|---|---|---|
| --framework | Training framework (simcse / mse) | mse |
| --learning_rate | Learning rate | 2e-5 |
| --per_device_train_batch_size | Batch size per device | 128 |
| --num_train_epochs | Number of training epochs | 4 |
| --lbd | Weight for distillation loss | 0.01 |
| --margin1 / --margin2 | Ranking margins | 0.18 |
| --distillation_loss | Distillation loss type | listmle |
| --alpha_ / --beta_ / --gamma_ | Loss weights | 0.33 / 1.0 / 1.0 |
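Putting a few of these arguments together, a direct training invocation might look like the following. This is illustrative only; the flags are taken from the table above, and the exact set accepted by src/train_mix.py should be checked against the shell scripts:

```
python src/train_mix.py \
    --framework mse \
    --learning_rate 2e-5 \
    --per_device_train_batch_size 128 \
    --num_train_epochs 4 \
    --lbd 0.01 \
    --margin1 0.18 --margin2 0.18 \
    --distillation_loss listmle
```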


Project Structure

DALR/
├── clip/                   # CLIP model utilities
├── data/                   # Data directory (datasets downloaded here)
├── figure/                 # Figures used in the paper / README
├── scripts/                # Training and evaluation shell scripts
│   ├── run_wiki_flickr.sh
│   ├── run_wiki_coco.sh
│   ├── run_eval.sh
│   └── run_eval_coco.sh
├── SentEval/               # SentEval toolkit (evaluation)
├── src/                    # Core source code
│   ├── model_dalr.py       # DALR model definition
│   ├── train_mix.py        # Main training script
│   ├── data.py             # Dataset and data loading
│   ├── evaluation.py       # SentEval evaluation
│   ├── teachers.py         # Teacher model wrappers
│   ├── utils.py            # Utility functions
│   ├── vit.py              # Vision Transformer implementation
│   ├── xbert.py            # Extended BERT utilities
│   ├── tool.py             # Miscellaneous tools
│   └── randaugment.py      # RandAugment data augmentation
├── requirements.txt
├── LICENSE
├── CONTRIBUTING.md
├── README.md
└── README_zh.md

Citation

If you find this work useful in your research, please consider citing:

@inproceedings{he-etal-2025-dalr,
    title = "{DALR}: Dual-level Alignment Learning for Multimodal Sentence Representation Learning",
    author = "He, Kang  and
      Ding, Yuzhe  and
      Wang, Haining  and
      Li, Fei  and
      Teng, Chong  and
      Ji, Donghong",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    year = "2025",
    pages = "3586--3601",   
}

Acknowledgements


Contributing

We welcome contributions! Please read our Contributing Guide to get started. Feel free to open an Issue or submit a Pull Request.
