TaxaBind
[WACV'25 Oral] TaxaBind: Multimodal Ecological Model
TaxaBind: A Unified Embedding Space for Ecological Applications
<div align="center">
<img src="imgs/taxabind_logo.png" width="250">

Srikumar Sastry*, Subash Khanal, Aayush Dhakal, Adeel Ahmad, Nathan Jacobs (*Corresponding Author)

WACV 2025
</div>

This repository is the official implementation of TaxaBind. TaxaBind is a suite of multimodal models useful for downstream ecological tasks covering six modalities: ground-level image, geographic location, satellite image, text, audio, and environmental features.

🎯 Zero-Shot Image Classification
Our framework outperforms the state of the art in both the unimodal (BioCLIP, ArborCLIP) and multimodal (ImageBind) settings.
🔥 Large Multimodal Ecological Datasets
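For readers new to zero-shot classification, the scoring step can be sketched as follows. This is a minimal illustration of the general CLIP-style recipe (cosine similarity between one image embedding and one text embedding per class, followed by a softmax), not code from this repository. The 3-dimensional vectors, the class prompts, and the `temperature` value are all made-up placeholders; real embeddings would come from the image and text encoders.

```python
import math

def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def zero_shot_probs(image_emb, text_embs, temperature=0.01):
    """Softmax over cosine similarities between one image and each class prompt.

    `temperature` is an assumed value for illustration, not a TaxaBind setting.
    """
    img = l2_normalize(image_emb)
    sims = [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in text_embs]
    logits = [s / temperature for s in sims]
    peak = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical class prompts and placeholder embeddings (not model outputs).
labels = ["a photo of Danaus plexippus", "a photo of Cardinalis cardinalis"]
text_embs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9]]
image_emb = [0.8, 0.3, 0.1]

probs = zero_shot_probs(image_emb, text_embs)
best = labels[probs.index(max(probs))]
print(best)
```

No class-specific training is involved: adding a new species only requires embedding one more text prompt.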
- We release TaxaBench-8k, a truly multimodal dataset containing six paired modalities for evaluating large ecological models.
- We release iSatNat, containing 2.7M pairs of satellite images and ground-level species images.
- We release iSoundNat, containing 88,130 pairs of audio and ground-level species images.
⚙️ Usage
Our pretrained models are available through the rshf and transformers packages for easy inference.
Load and initialize taxabind config:
from transformers import PretrainedConfig
from rshf.taxabind import TaxaBind
config = PretrainedConfig.from_pretrained("MVRL/taxabind-config")
taxabind = TaxaBind(config)
📎 Loading ground-level image and text encoders:
# Loads open_clip style model
model = taxabind.get_image_text_encoder()
tokenizer = taxabind.get_tokenizer()
processor = taxabind.get_image_processor()
🛰️ Loading satellite image encoder:
sat_encoder = taxabind.get_sat_encoder()
sat_processor = taxabind.get_sat_processor()
📍 Loading location encoder:
location_encoder = taxabind.get_location_encoder()
🔈 Loading audio encoder:
audio_encoder = taxabind.get_audio_encoder()
audio_processor = taxabind.get_audio_processor()
🌦️ Loading environmental encoder:
env_encoder = taxabind.get_env_encoder()
env_processor = taxabind.get_env_processor()
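If, as the paper title suggests, the encoders above map their inputs into one shared embedding space, then embeddings from different modalities can be compared directly. The sketch below illustrates cross-modal retrieval under that assumption: an audio embedding is used to rank a small gallery of image embeddings by cosine similarity. The function names, gallery ids, and vectors are hypothetical placeholders; how each encoder is called to produce real embeddings is not shown here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_emb, gallery, k=2):
    """Return the ids of the k gallery items most similar to the query, best first."""
    ranked = sorted(
        gallery.items(),
        key=lambda item: cosine(query_emb, item[1]),
        reverse=True,
    )
    return [item_id for item_id, _ in ranked[:k]]

# Placeholder embeddings: one audio query and three ground-level images.
audio_query = [0.1, 0.9, 0.2]
image_gallery = {
    "img_001": [0.0, 1.0, 0.1],
    "img_002": [0.9, 0.1, 0.0],
    "img_003": [0.2, 0.7, 0.6],
}

top = retrieve(audio_query, image_gallery)
print(top)
```

The same ranking applies to any pair of modalities, for example a location embedding queried against satellite image embeddings.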
📑 Citation
@inproceedings{sastry2025taxabind,
title={TaxaBind: A Unified Embedding Space for Ecological Applications},
author={Sastry, Srikumar and Khanal, Subash and Dhakal, Aayush and Ahmad, Adeel and Jacobs, Nathan},
booktitle={Winter Conference on Applications of Computer Vision},
year={2025},
organization={IEEE/CVF}
}
🔍 Additional Links
Check out our lab website for other interesting work on geospatial understanding and mapping:
