VisInContext
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
- VisInContext is an easy way to increase the in-context text length in multi-modal learning.
- This work is complementary to existing approaches for increasing in-context text length, such as FlashAttn and Memory Transformer.
Install
pip install -r requirement.txt
For H100 GPUs, install the following dependencies:
pip install -r requirements_h100.txt
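The choice between the two install commands above can be scripted. The following is a minimal sketch, not part of the repository: the helper names `pick_requirements` and `detect_gpu_names` are hypothetical, and it assumes the H100 file is used in place of the default one, which this README does not state explicitly.

```python
import subprocess

# File names are copied from the install commands above.
DEFAULT_REQUIREMENTS = "requirement.txt"
H100_REQUIREMENTS = "requirements_h100.txt"


def pick_requirements(gpu_names):
    """Return the requirements file for a list of GPU name strings.

    Assumption: the H100 file replaces the default file on H100 machines.
    """
    if any("H100" in name.upper() for name in gpu_names):
        return H100_REQUIREMENTS
    return DEFAULT_REQUIREMENTS


def detect_gpu_names():
    """Ask nvidia-smi for GPU names; return [] if it is unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return []
    return [line.strip() for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    # Print the command rather than running it, so the user can review it first.
    print(f"pip install -r {pick_requirements(detect_gpu_names())}")
```

If `nvidia-smi` is missing or fails, the sketch falls back to the default file.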
Dataset Preparation
See DATASET.md.
Pre-training
See PRETRAIN.md.
Few-shot Evaluation
See Evaluation.md.
Citation
If you find our work helpful, please consider citing the following paper:
@article{wang2024visincontext,
title={Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning},
author={Wang, Alex Jinpeng and Li, Linjie and Lin, Yiqi and Li, Min and Wang, Lijuan and Shou, Mike Zheng},
journal={NeurIPS},
year={2024}
}
Contact
Email: awinyimgprocess at gmail dot com
Acknowledgement
Thanks to these excellent works: Open-flamingo, Open-CLIP, and WebDataset.