VK-OOD: Differentiable Outlier Detection Enables Robust Deep Multimodal Analysis
We have moved this project to a new repo here. The paper was accepted at NeurIPS 2023. :new:
In this work, we propose an end-to-end vision and language model incorporating explicit knowledge graphs. We also introduce an interactive out-of-distribution (OOD) layer based on an implicit network operator. The layer filters out noise introduced by the external knowledge base. In practice, we apply our model to several vision and language downstream tasks, including visual question answering, visual reasoning, and image-text retrieval, on different datasets. Our experiments show that it is possible to design models that perform comparably to state-of-the-art results but with significantly fewer samples and less training time.
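To make the idea concrete, here is a minimal sketch of a differentiable OOD-style gating layer that suppresses noisy knowledge-graph features. The score network, learned threshold, and sigmoid gate are illustrative assumptions, not the exact layer used in the paper:

```python
import torch
import torch.nn as nn

class SoftOODGate(nn.Module):
    """Illustrative sketch (not the paper's exact layer): assigns each
    knowledge-graph feature a learned outlier score and softly down-weights
    features whose score exceeds a learned threshold."""

    def __init__(self, dim: int):
        super().__init__()
        # Small MLP producing one outlier score per feature vector (assumption)
        self.score = nn.Sequential(
            nn.Linear(dim, dim // 2),
            nn.ReLU(),
            nn.Linear(dim // 2, 1),
        )
        self.tau = nn.Parameter(torch.zeros(1))  # learned threshold

    def forward(self, kg_feats: torch.Tensor) -> torch.Tensor:
        s = self.score(kg_feats)            # outlier score per token
        gate = torch.sigmoid(self.tau - s)  # near 0 when score >> threshold
        return gate * kg_feats              # noisy entries are suppressed

gate = SoftOODGate(16)
x = torch.randn(2, 5, 16)  # batch of knowledge-graph token features
y = gate(x)
print(y.shape)  # torch.Size([2, 5, 16])
```

Because the gate is a smooth function of learned parameters, it can be trained end-to-end with the rest of the model rather than requiring a hard, non-differentiable outlier cutoff.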
<img width="2318" alt="pipline-poster" src="https://github.com/ellenzhuwang/implicitOOD/assets/10067151/48121aa2-1647-4f4a-bb52-e920a8c19572">
Install
To create a conda environment:
$ conda create -n vk_ood python=3.8 pip
$ conda activate vk_ood
To install other requirements:
$ pip install -r requirements.txt
Run
Pre-train:
$ python train.py data_root=/dataset/pretrain num_gpus=8 num_nodes=1 task_mlm_itm_clip_bert per_gpu_batchsize=64 clip16 text_roberta image_size=244
Fine-tune:
We show an example here: fine-tuning and evaluating on VQA tasks:
$ python train.py data_root=/dataset/vqa num_gpus=8 num_nodes=1 task_finetune_vqa_clip_bert per_gpu_batchsize=32 load_path=pretrain.ckpt clip16 text_roberta image_size=244 clip_randaug
We provide our VK-OOD-CLIP/16B-RoBERTa checkpoint fine-tuned on VQAv2 here.
Evaluate:
$ python train.py data_root=/dataset/vqa num_gpus=8 num_nodes=1 task_finetune_vqa_clip_bert per_gpu_batchsize=32 load_path=vqav2.ckpt clip16 text_roberta image_size=244 test_only=True
To get test-dev and test-std results, submit the results JSON file /results/vqa_submit_ckpt.json to eval.ai.
Citation
If you use this repo for your work, please consider citing our paper and starring this repo:
@article{wang2023differentiable,
title={Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis},
author={Wang, Zhu and Medya, Sourav and Ravi, Sathya N},
journal={arXiv preprint arXiv:2302.05608},
year={2023}
}
@article{wang2023implicit,
title={Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis},
author={Wang, Zhu and Medya, Sourav and Ravi, Sathya},
journal={Advances in Neural Information Processing Systems},
volume={36},
pages={13854--13872},
year={2023}
}