Video-Language Model Robustness

Video and Language Perturbations

This work evaluated the robustness of video-language models on text-to-video retrieval using a variety of video and/or text perturbations. For more information, check out our site.

<p align="center"> <center><img src="./images/PerturbationTypesCombined.png" width=600px /></center> <br> <center>Different real-world perturbations used in this study. </center> </p>

Text Perturbations

Code for generating text perturbations is available in generate_noisy_text.py. You can call this script from the command line, for example:

python generate_noisy_text.py msrvtt --meta_pth msvrtt_eval.csv --text_style --textflint

This runs the perturbations provided by the TextStyle and TextFlint packages on the MSRVTT dataset, using a CSV file that has (at minimum) the columns video_id and text.
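Since the script expects at least the video_id and text columns, it can help to check the metadata file before launching a long run. The following is a minimal sketch using only the standard library; the helper name check_meta_csv is hypothetical and is not part of this repository.

```python
import csv

# Columns the README says the metadata CSV must contain (at minimum).
REQUIRED_COLUMNS = {"video_id", "text"}


def check_meta_csv(path):
    """Return the number of data rows, or raise if a required column is missing.

    Hypothetical helper for illustration; not part of generate_noisy_text.py.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # fieldnames is None for an empty file, so fall back to an empty list.
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"{path} is missing columns: {sorted(missing)}")
        return sum(1 for _ in reader)
```

Extra columns are allowed, so the check only fails when video_id or text is absent.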

The same procedure applies to MC VideoQA on MSRVTT, using generate_noisy_mc_videoqa.py.

Video Perturbations

We provide both on-the-fly generation of perturbations in video_perturbations.py, which is useful for processing pre-extracted features, and generation of noisy video copies in generate_noisy_videos.py.

An example of running generate_noisy_videos.py:

python generate_noisy_videos.py msrvtt data/msrvtt/videos data/msrvtt/noisy_videos blur

This generates perturbed videos for MSRVTT: the original videos are read from data/msrvtt/videos, perturbed with blur, and the copies are saved in data/msrvtt/noisy_videos.

Before running this command, you need to generate a file for the MSRVTT or YouCook2 dataset that maps each original video (first column) to its target file (second column). This should be stored as datasets/{youcook2, msrvtt}_videolist.csv. Example:

YouCook2/validation/226/videos/xHr8X2Wpmno.mkv,robustness/youcook2/xHr8X2Wpmno.mkv
YouCook2/validation/105/videos/V53XmPeyjIU.mkv,robustness/youcook2/V53XmPeyjIU.mkv
YouCook2/validation/201/videos/mZwK0TBI1iY.mkv,robustness/youcook2/mZwK0TBI1iY.mkv
YouCook2/validation/310/videos/gEYyWqs1oL0.mp4,robustness/youcook2/gEYyWqs1oL0.mp4
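A mapping file in this two-column, header-less layout could be produced with a short script. The sketch below is one possible approach under stated assumptions: the function name write_videolist, its parameters, and the choice to flatten targets to the bare file name (as in the example rows above) are illustrative and not taken from this repository.

```python
import csv
from pathlib import Path


def write_videolist(src_root, dst_root, out_csv, exts=(".mp4", ".mkv")):
    """Write 'original_path,target_path' rows for every video under src_root.

    Hypothetical helper for illustration. Targets are flattened to
    dst_root/<file name>, so videos sharing a file name in different
    folders would map to the same target.
    """
    rows = []
    for src in sorted(Path(src_root).rglob("*")):
        if src.is_file() and src.suffix.lower() in exts:
            target = Path(dst_root) / src.name
            rows.append((src.as_posix(), target.as_posix()))

    # Create the output folder (e.g. datasets/) if it does not exist yet.
    Path(out_csv).parent.mkdir(parents=True, exist_ok=True)
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return rows
```

For example, `write_videolist("data/msrvtt/videos", "robustness/msrvtt", "datasets/msrvtt_videolist.csv")` would list every .mp4 and .mkv file found under data/msrvtt/videos.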

To use video_perturbations.py, create a VideoPerturbation object, initializing it with the perturbation and severity. This is useful when modifying video feature extractor code from fairseq and VideoFeatureExtractor.

Original Model Code

Robustness Scores

The file robustness_scores.py provides sample code for calculating the robustness score for perturbation combinations. This is done by collecting model retrieval scores for R@5, R@10, R@25 for different perturbation scores. This particular function requires a pandas.DataFrame, as the results of models and their runs were collected in CSV files. An example of what this file may look like is:

| R@1 | R@5 | Median-R | Model | Dataset | Perturbation | Severity | Type | PerturbModality | Name | Train | R@1 Error | R@5 Error |
|-----|-----|----------|-------|---------|--------------|----------|------|-----------------|------|-------|-----------|-----------|
| 0.103 | 0.227 | 41 | VideoClip | MSRVTT | shuffle_order | 0 | Positional | Text | ShuffleOrder | zs | 0 | 0 |
| 0.072 | 0.181 | 59 | VideoClip | MSRVTT | shuffle_order | 1 | Positional | Text | ShuffleOrder | zs | -0.031 | -0.046 |
| 0.103 | 0.227 | 41 | VideoClip | MSRVTT | shot_noise | 0 | Noise | Video | ShotNoise | zs | 0 | 0 |
| 0.063 | 0.153 | 63.5 | VideoClip | MSRVTT | shot_noise | 1 | Noise | Video | ShotNoise | zs | -0.04 | -0.074 |

Each perturbation has a row with severity 0 that holds the baseline scores, which makes the calculation easier. Any severity greater than 0 indicates that a perturbation was applied.
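The Error columns in the example table are consistent with the perturbed score minus the severity-0 baseline (for instance, 0.072 - 0.103 = -0.031 for R@1 under shuffle_order). A minimal pandas sketch of that bookkeeping is shown below. The function name add_baseline_deltas and the "Relative" column (perturbed score divided by baseline) are assumptions made for illustration; they are not the contents of robustness_scores.py.

```python
import pandas as pd

# Toy rows copied from the example table above (subset of columns).
results = pd.DataFrame(
    {
        "Model": ["VideoClip"] * 4,
        "Dataset": ["MSRVTT"] * 4,
        "Perturbation": ["shuffle_order", "shuffle_order", "shot_noise", "shot_noise"],
        "Severity": [0, 1, 0, 1],
        "R@1": [0.103, 0.072, 0.103, 0.063],
        "R@5": [0.227, 0.181, 0.227, 0.153],
    }
)


def add_baseline_deltas(df, metrics=("R@1", "R@5")):
    """Attach severity-0 baselines and per-row deltas (hypothetical helper)."""
    keys = ["Model", "Dataset", "Perturbation"]
    metrics = list(metrics)

    # One baseline row per (model, dataset, perturbation): the severity-0 run.
    baseline = df.loc[df["Severity"] == 0, keys + metrics].rename(
        columns={m: f"{m} Baseline" for m in metrics}
    )
    out = df.merge(baseline, on=keys, how="left")

    for m in metrics:
        # Negative values mean the perturbation lowered the score.
        out[f"{m} Error"] = (out[m] - out[f"{m} Baseline"]).round(3)
        # Assumed relative measure: 1.0 means no drop from the baseline.
        out[f"{m} Relative"] = out[m] / out[f"{m} Baseline"]
    return out


scored = add_baseline_deltas(results)
print(scored[["Perturbation", "Severity", "R@1 Error", "R@5 Error"]])
```

Grouping on model, dataset, and perturbation keeps each perturbed run paired with its own baseline row, which is why the severity-0 rows are repeated per perturbation in the CSV.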

Citation

@inproceedings{schiappa2022robustness,
title={Robustness Analysis of Video-Language Models Against Visual and Language Perturbations},
author={Madeline Chantry Schiappa and Shruti Vyas and Hamid Palangi and Yogesh S Rawat and Vibhav Vineet},
booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2022},
url={https://openreview.net/forum?id=A79jAS4MeW9}
}

Examples

For examples, please see EXAMPLES.md.
