MatchTime
[EMNLP 2024 Oral] MatchTime: Towards Automatic Soccer Game Commentary Generation
MatchTime: Towards Automatic Soccer Game Commentary Generation (EMNLP 2024 Oral)
This repository contains the official PyTorch implementation of MatchTime: https://arxiv.org/abs/2406.18530/
<div align="center"> <img src="./assets/teaser.png"> </div> <div align="center"> <img src="./assets/commentary.png"> </div>
Some Information
Project Page · Paper · Dataset · Checkpoint · Demo Video (YouTube) · Demo Video (bilibili)
Requirements
- Python >= 3.8 (Anaconda or Miniconda recommended)
- PyTorch >= 2.0.0 (if using an A100 GPU)
- transformers >= 4.42.3
- pycocoevalcap >= 1.2
A suitable conda environment named matchtime can be created and activated with:
cd MatchTime
conda env create -f environment.yaml
conda activate matchtime
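After activating the environment, a quick check against the minimum versions listed under Requirements can save a failed training run. This is a minimal sketch, not part of the repository: it assumes the PyPI distribution names `torch`, `transformers` and `pycocoevalcap`, and compares only the leading numeric parts of each version string.

```python
import sys
from importlib import metadata

# Minimum versions taken from the Requirements list (assumed PyPI distribution names).
REQUIREMENTS = {
    "torch": "2.0.0",
    "transformers": "4.42.3",
    "pycocoevalcap": "1.2",
}


def version_tuple(version: str) -> tuple:
    """Turn e.g. '2.1.0+cu118' into (2, 1, 0), keeping only leading numeric parts."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets(installed: str, minimum: str) -> bool:
    """True if installed >= minimum, padding the shorter version with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_environment() -> dict:
    """Map each requirement to True/False; a package that is not installed maps to False."""
    results = {"python": sys.version_info >= (3, 8)}
    for package, minimum in REQUIREMENTS.items():
        try:
            results[package] = meets(metadata.version(package), minimum)
        except metadata.PackageNotFoundError:
            results[package] = False
    return results


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:15s} {'OK' if ok else 'missing or too old'}")
```

The check says nothing about CUDA or GPU type; it only confirms that the listed packages are importable at the stated versions.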
Training
Before training, prepare the features and caption data and place them in the corresponding folders. The resulting structure should look like this:
└─ MatchTime
   ├─ dataset
   │  ├─ MatchTime
   │  │  ├─ valid
   │  │  └─ train
   │  │     ├─ england_epl_2014-2015
   │  │     ...  ├─ 2015-02-21 - 18-00 Chelsea 1 - 1 Burnley
   │  │     ...     └─ Labels-caption.json
   │  │
   │  ├─ SN-Caption
   │  └─ SN-Caption-test-align
   │     ├─ england_epl_2015-2016
   │     ...  ├─ 2015-08-16 - 18-00 Manchester City 3 - 0 Chelsea
   │     ...     └─ Labels-caption_with_gt.json
   │
   ├─ features
   │  ├─ baidu_soccer_embeddings
   │  │  ├─ england_epl_2014-2015
   │  │  ...  ├─ 2015-02-21 - 18-00 Chelsea 1 - 1 Burnley
   │  │  ...     ├─ 1_baidu_soccer_embeddings.npy
   │  │           └─ 2_baidu_soccer_embeddings.npy
   │  ├─ C3D_PCA512
...
The format of the features can be adjusted with:
python ./features/preprocess.py directory_path_of_feature
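To catch missing files before training, the layout above can be checked with a few lines of `pathlib`. This is a hypothetical helper, not part of the repository. It assumes that each feature folder mirrors the `<league_season>/<game>` names used under `dataset/MatchTime/<split>` and that per-half files are named `1_<feature>.npy` and `2_<feature>.npy`, as in the Baidu example; other feature types may be named differently.

```python
from pathlib import Path


def find_games(split_dir: Path, label_name: str = "Labels-caption.json") -> list:
    """Return game folders under split_dir/<league_season>/<game> that contain label_name."""
    games = []
    for league in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        for game in sorted(p for p in league.iterdir() if p.is_dir()):
            if (game / label_name).is_file():
                games.append(game)
    return games


def missing_features(games: list, feature_root: Path,
                     feature_name: str = "baidu_soccer_embeddings") -> list:
    """List expected per-half feature files (1_/2_<feature_name>.npy) that do not exist.

    Assumes the feature tree mirrors the dataset tree: <feature_root>/<feature_name>/<league>/<game>/.
    """
    missing = []
    for game in games:
        league, match = game.parent.name, game.name
        for half in (1, 2):
            path = feature_root / feature_name / league / match / f"{half}_{feature_name}.npy"
            if not path.is_file():
                missing.append(path)
    return missing
```

For example, `missing_features(find_games(Path("dataset/MatchTime/train")), Path("features"))` run from the `MatchTime` root returns an empty list when every training game has both halves of its Baidu features.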
The example above shows the layout of the Baidu features. In our experiments we also used the ResNET_PCA_512 and C3D_PCA_512 features from the official website. If you want to use CLIP (2 FPS) or InternVideo (1 FPS) features, you can extract them by following their official websites, or contact us for the features.
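Because features are stored per half at a fixed frame rate, a caption timestamp has to be converted into a row index of the matching `1_*.npy` or `2_*.npy` file. The sketch below illustrates that arithmetic only. It assumes a SoccerNet-style `gameTime` string of the form `"half - mm:ss"` with the clock restarting at each half; the rates are the ones stated above for CLIP and InternVideo, and the rates of other feature types should be checked before adding them.

```python
# Feature rates stated in this README; verify the rate of any other feature type first.
FEATURE_FPS = {"CLIP": 2, "InternVideo": 1}


def game_time_to_index(game_time: str, fps: float) -> tuple:
    """Parse an assumed 'half - mm:ss' string into (half, feature_row_index) at the given rate."""
    half, clock = (part.strip() for part in game_time.split("-", 1))
    minutes, seconds = (int(x) for x in clock.split(":"))
    return int(half), int((minutes * 60 + seconds) * fps)
```

For instance, `game_time_to_index("1 - 00:37", FEATURE_FPS["CLIP"])` points at row 74 of the first-half CLIP feature file.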
After preparing the data and features, you can pre-train (or fine-tune) with the following command (the hyper-parameters are set at the bottom of train.py):
python train.py
Inference
We provide two types of inference:
For the whole test set
To test the MatchVoice model, generate a .csv file of predictions with the following command (the hyper-parameters are set at the bottom of inference.py):
python inference.py
A sample output of this type of inference is provided in ./inference_result/sample.csv.
For Single Video
We also provide a script that predicts the commentary for a single video (with our checkpoints, use a 30-second video):
python inference_single_video_CLIP.py single_video_path
Here we only provide the version based on CLIP features (ViT-B/32); for extracting the CLIP features, please check here. CLIP features do not give the best performance, but they are the most convenient for new videos.
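Since the released checkpoints expect a 30-second input, a full match video has to be cut down first. The sketch below is one way to do it and is not part of the repository: it picks a 30-second window centred on an event time (shifted so it stays inside the video) and builds an `ffmpeg` command that re-encodes that span. Centring the window on the event is an assumption made for illustration.

```python
def centred_window(event_s: float, duration_s: float, length_s: float = 30.0) -> tuple:
    """(start, end) in seconds of a length_s window centred on event_s, kept inside [0, duration_s]."""
    length = min(length_s, duration_s)
    start = min(max(0.0, event_s - length / 2), duration_s - length)
    return start, start + length


def ffmpeg_cut_command(src: str, dst: str, start_s: float, length_s: float = 30.0) -> list:
    """ffmpeg arguments that seek to start_s in src and re-encode length_s seconds into dst."""
    return ["ffmpeg", "-y", "-ss", f"{start_s:.2f}", "-i", src, "-t", f"{length_s:.2f}", dst]
```

The command can be run with `subprocess.run(cmd, check=True)` and the resulting clip passed to `inference_single_video_CLIP.py`. Re-encoding (rather than `-c copy`) avoids the window snapping to the nearest keyframe.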
Alignment
Before alignment, download the videos from here (224p is sufficient) and organize them as follows:
└─ MatchTime
   ├─ videos_224p
   │  ├─ england_epl_2014-2015
   │  ...  ├─ 2015-02-21 - 18-00 Chelsea 1 - 1 Burnley
   │  ...     ├─ 1_224p.mkv
   │           └─ 2_224p.mkv
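A game without both halves will leave gaps in the alignment results, so it may help to list incomplete game folders before running ASR. This hypothetical helper assumes the per-half file names `1_224p.mkv` and `2_224p.mkv` shown in the layout above.

```python
import re
from pathlib import Path

# Per-half video names as shown in the layout (1_224p.mkv, 2_224p.mkv).
HALF_VIDEO = re.compile(r"^([12])_224p\.mkv$")


def half_videos(game_dir: Path) -> dict:
    """Map half number (1 or 2) to its 224p video inside one game folder."""
    found = {}
    for path in game_dir.iterdir():
        match = HALF_VIDEO.match(path.name)
        if match:
            found[int(match.group(1))] = path
    return found


def incomplete_games(league_dir: Path) -> list:
    """Game folders under league_dir that do not contain both half videos."""
    return [g for g in sorted(league_dir.iterdir())
            if g.is_dir() and set(half_videos(g)) != {1, 2}]
```

For example, `incomplete_games(Path("videos_224p/england_epl_2014-2015"))` returns an empty list when every game of that season is complete.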
Pre-process (Coarse Align)
Coarse alignment uses WhisperX and LLaMA3 (as an agent) in the following steps:
WhisperX ASR:
python ./alignment/soccer_whisperx.py --process_directory video_folder(e.g. ./videos_224p/england_epl_2014-2015) --output_directory output_folder(e.g. ./ASR_results/england_epl_2014-2015)
Transform to Events:
python ./alignment/soccer_asr2events.py --base_path ASR_results_folder(e.g. ./ASR_results/england_epl_2014-2015) --output_dir event_results_folder(e.g. ./event_results/england_epl_2014-2015)
Align from Events:
python ./alignment/soccer_align_from_event.py --event_path event_results_folder(e.g. ./event_results/england_epl_2014-2015) --output_dir output_directory(e.g. ./pre-processed/england_epl_2014-2015)
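The three coarse-alignment steps feed into each other, so they can be chained for one league/season folder. This driver is a sketch, not part of the repository: it uses only the script paths and flags listed above, assumes it is run from the `MatchTime` root, and uses the example folder names as defaults. `check=True` stops the chain at the first step that exits with an error.

```python
import subprocess
import sys


def coarse_alignment_commands(league: str, videos: str = "./videos_224p",
                              asr: str = "./ASR_results", events: str = "./event_results",
                              out: str = "./pre-processed") -> list:
    """Build the ASR -> events -> alignment commands for one league/season folder."""
    py = sys.executable
    return [
        [py, "./alignment/soccer_whisperx.py",
         "--process_directory", f"{videos}/{league}", "--output_directory", f"{asr}/{league}"],
        [py, "./alignment/soccer_asr2events.py",
         "--base_path", f"{asr}/{league}", "--output_dir", f"{events}/{league}"],
        [py, "./alignment/soccer_align_from_event.py",
         "--event_path", f"{events}/{league}", "--output_dir", f"{out}/{league}"],
    ]


def run_coarse_alignment(league: str) -> None:
    """Run the steps in order; subprocess.CalledProcessError is raised if one fails."""
    for cmd in coarse_alignment_commands(league):
        subprocess.run(cmd, check=True)
```

Usage: `run_coarse_alignment("england_epl_2014-2015")`.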
More details can be found in the paper.
Contrastive Learning (Fine-grained Align)
After downloading the checkpoints from here, run the following command to perform alignment with contrastive learning:
python ./alignment/do_alignment.py
By changing the hyper-parameter finding_words, you can align from the ASR results, the events, or the original SN-Caption.
You can also use the alignment model directly:
from alignment.matchtime_model import ContrastiveLearningModel
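The constructor and methods of `ContrastiveLearningModel` are not documented in this README, so the sketch below does not call it. It only illustrates the general idea behind contrastive fine-grained alignment: once a caption and each frame in a candidate window are embedded in a shared space, the caption is assigned to the frame with the highest cosine similarity. The embeddings here are toy vectors, and `window_start_s`/`fps` are hypothetical parameters; the exact procedure in `do_alignment.py` may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_timestamp(text_emb, frame_embs, window_start_s, fps):
    """Return (seconds, score) of the frame whose embedding best matches the caption embedding."""
    scores = [cosine(text_emb, f) for f in frame_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return window_start_s + best / fps, scores[best]
```

With a 2 FPS window starting at 10 s, a caption embedding that matches the third frame best is moved to 11 s.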
Evaluation
We provide code for evaluating the prediction results:
# for single csv file
python ./evaluation/scoer_single.py --csv_path ./inference_result/sample.csv
# for many csv files to record scores in a new csv file
python ./evaluation/scoer_group.py
# for gpt score (need OpenAI API Key)
python ./evaluation/scoer_gpt.py ./inference_result/sample.csv
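The caption metrics come from pycocoevalcap, whose scorers take two dicts with identical keys: references as `{id: [ref, ...]}` and predictions as `{id: [single_prediction]}`. The sketch below shows that input shape for a prediction CSV. The column names of `sample.csv` are not documented here, so `pred_col` and `ref_col` are hypothetical parameters; the repository's scorer may also tokenize captions before scoring, which this sketch does not do.

```python
import csv


def load_pairs(csv_path, pred_col, ref_col):
    """Read prediction/reference columns into the {id: [text]} dicts used by pycocoevalcap."""
    gts, res = {}, {}
    with open(csv_path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f)):
            gts[i] = [row[ref_col]]
            res[i] = [row[pred_col]]
    return gts, res


def bleu4(gts, res):
    """Corpus BLEU-4 with pycocoevalcap (listed in the requirements)."""
    from pycocoevalcap.bleu.bleu import Bleu
    score, _ = Bleu(4).compute_score(gts, res)
    return score[3]  # score holds BLEU-1..BLEU-4
```

Usage, with your CSV's actual column names: `bleu4(*load_pairs("./inference_result/sample.csv", "<pred_col>", "<ref_col>"))`.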
TODO
- [x] Commentary Model & Training & Inference Code
- [x] Release Checkpoints
- [x] Release Meta Data
- [x] Alignment Model & Training & Inference Code
- [x] Evaluation Code
- [x] Release Demo
Citation
If you use this code for your research or project, please cite:
@inproceedings{rao2024matchtimeautomaticsoccergame,
title = {MatchTime: Towards Automatic Soccer Game Commentary Generation},
author = {Rao, Jiayuan and Wu, Haoning and Liu, Chang and Wang, Yanfeng and Xie, Weidi},
booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
year = {2024}
}
Acknowledgements
Many thanks to the code base of Video-LLaMA and the source data from SoccerNet-Caption.
Contact
If you have any questions, please feel free to contact jy_rao@sjtu.edu.cn or haoningwu3639@gmail.com.
