VideoDataset
A GPU-accelerated library that enables random frame access and efficient video decoding for data loading.
[!WARNING] VideoDataset is in the alpha phase. Expect frequent changes and instability. Feedback, comments, suggestions, and contributions are welcome!
Overview
VideoDataset is a high-performance video decoding library with support for multiple frameworks. It aims to provide framework-integrated solutions for video decoding tasks.
Key Features:
- GPU-accelerated video decoding using the NvCodec library
- Support for common video formats (H.264, H.265, etc.)
- Easy integration with multiple frameworks and formats
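To make the data-loading use case concrete, the sketch below shows the general shape of a map-style index that treats every frame of every video as one sample, which is the access pattern that random frame access serves. It does not use VideoDataset's API: `FrameIndex`, the file names, and the frame counts are hypothetical, and the decoding step itself is left out.

```python
from bisect import bisect_right
from itertools import accumulate


class FrameIndex:
    """Map a flat sample index to a (video path, frame number) pair."""

    def __init__(self, frame_counts):
        # frame_counts: dict of video path -> number of frames.
        self.paths = list(frame_counts)
        # Cumulative end offsets, e.g. counts [100, 50] -> [100, 150].
        self.ends = list(accumulate(frame_counts.values()))

    def __len__(self):
        return self.ends[-1] if self.ends else 0

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # First video whose cumulative end is greater than the index.
        video = bisect_right(self.ends, index)
        start = self.ends[video - 1] if video else 0
        return self.paths[video], index - start


# Hypothetical file names and frame counts, for illustration only.
index = FrameIndex({"a.mp4": 100, "b.mp4": 50})
print(len(index), index[0], index[99], index[100], index[149])
# 150 ('a.mp4', 0) ('a.mp4', 99) ('b.mp4', 0) ('b.mp4', 49)
```

A sampler can then shuffle flat indices freely, and each lookup yields the video and frame that a decoder would need to seek to.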
Installation
Prerequisites
- NVIDIA GPU with CUDA support and CUDA Toolkit 12.0+ installed
- Python 3.10 or later
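A quick way to check these prerequisites before installing is sketched below. It uses only the Python standard library. The helper name `check_prerequisites` is hypothetical and not part of VideoDataset, and finding `nvidia-smi` or `nvcc` on `PATH` is only a heuristic for a working driver and CUDA Toolkit (it does not verify the 12.0+ version).

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a dict of coarse prerequisite checks (hypothetical helper)."""
    return {
        # The README requires Python 3.10 or later.
        "python_ok": sys.version_info[:2] >= min_python,
        # nvidia-smi ships with the NVIDIA driver; its presence on PATH
        # suggests (but does not prove) that a usable GPU is installed.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        # nvcc ships with the CUDA Toolkit; the version is not checked here.
        "nvcc_found": shutil.which("nvcc") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```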
Install from PyPI
pip install agibot-videodataset
Building from Source
pip install git+https://github.com/AgiBot-World/VideoDataset.git
Quick Start
The complete example can be found in the quickstart documentation.
Documentation
Please refer to the full documentation here.
Sphinx-based documentation can also be generated locally by running the following command:
make dev-doc doc
This generates the documentation in the docs/_build/html directory and serves it at http://localhost:8000.
Performance
VideoDataset is optimized for high-throughput video processing. Benchmark results show:
- GPU Decoding: A decoding throughput of 20,000 FPS is achieved in a multiprocessing scenario.
- Random Access: Minimal overhead for non-sequential frame access.
- GPU Decoder Utilization: Over 90% GPU decoder utilization is achieved in a multiprocessing scenario.
See the benchmark documentation for detailed performance analysis.
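As background on why random access carries any overhead at all: in inter-frame codecs such as H.264 and H.265, most frames depend on earlier ones, so a decoder generally has to start from the nearest preceding keyframe. The sketch below counts how many frames must be decoded to reach a target frame. It is a generic model, not VideoDataset's implementation; the function name and keyframe positions are made up for illustration, and B-frame reordering is ignored.

```python
from bisect import bisect_right


def frames_to_decode(target, keyframes):
    """Number of frames decoded to reach `target` from the prior keyframe.

    `keyframes` is a sorted list of keyframe indices that starts at 0.
    """
    if target < 0:
        raise ValueError("target must be non-negative")
    # Position of the last keyframe at or before the target frame.
    pos = bisect_right(keyframes, target) - 1
    start = keyframes[pos]
    # Decode every frame from the keyframe up to and including the target.
    return target - start + 1


# Hypothetical video with a keyframe every 30 frames.
keyframes = list(range(0, 300, 30))
costs = [frames_to_decode(t, keyframes) for t in (0, 29, 30, 45)]
print(costs)  # [1, 30, 1, 16]
```

Under this model, a shorter keyframe interval lowers the worst-case seek cost, typically at the expense of a larger file.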
Comparison with other CPU decoding solutions
We also benchmarked VideoDataset against mainstream CPU software decoding solutions, including OpenCV, Torchvision (PyAV), Torchvision (VideoReader), and TorchCodec (CPU). The results show that VideoDataset achieves 3 to 4 times higher decoding throughput.

VideoDataset also significantly reduces CPU utilization compared with these solutions.
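For readers who want to run a similar comparison, the sketch below shows one way to measure decoding throughput in frames per second. It is not the project's benchmark harness: `measure_fps` and `dummy_decode` are hypothetical placeholders, and the dummy function should be swapped for a real decode call from the library under test.

```python
import time


def measure_fps(decode_fn, frame_indices, repeats=3):
    """Return the best observed throughput in frames per second.

    `decode_fn(index)` is any callable that decodes one frame.
    """
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        for idx in frame_indices:
            decode_fn(idx)
        elapsed = time.perf_counter() - start
        # Guard against a zero reading on very coarse timers.
        if elapsed > 0:
            best = max(best, len(frame_indices) / elapsed)
    return best


def dummy_decode(index):
    """Placeholder standing in for a real decoder call."""
    return sum(range(1000)) + index


if __name__ == "__main__":
    fps = measure_fps(dummy_decode, list(range(2000)))
    print(f"throughput: {fps:.0f} FPS")
```

The speedup between two decoders is then the ratio of their measured FPS values on the same frame indices and hardware.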

Development Status
- [X] GPU acceleration via NvCodec
- [X] Random frame access
- [X] PyTorch integration
- [ ] Compatibility with LeRobot
- [ ] Asynchronous pipeline optimization
License
MIT License. For more details, see the LICENSE file.
