S3PRL
Self-Supervised Speech Pre-training and Representation Learning Toolkit
Install / Use
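The Change Log below mentions an S3PRL PyPI package. A minimal install sketch, assuming the package is published on PyPI under the name `s3prl` (use of a virtual environment is our suggestion, not a requirement):

```shell
# Create and activate an isolated environment (optional but recommended)
python -m venv s3prl-env
source s3prl-env/bin/activate

# Install the released S3PRL package from PyPI
pip install s3prl
```

See the online documentation for how to load and use the upstream models after installation.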
Contact
We prefer to have discussions directly on the GitHub issue page, so that all information is transparent to all contributors and automatically archived on GitHub. If you wish to use email, please contact:
- Shu-wen Yang (leo19941227@gmail.com)
- Andy T. Liu (liuandyt@gmail.com)
We encourage you to cite the individual papers most related to the functions you use, to give fair credit to their developers. You can find the names in the Change Log. Finally, we would like to thank our advisor, Prof. Hung-yi Lee, for his advice. The project would be impossible without his support.
If you have any questions (e.g., about who came up with or developed which ideas and functions, or how the project started), feel free to start an open conversation on the GitHub issue page, and we'll be happy to help!
Contribution (pull request)
Guideline
- Starting in 2024, we only accept new contributions in the form of new upstream models, so we can save bandwidth for developing new techniques (which will not be in S3PRL).
- S3PRL has transitioned into pure maintenance mode, ensuring the long-term maintenance of all existing functions.
- Reporting bugs or the PR fixing the bugs is always welcome! Thanks!
Tutorials
Environment compatibilities 
We support the following environments. The test cases are run with tox locally and on GitHub Actions:
| Env | versions |
| --- | --- |
| os | ubuntu-20.04 |
| python | 3.9, 3.10, 3.11, 3.12 |
| pytorch | 1.13.1, 2.0.1, 2.1.2, 2.2.2, 2.3.1, 2.4.0 |
Star History
Change Log
We only list the major contributors here for conciseness. However, we are deeply grateful for all the contributions. Please see the Contributors page for the full list.
- Sep 2024: Support MS-HuBERT (see MS-HuBERT)
- Dec 2023: Support Multi-resolution HuBERT (MR-HuBERT, see Multiresolution HuBERT)
- Oct 2023: Support ESPnet pre-trained upstream models (see ESPnet HuBERT and WavLabLM)
- Sep 2022: In JSALT 2022, we upgraded the codebase to support testing, documentation, and a new S3PRL PyPI package for easy installation and usage of upstream models. See our online doc for more information. The package is now used by many open-source projects, including ESPnet. Contributors: Shu-wen Yang (NTU), Andy T. Liu (NTU), Heng-Jui Chang (MIT), Haibin Wu (NTU) and Xuankai Chang (CMU).
- Mar 2022: Introduce SUPERB-SG, see Speech Translation by Hsiang-Sheng Tsai (NTU), Out-of-domain ASR by Heng-Jui Chang (NTU), Voice Conversion by Wen-Chin Huang (Nagoya), Speech Separation and Speech Enhancement by Zili Huang (JHU) for more info.
- Mar 2022: Introduce SSL for SE/SS by Zili Huang (JHU). See the SE1 and SS1 folders for more details. Note that improved performance can be achieved with the later-introduced SE2 and SS2; however, to align with SUPERB-SG benchmarking, please use version 1.
- Nov 2021: Introduce S3PRL-VC by Wen-Chin Huang (Nagoya), see Any-to-one for more info. We highly recommend considering the newly released official repo of S3PRL-VC, which is developed and actively maintained by Wen-Chin Huang. The standalone repo contains many more recipes for VC experiments. In S3PRL we only include the Any-to-one recipe for reproducing the SUPERB results.
- Oct 2021: Support DistilHuBERT by Heng-Jui Chang (NTU), see docs for more info.
- Sep 2021: We host a challenge in AAAI workshop: The 2nd Self-supervised Learning for Audio and Speech Processing! See SUPERB official site for the challenge details and the SUPERB documentation in this toolkit!
- Aug 2021: Andy T. Liu (NTU) and Shu-wen Yang (NTU) introduced the S3PRL toolkit in MLSS 2021; you can also watch it on YouTube!
- Aug 2021: TERA by Andy T. Liu (NTU) is accepted to TASLP!
- July 2021: We are now working on packaging s3prl and reorganizing the file structure in v0.3. Please consider using the stable v0.2.0 for now. We will test and release v0.3 before August.
- June 2021: Support SUPERB: Speech processing Universal PERformance Benchmark, submitted to Interspeech 2021. Use the tag superb-interspeech2021 or v0.2.0. Contributors: Shu-wen Yang (NTU), Pohan Chi (NTU), Yist Lin (NTU), Yung-Sung Chuang (NTU), Jiatong Shi (CMU), Xuankai Chang (CMU), Wei-Cheng Tseng (NTU), Tzu-Hsien Huang (NTU) and Kushal Lakhotia (Meta).
- June 2021: Support extracting multiple hidden states for all the SSL pretrained models by Shu-wen Yang (NTU).
- Jan 2021: Readme updated with detailed instructions on how to use our latest version!
- Dec 2020: We are migrating to a newer version for a more general, flexible, and scalable codebase. See the introduction below for more information! The legacy version can be accessed via the tag v0.1.0.
- Oct 2020: Shu-wen Yang (NTU) and Andy T. Liu (NTU) added various classic upstream models, including PASE+, APC, VQ-APC, NPC, wav2vec, vq-wav2vec, etc.
- Oct 2019: The birth of S3PRL! The repository was created for the Mockingjay development. Andy T. Liu (NTU), Shu-wen Yang (NTU) and Pohan Chi (NTU) implemented the pre-training scripts and several simple downstream evaluation tasks. This work was the very start of the S3PRL project and established many of its fundamental modules and coding styles.
