
Openlrc

Transcribe and translate audio into LRC subtitle files using Whisper and LLMs (GPT, Claude, et al.).

Install / Use

/learn @zh-plus/Openlrc

README

Open-Lyrics


Open-Lyrics is a Python library that transcribes audio with faster-whisper, then uses LLMs from providers such as OpenAI and Anthropic to translate and polish the text into .lrc subtitle files.

Key Features

  • Audio preprocessing to reduce hallucinations (loudness normalization and optional noise suppression).
  • Context-aware translation to improve translation quality. Check prompt for details.
  • Check here for an overview of the architecture.
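As an illustration of what the loudness-normalization step in preprocessing does (a minimal sketch, not openlrc's actual implementation, which operates on real audio files), samples can be rescaled toward a target RMS level while guarding against clipping:

```python
import math

def rms(samples):
    """Root-mean-square level of a block of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def normalize_loudness(samples, target_rms=0.1):
    """Scale samples so their RMS matches target_rms, without clipping."""
    current = rms(samples)
    if current == 0:
        return list(samples)
    gain = target_rms / current
    # Clamp the gain so no sample exceeds [-1.0, 1.0].
    peak = max(abs(s) for s in samples)
    if peak > 0:
        gain = min(gain, 1.0 / peak)
    return [s * gain for s in samples]

quiet = [0.01, -0.02, 0.015, -0.005]
louder = normalize_loudness(quiet)
```

Bringing quiet passages up to a consistent level like this gives the ASR model a more uniform signal, which is one way preprocessing can reduce hallucinated transcript segments.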

New 🚨

  • 2024.5.7:
    • Added custom endpoint (base_url) support for OpenAI and Anthropic:
      lrcer = LRCer(
          translation=TranslationConfig(
              base_url_config={'openai': 'https://api.chatanywhere.tech',
                               'anthropic': 'https://example/api'}
          )
      )
      
    • Added bilingual subtitle generation:
      lrcer.run('./data/test.mp3', target_lang='zh-cn', bilingual_sub=True)
      
  • 2024.5.11: Added glossary support in prompts to improve domain-specific translation. Check here for details.
  • 2024.5.17: You can route models to arbitrary chatbot SDKs (OpenAI or Anthropic) by setting chatbot_model to provider: model_name together with base_url_config:
    lrcer = LRCer(
        translation=TranslationConfig(
            chatbot_model='openai: claude-3-haiku-20240307',
            base_url_config={'openai': 'https://api.g4f.icu/v1/'}
        )
    )
    
  • 2024.6.25: Added Gemini as a translation model (for example, gemini-1.5-flash):
    lrcer = LRCer(translation=TranslationConfig(chatbot_model='gemini-1.5-flash'))
    
  • 2024.9.10: Now openlrc depends on a specific commit of faster-whisper, which is not published on PyPI. Install it from source:
    pip install "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/8327d8cc647266ed66f6cd878cf97eccface7351.tar.gz"
    
  • 2024.12.19: Added ModelConfig for model routing, which is more flexible than a plain model-name string: ModelConfig(provider='<provider>', name='<model-name>', base_url='<url>', proxy='<proxy>'), e.g.:
    
    from openlrc import LRCer, TranslationConfig, ModelConfig, ModelProvider
    
    chatbot_model1 = ModelConfig(
        provider=ModelProvider.OPENAI, 
        name='deepseek-chat', 
        base_url='https://api.deepseek.com/beta', 
        api_key='sk-APIKEY'
    )
    chatbot_model2 = ModelConfig(
        provider=ModelProvider.OPENAI, 
        name='gpt-4o-mini', 
        api_key='sk-APIKEY'
    )
    lrcer = LRCer(translation=TranslationConfig(chatbot_model=chatbot_model1, retry_model=chatbot_model2))
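The glossary feature mentioned above works by steering the LLM with fixed term mappings. A hypothetical sketch of how such a mapping could be folded into a translation prompt (the function name and template here are illustrative, not openlrc's actual prompt):

```python
def build_glossary_prompt(glossary: dict) -> str:
    """Render term mappings as a prompt section the model must respect."""
    lines = [f"- Translate '{src}' as '{dst}'" for src, dst in glossary.items()]
    return "Use the following glossary strictly:\n" + "\n".join(lines)

prompt = build_glossary_prompt({"feudal": "封建时代", "2TC": "双TC"})
```

Pinning domain terms this way keeps translations consistent across chunks, since each translation request sees the same mandated vocabulary.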
    

Installation ⚙️

  1. Install CUDA 11.x and cuDNN 8 for CUDA 11 first according to https://opennmt.net/CTranslate2/installation.html to enable faster-whisper.

    faster-whisper also needs cuBLAS for CUDA 11 installed.

    <details> <summary>For Windows Users (click to expand)</summary>

    (Windows only) Purfview's whisper-standalone-win provides the required NVIDIA libraries for Windows in a single archive. Decompress the archive and place the libraries in a directory included in the PATH.

    </details>
  2. Add LLM API keys as environment variables (recommended for most users: OPENROUTER_API_KEY).

  3. Install ffmpeg and add bin directory to your PATH.

  4. This project can be installed from PyPI:

    pip install openlrc
    

    or install directly from GitHub:

    pip install git+https://github.com/zh-plus/openlrc
    
  5. Install the latest faster-whisper from source:

    pip install "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/8327d8cc647266ed66f6cd878cf97eccface7351.tar.gz"
    
  6. Install PyTorch:

    pip install --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
    
  7. Fix the typing-extensions issue:

    pip install typing-extensions -U
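
    After completing the steps, it can help to sanity-check that the pieces are discoverable. A small hand-rolled checker (not part of openlrc) using only the standard library:

    ```python
    import importlib.util
    import shutil

    def check_prereqs() -> dict:
        """Report whether key prerequisites from the steps above are discoverable."""
        return {
            "ffmpeg": shutil.which("ffmpeg") is not None,                          # step 3
            "openlrc": importlib.util.find_spec("openlrc") is not None,            # step 4
            "faster_whisper": importlib.util.find_spec("faster_whisper") is not None,  # step 5
            "torch": importlib.util.find_spec("torch") is not None,                # step 6
        }

    status = check_prereqs()
    for name, ok in status.items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
    ```

    Anything reported MISSING points back to the corresponding installation step.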
    

Lightweight Imports

OpenLRC keeps several package-root APIs lightweight to import.

The following imports are guaranteed not to eagerly load heavyweight runtime dependencies such as torch, spacy, faster-whisper, tiktoken, or lingua:

import openlrc
from openlrc import LRCer
from openlrc import TranscriptionConfig, TranslationConfig
from openlrc import ModelConfig, ModelProvider, list_chatbot_models

This is useful when you only need configuration objects, model metadata, or the LRCer type itself without immediately starting transcription or language-processing work.

Heavy dependencies are loaded only when the corresponding features are first used. For example:

  • faster-whisper is loaded when transcription is first needed.
  • torch and df.enhance are loaded when noise suppression is used.
  • spacy is loaded when sentence segmentation or related NLP helpers are used.
  • tiktoken is loaded when token counting is used.
  • lingua is loaded when language detection helpers are used.

[!NOTE] Lightweight imports improve import-time behavior only. They do not change installation requirements: pip install openlrc still installs the full dependency set declared by the package.
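The pattern behind this behavior is deferred importing: a heavy module is resolved only on first attribute access. A generic sketch of the idea (an illustration of the technique, not openlrc's internal code), using math as a stand-in for a heavyweight dependency like torch:

```python
import importlib

class LazyModule:
    """Proxy that imports the real module on first attribute access."""

    def __init__(self, name: str):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Called only when normal attribute lookup fails, i.e. for
        # attributes of the wrapped module, not _name/_module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

math_lazy = LazyModule("math")
# Nothing is imported yet; this first access triggers the import.
result = math_lazy.sqrt(16)  # → 4.0
```

Constructing the proxy is nearly free, so module import time stays low; the cost of loading the dependency is paid only by code paths that actually use it.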

Usage 🐍

Python code

import os
from openlrc import LRCer, TranscriptionConfig, TranslationConfig, ModelConfig, ModelProvider

if __name__ == '__main__':
    lrcer = LRCer()

    # Single file
    lrcer.run('./data/test.mp3',
              target_lang='zh-cn')  # Generate translated ./data/test.lrc with default translate prompt.

    # Multiple files
    lrcer.run(['./data/test1.mp3', './data/test2.mp3'], target_lang='zh-cn')
    # Note we run the transcription sequentially, but run the translation concurrently for each file.

    # Path can contain video
    lrcer.run(['./data/test_audio.mp3', './data/test_video.mp4'], target_lang='zh-cn')
    # Generate translated ./data/test_audio.lrc and ./data/test_video.srt

    # Use glossary to improve translation
    lrcer = LRCer(translation=TranslationConfig(glossary='./data/aoe4-glossary.yaml'))

    # To skip translation process
    lrcer.run('./data/test.mp3', target_lang='en', skip_trans=True)

    # Change asr_options or vad_options (see openlrc.defaults for details)
    vad_options = {"threshold": 0.1}
    lrcer = LRCer(transcription=TranscriptionConfig(vad_options=vad_options))
    lrcer.run('./data/test.mp3', target_lang='zh-cn')

    # Enhance the audio using noise suppression (consumes more time).
    lrcer.run('./data/test.mp3', target_lang='zh-cn', noise_suppress=True)

    # Change the translation model
    lrcer = LRCer(translation=TranslationConfig(chatbot_model='claude-3-sonnet-20240229'))
    lrcer.run('./data/test.mp3', target_lang='zh-cn')

    # Clear temp folder after processing done
    lrcer.run('./data/test.mp3', target_lang='zh-cn', clear_temp=True)

    # Use OpenRouter via ModelConfig (custom base_url + routed model name)
    openrouter_model = ModelConfig(
        provider=ModelProvider.OPENAI,
        name='anthropic/claude-3.5-haiku',
        base_url='https://openrouter.ai/api/v1',
        api_key=os.getenv('OPENROUTER_API_KEY')
    )
    fallback_model = ModelConfig(
        provider=ModelProvider.OPENAI,
        name='openai/gpt-4.1-nano',
        base_url='https://openrouter.ai/api/v1',
        api_key=os.getenv('OPENROUTER_API_KEY')
    )
    lrcer = LRCer(
        translation=TranslationConfig(chatbot_model=openrouter_model, retry_model=fallback_model)
    )

    # Bilingual subtitle
    lrcer.run('./data/test.mp3', target_lang='zh-cn', bilingual_sub=True)

See the Documentation for more details.

Glossary

Add a glossary to improve domain-specific translation. For example, aoe4-glossary.yaml:

{
  "aoe4": "帝国时代4",
  "feudal": "封建时代",
  "2TC": "双TC",
  "English": "英格兰文明",
  "scout": "侦察兵"
}

Then pass it to the translation config:

lrcer = LRCer(translation=TranslationConfig(glossary='./data/aoe4-glossary.yaml'))
lrcer.run('./data/test.mp3', target_lang='zh-cn')
