
NBLM2PPTX

Convert NotebookLM PDFs to PPTX with separated background images and editable text layers using Gemini AI

Install / Use

/learn @laihenyi/NBLM2PPTX
About this skill

Quality Score

0/100

Supported Platforms

Gemini CLI

README

NBLM2PPTX - NotebookLM PDF to PPTX Converter

License: MIT

Convert NotebookLM exported PDFs into PPTX presentations with separated background images and editable text layers.

Updated (2026-01-21): v2.2.1 Release - Complete i18n Overhaul! All language versions now feature professional light theme design with improved UX and standardized documentation.

繁體中文 | 简体中文 | 日本語 | Español | Français

Demo

🎬 Product Video (40s)

▶ Watch Demo Video

Direct Download (4.7 MB)

Background Music: "Happy Upbeat Ukulele" by MaxKoMusic (CC BY-SA 3.0)

v1.1 - Hybrid Text Extraction

| Original (NotebookLM PDF) | Output (Editable PPTX) |
|:-------------------------:|:----------------------:|
| <img src="assets/demo-v1.1-original.jpg" width="400"> | <img src="assets/demo-v1.1-output.jpg" width="400"> |

PDF.js native text extraction provides precise text positioning without additional API calls.

v1.0 - AI Text Removal

| Before (NotebookLM PDF) | After (Editable PPTX) |
|:-----------------------:|:---------------------:|
| <img src="assets/demo-after.png" width="400"> | <img src="assets/demo-before.png" width="400"> |

Left: Original PDF from NotebookLM (text embedded in image)

Right: Converted PPTX with clean background + editable text layers

What's New in v2.3 (2026-01-21)

⚡ Dual Mode OCR System

  • Lite Model (Default): Uses gemini-2.5-flash-lite for OCR, 40-50% faster with 50% API quota savings
  • Standard Model (Optional): Uses gemini-2.5-flash for full font size, weight, and color style detection
  • User Flexibility: Switch OCR models during page selection to balance speed and quality based on your needs
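
As a rough illustration of the two modes, the sketch below maps a mode name to the model ID quoted above. Only the two model IDs come from these release notes; the function name `pickOcrModel`, the `detectsStyle` flag, and the table layout are hypothetical and not taken from the project's code.

```javascript
// Hypothetical mode table; only the two model IDs come from the release notes.
const OCR_MODES = new Map([
  ["lite", { model: "gemini-2.5-flash-lite", detectsStyle: false }],
  ["standard", { model: "gemini-2.5-flash", detectsStyle: true }],
]);

// Return the OCR settings for a mode, falling back to the default (lite)
// when the mode name is missing or unknown.
function pickOcrModel(mode = "lite") {
  return OCR_MODES.get(mode) ?? OCR_MODES.get("lite");
}
```

A page-selection UI could call `pickOcrModel("standard")` when the user opts in to style detection and `pickOcrModel()` otherwise.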

🚀 Parallel Processing Technology

  • Simultaneous Execution: Text removal and OCR run concurrently, no waiting
  • Reduced Processing Time: From 3-4 seconds per page down to 2-3 seconds
  • Intelligent Fault Tolerance: A single API failure doesn't affect the overall workflow, improving stability
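
One way to get both concurrency and fault tolerance is `Promise.allSettled`, which waits for every promise and reports each outcome separately. The sketch below is an assumption about how such a step could look, not the project's actual code: `removeText` and `runOcr` are hypothetical async functions supplied by the caller, and the fallback choices are illustrative.

```javascript
// Run both steps for one page concurrently; a failure in one does not abort the other.
// removeText and runOcr are hypothetical async functions supplied by the caller.
async function processPage(page, removeText, runOcr) {
  const [bg, ocr] = await Promise.allSettled([removeText(page), runOcr(page)]);
  return {
    // Fall back to the original page image if text removal failed.
    background: bg.status === "fulfilled" ? bg.value : page.image,
    // Fall back to no text boxes if OCR failed.
    textBoxes: ocr.status === "fulfilled" ? ocr.value : [],
    // Collect failure messages so the UI can report them.
    errors: [bg, ocr]
      .filter((r) => r.status === "rejected")
      .map((r) => String(r.reason)),
  };
}
```

With this shape, a page whose OCR call fails still yields a slide with a cleaned background, and the caller can decide whether to retry based on `errors`.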

💡 Clear Usage Guidelines

  • Lite Model Best For: Plain text notes, meeting minutes, content drafts (when visual formatting doesn't matter)
  • Standard Model Best For: Beautiful presentations, brand showcases, teaching slides (when visual hierarchy matters)
  • Transparent Limitations: Clear communication about the Lite model's inability to detect font styles

📊 Output Comparison

| Lite Model | Standard Model |
|:--------:|:--------:|
| <img src="assets/demo-v2.3-lite.jpg" width="400"> | <img src="assets/demo-v2.3-standard.jpg" width="400"> |

Lite Model: All text uses uniform styling, no font size variation (faster, saves API quota)

Standard Model: Fully preserves font size hierarchy between titles and body text (complete style detection)


What's New in v2.2.1 (2026-01-21)

🌍 Complete i18n Overhaul

  • Professional Design Across All Languages: Completely redesigned all language versions (English, Spanish, Japanese, French, Simplified Chinese) from dark theme to modern light theme
  • Unified Font System: Migrated to Poppins (headings) + Open Sans (body) with language-specific fallbacks (Noto Sans JP, Noto Sans SC, etc.)
  • Professional Blue Color Scheme: Implemented consistent #3B82F6 primary color across all versions for trust and professionalism
  • Enhanced API Key Modal: Browser-based API Key storage with localStorage integration eliminates need for code editing
  • Collapsible UI Elements: Added collapsible alert banner and tools section for cleaner interface
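
The browser-based key storage mentioned above can be sketched with the Web Storage API. This is a minimal illustration under stated assumptions: the storage key name `gemini_api_key` and both function names are hypothetical, and the `storage` argument stands in for `window.localStorage` so the logic can run outside a browser.

```javascript
// Hypothetical storage key; the project's real key name is not documented here.
const STORAGE_KEY = "gemini_api_key";

// `storage` is any object with the Web Storage methods
// (window.localStorage in a browser).
function saveApiKey(storage, key) {
  const trimmed = key.trim();
  if (!trimmed) throw new Error("API key is empty");
  storage.setItem(STORAGE_KEY, trimmed);
}

function loadApiKey(storage) {
  return storage.getItem(STORAGE_KEY); // null when nothing has been saved
}
```

In a browser, a modal's save button might call `saveApiKey(window.localStorage, input.value)`, and the app would call `loadApiKey(window.localStorage)` on startup.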

📚 Standardized Documentation

  • Comprehensive READMEs: All language README files now follow the same comprehensive 204-line structure
  • Quick Start Guide: Added 3-step quick start instructions for better onboarding
  • Free API Quota Details: Clear documentation of Google Gemini's free tier (15 RPM, 1500 RPD, no credit card)
  • Complete FAQ Section: 5 Q&A pairs covering common questions about API keys, security, failures, sharing, and offline usage

🎨 Design System Updates

  • Light Theme: #F8FAFC background for better readability
  • Modern Card Layout: Clean borders (#E2E8F0) and subtle shadows
  • Professional SVG Icons: Replaced emoji icons with proper SVG graphics
  • Responsive Typography: Optimized font sizes and spacing for all screen sizes

What's New in v2.2 (2026-01-20)

🎯 Soft Reset with API Key Persistence

  • No More Re-entering: API Key is now preserved in memory when you click "Restart"
  • Unlimited Restarts: Process multiple batches without re-entering your API Key
  • Smart State Management: Resets all processing state while keeping your credentials
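
A soft reset of this kind can be modeled as rebuilding the state object while carrying one field over. The state shape and function names below are hypothetical and only illustrate the idea; the project's real state may look different.

```javascript
// Hypothetical app state; field names are illustrative only.
function createInitialState(apiKey = null) {
  return { apiKey, files: [], selectedPages: [], results: [], status: "idle" };
}

// Soft reset: rebuild all processing state but keep the credential.
function softReset(state) {
  return createInitialState(state.apiKey);
}
```

Because `softReset` returns a fresh object, no stale results or selections survive the restart, while the key stays available for the next batch.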

⚡ Speed Optimization

  • ~70% Shorter Inter-Page Delay: Reduced the delay between pages from 3.5s to 1.0s
  • Parallel Processing: Leverages concurrent API calls for maximum efficiency
  • Instant Reset: Soft reset returns to initial state immediately without page reload
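
To see what the change from 3.5s to 1.0s means for a whole batch, the arithmetic can be worked through for a sample job. The 20-page batch size is an arbitrary example, and the helper name is hypothetical; the calculation covers only the waiting time between pages, not the API calls themselves.

```javascript
// Total time spent waiting between pages: one delay between each consecutive pair.
function totalInterPageDelay(pageCount, delaySeconds) {
  return Math.max(pageCount - 1, 0) * delaySeconds;
}

const before = totalInterPageDelay(20, 3.5); // 19 gaps at 3.5 s each
const after = totalInterPageDelay(20, 1.0);  // 19 gaps at 1.0 s each
const reduction = (before - after) / before; // fraction of waiting time saved
```

For the 20-page example this gives 66.5 seconds of waiting before and 19 seconds after, a reduction of about 71%.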

🔧 IMAGE_RECITATION Error Fix

  • Improved AI Prompt: Enhanced prompt engineering to avoid copyright detection
  • Better Background Reconstruction: More accurate content-aware fill results
  • Reduced Temperature: More consistent AI behavior with temperature 0.4
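
For readers unfamiliar with where the temperature setting lives, the sketch below builds a request body in the JSON shape used by the Gemini REST `generateContent` endpoint. It assumes the app talks to that endpoint directly, which these notes do not confirm. The function name and the prompt wording are placeholders; only the temperature value of 0.4 comes from the notes above.

```javascript
// Build a generateContent request body for one page image.
// The prompt wording is a placeholder; only the temperature (0.4)
// comes from the release notes.
function buildTextRemovalRequest(
  base64Png,
  prompt = "Remove all text and fill in the background."
) {
  return {
    contents: [
      {
        parts: [
          { text: prompt },
          { inline_data: { mime_type: "image/png", data: base64Png } },
        ],
      },
    ],
    // A lower temperature makes the model's output more consistent.
    generationConfig: { temperature: 0.4 },
  };
}
```

The returned object would be serialized with `JSON.stringify` and sent as the POST body of the request.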

📝 UI Improvements

  • Clearer Instructions: Updated API Key setup guide to match actual workflow
  • Clean Reset UI: Restored initial upload interface on reset instead of loading spinner

Features

  • AI Text Removal: Uses Gemini 2.5 Flash Image (the model listed under Technical Specifications) to automatically remove text from images and reconstruct backgrounds
  • Hybrid Text Extraction: PDF sources use native PDF.js extraction for precise coordinates; image sources use enhanced Gemini OCR
  • Separated Layers: Exported PPTX contains background images and text as independent layers for easy editing
  • Batch Processing: Supports processing multiple PDF pages or images at once
  • Page Selection: Freely select which pages to process, saving time and API quota

Usage

Quick Start (3 Simple Steps)

  1. Open the HTML file in your browser (Chrome/Edge recommended)
  2. Follow the guided setup to get your free API Key from Google
  3. Start processing your PDF or images immediately!

First-Time Setup

When you first open the application, a friendly setup wizard will guide you through:

  1. Visit Google AI Studio - One-click link to aistudio.google.com/apikey
  2. Create Your Free API Key - Sign in with your Google account (no credit card required)
  3. Paste and Save - Copy your API Key and paste it into the app

🔒 Your API Key is stored securely in your browser and never uploaded to any server.

Free API Quota

Google Gemini API offers a generous free tier:

  • 15 requests per minute
  • 1,500 requests per day
  • No credit card required

This is more than enough for typical daily use!
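
To estimate how many pages fit in the free tier, divide each limit by the number of API calls a page needs. The sketch below assumes two calls per page (one for text removal, one for OCR) as the default; that figure is an assumption, and PDF sources that use PDF.js extraction instead of OCR would need only one. The function name is hypothetical.

```javascript
// Free-tier limits as quoted above.
const LIMITS = { perMinute: 15, perDay: 1500 };

// How many pages fit in the quota if each page costs `callsPerPage` requests.
// callsPerPage = 2 is an assumption (one text-removal call plus one OCR call).
function pageBudget(callsPerPage = 2, limits = LIMITS) {
  return {
    pagesPerMinute: Math.floor(limits.perMinute / callsPerPage),
    pagesPerDay: Math.floor(limits.perDay / callsPerPage),
  };
}
```

Under the two-call assumption this works out to 7 pages per minute and 750 pages per day; with one call per page the budget is 15 and 1,500.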

Alternative: Using in Google Gemini Canvas (Advanced)

If you prefer to run the app in the Gemini Canvas environment:

  1. Open Google Gemini
  2. Paste the code from 01.html into Canvas
  3. Click "Preview" to run

⚠️ Note: As of January 2026, an API Key is still required even in the Canvas environment. The app will prompt you to set it up.

Workflow

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Upload PDF │ -> │   Select    │ -> │ AI Process  │ -> │ Export PPTX │
│  or Images  │    │   Pages     │    │ Remove Text │    │ BG + Text   │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘

Step 1: Upload Files

  • Drag and drop or click to upload NotebookLM exported PDFs
  • Also supports JPG, PNG, WebP and other image formats
  • Multiple files can be uploaded at once

Tip: NotebookLM exported PDFs can be quite large. Compressing them with a free PDF compression service before uploading reduces file size and speeds up processing.

Step 2: Select Pages

  • The system automatically generates thumbnails for all pages
  • Check the pages you want to process (all selected by default)
  • Click "Start Processing" to proceed
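
The selection behavior described above (everything checked by default, individual pages toggled off) can be sketched with a `Set` of 1-based page numbers. All three function names are hypothetical.

```javascript
// Start with every page selected, matching the default described above.
function initSelection(pageCount) {
  return new Set(Array.from({ length: pageCount }, (_, i) => i + 1));
}

// Flip one page's checkbox; returns a new Set and leaves the input untouched.
function togglePage(selection, pageNumber) {
  const next = new Set(selection);
  if (!next.delete(pageNumber)) next.add(pageNumber);
  return next;
}

// List the selected pages in ascending page order.
function pagesToProcess(selection) {
  return [...selection].sort((a, b) => a - b);
}
```

Only the pages returned by `pagesToProcess` would be sent to the API, which is where the time and quota savings come from.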

Step 3: AI Processing

  • Gemini removes text from each page and reconstructs the background
  • Progress is displayed in real-time
  • Each page takes approximately 3-5 seconds (including API latency)

Note: Gemini text removal may sometimes be incomplete. If you notice excessive text residue, you can try processing again.

Step 4: Export PPTX

  • Select presentation ratio (16:9 / 9:16 / 4:3)
  • Click "Export PPTX" to download
  • Text positioning uses a hybrid strategy:
    • PDF sources: Uses pre-extracted coordinates from PDF.js (instant, no API call)
    • Image sources: Uses Gemini OCR with enhanced styling detection
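
For PDF sources, the extracted coordinates have to be mapped from PDF space (points, origin at the bottom-left) to slide space (origin at the top-left). The sketch below shows one way to do that mapping. It assumes text items shaped like PDF.js text content items (`str`, `transform`, `width`, `height`) and slide sizes in inches; the slide dimensions, the function name, and the returned object shape are assumptions for illustration, not the project's actual code.

```javascript
// Slide sizes in inches; these values are assumptions for illustration.
const SLIDE_SIZES = {
  "16:9": { w: 10, h: 5.625 },
  "9:16": { w: 5.625, h: 10 },
  "4:3": { w: 10, h: 7.5 },
};

// item: { str, transform: [a, b, c, d, x, y], width, height } in PDF points,
// with the PDF origin at the bottom-left. page: { width, height } in points.
function toSlideBox(item, page, ratio = "16:9") {
  const slide = SLIDE_SIZES[ratio];
  const [, , , , x, baselineY] = item.transform;
  // Flip the y axis: distance from the top of the page to the top of the text.
  const top = page.height - (baselineY + item.height);
  return {
    text: item.str,
    x: (x / page.width) * slide.w,
    y: (top / page.height) * slide.h,
    w: (item.width / page.width) * slide.w,
    h: (item.height / page.height) * slide.h,
  };
}
```

The sketch treats the item's `y` as the text baseline and ignores rotation and descenders, so a real converter would likely need small adjustments per font.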

Output Structure

Each slide in the exported PPTX contains:

| Layer | Content |
|-------|---------|
| Bottom | Clean background image with text removed |
| Top | Editable text boxes (positioned to match original text) |

This layered structure allows you to:

  • Easily modify text content
  • Change fonts, colors, and sizes
  • Adjust text positions
  • Preserve the original design style
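
The two-layer structure can be pictured as an ordered list with the background first and the text boxes on top. The object shape and function name below are hypothetical; a real exporter would translate such a description into whatever its PPTX library expects.

```javascript
// Describe one slide as an ordered list of layers, bottom first.
// The object shape is hypothetical and not tied to any PPTX library.
function buildSlide(backgroundDataUrl, textBoxes) {
  return {
    layers: [
      // Bottom layer: the cleaned background, stretched to fill the slide.
      { type: "image", src: backgroundDataUrl, x: 0, y: 0, w: "100%", h: "100%" },
      // Top layers: one editable text box per extracted text item.
      ...textBoxes.map((box) => ({ type: "text", editable: true, ...box })),
    ],
  };
}
```

Keeping the text boxes as separate entries is what lets each one be moved or restyled without touching the background.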

Technical Specifications

| Item | Description |
|------|-------------|
| AI Model | Gemini 2.5 Flash Image (Text Removal) + Gemini 2.5 Flash (OCR) |
| Text Removal | Optimized prompt for complete text erasure wi


Languages

HTML

Security Score

100/100

Audited on Mar 24, 2026

No findings