
Nanowakeword

A lightweight, open-source, and intelligent wake word detection engine. Train custom, high-accuracy models with minimal effort.


<p align="center"> <img src="https://raw.githubusercontent.com/arcosoph/nanowakeword/main/assets/logo/logo_0.png" alt="Logo" width="290"> </p> <p align="center"> <a href="https://colab.research.google.com/github/arcosoph/nanowakeword/blob/main/notebooks/Train_Your_First_Wake_Word_Model.ipynb"><img alt="Open In Colab" src="https://img.shields.io/badge/Open%20in%20Colab-FFB000?logo=googlecolab&logoColor=white"></a> <a href="https://discord.gg/rYfShVvacB"><img alt="Join the Discord" src="https://img.shields.io/badge/Join%20the%20Discord-5865F2?logo=discord&logoColor=white"></a> <a href="https://pypi.org/project/nanowakeword/"><img alt="PyPI" src="https://img.shields.io/pypi/v/nanowakeword.svg?color=6C63FF&logo=pypi&logoColor=white"></a> <a href="https://pypi.org/project/nanowakeword/"><img alt="Python" src="https://img.shields.io/pypi/pyversions/nanowakeword.svg?color=3776AB&logo=python&logoColor=white"></a> <a href="https://pepy.tech/projects/nanowakeword"><img alt="PyPI Downloads" src="https://static.pepy.tech/personalized-badge/nanowakeword?period=total&units=INTERNATIONAL_SYSTEM&left_color=GRAY&right_color=BLACK&left_text=downloads"></a> <a href="https://pypi.org/project/nanowakeword/"><img alt="License" src="https://img.shields.io/pypi/l/nanowakeword?color=white&logo=apache&logoColor=black"></a> </p>

NanoWakeWord is a next-generation, adaptive framework designed to build high-performance, custom wake word models. More than just a tool, it’s an intelligent engine that understands your data and optimizes the entire training process to deliver exceptional accuracy and efficiency.

Quick Access

Choose Your Architecture, Build Your Pro Model

NanoWakeWord is a versatile framework offering a rich library of neural network architectures. Each is optimized for different scenarios, allowing you to build the perfect model for your specific needs. This Colab notebook lets you experiment with any of them.

| Architecture | Recommended Use Case | Performance Profile | Start Training |
| :--- | :--- | :--- | :--- |
| DNN | General use on resource-constrained devices (e.g., MCUs). | Fastest Training, Low Memory | ▶️ Launch |
| RNN | Baseline experiments or educational purposes. | Better than DNN | ▶️ Launch |
| CNN | Short, sharp, and explosive wake words. | Efficient Feature Extraction | ▶️ Launch |
| LSTM | Noisy environments or complex, multi-syllable phrases. | Best-in-Class Noise Robustness | ▶️ Launch |
| GRU | A faster, lighter alternative to LSTM with similar high performance. | Balanced: Speed & Robustness | ▶️ Launch |
| CRNN | Challenging audio requiring both feature and context analysis. | Hybrid Power: CNN + RNN | ▶️ Launch |
| TCN | Modern, high-speed sequential processing. | Faster than RNN (Parallel) | ▶️ Launch |
| QuartzNet | Top accuracy with a small footprint on edge devices. | Parameter-Efficient & Accurate | ▶️ Launch |
| Transformer | Deep contextual understanding via self-attention. | SOTA Performance & Flexibility | ▶️ Launch |
| Conformer | State-of-the-art hybrid for ultimate real-world performance. | SOTA: Global + Local Features | ▶️ Launch |
| E-Branchformer | Bleeding-edge research for potentially the highest accuracy. | Peak Accuracy Potential | ▶️ Launch |


> [!NOTE]
> NanoWakeWord is under active development. For important updates, version-specific notes, and the latest stability status of all features, please refer to our official status document.

➡️ View Latest Release Notes & Project Status

State-of-the-Art Features and Architecture

Nanowakeword is not merely a tool; it's a holistic, end-to-end ecosystem engineered to democratize the creation of state-of-the-art, custom wake word models. It moves beyond simple scripting by integrating a series of automated, production-grade systems that orchestrate the entire lifecycle—from data analysis and feature engineering to advanced training and deployment-optimized inference.

<details> <summary><strong>1. Automated ML Engineering for Peak Performance</strong></summary>

The cornerstone and "brain" of the framework is its data-driven configuration engine. This system performs a holistic analysis of your dataset and hardware environment, replacing hours of manual, error-prone hyperparameter tuning with a single, intelligent process. It crafts an optimized training baseline by jointly determining:

  • Adaptive Architectural Scaling: It doesn't just use a fixed architecture; it sculpts one for you. The engine dynamically scales the model's complexity—tuning its depth, width, and regularization (e.g., layers, neurons, dropout) to perfectly match the volume and complexity of your training data. This core function is critical for preventing both underfitting on small datasets and overfitting on large ones.

  • Optimized Training & Convergence Strategy: Based on data characteristics, it formulates a multi-stage, dynamic learning rate schedule and determines the precise training duration required to reach optimal convergence. This ensures the model is trained to its full potential without wasting computational resources on diminishing returns.

  • Hardware-Aware Performance Tuning: The engine profiles your entire hardware stack (CPU cores, system RAM, and GPU VRAM) to maximize throughput at every stage. It calculates the maximum efficient batch sizes for data generation, augmentation, and model training, ensuring that your hardware's full potential is unlocked.

  • Automatic Pre-processing: Just drop your raw audio files (.mp3, .flac, .pcm, etc.) into the data folders — NanoWakeWord automatically handles resampling, channel conversion, and format standardization.

  • Data-Driven Augmentation Policy: Rather than applying a generic augmentation strategy, the engine crafts a custom augmentation policy. It analyzes the statistical properties of your provided noise and reverberation files to tailor the intensity, probability, and type of on-the-fly augmentations, creating a training environment that mirrors real-world challenges.
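NanoWakeWord's internal pre-processing code isn't shown here, but the format-standardization step described above (mono downmix plus resampling) can be sketched in a few lines of NumPy. The linear-interpolation resampler below is a simplification; a production pipeline would use a polyphase or sinc resampler for better quality.

```python
import numpy as np

def standardize(audio: np.ndarray, sr_in: int, sr_out: int = 16000) -> np.ndarray:
    """Convert a waveform to mono and resample it to sr_out Hz."""
    # Collapse stereo (shape: [samples, channels]) to mono by averaging.
    if audio.ndim == 2:
        audio = audio.mean(axis=1)
    if sr_in == sr_out:
        return audio
    # Linear-interpolation resampling (quality trade-off noted above).
    duration = audio.shape[0] / sr_in
    n_out = int(round(duration * sr_out))
    t_in = np.linspace(0.0, duration, num=audio.shape[0], endpoint=False)
    t_out = np.linspace(0.0, duration, num=n_out, endpoint=False)
    return np.interp(t_out, t_in, audio)

# Example: 1 s of 44.1 kHz stereo noise -> 1 s of 16 kHz mono.
clip = np.random.randn(44100, 2)
out = standardize(clip, sr_in=44100)
print(out.shape)  # (16000,)
```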

While this engine provides a state-of-the-art baseline, it does not sacrifice flexibility. Advanced users retain full, granular control and can override any of the dozens of automatically generated parameters by simply specifying their desired value in the .yaml file.
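An override might look like the following `.yaml` fragment. The keys shown here are illustrative only (consult the config file NanoWakeWord generates for the actual parameter names); the point is that any value you set explicitly takes precedence over the engine's computed default.

```yaml
# Hypothetical override file: values set here win over the
# configuration engine's automatically derived ones.
model:
  architecture: gru
  layers: 3            # engine would otherwise scale this with dataset size
  dropout: 0.2
training:
  batch_size: 256      # cap below the hardware-derived maximum
  learning_rate: 0.001
```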

</details> <details> <summary><strong>2. The Production-Grade Data Pipeline: From Raw Audio to Optimized Features</strong></summary>

Recognizing that data is the bedrock of any great model, Nanowakeword automates the entire data engineering lifecycle with a pipeline designed for scale and quality:

  • Phonetic Adversarial Negative Generation: This is a key differentiator. The system moves beyond generic noise and random words by performing a phonetic analysis of your wake word. It then synthesizes acoustically confusing counter-examples—phrases that sound similar but are semantically different. This forces the model to learn fine-grained phonetic boundaries, dramatically reducing the false positive rate in real-world use.

  • Dynamic On-the-Fly Augmentation: During training, a powerful augmentation engine injects a rich tapestry of real-world acoustic scenarios in real-time. This includes applying background noise at varying SNR levels, convolving clips with room impulse responses (RIR) for realistic reverberation, and applying a suite of other transformations like pitch shifting and filtering.

  • Seamless Large-Scale Data Handling (mmap): The framework sidesteps the memory limits of conventional in-memory pipelines by memory-mapping pre-computed features from disk, letting it stream training sets far larger than available RAM.

</details>
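The core of the SNR-controlled noise mixing described above is a standard audio-augmentation technique. A minimal NumPy sketch (not NanoWakeWord's actual implementation) scales the noise so the mixture hits a target signal-to-noise ratio in dB:

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add background noise to a clip at a target SNR (in dB)."""
    # Tile or trim the noise to match the clip length.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose scale so 10 * log10(clean_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)   # stand-in for a wake word clip
noise = rng.standard_normal(4000)     # stand-in for a noise recording
augmented = mix_at_snr(speech, noise, snr_db=10.0)
```

Sweeping `snr_db` over a range (e.g. 0 to 20 dB) during training exposes the model to both near-clean and heavily degraded conditions.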
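The memory-mapped data handling can be illustrated with NumPy's built-in `mmap_mode`, which is one common way (though not necessarily NanoWakeWord's exact mechanism) to stream feature matrices larger than RAM; the 96-column "mel bins" shape below is an arbitrary example:

```python
import os
import tempfile
import numpy as np

# Write a feature matrix to disk, then open it memory-mapped: the OS pages
# data in on demand, so the array can be far larger than physical RAM.
path = os.path.join(tempfile.mkdtemp(), "features.npy")
features = np.random.rand(1000, 96).astype(np.float32)  # e.g. 1000 frames x 96 bins
np.save(path, features)

mapped = np.load(path, mmap_mode="r")   # no full read into memory
batch = np.asarray(mapped[0:32])        # only this slice is materialized
print(batch.shape)  # (32, 96)
```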
