Specimux
High-performance dual barcode and primer demultiplexer for MinION sequenced reads, optimized for fungal DNA barcoding.
Dual barcode and primer demultiplexing for MinION sequenced reads
Specimux is an independent project inspired by minibar.py (originally developed by the California Academy of Sciences). While building upon core demultiplexing concepts from minibar, Specimux represents a complete reimplementation with substantial algorithmic enhancements and architectural improvements.
Specimux is designed to improve the accuracy and throughput of DNA barcode identification for multiplexed MinION sequencing data, with a primary focus on serving the fungal sequencing community. Whereas minibar.py includes several processing methods supporting a variety of barcode designs and matching regimes, Specimux focuses specifically on high-precision demultiplexing of dual-indexed sequences.
The tool was developed and tested using the Mycomap ONT037 dataset, which comprises 768 specimens and approximately 765,000 nanopore reads in FastQ format. This real-world dataset provided a robust testing ground, ensuring Specimux's capabilities align closely with the needs of contemporary fungal biodiversity research. Specimux was designed to work seamlessly with the Primary Data Analysis protocol developed by Stephen Russell [1], serving the needs of community-driven fungal DNA barcoding projects.
Installation
Option 1: Install from GitHub (Recommended)
Virtual environment recommended: installing into a virtual environment avoids dependency conflicts with other Python packages:
# Create and activate virtual environment
python3 -m venv specimux-env
source specimux-env/bin/activate # On Windows: specimux-env\Scripts\activate
# Install latest version (includes visualization support)
pip install git+https://github.com/joshuaowalker/specimux.git
# Install with development tools
pip install "specimux[dev] @ git+https://github.com/joshuaowalker/specimux.git"
After installation, specimux commands are available:
specimux --version
specimux primers.fasta specimens.txt sequences.fastq -F -d
Note: Remember to activate your virtual environment (source specimux-env/bin/activate) each time you want to use specimux.
Option 2: Local Development Installation
For development or testing modifications:
# Clone the repository
git clone https://github.com/joshuaowalker/specimux.git
cd specimux
# Create virtual environment (Python 3.10+ required)
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e .
# Install with development tools
pip install -e ".[dev]"
Requirements
Python Version: Specimux requires Python 3.10 or newer, with full support for Python 3.10-3.13.
Specimux automatically installs these dependencies:
- edlib>=1.1.2 (sequence alignment)
- biopython>=1.81 (sequence handling)
- pybloomfilter3>=0.7.3 (performance optimization)
- cachetools>=5.3.0 (file handle caching)
- tqdm>=4.65.0 (progress bars)
- plotly>=5.0.0 (visualization support)
- watchdog>=3.0.0 (file system monitoring for specimux-watch)
- pyyaml>=5.0 (profile system)
Specimux has been tested on macOS and Linux.
Available Commands
After installation, specimux provides several command-line tools:
- specimux - Main demultiplexer for dual barcode and primer matching
- specimux-watch - (Deprecated) File watcher for live sequencing; use specimux-suite live instead
- specimine - Mine additional sequences from partial barcode matches
- specimux-convert - Convert legacy specimen files to current format
- specimux-stats - Analyze trace files to generate statistics
- specimux-visualize - Create interactive Sankey diagrams from statistics
Basic Usage
Specimux uses primer pools to organize specimens and their associated primers. Here's a basic example:
- Define primers and their pools (primers.fasta):
>ITS1F pool=ITS position=forward
CTTGGTCATTTAGAGGAAGTAA
>ITS4 pool=ITS position=reverse
TCCTCCGCTTATTGATATGC
- Create specimen file mapping barcodes to pools (specimens.txt):
SampleID PrimerPool FwIndex FwPrimer RvIndex RvPrimer
specimen1 ITS ACGTACGT ITS1F TGCATGCA ITS4
specimen2 ITS GTACGTAC ITS1F CATGCATG ITS4
- Run specimux:
specimux primers.fasta specimens.txt sequences.fastq -F -d
Dereplication
When the same read matches multiple primer pairs (common with complex primer pools), specimux deduplicates the output by default:
# Default: Select best match per specimen/barcode group (recommended)
specimux primers.fasta specimens.txt sequences.fastq --dereplicate best
# Output all equivalent matches (may cause read amplification)
specimux primers.fasta specimens.txt sequences.fastq --dereplicate none
The best strategy selects the optimal match per specimen using tiebreakers: barcode distance, primer count, primer distance, and file order. This prevents artificial read amplification that can occur when overlapping primers cause the same read to match through multiple permutations.
Note: If a read legitimately matches multiple different specimens (e.g., through different barcode combinations), it will still appear in each specimen's output file. Dereplication only prevents duplicate output when the same read matches the same specimen through different primer pair routes.
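The tiebreaker order described above can be illustrated with a tuple comparison. This is a minimal sketch, not Specimux's actual code: the `Match` record and `pick_best` function are hypothetical names, and the direction of each tiebreaker (lower distances win, more primers win, earlier file position wins) is an assumption rather than something the documentation states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical record for one route by which a read matched a specimen."""
    specimen: str
    barcode_distance: int  # summed edit distance of the barcode matches
    primer_count: int      # number of primers found in the read
    primer_distance: int   # summed edit distance of the primer matches
    file_order: int        # position of the entry in the input files


def pick_best(matches):
    """Keep one match per specimen, comparing tiebreakers in the documented order."""
    best = {}
    for m in matches:
        # Assumed directions: smaller distances are better, more primers are
        # better (hence the negation), and earlier file order wins last.
        key = (m.barcode_distance, -m.primer_count, m.primer_distance, m.file_order)
        if m.specimen not in best or key < best[m.specimen][0]:
            best[m.specimen] = (key, m)
    return [m for _, m in best.values()]
```

A read that matches two different specimens yields two entries, one per specimen, which mirrors the note above: only repeated routes to the same specimen are collapsed.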
For a full list of options:
specimux -h
Profiles
Profiles allow you to save and reuse parameter presets for different workflows. A profile is a YAML file that sets default values for specimux options — any option explicitly provided on the command line overrides the profile value.
Using Profiles
# List available profiles
specimux --list-profiles
# Run with a profile
specimux primers.fasta specimens.txt sequences.fastq -p default -F -d
# CLI arguments override profile values
specimux primers.fasta specimens.txt sequences.fastq -p default --search-len 120
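The precedence rule (command line over profile, profile over built-in defaults) can be pictured as a layered dictionary merge. The sketch below is illustrative only: `resolve_options` is a hypothetical helper, the numeric values are invented, and it assumes an option that was not given on the command line is represented as `None`.

```python
def resolve_options(builtin_defaults, profile_values, cli_values):
    """Merge option layers so that later layers override earlier ones."""
    merged = dict(builtin_defaults)
    merged.update(profile_values)
    # Assumption: options absent from the command line arrive as None,
    # so they must not overwrite the profile or default value.
    merged.update({k: v for k, v in cli_values.items() if v is not None})
    return merged


# Invented example values, not Specimux's real defaults.
defaults = {"search-len": 80, "threads": 1, "trim": "none"}
profile = {"search-len": 120, "threads": 8}
cli = {"search-len": 100, "threads": None}
options = resolve_options(defaults, profile, cli)
```

Here `search-len` comes from the command line, `threads` from the profile, and `trim` from the defaults.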
Profile Resolution
Profiles are loaded in this order (first match wins):
- User profiles in ~/.config/specimux/profiles/
- Bundled profiles shipped with the package
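A first-match-wins lookup like this can be sketched in a few lines. The function name `find_profile` is hypothetical, and the sketch assumes a profile named `NAME` is stored as `NAME.yaml`, as the file names in this section suggest. The location of the bundled directory is not documented here, so the caller passes it in.

```python
from pathlib import Path


def find_profile(name, search_dirs):
    """Return the first NAME.yaml found in search_dirs, or None if absent."""
    for directory in search_dirs:
        candidate = Path(directory) / f"{name}.yaml"
        if candidate.is_file():
            return candidate  # earlier directories shadow later ones
    return None
```

Passing the user directory first and the bundled directory second reproduces the order above, so a user profile with the same name shadows a bundled one.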
Creating Custom Profiles
Copy the example profile that is created in ~/.config/specimux/profiles/ on first use:
cp ~/.config/specimux/profiles/example.yaml ~/.config/specimux/profiles/my-workflow.yaml
Edit the file to set your preferred defaults. Only include parameters you want to change:
specimux-version: "0.7.*"
description: "My custom workflow"
specimux:
  search-len: 120
  trim: primers
  threads: 8
Available Profile Parameters
Profiles support the following specimux parameters: trim, dereplicate, search-len, index-edit-distance, primer-edit-distance, min-length, max-length, threads, disable-prefilter, disable-preorient, sample-topq, diagnostics.
Version Compatibility
Each profile declares a specimux-version pattern (e.g., "0.7.*"). Specimux validates this on load and raises an error if the profile is incompatible with the installed version, preventing silent parameter mismatches after upgrades.
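One way to picture the version check is shell-style wildcard matching. This is a sketch under that assumption: the documentation shows the pattern "0.7.*" but does not say how Specimux interprets it, and `profile_is_compatible` and `check_profile` are hypothetical names.

```python
from fnmatch import fnmatchcase


def profile_is_compatible(installed_version, pattern):
    """Return True if the version matches a glob-style pattern such as '0.7.*'."""
    return fnmatchcase(installed_version, pattern)


def check_profile(installed_version, pattern):
    """Raise an error for an incompatible profile instead of loading it silently."""
    if not profile_is_compatible(installed_version, pattern):
        raise ValueError(
            f"profile requires specimux {pattern}, found {installed_version}"
        )
```

Under this interpretation "0.7.3" satisfies "0.7.*", while "0.8.0" and "0.70.1" do not.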
Progress Reporting
For integration with orchestration tools like specimux-suite, specimux can write JSONL progress updates to a file:
specimux primers.fasta specimens.txt sequences.fastq -F --progress-file progress.jsonl
Each line is a JSON object with type ("progress" or "complete"), processed, matched, and rate fields. Progress lines are throttled to at most one per second.
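A consumer of this file only needs a JSON parser. The sketch below reads records from any iterable of lines and reports the most recent one. It relies only on the field names listed above; `summarize_progress` is a hypothetical helper, and the units of `rate` are not specified here, so the value is passed through unchanged.

```python
import json


def summarize_progress(lines):
    """Return (latest_record, finished) from an iterable of JSONL lines."""
    latest = None
    finished = False
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        latest = record
        if record.get("type") == "complete":
            finished = True
    return latest, finished
```

To use it on a real run, open the file passed to --progress-file and hand the file object to `summarize_progress`. Re-reading the file periodically is enough for a simple monitor, given that updates are written at most once per second.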
Primer Pool Organization
Primer pools are a core organizing principle in Specimux, allowing logical grouping of primers and specimens. A pool defines:
- Which primers can be used together
- Which specimens belong to which primer sets
- How output files are organized
Pool Design Benefits
- Organize specimens by target region (e.g., ITS, RPB2)
- Support shared primers between pools
- Improve performance by limiting primer search space
- Provide logical output organization
Primer File Format
Primers are specified in a text file in FASTA format with metadata in the description line:
>primer_name pool=pool1,pool2 position=forward
PRIMER_SEQUENCE
Required metadata:
- pool= - Comma/semicolon separated list of pool names
- position= - Either "forward" or "reverse"
Example for fungal ITS and RPB2 regions:
>ITS1F pool=ITS,Mixed position=forward
CTTGGTCATTTAGAGGAAGTAA
>ITS4 pool=ITS position=reverse
TCCTCCGCTTATTGATATGC
>fRPB2-5F pool=RPB2 position=forward
GAYGAYMGWGATCAYTTYGG
>RPB2-7.1R pool=RPB2 position=reverse
CCCATRGCYTGYTTMCCCATDGC
Although the file is technically in FASTA format, you can name it primers.fasta, primers.txt, or anything that makes sense for your workflow.
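To make the header convention concrete, here is a small standalone parser for this format. It is a sketch written with the standard library only and is not how Specimux itself reads the file. The function name `parse_primer_fasta` and the returned dictionary layout are hypothetical.

```python
import re


def parse_primer_fasta(lines):
    """Parse primer records with pool= and position= metadata from FASTA lines."""
    primers = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name, _, description = line[1:].partition(" ")
            # Collect key=value pairs from the description line.
            fields = dict(
                item.split("=", 1) for item in description.split() if "=" in item
            )
            # Pool names may be separated by commas or semicolons.
            pools = [p for p in re.split(r"[,;]", fields.get("pool", "")) if p]
            position = fields.get("position")
            if not pools or position not in ("forward", "reverse"):
                raise ValueError(
                    f"primer {name!r} needs pool= and position=forward|reverse"
                )
            current = {"name": name, "pools": pools,
                       "position": position, "sequence": ""}
            primers.append(current)
        elif current is not None:
            current["sequence"] += line.upper()
    return primers
```

Applied to the example above, ITS1F is reported as belonging to both the ITS and Mixed pools, which is how a primer is shared between pools.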
Specimen File Format
Tab-separated file with columns:
- SampleID - Unique identifier for specimen
- PrimerPool - Which pool the specimen belongs to
- FwIndex - Forward barcode sequence
- FwPrimer - Forward primer name or wildcard (*/-)
- RvIndex - Reverse barcode sequence
- RvPrimer - Reverse primer name or wildcard (*/-)
Example:
SampleID PrimerPool FwIndex FwPrimer RvIndex RvPrimer
specimen1 ITS ACGTACGT ITS1F TGCATGCA ITS4
specimen2 RPB2 GTACGTAC * CATGCATG *
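A specimen file in this layout can be checked before a run with the standard csv module. The sketch below is a hypothetical pre-flight check, not part of Specimux. It assumes the header row uses exactly the column names listed above, and it reads "(*/-)" as meaning that either `*` or `-` acts as a wildcard.

```python
import csv

EXPECTED_COLUMNS = ["SampleID", "PrimerPool", "FwIndex",
                    "FwPrimer", "RvIndex", "RvPrimer"]
WILDCARDS = {"*", "-"}  # assumed reading of "(*/-)"


def read_specimens(handle):
    """Read a tab-separated specimen file and flag wildcard primer columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")
    seen = set()
    rows = []
    for row in reader:
        if row["SampleID"] in seen:
            raise ValueError(f"duplicate SampleID: {row['SampleID']}")
        seen.add(row["SampleID"])
        row["FwWildcard"] = row["FwPrimer"] in WILDCARDS
        row["RvWildcard"] = row["RvPrimer"] in WILDCARDS
        rows.append(row)
    return rows
```

Open the file with `newline=""` as the csv module recommends, then pass the handle to `read_specimens`. The duplicate check reflects the requirement that SampleID be unique.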
Output Organization
Specimux organizes output with match quality at the top level, making it easy to access your primary data (full matches) while keeping partial matches and unknowns organized separately:
output_dir/
  full/                    # All complete matches (PRIMARY DATA)
    ITS/                   # Pool-level aggregation
      specimen1.fastq      # All ITS full matches collected here
      specimen2.fastq
      primers.fasta        # All primers in the ITS pool
      ITS1F-ITS4/          # Primer-pair specific matches
        specimen1.fastq
