Seqkit
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Install / Use
/learn @shenwei356/SeqkitREADME
SeqKit - a cross-platform and ultrafast toolkit for FASTA/Q file manipulation
- Try SeqKit in your browser (Tutorials and Exercises provided by sandbox.bio)
- Documents: http://bioinf.shenwei.me/seqkit (Usage, FAQs, Tutorial, and Benchmark)
- Source code: https://github.com/shenwei356/seqkit
- Latest version:
- Please cite:
- Others:
<a href="https://doi.org/10.1002/imt2.191"><img src="seqkit2.jpg" alt="Subcommands of SeqKit2" width="700"/></a>
Features
- Easy to install (download)
- Providing statically linked executable binaries for multiple platforms (Linux/Windows/macOS, amd64/arm64)
- Light weight and out-of-the-box, no dependencies, no compilation, no configuration
conda install -c bioconda seqkit
- Easy to use
- Ultrafast (see technical-details and benchmark)
- Seamlessly parsing both FASTA and FASTQ formats
- Supporting (
gzip/xz/zstd/bzip2/lz4compressed) STDIN/STDOUT and input/output file, easily integrated in pipe - Reproducible results (configurable rand seed in
sampleandshuffle) - Supporting custom sequence ID via regular expression
- Supporting Bash/Zsh autocompletion
- Versatile commands (usages and examples)
- Practical functions supported by 38 subcommands
Installation
Method 1: Download binaries
Go to Download Page, where you can find download links to various platforms.
Method 2: Install via Pixi
pixi global install -c bioconda seqkit
Method 3: Install via conda
conda install -c bioconda seqkit
Method 4: Install via homebrew
brew install seqkit
Subcommands
|Category |Command |Function |Input |Strand-sensitivity|Multi-threads| |:----------------|:-------------------------------------------------------------------|:--------------------------------------------------------------------------------------------|:--------------|:-----------------|:------------| |Basic operation |seq |Transform sequences: extract ID/seq, filter by length/quality, remove gaps… |FASTA/Q | | | | |stats |Simple statistics: #seqs, min/max_len, N50, Q20%, Q30%… |FASTA/Q | |✓ | | |subseq |Get subsequences by region/gtf/bed, including flanking sequences |FASTA/Q |+ or/and - | | | |sliding |Extract subsequences in sliding windows |FASTA/Q |+ only | | | |faidx |Create the FASTA index file and extract subsequences (with more features than samtools faidx)|FASTA |+ or/and - | | | |translate |translate DNA/RNA to protein sequence |FASTA/Q |+ or/and - | | | |watch |Monitoring and online histograms of sequence features |FASTA/Q | | | | |scat |Real time concatenation and streaming of fastx files |FASTA/Q | |✓ | |Format conversion|fq2fa |Convert FASTQ to FASTA format |FASTQ | | | | |fx2tab |Convert FASTA/Q to tabular format |FASTA/Q | | | | |fa2fq |Retrieve corresponding FASTQ records by a FASTA file |FASTA/Q |+ only | | | |tab2fx |Convert tabular format to FASTA/Q format |TSV | | | | |convert |Convert FASTQ quality encoding between Sanger, Solexa and Illumina |FASTA/Q | | | |Searching |grep |Search sequences by ID/name/sequence/sequence motifs, mismatch allowed |FASTA/Q |+ and - |partly, -m | | |locate |Locate subsequences/motifs, mismatch allowed |FASTA/Q |+ and - |partly, -m | | |amplicon |Extract amplicon (or specific region around it), mismatch allowed |FASTA/Q |+ and - |partly, -m | | |fish |Look for short sequences in larger sequences |FASTA/Q |+ and - | | |Set operation |sample |Sample sequences by number or proportion |FASTA/Q | | | | |sample2 |Sample sequences by number or proportion (version 2) |FASTA/Q | | | | |rmdup |Remove duplicated sequences by ID/name/sequence |FASTA/Q |+ and - | | | |common |Find common sequences of multiple files by id/name/sequence |FASTA/Q |+ and - | | | |duplicate |Duplicate sequences N times |FASTA/Q | | | | |split |Split sequences into files by id/seq region/size/parts (mainly for FASTA) |FASTA preffered| | | | |split2 |Split sequences into files by size/parts (FASTA, PE/SE FASTQ) |FASTA/Q | | | | |head |print the first N FASTA/Q records, or leading records whose total length >= L |FASTA/Q | | | | |head-genome |Print sequences of the first genome with common prefixes in na
