SeqKit - a cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Try SeqKit in your browser (Tutorials and Exercises provided by sandbox.bio)
Documents: http://bioinf.shenwei.me/seqkit (Usage, FAQs, Tutorial, and Benchmark)
Source code: https://github.com/shenwei356/seqkit

Features

Easy to install (download)
- Providing statically linked executable binaries for multiple platforms (Linux/Windows/macOS, amd64/arm64)
- Lightweight and out-of-the-box: no dependencies, no compilation, no configuration
- conda install -c bioconda seqkit
Easy to use
- Ultrafast (see technical-details and benchmark)
- Seamlessly parsing both FASTA and FASTQ formats
- Supporting (gzip/xz/zstd/bzip2/lz4 compressed) STDIN/STDOUT and input/output files, easily integrated into pipelines
- Reproducible results (configurable random seed in sample and shuffle)
- Supporting custom sequence ID via regular expression
- Supporting Bash/Zsh autocompletion
Versatile commands (usages and examples)
- Practical functions supported by 38 subcommands
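
The pipe support, seeded sampling, and custom ID regular expression listed above can be sketched as follows. This is a minimal illustration, not an excerpt from the SeqKit documentation: the file names and the demo records are made up, the `--id-regexp` pattern is only one possible choice, and the SeqKit steps are skipped if `seqkit` is not on the PATH.

```shell
# Create a tiny FASTA file to work with (hypothetical demo data).
cat > demo.fa <<'EOF'
>seq1 first record
ACGTACGTAC
>seq2 second record
GGGGCCCC
>seq3 third record
ATATATATATATAT
EOF

# Run the SeqKit steps only if the binary is available.
if command -v seqkit >/dev/null 2>&1; then
    # Read gzip-compressed FASTA from STDIN, keep sequences of length >= 10,
    # and write plain FASTA to STDOUT.
    gzip -c demo.fa | seqkit seq -m 10 > long.fa

    # Using the same seed (-s) twice should give the same sample.
    seqkit sample -n 2 -s 11 demo.fa > sample_a.fa
    seqkit sample -n 2 -s 11 demo.fa > sample_b.fa
    cmp sample_a.fa sample_b.fa && echo "samples are identical"

    # Print only IDs, taking the second whitespace-separated word of each
    # header as the ID (illustrative pattern; adapt to your own headers).
    seqkit seq -n -i --id-regexp '^\S+\s(\S+)\s' demo.fa
fi
```

Because every step reads from a file or STDIN and writes to STDOUT, the commands can be chained with ordinary shell pipes.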

Installation

Method 1: Download binaries

Go to the Download Page, where you can find download links for various platforms.

Method 2: Install via Pixi

pixi global install -c bioconda seqkit

Method 3: Install via conda

conda install -c bioconda seqkit

Method 4: Install via homebrew

brew install seqkit
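
Whichever method is used, a quick check along these lines can confirm that the binary ended up on the PATH. It is only a sketch; the `status` variable is a name chosen for this example.

```shell
# Report the installed version if seqkit can be found, otherwise warn.
if command -v seqkit >/dev/null 2>&1; then
    seqkit version
    status="installed"
else
    echo "seqkit not found on PATH" >&2
    status="missing"
fi
echo "seqkit is ${status}"
```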

Subcommands

|Category         |Command    |Function                                                                                     |Input          |Strand-sensitivity|Multi-threads|
|:----------------|:----------|:--------------------------------------------------------------------------------------------|:--------------|:-----------------|:------------|
|Basic operation  |seq        |Transform sequences: extract ID/seq, filter by length/quality, remove gaps…                  |FASTA/Q        |                  |             |
|                 |stats      |Simple statistics: #seqs, min/max_len, N50, Q20%, Q30%…                                      |FASTA/Q        |                  |✓            |
|                 |subseq     |Get subsequences by region/gtf/bed, including flanking sequences                             |FASTA/Q        |+ or/and -        |             |
|                 |sliding    |Extract subsequences in sliding windows                                                      |FASTA/Q        |+ only            |             |
|                 |faidx      |Create the FASTA index file and extract subsequences (with more features than samtools faidx)|FASTA          |+ or/and -        |             |
|                 |translate  |Translate DNA/RNA to protein sequence                                                        |FASTA/Q        |+ or/and -        |             |
|                 |watch      |Monitoring and online histograms of sequence features                                        |FASTA/Q        |                  |             |
|                 |scat       |Real time concatenation and streaming of fastx files                                         |FASTA/Q        |                  |✓            |
|Format conversion|fq2fa      |Convert FASTQ to FASTA format                                                                |FASTQ          |                  |             |
|                 |fx2tab     |Convert FASTA/Q to tabular format                                                            |FASTA/Q        |                  |             |
|                 |fa2fq      |Retrieve corresponding FASTQ records by a FASTA file                                         |FASTA/Q        |+ only            |             |
|                 |tab2fx     |Convert tabular format to FASTA/Q format                                                     |TSV            |                  |             |
|                 |convert    |Convert FASTQ quality encoding between Sanger, Solexa and Illumina                           |FASTA/Q        |                  |             |
|Searching        |grep       |Search sequences by ID/name/sequence/sequence motifs, mismatch allowed                       |FASTA/Q        |+ and -           |partly, -m   |
|                 |locate     |Locate subsequences/motifs, mismatch allowed                                                 |FASTA/Q        |+ and -           |partly, -m   |
|                 |amplicon   |Extract amplicon (or specific region around it), mismatch allowed                            |FASTA/Q        |+ and -           |partly, -m   |
|                 |fish       |Look for short sequences in larger sequences                                                 |FASTA/Q        |+ and -           |             |
|Set operation    |sample     |Sample sequences by number or proportion                                                     |FASTA/Q        |                  |             |
|                 |sample2    |Sample sequences by number or proportion (version 2)                                         |FASTA/Q        |                  |             |
|                 |rmdup      |Remove duplicated sequences by ID/name/sequence                                              |FASTA/Q        |+ and -           |             |
|                 |common     |Find common sequences of multiple files by id/name/sequence                                  |FASTA/Q        |+ and -           |             |
|                 |duplicate  |Duplicate sequences N times                                                                  |FASTA/Q        |                  |             |
|                 |split      |Split sequences into files by id/seq region/size/parts (mainly for FASTA)                    |FASTA preferred|                  |             |
|                 |split2     |Split sequences into files by size/parts (FASTA, PE/SE FASTQ)                                |FASTA/Q        |                  |             |
|                 |head       |Print the first N FASTA/Q records, or leading records whose total length >= L                |FASTA/Q        |                  |             |
|                 |head-genome|Print sequences of the first genome with common prefixes in na                               |               |                  |             |
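
As a rough illustration of how a few of the subcommands in the table are invoked, the sketch below runs `stats`, `grep`, `rmdup`, and `fx2tab` on a small FASTA file. The file name and records are hypothetical demo data, and the SeqKit calls are skipped when `seqkit` is not installed.

```shell
# Write a small FASTA file; r1 and r3 deliberately share the same sequence.
cat > reads.fa <<'EOF'
>r1
ACGTACGT
>r2
TTTTGGGG
>r3
ACGTACGT
EOF

if command -v seqkit >/dev/null 2>&1; then
    # Basic operation: summary statistics for the file.
    seqkit stats reads.fa

    # Searching: keep records whose sequence (-s) contains the pattern (-p).
    seqkit grep -s -p ACGT reads.fa

    # Set operation: remove records with duplicated sequences (-s).
    seqkit rmdup -s reads.fa

    # Format conversion: one tab-separated line per record.
    seqkit fx2tab reads.fa
fi
```

Each subcommand writes to STDOUT by default, so the output of one call can be piped into the next.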
