ISeq
Download sequencing data and metadata from GSA, SRA, ENA, and DDBJ databases.
iSeq: An integrated tool to fetch public Sequencing data
Cite us: Haoyu Chao, Zhuojin Li, Dijun Chen, Ming Chen, iSeq: An integrated tool to fetch public sequencing data, Bioinformatics, 2024, btae641, https://doi.org/10.1093/bioinformatics/btae641
Description
iSeq is a Bash script that allows you to download sequencing data and metadata from GSA, SRA, ENA, and DDBJ databases. See Detail Pipeline for iSeq. Here is the basic pipeline of iSeq:

[!IMPORTANT] To use iSeq, your system must be connected to the network and support the FTP, HTTP, and HTTPS protocols.
Update Notes:
2025.11.20
- ⚠️ Update Notice: Due to recent changes in the GSA API, we have updated the way iSeq retrieves metadata so that it remains compatible with the latest GSA API. Please update iSeq to version 1.9.8 or later.
2025.10.21
- New -Q, --quiet option: Added this option to suppress download progress bars. Useful when logging clean outputs.
- Fixed a bug where the --database option failed to switch when ENA was inaccessible.
- If the MD5 check ultimately fails, delete the partially downloaded files.
2025.10.09
- Fixed a bug when checking md5 of SRA files.
2025.09.04
- ⚠️ Update Notice: Due to recent changes in the ENA API, we have updated the way iSeq retrieves metadata so that it remains compatible with the latest ENA API. Please update iSeq to version 1.9.5 or later.
2025.07.22
- New -r, --protocol option to specify the protocol only when downloading files from ENA.
2025.06.16
- When using -e, --merge, create symbolic links or retain the original Run files to avoid re-downloading them after merging.
- Fixed the issue mentioned in #40: modified the behavior so that batch downloads do not terminate upon encountering an error, and instead continue until all items are processed.
- Added an error message when download failures occur, such as "Download failures detected, please check fail.log for details."
- Fixed a bug where incomplete downloads from GSA were incorrectly reported as successful.
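The merge behaviour described in this update can be pictured with a small Bash sketch. It is not iSeq's actual code: the merge_runs helper and all file names are hypothetical, and it only assumes that several gzip FASTQ files are joined by concatenation (concatenated gzip members form a valid gzip stream) while a single Run is symlinked rather than copied, so the original Run files stay in place.

```bash
#!/usr/bin/env bash
# Sketch only: merge gzip FASTQ files while keeping the per-Run files.
# merge_runs and every file name below are hypothetical, not taken from iSeq.
set -euo pipefail

merge_runs() {
    # Usage: merge_runs OUTPUT.fastq.gz RUN1.fastq.gz [RUN2.fastq.gz ...]
    local out="$1"
    shift
    if [ "$#" -eq 1 ]; then
        # A single Run needs no merging: point a symbolic link at it.
        ln -sf "$(realpath "$1")" "$out"
    else
        # Concatenated gzip members decompress as one continuous stream.
        cat "$@" > "$out"
    fi
}

# Demo on two tiny made-up Runs in a temporary directory.
workdir="$(mktemp -d)"
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > "$workdir/runA.fastq.gz"
printf '@r2\nTTGA\n+\nIIII\n' | gzip -c > "$workdir/runB.fastq.gz"

merge_runs "$workdir/experiment.fastq.gz" "$workdir/runA.fastq.gz" "$workdir/runB.fastq.gz"
merge_runs "$workdir/single.fastq.gz" "$workdir/runA.fastq.gz"

merged_lines="$(gzip -dc "$workdir/experiment.fastq.gz" | wc -l | tr -d ' ')"
echo "lines in merged file: $merged_lines"
```

Because the Run files are left untouched, a later invocation can find them on disk instead of fetching them again.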
2025.05.23
- Fixed the issue mentioned in #39. The problem was that using both -d sra and -g together would skip the MD5 check in vdb-validate.
- New -k, --skip-md5 option: Added this option to disable MD5 checks.
2025.04.25
- Fixed a bug that occurred when re-downloading with empty metadata.
- Fixed a bug where the while loop exited abnormally with a non-zero exit code.
2025.04.22
- Fixed the issue mentioned in #33: the -s, --speed option is enabled again.
- Fixed the exception raised when the metadata file is empty, mentioned in #34.
- Fixed the MD5 checksum failure when downloading gzip data from ONT or HiFi third-generation sequencing.
2025.04.02
- Fixed the issue mentioned in #27 and Rednote: In sra-tools > 3.0.0, running vdb-validate without specifying the SRA file path causes it to re-download the file, leading to a stuck process. Specifying the path (e.g., vdb-validate ./SRR931847) resolves the issue.
2025.03.14
- Fixed the issue mentioned in #26. The cause was that the data was paired-end but had only one link, such as SRR23680070.
2025.03.11
- Input file can contain accessions from different databases.
- -p and -a can be used simultaneously, with -a taking priority.
- Fixed some bugs when retrying data downloads from GSA.
2024.12.26
- Fixed the bugs mentioned in #16, #17 (2024.12.16) and https://github.com/BioOmics/iSeq/issues/19 (2024.12.26).
2024.11.21
- Dependency update for aspera-cli: The version requirement for aspera-cli has been updated from aspera-cli to aspera-cli=4.14.0.
2024.10.23
- New -s, --speed option to set the download speed limit (MB/s) (default: 1000 MB/s). For example: iseq -i SRR7706354 -s 10
- Dependency update for sra-tools: The version requirement for sra-tools has been updated from sra-tools=2.11 to sra-tools>=2.11.0.
2024.09.14
- New -e option for merging FASTQ files: Added a -e option to merge multiple FASTQ files into a single file for each Experiment (-e ex), Sample (-e sa), or Study (-e st).
- New -i option for input: iSeq can now accept a file containing multiple accession numbers as input by -i fileName.
- API change for GSA metadata download: The API endpoint has been updated from getRunInfo to getRunInfoByCra for downloading GSA metadata.
- Save result to personal directory: The output results will now be saved in the user's personal directory by the -o option.
- Updated regex for SAMC matching: The matching pattern for SAMC has been changed from SAMC[A-Z]?[0-9]+ to SAMC[0-9]+.
- Fixed some bugs.
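To see what the SAMC regex change means in practice, the two patterns can be compared with grep. This is only a sketch: the two test strings are made up for illustration, the matches helper is hypothetical, and matching the whole string with -x is an assumption about usage rather than a description of how iSeq applies the pattern internally.

```bash
#!/usr/bin/env bash
# Sketch: compare the old and new SAMC patterns on made-up strings.
old_pattern='SAMC[A-Z]?[0-9]+'
new_pattern='SAMC[0-9]+'

matches() {
    # Usage: matches PATTERN STRING
    # Succeeds when the whole string matches the extended regex.
    printf '%s\n' "$2" | grep -Eqx "$1"
}

for id in SAMC123456 SAMCA123456; do
    for name in old_pattern new_pattern; do
        # ${!name} expands to the value of the variable whose name is in $name.
        if matches "${!name}" "$id"; then
            result="match"
        else
            result="no match"
        fi
        echo "$id against $name: $result"
    done
done
```

The only difference is the optional letter after SAMC: the old pattern accepts it, the new one requires digits immediately after the prefix.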
Features
- Multiple Database Support: Supports multiple bioinformatics databases (GSA/SRA/ENA/DDBJ/GEO).
- Multiple Input Formats: Supports multiple accessions (Project, Study, Sample, Experiment, or Run accession).
- Metadata Download: Supports downloading sample metadata for each accession.
- File Format Selection: Users can choose to directly download gzip-formatted FASTQ files or download SRA files and convert them to FASTQ format.
- Multi-threading Support: Supports the use of multi-threading to accelerate the conversion of SRA to FASTQ files or the compression of FASTQ files.
- File Merging: For experiment-level accession, the script can merge multiple FASTQ files into one.
- Parallel Download: Supports parallel download connections, allowing the specification of the number of connections to speed up download speeds.
- Support for Aspera High-speed Download: For GSA/ENA databases, the script supports high-speed data transfer using Aspera.
- Automatic Retry Mechanism: If a download or verification fails, the script will automatically retry until a set number of attempts have been reached.
- Automated File Verification: After the download is complete, the script will automatically verify the integrity of the files, including checking file sizes and MD5 checksums.
- Error Handling: The script provides error messages and suggestions for solutions when encountering errors.
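The retry and verification features above can be sketched in a few lines of Bash. This is an illustration under stated assumptions, not iSeq's implementation: the function names fetch_with_retry and verify_md5, the attempt limit, and the use of md5sum from GNU coreutils are all choices made for this example, and fake_download stands in for a real transfer command.

```bash
#!/usr/bin/env bash
# Sketch of retry-then-verify logic; not iSeq's actual implementation.
# Function names, the attempt limit, and md5sum are assumptions for illustration.

verify_md5() {
    # Usage: verify_md5 FILE EXPECTED_MD5
    local actual
    actual="$(md5sum "$1" 2>/dev/null | awk '{print $1}')"
    [ "$actual" = "$2" ]
}

fetch_with_retry() {
    # Usage: fetch_with_retry FILE EXPECTED_MD5 MAX_ATTEMPTS COMMAND [ARGS...]
    local file="$1" expected="$2" max="$3"
    shift 3
    local attempt
    for ((attempt = 1; attempt <= max; attempt++)); do
        # Run the download command, then check that the file exists and matches.
        if "$@" && [ -f "$file" ] && verify_md5 "$file" "$expected"; then
            echo "OK after $attempt attempt(s): $file"
            return 0
        fi
        echo "Attempt $attempt/$max failed for $file" >&2
    done
    # All attempts failed: remove the partial or corrupt file.
    rm -f "$file"
    return 1
}

# Demo: fake_download is a stand-in that writes a tiny FASTQ record locally.
demo_dir="$(mktemp -d)"
target="$demo_dir/reads.fastq"
fake_download() { printf '@r1\nACGT\n+\nIIII\n' > "$target"; }
expected="$(printf '@r1\nACGT\n+\nIIII\n' | md5sum | awk '{print $1}')"

fetch_with_retry "$target" "$expected" 3 fake_download
```

Passing the download command as trailing arguments keeps the retry loop independent of the transfer tool, so the same wrapper could be placed around any downloader.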
Installation
1. iSeq can be installed by conda easily
conda install bioconda::iseq
- If conda reports "Found conflicts!", you can try:
conda install -c conda-forge -c bioconda iseq
2. The latest version of iSeq can also be installed from source, see INSTALL
# Use the following command to check whether dependent software is installed
iseq --version
Example (See more)
- Download all Run sequencing data and metadata associated with an accession.
iseq -i PRJNA211801

- Batch download by Aspera with -a to directly download gzip-formatted FASTQ files with -g.
iseq -i SRR_Acc_List.txt -a -g
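A batch input file is plain text with one accession per line. The sketch below builds such a file from Run accessions that appear elsewhere in this README, checks that no line is empty or contains whitespace, and only calls iseq if it is found on PATH. The format check is an extra precaution added for this example, not something iSeq requires you to run.

```bash
#!/usr/bin/env bash
# Sketch: build an accession list (one accession per line) and check it before use.
list_file="SRR_Acc_List.txt"
printf '%s\n' SRR7706354 SRR931847 SRR23680070 > "$list_file"

# Count lines that are empty or contain whitespace; each line should hold one accession.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the printed count.
bad_lines="$(grep -cvE '^[^[:space:]]+$' "$list_file" || true)"
total_lines="$(wc -l < "$list_file" | tr -d ' ')"
echo "accessions: $total_lines, malformed lines: $bad_lines"

# Only call iseq when the list is clean and the tool is actually installed.
if [ "$bad_lines" != "0" ]; then
    echo "fix $list_file before downloading" >&2
elif command -v iseq >/dev/null 2>&1; then
    iseq -i "$list_file" -a -g
else
    echo "iseq not found on PATH; skipping download" >&2
fi
```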

Usage (Chinese tutorial ✨)
$ iseq --help
Usage:
iseq -i accession [options]
Required option:
-i, --input [text|file] Single accession or a file containing multiple accessions.
Note: Only one accession per line in the file
Optional options:
-m, --metadata Skip the sequencing data downloads and only fetch the metadata for the accession.
-g, --gzip Download FASTQ files in gzip format directly (*.fastq.gz).
Note: if *.fastq.gz files are not available, SRA files will be downloaded and converted to *.fastq.gz files.
-q, --fastq Convert SRA files to FASTQ format.
-t, --threads int The n
