
ISeq

Download sequencing data and metadata from GSA, SRA, ENA, and DDBJ databases.


iSeq: An integrated tool to fetch public Sequencing data

Cite us: Haoyu Chao, Zhuojin Li, Dijun Chen, Ming Chen, iSeq: An integrated tool to fetch public sequencing data, Bioinformatics, 2024, btae641, https://doi.org/10.1093/bioinformatics/btae641

Description

iSeq is a Bash script that downloads sequencing data and metadata from the GSA, SRA, ENA, and DDBJ databases. See Detail Pipeline for the full workflow. Here is the basic pipeline of iSeq:

iSeq-pipeline

[!IMPORTANT] To use iSeq, your system must be connected to the network and support the FTP, HTTP, and HTTPS protocols.

Update Notes:

2025.11.20

  • ⚠️ Update Notice: Due to recent changes in the GSA API, we have updated how iSeq retrieves metadata so that it stays compatible with the latest GSA API. Please update iSeq to version 1.9.8 or later (≥ 1.9.8).
<details> <summary>More Updates</summary>

2025.10.21

  • New -Q, --quiet option: Added this option to suppress download progress bars. Useful when logging clean outputs.
  • Fixed a bug where --database option failed to switch when ENA was inaccessible.
  • If the MD5 check ultimately fails, delete the partially downloaded files.

2025.10.09

  • Fixed a bug when checking md5 of SRA files.

2025.09.04

  • ⚠️ Update Notice: Due to recent changes in the ENA API, we have updated how iSeq retrieves metadata so that it stays compatible with the latest ENA API. Please update iSeq to version 1.9.5 or later (≥ 1.9.5).

2025.07.22

  • New -r, --protocol option to specify the protocol only when downloading files from ENA.

2025.06.16

  • When using -e, --merge, create symbolic links or retain the original Run files to avoid re-downloading them after merging.
  • Fixed the issue mentioned in #40: modified the behavior so that batch downloads do not terminate upon encountering an error, and instead continue until all items are processed.
  • Added an error message when download failures occur, such as "Download failures detected, please check fail.log for details."
  • Fixed a bug where incomplete downloads from GSA were incorrectly reported as successful.

2025.05.23

  • Fixed the issue mentioned in #39. The problem was that using both -d sra and -g together would skip the MD5 check in vdb-validate.
  • New -k, --skip-md5 option: Added this option to disable MD5 checks.

2025.04.25

  • Fixed a bug that occurred when re-downloading with empty metadata.
  • Fixed a bug where the while loop exited abnormally with a non-zero exit code.

2025.04.22

  • Fixed the issue mentioned in #33. The -s, --speed option is enabled again.
  • Fixed the exception raised when the metadata file is empty, mentioned in #34.
  • Fixed MD5 checksum failures when downloading gzip data from ONT or HiFi third-generation sequencing.

2025.04.02

  • Fixed the issue mentioned in #27 and Rednote: In sra-tools > 3.0.0, running vdb-validate without specifying the SRA file path causes it to re-download the file, leading to a stuck process. Specifying the path (e.g., vdb-validate ./SRR931847) resolves the issue.

2025.03.14

  • Fixed the issue mentioned in #26. The cause was that the data was paired-end but had only one link, such as SRR23680070.

2025.03.11

  • Input file can contain accessions from different databases.
  • -p and -a can be used simultaneously, with -a taking priority.
  • Fixed some bugs when retrying download data from GSA.

2024.12.26

  • Fixed the bugs mentioned in #16, #17 (2024.12.16) and https://github.com/BioOmics/iSeq/issues/19 (2024.12.26).

2024.11.21

  • Dependency update for aspera-cli: The version requirement for aspera-cli has been updated from aspera-cli to aspera-cli=4.14.0.

2024.10.23

  • New -s, --speed option to set the download speed limit in MB/s (default: 1000 MB/s). For example: iseq -i SRR7706354 -s 10
  • Dependency update for sra-tools: The version requirement for sra-tools has been updated from sra-tools=2.11 to sra-tools>=2.11.0.

2024.09.14

  • New -e option for merging FASTQ files: Added a -e option to merge multiple FASTQ files into a single file for each Experiment (-e ex), Sample (-e sa), or Study (-e st).

  • New -i option for input: iSeq can now accept a file containing multiple accession numbers as input by -i fileName.

  • API change for GSA metadata download: The API endpoint has been updated from getRunInfo to getRunInfoByCra for downloading GSA metadata.

  • Save results to a chosen directory: Output is now saved to the directory specified with the -o option.

  • Updated regex for SAMC matching: The matching pattern for SAMC has been changed from SAMC[A-Z]?[0-9]+ to SAMC[0-9]+.

  • Fixed some bugs.
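
The effect of the SAMC pattern change noted above can be checked with `grep -E`. This is only an illustration: the `^`/`$` anchors, the `matches` helper, and both sample accessions are assumptions made for the demo, not taken from the iSeq source.

```shell
# Patterns quoted from the update note, anchored here (an assumption) so that
# the whole string must match.
old_pattern='^SAMC[A-Z]?[0-9]+$'
new_pattern='^SAMC[0-9]+$'

# matches PATTERN TEXT: succeed when TEXT matches the extended regex PATTERN.
matches() {
    printf '%s\n' "$2" | grep -Eq "$1"
}

# Hypothetical accessions: one with digits only, one with an extra letter.
for acc in SAMC017083 SAMCA017083; do
    if matches "$old_pattern" "$acc"; then old=yes; else old=no; fi
    if matches "$new_pattern" "$acc"; then new=yes; else new=no; fi
    echo "$acc old=$old new=$new"
done
```

Only the digits-only form is accepted by the newer pattern; the form with an extra letter after `SAMC` was accepted by the older pattern alone.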

</details>
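
The batch behavior described in the 2025.06.16 notes (keep going after an error, then report failures via fail.log) can be sketched as a plain shell loop. This is a minimal illustration, not iSeq's actual code: `batch_fetch` and the fetch command passed to it are hypothetical, and the one-accession-per-line layout of fail.log is an assumption.

```shell
# batch_fetch LIST_FILE FETCH_CMD...
# Runs FETCH_CMD once per accession in LIST_FILE. A failure is recorded in
# fail.log and the loop continues with the next accession.
batch_fetch() {
    bf_list=$1; shift
    bf_failed=0
    : > fail.log                        # start with an empty log
    # "|| [ -n ... ]" also handles a last line without a trailing newline.
    while IFS= read -r bf_acc || [ -n "$bf_acc" ]; do
        [ -n "$bf_acc" ] || continue    # skip blank lines
        # </dev/null keeps the fetch command from consuming the list on stdin.
        if ! "$@" "$bf_acc" < /dev/null; then
            echo "$bf_acc" >> fail.log
            bf_failed=$((bf_failed + 1))
        fi
    done < "$bf_list"
    if [ "$bf_failed" -gt 0 ]; then
        echo "Download failures detected, please check fail.log for details." >&2
        return 1
    fi
}
```

A call might look like `batch_fetch SRR_Acc_List.txt my_fetch`, where `my_fetch` is any command that takes one accession and returns non-zero on failure.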

Features

  • Multiple Database Support: Supports multiple bioinformatics databases (GSA/SRA/ENA/DDBJ/GEO).
  • Multiple Input Formats: Supports multiple accessions (Project, Study, Sample, Experiment, or Run accession).
  • Metadata Download: Supports downloading sample metadata for each accession.
<details> <summary>More features</summary>
  • File Format Selection: Users can choose to directly download gzip-formatted FASTQ files or download SRA files and convert them to FASTQ format.
  • Multi-threading Support: Supports the use of multi-threading to accelerate the conversion of SRA to FASTQ files or the compression of FASTQ files.
  • File Merging: For experiment-level accession, the script can merge multiple FASTQ files into one.
  • Parallel Download: Supports parallel download connections; the number of connections can be specified to speed up downloads.
  • Support for Aspera High-speed Download: For GSA/ENA databases, the script supports high-speed data transfer using Aspera.
  • Automatic Retry Mechanism: If a download or verification fails, the script will automatically retry until a set number of attempts has been reached.
  • Automated File Verification: After the download is complete, the script will automatically verify the integrity of the files, including checking file sizes and MD5 checksums.
  • Error Handling: The script provides error messages and suggestions for solutions when encountering errors.
</details>
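
The retry and verification features listed above follow a common pattern: download, compare the MD5 digest, retry a limited number of times, and remove the file if verification never succeeds. The sketch below illustrates that pattern only; it is not iSeq's implementation. `verify_md5` and `fetch_verified` are hypothetical helpers, and `md5sum` (GNU coreutils) is assumed to be available.

```shell
# verify_md5 FILE EXPECTED: succeed when FILE's MD5 digest equals EXPECTED.
verify_md5() {
    vm_actual=$(md5sum < "$1" 2>/dev/null | cut -d ' ' -f 1)
    [ "$vm_actual" = "$2" ]
}

# fetch_verified FILE EXPECTED_MD5 MAX_TRIES CMD...
# Runs CMD (expected to create FILE) until the digest matches or MAX_TRIES
# attempts have been made. On final failure the file is deleted.
fetch_verified() {
    fv_file=$1; fv_md5=$2; fv_max=$3; shift 3
    fv_try=1
    while [ "$fv_try" -le "$fv_max" ]; do
        if "$@" && verify_md5 "$fv_file" "$fv_md5"; then
            return 0
        fi
        echo "attempt $fv_try/$fv_max failed for $fv_file" >&2
        fv_try=$((fv_try + 1))
    done
    rm -f "$fv_file"    # give up: remove the partial or corrupt file
    return 1
}
```

For example, `fetch_verified SRR7706354.fastq.gz "$expected_md5" 3 my_download SRR7706354` would try a placeholder `my_download` command up to three times; `expected_md5` would come from the downloaded metadata.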

Installation

1. iSeq can be installed easily with conda

conda install bioconda::iseq
  • If conda reports "Found conflicts!", try conda install -c conda-forge -c bioconda iseq

2. The latest version of iSeq can also be installed from source; see INSTALL

# Use the following command to check whether dependent software is installed
iseq --version
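
If you want a quick check that does not depend on iSeq itself, a small helper can report which commands are missing from PATH. This is a sketch: `check_tools` is a hypothetical helper, and the two names checked below are simply tools mentioned in this README, not the full dependency list.

```shell
# check_tools NAME...
# Prints "missing: NAME" for each command not found on PATH and
# returns 1 if at least one is missing.
check_tools() {
    ct_status=0
    for ct_tool in "$@"; do
        if ! command -v "$ct_tool" >/dev/null 2>&1; then
            echo "missing: $ct_tool"
            ct_status=1
        fi
    done
    return "$ct_status"
}

# iseq itself and vdb-validate (part of sra-tools) are named in this README.
check_tools iseq vdb-validate || echo "Install the missing tools before running iseq."
```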

Example (See more)

  1. Download all Run sequencing data and metadata associated with an accession.
iseq -i PRJNA211801

e01

  2. Batch download by Aspera with -a to directly download gzip-formatted FASTQ files with -g.
iseq -i SRR_Acc_List.txt -a -g

e13
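
For batch input, the list file holds one accession per line. The sketch below writes such a file and tidies it before use. The three Run accessions are ones that appear elsewhere in this README; `clean_acc_list` is a hypothetical helper, and whether iSeq tolerates blank lines or Windows line endings on its own is not stated here, so the cleanup is a precaution.

```shell
# Write one accession per line, as the -i option expects for file input.
acc_list="SRR_Acc_List.txt"
printf '%s\n' SRR7706354 SRR931847 SRR23680070 > "$acc_list"

# clean_acc_list FILE: strip carriage returns and surrounding whitespace,
# then drop empty lines, so each remaining line is a single accession.
clean_acc_list() {
    tr -d '\r' < "$1" \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | grep -v '^$'
}

clean_acc_list "$acc_list" > "${acc_list}.clean"

# The command is printed rather than executed, so the sketch runs offline.
echo "iseq -i ${acc_list}.clean -a -g"
```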

Usage (Chinese tutorial ✨)

$ iseq --help

Usage:
  iseq -i accession [options]

Required option:
  -i, --input     [text|file]   Single accession or a file containing multiple accessions.
                                Note: Only one accession per line in the file

Optional options:
  -m, --metadata                Skip the sequencing data downloads and only fetch the metadata for the accession.
  -g, --gzip                    Download FASTQ files in gzip format directly (*.fastq.gz).
                                Note: if *.fastq.gz files are not available, SRA files will be downloaded and converted to *.fastq.gz files.
  -q, --fastq                   Convert SRA files to FASTQ format.
  -t, --threads   int           The n