Pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO

A Python package for retrieving metadata from SRA/ENA/GEO

Documentation

https://saketkc.github.io/pysradb

CLI Usage

pysradb supports command line usage. See CLI instructions or quickstart guide.

$ pysradb
usage: pysradb [-h] [--version] [--citation]
               {metadata,download,search,gse-to-gsm,gse-to-srp,gsm-to-gse,gsm-to-srp,gsm-to-srr,gsm-to-srs,gsm-to-srx,srp-to-gse,srp-to-srr,srp-to-srs,srp-to-srx,srr-to-gsm,srr-to-srp,srr-to-srs,srr-to-srx,srs-to-gsm,srs-to-srx,srx-to-srp,srx-to-srr,srx-to-srs,geo-matrix,srp-to-pmid,gse-to-pmid,pmid-to-gse,pmid-to-srp,pmc-to-identifiers,pmid-to-identifiers,doi-to-gse,doi-to-srp,doi-to-identifiers}
               ...

pysradb: Query NGS metadata and data from NCBI Sequence Read Archive.
version: 3.0.0
Citation: 10.12688/f1000research.18676.1

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --citation            how to cite

subcommands:
  {metadata,download,search,gse-to-gsm,gse-to-srp,gsm-to-gse,gsm-to-srp,gsm-to-srr,gsm-to-srs,gsm-to-srx,srp-to-gse,srp-to-srr,srp-to-srs,srp-to-srx,srr-to-gsm,srr-to-srp,srr-to-srs,srr-to-srx,srs-to-gsm,srs-to-srx,srx-to-srp,srx-to-srr,srx-to-srs,geo-matrix,srp-to-pmid,gse-to-pmid,pmid-to-gse,pmid-to-srp,pmc-to-identifiers,pmid-to-identifiers,doi-to-gse,doi-to-srp,doi-to-identifiers}
    metadata            Fetch metadata for SRA project (SRPnnnn)
    download            Download SRA project (SRPnnnn)
    search              Search SRA/ENA for matching text
    gse-to-gsm          Get GSM for a GSE
    gse-to-srp          Get SRP for a GSE
    gsm-to-gse          Get GSE for a GSM
    gsm-to-srp          Get SRP for a GSM
    gsm-to-srr          Get SRR for a GSM
    gsm-to-srs          Get SRS for a GSM
    gsm-to-srx          Get SRX for a GSM
    srp-to-gse          Get GSE for a SRP
    srp-to-srr          Get SRR for a SRP
    srp-to-srs          Get SRS for a SRP
    srp-to-srx          Get SRX for a SRP
    srr-to-gsm          Get GSM for a SRR
    srr-to-srp          Get SRP for a SRR
    srr-to-srs          Get SRS for a SRR
    srr-to-srx          Get SRX for a SRR
    srs-to-gsm          Get GSM for a SRS
    srs-to-srx          Get SRX for a SRS
    srx-to-srp          Get SRP for a SRX
    srx-to-srr          Get SRR for a SRX
    srx-to-srs          Get SRS for a SRX
    geo-matrix          Download and parse GEO Matrix files
    srp-to-pmid         Get PMIDs for SRP accessions
    gse-to-pmid         Get PMIDs for GSE accessions
    pmid-to-gse         Get GSE accessions from PMIDs
    pmid-to-srp         Get SRP accessions from PMIDs
    pmc-to-identifiers  Extract database identifiers from PMC articles
    pmid-to-identifiers
                        Extract database identifiers from PubMed articles
    doi-to-gse          Get GSE accessions from DOIs
    doi-to-srp          Get SRP accessions from DOIs
    doi-to-identifiers  Extract database identifiers from articles via DOI
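A minimal sketch of how the accession-conversion subcommands listed above could be driven from a script. The helper `build_conversion_command` is hypothetical and not part of pysradb; it assumes that the source type can be read from the first three letters of the accession and that each conversion subcommand takes the accession as a single positional argument.

```python
# Conversion subcommand names copied from the `pysradb` help output above.
CONVERSION_SUBCOMMANDS = {
    "gse-to-gsm", "gse-to-srp", "gsm-to-gse", "gsm-to-srp", "gsm-to-srr",
    "gsm-to-srs", "gsm-to-srx", "srp-to-gse", "srp-to-srr", "srp-to-srs",
    "srp-to-srx", "srr-to-gsm", "srr-to-srp", "srr-to-srs", "srr-to-srx",
    "srs-to-gsm", "srs-to-srx", "srx-to-srp", "srx-to-srr", "srx-to-srs",
}


def build_conversion_command(accession: str, target: str) -> list[str]:
    """Return the argv list for converting `accession` to the `target` type.

    Hypothetical helper: the source type is inferred from the first three
    letters of the accession (for example "GSE" or "SRP").
    """
    source = accession[:3].lower()
    subcommand = f"{source}-to-{target.lower()}"
    if subcommand not in CONVERSION_SUBCOMMANDS:
        raise ValueError(f"pysradb has no '{subcommand}' subcommand")
    # Assumption: the accession is passed as one positional argument.
    return ["pysradb", subcommand, accession]
```

The returned list can be handed to `subprocess.run(..., check=True)` once pysradb is installed. Unsupported pairs such as SRS to SRP raise a `ValueError` before any process is started.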

Quickstart

A Google Colaboratory version of the most used commands is available in this Colab Notebook. Note that this requires only an active internet connection (no additional downloads are made).

The following notebooks document all the possible features of `pysradb`:

  1. Python API
  2. Downloading datasets from SRA - command line
  3. Downloading multiple datasets in parallel - Python API
  4. Converting SRA-to-fastq - command line (requires conda)
  5. Downloading subsets of a project - Python API
  6. Metadata for multiple SRPs
  7. Searching SRA/GEO/ENA
  8. Extracting identifiers from PMC/DOI
  9. Metadata Enrichment with LLMs

Installation

To install the stable version using `pip`:

pip install pysradb

Alternatively, if you use conda:

conda install -c bioconda pysradb

This step will install all the dependencies. If you have an existing environment with many pre-installed packages, conda might be slow. Please consider creating a new environment for pysradb:

conda create -c bioconda -n pysradb python=3.13 pysradb

Dependencies

pandas
requests
tqdm
xmltodict
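
As a quick sanity check after installation, the sketch below reports which of the dependencies listed above cannot be found in the current environment. The function name `missing_dependencies` is illustrative, and the check assumes each package's import name matches the name in the list.

```python
from importlib.util import find_spec

# Dependency names as listed above; assumed to match their import names.
DEPENDENCIES = ["pandas", "requests", "tqdm", "xmltodict"]


def missing_dependencies(names=DEPENDENCIES):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    print("missing:", ", ".join(missing) if missing else "none")
```

This only looks the modules up without importing them, so it stays fast even in a large environment.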

Installing pysradb in development mode

git clone https://github.com/saketkc/pysradb.git
cd pysradb && pip install -r requirements.txt
pip install -e .

Using pysradb

Obtaining SRA metadata

$ pysradb metadata SRP000941 | head

study_accession experiment_accession experiment_title                                                                                                                 experiment_desc                                                                                                                  organism_taxid  organism_name library_strategy library_source  library_selection sample_accession sample_title instrument                    total_spots total_size    run_accession run_total_spots run_total_bases
SRP000941       SRX056722                                                                         Reference Epigenome: ChIP-Seq Analysis of H3K27ac in hESC H1 Cells                                                               Reference Epigenome: ChIP-Seq Analysis of H3K27ac in hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC    ChIP            SRS184466                              Illumina HiSeq 2000    26900401     531654480   SRR179707     26900401         807012030
SRP000941       SRX027889                                                                            Reference Epigenome: ChIP-Seq Analysis of H2AK5ac in hESC Cells                                                                  Reference Epigenome: ChIP-Seq Analysis of H2AK5ac in hESC Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC    ChIP            SRS116481                      Illumina Genome Analyzer II    37528590     779578968   SRR067978     37528590        1351029240
SRP000941       SRX027888                                                                                     Reference Epigenome: ChIP-Seq Input from hESC H1 Cells                                                                           Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC  RANDOM            SRS116483                      Illumina Genome Analyzer II    13603127    3232309537   SRR067977     13603127         489712572
SRP000941       SRX027887                                                                                     Reference Epigenome: ChIP-Seq Input from hESC H1 Cells                                                                           Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC  RANDOM            SRS116562                      Illumina Genome Analyzer II    22430523     506327844   SRR067976     22430523         807498828
SRP000941       SRX027886                                                                                     Reference Epigenome: ChIP-Seq Input from hESC H1 Cells                                                                           Reference Epigenome: ChIP-Seq Input from hESC H1 Cells  9606            Homo sapiens       ChIP-Seq           GENOMIC  RANDOM            SRS116560                      Illumina Genome Analyzer II    15342951     301720436   SRR067975     15342951         552346236
SRP000941       SRX027885                                                                                     Reference Epigenome: ChIP-Seq Input from hESC H1 Cells                                                                           Reference Epigeno
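
The metadata table can be post-processed with a few lines of Python. The sketch below assumes the table is available as tab-separated text (an assumption; the output shown above appears to be column-aligned for display) and uses two rows and four columns re-typed from that output. The helper names `runs_by_experiment` and `total_spots` are illustrative only.

```python
import csv
import io
from collections import defaultdict

# Two rows and four columns taken from the output above, re-typed as
# tab-separated text (an assumption about the saved format).
SAMPLE_TSV = (
    "study_accession\texperiment_accession\trun_accession\trun_total_spots\n"
    "SRP000941\tSRX056722\tSRR179707\t26900401\n"
    "SRP000941\tSRX027889\tSRR067978\t37528590\n"
)


def runs_by_experiment(tsv_text):
    """Map each experiment accession to its list of run accessions."""
    runs = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        runs[row["experiment_accession"]].append(row["run_accession"])
    return dict(runs)


def total_spots(tsv_text):
    """Sum the run_total_spots column across all rows."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return sum(int(row["run_total_spots"]) for row in reader)
```

Grouping by `experiment_accession` is useful because one experiment (SRX) can contain several runs (SRR), and downloads are typically organized per run.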
