Pysradb
Package for fetching metadata and downloading data from SRA/ENA/GEO
A Python package for retrieving metadata from SRA/ENA/GEO
Documentation
https://saketkc.github.io/pysradb
CLI Usage
pysradb supports command-line usage. See the CLI instructions or the quickstart guide.
$ pysradb
usage: pysradb [-h] [--version] [--citation]
{metadata,download,search,gse-to-gsm,gse-to-srp,gsm-to-gse,gsm-to-srp,gsm-to-srr,gsm-to-srs,gsm-to-srx,srp-to-gse,srp-to-srr,srp-to-srs,srp-to-srx,srr-to-gsm,srr-to-srp,srr-to-srs,srr-to-srx,srs-to-gsm,srs-to-srx,srx-to-srp,srx-to-srr,srx-to-srs,geo-matrix,srp-to-pmid,gse-to-pmid,pmid-to-gse,pmid-to-srp,pmc-to-identifiers,pmid-to-identifiers,doi-to-gse,doi-to-srp,doi-to-identifiers}
...
pysradb: Query NGS metadata and data from NCBI Sequence Read Archive.
version: 3.0.0
Citation: 10.12688/f1000research.18676.1
options:
-h, --help show this help message and exit
--version show program's version number and exit
--citation how to cite
subcommands:
{metadata,download,search,gse-to-gsm,gse-to-srp,gsm-to-gse,gsm-to-srp,gsm-to-srr,gsm-to-srs,gsm-to-srx,srp-to-gse,srp-to-srr,srp-to-srs,srp-to-srx,srr-to-gsm,srr-to-srp,srr-to-srs,srr-to-srx,srs-to-gsm,srs-to-srx,srx-to-srp,srx-to-srr,srx-to-srs,geo-matrix,srp-to-pmid,gse-to-pmid,pmid-to-gse,pmid-to-srp,pmc-to-identifiers,pmid-to-identifiers,doi-to-gse,doi-to-srp,doi-to-identifiers}
metadata Fetch metadata for SRA project (SRPnnnn)
download Download SRA project (SRPnnnn)
search Search SRA/ENA for matching text
gse-to-gsm Get GSM for a GSE
gse-to-srp Get SRP for a GSE
gsm-to-gse Get GSE for a GSM
gsm-to-srp Get SRP for a GSM
gsm-to-srr Get SRR for a GSM
gsm-to-srs Get SRS for a GSM
gsm-to-srx Get SRX for a GSM
srp-to-gse Get GSE for a SRP
srp-to-srr Get SRR for a SRP
srp-to-srs Get SRS for a SRP
srp-to-srx Get SRX for a SRP
srr-to-gsm Get GSM for a SRR
srr-to-srp Get SRP for a SRR
srr-to-srs Get SRS for a SRR
srr-to-srx Get SRX for a SRR
srs-to-gsm Get GSM for a SRS
srs-to-srx Get SRX for a SRS
srx-to-srp Get SRP for a SRX
srx-to-srr Get SRR for a SRX
srx-to-srs Get SRS for a SRX
geo-matrix Download and parse GEO Matrix files
srp-to-pmid Get PMIDs for SRP accessions
gse-to-pmid Get PMIDs for GSE accessions
pmid-to-gse Get GSE accessions from PMIDs
pmid-to-srp Get SRP accessions from PMIDs
pmc-to-identifiers Extract database identifiers from PMC articles
pmid-to-identifiers
Extract database identifiers from PubMed articles
doi-to-gse Get GSE accessions from DOIs
doi-to-srp Get SRP accessions from DOIs
doi-to-identifiers Extract database identifiers from articles via DOI
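The conversion subcommands above follow a `<source>-to-<target>` naming pattern built from lower-cased accession prefixes (SRP, SRX, SRS, SRR, GSE, GSM). Below is a minimal Python sketch of building the matching command line from an accession. It assumes that naming pattern, and it also assumes each subcommand takes the accession as a positional argument. The helpers `accession_type` and `conversion_command` are hypothetical and not part of pysradb.

```python
import re

# Conversion subcommands copied from the `pysradb` help listing above.
SUPPORTED = {
    "gse-to-gsm", "gse-to-srp", "gsm-to-gse", "gsm-to-srp", "gsm-to-srr",
    "gsm-to-srs", "gsm-to-srx", "srp-to-gse", "srp-to-srr", "srp-to-srs",
    "srp-to-srx", "srr-to-gsm", "srr-to-srp", "srr-to-srs", "srr-to-srx",
    "srs-to-gsm", "srs-to-srx", "srx-to-srp", "srx-to-srr", "srx-to-srs",
}

# An accession prefix followed by digits, e.g. SRP000941.
ACCESSION_RE = re.compile(r"^(SRP|SRX|SRS|SRR|GSE|GSM)\d+$")


def accession_type(accession: str) -> str:
    """Return the lower-case accession type ('srp', 'gsm', ...) or raise ValueError."""
    match = ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised accession: {accession!r}")
    return match.group(1).lower()


def conversion_command(accession: str, target: str) -> list[str]:
    """Build an argv list converting `accession` to the `target` type (hypothetical helper)."""
    acc = accession.strip().upper()
    subcommand = f"{accession_type(acc)}-to-{target.strip().lower()}"
    if subcommand not in SUPPORTED:
        raise ValueError(f"{subcommand!r} is not in the subcommand listing")
    return ["pysradb", subcommand, acc]
```

For example, `conversion_command("SRP000941", "srr")` returns `["pysradb", "srp-to-srr", "SRP000941"]`, which could then be passed to `subprocess.run`. Pairs that are missing from the listing, such as `srs-to-srp`, raise `ValueError` instead of producing a command that would fail.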
Quickstart
A Google Colaboratory version of the most-used commands is available in this Colab Notebook. Note that it requires only an active internet connection (no additional downloads are made).
The following notebooks document all the possible features of `pysradb`:
- Python API
- Downloading datasets from SRA - command line
- Downloading multiple datasets in parallel - Python API
- Converting SRA to FASTQ - command line (requires conda)
- Downloading subsets of a project - Python API
- Metadata for multiple SRPs
- Searching SRA/GEO/ENA
- Extracting identifiers from PMC/DOI
- Metadata Enrichment with LLMs
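The Python API notebook covers the same metadata lookup that the `metadata` subcommand performs. Below is a minimal sketch of that lookup. It assumes pysradb is installed and that NCBI is reachable over the network. `fetch_metadata` is a hypothetical wrapper around pysradb's `SRAweb` class.

```python
def fetch_metadata(accession: str):
    """Fetch SRA metadata for one accession (e.g. an SRP) as a pandas DataFrame.

    Needs pysradb installed and network access to NCBI.
    """
    # Imported inside the function so that defining this wrapper
    # does not itself require pysradb to be installed.
    from pysradb.sraweb import SRAweb

    db = SRAweb()
    return db.sra_metadata(accession)
```

Calling `fetch_metadata("SRP000941")` should return a DataFrame with the same kind of columns as `pysradb metadata SRP000941` prints.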
Installation
To install the stable version using `pip`:
pip install pysradb
Alternatively, if you use conda:
conda install -c bioconda pysradb
This step installs all the dependencies. If you have an existing
environment with many pre-installed packages, conda might be
slow.
Please consider creating a new environment for pysradb:
conda create -c bioconda -n pysradb python=3.13 pysradb
Dependencies
pandas
requests
tqdm
xmltodict
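Before installing into an existing environment, you may want to check which of the dependencies listed above are already importable. The following is a minimal sketch using only the standard library. It assumes each package is imported under the name shown above. `missing_dependencies` is a hypothetical helper, not part of pysradb.

```python
import importlib.util

# Import names of the runtime dependencies listed above.
DEPENDENCIES = ["pandas", "requests", "tqdm", "xmltodict"]


def missing_dependencies(modules=DEPENDENCIES):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

`print(missing_dependencies())` prints an empty list when all four packages are already installed.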
Installing pysradb in development mode
git clone https://github.com/saketkc/pysradb.git
cd pysradb && pip install -r requirements.txt
pip install -e .
Using pysradb
Obtaining SRA metadata
$ pysradb metadata SRP000941 | head
study_accession experiment_accession experiment_title experiment_desc organism_taxid organism_name library_strategy library_source library_selection sample_accession sample_title instrument total_spots total_size run_accession run_total_spots run_total_bases
SRP000941 SRX056722 Reference Epigenome: ChIP-Seq Analysis of H3K27ac in hESC H1 Cells Reference Epigenome: ChIP-Seq Analysis of H3K27ac in hESC H1 Cells 9606 Homo sapiens ChIP-Seq GENOMIC ChIP SRS184466 Illumina HiSeq 2000 26900401 531654480 SRR179707 26900401 807012030
SRP000941 SRX027889 Reference Epigenome: ChIP-Seq Analysis of H2AK5ac in hESC Cells Reference Epigenome: ChIP-Seq Analysis of H2AK5ac in hESC Cells 9606 Homo sapiens ChIP-Seq GENOMIC ChIP SRS116481 Illumina Genome Analyzer II 37528590 779578968 SRR067978 37528590 1351029240
SRP000941 SRX027888 Reference Epigenome: ChIP-Seq Input from hESC H1 Cells Reference Epigenome: ChIP-Seq Input from hESC H1 Cells 9606 Homo sapiens ChIP-Seq GENOMIC RANDOM SRS116483 Illumina Genome Analyzer II 13603127 3232309537 SRR067977 13603127 489712572
SRP000941 SRX027887 Reference Epigenome: ChIP-Seq Input from hESC H1 Cells Reference Epigenome: ChIP-Seq Input from hESC H1 Cells 9606 Homo sapiens ChIP-Seq GENOMIC RANDOM SRS116562 Illumina Genome Analyzer II 22430523 506327844 SRR067976 22430523 807498828
SRP000941 SRX027886 Reference Epigenome: ChIP-Seq Input from hESC H1 Cells Reference Epigenome: ChIP-Seq Input from hESC H1 Cells 9606 Homo sapiens ChIP-Seq GENOMIC RANDOM SRS116560 Illumina Genome Analyzer II 15342951 301720436 SRR067975 15342951 552346236
SRP000941 SRX027885 Reference Epigenome: ChIP-Seq Input from hESC H1 Cells Reference Epigeno
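The metadata table can also be post-processed with pandas, which is already a pysradb dependency. The sketch below assumes the table has been saved as tab-separated text. The sample data copies five runs from the SRP000941 output above, trimmed to a few columns. `load_metadata` and `bases_per_selection` are hypothetical helpers.

```python
import io

import pandas as pd

# Five runs from the SRP000941 output above, reduced to a few columns.
ROWS = [
    ("run_accession", "experiment_accession", "library_selection",
     "instrument", "run_total_spots", "run_total_bases"),
    ("SRR179707", "SRX056722", "ChIP", "Illumina HiSeq 2000", "26900401", "807012030"),
    ("SRR067978", "SRX027889", "ChIP", "Illumina Genome Analyzer II", "37528590", "1351029240"),
    ("SRR067977", "SRX027888", "RANDOM", "Illumina Genome Analyzer II", "13603127", "489712572"),
    ("SRR067976", "SRX027887", "RANDOM", "Illumina Genome Analyzer II", "22430523", "807498828"),
    ("SRR067975", "SRX027886", "RANDOM", "Illumina Genome Analyzer II", "15342951", "552346236"),
]
METADATA_TSV = "\n".join("\t".join(row) for row in ROWS) + "\n"


def load_metadata(tsv_text: str) -> pd.DataFrame:
    """Parse tab-separated metadata text into a DataFrame."""
    return pd.read_csv(io.StringIO(tsv_text), sep="\t")


def bases_per_selection(df: pd.DataFrame) -> pd.Series:
    """Sum run_total_bases for each library_selection value (here ChIP vs RANDOM input)."""
    return df.groupby("library_selection")["run_total_bases"].sum()
```

With this sample, `bases_per_selection(load_metadata(METADATA_TSV))` gives 2158041270 bases for `ChIP` and 1849557636 for `RANDOM`. You can select the run accessions of the input libraries with `df.loc[df["library_selection"] == "RANDOM", "run_accession"]`. A saved file can be read the same way by passing its path to `pd.read_csv(path, sep="\t")`.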
