CuratedAtlasQueryR

<!-- badges: start -->

Lifecycle: maturing

<!-- badges: end -->

CuratedAtlasQuery is a query interface that allows the programmatic exploration and retrieval of the harmonised, curated and reannotated CELLxGENE single-cell human cell atlas. Data can be retrieved at the cell, sample, or dataset level based on filtering criteria.

Harmonised data is stored in the ARDC Nectar Research Cloud, and most CuratedAtlasQuery functions interact with Nectar via web requests, so a network connection is required for most functionality.

<img src="man/figures/logo.png" width="120px" height="139px" />

<img src="man/figures/svcf_logo.jpeg" width="155px" height="58px" /><img src="man/figures/czi_logo.png" width="129px" height="58px" /><img src="man/figures/bioconductor_logo.jpg" width="202px" height="58px" /><img src="man/figures/vca_logo.png" width="219px" height="58px" /><img src="man/figures/nectar_logo.png" width="180px" height="58px" />

Usage

The API delivered more than 15 TB of data to the community in its first year. Thanks!

<img src="man/figures/downloads.png" width="40%" />

Query interface

Installation

devtools::install_github("stemangiola/CuratedAtlasQueryR")

Load the package

library(CuratedAtlasQueryR)

Load and explore the metadata

Load the metadata

# Note: in real applications you should use the default value of remote_url 
metadata <- get_metadata(remote_url = METADATA_URL)
metadata
#> # Source:   table</vast/scratch/users/milton.m/cache/R/CuratedAtlasQueryR/metadata.0.2.3.parquet> [?? x 56]
#> # Database: DuckDB 0.7.1 [unknown@Linux 3.10.0-1160.88.1.el7.x86_64:R 4.2.1/:memory:]
#>    cell_ sample_ cell_…¹ cell_…² confi…³ cell_…⁴ cell_…⁵ cell_…⁶ sampl…⁷ _samp…⁸
#>    <chr> <chr>   <chr>   <chr>     <dbl> <chr>   <chr>   <chr>   <chr>   <chr>  
#>  1 8387… 7bd7b8… natura… immune…       5 cd8 tem gmp     natura… 842ce7… Q59___…
#>  2 1768… 7bd7b8… natura… immune…       5 cd8 tem cd8 tcm natura… 842ce7… Q59___…
#>  3 6329… 7bd7b8… natura… immune…       5 cd8 tem clp     termin… 842ce7… Q59___…
#>  4 5027… 7bd7b8… natura… immune…       5 cd8 tem clp     natura… 842ce7… Q59___…
#>  5 7956… 7bd7b8… natura… immune…       5 cd8 tem clp     natura… 842ce7… Q59___…
#>  6 4305… 7bd7b8… natura… immune…       5 cd8 tem clp     termin… 842ce7… Q59___…
#>  7 2126… 933f96… natura… ilc           1 nk      nk      natura… c250bf… AML3__…
#>  8 3114… 933f96… natura… immune…       5 mait    nk      natura… c250bf… AML3__…
#>  9 1407… 933f96… natura… immune…       5 mait    clp     natura… c250bf… AML3__…
#> 10 2911… 933f96… natura… nk            5 nk      clp     natura… c250bf… AML3__…
#> # … with more rows, 46 more variables: assay <chr>,
#> #   assay_ontology_term_id <chr>, file_id_db <chr>,
#> #   cell_type_ontology_term_id <chr>, development_stage <chr>,
#> #   development_stage_ontology_term_id <chr>, disease <chr>,
#> #   disease_ontology_term_id <chr>, ethnicity <chr>,
#> #   ethnicity_ontology_term_id <chr>, experiment___ <chr>, file_id <chr>,
#> #   is_primary_data_x <chr>, organism <chr>, organism_ontology_term_id <chr>, …

The metadata variable can then be re-used for all subsequent queries.

Explore the tissue

metadata |>
    dplyr::distinct(tissue, file_id) 
#> # Source:   SQL [10 x 2]
#> # Database: DuckDB 0.7.1 [unknown@Linux 3.10.0-1160.88.1.el7.x86_64:R 4.2.1/:memory:]
#>    tissue              file_id                             
#>    <chr>               <chr>                               
#>  1 bone marrow         1ff5cbda-4d41-4f50-8c7e-cbe4a90e38db
#>  2 lung parenchyma     6661ab3a-792a-4682-b58c-4afb98b2c016
#>  3 respiratory airway  6661ab3a-792a-4682-b58c-4afb98b2c016
#>  4 nose                6661ab3a-792a-4682-b58c-4afb98b2c016
#>  5 renal pelvis        dc9d8cdd-29ee-4c44-830c-6559cb3d0af6
#>  6 kidney              dc9d8cdd-29ee-4c44-830c-6559cb3d0af6
#>  7 renal medulla       dc9d8cdd-29ee-4c44-830c-6559cb3d0af6
#>  8 cortex of kidney    dc9d8cdd-29ee-4c44-830c-6559cb3d0af6
#>  9 kidney blood vessel dc9d8cdd-29ee-4c44-830c-6559cb3d0af6
#> 10 lung                a2796032-d015-40c4-b9db-835207e5bd5b
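
The tissue and file_id pairs listed above can also be summarised per file. The sketch below is only an illustration: it rebuilds the ten printed rows as a local tibble named `tissue_files` (a hypothetical stand-in, not an object provided by the package) so that it runs without network access. The same dplyr verbs would be expected to work on the lazy `metadata` table, but that is an assumption and is not shown here.

```r
library(dplyr)

# Hypothetical local stand-in: the ten tissue/file_id pairs printed above.
tissue_files <- tibble(
    tissue = c(
        "bone marrow", "lung parenchyma", "respiratory airway", "nose",
        "renal pelvis", "kidney", "renal medulla", "cortex of kidney",
        "kidney blood vessel", "lung"
    ),
    file_id = c(
        "1ff5cbda-4d41-4f50-8c7e-cbe4a90e38db",
        rep("6661ab3a-792a-4682-b58c-4afb98b2c016", 3),
        rep("dc9d8cdd-29ee-4c44-830c-6559cb3d0af6", 5),
        "a2796032-d015-40c4-b9db-835207e5bd5b"
    )
)

# Count how many distinct tissues each file covers, largest first.
tissues_per_file <-
    tissue_files |>
    distinct(tissue, file_id) |>
    count(file_id, name = "n_tissues", sort = TRUE)

tissues_per_file
```

In this listing, one file spans five kidney-related tissues and another spans three airway-related tissues, which is why filtering on `tissue` alone can select only part of a file.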

Download single-cell RNA sequencing counts

Query raw counts

single_cell_counts <-
    metadata |>
    dplyr::filter(
        ethnicity == "African" &
        stringr::str_like(assay, "%10x%") &
        tissue == "lung parenchyma" &
        stringr::str_like(cell_type, "%CD4%")
    ) |>
    get_single_cell_experiment()
#> ℹ Realising metadata.
#> ℹ Synchronising files
#> ℹ Downloading 0 files, totalling 0 GB
#> ℹ Reading files.
#> ℹ Compiling Single Cell Experiment.

single_cell_counts
#> # A SingleCellExperiment-tibble abstraction: 1,571 × 57
#> # Features=36229 | Cells=1571 | Assays=counts
#>    .cell sample_ cell_…¹ cell_…² confi…³ cell_…⁴ cell_…⁵ cell_…⁶ sampl…⁷ X_sam…⁸
#>    <chr> <chr>   <chr>   <chr>     <dbl> <chr>   <chr>   <chr>   <chr>   <chr>  
#>  1 AGCG… 11a7dc… CD4-po… cd4 th1       3 cd4 tcm cd8 t   th1     10b339… Donor_…
#>  2 TCAG… 11a7dc… CD4-po… cd4 th…       3 cd4 tcm cd4 tem th1/th… 10b339… Donor_…
#>  3 TTTA… 11a7dc… CD4-po… cd4 th…       3 cd4 tcm cd4 tcm th17    10b339… Donor_…
#>  4 ACAC… 11a7dc… CD4-po… immune…       5 cd4 tcm plasma  th1/th… 10b339… Donor_…
#>  5 CAAG… 11a7dc… CD4-po… immune…       1 cd4 tcm cd4 tcm mait    10b339… Donor_…
#>  6 CTGT… 14a078… CD4-po… cd4 th…       3 cd4 tcm cd4 tem th1/th… 8f71c5… VUHD85…
#>  7 ACGT… 14a078… CD4-po… treg          5 cd4 tcm tregs   t regu… 8f71c5… VUHD85…
#>  8 CATA… 14a078… CD4-po… immune…       5 nk      cd8 tem mait    8f71c5… VUHD85…
#>  9 ACTT… 14a078… CD4-po… mait          5 mait    cd8 tem mait    8f71c5… VUHD85…
#> 10 TGCG… 14a078… CD4-po… cd4 th1       3 cd4 tcm cd4 tem th1     8f71c5… VUHD85…
#> # … with 1,561 more rows, 47 more variables: assay <chr>,
#> #   assay_ontology_term_id <chr>, file_id_db <chr>,
#> #   cell_type_ontology_term_id <chr>, development_stage <chr>,
#> #   development_stage_ontology_term_id <chr>, disease <chr>,
#> #   disease_ontology_term_id <chr>, ethnicity <chr>,
#> #   ethnicity_ontology_term_id <chr>, experiment___ <chr>, file_id <chr>,
#> #   is_primary_data_x <chr>, organism <chr>, organism_ontology_term_id <chr>, …
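
The filter above relies on `stringr::str_like()`, which follows SQL `LIKE` conventions: `%` matches any run of characters and `_` matches exactly one character. A minimal sketch on local character vectors follows. The assay and cell-type labels in it are illustrative values assumed for the example, not a listing of what the atlas contains. The patterns deliberately match the case of the labels, since case handling may differ between stringr versions and database backends.

```r
library(stringr)

# Illustrative labels (assumed for this example, not taken from the atlas).
assays <- c("10x 3' v2", "10x 5' v1", "Smart-seq2")
cell_types <- c("CD4-positive helper T cell", "natural killer cell")

# "%10x%" matches any string that contains "10x".
is_10x <- str_like(assays, "%10x%")

# "%CD4%" matches any string that contains "CD4".
is_cd4 <- str_like(cell_types, "%CD4%")

assays[is_10x]
cell_types[is_cd4]
```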

Query counts scaled per million

Counts-per-million scaling is helpful when only a few genes are of interest, because their expression can then be compared across samples.
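
As a rough illustration of the scaling, counts per million divides each count by the total count of its cell and multiplies by one million. The sketch below applies that definition to a small matrix in base R. It is not the package's implementation, and the gene names, cell names and values are all invented for the example.

```r
# Made-up raw counts: rows are genes, columns are cells (values are invented).
counts <- matrix(
    c(10, 0, 90,
      5, 5, 40),
    nrow = 3,
    dimnames = list(c("gene_a", "gene_b", "gene_c"), c("cell_1", "cell_2"))
)

# Total counts per cell (library size).
library_size <- colSums(counts)

# Divide each column by its library size, then scale to one million.
cpm <- sweep(counts, 2, library_size, "/") * 1e6

cpm
```

Here `gene_a` has raw counts of 10 and 5, yet the same scaled value in both cells, because `cell_1` was sequenced twice as deeply as `cell_2`.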

single_cell_counts <-
    metadata |>
    dplyr::filter(
        ethnicity == "African" &
        stringr::str_like(assay, "%10x%") &
        tissue == "lung parenchyma" &
        stringr::str_like(cell_type, "%CD4%")
    ) |>
    get_single_cell_experiment(assays = "cpm")
#> ℹ Realising metadata.
#> ℹ Synchronising files
#> ℹ Downloading 0 files, totalling 0 GB
#> ℹ Reading files.
#> ℹ Compiling Single Cell Experiment.

single_cell_counts
#> # A SingleCellExperiment-tibble abstraction: 1,571 × 57
#> # Features=36229 | Cells=1571 | Assays=cpm
#>    .cell sample_ cell_…¹ cell_…² confi…³ cell_…⁴ cell_…⁵ cell_…⁶ sampl…⁷ X_sam…⁸
#>    <chr> <chr>   <chr>   <chr>     <dbl> <chr>   <chr>   <chr>   <chr>   <chr>  
#>  1 AGCG… 11a7dc… CD4-po… cd4 th1       3 cd4 tcm cd8 t   th1     10b339… Donor_…
#>  2 TCAG… 11a7dc… CD4-po… cd4 th…       3 cd4 tcm cd4 tem th1/th… 10b339… Donor_…
#>  3 TTTA… 11a7dc… CD4-po… cd4 th…       3 cd4 tcm cd4 tcm th17    10b339… Donor_…
#>  4 ACAC… 11a7dc… CD4-po… immune…       5 cd4 tcm plasma  th1/th… 10b339… Donor_…
#>  5 CAAG… 11a7dc… CD4-po… immune…       1 cd4 tcm cd4 tcm mait    10b339… Donor_…
#>  6 CTGT… 14a078… CD4-po… cd4 th…       3 cd4 tcm cd4 tem th1/th… 8f71c5… VUHD85…
#>  7 ACGT… 14a078… CD4-po… treg          5 cd4 tcm tregs   t regu… 8f71c5… VUHD85…
#>  8 CATA… 14a078… CD4-po… immune…       5 nk      cd8 tem mait    8f71c5… VUHD85…
#>  9 ACTT… 14a078… CD4-po… mait          5 mait    cd8 tem mait    8f71c5… VUHD85…
#> 10 TGCG… 14a078… CD4-po… cd4 th1       3 cd4 tcm cd4 tem th1     8f71c5… VUHD85…
#> # … with 1,561 more rows, 47 more variables: assay <chr>,
#> #   assay_ontology_term_id <chr>, file_id_db <chr>,
#> #   cell_type_ontology_term_id <chr>, development_stage <chr>,
#> #   development_stage_ontology_term_id <chr>, disease <chr>,
#> #   disease_ontology_term_id <chr>, ethnicity <chr>,
#> #   ethnicity_ontology_term_id <chr>, experiment___ <chr>, file_id <chr>,
#> #   is_primary_data_x <chr>, organism <chr>, organism_ontology_term_id <chr>, …

Extract only a subset of genes

single_cell_counts <-
    metadata |>
    dplyr::filter(
        ethnicity == "African" &
        stringr::str_like(assay, "%10x%") &
        tissue == "lung parenchyma" &
        stringr::str_like(cell_type, "%CD4%")
    ) |>
    get_single_cell_experiment(assays = "cpm", features = "PUM1")
#> ℹ Realising metadata.
#> ℹ Synchronising files
#> ℹ Downloading 0 files, totalling 0 GB
#> ℹ Reading files.
#> ℹ Compiling Single Cell Experiment.

single_cell_counts
#> # A SingleCellExperiment-tibble abstraction: 1,571 × 57
#> # Features=1 | Cells=1571 | Assays=cpm
#>    .cell sample_ cell_…¹ cell_…² confi…³ cell_…⁴ cell_…⁵ cell_…⁶ sampl…⁷ X_sam…⁸
#>    <chr> <chr>   <chr>   <chr>     <dbl> <chr>   <chr>   <chr>   <chr>   <chr>  
#>  1 AGCG… 11a7dc… CD4-po… cd4 th1       3 cd4 tcm cd8 t   th1     10b339… Donor_…
#>  2 TCAG… 11a7dc… CD4-po… cd4 th…       3 cd4 tcm cd4 tem th1/th… 10b339… Donor_…
#>  3 TTTA… 11a7dc… CD4-po… cd4 th…       3 cd4 tcm cd4 tcm th17    10b339… Donor_…
#>  4 ACAC… 11a7dc… CD4-po… immune…       5 cd4 tcm plasma  th1/th… 10b339… Donor_…
#>  5 CAAG… 11a7dc… CD4-po… immune…       1 cd4 tcm cd4 tcm mait    10b339… Donor_…
#>  6 CTGT… 14a078… CD4-po… cd4 th…       3 cd4 tcm cd4 tem th1/th… 8f71c5… VUHD85…
#>  7 ACGT… 14a078… CD4-po… treg          5 cd4 tcm tregs   t regu… 8f71c5… VUHD85…
#>  8 CATA… 14a078… CD4-po… immune…       5 nk      cd8 tem mait