APAeval

Community effort to evaluate computational methods for the detection and quantification of poly(A) sites and the estimation of their differential usage across RNA-seq samples


Welcome to the [APAeval][apa-eval] GitHub repository.

Quick links

Use a benchmarked method on your own RNA-seq data
Benchmark a new method
Extend APAeval's benchmarks

APAeval is a community effort that started as the APAeval hackathon at the RNA 2021 Conference. We aim to evaluate computational methods for the detection and quantification of poly(A) sites from RNA-seq samples in an open, reproducible and extensible manner.

[![logo][apa-eval-logo]][apa-eval]

Overview of APAeval benchmarking
What can you do?
Some technical stuff
Code of Conduct
Open Science, licenses & attribution
Get in touch
Contributors ✨

Overview of APAeval benchmarking

APAeval currently comprises three benchmarking events, each made up of a set of challenges for bioinformatics methods (= participants) that use RNA-seq data to:

Identify polyadenylation sites
Report poly(A) site expression as absolute quantification in TPM
Report relative expression of poly(A) sites within transcripts
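To make the difference between absolute and relative quantification concrete, here is a minimal Python sketch. It assumes, for illustration only, that the relative expression of a site is its TPM divided by the summed TPM of all sites assigned to the same transcript; the authoritative definitions are in the APAeval specifications, and the function name and input layout are hypothetical.

```python
from collections import defaultdict


def relative_usage(site_tpm):
    """Convert absolute poly(A) site TPMs into within-transcript fractions.

    site_tpm maps (transcript_id, site_id) -> TPM. This layout is a
    hypothetical example, not an APAeval file format.
    """
    # Sum the TPM of all sites belonging to each transcript.
    totals = defaultdict(float)
    for (transcript_id, _site_id), tpm in site_tpm.items():
        totals[transcript_id] += tpm

    # Divide each site's TPM by its transcript total; sites of unexpressed
    # transcripts get 0.0 instead of triggering a division by zero.
    usage = {}
    for (transcript_id, site_id), tpm in site_tpm.items():
        total = totals[transcript_id]
        usage[(transcript_id, site_id)] = tpm / total if total > 0 else 0.0
    return usage


if __name__ == "__main__":
    example = {
        ("tx1", "pas_a"): 30.0,
        ("tx1", "pas_b"): 10.0,
        ("tx2", "pas_c"): 0.0,
    }
    # pas_a -> 0.75, pas_b -> 0.25, pas_c -> 0.0
    print(relative_usage(example))
```

Relative values of sites within one expressed transcript sum to 1, which is why a method can report relative quantification even when its absolute TPM estimates are scaled differently from other tools.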

We'd still like to set up a fourth event to evaluate tools that calculate differential usage of polyadenylation sites. If you'd like to contribute, continue reading below.

![schema][apa-eval-overview]

1. As described above, APAeval consists of three benchmarking events that evaluate different tasks the methods of interest (= participants) might be able to perform: PAS identification, absolute quantification, and relative quantification. A method can participate in one, two or all three events, depending on its functionality.
2. Raw data: For the challenges within the benchmarking events, APAeval uses data from several selected publications. Generally, one dataset (consisting of one or more samples) corresponds to one challenge (here, datasets for challenges x and y are depicted). All raw RNA-seq data is processed with nf-core/rnaseq for quality control and mapping. For each dataset we provide a matching ground truth file, created from 3′ end sequencing data from the same publication as the raw RNA-seq data, which is used in the challenges to assess the performance of participants. You can find an overview of RNA-seq and matching ground truth samples in [the APAeval Zenodo snapshot][apaeval-zenodo].
3. Sanctioned input files: The processed input data is made available in .bam format. Additionally, each dataset comes with a GENCODE annotation in .gtf format and, for participants that depend on predefined PAS, a reference PAS atlas in .bed format (not shown).
4. To evaluate each participant in the different challenges, a reusable [“method workflow”][apaeval-mwf-readme] has to be written in either [Snakemake][snakemake] or [Nextflow][nf]. This workflow must perform all pre- and post-processing steps needed to get from the input formats provided by APAeval (see 3.) to the output specified by APAeval in its metrics specifications (see 5.).
5. To ensure compatibility with the benchmarking workflows, APAeval provides [specifications for file formats][apaeval-specs] (output of method workflows = input for benchmarking workflows).
6. Within a benchmarking event, one or more challenges are performed. A challenge is primarily defined by the input dataset used for performance assessment. The results of a challenge (metrics) are computed for each participant within a ["benchmarking workflow"][apaeval-bwfs].
7. To compare the performance of participants, the results for each participant are uploaded to the OpenEBench (OEB) database, where the metrics for all participants are visualized per challenge.
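The identification challenges compare a participant's predicted sites with the ground-truth sites. The exact metrics are defined in the benchmarking workflows; the following Python sketch only illustrates one common approach. It assumes that a prediction counts as a true positive when an as-yet-unmatched ground-truth site on the same chromosome and strand lies within a fixed distance (a hypothetical window of 50 nt), and that each ground-truth site can be matched at most once. The function name and tuple layout are illustrative, not APAeval's API.

```python
from bisect import bisect_left
from collections import defaultdict


def match_sites(predicted, truth, window=50):
    """Greedily match predicted PAS to ground-truth PAS.

    Sites are (chrom, strand, position) tuples. Predictions are processed in
    input order; each takes the closest unmatched truth site on the same
    chrom/strand within `window` nt. Returns (precision, recall, f1).
    """
    # Index ground-truth positions per (chrom, strand), sorted for bisection.
    truth_pos = defaultdict(list)
    for chrom, strand, pos in truth:
        truth_pos[(chrom, strand)].append(pos)
    for positions in truth_pos.values():
        positions.sort()
    used = defaultdict(set)  # indices of truth sites already matched

    tp = 0
    for chrom, strand, pos in predicted:
        key = (chrom, strand)
        positions = truth_pos.get(key, [])
        # Scan candidate truth sites in [pos - window, pos + window].
        i = bisect_left(positions, pos - window)
        best = None
        while i < len(positions) and positions[i] <= pos + window:
            if i not in used[key]:
                if best is None or abs(positions[i] - pos) < abs(positions[best] - pos):
                    best = i
            i += 1
        if best is not None:
            used[key].add(best)
            tp += 1

    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    pred = [("chr1", "+", 100), ("chr1", "+", 1000), ("chr1", "-", 105)]
    gt = [("chr1", "+", 120), ("chr1", "-", 300)]
    print(match_sites(pred, gt))
```

Greedy matching keeps the sketch short; an optimal one-to-one assignment would need a bipartite matching step, and the window size strongly influences the resulting scores.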

What can you do?

Use a benchmarked method on your own RNA-seq data

Firstly, you might want to check our [manuscript][manuscript] or our [OpenEBench site][apaeval-oeb] to find the method that best suits your use case. Once you have decided on a method, head over to the [method workflows section in this repo][apaeval-mwf-readme] and follow the instructions in the README.md of the method of your choice. All our method workflows are built in either [Snakemake][snakemake] or [Nextflow][nf] and use [containers][docker] for individual steps to ensure reproducibility and reusability. For instructions on setting up a [conda environment][conda] for running APAeval workflows, see the APAeval conda environment section below.

You'll need to have your RNA-seq data ready in .bam format. No idea how to get there? You could check out the [nf-core][nf-core] [RNA-Seq analysis pipeline][nf-core-rna-seq] or other tools such as [ZARP][zarp].

Benchmark a new method

Have you developed a new computational method for investigating APA from RNA-seq data? Or are you interested in one of the tools we haven't managed to include in APAeval yet? We'd be very happy if you decided to contribute to APAeval!

To ensure reproducibility of the benchmarks, as well as reusability and shareability of the benchmarked method, start by writing an APAeval-style [method workflow][apaeval-mwf-readme]. That workflow takes .bam files as input and creates .bed files compatible with the [specification for the respective APAeval benchmarking event][apaeval-specs]. Create a pull request (PR) in this repo and wait for it to be approved; to do so, either ask in our [GitHub discussions board][discussions] to be added to APAeval as a collaborator, or create the PR from a fork. You can then run the workflow on the [data for all APAeval challenges][apaeval-zenodo] and use the resulting .bed files in the corresponding [APAeval benchmarking workflow][apaeval-bwfs] to compare the performance of your tool against the [APAeval ground truths][apaeval-zenodo]. Finally, you can submit your metrics .json files to us, and we'll take care of including them in our [OEB site][apaeval-oeb].
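The last post-processing step of a method workflow typically converts the tool's native output into BED. The Python sketch below assumes, purely for illustration, a hypothetical tab-separated tool output with chromosome, 1-based position, strand and TPM, and a six-column BED layout with one single-nucleotide interval per site and the TPM in the score column. The authoritative column definitions are in the APAeval specification for each event (note that the standard UCSC BED definition restricts the score column to integers from 0 to 1000). BED coordinates are 0-based and half-open, so a 1-based position p becomes start p - 1 and end p.

```python
import csv
import io


def native_to_bed(rows, name_prefix="PAS"):
    """Convert hypothetical (chrom, pos_1based, strand, tpm) rows to BED6 lines.

    Both the input layout and name_prefix are illustrative assumptions; the
    real column semantics come from the APAeval specification.
    """
    lines = []
    for i, (chrom, pos, strand, tpm) in enumerate(rows, start=1):
        pos = int(pos)
        if pos < 1:
            raise ValueError(f"position must be 1-based, got {pos}")
        if strand not in ("+", "-"):
            raise ValueError(f"invalid strand {strand!r}")
        # BED is 0-based, half-open: one nucleotide at 1-based pos p is [p - 1, p).
        start, end = pos - 1, pos
        fields = [chrom, str(start), str(end), f"{name_prefix}_{i}", str(float(tpm)), strand]
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    native = "chr1\t1000\t+\t12.5\nchr2\t500\t-\t3\n"
    reader = csv.reader(io.StringIO(native), delimiter="\t")
    print("\n".join(native_to_bed(reader)))
```

Validating strand and coordinates while writing catches malformed records early, before the file reaches a benchmarking workflow.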

Extend APAeval's benchmarks

One of the main goals of APAeval is to provide extensible benchmarking, so that new tools, challenges or metrics can be added at any time. We therefore warmly welcome any contribution to the project. A good starting point is our [issue][issues] and [discussion][discussions] boards. The discussion board is also where you can reach out to us and ask to be added to the repo as a collaborator (alternatively, create your PRs from a fork). You can then take on an existing task, suggest a new one, or start a discussion.

Some technical stuff

OpenEBench

We are partnering with [OpenEBench][oeb], a benchmarking and technical monitoring platform for bioinformatics tools. OpenEBench development, maintenance and operation are coordinated by the [Barcelona Supercomputing Center (BSC)][bsc] together with partners from the European life science infrastructure initiative [ELIXIR][elixir].

OpenEBench tooling will facilitate the computation and visualization of benchmarking results and store the results of all benchmarking events and challenges in its databases, making them easy for others to explore. This should also make it easy to add further participants to existing benchmarking events later on. OpenEBench developers are also advising us on creating benchmarks that are compatible with good practices in the wider community of bioinformatics challenges.

APAeval conda environment

For reproducible execution of our workflows (both method and benchmarking workflows), we use a conda environment with fixed versions of Snakemake, Nextflow, some Python packages, and Singularity. Make sure you have [conda][conda] installed, then create the APAeval environment from the root directory of this repo with:

```bash
conda env create -f apaeval_env.yaml
```

You can then activate it with:

```bash
conda activate apaeval
```

NOTE: Singularity runs natively only on Linux, so if you're working on Windows or macOS you might have to set up a virtual machine to run it.

ANOTHER NOTE: If you run into problems with root access and Singularity using the setup described above, try removing Singularity from apaeval_env.yaml and [installing it independently][singularity].
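Before running a workflow, it can help to confirm that the activated environment (or your separate Singularity installation) actually exposes the expected executables. This minimal sketch uses only the Python standard library; the tool list is an assumption based on the environment contents described above, using the standard executable names snakemake, nextflow and singularity.

```python
import shutil

# Executables assumed to be provided by the apaeval conda environment
# (or, for Singularity, by a separate system-wide installation).
REQUIRED_TOOLS = ("snakemake", "nextflow", "singularity")


def check_tools(tools=REQUIRED_TOOLS):
    """Return a mapping of tool name -> resolved path (None if not on PATH)."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    found = check_tools()
    for tool, path in found.items():
        print(f"{tool:12s} {path or 'NOT FOUND'}")
    missing = [tool for tool, path in found.items() if path is None]
    if missing:
        print("Missing tools:", ", ".join(missing))
```

shutil.which searches the current PATH, so run the check after conda activate apaeval to inspect the environment you will actually use.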

You can now execute the workflows!

Tutorials

Here are some pointers and tutorials for the main software tools that we are using at APAeval:

Conda: [tutorial][tutorial-conda]
Docker: [tutorial][tutorial-docker]
Git: [tutorial][tutorial-git]
GitHub: [general tutorial][tutorial-gh] / [GitHub flow tutorial][tutorial-gh-flow]
Nextflow: [tutorial][tutorial-nextflow]
Singularity: [tutorial][tutorial-singularity]
Snakemake: [tutorial][tutorial
