FLARE

RNA edit detection (SAILOR) and peak calling (FLARE)

FLagging Areas of RNA-editing Enrichment (FLARE)

We present FLagging Areas of RNA-editing Enrichment (FLARE), a Snakemake-based pipeline that uses a statistical approach to determine regions of enriched RNA editing, using SAILOR-derived editing sites as a starting point. FLARE is configurable for use with any type of base pair change – we include with this release of FLARE an update of SAILOR to enable detection of all edit types.

Requirements

  • Your system must run Linux, CentOS 7 or newer.

  • Make sure that the environment your SAILOR and FLARE processing pipelines will run in has snakemake installed (https://snakemake.readthedocs.io/en/v5.6.0/getting_started/installation.html).

  • You will also need Singularity installed for several steps of the pipeline to work. We have created Singularity images containing all necessary Python packages; they are loaded automatically while the pipeline runs.

Running the SAILOR (edit site finding) snakemake pipeline

Before you start:

  • To get the known SNPs bedfile for all chromosomes, download the appropriate individual chromosome bedfiles, for example from here: https://ftp.ncbi.nih.gov/snp/organisms/, then combine them all (remembering to remove any header lines and to retain only the chrom, start and end columns), i.e.:

    for b in *.bed; do echo "$b"; tail -n +2 "$b" | cut -f1,2,3 | sort >> mm10_dbsnp_combined.bed3; done
    

    Make sure that your chromosome nomenclature is the same as in the fasta file you are using ("chr1" vs "1")! This file should then contain lots of lines like this, tab-separated, without headers:

    1 1334235 1334236 ...
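    The same combining step can also be sketched in Python, which makes it easier to fix the chromosome naming at the same time. This is a minimal sketch, not part of FLARE: the function names and the use_chr_prefix flag are hypothetical, it assumes each input file has exactly one header line, and unlike the shell loop above it does not sort the records. It also does not handle special cases such as "chrM" versus "MT".

```python
from pathlib import Path


def normalize_chrom(name, use_chr_prefix):
    """Add or strip a leading 'chr' so names match the reference fasta."""
    bare = name[3:] if name.startswith("chr") else name
    return f"chr{bare}" if use_chr_prefix else bare


def combine_snp_beds(bed_paths, out_path, use_chr_prefix, skip_header_lines=1):
    """Write chrom/start/end from every input bedfile to one headerless file.

    Returns the number of records written.
    """
    written = 0
    with open(out_path, "w") as out:
        for bed in sorted(Path(p) for p in bed_paths):
            with open(bed) as handle:
                for i, line in enumerate(handle):
                    # Skip the header (assumed to be one line) and blank lines.
                    if i < skip_header_lines or not line.strip():
                        continue
                    chrom, start, end = line.rstrip("\n").split("\t")[:3]
                    chrom = normalize_chrom(chrom, use_chr_prefix)
                    out.write(f"{chrom}\t{start}\t{end}\n")
                    written += 1
    return written
```

    Set use_chr_prefix to True if your fasta uses "chr1" and to False if it uses "1".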

Parameters

All SAILOR configuration information must be saved in a .json file with the following contents:

{
  "samples_path": "/path/to/aligned/bams/",
  "samples": [
    "sample1.sortedByCoord.out.bam",
    "sample2.sortedByCoord.out.bam",
    ....,
    etc.
  ],
  "reverse_stranded": true,
  "reference_fasta": "/path/to/fasta/used/to/align/genome.fa",
  "known_snps": "/path/to/known/common/snps/file/for/organism/in/chrom/start/end/format/b151_GRCh38p7_common.bed3",
  "edit_type": "CT", (or "AG", "TC", etc.)
  "output_dir": "/path/to/output/directory"
}

Create your .json config file and call it something sensible based on your experiment, for example 'sailor.json'. Multiple .bam files contained in one directory can be processed with one run of the SAILOR pipeline.
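If you would rather generate the config than write it by hand, the following is a minimal sketch using only the Python standard library. The helper names are hypothetical and not part of FLARE. It assumes every file ending in .bam in the directory should be processed, that an edit type is two different bases, and it adds a trailing slash to samples_path only to mirror the example above (whether the pipeline requires it is not stated here).

```python
import json
from pathlib import Path

# Keys taken from the example SAILOR config above.
REQUIRED_KEYS = (
    "samples_path", "samples", "reverse_stranded", "reference_fasta",
    "known_snps", "edit_type", "output_dir",
)


def build_sailor_config(bam_dir, reference_fasta, known_snps, edit_type,
                        output_dir, reverse_stranded=True):
    """Collect every .bam file in bam_dir into a SAILOR-style config dict."""
    bam_dir = Path(bam_dir).resolve()
    samples = sorted(p.name for p in bam_dir.glob("*.bam"))
    if not samples:
        raise ValueError(f"no .bam files found in {bam_dir}")
    # Assumption: an edit type is two distinct bases, e.g. "CT" or "AG".
    if (len(edit_type) != 2 or edit_type[0] == edit_type[1]
            or not set(edit_type) <= set("ACGT")):
        raise ValueError(f"unexpected edit type: {edit_type!r}")
    return {
        "samples_path": f"{bam_dir}/",
        "samples": samples,
        "reverse_stranded": reverse_stranded,
        "reference_fasta": str(reference_fasta),
        "known_snps": str(known_snps),
        "edit_type": edit_type,
        "output_dir": str(output_dir),
    }


def write_sailor_config(config, path):
    """Check that all expected keys are present, then write valid JSON."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config is missing keys: {missing}")
    Path(path).write_text(json.dumps(config, indent=2) + "\n")
```

Writing the file through json.dumps guarantees that the result parses as JSON, which the annotated template above does not.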

To run a snakemake pipeline, there are a few parameters that snakemake needs to know about. The first is which Snakefile to use: the Snakefile contains the instructions for running each step of the pipeline, and for the SAILOR pipeline it will be found at your local version of /FLARE/workflow_sailor/Snakefile. The second is which config file to use: this is the config file you just created, which contains the parameters particular to your run of the pipeline. Always use absolute paths for both. You will also need to tell snakemake to use singularity, and specify singularity arguments allowing the virtual environments to have access to your local filesystem. The "bind" parameters should reflect locations that the pipeline should have access to, for example folders containing relevant input bams, fastas, gtfs or dbsnp files. So, an example snakemake run could look like the following:

snakemake --snakefile /full/path/to/FLARE/workflow_sailor/Snakefile --configfile /full/path/to/your/config/file/sailor.json --verbose --use-singularity --singularity-args '--bind /home --bind /projects' -j1

However, this will launch the snakemake pipeline on your head node, and all subsequent snakemake jobs (which can number into the hundreds depending on how many samples you are processing) will also be run there.
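A common failure is a config path that lies outside every bound directory. The following minimal sketch checks for that before launching; the function names are hypothetical and not part of FLARE, the list of path-valued keys is taken from the example config above, and the check is purely textual (it does not resolve symlinks).

```python
from pathlib import PurePosixPath

# Config keys from the SAILOR example that hold filesystem paths.
PATH_KEYS = ("samples_path", "reference_fasta", "known_snps", "output_dir")


def config_paths(config):
    """Return the path-valued entries of a SAILOR-style config dict."""
    return [config[key] for key in PATH_KEYS if key in config]


def unbound_paths(paths, bind_dirs):
    """Return the paths that are not inside any of the bound directories."""
    binds = [PurePosixPath(b) for b in bind_dirs]
    missing = []
    for p in paths:
        path = PurePosixPath(p)
        # A path is covered if it is a bound directory or sits below one.
        if not any(path == b or b in path.parents for b in binds):
            missing.append(p)
    return missing
```

For the example command above, calling unbound_paths(config_paths(config), ["/home", "/projects"]) returns the config paths that would need an additional --bind entry.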

Instructions on HPCC (cluster with submission):

If you have access to a high performance compute cluster, you probably want jobs to be automatically submitted to this system instead so that you can take full advantage of the parallelization built into this pipeline, especially if you are analyzing many samples. Cluster submission information can be placed into a "profile" file. In this case, you can model your profile file on the file at /full/path/to/FLARE/profiles/tscc_sailor/config.yaml, which by default has the following contents:

cluster: "qsub -N {rule}.{wildcards} -l nodes=1:ppn={params.threads},walltime={params.run_time} -A yeo-group -q home -V -t 0"
verbose: true
notemp: false
latency: 300
printshellcmds: true
directory: .
snakefile: /path/to/FLARE/workflow_sailor/Snakefile
use-singularity: true
singularity-args: '--bind /oasis --bind /projects --bind /home'
jobs: 8
skip-script-cleanup: true
singularity-prefix: /projects/ps-yeolab4/software/stamp/0.99.0/bin/.singularity
nolock: true

  • Change "cluster" to be relevant to your cluster system, specifically changing the names of the parameters to match your system's requirements.
    • When in doubt, just hardcode the processor count (in this example 'ppn') to 1 and the run time (in this example 'walltime') to 5 hours. You can increase the walltime value if your job is running out of time.
  • Change "directory" to be the full path to the directory where you want log files for each step to be deposited during the snakemake run
  • Change "snakefile" to the absolute path for your version of the workflow_sailor Snakefile, as mentioned earlier
  • Change "singularity-args" to include the correct directory binding relevant to your system so snakemake can find all necessary files
  • Change "singularity-prefix" to reflect the absolute path to where you want the singularity images used for the run to be stored (should have a lot of space)
  • Change "jobs" to reflect the maximum number of jobs you want submitted to your cluster simultaneously
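The edits listed above can also be applied programmatically. This is a minimal sketch that only handles flat "key: value" lines like those in the example profile; it is not a YAML parser, and the function name is hypothetical. New values are written with json.dumps, which produces double-quoted strings, numbers and true/false that are also valid YAML.

```python
import json


def customize_profile(template_text, overrides):
    """Return template_text with the values of the given top-level keys replaced.

    Raises KeyError if an override names a key the template does not contain.
    """
    remaining = dict(overrides)
    out_lines = []
    for line in template_text.splitlines():
        # Split at the first colon only, so values containing ':' stay intact.
        key, sep, _ = line.partition(":")
        if sep and key in remaining:
            out_lines.append(f"{key}: {json.dumps(remaining.pop(key))}")
        else:
            out_lines.append(line)
    if remaining:
        raise KeyError(f"keys not found in template: {sorted(remaining)}")
    return "\n".join(out_lines) + "\n"
```

For instance, passing {"jobs": 4, "directory": "/full/path/to/logs"} rewrites only those two lines and leaves the "cluster" line untouched.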

Put your profile configuration file, named config.yaml, in a new folder, for example /full/path/to/FLARE/profiles/my_profile/. With more run information tucked away in the profile, the snakemake launch command becomes simpler because it can read those parameters from the profile (note that you specify the folder containing config.yaml, not the file itself):

snakemake --profile /full/path/to/FLARE/profiles/my_profile/ --configfile /full/path/to/your/config/file/sailor.json

Running that command should launch your SAILOR run. An example set of completed outputs from a successful SAILOR run, using the small .bam file and config inputs found in the "examples" folder, looks like this. Note that these folders are not necessarily created in numerical order. The final outputs are the .ranked.bed files.

1_split_strands
3_index_reads
4_filter_reads
5_pileup_reads
6_vcfs
7_scored_outputs
8_bw_and_bam
9_edit_fraction_bedgraphs
subsampled.bam.combined.readfiltered.formatted.varfiltered.snpfiltered.ranked.bed

Running the FLARE (cluster identification) snakemake pipeline

FLARE can be run in two modes:

  • Cluster Identification Mode:
      • Determines edit cluster locations along with a confidence score for prioritizing and filtering clusters
  • Edit Fraction Mode:
      • Calculates edit fractions within predetermined genomic regions

To give a scenario where you might find the second mode useful, let's say you are comparing editing between two samples, A and B, expected to have different levels of editing, or different edited locations. You would first use the Cluster Identification Mode to find the edit clusters in A and B. Then, after inspecting the resulting clusters and possibly filtering them, you can make a merged regions file containing the union of edited regions found in A and B. Then, you'd run FLARE again for each sample, but this time in Edit Fraction Mode, providing this merged regions file as a parameter. This would generate, for each sample, an output file containing the edit fraction at each of the regions in the merged regions. Downstream analyses can then be implemented to assess statistical significance of differential editing at the same loci across samples or conditions.
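The "union of edited regions" step amounts to an interval merge. Below is a minimal sketch, assuming each cluster has already been loaded as a (chrom, start, end, strand) tuple with BED-style half-open coordinates; the exact file format FLARE expects for the merged regions is not described here, and the function name is hypothetical.

```python
from collections import defaultdict


def merge_regions(*region_sets):
    """Merge overlapping or touching regions on the same chromosome and strand.

    Each region set is an iterable of (chrom, start, end, strand) tuples.
    """
    by_key = defaultdict(list)
    for regions in region_sets:
        for chrom, start, end, strand in regions:
            by_key[(chrom, strand)].append((start, end))

    merged = []
    for (chrom, strand), intervals in sorted(by_key.items()):
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps or touches the current region: extend it.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end, strand))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end, strand))
    return merged
```

Calling merge_regions(clusters_a, clusters_b) gives one list covering every region found in either sample, so that both samples are later scored over identical coordinates.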

Before you start:

To run the FLARE pipeline, you will first need a set of files specifying the genomic regions in which cluster identification will occur. To generate these files, use the script in workflow_FLARE/scripts called generate_regions.py.

Copy this script to wherever you'd like to generate the helper files, which will take up about 8-10 GB of space. Then run the script by typing generate_regions.py <full/path/to/your/genome/gtf/file> <genome_name>_regions

The .gtf file you use should include gene and exon level information, i.e. the third column should at least contain the descriptors "gene" and "exon."

For example, if you run the following command: generate_regions.py <full/path/to/your/genome/gtf/mm10.gtf> mm10_regions

then once the script finishes running, you will see a new folder called mm10_regions, and within that folder a slew of files with increasing indices, i.e. mm10_regions_0, mm10_regions_1...
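Since generating the regions takes considerable disk space, it may be worth confirming first that the .gtf file really contains "gene" and "exon" features in its third column. This is a minimal sketch, independent of generate_regions.py, with hypothetical function names:

```python
from collections import Counter


def count_gtf_features(lines):
    """Count the feature types found in the third column of GTF lines."""
    counts = Counter()
    for line in lines:
        # Skip comment lines (starting with '#') and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


def missing_features(counts, required=("gene", "exon")):
    """Return the required feature types that never occur."""
    return [feature for feature in required if counts[feature] == 0]
```

To use it, open the .gtf file, pass the file handle to count_gtf_features, and check that missing_features returns an empty list.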

Parameters

All FLARE configuration information must be saved in a .json file with the following contents:

Cluster Identification Mode

{
    "label": "label_for_this_sample",
    "output_folder": "/absolute/path/to/folder/outputs_where_all_samples_will_be_placed/FLARE_outputs/",
    "stamp_sites_file": "/absolute/path/to/sailor/output/this_sample.bam.combined.readfiltered.formatted.varfiltered.snpfiltered.ranked.bed",
    "forward_bw": "/absolute/path/to/sailor/output/8_bw_and_bam/this_sample.sortedByCoord.out.bam.fwd.sorted.bw",
    "reverse_bw": "/absolute/path/to/sailor/output/8_bw_and_bam/this_sample.sortedByCoord.out.bam.rev.sorted.bw",
    "fasta": "/path/to/fasta/used/to/align/genome.fa",
    "regions": "/absolute/path/to/regions/folder",  # Generated using the script described above
    "edit_typ
