ViR
Pipeline developed by Elisa Pischedda while in the Bonizzoni Lab at the University of Pavia (Italy).
See the ViR paper in BMC Bioinformatics: Pischedda et al., 2021.
Purpose
ViR is designed to solve the dispersion of reads due to intrasample variability, enabling reliable identification of lateral gene transfer events, with a focus on viral integrations. When short paired-end reads are mapped to predict the sites of an integration event, repetitive elements and/or fragmentation of the genome assembly cause intra-host variability, which leads to the dispersion of reads across sequence-identical regions of the genome (hereafter called 'equivalent regions'). ViR solves this dispersion by recognizing the membership of scattered chimeric reads across equivalent genomic regions and reconstructing the insertion site from the available reads.

ViR is composed of four scripts, organized in two modules. The first three scripts, "ViR_RefineCandidates.sh", "ViR_SolveDispersion.sh" and "ViR_AlignToGroup.sh", work together to overcome the dispersion of reads due to intrasample variability (module 1). "ViR_LTFinder.sh" is designed to run independently of the others when testing for integration events of non-host sequences that have no or limited sequence similarity (threshold defined by the user) to host sequences (module 2).
Docker
A Dockerfile for ViR is available. The commands to run each script in Docker are given below.
Installation
Dependencies
ViR was evaluated in an Ubuntu 16.04 LTS Linux environment and uses the following programs, which must be installed; their paths are passed as values of the scripts' input parameters.
- PYTHON 2.7 https://www.python.org/download/releases/2.7/
- BEDTOOLS v2.25.0 https://github.com/arq5x/bedtools2/releases
- SAMTOOLS 1.4 https://sourceforge.net/projects/samtools/files/samtools/1.4/
- BLAST 2.6.0 ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
- BWA 0.7.15-r1140 https://github.com/lh3/bwa/releases
- TRINITY v2.7.0-PRERELEASE https://github.com/trinityrnaseq/trinityrnaseq/releases
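Some scripts take the paths of these programs as parameters (for example -path_to_blastn and -path_to_bedtools). As a minimal sketch, assuming the programs are already on your PATH, the hypothetical helper locate_tools below (not part of ViR) prints the resolved path of each tool and returns non-zero if any is missing. The executable names used in the usage line (python2.7, bedtools, samtools, blastn, makeblastdb, bwa, Trinity) are assumptions about a typical installation.

```bash
# Hypothetical helper (not part of ViR): print "<tool>\t<resolved path>" for every
# tool found on PATH, report missing tools on stderr, and return 1 if any is missing.
locate_tools() {
    local tool path status=0
    for tool in "$@"; do
        if path=$(command -v "$tool"); then
            printf '%s\t%s\n' "$tool" "$path"
        else
            printf 'not found: %s\n' "$tool" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, locate_tools python2.7 bedtools samtools blastn makeblastdb bwa Trinity lists each program with its path; the paths printed for blastn and bedtools are the kind of values expected by -path_to_blastn and -path_to_bedtools.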
Module-1: "ViR_RefineCandidates.sh"
File preparation
The following files are needed:
- A SAM file (SAM) generated by aligning the raw reads of the sample against the host reference genome. Example: './Example_files/ViR_inputFiles/SCL_VI600.sam'. The path of the SAM file must be used as the value of the -sam_file parameter in the "ViR_RefineCandidates.sh" script;
- A tab-separated table of the chimeric reads (TXT). Example: './Example_files/ViR_inputFiles/SCL_VI600_chimeric_reads.txt'. The path of the text file must be used as the value of the -chimeric_reads_file parameter in the "ViR_RefineCandidates.sh" script. Example:
#SAMPLE_ID READ_ID HR_CHR HR_START HR_MQ VR_SEQ VIRUS_ID VIRUS_START VIRUS_END VIRAL_SEQ
SSR4 E00338:95:HGNWJCCXY:1:1221:16640:64984 NW_021838865.1 556999 0 GTCATTGCCGCCATCATCAACGGCATTGAGTGGATCCGTGGCATGGGTGAGTGCCGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCTTTTTAAACATAGGGGGGTAATCTTCCCCCACGAACCCCCAAAAGAAAGGTTTGGTTGTGTCGGG NC_043569.1 826 863 TTTTTTTTTTTTTTTTTTTTTTTTTTTTCTTTTTAAA
- Host reference genome (FASTA) in fasta format. The path of the fasta file must be used as the value of the -reference_fasta parameter in the "ViR_RefineCandidates.sh" script. To use the example files, download the latest version of the Aedes albopictus reference genome and run the following command to trim the default scaffold names (everything from " Aedes" onward is removed from each header):
cat GCF_006496715.1_Aalbo_primary.1_genomic.fna | awk -F '\\ Aedes' '{print $1}' > GCF_006496715.1_Aalbo_primary.1_genomic_clean.fasta
The BLAST database of the fasta file is needed; you can produce it with the following command:
/absolute_path_to/ncbi-blast-2.6.0+/bin/makeblastdb -in /absolute_path_to/Host_Reference_Genome.fasta -dbtype nucl
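Before launching a run it can help to sanity-check these inputs. The helpers below are a hypothetical sketch (check_chimeric_table, count_unclean_headers and blast_db_present are not part of ViR); they assume the chimeric-reads table has exactly the 10 columns of the example header and that the BLAST database was built by makeblastdb under the name of the fasta file, as in the command above.

```bash
# Hypothetical pre-flight checks (NOT part of ViR) for ViR_RefineCandidates.sh inputs.

# 1) Every data row of the chimeric-reads table should have the 10 tab-separated
#    columns of the example header (SAMPLE_ID ... VIRAL_SEQ). Lines starting with
#    '#' and empty lines are skipped; offending lines are reported on stderr and
#    the function returns non-zero.
check_chimeric_table() {
    awk -F'\t' '
        /^#/ || NF == 0 { next }
        NF != 10 {
            printf "line %d: expected 10 columns, found %d\n", NR, NF > "/dev/stderr"
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# 2) After the header-trimming awk command, no FASTA header should still contain
#    a space. Prints how many headers do (0 means the file looks clean).
count_unclean_headers() {
    awk '/^>/ && / / { n++ } END { print n + 0 }' "$1"
}

# 3) makeblastdb writes <db>.nin for a single-volume nucleotide database, or a
#    <db>.nal alias file when a large genome is split into several volumes.
blast_db_present() {
    [ -e "$1.nin" ] || [ -e "$1.nal" ]
}
```

For example: check_chimeric_table /absolute_path_to/sample_chimeric_reads.txt && echo "table OK"; count_unclean_headers GCF_006496715.1_Aalbo_primary.1_genomic_clean.fasta should print 0; blast_db_present /absolute_path_to/Host_Reference_Genome.fasta || echo "run makeblastdb first". Checking for either .nin or .nal avoids a false alarm on a multi-gigabase assembly such as Aedes albopictus, where makeblastdb may split the database into volumes.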
Parameters
Fill in the appropriate parameter values; using absolute paths allows "ViR_RefineCandidates.sh" to be run from any directory.
| parameter name | description | default |
|-----------------------------------------|--------------------------------------------------------------------|---------|
| work_files_dir | absolute path of ViR directory | |
| sample_name | name of the sample | |
| sam_file | absolute path of the SAM file | |
| chimeric_reads_file | absolute path of the tab separated table of the chimeric reads | |
| out | absolute path of the output directory | |
| max_percentage_dinucleotide_in_ViralSeq | maximum percentage of dinucleotide accepted in the viral sequence | 0.8 |
| minimum_virus_len | minimum length of the viral sequence | 30 |
| reference_fasta | absolute path of the host reference genome | |
| path_to_blastn | absolute path of blastn executable | |
| path_to_bedtools | absolute path of bedtools executable | |
| blastn_evalue | maximum evalue for the reads alignment | 1e-15 |
| min_mate_distance | minimum distance between mates to be maintained | 10000 |
Running
Run the following command:
nohup bash /absolute_path_to/VIR-master/ViR_RefineCandidates.sh \
-work_files_dir /absolute_path_to/VIR-master/ \
-sample_name SAMPLE_ID \
-sam_file /absolute_path_to/sample_file.sam \
-chimeric_reads_file /absolute_path_to/sample_chimeric_reads.txt \
-out /absolute_path_to/output_directory_refineCandidates \
-max_percentage_dinucleotide_in_ViralSeq 0.8 \
-minimum_virus_len 30 \
-reference_fasta /absolute_path_to/Host_reference_genome.fasta \
-path_to_blastn /absolute_path_to/blastn \
-path_to_bedtools /absolute_path_to/bedtools \
-blastn_evalue 1e-15 \
-min_mate_distance 10000 &
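Because the command runs in the background under nohup, a typo in one of the absolute paths may only surface later in the log. As a hypothetical sketch (require_paths is not part of ViR), a small guard can report every missing path and refuse to start the job.

```bash
# Hypothetical helper (not part of ViR): print every argument that does not
# exist on disk and return 1 if any is missing, 0 otherwise.
require_paths() {
    local p missing=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            printf 'missing: %s\n' "$p" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example guard (placeholder paths taken from the command above):
# require_paths /absolute_path_to/sample_file.sam \
#               /absolute_path_to/sample_chimeric_reads.txt \
#               /absolute_path_to/Host_reference_genome.fasta \
#               /absolute_path_to/blastn /absolute_path_to/bedtools \
#   && nohup bash /absolute_path_to/VIR-master/ViR_RefineCandidates.sh ...
```

When standard output is a terminal, nohup appends the job's output to nohup.out in the current directory, so tail -f nohup.out is one way to follow progress.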
Docker command
Replace PATH_TO_REF_FILE_DIR with the absolute path of the local directory containing the reference file, and PATH_TO_OUTPUT_DIR with the absolute path of the local output directory; they are mounted in the container as /ref and /out, respectively.
docker run \
-v /PATH_TO_OUTPUT_DIR/:/out \
-v /PATH_TO_REF_FILE_DIR/:/ref \
-i pipelinetools:vir \
bash ViR_RefineCandidates.sh \
-work_files_dir ./ \
-sample_name SCL_VI600 \
-sam_file ./Example_files/ViR_inputFiles/SCL_VI600.sam \
-chimeric_reads_file ./Example_files/ViR_inputFiles/SCL_VI600_chimeric_reads.txt \
-out /out \
-max_percentage_dinucleotide_in_ViralSeq 0.8 \
-minimum_virus_len 30 \
-reference_fasta /ref/GCF_006496715.1_Aalbo_primary.1_genomic_clean.fasta \
-path_to_blastn blastn \
-path_to_bedtools bedtools \
-blastn_evalue 1e-15 \
-min_mate_distance 850
Output
"ViR_RefineCandidates.sh" outputs three files: "Final_ChimericPairs_Info.txt", "Final_HostReads.fasta" and "Final_ViralReads.fasta". Example output files are in the 'RC_OutputFiles' directory.
- "Final_ChimericPairs_Info.txt": a tab-delimited file with the following columns for the chimeric reads that pass the filters:
# SAMPLE_ID READ_ID HR_CHR HR_START HR_END HR_SEQ HR_FLAG HR_NT_BQ20 HR_MQ VR_SEQ VR_FLAG VR_NT_BQ20 VIRUS_ID VIRUS_START VIRUS_END VIRUS_SEQ VIRUS_SEQ_LEN VR_AlignToRef?
SSR4 E00338:95:HGNWJCCXY:4:1122:28716:42481 NW_021837046.1 23938349 23938500 TGTTCTGGGCGGTGGAACGCCATCAGAAAGTTTTGTCCGCTTGCCTCGAAGCCGTAGCAGCATCAGTATTGTTGGGGGCGTATCGCGGCGTGATCCAAGATCTTCCGCCCGAAGTTTTGGTATTCTGTCTGACAATGGGGTGGAAGACGTC 73 147 0 GCTGCCACCGTCGGATGATTGGCTCTCTCTGCGGCCCAGATAGCCCCAGCGCTCGCTAAGGACACAGTCCAAAACCACACGTTCATTCCAGAGTCGCTGAATGAGCTCTTGGGCTCGCGCCAATTCCACACGAATCTGTAGAGCACGTTC 133 135 M91671.1 4038 4071 CCCAGCGCTCGCTAAGGACACAGTCCAAAACCA 33 Yes
- "Final_HostReads.fasta": the fasta file of the host reads in the sample;
- "Final_ViralReads.fasta": the fasta file of the viral reads in the sample.
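For a first look at these outputs, the hypothetical helpers below (not part of ViR) count chimeric pairs per host scaffold, count rows whose "VR_AlignToRef?" column is "Yes", and count records in the two fasta files. The column numbers (3 for HR_CHR, 18 for VR_AlignToRef?) are read off the header shown above; lines starting with '#' are assumed to be headers.

```bash
# Hypothetical summary helpers (NOT part of ViR) for ViR_RefineCandidates.sh outputs.

# Chimeric pairs per host scaffold (column 3, HR_CHR), most frequent first,
# ties broken by scaffold name. Output: "<scaffold>\t<count>" per line.
pairs_per_scaffold() {
    awk -F'\t' '!/^#/ && NF { n[$3]++ } END { for (c in n) print c "\t" n[c] }' "$1" |
        sort -t "$(printf '\t')" -k2,2nr -k1,1
}

# Rows whose column 18 ("VR_AlignToRef?") equals "Yes".
count_vr_align_to_ref() {
    awk -F'\t' '!/^#/ && $18 == "Yes" { n++ } END { print n + 0 }' "$1"
}

# Number of sequences in a FASTA file (e.g. Final_HostReads.fasta).
count_fasta_records() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}
```

For example, pairs_per_scaffold Final_ChimericPairs_Info.txt | head shows the scaffolds with the most supporting pairs, and count_fasta_records Final_ViralReads.fasta reports how many viral reads were retained.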
Module-1: "ViR_SolveDispersion.sh"
(including "ViR_AlignToGroup.sh")
File preparation
The following are needed:
- The output directory of ViR_RefineCandidates (DIRECTORY), i.e. the value of the -out parameter used in "ViR_RefineCandidates.sh". This directory must be set as the value of -outdict_RefCand in "ViR_SolveDispersion.sh";
- The list of the samples (TXT) to analyze together. The path of the text file must be used as the value of the -sample_list parameter in the "ViR_SolveDispersion.sh" script. One or more samples can be analyzed at the same time. Example: './Example_files/otherFiles/example_SD_sample_list.txt';
- Host reference genome (FASTA) in fasta format. The path of the fasta file must be used as the value of the -reference_fasta parameter in the "ViR_SolveDispersion.sh" script. This must be the same reference genome used for ViR_RefineCandidates;
- OPTIONAL: Transposable elements (FASTA) in fasta format. The path of the fasta file must be used as the value of the -repreg_fasta parameter in the "ViR_SolveDispersion.sh" script. If -repreg_fasta is used, also set the -min_TE_al_length parameter. The BLAST database of the fasta file is needed; you can produce it with the following command:
/absolute_path
