ViR

Pipeline developed by Elisa Pischedda, while in the Bonizzoni Lab at the University of Pavia (Italy).

See the ViR paper in BMC Bioinformatics: Pischedda et al., 2021.

Purpose

ViR is designed to solve the dispersion of reads caused by intrasample variability, so that lateral gene transfer events, with a focus on viral integrations, can be identified reliably. Repetitive elements and/or fragmentation of a genome assembly create intra-host variability. When short paired-end reads are mapped to predict the sites of an integration event, this variability disperses the reads across sequence-identical regions of the genome (hereafter called 'equivalent regions'). ViR resolves this dispersion by recognizing which scattered chimeric reads belong to the same set of equivalent genomic regions, and then reconstructs the insertion site from the available reads.
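The idea of pooling reads across equivalent regions can be illustrated with a small union-find sketch. This is not ViR's code. It assumes that pairs of host regions already judged sequence-identical are given as input (how ViR detects them is not shown here) and that each chimeric read has been assigned to one host region. The helper names and region/read identifiers are hypothetical.

```python
from collections import defaultdict


def group_equivalent_regions(regions, equivalent_pairs):
    """Merge regions into groups with union-find (hypothetical helper, not ViR code).

    regions: iterable of region IDs (e.g. "scaffold:start-end" strings).
    equivalent_pairs: iterable of (region_a, region_b) judged sequence-identical.
    Returns a list of sets, ordered by the smallest region ID in each set.
    """
    parent = {r: r for r in regions}

    def find(r):
        # Walk up to the root, halving the path along the way.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in equivalent_pairs:
        # Regions that appear only in a pair are added on the fly.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for r in parent:
        groups[find(r)].add(r)
    return sorted(groups.values(), key=min)


def reads_per_group(read_to_region, groups):
    """Pool read IDs by the index of the group containing each read's region."""
    region_to_group = {r: i for i, g in enumerate(groups) for r in g}
    pooled = defaultdict(list)
    for read_id, region in read_to_region.items():
        pooled[region_to_group[region]].append(read_id)
    return dict(pooled)


if __name__ == "__main__":
    groups = group_equivalent_regions(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
    print(groups)
    print(reads_per_group({"r1": "A", "r2": "C", "r3": "D"}, groups))
```

Here reads r1 and r2 map to different copies (A and C) of the same sequence, so they are pooled into one group. This pooling is what allows a single insertion site to be reconstructed from reads that the mapper spread over several loci.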

Figure 1: ViR overview

ViR is composed of four scripts, which are organized in two modules. The first three scripts, "ViR_RefineCandidates.sh", "ViR_SolveDispersion.sh" and "ViR_AlignToGroup.sh", work together to overcome the dispersion of reads due to intrasample variability (module 1). "ViR_LTFinder.sh" is designed to run independently of the others when testing for integration events of non-host sequences that have no or limited sequence similarity (threshold defined by the user) to host sequences (module 2).

Docker

A Dockerfile for ViR is available. The Docker commands to run each script are given below.

Installation

Dependencies

ViR was evaluated in an Ubuntu 16.04 LTS Linux environment and uses the following programs. They must be installed, and their paths must be passed as values of the scripts' input parameters.

  • PYTHON 2.7 https://www.python.org/download/releases/2.7/
  • BEDTOOLS v2.25.0 https://github.com/arq5x/bedtools2/releases
  • SAMTOOLS 1.4 https://sourceforge.net/projects/samtools/files/samtools/1.4/
  • BLAST 2.6.0 ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
  • BWA 0.7.15-r1140 https://github.com/lh3/bwa/releases
  • TRINITY v2.7.0-PRERELEASE https://github.com/trinityrnaseq/trinityrnaseq/releases
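Because each script receives executable paths as parameters, it can help to check beforehand that the tools are reachable. The sketch below is a Python 3 helper, separate from ViR's own Python 2.7 requirement. It only searches the PATH and does not check versions. The executable names in `REQUIRED_TOOLS` are assumptions and should be adjusted to the local installation.

```python
import shutil

# Executable names are assumptions; adjust them to your installation.
REQUIRED_TOOLS = ["python2.7", "bedtools", "samtools", "blastn", "makeblastdb", "bwa", "Trinity"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

A tool that is missing from the PATH can still be used, as long as its absolute path is passed to the corresponding script parameter (for example -path_to_blastn).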


Module-1: "ViR_RefineCandidates.sh"


Files preparation

The following files are needed:

  1. A SAM file (SAM) generated by aligning the raw reads of the sample against the host reference genome. Example: './Example_files/ViR_inputFiles/SCL_VI600.sam'. Use the path of the SAM file as the value of the parameter -sam_file in the "ViR_RefineCandidates.sh" script;

  2. A tab-separated table of the chimeric reads (TXT). Example: './Example_files/ViR_inputFiles/SCL_VI600_chimeric_reads.txt'. Use the path of the text file as the value of the parameter -chimeric_reads_file in the "ViR_RefineCandidates.sh" script. Format example:

#SAMPLE_ID	READ_ID	HR_CHR	HR_START	HR_MQ	VR_SEQ	VIRUS_ID	VIRUS_START	VIRUS_END	VIRAL_SEQ
SSR4	E00338:95:HGNWJCCXY:1:1221:16640:64984	NW_021838865.1	556999	0	GTCATTGCCGCCATCATCAACGGCATTGAGTGGATCCGTGGCATGGGTGAGTGCCGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCTTTTTAAACATAGGGGGGTAATCTTCCCCCACGAACCCCCAAAAGAAAGGTTTGGTTGTGTCGGG	NC_043569.1	826	863	TTTTTTTTTTTTTTTTTTTTTTTTTTTTCTTTTTAAA
  3. Host reference genome (FASTA). Use the path of the FASTA file as the value of the parameter -reference_fasta in the "ViR_RefineCandidates.sh" script. To use the example files, download the latest version of the Aedes albopictus reference genome and run the following command, which truncates each default scaffold name at " Aedes":
cat GCF_006496715.1_Aalbo_primary.1_genomic.fna | awk -F '\\ Aedes' '{print $1}' > GCF_006496715.1_Aalbo_primary.1_genomic_clean.fasta

A BLAST database of the FASTA file is also needed. It can be produced with the following command:

/absolute_path_to/ncbi-blast-2.6.0+/bin/makeblastdb -in /absolute_path_to/Host_Reference_Genome.fasta -dbtype nucl
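For environments without awk, the scaffold-name cleanup of item 3 can be reproduced with the Python sketch below. It assumes, as the awk command does, that everything from the first " Aedes" onward on a line should be dropped. Like awk, it processes every line, so sequence lines, which never contain " Aedes", pass through unchanged. The function names are hypothetical.

```python
def clean_fasta_lines(lines, separator=" Aedes"):
    """Mimic `awk -F ' Aedes' '{print $1}'`: keep each line up to the first separator."""
    for line in lines:
        yield line.rstrip("\n").split(separator, 1)[0] + "\n"


def clean_fasta_file(src_path, dst_path, separator=" Aedes"):
    """Write a copy of src_path with truncated headers to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(clean_fasta_lines(src, separator))
```

Usage: `clean_fasta_file("GCF_006496715.1_Aalbo_primary.1_genomic.fna", "GCF_006496715.1_Aalbo_primary.1_genomic_clean.fasta")`, followed by makeblastdb on the cleaned file as shown above.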

Parameters

Fill in the appropriate parameter values. Using absolute paths allows "ViR_RefineCandidates.sh" to be run from any directory.

| parameter name | description | default |
|-----------------------------------------|--------------------------------------------------------------------|---------|
| work_files_dir | absolute path of the ViR directory | |
| sample_name | name of the sample | |
| sam_file | absolute path of the SAM file | |
| chimeric_reads_file | absolute path of the tab-separated table of the chimeric reads | |
| out | absolute path of the output directory | |
| max_percentage_dinucleotide_in_ViralSeq | maximum percentage of dinucleotides accepted in the viral sequence | 0.8 |
| minimum_virus_len | minimum length of the viral sequence | 30 |
| reference_fasta | absolute path of the host reference genome | |
| path_to_blastn | absolute path of the blastn executable | |
| path_to_bedtools | absolute path of the bedtools executable | |
| blastn_evalue | maximum e-value for the read alignment | 1e-15 |
| min_mate_distance | minimum distance between mates to be maintained | 10000 |
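To show how the two viral-sequence filters might act on the chimeric-read table, here is a hedged Python sketch. The parser follows the tab-separated format above, with a header line starting with "#" and the VIRAL_SEQ column. The dinucleotide metric is an assumption. This README does not define it, so the sketch reads it as the fraction of the viral sequence made up of its two most frequent nucleotides, which flags low-complexity stretches such as poly-T or AT repeats. ViR's actual computation may differ.

```python
from collections import Counter


def read_chimeric_table(lines):
    """Parse the tab-separated chimeric-read table (header line starts with '#')."""
    rows = iter(lines)
    header = next(rows).lstrip("#").strip().split("\t")
    return [dict(zip(header, line.rstrip("\n").split("\t"))) for line in rows if line.strip()]


def top_two_base_fraction(seq):
    """Assumed metric: fraction of the sequence made up of its two most frequent nucleotides."""
    if not seq:
        return 1.0
    counts = Counter(seq.upper())
    return sum(n for _, n in counts.most_common(2)) / len(seq)


def passes_filters(row, max_percentage_dinucleotide=0.8, minimum_virus_len=30):
    """Hypothetical reading of the two viral-sequence filters of ViR_RefineCandidates."""
    seq = row["VIRAL_SEQ"]
    return len(seq) >= minimum_virus_len and top_two_base_fraction(seq) <= max_percentage_dinucleotide
```

Usage: `with open("SCL_VI600_chimeric_reads.txt") as fh: kept = [r for r in read_chimeric_table(fh) if passes_filters(r)]`. Under this reading, the VIRAL_SEQ of the example row above (only T and A apart from a single C) would be rejected.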


Running

Run the following command:

nohup bash /absolute_path_to/VIR-master/ViR_RefineCandidates.sh \
-work_files_dir /absolute_path_to/VIR-master/ \
-sample_name SAMPLE_ID \
-sam_file /absolute_path_to/sample_file.sam \
-chimeric_reads_file /absolute_path_to/sample_chimeric_reads.txt \
-out /absolute_path_to/output_directory_refineCandidates \
-max_percentage_dinucleotide_in_ViralSeq 0.8 \
-minimum_virus_len 30 \
-reference_fasta /absolute_path_to/Host_reference_genome.fasta \
-path_to_blastn /absolute_path_to/blastn \
-path_to_bedtools /absolute_path_to/bedtools \
-blastn_evalue 1e-15 \
-min_mate_distance 10000 &
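When the run is scripted rather than typed by hand, the same command line can be assembled from a dictionary of parameters. The Python sketch below only builds and prints the argument list, using the single-dash parameter names and placeholder paths from the command above. The helper name is hypothetical. Unlike the nohup/& form, passing the list to `subprocess.run(cmd, check=True)` would run the script in the foreground. `shlex.join` requires Python 3.8 or later.

```python
import shlex


def build_refine_candidates_cmd(script_path, params):
    """Turn a dict of ViR parameters into an argv list: bash script -name value ..."""
    cmd = ["bash", script_path]
    for name, value in params.items():
        cmd += ["-" + name, str(value)]
    return cmd


# Placeholder values copied from the README command; replace them with real paths.
PARAMS = {
    "work_files_dir": "/absolute_path_to/VIR-master/",
    "sample_name": "SAMPLE_ID",
    "sam_file": "/absolute_path_to/sample_file.sam",
    "chimeric_reads_file": "/absolute_path_to/sample_chimeric_reads.txt",
    "out": "/absolute_path_to/output_directory_refineCandidates",
    "max_percentage_dinucleotide_in_ViralSeq": 0.8,
    "minimum_virus_len": 30,
    "reference_fasta": "/absolute_path_to/Host_reference_genome.fasta",
    "path_to_blastn": "/absolute_path_to/blastn",
    "path_to_bedtools": "/absolute_path_to/bedtools",
    "blastn_evalue": 1e-15,
    "min_mate_distance": 10000,
}

if __name__ == "__main__":
    cmd = build_refine_candidates_cmd("/absolute_path_to/VIR-master/ViR_RefineCandidates.sh", PARAMS)
    print(shlex.join(cmd))
```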

Docker command

Set PATH_TO_REF_FILE_DIR to the absolute path of the local directory containing the reference file, and PATH_TO_OUTPUT_DIR to the absolute path of the local output directory:


docker run \
-v /PATH_TO_OUTPUT_DIR/:/out \
-v /PATH_TO_REF_FILE_DIR/:/ref \
-i pipelinetools:vir \
bash ViR_RefineCandidates.sh \
-work_files_dir ./ \
-sample_name SCL_VI600 \
-sam_file ./Example_files/ViR_inputFiles/SCL_VI600.sam \
-chimeric_reads_file ./Example_files/ViR_inputFiles/SCL_VI600_chimeric_reads.txt \
-out /out \
-max_percentage_dinucleotide_in_ViralSeq 0.8 \
-minimum_virus_len 30 \
-reference_fasta /ref/GCF_006496715.1_Aalbo_primary.1_genomic_clean.fasta \
-path_to_blastn blastn \
-path_to_bedtools bedtools \
-blastn_evalue 1e-15 \
-min_mate_distance 850

Output

"ViR_RefineCandidates.sh" outputs three files: "Final_ChimericPairs_Info.txt", "Final_HostReads.fasta" and "Final_ViralReads.fasta". Example output files are in the 'RC_OutputFiles' directory.

  1. "Final_ChimericPairs_Info.txt", a tab-delimited file with the following columns for the chimeric reads that pass the filters:
# SAMPLE_ID	READ_ID	HR_CHR	HR_START	HR_END	HR_SEQ	HR_FLAG	HR_NT_BQ20	HR_MQ	VR_SEQ	VR_FLAG	VR_NT_BQ20	VIRUS_ID	VIRUS_START	VIRUS_END	VIRUS_SEQ	VIRUS_SEQ_LEN	VR_AlignToRef?
SSR4	E00338:95:HGNWJCCXY:4:1122:28716:42481	NW_021837046.1	23938349	23938500	TGTTCTGGGCGGTGGAACGCCATCAGAAAGTTTTGTCCGCTTGCCTCGAAGCCGTAGCAGCATCAGTATTGTTGGGGGCGTATCGCGGCGTGATCCAAGATCTTCCGCCCGAAGTTTTGGTATTCTGTCTGACAATGGGGTGGAAGACGTC	73	147	0	GCTGCCACCGTCGGATGATTGGCTCTCTCTGCGGCCCAGATAGCCCCAGCGCTCGCTAAGGACACAGTCCAAAACCACACGTTCATTCCAGAGTCGCTGAATGAGCTCTTGGGCTCGCGCCAATTCCACACGAATCTGTAGAGCACGTTC	133	135	M91671.1	4038	4071	CCCAGCGCTCGCTAAGGACACAGTCCAAAACCA	33	Yes
  2. The fasta file of the host reads in the sample, "Final_HostReads.fasta";

  3. The fasta file of the viral reads in the sample, "Final_ViralReads.fasta".
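A quick way to inspect "Final_ChimericPairs_Info.txt" is to count the retained chimeric pairs per viral sequence. The Python sketch below relies only on the column names in the header shown above (VIRUS_ID and VR_AlignToRef?). The meaning of VR_AlignToRef? is not described in this section, so the sketch simply counts rows where it equals "Yes". The function name is hypothetical.

```python
from collections import Counter


def summarize_chimeric_pairs(lines):
    """Count chimeric pairs per VIRUS_ID and rows whose VR_AlignToRef? is 'Yes'."""
    rows = iter(lines)
    # The header may start with '# ', so strip the '#' and surrounding spaces.
    header = next(rows).lstrip("#").strip().split("\t")
    per_virus = Counter()
    align_yes = 0
    for line in rows:
        if not line.strip():
            continue
        row = dict(zip(header, line.rstrip("\n").split("\t")))
        per_virus[row["VIRUS_ID"]] += 1
        if row["VR_AlignToRef?"] == "Yes":
            align_yes += 1
    return per_virus, align_yes
```

Usage: `with open("Final_ChimericPairs_Info.txt") as fh: per_virus, n_yes = summarize_chimeric_pairs(fh)`.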



Module-1: "ViR_SolveDispersion.sh"

(including "ViR_AlignToGroup.sh")


Files preparation

The following inputs are needed:

  1. The output directory of ViR_RefineCandidates (DIRECTORY), i.e. the directory passed as the value of the parameter -out in "ViR_RefineCandidates.sh". Set this directory as the value of -outdict_RefCand in "ViR_SolveDispersion.sh";

  2. The list of the samples (TXT) to analyze together. Use the path of the text file as the value of the parameter -sample_list in the "ViR_SolveDispersion.sh" script. One or more samples can be analyzed at the same time. Example: './Example_files/otherFiles/example_SD_sample_list.txt';

  3. Host reference genome (FASTA). Use the path of the FASTA file as the value of the parameter -reference_fasta in the "ViR_SolveDispersion.sh" script. This must be the same reference genome used for ViR_RefineCandidates;

  4. OPTIONAL: transposable elements (FASTA). Use the path of the FASTA file as the value of the parameter -repreg_fasta in the "ViR_SolveDispersion.sh" script. If -repreg_fasta is used, also set the -min_TE_al_length parameter. A BLAST database of this FASTA file is needed. It can be produced with the following command:

/absolute_path
