SkillAgentSearch skills...

MixcrFilter

To filter out TCR clones which are falsely recognized by MiXCR from paralog sequences

Install / Use

/learn @littlegq/MixcrFilter
About this skill

Quality Score

0/100

Supported Platforms

Zed

README

MixcrFilter

MiXCR is a powerful computation program for TCR analysis. It can identify human TCR clones from raw sequences in a highly sensitive way. However, when the input sequences were not exclusively from TCR regions, some paralog sequences, such as those from immunoglobulin regions, could be mis-identified as TCR genes. Here we provide a bash script, named MixcrFilter, to filter out TCR clones which were falsely recognized by MiXCR from paralog sequences according to alignment results by BLAT.

Pre-required programs:

  1. MiXCR
  2. BLAT

Input

The sample name and paths to the input files need to be customized by editing Lines 5-10 of MixcrFilter.sh. The required input files are:

  1. Raw RNA-seq reads in FASTQ format: *.R1.fq.gz, *.R2.fq.gz
  2. Reference genome file, perferable hg38, which has better annotation of TCR sequences: hg38.fa

Output

A sub-directary with the files with the sample name as the prefix, and following suffix:

| Suffix | Description | | ----------------- | ----------------------- | | alignments.vdjca | Output from MiXCR align | | clones.clns | Output from MiXCR assemble | | clones.txt | Raw clones from MiXCR exportClones | | cloneseq.fa | FASTA file of all nucleotide sequences of CDR3 region | | cloneseq.psl | BLAT results of *.cloneseq.fa against reference genome | | blat1.clones.txt | Clones excluding the ones which CDR3-containing reads mapped to non-TCR regions | | blat2.clones.txt | Clones excluding the ones which CDR3 nt seq mapped to non-TCR regions | | alignments.txt | Alignment data from MiXCR exportAlignments | | alignedreads.fa | Original reads containing identified CDR3 sequences | | alignedreads.psl | BLAT results of *.alignedreads.fa against reference genome | | falseCDR3AA | A list of AA sequences of CDR3 from falsely mapped reads (Format: AA-seq, # In-target reads, # Out-target reads, In/Out Ratio) | | flt.clones.txt | Filtered clones |

View on GitHub
GitHub Stars4
CategoryDevelopment
Updated8mo ago
Forks2

Languages

R

Security Score

77/100

Audited on Jul 22, 2025

No findings