SpecHLA

SpecHLA reconstructs entire diploid sequences of HLA genes and infers LOH events. It supports HLA-A, -B, -C, -DPA1, -DPB1, -DQA1, -DQB1, and -DRB1 genes. Also, it supports both short- and long-read data.


SpecHLA: full-resolution HLA typing from sequencing data

SpecHLA is a software package that uses read binning and local assembly to achieve accurate full-resolution HLA typing and loss-of-heterozygosity detection.

SpecHLA reconstructs diploid sequences of HLA-A, -B, -C, -DPA1, -DPB1, -DQA1, -DQB1, and -DRB1 genes.
SpecHLA accepts short-reads-only, long-reads-only, and short+long-reads data.
SpecHLA supports WES, WGS, and RNA-seq data.
SpecHLA detects HLA loss-of-heterozygosity events.

Install

Option 1: Conda (recommended; best installed in a dedicated environment)

conda create -n spechla -c bioconda -c conda-forge spechla
conda activate spechla

After installation, run SpecHLA with:

spechla -h

For long-read-only typing:

spechla-long-read -h

If you only need long-read mode, you can install a minimal set of dependencies using long_only_env.yml instead of the full package.

Option 2: From source

First, create the conda environment and activate it. Run every SpecHLA script inside this environment, either through conda run or by activating the environment as shown below.

git clone https://github.com/deepomicslab/SpecHLA.git --depth 1
cd SpecHLA/
conda env create --prefix=./spechla_env -f environment.yml
conda activate ./spechla_env

Second, make the programs in bin/ executable.

chmod +x -R bin/*

Third, index the database and install the packages.

unset LD_LIBRARY_PATH && unset LIBRARY_PATH
bash index.sh

Run SpecHLA with

bash script/whole/SpecHLA.sh -h

With only long reads, run

python3 script/long_read_typing.py -h

Note:

SpecHLA optionally uses Novoalign for improved read mapping. If not detected, bowtie2 is used automatically.
- From source: Put novoalign.lic in the bin/ folder before running bash index.sh.
- Conda install: Install novoalign separately and place novoalign.lic next to the novoalign binary (e.g., $CONDA_PREFIX/bin/novoalign.lic).
If you want to run SpecHLA with only long-read data, you need neither the Novoalign license nor bash index.sh. Creating the environment with conda env create --prefix=./spechla_env -f long_only_env.yml is sufficient. To extract HLA-related reads, use the full environment.
SpecHLA now supports Linux and Windows WSL systems.
SpecHLA does not accept short single-end reads.
If installation fails, check your GCC version; it should be 9.4.0 or later. See https://github.com/deepomicslab/SpecHLA/issues/42
Clear any previous results before rerunning SpecHLA.
For long-read data, please use our new tool SpecImmune, which can type HLA, KIR, IG, TCR, and CYP genes.
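
The GCC requirement noted above can be checked before building from source. The following is a minimal sketch, not part of SpecHLA: the helper name version_ge is hypothetical, and it assumes GNU sort with the -V (version sort) option, which is standard on Linux and WSL.

```shell
# Hypothetical helper (not part of SpecHLA): succeeds when version $1 >= version $2.
# Assumes GNU `sort -V` for version-aware ordering.
version_ge() {
    local have="$1" need="$2"
    # The smaller version sorts first; if that is $need, then $have >= $need.
    [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Compare the local GCC version with the 9.4.0 minimum mentioned in the notes.
if command -v gcc >/dev/null 2>&1; then
    # -dumpfullversion prints major.minor.patch on recent GCC releases;
    # fall back to -dumpversion when it is not available.
    gcc_version="$(gcc -dumpfullversion 2>/dev/null || gcc -dumpversion)"
    if version_ge "$gcc_version" "9.4.0"; then
        echo "GCC $gcc_version meets the 9.4.0 minimum"
    else
        echo "GCC $gcc_version is older than 9.4.0" >&2
    fi
else
    echo "gcc not found on PATH" >&2
fi
```

On some distributions gcc -dumpversion reports only the major version, which is why the sketch tries -dumpfullversion first.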

Test

Go to the example/ folder, run SpecHLA with the provided scripts, and find the results in output/. The test_all.sh script runs all tests sequentially.

Basic Usage

Main functions

| Scripts | Conda command | Description |
| ------------------------------ | ------------------------------ | ----------------------------------------------------------------------- |
| script/ExtractHLAread.sh | spechla-extract-hla-reads | Extract HLA reads from enrichment-free data. |
| script/whole/SpecHLA.sh | spechla | HLA typing with paired-end (PE), PE+long reads, PE+HiC, or PE+10X data. |
| script/long_read_typing.py | spechla-long-read | HLA typing with only long-read data. |
| script/typing_from_assembly.py | spechla-assembly | HLA typing from diploid assemblies. |
| script/cal.hla.copy.pl | spechla-loh | Detect HLA LOH events based on SpecHLA's typing results. |

Extract HLA-related reads

For enrichment-free data, extract the HLA reads first; otherwise HLA typing will be slow. Map reads to hg19 or hg38, then use script/ExtractHLAread.sh (or spechla-extract-hla-reads with conda) to extract HLA-related reads. This step uses the Kourami script with minor revisions. Extract HLA-related reads by

# Conda
spechla-extract-hla-reads -s <sample_id> -b <bamfile> -r <refGenome> -o <outdir>

# From source
USAGE: bash script/ExtractHLAread.sh -s <sample_id> -b <bamfile> -r <refGenome> -o <outdir>

 -s          : desired sample name (ex: NA12878) [required]
 -b          : sorted and indexed bam or cram (ex: NA12878.bam) [required]
 -r          : hg38 or hg19
 -o          : folder to save extracted reads [required]
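
Because the extraction step expects a sorted and indexed alignment file and an hg19 or hg38 reference label, a small pre-flight check can catch mistakes early. The sketch below is illustrative only: the function names is_supported_ref and has_index are hypothetical, and the index file names it looks for are the common samtools conventions, not something SpecHLA documents.

```shell
# Hypothetical pre-flight checks (not part of SpecHLA).

# Succeeds only for the two reference labels listed for -r.
is_supported_ref() {
    case "$1" in
        hg19|hg38) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds when an index file sits next to the BAM/CRAM file.
# Assumption: the index uses one of the usual samtools naming schemes.
has_index() {
    local aln="$1"
    [ -f "${aln}.bai" ] || [ -f "${aln%.bam}.bai" ] ||
        [ -f "${aln}.crai" ] || [ -f "${aln}.csi" ]
}

# Example use (placeholders as in the usage text above):
#   is_supported_ref "$ref" && has_index "$bam" &&
#       spechla-extract-hla-reads -s "$sample" -b "$bam" -r "$ref" -o "$outdir"
```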

HLA Typing

SpecHLA performs full-resolution and exon HLA typing. Exome-type data such as WES or RNA-seq support exon typing only. For efficient HLA typing, we strongly recommend using only HLA-related reads; for enrichment-free data, first perform the extraction step described above.

Perform full-resolution HLA typing with paired-end reads by

# Conda
spechla -n <sample> -1 <sample.fq.1.gz> -2 <sample.fq.2.gz> -o outdir/

# From source
bash script/whole/SpecHLA.sh -n <sample> -1 <sample.fq.1.gz> -2 <sample.fq.2.gz> -o outdir/

Perform exon HLA typing with paired-end reads by

spechla -n <sample> -1 <sample.fq.1.gz> -2 <sample.fq.2.gz> -o outdir/ -u 1
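
Since WES and RNA-seq data support exon typing only, the value passed to -u can be derived from the data type. This is a minimal sketch under stated assumptions: the function name typing_mode and the labels WGS, WES, and RNA are hypothetical conventions for this example, and defaulting WGS to full-length typing is a choice, not a SpecHLA requirement.

```shell
# Hypothetical helper (not part of SpecHLA): print the -u value for a data type.
# 0 = full-length typing, 1 = exon typing, as described in the option list.
typing_mode() {
    case "$1" in
        WGS) echo 0 ;;       # whole-genome data can use full-length typing
        WES|RNA) echo 1 ;;   # exome-type data must use exon typing
        *) echo "unknown data type: $1" >&2; return 1 ;;
    esac
}

# Example use (placeholders as in the commands above):
#   spechla -n "$sample" -1 "$fq1" -2 "$fq2" -o outdir/ -u "$(typing_mode WES)"
```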

Perform full-resolution HLA typing with paired-end reads and PacBio reads by

spechla -n <sample> -t <sample.pacbio.fq.gz> -1 <sample.fq.1.gz> -2 <sample.fq.2.gz> -o outdir/

Perform full-resolution HLA typing with paired-end reads and Nanopore reads by

spechla -n <sample> -e <sample.nanopore.fq.gz> -1 <sample.fq.1.gz> -2 <sample.fq.2.gz> -o outdir/

Perform full-resolution HLA typing with paired-end reads and Hi-C reads by

bash script/whole/SpecHLA.sh -n <sample> -c <sample.hic.fwd.fq.gz> -d <sample.hic.rev.fq.gz> -1 <sample.fq.1.gz> -2 <sample.fq.2.gz> -o outdir/

Perform full-resolution HLA typing with paired-end reads and 10X linked reads (LongRanger must be installed in the system environment) by

bash script/whole/SpecHLA.sh -n <sample> -x <sample.10x.read.folder> -1 <sample.fq.1.gz> -2 <sample.fq.2.gz> -o outdir/

Consider long InDels and use population information for annotation by

bash script/whole/SpecHLA.sh -n <sample> -v True -p Asian -1 <sample.fq.1.gz> -2 <sample.fq.2.gz> -o outdir/
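
The -p option takes one of a fixed set of labels (Asian, Black, Caucasian, Unknown, nonuse), so a wrapper can normalize user input before calling SpecHLA. The sketch below is an assumption-laden illustration: normalize_population is a hypothetical name, and treating the input case-insensitively is a convenience of this example, not documented SpecHLA behavior.

```shell
# Hypothetical helper (not part of SpecHLA): map user input to a label that
# -p accepts, or fail for anything outside the documented list.
normalize_population() {
    local key
    key="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    case "$key" in
        asian) echo "Asian" ;;
        black) echo "Black" ;;
        caucasian) echo "Caucasian" ;;
        unknown) echo "Unknown" ;;
        nonuse) echo "nonuse" ;;
        *) echo "unsupported population: $1" >&2; return 1 ;;
    esac
}

# Example use (placeholders as in the commands above):
#   pop="$(normalize_population "$user_input")" &&
#       bash script/whole/SpecHLA.sh -n "$sample" -v True -p "$pop" \
#           -1 "$fq1" -2 "$fq2" -o outdir/
```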

The full list of arguments is shown in the help message:

SpecHLA: Full-resolution HLA typing from sequencing data.

Note:
  1) Use HLA reads only, otherwise, it would be slow. Use ExtractHLAread.sh to extract HLA reads first.
  2) WGS, WES, and RNASeq data are supported.
  3) With Exome data like WES or RNASeq, must select exon typing (-u 1).
  4) Short single-end read data are not supported.

Usage:
  bash SpecHLA.sh -n <sample> -1 <sample.fq.1.gz> -2 <sample.fq.2.gz> -o <outdir>

Options:
  -n        Sample ID. <required>
  -1        The first fastq file of paired-end data. <required>
  -2        The second fastq file of paired-end data. <required>
  -o        The output folder to store the typing results. Default is ./output
  -u        Choose full-length or exon typing [0|1]. 0 indicates full-length, 1 means exon,
            default is 0. With Exome or RNA data, must select 1 (i.e., exon typing).
  -p        The population of the sample [Asian, Black, Caucasian, Unknown, nonuse] for annotation.
            Default is Unknown, meaning use mean allele frequency in all populations. nonuse indicates
            only adopting mapping score and considering zero-frequency alleles.
  -j        Number of threads [5].
  -t        Pacbio fastq file.
  -e        Nanopore fastq file.
  -c        fwd hi-c fastq file.
  -d        rev hi-c fastq file.
  -x        Path of folder created by 10x demultiplexing. Prefix of the filenames of FASTQs
            should be the same as Sample ID. Please install Longranger in the system env.
  -w        The weight to use allele imbalance info for phasing [0-1]. Default is 0 that means
            not use. 1 means only use imbalance info; other values integrate reads and allele imbalance.
  -m        The maximum mismatch number tolerated in assigning gene-specific reads. Default
            is 2. It should be set larger to infer novel alleles.
  -y        The minimum different mapping score between the best and second-best aligned genes.
            Discard the read if the score is lower than this value. Default is 0.1.
  -v        True or False. Consider long InDels if True, else only consider small variants.
            Default is False.
  -q        Minimum variant quality. Default is 0.01. Set it larger in high quality samples.
  -s        Minimum variant depth. Default is 5.
  -a        Use this long InDel file if provided.
  -r        The minimum Minor Allele Frequency (MAF), default is 0.05 for full length and
            0.1 for exon typing.
  -k        The mean depth in a window lower than this value will be masked by N, default is 5.
            Set 0 to avoid masking.
  -z        Whether only mask exon region, True or False, default is False.
  -f        The trio information; child:parent_1:parent_2 [Example: NA12878:NA12891:NA12892]. If provided,
            use trio info to improve typing. Note: use it after performing SpecHLA once already.
  -b        Whether use database for unlinked block phasing [0|1], default is 1 (i.e., use).
  -i        Location of the IMGT/HLA database folder, default is db.
  -l        Whether remove all tmp files [0|1], default is 1.
  -h        Show this message.
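
When typing many samples, the required options above (-n, -1, -2, -o) can be filled in from a sample sheet. The following is a minimal dry-run sketch, not part of SpecHLA: the function name print_spechla_commands and the three-column tab-separated sheet layout (sample, first fastq, second fastq) are assumptions for this example. It only prints commands so they can be reviewed before running.

```shell
# Hypothetical batch helper (not part of SpecHLA): print one spechla command
# per row of a tab-separated sheet with columns: sample, fq1, fq2.
print_spechla_commands() {
    local sheet="$1" outdir="$2" sample fq1 fq2 tab
    tab="$(printf '\t')"
    # The `|| [ -n "$sample" ]` part keeps a final row that lacks a newline.
    while IFS="$tab" read -r sample fq1 fq2 || [ -n "$sample" ]; do
        # Skip blank lines and comment lines.
        [ -z "$sample" ] && continue
        case "$sample" in \#*) continue ;; esac
        printf 'spechla -n %s -1 %s -2 %s -o %s\n' "$sample" "$fq1" "$fq2" "$outdir"
    done < "$sheet"
}

# Example use: review the commands, then run them once they look right.
#   print_spechla_commands samples.tsv outdir/
#   print_spechla_commands samples.tsv outdir/ | bash
```

The printed commands are not shell-quoted, so this sketch assumes sample names and paths without spaces or special characters.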

HLA typing with long-read data alone

Perform HLA typing only with long reads by

# Conda
spechl
