HiTE: a fast and accurate dynamic boundary adjustment approach for full-length Transposable Elements detection and annotation in Genome Assemblies

HiTE is a Python tool that uses a dynamic boundary adjustment approach to detect and annotate full-length transposable elements (TEs) in genome assemblies. Compared with other tools, HiTE detects more full-length TEs.

panHiTE

We have developed panHiTE, a comprehensive and accurate pipeline for TE detection in large-scale population genomes. It has been successfully applied to hundreds of plant population genomes, demonstrating its effectiveness and scalability.

For detailed instructions, please refer to the panHiTE tutorial.

For an assessment of panHiTE's performance improvements over existing tools, please refer to the preprint article: https://www.biorxiv.org/content/10.1101/2025.02.15.638472v1.

<a name="install"></a>Installation

Recommended hardware: 40 CPU cores, 128 GB RAM.

Recommended OS: Ubuntu 16.04, CentOS 7, etc.

<a name="download"></a>Download project

git clone https://github.com/CSU-KangHu/HiTE.git
# Alternatively, you can download the zip file directly from the repository.

For common issues related to installation and usage, please visit: https://github.com/CSU-KangHu/HiTE/wiki/Issues-with-installation-and-usage

<a name="install_conda"></a>Option 1. Run with conda (recommended)

# The environment.yml file is in the project directory
cd HiTE
conda env create --name HiTE -f environment.yml
conda activate HiTE

python configure.py

source ~/.bashrc # or open a new terminal

# run HiTE
python main.py \
 --genome ${genome} \
 --thread ${thread} \
 --out_dir ${output_dir} \
 [other parameters]
 
 # e.g., my command: python main.py \
 # --genome /home/hukang/HiTE/demo/genome.fa \
 # --thread 40 \
 # --out_dir /home/hukang/HiTE/demo/test/

<a name="install_singularity"></a>Option 2. Run with Singularity

The provided container image may not always reflect the latest updates, which can lead to small bugs in some cases.

# pull the Singularity image (only needed once). This creates a HiTE.sif file.
singularity pull HiTE.sif docker://kanghu/hite:3.3.3

# run HiTE
singularity run -B ${host_path}:${container_path} ${pathTo/HiTE.sif} python /HiTE/main.py \
 --genome ${genome} \
 --thread ${thread} \
 --out_dir ${output_dir} \
 [other parameters]
 
 # (1) The option "-B" is used to specify directories to be mounted.
 #     It is recommended to set ${host_path} and ${container_path} to your user directory, and ensure 
 #     that all input and output files are located within the user directory.
 # (2) "python /HiTE/main.py" does not need to be changed.
 
 # e.g., my command: singularity run -B /home/hukang:/home/hukang /home/hukang/HiTE.sif python /HiTE/main.py \
 # --genome /home/hukang/HiTE/demo/genome.fa \
 # --thread 40 \
 # --out_dir /home/hukang/HiTE/demo/test/

<a name="install_docker"></a>Option 3. Run with Docker

The provided container image may not always reflect the latest updates, which can lead to small bugs in some cases.

# pull the Docker image (only needed once).
docker pull kanghu/hite:3.3.3

# run HiTE
docker run -v ${host_path}:${container_path} kanghu/hite:3.3.3 python main.py \
 --genome ${genome} \
 --thread ${thread} \
 --out_dir ${output_dir} \
 [other parameters]
 
 # (1) Since the default working directory is set to "/HiTE", we recommend specifying the options "--genome"
 #     and "--out_dir" as absolute paths.
 # (2) The option "-v" is used to specify directories to be mounted.
 #     It is recommended to set ${host_path} and ${container_path} to your user directory, and ensure 
 #     that all input and output files are located within the user directory.
 
 # e.g., my command: docker run -v /home/hukang:/home/hukang kanghu/hite:3.3.3 python main.py \
 # --genome /home/hukang/HiTE/demo/genome.fa \
 # --thread 40 \
 # --out_dir /home/hukang/HiTE/demo/test/
<!-- For those unable to download images from Docker Hub, we have uploaded the Docker and Singularity images to Zenodo: [https://zenodo.org/records/15761664](https://zenodo.org/records/15761664). ```sh # Load the Docker image docker load -i hite_docker_3.3.3.tar ``` -->
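
The two notes in the Docker example (absolute paths, and inputs and outputs located inside the mounted directory) can be checked before the container is launched. The following is a minimal sketch under stated assumptions: `docker_hite_command` is a hypothetical helper that is not part of HiTE, and it mounts the host directory at the same path inside the container, as the example command does. It only builds the argument list and runs nothing.

```python
from pathlib import PurePosixPath


def docker_hite_command(mount_dir, genome, out_dir, threads,
                        image="kanghu/hite:3.3.3"):
    """Build a `docker run` argument list for HiTE.

    Hypothetical helper for illustration only. The host directory is
    mounted at the same path inside the container.
    """
    mount = PurePosixPath(mount_dir)
    for label, value in (("genome", genome), ("out_dir", out_dir)):
        path = PurePosixPath(value)
        # Note (1): the container's working directory is /HiTE,
        # so relative paths would not resolve as expected.
        if not path.is_absolute():
            raise ValueError(f"--{label} must be an absolute path: {value}")
        # Note (2): inputs and outputs must live under the mounted directory.
        if path != mount and mount not in path.parents:
            raise ValueError(f"--{label} is outside the mounted directory {mount}")
    return [
        "docker", "run", "-v", f"{mount}:{mount}", image,
        "python", "main.py",
        "--genome", str(genome),
        "--thread", str(threads),
        "--out_dir", str(out_dir),
    ]


cmd = docker_hite_command(
    mount_dir="/home/hukang",
    genome="/home/hukang/HiTE/demo/genome.fa",
    out_dir="/home/hukang/HiTE/demo/test/",
    threads=40,
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once Docker and the image are available.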

<a name="install_nextflow"></a>Option 4. Run with nextflow

Nextflow is built on top of the popular programming language, Groovy, and supports the execution of workflows on a wide range of computing environments, including local machines, clusters, cloud platforms, and HPC systems. It also provides advanced features such as data provenance tracking, automatic parallelization, error handling, and support for containerization technologies like Docker and Singularity.

We provide a tutorial on how to run HiTE with nextflow.

<a name="demo"></a>Demo data

A demo genome assembly is provided at HiTE/demo/genome.fa. To run HiTE on the demo data (e.g., in Conda mode):

python ${pathTo/HiTE}/main.py \
 --genome ${pathTo/genome.fa} \
 --thread 40 \
 --out_dir ${out_dir}

 # e.g., my command: python /home/hukang/HiTE/main.py \
 # --genome /home/hukang/HiTE/demo/genome.fa \
 # --thread 40 \
 # --out_dir /home/hukang/HiTE/demo/test/

If the following files exist in the demo/test directory, the program ran successfully:

demo/test/
├── confident_helitron.fa
├── confident_other.fa
├── confident_non_ltr.fa
├── confident_tir.fa
├── confident_ltr_cut.fa.cons
└── confident_TE.cons.fa

Click on Outputs for further details.

Note: To avoid automatic deletion of files, set the output path parameter --out_dir to an empty directory.
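
The success check on the demo output can be scripted. This is a minimal sketch, assuming the file names in the demo/test listing are exact; `missing_demo_outputs` is a hypothetical helper, not a HiTE function.

```python
from pathlib import Path

# File names copied from the demo/test listing.
EXPECTED_DEMO_FILES = (
    "confident_helitron.fa",
    "confident_other.fa",
    "confident_non_ltr.fa",
    "confident_tir.fa",
    "confident_ltr_cut.fa.cons",
    "confident_TE.cons.fa",
)


def missing_demo_outputs(out_dir):
    """Return the expected file names that are absent from out_dir."""
    out_dir = Path(out_dir)
    return [name for name in EXPECTED_DEMO_FILES
            if not (out_dir / name).is_file()]


if __name__ == "__main__":
    # Assumes the demo was run with --out_dir demo/test.
    missing = missing_demo_outputs("demo/test")
    print("demo run looks complete" if not missing else f"missing: {missing}")
```

The check only confirms that the files exist, not that their contents are correct.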

Predicting conserved protein domains in TEs

To predict conserved protein domains in TEs, add the --domain 1 parameter.

The output file is confident_TE.cons.fa.domain, which is shown as follows:

TE_name domain_name     TE_start        TE_end  domain_start    domain_end

N_111   Gypsy-50_SB_1p#LTR/Gypsy        164     4387    1       1410
...
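
For downstream analysis, the domain table can be loaded into typed records. This is a minimal sketch, assuming the file is whitespace-delimited with the six columns shown and a header line beginning with `TE_name`. `DomainHit` and `parse_domain_table` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    te_name: str
    domain_name: str
    te_start: int
    te_end: int
    domain_start: int
    domain_end: int


def parse_domain_table(lines):
    """Parse lines of confident_TE.cons.fa.domain into DomainHit records.

    Assumes whitespace-delimited columns. Blank lines, the header, and
    lines without exactly six fields are skipped.
    """
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) != 6 or fields[0] == "TE_name":
            continue
        name, domain, *coords = fields
        hits.append(DomainHit(name, domain, *(int(c) for c in coords)))
    return hits


sample = [
    "TE_name domain_name     TE_start        TE_end  domain_start    domain_end",
    "",
    "N_111   Gypsy-50_SB_1p#LTR/Gypsy        164     4387    1       1410",
]
for hit in parse_domain_table(sample):
    print(hit.te_name, hit.domain_name, hit.te_start, hit.te_end)
```

With a real run, pass an open file handle instead of `sample`, for example `parse_domain_table(open("confident_TE.cons.fa.domain"))`.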

<a name="inputs"></a>Inputs

Required Parameters:

  • --genome. HiTE works with genome assemblies in FASTA format (fasta, fa, or fna files), supplied with the --genome parameter.

Useful Parameters:

  • --curated_lib. A fully trusted curated TE library supplied by the user. HiTE uses it to pre-mask highly homologous sequences in the genome, which reduces the computational load to some extent. We recommend TE libraries from Repbase, with headers in the format >header#class_name.
  • --annotate. Annotate the genome with the TE library generated by HiTE. This produces annotation files such as HiTE.out, HiTE.gff, and HiTE.tbl. For more detailed information on genome annotation proportions, see https://github.com/CSU-KangHu/HiTE/issues/7.
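
Before passing a library to --curated_lib, it may help to confirm that every header follows the recommended `>header#class_name` format. This is a minimal sketch, assuming only the first whitespace-delimited token of each header line matters. `bad_library_headers` is a hypothetical helper, and HiTE may apply its own, different checks.

```python
def bad_library_headers(fasta_lines):
    """Return header lines that do not look like '>header#class_name'."""
    bad = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line
        tokens = line[1:].split()
        first = tokens[0] if tokens else ""
        name, sep, te_class = first.partition("#")
        # Both the name and the class must be non-empty around the '#'.
        if not (name and sep and te_class):
            bad.append(line.rstrip("\n"))
    return bad


library = [
    ">Gypsy-50_SB_1p#LTR/Gypsy\n",
    "ACGTACGT\n",
    ">unclassified_seq\n",
    "TTGACA\n",
]
print(bad_library_headers(library))
```

An empty list means every header carries both a name and a class label.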

For other optional parameters, please refer to Usage.

<a name="outputs"></a>Outputs

HiTE outputs many temporary files, which allow you to quickly restore the previous running state (use --recover 1) in case of any interruption during the running process. If the pipeline completes successfully, the output directory should look like the following:

output_dir/
├── longest_repeats_*.fa
├── longest_repeats_*.flanked.fa
├── confident_tir_*.fa
├── confident_helitron_*.fa
├── confident_non_ltr_*.fa
├── confident_other_*.fa
├── confident_ltr_cut.fa
├── confident_TE.cons.fa
├── HiTE.out (requires `--annotate 1`)
├── HiTE.gff (requires `--annotate 1`)
├── HiTE.tbl (requires `--annotate 1`)
├── low_confident_TE.cons.fa
└── all_TE.fa
  1. confident_TE.cons.fa is the classified TE library generated by HiTE. It can be used directly as the TE library in RepeatMasker via -lib.
  2. longest_repeats_*.fa represents the output of the FMEA algorithm, while longest_repeats_*.flanked.fa extends the sequences at both ends of longest_repeats_*.fa.
  3. confident_tir_*.fa, confident_helitron_*.fa, confident_non_ltr_*.fa represent the identification results of the TIR, Helitron, and non-LTR modules in HiTE respectively, while confident_other_*.fa indicates the identification results of the homology-based non-LTR searching module.
  4. Note that "*" represents the
