
BASALT

Nature Communications | BASALT (Binning Across a Series of Assemblies Toolkit) for binning and refinement of short- and long-read sequencing data


BASALT: Binning Across a Series of Assemblies Toolkit

📣 News

  • [2025/12/16]:🤗 We released BASALT V1.2.0 under the MIT License.
  1. Upgraded Python from 3.8 to 3.12, with a new, user-friendly installation script
  2. Added LorBin [Nature Communications, 2025], a state-of-the-art binner, to the BASALT toolkit as an extra binner
  3. Changed the default QC evaluation software from CheckM to CheckM2 v1.1.0
  4. Added GPU acceleration for SemiBin2 and the BASALT deep learning model
  5. The BASALT deep learning model weights can now be stored in any directory, instead of only ~/.cache, by setting an environment variable in ~/.bashrc
  • [2024/06/12]:🤗 We released BASALT V1.1.0 under the MIT License.

  • [2024/03/11]:🤗 Our paper "BASALT refines binning from metagenomic data and increases resolution of genome-resolved metagenomic analysis" was published in Nature Communications.

  • [2023/8/18]:🤗 We released BASALT V1.0.0 under the MIT License.

🌉 Workflow of BASALT

[Figure: BASALT workflow]

📊 Type of input data for BASALT

BASALT is a versatile and efficient tool for binning and post-binning refinement. It can generate high-quality metagenome-assembled genomes (MAGs) from several types of input data: 1) assemblies from short-read sequences (SRS); 2) assemblies from long-read sequences (LRS) [Note: for long-read-only assemblies, only PacBio HiFi data are supported in the current version; other types of LRS data will be supported in later versions]; and 3) hybrid assemblies from SRS + LRS. Specific features of BASALT are listed below:

  1. Multiple assemblies as input with dereplication function. BASALT implements a comprehensive method that incorporates multiple assembly files, including single assemblies (SAs) and co-assemblies (CAs), in one run. In addition, a dereplication step is applied after initial binning to efficiently remove redundant bins. By comparison, prominent binning tools such as metaWRAP [1] and DASTool [2] only support a single assembly file as input, so multiple binning runs are required when a dataset contains multiple assembly files. Moreover, redundant bins generated in SA + CA mode then need to be removed with dereplication tools such as dRep [3]. Although a single BASALT run takes longer than a single metaWRAP or DASTool run, considerable time is saved when processing multiple assembly files, and, based on our assessment [4], BASALT generates significantly more and higher-quality MAGs than the other tools.
  2. Standalone Refinement module. BASALT's neural-network-based Refinement module can effectively identify and remove potential contamination sequences. Users can also import their own bins, along with the raw sequences, into BASALT to run the Refinement module independently, without performing initial binning in BASALT.
  3. Efficient use of LRS. BASALT maximizes the use of LRS in the post-binning refinement steps. First, LRS are used in the Sequence retrieval step to recruit unused sequences to target bins via pair-end tracking. After Sequence retrieval, an extra polishing step is performed when LRS are available. Polishing at this step saves ~90% of the computation time compared with polishing at the assembly step with the same number of iterations, because the data size is greatly reduced. Furthermore, LRS are exploited again in the reassembly step using the SPAdes hybrid function (default). [Note: the reassembly function is not applicable to LRS-only datasets in the current version of BASALT, but will be available in a later version.] Although reassembly may take considerable time, it can substantially improve genome quality.

For any issues compiling or running BASALT, or to report bugs, please do not hesitate to contact us (yuke.sz@pku.edu.cn). Thanks for using BASALT!

💻 SYSTEM REQUIREMENTS

  1. Required dependencies

    Linux x64 systems, 8+ cores, and 128GB+ RAM

    Python (>=3.0) modules: biopython, pandas, numpy, scikit-learn, copy, multiprocessing, collections, pytorch, tensorboardx (copy, multiprocessing, and collections are part of the Python standard library)

    Perl

    Java (>=1.7)

    Binning tools: MetaBAT2, MaxBin2, CONCOCT, SemiBin2, LorBin

    Note: VAMB was previously included in BASALT, but because of environment conflicts and unsatisfactory performance on environmental datasets, we have temporarily removed it from the BASALT environment. However, bins generated with VAMB can still be imported directly into BASALT for post-binning refinement.

    Sequence processing tools: Bowtie2, BWA, SAMtools, Prodigal, BLAST+, HMMER, Minimap2

    Sequence assembly and polishing tools: SPAdes, IDBA-UD, Pilon, Racon, Unicycler

    Genome quality assessment tools: CheckM, CheckM2, pplacer

    Note: the CheckM2 database is not installed along with BASALT in v1.0.1. To set up the CheckM2 database, please refer to the CheckM2 user guide (https://github.com/chklovski/CheckM2).
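    Before installing, it can help to check which of the dependencies listed above are already available. The sketch below defines two hypothetical helpers, `check_commands` and `check_py_modules` (illustrative names, not part of BASALT). The executable names in the commented example (e.g. `run_MaxBin.pl`, `spades.py`, `blastn`) are assumptions about how each tool is invoked and may differ in your installation; note that Python import names differ from package names (biopython is `Bio`, scikit-learn is `sklearn`, pytorch is `torch`, tensorboardx is `tensorboardX`).

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers; not part of BASALT itself.

# Report every command that is not on PATH; return non-zero if any is missing.
check_commands() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Report every Python module that python3 cannot import; return non-zero if any fails.
check_py_modules() {
    local missing=0 mod
    for mod in "$@"; do
        if ! python3 -c "import $mod" >/dev/null 2>&1; then
            echo "missing Python module: $mod" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (executable names are assumptions; adjust to your installation):
# check_commands java perl bowtie2 bwa samtools prodigal blastn hmmsearch minimap2 \
#     metabat2 run_MaxBin.pl concoct spades.py pilon racon checkm checkm2 pplacer
# check_py_modules Bio pandas numpy sklearn torch tensorboardX
```

    Both helpers report everything that is missing in one pass instead of stopping at the first failure, so a single run gives the full list of tools still to install.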

⏬ BASALT v1.2.0 INSTALLATION

  1. BASALT 1.2.0 installation

    Please refer to the installation guide of BASALT v1.2.0:

    git clone https://github.com/EMBL-PKU/BASALT.git
    
    cd BASALT
    
    conda create -n basalt_env -c conda-forge -c bioconda \
        python=3.12 \
        megahit metabat2 maxbin2 concoct prodigal semibin \
        bedtools blast bowtie2 diamond checkm2 \
        unicycler spades samtools racon pplacer pilon \
        ncbi-vdb minimap2 miniasm idba hmmer entrez-direct \
        biopython uv --yes
    
    conda activate basalt_env
    
    uv pip install tensorflow torch torchvision tensorboard tensorboardx \
        lightgbm scikit-learn numpy==1.26.4 python-igraph scipy pandas matplotlib \
        cython biolib joblib tqdm requests checkm-genome
    

    Download BASALT Deep Learning Model Weights:

     # Please change the download path according to your environment
     
     python BASALT_models_download.py --path "my_model_folder"
    

    Download the BASALT script files and change their permissions:

    chmod +x install.sh
    
    bash install.sh
    
    chmod +x /path/to/basalt/bin/*
    

    Set environment variables by adding the following lines to your ~/.bashrc file:

    nano ~/.bashrc
    
    export CHECKM2DB=/path/to/checkm2db/CheckM2_database/uniref100.KO.1.dmnd
    export CHECKM_DATA_PATH=/path/to/checkmdb
    export BASALT_WEIGHT=/path/to/BASALT
    
    source ~/.bashrc
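    After running `source ~/.bashrc`, a quick check can confirm that the three variables are set and point to existing paths. This is a hypothetical sketch (`require_path` and `check_basalt_env` are illustrative names, not part of BASALT). It assumes CHECKM2DB points to the `.dmnd` file and that CHECKM_DATA_PATH and BASALT_WEIGHT are directories, as the example paths above suggest.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of BASALT): verify the variables exported in ~/.bashrc.

# Usage: require_path VAR_NAME KIND, where KIND is "file" or "dir".
require_path() {
    local name="$1" kind="$2" value
    # printenv only sees exported variables, which is also what child processes see.
    value="$(printenv "$name" || true)"
    if [ -z "$value" ]; then
        echo "$name is not set (or not exported)" >&2
        return 1
    fi
    if [ "$kind" = file ] && [ ! -f "$value" ]; then
        echo "$name=$value is not an existing file" >&2
        return 1
    fi
    if [ "$kind" = dir ] && [ ! -d "$value" ]; then
        echo "$name=$value is not an existing directory" >&2
        return 1
    fi
    return 0
}

check_basalt_env() {
    local status=0
    require_path CHECKM2DB file       || status=1  # CheckM2 .dmnd database file
    require_path CHECKM_DATA_PATH dir || status=1  # CheckM (v1) data directory
    require_path BASALT_WEIGHT dir    || status=1  # assumed: directory with model weights
    return "$status"
}

# Example: check_basalt_env && echo "BASALT environment variables look OK"
```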
    

    The Google Drive link below provides the essential files for checkm_db and checkm2_db, as well as the newest Singularity image.

    https://drive.google.com/drive/folders/1d0e_2FpYRBAZLwKXl8fA-yDK4b5PBA_E?usp=sharing
    

    ⚠️ Another way to install BASALT in mainland China: load the BASALT .sif image with Singularity

    Place the BASALT Singularity image (basalt.sif) in your home directory on the server and run it with the singularity command, for example:

    singularity run basalt.sif BASALT -a as1.fa -s S1_R1.fq,S1_R2.fq/S2_R1.fq,S2_R2.fq -t 32 -m 128
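    In the example above, the two read files of each sample are joined with "," and different samples are separated with "/". With many samples, this argument can be built from the R1 file names. The sketch below is a hypothetical helper (`build_sr_arg` is an illustrative name, not part of BASALT) and assumes each R2 file name is the R1 name with its last "_R1" replaced by "_R2".

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BASALT): build the value for -s from R1 files,
# joining each R1/R2 pair with "," and separating samples with "/".
# Assumption: the R2 name is the R1 name with the last "_R1" replaced by "_R2".
build_sr_arg() {
    local out="" r1 r2
    for r1 in "$@"; do
        # Greedy match, so only the last "_R1" in the name is replaced.
        r2="$(printf '%s\n' "$r1" | sed 's/\(.*\)_R1/\1_R2/')"
        if [ "$r2" = "$r1" ]; then
            echo "cannot derive an R2 name from: $r1" >&2
            return 1
        fi
        out="${out:+$out/}$r1,$r2"
    done
    printf '%s\n' "$out"
}

# Example usage with all R1 files in the current directory:
# singularity run basalt.sif BASALT -a as1.fa -s "$(build_sr_arg *_R1.fq)" -t 32 -m 128
```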
    

    If you are not running basalt.sif from your home directory, you need to add a -B bind mount, for example:

    # Please change /media/emma according to your path
    singularity run -B /media/emma basalt.sif BASALT -h
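    To avoid retyping the image path and the bind mount for every command, a small wrapper can help. The sketch below is hypothetical (`sif_run` and the `DRY_RUN` switch are illustrative, not part of BASALT or Singularity). It uses only the `singularity run -B` form shown above and works for BASALT as well as the other tools bundled in basalt.sif.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of BASALT): run a tool inside a .sif image with a
# data directory bind-mounted via -B. Set DRY_RUN=1 to print the command instead.
# Usage: sif_run IMAGE DATA_DIR TOOL [ARGS...]
sif_run() {
    if [ "$#" -lt 3 ]; then
        echo "usage: sif_run IMAGE DATA_DIR TOOL [ARGS...]" >&2
        return 2
    fi
    local image="$1" data_dir="$2"
    shift 2
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print the command that would be executed, without running it.
        printf '%s\n' "singularity run -B $data_dir $image $*"
    else
        singularity run -B "$data_dir" "$image" "$@"
    fi
}

# Example usage:
# sif_run ~/basalt.sif /media/emma BASALT -h
# sif_run ~/basalt.sif "$PWD" bowtie2 -h
```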
    

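    To run BASALT in the background: nohup may fail unexpectedly, but job submission commands on clusters (such as sbatch) generally work. On lab servers, consider using screen instead. Please follow the screen invocation below exactly (unless you are very familiar with screen, do not change how the command is run), for example: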

     screen -dmS session_name bash -c 'bash basalt.sh >log_basalt'
    

    Note: choose a unique session_name that is easy for you to recognize, to avoid unexpected conflicts.
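    One way to get a unique, recognizable session name is to combine a prefix, your user name, and a timestamp. The sketch below is a hypothetical helper (`make_session_name` is an illustrative name, not part of BASALT); the screen command in the commented example is the one given above, with only the session name substituted.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BASALT): build a session name that is unlikely
# to clash with other users' sessions: PREFIX_USER_YYYYMMDD_HHMMSS.
make_session_name() {
    local prefix="${1:-basalt}" user
    user="${USER:-$(id -un)}"
    printf '%s_%s_%s\n' "$prefix" "$user" "$(date +%Y%m%d_%H%M%S)"
}

# Example usage, keeping the screen invocation above unchanged:
# name="$(make_session_name basalt)"
# screen -dmS "$name" bash -c 'bash basalt.sh >log_basalt'
# screen -ls            # list running sessions
# screen -r "$name"     # reattach; detach again with Ctrl-a d
# tail -f log_basalt    # follow the log
```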

    basalt.sif contains many tools, such as checkm1, checkm2, semibin, bowtie2, and bwa, all of which can be invoked as follows:

    singularity run basalt.sif bowtie2 -h
    

⏬ BASALT v1.1.0 INSTALLATION

  1. Quick installation

    Download BASALT_setup.py and run:

    python BASALT_setup.py

    Please be patient, as the installation may take a long time.

  1. Quick installation from mainland China

    For users in mainland China who may experience network issues (for example, no access to blocked websites), we recommend downloading the alternative script 'BASALT_setup_China_mainland.py' and running:

    python BASALT_setup_China_mainland.py
    

    Then, download the trained neural network models (BASALT.zip) from Tencent iCloud (https://share.weiyun.com/r33c2gqa) and run:

    mv BASALT.zip ~/.cache
    cd ~/.cache
    unzip BASALT.zip
    
  2. Manual installation (recommended)

    Install Miniconda (https://docs.anaconda.com/free/miniconda/miniconda-install/) or Anaconda (https://docs.anaconda.com/free/anaconda/install/index.html)

    Optionally, add mirrors to speed up downloading BASALT's dependencies:

    site=https://mirrors.tuna.tsinghua.edu.cn/anaconda
    conda config --add channels ${site}/pkgs/free/
    conda config --add channels ${site}/pkgs/main/
    conda config --add channels ${site}/cloud/conda-forge/
    conda config --add channels ${site}/cloud/bioconda/
    

    Download the BASALT installation file and create a conda environment:

    git clone https://github.com/EMBL-PKU/BASALT.git
    cd BASALT
    conda env create -n BASALT --file basalt_env.yml
    

    Please be patient, as the installation may take a long time.

    If you encounter an error, please download 'basalt_env.yml' from Tencent iCloud (https://share.weiyun.com/xXdRiDkl) and create the conda environment:

    conda env create -n BASALT --file basalt_env.yml
    

    After successfully creating the conda environment, change file permissions for BASALT script files:

    chmod -R 777 <PATH_TO_CONDA>/envs/BASALT/bin/*
    

    To find the path to your conda environments, run:

    conda info --envs
    

    The output lists the path to the BASALT environment, for example:

    # conda environments:
    #
    base     /home/emma/miniconda3
    BASALT   /home/emma/miniconda3/envs/BASALT
    

    Then, change the permissions of the BASALT script files:

    chmod -R 777 /home/emma/miniconda3/envs/BASALT/bin/*
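    Instead of copying the path by hand, it can be extracted from the `conda info --envs` output. The sketch below is a hypothetical helper (`conda_env_path` is an illustrative name, not part of BASALT or conda). It relies on the path being the last field of each line, which also holds when the active environment is marked with "*". Alternatively, after `conda activate BASALT`, the variable `CONDA_PREFIX` holds the same path.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BASALT): print the path of a named environment
# from `conda info --envs` output read on stdin. The path is the last field of the
# matching line; exit status is non-zero if no environment with that name exists.
conda_env_path() {
    awk -v name="$1" '$1 == name { print $NF; found = 1 } END { exit !found }'
}

# Example usage:
# env_dir="$(conda info --envs | conda_env_path BASALT)" &&
#     chmod -R 777 "$env_dir"/bin/*
```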
    

    Download the trained neural network models ('BASALT.zip') from FigShare:

    You can also find the BASALT v1.1.0 version BASALT.zip file the previous released version and
