BASALT
Nature Communications | BASALT (Binning Across a Series of Assemblies Toolkit) for binning and refinement of short- and long-read sequencing data
BASALT: Binning Across a Series of Assemblies Toolkit
📣 News
- [2025/12/16]: 🤗 We released BASALT v1.2.0 under the MIT License.
  - Upgraded Python from 3.8 to 3.12, with a new, friendlier installation script
  - Added LorBin [Nature Communications, 2025], a current cutting-edge binner, to the BASALT toolkit as an extra binner
  - Changed the default QC evaluation software from CheckM to CheckM2 v1.1.0
  - Added GPU acceleration support for Semibin2 and the deep learning model in BASALT
  - The weights for the deep learning model can now be stored anywhere, rather than only in ~/.cache, by setting a variable in the ~/.bashrc file
- [2024/06/12]: 🤗 We released BASALT v1.1.0 under the MIT License.
- [2024/03/11]: 🤗 "BASALT refines binning from metagenomic data and increases resolution of genome-resolved metagenomic analysis" was published in Nature Communications.
- [2023/8/18]: 🤗 We released BASALT v1.0.0 under the MIT License.
🌉 Workflow of BASALT
<img src="fig/workflow.png" style="zoom: 75%;" />

📊 Type of input data for BASALT
BASALT is a versatile, efficient tool for binning and post-binning refinement. It can generate high-quality metagenome-assembled genomes (MAGs) from several types of input data: 1) assemblies from short-read sequences (SRS); 2) assemblies from long-read sequences (LRS) [Note: the current version supports only PacBio HiFi data for long-read-only assemblies; other types of LRS data will be supported in later versions]; 3) hybrid assemblies from SRS + LRS. Specific features of BASALT are listed below:
- Multiple assemblies as input with dereplication function: BASALT incorporates multiple assembly files, including single assemblies (SAs) and co-assemblies (CAs), in one run. A dereplication step is applied after initial binning to efficiently remove redundant bins. By comparison, prominent binning tools such as metaWRAP [1] and DASTool [2] support only a single assembly file as input, so a dataset with multiple assembly files requires multiple binning runs, and the redundant bins generated under SA + CA mode must then be removed with a dereplication tool such as dRep [3]. Although BASALT takes longer than metaWRAP and DASTool for a single run, it saves a considerable amount of time when processing multiple assembly files, and, based on our assessment [4], it generates significantly more MAGs of better quality than other tools.
- Standalone Refinement module: The Refinement module uses neural networks to identify and remove potential contaminating sequences. Users can import their own bins along with the raw sequences and run the Refinement module independently, without performing initial binning in BASALT.
- High read use efficiency of LRS: BASALT maximizes the use of LRS in the post-binning refinement steps. First, LRS are used at the Sequence retrieval step to recruit unused sequences to target bins via paired-end tracking. After Sequence retrieval, an extra polishing step is performed if LRS are available. Polishing at this step takes ~90% less computation time than polishing at the assembly step with the same number of iterations, because the data size is greatly reduced. LRS are then used again at the reassembly step through the SPAdes hybrid function (default). [Note: the reassembly function is not applicable to LRS-only datasets in the current version of BASALT, but will be available in a later version.] Although reassembly may take a considerable amount of time, it can substantially improve genome quality.

If you have any issues compiling or running BASALT, or wish to report a bug, please do not hesitate to contact us (yuke.sz@pku.edu.cn). Thanks for using BASALT!
💻 SYSTEM REQUIREMENTS
- Required dependencies
Linux x64 systems, 8+ cores, and 128GB+ RAM
Python (>=3.0) modules: biopython, pandas, numpy, scikit-learn, copy, multiprocessing, collections, pytorch, tensorboardx
Perl
Java (>=1.7)
Binning tools: MetaBAT2, Maxbin2, CONCOCT, Semibin2, LorBin
Note: VAMB was previously included in BASALT, but because of environment conflicts and unsatisfactory performance on environmental datasets, we have temporarily removed it from the BASALT environment. Bins generated with VAMB can still be imported into BASALT directly for post-binning refinement.
Sequences processing tools: Bowtie2, BWA, SAMtools, Prodigal, BLAST+, HMMER, Minimap2
Sequences assembly and polishing tools: SPAdes, IDBA-UD, Pilon, Racon, Unicycler
Genome quality assessment tools: CheckM, CheckM2, pplacer
Note: The CheckM2 database is not installed together with BASALT in v1.0.1. To set up the CheckM2 database, please refer to the CheckM2 user guide (https://github.com/chklovski/CheckM2).
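
Before running BASALT, it can help to confirm that the external tools listed above are reachable on `PATH`. The following is a minimal sketch, not part of BASALT itself. The executable names in `ASSUMED_EXECUTABLES` are assumptions (the usual command names for a handful of the listed tools) and may differ in your installation, and `find_missing` is a hypothetical helper introduced only for this illustration.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumption: these are the usual command names for some of the tools listed
# above. Adjust the list to match your own installation.
ASSUMED_EXECUTABLES = [
    "bowtie2", "bwa", "samtools", "prodigal", "blastn",
    "hmmsearch", "minimap2", "spades.py", "metabat2", "checkm2",
]


def find_missing(
    executables: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the executables that cannot be located on PATH."""
    return [name for name in executables if which(name) is None]


if __name__ == "__main__":
    missing = find_missing(ASSUMED_EXECUTABLES)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All checked executables were found on PATH.")
```

Run the script inside the activated conda environment so that `PATH` reflects the tools installed there. The `which` parameter exists only so the lookup can be replaced in tests.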
⏬ BASALT v1.2.0 INSTALLATION
- BASALT 1.2.0 installation
Please refer to the installation guide of BASALT v1.2.0:

    git clone https://github.com/EMBL-PKU/BASALT.git
    cd BASALT
    conda create -n basalt_env -c conda-forge -c bioconda \
        python=3.12 \
        megahit metabat2 maxbin2 concoct prodigal semibin \
        bedtools blast bowtie2 diamond checkm2 \
        unicycler spades samtools racon pplacer pilon \
        ncbi-vdb minimap2 miniasm idba hmmer entrez-direct \
        biopython uv --yes
    conda activate basalt_env
    uv pip install tensorflow torch torchvision tensorboard tensorboardx \
        lightgbm scikit-learn numpy==1.26.4 python-igraph scipy pandas matplotlib \
        cython biolib joblib tqdm requests checkm-genome

Download the BASALT deep learning model weights:

    # please change the download path according to your computer environment
    python BASALT_models_download.py --path "my_model_folder"

Install the BASALT script files and change their permissions:

    chmod +x install.sh
    bash install.sh
    chmod +x /path/to/basalt/bin/*

Set the environment variables by adding the following `export` lines to your ~/.bashrc file:

    # open ~/.bashrc in an editor
    nano ~/.bashrc
    # add these lines to the file
    export CHECKM2DB=/path/to/checkm2db/CheckM2_database/uniref100.KO.1.dmnd
    export CHECKM_DATA_PATH=/path/to/checkmdb
    export BASALT_WEIGHT=/path/to/BASALT
    # reload the file after saving
    source ~/.bashrc

The Google Drive link below provides the essential files for checkm_db and checkm2_db, and the newest Singularity image.
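https://drive.google.com/drive/folders/1d0e_2FpYRBAZLwKXl8fA-yDK4b5PBA_E?usp=sharing

⚠️ Another way to install BASALT in mainland China: load the BASALT .sif image with Singularity

Place the BASALT Singularity image (basalt.sif) in your home directory on the server and run it with a singularity command, for example:

    singularity run basalt.sif BASALT -a as1.fa -s S1_R1.fq,S1_R2.fq/S2_R1.fq,S2_R2.fq -t 32 -m 128

If basalt.sif is run from outside the home directory, add a -B bind mount, for example:

    # please change /media/emma according to your path
    singularity run -B /media/emma basalt.sif BASALT -h

For background runs, nohup may behave unexpectedly, whereas job submission commands such as sbatch generally work normally on clusters. On a lab server, consider using screen instead. Please follow the screen invocation below exactly; unless you are very familiar with screen, do not change how the command is run. For example:

    screen -dmS session_name bash -c 'bash basalt.sh >log_basalt'

Note: give session_name a unique, recognizable name to avoid unexpected problems.

basalt.sif bundles many tools, including CheckM1, CheckM2, SemiBin, Bowtie2 and BWA, all of which can be invoked as follows:

    singularity run basalt.sif bowtie2 -h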
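
The `-s` value in the BASALT example above joins the two files of each paired-end sample with a comma and separates samples with a slash. The sketch below assembles that command line from Python data, which can be convenient when there are many samples. It is illustrative only: `short_read_argument` and `basalt_command` are hypothetical helpers, and the parameter names `threads` and `memory_gb` are assumptions about the meaning of `-t` and `-m`, which the example above does not spell out.

```python
import shlex
from typing import List, Sequence, Tuple


def short_read_argument(samples: Sequence[Tuple[str, str]]) -> str:
    """Join each read pair as 'R1,R2' and separate samples with '/'."""
    return "/".join(f"{r1},{r2}" for r1, r2 in samples)


def basalt_command(
    assembly: str,
    samples: Sequence[Tuple[str, str]],
    threads: int,      # assumption: value passed to -t
    memory_gb: int,    # assumption: value passed to -m
    image: str = "basalt.sif",
) -> List[str]:
    """Build the argument list for the singularity example shown above."""
    return [
        "singularity", "run", image, "BASALT",
        "-a", assembly,
        "-s", short_read_argument(samples),
        "-t", str(threads),
        "-m", str(memory_gb),
    ]


if __name__ == "__main__":
    cmd = basalt_command(
        "as1.fa",
        [("S1_R1.fq", "S1_R2.fq"), ("S2_R1.fq", "S2_R2.fq")],
        threads=32,
        memory_gb=128,
    )
    # Print a shell-quoted command that can be pasted into basalt.sh.
    print(shlex.join(cmd))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting issues, while `shlex.join` gives a copyable command for a script used with screen or sbatch.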
⏬ BASALT v1.1.0 INSTALLATION
- Quick installation

Download BASALT_setup.py and run:

    python BASALT_setup.py

Please remain patient, as the installation process may take an extended period.
- Quick installation from mainland China

For users in mainland China who may experience network issues, we recommend downloading the alternative script 'BASALT_setup_China_mainland.py' and running:

    python BASALT_setup_China_mainland.py

Then, download the trained neural network models (BASALT.zip) from Tencent iCloud (https://share.weiyun.com/r33c2gqa) and run:

    mv BASALT.zip ~/.cache
    cd ~/.cache
    unzip BASALT.zip

- Manual installation (recommended)
Install Miniconda (https://docs.anaconda.com/free/miniconda/miniconda-install/) or Anaconda (https://docs.anaconda.com/free/anaconda/install/index.html)
Add mirrors to increase download speed of BASALT dependent software (optional):

    site=https://mirrors.tuna.tsinghua.edu.cn/anaconda
    conda config --add channels ${site}/pkgs/free/
    conda config --add channels ${site}/pkgs/main/
    conda config --add channels ${site}/cloud/conda-forge/
    conda config --add channels ${site}/cloud/bioconda/

Download the BASALT installation file and create a conda environment:

    git clone https://github.com/EMBL-PKU/BASALT.git
    cd BASALT
    conda env create -n BASALT --file basalt_env.yml

Please remain patient, as the installation process may take an extended period.

If you encounter an error, please download 'basalt_env.yml' from Tencent iCloud (https://share.weiyun.com/xXdRiDkl) and create a conda environment:

    conda env create -n BASALT --file basalt_env.yml

After successfully creating the conda environment, change the file permissions of the BASALT script files:

    chmod -R 777 <PATH_TO_CONDA>/envs/BASALT/bin/*

Example: To find the path to your conda environments, use:

    conda info --envs

The output lists the path to the BASALT environment, such as:

    # conda environments:
    #
    base                     /home/emma/miniconda3
    BASALT                   /home/emma/miniconda3/envs/BASALT

Then, change the permissions of the BASALT script folder:

    chmod -R 777 /home/emma/miniconda3/envs/BASALT/bin/*

Download the trained models for neural networks 'BASALT.zip' from FigShare:
You can also find the BASALT v1.1.0 version BASALT.zip file the previous released version and
