ChASE: a Chebyshev Accelerated Subspace Eigensolver for Dense Eigenproblems

The Chebyshev Accelerated Subspace Eigensolver (ChASE) is a modern, scalable library that uses subspace iteration with polynomial acceleration to solve dense Hermitian (symmetric) and pseudo-Hermitian algebraic eigenvalue problems. It is especially suited to dense eigenproblems arranged in a sequence. Novel to ChASE are the computation of the spectral estimates that enter the filter and an optimization of the polynomial degree that further reduces the number of floating-point operations required.

ChASE is written in C++ using modern software engineering concepts that favor simple integration into application codes and straightforward portability across heterogeneous platforms. When solving sequences of eigenproblems for a portion of their extremal spectrum, ChASE greatly benefits from the sequence's spectral properties and outperforms direct solvers in many scenarios. The library ships with multiple parallelization schemes, supports NVIDIA GPU acceleration with CUDA, cuBLAS, and cuSOLVER, and supports distributed GPU execution using NCCL (NVIDIA Collective Communications Library) for optimized multi-GPU communication. ChASE is easily extensible to other parallel computing architectures.

Use Case and Features

Real and Complex: ChASE is templated for real and complex numbers, so it can be used to solve real symmetric eigenproblems as well as complex Hermitian ones.
Hermitian and Pseudo-Hermitian: ChASE supports solving Hermitian and pseudo-Hermitian eigenproblems, including those arising from Bethe-Salpeter Equation (BSE) formulations.
Eigenspectrum: The ChASE algorithm is designed to solve for the extremal portion of the eigenspectrum of a matrix A. The library is particularly efficient when no more than 20% of the eigenspectrum, taken from its extremal portion, is sought. For larger fractions, the subspace iteration algorithm may struggle to be competitive, and convergence could become an issue for fractions close to or larger than 50%.
Type of Problem: ChASE can currently handle only standard eigenvalue problems.
Sequences: ChASE is particularly efficient when dealing with sequences of eigenvalue problems, where the eigenvectors that solve one problem can be used as input to accelerate the solution of the next one.
Vectors input: Since it is based on subspace iteration, ChASE can receive as input a matrix whose number of column vectors equals the number of desired eigenvalues. ChASE can experience substantial speed-ups when this input matrix contains some information about the sought-after eigenvectors.
Degree optimization: For a fixed accuracy level, ChASE can optimize the degree of the Chebyshev polynomial filter so as to minimize the number of FLOPs necessary to reach convergence.
Precision: ChASE is also templated to work in Single Precision (SP) or Double Precision (DP).
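
To illustrate why a Chebyshev polynomial filter accelerates subspace iteration, the following is a minimal scalar sketch, not ChASE's implementation. It evaluates the Chebyshev polynomial of a given degree through the standard three-term recurrence after mapping an interval [lower, upper] onto [-1, 1]. Eigenvalues inside the interval are kept bounded by 1 in magnitude, while eigenvalues outside it are amplified, and the amplification grows quickly with the degree. The function name and the interval bounds are hypothetical; the real library applies the polynomial of the matrix to a block of vectors and derives the bounds from its own spectral estimates.

```python
def chebyshev_gain(lam, lower, upper, degree):
    """Return T_degree(t), where t maps [lower, upper] linearly onto [-1, 1].

    Illustrative helper (hypothetical name): the returned value is the factor
    by which a filter of this degree scales the eigenvector component that
    belongs to the eigenvalue `lam`.
    """
    center = (upper + lower) / 2.0
    half_width = (upper - lower) / 2.0
    t = (lam - center) / half_width

    if degree == 0:
        return 1.0
    t_prev, t_curr = 1.0, t  # T_0(t) and T_1(t)
    for _ in range(degree - 1):
        # Three-term recurrence: T_{k+1}(t) = 2 t T_k(t) - T_{k-1}(t)
        t_prev, t_curr = t_curr, 2.0 * t * t_curr - t_prev
    return t_curr


if __name__ == "__main__":
    # Assumed toy spectrum: unwanted eigenvalues in [0, 1], wanted one at -0.5.
    for degree in (2, 5, 10):
        inside = chebyshev_gain(0.5, 0.0, 1.0, degree)
        outside = chebyshev_gain(-0.5, 0.0, 1.0, degree)
        print(degree, inside, outside)
```

Because the gain outside the interval grows with the degree, a higher degree means fewer iterations but more matrix products per iteration. Balancing the two is the trade-off that a degree optimization has to make.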

Builds of ChASE

ChASE supports different builds for different systems with different architectures:

Shared memory build: This is the simplest configuration and should be selected only when ChASE is used on a single computing node or on a single GPU.
MPI+Threads build: On multi-core homogeneous CPU clusters, ChASE is best used in this build. In this configuration, ChASE is typically run with one MPI rank per NUMA domain and as many threads as there are cores available per NUMA domain.
Multi-GPU build: ChASE can be configured to take advantage of NVIDIA GPUs on heterogeneous computing clusters with CUDA, cuBLAS, and cuSOLVER. Currently, one GPU per MPI rank is supported. Multiple GPUs per computing node can be used when the number of MPI ranks per node equals the number of GPUs per node.
- NCCL Backend: by default, ChASE uses NCCL (NVIDIA Collective Communications Library) as the backend for optimized collective communications across different GPUs.
- CUDA-Aware MPI Backend: alternatively, CUDA-Aware MPI can be used for the communications.

Supported Data types

ChASE supports different data types:

The shared memory build requires dense matrices to be stored in column-major order.
The distributed-memory build supports two types of data distribution of matrix A across a 2D MPI/GPU grid:
- Block Distribution: each MPI rank of the 2D grid is assigned one block of the dense matrix A.
- Block-Cyclic Distribution: a distribution scheme for dense matrix computations on distributed-memory machines that improves load balance when the amount of work differs between entries of the matrix. For more details, please refer to Netlib.
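
The difference between the two distributions can be sketched along one dimension of the process grid. This is a minimal illustration, not ChASE's code: the function names are hypothetical, the block-cyclic mapping follows the usual ScaLAPACK-style convention with the first block on process 0, and the plain block distribution is assumed to use equal contiguous chunks of size ceil(n / num_procs). On a 2D grid the same mapping is applied independently to row and column indices.

```python
def block_owner(global_index, n, num_procs):
    """Plain block distribution (assumed chunk size: ceil(n / num_procs)).

    Returns (owning process, local index on that process).
    """
    chunk = -(-n // num_procs)  # ceiling division
    return global_index // chunk, global_index % chunk


def block_cyclic_owner(global_index, block_size, num_procs):
    """1D block-cyclic distribution with the first block on process 0.

    Blocks of `block_size` consecutive indices are dealt out to the
    processes in round-robin order.
    Returns (owning process, local index on that process).
    """
    block = global_index // block_size      # which block the index falls in
    owner = block % num_procs               # round-robin assignment of blocks
    local_block = block // num_procs        # position of the block on its owner
    local_index = local_block * block_size + global_index % block_size
    return owner, local_index


if __name__ == "__main__":
    n, procs, nb = 8, 2, 2
    print([block_owner(g, n, procs)[0] for g in range(n)])
    print([block_cyclic_owner(g, nb, procs)[0] for g in range(n)])
```

With 8 indices, 2 processes, and a block size of 2, the block distribution gives each process one contiguous half, whereas the block-cyclic distribution interleaves blocks so that each process holds indices from across the whole range.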

Quick Start

Installing Dependencies

#Linux Operating System
sudo apt-get install cmake #install CMake
sudo apt-get install build-essential #install GNU Compiler
sudo apt-get install libopenblas-dev #install BLAS and LAPACK
sudo apt-get install libopenmpi-dev #install MPI

#Apple Mac Operating System 
sudo port install cmake #install CMake
sudo port install gcc10 #install GNU Compiler
sudo port select --set gcc mp-gcc10 #Set installed GCC as C compiler
sudo port install OpenBLAS +native #install BLAS and LAPACK
sudo port install openmpi #install MPI
sudo port select --set mpi openmpi-mp-fortran #Set installed MPI as MPI compiler

Cloning ChASE source code

git clone https://github.com/ChASE-library/ChASE #cloning the ChASE repository
git -C ChASE checkout v1.0.0 #it is recommended to check out the latest stable tag.

Building and Installing the ChASE library

cd ChASE/
mkdir build
cd build/
cmake .. -DCMAKE_INSTALL_PREFIX=${ChASEROOT}
make install

For more details about the installation on both local machines and clusters, please refer to the User Documentation (⚠️To be updated).

Examples

Multiple examples are provided to help users get familiar with ChASE.

Building ChASE with the examples requires enabling the -DCHASE_BUILD_WITH_EXAMPLES=ON flag when compiling the ChASE library:

cmake .. -DCHASE_BUILD_WITH_EXAMPLES=ON

Five examples are available in the examples folder:

The example 1_hello_world constructs a simple Clement matrix and finds a given number of its eigenpairs.
The example 2_input_output shows how to configure the parameters of ChASE from the command line (supported by Boost) and how to use the parallel I/O, which loads the local matrices into the computing nodes in parallel.
The example 3_installation shows how to link ChASE to other applications.
The example 4_interface shows how to use the C and Fortran interfaces of ChASE.
The example 5_bse_benchmark provides the benchmark codes for solving pseudo-Hermitian eigenproblems.
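
For readers unfamiliar with the Clement matrix mentioned for 1_hello_world, here is a small sketch of its standard symmetric form. It is a generic construction, not code taken from the example, and the entries used in the ChASE example itself may differ in detail. In this form the n-by-n matrix is tridiagonal with a zero diagonal and off-diagonal entries sqrt(k(n-k)) for k = 1, ..., n-1, and its eigenvalues are known in closed form: n-1, n-3, ..., -(n-1). This makes it a convenient test case for an eigensolver. The helper names below are hypothetical.

```python
import math


def clement_matrix(n):
    """Build the symmetric n-by-n Clement matrix as a list of rows."""
    a = [[0.0] * n for _ in range(n)]
    for k in range(1, n):
        value = math.sqrt(k * (n - k))
        a[k - 1][k] = value  # superdiagonal entry
        a[k][k - 1] = value  # subdiagonal entry (symmetry)
    return a


def char_poly_tridiagonal(a, lam):
    """Evaluate det(A - lam*I) for a symmetric tridiagonal matrix A.

    Uses the recurrence p_k = (a_kk - lam) p_{k-1} - a_{k,k-1}^2 p_{k-2}.
    """
    p_prev, p_curr = 1.0, a[0][0] - lam
    for k in range(1, len(a)):
        p_prev, p_curr = p_curr, (a[k][k] - lam) * p_curr - a[k][k - 1] ** 2 * p_prev
    return p_curr


if __name__ == "__main__":
    n = 5
    a = clement_matrix(n)
    # The known eigenvalues should be (numerical) roots of the characteristic polynomial.
    for lam in range(-(n - 1), n, 2):
        print(lam, char_poly_tridiagonal(a, float(lam)))
```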

Developers

Main developers

Edoardo Di Napoli – Algorithm design and development
Xinzhe Wu – Algorithm development, advanced parallel (MPI and GPU) implementation and optimization, developer documentation
Clément Richefort – Algorithm development, advanced parallel (MPI and GPU) implementation and optimization, pseudo-Hermitian project support, integration of ChASE into the YAMBO code

Current contributors

Past contributors

Davor Davidović – Advanced parallel GPU implementation and optimization
Nenad Mijić – ARM-based implementation and optimization, CholeskyQR, unit tests, parallel I/O
Xiao Zhang – Integration of ChASE into Jena BSE code
Miriam Hinzen, Daniel Wortmann – Integration of ChASE into FLEUR code
Sebastian Achilles – Library benchmarking on parallel platforms, documentation
Jan Winkelmann – DoS algorithm development and advanced C++ implementation
Paul Springer – Advanced GPU implementation
Marija Kranjcevic – OpenMP C++ implementation
Josip Zubrinic – Early GPU algorithm development and implementation
Jens Rene Suckert – Lanczos algorithm and GPU implementation
Mario
