
LuisaCompute

High-Performance Rendering Framework on Stream Architectures

[Documentation] | [Project Page] | [Paper] | [Discord]

LuisaCompute is a high-performance cross-platform computing framework for graphics and beyond.

LuisaCompute is also the rendering framework described in the SIGGRAPH Asia 2022 paper

LuisaRender: A High-Performance Rendering Framework with Layered and Unified Interfaces on Stream Architectures.

See also LuisaRender for the rendering application as described in the paper.

Welcome to join the discussion channel on Discord!

Users in mainland China are also welcome to join our QQ group: 1050189593.

Overview

LuisaCompute seeks to balance the seemingly ever-conflicting goals of unification, programmability, and performance. To achieve this, we design three major components:

  • A domain-specific language (DSL) embedded inside modern C++ for kernel programming exploiting JIT code generation and compilation;
  • A unified runtime with resource wrappers for cross-platform resource management and command scheduling; and
  • Multiple optimized backends, including CUDA, DirectX, Metal, and CPU.

To demonstrate the practicality of the system, we also build a Monte Carlo renderer, LuisaRender, atop the framework, which is faster than the state-of-the-art rendering frameworks on modern GPUs.

Embedded Domain-Specific Language

The DSL in our system provides a unified approach to authoring kernels, i.e., programmable computation tasks on the device. Distinct from typical graphics APIs that use standalone shading languages for device code, our system unifies the authoring of both the host-side logic and device-side kernels into the same language, i.e., modern C++.

The implementation purely relies on the C++ language itself, without any custom preprocessing pass or compiler extension. We exploit meta-programming techniques to simulate the syntax, and function/operator overloading to dynamically trace the user-defined kernels. ASTs are constructed during the tracing as an intermediate representation and later handed over to the backends for generating concrete, platform-dependent shader source code.
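The tracing approach can be illustrated with a minimal, self-contained sketch (plain Python for brevity; this is not LuisaCompute's actual implementation): overloaded operators record expression nodes instead of computing values, so simply calling the user's kernel function yields an AST that a backend could later translate.

```python
# Minimal sketch of kernel tracing via operator overloading
# (illustrative only; LuisaCompute does this in C++ with richer types).

class Expr:
    """AST node recorded while 'executing' a user-defined kernel."""
    def __init__(self, op, *args):
        self.op, self.args = op, args

    # Arithmetic operators build nodes instead of computing values.
    def __add__(self, other): return Expr("add", self, lift(other))
    def __mul__(self, other): return Expr("mul", self, lift(other))

    def dump(self):
        if self.op in ("var", "const"):
            return str(self.args[0])
        return f"({self.op} {' '.join(a.dump() for a in self.args)})"

def lift(v):
    """Wrap plain Python literals as constant AST nodes."""
    return v if isinstance(v, Expr) else Expr("const", v)

def trace(kernel, *param_names):
    """Call the kernel with symbolic parameters to capture its AST."""
    return kernel(*(Expr("var", n) for n in param_names))

# A 'kernel' written as ordinary host-language code ...
ast = trace(lambda x: x * 2.0 + x, "x")
print(ast.dump())  # (add (mul x 2.0) x)
```

The captured AST is the intermediate representation mentioned above; the host code runs once to record it, after which a backend is free to compile it however it likes.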

Example program in the embedded DSL:

Callable to_srgb = [](Float3 x) {
    $if (x <= 0.00031308f) {
        x = 12.92f * x;
    } $else {
        x = 1.055f * pow(x, 1.f / 2.4f) - .055f;
    };
    return x;
};
Kernel2D fill = [&](ImageFloat image) {
    auto coord = dispatch_id().xy();
    auto size = make_float2(dispatch_size().xy());
    auto rg = make_float2(coord) / size;
    // invoke the callable
    auto srgb = to_srgb(make_float3(rg, 1.f));
    image.write(coord, make_float4(srgb, 1.f));
};

Unified Runtime with Resource Wrappers

Like the RHIs in game engines, we introduce an abstract runtime layer to re-unify the fragmented graphics APIs across platforms. It extracts the common concepts and constructs shared by the backend APIs and serves as the bridge between the high-level frontend interfaces and the low-level backend implementations.

On the programming interfaces for users, we provide high-level resource wrappers to ease programming and eliminate boilerplate code. They are strongly and statically typed modern C++ objects, which not only simplify the generation of commands via convenient member methods but also support close interaction with the DSL. Moreover, with the resource usage information in kernels and commands, the runtime automatically probes the dependencies between commands and re-schedules them to improve hardware utilization.
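The dependency probing can be sketched in a few lines (a toy model in Python, not LuisaCompute's actual scheduler): each command declares the resources it reads and writes, and two commands conflict when one writes a resource the other touches. Non-conflicting commands can be grouped to overlap on the hardware.

```python
# Toy sketch of automatic command dependency probing from declared
# resource usage (illustrative only; names and format are hypothetical).

def schedule(commands):
    """Group commands into levels; commands within a level may overlap.

    Each command is (name, reads, writes). A command depends on an
    earlier one if their accesses conflict: write-read, read-write,
    or write-write on the same resource."""
    levels = []
    placed = {}  # command name -> assigned level index
    for i, (name, reads, writes) in enumerate(commands):
        level = 0
        for pname, preads, pwrites in commands[:i]:
            conflict = (writes & (preads | pwrites)) or (reads & pwrites)
            if conflict:
                level = max(level, placed[pname] + 1)
        placed[name] = level
        while len(levels) <= level:
            levels.append([])
        levels[level].append(name)
    return levels

cmds = [
    ("upload_a", set(),               {"buf_a"}),
    ("upload_b", set(),               {"buf_b"}),  # independent of upload_a
    ("kernel",   {"buf_a", "buf_b"},  {"img"}),    # waits for both uploads
]
print(schedule(cmds))  # [['upload_a', 'upload_b'], ['kernel']]
```

The two uploads touch disjoint buffers, so the scheduler lets them overlap; the kernel reads both and is therefore ordered after them.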

Multiple Backends

The backends are the final realizers of computation. They generate concrete shader sources from the ASTs and compile them into native shaders. They implement the virtual device interfaces with low-level platform-dependent API calls and translate the intermediate command representations into native kernel launches and command dispatches.
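The code-generation step can be pictured with a tiny self-contained sketch (Python, with a hypothetical nested-tuple AST format; real backends emit CUDA, MSL, or HLSL from the framework's own AST):

```python
# Sketch of backend code generation: translating a tiny expression AST
# into C-like shader source (hypothetical node format, for illustration).

def codegen(node):
    """Recursively emit a C-like expression from a nested-tuple AST."""
    op, *args = node
    if op == "var":
        return args[0]
    if op == "const":
        return repr(args[0])
    binop = {"add": "+", "mul": "*"}[op]
    return f"({codegen(args[0])} {binop} {codegen(args[1])})"

# (x * 2.0) + y
ast = ("add", ("mul", ("var", "x"), ("const", 2.0)), ("var", "y"))
print(f"float result = {codegen(ast)};")  # float result = ((x * 2.0) + y);
```

Each backend performs this kind of translation against its own shading language and then compiles the result with the platform's native shader compiler.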

Currently, we have three working GPU backends for the C++ and Python frontends, based on CUDA, Metal, and DirectX, respectively, plus a CPU backend (re-)implemented in Rust for debugging purposes and as a fallback.

Python Frontend

Besides the native C++ DSL and runtime interfaces, we are also working on a Python frontend and have published early-access packages to PyPI. You may install the pre-built wheels with pip (Python >= 3.10 required):

python -m pip install luisa-python

You may also build your own wheels with pip:

python -m pip wheel <path-to-project> -w <output-dir>

Examples using the Python frontend can be found under src/tests/python.

Note: Due to the different syntax and idioms of Python and C++, the Python frontend does not reflect the C++ DSL and APIs 1:1. For instance, Python has no dedicated reference type qualifier, so we follow the Python idiom: structures and arrays are passed to @luisa.func by reference, while built-in types (scalars, vectors, matrices, etc.) are passed by value by default.
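This mirrors plain Python semantics, which a short example (independent of the luisa package) makes concrete: mutating a list inside a function is visible to the caller, while rebinding a number is purely local.

```python
# Plain-Python illustration of the idiom the frontend follows:
# mutable containers behave like references, immutable built-ins like values.

def scale_in_place(values, factor):
    # Mutating the list is visible to the caller (reference semantics).
    for i in range(len(values)):
        values[i] *= factor

def increment(x):
    # Rebinding a number is local to the function (value semantics).
    x += 1
    return x

data = [1.0, 2.0, 3.0]
scale_in_place(data, 2.0)
print(data)          # [2.0, 4.0, 6.0]

n = 41
result = increment(n)
print(n, result)     # 41 42
```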

C API and Frontends in Other Languages

We are also making a C API for creating other language bindings and frontends (e.g., in Rust and C#).

Building

Note: LuisaCompute is a rendering framework rather than a renderer itself. It is designed to provide general computation functionalities on modern stream-processing hardware, on which high-performance, cross-platform graphics applications can be easily built. If you would like to just try a Monte Carlo renderer out of the box rather than building one from scratch, please see LuisaRender.

Preparation

  • Check your hardware and platform. Currently, we support CUDA on Linux and Windows; DirectX on Windows; Metal on macOS; and CPU on all the major platforms. For CUDA, an RTX-enabled graphics card, e.g., NVIDIA RTX 20 and 30 series, is required. For DirectX, a DirectX-12.1 & Shader Model 6.5 compatible graphics card is required.

  • Prepare the environment and dependencies. We recommend using the latest IDEs, compilers, XMake/CMake versions, and CUDA drivers. Since we aggressively adopt new technologies such as C++20 and OptiX 8, you may need to, for example, upgrade Visual Studio to 2019 or 2022 and install CUDA 11.7+ and NVIDIA driver R535+.

  • Clone the repo with the --recursive option:

    git clone -b next https://github.com/LuisaGroup/LuisaCompute.git/ --recursive
    

    Since we use Git submodules to manage third-party dependencies, a --recursive clone is required.

  • Detailed requirements for each platform are listed in BUILD.md.

Build via the Bootstrap Script

The easiest way to build LuisaCompute is the bootstrap script, which can download and install the required dependencies and then build the project:

python bootstrap.py cmake -f cuda -b # build with CUDA backend using CMake
python bootstrap.py cmake -f cuda -b -- -DCMAKE_BUILD_TYPE=RelWithDebInfo # everything after -- will be passed to CMake

You may specify -f all to enable all available features on your platform.

To install certain dependencies, you can use the --install or -i option. For example, to install Rust, you can use:

python bootstrap.py -i rust

Alternatively, the bootstrap script can emit a configuration file for the build system without actually building the project. This is useful when you want to work on the project inside an IDE.

python bootstrap.py cmake -f cuda -c -o cmake-build-release # generate CMake configuration in ./cmake-build-release

Please use python bootstrap.py --help for more details.

Build from Source with XMake/CMake

LuisaCompute follows the standard XMake and CMake build process. Please see also BUILD.md for details on platform requirements, configuration options, and other precautions.

Usage

A Minimal Example

Currently, we suggest using LuisaCompute as a submodule. For a quick start with CMake, you can find the project template here.

Using LuisaCompute to construct a graphics application generally involves the following steps:

  1. Create a Context and load a Device plug-in;
  2. Create a Stream for command submission;