ReliQ — Lattice Field Theory Framework

"We all make choices. But in the end our choices make us." — Andrew Ryan (BioShock)

ReliQ is an experimental lattice field theory framework written in Nim, designed for user-friendliness, performance, reliability, and portability across heterogeneous architectures. Distributed memory is handled through a partitioned global address space model backed by Global Arrays (GA), while device-level parallelism dispatches across three backends — OpenCL, SYCL, and OpenMP — through a single user-facing API.

Early Development — ReliQ is under active development and is not yet production-ready. Contributions are welcome; contact us at reliq-lft@proton.me or follow us on our organization page.

Architecture

ReliQ is organized into layered abstractions, each narrowing scope from global distributed data to device-specific kernel execution:

┌──────────────────────────────────────────────────────────┐
│                       User Code                          │
│            import reliq; each v, n: v[n] = ...           │
├──────────────────────────────────────────────────────────┤
│                    Tensor Layer                          │
│   TensorField ─► LocalTensorField ─► TensorFieldView    │
│   (GA/MPI)        (host buffer)       (device buffers)   │
├──────────────────────────────────────────────────────────┤
│        GlobalShifter · LatticeStencil · Transporter      │
│        discreteLaplacian · applyStencilShift             │
├──────────────────────────────────────────────────────────┤
│                 Backend Dispatch                         │
│     OpenCL (JIT)  │  SYCL (pre-compiled)  │  OpenMP     │
│      (cldisp)     │    (sycldisp)         │  (ompdisp)  │
├──────────────────────────────────────────────────────────┤
│              Memory & Communication                      │
│   Global Arrays · MPI · AoSoA Layout · SIMD Intrinsics   │
└──────────────────────────────────────────────────────────┘

Data Flow

TensorField[D,R,L,T] — A distributed tensor field stored as a Global Array with ghost (halo) regions for boundary communication across MPI ranks.
LocalTensorField[D,R,L,T] — A contiguous host-memory copy of the rank-local partition. Created via newLocalTensorField(); data flows back to the GA on releaseLocalTensorField().
TensorFieldView[L,T] — A device-side view optimized for the active backend (AoSoA layout for SIMD, GPU buffers for OpenCL/SYCL). This is the type the each macro operates on.
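
The AoSoA layout mentioned above can be illustrated with a small index-mapping sketch. This is plain Python, not ReliQ code: the function names `aosoa_offset` and `aos_to_aosoa` are hypothetical, and the exact blocking ReliQ uses (lane width, element order) is an assumption here. The idea is that sites are grouped into blocks of `width` lanes, and within a block each tensor element stores its lane values contiguously so a SIMD load can fetch them in one go.

```python
# Illustrative AoSoA (array-of-structures-of-arrays) indexing; not ReliQ's code.
# Assumption: sites are grouped into blocks of `width` lanes, and within a
# block each tensor element stores its `width` lane values contiguously.

def aosoa_offset(site: int, elem: int, n_elems: int, width: int) -> int:
    """Flat offset of (site, elem) in an AoSoA buffer."""
    block, lane = divmod(site, width)
    return block * (n_elems * width) + elem * width + lane


def aos_to_aosoa(aos: list, n_elems: int, width: int) -> list:
    """Repack a site-major (AoS) buffer into AoSoA order."""
    n_sites = len(aos) // n_elems
    if n_sites % width != 0:
        raise ValueError("number of sites must be a multiple of the lane width")
    out = [0.0] * len(aos)
    for site in range(n_sites):
        for elem in range(n_elems):
            out[aosoa_offset(site, elem, n_elems, width)] = aos[site * n_elems + elem]
    return out


# 4 sites, 2 elements per site, 2 lanes per block.
# AoS order: s0e0 s0e1 s1e0 s1e1 s2e0 s2e1 s3e0 s3e1
aos = [0.0, 0.1, 1.0, 1.1, 2.0, 2.1, 3.0, 3.1]
packed = aos_to_aosoa(aos, n_elems=2, width=2)
print(packed)  # [0.0, 1.0, 0.1, 1.1, 2.0, 3.0, 2.1, 3.1]
```

After repacking, element 0 of sites 0 and 1 sit next to each other, followed by element 1 of the same two sites, which is the property a vectorized loop relies on.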

Three Compute Backends

| Backend | Flag | Best For | Mechanism |
|---------|------|----------|-----------|
| OpenCL | (default) | GPUs, FPGAs | JIT kernel compilation at runtime |
| SYCL | BACKEND=sycl | Intel GPUs, oneAPI | Pre-compiled C++ template kernels |
| OpenMP | BACKEND=openmp | CPU-only | SIMD-vectorized loops (SSE/AVX2/AVX-512) |

All three backends share the same user-facing API — the each macro analyzes loop bodies at compile time and generates the appropriate backend code.

Quick Start

Prerequisites

Python 3.10+ (for the bootstrap/configure scripts and launcher)
A C/C++ compiler (GCC, Clang, or icpx)
MPI implementation (OpenMPI, MPICH, etc.)

Installation

# 1. Clone the repository
git clone https://github.com/reliq-lft/ReliQ.git
cd ReliQ

# 2. Create a build directory
mkdir -p /path/to/build && cd /path/to/build

# 3. Bootstrap dependencies (installs Nim, Global Arrays, Kokkos via Spack)
/path/to/ReliQ/bootstrap

# 4. Configure
/path/to/ReliQ/configure

The bootstrap script performs a local Spack installation and uses it to install Nim 2.2.4, Global Arrays 5.8.2, and Kokkos 4.6.01. All dependencies are installed under <build>/external/.

Building and Running

# Compile a module
make tensor

# Run tests with the parallel launcher
./reliq -e tensor -n 1       # 1 MPI rank
./reliq -e tensor -n 4       # 4 MPI ranks

# Run the full test suite (core + all backends)
make test

The `each` Macro

The each macro is the primary mechanism for expressing computations on lattice fields. It works on TensorFieldView objects and generates optimized backend-specific code at compile time.

import reliq

parallel:
  let lat = newSimpleCubicLattice([8, 8, 8, 16])

  block:
    var fieldA = lat.newTensorField([3, 3]): float64
    var fieldB = lat.newTensorField([3, 3]): float64
    var fieldC = lat.newTensorField([3, 3]): float64

    var localA = fieldA.newLocalTensorField()
    var localB = fieldB.newLocalTensorField()
    var localC = fieldC.newLocalTensorField()

    # Create device views
    var vA = localA.newTensorFieldView(iokRead)
    var vB = localB.newTensorFieldView(iokRead)
    var vC = localC.newTensorFieldView(iokWrite)

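    # Dispatch computation across all backend devices.
    # Each assignment below overwrites vC[n]; the three are shown together
    # only to illustrate the supported operations.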
    for n in each 0..<vA.numSites():
      vC[n] = vA[n] + vB[n]          # Element-wise addition
      vC[n] = vA[n] * vB[n]          # Matrix multiplication
      vC[n] = 3.0 * vA[n]            # Scalar multiplication
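
To make the per-site semantics of the three operations concrete, here is a serial sketch in plain Python. It is not ReliQ's implementation: `each_site`, `mat_add`, `mat_mul`, and `scal_mul` are hypothetical helpers, and the loop runs serially where the `each` macro would dispatch to a backend. It assumes, as the comments in the example above state, that `+` is element-wise and `*` between two [3, 3] tensors is a matrix product.

```python
# Pure-Python stand-in for site-wise 3x3 tensor operations; not ReliQ's code.

def mat_add(a, b):
    """Element-wise sum of two 3x3 matrices."""
    return [[a[i][j] + b[i][j] for j in range(3)] for i in range(3)]


def mat_mul(a, b):
    """Matrix product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def scal_mul(s, a):
    """Scalar times 3x3 matrix."""
    return [[s * a[i][j] for j in range(3)] for i in range(3)]


def each_site(n_sites, body):
    """Serial stand-in for a parallel dispatch: call body(n) for every site."""
    for n in range(n_sites):
        body(n)


n_sites = 2
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
# field_a[n] = (n + 1) * I, field_b[n] = matrix with entries 0..8
field_a = [scal_mul(float(n + 1), identity) for n in range(n_sites)]
field_b = [[[float(3 * i + j) for j in range(3)] for i in range(3)]
           for _ in range(n_sites)]
field_c = [None] * n_sites


def body(n):
    # Matrix product at site n; independent of every other site.
    field_c[n] = mat_mul(field_a[n], field_b[n])


each_site(n_sites, body)
print(field_c[1][2])  # [12.0, 14.0, 16.0]
```

Because each site's result depends only on data at that site, the loop body can be executed in any order, which is what allows the same source to target GPU kernels or vectorized CPU loops.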

Stencil Operations in `each` Loops

let stencil = newLatticeStencil(nearestNeighborStencil[4](), lat)

# vSrc and vDst are TensorFieldView objects created as in the previous example
for n in each 0..<vDst.numSites():
  let fwd = stencil.fwd(n, 0)     # Forward x-neighbor
  let bwd = stencil.bwd(n, 0)     # Backward x-neighbor
  vDst[n] = vSrc[fwd] + vSrc[bwd] - 2.0 * vSrc[n]
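
The neighbor lookup in the loop above can be sketched independently of ReliQ. The Python below assumes a lexicographic site ordering with dimension 0 varying fastest and periodic boundaries; both are assumptions for illustration, and `site_to_coords`, `coords_to_site`, and `neighbor` are hypothetical names, not part of the `LatticeStencil` API.

```python
# Nearest-neighbor indexing on a periodic lattice; illustrative only.
# Assumption: lexicographic ordering with dimension 0 varying fastest.

def site_to_coords(n, dims):
    """Convert a flat site index to lattice coordinates."""
    coords = []
    for extent in dims:
        coords.append(n % extent)
        n //= extent
    return coords


def coords_to_site(coords, dims):
    """Convert lattice coordinates back to a flat site index."""
    n, stride = 0, 1
    for c, extent in zip(coords, dims):
        n += c * stride
        stride *= extent
    return n


def neighbor(n, mu, step, dims):
    """Index of the site reached from n by `step` sites in direction mu."""
    coords = site_to_coords(n, dims)
    coords[mu] = (coords[mu] + step) % dims[mu]  # periodic wrap-around
    return coords_to_site(coords, dims)


dims = [4, 4]
n_sites = dims[0] * dims[1]
# Source field f(x, y) = x^2
src = [float(site_to_coords(n, dims)[0] ** 2) for n in range(n_sites)]
# Second difference in x, matching the loop body above
dst = [src[neighbor(n, 0, +1, dims)] + src[neighbor(n, 0, -1, dims)]
       - 2.0 * src[n] for n in range(n_sites)]
print(dst[:4])  # [10.0, 2.0, 2.0, -14.0]
```

Interior sites give the expected second difference of x^2, namely 2; the two edge sites differ because the periodic wrap connects x = 0 to x = 3.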

The `all` Loop (Host-Side)

The all loop operates on LocalTensorField objects for host-side site-level operations using LocalSiteProxy:

var localA = fieldA.newLocalTensorField()
var localB = fieldB.newLocalTensorField()
var localC = fieldC.newLocalTensorField()

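# Each assignment below overwrites localC[n]; they are shown together only to
# illustrate the supported operations.
for n in all 0..<localC.numSites():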
  localC[n] = localA.getSite(n) + localB.getSite(n)
  localC[n] = localA.getSite(n) * localB.getSite(n)
  localC[n] = 2.5 * localA.getSite(n)

# Write changes back to the distributed Global Array
localC.releaseLocalTensorField()

Distributed Transport

GlobalShifter — MPI-Level Transport

For operations that cross MPI partition boundaries at the TensorField level:

parallel:
  let lat = newSimpleCubicLattice([8, 8, 8, 16], [1, 1, 1, 4], [1, 1, 1, 1])

  block:
    var src  = lat.newTensorField([1, 1]): float64
    var dest = lat.newTensorField([1, 1]): float64

    # Shift forward in the t-dimension (crosses MPI boundaries)
    let shifter = newGlobalShifter(src, dim=3, len=1)
    shifter.apply(src, dest)   # dest[x] = src[x + e_t]

    # Discrete Laplacian: sum_mu (f[x+mu] + f[x-mu]) - 2D * f[x]
    var lap = lat.newTensorField([1, 1]): float64
    var scratch = lat.newTensorField([1, 1]): float64
    discreteLaplacian(src, lap, scratch)
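
The following Python sketch illustrates the idea behind a shift that crosses partition boundaries: each rank pads its local data with ghost sites copied from its neighbors, then computes locally. It is a one-dimensional toy model under stated assumptions (a ring of 16 sites split evenly over 4 simulated ranks, ghost width 1); the helpers `split`, `exchange_ghosts`, `shift_forward`, and `laplacian_1d` are hypothetical and do not reflect how Global Arrays or ReliQ perform the exchange internally.

```python
# Toy model of ghost (halo) exchange on a periodic 1D ring; illustrative only.

def split(field, n_ranks):
    """Split a global field into equal contiguous partitions, one per rank."""
    chunk = len(field) // n_ranks
    return [field[r * chunk:(r + 1) * chunk] for r in range(n_ranks)]


def exchange_ghosts(parts):
    """Pad each partition with one ghost site per side from its neighbors."""
    n = len(parts)
    return [[parts[(r - 1) % n][-1]] + parts[r] + [parts[(r + 1) % n][0]]
            for r in range(n)]


def shift_forward(parts):
    """dest[t] = src[t + 1], computed per rank from ghost-padded data."""
    return [p[2:] for p in exchange_ghosts(parts)]


def laplacian_1d(parts):
    """f[t+1] + f[t-1] - 2 f[t] per rank (the D = 1 case of the formula above)."""
    return [[p[i + 1] + p[i - 1] - 2.0 * p[i] for i in range(1, len(p) - 1)]
            for p in exchange_ghosts(parts)]


src = [float(t * t) for t in range(16)]
parts = split(src, 4)

shifted = [x for part in shift_forward(parts) for x in part]
lap = [x for part in laplacian_1d(parts) for x in part]

print(shifted == src[1:] + src[:1])  # True
print(lap[1:4])                      # [2.0, 2.0, 2.0]
```

The partitioned result matches a shift of the undivided field, which is the property a distributed shifter has to preserve regardless of how many ranks the lattice is split across.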

Two Transport Layers

| Layer | Type | Communication |
|-------|------|---------------|
| GlobalShifter | TensorField | GA ghost exchange (MPI) |
| Shifter / Transporter | TensorFieldView | Device-side halo buffers |

Use GlobalShifter when working with distributed tensor fields directly (setup, I/O, measurements). Use Shifter when data is already on-device inside each loops.
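
As a rough way to reason about which shifts involve inter-rank communication, the sketch below computes rank-local extents and checks whether a one-site shift leaves the owning rank. It assumes a simple block decomposition in which the second argument of `newSimpleCubicLattice` in the example above is the MPI rank grid; that reading, and the helper names `local_extents`, `owner`, and `crosses_boundary`, are assumptions rather than documented ReliQ behavior.

```python
# Block-decomposition bookkeeping; illustrative, not ReliQ's partitioning code.

def local_extents(global_dims, rank_grid):
    """Per-rank lattice extents for an even block decomposition."""
    for g, r in zip(global_dims, rank_grid):
        if g % r != 0:
            raise ValueError("each extent must be divisible by the rank grid")
    return [g // r for g, r in zip(global_dims, rank_grid)]


def owner(coords, global_dims, rank_grid):
    """Rank-grid coordinates of the rank that owns a site."""
    local = local_extents(global_dims, rank_grid)
    return [c // l for c, l in zip(coords, local)]


def crosses_boundary(coords, mu, step, global_dims, rank_grid):
    """True if shifting `coords` by `step` in direction mu changes the owner."""
    moved = list(coords)
    moved[mu] = (moved[mu] + step) % global_dims[mu]  # periodic lattice
    return owner(coords, global_dims, rank_grid) != owner(moved, global_dims, rank_grid)


global_dims = [8, 8, 8, 16]
rank_grid = [1, 1, 1, 4]

print(local_extents(global_dims, rank_grid))                           # [8, 8, 8, 4]
print(crosses_boundary([0, 0, 0, 3], 3, +1, global_dims, rank_grid))   # True
print(crosses_boundary([0, 0, 0, 2], 3, +1, global_dims, rank_grid))   # False
print(crosses_boundary([7, 0, 0, 0], 0, +1, global_dims, rank_grid))   # False
```

With this rank grid only shifts in the t-direction can leave a rank; a wrap-around in x stays on the same rank because that dimension is not split.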

I/O

ReliQ supports standard lattice QCD file formats:

parallel:
  let lat = newSimpleCubicLattice([8, 8, 8, 16])

  block:
    # Read an ILDG gauge configuration
    var gaugeField: array[4, TensorField[4, 2, typeof(lat), Complex64]]
    for mu in 0..<4:
      gaugeField[mu] = lat.newTensorField([3, 3]): Complex64
    readGaugeField(gaugeField, "config.ildg")

    # Write a tensor field
    var field = lat.newTensorField([3, 3]): float64
    writeTensorField(field, "output.lime")

Supported formats: LIME containers, SciDAC/QIO with XML metadata and checksums, ILDG gauge configurations.

Test Suite

ReliQ has a comprehensive test suite organized into four categories:

make test-core     # Backend-agnostic (lattice, stencil, tensor, transport, I/O)
make test-opencl   # OpenCL backend
make test-openmp   # OpenMP backend with SIMD
make test-sycl     # SYCL backend
make test          # All of the above

Each test module runs at both 1 and 4 MPI ranks. The current suite contains 1,660 tests across all backends with zero failures.

| Suite | Tests |
|-------|-------|
| Core (backend-agnostic) | 875 |
| OpenCL | 245 |
| OpenMP | 295 |
| SYCL | 245 |
| Total | 1,660 |

Documentation

API documentation is available at reliq-lft.github.io/ReliQ. Generate documentation locally from the build directory:

./document

Module Overview

| Module | Description |
|--------|-------------|
| lattice | SimpleCubicLattice[D], LatticeStencil[D], indexing utilities |
| tensor | TensorField, LocalTensorField, TensorFieldView, GlobalShifter |
| parallel | Backend-agnostic parallel dispatch (parallel: template, each macro) |
| io | LIME/QIO/SciDAC/ILDG file I/O with checksum validation |
| globalarrays | Global Arrays FFI bindings, distributed array types, MPI wrappers |
| opencl | OpenCL JIT kernel generation and dispatch |
| sycl | SYCL pre-compiled kernel dispatch via libreliq_sycl.so |
| openmp | OpenMP SIMD-vectorized CPU dispatch |
| simd | SimdVec[N,T], SimdLatticeLayout, AoSoA memory layout |
| utils | Complex number predicates, comm
