XSBench

Published in Annals of Nuclear Energy

XSBench is a mini-app representing a key computational kernel of the Monte Carlo neutron transport algorithm. Specifically, XSBench represents the continuous energy macroscopic neutron cross section lookup kernel. XSBench serves as a lightweight stand-in for full neutron transport applications like OpenMC, and is a useful tool for performance analysis on high performance computing architectures.

Table of Contents

  1. Understanding the FOM
  2. Selecting a Programming Model
  3. Compilation
  4. Running XSBench / Command Line Interface
  5. Feature Discussion
  6. Theory & Algorithms
  7. Optimized Kernels
  8. Citing XSBench
  9. Development Team

Understanding the XSBench Figure of Merit

The figure of merit (FOM) in XSBench is the number of macroscopic cross section lookups performed per second. This value includes only the time spent performing the cross section lookups themselves; it is intended to exclude program initialization (e.g., generating the randomized cross section data) and data movement to and from device memory spaces. In some of the GPU implementations of XSBench, timing data may be limited to host CPU timers, in which case it may include the cost of data movement to and from the device memory space. It is therefore recommended to use device kernel timers (e.g., nsys, nvprof, rocprof, iprof) to ascertain the true kernel runtime of the simulation loop alone. The true FOM can then be computed by dividing the number of lookups (reported by XSBench at the end of the simulation) by the kernel runtime.
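As a concrete sketch of the division above (the lookup count and kernel time here are hypothetical values, not from an actual run), the FOM is simply the reported lookup count divided by the profiler-reported kernel runtime:

```shell
# Hypothetical values: XSBench reported 17,000,000 lookups, and a device
# profiler (e.g., nsys) measured 0.425 s of kernel time for the lookup loop.
lookups=17000000
kernel_seconds=0.425

# FOM = lookups / kernel runtime, in lookups per second
awk -v n="$lookups" -v t="$kernel_seconds" \
    'BEGIN { printf "FOM: %.0f lookups/s\n", n / t }'
```

This prints `FOM: 40000000 lookups/s` for the hypothetical numbers above.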

In some cases, users may want to extend the runtime of XSBench (e.g., for investigating thermal effects that only arise after several seconds of execution). The best way of doing this while still representing a realistic cross section lookup kernel is to increase the number of lookups (in event-based mode), or particle histories (in history-based mode), as described in the running XSBench section below.
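As a usage sketch of the two knobs mentioned above (the 10x multipliers are arbitrary, chosen only for illustration), both are set on the command line:

```shell
# Event-based mode: raise the total lookup count (default 17,000,000) 10x
./XSBench -m event -l 170000000

# History-based mode: raise the particle history count (default 500,000) 10x
./XSBench -m history -p 5000000
```

Both approaches lengthen the simulation loop without changing the character of an individual lookup.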

Selecting A Programming Model

XSBench has been implemented in multiple different languages to target a variety of computational architectures and accelerators. The available implementations can be found in their own directories:

  1. XSBench/openmp-threading This is the "default" version of XSBench that is appropriate for serial and multicore CPU architectures. The method of parallelism is via the OpenMP threading model.

  2. XSBench/openmp-offload This method of parallelism uses OpenMP 4.5 (or newer) to map program data to a remote accelerator memory space and run targeted kernels on the accelerator. This method of parallelism could be used for a wide variety of architectures (besides CPUs) that support OpenMP 4.5 targeting. NOTE: The Makefile will likely not work by default and will need to be adjusted to utilize your OpenMP accelerator compiler.

  3. XSBench/cuda This version of XSBench is written in CUDA for use with NVIDIA GPU architectures. NOTE: You will likely want to specify in the makefile the SM version for the card you are running on.

  4. XSBench/opencl This version of XSBench is written in OpenCL, and can be used for CPU, GPU, FPGA, or other architectures that support OpenCL. It was written with GPUs in mind, so if running on other architectures you may need to heavily re-optimize the code. You will also likely need to edit the makefile to supply the path to your OpenCL compiler.

  5. XSBench/sycl This version of XSBench is written in SYCL, and can be used for CPU, GPU, FPGA, or other architectures that support OpenCL and SYCL. It was written with GPUs in mind, so if running on other architectures you may need to heavily re-optimize the code. You will also likely need to edit the makefile to supply the path to your SYCL compiler.

  6. XSBench/hip This version of XSBench is written in HIP for use with GPU architectures. This version is derived from CUDA using an automatic conversion tool with only a few small manual changes.

Compilation

To compile XSBench with default settings, navigate to your selected source directory and use the following command:

make

You can alter compiler settings in the included Makefile. Alternatively, for the OpenMP threading version of XSBench, you may specify a compiler via the CC environment variable and then build as normal, e.g.:

export CC=clang
make

Debugging, Optimization & Profiling

There are also a number of switches that can be set in the makefile. Here is a sample of the control panel at the top of the makefile:

OPTIMIZE = yes
DEBUG    = no
PROFILE  = no
MPI      = no
AML      = no
  • Optimization enables the -O3 optimization flag.
  • Debugging enables the -g flag.
  • Profiling enables the -pg flag. When profiling the code, you may wish to significantly increase the number of lookups (with the -l flag) in order to wash out the initialization phase of the code.
  • MPI enables MPI support in the code.
  • AML enables AML optimizations. See AML Optimizations.
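These switches are edited in the Makefile itself. As a sketch of flipping one of them with sed (using a stand-in file here so nothing in a real build is touched; the spelling and spacing match the control panel shown above):

```shell
# Stand-in for the control panel at the top of the Makefile
printf 'OPTIMIZE = yes\nDEBUG    = no\nMPI      = no\n' > Makefile.demo

# Flip the MPI switch from "no" to "yes"
sed -i.bak 's/^MPI      = no/MPI      = yes/' Makefile.demo

grep '^MPI' Makefile.demo
```

This prints `MPI      = yes`; run the same substitution against the real Makefile once you have confirmed the line matches.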

Running XSBench

To run XSBench with default settings, use the following command:

./XSBench

For non-default settings, XSBench supports the following command line options:

| Argument | Description | Options | Default |
|----------|-------------|---------|---------|
| -t | # of OpenMP threads | integer value | System Default |
| -m | Simulation method | history, event | history |
| -s | Problem Size | small, large, XL, XXL | large |
| -g | # of gridpoints per nuclide (overrides -s defaults) | integer value | 11,303 |
| -G | Grid search type | unionized, nuclide, hash | unionized |
| -p | # of particle histories (if running using "history" method) | integer value | 500,000 |
| -l | # of Cross-section (XS) lookups. If using history based method, this is lookups per particle history. If using event-based method, this is total lookups. | integer value | (History: 34) (Event: 17,000,000) |
| -h | # of hash bins (only used with hash-based grid search) | integer value | 10,000 |
| -b | Read/Write binary files | read, write | |
| -k | Optimized kernel ID | integer value | 0 |

  • -t [threads] Sets the number of OpenMP threads to run. By default, XSBench will run with 1 thread per hardware core. If the architecture supports hyperthreading, multiple threads will be run per core. If running in MPI mode, this will be the number of threads per MPI rank. This argument is only used by the OpenMP threading version of XSBench.

  • -m [simulation method] Sets the simulation method, either "history" or "event". These options represent the history based and event based algorithms respectively. The default is the history based method. These two methods represent different ways of parallelizing the Monte Carlo transport method. In the history based method, the central mode of parallelism is expressed over particles, each of which requires some number of macroscopic cross section lookups to be executed in series and in a dependent order. The event based method expresses its parallelism over a large pool of independent macroscopic cross section lookups that can be executed in any order, without dependence. The key difference between the two methods is the dependence or independence of the macroscopic cross section loop. See the Transport Simulation Styles section for more information.

  • -s [size] Sets the size of the Hoogenboom-Martin reactor model. There are four options: 'small', 'large', 'XL', and 'XXL'; 'large' is the default. The H-M size corresponds to the number of nuclides present in the fuel region. The small version has 34 fuel nuclides, whereas the large version has 321 fuel nuclides. The large problem significantly slows down the runtime of the program, as the data structures are much larger and more lookups are required whenever a lookup occurs in a fuel material. The additional size options, 'XL' and 'XXL', do not directly correspond to any particular physical model. They are similar to the H-M 'large' option, except that the number of gridpoints per nuclide has been increased greatly. This creates an extremely large energy grid data structure (XL: 120 GB, XXL: 252 GB), which is unlikely to fit on a single node but is useful for experimentation on novel architectures.

  • -g [gridpoints] Sets the number of gridpoints per nuclide. By default, this value is set to 11,303. This corresponds to the average number of actual gridpoints per nuclide in the H-M Large model as run by OpenMC with the actual ACE ENDF cross-section data. Note that this option will override the default number of gridpoints set by the -s option.
