GRAVITATION
/ɡʁavitaˈt͡si̯oːn/ (German, noun, feminine: gravity)
Citation
Ernst, S.M., and contributors, 2019. gravitation - n-body-simulation performance test suite. https://github.com/pleiszenburg/gravitation
Synopsis
gravitation is a collection of experiments on writing, benchmarking and optimizing n-body simulations, taking the "two language problem" to the extreme.
In science and engineering, it is a common scenario to use Python as a high-level or "glue" language on top of code written in C, Fortran or other "fast" languages. In some cases, a Python project starts out as a Python wrapper around other, possibly older non-Python code. In other cases, functionality is rapidly prototyped in Python. Eventually, performance-critical code paths are identified, isolated and optimized - or even re-implemented in a second, "faster" language. Either way, there is a great diversity of possible approaches and tools for accelerating Python code and/or combining it with other languages. Depending on a project's requirements, it can be hard to choose the right one(s). Virtually all of them are based on at least one programming language other than Python, in one way or another - hence the "two language problem".

gravitation aims to demonstrate a broad selection of approaches and tools. Both their performance and their implementation complexity are (cautiously) compared based on a single, typical computation-bound problem: n-body simulations. n-body simulations scale up easily, which in turn gives invaluable insights into the scaling behavior of all demonstrated approaches. They are also fairly trivial targets for parallelization, making it possible to study their behavior on hardware platforms with multiple CPU cores (both heterogeneous and homogeneous), multiple CPU sockets, and GPUs or comparable numerical accelerators.
Example
screenshot of interactive benchmark plot - click on the image to see interactive version
The plot above was generated by running the commands below on an Intel i5-4570 CPU with an Nvidia GeForce GTX 1050 Ti graphics card, using CPython 3.6.7, GCC 8.2.0, Linux 4.15.0 x86_64, Octave 4.2.2, CUDA Toolkit 9.1 and Nvidia driver 390.77. CPython, GCC and the Linux kernel are original distribution binaries of Linux Mint 19.1.
# Faster kernels, can work on larger numbers of bodies
gravitation benchmark -p 4 -i 3 -t 5 -k c1a -k c4a -k cp1 -k cp2 -k cy2 -k cy4 -k js1 -k nb1 -k nb2 -k np1 -k np2 -k np3 -k np4 -k oc1 -k oc4 -k pc1 -k pc2 -k torch1 -k ne1 -b 4 14 -l benchmark_long.log
# Slower kernels, should work on smaller numbers of bodies
gravitation benchmark -p 4 -i 3 -t 5 -k py1 -k py2 -k cy1 -b 4 12 -l benchmark_short.log
# Transform logs
gravitation analyze -l benchmark_long.log -o benchmark_long.json
gravitation analyze -l benchmark_short.log -o benchmark_short.json
# Plot data and display in browser
gravitation plot -l benchmark_long.json -l benchmark_short.json -o benchmark.html
Implementation
The design idea is simple: A number of n-body simulation implementations, referred to as kernels, are derived from a single base class (universe_base) written in pure Python. The base class provides the fundamental Python infrastructure but cannot (and should not) perform a simulation on its own.
gravitation kernels divide each simulation time step into two major stages, updating certain parameters of all bodies:
- Acceleration (i.e. the forces between all pairs of bodies), O(N^2)
- Velocity and position, O(N)
Because of its O(N^2) complexity, stage 1 is the prime target for optimization and/or re-implementation. The base class therefore offers a default implementation for stage 2 only. A kernel class derived from the base class must (at least) implement stage 1 on its own. Overriding stage 2 is possible but not required. The "py1" kernel serves as a minimal working reference, implementing stage 1 in pure Python without dependencies.
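The two-stage scheme can be sketched in plain Python, in the spirit of the "py1" reference kernel. Class, method and attribute names below are illustrative only and do not reproduce gravitation's actual universe_base API:

```python
# Illustrative sketch of the two-stage kernel design. All names are
# hypothetical; gravitation's real base class is universe_base.

G = 6.674e-11  # gravitational constant

class UniverseBase:
    """Base class: provides the O(N) stage 2 (velocity and position)."""

    def __init__(self, masses, positions, velocities, dt):
        self.m, self.r, self.v, self.dt = masses, positions, velocities, dt
        self.a = [[0.0, 0.0, 0.0] for _ in masses]  # accelerations

    def step_stage1(self):
        raise NotImplementedError  # kernels must implement the O(N^2) stage

    def step_stage2(self):
        # Euler update of velocity and position, then reset accelerations
        for i in range(len(self.m)):
            for d in range(3):
                self.v[i][d] += self.a[i][d] * self.dt
                self.r[i][d] += self.v[i][d] * self.dt
                self.a[i][d] = 0.0

    def step(self):
        self.step_stage1()
        self.step_stage2()

class PurePythonKernel(UniverseBase):
    """Stage 1 in pure Python: accelerations for all pairs of bodies."""

    def step_stage1(self):
        for i in range(len(self.m)):
            for j in range(i + 1, len(self.m)):
                dx = [self.r[j][d] - self.r[i][d] for d in range(3)]
                dist2 = sum(c * c for c in dx)
                dist = dist2 ** 0.5
                f = G / (dist2 * dist)  # G / r^3, applied to the offset vector
                for d in range(3):
                    self.a[i][d] += f * self.m[j] * dx[d]
                    self.a[j][d] -= f * self.m[i] * dx[d]
```

Two equal masses at rest, for example, accelerate toward each other symmetrically after one step, so total momentum stays zero.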
Kernels must not change the interface for front-end code exposed by the base class. "Python users" and other pieces of Python software must be able to run all kernels transparently, without knowing any of the intricate details of their implementations. Both position and velocity data of every body (also referred to as a "point mass") must be accessible after every time step through iterable objects (e.g. Python lists, arrays or numpy arrays). For reference, have a look at the class _point_mass. Kernels must be located in src/gravitation/kernel to be auto-detected.
Certain kernels allow switching between single-precision and double-precision floating point numbers; others do not (due to language- or instruction-level restrictions, for instance). The infrastructure in question is prepared but not yet complete and will be improved in future releases. In the meantime, single precision is used by default where possible.
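The practical difference between the two precisions is easy to demonstrate with the standard library alone. The helper below is not part of gravitation; it merely rounds a Python float (binary64) through the binary32 representation:

```python
import struct

def as_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# In double precision, adding 1.0 to 1e8 changes the value ...
print(1e8 + 1.0 == 1e8)              # False
# ... but in single precision, 1.0 is below the resolution at 1e8
# (the spacing between adjacent binary32 values there is 8.0)
print(as_float32(1e8 + 1.0) == 1e8)  # True
```

In an n-body simulation, small per-pair force contributions accumulate in exactly this fashion, which is why the precision choice trades accuracy against speed.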
While kernels compute time steps, Python's garbage collector remains switched off. This yields clean results unaffected by "randomly occurring" garbage collections. Directly before and after every time step, a garbage collection is triggered manually. The time required for collecting garbage after a time step has been computed is also measured and recorded separately.
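The measurement scheme described above can be sketched as follows; timed_step is a hypothetical helper, not gravitation's actual benchmark code:

```python
import gc
import time

def timed_step(step):
    """Run one simulation step with the GC disabled, then time a manual
    garbage collection separately - mirroring the scheme described above."""
    gc.collect()          # clean slate before the step
    gc.disable()          # no "random" collections during the measurement
    try:
        t0 = time.perf_counter()
        step()            # the actual simulation time step
        t_step = time.perf_counter() - t0

        t0 = time.perf_counter()
        gc.collect()      # manual collection, timed separately
        t_gc = time.perf_counter() - t0
    finally:
        gc.enable()       # restore normal GC behavior
    return t_step, t_gc
```

Separating t_gc from t_step keeps allocation-heavy kernels comparable to allocation-free ones.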
Existing Kernels
All of gravitation's kernels reside in the kernel sub-module.
- py*: Pure Python, single-thread (compatible with pypy)
- np*: Both single-thread and parallel, based on numpy
- cp*: numpy-compatible implementations using cupy (GPU/CUDA)
- torch*: almost numpy-compatible implementations using torch (GPU/CUDA)
- nb*: accelerated by numba, both CPU and GPU (CUDA), single-thread
- ne*: accelerated by numexpr, single-thread
- pc*: PyCUDA kernels
- cN*: C backends, both single-thread and parallel, both plain C and SIMD (SSE2) intrinsics
- cy*: Cython backends, both plain Python (compiled) and isolated Cython, both single-thread and parallel
- js*: JavaScript backends, currently single-thread and based on py_mini_racer (V8)
- oc*: Octave backends, very likely Matlab-compatible (not yet tested), based on oct2py, both single-thread and parallel
Desired / Planned Kernels
Contributions are highly welcome:
- Faster (pure) Python implementation(s) - "pure" as in "standard-library only"
- Faster (pure) numpy implementation(s)
- Balanced / optimized combinations of numpy, numba and numexpr (tuned individually for smaller and larger numbers of bodies)
- Rust backend(s)
- Go backend(s)
- Swift backend(s) - if this is at all possible
- C backend(s) with AVX(2) and AVX512
- C backend(s) with CUDA (without PyCUDA)
- C backend(s) called through cffi (instead of ctypes)
- C++ backend(s) called through different interfaces
- Faster CUDA backend(s) in general, with or without PyCUDA
- OpenCL backend(s), any language
- ROCr/ROCm backend(s), any language
- Fortran backend(s)
- Julia backend(s)
- TensorFlow backend(s), for both CPU and GPU - (theoretically) possible
- JavaScript on Node.js
- Faster JavaScript in general
- Parallel JavaScript with workers
- Matlab on original Matlab interpreter (not Octave)
- Lisp backend(s)
- Parallel backend(s) based on MPI (in any language)
- Parallel backend(s) based on Dask (in any language)
Kernel-FAQ
Does actual physics matter? No. gravitation is certainly not about physics.
Does numerical accuracy matter? Yes and no. In certain applications, numerical accuracy is more desirable than speed. In other cases, it is the opposite or somewhere in between. Studying the impact of various trade-offs with respect to both speed and accuracy is therefore highly interesting.
What about "optimizations" such as tree methods, for instance Barnes–Hut? This is not what gravitation is about. gravitation is intentionally written as a direct n-body simulation, where forces are computed for all pairs of bodies (in time steps of equal length).
Why is JavaScript even on this list? At first, it seemed like a crazy experiment. But after some initial tests with V8 and Mozilla's SpiderMonkey, it became obvious that JavaScript engines had come a long way. The results were simply impressive. Why should one use it? Well, the basic argument is that JavaScript is currently the most widely used programming language in existence, for better or for worse. JavaScript development skills are therefore relatively easy to get hold of. There are even books on how to use it for research projects including numerical computations, e.g. "JavaScript versus Data Science" a.k.a. "JavaScript for Scientists and Engineers".
Why is cffi desired if there is already a kernel using ctypes? From a functional point of view, cffi and ctypes do not differ much. However, they differ in both code complexity and performance. The differences when scaling up are highly interesting.
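For illustration, here is roughly how the two call styles compare - using libm's cos rather than gravitation's actual C kernels. The cffi variant is only sketched in comments, since cffi is a third-party package:

```python
# Calling a C function from Python via ctypes (standard library).
# This uses libm's cos purely as an example; gravitation's C kernels
# expose their own functions.
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.cos.argtypes = [ctypes.c_double]  # signature declared in Python
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0

# The same call with cffi (ABI mode) would read roughly:
#   from cffi import FFI
#   ffi = FFI()
#   ffi.cdef("double cos(double);")  # C-style declaration, parsed by cffi
#   libm = ffi.dlopen("m")
#   libm.cos(0.0)
```

With ctypes, signatures are spelled out as Python objects; with cffi, they are parsed from C declarations. The call overhead of the two differs, which is what becomes interesting when scaling up.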
What about different compilers and compiler versions? This is yet another interesting dimension that is intended to be added to the benchmark infrastructure.

