Cocos (Core Computational System) - Scientific GPU Computing in Python
Overview
Cocos is a package for numeric and scientific computing on GPUs for Python with a NumPy-like API. It supports both CUDA and OpenCL on Windows, Mac OS, and Linux. Internally, it relies on the ArrayFire C/C++ library. Cocos offers a multi-GPU map-reduce framework. In addition to its numeric functionality, it allows parallel computation of SymPy expressions on the GPU.
Highlights
- Fast vectorized computation on GPUs with a NumPy-like API.
- Multi-GPU support via map-reduce.
- High-performance random number generators for beta, chi-square, exponential, gamma, logistic, lognormal, normal, uniform, and Wald distributions. Antithetic random numbers for uniform and normal distributions.
- Provides a GPU equivalent to SymPy's lambdify, which enables numeric evaluation of symbolic SymPy (multi-dimensional array) expressions on the GPU for vectors of input parameters in parallel.
- Adaptation of SciPy's gaussian_kde to the GPU.
Table of Contents
- Installation
- Getting Started
- Multi-GPU Computing
- Memory Limitations on the GPU Device
- Examples
5.1. Estimating Pi via Monte Carlo
5.2. Option Pricing in a Stochastic Volatility Model via Monte Carlo
5.3. Numeric evaluation of SymPy array expressions on the GPU
5.4. Kernel Density Estimation - Benchmark
- Functionality
- Limitations and Differences with NumPy
- A Note on Hardware Configurations for Multi-GPU Computing
- License
Installation
0. Prerequisites
NVIDIA CUDA must be installed on the system.
1. Download and Install ArrayFire
Windows
Linux via Installer
Ubuntu Linux 20.04 and Derivatives via APT
- Follow the instructions here: https://github.com/arrayfire/arrayfire/wiki/Install-ArrayFire-From-Linux-Package-Managers, concretely:
- Add the ArrayFire GPG key via
<pre> sudo apt-key adv --fetch-key https://repo.arrayfire.com/GPG-PUB-KEY-ARRAYFIRE-2020.PUB </pre>
- Register the ArrayFire repo as a software source for apt-get via
<pre> echo "deb [arch=amd64] https://repo.arrayfire.com/ubuntu focal main" | sudo tee /etc/apt/sources.list.d/arrayfire.list </pre>
(if your distribution is based on a different version of Ubuntu, replace focal with the code name obtained via lsb_release -c)
- Update software sources and install ArrayFire via
<pre> sudo apt-get update && sudo apt-get install arrayfire </pre>
Ubuntu Linux 20.04 and Derivatives via Docker
Docker must be installed on your system. See here for instructions: https://docs.docker.com/engine/install/ubuntu/.
- Set up the nvidia-docker plugin as follows:
  - Add NVIDIA's cryptographic key to apt-get via
  <pre> curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | sudo apt-key add - </pre>
  - Add NVIDIA's repo to the software sources apt-get is able to access via
  <pre> curl -s -L https://nvidia.github.io/nvidia-container-runtime/ubuntu20.04/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list </pre>
  - Update the software sources and install the NVIDIA container runtime via
  <pre> sudo apt-get update && sudo apt-get install nvidia-container-runtime </pre>
  - Stop the Docker daemon via
  <pre> sudo systemctl stop docker </pre>
  - Restart the Docker daemon via
  <pre> sudo systemctl start docker </pre>
- Build the Docker image from the Dockerfile via
<pre> sudo docker build --tag cocos . </pre>
(this only needs to be done the first time)
- Run a Docker container based on the image created in the previous step via
<pre> sudo docker run -it --gpus all cocos </pre>
- To test the installation:
  - Navigate to the Monte Carlo example via
  <pre> cd examples/monte_carlo_pi </pre>
  - Run the example via
  <pre> python3 -m monte_carlo_pi </pre>
MacOS up until High Sierra (version 10.13.6)
2. Make sure that your System is able to locate ArrayFire's libraries
ArrayFire's functionality is contained in dynamic libraries: dynamic link libraries (.dll) on Windows and shared objects (.so) on Unix.
This step is to ensure that these library files can be located on your system.
On Windows, this can be done by adding %AF_PATH%\lib to the path environment variable.
On Linux and Mac, one can either install (or copy) the ArrayFire libraries and their dependencies
to /usr/local/lib or modify the environment variable LD_LIBRARY_PATH (Linux) or
DYLD_LIBRARY_PATH (MacOS) to include the ArrayFire library directory.
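On Linux, for example, this might look as follows (the install location is an assumption; adjust the path to wherever ArrayFire's shared objects actually live on your system):

```shell
# Assumed install location; replace /opt/arrayfire/lib64 with the directory
# that contains ArrayFire's .so files on your system
export LD_LIBRARY_PATH=/opt/arrayfire/lib64:$LD_LIBRARY_PATH
```

On MacOS, the same pattern applies with DYLD_LIBRARY_PATH in place of LD_LIBRARY_PATH.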
3. Install Cocos via PIP:
<pre> pip install cocos </pre>
or
<pre> pip3 install cocos </pre>
if not using Anaconda.
To get the latest version, clone the repository from GitHub, open a terminal/command prompt, navigate to the root folder, and install via
<pre> pip install . </pre>
or
<pre> pip3 install . </pre>
if not using Anaconda.
Getting Started
Platform Information:
Print available devices
<pre>
import cocos.device as cd

cd.info()
</pre>
Select a device
<pre>
cd.ComputeDeviceManager.set_compute_device(0)
</pre>
First Steps:
<pre>
# begin by importing the numerics package
import cocos.numerics as cn

# create two arrays from lists
a = cn.array([[1.0, 2.0],
              [3.0, 4.0]])

b = cn.array([[5.0],
              [6.0]])

# print their contents
print(a)
print(b)

# matrix product of a and b
c = a @ b
print(c)

# create a 2x2 array of normally distributed random numbers
d = cn.random.randn(2, 2)
print(d)
</pre>
Multi-GPU Computing:
Cocos provides map-reduce as well as the related map-combine as multi-GPU
programming models. The computations are separated into 'batches' and then distributed
across GPU devices in a pool. Cocos implements multi-GPU support via process-based parallelism.
To run the function my_gpu_function over separate batches of input data on multiple
GPUs in parallel, first create a ComputeDevicePool (from cocos.multi_processing.device_pool).
To construct the batches, separate the arguments of the function into
- a list of args lists (one args list per batch), and
- a list of kwargs dictionaries (one dictionary per batch)
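As an illustration of this batching convention, the following plain-Python sketch builds the two lists for a hypothetical two-parameter function (the batch count, batch size, and the seed keyword are illustrative, not part of the Cocos API):

```python
# Build per-batch args lists and kwargs dictionaries, following the
# convention described above: one args list and one kwargs dictionary
# per batch. The function they would be passed to is hypothetical.

number_of_batches = 4
points_per_batch = 250_000

# one args list per batch
list_of_args = [[points_per_batch] for _ in range(number_of_batches)]

# one kwargs dictionary per batch (here: a distinct seed for each batch)
list_of_kwargs = [{'seed': batch_index}
                  for batch_index in range(number_of_batches)]

print(list_of_args)    # [[250000], [250000], [250000], [250000]]
print(list_of_kwargs)  # [{'seed': 0}, {'seed': 1}, {'seed': 2}, {'seed': 3}]
```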
Run the function in separate batches via map_reduce
The reduction function iteratively aggregates two results from the list of results generated by
my_gpu_function from left to right, beginning at initial_value (i.e. reducing
initial_value and the result of my_gpu_function corresponding to the first batch).
The list of results is in the same order as the lists of args and kwargs.
If the function requires input arrays on the GPU, they must be provided to map_reduce as
NumPy arrays. The data is then sent to the process managing the GPU assigned to this
batch, where it is moved to the GPU device by a host_to_device_transfer_function.
This function needs to be implemented by the user.
Likewise, results that involve GPU arrays are transferred to the host via a user-supplied
device_to_host_transfer_function and are then sent back to the main process before
reduction takes place.
map_combine is a variation of map_reduce, in which a combination function aggregates the
list of results in a single step.
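The difference between the two aggregation styles can be sketched in plain Python; this mimics the semantics described above rather than the Cocos API itself, and the per-batch results are hypothetical numbers:

```python
from functools import reduce

# hypothetical per-batch results, in the same order as the args/kwargs lists
results = [10, 20, 30, 40]
initial_value = 0

# map_reduce-style aggregation: fold two results at a time, left to right,
# starting by reducing initial_value with the first batch's result
reduced = reduce(lambda left, right: left + right, results, initial_value)

# map_combine-style aggregation: one combination function applied to the
# whole list of results in a single step
combined = sum(results)

assert reduced == combined == 100
```

Both styles produce the same aggregate here; map_combine is convenient when a single-step combination (such as summing or concatenating all results) is more natural than a pairwise fold.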
Please refer to the documentation of cocos.multi_processing.device_pool.ComputeDevicePool.map_reduce as well as
cocos.multi_processing.device_pool.ComputeDevicePool.map_combine for further details.
See 'examples/heston_pricing_multi_gpu_example.py' for a fully worked example.
Memory Limitations on the GPU Device
It is common for modern desktop computers to support up to 128 GB of RAM, whereas video cards typically feature only a small fraction of that as VRAM. The consequence is that algorithms that work well on a CPU can run into memory limitations when run on a GPU device.
In some cases this problem can be resolved by running the computation in batches or chunks and transferring results from the GPU to the host after each batch has been processed.
Using map_reduce_single_device and map_combine_single_device found in
cocos.multi_processing.single_device_batch_processing, computations on a single GPU can be split into chunks
and run sequentially. The interface is modeled after the multi-GPU functionality described in the previous section.
Calls to map_reduce_single_device and map_combine_single_device can be nested in a multi-GPU computation,
which is how multi-GPU evaluation of kernel density estimates is realized in Cocos
(see cocos.scientific.kde.gaussian_kde.evaluate_in_batches_on_multiple_gpus).
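The chunking idea itself can be sketched in plain Python; this illustrates the pattern of processing a large input chunk by chunk and combining the partial results, not the actual single_device_batch_processing API, and the memory-bound function is hypothetical:

```python
# Process a large input in fixed-size chunks and combine the partial
# results in a single step, mimicking map_combine-style chunked processing.

def process_chunk(values):
    # hypothetical memory-bound computation on one chunk
    return sum(v * v for v in values)

def process_in_chunks(values, chunk_size):
    partial_results = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        partial_results.append(process_chunk(chunk))
    # combine the per-chunk results
    return sum(partial_results)

data = list(range(10_000))

# chunked processing matches processing everything at once
assert process_in_chunks(data, chunk_size=1_000) == process_chunk(data)
```

On a GPU, each chunk would additionally be transferred to the device before processing and its result transferred back before the next chunk, which is what keeps peak VRAM usage bounded by the chunk size.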
Packaged examples:
- Estimating Pi via Monte Carlo
- Option Pricing in a Stochastic Volatility Model via Monte Carlo
- Numeric evaluation of SymPy array expressions on the GPU
- Kernel Density Estimation - Benchmark
