Corrfunc
⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
|logo|
|Release| |PyPI| |MIT licensed| |GitHub CI| |RTD| |Issues|
|CoreInfra| |FAIRSoft|
|Paper I| |Paper II|
Description
This repo contains a suite of codes to calculate correlation functions and
other clustering statistics for simulated galaxies in a cosmological box (co-moving XYZ)
and on observed galaxies with on-sky positions (RA, DEC, CZ). Read the
documentation on `corrfunc.rtfd.io <http://corrfunc.rtfd.io/>`_.
Why Should You Use it
- Fast: Theory pair-counting is 7x faster than SciPy cKDTree, and at least 2x faster than all existing public codes.
- OpenMP Parallel: All pair-counting codes can be run in parallel (with strong scaling efficiency >~ 95% up to 10 cores).
- Python Extensions: Python extensions allow you to do the compute-heavy bits using C while retaining all of the user-friendliness of Python.
- Weights: All correlation functions now support arbitrary, user-specified weights for individual points.
- Modular: The code is written in a modular fashion and is easily extensible to compute arbitrary clustering statistics.
- Future-proof: As we get access to newer instruction-sets, the codes will get updated to use the latest and greatest CPU features.
If you use the codes for your analysis, please star this repo -- that helps us keep track of the number of users.
Benchmark against Existing Codes
Please see this
`gist <https://gist.github.com/manodeep/cffd9a5d77510e43ccf0>`__ for
some benchmarks with current codes. If you have a pair-counter that you would like to compare, please add in a corresponding function and update the timings.
Installation
Pre-requisites
- ``make >= 3.80``
- OpenMP capable compiler like ``icc``, ``gcc>=4.6`` or ``clang >= 3.7``. If not available, please disable the ``USE_OMP`` option in ``theory.options`` and ``mocks.options``. On an HPC cluster, consult the cluster documentation for how to load a compiler (often ``module load gcc`` or similar). If you are using Corrfunc with Anaconda Python, then ``conda install gcc`` (MAC/linux) should work. On MAC, ``(sudo) port install gcc5`` is also an option.
- ``gsl >= 2.4``. On an HPC cluster, consult the cluster documentation (often ``module load gsl`` will work). With Anaconda Python, use ``conda install -c conda-forge gsl`` (MAC/linux). On MAC, you can use ``(sudo) port install gsl`` if necessary.
- ``python >= 2.7`` or ``python>=3.4`` for compiling the CPython extensions.
- ``numpy>=1.7`` for compiling the CPython extensions.
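Before building, it can help to confirm which of these pre-requisites are visible from your shell. The following is a minimal sketch using only the Python standard library; the function name ``check_prerequisites`` is hypothetical and not part of Corrfunc, and the default tool list (``make``, ``gcc``, ``gsl-config``) is an assumption that you should adapt to your compiler. It only checks that the tools can be found, not that their versions are recent enough.

```python
import shutil
import sys
from importlib import util


def check_prerequisites(tools=("make", "gcc", "gsl-config")):
    """Report which build pre-requisites are visible from this environment.

    Hypothetical helper for illustration; it is not part of Corrfunc.
    """
    report = {}
    # Major.minor version of the interpreter running this script.
    report["python"] = "{}.{}".format(*sys.version_info[:2])
    # True if this interpreter can locate numpy (the module is not imported).
    report["numpy"] = util.find_spec("numpy") is not None
    # Full path of each command-line tool, or None if it is not on PATH.
    for tool in tools:
        report[tool] = shutil.which(tool)
    return report


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print("{:12s} {}".format(name, value))
```

A value of ``None`` for a tool means it was not found on ``PATH``; on a cluster this usually means the corresponding module has not been loaded yet.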
Method 1: Source Installation (Recommended)
::

    $ git clone https://github.com/manodeep/Corrfunc.git
    $ cd Corrfunc
    $ make
    $ make install
    $ python -m pip install . [--user]
    $ make tests  # run the C tests
    $ python -m pip install pytest
    $ python -m pytest  # run the Python tests

Assuming you have ``gcc`` in your ``PATH``, ``make`` and
``make install`` should compile and install the C libraries + Python
extensions within the source directory. If you would like to install the
CPython extensions in your environment, then
``python -m pip install . [--user]`` should be sufficient. If you are primarily
interested in the Python interface, you can condense all of the steps
by using ``python -m pip install . [--user] --install-option="CC=yourcompiler"``
after ``git clone [...]`` and ``cd Corrfunc``.
Compilation Notes
- If Python and/or numpy are not available, then the CPython extensions will not be compiled.
- ``make install`` simply copies files into the ``lib/bin/include`` sub-directories. You do not need ``root`` permissions.
- Default compiler on MAC is set to ``clang``. If you want to specify a different compiler, you will have to call ``make CC=yourcompiler``, ``make install CC=yourcompiler``, ``make tests CC=yourcompiler`` etc. If you want to permanently change the default compiler, then please edit the `common.mk <common.mk>`__ file in the base directory.
- If you are directly using ``python -m pip install . [--user] --install-option="CC=yourcompiler"``, please run a ``make distclean`` beforehand (especially if switching compilers).
- Please note that Corrfunc is compiled with optimizations for the architecture
it is compiled on. That is, it uses ``gcc -march=native`` or similar.
For this reason, please try to compile Corrfunc on the architecture it will
be run on (usually this is only a concern in heterogeneous compute environments,
like an HPC cluster with multiple node types). In many cases, you can
compile on a more capable architecture (e.g. with AVX-512 support) and then
run on a less capable architecture (e.g. with only AVX2), because the
runtime dispatch will select the appropriate kernel. But the non-kernel
elements of Corrfunc may emit AVX-512 instructions due to ``-march=native``.
If an ``Illegal instruction`` error occurs, then you'll need to recompile
on the target architecture.
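To see whether the build node and the run node differ in the way described above, you can compare the instruction sets each one reports. The following is a minimal sketch, assuming an x86 Linux machine where ``/proc/cpuinfo`` contains a ``flags`` line; the helper names ``cpu_flags`` and ``simd_summary`` are hypothetical and not part of Corrfunc.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # The first "flags" entry is enough; all cores normally report the same set.
        if key.strip() == "flags":
            return set(value.split())
    return set()


def simd_summary(flags):
    """Map a few instruction-set names to whether the CPU reports them."""
    return {
        "sse4.2": "sse4_2" in flags,
        "avx": "avx" in flags,
        "avx2": "avx2" in flags,
        "avx512f": "avx512f" in flags,
    }


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # assumption: Linux; other platforms need another tool
    if os.path.exists(path):
        with open(path) as handle:
            print(simd_summary(cpu_flags(handle.read())))
    else:
        print("No /proc/cpuinfo here; use your platform's CPU inspection tool.")
```

If the build node reports ``avx512f`` and the run node does not, an ``Illegal instruction`` error on the run node is a plausible outcome, and recompiling there is the safer choice.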
Installation notes
If compilation went smoothly, please run ``make tests`` to ensure the
code is working correctly. Depending on the hardware and compilation
options, the tests might take more than a few minutes. Note that the
tests are exhaustive and not traditional unit tests.
For Python tests, please run ``python -m pip install pytest`` and ``python -m pytest``
from the Corrfunc root dir.
While we have tried to ensure that the package compiles and runs out of
the box, cross-platform compatibility turns out to be incredibly hard.
If you run into any issues during compilation and you have all of the
pre-requisites, please see the `FAQ <FAQ>`__ or email the `Corrfunc mailing list <mailto:corrfunc@googlegroups.com>`__. Also, feel free to create a new issue
with the ``Installation`` label.
Method 2: pip installation
The Python package is directly installable via ``python -m pip install Corrfunc``. However, in that case you will lose the ability to recompile the code. This is usually fine if you are only using the Python interface and are on a single machine, like a laptop. For usage on a cluster or other environment with multiple CPU architectures, you may find it more useful to use the Source Installation method above in case you need to compile for a different architecture later.
Testing a pip-installed Corrfunc
You can check that a pip-installed Corrfunc is working with:
::

    $ python -m pytest --pyargs Corrfunc

The pip installation does not include all of the test data contained in the main repo,
since it would total over 100 MB and the tests that generate on-the-fly data are similarly
exhaustive. pytest will mark tests where the data files are not available as "skipped".
If you would like to run the data-based tests, please use the Source Installation method.
OpenMP on OSX
--------------
Automatically detecting OpenMP support from the compiler and the runtime is a
bit tricky. If you run into any issues compiling (or running) with OpenMP,
please refer to the `FAQ <FAQ>`__ for potential solutions.
Clustering Measures on simulated galaxies
=========================================
Input data
----------
The input galaxies (or any discrete distribution of points) are derived from a
simulation. For instance, the galaxies could be a result of a Halo Occupation
Distribution (HOD) model, a Subhalo Abundance matching (SHAM) model, a
Semi-Empirical model (SEM), or a Semi-Analytic model (SAM) etc. The input set of
points can also be the dark matter halos, or the dark matter particles from
a cosmological simulation. The input set of points is expected to have
positions specified in Cartesian XYZ.
Types of available clustering statistics
----------------------------------------
All codes that work on cosmological boxes with co-moving positions are
located in the ``theory`` directory. The various clustering measures
are:
1. ``DD`` -- Measures auto/cross-correlations between two boxes.
The boxes do not need to be cubes.
2. ``xi`` -- Measures 3-d auto-correlation in a cubic cosmological box.
Assumes PERIODIC boundary conditions.
3. ``wp`` -- Measures the 2-d projected auto-correlation function in a
cubic cosmological box. Assumes PERIODIC boundary conditions.
4. ``DDrppi`` -- Measures the auto/cross correlation function between
two boxes, binned in projected (``rp``) and line-of-sight (``pi``)
separation. The boxes do not need to be cubes.
5. ``DDsmu`` -- Measures the auto/cross correlation function between
two boxes, binned in separation (``s``) and the cosine of the angle to
the line of sight (``mu``). The boxes do not need to be cubes.
6. ``vpf`` -- Measures the void probability function + counts-in-cells.
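The pair-counting measures in the list above differ mainly in how the separation between two points is binned. The following is a minimal pure-Python sketch of those quantities for a single pair. It is not Corrfunc code: the names ``wrap`` and ``separation`` are hypothetical, and the sketch assumes that the line of sight is the Z axis and that periodic wrapping uses the minimum-image convention in a cubic box.

```python
import math


def wrap(delta, boxsize):
    """Minimum-image separation along one axis of a periodic box."""
    return delta - boxsize * round(delta / boxsize)


def separation(p1, p2, boxsize=None):
    """Split the separation of two XYZ points into r, (rp, pi) and (s, mu).

    Assumes the line of sight is the Z axis. With boxsize=None no
    periodic wrapping is applied.
    """
    dx, dy, dz = (b - a for a, b in zip(p1, p2))
    if boxsize is not None:
        dx, dy, dz = (wrap(d, boxsize) for d in (dx, dy, dz))
    rp = math.hypot(dx, dy)                     # perpendicular to the line of sight
    pi = abs(dz)                                # along the line of sight
    s = math.sqrt(dx * dx + dy * dy + dz * dz)  # full 3-d separation
    mu = pi / s if s > 0 else 0.0               # cosine of the angle to the line of sight
    return {"r": s, "rp": rp, "pi": pi, "s": s, "mu": mu}


if __name__ == "__main__":
    print(separation((0.0, 0.0, 0.0), (3.0, 0.0, 4.0)))
    print(separation((1.0, 1.0, 1.0), (9.0, 1.0, 1.0), boxsize=10.0))
```

For the first pair this gives ``rp = 3``, ``pi = 4``, ``s = 5`` and ``mu = 0.8``. For the second pair the periodic wrap shortens the separation along X from 8 to 2. A pair counter then histograms these quantities over all pairs, which is the step the compiled kernels are optimized for.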
Clustering measures on observed galaxies
========================================
Input data
----------
The input galaxies are typically observed galaxies coming from a large-scale
galaxy survey. In addition, simulated galaxies that have been projected onto the sky
(i.e., where observational systematics have been incorporated and on-sky
positions have been generated) can also be used. We generically refer to both
these kinds of galaxies as "mocks".
The input galaxies are expected to have positions specified in spherical
co-ordinates with at least right ascension (RA) and declination (DEC).
For spatial correlation functions, an approximate "co-moving" distance
(speed of light multiplied by redshift, CZ) is also required.
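To make the role of CZ concrete, the sketch below converts one (RA, DEC, CZ) position into Cartesian XYZ using the low-redshift approximation distance = CZ / H0. This is an illustration only and not Corrfunc's internal conversion: the function name ``radec_cz_to_xyz`` is hypothetical, CZ is assumed to be in km/s, angles are assumed to be in degrees, and ``H0 = 100`` km/s/Mpc is an assumed value that yields distances in Mpc/h.

```python
import math

H0 = 100.0  # km/s/Mpc; assumed value, gives distances in Mpc/h


def radec_cz_to_xyz(ra_deg, dec_deg, cz_kms, h0=H0):
    """Convert (RA, DEC, CZ) to Cartesian XYZ using distance = cz / H0.

    This is the low-redshift approximation only; it ignores the
    cosmology-dependent distance-redshift relation.
    """
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    distance = cz_kms / h0
    x = distance * math.cos(dec) * math.cos(ra)
    y = distance * math.cos(dec) * math.sin(ra)
    z = distance * math.sin(dec)
    return x, y, z


if __name__ == "__main__":
    print(radec_cz_to_xyz(45.0, 30.0, 10000.0))
```

The approximation degrades as redshift grows, which is why the distance above is described as approximate.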
Types of available clustering statistics
----------------------------------------
All codes that work on mock catalogs (RA, DEC, CZ) are located in the
``mocks`` directory. The various clustering measures are:
1. ``DDrppi_mocks`` -- The standard auto/cross correlation between two data
sets. The outputs, DD, DR and RR can be combined using ``wprp`` to
produce the Landy-Szalay estimator for ``wp(rp)``.
2. ``DDsmu_mocks`` -- The standard auto/cross correlation between two data
sets. The outputs, DD, DR and R
