Chia VDF

Building a wheel

Compiling chiavdf requires cmake, boost and GMP/MPIR.

python3 -m venv venv
source venv/bin/activate

pip install wheel setuptools_scm pybind11
pip wheel .

The primary build process for this repository uses GitHub Actions to build binary wheels for macOS, Linux (x64 and aarch64), and Windows and to publish them, together with a source distribution, on PyPI. See .github/workflows/build.yml. CMake uses FetchContent to download pybind11, and cibuildwheel manages the build. The published package can then be installed with pip install chiavdf.

Building Timelord and related binaries

In addition to building the required binary and source wheels for Windows, MacOS and Linux, chiavdf can be used to compile vdf_client and vdf_bench. vdf_client is the core VDF process that completes the Proof of Time submitted to it by the Timelord. The repo also includes a benchmarking tool to get a sense of the iterations per second of a given CPU called vdf_bench. Try ./vdf_bench square_asm 250000 for an ips estimate on x86/x64 (phased/asm pipeline). On non-x86 architectures, use ./vdf_bench square 250000 (NUDUPL). Set CHIAVDF_LOG_AVX=1 to emit AVX feature detection logs during startup.

For direct CMake builds, the following options are available:

BUILD_VDF_CLIENT - build vdf_client
BUILD_VDF_BENCH - build vdf_bench
BUILD_VDF_TESTS - build test binaries (1weso_test, 2weso_test, prover_test) and CTest/GoogleTest targets (for example vdf_client_session_test)
BUILD_HW_TOOLS - build hardware timelord tools
ENABLE_GNU_ASM - enable GNU-style asm pipeline on x86/x64 (enabled by default)
GENERATE_ASM_TRACKING_DATA - enable track_asm() instrumentation in generated asm (off by default to avoid hot-loop overhead)

Example:

cmake -S src -B build \
  -DBUILD_PYTHON=OFF \
  -DBUILD_CHIAVDFC=OFF \
  -DBUILD_VDF_CLIENT=ON \
  -DBUILD_VDF_BENCH=ON \
  -DBUILD_VDF_TESTS=ON
cmake --build build --target vdf_client vdf_bench 1weso_test 2weso_test prover_test vdf_client_session_test

For the legacy setup.py + Makefile.vdf-client flow (used by wheel hooks), you can control native binary builds with environment variables:

BUILD_VDF_CLIENT=Y to include vdf_client (and related test binaries)
BUILD_VDF_BENCH=Y to include vdf_bench

For direct CMake builds, use -DBUILD_* flags instead of these environment variables.

AVX runtime flags:

CHIAVDF_LOG_AVX=1: emit AVX detection logs at startup
CHIA_DISABLE_AVX2=1: disable AVX2 path even when supported
CHIA_FORCE_AVX2=1: force AVX2 path
CHIA_DISABLE_AVX512_IFMA=1: disable AVX-512 IFMA path
CHIA_ENABLE_AVX512_IFMA=1: enable AVX-512 IFMA path when CPUID support is present
CHIA_FORCE_AVX512_IFMA=1: force AVX-512 IFMA path
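
As a rough illustration of how the runtime flags above can be combined with a benchmark run, the following Python sketch assembles a command line and environment for vdf_bench. The helper name vdf_bench_command, its parameters, and the ./vdf_bench path are assumptions made for this example; only the environment variable names and the square_asm 250000 invocation come from this README.

```python
import os


def vdf_bench_command(iterations=250000, log_avx=True, disable_avx2=False,
                      binary="./vdf_bench"):
    """Return (cmd, env) for a vdf_bench run; hypothetical helper, not part of chiavdf."""
    env = dict(os.environ)  # copy so the caller's environment is left untouched
    if log_avx:
        env["CHIAVDF_LOG_AVX"] = "1"    # emit AVX detection logs at startup
    if disable_avx2:
        env["CHIA_DISABLE_AVX2"] = "1"  # disable the AVX2 path even when supported
    cmd = [binary, "square_asm", str(iterations)]
    return cmd, env


cmd, env = vdf_bench_command(disable_avx2=True)
```

Passing the result to subprocess.run(cmd, env=env, check=True) would launch the benchmark, assuming vdf_bench has been built and sits in the current directory.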

This is currently automated via pip in the install-timelord.sh script in the chia-blockchain repository which depends on this repository.

If you're running a timelord, the following tests are available, depending on which type of timelord you are running:

./1weso_test, in case you're running in sanitizer_mode.

./2weso_test, in case you're running a timelord that extends the chain and you're running the slow algorithm.

./prover_test, in case you're running a timelord that extends the chain and you're running the fast algorithm.

Those tests simulate the vdf_client and verify the produced proofs for correctness.

Note: ./prover_test defaults to a long soak/stress run. Set CHIAVDF_PROVER_TEST_FAST=1 to run a short, CI-friendly correctness check.

Regression tests for specific bugs are now added with GoogleTest and run via CTest. Example:

cmake -S src -B build \
  -DBUILD_PYTHON=OFF \
  -DBUILD_CHIAVDFC=OFF \
  -DBUILD_VDF_CLIENT=OFF \
  -DBUILD_VDF_BENCH=OFF \
  -DBUILD_VDF_TESTS=ON \
  -DBUILD_HW_TOOLS=OFF
cmake --build build --target vdf_client_session_test
ctest --test-dir build --output-on-failure -R '^regression\.'

Testing matrix

Binary integration tests (existing): 1weso_test, 2weso_test, prover_test; these simulate vdf_client and validate proof correctness.
Regression tests (new): GoogleTest targets executed via CTest (for example vdf_client_session_test, typically filtered with ctest -R '^regression\.').
Hardware tests: standalone binaries such as hw_test and emu_hw_test described in README_ASIC.md.

Fuzzing

Fuzz targets live under rust_bindings/fuzz. The prove target includes an iteration cap to avoid out-of-memory conditions in CI. If you want deeper iteration coverage, raise the cap in rust_bindings/fuzz/fuzz_targets/prove.rs after validating memory usage and exec/s on your runner.

Contributing and workflow

Contributions are welcome and more details are available in chia-blockchain's CONTRIBUTING.md.

The master branch is the currently released latest version on PyPI. Note that at times chiavdf will be ahead of the release version that chia-blockchain requires in its master/release version, in preparation for a new chia-blockchain release. Please branch or fork master and then create a pull request to the master branch. Linear merging is enforced on master and merging requires a completed review. PRs will kick off a CI build and analysis of chiavdf at lgtm.com. Please make sure your build is passing and that it does not increase alerts at lgtm.

Background from prior VDF competitions

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Our VDF construction is described in classgroup.pdf. The implementation details about the squaring and proving phases are described below.

Main VDF Loop

The main VDF loop produces repeated squarings of the generator form (i.e. calculates y(n) = g^(2^n)) as fast as possible, until the program is interrupted. Sundersoft's entry from Chia's 2nd VDF contest is used, together with the fast reducer used in Pulmark's entry. This approach is described below:
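
To make the quantity y(n) = g^(2^n) concrete, here is a minimal Python sketch of repeated squaring of a binary quadratic form. It uses the textbook duplication formula followed by a plain reduction, not NUDUPL and not the optimized code in this repository. The function names, the tiny discriminant -23, and the assumption gcd(a, b) = 1 (which holds for reduced forms of a negative prime discriminant) are choices made for this example only. It needs Python 3.8+ for pow with a negative exponent.

```python
def reduce_form(a, b, c):
    """Return the reduced form equivalent to the positive definite form (a, b, c)."""
    while True:
        # Normalize: shift b into the interval (-a, a] without changing a.
        r = (a - b) // (2 * a)
        b, c = b + 2 * r * a, a * r * r + b * r + c
        if a > c or (a == c and b < 0):
            # Swap the outer coefficients; a strictly shrinks whenever a > c.
            a, b, c = c, -b, a
        else:
            return a, b, c


def square_form(form, discriminant):
    """Square a form (a, b, c) with gcd(a, b) == 1 using textbook duplication."""
    a, b, c = form
    # Solve b * mu = c (mod a); trivial when a == 1.
    mu = (c * pow(b % a, -1, a)) % a if a > 1 else 0
    new_a = a * a
    new_b = b - 2 * a * mu
    new_c = (new_b * new_b - discriminant) // (4 * new_a)
    return reduce_form(new_a, new_b, new_c)


def repeated_squarings(generator, discriminant, n):
    """Compute g^(2^n) by squaring n times."""
    form = generator
    for _ in range(n):
        form = square_form(form, discriminant)
    return form


D = -23                    # toy discriminant; real ones are far larger
g = (2, 1, (1 - D) // 8)   # generator form (2, 1, 3)
```

With this toy discriminant the class group has order 3, so square_form(g, D) gives (2, -1, 3) and squaring once more returns to (2, 1, 3). The real loop works on discriminants of hundreds or thousands of bits, where each squaring is dominated by the GCD described below.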

The NUDUPL algorithm is used. The equations are based on cryptoslava's equations from the 1st contest. They were modified slightly to increase the level of parallelism.

The GCD is a custom implementation with scalar integers. There are two base cases: one uses a lookup table with continued fractions and the other uses the Euclidean algorithm with a division table. The division table algorithm is slightly faster even though it has about 2x as many iterations.

After the base case, there is a 128 bit GCD that generates 64 bit cofactor matrices with Lehmer's algorithm. This is required to make the long integer multiplications efficient (Flint's implementation doesn't do this).
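
The idea behind Lehmer's algorithm can be sketched in Python as follows. This is the textbook version (Knuth's Algorithm L), which derives a small cofactor matrix from the leading bits of the operands and then applies it to the full-size integers in one step. It is not the 128-bit routine used in this repository; the function name and the word_bits parameter are assumptions for illustration.

```python
def lehmer_gcd(u, v, word_bits=64):
    """GCD via textbook Lehmer steps; illustrative only, not chiavdf's code."""
    if u < v:
        u, v = v, u
    while v >= (1 << word_bits):
        # Take the leading word_bits bits of u and the aligned bits of v.
        shift = u.bit_length() - word_bits
        uh, vh = u >> shift, v >> shift
        A, B, C, D = 1, 0, 0, 1  # cofactor matrix, starts as the identity
        while True:
            if vh + C == 0 or vh + D == 0:
                break
            q = (uh + A) // (vh + C)
            if q != (uh + B) // (vh + D):
                break  # leading bits no longer determine the quotient
            A, B, C, D = C, D, A - q * C, B - q * D
            uh, vh = vh, uh - q * vh
        if B == 0:
            # No progress from the leading bits: fall back to one full division.
            u, v = v, u % v
        else:
            # Apply the small matrix to the long integers in a single step.
            u, v = A * u + B * v, C * u + D * v
    # Base case: ordinary Euclid on word-sized values.
    while v:
        u, v = v, u % v
    return u
```

The point of the matrix is that several Euclidean steps are folded into four word-by-long multiplications, instead of one long division per step.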

The GCD also implements Flint's partial xgcd function, but the output is slightly different. This implementation will always return an A value which is > the threshold and a B value which is <= the threshold. For a normal GCD, the threshold is 0, B is 0, and A is the GCD. Also the interfaces are slightly different.
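
The threshold contract described above can be illustrated with a small Python sketch. It is a plain Euclidean loop that stops early, written only to show the stated postcondition (A > threshold, B <= threshold); the function name, argument order, and returned cofactor layout are assumptions and do not mirror the actual interface.

```python
def partial_xgcd(a, b, threshold):
    """Run Euclid on (a, b) until the smaller value is <= threshold.

    Assumes a >= b >= 0 and a > threshold. Returns (A, B, (u0, v0, u1, v1))
    with A = u0*a + v0*b and B = u1*a + v1*b for the original inputs.
    """
    u0, v0, u1, v1 = 1, 0, 0, 1  # cofactor matrix, starts as the identity
    while b > threshold:
        q = a // b
        a, b = b, a - q * b
        u0, v0, u1, v1 = u1, v1, u0 - q * u1, v0 - q * v1
    return a, b, (u0, v0, u1, v1)


full = partial_xgcd(240, 46, 0)     # threshold 0: an ordinary GCD
partial = partial_xgcd(240, 46, 5)  # stops as soon as B <= 5
```

With threshold 0 the loop runs to completion, so B is 0 and A is the GCD (2 for these inputs). With threshold 5 it stops at A = 6, B = 4.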

Scalar integers are used for the GCD. I don't expect any speedup for the SIMD integers that were used in the last implementation since the GCD only uses 64x1024 multiplications, which are too small and have too high of a carry overhead for the SIMD version to be faster. In either case, most of the time seems to be spent in the base case so it shouldn't matter too much.

If SIMD integers are used with AVX-512, doubles have to be used because the multiplier sizes for doubles are significantly larger than for integers. There is an AVX-512 extension to support larger integer multiplications but no processor implements it yet. It should be possible to do a 50 bit multiply-add into a 100 bit accumulator with 4 fused multiply-adds if the accumulators have a special nonzero initial value and the inputs are scaled before the multiplication. This would make AVX-512 about 2.5x faster than scalar code for 1024x1024 integer multiplications (assuming the scalar code is unrolled and uses ADOX/ADCX/MULX properly, and the CPU can execute this at 1 cycle per iteration which it probably can't).

The GCD is parallelized by calculating the cofactors in a separate slave thread. The master thread will calculate the cofactor matrices and send them to the slave thread. Other calculations are also parallelized.
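
The division of labor between the two threads can be sketched in Python with a queue. The master thread runs the Euclidean steps and emits one 2x2 matrix per step, while the slave thread multiplies those matrices into the running cofactors. This is only a structural illustration under simplifying assumptions (one quotient per matrix, a standard library queue); the function name is hypothetical and the real implementation uses its own synchronization.

```python
import queue
import threading


def gcd_with_cofactor_thread(a, b):
    """Return (gcd, (u, v)) with u*a + v*b == gcd; cofactors come from a worker thread."""
    matrices = queue.Queue()
    result = {}

    def slave():
        u0, v0, u1, v1 = 1, 0, 0, 1  # running cofactor matrix
        while True:
            m = matrices.get()
            if m is None:  # sentinel: the master is done
                break
            m00, m01, m10, m11 = m
            # Left-multiply the running cofactor matrix by the received matrix.
            u0, v0, u1, v1 = (m00 * u0 + m01 * u1, m00 * v0 + m01 * v1,
                              m10 * u0 + m11 * u1, m10 * v0 + m11 * v1)
        result["cofactors"] = (u0, v0)

    worker = threading.Thread(target=slave)
    worker.start()

    x, y = a, b
    while y:
        q = x // y
        matrices.put((0, 1, 1, -q))  # encodes (x, y) -> (y, x - q*y)
        x, y = y, x - q * y
    matrices.put(None)
    worker.join()
    return x, result["cofactors"]


g, (u, v) = gcd_with_cofactor_thread(240, 46)
```

Here g is 2 and u*240 + v*46 equals g. The master never waits for the cofactors while it is reducing, which is the property the design above relies on.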

The VDF implementation from the first contest is still used as a fallback and is called about once every 5000 iterations. The GCD will encounter large quotients about this often and these are no
