rvint

rvint3

Integer-based mathematical subroutines which implement RISC-V M and B extension functionality on I, and Zmmul instruction sets as well as other useful integer algorithms. There is support for the reduced functionality E instruction set. 32 and 64 bit processors are supported, and there is the beginnings of support for 128 bit processors.

The code can optionally make use of the Zba, Zbb, Zbs, ZiCond, and M to increase the efficiency of routines.

These routines are designed to be both concise and efficient, with a slight bias towards concise when these are in conflict.

Screenshot of tests

Division

On 32-bit processors (RV32I, RV32E)

32-bit by 32-bit signed and unsigned division with 32-bit result and remainder.
Fast 32-bit unsigned division by 3
Fast 32-bit unsigned division by 5
Fast 32-bit unsigned division by 6
Fast 32-bit unsigned division by 7
Fast 32-bit unsigned division by 9
Fast 32-bit unsigned division by 10
Fast 32-bit unsigned division by 11
Fast 32-bit unsigned division by 12
Fast 32-bit unsigned division by 13
Fast 32-bit unsigned division by 100
Fast 32-bit unsigned division by 1000
Fast 32-bit signed division by 3
Fast 32-bit signed division by 5
Fast 32-bit signed division by 6
Fast 32-bit signed division by 7
Fast 32-bit signed division by 9
Fast 32-bit signed division by 10
Fast 32-bit signed division by 11
Fast 32-bit signed division by 12
Fast 32-bit signed division by 13
Fast 32-bit signed division by 100
Fast 32-bit signed division by 1000

On 64-bit processors (RV64I)

64-bit by 64-bit signed and unsigned division with 64-bit result and remainder.
Fast 64-bit unsigned division by 3
Fast 64-bit unsigned division by 5
Fast 64-bit unsigned division by 6
Fast 64-bit unsigned division by 7
Fast 64-bit unsigned division by 9
Fast 64-bit unsigned division by 10
Fast 64-bit unsigned division by 11
Fast 64-bit unsigned division by 12
Fast 64-bit unsigned division by 13
Fast 64-bit unsigned division by 100
Fast 64-bit unsigned division by 1000
Fast 64-bit signed division by 3
Fast 64-bit signed division by 5
Fast 64-bit signed division by 6
Fast 64-bit signed division by 7
Fast 64-bit signed division by 9
Fast 64-bit signed division by 10
Fast 64-bit signed division by 11
Fast 64-bit signed division by 12
Fast 64-bit signed division by 13
Fast 64-bit signed division by 100
Fast 64-bit signed division by 1000

Math Macros

abs (absolute value) of 32 or 64 bit register.

Multiplication

On 32-bit processors (RV32E, RV32I):

32-bit by 32-bit signed and unsigned multiplication with 32-bit result.
32-bit by 32-bit signed and unsigned multiplication with 64-bit result.

on 64-bit processors (RV64i):

32-bit by 32-bit signed and unsigned multiplication with 64-bit result.
64-bit by 64-bit signed and unsigned multiplication with 64-bit result.
64-bit by 64-bit signed and unsigned multiplication with 128-bit result.

Multiplication Macros

Multiply 32-bit or 64-bit register by 3.
Multiply 32-bit or 64-bit register by 5.
Multiply 32-bit or 64-bit register by 6.
Multiply 32-bit or 64-bit register by 9.
Multiply 32-bit or 64-bit register by 10.
Multiply 32-bit or 64-bit register by 11.
Multiply 32-bit or 64-bit register by 12.
Multiply 32-bit or 64-bit register by 13.
Multiply 32-bit or 64-bit register by 100.
Multiply 32-bit or 64-bit register by 1000.

Base Conversions & I/O Operations

These operations support 32-bit numbers on 32-bit architectures and 64-bit numbers on 64-bit architectures. All routines are RV32E, RV32I, and RV64I compatible.

ASCII binary to binary.
ASCII unsigned decimal to binary.
ASCII signed decimal to two's complement binary.
ASCII hexadecimal to binary.
binary to ASCII binary.
two's complement binary to ASCII signed decimal.
unsigned binary to unsigned ASCII decimal (also RV128 compatible)
binary to ASCII hexadecimal.

Square Root (RV32E, RV32I, and RV64I compatible)

32-bit integer square root on 32-bit processors.
64-bit integer square root on 64-bit processors.

Greatest Common Divisor (RV32E, RV32I, and RV64I compatible)

32-bit GCD of two unsigned 32-bit numbers on 32-bit processors.
64-bit GCD of two unsigned 64-bit numbers on 64-bit processors.

Least Common Multiple (RV32E, RV32I, and RV64I compatible)

32-bit LCM of two unsigned 32-bit numbers on 32-bit processors.
64-bit LCM of two unsigned 64-bit numbers on 64-bit processors.

Bit Operations (RV32E, RV32I, and RV64I compatible)

Count leading zeroes in 32-bit number on 32-bit processors.
Count leading zeroes in 64-bit number on 64-bit processors.
Count trailing zeroes in 32-bit number on 32-bit processors.
Count trailing zeroes in 64-bit number on 64-bit processors.

Building

clang, lld, and make are assumed. I'm currently using clang 18. The TARGET, ARCH, and ABI should be edited in the Makefile, and the CPU_BITS constant should be set to match in config.s HAS_... flags should be set for available extensions.

A static library, librvint.a, will be created to link against.

Only certain routines support RV32E. The routines that do not currently give compile errors; this will be fixed in the future. RV64E is not a focus as I am unaware of any real world implementations. The Makefile has a CH32V003 target that is an RV32EC_zicsr target.

Tests

The test programs assume Linux syscalls are available. In particular syscall 64 is write and 93 is exit.

Running

I use this emulator to run RV32I code: https://riscv-programming.org/ale/
I'm running RV64 code on a scaleway elastic metal cloud instance: https://labs.scaleway.com/en/em-rv1/

API

Bit manipulation

These routines implement functionality in the ZBB ISA extension for CPUs which do not have this available. Despite being efficiently implemented (they run in O(log n) time), they are heavyweight enough that they are not useful in places like inner loops in division routines.

bits_ctz

Count number of trailing zeroes on CPUs with no ZBB extension. O(log n). RV32E/RV32I/RV64I compatible.

| Configuration | Cycles (32) | Cycles (64) | |--------------|-------------|-------------| | Best Case | 3 | 3 | | Average Case | ~21 | ~25 | | Worst Case | 23 | 27 |

Input

a0 = number

Output

a0 = count of trailing zeroes

bits_clz

Count number of leading zeroes on CPUs with no ZBB extension. O(log n). RV32E/RV32I/RV64I compatible.

| Configuration | Cycles (32) | Cycles (64) | |--------------|-------------|-------------| | Best Case | 3 | 3 | | Average Case | ~21 | ~25 | | Worst Case | 23 | 27 |

Input

a0 = number

Output

a0 = count of leading zeroes

Division

Signed and unsigned division for CPUs without dividers.

The initial implementation of divremu came from the division routine in the rvmon monitor. I believe this code originally came from Bruce Hout on reddit. It has been heavily extended to minimize the number of cycles by using ISA extensions and restructuring the code by selectively unrolling parts. The divrem routine is a wrapper which enables signed division.

divremu

Unsigned integer division for CPUs without M extension. RV32E/RV32I/RV64I compatible. Restoring division algorithm. Available in ROLLED (compact) and UNROLLED (faster) versions (this is selected in config.s with the DIVREMU_UNROLLED flag). Worst-case performance:

| Configuration | Cycles (32) | Cycles (64) | |----------------------------|-------------|-------------| | Base ISA (Rolled) | 455 | 903 | | Base ISA (Unrolled) | 351 | 695 | | With Extensions (Rolled) | 391 | 775 | | With Extensions (Unrolled) | 287 | 567 |

Input

a0 = dividend a1 = divisor

Output

a0 = quotient a1 = remainder

divrem

Signed integer division that rounds towards zero. RV32E/RV32I/RV64I compatible restoring division algorithm. Uses divremu, so performance depends upon whether that routine is used in ROLLED or UNROLLED form. Worst case performance:

| Configuration | Cycles (32) | Cycles (64) | |----------------------------|-------------|-------------| | Base ISA (Rolled) | 496 | 944 | | Base ISA (Unrolled) | 392 | 736 | | With Extensions (Rolled) | 430 | 814 | | With Extensions (Unrolled) | 326 | 606 |

Input

a0 = dividend a1 = divisor

Output

a0 = quotient a1 = remainder

The following division routines are branchless, straight line code that execute in a fixed number of cycles for all inputs, and thus are suitable for cryptographic applications.

Initial implementation was done from Hacker's Delight, 2nd edition. The algorithms were extended to 64 bits, and agressively optimized using the peculiar features of the RISC-V ISA to minimize the cycle count. The algorithmic approach is to use a series expansion to estimate the quotient, then an estimated remainder is calculated, and the quotient is corrected.

div3u

Unsigned integer divide by 3. RV32E/RV32I/RV64I

| Configuration | Cycles (32) | Cycles (64) | |-----------------|-------------|-------------| | Base ISA | 20 | 22 | | With Extensions | 17 | 19 |

Input

a0 = dividend

Output

a0 = quotient

div5u

Unsigned integer divide by 5. RV32E/RV32I/RV64I

| Configuration | Cycles (32) | Cycles (64) | |-----------------|-------------|-------------| | Base ISA | 15 | 17 | | With Extensions | 14 | 16 |

Input

a0 = d

Rvint

Install / Use

README

rvint

Division

On 32-bit processors (RV32I, RV32E)

On 64-bit processors (RV64I)

Math Macros

Multiplication

On 32-bit processors (RV32E, RV32I):

on 64-bit processors (RV64i):

Multiplication Macros

Base Conversions & I/O Operations

Square Root (RV32E, RV32I, and RV64I compatible)

Greatest Common Divisor (RV32E, RV32I, and RV64I compatible)

Least Common Multiple (RV32E, RV32I, and RV64I compatible)

Bit Operations (RV32E, RV32I, and RV64I compatible)

Building

Tests

Running

API

Bit manipulation

bits_ctz

Input

Output

bits_clz

Input

Output

Division

divremu

Input

Output

divrem

Input

Output

div3u

Input

Output

div5u

Input