Rvint
Subroutines which implement RISC-V multiplication, division, base conversion, square root, and other integer algorithms
Install / Use
/learn @benmesander/RvintREADME
rvint

Integer-based mathematical subroutines which implement RISC-V M and B extension functionality on I, and Zmmul instruction sets as well as other useful integer algorithms. There is support for the reduced functionality E instruction set. 32 and 64 bit processors are supported, and there is the beginnings of support for 128 bit processors.
The code can optionally make use of the Zba, Zbb, Zbs, ZiCond, and M to increase the efficiency of routines.
These routines are designed to be both concise and efficient, with a slight bias towards concise when these are in conflict.

Division
On 32-bit processors (RV32I, RV32E)
- 32-bit by 32-bit signed and unsigned division with 32-bit result and remainder.
- Fast 32-bit unsigned division by 3
- Fast 32-bit unsigned division by 5
- Fast 32-bit unsigned division by 6
- Fast 32-bit unsigned division by 7
- Fast 32-bit unsigned division by 9
- Fast 32-bit unsigned division by 10
- Fast 32-bit unsigned division by 11
- Fast 32-bit unsigned division by 12
- Fast 32-bit unsigned division by 13
- Fast 32-bit unsigned division by 100
- Fast 32-bit unsigned division by 1000
- Fast 32-bit signed division by 3
- Fast 32-bit signed division by 5
- Fast 32-bit signed division by 6
- Fast 32-bit signed division by 7
- Fast 32-bit signed division by 9
- Fast 32-bit signed division by 10
- Fast 32-bit signed division by 11
- Fast 32-bit signed division by 12
- Fast 32-bit signed division by 13
- Fast 32-bit signed division by 100
- Fast 32-bit signed division by 1000
On 64-bit processors (RV64I)
- 64-bit by 64-bit signed and unsigned division with 64-bit result and remainder.
- Fast 64-bit unsigned division by 3
- Fast 64-bit unsigned division by 5
- Fast 64-bit unsigned division by 6
- Fast 64-bit unsigned division by 7
- Fast 64-bit unsigned division by 9
- Fast 64-bit unsigned division by 10
- Fast 64-bit unsigned division by 11
- Fast 64-bit unsigned division by 12
- Fast 64-bit unsigned division by 13
- Fast 64-bit unsigned division by 100
- Fast 64-bit unsigned division by 1000
- Fast 64-bit signed division by 3
- Fast 64-bit signed division by 5
- Fast 64-bit signed division by 6
- Fast 64-bit signed division by 7
- Fast 64-bit signed division by 9
- Fast 64-bit signed division by 10
- Fast 64-bit signed division by 11
- Fast 64-bit signed division by 12
- Fast 64-bit signed division by 13
- Fast 64-bit signed division by 100
- Fast 64-bit signed division by 1000
Math Macros
- abs (absolute value) of 32 or 64 bit register.
Multiplication
On 32-bit processors (RV32E, RV32I):
- 32-bit by 32-bit signed and unsigned multiplication with 32-bit result.
- 32-bit by 32-bit signed and unsigned multiplication with 64-bit result.
on 64-bit processors (RV64i):
- 32-bit by 32-bit signed and unsigned multiplication with 64-bit result.
- 64-bit by 64-bit signed and unsigned multiplication with 64-bit result.
- 64-bit by 64-bit signed and unsigned multiplication with 128-bit result.
Multiplication Macros
- Multiply 32-bit or 64-bit register by 3.
- Multiply 32-bit or 64-bit register by 5.
- Multiply 32-bit or 64-bit register by 6.
- Multiply 32-bit or 64-bit register by 9.
- Multiply 32-bit or 64-bit register by 10.
- Multiply 32-bit or 64-bit register by 11.
- Multiply 32-bit or 64-bit register by 12.
- Multiply 32-bit or 64-bit register by 13.
- Multiply 32-bit or 64-bit register by 100.
- Multiply 32-bit or 64-bit register by 1000.
Base Conversions & I/O Operations
These operations support 32-bit numbers on 32-bit architectures and 64-bit numbers on 64-bit architectures. All routines are RV32E, RV32I, and RV64I compatible.
- ASCII binary to binary.
- ASCII unsigned decimal to binary.
- ASCII signed decimal to two's complement binary.
- ASCII hexadecimal to binary.
- binary to ASCII binary.
- two's complement binary to ASCII signed decimal.
- unsigned binary to unsigned ASCII decimal (also RV128 compatible)
- binary to ASCII hexadecimal.
Square Root (RV32E, RV32I, and RV64I compatible)
- 32-bit integer square root on 32-bit processors.
- 64-bit integer square root on 64-bit processors.
Greatest Common Divisor (RV32E, RV32I, and RV64I compatible)
- 32-bit GCD of two unsigned 32-bit numbers on 32-bit processors.
- 64-bit GCD of two unsigned 64-bit numbers on 64-bit processors.
Least Common Multiple (RV32E, RV32I, and RV64I compatible)
- 32-bit LCM of two unsigned 32-bit numbers on 32-bit processors.
- 64-bit LCM of two unsigned 64-bit numbers on 64-bit processors.
Bit Operations (RV32E, RV32I, and RV64I compatible)
- Count leading zeroes in 32-bit number on 32-bit processors.
- Count leading zeroes in 64-bit number on 64-bit processors.
- Count trailing zeroes in 32-bit number on 32-bit processors.
- Count trailing zeroes in 64-bit number on 64-bit processors.
Building
clang, lld, and make are assumed. I'm currently using clang 18. The TARGET, ARCH, and ABI should be edited in the Makefile, and the CPU_BITS constant should be set to match in config.s HAS_... flags should be set for available extensions.
A static library, librvint.a, will be created to link against.
Only certain routines support RV32E. The routines that do not currently give compile errors; this will be fixed in the future. RV64E is not a focus as I am unaware of any real world implementations. The Makefile has a CH32V003 target that is an RV32EC_zicsr target.
Tests
The test programs assume Linux syscalls are available. In particular syscall 64 is write and 93 is exit.
Running
- I use this emulator to run RV32I code: https://riscv-programming.org/ale/
- I'm running RV64 code on a scaleway elastic metal cloud instance: https://labs.scaleway.com/en/em-rv1/
API
Bit manipulation
These routines implement functionality in the ZBB ISA extension for CPUs which do not have this available. Despite being efficiently implemented (they run in O(log n) time), they are heavyweight enough that they are not useful in places like inner loops in division routines.
bits_ctz
Count number of trailing zeroes on CPUs with no ZBB extension. O(log n). RV32E/RV32I/RV64I compatible.
| Configuration | Cycles (32) | Cycles (64) | |--------------|-------------|-------------| | Best Case | 3 | 3 | | Average Case | ~21 | ~25 | | Worst Case | 23 | 27 |
Input
a0 = number
Output
a0 = count of trailing zeroes
bits_clz
Count number of leading zeroes on CPUs with no ZBB extension. O(log n). RV32E/RV32I/RV64I compatible.
| Configuration | Cycles (32) | Cycles (64) | |--------------|-------------|-------------| | Best Case | 3 | 3 | | Average Case | ~21 | ~25 | | Worst Case | 23 | 27 |
Input
a0 = number
Output
a0 = count of leading zeroes
Division
Signed and unsigned division for CPUs without dividers.
The initial implementation of divremu came from the division routine in the rvmon monitor. I believe this code originally came from Bruce Hout on reddit. It has been heavily extended to minimize the number of cycles by using ISA extensions and restructuring the code by selectively unrolling parts. The divrem routine is a wrapper which enables signed division.
divremu
Unsigned integer division for CPUs without M extension. RV32E/RV32I/RV64I compatible. Restoring division algorithm. Available in ROLLED (compact) and UNROLLED (faster) versions (this is selected in config.s with the DIVREMU_UNROLLED flag). Worst-case performance:
| Configuration | Cycles (32) | Cycles (64) | |----------------------------|-------------|-------------| | Base ISA (Rolled) | 455 | 903 | | Base ISA (Unrolled) | 351 | 695 | | With Extensions (Rolled) | 391 | 775 | | With Extensions (Unrolled) | 287 | 567 |
Input
a0 = dividend a1 = divisor
Output
a0 = quotient a1 = remainder
divrem
Signed integer division that rounds towards zero. RV32E/RV32I/RV64I compatible restoring division algorithm. Uses divremu, so performance depends upon whether that routine is used in ROLLED or UNROLLED form. Worst case performance:
| Configuration | Cycles (32) | Cycles (64) | |----------------------------|-------------|-------------| | Base ISA (Rolled) | 496 | 944 | | Base ISA (Unrolled) | 392 | 736 | | With Extensions (Rolled) | 430 | 814 | | With Extensions (Unrolled) | 326 | 606 |
Input
a0 = dividend a1 = divisor
Output
a0 = quotient a1 = remainder
The following division routines are branchless, straight line code that execute in a fixed number of cycles for all inputs, and thus are suitable for cryptographic applications.
Initial implementation was done from Hacker's Delight, 2nd edition. The algorithms were extended to 64 bits, and agressively optimized using the peculiar features of the RISC-V ISA to minimize the cycle count. The algorithmic approach is to use a series expansion to estimate the quotient, then an estimated remainder is calculated, and the quotient is corrected.
div3u
Unsigned integer divide by 3. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) | |-----------------|-------------|-------------| | Base ISA | 20 | 22 | | With Extensions | 17 | 19 |
Input
a0 = dividend
Output
a0 = quotient
div5u
Unsigned integer divide by 5. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) | |-----------------|-------------|-------------| | Base ISA | 15 | 17 | | With Extensions | 14 | 16 |
Input
a0 = d
