TinyFive
TinyFive is a lightweight RISC-V emulator and assembler written in Python with neural network examples
<a href="https://colab.research.google.com/github/OpenMachine-ai/tinyfive/blob/main/misc/colab.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Colab" height="20"> </a>
TinyFive is a lightweight RISC-V emulator and assembler written entirely in Python:
- TinyFive brings the power of Python and NumPy to assembly code.
- Useful for running neural networks on RISC-V: Simulate your RISC-V assembly code along with a neural network in Keras or PyTorch (and without relying on RISC-V toolchains).
- Custom instructions can be added for easy HW/SW codesign in Python (without C++ and compiler toolchains).
- If you want to learn how RISC-V works, TinyFive lets you play with instructions and assembly code in this colab.
- TinyFive might also be useful for ML scientists who use ML/RL for compiler optimizations (see e.g. CompilerGym) or who want to replace compiler toolchains with AI.
- Can be very fast if you only use the upper-case instructions defined in the first ~200 lines of machine.py.
- Fewer than 1000 lines of code (w/o tests and examples)
- Uses NumPy for math
Contents
- Installation
- Usage
- Running in colab
- Running without package
- Contribute
- Latest status
- Speed
- Comparison
- References
- Tiny Tech promise
Installation
pip install tinyfive
Usage
TinyFive can be used in the following three ways:
- Option A: Use upper-case instructions such as ADD() and MUL(); see examples 1.1, 1.2, 2.1, and 3.1 below.
- Option B: Use the asm() and exe() functions without branch instructions; see examples 1.3 and 2.2 below.
- Option C: Use the asm() and exe() functions with branch instructions; see examples 2.3, 3.2, and 3.3 below.
For the examples below, import and instantiate a RISC-V machine with at least 4KB of memory as follows:
import numpy as np # needed by the vector examples below
from tinyfive.machine import machine
m = machine(mem_size=4000) # instantiate RISC-V machine with 4KB of memory
Example 1: Multiply two numbers
Example 1.1: Use upper-case instructions (option A) with back-door loading of registers.
m.x[11] = 6 # manually load '6' into register x[11]
m.x[12] = 7 # manually load '7' into register x[12]
m.MUL(10, 11, 12) # x[10] := x[11] * x[12]
print(m.x[10])
# Output: 42
Example 1.2: Same as example 1.1, but now load the data from memory. Specifically, the data values are stored at addresses 0 and 4. Each value is 32 bits wide (i.e. 4 bytes wide), so it occupies 4 addresses in the byte-wide memory.
m.write_i32(6, 0) # manually write '6' into mem[0] (memory @ address 0)
m.write_i32(7, 4) # manually write '7' into mem[4] (memory @ address 4)
m.LW (11, 0, 0) # load register x[11] from mem[0 + 0]
m.LW (12, 4, 0) # load register x[12] from mem[4 + 0]
m.MUL(10, 11, 12) # x[10] := x[11] * x[12]
print(m.x[10])
# Output: 42
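To make the byte addressing concrete, the sketch below models a byte-wide memory with a plain NumPy array and shows how two 32-bit values land at addresses 0 and 4. It assumes little-endian byte order, which RISC-V uses. The array mem here is a hypothetical stand-in for illustration only and does not claim to mirror TinyFive's internal implementation.

```python
import numpy as np

# Hypothetical stand-in for a byte-wide memory: one uint8 per address.
mem = np.zeros(8, dtype=np.uint8)

# Lay out the 32-bit values 6 and 7 as little-endian bytes.
values = np.array([6, 7], dtype='<i4')
mem[0:8] = values.view(np.uint8)

print(mem[0:4])  # [6 0 0 0]  (the four bytes of the word at address 0)
print(mem[4:8])  # [7 0 0 0]  (the four bytes of the word at address 4)

# Reading a word back means reassembling 4 consecutive bytes.
word_at_4 = int(mem[4:8].view('<i4')[0])
print(word_at_4)  # 7
```

This is why the second value sits at address 4 rather than address 1: the first value already occupies addresses 0 through 3.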
Example 1.3: Same as example 1.2, but now use asm() and exe() (option B). The assembler function asm() takes an instruction, converts it into machine code, and stores it in memory at address m.pc. Once the entire assembly program is written into memory mem[], the exe() function (aka ISS, instruction set simulator) can execute the machine code stored in memory.
m.write_i32(6, 0) # manually write '6' into mem[0] (memory @ address 0)
m.write_i32(7, 4) # manually write '7' into mem[4] (memory @ address 4)
# store assembly program in mem[] starting at address 4*20
m.pc = 4*20
m.asm('lw', 11, 0, 0) # load register x[11] from mem[0 + 0]
m.asm('lw', 12, 4, 0) # load register x[12] from mem[4 + 0]
m.asm('mul', 10, 11, 12) # x[10] := x[11] * x[12]
# execute program from address 4*20: execute 3 instructions and then stop
m.exe(start=4*20, instructions=3)
print(m.x[10])
# Output: 42
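As an illustration of what converting an instruction into machine code involves, the sketch below packs the mul instruction from example 1.3 into a 32-bit word using the standard RISC-V R-type field layout. The helper name encode_r_type is hypothetical and is not part of TinyFive; the real asm() function may be organized quite differently.

```python
def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack the six R-type fields into one 32-bit instruction word."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
            (funct3 << 12) | (rd << 7) | opcode)

# mul rd, rs1, rs2 uses opcode 0b0110011, funct3 0b000, funct7 0b0000001.
# Here rd = x[10], rs1 = x[11], rs2 = x[12], as in example 1.3.
word = encode_r_type(funct7=0b0000001, rs2=12, rs1=11, funct3=0b000,
                     rd=10, opcode=0b0110011)
print(f'{word:#010x}')  # 0x02c58533
```

Because every instruction is 4 bytes wide, the three instructions of example 1.3 occupy byte addresses 80 to 91 when the program starts at address 4*20.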
Example 2: Add two vectors
We are using the following memory map for adding two 8-element vectors res[] := a[] + b[], where each vector element is 32 bits wide (i.e. each element occupies 4 byte-addresses in memory).
| Byte address | Contents |
| ------------ | -------- |
| 0 .. 4*7 | a-vector: a[0] is at address 0, a[7] is at address 4*7 |
| 4*8 .. 4*15 | b-vector: b[0] is at address 4*8, b[7] is at address 4*15 |
| 4*16 .. 4*23 | result-vector: res[0] is at address 4*16, res[7] is at address 4*23 |
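The memory map above boils down to one address calculation: base address of the vector plus 4 times the element index. A minimal sketch of that calculation follows; the names BASE and element_address are hypothetical helpers introduced here and are not part of TinyFive.

```python
WORD_BYTES = 4  # each vector element is 32 bits wide

# Base byte addresses taken from the memory map above.
BASE = {'a': 0, 'b': 4 * 8, 'res': 4 * 16}

def element_address(vector, i):
    """Byte address of element i (0..7) of the named vector."""
    return BASE[vector] + WORD_BYTES * i

print(element_address('a', 7))    # 28
print(element_address('b', 0))    # 32
print(element_address('res', 7))  # 92
```

The examples below use the same arithmetic inline, for instance 4*(i+8) for b[i] and 4*(i+16) for res[i].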
Example 2.1: Use upper-case instructions (option A) with a Python for-loop.
# generate 8-element vectors a[] and b[] and store them in memory
a = np.random.randint(100, size=8)
b = np.random.randint(100, size=8)
m.write_i32_vec(a, 0) # write vector a[] to mem[0]
m.write_i32_vec(b, 4*8) # write vector b[] to mem[4*8]
# pseudo-assembly for adding vectors a[] and b[] using Python for-loop
for i in range(8):
  m.LW (11, 4*i, 0) # load x[11] with a[i] from mem[4*i + 0]
  m.LW (12, 4*(i+8), 0) # load x[12] with b[i] from mem[4*(i+8) + 0]
  m.ADD(10, 11, 12) # x[10] := x[11] + x[12]
  m.SW (10, 4*(i+16), 0) # store results in mem[], starting at address 4*16
# compare results against golden reference
res = m.read_i32_vec(4*16, size=8) # read result vector from address 4*16
ref = a + b # golden reference: simply add a[] + b[]
print(res - ref) # print difference (should be all-zero)
# Output: [0 0 0 0 0 0 0 0]
Example 2.2: Same as example 2.1, but now use asm() and exe() functions without branch instructions (option B).
# generate 8-element vectors a[] and b[] and store them in memory
a = np.random.randint(100, size=8)
b = np.random.randint(100, size=8)
m.write_i32_vec(a, 0) # write vector a[] to mem[0]
m.write_i32_vec(b, 4*8) # write vector b[] to mem[4*8]
# store assembly program in mem[] starting at address 4*48
m.pc = 4*48
for i in range(8):
  m.asm('lw', 11, 4*i, 0) # load x[11] with a[i] from mem[4*i + 0]
  m.asm('lw', 12, 4*(i+8), 0) # load x[12] with b[i] from mem[4*(i+8) + 0]
  m.asm('add', 10, 11, 12) # x[10] := x[11] + x[12]
  m.asm('sw', 10, 4*(i+16), 0) # store results in mem[], starting at address 4*16
# execute program from address 4*48: execute 8*4 instructions and then stop
m.exe(start=4*48, instructions=8*4)
# compare results against golden reference
res = m.read_i32_vec(4*16, size=8) # read result vector from address 4*16
ref = a + b # golden reference: simply add a[] + b[]
print(res - ref) # print difference (should be all-zero)
# Output: [0 0 0 0 0 0 0 0]
Example 2.3: Same as example 2.2, but now use asm() and exe() functions with branch instructions (option C). The lbl() function defines labels, which are symbolic names that represent memory addresses. These labels improve the readability of branch instructions and mark the start and end of the assembly code executed by the exe() function.
# generate 8-element vectors a[] and b[] and store them in memory
a = np.random.randint(100, size=8)
b = np.random.randint(100, size=8)
m.write_i32_vec(a, 0) # write vector a[] to mem[0]
m.write_i32_vec(b, 4*8) # write vector b[] to mem[4*8]
# store assembly program starting at address 4*48
m.pc = 4*48
# x[13] is the loop-variable that is incremented by 4: 0, 4, .., 28
# x[14] is the constant 28+4 = 32 for detecting the end of the for-loop
m.lbl('start') # define label 'start'
m.asm('add', 13, 0, 0) # x[13] := x[0] + x[0] = 0 (because x[0] is always 0)
m.asm('addi', 14, 0, 32) # x[14] := x[0] + 32 = 32 (because x[0] is always 0)
m.lbl('loop') # label 'loop'
m.asm('lw', 11, 0, 13) # load x[11] with a[] from mem[0 + x[13]]
m.asm('lw', 12, 4*8, 13) # load x[12] with b[] from mem[4*8 + x[13]]
m.asm('add', 10, 11, 12) # x[10] := x[11] + x[12]
m.asm('sw', 10, 4*16, 13) # store x[10] in mem[4*16 + x[13]]
m.asm('addi', 13, 13, 4) # x[13] := x[13] + 4 (increment x[13] by 4)
m.asm('bne', 13, 14, 'loop') # branch to 'loop' if x[13] != x[14]
m.lbl('end') # label 'end'
# execute program: start at label 'start', stop when label 'end' is reached
m.exe(start='start', end='end')
# compare results against golden reference
res = m.read_i32_vec(4*16, size=8) # read result vector from address 4*16
ref = a + b # golden reference: simply add a[] + b[]
print(res - ref) # print difference (should be all-zero)
# Output: [0 0 0 0 0 0 0 0]
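To illustrate how labels stand for memory addresses in example 2.3, the sketch below walks through the same instruction sequence, records the address at which each label is defined, and computes the relative offset that the bne instruction needs to reach 'loop'. This is plain Python bookkeeping under the assumption that every instruction occupies 4 bytes; the variables labels and program are hypothetical and do not describe how TinyFive stores labels internally.

```python
INSTR_BYTES = 4
pc = 4 * 48          # program start address, as in example 2.3
labels = {}

# The program as ('label', name) markers and ('instr', mnemonic) entries.
program = [('label', 'start'), ('instr', 'add'), ('instr', 'addi'),
           ('label', 'loop'), ('instr', 'lw'), ('instr', 'lw'),
           ('instr', 'add'), ('instr', 'sw'), ('instr', 'addi'),
           ('instr', 'bne'), ('label', 'end')]

bne_addr = None
for kind, name in program:
    if kind == 'label':
        labels[name] = pc        # a label records the current address
    else:
        if name == 'bne':
            bne_addr = pc        # remember where the branch itself lives
        pc += INSTR_BYTES        # every instruction occupies 4 bytes

print(labels)                     # {'start': 192, 'loop': 200, 'end': 224}
print(labels['loop'] - bne_addr)  # -20
```

RISC-V branch targets are encoded relative to the address of the branch instruction, so jumping back to 'loop' corresponds to an offset of -20 bytes, i.e. five instructions.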
A slightly more efficient implementation of example 2.3 would decrement the loop variable x[13] (instead of incrementing it) so that the branch instruction compares against x[0] = 0 (instead of the constant stored in x[14]), which frees up register x[14] and reduces the total number of instructions by 1.
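The control flow of such a decrementing loop can be sketched in plain Python without TinyFive, with each step annotated by the RISC-V instruction it stands for. The word-indexed array mem and the counter executed are hypothetical stand-ins used only to check the result and count instructions; this is one possible arrangement, not necessarily the one the author has in mind.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(100, size=8)
b = rng.integers(100, size=8)

# Hypothetical word-indexed stand-in for mem[]: a[] | b[] | res[]
mem = np.zeros(24, dtype=np.int64)
mem[0:8], mem[8:16] = a, b

executed = 0
x13 = 32                              # addi x13, x0, 32
executed += 1
while True:
    x13 -= 4                          # addi x13, x13, -4
    x11 = mem[(0 + x13) // 4]         # lw   x11, 0(x13)
    x12 = mem[(4*8 + x13) // 4]       # lw   x12, 4*8(x13)
    x10 = x11 + x12                   # add  x10, x11, x12
    mem[(4*16 + x13) // 4] = x10      # sw   x10, 4*16(x13)
    executed += 6                     # the five steps above plus the branch
    if x13 == 0:                      # bne  x13, x0, loop (falls through at 0)
        break

res = mem[16:24]
print(np.array_equal(res, a + b))     # True
print(executed)                       # 49
```

The loop now walks the vectors from element 7 down to element 0. It executes 1 + 8*6 = 49 instructions, compared with 2 + 8*6 = 50 for the incrementing version in example 2.3.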
Use print_perf() to analyze performance and dump_state() to print out the current values of the register files and the program counter (PC) as follows:
