RS5

RV32I[M][A][C][V]Zicntr[_Zicond]_Zicsr_Zihpm[_Zcb][_Zkne][_Xosvm] processor

Tutorials (in Portuguese)

Description

RS5 is a processor that implements the RISC-V 32-bit base integer instruction set (RV32I) alongside the Zicsr extension and the Machine Mode of the RISC-V Privileged Architecture. It is written in the SystemVerilog Hardware Description Language (HDL) and implements the following interface:

This project was designed by the Hardware Development Support Group (GAPH) of the School of Technology, PUCRS, Brazil.

The processor is a 4-stage pipeline, synchronized to the rising edge of the clock. The stages are:

Fetch: Contains the Program Counter (PC) logic, which indexes the instruction memory.
Decode and Operand Fetch: Decodes the instruction, extracting its type, format, operation and register addresses. It also fetches the operands from the register bank and performs data hazard detection. When a hazard is detected, it inserts NOP instructions (bubbles) until the conflict is resolved.
Execute: Performs the given operation on the received operands. It holds an Arithmetic and Logic Unit, a CSR access unit, a Memory Load and Store unit and a Branch unit.
Retire: Performs the write-back of the instruction result, which can be either the result from the execute stage or the data read from memory.
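
The write-back selection performed by the Retire stage can be illustrated with a small behavioural sketch in Python. It is not taken from the RS5 RTL: the function names, the string-based operation encoding, and the assumption that memory returns an aligned 32-bit word from which the byte lane is chosen by the low address bits are all hypothetical. Only the sign/zero-extension rules of the RV32I load instructions are taken from the ISA.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of `value` to a 32-bit unsigned pattern."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & MASK32


def retire_writeback(op: str, exec_result: int, mem_word: int, addr: int = 0) -> int:
    """Pick the value that goes to the register bank (hypothetical model).

    Loads take their data from memory; every other operation takes the
    result produced by the execute stage.
    """
    shift = 8 * (addr & 0x3)  # assumed: byte lane selected by the low address bits
    if op == "lw":
        return mem_word & MASK32
    if op == "lh":
        return sign_extend(mem_word >> shift, 16)
    if op == "lhu":
        return (mem_word >> shift) & 0xFFFF
    if op == "lb":
        return sign_extend(mem_word >> shift, 8)
    if op == "lbu":
        return (mem_word >> shift) & 0xFF
    return exec_result & MASK32
```

For example, `retire_writeback("lb", 0, 0x000080FF, addr=1)` selects byte `0x80` and sign-extends it, while the same call with `"lbu"` zero-extends it.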

> RS5 BLOCK DIAGRAM.

RTL and Processor Organization

This processor organization is an evolution of an Asynchronous RISC-V High-level Functional Model written in the Go language, which can be found in the ARV Go High-level Functional Model repository.

First Stage - Instruction Fetch: This stage is implemented by the Fetch Unit, which contains the Program Counter (PC) logic. The value held in the PC register is used to index the instruction memory. At each clock cycle the PC is updated to the next instruction address (PC+4) or to a branch address, or it keeps the same address when a bubble is inserted because of a data hazard or a memory stall. The jump/branch prediction policy is "never taken". Each instruction that leaves the first stage is linked to a Tag that follows the instruction until its retirement. This Tag is a number that indicates the "flow/context" of that instruction. Every time a jump/branch occurs the tag is incremented, meaning that the instructions fetched from then on belong to a new flow.
Second Stage - Instruction Decode: This stage comprises the Decoder Unit, which generates the control signals based on the instruction object code fetched by the previous stage. It identifies the instruction operation (e.g. addi, bne), which is represented with a one-hot encoding, and it decodes the instruction format (e.g. Immediate, Branch). It also sends to the register bank the read addresses, which are extracted directly from the instruction object code. The object code is also used to extract the immediate operand, based on the instruction format. The instruction format also determines the operands that are sent to the next stage. The Data Hazard Detection mechanism is implemented by this unit. It tracks the destination register (regD) of the instruction that is currently in the execute stage. This register is called the "blocked register", and its value is updated every time an instruction leaves the stage. A data hazard is detected when the instruction being processed by the decode stage has an operand that must be read from the blocked register. This raises a signal called "hazard", which indicates that bubbles must be issued until the data conflict is resolved. This unit also looks for data memory hazards, that is, cases where a read is performed right after a write to memory. The blocked register value is also used to index the register that must receive the write-back data from the retire unit.
Third Stage - Instruction Execution: This stage is implemented by the Execute Unit. It contains an Arithmetic Logic Unit (ALU) responsible for calculations and a Branch Unit that decides whether to branch based on the instruction operation and operands. It also implements the Memory Load and Store mechanism and the CSR access logic. Operations are performed based on a comparison between this unit's tag and the instruction tag: if they mismatch, the instruction is killed and its operation is not performed. A taken branch increments the internal tag, which causes a tag mismatch on the following instructions until an instruction fetched from the new flow arrives with the updated tag.
Fourth Stage - Retire: This is the last stage. It is implemented by the Retire Unit, is responsible for the write-back of the instruction data, and closes the loops. It performs the write-back to the register bank. It also receives the data read from memory and processes it based on the instruction operation, then decides which data is sent to the register bank: either the data from memory or the data from the execute unit.
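
The blocked-register hazard check described for the decode stage can be sketched as a small behavioural model in Python. This is an illustration under stated assumptions, not the RS5 RTL: the class and method names are hypothetical, register x0 is assumed never to cause a hazard (it is hard-wired to zero in RISC-V), and a bubble is assumed to carry no destination register, so the conflict clears once the bubble reaches the execute stage.

```python
class DecodeHazardModel:
    """Hypothetical model of the decode-stage data hazard detection."""

    def __init__(self) -> None:
        # Destination register of the instruction currently in execute.
        # 0 means "nothing blocked" (x0 is never a real destination).
        self.blocked_reg = 0

    def issue(self, rs1: int, rs2: int, rd: int) -> bool:
        """Try to issue one instruction to the execute stage.

        Returns True if the instruction was issued and False if a bubble
        was issued instead because a source operand is still blocked.
        """
        hazard = self.blocked_reg != 0 and self.blocked_reg in (rs1, rs2)
        if hazard:
            # Assumed: the bubble writes no register, so nothing is
            # blocked on the next cycle.
            self.blocked_reg = 0
            return False
        # The issued instruction now occupies execute; block its destination.
        self.blocked_reg = rd
        return True
```

Issuing an instruction that writes x5 and then one that reads x5 makes the second `issue` call return `False` once (one bubble); retrying the same instruction on the next call succeeds.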

The three loops

The control flow of the processor core is governed by three main loops:

The first loop is the instruction context/flow loop, which comprises the first three stages of the processor core. It starts in the first stage with the instruction fetch and its tag association, and it is closed in the third stage by the jump control signals that are sent back to the first stage. This loop is implemented by the Tag system, which manages the context/flow of the instructions and is updated every time a jump/branch occurs.
The second loop comprises the second, third and fourth stages. It is called the Data loop and implements the data write-back to the register bank. It is closed by the Retire Unit and also implements a data forwarding mechanism.
The third loop comprises the data hazard detection mechanism, which is implemented by the Instruction Decode stage and keeps track of the register with a pending write. The loop is closed when the tracked register value is used to address the register that receives the write-back.
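
The first loop (the Tag system) can be illustrated with a minimal behavioural sketch in Python. The class names, the tuple layout and the tag width `TAG_BITS` are assumptions made for illustration only; the source describes the mechanism but not these details.

```python
TAG_BITS = 3  # assumed tag width; the actual width is not stated in this README


class FetchModel:
    """Hypothetical fetch unit: attaches the current tag to each instruction."""

    def __init__(self) -> None:
        self.pc = 0
        self.tag = 0

    def fetch(self) -> tuple:
        item = (self.pc, self.tag)
        self.pc += 4  # "never taken" policy: keep fetching sequentially
        return item

    def redirect(self, target: int) -> None:
        """Jump control signal from execute: new PC, new flow."""
        self.pc = target
        self.tag = (self.tag + 1) % (1 << TAG_BITS)


class ExecuteModel:
    """Hypothetical execute unit: kills instructions from an old flow."""

    def __init__(self) -> None:
        self.tag = 0

    def run(self, instr_tag: int, takes_branch: bool) -> str:
        if instr_tag != self.tag:
            return "killed"  # fetched before the jump; operation not performed
        if takes_branch:
            self.tag = (self.tag + 1) % (1 << TAG_BITS)
            return "branch"
        return "executed"
```

In this model, instructions that were fetched after a taken branch but before the redirect still carry the old tag, so `run` returns `"killed"` for them; the first instruction fetched after `redirect` carries the incremented tag and executes normally.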

Memory Interface

The processor implements the memory interface depicted in the above image. The memory is a true dual-port RAM in which one port is used as a read-only port for instruction fetching and the other port is used for read and write operations.
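
As a rough illustration of this arrangement, the following Python sketch models a memory with a read-only instruction port and a read/write data port. It is not the RAM implementation shipped in the repository: the class name, the 4-bit byte-enable mask, the little-endian word layout and the choice to return the word as it stands after a write are all assumptions.

```python
class DualPortRamModel:
    """Hypothetical behavioural model of a true dual-port RAM."""

    def __init__(self, size_bytes: int) -> None:
        self.mem = bytearray(size_bytes)

    def fetch(self, addr: int) -> int:
        """Instruction port: read-only, returns the aligned 32-bit word."""
        base = addr & ~0x3
        return int.from_bytes(self.mem[base:base + 4], "little")

    def data_access(self, addr: int, write_enable: int = 0, wdata: int = 0) -> int:
        """Data port: optional byte-wise write, then read of the aligned word.

        `write_enable` is an assumed 4-bit mask with one bit per byte lane.
        """
        base = addr & ~0x3
        for lane in range(4):
            if (write_enable >> lane) & 1:
                self.mem[base + lane] = (wdata >> (8 * lane)) & 0xFF
        # Modelling simplification: the read reflects the write just made.
        return int.from_bytes(self.mem[base:base + 4], "little")
```

Both ports see the same storage, so a word written through `data_access` is visible to a later `fetch` at the same address.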

Requirements

The RISC-V toolchain is needed to compile code. Its compiler builds the applications, which are written in C, and generates a binary that serves as the input to the processor simulation. The applications are located in app/.

Installing the toolchain is only needed if you want to compile new applications or change parameters in the provided ones.

A guide and a script for installing the toolchain are provided in the folder tools/riscv-toolchain.

To run the simulation you need an HDL simulator (e.g. XCELIUM, MODELSIM). To simulate a specific application, edit the binary input file in RAM_mem.sv. The testbench and the RAM implementation are located in the /sim folder. Once the desired application is selected and the testbench points to it, you can run the simulation with the HDL simulator.

Applications

This repository provides some of the applications that were used to validate the processor. Their source code is located in the app/ folder, and each application can be built with its own Makefile, which generates the application's output binary.

Coremark

Coremark is a benchmark application developed by EEMBC. It was ported to run on RS5 and can be compiled by running the command "make" inside Coremark's folder, which generates a binary called "coremark.bin". Since our processor has only one thread, we run Coremark for only one iteration.

RISCV Tests

The riscv-tests are the "Berkeley Suite", developed to validate RISC-V implementations. The suite tests all the instructions by comparing the expected results with those generated by the unit under verification.

Sample Codes

The samplecode folder contains some simple applications that were used to test specific functionalities of the processor. These applications use BareOS, which is a simple operating system. All the applications are compiled at once by running the "make" command. To add more applications, place the C source code in the folder and then edit the [Makefile](https://github.co
