VexRiscv
A FPGA friendly 32 bit RISC-V CPU implementation
Install / Use
/learn @SpinalHDL/VexRiscvREADME
Index
- Index
- Description
- Area usage and maximal frequency
- Dependencies
- CPU generation
- Regression tests
- Interactive debug of the simulated CPU via GDB OpenOCD and Verilator
- Using Eclipse to run and debug the software
- Briey SoC
- Murax SoC
- Running Linux
- Build the RISC-V GCC
- CPU parametrization and instantiation example
- Add a custom instruction to the CPU via the plugin system
- Adding a new CSR via the plugin system
- CPU clock and resets
- VexRiscv Architecture
- FPU
- Plugins
- IBusSimplePlugin
- IBusCachedPlugin
- DecoderSimplePlugin
- RegFilePlugin
- HazardSimplePlugin
- SrcPlugin
- IntAluPlugin
- LightShifterPlugin
- FullBarrelShifterPlugin
- BranchPlugin
- DBusSimplePlugin
- DBusCachedPlugin
- MulPlugin
- DivPlugin
- MulDivIterativePlugin
- CsrPlugin
- MstatushPlugin
- StaticMemoryTranslatorPlugin
- MmuPlugin
- PmpPlugin
- DebugPlugin
- EmbeddedRiscvJtag
- YamlPlugin
- FpuPlugin
- AesPlugin
Description
This repository hosts a RISC-V implementation written in SpinalHDL. Here are some specs :
- RV32I[M][A][F[D]][C] instruction set
- Pipelined from 2 to 5+ stages ([Fetch*X], Decode, Execute, [Memory], [WriteBack])
- 1.44 DMIPS/MHz --no-inline when nearly all features are enabled (1.57 DMIPS/MHz when the divider lookup table is enabled)
- Optimized for FPGA, does not use any vendor specific IP block / primitive
- AXI4, Avalon, wishbone ready
- Optional MUL/DIV extensions
- Optional F32/F64 FPU (require data cache for now)
- Optional instruction and data caches
- Optional hardware refilled MMU
- Optional debug extension allowing Eclipse debugging via a GDB >> openOCD >> JTAG connection
- Optional interrupts and exception handling with Machine, [Supervisor] and [User] modes as defined in the RISC-V Privileged ISA Specification v1.10.
- Two implementations of shift instructions: single cycle (full barrel shifter) and shiftNumber cycles
- Each stage can have optional bypass or interlock hazard logic
- Linux compatible (SoC : https://github.com/enjoy-digital/linux-on-litex-vexriscv)
- Zephyr compatible
- FreeRTOS port
- Support tightly coupled memory on I$ D$ (see GenFullWithTcm / GenFullWithTcmIntegrated)
The hardware description of this CPU is done by using a very software oriented approach (without any overhead in the generated hardware). Here is a list of software concepts used:
- There are very few fixed things. Nearly everything is plugin based. The PC manager is a plugin, the register file is a plugin, the hazard controller is a plugin, ...
- There is an automatic a tool which allows plugins to insert data in the pipeline at a given stage, and allows other plugins to read it in another stage through automatic pipelining.
- There is a service system which provides a very dynamic framework. For instance, a plugin could provide an exception service which can then be used by other plugins to emit exceptions from the pipeline.
There is a gitter channel for all questions about VexRiscv :<br>
For commercial support, please contact spinalhdl@gmail.com.
Note you may be interested VexiiRiscv (https://github.com/SpinalHDL/VexiiRiscv).
Area usage and maximal frequency
The following numbers were obtained by synthesizing the CPU as toplevel on the fastest speed grade without any specific synthesis options to save area or to get better maximal frequency (neutral).<br>
The clock constraint is set to an unattainable value, which tends to increase the design area.<br>
The dhrystone benchmark was compiled with the -O3 -fno-inline option.<br>
All the cached configurations have some cache trashing during the dhrystone benchmark except the VexRiscv full max perf one. This, of course, reduces the performance. It is possible to produce
dhrystone binaries which fit inside a 4KB I$ and 4KB D$ (I already had this case once) but currently it isn't the case.<br>
The CPU configurations used below can be found in the src/scala/vexriscv/demo directory.
VexRiscv small (RV32I, 0.52 DMIPS/MHz, no datapath bypass, no interrupt) ->
Artix 7 -> 243 MHz 504 LUT 505 FF
Cyclone V -> 174 MHz 352 ALMs
Cyclone IV -> 179 MHz 731 LUT 494 FF
iCE40 -> 92 MHz 1130 LC
VexRiscv small (RV32I, 0.52 DMIPS/MHz, no datapath bypass) ->
Artix 7 -> 240 MHz 556 LUT 566 FF
Cyclone V -> 194 MHz 394 ALMs
Cyclone IV -> 174 MHz 831 LUT 555 FF
iCE40 -> 85 MHz 1292 LC
VexRiscv small and productive (RV32I, 0.82 DMIPS/MHz) ->
Artix 7 -> 232 MHz 816 LUT 534 FF
Cyclone V -> 155 MHz 492 ALMs
Cyclone IV -> 155 MHz 1,111 LUT 530 FF
iCE40 -> 63 MHz 1596 LC
VexRiscv small and productive with I$ (RV32I, 0.70 DMIPS/MHz, 4KB-I$) ->
Artix 7 -> 220 MHz 730 LUT 570 FF
Cyclone V -> 142 MHz 501 ALMs
Cyclone IV -> 150 MHz 1,139 LUT 536 FF
iCE40 -> 66 MHz 1680 LC
VexRiscv full no cache (RV32IM, 1.21 DMIPS/MHz 2.30 Coremark/MHz, single cycle barrel shifter, debug module, catch exceptions, static branch) ->
Artix 7 -> 216 MHz 1418 LUT 949 FF
Cyclone V -> 133 MHz 933 ALMs
Cyclone IV -> 143 MHz 2,076 LUT 972 FF
VexRiscv full (RV32IM, 1.21 DMIPS/MHz 2.30 Coremark/MHz with cache trashing, 4KB-I$,4KB-D$, single cycle barrel shifter, debug module, catch exceptions, static branch) ->
Artix 7 -> 199 MHz 1840 LUT 1158 FF
Cyclone V -> 141 MHz 1,166 ALMs
Cyclone IV -> 131 MHz 2,407 LUT 1,067 FF
VexRiscv full max perf (HZ*IPC) -> (RV32IM, 1.38 DMIPS/MHz 2.57 Coremark/MHz, 8KB-I$,8KB-D$, single cycle barrel shifter, debug module, catch exceptions, dynamic branch prediction in the fetch stage, branch and shift operations done in the Execute stage) ->
Artix 7 -> 200 MHz 1935 LUT 1216 FF
Cyclone V -> 130 MHz 1,166 ALMs
Cyclone IV -> 126 MHz 2,484 LUT 1,120 FF
VexRiscv full with MMU (RV32IM, 1.24 DMIPS/MHz 2.35 Coremark/MHz, with cache trashing, 4KB-I$, 4KB-D$, single cycle barrel shifter, debug module, catch exceptions, dynamic branch, MMU) ->
Artix 7 -> 151 MHz 2021 LUT 1541 FF
Cyclone V -> 124 MHz 1,368 ALMs
Cyclone IV -> 128 MHz 2,826 LUT 1,474 FF
VexRiscv linux balanced (RV32IMA, 1.21 DMIPS/MHz 2.27 Coremark/MHz, with cache trashing, 4KB-I$, 4KB-D$, single cycle barrel shifter, catch exceptions, static branch, MMU, Supervisor, Compatible with mainstream linux) ->
Artix 7 -> 180 MHz 2883 LUT 2130 FF
Cyclone V -> 131 MHz 1,764 ALMs
Cyclone IV -> 121 MHz 3,608 LUT 2,082 FF
The following configuration results in 1.44 DMIPS/MHz:
- 5 stage: F -> D -> E -> M -> WB
- single cycle ADD/SUB/Bitwise/Shift ALU
- branch/jump done in the E stage
- memory load values are bypassed in the WB stage (late result)
- 33 cycle division with bypassing in the M stage (late result)
- single cycle multiplication with bypassing in the WB stage (late result)
- dynamic branch prediction done in the F stage with a direct mapped target buffer cache (no penalties on correct predictions)
Note that, recently, the capability to remove the Fetch/Memory/WriteBack stage was added to reduce the area of the CPU, which ends up with a smaller CPU and a better DMIPS/MHz for the small configurations.
Dependencies
On Ubuntu 14:
# JAVA JDK 8
sudo add-apt-repository -y ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-8-jdk -y
sudo update-alternatives --config java
sudo update-alternatives --config javac
# Install SBT - https://www.scala-sbt.org/
echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee /etc/apt/sources.list.d/sbt.list
echo "deb https://repo.scala-sbt.org/scalasbt/debian /" | sudo tee /etc/apt/sources.list.d/sbt_old.list
curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" | sudo apt-key add
sudo apt-get update
sudo apt-get install sbt
# Verilator (for sim only, really needs 3.9+, in general apt-get will give you 3.8)
sudo apt-get install git make autoconf g++ flex bison
git clone http://git.veripool.org/git/verilator # Only first time
unsetenv VERILATOR_ROOT # For csh; ignore error if on bash
unset VERILATOR_ROOT # For bash
cd verilator
git pull # Make sure we're up-to-date
git checkout v4.216
autoconf # Create ./
