Index

Description
Area usage and maximal frequency
Dependencies
CPU generation
Regression tests
Interactive debug of the simulated CPU via GDB OpenOCD and Verilator
Using Eclipse to run and debug the software
- By using gnu-mcu-eclipse
- By using Zylin plugin (old)
Briey SoC
Murax SoC
Running Linux
Build the RISC-V GCC
CPU parametrization and instantiation example
Add a custom instruction to the CPU via the plugin system
Adding a new CSR via the plugin system
CPU clock and resets
VexRiscv Architecture
- FPU
- Plugins

Description

This repository hosts a RISC-V implementation written in SpinalHDL. Here are some specs:

RV32I[M][A][F[D]][C] instruction set
Pipelined from 2 to 5+ stages ([Fetch*X], Decode, Execute, [Memory], [WriteBack])
1.44 DMIPS/MHz with -fno-inline when nearly all features are enabled (1.57 DMIPS/MHz when the divider lookup table is enabled)
Optimized for FPGA, does not use any vendor specific IP block / primitive
AXI4, Avalon, Wishbone ready
Optional MUL/DIV extensions
Optional F32/F64 FPU (requires a data cache for now)
Optional instruction and data caches
Optional hardware refilled MMU
Optional debug extension allowing Eclipse debugging via a GDB >> OpenOCD >> JTAG connection
Optional interrupts and exception handling with Machine, [Supervisor] and [User] modes as defined in the RISC-V Privileged ISA Specification v1.10.
Two implementations of shift instructions: single cycle (full barrel shifter) and shiftNumber cycles
Each stage can have optional bypass or interlock hazard logic
Linux compatible (SoC : https://github.com/enjoy-digital/linux-on-litex-vexriscv)
Zephyr compatible
FreeRTOS port
Supports tightly coupled memory on the I$ and D$ (see GenFullWithTcm / GenFullWithTcmIntegrated)

The hardware description of this CPU is done by using a very software oriented approach (without any overhead in the generated hardware). Here is a list of software concepts used:

There are very few fixed things. Nearly everything is plugin based. The PC manager is a plugin, the register file is a plugin, the hazard controller is a plugin, ...
There is an automatic tool which allows plugins to insert data in the pipeline at a given stage, and allows other plugins to read it in another stage through automatic pipelining.
There is a service system which provides a very dynamic framework. For instance, a plugin could provide an exception service which can then be used by other plugins to emit exceptions from the pipeline.

There is a Gitter channel for all questions about VexRiscv.

For commercial support, please contact spinalhdl@gmail.com.

Note that you may be interested in VexiiRiscv (https://github.com/SpinalHDL/VexiiRiscv).

Area usage and maximal frequency

The following numbers were obtained by synthesizing the CPU as toplevel on the fastest speed grade, without any specific synthesis options to save area or to improve the maximal frequency (neutral). The clock constraint is set to an unattainable value, which tends to increase the design area. The dhrystone benchmark was compiled with the -O3 -fno-inline options. All the cached configurations suffer some cache thrashing during the dhrystone benchmark, except the VexRiscv full max perf one, which of course reduces their performance. It is possible to produce dhrystone binaries which fit inside a 4KB I$ and 4KB D$ (I already had this case once), but currently this isn't the case. The CPU configurations used below can be found in the src/scala/vexriscv/demo directory.

VexRiscv small (RV32I, 0.52 DMIPS/MHz, no datapath bypass, no interrupt) ->
    Artix 7     -> 243 MHz 504 LUT 505 FF 
    Cyclone V   -> 174 MHz 352 ALMs
    Cyclone IV  -> 179 MHz 731 LUT 494 FF 
    iCE40       -> 92 MHz 1130 LC

VexRiscv small (RV32I, 0.52 DMIPS/MHz, no datapath bypass) ->
    Artix 7     -> 240 MHz 556 LUT 566 FF 
    Cyclone V   -> 194 MHz 394 ALMs
    Cyclone IV  -> 174 MHz 831 LUT 555 FF 
    iCE40       -> 85 MHz 1292 LC

VexRiscv small and productive (RV32I, 0.82 DMIPS/MHz)  ->
    Artix 7     -> 232 MHz 816 LUT 534 FF 
    Cyclone V   -> 155 MHz 492 ALMs
    Cyclone IV  -> 155 MHz 1,111 LUT 530 FF 
    iCE40       -> 63 MHz 1596 LC

VexRiscv small and productive with I$ (RV32I, 0.70 DMIPS/MHz, 4KB-I$)  ->
    Artix 7     -> 220 MHz 730 LUT 570 FF 
    Cyclone V   -> 142 MHz 501 ALMs
    Cyclone IV  -> 150 MHz 1,139 LUT 536 FF 
    iCE40       -> 66 MHz 1680 LC

VexRiscv full no cache (RV32IM, 1.21 DMIPS/MHz 2.30 Coremark/MHz, single cycle barrel shifter, debug module, catch exceptions, static branch) ->
    Artix 7     -> 216 MHz 1418 LUT 949 FF 
    Cyclone V   -> 133 MHz 933 ALMs
    Cyclone IV  -> 143 MHz 2,076 LUT 972 FF 

VexRiscv full (RV32IM, 1.21 DMIPS/MHz 2.30 Coremark/MHz with cache thrashing, 4KB-I$, 4KB-D$, single cycle barrel shifter, debug module, catch exceptions, static branch) ->
    Artix 7     -> 199 MHz 1840 LUT 1158 FF 
    Cyclone V   -> 141 MHz 1,166 ALMs
    Cyclone IV  -> 131 MHz 2,407 LUT 1,067 FF 

VexRiscv full max perf (HZ*IPC) -> (RV32IM, 1.38 DMIPS/MHz 2.57 Coremark/MHz, 8KB-I$,8KB-D$, single cycle barrel shifter, debug module, catch exceptions, dynamic branch prediction in the fetch stage, branch and shift operations done in the Execute stage) ->
    Artix 7     -> 200 MHz 1935 LUT 1216 FF 
    Cyclone V   -> 130 MHz 1,166 ALMs
    Cyclone IV  -> 126 MHz 2,484 LUT 1,120 FF 

VexRiscv full with MMU (RV32IM, 1.24 DMIPS/MHz 2.35 Coremark/MHz, with cache thrashing, 4KB-I$, 4KB-D$, single cycle barrel shifter, debug module, catch exceptions, dynamic branch, MMU) ->
    Artix 7     -> 151 MHz 2021 LUT 1541 FF 
    Cyclone V   -> 124 MHz 1,368 ALMs
    Cyclone IV  -> 128 MHz 2,826 LUT 1,474 FF

VexRiscv linux balanced (RV32IMA, 1.21 DMIPS/MHz 2.27 Coremark/MHz, with cache thrashing, 4KB-I$, 4KB-D$, single cycle barrel shifter, catch exceptions, static branch, MMU, Supervisor, compatible with mainstream Linux) ->
    Artix 7     -> 180 MHz 2883 LUT 2130 FF 
    Cyclone V   -> 131 MHz 1,764 ALMs
    Cyclone IV  -> 121 MHz 3,608 LUT 2,082 FF
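
To compare configurations by absolute throughput rather than per-MHz efficiency, the DMIPS/MHz figure can be multiplied by the maximal frequency (this is the HZ*IPC idea mentioned for the max perf configuration). The snippet below is only a sketch: `dmips` is a hypothetical helper name that is not part of this repository, and the inputs are the Artix 7 numbers listed above.

```shell
#!/usr/bin/env bash
# Absolute Dhrystone throughput = (DMIPS/MHz) * (maximal frequency in MHz).
# "dmips" is a hypothetical helper for illustration, not a VexRiscv tool.
dmips() {
    awk -v per_mhz="$1" -v mhz="$2" 'BEGIN { printf "%.1f\n", per_mhz * mhz }'
}

# Artix 7 figures copied from the configurations listed above.
echo "small (no bypass, no interrupt): $(dmips 0.52 243) DMIPS"
echo "small and productive:            $(dmips 0.82 232) DMIPS"
echo "full max perf:                   $(dmips 1.38 200) DMIPS"
echo "linux balanced:                  $(dmips 1.21 180) DMIPS"
```

These products only rank the configurations on the dhrystone benchmark; as noted above, the cached configurations are affected by cache thrashing, so real workloads may order them differently.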

The following configuration results in 1.44 DMIPS/MHz:

5 stage: F -> D -> E -> M -> WB
single cycle ADD/SUB/Bitwise/Shift ALU
branch/jump done in the E stage
memory load values are bypassed in the WB stage (late result)
33 cycle division with bypassing in the M stage (late result)
single cycle multiplication with bypassing in the WB stage (late result)
dynamic branch prediction done in the F stage with a direct mapped target buffer cache (no penalties on correct predictions)

Note that the capability to remove the Fetch/Memory/WriteBack stages was recently added to reduce the area of the CPU, which yields a better DMIPS/MHz for the small configurations.

Dependencies

On Ubuntu 14:

# JAVA JDK 8
sudo add-apt-repository -y ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-8-jdk -y
sudo update-alternatives --config java
sudo update-alternatives --config javac

# Install SBT - https://www.scala-sbt.org/
echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee /etc/apt/sources.list.d/sbt.list
echo "deb https://repo.scala-sbt.org/scalasbt/debian /" | sudo tee /etc/apt/sources.list.d/sbt_old.list
curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" | sudo apt-key add
sudo apt-get update
sudo apt-get install sbt

# Verilator (for sim only, really needs 3.9+, in general apt-get will give you 3.8)
sudo apt-get install git make autoconf g++ flex bison
git clone http://git.veripool.org/git/verilator   # Only first time
unsetenv VERILATOR_ROOT  # For csh; ignore error if on bash
unset VERILATOR_ROOT  # For bash
cd verilator
git pull        # Make sure we're up-to-date
git checkout v4.216
autoconf        # Create ./
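
Before building Verilator from source, it can be worth checking whether an already installed version meets the 3.9+ requirement noted above. The following is a minimal sketch under two assumptions: `version_ge` is a hypothetical helper name, and Verilator version numbers are treated as plain decimals (so 3.874 is older than 3.9, and 4.216 is newer).

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when A >= B, comparing both as decimal numbers.
# Assumption: Verilator versions order like decimals (3.874 < 3.900 < 4.216).
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 >= b + 0) }'
}

if command -v verilator >/dev/null 2>&1; then
    # "verilator --version" prints "Verilator <version> <date> ...";
    # the second field is the version number.
    installed="$(verilator --version | awk '{ print $2 }')"
    if version_ge "$installed" 3.9; then
        echo "Verilator $installed is recent enough for simulation"
    else
        echo "Verilator $installed is older than 3.9, build it from source"
    fi
else
    echo "Verilator is not installed"
fi
```

A decimal comparison is used here instead of `sort -V` because a component-wise version sort would rank 3.874 above 3.9.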
