
ARMLEG

Multi-cycle pipelined ARM-LEGv8 CPU with Forwarding and Hazard Detection.


ARM-LEGv8 CPU

Multi-cycle pipelined ARM-LEGv8 CPU with Forwarding and Hazard Detection as described in 'Computer Organization and Design ARM Edition'. The CPU can execute memory-reference instructions like LDUR and STUR, arithmetic-logical instructions like ADD, SUB, AND and ORR and branch instructions like B and CBZ.


Introduction

The ARMv8 architecture is a 64-bit architecture with native support for 32-bit instructions. It has 31 general purpose registers, each 64 bits wide. Compared to this, the 32-bit ARMv7 architecture had 15 general purpose registers, each 32 bits wide. ARMv8 follows some key design principles:

- Simplicity favours regularity
- Regularity makes implementation simpler
- Simplicity enables higher performance at lower cost
- Smaller is faster
- Different formats complicate decoding, therefore keep formats as similar as possible

Registers are faster to access than memory. Operating on data held in Data memory requires load and store instructions, so more instructions must be executed whenever data is fetched from Data memory. Keeping variables in registers more often therefore reduces execution time.

Because ARMv8 provides more registers than ARMv7, and wider ones, optimized code can use the internal registers more often than memory, and these registers can hold bigger numbers and addresses. The result is that ARM’s 64-bit processors can often complete the same work faster.

In terms of energy efficiency, the use of 64-bit registers doesn’t increase the power usage. In some cases the fact that a 64-bit core can perform certain operations quicker means that it will be more energy efficient than a 32-bit core, simply because it gets the job done faster and can then power down.

To favor simplicity, arithmetic operations are formed with two sources and one destination. For example,
ADD a, b, c --> a gets b + c
SUB a, b, c --> a gets b - c

The LEGv8 instruction set is a subset of the ARMv8 instruction set. LEGv8 has a 32 × 64-bit register module which is used for frequently accessed data. The 64-bit data is called a “doubleword”.
Of these 32 registers, 31 registers, X0 to X30, are the general purpose registers. In the full ARMv8 instruction set, register 31 is XZR in most instructions but the stack pointer (SP) in others. In LEGv8, however, the 32nd register, X31, is always initialized to 0. That is, it is always XZR in LEGv8, and SP is always register 28.

The project implementation includes a subset of the core LEGv8 instruction set:

  • The memory-reference instructions load register unscaled ( LDUR ) and store register unscaled ( STUR )
  • The arithmetic-logical instructions ADD, SUB, AND and ORR
  • The instructions compare and branch on zero ( CBZ ) and branch ( B )

Architecture

Let's start with an abstract view of the CPU. The CPU comprises a Program Counter [PC], Instruction Memory, Register module [Registers], Arithmetic Logic Unit [ALU] and Data Memory.

The Program Counter or PC holds the address of the next instruction, which is fetched from the Instruction Memory. The register fields of the fetched instruction select the operands to read from the Register module, and the Registers pass those values to the ALU to perform the operation. Depending on the type of instruction, the ALU result may be used as an address to load from or store to the Data Memory. A value loaded from the Data Memory is written back to the Register module, where it is available for further operations.

| Module | Register width | No. of registers |
| ------------------ |:--------------:| ----------------:|
| Instruction Memory | 8 bits | 64 |
| Registers | 64 bits | 32 |
| Data Memory | 64 bits | 128 |

Program Counter

A LEGv8 instruction is 32 bits wide, while the registers and the datapath are 64 bits wide. The Program Counter or PC steps through the Instruction Memory and fetches one 32-bit instruction in each cycle. Four 8-bit entries of the Instruction Memory are read and concatenated to form the 32-bit CPU instruction, with the byte at address PC as the most significant byte (big-endian byte order). That is,

CPU_Instruction[7:0] = Instruction_Memory[PC+3];
CPU_Instruction[15:8] = Instruction_Memory[PC+2];
CPU_Instruction[23:16] = Instruction_Memory[PC+1];
CPU_Instruction[31:24] = Instruction_Memory[PC];
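
The byte assembly above could be sketched as a small Verilog module. This is only an illustration, not the project's actual fetch logic: the module name `fetch_word`, its ports, and the truncation of the PC to 6 bits (because the Instruction Memory has 64 entries) are assumptions made for this example.

```verilog
// Hypothetical sketch: build one 32-bit instruction from four byte-wide
// Instruction Memory entries. Names and ports are illustrative only.
module fetch_word (
    input  wire [63:0] pc,           // byte address of the instruction
    output wire [31:0] instruction   // assembled 32-bit instruction
);
    // 64 entries of 8 bits each, as listed for the Instruction Memory.
    reg [7:0] instruction_memory [0:63];

    // Assumption: only the low 6 bits of the PC index the 64-entry memory,
    // so the 6-bit additions below wrap around at the end of the memory.
    wire [5:0] base = pc[5:0];

    // The byte at PC lands in bits [31:24]; the byte at PC+3 in bits [7:0].
    assign instruction = { instruction_memory[base],
                           instruction_memory[base + 6'd1],
                           instruction_memory[base + 6'd2],
                           instruction_memory[base + 6'd3] };
endmodule
```

With the bytes `8B 0B 01 49` stored at addresses 0 to 3, this sketch yields `32'h8B0B0149` for `pc = 0`, which is the encoding of `ADD X9, X10, X11`.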

Instruction Memory

Instructions are fetched from Instruction Memory with the byte at the lowest address read as the most significant byte (big-endian byte order). 32-bit data is called a “word”. The Instruction Memory is read one word at a time. LEGv8 does not require words to be aligned in memory, except for instructions and the stack.

The Instruction Memory supports instructions in 32-bit format. The instructions are given below with bit width of various parts of instructions.

| Term | Meaning |
| ------------------|:----------|
| opcode | Operation code |
| Rn | First operand register |
| Rm | Second operand register |
| Rd | Destination register; used in R-type and I-type instructions to specify the register that will store the result of the current operation. |
| Rt | Target register; used in D-type instructions to specify the register that a value is loaded into or stored from. |
| shamt | Shift amount |
| ALU_immediate | Immediate operand supplied to the ALU in I-type instructions |
| DT_address | Data address offset |
| BR_address | Branch address offset |
| COND_BR_address | Conditional Branch address offset |

Some examples of instructions that have been implemented in this project:

| Instruction | Instruction Name | Instruction Type | Instruction Opcode (Hex) | Instruction Expression |
| ----------------|:-------------:| -----:|-----------------------------:|------------:|
| LDUR | LoaD (Unscaled offset) Register | D-type | 7C2 | Register[Rt] = Memory[Register[Rn] + DT_address] |
| STUR | STore (Unscaled offset) Register | D-type | 7C0 | Memory[Register[Rn] + DT_address] = Register[Rt] |
| ADD | Add | R-type | 458 | Register[Rd] = Register[Rn] + Register[Rm] |
| SUB | Subtract | R-type | 658 | Register[Rd] = Register[Rn] - Register[Rm] |
| ORR | Inclusive OR | R-type | 550 | Register[Rd] = Register[Rn] OR Register[Rm] |
| AND | AND | R-type | 450 | Register[Rd] = Register[Rn] AND Register[Rm] |
| CBZ | Compare and Branch if Zero | CB-type | 5A0-5A7 | if (Register[Rt] == 0) PC = PC + COND_BR_address |
| B | Branch | B-type | 0A0-0BF | PC = PC + BR_address |
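
The opcode values in the table above can be matched against the top 11 bits of an instruction. The following is a minimal sketch, not the project's decoder: the module name `decode_fields` and its ports are hypothetical, and the field positions (opcode in bits [31:21], Rm in [20:16], Rn in [9:5], Rd/Rt in [4:0]) are taken from the standard LEGv8 encoding rather than from this README.

```verilog
// Hypothetical sketch: split a 32-bit LEGv8 instruction into fields and
// classify it using the opcode values and ranges from the table above.
module decode_fields (
    input  wire [31:0] instruction,
    output wire [10:0] opcode,        // bits [31:21]
    output wire [4:0]  rm, rn, rd_rt, // register fields
    output wire        is_ldur, is_stur, is_add, is_sub,
    output wire        is_and, is_orr, is_cbz, is_b
);
    assign opcode = instruction[31:21];
    assign rm     = instruction[20:16];  // second operand (R-type)
    assign rn     = instruction[9:5];    // first operand / base register
    assign rd_rt  = instruction[4:0];    // Rd (R-type) or Rt (D-type, CB-type)

    assign is_ldur = (opcode == 11'h7C2);
    assign is_stur = (opcode == 11'h7C0);
    assign is_add  = (opcode == 11'h458);
    assign is_sub  = (opcode == 11'h658);
    assign is_and  = (opcode == 11'h450);
    assign is_orr  = (opcode == 11'h550);

    // CBZ and B have shorter opcodes, so the top 11 bits span a range.
    assign is_cbz  = (opcode >= 11'h5A0) && (opcode <= 11'h5A7);
    assign is_b    = (opcode >= 11'h0A0) && (opcode <= 11'h0BF);
endmodule
```

The ranges for CBZ and B exist because their opcodes are only 8 and 6 bits wide, so the remaining bits of the 11-bit window belong to the branch offset and can take any value.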

Register Module

As mentioned before, the register module has 31 general purpose registers, each 64-bits wide. Register module schematic:

| Registers | Used for |
| ------------------|:----------|
| X0 – X7 | procedure arguments/results |
| X8 | indirect result location register |
| X9 – X15 | temporaries |
| X16 – X17 | (IP0 – IP1): may be used by linker as a scratch register, other times as temporary register |
| X18 | platform register for platform independent code; otherwise a temporary register |
| X19 – X27 | saved (value is preserved across function calls) |
| X28 (SP) | stack pointer |
| X29 (FP) | frame pointer |
| X30 (LR) | link register (return address) |
| XZR (register 31) | the constant value 0 |
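
A register module with this layout might be sketched as follows. It is an illustration under stated assumptions, not the project's implementation: the module name `register_file` and its ports are hypothetical, and forcing register 31 to read as zero (and ignoring writes to it) is one possible way to model XZR, whereas the project describes initializing X31 to 0.

```verilog
// Hypothetical sketch: 32 x 64-bit register file with two read ports and
// one write port. Register 31 (XZR) always reads as zero in this sketch.
module register_file (
    input  wire        clk,
    input  wire        write_enable,
    input  wire [4:0]  read_reg1, read_reg2, write_reg,
    input  wire [63:0] write_data,
    output wire [63:0] read_data1, read_data2
);
    reg [63:0] registers [0:31];

    // Combinational reads; XZR is hard-wired to the constant 0.
    assign read_data1 = (read_reg1 == 5'd31) ? 64'd0 : registers[read_reg1];
    assign read_data2 = (read_reg2 == 5'd31) ? 64'd0 : registers[read_reg2];

    // Synchronous write; writes that target XZR are dropped.
    always @(posedge clk) begin
        if (write_enable && write_reg != 5'd31)
            registers[write_reg] <= write_data;
    end
endmodule
```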

Register module feeding ALU :

Arithmetic Logic Unit or ALU

The operation codes determine how the ALU treats the data it receives from the Registers module. The ALU is used to calculate:

  • Arithmetic result
  • Memory address for load/store
  • Branch target address

| ALU Operation code | Operation performed |
| ------------------|:----------|
| 4'b0000 | A AND B |
| 4'b0001 | A OR B |
| 4'b0010 | A ADD B |
| 4'b0110 | A SUBTRACT B |
| 4'b0111 | B (pass input B) |
| 4'b1100 | A NOR B |
| 4'b1111 | default or edge cases |
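
The operation codes in the table map naturally onto a `case` statement. The sketch below is a minimal example, assuming a module named `alu` with hypothetical port names. The `zero` output (commonly used to decide CBZ) and the choice to output 0 for unlisted codes are assumptions, since the table does not say what the project does in the default case.

```verilog
// Hypothetical sketch: 64-bit ALU driven by the 4-bit operation codes
// listed in the table above.
module alu (
    input  wire [3:0]  operation,
    input  wire [63:0] a, b,
    output reg  [63:0] result,
    output wire        zero        // assumption: high when result is 0
);
    always @(*) begin
        case (operation)
            4'b0000: result = a & b;      // A AND B
            4'b0001: result = a | b;      // A OR B
            4'b0010: result = a + b;      // A ADD B
            4'b0110: result = a - b;      // A SUBTRACT B
            4'b0111: result = b;          // pass input B
            4'b1100: result = ~(a | b);   // A NOR B
            default: result = 64'd0;      // assumption: unlisted codes give 0
        endcase
    end

    assign zero = (result == 64'd0);
endmodule
```

Passing input B through unchanged lets the `zero` flag test a single register value, which is what a compare-and-branch-on-zero instruction needs.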

Data Memory

The Data memory unit is a state element which acts as a data storage medium. It has inputs for the address and the write data, and a single output for the read result. There are separate read and write controls, although only one of these may be asserted on any given clock cycle.
In this project, it is initialized as follows:

| Memory location | Value |
| ------------------|:----------|
| memoryData[0] | 64'd0 |
| memoryData[8] | 64'd1 |
| memoryData[16] | 64'd2 |
| memoryData[24] | 64'd3 |
| memoryData[32] | 64'd4 |
| memoryData[40] | 64'd5 |
| memoryData[48] | 64'd6 |
| memoryData[56] | 64'd7 |
| memoryData[64] | 64'd8 |
| memoryData[72] | 64'd9 |
| memoryData[80] | 64'd10 |
| memoryData[88] | 64'd11 |
| memoryData[96] | 64'd12 |
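
A data memory with separate read and write controls and the initial contents listed above might look like the sketch below. The module name `data_memory` and the port names are hypothetical. Zeroing the entries that the table does not list, indexing with the low 7 bits of the address, and driving 0 on the output when `mem_read` is low are all assumptions made for this example.

```verilog
// Hypothetical sketch: 128 x 64-bit data memory with separate read and
// write controls, initialized as in the table above.
module data_memory (
    input  wire        clk,
    input  wire        mem_read,
    input  wire        mem_write,
    input  wire [63:0] address,
    input  wire [63:0] write_data,
    output wire [63:0] read_data
);
    reg [63:0] memoryData [0:127];
    integer i;

    initial begin
        // Assumption: entries not listed in the table start at zero.
        for (i = 0; i < 128; i = i + 1)
            memoryData[i] = 64'd0;
        // memoryData[0] = 0, memoryData[8] = 1, ..., memoryData[96] = 12
        for (i = 0; i <= 12; i = i + 1)
            memoryData[i * 8] = i;
    end

    // Assumption: the low 7 bits of the address select one of 128 entries.
    assign read_data = mem_read ? memoryData[address[6:0]] : 64'd0;

    always @(posedge clk) begin
        if (mem_write)
            memoryData[address[6:0]] <= write_data;
    end
endmodule
```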

Sign-extend and Shift Left 2

The Instruction Memory is read in chunks of 32 bits, whereas the datapath operates on 64-bit values. The sign extension unit has the 32-bit instruction as input. From that, it selects a 9-bit field for load and store or a 19-bit field for compare and branch on zero. The selected field is then sign-extended into a 64-bit result appearing on the output.
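
The field selection described above can be sketched with Verilog replication. This is a minimal example under stated assumptions: the module name `sign_extend` is hypothetical, the field positions (bits [20:12] for the 9-bit DT_address and bits [23:5] for the 19-bit COND_BR_address) come from the standard LEGv8 encoding, and the 26-bit field of the B instruction is left out because the passage above does not describe it.

```verilog
// Hypothetical sketch: pick the offset field from a 32-bit instruction
// and sign-extend it to 64 bits.
module sign_extend (
    input  wire [31:0] instruction,
    output reg  [63:0] extended
);
    always @(*) begin
        if (instruction[31:24] == 8'b10110100)
            // CBZ: replicate bit 23 to fill the upper 45 bits (45 + 19 = 64).
            extended = {{45{instruction[23]}}, instruction[23:5]};
        else
            // Assumption: anything else is treated as D-type (LDUR/STUR).
            // Replicate bit 20 to fill the upper 55 bits (55 + 9 = 64).
            extended = {{55{instruction[20]}}, instruction[20:12]};
    end
endmodule
```

For example, `LDUR X9, [X10, #-8]` encodes as `32'hF85F8149`, and the sketch extends its 9-bit offset to `64'hFFFF_FFFF_FFFF_FFF8`.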

A 32-bit instruction occupies 4 consecutive byte addresses, one for each byte being read. The program counter is therefore incremented by 4 at a time. If the bits are shifted to the left by 2, which is similar to multiplying
