MiniCPU
Minimal FPGA Processor Core for anycpu.org 8-bit Challenge
Minimal CPU for anycpu.org 8-bit Challenge
Copyright (C) 2017, Michael A. Morris morrisma@mchsi.com. All Rights Reserved.
Released under GPL v3.
General Description
This project provides a minimal CPU core that satisfies the requirements posed by the 8-bit CPU challenge as defined by Arlet Ottens on anycpu.org. The parameters of the challenge are as follows:
Arlet wrote: As a result of a thread on 6502.org, I would like to propose a
challenge. The challenge is create a 6502-era CPU, using an FPGA, using
roughly similar amount of resources as were available to the 6502 designers.
The CPU needs to have similar capabilities as the 6502: 16 bit address bus, 8
bit data bus, 2 interrupts, reset, RDY. To make design easier, the data bus
may be split into separate in/out buses. Instead of an NMI, you can make
higher priority maskable IRQ. It should interface to either a block RAM, or an
external async SRAM. It doesn't need to be 6502 compatible, but you should be
able to port typical 6502 programs to it.
Maximum area is 128 slices on a Spartan 6 (XC6SLX4), which is about what my
NMOS 6502 core requires. Use of block RAMs or DSP blocks is not permitted
inside the CPU, but these resources may be used outside the CPU to build a
complete working system. The goal is to make something as powerful as possible
that could theoretically have existed as a 40 pin DIP in the 70's, hopefully
better than the 6502 itself. One of the goals is to keep room for future
improvement, so filling up the opcode space is not encouraged.
The 8-bit processor core supplied in this project meets the objectives defined above and supplies the core functions required for a complete processor. The core expects to be connected to a synchronous Block RAM memory array and to memory-mapped peripherals. One required peripheral is a vectored interrupt handler that supports the types and number of interrupts and traps required by the application. (Note: the interrupt handler peripheral is expected to supply the reset vector as well as any interrupt vectors that the application may require. The interrupt handler is also required to provide the interrupt enable/disable functionality that the application may require.)
The processor core supplied in the project consists of an 8-bit ALU with two 8-bit registers arranged in a modified stack-like organization. The processor does not include a processor status word or flags register. The ALU does have a carry register to support multi-precision addition and subtraction. The ALU accumulator, the left operand of any dual-operand operation, can be tested for zero and for negative values, although arithmetic 2's complement overflow is not supported in the current implementation. (Note: if the carry flag needs to be preserved, either in an interrupt service routine or through a function call, then its value must be captured using either the conditional branch on no carry instruction or the rotate instructions. Once captured in the ALU accumulator, the value can be preserved in the workspace.)
A 16-bit workspace pointer provides a stack-like capability for the processor core. The workspace pointer is automatically adjusted during subroutine calls and subroutine returns. The workspace pointer can be adjusted to allocate and deallocate locally accessible variables. The subroutine return instruction also deallocates the allocated workspace on the stack before the return address is popped from the stack. Thus, at the end of the subroutine return, if the proper workspace deallocation value is provided, the workspace pointer is left pointing to the base of the calling function's allocated workspace. The workspace pointer register is the base register for workspace relative addressing. Instructions for loading the accumulator (or the general pointer register described next), or storing the accumulator to the workspace include an offset (positive or negative) which is automatically added to the workspace register during the operand read/write cycle.
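The workspace-relative addressing described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the function names, the flat 64 KiB `bytearray` memory, and the wrap-around of the address sum at 16 bits are illustrative choices, not details taken from the core's HDL.

```python
MASK16 = 0xFFFF  # the workspace pointer and the offset are both 16 bits wide


def effective_address(base: int, offset: int) -> int:
    """Add a (possibly negative) offset to a 16-bit base register.

    Assumption: the sum wraps modulo 2**16, so a negative offset can be
    passed either as a Python negative int or as its 16-bit two's
    complement encoding.
    """
    return (base + offset) & MASK16


def load_local(memory: bytearray, workspace_ptr: int, offset: int) -> int:
    """Hypothetical model of LDL: read the byte at workspace pointer + offset."""
    return memory[effective_address(workspace_ptr, offset)]


def store_local(memory: bytearray, workspace_ptr: int, offset: int, value: int) -> None:
    """Hypothetical model of STL: write the accumulator at workspace pointer + offset."""
    memory[effective_address(workspace_ptr, offset)] = value & 0xFF


memory = bytearray(0x10000)
store_local(memory, 0x8000, -2, 0x5A)          # offset given as a negative int
value = load_local(memory, 0x8000, 0xFFFE)     # same offset, 16-bit encoding
```

Both calls address location 0x7FFE, which shows why a single 16-bit adder is enough to support positive and negative offsets.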
In addition to the 16-bit workspace pointer, the processor also provides a dual 16-bit register stack for general memory addressing. These two base-relative pointers are supported by three instructions which allow the pointer stack to be loaded from the workspace, stored to the workspace, and swapped. Like the workspace pointer, the offset for the base-relative address is supplied by the instruction.
A 16-bit instruction pointer provides the addressing of the processor core into program memory. All program memory accesses are performed relative to the instruction pointer. This applies to conditional branches and subroutine calls.
A 16-bit operand register is the final register in the core. The operand register provides the offsets into the local (relative to the workspace) and non-local address spaces (relative to the pointer stack). The operand register also provides the instruction opcodes for indirect instructions. Indirect instructions are instructions which extend the basic, direct instructions of the processor. Using the operand register in this manner allows the MiniCPU's instruction set to be extended in the future.
MiniCPU Instruction Set
The MiniCPU instruction set consists of 16 single-byte direct instructions and 16 single-byte indirect instructions executed by the EXE direct instruction. Counting the 15 direct instructions other than EXE together with the 16 indirect instructions, the MiniCPU instruction set is composed of 31 instructions.
Each instruction read from memory consists of a 4-bit instruction and a 4-bit constant. The 4-bit constant is either the least significant four bits of an operand, or the least significant four bits of an indirect instruction opcode. For all direct instructions, except for the EXE direct instruction, the operand register provides an operand: a load constant, a local or non-local address space offset, a workspace allocation/deallocation, or an instruction pointer relative offset.
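As an illustration of this byte layout, the following sketch splits an instruction byte into its two nibbles and looks up a mnemonic. The mnemonic lists are copied from the instruction listing in this README; the function names are hypothetical, and the sketch assumes the 4-bit instruction occupies the upper nibble (as the `0x`, `1x`, ... notation in the listing suggests) and that an EXE byte is not preceded by any prefix.

```python
# Direct instructions, indexed by the upper nibble of the instruction byte.
DIRECT = ["PFX", "NFX", "EXE", "LDK", "LDL", "LDN", "STL", "STN",
          "LDY", "STY", "BNE", "BPL", "BNC", "ADJ", "RTS", "JSR"]

# Indirect instructions, indexed by the operand presented to EXE.
INDIRECT = ["SWP", "XAB", "XCH", "LDA", "CLC", "SEC", "ADC", "SBC",
            "ROL", "ASL", "ROR", "ASR", "CPL", "AND", "ORL", "XOR"]


def split_byte(byte: int) -> tuple[int, int]:
    """Return (4-bit instruction, 4-bit constant) for one instruction byte."""
    return (byte >> 4) & 0xF, byte & 0xF


def mnemonic(byte: int) -> str:
    """Name a single, unprefixed instruction byte."""
    opcode, constant = split_byte(byte)
    if DIRECT[opcode] == "EXE":
        # The constant selects one of the 16 indirect instructions.
        return INDIRECT[constant]
    return f"{DIRECT[opcode]} {constant}"


names = [mnemonic(b) for b in (0x26, 0x4F, 0x01)]
```

Here `names` holds the decoded forms of an ADC, a load local with offset 15, and a prefix carrying the nibble 1.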
Since only four bits are included in each instruction byte, two direct instructions, PFX and NFX, are used to load additional nibbles into the operand register. (Note: these prefix instructions can be manually added to the instruction stream, but it is more convenient for the assembler to automatically insert the required number of prefix instructions. Except for subroutine calls and the loading of constants less than 0 or greater than 15, there should be no need for prefix instructions. Negative branches will require the use of the NFX (Negative Prefix) instruction to correctly define the negative relative offset. The operand register is cleared at the completion of all instructions except the PFX and NFX prefix instructions.)
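To make the prefix mechanism concrete, here is a sketch of how an assembler might generate the PFX/NFX sequence for an arbitrary operand, together with a small model of the operand register used to check the result. Several points are assumptions rather than facts from the source: the names `encode` and `operand_seen` are hypothetical, NFX is modelled as "shift, OR in the nibble, then complement the whole 16-bit register", and the non-prefix instruction is modelled as shifting in its own nibble before using the operand.

```python
PFX, NFX = 0x0, 0x1   # direct opcodes of the two prefix instructions
MASK16 = 0xFFFF       # the operand register KI is 16 bits wide


def encode(opcode: int, value: int) -> list[int]:
    """Return the byte sequence that presents `value` to `opcode`.

    Recursive scheme: the final byte carries the low nibble, and the
    remaining upper bits are delivered by PFX (or NFX for negatives).
    """
    last = (opcode << 4) | (value & 0xF)
    if 0 <= value <= 15:
        return [last]                              # fits in one nibble
    if value > 15:
        return encode(PFX, value >> 4) + [last]
    return encode(NFX, (~value) >> 4) + [last]     # negative operand


def operand_seen(code: list[int]) -> int:
    """Model KI over a byte sequence; return the operand of the first
    non-prefix instruction as an unsigned 16-bit value."""
    ki = 0
    for byte in code:
        opcode, nibble = byte >> 4, byte & 0xF
        ki = ((ki << 4) | nibble) & MASK16
        if opcode == PFX:
            continue
        if opcode == NFX:
            ki = ~ki & MASK16                      # assumed NFX behaviour
            continue
        return ki
    raise ValueError("sequence ends in a prefix instruction")


backward_branch = encode(0xA, -1)   # BNE with a relative offset of -1
```

Under these assumptions, a backward branch by one byte needs a single NFX, while a constant such as 0x1234 needs three PFX bytes ahead of the instruction itself.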
    0x - PFX : Prefix            Shifts the operand register, KI, left four
                                 positions and logically ORs in the least
                                 significant four bits of the instruction byte.
    1x - NFX : Negative Prefix   Performs operations like PFX except that it
                                 complements the value loaded into the operand
                                 register.
    2x - EXE : Execute           Execute the indirect instruction whose opcode
                                 is in the operand register. (See below.)
    3x - LDK : Load Constant     Load operand register into the ALU accumulator.
    4x - LDL : Load Local        Load accumulator from workspace pointer plus KI.
    5x - LDN : Load Non-local    Load accumulator from non-local pointer plus KI.
    6x - STL : Store Local       Store accumulator at workspace pointer plus KI.
    7x - STN : Store Non-Local   Store accumulator at non-local pointer plus KI.
    8x - LDY : Load Y            Load Non-Local Pointer from workspace + KI.
    9x - STY : Store Y           Store Non-Local Pointer at workspace + KI.
    Ax - BNE : Branch if ~Z      Branch relative to IP if accumulator <> 0.
    Bx - BPL : Branch if ~N      Branch relative to IP if accumulator positive.
    Cx - BNC : Branch if ~C      Branch relative to IP if Carry == 0.
    Dx - ADJ : Adjust Workspace  Workspace Pointer adjusted by operand register.
    Ex - RTS : Return Subroutine Adjust workspace pointer by value of operand
                                 and pull return address.
    Fx - JSR : Jump Subroutine   Push return address to the workspace and jump to
                                 subroutine relative to IP.

    20 - SWP : Swap Pointer Stk  Swap Non-Local Pointer stack values.
    21 - XAB : Swap A and B      Swap ALU registers
    22 - XCH : Swap X and {A, B} Swap workspace pointer with ALU registers.
    23 - LDA : Load accumulator  Load accumulator from KI (LDK, LDL, LDN)
    24 - CLC : Clear Carry       C <= 0
    25 - SEC : Set Carry         C <= 1
    26 - ADC : Add with C        A <= A + B + C
    27 - SBC : Subtract with C   A <= A + ~B + C
    28 - ROL : Rotate Left       {C, A} <= {A, C}
    29 - ASL : Arith. Left Shift {C, A} <= {A, 0}
    2A - ROR : Rotate Right      {A, C} <= {C, A}
    2B - ASR : Arith. Right Shft {A, C} <= {A[7], A}
    2C - CPL : Complement Accum  A <= ~A
    2D - AND : AND ALU Registers A <= A & B
    2E - ORL : ORL ALU Registers A <= A | B
    2F - XOR : XOR ALU Registers A <= A ^ B
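The register-transfer expressions in the indirect instruction listing can be checked with a small software model. The sketch below follows those expressions literally for ADC, SBC, and the shift/rotate group, then chains two 8-bit operations to form 16-bit addition and subtraction, which is the multi-precision use the carry register is meant to support. The Python function names and the `add16`/`sub16` helpers are hypothetical; they model the arithmetic only, not the register moves a real MiniCPU program would need.

```python
def adc(a: int, b: int, c: int) -> tuple[int, int]:
    """A <= A + B + C; returns (8-bit result, carry out)."""
    total = a + b + c
    return total & 0xFF, total >> 8


def sbc(a: int, b: int, c: int) -> tuple[int, int]:
    """A <= A + ~B + C; carry out of 1 means no borrow occurred."""
    return adc(a, ~b & 0xFF, c)


def rol(a: int, c: int) -> tuple[int, int]:
    """{C, A} <= {A, C}: rotate left through carry."""
    wide = (a << 1) | c
    return wide & 0xFF, wide >> 8


def asl(a: int) -> tuple[int, int]:
    """{C, A} <= {A, 0}: shift left, filling with zero."""
    return rol(a, 0)


def ror(a: int, c: int) -> tuple[int, int]:
    """{A, C} <= {C, A}: rotate right through carry."""
    wide = (c << 8) | a
    return wide >> 1, wide & 1


def asr(a: int) -> tuple[int, int]:
    """{A, C} <= {A[7], A}: shift right, replicating the sign bit."""
    return ror(a, a >> 7)


def add16(x: int, y: int) -> tuple[int, int]:
    """16-bit add as CLC, ADC on the low bytes, ADC on the high bytes."""
    lo, c = adc(x & 0xFF, y & 0xFF, 0)
    hi, c = adc(x >> 8, y >> 8, c)
    return (hi << 8) | lo, c


def sub16(x: int, y: int) -> tuple[int, int]:
    """16-bit subtract as SEC, SBC on the low bytes, SBC on the high bytes."""
    lo, c = sbc(x & 0xFF, y & 0xFF, 1)
    hi, c = sbc(x >> 8, y >> 8, c)
    return (hi << 8) | lo, c
```

Because SBC is defined as A + ~B + C, the carry has to be set before the first byte of a subtraction to supply the +1 of the two's complement negation, and a carry of 0 afterwards signals a borrow.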
Instruction timing
Assuming that no prefix instructions are required, the following table shows the instruction timing for the MiniCPU instruction set. (Note: each prefix instruction required to set up the operand register adds 1 cycle to the instruction timing.)
0x - PFX : 1
1x - NFX : 1
2x - EXE : 1
3x - LDK : 1
4x - LDL : 2
5x - LDN : 2
6x - STL : 2
