Mir
A lightweight JIT compiler based on MIR (Medium Internal Representation) and C11 JIT compiler and interpreter based on MIR
Install / Use
/learn @vnmakarov/MirREADME
<p>
<a href="https://github.com/vnmakarov/mir/actions?query=workflow%3AAMD64%2DLinux%2DOSX%2DWindows%2Dtest"><img alt="GitHub MIR test status" src="https://github.com/vnmakarov/mir/workflows/AMD64%2DLinux%2DOSX%2DWindows%2Dtest/badge.svg"></a>
<a href="https://github.com/vnmakarov/mir/actions?query=workflow%3Aapple%2Daarch64%2Dtest"><img alt="GitHub MIR test status on Apple Silicon" src="https://github.com/vnmakarov/mir/workflows/apple%2Daarch64%2Dtest/badge.svg"></a>
<a href="https://github.com/vnmakarov/mir/actions?query=workflow%3Aaarch64%2Dtest"><img alt="GitHub MIR test status on aarch64" src="https://github.com/vnmakarov/mir/workflows/aarch64%2Dtest/badge.svg"></a>
<a href="https://github.com/vnmakarov/mir/actions?query=workflow%3Appc64le%2Dtest"><img alt="GitHub MIR test status on ppc64le" src="https://github.com/vnmakarov/mir/workflows/ppc64le%2Dtest/badge.svg"></a>
<a href="https://github.com/vnmakarov/mir/actions?query=workflow%3As390x%2Dtest"><img alt="GitHub MIR test status on s390x" src="https://github.com/vnmakarov/mir/workflows/s390x%2Dtest/badge.svg"></a>
<a href="https://github.com/vnmakarov/mir/actions?query=workflow%3Ariscv64%2Dtest"><img alt="GitHub MIR test status on riscv64" src="https://github.com/vnmakarov/mir/workflows/riscv64%2Dtest/badge.svg"></a>
<a href="https://github.com/vnmakarov/mir/actions?query=workflow%3AAMD64%2DLinux%2Dbench"><img alt="GitHub MIR benchmark status" src="https://github.com/vnmakarov/mir/workflows/AMD64%2DLinux%2Dbench/badge.svg"></a>
</p>
MIR Project
- MIR means Medium Internal Representation
- MIR project goal is to provide a basis to implement fast and lightweight JITs
- Plans to try MIR light-weight JIT first for CRuby or/and MRuby implementation
- Motivations for the project can be found in this blog post
- C2MIR compiler description can be found in this blog post
- Future of code specialization in MIR for dynamic language JITs can be found in this blog post
Disclaimer
- There is absolutely no warranty that the code will work for any tests except ones given here and on platforms other than x86_64 Linux/OSX, aarch64 Linux/OSX(Apple M1), and ppc64le/s390x/riscv64 Linux
MIR
- MIR is strongly typed IR
- MIR can represent machine 32-bit and 64-bit insns of different architectures
- MIR.md contains detail description of MIR and its API. Here is a brief MIR description:
- MIR consists of modules
- Each module can contain functions and some declarations and data
- Each function has signature (parameters and return types), local variables
(including function arguments) and instructions
- Each local variable has type which can be only 64-bit integer, float, double, or long double and can be bound to a particular target machine register
- Each instruction has opcode and operands
- Operand can be a local variable
(or a function argument), immediate, memory, label, or reference
- Immediate operand can be 64-bit integer, float, double, or long double value
- Operand can be a local variable
(or a function argument), immediate, memory, label, or reference
- Memory operand has a type, displacement, base and index integer local variable,
and integer constant as a scale for the index
- Memory type can be 8-, 16-, 32- and 64-bit signed or unsigned integer type,
float type, double, or long double type
- When integer memory value is used it is expanded with sign or zero promoting to 64-bit integer value first
- Memory type can be 8-, 16-, 32- and 64-bit signed or unsigned integer type,
float type, double, or long double type
- Label operand has name and used for control flow instructions
- Reference operand is used to refer to functions and declarations in the current module, in other MIR modules, or for C external functions or declarations
- opcode describes what the instruction does
- There are conversion instructions for conversion between different 32- and 64-bit signed and unsigned values, float, double, and long double values
- There are arithmetic instructions (addition, subtraction, multiplication, division, modulo) working on 32- and 64-bit signed and unsigned values, float, double, and long double values
- There are logical instructions (and, or, xor, different shifts) working on 32- and 64-bit signed and unsigned values
- There are comparison instructions working on 32- and 64-bit signed and unsigned values, float, double, and long double values
- There are local variable address instructions to get address of local variable
- There are branch insns (unconditional jump, and jump on zero or non-zero value) which take a label as one their operand
- There are combined comparison and branch instructions taking a label as one operand and two 32- and 64-bit signed and unsigned values, float, double, and long double values
- There is switch instruction to jump to a label from labels given as operands depending on index given as the first operand
- There is label address instruction to get a label address and unconditional indirect jump instruction whose operand contains previously taken label address
- There are function and procedural call instructions
- There are return instructions optionally returning 32- and 64-bit integer values, float, double, and long double values
- There are specialized light-weight call and return instructions can be used for fast switching from threaded interpreter to JITted code and vice verse
- There are property instructions to generated specialized machine code when lazy basic block versioning is used
MIR Example
- You can create MIR through API consisting of functions for creation of modules, functions, instructions, operands etc
- You can also create MIR from MIR binary or text file
- The best way to get a feel about MIR is to use textual MIR representation
- Example of Eratosthenes sieve on C
#define Size 819000
int sieve (int N) {
int64_t i, k, prime, count, n; char flags[Size];
for (n = 0; n < N; n++) {
count = 0;
for (i = 0; i < Size; i++)
flags[i] = 1;
for (i = 0; i < Size; i++)
if (flags[i]) {
prime = i + i + 3;
for (k = i + prime; k < Size; k += prime)
flags[k] = 0;
count++;
}
}
return count;
}
void ex100 (void) {
printf ("sieve (100) = %d\", sieve (100));
}
- Example of MIR textual file for the same function:
m_sieve: module
export sieve
sieve: func i32, i32:N
local i64:iter, i64:count, i64:i, i64:k, i64:prime, i64:temp, i64:flags
alloca flags, 819000
mov iter, 0
loop: bge fin, iter, N
mov count, 0; mov i, 0
loop2: bge fin2, i, 819000
mov u8:(flags, i), 1; add i, i, 1
jmp loop2
fin2: mov i, 0
loop3: bge fin3, i, 819000
beq cont3, u8:(flags,i), 0
add temp, i, i; add prime, temp, 3; add k, i, prime
loop4: bge fin4, k, 819000
mov u8:(flags, k), 0; add k, k, prime
jmp loop4
fin4: add count, count, 1
cont3: add i, i, 1
jmp loop3
fin3: add iter, iter, 1
jmp loop
fin: ret count
endfunc
endmodule
m_ex100: module
format: string "sieve (10) = %d\n"
p_printf: proto p:fmt, i32:result
p_sieve: proto i32, i32:iter
export ex100
import sieve, printf
ex100: func v, 0
local i64:r
call p_sieve, sieve, r, 100
call p_printf, printf, format, r
endfunc
endmodule
funcdescribes signature of the function (taking 32-bit signed integer argument and returning 32-bit signed integer value) and function argumentNwhich will be local variable of 64-bit signed integer type- Function results are described first by their types and have no names. Parameters always have names and go after the result description
- Function may have more than one result but possible number and combination of result types are currently machine defined
- You can write several instructions on one line if you separate them by
; - The instruction result, if any, is always the first operand
- We use 64-bit instructions in calculations
- We could use 32-bit instructions in calculations which would have sense if we use 32-bit CPU
- When we use 32-bit instructions we take only 32-bit significant part of 64-bit operand and high 32-bit part of the result is machine defined (so if you write a portable MIR code consider the high 32-bit part value is undefined)
stringdescribes data in form of C string- C string can be used directly as an insn operand. In this case the data will be added to the module and the data address will be used as an operand
exportdescribes the module functions or data which are visible outside the current moduleimportdescribes the module functions or data which should be defined in other MIR modulesprotodescribes function prototypes. Its syntax is the same asfuncsyntaxcallare MIR instruction to call functions
Running MIR code
- After creating MIR modules (through MIR API or reading MIR binary or textual files),
you should load the modules
- Loading modules makes visible exported module functions and data
- You can load external C function with
MIR_load_external
- After loading modules, you should link the loaded modules
- Linking modules resolves imported module references, initializes data, and set up call interfaces
- After linking, you can interpret functions from the modules or
