SkillAgentSearch skills...

Mir

A lightweight JIT compiler based on MIR (Medium Internal Representation) and C11 JIT compiler and interpreter based on MIR

Install / Use

/learn @vnmakarov/Mir

README

<p> <a href="https://github.com/vnmakarov/mir/actions?query=workflow%3AAMD64%2DLinux%2DOSX%2DWindows%2Dtest"><img alt="GitHub MIR test status" src="https://github.com/vnmakarov/mir/workflows/AMD64%2DLinux%2DOSX%2DWindows%2Dtest/badge.svg"></a> <a href="https://github.com/vnmakarov/mir/actions?query=workflow%3Aapple%2Daarch64%2Dtest"><img alt="GitHub MIR test status on Apple Silicon" src="https://github.com/vnmakarov/mir/workflows/apple%2Daarch64%2Dtest/badge.svg"></a> <a href="https://github.com/vnmakarov/mir/actions?query=workflow%3Aaarch64%2Dtest"><img alt="GitHub MIR test status on aarch64" src="https://github.com/vnmakarov/mir/workflows/aarch64%2Dtest/badge.svg"></a> <a href="https://github.com/vnmakarov/mir/actions?query=workflow%3Appc64le%2Dtest"><img alt="GitHub MIR test status on ppc64le" src="https://github.com/vnmakarov/mir/workflows/ppc64le%2Dtest/badge.svg"></a> <a href="https://github.com/vnmakarov/mir/actions?query=workflow%3As390x%2Dtest"><img alt="GitHub MIR test status on s390x" src="https://github.com/vnmakarov/mir/workflows/s390x%2Dtest/badge.svg"></a> <a href="https://github.com/vnmakarov/mir/actions?query=workflow%3Ariscv64%2Dtest"><img alt="GitHub MIR test status on riscv64" src="https://github.com/vnmakarov/mir/workflows/riscv64%2Dtest/badge.svg"></a> <a href="https://github.com/vnmakarov/mir/actions?query=workflow%3AAMD64%2DLinux%2Dbench"><img alt="GitHub MIR benchmark status" src="https://github.com/vnmakarov/mir/workflows/AMD64%2DLinux%2Dbench/badge.svg"></a> </p>

MIR Project

  • MIR means Medium Internal Representation
  • MIR project goal is to provide a basis to implement fast and lightweight JITs
  • Plans to try MIR light-weight JIT first for CRuby or/and MRuby implementation
  • Motivations for the project can be found in this blog post
  • C2MIR compiler description can be found in this blog post
  • Future of code specialization in MIR for dynamic language JITs can be found in this blog post

Disclaimer

  • There is absolutely no warranty that the code will work for any tests except ones given here and on platforms other than x86_64 Linux/OSX, aarch64 Linux/OSX(Apple M1), and ppc64le/s390x/riscv64 Linux

MIR

  • MIR is strongly typed IR
  • MIR can represent machine 32-bit and 64-bit insns of different architectures
  • MIR.md contains detail description of MIR and its API. Here is a brief MIR description:
  • MIR consists of modules
    • Each module can contain functions and some declarations and data
    • Each function has signature (parameters and return types), local variables (including function arguments) and instructions
      • Each local variable has type which can be only 64-bit integer, float, double, or long double and can be bound to a particular target machine register
      • Each instruction has opcode and operands
        • Operand can be a local variable (or a function argument), immediate, memory, label, or reference
          • Immediate operand can be 64-bit integer, float, double, or long double value
      • Memory operand has a type, displacement, base and index integer local variable, and integer constant as a scale for the index
        • Memory type can be 8-, 16-, 32- and 64-bit signed or unsigned integer type, float type, double, or long double type
          • When integer memory value is used it is expanded with sign or zero promoting to 64-bit integer value first
      • Label operand has name and used for control flow instructions
      • Reference operand is used to refer to functions and declarations in the current module, in other MIR modules, or for C external functions or declarations
    • opcode describes what the instruction does
    • There are conversion instructions for conversion between different 32- and 64-bit signed and unsigned values, float, double, and long double values
    • There are arithmetic instructions (addition, subtraction, multiplication, division, modulo) working on 32- and 64-bit signed and unsigned values, float, double, and long double values
    • There are logical instructions (and, or, xor, different shifts) working on 32- and 64-bit signed and unsigned values
    • There are comparison instructions working on 32- and 64-bit signed and unsigned values, float, double, and long double values
    • There are local variable address instructions to get address of local variable
    • There are branch insns (unconditional jump, and jump on zero or non-zero value) which take a label as one their operand
    • There are combined comparison and branch instructions taking a label as one operand and two 32- and 64-bit signed and unsigned values, float, double, and long double values
    • There is switch instruction to jump to a label from labels given as operands depending on index given as the first operand
    • There is label address instruction to get a label address and unconditional indirect jump instruction whose operand contains previously taken label address
    • There are function and procedural call instructions
    • There are return instructions optionally returning 32- and 64-bit integer values, float, double, and long double values
    • There are specialized light-weight call and return instructions can be used for fast switching from threaded interpreter to JITted code and vice verse
    • There are property instructions to generated specialized machine code when lazy basic block versioning is used

MIR Example

  • You can create MIR through API consisting of functions for creation of modules, functions, instructions, operands etc
  • You can also create MIR from MIR binary or text file
  • The best way to get a feel about MIR is to use textual MIR representation
  • Example of Eratosthenes sieve on C
#define Size 819000
int sieve (int N) {
  int64_t i, k, prime, count, n; char flags[Size];

  for (n = 0; n < N; n++) {
    count = 0;
    for (i = 0; i < Size; i++)
      flags[i] = 1;
    for (i = 0; i < Size; i++)
      if (flags[i]) {
        prime = i + i + 3;
        for (k = i + prime; k < Size; k += prime)
          flags[k] = 0;
        count++;
      }
  }
  return count;
}
void ex100 (void) {
  printf ("sieve (100) = %d\", sieve (100));
}
  • Example of MIR textual file for the same function:
m_sieve:  module
          export sieve
sieve:    func i32, i32:N
          local i64:iter, i64:count, i64:i, i64:k, i64:prime, i64:temp, i64:flags
          alloca flags, 819000
          mov iter, 0
loop:     bge fin, iter, N
          mov count, 0;  mov i, 0
loop2:    bge fin2, i, 819000
          mov u8:(flags, i), 1;  add i, i, 1
          jmp loop2
fin2:     mov i, 0
loop3:    bge fin3, i, 819000
          beq cont3, u8:(flags,i), 0
          add temp, i, i;  add prime, temp, 3;  add k, i, prime
loop4:    bge fin4, k, 819000
          mov u8:(flags, k), 0;  add k, k, prime
          jmp loop4
fin4:     add count, count, 1
cont3:    add i, i, 1
          jmp loop3
fin3:     add iter, iter, 1
          jmp loop
fin:      ret count
          endfunc
          endmodule
m_ex100:  module
format:   string "sieve (10) = %d\n"
p_printf: proto p:fmt, i32:result
p_sieve:  proto i32, i32:iter
          export ex100
          import sieve, printf
ex100:    func v, 0
          local i64:r
          call p_sieve, sieve, r, 100
          call p_printf, printf, format, r
          endfunc
          endmodule
  • func describes signature of the function (taking 32-bit signed integer argument and returning 32-bit signed integer value) and function argument N which will be local variable of 64-bit signed integer type
    • Function results are described first by their types and have no names. Parameters always have names and go after the result description
    • Function may have more than one result but possible number and combination of result types are currently machine defined
  • You can write several instructions on one line if you separate them by ;
  • The instruction result, if any, is always the first operand
  • We use 64-bit instructions in calculations
  • We could use 32-bit instructions in calculations which would have sense if we use 32-bit CPU
    • When we use 32-bit instructions we take only 32-bit significant part of 64-bit operand and high 32-bit part of the result is machine defined (so if you write a portable MIR code consider the high 32-bit part value is undefined)
  • string describes data in form of C string
    • C string can be used directly as an insn operand. In this case the data will be added to the module and the data address will be used as an operand
  • export describes the module functions or data which are visible outside the current module
  • import describes the module functions or data which should be defined in other MIR modules
  • proto describes function prototypes. Its syntax is the same as func syntax
  • call are MIR instruction to call functions

Running MIR code

  • After creating MIR modules (through MIR API or reading MIR binary or textual files), you should load the modules
    • Loading modules makes visible exported module functions and data
    • You can load external C function with MIR_load_external
  • After loading modules, you should link the loaded modules
    • Linking modules resolves imported module references, initializes data, and set up call interfaces
  • After linking, you can interpret functions from the modules or
View on GitHub
GitHub Stars2.6k
CategoryDevelopment
Updated13h ago
Forks168

Languages

C

Security Score

100/100

Audited on Mar 27, 2026

No findings