Mcsema

Framework for lifting x86, amd64, aarch64, sparc32, and sparc64 program binaries to LLVM bitcode

Generate Convert Improve

Install / Use

/learn @lifting-bits/Mcsema

About this skill

Quality Score

0/100

README

McSema

McSema is an executable lifter. It translates ("lifts") executable binaries from native machine code to LLVM bitcode. LLVM bitcode is an intermediate representation form of a program that was originally created for the retargetable LLVM compiler, but which is also very useful for performing program analysis methods that would not be possible to perform on an executable binary directly.

McSema enables analysts to find and retroactively harden binary programs against security bugs, independently validate vendor source code, and generate application tests with high code coverage. McSema isn’t just for static analysis. The lifted LLVM bitcode can also be fuzzed with libFuzzer, an LLVM-based instrumented fuzzer that would otherwise require the target source code. The lifted bitcode can even be compiled back into a runnable program! This is a procedure known as static binary rewriting, binary translation, or binary recompilation.

McSema supports lifting both Linux (ELF) and Windows (PE) executables, and understands most x86 and amd64 instructions, including integer, X87, MMX, SSE and AVX operations. AARCH64 (ARMv8) instruction support is in active development.

Using McSema is a two-step process: control flow recovery, and instruction translation. Control flow recovery is performed using the mcsema-disass tool, which relies on IDA Pro to disassemble a binary file and produce a control flow graph. Instruction translation is then performed using the mcsema-lift tool, which converts the control flow graph into LLVM bitcode. Under the hood, the instruction translation capability of mcsema-lift is implemented in the remill library. The development of remill was a result of refactoring and improvements to McSema, and was first introduced with McSema version 2.0.0. Read more about remill here.

McSema and remill were developed and are maintained by Trail of Bits, funded by and used in research for DARPA and the US Department of Defense.

Build status

| | master | | ----- | ---------------------------------------- | | Linux | |

Features

Lifts 32- and 64-bit Linux ELF and Windows PE binaries to bitcode, including executables and shared libraries for each platform.
Supports a large subset of x86 and x86-64 instructions, including most integer, X87, MMX, SSE, and AVX operations.
Supports a large subset of AArch64, SPARCv8+ (SPARC32), and SPARCv9 (SPARC64) instuctions.
McSema runs on Windows and Linux and has been tested on Windows 7, 10, Ubuntu (14.04, 16.04, 18.04), and openSUSE.
McSema can cross-lift: it can translate Linux binaries on Windows, or Windows binaries on Linux.
Output bitcode is compatible with the LLVM toolchain (versions 3.5 and up).
Translated bitcode can be analyzed or recompiled as a new, working executable with functionality identical to the original.

Use-cases

Why would anyone translate binaries back to bitcode?

Binary Patching And Modification. Lifting to LLVM IR lets you cleanly modify the target program. You can run obfuscation or hardening passes, add features, remove features, rewrite features, or even fix that pesky typo, grammatical error, or insane logic. When done, your new creation can be recompiled to a new binary sporting all those changes. In the Cyber Grand Challenge, we were able to use McSema to translate challenge binaries to bitcode, insert memory safety checks, and then re-emit working binaries.
Symbolic Execution with KLEE. KLEE operates on LLVM bitcode, usually generated by providing source to the LLVM toolchain. McSema can lift a binary to LLVM bitcode, permitting KLEE to operate on previously unavailable targets. See our walkthrough showing how to run KLEE on a symbolic maze.
Re-use existing LLVM-based tools. KLEE is not the only tool that becomes available for use on bitcode. It is possible to run LLVM optimization passes and other LLVM-based tools like libFuzzer on lifted bitcode.
Analyze the binary rather than the source. Source level analysis is great but not always possible (e.g. you don't have the source) and, even when it is available, it lacks compiler transformations, re-ordering, and optimizations. Analyzing the actual binary guarantees that you're analyzing the true executed behavior.
Write one set of analysis tools. Lifting to LLVM IR means that one set of analysis tools can work on both the source and the binary. Maintaining a single set of tools saves development time and effort, and allows for a single set of better tools.

Comparison with other machine code to LLVM bitcode lifters

| | McSema | dagger | llvm-mctoll | retdec | reopt | rev.ng | bin2llvm | fcd | RevGen | Fracture | libbeauty | | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | | Actively maintained? | Yes | No | Yes | Yes | Yes | No | Maybe | Maybe | Maybe | No | Yes | | Commercial support available? | Yes | No | No | No | Maybe | No | No | No | No | Maybe | No | | LLVM versions | 9 - 11 | 5 | current | 4.0 | 3.8 | 3.8 | 3.2 | 4 | 3.9 | 3.4 | 6 | | Builds with CI? | Yes | No | No | Yes | No | No | Yes | Maybe | Maybe | No | No | | 32-bit architectures | x86, SPARC32 | x86 | ARM | x86, ARM, MIPS, PIC32, PowerPC | | ARM, MIPS | S2E | S2E | S2E | ARM, x86 | | | 64-bit architectures | x86-64, AArch64, SPARC64 | x86-64, AArch64) | x86-64 | x86-64, arm64 & more | x86-64 | x86-64 | | S2E | S2E | PowerPC | x86-64 | | Control-flow recovery | IDA Pro | Ad-hoc | Ad-hoc | Ad-hoc | Ad-hoc | Ad-hoc | Ad-hoc | Ad-hoc | McSema | Ad-hoc | Ad-hoc | | File formats | ELF, PE | ELF, Mach-O | | ELF, PE, Mach-O, COFF, AR, Intel HEX, Raw | ELF | ELF | ELF | | ELF, PE | ELF, Mach-O (maybe) | ELF | | Bitcode is executable? | Yes | Yes | Yes | Yes | Yes | Yes | No | No | CGC | No | No | | C++ exceptions suport? | Yes | No | No | No | No | Indirectly | No | No | No | No | Maybe | | Lifts stack variables? | Yes | No | Maybe | Yes | No | No | No | Yes | No | No | Maybe | | Lifts global variables? | Yes | Maybe | Yes | Yes | No | Maybe | No | No | No | Yes | Maybe | | Has a test suite? | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | No |

Note: We label some architectures as "S2E" to mean any architecture supported by the S2E system. A system using "McSema" for control-flow recovery (e.g. RevGen) uses McSema's CFG.proto format for recovering control-flow. In the case of RevGen, only bitcode produced from DARPA Cyber Grand Challenge (CGC) binaries is executable.

Dependencies

| Name | Version | | ---- | ------- | | Git | Latest | | CMake | 3.14+ | | Remill | 710013a | | Anvill | bc3183b | | Python | 3.8 | | Python Package Index | Latest | | python-protobuf | 3.2.0 | | python-clang | 3.5.0 | | ccsyspath | 1.1.0 | | IDA Pro | 7.5+ | | macOS | Latest | | Ubuntu | 18.04, 20.04 |

DynInst support is optional if you use the experimental DynInst disassembler. Note: We do not provide support for the DynInst disassembler.

Getting and building the code

Docker

Step 1: Clone the repository

git clone https://github.com/lifting-bits/mcsema
cd mcsema

Step 2: Add your disassembler to the Dockerfile

Currently IDA is the only supported frontend for control-flow recovery, it's left as an exercise to the reader to install your disassembler of choice. Experimental support for DynInst is available but may be buggy and sometimes get out of date, as we do not officially support it. DynInst support is provided as an exemplar of how to make a third-party disassembler.

Step 3: Build & Run Dockerfile

This will build the container for you and run it with your local directory mounted into the container (at /mcsema/local) such that your work in the container is saved locally:

# Build McSema container
ARCH=amd64; UBUNTU=18.04; LLVM=9; docker build . \
  -t mcsema:llvm${LLVM}-ubuntu${UBUNTU}-${ARCH} \
  -f Dockerfile \
  --build-arg UBUNTU_VERSION=${UBUNTU} \
  --build-arg LLVM_VERSION=${LLVM} \
  --build-arg ARCH=${ARCH}

# Run McSema container lifter
docker run --rm -it --ip

Related Skills

node-connect

339.3k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

83.9k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

openai-whisper-api

339.3k

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

commit-push-pr

83.9k

Commit, push, and open a PR