scfw

A cross-platform C++ framework for building Windows shellcode. Supports Linux, macOS, and Windows development environments. Produces a position-independent blob that runs in user mode or kernel mode, on x86 or x64.

#include <scfw/runtime.h>
#include <scfw/platform/windows/usermode.h>

IMPORT_BEGIN();
    IMPORT_MODULE("kernel32.dll");
        IMPORT_SYMBOL(WriteConsoleA);
IMPORT_END();

namespace sc {

extern "C" void __fastcall entry(void* argument1, void* argument2)
{
    HANDLE StdOut = NtCurrentPeb()->ProcessParameters->StandardOutput;
    WriteConsoleA(StdOut, _T("Hello, World!\n"), 14, NULL, NULL);
}

} // namespace sc

Build it, extract the .text section, and you have a self-contained shellcode binary that resolves its own imports at runtime.

Motivation

As with all my projects, it boils down to "I need that, current solutions were unsatisfactory, and I want to learn something".

I like to experiment with <abbr title="Virtual Machine Introspection">VMI</abbr> and sometimes it's really useful to be able to inject a piece of code into the memory of some process (and/or kernel) and execute it. And because vmi-rs development happens on Linux, and my daily driver became macOS, I wanted a convenient way to generate Windows shellcode on both.

Although compression and "non-null" shellcode are not the primary goals of this project, they might be interesting additions in the future.

Motivation
Installation
Building
Running Shellcode
Architecture
User-Mode Shellcode
Kernel-Mode Shellcode
Compile-Time Options
Per-Entry Flags
CMake Build Options
Examples
License

Installation

Prerequisites

scfw cross-compiles Windows shellcode from any host OS. You don't need a Windows machine to build.

CMake 3.22+
Ninja build system
clang 19+
- Note: On Windows, clang 21+ currently has issues with /FILEALIGN:1 during linking. If you encounter linker errors, try compiling with -DSCFW_FILE_ALIGNMENT=0 or switch to an older clang version.
LLVM tools: lld-link, llvm-objcopy, llvm-readobj
Windows SDK headers and libraries (can be fetched automatically on any platform, see below)

Dependencies

phnt (Windows native API headers) is fetched automatically by CMake via FetchContent. No action needed.

The Windows SDK is the only dependency that requires some setup, especially on non-Windows hosts. On Windows with MSVC, CMake detects the system SDK automatically. On macOS and Linux (or Windows without the SDK), CMake will tell you it's missing and suggest how to fetch it.

The easiest option is to let CMake download it for you:

cmake --preset x64 -DSCFW_FETCH_WINSDK=ON

This runs scripts/fetch-winsdk.sh (or fetch-winsdk.ps1 on Windows), which uses xwin to download the Windows SDK. If xwin isn't installed, the script looks for a Rust toolchain and installs xwin via cargo install. If Rust isn't installed either, the script downloads a temporary Rust toolchain, installs xwin, downloads the SDK, and then cleans up both the Rust toolchain and xwin. Nothing is left behind on your system - the temporary installations are fully isolated.

Alternatively, you can manually place the Windows SDK into the winsdk/ directory at the project root. The expected structure is:

winsdk/
  crt/
    include/
    lib/{x86,x86_64}/
  sdk/
    include/{ucrt,um,shared}/
    lib/{um,ucrt}/{x86,x86_64}/

Building

The project uses CMake presets for convenience:

# x64 Release
cmake --preset x64
cmake --build build-x64

# x86 Release
cmake --preset x86
cmake --build build-x86

Debug builds are also available (x64-debug, x86-debug), but shellcode extraction is disabled in Debug mode. You get a PE executable for debugging instead.

After building, each example produces both a .exe and a .bin (the extracted shellcode).

Running Shellcode

The scrun tool loads a shellcode binary into executable memory and runs it. It's built alongside the examples and requires Windows to run.

.\build-x64\tools\scrun.exe .\build-x64\examples\writeconsole\writeconsole.bin
.\build-x64\tools\scrun.exe .\build-x64\examples\opengl_triangle\opengl_triangle.bin
.\build-x86\tools\scrun.exe .\build-x86\examples\messagebox\messagebox.bin

scrun accepts two optional arguments that are passed to the shellcode via the first and second parameters (RCX/ECX and RDX/EDX respectively):

.\build-x64\tools\scrun.exe shellcode.bin 0x12345 0x67890

After the shellcode returns, scrun checks whether the shellcode freed its own memory (i.e. whether SCFW_OPT_CLEANUP was enabled) and reports the result.

Fun fact: on Windows on ARM64, the binary translation layer can run both x86 and x64 shellcodes via scrun. However, when emulating x64, xtajit64.dll (or xtajit64se.dll) is the 2nd module in the PEB load order instead of kernel32.dll, which breaks the fast-path lookup. If your shellcode imports from kernel32.dll and you want it to work under ARM64 emulation, define SCFW_ENABLE_FULL_MODULE_SEARCH to use the generic PEB walker instead.
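To make the difference between the two lookup strategies concrete, here is a toy model in portable C++. It is not scfw's code: the load order is modelled as a plain vector of names, the module lists in the usage note are made up for illustration, and find_module_fast / find_module_full are hypothetical names. The model's fast path checks the name at the fixed position so the miss is visible; a real fast path may not check at all.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Case-insensitive ASCII comparison of two module names.
inline bool same_module_name(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) return false;
    }
    return true;
}

// Fast path (model): only look at one fixed position in the load order.
// Returns the index on a match, -1 otherwise.
inline int find_module_fast(const std::vector<std::string>& load_order,
                            std::size_t expected_index,
                            const std::string& wanted) {
    if (expected_index >= load_order.size()) return -1;
    return same_module_name(load_order[expected_index], wanted)
               ? static_cast<int>(expected_index)
               : -1;
}

// Generic walk (model): compare every entry until the name matches.
inline int find_module_full(const std::vector<std::string>& load_order,
                            const std::string& wanted) {
    for (std::size_t i = 0; i < load_order.size(); ++i) {
        if (same_module_name(load_order[i], wanted)) return static_cast<int>(i);
    }
    return -1;
}
```

With an illustrative list such as {"ntdll.dll", "xtajit64.dll", "KERNEL32.DLL"}, the fixed-position lookup for kernel32.dll at index 1 misses, while the full walk still finds it at index 2.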

<img src="assets/opengl_triangle.png" alt="scrun output" width="600"> Those who'd like to point out that more impressive sub-4kB demos exist will be mercilessly frowned upon.

Architecture

scfw compiles your code into a PE executable, then extracts the .text section as a raw binary. The trick is getting everything (code, data, constants, and the import resolution logic) into that single section, in the right order, with no absolute address fixups.

The Dispatch Table

The core of scfw is a dispatch table built entirely at compile time using C++ template metaprogramming. Each IMPORT_MODULE and IMPORT_SYMBOL macro creates a new template specialization that inherits from the previous one, forming a chain:

dispatch_table_impl<0, Mode>         base class
  - holds fn pointers: cleanup_, free_, load_module_, unload_module_, lookup_symbol_
  - provides find_module(), lookup_symbol()
        |
dispatch_table_impl<1, Mode>         IMPORT_MODULE("kernel32.dll")
  - adds: module_ (handle to loaded/found module)
  - init() calls find_module() or load_module()
        |
dispatch_table_impl<2, Mode>         IMPORT_SYMBOL(WriteConsoleA)
  - adds: slot_WriteConsoleA_ (typed function pointer)
  - init() resolves the symbol from the parent module
        |
dispatch_table                       IMPORT_END (final alias)

The __COUNTER__ macro gives each entry a unique ID, and IMPORT_END() seals the chain, instantiates a global __dispatch_table, and generates the _entry() wrapper function. This wrapper initializes the dispatch table (resolving all modules and symbols), calls your entry() function, and optionally tears things down (e.g. FreeLibrary for dynamically loaded modules).
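The chaining idea can be reproduced in a few lines of standalone C++. This is a simplified sketch, not scfw's actual macros: chain_table_impl, CHAIN_IMPORT, chain_lookup and the two chain_* functions are invented for the example, and the resolver is a string comparison rather than an export-table walk. It assumes a compiler that supports __COUNTER__ (clang, GCC and MSVC do).

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for functions exported by some module.
inline int chain_add(int a, int b) { return a + b; }
inline int chain_mul(int a, int b) { return a * b; }

using any_fn = void (*)();

// Hypothetical resolver; a real one would walk a module's export directory.
inline any_fn chain_lookup(const char* name) {
    if (std::strcmp(name, "chain_add") == 0) return reinterpret_cast<any_fn>(&chain_add);
    if (std::strcmp(name, "chain_mul") == 0) return reinterpret_cast<any_fn>(&chain_mul);
    return nullptr;
}

template <int Id> struct chain_table_impl;  // primary template, never defined

enum { chain_base_id = __COUNTER__ };       // wherever the counter currently stands

// Root of the chain: nothing to resolve.
template <> struct chain_table_impl<chain_base_id> {
    bool init() { return true; }
};

// Each import specializes the next ID and inherits from the previous one,
// adding one typed slot and extending init().
#define CHAIN_IMPORT_ID(fn, id)                                                \
    template <> struct chain_table_impl<id> : chain_table_impl<(id) - 1> {     \
        decltype(&fn) slot_##fn##_ = nullptr;                                  \
        bool init() {                                                          \
            if (!chain_table_impl<(id) - 1>::init()) return false;             \
            slot_##fn##_ = reinterpret_cast<decltype(&fn)>(chain_lookup(#fn)); \
            return slot_##fn##_ != nullptr;                                    \
        }                                                                      \
    }
#define CHAIN_IMPORT(fn) CHAIN_IMPORT_ID(fn, __COUNTER__)

CHAIN_IMPORT(chain_add);
CHAIN_IMPORT(chain_mul);

// Sealing the chain: alias the most recently defined specialization.
using chain_table = chain_table_impl<__COUNTER__ - 1>;

// Usage: resolve everything once, then call through a slot.
inline int chain_example() {
    chain_table table;
    const bool ok = table.init();
    assert(ok);
    (void)ok;
    return table.slot_chain_add_(2, 3);
}
```

Because every link only adds a data member, the final type is just the slots laid out one after another, and init() resolves them in declaration order.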

After IMPORT_END(), imported symbols are accessible through proxy objects in the sc namespace. When you write WriteConsoleA(...) in your code, it reads the function pointer from the dispatch table and calls through it. There's no runtime metadata, no string tables, no relocation records. Just a flat struct of function pointers.
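A proxy object of this kind can be as small as a stateless struct with a call operator. The sketch below is an assumption about the general shape, not a copy of scfw's implementation: proxy_table, proxy_sc::add and proxy_target_add are hypothetical names, and the table slot is filled by hand instead of by an init() step.

```cpp
#include <cassert>
#include <utility>

// Hypothetical import target.
inline int proxy_target_add(int a, int b) { return a + b; }

// Flat table of resolved pointers.
struct proxy_table {
    int (*slot_add_)(int, int) = nullptr;
};
static proxy_table proxy_table_instance;

namespace proxy_sc {

// Stateless proxy: calling it reads the slot and calls through it.
struct add_proxy {
    template <class... Args>
    auto operator()(Args&&... args) const {
        assert(proxy_table_instance.slot_add_ != nullptr);  // table must be filled first
        return proxy_table_instance.slot_add_(std::forward<Args>(args)...);
    }
};

constexpr add_proxy add{};

}  // namespace proxy_sc
```

Once the slot is set (proxy_table_instance.slot_add_ = &proxy_target_add), a call such as proxy_sc::add(2, 3) reads like a direct call but is really one indirect call through the table.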

Section Layout

The linker merges .data and .rdata into .text, producing a single PE section with read/write/execute permissions. Within .text, ordering is controlled via MSVC-style subsection naming (.text$00, .text$10, ...), which the linker sorts alphabetically:

Section       Contents                    Source
.text$00      _init                       lib/src/arch/*/init.S
.text$10      _start, _pc, _cleanup_*     lib/src/arch/*/start.S
.text$20      _entry                      generated by IMPORT_END()
.text$aaa     framework code              runtime.h, crt0.h, ...
.text$yyy     user code                   your entry() and everything after
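The suffixes in the table are chosen so that a plain sort puts them in this order: digits compare lower than lowercase letters. A quick way to convince yourself is to sort the names with std::sort. The sketch assumes the linker orders subsections by byte-wise comparison of their names; sort_subsections and first_subsection are made-up helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Orders subsection names by plain byte-wise comparison (assumed to match
// how the linker orders the part after '$').
inline std::vector<std::string> sort_subsections(std::vector<std::string> names) {
    std::sort(names.begin(), names.end());
    return names;
}

// The subsection that ends up at offset 0 of the merged section.
inline std::string first_subsection(const std::vector<std::string>& names) {
    assert(!names.empty());
    return sort_subsections(names).front();
}
```

Feeding it the five names from the table in any order gives .text$00 first and .text$yyy last, which is why numeric suffixes are used for the startup pieces and letters for everything else.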

_init is the PE entry point. It must be at the very beginning of the binary since that's where execution starts when you jump to the shellcode's base address. The startup code, dispatch table initialization, and user code follow in a deterministic order.

After building, a post-build step verifies the PE has exactly one section (or two, if debug info is enabled) and extracts .text to the final .bin file using llvm-objcopy.
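For illustration, the "exactly one section" check boils down to reading a single field from the PE headers. The sketch below is not the project's post-build step (that one relies on the LLVM tools); it is a hypothetical helper, pe_section_count, that reads NumberOfSections from an in-memory image using the standard PE layout: e_lfanew at offset 0x3C, then the "PE\0\0" signature, then the COFF header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads an n-byte little-endian integer at offset `off`.
inline std::uint32_t read_le(const std::vector<std::uint8_t>& b, std::size_t off, std::size_t n) {
    assert(n <= 4 && off + n <= b.size());
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < n; ++i) {
        v |= static_cast<std::uint32_t>(b[off + i]) << (8 * i);
    }
    return v;
}

// Returns NumberOfSections from the COFF header, or -1 if the buffer
// does not look like a PE image.
inline int pe_section_count(const std::vector<std::uint8_t>& image) {
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z') return -1;
    const std::size_t pe = read_le(image, 0x3C, 4);          // e_lfanew
    if (pe > image.size() - 8) return -1;                    // signature + 4 header bytes must fit
    if (read_le(image, pe, 4) != 0x00004550u) return -1;     // "PE\0\0"
    return static_cast<int>(read_le(image, pe + 6, 2));      // NumberOfSections
}
```

A build script could treat any result other than 1 (or 2 with debug info) as a sign that some data escaped the merged .text section.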

Position-Independent Code

Shellcode can be loaded at any address, so all memory references must be position-independent. This is where x86 and x64 diverge.

x64 has RIP-relative addressing, so mov rax, [rip + symbol] just works. The compiler generates position-independent code by default and _pic() is a no-op. Nothing special is needed.

x86 doesn't have an instruction pointer-relative addressing mode. The compiler generates absolute addresses like mov eax, offset symbol, and those addresses are wrong when the shellcode is loaded somewhere other than its compile-time base.

scfw solves this with a runtime PIC relocation scheme. It relies on the fact that while absolute addresses change, the differences between addresses stay the same no matter where the code is loaded. The _pc() function (implemented via the classic call/pop trick) returns its own runtime address:

_pc:
    call    1f
1:  pop     eax
    sub     eax, 5      ; call is 5 bytes
    ret
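The "differences stay the same" observation is easy to check with plain integer arithmetic. The following is a model of the idea only, not scfw's _pic(): rebase is a hypothetical helper, and the addresses in the example are invented. Addresses are modelled as 32-bit unsigned values, so the arithmetic wraps the same way x86 registers do.

```cpp
#include <cassert>
#include <cstdint>

// Translates a link-time address into a runtime address, given one anchor
// whose link-time and runtime addresses are both known (e.g. a _pc-style helper).
inline std::uint32_t rebase(std::uint32_t link_addr,       // address the linker assigned to the symbol
                            std::uint32_t link_anchor,     // address the linker assigned to the anchor
                            std::uint32_t runtime_anchor)  // anchor address observed at run time
{
    // The distance between symbol and anchor is the same before and after loading.
    return link_addr - link_anchor + runtime_anchor;
}

// Usage with invented numbers: anchor linked at 0x00401010, symbol at 0x00401234,
// and the anchor observed at 0x7ffe0010 once the blob is loaded.
inline void rebase_example() {
    assert(rebase(0x00401234u, 0x00401010u, 0x7ffe0010u) == 0x7ffe0234u);
}
```

The same formula works for symbols placed before the anchor, because the subtraction wraps modulo 2^32 and the addition wraps back.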

Then _pic() computes the correct runtime address of
