scfw
A cross-platform C++ framework for building Windows shellcode. It supports Linux, macOS, and Windows development environments and produces a position-independent blob that runs in user mode or kernel mode, on x86 or x64.
#include <scfw/runtime.h>
#include <scfw/platform/windows/usermode.h>
IMPORT_BEGIN();
IMPORT_MODULE("kernel32.dll");
IMPORT_SYMBOL(WriteConsoleA);
IMPORT_END();
namespace sc {

extern "C" void __fastcall entry(void* argument1, void* argument2)
{
    HANDLE StdOut = NtCurrentPeb()->ProcessParameters->StandardOutput;
    WriteConsoleA(StdOut, _T("Hello, World!\n"), 14, NULL, NULL);
}

} // namespace sc
Build it, extract the .text section, and you have a self-contained shellcode binary that resolves its own imports at runtime.
Motivation
As with all my projects, it boils down to "I need that, current solutions were unsatisfactory, and I want to learn something".
I like to experiment with <abbr title="Virtual Machine Introspection">VMI</abbr> and sometimes it's really useful to be able to inject a piece of code into the memory of some process (and/or kernel) and execute it. And because vmi-rs development happens on Linux, and my daily driver became macOS, I wanted a convenient way to generate Windows shellcode on them.
Although compression and "non-null" shellcode are not the primary goals of this project, they might be interesting additions in the future.
Table of Contents
- Motivation
- Installation
- Building
- Running Shellcode
- Architecture
- User-Mode Shellcode
- Kernel-Mode Shellcode
- Compile-Time Options
- Per-Entry Flags
- CMake Build Options
- Examples
- License
Installation
Prerequisites
scfw cross-compiles Windows shellcode from any host OS. You don't need a Windows machine to build.
- CMake 3.22+
- Ninja build system
- clang 19+
  - Note: On Windows, clang 21+ currently has issues with /FILEALIGN:1 during linking. If you encounter linker errors, try compiling with -DSCFW_FILE_ALIGNMENT=0 or switch to an older clang version.
- LLVM tools: lld-link, llvm-objcopy, llvm-readobj
- Windows SDK headers and libraries (can be fetched automatically on any platform, see below)
Dependencies
phnt (Windows native API headers) is fetched automatically by CMake via FetchContent. No action needed.
Windows SDK is the only dependency that requires some setup, especially on non-Windows hosts. On Windows with MSVC, CMake detects the system SDK automatically. On macOS and Linux (or Windows without the SDK), CMake will tell you it's missing and suggest how to fetch it.
The easiest option is to let CMake download it for you:
cmake --preset x64 -DSCFW_FETCH_WINSDK=ON
This runs scripts/fetch-winsdk.sh (or fetch-winsdk.ps1 on Windows), which uses xwin to download the Windows SDK. If xwin isn't installed, the script looks for a Rust toolchain and installs xwin via cargo install. If Rust isn't installed either, the script downloads a temporary Rust toolchain, installs xwin, downloads the SDK, and then cleans up both the Rust toolchain and xwin. Nothing is left behind on your system - the temporary installations are fully isolated.
Alternatively, you can manually place the Windows SDK into the winsdk/ directory at the project root. The expected structure is:
winsdk/
  crt/
    include/
    lib/{x86,x86_64}/
  sdk/
    include/{ucrt,um,shared}/
    lib/{um,ucrt}/{x86,x86_64}/
Building
The project uses CMake presets for convenience:
# x64 Release
cmake --preset x64
cmake --build build-x64
# x86 Release
cmake --preset x86
cmake --build build-x86
Debug builds are also available (x64-debug, x86-debug), but shellcode extraction is disabled in Debug mode. You get a PE executable for debugging instead.
After building, each example produces both a .exe and a .bin (the extracted shellcode).
Running Shellcode
The scrun tool loads a shellcode binary into executable memory and runs it. It's built alongside the examples and requires Windows to run.
.\build-x64\tools\scrun.exe .\build-x64\examples\writeconsole\writeconsole.bin
.\build-x64\tools\scrun.exe .\build-x64\examples\opengl_triangle\opengl_triangle.bin
.\build-x86\tools\scrun.exe .\build-x86\examples\messagebox\messagebox.bin
scrun accepts two optional arguments that are passed to the shellcode via the first and second parameters (RCX/ECX and RDX/EDX respectively):
.\build-x64\tools\scrun.exe shellcode.bin 0x12345 0x67890
After the shellcode returns, scrun checks whether the shellcode freed its own memory (i.e. whether SCFW_OPT_CLEANUP was enabled) and reports the result.
<p align="center"> <img src="assets/opengl_triangle.png" alt="scrun output" width="600"> <br> <em>Those who'd like to point out that more impressive sub-4kB demos exist will be mercilessly frowned upon.</em> </p>
Fun fact: on Windows on ARM64, the binary translation layer can run both x86 and x64 shellcode via scrun. However, when emulating x64, xtajit64.dll (or xtajit64se.dll) is the 2nd module in the PEB load order instead of kernel32.dll, which breaks the fast-path lookup. If your shellcode imports from kernel32.dll and you want it to work under ARM64 emulation, define SCFW_ENABLE_FULL_MODULE_SEARCH to use the generic PEB walker instead.
Architecture
scfw compiles your code into a PE executable, then extracts the .text section as a raw binary. The trick is getting everything (code, data, constants, and the import resolution logic) into that single section, in the right order, with no absolute address fixups.
The Dispatch Table
The core of scfw is a dispatch table built entirely at compile time using C++ template metaprogramming. Each IMPORT_MODULE and IMPORT_SYMBOL macro creates a new template specialization that inherits from the previous one, forming a chain:
dispatch_table_impl<0, Mode>        base class
  - holds fn pointers: cleanup_, free_, load_module_, unload_module_, lookup_symbol_
  - provides find_module(), lookup_symbol()
        |
dispatch_table_impl<1, Mode>        IMPORT_MODULE("kernel32.dll")
  - adds: module_ (handle to loaded/found module)
  - init() calls find_module() or load_module()
        |
dispatch_table_impl<2, Mode>        IMPORT_SYMBOL(WriteConsoleA)
  - adds: slot_WriteConsoleA_ (typed function pointer)
  - init() resolves the symbol from the parent module
        |
dispatch_table                      IMPORT_END (final alias)
The __COUNTER__ macro gives each entry a unique ID, and IMPORT_END() seals the chain, instantiates a global __dispatch_table, and generates the _entry() wrapper function. This wrapper initializes the dispatch table (resolving all modules and symbols), calls your entry() function, and optionally tears things down (e.g. FreeLibrary for dynamically loaded modules).
After IMPORT_END(), imported symbols are accessible through proxy objects in the sc namespace. When you write WriteConsoleA(...) in your code, it reads the function pointer from the dispatch table and calls through it. There's no runtime metadata, no string tables, no relocation records. Just a flat struct of function pointers.
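The chaining idea can be illustrated with a small host-side sketch. This is not scfw's actual implementation: table_impl, SKETCH_SYMBOL, SKETCH_END, kBase, lookup, and the fake_* functions are all hypothetical names made up for illustration, and the "module" is just two local functions so that the code compiles and runs on any platform. It only shows how __COUNTER__ plus explicit template specializations can grow a struct of typed function pointers one entry at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-ins for real exports so the sketch runs on any host (hypothetical).
inline int fake_add(int a, int b) { return a + b; }
inline int fake_mul(int a, int b) { return a * b; }

using binary_fn = int (*)(int, int);

// Toy resolver playing the role of a symbol lookup in a loaded module.
inline binary_fn lookup(const char* name) {
    if (std::strcmp(name, "add") == 0) return &fake_add;
    if (std::strcmp(name, "mul") == 0) return &fake_mul;
    return nullptr;
}

// Entry 0 is the base of the chain; each later entry derives from the previous one.
template <std::size_t Id>
struct table_impl;

template <>
struct table_impl<0> {
    bool init() { return true; }
};

// Remember where __COUNTER__ stood, so entry IDs start at 1 regardless of earlier uses.
constexpr std::size_t kBase = __COUNTER__;

// Each SKETCH_SYMBOL adds one typed slot and extends init() to resolve it.
#define SKETCH_SYMBOL_IMPL(id, name)                                  \
    template <>                                                       \
    struct table_impl<(id)> : table_impl<(id) - 1> {                  \
        binary_fn slot_##name = nullptr;                              \
        bool init() {                                                 \
            if (!table_impl<(id) - 1>::init()) return false;          \
            slot_##name = lookup(#name);                              \
            return slot_##name != nullptr;                            \
        }                                                             \
    };
#define SKETCH_SYMBOL(name) SKETCH_SYMBOL_IMPL(__COUNTER__ - kBase, name)

// SKETCH_END names the last entry in the chain as the final table type.
#define SKETCH_END() using table = table_impl<__COUNTER__ - kBase - 1>

SKETCH_SYMBOL(add)   // table_impl<1>
SKETCH_SYMBOL(mul)   // table_impl<2>
SKETCH_END();        // table = table_impl<2>

// Usage: initialize once, then call through the resolved slots.
inline int run_sketch() {
    table t;
    const bool ok = t.init();
    assert(ok && "every symbol in the sketch should resolve");
    (void)ok;  // keeps release builds (NDEBUG) warning-free
    return t.slot_add(2, 3) + t.slot_mul(2, 3);  // 5 + 6
}
```

With common compilers the resulting table should contain nothing but the two pointers, since the empty base adds no storage. That matches the "flat struct of function pointers" description above.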
Section Layout
The linker merges .data and .rdata into .text, producing a single PE section with read/write/execute permissions. Within .text, ordering is controlled via MSVC-style subsection naming (.text$00, .text$10, ...), which the linker sorts alphabetically:
Section      Contents                  Source
.text$00     _init                     lib/src/arch/*/init.S
.text$10     _start, _pc, _cleanup_*   lib/src/arch/*/start.S
.text$20     _entry                    generated by IMPORT_END()
.text$aaa    framework code            runtime.h, crt0.h, ...
.text$yyy    user code                 your entry() and everything after
_init is the PE entry point. It must be at the very beginning of the binary since that's where execution starts when you jump to the shellcode's base address. The startup code, dispatch table initialization, and user code follow in a deterministic order.
After building, a post-build step verifies the PE has exactly one section (or two, if debug info is enabled) and extracts .text to the final .bin file using llvm-objcopy.
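To see why these particular suffixes give the layout above, here is a tiny host-side sketch that sorts the subsection names the way the section layout describes. It assumes the linker's "alphabetical" order is a plain byte-wise comparison of the names; link_order and first_in_image are hypothetical helpers, not part of scfw or any linker.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns the subsections in the order a byte-wise sort of their names gives.
// Assumption: the linker's "alphabetical" order matches plain byte comparison.
inline std::vector<std::string> link_order(std::vector<std::string> subsections) {
    std::sort(subsections.begin(), subsections.end());
    return subsections;
}

// The subsection that lands at offset 0 of the merged section.
inline std::string first_in_image(const std::vector<std::string>& subsections) {
    assert(!subsections.empty());
    return link_order(subsections).front();
}
```

Because ASCII digits compare lower than lowercase letters, the numbered subsections (startup code) always precede .text$aaa and .text$yyy, no matter in which order the object files are handed to the linker.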
Position-Independent Code
Shellcode can be loaded at any address, so all memory references must be position-independent. This is where x86 and x64 diverge.
x64 has RIP-relative addressing, so mov rax, [rip + symbol] just works. The compiler generates position-independent code by default and _pic() is a no-op. Nothing special is needed.
x86 doesn't have an instruction pointer-relative addressing mode. The compiler generates absolute addresses like mov eax, offset symbol, and those addresses are wrong when the shellcode is loaded somewhere other than its compile-time base.
scfw solves this with a runtime PIC relocation scheme. It relies on the fact that while absolute addresses change, the differences between addresses stay the same no matter where the code is loaded. The _pc() function (implemented via the classic call/pop trick) returns its own runtime address:
_pc:
    call 1f
1:  pop eax
    sub eax, 5      ; call is 5 bytes
    ret
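The invariant this scheme relies on (address differences do not depend on the load address) can be checked with plain arithmetic. The following is a host-side sketch with made-up addresses, not scfw's actual code: rebase and example_runtime_address are hypothetical names, and the anchor is assumed to be a symbol whose link-time and runtime addresses are both known, such as _pc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rebases a link-time address using one known
// (link-time, runtime) address pair as the anchor.
constexpr std::uint32_t rebase(std::uint32_t link_addr,
                               std::uint32_t anchor_link_addr,
                               std::uint32_t anchor_runtime_addr) {
    // Unsigned arithmetic wraps modulo 2^32, so this also works when the
    // blob is loaded below its link-time base.
    return link_addr - anchor_link_addr + anchor_runtime_addr;
}

static_assert(rebase(0x00401234u, 0x00401010u, 0x7FFE0010u) == 0x7FFE0234u,
              "a symbol 0x224 bytes past the anchor stays 0x224 bytes past it");

// Worked example with made-up addresses.
inline std::uint32_t example_runtime_address() {
    const std::uint32_t link_pc    = 0x00401010u;  // where the linker placed _pc
    const std::uint32_t link_sym   = 0x00401234u;  // where the linker placed some symbol
    const std::uint32_t runtime_pc = 0x7FFE0010u;  // what _pc() would return after loading
    const std::uint32_t result = rebase(link_sym, link_pc, runtime_pc);
    assert(result - runtime_pc == link_sym - link_pc);  // the delta is preserved
    return result;
}
```

The only runtime input is the anchor's actual address. Everything else is a constant known at link time.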
Then _pic() computes the correct runtime address of
