██████╗ ███████╗ █████╗ ████████╗██╗  ██╗███████╗██╗     ███████╗███████╗██████╗ 
    ██╔══██╗██╔════╝██╔══██╗╚══██╔══╝██║  ██║██╔════╝██║     ██╔════╝██╔════╝██╔══██╗
    ██║  ██║█████╗  ███████║   ██║   ███████║███████╗██║     █████╗  █████╗  ██████╔╝
    ██║  ██║██╔══╝  ██╔══██║   ██║   ██╔══██║╚════██║██║     ██╔══╝  ██╔══╝  ██╔═══╝ 
    ██████╔╝███████╗██║  ██║   ██║   ██║  ██║███████║███████╗███████╗███████╗██║     
    ╚═════╝ ╚══════╝╚═╝  ╚═╝   ╚═╝   ╚═╝  ╚═╝╚══════╝╚══════╝╚══════╝╚══════╝╚═╝

A PoC implementation for an evasion technique to terminate the current thread and restore it before resuming execution, while implementing page protection changes during no execution.

Intro

Sleep and obfuscation methods are well known in the maldev community, with different implementations, they have the objective of hiding from memory scanners while sleeping, usually changing page protections and even adding cool features like encrypting the shellcode, but there is another important point to hide our shellcode, and is hiding the current execution thread. Spoofing the stack is cool, but after thinking a little about it I thought that there is no need to spoof the stack… if there is no stack :)

The usability of this technique is left to the reader to assess, but in any case, I think it is a cool way to review some topics, and learn some maldev for those who, like me, are starting in this world.

The main implementation showed here holds everything that we need to take out of the stack in the data section, as global variables, but an impletementation moving everything to the heap will be published soon. It aims to show some key modifications that needs to be done to make this code pic and injectable.

This repository is mirrored between GitHub and GitLab.

<details> <summary> Whats going on? </summary>

First of all

Everything stated here comes from my understanding of the different topics covered, either from reading or experience during development. Im aware that Im not an expert and the last thing I want to do is spread misinformation, so if you think that something is not correct, I would love you to make me aware of it, you can contact me on twitter, or opening issues in this repo. Thank you very much for your understanding. :)

Basics

The main objective of this technique is clear, terminating the current thread and restoring it before resuming execution, but what exactly does this mean, and which new constraints it puts in place?

To be able to restore the execution, we need to save two things before terminating the thread, first, the CPU state, and secondly the stack, and effectively set them up again after the new thread is launched.

I talked about new constraints that will appear in this techniche, and there are two big ones: First we need to store outside the stack anything that is needed from the moment the thread terminates until the stack is restored, and as you will see, it creates some new challenges.

Second, we always need at least another thread running in our process, since we are terminating our thread, if there are no other threads the process will end. I don't think this a big problem, since most agents are injected in other processes, we can assume that this process will keep at least one thread running.

DeathSleep components:

We can view on this POC 4 core functions:

Main program: This is where you would write your agent code, and it's the portion of the code that will make use of DeathSleep
Awake function: this is the entry point of all our threads, and it's in charge of saving the starting point of the stack that we will be restoring. Also, it's in charge of restoring the stack and CPU context when it's needed, or just launching our main program.
DeathSleep: this is the main function of this technique, and is in charge of backing up the thread context and stack, and also setting everything up for the magic to happen.
Rebirth: A simple function only in charge of launching our new threads.

Saving the stack.

When we are about to save the stack, a question is raised: how much of the stack needs to be saved?

Lets review first what is in the stack after we called the DeathSleep function (this is the function that saves the context, stack and prepares everything for the obfuscation and restoration)

As we can see, every function has three parts:

Shadow space: this is a 32 bytes space, allocated by the caller, but used by the callee. As far as I know and I could see, its main function is to hold, if needed, the arguments passed to the callee function in registers, but it can be used for anything the callee function decides.
Return address: this is the address of the next instruction to be executed in the caller function, pushed by the CALL instruction, so the RET instruction in the callee will just take this address and “jump” to it when it ends.
Function stack space: This is the space reserved by the callee to store the value of registers that need to be restored and the value of its local variables.

The minimum portion of the stack that we obviously need to save is everything inside our main program, that means its shadow space, its return address and everything until the DeathSleep function. Anything before that is not really required (saving the stack used by the entry function has its advantages, but we will discuss that later), as that is the stack used by windows routines for launching our new thread. Apart from this, I decided to also store the shadow space of the DeathSleep function (not really needed, but it makes the calculation of the Rsp at the moment of waking up easier).

So at the end, we are saving this:

How to find our stack addresses:

Every function on a standart compilation should be composed of 3 parts, the prologue, the function code, and the epilogue.

The Rsp (stack pointer), should only be modified on the function prologue and epilogue. The prologue increases the stack pointer (remember increasing the stack means reducing addresses, since they go in opposite directions), to save registers, to hold all its local variables and then to hold the shadow space, and the epilogue does exactly the opposite.

This means that the stack pointer inside the function code should always point to the end of the shadow space (purple on the above image), and the sum of the stack size of the function and the shadow space can be found by calculating how much the prologue increases the stack pointer. This value can be easily calculated using the information held in the unwind tables, an explanation about their usage is something we will not cover here, but as a summary, those tables are used to allow any other thread or process to correctly move through the stack to see its content, to handle exceptions or to analyze it.

Capturing and preparing the context to restore.

Capturing the context is probably one of the easiest things to do, as we can simply do RtlCaptureContext() at the first line of DeathSleep, before any modification is done to the non-volatile registers. We still need to do two modifications to the context where we will restore the execution.

The first one will be modifying its Rip, as you remember, this is the register that holds the next instruction to run, and if we just leave this unmodified, execution will resume inside the DeathSleep function. What we will be doing is changing the Rip to hold the return address of DeathSleep, which is pointed by the current Rsp displaced the size increased in the epilogue (green + purple areas in the images above).

The second modification will be done when the thread is restored, and it implies setting the Rsp to point to the top of our restored stack; this will be done during the restoration phase, since we don't know where our new stack will be placed. The value will simply be the end address of our restored stack, since as we discussed later, we copied also the shadow space reserved by the caller of DeathSleep, and that's exactly the value of RSP before the call to DeathSleep.

Restoring our stack:

Once we reach the point of waking up, just before resuming execution, we need to put our saved stack in place. As we already know our saved stack starts at the address captured by the awake function, so the new captured address will be the starting point where we will be placing our saved stack, but this raises a problem, any call to a function after we placed our old stack would modify it and break it, and doing the cleanup here is really convenient, specially freeing the heap used to hold our stack backup. This means that we need to move our current Rsp and also the parts of the stack that we are currently using to a place outside where we will place our restored stack. Trying to make it clearer here is the problem:

And here is my solution, just move everything away:

Restoring the context:

After we did all the hard work, the last thing to do is using NtContinue, this function allows us to change the current context with our previously captured and modified context, setting the RIP right after the call to DeathSleep, all the register should have the same values as they had when calling DeathSleep, and RSP should be pointing at the top of the stack.

Scheduling the restoration process: Using thread pool API.

Okay, we know the basics of what we need to do to store and restore the current thread, but we need somehow be able to run all of this even when we have no threads. Here is where we meet our lovely Thread Pool API, a tool given by Windows, tha

DeathSleep

Install / Use

README