Miasm
Reverse engineering framework in Python
Install / Use
/learn @cea-sec/MiasmREADME
What is Miasm?
Miasm is a free and open source (GPLv2) reverse engineering framework. Miasm aims to analyze / modify / generate binary programs. Here is a non exhaustive list of features:
- Opening / modifying / generating PE / ELF 32 / 64 LE / BE
- Assembling / Disassembling X86 / ARM / MIPS / SH4 / MSP430
- Representing assembly semantic using intermediate language
- Emulating using JIT (dynamic code analysis, unpacking, ...)
- Expression simplification for automatic de-obfuscation
- ...
See the official blog for more examples and demos.
Table of Contents
- What is Miasm?
- Basic examples
- How does it work?
- Documentation
- Obtaining Miasm
- Testing
- They already use Miasm
- Misc
Basic examples
Assembling / Disassembling
Import Miasm x86 architecture:
>>> from miasm.arch.x86.arch import mn_x86
>>> from miasm.core.locationdb import LocationDB
Get a location db:
>>> loc_db = LocationDB()
Assemble a line:
>>> l = mn_x86.fromstring('XOR ECX, ECX', loc_db, 32)
>>> print(l)
XOR ECX, ECX
>>> mn_x86.asm(l)
['1\xc9', '3\xc9', 'g1\xc9', 'g3\xc9']
Modify an operand:
>>> l.args[0] = mn_x86.regs.EAX
>>> print(l)
XOR EAX, ECX
>>> a = mn_x86.asm(l)
>>> print(a)
['1\xc8', '3\xc1', 'g1\xc8', 'g3\xc1']
Disassemble the result:
>>> print(mn_x86.dis(a[0], 32))
XOR EAX, ECX
Using Machine abstraction:
>>> from miasm.analysis.machine import Machine
>>> mn = Machine('x86_32').mn
>>> print(mn.dis('\x33\x30', 32))
XOR ESI, DWORD PTR [EAX]
For MIPS:
>>> mn = Machine('mips32b').mn
>>> print(mn.dis(b'\x97\xa3\x00 ', "b"))
LHU V1, 0x20(SP)
Intermediate representation
Create an instruction:
>>> machine = Machine('arml')
>>> instr = machine.mn.dis('\x00 \x88\xe0', 'l')
>>> print(instr)
ADD R2, R8, R0
Create an intermediate representation object:
>>> lifter = machine.lifter_model_call(loc_db)
Create an empty ircfg:
>>> ircfg = lifter.new_ircfg()
Add instruction to the pool:
>>> lifter.add_instr_to_ircfg(instr, ircfg)
Print current pool:
>>> for lbl, irblock in ircfg.blocks.items():
... print(irblock)
loc_0:
R2 = R8 + R0
IRDst = loc_4
Working with IR, for instance by getting side effects:
>>> for lbl, irblock in ircfg.blocks.items():
... for assignblk in irblock:
... rw = assignblk.get_rw()
... for dst, reads in rw.items():
... print('read: ', [str(x) for x in reads])
... print('written:', dst)
... print()
...
read: ['R8', 'R0']
written: R2
read: []
written: IRDst
More information on Miasm IR is in the corresponding Jupyter Notebook.
Emulation
Giving a shellcode:
00000000 8d4904 lea ecx, [ecx+0x4]
00000003 8d5b01 lea ebx, [ebx+0x1]
00000006 80f901 cmp cl, 0x1
00000009 7405 jz 0x10
0000000b 8d5bff lea ebx, [ebx-1]
0000000e eb03 jmp 0x13
00000010 8d5b01 lea ebx, [ebx+0x1]
00000013 89d8 mov eax, ebx
00000015 c3 ret
>>> s = b'\x8dI\x04\x8d[\x01\x80\xf9\x01t\x05\x8d[\xff\xeb\x03\x8d[\x01\x89\xd8\xc3'
Import the shellcode thanks to the Container abstraction:
>>> from miasm.analysis.binary import Container
>>> c = Container.from_string(s, loc_db)
>>> c
<miasm.analysis.binary.ContainerUnknown object at 0x7f34cefe6090>
Disassembling the shellcode at address 0:
>>> from miasm.analysis.machine import Machine
>>> machine = Machine('x86_32')
>>> mdis = machine.dis_engine(c.bin_stream, loc_db=loc_db)
>>> asmcfg = mdis.dis_multiblock(0)
>>> for block in asmcfg.blocks:
... print(block)
...
loc_0
LEA ECX, DWORD PTR [ECX + 0x4]
LEA EBX, DWORD PTR [EBX + 0x1]
CMP CL, 0x1
JZ loc_10
-> c_next:loc_b c_to:loc_10
loc_10
LEA EBX, DWORD PTR [EBX + 0x1]
-> c_next:loc_13
loc_b
LEA EBX, DWORD PTR [EBX + 0xFFFFFFFF]
JMP loc_13
-> c_to:loc_13
loc_13
MOV EAX, EBX
RET
Initializing the JIT engine with a stack:
>>> jitter = machine.jitter(loc_db, jit_type='python')
>>> jitter.init_stack()
Add the shellcode in an arbitrary memory location:
>>> run_addr = 0x40000000
>>> from miasm.jitter.csts import PAGE_READ, PAGE_WRITE
>>> jitter.vm.add_memory_page(run_addr, PAGE_READ | PAGE_WRITE, s)
Create a sentinelle to catch the return of the shellcode:
def code_sentinelle(jitter):
jitter.running = False
jitter.pc = 0
return True
>>> jitter.add_breakpoint(0x1337beef, code_sentinelle)
>>> jitter.push_uint32_t(0x1337beef)
Active logs:
>>> jitter.set_trace_log()
Run at arbitrary address:
>>> jitter.init_run(run_addr)
>>> jitter.continue_run()
RAX 0000000000000000 RBX 0000000000000000 RCX 0000000000000000 RDX 0000000000000000
RSI 0000000000000000 RDI 0000000000000000 RSP 000000000123FFF8 RBP 0000000000000000
zf 0000000000000000 nf 0000000000000000 of 0000000000000000 cf 0000000000000000
RIP 0000000040000000
40000000 LEA ECX, DWORD PTR [ECX+0x4]
RAX 0000000000000000 RBX 0000000000000000 RCX 0000000000000004 RDX 0000000000000000
RSI 0000000000000000 RDI 0000000000000000 RSP 000000000123FFF8 RBP 0000000000000000
zf 0000000000000000 nf 0000000000000000 of 0000000000000000 cf 0000000000000000
....
4000000e JMP loc_0000000040000013:0x40000013
RAX 0000000000000000 RBX 0000000000000000 RCX 0000000000000004 RDX 0000000000000000
RSI 0000000000000000 RDI 0000000000000000 RSP 000000000123FFF8 RBP 0000000000000000
zf 0000000000000000 nf 0000000000000000 of 0000000000000000 cf 0000000000000000
RIP 0000000040000013
40000013 MOV EAX, EBX
RAX 0000000000000000 RBX 0000000000000000 RCX 0000000000000004 RDX 0000000000000000
RSI 0000000000000000 RDI 0000000000000000 RSP 000000000123FFF8 RBP 0000000000000000
zf 0000000000000000 nf 0000000000000000 of 0000000000000000 cf 0000000000000000
RIP 0000000040000013
40000015 RET
>>>
Interacting with the jitter:
>>> jitter.vm
ad 1230000 size 10000 RW_ hpad 0x2854b40
ad 40000000 size 16 RW_ hpad 0x25e0ed0
>>> hex(jitter.cpu.EAX)
'0x0L'
>>> jitter.cpu.ESI = 12
Symbolic execution
Initializing the IR pool:
>>> lifter = machine.lifter_model_call(loc_db)
>>> ircfg = lifter.new_ircfg_from_asmcfg(asmcfg)
Initializing the engine with default symbolic values:
>>> from miasm.ir.symbexec import SymbolicExecutionEngine
>>> sb = SymbolicExecutionEngine(lifter)
Launching the execution:
>>> symbolic_pc = sb.run_at(ircfg, 0)
>>> print(symbolic_pc)
((ECX + 0x4)[0:8] + 0xFF)?(0xB,0x10)
Same, with step logs (only changes are displayed):
>>> sb = SymbolicExecutionEngine(lifter, machine.mn.regs.regs_init)
>>> symbolic_pc = sb.run_at(ircfg, 0, step=True)
Instr LEA ECX, DWORD PTR [ECX + 0x4]
Assignblk:
ECX = ECX + 0x4
________________________________________________________________________________
ECX = ECX + 0x4
________________________________________________________________________________
Instr LEA EBX, DWORD PTR [EBX + 0x1]
Assignblk:
EBX = EBX + 0x1
________________________________________________________________________________
EBX = EBX + 0x1
ECX = ECX + 0x4
________________________________________________________________________________
Instr CMP CL, 0x1
Assignblk:
zf = (ECX[0:8] + -0x1)?(0x0,0x1)
nf = (ECX[0:8] + -0x1)[7:8]
pf = parity((ECX[0:8] + -0x1) & 0xFF)
of = ((ECX[0:8] ^ (ECX[0:8] + -0x1)) & (ECX[0:8] ^ 0x1))[7:8]
cf = (((ECX[0:8] ^ 0x1) ^ (ECX[0:8] + -0x1)) ^ ((ECX[0:8] ^ (ECX[0:8] + -0x1)) & (ECX[0:8] ^ 0x1)))[7:8]
af = ((ECX[0:8] ^ 0x1) ^ (ECX[0:8] + -0x1))[4:5]
________________________________________________________________________________
af = (((ECX + 0x4)[0:8] + 0xFF) ^ (ECX + 0x4)[0:8] ^ 0x1)[4:5]
pf = parity((ECX + 0x4)[0:8] + 0xFF)
zf = ((ECX + 0x4)[0:8] + 0xFF)?(0x0,0x1)
ECX = ECX + 0x4
of = ((((ECX + 0x4)[0:8] + 0xFF) ^ (ECX + 0x4)[0:8]) & ((ECX + 0x4)[0:8] ^ 0x1))[7:8]
nf = ((ECX + 0x4)[0:8] + 0xFF)[7:8]
cf
