IDAPythonEmbeddedToolkit

IDA Python Embedded Toolkit -- IDAPython scripts for automating analysis of firmware of embedded devices

IDAPython Embedded Toolkit

The IDAPython Embedded Toolkit is a set of scripts that automates many of the steps involved in statically analyzing, or reverse engineering, the firmware of embedded devices in IDA Pro.

Presentations

The IDAPython Embedded Toolkit has been presented at the following venues:

- DerbyCon "IDAPython: The Wonder Woman of Embedded Device Reversing" -- September 2017
  - Recording of Talk: http://www.irongeek.com/i.php?page=videos/derbycon7/t215-idapython-the-wonder-woman-of-embedded-device-reversing-maddie-stone
  - Slides and Demo Videos from Presentation are available in the presentations folder
- RECON Montreal "The Life-Changing Magic of IDAPython: Embedded Device Edition" -- June 2017
  - Recording of Talk: https://recon.cx/media-archive/2017/mtl/recon2017-mtl-20-maddie-stone-The-Life-Changing-Magic-of-IDAPython-Embedded-Device-Edition.mp4
  - Slides and Demo Videos from Presentation are available in the presentations folder

Getting Started

To understand how and why the IDAPython Embedded Toolkit was created, check out the slides and recording from the DerbyCon or RECON Presentations.

The IDAPython Embedded Toolkit is a set of IDAPython scripts written to be processor/architecture-agnostic and automate the triage, analysis, and annotation processes associated with reversing the firmware image of an embedded device. The currently available scripts:

TRIAGE
ANALYSIS
- Calculate Indirect Offset Memory Accesses
- Find Memory Accesses
ANNOTATE

Each script is written to be processor/architecture-agnostic, but in some scripts this requires a regular expression to handle each architecture's specific syntax. Before running a script, verify that it supports the architecture of the firmware image to be analyzed. Please see Architecture Agnostic Structure of Scripts for more details. The IDAPython Embedded Toolkit becomes more powerful as more processors are supported, so please submit a pull request when you add a new processor.

To run a script, you must have IDA Pro 6.95 installed. Open the IDA database on which you'd like to run a script and then select File > Script File... and select the script to run.

Versioning

The IDAPython Embedded Toolkit has so far been tested only on IDA Pro 6.95. Testing on IDA Pro 7.0 is in progress.

Installation/ Usage

If you completed the default installation for IDA Pro, then IDAPython should already be installed. You can verify this by checking your IDA directory for a Python/ folder; if it is present, IDAPython is installed.

Otherwise, install IDAPython per: https://github.com/idapython/src

Once IDAPython is installed, the IDAPython Embedded Toolkit scripts may be run by opening an IDA database and selecting File > Script file... from the upper menu. Then, select the script to run. Each script is run individually by selecting it through this process.

Architecture Agnostic Structure of Scripts

The scripts in the IDAPython Embedded Toolkit are written to be architecture- and processor-agnostic. This is done by identifying the common structure and processes that do not depend on architecture-specific syntax. For the scripts that do require processor-specific syntax (for example, Special Function Register names or instruction syntax), a set of regular expressions is defined for each architecture. For more information on how to write regular expressions in Python, see https://docs.python.org/2/library/re.html

Thanks to the contribution by @tmr232, each script auto-identifies the architecture in use and selects the correct set of regular expressions using the IDAPython function: processor_name = idaapi.get_inf_structure().procName

Add a Processor to a Script

If the processor in use does not have regular expressions defined within the script, the script will exit with an "Unsupported Processor Type" error. To make the script work, you need to add the required regular expressions for that processor. To do this:

1. Determine IDA's string representation of the processor. In the bottom console bar, type the following command: idaapi.get_inf_structure().procName The command will output a string. That string is the processor name.
2. Add an elif statement to the script with the processor name output in Step 1.
3. Copy the regular expression assignments from one of the existing processors and customize them for the new processor being added. The Python documentation for regular expressions is at https://docs.python.org/2/library/re.html. Each script that uses processor-specific regular expressions explains in its header what each regular expression describes.
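
The steps above can be sketched as follows. This is a hypothetical illustration rather than code from the toolkit: the processor name 'myproc' and both of its regular expressions are placeholders invented for this example, and the helper select_function_patterns is an assumption that wraps the selection logic so it can be tried outside IDA by passing in the string that idaapi.get_inf_structure().procName would return.

```python
import re


def select_function_patterns(processor_name):
    """Return (prologue, epilogue) regexes for an IDA processor name."""
    if processor_name == '8051':
        # Existing entry, copied from define_code_functions.py.
        smart_prolog = re.compile(r".*")
        smart_epilog = re.compile(r"reti{0,1}")
    elif processor_name == 'myproc':
        # Step 2: new branch keyed on the string printed in Step 1.
        # Step 3: patterns copied from another branch, then customized.
        # ASSUMPTION: 'myproc' and these patterns are placeholders only.
        smart_prolog = re.compile(r"push +fp")
        smart_epilog = re.compile(r"ret")
    else:
        raise NotImplementedError('Unsupported Processor Type.')
    return smart_prolog, smart_epilog
```

Inside IDA, the same branch would be added directly to the if/elif chain in the script's "USER DEFINED VALUES" section instead of to a helper function.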

Example of the Regular Expressions for Processor-Specific Syntax in define_code_functions.py

################### USER DEFINED VALUES ###################
# Enter a regular expression for how this architecture usually 
# begins and ends functions. If the architecture does not 
# dictate how to start or end a function use r".*" to allow
# for any instruction.
#
processor_name = idaapi.get_inf_structure().procName

if processor_name == '8051':	# 8051 Architecture Prologue and Epilogue
	smart_prolog = re.compile(r".*")
	smart_epilog = re.compile(r"reti{0,1}")
elif processor_name == 'PIC18Cxx':	# PIC18 Architecture Prologue and Epilogue	
	smart_prolog = re.compile(r".*")	
	smart_epilog = re.compile(r"return  0")
elif processor_name == 'm32r':	# Mitsubishi M32R Architecture Prologue and Epilogue
	smart_prolog = re.compile(r"push +lr")
	smart_epilog = re.compile(r"jmp +lr.*")
elif processor_name == 'TMS32028':	# Texas Instruments TMS320C28x	
	smart_prolog = re.compile(r".*")	
	smart_epilog = re.compile(r"lretr")
elif processor_name == 'AVR':	# AVR	
	smart_prolog = re.compile(r"push +r")	
	smart_epilog = re.compile(r"reti{0,1}")
else:	
	print "[define_code_functions.py] UNSUPPORTED PROCESSOR. Processor = %s is unsupported. Exiting." % processor_name	
	raise NotImplementedError('Unsupported Processor Type.')
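
Before running the script on a database, it can help to try the patterns against a few disassembly strings in a plain Python session. The sketch below does this for the m32r and 8051 entries above. The sample instruction strings are made up for illustration, the helper matches is hypothetical, and the use of match (which anchors at the start of the text) is an assumption about how the script applies the patterns.

```python
import re

# Patterns copied from the m32r and 8051 branches above.
m32r_prolog = re.compile(r"push +lr")
m32r_epilog = re.compile(r"jmp +lr.*")
i8051_epilog = re.compile(r"reti{0,1}")


def matches(pattern, disasm_text):
    """Return True if the pattern matches at the start of the text."""
    return pattern.match(disasm_text) is not None


# Hypothetical disassembly strings, not taken from a real database.
print(matches(m32r_prolog, "push    lr"))   # True: " +" accepts any run of spaces
print(matches(m32r_prolog, "push    r8"))   # False: operand is not lr
print(matches(i8051_epilog, "ret"))         # True: i{0,1} makes the "i" optional
print(matches(i8051_epilog, "reti"))        # True
```

Note that a pattern such as reti{0,1} also matches any text that merely starts with "ret"; appending $ to a pattern makes it stricter if that turns out to be a problem for a given architecture.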

Scripts in the IDAPython Embedded Toolkit

data_offset_calc.py -- Resolve Indirect Offset Memory Accesses
Resolves the references to indirect offsets of a variable, register, or memory location whose value is known. Changes the display of the operand in the instruction (OpAlt function), creates a data cross reference (add_dref), and creates a comment of the resolved address (MakeComment). The user needs to define the following:
- offset_var_string: The string representation of the variable, register, or memory location to be replaced by the resolved value
- offset_var_value: The value of the variable defined in offset_var_string
- reg_ex_indirect: A regular expression for how indirect offset accesses to the variable are represented
- reg_ex_immediate: A regular expression for how the immediate offset value is represented
- new_opnd_display: A string representation of how the calculated and resolved value should be displayed as the operand in the instruction

For example, let's say we have firmware where fp = 0x808000 and the majority of memory accesses are as offsets from fp. This script will calculate that the instruction is reading 0x80C114, create a cross-reference to that location, and replace the operand in the instruction with this calculated value as shown below.

ld      R1, @(0x4114, fp)   -->     ld      R1, @[0x80C114]
add3    R10, fp, 0x4147     -->     add3    R10, fp, 0x4147;    @[0x80C147]
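
The arithmetic behind the first line of this example can be reproduced outside IDA. The following is a minimal sketch, not the toolkit's implementation: the function resolve_indirect_offset and the specific regular expressions are assumptions written for the operand syntax shown above, and only the names offset_var_string, offset_var_value, reg_ex_indirect, reg_ex_immediate, and new_opnd_display come from the script's description.

```python
import re

offset_var_string = "fp"
offset_var_value = 0x808000
# ASSUMED patterns for the "@(0x4114, fp)" operand form shown above.
reg_ex_indirect = re.compile(
    r"@\(0x[0-9A-Fa-f]+, *" + re.escape(offset_var_string) + r"\)")
reg_ex_immediate = re.compile(r"0x[0-9A-Fa-f]+")
new_opnd_display = "@[0x%X]"


def resolve_indirect_offset(disasm_text):
    """Return (address, rewritten_text), or None if no indirect access is found."""
    indirect = reg_ex_indirect.search(disasm_text)
    if indirect is None:
        return None
    # Pull the immediate offset out of the matched operand and add the base.
    immediate = reg_ex_immediate.search(indirect.group(0))
    address = offset_var_value + int(immediate.group(0), 16)
    rewritten = disasm_text.replace(indirect.group(0),
                                    new_opnd_display % address)
    return address, rewritten


address, text = resolve_indirect_offset("ld      R1, @(0x4114, fp)")
print("0x%X" % address)  # 0x80C114
print(text)              # ld      R1, @[0x80C114]
```

In the script itself, the resolved value is then applied to the database through OpAlt, add_dref, and MakeComment, as described above; this sketch stops at computing the address and the new operand text.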

define_code_functions.py -- Define Code and Functions
This script scans an area of the database from the user-supplied "start address" to "end address", defining the bytes as code and attempting to define functions from that code. The script is architecture agnostic because the user defines a regular expression for the "function prologue" and the "function epilogue" of the architecture being analyzed.

define_data_as_types.py -- Define a Block as Data
Defines a segment of addresses as the user-specified data type (byte, word, or double word). The byte length for each of these types is architecture dependent, but generally:
- 1 byte = Byte
- 2 bytes = Word
- 4 bytes = Double Word
This script will undefine all bytes in the range first, which means that any code or strings previously defined in the area will be overwritten as data.

make_strings.py -- Define a Block as Strings
This script searches for blocks of "Unexplored" bytes and declares them as ASCII strings. The user enters the starting and ending addresses of the areas to be analyzed. The script then checks whether each byte is an ASCII character value and whether the run ends with a defined "ending string character." In this example, the ending string characters are 0xD, 0xA, and 0x00. The script only checks "undefined or unexplored" values in the database.
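
The check that make_strings.py describes can be illustrated on a plain byte buffer. This is a minimal sketch under stated assumptions rather than the script itself: the function find_ascii_strings, the printable range 0x20 to 0x7E, the minimum length of 4, and the sample buffer are all hypothetical choices. Only the ending characters 0xD, 0xA, and 0x00 come from the description above.

```python
ENDING_CHARS = (0x0D, 0x0A, 0x00)


def find_ascii_strings(data, base_address=0, min_length=4):
    """Yield (address, text) for printable runs that end with an ending character."""
    run_start = None
    for index, value in enumerate(bytearray(data)):
        if 0x20 <= value <= 0x7E:
            # Printable ASCII: start a run or extend the current one.
            if run_start is None:
                run_start = index
            continue
        # Non-printable byte: the run counts only if this byte is an
        # ending character and the run is long enough.
        if (run_start is not None and value in ENDING_CHARS
                and index - run_start >= min_length):
            text = bytes(data[run_start:index]).decode("ascii")
            yield base_address + run_start, text
        run_start = None


# Made-up buffer: two terminated strings, one too-short run, one unterminated run.
firmware = b"\xff\x01BOOT OK\x00\x02ab\x00ERROR\r\nxyz"
for address, text in find_ascii_strings(firmware, base_address=0x1000):
    print("0x%X: %s" % (address, text))
# 0x1002: BOOT OK
# 0x100E: ERROR
```

In IDA, the bytes would be read from the "Unexplored" regions of the database between the user's start and end addresses, and each hit would be declared as a string instead of being printed.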
