Allyourbase
Finds the base address of a firmware by comparing string addresses with target pointer addresses
allyourbase
allyourbase.py is a script that tries to find the base address of a firmware image by comparing the addresses of strings with the target addresses of all possible pointers, similar to rbasefind.
It works efficiently with arbitrary pointer sizes and endianness on decently sized firmware images (in the tens of MB; the main limit is the required RAM, which grows linearly with file size but has a high constant factor).
The script is kept simple so that it is easy to modify it to use one's own heuristic. For example, one could modify it to correlate the target addresses of absolute calls with the addresses of return instructions, since the instruction after a return often begins a new function (except on architectures where functions are typically aligned).
Example
$ ./allyourbase.py -n 5 -l 8 -e little /usr/bin/ls
Found 630 strings
4/4
Offset: 0x11ffffefc0
Parameters
-n N the minimum length of strings to look for, in unicode codepoints
-l L the length of pointers to look for, in bytes (4 = 32-bit pointers)
-e {little,big} the endianness of the pointers
-a A the alignment of the pointers, in bytes (defaults to the pointer length)
-f F slack factor (higher = slower and more memory but more accurate)
How it works
First, one finds all offsets in the file at which strings are located and puts them in a set. Then one interprets each (aligned) offset in the file as a pointer and puts the target addresses in a second set.
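The two sets described above can be sketched as follows. This is only an illustration, not the code from allyourbase.py: the function names `string_offsets` and `pointer_targets` are hypothetical, and for brevity the sketch only recognizes printable ASCII strings, whereas the script counts string length in unicode codepoints.

```python
import re


def string_offsets(data, min_len=5):
    """Return the set of file offsets where a printable ASCII run starts.

    Assumption: ASCII only, which is simpler than what the script does.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return {m.start() for m in pattern.finditer(data)}


def pointer_targets(data, size=4, endian="little", align=None):
    """Interpret every aligned offset as a pointer and collect the targets."""
    align = align or size  # mirror the documented default of -a
    last = len(data) - size
    return {
        int.from_bytes(data[i:i + size], endian)
        for i in range(0, last + 1, align)
    }


if __name__ == "__main__":
    # Tiny made-up image: a string at offset 4 and one pointer to 0x1004.
    image = b"\x00" * 4 + b"hello" + b"\x00" * 3 + (0x1004).to_bytes(4, "little")
    print(sorted(string_offsets(image)))
    print(sorted(hex(t) for t in pointer_targets(image)))
```

Most of the collected pointer targets are garbage, because arbitrary file contents are being read as pointers; the later steps rely on the real pointers standing out statistically.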
Now the brute-force approach would be to try each base address and look at how many string offsets overlap the possible pointer target locations. That way, one can simply choose the base address that has the highest overlap.
However, that approach is not really usable on 64-bit address spaces without changes, and it quickly gets slower as the number of string addresses grows.
One useful realization is that "counting the overlap between two sets for each possible relative offset" is the same as computing the cross-correlation of the indicator vectors of the sets. This can be implemented efficiently using the fast Fourier transform, by the convolution theorem. Still, one would have to do a Fourier transform the size of the address space, which in most cases is infeasible.
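For illustration, the brute-force idea might look like the sketch below. The function name `brute_force_base` and the `candidates` parameter are hypothetical; the script itself does not work this way, for the reasons given above.

```python
def brute_force_base(strings, targets, candidates):
    """Try every candidate base and keep the one with the largest overlap.

    strings:    set of string offsets inside the file
    targets:    set of pointer target addresses
    candidates: iterable of base addresses to try (assumed by this sketch)
    """
    best_base, best_score = None, -1
    for base in candidates:
        # Number of strings that some pointer would hit if loaded at `base`.
        score = sum(1 for s in strings if s + base in targets)
        if score > best_score:
            best_base, best_score = base, score
    return best_base, best_score


if __name__ == "__main__":
    strings = {4, 20, 100}
    targets = {0x1004, 0x1014, 0x1064, 0x50}
    print(brute_force_base(strings, targets, range(0, 0x2000, 4)))
```

The cost is proportional to the number of candidate bases times the number of strings, which is why it does not scale to large address spaces.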
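A minimal sketch of this equivalence on a toy-sized address space, assuming NumPy is available (the function name `overlap_by_offset` and the `size` parameter are hypothetical and not taken from the script):

```python
import numpy as np


def overlap_by_offset(strings, targets, size):
    """Circular cross-correlation of two indicator vectors via the FFT.

    Returns an integer array c with
        c[k] = number of s in strings such that (s + k) % size is in targets.
    """
    a = np.zeros(size)
    b = np.zeros(size)
    a[np.fromiter((s % size for s in strings), dtype=np.intp)] = 1
    b[np.fromiter((t % size for t in targets), dtype=np.intp)] = 1
    # Convolution theorem for real input: DFT(c) = conj(DFT(a)) * DFT(b).
    spectrum = np.conj(np.fft.rfft(a)) * np.fft.rfft(b)
    return np.rint(np.fft.irfft(spectrum, size)).astype(int)


if __name__ == "__main__":
    strings = {1, 3, 7}
    targets = {11, 13, 17, 2}
    corr = overlap_by_offset(strings, targets, 32)
    print(int(np.argmax(corr)), int(corr.max()))
```

Every offset is scored in a single pass of O(size log size) work, instead of one set comparison per candidate base.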
Instead, if the target addresses are reduced modulo n (with n bigger than the file size), one can do a circular cross-correlation of size n to find the base address modulo n. Repeating that for a few pairwise coprime n until their product is bigger than the address space leaves, by the Chinese remainder theorem, only one solution that fits inside the address space, which will probably be the base address.
Assuming the target addresses do not tend to differ by multiples of n, the matches of strings with errant pointers modulo n can be modeled as binomial noise.
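The following sketch shows how residues for several moduli can be combined into one address. It is a simplified illustration under stated assumptions: `base_mod_n` and `crt` are hypothetical names, the moduli are hand-picked primes, and the per-modulus correlation is done by direct counting rather than with an FFT to keep the example short.

```python
from collections import Counter
from math import gcd


def base_mod_n(strings, targets, n):
    """Most common value of (t - s) mod n over the reduced sets.

    Equivalent to the peak of the circular cross-correlation of size n.
    """
    s_mod = {s % n for s in strings}
    t_mod = {t % n for t in targets}
    votes = Counter((t - s) % n for t in t_mod for s in s_mod)
    return votes.most_common(1)[0][0]


def crt(residues, moduli):
    """Combine x = r_i (mod n_i) for pairwise coprime n_i into one x."""
    x, m = 0, 1
    for r, n in zip(residues, moduli):
        assert gcd(m, n) == 1, "moduli must be pairwise coprime"
        # Pick k so that x + m * k = r (mod n); pow(m, -1, n) needs Python 3.8+.
        k = ((r - x) * pow(m, -1, n)) % n
        x += m * k
        m *= n
    return x


if __name__ == "__main__":
    true_base = 0x80000000  # made-up base inside a 32-bit address space
    strings = {16, 100, 250, 512, 640, 999}
    targets = {true_base + s for s in strings} | {0x12345678}
    moduli = (1009, 1013, 1019, 1021)  # primes whose product exceeds 2**32
    residues = [base_mod_n(strings, targets, n) for n in moduli]
    print(hex(crt(residues, moduli)))
```

Because the product of the moduli exceeds the size of the address space, the combined value is the only candidate that fits inside it.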
The purpose of the -f flag is to make the ratio of pointers in relation to n smaller so that the noise floor is lowered.
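As a rough illustration of the noise model, the sketch below estimates the noise floor under one simple reading of the text: each string independently lands on an unrelated pointer target with probability p = (number of targets) / n. The function name `noise_floor` and this exact parametrization are assumptions, not something taken from the script.

```python
from math import sqrt


def noise_floor(num_strings, num_targets, n):
    """Mean and standard deviation of Binomial(num_strings, p).

    p approximates the chance that a random residue modulo n is occupied
    by some pointer target (assumption of this sketch).
    """
    p = min(1.0, num_targets / n)
    mean = num_strings * p
    std = sqrt(num_strings * p * (1.0 - p))
    return mean, std


if __name__ == "__main__":
    # Made-up numbers: the same sets of strings and targets, two sizes of n.
    print(noise_floor(630, 1000, 10_000))
    print(noise_floor(630, 1000, 100_000))
```

Under this model, lowering the ratio of pointers to n lowers the expected number of accidental matches at a wrong offset, while the true base address still matches every string that is actually referenced.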
License
This software is licensed under the MIT license.
