LabSync

An IDA plugin that can be used to partially synchronize IDBs between different users reversing the same binaries.

LabSync is intended to be non-intrusive, lightweight, and easy to use for very frequent syncs (think as frequently and easily as you saved your IDB before Undo was a thing).

The leading use case is multiple people reversing the same binary at the same time, especially binaries that don't start from a good "baseline" IDB (i.e. no typing information, non-standard formats or architectures) and whose structure keeps changing during the reversing process.

How it works

When enabled, whenever the IDB is saved, LabSync will synchronize some of its data with other reverse engineers using a shared git repo.

LabSync generates a YAML from some of the data stored in your IDB (e.g. names, types, inlined functions, etc.), and when you save the IDB this YAML is also committed to the shared git repo. The repo is then pulled and pushed to sync with others.

In case there were any remote changes since the last time you saved your IDB, they will be fetched during the git pull operation and merged into your local changes.

In case of any merge conflicts, git mergetool will automatically be started for you to resolve them in a convenient textual format, and the sync process will finish afterwards.
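The save-time flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not LabSync's actual code: the function names, the commit message, the `--no-rebase` choice, and the assumption that a failed pull means a merge conflict are all hypothetical.

```python
import subprocess
from pathlib import Path


def git(repo: Path, *args: str) -> subprocess.CompletedProcess:
    """Run one git command inside the data repo and capture its result."""
    return subprocess.run(["git", "-C", str(repo), *args],
                          capture_output=True, text=True)


def sync_on_save(repo: Path, yaml_name: str, run=git) -> list:
    """Commit the freshly written YAML, merge remote changes, then push.

    Returns the git argument tuples that were issued, in order.
    """
    issued = []

    def step(*args):
        issued.append(args)
        return run(repo, *args)

    step("add", yaml_name)
    step("commit", "-m", f"LabSync: update {yaml_name}")  # hypothetical message
    pulled = step("pull", "--no-rebase")
    if pulled.returncode != 0:
        # Assumption: the pull failed because of a merge conflict.
        # Resolve it interactively, then conclude the merge commit.
        step("mergetool")
        step("commit", "--no-edit")
    step("push")
    return issued
```

Passing a different `run` callable makes it possible to dry-run the sequence without touching a real repository.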

The YAML for each IDB is saved under the MD5 of the input file, so LabSync is able to sync it with other reverse engineers no matter their local IDB filename, as long as it originated from the same binary.
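To illustrate why the local IDB filename does not matter, here is a minimal sketch of deriving the data file location from the input binary's MD5. The helper names and the `<md5>.yaml` layout at the top of the data repo are assumptions made for illustration; the plugin itself may obtain the hash and lay out files differently.

```python
import hashlib
from pathlib import Path


def input_file_md5(binary_path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of the original binary, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(binary_path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def yaml_path_for(repo_path: Path, binary_path: Path) -> Path:
    # Assumed layout: one "<md5>.yaml" file at the top of the data repo.
    return repo_path / f"{input_file_md5(binary_path)}.yaml"
```

Two users who loaded the same binary under different IDB names would both arrive at the same YAML path.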

Example

We generated an example YAML for the gzip-O3-moutline example from the FunctionInliner repo (source / binary) after inlining all outlined functions, and you can take a look at it here.

An example of conflict resolution can be seen in the following animation (we change some function name and sync, while it was also changed in upstream since our last change):

Motivation

At Cellebrite Labs we often have multiple researchers working at the same time on reversing huge binaries with no symbols or typing information. The structure of some of these also undergoes drastic changes during the reversing process (e.g. when FunctionInliner is used).

After reviewing past and existing IDA synchronization solutions, we failed to find a solution that satisfied the following requirements:

Syncing should be very fast and non-intrusive: either live or as quick and easy as a keypress.
In case of non-live synchronization or live synchronization supporting offline work (i.e. where conflicts can occur), we want conflict resolution to be clear, intuitive and non-intrusive:
    1. Require the least amount of user interaction
    2. Be "easy" for a user unfamiliar with IDA/the solution internals
    3. Conflict resolution should not be a "must" when you just want to open the IDB to check something
FunctionInliner should be a "first class citizen":
    1. IDA's built-in outlined function support (which is trivial to sync) is not sufficient for "deep" reversing, for example because xrefs from outlined chunks to functions/data are not propagated to the "parent" functions, and because it makes static analysis of disassembly practically impossible.
    2. Non-live syncing of FunctionInliner's effects on the IDB is practically impossible without the solution being "FunctionInliner-aware" (because if two users inline different functions before syncing, the "clone" segments will most likely collide on the same unused EAs).

We tried to develop a solution based on the Pareto principle: support the minimum amount of features that will give the maximum amount of assistance to shared reversing work. We also chose to dismiss syncing of features that, even if very helpful, would require a lot of maintenance, or would require going down deep rabbit holes of edge cases. For example, we chose not to sync decompiler comments, because we suspected that it would require exactly the same decompilation for all users, which is impossible in the general case unless perfect synchronization is achieved for a lot of other features.

We chose to use YAML files to store the data because they're easily editable by humans that will need to understand what's going on and resolve potential conflicts.

We chose to use git as a backend because it has proven to work well for storing text files and handling merges and conflicts, and everyone already knows how to use it (or should :) ).

Installing

1. Install the dependencies listed in requirements.txt where IDA can import them. For example using /path/to/python3/used/by/ida -m pip install -r requirements.txt.
2. Optionally install FunctionInliner.
3. Clone this repository and symlink ~/.idapro/plugins/labsync.py to labsync.py in the cloned repo.
4. Create a new (empty) git repository that will be used for the synchronization data. This repo should be cloned by all of the users that will share their work.
5. Follow the next section on how to configure LabSync.

Configuring

LabSync expects to find its configuration file under ~/.idapro/cfg/labsync.cfg.
Copy the example labsync.cfg from this repository and change repo_path to point to the path where your local clone of the data repository is (i.e. the one from step 4 above).
Make sure that you have merge.tool configured in your git configuration (either globally or locally for the data repo):
    1. You can check what it's globally configured to with git config --global merge.tool
    2. In case the above is empty, you should configure it e.g. with git config --global merge.tool opendiff
    3. You can test your configured merge.tool using this repo.
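The merge.tool check above can also be scripted. The following is a small sketch that queries git from Python; the function name is hypothetical, and it relies only on `git config --get`, which exits with a non-zero status when the key is unset.

```python
import subprocess
from typing import Optional


def configured_merge_tool(repo_path: Optional[str] = None) -> Optional[str]:
    """Return the merge.tool git would use, or None if it is unset."""
    cmd = ["git"]
    if repo_path is not None:
        cmd += ["-C", repo_path]  # also consult the data repo's local config
    cmd += ["config", "--get", "merge.tool"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        return None  # git itself is not installed or not on PATH
    value = result.stdout.strip()
    return value if result.returncode == 0 and value else None
```

Calling it without arguments reports the global/system setting; passing the path of your data repo clone also takes a repo-local override into account.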

Usage

To start synchronizing the current IDB to LabSync, use Edit > Plugins > LabSync > Enable.

Afterwards -- just save your IDB regularly in order to synchronize to the repository.

NOTE: when resolving merge conflicts for names/prototypes, make sure that you verify the EAs of the conflicting chunks you're comparing. Git's merge strategy compares the files line-by-line and isn't aware of YAML's syntax, so it may create a conflict when two different adjacent keys have been added to a dictionary such as names or prototypes.
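To make the note above concrete, the sketch below performs a key-aware three-way merge of two `{EA: name}` maps. It is not part of LabSync and does not describe how git merges the YAML; it only shows that two users adding names at different, adjacent EAs is not a real conflict, even though a line-based merge may report one. All names and addresses are made up.

```python
def merge_names(base: dict, ours: dict, theirs: dict):
    """Three-way merge of {ea: name} maps; returns (merged, conflicting_eas)."""
    merged, conflicts = {}, []
    for ea in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(ea), ours.get(ea), theirs.get(ea)
        if o == t:        # both sides agree (or both deleted the entry)
            value = o
        elif o == b:      # only their side changed this EA
            value = t
        elif t == b:      # only our side changed this EA
            value = o
        else:             # both sides changed the same EA differently
            conflicts.append(ea)
            value = o     # keep ours provisionally; a human must decide
        if value is not None:
            merged[ea] = value
    return merged, conflicts


base = {0x1000: "sub_1000"}
ours = {0x1000: "sub_1000", 0x1010: "parse_header"}
theirs = {0x1000: "sub_1000", 0x1020: "read_block"}
merged, conflicts = merge_names(base, ours, theirs)
```

Here `conflicts` is empty because the two additions touch different EAs. When a merge tool shows you such a pair as conflicting, the right resolution is usually to keep both entries.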

What does it sync?

We currently only sync:

Names given to EAs (e.g. functions, globals)
EAs of functions that have been inlined with FunctionInliner
Local types (i.e. structures, unions, enums)
Function prototypes

Advanced features

Branching out

You can check out a different branch on the data repository and push it to the remote data repository. Your changes will then sync only with people who are using the same branch, and won't affect everyone else.

Unless you intend to merge back into the main branch eventually, it's recommended to keep a copy of your IDB before you do the above so it'll be easy to revert once done.

Local type tracking

LabSync generates a UUID for each local type and documents it in a comment above the local type definition written in the YAML. This is used to track renames of local types, so that we don't delete and re-create a local type in case it was renamed remotely (since then all references to it will be destroyed).

When first enabling sync on an IDB whose binary has already been synced in the past, LabSync will try to "adopt" the UUIDs saved in the repo for local types that have the same names as those saved in the repo.

Note that this is a best-effort heuristic approach to reduce the number of "duplicate" local types that will then have to be replaced manually. However, this heuristic might not be complete, and may also lead to false positives (e.g. a logically different local type with the same name as one saved in the repo will be assigned its UUID).

Please take the above into consideration when reviewing the initial commit merge, and replace false-positive UUIDs with new random ones.
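The name-based "adoption" heuristic described above might look roughly like the sketch below. The function and parameter names are hypothetical and the real plugin may track more state; the point is only to show where false positives come from.

```python
import uuid


def adopt_uuids(local_type_names, repo_uuids_by_name):
    """Assign a UUID to each local type, reusing the repo's UUID on a name match.

    Returns (assigned, adopted): the name -> UUID map and the list of names
    whose UUID was taken from the repo and therefore deserves a manual review.
    """
    assigned, adopted = {}, []
    for name in local_type_names:
        if name in repo_uuids_by_name:
            # Same name is assumed to mean same type; this may be wrong.
            assigned[name] = repo_uuids_by_name[name]
            adopted.append(name)
        else:
            assigned[name] = str(uuid.uuid4())  # brand-new local type
    return assigned, adopted
```

Any name in `adopted` that actually denotes a different type is a false positive, and its UUID should be replaced with a fresh random one as described above.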

Resetting synchronization

In order to reset the synchronization state of an IDB, you should first disable the synchronization using Edit > Plugins > LabSync > Disable and then use Edit > Plugins > LabSync > Reset.

If LabSync is enabled again afterwards, it will treat the IDB as "new" when syncing it with the repo. Unless the IDB's data has been deleted from the git repo, the IDB will be merged with the existing repo data.

Mapping segments to a different data file

In some cases it's useful to reverse a binary together with a software library that it uses in the same IDB. In order to support syncing features related to the library between the IDBs of different binaries that use it, you can ask LabSync to sync some segments to separate files.

Note that all of the local types will be synced to all of the associated files, so in case of a conflict involving local types, you may have to resolve it more than once during conflict resolution.

This functionality is mostly intended to be used by IDA loaders, so we don't expose it using UI, but rather using the following API:

from labsync import LabSyncPlugin
LabSyncPlugin.map_segments_to_idb_id(seg_prefix, idb_id)

Where seg_prefix is e.g. libwhatever. and idb_id can technically be any string, but is expected to be the MD5 hash of the libwhatever binary. This will cause LabSync to sync features (e.g. names, prototypes) related to EAs whose segment name starts with <seg_prefix> to <idb_id>.yaml instead of the main YAML file.
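The routing rule described above can be pictured with the following sketch. It is a standalone illustration, not the plugin's implementation: `target_yaml`, its parameters, and the longest-prefix-wins tie-break are assumptions.

```python
def target_yaml(seg_name: str, main_idb_id: str, prefix_map: dict) -> str:
    """Pick the YAML file that receives features for EAs in this segment.

    prefix_map maps a segment-name prefix to an idb_id, mirroring the
    (seg_prefix, idb_id) pairs passed to map_segments_to_idb_id.
    """
    # Assumption: when several prefixes match, the most specific one wins.
    for prefix in sorted(prefix_map, key=len, reverse=True):
        if seg_name.startswith(prefix):
            return f"{prefix_map[prefix]}.yaml"
    return f"{main_idb_id}.yaml"  # unmapped segments go to the main file


# Made-up identifiers standing in for real MD5 hashes.
mapping = {"libwhatever.": "lib_md5"}
```

With this mapping, a segment named `libwhatever.text` would be routed to `lib_md5.yaml`, while an unmapped segment stays in the main YAML file.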

Because the library may be loade
