MinHash-based Code Relationship & Investigation Toolkit (MCRIT)

MCRIT is a framework created to simplify the application of the MinHash algorithm in the context of code similarity. It can be used to rapidly implement "shinglers", i.e. methods which encode properties of disassembled functions, to then be used for similarity estimation via the MinHash algorithm. It is tailored to work with disassembly reports emitted by SMDA.
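
To make the idea of shingling plus MinHash concrete, here is a minimal, self-contained sketch. It is not MCRIT's implementation: the shingler (overlapping mnemonic 3-grams), the seeded BLAKE2b hashing, the signature length of 64, and the two example mnemonic sequences are all assumptions chosen for illustration.

```python
import hashlib


def shingle(tokens, size=3):
    """Return the set of overlapping token n-grams ("shingles")."""
    if len(tokens) < size:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def seeded_hash(seed, item):
    """Deterministic 64-bit hash of one shingle under a given seed."""
    data = f"{seed}:{' '.join(item)}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingles, num_hashes=64):
    """For each seeded hash function, keep the minimum value over all shingles."""
    if not shingles:
        raise ValueError("cannot compute a signature for an empty shingle set")
    return [min(seeded_hash(seed, s) for s in shingles) for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical mnemonic sequences of two similar disassembled functions.
func_a = ["push", "mov", "sub", "mov", "call", "add", "pop", "ret"]
func_b = ["push", "mov", "sub", "mov", "call", "xor", "pop", "ret"]

sig_a = minhash_signature(shingle(func_a))
sig_b = minhash_signature(shingle(func_b))
print(f"estimated similarity: {estimate_similarity(sig_a, sig_b):.2f}")
```

The two example sequences share three of nine distinct shingles, so the estimate should land near 0.33; a longer signature narrows the estimation error at the cost of more hashing.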

Usage

Dockerized Usage

We highly recommend using the fully packaged docker-mcrit for easy deployment and usage.
First and foremost, this will ensure that you have fully compatible versions across all components, including a database for persistence and a web frontend for convenient interaction.

Standalone Usage

Installing MCRIT on its own will require some more steps.
For the following, we assume Ubuntu as host operating system.

The Python installation requirements are listed in requirements.txt and can be installed using:

# install python and MCRIT dependencies
$ sudo apt install python3 python3-pip
$ pip install -r requirements.txt 

By default, MongoDB 5.0 is used as the backend. This is also the recommended mode of operation, as it provides persistent data storage. The following commands outline an example installation on Ubuntu:

# fetch mongodb signing key
$ sudo apt-get install gnupg
$ wget -qO - https://www.mongodb.org/static/pgp/server-5.0.asc | sudo apt-key add -
# add package repository (Ubuntu 22.04)
$ echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/5.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-5.0.list
# OR add package repository (Ubuntu 20.04)
$ echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/5.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-5.0.list
# OR add package repository (Ubuntu 18.04)
$ echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/5.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-5.0.list
# install mongodb
$ sudo apt-get update
$ sudo apt-get install -y mongodb-org
# start mongodb as a service
$ sudo systemctl start mongod
# optionally configure to start the service with system startup
$ sudo systemctl enable mongod
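
Before starting MCRIT, it can help to confirm that the database is accepting connections. The following is a small sketch that only checks whether a TCP port is open. It assumes MongoDB runs locally on its default port 27017; the helper name `is_port_open` is made up for this example and is not part of MCRIT.

```python
import socket


def is_port_open(host="127.0.0.1", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if is_port_open():
    print("mongod is accepting connections on 127.0.0.1:27017")
else:
    print("mongod is not reachable on 127.0.0.1:27017")
```

An open port does not prove that the service behind it is MongoDB or that it is configured correctly, so treat this as a quick sanity check only.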

For a standalone installation, you may want to install the MCRIT module from the cloned repository, like so:

$ pip install -e .

After this initial installation, MCRIT can be used without an internet connection if desired.

Operation

The MCRIT backend is divided into two components: a server providing an API to work with, and one or more workers processing queued jobs. They can be started in separate shells using:

$ mcrit server

and

$ mcrit worker

By default, the REST API server will be listening on http://127.0.0.1:8000/.
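
The split between a server that accepts work and workers that process queued jobs follows the general producer/consumer pattern. The sketch below illustrates that pattern with an in-memory queue and threads. It does not reflect how MCRIT stores or dispatches its jobs; the function `run_jobs` and the job names are hypothetical.

```python
import queue
import threading


def run_jobs(job_names, num_workers=2):
    """Enqueue jobs (the "server" role) and let workers drain the queue."""
    jobs = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            job = jobs.get()
            if job is None:  # sentinel: no more work for this worker
                jobs.task_done()
                break
            with lock:
                results.append((job, worker_id))
            jobs.task_done()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for thread in threads:
        thread.start()
    for name in job_names:
        jobs.put(name)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    jobs.join()
    for thread in threads:
        thread.join()
    return results


for job, worker_id in run_jobs(["submit sample A", "submit sample B", "match sample A"]):
    print(f"worker {worker_id} processed: {job}")
```

Because the workers only share the queue, more of them can be added without changing the enqueuing side, which is the property that makes running several workers next to one server practical.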

Interaction

Regardless of your choice for installation, once running you can interact with the MCRIT backend.
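
Since the backend exposes a REST API, any HTTP client can talk to it. The following sketch uses only the Python standard library. The `/status` path in the usage note is an assumption and not confirmed by this README, and `fetch_json` is a hypothetical helper; the optional `apitoken` header follows the header field named in the version history entry for v1.2.10.

```python
import json
import urllib.error
import urllib.request


def build_url(base_url, path):
    """Join a base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def fetch_json(base_url, path, apitoken=None, timeout=5.0):
    """GET an endpoint and return the decoded JSON body, or None on failure."""
    request = urllib.request.Request(build_url(base_url, path))
    if apitoken:
        # Header field name taken from the v1.2.10 changelog entry.
        request.add_header("apitoken", apitoken)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Server unreachable, HTTP error, or body that is not valid JSON.
        return None
```

With a server running on the default address, `fetch_json("http://127.0.0.1:8000/", "/status")` would be a natural first call, assuming that endpoint exists in your MCRIT version.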

MCRIT Client

We have created a Python client module that is capable of working with all available endpoints of the server.
Documentation for this client module is currently in development.

MCRIT CLI

There is also a CLI based on this client package. Examples:

# query some stats of the data stored in the backend 
$ mcrit client status
{'status': {'db_state': 187, 'storage_type': 'mongodb', 'num_bands': 20, 'num_samples': 137, 'num_families': 14, 'num_functions': 129110, 'num_pichashes': 25385}}
# submit a malware sample with filename sample_unpacked, using family name "some_family"
$ mcrit client submit sample_unpacked -f some_family
 1.039s -> (architecture: intel.32bit, base_addr: 0x10000000): 634 functions
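
The status line printed above is a Python dict literal, so it can be post-processed in scripts. Here is a small sketch, assuming the CLI keeps printing the status in exactly this form (which may differ between versions); `parse_status` and `functions_per_sample` are hypothetical helpers, and the input string is the example output shown above.

```python
import ast


def parse_status(line):
    """Parse the dict literal printed by `mcrit client status`."""
    parsed = ast.literal_eval(line.strip())
    return parsed["status"]


def functions_per_sample(status):
    """Average number of indexed functions per stored sample."""
    samples = status["num_samples"]
    return status["num_functions"] / samples if samples else 0.0


# Example output copied from the CLI listing above.
output = (
    "{'status': {'db_state': 187, 'storage_type': 'mongodb', 'num_bands': 20, "
    "'num_samples': 137, 'num_families': 14, 'num_functions': 129110, "
    "'num_pichashes': 25385}}"
)

status = parse_status(output)
print(f"{status['num_samples']} samples in {status['num_families']} families")
print(f"{functions_per_sample(status):.1f} functions per sample on average")
```

`ast.literal_eval` is used instead of `json.loads` because the output uses single quotes, which is valid Python but not valid JSON.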

More extensive documentation of the MCRIT CLI is available here.

MCRIT IDA Plugin

An IDA plugin is also currently under development. To use it, first create your own config.py and make required changes depending on the deployment of your MCRIT instance:

cp ./plugins/ida/template.config.py ./plugins/ida/config.py
nano ./plugins/ida/config.py

Then simply run the script found at

./plugins/ida/ida_mcrit.py

in IDA.

Reference Data

In July 2023, we started populating a GitHub repository which contains ready-to-use reference data for common compilers and libraries.

Version History

  • 2026-01-13 v1.4.6: Introduced linear banding strategy, to be used as default in future releases (will require re-creation of the whole index).
  • 2025-12-22 v1.4.5: Fixed a bug due to early conversion when fetching many FunctionEntries at once, which would crash if one function ID does not exist.
  • 2025-12-22 v1.4.4: No changes, just moved plugins to their own repo located at mcrit-plugins.
  • 2025-12-08 v1.4.3: Major improvements to MCRIT IDA plugin UI, backend now supports faster cross matching jobs only matching among selected samples, minor bugfixes.
  • 2025-09-12 v1.4.2: QoL improvements and bugfixes to console client (proper markdown for result tables in queries and force recalculation option, faster skipping for dir mode submission).
  • 2025-07-30 v1.4.1: Filtering for unique matches now takes precedence over scores.
  • 2025-06-13 v1.4.0: Changed how percentages for matching are calculated, now using only matchable code vs. all code as baseline. Minor IDA plugin fixes.
  • 2025-05-22 v1.3.22: McritCLI now supports ENV variables (MCRIT_CLI_SERVER and MCRIT_CLI_APITOKEN) and a .env file for setting server and apitoken - THX to @r0ny123 for the suggestion!
  • 2025-03-11 v1.3.21: McritCLI now supports submissions with a spawned worker (requires --worker flag).
  • 2025-02-26 v1.3.20: Fixed a bug where crashing SpawningWorker would not be properly handled - THX to @yankovs!
  • 2025-02-26 v1.3.18: Added server and API token support for the CLI.
  • 2024-06-20 v1.3.17: Job deletion and cleanup are now more robust and won't accidentally purge samples - @yankovs - THX!!
  • 2024-05-10 v1.3.16: Queue cleanup has been extended to also purge files uploaded during all 3 types of queries (mapped, unmapped, smda).
  • 2024-04-17 v1.3.15: Worker type spawningworker will now terminate children after QueueConfig.QUEUE_SPAWNINGWORKER_CHILDREN_TIMEOUT seconds.
  • 2024-04-02 v1.3.14: Experimental: Introduction of new worker type spawningworker - this variant will consume jobs from the queue as usual but defer the actual job execution into a separate (sub)process, which should reduce issues with locked memory allocations.
  • 2024-04-02 v1.3.13: When cleaning up the queue, now also delete all failed jobs @yankovs - THX!!
  • 2024-03-06 v1.3.12: Fixed a bug where protection of recent samples from queue cleanup would lead to key errors as reported by @yankovs - THX!!
  • 2024-02-21 v1.3.10: Bump SMDA to 1.13.16, which covers another 200 instructions in a better escaped category (affects MinHashes).
  • 2024-02-16 v1.3.9: Finished and integrated automated queue cleanup feature (disabled by default) proposed by @yankovs - THX!!
  • 2024-02-15 v1.3.8: Bump SMDA to address issues with version recognition in SmdaFunction, fixed exception prints in IDA plugin's McritInterface (THX to @malwarefrank!!).
  • 2024-02-12 v1.3.5: Recalculating minhashes will now show correct percentages (THX to @malwarefrank!!).
  • 2024-02-02 v1.3.4: Mini fix in the IDA plugin to avoid referencing a potentially uninitialized object (THX to @r0ny123!!).
  • 2024-02-01 v1.3.2: FIX: Non-parallelized matching now outputs the same data format (THX to @dannyquist!!).
  • 2024-01-30 v1.3.1: The connection to MongoDB is now fully configurable (THX to @dannyquist!!).
  • 2024-01-24 v1.3.0: BREAKING: Milestone release with indexing improvements for PicHash and MinHash. To ensure full backward compatibility, recalculation of all hashes is recommended. Check this migration guide.
  • 2024-01-23 v1.2.26: Pinning lief to 0.13.2 in order to ensure that the pinned SMDA remains compatible.
  • 2024-01-09 v1.2.25: Ensure that we can deliver system status regardless of whether there is a db_state and db_timestamp or not.
  • 2024-01-05 v1.2.24: Now supporting "query" argument in CLI, as well as compact MatchingResults (without function match info) to reduce file footprint.
  • 2024-01-03 v1.2.23: Limit maximum export size to protect the system against OOM crashes.
  • 2024-01-02 v1.2.22: Introduced data class for UniqueBlocksResult with convenience functionality.
  • 2023-12-28 v1.2.21: McritClient now doing passthrough for binary query matching.
  • 2023-12-28 v1.2.20: Status now provides timestamp of last DB update.
  • 2023-12-13 v1.2.18: Bounds check versus sample_ids passed to getUniqueBlocks.
  • 2023-12-05 v1.2.15: Added convenience functionality to Job objects, version number aligned with mcritweb.
  • 2023-11-24 v1.2.11: SMDA pinned to version 1.12.7 before we upgrade SMDA and introduce a database migration to recalculate pic + picblock hashes with the improved generalization.
  • 2023-11-17 v1.2.10: Added ability to set an authorization token for the server via header field: apitoken; added ability to filter by job groups; added ability to fail orphaned jobs.
  • 2023-10-17 v1.2.8: Minor fix in job groups.
  • 2023-10-16 v1.2.6: Summarized q