tormalwarefp

Description: This repository contains code and datasets for the ACM CCS 2022 paper:

Title: Exposing the Rat in the Tunnel: Using Traffic Analysis for Tor-based Malware Detection

Authors: Priyanka Dodia, Mashael AlSabah, Omar Alrawi, Tao Wang

Our proposed solution is a machine-learning-based prototype that identifies stealthy Tor-based malware C&C connections by applying traffic analysis to encrypted Tor traffic. The models further infer the type of malware from the Tor traffic by fingerprinting malicious behavior at the connection and host levels.

Note: Conference presentation slides PDF

Files & Control Flow:

Main file: classify_topk.py

USAGE (Python 3.7): python classify_topk.py --options [options_file] --topk [topk] [--train | --zeroday]

[options_file]: Options file defining parameter inputs for classification

[topk]: Number of most active Tor connections to use; set k=1 or k=3

[--train]: Train models for binary or multi-label classification

[--zeroday]: Test trained models on the provided zeroday data
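The flags above can be sketched with argparse. This is a hypothetical illustration of the documented interface, not the parser used in classify_topk.py. It assumes that --train and --zeroday are mutually exclusive and that only k=1 and k=3 are accepted; the function name build_parser is made up for this example.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags (not the repository's code)."""
    parser = argparse.ArgumentParser(prog="classify_topk.py")
    parser.add_argument("--options", required=True,
                        help="options file defining parameter inputs for classification")
    # Assumption: only k=1 and k=3 are valid, as the README suggests.
    parser.add_argument("--topk", type=int, choices=(1, 3), required=True,
                        help="number of most active Tor connections to use")
    # Assumption: exactly one of the two modes is selected per run.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--train", action="store_true",
                      help="train models for binary or multi-label classification")
    mode.add_argument("--zeroday", action="store_true",
                      help="test trained models on the provided zeroday data")
    return parser


# Parse the same arguments as the first training command in this README.
args = build_parser().parse_args(["--options", "options-D5", "--topk", "3", "--train"])
print(args.options, args.topk, args.train, args.zeroday)
```

Under these assumptions, passing both --train and --zeroday, or a value of --topk other than 1 or 3, makes argparse exit with a usage error.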

Datasets provided:

  1. train_D5: Data used for training/validation/testing ML models
  2. zerodaytest.zip: Zero day data for testing the trained models on unseen malware Tor traffic

Note: The data consists of cell files representing connections from a PCAP (i.e., Tor traffic obtained from malware/benign binary executions in the Falcon Sandbox). Connection-level features use the direction, time, and order of Tor cells. Host-level features use information from all Tor connections in a PCAP (these are appended to the end of each cell file).
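To make the top-k idea concrete, here is a minimal sketch of ranking the connections of one PCAP by activity and keeping the k busiest. It rests on assumptions that the README does not confirm: a connection is modelled as a list of (timestamp, direction) cells, and "activity" is taken to be the number of cells. The names top_k_connections and direction_counts and the sample data are hypothetical, and the on-disk cell file format is not parsed here.

```python
from typing import Dict, List, Tuple

# Assumed in-memory form of one cell: (timestamp in seconds, direction),
# with +1 for an outgoing cell and -1 for an incoming cell.
Cell = Tuple[float, int]


def top_k_connections(connections: Dict[str, List[Cell]], k: int) -> List[str]:
    """Return the names of the k connections with the most cells (assumed activity measure)."""
    ranked = sorted(connections, key=lambda name: len(connections[name]), reverse=True)
    return ranked[:k]


def direction_counts(cells: List[Cell]) -> Dict[str, int]:
    """Count cells by direction, a simple example of a connection-level statistic."""
    outgoing = sum(1 for _, direction in cells if direction > 0)
    incoming = sum(1 for _, direction in cells if direction < 0)
    return {"outgoing": outgoing, "incoming": incoming, "total": len(cells)}


# Made-up connections from a single PCAP, for illustration only.
pcap = {
    "conn_a": [(0.0, 1), (0.1, -1), (0.2, -1)],
    "conn_b": [(0.0, 1)],
    "conn_c": [(0.0, 1), (0.5, -1)],
    "conn_d": [(0.0, 1), (0.3, -1), (0.4, -1), (0.9, 1)],
}

top3 = top_k_connections(pcap, 3)
print(top3)  # ['conn_d', 'conn_a', 'conn_c']
print(direction_counts(pcap[top3[0]]))  # {'outgoing': 2, 'incoming': 2, 'total': 4}
```

The actual feature extraction in the repository may rank and summarize connections differently; this only illustrates the selection step that --topk controls.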

Option files provided:

  1. options-D5
  2. options-D5_host
  3. options-zeroday_binary
  4. options-zeroday_multilabel

1. Binary Classification: Classify Tor-based malware and benign connections

Scenarios:

Note(!): 'MULTICLASS' option must be set to 0 in options file

  • Train models with CONNECTION-LEVEL features only [Hayes et al. 2016], derived from the top 3 most active Tor connections

    cmd: python classify_topk.py --options options-D5 --topk 3 --train
    
  • Train models with CONNECTION+HOST-LEVEL features [Dodia et al. 2022], using the top 3 most active Tor connections for the connection-level features

    cmd: python classify_topk.py --options options-D5_host --topk 3 --train
    

2. Multi-label Classification: Infer malware class type

Note(!): 'MULTICLASS' option must be set to 1 in options file

Use the same commands as for binary classification, with 'MULTICLASS' set to 1 in the options file.

3. Zeroday Scenario: Test models using traffic from new, unseen binaries (EternalRocks malware)

  • Identify zeroday malware connections using pre-trained binary classifier model

    cmd: python classify_topk.py --options options-zeroday_binary --topk 3 --zeroday
    
  • Identify the type of malware (class labels) using pre-trained multi-label classifier models

    cmd: python classify_topk.py --options options-zeroday_multilabel --topk 3 --zeroday
    
Note:
  • All experiments can be run with topk=1 or topk=3 (the best results were achieved when the top 3 most active Tor connections were used for training and testing).
  • Host features can be activated or deactivated by setting HOSTFTS to True or False, or by commenting the option in or out in the options file.
  • Models trained with HOSTFTS must be tested with the HOSTFTS option activated (i.e., in the zeroday option files).
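The HOSTFTS rule in the notes above can be checked before a zeroday run. The sketch below is hypothetical and not part of the repository: it assumes the training and test options have already been loaded into dictionaries (the options file format is not described here, so no parser is shown), and that a missing HOSTFTS key means the option is commented out and therefore deactivated. It enforces only the direction the notes state.

```python
def check_host_feature_consistency(train_options: dict, test_options: dict) -> None:
    """Raise ValueError if a model trained with host features is tested without them.

    Hypothetical helper. Both arguments are assumed to be already-parsed
    options; a missing HOSTFTS key is treated as deactivated.
    """
    trained_with_host = bool(train_options.get("HOSTFTS", False))
    tested_with_host = bool(test_options.get("HOSTFTS", False))
    if trained_with_host and not tested_with_host:
        raise ValueError(
            "Model was trained with HOSTFTS; activate HOSTFTS in the test options file."
        )


# Consistent setup: host features are active in both phases, so nothing is raised.
check_host_feature_consistency(
    {"HOSTFTS": True, "MULTICLASS": 0},
    {"HOSTFTS": True, "MULTICLASS": 0},
)

# Inconsistent setup: HOSTFTS is missing (commented out) in the test options.
try:
    check_host_feature_consistency({"HOSTFTS": True}, {"MULTICLASS": 0})
except ValueError as error:
    print(error)
```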