ExeRay
ExeRay AI detects malicious Windows executables using machine learning. It analyzes entropy, imports, and metadata for rapid classification to aid incident response. Built with Python and scikit-learn.
ExeRay 2.0 :hospital:
<p align="center"> <img src="assets/ExeRay_Image.png" alt="TruxTrace banner" width="560"/> </p>

Advanced X-ray Vision for Windows Executables
- Detect malicious `.exe` files using machine learning. ExeRay extracts static features (entropy, imports, metadata) and combines ML with heuristic rules for fast, automated classification.
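Entropy is one of the static features named above. The following is a minimal sketch of how byte-level Shannon entropy can be computed, assuming the standard definition in bits per byte. The function name `shannon_entropy` is hypothetical and is not taken from ExeRay's scripts.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Constant data carries no information, so its entropy is the minimum.
    print(shannon_entropy(b"\x00" * 1024))
    # Every byte value equally likely gives the maximum of 8 bits per byte.
    print(shannon_entropy(bytes(range(256)) * 4))
```

Values close to 8 are typical of compressed or encrypted data, which is why high section entropy is commonly treated as a hint that a file may be packed.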
🚀 What's New in v2.0
- 50+ New Detection Features (VM detection, anti-debugging, API call chains)
- Enhanced Prediction Engine with detailed suspicious-behavior reports when malware is detected
- Recall-Optimized Training: Custom scorer prioritizing malware detection
- Streamlined 3-Script Architecture (faster workflow)
- Improved Accuracy (F1-score up to 0.99 in testing)
- Dataset Provided
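The custom recall-prioritizing scorer is not shown in this README. One common way to weight recall above precision is the F-beta score with beta greater than 1. The sketch below assumes that approach. The function name `fbeta` and the choice of beta = 2 are illustrative assumptions, not ExeRay's actual scorer.

```python
def fbeta(y_true, y_pred, beta=2.0):
    """F-beta score for the positive class (label 1); beta > 1 favors recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # One malware sample (label 1) is missed, so recall is lower than precision.
    y_true = [1, 1, 0, 0]
    y_pred = [1, 0, 0, 0]
    print(fbeta(y_true, y_pred, beta=1.0))  # balanced F1
    print(fbeta(y_true, y_pred, beta=2.0))  # penalizes the missed sample more
```

With scikit-learn, an equivalent scorer could be built with `make_scorer(fbeta_score, beta=2)` and passed to a model-selection routine.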
📊 Dataset Information
- Source & Composition:
<div align="center">
<table>
<thead>
<tr> <th>Dataset</th> <th>From</th> <th>Examples</th> <th>Total</th> </tr>
</thead>
<tbody>
<tr> <td><strong>Malicious Dataset</strong></td> <td> <a href="https://github.com/iosifache/DikeDataset">DikeDataset</a>, <a href="https://github.com/ytisf/theZoo">theZoo</a>, <a href="https://bazaar.abuse.ch/">MalwareBazaar</a> </td> <td>WannaCry.exe, njRAT.exe</td> <td>10,925</td> </tr>
<tr> <td><strong>Benign Dataset</strong></td> <td> Windows Files, <a href="https://ninite.com/">Ninite.com</a>, <a href="https://portableapps.com/">PortableApps.com</a> </td> <td>Putty.exe, notepad.exe, ida.exe</td> <td>3,590</td> </tr>
<tr> <td colspan="3" style="text-align:right;"><strong>Total</strong></td> <td><strong>14,515</strong></td> </tr>
<tr> <td colspan="3" style="text-align:right;"><strong>Size</strong></td> <td><strong>11.81 GB</strong></td> </tr>
</tbody>
</table>
</div>

- Dataset Processing: From the 10,925 malware samples, we processed 4,200 for feature extraction, then applied undersampling to balance them against 3,500 benign samples (7,000 total). We used RandomUnderSampler (random_state=42) to prevent bias toward the malware class while preserving key patterns.
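The balancing step can be illustrated with a small standard-library sketch of random undersampling. This is a stand-in for the RandomUnderSampler mentioned above, not a call to it, so a seed of 42 here will not select the same samples as the library would. The function name `random_undersample` is hypothetical.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=42):
    """Randomly drop samples from larger classes until all classes match the smallest."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    target = min(len(items) for items in by_label.values())
    out_samples, out_labels = [], []
    for label, items in by_label.items():
        # Keep the minority class whole; sample the others without replacement.
        kept = items if len(items) == target else rng.sample(items, target)
        out_samples.extend(kept)
        out_labels.extend([label] * target)
    return out_samples, out_labels


if __name__ == "__main__":
    # Class sizes taken from the description above: 4,200 malware, 3,500 benign.
    labels = [1] * 4200 + [0] * 3500
    samples = list(range(len(labels)))
    balanced_samples, balanced_labels = random_undersample(samples, labels)
    print(len(balanced_samples), balanced_labels.count(1), balanced_labels.count(0))
```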
- Benign and Malicious Dataset Link (11.81 GB on MEGA): https://mega.nz/folder/iAU3iARQ#nKPwCQIW4jZgAEFmRJlR6Q
- ⚠️ Safety Notice:
  - Please exercise caution when downloading and handling this dataset. It contains both benign and malicious files for research purposes.
  - Do not execute or open any files unless you're in a secure, isolated environment (e.g., a virtual machine or sandbox). Executing malicious files can harm your system or compromise your data.
- ⚠️ Important Notice About the Dataset:
  - To keep this repository lightweight and easy to download, the full dataset is not included here. Specifically:
    - The data/ folder does not contain any executable files.
    - You will find only two empty directories: benign/ and malware/.
    - If you wish to work with the actual dataset, you need to download it manually from the MEGA link.
:gear: Enhanced Features
- Hybrid AI detection (XGBoost + Random Forest)
- Detailed Malware Fingerprinting:
  - VM/Sandbox detection markers
  - Anti-debugging technique identification
  - Suspicious API call patterns
- Confidence Scoring with threat level classification
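How confidence scores map to threat levels is not specified in this README. The sketch below shows one possible mapping, assuming the model outputs a malware probability and a decision threshold is available. The function name `threat_level`, the band names, and the equal-width bands are all hypothetical.

```python
def threat_level(probability: float, threshold: float = 0.5) -> str:
    """Map a malware probability to a coarse threat level (illustrative bands only)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < threshold:
        return "benign"
    # Split the range [threshold, 1.0] into three equal-width bands.
    span = (1.0 - threshold) / 3
    if probability < threshold + span:
        return "low"
    if probability < threshold + 2 * span:
        return "medium"
    return "high"


if __name__ == "__main__":
    for p in (0.20, 0.55, 0.70, 0.95):
        print(f"{p:.2f} -> {threat_level(p)}")
```

Tying the bands to the decision threshold keeps the labels meaningful if the threshold is tuned for recall rather than fixed at 0.5.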
:wrench: Upgraded Tech Stack
New Components:
- Advanced PE Analysis: Full directory parsing (TLS, Debug, Resources)
- String Analysis: Unicode/ASCII pattern detection
- Behavioral Indicators: 15+ new malware behavior signatures
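As an illustration of the ASCII/Unicode string analysis listed above, the sketch below pulls printable ASCII runs and UTF-16LE ("wide") runs out of raw bytes with regular expressions. The minimum length of 4 characters and the function name `extract_strings` are assumptions, not details taken from ExeRay.

```python
import re

# Runs of at least 4 printable ASCII characters.
ASCII_RE = re.compile(rb"[\x20-\x7e]{4,}")
# Runs of at least 4 printable characters encoded as UTF-16LE (char followed by a null byte).
UTF16_RE = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")


def extract_strings(data: bytes):
    """Return (ascii_strings, wide_strings) found in a blob of bytes."""
    ascii_strings = [m.group().decode("ascii") for m in ASCII_RE.finditer(data)]
    wide_strings = [m.group().decode("utf-16-le") for m in UTF16_RE.finditer(data)]
    return ascii_strings, wide_strings


if __name__ == "__main__":
    blob = (
        b"\x00\x01vmware\x00\xff"
        + "IsDebuggerPresent".encode("utf-16-le")
        + b"\x00\x00ab\x00"
    )
    ascii_strings, wide_strings = extract_strings(blob)
    print(ascii_strings)
    print(wide_strings)
```

Features such as a string count or an average string length can then be derived from the two lists.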
Key Improvements:
- Structural Features:
# PE File Structure
'num_sections',
'num_unique_sections',
'section_names_entropy',
'avg_section_size',
'min_section_size',
'max_section_size',
'total_section_size',
'avg_entropy',
'min_entropy',
'max_entropy',
'has_packed_sections',
'has_executable_sections',
'writable_executable_sections',
'is_dll',
'is_executable',
'is_system_file',
'has_aslr',
'has_dep',
'is_signed',
'has_rich_header',
'rich_header_entries',
'has_resources',
'num_resources',
'has_embedded_exe',
'has_debug',
'has_tls',
'has_relocations',
'ep_in_first_section',
'ep_in_last_section',
'ep_section_entropy',
'has_suspicious_sections'
- Behavioral Features:
# API/Import Analysis
'num_imports',
'num_unique_dlls',
'num_unique_imports',
'imports_to_dlls_ratio',
'has_import_name_mismatches',
'suspicious_imports_count',
'num_exports',
'suspicious_exports',
'suspicious_api_chains',
'has_delayed_imports',
'has_vm_detection_imports',
'has_anti_debug_imports',
'has_process_creation_imports',
'has_createprocess',
'has_setwindowshookex',
# String Patterns
'num_strings',
'avg_string_length',
'has_suspicious_strings',
'has_anti_debug',
'has_vm_detection_strings',
'has_vm_mac_addresses',
'has_anti_debug_strings',
'has_nop_sleds'
- Detection Signatures:
vm_detection_strings = {
    b'vbox', b'vmware', b'virtualbox', b'qemu', b'xen', b'hypervisor',
    b'virtual machine', b'vmcheck', b'vboxguest', b'vboxsf', b'vboxvideo'
}
vm_mac_prefixes = {
    b'00:0C:29', b'00:1C:14', b'00:05:69', b'00:50:56',  # VMware
    b'08:00:27',  # VirtualBox
    b'00:16:3E',  # Xen
    b'00:1C:42',  # Parallels
    b'00:15:5D'   # Hyper-V
}
anti_debug_strings = {
    b'IsDebuggerPresent', b'CheckRemoteDebuggerPresent', b'OutputDebugString',
    b'NtQueryInformationProcess', b'NtSetInformationThread', b'ZwSetInformationThread'
}
suspicious_patterns = {
    b'payload', b'malware', b'inject', b'virus', b'trojan',
    b'backdoor', b'rat', b'worm', b'spyware', b'keylog',
    b'xored', b'encrypted', b'packed', b'obfus'
}
# API Groups
vm_detection_apis = {
    'cpuid', 'hypervisor', 'vmcheck', 'vbox', 'vmware', 'virtualbox',
    'wine_get_unix_file_name', 'wine_get_dos_file_name'
}
anti_debug_apis = {
    'IsDebuggerPresent', 'CheckRemoteDebuggerPresent', 'OutputDebugStringA',
    'NtQueryInformationProcess', 'NtSetInformationThread', 'NtQuerySystemInformation',
    'GetTickCount', 'QueryPerformanceCounter', 'RDTSC', 'GetProcessHeap',
    'ZwSetInformationThread', 'DbgBreakPoint', 'DbgUiRemoteBreakin'
}
process_creation_apis = {
    'CreateProcessA', 'CreateProcessW', 'CreateProcessAsUserA', 'CreateProcessAsUserW',
    'SetWindowsHookExA', 'SetWindowsHookExW', 'ShellExecuteA', 'ShellExecuteW',
    'WinExec', 'System'
}
# Suspicious API Chains
api_sequences = {
    ('VirtualAlloc', 'WriteProcessMemory', 'CreateRemoteThread'): 'Process Injection',
    ('RegCreateKey', 'RegSetValue', 'RegCloseKey'): 'Registry Persistence',
    ('LoadLibraryA', 'GetProcAddress', 'VirtualProtect'): 'Dynamic API Resolution',
    ('OpenProcess', 'ReadProcessMemory', 'WriteProcessMemory'): 'Process Hollowing',
    ('NtUnmapViewOfSection', 'MapViewOfFile', 'ResumeThread'): 'RunPE Technique',
    ('CreateProcessA', 'WriteProcessMemory', 'ResumeThread'): 'Process Injection',
    ('SetWindowsHookExA', 'GetMessage', 'DispatchMessage'): 'Hook Injection'
}
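How these tables are applied is not shown here. The sketch below shows one plausible use under two assumptions: an API chain counts as present when every API in it appears among the file's imports (order is not checked), and byte signatures are matched as case-insensitive substrings. The helper names `match_api_chains` and `find_signatures` are hypothetical, and only a subset of the tables is repeated to keep the example short.

```python
# Subset of the api_sequences table above, repeated so the example is self-contained.
API_SEQUENCES = {
    ('VirtualAlloc', 'WriteProcessMemory', 'CreateRemoteThread'): 'Process Injection',
    ('LoadLibraryA', 'GetProcAddress', 'VirtualProtect'): 'Dynamic API Resolution',
    ('SetWindowsHookExA', 'GetMessage', 'DispatchMessage'): 'Hook Injection',
}

# Subset of the vm_detection_strings table above.
VM_DETECTION_STRINGS = {b'vbox', b'vmware', b'qemu'}


def match_api_chains(imports, sequences=API_SEQUENCES):
    """Return the labels of all chains whose APIs are all present in the import list."""
    imported = set(imports)
    return sorted({label for chain, label in sequences.items()
                   if imported.issuperset(chain)})


def find_signatures(data: bytes, signatures):
    """Return the signatures that occur in the data, ignoring ASCII case."""
    lowered = data.lower()
    return sorted(sig for sig in signatures if sig.lower() in lowered)


if __name__ == "__main__":
    imports = ['VirtualAlloc', 'WriteProcessMemory', 'CreateRemoteThread',
               'LoadLibraryA', 'GetProcAddress', 'VirtualProtect']
    print(match_api_chains(imports))
    print(find_signatures(b'Detected VMware Tools service', VM_DETECTION_STRINGS))
```

Plain substring matching is simple but prone to false positives with short tokens: a pattern such as `b'rat'` also matches inside ordinary words like "generate", so short signatures may need word-boundary checks or a higher weight threshold.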
:file_folder: Directory Structure
ExeShield_AI/
├── assets/                  # Repo Images
├── data/                    # Raw Samples
│   ├── malware/             # Malicious Executables
│   └── benign/              # Clean Executables
├── dependencies/            # Installation Dependencies
├── models/                  # Saved Models/Thresholds
│   ├── malware_detector.joblib
│   └── optimal_threshold.npy
├── output/                  # Processed Data (CSV/features)
│   └── processed_features_dataset.csv
├── scripts/                 # Core Scripts
│   ├── extract_features.py
│   ├── train_model.py
│   ├── predict.py
│   └── visualize_model.py
├── visualizations/          # Model Feature & Tree Visualizations
│   ├── feature_importances.png
│   ├── feature_importances_gain.png
│   ├── xgb_tree_0.png
│   ├── xgb_tree_1.png
│   ├── xgb_tree_2.png
│   └── xgb_tree_99.png
└── README.md
:computer: Installation and Usage (Commands & Outputs)
1. Clone the repository:
git clone https://github.com/MohamedMostafa010/ExeRay.git
cd ExeRay
2. Install dependencies:
pip install -r dependencies/requirements.txt
3. Extract Features:
> python extract_features.py
[*] Processing benign samples from ../data\benign...
[!] Not a valid PE file: adaminstall.exe
[!] Not a valid PE file: adamsync.exe
[!] Not a valid PE file: AddSuggestedFoldersToLibraryDialog.exe
[!] Not a valid PE file: AgentService.exe
[!] Not a valid PE file: AggregatorHost.exe
[!] Not a valid PE file: appcmd.exe
[!] Not a valid PE file: AppHostRegistrationVerifier.exe
[!] Not a valid PE file: ApplySettingsTemplateCatalog.exe
[!] Not a valid PE file: ApplyTrustOffline.exe
[!] Not a valid PE file: ApproveChildRequest.exe
[!] Not a valid PE file: AppVClient.exe
[!] Not a valid PE file: ARPPRODUCTICON.exe
[!] Not a valid PE file: audit.exe
[!] Not a valid PE file: AuditShD.exe
[!] Not a valid PE file: autofstx.exe
...
[*] Processing malware samples (limited to 3500) from ../data\malware...
[+] Processed Features Dataset saved to ../output/processed_features_dataset.csv
[+] Total samples: 6857
[+] Malware samples: 3500
[+] Benign samples: 3357
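The "Not a valid PE file" messages come from whatever parser the extraction script uses, which this README does not show. As a rough independent check, the sketch below tests the two signatures every PE file must carry: the `MZ` magic at offset 0 and the `PE\0\0` signature at the offset stored at 0x3C. The function name `looks_like_pe` is hypothetical, and passing this check does not guarantee that a full parser will accept the file.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Check the DOS 'MZ' magic and the 'PE\\0\\0' signature it points to."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: 32-bit little-endian offset of the PE header, stored at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


if __name__ == "__main__":
    # Build a minimal synthetic header instead of reading a real executable.
    header = bytearray(0x40)
    header[:2] = b"MZ"
    struct.pack_into("<I", header, 0x3C, 0x40)
    print(looks_like_pe(bytes(header) + b"PE\x00\x00"))
    print(looks_like_pe(b"plain text, not an executable"))
```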
4. Train Model (performance metrics are reported after training):
> python train_model.py
Training models: 0%| | 0/2 [00:00<?, ?it/s]
New best model: XGBoost (Recall=0.990)
Training models: 100%|██████████████
