FlowMiner
FlowMiner: A Powerful GNN Model Based on Flow Correlation Mining for Encrypted Traffic Classification
(1) The detailed traffic features relied upon by Diting to construct edges are shown in the following table.

| Features | Type | Descriptions | Related information |
|---|---|---|---|
| Application Name | str | nDPI detected application name | Server Side |
| Application Category Name | str | nDPI detected application category name | Server Side |
| Application is Guessed | int | Indicates if detection result is based on pure dissection or on a guess heuristics | Server Side |
| Requested server name | str | SSL/TLS, DNS, HTTP | Server Side |
| Client Fingerprint | str | DHCP fingerprint for DHCP, JA3 for SSL/TLS and HASSH for SSH | Client Side |
| Server Fingerprint | str | JA3S for SSL/TLS and HASSH for SSH | Server Side |
| User agent | str | Extracted user agent for HTTP or User Agent Identifier for QUIC | Client Side |
| Content type | str | Extracted HTTP content type | Server Side |

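
The table lists the features but does not spell out how a shared value becomes an edge. As a minimal sketch, assuming two flows are linked whenever they share the same non-empty value for one of these features, edge construction could look like the code below. The function name `build_edges`, the snake_case keys, and the dict-based flow records are hypothetical and are not taken from the FlowMiner code. `Application is Guessed` is left out of this sketch on the assumption that a binary flag would link large groups of unrelated flows.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical snake_case names for the rows of the table above.
EDGE_FEATURES = (
    "application_name",
    "application_category_name",
    "requested_server_name",
    "client_fingerprint",
    "server_fingerprint",
    "user_agent",
    "content_type",
)


def build_edges(flows, features=EDGE_FEATURES):
    """Return (i, j) index pairs, i < j, for flows sharing a non-empty feature value.

    Assumption: "sharing a value" is the edge criterion; the real rule may differ.
    """
    buckets = defaultdict(list)
    for idx, flow in enumerate(flows):
        for name in features:
            value = flow.get(name)
            if value:  # skip missing or empty values
                buckets[(name, value)].append(idx)

    edges = set()
    for members in buckets.values():
        # members are appended in ascending index order, so each pair has i < j
        edges.update(combinations(members, 2))
    return edges


# Toy flow records, invented for illustration only.
flows = [
    {"requested_server_name": "example.com", "client_fingerprint": "abc"},
    {"requested_server_name": "example.com", "client_fingerprint": "def"},
    {"requested_server_name": "other.org", "client_fingerprint": "def"},
    {"requested_server_name": "", "client_fingerprint": ""},
]
edges = build_edges(flows)
```

In this toy input, flows 0 and 1 share a server name and flows 1 and 2 share a client fingerprint, while the last flow has no usable values and stays isolated.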

(2) The detailed category names for the ISCX VPN-nonVPN and ISCX Tor-nonTor datasets are listed below.

ISCX-VPN detailed categories (5 classes):
- vpn_chat
- vpn_voip
- vpn_email
- vpn_file
- vpn_video

ISCX-nonVPN detailed categories (5 classes):
- chat
- video
- voip
- file

ISCX-Tor detailed categories (8 classes):
- tor_p2p
- tor_mail
- tor_browsing
- tor_voip
- tor_video
- tor_file
- tor_audio
- tor_chat

ISCX-nonTor detailed categories (8 classes):
- file
- audio
- video
- browsing
- voip
- chat
- p2p

(3) The detailed node traffic features constructed by Diting are shown in the following table.

| ID | Feature name | Directions | Statistical features | Category |
|---|---|---|---|---|
| 1 | protocol | bidirectional | - | Content Features |
| 2 | ip_version | bidirectional | - | Content Features |
| 3-4 | duration time | bidirectional, single | - | Temporal Features |
| 5-14 | packet length | bidirectional, single | origin, min, max, mean, std | Spatial Features |
| 15-16 | bytes size | bidirectional, single | - | Spatial Features |
| 17-24 | packets interval | bidirectional, single | min, max, mean, std | Temporal Features |
| 25-26 | syn packet number | bidirectional, single | - | Content Features |
| 27-28 | cwr packet number | bidirectional, single | - | Content Features |
| 29-30 | ece packet number | bidirectional, single | - | Content Features |
| 31-32 | urg packet number | bidirectional, single | - | Content Features |
| 33-34 | ack packet number | bidirectional, single | - | Content Features |
| 35-36 | psh packet number | bidirectional, single | - | Content Features |
| 37-38 | rst packet number | bidirectional, single | - | Content Features |
| 39-40 | fin packet number | bidirectional, single | - | Content Features |
| 41 | application_name | bidirectional | - | Content Features |
| 42 | application_category_name | bidirectional | - | Content Features |
| 43 | application_is_guessed | bidirectional | - | Content Features |
| 44 | user_agent | bidirectional | - | Content Features |
| 45 | requested server name | bidirectional | - | Content Features |
| 46 | content_type | bidirectional | - | Content Features |
| 47 | JA3/JA3S fingerprint | bidirectional | - | Content Features |
| 48 | payload bytes | bidirectional | - | Byte Features |
| 49-304 | payload bytes value distribution | bidirectional | - | Byte Features |
| 305-378 | payload length distribution | bidirectional | - | Byte Features |
| 379-396 | popcount value | bidirectional | max, min, mean, med, mode, std, var, rng, skew, kurt, P25, P50, P75, P90, P95, Q1, Q2, Q3 | Byte Features |
| 397-414 | printable characters number | bidirectional | max, min, mean, med, mode, std, var, rng, skew, kurt, P25, P50, P75, P90, P95, Q1, Q2, Q3 | Byte Features |
| 415 | total popcount value | bidirectional | - | Byte Features |
| 416 | total printable characters number | bidirectional | - | Byte Features |
| 417-641 | Higher-order Cross Features | - | - | Cross Features |

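
To make the byte-level rows more concrete, here is a rough sketch of three of them for a single payload: the byte value distribution (IDs 49-304 span 256 entries, which would match one entry per possible byte value), the popcount, and the printable character count. The helper names are hypothetical, and normalizing the histogram to frequencies and using `string.printable` as the definition of "printable" are assumptions; the published `feature_extraction.py` may do this differently.

```python
import string

# Assumption: "printable" means the ASCII characters in string.printable.
PRINTABLE = frozenset(string.printable.encode("ascii"))


def byte_value_distribution(payload):
    """256-bin histogram of byte values, normalized to frequencies (assumed)."""
    counts = [0] * 256
    for b in payload:  # iterating over bytes yields ints in 0..255
        counts[b] += 1
    total = len(payload)
    if total == 0:
        return [0.0] * 256
    return [c / total for c in counts]


def popcount(payload):
    """Total number of set bits across all bytes of the payload."""
    return sum(bin(b).count("1") for b in payload)


def printable_count(payload):
    """Number of bytes that are printable ASCII characters."""
    return sum(1 for b in payload if b in PRINTABLE)


# Toy payload, invented for illustration only.
payload = b"GET /\x00\xff"
dist = byte_value_distribution(payload)
bits = popcount(payload)
printable = printable_count(payload)
```

Applying `popcount` and `printable_count` per packet and then summarizing the resulting series would be one way to obtain the 18 statistics listed for IDs 379-396 and 397-414, though that per-packet granularity is also an assumption.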
In the table above, various abbreviations represent different statistical features:
- max: Maximum, the largest observed value.
- min: Minimum, the smallest observed value.
- mean: Arithmetic mean, the sum of all values divided by their count.
- med: Median, the middle value when the values are sorted.
- mode: Mode, the most frequently occurring value.
- std: Standard deviation, a measure of how far the values spread around the mean.
- var: Variance, the average squared deviation from the mean (the square of the standard deviation).
- rng: Range, the difference between the largest and smallest values.
- skew: Skewness, a measure of the asymmetry of the distribution.
- kurt: Kurtosis, a measure of how heavy the tails of the distribution are.
- P25, P50, P75, P90, P95: Percentiles, the values below which the given percentage of observations fall. For instance, 25% of the data falls below P25.
- Q1, Q2, Q3: Quartiles, the values that divide the sorted data into four equal parts. Q1 equals P25, Q2 equals the median (P50), and Q3 equals P75.
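
The 18 statistics above can be computed with the Python standard library alone. The sketch below is illustrative rather than the project's own code: the names `percentile` and `summary_stats` are hypothetical, and the conventions chosen here (population variance, excess kurtosis, linear-interpolation percentiles) are assumptions that may not match `feature_extraction.py`.

```python
import math
import statistics


def percentile(sorted_vals, p):
    """Linear-interpolation percentile (p in [0, 100]) of pre-sorted values."""
    k = (len(sorted_vals) - 1) * p / 100
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def summary_stats(values):
    """Return the 18 statistics as a dict keyed by the abbreviations above."""
    xs = sorted(values)
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # population variance (assumed)
    std = math.sqrt(var)
    # Standardized third and fourth moments; defined as 0.0 for constant input.
    skew = sum((x - mean) ** 3 for x in xs) / n / std ** 3 if std else 0.0
    kurt = sum((x - mean) ** 4 for x in xs) / n / std ** 4 - 3 if std else 0.0

    stats = {
        "max": xs[-1], "min": xs[0], "mean": mean,
        "med": statistics.median(xs), "mode": statistics.mode(xs),
        "std": std, "var": var, "rng": xs[-1] - xs[0],
        "skew": skew, "kurt": kurt,
    }
    for p in (25, 50, 75, 90, 95):
        stats["P%d" % p] = percentile(xs, p)
    # Quartiles coincide with P25, P50, and P75 by definition.
    stats["Q1"], stats["Q2"], stats["Q3"] = stats["P25"], stats["P50"], stats["P75"]
    return stats


# Toy series, invented for illustration only.
stats = summary_stats([1, 2, 2, 3, 4, 5, 8])
```

Because Q1, Q2, and Q3 duplicate P25, P50, and P75 (and Q2 also equals med), the 18 values carry fewer than 18 independent quantities; the sketch keeps all of them only to mirror the table layout.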

(4) For easy replication, we have published the entire feature extraction code. feature_extraction.py is the main script.
