PacketGuard
PacketGuard is a tool that processes .pcap files and generates .csv data, either for user-defined classification or for use with built-in classifiers trained on the UNSW-NB15 dataset. Results are presented in a readable .csv format for further analysis.
Introduction
PacketGuard is a Python-powered solution for .pcap file analysis, offering an alternative to manual inspection with tools like Wireshark. Designed to speed up the analysis of large volumes of network traffic data, PacketGuard generates structured data from .pcap files for integration with classifiers and datasets, replacing time-consuming manual analysis in Wireshark with efficient, automated insights.
Features
- Efficient .pcap Analysis: PacketGuard uses rdpcap to read .pcap files, offering a faster and more efficient alternative to manual analysis in tools like Wireshark.
- Data Generation: Automatically generates structured data from .pcap files, enabling integration with classifiers or datasets for further analysis.
- Classifier Integration: Built-in classifiers trained on datasets like UNSW-NB15-training-set.csv provide quick and reasonably accurate classification of network traffic data.
- Custom Classification: Allows users to define their own classification tasks using the generated data, providing flexibility for specific analysis requirements.
- Readable Output: Presents classified data in a readable .csv format, making it easy for users to interpret and further analyze the results.
- Python-Based: Written in Python, PacketGuard offers a familiar environment for users comfortable with the language, along with the flexibility to customize and extend its functionality.
- User-Friendly Interface: Features an intuitive CLI that simplifies analyzing and interpreting network traffic data, suitable for both novice and experienced users.
Installation
To get started with PacketGuard, simply follow these steps:
- Clone the repository to your local machine:
git clone https://github.com/Ausommet/PacketGuard.git
- Navigate to the project directory:
cd PacketGuard
- Install the required dependencies using pip:
pip install -r requirements.txt
Once the dependencies are installed, you're ready to use PacketGuard for analyzing .pcap files efficiently!
Ensure you have Python and pip installed on your system before proceeding with the installation.
Usage
To utilize the Packet Classifier Application, follow these steps:
- Ensure you have Python installed on your system.
- Clone the repository or download the application files to your local machine.
- Navigate to the directory containing the application files.
- Run the application using the following command:
python main.py -f <path_to_pcap_file> -m <model_choice> -t <training_data>
Example Usage with the provided Test-Files
python main.py -f .\Test-Files\test-pcap-file.pcap -m DTC -t .\Test-Files\UNSW_NB15_training-set.csv
options:
-h, --help show this help message and exit
-f F Path to the .pcap file
-m {DTC,RFC} Choose the training model: DTC (Decision Tree Classifier) or RFC (Random Forest Classifier)
-t T Choose the training model in .csv file format, you can provide your own or use the provided dataset.
- Follow the prompts displayed in the terminal: choose whether to save the generated results (y/n) and, if applicable, whether to save the prediction results (y/n).
Files are always saved in the following directories. For the generated data:
my_project_directory\Results\Generated Data
and for the classification predictions:
my_project_directory\Results\Predictions
- Wait for the application to process the data, train the selected model, and perform prediction.
- Once completed, the application will display the time taken for prediction and any saved results will be available in the specified location.
- You can now analyze the prediction results and further investigate the network traffic data as needed.
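The options listed above map naturally onto Python's argparse module. The following is a minimal sketch of how such a parser could be declared; it is not PacketGuard's actual main.py. The function name build_parser, the required=True settings, and the example file names are assumptions, while the flags and the DTC/RFC choices come from the help text above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented -f, -m and -t options."""
    parser = argparse.ArgumentParser(description="Classify packets from a .pcap file")
    parser.add_argument("-f", required=True, help="Path to the .pcap file")
    parser.add_argument(
        "-m",
        required=True,
        choices=["DTC", "RFC"],
        help="Training model: DTC (Decision Tree) or RFC (Random Forest)",
    )
    parser.add_argument("-t", required=True, help="Path to the training data (.csv)")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["-f", "capture.pcap", "-m", "DTC", "-t", "training.csv"]
)
print(f"pcap={args.f} model={args.m} training={args.t}")
```

Restricting -m with choices means argparse itself rejects any value other than DTC or RFC before the application starts processing.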
If you encounter any issues or have questions, refer to the documentation or reach out to our support team for assistance.
Data Generation
The data generation process in PacketGuard involves analyzing .pcap files and extracting relevant information from each packet. Here's an overview of the steps involved:
- Packet Reading: Packet data is read from the .pcap file using the rdpcap function from the scapy library.
- Packet Information Extraction: The following fields are extracted from the packets: duration, protocol, service, state, packet counts, bytes, rates, TTL (Time to Live), load, packet loss, packet size means, transmission depth, response body length, TCP sequence numbers, and various connection tracking parameters.
- Derived Metrics Calculation: Inter-arrival times, TCP RTT (Round-Trip Time), SYN-ACK and ACK-DAT delays, mean packet sizes, connection tracking metrics, FTP-related metrics, HTTP flow methods, and other metrics are calculated from the extracted packet information.
- Data Structure Initialization: Data structures and variables are initialized to store running averages, total packet sizes, packet counts, connection timestamps, and other relevant information.
- Data Storage: If specified by the user, the generated packet data is stored in a .csv file for further analysis and processing.
- Iterative Processing: The packet processing and information extraction are performed iteratively over all packets in the .pcap file, so that every packet contributes to the generated data.
Overall, this process ensures that essential packet information is captured and organized effectively, enabling users to analyze network traffic data efficiently and derive meaningful insights for classification and analysis tasks.
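To make the steps above concrete, here is a small standard-library sketch of single-pass aggregation: it keeps per-flow counters, derives the duration, mean packet size, and mean inter-arrival time, and writes the result as CSV. It is an illustration only, not PacketGuard's code. The packet tuples, the function names, and the output column names are all hypothetical, and the records are assumed to be sorted by timestamp.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, simplified packet records: (timestamp_seconds, src, dst, size_bytes).
packets = [
    (0.00, "10.0.0.1", "10.0.0.2", 100),
    (0.50, "10.0.0.1", "10.0.0.2", 300),
    (1.50, "10.0.0.1", "10.0.0.2", 200),
    (2.00, "10.0.0.3", "10.0.0.2", 60),
]


def summarize_flows(records):
    """Aggregate per-flow counters in one pass, then derive the metrics."""
    flows = defaultdict(
        lambda: {"packets": 0, "bytes": 0, "first": None, "last": None}
    )
    for ts, src, dst, size in records:
        flow = flows[(src, dst)]
        flow["packets"] += 1
        flow["bytes"] += size
        if flow["first"] is None:
            flow["first"] = ts
        flow["last"] = ts

    rows = []
    for (src, dst), flow in flows.items():
        duration = flow["last"] - flow["first"]
        gaps = flow["packets"] - 1
        rows.append({
            "src": src,
            "dst": dst,
            "packets": flow["packets"],
            "bytes": flow["bytes"],
            "duration": duration,
            "mean_size": flow["bytes"] / flow["packets"],
            # Mean inter-arrival time: duration divided by the number of gaps.
            "mean_interarrival": duration / gaps if gaps else 0.0,
        })
    return rows


def rows_to_csv(rows):
    """Serialize the per-flow rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = summarize_flows(packets)
print(rows_to_csv(rows))
```

Keeping only counters and timestamps per flow means the metrics can be derived without storing every packet, which is the role the running averages and totals play in the description above.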
Models
Decision Tree Classifier (DTC)
The Decision Tree Classifier (DTC) is a machine learning algorithm used for classification tasks. In the project, the DTC model is trained to classify network packet data into different attack categories. Here's a description of the DTC model used in the project:
- Data Preprocessing:
  - The dataset, presumably containing features extracted from network packets, is loaded from a CSV file.
  - Rows with specified values, such as "-", are removed from certain columns to clean the data.
  - Rows with "Generic" values in the "service" column are removed.
  - The "attack_cat" column is label encoded using LabelEncoder() to convert categorical labels into numerical format.
  - Categorical variables are one-hot encoded to prepare the data for model training.
- Model Training:
  - The dataset is split into features (X) and the target variable (y).
  - The dataset is further split into training and testing sets using train_test_split() from sklearn.model_selection.
  - A Decision Tree classifier is initialized with default parameters.
  - The model is trained on the training data using fit().
  - The trained model is saved to a pickle file along with the label encoder.
- Model Evaluation:
  - The trained model is used to make predictions on the testing set.
  - Classification report and accuracy score are printed to evaluate the model's performance.
- Time Measurement:
  - The time taken to train the model is measured and printed.
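The pipeline described above could look roughly like the following sketch. It is not the project's code: the tiny synthetic DataFrame stands in for the training CSV, the use of pandas and get_dummies is an assumption, only two columns are checked for "-", and stratify is added solely so that every class appears in this very small test split.

```python
import time

import pandas as pd
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training CSV (a real run would load the file instead).
base_rows = [
    ("tcp", "http", 500, "Normal"),
    ("udp", "dns", 120, "Normal"),
    ("tcp", "-", 40, "DoS"),
    ("tcp", "http", 5200, "DoS"),
    ("udp", "dns", 110, "Normal"),
    ("tcp", "ftp", 900, "Exploits"),
    ("udp", "dns", 130, "Normal"),
    ("tcp", "ftp", 950, "Exploits"),
]
df = pd.DataFrame(base_rows * 5, columns=["proto", "service", "sbytes", "attack_cat"])

# Data preprocessing: drop rows holding the placeholder "-" in the checked columns.
columns_to_check = ["proto", "service"]
df = df[~df[columns_to_check].isin(["-"]).any(axis=1)]

# Label-encode the target and one-hot encode the categorical features.
encoder = LabelEncoder()
y = encoder.fit_transform(df["attack_cat"])
X = pd.get_dummies(df.drop(columns=["attack_cat"]))

# stratify is an addition for this toy data, not something the description states.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Model training with default parameters, timed as in the description.
start = time.perf_counter()
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
training_seconds = time.perf_counter() - start

# Model evaluation on the held-out split.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
labels = list(range(len(encoder.classes_)))
print(classification_report(
    y_test, y_pred, labels=labels, target_names=encoder.classes_, zero_division=0
))
print(f"accuracy={accuracy:.2f}, trained in {training_seconds:.4f}s")
```

Encoding the features before the split keeps the one-hot columns identical in the training and testing sets.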
Random Forest Classifier (RFC)
The Random Forest Classifier (RFC) is an ensemble learning method that constructs a multitude of decision trees during training and outputs the class that is the mode of the classes predicted by the individual trees. Here's a description of the RFC model used in the project:
- Data Preprocessing:
  - Similar data preprocessing steps are performed as in the DTC model.
- Model Training:
  - The dataset is split into features (X) and the target variable (y).
  - The dataset is further split into training and testing sets.
  - A Random Forest classifier is initialized with 100 decision trees (n_estimators=100).
  - The model is trained on the training data.
  - The trained model is saved to a pickle file along with the label encoder.
- Model Evaluation:
  - The trained model is used to make predictions on the testing set.
  - Classification report and accuracy score are printed to evaluate the model's performance.
- Time Measurement:
  - The time taken to train the model is measured and printed.
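As a hedged illustration of the training and saving steps above, the sketch below trains a Random Forest with n_estimators=100 and pickles it together with the label encoder. The data comes from make_classification rather than real packet features, the class names and dictionary keys are hypothetical, and the pickle is kept in memory instead of being written to a file.

```python
import pickle
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic features standing in for the preprocessed packet data.
X, y_numeric = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
names = ["Normal", "DoS", "Exploits"]  # hypothetical category names
encoder = LabelEncoder()
y = encoder.fit_transform([names[i] for i in y_numeric])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train a forest of 100 trees and time the training step.
start = time.perf_counter()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
training_seconds = time.perf_counter() - start

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy={accuracy:.2f}, trained in {training_seconds:.2f}s")

# Store the model and the encoder together so that predictions can later be
# mapped back to category names. pickle.dump() would write the same bytes to a file.
blob = pickle.dumps({"model": model, "label_encoder": encoder})
restored = pickle.loads(blob)
predicted_names = restored["label_encoder"].inverse_transform(
    restored["model"].predict(X_test)
)
print(predicted_names[:5])
```

Saving the encoder next to the model matters because the classifier only outputs integers; without the encoder the mapping back to attack categories would be lost.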
These models provide a way to classify network packet data into different attack categories, allowing for the detection and analysis of network security threats.
Future Work
Moving forward, the primary objective for this project is to optimize the analysis of pcap files by reading packets through a generator instead of loading them with rdpcap. Because a generator yields packets one at a time rather than holding the whole capture in memory, this transition is expected to reduce memory use and improve processing speed, particularly on large captures.
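A minimal sketch of the generator idea, using only the standard library, is shown below. It is not the planned implementation: the function names are hypothetical, and it assumes the classic microsecond-resolution .pcap format (not pcapng). Scapy also ships a PcapReader class that can be iterated packet by packet, which may be the more direct route in practice.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

GLOBAL_HEADER_SIZE = 24  # classic pcap file header
RECORD_HEADER_SIZE = 16  # per-packet header: ts_sec, ts_usec, incl_len, orig_len


def iter_pcap_records(stream: BinaryIO) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, raw_bytes) one packet at a time from a classic pcap stream."""
    header = stream.read(GLOBAL_HEADER_SIZE)
    if len(header) < GLOBAL_HEADER_SIZE:
        raise ValueError("truncated pcap global header")
    magic = header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # little-endian capture
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # big-endian capture
    else:
        raise ValueError("unsupported capture format")
    while True:
        record_header = stream.read(RECORD_HEADER_SIZE)
        if len(record_header) < RECORD_HEADER_SIZE:
            return  # end of file
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", record_header
        )
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return  # truncated final record
        yield ts_sec + ts_usec / 1_000_000, data


def build_demo_capture() -> io.BytesIO:
    """Build a two-packet capture in memory so the sketch needs no file."""
    global_header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    body = b""
    for ts_sec, payload in [(1, b"abc"), (2, b"defgh")]:
        body += struct.pack("<IIII", ts_sec, 500000, len(payload), len(payload))
        body += payload
    return io.BytesIO(global_header + body)


for timestamp, raw in iter_pcap_records(build_demo_capture()):
    print(timestamp, len(raw))
```

Only one packet is held in memory at a time, so per-packet processing can start before the rest of the file has been read.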
Moreover, attention will be directed towards refining the pre-processing stage. The existing procedure, while functional, tends to excessively discard data, potentially eliminating valuable metrics essential for predictive analysis. To address this, a comprehensive review of the pre-processing steps will be undertaken, aiming to strike a balance between data refinement and preservation of pertinent predictive features.
# Preprocessing steps to remove rows with specified values
columns_to_check = ["proto", "service", "state", "spkts", "dpkts",
                    "sbytes", "dbytes", "swin", "dwin", "stcpb", "dtcpb"]
# Remove rows where any of the specified columns
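As a hedged illustration of the review described above, the following sketch counts how many rows a filter of this kind would discard and which columns are responsible. It is not part of PacketGuard: the sample data is invented, the function name is hypothetical, only four of the listed columns are used, and "-" is assumed to be the placeholder value, as mentioned in the preprocessing description.

```python
import csv
import io

# Invented sample; the column names are a subset of the list above.
sample_csv = """proto,service,state,spkts
tcp,http,FIN,10
udp,-,INT,2
tcp,-,FIN,8
tcp,ftp,-,4
"""

columns_to_check = ["proto", "service", "state", "spkts"]


def count_discarded(csv_text, columns, placeholder="-"):
    """Return (total rows, rows that would be dropped, placeholder hits per column)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    per_column = {name: 0 for name in columns}
    total = 0
    dropped = 0
    for row in reader:
        total += 1
        hits = [name for name in columns if row[name] == placeholder]
        for name in hits:
            per_column[name] += 1
        if hits:
            dropped += 1
    return total, dropped, per_column


total, dropped, per_column = count_discarded(sample_csv, columns_to_check)
print(f"{dropped} of {total} rows would be removed")
print(per_column)
```

A per-column breakdown like this shows whether a single column accounts for most of the loss, in which case handling that column differently could preserve many rows.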
