EnronEmailCategorization

Emails are important and pervasive in people's lives, since a large part of society exchanges digital messages as a means of communication. It is therefore important to keep email organized and categorized: large volumes of different types of email clutter a mailbox, and in a messy inbox some important emails may go unnoticed. I've performed email categorization based on the body of each email, using the real-world Enron email dataset. It consists of over 500,000 emails from the inboxes of over 150 Enron Corporation employees. The experiments were run on a subset of the Enron dataset that is labeled with categories. Because the dataset is large, it could easily be divided into training, validation and testing sets. These sets were modeled with Naive Bayes, Linear Support Vector Machine and K-Nearest Neighbors classifiers to provide baseline results, using two text feature extractors: TF-IDF Vectorizer and Count Vectorizer.
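The baseline setup described above (two feature extractors, each paired with three classifiers) can be sketched with scikit-learn as follows. This is a minimal illustration, not the code in classify.py. The helper name `run_baselines`, the 60/20/20 train/validation/test split, the use of `MultinomialNB` as the Naive Bayes variant, `KNeighborsClassifier(n_neighbors=3)` and the default hyperparameters are all assumptions. Each extractor and classifier is wrapped in a pipeline, so the vectorizer is fitted on the training split only.

```python
from itertools import product

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def run_baselines(texts, labels, seed=0):
    """Return {(extractor, classifier): validation accuracy}.

    Assumption: a 60/20/20 train/validation/test split; the split used
    by classify.py is not documented in the README.
    """
    # 60% train, 40% rest; the rest is then halved into validation and test.
    x_train, x_rest, y_train, y_rest = train_test_split(
        texts, labels, test_size=0.4, random_state=seed)
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.5, random_state=seed)
    # x_test / y_test are held back for a final evaluation of the chosen model.

    extractors = {"count": CountVectorizer, "tfidf": TfidfVectorizer}
    classifiers = {
        "naive_bayes": MultinomialNB,
        "linear_svc": LinearSVC,
        "knn": lambda: KNeighborsClassifier(n_neighbors=3),
    }

    scores = {}
    for (e_name, make_ext), (c_name, make_clf) in product(
            extractors.items(), classifiers.items()):
        # The pipeline fits the vectorizer on the training texts only.
        model = make_pipeline(make_ext(), make_clf())
        model.fit(x_train, y_train)
        scores[(e_name, c_name)] = accuracy_score(y_val, model.predict(x_val))
    return scores
```

Comparing all six validation scores side by side is what makes these numbers usable as baselines; the held-out test split is touched only once a combination has been chosen.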

README

Topic: Automatic email classification and categorization into organized bundles.
Author: Saurabh Rewaskar

README.txt: Describes the contents of this project and how to execute the program.
data/: The data folder.
categories.txt (http://bailando.sims.berkeley.edu/enron/enron_categories.txt): The categories used to label the emails.
classify.py: The main program; this is the script to execute.
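The label codes come from categories.txt. A minimal parsing sketch follows, assuming each subcategory line starts with a dotted code such as `1.2` followed by its description. That layout is an assumption, so check it against the downloaded file. The helper `parse_categories` is hypothetical. It maps `(top, sub)` integer pairs to descriptions and ignores every other line, including undotted group titles.

```python
import re

# Assumed line shape: "<top>.<sub> <description>".
# Lines without a dotted code (group titles, blank lines) do not match.
_CODE = re.compile(r"^\s*(\d+)\.(\d+)\s+(.*\S)\s*$")


def parse_categories(text):
    """Map (top, sub) integer pairs to their category descriptions."""
    categories = {}
    for line in text.splitlines():
        match = _CODE.match(line)
        if match:
            top, sub, description = match.groups()
            categories[(int(top), int(sub))] = description
    return categories
```

Keying on integer pairs rather than strings lets the codes be compared directly with numeric labels read from elsewhere in the dataset.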

How to run this project:

1] Set up the environment. Requires Python 3.6. Required libraries: pandas, numpy and scikit-learn; timeit is part of the Python standard library and needs no separate installation. (Alternatively, use the Anaconda Data Science platform.)

2] Download the data from http://bailando.sims.berkeley.edu/enron/enron_with_categories.tar.gz and extract it into the data/ folder. After extraction, the data directory should look like this:

data/
    enron_with_categories/
        1/
        2/
        3/
        4/
        5/
        6/
        7/
        8/

3] Run the script classify.py. Expected time to complete: about 15 minutes.
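The three setup steps above can be sketched as a small helper script. It is hypothetical and not part of the repository. The names `missing_packages`, `fetch_and_extract`, `missing_folders` and `load_emails` are illustrative. The script assumes that the archive unpacks to `enron_with_categories/`, as the directory layout above implies, and that each email is stored as a `.txt` file. Using the numbered folder as an email's label is also an assumption, since classify.py may derive labels differently. `timeit.default_timer` only times the loading stage.

```python
"""Hypothetical setup helper for the three steps (not part of the repo)."""
import importlib.util
import tarfile
import urllib.request
from pathlib import Path
from timeit import default_timer

DATA_URL = "http://bailando.sims.berkeley.edu/enron/enron_with_categories.tar.gz"
EXPECTED_FOLDERS = [str(i) for i in range(1, 9)]
REQUIRED = ["pandas", "numpy", "sklearn"]  # scikit-learn is imported as "sklearn"


def missing_packages(names=REQUIRED):
    """Step 1: return the required packages that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def fetch_and_extract(data_dir="data", url=DATA_URL):
    """Step 2: download the archive unless already present, then extract it."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    archive = data_dir / "enron_with_categories.tar.gz"
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))
    with tarfile.open(str(archive), "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support extraction filters that block unsafe paths.
            tar.extractall(str(data_dir), filter="data")
        else:
            tar.extractall(str(data_dir))
    return data_dir


def missing_folders(data_dir="data"):
    """Return which of the numbered folders 1-8 are missing after extraction."""
    root = Path(data_dir) / "enron_with_categories"
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


def load_emails(data_dir="data"):
    """Read every *.txt email, labeled (by assumption) with its folder number."""
    import pandas as pd  # imported here so step 1 can run without pandas

    root = Path(data_dir) / "enron_with_categories"
    rows = [
        {"label": int(path.parent.name),
         "text": path.read_text(encoding="utf-8", errors="replace")}
        for path in sorted(root.glob("*/*.txt"))
    ]
    return pd.DataFrame(rows, columns=["label", "text"])


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    elif missing_folders():
        print("Data not found; call fetch_and_extract() first")
    else:
        start = default_timer()
        emails = load_emails()
        print(f"Loaded {len(emails)} emails in {default_timer() - start:.1f} s")
```

Run as a script, it reports missing packages first, then missing data. Otherwise it prints how many emails were loaded and how long loading took. Calling `fetch_and_extract()` once performs step 2. If the archive is already in data/, the download is skipped.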
