EnronEmailCategorization
Email is important and pervasive in people's lives, since a large section of society exchanges digital messages as a means of communication. It is therefore important to keep a mailbox organized and categorized: large volumes of different types of email clutter the mailbox, and in that clutter some important emails may go unnoticed. I performed email categorization based on the body of the email, using the real-world email dataset from Enron Corporation, which consists of over 500,000 emails from over 150 people's inboxes. The experiments were run on a subset of this Enron dataset that was labeled with categories. Because the dataset is large, it could easily be divided into training, testing and validation sets. These were modeled with Naive Bayes, Linear Support Vector and K-nearest neighbors classifiers, using the TF-IDF Vectorizer and Count Vectorizer text feature extractors, to provide baseline results.
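The combination of feature extractors and classifiers described above can be sketched roughly as follows. This is a minimal illustration, not the code in classify.py: the toy corpus, the label names, the function name evaluate_baselines, the split ratio and the choice of MultinomialNB and n_neighbors=3 are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus standing in for labeled Enron email bodies.
BODIES = [
    "please review the attached gas contract before the meeting",
    "the trading desk needs the revised contract terms today",
    "legal has approved the contract for the pipeline deal",
    "send the signed contract to the counterparty by friday",
    "quarterly trading report attached for the business review",
    "meeting moved to three to discuss the power deal terms",
    "the deal team needs the pricing sheet for the contract",
    "business update on the gas pipeline contract attached",
    "are you coming to dinner with the family on saturday",
    "happy birthday hope you have a great weekend with family",
    "lets grab lunch this weekend and catch up with friends",
    "the kids loved the birthday party thanks for coming",
    "dinner on friday with friends sounds great see you there",
    "family vacation photos attached hope you enjoy the weekend",
    "thanks for lunch it was great to catch up with you",
    "party on saturday bring the kids and the family",
]
LABELS = ["business"] * 8 + ["personal"] * 8


def evaluate_baselines(bodies, labels, seed=0):
    """Fit every extractor/classifier pair and return test accuracy per pair."""
    x_train, x_test, y_train, y_test = train_test_split(
        bodies, labels, test_size=0.25, random_state=seed, stratify=labels
    )
    extractors = {"tfidf": TfidfVectorizer, "count": CountVectorizer}
    classifiers = {
        "naive_bayes": lambda: MultinomialNB(),
        "linear_svc": lambda: LinearSVC(),
        "knn": lambda: KNeighborsClassifier(n_neighbors=3),
    }
    scores = {}
    for vec_name, make_vec in extractors.items():
        for clf_name, make_clf in classifiers.items():
            # A fresh vectorizer and classifier for each pair, fitted on train only.
            model = make_pipeline(make_vec(), make_clf())
            model.fit(x_train, y_train)
            scores[(vec_name, clf_name)] = model.score(x_test, y_test)
    return scores


if __name__ == "__main__":
    for pair, accuracy in sorted(evaluate_baselines(BODIES, LABELS).items()):
        print(pair, round(accuracy, 3))
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary fitted on the training split only, so the test split does not leak into the features.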
README
Topic: Automatic email classification and categorization into organized bundles.
Author: Saurabh Rewaskar
README.txt: This file contains the description of the contents of this project and how to execute the program.
data/: The data folder.
categories.txt (http://bailando.sims.berkeley.edu/enron/enron_categories.txt): The categories which are used to label the emails.
classify.py: The main program; this is the program which needs to be executed.
How to run this project:

1] Set up the environment.
   Requires Python 3.6.
   Libraries required: pandas, numpy, scikit-learn. The script also uses timeit, which is part of the Python standard library and needs no separate install.
   (Alternatively, use the Anaconda Data Science platform.)

2] Download the data from http://bailando.sims.berkeley.edu/enron/enron_with_categories.tar.gz and extract it into the data/ folder. After extraction, the data directory should look like this:
   data/
     enron_with_categories/
       1/
       2/
       3/
       4/
       5/
       6/
       7/
       8/

3] Run the script classify.py
   Expected time to complete: ~15 min
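Before starting the roughly 15-minute run, it may help to confirm that steps 1] and 2] succeeded. The following is a small pre-flight sketch, not part of the project: the function names missing_packages and missing_category_dirs are hypothetical, and it assumes the script is started from the project root so that data/ resolves as a relative path.

```python
import importlib.util
import sys
from pathlib import Path

# Import names of the required libraries; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["pandas", "numpy", "sklearn"]
# Sub-folders 1/ to 8/ expected under data/enron_with_categories/.
EXPECTED_SUBDIRS = [str(i) for i in range(1, 9)]


def missing_packages(names=REQUIRED_MODULES):
    """Return the modules from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_category_dirs(data_root="data"):
    """Return the expected sub-folders that are absent after extraction."""
    base = Path(data_root) / "enron_with_categories"
    return [name for name in EXPECTED_SUBDIRS if not (base / name).is_dir()]


if __name__ == "__main__":
    if sys.version_info < (3, 6):
        print("Python 3.6 or newer is required")
    print("Missing libraries:", missing_packages() or "none")
    print("Missing data folders:", missing_category_dirs() or "none")
```

If both lists come back empty, the environment and data layout match what the steps above describe.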
