EnrichEvent
Official source code for the paper "EnrichEvent: Enriching Social Data with Contextual Information for Emerging Event Extraction", published in the Iran Journal of Computer Science.
Introduction
Social platforms have emerged as crucial channels for disseminating information and discussing real-life social events, offering researchers an excellent opportunity to design and implement novel event detection frameworks. However, most existing approaches exploit only keyword burstiness or network structures to detect unspecified events. As a result, they often struggle to identify unknown events, given the challenging nature of both events and social data. Social data, e.g., tweets, is characterized by misspellings, incompleteness, word sense ambiguity, irregular language, and variation in the aspects of opinions. Moreover, extracting discriminative features and patterns for evolving events from limited structural knowledge is nearly infeasible. To address these challenges, in this paper, we propose a novel framework, EnrichEvent, that leverages the linguistic and contextual representations of streaming social data. In particular, we leverage contextual and linguistic knowledge to detect semantically related tweets and enhance the effectiveness of event detection. Finally, our proposed framework produces a cluster chain for each event to show how the event evolves over time. We conducted extensive experiments to evaluate our framework, validating its high performance and effectiveness in detecting and distinguishing unspecified social events.
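The idea of grouping semantically related tweets can be illustrated with a generic single-pass clustering over precomputed tweet embeddings. This is a minimal sketch, not the EnrichEvent algorithm: the function names, the cosine-similarity criterion, and the 0.8 threshold are assumptions for illustration, and the vectors are assumed to come from some contextual encoder.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def single_pass_cluster(vectors: List[Sequence[float]], threshold: float = 0.8) -> List[List[int]]:
    """Hypothetical helper: each vector joins the most similar existing cluster
    (compared against its centroid) if the similarity is >= threshold;
    otherwise it starts a new cluster. Returns lists of vector indices."""
    clusters: List[List[int]] = []
    centroids: List[List[float]] = []
    for i, v in enumerate(vectors):
        best, best_sim = -1, threshold
        for c, centroid in enumerate(centroids):
            sim = cosine(v, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best == -1:
            clusters.append([i])
            centroids.append(list(v))
        else:
            clusters[best].append(i)
            n = len(clusters[best])
            # Incremental mean update of the chosen cluster's centroid
            centroids[best] = [(m * (n - 1) + x) / n for m, x in zip(centroids[best], v)]
    return clusters


print(single_pass_cluster([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]))  # → [[0, 1], [2]]
```

Single-pass clustering suits streaming input because each message is processed once; the trade-off is that results depend on arrival order and on the chosen threshold.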
Inputs & Outputs
- Input: Streams of message blocks.
- Output: The detected events, each presented as a cluster chain.
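The cluster-chain output format can be pictured with a small sketch that links per-block clusters across consecutive message blocks. This is a hypothetical illustration, not the repository's code: each cluster is assumed to be summarized as a set of salient terms, linking uses Jaccard overlap with an assumed threshold of 0.3, and build_chains and jaccard are illustrative names.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: a cluster is the set of its salient terms.
Cluster = Set[str]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_chains(blocks: List[List[Cluster]], threshold: float = 0.3) -> List[List[Tuple[int, int]]]:
    """Link clusters of consecutive blocks into chains of (block, cluster) pairs.

    A cluster in block t extends the chain whose tail (a cluster in block t-1)
    it overlaps most, if the overlap is >= threshold and that chain has not
    already been extended in block t; otherwise it starts a new chain.
    """
    chains: List[List[Tuple[int, int]]] = []
    prev_tail: Dict[int, int] = {}  # cluster index in block t-1 -> chain index
    for t, clusters in enumerate(blocks):
        tail: Dict[int, int] = {}
        used: Set[int] = set()  # chains already extended in this block
        for j, cluster in enumerate(clusters):
            best_chain: Optional[int] = None
            best_sim = threshold
            for i, chain_idx in prev_tail.items():
                if chain_idx in used:
                    continue
                sim = jaccard(cluster, blocks[t - 1][i])
                if sim >= best_sim:
                    best_chain, best_sim = chain_idx, sim
            if best_chain is None:
                chains.append([(t, j)])
                tail[j] = len(chains) - 1
            else:
                chains[best_chain].append((t, j))
                used.add(best_chain)
                tail[j] = best_chain
        prev_tail = tail
    return chains


blocks = [
    [{"quake", "city"}, {"match", "goal"}],
    [{"quake", "rescue", "city"}],
    [{"rescue", "quake"}, {"election"}],
]
print(build_chains(blocks))  # → [[(0, 0), (1, 0), (2, 0)], [(0, 1)], [(2, 1)]]
```

In this toy run the "quake" event persists across all three blocks, the "match" event ends after the first block, and "election" emerges in the last one.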
How to Run
- Open main.ipynb.
- Initialize and customize the parameters based on your requirements.
- Run all cells in main.ipynb.
- The results will be saved in the specified output directory.
About the Dataset
- You can find the details of our proposed datasets in the /Dataset folder.
  - Note: You may also use your own dataset, but ensure its structure and column names are compatible with the model.
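Before running on your own data, a quick header check can catch incompatible column names. This is a minimal sketch assuming the dataset is stored as CSV; REQUIRED_COLUMNS is a hypothetical placeholder, and the columns the model actually expects should be taken from the files in the /Dataset folder.

```python
import csv
import io
from typing import Iterable, List, TextIO

# Hypothetical column names for illustration only; replace them with the
# columns used by the datasets in the /Dataset folder.
REQUIRED_COLUMNS = ("text", "created_at")


def missing_columns(csv_file: TextIO, required: Iterable[str] = REQUIRED_COLUMNS) -> List[str]:
    """Return the required column names that are absent from the CSV header row."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # empty file -> empty header
    present = {name.strip() for name in header}
    return [col for col in required if col not in present]


sample = io.StringIO("id,text,created_at\n1,hello world,2020-01-01\n")
print(missing_columns(sample))  # → []
```

For a real file, pass an open handle, e.g. `open("my_data.csv", newline="", encoding="utf-8")`, where the file name is a placeholder.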
Training the Trend Detection Model
- Navigate to the /Trend_Detection folder.
- Use train.py to build and train the trend detection model.
  - Note: A labeled dataset is required. You can use dataset_labeling.py to label your dataset based on key phrases.
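Labeling by key phrases can be sketched as a whole-word, case-insensitive match. This is an assumption about the general approach, not the contents of dataset_labeling.py: label_by_key_phrases is a hypothetical helper that marks a message as positive (1) when it contains any of the given phrases and negative (0) otherwise.

```python
import re
from typing import Iterable, List


def label_by_key_phrases(texts: Iterable[str], key_phrases: Iterable[str]) -> List[int]:
    """Label each text 1 if it contains any key phrase as whole words
    (case-insensitive), else 0.

    Whole-word matching via \\b assumes phrases start and end with word
    characters; hashtags such as '#tag' would not match.
    """
    patterns = [
        re.compile(r"\b" + re.escape(p.strip()) + r"\b", re.IGNORECASE)
        for p in key_phrases
        if p.strip()  # skip empty phrases
    ]
    return [1 if any(pat.search(t) for pat in patterns) else 0 for t in texts]


texts = ["Huge earthquake hits the city", "Great goal in the match", "earthquakes are rare"]
print(label_by_key_phrases(texts, ["earthquake"]))  # → [1, 0, 0]
```

Whole-word matching avoids counting "earthquakes" as "earthquake"; if plural or inflected forms should count, the phrase list (or the pattern) has to cover them explicitly.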
Training the Event Summarization Model
- Navigate to the /Event_Summarization folder.
- Use train.py to build and train the event summarization model.
  - Note: A pre-trained embedding model for the language of your dataset is required.
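One way pre-trained embeddings can support summarization is a generic extractive heuristic: select the messages closest to the cluster centroid in embedding space. This is a swapped-in technique for illustration, not the model trained by train.py; representative_messages is a hypothetical helper, and the vectors are assumed to come from a pre-trained embedding model suited to the dataset's language.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def representative_messages(texts: List[str], vectors: List[Sequence[float]], k: int = 1) -> List[str]:
    """Return the k texts whose embeddings are most similar to the cluster centroid."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    if not texts:
        return []
    dim = len(vectors[0])
    # Centroid = element-wise mean of the cluster's embeddings
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    ranked = sorted(range(len(texts)), key=lambda i: _cosine(vectors[i], centroid), reverse=True)
    return [texts[i] for i in ranked[:k]]


texts = ["a", "b", "c"]
vectors = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
print(representative_messages(texts, vectors, k=2))  # → ['b', 'a']
```

The quality of such a summary depends entirely on the embedding space, which is why the embedding model should match the dataset's language.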
Citation
For more details about the methodology, please refer to our paper:
@article{Esfahani2025EnrichEvent,
title={EnrichEvent: Enriching Social Data with Contextual Information for Emerging Event Extraction},
author={Mohammadali Sefidi Esfahani and Mohammad Akbari},
journal={Iran Journal of Computer Science},
year={2025},
doi={10.1007/s42044-025-00284-2}
}
You can also download the paper from arXiv. Please feel free to contact me with any questions or concerns.
