Maqasid
Maqāṣid is a deep learning framework for multi-label thematic classification of Arabic poetry.
Maqasid is an end-to-end research framework designed to address the critical challenges in the computational thematic analysis of Arabic poetry. It provides a robust methodology and a suite of tools for researchers, developers, and digital humanists to explore the rich, multifaceted themes inherent in one of the world's oldest literary traditions.
This project moves beyond single-label classification by introducing a novel hierarchical thematic taxonomy and a hybrid CNN-BiLSTM deep learning model that can represent thematic complexity and overlap.
➡️ Live Demo
📖 Table of Contents
- ✨ Key Features
- ⚡ Project Structure
- 📊 The Mana Corpus
- 🔬 Interactive Exploration with Google Colab
- 📦 Technology Stack
- 🚀 Getting Started
- 📜 How to Cite
- 📄 License
✨ Key Features
- Multi-Label Classification: Assigns multiple co-occurring themes to a single poem, reflecting the fact that a poem often treats several themes at once.
- Hierarchical Thematic Schema: A novel taxonomy based on seven authoritative works of Arabic literary criticism, capturing thematic nuances with up to four levels of specificity.
- Poetry-Specific Embeddings: Uses a custom FastText model trained from scratch on our poetry corpus to capture archaic and metaphorical language.
- Reproducible MLOps Pipeline: The project follows modern MLOps practices, using DVC for data versioning to ensure full reproducibility.
- Interactive Tools: A user-friendly web application and a Google Colab notebook for model demonstration and in-depth corpus exploration.
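In a multi-label setup each theme is scored independently (for example, with a per-theme sigmoid output rather than a single softmax), so several themes can be active for the same poem. The sketch below shows one way to turn such scores into a prediction that respects a hierarchical taxonomy. It is illustrative only: the 0.5 threshold, the `" > "` path convention for labels, the function names, and the scores are assumptions, not the Maqasid implementation.

```python
# Hypothetical sketch: turning independent per-theme scores into a
# hierarchical multi-label prediction. The threshold, the " > " path
# convention, and the scores below are illustrative assumptions.
from typing import Dict, List, Set

SEPARATOR = " > "  # assumed separator between taxonomy levels (up to four)


def ancestors(label: str) -> List[str]:
    """Return every prefix of a label path, from the root down to the label itself."""
    parts = label.split(SEPARATOR)
    return [SEPARATOR.join(parts[: i + 1]) for i in range(len(parts))]


def predict_themes(probs: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Keep every theme whose score clears the threshold (multi-label),
    then add its ancestors so the prediction is consistent with the hierarchy."""
    selected: Set[str] = set()
    for label, p in probs.items():
        if p >= threshold:
            selected.update(ancestors(label))
    return selected


if __name__ == "__main__":
    # Made-up scores for one poem; each theme is scored independently.
    scores = {
        "Love Poetry > Chaste Love > Love from a distance": 0.81,
        "Praise Poetry": 0.64,
        "Satire": 0.12,
    }
    for theme in sorted(predict_themes(scores)):
        print(theme)
```

Adding ancestors after thresholding keeps a prediction internally consistent: a poem tagged "Love from a distance" is also tagged "Chaste Love" and "Love Poetry". Tuning a separate threshold per theme on validation data is a common refinement.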
⚡ Project Structure
The repository is organized into several key directories, each serving a specific purpose to ensure modularity and clarity.
/
├── src/ # Contains the core source code of the Maqasid framework,
│ # including data processing, model architecture, and training logic.
│
├── web_app/ # Source code for the interactive Streamlit web application,
│ # which provides a user-friendly interface for the model.
│
├── test/ # Includes unit and integration tests to ensure the reliability
│ # and correctness of the framework's components.
│
├── images/ # Static image assets used in the documentation and web app.
│
├── .dvc/ # Directory for DVC metadata (not shown in repo view).
├── dvc.yaml # Defines the stages of the DVC data pipeline.
└── README.md # This documentation file.
📊 The Mana Corpus
The Maqasid framework was trained and evaluated on the Mana (مَعنَى) Corpus, a large-scale, thematically annotated dataset of Arabic poetry developed as part of this research. The corpus is a key contribution of our work and is hosted in its own dedicated repository.
It features a gold-standard, expert-annotated set and a large, computationally-annotated extension.
➡️ Explore and Download the Mana Corpus Here
For all details regarding the corpus structure, metadata, and usage, please refer to the documentation in the Mana repository.
🔬 Interactive Exploration with Google Colab
To make the Mana Corpus more accessible and to encourage hands-on analysis, we have developed an interactive Google Colab notebook. It lets anyone, from students to seasoned researchers, visually explore, filter, and analyze the dataset directly in the browser with zero setup.
➡️ Open the Interactive Explorer in Google Colab
The notebook features two powerful, user-friendly dashboards:
1. Thematic Poem Browser
This dashboard provides an intuitive way to navigate the corpus through its rich thematic hierarchy. It allows you to:
- Drill-Down Through Themes: Start from broad categories (e.g., "Love Poetry") and progressively narrow your focus to highly specific sub-themes (e.g., "Chaste Love" → "Love from a distance").
- Instantly Access Poems: As you select a theme, the interface immediately populates a list of all poems annotated with that specific theme.
- View Detailed Poem Analysis: Clicking on a poem reveals its full text, essential metadata (poet, era), and an interactive pie chart that visualizes its complete thematic composition.
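The drill-down behaviour described above amounts to prefix matching over theme paths. The sketch below is a hypothetical illustration, not code from the notebook: it assumes each poem record carries `title`, `poet`, `era`, and a `themes` list of `" > "`-separated paths, and the toy records and "Placeholder" labels are made up. It also assumes that selecting a theme lists poems tagged with that theme or with any of its sub-themes.

```python
# Hypothetical sketch of the Thematic Poem Browser's drill-down logic.
# The record layout and all placeholder theme names are illustrative
# assumptions, not the Mana Corpus schema.
from collections import Counter
from typing import Dict, List

SEP = " > "  # assumed separator between taxonomy levels

# Toy records standing in for corpus entries.
POEMS: List[Dict] = [
    {"title": "Poem A", "poet": "Poet 1", "era": "Umayyad",
     "themes": ["Love Poetry > Chaste Love > Love from a distance", "Placeholder Theme"]},
    {"title": "Poem B", "poet": "Poet 2", "era": "Abbasid",
     "themes": ["Love Poetry > Placeholder Sub-theme"]},
]


def sub_themes(poems: List[Dict], parent: str) -> List[str]:
    """Distinct themes exactly one level below `parent` (one drill-down step)."""
    depth = len(parent.split(SEP))
    found = set()
    for poem in poems:
        for label in poem["themes"]:
            parts = label.split(SEP)
            if len(parts) > depth and SEP.join(parts[:depth]) == parent:
                found.add(SEP.join(parts[: depth + 1]))
    return sorted(found)


def poems_under(poems: List[Dict], theme: str) -> List[str]:
    """Titles of poems tagged with `theme` itself or with any of its sub-themes."""
    return [
        p["title"]
        for p in poems
        if any(t == theme or t.startswith(theme + SEP) for t in p["themes"])
    ]


def composition(poem: Dict) -> Dict[str, float]:
    """Share of each top-level theme among one poem's labels (pie-chart data)."""
    counts = Counter(label.split(SEP)[0] for label in poem["themes"])
    total = sum(counts.values())
    return {theme: n / total for theme, n in counts.items()}


if __name__ == "__main__":
    print(sub_themes(POEMS, "Love Poetry"))
    print(poems_under(POEMS, "Love Poetry > Chaste Love"))
    print(composition(POEMS[0]))
```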
2. Cross-Era Thematic Analysis Dashboard
Designed for comparative literary studies, this advanced analytical tool enables data-driven investigation into the evolution of poetic themes across different historical periods. Its key functionalities include:
- Targeted Analysis: Select a primary theme (e.g., "Praise Poetry") and a specific historical era (e.g., "Umayyad Period") to focus your inquiry.
- Dynamic Visualization: The tool automatically generates a series of hierarchical bar charts that break down the chosen theme into its sub-themes, displaying the frequency of each within the selected era.
- Uncover Literary Trends: This dashboard facilitates empirical answers to complex research questions, such as: "Which sub-themes of Satire were most prevalent in the Abbasid era compared to the Modern era?"
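At its core, the cross-era view counts, for one primary theme and one era, how many poems carry each direct sub-theme. The following is a minimal sketch under stated assumptions, not the dashboard's actual code: the record fields (`era`, `themes`), the `" > "` path separator, and the "Sub-theme A/B" labels are hypothetical placeholders.

```python
# Hypothetical sketch of the Cross-Era dashboard's counting step.
# Record fields ("era", "themes") and the placeholder theme names are
# illustrative assumptions.
from collections import Counter
from typing import Dict, List

SEP = " > "  # assumed separator between taxonomy levels

CORPUS: List[Dict] = [
    {"era": "Abbasid", "themes": ["Satire > Sub-theme A", "Satire > Sub-theme B"]},
    {"era": "Abbasid", "themes": ["Satire > Sub-theme A > Detail"]},
    {"era": "Modern", "themes": ["Satire > Sub-theme B"]},
]


def sub_theme_frequencies(poems: List[Dict], theme: str, era: str) -> Counter:
    """Number of poems from `era` annotated with each direct sub-theme of `theme`."""
    depth = len(theme.split(SEP))
    counts: Counter = Counter()
    for poem in poems:
        if poem["era"] != era:
            continue
        # Collect sub-themes per poem first so each poem counts once per sub-theme.
        seen = set()
        for label in poem["themes"]:
            parts = label.split(SEP)
            if len(parts) > depth and SEP.join(parts[:depth]) == theme:
                seen.add(parts[depth])
        counts.update(seen)
    return counts


if __name__ == "__main__":
    for era in ("Abbasid", "Modern"):
        print(era, dict(sub_theme_frequencies(CORPUS, "Satire", era)))
```

Because eras contribute different numbers of poems, raw counts can mislead when comparing eras; dividing each count by the number of poems in that era is a simple normalization to consider.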
Together, these dashboards turn the Mana Corpus from a static dataset into a dynamic laboratory for literary and historical inquiry.
📦 Technology Stack
- Backend & ML: Python, PyTorch, Gensim, Scikit-learn
- Web Framework & Notebooks: Streamlit, Google Colab, Plotly
- Data Versioning: DVC
- HPO: Optuna
- Code Quality: Black, isort, Flake8
- Testing: Pytest
🚀 Getting Started
Follow these instructions to set up and run the project on your local machine.
Prerequisites
- Python 3.9+
- Java Development Kit (JDK) (required by pyfarasa)
- Git & DVC
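Before installing, it can help to confirm that the prerequisites above are available. The following is a hypothetical helper (it is not part of the repository) that uses only the standard library. It treats a `java` executable on the PATH as a rough proxy for the JDK, and it checks only that `git` and `dvc` are present, not their versions.

```python
# Hypothetical prerequisite check; not part of the Maqasid repository.
# A `java` executable on PATH is used as a rough proxy for the JDK.
import shutil
import sys
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report whether each listed prerequisite appears to be available."""
    return {
        "python>=3.9": sys.version_info >= (3, 9),
        "java (JDK, for pyfarasa)": shutil.which("java") is not None,
        "git": shutil.which("git") is not None,
        "dvc": shutil.which("dvc") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```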
Installation and Setup
1. Clone the Repository
   git clone https://github.com/your-username/maqasid.git
   cd maqasid
2. Create and Activate a Virtual Environment
   python -m venv venv
   # On Windows:
   venv\Scripts\activate
   # On macOS/Linux:
   source venv/bin/activate
3. Install Dependencies
   pip install -r requirements.txt
4. Pull Data and Models with DVC
   This step downloads the large data and model files tracked by DVC.
   dvc pull
5. Run the Interactive Web Application
   streamlit run web_app/app.py
   Your browser should open a new tab with the Maqasid dashboard.
📜 How to Cite
If you use the Maqasid framework or the associated corpus in your research, please cite our paper:
(Once the paper is published, add the full BibTeX citation here. For now, you can use a placeholder.)
@article{Al-anazi2025Maqasid,
  author = {Your Authors},
  title = {Maqasid: A Hybrid CNN-BiLSTM Framework for Nuanced Thematic Classification of Arabic Poetry},
  journal = {IEEE Access},
  year = {2025},
  note = {Forthcoming}
}
📄 License
This project is licensed under the MIT License. See the LICENSE file for more details.