<div align="center"> <h1 align="center"> 📜 Maqasid (مقاصد) 📜 </h1> <p align="center"> <strong>A Deep Learning Framework for Nuanced Thematic Classification of Arabic Poetry</strong> </p> <p align="center"> <a href="https://opensource.org/licenses/MIT"> <img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"> </a> <a href="https://www.python.org/downloads/"> <img src="https://img.shields.io/badge/python-3.9%2B-blue.svg" alt="Python Version"> </a> <a href="https://github.com/psf/black"> <img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black"> </a> </p> </div>

Maqasid is an end-to-end research framework designed to address the critical challenges in the computational thematic analysis of Arabic poetry. It provides a robust methodology and a suite of tools for researchers, developers, and digital humanists to explore the rich, multifaceted themes inherent in one of the world's oldest literary traditions.

This project moves beyond single-label classification by introducing a novel hierarchical thematic taxonomy and a hybrid CNN-BiLSTM deep learning model designed to capture thematic complexity and overlap.

➡️ Live Demo <br>

<p align="center"> <img src="https://your-link-to-a-demo-gif.com/demo.gif" alt="Maqasid Demo" width="80%"> </p>

✨ Key Features

  • Multi-Label Classification: Accurately assigns multiple, co-occurring themes to a single poem, reflecting its true literary nature.
  • Hierarchical Thematic Schema: A novel taxonomy based on seven authoritative works of Arabic literary criticism, capturing thematic nuances with up to four levels of specificity.
  • Poetry-Specific Embeddings: Utilizes a custom FastText model trained from scratch on our poetry corpus to understand archaic and metaphorical language.
  • Reproducible MLOps Pipeline: The entire project is structured with modern MLOps practices, using DVC for data versioning to ensure full reproducibility.
  • Interactive Tools: A user-friendly web application and a Google Colab notebook for model demonstration and in-depth corpus exploration.
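
To make the multi-label idea concrete, the sketch below contrasts single-label decoding (keep only the top score) with multi-label decoding (keep every theme whose independent sigmoid probability clears a threshold). It is a minimal illustration and not the framework's actual code: the logit values, the 0.5 threshold, and the function names are assumptions chosen for the example.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_single_label(logits: dict[str, float]) -> str:
    """Single-label decoding: return only the highest-scoring theme."""
    return max(logits, key=logits.get)


def decode_multi_label(logits: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Multi-label decoding: keep every theme whose probability meets the threshold."""
    return sorted(theme for theme, z in logits.items() if sigmoid(z) >= threshold)


# Hypothetical per-theme scores for one poem (illustrative values only).
example_logits = {"Love Poetry": 2.1, "Praise Poetry": 0.4, "Satire": -1.7}

print(decode_single_label(example_logits))  # one theme only
print(decode_multi_label(example_logits))   # every theme above the threshold
```

Because each theme is scored independently, raising or lowering the threshold trades precision against recall without forcing themes to compete with each other.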

⚡ Project Structure

The repository is organized into several key directories, each serving a specific purpose to ensure modularity and clarity.

/
├── src/         # Contains the core source code of the Maqasid framework,
│                # including data processing, model architecture, and training logic.
│
├── web_app/     # Source code for the interactive Streamlit web application,
│                # which provides a user-friendly interface for the model.
│
├── test/        # Includes unit and integration tests to ensure the reliability
│                # and correctness of the framework's components.
│
├── images/      # Static image assets used in the documentation and web app.
│
├── .dvc/        # Directory for DVC metadata (not shown in repo view).
├── dvc.yaml     # Defines the stages of the DVC data pipeline.
└── README.md    # This documentation file.


📊 The Mana Corpus

The Maqasid framework was trained and evaluated on the Mana (مَعنَى) Corpus, a large-scale, thematically annotated dataset of Arabic poetry developed as part of this research. The corpus is a key contribution of our work and is hosted in its own dedicated repository.

It comprises a gold-standard, expert-annotated set and a larger, computationally annotated extension.

➡️ Explore and Download the Mana Corpus Here

For all details regarding the corpus structure, metadata, and usage, please refer to the documentation in the Mana repository.


🔬 Interactive Exploration with Google Colab

To make the corpus more accessible and to encourage hands-on analysis, we have developed an interactive Google Colab notebook. It lets anyone, from students to seasoned researchers, visually explore, filter, and analyze the dataset directly in the browser with no local setup.

➡️ Open the Interactive Explorer in Google Colab

The notebook features two powerful, user-friendly dashboards:

1. Thematic Poem Browser

This dashboard provides an intuitive way to navigate the corpus through its rich thematic hierarchy. It allows you to:

  • Drill-Down Through Themes: Start from broad categories (e.g., "Love Poetry") and progressively narrow your focus to highly specific sub-themes (e.g., "Chaste Love" → "Love from a distance").
  • Instantly Access Poems: As you select a theme, the interface immediately populates a list of all poems annotated with that specific theme.
  • View Detailed Poem Analysis: Clicking on a poem reveals its full text, essential metadata (poet, era), and an interactive pie chart that visualizes its complete thematic composition.
<p align="center"> <em>The interactive poem browser in action, allowing users to filter poems by theme and view a detailed analysis with a dynamic chart.</em> <br> <img src="https://raw.githubusercontent.com/NoorBayan/Maqasid/main/images/ThematicPoem.png" width="600px"/> </p>
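
The drill-down behavior can be pictured as prefix matching over theme paths. The sketch below is an illustration under stated assumptions and not the notebook's code: it represents each annotation as a tuple of theme levels and returns the poems whose annotations begin with the selected path. The `Poem` class, the sample titles, and the helper names are hypothetical; only the theme names come from the examples above.

```python
from dataclasses import dataclass, field

ThemePath = tuple[str, ...]  # e.g. ("Love Poetry", "Chaste Love")


@dataclass
class Poem:
    title: str
    themes: list[ThemePath] = field(default_factory=list)


def poems_under(poems: list[Poem], selected: ThemePath) -> list[str]:
    """Return titles of poems with at least one theme path starting with `selected`."""
    n = len(selected)
    return [p.title for p in poems if any(path[:n] == selected for path in p.themes)]


def child_themes(poems: list[Poem], selected: ThemePath) -> list[str]:
    """List the next-level sub-themes available directly under `selected`."""
    n = len(selected)
    return sorted(
        {path[n] for p in poems for path in p.themes
         if len(path) > n and path[:n] == selected}
    )


# Hypothetical mini-corpus (titles are placeholders, not real corpus entries).
corpus = [
    Poem("Poem A", [("Love Poetry", "Chaste Love", "Love from a distance")]),
    Poem("Poem B", [("Love Poetry", "Chaste Love"), ("Praise Poetry",)]),
    Poem("Poem C", [("Praise Poetry",)]),
]

print(child_themes(corpus, ("Love Poetry",)))
print(poems_under(corpus, ("Love Poetry", "Chaste Love")))
```

Storing full paths rather than leaf labels means a poem tagged at a deep level is still found when browsing any of its ancestor themes.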

2. Cross-Era Thematic Analysis Dashboard

Designed for comparative literary studies, this advanced analytical tool enables data-driven investigation into the evolution of poetic themes across different historical periods. Its key functionalities include:

  • Targeted Analysis: Select a primary theme (e.g., "Praise Poetry") and a specific historical era (e.g., "Umayyad Period") to focus your inquiry.
  • Dynamic Visualization: The tool automatically generates a series of hierarchical bar charts that break down the chosen theme into its sub-themes, displaying the frequency of each within the selected era.
  • Uncover Literary Trends: This dashboard facilitates empirical answers to complex research questions, such as: "Which sub-themes of Satire were most prevalent in the Abbasid era compared to the Modern era?"
<p align="center"> <em>The Cross-Era Analysis Dashboard generating hierarchical bar charts to compare sub-theme frequencies within a selected era.</em> <br> <img src="https://raw.githubusercontent.com/NoorBayan/Maqasid/main/images/ThematicAnalysis.png" width="400px"/> </p>
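
The per-era breakdown amounts to a grouped frequency count. As a rough sketch under stated assumptions (the record layout and the sub-theme names such as "Personal Satire" are invented for illustration and are not taken from the corpus), the following counts how often each sub-theme of a primary theme occurs within one era:

```python
from collections import Counter

# Each record: (era, primary theme, sub-theme). Hypothetical sample data.
annotations = [
    ("Abbasid", "Satire", "Personal Satire"),
    ("Abbasid", "Satire", "Political Satire"),
    ("Abbasid", "Satire", "Personal Satire"),
    ("Modern", "Satire", "Political Satire"),
    ("Abbasid", "Praise Poetry", "Praise of Rulers"),
]


def subtheme_frequencies(records, era: str, theme: str) -> Counter:
    """Count the sub-themes of `theme` among records from `era`."""
    return Counter(sub for e, t, sub in records if e == era and t == theme)


# Print sub-themes for one era, most frequent first.
for sub, count in subtheme_frequencies(annotations, "Abbasid", "Satire").most_common():
    print(f"{sub}: {count}")
```

Calling the function once per era and comparing the resulting counters is one simple way to approach the cross-era question posed above.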

This feature turns the corpus from a static dataset into a dynamic laboratory for literary and historical inquiry.


📦 Technology Stack

  • Backend & ML: Python, PyTorch, Gensim, Scikit-learn
  • Web Framework & Notebooks: Streamlit, Google Colab, Plotly
  • Data Versioning: DVC
  • HPO: Optuna
  • Code Quality: Black, isort, Flake8
  • Testing: Pytest

🚀 Getting Started

Follow these instructions to set up and run the project on your local machine.

Prerequisites

  • Python 3.9 or later
  • Git

Installation and Setup

  1. Clone the Repository

    git clone https://github.com/NoorBayan/Maqasid.git
    cd Maqasid
    
  2. Create and Activate a Virtual Environment

    python -m venv venv
    # On Windows: venv\Scripts\activate
    # On macOS/Linux: source venv/bin/activate
    
  3. Install Dependencies

    pip install -r requirements.txt
    
  4. Pull Data and Models with DVC

    This step downloads the large data and model files tracked by DVC.

    dvc pull
    
  5. Run the Interactive Web Application

    streamlit run web_app/app.py
    

    Your browser should open a new tab with the Maqasid dashboard!
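
If one of the steps above fails, it can help to confirm that the basic tooling is in place. The script below is an optional convenience sketch and is not part of the repository: it checks the interpreter version against the 3.9+ requirement shown in the badge and looks for the `git`, `dvc`, and `streamlit` executables on the PATH. The function name `check_environment` and the list of tools are assumptions. Run it inside the activated virtual environment after installing dependencies.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 9)
REQUIRED_TOOLS = ("git", "dvc", "streamlit")


def check_environment(tools=REQUIRED_TOOLS, min_python=REQUIRED_PYTHON) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        found = sys.version.split()[0]
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")
    for tool in tools:
        # shutil.which returns None when the executable is not on the PATH.
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks ready.")
```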


📜 How to Cite

If you use the Maqasid framework or the associated corpus in your research, please cite our paper:

The paper is forthcoming, so the entry below is a placeholder until the full citation is available.

@article{Al-anazi2025Maqasid,
  author    = {Your Authors},
  title     = {Maqasid: A Hybrid CNN-BiLSTM Framework for Nuanced Thematic Classification of Arabic Poetry},
  journal   = {IEEE Access},
  year      = {2025},
  note      = {Forthcoming}
}


📄 License

This project is licensed under the MIT License. See the LICENSE file for more details.
