# Nazm (نَظْمْ)
<p align="center"> <img src = "https://raw.githubusercontent.com/NoorBayan/Nazm/main/images/logo.png" width = "200px"/> </p>

Nazm (نَظْمْ) is a comprehensive system designed to process Arabic poetry, providing a rich repository of linguistic tools and resources. It includes an extensive collection of Application Programming Interfaces (APIs) that offer a wide range of functionalities, organized into three main modules.
## Module 1: Advanced Linguistic Analysis of Arabic Poetry
- Morphological and Syntactic Analysis: This tool offers precise extraction of morphological features and accurate part-of-speech tagging. It also parses sentences according to the classical Arabic grammatical tradition (i'rab إعراب) while supporting modern syntactic approaches such as hybrid dependency-constituency parsing. Additionally, it handles elliptical sentence constructions and offers visual representations that highlight word and sentence connections.
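As a rough illustration of what a hybrid dependency-constituency analysis with elided tokens might look like as data, here is a minimal sketch. The field names and relation labels are assumptions for illustration only, not Nazm's actual data model or API.

```python
# Illustrative sketch (not Nazm's internal format): one way to hold a
# hybrid dependency-constituency analysis, with a flag for tokens that
# are reconstructed in elliptical constructions.
from dataclasses import dataclass, field

@dataclass
class Token:
    index: int            # 1-based position in the sentence
    form: str             # surface word
    pos: str              # part-of-speech tag
    head: int             # index of dependency head (0 = root)
    deprel: str           # dependency relation (e.g. an i'rab role)
    elided: bool = False  # True for reconstructed (elliptical) tokens

@dataclass
class Phrase:
    label: str                         # constituency label, e.g. "NP"
    tokens: list = field(default_factory=list)

# A two-word nominal sentence: "العلمُ نورٌ" (knowledge is light).
tokens = [
    Token(1, "العلمُ", "NOUN", 0, "subject"),
    Token(2, "نورٌ", "NOUN", 1, "predicate"),
]
np = Phrase("NP", tokens)  # constituency span over the dependency tree
root = next(t for t in tokens if t.head == 0)
print(root.form)  # → العلمُ
```

The point of the hybrid representation is that the same tokens carry both a dependency head/relation and membership in constituency spans, so either view can be rendered.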
## Module 2: Automated and Customized Arabic Poetry Generation
- Automated Generation: This advanced tool generates Arabic poetry based on various parameters, including the type of poetic form (e.g., Qasida, Muwashah, Doublet), rhyme scheme, poetic meter, theme, era, and even emulation of a specific poet’s style. This functionality allows for the creation of innovative poetry or the emulation of traditional styles, tailored to user preferences.
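The generation parameters listed above can be pictured as a simple request object serialized into a model prompt. The field names and prompt wording below are assumptions for illustration; the text does not specify Nazm's actual interface.

```python
# Sketch of the generation parameters described above, serialized into a
# prompt string. Not Nazm's actual API; field names are invented.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GenerationRequest:
    form: str                         # e.g. "Qasida", "Muwashah"
    meter: str                        # e.g. "al-Kamil"
    rhyme: str                        # rhyme letter (rawiy)
    theme: str                        # e.g. "praise"
    era: Optional[str] = None         # e.g. "Abbasid"
    poet_style: Optional[str] = None  # poet whose style to emulate

    def to_prompt(self) -> str:
        parts = [f"Compose a {self.form} in the {self.meter} meter",
                 f"rhyming in '{self.rhyme}'",
                 f"on the theme of {self.theme}"]
        if self.era:
            parts.append(f"in the manner of the {self.era} era")
        if self.poet_style:
            parts.append(f"emulating the style of {self.poet_style}")
        return ", ".join(parts) + "."

req = GenerationRequest("Qasida", "al-Kamil", "ر", "praise", era="Abbasid")
print(req.to_prompt())
```

Optional fields (era, poet style) are simply omitted from the prompt when unset, which matches the idea of generation "tailored to user preferences."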
## Module 3: Comprehensive Prosody Analysis of Poetry
- Prosody Analysis: A specialized tool for analyzing poetic meter, which includes text formation, recognition of poetic meters, scansion, identification of metrical feet (taf’ilat), and detection of metrical and rhyme errors. It also evaluates rhyme schemes and identifies their flaws.
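The scansion step can be sketched as a small rule-based pass over fully vocalized text, in the spirit of the Khalilian system: a letter carrying a short vowel is "moving" (mutaharrik), while a letter carrying a sukun, or acting as a long vowel, is "still" (sakin). This is an illustrative simplification, not Nazm's implementation; real scansion must also handle orthographic conventions this sketch ignores.

```python
# Minimal Khalilian scansion sketch: map vocalized Arabic text to a
# binary prosodic pattern ("1" = mutaharrik, "0" = sakin).
# Illustrative only, not Nazm's actual rule system.
HARAKAT = {"\u064E", "\u064F", "\u0650",   # fatha, damma, kasra
           "\u064B", "\u064C", "\u064D"}   # tanwin forms
SUKUN = "\u0652"
SHADDA = "\u0651"
DIACRITICS = HARAKAT | {SUKUN, SHADDA}

def scan(text: str) -> str:
    pattern = []
    chars = [c for c in text if not c.isspace()]
    i = 0
    while i < len(chars):
        if chars[i] in DIACRITICS:
            i += 1
            continue
        # collect the diacritics attached to this letter
        marks = set()
        j = i + 1
        while j < len(chars) and chars[j] in DIACRITICS:
            marks.add(chars[j])
            j += 1
        if SHADDA in marks:
            # a doubled letter counts as sakin + mutaharrik
            pattern.append("0")
        if marks & HARAKAT:
            pattern.append("1")
        else:
            # sukun, or a bare long-vowel letter: still
            pattern.append("0")
        i = j
    return "".join(pattern)

print(scan("قَدْ"))  # → 10
```

For example, `scan("مُسْتَفْعِلُنْ")` yields `1010110`, the pattern of the foot mustaf'ilun; matching such patterns against the canonical taf'ilat of each meter is one way to recognize the meter and flag metrical errors.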
## Methodology
Nazm employs an integrated methodology leveraging advanced AI and linguistic analysis, with a focus on improving accuracy and customizing models according to available data. Below is an overview of the methodologies and technologies used:
### 1. Data Preparation
Data availability and quality are crucial for fine-tuning models. We have prepared the data for the three modules as follows:
- Module 1: Advanced Linguistic Analysis: We completed the Quranic Corpus project, a three-tiered corpus comprising phonological, morphological, and syntactic layers. The phonological and morphological layers were expanded and corrected, and the syntactic layer was fully developed. Manual review and evaluation were conducted to achieve a gold standard for the Quranic Corpus.
| Corpus Layer         | Status          | Details                                            |
|----------------------|-----------------|----------------------------------------------------|
| Phonological Layer   | Completed       | Expanded and corrected                             |
| Morphological Layer  | Completed       | Expanded and corrected                             |
| Syntactic Layer      | Fully Developed | Manual review conducted, achieving a gold standard |
The data layers in the new Corpus dataset have been significantly enhanced to provide comprehensive information necessary for the research community when utilizing this linguistic corpus. These enhancements focus on orthographic, morphological, and syntactic representations of texts, including:
- Orthographic Layer:
- The Imlaai script has been incorporated to align with classical Arabic texts, replacing the Uthmani script, which previously limited the generalization of model results.
- Buckwalter Unicode encoding has been added to strengthen the connection between the dataset and other Arabic resources.
- English translation and transliteration of texts have been included to increase educational value.
- A sentence coding system has been developed to make the data compatible with other classical texts, not just the Quranic text.
- Morphological Layer:
- The Parts of Speech (POS) scheme has been expanded to include more precise morphological features.
- The data has been meticulously cleaned to correct errors in morphological annotations, leading to a significant increase in the number of columns compared to the previous corpus.
- The diagram below illustrates the analytical information and classifications within the morphological layer.
- Syntactic Layer:
- The syntactic layer has been fully constructed, with the CoNLL-X scheme extended to represent a hybrid model combining dependency and constituency structures, alongside the introduction of elliptical constructions.
- The diagram below illustrates the analytical information and classifications within the syntactic layer.
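The Buckwalter encoding mentioned under the orthographic layer maps each Arabic letter and diacritic to a single ASCII character. A partial table gives the flavor; this is only a subset of the scheme, shown for illustration, not Nazm's full mapping.

```python
# Partial Buckwalter transliteration table, to illustrate the one-to-one
# ASCII encoding added in the orthographic layer. Subset only.
BUCKWALTER = {
    "ا": "A", "ب": "b", "ت": "t", "ك": "k", "ل": "l",
    "م": "m", "ن": "n", "و": "w", "ي": "y", "ه": "h",
    "\u064E": "a",  # fatha
    "\u064F": "u",  # damma
    "\u0650": "i",  # kasra
    "\u0652": "o",  # sukun
    "\u0651": "~",  # shadda
}

def to_buckwalter(text: str) -> str:
    # characters outside the table pass through unchanged
    return "".join(BUCKWALTER.get(c, c) for c in text)

print(to_buckwalter("كَتَبَ"))  # → kataba
```

Because the mapping is one character to one character, it is losslessly reversible, which is what makes it useful for linking the corpus to other Arabic resources.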
- Module 2: Automated Poetry Generation: The largest poetic corpus was compiled from various poetry data repositories. The corpus contains over half a million poems and approximately 15 million individual verses. Classification algorithms were employed to categorize the poetry into multiple genres.
- Module 3: Prosody Analysis: A rule-based system was developed for prosody analysis according to the well-known rules of Khalil ibn Ahmad al-Farahidi for classical Arabic poetry. Based on this system, we are building a prosody pattern corpus and a prosody dataset containing around fifty thousand verses, analyzed and reviewed by volunteer linguists.
*Figure: grammatical analysis.*
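For reference, a token line in the standard CoNLL-X format that the syntactic layer extends carries ten tab-separated columns. The sketch below reads one such line; the sample row is invented for illustration, and the hybrid/elliptical extensions Nazm adds on top of CoNLL-X are not shown.

```python
# Sketch of reading one token line in the standard CoNLL-X format
# (the scheme the syntactic layer extends). Sample data is invented.
from collections import namedtuple

# the ten standard CoNLL-X columns
Row = namedtuple("Row", "id form lemma cpostag postag feats "
                        "head deprel phead pdeprel")

def parse_line(line: str) -> Row:
    fields = line.rstrip("\n").split("\t")
    return Row(*fields)

sample = "1\tالعلمُ\tعلم\tN\tNOUN\tcase=nom\t0\tsubj\t_\t_"
row = parse_line(sample)
print(row.form, row.head, row.deprel)
```

Extending this scheme means adding columns (or overloading existing ones) for constituency spans and elided tokens while keeping the dependency columns intact.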
### 2. Fine-Tuning Large Language Models
The previously prepared data was restructured as input-output pairs for fine-tuning. We prioritized accuracy and diversity in the data used, ensuring it represents a good sample for fine-tuning. Initially, models were fine-tuned on the Gemini platform, with repeated iterations to achieve optimal settings. Further fine-tuning on other language models will follow for comparison and to achieve the best results.
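Restructuring annotated data as input-output pairs is commonly done in JSONL, one example per line. The sketch below shows that shape; the field names and example content are assumptions, since the text does not specify Nazm's exact schema.

```python
# Sketch of packaging annotated examples as input-output pairs for
# fine-tuning, in the common JSONL form. Schema is illustrative only.
import json

examples = [
    {"input": "Analyze the i'rab of: العلمُ نورٌ",
     "output": "العلمُ: subject (nominative); نورٌ: predicate (nominative)"},
]

# one JSON object per line; ensure_ascii=False keeps Arabic readable
lines = [json.dumps(ex, ensure_ascii=False) for ex in examples]
jsonl = "\n".join(lines)
print(jsonl)
```

Each line is independently parseable, which makes it easy to shuffle, split, and stream examples during repeated fine-tuning iterations.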
### 3. Evaluation of the Three Models
Each fine-tuned model undergoes evaluation.
| Model                     | Accuracy | Precision | Recall |
|---------------------------|----------|-----------|--------|
| Linguistic Analysis Model | 95%      | 93%       | 94%    |
| Poetry Generation Model   | 92%      | 91%       | 90%    |
| Prosody Analysis Model    | 94%      | 95%       | 93%    |
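For clarity, the three reported metrics are computed from counts of true/false positives and negatives. This is the generic definition with toy counts, not the evaluation harness actually used.

```python
# Standard definitions of the three reported metrics, for one class.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

# toy counts for illustration
tp, tn, fp, fn = 90, 5, 3, 2
print(round(accuracy(tp, tn, fp, fn), 2),
      round(precision(tp, fp), 2),
      round(recall(tp, fn), 2))  # → 0.95 0.97 0.98
```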
### 4. Data Visualization
A data visualization model was developed to provide interactive visual representations of output, illustrating word connections using Scalable Vector Graphics (SVG). This feature enhances users’ understanding of analyses and facilitates easy interpretation of results.
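An SVG rendering of word connections can be sketched as text: place the words on a baseline and draw a labeled arc between connected words. Coordinates, styling, and the helper's name are invented for illustration; Nazm's actual renderer is not shown here.

```python
# Minimal sketch of emitting an SVG with words on a baseline and labeled
# arcs between them, in the spirit of the visualization described above.
def dependency_svg(words, arcs):
    """words: list of strings; arcs: list of (head_idx, dep_idx, label)."""
    step, y = 100, 80
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{step * len(words)}" height="120">']
    for i, w in enumerate(words):
        parts.append(f'<text x="{i * step + 20}" y="{y + 20}">{w}</text>')
    for head, dep, label in arcs:
        x1, x2 = head * step + 30, dep * step + 30
        mid = (x1 + x2) / 2
        # quadratic Bezier curve from head to dependent
        parts.append(f'<path d="M{x1},{y} Q{mid},{y - 50} {x2},{y}" '
                     'fill="none" stroke="black"/>')
        parts.append(f'<text x="{mid}" y="{y - 55}" '
                     f'text-anchor="middle">{label}</text>')
    parts.append("</svg>")
    return "\n".join(parts)

svg = dependency_svg(["العلمُ", "نورٌ"], [(0, 1, "predicate")])
print(svg[:40])
```

Because the output is plain SVG markup, it can be embedded directly in a web page and scaled without loss, which is the main reason to prefer SVG for this kind of diagram.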
## Contributions
### 1. Annotated Corpora
- The Quranic Annotated Corpus: A meticulously tagged corpus with comprehensive morphological and syntactic analysis. This corpus serves as a fundamental resource for fine-tuning language models, enabling the model to grasp the intricate grammatical structure of Arabic and apply it to poetry.
- The Extensive Poetic Corpus: Comprising over half a million poems with approximately 15 million verses, this corpus is categorized according to five criteria: poetic form, theme, meter, prosodic system, and writing style. These classifications allow the model to understand the nuances of Arabic poetry, facilitating accurate generation and analysis.
- Prosody Pattern Corpora: This contribution includes two corpora: a prosody pattern corpus built from the rule-based system, and a prosody dataset of around fifty thousand verses analyzed and reviewed by volunteer linguists.