urduhack / Urduhack: An NLP library for the Urdu language. It comes with many batteries-included features to help you process Urdu data in the easiest way possible.
mbzuai-oryx / PALO: (WACV 2025 - Oral) Vision-language conversation in 10 languages: English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, Bengali, and Urdu.
urduhack / Awesome Urdu: 📖 A curated list of resources dedicated to the Urdu language.
mirfan899 / Urdu: Collection of Urdu datasets for POS, NER, sentiment, summarization, and other NLP tasks.
UniversalPython / UniversalPython: Write Python in any human language. UniversalPython is a transpiler that makes it possible to write Python code in different human languages such as Urdu, German, Czech, and more. The code is translated to Python.
siddiquelatif / URDU Dataset: Urdu Language Speech Emotional Corpus
avineshpvs / Indic Tagger: Indian language tagger and chunker (Hindi, Telugu, Tamil, Marathi, Punjabi, Kannada, Malayalam, Urdu, Bengali)
urduhack / Urdu Characters: 📄 Complete collection of Urdu language characters and Unicode code points.
Smat26 / Roman Urdu Dataset: Compilation of a manually tagged Roman Urdu dataset (Urdu written in Latin/Roman script), along with other helpful Roman Urdu NLP resources.
D4X-UMAR / GOLD MD: Multi-device WhatsApp bot with Urdu language support, made by Umar 🙂❤️
notwld / Newsurdu: A minimal React Native application that shows news in Urdu.
PakUrdu-Research-Center / Awesome Urdu: A collection of resources and supporting material for Urdu language processing tasks.
MazanLabeeb / Newspk: A basic Node.js-based scraper for the Dawn news website. You can use this API to fetch the latest news in English or Urdu. It depends only on node-fetch and jsdom, so it is a very lightweight package.
MoizRauf / Urdu Roman Urdu English Dictionary: No description available
Hassan-kareem / Nastaliq Urdu Font: Jameel Noori Nastaleeq, Noto Nastaliq Urdu, and Mehr Nastaliq fonts for rooted and non-rooted Android devices.
IhyaCommunity / Khushkhat Extension: Beautifies Arabic, Persian, Urdu, Pashto, and other right-to-left (RTL) languages.
muhammadsohaib60 / Urdu OCR: Our project is based on one of the most important applications of machine learning: pattern recognition. Optical character recognition, or optical character reader, is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo, or subtitle text superimposed on an image. We are developing an OCR system for Urdu. We studied several research papers related to our project. So far, we have found that both Arabic and Urdu are written in the Perso-Arabic script; at the written level, therefore, they share similarities. The styles of Arabic and Persian writing have a heavy influence on the Urdu script. There are six major styles for writing Arabic, Persian, and Pashto as well. Urdu is written in the Naskh style, which is the most famous of all. Optical character recognition (OCR) is the process of converting an image of text, such as a scanned paper document or electronic fax file, into computer-editable text [1]. The text in an image is not editable: the letters are made of tiny dots (pixels) that together form a picture of text. During OCR, the software analyzes an image and converts the pictures of the characters to editable text based on the patterns of the pixels in the image. After OCR, the converted text can be exported and used with a variety of word-processing, page-layout, and spreadsheet applications [2]. One of the main aims of OCR is to emulate the human ability to read, at a much faster rate, by associating symbolic identities with images of characters. Its potential applications include screen readers, refreshable Braille displays [3], reading customer-filled forms, reading postal addresses off envelopes, and archiving and retrieving text. OCR's ultimate goal is to develop a communication interface between the computer and its potential users. Urdu is the national language of Pakistan. It is understood by over 300 million people in Pakistan, India, and Bangladesh. Given its large historical body of literature, there is a clear need for automatic systems that convert this literature into electronic form accessible on the World Wide Web. Although much work has been done in the field of OCR, Urdu and other languages that use the Arabic script, such as Farsi and Arabic, have received the least attention. This is due in part to a lack of interest in the field and in part to the intricacies of the Arabic script. Owing to this indifference, a huge amount of Urdu and Arabic literature remains unattended, rotting away on old shelves. The proposed research aims to develop workable solutions to many of the problems faced in realizing an OCR system designed specifically for the Urdu Noori Nastaleeq script, which is widely used in Urdu newspapers, government documents, and books. The underlying processes first isolate and classify ligatures based on certain carefully chosen special, contour, and statistical features, and eventually recognize them with the aid of feed-forward backpropagation neural networks. The input to the system is a monochrome bitmap image file of Urdu text written in Noori Nastaleeq, and the output is the equivalent text converted to an editable text file.
anuragshas / Nlp For Urdu: This repository contains a state-of-the-art tokenizer, language model, and classifier for Urdu, which is one of the official languages of India and is spoken in various Indian states.
AdilFayyaz / Sentence Segmentation In Urdu: Sentence segmentation for the Urdu language using basic NLP text-processing techniques.
Areesha-Tahir / Fake News Detection Using Naive Bayes: Fake news detection using Naïve Bayes in Python, along with a confusion matrix calculated using sklearn.