Articlequality
Github mirror - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access for contributing)
Wikipedia article quality classification
<blockquote> ⚠️ Warning: As of late 2023, the ORES infrastructure is being deprecated by the WMF Machine Learning team; please check https://wikitech.wikimedia.org/wiki/ORES for more info. While the code in this repository may still work, it is unmaintained and may break at any time. Special consideration should also be given to machine learning models seeing drift in quality of predictions.
The replacement for ORES and associated infrastructure is Lift Wing: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing
Some Revscoring models from ORES run on the Lift Wing infrastructure, but they are otherwise unsupported (no new training or code updates).
They can be downloaded from the links documented at: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing#Revscoring_models_(migrated_from_ORES)
In the long term, some or all of these models may be replaced by newer models specifically tailored to run on modern ML infrastructure like Lift Wing.
If you have any questions, contact the WMF Machine Learning team: https://wikitech.wikimedia.org/wiki/Machine_Learning
</blockquote>
This library provides a set of utilities for performing automatic detection of assessment classes of Wikipedia articles. For more information, see the full documentation at https://articlequality.readthedocs.io
Compatible with Python 3.x only. Sorry.
- Install: pip install articlequality
- Models: https://github.com/wikimedia/articlequality/tree/master/models
- Documentation: https://articlequality.readthedocs.io
Basic usage
>>> import articlequality
>>> from revscoring import Model
>>>
>>> scorer_model = Model.load(open("models/enwiki.nettrom_wp10.gradient_boosting.model", "rb"))
>>>
>>> text = "I am the text of a page. I have a <ref>word</ref>"
>>> articlequality.score(scorer_model, text)
{'prediction': 'stub',
 'probability': {'stub': 0.27156163795807853,
                 'b': 0.14707452309674252,
                 'fa': 0.16844898943510833,
                 'c': 0.057668704007171959,
                 'ga': 0.21617801281707663,
                 'start': 0.13906813268582238}}
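The score shown above is a plain dictionary, so it can be post-processed with ordinary Python. The following is a minimal sketch that ranks the assessment classes by probability. It reuses the values from the example output instead of loading a model, and the helper name `rank_classes` is hypothetical, not part of articlequality.

```python
# Score copied from the example output above. In practice it would come
# from articlequality.score(scorer_model, text).
score = {
    "prediction": "stub",
    "probability": {
        "stub": 0.27156163795807853,
        "b": 0.14707452309674252,
        "fa": 0.16844898943510833,
        "c": 0.057668704007171959,
        "ga": 0.21617801281707663,
        "start": 0.13906813268582238,
    },
}


def rank_classes(score):
    """Return (class, probability) pairs, most likely class first.

    Hypothetical helper for illustration; not part of articlequality.
    """
    return sorted(
        score["probability"].items(),
        key=lambda item: item[1],
        reverse=True,
    )


ranked = rank_classes(score)
for label, probability in ranked:
    print(f"{label:>5}: {probability:.3f}")
```

Looking at the full ranking rather than only the 'prediction' key can be useful when the top two classes are close, as they are in this example.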
Install
Requirements
- Python 3.5, 3.6 or 3.7
- All the system requirements of revscoring
Installation steps
- clone this repository
- install the package itself and its dependencies: python setup.py install
- You can verify that your installation worked by running make enwiki_models to build the English Wikipedia article quality model or make wikidatawiki_models to build the item quality model for Wikidata
Retraining the models
To retrain a model, run make -B MODEL, e.g. make -B wikidatawiki_models. This will redownload the labels, re-extract the features from the revisions, and then retrain and rescore the model.
To skip re-downloading the training labels and re-extracting the features, it is enough to touch the files in the datasets/ directory and run the make command without the -B flag.
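The touch step can also be done from Python, which may help on systems without a touch command. This is a minimal sketch under stated assumptions: the helper name `touch_datasets` is hypothetical, and it assumes that giving every file under datasets/ the same fresh modification time is enough for make to treat the datasets as up to date.

```python
import os
import time
from pathlib import Path


def touch_datasets(datasets_dir="datasets"):
    """Set the modification time of every file under datasets_dir to now.

    Hypothetical helper, similar in spirit to running `touch` on the files.
    All files get the same timestamp so that no dataset looks newer than
    another one that depends on it. Returns the list of touched paths.
    """
    now = time.time()
    touched = []
    for path in sorted(Path(datasets_dir).rglob("*")):
        if path.is_file():
            os.utime(path, (now, now))  # (access time, modification time)
            touched.append(path)
    return touched


# Typical use, run from the repository root:
#   touch_datasets()
# followed by the make command without -B, e.g. make wikidatawiki_models
```

Whether make then skips the download and extraction steps depends on how the Makefile declares its targets, so it is worth checking the make output on the first run.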
Running tests
Example:
pytest -vv tests/feature_lists/test_wikidatawiki.py
Authors
- Aaron Halfaker -- https://github.com/halfak
- Morten Warncke-Wang -- https://github.com/nettrom