Pyltr
Python learning to rank (LTR) toolkit
Install / Use
/learn @jma127/PyltrREADME
pyltr
|pypi version| |Build status|
.. |pypi version| image:: https://img.shields.io/pypi/v/pyltr.svg :target: https://pypi.python.org/pypi/pyltr .. |Build status| image:: https://secure.travis-ci.org/jma127/pyltr.svg :target: http://travis-ci.org/jma127/pyltr
pyltr is a Python learning-to-rank toolkit with ranking models, evaluation metrics, data wrangling helpers, and more.
This software is licensed under the BSD 3-clause license (see LICENSE.txt).
The author may be contacted at ma127jerry <@t> gmail with general
feedback, questions, or bug reports.
Example
Import pyltr::
import pyltr
Import a LETOR <http://research.microsoft.com/en-us/um/beijing/projects/letor/>_ dataset
(e.g. MQ2007 <http://research.microsoft.com/en-us/um/beijing/projects/letor/LETOR4.0/Data/MQ2007.rar>_
)::
with open('train.txt') as trainfile, \
open('vali.txt') as valifile, \
open('test.txt') as evalfile:
TX, Ty, Tqids, _ = pyltr.data.letor.read_dataset(trainfile)
VX, Vy, Vqids, _ = pyltr.data.letor.read_dataset(valifile)
EX, Ey, Eqids, _ = pyltr.data.letor.read_dataset(evalfile)
Train a LambdaMART <http://research.microsoft.com/pubs/132652/MSR-TR-2010-82.pdf>_ model, using
validation set for early stopping and trimming::
metric = pyltr.metrics.NDCG(k=10)
# Only needed if you want to perform validation (early stopping & trimming)
monitor = pyltr.models.monitors.ValidationMonitor(
VX, Vy, Vqids, metric=metric, stop_after=250)
model = pyltr.models.LambdaMART(
metric=metric,
n_estimators=1000,
learning_rate=0.02,
max_features=0.5,
query_subsample=0.5,
max_leaf_nodes=10,
min_samples_leaf=64,
verbose=1,
)
model.fit(TX, Ty, Tqids, monitor=monitor)
Evaluate model on test data::
Epred = model.predict(EX)
print 'Random ranking:', metric.calc_mean_random(Eqids, Ey)
print 'Our model:', metric.calc_mean(Eqids, Ey, Epred)
Features
Below are some of the features currently implemented in pyltr.
Models
-
LambdaMART (
pyltr.models.LambdaMART)-
Validation & early stopping
-
Query subsampling
-
Metrics
-
(N)DCG (
pyltr.metrics.DCG,pyltr.metrics.NDCG)- pow2 and identity gain functions
-
ERR (
pyltr.metrics.ERR)- pow2 and identity gain functions
-
(M)AP (
pyltr.metrics.AP) -
Kendall's Tau (
pyltr.metrics.KendallTau) -
AUC-ROC -- Area under the ROC curve (
pyltr.metrics.AUCROC)
Data Wrangling
-
Data loaders (e.g.
pyltr.data.letor.read) -
Query groupers and validators (
pyltr.util.group.check_qids,pyltr.util.group.get_groups)
Running Tests
Use the run_tests.sh script to run all unit tests.
Building Docs
cd into the docs/ directory and run make html. Docs are generated
in the docs/_build directory.
Contributing
Quality contributions or bugfixes are gratefully accepted. When submitting a
pull request, please update AUTHOR.txt so you can be recognized for your
work :).
By submitting a Github pull request, you consent to have your submitted code
released under the terms of the project's license (see LICENSE.txt).
Related Skills
YC-Killer
2.7kA library of enterprise-grade AI agents designed to democratize artificial intelligence and provide free, open-source alternatives to overvalued Y Combinator startups. If you are excited about democratizing AI access & AI agents, please star ⭐️ this repository and use the link in the readme to join our open source AI research team.
best-practices-researcher
The most comprehensive Claude Code skills registry | Web Search: https://skills-registry-web.vercel.app
groundhog
398Groundhog's primary purpose is to teach people how Cursor and all these other coding agents work under the hood. If you understand how these coding assistants work from first principles, then you can drive these tools harder (or perhaps make your own!).
isf-agent
a repo for an agent that helps researchers apply for isf funding
