Skpro
A unified framework for tabular probabilistic regression, time-to-event prediction, and probability distributions in Python
<a href="https://skpro.readthedocs.io/en/latest"><img src="https://github.com/sktime/skpro/blob/main/docs/source/images/skpro-banner.png" width="500" align="right" /></a>
:rocket: Version 2.11.0 out now! Read the release notes here.
skpro is a library for supervised probabilistic prediction in Python.
It provides scikit-learn-like, scikit-base compatible interfaces to:
- tabular supervised regressors for probabilistic prediction - interval, quantile and distribution predictions
- tabular probabilistic time-to-event and survival prediction - instance-individual survival distributions
- metrics to evaluate probabilistic predictions, e.g., pinball loss, empirical coverage, CRPS, survival losses
- reductions to turn scikit-learn regressors into probabilistic skpro regressors, such as bootstrap or conformal
- building pipelines and composite models, including tuning via probabilistic performance metrics
- symbolic probability distributions with value domain of pandas.DataFrame-s and pandas-like interface
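To make two of the metrics named above concrete, here is a minimal pure-Python sketch of pinball loss and empirical coverage. It is not skpro's implementation: the function names `pinball_loss` and `empirical_coverage` and all numbers are made up for illustration, and skpro's own metric classes may differ in interface and conventions.

```python
def pinball_loss(y_true, y_pred, alpha):
    """Mean pinball (quantile) loss of predictions y_pred at quantile level alpha."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # under-prediction is weighted by alpha, over-prediction by (1 - alpha)
        total += alpha * diff if diff >= 0 else (alpha - 1) * diff
    return total / len(y_true)


def empirical_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true)


# illustrative data only, not taken from any real model
y_true = [1.0, 2.0, 3.0, 4.0]
q90 = [2.0, 2.0, 2.0, 2.0]    # hypothetical 0.9-quantile predictions
lower = [0.0, 0.0, 0.0, 0.0]  # hypothetical interval lower bounds
upper = [2.5, 2.5, 2.5, 2.5]  # hypothetical interval upper bounds

loss = pinball_loss(y_true, q90, alpha=0.9)
coverage = empirical_coverage(y_true, lower, upper)
print(loss, coverage)
```

Lower pinball loss is better. Empirical coverage is usually compared against the nominal coverage of the interval rather than maximized.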
:books: Documentation
| Documentation | |
| -------------------------- | -------------------------------------------------------------- |
| :star: Tutorials | New to skpro? Here's everything you need to know! |
| :clipboard: Binder Notebooks | Example notebooks to play with in your browser. |
| :woman_technologist: User Guides | How to use skpro and its features. |
| :scissors: Extension Templates | How to build your own estimator using skpro's API. |
| :control_knobs: API Reference | The detailed reference for skpro's API. |
| :hammer_and_wrench: Changelog | Changes and version history. |
| :deciduous_tree: Roadmap | skpro's software and community development plan. |
| :pencil: Related Software | A list of related software. |
:speech_balloon: Where to ask questions
Questions and feedback are extremely welcome! We strongly believe in the value of sharing help publicly, as it allows a wider audience to benefit from it.
skpro is maintained by the sktime community, and we use the same social channels.
| Type | Platforms |
| ------------------------------- | --------------------------------------- |
| :bug: Bug Reports | GitHub Issue Tracker |
| :sparkles: Feature Requests & Ideas | GitHub Issue Tracker |
| :woman_technologist: Usage Questions | GitHub Discussions · Stack Overflow |
| :speech_balloon: General Discussion | GitHub Discussions |
| :factory: Contribution & Development | dev-chat channel · Discord |
| :globe_with_meridians: Community collaboration session | Discord - Fridays 13 UTC, dev/meet-ups channel |
:dizzy: Features
Our objective is to enhance the interoperability and usability of the AI model ecosystem:
- skpro is compatible with scikit-learn and sktime, e.g., an sktime proba forecaster can be built with an skpro proba regressor, which is an sklearn regressor with proba mode added by skpro
- skpro provides a mini-package management framework for first-party implementations, and for interfacing popular second- and third-party components, such as cyclic-boosting, MAPIE, or ngboost packages.
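The idea of adding a "proba mode" to a point regressor can be illustrated with a toy, split-conformal style wrapper. This is a sketch under stated assumptions, not how skpro implements its reductions: `MeanRegressor`, `SplitConformalWrapper`, and the `predict_interval` method of this toy class are hypothetical names, and the rank is capped at the calibration sample size as a simplification.

```python
import math


class MeanRegressor:
    """Hypothetical point regressor with an sklearn-like fit/predict interface."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class SplitConformalWrapper:
    """Toy wrapper that adds symmetric prediction intervals to a point regressor."""

    def __init__(self, regressor, coverage=0.9):
        self.regressor = regressor
        self.coverage = coverage

    def fit(self, X, y):
        # first half trains the point regressor, second half calibrates the width
        split = len(y) // 2
        self.regressor.fit(X[:split], y[:split])
        preds = self.regressor.predict(X[split:])
        scores = sorted(abs(yi - pi) for yi, pi in zip(y[split:], preds))
        n = len(scores)
        # conformal rank, capped at n (simplification for small samples)
        rank = min(n, math.ceil((n + 1) * self.coverage))
        self.width_ = scores[rank - 1]
        return self

    def predict(self, X):
        return self.regressor.predict(X)

    def predict_interval(self, X):
        return [(p - self.width_, p + self.width_) for p in self.predict(X)]


# illustrative data only
X = [[i] for i in range(10)]
y = [float(i) for i in range(10)]

model = SplitConformalWrapper(MeanRegressor(), coverage=0.8).fit(X, y)
intervals = model.predict_interval([[42]])
print(intervals)
```

The wrapper leaves the point prediction unchanged and only adds an interval around it, so any object with `fit` and `predict` could stand in for the toy regressor.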
skpro curates libraries of components of the following types:
| Module | Status | Links |
|---|---|---|
| Probabilistic tabular regression | maturing | Tutorial · API Reference · Extension Template |
| Time-to-event (survival) prediction | maturing | Tutorial · API Reference · Extension Template |
| Performance metrics | maturing | API Reference |
| Probability distributions | maturing | Tutorial · API Reference · Extension Template |
:hourglass_flowing_sand: Installing skpro
To install skpro, use pip:

```bash
pip install skpro
```

or, with maximum dependencies:

```bash
pip install skpro[all_extras]
```
Releases are available as source packages and binary wheels. You can see all available wheels here.
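One way to check whether the installation succeeded is to query the installed package metadata from Python. The sketch below uses only the standard library. The helper name `installed_version` is made up for illustration, and the check assumes the distribution is registered under the name `skpro`, as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


skpro_version = installed_version("skpro")
print("skpro version:", skpro_version or "not installed")
```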
:zap: Quickstart
Making probabilistic predictions
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from skpro.regression.residual import ResidualDouble
# step 1: data specification
X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_new, y_train, y_test = train_test_split(X, y)
# step 2: specifying the regressor - any compatible regressor is valid!
# example - "squaring residuals" regressor
# random forest for mean prediction
# linear regression for variance prediction
reg_mean = RandomForestRegressor()
reg_resid = LinearRegression()
reg_proba = ResidualDouble(reg_mean, reg_resid)
# step 3: fitting the model to training data
reg_proba.fit(X_train, y_train)
# step 4: predicting labels on new data
# probabilistic prediction mod
