ppmml
ppmml is a python library for converting machine learning models to pmml file. ppmml wraps jpmml libraries and provides clean interface.
Installation
pip install --default-timeout=10000 -i https://pypi.anaconda.org/lgrcyanny/simple ppmml
If the download is too slow, download the package manually from the ppmml conda package page on Anaconda, then run:
pip install ppmml-0.0.1.tar.gz
Getting Started
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
# scikit-learn 0.19 (the tested version) bundles joblib as sklearn.externals.joblib
from sklearn.externals import joblib
import ppmml
# load the iris dataset and train a model
X, y = load_iris(return_X_y=True)
lr = LogisticRegression(tol=1e-5)
lr.fit(X, y)
joblib.dump(lr, "lr.pkl.z", compress=9)
# convert the pickled model to a pmml file
ppmml.to_pmml("lr.pkl.z", "lr.pmml", model_type='sklearn')
# prepare test data; columns x1..x4 match the default input field names in the pmml file
df = pd.DataFrame(X, columns=['x1', 'x2', 'x3', 'x4'])
df.to_csv('test.csv', header=True, index=False)
# predict with the pmml file (a simple predict API based on jpmml-evaluator)
ppmml.predict('lr.pmml', 'test.csv', 'predict.csv')
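The example above hard-codes the CSV header as x1..x4. A small sketch, assuming the generated pmml file uses a standard PMML MiningSchema (where a MiningField without a usageType attribute counts as an active input), reads the input field names back out of the file instead. pmml_active_fields is a hypothetical helper, not part of ppmml:

```python
import xml.etree.ElementTree as ET


def _local_name(tag):
    # ElementTree prefixes tags with "{namespace}"; strip it so the lookup
    # works for any PMML version namespace.
    return tag.rsplit("}", 1)[-1]


def pmml_active_fields(pmml_path):
    """Hypothetical helper: input field names of the top-level model, in order."""
    root = ET.parse(pmml_path).getroot()
    for elem in root.iter():
        # In document order, the first MiningSchema belongs to the top-level
        # model; MiningSchemas of nested segment models appear after it.
        if _local_name(elem.tag) == "MiningSchema":
            return [
                field.get("name")
                for field in elem
                if _local_name(field.tag) == "MiningField"
                # PMML treats a MiningField without usageType as "active".
                and field.get("usageType", "active") == "active"
            ]
    raise ValueError("no MiningSchema found in %s" % pmml_path)
```

For instance, `df = pd.DataFrame(X, columns=pmml_active_fields('lr.pmml'))` would keep the CSV header in step with the pmml file, provided the columns of X are in the same order as those fields.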
More examples
- sklearn models to pmml file
- xgboost model to pmml file
- lightgbm model to pmml file
- tensorflow model to pmml file
- R model to pmml file
Algorithm Features
All algorithms supported by jpmml are supported by ppmml. In summary:
sklearn estimators
Supervised Learning
- GLM: Linear, Logistic Regression, Lasso, ElasticNet, Ridge, SGD
- Naive Bayes: GaussianNB only; no support for multinomial or Bernoulli naive Bayes
- Nearest Neighbors
- Neural Network
- SVM
- DecisionTree
- All ensemble methods
- LinearDiscriminantAnalysis
Unsupervised Learning
- Clustering: KMeans; no support for LDA or DBSCAN
- PCA
feature algorithms
- feature selection
- feature extraction, except FeatureHasher
- feature preprocessing
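To catch an unsupported estimator before handing a pickle to the converter, a model can be checked against the "no support" notes in the list above. The following is a hypothetical pre-flight helper, not a ppmml API: it matches class names only, descends into Pipeline.steps and FeatureUnion.transformer_list, and reads "LDA" under clustering as LatentDirichletAllocation, which is an assumption:

```python
# Hypothetical pre-flight check built from the "no support" notes above.
# Reading "LDA" under clustering as LatentDirichletAllocation is an assumption.
UNSUPPORTED_SKLEARN = frozenset([
    "MultinomialNB",
    "BernoulliNB",
    "DBSCAN",
    "LatentDirichletAllocation",
    "FeatureHasher",
])


def unsupported_steps(model):
    """Return sorted class names of unsupported estimators in a model tree."""
    found = []
    stack = [model]
    while stack:
        obj = stack.pop()
        name = type(obj).__name__
        if name in UNSUPPORTED_SKLEARN:
            found.append(name)
        # Pipeline.steps and FeatureUnion.transformer_list are both lists of
        # (name, estimator) pairs; descend into them.
        for attr in ("steps", "transformer_list"):
            for _, child in getattr(obj, attr, None) or []:
                stack.append(child)
    return sorted(found)
```

A check such as `assert not unsupported_steps(lr)` before `joblib.dump` fails fast in Python instead of inside the Java converter.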
xgboost classifier and regressor
lightgbm classifier and regressor
tensorflow
Only DNNClassifier, DNNRegressor, LinearClassifier and LinearRegressor are supported, with one_hot_column, real_valued_column and sparse_column_with_keys feature columns (via jpmml-tensorflow).
spark ml
jpmml-sparkml offers broader PMML conversion than Spark MLlib's built-in PMML export and currently supports 35 algorithms.
- Feature extractors, transformers and selectors, except BucketedRandomProjectionLSH, DCT, ElementwiseProduct, LSH, MinHashLSH, Normalizer and PolynomialExpansion
- Classification: LR, GBT, DecisionTree, NN, RandomForest; no support for SVM or Naive Bayes
- Regression: Linear, GBT, DecisionTree, RandomForest, GLR; no support for survival regression or isotonic regression
- Clustering: KMeans; no support for GMM or LDA
R models
ppmml integrates jpmml-r
Preprocessing
- range, center, scale, medianImpute
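These four preprocessing methods match the transformation methods of caret's preProcess, which jpmml-r translates into PMML (treating them as caret methods is an assumption here). As a reference for what each one computes on a single numeric column, here is a plain-Python sketch; it applies each method in isolation and does not model how caret combines several methods, and the helper names are hypothetical:

```python
def column_stats(values):
    """Training statistics for one numeric column; None marks a missing value."""
    present = sorted(v for v in values if v is not None)
    n = len(present)
    mean = sum(present) / float(n)
    # R's sd() is the sample standard deviation (n - 1 denominator).
    sd = (sum((v - mean) ** 2 for v in present) / float(n - 1)) ** 0.5
    mid = n // 2
    if n % 2:
        median = present[mid]
    else:
        median = (present[mid - 1] + present[mid]) / 2.0
    return {"mean": mean, "sd": sd, "median": median,
            "min": present[0], "max": present[-1]}


def apply_method(x, method, stats):
    """Transform one value x with a single method, using training stats."""
    if method == "medianImpute":
        return stats["median"] if x is None else x  # fill missing values
    if method == "center":
        return x - stats["mean"]  # subtract the training mean
    if method == "scale":
        return x / stats["sd"]  # divide by the training sd
    if method == "range":
        # rescale so the training min maps to 0 and the training max to 1
        return (x - stats["min"]) / float(stats["max"] - stats["min"])
    raise ValueError("unsupported method: %s" % method)
```

The statistics come from the training data only, which is why the sketch separates column_stats from apply_method: a PMML file stores those fitted constants and applies them to new rows.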
R Algorithms
- glm - Generalized linear (GLM) regression and classification
- kmeans - K-Means clustering
- lm - Linear (LM) regression
- XGBoost, GBM
- Random Forest
- SVM Classifier and Regression
- Scorecard regression
- earth - Multivariate Adaptive Regression Spline (MARS) regression
- elmNN - Extreme Learning Machine (ELM) regression
- iForest - Isolation Forest (IF) anomaly detection
- mvr - Multivariate Regression (MVR) regression
- lrm - Binary Logistic Regression (LR) classification
- ols - Ordinary Least Squares (OLS) regression
Requirements
- python 2.7
- jdk 1.8+
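ppmml wraps jar-based jpmml converters, so the JDK requirement matters at conversion time. A minimal sketch for checking it, assuming the java executable is on the PATH and that `java -version` prints its usual `version "..."` line (Java 8 reports 1.8.x, later releases report 9, 11, 17, ...); both helper names are hypothetical:

```python
import re
import subprocess


def java_major_version(version_output):
    """Parse the major release from `java -version` output text."""
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("unrecognized java -version output")
    parts = match.group(1).split(".")
    # Java 8 and earlier report "1.8.0_292"; Java 9+ report "11.0.2", "17", ...
    token = parts[1] if parts[0] == "1" and len(parts) > 1 else parts[0]
    return int(re.match(r"\d+", token).group(0))


def installed_java_major():
    """Run `java -version` (assumed to be on the PATH) and parse its output."""
    # `java -version` writes to stderr, so merge stderr into the captured output.
    out = subprocess.check_output(["java", "-version"], stderr=subprocess.STDOUT)
    return java_major_version(out.decode("utf-8", "replace"))
```

For example, `if installed_java_major() < 8: raise SystemExit("ppmml needs JDK 1.8+")` fails before any converter is invoked.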
How to build from source
sh build.sh clean package
The output egg package will be placed in the dist directory.
Install locally:
python setup.py install
Project Structure
- ppmml: ppmml python libraries
- deps: jar dependencies, including jpmml-sklearn, jpmml-tensorflow, jpmml-r, jpmml-spark, jpmml-xgboost, jpmml-lightgbm and jpmml-evaluator.
- examples: ppmml example notebooks
Run unit tests
Please refer to the dev guide.
All unit tests pass with these versions:
- tensorflow 1.4
- xgboost 0.6a2
- scikit-learn 0.19
- lightgbm 2.0.11
- spark 2.2, 2.3
- R 3.4.2
- jpmml-model 1.3.8
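To see how a local Python environment compares with the tested versions above, a hypothetical helper can import each package and read its `__version__` attribute. The mapping from package to import name (scikit-learn imports as sklearn, Spark's Python API as pyspark) is written out in the sketch; R and jpmml-model are not Python packages and are left out:

```python
import importlib

# Tested versions from the list above, keyed by Python import name.
TESTED_VERSIONS = {
    "tensorflow": "1.4",
    "xgboost": "0.6a2",
    "sklearn": "0.19",
    "lightgbm": "2.0.11",
    "pyspark": "2.2, 2.3",
}


def compare_with_tested(tested=None):
    """Map module name -> (installed version or None, tested version)."""
    if tested is None:
        tested = TESTED_VERSIONS
    report = {}
    for module_name, tested_version in sorted(tested.items()):
        try:
            module = importlib.import_module(module_name)
            installed = getattr(module, "__version__", "unknown")
        except ImportError:
            installed = None  # package not installed
        report[module_name] = (installed, tested_version)
    return report
```

A mismatch does not necessarily mean conversion will fail; it only marks combinations the unit tests did not cover.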
Notes
Notes for Users
- pmml converters only run locally; in particular, the Spark converter creates a new local SparkSession
- Users should make sure the input path and the pmml output path are correct
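Since path handling is left to the user, a hypothetical pre-check like the one below can run before calling ppmml.to_pmml: it expands `~`, makes both paths absolute, and fails early if the input file or the output directory is missing. Making the paths absolute is a defensive choice so the result does not depend on the current working directory; whether ppmml strictly needs it is an assumption:

```python
import os


def resolve_io_paths(model_input, pmml_output):
    """Hypothetical pre-check: return absolute paths or fail early."""
    model_input = os.path.abspath(os.path.expanduser(model_input))
    pmml_output = os.path.abspath(os.path.expanduser(pmml_output))
    if not os.path.isfile(model_input):
        raise IOError("model input not found: %s" % model_input)
    out_dir = os.path.dirname(pmml_output)
    if not os.path.isdir(out_dir):
        raise IOError("pmml output directory does not exist: %s" % out_dir)
    return model_input, pmml_output
```

Usage: `src, dst = resolve_io_paths("lr.pkl.z", "lr.pmml")` followed by `ppmml.to_pmml(src, dst, model_type='sklearn')`.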
Notes for developers
- jpmml-tensorflow hasn't been published to Maven, so pull the code and compile the jar manually from the fork (https://github.com/lgrcyanny/jpmml-tensorflow/tree/fix-building). The forked version fixes a compile error in jpmml-tensorflow.
