PySpark2PMML
Python package for converting Apache Spark ML pipelines to PMML.
Features
This package is a thin PySpark wrapper for the JPMML-SparkML library.
News and Updates
See the NEWS.md file.
Prerequisites
- PySpark 3.0.X through 3.5.X, 4.0.X or 4.1.X.
- Python 3.8 or newer.
Installation
Install a release version from PyPI:
pip install pyspark2pmml
Alternatively, install the latest snapshot version from GitHub:
pip install --upgrade git+https://github.com/jpmml/pyspark2pmml.git
Configuration
One and the same PySpark2PMML version works across all supported PySpark release lines. Version variance is confined to the underlying JPMML-SparkML library, where each Apache Spark release line maps to a dedicated JPMML-SparkML release line.
PySpark2PMML must be paired with JPMML-SparkML based on the following compatibility matrix:
Active development branches:
| Apache Spark version | JPMML-SparkML branch | Latest JPMML-SparkML version |
|----------------------|----------------------|------------------------------|
| 3.4.X | 3.0.X | 3.0.10 |
| 3.5.X | 3.1.X | 3.1.10 |
| 4.0.X | 3.2.X | 3.2.9 |
| 4.1.X | master | 3.3.2 |
Stale development branches:
| Apache Spark version | JPMML-SparkML branch | Latest JPMML-SparkML version |
|----------------------|----------------------|------------------------------|
| 3.0.X | 2.0.X | 2.0.6 |
| 3.1.X | 2.1.X | 2.1.6 |
| 3.2.X | 2.2.X | 2.2.6 |
| 3.3.X | 2.3.X | 2.3.5 |
| 3.4.X | 2.4.X | 2.4.4 |
| 3.5.X | 2.5.X | 2.5.3 |
PySpark2PMML Python APIs are simple and stable over time. If the package has not been updated for months or even a year, this does not mean that it has fallen behind JPMML-SparkML development in any way.
Quite the contrary. The latest PySpark2PMML package version should be fully interoperable with any and all JPMML-SparkML library versions that have been released since the package itself was published.
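The compatibility matrix above can be turned into a small lookup helper. The following is a minimal sketch, not part of PySpark2PMML: the names `LATEST_JPMML_SPARKML` and `jpmml_sparkml_version` are hypothetical, the version numbers are copied from the tables, and preferring the active branch for release lines that appear in both tables (3.4.X and 3.5.X) is an assumption.

```python
# Latest JPMML-SparkML version per Apache Spark release line, copied from
# the compatibility tables. Where a release line is listed in both tables
# (3.4 and 3.5), the active-branch entry is used; that choice is an assumption.
LATEST_JPMML_SPARKML = {
    "3.0": "2.0.6",
    "3.1": "2.1.6",
    "3.2": "2.2.6",
    "3.3": "2.3.5",
    "3.4": "3.0.10",
    "3.5": "3.1.10",
    "4.0": "3.2.9",
    "4.1": "3.3.2",
}


def jpmml_sparkml_version(spark_version):
    """Return the JPMML-SparkML version paired with a Spark version string."""
    # Keep only the "major.minor" release line, e.g. "3.5.1" -> "3.5".
    release_line = ".".join(spark_version.split(".")[:2])
    try:
        return LATEST_JPMML_SPARKML[release_line]
    except KeyError:
        raise ValueError("Unsupported Apache Spark version: " + spark_version)
```

With PySpark installed, the helper could be called as `jpmml_sparkml_version(pyspark.__version__)`. The table values go stale as new JPMML-SparkML versions are released, so the dictionary would need to be kept in sync by hand.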
Usage
Launch PySpark; use the --packages command-line option to specify the coordinates of relevant JPMML-SparkML modules:
- org.jpmml:pmml-sparkml:${version} - Core module.
- org.jpmml:pmml-sparkml-lightgbm:${version} - LightGBM via SynapseML extension module.
- org.jpmml:pmml-sparkml-xgboost:${version} - XGBoost via XGBoost4J-Spark extension module.
Launching core:
pyspark --packages org.jpmml:pmml-sparkml:${version}
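When PySpark is started from a Python script rather than from the `pyspark` launcher, the same coordinates can be supplied through the standard `spark.jars.packages` Spark property. The sketch below is one possible way to do that; the helper names `packages_argument` and `create_session` are hypothetical, and the approach assumes that no Spark session is already running, because package resolution happens when the JVM starts.

```python
def packages_argument(version, extensions=()):
    """Build a comma-separated coordinate list: core module plus extensions."""
    modules = ["pmml-sparkml"] + ["pmml-sparkml-" + ext for ext in extensions]
    return ",".join("org.jpmml:{}:{}".format(module, version) for module in modules)


def create_session(version, extensions=()):
    """Start a SparkSession that resolves the JPMML-SparkML modules."""
    # Imported lazily so that packages_argument() is usable without PySpark.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .config("spark.jars.packages", packages_argument(version, extensions))
        .getOrCreate()
    )
```

The string returned by `packages_argument` is also a valid value for the `--packages` command-line option, which accepts several coordinates separated by commas. For example, `packages_argument("3.1.10", ["xgboost"])` yields the core and XGBoost coordinates for the 3.1.X branch.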
Fitting a Spark ML pipeline:
from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import RFormula
df = spark.read.csv("Iris.csv", header = True, inferSchema = True)
formula = RFormula(formula = "Species ~ .")
classifier = DecisionTreeClassifier()
pipeline = Pipeline(stages = [formula, classifier])
pipelineModel = pipeline.fit(df)
Exporting the fitted Spark ML pipeline to a PMML file:
from pyspark2pmml import PMMLBuilder
pmmlBuilder = PMMLBuilder(df.schema, pipelineModel)
pmmlBuilder.buildFile("DecisionTreeIris.pmml")
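A quick way to sanity-check the exported file is to list the fields declared in its data dictionary. The sketch below uses only the Python standard library and relies on the PMML convention that input and target fields are declared as `DataField` elements; the function name `data_fields` is hypothetical and is not part of PySpark2PMML.

```python
import xml.etree.ElementTree as ET


def data_fields(pmml_source):
    """Return (name, optype, dataType) for every DataField in a PMML document."""
    root = ET.parse(pmml_source).getroot()
    fields = []
    for element in root.iter():
        # Drop the "{namespace}" prefix so that any PMML schema version matches.
        if element.tag.rsplit("}", 1)[-1] == "DataField":
            fields.append(
                (element.get("name"), element.get("optype"), element.get("dataType"))
            )
    return fields
```

Calling `data_fields("DecisionTreeIris.pmml")` after the export step should return one tuple per column that the converted pipeline actually uses, which makes it easy to spot a missing or mistyped feature.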
The representation of individual Spark ML pipeline stages can be customized via conversion options:
from pyspark2pmml import PMMLBuilder
classifierModel = pipelineModel.stages[1]
pmmlBuilder = PMMLBuilder(df.schema, pipelineModel) \
    .putOption(classifierModel, "compact", False) \
    .putOption(classifierModel, "estimate_featureImportances", True)
pmmlBuilder.buildFile("DecisionTreeIris.pmml")
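When several options are kept in a dictionary, for instance loaded from a configuration file, the chained `putOption` calls can be driven by a loop. This is a minimal sketch under the assumption, suggested by the chaining shown above, that `putOption` returns the builder; the helper name `put_options` is hypothetical.

```python
def put_options(pmml_builder, stage, options):
    """Apply every key-value pair in `options` to `stage`; return the builder."""
    for key, value in options.items():
        # Reassign on each call, mirroring the chained putOption style.
        pmml_builder = pmml_builder.putOption(stage, key, value)
    return pmml_builder
```

A possible call is `put_options(PMMLBuilder(df.schema, pipelineModel), classifierModel, {"compact": False, "estimate_featureImportances": True})`, followed by `buildFile` on the returned builder.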
License
PySpark2PMML is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.
If you would like to use PySpark2PMML in a proprietary software project, then it is possible to enter into a licensing agreement which makes PySpark2PMML available under the terms and conditions of the BSD 3-Clause License instead.
Additional information
PySpark2PMML is developed and maintained by Openscoring Ltd, Estonia.
Interested in using Java PMML API software in your company? Please contact info@openscoring.io
