
.. _readme:

onETL
=====

|Repo Status| |PyPI Latest Release| |PyPI License| |PyPI Python Version| |PyPI Downloads| |Documentation| |CI Status| |Test Coverage| |pre-commit.ci Status|

.. |Repo Status| image:: https://www.repostatus.org/badges/latest/active.svg
    :alt: Repo status - Active
    :target: https://github.com/MTSWebServices/onetl
.. |PyPI Latest Release| image:: https://img.shields.io/pypi/v/onetl
    :alt: PyPI - Latest Release
    :target: https://pypi.org/project/onetl/
.. |PyPI License| image:: https://img.shields.io/pypi/l/onetl.svg
    :alt: PyPI - License
    :target: https://github.com/MTSWebServices/onetl/blob/develop/LICENSE.txt
.. |PyPI Python Version| image:: https://img.shields.io/pypi/pyversions/onetl.svg
    :alt: PyPI - Python Version
    :target: https://pypi.org/project/onetl/
.. |PyPI Downloads| image:: https://img.shields.io/pypi/dm/onetl
    :alt: PyPI - Downloads
    :target: https://pypi.org/project/onetl/
.. |Documentation| image:: https://readthedocs.org/projects/onetl/badge/?version=stable
    :alt: Documentation - ReadTheDocs
    :target: https://onetl.readthedocs.io/
.. |CI Status| image:: https://github.com/MTSWebServices/onetl/workflows/Tests/badge.svg
    :alt: Github Actions - latest CI build status
    :target: https://github.com/MTSWebServices/onetl/actions
.. |Test Coverage| image:: https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/MTSOnGithub/03e73a82ecc4709934540ce8201cc3b4/raw/onetl_badge.json
    :target: https://github.com/MTSWebServices/onetl/actions
.. |pre-commit.ci Status| image:: https://results.pre-commit.ci/badge/github/MTSWebServices/onetl/develop.svg
    :alt: pre-commit.ci - status
    :target: https://results.pre-commit.ci/latest/github/MTSWebServices/onetl/develop

|Logo|

.. |Logo| image:: docs/_static/logo_wide.svg
    :alt: onETL logo
    :target: https://github.com/MTSWebServices/onetl

What is onETL?
--------------

Python ETL/ELT library powered by `Apache Spark <https://spark.apache.org/>`_ & other open-source tools.
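
The difference between the two styles can be shown without Spark. The sketch below is a plain-Python stand-in, not onETL code. It assumes the standard-library ``sqlite3`` module as a hypothetical target database, and the table and column names are made up. ``etl()`` transforms rows in Python before loading them. ``elt()`` loads the raw rows first and then transforms them with SQL inside the database. In onETL terms, these roughly correspond to transforming a Spark DataFrame and to executing SQL directly in a database.

.. code-block:: python

    import sqlite3

    # Hypothetical source rows: (order_id, amount in cents)
    SOURCE_ROWS = [(1, 1250), (2, 990), (3, 30000)]


    def etl(conn: sqlite3.Connection) -> list:
        """ETL: transform in Python first, then load the result."""
        conn.execute("CREATE TABLE orders_etl (order_id INTEGER, amount_usd REAL)")
        transformed = [(order_id, cents / 100) for order_id, cents in SOURCE_ROWS]
        conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", transformed)
        return conn.execute("SELECT * FROM orders_etl ORDER BY order_id").fetchall()


    def elt(conn: sqlite3.Connection) -> list:
        """ELT: load raw rows first, then transform with SQL inside the database."""
        conn.execute("CREATE TABLE orders_raw (order_id INTEGER, amount_cents INTEGER)")
        conn.executemany("INSERT INTO orders_raw VALUES (?, ?)", SOURCE_ROWS)
        # Divide by 100.0, not 100: SQLite's integer / integer is integer division.
        conn.execute(
            "CREATE TABLE orders_elt AS "
            "SELECT order_id, amount_cents / 100.0 AS amount_usd FROM orders_raw"
        )
        return conn.execute("SELECT * FROM orders_elt ORDER BY order_id").fetchall()


    conn = sqlite3.connect(":memory:")
    etl_result = etl(conn)
    elt_result = elt(conn)

Both functions produce the same rows. The only difference is where the transformation runs. ELT keeps the transformation inside the database engine, which is why it depends on being able to execute SQL there directly.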

Goals
-----

* Provide unified classes to extract data from (E) & load data to (L) various stores.
* Provide the `Spark DataFrame API <https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.html>`_ for performing the transformation (T) step of ETL.
* Provide direct access to databases, allowing you to execute SQL queries and DDL/DML statements, and to call functions/procedures. This can be used for building ELT pipelines.
* Support different `read strategies <https://onetl.readthedocs.io/en/stable/strategy/index.html>`_, e.g. incremental reads.
* Provide `hooks <https://onetl.readthedocs.io/en/stable/hooks/index.html>`_ & `plugins <https://onetl.readthedocs.io/en/stable/plugins.html>`_ mechanisms for altering the behavior of internal classes.
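
Incremental reads are commonly built on a high-water mark (HWM). The HWM is the largest value of an increasing column seen by the previous run, so the next run fetches only newer rows. The plain-Python sketch below shows that general idea only and is not onETL's strategy API. ``read_incrementally`` and the in-memory ``hwm_store`` dict are hypothetical stand-ins for a database query and a persistent HWM store.

.. code-block:: python

    from typing import Dict, List


    def read_incrementally(
        rows: List[dict], hwm_store: Dict[str, int], hwm_name: str, column: str
    ) -> List[dict]:
        """Return only rows whose ``column`` value is above the stored HWM."""
        previous = hwm_store.get(hwm_name)
        # No HWM stored yet (first run): read everything.
        new_rows = [r for r in rows if previous is None or r[column] > previous]
        if new_rows:
            # Advance the HWM to the largest value just read.
            hwm_store[hwm_name] = max(r[column] for r in new_rows)
        return new_rows


    hwm_store: Dict[str, int] = {}
    table = [{"id": 1}, {"id": 2}, {"id": 3}]

    first = read_incrementally(table, hwm_store, "orders.id", "id")  # 3 rows
    table.append({"id": 4})
    second = read_incrementally(table, hwm_store, "orders.id", "id")  # only id=4
    third = read_incrementally(table, hwm_store, "orders.id", "id")  # no rows

In a real pipeline, the HWM would be persisted outside the process, because the dict above is lost on exit. It should also be advanced only after the data has been written successfully. That way, a failed run re-reads the same range instead of skipping it.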
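
A hooks mechanism lets user code run extra logic around a library method without editing the class itself. The sketch below shows one generic way to build such a mechanism with decorators and a callback registry. ``hookable``, ``register_hook`` and ``Reader`` are hypothetical names. The code does not reproduce onETL's actual hooks API, which is described in the linked documentation.

.. code-block:: python

    import functools
    from collections import defaultdict
    from typing import Callable, DefaultDict, List

    # Registry: qualified method name -> callbacks run before the method body.
    _HOOKS: DefaultDict[str, List[Callable]] = defaultdict(list)


    def register_hook(method: Callable) -> Callable:
        """Decorator factory: attach a callback to a hookable method."""
        def decorator(callback: Callable) -> Callable:
            _HOOKS[method.__qualname__].append(callback)
            return callback
        return decorator


    def hookable(method: Callable) -> Callable:
        """Decorator: run every registered callback before the wrapped method."""
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            for callback in _HOOKS[method.__qualname__]:
                callback(self, *args, **kwargs)
            return method(self, *args, **kwargs)
        return wrapper


    class Reader:
        @hookable
        def run(self, source: str) -> str:
            return f"data from {source}"


    calls: List[str] = []


    @register_hook(Reader.run)
    def log_source(reader: Reader, source: str) -> None:
        calls.append(source)


    result = Reader().run("schema.table")

The method is wrapped once, when the class is defined, and callbacks can be registered later without touching ``Reader``. Keying the registry by ``__qualname__`` works because ``functools.wraps`` copies that attribute to the wrapper.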

Non-goals
---------

* onETL is not a Spark replacement. It just provides additional functionality that Spark does not have and improves UX for end users.
* onETL is not a framework: it imposes no requirements on project structure, naming, the way ETL/ELT processes are run, configuration, etc. All of that should be implemented in some other tool.
* onETL is deliberately developed without any integration with scheduling software like Apache Airflow. All such integrations should be implemented as separate tools.
* No Spark streaming support of any kind; only batch operations are supported. For streaming, prefer `Apache Flink <https://flink.apache.org/>`_.

Requirements
------------

* Python 3.7 - 3.14
* PySpark 3.2.x - 4.1.x (depends on the connector used)
* Java 8+ (required by Spark, see below)
* Kerberos libs & GCC (required by ``Hive``, ``HDFS`` and ``SparkHDFS`` connectors)
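
The supported PySpark range depends on the connector, so it may help to check installed versions before creating a Spark session. The following is a minimal sketch that assumes only the bounds listed above. ``parse_version`` and ``in_range`` are hypothetical helpers, and connector-specific limits are not modeled.

.. code-block:: python

    import sys
    from typing import Tuple


    def parse_version(text: str) -> Tuple[int, int]:
        """Keep only the major and minor parts, e.g. '3.5.1' -> (3, 5)."""
        major, minor = text.split(".")[:2]
        return int(major), int(minor)


    def in_range(version: str, low: str, high: str) -> bool:
        """Inclusive major.minor range check."""
        return parse_version(low) <= parse_version(version) <= parse_version(high)


    python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    python_ok = in_range(python_version, "3.7", "3.14")

The helper compares integer tuples rather than strings. As strings, ``"3.10" < "3.7"``, which would wrongly reject Python 3.10. For PySpark, the same check could be written as ``in_range(pyspark.__version__, "3.2", "4.1")``.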

Supported storages
------------------

+--------------------+--------------+-------------------------------------+
| Type               | Storage      | Powered by                          |
+====================+==============+=====================================+
| Database           | Clickhouse   | `Apache Spark JDBC Data Source`_    |
+                    +--------------+                                     +
|                    | MSSQL        |                                     |
+                    +--------------+                                     +
|                    | MySQL        |                                     |
+                    +--------------+                                     +
|                    | Postgres     |                                     |
+                    +--------------+                                     +
|                    | Oracle       |                                     |
+                    +--------------+-------------------------------------+
|                    | Hive         | `Apache Spark Hive integration`_    |
+                    +--------------+-------------------------------------+
|                    | Iceberg      | `Apache Iceberg Spark integration`_ |
+                    +--------------+-------------------------------------+
|                    | Kafka        | `Apache Spark Kafka integration`_   |
+                    +--------------+-------------------------------------+
|                    | Greenplum    | `VMware Greenplum Spark connector`_ |
+                    +--------------+-------------------------------------+
|                    | MongoDB      | `MongoDB Spark connector`_          |
+--------------------+--------------+-------------------------------------+
| File               | HDFS         | `HDFS Python client`_               |
+                    +--------------+-------------------------------------+
|                    | S3           | `minio-py client`_                  |
+                    +--------------+-------------------------------------+
|                    | SFTP         | `Paramiko library`_                 |
+                    +--------------+-------------------------------------+
|                    | FTP          | `FTPUtil library`_                  |
+                    +--------------+                                     +
|                    | FTPS         |                                     |
+                    +--------------+-------------------------------------+
|                    | WebDAV       | `WebdavClient3 library`_            |
+                    +--------------+-------------------------------------+
|                    | Samba        | `pysmb library`_                    |
+--------------------+--------------+-------------------------------------+
| Files as DataFrame | SparkLocalFS | `Apache Spark File Data Source`_    |
+--------------------+--------------+-------------------------------------+

.. _Apache Spark JDBC Data Source: https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
.. _Apache Spark Hive integration: https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
.. _Apache Iceberg Spark integration: https://iceberg.apache.org/spark-quickstart/
.. _Apache Spark Kafka integration: https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html
.. _VMware Greenplum Spark connector: https://docs.vmware.com/en/VMware-Greenplum-Connector-for-Apache-Spark/index.html
.. _MongoDB Spark connector: https://www.mongodb.com/docs/spark-connector/current
.. _HDFS Python client: https://pypi.org/project/hdfs/
.. _minio-py client: https://pypi.org/project/minio/
.. _Paramiko library: https://pypi.org/project/paramiko/
.. _FTPUtil library: https://pypi.org/project/ftputil/
.. _WebdavClient3 library: https://pypi.org/project/webdavclient3/
.. _pysmb library: https://pypi.org/project/pysmb/
.. _Apache Spark File Data Source: https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html
