Hamilton

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. It runs and scales everywhere Python does.


<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --> <div align="center"> <h1><img src="https://github.com/apache/hamilton/assets/2328071/feb6abaa-b6d5-4271-a320-0ae4a18d8aa7" width="50"/> Apache Hamilton — portable & expressive <br> data transformation DAGs</h1> <a href='https://hamilton.apache.org/?badge=latest'> <img src='https://readthedocs.org/projects/hamilton/badge/?version=latest' alt='Documentation Status' /> </a><a href="https://www.python.org/downloads/" target="_blank"> <img src="https://img.shields.io/badge/python-3.10%20|%203.11%20|%203.12%20|%203.13%20|%203.14-blue.svg" alt="Python supported"/> </a> <a href="https://pypi.org/project/sf-hamilton/" target="_blank"> <img src="https://badge.fury.io/py/sf-hamilton.svg" alt="PyPi Version"/> </a> <a href="https://pepy.tech/project/sf-hamilton" target="_blank"> <img src="https://pepy.tech/badge/sf-hamilton" alt="Total Downloads"/> </a> <a href="https://pepy.tech/project/sf-hamilton" target="_blank"> <img src="https://static.pepy.tech/badge/sf-hamilton/month" alt="Total Monthly Downloads"/> </a> <br/> <a href="https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g" target="_blank"> <img src="https://img.shields.io/badge/Apache Hamilton-Join-purple.svg?logo=slack" alt="Apache 
Hamilton Slack"/> </a> <a href="https://twitter.com/hamilton_os" target="_blank"> <img src="https://img.shields.io/badge/HamiltonOS-Follow-purple.svg?logo=X"/> </a> </div> <br></br>

Disclaimer

Apache Hamilton is an effort undergoing incubation at the Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC.

Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects.

While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

Apache Hamilton (incubating) is a lightweight Python library for directed acyclic graphs (DAGs) of data transformations. Your DAG is portable; it runs anywhere Python runs, whether it's a script, notebook, Airflow pipeline, FastAPI server, etc. Your DAG is expressive; Apache Hamilton has extensive features to define and modify the execution of a DAG (e.g., data validation, experiment tracking, remote execution).

To create a DAG, write regular Python functions that specify their dependencies with their parameters. As shown below, it results in readable code that can always be visualized. Apache Hamilton loads that definition and automatically builds the DAG for you!

<div align="center"> <img src="./docs/_static/abc_highlight.png" alt="Create a project" width="65%"/> </div> <div align="center"> Functions <code>B()</code> and <code>C()</code> refer to function <code>A</code> via their parameters </div> <br>
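To make the idea concrete, here is a minimal sketch in plain Python. The functions `A`, `B`, and `C` are written the way the text describes: the function name is the node name, and each parameter names an upstream node. The `resolve` helper is a hypothetical stand-in that mimics how a DAG can be derived from parameter names; it is not Hamilton's implementation, and the return values are made up for illustration.

```python
import inspect


# Node definitions: the function name is the node name, and every
# parameter name refers to another node that must be computed first.
def A() -> int:
    """Source node with no dependencies (value chosen for illustration)."""
    return 2


def B(A: int) -> int:
    """Depends on node A through its parameter name."""
    return A + 1


def C(A: int) -> int:
    """Also depends on node A."""
    return A * 10


def resolve(name: str, nodes: dict, cache: dict):
    """Hypothetical helper: compute a node by first computing its parameters.

    This only illustrates the parameter-name convention; it is not how
    Hamilton is implemented.
    """
    if name not in cache:
        fn = nodes[name]
        # Each parameter name is looked up as another node.
        kwargs = {
            param: resolve(param, nodes, cache)
            for param in inspect.signature(fn).parameters
        }
        cache[name] = fn(**kwargs)
    return cache[name]


nodes = {fn.__name__: fn for fn in (A, B, C)}
cache = {}  # A is computed once and reused by both B and C
results = {name: resolve(name, nodes, cache) for name in ("B", "C")}
print(results)  # {'B': 3, 'C': 20}
```

In a real project you would keep the three functions in a module and let the Apache Hamilton `Driver` build and execute the DAG in place of `resolve`.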

Apache Hamilton brings modularity and structure to any Python application that moves data: ETL pipelines, ML workflows, LLM applications, RAG systems, and BI dashboards. The Apache Hamilton UI lets you automatically visualize, catalog, and monitor execution.

Apache Hamilton is great for DAGs, but if you need loops or conditional logic to create an LLM agent or a simulation, take a look at our sister library Burr 🤖.

Installation

Apache Hamilton supports Python 3.10+. The command below includes the optional visualization dependency for displaying your Apache Hamilton DAG. For visualizations, Graphviz must also be installed on your system separately.

pip install "sf-hamilton[visualization]"

To use the Apache Hamilton UI, install the ui and sdk dependencies.

pip install "sf-hamilton[ui,sdk]"

To try Apache Hamilton in the browser, visit www.tryhamilton.dev

Why use Apache Hamilton?

Data teams write code to deliver business value, but few have the resources to standardize practices and provide quality assurance. Moving from proof-of-concept to production and cross-function collaboration (e.g., data science, engineering, ops) remain challenging for teams, big or small. Apache Hamilton is designed to help throughout a project's lifecycle:

  • Separation of concerns. Apache Hamilton separates the DAG "definition" from its "execution", which lets data scientists focus on solving problems while engineers manage production pipelines.

  • Effective collaboration. The Apache Hamilton UI provides a shared interface for teams to inspect results and debug failures throughout the development cycle.

  • Low-friction dev to prod. Use @config.when() to modify your DAG between execution environments instead of error-prone if/else feature flags. The notebook extension prevents the pain of migrating code from a notebook to a Python module.

  • Portable transformations. Your DAG is independent of infrastructure or orchestration, meaning you can develop and debug locally and reuse code across contexts (local, Airflow, FastAPI, etc.).

  • Maintainable DAG definition. Apache Hamilton automatically builds the DAG from a single line of code whether it has 10 or 1000 nodes. It can also assemble multiple Python modules into a pipeline, encouraging modularity.

  • Expressive DAGs. Function modifiers are a unique feature to keep your code DRY and reduce the complexity of maintaining large DAGs. Other frameworks inevitably lead to code redundancy or bloated functions.

  • Built-in coding style. The Apache Hamilton DAG is defined using Python functions, encouraging modular, easy-to-read, self-documenting, and unit testable code.

  • Data and schema validation. Decorate functions with @check_output to validate output properties, and raise warnings or exceptions. Add the SchemaValidator() adapter to automatically inspect dataframe-like objects (pandas, polars, Ibis, etc.) to track and validate their schema.

  • Built for plugins. Apache Hamilton is designed to play nice with all tools and provides the right abstractions to create custom integrations with your stack. Our lively community will help you build what you need!

Apache Hamilton UI

You can track the execution of your Apache Hamilton DAG in the Apache Hamilton UI. It automatically populates a data catalog with lineage / tracing and provides execution observability to inspect results and debug errors. You can run it as a local server or a self-hosted application using Docker.

<p align="center"> <img src="./docs/_static/hamilton_1.jpeg" alt="Description1" width="30%" style="margin-right: 20px;"/> <img src="./docs/_static/hamilton_2.jpeg" alt="Description2" width="30%" style="margin-right: 20px;"/> <img src="./docs/_static/hamilton_3.jpeg" alt="Description3" width="30%"/> </p> <p align="center"> <em>DAG catalog, automatic dataset profiling, and execution tracking</em> </p>

Get started with the Apache Hamilton UI

  1. To use the Apache Hamilton UI, install the dependencies (see Installation section) and start the server with

    hamilton ui
    
  2. On the first connection, create a username and a new project (the project_id should be 1).

<div align="center"> <img src="./docs/_static/new_project.png" alt="Create a project" width="70%"/> </div> <br>
  3. Track your Apache Hamilton DAG by creating a HamiltonTracker object with your username and project_id and adding it to your Builder. Now, your DAG will appear in the UI's catalog and all executions will be tracked!

    from hamilton import driver
    from hamilton_sdk.adapters import HamiltonTracker
    import my_dag
    
    # use your `username` and `project_id`
    tracker = HamiltonTracker(
        username="my_username",
        project_id=1,
        dag_name="hello_world",
    )
    
    # adding the tracker to the `Builder` will add the DAG to the catalog
    dr = (
        driver.Builder()
        .with_modules(my_dag)
        .with_adapters(tracker)  # add your tracker here
        .build()
    )
    
    # executing the `Driver` will track results
    dr.execute(["C"])
    

Documentation & learning resources

  • 📚 See the official documentation to learn about the core concepts of Apache Hamilton.

  • 👨‍🏫 Consult the examples on GitHub to learn about specific features or integrations with other frameworks.

  • 📰 The DAGWorks blog includes guides about how to build a dat
