
The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.


===============================================================================
The Public Utility Data Liberation Project (PUDL)
===============================================================================

.. readme-intro

.. |repo-status| image:: https://www.repostatus.org/badges/latest/active.svg
   :target: https://www.repostatus.org/#active
   :alt: Project Status: Active
.. |pytest| image:: https://github.com/catalyst-cooperative/pudl/workflows/pytest/badge.svg
   :target: https://github.com/catalyst-cooperative/pudl/actions?query=workflow%3Apytest
   :alt: PyTest Status
.. |codecov| image:: https://img.shields.io/codecov/c/github/catalyst-cooperative/pudl?style=flat&logo=codecov
   :target: https://codecov.io/gh/catalyst-cooperative/pudl
   :alt: Codecov Test Coverage
.. |rtd| image:: https://img.shields.io/readthedocs/catalystcoop-pudl?style=flat&logo=readthedocs
   :target: https://catalystcoop-pudl.readthedocs.io/en/nightly/
   :alt: Read the Docs Build Status
.. |oc| image:: https://opencollective.com/pudl/tiers/badge.svg
   :target: https://opencollective.com/pudl
.. |ruff| image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json
   :target: https://github.com/astral-sh/ruff
.. |pre-commit-ci| image:: https://results.pre-commit.ci/badge/github/catalyst-cooperative/pudl/main.svg
   :target: https://results.pre-commit.ci/latest/github/catalyst-cooperative/pudl/main
   :alt: pre-commit CI
.. |zenodo-doi| image:: https://zenodo.org/badge/80646423.svg
   :target: https://zenodo.org/badge/latestdoi/80646423
   :alt: Zenodo DOI
.. |office-hours| image:: https://img.shields.io/badge/calend.ly-officehours-darkgreen
   :target: https://calend.ly/catalyst-cooperative/pudl-office-hours
   :alt: Schedule a 1-on-1 chat with us about PUDL.
.. |mastodon| image:: https://img.shields.io/mastodon/follow/110855618428885893?domain=https%3A%2F%2Fmastodon.energy&style=social&color=%23000000&link=https%3A%2F%2Fmastodon.energy%2F%40catalystcoop
   :target: https://mastodon.energy/@catalystcoop
   :alt: Follow Catalyst Cooperative on Mastodon
.. |slack| image:: https://img.shields.io/badge/Slack-4A154B?logo=slack&logoColor=fff
   :target: https://join.slack.com/t/catalystcooperative/shared_invite/zt-2yg1v2sb7-GsoGlA9Ojc_LCJ00vPWKbQ
.. |linkedin| image:: https://img.shields.io/badge/LinkedIn-0077B5?style=flat&logo=linkedin&logoColor=white
   :target: https://linkedin.com/company/catalyst-cooperative/
   :alt: Follow Catalyst Cooperative on LinkedIn
.. |bluesky| image:: https://img.shields.io/badge/Bluesky-0285FF?logo=bluesky&logoColor=fff&style=flat
   :target: https://bsky.app/profile/catalyst.coop
   :alt: Follow @catalyst.coop on BlueSky
.. |kaggle| image:: https://img.shields.io/badge/Kaggle-20BEFF?style=flat&logo=Kaggle&logoColor=white
   :target: https://www.kaggle.com/datasets/catalystcooperative/pudl-project
   :alt: The PUDL Dataset on Kaggle
.. |youtube| image:: https://img.shields.io/badge/YouTube-%23FF0000.svg?logo=YouTube&logoColor=white
   :target: https://youtube.com/@CatalystCooperative
   :alt: Catalyst Cooperative on YouTube
.. |aws| image:: https://img.shields.io/badge/Amazon_AWS-FF9900?style=flat&logo=amazonaws&logoColor=white
   :target: https://registry.opendata.aws/catalyst-cooperative-pudl/
   :alt: PUDL in the AWS Open Data Registry

|repo-status| |pytest| |codecov| |rtd| |oc| |ruff| |pre-commit-ci| |zenodo-doi| |office-hours| |mastodon| |linkedin| |bluesky| |kaggle| |slack| |youtube| |aws|

What is PUDL?

The `PUDL <https://catalyst.coop/pudl/>`__ Project (pronounced puddle) is an open source data processing pipeline that makes US energy data easier to access and use programmatically.

US government agencies publish hundreds of gigabytes of valuable data, but it is often difficult to work with. PUDL takes the original spreadsheets, CSV files, and databases and turns them into a unified resource. This allows users to spend more time on novel analysis and less time on data preparation.

The project focuses on serving researchers, activists, journalists, policy makers, and small businesses that might not otherwise be able to afford access to this data from commercial sources, and that may not have the time or expertise to do all the data processing themselves from scratch.

We want to make this data accessible and easy to work with for as wide an audience as possible: anyone from grassroots youth climate organizers working with Google Sheets to university researchers with access to scalable cloud computing resources, and everyone in between!

PUDL comprises three core components:

Raw Data Archives
^^^^^^^^^^^^^^^^^

PUDL `archives <https://github.com/catalyst-cooperative/pudl-archiver>`__ all our raw inputs on `Zenodo <https://zenodo.org/communities/catalyst-cooperative/?page=1&size=20>`__ to ensure permanent, versioned access to the data. In the event that an agency changes how it publishes data or deletes old files, the data processing pipeline will still have access to the original inputs. Each of the data inputs may have several different versions archived, and all are assigned a unique DOI (digital object identifier) and made available through Zenodo's REST API. You can read more about the Raw Data Archives in the `docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/#raw-data-archives>`__.
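As a rough illustration of that REST access, the sketch below builds the API URL for a Zenodo record and lists the files named in its metadata. It assumes Zenodo's public ``/api/records/<id>`` endpoint and a ``files`` list whose entries carry ``key`` and ``links.self`` fields. The function names are hypothetical helpers, not part of PUDL, and the DOI used in the usage note is a placeholder.

```python
import json
from urllib.request import urlopen

# Zenodo's public REST endpoint for records.
ZENODO_API = "https://zenodo.org/api/records"


def record_id_from_doi(doi: str) -> int:
    """Extract the numeric record ID from a Zenodo DOI (10.5281/zenodo.<id>)."""
    return int(doi.rsplit(".", 1)[1])


def record_url(record_id: int) -> str:
    """Build the REST API URL for a single Zenodo record."""
    return f"{ZENODO_API}/{record_id}"


def list_files(record: dict) -> list[tuple[str, str]]:
    """Return (filename, download URL) pairs from a record's JSON metadata.

    Assumption: each entry in record["files"] has a "key" (the filename)
    and a "links" mapping with a "self" download URL.
    """
    return [(f["key"], f["links"]["self"]) for f in record.get("files", [])]


def fetch_record(record_id: int) -> dict:
    """Download and decode one record's metadata (needs network access)."""
    with urlopen(record_url(record_id)) as resp:
        return json.load(resp)
```

For example, ``list_files(fetch_record(record_id_from_doi("10.5281/zenodo.1234567")))`` would list the files behind that placeholder DOI; substitute a DOI taken from the PUDL archives.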

Data Pipeline
^^^^^^^^^^^^^

The data pipeline (this repo) ingests raw data from the archives, cleans and integrates it, and writes the resulting tables to `SQLite <https://sqlite.org>`__ and `Apache Parquet <https://parquet.apache.org/>`__ files, with some accompanying metadata stored as JSON. Each release of the PUDL software contains a set of DOIs indicating which versions of the raw inputs it processes. This helps ensure that the outputs are replicable. You can read more about our ETL (extract, transform, load) process in the `PUDL documentation <https://catalystcoop-pudl.readthedocs.io/en/nightly/#the-etl-process>`__.
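Because the outputs are ordinary SQLite files, they can be explored with Python's standard library alone. The sketch below lists the tables in a database and previews a few rows from one of them. The helper names are hypothetical, and no actual PUDL file or table names are assumed.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all user tables in a SQLite database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    )
    return [name for (name,) in rows]


def preview(conn: sqlite3.Connection, table: str, limit: int = 5) -> list[tuple]:
    """Fetch the first few rows of a table.

    The table name is checked against the database's own table list first,
    because SQL identifiers cannot be passed as bound parameters.
    """
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table!r}")
    return conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,)).fetchall()
```

To try it, open a downloaded output file with ``sqlite3.connect("path/to/output.sqlite")`` (a placeholder path), call ``list_tables(conn)`` to see what is available, and pass one of the returned names to ``preview``.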

Data Warehouse
^^^^^^^^^^^^^^

The SQLite, Parquet, and JSON outputs from the data pipeline, sometimes called "PUDL outputs", are updated each night by an automated build process, and periodically archived so that users can access the data without having to install and run our data processing system. These outputs contain hundreds of tables and comprise a small file-based data warehouse that can be used for a variety of energy system analyses. Learn more about `how to access the PUDL data <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_access.html>`__.

What data is available?

PUDL currently integrates data from:

  • EIA Form 176 (a few tables -- work in progress):
    • `Source Docs <https://www.eia.gov/dnav/ng/TblDefs/NG_DataSources.html#s176>`__
    • `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/eia176.html>`__
  • EIA Form 860:
    • `Source Docs <https://www.eia.gov/electricity/data/eia860/>`__
    • `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/eia860.html>`__
  • EIA Form 860m:
    • `Source Docs <https://www.eia.gov/electricity/data/eia860m/>`__
  • EIA Form 861:
    • `Source Docs <https://www.eia.gov/electricity/data/eia861/>`__
    • `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/eia861.html>`__
  • EIA Form 923:
    • `Source Docs <https://www.eia.gov/electricity/data/eia923/>`__
    • `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/eia923.html>`__
  • EIA Form 930:
    • `Source Docs <https://www.eia.gov/electricity/gridmonitor/>`__
    • `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/eia930.html>`__
  • EIA Annual Energy Outlook (AEO) (a few tables):
    • `Source Docs <https://www.eia.gov/outlooks/aeo/>`__
    • `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/eiaaeo.html>`__
  • EPA Continuous Emissions Monitoring System (CEMS):
    • `Source Docs <https://campd.epa.gov/>`__
    • `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/epacems.html>`__
  • FERC Form 1 (dozens of fully processed tables, plus raw data converted to SQLite):
    • `Source Docs <https://www.ferc.gov/industries-data/electric/general-information/electric-industry-forms/form-1-electric-utility-annual>`__
    • `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/ferc1.html>`__
  • FERC Form 714 (a few fully processed tables):
    • `Source Docs <https://www.ferc.gov/industries-data/electric/general-information/electric-industry-forms/form-no-714-annual-electric/data>`__
    • `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/ferc714.html>`__
  • FERC Form 2 (raw data converted to SQLite):
    • `Source Docs <https://www.ferc.gov/industries-data/natural-gas/industry-forms/form-2-2a-3-q-gas-historical-vfp-data>`__
  • FERC Form 6 (raw data converted to SQLite):
    • `Source Docs <https://www.ferc.gov/general-information-1/oil-industry-forms/form-6-6q-historical-vfp-data>`__
  • FERC Form 60 (raw data converted to SQLite):
    • `Source Docs <https://www.ferc.gov/form-60-annual-report-centralized-service-companies>`__
  • NREL Annual Technology Baseline (ATB) for Electricity:
    • `Source Docs <https://atb.nrel.gov/electricity/2024/data>`__
    • `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/nrelatb.html>`__
  • GridPath Resource Adequacy Toolkit (partial):
    • `Source Docs <https://gridlab.org/gridpathratoolkit/>`__
    • `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/gridpathratoolkit.html>`__
  • US Census Demographic Profile 1 Geodatabase:
    • `Source Docs <https://www.census.gov/geographies/mapping-files/2010/geo/tiger-data.html>`__
    • `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/censusdp1tract.html>`__

High Priority Target Datasets
