Open Collective

The Public Utility Data Liberation Project

Liberating US energy system data for easy use by people fighting climate change.

Today's Balance
$77,503.19
Estimated Annual Budget
$85,026
PUDL Stewardship (2025)
$166,000
per year

Contribute


Become a financial contributor.

Financial Contributions

Custom contribution
Watt

Your contribution will help keep PUDL going strong! Watt contributors are encouraged to engage with PUDL directly through GitHub issues and discuss...


Recurring contribution
Kilowatt⚡️

You are a part of keeping PUDL free and updated! You can participate in our quarterly PUDL work planning and prioritization process, and we’ll incl...

Starts at
$8,000 USD / year

Recurring contribution
Megawatt ⚡️⚡️

You are a key part of keeping PUDL free and updated! You can participate in our quarterly PUDL work planning and prioritization process, and we’ll ...

Starts at
$16,000 USD / year


Recurring contribution
Gigawatt⚡️⚡️⚡️

You are a major part of keeping PUDL free and updated! You can participate in our quarterly PUDL work planning and prioritization process, and we’l...

Starts at
$25,000 USD / year


The Public Utility Data Liberation Project is all of us

Our contributors 7

Thank you for supporting The Public Utility Data Liberation Project.

RMI

Gigawatt⚡️⚡️⚡️

$32,000 USD

GridLab

Gigawatt⚡️⚡️⚡️

$30,000 USD

ZERO Lab @ Pr...

Megawatt ⚡️⚡️

$20,000 USD

Saul Shanabrook

Watt

$2 USD

Katherine Lamb

$1 USD

Energy data for all!

About


It’s been eight years since we started the Public Utility Data Liberation Project (PUDL)! With support from funders like the Alfred P. Sloan Foundation and organizations like RMI and GridLab, we’ve been able to provide cleaned, connected and open-access versions of essential public energy datasets to help close the gap on information asymmetry in the power sector.

We’re proud to publish data that’s used by a growing number of organizations and open-source projects working to decarbonize the US energy system, such as RMI, Princeton’s ZERO Lab, Singularity’s Open Grid Emissions, and PyPSA-USA. We want to ensure that PUDL users working in the public interest continue to have free and open access to analysis-ready energy data.

But keeping PUDL free is…well…not free. As the project’s maintainers, we’re committed to ensuring that the data and software are kept up to date, bugs get fixed, testing is ongoing, and data releases happen on a regular schedule. We’ve done the math, and keeping PUDL running costs Catalyst approximately $166,000 per year.

Introducing the PUDL Sustainers Program

As we enter 2025, we’re excited to launch the PUDL Sustainers Program, a crowdsourced effort to raise capital and build a powerful community of users to help inform the direction of PUDL!

Typically, open source projects rely on a blend of grants and donations to keep their lights on. Grant funding remains an essential resource for us in supporting the development of new tooling and datasets, but much of the other work we do on PUDL needs more consistent funding. We want to fill this funding gap with users who find enough value in PUDL to provide financial support and contribute to its direction. We believe that if the cost of ongoing maintenance for the project is distributed amongst our user community, then PUDL will be a better, more reliable tool for all.

That’s where you come in! We’re asking organizations and individuals who use PUDL data to become PUDL Sustainers by contributing financially to the project. 

Sustainers will receive the following perks: 
  • Help keep PUDL regularly updated and freely available to the public under an open license. You are closing the gap on information asymmetry in the US energy system.
  • Your role in sustaining PUDL is acknowledged on the project website, GitHub repository, in Catalyst’s social media posts, presentations, and other public forums.
  • Funders at the Kilowatt level and above: You’re invited to participate in PUDL’s quarterly planning process. If we raise more than our target budget, this will include the ability to propose and vote on additional PUDL enhancements to work on in 2025!

What is PUDL?
The PUDL Project (pronounced puddle) is an open source data processing pipeline that makes US energy data easier to access and use programmatically. 

Hundreds of gigabytes of valuable data are published by US government agencies, but they are often prohibitively difficult to work with. PUDL takes the original spreadsheets, CSV files, and databases and turns them into a unified resource. This allows users to spend more time on novel analysis and less time on data preparation.

The project is focused on serving researchers, activists, journalists, policy makers, and small businesses that might not otherwise be able to afford access to this data from commercial sources and who may not have the time or expertise to do all the data processing themselves from scratch. 

We want to make this data accessible and easy to work with for as wide an audience as possible: anyone from grassroots youth climate organizers working with Google Sheets to professional researchers with access to scalable cloud computing resources and everyone in between!

PUDL consists of three components: the raw data archives, the data pipeline that transforms them, and the distributed outputs that form a small data warehouse for analytical use. Read through our documentation page for more information!

Supporting PUDL
Based on our experience over the last year, the budget for 2025 will be $166,000. This number may change slightly from year to year, but conceptually the budget will support:

  • Monthly archives of all our raw input datasets.
  • Annual updates to all existing PUDL data sources (no later than two weeks after the raw data are published).
  • Quarterly versioned data releases.
  • Computing resources for continuous integration testing and nightly data builds.
  • Maintenance of software dependencies, documentation, and example notebooks.
  • Distribution of PUDL data to the AWS Open Data Registry, Zenodo, Kaggle, and our web interface.
  • Immediate support for bugs and data quality issues.
  • Incremental adaptation of the PUDL infrastructure.
  • Project management and overhead.

See the full statement of work below for more detail.

Beyond The Baseline
For the moment we are focused on making sure that the basic operations and maintenance of PUDL are sustainable.

If we can recruit support beyond the stated PUDL budget, Sustainers will be able to submit proposals during our quarterly planning process to enhance PUDL beyond what’s included in the baseline statement of work. For example, a Sustainer might propose integrating a new dataset, or adding more tables from a dataset that PUDL already covers.

What We’re Asking For
Contributions of any size to the PUDL Project are welcome! However, we’re suggesting annual contributions of 5% to 15% of the overall budget ($8k to $25k).

These are separated into the following contribution tiers: 
  • Watt: under 5%
  • Kilowatt: 5-9.9%
  • Megawatt: 10-14.9%
  • Gigawatt: 15% +
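
As a rough illustration of how the percentage bands above map onto dollar amounts, here is a minimal Python sketch. It assumes the $166,000 budget stated on this page and treats each band's lower edge as inclusive. The function name `tier_for` and the table `TIER_FLOORS_PCT` are hypothetical, not part of any PUDL tooling.

```python
ANNUAL_BUDGET_USD = 166_000  # 2025 budget stated on this page

# Lower edge of each tier as a percentage of the annual budget,
# ordered from highest to lowest (assumption: lower edges are inclusive).
TIER_FLOORS_PCT = [
    ("Gigawatt", 15),
    ("Megawatt", 10),
    ("Kilowatt", 5),
    ("Watt", 0),
]


def tier_for(contribution_usd, budget_usd=ANNUAL_BUDGET_USD):
    """Return the tier name for an annual contribution (hypothetical helper)."""
    if contribution_usd <= 0:
        raise ValueError("contribution must be positive")
    for name, floor_pct in TIER_FLOORS_PCT:
        # Compare by cross-multiplying instead of dividing, so that
        # contributions sitting exactly on a band edge are not misplaced
        # by floating-point rounding.
        if contribution_usd * 100 >= floor_pct * budget_usd:
            return name


print(tier_for(32_000))  # Gigawatt
print(tier_for(20_000))  # Megawatt
print(tier_for(2))       # Watt
```

Note that the tier cards on this page list rounded starting amounts ($8,000, $16,000, and $25,000), so a strict percentage reading like this one places $8,000 (about 4.8% of the budget) just under the Kilowatt band. The listed starting amounts are the ones that apply when contributing.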

We aim to cap any individual contribution at one third of the annual budget. However, RMI is helping us launch the program and has pledged to cover 40% of the budget for the first half of 2025.

Spread the word! We need your help recruiting additional PUDL Sustainers to fill the gap in the 2025 budget.

Benefits of Being a Sustainer
  • You are helping keep PUDL regularly updated and freely available to the public under an open license. You are closing the gap on information asymmetry in the US energy system.
  • Your role in sustaining PUDL is acknowledged on the project website, GitHub repository, in Catalyst’s social media posts, presentations, and other public forums in proportion to your contribution.
  • If we fundraise in excess of the baseline budget: Sustainers at the level of Kilowatt and greater are invited to participate in the quarterly PUDL work planning process where they can vote (in proportion to their contribution) on what PUDL enhancements will be funded with the additional resources.
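
The page does not spell out how votes "in proportion to their contribution" would be tallied. One simple reading, sketched below in Python, gives each eligible Sustainer a weight equal to their share of the total contributed by eligible Sustainers. The function `vote_weights`, the sustainer labels, and the use of the $8,000 Kilowatt starting amount as the eligibility cutoff are all assumptions made for illustration.

```python
KILOWATT_FLOOR_USD = 8_000  # Kilowatt starting amount listed on this page


def vote_weights(contributions, floor_usd=KILOWATT_FLOOR_USD):
    """Map each eligible sustainer to a vote weight (hypothetical scheme).

    `contributions` maps a sustainer label to an annual amount in USD.
    Only sustainers at or above `floor_usd` receive a weight, and the
    weights sum to 1 whenever at least one sustainer is eligible.
    """
    eligible = {
        name: amount
        for name, amount in contributions.items()
        if amount >= floor_usd
    }
    total = sum(eligible.values())
    if total == 0:
        return {}
    return {name: amount / total for name, amount in eligible.items()}


# Illustrative labels and amounts only.
weights = vote_weights(
    {"sustainer_a": 32_000, "sustainer_b": 16_000, "sustainer_c": 2}
)
for name, weight in weights.items():
    print(f"{name}: {weight:.1%}")
```

In this example the $2 contributor receives no vote weight, and the other two split the vote two to one.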

All users, regardless of financial contributions, are encouraged to engage with PUDL by submitting GitHub issues, participating in GitHub discussions, attending office hours, and filling out annual user surveys.

2025 PUDL Statement of Work

Our estimated annual budget for stewarding PUDL covers the following tasks. It also includes a buffer that we believe should be sufficient to deal with external changes that are beyond our control. These include unexpected changes to upstream data formats, the constantly evolving open source software landscape, and necessary infrastructure upgrades as the data being processed grows over time.

Data Archiving

  • Ensure we capture monthly snapshots of all our input data sources, and archive them on Zenodo.
  • Fresh archives should be up on Zenodo by the 5th of each month.
  • Adapt the archiving system as needed when upstream data publication methods or formats change. In most cases, little or no work is required, but occasionally upstream data changes substantially, requiring significant work to integrate.
  • Snapshots are captured monthly to ensure that we have a chance to identify upstream issues that are beyond our control early, and work to address them before we need the data to be ready for use in a quarterly release.
  • Monthly snapshots also ensure that if there are upstream issues that come up right before a quarterly release, we still have relatively fresh inputs that can be used.

Quarterly Data Updates

  • Includes all PUDL datasets that are updated more frequently than annually.
  • Take place in February, May, August, and November.
  • Updated data should be in the nightly builds by the 15th of each of these months.
  • Versioned releases should be out by the 20th of each of these months.
  • These versioned data releases are archived for long term access.
  • Datasets receiving quarterly updates include:
    • EIA Forms 860M and 923 (year-to-date)
    • EIA Form 930 (hourly Grid Monitor)
    • EIA Bulk Electricity API Data
    • EPA CEMS Hourly Emissions

Annual Data Updates

  • Includes all PUDL datasets that are updated on an annual basis, including EIA “Early Release” data.
  • FERC XBRL datasets have continuous, rolling submissions. Q2 data should be nearly complete. Q3 data often includes significant revisions or late submissions, so we tend to include updates in both of these quarters.
  • Different annual datasets become available at different times of the year.
  • Datasets receiving annual updates include:
    • EIA Form 860 (early release in Q2 or Q3, final release in Q3 or Q4)
    • EIA Form 923 (early release in Q2 or Q3, final release in Q3 or Q4)
    • EIA Form 861 (final release only, typically in Q3 – there is no early release data)
    • FERC Form 1 (early release in Q2, final release in Q3)
    • FERC Form 714 (early release in Q2, final release in Q3)
    • PHMSA Annual Natural Gas Report (currently only extracted, not fully transformed and distributed)
    • EIA Natural Gas data (Forms 176, 191, & 757a – currently only extracted, not fully transformed and distributed)
    • NREL ATB for Electricity (Q2 or Q3)
    • The crosswalk table connecting EIA generation units and EPA emissions units
    • The crosswalk table connecting FERC & EIA utilities and plants.
  • In addition to the data from FERC Forms 1 and 714 that is fully integrated into PUDL, those datasets and FERC’s other forms (Forms 2, 6, and 60) contain many additional tables. We archive these tables and convert them from DBF and XBRL to SQLite, a modern format that is more accessible and better suited to analytical use. This is possible because we can reuse the tooling that we apply to FERC Forms 1 and 714 with minimal intervention. If upstream changes to the data format required significant work, we might not be able to continue publishing these data, but so long as they remain low effort we intend to keep them updated.

Ongoing Tasks

  • Triage newly reported or identified bugs and data issues, and fix those bugs determined to be urgent.
  • Ensure that PUDL relies on recent, supported versions of all our software dependencies so that it continues to function and be maintainable and we do not accumulate technical debt.
  • Deal with unexpected nightly build or infrastructure issues that are not related to new features or datasets. For example:
    • accommodate breaking changes introduced by new versions of our dependencies;
    • adapt to changes in the APIs of services we depend on, including Google Cloud, GitHub Actions, Zenodo, etc.;
    • address resource or architectural constraints that come up as the data being processed by PUDL grows over time.
  • Ensure that our example notebooks continue working with new data as it is published.
  • Ensure that our documentation remains up to date.

Cloud/Compute Costs

  • Infrastructure required for our continuous integration and deployment (CI/CD) pipelines, including nightly builds that run all available tests and data validations and regenerate the outputs from scratch.
  • Storage and network egress for regular data releases and data distribution.
  • Hosting costs to provide a browseable and queryable web interface to the data we publish.

Our team

Zane Selvans

Admin
Set public energy data free!

Austen Sharpe

Admin
I <3 open data

Katherine Lamb

Energy data for all!

Budget


Transparent and open finances.

Invoice #232391
Contribution #815167

Credit from ZERO Lab @ Princeton to The Public Utility Data Liberation Project

+$20,000.00 USD
Completed
Today’s balance

$77,503.19 USD

Total raised

$85,003.29 USD

Total disbursed

$7,500.10 USD

Estimated annual budget

$85,026.00 USD