The Public Utility Data Liberation Project
Liberating US energy system data for easy use by people fighting climate change.
Financial Contributions
Your contribution will help keep PUDL going strong! Watt contributors are encouraged to engage with PUDL directly through GitHub issues and discuss...
You are a part of keeping PUDL free and updated! You can participate in our quarterly PUDL work planning and prioritization process, and we’ll incl...
You are a key part of keeping PUDL free and updated! You can participate in our quarterly PUDL work planning and prioritization process, and we’ll ...
You are a major part of keeping PUDL free and updated! You can participate in our quarterly PUDL work planning and prioritization process, and we’l...
The Public Utility Data Liberation Project is all of us
Our contributors 7
Thank you for supporting The Public Utility Data Liberation Project.
- RMI: Gigawatt ⚡️⚡️⚡️, $32,000 USD
- GridLab: Gigawatt ⚡️⚡️⚡️, $30,000 USD
- ZERO Lab @ Pr...: Megawatt ⚡️⚡️, $20,000 USD
- Catalyst Coop...: $3,000 USD
- Saul Shanabrook: Watt, $2 USD
- Christina Gos...: Test, $1 USD
About
- Help keep PUDL regularly updated and freely available to the public under an open license. You are closing the gap on information asymmetry in the US energy system.
- Your role in sustaining PUDL is acknowledged on the project website, GitHub repository, in Catalyst’s social media posts, presentations, and other public forums.
- Funders above the Kilowatt level: You’re invited to participate in PUDL’s quarterly planning process. If we fundraise more than our target budget, this will include the ability to propose and vote on additional PUDL enhancements to work on in 2025!
What is PUDL?
Supporting PUDL
- Monthly archives of all our raw input datasets.
- Annual updates to all existing PUDL data sources (no later than two weeks after the raw data are published).
- Quarterly versioned data releases.
- Computing resources for continuous integration testing and nightly data builds.
- Maintenance of software dependencies, documentation, and example notebooks.
- Distribution of PUDL data to the AWS Open Data Registry, Zenodo, Kaggle, and web interface.
- Immediate support for bugs and data quality issues.
- Incremental adaptation of the PUDL infrastructure.
- Project management and overhead.
Beyond The Baseline
What We’re Asking For
- Watt: 0-5%
- Kilowatt: 5-9.9%
- Megawatt: 10-14.9%
- Gigawatt: 15% +
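The tier bands above can be read as a simple lookup. The following is a minimal sketch, assuming the percentages refer to a contribution's share of a fundraising target (the text here does not say which figure they are measured against). The function name `tier_for_share` is hypothetical. Because the listed Watt and Kilowatt bands both include 5%, the sketch assumes that exactly 5% counts as Kilowatt.

```python
def tier_for_share(share_pct: float) -> str:
    """Map a contribution share (in percent) to a sustainer tier name.

    Assumption: bands are treated as half-open intervals, so a share of
    exactly 5% is Kilowatt, exactly 10% is Megawatt, and exactly 15% is
    Gigawatt.
    """
    if share_pct < 0:
        raise ValueError("share_pct must be non-negative")
    if share_pct >= 15:
        return "Gigawatt"
    if share_pct >= 10:
        return "Megawatt"
    if share_pct >= 5:
        return "Kilowatt"
    return "Watt"
```

For example, under these assumptions a share of 12% would map to Megawatt and a share of 2% to Watt.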
Benefits of Being a Sustainer
- You are helping keep PUDL regularly updated and freely available to the public under an open license. You are closing the gap on information asymmetry in the US energy system.
- Your role in sustaining PUDL is acknowledged on the project website, GitHub repository, in Catalyst’s social media posts, presentations, and other public forums in proportion to your contribution.
- If we fundraise in excess of the baseline budget: Sustainers at the level of Kilowatt and greater are invited to participate in the quarterly PUDL work planning process where they can vote (in proportion to their contribution) on what PUDL enhancements will be funded with the additional resources.
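The text does not spell out how proportional voting would be tallied. As one possible reading, the sketch below assumes each eligible sustainer's vote weight is their contribution divided by the total contributed by all eligible sustainers. The function name `vote_weights` and the sustainer names in the usage note are hypothetical.

```python
def vote_weights(contributions: dict[str, float]) -> dict[str, float]:
    """Return each sustainer's share of the total as a vote weight.

    Assumption: `contributions` already contains only sustainers who are
    eligible to vote (Kilowatt level and above), keyed by name.
    """
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("total contributions must be positive")
    return {name: amount / total for name, amount in contributions.items()}
```

With two made-up sustainers contributing 30,000 and 20,000, the weights would be 0.6 and 0.4, and the weights always sum to 1.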
2025 PUDL Statement of Work
Data Archiving
- Ensure we capture monthly snapshots of all our input data sources, and archive them on Zenodo.
- Fresh archives should be up on Zenodo by the 5th of each month.
- Adapt the archiving system as needed when upstream data publication methods or formats change. In most cases, little or no work is required, but occasionally upstream data changes substantially, requiring significant work to integrate.
- Snapshots are captured monthly to ensure that we have a chance to identify upstream issues that are beyond our control early, and work to address them before we need the data to be ready for use in a quarterly release.
- Monthly snapshots also ensure that if there are upstream issues that come up right before a quarterly release, we still have relatively fresh inputs that can be used.
Quarterly Data Updates
- Includes all PUDL datasets that are updated more frequently than annually.
- Take place in February, May, August, and November.
- Updated data should be in the nightly builds by the 15th of each of these months.
- Versioned releases should be out by the 20th of each of these months.
- These versioned data releases are archived for long term access.
- Datasets receiving quarterly updates include:
  - EIA Forms 860M and 923 (year-to-date)
  - EIA Form 930 (hourly Grid Monitor)
  - EIA Bulk Electricity API Data
  - EPA CEMS Hourly Emissions
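The quarterly schedule described above (updates in February, May, August, and November, with nightly builds by the 15th and versioned releases by the 20th) can be written down as a small date calculation. This is only an illustrative sketch; the function and constant names are hypothetical and are not part of PUDL.

```python
from datetime import date

# Months and days taken from the schedule described in the text.
QUARTERLY_UPDATE_MONTHS = (2, 5, 8, 11)
NIGHTLY_BUILD_DAY = 15
VERSIONED_RELEASE_DAY = 20


def quarterly_deadlines(year: int) -> list[dict[str, date]]:
    """List the nightly-build and versioned-release target dates for a year."""
    return [
        {
            "nightly_build_by": date(year, month, NIGHTLY_BUILD_DAY),
            "versioned_release_by": date(year, month, VERSIONED_RELEASE_DAY),
        }
        for month in QUARTERLY_UPDATE_MONTHS
    ]
```

Calling `quarterly_deadlines(2025)` returns four entries, the first with a nightly build target of 15 February 2025 and a release target of 20 February 2025.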
Annual Data Updates
- Includes all PUDL datasets that are updated on an annual basis, including EIA “Early Release” data.
- FERC XBRL datasets have continuous, rolling submissions. Q2 data should be nearly complete. Q3 data often includes significant revisions or late submissions, so we tend to include updates in both of these quarters.
- Different annual datasets become available at different times of the year.
- Datasets receiving annual updates include:
  - EIA Form 860 (early release in Q2 or Q3, final release in Q3 or Q4)
  - EIA Form 923 (early release in Q2 or Q3, final release in Q3 or Q4)
  - EIA Form 861 (final release only, typically in Q3 – there is no early release data)
  - FERC Form 1 (early release in Q2, final release in Q3)
  - FERC Form 714 (early release in Q2, final release in Q3)
  - PHMSA Annual Natural Gas Report (currently only extracted, not fully transformed and distributed)
  - EIA Natural Gas data (Forms 176, 191, & 757a – currently only extracted, not fully transformed and distributed)
  - NREL ATB for Electricity (Q2 or Q3)
  - The crosswalk table connecting EIA generation units and EPA emissions units
  - The crosswalk table connecting FERC & EIA utilities and plants
- In addition to the data from FERC Forms 1 and 714 that is fully integrated into PUDL, those datasets and FERC’s other forms contain many additional tables. We archive these and convert them from DBF & XBRL to a modern format (SQLite) that is more accessible and amenable to analytical use. These include FERC Forms 2, 6, and 60. This is possible because we can re-use the same tooling that we apply to FERC Forms 1 and 714 with minimal intervention. If upstream changes in the data format required significant work, we might not be able to continue publishing these data, but as long as they remain low effort we intend to keep them updated.
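One practical benefit of SQLite output is that it can be explored with Python's standard library alone. The sketch below lists the tables in a SQLite database. The helper name `list_tables` is hypothetical, and the demonstration uses an in-memory database with a made-up table rather than a real FERC file, since no file names or table names are given here.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in a SQLite database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


# Demonstration with an in-memory database and a placeholder table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_table (id INTEGER PRIMARY KEY, value TEXT)")
print(list_tables(conn))
```

To inspect a downloaded database instead, the same function could be pointed at a connection opened with `sqlite3.connect` on the local file path.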
Ongoing Tasks
- Triage newly reported or identified bugs and data issues, and fix those bugs determined to be urgent.
- Ensure that PUDL relies on recent, supported versions of all our software dependencies, so that it remains functional and maintainable and we do not accumulate technical debt.
- Deal with unexpected nightly build or infrastructure issues that are not related to new features or datasets. For example:
  - accommodate breaking changes introduced by new versions of our dependencies;
  - adapt to changes in the APIs of services we depend on, including Google Cloud, GitHub Actions, Zenodo, etc.;
  - address resource or architectural constraints that come up as the data being processed by PUDL grows over time.
- Ensure that our example notebooks continue working with new data as it is published.
- Ensure that our documentation remains up to date.
Cloud/Compute Costs
- Infrastructure required for our continuous integration and deployment (CI/CD) pipelines, including nightly builds that run all available tests and data validations and regenerate the outputs from scratch.
- Storage and network egress for regular data releases and data distribution.
- Hosting costs to provide a browseable and queryable web interface to the data we publish.
Our team
Budget
Transparent and open finances.
Credit from ZERO Lab @ Princeton to The Public Utility Data Liberation Project •
- Today's balance: $77,503.19 USD
- Total raised: $85,003.29 USD
- Total disbursed: $7,500.10 USD
- Estimated annual budget: $85,026.00 USD