Commit

Merge branch 'main' into dbt_setup
zschira committed Feb 14, 2025
2 parents 389c540 + 33a92bf commit eb0765a
Showing 40 changed files with 1,816 additions and 918 deletions.
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -21,6 +21,6 @@ authors:
given-names: Dazhong

title: "The Public Utility Data Liberation (PUDL) Project"
version: 2024.11.0
version: 2025.2.0
doi: 10.5281/zenodo.3404014
date-released: 2024-11-14
date-released: 2025-02-13
8 changes: 3 additions & 5 deletions README.rst
@@ -185,14 +185,12 @@ summary:
* `Kaggle <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_access.html#access-kaggle>`__
provides easy Jupyter notebook access to the PUDL data, updated weekly:
https://www.kaggle.com/datasets/catalystcooperative/pudl-project
* `Cloud storage <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_access.html#access-cloud>`__
is populated by our nightly data builds, and is free to access thanks to the `AWS
Open Data Registry <https://registry.opendata.aws/catalyst-cooperative-pudl/>`__.
* `Zenodo <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_access.html#access-zenodo>`__
provides stable long-term access to our versioned data releases with a citeable DOI:
https://doi.org/10.5281/zenodo.3653158
* `Nightly Data Builds <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_access.html#access-nightly-builds>`__
push their outputs to the AWS Open Data Registry:
https://registry.opendata.aws/catalyst-cooperative-pudl/
See `the nightly build docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_access.html#access-nightly-builds>`__
for direct download links.
* `The PUDL Development Environment <https://catalystcoop-pudl.readthedocs.io/en/nightly/dev/dev_setup.html>`__
lets you run the PUDL data processing pipeline locally.

362 changes: 243 additions & 119 deletions docs/data_access.rst

Large diffs are not rendered by default.

29 changes: 20 additions & 9 deletions docs/data_sources/other_data.rst
@@ -27,15 +27,26 @@ converted directly from the original geodatabase distributed by the US Census Bu
EPA CAMD to EIA Power Sector Data Crosswalk
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This crosswalk is periodically updated through a collaboration between the US EPA and
the EIA, and connects EPA CAMD emissions units (smokestacks) which appear in
:doc:`epacems` with corresponding EIA plant components reported in EIA Forms 860
and 923 (``plant_id_eia``, ``boiler_id``, ``generator_id``). This one-to-many
connection is necessary because pollutants from various plant parts are collecitvely
emitted and measured from one point-source. We augment the crosswalk that they publish
on GitHub.

The crosswalk is distributed by EPA `via GitHub <https://github.com/USEPA/camd-eia-crosswalk>`__.
The `original EPA CAMD to EIA crosswalk <https://github.com/USEPA/camd-eia-crosswalk>`__
was published by the US Environmental Protection Agency on GitHub and connects EPA CAMD
emissions units (smokestacks) which appear in :doc:`epacems` with corresponding EIA
plant components reported in EIA Forms 860 and 923 (``plant_id_eia``, ``boiler_id``,
``generator_id``). This many-to-many connection is necessary because pollutants from
various plant parts are collectively emitted and measured from one point-source.

The original crosswalk was generated using only 2018 data. However, there is useful
information in all years of data, and we augment the crosswalk that they publish on
GitHub by running their code against all available later years of data.

Re-running the crosswalk pulls the latest data from the
`CAMD FACT API <https://www.epa.gov/power-sector/field-audit-checklist-tool-fact-api>`__
which results in some changes to the generator and unit IDs reported on the EPA side of
the crosswalk. The changes only result in the addition of new units and generators in
the EPA data, with no changes to matches at the plant level (other than identification
of new plant-plant matches). We derive sub-plant IDs (``subplant_id``) from the
crosswalk in the table :ref:`core_epa__assn_eia_epacamd_subplant_ids`. Note that these
IDs are not necessarily stable across multiple releases of this data, and should not be
hard-coded into analyses.
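
Because ``subplant_id`` values can change between releases, one way to avoid
hard-coding them is to look them up at analysis time from the EIA identifiers the
crosswalk connects. The following is a minimal sketch rather than PUDL code: the rows
are made-up stand-ins for records from :ref:`core_epa__assn_eia_epacamd_subplant_ids`,
the helper names are hypothetical, and it assumes the table exposes ``plant_id_eia``,
``generator_id`` and ``subplant_id`` columns.

```python
# Made-up example rows; real values come from the current release of the
# core_epa__assn_eia_epacamd_subplant_ids table.
rows = [
    {"plant_id_eia": 1, "generator_id": "G1", "subplant_id": 0},
    {"plant_id_eia": 1, "generator_id": "G2", "subplant_id": 1},
    {"plant_id_eia": 2, "generator_id": "G1", "subplant_id": 0},
]


def build_subplant_lookup(rows):
    """Index crosswalk rows by (plant_id_eia, generator_id). Hypothetical helper."""
    return {
        (row["plant_id_eia"], row["generator_id"]): row["subplant_id"]
        for row in rows
    }


def subplant_for(lookup, plant_id_eia, generator_id):
    """Return the subplant_id in this release, or None if there is no match."""
    return lookup.get((plant_id_eia, generator_id))


lookup = build_subplant_lookup(rows)
# Resolve the ID from the loaded release instead of writing a literal into the
# analysis, so a new release with reshuffled IDs does not silently break it.
current_subplant = subplant_for(lookup, 1, "G2")
```

Rebuilding the lookup each time a new release is loaded keeps an analysis tied to the
generators it cares about rather than to ID values that may be reassigned.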

.. _data-eiaaeo:

4 changes: 2 additions & 2 deletions docs/dev/clone_ferc1.rst
@@ -29,8 +29,8 @@ The FoxPro / XBRL derived FERC Form 1 databases include 100+ tables, containing
columns.

If you need to work with this relatively unprocessed data, we highly recommend
downloading it from one of our periodic data releases or our
:ref:`access-nightly-builds`.
downloading it from one of our stable data releases or nightly build outputs, which
can be found in the PUDL :ref:`access-zenodo` or :ref:`access-cloud`.

Cloning the original FERC database is the first step in the PUDL ETL process. This can
be done using the dagster UI (see :ref:`run-dagster-ui`) or with the ``ferc_to_sqlite``
1 change: 1 addition & 0 deletions docs/index.rst
@@ -21,6 +21,7 @@ pages for each source:
* :doc:`data_sources/eia860`
* :doc:`data_sources/eia861`
* :doc:`data_sources/eia923`
* :doc:`data_sources/eia930`
* :doc:`data_sources/epacems`
* :doc:`data_sources/ferc1`
* :doc:`data_sources/ferc714`
120 changes: 105 additions & 15 deletions docs/release_notes.rst
@@ -9,10 +9,77 @@ v2025.XX.x (2025-MM-DD)
New Data
^^^^^^^^

Expanded Data Coverage
^^^^^^^^^^^^^^^^^^^^^^

Bug Fixes
^^^^^^^^^

Major Dependency Updates
^^^^^^^^^^^^^^^^^^^^^^^^

Quality of Life Improvements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. _release-v2025.2.0:

---------------------------------------------------------------------------------------
v2025.2.0 (2025-02-13)
---------------------------------------------------------------------------------------

This is our regular quarterly release for 2025Q1. It includes updates to all the
datasets that are published with quarterly or higher frequency, plus initial versions
of a few new data sources that have been in the works for a while.

One major change this quarter is that we are now publishing all processed PUDL data as
Apache Parquet files, alongside our existing SQLite databases. See :doc:`data_access`
for more on how to access these outputs.

Some potentially breaking changes to be aware of:

* In the :doc:`data_sources/eia930` a number of new energy sources have been added, and
some old energy sources have been split into more granular categories. See
:ref:`data-sources-eia930-changes-in-energy-source-granularity-over-time`.
* We are now running the EPA's CAMD to EIA unit crosswalk code for each individual year
starting from 2018, rather than just 2018 and 2021, resulting in more connections
between these two datasets and changes to some sub-plant IDs. See the note below for
more details.

Many thanks to the organizations who make these regular updates possible! Especially
`GridLab <https://gridlab.org>`__, `RMI <https://rmi.org>`__, and the `ZERO Lab at
Princeton University <https://zero.lab.princeton.edu/>`__. If you rely on PUDL and would
like to help ensure that the data keeps flowing, please consider joining them as a `PUDL
Sustainer <https://opencollective.com/pudl>`__, as we are still fundraising for 2025.

New Data
^^^^^^^^

EIA 176
~~~~~~~
* Add a couple of semi-transformed interim EIA-176 (natural gas sources and
dispositions) tables. They aren't yet being written to the database, but are one step
closer. See :issue:`3555` and PRs :pr:`3590,3978`. Thanks to :user:`davidmudrauskas`
for moving this dataset forward.
* Extracted these interim tables up through the latest 2023 data release. See
:issue:`4002` and :pr:`4004`.

EIA 860
~~~~~~~
* Added EIA 860 Multifuel table. See :issue:`3438` and :pr:`3946`.

FERC 1
~~~~~~
* Added three new output tables containing granular utility accounting data.
See :pr:`4057`, :issue:`3642` and the table descriptions in the data dictionary:

* :ref:`out_ferc1__yearly_detailed_income_statements`
* :ref:`out_ferc1__yearly_detailed_balance_sheet_assets`
* :ref:`out_ferc1__yearly_detailed_balance_sheet_liabilities`

SEC Form 10-K Parent-Subsidiary Ownership
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* We have added some new tables describing the parent-subsidary company ownership
* We have added some new tables describing the parent-subsidiary company ownership
relationships reported in the
`SEC's Form 10-K <https://en.wikipedia.org/wiki/Form_10-K>`__, Exhibit 21
"Subsidiaries of the Registrant". Where possible these tables link the SEC filers or
@@ -35,25 +102,51 @@ SEC Form 10-K Parent-Subsidiary Ownership
* :ref:`core_sec10k__quarterly_exhibit_21_company_ownership`
* :ref:`core_sec10k__quarterly_company_information`

New Data Coverage
^^^^^^^^^^^^^^^^^
Expanded Data Coverage
^^^^^^^^^^^^^^^^^^^^^^

EPA CEMS
~~~~~~~~
* Added 2024 Q4 of CEMS data. See :issue:`4041` and :pr:`4052`.

EIA 860
EPA CAMD EIA Crosswalk
~~~~~~~~~~~~~~~~~~~~~~
* In the past, the crosswalk in PUDL has used the EPA's published crosswalk (run with
2018 data), and an additional crosswalk we ran with 2021 EIA 860 data. To ensure that
the crosswalk reflects updates in both EIA and EPA data, we re-ran the EPA R code
which generates the EPA CAMD EIA crosswalk with 4 new years of data: 2019, 2020, 2022
and 2023. Re-running the crosswalk pulls the latest data from the CAMD FACT API, which
results in some changes to the generator and unit IDs reported on the EPA side of the
crosswalk, which feeds into the creation of :ref:`core_epa__assn_eia_epacamd`.
* The changes only result in the addition of new units and generators in the EPA data,
with no changes to matches at the plant level. However, the updates to generator and
unit IDs have resulted in changes to the subplant IDs - some EIA boilers and
generators which previously had no matches to EPA data have now been matched to EPA
unit data, resulting in an overall **reduction** in the number of rows in the
:ref:`core_epa__assn_eia_epacamd_subplant_ids` table. See issue :issue:`4039`
and PR :pr:`4056` for a discussion of the changes observed in the course of this
update.

EIA 860M
~~~~~~~~
* Added EIA 860m through December 2024. See :issue:`4038` and :pr:`4047`.

EIA 923
~~~~~~~
* Added EIA 860 Multifuel data. See :issue:`3438` and :pr:`3946`.
* Added EIA 923 monthly data through September 2024. See :issue:`4038` and :pr:`4047`.

EIA 176
EIA Bulk Electricity Data
~~~~~~~~~~~~~~~~~~~~~~~~~
* Updated the EIA Bulk Electricity data to include data published up through
2024-11-01. See :issue:`4042` and PR :pr:`4051`.

EIA 930
~~~~~~~
* Add a couple of semi-transformed interim EIA-176 (natural gas sources and
dispositions) tables. They aren't yet being written to the database, but are one step
closer. See :issue:`3555` and PRs :pr:`3590,3978`. Thanks to :user:`davidmudrauskas`
for moving this dataset forward.
* Extracted these interim tables up through the latest 2023 data release. See
:issue:`4002` and :pr:`4004`.
* Updated the EIA 930 data to include data published up through the beginning of
February 2025. See :issue:`4040` and PR :pr:`4054`. 10 new energy sources
were added and 3 were retired; see
:ref:`data-sources-eia930-changes-in-energy-source-granularity-over-time` for
more information.

Bug Fixes
^^^^^^^^^
@@ -69,9 +162,6 @@ Bug Fixes
:ref:`out_vcerare__hourly_available_capacity_factor` and related tables. See issue
:issue:`4007` and PR :pr:`4029`.

Major Dependency Updates
^^^^^^^^^^^^^^^^^^^^^^^^

Quality of Life Improvements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* We added a ``sources`` parameter to ``pudl.metadata.classes.DataSource.from_id()``
40 changes: 38 additions & 2 deletions docs/templates/eia930_child.rst.jinja
@@ -2,8 +2,9 @@
{% block database_text %}
Clicking on the links will show you a description of the table as well as the names and
descriptions of each of its fields. Due to the size of the EIA-930 hourly tables we only
publish them as Parquet files, which are not browseable online. See :ref:`access-kaggle`
and :ref:`access-nightly-builds` for information on how to access these outputs.
publish them as Parquet files, which are not browseable online, but they can be
downloaded directly from :ref:`access-cloud` via the links embedded in the
:doc:`/data_dictionaries/pudl_db` or accessed on :ref:`access-kaggle`.
{% endblock %}

{% block background %}
@@ -128,6 +129,41 @@ it includes many irregularities including outlying and missing values, which they
to manage in their aggregated (daily or nationwide) data, but which are present in the
original reported data.

.. _data-sources-eia930-changes-in-energy-source-granularity-over-time:

Changes in energy source granularity over time
----------------------------------------------
In the Q1 2025 data release, the wind, solar, and hydro power energy source columns
were removed from the raw dataset, and new columns for multiple different kinds of
wind, solar, hydro power, and energy storage were added. The change takes effect
starting in 2024half2. Analyses that cross this temporal boundary will need to
aggregate in order to compare e.g. 2025 solar with 2023 solar.

.. list-table::
:widths: 20 20
:header-rows: 1

* - Removed in Q1 2025 release
- Added in Q1 2025 release
* -
- ``battery_storage``
* -
- ``geothermal``
* - ``hydro``
- | ``hydro_excluding_pumped_storage``
| ``pumped_storage``
* -
- ``other_energy_storage``
* - ``solar``
- | ``solar_w_integrated_battery_storage``
| ``solar_wo_integrated_battery_storage``
* -
- ``unknown_energy_storage``
* - ``wind``
- | ``wind_w_integrated_battery_storage``
| ``wind_wo_integrated_battery_storage``
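
The aggregation described above can be sketched as follows. This is an illustrative
example, not part of PUDL: the mapping is taken from the table above, while the
function name and the input shape (a dictionary from energy source label to a value
for one reporting period) are assumptions made for the example. Sources with no
pre-change equivalent, such as ``geothermal``, are passed through unchanged.

```python
# Granular (post-change) energy sources mapped to the coarser (pre-change)
# categories they replaced, following the table above.
NEW_TO_OLD = {
    "hydro_excluding_pumped_storage": "hydro",
    "pumped_storage": "hydro",
    "solar_w_integrated_battery_storage": "solar",
    "solar_wo_integrated_battery_storage": "solar",
    "wind_w_integrated_battery_storage": "wind",
    "wind_wo_integrated_battery_storage": "wind",
}


def harmonize_energy_sources(values_by_source):
    """Sum granular sources into their pre-change categories.

    Hypothetical helper: ``values_by_source`` maps an energy source label to a
    numeric value. Labels not listed in NEW_TO_OLD keep their own name.
    """
    totals = {}
    for source, value in values_by_source.items():
        coarse = NEW_TO_OLD.get(source, source)
        totals[coarse] = totals.get(coarse, 0.0) + value
    return totals


# Made-up values for a single period after the change took effect.
after_change = {
    "solar_w_integrated_battery_storage": 10.0,
    "solar_wo_integrated_battery_storage": 90.0,
    "geothermal": 5.0,
}
comparable = harmonize_energy_sources(after_change)
```

Aggregating the newer data up to the older categories is the direction that works,
since the pre-change data cannot be split into the finer categories after the fact.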


Inconsistent interchange
------------------------

7 changes: 5 additions & 2 deletions docs/templates/epacems_child.rst.jinja
@@ -2,8 +2,11 @@
{% block database_text %}
Clicking on the links will show you a description of the table as well as the names and
descriptions of each of its fields. Due to the size of the CEMS data we only publish
it as Parquet files, which are not browseable online. See :ref:`access-kaggle` and
:ref:`access-nightly-builds` for information on how to access these outputs.
it as Parquet files, which are not browseable online, but which can be downloaded
directly from :ref:`access-cloud` using the links embedded in the
:doc:`/data_dictionaries/pudl_db` or accessed with a Jupyter Notebook on
:ref:`access-kaggle`. See :ref:`access-cloud` for more information on how to query
the Parquet outputs.
{% endblock %}

{% block background %}