Parallel concatenate #5926

Merged: 35 commits merged on Sep 3, 2024
35 commits
f9e0106
Simplify concatenate
bouweandela Apr 18, 2024
6b01352
First attempt at parallel concatenate
bouweandela Apr 23, 2024
35facc7
Clean up a bit
bouweandela Apr 25, 2024
6e883da
Add support for comparing different data types
bouweandela Apr 26, 2024
7cc9f19
Merge branch 'main' of github.com:scitools/iris into parallel-concate…
bouweandela Apr 26, 2024
add3f66
Undo unnecessary change
bouweandela May 15, 2024
eb0b340
More tests
bouweandela May 15, 2024
24615a0
Use faster lookup
bouweandela May 15, 2024
5c68a8e
Merge branch 'main' of github.com:scitools/iris into parallel-concate…
bouweandela May 15, 2024
2540fea
Add test to show that NaNs are considered equal
bouweandela Jun 10, 2024
3bfea80
Merge branch 'main' into parallel-concatenate
bouweandela Jun 10, 2024
5870bcd
Merge branch 'main' of github.com:scitools/iris into parallel-concate…
bouweandela Jun 25, 2024
b59e3dc
Avoid inserting closures into the Dask graph
bouweandela Jun 25, 2024
e91175c
Merge branch 'main' into parallel-concatenate
bouweandela Jun 26, 2024
c2973ae
Fix type hints
bouweandela Jun 26, 2024
1633b70
Merge branch 'main' of github.com:scitools/iris into parallel-concate…
bouweandela Jul 3, 2024
8b44aa5
Compute numpy array hashes immediately
bouweandela Jul 3, 2024
134669d
Concatenate 25 cubes instead of 2
bouweandela Jul 3, 2024
8a41250
Improve test coverage
bouweandela Jul 3, 2024
861dd1b
Merge branch 'main' of github.com:scitools/iris into parallel-concate…
bouweandela Jul 10, 2024
125a0f6
Merge branch 'main' of github.com:scitools/iris into parallel-concate…
bouweandela Jul 22, 2024
0dcb323
Add whatsnew entry
bouweandela Jul 22, 2024
3f9a81c
Merge branch 'main' of github.com:scitools/iris into parallel-concate…
bouweandela Aug 14, 2024
470f28e
Various improvements from review
bouweandela Aug 14, 2024
2864bc7
Use correct value for chunks for numpy arrays
bouweandela Aug 14, 2024
2150535
Python 3.10 compatibility
bouweandela Aug 14, 2024
f481d55
Avoid creating derived coordinates multiple times
bouweandela Aug 15, 2024
c7d3d28
Merge branch 'main' of github.com:scitools/iris into parallel-concate…
bouweandela Aug 15, 2024
073b25e
Support comparing differently shaped arrays
bouweandela Sep 2, 2024
f2edc16
Merge branch 'main' of github.com:scitools/iris into parallel-concate…
bouweandela Sep 2, 2024
d71c008
Rewrite for code style without multiple returns
bouweandela Sep 2, 2024
44ddcda
Remove print call
bouweandela Sep 2, 2024
cf1ea3d
Better hashing algorithm
bouweandela Sep 3, 2024
1969d49
Add more information to release notes
bouweandela Sep 3, 2024
7948812
Merge branch 'main' of github.com:scitools/iris into parallel-concate…
bouweandela Sep 3, 2024
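Several commit messages above ("Compute numpy array hashes immediately", "Better hashing algorithm", "Add test to show that NaNs are considered equal", "Support comparing differently shaped arrays") suggest that the speed-up comes from comparing arrays by hash rather than element by element. The sketch below illustrates only that general idea and is not the PR's code. It assumes numeric data and uses SHA-256 from `hashlib` and a thread pool as stand-ins for whatever hashing and Dask scheduling Iris actually uses. The names `array_fingerprint` and `all_match_first` are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def array_fingerprint(values) -> tuple:
    """Summarise an array as (shape, digest) for cheap equality tests.

    Hypothetical helper for illustration; numeric data only.
    """
    # Casting to float64 lets integer and float arrays with the same values
    # match; integers beyond 2**53 may lose precision in the cast.
    arr = np.asarray(values, dtype=np.float64)
    nan_mask = np.isnan(arr)
    # Hash the NaN positions separately and replace NaNs by 0.0, so NaNs in
    # the same positions compare equal but a NaN never matches a real 0.0.
    # Adding 0.0 turns -0.0 into 0.0, so the two signed zeros hash equally.
    filled = np.where(nan_mask, 0.0, arr) + 0.0
    digest = hashlib.sha256()
    digest.update(nan_mask.tobytes())
    digest.update(filled.tobytes())
    # Including the shape means differently shaped arrays never match, even
    # if their flattened values happen to be identical.
    return arr.shape, digest.hexdigest()


def all_match_first(arrays, max_workers=None) -> bool:
    """Return True if every array has the same fingerprint as the first."""
    # Each array is hashed exactly once, concurrently; afterwards only the
    # short fingerprints are compared.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        fingerprints = list(pool.map(array_fingerprint, arrays))
    if not fingerprints:
        return True
    return all(fp == fingerprints[0] for fp in fingerprints[1:])


a = np.array([1.0, np.nan, 3.0])
print(all_match_first([a, a.copy(), np.array([1.0, np.nan, 3.0])]))
```

With this design, checking N arrays costs N hash computations plus N cheap tuple comparisons, and the hashing step parallelises naturally. Equal fingerprints imply equal values only up to the negligible chance of a SHA-256 collision.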
benchmarks/benchmarks/merge_concat.py (32 changes: 21 additions, 11 deletions)

```diff
@@ -4,9 +4,12 @@
 # See LICENSE in the root of the repository for full licensing details.
 """Benchmarks relating to :meth:`iris.cube.CubeList.merge` and ``concatenate``."""
 
+import warnings
+
 import numpy as np
 
 from iris.cube import CubeList
+from iris.warnings import IrisVagueMetadataWarning
 
 from .generate_data.stock import realistic_4d_w_everything
 
```
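The two new imports are used by the new `setup` to silence two kinds of warning while the test cubes are built. In `warnings.filterwarnings`, `message` is a regular expression matched case-insensitively against the *start* of the warning text, and `category` matches that class and its subclasses. The standard-library demonstration below uses `DemoMetadataWarning` as a hypothetical stand-in for `IrisVagueMetadataWarning`; the warning texts are made up.

```python
import warnings


class DemoMetadataWarning(UserWarning):
    """Hypothetical stand-in for iris.warnings.IrisVagueMetadataWarning."""


def surviving_warnings() -> list:
    """Emit a few warnings under the benchmark's filters; return those shown."""
    with warnings.catch_warnings(record=True) as caught:
        # Start from "show everything"; filters added afterwards take
        # precedence because filterwarnings() inserts at the front of the list.
        warnings.simplefilter("always")
        warnings.filterwarnings("ignore", message="Ignoring a datum")
        warnings.filterwarnings("ignore", category=DemoMetadataWarning)

        warnings.warn("Ignoring a datum in some coordinate")  # matches at start
        warnings.warn("IGNORING A DATUM again")  # match is case-insensitive
        warnings.warn("Note: Ignoring a datum")  # not at the start: kept
        warnings.warn("Vague metadata", DemoMetadataWarning)  # category match
        warnings.warn("Something else entirely")  # no filter matches: kept
    return [str(w.message) for w in caught]


print(surviving_warnings())
```

The benchmark itself calls `filterwarnings` without `catch_warnings`, so its filters stay active for the rest of the process.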
```diff
@@ -44,19 +47,26 @@ class Concatenate:
 
     cube_list: CubeList
 
-    def setup(self):
-        source_cube = realistic_4d_w_everything()
-        second_cube = source_cube.copy()
-        first_dim_coord = second_cube.coord(dimensions=0, dim_coords=True)
-        first_dim_coord.points = (
-            first_dim_coord.points + np.ptp(first_dim_coord.points) + 1
-        )
-        self.cube_list = CubeList([source_cube, second_cube])
-
-    def time_concatenate(self):
+    params = [[False, True]]
+    param_names = ["Lazy operations"]
+
+    def setup(self, lazy_run: bool):
+        warnings.filterwarnings("ignore", message="Ignoring a datum")
+        warnings.filterwarnings("ignore", category=IrisVagueMetadataWarning)
+        source_cube = realistic_4d_w_everything(lazy=lazy_run)
+        self.cube_list = CubeList([source_cube])
+        for _ in range(24):
+            next_cube = self.cube_list[-1].copy()
+            first_dim_coord = next_cube.coord(dimensions=0, dim_coords=True)
+            first_dim_coord.points = (
+                first_dim_coord.points + np.ptp(first_dim_coord.points) + 1
+            )
+            self.cube_list.append(next_cube)
+
+    def time_concatenate(self, _):
         _ = self.cube_list.concatenate_cube()
 
-    def tracemalloc_concatenate(self):
+    def tracemalloc_concatenate(self, _):
         _ = self.cube_list.concatenate_cube()
 
     tracemalloc_concatenate.number = 3  # type: ignore[attr-defined]
```
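The new `setup` builds 25 cubes: the original plus 24 copies. Each copy's first dimension coordinate is shifted by `np.ptp(points) + 1` relative to the previous cube. Because `ptp` is max minus min, for increasing points each shifted block starts at the previous maximum plus 1. The joined coordinate therefore stays strictly increasing, although not necessarily evenly spaced. The hypothetical helper `shifted_copies` below reproduces only that arithmetic on plain NumPy arrays, not the Iris cubes.

```python
import numpy as np


def shifted_copies(points, n_copies: int) -> list:
    """Hypothetical helper mirroring the offset arithmetic in the new setup()."""
    blocks = [np.asarray(points)]
    for _ in range(n_copies - 1):
        previous = blocks[-1]
        # np.ptp is max - min, so for increasing points this block starts at
        # previous.max() + 1 and does not overlap the previous block.
        blocks.append(previous + np.ptp(previous) + 1)
    return blocks


joined = np.concatenate(shifted_copies([0, 1, 2], 25))
print(joined.shape, bool(np.all(np.diff(joined) > 0)))
```

The class attributes `params = [[False, True]]` and `param_names` follow the airspeed velocity (asv) convention for parameterised benchmarks. Each benchmark runs once per value, and `setup` and the benchmark methods receive that value as an extra argument. This is why `time_concatenate` and `tracemalloc_concatenate` now accept an unused second parameter.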
docs/src/whatsnew/latest.rst (12 changes: 12 additions, 0 deletions)

```diff
@@ -57,6 +57,18 @@ This document explains the changes made to Iris for this release
    with cftime :class:`~cftime.datetime` objects can benefit from the same
    improvement by adding a type hint to their category function. (:pull:`5999`)
 
+#. `@bouweandela`_ made :meth:`iris.cube.CubeList.concatenate` faster when more
+   than two cubes are concatenated with equality checks enabled on the values
+   of auxiliary coordinates, derived coordinates, cell measures, or ancillary
+   variables.
+   In some cases, this may lead to higher memory use. This can be remedied by
+   reducing the number of Dask workers.
+   In rare cases, the new implementation could be slower. This may happen when
+   there are very many or very large auxiliary coordinates, derived
+   coordinates, cell measures, or ancillary variables to be checked that span
+   the concatenation axis. This can be avoided by disabling the relevant check
+   (for example ``check_aux_coords=False``). (:pull:`5926`)
+
 🔥 Deprecations
 ===============
 
```