Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase speed of Delta encoding using np.subtract #584

Merged
merged 4 commits into from
Oct 10, 2024

Conversation

ehgus
Copy link
Contributor

@ehgus ehgus commented Sep 27, 2024

This PR accelerated Delta encoding by replacing np.diff into np.subtract.
Using inplace operation of np.subtract reduce array creation time.

Here is a simple benchmark:

from numcodecs import Delta
from numcodecs.compat import ensure_ndarray
import numpy as np
import time

# new codec
class CustomDelta(Delta):
    def encode(self, buf):
        arr = ensure_ndarray(buf).view(self.dtype)
        arr = arr.reshape(-1, order='A')
        enc = np.empty_like(arr, dtype=self.astype)
        enc[0] = arr[0]

        # only difference from Delta!!
        np.subtract(arr[1:], arr[0:-1], out = enc[1:])

        return enc

data_length = int(1e9)
x = np.random.randint(0,10000,(data_length,),dtype='i4')

# benchmark
base_codec = Delta(dtype='i4')
new_codec = CustomDelta(dtype='i4')
for codec in (base_codec, new_codec):
    time_start = time.perf_counter()
    for _ in range(10):
        codec.encode(x)
    time_end = time.perf_counter()
    benchmark_time = (time_end-time_start)/10
    print(f'time of {str(codec)} = {benchmark_time:.6f}')
# time of Delta(dtype='<i4') = 1.510299
# time of CustomDelta(dtype='<i4') = 0.953429

assert np.all(base_codec.encode(x) == new_codec.encode(x)), 'They should be equal'

TODO:

  • Unit tests and/or doctests in docstrings
  • Tests pass locally
  • Changes documented in docs/release.rst
  • Docs build locally
  • GitHub Actions CI passes
  • Test coverage to 100% (Codecov passes)

Copy link

codecov bot commented Sep 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.91%. Comparing base (e836f39) to head (61e4298).
Report is 62 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #584   +/-   ##
=======================================
  Coverage   99.91%   99.91%           
=======================================
  Files          59       59           
  Lines        2332     2332           
=======================================
  Hits         2330     2330           
  Misses          2        2           
Files with missing lines Coverage Δ
numcodecs/delta.py 100.00% <100.00%> (ø)

@dstansby
Copy link
Contributor

dstansby commented Oct 9, 2024

Looks good to me 👍 - could you just leave a comment above the line you changed explaining that you're using subtract for speed, otherwise someone might come along later and switch it back 🙈

@ehgus
Copy link
Contributor Author

ehgus commented Oct 10, 2024

Sure! I will leave a comment.

@dstansby dstansby enabled auto-merge (squash) October 10, 2024 07:49
@dstansby dstansby merged commit 9518e0b into zarr-developers:main Oct 10, 2024
30 checks passed
@jakirkham
Copy link
Member

Thanks Lee and David! 🙏

This is a good improvement. Always good to avoid unneeded array creation and copies 🙂

Noticed that bool needs a bit of extra handling. This wasn't tested previously. Idk to what extent it is used (maybe Delta + PackBits is effective in some cases?). Still it seems like a good idea to keep that case working given it is low effort. Made a small tweak to this effect with a test in PR: #595

@martindurant martindurant mentioned this pull request Oct 17, 2024
7 tasks
@bnavigator
Copy link
Contributor

Breaks zarr: #653

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants