Indexing benchmarking #1851
Conversation
@@ -34,6 +34,9 @@ nosetests.xml
 .cache
 .ropeproject/

+# asv environments
+.asv
Is there any problem if `.asv` is added to `.gitignore`? A bunch of files in this directory makes my GUI git client freeze.
asv_bench/benchmarks/__init__.py
Outdated
@@ -29,3 +29,12 @@ def randn(shape, frac_nan=None, chunks=None):
     x.flat[inds] = np.nan

     return x

+
+def randint(low, high=None, size=None, frac_minus=None):
Can we use a seed for all these random numbers? That should decrease the variance for these test results.
See line 9.
jhamman was careful enough :)
A global seed for all benchmarks isn't a great idea -- it means that results will vary depending upon whether we run the full benchmark suite or not, and depending upon the order in which benchmark tests are run. It would be better to set a seed for each test.
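A minimal sketch of the per-test seeding being suggested (class and attribute names are invented, not the PR's code): seeding inside each benchmark's `setup()` makes the data deterministic regardless of which other benchmarks run, or in what order asv executes them.

```python
import numpy as np


class Indexing:
    """Hypothetical asv benchmark illustrating a per-benchmark seed."""

    def setup(self):
        # Seed here, not at module level, so this benchmark's data does
        # not depend on how many random draws other benchmarks made first.
        np.random.seed(0)
        self.data = np.random.randn(100, 100)

    def time_basic_indexing(self):
        self.data[::2, 10:60]
```

With this pattern, running the benchmark alone or as part of the full suite operates on identical data.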
> it means that results will vary depending upon whether we run the full benchmark suite or not, and depending upon the order in which benchmark tests are run.
Thanks for the details.
Understood.
Done.
asv_bench/benchmarks/indexing.py
Outdated
def time_indexing_basic():
    for ind in basic_indexes:
Rather than a loop, another option would be to separate these cases into separate benchmarks. That would give more granular information.
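A hedged sketch of that suggestion, using plain NumPy and made-up index cases (the real suite indexes an xarray Dataset): each case becomes its own `time_*` method, so asv reports one timing per indexing pattern instead of a single aggregate over the loop.

```python
import numpy as np

# Hypothetical index cases, standing in for the PR's basic_indexes.
basic_indexes = {
    '1slice': (slice(0, 3),),
    '1slice-1scalar': (0, slice(None, None, 3)),
}


class BasicIndexing:
    def setup(self):
        np.random.seed(0)
        self.arr = np.random.randn(10, 10)

    # One benchmark per case: asv times each pattern separately.
    def time_1slice(self):
        self.arr[basic_indexes['1slice']]

    def time_1slice_1scalar(self):
        self.arr[basic_indexes['1slice-1scalar']]
```

The trade-off is some repetition in exchange for granular results; asv's `params` machinery is another way to get per-case timings without duplicating methods.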
asv_bench/benchmarks/indexing.py
Outdated
try:
    ds_dask = ds.chunk({'x': 100, 'y': 50, 't': 50})
Another option is to use a subclass, like I did in #1847
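One way the subclass pattern could look (a sketch with invented names, not the actual code from #1847): the dask variant inherits every `time_*` method and overrides only `setup`. Raising `NotImplementedError` in `setup` is asv's convention for skipping a benchmark when a dependency is missing.

```python
import numpy as np


class Base:
    # Hypothetical benchmark on an in-memory array.
    def setup(self):
        np.random.seed(0)
        self.ds = np.random.randn(100, 100)

    def time_indexing(self):
        self.ds[::3, 5:50]


class BaseDask(Base):
    # Inherits time_indexing unchanged; only the data setup differs.
    def setup(self):
        super().setup()
        try:
            import dask.array as da
        except ImportError:
            # asv treats NotImplementedError in setup as "skip this benchmark"
            raise NotImplementedError('dask is not installed')
        self.ds = da.from_array(self.ds, chunks=50)
```

This avoids the module-level `try`/`except` and keeps the numpy and dask variants from drifting apart.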
asv_bench/benchmarks/indexing.py
Outdated
def time_indexing_basic_dask():
    for ind in basic_indexes:
        ds_dask.isel(**ind)
It's probably a good idea to call `.load()` on these to load them into numpy. Otherwise it only measures the time to construct the dask graph, not to evaluate it. Usually evaluation time dominates.
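The distinction can be sketched with dask arrays directly (hypothetical sizes; in the xarray benchmarks, `.load()` plays the role that `.compute()` plays here):

```python
import numpy as np

try:
    import dask.array as da
except ImportError:  # dask may not be installed
    da = None


def time_indexing_dask_sketch():
    x = da.from_array(np.random.randn(100, 100), chunks=50)
    lazy = x[::2, 10:60]      # cheap: only builds the task graph
    result = lazy.compute()   # expensive: actually evaluates the indexing
    return result
```

Without the `.compute()` (or xarray's `.load()`), the benchmark would time only the graph-construction step, which is typically a small fraction of the total cost.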
Thanks. All done.
It might be another issue, but I think we can have a clear link to our benchmark result page.
@fujiisoup - I don't see any problem including a link in the docs (probably in the testing section) and/or in the readme. It's maybe too soon to include in the PR template since we just don't have very good coverage yet.
@fujiisoup - glad to see the ASV setup getting used. Everything here looks good to me.
Just added some benchmarks for basic, outer, and vectorized indexing and assignments.