RandomGeoSampler: fix performance regression #1968

Merged on Apr 16, 2024 (2 commits)

adamjstewart (Collaborator)

Fixes a bug I introduced all the way back in #477. Thanks @yichiac for noticing this!

We want to sample from a pixel-aligned grid whenever possible to avoid resampling in the average case. The changes in this PR should restore the behavior documented in our paper, where preprocessing greatly improves sampling performance. GridGeoSampler was not affected by this bug.
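To illustrate why pixel alignment matters, here is a minimal 1-D sketch. It assumes the offset is measured in CRS units from the edge of the bounding box and uses a fixed draw `u` in [0, 1) as a stand-in for `torch.rand(1).item()`. The helpers `old_offset`, `new_offset`, and `is_pixel_aligned` are hypothetical names for illustration, not torchgeo API. An offset that is an integer multiple of `res` lands on a pixel edge, so the window can be read without resampling.

```python
import math


def old_offset(u: float, width_px: float, res: float) -> float:
    """Old style: scale the draw continuously; generally NOT on a pixel edge."""
    return u * width_px * res


def new_offset(u: float, width_px: float, res: float) -> float:
    """New style: snap to a whole number of pixels, then convert to CRS units."""
    return int(u * width_px) * res


def is_pixel_aligned(offset: float, res: float, tol: float = 1e-9) -> bool:
    """True if offset is (numerically) an integer multiple of res."""
    n_pixels = offset / res
    return math.isclose(n_pixels, round(n_pixels), abs_tol=tol)


res = 10.0  # hypothetical pixel size in CRS units
u = 0.55    # stand-in for torch.rand(1).item()

# Old code floored the width to 7.0 pixels, then scaled continuously: ~38.5, unaligned
print(old_offset(u, 7.0, res), is_pixel_aligned(old_offset(u, 7.0, res), res))
# New code keeps 7.3 pixels but truncates the draw: 40.0, aligned
print(new_offset(u, 7.3, res), is_pixel_aligned(new_offset(u, 7.3, res), res))
```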

I still want to write an I/O benchmarking script, similar to the one we used in our original paper, to confirm that this works as advertised, so I will keep this PR as a draft for now.

adamjstewart added this to the 0.5.3 milestone on Mar 28, 2024
github-actions bot added the samplers label (Samplers for indexing datasets) on Mar 28, 2024
```diff
- height = (bounds.maxy - bounds.miny - t_size[0]) // res
+ # May be negative if bounding box is smaller than patch size
+ width = (bounds.maxx - bounds.minx - t_size[1]) / res
+ height = (bounds.maxy - bounds.miny - t_size[0]) / res
```
Collaborator Author

This is the width/height in pixel units. It no longer needs to be an integer; a float is fine too. We cast to an integer elsewhere.
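A small sketch of why a fractional width is enough. Flooring happens only when the scaled random draw is cast with `int()`, so every offset stays in `0..floor(width)` pixels and the patch never overruns the far edge. The extent, patch size, and `res` below are made-up values, `patch_width_px` is a hypothetical helper mirroring the diff's arithmetic, and a seeded `random.Random` stands in for `torch.rand(1).item()`.

```python
import math
import random


def patch_width_px(extent: float, patch: float, res: float) -> float:
    """Room left for the patch along one axis, in (possibly fractional) pixels."""
    return (extent - patch) / res


extent, patch, res = 105.0, 32.0, 10.0      # hypothetical sizes in CRS units
width = patch_width_px(extent, patch, res)  # 7.3 pixels; the old // gave 7.0

rng = random.Random(0)  # seeded stand-in for torch.rand(1).item()
offsets = {int(rng.random() * width) for _ in range(10_000)}

# int() truncates u * 7.3 (always < 7.3) to a whole pixel count in 0..7,
# so the patch never extends past the far edge of the extent.
assert all(0 <= o <= math.floor(width) for o in offsets)
assert all(o * res + patch <= extent for o in offsets)
print(width, sorted(offsets))
```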


```diff
  minx = bounds.minx
  miny = bounds.miny

- # random.randrange crashes for inputs <= 0
```
Collaborator Author

This is a dead comment: we no longer use random.randrange, so non-positive inputs can no longer cause a crash.
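For context on the removed comment: Python's `random.randrange(stop)` raises `ValueError` when `stop <= 0`, which is why the old code needed a guard. Truncating a scaled uniform draw accepts any width. In this minimal demonstration, `randrange_offset` and `scaled_offset` are hypothetical helpers, and `random.random()` stands in for `torch.rand(1).item()`.

```python
import random


def randrange_offset(width_px: int) -> int:
    """Old-style integer draw: raises ValueError for width_px <= 0."""
    return random.randrange(width_px)


def scaled_offset(width_px: float) -> int:
    """Scaled uniform draw: defined for any width, including zero or negative."""
    return int(random.random() * width_px)


results = {}
for w in (3, 0, -2):
    try:
        randrange_offset(w)
        results[w] = "ok"
    except ValueError:
        results[w] = "ValueError"
    # The scaled draw never raises, whatever the sign of w
    scaled_offset(w)

print(results)
```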


```diff
- # random.randrange crashes for inputs <= 0
- if width > 0:
```
Collaborator Author

We no longer need to guard against negative numbers: for bounding boxes smaller than the patch size, it doesn't matter if the sample starts outside the bounds of the image.
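To see why a sample starting outside the image is harmless, consider a hypothetical 1-D bounding box narrower than the patch. `width` is negative, `int()` truncates the scaled draw toward zero, and the resulting start is either `bounds.minx` itself or a whole pixel to its left. In this sketch (made-up numbers, not torchgeo code), the patch still covers the whole box.

```python
import random

res = 10.0
box_minx, box_maxx = 0.0, 20.0  # bounding box narrower than the patch
patch = 32.0                    # patch size in CRS units

width = (box_maxx - box_minx - patch) / res  # -1.2 pixels
rng = random.Random(0)  # seeded stand-in for torch.rand(1).item()
starts = set()
for _ in range(1_000):
    # u * -1.2 lies in (-1.2, 0]; int() truncates it to -1 or 0 pixels
    minx = box_minx + int(rng.random() * width) * res
    starts.add(minx)
    # The patch [minx, minx + patch] still spans the whole box
    assert minx <= box_minx and minx + patch >= box_maxx

print(width, sorted(starts))
```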

```diff
- if height > 0:
-     miny += torch.rand(1).item() * height * res
+ # Use an integer multiple of res to avoid resampling
+ minx += int(torch.rand(1).item() * width) * res
```
Collaborator Author

This is the only really important line.
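Putting the pieces of this diff together, a minimal sketch of the random bounding-box draw might look like the following. It uses a hypothetical `Box` named tuple in place of torchgeo's `BoundingBox`, takes `size` as `(height, width)` in CRS units like `t_size` above, and uses `random.Random.random()` instead of `torch.rand(1).item()`. It is an illustration of the technique, not the library's actual function.

```python
import random
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical 2-D stand-in for torchgeo's BoundingBox."""
    minx: float
    maxx: float
    miny: float
    maxy: float


def random_aligned_box(bounds: Box, size: tuple[float, float], res: float,
                       rng: random.Random) -> Box:
    """Draw a patch whose origin is a whole number of pixels from bounds' corner."""
    # Room left for the patch, in pixel units; may be negative if bounds < size
    width = (bounds.maxx - bounds.minx - size[1]) / res
    height = (bounds.maxy - bounds.miny - size[0]) / res

    # Use an integer multiple of res to avoid resampling
    minx = bounds.minx + int(rng.random() * width) * res
    miny = bounds.miny + int(rng.random() * height) * res

    return Box(minx, minx + size[1], miny, miny + size[0])


rng = random.Random(42)
bounds = Box(minx=500.0, maxx=1605.0, miny=200.0, maxy=900.0)
print(random_aligned_box(bounds, size=(256.0, 256.0), res=10.0, rng=rng))
```

Here `width` is 84.9 pixels and `height` is 44.4 pixels, so after truncation every sampled patch starts on the 10-unit pixel grid and stays inside `bounds`.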

adamjstewart mentioned this pull request on Apr 1, 2024
adamjstewart changed the title from "RandomGeoSampler: improve performance" to "RandomGeoSampler: fix performance regression" on Apr 3, 2024
Collaborator Author

adamjstewart commented Apr 4, 2024

Evaluation using #1972:

| | raw (random) | raw (grid) | preprocessed (random) | preprocessed (grid) |
|---|---|---|---|---|
| before | 17.223 | 10.974 | 15.685 | 4.6075 |
| after | 17.360 | 11.032 | 9.613 | 4.6673 |

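A roughly 60% speedup for RandomGeoSampler when the data is preprocessed (same CRS and resolution, target-aligned pixels (TAP), and Cloud Optimized GeoTIFF (COG) format). The other columns are essentially unchanged.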
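The headline figure can be recomputed from the table. This sketch assumes the numbers are run times (lower is better), which matches "before" being larger than "after" for the improved case. It reports both the speedup ratio and the fraction of time saved, since "60% speedup" reads as about 1.6x throughput, which is the same as roughly 39% less time.

```python
before = {"raw (random)": 17.223, "raw (grid)": 10.974,
          "preprocessed (random)": 15.685, "preprocessed (grid)": 4.6075}
after = {"raw (random)": 17.360, "raw (grid)": 11.032,
         "preprocessed (random)": 9.613, "preprocessed (grid)": 4.6673}

for key in before:
    speedup = before[key] / after[key]         # > 1 means faster after the fix
    time_saved = 1 - after[key] / before[key]  # fraction of run time saved
    print(f"{key:>22}: {speedup:.2f}x, {time_saved:+.1%} time saved")
```

For the preprocessed (random) column this gives 1.63x and +38.7% time saved; the other three ratios stay within about 1.3% of 1.0.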

adamjstewart marked this pull request as ready for review on April 4, 2024 19:48
adamjstewart merged commit 925b93f into microsoft:main on Apr 16, 2024 (15 checks passed)
adamjstewart deleted the fixes/sampler-speed branch on April 16, 2024 07:39
adamjstewart modified the milestones: 0.5.3, 0.6.0 on Aug 6, 2024
Labels
samplers Samplers for indexing datasets
2 participants