
datasets.SEN12MS - dB values are cast to int32 #500

Closed
khdlr opened this issue Apr 6, 2022 · 2 comments · Fixed by #502
Labels
datasets Geospatial or benchmark datasets
Milestone
0.2.2

Comments

@khdlr
Contributor

khdlr commented Apr 6, 2022

When loading samples from the SEN12MS dataset, the Sentinel-1 dB values (floats ranging from roughly -30 to 0) are cast to int32, which truncates their fractional part and discards a lot of important information.

This line seems to be the culprit.
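The loss is easy to reproduce with a minimal NumPy sketch. The backscatter values below are made up, not real SEN12MS pixels. Casting float32 to int32 truncates toward zero, so each value can be off by up to just under 1 dB, and the cast cannot be undone.

```python
import numpy as np

# Made-up Sentinel-1 backscatter values in dB (float32), roughly in [-30, 0]
s1_db = np.array([-12.7, -0.4, -25.93], dtype=np.float32)

# Casting to int32 truncates toward zero and drops the fractional dB
s1_int = s1_db.astype(np.int32)
print(s1_int.tolist())  # [-12, 0, -25]

# Largest per-pixel error introduced by the cast (below 1 dB, but unrecoverable)
print(np.abs(s1_db - s1_int).max())
```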

Fixing this would require either breaking the current behaviour, where S1 and S2 imagery are stacked into a single tensor, or casting everything to float32 (not sure if this is okay for S2 data).
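On the float32 question for S2: float32 has a 24-bit significand, so every integer up to 2**24 is exactly representable, which covers the whole uint16 range (0 to 65535). A quick round-trip check over every possible uint16 value, sketched with NumPy:

```python
import numpy as np

# Every possible uint16 value, the dtype the Sentinel-2 GeoTIFFs use
all_u16 = np.arange(2**16, dtype=np.uint16)

# Cast to float32 and back; float32 represents all integers up to 2**24 exactly
round_trip = all_u16.astype(np.float32).astype(np.uint16)

print(np.array_equal(round_trip, all_u16))  # True
```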

@adamjstewart adamjstewart added this to the 0.2.2 milestone Apr 6, 2022
@adamjstewart
Collaborator

I vote for casting everything to float32. I think PyTorch will automatically do this for us, so all you have to do is remove the cast to int32. Want to open a PR?
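PyTorch's type promotion does cover this case: mixing a float32 tensor with an int32 tensor yields float32. The following small sketch uses made-up values and assumes a PyTorch release where `torch.cat` applies type promotion, as current releases do.

```python
import torch

# Made-up S1 dB values (float32) and S2 digital numbers already widened to int32
s1 = torch.tensor([-12.7, -0.4], dtype=torch.float32)
s2 = torch.tensor([1234, 65535], dtype=torch.int32)

# torch.result_type reports the promoted dtype; torch.cat follows the same rule
print(torch.result_type(s1, s2))  # torch.float32
stacked = torch.cat([s1, s2])
print(stacked.dtype)  # torch.float32
```

Unlike NumPy, which promotes float32 with int32 to float64, PyTorch keeps the floating-point operand's dtype here, so the stacked tensor stays float32.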

@adamjstewart adamjstewart added the datasets Geospatial or benchmark datasets label Apr 6, 2022
@khdlr
Contributor Author

khdlr commented Apr 6, 2022

Sure, I'm happy to open a PR 😊

I believe the reason for the cast is that the Sentinel-2 imagery comes as uint16 data, which torch does not support. In general, the GeoTIFFs have the following datatypes:

Sentinel-1: float32
Sentinel-2: uint16
Label:      uint8

My current workaround is to just cast the uint16 to int32 and leave the others as they are. As you said, PyTorch will automatically cast the result to float32 when stacking.
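The workaround could look roughly like the sketch below. `stack_sen12ms_bands` is a hypothetical helper, not TorchGeo's actual code, and the arrays are placeholders for one small patch. Widening uint16 to int32 is lossless because int32 covers 0 to 65535. It also avoids handing a uint16 array to `torch.from_numpy`, which the PyTorch releases of the time did not accept.

```python
import numpy as np
import torch


def stack_sen12ms_bands(s1: np.ndarray, s2: np.ndarray) -> torch.Tensor:
    """Hypothetical helper: stack S1 (float32 dB) and S2 (uint16) band arrays,
    each shaped (bands, height, width), into one tensor along the band axis."""
    s1_t = torch.from_numpy(s1)  # float32, left as-is so dB decimals survive
    # uint16 -> int32 is lossless; only the S2 array is widened
    s2_t = torch.from_numpy(s2.astype(np.int32))
    # Type promotion in torch.cat turns the float32 + int32 mix into float32
    return torch.cat([s1_t, s2_t], dim=0)


# Placeholder arrays standing in for one 4x4 patch (2 S1 bands, 13 S2 bands)
s1 = np.full((2, 4, 4), -12.5, dtype=np.float32)
s2 = np.full((13, 4, 4), 65535, dtype=np.uint16)
sample = stack_sen12ms_bands(s1, s2)
print(sample.shape, sample.dtype)  # torch.Size([15, 4, 4]) torch.float32
```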

Also not sure about the labels – but I don't believe having them as int32 is that useful.
