SpaceNet: add SpaceNet 8, radiant mlhub -> aws #2203
Conversation
Not every chip has the same dimensions. Using the following command:

```
find <dir> -name '*.tif' | xargs file | tr -s ' ' | cut -d ' ' -f 7,11 | sort | uniq -c | sort -nr | tr -s ' ' | sed 's/^/*/g'
```

SpaceNet 1
RGB:
MSI:

SpaceNet 2
MSI:
Pansharpened:

SpaceNet 3
MSI:
Pansharpened:

SpaceNet 4
MSI:
Pansharpened:

SpaceNet 5
MSI:
Pansharpened:

SpaceNet 6

SpaceNet 7

SpaceNet 8
For some reason the `file` approach didn't work for SpaceNet 8, so I wrote:

```python
import glob
import os

import rasterio as rio

for p in glob.iglob(os.path.join('SN8_floods', '*', '*', '*.tif')):
    with rio.open(p) as f:
        print(f'height={f.height}, width={f.width}')
```

and ran:

```
python3 test.py | sort | uniq -c | sort -nr | tr -s ' ' | sed 's/^/*/g'
```
Previously, we always chose the smallest dimensions and indexed into the array, since the sizes were never off by more than 1. However, with SpaceNet 8 being so drastically different, I think we instead need to resample the images.
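To illustrate the idea only: resample every chip to a common target shape so samples can be stacked into a batch. The nearest-neighbour helper below is a hypothetical plain-NumPy sketch, not the PR's implementation; the real fix would more likely use rasterio's `out_shape` and `Resampling` options when reading.

```python
import numpy as np


def resample_nearest(chip: np.ndarray, height: int, width: int) -> np.ndarray:
    """Resample a (bands, h, w) array to (bands, height, width) by nearest neighbour.

    Hypothetical sketch; the target size would be chosen per dataset.
    """
    bands, h, w = chip.shape
    # Map each output row/column back to its nearest source row/column.
    rows = (np.arange(height) * h / height).astype(int)
    cols = (np.arange(width) * w / width).astype(int)
    return chip[:, rows[:, None], cols[None, :]]


# Chips of slightly different sizes all end up with matching shapes:
a = resample_nearest(np.zeros((3, 101, 110)), 100, 100)
b = resample_nearest(np.zeros((3, 1300, 1300)), 100, 100)
```

With a common shape, off-by-one chips and the drastically different SpaceNet 8 chips can share one code path instead of the old "index into the smallest" workaround.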
I've been using the following script to test this on the real data (all 1.2 TB of it):

```python
#!/usr/bin/env python3
"""Test all SpaceNet datasets."""

import itertools

from matplotlib import pyplot as plt
from torch.utils.data import DataLoader

import torchgeo.datasets
from torchgeo.datasets import SpaceNet


def test_dataset(ds: SpaceNet) -> None:
    """Test a single dataset."""
    print(ds.split, ds.aois[0], ds.image, ds.mask, len(ds))
    sample = ds[0]
    ds.plot(sample)
    plt.close()
    dl = DataLoader(ds, batch_size=8, shuffle=True)
    next(iter(dl))


for i in range(1, 9):
    print(f'SpaceNet {i}')
    SpaceNetX = getattr(torchgeo.datasets, f'SpaceNet{i}')
    for split in ['train', 'test']:
        for aoi, image, mask in itertools.product(
            SpaceNetX.valid_aois[split],
            SpaceNetX.valid_images[split],
            SpaceNetX.valid_masks,
        ):
            ds = SpaceNetX('data', split=split, aois=[aoi], image=image, mask=mask)
            test_dataset(ds)
```
Hard to review with all the changes to spacenet.py, but I love that this removes our dependency on radiant-mlhub for 0.6. My biggest question is: does it work? If you run a datamodule through all train/val/test batches for all SpaceNets, do you hit any errors?
See #2203 (comment). It doesn't test the entire epoch, just a single batch. The entire epoch for all combinations would take a couple of days. We don't yet have data modules for all datasets, only SpaceNet 1.
Can we just run it for a couple of days to verify if we have it all downloaded? |
I don't see a huge benefit over random sampling of mini-batches, but if you want to run it you can.
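The trade-off can be sketched framework-agnostically (the helper below is hypothetical, not part of the PR or torchgeo): indexing a few random samples still exercises `__getitem__`, which is where most loading errors surface, without paying for a full epoch.

```python
import random


def spot_check(dataset, n_samples: int = 8, seed: int = 0) -> list:
    """Load a random subset of samples; any indexing/IO error surfaces here.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    idxs = rng.sample(range(len(dataset)), min(n_samples, len(dataset)))
    for i in idxs:
        _ = dataset[i]  # loading alone catches shape and decoding problems
    return idxs


# A toy sequence stands in for a SpaceNet dataset here:
checked = spot_check(list(range(100)), n_samples=8)
```

A fixed seed makes a failing spot check reproducible, which matters when a dataset has thousands of chips and only a handful are malformed.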
Was able to run train/val/test on SpaceNet1 👍
This PR includes a number of improvements:
There are a few other peculiarities of these datasets that still need to be worked out:
- SpaceNet 3, train, AOIs 2–4: some images are missing masks
- SpaceNet 7, train, `mask='labels'`: masks for both Buildings and UDM
- SpaceNet 8, train, `image='POST-event'`: some annotations have multiple images

Closes #1830