SpaceNet: add SpaceNet 8, radiant mlhub -> aws #2203

adamjstewart · 2024-07-31T14:23:37Z

This PR includes a number of improvements:

Port from Radiant MLHub to AWS (Migrate from Radiant MLHub to Source Cooperative #1830)
Add SpaceNet 8
Add support for the test split
Add support for choosing a mask
Add support for choosing multiple image products?
Add support for choosing multiple mask products?
Testing

There are a few other peculiarities of these datasets that still need to be worked out:

~~SpaceNet 3, train, AOIs 2–4: some images are missing masks~~
SpaceNet 4: images for 27 different off-nadir angles
SpaceNet 7: should this be formulated as time-series?
SpaceNet 7, mask='labels_match_pix': weird reprojection bug
~~SpaceNet 7, train, mask='labels': masks for both Buildings and UDM~~
SpaceNet 8: which AOI is 12 and which is 13?
SpaceNet 8: should this be formulated as change detection?
~~SpaceNet 8, train, image='POST-event': some annotations have multiple images~~

adamjstewart · 2024-08-03T11:14:25Z

Not every chip has the same dimensions. I tallied them with the following command:

> find <dir> -name '*.tif' | xargs file | tr -s ' ' | cut -d ' ' -f 7,11 | sort | uniq -c | sort -nr | tr -s ' ' | sed 's/^/*/g'

SpaceNet 1

RGB:

3372 height=406, width=439
3096 height=406, width=438
1719 height=407, width=439
1548 height=407, width=440

MSI:

8702 height=102, width=110
1033 height=101, width=110

SpaceNet 2

MSI:

12544 height=163, width=163
1529 height=162, width=162
46 height=162, width=163

Pansharpened:

42357 height=650, width=650

SpaceNet 3

MSI:

3708 height=325, width=325

Pansharpened:

11124 height=1300, width=1300

SpaceNet 4

MSI:

29655 height=225, width=225

Pansharpened:

59310 height=900, width=900

SpaceNet 5

MSI:

2588 height=325, width=325

Pansharpened:

7764 height=1300, width=1300

SpaceNet 6

5462 height=900, width=900

SpaceNet 7

2488 height=1024, width=1024
516 height=1023, width=1024
410 height=1024, width=1023
99 height=1023, width=1023

SpaceNet 8

For some reason the file command does not output dimensions for about half of the files (all 1300x1300), so I used the following script:

import glob
import os
import rasterio as rio

for p in glob.iglob(os.path.join('SN8_floods', '*', '*', '*.tif')):
    with rio.open(p) as f:
        print(f'height={f.height}, width={f.width}')

and ran:

> python3 test.py | sort | uniq -c | sort -nr | tr -s ' ' | sed 's/^/*/g'

1207 height=1300, width=1300
258 height=961, width=961
240 height=835, width=835
147 height=916, width=916
100 height=1114, width=1114
99 height=1048, width=1048
90 height=786, width=786
59 height=786, width=785
44 height=834, width=835
42 height=835, width=834
34 height=785, width=786
31 height=1743, width=1743
25 height=916, width=915
21 height=1742, width=1743
20 height=915, width=916
19 height=785, width=785
17 height=1049, width=1048
17 height=1048, width=1049
13 height=961, width=962
13 height=961, width=748
11 height=1743, width=1742
11 height=1114, width=1113
11 height=1113, width=1114
10 height=1742, width=1742
8 height=962, width=961
7 height=834, width=834
5 height=1037, width=1743
4 height=1048, width=715
3 height=915, width=915
3 height=1114, width=765
2 height=1049, width=1049
2 height=1037, width=1742
1 height=962, width=962
1 height=1113, width=765
1 height=1113, width=1113
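
The same tally can be built in Python instead of the `sort | uniq -c | sort -nr` pipeline. This is only a minimal sketch, assuming input lines of the form `height=H, width=W` as printed by the script above; `tally_dimensions` is a hypothetical helper name, not part of the PR.

```python
import re
from collections import Counter


def tally_dimensions(lines):
    """Count how often each (height, width) pair occurs, most common first."""
    pattern = re.compile(r'height=(\d+), width=(\d+)')
    counts = Counter()
    for line in lines:
        match = pattern.search(line)
        if match:  # skip lines that carry no dimensions
            counts[(int(match[1]), int(match[2]))] += 1
    return counts.most_common()


if __name__ == '__main__':
    import sys

    # Usage: python3 test.py | python3 tally.py
    for (height, width), count in tally_dimensions(sys.stdin):
        print(f'{count} height={height}, width={width}')
```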

Previously, we always chose the smallest dimensions and indexed into the array since they were never off by more than 1. However, with SpaceNet 8 being so drastically different, I think we instead need to resample the images.
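
To make the two options concrete, here is a rough sketch of both strategies on plain NumPy arrays. It is not the code in this PR: the function names and the 1300x1300 target size are assumptions for illustration, and nearest-neighbor lookup is just one possible resampling method (chosen here because it keeps integer mask labels intact).

```python
import numpy as np


def crop_to_common(image: np.ndarray, mask: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Old approach: index both arrays down to the smallest shared height and width."""
    height = min(image.shape[-2], mask.shape[-2])
    width = min(image.shape[-1], mask.shape[-1])
    return image[..., :height, :width], mask[..., :height, :width]


def resample_nearest(array: np.ndarray, size: tuple[int, int]) -> np.ndarray:
    """Resample the last two axes to `size` using nearest-neighbor lookup."""
    in_height, in_width = array.shape[-2:]
    out_height, out_width = size
    # Map every output row/column back to a source row/column.
    rows = np.arange(out_height) * in_height // out_height
    cols = np.arange(out_width) * in_width // out_width
    return array[..., rows[:, None], cols[None, :]]


# Off-by-one chips (e.g. SpaceNet 1) lose at most a row or column when cropped.
image, mask = crop_to_common(np.zeros((3, 406, 439)), np.zeros((407, 438)))
print(image.shape, mask.shape)  # (3, 406, 438) (406, 438)

# A 961x748 chip (one of the SpaceNet 8 sizes) is resampled to a fixed size instead.
chip = np.zeros((3, 961, 748))
print(resample_nearest(chip, (1300, 1300)).shape)  # (3, 1300, 1300)
```

Cropping is lossless but only makes sense when sizes differ by a pixel or so; resampling gives every sample the same shape at the cost of distorting chips whose aspect ratio differs from the target.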

adamjstewart · 2024-08-05T20:28:02Z

I've been using the following script to test this on the real data (all 1.2 TB of it):

#!/usr/bin/env python3

"""Test all SpaceNet datasets."""

import itertools

from matplotlib import pyplot as plt
from torch.utils.data import DataLoader

import torchgeo.datasets
from torchgeo.datasets import SpaceNet


def test_dataset(ds: SpaceNet) -> None:
    """Test a single dataset."""
    print(ds.split, ds.aois[0], ds.image, ds.mask, len(ds))
    sample = ds[0]
    ds.plot(sample)
    plt.close()
    dl = DataLoader(ds, batch_size=8, shuffle=True)
    next(iter(dl))


for i in range(1, 9):
    print(f'SpaceNet {i}')
    SpaceNetX = getattr(torchgeo.datasets, f'SpaceNet{i}')
    for split in ['train', 'test']:
        for aoi, image, mask in itertools.product(
            SpaceNetX.valid_aois[split],
            SpaceNetX.valid_images[split],
            SpaceNetX.valid_masks,
        ):
            ds = SpaceNetX('data', split=split, aois=[aoi], image=image, mask=mask)
            test_dataset(ds)

torchgeo/datasets/spacenet.py

calebrob6 · 2024-08-13T17:41:46Z

Hard to review with all the changes to spacenet.py, but I love that this removes our dependency on radiant-mlhub for 0.6. My biggest question is "does it work?" If you run a datamodule through all train/val/test batches for all SpaceNets, do you hit any errors?

adamjstewart · 2024-08-13T17:44:29Z

> Hard to review with all the changes to spacenet.py, but I love that this removes our dependency on radiant-mlhub for 0.6. My biggest question is "does it work?" If you run a datamodule through all train/val/test batches for all SpaceNets, do you hit any errors?

See #2203 (comment). That script doesn't test an entire epoch, just a single batch. Running an entire epoch for all combinations would take a couple of days. We don't yet have data modules for all datasets, only SpaceNet1.

calebrob6 · 2024-08-13T17:46:07Z

Can we just run it for a couple of days to verify if we have it all downloaded?

adamjstewart · 2024-08-13T17:49:32Z

I don't see a huge benefit over random sampling of mini-batches, but if you want to run it you can.

ashnair1

Was able to run train/val/test on SpaceNet1 👍

adamjstewart added the backwards-incompatible ("Changes that are not backwards compatible") label Jul 31, 2024

adamjstewart added this to the 0.6.0 milestone Jul 31, 2024

adamjstewart mentioned this pull request Jul 31, 2024

Migrate from Radiant MLHub to Source Cooperative #1830

Closed

10 tasks

adamjstewart marked this pull request as draft July 31, 2024 14:24

github-actions bot added the datasets ("Geospatial or benchmark datasets") label Jul 31, 2024

adamjstewart requested a review from ashnair1 July 31, 2024 17:05

github-actions bot added the documentation ("Improvements or additions to documentation") label Aug 1, 2024

adamjstewart changed the title ~~SpaceNet: radiant mlhub -> aws~~ SpaceNet: add SpaceNet 8, radiant mlhub -> aws Aug 2, 2024

adamjstewart marked this pull request as ready for review August 5, 2024 20:23

adamjstewart force-pushed the datasets/spacenet branch from c6df446 to 9641f42 August 6, 2024 11:52

github-actions bot added the testing ("Continuous integration testing") and dependencies ("Packaging and dependencies") labels Aug 6, 2024

calebrob6 reviewed Aug 13, 2024

adamjstewart force-pushed the datasets/spacenet branch from 83fc2fc to ae75976 August 17, 2024 08:26

adamjstewart added 10 commits August 17, 2024 17:03

SpaceNet: radiant mlhub -> source coop

2618721

Add tarballs and md5s

0ccd5b1

Add directory mapping function

f096977

Need a URL to monkeypatch

05e3b35

Use directory glob instead

e650cb0

Fix URL f-string

5ab74a4

Fix _list_files

6ab9cc6

Add SpaceNet 8

d669fc4

Add docs

cc59bf5

Add masks

e5a5bbb

adamjstewart added 26 commits August 17, 2024 17:03

Split images and masks

9123a32

Fix __getitem__

3d526e9

Fix docutils URL

0a08bf2

Resample images to the same size

76945dd

Fix dtypes

eb6ed76

Fix SpaceNet 3

a507549

Fix SN7 and SN8 length

82b89f0

Fix one of reprojection bugs

c8238a3

Fix download

37bc859

Fix empty geometries

4c9f386

Remove radiant-mlhub

bcd8a33

Mock aws CLI

e076e71

Update tests

0b4a342

Fix mypy

85c6b27

Fix datamodule tests

e575391

Ruff

f748b01

Increase coverage

cb603c7

Fix empty geojsons and buggy reprojections

98c752f

Complete coverage

d51b635

Mask must be long

7f68a4a

Fix support for older fiona

7953899

Fix support for older fiona

3a7f996

Fix support for newer fiona

153cd1a

Fix warnings

177a691

Fix Windows

360d44a

Remove unused import

76dcd8d

adamjstewart force-pushed the datasets/spacenet branch from dddbbd1 to 76dcd8d August 17, 2024 15:08

ashnair1 approved these changes Aug 17, 2024

adamjstewart merged commit 880593e into microsoft:main Aug 17, 2024
19 checks passed

adamjstewart deleted the datasets/spacenet branch August 17, 2024 18:49
