SpaceNet: add SpaceNet 8, radiant mlhub -> aws #2203

Merged: 37 commits, Aug 17, 2024

Conversation

Collaborator

@adamjstewart adamjstewart commented Jul 31, 2024

This PR includes a number of improvements:

  • Port from Radiant MLHub to AWS (Migrate from Radiant MLHub to Source Cooperative #1830)
  • Add SpaceNet 8
  • Add support for the test split
  • Add support for choosing a mask
  • Add support for choosing multiple image products?
  • Add support for choosing multiple mask products?
  • Testing

There are a few other peculiarities of these datasets that still need to be worked out:

  • SpaceNet 3, train, AOIs 2–4: some images are missing masks
  • SpaceNet 4: images for 27 different off-nadir angles
  • SpaceNet 7: should this be formulated as time-series?
  • SpaceNet 7, mask='labels_match_pix': weird reprojection bug
  • SpaceNet 7, train, mask='labels': masks for both Buildings and UDM
  • SpaceNet 8: which AOI is 12 and which is 13?
  • SpaceNet 8: should this be formulated as change detection?
  • SpaceNet 8, train, image='POST-event': some annotations have multiple images

Closes #1830

@adamjstewart adamjstewart added the backwards-incompatible Changes that are not backwards compatible label Jul 31, 2024
@adamjstewart adamjstewart added this to the 0.6.0 milestone Jul 31, 2024
@adamjstewart adamjstewart marked this pull request as draft July 31, 2024 14:24
@github-actions github-actions bot added the datasets Geospatial or benchmark datasets label Jul 31, 2024
@adamjstewart adamjstewart requested a review from ashnair1 July 31, 2024 17:05
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Aug 1, 2024
@adamjstewart adamjstewart changed the title SpaceNet: radiant mlhub -> aws SpaceNet: add SpaceNet 8, radiant mlhub -> aws Aug 2, 2024
@adamjstewart
Collaborator Author

adamjstewart commented Aug 3, 2024

Not every chip has the same dimensions. I counted the distinct sizes per dataset with the following command:

> find <dir> -name '*.tif' | xargs file | tr -s ' ' | cut -d ' ' -f 7,11 | sort | uniq -c | sort -nr | tr -s ' ' | sed 's/^/*/g'

SpaceNet 1

RGB:

  • 3372 height=406, width=439
  • 3096 height=406, width=438
  • 1719 height=407, width=439
  • 1548 height=407, width=440

MSI:

  • 8702 height=102, width=110
  • 1033 height=101, width=110

SpaceNet 2

MSI:

  • 12544 height=163, width=163
  • 1529 height=162, width=162
  • 46 height=162, width=163

Pansharpened:

  • 42357 height=650, width=650

SpaceNet 3

MSI:

  • 3708 height=325, width=325

Pansharpened:

  • 11124 height=1300, width=1300

SpaceNet 4

MSI:

  • 29655 height=225, width=225

Pansharpened:

  • 59310 height=900, width=900

SpaceNet 5

MSI:

  • 2588 height=325, width=325

Pansharpened:

  • 7764 height=1300, width=1300

SpaceNet 6

  • 5462 height=900, width=900

SpaceNet 7

  • 2488 height=1024, width=1024
  • 516 height=1023, width=1024
  • 410 height=1024, width=1023
  • 99 height=1023, width=1023

SpaceNet 8

For some reason the file command does not output dimensions for about half of the files (all 1300x1300), so I used the following script:

import glob
import os
import rasterio as rio

for p in glob.iglob(os.path.join('SN8_floods', '*', '*', '*.tif')):
    with rio.open(p) as f:
        print(f'height={f.height}, width={f.width}')

and ran:

> python3 test.py | sort | uniq -c | sort -nr | tr -s ' ' | sed 's/^/*/g'
  • 1207 height=1300, width=1300
  • 258 height=961, width=961
  • 240 height=835, width=835
  • 147 height=916, width=916
  • 100 height=1114, width=1114
  • 99 height=1048, width=1048
  • 90 height=786, width=786
  • 59 height=786, width=785
  • 44 height=834, width=835
  • 42 height=835, width=834
  • 34 height=785, width=786
  • 31 height=1743, width=1743
  • 25 height=916, width=915
  • 21 height=1742, width=1743
  • 20 height=915, width=916
  • 19 height=785, width=785
  • 17 height=1049, width=1048
  • 17 height=1048, width=1049
  • 13 height=961, width=962
  • 13 height=961, width=748
  • 11 height=1743, width=1742
  • 11 height=1114, width=1113
  • 11 height=1113, width=1114
  • 10 height=1742, width=1742
  • 8 height=962, width=961
  • 7 height=834, width=834
  • 5 height=1037, width=1743
  • 4 height=1048, width=715
  • 3 height=915, width=915
  • 3 height=1114, width=765
  • 2 height=1049, width=1049
  • 2 height=1037, width=1742
  • 1 height=962, width=962
  • 1 height=1113, width=765
  • 1 height=1113, width=1113
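The counting step of the pipeline above can also be done inside Python. This is only a minimal sketch, assuming the shapes have already been collected as `(height, width)` tuples (for example from `f.height` and `f.width` in the loop above); `tally_shapes` is a hypothetical helper and not part of this PR:

```python
from collections import Counter


def tally_shapes(shapes):
    """Return lines like '* 3 height=1300, width=1300', most common first."""
    counts = Counter(shapes)
    # Sort by descending count, then by shape, so the output is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f'* {n} height={h}, width={w}' for (h, w), n in ordered]


if __name__ == '__main__':
    # Made-up shapes, only to show the output format.
    demo = [(1300, 1300), (961, 961), (1300, 1300), (961, 962), (1300, 1300)]
    print('\n'.join(tally_shapes(demo)))
```

Ties are broken by shape here, so the order of equally frequent sizes may differ from what `sort -nr` prints.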

Previously, we always cropped to the smallest dimensions by indexing into the array, which was safe because sizes were never off by more than 1 pixel. SpaceNet 8 chips vary far more (from 715 to 1743 pixels on a side), so I think we need to resample the images instead.
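The difference between the two approaches can be sketched as follows. This is an illustration only: it uses plain nested lists for single-band chips so that it runs without dependencies, `crop_to_smallest` and `resize_nearest` are hypothetical names, and nearest-neighbor is just one possible resampling method, not necessarily the one this PR ends up using.

```python
def crop_to_smallest(chips):
    """Crop every chip to the smallest height and width in the batch."""
    h = min(len(c) for c in chips)
    w = min(len(c[0]) for c in chips)
    return [[row[:w] for row in c[:h]] for c in chips]


def resize_nearest(chip, out_h, out_w):
    """Resample a chip to (out_h, out_w) with nearest-neighbor lookup."""
    in_h, in_w = len(chip), len(chip[0])
    # Map the center of each output pixel back to a source row or column.
    rows = [(2 * i + 1) * in_h // (2 * out_h) for i in range(out_h)]
    cols = [(2 * j + 1) * in_w // (2 * out_w) for j in range(out_w)]
    return [[chip[r][c] for c in cols] for r in rows]


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[1, 2], [3, 4], [5, 6]]

# Off-by-one chips: cropping discards a single column of `a`.
cropped = crop_to_smallest([a, b])

# Very different sizes: resampling keeps the full extent of `a`
# at a coarser resolution instead of discarding most of it.
small = resize_nearest(a, 2, 2)
```

Cropping keeps pixel values untouched but loses whatever falls outside the smallest chip, which is negligible for a one-pixel difference and substantial when sizes differ by hundreds of pixels.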

@adamjstewart adamjstewart marked this pull request as ready for review August 5, 2024 20:23
@adamjstewart
Collaborator Author

adamjstewart commented Aug 5, 2024

I've been using the following script to test this on the real data (all 1.2 TB of it):

#!/usr/bin/env python3

"""Test all SpaceNet datasets."""

import itertools

from matplotlib import pyplot as plt
from torch.utils.data import DataLoader

import torchgeo.datasets
from torchgeo.datasets import SpaceNet


def test_dataset(ds: SpaceNet) -> None:
    """Test a single dataset."""
    print(ds.split, ds.aois[0], ds.image, ds.mask, len(ds))
    sample = ds[0]
    ds.plot(sample)
    plt.close()
    dl = DataLoader(ds, batch_size=8, shuffle=True)
    next(iter(dl))


for i in range(1, 9):
    print(f'SpaceNet {i}')
    SpaceNetX = getattr(torchgeo.datasets, f'SpaceNet{i}')
    for split in ['train', 'test']:
        for aoi, image, mask in itertools.product(
            SpaceNetX.valid_aois[split],
            SpaceNetX.valid_images[split],
            SpaceNetX.valid_masks,
        ):
            ds = SpaceNetX('data', split=split, aois=[aoi], image=image, mask=mask)
            test_dataset(ds)
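The loop above stops at the first exception, so for a long run over every combination it may be more useful to collect failures and report them at the end. A minimal sketch under that assumption: `run_all` and the `check` callable are hypothetical stand-ins (in the script above, `check` would construct the dataset and call `test_dataset`), the demo values are made up, and for brevity the same AOI and image lists are used for every split, unlike `valid_aois[split]` and `valid_images[split]`.

```python
import itertools


def run_all(splits, aois, images, masks, check):
    """Call check(split, aoi, image, mask) for every combination.

    Returns a list of (combination, error message) pairs for the calls that
    raised, so one bad combination does not hide the rest.
    """
    failures = []
    for combo in itertools.product(splits, aois, images, masks):
        try:
            check(*combo)
        except Exception as err:  # broad on purpose: report everything
            failures.append((combo, f'{type(err).__name__}: {err}'))
    return failures


def demo_check(split, aoi, image, mask):
    """Toy check that fails for one made-up combination."""
    if split == 'test':
        raise FileNotFoundError('no mask for test split')


failures = run_all(['train', 'test'], [1], ['img'], ['msk'], demo_check)
for combo, message in failures:
    print(combo, message)
```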

@github-actions github-actions bot added testing Continuous integration testing dependencies Packaging and dependencies labels Aug 6, 2024
@calebrob6
Member

Hard to review with all the changes to spacenet.py but I love that this removes our dependency on radiant-mlhub for 0.6. My biggest question is "does it work?", if you run a datamodule through all train/val/test batches for all spacenets do you hit any errors?

@adamjstewart
Collaborator Author

> Hard to review with all the changes to spacenet.py but I love that this removes our dependency on radiant-mlhub for 0.6. My biggest question is "does it work?", if you run a datamodule through all train/val/test batches for all spacenets do you hit any errors?

See #2203 (comment). It tests a single batch per combination, not an entire epoch. Running the entire epoch for all combinations would take a couple of days. We don't yet have data modules for all datasets, only SpaceNet1.

@calebrob6
Member

Can we just run it for a couple of days to verify if we have it all downloaded?

@adamjstewart
Collaborator Author

I don't see a huge benefit over random sampling of mini-batches, but if you want to run it you can.
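The trade-off is coverage against run time. As a rough illustration, here is a sketch of drawing a few random mini-batches of indices from one shuffled pass and reporting how much of a dataset they touch; `sample_batches` is a hypothetical helper, and the sizes are made up rather than taken from any SpaceNet dataset.

```python
import random


def sample_batches(n_samples, batch_size, n_batches, seed=0):
    """Draw up to n_batches disjoint mini-batches of indices from one shuffle."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    # Never return more batches than one full epoch contains.
    k = min(n_batches, -(-n_samples // batch_size))
    return [order[i * batch_size:(i + 1) * batch_size] for i in range(k)]


batches = sample_batches(n_samples=100, batch_size=8, n_batches=3)
seen = {i for batch in batches for i in batch}
print(f'checked {len(seen)} of 100 samples ({len(seen) / 100:.0%})')
```

A handful of random batches exercises the loading code path cheaply, while only a full pass can catch a problem confined to a few specific files.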

Collaborator

@ashnair1 ashnair1 left a comment

Was able to run train/val/test on SpaceNet1 👍

@adamjstewart adamjstewart merged commit 880593e into microsoft:main Aug 17, 2024
19 checks passed
@adamjstewart adamjstewart deleted the datasets/spacenet branch August 17, 2024 18:49