Add random crop logic to DeepGlobeLandCover Datamodule #876

Merged
merged 40 commits into from
Dec 30, 2022
Changes from 39 commits
ac16986
crop logic
nilsleh Oct 29, 2022
d74bae9
typo
nilsleh Oct 29, 2022
d464034
change train_batch_size logic
nilsleh Nov 17, 2022
2fa4f93
fix failing test
nilsleh Nov 23, 2022
304b253
typos and naming
nilsleh Dec 3, 2022
64c2243
return argument train dataloader
nilsleh Dec 3, 2022
9504cf5
typo
nilsleh Dec 3, 2022
e2569f6
fix failing test
nilsleh Dec 14, 2022
e90d926
suggestions except about test file
nilsleh Dec 19, 2022
ae22fc2
remove test_deepglobe and add test to trainer
nilsleh Dec 20, 2022
da023ac
forgot new conf file
nilsleh Dec 20, 2022
368fbbb
reanme collate function
nilsleh Dec 20, 2022
780d2f0
move cropping logic to transform and utils
nilsleh Dec 20, 2022
0ed41e9
remove comment
nilsleh Dec 20, 2022
80747d9
simplify
nilsleh Dec 20, 2022
383f40f
move pad_segmentation to transforms
nilsleh Dec 20, 2022
0ec2ce0
another one
nilsleh Dec 21, 2022
4fa0a1e
naming and versionadded
nilsleh Dec 21, 2022
ad39c6b
another transforms approach
nilsleh Dec 23, 2022
9cf99c1
typo
nilsleh Dec 23, 2022
c3e2325
fix read the docs
nilsleh Dec 23, 2022
873c9af
some checks for Ncrop
nilsleh Dec 23, 2022
5b0c6b0
add unit tests new transforms
nilsleh Dec 23, 2022
e976630
Remove cruft
adamjstewart Dec 27, 2022
621a905
More simplification
adamjstewart Dec 27, 2022
5ab6535
Add config file
adamjstewart Dec 28, 2022
ae17d7d
Implemented ExtractTensorPatches
adamjstewart Dec 28, 2022
f2a0dd8
Remove tests
adamjstewart Dec 28, 2022
d23fc7c
Remove unnecessary attrs
adamjstewart Dec 28, 2022
5412449
Apply to both input and mask
adamjstewart Dec 28, 2022
4e3a1b0
Implement RandomNCrop
adamjstewart Dec 28, 2022
041aaca
Fix dimensions
adamjstewart Dec 28, 2022
d31c48d
mypy fixes
adamjstewart Dec 29, 2022
9af4d43
Fix docs
adamjstewart Dec 29, 2022
32b610c
Ensure that image and mask get the same transformation
adamjstewart Dec 29, 2022
03d96f3
Bump min kornia version
adamjstewart Dec 29, 2022
e680000
ignore still needed?
adamjstewart Dec 29, 2022
2857b7b
Remove unneeded hacks
adamjstewart Dec 29, 2022
d35773b
Fix pydocstyle
adamjstewart Dec 29, 2022
50ece12
Fix dimensions
adamjstewart Dec 29, 2022
4 changes: 2 additions & 2 deletions docs/api/datasets.rst
@@ -169,8 +169,8 @@ Kenya Crop Type

.. autoclass:: CV4AKenyaCropType

Deep Globe Land Cover
^^^^^^^^^^^^^^^^^^^^^
DeepGlobe Land Cover
^^^^^^^^^^^^^^^^^^^^

.. autoclass:: DeepGlobeLandCover

4 changes: 2 additions & 2 deletions docs/api/non_geo_datasets.csv
@@ -5,7 +5,7 @@ Dataset,Task,Source,# Samples,# Classes,Size (px),Resolution (m),Bands
`Cloud Cover Detection`_,S,Sentinel-2,"22,728",2,512x512,10,MSI
`COWC`_,"C, R","CSUAV AFRL, ISPRS, LINZ, AGRC","388,435",2,256x256,0.15,RGB
`Kenya Crop Type`_,S,Sentinel-2,"4,688",7,"3,035x2,016",10,MSI
`Deep Globe Land Cover`_,S,DigitalGlobe +Vivid,803,7,"2,448x2,448",0.5,RGB
`DeepGlobe Land Cover`_,S,DigitalGlobe +Vivid,803,7,"2,448x2,448",0.5,RGB
`DFC2022`_,S,Aerial,"3,981",15,"2,000x2,000",0.5,RGB
`ETCI2021 Flood Detection`_,S,Sentinel-1,"66,810",2,256x256,5--20,SAR
`EuroSAT`_,C,Sentinel-2,"27,000",10,64x64,10,MSI
@@ -34,4 +34,4 @@ Dataset,Task,Source,# Samples,# Classes,Size (px),Resolution (m),Bands
`Vaihingen`_,S,Aerial,33,6,"1,281--3,816",0.09,RGB
`NWPU VHR-10`_,I,"Google Earth, Vaihingen",800,10,"358--1,728",0.08--2,RGB
`xView2`_,CD,Maxar,"3,732",4,"1,024x1,024",0.8,RGB
`ZueriCrop`_,"I, T",Sentinel-2,116K,48,24x24,10,MSI
`ZueriCrop`_,"I, T",Sentinel-2,116K,48,24x24,10,MSI
2 changes: 1 addition & 1 deletion environment.yml
@@ -22,7 +22,7 @@ dependencies:
- flake8>=3.8
- ipywidgets>=7
- isort[colors]>=5.8
- kornia>=0.6.4
- kornia>=0.6.5
- laspy>=2
- mypy>=0.900
- nbmake>=0.1
2 changes: 1 addition & 1 deletion requirements/min.old
@@ -4,7 +4,7 @@ setuptools==42.0.0
# install
einops==0.3.0
fiona==1.8.0
kornia==0.6.4
kornia==0.6.5
matplotlib==3.3.0
numpy==1.17.2
omegaconf==2.1.0
4 changes: 2 additions & 2 deletions setup.cfg
@@ -29,8 +29,8 @@ install_requires =
einops>=0.3,<0.7
# fiona 1.8+ required for reading empty files
fiona>=1.8,<2
# kornia 0.6.4+ required for kornia.contrib.compute_padding
kornia>=0.6.4,<0.7
# kornia 0.6.5+ required due to change in kornia.augmentation API
kornia>=0.6.5,<0.7
Comment on lines +32 to +33
Collaborator

In Kornia 0.6.5+, all augmentation instance methods have a new flags parameter. So the transforms I added won't work with Kornia 0.6.4 and older. Once we upstream these transforms to Kornia, we'll need to depend on an even newer version anyway.
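
As a rough illustration of the new pin `kornia>=0.6.5,<0.7`, the sketch below checks plain dotted release strings against that range. `parse_version` and `satisfies_kornia_pin` are hypothetical helpers written for this note, not part of TorchGeo or Kornia, and the parser assumes purely numeric versions (a pre-release suffix such as `rc1` would raise `ValueError`).

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted release string such as "0.6.5" into a tuple of ints.

    Hypothetical helper: only handles numeric components.
    """
    return tuple(int(part) for part in version.split("."))


def satisfies_kornia_pin(version: str) -> bool:
    """Check a release string against the range kornia>=0.6.5,<0.7."""
    # Tuples compare element by element, so (0, 6, 10) sorts after (0, 6, 5).
    return (0, 6, 5) <= parse_version(version) < (0, 7)


old_release_ok = satisfies_kornia_pin("0.6.4")
new_release_ok = satisfies_kornia_pin("0.6.5")
```

Comparing integer tuples rather than raw strings matters here, because a string comparison would sort `"0.6.10"` before `"0.6.5"`.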

# matplotlib 3.3+ required for (H, W, 1) image support in plt.imshow
matplotlib>=3.3,<4
# numpy 1.17.2+ required by pytorch-lightning
@@ -14,6 +14,8 @@ experiment:
ignore_index: null
datamodule:
root: "tests/data/deepglobelandcover"
num_tiles_per_batch: 1
num_patches_per_tile: 1
patch_size: 2
val_split_pct: 0.5
batch_size: 1
num_workers: 0
19 changes: 0 additions & 19 deletions tests/conf/deepglobelandcover_0.yaml

This file was deleted.

3 changes: 1 addition & 2 deletions tests/trainers/test_segmentation.py
@@ -37,8 +37,7 @@ class TestSemanticSegmentationTask:
"name,classname",
[
("chesapeake_cvpr_5", ChesapeakeCVPRDataModule),
("deepglobelandcover_0", DeepGlobeLandCoverDataModule),
("deepglobelandcover_5", DeepGlobeLandCoverDataModule),
("deepglobelandcover", DeepGlobeLandCoverDataModule),
("etci2021", ETCI2021DataModule),
("inria_train", InriaAerialImageLabelingDataModule),
("inria_val", InriaAerialImageLabelingDataModule),
128 changes: 75 additions & 53 deletions torchgeo/datamodules/deepglobelandcover.py
@@ -3,87 +3,92 @@

"""DeepGlobe Land Cover Classification Challenge datamodule."""

from typing import Any, Dict, Optional
from typing import Any, Dict, Optional, Tuple, Union

import matplotlib.pyplot as plt
import pytorch_lightning as pl
from torch.utils.data import DataLoader, Dataset
from torchvision.transforms import Compose
from kornia.augmentation import Normalize
Comment on lines -11 to +10
Collaborator

I'm planning on removing all torchvision transforms. Torchvision relies on PIL for many of its transforms, which doesn't support MSI. Kornia has all of the same transforms, but they are in pure PyTorch, so they can run on the GPU and support MSI. I don't see a good reason not to only use Kornia transforms.

from torch.utils.data import DataLoader

from ..datasets import DeepGlobeLandCover
from ..samplers.utils import _to_tuple
from ..transforms import AugmentationSequential
from ..transforms.transforms import _ExtractTensorPatches, _RandomNCrop
from .utils import dataset_split


class DeepGlobeLandCoverDataModule(pl.LightningDataModule):
"""LightningDataModule implementation for the DeepGlobe Land Cover dataset.

Uses the train/test splits from the dataset.

"""

def __init__(
self,
batch_size: int = 64,
num_workers: int = 0,
num_tiles_per_batch: int = 16,
num_patches_per_tile: int = 16,
patch_size: Union[Tuple[int, int], int] = 64,
val_split_pct: float = 0.2,
num_workers: int = 0,
**kwargs: Any,
) -> None:
"""Initialize a LightningDataModule for DeepGlobe Land Cover based DataLoaders.
"""Initialize a new LightningDataModule instance.

The DeepGlobe Land Cover dataset contains images that are too large to pass
directly through a model. Instead, we randomly sample patches from image tiles
during training and chop up image tiles into patch grids during evaluation.
During training, the effective batch size is equal to
``num_tiles_per_batch`` x ``num_patches_per_tile``.

Args:
batch_size: The batch size to use in all created DataLoaders
num_workers: The number of workers to use in all created DataLoaders
val_split_pct: What percentage of the dataset to use as a validation set
num_tiles_per_batch: The number of image tiles to sample from during
training
num_patches_per_tile: The number of patches to randomly sample from each
image tile during training
patch_size: The size of each patch, either ``size`` or ``(height, width)``.
Should be a multiple of 32 for most segmentation architectures
val_split_pct: The percentage of the dataset to use as a validation set
num_workers: The number of workers to use for parallel data loading
**kwargs: Additional keyword arguments passed to
:class:`~torchgeo.datasets.DeepGlobeLandCover`

adamjstewart marked this conversation as resolved.
.. versionchanged:: 0.4
*batch_size* was replaced by *num_tiles_per_batch*, *num_patches_per_tile*,
and *patch_size*.
Comment on lines +56 to +57
Collaborator

I tend to only document API changes, not internal changes. So the fact that we're now using random cropping isn't documented, only that the parameters changed.
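
To make the new parameters concrete, here is a minimal plain-Python sketch of the arithmetic described in the docstring: the effective training batch size is `num_tiles_per_batch` x `num_patches_per_tile`, and each patch is a randomly placed window inside a tile. The helper names `effective_batch_size` and `random_crop_origins` are hypothetical, and this is not how `_RandomNCrop` is implemented; it only assumes that every sampled patch must fit entirely inside the tile.

```python
import random
from typing import List, Tuple


def effective_batch_size(num_tiles_per_batch: int, num_patches_per_tile: int) -> int:
    """Number of patches the model sees per training step."""
    return num_tiles_per_batch * num_patches_per_tile


def random_crop_origins(
    tile_size: Tuple[int, int],
    patch_size: Tuple[int, int],
    num_patches: int,
    rng: random.Random,
) -> List[Tuple[int, int]]:
    """Sample top-left (row, col) corners of patches that fit inside a tile."""
    tile_h, tile_w = tile_size
    patch_h, patch_w = patch_size
    if patch_h > tile_h or patch_w > tile_w:
        raise ValueError("patch must not be larger than the tile")
    # randint is inclusive on both ends, so the last valid origin is size - patch.
    return [
        (rng.randint(0, tile_h - patch_h), rng.randint(0, tile_w - patch_w))
        for _ in range(num_patches)
    ]


# Defaults from the new signature: 16 tiles per batch, 16 patches per tile.
batch = effective_batch_size(16, 16)
# 2,448 x 2,448 is the tile size listed for this dataset; 64 is the default patch size.
origins = random_crop_origins((2448, 2448), (64, 64), 16, random.Random(0))
```

With the defaults, a training step therefore processes 256 patches, even though the training `DataLoader` itself only batches 16 tiles.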

"""
super().__init__()
self.batch_size = batch_size
self.num_workers = num_workers

self.num_tiles_per_batch = num_tiles_per_batch
self.num_patches_per_tile = num_patches_per_tile
self.patch_size = _to_tuple(patch_size)
self.val_split_pct = val_split_pct
self.num_workers = num_workers
self.kwargs = kwargs

def preprocess(self, sample: Dict[str, Any]) -> Dict[str, Any]:
Collaborator

We cannot use instance methods as transforms, see #886 for what happens when we do.
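
The removed `preprocess` method divided the image by 255, while the new transforms use `Normalize(mean=0.0, std=255.0)`. Assuming the usual `(x - mean) / std` definition of normalization, the two are equivalent, which this plain-Python sketch checks on a few pixel values. The `normalize` function is a stand-in written for illustration, not Kornia's implementation.

```python
from typing import List


def normalize(values: List[float], mean: float, std: float) -> List[float]:
    """Apply (x - mean) / std element-wise (illustrative stand-in)."""
    return [(v - mean) / std for v in values]


pixels = [0.0, 51.0, 255.0]

# What the removed preprocess() method did: scale 8-bit values into [0, 1].
old_style = [v / 255.0 for v in pixels]

# The same scaling expressed as a normalization with mean 0 and std 255.
new_style = normalize(pixels, mean=0.0, std=255.0)
```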

"""Transform a single sample from the Dataset.

Args:
sample: input image dictionary

Returns:
preprocessed sample
"""
sample["image"] = sample["image"].float()
sample["image"] /= 255.0
return sample
self.train_transform = AugmentationSequential(
Normalize(mean=0.0, std=255.0),
_RandomNCrop(self.patch_size, self.num_patches_per_tile),
data_keys=["image", "mask"],
)
self.test_transform = AugmentationSequential(
Normalize(mean=0.0, std=255.0),
_ExtractTensorPatches(self.patch_size),
data_keys=["image", "mask"],
)

def setup(self, stage: Optional[str] = None) -> None:
"""Initialize the main ``Dataset`` objects.
"""Initialize the main Dataset objects.

This method is called once per GPU per run.

Args:
stage: stage to set up
"""
transforms = Compose([self.preprocess])

dataset = DeepGlobeLandCover(
split="train", transforms=transforms, **self.kwargs
)

self.train_dataset: Dataset[Any]
self.val_dataset: Dataset[Any]

if self.val_split_pct > 0.0:
self.train_dataset, self.val_dataset, _ = dataset_split(
dataset, val_pct=self.val_split_pct, test_pct=0.0
)
else:
self.train_dataset = dataset
self.val_dataset = dataset
Comment on lines -73 to -82
Collaborator

I have no idea why our previous logic was so complicated, but I don't think it needs to be.


self.test_dataset = DeepGlobeLandCover(
split="test", transforms=transforms, **self.kwargs
train_dataset = DeepGlobeLandCover(split="train", **self.kwargs)
adamjstewart marked this conversation as resolved.
self.train_dataset, self.val_dataset = dataset_split(
train_dataset, self.val_split_pct
)
self.test_dataset = DeepGlobeLandCover(split="test", **self.kwargs)
adamjstewart marked this conversation as resolved.

def train_dataloader(self) -> DataLoader[Dict[str, Any]]:
"""Return a DataLoader for training.
@@ -93,7 +98,7 @@ def train_dataloader(self) -> DataLoader[Dict[str, Any]]:
"""
return DataLoader(
self.train_dataset,
batch_size=self.batch_size,
batch_size=self.num_tiles_per_batch,
num_workers=self.num_workers,
shuffle=True,
)
@@ -105,10 +110,7 @@ def val_dataloader(self) -> DataLoader[Dict[str, Any]]:
validation data loader
"""
return DataLoader(
self.val_dataset,
batch_size=self.batch_size,
num_workers=self.num_workers,
shuffle=False,
self.val_dataset, batch_size=1, num_workers=self.num_workers, shuffle=False
)

def test_dataloader(self) -> DataLoader[Dict[str, Any]]:
@@ -118,12 +120,32 @@ def test_dataloader(self) -> DataLoader[Dict[str, Any]]:
testing data loader
"""
return DataLoader(
self.test_dataset,
batch_size=self.batch_size,
num_workers=self.num_workers,
shuffle=False,
self.test_dataset, batch_size=1, num_workers=self.num_workers, shuffle=False
)

def on_after_batch_transfer(
self, batch: Dict[str, Any], dataloader_idx: int
) -> Dict[str, Any]:
"""Apply augmentations to batch after transferring to GPU.

Args:
batch: A batch of data that needs to be altered or augmented
dataloader_idx: The index of the dataloader to which the batch belongs

Returns:
A batch of data
"""
if self.trainer:
if self.trainer.training:
Comment on lines +141 to +142
Collaborator

So much cleaner than our previous logic!

batch = self.train_transform(batch)
elif self.trainer.validating or self.trainer.testing:
batch = self.test_transform(batch)

# Kornia adds a channel dimension to the mask
batch["mask"] = batch["mask"].squeeze(1)
Collaborator

Kornia does a lot of weird stuff with transforms that I don't like. Masks are required to be floats (why? slower, more storage). If the mask you input doesn't have a channel dimension, it will add one. Some of the transforms actually break if the mask doesn't have a channel dimension when you input it, so we may need to add an unsqueeze above.


return batch

def plot(self, *args: Any, **kwargs: Any) -> plt.Figure:
"""Run :meth:`torchgeo.datasets.DeepGlobeLandCover.plot`.

4 changes: 2 additions & 2 deletions torchgeo/datamodules/inria.py
@@ -3,7 +3,7 @@

"""InriaAerialImageLabeling datamodule."""

from typing import Any, Dict, List, Optional, Tuple, Union, cast
from typing import Any, Dict, List, Optional, Tuple, Union

import kornia.augmentation as K
import matplotlib.pyplot as plt
@@ -69,7 +69,7 @@ def __init__(
self.num_workers = num_workers
self.val_split_pct = val_split_pct
self.test_split_pct = test_split_pct
self.patch_size = cast(Tuple[int, int], _to_tuple(patch_size))
self.patch_size = _to_tuple(patch_size)
self.num_patches_per_tile = num_patches_per_tile
self.kwargs = kwargs

2 changes: 1 addition & 1 deletion torchgeo/datasets/deepglobelandcover.py
@@ -173,7 +173,7 @@ def _load_image(self, index: int) -> Tensor:
array: "np.typing.NDArray[np.int_]" = np.array(img)
tensor = torch.from_numpy(array)
# Convert from HxWxC to CxHxW
tensor = tensor.permute((2, 0, 1))
tensor = tensor.permute((2, 0, 1)).to(torch.float32)
return tensor

def _load_target(self, index: int) -> Tensor:
12 changes: 11 additions & 1 deletion torchgeo/samplers/utils.py
@@ -4,13 +4,23 @@
"""Common sampler utilities."""

import math
from typing import Optional, Tuple, Union
from typing import Optional, Tuple, Union, overload

import torch

from ..datasets import BoundingBox


@overload
def _to_tuple(value: Union[Tuple[int, int], int]) -> Tuple[int, int]:
...


@overload
def _to_tuple(value: Union[Tuple[float, float], float]) -> Tuple[float, float]:
...
Comment on lines +14 to +21
Collaborator

In Python's type system, an int is accepted wherever a float is expected, but not the other way around. This meant that if I passed an int as input, mypy would infer the output type as float. These overloads ensure that int maps to int and float maps to float, as expected.
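
A self-contained sketch of the overload pattern described in this comment follows. The two `@overload` signatures mirror the ones in the diff, but the function is renamed `to_tuple` here and its body is an assumption, since the implementation is cut off in this view: it simply duplicates a scalar into a pair and returns a tuple unchanged.

```python
from typing import Tuple, Union, overload


@overload
def to_tuple(value: Union[Tuple[int, int], int]) -> Tuple[int, int]: ...


@overload
def to_tuple(value: Union[Tuple[float, float], float]) -> Tuple[float, float]: ...


def to_tuple(value: Union[Tuple[float, float], float]) -> Tuple[float, float]:
    """Return a tuple unchanged, or duplicate a scalar into a pair.

    Assumed body for illustration; the real implementation is not shown above.
    """
    if isinstance(value, (int, float)):
        return (value, value)
    return value


size = to_tuple(64)         # a type checker sees Tuple[int, int]
scale = to_tuple(0.5)       # a type checker sees Tuple[float, float]
pair = to_tuple((32, 48))   # tuples pass through unchanged
```

The overloads have no effect at runtime. They only tell the type checker which return type goes with which argument type, which is presumably why the `cast(Tuple[int, int], ...)` in `inria.py` could be dropped in this PR.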



def _to_tuple(value: Union[Tuple[float, float], float]) -> Tuple[float, float]:
"""Convert value to a tuple if it is not already a tuple.
