Adding OSCD dataset #233

Merged
merged 77 commits into from
Nov 19, 2021
Changes from 12 commits
Commits
ca10f26
OSCD: initial template
iejMac Nov 10, 2021
a5f7b2f
updating download pattern
iejMac Nov 10, 2021
6a382a1
package: including OSCD in torchgeo.datasets
iejMac Nov 10, 2021
4c5996b
download: adapting download method to OSCD dataset + adding simple te…
iejMac Nov 10, 2021
3ef976f
_load_files method: temporary implementation
iejMac Nov 10, 2021
bfb73ef
OCSD: minimum working example, needs plenty improvement
iejMac Nov 10, 2021
0430b13
adding OSCD to docs
iejMac Nov 11, 2021
d1fc012
Moving test to appropriate location
iejMac Nov 11, 2021
bf0792f
Merge branch 'main' of https://github.com/iejMac/torchgeo into oscd
iejMac Nov 11, 2021
5f96aaa
OSCD: remove sort_bands and use utils.sort_sentinel2_bands
iejMac Nov 11, 2021
50d82f9
Using rasterio instead of tifffile
iejMac Nov 11, 2021
4f29155
remove useless import
iejMac Nov 11, 2021
3f3c9b3
Merge branch 'main' of https://github.com/iejMac/torchgeo into oscd
iejMac Nov 12, 2021
1e17bda
style changes
iejMac Nov 12, 2021
e12a0a2
fix: style
iejMac Nov 12, 2021
a005873
Merge branch 'main' of https://github.com/iejMac/torchgeo into oscd
iejMac Nov 15, 2021
42acbf7
Merge branch 'main' of https://github.com/iejMac/torchgeo into oscd
iejMac Nov 16, 2021
c774fcd
Developing tests for OSCD dataset
iejMac Nov 16, 2021
0d8e7c5
Updating dataset description
iejMac Nov 16, 2021
544d144
change name
iejMac Nov 16, 2021
b3da718
style fixes
iejMac Nov 16, 2021
7f66f6b
fixing mypy errors
iejMac Nov 16, 2021
2596506
style fixes
iejMac Nov 16, 2021
4cd64c1
cast to string to fix typing errors
iejMac Nov 16, 2021
491e4ed
style change
iejMac Nov 16, 2021
ffa6978
isort fix
iejMac Nov 16, 2021
164003d
remove TODO
iejMac Nov 16, 2021
ef8e92e
adding dataset for testing
iejMac Nov 16, 2021
dca02a1
change len
iejMac Nov 16, 2021
96a5947
check if sum is concatdataset
iejMac Nov 16, 2021
5fbdad3
isort fix
iejMac Nov 16, 2021
8513d6e
fixing some issues + correct md5 in dataset
iejMac Nov 17, 2021
9a2626b
closing rasterio file handles
iejMac Nov 17, 2021
961ebdc
removing some TODO's
iejMac Nov 17, 2021
ebf24c7
transitioning to fake data
iejMac Nov 17, 2021
730807f
mypy fix attempt
iejMac Nov 17, 2021
554d187
set fake data md5
iejMac Nov 17, 2021
e430f67
flake8 fix
iejMac Nov 17, 2021
8b32728
starting plot method
iejMac Nov 17, 2021
8770ff4
updating plot method
iejMac Nov 17, 2021
488978c
no predictions for now
iejMac Nov 17, 2021
36b8a82
fixing style errors
iejMac Nov 17, 2021
4cc5b2c
add testing for plot
iejMac Nov 17, 2021
bed3b40
making some changes to fake testing data
iejMac Nov 17, 2021
0b00589
full coverage
iejMac Nov 17, 2021
75b204e
Use RGB channels in the plot function
calebrob6 Nov 17, 2021
ffc64ff
adding shape tests in test_getitem
iejMac Nov 18, 2021
d889cbd
Merge branch 'oscd' of https://github.com/iejMac/torchgeo into oscd
iejMac Nov 18, 2021
dca5475
remove features and add to description
iejMac Nov 18, 2021
e4cdac9
fixing some things
iejMac Nov 18, 2021
e743f29
transitioning to authors dataset link
iejMac Nov 19, 2021
7c39193
No need to change file names + adapt test dataset
iejMac Nov 19, 2021
4bc7a92
adapting tests to new data format
iejMac Nov 19, 2021
d7c60ab
Update docs/api/datasets.rst
iejMac Nov 19, 2021
64b4028
closing plot at end of terst
iejMac Nov 19, 2021
0698a86
add versionadded
iejMac Nov 19, 2021
9d4a505
style fixes + indentation fixes
iejMac Nov 19, 2021
9c1f5b0
style fixes
iejMac Nov 19, 2021
dfda240
forgot the .zip
iejMac Nov 19, 2021
bfee62b
fix zipfile name
iejMac Nov 19, 2021
56b75df
temporary fix for flake8
iejMac Nov 19, 2021
ca93b69
Add link to docs
iejMac Nov 19, 2021
d40677e
forgot to adjust this
iejMac Nov 19, 2021
ad2ce34
changing flake8 solve
iejMac Nov 19, 2021
52046e9
slimming down the test dataset
iejMac Nov 19, 2021
cfbf26c
removing imgs_x files which aren't needed for current testing but mig…
iejMac Nov 19, 2021
9e320d6
Revert "removing imgs_x files which aren't needed for current testing…
iejMac Nov 19, 2021
d8148ec
nevermind, this was the issue
iejMac Nov 19, 2021
9488979
trying to remove these once again
iejMac Nov 19, 2021
9ef8b73
adding band choosing functionality
iejMac Nov 19, 2021
826d7a2
removing double code
iejMac Nov 19, 2021
3f7f800
removing more double code
iejMac Nov 19, 2021
c75b1ae
flake8 fix
iejMac Nov 19, 2021
24879ef
adding one more training sample to dummy dataset and testing split
iejMac Nov 19, 2021
e0fa092
typing numpy array
iejMac Nov 19, 2021
9d55b03
back to this
iejMac Nov 19, 2021
9615cb8
Fixing tests and mypy
calebrob6 Nov 19, 2021
5 changes: 5 additions & 0 deletions docs/api/datasets.rst
@@ -127,6 +127,11 @@ LEVIR-CD+ (LEVIR Change Detection +)

.. autoclass:: LEVIRCDPlus

OSCD (Onera Satellite Change Detection)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: OSCD

PatternNet
^^^^^^^^^^

9 changes: 9 additions & 0 deletions tests/datasets/test_oscd.py
@@ -0,0 +1,9 @@
from torchgeo.datasets import OSCD

oscd = OSCD(download=True)

sample = oscd[1]

print(sample["image"].shape)
print(sample["mask"].shape)
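The script above downloads the full archive on every run. A lighter check, sketched here on the assumption that the extracted archive follows the directory layout that `_load_files` in this PR globs for, builds a tiny fake tree in a temporary directory and confirms that each region is found with its band files and mask. `make_fake_oscd`, `discover_regions`, and `band_key` are hypothetical helpers written for this illustration and are not part of torchgeo. `band_key` only approximates the ordering that `sort_sentinel2_bands` is used for.

```python
import glob
import os
import tempfile
from typing import Dict, List

IMAGES_DIR = "Onera Satellite Change Detection dataset - Images"
LABELS_DIR = "Onera Satellite Change Detection dataset - {split} Labels"
BANDS = ["B8A", "B04", "B01", "B08", "B03", "B02"]  # small unordered subset


def band_key(path: str) -> str:
    """Hypothetical sort key that places B8A directly after B08."""
    band = os.path.splitext(os.path.basename(path))[0].split("_")[-1]
    return "B08A" if band == "B8A" else band


def make_fake_oscd(root: str, regions: List[str], split: str = "Train") -> None:
    """Create empty placeholder files in the layout the loader globs for."""
    for region in regions:
        for sub in ("imgs_1_rect", "imgs_2_rect"):
            img_dir = os.path.join(root, IMAGES_DIR, region, sub)
            os.makedirs(img_dir)
            for band in BANDS:
                open(os.path.join(img_dir, f"{band}.tif"), "w").close()
        cm_dir = os.path.join(root, LABELS_DIR.format(split=split), region, "cm")
        os.makedirs(cm_dir)
        open(os.path.join(cm_dir, "cm.png"), "w").close()


def discover_regions(root: str, split: str = "Train") -> List[Dict[str, object]]:
    """Mirror the globbing in ``_load_files`` without reading any pixels."""
    labels_root = os.path.join(root, LABELS_DIR.format(split=split))
    images_root = os.path.join(root, IMAGES_DIR)
    found: List[Dict[str, object]] = []
    # Sorted here for a deterministic order; the PR code does not sort folders.
    for folder in sorted(glob.glob(os.path.join(labels_root, "*/"))):
        region = os.path.basename(os.path.normpath(folder))
        sample: Dict[str, object] = {
            "region": region,
            "mask": os.path.join(labels_root, region, "cm", "cm.png"),
        }
        for key, sub in (("images1", "imgs_1_rect"), ("images2", "imgs_2_rect")):
            tifs = glob.glob(os.path.join(images_root, region, sub, "*.tif"))
            sample[key] = sorted(tifs, key=band_key)
        found.append(sample)
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_fake_oscd(tmp, ["paris", "abudhabi"])
        for sample in discover_regions(tmp):
            print(sample["region"], len(sample["images1"]), len(sample["images2"]))
```

Because the placeholder files are empty, this only exercises path discovery and band ordering. Loading pixels would still need real (or synthetic) rasters.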

2 changes: 2 additions & 0 deletions torchgeo/datasets/__init__.py
@@ -53,6 +53,7 @@
from .levircd import LEVIRCDPlus
from .naip import NAIP, NAIPChesapeakeDataModule
from .nwpu import VHR10
from .oscd import OSCD
from .patternnet import PatternNet
from .resisc45 import RESISC45, RESISC45DataModule
from .seco import SeasonalContrastS2
@@ -111,6 +112,7 @@
    "LandCoverAI",
    "LandCoverAIDataModule",
    "LEVIRCDPlus",
    "OSCD",
    "PatternNet",
    "RESISC45",
    "RESISC45DataModule",
227 changes: 227 additions & 0 deletions torchgeo/datasets/oscd.py
@@ -0,0 +1,227 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

"""OSCD dataset."""

import glob
import os
from typing import Callable, Dict, List, Optional

import numpy as np
import rasterio
import torch
from PIL import Image
from torch import Tensor

from .geo import VisionDataset
from .utils import download_url, extract_archive, sort_sentinel2_bands


class OSCD(VisionDataset):
    """OSCD dataset.

    The `Onera Satellite Change Detection <https://rcdaudt.github.io/oscd/>`_
    dataset is a dataset for detecting changes between pairs of Sentinel-2
    satellite images of the same region taken on different dates.

    Dataset format:

    * images are stored as one single-channel tif per Sentinel-2 band
    * masks are single-channel pngs where no change = 0, change = 255

    Dataset classes:

    0. no change
    1. change

    If you use this dataset in your research, please cite the following paper:

    * https://doi.org/10.1109/IGARSS.2018.8518015
    """

    url = "https://drive.google.com/file/d/1jidN0DKEIybOrP0j7Bos8bGDDq3Varj3"
    md5 = "1adf156f628aa32fb2e8fe6cada16c04"  # TODO: find this

    # TODO: find better way to solve nested zip file structure
    zipfile_glob = "*OSCD.zip"
    zipfile_glob2 = "*Onera*.zip"
    # TODO: need to change filename_glob due to how this is checked in verify
    filename_glob = "*Onera*"
    filename = "OSCD.zip"
    splits = ["train", "test"]

    def __init__(
        self,
        root: str = "data",
        split: str = "train",
        transforms: Optional[Callable[[Dict[str, Tensor]], Dict[str, Tensor]]] = None,
        download: bool = False,
        checksum: bool = False,
    ) -> None:
        """Initialize a new OSCD dataset instance.

        Args:
            root: root directory where dataset can be found
            split: one of "train" or "test"
            transforms: a function/transform that takes input sample and its target as
                entry and returns a transformed version
            download: if True, download dataset and store it in the root directory
            checksum: if True, check the MD5 of the downloaded files (may be slow)

        Raises:
            AssertionError: if ``split`` argument is invalid
            RuntimeError: if ``download=False`` and data is not found, or checksums
                don't match
        """
        assert split in self.splits

        self.root = root
        self.split = split
        self.transforms = transforms
        self.download = download
        self.checksum = checksum

        self._verify()

        self.files = self._load_files()

    def __getitem__(self, index: int) -> Dict[str, Tensor]:
        """Return the sample at the given index.

        Args:
            index: index to return

        Returns:
            image pair and change mask at that index
        """
        files = self.files[index]
        # TODO: implement choosing bands (right now assuming bands="all")
        image1 = self._load_image(files["images1"])
        image2 = self._load_image(files["images2"])
        mask = self._load_target(files["mask"])

        # Stack the two acquisition dates along a new leading dimension
        image = torch.stack(tensors=[image1, image2], dim=0)
        sample = {"image": image, "mask": mask}

        if self.transforms is not None:
            sample = self.transforms(sample)

        return sample

    def __len__(self) -> int:
        """Return the number of data points in the dataset.

        Returns:
            length of the dataset
        """
        return len(self.files)

    # TODO: this needs to be refactored
    def _load_files(self) -> List[Dict[str, str]]:
        """Collect image, mask, and date information for each region in the split."""
        regions = []
        temp_split = "Test" if self.split == "test" else "Train"
        labels_root = os.path.join(
            self.root,
            f"Onera Satellite Change Detection dataset - {temp_split} Labels",
        )
        images_root = os.path.join(
            self.root, "Onera Satellite Change Detection dataset - Images"
        )
        folders = glob.glob(os.path.join(labels_root, "*/"))
        for folder in folders:
            # The pattern ends in a separator, so the region name is second to last
            region = folder.split(os.sep)[-2]
            region_root = os.path.join(images_root, region)
            mask = os.path.join(labels_root, region, "cm", "cm.png")
            images1 = glob.glob(os.path.join(region_root, "imgs_1_rect", "*.tif"))
            images2 = glob.glob(os.path.join(region_root, "imgs_2_rect", "*.tif"))
            images1 = sorted(images1, key=sort_sentinel2_bands)
            images2 = sorted(images2, key=sort_sentinel2_bands)
            with open(os.path.join(region_root, "dates.txt")) as f:
                lines = f.read().strip().splitlines()
            dates = tuple(line.split()[-1] for line in lines)

            regions.append(
                dict(
                    region=region, images1=images1, images2=images2,
                    mask=mask, dates=dates,
                )
            )

        return regions

    def _load_image(self, paths: List[str]) -> Tensor:
        """Load a single image from its per-band files.

        Args:
            paths: paths to the band files of the image

        Returns:
            the image
        """
        bands = []
        for path in paths:
            # Use a context manager so each rasterio file handle is closed
            with rasterio.open(path) as f:
                bands.append(f.read())
        # np.long is not available in all NumPy releases; use an explicit 64-bit int
        images = np.stack(bands, axis=0).astype(np.int64)
        return torch.from_numpy(images)

    def _load_target(self, path: str) -> Tensor:
        """Load the target mask for a single image.

        Args:
            path: path to the image

        Returns:
            the target mask
        """
        with Image.open(path) as img:
            array = np.array(img.convert("L"))
            tensor: Tensor = torch.from_numpy(array)  # type: ignore[attr-defined]
            # Map the 0/255 mask values to class indices 0/1
            tensor = torch.clamp(tensor, min=0, max=1)  # type: ignore[attr-defined]
            tensor = tensor.to(torch.long)  # type: ignore[attr-defined]
            return tensor

    def _verify(self) -> None:
        """Verify the integrity of the dataset.

        Raises:
            RuntimeError: if ``download=False`` but dataset is missing or checksum fails
        """
        # Check if the extracted files already exist
        pathname = os.path.join(self.root, "**", self.filename_glob)
        for fname in glob.iglob(pathname, recursive=True):
            if not fname.endswith(".zip"):
                return

        # Check if the zip files have already been downloaded
        pathname = os.path.join(self.root, self.zipfile_glob)
        if glob.glob(pathname):
            self._extract()
            return

        # Check if the user requested to download the dataset
        if not self.download:
            raise RuntimeError(
                f"Dataset not found in `root={self.root}` and `download=False`, "
                "either specify a different `root` directory or use `download=True` "
                "to automatically download the dataset."
            )

        # Download the dataset
        self._download()
        self._extract()

    def _download(self) -> None:
        """Download the dataset."""
        download_url(
            self.url,
            self.root,
            filename=self.filename,
            # md5 is a class attribute, so it must be accessed through self
            md5=self.md5 if self.checksum else None,
        )

    def _extract(self) -> None:
        """Extract the dataset."""
        pathname = os.path.join(self.root, self.zipfile_glob)
        for zipfile in glob.iglob(pathname):
            extract_archive(zipfile)
        # TODO: nicer way to solve this nested zipfile structure
        pathname = os.path.join(self.root, self.zipfile_glob2)
        for zipfile in glob.iglob(pathname):
            extract_archive(zipfile)
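
The two-stage glob in `_extract` exists because the downloaded archive contains further zip files. One possible way to address the TODO, sketched with only the standard library rather than torchgeo's `extract_archive`, is to extract an archive and then recurse into any zip files that came out of it. `extract_nested` is a hypothetical helper name, and the sketch assumes every nested archive is a plain `.zip`.

```python
import os
import zipfile


def extract_nested(archive: str, dest: str) -> None:
    """Extract ``archive`` into ``dest``, then unpack any zip files it contained."""
    with zipfile.ZipFile(archive) as zf:
        members = zf.namelist()
        zf.extractall(dest)
    for name in members:
        inner = os.path.join(dest, name)
        if name.lower().endswith(".zip") and zipfile.is_zipfile(inner):
            # Unpack the inner archive next to where it was extracted
            extract_nested(inner, os.path.dirname(inner))
```

With a helper like this, a single call on the outer `OSCD.zip` would replace both loops, at the cost of not reusing the shared `extract_archive` utility.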