Import for BraTS dataset format #628

Merged
merged 91 commits into from
Mar 16, 2022
Changes from 88 commits
f274bd3
Move image classes
Nov 6, 2021
4bf67e3
Update image class uses
Nov 6, 2021
fdf3258
Update changelog
Nov 6, 2021
ae12800
update imports
Nov 6, 2021
46d41e7
Fix cache parameter
Nov 6, 2021
8b06f57
Remove extra parameter
Nov 6, 2021
a03c21c
Fix import
Nov 6, 2021
1716652
Introduce point cloud, deprecate old members
Nov 6, 2021
e0675cf
Support for generic media in datumaro format
Nov 6, 2021
24ca9f2
Provide backward compat alias for DTypeLike
Nov 6, 2021
9f13a58
Merge branch 'zm/update-image-classes' into zm/generic-media
Nov 6, 2021
e360e24
Ignore own deprecation warnings
Nov 8, 2021
b864a0a
Add deprecation warnings on annotation imports
Nov 8, 2021
d80de52
Add message
Nov 8, 2021
d3359b3
Merge develop
Nov 10, 2021
93e3126
Merge branch 'develop' into zm/generic-media
Nov 10, 2021
83ae399
Import for BraTS dataset format
yasakova-anastasia Jan 14, 2022
c8f2b1e
fixes
yasakova-anastasia Jan 17, 2022
55acaea
import for BraTS Numpy dataset format
yasakova-anastasia Jan 19, 2022
4fd723c
fix BraTS
yasakova-anastasia Jan 19, 2022
aeb419c
update requirements
yasakova-anastasia Jan 19, 2022
f20e35c
Update documentation
yasakova-anastasia Jan 19, 2022
c46a56b
Reading pickle files is moved to a separate file
yasakova-anastasia Jan 19, 2022
1c94f96
Fixes
yasakova-anastasia Jan 19, 2022
53fc3cc
Fix pylint
yasakova-anastasia Jan 19, 2022
f01aa8b
Merge branch 'develop' into ay/brats-format
yasakova-anastasia Jan 19, 2022
abe2545
Remove unused import
yasakova-anastasia Jan 19, 2022
1a8a22e
Update Changelog
yasakova-anastasia Jan 19, 2022
a613722
Fixes
yasakova-anastasia Jan 20, 2022
e4406eb
resolve conflicts
yasakova-anastasia Jan 21, 2022
341d3f5
change image to PCD
yasakova-anastasia Jan 21, 2022
a884603
Remove unused import
yasakova-anastasia Jan 21, 2022
085eb7e
Add a new media type
yasakova-anastasia Jan 24, 2022
586e3f5
Update Changelog
yasakova-anastasia Jan 24, 2022
652a94f
Fixes
yasakova-anastasia Jan 24, 2022
254a679
Sort imports
yasakova-anastasia Jan 24, 2022
021c021
Small fix
yasakova-anastasia Jan 25, 2022
0f8520c
Add save-media, replace image and point cloud with media
yasakova-anastasia Feb 3, 2022
18cbb05
Add 'media_type' to Extractors
yasakova-anastasia Feb 14, 2022
304a988
Resolve conflicts
yasakova-anastasia Feb 14, 2022
3a033bc
Fix pylint
yasakova-anastasia Feb 14, 2022
867b9d3
Replace image with media in Tensorflow Extractor
yasakova-anastasia Feb 14, 2022
e0e3681
Some fixes in MPII format
yasakova-anastasia Feb 14, 2022
8e0d538
Small fix
yasakova-anastasia Feb 14, 2022
c182e5b
Correction of some points
yasakova-anastasia Feb 15, 2022
8c6fb75
Move 'check_media_type' to 'init_cache()'
yasakova-anastasia Feb 16, 2022
6a66917
Fixes
yasakova-anastasia Feb 16, 2022
9b78f7a
Add error when multiple media types are in Datumaro format
yasakova-anastasia Feb 16, 2022
8ad43eb
Fixes
yasakova-anastasia Feb 16, 2022
0eb3e97
Add checks for media type in converters
yasakova-anastasia Feb 16, 2022
2b3f2ee
Fixes
yasakova-anastasia Feb 16, 2022
c31858d
Resolve conflicts
yasakova-anastasia Feb 16, 2022
45fd612
Replace 'require_images' with 'require_media'
yasakova-anastasia Feb 16, 2022
1a2355e
Sort imports
yasakova-anastasia Feb 16, 2022
4f9df5f
Fixes
yasakova-anastasia Feb 16, 2022
b942965
Add 'media_type' to 'from_iterable()'
yasakova-anastasia Feb 18, 2022
f730e10
Fix checks for media type in converters
yasakova-anastasia Feb 18, 2022
c330a10
Fix pylint
yasakova-anastasia Feb 18, 2022
92af796
Fix merging
yasakova-anastasia Feb 18, 2022
6b416ec
Fix pylint
yasakova-anastasia Feb 18, 2022
bcde3d9
Resolve conflicts
yasakova-anastasia Feb 19, 2022
7f8d19d
Fix pylint
yasakova-anastasia Feb 19, 2022
ec1d9fa
Fixes
yasakova-anastasia Feb 21, 2022
c32b1ea
Fix pylint
yasakova-anastasia Feb 22, 2022
4d1fc6d
Add test for point cloud merging
yasakova-anastasia Feb 22, 2022
1c7d1d8
Small fix in test
yasakova-anastasia Feb 22, 2022
17b3b0b
Fixes
yasakova-anastasia Feb 22, 2022
7ee9277
Fix codacy
yasakova-anastasia Feb 22, 2022
8dee9ed
Fix checks for media types in extractors
yasakova-anastasia Feb 22, 2022
2300ec0
Fixes
yasakova-anastasia Feb 22, 2022
c356ce4
Merge develop
yasakova-anastasia Feb 24, 2022
d95ef37
Merge zm/generic-media
yasakova-anastasia Feb 24, 2022
1ab5238
Fixes
yasakova-anastasia Feb 25, 2022
218fa96
Remove unused import
yasakova-anastasia Feb 25, 2022
0806555
Resolve conflicts
yasakova-anastasia Mar 9, 2022
1f31008
Add merging multiframe images
yasakova-anastasia Mar 9, 2022
a4b8903
Update Changelog
yasakova-anastasia Mar 9, 2022
b6c9728
Sort imports
yasakova-anastasia Mar 9, 2022
ba634fc
Remove extra changes
yasakova-anastasia Mar 9, 2022
9b34b13
Update documentation
yasakova-anastasia Mar 9, 2022
16bb025
Resolve conflicts
yasakova-anastasia Mar 11, 2022
7715812
Small fixes
yasakova-anastasia Mar 11, 2022
606830d
Fix code style
yasakova-anastasia Mar 11, 2022
363be57
Remove extra changes
yasakova-anastasia Mar 11, 2022
b7b0f19
Fix a type annotation
yasakova-anastasia Mar 14, 2022
fe81895
Fix masks
yasakova-anastasia Mar 16, 2022
cb2c036
Fix code style
yasakova-anastasia Mar 16, 2022
e7b5df0
Fix code style
yasakova-anastasia Mar 16, 2022
0c8fa0b
Fix documentation
yasakova-anastasia Mar 16, 2022
635709d
Fixes
yasakova-anastasia Mar 16, 2022
62d3c9b
Update requirements-core.txt
Mar 16, 2022
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   (<https://github.com/openvinotoolkit/datumaro/pull/539>)
 - \[API\] A way to request dataset and extractor media type with `media_type`
   (<https://github.com/openvinotoolkit/datumaro/pull/539>)
+- BraTS format (import-only) (.npy and .nii.gz), new `MultiframeImage`
+  media type (<https://github.com/openvinotoolkit/datumaro/pull/628>)

 ### Changed
 - TBD
27 changes: 27 additions & 0 deletions datumaro/components/media.py
@@ -500,3 +500,30 @@ def __init__(self, path: str, extra_images: Optional[List[Image]] = None):
         self._path = path

         self.extra_images: List[Image] = extra_images or []
+
+
+class MultiframeImage(MediaElement):
+    def __init__(
+        self,
+        images: Optional[Iterable[Union[str, Image, np.ndarray, Callable[[str], np.ndarray]]]],
+        *,
+        path: Optional[str] = None,
+    ):
+        self._path = path
+
+        self._images = [None] * len(images or [])
+        for i, image in enumerate(images or []):
+            assert isinstance(image, (str, Image, np.ndarray)) or callable(image)
+
+            if isinstance(image, str):
+                image = Image(path=image)
+            elif isinstance(image, np.ndarray) or callable(image):
+                image = Image(data=image)
+
+            self._images[i] = image
+
+        assert self._path or self._images
+
+    @property
+    def data(self) -> List[Image]:
+        return self._images
38 changes: 37 additions & 1 deletion datumaro/components/operations.py
@@ -42,7 +42,7 @@
     WrongGroupError,
 )
 from datumaro.components.extractor import CategoriesInfo, DatasetItem
-from datumaro.components.media import Image, MediaElement, PointCloud, Video
+from datumaro.components.media import Image, MediaElement, MultiframeImage, PointCloud, Video
 from datumaro.util import filter_dict, find
 from datumaro.util.annotation_util import (
     OKS,
@@ -187,6 +187,10 @@ def _merge_media(
             not item_b.media or isinstance(item_b.media, Video)
         ):
             media = cls._merge_videos(item_a, item_b)
+        elif (not item_a.media or isinstance(item_a.media, MultiframeImage)) and (
+            not item_b.media or isinstance(item_b.media, MultiframeImage)
+        ):
+            media = cls._merge_multiframe_images(item_a, item_b)
         elif (not item_a.media or isinstance(item_a.media, MediaElement)) and (
             not item_b.media or isinstance(item_b.media, MediaElement)
         ):
@@ -330,6 +334,38 @@ def _merge_videos(item_a: DatasetItem, item_b: DatasetItem) -> Video:

         return media

+    @staticmethod
+    def _merge_multiframe_images(item_a: DatasetItem, item_b: DatasetItem) -> MultiframeImage:
+        media = None
+
+        if isinstance(item_a.media, MultiframeImage) and isinstance(item_b.media, MultiframeImage):
+            if item_a.media.path and item_b.media.path and item_a.media.path != item_b.media.path:
+                raise MismatchingMediaPathError(
+                    (item_a.id, item_a.subset), item_a.media.path, item_b.media.path
+                )
+
+            if item_a.media.path or item_a.media.data:
+                media = item_a.media
+
+                if item_b.media.data:
+                    for image in item_b.media.data:
+                        if image not in media.data:
+                            media.data.append(image)
+            else:
+                media = item_b.media
+
+                if item_a.media.data:
+                    for image in item_a.media.data:
+                        if image not in media.data:
+                            media.data.append(image)
+
+        elif isinstance(item_a.media, MultiframeImage):
+            media = item_a.media
+        else:
+            media = item_b.media
+
+        return media
+
     @staticmethod
     def _merge_anno(a: Iterable[Annotation], b: Iterable[Annotation]) -> List[Annotation]:
         return merge_annotations_equal(a, b)
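
The frame-merging rule in `_merge_multiframe_images` can be illustrated without Datumaro. The sketch below is a simplified stand-in, not the library code: `merge_frames` is a hypothetical helper, and frames are plain strings rather than `Image` objects. It keeps the frames of the first item in order and appends only those frames of the second item that are not already present.

```python
from typing import List, Optional


def merge_frames(a: Optional[List[str]], b: Optional[List[str]]) -> List[str]:
    """Union of two frame lists that preserves first-seen order.

    Hypothetical helper: it mirrors the "append if not already present"
    loop from the diff, but returns a new list instead of mutating the
    first item's media.
    """
    merged = list(a or [])
    for frame in b or []:
        if frame not in merged:
            merged.append(frame)
    return merged


frames_a = ["slice_0", "slice_1"]  # illustrative frame names
frames_b = ["slice_1", "slice_2"]
merged = merge_frames(frames_a, frames_b)
print(merged)
```

Unlike this sketch, the method in the diff appends to `media.data` of the chosen item directly, so that item's media object is modified by the merge.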
106 changes: 106 additions & 0 deletions datumaro/plugins/brats_format.py
@@ -0,0 +1,106 @@
# Copyright (C) 2022 Intel Corporation
#
# SPDX-License-Identifier: MIT

import glob
import os.path as osp

import nibabel as nib
import numpy as np

from datumaro.components.annotation import AnnotationType, LabelCategories, Mask
from datumaro.components.extractor import DatasetItem, Importer, SourceExtractor
from datumaro.components.format_detection import FormatDetectionContext
from datumaro.components.media import MultiframeImage


class BratsPath:
    IMAGES_DIR = "images"
    LABELS = "labels"
    DATA_EXT = ".nii.gz"


class BratsExtractor(SourceExtractor):
    def __init__(self, path):
        if not osp.isdir(path):
            raise FileNotFoundError("Can't read dataset directory '%s'" % path)

        self._subset_suffix = osp.basename(path)[len(BratsPath.IMAGES_DIR) :]
        subset = None
        if self._subset_suffix == "Tr":
            subset = "train"
        elif self._subset_suffix == "Ts":
            subset = "test"
        super().__init__(subset=subset, media_type=MultiframeImage)

        self._root_dir = osp.dirname(path)
        self._categories = self._load_categories()
        self._items = list(self._load_items(path).values())

    def _load_categories(self):
        label_cat = LabelCategories()

        labels_path = osp.join(self._root_dir, BratsPath.LABELS)
        if osp.isfile(labels_path):
            with open(labels_path, encoding="utf-8") as f:
                for line in f:
                    label_cat.add(line.strip())

        return {AnnotationType.label: label_cat}

    def _load_items(self, path):
        items = {}

        for image_path in glob.glob(osp.join(path, f"*{BratsPath.DATA_EXT}")):
            data = nib.load(image_path).get_fdata()

            item_id = osp.basename(image_path)[: -len(BratsPath.DATA_EXT)]

            images = [0] * data.shape[2]
            for i in range(data.shape[2]):
                images[i] = data[:, :, i]

            items[item_id] = DatasetItem(
                id=item_id, subset=self._subset, media=MultiframeImage(images, path=image_path)
            )

        masks_dir = osp.join(self._root_dir, BratsPath.LABELS + self._subset_suffix)
        for mask in glob.glob(osp.join(masks_dir, f"*{BratsPath.DATA_EXT}")):
            data = nib.load(mask).get_fdata()

            item_id = osp.basename(mask)[: -len(BratsPath.DATA_EXT)]  # was: image_path

            if item_id not in items:
                items[item_id] = DatasetItem(id=item_id)

            anno = []
            for i in range(data.shape[2]):
                classes = np.unique(data[:, :, i])
                for class_id in classes:
                    anno.append(
                        Mask(
                            image=self._lazy_extract_mask(data[:, :, i], class_id),
                            label=class_id,
                            attributes={"image_id": i},
                        )
                    )

            items[item_id].annotations = anno

        return items

    @staticmethod
    def _lazy_extract_mask(mask, c):
        return lambda: mask == c


class BratsImporter(Importer):
    @classmethod
    def detect(cls, context: FormatDetectionContext) -> None:
        with context.require_any():
            with context.alternative():
                context.require_file(f"*/*{BratsPath.DATA_EXT}")

    @classmethod
    def find_sources(cls, path):
        return cls._find_sources_recursive(path, "", "brats", filename=f"{BratsPath.IMAGES_DIR}*")
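
The directory and file naming that `BratsExtractor` relies on can be tried in isolation. The following is a minimal sketch using only the standard library. `parse_subset` and `parse_item_id` are hypothetical helpers, and the paths (`dataset/imagesTr`, `BRATS_001.nii.gz`) are illustrative names, not taken from the PR. It shows how the suffix after `images` selects the subset and how the double extension is stripped to form the item id.

```python
import os.path as osp

IMAGES_DIR = "images"  # same value as BratsPath.IMAGES_DIR in the diff
DATA_EXT = ".nii.gz"   # same value as BratsPath.DATA_EXT in the diff
SUBSETS = {"Tr": "train", "Ts": "test"}


def parse_subset(images_dir_path):
    """Return (suffix, subset) for a directory such as '.../imagesTr'."""
    suffix = osp.basename(images_dir_path)[len(IMAGES_DIR):]
    # An unknown or empty suffix maps to None, as in the extractor.
    return suffix, SUBSETS.get(suffix)


def parse_item_id(file_path):
    """Drop the directory part and the '.nii.gz' double extension."""
    return osp.basename(file_path)[: -len(DATA_EXT)]


print(parse_subset("dataset/imagesTr"))
print(parse_item_id("dataset/imagesTr/BRATS_001.nii.gz"))
```

The same suffix is appended to `labels` to locate the mask directory, so `imagesTr` pairs with `labelsTr`.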
106 changes: 106 additions & 0 deletions datumaro/plugins/brats_numpy_format.py
@@ -0,0 +1,106 @@
# Copyright (C) 2022 Intel Corporation
#
# SPDX-License-Identifier: MIT

import os.path as osp

import numpy as np

from datumaro.components.annotation import AnnotationType, Cuboid3d, LabelCategories, Mask
from datumaro.components.extractor import DatasetItem, Importer, SourceExtractor
from datumaro.components.format_detection import FormatDetectionContext
from datumaro.components.media import MultiframeImage
from datumaro.util.pickle_util import PickleLoader


class BratsNumpyPath:
    IDS_FILE = "val_ids.p"
    BOXES_FILE = "val_brain_bbox.p"
    LABELS_FILE = "labels"
    DATA_SUFFIX = "_data_cropped"
    LABEL_SUFFIX = "_label_cropped"


class BratsNumpyExtractor(SourceExtractor):
    def __init__(self, path):
        if not osp.isfile(path):
            raise FileNotFoundError("Can't read annotation file '%s'" % path)

        super().__init__(media_type=MultiframeImage)

        self._root_dir = osp.dirname(path)
        self._categories = self._load_categories()
        self._items = list(self._load_items(path).values())

    def _load_categories(self):
        label_cat = LabelCategories()

        labels_path = osp.join(self._root_dir, BratsNumpyPath.LABELS_FILE)
        if osp.isfile(labels_path):
            with open(labels_path, encoding="utf-8") as f:
                for line in f:
                    label_cat.add(line.strip())

        return {AnnotationType.label: label_cat}

    def _load_items(self, path):
        items = {}

        with open(path, "rb") as f:
            ids = PickleLoader.restricted_load(f)

        boxes = None
        boxes_file = osp.join(self._root_dir, BratsNumpyPath.BOXES_FILE)
        if osp.isfile(boxes_file):
            with open(boxes_file, "rb") as f:
                boxes = PickleLoader.restricted_load(f)

        for i, item_id in enumerate(ids):
            image_path = osp.join(self._root_dir, item_id + BratsNumpyPath.DATA_SUFFIX + ".npy")
            media = None
            if osp.isfile(image_path):
                data = np.load(image_path)[0].transpose()
                images = [0] * data.shape[2]
                for j in range(data.shape[2]):
                    images[j] = data[:, :, j]

                media = MultiframeImage(images, path=image_path)

            anno = []
            mask_path = osp.join(self._root_dir, item_id + BratsNumpyPath.LABEL_SUFFIX + ".npy")
            if osp.isfile(mask_path):
                mask = np.load(mask_path)[0].transpose()
                for j in range(mask.shape[2]):
                    classes = np.unique(mask[:, :, j])
                    for class_id in classes:
                        anno.append(
                            Mask(
                                image=self._lazy_extract_mask(mask[:, :, j], class_id),
                                label=class_id,
                                attributes={"image_id": j},
                            )
                        )

            if boxes is not None:
                box = boxes[i]
                anno.append(Cuboid3d(position=list(box[0]), rotation=list(box[1])))

            items[item_id] = DatasetItem(id=item_id, media=media, annotations=anno)

        return items

    @staticmethod
    def _lazy_extract_mask(mask, c):
        return lambda: mask == c


class BratsNumpyImporter(Importer):
    @classmethod
    def detect(cls, context: FormatDetectionContext) -> None:
        context.require_file(BratsNumpyPath.IDS_FILE)

    @classmethod
    def find_sources(cls, path):
        return cls._find_sources_recursive(
            path, "", "brats_numpy", filename=BratsNumpyPath.IDS_FILE
        )
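
Both extractors build masks through a `_lazy_extract_mask(mask, c)` helper instead of writing the lambda inline in the loop. One plausible reason (an assumption, the PR does not state it) is Python's late binding of closures: a lambda created inside a loop sees the loop variable's final value. The sketch below demonstrates the difference with nested lists standing in for NumPy arrays. `slice_2d` and the elementwise comparison are illustrative only.

```python
def lazy_extract_mask(mask, class_id):
    """Return a callable that computes the binary mask only when called.

    `mask` and `class_id` are bound as function arguments, so every
    returned callable keeps its own values.
    """
    return lambda: [[value == class_id for value in row] for row in mask]


slice_2d = [[0, 1], [2, 1]]  # hypothetical 2x2 label slice

# Inline lambdas look up `c` when they are called, after the loop has ended.
late_bound = []
for c in (0, 1, 2):
    late_bound.append(lambda: [[v == c for v in row] for row in slice_2d])

# The helper captures the class id at the moment it is called.
bound = [lazy_extract_mask(slice_2d, k) for k in (0, 1, 2)]

print(late_bound[0]())  # compares against the last class id, 2
print(bound[0]())       # compares against class id 0
```

Deferring the comparison also means the boolean mask is computed only when a consumer asks for it, rather than once per class while the dataset is being loaded.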
22 changes: 1 addition & 21 deletions datumaro/plugins/cifar_format.py
@@ -8,7 +8,6 @@
 from collections import OrderedDict

 import numpy as np
-import numpy.core.multiarray

 from datumaro.components.annotation import AnnotationType, Label, LabelCategories
 from datumaro.components.converter import Converter
@@ -18,26 +17,7 @@
 from datumaro.components.media import Image
 from datumaro.util import cast
 from datumaro.util.meta_file_util import has_meta_file, parse_meta_file
-
-
-class RestrictedUnpickler(pickle.Unpickler):
-    def find_class(self, module, name):
-        if module == "numpy.core.multiarray" and name in PickleLoader.safe_numpy:
-            return getattr(numpy.core.multiarray, name)
-        elif module == "numpy" and name in PickleLoader.safe_numpy:
-            return getattr(numpy, name)
-        raise pickle.UnpicklingError("Global '%s.%s' is forbidden" % (module, name))
-
-
-class PickleLoader:
-    safe_numpy = {
-        "dtype",
-        "ndarray",
-        "_reconstruct",
-    }
-
-    def restricted_load(s):
-        return RestrictedUnpickler(s, encoding="latin1").load()
+from datumaro.util.pickle_util import PickleLoader


 class CifarPath:
6 changes: 5 additions & 1 deletion datumaro/plugins/mnist_format.py
@@ -121,7 +121,11 @@ class MnistImporter(Importer):
     @classmethod
     def find_sources(cls, path):
         return cls._find_sources_recursive(
-            path, ".gz", "mnist", file_filter=lambda p: osp.basename(p).split("-")[1] == "labels"
+            path,
+            ".gz",
+            "mnist",
+            file_filter=lambda p: 1 < len(osp.basename(p).split("-"))
+            and osp.basename(p).split("-")[1] == "labels",
         )
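
The effect of the added length check can be reproduced with the standard library alone. This is a sketch under assumptions: `is_labels_file` and `unguarded` are hypothetical names for the new and old filter expressions, and the file names are illustrative. The change presumably matters here because BraTS `.nii.gz` files also end in `.gz` and contain no `-`, so the old expression would index past the end of the split result.

```python
import os.path as osp


def is_labels_file(path):
    """Guarded check, equivalent to the new `file_filter` lambda."""
    parts = osp.basename(path).split("-")
    return len(parts) > 1 and parts[1] == "labels"


def unguarded(path):
    """The previous check: indexes [1] without testing the length."""
    return osp.basename(path).split("-")[1] == "labels"


print(is_labels_file("data/train-labels-idx1-ubyte.gz"))
print(is_labels_file("imagesTr/BRATS_001.nii.gz"))

try:
    unguarded("imagesTr/BRATS_001.nii.gz")
except IndexError as error:
    print("old filter fails:", error)
```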


27 changes: 27 additions & 0 deletions datumaro/util/pickle_util.py
@@ -0,0 +1,27 @@
# Copyright (C) 2022 Intel Corporation
#
# SPDX-License-Identifier: MIT

import pickle  # nosec - disable B403:import_pickle check - fixed

import numpy.core.multiarray


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "numpy.core.multiarray" and name in PickleLoader.safe_numpy:
            return getattr(numpy.core.multiarray, name)
        elif module == "numpy" and name in PickleLoader.safe_numpy:
            return getattr(numpy, name)
        raise pickle.UnpicklingError("Global '%s.%s' is forbidden" % (module, name))


class PickleLoader:
    safe_numpy = {
        "dtype",
        "ndarray",
        "_reconstruct",
    }

    def restricted_load(s):
        return RestrictedUnpickler(s, encoding="latin1").load()
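
`RestrictedUnpickler` follows the allow-list pattern of overriding `pickle.Unpickler.find_class`. The sketch below shows the same pattern with the standard library only, so it runs without NumPy. `AllowListUnpickler`, `restricted_loads` and the `SAFE_GLOBALS` set are illustrative names, and `collections.OrderedDict` stands in for the NumPy globals that the real class permits.

```python
import io
import pickle
from collections import OrderedDict

# Illustrative allow-list; the class in the diff allows a few NumPy names instead.
SAFE_GLOBALS = {("collections", "OrderedDict")}


class AllowListUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global (class or function) the stream references.
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError("Global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data: bytes):
    return AllowListUnpickler(io.BytesIO(data)).load()


ok = restricted_loads(pickle.dumps(OrderedDict(a=1)))
print(ok)

try:
    restricted_loads(pickle.dumps(print))  # the stream references builtins.print
except pickle.UnpicklingError as error:
    print("rejected:", error)
```

Plain containers, strings and numbers never go through `find_class`, so they load regardless of the allow-list. Only references to importable objects are checked.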