Fix merging of stream datasets (openvinotoolkit#1609)

Importing a stream dataset with multiple sources in eager mode, that is,
with `error_policy` or `progress_reporting` specified, raises an error:
```
'_MergedStreamDataset' object has no attribute '_data'
```
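
The failure mode can be reproduced in miniature. The sketch below assumes that eager initialization reads a `_data` attribute that only the base class `__init__` sets; `BaseDataset`, `BrokenMerged`, `init_cache`, and `try_eager_init` are hypothetical stand-ins for illustration, not datumaro's actual classes.

```python
class BaseDataset:
    """Hypothetical stand-in for a dataset base class."""

    def __init__(self, items):
        self._data = list(items)

    def init_cache(self):
        # Eager initialization reads self._data directly.
        return len(self._data)


class BrokenMerged(BaseDataset):
    """Overrides __init__ without calling super().__init__()."""

    def __init__(self, *sources):
        # Only a public attribute is set; self._data never exists.
        self.merged = [item for source in sources for item in source]

    def __iter__(self):
        yield from self.merged


def try_eager_init(dataset):
    """Return the AttributeError message from eager init, or None on success."""
    try:
        dataset.init_cache()
    except AttributeError as exc:
        return str(exc)
    return None


broken = BrokenMerged([1, 2], [3])
message = try_eager_init(broken)
```

Lazy iteration over `broken` still works, which may explain why a problem of this kind surfaces only on the eager path.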

- [x] I have added unit tests to cover my changes.
- [x] I have added integration tests to cover my changes.
- [x] I have added the description of my changes into
[CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [x] I have updated the
[documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs)
accordingly.

- [x] I submit _my code changes_ under the same [MIT
License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE)
that covers the project.
  Feel free to contact the maintainers if that's a concern.
- [x] I have updated the license header for each file (see an example
below).

```python
```

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Ilya Trushkin <[email protected]>
Co-authored-by: williamcorsel <[email protected]>
Co-authored-by: Sooah Lee <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Yunchu Lee <[email protected]>
Co-authored-by: Wonju Lee <[email protected]>
6 people committed Sep 23, 2024
1 parent afd695d commit 757b26d
Showing 3 changed files with 17 additions and 5 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   (<https://github.com/openvinotoolkit/datumaro/pull/1607>)

 ### Bug fixes
+- Fix StreamDataset merging when importing in eager mode
+  (<https://github.com/openvinotoolkit/datumaro/pull/1609>)

 ## Q3 2024 Release 1.9.0
 ### New features
11 changes: 8 additions & 3 deletions src/datumaro/components/dataset.py
@@ -1023,17 +1023,22 @@ class _MergedStreamDataset(cls):
             def __init__(self, *sources: IDataset):
                 from datumaro.components.hl_ops import HLOps

-                self.merged = HLOps.merge(*sources, merge_policy=merge_policy)
+                self._merged = HLOps.merge(*sources, merge_policy=merge_policy)
+                self._data = self._merged._data
+                self._env = env
+                self._format = DEFAULT_FORMAT
+                self._source_path = None
+                self._options = {}

             def __iter__(self):
-                yield from self.merged
+                yield from self._merged

             @property
             def is_stream(self):
                 return True

             def subsets(self) -> Dict[str, DatasetSubset]:
-                return self.merged.subsets()
+                return self._merged.subsets()

         return _MergedStreamDataset(*sources)

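The pattern in this hunk, a wrapper that keeps the merged dataset private and mirrors the attributes inherited methods read, can be sketched in isolation. This is a minimal illustration under assumptions: `BaseDataset`, `MergedDataset`, `init_cache`, and the `"default"` format value are hypothetical names, and the real `HLOps.merge` call is replaced by a plain list concatenation.

```python
class BaseDataset:
    """Hypothetical base class whose methods rely on private attributes."""

    def __init__(self, items):
        self._data = list(items)
        self._format = "default"

    def __iter__(self):
        return iter(self._data)

    def init_cache(self):
        # Inherited method that reads self._data.
        return len(self._data)


class MergedDataset(BaseDataset):
    """Wrapper that skips super().__init__() but mirrors the needed state."""

    def __init__(self, *sources):
        # Stand-in for the real merge operation: concatenate all sources.
        self._merged = BaseDataset(item for source in sources for item in source)
        # Mirror every attribute that inherited methods may touch.
        self._data = self._merged._data
        self._format = "default"

    def __iter__(self):
        yield from self._merged


merged = MergedDataset([1, 2], [3])
cached_count = merged.init_cache()
```

Sharing `_data` by reference keeps the wrapper and the merged dataset in sync. Calling `super().__init__()` would be another way to populate the attributes, but the hunk above assigns them explicitly.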
9 changes: 7 additions & 2 deletions tests/unit/test_imagenet_format.py
@@ -7,6 +7,7 @@
 import pytest

 from datumaro.components.annotation import AnnotationType, Label, LabelCategories
+from datumaro.components.contexts.importer import ImportErrorPolicy
 from datumaro.components.dataset import Dataset, StreamDataset
 from datumaro.components.dataset_base import DatasetItem
 from datumaro.components.environment import Environment
@@ -214,7 +215,9 @@ def _create_expected_dataset(self):
     @pytest.mark.parametrize("dataset_cls, is_stream", [(Dataset, False), (StreamDataset, True)])
     def test_can_import(self, dataset_cls, is_stream, helper_tc):
         expected_dataset = self._create_expected_dataset()
-        dataset = dataset_cls.import_from(self.DUMMY_DATASET_DIR, self.IMPORTER_NAME)
+        dataset = dataset_cls.import_from(
+            self.DUMMY_DATASET_DIR, self.IMPORTER_NAME, error_policy=ImportErrorPolicy()
+        )
         assert dataset.is_stream == is_stream

         compare_datasets(helper_tc, expected_dataset, dataset, require_media=True)
@@ -240,7 +243,9 @@ class ImagenetWithSubsetDirsImporterTest(ImagenetImporterTest):
     @mark_requirement(Requirements.DATUM_GENERAL_REQ)
     @pytest.mark.parametrize("dataset_cls, is_stream", [(Dataset, False), (StreamDataset, True)])
     def test_can_import(self, dataset_cls, is_stream, helper_tc):
-        dataset = dataset_cls.import_from(self.DUMMY_DATASET_DIR, self.IMPORTER_NAME)
+        dataset = dataset_cls.import_from(
+            self.DUMMY_DATASET_DIR, self.IMPORTER_NAME, error_policy=ImportErrorPolicy()
+        )
         assert dataset.is_stream == is_stream

         for subset_name, subset in dataset.subsets().items():
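
Passing `error_policy` in these tests is what routes the import through the eager path. The following sketch shows one way such a switch could work; it assumes eager loading is selected whenever either optional argument is supplied, and `ErrorPolicy`, `LazyDataset`, and this `import_from` function are hypothetical illustrations rather than datumaro's implementation.

```python
class ErrorPolicy:
    """Hypothetical placeholder for an import error policy object."""


class LazyDataset:
    """Loads items on first use unless the cache is initialized up front."""

    def __init__(self, loader):
        self._loader = loader
        self._cache = None

    def init_cache(self):
        if self._cache is None:
            self._cache = list(self._loader())

    @property
    def is_cache_initialized(self):
        return self._cache is not None

    def __iter__(self):
        self.init_cache()
        return iter(self._cache)


def import_from(loader, error_policy=None, progress_reporting=None):
    dataset = LazyDataset(loader)
    # Assumption: supplying either argument forces eager loading, so that
    # errors and progress can be handled while the import is running.
    eager = error_policy is not None or progress_reporting is not None
    if eager:
        dataset.init_cache()
    return dataset


lazy = import_from(lambda: [1, 2, 3])
eager = import_from(lambda: [1, 2, 3], error_policy=ErrorPolicy())
```

Under that assumption, a test that omits both arguments never reaches the eager code path, so a defect confined to that path would go unnoticed.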
