Skip to content
This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Bugfix for attribute inheritance in ShardedDatasetReader #4830

Merged
merged 6 commits into from
Dec 2, 2020
Merged

Bugfix for attribute inheritance in ShardedDatasetReader #4830

merged 6 commits into from
Dec 2, 2020

Conversation

aleSuglia
Copy link
Contributor

@aleSuglia aleSuglia commented Dec 1, 2020

Implemented automatic parameters inheritance from base_reader in ShardedDatasetReader. This allows the user to specify parameters like lazy or max_instances either for the base_reader or the ShardedDatasetReader itself.

Fixes the issue reported in #4825

Implemented automatic parameters inheritance from base_reader in ShardedDatasetReader.
@aleSuglia aleSuglia changed the title Bugfix for issue https://github.com/allenai/allennlp/issues/4825 Bugfix for attribute inheritance in ShardedDatasetReader Dec 1, 2020
Comment on lines 70 to 73
for attr_name, attr_val in self.reader.__dict__.items():
# copy over only shared attributes between the two classes
if attr_name in self.__dict__:
setattr(self, attr_name, kwargs.get(attr_name, attr_val))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me a little nervous. I'd rather just explicitly set only the attributes we care about (lazy and mabye max_instances). Although, I'm not sure about max_instances...

Setting max_instances in the base_reader but not in the ShardedDatasetReader itself actually has clear and well-defined behavior: at most max_instances will be read from each shard.

Copy link
Contributor Author

@aleSuglia aleSuglia Dec 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That makes a lot of sense. I will make the change. @epwalsh Should I just consider lazy then? Seems the safest solution.

Comment on lines 72 to 76
for attr_name in _INHERITED_DATASET_PARAMS:
attr_val = getattr(self.reader, attr_name)
# copy over only shared attributes between the two classes
if attr_name in self.__dict__:
setattr(self, attr_name, kwargs.get(attr_name, attr_val))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is more complicated than it needs to be. I also think it would be more robust to set lazy on the call to super().__init__(...) on line 46 above, since the base class DatasetReader might have special logic in the __init__ method that depends on the value of lazy (well, that's not the case right now, but it could be in the future).

So something like (right above line 46):

if "lazy" not in kwargs:
    kwargs["lazy"] = base_reader.lazy

Copy link
Member

@epwalsh epwalsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks!

@epwalsh epwalsh merged commit 3e62365 into allenai:master Dec 2, 2020
@epwalsh epwalsh mentioned this pull request Dec 4, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants