Bugfix for attribute inheritance in ShardedDatasetReader #4830
Implemented automatic parameter inheritance from `base_reader` in `ShardedDatasetReader`.
    for attr_name, attr_val in self.reader.__dict__.items():
        # copy over only shared attributes between the two classes
        if attr_name in self.__dict__:
            setattr(self, attr_name, kwargs.get(attr_name, attr_val))
This makes me a little nervous. I'd rather just explicitly set only the attributes we care about (`lazy` and maybe `max_instances`). Although, I'm not sure about `max_instances`... Setting `max_instances` in the `base_reader` but not in the `ShardedDatasetReader` itself actually has clear and well-defined behavior: at most `max_instances` will be read from each shard.
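To illustrate the per-shard behavior described above, here is a minimal sketch. `ToyReader` and `ToyShardedReader` are hypothetical stand-ins written for this example, not the actual AllenNLP classes; the sketch only assumes that the sharded reader delegates each shard to the base reader, which applies its own cap.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


class ToyReader:
    """Hypothetical stand-in for a base reader (not the AllenNLP class)."""

    def __init__(self, max_instances: Optional[int] = None) -> None:
        self.max_instances = max_instances

    def read(self, shard: Iterable[str]) -> Iterator[str]:
        # islice with a stop of None yields everything, so None means "no cap".
        return islice(iter(shard), self.max_instances)


class ToyShardedReader:
    """Hypothetical stand-in that delegates each shard to the base reader."""

    def __init__(self, base_reader: ToyReader) -> None:
        self.reader = base_reader

    def read(self, shards: List[List[str]]) -> Iterator[str]:
        for shard in shards:
            # The base reader's cap is applied independently to every shard.
            yield from self.reader.read(shard)


shards = [["a1", "a2", "a3"], ["b1", "b2", "b3"]]
capped = list(ToyShardedReader(ToyReader(max_instances=2)).read(shards))
print(capped)
```

With the cap set only on the base reader, each of the two shards contributes at most two items, so copying the same value onto the sharded reader as a global limit would change this behavior.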
I see. That makes a lot of sense. I will make the change. @epwalsh Should I just consider `lazy` then? Seems the safest solution.
    for attr_name in _INHERITED_DATASET_PARAMS:
        attr_val = getattr(self.reader, attr_name)
        # copy over only shared attributes between the two classes
        if attr_name in self.__dict__:
            setattr(self, attr_name, kwargs.get(attr_name, attr_val))
I think this is more complicated than it needs to be. I also think it would be more robust to set `lazy` on the call to `super().__init__(...)` on line 46 above, since the base class `DatasetReader` might have special logic in the `__init__` method that depends on the value of `lazy` (well, that's not the case right now, but it could be in the future).
So something like (right above line 46):

    if "lazy" not in kwargs:
        kwargs["lazy"] = base_reader.lazy
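In context, the suggestion above might look like the following sketch. `ToyBase` and `ToySharded` are hypothetical stand-ins for `DatasetReader` and `ShardedDatasetReader`; the real constructors take more arguments, which are omitted here.

```python
from typing import Any


class ToyBase:
    """Hypothetical stand-in for the base class; only `lazy` is modeled."""

    def __init__(self, lazy: bool = False) -> None:
        # Any future logic that depends on `lazy` would live here and would
        # see the final value, because it arrives through __init__.
        self.lazy = lazy


class ToySharded(ToyBase):
    """Hypothetical stand-in for the sharded reader."""

    def __init__(self, base_reader: ToyBase, **kwargs: Any) -> None:
        # Fall back to the base reader's setting only when the caller
        # did not pass `lazy` explicitly.
        if "lazy" not in kwargs:
            kwargs["lazy"] = base_reader.lazy
        super().__init__(**kwargs)
        self.reader = base_reader


inherited = ToySharded(ToyBase(lazy=True))
overridden = ToySharded(ToyBase(lazy=True), lazy=False)
print(inherited.lazy, overridden.lazy)
```

An explicit `lazy` passed to the sharded reader still wins, and nothing is assigned after construction, so the parent initializer never sees a stale value.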
Looks great! Thanks!
Implemented automatic parameter inheritance from `base_reader` in `ShardedDatasetReader`. This allows the user to specify parameters like `lazy` or `max_instances` either for the `base_reader` or the `ShardedDatasetReader` itself.
Fixes the issue reported in #4825.