I have a dataset of sequences, where each example in a sequence is a separate row in the dataset (similar to LeRobotDataset). When running Dataset.save_to_disk, how can I provide the indices at which the dataset may be split, so that no episode spans more than one shard? And when I then run Dataset.load_from_disk, how can I load just a subset of the shards, to save memory and time on different ranks?
I guess an alternative to this would be, given a loaded Dataset, how can I run Dataset.shard such that sharding doesn't split any episode across shards?
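To make the requirement concrete, here is a minimal sketch of the index computation being asked for. It is plain Python and not part of the datasets API. The helper name episode_aligned_shards is hypothetical, and it assumes that each row carries an episode id and that all rows of one episode are stored contiguously.

```python
import bisect
from typing import List, Sequence, Tuple


def episode_aligned_shards(
    episode_ids: Sequence[int], num_shards: int
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) row ranges that never split an episode.

    Hypothetical helper: assumes rows of the same episode are contiguous.
    May return fewer than num_shards ranges when there are too few episodes.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    n = len(episode_ids)
    if n == 0:
        return []

    # Row indices at which a new episode begins (the only legal cut points).
    boundaries = [i for i in range(1, n) if episode_ids[i] != episode_ids[i - 1]]

    cuts = [0]
    for k in range(1, num_shards):
        target = k * n / num_shards  # ideal cut for equally sized shards
        pos = bisect.bisect_left(boundaries, target)
        # Nearest legal cut points on either side of the ideal cut.
        candidates = boundaries[max(pos - 1, 0):pos + 1]
        if not candidates:
            break  # single episode: nothing to cut
        cut = min(candidates, key=lambda b: abs(b - target))
        if cut > cuts[-1]:  # skip cuts that would create an empty shard
            cuts.append(cut)
    cuts.append(n)
    return list(zip(cuts[:-1], cuts[1:]))


if __name__ == "__main__":
    ids = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3]
    print(episode_aligned_shards(ids, 3))  # [(0, 3), (3, 5), (5, 10)]
```

With a loaded Dataset, one possible workaround (an assumption, not a confirmed recommendation from the maintainers) would be to pass each range to Dataset.select(range(start, end)) and call save_to_disk on each result with its own directory, so that every rank calls load_from_disk only on the directories assigned to it. The episode id column name (for example episode_index, as in LeRobot-style datasets) is also an assumption.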