Omniglot Dataset #323
I am particularly concerned as to how to implement the
I precompute the random set of pairs with an added argument of
Thanks for the PR - I've left some inline comments
.gitignore
Outdated
docs/build
.idea/
torchvision/datasets/omniglot.py
Outdated
    print('Files already downloaded and verified')
    return

for fzip in self.zips_md5:
torchvision/datasets/omniglot.py
Outdated
' You can use download=True to download it')

self.target_folder = os.path.join(self.root, self._get_target_folder())
self.alphabets_ = list_dir(self.target_folder)
torchvision/datasets/omniglot.py
Outdated
['images_background', '68d2efa1b9178cc56df9314c21c6e718'],
['images_evaluation', '6b91aef0f799c5bb55b94e3f2daec811'],
# Provision in future
# ['images_background_small1', 'e704a628b5459e08445c13499850abc4'],
torchvision/datasets/omniglot.py
Outdated
self.target_folder = os.path.join(self.root, self._get_target_folder())
self.alphabets_ = list_dir(self.target_folder)
self.characters_ = list(
torchvision/datasets/omniglot.py
Outdated
self.target_folder = os.path.join(self.root, self._get_target_folder())
self.alphabets_ = list_dir(self.target_folder)
self.characters_ = list(
    reduce(
torchvision/datasets/omniglot.py
Outdated
    ]
    for idx, character in enumerate(self.characters_)
]
self.flat_character_images_ = list(reduce(lambda x, y: x + y, self.character_images_))
torchvision/datasets/omniglot.py
Outdated
)
self.character_images_ = [
    [
        tuple([image, idx])
torchvision/datasets/omniglot.py
Outdated
        return 'images_background' if self.background is True else 'images_evaluation'


class OmniglotRandomPair(Omniglot):
Thanks for the updates, I've left some more comments inline, mostly around the OmniglotRandomPair dataset
torchvision/datasets/omniglot.py
Outdated
            zip_file.extractall(self.root)

    def _get_target_folder(self):
        return 'images_background' if self.background is True else 'images_evaluation'
torchvision/datasets/omniglot.py
Outdated
self.target_folder = os.path.join(self.root, self._get_target_folder())
self._alphabets = list_dir(self.target_folder)
self._characters = sum(
torchvision/datasets/omniglot.py
Outdated
def __init__(self, root, pair_count=10000, background=True,
             transform=None, target_transform=None,
             download=False):
    super(self.__class__, self).__init__(root, background=background,
torchvision/datasets/omniglot.py
Outdated
    ],
    []
)
self._character_images = [
torchvision/datasets/omniglot.py
Outdated
        return 'images_background' if self.background is True else 'images_evaluation'


class OmniglotRandomPair(Omniglot):
Hey @alykhantejani, do you mind checking some updates to the randomized pair generation?
@alykhantejani Writing the comments on the approach here because, oddly, I couldn't find them anywhere after I had responded to your review earlier.
I believe this way, it is certainly harder to arrive at a collision when
The problem with this approach would be that I am not staying true to what I wanted to achieve with the class. The total number of combinatorially possible pairs is huge, and that is why I introduced a
Hey @activatedgeek, sorry for the late response. Pinging @fmassa in case he has any opinions/thoughts on the random pair dataset.
Hi @activatedgeek, sorry for the delay in replying. So, my first thought about the

We introduce a new

    class MultiDataset(object):
        def __init__(self, dataset, num_outputs=1, transforms=None):
            self.dataset = dataset
            self.num_outputs = num_outputs
            self.transforms = transforms

        def __getitem__(self, idx):
            # here comes the logic to convert a 1d index into
            # self.num_outputs indices, each in range(len(self.dataset))
            individual_idx = []
            for i in range(self.num_outputs):
                individual_idx.append(idx % len(self.dataset))
                idx = idx // len(self.dataset)
            result = []
            # iterate over the decomposed indices; idx itself is an int by now
            for i in reversed(individual_idx):
                result.append(self.dataset[i])
            if self.transforms is not None:
                result = self.transforms(result)
            return result

        def __len__(self):
            # every ordered combination of num_outputs elements is addressable
            return len(self.dataset) ** self.num_outputs

This way, you generate on the fly an arbitrarily large dataset that can accommodate pairs/triplets/etc. of elements of the same dataset. Plus, the logic of how to combine the different targets of each dataset becomes something that the user should do (via the transforms in the

This is just a rough idea, but let me know what you think.
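Editor's note: the index arithmetic behind the `MultiDataset` idea can be checked in isolation. The sketch below is only an illustration, not code from this PR; `split_index` and the toy `letters` list are hypothetical names. It treats the flat index as a number written in base `len(dataset)` and reads off one digit per output element.

```python
def split_index(idx, base, num_outputs):
    """Decompose a flat index into `num_outputs` digits in the given base."""
    digits = []
    for _ in range(num_outputs):
        digits.append(idx % base)  # least significant digit first
        idx //= base
    # Reverse so the last element of each tuple varies fastest.
    return list(reversed(digits))


# Toy stand-in for a dataset: three elements, enumerated as ordered pairs.
letters = ["a", "b", "c"]
pairs = [
    [letters[i] for i in split_index(k, len(letters), 2)]
    for k in range(len(letters) ** 2)
]
```

With three elements and two outputs, the nine flat indices cover every ordered pair exactly once, which is why the proposed `__len__` is `len(dataset) ** num_outputs`.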
@fmassa That is a great idea. I was in fact wondering the same because I recently came across an obvious requirement of a similar kind for the ImageNet/Mini-ImageNet datasets as well. It didn't feel right to create custom rules every now and then. So here is what I will do - the I will take up a Does that sound good?
Sounds good! Also, given how generic this dataset is, it might be worth considering sending it to pytorch/tnt, but we can discuss that later.
@fmassa @alykhantejani Can you please verify if everything is in order now?
Any updates here?
Hi, I am wondering what the holdup is. Is there something further that I could help with? This PR has been up for quite a while, and I would really appreciate it if we could make some progress here (or close it if it doesn't align well).
@alykhantejani @fmassa Any possibility of this getting merged?
@activatedgeek sorry for the delay, I'll have a look at it today.
This looks very good, thanks! Once the merge conflicts are addressed, I think this can be merged.
Hey @alykhantejani @fmassa, can we merge this? I can continue the discussion on #338 after this.
Pinging @fmassa and @alykhantejani here again. Can we please merge this?
Awesome! Thanks a lot @activatedgeek!
This is the loader for the Omniglot dataset.
One of the use cases of this dataset is for One-Shot Learning where we sample a pair from the dataset and train a Neural Network to learn the similarity metric between the pair.
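Editor's note: a minimal sketch of the pair-sampling use case, assuming the wrapped dataset returns `(sample, label)` tuples. `RandomPairs`, `seed`, and the `toy` list are hypothetical and not part of this PR; `pair_count` only echoes the constructor argument visible in the diff. The real `OmniglotRandomPair` class may work differently.

```python
import random


class RandomPairs(object):
    """Hypothetical wrapper yielding (sample_a, sample_b, same_class) triples."""

    def __init__(self, dataset, pair_count=1000, seed=0):
        self.dataset = dataset
        rng = random.Random(seed)  # seeded so the pairs are reproducible
        n = len(dataset)
        # Precompute the index pairs once; every epoch then sees the same pairs.
        self.pairs = [(rng.randrange(n), rng.randrange(n))
                      for _ in range(pair_count)]

    def __getitem__(self, idx):
        i, j = self.pairs[idx]
        (sample_a, label_a), (sample_b, label_b) = self.dataset[i], self.dataset[j]
        # Target is 1 when both samples share a class, else 0.
        return sample_a, sample_b, int(label_a == label_b)

    def __len__(self):
        return len(self.pairs)


# Toy stand-in for an image dataset: (sample, label) tuples.
toy = [("img0", 0), ("img1", 0), ("img2", 1), ("img3", 1)]
pair_dataset = RandomPairs(toy, pair_count=8, seed=0)
```

The binary target is what a similarity network would be trained against. Uniform sampling like this tends to produce mostly different-class pairs on a dataset with many classes, so a real implementation might balance the two cases.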
P.S.: It is amazing how simple it is to write data loaders!