DataModules: pass kwargs directly to datasets #730
@@ -18,8 +18,8 @@ experiment:
num_filters: 256
ignore_index: 0
datamodule:
root_dir: "data/oscd"
batch_size: 32
The OSCD data module has never had a `batch_size` parameter. The correct name is `train_batch_size`. This parameter was being ignored since we never actually used `kwargs` before.
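For illustration only, a minimal sketch of why a misspelled key like this can go unnoticed. The classes below (`ToyDataset`, `OldStyleDataModule`, `ForwardingDataModule`) are hypothetical stand-ins, not the actual torchgeo code; they only assume that the data module accepts `**kwargs` and either drops them or forwards them to the dataset.

```python
from typing import Any


class ToyDataset:
    """Hypothetical stand-in for a dataset; not a torchgeo class."""

    def __init__(self, root: str = "data") -> None:
        self.root = root


class OldStyleDataModule:
    """Accepts **kwargs but never uses them, so wrong names pass silently."""

    def __init__(self, train_batch_size: int = 32, **kwargs: Any) -> None:
        self.train_batch_size = train_batch_size
        # kwargs is dropped here: a wrong name such as batch_size is lost.


class ForwardingDataModule:
    """Forwards **kwargs to the dataset, so unknown names raise TypeError."""

    def __init__(self, train_batch_size: int = 32, **kwargs: Any) -> None:
        self.train_batch_size = train_batch_size
        self.kwargs = kwargs

    def setup(self) -> None:
        # Unknown keys surface here instead of being ignored.
        self.dataset = ToyDataset(**self.kwargs)


old = OldStyleDataModule(batch_size=64)  # no error, default is kept
new = ForwardingDataModule(batch_size=64)
try:
    new.setup()
    forwarding_failed = False
except TypeError:
    forwarding_failed = True
```

With forwarding in place, the wrong key fails loudly at dataset construction rather than leaving the default batch size in effect.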
  batch_size: int = 64,
  num_workers: int = 0,
- bands: str = "rgb",
+ band_set: str = "rgb",
Renamed from `bands` to `band_set` to match `SEN12MSDataModule` and to remove ambiguity between `So2SatDataModule` and `So2Sat`, where `bands` is a list of band names.
I couldn't find any issues. LGTM, this one was much needed.
Will give @calebrob6 a chance to review, no rush on this PR since it's going in 0.4.0.
Agree that this was much needed. I just don't like that it isn't obvious from the DataModule constructors that you need to pass a `root` parameter. Since every dataset will need a `root` parameter, can't we just keep it in?
In theory, we could someday have datasets without a root parameter (streaming data from Azure/AWS for example). But yes, all of our current datasets have one.
We could keep it in and forward it like we used to. Or we could properly document the available kwargs, or the fact that kwargs will be forwarded. Or both.
Properly documenting the available kwargs seems best to me.
@calebrob6 check out the docs for the latest version and see if that's better. The alternative would be to make a formal doc section like:

    Initialize a LightningDataModule for BigEarthNet based DataLoaders.

    Args:
        batch_size: The batch size to use in all created DataLoaders
        num_workers: The number of workers to use in all created DataLoaders

    Keyword Args:
        root: root directory where dataset can be found
        split: train/val/test split to load
        bands: load Sentinel-1 bands, Sentinel-2, or both. one of {s1, s2, all}
        num_classes: number of classes to load in target. one of {19, 43}
        transforms: a function/transform that takes input sample and its target as
            entry and returns a transformed version
        download: if True, download dataset and store it in the root directory
        checksum: if True, check the MD5 of the downloaded files (may be slow)

but that ends up with a lot of duplication and becomes harder to maintain.
Closing/reopening to try to kick off CLA bot.
@microsoft-github-policy-service agree
This looks like a good compromise on documentation. What if we specially check for "root" in kwargs and raise a helpful error message if it isn't included? My worry is that if a user doesn't add
(or warning)
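A rough sketch of what such a check could look like, using the warning variant. The helper name `warn_if_root_missing` and the message text are hypothetical and not part of this PR or of torchgeo; the sketch only assumes that the data module holds the forwarded keyword arguments in a mapping.

```python
import warnings
from typing import Any, Mapping


def warn_if_root_missing(kwargs: Mapping[str, Any]) -> bool:
    """Hypothetical helper: warn when no explicit root was passed.

    Returns True if a warning was issued, False otherwise.
    """
    if "root" not in kwargs:
        warnings.warn(
            "No 'root' keyword argument was passed; the dataset's default "
            "root directory will be used.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

A warning rather than an error keeps datasets with a usable default root working while still pointing the user at the missing argument.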
The stack trace will tell them the full path the code took to raise that error (not necessarily clear to a non-programmer though).
It only needs to be a message on the datasets that do have root, right?
On Fri, Sep 30, 2022, 10:53 AM Adam J. Stewart wrote:
> The stack trace will tell them the full path the code took to raise that error (not necessarily clear to a non-programmer though). `root` isn't required, all datasets have a default `root`. If anyone ever implements #660 the error message will be a bit more helpful for datasets that don't already explain what `root` is set to.
At the moment this is every dataset.
* Datamodules: pass kwargs directly to datasets
* Rename root_dir -> root in config files
* Fix datamodule tests
* Fix mypy
* Fix tutorial
* Specify all kwarg keys
* Fix bands vs. band_set
* root_dir -> root
* Document **kwargs
Closes #666
Pros
Cons
* `root_dir` has been renamed to `root` for consistency
* Can no longer pass `root_dir` as the first positional arg, must be a keyword arg

To elaborate, the following are no longer valid:
Instead, users will need to use:
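As a purely illustrative sketch of the positional versus keyword distinction, consider a hypothetical `ToyDataModule` (not a torchgeo class) whose signature assumes the `batch_size`, `num_workers`, `**kwargs` layout discussed in this thread. The path `"data/bigearthnet"` is likewise made up for the example.

```python
from typing import Any


class ToyDataModule:
    """Hypothetical data module: explicit loader args plus forwarded kwargs."""

    def __init__(
        self, batch_size: int = 64, num_workers: int = 0, **kwargs: Any
    ) -> None:
        self.batch_size = batch_size
        self.num_workers = num_workers
        # Everything else would be forwarded to the dataset constructor.
        self.kwargs = kwargs


# Positional: the path binds to batch_size, which is not what was intended.
positional = ToyDataModule("data/bigearthnet")

# Keyword: the path lands in kwargs under the name root.
keyword = ToyDataModule(root="data/bigearthnet")
```

Under this assumed signature, the first positional slot belongs to `batch_size`, so the root directory only reaches the dataset when it is passed by name.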