add a way to read data files from local path #451
Conversation
Hi,
Thanks for the PR!
I have a few comments, and I think there is a simpler way of addressing this.
Here is what I propose: I don't think there is much need for the preprocess function either, and the loading of the labels / targets could happen at every dataset initialization. It should be very fast, as we are now using numpy's frombuffer (which was not the case when we first implemented this function); see #334 for details.
What do you think?
```python
if download:
    self.download()
elif self.from_local:
```
```diff
@@ -36,15 +36,20 @@ class MNIST(data.Dataset):
     training_file = 'training.pt'
     test_file = 'test.pt'

-    def __init__(self, root, train=True, transform=None, target_transform=None, download=False):
+    def __init__(self, root, train=True, transform=None, target_transform=None,
+                 download=False, from_local=False):
```
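Based on this diff, the constructor branch might behave roughly as sketched below. This is a hypothetical standalone sketch of the proposed control flow, not the merged torchvision code; the class name, `raw_files` list, and error message are invented for illustration.

```python
import os

class MNISTLike:
    # Illustrative subset of the raw MNIST file names expected under `root`.
    raw_files = ['train-images-idx3-ubyte', 'train-labels-idx1-ubyte']

    def __init__(self, root, download=False, from_local=False):
        self.root = root
        if download:
            self.download()
        elif from_local:
            # With from_local=True, skip the network entirely and just
            # verify the files the user placed under `root` are present.
            missing = [f for f in self.raw_files
                       if not os.path.exists(os.path.join(root, f))]
            if missing:
                raise RuntimeError('Missing local files: %s' % missing)

    def download(self):
        raise NotImplementedError  # network fetch elided in this sketch
```

A user who downloaded the files by hand (e.g. in a browser) would then construct the dataset with `from_local=True` and `root` pointing at the directory holding the raw files.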
Yesterday, on my Linux machine, I found that the example could not download the MNIST data, even though my network connection was fine. So I downloaded the MNIST data with Chrome and then modified mnist.py to train the example, which cost me some time.
So I want to provide a way to read the MNIST data from the path given by the root parameter, to help anyone who runs into the same problem.
By the way, this is my first pull request on GitHub; I hope there are no problems. I love using PyTorch, and I hope it keeps getting better and better.
God bless us!