Update examples to show how to deal with extra validation copies #319

Merged (8 commits) on Apr 20, 2022

muellerzr
Collaborator

Update examples to show how to truncate the validation set for metrics

What does this add?

Based on this issue, this PR updates all examples to show how to drop the extra samples that get added to the validation set when evaluating during distributed training.

Testing on a multi-GPU system will happen tomorrow, but @sgugger, I'm pretty sure the way I have it set up ensures that this only runs when we have a distributed system, and that's where this problem arises?
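For readers skimming the thread, here is a rough sketch of the truncation idea in plain Python. It is not the code from this PR. It assumes the validation set is padded by wrapping around to its first samples so that every process receives full batches, and that the extras therefore land at the tail of the last gathered batch. The helper names `simulate_gathered_batches` and `drop_padded_samples` are made up for illustration; in the real scripts the gathered values would come from `accelerator.gather`.

```python
def simulate_gathered_batches(dataset, batch_size, num_processes):
    """Mimic what gathering across processes might return at each step.

    Assumption: the dataset is padded by wrapping around to its start so
    that its length is a multiple of batch_size * num_processes.
    """
    per_step = batch_size * num_processes
    padded = list(dataset)
    remainder = len(padded) % per_step
    if remainder:
        needed = per_step - remainder
        padded += [dataset[i % len(dataset)] for i in range(needed)]
    return [padded[i:i + per_step] for i in range(0, len(padded), per_step)]


def drop_padded_samples(gathered_batches, dataset_len):
    """Keep only the first dataset_len samples across all gathered batches."""
    samples_seen = 0
    kept = []
    last_step = len(gathered_batches) - 1
    for step, batch in enumerate(gathered_batches):
        if step == last_step:
            # Only the final batch can contain padding, so trim its tail.
            batch = batch[: dataset_len - samples_seen]
        else:
            samples_seen += len(batch)
        kept.extend(batch)
    return kept


labels = [0, 1, 1, 0, 1, 0, 1, 1, 0, 1]  # 10 validation samples
batches = simulate_gathered_batches(labels, batch_size=2, num_processes=2)
kept = drop_padded_samples(batches, len(labels))

assert sum(len(b) for b in batches) == 12  # 2 padded samples were added
assert kept == labels                      # truncation removes exactly those
```

On a single process nothing is padded, so the truncation is a no-op there; guarding it behind a distributed check only saves the bookkeeping.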

Who is it for?

Should close #287

Why is it needed?

It's unclear from the scripts how to alleviate this behavior, and it's not documented anywhere. With this PR, it now is.

@muellerzr added the enhancement and documentation labels on Apr 19, 2022
@muellerzr requested a review from sgugger on April 19, 2022 18:29
@muellerzr
Collaborator Author

Note: This is just an initial pass to make sure the format and whatnot look right; all the other examples will then follow suit :)

Collaborator

@sgugger sgugger left a comment

Thanks for the PR! I'd just put this in a specific feature example instead of the base one.

@muellerzr requested a review from sgugger on April 20, 2022 16:31
Collaborator

@sgugger sgugger left a comment

Thanks! Not sure what your question is about tests. To test this, we would need to know the exact value of the metric in advance and make sure we get it again, but that's very finicky since the metric computed in the base script is roughly the same.
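To make the testing concern concrete, here is a toy sketch, not a test from the repository. It uses a hand-built set of predictions whose accuracy is known in advance and compares the metric with and without the padded duplicates. The data and the `accuracy` helper are invented for illustration, and the padding is assumed to repeat the first samples of the set.

```python
def accuracy(predictions, references):
    """Fraction of predictions that match their references."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


references = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
predictions = [1, 1, 1, 1, 1, 1, 0, 0, 1, 0]  # 8 of 10 correct
expected = accuracy(predictions, references)

# Assumed padding: two duplicates of the first samples appended at the end.
padded_predictions = predictions + predictions[:2]
padded_references = references + references[:2]

inflated = accuracy(padded_predictions, padded_references)
truncated = accuracy(
    padded_predictions[: len(references)],
    padded_references[: len(references)],
)

assert truncated == expected  # truncation recovers the known value
assert inflated != expected   # the duplicates shift the metric slightly
```

With only a couple of duplicates in a realistic validation set, the shift is far smaller than in this toy case, which is why an exact-value check on the base script would be brittle.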

examples/by_feature/multi_node_metrics.py (outdated, resolved)
examples/by_feature/multi_node_metrics.py (outdated, resolved)
@muellerzr merged commit fa476d0 into main on Apr 20, 2022
@muellerzr deleted the fix-validation-examples branch on April 20, 2022 18:03
Development

Successfully merging this pull request may close these issues.

Stuck forever in accelerator.backward without any logs
3 participants