
Fixup all example CI tests and properly fail #517

Merged · 10 commits into main · Jul 15, 2022

Conversation

muellerzr
Collaborator

Fix all example CI

What does this add?

This PR makes a variety of fixes to the example tests, resolving all current failures and ensuring that the CI properly fails when it should.

Why is it needed?

The full list of issues I noticed is below:

  • We cannot mock CometMLTracker to run offline tests with the tracking example script, so the CI now explicitly uninstalls comet_ml
  • Adds a new requires_tracking decorator for the example tests, ensuring at least one tracking API is installed and comet_ml is not
  • Adds a new exception class and a run_command helper that runs a test script in a subprocess, checks its output, and fails outright if the call did not succeed
  • Fixes two bugs in the cross validation example regarding recording the final prediction score
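The run_command helper described above can be sketched roughly as follows. This is a minimal sketch, not the PR's exact code: the exception name SubprocessCallException and the return_stdout parameter are assumptions for illustration.

```python
import subprocess


class SubprocessCallException(Exception):
    """Raised when a subprocess call exits with a non-zero return code."""


def run_command(command, return_stdout=False):
    """Run `command` (a list of strings) in a subprocess.

    Captures stdout and stderr together; if the command fails, re-raise
    as SubprocessCallException with the captured output so the test
    failure is visible instead of being silently swallowed.
    """
    try:
        output = subprocess.check_output(command, stderr=subprocess.STDOUT)
        if return_stdout:
            return output.decode("utf-8")
    except subprocess.CalledProcessError as e:
        raise SubprocessCallException(
            f"Command `{' '.join(command)}` failed with output:\n{e.output.decode('utf-8')}"
        ) from e
```

The key design point is that `subprocess.check_output` raises on a non-zero exit code, so a failing example script can no longer pass silently in CI.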

@muellerzr muellerzr added the bug Something isn't working label Jul 13, 2022
@muellerzr muellerzr requested a review from sgugger July 13, 2022 10:43
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jul 13, 2022

The documentation is not available anymore as the PR was closed or merged.

@sgugger (Collaborator) left a comment


Thanks for all the fixes!

@@ -229,20 +229,14 @@ def training_function(config, args):
with torch.no_grad():
outputs = model(**batch)
predictions = outputs.logits
predictions, references = accelerator.gather((predictions, batch["labels"]))
Collaborator


This will stop working in a distributed setup. The predictions will need to be gathered (though maybe not the labels).
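For context on the reviewer's point: in a multi-process run each worker only holds its own shard of predictions, so something like accelerator.gather must combine them before the metric is computed. A dependency-free toy illustrating that semantics (the gather function here is a hypothetical stand-in; the real Accelerate API operates on tensors across processes):

```python
def gather(per_process_shards):
    """Toy stand-in for a cross-process gather: concatenate the
    prediction shard held by each "process" into one flat list.
    (Hypothetical; accelerator.gather does this for tensors.)"""
    gathered = []
    for shard in per_process_shards:
        gathered.extend(shard)
    return gathered


# Two "processes", each having predicted on half of the eval batch:
shards = [[0, 1, 1], [1, 0]]
all_predictions = gather(shards)
print(all_predictions)  # [0, 1, 1, 1, 0]
```

Without the gather step, each process would score only its own partial view of the evaluation set, which is why dropping it breaks distributed runs.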

Collaborator Author


Should be fixed now

@muellerzr muellerzr requested a review from sgugger July 15, 2022 12:44
@muellerzr
Collaborator Author

All slow CI is now passing here: https://github.com/huggingface/accelerate/runs/7358153325

@muellerzr muellerzr merged commit 7abc708 into main Jul 15, 2022
@muellerzr muellerzr deleted the fix-example-ci branch July 15, 2022 16:15
Labels: bug (Something isn't working)
Projects: none yet
3 participants