Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Corrected Throughput measure for GaudiDDPMPipeline #1460

Merged
merged 5 commits into from
Nov 21, 2024

Conversation

deepak-gowda-narayana
Copy link
Contributor

@deepak-gowda-narayana deepak-gowda-narayana commented Oct 28, 2024

What does this PR do?

The test GaudiDDPMPipelineTester::test_no_throughput_regression_bf16() fails with throughput comparison, but throughput was being measured inversely as sec/sample instead of samples /sec.
This PR does the follow

  • Corrected throughput measure formula for GaudiDDPMPipeline
  • Updated the Gaudi benchmark values for diffusion test test_no_throughput_regression_bf16()

Below is the summary of slow_diffusion tests

python -m pytest tests/test_diffusers.py -v -s -k "test_textual_inversion"
============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0 -- /usr/bin/python
cachedir: .pytest_cache
rootdir: /mnt/oh_pr
configfile: setup.cfg
collecting ... collected 159 items / 158 deselected / 1 selected
========== 1 passed, 158 deselected, 5 warnings in 160.91s (0:02:40) ===========

python -m pytest tests/test_diffusers.py -v -s -k "test_train_text_to_image_"
============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0 -- /usr/bin/python
cachedir: .pytest_cache
rootdir: /mnt/oh_pr
configfile: setup.cfg
collecting ... collected 159 items / 157 deselected / 2 selected
========== 2 passed, 157 deselected, 3 warnings in 486.10s (0:08:06) ===========

python -m pytest tests/test_diffusers.py -v -s -k "test_train_controlnet"
============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0 -- /usr/bin/python
cachedir: .pytest_cache
rootdir: /mnt/oh_pr
configfile: setup.cfg
collecting ... collected 159 items / 157 deselected / 2 selected
========== 2 passed, 157 deselected, 5 warnings in 201.56s (0:03:21) ===========

python -m pytest tests/test_diffusers.py -v -s -k "test_deterministic_image_generation"
============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0 -- /usr/bin/python
cachedir: .pytest_cache
rootdir: /mnt/oh_pr
configfile: setup.cfg
collecting ... collected 159 items / 157 deselected / 2 selected
========== 2 passed, 157 deselected, 6 warnings in 276.77s (0:04:36) ===========

python -m pytest tests/test_diffusers.py -v -s -k "test_no_"
============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0 -- /usr/bin/python
cachedir: .pytest_cache
rootdir: /mnt/oh_pr
configfile: setup.cfg
collecting ... collected 159 items / 153 deselected / 6 selected
===== 5 passed, 1 skipped, 153 deselected, 6 warnings in 416.68s (0:06:56) =====

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@deepak-gowda-narayana
Copy link
Contributor Author

@jiminha @emascarenhas
Request to review PR for feedback

@yafshar
Copy link
Contributor

yafshar commented Oct 28, 2024

@deepak-gowda-narayana nice catch! You are right throughput typically measures the number of units processed per unit of time and here it is inverse of that!

Would you please post the tests and failures before this PR?

@deepak-gowda-narayana
Copy link
Contributor Author

deepak-gowda-narayana commented Oct 28, 2024

@deepak-gowda-narayana nice catch! You are right throughput typically measures the number of units processed per unit of time and here it is inverse of that!

Would you please post the tests and failures before this PR?

@yafshar

Sure, here is the previous test summary without the fix:

Result Summary of fast_tests.sh

python -m pytest tests/test_gaudi_configuration.py tests/test_trainer_distributed.py tests/test_trainer.py tests/test_trainer_seq2seq.py
=============================================================================================== 81 passed, 8 skipped, 39 warnings in 109.40s (0:01:49) ================================================================================================

Result Summary of fast_tests_diffusers.sh

python -m pytest tests/test_diffusers.py
================================================================================================= 113 passed, 47 skipped, 280 warnings in 1229.45s (0:20:29) =================================================================================================

Result Summary of slow_tests_diffusers.sh
================================================================================================================== short test summary info ===================================================================================================================
FAILED tests/test_diffusers.py::GaudiDDPMPipelineTester::test_no_throughput_regression_bf16 - AssertionError: 6.937816172838211 not greater than or equal to 7.287651444971561
======================================================================================= 1 failed, 4 passed, 1 skipped, 154 deselected, 6 warnings in 358.13s (0:05:58) =======================================================================================
make: *** [Makefile:104: slow_tests_diffusers] Error 1

@jiminha jiminha added the run-test Run CI for PRs from external contributors label Oct 30, 2024
Copy link
Contributor

@yafshar yafshar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@deepak-gowda-narayana
Copy link
Contributor Author

@regisss @yafshar PR Updated - removed absolute import path

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@regisss regisss merged commit 3cdb0eb into huggingface:main Nov 21, 2024
4 checks passed
Luca-Calabria pushed a commit to Luca-Calabria/optimum-habana that referenced this pull request Nov 25, 2024
HolyFalafel pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Nov 26, 2024
Liangyx2 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jan 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run-test Run CI for PRs from external contributors
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants