Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented check_disk_space function #1590

Merged

Conversation

martinbrose
Copy link
Contributor

@martinbrose martinbrose commented Aug 12, 2023

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Aug 16, 2023

The documentation is not available anymore as the PR was closed or merged.

Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @martinbrose thanks again for another PR! Your approach here is good. I focused the review and comments on how the logic could be simplified by having a helper doing only 1 check but being called multiple times. This simplifies the overall logic by reducing the number of if ... is not None: statements. Let me know what you think about it :)

src/huggingface_hub/file_download.py Outdated Show resolved Hide resolved
src/huggingface_hub/file_download.py Outdated Show resolved Hide resolved
src/huggingface_hub/file_download.py Outdated Show resolved Hide resolved
src/huggingface_hub/file_download.py Outdated Show resolved Hide resolved
src/huggingface_hub/file_download.py Outdated Show resolved Hide resolved
src/huggingface_hub/file_download.py Outdated Show resolved Hide resolved
@martinbrose
Copy link
Contributor Author

@Wauplin many thanks for your detailed review comments. I'll attend to them soon.

What I realised is that I haven't added any test cases to test_file_download.py...
Would you have pointers for creating test file storage of a specific size in the testing environment?
So I could check for cases where enough space is available and where there is not?

Would you also want to have these helpers added to __init__.py?

@Wauplin
Copy link
Contributor

Wauplin commented Aug 17, 2023

Good questions!

Would you have pointers for creating test file storage of a specific size in the testing environment?
So I could check for cases where enough space is available and where there is not?

What I would do is to test the helper individually and not the end-to-end process of downloading a file.

  1. I would define a new class TestDiskUsageWarning(unittest.TestCase) in test_file_download.py
  2. I would 1 test where the free disk space is ok, 1 test where the free disk space is too low
  3. Instead of artificially reducing the disk space to make the tests work, I would mock the shutil.disk_usage method.

To mock functions, you can use unittest.mock.patch. Instead of running disk_usage, the test will simply use the mock value. This is done for example here. In your case, you will decorate your test with @patch("huggingface_hub.file_download.shutil.disk_usage"). Then the test takes as input a Mock object on which you can set a specific return value (for instance: mock.return_value.free = 1024 * 1024 to simulate the fact that there is 1MB left on disk). When you call _check_disk_space afterwards, it will use this mocked value. Hope these explanations are clear enough. You can also look at the Python docs and check some examples in huggingface_hub.

Would you also want to have these helpers added to init.py?

No need for that! It's a small private helper so it's better to not expose it :)

@martinbrose
Copy link
Contributor Author

Hi @Wauplin,

I believe I have addressed all your review points.

The only remaining point from my side:
When I run make quality, I receive the following messages:

src/huggingface_hub/file_download.py:977: error: Unsupported operand types for >= ("int" and "None")  [operator]
src/huggingface_hub/file_download.py:977: note: Right operand is of type "int | None"
src/huggingface_hub/file_download.py:981: error: Unsupported operand types for / ("None" and "float")  [operator]
src/huggingface_hub/file_download.py:981: note: Left operand is of type "int | None"
src/huggingface_hub/file_download.py:986: error: No return value expected  [return-value]

Is this something I need to address?

Thank you!

@martinbrose
Copy link
Contributor Author

Oh, actually another question:
I did add a third test case when free disk space is exactly the size of the expected file size.
Do you think that adds value or not?

Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @martinbrose !
You mypy issues can be solved quite easily by changing the expected input type.
About the test case when you have exactly the good amount of free space, I don't think it's worth it. TBH on a normal computer there are 0 chances that you'll use exactly the 100% of your hard-drive. And even if that's the case your computer will have other issues since it's always writing/reading the disk. So let's not overthink this :)

src/huggingface_hub/file_download.py Outdated Show resolved Hide resolved
src/huggingface_hub/file_download.py Outdated Show resolved Hide resolved
src/huggingface_hub/file_download.py Outdated Show resolved Hide resolved
tests/test_file_download.py Outdated Show resolved Hide resolved
tests/test_file_download.py Outdated Show resolved Hide resolved
@martinbrose
Copy link
Contributor Author

Addressed nearly all of the review points.

I need to read up on testing for warnings with self.assertWarns and address the outstanding right afterward.

@martinbrose
Copy link
Contributor Author

I suspect we don't really have a warning with logger.warning? I might need to provide a mock logger and check that one as part of the test.

Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work again! Looks good to merge :)

@Wauplin Wauplin merged commit 518c74b into huggingface:main Aug 28, 2023
@martinbrose martinbrose deleted the issue1551_CheckDiskUsageBeforeDownload branch August 30, 2023 12:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants