-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-129327: revise hashlib documentation to account for FIPS removing sha1 #129334
base: main
Are you sure you want to change the base?
Conversation
Lucky you I have WiFi while travelling. The change looks correct but not when taken as a whole paragraph. We talk about "secure hash" and "message digests", then "insecure hash functions" and then we say that the first two terms are actually interchangeable. For me, but it's maybe because I was taught that, all of them are hash functions that may or may not be secure depending on the time you read the docs and depending on their usage. For instance, md5 and sha1 are still used for file comparisons although they shouldn't be used for cryptographic purposes. So it's a bit strange to say "insecure" when just before we talked about secure hashes and just after we say that they are equivalent to message digests. Now, since SHA-1 has been removed from FIPS 140, we can also add a link to the removal notice (make the "legacy" term a link). |
I wasn't entirely sure how best to phrase this myself, yeah. They are "secure hashes" in the sense that they are hashes designed for security, but they are insecure algorithms, if that somehow makes sense... is it enough to say "legacy algorithms" then, do you think? |
I would leave out the term "insecure" so that users aren't confused with the terms "secure/insecure". Previously, MD5 wasn't categorized as insecure in the text. Similarly, I would remove the term "legacy" which could hint that these hashes are kept for backward compatibility but that's not the case as they are still used in production for other purposes that do not require the same security level. |
Ah! I know why we have secure hash and message digest. It's because of the SHA and the MD family... MD means message digest while SHA means Secure Hash Algorithm. |
Ideally I'd like to remove both terms. It's confusing. Let's just call them hash functions and that's it. No need for being historical IMO. That way we can also add a note saying that SHA1 and MD5 are not cryptographically secure hash functions but can be used for file integrity checks as they are more efficient than secure hash functions. |
…oving sha1 More generally, the current documentation is a bit scattered, talking about what terms are "equal" despite those terms not being very interesting and given the term "secure hash", probably wrong (because md5 and sha1 are not secure anymore). Let's talk about cryptographically secure instead, and note that two of them aren't. And then we can also link to the source for NIST going through the removal process for SHA1.
960d84a
to
67c4294
Compare
This module implements a common interface to many different hash algorithms. | ||
Included are the FIPS secure hash algorithms SHA224, SHA256, SHA384, SHA512, | ||
(defined in `the FIPS 180-4 standard`_), the SHA-3 series (defined in `the FIPS | ||
202 standard`_) as well as the non-cryptographically-secure algorithms SHA1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, "non-cryptographically-secure" may be too verbose. Let just say "as well as SHA1 (formerly...) and RSA's MD5 (..))". If we want to be precise we should rather add a note concerning the weaknesses of sha1/md5 as cryptographically secure hash functions but I don't know if we wouldn't reinvent the wheel instead in this PR.
WDYT @gpshead? do you think it's fine not to talk too much about security/applications in the first paragraph of hashlib?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just go with "legacy".
"""... as well as the legacy algorithms SHA1 (formerly part of ...) and the MD5 algorithm (defined in ...)"""
perhaps. This elides attributing MD5 to RSA as well as it isn't important where MD5 came from, that's what the linked-to RFC is for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with that but I would have used the term "legacy" when talking about something that is only kept for backwards compatibility, and not something that I expect to see in production. Maybe it's more a language issue on my side though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm however happy if we can remove the mention to RSA though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
legacy means a lot of things. It is simpler and more pedantically accurate than saying not-cryptographically-secure.
I'd edit this PR myself with those words but it appears the author unchecked the allow-edits checkbox on it (Github isn't showing me an edit option)?
📚 Documentation preview 📚: https://cpython-previews--129334.org.readthedocs.build/