Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-129327: revise hashlib documentation to account for FIPS removing sha1 #129334

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

eli-schwartz
Copy link
Contributor

@eli-schwartz eli-schwartz commented Jan 27, 2025

@picnixz
Copy link
Member

picnixz commented Jan 27, 2025

Lucky you I have WiFi while travelling. The change looks correct but not when taken as a whole paragraph. We talk about "secure hash" and "message digests", then "insecure hash functions" and then we say that the first two terms are actually interchangeable.

For me, but it's maybe because I was taught that, all of them are hash functions that may or may not be secure depending on the time you read the docs and depending on their usage. For instance, md5 and sha1 are still used for file comparisons although they shouldn't be used for cryptographic purposes. So it's a bit strange to say "insecure" when just before we talked about secure hashes and just after we say that they are equivalent to message digests.

Now, since SHA-1 has been removed from FIPS 140, we can also add a link to the removal notice (make the "legacy" term a link).

@eli-schwartz
Copy link
Contributor Author

The change looks correct but not when taken as a whole paragraph. We talk about "secure hash" and "message digests", then "insecure hash functions" and then we say that the first two terms are actually interchangeable.

I wasn't entirely sure how best to phrase this myself, yeah.

They are "secure hashes" in the sense that they are hashes designed for security, but they are insecure algorithms, if that somehow makes sense... is it enough to say "legacy algorithms" then, do you think?

@picnixz
Copy link
Member

picnixz commented Jan 27, 2025

I would leave out the term "insecure" so that users aren't confused with the terms "secure/insecure". Previously, MD5 wasn't categorized as insecure in the text. Similarly, I would remove the term "legacy" which could hint that these hashes are kept for backward compatibility but that's not the case as they are still used in production for other purposes that do not require the same security level.

@picnixz
Copy link
Member

picnixz commented Jan 27, 2025

Ah! I know why we have secure hash and message digest. It's because of the SHA and the MD family... MD means message digest while SHA means Secure Hash Algorithm.

@picnixz
Copy link
Member

picnixz commented Jan 27, 2025

Ideally I'd like to remove both terms. It's confusing. Let's just call them hash functions and that's it. No need for being historical IMO. That way we can also add a note saying that SHA1 and MD5 are not cryptographically secure hash functions but can be used for file integrity checks as they are more efficient than secure hash functions.

…oving sha1

More generally, the current documentation is a bit scattered, talking
about what terms are "equal" despite those terms not being very
interesting and given the term "secure hash", probably wrong (because
md5 and sha1 are not secure anymore).

Let's talk about cryptographically secure instead, and note that two of
them aren't. And then we can also link to the source for NIST going
through the removal process for SHA1.
This module implements a common interface to many different hash algorithms.
Included are the FIPS secure hash algorithms SHA224, SHA256, SHA384, SHA512,
(defined in `the FIPS 180-4 standard`_), the SHA-3 series (defined in `the FIPS
202 standard`_) as well as the non-cryptographically-secure algorithms SHA1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, "non-cryptographically-secure" may be too verbose. Let just say "as well as SHA1 (formerly...) and RSA's MD5 (..))". If we want to be precise we should rather add a note concerning the weaknesses of sha1/md5 as cryptographically secure hash functions but I don't know if we wouldn't reinvent the wheel instead in this PR.

WDYT @gpshead? do you think it's fine not to talk too much about security/applications in the first paragraph of hashlib?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just go with "legacy".

"""... as well as the legacy algorithms SHA1 (formerly part of ...) and the MD5 algorithm (defined in ...)"""

perhaps. This elides attributing MD5 to RSA as well as it isn't important where MD5 came from, that's what the linked-to RFC is for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with that but I would have used the term "legacy" when talking about something that is only kept for backwards compatibility, and not something that I expect to see in production. Maybe it's more a language issue on my side though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm however happy if we can remove the mention to RSA though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

legacy means a lot of things. It is simpler and more pedantically accurate than saying not-cryptographically-secure.

I'd edit this PR myself with those words but it appears the author unchecked the allow-edits checkbox on it (Github isn't showing me an edit option)?

@gpshead gpshead self-assigned this Jan 27, 2025
@gpshead gpshead added needs backport to 3.12 bug and security fixes needs backport to 3.13 bugs and security fixes labels Jan 27, 2025
@python python deleted a comment from mauroia609 Feb 5, 2025
@python python deleted a comment from mauroia609 Feb 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting review docs Documentation in the Doc dir needs backport to 3.12 bug and security fixes needs backport to 3.13 bugs and security fixes
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

3 participants