Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow a configurable digest length #169

Closed
wlipa opened this issue Dec 7, 2023 · 11 comments
Closed

allow a configurable digest length #169

wlipa opened this issue Dec 7, 2023 · 11 comments

Comments

@wlipa
Copy link
Contributor

wlipa commented Dec 7, 2023

The asset digest is 40 random hex characters. This is far more than needed to provide disambiguated assets and contributes to larger page size (which doesn't compress well due to the randomization) and more noisy looking HTML source.

It would be nice to have a config option to specify the digest size. It can be monkeypatched like this, but obviously a patch is more fragile.

More boldly, I would suggest that a shorter digest size (here 8) would be the right default for almost every app.

class Propshaft::Asset
  def digest
    @digest ||= Digest::SHA1.hexdigest("#{content}#{version}")[0,8]
  end
end
@brenogazzola
Copy link
Collaborator

brenogazzola commented Dec 7, 2023

Hey @wlipa. Thank you for the interest.

That said, the transfer step is only one part of the overall response time and after compression the differences between 8 and 40 should be pretty small. So this looks to me like a micro optimization (like using JSON instead of HTML)

Do you have numbers for the improvement on the overall request time (from request to LCP at least) for this change?

@wlipa
Copy link
Contributor Author

wlipa commented Dec 7, 2023

I don't have numbers, but consider that it's every reference to every asset, including not only what's on the page but asset requests, keys in caches, etc. Some pages might have hundreds or even thousands of assets. And these fragments don't compress that well (basically just reversing the hex encoding).

To be honest that part of it bugs me less than the impact on the source and developer tools. I dislike having these large hex fragments all over the place - it makes it harder to quickly look at and understand the source.

@brenogazzola
Copy link
Collaborator

Alright, I’m not against reducing it. It did feel excessive to me at times. But I’d like to be sure we are choosing a good default. Unlike JS bundlers, propshaft is handling a lot more assets then they are, so their default might not be applicable.

Why is 8 good enough?

@wlipa
Copy link
Contributor Author

wlipa commented Dec 7, 2023

The digest needs to be long enough to have a practically zero chance that two versions of an asset will accidentally get the same digest. 8 hex is a long - 4 billion. I consider an N in 4 billion chance that a new version of an asset will overlap one of the N previous versions in the cache to be practically zero.

@brenogazzola
Copy link
Collaborator

I've talked to the others and we've agreed to reduce the digest size. Would you like to open a PR for that?

@wlipa
Copy link
Contributor Author

wlipa commented Dec 25, 2023

Sure, I will do that. Thanks for considering the issue.

@dhh
Copy link
Member

dhh commented Dec 27, 2023

Let's def just pick a value. This isn't something we need to allow configuration on. 8 sounds good to me.

@wlipa
Copy link
Contributor Author

wlipa commented Dec 29, 2023

Addressed by #173

@wlipa wlipa closed this as completed Dec 29, 2023
@midnight-wonderer
Copy link

midnight-wonderer commented Jun 29, 2024

I hate to be that guy, but the "N in 4 billion chance" estimation misrepresents the actual probability.

This is the birthday paradox.
Here is my understanding:

The eight digits of a hexadecimal number are 32 bits long; this will have a 0.5 probability of collision at about 2^16 builds.
2^16 builds = 65,536 builds

This is still a large number, but it is way more possible than previously estimated and is not, by any means, "practically zero."

I would suggest ten digits instead, which translates to
0.5 collision probability at 1,048,576 builds.

That would suffice for most people.
But 12 digits are not too big, IMO, to be on the safe side and prevent another me from happening.

I suck at explaining things; please read the article on birthday paradox to get a better understanding.

@wlipa
Copy link
Contributor Author

wlipa commented Jun 29, 2024

It's not "builds"; it's changes to a specific asset. If the number of changes to a specific asset is very small compared to the size of the group (4 billion), the birthday paradox really doesn't become significant. I think for that is the case for almost every app.

If you have deployed tens of thousands of changes to a specific asset, and those have managed to stay alive in a cache, then yes, you could be hit with the birthday paradox.

What do you mean by "prevent another me from happening"?

@midnight-wonderer
Copy link

Since we chose this magic number for everyone, the less assumptions about other people we made, the better.
Now, about your observations and concerns.

It's not "builds"; it's changes to a specific asset.

This observation made an assumption that everyone's code looks the same with the current best practice, code splitting, and not compiling as a big javascript file. Otherwise, new hash will be generated on every builds. Hence, the term "build" is more appropriate IMO because it made fewer assumptions than specific asset changes.

If you have deployed tens of thousands of changes to a specific asset, and those have managed to stay alive in a cache.

This observation made an assumption about the cache and its TTL. The counterexample is assets that are built and uploaded to s3-liked storage while the main app is versioned and can be rolled back. The collision can break the older version of the app or destroy some info in such a scenario. We might not care about it, but some might.

What do you mean by "prevent another me from happening"?

I mean, prevent another occurrence if we choose 10 digits and someone comes up later and makes an argument similar to mine.
For whatever reason, a million-builds collision resistance is not enough for them.

I understand @wlipa's intention and why you tried to make it configurable. However, the situation changed from what you initially intended because of the maintainer value convention or universal parameters over configurations.
My argument is that 8 is sane for, maybe, the majority of use cases. However, it should not be the number for all.

8 digits (60k builds) is too short for me; I just wonder if 12 digits is too long for you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants