Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pool: Improve share UUID with random value. #390

Merged
merged 2 commits into from
Sep 22, 2023

Conversation

davecgh
Copy link
Member

@davecgh davecgh commented Sep 21, 2023

This is rebased on #389.

The postgres tests currently fail when running on fast systems due to the lack of a generating unique share IDs.

After digging into the failure a bit, the root cause is that the UUID used for the share id is not created with any randomness to it.

As the test failure demonstrates, it is generally not a good idea to rely solely on timestamps for generating supposedly unique values. While it is fairly unlikely that two shares will come in at the same nanosecond in most cases, there are a variety of factors that can lead to such a situation.

So, this appends a random value to the share UUID in order to help ensure a unique overall value and updates the callers accordingly. It also updates the database upgrade code to ensure the old share id format is used when performing the old version upgrade.

@jholdstock
Copy link
Member

This looks fine overall, but its gotten me thinking.

Given that there are no known deployments of dcrpool at the moment I guess we can play a bit loose with it, but ideally we would probably want a brand new upgrade to move all existing shares to the new ID format.

But, if we are to assume there are no deployments and throw caution to the wind, we could probably just throw away all of upgrade code and start from scratch. Any thoughts?

@davecgh
Copy link
Member Author

davecgh commented Sep 21, 2023

There really no reason to do an upgrade here imo. I called this out in the final commit message:

A new database upgrade is not needed because any existing entries are
necessarily already unique or they would've hit duplicate constraint
failures and there is not any code that explicitly relies on the key
being a specific length beyond the first 8 bytes being the timestamp.

Ideally, a v4 UUID would be created according to RFC 4122 instead,
however, that would be a much more invasive change since the current
bolt database code uses the UUID as a key and relies on the timestamp
being the first 8 bytes in the key for filtering purposes.

As mentioned, the only thing that cares about the format is the bolt code which looks at the first 8 bytes and I specifically didn't change them. Any future code that did care could very easily check the length.

@davecgh
Copy link
Member Author

davecgh commented Sep 21, 2023

But, if we are to assume there are no deployments and throw caution to the wind, we could probably just throw away all of upgrade code and start from scratch. Any thoughts?

We could do that, but I don't see any reason to. I was careful not to break anything with this and old upgrades still work as intended.

This modifies the pool tests to avoid cleaning up the postgres data when
there is a test failure and the -failfast test flag is set so the
contents of the database can be examined.

Since a failed run with -failfast will no longer purge the data, it also
adds logic to purge the data prior to running the tests to ensure any
previous failed runs are cleaned up on subsequent runs.
The postgres tests currently fail when running on fast systems due to
the lack of a generating unique share IDs.

After digging into the failure a bit, the root cause is that the
supposed UUID used for the share id is not created with any randomness
to it.

As the test failure demonstrates, it is generally not a good idea to
rely solely on timestamps for generating supposedly unique values.
While it is fairly unlikely that two shares will come in at the same
nanosecond in most cases, there are a variety of factors that can lead
to such a situation.

So, this appends a random value to the share UUID in order to help
ensure a unique overall value and updates the callers accordingly.  It
also updates the database upgrade code to ensure the old share id format
is used when performing the old version upgrade.

A new database upgrade is not needed because any existing entries are
necessarily already unique or they would've hit duplicate constraint
failures and there is not any code that explicitly relies on the key
being a specific length beyond the first 8 bytes being the timestamp.

Ideally, a v4 UUID would be created according to RFC 4122 instead,
however, that would be a much more invasive change since the current
bolt database code uses the UUID as a key and relies on the timestamp
being the first 8 bytes in the key for filtering purposes.
@jholdstock jholdstock merged commit 445e130 into decred:master Sep 22, 2023
@davecgh davecgh deleted the rng_in_share_uuid branch September 22, 2023 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants