Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OPSDIR Review #145

Closed
kyzer-davis opened this issue Sep 7, 2023 · 5 comments · Fixed by #152
Closed

OPSDIR Review #145

kyzer-davis opened this issue Sep 7, 2023 · 5 comments · Fixed by #152

Comments

@kyzer-davis
Copy link
Collaborator

----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Minor level comments:

(2) p 14, sec 4.2.  Version Field

      Table 2: UUID variant 10x versions defined by this specification
   An example version/variant layout for UUIDv4 follows the table where
   M represents the version placement for the hexadecimal representation
   of 0x4 (0b0100) and the N represents the variant placement for one of
   the four possible hexadecimal representation of variant 10x: 0x8
   (0b1000), 0x9 (0b1001), 0xA (0b1010), 0xB (0b1011)

Table 1 only defines 3 bits and doesn't obviously mention a 4th bit, and yet
the examples below are displaying 4 bits, where I assume that the 4th bit is
effectively arbitrary data in the UUID definition.  Please consider whether it
would be helpful to clarify this.

(3) p 16, sec 5.3.  UUID Version 3

   UUIDv3 values are created by computing an MD5 [RFC1321] hash over a
   given name space value concatenated with the desired name value after
   both have been converted to a canonical sequence of octets in network
   byte order.  This MD5 value is then used to populate all 128 bits of
   the UUID layout.  The UUID version and variant then replace the
   respective bits as defined by Section 4.2 and Section 4.1.

Presumably the MD5 is only actually used to populate 122 bits, since the other
6 are already spoken for with 'ver' and 'var'.

(4) p 29, sec 6.2.  Monotonicity and Counters

      For example, let's assume a system timestamp of 1 Jan 2023
      12:34:56.1234567.  Taking the precision greater than 1ms gives us
      a value of 0.4567, as a fraction of a millisecond.  If we wish to
      encode this as 12 bits, we can take the count of possible values
      that fit in those bits (4096, or 2 to the 12th power) and multiply
      it by our millisecond fraction value of 0.4567 and truncate the
      result to an integer, which gives an integer value of 1870.
      Expressed as hexadecimal it is 0x74E, or the binary bits
      0b011101001110.  One can then use those 12 bits as the most
      significant (left-most) portion of the random section of the UUID
      (e.g., the rand_a field in UUIDv7).  This works for any desired
      bit length that fits into a UUID, and applications can decide the
      appropriate length based on available clock precision, but for
      UUIDv7, it is limited to 12 bits at maximum to reserve sufficient
      space for random bits.

Would a valid alternative way of achieving this be to just directly write the
10 bits of the timestamp, giving 2 extra bits of randomness and a slightly
cheaper calculation?
@kyzer-davis
Copy link
Collaborator Author

kyzer-davis commented Sep 7, 2023

On the 122 vs 128, in example of appendix C2 (and review of existing v3/v5 implementations) one simply modify the bits in the right position via some bit operations.
So 128 are generated and 6 bits are modified to the correct value vs creating 122 then shifting them apart to insert the 6 new bits.
The first method is how current implementations are operating on RFC4122 verbiage/guidance so we have to carry that into this draft or it will break many implementations.

Edit: I may add some text pointing to C.2 to state there is an example of how this bit modification is done so folks can see this.
Repeat for v5 section to C.4 to show bit modification (and discard)

@LiosK
Copy link
Contributor

LiosK commented Sep 8, 2023

Would a valid alternative way of achieving this be to just directly write the 10 bits of the timestamp, giving 2 extra bits of randomness and a slightly cheaper calculation?

"just directly write the 10 bits of the timestamp" does not really make sense here unfortunately because the timestamp here is a sub-millisecond fraction. There is no canonical or natural way (as far as I know) to take the most significant 10 bits (or whatever bits) of a fractional number unless we dig deep into IEEE 754 floating-point numbers. Computationally, the example calculation is equivalent to the following integer arithmetics and is not so expensive as one might think from the example code that relies on floating-point math.

# take submillisecond fraction from nanosecond timestamp
(456_700 << 12) // 1_000_000

@fabiolimace
Copy link

fabiolimace commented Sep 8, 2023

Would a valid alternative way of achieving this be to just directly write the
10 bits of the timestamp, giving 2 extra bits of randomness and a slightly
cheaper calculation?

The 2 extra bits can be populated with bits from a higher precision timestamp. If the system timestamp has a precision of 9 decimal digits, say "0.123456789", instead of the 7 digits of Windows, we can do this:

>>> nanos = int(0.123456789 * 1_000_000_000)
>>> msecs = int(nanos / 1_000_000)
>>> rand1 = int(nanos % 1_000_000 / 100)
>>> 
>>> print( nanos )
123456789
>>> print( msecs )
123
>>> print( rand1 )
4567

or this by avoiding divisions:

>>> nanos = int(0.123456789 * 1_000_000_000)
>>> msecs = nanos >> 20 
>>> rand1 = nanos >> 8 & 0xfff
>>> 
>>> print( hex( nanos ) )
0x75bcd15
>>> print( hex( msecs ) )
0x75
>>> print( hex( rand1 ) )
0xbcd

The first example is more intuitive, so it is preferred in my opinion. The second one is more accurate as it produces a 60-bit timestamp with 100-nanosecond precision and no loss due to IEEE 754 arithmetics.

P.S.: However, I'm not sure anymore whether prescription of a formula or algorithm for shifting the bits should be included in the document. Perhaps it is beyond the scope of the document as it's hard to put this into words.

@LiosK
Copy link
Contributor

LiosK commented Sep 8, 2023

I don't think we should include the (456_700 << 12) / 1_000_000 technique in the document because it's just an implementation detail! In low-level languages, it is very natural to get a nanosecond timestamp from clock_gettime() and perhaps many implementers reaches the same technique. Plus, the / 1_000_000 is expected to be eliminated by compilers through division-by-constant optimization, so in reality, the current example algorithm does not need to involve floating-point math or integer division.

kyzer-davis added a commit that referenced this issue Sep 20, 2023
@kyzer-davis
Copy link
Collaborator Author

kyzer-davis commented Sep 20, 2023

Pushed 8b2e967
Only changed the item calling out the bit swapping operation (and discard for v5).
Edit2: I split the variant item into #153

@kyzer-davis kyzer-davis mentioned this issue Sep 20, 2023
@kyzer-davis kyzer-davis linked a pull request Sep 20, 2023 that will close this issue
kyzer-davis added a commit that referenced this issue Sep 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants