Discussion: UUIDv7 subsecond precision encoding #24
Adding Draft V01 quote from thayne: https://news.ycombinator.com/item?id=28088213
|
I don't like using 38, 24 and 12 bits for nanos, micros and millis respectively. I think this could 'waste' a few bits. I prefer to use 30, 20 and 10, but that's just my preference. However, fields of 38-24-12 sub-second bits can be very convenient to implement using string operations rather than bitwise operations. It's not efficient to create UUIDs with string operations, but it works well, at least in Python. At the end of this answer, there are 3 implementations of UUID7, tackling the problem with a 'character-oriented' approach. We can think of UUIDs as a 3-part string, leaving out version and variant:
The 'character-oriented' layout is this:
These are the character settings for nanosecond, microsecond and millisecond precision, respectively:
Here are the Python implementations of the 3 configurations shown above:

#---------------------------------
# UUID7 with nanosecond precision
#---------------------------------
import uuid
import time
import random
now = time.time_ns()
sec = now // 10**9
subsec = now % 10**9
f1 = f'{sec:09x}'                        # 9 hex chars (36 bits): whole seconds
f2 = f'{subsec:09x}'                     # 9 hex chars (36 bits): nanoseconds within the second
f3 = f'{random.randrange(2**48):012x}'   # 12 hex chars (48 bits): random node
v1 = '7' # version
v2 = random.choice(['8', '9', 'a', 'b']) # variant
id = f1 + f2 + f3            # 30 hex chars
id = id[:12] + v1 + id[12:]  # insert the version character
id = id[:16] + v2 + id[16:]  # insert the variant character
uid = uuid.UUID(id)
print(uid)  # 06113208-6028-78d2-aa42-aae9eba8a7a2

#---------------------------------
# UUID7 with microsecond precision
#---------------------------------
import uuid
import time
import random
now = time.time_ns()
sec = now // 10**9
subsec = now % 10**9 // 10**3 # adjust to micros
f1 = f'{sec:09x}'
f2 = f'{subsec:06x}'
fx = f'{random.randrange(2**12):03x}' # eXtra bits: random
f3 = f'{random.randrange(2**48):012x}' # node bits: random
v1 = '7' # version
v2 = random.choice(['8', '9', 'a', 'b']) # variant
id = f1 + f2 + fx + f3
id = id[:12] + v1 + id[12:]
id = id[:16] + v2 + id[16:]
uid = uuid.UUID(id)
print(uid)  # 06113208-e044-7660-b754-b43ac4b4a721

#---------------------------------
# UUID7 with millisecond precision
#---------------------------------
import uuid
import time
import random
now = time.time_ns()
sec = now // 10**9
subsec = now % 10**9 // 10**6 # adjust to millis
f1 = f'{sec:09x}'
f2 = f'{subsec:03x}'
fx = f'{random.randrange(2**24):06x}' # eXtra bits: random
f3 = f'{random.randrange(2**48):012x}' # node bits: random
v1 = '7' # version
v2 = random.choice(['8', '9', 'a', 'b']) # variant
id = f1 + f2 + fx + f3
id = id[:12] + v1 + id[12:]
id = id[:16] + v2 + id[16:]
uid = uuid.UUID(id)
print(uid)  # 06113209-430f-783b-89b8-68d0adb7fa01
|
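For completeness, here is a sketch of the reverse operation for the nanosecond variant above. It is an editor-added illustration, not part of the original comment, and it assumes the same character layout the snippet uses: 9 hex characters of seconds, 9 of nanoseconds, 12 of random node data, with the version and variant characters inserted at string indexes 12 and 16.

import uuid

def decode_char_oriented(u: uuid.UUID) -> int:
    '''Recover the nanosecond Unix timestamp from the character-oriented layout.'''
    s = u.hex                 # 32 hex characters
    s = s[:16] + s[17:]       # drop the variant character (index 16)
    s = s[:12] + s[13:]       # drop the version character (index 12)
    sec = int(s[:9], 16)      # first 9 characters: whole seconds
    nanos = int(s[9:18], 16)  # next 9 characters: nanoseconds
    return sec * 10**9 + nanos

# e.g. decode_char_oriented(uid) == now, for the nanosecond snippet above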
Sorry, I didn't focus on the main question in my previous answer. I prefer the first case (fraction of a second), although the second (specific number of ms/us/ns) is simpler and more efficient. As I said in a previous issue, encoding and decoding can be done with fixed-point arithmetic. We don't have to deal with floating-point math directly. To encode, multiply the fraction of a second by 2**n, where n is the number of sub-second bits, and round to the nearest integer.
To decode, do the reverse: divide the stored value by 2**n.
That's it. The decoder can extract the exact original fraction, unless the decoder reads more bits than the encoder used. The excellent example given by @bradleypeabody above can be expressed like this in Python:

bits = 12
fraction = 0.321
subsec = round(fraction * 2**bits)

To decode, the reverse:

bits = 12
subsec = 1315
fraction = round(subsec / 2**bits, 3)

I used round(..., 3) because the original fraction has three decimal places. In the other case, which simply copies the number of milliseconds/microseconds/nanoseconds, no conversion of this kind is needed. |
@fabiolimace Great writeups. I agree with all points. However, if the alternative option only ever requires a maximum of 30 bits to encode up to nanoseconds properly, why not fix the total bits ever allocated to it? 12/12/12 is nice thematically and for …
Note: I really like those ASCII tables. I may use that format in draft-02 to help illustrate UUIDv6/7/8 Basic Creation Algorithm examples along with the current ASCII bit layout format and pseudocode steps. |
TLDR; look at the bottom bit layout for the proposal. Yeah, there seems to be a lot of pain caused by how not-aligned the various fields are. A friend of mine has been hacking away on this: http://www.new-uuid.info/ (just put it up and still wip) and we went through the same thing - dealing with raw bits in most any language is painful (Go in this case). Some ideas: 1. One idea might be to try changing the UUID7 spec so it is more byte-aligned - more practical to use and implement. Like taking the ver and var fields and blocking off the entire byte that each of those use and giving some meaning to the extra bits there, but otherwise everything else can be moved into place in whole bits. To make it really usable we'd probably want to change the 36-bit timestamp to either a 32-bit unsigned int, or another option would be to make it 40 bits (5 whole bytes). 2. Another option, crazy idea here, bear with me: What if we ditch the positions of these old fields and instead provide another means of reliably detecting v6/7/8 UUIDs? This whole thing would be way simpler if we didn't have to dance around those two old legacy fields (ver and var). And while their positions are obviously locked into old software, we could probably get away with breaking that for newer values IF there were still a reliable way for software to detect which version was in use. Actually, just looking at this again - https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.1 :
...
That might be our ticket right there.... Per the spec, if we set those three bits to 1s, then we can change whatever we want. I mean, I'm sure we'll break someone's implementation somewhere, but per the spec it's legit - we can move or remove the version field if we set those bits. Okay so.... What if we set those var bits to 1s, and then make the rest of the byte the version field?! By combining these legacy fields, we then only have to work around this one byte:
(this incidentally will provide 5 version bits and allow for 32 UUID versions instead of 16 with the existing 4 bits) If you ask me, this is definitely simpler. From there, the main thing for UUID7 to figure out alignment for is the unix timestamp field. I'm not sure how hard we want to try / how complex we want to make it, in solving that. There are some simple transforms that would solve the problem. E.g. using the number of minutes since the unix epoch instead of seconds fits in 32 bits up to year 10136, and then we could have a sub-minute portion instead of a sub-second portion. But I'm not sure any of this is better than bit shifting or just choosing 40 bits for the integer timestamp. Also, I mean heck, a uint32 gets us to the year 2106 - maybe we should just go with that and assume that if people are still using this standard 85 years from now it will have served its purpose and it's time for another version... If we did that, we'd get something like this:
(Where subsec_seq_rand is a concatenation of the subsecond value, sequence counter and random data, with whatever lengths the implementation wants to choose for each, up to a max length of 88 bits / 11 bytes) I know this would be a big change, but it would definitely be simpler and easier on the development side and to show/explain to people. I'm still thinking this over myself and just throwing this out there. Let me know what y'all think. |
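A minimal sketch of what such a byte-aligned value could look like (Python, added for illustration; the original ASCII layout isn't reproduced above, so placing the merged version/variant byte at octet 8 - the legacy variant position, which keeps the '111' bits where RFC 4122 readers look for them - is an assumption):

import os
import time
import uuid

def byte_aligned_uuid(version=7):
    '''Hypothetical layout: a 4-byte unix timestamp (uint32, good until 2106),
    one byte merging the '111' variant bits with a 5-bit version, and 11 bytes
    of subsec_seq_rand (random here as a stand-in for subsecond + sequence + random).'''
    ts = int(time.time()).to_bytes(4, 'big')
    ver_var = bytes([0b11100000 | (version & 0b11111)])  # '111' + 5-bit version
    filler = os.urandom(11)
    raw = ts + filler[:4] + ver_var + filler[4:]  # octets: 0-3 ts, 4-7 fill, 8 ver/var, 9-15 fill
    return uuid.UUID(bytes=raw)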
Well, @bradleypeabody, I said that it's annoying, but it's not impossible. First, data for everyone: I just finished a couple more tweaks of http://www.new-uuid.info. The website is written in Go and runs my implementation of the UUIDv7 generator. Repo here. On the website, you can literally “dial in” the precision lengths for different parts of the UUID and see online how this generator produces values. At the end of the website, I added a simple playground where you can try converting milliseconds into "Brad's encoded" milliseconds, just so it's real to touch. Now, as for the implementation. This is a very rough draft, although it's working. To simplify what you are talking about (the ver and var fields), you can just subtract the length of those fields from the total length of the UUID (128 - 4 - 2 = 122 bits) and work with a 122-bit structure. Once you're done populating it, you copy it into the final bit array, inserting those fields at their set positions. As for working with bits themselves, it's not that complicated.

// getBit returns the value of a bit at a specified position in a given byte array
func getBit(b []byte, index int) bool {
pos := index / 8
j := uint(index % 8)
j = 7 - j
return (b[pos] & (uint8(1) << j)) != 0
}
// setBit sets the value of a bit at a specified position in []byte
func setBit(b []byte, index int, value bool) {
pos := index / 8
j := uint(index % 8)
j = 7 - j
if value {
b[pos] |= (uint8(1) << j)
} else {
b[pos] &= ^(uint8(1) << j)
}
}

And this is only if you are very concerned about memory consumption (which is rarely the case nowadays anyway); you could use an array of bools instead, but I did it “the right way”. Extracting the timestamp from this version of UUID is elementary from the viewpoint of a processor.

// Time returns the unix epoch stored in the struct, without sub-second precision
func (u UUIDv7) Time() time.Time {
bytes := [16]byte(u)
tmp := toUint64(bytes[0:5])
tmp = tmp >> 4 // We are off by the last 4 bits of the byte here; they are not part of the 36-bit timestamp.
return time.Unix(int64(tmp), 0)
}

You just take the first 5 bytes (40 bits) and shift the resulting value right by 4, leaving the 36-bit timestamp. This is something that any 64-bit processor can do in a click. It takes longer to extract the other data. Currently, I'm using getBit/setBit, but it can be done with logical shifts. Those would be complex from the viewpoint of a programmer, but super-easy for a processor. Pros: it's good that I can adjust the format. For example, if I know that I'm usually generating about 10 values per second, I could set sub-second precision and the counter to 4 bits each and be happy with the random data that goes after. I guess the main question is: "Who would be implementing this system?". |
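To illustrate the "logical shifts" point, here is a sketch (Python, added for illustration) that pulls the other fields out with shifts and masks only; the field positions assume the draft-02 order unixts(36) | subsec_a(12) | ver(4) | subsec_b(12) | var(2) | subsec_seq_node(62):

import uuid

def split_fields(u: uuid.UUID):
    '''Extract the draft-02-style fields using only shifts and masks.'''
    n = u.int                         # the whole UUID as a 128-bit integer
    unixts   = n >> 92                # top 36 bits: whole seconds
    subsec_a = (n >> 80) & 0xFFF      # next 12 bits
    ver      = (n >> 76) & 0xF        # version nibble
    subsec_b = (n >> 64) & 0xFFF      # 12 bits between version and variant
    var      = (n >> 62) & 0x3        # 2 variant bits
    rest     = n & ((1 << 62) - 1)    # remaining 62 bits (subsec_seq_node)
    return unixts, subsec_a, subsec_b, ver, var, rest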
I wouldn't go as far as changing the position of the version bits. It would be better to keep compatibility with libraries implementing RFC 4122. Sure, cutting nanoseconds in three makes it a bit more complicated, but only marginally in my opinion. I also did some experimentation with UUIDv7 in Go: https://github.com/coding-socks/uuiddraft/blob/v7/uuid.go. This implementation already contains the earlier suggestion to only use 30 bits (12, 12, 6) for nanosecond precision. Splitting into the different subsec sections is as simple as the following:

nsec := timestamp % int64(time.Second)
subsecA := (nsec >> 18) & 0xfff
subsecB := (nsec >> 6) & 0xfff
subsecC := nsec & 0x3f

Going line by line: nsec is the nanosecond part of the second (it always fits in 30 bits); subsecA takes the top 12 of those bits, subsecB the middle 12, and subsecC the remaining low 6 bits.
With the current 38 bits (12, 12, 14) nanosecond precision it would be the following:

nsec := timestamp % int64(time.Second)
subsecA := (nsec >> 26) & 0xfff
subsecB := (nsec >> 14) & 0xfff
subsecC := nsec & 0x3fff

Going line by line: the idea is the same, but the split is 12 + 12 + 14 bits; since nsec itself needs only 30 bits, subsecA ends up left-padded with zeros, subsecB takes the next 12 bits, and subsecC the low 14 bits.
It seems I'm one of those people who thinks bit manipulation is easy. 😄 To implement all precisions I'm waiting for Go 1.17 which will add |
@nurked that is very cool, I enjoy the visualizations to help drive the point home! As tedious as the version/variant fields are, I agree with the others that leaving them where they are is the best decision. That being said, Brad's comment about the variant bits got me thinking: why not implement both?
With this in mind we could also set the definition of any UUID Version (UUIDv1/2/3/4/5/6/7/8) + Variant E (111) as a method for signaling an alternative bit layout to any previously defined UUID. With that precedent set, UUIDv6 could be converted to UUIDv1 (0001) + Variant E/F (111) as a method to signal the alternative encoding for UUIDv1 (which is the entire goal of UUIDv6 at the moment). Edit: See #26 for further discussion on UUIDvXε variant usage. |
@kyzer-davis I really like the idea of UUIDv1ε because UUIDv6 in draft 01 is almost the same as UUIDv1 with a different byte order. On the other hand, I don't quite understand the difference between UUIDv6 and UUIDv6ε. Do you mind opening another issue to discuss the possible renaming, so we can focus on subsecond precision in this issue? If UUIDv7 subsecond precision in the case of nanoseconds is changed from 36 to 30 bits, the layout would become the following:
From:
In both cases nsec is cut in 3 pieces, therefore I think it is better to keep it as compact as possible. However, in the case of millisecond and microsecond precision, decreasing nsec does not provide as much benefit because they don't free up a byte for … Also, in the case of 36 bits we could consider a way which would allow implementers to avoid bit shifting.
A simple division and remainder calculation should be enough. Finally, 2 bits for … Edit: The statement about bit shifting being an option is not true. |
Done, #26 open, will edit my original comment. |
What do you think about having a fixed length
This would allow
Mixed precision example:
Order … I'm trying to avoid using a 64-bit integer because some scripting languages use the IEEE-754 double-precision (64-bit) format to represent double and float values. In this case a 53-bit int is the biggest they can represent. This means those languages cannot represent a full 64-bit unix nanosecond value. Meanwhile, a 34-bit int … I assume 48 random bits should be sufficient for millisecond, microsecond and nanosecond precision as well. |
IMO this is too little to avoid collisions. Especially with millisecond timestamp precision.
It only takes a few lines of code to get a 64-bit multiplication result using 32-bit arithmetic. Example |
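The linked example isn't reproduced here, but a minimal sketch of the general technique could look like this (written in Python for readability; a language limited to 32-bit integer results would use its native ints the same way, e.g. unsigned shifts in JavaScript):

def mul32x32(a, b):
    '''Multiply two unsigned 32-bit values into a 64-bit result returned as a
    (hi, lo) pair of 32-bit halves; no intermediate value exceeds 32 bits.'''
    a_lo, a_hi = a & 0xFFFF, a >> 16
    b_lo, b_hi = b & 0xFFFF, b >> 16
    lo_lo = a_lo * b_lo                               # partial products, each fits in 32 bits
    hi_lo = a_hi * b_lo
    lo_hi = a_lo * b_hi
    hi_hi = a_hi * b_hi
    cross = (lo_lo >> 16) + (hi_lo & 0xFFFF) + lo_hi  # still fits in 32 bits
    lo = ((cross & 0xFFFF) << 16) | (lo_lo & 0xFFFF)
    hi = hi_hi + (hi_lo >> 16) + (cross >> 16)
    return hi, lo

hi, lo = mul32x32(0xFFFFFFFF, 0xFFFFFFFF)
assert (hi << 32) | lo == 0xFFFFFFFF * 0xFFFFFFFF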
Continuing on the idea of
And sequence is allowed to overflow. Overflow could be done with
Take sequence into consideration as well. With the rules from my previous comment it would be 48 bits of random and a 12-bit sequence in the case of millisecond precision. With the rules from this comment it would be 48 bits of random and a sequence with a 64_000_000 maximum value. The latter would allow the same theoretical maximum for each precision. I'm also thinking about how high-resolution time could be used in addition to sequence.
You are correct, I totally forgot you can use two 32-bit ints as a 64-bit int. I will play with this after work. |
I would like to show you how to encode and decode in Python, Java and Javascript using the timestamp with 40 bits and 10 ms precision. IMO a timestamp with 40 bits and 10 ms precision is easy to encode and decode. It also has the best 'score' in the spreadsheet created by @edo1 (#23 (comment)) for a 200-year life span. Also, it is easy to describe the algorithm: just use the scale factor. Also, it's not difficult to encode and decode in Javascript: you have to encode and decode the timestamp and the timestamp extension separately to get around the limitation of the Number type. Here is (yet) another layout/structure I would like to propose:
Python

In Python you use the scale factor 100 * 2**20. Encode to fixed-point:

scale_factor = 100 * 2**20
timestamp = unixts * scale_factor

Decode back to floating-point:

unixts = timestamp / scale_factor

Here is a complete example:

import time
DEC_SCALE_FACTOR = 100
BIN_SCALE_FACTOR = 2**20
SCALE_FACTOR = DEC_SCALE_FACTOR * BIN_SCALE_FACTOR
def encode(unix_time):
'''Encode to FIXED-point number'''
return int(unix_time * SCALE_FACTOR)
def decode(timestamp):
'''Decode to floating-point number'''
return float(timestamp / SCALE_FACTOR)
# get UNIXTS as floating-point
unixts = time.time()
# encode UNIXTS to FIXED-point
timestamp = encode(unixts)
# decode timestamp to floating-point
unixts_decoded = decode(timestamp)
print(unixts)
print(unixts_decoded)

Java

In Java you also use the scale factor 100 * 2^20. Encode to fixed-point:

double scale_factor = 100.0 * Math.pow(2, 20);
double unixts = System.currentTimeMillis() / 1000.0;
long timestamp = (long) (unixts * scale_factor);

Decode back to floating-point:

double unixts_decoded = timestamp / scale_factor;

Here is a complete example:

public final class Uuid7Timestamp40b {
private static final double DEC_SCALE_FACTOR = 100;
private static final double BIN_SCALE_FACTOR = 1 << 20; // 2**20
private static final double SCALE_FACTOR = DEC_SCALE_FACTOR * BIN_SCALE_FACTOR;
public static long encode(final double unixts) {
return (long) (unixts * SCALE_FACTOR);
}
public static double decode(final long timestamp) {
return timestamp / SCALE_FACTOR;
}
public static double time() {
return System.currentTimeMillis() / 1000.0;
}
public static void main(String[] args) {
// get UNIXTS as floating-point
double unixts = time();
// encode UNIXTS to FIXED-point
long timestamp = encode(unixts);
// decode timestamp to floating-point
double unixts_decoded = decode(timestamp);
System.out.println(unixts);
System.out.println(unixts_decoded);
}
}

Javascript

In Javascript you use the decimal scale factor (100) and the binary scale factor (2^20) separately. You can't use the fixed-point representation because of the internal representation of Number in Javascript, but you can represent the timestamp as a hexadecimal string. Encode to hexadecimal:
Decode back to Number:
Here is a complete example:

var DEC_SCALE_FACTOR = 100.0
var BIN_SCALE_FACTOR = Math.pow(2, 20)
function time() {
return (new Date()).getTime() / 1000
}
function int(n) {
return Math.floor(n)
}
// output in hexadecimal
function encode(unixts) {
unixts = unixts * DEC_SCALE_FACTOR
// encode the first 40 bits
timestamp = int(unixts)
hexa = timestamp.toString(16)
padded = ("0000000000"+hexa).slice(-10)
// encode the last 20 bits
timestamp_ext = int((unixts - int(unixts)) * BIN_SCALE_FACTOR)
hexa_ext = timestamp_ext.toString(16)
padded_ext = ("00000"+hexa_ext).slice(-5)
return padded + padded_ext
}
// input in hexadecimal
function decode(timestamp) {
// decode the first 40 bits
hexa = timestamp.slice(0,10)
parsed = parseInt(hexa, 16)
// decode the last 20 bits
hexa_ext = timestamp.slice(10,15)
parsed_ext = parseInt(hexa_ext, 16) / BIN_SCALE_FACTOR
return (parsed + parsed_ext) / DEC_SCALE_FACTOR
}
unixts = time()
timestamp = encode(unixts)
unixts_decoded = decode(timestamp)
console.log(unixts)
console.log(unixts_decoded)
|
What if a 64-bit integer is used for nanosecond time with a …
I created an example in NodeJS: https://gist.github.com/nerg4l/7de56f715486fada84ac4c1f27a17b23. JavaScript does not have a 64-bit int primitive, and when bitwise operations are used it converts the value to a 32-bit int. Therefore, most of the calculations had to be done in interesting ways. I'm not sure if this is the correct way forward. It looks like a unix-based UUIDv6 with 64-bit time. In a lot of cases the main pain point is still the unfortunate position of … Lastly, I was looking into maximum time resolutions on different operating systems: https://en.wikipedia.org/wiki/System_time. For example, according to the linked wiki the best Windows can do is 100 ns, but macOS/OSX can do 1 ns with … |
This was dropped from UUIDv7 in Draft 03, although I have re-worked some of the lessons learned into the "5.1. Timestamp Granularity > Sub-second Precision and Accuracy" bullet. |
Question: fraction-of-a-second representation, or a specific number of milliseconds/microseconds/nanoseconds, for subsecond precision encoding within UUIDv7's subsec_a, subsec_b, and subsec_seq_node?

Current draft-02 implementation as per @bradleypeabody (who wrote the UUIDv7 section):
Sources from email between Brad and me on the topic:
The alternative is a specific number of milliseconds/microseconds/nanoseconds as an integer.
Sources: https://en.wikipedia.org/wiki/Orders_of_magnitude_(time)
More Discussion on the alternative:
If the alternative were utilized in Draft 02, we would encode as follows (see the sketch after this list):
- Millisecond precision: subsec_a is fully dedicated to millisecond information, with a pad of two bits at the start since we only need 10. It is easier to do this padding since the Version bits come up right after.
- Microsecond precision: subsec_a and subsec_b are utilized for microsecond information, with 4 bits padded at the start since we only require 20. It is easier to pad here because the Variant bits are coming up.
- Nanosecond precision: subsec_a, subsec_b, and 6 bits of subsec_seq_node are used for nanosecond information. The final 48 bits of subsec_seq_node are used for random/entropy/node data. No padding is required for this scenario.
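A minimal sketch of this alternative encoding (Python, added for illustration; it only computes the sub-second field values, and the zero placeholders stand in for whatever sequence/random data a real generator would put in the unused bits):

import time

def split_subsec(ns_in_second, precision):
    '''Split the sub-second part (0..999_999_999 ns) into
    (subsec_a, subsec_b, top 6 bits of subsec_seq_node).'''
    if precision == 'ms':
        ms = ns_in_second // 1_000_000   # needs 10 bits; 2 zero pad bits at the top of subsec_a
        return ms, 0, 0
    if precision == 'us':
        us = ns_in_second // 1_000       # needs 20 bits; 4 zero pad bits at the top of subsec_a
        return (us >> 12) & 0xFFF, us & 0xFFF, 0
    if precision == 'ns':
        ns = ns_in_second                # needs 30 bits; no padding
        return (ns >> 18) & 0xFFF, (ns >> 6) & 0xFFF, ns & 0x3F
    raise ValueError(precision)

subsec_a, subsec_b, seq_node_top = split_subsec(time.time_ns() % 10**9, 'ns')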