-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support all value types #1580
Support all value types #1580
Conversation
WIP. |
binary.BigEndian.PutUint64(buf[1:9], math.Float64bits(v)) | ||
case bool: | ||
buf = make([]byte, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benbjohnson -- from implementing this function, it seems to me we need to code more information in a (field ID, value) pair -- we must also encode the type. Can you confirm? unmarshalValues
just assumes that it's a float type in the database, but that will no longer be true with these changes.
If I am correct, a single byte after the field ID would be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't have to encode the type as that's kept in the metastore. Field id, followed by value (if a fixed size type) or in the case of bytes and strings, field id followed by length, followed by value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So why is this code casting it to a float before sending it up? Is that redundant? It did occur to me that the info is in the metastore, and we could decode at a higher level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear -- the conversion is happening in unmarshalValues
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like I could preform the type conversion using the metastore around here:
https://github.com/influxdb/influxdb/blob/master/server.go#L1839
so the low-level conversion to float (and support for other types at this low-level as implied by the comments) may be unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah -- unless your distinction is just between numeric (floats) with a fixed length and variable length like strings? I interpreted the comments to mean we needed a breakdown by all supported types at this low-level of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the ReadSeries
function is actually cruft left over from stuff that was in there for debugging purposes. The real way to read raw series data is through the query engine, which now actually works.
There's some other spot in the code where you can get a value based on the field ID. At some point someone needs to cast this to the correct data type. That should be done in the query engine before it gets yielded to the mappers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah -- OK. I do recall @benbjohnson remarking that ReadSeries
was debug. If casting strictly happens at the higher level, yeah, then at this low-level it really is just a bunch of bytes.
Let me take another pass.
Just for the record, |
Obviously using a single source of truth for field types would be best, of course. |
c99b16a
to
335bad2
Compare
@pauldix @benbjohnson -- please review these latest changes and let me know what you think. This is still WIP, but I think it's close. I'd like some feedback at this point. Points to note:
|
335bad2
to
e562716
Compare
e562716
to
4baeee3
Compare
We need to also support raw bytes as a type. On the mapValues part of things, almost all the aggregate functions look at only one field. It would be nice to have a function that takes this field ID and returns the raw value without marshaling it into a map. We should be able to just cast the raw bytes from the DB into the type. This would save us a ton when aggregating a through a bunch of data by not allocating a bunch of maps. |
How would a user insert raw bytes? |
Materializing into a map was just to get it working. It definitely need to get optimized. As for the |
I'm looking at some other stuff on the query side in terms of raw data queries. I think it would be useful to have the iterators yield the raw bytes to the map functions. Then have the iterator have a method that will pull a field value from those bytes. This will be useful for queries that need to access multiple fields in the map portion of the query. |
Conflicts: database.go server.go
b3cdf28
to
7ace67a
Compare
7ace67a
to
5f20af0
Compare
Field codec split into its own type, which allows the locking to be decoupled from the server lock. I still need to add support for other types, and have questions about interactions with the shard iterators. |
Unit test indexing of strings.
Unit tests added for string and boolean types. |
LGTM. Drop it like it's hot. Where's the Snoop Dogg emoji when you need it? |
Thanks for the review @pauldix -- merging now. |
No description provided.