Add IndexedJsonDocument, a JSONWrapper implementation that stores JSON documents in a prolly tree with probabilistic hashing. #7912
Conversation
…re representing the leaf node as a blob.
Generally looks awesome and a really cool approach. Not too mind-bending after staring at it for a while. I took a pass on some suggestions, but I may be missing some things. Please push back if I'm off base anywhere :).
@@ -217,6 +217,9 @@ func newLeafCursorAtKey[K ~[]byte, O Ordering[K]](ctx context.Context, ns NodeSt
 // searchForKey returns a SearchFn for |key|.
 func searchForKey[K ~[]byte, O Ordering[K]](key K, order O) SearchFn {
 	return func(nd Node) (idx int) {
+		if nd.keys.IsEmpty() {
What is nd.Count() on one of these nodes? Why is this special case necessary?
These are the flattened leaf nodes. They contain 0 keys and 1 value. If we don't have this check, then the binary search will attempt to compare the search key to key[0] in the node, which will be out of bounds.
I added a comment.
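For illustration, here is a minimal, self-contained sketch of that guard. The node type, count method, and key layout below are simplified stand-ins for the real prolly tree types, not the actual Dolt implementation:

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// node is a simplified stand-in for a prolly tree node. A flattened JSON
// leaf node holds zero keys and exactly one value blob, so count() is 1
// while keys is empty.
type node struct {
	keys   [][]byte
	values [][]byte
}

func (nd node) count() int { return len(nd.values) }

// searchForKey mirrors the shape of the SearchFn in the diff above: the
// binary search runs over count() entries, so on a flattened leaf
// (count 1, zero keys) the comparator would read keys[0] out of bounds
// without the early return.
func searchForKey(key []byte) func(nd node) int {
	return func(nd node) int {
		if len(nd.keys) == 0 {
			// Flattened leaf: nothing to compare against.
			return 0
		}
		return sort.Search(nd.count(), func(i int) bool {
			return bytes.Compare(key, nd.keys[i]) <= 0
		})
	}
}

func main() {
	leaf := node{values: [][]byte{[]byte(`{"a": 1}`)}} // flattened leaf node
	fmt.Println(searchForKey([]byte("$.a"))(leaf))     // prints 0, no panic
}
```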
…on inside an object/array, before the first element.
…er: they're not needed.
…to indicate a bug in the JSON functions, not data corruption.
Co-authored-by: Aaron Son <[email protected]>
…afe to call elsewhere.
I don't like this, but the alternative is adding a context parameter to ToInterface, which would then propagate through *hundreds* of files. We want to do that eventually, but this is an acceptable stopgap.
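For context, here is a hypothetical sketch of the kind of stopgap being described, where ToInterface keeps its context-free signature and falls back to an internally held context. The type and field names are illustrative assumptions, not the code in this PR:

```go
package main

import (
	"context"
	"fmt"
)

// indexedJSONDoc is an illustrative stand-in for a document type whose
// ToInterface cannot take a context without touching hundreds of callers.
type indexedJSONDoc struct {
	ctx context.Context // captured at construction time as the stopgap
	raw string
}

// ToInterface keeps the existing context-free signature; internally it
// falls back to the captured context, or context.Background() if none
// was provided.
func (d indexedJSONDoc) ToInterface() (interface{}, error) {
	ctx := d.ctx
	if ctx == nil {
		ctx = context.Background()
	}
	_ = ctx // a real implementation would use ctx to load chunks from storage
	return d.raw, nil
}

func main() {
	doc := indexedJSONDoc{raw: `{"a": 1}`}
	v, err := doc.ToInterface()
	fmt.Println(v, err)
}
```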
…ns other than JsonDocument.
…etails changed so much since then that the test is just wrong.
…ew buffer that contains anything that hasn't been put into a chunk yet.
…f we detect one during a lookup, fall back on the previous behavior.
…IndexedJsonDocument::Lookup`
…dexedJsonDocument.
Force-pushed from 8023dad to 303526d
…ry doesn't move.
…en by a newer version of Dolt that it can't read.
tl;dr: We store a JSON document in a prolly tree, where the leaf nodes of the tree are blob nodes, each containing a fragment of the document, and the intermediate nodes are address map nodes whose keys describe a JSONPath.
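As a rough illustration of that layout (the chunk boundaries and keys below are made up for the example, not produced by the real chunker):

```go
package main

import "fmt"

func main() {
	// Conceptually, the document {"a": 1, "b": [2, 3], "c": "hello"} might be
	// split into blob leaf nodes like these, with each intermediate-node key
	// being a JSONPath-style location marking where its chunk begins.
	// Chunk boundaries can fall anywhere, even in the middle of a value.
	chunks := []struct {
		key  string // JSONPath-style key stored in the address map node
		blob string // raw document fragment stored in the blob leaf node
	}{
		{"$", `{"a": 1, "b": [2, `},
		{"$.b[1]", `3], "c": "hel`},
		{"$.c", `lo"}`},
	}
	for _, c := range chunks {
		fmt.Printf("%-7s -> %q\n", c.key, c.blob)
	}
}
```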
The new logic for reading and writing JSON documents is cleanly separated into the following files:
IndexedJsonDocument - The new JSONWrapper implementation. It holds the root hash of the prolly tree.
JsonChunker - A wrapper around a regular chunker. Used to write new JSON documents or apply edits to existing documents.
JsonCursor - A wrapper around a regular cursor, with added functionality allowing callers to seek to a specific location in the document.
JsonScanner - A custom JSON parser that tracks the current JSONPath.
JsonLocation - A custom representation of a JSON path suitable for use as a prolly tree key.
Each added file has additional documentation with more details about the individual components.
Throughout every iteration of this project, the core idea has always been to represent a JSON document as a mapping from JSONPath locations to the values stored at those locations. We can then store that map in a prolly tree and get all the benefits that we currently get from storing tables in prolly trees: fast diffing and merging, fast point lookups and mutations, etc.
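To make that concrete, here is a small standalone sketch (not code from this PR) that flattens a decoded JSON value into JSONPath-to-value pairs, which is exactly the kind of ordered key space a prolly tree can store:

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// flatten walks a decoded JSON value and records a path->value entry for
// every scalar, using a JSONPath-like key.
func flatten(path string, v interface{}, out map[string]interface{}) {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, child := range t {
			flatten(path+"."+k, child, out)
		}
	case []interface{}:
		for i, child := range t {
			flatten(fmt.Sprintf("%s[%d]", path, i), child, out)
		}
	default:
		out[path] = t
	}
}

func main() {
	var doc interface{}
	_ = json.Unmarshal([]byte(`{"a": 1, "b": [2, 3]}`), &doc)

	entries := map[string]interface{}{}
	flatten("$", doc, entries)

	// Sorted by path: $.a -> 1, $.b[0] -> 2, $.b[1] -> 3.
	keys := make([]string, 0, len(entries))
	for k := range entries {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s -> %v\n", k, entries[k])
	}
}
```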
This goal has three major challenges:
This design achieves all three of these requirements:
Documents written by the JSONChunker are backwards compatible with current Dolt binaries and can be read back by existing versions of Dolt. (Although they will have different hashes than equivalent documents that those versions would write.)