Created by brew bump
Created with brew bump-formula-pr
information_schema.TABLES.DATA_LENGTH currently reports the maximum possible size for a table, and doesn't take into account table file compression or the fact that variable-length fields (e.g. TEXT) are not always fully used. Tools such as DBeaver use this metadata to display table sizes, and since the estimates can easily be orders of magnitude greater than the actual size on disk, the reported sizes can alarm customers (e.g. Table size calculation using DATA_LENGTH in information schema is naive and massively overstates the size of tables dolthub/dolt#6624).

As a short-term fix to make these estimates more accurate, we apply a constant factor to the max table size. I came up with this scaling factor by measuring a best-case scenario (where no fields are variable length) and a worst-case scenario (where all fields are variable length and only use a few bytes), then picking a value roughly in the middle. Longer term, a better way to estimate table size on disk will be to use statistics data.
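As a rough illustration of the constant-factor idea described above, here is a minimal sketch in Go. The 0.25 factor and the helper name are hypothetical assumptions for illustration, not the values or code used in Dolt:

```go
package main

import "fmt"

// estimatedDataLength applies a constant scaling factor to the maximum
// possible table size, as a cheap stand-in for accounting for compression
// and partially-used variable-length fields.
//
// NOTE: the 0.25 factor and this helper are illustrative assumptions,
// not Dolt's actual values or code.
func estimatedDataLength(rowCount, maxRowSizeBytes uint64) uint64 {
	const scalingFactor = 0.25 // hypothetical midpoint between best and worst case
	return uint64(float64(rowCount*maxRowSizeBytes) * scalingFactor)
}

func main() {
	// 1M rows with a max row size of 64KB would otherwise report ~64GB.
	fmt.Println(estimatedDataLength(1_000_000, 64*1024))
}
```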
dolt diff --stat -r json

This PR tidies up the code for printing diffs, specifically for the JSON result format, and prints --stat correctly for the JSON result format. Additionally, we now throw an error for the SQL result format instead of just returning incorrect output. It might be worth implementing now, but I can just make an issue for it.

fixes: dolt diff --stat -r json produces invalid JSON dolthub/dolt#7800
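A minimal sketch of how emitting per-table stats through encoding/json guarantees syntactically valid output. The struct and its field names here are illustrative assumptions, not Dolt's actual --stat -r json schema:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// tableStat is a hypothetical shape for per-table diff statistics; the
// field names are illustrative, not Dolt's actual JSON output.
type tableStat struct {
	Table        string `json:"table"`
	RowsAdded    int    `json:"rows_added"`
	RowsModified int    `json:"rows_modified"`
	RowsDeleted  int    `json:"rows_deleted"`
}

func main() {
	stats := []tableStat{
		{Table: "employees", RowsAdded: 10, RowsModified: 2, RowsDeleted: 1},
	}
	// Marshaling the whole result at once produces valid JSON, as opposed
	// to hand-concatenating fragments into the output stream.
	out, err := json.Marshal(stats)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}
```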
jsonSerializer to load JSON from LazyJSONDocument
The Dolt database provider currently has a single init hook and a single drop hook; to support multiple hooks today, we chain them together. Binlog replication will also need to register similar init and drop hooks to capture database create/drop actions, so to prepare for that, this PR turns the single init hook and single drop hook into a slice of init hooks and a slice of drop hooks.
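A minimal sketch of the slice-of-hooks pattern this describes. The hook signature and method names below are hypothetical, not the actual Dolt provider API:

```go
package main

import (
	"context"
	"fmt"
)

// InitDatabaseHook is a hypothetical hook signature, not Dolt's actual one.
type InitDatabaseHook func(ctx context.Context, dbName string) error

// provider holds a slice of hooks instead of a single hook, so multiple
// subsystems (e.g. binlog replication) can each register their own.
type provider struct {
	initHooks []InitDatabaseHook
}

func (p *provider) AddInitHook(h InitDatabaseHook) {
	p.initHooks = append(p.initHooks, h)
}

// runInitHooks invokes every registered hook in order, stopping on error.
func (p *provider) runInitHooks(ctx context.Context, dbName string) error {
	for _, h := range p.initHooks {
		if err := h(ctx, dbName); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	p := &provider{}
	p.AddInitHook(func(ctx context.Context, dbName string) error {
		fmt.Println("replication hook: created", dbName)
		return nil
	})
	_ = p.runInitHooks(context.Background(), "mydb")
}
```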
--name-only option for dolt diff

This PR adds support for the --name-only option for dolt diff, which just prints the tables that have changed between the two commits. This mirrors git diff --name-only.

fixes: dolt diff ... that only shows the tables changed in a simpler format dolthub/dolt#7797

Provides support for serializing all Dolt data types into MySQL's binary encoding used in binlog events. Vitess provides good support for deserializing binary values from binlog events into Go data types, but doesn't provide any support for serializing types into MySQL's binary format. This PR pulls data out of Dolt's storage system and encodes it into MySQL's binary format. It would be interesting to split the Dolt-storage-specific code from the core MySQL serialization logic in the future, but this seems like the right first step.
Related to Dolt binlog Provider Support dolthub/dolt#7512
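As a rough sketch of what encoding values into MySQL's little-endian binary row format involves (a generic illustration, not Dolt's serialization code; the string handling in particular is a simplified assumption):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendInt32LE encodes a signed 32-bit value the way MySQL's row format
// stores INT columns: 4 bytes, little-endian.
func appendInt32LE(buf []byte, v int32) []byte {
	var b [4]byte
	binary.LittleEndian.PutUint32(b[:], uint32(v))
	return append(buf, b[:]...)
}

// appendShortString encodes a short string with a 1-byte length prefix,
// a simplified stand-in for how short VARCHAR values are laid out.
// (Real MySQL encoding varies with the column's maximum length; this is
// an illustrative assumption, not Dolt's implementation.)
func appendShortString(buf []byte, s string) []byte {
	buf = append(buf, byte(len(s)))
	return append(buf, s...)
}

func main() {
	var row []byte
	row = appendInt32LE(row, 42)
	row = appendShortString(row, "hello")
	fmt.Printf("% x\n", row)
}
```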
LazyJSONDocument when reading from a JSON column

This is the Dolt side of Dolt serializes and deserializes JSON unnecessarily. dolthub/dolt#7749

The GMS PR is Add LazyJSONDocument, which wraps a JSON string and only deserializes it if needed. dolthub/go-mysql-server#2470

LazyJSONDocument is an alternate implementation of sql.JSONWrapper that takes a string of serialized JSON and defers deserialization until it's actually required. This is useful because in the most common use case (selecting a JSON column), deserialization is never required.

In an extreme example, I created a table with 8000 rows, each row containing an 80KB JSON document. dolt sql -q "SELECT * FROM test_table" ran in 47 seconds using JSONDocument and 28 seconds using LazyJSONDocument, nearly half the time. Even in cases where we do need to deserialize the JSON in order to filter on it, we can avoid reserializing it afterward, which is still a performance win.
Of note: in some cases we use a special serializer (defined in json_encode.go::marshalToMySqlString) in order to produce a string that is, according to the docstring, "compatible with MySQL's JSON output, including spaces." This currently gets used
The last one is the most worrying, because it means that we can't avoid the serialization round-trip if we're connecting to a dolt server remotely. I discussed with Max whether or not we consider it a requirement to match MySQL's wire responses exactly for JSON, and agreed that we could probably relax that requirement. Casting a document to a text type will still result in the same output as MySQL.
Index builds now write keys to intermediate files and merge sort them before materializing the prolly tree for the secondary index. This contrasts with the previous default approach, which rebuilds the prolly tree each time we flush keys from memory. The old approach reads and writes most of the tree with random IO because the flushed keys are unsorted. The new approach structures the work for sequential IO by flushing sorted runs that are incrementally merge sorted. Sequential IO is dramatically faster on disk-based systems.
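A minimal sketch of the external merge-sort pattern described above, assuming string keys and temp files for the sorted runs; this is a generic illustration, not Dolt's index-build code:

```go
package main

import (
	"bufio"
	"container/heap"
	"fmt"
	"os"
	"sort"
)

// writeSortedRun sorts a batch of keys in memory and flushes it to a temp
// file as one sorted run (a sequential write).
func writeSortedRun(keys []string) (string, error) {
	sort.Strings(keys)
	f, err := os.CreateTemp("", "run-*.txt")
	if err != nil {
		return "", err
	}
	defer f.Close()
	w := bufio.NewWriter(f)
	for _, k := range keys {
		fmt.Fprintln(w, k)
	}
	return f.Name(), w.Flush()
}

// runHead is the current smallest key of one sorted run; runHeap is a
// min-heap over all run heads, used for the k-way merge phase.
type runHead struct {
	key     string
	scanner *bufio.Scanner
}

type runHeap []runHead

func (h runHeap) Len() int            { return len(h) }
func (h runHeap) Less(i, j int) bool  { return h[i].key < h[j].key }
func (h runHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *runHeap) Push(x interface{}) { *h = append(*h, x.(runHead)) }
func (h *runHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// mergeRuns streams the runs back in globally sorted order using only
// sequential reads of each run file.
func mergeRuns(paths []string, emit func(string)) error {
	h := &runHeap{}
	for _, p := range paths {
		f, err := os.Open(p)
		if err != nil {
			return err
		}
		defer f.Close()
		s := bufio.NewScanner(f)
		if s.Scan() {
			heap.Push(h, runHead{key: s.Text(), scanner: s})
		}
	}
	for h.Len() > 0 {
		head := heap.Pop(h).(runHead)
		emit(head.key)
		if head.scanner.Scan() {
			heap.Push(h, runHead{key: head.scanner.Text(), scanner: head.scanner})
		}
	}
	return nil
}

func main() {
	r1, _ := writeSortedRun([]string{"c", "a"})
	r2, _ := writeSortedRun([]string{"d", "b"})
	_ = mergeRuns([]string{r1, r2}, func(k string) { fmt.Println(k) })
}
```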
go-mysql-server

LazyJSONDocument, which wraps a JSON string and only deserializes it if needed.

This is the GMS side of Dolt serializes and deserializes JSON unnecessarily. dolthub/dolt#7749

This is a new JSONWrapper implementation. It isn't used by the GMS in-memory storage, but it will be used in Dolt to speed up SELECT queries that don't care about the structure of the JSON.

A big difference between this and JSONDocument is that even after it deserializes the JSON into a Go value, it continues to keep the string in memory. This is good in cases where we would want to re-serialize the JSON later without changing it (so statements like SELECT json FROM table WHERE json->>"$.key" = "foo"; will still be faster), but it has the downside of using more memory than JSONDocument.
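A minimal sketch of the lazy-deserialization idea, assuming a simplified shape in place of GMS's actual sql.JSONWrapper interface; the method names here are illustrative assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lazyJSONDocument keeps the original serialized string and only parses it
// on first use. This is a simplified stand-in, not GMS's real type.
type lazyJSONDocument struct {
	serialized string
	parsed     interface{}
	isParsed   bool
}

// String returns the stored JSON without ever parsing it; this is the
// fast path for plain "SELECT json_col" queries.
func (d *lazyJSONDocument) String() string {
	return d.serialized
}

// Value parses the JSON on first access and caches the result, so filters
// like json->>"$.key" only pay the deserialization cost once, and the
// original string remains available for re-serialization.
func (d *lazyJSONDocument) Value() (interface{}, error) {
	if !d.isParsed {
		if err := json.Unmarshal([]byte(d.serialized), &d.parsed); err != nil {
			return nil, err
		}
		d.isParsed = true
	}
	return d.parsed, nil
}

func main() {
	doc := &lazyJSONDocument{serialized: `{"key": "foo"}`}
	fmt.Println(doc.String()) // no parsing happened
	v, _ := doc.Value()       // parsed and cached
	fmt.Println(v)
}
```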
This PR consolidates the logic that validates whether an index can be created. Additionally, it fixes a bug where create table t (i int, index (i, i)); was allowed.

fixes: Prevent Indexing JSON Fields dolthub/dolt#6064
This PR also fixes a couple unrelated issues:
Closed Issues

dolt diff --stat -r json produces invalid JSON
dolt diff ... that only shows the tables changed in a simpler format