Created by brew bump
Created with brew bump-formula-pr
information_schema.TABLES.DATA_LENGTH currently reports the maximum possible size for a table, and doesn't take into account table file compression or the fact that variable-length fields (e.g. TEXT) are not always fully used. Tools such as DBeaver use this metadata to display table sizes, and since the estimates can easily be orders of magnitude greater than the actual size on disk, the reported sizes can alarm customers (e.g. Table size calculation using DATA_LENGTH in information schema is naive and massively overstates the size of tables dolthub/dolt#6624).

As a short-term fix to make these estimates more accurate, we apply a constant factor to the max table size. I came up with this scaling factor by measuring a best-case scenario (where no fields are variable length) and a worst-case scenario (where all fields are variable length and only use a few bytes), then picking a value roughly in the middle. Longer term, a better way to estimate table size on disk will be to use statistics data.
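As a rough illustration of the constant-factor idea described above, here is a minimal sketch in Go. The 0.25 factor and the helper name are hypothetical assumptions for illustration, not the values or code used in Dolt:

```go
package main

import "fmt"

// estimatedDataLength applies a constant scaling factor to the maximum
// possible table size, as a cheap stand-in for accounting for compression
// and partially-used variable-length fields.
//
// NOTE: the 0.25 factor and this helper are illustrative assumptions,
// not Dolt's actual values or code.
func estimatedDataLength(rowCount, maxRowSizeBytes uint64) uint64 {
	const scalingFactor = 0.25 // hypothetical midpoint between best and worst case
	return uint64(float64(rowCount*maxRowSizeBytes) * scalingFactor)
}

func main() {
	// 1M rows with a max row size of 64KB would otherwise report ~64GB.
	fmt.Println(estimatedDataLength(1_000_000, 64*1024))
}
```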
dolt diff --stat -r json

This PR tidies up the code for printing diffs, specifically for the JSON result format, and prints --stat correctly for the JSON result format. Additionally, we now throw an error for the SQL result format instead of just returning incorrect output. It might be worth implementing now, but I can just make an issue for it.

fixes: dolt diff --stat -r json produces invalid JSON dolthub/dolt#7800
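A minimal sketch of how emitting per-table stats through encoding/json guarantees syntactically valid output. The struct and its field names here are illustrative assumptions, not Dolt's actual --stat -r json schema:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// tableStat is a hypothetical shape for per-table diff statistics; the
// field names are illustrative, not Dolt's actual JSON output.
type tableStat struct {
	Table        string `json:"table"`
	RowsAdded    int    `json:"rows_added"`
	RowsModified int    `json:"rows_modified"`
	RowsDeleted  int    `json:"rows_deleted"`
}

func main() {
	stats := []tableStat{
		{Table: "employees", RowsAdded: 10, RowsModified: 2, RowsDeleted: 1},
	}
	// Marshaling the whole result at once produces valid JSON, as opposed
	// to hand-concatenating fragments into the output stream.
	out, err := json.Marshal(stats)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}
```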
jsonSerializer to load JSON from LazyJSONDocument
The Dolt database provider currently has a single init hook and a single drop hook; to support multiple hooks today, we chain them together. Binlog replication will also need to register similar init and drop hooks to capture database create/drop actions, so to prepare for that, this PR turns the single init hook and single drop hook into a slice of init hooks and a slice of drop hooks.
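A minimal sketch of the slice-of-hooks pattern this describes. The hook signature and method names below are hypothetical, not the actual Dolt provider API:

```go
package main

import (
	"context"
	"fmt"
)

// InitDatabaseHook is a hypothetical hook signature, not Dolt's actual one.
type InitDatabaseHook func(ctx context.Context, dbName string) error

// provider holds a slice of hooks instead of a single hook, so multiple
// subsystems (e.g. binlog replication) can each register their own.
type provider struct {
	initHooks []InitDatabaseHook
}

func (p *provider) AddInitHook(h InitDatabaseHook) {
	p.initHooks = append(p.initHooks, h)
}

// runInitHooks invokes every registered hook in order, stopping on error.
func (p *provider) runInitHooks(ctx context.Context, dbName string) error {
	for _, h := range p.initHooks {
		if err := h(ctx, dbName); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	p := &provider{}
	p.AddInitHook(func(ctx context.Context, dbName string) error {
		fmt.Println("replication hook: created", dbName)
		return nil
	})
	_ = p.runInitHooks(context.Background(), "mydb")
}
```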
--name-only option for dolt diff

This PR adds support for the --name-only option for dolt diff, which just prints the tables that have changed between the two commits. This mirrors git diff --name-only.

fixes: dolt diff ... that only shows the tables changed in a simpler format dolthub/dolt#7797

Provides support for serializing all Dolt data types into MySQL's binary encoding used in binlog events. Vitess provides good support for deserializing binary values from binlog events into Go data types, but doesn't provide any support for serializing types into MySQL's binary format. This PR pulls data out of Dolt's storage system and encodes it into MySQL's binary format. It would be interesting to split the Dolt-storage-specific code from the core MySQL serialization logic in the future, but this seems like the right first step.
Related to Dolt binlog Provider Support dolthub/dolt#7512
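As a rough sketch of what encoding values into MySQL's little-endian binary row format involves (a generic illustration, not Dolt's serialization code; the string handling in particular is a simplified assumption):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendInt32LE encodes a signed 32-bit value the way MySQL's row format
// stores INT columns: 4 bytes, little-endian.
func appendInt32LE(buf []byte, v int32) []byte {
	var b [4]byte
	binary.LittleEndian.PutUint32(b[:], uint32(v))
	return append(buf, b[:]...)
}

// appendShortString encodes a short string with a 1-byte length prefix,
// a simplified stand-in for how short VARCHAR values are laid out.
// (Real MySQL encoding varies with the column's maximum length; this is
// an illustrative assumption, not Dolt's implementation.)
func appendShortString(buf []byte, s string) []byte {
	buf = append(buf, byte(len(s)))
	return append(buf, s...)
}

func main() {
	var row []byte
	row = appendInt32LE(row, 42)
	row = appendShortString(row, "hello")
	fmt.Printf("% x\n", row)
}
```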
LazyJSONDocument when reading from a JSON column

This is the Dolt side of Dolt serializes and deserializes JSON unnecessarily. dolthub/dolt#7749

The GMS PR is Add LazyJSONDocument, which wraps a JSON string and only deserializes it if needed. dolthub/go-mysql-server#2470

LazyJSONDocument is an alternate implementation of sql.JSONWrapper that takes a string of serialized JSON and defers deserialization until it's actually required. This is useful because in the most common use case (selecting a JSON column), deserialization is never required.

In an extreme example, I created a table with 8000 rows, each row containing an 80KB JSON document. dolt sql -q "SELECT * FROM test_table" ran in 47 seconds using JSONDocument and 28 seconds using LazyJSONDocument, nearly half the time. Even in cases where we do need to deserialize the JSON in order to filter on it, we can avoid reserializing it afterward, which is still a performance win.
Of note: in some cases we use a special serializer (defined in json_encode.go::marshalToMySqlString) in order to produce a string that is, according to the docstring, "compatible with MySQL's JSON output, including spaces." This currently gets used
The last one is the most worrying, because it means that we can't avoid the serialization round-trip if we're connecting to a dolt server remotely. I discussed with Max whether or not we consider it a requirement to match MySQL's wire responses exactly for JSON, and agreed that we could probably relax that requirement. Casting a document to a text type will still result in the same output as MySQL.
Index builds now write keys to intermediate files and merge sort them before materializing the prolly tree for the secondary index. This contrasts with the previous default approach, which rebuilds the prolly tree each time we flush keys from memory. The old approach reads and writes most of the tree with random IO because the flushed keys are unsorted. The new approach structures the work for sequential IO by flushing sorted runs that are incrementally merge sorted. Sequential IO is dramatically faster on disk-based systems.
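A minimal sketch of the external merge-sort pattern described above, assuming string keys and temp files for the sorted runs; this is a generic illustration, not Dolt's index-build code:

```go
package main

import (
	"bufio"
	"container/heap"
	"fmt"
	"os"
	"sort"
)

// writeSortedRun sorts a batch of keys in memory and flushes it to a temp
// file as one sorted run (a sequential write).
func writeSortedRun(keys []string) (string, error) {
	sort.Strings(keys)
	f, err := os.CreateTemp("", "run-*.txt")
	if err != nil {
		return "", err
	}
	defer f.Close()
	w := bufio.NewWriter(f)
	for _, k := range keys {
		fmt.Fprintln(w, k)
	}
	return f.Name(), w.Flush()
}

// runHead is the current smallest key of one sorted run; runHeap is a
// min-heap over all run heads, used for the k-way merge phase.
type runHead struct {
	key     string
	scanner *bufio.Scanner
}

type runHeap []runHead

func (h runHeap) Len() int            { return len(h) }
func (h runHeap) Less(i, j int) bool  { return h[i].key < h[j].key }
func (h runHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *runHeap) Push(x interface{}) { *h = append(*h, x.(runHead)) }
func (h *runHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// mergeRuns streams the runs back in globally sorted order using only
// sequential reads of each run file.
func mergeRuns(paths []string, emit func(string)) error {
	h := &runHeap{}
	for _, p := range paths {
		f, err := os.Open(p)
		if err != nil {
			return err
		}
		defer f.Close()
		s := bufio.NewScanner(f)
		if s.Scan() {
			heap.Push(h, runHead{key: s.Text(), scanner: s})
		}
	}
	for h.Len() > 0 {
		head := heap.Pop(h).(runHead)
		emit(head.key)
		if head.scanner.Scan() {
			heap.Push(h, runHead{key: head.scanner.Text(), scanner: head.scanner})
		}
	}
	return nil
}

func main() {
	r1, _ := writeSortedRun([]string{"c", "a"})
	r2, _ := writeSortedRun([]string{"d", "b"})
	_ = mergeRuns([]string{r1, r2}, func(k string) { fmt.Println(k) })
}
```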
go-mysql-server

LazyJSONDocument, which wraps a JSON string and only deserializes it if needed.

This is the GMS side of Dolt serializes and deserializes JSON unnecessarily. dolthub/dolt#7749

This is a new JSONWrapper implementation. It isn't used by the GMS in-memory storage, but it will be used in Dolt to speed up SELECT queries that don't care about the structure of the JSON.

A big difference between this and JSONDocument is that even after it deserializes the JSON into a Go value, it continues to keep the string in memory. This is good in cases where we would want to re-serialize the JSON later without changing it (so statements like SELECT json FROM table WHERE json->>"$.key" = "foo"; will still be faster), but it has the downside of using more memory than JSONDocument.
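A minimal sketch of the lazy-deserialization idea, assuming a simplified shape in place of GMS's actual sql.JSONWrapper interface; the method names here are illustrative assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lazyJSONDocument keeps the original serialized string and only parses it
// on first use. This is a simplified stand-in, not GMS's real type.
type lazyJSONDocument struct {
	serialized string
	parsed     interface{}
	isParsed   bool
}

// String returns the stored JSON without ever parsing it; this is the
// fast path for plain "SELECT json_col" queries.
func (d *lazyJSONDocument) String() string {
	return d.serialized
}

// Value parses the JSON on first access and caches the result, so filters
// like json->>"$.key" only pay the deserialization cost once, and the
// original string remains available for re-serialization.
func (d *lazyJSONDocument) Value() (interface{}, error) {
	if !d.isParsed {
		if err := json.Unmarshal([]byte(d.serialized), &d.parsed); err != nil {
			return nil, err
		}
		d.isParsed = true
	}
	return d.parsed, nil
}

func main() {
	doc := &lazyJSONDocument{serialized: `{"key": "foo"}`}
	fmt.Println(doc.String()) // no parsing happened
	v, _ := doc.Value()       // parsed and cached
	fmt.Println(v)
}
```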
This PR consolidates the logic that validates whether an index can be created. Additionally, it fixes a bug where create table t (i int, index (i, i)); was allowed.

fixes: Prevent Indexing JSON Fields dolthub/dolt#6064
This PR also fixes a couple unrelated issues:
Closed Issues

dolt diff --stat -r json produces invalid JSON
dolt diff ... that only shows the tables changed in a simpler format