dolt 1.35.13 #170760

Merged
merged 2 commits into master from bump-dolt-1.35.13
May 3, 2024

Conversation

BrewTestBot
Member

Created by brew bump


Created with brew bump-formula-pr.

release notes
# Merged PRs

dolt

  • 7818: Apply a factor to better estimate information_schema.TABLES.DATA_LENGTH
    information_schema.TABLES.DATA_LENGTH currently reports the max possible table size for a table, and doesn't take into account table file compression or that variable length fields (e.g. TEXT) are not always fully used. Tools such as DBeaver use this metadata to display table sizes, and since the estimates can easily be orders of magnitude greater than the actual size on disk, it can cause customers to be concerned by the reported sizes (e.g. "Table size calculation using DATA_LENGTH in information schema is naive and massively overstates the size of tables", dolthub/dolt#6624).
    As a short-term fix to make these estimates more accurate, we apply a constant factor to the max table size. I came up with this scaling factor by measuring a best case scenario (where no fields are variable length) and a worst case scenario (where all fields are variable length and only use a few bytes), then picking a value roughly in the middle. Longer-term, a better way to estimate table size on disk will be to use statistics data. A small sketch of the scaling idea appears after this list.
  • 7810: fix output for dolt diff --stat -r json
    This PR tidies up the code for printing diffs and makes --stat print correctly for the JSON result format.
    Additionally, we now throw an error for the SQL result format instead of returning incorrect output; proper support for it is left for a follow-up issue.
    fixes: "dolt diff --stat -r json produces invalid JSON" (dolthub/dolt#7800)
  • 7809: go/libraries/doltcore/sqle/dprocedures: dolt_pull.go: Improve CPU utilization of call dolt_pull.
  • 7805: Fix: allow jsonSerializer to load JSON from LazyJSONDocument
  • 7804: Changing database init/drop hooks to be a slice of hooks
    The Dolt database provider currently has a single init hook and a single drop hook; to support multiple hooks today, they have to be chained together manually. Binlog replication will also need to register its own init and drop hooks to capture database create/drop actions, so to prepare for that, this PR turns the single init hook and single drop hook into a slice of init hooks and a slice of drop hooks (a minimal sketch of the pattern appears after this list).
  • 7802: adding --name-only option for dolt diff
    This PR adds support for the --name-only option for dolt diff, which prints just the tables that have changed between the two commits. This mirrors git diff --name-only.
    fixes: "dolt diff ... that only shows the tables changed in a simpler format" (dolthub/dolt#7797)
  • 7795: Serialization code for binlog events
    Provides support for serializing all Dolt data types into MySQL's binary encoding used in binlog events. Vitess provides good support for deserializing binary values from binlog events into Go datatypes, but doesn't provide any support for serializing types into MySQL's binary format. This PR pulls data out of Dolt's storage system and encodes it into MySQL's binary format. It would be interesting to split out the Dolt storage system specific code and the core MySQL serialization logic in the future, but this seems like the right first step.
    Related to "Dolt binlog Provider Support" (dolthub/dolt#7512)
  • 7785: Use LazyJSONDocument when reading from a JSON column.
    This is the Dolt side of "Dolt serializes and deserializes JSON unnecessarily." (dolthub/dolt#7749)
    The GMS PR is "Add LazyJSONDocument, which wraps a JSON string and only deserializes it if needed." (dolthub/go-mysql-server#2470)
    LazyJSONDocument is an alternate implementation of sql.JSONWrapper that takes a string of serialized JSON and defers deserialization until it's actually required.
    This is useful because in the most common use case (selecting a JSON column), deserialization is never required.
    In an extreme example, I created a table with 8000 rows, with each row containing an 80KB JSON document.
    dolt sql -q "SELECT * FROM test_table" ran in 47 seconds using JSONDocument, and 28 seconds using LazyJSONDocument, nearly half the time.
    Even in cases where we do need to deserialize the JSON in order to filter on it, we can avoid reserializing it afterward, which is still a performance win.
    Of note: In some cases we use a special serializer (defined in json_encode.go::marshalToMySqlString) in order to produce a string that is, according to the docstring "compatible with MySQL's JSON output, including spaces."
    This currently gets used:
    • In Query Diff
    • When hashing values for fulltext tables
    • When casting JSON columns to a text type
    • When writing values along the wire
      The last one is the most worrying, because it means that we can't avoid the serialization round-trip if we're connecting to a dolt server remotely. I discussed with Max whether or not we consider it a requirement to match MySQL's wire responses exactly for JSON, and we agreed that we could probably relax that requirement. Casting a document to a text type will still result in the same output as MySQL.
  • 7754: Index rebuilds with external key sorting
    Index builds now write keys to intermediate files and merge sort them before materializing the prolly tree for the secondary index. This contrasts with the previous default approach, which rebuilt the prolly tree each time keys were flushed from memory; because those flushed keys were unsorted, most of the tree had to be touched with random reads and writes. The new approach structures the work for sequential IO by flushing sorted runs that are incrementally merge sorted, which is dramatically faster on disk-based systems. A sketch of the sorted-runs-and-merge idea appears after this list.
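
The following is a minimal, hypothetical sketch of the scaling idea in 7818: scale the maximum possible table size by a constant factor chosen between a measured best case and worst case. The factor value and function names here are illustrative only, not the constants Dolt actually uses.

```go
package main

import "fmt"

// dataLengthFactor is a hypothetical constant, roughly midway between a best
// case (no variable-length fields) and a worst case (all fields are variable
// length and mostly empty). Dolt's actual value may differ.
const dataLengthFactor = 0.5

// estimateDataLength scales the maximum possible table size so that a value
// reported as information_schema.TABLES.DATA_LENGTH lands closer to the real
// size on disk.
func estimateDataLength(rowCount, maxRowSize uint64) uint64 {
	return uint64(float64(rowCount) * float64(maxRowSize) * dataLengthFactor)
}

func main() {
	// 1M rows with a 64KiB max row size would previously be reported as ~64GiB.
	fmt.Println(estimateDataLength(1_000_000, 64*1024))
}
```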
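
Next, a hedged sketch of the slice-of-hooks change described in 7804: instead of a single init hook that callers have to wrap and chain by hand, the provider holds a slice of hooks and runs each one in order. The type and method names here are illustrative, not Dolt's actual API.

```go
package main

import "fmt"

// InitDatabaseHook is an illustrative hook type; Dolt's real signature differs.
type InitDatabaseHook func(dbName string) error

// provider holds a slice of init hooks instead of a single hook, so multiple
// subsystems (e.g. binlog replication) can each register their own.
type provider struct {
	initHooks []InitDatabaseHook
}

// AddInitDatabaseHook appends a hook rather than replacing the previous one.
func (p *provider) AddInitDatabaseHook(h InitDatabaseHook) {
	p.initHooks = append(p.initHooks, h)
}

// runInitHooks invokes every registered hook in registration order, stopping
// at the first error.
func (p *provider) runInitHooks(dbName string) error {
	for _, h := range p.initHooks {
		if err := h(dbName); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	p := &provider{}
	p.AddInitDatabaseHook(func(db string) error { fmt.Println("replication hook:", db); return nil })
	p.AddInitDatabaseHook(func(db string) error { fmt.Println("metrics hook:", db); return nil })
	_ = p.runInitHooks("mydb")
}
```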
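
Finally, a rough sketch of the external-sort approach in 7754, under simplifying assumptions (plain string keys, newline-delimited temp files): sorted runs are flushed to disk and then k-way merged so that the final index can be written with sequential IO. This is not Dolt's actual file format or API.

```go
package main

import (
	"bufio"
	"container/heap"
	"fmt"
	"os"
	"sort"
)

// flushRun sorts a batch of keys in memory and writes them to a temp file,
// one key per line, returning the file path of the sorted run.
func flushRun(keys []string) (string, error) {
	sort.Strings(keys)
	f, err := os.CreateTemp("", "index-run-*")
	if err != nil {
		return "", err
	}
	defer f.Close()
	w := bufio.NewWriter(f)
	for _, k := range keys {
		fmt.Fprintln(w, k)
	}
	return f.Name(), w.Flush()
}

// runHeap is a min-heap keyed on the current head key of each sorted run.
type runItem struct {
	key     string
	scanner *bufio.Scanner
}
type runHeap []runItem

func (h runHeap) Len() int            { return len(h) }
func (h runHeap) Less(i, j int) bool  { return h[i].key < h[j].key }
func (h runHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *runHeap) Push(x interface{}) { *h = append(*h, x.(runItem)) }
func (h *runHeap) Pop() interface{} {
	old := *h
	it := old[len(old)-1]
	*h = old[:len(old)-1]
	return it
}

// mergeRuns streams keys from all runs in globally sorted order, calling emit
// for each key; in Dolt this is roughly where the prolly tree would be
// materialized with sequential writes.
func mergeRuns(paths []string, emit func(string)) error {
	h := &runHeap{}
	for _, p := range paths {
		f, err := os.Open(p)
		if err != nil {
			return err
		}
		defer f.Close()
		s := bufio.NewScanner(f)
		if s.Scan() {
			heap.Push(h, runItem{key: s.Text(), scanner: s})
		}
	}
	for h.Len() > 0 {
		it := heap.Pop(h).(runItem)
		emit(it.key)
		if it.scanner.Scan() {
			heap.Push(h, runItem{key: it.scanner.Text(), scanner: it.scanner})
		}
	}
	return nil
}

func main() {
	r1, _ := flushRun([]string{"cherry", "apple"})
	r2, _ := flushRun([]string{"banana", "date"})
	_ = mergeRuns([]string{r1, r2}, func(k string) { fmt.Println(k) })
}
```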

go-mysql-server

  • 2485: Have LazyJSONDocument implement fmt.Stringer and driver.Valuer, in order to interoperate with other Go SQL libraries.
  • 2470: Add LazyJSONDocument, which wraps a JSON string and only deserializes it if needed.
    This is the GMS side of "Dolt serializes and deserializes JSON unnecessarily." (dolthub/dolt#7749)
    This is a new JSONWrapper implementation. It isn't used by the GMS in-memory storage, but it will be used in Dolt to speed up SELECT queries that don't care about the structure of the JSON.
    A big difference between this and JSONDocument is that even after it de-serializes the JSON into a Go value, it keeps the original string in memory. This is good in cases where we want to re-serialize the JSON later without changing it (so statements like SELECT json FROM table WHERE json->>"$.key" = "foo"; will still be faster), with the downside of using more memory than JSONDocument. A minimal sketch of the lazy-wrapper idea appears after this list.
  • 2469: refactor index validation and prevent indexes over json columns
    This PR consolidates the logic for validating index definitions.
    Additionally, it fixes a bug where create table t (i int, index (i, i)); was allowed.
    fixes: "Prevent Indexing JSON Fields" (dolthub/dolt#6064)
  • 2466: Schema-qualified table names
    This PR also fixes a couple of unrelated issues:
    • IMDB query plans are brought up to date (this accounts for most of the changed lines)
    • Fixed bugs in certain show statements (information_schema tests)
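
A minimal, self-contained sketch of the lazy-deserialization idea behind LazyJSONDocument (2470): keep the serialized string and only call json.Unmarshal the first time a caller needs the structured value. The type and methods here are illustrative, not GMS's actual implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// lazyJSON keeps the original serialized form and defers parsing until the
// structured value is actually requested.
type lazyJSON struct {
	serialized string
	once       sync.Once
	value      interface{}
	err        error
}

// String returns the stored text with no deserialization at all, which covers
// the common case of simply selecting a JSON column.
func (l *lazyJSON) String() string { return l.serialized }

// Value parses the JSON at most once and caches the result; only callers that
// need to inspect the document (e.g. a ->> filter) pay for the parse.
func (l *lazyJSON) Value() (interface{}, error) {
	l.once.Do(func() {
		l.err = json.Unmarshal([]byte(l.serialized), &l.value)
	})
	return l.value, l.err
}

func main() {
	doc := &lazyJSON{serialized: `{"key": "foo"}`}
	fmt.Println(doc.String()) // no parse needed

	v, err := doc.Value() // parse happens here, exactly once
	if err == nil {
		fmt.Println(v.(map[string]interface{})["key"])
	}
}
```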

Closed Issues

  • 7813: Ability to export diffs as SQL
  • 6624: Table size calculation using DATA_LENGTH in information schema is naive and massively overstates the size of tables
  • 7800: dolt diff --stat -r json produces invalid JSON
  • 7749: Dolt serializes and deserializes JSON unnecessarily.
  • 7797: dolt diff ... that only shows the tables changed in a simpler format

@github-actions github-actions bot added the `go` (Go use is a significant feature of the PR or issue) and `bump-formula-pr` (PR was created using `brew bump-formula-pr`) labels May 3, 2024
Contributor

github-actions bot commented May 3, 2024

🤖 An automated task has requested bottles to be published to this PR.

@github-actions github-actions bot added the `CI-published-bottle-commits` (The commits for the built bottles have been pushed to the PR branch.) label May 3, 2024
@BrewTestBot BrewTestBot enabled auto-merge May 3, 2024 22:13
@BrewTestBot BrewTestBot added this pull request to the merge queue May 3, 2024
Merged via the queue into master with commit 4b86b27 May 3, 2024
14 checks passed
@BrewTestBot BrewTestBot deleted the bump-dolt-1.35.13 branch May 3, 2024 22:19
@github-actions github-actions bot added the `outdated` (PR was locked due to age) label Jun 3, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 3, 2024