Rewrite data file parser; Insert _tidb_rowid when needed; Update checkpoint structure #82
Hi contributor, thanks for your PR. This patch needs to be approved by one of the admins. They should reply with "/ok-to-test" to accept this PR for running tests automatically.
/run-all-tests
Force-pushed 3d9dde9 to fd3d23d.
The new lexer is 8x faster than MDDataReader. Speed now matters because we are going to read the entire file to get an accurate row count per chunk.
Force-pushed fd3d23d to ea5e94c.
Force-pushed ea5e94c to 47d5df6.
/run-all-tests
PTAL @GregoryIan
lightning/restore/checkpoints.go
}

func (merger *ChunkCheckpointMerger) MergeInto(cpd *TableCheckpointDiff) {
	cpd.hasChecksum = true
	cpd.hasChunks = true
	cpd.allocBase = merger.AllocBase
I've always wanted to ask: must it be increasing?
It should be. I'm adding a MaxInt64 call to ensure that.
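A minimal sketch of what the reply describes, assuming MaxInt64 is a plain two-argument max helper and using trimmed-down, hypothetical versions of TableCheckpointDiff and ChunkCheckpointMerger that carry only the fields visible in the checkpoints.go excerpt (the real structs have more):

```go
package main

// TableCheckpointDiff is a hypothetical, trimmed-down stand-in for the real
// struct; only the fields touched in the review excerpt are modeled.
type TableCheckpointDiff struct {
	hasChecksum bool
	hasChunks   bool
	allocBase   int64
}

// ChunkCheckpointMerger is likewise a hypothetical subset of the real type.
type ChunkCheckpointMerger struct {
	AllocBase int64
}

// MaxInt64 returns the larger of two int64 values (assumed helper shape).
func MaxInt64(a, b int64) int64 {
	if a > b {
		return a
	}
	return b
}

// MergeInto folds the merger's state into the diff. Taking the maximum keeps
// allocBase non-decreasing even if a merger with a smaller base arrives later.
func (merger *ChunkCheckpointMerger) MergeInto(cpd *TableCheckpointDiff) {
	cpd.hasChecksum = true
	cpd.hasChunks = true
	cpd.allocBase = MaxInt64(cpd.allocBase, merger.AllocBase)
}
```

With a plain assignment, merging a base of 40 after a base of 100 would move allocBase backwards; with MaxInt64 it stays at 100.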
/run-all-tests
back_quoted = '`' ^'`'* '`';
unquoted = ^([,;()'"`] | space)+;

row = '(' (^[)'"`] | single_quoted | double_quoted | back_quoted)* ')';
I'm not sure about this one; maybe we need more tests or descriptions.
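To describe the `row` rule concretely, here is a hand-written Go approximation of what it matches. It is a sketch, not the ragel machine: single_quoted and double_quoted are not shown in the excerpt, so it assumes all three quote kinds behave like back_quoted (a run of non-quote characters between matching quotes, no backslash escapes). matchRow and matchQuoted are hypothetical names.

```go
package main

// matchQuoted assumes s[i] is a quote character and returns the index just
// past the matching closing quote, or -1 if the string is unterminated.
// Like back_quoted, it does not handle backslash escapes.
func matchQuoted(s string, i int) int {
	q := s[i]
	for j := i + 1; j < len(s); j++ {
		if s[j] == q {
			return j + 1
		}
	}
	return -1
}

// matchRow returns the length of the `row` token at the start of s, or -1.
// A row is '(' followed by any mix of non-special bytes and quoted runs,
// terminated by the first unquoted ')'.
func matchRow(s string) int {
	if len(s) == 0 || s[0] != '(' {
		return -1
	}
	i := 1
	for i < len(s) {
		switch s[i] {
		case ')':
			return i + 1 // first unquoted ')' closes the row
		case '\'', '"', '`':
			end := matchQuoted(s, i)
			if end < 0 {
				return -1
			}
			i = end
		default:
			i++ // includes '(' : the grammar does not exclude it
		}
	}
	return -1
}
```

As in the grammar, parentheses do not nest (an unquoted ')' always ends the row), while a doubled quote such as 'it''s' is consumed as two adjacent quoted runs.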
@july2993 and I will review it again after we handle the _row_id rebase.
On first read, we reset the allocator base to the maximum of:
1. the AUTO_INCREMENT option of the CREATE TABLE statement, and
2. the total number of rows.

This ensures that writes after the import will not clobber existing rows through overlapping _tidb_rowid values.
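One way this could fit together with exact per-chunk row counts is sketched below: each chunk gets a fixed, contiguous _tidb_rowid range, so re-importing a chunk would reuse the same row IDs, and the allocator base is the maximum of the AUTO_INCREMENT value and the total row count. Chunk, PrevRowIDMax, MaxRowID and assignRowIDs are hypothetical names, and the range-per-chunk scheme is an assumption rather than the PR's confirmed design.

```go
package main

// Chunk is a hypothetical description of one data-file chunk. RowCount is
// assumed to be exact, i.e. obtained by scanning the whole chunk.
type Chunk struct {
	RowCount int64
	// The chunk owns the _tidb_rowid range (PrevRowIDMax, MaxRowID].
	PrevRowIDMax int64
	MaxRowID     int64
}

// assignRowIDs gives each chunk a contiguous row ID range based on the
// cumulative row count, then returns the allocator base to use after the
// import: max(AUTO_INCREMENT, total number of rows).
func assignRowIDs(chunks []Chunk, autoIncrement int64) int64 {
	var total int64
	for i := range chunks {
		chunks[i].PrevRowIDMax = total
		total += chunks[i].RowCount
		chunks[i].MaxRowID = total
	}
	if autoIncrement > total {
		return autoIncrement
	}
	return total
}
```

Because the ranges depend only on the row counts, running the assignment again yields the same ranges, which is what would make importing the same chunk twice write identical _tidb_rowid values.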
I believe TiDB isn't going to accept any PR which doesn't handle the
/ok-to-test
/run-all-tests
@july2993 PTAL
@csuzhangxc @amyangfei PTAL
Force-pushed ca1eab3 to ed37dbe.
/run-all-tests
start = time.Now()
kvs, _, err := kvEncoder.SQL2KV(sqls.String())
metrics.MarkTiming(encodeMark, start)
common.AppLogger.Debugf("len(kvs) %d, len(sql) %d", len(kvs), sqls.Len())
Is using sqls.Len() (the number of bytes) for len(sql) significant?
Not sure if it's significant. This is just a migration of existing code:
tidb-lightning/lightning/restore/restore.go
Lines 1115 to 1117 in 2ca7481
kvs, affectedRows, err := kvEncoder.SQL2KV(stmt)
metrics.MarkTiming(encodeMark, start)
common.AppLogger.Debugf("len(kvs) %d, len(sql) %d", len(kvs), len(stmt))
/run-integration-test
LGTM
LGTM
/run-unit-test
What problem does this PR solve?
Completely fix TOOL-462 by recording the _tidb_rowid on non-PkIsHandle tables, which ensures idempotence when the same chunk is imported twice.

What is changed and how it works?
- Changed splitFuzzyRegion back to an exact version.
- Replaced MDDataReader by a ragel-based parser, which is about 8x faster on my machine.

Check List
Tests
Code changes
Side effects
Related changes
tidb-ansible repository