import multiple tables at same time - 1 #2191
Conversation
A few comments.
yb-voyager/cmd/importData.go
if err != nil {
	utils.ErrExit("preparing for file import: %s", err)
}
Do we need to do this PrepareForFileImport here? We are already doing it in NewFileBatchProducer.
lastBatchNumber: lastBatchNumber,
lastOffset:      lastOffset,
fileFullySplit:  fileFullySplit,
completed:       completed,
nit: completed: len(pendingBatches) == 0 && fileFullySplit
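The nit above suggests deriving the flag from its invariant inside the constructor rather than threading a separately computed boolean through. A minimal sketch of that idea (the struct and field names are simplified stand-ins for the PR's types, not the actual implementation):

```go
package main

import "fmt"

// Batch is a simplified stand-in for the PR's batch type.
type Batch struct{ Number int64 }

// FileBatchProducer is a simplified stand-in for the type in importData.go.
type FileBatchProducer struct {
	pendingBatches []*Batch
	fileFullySplit bool
	completed      bool
}

// newFileBatchProducer derives `completed` from the recovered state instead of
// accepting it as an extra argument, so the invariant lives in one place.
func newFileBatchProducer(pending []*Batch, fileFullySplit bool) *FileBatchProducer {
	return &FileBatchProducer{
		pendingBatches: pending,
		fileFullySplit: fileFullySplit,
		completed:      len(pending) == 0 && fileFullySplit,
	}
}

func main() {
	p := newFileBatchProducer(nil, true)
	fmt.Println(p.completed) // fully split with nothing pending: already done
	q := newFileBatchProducer([]*Batch{{Number: 1}}, true)
	fmt.Println(q.completed) // a batch is still pending: not done
}
```

The benefit is that callers cannot construct a producer whose `completed` flag disagrees with its pending-batch state.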
	return nil, err
}
if p.lineFromPreviousBatch != "" {
	err = batchWriter.WriteRecord(p.lineFromPreviousBatch)
Add a comment explaining this lineFromPreviousBatch.
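For context on what such a comment would describe: when a record is read but no longer fits in the current batch, it has to be remembered and written as the first record of the next batch. A self-contained sketch of that carry-over pattern (record slices stand in for the PR's file reader and batch writer, so treat the names as illustrative):

```go
package main

import "fmt"

// produceBatches splits records into batches of at most maxBytes bytes.
// A record that is read but does not fit in the current batch is remembered
// in carry (the PR calls this lineFromPreviousBatch) and written first when
// the next batch is started, so no record is lost at a batch boundary.
func produceBatches(records []string, maxBytes int) [][]string {
	var batches [][]string
	var current []string
	size := 0
	carry := "" // record read from the source but not yet written to a batch
	i := 0
	for i < len(records) || carry != "" {
		var rec string
		if carry != "" {
			rec = carry // seed the new batch with the carried-over record
			carry = ""
		} else {
			rec = records[i]
			i++
		}
		if size+len(rec) > maxBytes && len(current) > 0 {
			// Record doesn't fit: close this batch and carry it over.
			batches = append(batches, current)
			current, size = nil, 0
			carry = rec
			continue
		}
		current = append(current, rec)
		size += len(rec)
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}

func main() {
	batches := produceBatches([]string{"aa", "bb", "cc"}, 4)
	fmt.Println(len(batches)) // "cc" was carried into a second batch
}
```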
}

// 3 batches should be produced
// while calculating for the first batch, the header is also considered
Oh right, while preparing the first batch we add the header's bytes to the batch's total bytes, but for the subsequent batches we don't: since we already have the header, we don't include it in the batch's bytes.
I think it is worth testing whether, in cases where the number of columns is huge, the header's bytes can contribute significantly to the batches' bytes and should be included.
Can you please add a TODO where we add the header to the batch file, to fix this if required?
Yeah, did not want to change the implementation as part of this PR. Will add a TODO.
@makalaaneesh @priyanshi-yb I think we should be uniform across all the batches: either consider it in all of them or in none, which we can discuss.
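To make the asymmetry the thread is discussing concrete, here is a small sketch of the accounting it describes: the header's bytes are charged only to the first batch, even though every batch file physically contains the header. The function name and shape are hypothetical, chosen for illustration:

```go
package main

import "fmt"

// batchBytes models the byte accounting discussed above: the header is
// counted toward the first batch's size only. With a very wide table (a long
// header) this skews the first batch relative to the rest, which is what the
// requested TODO asks to revisit.
func batchBytes(header string, records []string, isFirstBatch bool) int {
	total := 0
	if isFirstBatch {
		total += len(header) // header counted only for the first batch
	}
	for _, r := range records {
		total += len(r)
	}
	return total
}

func main() {
	header := "id,val"
	recs := []string{"1,hello", "2,world"}
	fmt.Println(batchBytes(header, recs, true))  // first batch: header included
	fmt.Println(batchBytes(header, recs, false)) // later batch: header excluded
}
```

Making this uniform would mean either adding `len(header)` in both branches or in neither.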
assert.NotNil(t, batch1)
assert.Equal(t, int64(2), batch1.RecordCount)

// simulate a crash and recover
Nice test for recovery situation!
LGTM!
LGTM. Added mostly minor comments.
Thanks for adding the unit tests; continuing to add tests like these will indeed help make the codebase more robust.
batch := p.pendingBatches[0]
p.pendingBatches = p.pendingBatches[1:]
// file is fully split and returning the last batch, so mark the producer as completed
if len(p.pendingBatches) == 0 && p.fileFullySplit {
	p.completed = true
}
Here we are setting it as complete before the last batch is processed.
Should we be setting this only when there is actually no batch available further?
Suggesting something like this:
if len(p.pendingBatches) > 0 {
	batch := p.pendingBatches[0]
	p.pendingBatches = p.pendingBatches[1:]
	return batch, nil
} else if len(p.pendingBatches) == 0 && p.fileFullySplit {
	p.completed = true
}
@makalaaneesh this one might be important.
@sanyamsinghal Since that is the last batch that we are returning (the file is fully split and we are picking the last pending batch), no batches are available further, so it made sense to mark it as done.
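The resolution above can be sketched end to end. This is a simplified, self-contained version of the semantics being defended (the PR's real types have more fields; names here are stand-ins): `NextBatch` marks the producer completed at the moment it hands out the final batch, rather than requiring one extra call that returns nothing.

```go
package main

import "fmt"

// Batch is a simplified stand-in for the PR's batch type.
type Batch struct{ Number int }

// FileBatchProducer is a simplified stand-in for the type under review.
type FileBatchProducer struct {
	pendingBatches []*Batch
	fileFullySplit bool
	completed      bool
}

// NextBatch returns the next pending batch. When the file is fully split and
// this call hands out the last pending batch, nothing further can ever be
// produced, so the producer marks itself completed immediately.
func (p *FileBatchProducer) NextBatch() (*Batch, error) {
	if len(p.pendingBatches) == 0 {
		return nil, fmt.Errorf("no pending batches")
	}
	batch := p.pendingBatches[0]
	p.pendingBatches = p.pendingBatches[1:]
	if len(p.pendingBatches) == 0 && p.fileFullySplit {
		p.completed = true
	}
	return batch, nil
}

// Done reports whether the producer has handed out every batch it will make.
func (p *FileBatchProducer) Done() bool { return p.completed }

func main() {
	p := &FileBatchProducer{
		pendingBatches: []*Batch{{Number: 1}, {Number: 2}},
		fileFullySplit: true,
	}
	b, _ := p.NextBatch()
	fmt.Println(b.Number, p.Done()) // first batch: not yet done
	b, _ = p.NextBatch()
	fmt.Println(b.Number, p.Done()) // last batch handed out: done
}
```

Note that "completed" here means "nothing left to produce", not "every batch was imported"; the latter is tracked elsewhere by the import state.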
}, nil
}

func (p *FileBatchProducer) Done() bool {
nit: p --> producer?
return d.maxSizeBytes
}

func createTempFile(dir string, fileContents string) (string, error) {
Consider moving helper functions like this to test/utils/testutils.go (the testutils package).
return ldataDir, lexportDir, state, nil
}

func setupFileForTest(lexportDir string, fileContents string, dir string, tableName string) (string, *ImportFileTask, error) {
Could this have a more explicit function name?
assert.Equal(t, int64(2), batches[0].RecordCount)
batchContents, err := os.ReadFile(batches[0].GetFilePath())
assert.NoError(t, err)
assert.Equal(t, "id,val\n1, \"hello\"\n2, \"world\"", string(batchContents))
nit: define a var for the expected values.
// 3 batches should be produced
// while calculating for the first batch, the header is also considered
assert.Equal(t, 3, len(batches))
// each of length 2
remove comment?
Other cases which can be tested here:
- Errors due to the data file, e.g. a syntax error.
- Resumability, after fixing that error in the main data file.
- I think we support two types of data files, csv and text, so having coverage from that perspective is also good.
- Any variation in the content of the data file, especially csv? Although that is something to be tested in the data file package, if it is not covered there we can add it here as well.
Good ideas -
1, 2: These are more about the full import, so not applicable in the case of a FileBatchProducer, which just produces the batches. Can be taken up when I add the FileTaskImporter 👍
3, 4: Agreed, but as you said, better suited for the dataFile package.
Describe the changes in this pull request
FileBatchProducer
Describe if there are any user-facing changes
How was this pull request tested?
Wrote unit tests.
To run integration tests:
Does your PR have changes that can cause upgrade issues?