
import multiple tables at same time - 1 #2191

Merged: 18 commits merged into main from aneesh/import-multiple-tables-at-same-time-1 on Jan 22, 2025

Conversation

@makalaaneesh (Collaborator) commented Jan 16, 2025

Describe the changes in this pull request

  • Refactor batch producing + submitting to allow producing one batch at a time. This will help us import multiple tables at the same time.
  • Refactor the batch-producing logic into a FileBatchProducer (a minimal sketch follows below).
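
For illustration, a minimal sketch of the producer's shape, reconstructed from the review snippets quoted below; the Batch stand-in, the NextBatch name, and the elided splitting logic are assumptions, not the merged code:

// Batch is a minimal stand-in for one split of the data file.
type Batch struct {
	RecordCount int64
	filePath    string
}

func (b *Batch) GetFilePath() string { return b.filePath }

// FileBatchProducer produces one batch at a time from a data file,
// returning batches recovered from a previous run before fresh ones.
type FileBatchProducer struct {
	pendingBatches        []*Batch // recovered, not-yet-imported batches
	fileFullySplit        bool     // true once the whole file has been split
	completed             bool
	lineFromPreviousBatch string // record carried over into the next batch
}

// NextBatch hands out pending batches first, then splits fresh ones.
// (NextBatch is an assumed name.)
func (p *FileBatchProducer) NextBatch() (*Batch, error) {
	if len(p.pendingBatches) > 0 {
		batch := p.pendingBatches[0]
		p.pendingBatches = p.pendingBatches[1:]
		// file is fully split and this is the last pending batch,
		// so mark the producer as completed
		if len(p.pendingBatches) == 0 && p.fileFullySplit {
			p.completed = true
		}
		return batch, nil
	}
	// ... otherwise read the data file and split the next batch (elided) ...
	return nil, nil
}

func (p *FileBatchProducer) Done() bool {
	return p.completed
}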

Describe if there are any user-facing changes

How was this pull request tested?

Wrote unit tests.
Integration tests to run:

  • resumption tests
  • long running tests

Does your PR have changes that can cause upgrade issues?

Component                          Breaking changes?
MetaDB                             Yes/No
Name registry json                 Yes/No
Data File Descriptor Json          Yes/No
Export Snapshot Status Json        Yes/No
Import Data State                  Yes/No
Export Status Json                 Yes/No
Data .sql files of tables          Yes/No
Export and import data queue       Yes/No
Schema Dump                        Yes/No
AssessmentDB                       Yes/No
Sizing DB                          Yes/No
Migration Assessment Report Json   Yes/No
Callhome Json                      Yes/No
YugabyteD Tables                   Yes/No
TargetDB Metadata Tables           Yes/No

@makalaaneesh makalaaneesh marked this pull request as ready for review January 20, 2025 05:45
@priyanshi-yb (Contributor) left a comment:

Few comments

Comment on lines 1009 to 1011
if err != nil {
utils.ErrExit("preparing for file import: %s", err)
}
Contributor:

Do we need to do this PrepareForFileImport here? We are already doing it in NewFileBatchProducer.

lastBatchNumber: lastBatchNumber,
lastOffset: lastOffset,
fileFullySplit: fileFullySplit,
completed: completed,
Contributor:

nit: completed: len(pendingBatches) == 0 && fileFullySplit

return nil, err
}
if p.lineFromPreviousBatch != "" {
err = batchWriter.WriteRecord(p.lineFromPreviousBatch)
Contributor:

Add a comment explaining this lineFromPreviousBatch.
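
For instance, assuming the field carries a record that was read while producing the previous batch but did not fit within that batch's size limit (an assumption from reading this diff, not confirmed):

// lineFromPreviousBatch holds a record that was read while producing the
// previous batch but did not fit within its size limit; it is written as
// the first record of the next batch so that no record is lost.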

}

// 3 batches should be produced
// while calculating for the first batch, the header is also considered
Contributor:

Oh right, while preparing the first batch we add the bytes of the header to the batch's total bytes, but for further batches we don't, since we already have the header and don't include it in the batch's bytes.
I think it's worth testing whether, in cases where the number of columns is huge, the header's bytes can also contribute to the batches' bytes and should be included.
Can you please add a TODO where we add the header to the batch file, to fix this if required?

Collaborator (Author):

Yeah, did not want to change the implementation as part of this PR. Will add a TODO.
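
A possible wording for that TODO (my sketch, not necessarily the merged text):

// TODO: the header's bytes are currently counted towards the first batch's
// size but not towards subsequent batches'. Make this uniform (count the
// header for all batches or for none), since a very wide header could
// meaningfully affect batch sizing.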

Collaborator:

@makalaaneesh @priyanshi-yb I think we should be uniform across all the batches: either consider the header in all of them or not at all. We can discuss.

assert.NotNil(t, batch1)
assert.Equal(t, int64(2), batch1.RecordCount)

// simulate a crash and recover
Contributor:

Nice test for the recovery situation!

@priyanshi-yb (Contributor) left a comment:

LGTM!

@sanyamsinghal (Collaborator) left a comment:

LGTM. Added mostly minor comments only.
Thanks for adding the unit tests; keep adding these kinds of tests, as they will help make the codebase more robust.

Comment on lines +78 to +83
batch := p.pendingBatches[0]
p.pendingBatches = p.pendingBatches[1:]
// file is fully split and returning the last batch, so mark the producer as completed
if len(p.pendingBatches) == 0 && p.fileFullySplit {
p.completed = true
}
Collaborator:

Here we are marking the producer as completed before the last batch is processed.
Should we set this only when there is actually no batch available further?

Suggesting something like this:

if len(p.pendingBatches) > 0 {
	batch := p.pendingBatches[0]
	p.pendingBatches = p.pendingBatches[1:]
	return batch, nil
} else if len(p.pendingBatches) == 0 && p.fileFullySplit {
	p.completed = true
}

Collaborator:

@makalaaneesh this one might be important.

Collaborator (Author):

@sanyamsinghal Since that is the last batch we are returning (the file is fully split and we are picking the last pending batch), no further batches are available, so it made sense to mark it as done.
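
For context, a hypothetical shape of the consuming loop (submitBatch is a placeholder name): Done() is checked before each NextBatch call, so marking the producer completed while handing out the last batch is safe.

for !producer.Done() {
	batch, err := producer.NextBatch()
	if err != nil {
		utils.ErrExit("producing batch: %s", err)
	}
	// submit the batch for import (placeholder)
	submitBatch(batch)
}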

}, nil
}

func (p *FileBatchProducer) Done() bool {
Collaborator:

nit: p --> producer ?

return d.maxSizeBytes
}

func createTempFile(dir string, fileContents string) (string, error) {
Collaborator:

consider moving helper functions like this to the testutils package (test/utils/testutils.go)
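
For reference, a minimal sketch of such a shared helper in a testutils package (the name and signature here are assumptions):

// CreateTempFile writes fileContents to a fresh temp file under dir
// and returns the file's path.
func CreateTempFile(dir string, fileContents string) (string, error) {
	file, err := os.CreateTemp(dir, "temp-*.txt")
	if err != nil {
		return "", err
	}
	defer file.Close()
	if _, err := file.WriteString(fileContents); err != nil {
		return "", err
	}
	return file.Name(), nil
}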

return ldataDir, lexportDir, state, nil
}

func setupFileForTest(lexportDir string, fileContents string, dir string, tableName string) (string, *ImportFileTask, error) {
Collaborator:

more explicit function name?

assert.Equal(t, int64(2), batches[0].RecordCount)
batchContents, err := os.ReadFile(batches[0].GetFilePath())
assert.NoError(t, err)
assert.Equal(t, "id,val\n1, \"hello\"\n2, \"world\"", string(batchContents))
Collaborator:

nit: define a var for expected values
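
For example, using the values from the snippet above:

expectedBatchContents := "id,val\n1, \"hello\"\n2, \"world\""
assert.Equal(t, expectedBatchContents, string(batchContents))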

// 3 batches should be produced
// while calculating for the first batch, the header is also considered
assert.Equal(t, 3, len(batches))
// each of length 2
Collaborator:

remove comment?

}

// 3 batches should be produced
// while calculating for the first batch, the header is also considered
Collaborator:

@makalaaneesh @priyanshi-yb I think we should be uniform across all the batches: either consider the header in all of them or not at all. We can discuss.

Collaborator:

Other cases which can be tested here:

  1. Errors due to the data file, e.g. a syntax error.
  2. Resumability, after fixing that error in the main data file.
  3. I think we support two types of data files, CSV and text, so having coverage from that perspective is also good.
  4. Any variation in the content of the data file, especially CSV? Although that is something to be tested in the data file package, if it is not there we can add it here also.

Collaborator (Author):

Good ideas.
1, 2: They are more about the full import, so not applicable to a FileBatchProducer, which just produces the batches. Can be taken up when I add the FileTaskImporter 👍
3, 4: Agreed, but as you said, better suited for the dataFile package.

@makalaaneesh makalaaneesh merged commit 6fef1dc into main Jan 22, 2025
66 of 67 checks passed
@makalaaneesh makalaaneesh deleted the aneesh/import-multiple-tables-at-same-time-1 branch January 22, 2025 17:28