Restrict live migration to single table-list and added guardrails for that around table-list flags #2354

Open: priyanshi-yb wants to merge 28 commits into main from priyanshi/name-reg-table-name

@priyanshi-yb (Contributor) commented Feb 24, 2025

Describe the changes in this pull request

  1. On resumption, export data from source now uses the table list stored in the MSR: TableListExportedFromSource or SourceExportedTableListWithLeafPartitions, depending on the source type. For a PostgreSQL source we also allow filtering on leaf partitions, so the leaf partitions are stored in the MSR as well.
  2. export data from target uses the stored TableListExportedFromSource, populates the leaf partitions from the target for all partitioned tables, and stores the result in TargetExportedTableListWithLeafPartitions so that later runs do not fetch new leaf partitions from the target.
  3. Added guardrails around the table-list/exclude-table-list flags that check, on resumption, for any discrepancy with the initial lists.
    1. If any new tables are passed via these flags, the commands error out saying Unknown tables found.
    2. If the table list has changed, the commands report it on the console (Missing tables ... / Extra tables found ... compared to the initial list) and prompt the user to confirm continuing with the initial list; otherwise the command aborts so the user can restart.
  4. For partitions: if new leaf tables are added during the migration to partitioned tables that are being migrated, then on resumption the command output reports that new leaf partitions were found and that they will not be considered for the migration.
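
The missing/extra check in item 3 comes down to a set difference between the stored list and the list derived from the current flags. Below is a minimal sketch of that idea, not the PR's implementation: `diffTableLists` is a hypothetical helper, and it uses plain strings where the real code works with sqlname.NameTuple values.

```go
package main

import (
	"fmt"
	"sort"
)

// diffTableLists is a hypothetical helper. It compares the table list stored
// on the first run with the list derived from the flags on the current run.
// The initial list is assumed to contain no duplicates.
func diffTableLists(initial, current []string) (missing, extra []string) {
	initialSet := make(map[string]bool, len(initial))
	for _, t := range initial {
		initialSet[t] = true
	}
	currentSet := make(map[string]bool, len(current))
	for _, t := range current {
		// A table in the current run that was never in the initial list is "extra".
		if !initialSet[t] && !currentSet[t] {
			extra = append(extra, t)
		}
		currentSet[t] = true
	}
	for _, t := range initial {
		// A table from the initial list that the current run dropped is "missing".
		if !currentSet[t] {
			missing = append(missing, t)
		}
	}
	sort.Strings(missing)
	sort.Strings(extra)
	return missing, extra
}

func main() {
	initial := []string{"public.accounts", "public.orders", "public.users"}
	current := []string{"public.accounts", "public.orders", "public.audit"}
	missing, extra := diffTableLists(initial, current)
	fmt.Println("Missing tables:", missing)
	fmt.Println("Extra tables:", extra)
}
```

If either slice is non-empty, the command would print the discrepancy and prompt before continuing with the initial list.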

Describe if there are any user-facing changes

Yes. As mentioned above, there are user-facing guardrails around the table-list flags: users can no longer change the table list during a live migration.

Behaviour in these scenarios

  • A new table is added to the source or target database:

With or without table-list flags: the export data from source/target commands are unaware of newly added tables and will not consider them, either during the run or on re-run.
If a newly added table is passed through table-list/exclude-table-list, the commands error out reporting an unknown table name along with the valid table names […].

➜ yb-voyager export data --export-dir ... --export-type snapshot-and-changes  --table-list test_partitions_quences                           
...
Continuing streaming from where we left off...
Unknown table names [test_partitions_quences] in the include list
Valid table names are: [public.foo_bar public.sales_region public.test_seq1 public.try public.sequence_check1 public.london public.sydney public.boston public.test_partitions_sequences_b public.test_partitions_sequences_l public.test_partitions_sequences_s public.sequence_check2 public.se..]
  • A new leaf table is added to a partitioned table in the source or target database:

With or without table-list flags:
-- During the run, the export data from source/target commands are unaware of newly added leaf tables and will not consider them.
-- On re-run, the export data from source/target commands report that new leaf tables were detected for partitioned tables, with the message below:

Detected new partition tables for the following partitioned tables. These will not be considered during migration:
Root table: public.test_partitions_sequences, new leaf partitions: public.test_partitions_sequences_h
num tables to export: 1
table list for data export: [test_partitions_sequences (test_partitions_sequences_s)]

If a new leaf partition table is passed through the table-list/exclude-table-list flags, the commands error out reporting an unknown table name along with the valid table names […].

➜  yb-voyager export data --export-dir .... --export-type snapshot-and-changes  --table-list test_partitions_sequences_h
....
Continuing streaming from where we left off...
Unknown table names [test_partitions_sequences_h] in the include list
Valid table names are: [public.foo_bar public.sales_region public.test_seq1 public.try public.sequence_check1 public.london public.sydney public.boston public.test_partitions_sequences_b public.test_partitions_sequences_l public.test_partitions_sequences_s public.sequence_check2 publi.....]
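
The new-leaf detection described in this scenario can be pictured as comparing the leaf partitions currently present in the database with the names registered at the start of the migration. The sketch below is illustrative only: `detectNewLeaves` and its plain-string inputs are assumptions, and the PR's own function takes NameTuple values and queries the database.

```go
package main

import (
	"fmt"
	"sort"
)

// detectNewLeaves is a hypothetical helper. For each root table it returns the
// leaf partitions that exist now but were not registered when the migration started.
func detectNewLeaves(currentLeaves map[string][]string, registered []string) map[string][]string {
	known := make(map[string]bool, len(registered))
	for _, name := range registered {
		known[name] = true
	}
	newLeaves := make(map[string][]string)
	for root, leaves := range currentLeaves {
		for _, leaf := range leaves {
			if !known[leaf] {
				newLeaves[root] = append(newLeaves[root], leaf)
			}
		}
		// Sorting keeps the reported order stable; roots without new leaves get no entry.
		sort.Strings(newLeaves[root])
	}
	return newLeaves
}

func main() {
	current := map[string][]string{
		"public.test_partitions_sequences": {
			"public.test_partitions_sequences_b",
			"public.test_partitions_sequences_h",
		},
	}
	registered := []string{
		"public.test_partitions_sequences",
		"public.test_partitions_sequences_b",
	}
	for root, leaves := range detectNewLeaves(current, registered) {
		fmt.Printf("Root table: %s, new leaf partitions: %v\n", root, leaves)
	}
}
```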
  • Table-list changes during resumption via flags

If some tables from the initial list are removed on resumption, the commands report the discrepancy and prompt the user as shown below:

  TARGET_DB_PASSWORD=*** yb-voyager export data from target --export-dir ... --table-list public."accounts_list_partitioned",public."empty_partition_table2",public."orders_interval_partition" --exclude-table-list empty_partition_table2
migrationID: 822cd9e3-00bc-4c7f-8d8c-195cdd3e24b7
Note: Live migration is a TECH PREVIEW feature.
export of data for source type as 'yugabytedb'
Continuing streaming from where we left off...
Changing the table list during live-migration is not allowed.
Missing tables in the current run compared to the initial list: empty_partition_table2_p_west,empty_partition_table2_p_east
Using the table list passed in the initial phase of migration - [public.accounts_list_partitioned_p_northwest .... public.orders_interval_partition_interval_partition_less_than_2018 public.orders_interval_partition]. 
Do you want continue?? [Y/N]: N
Aborting, Start a fresh migration...

If some tables are added to the initial list on resumption, the commands report the discrepancy and prompt the user as shown below:

 TARGET_DB_PASSWORD=*** yb-voyager export data from target --export-dir /home/centos/code/yb-voyager/migtests/tests/oracle/partitions/oracle_partitions_fallb_export-dir --table-list public."accounts_list_partitioned",public."empty_partition_table2",public."orders_interval_partition",empty_partition_table ...
migrationID: 822cd9e3-00bc-4c7f-8d8c-195cdd3e24b7
Note: Live migration is a TECH PREVIEW feature.
export of data for source type as 'yugabytedb'
Continuing streaming from where we left off...
Changing the table list during live-migration is not allowed.
Extra tables in the current run compared to the initial list: empty_partition_table2_p_extra,empty_partition_table_p_west,empty_partition_table_p_east,empty_partition_table
Using the table list passed in the initial phase of migration - [public.accounts_list_partitioned_p_northwest .... public.orders_interval_partition]. 
Do you want continue?? [Y/N]: Y
num tables to export: 3
table list for data export: [accounts_list_partitioned empty_partition_table2 orders_interval_partition]

How was this pull request tested?

Added unit tests with a test container PG source for various cases.
Some TODOs remaining:

Add test assertions for the cases below:

  1. Validate the guardrail check prompts
  2. Validate new leaf additions
  3. Unknown table in table-list causes an error exit

Does your PR have changes that can cause upgrade issues?

Component Breaking changes?
MetaDB No
Name registry json No
Data File Descriptor Json No
Export Snapshot Status Json No
Import Data State No
Export Status Json No
Data .sql files of tables No
Export and import data queue No
Schema Dump No
AssessmentDB No
Sizing DB No
Migration Assessment Report Json No
Callhome Json No
YugabyteD Tables No
TargetDB Metadata Tables No

@priyanshi-yb priyanshi-yb changed the title Restrict live migration to single table-list from export data from source and guardrail for that around table-list flags Restrict live migration to single table-list and guardrail for that around table-list flags Feb 24, 2025
@priyanshi-yb priyanshi-yb force-pushed the priyanshi/name-reg-table-name branch from 850a85b to ce05787 Compare February 24, 2025 13:03
@priyanshi-yb priyanshi-yb force-pushed the priyanshi/name-reg-table-name branch from bf3e4e0 to f09e16f Compare February 24, 2025 16:35
@priyanshi-yb priyanshi-yb marked this pull request as ready for review February 25, 2025 12:26
@priyanshi-yb priyanshi-yb changed the title Restrict live migration to single table-list and guardrail for that around table-list flags Restrict live migration to single table-list and added guardrails for that around table-list flags Feb 25, 2025
@@ -1040,6 +1040,9 @@ func storeTableListInMSR(tableList []sqlname.NameTuple) error {
}))
err := metaDB.UpdateMigrationStatusRecord(func(record *metadb.MigrationStatusRecord) {
record.TableListExportedFromSource = minQuotedTableList
Collaborator

nit: minQuotedTableList -> minQuotedTableListWithRoots

}

if len(lo.Keys(newLeafTables)) > 0 {
utils.PrintAndLog("Detected new partition tables for the following partitioned tables. These will not be considered during migration:")
Collaborator

Should we prompt?

Contributor Author

Not sure if a prompt is required. I think there is nothing we want users to do here, right? So this might just be an FYI.

Collaborator @makalaaneesh left a comment

Went over it at a high level, looks good!
Will give it a more detailed look soon along with the tests.


// getInitialTableList for subsequent run with start-clean false and basic without table-list flags so no guardrails
startClean = false
getInitialTableistAndAssertExpectedResult(t, expectedTableList, expectedPartitionsToRootMap)
Collaborator

you added a new partition leaf above. This should error out no?

Contributor Author

We are not erroring out in this case, just reporting to the user that new leaf tables were found for these root tables and that they will not be considered during migration.


getInitialTableistAndAssertExpectedResult(t, expectedTableList, expectedPartitionsToRootMap)

//case2: getInitialTableList for subsequent run with start-clean false and basic with no table-list flags so no guardrails
Collaborator

If first run has table-list, and subsequent run does not have table-list, we should fail no?

Contributor Author

We don't fail; we use the stored one directly. Not sure if we should fail or not. Maybe we can have a prompt in this case as well saying using the initial table list ...

…ls and new leafs addition. Added tests for subsequent run validating the guardrails of missing and extra tables and new leaf tables addition
@priyanshi-yb priyanshi-yb force-pushed the priyanshi/name-reg-table-name branch from 3ed77c3 to b034db2 Compare March 3, 2025 07:05
@@ -552,12 +559,12 @@ func addLeafPartitionsInTableList(tableList []sqlname.NameTuple, ifTableListSet
allLeafPartitions := GetAllLeafPartitions(table)
prevLengthOfList := len(modifiedTableList)
switch true {
- case len(allLeafPartitions) == 0 && rootTable != table: //leaf partition
+ case len(allLeafPartitions) == 0 && !rootTable.Equals(table): //leaf partition
Collaborator

Does our NameTuple.Equals work if the pointers are different, but the underlying values are the same?
Is that what prompted you to make this change?

Contributor Author

Yes, it does the DeepEqual internally on the two nametuples.
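
To illustrate why switching from `!=` to `Equals` matters, here is a small sketch with simplified stand-in types. `objectName` and `nameTuple` are hypothetical and not the real sqlname definitions; the only assumption carried over from the thread is that the tuple holds pointer fields and that `Equals` uses a deep comparison.

```go
package main

import (
	"fmt"
	"reflect"
)

// objectName and nameTuple are simplified, hypothetical stand-ins for the
// sqlname types; the real structs have more fields.
type objectName struct {
	Schema string
	Table  string
}

type nameTuple struct {
	SourceName *objectName
	TargetName *objectName
}

// Equals compares the values the pointers refer to, not the pointers themselves.
func (a nameTuple) Equals(b nameTuple) bool {
	return reflect.DeepEqual(a, b)
}

func main() {
	a := nameTuple{SourceName: &objectName{Schema: "public", Table: "sales"}}
	b := nameTuple{SourceName: &objectName{Schema: "public", Table: "sales"}}

	// == on a struct compares pointer fields by address, so two separately
	// built tuples with identical contents are not equal.
	fmt.Println(a == b) // false
	// DeepEqual follows the pointers and compares the pointed-to values.
	fmt.Println(a.Equals(b)) // true
}
```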

@@ -576,7 +583,12 @@ func addLeafPartitionsInTableList(tableList []sqlname.NameTuple, ifTableListSet
func GetRootTableOfPartition(table sqlname.NameTuple) (sqlname.NameTuple, error) {
parentTable := source.DB().ParentTableOfPartition(table)
if parentTable == "" {
return table, nil
//now we know its a root table so return the tuple from the nameregistry
Collaborator

can you give more context in the comment on why you need to lookup from nameregistry again?
For my clarification: this is because the original tuples do not have the target-side of names, right?

Contributor Author

Yes, right. The original tuples are hand-crafted ones without target names, so for the root, we shouldn't use these but instead get a proper one from the name registry.

var err error
var includeTableList, excludeTableList []sqlname.NameTuple

applyFilterAndAddLeafTable := func(flagList string, flagName string) ([]sqlname.NameTuple, error) {
Collaborator

This structure of applying include/exclude is relatively easy to understand, thanks 👍

} else {
tableList = fullTableList
//this is only for filtering the non-leaf and non-root tables - the mid level partitioned table from fullTableList
Collaborator

To remove non-leaf and non-root, you're still calling addLeafPartitionsInTableList? That sounds a bit weird. Does that function ignore all leafs, and then re-add them?

Contributor Author @priyanshi-yb, Mar 3, 2025

addLeafPartitionsInTableList has a boolean parameter, addAllLeafPartitions. When it is false, the function assumes the table list passed in is already exhaustive (roots, mid-level partitions, leaf partitions, and so on) and only filters it: it removes the mid-level partitions and keeps just the leaf and root tables in the list.
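
The filtering mode described here can be sketched as follows. This is an assumption-laden illustration rather than the actual addLeafPartitionsInTableList: `keepRootsAndLeaves` is a hypothetical name, and partition relationships are supplied as a child-to-parent map instead of being queried from the source database.

```go
package main

import "fmt"

// keepRootsAndLeaves is a hypothetical helper. Given an exhaustive table list
// and a child-to-parent partition map, it drops mid-level partitioned tables
// and keeps only roots and leaves.
func keepRootsAndLeaves(tables []string, parentOf map[string]string) []string {
	hasChild := make(map[string]bool)
	for _, parent := range parentOf {
		hasChild[parent] = true
	}
	var kept []string
	for _, t := range tables {
		_, hasParent := parentOf[t]
		isRoot := !hasParent   // no parent: a root or a plain, non-partitioned table
		isLeaf := !hasChild[t] // no children: a leaf partition or a plain table
		if isRoot || isLeaf {
			kept = append(kept, t)
		}
	}
	return kept
}

func main() {
	parentOf := map[string]string{
		"sales_2024":    "sales",
		"sales_2024_q1": "sales_2024",
	}
	tables := []string{"sales", "sales_2024", "sales_2024_q1", "users"}
	fmt.Println(keepRootsAndLeaves(tables, parentOf))
}
```

In this example the mid-level table sales_2024 has both a parent and a child, so it is the only entry removed.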

partitionsToRootTableMap = msr.TargetRenameTablesMap

//For the first run of export data from target we use the TableListExportedFromSource (which has only root tables)
if len(msr.TargetExportedTableListWithLeafPartitions) == 0 || msr.TargetRenameTablesMap == nil {
Collaborator

nit: create a variable for this since this is used below as well.

})

// Finding all the partitions of all root tables part of migration, and report if there any new partitions added
rootToNewLeafTablesMap, err := detectNewLeafPartitionsOnPartitionedTables(rootTables, registeredList)
Collaborator

should we move this block below to where we are actually checking rootToNewLeafTablesMap and doing guardrails?

Contributor Author

we use the rootToNewLeafTablesMap to calculate the currentList

return nil, nil, fmt.Errorf("error applying table list flags for current list and remove roots: %v", err)
}

if len(lo.Keys(rootToNewLeafTablesMap)) > 0 {
Collaborator

shouldn't this check be done before the source.TableList == "" && source.ExcludeTableList == "" if block?

Contributor Author

Moved it back into the detectNewLeafPartitionsOnPartitionedTables function.


}

func applyTableListFlagsOnCurrentAndRemoveRootsFromBothLists(registeredList []sqlname.NameTuple, tableListViaFlag string, excludeTableListViaFlag string, rootToNewLeafTablesMap map[string][]string, rootTables []sqlname.NameTuple, firstRunTableWithLeafsAndRoots []sqlname.NameTuple) ([]sqlname.NameTuple, []sqlname.NameTuple, error) {
Collaborator

nit: new lines for the args. Hard to read otherwise.

return true, nil
}

func (reg *NameRegistry) GetRegisteredTableList() ([]*sqlname.ObjectName, error) {
Collaborator

add a comment here on why we implemented this (partitions)

}
}

func fetchTableListFromSourceAndAssertResult(t *testing.T, expectedPartitionsToRootMap map[string]string, expectedTableList []sqlname.NameTuple) {
Collaborator

this is more like to assert that filtering is happening properly, right? maybe name it something like: assertTableListFiltering ?

assert.Equal(t, len(diff), 0)
}

func getInitialTableistAndAssertExpectedResult(t *testing.T, expectedTableList []sqlname.NameTuple, expectedPartitionsToRootMap map[string]string) {
Collaborator

nit: would be good to have these function names start with assert.
assertInitialTableList

// Tests the unknown table case by overriding the utils.ErrExit function to assert the error msg
func testUnknownTableCaseForTableListFlags(t *testing.T, expectedUnknownErrorMsg string) {
previousFunction := utils.ErrExit
//changing the error exit function to test the unknown table scenario
Collaborator

oo, this does not seem like the best way to do it. To be fair, it essentially is just like "mocking" a function, so it's okay, but I think the right way would be:
If we write our functions in a way such that they return a list of guardrail failures, then we can just assert on that, instead of having to mock the errExit function.

Is this too much work given our current state?

Contributor Author @priyanshi-yb, Mar 3, 2025

It's not a lot of work. I think it's more of a choice between these ways -

  1. Either we directly display the problem/guardrail and error out
  2. Return as an error and then display it on the console eventually with the whole context.

e.g.

➜ yb-voyager export data ... --export-type snapshot-and-changes --table-list asdfa                                                                              
migrationID: a1194e55-3d95-4d37-95e3-d5221eed976d
....
Unknown table names in the include list: [asdfa]
Valid table names are: [...]


➜   yb-voyager export data ... --export-type snapshot-and-changes --table-list asdfa
migrationID: a1194e55-3d95-4d37-95e3-d5221eed976d
...
error getting initial table list: error applying table list flags for current list and remove roots: error in apply table list filter on registered list for the flags in current run: Unknown table names in the include list: [asdfa]
Valid table names are: [...]

I think the first one is better, where we directly display the problem/guardrails, because it's not an actual error per se.
Let me know what you think.

}
defer testPostgresSource.DB().Disconnect()
defer testPostgresSource.ExecuteSqls(cleanUpSqls...)
defer os.RemoveAll(fmt.Sprintf("%s/", testExportDir))
Collaborator

remove the sprintf? It's also a bit dangerous in case testExportDir is empty, cause then it will attempt to removeall("/")

t.Errorf("error initialising name reg for the source: %v", err)
}
//Running the command level functions
source = *testPostgresSource.Source
Collaborator

why do you need to dereference here, a little confused? can't you directly use the pointer, and update the table-list, etc?

Contributor Author @priyanshi-yb, Mar 3, 2025

source (the global variable) is not a pointer type, so need to dereference here to assign the test source to the actual source variable that is used by the functions.

}

if source.TableList == "" && source.ExcludeTableList == "" {
//which mean no table-lists are passed in subsequent runs
Collaborator

What if:

  • in the first run, the user passed table-list
  • in a subsequent run, the user did not pass table-list

Shouldn't we report the discrepancy?

Contributor Author

Hmm, okay, let me remove this condition altogether and let the guardrails function do its job of reporting the discrepancy.

}

// getInitialTableList for subsequent run with start-clean false and basic without table-list flags so no guardrails
startClean = false
Collaborator

let's make startClean a param of the getInitialTableistAndAssertExpectedResult func?

assertGuardrailsChecksForMissingAndExtraTablesInSubsequentRun(t, expectedMissingTables2, expectedExtraTables2, firstRunTableList, rootTables)
getInitialTableistAndAssertExpectedResult(t, firstRunTableList, firstRunPartitionsToRootMap)

//case5: getInitialTableList for subsequent run with start-clean false and basic with table-list and exclude-table-list flags
Collaborator

is passing both include and exclude list allowed today? i would have thought we should disallow it? in what situations does this help?

Contributor Author @priyanshi-yb, Mar 3, 2025

yes we allow that, it helps in scenario where:
tables on source - ab1,abc2,abc3,abc4,abc5, ..(few other tables)
table-list abc* (meaning all the abc prefixed tables)
exclude-table-list abc5 (excluding the abc5)


}

func testCasesWithDifferentTableListFlagValuesTest1(t *testing.T, firstRunTableList []sqlname.NameTuple, firstRunPartitionsToRootMap map[string]string) {
Collaborator

nice different set of test-cases!
If/when Shubham finds some other cases, pls do add it here.


assertGuardrailsChecksForMissingAndExtraTablesInSubsequentRun(t, expectedMissingTables4, expectedExtraTables4, firstRunTableList, rootTables)
getInitialTableistAndAssertExpectedResult(t, firstRunTableList, firstRunPartitionsToRootMap)
}
Collaborator

test idea: ensure that
first run: foo* (foo1, foo2)
second run : foo1,foo2 does not result in any guardrails thrown.

Contributor Author

Yeah, I had it in mind to add more tests around glob pattern support in the table-list flags; will add them in a separate PR.

2 participants