Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: convertBoolToSemiSyncAction method to account for all semi sync actions #90

Closed
wants to merge 1 commit into from

Conversation

WilliamLu99
Copy link

@WilliamLu99 WilliamLu99 commented Apr 12, 2023

Description

Fixes a bug where ChangeTabletType would fail on clusters that don't have rpl_semi_sync_master and rpl_semi_sync_slave plugins loaded. Does so by refactoring the convertBoolToSemiSyncAction method to return SemiSyncActionNone if the plugin is not loaded. To do this, we query the underlying mysql to check if the rpl_semi_sync_% variables are present.

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on the CI
  • Documentation was added or is not required

Deployment Notes

@WilliamLu99
Copy link
Author

Not actually going to merge this, just creating the PR here first to get some feedback internally, check that CI passes. Then will try to upstream the change.

@@ -767,3 +767,11 @@ func (mysqld *Mysqld) SemiSyncReplicationStatus() (bool, error) {
}
return false, nil
}

func (mysqld *Mysqld) SemiSyncExtensionLoaded() bool {
qr, err := mysqld.FetchSuperQuery(context.TODO(), "SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync%'")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An improvement would be to check the INFORMATION_SCHEMA.PLUGINS meta table to check for extension being loaded. It is more performant (showing global variables is an expensive operation) and another reason being that, according to Jeremy, all SHOW ... queries are deprecated.

Here's a query you can plug in easily:

mysql> select count(*) > 0 as plugin_loaded from information_schema.plugins where plugin_name like "rpl_semi_sync_%";
+---------------+
| plugin_loaded |
+---------------+
|             0 |
+---------------+
1 row in set (0.00 sec)

plugin_loaded will be either 0 or 1

See:

https://dev.mysql.com/doc/refman/8.0/en/information-schema-plugins-table.html
and
https://dev.mysql.com/doc/refman/8.0/en/replication-semisync-installation.html

Copy link

@hkdsun hkdsun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we've improved the query to check for the extension being loaded, I think this can be proposed upstream!

@WilliamLu99 WilliamLu99 force-pushed the ignore-semisync-if-plugin-unloaded branch from 49c881d to a50a9d1 Compare April 13, 2023 21:33
@WilliamLu99 WilliamLu99 force-pushed the ignore-semisync-if-plugin-unloaded branch from a50a9d1 to 616393e Compare April 13, 2023 21:40
// Unload semi sync plugins
for _, tablet := range tablets[0:4] {
qr := utils.RunSQL(ctx, t, "select @@global.super_read_only", tablet)
result := fmt.Sprintf("%v", qr.Rows[0][0].ToString())
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rebased to main and now the replicas start with super_read_only, which forbids us to uninstall plugins.

@WilliamLu99 WilliamLu99 requested a review from hkdsun April 13, 2023 22:51
Copy link

@hkdsun hkdsun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!! I think this is definitely ready to be released/staged as a Shopify hotfix

In terms of submitting upstream, this is a nitpick but I think the level of abstraction of the test (end-to-end testing ChangeTabletType) doesn't match the level of abstraction of the change itself very well (how we interpret a false value).

@@ -405,6 +405,41 @@ func TestCrossCellDurability(t *testing.T) {
}
}

// Tests that ChangeTabletType works even when semi-sync plugins are not loaded.
func TestChangeTypeWithoutSemiSync(t *testing.T) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are submitting upstream, I must say that it's a bit weird to have this be the only test. Maybe we could add a unit test (on the tablet manager) when submitting?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, I'll add an unit test.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a look again. I don't think a unit test would fit in this scenario, as there's not much to unit test. Unit tests in this package use the fakemysqlddaemon to mock the Mysqld type that has the SemiSyncExtensionLoaded call.

Because of that, it limits the ability of what we are testing. A unit test in mysqlctl won't actually help us test any of the changes we made.

Fakemysqldaemon also mocks a bunch of other semi sync related methods that would otherwise break without my change.

Unit tests within this codebase in general seem to abstract away interactions with the DB. I think the endtoend cluster test is the right place for our tests, and that this test offers sufficient coverage for this change. I'll push this upstream, and if I get any feedback related to testing I'll make the change then.

@arthurschreiber
Copy link

👋 Hey friends!

We've run into the same issue as well. If there's anything we can do to help get this upstreamed, please let me know. 🙇‍♂️

@hkdsun
Copy link

hkdsun commented May 3, 2023

Hey Arthur, thanks for stopping by our fork. The upstream PR for this has been approved but William is digging into some suspect test failures here: vitessio#12914

@github-actions
Copy link

github-actions bot commented Jun 3, 2023

This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:

  • Push additional commits to the associated branch.
  • Remove the stale label.
  • Add a comment indicating why it is not stale.

If no action is taken within 7 days, this PR will be closed.

@github-actions github-actions bot added the Stale label Jun 3, 2023
@github-actions
Copy link

This PR was closed because it has been stale for 7 days with no activity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants