
TemplateUpgraders should be called during rolling restart #25263

Merged (5 commits) Jun 22, 2017

Conversation

@imotov (Contributor) commented Jun 15, 2017

In #24379 we added the ability to upgrade templates on full cluster startup. This PR invokes the same update procedure when a new node first joins the cluster, allowing templates to be updated during a rolling cluster restart as well.

Closes #24680

@abeyad, @spinscale could you take a look to make sure this will work for you?

@@ -460,6 +463,7 @@ protected Node(final Environment environment, Collection<Class<? extends Plugin>
b.bind(UsageService.class).toInstance(usageService);
b.bind(NamedWriteableRegistry.class).toInstance(namedWriteableRegistry);
b.bind(MetaDataUpgrader.class).toInstance(metaDataUpgrader);
b.bind(TemplateUpgradeService.class).toInstance(templateUpgradeService);
Member:

It seems this is only bound in Guice so that tests can get access to it, to check whether upgrades are in progress. Is this something that should be exposed via an API? At least let's find another way, so we can keep additional stuff out of Guice.

imotov (Contributor, Author):

Good point! I rewrote the test to get rid of Guice there. I don't think we need an API for that. I was just trying to prevent a possible race condition in the test, but I found another way to do that without Guice.

@abeyad left a comment:

I left a few minor nits and a question. Otherwise, LGTM

@@ -431,7 +439,6 @@ public static void toXContent(IndexTemplateMetaData indexTemplateMetaData, XCont
}
builder.endObject();

Review comment:

nit: extra newline

* Upgrades Templates on behalf of installed {@link Plugin}s when a node joins the cluster
*/
public class TemplateUpgradeService extends AbstractComponent implements ClusterStateListener {
private final Tuple<Map<String, BytesReference>, Set<String>> EMPTY = new Tuple<>(Collections.emptyMap(), Collections.emptySet());
Review comment:

EMPTY could have a more descriptive name

AtomicInteger updateCount = new AtomicInteger();
// Make sure all templates are recreated correctly
assertBusy(() -> {
logger.info("checking....");
Review comment:

a more descriptive log message? or is this leftover?

assertThat(removedListener.getAndSet((ActionListener) args[1]), nullValue());
return null;
}).when(mockIndicesAdminClient).deleteTemplate(any(DeleteIndexTemplateRequest.class), any(ActionListener.class));

Review comment:

nit: extra newline


public class TemplateUpgradeServiceTests extends ESTestCase {

private ClusterService clusterService = new ClusterService(Settings.EMPTY, new ClusterSettings(Settings.EMPTY, ClusterSettings
Review comment:

final?

deleteTemplateListeners.add((ActionListener) args[1]);
return null;
}).when(mockIndicesAdminClient).deleteTemplate(any(DeleteIndexTemplateRequest.class), any(ActionListener.class));

Review comment:

nit: extra newline

Collections.emptyList());

service.updateTemplates(additions, deletions);

Review comment:

Should we add an assert here that putTemplateListeners.size() == additionsCount and deleteTemplateListeners.size() == deletionsCount?

Contributor:

nit^2: assertThat(putTemplateListeners, hasSize(additionsCount));

putTemplateListeners.get(i).onFailure(new RuntimeException());
} else {
putTemplateListeners.get(i).onResponse(new PutIndexTemplateResponse(randomBoolean()) {

Review comment:

It's not clear to me what the purpose of this response listener is. We are not asserting anything when invoking the action listener, so I'm not sure what its role in the test is.

@spinscale (Contributor) commented:

To make sure this is addressed (and I may have just missed it): the current cluster state listener implementation of TemplateUpgradeService runs on every node in the cluster whenever there is a change. Doesn't this mean that when two nodes get the same cluster state change, they both try to store/delete the same templates?

@imotov (Contributor, Author) commented Jun 19, 2017:

@spinscale you didn't miss anything. We are changing the way templates are upgraded: they are upgraded when a node with a new version joins, not when it becomes the master, hence we need to run the listener on every node in the cluster. As a result, you are right, multiple nodes may try to create the same template at the same time. I am not sure we can do much about that given the requirements. Perhaps we can try optimizing MetaDataIndexTemplateService to not cause a cluster state update if the template is exactly the same.
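Such a no-op guard might look like the following sketch (hypothetical names and plain maps standing in for the actual MetaDataIndexTemplateService API): skip publishing a state update when the proposed template equals the stored one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical no-op guard: if the stored template body already equals the
// proposed one, return the current map unchanged so no state update happens.
final class TemplateNoOpGuard {
    static Map<String, String> putTemplate(Map<String, String> templates,
                                           String name, String body) {
        if (body.equals(templates.get(name))) {
            return templates; // identical template: no cluster state change
        }
        Map<String, String> updated = new HashMap<>(templates);
        updated.put(name, body);
        return Map.copyOf(updated);
    }
}
```

Callers can detect the no-op by reference equality and skip the (expensive) publish step entirely.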

@imotov (Contributor, Author) commented Jun 19, 2017:

@spinscale and I discussed this a bit more and came up with a scheme that allows the new nodes to update the templates but prevents update swarms. Before trying to update the templates:

  1. A node checks whether the master has the newest version. If it does, and the local node is the master, the master performs the update; otherwise the node does nothing.
  2. If the master doesn't have the highest version, then the node with the highest id among all nodes with the highest version performs the upgrade.
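The two steps above can be sketched in plain Java (Node here is a hypothetical stand-in for DiscoveryNode, with ids compared lexicographically; this is illustrative, not the actual TemplateUpgradeService code):

```java
import java.util.Comparator;
import java.util.List;

// Minimal stand-alone sketch of the upgrade-node election scheme.
final class UpgradeNodePicker {
    record Node(String id, int version) {}

    /** Returns true if the node with localId should perform the template upgrade. */
    static boolean shouldUpgrade(List<Node> masterAndDataNodes, String masterId, String localId) {
        int maxVersion = masterAndDataNodes.stream()
                .mapToInt(Node::version).max().orElseThrow();
        Node master = masterAndDataNodes.stream()
                .filter(n -> n.id().equals(masterId)).findFirst().orElseThrow();
        if (master.version() == maxVersion) {
            // Step 1: the master already runs the newest version, so only it upgrades.
            return localId.equals(masterId);
        }
        // Step 2: the highest id among the nodes on the highest version wins.
        String winner = masterAndDataNodes.stream()
                .filter(n -> n.version() == maxVersion)
                .map(Node::id)
                .max(Comparator.naturalOrder()).orElseThrow();
        return localId.equals(winner);
    }
}
```

Every node evaluates the same deterministic function on the same cluster state, so exactly one node elects itself to run the upgrade.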

@imotov (Contributor, Author) commented Jun 20, 2017:

@spinscale, @abeyad, I pushed the new logic that ensures that only one node can try to make updates at a time. Could you take another look?

@spinscale (Contributor) left a comment:

LGTM. I keep wondering whether the explicit master node check is fully necessary or could just be part of the highest-version check, but I don't have a strong opinion, to be honest. Thanks for doing this and for coming up with the much better solution.


private final UnaryOperator<Map<String, IndexTemplateMetaData>> indexTemplateMetaDataUpgraders;

public final ClusterService clusterService;
Contributor:
intentionally public along with the next two variables?


});
}
}
Contributor:
You are calling the listeners here but not doing any further assertions, e.g. on the updatesInProgress variable?

ExecutorService executorService = mock(ExecutorService.class);
when(threadPool.generic()).thenReturn(executorService);
doAnswer(invocation -> {
Object[] args = invocation.getArguments();
Contributor:

You could just use EsExecutors.newDirectExecutorService()
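For context, a direct executor service runs each submitted task inline on the calling thread, which is what the doAnswer mock above emulates by hand; a minimal illustration with a plain Executor:

```java
import java.util.concurrent.Executor;

// A direct executor runs submitted tasks synchronously on the calling thread;
// EsExecutors.newDirectExecutorService() provides an ExecutorService with the
// same behavior, so there is no need to mock ThreadPool.generic() manually.
public class DirectExecutorDemo {
    public static String run() {
        Executor direct = Runnable::run;
        StringBuilder order = new StringBuilder();
        order.append("a");
        direct.execute(() -> order.append("b")); // runs before execute() returns
        order.append("c");
        return order.toString();
    }
}
```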

Contributor:

Oh, I didn't see the updateInvocation counter. Never mind.

Collections.emptyList());

service.updateTemplates(additions, deletions);

Contributor:

nit^2: assertThat(putTemplateListeners, hasSize(additionsCount));

}
}

public void testClientNodeRunsTemplateUpdates() {
Contributor:

That should be doesNotRunTemplateUpgrade, I assume?

}
for (ObjectCursor<DiscoveryNode> node : nodes.getMasterAndDataNodes().values()) {
if (node.value.getVersion().equals(maxVersion)) {
if (node.value.getId().compareTo(localNode.getId()) > 0) {
Contributor:

Can these be combined into a single if-statement?
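In a stand-alone sketch (plain types instead of DiscoveryNode, so this is illustrative rather than the actual diff), the two nested checks collapse into one condition:

```java
import java.util.List;

final class CombinedCheck {
    record Node(String id, int version) {}

    // Returns true if some node both runs maxVersion and has a higher id than
    // localId, i.e. another node wins the upgrade election.
    static boolean higherIdOnMaxVersion(List<Node> nodes, int maxVersion, String localId) {
        for (Node node : nodes) {
            // Single combined if-statement instead of two nested ones.
            if (node.version() == maxVersion && node.id().compareTo(localId) > 0) {
                return true;
            }
        }
        return false;
    }
}
```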

@abeyad left a comment:

LGTM

@imotov imotov merged commit e6e5ae6 into elastic:master Jun 22, 2017
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jun 23, 2017
* master:
  testCreateShrinkIndex: removed left over debugging log line that violated linting
  testCreateShrinkIndex should make sure to use the right source stats when testing shrunk target
  [Test] Add unit test for XContentParserUtilsTests.parseStoredFieldsValue (elastic#25288)
  Update percolate-query.asciidoc (elastic#25364)
  Remove remaining `index.mapping.single_type=false` (elastic#25369)
  test: Replace OldIndexBackwardsCompatibilityIT#testOldClusterStates with a full cluster restart qa test
  Fix use of spaces on Windows if JAVA_HOME not set
  ESIndexLevelReplicationTestCase.ReplicationAction#execute should send exceptions to it's listener rather than bubble them up
  testRecoveryAfterPrimaryPromotion shouldn't flush the replica with extra operations
  fix sort and string processor tests around targetField (elastic#25358)
  Ensure `InternalEngineTests.testConcurrentWritesAndCommits` doesn't pile up commits (elastic#25367)
  [TEST] Add debug logging if an unexpected exception is thrown
  Update Painless to Allow Augmentation from Any Class (elastic#25360)
  TemplateUpgraders should be called during rolling restart (elastic#25263)
imotov added a commit that referenced this pull request Jun 23, 2017
@imotov imotov deleted the issue-24680-rolling-restart-upgrades branch May 1, 2020 22:15