-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TemplateUpgraders should be called during rolling restart #25263
TemplateUpgraders should be called during rolling restart #25263
Conversation
In elastic#24379 we added ability to upgrade templates on full cluster startup. This PR invokes the same update procedure also when a new node first joins the cluster allowing to update templates on a full cluster restart. Closes elastic#24680
@@ -460,6 +463,7 @@ protected Node(final Environment environment, Collection<Class<? extends Plugin> | |||
b.bind(UsageService.class).toInstance(usageService); | |||
b.bind(NamedWriteableRegistry.class).toInstance(namedWriteableRegistry); | |||
b.bind(MetaDataUpgrader.class).toInstance(metaDataUpgrader); | |||
b.bind(TemplateUpgradeService.class).toInstance(templateUpgradeService); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this is only bound in Guice so that tests can get access to it, so they can check if upgrades are in progress. Is this something that should be exposed via an api? At least let's find another way so we can keep additional stuff out of guice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I rewrote the test to get rid of guice there. I don't think we need an api for that. I was just trying to prevent a possible race condition in the test, but I found another way to do that without guice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few minor nits and a question. Otherwise, LGTM
@@ -431,7 +439,6 @@ public static void toXContent(IndexTemplateMetaData indexTemplateMetaData, XCont | |||
} | |||
builder.endObject(); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra newline
* Upgrades Templates on behalf of installed {@link Plugin}s when a node joins the cluster | ||
*/ | ||
public class TemplateUpgradeService extends AbstractComponent implements ClusterStateListener { | ||
private final Tuple<Map<String, BytesReference>, Set<String>> EMPTY = new Tuple<>(Collections.emptyMap(), Collections.emptySet()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EMPTY
could have a more descriptive name
AtomicInteger updateCount = new AtomicInteger(); | ||
// Make sure all templates are recreated correctly | ||
assertBusy(() -> { | ||
logger.info("checking...."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a more descriptive log message? or is this leftover?
assertThat(removedListener.getAndSet((ActionListener) args[1]), nullValue()); | ||
return null; | ||
}).when(mockIndicesAdminClient).deleteTemplate(any(DeleteIndexTemplateRequest.class), any(ActionListener.class)); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra newline
|
||
public class TemplateUpgradeServiceTests extends ESTestCase { | ||
|
||
private ClusterService clusterService = new ClusterService(Settings.EMPTY, new ClusterSettings(Settings.EMPTY, ClusterSettings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final?
deleteTemplateListeners.add((ActionListener) args[1]); | ||
return null; | ||
}).when(mockIndicesAdminClient).deleteTemplate(any(DeleteIndexTemplateRequest.class), any(ActionListener.class)); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra newline
Collections.emptyList()); | ||
|
||
service.updateTemplates(additions, deletions); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add an assert here that putTemplateListeners.size() == additionsCount
and deleteTemplateListeners.size() == deletionsCount
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit^2: assertThat(putTemplateListeners, hasSize(additionsCount));
putTemplateListeners.get(i).onFailure(new RuntimeException()); | ||
} else { | ||
putTemplateListeners.get(i).onResponse(new PutIndexTemplateResponse(randomBoolean()) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its not clear to me what the purpose of this response listener is? we are not asserting anything when invoking the action listener, so I'm not clear what its role is in the test?
To make sure this is addressed (and I may have just missed this, which I assume). The current cluster state listener implementation of |
@spinscale you didn't miss anything. We are changing the way templates are upgraded - they are upgraded when a node with a new version joins not when it becomes the master, hence we need to run it on every node in the cluster. As a result, you are right, multiple nodes may try to creating the same template at the same time. Not sure if we can do much about out giving the requirements. Perhaps we can try optimizing MetaDataIndexTemplateService to not cause cluster state update if the template is exactly the same. |
@spinscale and I discussed this a bit more and we came up with a schema that will allow the new nodes to update the template but will prevent update swarms. Before trying to update the templates
|
@spinscale, @abeyad, I pushed the new logic that ensures that only one node can try to make updates at a time. Could you take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I keep thinking if the explicit master node check is fully necessary or can just be part of the highest version check - but I dont have any particular opinion to be honest. Thanks for doing this and coming up with the much better solution.
|
||
private final UnaryOperator<Map<String, IndexTemplateMetaData>> indexTemplateMetaDataUpgraders; | ||
|
||
public final ClusterService clusterService; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intentionally public along with the next two variables?
|
||
}); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are calling the listeners here, but not doing any further assertions? i.e. for the updatesInProgress
variable?
ExecutorService executorService = mock(ExecutorService.class); | ||
when(threadPool.generic()).thenReturn(executorService); | ||
doAnswer(invocation -> { | ||
Object[] args = invocation.getArguments(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could just use a EsExecutors.newDirectExecutorService()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh... didnt see the updateInvocation
counter... nvm
Collections.emptyList()); | ||
|
||
service.updateTemplates(additions, deletions); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit^2: assertThat(putTemplateListeners, hasSize(additionsCount));
} | ||
} | ||
|
||
public void testClientNodeRunsTemplateUpdates() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that should not be doesNotRunTemplateUpgrade
I assume
} | ||
for (ObjectCursor<DiscoveryNode> node : nodes.getMasterAndDataNodes().values()) { | ||
if (node.value.getVersion().equals(maxVersion)) { | ||
if (node.value.getId().compareTo(localNode.getId()) > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be combined to a single if-statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* master: testCreateShrinkIndex: removed left over debugging log line that violated linting testCreateShrinkIndex should make sure to use the right source stats when testing shrunk target [Test] Add unit test for XContentParserUtilsTests.parseStoredFieldsValue (elastic#25288) Update percolate-query.asciidoc (elastic#25364) Remove remaining `index.mapping.single_type=false` (elastic#25369) test: Replace OldIndexBackwardsCompatibilityIT#testOldClusterStates with a full cluster restart qa test Fix use of spaces on Windows if JAVA_HOME not set ESIndexLevelReplicationTestCase.ReplicationAction#execute should send exceptions to it's listener rather than bubble them up testRecoveryAfterPrimaryPromotion shouldn't flush the replica with extra operations fix sort and string processor tests around targetField (elastic#25358) Ensure `InternalEngineTests.testConcurrentWritesAndCommits` doesn't pile up commits (elastic#25367) [TEST] Add debug logging if an unexpected exception is thrown Update Painless to Allow Augmentation from Any Class (elastic#25360) TemplateUpgraders should be called during rolling restart (elastic#25263)
In #24379 we added ability to upgrade templates on full cluster startup. This PR invokes the same update procedure also when a new node first joins the cluster allowing to update templates on a rolling cluster restart as well.
Closes #24680
@abeyad, @spinscale could you take a look to make sure this will work for you?