Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rebuild remote connections on profile changes #37678

Merged
merged 18 commits into from
Feb 19, 2019

Conversation

Tim-Brooks
Copy link
Contributor

Currently remote compression and ping schedule settings are dynamic.
However, we do not listen for changes. This commit adds listeners for
changes to those two settings. Additionally, when those settings change
we now close existing connections and open new ones with the settings
applied.

Fixes #37201.

@Tim-Brooks Tim-Brooks added >bug :Distributed Coordination/Network Http and internode communication implementations v7.0.0 v6.7.0 labels Jan 22, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@Tim-Brooks
Copy link
Contributor Author

This is a POC for the change. I had to add a method to handle listening for more than two related settings. So please take a look at that as it is an area of the code I have not worked with much.

Additionally I need to add a test for setting changes in the RemoteClusterServiceTests. I will get to that in the next day or two.

* Note: Only settings registered in {@link SettingsModule} can be changed dynamically.
* </p>
*/
public synchronized void addAffixGroupUpdateConsumer(List<Setting.AffixSetting<?>> settings, BiConsumer<String, Settings> consumer) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pull this out into a separate change, leaving the usage here for reference? Then we can integrate this back after that has a separate review from the core/infra team.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #37679

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a question

getNodePredicate(settings), proxyAddress);
remoteClusters.put(clusterAlias, remote);
} else if (connectionProfileChanged(remote.getConnectionManager().getConnectionProfile(), connectionProfile)) {
// New ConnectionProfile. Must tear down existing connection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I understand this correctly we will cause existing connections to fail and further, if concurrently somebody is using is it will not be able to retry since it's close and there will be intermediate failures in such a case. I think in this case we have to somehow reconfigure instead of closing? I know I would like this to be simple but the sideeffects of a close of a connection while some consumer is "in the middle of something" might cause a lot of headaches since it's a rare event, hard to debug and to test. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you're describing is possible but will require changes at multiple levels of the networking stack. The ConnectionProfile is not currently mutable. So compression and ping intervals will need to be mutable. It also raises the question of whether connection timeout and handshake timeout would be mutable. I guess not since they are not dynamic settings currently. I think it would be very complicated to change the number of channels and their types, so that should stay static.

The compression change would be straightforward. For every request it we could just read from the volatile variable in the connection profile. Ping interval would be trickier because we would need to reschedule it to a different ping interval (probably at the completion of a ping-run).

It can be done, but it is uncertain to make the next release. It also would require reworking a lot with ConnectionProfile. Let me know if this is your preference and I will pursue updating this PR with those changes at some point in the future.

@Tim-Brooks Tim-Brooks requested a review from s1monw January 31, 2019 16:37
@Tim-Brooks
Copy link
Contributor Author

I just want to send a ping that this is still outstanding. Maybe we should have a discussion about what behavior we want here?

@jasontedor jasontedor added v8.0.0 and removed v7.0.0 labels Feb 6, 2019
@Tim-Brooks
Copy link
Contributor Author

@jasontedor @ywelsch @s1monw - I know that we spoke about updating settings and accepting occasional failures earlier today.

However, when working on implementing a test for this scenario (that ccr following will recover) I encountered an issue. Both REMOTE_CLUSTER_PING_SCHEDULE and REMOTE_CLUSTER_COMPRESS mark REMOTE_CLUSTERS_SEEDS as a dependency. So you cannot currently submit a settings update without including that setting. Even if the seeds already exist, there must be seeds included in the settings update request.

java.lang.IllegalArgumentException: missing required setting [cluster.remote.leader_cluster.seeds] for setting [cluster.remote.leader_cluster.transport.compress]

	at __randomizedtesting.SeedInfo.seed([57EB59A912962E51:91509C043368EEF9]:0)
	at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:545)
	at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:476)
	at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:447)
	at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:418)
	at org.elasticsearch.action.admin.cluster.settings.SettingsUpdater.updateSettings(SettingsUpdater.java:95)
	at org.elasticsearch.action.admin.cluster.settings.TransportClusterUpdateSettingsAction$1.execute(TransportClusterUpdateSettingsAction.java:182)
	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47)
	at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:687)
	at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:310)
	at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:210)
	at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142)
	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150)
	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(Thread.java:834)

I assume this is fine? They need to re-add the seeds in order to add compression and ping schedule.

@Tim-Brooks
Copy link
Contributor Author

I also updated the documentation in remote-clusters.asciidoc to describe the behavior. @lcawl do you mind taking a look?

@Tim-Brooks Tim-Brooks requested a review from lcawl February 13, 2019 22:41
@ywelsch
Copy link
Contributor

ywelsch commented Feb 15, 2019

However, when working on implementing a test for this scenario (that ccr following will recover) I encountered an issue. Both REMOTE_CLUSTER_PING_SCHEDULE and REMOTE_CLUSTER_COMPRESS mark REMOTE_CLUSTERS_SEEDS as a dependency. So you cannot currently submit a settings update without including that setting. Even if the seeds already exist, there must be seeds included in the settings update request.
I assume this is fine? They need to re-add the seeds in order to add compression and ping schedule.

I think this is fine as long as it's documented.

Copy link
Contributor

@lcawl lcawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a few additional edits in the documentation. LGTM and builds successfully

@s1monw
Copy link
Contributor

s1monw commented Feb 19, 2019

I assume this is fine? They need to re-add the seeds in order to add compression and ping schedule.

the point here was that this should only be set when the cluster is added. I think we should add this as a recommendation.

@Tim-Brooks Tim-Brooks merged commit a5cbef9 into elastic:master Feb 19, 2019
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request Feb 19, 2019
Currently remote compression and ping schedule settings are dynamic.
However, we do not listen for changes. This commit adds listeners for
changes to those two settings. Additionally, when those settings change
we now close existing connections and open new ones with the settings
applied.

Fixes elastic#37201.
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request Feb 19, 2019
Currently remote compression and ping schedule settings are dynamic.
However, we do not listen for changes. This commit adds listeners for
changes to those two settings. Additionally, when those settings change
we now close existing connections and open new ones with the settings
applied.

Fixes elastic#37201.
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request Feb 19, 2019
Currently remote compression and ping schedule settings are dynamic.
However, we do not listen for changes. This commit adds listeners for
changes to those two settings. Additionally, when those settings change
we now close existing connections and open new ones with the settings
applied.

Fixes elastic#37201.
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 20, 2019
…follow

* elastic/master: (37 commits)
  Enable test logging for TransformIntegrationTests#testSearchTransform.
  stronger wording for ilm+rollover in docs (elastic#39159)
  Mute SingleNodeTests (elastic#39156)
  AwaitsFix XPackUsageIT#testXPackCcrUsage.
  Resolve concurrency with watcher trigger service (elastic#39092)
  Fix median calculation in MedianAbsoluteDeviationAggregatorTests (elastic#38979)
  [DOCS] Edits the remote clusters documentation (elastic#38996)
  add version 6.6.2
  Revert "Mute failing test 20_mix_typless_typefull (elastic#38781)" (elastic#38912)
  Rebuild remote connections on profile changes (elastic#37678)
  Document 'max_size' parameter as shard size for rollover (elastic#38750)
  Add some missing toString() implementations (elastic#39124)
  Migrate Streamable to Writeable for cluster block package (elastic#37391)
  fix RethrottleTests retry (elastic#38978)
  Disable date parsing test in non english locale (elastic#39052)
  Remove BCryptTests (elastic#39098)
  [ML] Stop the ML memory tracker before closing node (elastic#39111)
  Allow retention lease operations under blocks (elastic#39089)
  ML refactor DatafeedsConfig(Update) so defaults are not populated in queries or aggs (elastic#38822)
  Fix retention leases sync on recovery test
  ...
@Tim-Brooks Tim-Brooks deleted the fix_remote_connection_updates branch April 30, 2020 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Determine behavior for dynamic remote cluster compress and ping schedule settings
8 participants