
Avoid copying during iteration of all shards in routing table #94417

Merged

DaveCTurner merged 5 commits into elastic:main on Mar 10, 2023

Conversation

@luyuncheng (Contributor) commented Mar 9, 2023

We use `GET _cat/allocation` to get the cluster allocation.

The logic in the code is:

for (ShardRouting shard : state.getState().routingTable().allShards()) {
    String nodeId = "UNASSIGNED";
    if (shard.assignedToNode()) {
        nodeId = shard.currentNodeId();
    }
    allocs.merge(nodeId, 1, Integer::sum);
}

public List<ShardRouting> allShards() {
    List<ShardRouting> shards = new ArrayList<>();
    for (String index : indicesRouting.keySet()) {
        List<ShardRouting> allShardsIndex = allShards(index);
        shards.addAll(allShardsIndex);
    }
    return shards;
}

This iterates over all the shards twice: once inside `allShards()` to copy them into a new list, and once in the caller's loop. We can deduplicate that work.

@elasticsearchmachine added the v8.8.0, external-contributor, and needs:triage labels Mar 9, 2023
@DaveCTurner added the >bug and :Distributed Coordination/Allocation labels and removed the needs:triage label Mar 9, 2023
@DaveCTurner self-assigned this Mar 9, 2023
@elasticsearchmachine (Collaborator) commented

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine added the Team:Distributed (Obsolete) label Mar 9, 2023
@DaveCTurner (Contributor) left a comment

I think your analysis is correct. However, this applies to all callers of `RoutingTable#allShards()`: there's just no good reason to copy all the shards in the cluster into a new list like this, so I think it would be better to fix the method on `RoutingTable` so that all callers (including future callers) see the benefit. For instance, try making it return `Iterable<ShardRouting>` (you'll probably find `org.elasticsearch.common.collect.Iterators#flatMap` useful in this endeavour) or `Stream<ShardRouting>`.
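A minimal sketch of that suggestion, based only on the snippet in the description (the merged code may differ): return a lazy view over the per-index lists instead of eagerly copying every shard into one big list. It assumes `Iterators.flatMap` takes an iterator plus a function mapping each element to a sub-iterator.

// Hypothetical sketch: a lazy Iterable instead of an eager copy.
public Iterable<ShardRouting> allShards() {
    return () -> Iterators.flatMap(
        indicesRouting.keySet().iterator(),  // iterate index names lazily
        index -> allShards(index).iterator() // allShards(String) from the snippet above
    );
}

Callers like the `_cat/allocation` loop keep working unchanged, since the enhanced for statement only needs an `Iterable`, and the cluster-wide list is never materialized. A `Stream<ShardRouting>` variant via `indicesRouting.keySet().stream().flatMap(index -> allShards(index).stream())` would work equally well.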

@DaveCTurner (Contributor) left a comment

Looks ok to me, but @original-brownbear IIRC we removed some methods that made it easy to iterate over the routing table because they were over-used and much more expensive than doing it by index. Are you ok with bringing them back in this limited format to avoid `GET _cat/shards` and `GET _cat/allocation` having to do all that copying?

@DaveCTurner (Contributor) commented

@elasticmachine ok to test

@luyuncheng (Contributor, Author) commented

> Are you ok with bringing them back in this limited format to avoid `GET _cat/shards` and `GET _cat/allocation` having to do all that copying?

LGTM, let me try it

@DaveCTurner (Contributor) commented

> LGTM, let me try it

@luyuncheng sorry, that question was intended for @original-brownbear, who worked on related optimisations such as #84987 and #84955. No need for you to do anything here.

@original-brownbear (Member) commented

@DaveCTurner yea definitely, that makes sense to me!

@DaveCTurner (Contributor) left a comment

LGTM then 😄

@DaveCTurner changed the title from "Reduce iterator duplicated in RestAllocationAction" to "Avoid copying during iteration of all shards in routing table" Mar 9, 2023
@luyuncheng (Contributor, Author) commented

> > LGTM, let me try it
>
> @luyuncheng sorry, that question was intended for @original-brownbear, who worked on related optimisations such as #84987 and #84955. No need for you to do anything here.

Got it, it's great that #84987 and #84955 already optimized this.

@DaveCTurner merged commit 78f5cf0 into elastic:main Mar 10, 2023
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Apr 11, 2023
Today when applying a new cluster state we block the cluster applier thread for
up to 5s while waiting to acquire each shard lock. Failure to acquire the shard
lock is treated as an allocation failure, so after 5 retries (by default) we
give up on the allocation.

The shard lock may be held by some other actor, typically the previous
incarnation of the shard which is still shutting down, but it will eventually
be released. Yet, 5 retries of 5s each is sometimes not enough time to wait.
Knowing that the shard lock will eventually be released, we can retry much more
tenaciously.

Moreover there's no reason why we have to create the `IndexShard` while
applying the cluster state, because the shard remains in state `INITIALIZING`,
and therefore unused, while it coordinates its own recovery.

With this commit we try and acquire the shard lock during cluster state
application, but do not wait if the lock is unavailable. Instead, we schedule a
retry (also executed on the cluster state applier thread) and proceed with the
rest of the cluster state application process.

Relates elastic#24530
Backport of elastic#94545 and elastic#94623 (and a little bit of elastic#94417) to 8.7
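A generic illustration of the pattern that commit message describes (not Elasticsearch's actual implementation): try to take the lock without blocking, and if it is currently held, schedule a retry instead of waiting, so the applier thread can proceed with the rest of its work.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class NonBlockingShardCreation {
    private final ReentrantLock shardLock = new ReentrantLock(); // stands in for the real shard lock
    private final ScheduledExecutorService applierThread = Executors.newSingleThreadScheduledExecutor();

    void tryCreateShard() {
        if (shardLock.tryLock()) { // do not wait if the lock is unavailable
            try {
                createShard();
            } finally {
                shardLock.unlock();
            }
        } else {
            // schedule a retry on the applier thread and carry on with the
            // rest of cluster state application in the meantime
            applierThread.schedule(this::tryCreateShard, 100, TimeUnit.MILLISECONDS);
        }
    }

    private void createShard() { /* recovery proceeds while the shard stays INITIALIZING */ }
}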
DaveCTurner added a commit that referenced this pull request Apr 11, 2023
(Same commit message as above; backport of #94545 and #94623, and a little bit of #94417, to 8.7.)
Labels
>bug, :Distributed Coordination/Allocation, external-contributor, Team:Distributed (Obsolete), v8.8.0

4 participants