-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow excluding a cluster from a CCS search using multi-target syntax #97865
Conversation
Pinging @elastic/es-search (Team:Search) |
Hi @quux00, I've created a changelog YAML for you. |
One issue I have found is that if you negate all the remote indices you specified, rather than searching nothing or returning an error, it searches the local indexes, even when the local cluster was explicitly not included. Example GET remote*:*,-remote*:*/_search?ccs_minimize_roundtrips=false // NOTE: MRT=true has the same result
{
"size": 0,
"aggs": {
"indexgroup": {
"terms": {
"field": "_index"
}
}
}
} returns a response that shows all local indexes were searched: {
"took": 19,
"timed_out": false,
"_shards": {
"total": 7,
"successful": 7,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1204,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"indexgroup": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "blogs",
"doc_count": 1200
},
{
"key": "myidx",
"doc_count": 1
},
{
"key": "test1",
"doc_count": 1
},
{
"key": "test2",
"doc_count": 1
},
{
"key": "test3",
"doc_count": 1
}
]
}
}
} This is a quite surprising and unexpected result, which I assumed was a bug but upon reading the code it seems to be intentional, given this code in RemoteClusterService public Map<String, OriginalIndices> groupIndices(IndicesOptions indicesOptions, String[] indices) {
final Map<String, OriginalIndices> originalIndicesMap = new HashMap<>();
final Map<String, List<String>> groupedIndices = groupClusterIndices(getRemoteClusterNames(), indices);
if (groupedIndices.isEmpty()) {
// search on _all in the local cluster if neither local indices nor remote indices were specified
originalIndicesMap.put(LOCAL_CLUSTER_GROUP_KEY, new OriginalIndices(Strings.EMPTY_ARRAY, indicesOptions));
} else {
for (Map.Entry<String, List<String>> entry : groupedIndices.entrySet()) {
String clusterAlias = entry.getKey();
List<String> originalIndices = entry.getValue();
originalIndicesMap.put(clusterAlias, new OriginalIndices(originalIndices.toArray(new String[0]), indicesOptions));
}
}
return originalIndicesMap;
} In this case That code was added in this PR #33899, which states:
Is that still true? If yes, why? And does that apply to a situation where the user (likely by mistake) both included and then excluded the same remote clusters? Should we instead throw an error in that case? |
The multi-target syntax for indices allows excluding an index with the "-" sign. Public docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/api-conventions.html#api-multi-index This commit expands that functionality to index expressions that include a cluster alias. For example: POST logs*,*:logs*,-remote4:*,-remote1*:*/_async_search Would result in search all remote clusters except for remote4 and remote1, remote11, remote12, remote13, etc.. A singleton wildcard is required in the index position of the `cluster:index`, to specify that we are excluding the entire cluster. This is useful when a cluster is down or slow during CCS searches. Excluding a subset of indexes on a remote cluster is not supported in this commit. For example, this will throw an error: POST logs*,*:logs*,-remote4:logs*/_async_search
…than empty string
I have modified Example:
but this will not throw an exception because the local indices were not excluded:
Note, however, that my code change only handles remote clusters - this will not throw an exception:
but will instead return empty search results:
The local index resolution happens in a different place than the remote cluster resolution and is quite intricate and complex, so I decided to accept the difference. The key issue here is that we need a way to exclude a cluster from a CCS search such that it is never contacted (due to being unavailable for example and causes the entire search request to fail). |
007d5d1
to
c03d07c
Compare
…ded all clusters, so Jbdc tests pass again
`remote*:*,-remote1:*,-remote4:*` will search all clusters with an alias that starts | ||
with "remote" except for "remote1" and "remote4". Note that to exclude a cluster | ||
with this notation you must exclude all of its indexes. Excluding a subset of indexes | ||
on a remote cluster is currently not supported. For example, this will throw an exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excluding a subset of indexes is supported but not quite this way? You can do remote1:-index1
?
Do we reject remote*:abc,-remote1:abc
? I am wondering what makes more sense, and also whether this would be perceived as a limitation from Kibana. I guess more of a convention: "if you want to exclude the cluster, use wildcard for indices".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do remote1:-index1?
Hmm, I haven't tried that. Good question. I'll look into it.
Do we reject
remote*:abc,-remote1:abc
? I am wondering what makes more sense
Yes, the current code rejects that.
I decided not to try to tackle that because it is a slippery slope. At this point in the code where it is handled, we don't have any idea what indexes exist on remote clusters. My goal here was to just exclude entire clusters. For your example, it would be fairly straightforward (though more work) to determine that it is a full cluster exclusion. But then a user might expect they can do remote*:abc*,-remote1:abc
too. But to truly handle that, we'd have to change the sub-search section of TransportSearchAction to send a different request to remote1 then we send to all the other clusters. And the CCS code pathways are already complicated enough that I didn't think that was worth it.
I guess more of a convention: "if you want to exclude the cluster, use wildcard for indices".
Right, my view is just to have a rule "you can exclude the entire cluster" here by using *
in the index position.
So we could allow remote*:abc,-remote1:abc
which is a negation of concrete indexes, but then we have to explain to users why any wildcard other than just remote1:*
is not allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your thinking is pragmatic, let's keep it this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do remote1:-index1?
Thanks for pointing that out. I tested it and it does work in the same way the it does for local indexes. Which has an annoying quirk that you have to use a wildcard in the "inclusion" list to exclude a concrete index.
Example:
This works to exclude blogs
:
POST web_traffic,b*,-blogs/_async_search
but this does not:
POST web_traffic,blogs,-blogs/_async_search
// fails with
"error": {
"type": "status_exception",
"reason": "error while executing search",
"caused_by": {
"type": "index_not_found_exception",
"reason": "no such index [-blogs] and if you intended to exclude this index, ensure that you use wildcards that include it before explicitly excluding it",
"resource.type": "index_or_alias",
"resource.id": "-blogs",
"index_uuid": "_na_",
"index": "-blogs"
}
}
It used to just say "no such index [-blogs]" which is very misleading. I recently added the extra note "if you intended to exclude this index, ensure that you use wildcards that include it before explicitly excluding it".
So similarly, for remote clusters, this works:
POST remote*:blog*,remote1:-blogs/_async_search
but this fails with the original misleading error message:
POST remote*:blogs,remote1:-blogs/_async_search
"error": {
"type": "status_exception",
"reason": "error while executing search",
"caused_by": {
"type": "index_not_found_exception",
"reason": "no such index [-blogs]",
"index_uuid": "_na_",
"resource.type": "index_or_alias",
"resource.id": "-blogs",
"index": "-blogs"
}
}
so I'll file a bug to improve that error message.
Release highlight was added to 8.10 branch with PR #98620 |
elastic/elasticsearch#97865 expands index-pattern expressions to include a cluster alias for purposes of excluding an entire cluster from a cross-cluster search. This allows users to put the minus sign in front of the cluster name (`-cluster_one:*`). The advantage to this change is that it avoids sending any network calls to that cluster. Compare this to the existing syntax for excluding clusters, where the minus sign is in front of the index name (`cluster_one:-*`). The older syntax has to send the request to the remote cluster, which if it is down (and skip_unavailable=false), will cause the search to fail. This PR updates the docs to reflect the new syntax. --------- Co-authored-by: Kibana Machine <[email protected]>
elastic/elasticsearch#97865 expands index-pattern expressions to include a cluster alias for purposes of excluding an entire cluster from a cross-cluster search. This allows users to put the minus sign in front of the cluster name (`-cluster_one:*`). The advantage to this change is that it avoids sending any network calls to that cluster. Compare this to the existing syntax for excluding clusters, where the minus sign is in front of the index name (`cluster_one:-*`). The older syntax has to send the request to the remote cluster, which if it is down (and skip_unavailable=false), will cause the search to fail. This PR updates the docs to reflect the new syntax. --------- Co-authored-by: Kibana Machine <[email protected]> (cherry picked from commit 4b02740)
# Backport This will backport the following commits from `main` to `8.10`: - [update data view docs for excluding cluster (#164904)](#164904) <!--- Backport version: 8.9.7 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) <!--BACKPORT [{"author":{"name":"Nathan Reese","email":"[email protected]"},"sourceCommit":{"committedDate":"2023-08-31T14:27:03Z","message":"update data view docs for excluding cluster (#164904)\n\nhttps://github.com/elastic/elasticsearch/pull/97865 expands\r\nindex-pattern expressions to include a cluster alias for purposes of\r\nexcluding an entire cluster from a cross-cluster search. This allows\r\nusers to put the minus sign in front of the cluster name\r\n(`-cluster_one:*`). The advantage to this change is that it avoids\r\nsending any network calls to that cluster. Compare this to the existing\r\nsyntax for excluding clusters, where the minus sign is in front of the\r\nindex name (`cluster_one:-*`). The older syntax has to send the request\r\nto the remote cluster, which if it is down (and skip_unavailable=false),\r\nwill cause the search to fail.\r\n\r\nThis PR updates the docs to reflect the new syntax.\r\n\r\n---------\r\n\r\nCo-authored-by: Kibana Machine <[email protected]>","sha":"4b02740b32542b6e84ee7a8abec9cb355f70db35","branchLabelMapping":{"^v8.11.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["Team:Docs","release_note:skip","v8.10.0","v8.11.0"],"number":164904,"url":"https://github.com/elastic/kibana/pull/164904","mergeCommit":{"message":"update data view docs for excluding cluster (#164904)\n\nhttps://github.com/elastic/elasticsearch/pull/97865 expands\r\nindex-pattern expressions to include a cluster alias for purposes of\r\nexcluding an entire cluster from a cross-cluster search. This allows\r\nusers to put the minus sign in front of the cluster name\r\n(`-cluster_one:*`). The advantage to this change is that it avoids\r\nsending any network calls to that cluster. Compare this to the existing\r\nsyntax for excluding clusters, where the minus sign is in front of the\r\nindex name (`cluster_one:-*`). The older syntax has to send the request\r\nto the remote cluster, which if it is down (and skip_unavailable=false),\r\nwill cause the search to fail.\r\n\r\nThis PR updates the docs to reflect the new syntax.\r\n\r\n---------\r\n\r\nCo-authored-by: Kibana Machine <[email protected]>","sha":"4b02740b32542b6e84ee7a8abec9cb355f70db35"}},"sourceBranch":"main","suggestedTargetBranches":["8.10"],"targetPullRequestStates":[{"branch":"8.10","label":"v8.10.0","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.11.0","labelRegex":"^v8.11.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/164904","number":164904,"mergeCommit":{"message":"update data view docs for excluding cluster (#164904)\n\nhttps://github.com/elastic/elasticsearch/pull/97865 expands\r\nindex-pattern expressions to include a cluster alias for purposes of\r\nexcluding an entire cluster from a cross-cluster search. This allows\r\nusers to put the minus sign in front of the cluster name\r\n(`-cluster_one:*`). The advantage to this change is that it avoids\r\nsending any network calls to that cluster. Compare this to the existing\r\nsyntax for excluding clusters, where the minus sign is in front of the\r\nindex name (`cluster_one:-*`). The older syntax has to send the request\r\nto the remote cluster, which if it is down (and skip_unavailable=false),\r\nwill cause the search to fail.\r\n\r\nThis PR updates the docs to reflect the new syntax.\r\n\r\n---------\r\n\r\nCo-authored-by: Kibana Machine <[email protected]>","sha":"4b02740b32542b6e84ee7a8abec9cb355f70db35"}}]}] BACKPORT--> Co-authored-by: Nathan Reese <[email protected]>
The multi-target syntax for indices allows excluding an index with the "-" sign. Public docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/api-conventions.html#api-multi-index
This commit expands that functionality to index expressions that include a cluster alias
for purposes of excluding an entire cluster from a cross-cluster search.
For example:
POST logs*,*:logs*,-remote4:*,-remote1*:*/_async_search
would result in searching all remote clusters except for remote4 and remote1, remote11, remote12, remote13, etc..
A singleton wildcard is required in the index position of the
cluster:index
index expression, to specify that we are excluding the entire cluster. This is useful when a cluster is down or slow during CCS searches.Excluding a subset of indexes on a remote cluster is not supported in this commit. For example, this will throw an error:
POST logs*,*:logs*,-remote4:logs*/_async_search