[FEATURE] Support Cross Cluster Search in SQL/PPL #789
Yes, I would like to have this feature to get the CCS capability with my old SQL queries.
1 Introduction
We will allow any node in an OpenSearch cluster to execute search requests against other OpenSearch clusters using Piped Processing Language (PPL) and SQL. The remote cluster should respond with the documents matching the search query. This is a feature already supported by the OpenSearch service: users can set up a connection between clusters, and later use that connection for cross cluster search using DSL. The cluster accepting the user request is the coordinating cluster, and it forwards the request to the remote clusters to fetch results. The goal of this project is to allow users to run such searches using PPL and SQL.
2 Background
2.1 Motivation
PPL and SQL queries against data stored in OpenSearch are currently restricted to the local cluster. Customers cannot invoke cross cluster search with either PPL or SQL, and have to connect to each remote cluster individually to fetch data. This is a pain point for them. As we integrate more data sources into our query engine, such as Prometheus for event monitoring, so that customers can use PPL to explore and discover data from various sources, we also want to better support customers who set up multiple smaller OpenSearch clusters instead of a single large cluster.
2.2 Cross Cluster Search (CCS) in OpenSearch
Users can add the remote cluster to the coordinating cluster’s settings:
Syntax for cross cluster search is as follows:
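As a rough illustration of the two steps in this section, the sketch below builds the request body that registers a remote cluster and the search path that targets an index on it. The helper functions, the alias `my_remote_cluster`, the seed address, and the index name are hypothetical placeholders; only the `cluster.remote.<alias>.seeds` setting and the `<alias>:<index>` addressing are taken from OpenSearch's CCS behaviour.

```python
import json


def remote_cluster_settings(alias, seeds):
    """Build a body for `PUT _cluster/settings` that registers a remote cluster.

    `alias` is the connection name later used as the prefix in `alias:index`.
    `seeds` is a list of "host:transport_port" strings for the remote cluster.
    """
    return {"persistent": {"cluster.remote": {alias: {"seeds": list(seeds)}}}}


def ccs_search_path(alias, index):
    """Build the DSL search path for an index that lives on a remote cluster."""
    return f"/{alias}:{index}/_search"


if __name__ == "__main__":
    # Placeholder values, not taken from the original issue.
    body = remote_cluster_settings("my_remote_cluster", ["127.0.0.1:9300"])
    print("PUT /_cluster/settings")
    print(json.dumps(body, indent=2))
    print("GET " + ccs_search_path("my_remote_cluster", "my_index"))
```

The coordinating cluster resolves the alias in the path to the registered connection and forwards the search to that cluster.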
2.3 Challenge for CCS in PPL/SQL
Our execution flow for each query involves two major steps:
In order to support CCS in PPL/SQL, we need to make CCS possible for both steps.
The main design decision to make is: How to get index mapping and cluster settings from remote clusters?
3 User Interface
3.1 Syntax Options
3.1.1 Option 1 (preferred): Colon syntax
This option uses the same identifier syntax as that of CCS in OpenSearch.
Identifiers
Identifiers should match one of the following syntax:
Both
Examples
PPL
SQL
Pros and Cons
Pros
Cons
3.1.2 Option 2: Dot syntax
This option provides the same user experience with standard
Identifiers
Identifiers should match one of the following syntax:
Both
Examples
PPL
SQL
Pros and Cons
Pros
Cons
3.2 Response formats
Response formats are exactly the same as a regular local cluster search. They are described here: https://opensearch.org/docs/latest/search-plugins/sql/response-formats/
The default response format is jdbc for both PPL and SQL. An example response is as follows:
4 Design
We would like to fill the gap between the SQL plugin and the OpenSearch CCS service.
4.1 Option 1
Support a “get mappings” query for CCS in OpenSearch, filling the gap shown in the diagram in section 2.3.
Pros
Cons
4.2 Option 2
Find a way to connect to each remote cluster individually, and route the cross cluster search requests to the remote clusters directly. For example, registering each remote cluster as a datasource.
Evaluation
Here are some reasons we believe this is not the right call:
4.3 Option 3 (preferred)
Work around the gap by querying the local cluster for field mappings and cluster settings, assuming that they are the same as the remote clusters’.
This limitation can be removed once we can fetch cross cluster index mappings through OpenSearch.
Following the assumption that remote index mappings are also present on the local cluster, we drop the cluster name and use only the index name to query the local cluster for its mapping.
A breakdown of how we gather each piece of information for a cross cluster search query:
For multi-indices query:
Note that if the user wants to search a single index on a single remote cluster, and no index with the same name exists on the local cluster, the search is not allowed. As a workaround, the user can create a mapping table for that index on the local cluster. It does not need to store actual data; it contains only the schema. This way, we can read the field mapping from the local cluster and search the data on the remote cluster.
Evaluation
Overall, this option might lead to the most confusing experience for users. However, it naturally solves the use case where an index is sharded across multiple clusters. With clear documentation and response warnings / error handling, this option could work for some use cases.
Appendix
A. PPL/SQL Search Syntax
PPL
Both the asterisk (*) in index names and comma-separated multi-index search are supported. However, we do not support searching using only an asterisk.
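The rules above, combined with the colon syntax from section 3.1.1 and the local-mapping workaround from section 4.3, can be sketched roughly as follows. This is an illustrative sketch, not the plugin's implementation: `IndexRef`, `parse_index_list`, and `local_mapping_indices` are hypothetical names, and treating only an identifier that is exactly `*` as the rejected "asterisk only" case is an assumption.

```python
from typing import List, NamedTuple, Optional


class IndexRef(NamedTuple):
    cluster: Optional[str]  # None means the local (coordinating) cluster
    index: str              # index name or pattern, may contain '*'


def parse_index_list(source: str) -> List[IndexRef]:
    """Split a comma-separated index list into (cluster, index) references."""
    refs = []
    for raw in source.split(","):
        name = raw.strip()
        if not name:
            raise ValueError("empty index name")
        if name == "*":
            # Assumption: only a bare '*' counts as "searching only using an asterisk".
            raise ValueError("searching using only an asterisk is not supported")
        cluster, sep, index = name.partition(":")
        if sep:
            if not cluster or not index:
                raise ValueError(f"malformed identifier: {name!r}")
            refs.append(IndexRef(cluster, index))
        else:
            refs.append(IndexRef(None, name))
    return refs


def local_mapping_indices(refs: List[IndexRef]) -> List[str]:
    """Index names to look up on the local cluster, with the cluster name dropped."""
    names = []
    for ref in refs:
        if ref.index not in names:  # keep first-seen order, skip duplicates
            names.append(ref.index)
    return names


if __name__ == "__main__":
    refs = parse_index_list("my_remote_cluster:logs-*, logs-local")
    print(refs)
    print(local_mapping_indices(refs))
```

Under this sketch, a query over `my_remote_cluster:logs-*` would read its field mapping from the local `logs-*` indices and fetch the data from the remote cluster.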
SQL
SQL does not use the comma syntax for querying multiple indices. We can use the
Using the asterisk character (*), we can query multiple indices that match the expression. However, we do not support searching only using an asterisk.
B. Index Mapping and Cluster Settings in Query Processing
Currently, cluster settings are used to determine the maximum size of the response. Index mapping is used for
Remote clusters shouldn't be data sources. Today's concept of cross cluster search exists independently of data sources.
I would prioritize PPL work over SQL. It would always be good to have both, but we have many users asking for PPL soon.
PPL CCS is released.
Is your feature request related to a problem?
Currently, SQL/PPL does not support cross cluster search (CCS) because CCS does not support fetching index mappings from remote clusters.
What solution would you like?
Support CCS in SQL/PPL