[DOC] Misleading and unclear documentation for the Spark Connector in the SQL/PPL docs #8212

salyh · 2024-09-11T14:25:33Z

What do you want to do?

According to this comment, the Spark connector only supports AWS EMR Serverless Spark (which means I need to have AWS credentials). This should be made clear in the docs.
The docs lack examples of how to set up EMR Serverless Spark and OpenSearch and where to provide the configuration (like spark.uri). For a user, it is unclear how to set up a basic working example.
Some of the config properties lack examples and do not say which values are valid:
- spark.uri: the description "The identifier for your Spark data source." is misleading, lacks an example, and does not state what the default is or whether the property is mandatory
- spark.auth.type: it is unclear which values are valid, what the default is, and whether the property is mandatory
The Spark connector docs lack a reference to https://opensearch.org/docs/latest/dashboards/management/data-sources/ (and potentially https://opensearch.org/docs/latest/dashboards/management/accelerate-external-data/) and an explanation, with examples, of how to add Spark as a data source
The docs are not consistent with https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/spark_connector.rst
- emr.cluster is missing, for example
The PPL example is unclear:

POST /_plugins/_ppl
content-type: application/json
{
   "query": "source = my_spark.sql('select * from alb_logs')"
}

What does my_spark refer to?
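
To illustrate the ambiguity, here is a minimal Python sketch of how the request body above appears to be assembled. It assumes that my_spark is the name under which the Spark data source was registered, which is exactly the point the docs should confirm. The helper build_ppl_request is hypothetical and only builds the JSON body; it does not contact a cluster.

```python
import json


def build_ppl_request(datasource_name: str, spark_sql: str) -> str:
    """Build the JSON body for POST /_plugins/_ppl (hypothetical helper).

    Assumption: the prefix before `.sql(...)` is the name of a configured
    Spark data source. The SQL text is inserted as-is, so it must not
    contain single quotes.
    """
    query = f"source = {datasource_name}.sql('{spark_sql}')"
    return json.dumps({"query": query})


# Reproduces the body from the docs example, with the data source name made explicit.
body = build_ppl_request("my_spark", "select * from alb_logs")
print(body)
```

If that assumption holds, the docs example would read more clearly with a sentence stating that my_spark must match the name of an existing data source.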

Version:
All versions since the Spark connector has been supported

What other resources are available?


Naarcha-AWS · 2024-09-17T12:19:10Z

@salyh: Thanks for submitting this issue! I'll find a dev who can help make the changes you requested.

dblock · 2024-09-30T16:16:36Z

[Catch All Triage - 1, 2, 3, 4]

salyh · 2024-10-07T08:53:05Z

salyh added the untriaged label Sep 11, 2024

salyh mentioned this issue Sep 12, 2024

Add Docker-compose setting for IT testing opensearch-project/opensearch-spark#606

Closed

Naarcha-AWS self-assigned this Sep 17, 2024

Naarcha-AWS added the 1 - Backlog - DEV Developer assigned to issue is responsible for creating PR. label Sep 17, 2024

dblock removed the untriaged label Sep 30, 2024
