
Improve docs for supporting Hadoop-compatible file systems when using HDFS … #24918

Merged

Conversation


@hqbhoho hqbhoho commented Feb 6, 2025

Description

Follow up to #24627

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Feb 6, 2025
@github-actions github-actions bot added the docs label Feb 6, 2025
@hqbhoho hqbhoho requested a review from mosabua February 6, 2025 03:32

hqbhoho commented Feb 6, 2025

@losipiuk @mosabua Could you help review it? Thanks!


@mosabua mosabua left a comment


We should clarify whether this is about Alluxio or some other partially Hadoop-compatible system. And if we make Alluxio the concrete example, we should link to where to get the JAR files from and what they are called.

@@ -507,6 +507,10 @@ the property may be configured for:
- Block [data size](prop-type-data-size) for HDFS storage.
- `4MB`
- HDFS
* - `exchange.hdfs.skip-directory-scheme-validation`
- Skip directory scheme validation to support hadoop compatible file system.
Member


Suggested change
- Skip directory scheme validation to support hadoop compatible file system.
- Skip directory scheme validation to support Hadoop-compatible file system.

arguably "partially Hadoop-compatible" right?

@@ -603,6 +607,19 @@ exchange-manager.name=hdfs
exchange.base-directories=hadoop-master:9000/exchange-spooling-directory
hdfs.config.resources=/usr/lib/hadoop/etc/hadoop/core-site.xml
```
You can enable `exchange.hdfs.skip-directory-scheme-validation` to support hadoop compatible file system. Please do the following steps:
Member


Suggested change
You can enable `exchange.hdfs.skip-directory-scheme-validation` to support hadoop compatible file system. Please do the following steps:
You can enable `exchange.hdfs.skip-directory-scheme-validation` to support other Hadoop-compatible file systems:

1. Configure AbstractFileSystem implementation in `core-site.xml`.
Member


Suggested change
1. Configure AbstractFileSystem implementation in `core-site.xml`.
1. Configure the `AbstractFileSystem` implementation in `core-site.xml`.

2. Put the relevant client jars into the directory `${Trino_HOME}/plugin/exchange-hdfs` on all Trino servers.
Member


Suggested change
2. Put the relevant client jars into the directory `${Trino_HOME}/plugin/exchange-hdfs` on all Trino servers.
2. Add the relevant client JAR files into the directory `${Trino_HOME}/plugin/exchange-hdfs` on all Trino cluster nodes.
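Step 2 can be sketched roughly as below. All names here are hypothetical: the real client JAR (for example Alluxio's shaded client) and the Trino installation path depend on your deployment, and the copy has to be repeated on every cluster node.

```shell
# Sketch only: the paths and the JAR name are placeholders, not real artifacts.
TRINO_HOME="${TRINO_HOME:-/tmp/trino-demo}"   # assumption: demo install dir
mkdir -p "${TRINO_HOME}/plugin/exchange-hdfs"

# Stand-in for the real client JAR of your Hadoop-compatible file system.
touch alluxio-client.jar

# Drop the client JAR into the exchange-hdfs plugin directory; repeat this on
# all Trino cluster nodes, then restart the servers.
cp alluxio-client.jar "${TRINO_HOME}/plugin/exchange-hdfs/"
```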

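For concreteness, the `AbstractFileSystem` wiring from step 1 could, with Alluxio as the example file system, look roughly like the `core-site.xml` fragment below. The property names follow Alluxio's documented Hadoop integration; treat this as a sketch and consult the documentation of the file system you actually use.

```xml
<!-- Sketch of a core-site.xml fragment for Alluxio as an example
     Hadoop-compatible file system; other systems ship their own
     FileSystem/AbstractFileSystem implementation classes. -->
<configuration>
  <!-- Maps the alluxio:// URI scheme to the classic FileSystem API -->
  <property>
    <name>fs.alluxio.impl</name>
    <value>alluxio.hadoop.FileSystem</value>
  </property>
  <!-- Maps the alluxio:// URI scheme to the newer AbstractFileSystem API -->
  <property>
    <name>fs.AbstractFileSystem.alluxio.impl</name>
    <value>alluxio.hadoop.AlluxioFileSystem</value>
  </property>
</configuration>
```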

The following `exchange-manager.properties` configuration example specifies Alluxio
as the spooling storage destination.
Member


Suggested change
as the spooling storage destination.
as the spooling storage location.


```properties
exchange-manager.name=hdfs
exchange.base-directories=alluxio://alluxio-master:19998/exchange-spooling-directory
```
Member


Why is that not done via the Alluxio file system support instead of HDFS? We probably have to explain that.


@hqbhoho hqbhoho Feb 6, 2025


Thank you for the feedback. Since the HDFS client supports accessing other Hadoop-compatible file systems, I believe adding this config gives users more options. By modifying core-site.xml and adding the relevant client JARs, users can freely choose a Hadoop-compatible file system. Different file systems require different configurations and client JARs. Alluxio is only an example here.

Member


Fair enough, we should call that out in the written docs.

Contributor Author


I've made the suggested changes. Please let me know if there's anything else.

@hqbhoho hqbhoho force-pushed the feature/improve_docs_for_exchange_hdfs branch from 27bb725 to 46c04bd on February 6, 2025 12:02
@hqbhoho hqbhoho requested a review from mosabua February 6, 2025 12:10

@mosabua mosabua left a comment


Looks good now.

@mosabua mosabua merged commit 1afa2b8 into trinodb:master Feb 7, 2025
8 checks passed
@github-actions github-actions bot added this to the 471 milestone Feb 7, 2025