Pass conn ID to ObjectStoragePath via URI #35913

Merged 5 commits on Dec 1, 2023

uranusjr
Member

This enables an alternative ObjectStoragePath init syntax, using the auth section in the URI to supply conn ID instead of a separate keyword argument. The explicit keyword argument is honored if supplied.
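The precedence described above can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: `resolve_conn_id` is a hypothetical helper, and it assumes that "honored" means the explicit keyword argument wins over whatever the URI carries.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def resolve_conn_id(uri: str, conn_id: str | None = None) -> str | None:
    """Hypothetical helper: choose the connection ID for a storage URI."""
    # Assumption: an explicit keyword argument takes precedence over the URI.
    if conn_id is not None:
        return conn_id
    # Otherwise fall back to the userinfo ("auth") part of the URI, if present.
    return urlsplit(uri).username


# Conn ID taken from the URI's auth section.
print(resolve_conn_id("s3://my-connection@bucket/order_data"))
# No auth section: nothing to resolve, so the caller would use a default.
print(resolve_conn_id("s3://bucket/order_data"))
# The explicit keyword argument overrides the URI.
print(resolve_conn_id("s3://my-connection@bucket/order_data", conn_id="other"))
```

What happens when neither source supplies a conn ID is left to the caller here; the PR description does not specify it.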

@bolkedebruin
Contributor

I like being able to use a URI, but what did you base the format on?

@uranusjr
Member Author

From AIP-48 (Data-aware scheduling):

  • s3://my-connection@bucket/order_data - using a specific connection ID for credentials
  • s3://bucket/order_data - using the default connection ID for this type. (Note: hooks don't currently have a concept of default connection IDs – that's a property of operators, so this information would be part of the new Dataset feature being added to providers.)

I don’t think this has any special semantics in Dataset scheduling currently, but having the connection ID in the auth part of the URI seems reasonable since Connection does hold auth information.

@bolkedebruin
Contributor

bolkedebruin commented Nov 30, 2023

Can you add this to the documentation of object storage (how to use it)? Then lgtm.

@uranusjr
Member Author

I added a paragraph in the tutorial to explain the usage.

Member

@hussein-awala hussein-awala left a comment

It looks good! I hope we will not have more complicated needs in the future that conflict with passing the connection ID in the user part.

While unlikely, it is *possible* for CPython to change how username and
password are parsed from netloc. To ensure we always do the right thing,
it is better to just implement the logic ourselves so it always works as
expected. We need this logic to get the host info anyway, so it is not
possible to solely rely on CPython's parser.
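A rough sketch of what splitting the netloc by hand might look like is below. `split_netloc` is a hypothetical name, and the choice to split on the last "@" is an assumption made for illustration; the actual change in this PR may differ in detail.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def split_netloc(uri: str) -> tuple[str | None, str]:
    """Hypothetical helper: return (userinfo, host) from a URI's netloc."""
    # Only the raw netloc comes from CPython; the username/password
    # attributes of the parse result are deliberately not used.
    netloc = urlsplit(uri).netloc
    # Assumption: everything before the last "@" is userinfo, the rest is host.
    userinfo, sep, host = netloc.rpartition("@")
    # rpartition returns an empty separator when "@" is absent.
    return (userinfo if sep else None), host


print(split_netloc("s3://my-connection@bucket/order_data"))
print(split_netloc("s3://bucket/order_data"))
```

Because both the userinfo and the host come out of the same split, the host information mentioned in the commit message is available without a second parsing step.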
@uranusjr
Member Author

uranusjr commented Nov 30, 2023

Anything that would go into the URI’s auth section should go into Connection instead, since we don’t want users to put real credentials in a DAG under any circumstances. One example is HTTP; while we could technically allow putting auth here, we should not. Whatever the need, the actual auth should go into Connection and be referenced via the ID in the path.

@bolkedebruin
Contributor

Were you planning on making it clearer to the user when it fails (i.e., when a username and password have probably been provided)?

@uranusjr
Member Author

uranusjr commented Dec 1, 2023

I do, but we don’t currently support any fs that logically takes auth from the URI, so I figure it’s better to split that into a separate changeset. Also I’m not quite sure when that error should be implemented yet.

@bolkedebruin bolkedebruin merged commit ab87cd0 into apache:main Dec 1, 2023
@ephraimbuddy ephraimbuddy added changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) AIP-58 labels Dec 5, 2023
@ephraimbuddy ephraimbuddy added this to the Airflow 2.8.0 milestone Dec 5, 2023
ephraimbuddy pushed a commit that referenced this pull request Dec 5, 2023
(cherry picked from commit ab87cd0)
bolkedebruin pushed a commit to bolkedebruin/airflow that referenced this pull request Dec 8, 2023