-
Notifications
You must be signed in to change notification settings - Fork 14.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add request-reply operator to Microsoft Azure provider #44675
base: main
Are you sure you want to change the base?
Conversation
4ca3d16
to
e890eb6
Compare
admin_hook = AdminClientHook(azure_service_bus_conn_id=self.azure_service_bus_conn_id) | ||
self._create_reply_subscription_for_correlation_id(admin_hook, context) | ||
|
||
message_hook = MessageHook(azure_service_bus_conn_id=self.azure_service_bus_conn_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to refactor this one into a message_hook() cached property method:
@cached_property
def message_hook() -> MessageHook:
return MessageHook(azure_service_bus_conn_id=self.azure_service_bus_conn_id)
# Remove the subscription on the reply topic | ||
self._remove_reply_subscription(admin_hook) | ||
|
||
def _send_request_message(self, message_hook: MessageHook, context: Context) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All those protected methods which have the MessageHook or AdminClientHook as parameter should become a public method of the corresponding hook. Most of the logic should always be located in the hooks, not in the operator, see the operator as some kind of facilitator within DAG's which delegates the heavy lifting to the hooks, so that the hooks on their turn can also be easily (re)used within PythonOperators without the need to rewrite the same logic as in the operators.
So in this case _send_request_message method should be part of MessageHook.
def execute(self, context: Context) -> None: | ||
"""Implement the request-reply pattern using existing hooks.""" | ||
self._validate_params() | ||
admin_hook = AdminClientHook(azure_service_bus_conn_id=self.azure_service_bus_conn_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put this also in a admin_hook() cached property method:
@cached_property
def admin_hook() -> AdminClientHook:
return AdminClientHook(azure_service_bus_conn_id=self.azure_service_bus_conn_id)
self._remove_reply_subscription(admin_hook) | ||
|
||
def _send_request_message(self, message_hook: MessageHook, context: Context) -> None: | ||
with message_hook.get_conn() as service_bus_client: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After above remarks, this could then become:
with self.message_hook.get_conn() as service_bus_client:
"""Implement the request-reply pattern using existing hooks.""" | ||
self._validate_params() | ||
admin_hook = AdminClientHook(azure_service_bus_conn_id=self.azure_service_bus_conn_id) | ||
self._create_reply_subscription_for_correlation_id(admin_hook, context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After above remark, this could then become a oneliner:
self._create_reply_subscription_for_correlation_id(self.admin_hook, context)
"Created subscription %s on topic %s", self.subscription_name, self.reply_topic_name | ||
) | ||
|
||
def _create_subscription(self, admin_asb_conn: ServiceBusAdministrationClient, context: Context): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_create_subscription method should be part of AdminHook
"Sent request with id %s to queue %s", self.reply_correlation_id, self.request_queue_name | ||
) | ||
|
||
def _create_reply_subscription_for_correlation_id( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_create_reply_subscription_for_correlation_id method should be part of AdminClientHook
auto_delete_on_idle="PT6H", # 6 hours | ||
) | ||
|
||
def _remove_reply_subscription(self, admin_hook: AdminClientHook) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_remove_reply_subscription method should be part of AdminClientHook
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work (y)
Thank you for the review. |
Just want to check, right now several of the existing operators get a handle to the connection and call the Microsoft Azure library directly. These should really be refactored down into the hook as well, right? For example this: https://github.com/apache/airflow/blob/main/providers/src/airflow/providers/microsoft/azure/operators/asb.py#L316-L339 |
Good catch, indeed it would be better that this code resides within the hook, the hook should take care of the connection handling and exposes the logic within a public method which on it's turn is called from the operator, that way the same operation can also be executed from the hook within a PythonOperator. But don't worry, there are still a lot of operators written that way unfortunately, but if we clean up every time we need to modify an operator, we will get there one day :-) |
When I was working on this new operator I thought about moving some of those down into the hooks so I could reuse. I'll go ahead and put in a PR to address that and then update this PR to use those. It will reduce duplication and be a good thing. I'll try and do it this afternoon...we'll see, I have day-job work to do suddenly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please clarify the value of this operator in a wide context. In your prespective what % of the users of azure provider need this operator?
This PR contains alot of code. It somewhat feels out of scope for the operators we serve (but I don't know azure so I can't say for sure). I am not very comfortable with accepting complex logic operators for providers that are owned by us. Unlike GCP and AWS where they share responsibility for maintaining the provider code, Azure are not involved. So I have to ask the question - do we want this code at Airflow or maybe a better place for it is in a 3rd party provider that you can roll on your own public repo?
Addison-Wesley, 2003: https://www.enterpriseintegrationpatterns.com/patterns/messaging/RequestReply.html | ||
|
||
Steps are: | ||
1. Generate a unique ID for the message. The subscription needs to exist before the request message is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of what you write here needs to be in the docs not in the class doc string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems fair. I'm not sure where to move it to though. I don't see documentation outside of the docstrings. Sorry if I'm missing something obvious....not really a python dev.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you mean under the top-level docs folder. It never occurred to me that the providers would be documented outside the top-level providers folder. I'm happy to move things over. I'll try to get to that later this week.
It seems to me if a message is sent from an airflow DAG then the DAG author probably wants a message back at some point to confirm completion, check for errors, et cetera. To the best of my knowledge, the logic in this PR implements the standard design pattern for doing that. Also, after the refactors that dabla requested this will be much smaller and the hooks will be more useful. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
e890eb6
to
ab34641
Compare
Need to rewrite to use a dedicated response queue because there is a race condition between adding the subscription and modifying the filter. The alternatives are to discard any messages before sending the request or to fix the python SDK for ASB but seems simpler to use a queue. |
63d6301
to
972d753
Compare
c17d00f
to
1176c3f
Compare
1176c3f
to
733992a
Compare
I moved the ability to set message headers, reply-to and message-id into another PR #47522 |
733992a
to
351ff51
Compare
…properties) on the message
351ff51
to
574c47b
Compare
This PR adds a request-reply operator to implement the design pattern from Enterprise Integration Patterns, Hohpe, Woolf,
Addison-Wesley, 2003
In particular, this means one could:
a) Create a service bus queue and topic for a batch process
b) set up an auto-scaling Azure Container Job listening to an Azure Service Bus queue for messages
c) create a DAG using the request-reply operator to start the Azure Container Job and capture the reply when it finishes.
Potential improvements:
A working Azure Container App Job can be built using the scripts/event-job-aca.zsh in the repo https://github.com/perry2of5/http-file-rtrvr
A working DAG is provided below: