Make databricks sql hook return a namedtuple #36205
Conversation
Looks good
@Joffreybvn -> do you have more things to add, or do you think it's ready for review?
We need to address #36205 (comment)
Given all the discussions around this earlier, I'm a bit picky on the naming. We are not making something JSON serializable - but according to the earlier discussion we make the SQL hook return the specified NamedTuple format.
I wonder btw if we should not have a typedef for it if this is part of the DbAPI. That would make things conceptually stronger and less ambiguous.
If you insist on naming it something serializable, please leave out JSON. It has nothing to do with it.
Can we also be clearer in the commit message about what this PR wants to achieve? The title and message are currently confusing.
My suggestion:
Return SerializableRow in databricks sql hook
The ADR for Airflow's DB API specifies it needs to return a
named tuple SerializableRow or a list of them. The databricks
hook by default returns a dict which we now transform to a
SerializableRow.
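For illustration - purely a sketch of the typedef idea, with names and placement that are assumptions rather than actual Airflow code - since column names are only known at query time, the closest practical typedef in common.sql is a shared namedtuple factory:

```python
from collections import namedtuple

# Hypothetical sketch - not actual common.sql code. One shared factory
# means the "row contract" every DbApiHook must honour is defined in
# exactly one place.
def serializable_row(column_names: list[str]):
    """Build a namedtuple class for one result set's columns.

    rename=True substitutes positional names (_0, _1, ...) for column
    names that are not valid Python identifiers, e.g. "count(*)".
    """
    return namedtuple("SerializableRow", column_names, rename=True)

# Usage with any DB-API 2.0 cursor (illustrative):
#   Row = serializable_row([col[0] for col in cursor.description])
#   rows = [Row(*raw) for raw in cursor.fetchall()]
```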
I like the way it is now.
@bolkedebruin @phanikumv -> please take a look and see what you think. I am all for this direction
And I created #36224 and assigned it to myself - I will shortly take a close look at this, to make sure the common.sql API is not only defined but also enforced by the pre-commit.
The databricks provider was already released with the breaking change -> 5.0.1. Maybe this version should be removed / replaced by a new "5.0.1" one?
Nope. We can at most yank it. There is no way to replace an existing released version. Luckily it has not yet been in the constraints of any Airflow version, so we can do it without any serious consequences. Let me actually yank it now.
Yanked
Also cc: @ephraimbuddy -> since we yanked 5.0.1, I will rerun the last build for v2-8-stable to refresh constraints and move the tag once done (currently 5.0.1 was in the constraints planned for the 2.8.0rc3 release). Job running here: https://github.com/apache/airflow/actions/runs/7199187385
Constraints updated with databricks 5.0.0 in 3312193, and the tag for 2.8.0rc3 moved @ephraimbuddy
@bolkedebruin For a long-term solution, would it make sense to have a way for a provider to register a custom serializer/deserializer for objects its operators might produce? We could extend this logic to include providers.
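As a rough illustration of that idea - entirely hypothetical, since Airflow exposes no such API today and none of these names exist - a provider-level registry could look something like this:

```python
from typing import Any, Callable

# Hypothetical sketch of a provider-registered serializer mechanism.
_SERIALIZERS: dict[type, tuple[Callable[[Any], dict], Callable[[dict], Any]]] = {}

def register_serializer(
    cls: type,
    serialize: Callable[[Any], dict],
    deserialize: Callable[[dict], Any],
) -> None:
    """Let a provider declare how to (de)serialize its custom result types."""
    _SERIALIZERS[cls] = (serialize, deserialize)

def serialize_value(value: Any) -> Any:
    """Serialize with a registered handler, or pass the value through."""
    for cls, (serialize, _deserialize) in _SERIALIZERS.items():
        if isinstance(value, cls):
            return serialize(value)
    return value
```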
@@ -167,7 +195,7 @@ def run(
         handler: Callable[[Any], T] = ...,
         split_statements: bool = ...,
         return_last: bool = ...,
-    ) -> T | list[T]:
+    ) -> tuple | list[tuple] | list[list[tuple] | tuple] | None:
Mypy was not happy with -> tuple | list[tuple] | list[list[tuple] | tuple]
as output - which is correct. I got error: "Overloaded function signatures 1 and 2 overlap with incompatible return types".
Following the mypy docs, I added a None
for it to stop complaining. But that's not correct: this method cannot return None when a handler is provided...
Implementing definitive typing is out of the scope of this PR, and I admit mypy beat me on that case. Thus I left that for #36224 - but I'm willing to help on that issue too, of course.
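A stripped-down reproduction of the overlap (illustrative names, not the actual hook code): a call like `run("select 1")` matches both overloads because `handler` has a default, and `None` vs. `tuple | list[tuple]` are incompatible return types; appending `| None` to the second overload silences mypy at the cost of a wider-than-real type:

```python
from typing import Any, Callable, TypeVar, overload

T = TypeVar("T")

@overload
def run(sql: str) -> None: ...  # no handler: nothing comes back

# mypy flags this pair: run("select 1") matches both overloads (handler
# has a default), yet None and tuple | list[tuple] are incompatible ->
# "Overloaded function signatures 1 and 2 overlap with incompatible
# return types". Appending `| None` to this return type silences it.
@overload
def run(sql: str, handler: Callable[[Any], T] = ...) -> tuple | list[tuple]: ...

def run(sql: str, handler: Callable[[Any], Any] | None = None) -> Any:
    ...
```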
To be honest, I found it quite interesting to be in a controversial PR. So, totally fine for me :) Current status:
Much, much better! I understand what this is about now. The follow-up would make things even clearer, so it's absolutely appreciated if you want to help out there.
Hey @bolkedebruin -> next time when you merge a PR, can you please make sure you keep "squash and merge" and do not remove the number? This one has been merged without it and has no (#36205) in the name - which made it disappear from our release notes generation. Luckily I noticed it accidentally.
Sorry @potiuk, I did actually squash and merge, but I adjusted the commit message to cover what was done. That removed the ref. I'll be more careful.
No worries :). Luckily I caught it.
This is a fix proposal for #36161 (review)
The ADR for Airflow's DB API specifies that DbApiHooks need to return a tuple / list of tuples. The databricks hook by default returns a dict-like databricks.Row object, which we now transform into a new namedtuple - compliant with the 'common data structure' specification.
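Conceptually, the conversion boils down to the sketch below (simplified; the actual hook code in this PR is structured differently):

```python
from collections import namedtuple

def fetch_all_as_namedtuples(cursor):
    """Fetch all rows from a DB-API cursor as plain namedtuples.

    Plain namedtuples serialize cleanly (e.g. for XCom), unlike
    driver-specific Row objects.
    """
    fields = [col[0] for col in cursor.description]
    Row = namedtuple("Row", fields, rename=True)
    return [Row(*raw) for raw in cursor.fetchall()]
```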