Implement BigQuery Table Schema Update Operator #15367
The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.
@marcosmarxm How important would you say it is to move the functionality into a hook? Solving the use case was easy with two existing hooks, and I don't think I need to create an additional hook, given that I don't implement any new functionality in how we talk to BigQuery.
Further, I think mutating a mutable object shouldn't be a problem in this case, and I am not too keen to deepcopy it for the sake of it.
Lastly, on the naming of the schema_fields parameter: I wanted the name and input format of that field to be consistent between this and other operators. Do you think it's a problem? Changing it to updates_to_schema_fields or similar is an easy possibility.
The CI broke because of pylint. Can you run pre-commit locally to organize imports?
The Workflow run is cancelling this PR. Building images for the PR has failed. Follow the workflow link to check the reason.
Very nice!
Thanks for the thorough reviews @marcosmarxm and @tswast. Great job @thejens!
With this change we implement a new operator that handles patching of table schemas in BigQuery.
This is needed because typing out an entire schema data structure just to set, for example, a description on a single field involves a lot of overhead. Also, the schema is often unknown or very complex, as it may be the result of a query or may have been inferred automatically when importing files as tables.
This operator is useful for a workflow like:
Upstream: Create a BigQuery table as the output of a query or import operator. The writer of the job or operator knows the names of the fields, and perhaps the types, but not necessarily how the rest of the schema is defined.
Downstream (this operator): Supply a partial schema definition that contains only field names and description values, which are patched onto the BigQuery-generated schema from upstream.
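The patching step described above can be illustrated with a small, standalone sketch. This is not the code from this PR: the helper name patch_schema_fields and the sample schemas are hypothetical, and the sketch only assumes that a schema is a list of dicts keyed by "name", with nested RECORD columns stored under "fields". It also copies the input rather than mutating it, which is a choice made for the example only.

```python
from copy import deepcopy
from typing import Any, Dict, List

SchemaField = Dict[str, Any]


def patch_schema_fields(
    existing: List[SchemaField], updates: List[SchemaField]
) -> List[SchemaField]:
    """Return a copy of `existing` with `updates` merged in, matched by field name.

    Hypothetical helper for illustration; not the operator's implementation.
    """
    patched = deepcopy(existing)  # leave the caller's schema untouched
    by_name = {field["name"]: field for field in patched}
    for update in updates:
        name = update["name"]
        if name not in by_name:
            raise ValueError(f"Field {name!r} not found in existing schema")
        target = by_name[name]
        for key, value in update.items():
            if key == "fields":
                # Nested RECORD columns are patched recursively.
                target["fields"] = patch_schema_fields(target.get("fields", []), value)
            elif key != "name":
                target[key] = value
    return patched


# Schema as it might come back from a table created upstream (made-up sample).
existing_schema = [
    {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
    {"name": "address", "type": "RECORD", "mode": "NULLABLE", "fields": [
        {"name": "city", "type": "STRING", "mode": "NULLABLE"},
    ]},
]
# Partial definition: only names plus the values to patch.
partial_update = [
    {"name": "id", "description": "Primary key"},
    {"name": "address", "fields": [{"name": "city", "description": "City name"}]},
]
patched_schema = patch_schema_fields(existing_schema, partial_update)
```

Only the keys present in the partial definition change; types and modes from the upstream schema are carried over as they were.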