Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQuery Storage Write API wrapper for Python SDK #25685

Merged
merged 15 commits into from
Mar 28, 2023

Conversation

ahmedabu98
Copy link
Contributor

@ahmedabu98 ahmedabu98 commented Mar 1, 2023

Implementing a wrapper for Python SDK that uses Java's Storage API SchemaTransform (created in #23988) to write to BigQuery

Some features added and adjusted along the way:

  • Added module plugin utilities that help when creating Python-using-Java cross-language tests that use a particular expansion service (ie GCP, kinesis, etc.).
  • Using these utilities, two new test-suites are added for Python-using-Java GCP cross-language tests (one for Dataflow, one for Direct).
  • Adjusted Python SchemaTransform API for convenience when creating wrappers (continued in Use external config schema to construct Python SchemaTransform payload #26100)

Fixes #21961
Fixes #25669

@ahmedabu98 ahmedabu98 marked this pull request as ready for review March 1, 2023 23:32
@ahmedabu98
Copy link
Contributor Author

R: @chamikaramj

@github-actions
Copy link
Contributor

github-actions bot commented Mar 1, 2023

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@chamikaramj chamikaramj self-requested a review March 1, 2023 23:49
@codecov
Copy link

codecov bot commented Mar 2, 2023

Codecov Report

Merging #25685 (0999c68) into master (94e72e3) will decrease coverage by 0.02%.
The diff coverage is 65.09%.

@@            Coverage Diff             @@
##           master   #25685      +/-   ##
==========================================
- Coverage   71.43%   71.42%   -0.02%     
==========================================
  Files         779      779              
  Lines      102530   102629      +99     
==========================================
+ Hits        73245    73305      +60     
- Misses      27824    27864      +40     
+ Partials     1461     1460       -1     
Impacted Files Coverage Δ
sdks/python/apache_beam/io/gcp/bigquery.py 69.81% <31.91%> (-1.60%) ⬇️
sdks/python/apache_beam/io/gcp/bigquery_tools.py 74.61% <91.48%> (+1.32%) ⬆️
sdks/python/apache_beam/transforms/external.py 80.51% <91.66%> (+1.77%) ⬆️

... and 14 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@ahmedabu98 ahmedabu98 force-pushed the storage_write_python_wrapper branch from 7edafd1 to 4b476ad Compare March 10, 2023 22:39
@ahmedabu98 ahmedabu98 closed this Mar 13, 2023
@ahmedabu98 ahmedabu98 reopened this Mar 13, 2023
@ahmedabu98
Copy link
Contributor Author

ahmedabu98 commented Mar 13, 2023

Seems like tests couldn't find the expansion service ?

Yeah it's only seeing the default value (see debug println) that I set here. The setup task overwrites that value and launches an expansion service as expected, but the python task doesn't read the updated value. I'm not sure how scoping works between these tasks.

@ahmedabu98 ahmedabu98 force-pushed the storage_write_python_wrapper branch from 34357d2 to 300dbbf Compare March 14, 2023 23:00
@ahmedabu98
Copy link
Contributor Author

R: @chamikaramj

Tests are passing now, ready for another review.

Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Almost there :)

@ahmedabu98 ahmedabu98 requested a review from chamikaramj March 23, 2023 17:57
Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

@ahmedabu98 ahmedabu98 force-pushed the storage_write_python_wrapper branch from 9d3c870 to 52a6c4a Compare March 27, 2023 14:32
@ahmedabu98
Copy link
Contributor Author

Run Python_Xlang_Gcp_Direct PostCommit

@ahmedabu98
Copy link
Contributor Author

Run Python_Xlang_Gcp_Dataflow PostCommit

Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants