[Bug]: Configuration row arguments may get misplaced between Python SchemaTransformPayload encoding and Java RowCoder decoding #25669
Comments
I followed this unit test and tried encoding my configuration with SchemaTransformPayloadBuilder() and decoding it with proto_utils.parse_Bytes(). It returned a configuration with fields in the right order, so I'm not sure where the mismatch between how Python and Java handle the payload comes from.
Update: we found that this problem was due to line 342 in Schema.java (beam/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java, lines 333 to 345 in 3a1b641).
Context: TypedSchemaTransformProvider uses a class to represent the configuration of the SchemaTransform. The configuration schema is inferred from its class using
jdbc.py has a similar issue, but that is not addressed here. Transforms in jdbc.py need to preserve ordering because they use a config object [1] along with the (old) SchemaIO API. This config object is encoded on the Python side and decoded on the Java side, so the ordering of the objects on the two sides has to match. One way to fix this would be to update jdbc.py to use schema-aware transforms (which will pick up the fix provided in this PR). [1] beam/sdks/python/apache_beam/io/jdbc.py, line 339 in e868c8d
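The ordering requirement described here (values encoded on one side, decoded on the other) can be illustrated with a minimal, self-contained sketch. It does not use Beam's actual coders: `encode_positional` and `decode_positional` are hypothetical stand-ins for the Python-side encoding and the Java-side decoding, the receiver's alphabetical field order is an assumption made purely for illustration, and `WRITE_APPEND` is a placeholder value.

```python
# Hypothetical stand-ins, not Beam APIs: values are written in sender order
# with no field names attached, and the receiver assigns them by position.

def encode_positional(**kwargs):
    """Keep only the values, in the order the kwargs were passed."""
    return list(kwargs.values())


def decode_positional(values, field_names):
    """Pair each value with the receiver's field name at the same index."""
    return dict(zip(field_names, values))


# Assumption for illustration: the receiver orders its fields alphabetically.
receiver_fields = sorted(["table", "writeDisposition"])

# Sender order happens to match the receiver's field order.
ok = decode_positional(
    encode_positional(
        table="my_project:my_dataset.xlang_table",
        writeDisposition="WRITE_APPEND",
    ),
    receiver_fields,
)

# Same arguments, passed in the opposite order.
swapped = decode_positional(
    encode_positional(
        writeDisposition="WRITE_APPEND",
        table="my_project:my_dataset.xlang_table",
    ),
    receiver_fields,
)

print(ok)
print(swapped)
```

In the second call the table name ends up under writeDisposition, which matches the symptom reported in this issue. Matching values by field name, or agreeing on a single field order on both sides, would remove the dependence on kwargs order.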
What happened?
I was testing a SchemaTransform Python wrapper (#25521) and found that the kwargs had to be passed in a specific order for the input arguments to reach the right fields of the Java transform. This is unexpected because the ordering of kwargs should have no impact.
For example, where self._table="my_project:my_dataset.xlang_table", the following works fine:
and I get a configuration object in Java transform that looks like this:
However, if I change the kwargs to look like this (switch places of table and writeDisposition):
I get the following configuration object. Notice the value intended for table is now in the writeDisposition field.
Issue Priority
Priority: 1 (data loss / total loss of function)
Issue Components