-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make AvroArrowArrayReader possible to scan Avro backed table which contains nested records #7525
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good to me, though I am not an expert. I think this PR needs a test so that we don't accidentally break the feature in the future during a refactor
Thank you for the contribution @sarutak |
@alamb Or, is it better to add the test data to |
Thanks @sarutak -- that makes sense.
I suggest we add the data to arrow-testing first. I would feel much more comfortable merging code into datafusion that is tested, not only to prevent regressions, but also as part of reviewing unfamiliar code, having at test demonstrating it working is a major part of evaluating its suitability. |
@alamb All right. I've opend a PR in After the test data is added, I'll modify this PR to add tests. |
This PR proposes to add an Avro format test data which contains nested records. This data is necessary for testing the change proposed in [this PR](apache/datafusion#7525). The schema of this test data is as follows. ``` { "name": "record1", "namespace": "ns1", "type": "record", "fields": [ { "name": "f1", "type": { "name": "record2", "namespace": "ns2", "type": "record", "fields": [ { "name": "f1_1", "type": "string" }, { "name": "f1_2", "type": "int" }, { "name": "f1_3", "type": { "name": "record3", "namespace": "ns3", "type": "record", "fields": [ { "name": "f1_3_1", "type": "double" } ] } } ] } }, { "name": "f2", "type": "array", "items": { "name": "record4", "namespace": "ns4", "type": "record", "fields": [ { "name": "f2_1", "type": "boolean" }, { "name": "f2_2", "type": "float" } ] } } ] } ``` And the JSON representation of the Avro format file is as follows. ``` {"f1":{"f1_1":"aaa","f1_2":10,"f1_3":{"f1_3_1":3.14}},"f2":[{"f2_1":true,"f2_2":1.2},{"f2_1":true,"f2_2":2.2}]} {"f1":{"f1_1":"bbb","f1_2":20,"f1_3":{"f1_3_1":3.14}},"f2":[{"f2_1":false,"f2_2":10.2}]} ```
Has been merged |
@alamb Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thank you @sarutak
Which issue does this PR close?
Closes #7524
Rationale for this change
This PR fixes an issue that I explained #7524.
What changes are included in this PR?
The causes are:
schema_lookup
considers thelookup
table only for root record. Child records have their ownlookup
table so they should be considered too.So, this change includes fixes for them.
Are these changes tested?
I prepared this Avro format file for test.
The schema of this file is as follows.
And the JSON representation of the Avro format file is as follows.
Using this data, I create a table and scan it.
The result seems as expected.
After this change is merged, I'll open a PR to add the the test data to arrow-testing. Then, I'll open a followup PR to add tests to avro.slt
Are there any user-facing changes?
No.