Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Struct property is not selected in Presto #7652

Closed
ErikJansenIRefact opened this issue Jul 24, 2020 · 1 comment · Fixed by #8422
Closed

Struct property is not selected in Presto #7652

ErikJansenIRefact opened this issue Jul 24, 2020 · 1 comment · Fixed by #8422
Labels
area/sql Pulsar SQL related features type/bug The PR fixed a bug or issue reported a bug

Comments

@ErikJansenIRefact
Copy link

Expected behavior

A struct property in a class should be retrieved in Presto when the topic is queried. Preferably as a nested object or otherwise as flattened properties.

Actual behavior

The property isn't selected at all.

Steps to reproduce

Create a class with a property of type LocalDate.
Add messages to a topic with JSON schema.
Select the topic in Presto.
It is expected that the property classified as LocalDate is returned (as a nested object).

When reading the message with Pulsar the message does contain the LocalDate property as an object.

Message id: 2535:2:-1:0
Message key: 1
Message value: {"orderNbr":1,"customerNbr":1,"orderStatus":"O","totalPrice":173665.47,"orderDate":{"year":1996,"month":"JANUARY","chronology":{"calendarType":"iso8601","id":"ISO"},"era":"CE","dayOfYear":2,"dayOfWeek":"TUESDAY","leapYear":true,"monthValue":1,"dayOfMonth":2},"orderPriority":"5-LOW","clerkNbr":"Clerk#000000951","shipPriority":0,"comment":"nstructions sleep furiously among "}
Message publish time: 1595580635827
Message event time: 1595580635773

When we select in Presto this is the result:

select * from pulsar."public/default"."order-json-key-date" where ordernbr=1;
 ordernbr | customernbr | orderstatus | totalprice | orderpriority |    clerknbr     | shippriority |              comment               | __partition__ |     __event_time__      |    __publish_time__     | __message_id__ | __seque
----------+-------------+-------------+------------+---------------+-----------------+--------------+------------------------------------+---------------+-------------------------+-------------------------+----------------+--------
        1 |           1 | O           |  173665.47 | 5-LOW         | Clerk#000000951 |            0 | nstructions sleep furiously among  |            -1 | 2020-07-24 10:50:35.773 | 2020-07-24 10:50:35.827 | (2535,2,0)     |        
(1 row)

The property "orderDate" is missing.

System configuration

Pulsar version: 2.6.0

@ErikJansenIRefact
Copy link
Author

ErikJansenIRefact commented Jul 24, 2020

Enclosed the code to reproduce.

Archive.zip

@sijie sijie added triage/week-31 area/sql Pulsar SQL related features type/bug The PR fixed a bug or issue reported a bug labels Jul 27, 2020
jiazhai pushed a commit that referenced this issue Feb 1, 2021
Fixes #4747 
Fixes #7652 

### Motivation

PIP-71: https://github.com/apache/pulsar/wiki/PIP-71:-Pulsar-SQL-migrate-SchemaHandle-to-presto-decoder

**Pip-Doc** : [[PIP-71][SQL]Migrate SchemaHandle to Presto-decoder](https://docs.google.com/document/d/1KwG0GoHccju4-QNPfvT6tOwhp5Fvs6-iZlfLooPxTDM/edit?usp=sharing)

In current version , pulsar-presto deserialize fields  rely on SchemaHandler , but this causes the following restrictions :

- **Metadata**: current nested field is dissociate with presto ParameterizedType , It treated nested field as a separated field , so  presto compiler can’t understand the type hierarchy . nested field should be Row type in presto (e.g.  Hive struct type support) . In the same way,array \ map type also shoud associate with presto ParameterizedTypes.
- **Decoder** : SchemaHandler is hard to work with  `RecordCursor.getObject()` to support ROW,MAP,ARRAY .etc

The **motivations** of this pull request :
-  ` PulsarMetadata` take advantage of `ParameterizedType`  to describe `row/array/map` Type instead of resolve nested columns in pulsar-presto connecter.
- Customize `RowDecoder | RowDecoderFactory | ColumnDecoder` to work with pulsar interface, and with some our own extensions  compare to presto original version , we can support more type for backward compatible (e.g. 
 ` TIMESTAMP\DATE\TIME\Real\ARRAY\MAP\ROW ` support).
- Decouple avro or schema type with `pulsar-presto main module` (RecordSet,ConnectorMetadata .etc ), aim to friendly with other schema type ( [ProtobufNative](#8372)  、thrift etc..).

### Modifications

Describe in [PIP-71: Pulsar SQL migrate SchemaHandle to presto decoder](https://docs.google.com/document/d/1KwG0GoHccju4-QNPfvT6tOwhp5Fvs6-iZlfLooPxTDM/edit?usp=sharing) 

----

### Does this pull request potentially affect one of the following parts:

*If `yes` was chosen, please highlight the changes*

  - Dependencies (does it add or upgrade a dependency): (**yes** )
  - The public API: (no)
  - The schema: ( no)
  - The default values of configurations: (no)
  - The wire protocol: (no)
  - The rest endpoints: (no)
  - The admin cli options: (no)
  - Anything that affects deployment: (no)

### Documentation

  - Does this pull request introduce a new feature? (yes)
  [[PIP][SQL]Migrate SchemaHandle to Presto-decoder](https://docs.google.com/document/d/1KwG0GoHccju4-QNPfvT6tOwhp5Fvs6-iZlfLooPxTDM/edit?usp=sharing)

* codeStyle fix

* Update pulsar-sql/presto-pulsar/src/test/java/org/apache/pulsar/sql/presto/TestPulsarConnector.java

Co-authored-by: ran <[email protected]>

* Update pulsar-sql/presto-pulsar/src/test/java/org/apache/pulsar/sql/presto/TestPulsarConnector.java

Co-authored-by: ran <[email protected]>

* Update pulsar-sql/presto-pulsar/src/test/java/org/apache/pulsar/sql/presto/TestPulsarConnector.java

Co-authored-by: ran <[email protected]>

* add keyValue\Primitive schema test && add schema cyclic definition detect

* merge master

* merge master

Co-authored-by: wangguowei <[email protected]>
Co-authored-by: ran <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/sql Pulsar SQL related features type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants