Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎉 Source Postgres: support all Postgres 14 types #8726

Merged
merged 16 commits into from
Dec 20, 2021

Conversation

tuliren
Copy link
Contributor

@tuliren tuliren commented Dec 12, 2021

What

How

  • Postgres does not have a public type enum. So the MySQL approach won't work for Postgres.
  • We use JDBCType for most of the types. For corner cases, we check the column type name returned from JDBC, and treat them differently.

Recommended reading order

  1. PostgresSourceDatatypeTest.java
  2. PostgresSourceOperations.java

🚨 User Impact 🚨

  • This PR changes some of the Postgres data types. It is published with a minor version bump.

Pre-merge Checklist

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the new connector version is published, connector version bumped in the seed directory as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

@github-actions github-actions bot added the area/connectors Connector related issues label Dec 12, 2021
@tuliren tuliren temporarily deployed to more-secrets December 12, 2021 06:58 Inactive
@tuliren tuliren force-pushed the liren/postgres-types-v2 branch from c64f2b4 to 5266017 Compare December 12, 2021 18:37
@tuliren tuliren temporarily deployed to more-secrets December 12, 2021 18:39 Inactive
@tuliren tuliren force-pushed the liren/postgres-types-v2 branch from 5266017 to 2fb2000 Compare December 13, 2021 00:31
@tuliren tuliren temporarily deployed to more-secrets December 13, 2021 00:33 Inactive
@tuliren tuliren linked an issue Dec 13, 2021 that may be closed by this pull request
@tuliren tuliren temporarily deployed to more-secrets December 13, 2021 06:35 Inactive
@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Dec 13, 2021
@tuliren tuliren temporarily deployed to more-secrets December 13, 2021 07:07 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets December 13, 2021 07:09 Inactive
@tuliren
Copy link
Contributor Author

tuliren commented Dec 13, 2021

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1571824409
❌ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1571824409
🐛 https://gradle.com/s/ikzuaoro3mups

@tuliren tuliren marked this pull request as ready for review December 13, 2021 07:42
@tuliren tuliren requested a review from etsybaev December 13, 2021 07:42
@tuliren tuliren temporarily deployed to more-secrets December 13, 2021 07:43 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets December 13, 2021 07:44 Inactive
@tuliren tuliren temporarily deployed to more-secrets December 13, 2021 07:53 Inactive
} else if (value.equalsIgnoreCase("-infinity")) {
node.put(columnName, Double.NEGATIVE_INFINITY);
} else if (value.equalsIgnoreCase("nan")) {
node.put(columnName, Double.NaN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does json have a concept of infinity and Nan? how are these represented when output from the connector itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Actually Json numbers do not support NaN or infinity. To support these three special values, we need to make sure that the destination can handle them. Otherwise, the destination will fail. Too bad, I need to revert this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like how we currently handle dates, you can output them in a string type with a format hint.

Then have normalization handle the special format hint for these special numbers accordingly for the destination

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChristopheDuong, nice.

Where the code that handles this? I'd like to see what the format hint looks like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the json, it's "string" type.

Then, on normalization, if we find a format hint, we know it's actually a date not a string:

and (definition["format"] == "date" or "date" in definition["format"])

Copy link
Contributor

@ChristopheDuong ChristopheDuong Dec 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example of catalog.json with date:

so you can have a special string-float type instead of using json's number in order to encode special float values:

"db_float_column": {
  "type": "string",
  "format": "float"
}

and normalization can handle it in SQL: For example in bigquery, based on https://stackoverflow.com/a/53692265:

there is no literal representation of NaN or infinity, but the following case-insensitive strings can be explicitly cast to float:

"NaN"
"inf" or "+inf"
"-inf"

case 
  when db_float_column = '-infinity' then cast("-inf" as float) 
  when db_float_column = '+infinity' then cast("+inf" as float)
  else cast(db_float_coumn as float) end

or differently for another destination

…ns with JDBCType.ARRAY (#8749)

* support array for jdbc sources

* fixed PR comments, added test cases

* added more elements for test case

* Fix test case

* add array test case for JdbcSourceOperations

Co-authored-by: Liren Tu <[email protected]>
@yurii-bidiuk yurii-bidiuk temporarily deployed to more-secrets December 16, 2021 09:47 Inactive
@tuliren tuliren temporarily deployed to more-secrets December 17, 2021 23:42 Inactive
@tuliren tuliren temporarily deployed to more-secrets December 20, 2021 00:26 Inactive
@tuliren tuliren temporarily deployed to more-secrets December 20, 2021 01:04 Inactive
@tuliren
Copy link
Contributor Author

tuliren commented Dec 20, 2021

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1599988085
❌ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1599988085
🐛 https://gradle.com/s/qcegtyclrphw2

@jrhizor jrhizor temporarily deployed to more-secrets December 20, 2021 01:10 Inactive
@tuliren tuliren temporarily deployed to more-secrets December 20, 2021 03:48 Inactive
@tuliren
Copy link
Contributor Author

tuliren commented Dec 20, 2021

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1600389644
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1600389644
No Python unittests run

@tuliren tuliren temporarily deployed to more-secrets December 20, 2021 03:53 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets December 20, 2021 03:55 Inactive
@tuliren
Copy link
Contributor Author

tuliren commented Dec 20, 2021

/publish connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1600584483
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1600584483

@jrhizor jrhizor temporarily deployed to more-secrets December 20, 2021 05:18 Inactive
@tuliren tuliren temporarily deployed to more-secrets December 20, 2021 07:36 Inactive
@tuliren tuliren merged commit ff4b83b into master Dec 20, 2021
@tuliren tuliren deleted the liren/postgres-types-v2 branch December 20, 2021 07:59
schlattk pushed a commit to schlattk/airbyte that referenced this pull request Jan 4, 2022
* Add skeleton to support all postgres types

* Consolidate type tests

* Fix corner cases

* Bump postgres version

* Add tests for time and timetz

* Format code

* Revert date to timestamp

* Update comment

* Fix unit tests

* 🐛 Jdbc sources: switch from "string" to "array" schema type for columns with JDBCType.ARRAY (airbytehq#8749)

* support array for jdbc sources

* fixed PR comments, added test cases

* added more elements for test case

* Fix test case

* add array test case for JdbcSourceOperations

Co-authored-by: Liren Tu <[email protected]>

* Revert changes to support special number values

Postgres source cannot handle these special values yet
See https://github.com/airbytehq/airbyte/issues/8902

* Revert infinity and nan assertion in unit tests

This reverts commit 3bee7d1.

* Update documentation

* Bump postgres source version in seed

Co-authored-by: Yurii Bidiuk <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation
Projects
None yet
6 participants