Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgres Source: add timezone awareness and handle BC dates #13166

Merged

Conversation

VitaliiMaltsev
Copy link
Contributor

@VitaliiMaltsev VitaliiMaltsev commented May 25, 2022

What

  1. Cant handle any BC dates (Ex. '199-10-10 BC" would be missed BC attribute and returned as an AD date)
  2. As part of the [EPIC] Expand date-time schema declarations to describe timezone awareness we would like to add to the Source Postgres additional information to the JSON schema about timestamp which is timezone aware/unaware.

There are next date-related types that should be covered:

timestamp with/without timezone
date
time with/without timezone

How

  1. When converting an object from resultset metadata to LocalDate class, we can use the method getEra()
  2. Created new JsonSchemaType final variable to cover all of the Postgres date-time types:

Screenshot 2022-05-26 at 17 55 50

Recommended reading order

  1. JsonSchemaType.java
  2. AbstractJdbcCompatibleSourceOperations.java
  3. PostgresSourceOperations.java

🚨 User Impact 🚨

user impact is possible at the normalization stage, since normalization cannot yet handle new parameters in the json schema

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/SUMMARY.md
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here
Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here
Connector Generator
  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed

Tests

Unit

Put your unit tests output here.

Integration

Put your integration tests output here.

Acceptance

Put your acceptance tests output here.

@github-actions github-actions bot added area/connectors Connector related issues area/platform issues related to the platform area/protocol labels May 25, 2022
@VitaliiMaltsev VitaliiMaltsev temporarily deployed to more-secrets May 25, 2022 10:02 Inactive
@VitaliiMaltsev VitaliiMaltsev temporarily deployed to more-secrets May 25, 2022 13:26 Inactive
@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label May 25, 2022
@VitaliiMaltsev
Copy link
Contributor Author

VitaliiMaltsev commented May 25, 2022

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2384744434
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2384744434
No Python unittests run

@VitaliiMaltsev VitaliiMaltsev temporarily deployed to more-secrets May 25, 2022 13:31 Inactive
@VitaliiMaltsev
Copy link
Contributor Author

VitaliiMaltsev commented May 25, 2022

/test connector=connectors/source-postgres-strict-encrypt

🕑 connectors/source-postgres-strict-encrypt https://github.com/airbytehq/airbyte/actions/runs/2385335102
✅ connectors/source-postgres-strict-encrypt https://github.com/airbytehq/airbyte/actions/runs/2385335102
No Python unittests run

Copy link
Contributor

@alexandr-shegeda alexandr-shegeda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alexandr-shegeda alexandr-shegeda marked this pull request as ready for review May 26, 2022 15:49
Comment on lines 98 to 103
} else if (columnTypeName.equalsIgnoreCase("time")) {
putTime(json, columnName, resultSet, colIndex);
} else if (columnTypeName.equalsIgnoreCase("timetz")) {
putTimeWithTimezone(json, columnName, resultSet, colIndex);
} else if (columnTypeName.equalsIgnoreCase("timestamptz")) {
putTimestampWithTimezone(json, columnName, resultSet, colIndex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible we will have more else if statements here, in the future?
I suggest refactoring this block to smth like:

 return switch (typeName) {
      case "real", "double", "float" -> Double.parseDouble(varCharValue);
      case "varchar" -> varCharValue;
      case "boolean" -> Boolean.parseBoolean(varCharValue);
      case "integer" -> Integer.parseInt(varCharValue);
      default -> null;
    };

__
sema-logo  Summary: 🛠️ This code needs a fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible we will have more else if statements here, in the future? I suggest refactoring this block to smth like:

 return switch (typeName) {
      case "real", "double", "float" -> Double.parseDouble(varCharValue);
      case "varchar" -> varCharValue;
      case "boolean" -> Boolean.parseBoolean(varCharValue);
      case "integer" -> Integer.parseInt(varCharValue);
      default -> null;
    };

__ sema-logo sema-logo  Summary: 🛠️ This code needs a fix

replaced with switch

return isBCE(date) ? value.substring(1) + " BC" : value;
}

private static boolean isBCE(LocalDate date) {
Copy link
Contributor

@alexandertsukanov alexandertsukanov May 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why static ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is util method and we can reuse it without initialising of the new instance AbstractJdbcCompatibleSourceOperations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VitaliiMaltsev in such case, should it be public?

__
sema-logo  Summary: ❓ I have a question

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VitaliiMaltsev in such case, should it be public?

__ sema-logo sema-logo  Summary: ❓ I have a question

made it public

import io.airbyte.protocol.models.ConnectorSpecification;
import io.airbyte.integrations.source.relationaldb.models.DbState;
import io.airbyte.integrations.source.relationaldb.models.DbStreamState;
import io.airbyte.protocol.models.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, don't use wildcard imports.

__
sema-logo  Summary: 🛠️ This code needs a fix  |  Tags: Brittle

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, don't use wildcard imports.

__ sema-logo sema-logo  Summary: 🛠️ This code needs a fix  |  Tags: Brittle

removed star import

Comment on lines 157 to 161
} else if (typeName.equalsIgnoreCase("timestamptz")) {
return JDBCType.TIMESTAMP_WITH_TIMEZONE;
} else if (typeName.equalsIgnoreCase("timetz")) {
return JDBCType.TIME_WITH_TIMEZONE;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like, the comment above is related to this block of code as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like, the comment above is related to this block of code as well.

replaced with switch

import io.airbyte.protocol.models.ConnectorSpecification;
import io.airbyte.integrations.source.relationaldb.models.DbState;
import io.airbyte.integrations.source.relationaldb.models.DbStreamState;
import io.airbyte.protocol.models.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, don't use wildcard imports.

__
sema-logo  Summary: 🛠️ This code needs a fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, don't use wildcard imports.

__ sema-logo sema-logo  Summary: 🛠️ This code needs a fix

removed star import

Copy link
Contributor

@alexandertsukanov alexandertsukanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished iteration of review. Left some comments

__
sema-logo  Tags: Reusable

@VitaliiMaltsev VitaliiMaltsev temporarily deployed to more-secrets May 27, 2022 10:42 Inactive
@VitaliiMaltsev VitaliiMaltsev temporarily deployed to more-secrets May 31, 2022 10:54 Inactive
@VitaliiMaltsev
Copy link
Contributor Author

VitaliiMaltsev commented May 31, 2022

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2414870642

@VitaliiMaltsev
Copy link
Contributor Author

VitaliiMaltsev commented May 31, 2022

/publish connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2416144342
❌ Failed to publish connectors/source-postgres
❌ Couldn't auto-bump version for connectors/source-postgres

@VitaliiMaltsev
Copy link
Contributor Author

VitaliiMaltsev commented May 31, 2022

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2416615910
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2416615910
No Python unittests run

Build Passed

Test summary info:

All Passed

@grishick
Copy link
Contributor

grishick commented Jun 1, 2022

/publish connector=connectors/source-postgres run-tests=false

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2418683236
🚀 Successfully published connectors/source-postgres
🚀 Auto-bumped version for connectors/source-postgres
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2418683236

@VitaliiMaltsev VitaliiMaltsev temporarily deployed to more-secrets June 1, 2022 01:13 Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets June 1, 2022 01:53 Inactive
@VitaliiMaltsev
Copy link
Contributor Author

VitaliiMaltsev commented Jun 1, 2022

/publish connector=connectors/source-postgres-strict-encrypt run-tests=false

🕑 connectors/source-postgres-strict-encrypt https://github.com/airbytehq/airbyte/actions/runs/2419945137
🚀 Successfully published connectors/source-postgres-strict-encrypt
❌ Couldn't auto-bump version for connectors/source-postgres-strict-encrypt

@VitaliiMaltsev VitaliiMaltsev merged commit 0a56b1d into master Jun 1, 2022
@VitaliiMaltsev VitaliiMaltsev deleted the vmaltsev/12905-postgres-source-handle-timezones-and-eras branch June 1, 2022 07:46
@grishick
Copy link
Contributor

grishick commented Jun 1, 2022

@evantahler FYI auto-bump has failed here (see @VitaliiMaltsev 's last attempt)

@evantahler
Copy link
Contributor

evantahler commented Jun 1, 2022

@grishick isn't a failure expected if the version on this branch has already been bumped (your previous /publish did work)? The previous failures of the publish command seem to be test failures, as noted above

SshPasswordPostgresSourceAcceptanceTest > testDiscover() FAILED
    org.testcontainers.containers.ContainerLaunchException: Container startup failed
        at app//org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:345)
        at app//org.testcontainers.containers.GenericContainer.start(GenericContainer.java:326)
(snip...)

@VitaliiMaltsev
Copy link
Contributor Author

@grishick isn't a failure expected if the version on this branch has already been bumped (your previous /publish did work)? The previous failures of the publish command seem to be test failures, as noted above

SshPasswordPostgresSourceAcceptanceTest > testDiscover() FAILED
    org.testcontainers.containers.ContainerLaunchException: Container startup failed
        at app//org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:345)
        at app//org.testcontainers.containers.GenericContainer.start(GenericContainer.java:326)
(snip...)

@evantahler auto-bump version issue is related to Postgres source strict encrypt connector, not to Postgres source

@evantahler
Copy link
Contributor

Ah! sorry, I did not read closely enough to see that there were 2 connectors getting published here.

I'm actually not sure how this was working since #11712. My reading of the action:

      - name: Check if connector in definitions yaml
        if: github.event.inputs.auto-bump-version == 'true' && success()
        run: |
          connector="airbyte/${{ env.IMAGE_NAME }}"
          definitionpath=./airbyte-config/init/src/main/resources/seed/
          sourcecheck=$(yq e ".. | select(has(\"dockerRepository\")) | select(.dockerRepository == \"$connector\")" "$definitionpath"source_definitions.yaml)
          destcheck=$(yq e ".. | select(has(\"dockerRepository\")) | select(.dockerRepository == \"$connector\")" "$definitionpath"destination_definitions.yaml)
          if [[ (-z "$sourcecheck" && -z "$destcheck") ]]
          then exit 1
          fi

Seems to imply that if we are bumping the version, in all cases, we want to update our *definitions.yaml. But, these strict-encrypt versions aren't listed there, so this command will fail & exit 1.

Looking at some other recent PRs (#13176 (comment)) it looks like they are also failing for the same reason. Do we want to publish the connector, or is that something that cloud should do? It would be a simple fix to skip parts of the action if the connector name matches *strict-encrypt*...

I also think that we might be moving away from these secondary image and use an ENV var / setting to toggle on strict-encrypt mode in cloud. @tuliren - what's the path forward here?

@grishick
Copy link
Contributor

grishick commented Jun 1, 2022

Seems to imply that if we are bumping the version, in all cases, we want to update our *definitions.yaml. But, these strict-encrypt versions aren't listed there, so this command will fail & exit 1.

Hm... so this means that publishing strict-encrypt connectors with publish command is broken, but we still need to be publishing them to docker hub unless we replace these connectors entirely with some other way of enforcing encryption in the cloud. I am not aware of any work to get rid of strict encrypt connectors though.

jscottpolevault pushed a commit to jscottpolevault/airbyte that referenced this pull request Jun 1, 2022
…q#13166)

* Postgres Source: add timezone awareness and handle BC dates

* fixed checkstyle

* add tests

* updated changelog

* removed star import

* fixed tests

* refactoring

* removed star import

* fixed bytea type

* created final static constants

* bump version

* auto-bump connector version

Co-authored-by: Octavia Squidington III <[email protected]>
@tuliren
Copy link
Contributor

tuliren commented Jun 2, 2022

The publication of the strict-encrypt version succeeded. It was the version bumping that failed. This is expected, because the strict-encrypt version only exists on cloud.

The current workflow is:

  1. Publish the normal version and the strict-encrypt version.
  2. Ignore the version bumping failure from the strict-encrypt version.
  3. Bump the strict-encrypt version in airbyte-cloud.

I guess we can just make the version bumping script ignore the error if a connector does not exist.

I also think that we might be moving away from these secondary image and use an ENV var / setting to toggle on strict-encrypt mode in cloud. @tuliren - what's the path forward here?

Yes, I think we can do the following:

  1. Add a cloud environment variable that is specific to the Airbyte cloud deployment.
    • Or reuse an existing one if it exists.
  2. Pass the environment variable to the connector container.
  3. When the connector detects such an environment variable, it updates the spec and behave just like the strict-encrypt version.

Added a comment to this related issue: #8634 (comment)

@quantson
Copy link

quantson commented Jun 3, 2022

After updating our Postgres source connector to 0.4.19 following this PR merge, we're witnessing issues in one of our pipelines because date field precision is now truncated if the value of the millisecond or second is 0.

Value in PG database: 2022-07-02T16:09:00.000000Z
Value exported to BigQuery with Postgres source 0.4.18: 2022-07-02T16:09:00.000000Z
Value exported to BigQuery with Postgres source 0.4.18: 2022-07-02T16:09

I'm not sure if it's on purpose or if it's a bug, we've had to revert to 0.4.18 though

@VitaliiMaltsev
Copy link
Contributor Author

VitaliiMaltsev commented Jun 6, 2022

After updating our Postgres source connector to 0.4.19 following this PR merge, we're witnessing issues in one of our pipelines because date field precision is now truncated if the value of the millisecond or second is 0.

Value in PG database: 2022-07-02T16:09:00.000000Z Value exported to BigQuery with Postgres source 0.4.18: 2022-07-02T16:09:00.000000Z Value exported to BigQuery with Postgres source 0.4.18: 2022-07-02T16:09

I'm not sure if it's on purpose or if it's a bug, we've had to revert to 0.4.18 though
@quantson what datatype are you using in postgres for this value? (2022-07-02T16:09:00.000000Z)

created an issue #13489
cc @grishick @tuliren

@VitaliiMaltsev
Copy link
Contributor Author

@quantson please take a look #13549

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation area/platform issues related to the platform area/protocol
Projects
None yet
8 participants