Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Source BigQuery: preserve TIMESTAMP microseconds #20002

Conversation

markandrus
Copy link
Contributor

@markandrus markandrus commented Dec 1, 2022

What

I believe I am seeing an issue with the BigQuery source which is very similar to this one for the Postgres source: #9157

Specifically,

  1. My team has a BigQuery table with a TIMESTAMP column.
  2. We set the cursor in Airbyte to the TIMESTAMP column.
  3. Airbyte seems to truncate the TIMESTAMPs, resulting in "2022-04-27T19:29:45Z" instead of "2022-04-27T19:29:45.722Z".
  4. Because of this, Airbyte seems to continuously re-sync the same records.

How

In order to solve this, I have tried to preserve microseconds for TIMESTAMP columns. I did it in a very simple way (converting to Instant and reusing DataTypeUtils.toISO8601StringWithMicroseconds). I didn't want to do anything more involved, just demonstrate the idea. Open to suggestions or — even better ― perhaps a core Airbyte contributor could run with this?

I don't actually know how to test this (or even how to iterate on the BigQuery source in isolation). Can you please provide me some pointers?

🚨 User Impact 🚨

IDK

Pre-merge Checklist

IDK

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here
Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here
Connector Generator
  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed

Tests

Unit

Put your unit tests output here.

Integration

Put your integration tests output here.

Acceptance

Put your acceptance tests output here.

@markandrus markandrus force-pushed the preserve-bigquery-timestamp-microseconds branch from a2e642b to 42ea56b Compare December 1, 2022 22:06
@sajarin sajarin added internal bounty bounty-S Maintainer program: claimable small bounty PR and removed bounty internal bounty-S Maintainer program: claimable small bounty PR labels Dec 5, 2022
@marcosmarxm marcosmarxm added the team/db-dw-sources Backlog for Database and Data Warehouse Sources team label Dec 8, 2022
@marcosmarxm
Copy link
Member

marcosmarxm commented Dec 15, 2022

/test connector=connectors/source-bigquery

🕑 connectors/source-bigquery https://github.com/airbytehq/airbyte/actions/runs/3706722873
❌ connectors/source-bigquery https://github.com/airbytehq/airbyte/actions/runs/3706722873
🐛

@marcosmarxm
Copy link
Member

Hello 👋:skin-tone-2: and thank you for your contribution!

Airbyte has instituted a code freeze between 19 and 30 December, to make sure there are no disruptions during the holidays.
Because of this, reviewing and merging your contribution may take longer than usual.
We apologize for the delay, but we want everyone to have a quiet and happy holiday.

If you have any questions or need further clarification, please don't hesitate to ping via Slack.

@sh4sh
Copy link
Contributor

sh4sh commented Mar 7, 2023

Was this resolved in #13166?

@sh4sh sh4sh removed the team/destinations Destinations team's backlog label Mar 7, 2023
@marcosmarxm marcosmarxm added the area/connectors Connector related issues label Apr 18, 2023
@akashkulk akashkulk self-requested a review April 24, 2023 22:43
@marcosmarxm
Copy link
Member

marcosmarxm commented Apr 25, 2023

/test connector=connectors/source-bigquery

🕑 connectors/source-bigquery https://github.com/airbytehq/airbyte/actions/runs/4796721423
❌ connectors/source-bigquery https://github.com/airbytehq/airbyte/actions/runs/4796721423
🐛

@marcosmarxm
Copy link
Member

marcosmarxm commented Apr 25, 2023

/test connector=connectors/source-bigquery

🕑 connectors/source-bigquery https://github.com/airbytehq/airbyte/actions/runs/4796962584
❌ connectors/source-bigquery https://github.com/airbytehq/airbyte/actions/runs/4796962584
🐛 https://gradle.com/s/j4ulnzbxropqm

Build Failed

Test summary info:

Could not find result summary

@octavia-squidington-iii octavia-squidington-iii removed the area/connectors Connector related issues label Apr 25, 2023
@cgardens cgardens added the area/connectors Connector related issues label Apr 25, 2023
@prateekmukhedkar
Copy link
Contributor

@markandrus can you please add a test case for this change?

@markandrus
Copy link
Contributor Author

Hi @prateekmukhedkar,

I don't know how to test these changes, but my team has also moved on from Airbyte, so I don't think I'd have time to work on this again. I believe this PR may still address a real issue with timestamp resolution in BigQuery, but I also no longer require the functionality. So if y'all want to run with this yourselves, or close the PR, either is fine with me.

@marcosmarxm
Copy link
Member

Closed because Source Bigquery was moved to Java CDK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues community connectors/source/bigquery internal needs-triage team/db-dw-sources Backlog for Database and Data Warehouse Sources team
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

10 participants