AWS S3 Staging COPY is writing records from different table in the same raw table #5664

Closed
danieldiamond opened this issue Aug 26, 2021 · 0 comments · Fixed by #5924 or #7074

Comments

@danieldiamond
Contributor

danieldiamond commented Aug 26, 2021

Environment

  • Airbyte version: 0.29.11-alpha
  • OS Version / Instance: AWS EC2
  • Deployment: Docker
  • Source Connector and version: MySQL 0.4.3
  • Destination Connector and version: Snowflake 0.3.12
  • Severity: Critical
  • Step where error happened: Sync

Current Behavior

In _AIRBYTE_RAW_MYTABLE I am seeing JSON blobs in the _AIRBYTE_DATA column that belong to two different tables.

I can give two examples where this happens, both involving tables with similar names:

  • user and user_invites tables
  • entrydispute and entrydisputeevent tables

In each pair, the raw table of the first table contains JSON blobs with the schema of the second table. For example, _AIRBYTE_RAW_USER has rows whose raw JSON has the schema of _AIRBYTE_RAW_USER_INVITES. Because those rows carry no data for the user table's columns, the corresponding rows in the _scd and final tables are NULL.
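A minimal Python sketch of why a misplaced blob turns into NULL rows: extracting one table's columns from a JSON blob that belongs to another table finds none of them. This only approximates what normalization does and is not Airbyte's code; the column and key names are hypothetical, since the real schemas are not given in this report.

```python
import json
from typing import Any, Dict, List

# Hypothetical column list: the real `user` schema is not given in this report.
USER_COLUMNS = ["id", "email", "created_at"]


def project_onto_columns(airbyte_data: str, columns: List[str]) -> Dict[str, Any]:
    """Extract each column from a raw _AIRBYTE_DATA JSON blob.

    A key absent from the blob comes back as None, i.e. a NULL column.
    """
    blob = json.loads(airbyte_data)
    return {col: blob.get(col) for col in columns}


# A blob that really belongs to `user` fills every column.
own_row = project_onto_columns(
    '{"id": 1, "email": "a@example.com", "created_at": "2021-08-26"}', USER_COLUMNS
)
# A `user_invites` blob (hypothetical keys) that landed in _AIRBYTE_RAW_USER
# shares no keys with the user columns, so every extracted value is None.
stray_row = project_onto_columns(
    '{"invite_id": 7, "inviter_id": 1, "invitee_email": "b@example.com"}', USER_COLUMNS
)
print(own_row)
print(stray_row)
```

If the two real schemas share some column names (an `id`, say), those columns would be filled and only the rest would be NULL.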

It appears that the AWS S3 Staging COPY is copying data from one table into the other.
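Both examples are pairs where one table name is a prefix of the other (user / user_invites, entrydispute / entrydisputeevent). One hypothesis consistent with that, not confirmed in this report, is that staged files are selected by key prefix: S3's list `Prefix` and the path in Snowflake's `COPY INTO ... FROM @stage/path` both match by leading string. The sketch below uses a hypothetical staging key layout to show how a bare-name prefix over-matches, and how a per-stream directory ending in `/` avoids it.

```python
from typing import List

# Hypothetical staging layout: stream name used directly as a file-name prefix.
FLAT_KEYS = [
    "staging/user_0001.csv",
    "staging/user_invites_0001.csv",
    "staging/entrydispute_0001.csv",
    "staging/entrydisputeevent_0001.csv",
]

# Hypothetical alternative: one directory per stream.
DIR_KEYS = [
    "staging/user/0001.csv",
    "staging/user_invites/0001.csv",
    "staging/entrydispute/0001.csv",
    "staging/entrydisputeevent/0001.csv",
]


def select_by_prefix(keys: List[str], prefix: str) -> List[str]:
    """Return every key that starts with `prefix`, as a prefix-based load would."""
    return [k for k in keys if k.startswith(prefix)]


# Bare-name prefixes also pick up the longer table's files.
print(select_by_prefix(FLAT_KEYS, "staging/user"))
print(select_by_prefix(FLAT_KEYS, "staging/entrydispute"))
# A trailing "/" after the stream directory selects only that stream.
print(select_by_prefix(DIR_KEYS, "staging/user/"))
```

Note that adding only an underscore (`staging/user_`) would not help with the flat layout, because `user_invites_0001.csv` also starts with `user_`.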

Expected Behavior

Each table should have its own _AIRBYTE_RAW_<TABLE> table containing only that table's data.
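To find rows that violate this in an existing raw table, one rough check is to flag any _AIRBYTE_DATA blob containing keys outside the table's known columns. A minimal Python sketch; the function name, rows, and column set are hypothetical, and in practice the rows would come from querying the raw table.

```python
import json
from typing import Iterable, List, Set


def find_foreign_rows(raw_rows: Iterable[str], expected_keys: Set[str]) -> List[int]:
    """Return indexes of _AIRBYTE_DATA blobs with keys outside the table's schema."""
    flagged = []
    for i, data in enumerate(raw_rows):
        # set(dict) yields the blob's top-level keys.
        if not set(json.loads(data)) <= expected_keys:
            flagged.append(i)
    return flagged


rows = [
    '{"id": 1, "email": "a@example.com"}',                   # belongs to `user`
    '{"invite_id": 7, "invitee_email": "b@example.com"}',    # looks like `user_invites`
]
print(find_foreign_rows(rows, {"id", "email", "created_at"}))
```

A legitimate schema change in the source (a newly added column) would also trip this check, so flagged rows still need a manual look.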


Steps to Reproduce

I am not entirely sure how to reproduce this. Possibly relevant history:

  1. I have switched connections between standard and S3 staging loading (back and forth), and between CDC and standard replication (back and forth).
  2. I have reused the same source for different connections (and different tables).
  3. I have created two otherwise identical connections, one using standard loading and the other using S3 staging.


@danieldiamond danieldiamond added the type/bug Something isn't working label Aug 26, 2021
@jrhizor jrhizor added the priority/high High priority label Aug 26, 2021
@sherifnada sherifnada added area/connectors Connector related issues priority/critical Critical priority! and removed priority/high High priority labels Aug 26, 2021
@andriikorotkov andriikorotkov self-assigned this Sep 1, 2021
@sherifnada sherifnada reopened this Oct 7, 2021
@andriikorotkov andriikorotkov linked a pull request Oct 15, 2021 that will close this issue
@sherifnada sherifnada moved this to Done in GL Roadmap Jan 12, 2022