
When replicating very large tables to S3, the sync fails once the 10,000-part limit is exceeded. Add a part size config. #6090

Closed
mikeplanting opened this issue Sep 15, 2021 · 5 comments
Labels: area/connectors (Connector related issues), type/bug (Something isn't working)

Comments

@mikeplanting

Environment

  • Airbyte version: 0.29.15-alpha
  • OS Version / Instance: AWS EC2
  • Deployment: Kubernetes
  • Source Connector and version: MySQL (configured to use CDC, binlog) 0.4.4
  • Destination Connector and version: S3 0.1.11
  • Severity: High
  • Step where error happened: Sync job. The root cause is that the part size cannot be adjusted, so replicating a very large table hits AWS's hard limit of 10,000 parts per multipart upload. If the part size were configurable, the upload could stay under that limit (see the sketch after this list).
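
For context, S3 allows at most 10,000 parts per multipart upload, so the part size fixes a hard ceiling on object size: part_size × 10,000. A minimal sketch of that arithmetic in Python (the 5 MB default mirrors the connector default mentioned later in this thread; the helper names are ours, not Airbyte's):

```python
# Sketch: how the part size caps the maximum object size on S3.
# S3 multipart uploads allow at most 10,000 parts, and every part
# except the last must be at least 5 MiB.
MAX_PARTS = 10_000
MIB = 1024 * 1024

def max_object_size_bytes(part_size_mib: int) -> int:
    """Largest object a multipart upload can hold at this part size."""
    return part_size_mib * MIB * MAX_PARTS

def min_part_size_mib(expected_bytes: int) -> int:
    """Smallest part size (MiB) that keeps the upload under 10,000 parts."""
    needed = -(-expected_bytes // (MAX_PARTS * MIB))  # ceiling division
    return max(5, needed)                             # S3's 5 MiB floor

print(max_object_size_bytes(5) / 1024**3)   # ~48.8 GiB with the 5 MB default
print(min_part_size_mib(500 * 1024**3))     # a ~500 GiB table needs >= 52 MiB parts
```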

Current Behavior

See the error description above: the sync aborts when the multipart upload exceeds AWS's 10,000-part limit.

Expected Behavior

The sync should finish successfully.

Logs

2021-09-14 21:52:45 INFO 2021-09-14 21:52:45 WARN i.a.i.b.FailureTrackingAirbyteMessageConsumer(close):78 - {} - Airbyte message consumer: failed.
2021-09-14 21:52:45 ERROR Exception in thread "main" java.lang.IndexOutOfBoundsException: This stream was allocated the part numbers from 1 (inclusive) to 10001 (exclusive)and it has gone beyond the end..

The exception comes from the upload stream's pre-allocated part-number range: parts 1 through 10,000 are all S3 permits, so the first write past part 10,000 throws.

Steps to Reproduce

  1. Set up a MySQL-to-S3 sync on a very large table that will exceed 10,000 parts
  2. Run the sync
  3. The error above should appear; the sketch below shows one way to watch the part count climb while the sync runs
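
If you want to confirm the reproduction from the outside, you can inspect the in-progress multipart upload while the sync is in flight. A sketch using boto3 (the bucket name is a placeholder):

```python
# Sketch: list in-progress multipart uploads and their current part counts.
import boto3

s3 = boto3.client("s3")
BUCKET = "my-airbyte-bucket"  # placeholder

for up in s3.list_multipart_uploads(Bucket=BUCKET).get("Uploads", []):
    parts = s3.list_parts(
        Bucket=BUCKET, Key=up["Key"], UploadId=up["UploadId"], MaxParts=1000
    )
    # IsTruncated means more parts exist than MaxParts returned; page through
    # with PartNumberMarker=parts["NextPartNumberMarker"] for an exact count.
    print(up["Key"], "parts so far:", len(parts.get("Parts", [])),
          "(truncated)" if parts.get("IsTruncated") else "")
```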
@mikeplanting mikeplanting added the type/bug Something isn't working label Sep 15, 2021
@tuliren tuliren self-assigned this Sep 17, 2021
@sherifnada sherifnada added the area/connectors Connector related issues label Oct 7, 2021
@sherifnada
Contributor

potential duplicate of #6245

@pprithvi

pprithvi commented Nov 15, 2021

@sherifnada I was able to move ~140M records to S3 as flattened CSV, but the same sync fails with JSON and with non-flattened CSV. Is anyone working on a PR for this?

@yurii-bidiuk yurii-bidiuk self-assigned this Dec 13, 2021
@yurii-bidiuk
Contributor

Hi @mikeplanting, @tredencegithub. Are you able to reproduce this bug now? A part size config was added in #5890; in the UI it is called "Block Size", and it defaults to 5 MB. Please try increasing this value and see whether the sync still fails.
[Screenshot: the "Block Size" setting in the S3 destination form]
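
For anyone landing here from search, a sketch of where the new setting lives in the destination config. The field name part_size_mb under format is inferred from #5890 and may differ in your connector version, so verify against the destination's spec.json before relying on it:

```python
# Hypothetical S3 destination config with a larger part size.
# Field names are inferred from PR #5890 and the UI screenshot above;
# check your connector version's spec.json before use.
s3_destination_config = {
    "s3_bucket_name": "my-bucket",        # placeholder
    "s3_bucket_path": "airbyte/raw",      # placeholder
    "s3_bucket_region": "us-east-1",
    "format": {
        "format_type": "CSV",
        "flattening": "Root level flattening",
        "part_size_mb": 100,  # 100 MB parts -> roughly a 976 GiB object ceiling
    },
}
```

With 100 MB parts, the same 10,000-part limit allows objects up to about 976 GiB instead of the ~48.8 GiB ceiling implied by the 5 MB default.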

@alexandr-shegeda alexandr-shegeda moved this to Implementation in progress in GL Roadmap Dec 17, 2021
@yurii-bidiuk yurii-bidiuk moved this from Implementation in progress to On hold in GL Roadmap Dec 22, 2021
@yurii-bidiuk
Contributor

Hi @misteryeo, can we close this ticket, since the part size config was implemented in another PR (#5890)?

@misteryeo
Contributor

Thanks @yurii-bidiuk - closing this ticket.

@oustynova oustynova moved this from On hold to Done in GL Roadmap Feb 1, 2022