
[Bug]: Redshift bulk loader fails on empty stream #3469

Closed
bamaer opened this issue Dec 1, 2023 · 0 comments · Fixed by #3470
bamaer commented Dec 1, 2023

Apache Hop version?

SNAPSHOT-20231201

Java version?

openjdk version "11.0.21" 2023-10-17

Operating system

Linux

What happened?

The Redshift bulk loader fails when trying to write an empty stream to S3 (CSV) and COPY it into Redshift.

2023/12/01 05:59:00 - Hop - Pipeline opened.
2023/12/01 05:59:00 - Hop - Launching pipeline [redshift_empty_table]...
2023/12/01 05:59:00 - Hop - Started the pipeline execution.
2023/12/01 05:59:00 - redshift_empty_table - Executing this pipeline using the Local Pipeline Engine with run configuration 'local'
2023/12/01 05:59:00 - redshift_empty_table - Execution started for pipeline [redshift_empty_table]
2023/12/01 05:59:03 - S3 File - Part size less than minimum of 5MB, set to minimum
2023/12/01 05:59:03 - S3 File - Part size less than minimum of 5MB, set to minimum
2023/12/01 05:59:04 - load empty_table.0 - Connected to database dwh
2023/12/01 05:59:04 - load empty_table.0 - ERROR: Unexpected error
2023/12/01 05:59:04 - load empty_table.0 - ERROR: org.apache.hop.core.exception.HopDatabaseException:
2023/12/01 05:59:04 - load empty_table.0 - Error executing COPY statements
2023/12/01 05:59:04 - load empty_table.0 - ERROR: The specified S3 prefix 'main/empty_table.csv' does not exist
2023/12/01 05:59:04 - load empty_table.0 - Detail:
2023/12/01 05:59:04 - load empty_table.0 - -----------------------------------------------
2023/12/01 05:59:04 - load empty_table.0 - error: The specified S3 prefix 'main/empty_table.csv' does not exist
2023/12/01 05:59:04 - load empty_table.0 - code: 8001
2023/12/01 05:59:04 - load empty_table.0 - context:
2023/12/01 05:59:04 - load empty_table.0 - query: 161874653[child_sequence:1]
2023/12/01 05:59:04 - load empty_table.0 - location: s3_utility.cpp:708
2023/12/01 05:59:04 - load empty_table.0 - process: padbmaster [pid=1073963716]
2023/12/01 05:59:04 - load empty_table.0 - -----------------------------------------------
2023/12/01 05:59:04 - load empty_table.0 -
2023/12/01 05:59:04 - load empty_table.0 -
2023/12/01 05:59:04 - load empty_table.0 - at org.apache.hop.pipeline.transforms.redshift.bulkloader.RedshiftBulkLoader.processRow(RedshiftBulkLoader.java:124)
2023/12/01 05:59:04 - load empty_table.0 - at org.apache.hop.pipeline.transform.RunThread.run(RunThread.java:55)
2023/12/01 05:59:04 - load empty_table.0 - at java.base/java.lang.Thread.run(Thread.java:829)
2023/12/01 05:59:04 - load empty_table.0 - Caused by: com.amazon.redshift.util.RedshiftException: ERROR: The specified S3 prefix 'main/empty_table.csv' does not exist
2023/12/01 05:59:04 - load empty_table.0 - Detail:
2023/12/01 05:59:04 - load empty_table.0 - -----------------------------------------------
2023/12/01 05:59:04 - load empty_table.0 - error: The specified S3 prefix 'main/empty_table.csv' does not exist
2023/12/01 05:59:04 - load empty_table.0 - code: 8001
2023/12/01 05:59:04 - load empty_table.0 - context:
2023/12/01 05:59:04 - load empty_table.0 - query: 161874653[child_sequence:1]
2023/12/01 05:59:04 - load empty_table.0 - location: s3_utility.cpp:708
2023/12/01 05:59:04 - load empty_table.0 - process: padbmaster [pid=1073963716]
2023/12/01 05:59:04 - load empty_table.0 - -----------------------------------------------
2023/12/01 05:59:04 - load empty_table.0 -
2023/12/01 05:59:04 - load empty_table.0 - at com.amazon.redshift.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2608)
2023/12/01 05:59:04 - load empty_table.0 - at com.amazon.redshift.core.v3.QueryExecutorImpl.processResultsOnThread(QueryExecutorImpl.java:2276)
2023/12/01 05:59:04 - load empty_table.0 - at com.amazon.redshift.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1881)
2023/12/01 05:59:04 - load empty_table.0 - at com.amazon.redshift.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1873)
2023/12/01 05:59:04 - load empty_table.0 - at com.amazon.redshift.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:370)
2023/12/01 05:59:04 - load empty_table.0 - at com.amazon.redshift.jdbc.RedshiftStatementImpl.executeInternal(RedshiftStatementImpl.java:515)
2023/12/01 05:59:04 - load empty_table.0 - at com.amazon.redshift.jdbc.RedshiftStatementImpl.execute(RedshiftStatementImpl.java:436)
2023/12/01 05:59:04 - load empty_table.0 - at com.amazon.redshift.jdbc.RedshiftStatementImpl.executeWithFlags(RedshiftStatementImpl.java:377)
2023/12/01 05:59:04 - load empty_table.0 - at com.amazon.redshift.jdbc.RedshiftStatementImpl.executeCachedSql(RedshiftStatementImpl.java:363)
2023/12/01 05:59:04 - load empty_table.0 - at com.amazon.redshift.jdbc.RedshiftStatementImpl.executeWithFlags(RedshiftStatementImpl.java:340)
2023/12/01 05:59:04 - load empty_table.0 - at com.amazon.redshift.jdbc.RedshiftStatementImpl.executeUpdate(RedshiftStatementImpl.java:298)
2023/12/01 05:59:04 - load empty_table.0 - at org.apache.hop.pipeline.transforms.redshift.bulkloader.RedshiftBulkLoader.processRow(RedshiftBulkLoader.java:116)
2023/12/01 05:59:04 - load empty_table.0 - ... 2 more
2023/12/01 05:59:04 - load empty_table.0 - Finished processing (I=0, O=0, R=0, W=0, U=0, E=1)
2023/12/01 05:59:04 - redshift_empty_table - Pipeline duration : 4.006 seconds [ 4.006" ]
2023/12/01 05:59:04 - redshift_empty_table - Execution finished on a local pipeline engine with run configuration 'local'
2023/12/01 05:59:04 - redshift_empty_table - Pipeline detected one or more transforms with errors.
2023/12/01 05:59:04 - redshift_empty_table - Pipeline is killing the other transforms!

Issue Priority

Priority: 2

Issue Component

Component: Database, Component: Transforms

@bamaer bamaer self-assigned this Dec 1, 2023
@github-actions github-actions bot added P2 Default Priority Database Transforms labels Dec 1, 2023
bamaer added a commit to bamaer/hop that referenced this issue Dec 1, 2023
hansva added a commit that referenced this issue Dec 3, 2023
additional checks to skip file + copy on empty stream #3469
@hansva hansva added this to the 2.8 milestone Dec 3, 2023
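
The failure mode in the log above is that no CSV file is staged to S3 when the incoming stream is empty, so the subsequent COPY statement points at a non-existent S3 prefix (Redshift error code 8001). The fix referenced above adds checks to skip the file write and the COPY in that case. A minimal sketch of that guard logic (hypothetical names, not the actual Hop implementation in `RedshiftBulkLoader.processRow`):

```java
// Hypothetical sketch of the empty-stream guard: if no rows were staged to
// S3, return null instead of a COPY statement, so the loader never issues a
// COPY against an S3 prefix that was never created.
public class EmptyStreamGuard {

  /** Returns the COPY statement to execute, or null when the stream was empty. */
  static String buildCopyOrSkip(long rowsWritten, String table, String s3Path) {
    if (rowsWritten == 0) {
      // Nothing was written to S3; running COPY here would fail with
      // "The specified S3 prefix '...' does not exist" (code 8001).
      return null;
    }
    return "COPY " + table + " FROM '" + s3Path + "' FORMAT AS CSV";
  }

  public static void main(String[] args) {
    // Empty stream: no COPY is issued.
    System.out.println(buildCopyOrSkip(0, "empty_table", "s3://bucket/main/empty_table.csv"));
    // Non-empty stream: COPY proceeds as before.
    System.out.println(buildCopyOrSkip(3, "empty_table", "s3://bucket/main/empty_table.csv"));
  }
}
```

The key design point is that the skip happens before any statement reaches the database, so an empty input simply produces zero output rows instead of a pipeline error.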