Moving single object with GCSToGCSOperator differs from gsutil mv command
#37576
Closed
Labels
area:providers
good first issue
kind:bug
provider:google
Apache Airflow Provider(s)
google
Versions of Apache Airflow Providers
apache-airflow-providers-google==10.10.0
Apache Airflow version
2.6.3
Operating System
Debian 11
Deployment
Google Cloud Composer
Deployment details
Reproducible locally in our Dockerfile-based setup, with the Python environment installed with conda in our VS Code dev container (local executor, Postgres database). Same error in our Google Cloud Composer deployment (k8s, Postgres, and Celery executor). Can provide the full pip install with the Dockerfile if needed.
What happened
The result of GCSToGCSOperator differs based on which source files exist in the source bucket. The result of GCSToGCSOperator also differs from the result of the equivalent gsutil mv command. I believe this is because GCSToGCSOperator treats moving a single object differently than moving multiple objects.
What you think should happen instead
GCSToGCSOperator should match what the gsutil mv command does.
How to reproduce
Overview
Airflow operator usage
Here is our example usage of this operator.
gsutil mv usage
Here is our example usage of the gsutil mv command.
Test 1: Expected result
Given that these files exist before running the task.
The Airflow GCSToGCSOperator task will move
gs://bucket-name/folder/nested_folder/aaaa/bbbb/cccc/12345.txt to gs://bucket-name-2/folder/nested_folder/aaaa/bbbb/cccc/12345.txt
gs://bucket-name/folder/nested_folder/aaaa/bbbb/cccc/67890.txt to gs://bucket-name-2/folder/nested_folder/aaaa/bbbb/cccc/67890.txt
This matches what the equivalent gsutil command would do.
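As an illustration only, the two destination-naming rules this report contrasts can be sketched in plain Python: keeping the path below the source prefix (the gsutil mv result) versus keeping only the file name (the flattening reported for the single-object case). This is not the operator's or gsutil's implementation. The function names and the folder/nested_folder/ prefixes are assumptions inferred from the paths in this report.

```python
import posixpath

# Assumed prefixes, inferred from the source and destination paths in this report.
SOURCE_PREFIX = "folder/nested_folder/"
DESTINATION_PREFIX = "folder/nested_folder/"


def keep_relative_path(source_object: str, source_prefix: str, destination_prefix: str) -> str:
    """Swap the prefix and keep everything below it (hypothetical helper)."""
    if not source_object.startswith(source_prefix):
        raise ValueError(f"{source_object!r} is not under {source_prefix!r}")
    return destination_prefix + source_object[len(source_prefix):]


def keep_file_name_only(source_object: str, destination_prefix: str) -> str:
    """Keep only the last path component, dropping nested folders (hypothetical helper)."""
    return destination_prefix + posixpath.basename(source_object)


obj = "folder/nested_folder/aaaa/bbbb/cccc/12345.txt"
preserved = keep_relative_path(obj, SOURCE_PREFIX, DESTINATION_PREFIX)
flattened = keep_file_name_only(obj, DESTINATION_PREFIX)
print(preserved)  # folder/nested_folder/aaaa/bbbb/cccc/12345.txt
print(flattened)  # folder/nested_folder/12345.txt
```

Under the first rule the nested aaaa/bbbb/cccc folders survive the move; under the second they are dropped, so the two rules agree only for objects that sit directly under the prefix.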
Test 2: Unexpected result
Given that these files exist before running the task.
The Airflow GCSToGCSOperator task will move
gs://bucket-name/folder/nested_folder/aaaa/bbbb/cccc/12345.txt to gs://bucket-name-2/folder/nested_folder/12345.txt
which doesn't retain the nested folder structure like the first test. This does not match what the equivalent gsutil command would do. The gsutil mv command would correctly move
gs://bucket-name/folder/nested_folder/aaaa/bbbb/cccc/12345.txt to gs://bucket-name-2/folder/nested_folder/aaaa/bbbb/cccc/12345.txt
Anything else
Here is the gcloud version output from my tests above.
> gcloud version
Google Cloud SDK 453.0.0
alpha 2023.10.27
beta 2023.10.27
bq 2.0.98
bundled-python3-unix 3.9.17
core 2023.10.27
gcloud-crc32c 1.0.0
gke-gcloud-auth-plugin 0.5.6
gsutil 5.27
Are you willing to submit PR?
Code of Conduct