Library for ingesting files into Google Cloud Storage.
This library was created for cases where you need to generate a large number of files to test capabilities and integrations that rely on GCS. If you only need to sync files, gsutil is the better choice.
The name of the folder used for ingestion is converted into the bucket_name, and the ingestor traverses all of its child directories. You can either ingest the whole folder or ingest a random number of files using the folder as the source. If you pass a number of files, the ingestor picks files at random, and the file ingested in each iteration gets __{index} appended to the end of its name.
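The selection behavior described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the library's actual implementation. The function names (`list_files`, `bucket_name_for`, `plan_full_ingestion`, `plan_random_ingestion`) are hypothetical, and three details are assumptions: that the bucket name is the last component of the folder path, that files are picked with replacement, and that `__{index}` is appended after the full object name, extension included.

```python
import os
import random


def list_files(src_dir):
    """Return every file under src_dir, walking all child directories."""
    paths = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            paths.append(os.path.join(root, name))
    return sorted(paths)


def bucket_name_for(src_dir):
    """Assumption: the bucket name is the last component of the folder path."""
    return os.path.basename(os.path.normpath(src_dir))


def object_name_for(path, src_dir):
    """Object name is the file path relative to src_dir, using '/' separators."""
    return os.path.relpath(path, src_dir).replace(os.sep, "/")


def plan_full_ingestion(src_dir):
    """Map every local file to the object name it would be uploaded as."""
    return [(path, object_name_for(path, src_dir)) for path in list_files(src_dir)]


def plan_random_ingestion(src_dir, number_files, seed=None):
    """Pick number_files files at random (assumed: with replacement) and
    suffix each object name with __{index} so repeated picks do not collide."""
    rng = random.Random(seed)
    candidates = list_files(src_dir)
    if not candidates:
        return []
    plan = []
    for index in range(number_files):
        path = rng.choice(candidates)
        plan.append((path, f"{object_name_for(path, src_dir)}__{index}"))
    return plan
```

Because the sketch only builds an upload plan, it can be inspected without touching GCS; the actual upload step is left out on purpose.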
The files in the sample folder are taken from BigQuery public datasets.
git clone https://.../gcs-file-ingestor.git
cd gcs-file-ingestor
- Google Cloud Storage Editor
<YOUR-CREDENTIALS_FILES_FOLDER>/gcs-file-ingestor-credentials.json
Note that this folder and file are required in the next steps.
Using virtualenv is optional, but strongly recommended unless you use Docker or a PEX file.
pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
pip install --editable .
Replace the values below according to your environment:
export GOOGLE_APPLICATION_CREDENTIALS=credentials_file_path
export PROJECT_ID=google_cloud_project_id
export SOURCE_FILE_LOCATION=./local_bucket_sync
export GCS_FILES_NUMBER=1000
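Before running the script, it can help to confirm that these variables point at real files and folders. The following is a small, hypothetical preflight check and is not part of the library; the function name `check_environment` and the specific rules it applies are assumptions made for illustration.

```python
import os

# Variables the commands in this README expect to be exported.
REQUIRED_VARS = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "PROJECT_ID",
    "SOURCE_FILE_LOCATION",
)


def check_environment(env=os.environ):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")

    credentials = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if credentials and not os.path.isfile(credentials):
        problems.append(f"credentials file not found: {credentials}")

    src_dir = env.get("SOURCE_FILE_LOCATION")
    if src_dir and not os.path.isdir(src_dir):
        problems.append(f"source folder not found: {src_dir}")

    # GCS_FILES_NUMBER is only needed for the random-files command,
    # so it is validated only when present.
    number = env.get("GCS_FILES_NUMBER")
    if number and not number.isdigit():
        problems.append(f"GCS_FILES_NUMBER is not a whole number: {number}")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```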
See instructions below.
- Virtualenv
python main.py --project-id $PROJECT_ID --src-dir=$SOURCE_FILE_LOCATION create-files-from-dir
- Virtualenv
python main.py --project-id $PROJECT_ID --src-dir=$SOURCE_FILE_LOCATION create-random-number-of-files-from-dir --number-files $GCS_FILES_NUMBER
This script uses a label on the GCS buckets it creates to identify which buckets need to be deleted.
- Virtualenv
python main.py --project-id $PROJECT_ID clean-up-buckets
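The label-based clean-up idea can be sketched as below, assuming the `google-cloud-storage` client library. This is an illustration rather than the script's real code: the function names are hypothetical, and the label key `created_by` and value `gcs-file-ingestor` are placeholders, since the actual label the script applies is not stated here.

```python
def buckets_to_delete(buckets, label_key, label_value):
    """Return only the buckets that carry the label label_key=label_value."""
    return [
        bucket
        for bucket in buckets
        if (bucket.labels or {}).get(label_key) == label_value
    ]


def clean_up_buckets(
    project_id,
    label_key="created_by",           # hypothetical label key
    label_value="gcs-file-ingestor",  # hypothetical label value
):
    """Delete every labeled bucket in the project, objects included."""
    # Imported lazily so the filtering helper can be used without the library.
    from google.cloud import storage

    client = storage.Client(project=project_id)
    for bucket in buckets_to_delete(client.list_buckets(), label_key, label_value):
        # A bucket must be empty before it can be deleted, so remove
        # its objects one by one first.
        for blob in client.list_blobs(bucket):
            blob.delete()
        bucket.delete()
```

Filtering on a label rather than on a name pattern means buckets that were not created by the ingestor are left alone, even if their names look similar.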
docker build -t gcs-file-creator .
docker run --rm --tty \
-v CREDENTIALS_FILES_FOLDER:/data \
-v YOUR-INGESTION_DIR:/ingestion-dir \
gcs-file-creator \
--project-id $PROJECT_ID \
--src-dir=/ingestion-dir \
create-files-from-dir