From c02f974524adf64b72086e37c92c051e26afdec0 Mon Sep 17 00:00:00 2001
From: morazow
Date: Fri, 7 Jul 2023 10:35:19 +0200
Subject: [PATCH 1/2] Updated developer guide

Fixes #162

---
 .gitignore                         |  3 +++
 doc/development/developer_guide.md | 18 ++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/.gitignore b/.gitignore
index 46b8a441..051f2f4e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -58,6 +58,9 @@ tmp
 .project
 .scala_dependencies
 *.sc
+**/.settings/org.eclipse.core.resources.prefs
+**/.settings/org.eclipse.jdt.apt.core.prefs
+**/.settings/org.eclipse.m2e.core.prefs

 # Ensime
 .ensime
diff --git a/doc/development/developer_guide.md b/doc/development/developer_guide.md
index 41fa3411..b9ba50f9 100644
--- a/doc/development/developer_guide.md
+++ b/doc/development/developer_guide.md
@@ -14,6 +14,24 @@ userProvidedS3Bucket/

 The generated intermediate write path `-//` is validated to be empty before the write, and it is cleaned up after the write query finishes.

+## S3 Staging Commit Process
+
+The Spark job that writes data to Exasol uses an AWS S3 bucket as intermediate storage. In this process, the `ExasolS3Table` API implementation uses the Spark [`CSVTable`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala) writer to create files in S3.
+
+The write process proceeds as follows:
+
+1. We ask Spark's `CSVTable` to commit the data into the S3 bucket
+1. We import this data into the Exasol database using Exasol's `CSV` loader
+1. Finally, we ask our `ExasolS3Table` API implementation to commit the write process
+
+If any failure occurs, each step triggers the `abort` method and the S3 bucket locations are cleaned up. If the job finishes successfully, the Spark job end listener triggers the cleanup process.
+
+## S3 Maximum Number of Files
+
+For write Spark jobs, we allow a maximum of `1000` CSV files to be written as intermediate data into the S3 bucket. The main reason for this is that the S3 SDK `listObjects` command returns up to 1000 objects from a bucket path per request.
+
+Even though we could improve this by listing more objects from the S3 bucket with multiple requests, we keep this threshold for now.
+
 ## Integration Tests

 The integration tests are run using [Docker](https://www.docker.com) and [exasol-testcontainers](https://github.com/exasol/exasol-testcontainers/)

From e52d2410c3b1a835e47cfdbd017ca8acac1b2b6d Mon Sep 17 00:00:00 2001
From: Muhammet Orazov <916295+morazow@users.noreply.github.com>
Date: Fri, 7 Jul 2023 11:34:18 +0200
Subject: [PATCH 2/2] Apply suggestions from code review

Co-authored-by: Christoph Pirkl

---
 doc/development/developer_guide.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/development/developer_guide.md b/doc/development/developer_guide.md
index b9ba50f9..b741c521 100644
--- a/doc/development/developer_guide.md
+++ b/doc/development/developer_guide.md
@@ -16,7 +16,7 @@ The generated intermediate write path `-/
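
The three-step staging commit described in the patch above can be pictured with Spark's DataSource V2 write hooks, where `commit` and `abort` on a `BatchWrite` drive the flow. Below is a minimal, illustrative sketch of such a delegating write; the class and helper names (`DelegatingS3BatchWrite`, `importIntoExasol`, `cleanUpS3Path`) are assumptions for illustration, not the connector's actual implementation.

```java
import org.apache.spark.sql.connector.write.BatchWrite;
import org.apache.spark.sql.connector.write.DataWriterFactory;
import org.apache.spark.sql.connector.write.PhysicalWriteInfo;
import org.apache.spark.sql.connector.write.WriterCommitMessage;

// Sketch only: delegates the CSV write to Spark and layers the Exasol import
// and S3 cleanup on top of it. Names are hypothetical.
final class DelegatingS3BatchWrite implements BatchWrite {
    private final BatchWrite csvBatchWrite; // Spark's CSV writer targeting the S3 staging path

    DelegatingS3BatchWrite(final BatchWrite csvBatchWrite) {
        this.csvBatchWrite = csvBatchWrite;
    }

    @Override
    public DataWriterFactory createBatchWriterFactory(final PhysicalWriteInfo info) {
        return this.csvBatchWrite.createBatchWriterFactory(info);
    }

    @Override
    public void commit(final WriterCommitMessage[] messages) {
        this.csvBatchWrite.commit(messages); // step 1: CSV files become visible in the S3 bucket
        importIntoExasol();                  // step 2: run Exasol's CSV import from the staging path
        // step 3: the surrounding write commits; a job-end listener later removes the staging files
    }

    @Override
    public void abort(final WriterCommitMessage[] messages) {
        this.csvBatchWrite.abort(messages);  // undo the CSV write on failure
        cleanUpS3Path();                     // remove any staged files from the bucket
    }

    private void importIntoExasol() {
        // hypothetical helper: issue the Exasol CSV import over JDBC
    }

    private void cleanUpS3Path() {
        // hypothetical helper: delete objects under the intermediate write prefix
    }
}
```

Ordering the Exasol import before the final commit keeps a failed import on the abort path, so the staging prefix is cleaned up either way.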
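
The `1000`-file threshold mentioned in the patch mirrors the per-request listing limit of S3: a single list call returns at most 1000 keys. The following rough sketch shows how more objects could be listed with multiple requests using the AWS SDK for Java v2 paginator; the guide itself refers to the SDK's `listObjects` call, and the class and method names here (`StagingFileLister`, `listStagedKeys`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.S3Object;

// Sketch only: lists every staged CSV object under the intermediate write prefix.
public final class StagingFileLister {

    static List<String> listStagedKeys(final S3Client s3, final String bucket, final String prefix) {
        final ListObjectsV2Request request = ListObjectsV2Request.builder()
                .bucket(bucket)
                .prefix(prefix)
                .build();
        final List<String> keys = new ArrayList<>();
        // A single ListObjectsV2 call returns at most 1000 keys; the paginator
        // transparently follows continuation tokens when more objects exist.
        for (final S3Object object : s3.listObjectsV2Paginator(request).contents()) {
            keys.add(object.key());
        }
        return keys;
    }
}
```

The paginator only issues follow-up requests when a listing is truncated, so behaviour for up to 1000 staged files stays a single request.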