Adding Resize Spark #630

blublinsky · 2024-09-26T09:15:45Z

Why are these changes needed?

Adding additional transforms to Spark pipeline

Related issue number (if any).

daw3rd · 2024-09-26T13:06:02Z

transforms/universal/resize/spark/Dockerfile

@@ -0,0 +1,44 @@
+ARG BASE_IMAGE=quay.io/dataprep1/data-prep-kit/data-prep-kit-spark-3.5.2:0.2.1.dev0


The 0.2.1.dev0 tag is no longer used in dev. To be consistent with the other Spark transforms (a recent change), use latest as the tag. Note, however, that this default is generally overridden from the Makefile anyway, which sets BASE_IMAGE when docker build is called. Still, for consistency it would be nice to change it to latest.
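
A minimal sketch of what the suggested change might look like, assuming only the tag on the existing ARG line changes and the rest of the Dockerfile stays as submitted:

```dockerfile
# Default base image; per the comment above, the Makefile normally overrides
# BASE_IMAGE when it invokes docker build, so this tag is only a fallback.
ARG BASE_IMAGE=quay.io/dataprep1/data-prep-kit/data-prep-kit-spark-3.5.2:latest
```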

daw3rd · 2024-09-26T13:06:49Z

transforms/universal/resize/spark/Makefile

+
+# set the version of python transform that this depends on.
+set-versions: 
+	$(MAKE) TRANSFORM_PYTHON_VERSION=${NOOP_PYTHON_VERSION} TOML_VERSION=$(NOOP_SPARK_VERSION) .transforms.set-versions 


NOOP -> RESIZE?

daw3rd · 2024-09-26T13:08:06Z

transforms/universal/resize/spark/README.md

+The set of dictionary keys holding [BlockListTransform](src/blocklist_transform.py)
+configuration for values are as follows:
+
+* _max_rows_per_table_ - specifies max documents per table


To better future-proof this file, shouldn't it defer to the Python README for configuration and CLI options?

documentation update

d5ef3d3

blublinsky requested a review from daw3rd September 26, 2024 09:15

daw3rd requested changes Sep 26, 2024

blublinsky added 3 commits September 26, 2024 16:04

addressed comments

6fe6b57

addressed comments

a83e7a2

addressed comments

6b70c98

daw3rd approved these changes Sep 27, 2024

daw3rd merged commit 49ebd51 into dev Sep 27, 2024
6 checks passed
