Adding Resize Spark #630
@@ -0,0 +1,44 @@
ARG BASE_IMAGE=quay.io/dataprep1/data-prep-kit/data-prep-kit-spark-3.5.2:0.2.1.dev0
0.2.1.dev0 is no longer used in dev. To be consistent with the other Spark transforms (a recent change), use latest as the tag. Note, however, that this is generally overridden from the Makefile anyway, by setting BASE_IMAGE when docker build is called. Still, for consistency it would be nice to change it to latest.
done
# set the version of python transform that this depends on.
set-versions:
	$(MAKE) TRANSFORM_PYTHON_VERSION=${NOOP_PYTHON_VERSION} TOML_VERSION=$(NOOP_SPARK_VERSION) .transforms.set-versions
NOOP -> RESIZE?
Done
The set of dictionary keys holding [BlockListTransform](src/blocklist_transform.py)
configuration for values are as follows:

* _max_rows_per_table_ - specifies max documents per table
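To make the _max_rows_per_table_ bound concrete, here is a minimal sketch that assumes a table is just a list of rows, one document per row. The helper name `split_by_max_rows` is hypothetical; it only illustrates the upper-bound semantics and is not the resize transform's actual Spark implementation, which works on its own table types.

```python
from typing import List, Sequence, TypeVar

Row = TypeVar("Row")


def split_by_max_rows(rows: Sequence[Row], max_rows_per_table: int) -> List[List[Row]]:
    """Hypothetical helper: chunk rows into tables of at most max_rows_per_table rows.

    Illustrates only the upper bound implied by _max_rows_per_table_;
    it is not the resize transform's implementation.
    """
    if max_rows_per_table <= 0:
        raise ValueError("max_rows_per_table must be positive")
    # Walk the input in steps of max_rows_per_table; the last chunk may be shorter.
    return [
        list(rows[start:start + max_rows_per_table])
        for start in range(0, len(rows), max_rows_per_table)
    ]


if __name__ == "__main__":
    docs = [f"doc-{i}" for i in range(7)]  # seven one-document rows
    tables = split_by_max_rows(docs, max_rows_per_table=3)
    print([len(t) for t in tables])  # [3, 3, 1]
```

Every output table respects the limit, and only the final one can be smaller than the maximum.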
To better future-proof this file, shouldn't it defer to the python readme for configuration and CLI?
Why are these changes needed?
Adding additional transforms to the Spark pipeline.
Related issue number (if any).
#586