Pull request source branch to cut release 0.2.1 #622
Merged
Changes from 5 commits
11 commits
ca8087a Pull request source branch to cut release 0.2.1 (touma-I)
41ca069 Updated release notes for v0.2.1 (touma-I)
b74e6b6 fix docker tag to match release (touma-I)
2599710 fix release tag for spark image (touma-I)
5c8bd5d fix tag for latest spark image (touma-I)
c8333ff Missing base image in quay.io (touma-I)
8a794cb missing base image in quay.io (touma-I)
28e73a2 use image from cache (touma-I)
df0e27b Build pckages for the release using src folder only (no test) (touma-I)
44ce92d fix base spark image tag to use .make.versions instead of hardcoded t… (daw3rd)
446b45a redoing fix after breakeage with patch added by David (touma-I)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
[project] | ||
name = "data_prep_toolkit_ray" | ||
- version = "0.2.1.dev3" | |
+ version = "0.2.1" | |
keywords = ["data", "data preprocessing", "data preparation", "llm", "generative", "ai", "fine-tuning", "llmapps" ] | ||
requires-python = ">=3.10" | ||
description = "Data Preparation Toolkit Library for Ray" | ||
|
@@ -11,7 +11,7 @@ authors = [ | |
{ name = "Boris Lublinsky", email = "[email protected]" }, | ||
] | ||
dependencies = [ | |
- "data-prep-toolkit>=0.2.1.dev3", | |
+ "data-prep-toolkit>=0.2.1", | |
"ray[default]==2.24.0", | ||
# These two are to fix security issues identified by quay.io | ||
"fastapi>=0.110.2", | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
[project] | ||
name = "data_prep_toolkit_spark" | ||
- version = "0.2.1.dev3" | |
+ version = "0.2.1" | |
keywords = ["data", "data preprocessing", "data preparation", "llm", "generative", "ai", "fine-tuning", "llmapps" ] | ||
requires-python = ">=3.10" | ||
description = "Data Preparation Toolkit Library for Spark" | ||
|
@@ -11,7 +11,7 @@ authors = [ | |
{ name = "Boris Lublinsky", email = "[email protected]" }, | ||
] | ||
dependencies = [ | |
- "data-prep-toolkit==0.2.1.dev3", | |
+ "data-prep-toolkit==0.2.1", | |
"pyspark>=3.5.2", | ||
"psutil>=6.0.0" | ||
] | ||
|
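The two library diffs above pin `data-prep-toolkit` differently: the Ray library uses `>=`, the Spark library uses `==`. The sketch below illustrates why both pins had to move from `0.2.1.dev3` to `0.2.1`. It implements only a simplified subset of PEP 440 ordering (dotted numeric releases with an optional `.devN` suffix); the helper names `version_key` and `satisfies` are hypothetical and are not part of the toolkit or of any packaging library. It also does not pad trailing zeros, so `0.2` and `0.2.0` are not treated as equal here.

```python
import re

# Simplified subset of PEP 440: "N(.N)*" optionally followed by ".devN".
_VERSION = re.compile(r"(\d+(?:\.\d+)*)(?:\.dev(\d+))?")


def version_key(version: str) -> tuple:
    """Return a sort key in which X.Y.Z.devN orders before the final X.Y.Z."""
    match = _VERSION.fullmatch(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    dev = match.group(2)
    # (0, N) marks a dev release and (1, 0) the final release, so dev sorts first.
    stage = (0, int(dev)) if dev is not None else (1, 0)
    return (release, stage)


def satisfies(candidate: str, specifier: str) -> bool:
    """Check a candidate version against a single '==' or '>=' specifier."""
    operator, wanted = specifier[:2], specifier[2:]
    if operator == "==":
        return version_key(candidate) == version_key(wanted)
    if operator == ">=":
        return version_key(candidate) >= version_key(wanted)
    raise ValueError(f"unsupported operator in specifier: {specifier!r}")


# The final release satisfies the old '>=' pin but not the old '==' pin.
print(satisfies("0.2.1", ">=0.2.1.dev3"))
print(satisfies("0.2.1", "==0.2.1.dev3"))
```

Under this ordering, the old `==0.2.1.dev3` pin can never match the published `0.2.1` package, so the exact pins must be updated at release time. The old `>=` pin would still accept `0.2.1`; bumping it only raises the minimum to the final release.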
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,41 @@ | ||
# Data Prep Kit Release notes | ||
|
||
## Release 0.2.1 - 9/24/2024 | ||
|
||
### General | ||
1. Bug fixes across the repo | ||
1. Added AI Alliance RAG demo, tutorials and notebooks and tips for running on google colab | ||
1. Added new transforms and single package for transforms published to pypi | ||
1. improved CI/CD with targeted workflow triggered on specific changes to specific modules | ||
1. New enhancements for cutting a release | ||
|
||
|
||
### data-prep-toolkit libraries (python, ray, spark) | ||
|
||
1. Restructure the repository to distinguish/separate runtime libraries | ||
1. Split data-processing-lib/ray into python and ray | ||
1. Spark runtime | ||
1. updated pyarrow version | ||
1. define required transform() method as abstract to AbstractTableTransform | ||
1. Enables configuration of makefile to use src or pypi for data-prep-kit library dependencies | ||
|
||
|
||
### KFP Workloads | ||
|
||
1. Update kfp image version | ||
1. Enable kfp in GH action for testing randomly selected workflow and prevent kfp test for transforms that do not support it | ||
1. Auto generate kfp pipelines | ||
1. Combine the common KFP support code in a shared library | ||
Review comment: No it is not a new feature.
|
||
1. Update K8s cluster deployment and remove creation of clusterrolebinding in kubeflow installation | ||
|
||
|
||
### Transforms | ||
|
||
1. Added 7 new transforms including: language identification, profiler, repo level ordering, doc quality, pdf2parquet, HTML2Parquet and PII Transform | |
1. Added ededup python implementation and incremental ededup | ||
1. Added fuzzy floating point comparison | ||
|
||
|
||
## Release 0.2.0 - 6/27/2024 | ||
|
||
### General | ||
|
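The release notes above mention defining the required `transform()` method as abstract on `AbstractTableTransform`. The sketch below shows what that change means in plain Python. It is an illustration only: `TableTransformSketch`, its constructor, and the list-of-dicts table representation are assumptions made for this example and do not reproduce the toolkit's actual class or method signature.

```python
from abc import ABC, abstractmethod
from typing import Any


class TableTransformSketch(ABC):
    """Hypothetical stand-in for a transform base class; not the toolkit's API."""

    def __init__(self, config: dict[str, Any]):
        self.config = config

    @abstractmethod
    def transform(self, rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
        """Every concrete transform must override this method."""


class UppercaseTitle(TableTransformSketch):
    """Concrete transform that overrides transform(), so it can be instantiated."""

    def transform(self, rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
        return [{**row, "title": str(row["title"]).upper()} for row in rows]


class Incomplete(TableTransformSketch):
    """Forgets to override transform(); instantiation fails with TypeError."""


print(UppercaseTitle({}).transform([{"title": "a"}]))

try:
    Incomplete({})
except TypeError as err:
    # The missing override is reported at construction time, not at call time.
    print("rejected:", err)
```

Marking the method abstract moves the failure from the first call of `transform()` to the moment the subclass is instantiated, which tends to surface a missing override earlier.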
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
[project] | ||
name = "dpk_code2parquet_transform_python" | ||
- version = "0.2.1.dev3" | |
+ version = "0.2.1" | |
requires-python = ">=3.10" | ||
description = "code2parquet Python Transform" | ||
license = {text = "Apache-2.0"} | ||
|
@@ -10,7 +10,7 @@ authors = [ | |
{ name = "Boris Lublinsky", email = "[email protected]" }, | ||
] | ||
dependencies = [ | |
- "data-prep-toolkit==0.2.1.dev3", | |
+ "data-prep-toolkit==0.2.1", | |
"parameterized", | ||
"pandas", | ||
] | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
[project] | ||
name = "dpk_code2parquet_transform_ray" | ||
- version = "0.2.1.dev3" | |
+ version = "0.2.1" | |
requires-python = ">=3.10" | ||
description = "code2parquet Ray Transform" | ||
license = {text = "Apache-2.0"} | ||
|
@@ -10,8 +10,8 @@ authors = [ | |
{ name = "Boris Lublinsky", email = "[email protected]" }, | ||
] | ||
dependencies = [ | |
- "data-prep-toolkit-ray==0.2.1.dev3", | |
- "dpk-code2parquet-transform-python==0.2.1.dev3", | |
+ "data-prep-toolkit-ray==0.2.1", | |
+ "dpk-code2parquet-transform-python==0.2.1", | |
"parameterized", | ||
"pandas", | ||
] | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
[project] | ||
name = "dpk_code_quality_transform_python" | ||
- version = "0.2.1.dev3" | |
+ version = "0.2.1" | |
requires-python = ">=3.10" | ||
description = "Code Quality Python Transform" | ||
license = {text = "Apache-2.0"} | ||
|
@@ -9,7 +9,7 @@ authors = [ | |
{ name = "Shivdeep Singh", email = "[email protected]" }, | ||
] | ||
dependencies = [ | |
- "data-prep-toolkit==0.2.1.dev3", | |
+ "data-prep-toolkit==0.2.1", | |
"bs4==0.0.2", | ||
"transformers==4.38.2", | ||
] | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
[project] | ||
name = "dpk_code_quality_transform_ray" | ||
- version = "0.2.1.dev3" | |
+ version = "0.2.1" | |
requires-python = ">=3.10" | ||
description = "Code Quality Ray Transform" | ||
license = {text = "Apache-2.0"} | ||
|
@@ -9,8 +9,8 @@ authors = [ | |
{ name = "Shivdeep Singh", email = "[email protected]" }, | ||
] | ||
dependencies = [ | |
- "dpk-code-quality-transform-python==0.2.1.dev3", | |
- "data-prep-toolkit-ray==0.2.1.dev3", | |
+ "dpk-code-quality-transform-python==0.2.1", | |
+ "data-prep-toolkit-ray==0.2.1", | |
] | ||
|
||
[build-system] | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
[project] | ||
name = "dpk_header_cleanser_transform_python" | ||
- version = "0.2.1.dev3" | |
+ version = "0.2.1" | |
requires-python = ">=3.10" | ||
description = "License and Copyright Removal Transform for Python" | ||
license = {text = "Apache-2.0"} | ||
|
@@ -9,7 +9,7 @@ authors = [ | |
{ name = "Yash kalathiya", email = "[email protected]" }, | ||
] | ||
dependencies = [ | |
- "data-prep-toolkit==0.2.1.dev3", | |
+ "data-prep-toolkit==0.2.1", | |
"scancode-toolkit==32.1.0", | ||
] | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
[project] | ||
name = "dpk_header_cleanser_transform_ray" | ||
- version = "0.2.1.dev3" | |
+ version = "0.2.1" | |
requires-python = ">=3.10" | ||
description = "License and copyright removal Transform for Ray" | ||
license = {text = "Apache-2.0"} | ||
|
@@ -9,8 +9,8 @@ authors = [ | |
{ name = "Yash kalathiya", email = "[email protected]" }, | ||
] | ||
dependencies = [ | |
- "dpk-header-cleanser-transform-python==0.2.1.dev3", | |
- "data-prep-toolkit-ray==0.2.1.dev3", | |
+ "dpk-header-cleanser-transform-python==0.2.1", | |
+ "data-prep-toolkit-ray==0.2.1", | |
"scancode-toolkit==32.1.0", | ||
] | ||
|
||
|
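Every `pyproject.toml` in this diff receives the same mechanical change: the `.dev3` suffix is dropped from the package's own version and from its pins on other toolkit packages. A minimal sketch of how such a bump could be scripted follows. The function `cut_release` and its regular expressions are hypothetical; the repository's own release tooling (the commit list refers to make-based versioning via `.make.versions`) is not shown in this diff and may work quite differently. The sketch assumes toolkit packages can be recognised by the name prefixes `data-prep-toolkit` and `dpk-`, as they are in the files above.

```python
import re

# Matches the project's own version line, e.g. version = "0.2.1.dev3".
_OWN_VERSION = re.compile(r'^(version\s*=\s*"\d+(?:\.\d+)*)\.dev\d+(")', re.MULTILINE)

# Matches pins on toolkit packages only (assumed prefixes), e.g.
# "data-prep-toolkit-ray==0.2.1.dev3" or "data-prep-toolkit>=0.2.1.dev3".
_DPK_PIN = re.compile(
    r'("(?:data-prep-toolkit|dpk-)[\w-]*(?:==|>=)\d+(?:\.\d+)*)\.dev\d+(")'
)


def cut_release(pyproject_text: str) -> str:
    """Drop the .devN suffix from the own version and from toolkit pins."""
    text = _OWN_VERSION.sub(r"\1\2", pyproject_text)
    return _DPK_PIN.sub(r"\1\2", text)


SAMPLE = '''[project]
name = "dpk_code2parquet_transform_ray"
version = "0.2.1.dev3"
dependencies = [
    "data-prep-toolkit-ray==0.2.1.dev3",
    "dpk-code2parquet-transform-python==0.2.1.dev3",
    "parameterized",
]
'''

print(cut_release(SAMPLE))
```

Restricting the second pattern to the toolkit's own name prefixes leaves third-party pins untouched, even if one of them happens to reference a dev release.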
Review comment: Some of the bullets start with upper case, while others with lower.