Skip to content

Commit

Permalink
fix typos
Browse files Browse the repository at this point in the history
Signed-off-by: Maroun Touma <[email protected]>
  • Loading branch information
touma-I committed Nov 15, 2024
1 parent 8e71177 commit ef7c57d
Show file tree
Hide file tree
Showing 4 changed files with 22 additions and 15 deletions.
1 change: 1 addition & 0 deletions transforms/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,7 @@ build-pkg-dist:
fi \
done
# Only needs to build the whl
git show --no-patch > src/data/gitshow.txt
$(MAKE) BUILD_WHEEL_EXTRA_ARG=-w .defaults.build-dist
-rm -fr src

Expand Down
23 changes: 12 additions & 11 deletions transforms/README-list.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,18 +23,19 @@ Note: This list includes the transforms that were part of the release starting w
* [code_quality](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/code_quality/python/README.md)
* [proglang_select](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/proglang_select/python/README.md)
* language
* [doc_chunk](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/doc_chunk/python/README.md)
* [doc_quality](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/doc_quality/python/README.md)
* [lang_id](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/lang_id/python/README.md)
* [pdf2parquet](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/pdf2parquet/python/README.md)
* [text_encoder](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/text_encoder/python/README.md)
* [pii_redactor](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/pii_redactor/python/README.md)
* [doc_chunk](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/doc_chunk/python/README.md)
* [doc_quality](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/doc_quality/python/README.md)
* [lang_id](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/lang_id/python/README.md)
* [pdf2parquet](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/pdf2parquet/python/README.md)
* [text_encoder](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/text_encoder/python/README.md)
* [pii_redactor](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/pii_redactor/python/README.md)
* universal
* [ededup](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/ededup/python/README.md)
* [filter](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/filter/python/README.md)
* [resize](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/resize/python/README.md)
* [tokenization](https://github.com/IBM/data-prep-kit/blob/dev/transforms/tokenization/doc_chunk/python/README.md)
* [doc_id](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/doc_id/python/README.md)
* [ededup](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/ededup/python/README.md)
* [filter](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/filter/python/README.md)
* [resize](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/resize/python/README.md)
* [tokenization](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/tokenization/python/README.md)
* [doc_id](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/doc_id/python/README.md)
* [web2prquet](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/web2parquet/README.md)



Expand Down
5 changes: 5 additions & 0 deletions transforms/pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,11 @@ resize = { file = ["universal/resize/python/requirements.txt"]}
# Does not seem to work for our custom layout
# copy all files to a single src and let automatic discovery find them

[tool.setuptools.package-data]
"*" = ["*.txt"]

[tool.setuptools.packages.find]
where = ["src"]

#[tool.setuptools.package-dir]
#dpk_web2parquet = "universal/web2parquet/dpk_web2parquet"
Expand Down
8 changes: 4 additions & 4 deletions transforms/universal/web2parquet/README.md
Original file line number Diff line number Diff line change
@@ -1,13 +1,13 @@
# Web Crawler to Parquet

This tranforms crawls the web and download files in real-time.
This tranform crawls the web and downloads files in real-time.

This first release of the transform, only accept the following 4 parameters. Additional releases will extend the functionality to allow the user to specify additional constraints such as mime-type, domain-focus, etc.
This first release of the transform, only accepts the following 4 parameters. Additional releases will extend the functionality to allow the user to specify additional constraints such as mime-type, domain-focus, etc.


## parameters
## Parameters

For configuring the crawl, users need to identify the follow paramters:
For configuring the crawl, users need to identify the follow parameters:

| parameter:type | Description |
| --- | --- |
Expand Down

0 comments on commit ef7c57d

Please sign in to comment.