Merge pull request #951 from IBM/noop-refactor
refactor noop transform to use dpk_ structures
touma-I authored Jan 23, 2025
2 parents 8ea7763 + cea4ea5 commit 3ca9926
Showing 101 changed files with 633 additions and 1,930 deletions.
306 changes: 0 additions & 306 deletions data-processing-lib/doc/advanced-transform-tutorial.md

This file was deleted.

2 changes: 1 addition & 1 deletion data-processing-lib/doc/data-access-factory.md
@@ -9,7 +9,7 @@ the processing of input data files and the expected destination
of the processed files.
The `DataAccessFactory` is most often configured using command line arguments
to specify the type of `DataAccess` instance to create
(see `--data_*` options [here](python-launcher-options.md)).
(see `--data_*` options [here](launcher-options.md)).
Currently, it supports
[DataAccessLocal](../python/src/data_processing/data_access/data_access_local.py)
and
@@ -1,24 +1,16 @@
# Ray Launcher Command Line Options
A number of command line options are available when launching a transform.
# Runtime Command Line Options

The following is a current --help output (a work in progress) for
the `NOOPTransform` (note the --noop_sleep_sec and --noop_pwd options):
A number of command line options are available when launching a transform.
* Transform options defined by the specific transform
* Runtime/launcher independent options, primarily for identifying data sources and destinations.
* Runtime-specific options for controlling aspects of the individual runtime.

The runtime options are discussed below (see the specific transform, or run it with `--help`,
to determine the transform options).
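
The three option groups listed above can be told apart by name prefix in the help output shown on this page: `--data_*` for data access and `--runtime_*` for the runtime, with transform options carrying the transform's own prefix (for example `--noop_*`). The following is a minimal sketch of that convention using plain `argparse`. The `build_parser` and `group_by_prefix` helpers and all default values are hypothetical and are not part of the library.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a tiny parser with one option per group (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Illustrative launcher sketch")
    # Transform-specific option (name taken from the NOOP transform help output).
    parser.add_argument("--noop_sleep_sec", type=int, default=1)
    # Runtime-independent data option.
    parser.add_argument("--data_max_files", type=int, default=-1)
    # Runtime-specific option.
    parser.add_argument("--runtime_pipeline_id", type=str, default="pipeline_id")
    return parser


def group_by_prefix(args: argparse.Namespace) -> dict[str, dict[str, object]]:
    """Split parsed options into data, runtime, and transform groups by prefix."""
    groups: dict[str, dict[str, object]] = {"data": {}, "runtime": {}, "transform": {}}
    for name, value in vars(args).items():
        if name.startswith("data_"):
            groups["data"][name] = value
        elif name.startswith("runtime_"):
            groups["runtime"][name] = value
        else:
            # Assumption: anything without a data_/runtime_ prefix is a transform option.
            groups["transform"][name] = value
    return groups


args = build_parser().parse_args(["--noop_sleep_sec", "5", "--data_max_files", "10"])
groups = group_by_prefix(args)
```

Here `groups["transform"]` holds only the `noop_sleep_sec` value, while the data and runtime options land in their own dictionaries.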

## Runtime-independent Launcher CLI Arguments
The following command line launcher options are available to all runtimes.
```
usage: noop_transform.py [-h] [--run_locally RUN_LOCALLY] [--noop_sleep_sec NOOP_SLEEP_SEC] [--noop_pwd NOOP_PWD] [--data_s3_cred DATA_S3_CRED] [--data_s3_config DATA_S3_CONFIG] [--data_local_config DATA_LOCAL_CONFIG]
[--data_max_files DATA_MAX_FILES] [--data_checkpointing DATA_CHECKPOINTING] [--data_data_sets DATA_DATA_SETS] [--data_files_to_use DATA_FILES_TO_USE] [--data_num_samples DATA_NUM_SAMPLES]
[--runtime_num_workers RUNTIME_NUM_WORKERS] [--runtime_worker_options RUNTIME_WORKER_OPTIONS] [--runtime_creation_delay RUNTIME_CREATION_DELAY] [--runtime_pipeline_id RUNTIME_PIPELINE_ID]
[--runtime_job_id RUNTIME_JOB_ID] [--runtime_code_location RUNTIME_CODE_LOCATION]
Driver for noop processing
options:
-h, --help show this help message and exit
--run_locally RUN_LOCALLY
running ray local flag
--noop_sleep_sec NOOP_SLEEP_SEC
Sleep actor for a number of seconds while processing the data frame, before writing the file to COS
--noop_pwd NOOP_PWD A dummy password which should be filtered out of the metadata
--data_s3_cred DATA_S3_CRED
AST string of options for s3 credentials. Only required for S3 data access.
access_key: access key help text
@@ -49,6 +41,29 @@ options:
list of file extensions to choose for input.
--data_num_samples DATA_NUM_SAMPLES
number of random input files to process
```
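
The last two data options above describe filtering input by file extension and processing a random subset of files. The sketch below shows one way such a selection could work. It is an illustration only: the `select_files` helper, its parameters, and the convention that a non-positive `num_samples` means "use all files" are assumptions, not the library's implementation.

```python
import random
from typing import Optional, Sequence


def select_files(
    files: Sequence[str],
    extensions: Optional[Sequence[str]] = None,
    num_samples: int = -1,
    seed: Optional[int] = None,
) -> list[str]:
    """Keep files with a matching extension, then optionally draw a random sample."""
    # Filter by extension; None means no filtering (assumption).
    selected = [
        f for f in files
        if extensions is None or any(f.endswith(ext) for ext in extensions)
    ]
    # Sample only when a positive count smaller than the candidate list is requested.
    if 0 < num_samples < len(selected):
        selected = random.Random(seed).sample(selected, num_samples)
    return selected


files = ["a.parquet", "b.txt", "c.parquet"]
parquet_only = select_files(files, extensions=[".parquet"])
sampled = select_files(files, num_samples=2, seed=0)
```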

## Python Launcher CLI Arguments
The following command line launcher options are available for the Python runtime.
```
--runtime_num_processors RUNTIME_NUM_PROCESSORS
size of multiprocessing pool
--runtime_pipeline_id RUNTIME_PIPELINE_ID
pipeline id
--runtime_job_id RUNTIME_JOB_ID
job id
--runtime_code_location RUNTIME_CODE_LOCATION
AST string containing code location
github: Github repository URL.
commit_hash: github commit hash
path: Path within the repository
Example: { 'github': 'https://github.com/somerepo', 'commit_hash': '1324',
'path': 'transforms/universal/code' }
```
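
Several options above take an "AST string". Assuming this means a Python literal in text form, the standard library's `ast.literal_eval` can turn the `--runtime_code_location` example into a dictionary. The `parse_code_location` helper and its key check are hypothetical and only illustrate the format.

```python
import ast

# The example value from the help text, as it would arrive from the command line.
raw = (
    "{ 'github': 'https://github.com/somerepo', 'commit_hash': '1324', "
    "'path': 'transforms/universal/code' }"
)


def parse_code_location(value: str) -> dict:
    """Parse a code-location literal and check the keys named in the help text."""
    parsed = ast.literal_eval(value)  # evaluates literals only, never arbitrary code
    if not isinstance(parsed, dict):
        raise ValueError("code location must be a dictionary literal")
    missing = {"github", "commit_hash", "path"} - parsed.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return parsed


code_location = parse_code_location(raw)
```

When passing such a value in a shell, the whole literal usually needs to be quoted so that it reaches the program as a single argument.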
## Ray Launcher CLI Arguments
The following command line launcher options are available for the Ray runtime.
```
--runtime_num_workers RUNTIME_NUM_WORKERS
number of workers
--runtime_worker_options RUNTIME_WORKER_OPTIONS
@@ -77,3 +92,18 @@ options:
Example: { 'github': 'https://github.com/somerepo', 'commit_hash': '1324',
'path': 'transforms/universal/code' }
```
## Spark Launcher CLI Arguments
The following command line launcher options are available for the Spark runtime.
```
--runtime_pipeline_id RUNTIME_PIPELINE_ID
pipeline id
--runtime_job_id RUNTIME_JOB_ID
job id
--runtime_code_location RUNTIME_CODE_LOCATION
AST string containing code location
github: Github repository URL.
commit_hash: github commit hash
path: Path within the repository
Example: { 'github': 'https://github.com/somerepo', 'commit_hash': '1324',
'path': 'transforms/universal/code' }
```