Merge pull request #82 from anandhu-eng/docs-update
Add MLCFlow script execution flow
Showing 3 changed files with 159 additions and 0 deletions.
@@ -0,0 +1,85 @@
# MLC "script" execution flow

## Understanding MLC scripts

* An MLC script is identified by a set of tags and by a unique ID.
* Each MLC script can also have multiple variations. Variations are identified by variation tags, which are treated in the same way as ordinary tags but carry a `_` prefix.
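
As a minimal sketch of this convention, the hypothetical helper `split_tags` below (not an MLCFlow API) splits a comma-separated tag string into ordinary tags and variation tags by looking for the `_` prefix:

```python
def split_tags(tag_string: str) -> tuple[list[str], list[str]]:
    """Split a comma-separated tag string into (tags, variation_tags).

    Variation tags are recognised by their leading "_"; the prefix is
    dropped so that only the variation name remains.
    """
    tags: list[str] = []
    variations: list[str] = []
    for raw in tag_string.split(","):
        tag = raw.strip()
        if not tag:
            continue  # tolerate empty entries such as a trailing comma
        if tag.startswith("_"):
            variations.append(tag[1:])
        else:
            tags.append(tag)
    return tags, variations


print(split_tags("get,myscript,_test"))  # (['get', 'myscript'], ['test'])
```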

### MLC script execution flow
```mermaid
graph TD
    MLC -->|env = incoming env + env_from_meta| B[Script]
    B -->|env - local_env_keys| C[List of Dependencies]
    C --> D[Preprocess]
    D -->|env - local_env_keys| E[Prehook dependencies]
    E --> F[Run script]
    F -->|env - clean_env_keys_post_deps| G[Posthook dependencies]
    G --> H[Postprocess]
    H -->|env - clean_env_keys_post_deps| I[Post dependencies]
    I -->|"env(new_env_keys)"| J[Script return]
```

* When an MLC script is invoked (either by tags or by its unique ID), its `meta.yaml` is processed first. If it lists any `deps`, those dependency scripts are executed in order.
* Once all the `deps` scripts have run, the script's `customize.py` file is checked and, if it defines a `preprocess` function, that function is executed.
* Next, any `prehook_deps` scripts listed in `meta.yaml` are executed in the same way as `deps`.
* After this, the keys in the `env` dictionary are exported as environment variables and the `run` file, if it exists, is executed.
* Once the `run` file has finished, any `posthook_deps` scripts listed in `meta.yaml` are executed in the same way as `deps`.
* Then the `postprocess` function in `customize.py` is executed, if present.
* Finally, any `post_deps` scripts listed in `meta.yaml` are executed.
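
The order described above can be modelled with a small Python sketch. `run_script`, the shape of the `meta` dictionary and the `customize` mapping are hypothetical stand-ins, not MLCFlow's implementation: dependencies are only recorded by their tags instead of being run recursively, and the env handling shown in the diagram is omitted.

```python
from typing import Callable


def run_script(meta: dict, customize: dict[str, Callable[[], None]],
               has_run_file: bool) -> list[str]:
    """Return the order in which the stages of one invocation happen.

    `meta` mimics the dependency lists of meta.yaml and `customize` maps the
    optional hook names "preprocess"/"postprocess" to callables. Real
    dependencies would themselves be full script runs; here they are only
    recorded by their tags.
    """
    trace: list[str] = []

    def run_deps(key: str) -> None:
        for dep in meta.get(key, []):
            trace.append(f"{key}: {dep['tags']}")

    def run_hook(name: str) -> None:
        if name in customize:
            customize[name]()
            trace.append(name)

    run_deps("deps")
    run_hook("preprocess")
    run_deps("prehook_deps")
    if has_run_file:
        trace.append("run")  # env keys would be exported just before this
    run_deps("posthook_deps")
    run_hook("postprocess")
    run_deps("post_deps")
    return trace


meta = {
    "deps": [{"tags": "get,dep-a"}],
    "prehook_deps": [{"tags": "get,dep-b"}],
    "post_deps": [{"tags": "get,dep-c"}],
}
print(run_script(meta, {"preprocess": lambda: None}, has_run_file=True))
```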

*If a script is already cached, the `preprocess`, `run` file and `postprocess` steps are skipped; only the dependencies marked as `dynamic` in `deps`, `prehook_deps`, `posthook_deps` and `post_deps` are executed.*
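
A sketch of the cached path, assuming each dependency entry is a dictionary with a `tags` string and an optional boolean `dynamic` key; `deps_to_run` is a hypothetical helper that only selects which dependencies would run:

```python
DEP_KEYS = ("deps", "prehook_deps", "posthook_deps", "post_deps")


def deps_to_run(meta: dict, cached: bool) -> list[str]:
    """Return the tags of the dependencies that run for one invocation.

    On a fresh run every dependency runs; on a cache hit only entries with
    a truthy "dynamic" key run (preprocess, run and postprocess are skipped
    entirely and are not modelled here).
    """
    selected: list[str] = []
    for key in DEP_KEYS:
        for dep in meta.get(key, []):
            if not cached or dep.get("dynamic", False):
                selected.append(dep["tags"])
    return selected


meta = {
    "deps": [{"tags": "get,dep-a"}, {"tags": "get,dep-b", "dynamic": True}],
    "post_deps": [{"tags": "get,dep-c", "dynamic": True}],
}
print(deps_to_run(meta, cached=True))  # ['get,dep-b', 'get,dep-c']
```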

### Input flags
When running an MLC script, we can also pass inputs to it. Any input listed in the `input_mapping` dictionary of `meta.yaml` is converted to the corresponding env variable.
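
A minimal sketch of the `input_mapping` rule, using a hypothetical helper and a hypothetical env key `MLC_HYPOTHETICAL_BATCH_SIZE`; converting every value to a string is an assumption based on env values being strings:

```python
def apply_input_mapping(inputs: dict[str, object], input_mapping: dict[str, str],
                        env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env with every mapped input stored under its env key.

    Inputs that do not appear in input_mapping are ignored. Values are
    converted to strings because env values are strings.
    """
    new_env = dict(env)
    for name, value in inputs.items():
        env_key = input_mapping.get(name)
        if env_key is not None:
            new_env[env_key] = str(value)
    return new_env


input_mapping = {"batch_size": "MLC_HYPOTHETICAL_BATCH_SIZE"}
print(apply_input_mapping({"batch_size": 8, "verbose": True}, input_mapping, {}))
# {'MLC_HYPOTHETICAL_BATCH_SIZE': '8'}
```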

### Conditional execution of `deps`, `prehook_deps`, `posthook_deps` and `post_deps`
We can add a `skip_if_env` dictionary to any entry in `deps`, `prehook_deps`, `posthook_deps` or `post_deps` to make its execution conditional on the values of env variables.
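
The sketch below evaluates `skip_if_env` for one dependency. The matching rules are assumptions, not taken from MLCFlow: every listed key must match (logical AND), a key matches when its current env value appears in the listed strings, and a missing key never matches. The env key name is hypothetical.

```python
def should_skip(dep: dict, env: dict[str, str]) -> bool:
    """Return True if the dependency's skip_if_env conditions all hold.

    Assumptions: every key listed in skip_if_env must match (logical AND),
    and a key matches when its current env value is one of the listed
    strings. A missing key never matches.
    """
    conditions = dep.get("skip_if_env")
    if not conditions:
        return False
    return all(env.get(key) in values for key, values in conditions.items())


dep = {"tags": "get,dep-a", "skip_if_env": {"MLC_HYPOTHETICAL_DEVICE": ["cpu"]}}
print(should_skip(dep, {"MLC_HYPOTHETICAL_DEVICE": "cpu"}))   # True
print(should_skip(dep, {"MLC_HYPOTHETICAL_DEVICE": "cuda"}))  # False
```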

### Versions
We can request a specific version of a script using `version`; `version_min` and `version_max` are also supported.

* When `version_min` is given, any version above it that is present in the cache or detected on the system can be chosen. If nothing is detected, `default_version` is used for installation if it is present and above `version_min`; otherwise `version_min` is used as `version`.

* When `version_max` is given, any version below it that is present in the cache or detected on the system can be chosen. If nothing is detected, `default_version` is used for installation if it is present and below `version_max`; otherwise `version_max_usable` (an additional input required together with `version_max`) is used as `version`.
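
The version rules can be sketched as follows. Several details are assumptions rather than documented behaviour: versions are purely numeric and dot-separated, bounds are treated as inclusive, the newest matching version is preferred when several are found, and when both bounds are given the `version_max_usable` fallback wins. `choose_version` is a hypothetical helper.

```python
from typing import Optional


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as "3.10.2" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def choose_version(available: list[str],
                   version_min: Optional[str] = None,
                   version_max: Optional[str] = None,
                   version_max_usable: Optional[str] = None,
                   default_version: Optional[str] = None) -> Optional[str]:
    """Pick a version following the version_min/version_max rules above."""

    def in_range(version: str) -> bool:
        v = parse_version(version)
        if version_min is not None and v < parse_version(version_min):
            return False
        if version_max is not None and v > parse_version(version_max):
            return False
        return True

    # 1. Prefer a version already in the cache or detected on the system.
    candidates = [v for v in available if in_range(v)]
    if candidates:
        return max(candidates, key=parse_version)
    # 2. Otherwise install default_version if it satisfies the bounds.
    if default_version is not None and in_range(default_version):
        return default_version
    # 3. Otherwise fall back to the bound-specific version.
    if version_max is not None:
        if version_max_usable is None:
            raise ValueError("version_max_usable is required together with version_max")
        return version_max_usable
    return version_min


print(choose_version(["1.2.0", "2.0.1"], version_min="1.5"))  # 2.0.1
```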

### Variations
* Variations are used to customize an MLC script, and each unique combination of variations uses its own cache entry. Each variation can set `env` keys as well as any other meta, including dependencies specific to it. Variations are turned on like tags but with a `_` prefix. For example, if a script has the tags `"get,myscript"`, its variation `"test"` is selected with the tags `"get,myscript,_test"`.
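
As a sketch of how variations could feed both the env and the cache, the hypothetical `resolve_variations` helper below applies each selected variation's `env` and derives a cache key from the sorted variation names. The key format is invented for illustration, and the order in which conflicting variation env values are applied is an assumption.

```python
def resolve_variations(variations_meta: dict[str, dict], selected: list[str],
                       env: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Apply the env of each selected variation and build a cache key.

    The key is built from the sorted variation names, so the same
    combination maps to the same (hypothetical) cache entry whatever
    order the tags were given in.
    """
    new_env = dict(env)
    for name in selected:
        if name not in variations_meta:
            raise KeyError(f"unknown variation: {name}")
        new_env.update(variations_meta[name].get("env", {}))
    cache_key = ",".join(f"_{name}" for name in sorted(set(selected)))
    return cache_key, new_env


variations_meta = {
    "test": {"env": {"MLC_HYPOTHETICAL_TEST_MODE": "yes"}},
    "fast": {"env": {"MLC_HYPOTHETICAL_FAST": "on"}},
}
print(resolve_variations(variations_meta, ["test", "fast"], {})[0])  # _fast,_test
```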

#### Variation groups
`group` is a key that assigns variations to a group; only one variation from a group can be used in the variation tags at any time. For example, `cpu` and `cuda` can both be variations in the `device` group, and a user can select either `_cpu` or `_cuda`, but not both.
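
A sketch of the group rule, assuming each variation's meta is a dictionary with an optional `group` key; `check_groups` is hypothetical and simply rejects two selected variations from the same group:

```python
from collections import defaultdict


def check_groups(variations_meta: dict[str, dict], selected: list[str]) -> dict[str, str]:
    """Return {group: variation} for the selected variations.

    Raises ValueError if more than one selected variation belongs to the
    same group. Variations without a "group" key are unconstrained.
    """
    by_group: dict[str, list[str]] = defaultdict(list)
    for name in selected:
        group = variations_meta.get(name, {}).get("group")
        if group is not None:
            by_group[group].append(name)
    conflicts = {g: names for g, names in by_group.items() if len(names) > 1}
    if conflicts:
        raise ValueError(f"only one variation per group is allowed: {conflicts}")
    return {g: names[0] for g, names in by_group.items()}


variations_meta = {"cpu": {"group": "device"}, "cuda": {"group": "device"}, "fp16": {}}
print(check_groups(variations_meta, ["cpu", "fp16"]))  # {'device': 'cpu'}
```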

#### Dynamic variations
Sometimes it is impractical to list every variation a script needs, for example `batch_size`, which can take many different values. To handle this, dynamic variations are supported using `#`, which can be replaced by any string. For example, `"_batch_size.8"` can be used as a tag to turn on the dynamic variation `"_batch_size.#"`.
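
A sketch of dynamic-variation matching, assuming `#` only appears at the end of a variation name (as in `batch_size.#`). Both helpers are hypothetical, and substituting the matched value for `#` in the variation's env values is an assumption about how the value is consumed, shown with a hypothetical env key:

```python
from typing import Optional


def match_variation(tag: str, variation_names: list[str]) -> tuple[str, Optional[str]]:
    """Match a variation tag against static and "#" variation names.

    Returns (variation_name, value), where value is the string that
    replaced "#" (None for a static variation). Assumes "#" only appears
    at the end of a variation name, as in "batch_size.#".
    """
    name = tag[1:] if tag.startswith("_") else tag
    if name in variation_names:
        return name, None
    for candidate in variation_names:
        if candidate.endswith("#"):
            prefix = candidate[:-1]  # e.g. "batch_size."
            if name.startswith(prefix) and len(name) > len(prefix):
                return candidate, name[len(prefix):]
    raise KeyError(f"no variation matches {tag!r}")


def expand_dynamic_env(env_template: dict[str, str], value: str) -> dict[str, str]:
    """Substitute the dynamic value for "#" in a variation's env values."""
    return {key: val.replace("#", value) for key, val in env_template.items()}


variation, value = match_variation("_batch_size.8", ["test", "batch_size.#"])
print(variation, value)  # batch_size.# 8
print(expand_dynamic_env({"MLC_HYPOTHETICAL_BATCH_SIZE": "#"}, value))
# {'MLC_HYPOTHETICAL_BATCH_SIZE': '8'}
```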

### ENV flow during MLC script execution

* During a script execution, the incoming `env` dictionary is saved (`saved_env`) and all updates happen on a copy of it.
* Once the script execution is over (including all the dependent script executions), newly created and updated keys are merged into `saved_env`, provided the keys are listed in `new_env_keys`.
* The same behaviour applies to the `state` dictionary.
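
The env flow can be sketched as follows, with a hypothetical `run_with_saved_env` helper. The script body (and, in MLCFlow, all of its dependencies) only ever sees a copy of the env; afterwards, new or updated keys are merged back only if they match `new_env_keys`. Accepting `*` wildcards in `new_env_keys` is an assumption of this sketch, and the env key names are hypothetical. The `state` dictionary would be handled the same way.

```python
import copy
from fnmatch import fnmatchcase
from typing import Callable


def run_with_saved_env(saved_env: dict[str, str],
                       script_body: Callable[[dict[str, str]], None],
                       new_env_keys: list[str]) -> dict[str, str]:
    """Run script_body on a copy of the env and merge back only new_env_keys.

    Keys (new or updated) are merged into the saved env only if they match
    one of the patterns in new_env_keys; "*" wildcards are accepted here as
    an assumption. All other changes are discarded.
    """
    working_env = copy.deepcopy(saved_env)
    script_body(working_env)  # the script and its dependencies modify the copy
    result = dict(saved_env)
    for key, value in working_env.items():
        if any(fnmatchcase(key, pattern) for pattern in new_env_keys):
            result[key] = value
    return result


def example_script(env: dict[str, str]) -> None:
    env["MLC_HYPOTHETICAL_PATH"] = "/opt/tool"
    env["MLC_TMP_SCRATCH"] = "1"


print(run_with_saved_env({}, example_script, ["MLC_HYPOTHETICAL_*"]))
# {'MLC_HYPOTHETICAL_PATH': '/opt/tool'}
```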

#### Special env keys
* Env keys with the prefix `MLC_TMP_*` or `MLC_GIT_*` are not passed to any dependency by default. They can be force-passed by adding the key(s) to the `force_env_keys` list of the concerned dependency.
* Similarly, we can prevent an env key from being passed to a given dependency by adding the key's prefix to the `clean_env_keys` list of that dependency.
* `--input` is automatically converted to the `MLC_INPUT` env key.
* `version` is converted to `MLC_VERSION`, `version_min` to `MLC_VERSION_MIN` and `version_max` to `MLC_VERSION_MAX`.
* If `env['MLC_GH_TOKEN']=TOKEN_VALUE` is set, git URLs (specified by `MLC_GIT_URL`) are changed to include this token.
* If `env['MLC_GIT_SSH']=yes`, git URLs are changed from HTTPS to SSH.
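
A sketch of how the env passed to one dependency could be filtered, using a hypothetical `env_for_dependency` helper. It assumes `force_env_keys` holds exact key names, `clean_env_keys` holds prefixes as described above, and that `clean_env_keys` is checked after `force_env_keys`:

```python
HIDDEN_PREFIXES = ("MLC_TMP_", "MLC_GIT_")


def env_for_dependency(env: dict[str, str], dep: dict) -> dict[str, str]:
    """Return the subset of env that is passed to one dependency.

    MLC_TMP_* and MLC_GIT_* keys are withheld unless listed in the
    dependency's force_env_keys; any key starting with a prefix from its
    clean_env_keys is withheld as well (applied after force_env_keys here,
    which is an assumption).
    """
    force = set(dep.get("force_env_keys", []))
    clean_prefixes = tuple(dep.get("clean_env_keys", []))
    passed: dict[str, str] = {}
    for key, value in env.items():
        if key.startswith(HIDDEN_PREFIXES) and key not in force:
            continue
        if clean_prefixes and key.startswith(clean_prefixes):
            continue
        passed[key] = value
    return passed


env = {"MLC_TMP_A": "1", "MLC_GIT_URL": "https://example.com/repo.git", "MLC_KEEP": "k"}
print(env_for_dependency(env, {"tags": "get,dep-a", "force_env_keys": ["MLC_GIT_URL"]}))
```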

### Script Meta
#### Special keys in script meta
* TBD: `reuse_version`, `inherit_variation_tags`, `update_env_tags_from_env`

### How does the cache work?
* If `cache=true` is set in a script's meta, the result of the script execution is cached for later use.
* For a cached script, `env` and `state` updates are done using the `new_env` and `new_state` dictionaries stored in the `cm-cached.json` file inside the cache folder.
* The `--new` input forces a new cache entry even when an old one exists.
* By default, no dependencies are run for a cached entry unless the `dynamic` key is set for them.
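
A sketch of reusing a cache entry, with a hypothetical `load_cached_updates` helper. It assumes `cm-cached.json` is a JSON object with top-level `new_env` and `new_state` keys; the real file layout may differ. The `force_new` flag plays the role of `--new`, and the env key is hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def load_cached_updates(cache_dir: Path, force_new: bool = False) -> Optional[tuple[dict, dict]]:
    """Return (new_env, new_state) stored in a cache entry, or None.

    None means the script has to run again: either --new was requested
    (force_new=True) or no cached state file exists yet.
    """
    state_file = cache_dir / "cm-cached.json"
    if force_new or not state_file.is_file():
        return None
    data = json.loads(state_file.read_text(encoding="utf-8"))
    return data.get("new_env", {}), data.get("new_state", {})


with tempfile.TemporaryDirectory() as tmp:
    entry = Path(tmp)
    (entry / "cm-cached.json").write_text(
        json.dumps({"new_env": {"MLC_HYPOTHETICAL_PATH": "/opt/tool"}, "new_state": {}}),
        encoding="utf-8",
    )
    env: dict = {}
    updates = load_cached_updates(entry)
    if updates is not None:
        env.update(updates[0])  # apply the cached new_env to the caller's env
    print(env)  # {'MLC_HYPOTHETICAL_PATH': '/opt/tool'}
```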

Please see the [getting-started guide](https://github.com/mlcommons/mlperf-automations/blob/main/docs/getting-started.md) to try out MLC scripts.

© 2022-25 [MLCommons](https://mlcommons.org)<br>
@@ -0,0 +1,70 @@
This page provides a walkthrough of the `meta.yaml` file.

## Keys and their datatypes

1. **alias**: `string`
    - Alias of the script, which can be used instead of tags when running the script.
2. **uid**: `string`
    - Unique identifier of an individual script.
    - Can be used instead of tags when running the script.
3. **automation_alias**: `string`
    - Alias of the automation that the script belongs to.
4. **automation_uid**: `string`
    - Unique identifier of that automation.
5. **category**: `string`
    - Script category.
6. **developers**: `list of strings`
    - Developers involved in developing the script.
7. **tags**: `list of strings`
    - Tags that the user can specify to run the script.
8. **default_env**: `dictionary` - Contains key-value pairs where values are `strings`
    - Env variables and the default values to use for them in this script.
    - A default value is overridden if the env variable is set anywhere in the script files or is passed down from a parent script to this child script.
9. **env**: `dictionary` - Contains key-value pairs where values are `strings`
    - Sets a series of env variables and their values.
10. **input_mapping**: `dictionary` - Contains key-value pairs where values are `strings`
    - Maps the input flags of a script to the corresponding env variables.
    - Only the keys specified under `input_mapping` in the script's `meta.yaml` are mapped to env variables.
11. **env_key_mapping**: `dictionary` - Contains key-value pairs where values are `strings`
    - Maps a particular env key to another env key.
12. **new_env_keys**: `list of strings`
    - Env keys that should be passed to the parent script (if this script is called as a dependency of another script).
13. **new_state_keys**: `list of strings`
    - State keys that should be passed to the parent script (if this script is called as a dependency of another script).
14. **deps**: `list of dictionaries` - Each dictionary can contain `tags` or other nested keys
    - Specifies the tags of the scripts to call as dependencies, along with the env variables to pass, version, names, etc.
    1. **names**: `list of strings`
        - Names the user can use to refer to this dependency, for example to explicitly modify the configuration passed to it.
    2. **enable_if_env**: `dictionary` - Contains key-value pairs where values are lists of `strings`
        - Runs the dependency only if one or more env variables are enabled or set to specific values.
    3. **skip_if_env**: `dictionary` - Contains key-value pairs where values are lists of `strings`
        - Skips the dependency if one or more env variables are enabled or set to specific values.
15. **prehook_deps**: `list of dictionaries` - Each dictionary may contain `names` and `tags` as lists
    - Specifies the tags of the scripts to call as prehook dependencies, along with the env variables to pass, version, names, etc.
    - To learn more about the script execution flow, see the [script execution flow](../script-flow/index.md) documentation.
16. **posthook_deps**: `list of dictionaries` - Each dictionary may contain `tags` and other keys
    - Specifies the tags of the scripts to call as posthook dependencies, along with the env variables to pass, version, names, etc.
    - To learn more about the script execution flow, see the [script execution flow](../script-flow/index.md) documentation.
17. **variation_groups_order**: `list of strings`
18. **variations**: `dictionary` - Each variation is a dictionary containing keys like `alias`, `default_variations`, `group`, etc.
19. **group**: `string`
20. **add_deps_recursive**: `dictionary` - Contains nested `tags` and other keys
21. **default_variations**: `dictionary` - Contains key-value pairs where values are `strings`
22. **docker**: `dictionary` - Contains keys specific to Docker configurations:
    - **base_image**: `string`
    - **image_name**: `string`
    - **os**: `string`
    - **os_version**: `string`
    - **deps**: `list of dictionaries` - Each dictionary can include `tags` or other keys.
    - **env**: `dictionary` - Contains key-value pairs where values are `strings`
    - **interactive**: `boolean`
    - **extra_run_args**: `string`
    - **mounts**: `list of strings` - Specifies mount paths in the format `"source:destination"`
    - **pre_run_cmds**: `list of strings` - Commands to run before the container starts
    - **docker_input_mapping**: `dictionary` - Contains key-value pairs where values are strings, mapping input parameters to Docker environment variables
    - **use_host_user_id**: `boolean`
    - **use_host_group_id**: `boolean`
    - **skip_run_cmd**: `string`
    - **shm_size**: `string`
    - **real_run**: `boolean`
    - **all_gpus**: `string`
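
To make the datatypes above concrete, here is a sketch of a type checker for an already-parsed `meta.yaml` (for example, the dictionary a YAML loader returns; loading is not shown). `check_meta` is hypothetical and is not how MLCFlow validates meta files; it only checks the top-level types documented above, the string values of the env-style dictionaries, and the list values of `enable_if_env`/`skip_if_env` in `deps`. `group` and the `docker` sub-keys are not checked, and the env key names are hypothetical.

```python
EXPECTED_TYPES = {
    "alias": str, "uid": str, "automation_alias": str, "automation_uid": str,
    "category": str, "developers": list, "tags": list,
    "default_env": dict, "env": dict, "input_mapping": dict,
    "env_key_mapping": dict, "new_env_keys": list, "new_state_keys": list,
    "deps": list, "prehook_deps": list, "posthook_deps": list,
    "variation_groups_order": list, "variations": dict,
    "add_deps_recursive": dict, "default_variations": dict, "docker": dict,
}
STRING_VALUED = ("default_env", "env", "input_mapping", "env_key_mapping")


def check_meta(meta: dict) -> list[str]:
    """Return human-readable type problems found in a parsed meta.yaml dict."""
    problems: list[str] = []
    # Top-level keys must have the documented container/scalar type.
    for key, expected in EXPECTED_TYPES.items():
        if key in meta and not isinstance(meta[key], expected):
            problems.append(f"{key}: expected {expected.__name__}, "
                            f"got {type(meta[key]).__name__}")
    # Env-style dictionaries are documented as having string values.
    for key in STRING_VALUED:
        mapping = meta.get(key)
        if isinstance(mapping, dict):
            for name, value in mapping.items():
                if not isinstance(value, str):
                    problems.append(f"{key}.{name}: expected str, got {type(value).__name__}")
    # enable_if_env / skip_if_env inside deps map env keys to lists of strings.
    deps = meta.get("deps")
    if isinstance(deps, list):
        for i, dep in enumerate(deps):
            if not isinstance(dep, dict):
                problems.append(f"deps[{i}]: expected dict, got {type(dep).__name__}")
                continue
            for cond in ("enable_if_env", "skip_if_env"):
                conditions = dep.get(cond, {})
                if not isinstance(conditions, dict):
                    problems.append(f"deps[{i}].{cond}: expected dict, "
                                    f"got {type(conditions).__name__}")
                    continue
                for env_key, values in conditions.items():
                    if not isinstance(values, list):
                        problems.append(f"deps[{i}].{cond}.{env_key}: expected list, "
                                        f"got {type(values).__name__}")
    return problems


meta = {
    "alias": "my-script",
    "tags": ["get", "myscript"],
    "default_env": {"MLC_HYPOTHETICAL_FLAG": "no"},
    "deps": [{"tags": "get,dep-a", "skip_if_env": {"MLC_HYPOTHETICAL_FLAG": ["yes"]}}],
}
print(check_meta(meta))  # []
```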