Add ClearMLHandler to track all MONAI Experiments #6013

Merged
wyli merged 48 commits into Project-MONAI:dev from integrate-clearml
Mar 12, 2023

skinan
Contributor

@skinan skinan commented Feb 16, 2023

ClearML is a leading MLOps stack that can supercharge dialogues research with its state-of-the-art experiment tracking capability. ClearML: https://clear.ml/

I have added clearml_handlers.py, which contains the ClearMLHandler, ClearMLStatsHandler, and ClearMLImageHandler classes. ClearML can track everything that TensorBoard tracks, including scalars and debug samples, and can also store models, artifacts, and console output on the ClearML server, where they can be accessed from the ClearML WebUI as shown below:

[Six screenshots of the ClearML WebUI, captured 2023-02-16]

ClearMLStatsHandler and ClearMLImageHandler can be used with a PyTorch trainer just like TensorBoardStatsHandler and TensorBoardImageHandler.

Use `ClearMLStatsHandler()` and `ClearMLImageHandler(log_dir="./runs/", batch_transform=from_engine(["image", "label"]), output_transform=from_engine(["pred"]))` with any MONAI example to test the functionality.
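
As a rough sketch of the usage described above (not taken from the PR's code), the two handlers could be built as follows. The helper name `build_clearml_handlers` is hypothetical, and the imports assume an installed MONAI version that exports these classes from `monai.handlers`, together with an installed and configured `clearml` package.

```python
def build_clearml_handlers(log_dir="./runs/"):
    """Return ClearML stats and image handlers for a MONAI trainer or evaluator.

    Hypothetical helper for illustration only. The imports are done lazily
    because they require both `monai` and `clearml` to be installed.
    """
    from monai.handlers import ClearMLImageHandler, ClearMLStatsHandler, from_engine

    return [
        # Scalar logging with default settings, as suggested in the PR description.
        ClearMLStatsHandler(),
        # Image (debug sample) logging, using the arguments from the PR description.
        ClearMLImageHandler(
            log_dir=log_dir,
            batch_transform=from_engine(["image", "label"]),
            output_transform=from_engine(["pred"]),
        ),
    ]
```

The returned list would typically be passed wherever a MONAI example already passes its TensorBoard handlers, for instance as the handler list of a trainer or evaluator. Constructing the handlers may start a ClearML task, so it is probably best to try this against a test project first.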

Also, please let us know where we should put further documentation and tutorials for this MLOps tool in MONAI.

@Nic-Ma Nic-Ma requested a review from binliunls February 16, 2023 14:40
Contributor

@Nic-Ma Nic-Ma left a comment

Thanks for the contribution.
Please add some unit tests to cover the feature.

Thanks.

@skinan
Contributor Author

skinan commented Feb 17, 2023

Working on adding some unit tests.

@skinan
Contributor Author

skinan commented Mar 8, 2023

@wyli, @Nic-Ma, @binliunls, is the out-of-memory error on these tests normal? The failing tests are blocking the PR.

@wyli
Contributor

wyli commented Mar 8, 2023

> @wyli, @Nic-Ma, @binliunls, is the out-of-memory error on these tests normal? The failing tests are blocking the PR.

yes, that issue is probably introduced by the code changes in this PR. any idea what the root cause is? we can skip the Windows test using the `@skip_if_windows` decorator (`def skip_if_windows(obj):`), but preferably we should understand the issue first.
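
For illustration, a decorator like the `skip_if_windows` mentioned above could be written along these lines. This is a minimal sketch that assumes it simply wraps `unittest.skipIf` with a platform check; the actual definition in MONAI's test utilities may differ, and `ExampleHandlerTest` is a hypothetical test case.

```python
import sys
import unittest


def skip_if_windows(obj):
    """Skip the decorated test function or TestCase class when running on Windows."""
    return unittest.skipIf(sys.platform == "win32", "Skipping tests on Windows")(obj)


@skip_if_windows
class ExampleHandlerTest(unittest.TestCase):
    """Hypothetical test case: runs on Linux/macOS, reported as skipped on Windows."""

    def test_runs(self):
        self.assertTrue(True)
```

Skipping this way only hides the failure on Windows runners, which is why understanding the root cause first is the preferable route.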

@skinan
Contributor Author

skinan commented Mar 8, 2023

@wyli, could you please clarify what the coverage command is doing here: https://github.com/Project-MONAI/MONAI/actions/runs/4352352552/jobs/7605037274#step:12:18505 ? We need to understand why it triggered the clearml fire binding, since it was not supposed to trigger clearml.

Also, could you please check the monitoring on the runner machine, so we can confirm whether it really ran out of memory?

@wyli
Contributor

wyli commented Mar 8, 2023

> @wyli, could you please clarify what the coverage command is doing here: https://github.com/Project-MONAI/MONAI/actions/runs/4352352552/jobs/7605037274#step:12:18505 ? We need to understand why it triggered the clearml fire binding, since it was not supposed to trigger clearml.
>
> Also, could you please check the monitoring on the runner machine, so we can confirm whether it really ran out of memory?

that's testing the bundle command:

python -m monai.bundle run --runner_id evaluating ...

you can run the test locally with:

python -m tests.test_integration_bundle_run

@Nic-Ma @binliunls could you please help the debugging here?

@Nic-Ma
Contributor

Nic-Ma commented Mar 8, 2023

/build

Signed-off-by: skinan <[email protected]>
@wyli
Contributor

wyli commented Mar 9, 2023

looking at the error messages, it seems the error comes from the clearml fire binding on Windows.

    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\hostedtoolcache\windows\Python\3.8.10\x64\lib\site-packages\fire\core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "C:\hostedtoolcache\windows\Python\3.8.10\x64\lib\site-packages\clearml\binding\frameworks\__init__.py", line 36, in _inner_patch
    raise ex
  File "C:\hostedtoolcache\windows\Python\3.8.10\x64\lib\site-packages\clearml\binding\frameworks\__init__.py", line 34, in _inner_patch
    ret = patched_fn(original_fn, *args, **kwargs)
  File "C:\hostedtoolcache\windows\Python\3.8.10\x64\lib\site-packages\clearml\binding\fire_bind.py", line 172, in __CallAndUpdateTrace
    PatchFire.__groups, PatchFire.__commands = PatchFire.__get_all_groups_and_commands(
  File "C:\hostedtoolcache\windows\Python\3.8.10\x64\lib\site-packages\clearml\binding\fire_bind.py", line 249, in __get_all_groups_and_commands
    PatchFire._commands_sep.join(query_group) + PatchFire._commands_sep if len(query_group) > 0 else ""

https://github.com/Project-MONAI/MONAI/actions/runs/4371805968/jobs/7648041295

skinan and others added 3 commits March 12, 2023 09:42
I, Victor Sonck <[email protected]>, hereby add my Signed-off-by to this commit: ce0ac23

Signed-off-by: Victor Sonck <[email protected]>
Signed-off-by: skinan <[email protected]>
@wyli
Contributor

wyli commented Mar 12, 2023

/build

@wyli wyli enabled auto-merge (squash) March 12, 2023 10:53
@wyli wyli merged commit f754928 into Project-MONAI:dev Mar 12, 2023
@skinan skinan deleted the integrate-clearml branch March 12, 2023 16:42
@wyli
Contributor

wyli commented Mar 12, 2023

the docker test is blocked by the unit test: https://github.com/Project-MONAI/MONAI/actions/runs/4397243843/jobs/7700566567 . could you please help debug it? @skinan

the test is running on a github runner https://github.com/Project-MONAI/MONAI/actions/runs/4397243843/workflow#L83-L99

@skinan
Contributor Author

skinan commented Mar 14, 2023

@wyli, this is weird. We are trying to reproduce it locally to find out what goes wrong.

@wyli
Contributor

wyli commented Mar 14, 2023

> @wyli, this is weird. We are trying to reproduce it locally to find out what goes wrong.

thanks, I tried to reproduce it on GitHub by manually triggering the test (dev...wyli:MONAI:trigger-tests), but it works fine now: https://github.com/wyli/MONAI/actions/runs/4415391874 . I'll monitor this test on the dev branch.

@skinan skinan restored the integrate-clearml branch March 15, 2023 12:08
wyli pushed a commit that referenced this pull request Mar 21, 2023
**Issue:** fixes #6148
**Previous Pull-request:**
#6013

---------

Signed-off-by: skinan <[email protected]>
jak0bw pushed a commit to jak0bw/MONAI that referenced this pull request Mar 28, 2023
**Issue:** fixes Project-MONAI#6148
**Previous Pull-request:**
Project-MONAI#6013

---------

Signed-off-by: skinan <[email protected]>
jak0bw pushed a commit to jak0bw/MONAI that referenced this pull request Mar 28, 2023
**Issue:** fixes Project-MONAI#6148
**Previous Pull-request:**
Project-MONAI#6013

---------

Signed-off-by: skinan <[email protected]>
@pollfly pollfly mentioned this pull request Aug 7, 2023
4 tasks
wyli pushed a commit that referenced this pull request Aug 7, 2023
### Description

Add option to install `clearml` as optional dependency with `pip install
monai[clearml]`. All of the docstrings were updated and unit tests added
in [PR #6013](#6013).

### Types of changes
<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] Integration tests passed locally by running `./runtests.sh -f -u
--net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick
--unittests --disttests`.

---------

Signed-off-by: revital <[email protected]>
5 participants