
Schema performance improvements #632

Merged: 45 commits into CrayLabs:mli-feature, Jul 18, 2024

Conversation

@AlyssaCote (Contributor) commented Jul 15, 2024

This PR makes performance improvements that reduce the number of copies we were making and cut de/serialization time. Instead of building a Tensor and then adding it to a Request, the Request now holds TensorDescriptors, and the actual tensor data is sent after the request through the FLInterface.

Now that build_tensor has been replaced by build_tensor_descriptor, I was also able to delete many of the TensorFlow and Torch tests that had been separated out.
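
For illustration, here is a minimal sketch of the descriptor-first flow described above. The TensorDescriptor and InferenceRequest shapes and the channel.send API are stand-ins, not the PR's actual types:

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class TensorDescriptor:
    """Metadata-only stand-in for the schema's tensor descriptor (fields assumed)."""

    order: str             # memory layout, e.g. "c" for row-major
    dtype: str             # element type, e.g. "float32"
    dimensions: list[int]  # tensor shape


@dataclass
class InferenceRequest:
    """The request carries descriptors only; raw tensor bytes travel separately."""

    model_key: str
    input_descriptors: list[TensorDescriptor] = field(default_factory=list)


def send_request(channel, request_bytes: bytes, tensors: list[np.ndarray]) -> None:
    # Send the small serialized request first...
    channel.send(request_bytes)
    # ...then each tensor's raw bytes, so tensor data is never copied
    # into the serialized request itself.
    for tensor in tensors:
        channel.send(tensor.tobytes())
```

The point of the split is that de/serializing the request no longer scales with tensor size; only descriptor metadata goes through the schema.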

@AlyssaCote AlyssaCote marked this pull request as draft July 15, 2024 21:22

codecov bot commented Jul 15, 2024

Codecov Report

Attention: Patch coverage is 27.02703% with 27 lines in your changes missing coverage. Please review.

Please upload report for BASE (mli-feature@eace71e). Learn more about missing BASE report.

Additional details and impacted files

@@              Coverage Diff               @@
##             mli-feature     #632   +/-   ##
==============================================
  Coverage               ?   76.61%           
==============================================
  Files                  ?      100           
  Lines                  ?     6905           
  Branches               ?        0           
==============================================
  Hits                   ?     5290           
  Misses                 ?     1615           
  Partials               ?        0           
Files                                                   Coverage Δ
smartsim/_core/mli/comm/channel/channel.py              75.00% <ø> (ø)
smartsim/_core/mli/infrastructure/worker/worker.py      53.75% <ø> (ø)
smartsim/_core/mli/message_handler.py                   99.51% <100.00%> (ø)
...rtsim/_core/mli/mli_schemas/tensor/tensor_capnp.py   100.00% <ø> (ø)
smartsim/_core/mli/comm/channel/dragonchannel.py        52.38% <66.66%> (ø)
...im/_core/mli/infrastructure/worker/torch_worker.py   85.41% <0.00%> (ø)
smartsim/_core/mli/comm/channel/dragonfli.py            64.00% <11.11%> (ø)
.../_core/mli/infrastructure/control/workermanager.py   22.15% <0.00%> (ø)

@AlyssaCote AlyssaCote marked this pull request as ready for review July 16, 2024 16:18
@AlyssaCote AlyssaCote requested review from al-rigazzi and ankona July 16, 2024 16:18

@mellis13 (Contributor) left a comment

Just some general comments and questions but overall great changes to the code.

msg_tensor = MessageHandler.build_tensor(
    tensor,

# TODO isn't this what output descriptors are for?

Contributor:

Can we resolve these two TODO comments? Do we need to make tickets or can they be deleted?

@AlyssaCote (Contributor, Author):

That's actually a note I left so I'd remember to ask this question for the group! We don't use OutputDescriptors anywhere yet. I think the hardcoded information here can come from the OutputDescriptors, so we know how the tensor needs to be reconstructed. I'll make a ticket for further discussion and remove these TODOs.
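
For illustration, if an OutputDescriptor exposes dtype, dimensions, and order fields (names assumed here, not taken from the schema), reconstruction on the receiving side could look like:

```python
import numpy as np


def reconstruct_output(raw: bytes, descriptor) -> np.ndarray:
    """Rebuild a tensor from raw bytes using descriptor metadata instead of
    hardcoded dtype/shape values (descriptor field names are assumed)."""
    array = np.frombuffer(raw, dtype=np.dtype(descriptor.dtype))
    array = array.reshape(descriptor.dimensions)
    if descriptor.order == "f":  # caller asked for column-major layout
        array = np.asfortranarray(array)
    return array
```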


interm = time.perf_counter()  # timing
request = deserialize_message(
    request_bytes, self._comm_channel_type, self._device
)

if request.input_meta and tensor_list:

Contributor:

Maybe the logic from 248 to 264 (and the deserialize_message() call) would be better encapsulated in an unpack_request. I think _on_iteration should have minimal manipulation of the request based on the serialization and communication specifics.
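
One possible shape for that refactor, reusing the names from the quoted snippet (the raw_inputs attribute is an assumption):

```python
def _unpack_request(self, request_bytes, tensor_list):
    """Hide serialization/communication details from _on_iteration by
    deserializing the message and attaching the raw tensors here."""
    request = deserialize_message(
        request_bytes, self._comm_channel_type, self._device
    )
    if request.input_meta and tensor_list:
        request.raw_inputs = tensor_list  # attribute name assumed
    return request
```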

Contributor:

(Does this make it more difficult to do perf timing though?)

@AlyssaCote (Contributor, Author):

I completely agree, but maybe we wait to refactor _on_iteration until we're solid with performance timing? It might make it more difficult.
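
If it helps, the refactor wouldn't necessarily make timing harder: a small context manager could keep per-stage measurements even after the logic moves into a helper. A sketch (the timings dict and label are hypothetical):

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(timings: dict, label: str):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(label, []).append(time.perf_counter() - start)


# Usage inside _on_iteration, even after unpack_request is factored out:
#     with timed(self._timings, "deserialize"):
#         request = self._unpack_request(request_bytes, tensor_list)
```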

@al-rigazzi (Collaborator) left a comment

LGTM!

@AlyssaCote AlyssaCote merged commit 7169f1c into CrayLabs:mli-feature Jul 18, 2024
42 checks passed