Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support S3 namespaces when retrieving models from buckets #4917

Merged
merged 15 commits into from
Dec 17, 2019

Conversation

cianclarke
Copy link
Contributor

@cianclarke cianclarke commented Dec 6, 2019

Proposed changes:

  • When working with many models, it helps to make use of S3 namespacing functionality (aka "folders") within buckets. Not being able to load models from buckets is a hindrance when working across projects.
  • Present S3 bucket support only works with root directory - whenever a namespace is provided, this path is also used on the local file system to write the .tar.gz (see Models loaded from S3 buckets do not support namespaces  #4916 )
  • PR allows passing of S3 paths with namespaces for modal retrieval.

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

Fixes #4916

@claassistantio
Copy link

claassistantio commented Dec 6, 2019

CLA assistant check
All committers have signed the CLA.

@erohmensing
Copy link
Contributor

Thanks for the PR, we'll give it a review as soon as possible.

@erohmensing erohmensing requested a review from ricwo December 10, 2019 08:52
Copy link
Contributor

@ricwo ricwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for opening the PR 👍 The functionality looks good in general.

Please don't forget to add a Changelog entry for your change (you can find instructions on how to do that in the latest README on master)

@@ -151,11 +151,11 @@ def _persist_tar(self, file_key: Text, tar_path: Text) -> None:
with open(tar_path, "rb") as f:
self.s3.Object(self.bucket_name, file_key).put(Body=f)

def _retrieve_tar(self, target_filename: Text) -> None:
def _retrieve_tar(self, model_full_path: Text) -> None:
tar_name = os.path.basename(model_full_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line should move one down (below the docstring)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

awsPersistor._retrieve_tar(model)
retrieveArgs = download_fileobj.call_args[0]
assert retrieveArgs[0] == model
assert retrieveArgs[1].name == "model.tar.gz"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move these two tests right below the existing two s3 tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!


# noinspection PyPep8Naming
@mock_s3
def test_s3_private_retrieve_tar():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what exactly is being tested here? A comment would be very helpful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added! I included this test retroactively to bump coverage back to an acceptable level by coveralls. I'd be happy to also remove it if you think it excessive

@cianclarke
Copy link
Contributor Author

@ricwo Many thanks for the review. I've incorporated all your suggestions, and also added a changelog entry 👍

Copy link
Contributor

@ricwo ricwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks for the fixes and again for your contribution ⭐️

@ricwo
Copy link
Contributor

ricwo commented Dec 15, 2019

@cianclarke looks like you didn't check in the changelog file - would you mind adding that?

@ricwo ricwo self-requested a review December 15, 2019 05:53
Copy link
Contributor

@ricwo ricwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes until the changelog file is uploaded - good to merge afterwards

@cianclarke
Copy link
Contributor Author

Ah, sorry @ricwo - you're absolutely right. I've committed the changelog entry.
I'm also noticing coveralls oddly report coverage having gone back down again, just from moving the tests around when I addressed your PR feedback. I'm not sure what's going on there!

@cianclarke
Copy link
Contributor Author

@ricwo - I committed your args suggestions from the GitHub UI (looks like I may have missed those).
Hoping that was the argument updates you had meant - happy to also rebase the excess commits out, or can do on merge - not sure if the project has a preference on this.

@ricwo ricwo merged commit 6449f7f into RasaHQ:master Dec 17, 2019
@cianclarke
Copy link
Contributor Author

Thanks @ricwo

@cianclarke cianclarke deleted the supportS3Namespaces branch December 17, 2019 11:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Models loaded from S3 buckets do not support namespaces
4 participants