
SageMaker enhancements to allow custom docker image, input channels referring to s3/remote data locations and metrics logging #504

Merged: 5 commits, Jul 12, 2022

@pacman100 (Contributor) commented Jul 11, 2022

What does this PR do?

Addresses the requests in #502, and more:

  1. A custom Docker image can be specified in place of the official 🤗 Deep Learning Containers (DLCs) through the accelerate config questionnaire. If none is provided, the latest official 🤗 DLC is used.
  2. Support for input channels pointing to S3 data locations, defined in a TSV file. For example, below are the contents of sagemaker_inputs.tsv, whose path is provided during the accelerate config setup:
channel_name	data_location
train	s3://sagemaker-sample/samples/datasets/imdb/train
test	s3://sagemaker-sample/samples/datasets/imdb/test
  3. Support for SageMaker metrics logging, defined in a TSV file. For example, below are the contents of sagemaker_metrics_definition.tsv, whose path is provided during the accelerate config setup:
metric_name	metric_regex
accuracy	'accuracy': ([0-9.]+)
f1	'f1': ([0-9.]+)
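To make the two TSV formats above concrete, here is a minimal Python sketch of how such files could be read into the structures the SageMaker Python SDK expects: a dict mapping channel name to data location (as passed to an estimator's `fit()`), and a list of `{"Name": ..., "Regex": ...}` dicts (as passed to its `metric_definitions` argument). The helpers `read_inputs_tsv` and `read_metrics_tsv` are hypothetical illustrations, assuming tab-separated text with the header rows shown; they are not Accelerate's actual implementation.

```python
import csv
import io

# Hypothetical helpers (not Accelerate's code): parse the two TSV files
# described above, assuming a header row and tab-separated columns.


def read_inputs_tsv(text):
    """Map each channel_name to its data_location (e.g. an S3 URI)."""
    # QUOTE_NONE keeps quote characters inside fields exactly as written.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row["channel_name"].strip(): row["data_location"].strip() for row in reader}


def read_metrics_tsv(text):
    """Build metric definitions as a list of {"Name": ..., "Regex": ...} dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        {"Name": row["metric_name"].strip(), "Regex": row["metric_regex"].strip()}
        for row in reader
    ]


# Contents matching the example files above (in practice: open(path).read()).
inputs_tsv = (
    "channel_name\tdata_location\n"
    "train\ts3://sagemaker-sample/samples/datasets/imdb/train\n"
    "test\ts3://sagemaker-sample/samples/datasets/imdb/test\n"
)
metrics_tsv = (
    "metric_name\tmetric_regex\n"
    "accuracy\t'accuracy': ([0-9.]+)\n"
    "f1\t'f1': ([0-9.]+)\n"
)

inputs = read_inputs_tsv(inputs_tsv)
metric_definitions = read_metrics_tsv(metrics_tsv)
```

Disabling CSV quoting is a deliberate choice here: metric regexes often contain quote characters (like the single quotes above), and they should reach SageMaker unchanged.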

Below is a screenshot of the metrics as logged in SageMaker:
[Screenshot 2022-07-11 at 6 17 25 PM]
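As we understand SageMaker's metrics logging, each regex is applied to the training job's log output and the first capture group is recorded as the metric value. The Python sketch below imitates that matching locally, which can help check a regex before launching a job. The helper `extract_metrics` and the sample log line are hypothetical; the log line assumes the training script prints its evaluation dict.

```python
import re

# Metric definitions as they would be read from sagemaker_metrics_definition.tsv.
metric_definitions = [
    {"Name": "accuracy", "Regex": r"'accuracy': ([0-9.]+)"},
    {"Name": "f1", "Regex": r"'f1': ([0-9.]+)"},
]


def extract_metrics(log_line, definitions):
    """Hypothetical local check: return {metric_name: value} for each matching regex."""
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], log_line)
        if match:
            # The first capture group holds the numeric value.
            found[definition["Name"]] = float(match.group(1))
    return found


# Hypothetical log line, e.g. produced by printing an evaluation dict.
line = "epoch 0: {'accuracy': 0.8125, 'f1': 0.8571428571428571}"
metrics = extract_metrics(line, metric_definitions)
```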

Example accelerate config with the above features set up (XXXXX values are AWS-account specific):

base_job_name: accelerate-sagemaker-1
compute_environment: AMAZON_SAGEMAKER
distributed_type: DATA_PARALLEL
ec2_instance_type: ml.p3.16xlarge
iam_role_name: XXXXX
image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training:1.8.1-transformers4.10.2-gpu-py36-cu111-ubuntu18.04
mixed_precision: fp16
num_machines: 1
profile: XXXXX
py_version: py38
pytorch_version: 1.10.2
region: us-east-1
sagemaker_inputs_file: /home/ubuntu/sagemaker_examples/sagemaker_inputs.tsv
sagemaker_metrics_file: /home/ubuntu/sagemaker_examples/sagemaker_metrics_definition.tsv
transformers_version: 4.17.0
use_cpu: false
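A rough Python sketch of how fields from a config like this could be turned into keyword arguments for the SageMaker Python SDK's `HuggingFace` estimator, including the fallback described in point 1 above: use `image_uri` when it is set, otherwise pass the version fields so that an official 🤗 DLC is selected. The function `estimator_kwargs`, the entry point `train.py`, and the exact field-to-argument mapping are assumptions for illustration, not Accelerate's code.

```python
# Hypothetical sketch (not Accelerate's code): map accelerate config fields
# to keyword arguments assumed for the SageMaker SDK's HuggingFace estimator.


def estimator_kwargs(config, entry_point):
    kwargs = {
        "entry_point": entry_point,
        "base_job_name": config["base_job_name"],
        "role": config["iam_role_name"],
        "instance_type": config["ec2_instance_type"],
        "instance_count": config["num_machines"],
    }
    if config.get("image_uri"):
        # A custom Docker image was given: use it directly.
        kwargs["image_uri"] = config["image_uri"]
    else:
        # No custom image: pass the version fields so an official DLC is chosen.
        kwargs["transformers_version"] = config["transformers_version"]
        kwargs["pytorch_version"] = config["pytorch_version"]
        kwargs["py_version"] = config["py_version"]
    return kwargs


# Values taken from the example config above, with image_uri left unset.
config = {
    "base_job_name": "accelerate-sagemaker-1",
    "ec2_instance_type": "ml.p3.16xlarge",
    "iam_role_name": "XXXXX",
    "num_machines": 1,
    "py_version": "py38",
    "pytorch_version": "1.10.2",
    "transformers_version": "4.17.0",
    "image_uri": None,
}
kwargs = estimator_kwargs(config, "train.py")  # "train.py" is a hypothetical script
```

Keeping the image and version fields mutually exclusive in the sketch mirrors the behavior described in point 1: the custom image, when present, replaces the default DLC choice.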

@pacman100 requested review from philschmid and sgugger July 11, 2022 11:40
@pacman100 changed the title Smangrul/sagemaker enhancements sagemaker enhancements to allow custom docker image, input channels referring to s3/remote data locations and metrics logging Jul 11, 2022
@pacman100 changed the title sagemaker enhancements to allow custom docker image, input channels referring to s3/remote data locations and metrics logging SageMaker enhancements to allow custom docker image, input channels referring to s3/remote data locations and metrics logging Jul 11, 2022
@HuggingFaceDocBuilderDev commented Jul 11, 2022

The documentation is not available anymore as the PR was closed or merged.

@sgugger (Collaborator) left a comment

Thanks for adding this! LGTM but I'd also like to see @philschmid review since he is the SageMaker expert :-)

@philschmid (Contributor) left a comment

LGTM ✅

@pacman100 merged commit 3c1f97c into huggingface:main Jul 12, 2022
@plamb-viso commented

Wow @pacman100, you responded to that request quickly. Thanks everyone, the Accelerate team is awesome.

@pacman100 deleted the smangrul/sagemaker-enhancements branch July 13, 2022 02:39
5 participants