
Add Configuration setup for SageMaker #17

Merged · 14 commits merged into main on Mar 30, 2021

Conversation

philschmid (Contributor) commented on Mar 25, 2021:

This PR adds configuration support for launching scripts on Amazon SageMaker. It contains only the configuration part of the CLI, not the job launch itself. I decoupled the config subparsers into multiple small files to reduce complexity and keep the different config options clearly structured. This will make it easier to add other configurations in the future.

I also added 2 GitHub Actions workflows, which run make quality and make test.

I know it is a big PR. I hope we can iterate quickly on it.

from .config_utils import _ask_field, _convert_distributed_mode, _convert_yes_no_to_bool


def get_cluster_input():
philschmid (author) commented:

Decoupled the input gathering from the main config get_user_input() into separate option-specific functions.



def get_user_input():
compute_environment = _ask_field(
philschmid (author) commented:

Determines which config flow should be used.
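A minimal, hypothetical sketch of this dispatch (the stub bodies are placeholders, and the ask parameter is injected only to keep the sketch testable; the PR's real functions ask many more questions):

```python
def get_cluster_input():
    return "cluster-config"    # placeholder for the local/cluster question flow

def get_sagemaker_input():
    return "sagemaker-config"  # placeholder for the SageMaker question flow

def get_user_input(ask=input):
    # The first answer selects which follow-up config flow runs.
    raw = ask(
        "In which compute environment are you running? "
        "([0] This machine, [1] AWS (Amazon SageMaker)): "
    )
    return get_sagemaker_input() if raw.strip() == "1" else get_cluster_input()
```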

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

philschmid (author) commented:

Created a separate file to merge all configuration dataclasses. We could also think about moving them to state.py.

default_config_file = default_json_config_file


def load_config_from_file(config_file):
philschmid (author) commented on Mar 25, 2021:

General loading function that determines whether the config file is JSON or YAML and which configuration class to use for loading. Open to suggestions on how to identify the config class more easily.
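A hedged sketch of the extension-based detection described here; the helper names are assumptions, only the JSON branch is fleshed out, and the YAML branch would use yaml.safe_load:

```python
import json
from pathlib import Path

def detect_config_format(config_file: str) -> str:
    # JSON configs are detected by suffix; everything else falls back to YAML.
    return "json" if Path(config_file).suffix == ".json" else "yaml"

def load_json_config(config_file: str) -> dict:
    # Load the raw dict; the caller can then pick the matching config class,
    # e.g. by inspecting a discriminating key such as "compute_environment".
    with open(config_file) as f:
        return json.load(f)
```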



@dataclass
class BaseConfig:
philschmid (author) commented:

The base class contains the shared properties and methods.

class ClusterConfig(BaseConfig):
num_processes: int
machine_rank: int = 0
num_machines: int = 1
philschmid (author) commented:

Kept num_machines in the specialist class because dataclass fields without a default value cannot appear after fields with default values.
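The constraint can be demonstrated directly (a sketch with simplified fields; the real BaseConfig has more):

```python
from dataclasses import dataclass

@dataclass
class BaseConfig:
    distributed_type: str = "NO"  # a defaulted field in the base class

def subclass_raises() -> bool:
    # A dataclass field without a default may not follow one with a default,
    # and this includes defaulted fields inherited from a base dataclass.
    try:
        @dataclass
        class Broken(BaseConfig):
            num_processes: int  # no default -> TypeError at class creation
        return False
    except TypeError:
        return True
```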

"""

# Subclassing str as well as Enum allows the `ComputeEnvironment` to be JSON-serializable out of the box.
CUSTOM_CLUSTER = "CUSTOM_CLUSTER"
philschmid (author) commented:

Open for better naming.

Collaborator commented:

I think LOCAL_MACHINE is better.
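The str + Enum subclassing noted in the excerpt is what keeps the value JSON-serializable out of the box; a small sketch, using the LOCAL_MACHINE name suggested here:

```python
import json
from enum import Enum

class ComputeEnvironment(str, Enum):
    # Because members are also str instances, json.dumps emits them
    # as their plain string values with no custom encoder needed.
    LOCAL_MACHINE = "LOCAL_MACHINE"
    AMAZON_SAGEMAKER = "AMAZON_SAGEMAKER"

serialized = json.dumps({"compute_environment": ComputeEnvironment.LOCAL_MACHINE})
```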

@philschmid philschmid requested a review from sgugger March 25, 2021 22:05
sgugger (Collaborator) left a review:

Amazing work! To complete the decoupling of commands.config while leaving stuff easily accessible, I would put all the main functions/classes of commands.config.xxx.py inside the commands.config.__init__.py.

Then I think LOCAL_MACHINE is a better name than CUSTOM_CLUSTER which sounds a bit too grand for most users ;-)

Thanks for adding a basic CI, it was on my TODO but I was too lazy to actually do it ;-)

@@ -16,7 +16,7 @@

from argparse import ArgumentParser

from accelerate.commands.config import config_command_parser
from accelerate.commands.config.config import config_command_parser
Collaborator commented:

This is a bit weird, so I would remove one .config by exposing config_command_parser in the intermediate init.


def get_user_input():
compute_environment = _ask_field(
"In which compute environment are you running? ([0] Custom Cluster, [1] AWS (Amazon SageMaker)): ",
Collaborator commented:

Custom Cluster does sound a bit too complicated for the base config.

Suggested change
"In which compute environment are you running? ([0] Custom Cluster, [1] AWS (Amazon SageMaker)): ",
"In which compute environment are you running? ([0] This machine, [1] AWS (Amazon SageMaker)): ",

yaml.safe_dump(self.to_dict(), f)

def __post_init__(self):
if isinstance(self.distributed_type, str):
Collaborator commented:

Suggested change
if isinstance(self.distributed_type, str):
if isinstance(self.compute_environment, str):
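With the suggested fix applied, the conversion round-trips string values loaded from JSON/YAML back into the enum. A sketch with a trimmed-down config (the real BaseConfig has more fields):

```python
from dataclasses import dataclass
from enum import Enum

class ComputeEnvironment(str, Enum):
    LOCAL_MACHINE = "LOCAL_MACHINE"
    AMAZON_SAGEMAKER = "AMAZON_SAGEMAKER"

@dataclass
class BaseConfig:
    compute_environment: ComputeEnvironment = ComputeEnvironment.LOCAL_MACHINE

    def __post_init__(self):
        # Values read from a config file arrive as plain strings, so
        # normalize them to the enum (checking the right attribute --
        # the bug the suggestion fixes was testing distributed_type here).
        if isinstance(self.compute_environment, str):
            self.compute_environment = ComputeEnvironment(self.compute_environment)

cfg = BaseConfig(compute_environment="AMAZON_SAGEMAKER")
```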

@@ -23,8 +23,8 @@
from pathlib import Path
from typing import Optional

from accelerate.commands.config import LaunchConfig, default_config_file
from accelerate.state import DistributedType
from accelerate.commands.config.config_args import default_config_file, load_config_from_file
Collaborator commented:

Would also put those two functions in the intermediate init accelerate.commands.config.__init__.py

"""

# Subclassing str as well as Enum allows the `ComputeEnvironment` to be JSON-serializable out of the box.
CUSTOM_CLUSTER = "CUSTOM_CLUSTER"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think LOCAL_MACHINE is better.

Comment on lines 29 to 30
_has_boto3 = importlib.util.find_spec("boto3") is not None

Collaborator commented:

Would define a public is_boto3_available() function instead of a private variable.
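The suggested public helper could look like this (a sketch; the final name and module placement are up to the PR):

```python
import importlib.util

def is_boto3_available() -> bool:
    # find_spec probes for the installed package without importing it.
    return importlib.util.find_spec("boto3") is not None
```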

extras["test"] = [
"pytest",
"pytest-xdist",
]
setup(
Collaborator commented:

I think we can add an extra "sagemaker" with boto3 inside.
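The suggested extra would slot in next to the test extra in setup.py, e.g. (a sketch of the suggestion, not the merged code):

```python
# Fragment of a setup.py extras table implementing the suggestion:
extras = {}
extras["test"] = ["pytest", "pytest-xdist"]
extras["sagemaker"] = ["boto3"]  # installable as e.g. accelerate[sagemaker]
```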

@sgugger sgugger requested a review from LysandreJik March 25, 2021 23:01
philschmid (author) commented:

I can't reproduce why quality is failing. I created a completely new conda env with the same packages.

LysandreJik (Member) left a review:

Yes, LGTM! Thanks for working on this.

The quality issue may originate from a version mismatch between your environment and the CI?

@philschmid philschmid merged commit f7e0c26 into main Mar 30, 2021
@philschmid philschmid deleted the run-on-sagemaker branch March 30, 2021 06:25