WIP: Accelerator refactor #5385

Closed
wants to merge 122 commits
Commits
122 commits
b932996
move to old package
justusschock Nov 9, 2020
eb804fb
add initial draft of new accelerators
justusschock Nov 9, 2020
0e69b07
add initial data parallel draft
justusschock Nov 9, 2020
c66a1bf
add initial precision draft
justusschock Nov 9, 2020
f88dad2
scheduler helper functions
justusschock Nov 9, 2020
e4d6e1e
define base plugin api
justusschock Nov 11, 2020
a309425
base plugin integration
justusschock Nov 11, 2020
0c7dc10
continue ddp plugin
justusschock Nov 11, 2020
d0fa39c
minor changes precision plugin
justusschock Nov 11, 2020
7fb18e9
start ddp plugin
justusschock Nov 11, 2020
f85923b
initial version ddp spawn
justusschock Nov 12, 2020
72069df
remove deprecated implementation
justusschock Nov 12, 2020
6d5c764
add comment on what's missing
justusschock Nov 12, 2020
c1b2fc2
latest state
justusschock Nov 20, 2020
6eba30b
update accelerator for model to live in training type plugin
justusschock Nov 30, 2020
8f05699
add general plugin interface
justusschock Nov 30, 2020
4fb42d5
add model properties
justusschock Nov 30, 2020
89448ed
Trainer integration part 1 for CPU accelerator
awaelchli Dec 4, 2020
bd42a88
test single gpu trainer integration
awaelchli Dec 6, 2020
8602b47
make device changes a bit less hardcoded
justusschock Dec 7, 2020
bdd0eb0
properly resolve attributes
justusschock Dec 7, 2020
7f2224b
add properties for accelerator forwarding
justusschock Dec 7, 2020
f1441a3
correct optimizer_step calls
justusschock Dec 7, 2020
ea2a92b
call train or test
awaelchli Dec 7, 2020
71c0170
make calls to train step (and fix bugs)
justusschock Dec 7, 2020
89540ea
remove gradient_clip_val from accelerator
awaelchli Dec 7, 2020
817522f
add back the step end methods
awaelchli Dec 7, 2020
d2778ef
add precision todo comment
awaelchli Dec 7, 2020
e0e6bb9
ddp
awaelchli Dec 8, 2020
d7cd92e
clean up
awaelchli Dec 8, 2020
b1bdc19
connect
awaelchli Dec 8, 2020
be4ae59
clean up
awaelchli Dec 8, 2020
1b9c9bf
post
awaelchli Dec 8, 2020
f060b4a
pst
awaelchli Dec 8, 2020
39cb046
disable progress bar on rank > 0
awaelchli Dec 9, 2020
c5072e1
precision test
justusschock Dec 10, 2020
baf5881
fix native amp
justusschock Dec 10, 2020
20d2375
a
awaelchli Dec 12, 2020
16efd44
adsf
awaelchli Dec 12, 2020
a952ded
undo
awaelchli Dec 12, 2020
0c9a34d
ddp spawn
awaelchli Dec 12, 2020
386590a
a
awaelchli Dec 12, 2020
05e87a2
spawn
awaelchli Dec 12, 2020
1f0cde8
ad
awaelchli Dec 12, 2020
70fe209
d
awaelchli Dec 12, 2020
476c561
finish ddp plugin integration
awaelchli Dec 13, 2020
cee0a3b
remove logger from plugins
awaelchli Dec 13, 2020
63e4031
setup
awaelchli Dec 13, 2020
9f6d52a
remove logger arg
awaelchli Dec 13, 2020
dc2bbe6
module
awaelchli Dec 13, 2020
de8fa73
clean up
awaelchli Dec 13, 2020
b714cfb
clean up
awaelchli Dec 13, 2020
fba75ab
ddp_cpu integration
awaelchli Dec 14, 2020
a8f9fd0
cuda context manager for emptying cache
awaelchli Dec 14, 2020
7bd0581
args
awaelchli Dec 14, 2020
8d70f37
move "log_gpu_memory" to logger connector
awaelchli Dec 14, 2020
8b3ab41
fix imports
justusschock Dec 14, 2020
1d7968d
typo
justusschock Dec 14, 2020
853b4ac
remove todo
justusschock Dec 14, 2020
6657daa
add rpc_enabled flag
justusschock Dec 14, 2020
0aca3b0
remove unused self arg
justusschock Dec 14, 2020
3c3463c
comment out unnecessary amp part
justusschock Dec 14, 2020
808b548
fix model connector
justusschock Dec 14, 2020
33cc785
fix import
justusschock Dec 14, 2020
e9a30cc
copy properties only once
justusschock Dec 14, 2020
5153522
add cluster env
awaelchli Dec 22, 2020
29d2c3c
move slurm configuration
awaelchli Dec 22, 2020
d3cf3e6
resolve import errors
awaelchli Dec 22, 2020
acf8014
handle distributed_sampler_kwargs
awaelchli Dec 22, 2020
2c586a7
move emptying cache to accelerator
awaelchli Dec 22, 2020
4509cfc
fix a few tests
awaelchli Dec 22, 2020
304946c
restoring the result from subprocess
awaelchli Dec 22, 2020
0f63ff9
fix queue.get() order for results
awaelchli Dec 22, 2020
fb3748f
add missing "block_backward_sync" context manager
awaelchli Dec 22, 2020
db13592
add missing "block_backward_sync" context manager
awaelchli Dec 22, 2020
19f232b
fix sync_batchnorm
awaelchli Dec 22, 2020
7fa62c6
fix supported gpu-ids for tuple
awaelchli Dec 22, 2020
50a4cd2
fix clip gradients and inf recursion
awaelchli Dec 22, 2020
82870ba
accelerator selection: added cluster_environment plugin
awaelchli Dec 23, 2020
b713c38
fix torchelastic test
awaelchli Dec 23, 2020
e71ead8
fix reduce early stopping decision for DDP
awaelchli Dec 24, 2020
3b86b6f
fix tests: callbacks, conversion to lightning optimizer
awaelchli Dec 24, 2020
b596d71
fix lightning optimizer does not pickle
awaelchli Dec 24, 2020
d3d29a9
fix setting benchmark and deterministic option
awaelchli Dec 24, 2020
773f178
fix slurm amp test
awaelchli Dec 24, 2020
c4370cc
fix prepare_data test and determine node_rank
awaelchli Dec 27, 2020
a8f080b
fix retrieving last path when testing
awaelchli Dec 27, 2020
ec94920
remove obsolete plugin argument
awaelchli Dec 27, 2020
144348e
fix test: test_trainer_config
awaelchli Dec 27, 2020
97936f8
fix torchscript tests
awaelchli Dec 27, 2020
e843661
fix trainer.model access
awaelchli Dec 27, 2020
d931d0c
move properties
awaelchli Dec 27, 2020
ee23e36
fix test_transfer_batch_hook
awaelchli Dec 27, 2020
5276a06
fix auto_select_gpus
awaelchli Dec 27, 2020
ec6807c
fix omegaconf test
awaelchli Dec 27, 2020
e09633b
fix test that needs to simulate slurm ddp
awaelchli Dec 27, 2020
ad38d18
add horovod plugin
awaelchli Dec 29, 2020
b5800e3
fix test with named arguments
awaelchli Dec 29, 2020
1519e62
clean up whitespace
awaelchli Dec 29, 2020
05a9939
fix datamodules test
awaelchli Dec 29, 2020
8c87da7
remove old accelerators
justusschock Jan 6, 2021
3532228
fix naming
justusschock Jan 6, 2021
c84f81f
move old plugins
justusschock Jan 6, 2021
15d50e2
move to plugins
justusschock Jan 6, 2021
a579dc8
create precision subpackage
justusschock Jan 6, 2021
78b6ed6
create training_type subpackage
justusschock Jan 6, 2021
391b589
fix all new import errors
awaelchli Jan 7, 2021
5fe7d33
fix wrong arguments order passed to test
awaelchli Jan 7, 2021
65fb1a1
fix LR finder
awaelchli Jan 10, 2021
8bbd18b
Added sharded training type and amp plugin
Jan 11, 2021
3d152b0
Move clip grad to precision plugin
Jan 11, 2021
51e23b7
Added sharded spawn, select accelerators based on distributed_backend…
Jan 12, 2021
4662c62
Fix import issue, attempting to fix tests
Jan 12, 2021
e0dc481
Fix initial test
Jan 12, 2021
a8b8b60
Reflect hook logic from master, should wrap model after move to device
Jan 14, 2021
19be001
remove unnecessary test file
justusschock Jan 21, 2021
8308ee7
optional state consolidation since spawn does not wrap optim on main …
justusschock Jan 21, 2021
c3f9595
change plugin type
justusschock Jan 21, 2021
6fe8038
move state loading to main proc only
justusschock Jan 21, 2021
4df8629
reset optimizers properly
justusschock Jan 21, 2021
944dcef
fix some imports
justusschock Jan 22, 2021
575b500
fix rebasing
justusschock Jan 22, 2021
51 changes: 13 additions & 38 deletions benchmarks/test_sharded_parity.py
@@ -15,14 +15,12 @@
import os
import platform
import time
from typing import Type, Union
from typing import Type

import pytest
import torch

from pytorch_lightning import seed_everything, Trainer
from pytorch_lightning.plugins.ddp_plugin import DDPPlugin
from pytorch_lightning.plugins.sharded_plugin import DDPShardedPlugin
from pytorch_lightning.utilities import _FAIRSCALE_AVAILABLE, _NATIVE_AMP_AVAILABLE
from tests.backends import DDPLauncher
from tests.base.boring_model import BoringModel, RandomDataset
@@ -32,10 +30,8 @@
@pytest.mark.skipif(platform.system() == "Windows", reason="Distributed training is not supported on Windows")
@pytest.mark.skipif(not _FAIRSCALE_AVAILABLE, reason="Fairscale is not available")
def test_ddp_sharded_plugin_correctness_one_gpu():
plugin_parity_test(
sharded_parity_test(
gpus=1,
accelerator='ddp_spawn',
plugin=DDPShardedPlugin(),
model_cls=SeedTrainLoaderModel,
)

@@ -45,11 +41,9 @@ def test_ddp_sharded_plugin_correctness_one_gpu():
@pytest.mark.skipif(platform.system() == "Windows", reason="Distributed training is not supported on Windows")
@pytest.mark.skipif(not _FAIRSCALE_AVAILABLE, reason="Fairscale is not available")
def test_ddp_sharded_plugin_correctness_amp_one_gpu():
plugin_parity_test(
sharded_parity_test(
gpus=1,
precision=16,
accelerator='ddp_spawn',
plugin=DDPShardedPlugin(),
model_cls=SeedTrainLoaderModel,
)

@@ -59,10 +53,8 @@ def test_ddp_sharded_plugin_correctness_amp_one_gpu():
@pytest.mark.skipif(platform.system() == "Windows", reason="Distributed training is not supported on Windows")
@pytest.mark.skipif(not _FAIRSCALE_AVAILABLE, reason="Fairscale is not available")
def test_ddp_sharded_plugin_correctness_multi_gpu():
plugin_parity_test(
sharded_parity_test(
gpus=2,
accelerator='ddp_spawn',
plugin=DDPShardedPlugin(),
model_cls=SeedTrainLoaderModel,
max_percent_speed_diff=0.25, # todo: Increase speed diff since only 2 GPUs sharding 2 optimizers
)
@@ -73,11 +65,9 @@ def test_ddp_sharded_plugin_correctness_multi_gpu():
@pytest.mark.skipif(torch.cuda.device_count() < 2, reason="test requires multi-GPU machine")
@pytest.mark.skipif(not _FAIRSCALE_AVAILABLE, reason="Fairscale is not available")
def test_ddp_sharded_plugin_correctness_amp_multi_gpu():
plugin_parity_test(
sharded_parity_test(
gpus=2,
precision=16,
accelerator='ddp_spawn',
plugin=DDPShardedPlugin(),
model_cls=SeedTrainLoaderModel,
max_percent_speed_diff=0.25, # todo: Increase speed diff since only 2 GPUs sharding 2 optimizers
)
@@ -88,11 +78,9 @@ def test_ddp_sharded_plugin_correctness_amp_multi_gpu():
@pytest.mark.skipif(torch.cuda.device_count() < 2, reason="test requires multi-GPU machine")
@pytest.mark.skipif(not _FAIRSCALE_AVAILABLE, reason="Fairscale is not available")
def test_ddp_string_sharded_plugin_correctness_amp_multi_gpu():
plugin_parity_test(
sharded_parity_test(
gpus=2,
precision=16,
accelerator='ddp_spawn',
plugin='ddp_sharded',
model_cls=SeedTrainLoaderModel,
max_percent_speed_diff=0.25, # todo: Increase speed diff since only 2 GPUs sharding 2 optimizers
)
@@ -104,11 +92,9 @@ def test_ddp_string_sharded_plugin_correctness_amp_multi_gpu():
reason="test should be run outside of pytest")
@DDPLauncher.run("--accelerator ddp --gpus 2 --precision 32")
def test_ddp_sharded_plugin_correctness_multi_gpu_ddp(tmpdir, args=None):
plugin_parity_test(
sharded_parity_test(
gpus=args.gpus,
precision=args.precision,
accelerator=args.accelerator,
plugin=DDPShardedPlugin(),
model_cls=SeedTrainLoaderModel,
)

@@ -119,11 +105,9 @@ def test_ddp_sharded_plugin_correctness_multi_gpu_ddp(tmpdir, args=None):
reason="test should be run outside of pytest")
@DDPLauncher.run("--accelerator ddp --gpus 2 --precision 16")
def test_ddp_sharded_plugin_correctness_amp_multi_gpu_ddp(tmpdir, args=None):
plugin_parity_test(
sharded_parity_test(
gpus=args.gpus,
precision=args.precision,
accelerator=args.accelerator,
plugin=DDPShardedPlugin(),
model_cls=SeedTrainLoaderModel,
)

@@ -136,10 +120,8 @@ def test_ddp_sharded_plugin_correctness_multi_gpu_multi_optim():
"""
Ensures same results using multiple optimizers across multiple GPUs
"""
plugin_parity_test(
plugin=DDPShardedPlugin(),
sharded_parity_test(
gpus=2,
accelerator='ddp_spawn',
model_cls=SeedTrainLoaderMultipleOptimizersModel,
max_percent_speed_diff=0.25, # todo: Increase speed diff since only 2 GPUs sharding 2 optimizers
)
@@ -153,10 +135,8 @@ def test_ddp_sharded_plugin_correctness_multi_gpu_multi_optim_manual(tmpdir):
"""
Ensures using multiple optimizers across multiple GPUs with manual optimization
"""
plugin_parity_test(
plugin=DDPShardedPlugin(),
sharded_parity_test(
gpus=2,
accelerator='ddp_spawn',
model_cls=SeedTrainLoaderManualModel,
max_percent_speed_diff=0.25, # todo: Increase speed diff since only 2 GPUs sharding 2 optimizers
)
@@ -253,11 +233,9 @@ def record_ddp_fit_model_stats(trainer, model, use_cuda):
return max_memory, total_time


def plugin_parity_test(
def sharded_parity_test(
model_cls: Type[SeedTrainLoaderModel],
plugin: Union[str, DDPPlugin],
seed: int = 42,
accelerator: str = 'ddp_spawn',
gpus: int = 0,
precision: int = 32,
max_percent_speed_diff: float = 0.1,
@@ -268,9 +246,7 @@

Args:
model_cls: Model class to use for test.
plugin: Plugin to parity test.
seed: Seed for generators. Note that this does not handle the seed for data-loading on multi-process.
accelerator: Accelerator type for test.
gpus: Number of GPUS to enable.
precision: Whether to use AMP or normal FP32 training.
max_percent_speed_diff: The maximum speed difference compared to normal DDP training.
@@ -288,7 +264,7 @@ def plugin_parity_test(
max_epochs=1,
gpus=gpus,
precision=precision,
accelerator=accelerator,
accelerator='ddp_spawn',
)

max_memory_ddp, ddp_time = record_ddp_fit_model_stats(
@@ -306,8 +282,7 @@ def plugin_parity_test(
max_epochs=1,
gpus=gpus,
precision=precision,
accelerator=accelerator,
plugins=[plugin],
accelerator='ddp_sharded_spawn',
)

max_memory_custom, custom_model_time = record_ddp_fit_model_stats(
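The net effect of this file's changes is that sharded DDP is now requested through the accelerator string rather than through an explicit plugin instance. A minimal sketch of the before/after Trainer construction, not part of the diff; the `BoringModel` import mirrors the one already used in this test file, and the `gpus`/`precision` values are illustrative:

```python
# Sketch only: shows how sharded training selection changes with this refactor.
from pytorch_lightning import Trainer
from tests.base.boring_model import BoringModel

model = BoringModel()

# Before: sharded training required an explicit plugin instance
# trainer = Trainer(gpus=2, precision=16, accelerator='ddp_spawn',
#                   plugins=[DDPShardedPlugin()])

# After: the accelerator string alone selects the sharded spawn backend
trainer = Trainer(
    max_epochs=1,
    gpus=2,
    precision=16,
    accelerator='ddp_sharded_spawn',
)
trainer.fit(model)
```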
11 changes: 5 additions & 6 deletions pl_examples/bug_report_model.py
@@ -55,24 +55,23 @@ class BoringModel(LightningModule):
def __init__(self):
"""
Testing PL Module

Use as follows:
- subclass
- modify the behavior for what you want

class TestModel(BaseTestModel):
def training_step(...):
# do your own thing

or:

model = BaseTestModel()
model.training_epoch_end = None

"""
super().__init__()
self.layer = torch.nn.Linear(32, 2)

@property
def automatic_optimization(self):
return True

def forward(self, x):
return self.layer(x)

@@ -81,7 +80,7 @@ def loss(self, batch, prediction):
return torch.nn.functional.mse_loss(prediction, torch.ones_like(prediction))

def step(self, x):
x = self.layer(x)
x = self(x)
out = torch.nn.functional.mse_loss(x, torch.ones_like(x))
return out

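The updated docstring above describes the intended bug-report workflow (subclass, override a hook, disable what you don't need). A short usage sketch under assumptions: the `RandomDataset` helper is assumed to ship alongside `BoringModel` in this example file, and `fast_dev_run=True` is purely illustrative:

```python
# Usage sketch following the BoringModel docstring; RandomDataset(32, 64) and
# fast_dev_run=True are illustrative assumptions, not part of this diff.
from torch.utils.data import DataLoader

from pl_examples.bug_report_model import BoringModel, RandomDataset
from pytorch_lightning import Trainer


class TestModel(BoringModel):
    def training_step(self, batch, batch_idx):
        # do your own thing: here we just reuse the parent loss computation
        loss = self.step(batch)
        return {"loss": loss}


model = TestModel()
model.training_epoch_end = None  # drop the epoch-end hook, as in the docstring

trainer = Trainer(fast_dev_run=True)
trainer.fit(model, DataLoader(RandomDataset(32, 64), batch_size=2))
```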
29 changes: 4 additions & 25 deletions pytorch_lightning/accelerators/__init__.py
@@ -1,25 +1,4 @@
# Copyright The PyTorch Lightning team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from pytorch_lightning.accelerators.accelerator import Accelerator # noqa: F401
from pytorch_lightning.accelerators.cpu_accelerator import CPUAccelerator # noqa: F401
from pytorch_lightning.accelerators.ddp2_accelerator import DDP2Accelerator # noqa: F401
from pytorch_lightning.accelerators.ddp_accelerator import DDPAccelerator # noqa: F401
from pytorch_lightning.accelerators.ddp_cpu_hpc_accelerator import DDPCPUHPCAccelerator # noqa: F401
from pytorch_lightning.accelerators.ddp_cpu_spawn_accelerator import DDPCPUSpawnAccelerator # noqa: F401
from pytorch_lightning.accelerators.ddp_hpc_accelerator import DDPHPCAccelerator # noqa: F401
from pytorch_lightning.accelerators.ddp_spawn_accelerator import DDPSpawnAccelerator # noqa: F401
from pytorch_lightning.accelerators.dp_accelerator import DataParallelAccelerator # noqa: F401
from pytorch_lightning.accelerators.gpu_accelerator import GPUAccelerator # noqa: F401
from pytorch_lightning.accelerators.horovod_accelerator import HorovodAccelerator # noqa: F401
from pytorch_lightning.accelerators.tpu_accelerator import TPUAccelerator # noqa: F401
from pytorch_lightning.accelerators.accelerator import Accelerator
from pytorch_lightning.accelerators.cpu import CPUAccelerator
from pytorch_lightning.accelerators.gpu import GPUAccelerator
from pytorch_lightning.accelerators.tpu import TPUAccelerator
Review comment from a project member on lines +1 to +4:

👍 ... similar to trainer new attributes _distrib_type and _device_type
on the other hand, it will be breaking change... :/

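The review comment above refers to the four top-level names that the new `__init__.py` re-exports. A small sketch of the slimmed-down public surface; the imports are taken directly from the diff, while the `describe()` helper is illustrative and not part of the PR:

```python
# The four imports below are exactly what the new accelerators/__init__.py
# re-exports; describe() is an illustrative helper, not part of the PR.
from pytorch_lightning.accelerators import (
    Accelerator,
    CPUAccelerator,
    GPUAccelerator,
    TPUAccelerator,
)


def describe(accelerator: Accelerator) -> str:
    # Device-specific subclasses replace the old per-backend accelerators
    # (DDPAccelerator, DataParallelAccelerator, ...) removed in this diff.
    if isinstance(accelerator, GPUAccelerator):
        return "gpu"
    if isinstance(accelerator, TPUAccelerator):
        return "tpu"
    if isinstance(accelerator, CPUAccelerator):
        return "cpu"
    return "unknown"
```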