WIP: Accelerator refactor #5385 (Closed)

justusschock wants to merge 122 commits into Lightning-AI:master from justusschock:accelerator-refactor.

Commits (122)
b932996  move to old package  (justusschock)
eb804fb  add initial draft of new accelerators  (justusschock)
0e69b07  add initial data parallel draft  (justusschock)
c66a1bf  add initial precision draft  (justusschock)
f88dad2  scheduler helper functions  (justusschock)
e4d6e1e  define base plugin api  (justusschock)
a309425  base plugin integration  (justusschock)
0c7dc10  continue ddp plugin  (justusschock)
d0fa39c  minor changes precision plugin  (justusschock)
7fb18e9  start ddp plugin  (justusschock)
f85923b  initail version ddp spawn  (justusschock)
72069df  remove deprecated implementation  (justusschock)
6d5c764  add comment on whats missing  (justusschock)
c1b2fc2  latest state  (justusschock)
6eba30b  update accelerator for model to live in traintype plugin  (justusschock)
8f05699  add general plugin interface  (justusschock)
4fb42d5  add model properties  (justusschock)
89448ed  Trainer integration part 1 for CPU accelerator  (awaelchli)
bd42a88  test single gpu trainer integration  (awaelchli)
8602b47  make device changes a bit less hardcoded  (justusschock)
bdd0eb0  properly resolve attributes  (justusschock)
7f2224b  add properties for accelerator forwarding  (justusschock)
f1441a3  correct optimizer_step calls  (justusschock)
ea2a92b  call train or test  (awaelchli)
71c0170  make calls to trainstep (ad fix bugs)  (justusschock)
89540ea  remove gradient_clip_val from accelerator  (awaelchli)
817522f  add back the step end methods  (awaelchli)
d2778ef  add precision todo comment  (awaelchli)
e0e6bb9  ddp  (awaelchli)
d7cd92e  clean up  (awaelchli)
b1bdc19  connect  (awaelchli)
be4ae59  clean up  (awaelchli)
1b9c9bf  post  (awaelchli)
f060b4a  pst  (awaelchli)
39cb046  disable progress bar on rank > 0  (awaelchli)
c5072e1  precision test  (justusschock)
baf5881  fix native amp  (justusschock)
20d2375  a  (awaelchli)
16efd44  adsf  (awaelchli)
a952ded  undo  (awaelchli)
0c9a34d  ddp spawn  (awaelchli)
386590a  a  (awaelchli)
05e87a2  spawn  (awaelchli)
1f0cde8  ad  (awaelchli)
70fe209  d  (awaelchli)
476c561  finish ddp plugin integration  (awaelchli)
cee0a3b  remove logger from plugins  (awaelchli)
63e4031  setup  (awaelchli)
9f6d52a  remove logger arg  (awaelchli)
dc2bbe6  module  (awaelchli)
de8fa73  clean up  (awaelchli)
b714cfb  clean up  (awaelchli)
fba75ab  ddp_cpu integration  (awaelchli)
a8f9fd0  cuda context manager for emptying cache  (awaelchli)
7bd0581  args  (awaelchli)
8d70f37  move "log_gpu_memory" to logger connector  (awaelchli)
8b3ab41  fix imports  (justusschock)
1d7968d  typo  (justusschock)
853b4ac  remove todo  (justusschock)
6657daa  add rpc_enabled flag  (justusschock)
0aca3b0  remove unused self arg  (justusschock)
3c3463c  comment out unnexessary amp part  (justusschock)
808b548  fix model connector  (justusschock)
33cc785  fix import  (justusschock)
e9a30cc  copy properties only once  (justusschock)
5153522  add cluster env  (awaelchli)
29d2c3c  move slurm configuration  (awaelchli)
d3cf3e6  resolve importerrors  (awaelchli)
acf8014  handle distributed_sampler_kwargs  (awaelchli)
2c586a7  move emptying cache to accelertor  (awaelchli)
4509cfc  fix a few tests  (awaelchli)
304946c  restoring the result from subprocess  (awaelchli)
0f63ff9  fix queue.get() order for results  (awaelchli)
fb3748f  add missing "block_backward_sync" context manager  (awaelchli)
db13592  add missing "block_backward_sync" context manager  (awaelchli)
19f232b  fix sync_batchnorm  (awaelchli)
7fa62c6  fix supported gpu-ids for tuple  (awaelchli)
50a4cd2  fix clip gradients and inf recursion  (awaelchli)
82870ba  accelerator selection: added cluster_environment plugin  (awaelchli)
b713c38  fix torchelastic test  (awaelchli)
e71ead8  fix reduce early stopping decision for DDP  (awaelchli)
3b86b6f  fix tests: callbacks, conversion to lightning optimizer  (awaelchli)
b596d71  fix lightning optimizer does not pickle  (awaelchli)
d3d29a9  fix setting benchmark and deterministic option  (awaelchli)
773f178  fix slurm amp test  (awaelchli)
c4370cc  fix prepare_data test and determine node_rank  (awaelchli)
a8f080b  fix retrieving last path when testing  (awaelchli)
ec94920  remove obsolete plugin argument  (awaelchli)
144348e  fix test: test_trainer_config  (awaelchli)
97936f8  fix torchscript tests  (awaelchli)
e843661  fix trainer.model access  (awaelchli)
d931d0c  move properties  (awaelchli)
ee23e36  fix test_transfer_batch_hook  (awaelchli)
5276a06  fix auto_select_gpus  (awaelchli)
ec6807c  fix omegaconf test  (awaelchli)
e09633b  fix test that needs to simulate slurm ddp  (awaelchli)
ad38d18  add horovod plugin  (awaelchli)
b5800e3  fix test with named arguments  (awaelchli)
1519e62  clean up whitespace  (awaelchli)
05a9939  fix datamodules test  (awaelchli)
8c87da7  remove old accelerators  (justusschock)
3532228  fix naming  (justusschock)
c84f81f  move old plugins  (justusschock)
15d50e2  move to plugins  (justusschock)
a579dc8  create precision subpackage  (justusschock)
78b6ed6  create training_type subpackage  (justusschock)
391b589  fix all new import errors  (awaelchli)
5fe7d33  fix wrong arguments order passed to test  (awaelchli)
65fb1a1  fix LR finder  (awaelchli)
8bbd18b  Added sharded training type and amp plugin
3d152b0  Move clip grad to precision plugin
51e23b7  Added sharded spawn, select accelerators based on distributed_backend…
4662c62  Fix import issue, attempting to fix tests
e0dc481  Fix initial test
a8b8b60  Reflect hook logic from master, should wrap model after move to device
19be001  remove unnecessary test file  (justusschock)
8308ee7  optional state consolidation since spawn does not wrap optim on main …  (justusschock)
c3f9595  change plugin type  (justusschock)
6fe8038  move state loading to main proc only  (justusschock)
4df8629  reset optimizers properly  (justusschock)
944dcef  fix some imports  (justusschock)
575b500  fix rebasing  (justusschock)
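Two of the commit messages above name techniques that are easy to miss from the one-line summaries. Commits fb3748f and db13592 add a "block_backward_sync" context manager; the standard way to do this in PyTorch is DistributedDataParallel's no_sync() context manager, which skips the gradient all-reduce for backward passes run inside it. A minimal sketch, assuming a DDP-wrapped model; the helper name mirrors the commit message, and the exact signature in the PR is not shown here:

```python
from contextlib import contextmanager

from torch.nn.parallel import DistributedDataParallel


@contextmanager
def block_backward_sync(model):
    """Skip inter-process gradient sync for backward passes run inside this
    context; gradients still accumulate locally on each rank."""
    if isinstance(model, DistributedDataParallel):
        # no_sync() is DDP's built-in context manager that disables the
        # gradient all-reduce hooks for the duration of the block.
        with model.no_sync():
            yield
    else:
        # Not DDP-wrapped: there is no cross-process sync to block.
        yield
```

During gradient accumulation, a trainer would enter this context for every micro-batch except the last, so the expensive all-reduce runs only once per optimizer step.

Similarly, commit a8f9fd0 adds a "cuda context manager for emptying cache". A plausible sketch of that idea (the PR's actual implementation is not shown): run a block of work, then release PyTorch's cached CUDA memory afterwards:

```python
from contextlib import contextmanager

import torch


@contextmanager
def empty_cache_on_exit():
    # Hypothetical helper name. Releases cached (unused) CUDA allocations
    # back to the driver once the block finishes, even on exceptions.
    try:
        yield
    finally:
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
```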
One file diff from the PR is shown below (judging by its contents, this is the package init of pytorch_lightning/accelerators):

@@ -1,25 +1,4 @@
-# Copyright The PyTorch Lightning team.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-from pytorch_lightning.accelerators.accelerator import Accelerator  # noqa: F401
-from pytorch_lightning.accelerators.cpu_accelerator import CPUAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.ddp2_accelerator import DDP2Accelerator  # noqa: F401
-from pytorch_lightning.accelerators.ddp_accelerator import DDPAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.ddp_cpu_hpc_accelerator import DDPCPUHPCAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.ddp_cpu_spawn_accelerator import DDPCPUSpawnAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.ddp_hpc_accelerator import DDPHPCAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.ddp_spawn_accelerator import DDPSpawnAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.dp_accelerator import DataParallelAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.gpu_accelerator import GPUAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.horovod_accelerator import HorovodAccelerator  # noqa: F401
-from pytorch_lightning.accelerators.tpu_accelerator import TPUAccelerator  # noqa: F401
+from pytorch_lightning.accelerators.accelerator import Accelerator
+from pytorch_lightning.accelerators.cpu import CPUAccelerator
+from pytorch_lightning.accelerators.gpu import GPUAccelerator
+from pytorch_lightning.accelerators.tpu import TPUAccelerator
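This diff captures the core of the refactor: the per-backend accelerator classes (DDP, DDP2, spawn variants, Horovod, DataParallel) disappear from this package, leaving only device-level accelerators, while distribution strategies move into plugins (see the "create precision subpackage" and "create training_type subpackage" commits above). A short sketch of the effect on downstream imports:

```python
# Before this PR, one accelerator class existed per device/distribution
# combination, e.g. (no longer importable after the refactor):
# from pytorch_lightning.accelerators import DDPSpawnAccelerator

# After this PR, the package re-exports only device-level accelerators:
from pytorch_lightning.accelerators import Accelerator, CPUAccelerator, GPUAccelerator, TPUAccelerator
```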
A review comment on the change:
👍 ... similar to the trainer's new attributes _distrib_type and _device_type; on the other hand, it will be a breaking change... :/
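For context, a minimal sketch of the pattern that comment points at: replacing scattered boolean flags on the trainer with two string-backed enums. The names follow the comment, but the members below are illustrative assumptions, not the exact definitions from the PR:

```python
from enum import Enum


class DeviceType(str, Enum):
    # Illustrative members only.
    CPU = "cpu"
    GPU = "gpu"
    TPU = "tpu"


class DistributedType(str, Enum):
    # Illustrative members only.
    DP = "dp"
    DDP = "ddp"
    DDP_SPAWN = "ddp_spawn"
    HOROVOD = "horovod"


# The trainer would then carry, for example:
#   trainer._device_type = DeviceType.GPU
#   trainer._distrib_type = DistributedType.DDP
# replacing many separate flags (on_gpu, use_ddp, ...), which is the
# breaking change the comment worries about.
```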