
Commit

Merge branch 'master' into bugfix/batch-device
awaelchli committed Jul 1, 2021
2 parents 5ddeaec + d51b0ae commit 88ca10d
Showing 96 changed files with 1,813 additions and 407 deletions.
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -91,7 +91,7 @@ jobs:
docker:
- image: circleci/python:3.7
environment:
- XLA_VER: 1.7
- XLA_VER: 1.8
- MAX_CHECKS: 240
- CHECK_SPEEP: 5
steps:
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -36,6 +36,7 @@

# Specifics
/pytorch_lightning/trainer/connectors/logger_connector @tchaton @carmocca
/pytorch_lightning/trainer/progress.py @tchaton @awaelchli @carmocca

# Metrics
/pytorch_lightning/metrics/ @SkafteNicki @ananyahjha93 @justusschock
16 changes: 9 additions & 7 deletions .github/CONTRIBUTING.md
@@ -2,6 +2,8 @@

Welcome to the PyTorch Lightning community! We're building the most advanced research platform on the planet to implement the latest, best practices that the amazing PyTorch team rolls out!

If you are new to open source, check out [this blog to get started with your first Open Source contribution](https://devblog.pytorchlightning.ai/quick-contribution-guide-86d977171b3a).

## Main Core Value: One less thing to remember

Simplify the API as much as possible from the user perspective.
@@ -58,13 +60,13 @@ Have a favorite feature from other libraries like fast.ai or transformers? Those

## Contribution Types

We are always looking for help implementing new features or fixing bugs.
We are always open to contributions of new features or bug fixes.

A lot of good work has already been done in project mechanics (requirements.txt, setup.py, pep8, badges, ci, etc...) so we're in a good state there thanks to all the early contributors (even pre-beta release)!

### Bug Fixes:

1. If you find a bug please submit a github issue.
1. If you find a bug please submit a GitHub issue.

- Make sure the title explains the issue.
- Describe your setup, what you are trying to do, expected vs. actual behaviour. Please add configs and code samples.
@@ -79,12 +81,12 @@ A lot of good work has already been done in project mechanics (requirements.txt,

3. Submit a PR!

_**Note**, even if you do not find the solution, sending a PR with a test covering the issue is a valid contribution and we can help you or finish it with you :]_
_**Note**, even if you do not find the solution, sending a PR with a test covering the issue is a valid contribution, and we can help you or finish it with you :]_

### New Features:

1. Submit a github issue - describe what is the motivation of such feature (adding the use case or an example is helpful).
2. Let's discuss to determine the feature scope.
1. Submit a GitHub issue - describe what is the motivation of such feature (adding the use case, or an example is helpful).
2. Determine the feature scope with us.
3. Submit a PR! We recommend test driven approach to adding new features as well:

- Write a test for the functionality you want to add.
@@ -199,7 +201,7 @@ Note: if your computer does not have multi-GPU nor TPU these tests are skipped.
**GitHub Actions:** For convenience, you can also use your own GHActions building which will be triggered with each commit.
This is useful if you do not test against all required dependency versions.

**Docker:** Another option is utilize the [pytorch lightning cuda base docker image](https://hub.docker.com/repository/docker/pytorchlightning/pytorch_lightning/tags?page=1&name=cuda). You can then run:
**Docker:** Another option is to utilize the [pytorch lightning cuda base docker image](https://hub.docker.com/repository/docker/pytorchlightning/pytorch_lightning/tags?page=1&name=cuda). You can then run:

```bash
python -m pytest pytorch_lightning tests pl_examples -v
@@ -230,7 +232,7 @@ We welcome any useful contribution! For your convenience here's a recommended wo
- Make sure all tests are passing.
- Make sure you add a GitHub issue to your PR.
5. Use tags in PR name for following cases:
- **[blocked by #<number>]** if you work is depending on others changes.
- **[blocked by #<number>]** if your work is dependent on other PRs.
- **[wip]** when you start to re-edit your work, mark it so no one will accidentally merge it in meantime.

### Question & Answer
9 changes: 5 additions & 4 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -41,13 +41,14 @@ wget https://raw.githubusercontent.com/PyTorchLightning/pytorch-lightning/master
python collect_env_details.py
```

- PyTorch Version (e.g., 1.0):
- OS (e.g., Linux):
- How you installed PyTorch (`conda`, `pip`, source):
- Build command you used (if compiling from source):
- PyTorch Lightning Version (e.g., 1.3.0):
- PyTorch Version (e.g., 1.8)
- Python version:
- OS (e.g., Linux):
- CUDA/cuDNN version:
- GPU models and configuration:
- How you installed PyTorch (`conda`, `pip`, source):
- If compiling from source, the output of `torch.__config__.show()`:
- Any other relevant information:

### Additional context
52 changes: 46 additions & 6 deletions CHANGELOG.md
@@ -30,9 +30,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Added support for checkpointing based on a provided time interval during training ([#7515](https://github.com/PyTorchLightning/pytorch-lightning/pull/7515))


- Added dataclasses for progress tracking (
[#6603](https://github.com/PyTorchLightning/pytorch-lightning/pull/6603),
[#7574](https://github.com/PyTorchLightning/pytorch-lightning/pull/7574))
- Progress tracking
* Added dataclasses for progress tracking ([#6603](https://github.com/PyTorchLightning/pytorch-lightning/pull/6603), [#7574](https://github.com/PyTorchLightning/pytorch-lightning/pull/7574), [#8140](https://github.com/PyTorchLightning/pytorch-lightning/pull/8140))
* Add `{,load_}state_dict` to the progress tracking dataclasses ([#8140](https://github.com/PyTorchLightning/pytorch-lightning/pull/8140))


- Added support for passing a `LightningDataModule` positionally as the second argument to `trainer.{validate,test,predict}` ([#7431](https://github.com/PyTorchLightning/pytorch-lightning/pull/7431))
@@ -84,11 +84,14 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).


- Fault-tolerant training
* Add `{,load_}state_dict` to `ResultCollection` ([#7948](https://github.com/PyTorchLightning/pytorch-lightning/pull/7948))
* Checkpoint the loop results ([#7966](https://github.com/PyTorchLightning/pytorch-lightning/pull/7966))
* Added `{,load_}state_dict` to `ResultCollection` ([#7948](https://github.com/PyTorchLightning/pytorch-lightning/pull/7948))
* Added `{,load_}state_dict` to `Loops` ([#8197](https://github.com/PyTorchLightning/pytorch-lightning/pull/8197))


- Add `rank_zero_only` to `LightningModule.log` function ([#7966](https://github.com/PyTorchLightning/pytorch-lightning/pull/7966))
- Added `rank_zero_only` to `LightningModule.log` function ([#7966](https://github.com/PyTorchLightning/pytorch-lightning/pull/7966))


- Added `metric_attribute` to `LightningModule.log` function ([#7966](https://github.com/PyTorchLightning/pytorch-lightning/pull/7966))


- Added a warning if `Trainer(log_every_n_steps)` is a value too high for the training dataloader ([#7734](https://github.com/PyTorchLightning/pytorch-lightning/pull/7734))
@@ -115,9 +118,18 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Add support for calling scripts using the module syntax (`python -m package.script`) ([#8073](https://github.com/PyTorchLightning/pytorch-lightning/pull/8073))


- Add support for optimizers and learning rate schedulers to `LightningCLI` ([#8093](https://github.com/PyTorchLightning/pytorch-lightning/pull/8093))


- Add torchelastic check when sanitizing GPUs ([#8095](https://github.com/PyTorchLightning/pytorch-lightning/pull/8095))


- Added XLA Profiler ([#8014](https://github.com/PyTorchLightning/pytorch-lightning/pull/8014))


- Added `max_depth` parameter in `ModelSummary` ([#8062](https://github.com/PyTorchLightning/pytorch-lightning/pull/8062))


### Changed


@@ -220,6 +232,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- `Trainer(resume_from_checkpoint=...)` now restores the model directly after `LightningModule.setup()`, which is before `LightningModule.configure_sharded_model()` ([#7652](https://github.com/PyTorchLightning/pytorch-lightning/pull/7652))


- Added a mechanism to detect `deadlock` for `DDP` when only 1 process triggers an `Exception`. The mechanism will `kill the processes` when it happens ([#8167](https://github.com/PyTorchLightning/pytorch-lightning/pull/8167))


### Deprecated


@@ -253,9 +268,15 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Deprecated the use of `CheckpointConnector.hpc_load()` in favor of `CheckpointConnector.restore()` ([#7652](https://github.com/PyTorchLightning/pytorch-lightning/pull/7652))


- Deprecated `DDPPlugin.task_idx` in favor of `DDPPlugin.local_rank` ([#8203](https://github.com/PyTorchLightning/pytorch-lightning/pull/8203))


- Deprecated the `Trainer.train_loop` property in favor of `Trainer.fit_loop` ([#8025](https://github.com/PyTorchLightning/pytorch-lightning/pull/8025))


- Deprecated `mode` parameter in `ModelSummary` in favor of `max_depth` ([#8062](https://github.com/PyTorchLightning/pytorch-lightning/pull/8062))


### Removed

- Removed `ProfilerConnector` ([#7654](https://github.com/PyTorchLightning/pytorch-lightning/pull/7654))
@@ -285,6 +306,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
### Fixed


- Fixed SWA to also work with `IterableDataset` ([#8172](https://github.com/PyTorchLightning/pytorch-lightning/pull/8172))

- Fixed `lr_scheduler` checkpointed state by calling `update_lr_schedulers` before saving checkpoints ([#7877](https://github.com/PyTorchLightning/pytorch-lightning/pull/7877))


@@ -315,6 +338,23 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Fixed a DDP info message that was never shown ([#8111](https://github.com/PyTorchLightning/pytorch-lightning/pull/8111))


- Fixed metrics generated during `validation sanity checking` are cleaned on end ([#8171](https://github.com/PyTorchLightning/pytorch-lightning/pull/8171))


- Fixed a bug where an infinite recursion would be triggered when using the `BaseFinetuning` callback on a model that contains a `ModuleDict` ([#8170](https://github.com/PyTorchLightning/pytorch-lightning/pull/8170))


- Fixed NCCL error when selecting non-consecutive device ids ([#8165](https://github.com/PyTorchLightning/pytorch-lightning/pull/8165))


- Fixed `log_gpu_memory` metrics not being added to `logging` when nothing else is logged ([#8174](https://github.com/PyTorchLightning/pytorch-lightning/pull/8174))


- Fixed a bug where calling `log` with a `Metric` instance would raise an error if it was a nested attribute of the model ([#8181](https://github.com/PyTorchLightning/pytorch-lightning/pull/8181))


- Fixed a bug where using `precision=64` would cause buffers with complex dtype to be cast to real ([#8208](https://github.com/PyTorchLightning/pytorch-lightning/pull/8208))

- Fixes access to `callback_metrics` in ddp_spawn ([#7916](https://github.com/PyTorchLightning/pytorch-lightning/pull/7916))


4 changes: 3 additions & 1 deletion README.md
@@ -369,7 +369,9 @@ class LitAutoEncoder(pl.LightningModule):

The lightning community is maintained by
- [10+ core contributors](https://pytorch-lightning.readthedocs.io/en/latest/governance.html) who are all a mix of professional engineers, Research Scientists, and Ph.D. students from top AI labs.
- 400+ community contributors.
- 480+ active community contributors.

Want to help us build Lightning and reduce boilerplate for thousands of researchers? [Learn how to make your first contribution here](https://devblog.pytorchlightning.ai/quick-contribution-guide-86d977171b3a)

Lightning is also part of the [PyTorch ecosystem](https://pytorch.org/ecosystem/) which requires projects to have solid testing, documentation and support.

1 change: 1 addition & 0 deletions dockers/tpu-tests/tpu_test_cases.jsonnet
@@ -22,6 +22,7 @@ local tputests = base.BaseTest {
|||
cd pytorch-lightning
coverage run --source=pytorch_lightning -m pytest -v --capture=no \
tests/profiler/test_xla_profiler.py \
pytorch_lightning/utilities/xla_device.py \
tests/accelerators/test_tpu_backend.py \
tests/models/test_tpu.py
10 changes: 10 additions & 0 deletions docs/source/_templates/layout.html
@@ -0,0 +1,10 @@
{% extends "!layout.html" %}
<link rel="canonical" href="{{ theme_canonical_url }}{{ pagename }}.html" />

{% block footer %}
{{ super() }}
<script script type="text/javascript">
var collapsedSections = ['Best practices', 'Lightning API', 'Optional extensions', 'Tutorials', 'API References', 'Bolts', 'Examples', 'Common Use Cases', 'Partner Domain Frameworks', 'Community'];
</script>

{% endblock %}
2 changes: 2 additions & 0 deletions docs/source/_templates/theme_variables.jinja
@@ -14,5 +14,7 @@
'blog': 'https://www.pytorchlightning.ai/blog',
'resources': 'https://pytorch-lightning.readthedocs.io/en/latest/#community-examples',
'support': 'https://pytorch-lightning.rtfd.io/en/latest/',
'community': 'https://pytorch-lightning.slack.com',
'forums': 'https://pytorch-lightning.slack.com',
}
-%}
119 changes: 117 additions & 2 deletions docs/source/common/lightning_cli.rst
@@ -1,6 +1,7 @@
.. testsetup:: *
    :skipif: not _JSONARGPARSE_AVAILABLE

    import torch
    from unittest import mock
    from typing import List
    from pytorch_lightning.core.lightning import LightningModule
@@ -385,7 +386,7 @@ instantiating the trainer class can be found in :code:`self.config['trainer']`.


Configurable callbacks
~~~~~~~~~~~~~~~~~~~~~~
^^^^^^^^^^^^^^^^^^^^^^

As explained previously, any callback can be added by including it in the config via :code:`class_path` and
:code:`init_args` entries. However, there are other cases in which a callback should always be present and be
@@ -417,7 +418,7 @@ To change the configuration of the :code:`EarlyStopping` in the config it would
Argument linking
~~~~~~~~~~~~~~~~
^^^^^^^^^^^^^^^^

Another case in which it might be desired to extend :class:`~pytorch_lightning.utilities.cli.LightningCLI` is that the
model and data module depend on a common parameter. For example in some cases both classes require to know the
@@ -470,3 +471,117 @@ Instantiation links are used to automatically determine the order of instantiati
The linking of arguments can be used for more complex cases. For example to derive a value via a function that takes
multiple settings as input. For more details have a look at the API of `link_arguments
<https://jsonargparse.readthedocs.io/en/stable/#jsonargparse.core.ArgumentParser.link_arguments>`_.
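
A minimal sketch of what such a multi-setting link could look like is shown below. It is illustrative only: the
:code:`batch_size` and :code:`effective_batch_size` parameters are hypothetical, and it assumes the underlying
jsonargparse parser accepts a tuple of source keys together with a :code:`compute_fn`.

.. code-block:: python

    class MyLightningCLI(LightningCLI):

        def add_arguments_to_parser(self, parser):
            # Forward a single value from the data module to the model (hypothetical parameters).
            parser.link_arguments("data.batch_size", "model.batch_size")
            # Derive a value from several settings via a function.
            parser.link_arguments(
                ("data.batch_size", "trainer.accumulate_grad_batches"),
                "model.effective_batch_size",
                compute_fn=lambda batch_size, accumulate: batch_size * (accumulate or 1),
            )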


Optimizers and learning rate schedulers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Optimizers and learning rate schedulers can also be made configurable. The most common case is when a model only has a
single optimizer and optionally a single learning rate scheduler. In this case the model's
:class:`~pytorch_lightning.core.lightning.LightningModule` could be left without implementing the
:code:`configure_optimizers` method since it is normally always the same and just adds boilerplate. The following code
snippet shows how to implement it:

.. testcode::

    import torch
    from pytorch_lightning.utilities.cli import LightningCLI

    class MyLightningCLI(LightningCLI):

        def add_arguments_to_parser(self, parser):
            parser.add_optimizer_args(torch.optim.Adam)
            parser.add_lr_scheduler_args(torch.optim.lr_scheduler.ExponentialLR)

    cli = MyLightningCLI(MyModel)

With this the :code:`configure_optimizers` method is automatically implemented and in the config the :code:`optimizer`
and :code:`lr_scheduler` groups would accept all of the options for the given classes, in this example :code:`Adam` and
:code:`ExponentialLR`. Therefore, the config file would be structured like:

.. code-block:: yaml

    optimizer:
      lr: 0.01
    lr_scheduler:
      gamma: 0.2
    model:
      ...
    trainer:
      ...

And any of these arguments could be passed directly through command line. For example:

.. code-block:: bash

    $ python train.py --optimizer.lr=0.01 --lr_scheduler.gamma=0.2

There is also the possibility of selecting among multiple classes by giving them as a tuple. For example:

.. testcode::

    class MyLightningCLI(LightningCLI):

        def add_arguments_to_parser(self, parser):
            parser.add_optimizer_args((torch.optim.SGD, torch.optim.Adam))

In this case in the config the :code:`optimizer` group instead of having directly init settings, it should specify
:code:`class_path` and optionally :code:`init_args`. Sub-classes of the classes in the tuple would also be accepted.
A corresponding example of the config file would be:

.. code-block:: yaml

    optimizer:
      class_path: torch.optim.Adam
      init_args:
        lr: 0.01
    model:
      ...
    trainer:
      ...

And the same through command line:

.. code-block:: bash

    $ python train.py --optimizer='{class_path: torch.optim.Adam, init_args: {lr: 0.01}}'

The automatic implementation of :code:`configure_optimizers` can be disabled by linking the configuration group. An
example can be :code:`ReduceLROnPlateau` which requires to specify a monitor. This would be:

.. testcode::

    from pytorch_lightning.utilities.cli import instantiate_class, LightningCLI

    class MyModel(LightningModule):

        def __init__(self, optimizer_init: dict, lr_scheduler_init: dict):
            super().__init__()
            self.optimizer_init = optimizer_init
            self.lr_scheduler_init = lr_scheduler_init

        def configure_optimizers(self):
            optimizer = instantiate_class(self.parameters(), self.optimizer_init)
            scheduler = instantiate_class(optimizer, self.lr_scheduler_init)
            return {"optimizer": optimizer, "lr_scheduler": scheduler, "monitor": "metric_to_track"}

    class MyLightningCLI(LightningCLI):

        def add_arguments_to_parser(self, parser):
            parser.add_optimizer_args(
                torch.optim.Adam,
                link_to='model.optimizer_init',
            )
            parser.add_lr_scheduler_args(
                torch.optim.lr_scheduler.ReduceLROnPlateau,
                link_to='model.lr_scheduler_init',
            )

    cli = MyLightningCLI(MyModel)

For both possibilities of using :meth:`pytorch_lightning.utilities.cli.LightningArgumentParser.add_optimizer_args` with
a single class or a tuple of classes, the value given to :code:`optimizer_init` will always be a dictionary including
:code:`class_path` and :code:`init_args` entries. The function
:func:`~pytorch_lightning.utilities.cli.instantiate_class` takes care of importing the class defined in
:code:`class_path` and instantiating it using some positional arguments, in this case :code:`self.parameters()`, and the
:code:`init_args`. Any number of optimizers and learning rate schedulers can be added when using :code:`link_to`.
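
As a minimal sketch (with hand-written values standing in for what the CLI would normally inject, and a
hypothetical :code:`model` instance), the dictionary and the call would look roughly like:

.. code-block:: python

    from pytorch_lightning.utilities.cli import instantiate_class

    # Shape of the dictionary that link_to='model.optimizer_init' provides to the model
    # (values here are hypothetical examples).
    optimizer_init = {
        "class_path": "torch.optim.Adam",
        "init_args": {"lr": 0.01},
    }

    # Roughly equivalent to: torch.optim.Adam(model.parameters(), lr=0.01)
    optimizer = instantiate_class(model.parameters(), optimizer_init)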