[RFC] Future structure #13265

Closed
wants to merge 20 commits into from
10 changes: 5 additions & 5 deletions .actions/pull_legacy_checkpoints.sh
@@ -1,9 +1,9 @@
#!/bin/bash
# Run this script from the project root.
URL="https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip"
mkdir -p legacy
mkdir -p test/legacy
# wget is simpler but does not work on Windows
python -c "from urllib.request import urlretrieve; urlretrieve('$URL', 'legacy/checkpoints.zip')"
ls -l legacy/
unzip -o legacy/checkpoints.zip -d legacy/
ls -l legacy/checkpoints/
python -c "from urllib.request import urlretrieve; urlretrieve('$URL', 'test/legacy/checkpoints.zip')"
ls -l test/legacy/
unzip -o test/legacy/checkpoints.zip -d test/legacy/
ls -l test/legacy/checkpoints/
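
The script above uses Python only for the download step. For comparison, here is a minimal stdlib-only Python sketch of the whole sequence (create the directory, download, unpack), which also avoids depending on an `unzip` binary. It is not part of the PR. The function name `fetch_and_extract` is hypothetical, and the code assumes the archive unpacks into a `checkpoints/` folder, as the script's final `ls` suggests.

```python
import zipfile
from pathlib import Path
from urllib.request import urlretrieve

# Same archive URL as in the script above.
URL = "https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip"


def fetch_and_extract(url: str, dest_dir: str) -> Path:
    """Download a zip archive into dest_dir and unpack it there.

    Mirrors the script's steps: mkdir -p, urlretrieve, unzip -o.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    archive = dest / "checkpoints.zip"
    urlretrieve(url, archive)  # pure Python, so no wget is needed (works on Windows)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)  # overwrites existing files, like `unzip -o`
    return dest
```

Called from the project root as `fetch_and_extract(URL, "test/legacy")`, it would reproduce the new `test/legacy/` layout.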
17 changes: 12 additions & 5 deletions .azure-pipelines/gpu-benchmark.yml
@@ -11,6 +11,7 @@ trigger:
include:
- "master"
- "release/*"
- "future/*"
Contributor comment: This should be removed before merge, right?

- "refs/tags/*"

pr: none
@@ -34,8 +35,14 @@ jobs:
clean: all

steps:
- bash: |
python -m pytest tests/benchmarks -v --durations=0
displayName: 'Testing: benchmarks'
env:
PL_RUNNING_BENCHMARKS: 1

- bash: |
pip install -e . -r requirements/strategies.txt
pip list
displayName: 'Install package'
Contributor comment on lines +39 to +42: This is not necessary.


- bash: python -m pytest unittests_pl/benchmarks -v --durations=0
env:
PL_RUNNING_BENCHMARKS: 1
workingDirectory: test
displayName: 'Testing: benchmarks'
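
Each pipeline in this PR adds `future/*` to its branch filters. To illustrate which refs such an include list selects, here is a rough Python sketch using `fnmatch`. It only approximates Azure Pipelines' wildcard matching (here `*` also crosses `/`, and `exclude` lists are not modelled), and `triggers` is a hypothetical helper, not something from the repository.

```python
from fnmatch import fnmatchcase
from typing import Sequence

# Branch filters from the `trigger: branches: include:` lists after this PR.
INCLUDE = ("master", "release/*", "future/*", "refs/tags/*")


def triggers(ref: str, include: Sequence[str] = INCLUDE) -> bool:
    """Return True if a git ref matches any include pattern.

    Approximation only: fnmatch's `*` also matches "/", and exclude lists are ignored.
    """
    # Plain branch names such as "master" stand for refs/heads/<name>.
    short = ref[len("refs/heads/"):] if ref.startswith("refs/heads/") else ref
    return any(fnmatchcase(short, pat) or fnmatchcase(ref, pat) for pat in include)
```

Removing `future/*` before merge, as the review comment asks, would make `triggers("refs/heads/future/structure", ("master", "release/*", "refs/tags/*"))` return False.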
42 changes: 24 additions & 18 deletions .azure-pipelines/gpu-tests.yml
@@ -11,13 +11,15 @@ trigger:
include:
- "master"
- "release/*"
- "future/*"
- "refs/tags/*"
pr:
- "master"
- "release/*"
- "future/*"

jobs:
- job: pytest
- job: testing
strategy:
matrix:
'PyTorch - LTS':
@@ -28,15 +30,12 @@ jobs:
timeoutInMinutes: "100"
# how much time to give 'run always even if cancelled tasks' before stopping them
cancelTimeoutInMinutes: "2"

pool: azure-jirka-spot

container:
image: $(image)
# default shm size is 64m. Increase it to avoid:
# 'Error while creating shared memory: unhandled system error, NCCL version 2.7.8'
options: "--runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all --shm-size=512m"

workspace:
clean: all

@@ -56,8 +55,9 @@ jobs:
python -c "fname = 'requirements/strategies.txt' ; lines = [line for line in open(fname).readlines() if 'horovod' not in line] ; open(fname, 'w').writelines(lines)"
CUDA_VERSION_MM=$(python -c "import torch ; print(''.join(map(str, torch.version.cuda.split('.')[:2])))")
pip install "bagua-cuda$CUDA_VERSION_MM>=0.9.0"
pip install . --requirement requirements/devel.txt
pip install . --requirement requirements/strategies.txt
pip install -e .
pip install --requirement requirements/devel.txt
pip install --requirement requirements/strategies.txt
Contributor comment on lines +58 to +60:
Suggested change
pip install -e .
pip install --requirement requirements/devel.txt
pip install --requirement requirements/strategies.txt
pip install -e .[strategies]
pip install --requirement requirements/devel.txt

pip list
displayName: 'Install dependencies'
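
The two `python -c` one-liners at the start of this install step are dense. Written out, the first removes every line mentioning `horovod` from `requirements/strategies.txt`, and the second keeps the major and minor parts of `torch.version.cuda` (for example "11.3" becomes "113") to build the `bagua-cuda$CUDA_VERSION_MM` package name. The sketch below is an illustrative rewrite with hypothetical function names; it takes the CUDA version as a plain string so it runs without torch installed.

```python
from pathlib import Path


def drop_requirement(fname: str, name: str = "horovod") -> None:
    """Rewrite a requirements file without any line containing `name`."""
    path = Path(fname)
    lines = path.read_text().splitlines(keepends=True)
    path.write_text("".join(line for line in lines if name not in line))


def cuda_mm(cuda_version: str) -> str:
    """Join the major and minor parts of a CUDA version string: '11.3' -> '113'."""
    return "".join(cuda_version.split(".")[:2])
```

Using `Path.read_text`/`write_text` also closes the file handles, which the inline one-liner leaves to the interpreter.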

@@ -72,12 +72,16 @@ jobs:
- bash: bash .actions/pull_legacy_checkpoints.sh
displayName: 'Get legacy checkpoints'

- bash: |
python -m coverage run --source pytorch_lightning -m pytest pytorch_lightning tests --ignore tests/benchmarks -v --junitxml=$(Build.StagingDirectory)/test-results.xml --durations=50
displayName: 'Testing: standard'
- bash: python -m coverage run --source pytorch_lightning -m pytest pytorch_lightning
workingDirectory: src
displayName: 'Testing: doctests'

- bash: |
bash tests/standalone_tests.sh
- bash: python -m coverage run --source pytorch_lightning -m pytest unittests_pl --ignore unittests_pl/benchmarks -v --junitxml=$(Build.StagingDirectory)/test-results.xml --durations=50
displayName: 'Testing: unittests'
workingDirectory: test

- bash: bash run_standalone_tests.sh
workingDirectory: test
env:
PL_USE_MOCKED_MNIST: "1"
displayName: 'Testing: standalone'
@@ -86,8 +90,9 @@ jobs:
python -m coverage report
python -m coverage xml
python -m coverage html
python -m codecov --token=$(CODECOV_TOKEN) --commit=$(Build.SourceVersion) --flags=gpu,pytest --name="GPU-coverage" --env=linux,azure
python -m codecov --token=$(CODECOV_TOKEN) --commit=$(Build.SourceVersion) --flags=gpu,unittest --name="GPU-coverage" --env=linux,azure
Contributor comment:
Suggested change
python -m codecov --token=$(CODECOV_TOKEN) --commit=$(Build.SourceVersion) --flags=gpu,unittest --name="GPU-coverage" --env=linux,azure
python -m codecov --token=$(CODECOV_TOKEN) --commit=$(Build.SourceVersion) --flags=gpu,pytest --name="GPU-coverage" --env=linux,azure

ls -l
workingDirectory: test
displayName: 'Statistics'

- task: PublishTestResults@2
@@ -109,14 +114,15 @@ jobs:

- script: |
set -e
python -m pytest pl_examples -v --maxfail=2 --durations=0
bash pl_examples/run_examples.sh --trainer.accelerator=gpu --trainer.devices=1
bash pl_examples/run_examples.sh --trainer.accelerator=gpu --trainer.devices=2 --trainer.strategy=ddp
bash pl_examples/run_examples.sh --trainer.accelerator=gpu --trainer.devices=2 --trainer.strategy=ddp --trainer.precision=16
bash run_ddp_examples.sh
Contributor comment: Let's keep the test here with the other standalone tests.

bash run_pl_examples.sh --trainer.accelerator=gpu --trainer.devices=1
Contributor comment: This script should be examples/pytorch/run_examples.sh

bash run_pl_examples.sh --trainer.accelerator=gpu --trainer.devices=2 --trainer.strategy=ddp
bash run_pl_examples.sh --trainer.accelerator=gpu --trainer.devices=2 --trainer.strategy=ddp --trainer.precision=16
workingDirectory: examples
env:
PL_USE_MOCKED_MNIST: "1"
displayName: 'Testing: examples'

- bash: |
python -m pytest tests/benchmarks -v --maxfail=2 --durations=0
- bash: python -m pytest unittests_pl/benchmarks -v --maxfail=2 --durations=0
workingDirectory: test
displayName: 'Testing: benchmarks'
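
Most command changes in this file follow one mechanical rule: a test path `tests/...` run from the repository root becomes `unittests_pl/...` run with `workingDirectory: test`. A hypothetical helper that applies the rule to a single path might look like this; it only captures the pattern visible in the diff and is not a script from the PR.

```python
from pathlib import PurePosixPath
from typing import Tuple

OLD_ROOT = "tests"         # repo-root test package before this PR
NEW_ROOT = "unittests_pl"  # its new location, relative to the test/ directory
WORKDIR = "test"           # value used for `workingDirectory:` in the updated steps


def migrate_test_path(old: str) -> Tuple[str, str]:
    """Map a repo-root test path to (workingDirectory, path relative to it)."""
    parts = PurePosixPath(old).parts
    if not parts or parts[0] != OLD_ROOT:
        raise ValueError(f"not under {OLD_ROOT}/: {old}")
    return WORKDIR, str(PurePosixPath(NEW_ROOT, *parts[1:]))
```

For example, `migrate_test_path("tests/benchmarks")` gives `("test", "unittests_pl/benchmarks")`, matching the benchmark step above.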
24 changes: 15 additions & 9 deletions .azure-pipelines/hpu-tests.yml
@@ -8,21 +8,20 @@ trigger:
include:
- "master"
- "release/*"
- "future/*"
- "refs/tags/*"
pr:
- "master"
- "release/*"
- "future/*"

jobs:
- job: tests

- job: testing
# how long to run the job before automatically cancelling
timeoutInMinutes: "10"
# how much time to give 'run always even if cancelled tasks' before stopping them
cancelTimeoutInMinutes: "2"

pool: intel-hpus

workspace:
clean: all

@@ -33,25 +32,32 @@ jobs:
displayName: 'Instance HW info'

- bash: |
pip install . --requirement requirements/extra.txt
pip install -e .[extra]
pip install . --requirement requirements/test.txt
displayName: 'Install dependencies'

- bash: |
python -m pytest -sv tests/accelerators/test_hpu.py --forked --junitxml=hpu1_test-results.xml
python -m pytest -sv unittests_pl/accelerators/test_hpu.py --forked --junitxml=hpu1_test-results.xml
workingDirectory: test
displayName: 'Single card HPU test'

- bash: |
python -m pytest -sv tests/accelerators/test_hpu.py --forked --hpus 8 --junitxml=hpu8_test-results.xml
python -m pytest -sv unittests_pl/accelerators/test_hpu.py --forked --hpus 8 --junitxml=hpu8_test-results.xml
workingDirectory: test
displayName: 'Multi card(8) HPU test'

- bash: |
python -m pytest -sv tests/plugins/precision/hpu/test_hpu.py --hmp-bf16 'tests/plugins/precision/hpu/ops_bf16.txt' --hmp-fp32 'tests/plugins/precision/hpu/ops_fp32.txt' --forked --junitxml=hpu1_precision_test-results.xml
python -m pytest -sv unittests_pl/plugins/precision/hpu/test_hpu.py --hmp-bf16 \
'unittests_pl/plugins/precision/hpu/ops_bf16.txt' --hmp-fp32 \
'unittests_pl/plugins/precision/hpu/ops_fp32.txt' --forked \
--junitxml=hpu1_precision_test-results.xml
workingDirectory: test
displayName: 'HPU precision test'

- bash: |
export PYTHONPATH="${PYTHONPATH}:$(pwd)"
python "pl_examples/hpu_examples/simple_mnist/mnist.py"
python "pl_hpu/mnist_sample.py"
Contributor comment: The structure here should be examples/pytorch/hpu

workingDirectory: examples
displayName: 'Testing: HPU examples'

- task: PublishTestResults@2
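
Both the suggested `pip install -e .[strategies]` and the new `pip install -e .[extra]` assume that `setup.py` exposes those names as optional-dependency groups ("extras"). One common way to do that is to load each `requirements/<name>.txt` into `extras_require`. The sketch below covers only that loading step; `load_requirements` and `build_extras` are hypothetical names, and nested `-r` includes and other pip options are simply skipped rather than resolved.

```python
import re
from pathlib import Path
from typing import Dict, List

# A "#" starts a comment at line start or after whitespace (so URL fragments survive).
COMMENT = re.compile(r"(^|\s)#.*$")


def load_requirements(path: Path) -> List[str]:
    """Return requirement specifiers from a requirements file."""
    reqs = []
    for raw in path.read_text().splitlines():
        line = COMMENT.sub("", raw).strip()
        if line and not line.startswith("-"):  # skip blanks and pip options such as -r
            reqs.append(line)
    return reqs


def build_extras(req_dir: Path, names: List[str]) -> Dict[str, List[str]]:
    """Map each extra name (e.g. "strategies") to requirements/<name>.txt."""
    return {name: load_requirements(req_dir / f"{name}.txt") for name in names}
```

In a `setup.py`, the result would be passed as `setup(..., extras_require=build_extras(Path("requirements"), ["extra", "strategies"]))`.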
31 changes: 19 additions & 12 deletions .azure-pipelines/ipu-tests.yml
@@ -5,23 +5,23 @@ trigger:
branches:
include:
- master
- release/*
- refs/tags/*
- "release/*"
- "future/*"
- "refs/tags/*"
pr:
- master
- release/*
- "release/*"
- "future/*"

variables:
- name: poplar_sdk
value: "poplar_sdk-ubuntu_20_04-2.3.1+793-89796d462d"

jobs:
- job: tests

- job: testing
# how long to run the job before automatically cancelling
timeoutInMinutes: "15"
pool: graphcore-ipus

workspace:
clean: all

@@ -55,7 +55,7 @@ jobs:
export GIT_TERMINAL_PROMPT=1
python ./requirements/adjust-versions.py requirements/extra.txt
python ./requirements/adjust-versions.py requirements/examples.txt
pip install . --requirement ./requirements/devel.txt
pip install -e . --requirement ./requirements/devel.txt
pip list
displayName: 'Install dependencies'

@@ -68,16 +68,23 @@ jobs:
set -eux
source ${{ variables.poplar_sdk }}/poplar-ubuntu*/enable.sh
source ${{ variables.poplar_sdk }}/popart-ubuntu*/enable.sh

python -c "import poptorch; print(poptorch.__version__)"
displayName: "Check poptorch installation"

- bash: |
source ${{ variables.poplar_sdk }}/poplar-ubuntu*/enable.sh
source ${{ variables.poplar_sdk }}/popart-ubuntu*/enable.sh
export POPTORCH_WAIT_FOR_IPU=1
export PL_RUN_IPU_TESTS=1
python -m coverage run --source pytorch_lightning -m pytest tests -vv --junitxml=$(Build.StagingDirectory)/test-results.xml --durations=50
cd src
python -m pytest pytorch_lightning
displayName: 'DocTests'
Contributor comment: I wouldn't add doctesting to these accelerator jobs, as it complicates the pipeline for contributors.


- bash: |
source ${{ variables.poplar_sdk }}/poplar-ubuntu*/enable.sh
source ${{ variables.poplar_sdk }}/popart-ubuntu*/enable.sh
cd test
python -m coverage run --source pytorch_lightning -m pytest unittests_pl -vv --durations=50
env:
MKL_THREADING_LAYER: "GNU"
displayName: 'Testing: standard'
POPTORCH_WAIT_FOR_IPU: 1
PL_RUN_IPU_TESTS: 1
displayName: 'UnitTests'
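
The IPU flags are set here both through `export` lines and, in the new 'UnitTests' step, through the step's `env:` mapping; either way the test process sees them as string environment variables. A test suite would typically read such opt-in flags along the lines of the sketch below, which treats "1" as enabled and anything else, including an unset variable, as disabled. The helper `env_flag` is hypothetical and not taken from the project.

```python
import os
from typing import Mapping, Optional


def env_flag(name: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """True only when the variable is set to "1" (the value the pipelines use)."""
    env = os.environ if environ is None else environ
    return env.get(name, "0").strip() == "1"


# e.g. gate IPU-only tests on the flag set in the pipeline's env: block
RUN_IPU_TESTS = env_flag("PL_RUN_IPU_TESTS")
```

With pytest, a value like `RUN_IPU_TESTS` would usually feed a `pytest.mark.skipif` condition.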
3 changes: 2 additions & 1 deletion .circleci/config.yml
@@ -13,6 +13,7 @@ trigger:
include:
- "master"
- "release/*"
- "future/*"
- "refs/tags/*"
pr:
- "master"
@@ -40,7 +41,7 @@ references:
# the image uses python 2.7 by default, force a different version
pyenv global 3.7.3
python --version
pip install -r requirements/docs.txt
pip install -e . -r requirements/docs.txt
pip list
cd docs
make clean
74 changes: 37 additions & 37 deletions .github/CODEOWNERS
@@ -16,48 +16,48 @@
/docs/ @edenlightning @tchaton @borda @awaelchli @RobertLaurella
/.github/*.md @edenlightning @williamfalcon @borda
/.github/ISSUE_TEMPLATE/ @edenlightning @borda @tchaton
/docs/source/conf.py @borda @awaelchli @carmocca
/docs/source/index.rst @williamfalcon
/docs/source/levels @williamfalcon @RobertLaurella
/docs/source/expertise_levels @williamfalcon @RobertLaurella
/docs/source-PL/conf.py @borda @awaelchli @carmocca
Contributor comment: Why do we need the "-" separator (source-PL) instead of docs/pytorch_lightning/source/...?

/docs/source-PL/index.rst @williamfalcon
/docs/source-PL/levels @williamfalcon @RobertLaurella
/docs/source-PL/expertise_levels @williamfalcon @RobertLaurella

# Packages
/pytorch_lightning/accelerators @williamfalcon @tchaton @SeanNaren @awaelchli @justusschock @kaushikb11
/pytorch_lightning/callbacks @williamfalcon @tchaton @carmocca @borda @kaushikb11
/pytorch_lightning/core @tchaton @SeanNaren @borda @carmocca @justusschock @kaushikb11
/pytorch_lightning/distributed @williamfalcon @tchaton @awaelchli @kaushikb11
/pytorch_lightning/lite @tchaton @awaelchli @carmocca
/pytorch_lightning/loggers @tchaton @awaelchli @borda
/pytorch_lightning/loggers/wandb.py @borisdayma
/pytorch_lightning/loggers/neptune.py @shnela @HubertJaworski @pkasprzyk @pitercl @Raalsky @aniezurawski @kamil-kaczmarek
/pytorch_lightning/loops @tchaton @awaelchli @justusschock @carmocca
/pytorch_lightning/overrides @tchaton @SeanNaren @borda
/pytorch_lightning/plugins @tchaton @SeanNaren @awaelchli @justusschock
/pytorch_lightning/profiler @williamfalcon @tchaton @borda @carmocca
/pytorch_lightning/profiler/pytorch.py @nbcsm @guotuofeng
/pytorch_lightning/strategies @tchaton @SeanNaren @awaelchli @justusschock @kaushikb11
/pytorch_lightning/trainer @williamfalcon @borda @tchaton @SeanNaren @carmocca @awaelchli @justusschock @kaushikb11
/pytorch_lightning/trainer/connectors @tchaton @SeanNaren @carmocca @borda
/pytorch_lightning/tuner @SkafteNicki @borda @awaelchli
/pytorch_lightning/utilities @borda @tchaton @SeanNaren @carmocca
/src/pytorch_lightning/accelerators @williamfalcon @tchaton @SeanNaren @awaelchli @justusschock @kaushikb11
/src/pytorch_lightning/callbacks @williamfalcon @tchaton @carmocca @borda @kaushikb11
/src/pytorch_lightning/core @tchaton @SeanNaren @borda @carmocca @justusschock @kaushikb11
/src/pytorch_lightning/distributed @williamfalcon @tchaton @awaelchli @kaushikb11
/src/pytorch_lightning/lite @tchaton @awaelchli @carmocca
/src/pytorch_lightning/loggers @tchaton @awaelchli @borda
/src/pytorch_lightning/loggers/wandb.py @borisdayma
/src/pytorch_lightning/loggers/neptune.py @shnela @HubertJaworski @pkasprzyk @pitercl @Raalsky @aniezurawski @kamil-kaczmarek
/src/pytorch_lightning/loops @tchaton @awaelchli @justusschock @carmocca
/src/pytorch_lightning/overrides @tchaton @SeanNaren @borda
/src/pytorch_lightning/plugins @tchaton @SeanNaren @awaelchli @justusschock
/src/pytorch_lightning/profiler @williamfalcon @tchaton @borda @carmocca
/src/pytorch_lightning/profiler/pytorch.py @nbcsm @guotuofeng
/src/pytorch_lightning/strategies @tchaton @SeanNaren @awaelchli @justusschock @kaushikb11
/src/pytorch_lightning/trainer @williamfalcon @borda @tchaton @SeanNaren @carmocca @awaelchli @justusschock @kaushikb11
/src/pytorch_lightning/trainer/connectors @tchaton @SeanNaren @carmocca @borda
/src/pytorch_lightning/tuner @SkafteNicki @borda @awaelchli
/src/pytorch_lightning/utilities @borda @tchaton @SeanNaren @carmocca

# Specifics
/pytorch_lightning/trainer/connectors/logger_connector @tchaton @carmocca
/pytorch_lightning/trainer/progress.py @tchaton @awaelchli @carmocca
/src/pytorch_lightning/trainer/connectors/logger_connector @tchaton @carmocca
/src/pytorch_lightning/trainer/progress.py @tchaton @awaelchli @carmocca

# API
/pytorch_lightning/callbacks/base.py @williamfalcon @awaelchli @ananthsub @carmocca
/pytorch_lightning/core/datamodule.py @williamFalcon @awaelchli @ananthsub @carmocca
/pytorch_lightning/trainer/trainer.py @williamfalcon @tchaton @awaelchli
/pytorch_lightning/core/hooks.py @williamfalcon @tchaton @awaelchli @ananthsub @carmocca
/pytorch_lightning/core/lightning.py @williamfalcon @tchaton @awaelchli
/src/pytorch_lightning/callbacks/base.py @williamfalcon @awaelchli @ananthsub @carmocca
/src/pytorch_lightning/core/datamodule.py @williamFalcon @awaelchli @ananthsub @carmocca
/src/pytorch_lightning/trainer/trainer.py @williamfalcon @tchaton @awaelchli
/src/pytorch_lightning/core/hooks.py @williamfalcon @tchaton @awaelchli @ananthsub @carmocca
/src/pytorch_lightning/core/lightning.py @williamfalcon @tchaton @awaelchli

# Testing
/tests/helpers/boring_model.py @williamfalcon @tchaton @borda

/.github/CODEOWNERS @williamfalcon
/.github/approve_config.yml @williamfalcon
/SECURITY.md @williamfalcon
/README.md @williamfalcon @edenlightning @borda
/setup.py @williamfalcon @borda @carmocca
/pytorch_lightning/__about__.py @williamfalcon @borda @carmocca
/test/unittests_pl/helpers/boring_model.py @williamfalcon @tchaton @borda

/.github/CODEOWNERS @williamfalcon
/.github/approve_config.yml @williamfalcon
/SECURITY.md @williamfalcon
/README.md @williamfalcon @edenlightning @borda
/setup.py @williamfalcon @borda @carmocca
/src/pytorch_lightning/__about__.py @williamfalcon @borda @carmocca
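
The CODEOWNERS changes are equally mechanical: package paths gain a `/src` prefix, `/tests/` becomes `/test/unittests_pl/`, and `/docs/source/` becomes `/docs/source-PL/`. A sketch of that rewrite, assuming those three prefix mappings are the only rules needed (the mapping table and function name are hypothetical):

```python
from typing import Iterable, List

# Old path prefix -> new path prefix, as observed in this diff.
PREFIX_MAP = {
    "/pytorch_lightning/": "/src/pytorch_lightning/",
    "/tests/": "/test/unittests_pl/",
    "/docs/source/": "/docs/source-PL/",
}


def migrate_codeowners(lines: Iterable[str]) -> List[str]:
    """Rewrite the path column of CODEOWNERS rules; comments, blanks and owners are kept."""
    out = []
    for line in lines:
        if line.startswith("/"):
            path, sep, owners = line.partition(" ")
            for old, new in PREFIX_MAP.items():
                if path.startswith(old):
                    path = new + path[len(old):]
                    break
            line = path + sep + owners
        out.append(line)
    return out
```

Rules such as `/docs/ ...` or `/.github/CODEOWNERS ...` match none of the prefixes and pass through unchanged.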