Lightning Dataset (including optimized dataloading of s3 buckets) #17743

Merged Jun 13, 2023 (250 commits)
Changes from 202 commits
Commits
0ec5afd
Lightning DataLoader
Jun 1, 2023
ad6a05e
lightning dataloader
Jun 1, 2023
475b75d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 1, 2023
741e168
init
Jun 1, 2023
0b74481
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 1, 2023
8dd1cf0
example
Jun 1, 2023
ba06925
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 1, 2023
500c2d9
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 1, 2023
4e398d5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 1, 2023
481ed41
env var
Jun 1, 2023
47ef6b9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 1, 2023
0d046bf
Update src/lightning/pytorch/utilities/data/__init__.py
nohalon Jun 1, 2023
0cb617e
remove unused functions
Jun 1, 2023
bd7d1e6
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 1, 2023
3b373ed
extra reqs
Jun 1, 2023
aa0d423
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 1, 2023
7585c4b
Update src/lightning/pytorch/utilities/data/fileio.py
nohalon Jun 1, 2023
1374542
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 1, 2023
34f5b01
imports work now! yay
Jun 1, 2023
812d6fa
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 1, 2023
bff5587
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 1, 2023
996a32a
tests
Jun 1, 2023
ce3e949
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 1, 2023
58fea16
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 1, 2023
91ec960
imports
Jun 1, 2023
4f509c3
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 1, 2023
7a65260
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 1, 2023
665b409
missing import
Jun 1, 2023
7c57fee
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 1, 2023
4529f53
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 1, 2023
6333177
error handling
Jun 2, 2023
a6d456c
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 2, 2023
d50dbe0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 2, 2023
c9799f5
update creds for local use case
Jun 2, 2023
01411b7
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 2, 2023
66c8316
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 2, 2023
a84ebd4
codeowners
Jun 2, 2023
1662e02
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 2, 2023
fc0a5c3
recursive get index
nohalon Jun 2, 2023
1c9cfc4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 2, 2023
15114e1
index
nohalon Jun 2, 2023
4aa66c8
Merge branch 'lightning_dataloader' of https://github.com/Lightning-A…
nohalon Jun 2, 2023
ba08475
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 2, 2023
a1242fe
clean up get index
Jun 2, 2023
8a5c826
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 2, 2023
33af351
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 2, 2023
a505bea
update imagenet example
Jun 2, 2023
b4680c9
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 2, 2023
8f0e10c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 2, 2023
be0d2bb
docstrings
Jun 2, 2023
9d27632
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 2, 2023
e37043a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 2, 2023
2680b5b
docstrings
Jun 2, 2023
c4d308f
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 2, 2023
7f13685
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 2, 2023
851b8e1
docstrings
Jun 2, 2023
10c9cd3
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 2, 2023
11197a8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 2, 2023
ccf258e
example cleanup
Jun 2, 2023
d943fc1
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 2, 2023
3d899d0
Merge branch 'master' into lightning_dataloader
nohalon Jun 2, 2023
3df8cbe
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 2, 2023
054f52a
changelog
Jun 2, 2023
d55191a
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 2, 2023
18bf064
reqs
Jun 2, 2023
5482c9b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 2, 2023
cff7d75
codeowners
Jun 2, 2023
50581fb
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 2, 2023
176cf98
requirements
Jun 2, 2023
993c98d
expose LightningDataset too
Jun 2, 2023
43dd253
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 2, 2023
bea0787
expost LightningDataset at top level
Jun 2, 2023
151535d
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 2, 2023
0b5647a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 2, 2023
ec49b6d
remove unused private methods from init
Jun 6, 2023
029c086
remove private imports
Jun 6, 2023
3f0f3f1
upper bound on extra requirements
Jun 6, 2023
f82c08c
review comments
Jun 6, 2023
25e541b
Merge branch 'master' of github.com:Lightning-AI/lightning into light…
Jun 6, 2023
e13c381
loosen req
Jun 6, 2023
74d24ac
deps
Jun 6, 2023
8697134
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 6, 2023
69c4a6c
test updating fabric base req
Jun 6, 2023
a4aec63
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 6, 2023
a2071ad
remove version pin on s3fs to test
Jun 6, 2023
92d27a0
recover missing function
Jun 6, 2023
3573c39
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 6, 2023
58dbfe6
tests
Jun 6, 2023
e6e21bd
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 6, 2023
d3ba8ef
update
Jun 6, 2023
0715b19
random
Jun 6, 2023
d29c896
torchdata >= 0.3.0
Borda Jun 6, 2023
bb99b24
update torchdata version
Jun 6, 2023
2d8b3dd
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 6, 2023
f929c1d
remove torchdata version to test
Jun 6, 2023
9cb59c0
try rem torch version pin
Jun 6, 2023
4c81a68
req
Jun 6, 2023
bb3ab55
update bucket in test
Jun 6, 2023
51b08ef
req
Jun 6, 2023
fd1466e
skips
Borda Jun 6, 2023
10a809a
Merge branch 'lightning_dataloader' of https://github.com/PyTorchLigh…
Borda Jun 6, 2023
e2d0e9c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 6, 2023
b4de7a6
import
Jun 6, 2023
35dc574
update structure to lightning.data
Jun 6, 2023
59b2800
base.txt for data reqs
Jun 6, 2023
45119c2
fix imports
Jun 6, 2023
4be3d03
rename to LightningS3Dataset
Jun 6, 2023
6fe029f
new workflow
Jun 6, 2023
76bb646
dont need to test warnings
Jun 6, 2023
a6e04fc
reqs
Jun 6, 2023
9296738
req
Jun 6, 2023
5d0b870
revert data folder in pytorch
Jun 6, 2023
6c6ffc5
test import
Jun 6, 2023
e1077e9
tests
Jun 6, 2023
aa38485
req
Jun 6, 2023
660d789
req
Jun 6, 2023
24c8a0b
req
Jun 6, 2023
b6cca17
torch version
Jun 6, 2023
1597da5
req
Jun 6, 2023
49e8250
req
Jun 6, 2023
e467cc7
open dep
Jun 6, 2023
d6c36b7
reformatted
Jun 6, 2023
e962f4c
pin strict
Jun 6, 2023
c85213c
pin strict extra
Jun 6, 2023
77b160f
req
Jun 6, 2023
a97e9ae
modify workflow, no cache
Jun 6, 2023
632707b
try
Jun 6, 2023
c877815
patch
Jun 6, 2023
cceed3e
import
Jun 6, 2023
6209c16
fix
Jun 6, 2023
ba86531
dataset test
Jun 7, 2023
1d3021c
update getattr
Jun 7, 2023
a8a4652
pin everything to test
Jun 7, 2023
9f5939c
remove torch preinstall from workflow
Jun 7, 2023
3dcb399
workflow
Jun 7, 2023
cc19771
req
Jun 7, 2023
0c5e90c
Update .github/workflows/ci-tests-data.yml
nohalon Jun 7, 2023
0657c13
workflow
Jun 7, 2023
71a6e1e
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 7, 2023
6befe61
workflow
Jun 7, 2023
87c6993
req
Jun 7, 2023
2ad2c3f
Update .github/workflows/ci-tests-data.yml
nohalon Jun 7, 2023
06a6144
workflow
Jun 7, 2023
153c4dc
print
Jun 7, 2023
06c6a30
skip test for now
Jun 7, 2023
0a05d0b
update path join
Jun 7, 2023
b551ce3
revert app dep version bump
Jun 7, 2023
ae26f72
Update .github/workflows/ci-tests-data.yml
nohalon Jun 7, 2023
5cf05bc
workflow updates
Jun 7, 2023
7ff127b
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 7, 2023
936e647
app base req
Jun 7, 2023
708782c
req
Jun 7, 2023
acc95cf
window test failure
Jun 7, 2023
436ccda
add data req to assistant
Jun 7, 2023
5342dcf
Merge branch 'master' into lightning_dataloader
justusschock Jun 7, 2023
8239a7b
try
justusschock Jun 7, 2023
88830e0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 7, 2023
46d2964
add missing comma
justusschock Jun 7, 2023
4cf5715
updates
justusschock Jun 7, 2023
1c65df7
update
justusschock Jun 7, 2023
8f7d89f
Merge branch 'lightning_dataloader' of https://github.com/lightning-a…
justusschock Jun 7, 2023
41f7e46
typo
justusschock Jun 7, 2023
29bd7f8
requirements
justusschock Jun 7, 2023
d818307
try widening req
Jun 7, 2023
6fa96d0
older torch version
Jun 7, 2023
ca9ba49
update
justusschock Jun 7, 2023
d7c5be5
Merge branch 'lightning_dataloader' of https://github.com/lightning-a…
justusschock Jun 7, 2023
5f1b8d2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 7, 2023
303a2fa
update
justusschock Jun 7, 2023
049d909
update
justusschock Jun 7, 2023
3c6677d
update
justusschock Jun 7, 2023
40caf27
update
justusschock Jun 7, 2023
8773b16
cleanup tests
justusschock Jun 7, 2023
798e310
Merge branch 'lightning_dataloader' of https://github.com/lightning-a…
justusschock Jun 7, 2023
a86236b
typo again
justusschock Jun 7, 2023
ab10edb
update
justusschock Jun 7, 2023
d8e7273
remove unnecessary line
justusschock Jun 7, 2023
9a02922
Merge branch 'lightning_dataloader' of https://github.com/lightning-a…
justusschock Jun 7, 2023
6a11c08
Update .github/CODEOWNERS
justusschock Jun 7, 2023
6ddce8f
Discard changes to requirements/pytorch/base.txt
justusschock Jun 7, 2023
5b5491f
Discard changes to requirements/fabric/base.txt
justusschock Jun 7, 2023
2994cfe
Discard changes to requirements/app/base.txt
justusschock Jun 7, 2023
be7cffb
Merge branch 'master' into lightning_dataloader
justusschock Jun 7, 2023
dd7428f
requirements
justusschock Jun 7, 2023
20815f3
requirements
justusschock Jun 7, 2023
1c0050a
one line
Jun 8, 2023
4e9ca29
app workflow pick only app reqs
Jun 8, 2023
4159015
rename package
Jun 8, 2023
355bfb6
undo
Jun 8, 2023
365ac24
don't use cache
Jun 8, 2023
9ea77de
examples CI
Jun 8, 2023
72c6bb2
pytorch and fabric CI
Jun 8, 2023
81e09e9
try remove cache
Jun 8, 2023
fa6a309
Apply suggestions from code review
Borda Jun 9, 2023
b31fe8f
Merge branch 'master' into lightning_dataloader
Borda Jun 9, 2023
d119adb
jirka playing
Borda Jun 9, 2023
2a8e06f
jirka playing
Borda Jun 9, 2023
851c7b4
jirka playing
Borda Jun 9, 2023
ff42309
blah
Jun 9, 2023
906425c
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 9, 2023
24f02c4
flatten LightningDataset
Jun 9, 2023
03d14d6
cleans up dataset class
Jun 9, 2023
0f23411
jirka playing
Borda Jun 9, 2023
4c12c72
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 9, 2023
f90d683
jirka playing
Borda Jun 9, 2023
0b2fd0e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 9, 2023
772cbe7
extra
Borda Jun 9, 2023
38f2576
Merge branch 'lightning_dataloader' of https://github.com/PyTorchLigh…
Borda Jun 9, 2023
4aae186
fix dataset test
Jun 12, 2023
15e049d
update checkgroups
Jun 12, 2023
9becd1a
Luca's review comments
Jun 12, 2023
77da80c
val error fix
Jun 12, 2023
1fcecdd
unskip test
Jun 12, 2023
20189fa
Merge branch 'master' of github.com:Lightning-AI/lightning into light…
Jun 12, 2023
a5c70d0
min
Borda Jun 12, 2023
e00b7c6
fix precommit warning
Jun 12, 2023
82c08ae
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 12, 2023
03cfc1a
cpu
Borda Jun 12, 2023
4442b1f
Merge branch 'lightning_dataloader' of https://github.com/PyTorchLigh…
Borda Jun 12, 2023
2fb5450
docstrings
Jun 12, 2023
201fdc4
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 12, 2023
a52780e
req
Borda Jun 12, 2023
9fc99d1
Merge branch 'lightning_dataloader' of https://github.com/PyTorchLigh…
Borda Jun 12, 2023
2acbaec
2.0.1
Borda Jun 12, 2023
98907ba
add return type
Jun 12, 2023
82525de
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 12, 2023
82f1252
typing errors
Jun 12, 2023
8479047
Merge branch 'master' of github.com:Lightning-AI/lightning into light…
Jun 12, 2023
78da2c5
req
Jun 12, 2023
6486b2f
return types with quotations
Jun 12, 2023
9d63b1f
import for type-checking
justusschock Jun 12, 2023
19115cc
no botocore in cloudagnostic code
justusschock Jun 12, 2023
37e038d
exit args
Jun 12, 2023
18c1d74
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 12, 2023
a7b4f48
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 12, 2023
588b5f9
update
justusschock Jun 12, 2023
9044345
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 12, 2023
dd7ecd9
backends typing
justusschock Jun 12, 2023
a639788
remove oldest from data tests
Jun 12, 2023
8a27725
Merge branch 'lightning_dataloader' of github.com:Lightning-AI/lightn…
Jun 12, 2023
8f02186
typing
Jun 12, 2023
998753f
typing
Jun 12, 2023
7c2b427
typing
Jun 12, 2023
dd64bfa
types
Jun 12, 2023
5a9e1ad
type
Jun 12, 2023
fcaadd3
typing
Jun 12, 2023
64c09e6
typing
Jun 12, 2023
47479be
typing
Jun 12, 2023
f474fe9
import fix
Jun 12, 2023
e1b0c56
Changelog
Jun 12, 2023
6 changes: 6 additions & 0 deletions .actions/assistant.py
@@ -43,6 +43,11 @@
        "requirements/fabric/base.txt",
        "requirements/fabric/strategies.txt",
    ),
    "data": (
        "requirements/data/data.txt",
        "requirements/data/cloud.txt",
        "requirements/data/examples.txt",
    ),
}
REQUIREMENT_FILES_ALL = list(chain(*REQUIREMENT_FILES.values()))

@@ -404,6 +409,7 @@ def _replace_min(fname: str) -> None:
    def replace_oldest_ver(requirement_fnames: Sequence[str] = REQUIREMENT_FILES_ALL) -> None:
        """Replace the min package version by fixed one."""
        for fname in requirement_fnames:
            print(fname)
            AssistantCLI._replace_min(fname)

    @staticmethod
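A rough illustration of what the new `"data"` entry changes: `REQUIREMENT_FILES_ALL` is the dict's values flattened with `itertools.chain`, so the three data requirement files join the default list that `replace_oldest_ver` iterates over. The dict below is an abbreviated stand-in holding only the entries visible in this hunk, not the full mapping from `.actions/assistant.py`.

```python
from itertools import chain

# Abbreviated stand-in: only the entries visible in the hunk above.
REQUIREMENT_FILES = {
    "fabric": (
        "requirements/fabric/base.txt",
        "requirements/fabric/strategies.txt",
    ),
    "data": (
        "requirements/data/data.txt",
        "requirements/data/cloud.txt",
        "requirements/data/examples.txt",
    ),
}

# Same flattening expression as in the diff: one flat list across all groups.
REQUIREMENT_FILES_ALL = list(chain(*REQUIREMENT_FILES.values()))

for fname in REQUIREMENT_FILES_ALL:
    print(fname)
```

Because dicts preserve insertion order, the data files come last in the flattened list in this stand-in.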
5 changes: 5 additions & 0 deletions .github/CODEOWNERS
@@ -41,6 +41,11 @@
/src/lightning/pytorch/core/hooks.py @williamfalcon @tchaton @awaelchli @carmocca
/src/lightning/pytorch/core/module.py @williamfalcon @tchaton @awaelchli @carmocca

# Data Utilities
/examples/data/ @nohalon @justusschock
/src/lightning/data/ @nohalon @justusschock
/tests/tests_data @nohalon @justusschock

# Lightning Fabric
/src/lightning/fabric @awaelchli @carmocca @justusschock
/src/lightning_fabric @awaelchli @carmocca @justusschock
118 changes: 118 additions & 0 deletions .github/workflows/ci-tests-data.yml
@@ -0,0 +1,118 @@
name: Test Data

# see: https://help.github.com/en/actions/reference/events-that-trigger-workflows
on:
  push:
    branches: [master, "release/*"]
  pull_request:
    branches: [master, "release/*"]
    types: [opened, reopened, ready_for_review, synchronize] # added `ready_for_review` since draft is skipped
    paths:
      - ".actions/**"
      - "requirements/data/**"
      - "src/lightning/data/**"
      - "tests/tests_data/**"
      - "pyproject.toml" # includes pytest config
      - ".github/workflows/ci-tests-data.yml"
      - "!requirements/*/docs.txt"
      - "!*.md"
      - "!**/*.md"

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.head_ref }}
  cancel-in-progress: ${{ ! (github.ref == 'refs/heads/master' || startsWith(github.ref, 'refs/heads/release/')) }}

defaults:
  run:
    shell: bash

jobs:
  data-cpu:
    runs-on: ${{ matrix.os }}
    if: github.event.pull_request.draft == false
    strategy:
      fail-fast: false
      matrix:
        include:
          - {os: "macOS-11", pkg-name: "lightning", python-version: "3.10", pytorch-version: "2.0"}
          - {os: "ubuntu-20.04", pkg-name: "lightning", python-version: "3.10", pytorch-version: "2.0"}
          - {os: "windows-2022", pkg-name: "lightning", python-version: "3.10", pytorch-version: "2.0"}
          # "oldest" versions tests, only on minimum Python
          - {os: "macOS-11", pkg-name: "lightning", python-version: "3.8", pytorch-version: "2.0", requires: "oldest"}
          - {os: "ubuntu-20.04", pkg-name: "lightning", python-version: "3.8", pytorch-version: "2.0", requires: "oldest"}
          - {os: "windows-2022", pkg-name: "lightning", python-version: "3.8", pytorch-version: "2.0", requires: "oldest"}
    timeout-minutes: 25 # because of building grpcio on Mac
    env:
      PACKAGE_NAME: ${{ matrix.pkg-name }}
      FREEZE_REQUIREMENTS: ${{ ! (github.ref == 'refs/heads/master' || startsWith(github.ref, 'refs/heads/release/')) }}
      # PYPI_CACHE_DIR: "_pip-wheels"
      TORCH_URL_STABLE: "https://download.pytorch.org/whl/cpu/torch_stable.html"
      TORCH_URL_TEST: "https://download.pytorch.org/whl/test/cpu/torch_test.html"
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: basic setup
        run: pip install -q -r .actions/requirements.txt

      - name: Set min. dependencies
        if: ${{ matrix.requires == 'oldest' }}
        run: |
          python .actions/assistant.py replace_oldest_ver

      - name: Adjust PyTorch versions in requirements files
        if: ${{ matrix.requires != 'oldest' && matrix.release != 'pre' }}
        run: |
          pip install -q wget packaging
          python -m wget https://raw.githubusercontent.com/Lightning-AI/utilities/main/scripts/adjust-torch-versions.py
          for fpath in `ls requirements/data/*.txt`; do \
            python ./adjust-torch-versions.py $fpath ${{ matrix.pytorch-version }}; \
          done
          cat requirements/data/data.txt
          cat requirements/data/cloud.txt

      # - name: pip wheels cache
      #   uses: actions/cache/restore@v3
      #   with:
      #     path: ${{ env.PYPI_CACHE_DIR }}
      #     key: pypi_wheels
      # - run: |
      #     mkdir -p $PYPI_CACHE_DIR
      #     ls -lh $PYPI_CACHE_DIR

      # removing torch stable line:
      # pip install -e ".[${extra}test]" "pytest-timeout" -U -f ${TORCH_URL} ${TORCH_PREINSTALL} -f ${PYPI_CACHE_DIR} --prefer-binary
      - name: Install package & dependencies
        run: |
          python -m pip install -q pip -U
          pip install -e ".[data-dev]" "pytest-timeout" -U -f ${TORCH_URL_STABLE} --prefer-binary
          pip list

      - name: Testing Data
        working-directory: tests/tests_data
        # NOTE: do not include coverage report here, see: https://github.com/nedbat/coveragepy/issues/1003
        run: |
          python -m coverage run --source lightning \
            -m pytest -v --timeout=30 --durations=50

      - name: Statistics
        if: success()
        working-directory: tests/tests_data
        run: |
          coverage report
          coverage xml

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        # see: https://github.com/actions/toolkit/issues/399
        continue-on-error: true
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          file: tests/tests_data/coverage.xml
          flags: lightning,cpu,pytest,python${{ matrix.python-version }}
          name: CPU-coverage
          fail_ci_if_error: false
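The workflow uses the same expression for `cancel-in-progress` and `FREEZE_REQUIREMENTS`: it is true for every ref except `master` and `release/*` branches. The sketch below mirrors that expression in Python to make the truth table easy to check. The function names are hypothetical, and the lower-casing reflects the fact that GitHub expression string comparisons ignore case.

```python
def is_protected(ref: str) -> bool:
    """Python mirror of: github.ref == 'refs/heads/master' || startsWith(github.ref, 'refs/heads/release/')."""
    ref = ref.lower()  # GitHub expressions compare strings case-insensitively
    return ref == "refs/heads/master" or ref.startswith("refs/heads/release/")


def cancel_in_progress(ref: str) -> bool:
    """Mirror of the negated expression used for `cancel-in-progress` and `FREEZE_REQUIREMENTS`."""
    return not is_protected(ref)


# Illustrative refs only; the pull-request ref format is the one GitHub uses for PR merge refs.
for ref in (
    "refs/heads/master",
    "refs/heads/release/stable",
    "refs/heads/lightning_dataloader",
    "refs/pull/17743/merge",
):
    print(f"{ref}: cancel_in_progress={cancel_in_progress(ref)}")
```

In other words, runs on pull requests and feature branches are cancelled when a newer run starts and install with frozen requirements, while runs on `master` and release branches are left to finish.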
4 changes: 4 additions & 0 deletions .gitignore
@@ -189,6 +189,10 @@ our_model.tar
test.png
saved_models
data/
!src/lightning/data/
!examples/data/
!tests/tests_pytorch/utilities/data/
!requirements/data/
.shared
.lightning
node_modules/
190 changes: 190 additions & 0 deletions examples/data/image/imagenet.py
@@ -0,0 +1,190 @@
import os
import traceback
from argparse import ArgumentParser
from typing import Callable, Literal, Optional

import torch
import torch.nn.functional as F
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler
from torch.utils.data import Dataset

import lightning as L
from lightning.pytorch.utilities.model_helpers import get_torchvision_model

parser = ArgumentParser()
parser.add_argument("--workers", default=4, type=int)
parser.add_argument("--batchsize", default=56, type=int)
parser.add_argument("-e", "--evaluate", dest="evaluate", action="store_true", help="evaluate model on validation set")
args = parser.parse_args()

# --------------------------------
# Step 1: Define a LightningModule
# --------------------------------


class ImageNetLightningModel(L.LightningModule):
    """
    >>> ImageNetLightningModel(data_path='missing')  # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
    ImageNetLightningModel(
      (model): ResNet(...)
    )
    """

    from torchvision.models.resnet import ResNet18_Weights

    def __init__(
        self,
        data_path: str,
        index_file_path: Optional[str] = None,
        arch: str = "resnet18",
        weights=ResNet18_Weights.IMAGENET1K_V1,
        lr: float = 1e-4,
        momentum: float = 0.9,
        weight_decay: float = 1e-4,
        batch_size: int = 256,
        workers: int = 4,
    ):
        super().__init__()
        self.arch = arch
        self.weights = weights
        self.lr = lr
        self.momentum = momentum
        self.weight_decay = weight_decay
        self.batch_size = batch_size
        self.workers = workers
        self.data_path = data_path
        self.index_file_path = index_file_path
        self.model = get_torchvision_model(self.arch, weights=self.weights)
        self.train_dataset: Optional[Dataset] = None
        self.eval_dataset: Optional[Dataset] = None

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        images, target = batch
        output = self.model(images)
        loss_train = F.cross_entropy(output, target)
        self.log("train_loss", loss_train)
        return loss_train

    def eval_step(self, batch, batch_idx, prefix: str):
        images, target = batch
        output = self.model(images)
        loss_val = F.cross_entropy(output, target)
        self.log(f"{prefix}_loss", loss_val)
        return loss_val

    def validation_step(self, batch, batch_idx):
        return self.eval_step(batch, batch_idx, "val")

    def test_step(self, batch, batch_idx):
        return self.eval_step(batch, batch_idx, "test")

    def configure_optimizers(self):
        optimizer = optim.SGD(self.parameters(), lr=self.lr, momentum=self.momentum, weight_decay=self.weight_decay)
        # Step decay: multiply the learning rate by 0.1 every 30 epochs.
        scheduler = lr_scheduler.LambdaLR(optimizer, lambda epoch: 0.1 ** (epoch // 30))
        return [optimizer], [scheduler]

    def train_dataloader(self):
        import torchvision as tv

        transforms = tv.transforms.Compose([tv.transforms.RandomResizedCrop(224), tv.transforms.ToTensor()])

        train_dataset = S3LightningImagenetDataset(
            data_source=self.data_path, split="train", transforms=transforms, path_to_index_file=self.index_file_path
        )

        return torch.utils.data.DataLoader(
            dataset=train_dataset, batch_size=self.batch_size, shuffle=True, num_workers=self.workers
        )

    def val_dataloader(self):
        import torchvision as tv

        transforms = tv.transforms.Compose([tv.transforms.RandomResizedCrop(224), tv.transforms.ToTensor()])

        val_dataset = S3LightningImagenetDataset(
            data_source=self.data_path, split="val", transforms=transforms, path_to_index_file=self.index_file_path
        )

        return torch.utils.data.DataLoader(
            dataset=val_dataset, batch_size=self.batch_size, shuffle=True, num_workers=self.workers
        )

    def test_dataloader(self):
        return self.val_dataloader()


# -------------------
# Step 2: Define data
# -------------------


class S3LightningImagenetDataset(L.LightningDataset):
    def __init__(
        self,
        data_source: str,
        split: Literal["train", "val"],
        transforms: Optional[Callable] = None,
        path_to_index_file: Optional[str] = None,
    ):
        from torchvision.models._meta import _IMAGENET_CATEGORIES

        super().__init__(data_source=data_source, backend="s3", path_to_index_file=path_to_index_file)

        # only get files for the split
        self.files = tuple(x for x in self.files if split in x)

        # get unique classes
        self.classes = _IMAGENET_CATEGORIES

        self.transforms = transforms

    def load_sample(self, file_path, stream):
        from PIL import Image

        try:
            img = Image.open(stream)

            if self.transforms is not None:
                img = self.transforms(img)

            # Converting grey scale images to RGB
            if img.shape[0] == 1:
                img = img.repeat((3, 1, 1))

            # The class name is the parent directory name, with underscores replaced by spaces
            curr_cls = os.path.basename(os.path.dirname(file_path)).replace("_", " ")
            cls_idx = self.classes.index(curr_cls)
            return img, cls_idx
        except Exception:
            # Report the failing file together with the formatted traceback
            print(file_path, traceback.format_exc())


if __name__ == "__main__":
    # os.environ["AWS_ACCESS_KEY"] = <your aws access key>
    # os.environ["AWS_SECRET_ACCESS_KEY"] = <your aws secret key>

    data_path = "s3://imagenet-tiny"
    index_file_path = "imagenet/imagenet-index.txt"

    # -------------------
    # Step 3: Train
    # -------------------

    print("Instantiate Model")
    model = ImageNetLightningModel(
        weights=None,
        data_path=data_path,
        index_file_path=index_file_path,
        batch_size=args.batchsize,
        workers=args.workers,
    )
    trainer = L.Trainer()

    print("Train Model")
    if args.evaluate:
        trainer.test(model)
    else:
        trainer.fit(model)
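Two details of `S3LightningImagenetDataset` are easy to check in isolation: the label is derived from the parent directory of each file, and the split is selected with a plain substring test. The sketch below reproduces both using only the standard library. The bucket layout, the file names, and the helper names (`class_name_from_path`, `files_for_split`, `files_for_split_strict`) are assumptions made for illustration, and `posixpath` is used on the assumption that S3 keys are always `/`-separated.

```python
import posixpath


def class_name_from_path(file_path: str) -> str:
    """Parent directory name with underscores replaced by spaces, as in `load_sample`."""
    return posixpath.basename(posixpath.dirname(file_path)).replace("_", " ")


def files_for_split(files, split):
    """Substring filter, as in `S3LightningImagenetDataset.__init__`."""
    return tuple(f for f in files if split in f)


def files_for_split_strict(files, split):
    """Hypothetical stricter variant: `split` must equal a whole path component."""
    return tuple(f for f in files if split in f.split("/"))


# Hypothetical index contents; the real index file layout may differ.
files = (
    "s3://imagenet-tiny/train/tabby_cat/0001.jpeg",
    "s3://imagenet-tiny/val/tabby_cat/0002.jpeg",
    "s3://imagenet-tiny/train/valley/0003.jpeg",
)

print(class_name_from_path(files[0]))
print(files_for_split(files, "val"))
print(files_for_split_strict(files, "val"))
```

With this made-up layout, the substring filter also picks up the training image under `valley/` when asked for `"val"`, which is why a component-wise match may be worth considering if class or file names can contain a split name.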
5 changes: 5 additions & 0 deletions requirements/data/cloud.txt
@@ -0,0 +1,5 @@
# NOTE: the upper bound for the package version is only set for CI stability, and it is dropped while installing this package
# in case you want to preserve/enforce restrictions on the latest compatible version, add "strict" as an in-line comment

fsspec[http] >2021.06.0, <2023.5.0
s3fs >=2022.5.0, <=2022.11.1
6 changes: 6 additions & 0 deletions requirements/data/data.txt
@@ -0,0 +1,6 @@
# NOTE: the upper bound for the package version is only set for CI stability, and it is dropped while installing this package
# in case you want to preserve/enforce restrictions on the latest compatible version, add "strict" as an in-line comment

lightning-utilities >=0.8.0, <0.9.0
torchdata >0.6.0, < 0.7.0
torch >2.0.0, < 2.1.0
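The NOTE at the top of these requirement files says the upper bounds exist only for CI stability and are dropped at install time unless a line carries a `strict` in-line comment. The following is a minimal sketch of that rule, not the project's actual installer logic: `drop_upper_bound` is a hypothetical helper, and the third sample line (with `# strict`) is made up to show the opt-out.

```python
import re

# Hypothetical helper: a package name (optionally with extras) followed by its specifiers.
_REQUIREMENT = re.compile(r"^\s*(?P<name>[A-Za-z0-9_.\-]+(?:\[[^\]]*\])?)\s*(?P<specs>.*)$")


def drop_upper_bound(line: str) -> str:
    """Strip `<` / `<=` specifiers unless the line has an in-line `strict` comment."""
    requirement, _, comment = line.partition("#")
    if "strict" in comment:
        return line.strip()  # the bound is kept on purpose
    match = _REQUIREMENT.match(requirement)
    if match is None:
        return line.strip()  # blank line or pure comment
    specs = [spec.replace(" ", "") for spec in match.group("specs").split(",")]
    kept = [spec for spec in specs if spec and not spec.startswith("<")]
    return " ".join([match.group("name"), ", ".join(kept)]).strip()


for line in (
    "torch >2.0.0, < 2.1.0",
    "fsspec[http] >2021.06.0, <2023.5.0",
    "s3fs >=2022.5.0, <=2022.11.1  # strict",  # made-up strict marker for illustration
):
    print(drop_upper_bound(line))
```

The sketch handles only the simple `name spec, spec` lines seen in these files; environment markers and other requirement syntax are out of scope.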
3 changes: 3 additions & 0 deletions requirements/data/examples.txt
@@ -0,0 +1,3 @@
Pillow >= 9.5.0
# min version to match torch >= 2.0.1
torchvision >=0.15.2, <=0.16
5 changes: 5 additions & 0 deletions requirements/data/test.txt
@@ -0,0 +1,5 @@
coverage ==7.2.5
pytest ==7.3.1
pytest-cov ==4.0.0
pytest-rerunfailures ==10.3
pytest-random-order ==1.1.0
2 changes: 1 addition & 1 deletion requirements/fabric/base.txt
@@ -2,7 +2,7 @@
# in case you want to preserve/enforce restrictions on the latest compatible version, add "strict" as an in-line comment

numpy >=1.17.2, <1.24.4
- torch >=1.11.0, <=2.0.0
+ torch >=1.11.0, <2.1.0
fsspec[http]>2021.06.0, <2023.5.0
packaging >=17.1, <=23.0
typing-extensions >=4.0.0, <=4.4.0
2 changes: 1 addition & 1 deletion requirements/pytorch/base.txt
@@ -2,7 +2,7 @@
# in case you want to preserve/enforce restrictions on the latest compatible version, add "strict" as an in-line comment

numpy >=1.17.2, <1.24.4
- torch >=1.11.0, <=2.0.0
+ torch >=1.11.0, <2.1.0
tqdm >=4.57.0, <4.66.0
PyYAML >=5.4, <=6.0
fsspec[http] >2021.06.0, <2023.5.0
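The only change to the fabric and pytorch base requirements is the `torch` upper bound, which moves from `<=2.0.0` to `<2.1.0`. A small sketch of the practical effect: patch releases such as 2.0.1 were excluded before and are allowed afterwards. It compares plain dotted releases as integer tuples, which is a simplification of real version ordering (no pre-release or local-version handling); the helper names are hypothetical.

```python
def as_tuple(version: str) -> tuple:
    """Turn a plain dotted release such as '2.0.1' into (2, 0, 1)."""
    return tuple(int(part) for part in version.split("."))


def allowed_before(version: str) -> bool:
    """Old constraint: torch >=1.11.0, <=2.0.0."""
    return as_tuple("1.11.0") <= as_tuple(version) <= as_tuple("2.0.0")


def allowed_after(version: str) -> bool:
    """New constraint: torch >=1.11.0, <2.1.0."""
    return as_tuple("1.11.0") <= as_tuple(version) < as_tuple("2.1.0")


for candidate in ("2.0.0", "2.0.1", "2.1.0"):
    print(candidate, allowed_before(candidate), allowed_after(candidate))
```

This lines up with the new data requirements, which ask for `torch >2.0.0` and would otherwise conflict with the old `<=2.0.0` bound when installed together.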