Set limits for `fetcher.done` #18441

awaelchli · 2023-08-30T18:26:47Z

What does this PR do?

Follow-up to #18376: makes `dataloader_iter` respect the limits set in the Trainer (for example `limit_train_batches` and `limit_val_batches`).
Fixes #18334
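To illustrate the idea, here is a minimal, framework-free sketch. `LimitedFetcher` and `run_steps` are hypothetical names, not Lightning's fetcher classes. The fetcher caps the number of batches at a Trainer-style integer limit and prefetches one batch ahead, so its `done` flag turns `True` right after the last allowed batch is returned. `run_steps` mimics a step that pulls several micro-batches from `dataloader_iter`, as the debugging script does.

```python
from typing import Any, Iterable, Optional, Tuple

_EXHAUSTED = object()  # sentinel meaning "no further item"


class LimitedFetcher:
    """Hypothetical fetcher that yields at most `limit` items and exposes `done`.

    It prefetches one item ahead, so `done` turns True right after the last
    allowed item is returned, before the caller ever hits StopIteration.
    """

    def __init__(self, iterable: Iterable[Any], limit: Optional[int] = None) -> None:
        self._iterator = iter(iterable)
        self._limit = limit
        self._fetched = 0
        self._next = self._prefetch()

    def _prefetch(self) -> Any:
        # Once the limit is reached, do not pull anything more from the source.
        if self._limit is not None and self._fetched >= self._limit:
            return _EXHAUSTED
        return next(self._iterator, _EXHAUSTED)

    @property
    def done(self) -> bool:
        return self._next is _EXHAUSTED

    def __iter__(self) -> "LimitedFetcher":
        return self

    def __next__(self) -> Any:
        if self.done:
            raise StopIteration
        item = self._next
        self._fetched += 1
        self._next = self._prefetch()
        return item


def run_steps(fetcher: LimitedFetcher, micro_batches_per_step: int) -> Tuple[int, int, bool]:
    """Enter steps until `done`, pulling several micro-batches per step.

    Returns (steps_entered, micro_batches_fetched, stop_iteration_raised).
    """
    steps = fetched = 0
    raised = False
    while not fetcher.done:
        steps += 1
        for _ in range(micro_batches_per_step):
            try:
                next(fetcher)
            except StopIteration:
                raised = True  # the limit fell in the middle of a step
                break
            fetched += 1
    return steps, fetched, raised


if __name__ == "__main__":
    # 8 micro-batches available, limit of 4, 2 micro-batches per step
    print(run_steps(LimitedFetcher(range(8), limit=4), micro_batches_per_step=2))  # (2, 4, False)
```

With the limit applied inside the fetcher, the loop can stop as soon as `done` is `True` instead of running until the underlying dataloader is exhausted.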

Debugging script to compare the iterations against the master branch (demonstrates the NeMo use case):

```python
import torch
from torch.utils.data import DataLoader, Dataset

from lightning.pytorch import LightningModule, Trainer

# Each step consumes global_batch_size // micro_batch_size micro-batches from dataloader_iter.
global_batch_size = 4
micro_batch_size = 2
assert global_batch_size % micro_batch_size == 0


class RandomDataset(Dataset):
    def __init__(self, length):
        self.len = length
        self.data = torch.randn(length, 32)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)
        # Counters and flags inspected after `trainer.fit` to compare against master.
        self.val_fetched = 0
        self.val_iter_raised = False
        self.val_iter_done = False
        self.val_step_entered = 0

        self.train_fetched = 0
        self.train_iter_raised = False
        self.train_iter_done = False
        self.train_step_entered = 0

    def training_step(self, dataloader_iter, batch_idx):
        self.train_step_entered += 1
        # Was the fetcher already exhausted when this step was entered?
        self.train_iter_done = dataloader_iter.done
        for _ in range(global_batch_size // micro_batch_size):
            try:
                batch = next(dataloader_iter)
            except StopIteration:
                # The iterator ran out mid-step: return None to skip this optimization step.
                self.train_iter_raised = True
                return None
            self.train_fetched += 1
        # For debugging purposes, only the last micro-batch contributes to the loss.
        return self.layer(batch).sum()

    def validation_step(self, dataloader_iter, batch_idx):
        self.val_step_entered += 1
        self.val_iter_done = dataloader_iter.done
        for _ in range(global_batch_size // micro_batch_size):
            try:
                batch = next(dataloader_iter)
            except StopIteration:
                self.val_iter_raised = True
                return
            self.val_fetched += 1
            self.layer(batch).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


train_data = DataLoader(RandomDataset(length=16), batch_size=micro_batch_size)
val_data = DataLoader(RandomDataset(length=16), batch_size=micro_batch_size)

model = BoringModel()
trainer = Trainer(
    # limit_train_batches=3,
    limit_val_batches=4,  # cap validation at 4 micro-batches
    num_sanity_val_steps=0,
    # max_steps=2,
    max_epochs=1,
    accelerator="cpu",
)
trainer.fit(model, train_data, val_data)
# trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)

print("train fetched", model.train_fetched)
print("train step entered", model.train_step_entered)
print("train iter exhausted", model.train_iter_raised)

print("val fetched", model.val_fetched)
print("val step entered", model.val_step_entered)
print("val iter exhausted", model.val_iter_raised)
```
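Under a few assumptions, the counts this script prints can be worked out by hand: the integer limit caps the micro-batches yielded by `dataloader_iter`, the loop keeps entering the step until the iterator is `done`, and validation runs once (`num_sanity_val_steps=0`, `max_epochs=1`). The hypothetical helper `expected_counts` encodes that arithmetic; it is a sanity check under those assumptions, not a statement of what Lightning guarantees, and it only handles integer limits.

```python
import math
from typing import Optional, Tuple


def expected_counts(
    num_samples: int,
    micro_batch_size: int,
    global_batch_size: int,
    limit: Optional[int] = None,
) -> Tuple[int, int, bool]:
    """Return (fetched, step_entered, iter_exhausted) for one pass over the data."""
    # DataLoader default drop_last=False: a final partial batch still counts as a batch.
    num_batches = math.ceil(num_samples / micro_batch_size)
    if limit is not None:
        num_batches = min(num_batches, limit)
    per_step = global_batch_size // micro_batch_size
    # A new step is entered while at least one (limited) micro-batch is left.
    steps = math.ceil(num_batches / per_step)
    # StopIteration is only hit if the final step cannot collect `per_step` micro-batches.
    exhausted = num_batches % per_step != 0
    return num_batches, steps, exhausted


if __name__ == "__main__":
    print("train", expected_counts(16, 2, 4))  # limit_train_batches not set -> train (8, 4, False)
    print("val", expected_counts(16, 2, 4, limit=4))  # limit_val_batches=4 -> val (4, 2, False)
```

Uncommenting `limit_train_batches=3` in the script corresponds to `expected_counts(16, 2, 4, limit=3)`, which yields `(3, 2, True)`: the second step only gets one micro-batch before the iterator raises.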

cc @Borda @justusschock @awaelchli @carmocca


github-actions · 2023-08-31T12:07:05Z

⚡ Required checks status: All passing 🟢

Groups summary

🟢 pytorch_lightning: Tests workflow

Check ID	Status
pl-cpu (macOS-11, lightning, 3.8, 1.11)	success	✅
pl-cpu (macOS-11, lightning, 3.9, 1.12)	success	✅
pl-cpu (macOS-11, lightning, 3.10, 1.13)	success	✅
pl-cpu (macOS-11, lightning, 3.10, 2.0)	success	✅
pl-cpu (macOS-11, lightning, 3.8, 1.11, oldest)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.8, 1.11)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.9, 1.12)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.10, 1.13)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.10, 2.0)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.8, 1.11, oldest)	success	✅
pl-cpu (windows-2022, lightning, 3.8, 1.11)	success	✅
pl-cpu (windows-2022, lightning, 3.9, 1.12)	success	✅
pl-cpu (windows-2022, lightning, 3.10, 1.13)	success	✅
pl-cpu (windows-2022, lightning, 3.10, 2.0)	success	✅
pl-cpu (windows-2022, lightning, 3.8, 1.11, oldest)	success	✅
pl-cpu (macOS-11, pytorch, 3.8, 1.13)	success	✅
pl-cpu (ubuntu-20.04, pytorch, 3.8, 1.13)	success	✅
pl-cpu (windows-2022, pytorch, 3.8, 1.13)	success	✅
pl-cpu (macOS-12, pytorch, 3.11, 2.0)	success	✅
pl-cpu (ubuntu-22.04, pytorch, 3.11, 2.0)	success	✅
pl-cpu (windows-2022, pytorch, 3.11, 2.0)	success	✅

These checks are required after the changes to src/lightning/pytorch/loops/evaluation_loop.py, src/lightning/pytorch/loops/fetchers.py, src/lightning/pytorch/loops/fit_loop.py, src/lightning/pytorch/loops/prediction_loop.py, src/lightning/pytorch/loops/training_epoch_loop.py, src/lightning/pytorch/trainer/trainer.py, src/lightning/pytorch/utilities/combined_loader.py, tests/tests_pytorch/loops/test_evaluation_loop.py, tests/tests_pytorch/loops/test_fetchers.py, tests/tests_pytorch/loops/test_loops.py, tests/tests_pytorch/strategies/test_single_device.py, tests/tests_pytorch/trainer/properties/test_estimated_stepping_batches.py, tests/tests_pytorch/trainer/test_dataloaders.py, tests/tests_pytorch/trainer/test_trainer.py, tests/tests_pytorch/utilities/test_combined_loader.py.

🟢 pytorch_lightning: Azure GPU

Check ID	Status
[pytorch-lightning (GPUs) (testing Lightning | latest)](https://dev.azure.com/Lightning-AI/72ab7ed8-b00f-4b6e-b131-3388f7ffafa7/_build/results?buildId=173284&view=logs&jobId=47e66f3c-897a-5428-da11-bf5c7745762e)	success
[pytorch-lightning (GPUs) (testing PyTorch | latest)](https://dev.azure.com/Lightning-AI/72ab7ed8-b00f-4b6e-b131-3388f7ffafa7/_build/results?buildId=173284&view=logs&jobId=3f274fac-2e11-54ca-487e-194c91f3ae9f)	success

These checks are required after the changes to src/lightning/pytorch/loops/evaluation_loop.py, src/lightning/pytorch/loops/fetchers.py, src/lightning/pytorch/loops/fit_loop.py, src/lightning/pytorch/loops/prediction_loop.py, src/lightning/pytorch/loops/training_epoch_loop.py, src/lightning/pytorch/trainer/trainer.py, src/lightning/pytorch/utilities/combined_loader.py, tests/tests_pytorch/loops/test_evaluation_loop.py, tests/tests_pytorch/loops/test_fetchers.py, tests/tests_pytorch/loops/test_loops.py, tests/tests_pytorch/strategies/test_single_device.py, tests/tests_pytorch/trainer/properties/test_estimated_stepping_batches.py, tests/tests_pytorch/trainer/test_dataloaders.py, tests/tests_pytorch/trainer/test_trainer.py, tests/tests_pytorch/utilities/test_combined_loader.py.

🟢 pytorch_lightning: Benchmarks

Check ID	Status
lightning.Benchmarks	success	✅

These checks are required after the changes to src/lightning/pytorch/loops/evaluation_loop.py, src/lightning/pytorch/loops/fetchers.py, src/lightning/pytorch/loops/fit_loop.py, src/lightning/pytorch/loops/prediction_loop.py, src/lightning/pytorch/loops/training_epoch_loop.py, src/lightning/pytorch/trainer/trainer.py, src/lightning/pytorch/utilities/combined_loader.py.

🟢 pytorch_lightning: Docs

Check ID	Status
docs-make (pytorch, doctest)	success	✅
docs-make (pytorch, html)	success	✅

These checks are required after the changes to src/lightning/pytorch/loops/evaluation_loop.py, src/lightning/pytorch/loops/fetchers.py, src/lightning/pytorch/loops/fit_loop.py, src/lightning/pytorch/loops/prediction_loop.py, src/lightning/pytorch/loops/training_epoch_loop.py, src/lightning/pytorch/trainer/trainer.py, src/lightning/pytorch/utilities/combined_loader.py.

🟢 mypy

Check ID	Status
mypy	success	✅

These checks are required after the changes to src/lightning/pytorch/loops/evaluation_loop.py, src/lightning/pytorch/loops/fetchers.py, src/lightning/pytorch/loops/fit_loop.py, src/lightning/pytorch/loops/prediction_loop.py, src/lightning/pytorch/loops/training_epoch_loop.py, src/lightning/pytorch/trainer/trainer.py, src/lightning/pytorch/utilities/combined_loader.py.

🟢 install

Check ID	Status
install-pkg (ubuntu-22.04, app, 3.8)	success	✅
install-pkg (ubuntu-22.04, app, 3.11)	success	✅
install-pkg (ubuntu-22.04, fabric, 3.8)	success	✅
install-pkg (ubuntu-22.04, fabric, 3.11)	success	✅
install-pkg (ubuntu-22.04, pytorch, 3.8)	success	✅
install-pkg (ubuntu-22.04, pytorch, 3.11)	success	✅
install-pkg (ubuntu-22.04, lightning, 3.8)	success	✅
install-pkg (ubuntu-22.04, lightning, 3.11)	success	✅
install-pkg (ubuntu-22.04, notset, 3.8)	success	✅
install-pkg (ubuntu-22.04, notset, 3.11)	success	✅
install-pkg (macOS-12, app, 3.8)	success	✅
install-pkg (macOS-12, app, 3.11)	success	✅
install-pkg (macOS-12, fabric, 3.8)	success	✅
install-pkg (macOS-12, fabric, 3.11)	success	✅
install-pkg (macOS-12, pytorch, 3.8)	success	✅
install-pkg (macOS-12, pytorch, 3.11)	success	✅
install-pkg (macOS-12, lightning, 3.8)	success	✅
install-pkg (macOS-12, lightning, 3.11)	success	✅
install-pkg (macOS-12, notset, 3.8)	success	✅
install-pkg (macOS-12, notset, 3.11)	success	✅
install-pkg (windows-2022, app, 3.8)	success	✅
install-pkg (windows-2022, app, 3.11)	success	✅
install-pkg (windows-2022, fabric, 3.8)	success	✅
install-pkg (windows-2022, fabric, 3.11)	success	✅
install-pkg (windows-2022, pytorch, 3.8)	success	✅
install-pkg (windows-2022, pytorch, 3.11)	success	✅
install-pkg (windows-2022, lightning, 3.8)	success	✅
install-pkg (windows-2022, lightning, 3.11)	success	✅
install-pkg (windows-2022, notset, 3.8)	success	✅
install-pkg (windows-2022, notset, 3.11)	success	✅

These checks are required after the changes to src/lightning/pytorch/loops/evaluation_loop.py, src/lightning/pytorch/loops/fetchers.py, src/lightning/pytorch/loops/fit_loop.py, src/lightning/pytorch/loops/prediction_loop.py, src/lightning/pytorch/loops/training_epoch_loop.py, src/lightning/pytorch/trainer/trainer.py, src/lightning/pytorch/utilities/combined_loader.py.

Thank you for your contribution! 💜

Note
This comment is automatically generated and updates for 60 minutes every 180 seconds. If you have any other questions, contact carmocca for help.


carmocca on tests/tests_pytorch/loops/test_fetchers.py:

Tests are thorough, good job

src/lightning/pytorch/loops/fetchers.py

src/lightning/pytorch/loops/fit_loop.py

src/lightning/pytorch/loops/training_epoch_loop.py

src/lightning/pytorch/utilities/combined_loader.py

tests/tests_pytorch/loops/test_loops.py

tests/tests_pytorch/utilities/test_combined_loader.py

src/lightning/pytorch/loops/fetchers.py

src/lightning/pytorch/loops/training_epoch_loop.py

src/lightning/pytorch/trainer/trainer.py

awaelchli added 2 commits August 30, 2023 12:43

wip

4cd141d

wip

4f92b94

github-actions bot added the pl label (Generic label for PyTorch Lightning package) Aug 30, 2023

pre-commit-ci bot and others added 3 commits August 30, 2023 18:28

[pre-commit.ci] auto fixes from pre-commit.com hooks

06fcccc

for more information, see https://pre-commit.ci

fix

67aac4d

fixes

49ce20f

awaelchli force-pushed the dataloader-iter/via-loader-length branch from 0353163 to 49ce20f August 30, 2023 22:11

pre-commit-ci bot and others added 6 commits August 30, 2023 22:12

[pre-commit.ci] auto fixes from pre-commit.com hooks

36a00c1

for more information, see https://pre-commit.ci

update

817eb18

fix

3083632

implement len for other modes

e7cb210

fix

c4d3961

[pre-commit.ci] auto fixes from pre-commit.com hooks

7a35014

for more information, see https://pre-commit.ci

awaelchli changed the title ~~Set limits for fetcher.done V2~~ WIP: (v2) Set limits for fetcher.done Aug 31, 2023

awaelchli marked this pull request as ready for review August 31, 2023 12:06

awaelchli requested review from carmocca, justusschock, Borda, williamFalcon, lantiga and tchaton as code owners August 31, 2023 12:06

awaelchli and others added 8 commits August 31, 2023 14:23

set limits in training loop

8cf3513

Merge remote-tracking branch 'origin/dataloader-iter/via-loader-length' into dataloader-iter/via-loader-length

4f50461

convert the limits

fa53bcd

Merge branch 'master' into dataloader-iter/via-loader-length

5a62929

None check

eaa2be6

fix passing limits

2fd6783

[pre-commit.ci] auto fixes from pre-commit.com hooks

63e58e8

for more information, see https://pre-commit.ci

fix test

36d3a3a

awaelchli commented Sep 5, 2023


tests/tests_pytorch/loops/test_fetchers.py

awaelchli requested a review from carmocca September 5, 2023 14:21

awaelchli added 2 commits September 5, 2023 17:24

raise StopIteration when length is reached

157ec6d

fix test

98a13c4

mergify bot added the has conflicts label Sep 5, 2023

carmocca reviewed Sep 5, 2023


Merge branch 'master' into dataloader-iter/via-loader-length

dd56105

mergify bot removed the has conflicts label Sep 5, 2023

carmocca and others added 8 commits September 6, 2023 01:45

Fix bad merge

9159a31

raise error if iter() not called before len()

7361f95

simplify len computation

64b9deb

move tests to test_fetchers.py

af7dcc7

move the special stopping condition to the _DataFetcherWrapper

94a8be7

revert the sum(limits) = 0 change

72ce38d

fix test for on_batch_start indices with dataloader_iter

5627bf4

try to skip additional iter() call in first epoch

b63bcb1
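Several commits above deal with the length of the combined loader ("implement len for other modes", "simplify len computation", "raise error if iter() not called before len()"). The following is a simplified, hypothetical model of that behavior, not the code in `combined_loader.py`. It assumes the four `CombinedLoader` modes (`min_size`, `max_size_cycle`, `max_size`, `sequential`), applies per-iterable limits when `iter()` is called, and refuses to report a length before that. `CombinedLengthSketch` is an illustrative name only.

```python
from typing import Iterator, List, Optional, Sequence

MODES = ("min_size", "max_size_cycle", "max_size", "sequential")


class CombinedLengthSketch:
    """Hypothetical model of a combined loader's length; iteration yields batch indices only."""

    def __init__(
        self,
        lengths: Sequence[int],
        mode: str,
        limits: Optional[Sequence[Optional[int]]] = None,
    ) -> None:
        if mode not in MODES:
            raise ValueError(f"Unsupported mode {mode!r}, expected one of {MODES}")
        if not lengths:
            raise ValueError("Expected at least one iterable length")
        if limits is not None and len(limits) != len(lengths):
            raise ValueError("Expected one limit per iterable")
        self._lengths = list(lengths)
        self._mode = mode
        self._limits = list(limits) if limits is not None else [None] * len(self._lengths)
        self._limited: Optional[List[int]] = None  # filled in by __iter__

    def __iter__(self) -> Iterator[int]:
        # Limits are applied when iteration starts; len() depends on this step.
        self._limited = [n if lim is None else min(n, lim) for n, lim in zip(self._lengths, self._limits)]
        return iter(range(len(self)))

    def __len__(self) -> int:
        if self._limited is None:
            raise RuntimeError("Call `iter()` before `len()`.")
        if self._mode == "min_size":
            return min(self._limited)
        if self._mode == "sequential":
            return sum(self._limited)
        # "max_size_cycle" and "max_size" both run until the longest iterable is done.
        return max(self._limited)


if __name__ == "__main__":
    loader = CombinedLengthSketch([8, 5], "sequential", limits=[4, None])
    iter(loader)
    print(len(loader))  # 9
```

For example, two iterables of 8 and 5 batches with limits `[4, None]` give a length of 4 in `min_size` mode, 5 in `max_size_cycle` or `max_size` mode, and 9 in `sequential` mode.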

awaelchli requested a review from carmocca September 6, 2023 01:28

carmocca reviewed Sep 6, 2023


src/lightning/pytorch/loops/fetchers.py (outdated)

src/lightning/pytorch/loops/training_epoch_loop.py (outdated)

src/lightning/pytorch/trainer/trainer.py

awaelchli added 3 commits September 6, 2023 04:02

remove a comment

ccdc469

check if data_fetcher's iterator exists

190c480

update guard for iter() decision

aa2b2b7

carmocca approved these changes Sep 6, 2023


awaelchli mentioned this pull request Sep 6, 2023

Refactor data fetcher selection in loops #18494

Merged

awaelchli added the ready label (PRs ready to be merged) Sep 6, 2023

Borda approved these changes Sep 7, 2023


awaelchli merged commit 8381ed3 into master Sep 7, 2023

awaelchli deleted the dataloader-iter/via-loader-length branch September 7, 2023 14:46

awaelchli mentioned this pull request Sep 8, 2023

Support tracking the batches fetched by dataloader_iter in the progress bar #18498

Closed

awaelchli mentioned this pull request Mar 16, 2024

[WIP] Test is_last_batch use cases #19659

Closed

awaelchli mentioned this pull request Apr 2, 2024

WIP Delay data fetcher setup #19725

Closed
