
Update Example Model Used in Documentation #18327

Closed · 2 of 11 tasks

jxtngx opened this issue Aug 16, 2023 · 14 comments
Labels: docs (Documentation related), good first issue (Good for newcomers), help wanted (Open to be worked on)

Comments

@jxtngx (Contributor) commented Aug 16, 2023

📚 Documentation

Overview

This issue serves to track multiple updates to examples provided in PyTorch Lightning and Lightning Fabric documentation.

Examples will be updated to reflect a common model called LightningTransformer. This common model will replace all occurrences of LitAutoEncoder and similar encoder-decoder modules.

The purpose of these updates is to modernize the examples. However, care should be taken to provide examples that will run on most machines; for instance, pretraining or finetuning an LLM is not suitable as the default example, since new users may not have a machine capable of such a task.

Each update should have a PR of its own, or be grouped with related updates in a cohesive manner to consolidate review.

cc @Borda

Additional Resources

Follow the docs/README and documentation guidelines for information on docstring conventions and using doctest.

Tasks

Please comment on this issue to have an example added to the to-do list or to be assigned to a particular task.

PyTorch Lightning

Must look in docs/source-pytorch/.

Priorities

Secondary

Lightning Fabric

Must look in docs/source-fabric/.

LightningTransformer Demo for PyTorch Lightning

Below is the Transformer that will be used. The model exists in `lightning.pytorch.demos`, and can be abstracted from the docs in order to keep the example high-level.

Note: this is the same demo Transformer used on Lightning Fabric's home page.

```python
import lightning.pytorch as pl
import torch

from lightning.pytorch.demos import Transformer


class LightningTransformer(pl.LightningModule):
    def __init__(self, vocab_size):
        super().__init__()
        self.model = Transformer(vocab_size=vocab_size)

    def forward(self, batch):
        input, target = batch
        return self.model(input.view(1, -1), target.view(1, -1))

    def training_step(self, batch, batch_idx):
        input, target = batch
        output = self.model(input, target)
        loss = torch.nn.functional.nll_loss(output, target.view(-1))
        return loss

    def predict_step(self, batch):
        return self(batch)

    def configure_optimizers(self):
        return torch.optim.SGD(self.model.parameters(), lr=0.1)


if __name__ == "__main__":
    from lightning.pytorch.demos import WikiText2
    from torch.utils.data import DataLoader

    dataset = WikiText2()
    dataloader = DataLoader(dataset)
    model = LightningTransformer(vocab_size=dataset.vocab_size)

    trainer = pl.Trainer(fast_dev_run=True)
    trainer.fit(model=model, train_dataloaders=dataloader)
```

WikiText2DataModule Demo for PyTorch Lightning

```python
from pathlib import Path

from torch.utils.data import DataLoader, random_split

import lightning.pytorch as pl
from lightning.pytorch.utilities.types import EVAL_DATALOADERS, TRAIN_DATALOADERS
from lightning.pytorch.demos.transformer import WikiText2


class WikiText2DataModule(pl.LightningDataModule):
    def __init__(
        self,
        num_workers: int = 2,
        data_dir: Path = Path("./data"),
        block_size: int = 35,
        download: bool = True,
        train_size: float = 0.8,
    ) -> None:
        super().__init__()
        self.data_dir = data_dir
        self.block_size = block_size
        self.download = download
        self.num_workers = num_workers
        self.train_size = train_size
        self.dataset = None

    def prepare_data(self) -> None:
        # runs on a single process: download only, do not assign state here
        WikiText2(data_dir=self.data_dir, block_size=self.block_size, download=self.download)

    def setup(self, stage: str) -> None:
        # runs on every process: safe to assign state here
        if self.dataset is None:
            self.dataset = WikiText2(data_dir=self.data_dir, block_size=self.block_size, download=False)
        if stage == "fit" or stage is None:
            train_size = int(len(self.dataset) * self.train_size)
            val_size = len(self.dataset) - train_size
            self.train_data, self.val_data = random_split(self.dataset, lengths=[train_size, val_size])
        if stage == "test" or stage is None:
            self.test_data = self.val_data

    def train_dataloader(self) -> TRAIN_DATALOADERS:
        return DataLoader(self.train_data, num_workers=self.num_workers)

    def val_dataloader(self) -> EVAL_DATALOADERS:
        return DataLoader(self.val_data, num_workers=self.num_workers)

    def test_dataloader(self) -> EVAL_DATALOADERS:
        return DataLoader(self.test_data, num_workers=self.num_workers)
```
@jxtngx jxtngx added docs Documentation related needs triage Waiting to be triaged by maintainers labels Aug 16, 2023
@Borda Borda added help wanted Open to be worked on good first issue Good for newcomers and removed needs triage Waiting to be triaged by maintainers labels Aug 16, 2023
@aniketmaurya (Contributor)

AutoEncoder is a great example for image classification tasks and for beginner-to-intermediate folks. I'd propose replacing it only where we really need a more advanced demo. We can use Stable Diffusion and LitGPT as examples for GenAI models.

@Dev-Khant

Can I pick this up @JustinGoheen?

@jxtngx (Contributor, Author) commented Oct 5, 2023

Hi @Dev-Khant 👋 let's check with @carmocca and @awaelchli where they'd like you to assist.

@carmocca (Contributor) commented Oct 6, 2023

Hi @JustinGoheen, I don't have context on the objective of this issue, but I can see there's a long list of "secondary" items in the top post. Since you created it, I think it's up to you to decide whether they should be changed or not.

@sbshah97

Hey @JustinGoheen, I'd love to contribute, but I'm a new contributor. Can I help out?

@jxtngx (Contributor, Author) commented Oct 30, 2023

@sbshah97 would you like to work on lightning/pytorch/core/module?

@jxtngx (Contributor, Author) commented Oct 30, 2023

@Dev-Khant would you like to work on lightning/pytorch/trainer/trainer?

@sbshah97

Hello, yes, I can give it a try.

@Dev-Khant

> @Dev-Khant would you like to work on lightning/pytorch/trainer/trainer?

Yes, sure @JustinGoheen

@jxtngx (Contributor, Author) commented Oct 31, 2023

@Dev-Khant & @sbshah97: here are some clarifying instructions for you, since these tasks each involve docstrings.

The basic steps are:

  1. Fork and clone the Lightning repo.
  2. Create a working branch. If you don't want to manage your gitops from the command line, GitKraken is a really nice tool for managing git repos and PRs.
  3. Go to the respective pages for your particular task: LightningModule, Trainer.
  4. Read through the API section, scanning for grammar and syntax errors and for errors in the example code snippets.
  5. If there are no errors, that is completely okay – just let me know and I will do one final check before marking off the task and working with you to select your next contribution.
  6. If there are errors, make the changes on your working branch and push them.
  7. Once you have pushed changes to your working branch, open a draft PR to submit them. I can help review the changes before the PR's status is changed to ready for review.
  8. Once the PR is marked as ready for review, the core maintainers will either approve and merge it or suggest additional changes.

Thank you for your willingness to help, and definitely let me know if you have any questions 🙂

@sbshah97 commented Nov 8, 2023

Hey Justin. I wanted to understand how to approach this task. From what I understand there are two parts to it:

  1. Check for any grammatical mistakes. I put the page through ChatGPT, and from the looks of it there don't seem to be any errors.
  2. Check for any function errors in the documentation. I am not sure how to proceed on this. Any pointers?

@JustinGoheen

@jxtngx (Contributor, Author) commented Nov 9, 2023

@sbshah97 I'll put together a guide for doctest for you.
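As a teaser, here is a minimal sketch of what checking a documentation snippet with the standard-library `doctest` module looks like; the snippet itself is a made-up example, not from the Lightning docs:

```python
import doctest

# A documentation example written in doctest form (hypothetical snippet).
snippet = """
>>> values = [3, 1, 2]
>>> sorted(values)
[1, 2, 3]
"""

# Parse the snippet into a DocTest and execute it; any mismatch between
# expected and actual output is reported as a failure.
parser = doctest.DocTestParser()
test = parser.get_doctest(snippet, globs={}, name="snippet", filename="<snippet>", lineno=0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results.failed, results.attempted)  # 0 failures out of 2 examples
```

The same mechanism underlies `make doctest` in the docs build: each `>>>` example is executed and its output compared against what the docs claim.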

@sbshah97 commented Nov 9, 2023

Thank you Justin. Looking forward to that.

@sbshah97

Hey Justin, anything on this?

@jxtngx jxtngx closed this as completed Jul 9, 2024