
Bug when running example in README.md #4367

Closed
ThomasWollmann opened this issue Oct 26, 2020 · 16 comments
Labels
bug (Something isn't working), good first issue (Good for newcomers), help wanted (Open to be worked on), priority: 1 (Medium priority task), working as intended (Working as intended)

Comments

@ThomasWollmann

🐛 Bug

When running the example from README.md, the script crashes in "trainer.fit(...)" with "AttributeError: 'TypeError' object has no attribute 'message'".

The error appears with 1.0.0, 1.0.1, 1.0.2 and 1.0.3. The script works in 0.9.0 if the line "self.log('train_loss', loss)" is removed.

Stacktrace:

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py:45: UserWarning: you passed in a val_dataloader but have no validation_step. Skipping validation loop
  warnings.warn(*args, **kwargs)
Traceback (most recent call last):
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loggers/tensorboard.py", line 188, in log_metrics
    self.experiment.add_scalar(k, v, step)
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loggers/base.py", line 450, in experiment
    return get_experiment() or DummyExperiment()
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py", line 35, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loggers/base.py", line 449, in get_experiment
    return fn(self)
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loggers/tensorboard.py", line 144, in experiment
    self._experiment = SummaryWriter(log_dir=self.log_dir, **self._kwargs)
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loggers/tensorboard.py", line 118, in log_dir
    version = self.version if isinstance(self.version, str) else f"version_{self.version}"
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loggers/tensorboard.py", line 234, in version
    self._version = self._get_next_version()
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loggers/tensorboard.py", line 246, in _get_next_version
    d = listing["name"]
TypeError: string indices must be integers
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/thomas/mx/tmp/main.py", line 41, in <module>
    trainer.fit(autoencoder, DataLoader(train), DataLoader(val))
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 440, in fit
    results = self.accelerator_backend.train()
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/accelerators/cpu_accelerator.py", line 45, in train
    self.trainer.train_loop.setup_training(model)
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 132, in setup_training
    self.trainer.logger.log_hyperparams(ref_model.hparams)
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py", line 35, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loggers/tensorboard.py", line 169, in log_hyperparams
    self.log_metrics(metrics, 0)
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py", line 35, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loggers/tensorboard.py", line 191, in log_metrics
    type(e)(e.message + m)
AttributeError: 'TypeError' object has no attribute 'message'
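The last frame hides the real failure: Python 3 exceptions have no `.message` attribute (a Python 2 convention), so evaluating `e.message` in the logger's error handler raised a second exception on top of the original `TypeError`. A minimal sketch of the pattern that line presumably intended, using a hypothetical `add_context` helper: build the new message from `str(e)` and chain with `raise ... from e` so the original error stays in the traceback. The appended text here is a placeholder; the actual `m` is not shown in the trace.

```python
def add_context(exc, extra):
    """Return a new exception of the same type with `extra` appended to its message.

    Hypothetical helper. It uses str(exc) because Python 3 exceptions have no
    `.message` attribute, and it assumes the exception type accepts a single
    message argument (true for TypeError, ValueError and similar).
    """
    return type(exc)(f"{exc}{extra}")


def lookup_name(listing):
    # Indexing a str with a str key raises TypeError, like `listing["name"]` in the trace.
    return listing["name"]


caught = None
try:
    try:
        lookup_name("version_0")
    except TypeError as e:
        # `from e` stores the original exception as __cause__, so it is not lost.
        raise add_context(e, " (extra context)") from e
except TypeError as wrapped:
    caught = wrapped  # keep a reference; the `as` name is cleared after the block

print(caught)
print(repr(caught.__cause__))
```

Exception types whose constructors need several arguments (for example `UnicodeDecodeError`) would not survive `type(exc)(message)`, so a real handler would need a fallback for those.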

To Reproduce

import os
import torch
from torch import nn
import torch.nn.functional as F
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
import pytorch_lightning as pl

class LitAutoEncoder(pl.LightningModule):

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 3))
        self.decoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 28 * 28))
    
    def forward(self, x):
        # in lightning, forward defines the prediction/inference actions
        embedding = self.encoder(x)
        return embedding

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop. It is independent of forward
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = F.mse_loss(x_hat, x)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer

dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
train, val = random_split(dataset, [55000, 5000])

autoencoder = LitAutoEncoder()
trainer = pl.Trainer()
trainer.fit(autoencoder, DataLoader(train), DataLoader(val))

Expected behavior

Training should start.

Environment

  • CUDA:
    • GPU:
    • available: False
    • version: 10.2
  • Packages:
    • numpy: 1.17.2
    • pyTorch_debug: False
    • pyTorch_version: 1.6.0
    • pytorch-lightning: 1.0.3
    • tqdm: 4.36.1
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.7.4
    • version: #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020
ThomasWollmann added the bug and help wanted labels on Oct 26, 2020
@rohitgr7
Contributor

Tried the code on Colab. I'm not getting any error.

@Maddy12

Maddy12 commented Oct 27, 2020

I am getting the same error.

@rohitgr7
Contributor

Can you guys reproduce this error in a Colab notebook?

@bipinKrishnan
Contributor

Same here, the code works fine on Google Colab.

edenlightning added the waiting on author label on Oct 29, 2020
@ThomasWollmann
Author

It works fine for me in Colab, but it still fails locally.

@21jun

21jun commented Nov 4, 2020

I solved this problem by installing pytorch-lightning 1.0.5:

pip uninstall pytorch-lightning
pip install pytorch-lightning==1.0.5

In Google Colab, the pytorch-lightning version is 1.0.5, while my local version was 0.9.
I had originally installed pytorch-lightning with conda install pytorch-lightning -c conda-forge, but that gave me v0.9.

OS: Windows 10
arch: x86_64
conda: 4.8.5
python: 3.8.2
pytorch: 1.6.0
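Since the difference between Colab and a local setup came down to which package versions were actually installed, a small diagnostic run in both environments can make the mismatch visible. A minimal sketch, assuming Python 3.8+ (for the standard-library `importlib.metadata`); the package list is an assumption based on the packages discussed in this thread.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package list: the ones mentioned in this issue.
    for name, version in installed_versions(
        ["pytorch-lightning", "torch", "tensorboard", "tensorflow"]
    ).items():
        print(f"{name}: {version or 'not installed'}")
```

This reports what the current interpreter sees, which is what matters when conda and pip installs coexist.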

@ThomasWollmann
Author

I did the same (I also installed PyTorch via conda), but I still get the same error.

@edenlightning
Contributor

@Borda any idea?

edenlightning added the priority: 1 label and removed the waiting on author label on Nov 17, 2020
Borda self-assigned this on Nov 17, 2020
Borda added the good first issue label on Dec 1, 2020
@Borda
Member

Borda commented Dec 3, 2020

@awaelchli could we also run doctest (via Sphinx) on all Markdown files? What do you think? :]

@awaelchli
Contributor

that's probably a bit overkill :)

@ThomasWollmann
Author

ThomasWollmann commented Dec 4, 2020

@Borda having the samples as regular unit tests would already be nice. I don't think there is an urgent need for doctest.
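One way to treat README samples as ordinary unit tests is to extract the fenced Python blocks from the Markdown file and execute them from a test. A minimal sketch under stated assumptions: examples live in fenced blocks tagged `python`, each block is self-contained, and `extract_python_blocks`, `run_markdown_examples` and `check_readme_examples` are hypothetical helpers, not part of Lightning. The README example in this issue downloads MNIST and trains, so running it this way would be slow unless the sample is trimmed.

```python
import re

FENCE = "`" * 3  # built indirectly so this snippet can itself sit inside a Markdown fence
PYTHON_BLOCK = re.compile(
    rf"^{FENCE}python[ \t]*\n(.*?)^{FENCE}[ \t]*$", re.MULTILINE | re.DOTALL
)


def extract_python_blocks(markdown_text):
    """Return the source of every fenced block tagged `python` in a Markdown string."""
    return PYTHON_BLOCK.findall(markdown_text)


def run_markdown_examples(markdown_text):
    """Execute each Python block in a fresh namespace and return the namespaces."""
    namespaces = []
    for index, source in enumerate(extract_python_blocks(markdown_text)):
        namespace = {"__name__": f"markdown_example_{index}"}
        exec(compile(source, f"<markdown block {index}>", "exec"), namespace)
        namespaces.append(namespace)
    return namespaces


def check_readme_examples(path="README.md"):
    # Assumed location: README.md in the current working directory.
    with open(path, encoding="utf-8") as handle:
        return run_markdown_examples(handle.read())
```

Wrapping `check_readme_examples()` in a test function would make any exception raised by a README block fail the test suite.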

@Borda
Member

Borda commented Dec 4, 2020

Well, how would you use unit tests as samples? The point of docstrings is that they are examples and are also tested, so one piece of code serves twice. Unit tests, by contrast, usually live in another folder (outside the package distribution), and then you just hope that whenever you change the tests you also update the examples accordingly...
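The docstring approach means the example and the test are the same text. A minimal sketch with the standard-library `doctest` module (not the Sphinx doctest extension mentioned above); `version_dirname` is a hypothetical helper that mirrors the log_dir line from the traceback, not Lightning's code.

```python
import doctest


def version_dirname(version):
    """Name of the log sub-directory for a given version.

    Hypothetical helper mirroring the log_dir rule in the traceback:
    strings are used as-is, anything else becomes "version_<n>".

    >>> version_dirname(3)
    'version_3'
    >>> version_dirname("my_run")
    'my_run'
    """
    return version if isinstance(version, str) else f"version_{version}"


if __name__ == "__main__":
    # Collects and runs the >>> examples from this module's docstrings.
    doctest.testmod()
```

If the function's behavior changes without the docstring being updated, the doctest fails, which is the "one code serves twice" property.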

@ThomasWollmann
Author

ThomasWollmann commented Dec 7, 2020

That's true. It's a bit hacky to simulate an out-of-package sample within a unit test.

@ThomasWollmann
Author

I reinstalled my OS and couldn't reproduce the error anymore. Will close the issue for now.

@Jerryxiaoyu

Jerryxiaoyu commented Dec 19, 2020

@edenlightning I had the same problem and fixed it by installing a newer TensorFlow (1.14+).
I think this is because PyTorch Lightning requires TensorBoard >= 2.2, which in turn requires a newer TensorFlow to support it.
My environment had only TensorFlow 1.12; after upgrading to 1.14, the errors went away.

https://github.com/tensorflow/tensorboard#can-i-run-tensorboard-without-a-tensorflow-installation
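Whatever the environment cause turns out to be, the innermost frame of the traceback pins down the symptom: `_get_next_version` indexes each directory-listing entry with `listing["name"]`, and "string indices must be integers" means the entries were plain strings rather than dicts. Which installed package produced the string listing is not established in this thread; a version mismatch somewhere in the filesystem or TensorBoard stack is only an assumption. A defensive sketch that accepts both shapes, using hypothetical helpers (`entry_name`, `existing_versions`) rather than Lightning's code:

```python
import os


def entry_name(entry):
    """Return the path of one directory-listing entry.

    Accepts either a plain path string or a dict with a "name" key; the failing
    `listing["name"]` line assumed only the dict form.
    """
    return entry if isinstance(entry, str) else entry["name"]


def existing_versions(entries):
    """Sorted integer suffixes of `version_<n>` entries in a (possibly mixed) listing.

    Hypothetical helper; unlike a real implementation it does not check that
    each entry is a directory.
    """
    versions = []
    for entry in entries:
        base = os.path.basename(entry_name(entry).rstrip("/"))
        suffix = base[len("version_"):]
        if base.startswith("version_") and suffix.isdigit():
            versions.append(int(suffix))
    return sorted(versions)


# Example paths are made up for illustration.
listing = ["logs/version_0", {"name": "logs/version_2"}, "logs/notes.txt"]
print(existing_versions(listing))  # prints [0, 2]
```

The next free version would then be `max(versions) + 1`, or 0 for an empty list.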

edenlightning added the working as intended label on Jan 19, 2021
@edenlightning
Contributor

Thanks for the update!

Development

No branches or pull requests

9 participants