Bug when running example in README.md #4367

ThomasWollmann · 2020-10-26T12:56:52Z

🐛 Bug

When running the example in the README.md, the script crashes (Error: AttributeError: 'TypeError' object has no attribute 'message') when calling "trainer.fit(...)".

Appeared with 1.0.3, 1.0.2, 1.0.1, 1.0.0. The script works in 0.9.0 if you remove the line "self.log('train_loss', loss)".

Stacktrace:

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py:45: UserWarning: you passed in a val_dataloader but have no validation_step. Skipping validation loop
  warnings.warn(*args, **kwargs)
Traceback (most recent call last):
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loggers/tensorboard.py", line 188, in log_metrics
    self.experiment.add_scalar(k, v, step)
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loggers/base.py", line 450, in experiment
    return get_experiment() or DummyExperiment()
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py", line 35, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loggers/base.py", line 449, in get_experiment
    return fn(self)
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loggers/tensorboard.py", line 144, in experiment
    self._experiment = SummaryWriter(log_dir=self.log_dir, **self._kwargs)
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loggers/tensorboard.py", line 118, in log_dir
    version = self.version if isinstance(self.version, str) else f"version_{self.version}"
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loggers/tensorboard.py", line 234, in version
    self._version = self._get_next_version()
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loggers/tensorboard.py", line 246, in _get_next_version
    d = listing["name"]
TypeError: string indices must be integers
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/thomas/mx/tmp/main.py", line 41, in <module>
    trainer.fit(autoencoder, DataLoader(train), DataLoader(val))
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 440, in fit
    results = self.accelerator_backend.train()
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/accelerators/cpu_accelerator.py", line 45, in train
    self.trainer.train_loop.setup_training(model)
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 132, in setup_training
    self.trainer.logger.log_hyperparams(ref_model.hparams)
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py", line 35, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loggers/tensorboard.py", line 169, in log_hyperparams
    self.log_metrics(metrics, 0)
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py", line 35, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/thomas/anaconda3/lib/python3.7/site-packages/pytorch_lightning/loggers/tensorboard.py", line 191, in log_metrics
    type(e)(e.message + m)
AttributeError: 'TypeError' object has no attribute 'message'
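A note on reading the trace: the final AttributeError comes from the error-handling line `type(e)(e.message + m)`, and it hides the real failure, which is the earlier `TypeError` raised at `listing["name"]`. Python 3 exceptions have no `.message` attribute, so the handler itself crashes. Below is a minimal sketch of the pattern using `str(e)` instead. The helper name `add_context` and the hint text are hypothetical and are not taken from the Lightning code base.

```python
def add_context(exc: Exception, extra: str) -> Exception:
    """Return a new exception of the same type with `extra` appended to its message."""
    # str(exc) works on Python 3; `exc.message` existed only on Python 2.
    return type(exc)(str(exc) + extra)


try:
    listing = "version_0"  # a plain string where a dict was expected
    listing["name"]        # raises TypeError, as in the first traceback
except TypeError as e:
    # On Python 3 this is False, which is why `e.message` fails.
    print("has .message:", hasattr(e, "message"))
    wrapped = add_context(e, " (hypothetical hint: inspect the logger's save directory)")
    print(type(wrapped).__name__ + ":", wrapped)
```

With a handler like this, the original `TypeError` text would still be visible instead of being replaced by the AttributeError.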

To Reproduce

import os
import torch
from torch import nn
import torch.nn.functional as F
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
import pytorch_lightning as pl

class LitAutoEncoder(pl.LightningModule):

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 3))
        self.decoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 28 * 28))
    
    def forward(self, x):
        # in lightning, forward defines the prediction/inference actions
        embedding = self.encoder(x)
        return embedding

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop. It is independent of forward
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = F.mse_loss(x_hat, x)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer

dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
train, val = random_split(dataset, [55000, 5000])

autoencoder = LitAutoEncoder()
trainer = pl.Trainer()
trainer.fit(autoencoder, DataLoader(train), DataLoader(val))

Expected behavior

Training should start.

Environment

CUDA:
- GPU:
- available: False
- version: 10.2
Packages:
- numpy: 1.17.2
- pyTorch_debug: False
- pyTorch_version: 1.6.0
- pytorch-lightning: 1.0.3
- tqdm: 4.36.1
System:
- OS: Linux
- architecture:
  - 64bit
- processor: x86_64
- python: 3.7.4
- version: #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020


rohitgr7 · 2020-10-27T17:11:52Z

Tried the code on colab. Not getting any error.

Maddy12 · 2020-10-27T21:32:38Z

I am getting the same error

rohitgr7 · 2020-10-28T03:56:49Z

Can you guys reproduce this error with a colab notebook??

bipinKrishnan · 2020-10-28T07:23:15Z

> Tried the code on colab. Not getting any error.

Same here, the code works fine on Google Colab.

ThomasWollmann · 2020-10-31T16:11:55Z

Fine for me in Colab. Still not working locally.

21jun · 2020-11-04T15:55:05Z

I solved this problem by installing pytorch-lightning v.1.0.5

pip uninstall pytorch-lightning
pip install pytorch-lightning==1.0.5

In Google Colab, the pytorch-lightning version is 1.0.5, while my local version was 0.9.
I had originally installed pytorch-lightning with conda install pytorch-lightning -c conda-forge, which gave me v0.9.

OS: Windows 10
arch: x86_64
conda: 4.8.5
python: 3.8.2
pytorch: 1.6.0

ThomasWollmann · 2020-11-12T11:59:09Z

I did the same. Also installed pytorch via conda. Still the same error.

edenlightning · 2020-11-17T19:45:33Z

@Borda any idea?

Borda · 2020-12-03T09:34:46Z

@awaelchli could we also run doctest (via Sphinx) on all Markdown files? What do you think? :]

awaelchli · 2020-12-03T22:36:21Z

that's probably a bit overkill :)

ThomasWollmann · 2020-12-04T07:36:00Z

@Borda having the samples as regular unit tests would already be nice. I don't think there is an urgent need for doctest.

Borda · 2020-12-04T08:56:22Z

> @Borda having the samples as regular unit tests would already be nice. I don't think there is an urgent need for doctest.

Well, how would you use unit tests as samples? The point of docstrings is that they are examples and are also tested, so one piece of code serves twice. Compare that with unit tests, which usually live in another folder (outside the package distribution), so you just have to hope that whenever you change a test you also update the example accordingly...
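To illustrate the "one code serves twice" point, here is a minimal sketch of a docstring example that is also executed as a test with Python's standard `doctest` module. The function `flatten_image` and the helper `check_docstring` are hypothetical names for illustration only. This is not how the Lightning README is actually tested.

```python
import doctest


def flatten_image(pixels):
    """Flatten a nested list of pixel rows into a single list.

    >>> flatten_image([[0, 1], [2, 3]])
    [0, 1, 2, 3]
    """
    return [value for row in pixels for value in row]


def check_docstring(func):
    """Run the examples in `func`'s docstring and return the doctest results."""
    parser = doctest.DocTestParser()
    # The example only needs the function itself in its namespace.
    test = parser.get_doctest(func.__doc__, {func.__name__: func}, func.__name__, None, 0)
    return doctest.DocTestRunner(verbose=False).run(test)


results = check_docstring(flatten_image)
print(f"failed={results.failed}, attempted={results.attempted}")
```

The example in the docstring is what a reader sees in the docs, and the same text fails the test run if the function's behavior changes.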

ThomasWollmann · 2020-12-07T15:03:53Z

> > @Borda having the samples as regular unit tests would already be nice. I don't think there is an urgent need for doctest.

> Well, how would you use unit tests as samples? The point of docstrings is that they are examples and are also tested, so one piece of code serves twice. Compare that with unit tests, which usually live in another folder (outside the package distribution), so you just have to hope that whenever you change a test you also update the example accordingly...

That's true. It's a bit hacky to simulate an out-of-package sample within a unit test.

ThomasWollmann · 2020-12-07T15:06:10Z

I reinstalled my OS and couldn't reproduce the error anymore. Will close the issue for now.

Jerryxiaoyu · 2020-12-19T02:00:09Z

@edenlightning I had the same problem. I fixed it by installing a newer version of TensorFlow (1.14+).
I think this is because PyTorch Lightning requires TensorBoard >= 2.2, which in turn needs a newer TensorFlow to support it.
My environment only had TensorFlow 1.12. After I upgraded TensorFlow to 1.14, the errors went away.

https://github.com/tensorflow/tensorboard#can-i-run-tensorboard-without-a-tensorflow-installation
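For anyone comparing a working environment (such as Colab) with a failing local one, a small sketch like the following prints the installed versions of the packages mentioned in this thread. The helper names are hypothetical, and the TensorBoard 2.2 threshold is taken from the comment above rather than verified against Lightning's requirements.

```python
from typing import Optional, Tuple

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Python 3.7 needs the `importlib-metadata` backport package installed.
    from importlib_metadata import PackageNotFoundError, version


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text: str) -> Tuple[int, ...]:
    """Turn '2.2.0' into (2, 2, 0); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


for name in ("pytorch-lightning", "tensorboard", "tensorflow"):
    print(f"{name}: {installed_version(name) or 'not installed'}")

tb = installed_version("tensorboard")
if tb is not None and version_tuple(tb) < (2, 2):
    print("tensorboard is older than 2.2; this may be the mismatch described above")
```

Running this in both environments makes version differences easy to spot before reinstalling anything.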

edenlightning · 2021-01-19T16:05:18Z

Thanks for the update!

ThomasWollmann added the "bug" and "help wanted" labels on Oct 26, 2020

edenlightning added the "waiting on author" label on Oct 29, 2020

edenlightning added the "priority: 1" label and removed the "waiting on author" label on Nov 17, 2020

Borda self-assigned this on Nov 17, 2020

Borda added the "good first issue" label on Dec 1, 2020

ThomasWollmann closed this as completed on Dec 7, 2020

edenlightning reopened this on Dec 15, 2020

edenlightning added the "working as intended" label on Jan 19, 2021

edenlightning closed this as completed on Jan 19, 2021
