Persistent MemoryError while training on VCTK #50

Closed
engiecat opened this issue Mar 4, 2018 · 4 comments

Comments

@engiecat
Contributor

engiecat commented Mar 4, 2018

Hello. I am currently training the DeepVoice 3 multi-speaker model on VCTK.
Training mostly works, but it sometimes crashes with the following error.

2734it [13:58,  3.26it/s]Traceback (most recent call last):
  File "train.py", line 957, in <module>
    train_seq2seq=train_seq2seq, train_postnet=train_postnet)
  File "train.py", line 585, in train
    in tqdm(enumerate(data_loader)):
  File "H:\envs\pytorch\lib\site-packages\tqdm\_tqdm.py", line 959, in __iter__
    for obj in iterable:
  File "H:\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File "H:\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
MemoryError: Traceback (most recent call last):
  File "H:\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 55, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "H:\Tensorflow_Study\git\deepvoice3_pytorch\train.py", line 329, in collate_fn
    dtype=np.float32)
MemoryError

Periodically forcing garbage collection with gc.collect() does not help.
I currently have 16 GB of RAM, plus 48 GB of virtual memory on my SSD (just in case).
I am using Windows 10 with PyTorch 0.3.1 (CUDA 8.0, GTX 1060 6GB).

Also, in Resource Monitor I see that the Commit (KB) and Working Set (KB) memory figures differ significantly, as shown below. (Sorry that the screenshot is not in English.)
image

Thank you for creating such a wonderful implementation!
:)

@r9y9
Owner

r9y9 commented Mar 4, 2018

Thank you for the report. I have never seen a MemoryError on Linux, so I'm guessing this is a Windows-specific issue... If gc.collect() doesn't help, this is possibly a PyTorch (C++-side) issue.

@r9y9
Owner

r9y9 commented Mar 4, 2018

Maybe worth trying num_workers=1 for DataLoader?

# Data loader
pin_memory=True,
num_workers=2,
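
A minimal sketch of the suggestion above, assuming a recent PyTorch rather than the 0.3.1 used in this report. `make_loader` and `toy_dataset` are hypothetical names for illustration; the real train.py builds its loader from its own dataset, sampler and collate_fn, which are not shown here. Each worker is a separate process that prepares batches in its own memory, so fewer workers should lower peak host memory at the cost of slower loading.

```python
from torch.utils.data import DataLoader


def make_loader(dataset, batch_size=16, num_workers=1):
    """Build a DataLoader with a reduced number of worker processes.

    batch_size=16 is a placeholder, not the value used in train.py.
    """
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,  # was 2 in the snippet quoted above
        pin_memory=True,          # same setting as in the snippet quoted above
    )


# Any object with __len__ and __getitem__ works as a map-style dataset.
toy_dataset = list(range(64))
loader = make_loader(toy_dataset)
```

On Windows, worker processes are started with spawn, so code that iterates over a loader with num_workers > 0 needs to sit under an `if __name__ == "__main__":` guard.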

@engiecat
Contributor Author

engiecat commented Mar 4, 2018

@r9y9 It has now run for 15 hours without crashing and is still working well (compared to ~2.5 hours previously).
Thank you for your assistance!!!

@engiecat engiecat closed this as completed Mar 4, 2018
@engiecat engiecat reopened this Mar 4, 2018
r9y9 pushed a commit that referenced this issue Mar 10, 2018
* Fixed TypeError (torch.index_select received an invalid combination of arguments)

  File "synthesis.py", line 137, in <module>
    model, text, p=replace_pronunciation_prob, speaker_id=speaker_id, fast=True)
  File "synthesis.py", line 66, in tts
    sequence, text_positions=text_positions, speaker_ids=speaker_ids)
  File "H:\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "H:\Tensorflow_Study\git\deepvoice3_pytorch\deepvoice3_pytorch\__init__.py", line 79, in forward
    text_positions, frame_positions, input_lengths)
  File "H:\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "H:\Tensorflow_Study\git\deepvoice3_pytorch\deepvoice3_pytorch\__init__.py", line 116, in forward
    text_sequences, lengths=input_lengths, speaker_embed=speaker_embed)
  File "H:\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "H:\Tensorflow_Study\git\deepvoice3_pytorch\deepvoice3_pytorch\deepvoice3.py", line 75, in forward
    x = self.embed_tokens(text_sequences) <- change this to long!
  File "H:\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "H:\envs\pytorch\lib\site-packages\torch\nn\modules\sparse.py", line 103, in forward
    self.scale_grad_by_freq, self.sparse
  File "H:\envs\pytorch\lib\site-packages\torch\nn\_functions\thnn\sparse.py", line 59, in forward
    output = torch.index_select(weight, 0, indices.view(-1))
TypeError: torch.index_select received an invalid combination of arguments - got (torch.cuda.FloatTensor, int, torch.cuda.IntTensor), but expected (torch.cuda.FloatTensor source, int dim, torch.cuda.LongTensor index)

Changed text_sequences to long, as required by torch.index_select.

* Fixed NoneType error in collect_features

* requirements.txt fix

* Memory Leakage bugfix + hparams change

* Pre-PR modifications

* Pre-PR modifications 2

* Pre-PR modifications 3

* Post-PR modification

* remove requirements.txt

* num_workers to 1 in train.py
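
The first item in the commit message above fixes the TypeError by passing the token ids to the embedding as a LongTensor. A minimal sketch of that cast, written against a recent PyTorch API and run on CPU; the embedding sizes and token ids are hypothetical, and the real change lives in the project's own code, which is not shown here. Newer PyTorch releases may accept 32-bit indices directly, so the cast matters mainly for the 0.3.1 setup in the traceback.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
embed_tokens = nn.Embedding(num_embeddings=10, embedding_dim=4)

# Token ids that arrive as 32-bit integers, as in the failing call.
text_sequences = torch.tensor([[1, 2, 3]], dtype=torch.int32)

# Cast to 64-bit integers (LongTensor) before the embedding lookup.
text_sequences = text_sequences.long()
x = embed_tokens(text_sequences)  # shape: (batch, time, embedding_dim)
```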
@engiecat
Contributor Author

Fixed for now!

engiecat added a commit to engiecat/deepvoice3_pytorch that referenced this issue May 5, 2018
Windows Filename bugfix

On Windows, this causes WinError 123

Windows Specific Filename bugfix (r9y9#58)

* Windows log filename bugfix

* Revert "Windows log filename bugfix"

This reverts commit 5214c24.

* merge 2

* Windows Filename bugfix

On Windows, this causes WinError 123

* Cleanup before PR
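
WinError 123 is the Windows error "The filename, directory name, or volume label syntax is incorrect". The commit message does not show which name triggered it, so the following is only a sketch under the assumption that a path contained a character Windows forbids, such as the ":" in a timestamp. `windows_safe_name` and the "log/run-" prefix are hypothetical, not names from the repository.

```python
import re
from datetime import datetime

# Characters that Windows does not allow in a file or directory name.
_WINDOWS_FORBIDDEN = r'[<>:"/\\|?*]'


def windows_safe_name(name, replacement="_"):
    """Replace characters that are invalid in Windows file names."""
    return re.sub(_WINDOWS_FORBIDDEN, replacement, name)


# A fixed timestamp so the example is reproducible.
stamp = datetime(2018, 5, 5, 12, 30, 0).isoformat()
log_dir = "log/run-" + windows_safe_name(stamp)
```

Only the final path component is sanitized here, so the "/" separator in the prefix is left untouched.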