Persistent MemoryError while training on VCTK #50

engiecat · 2018-03-04T01:28:31Z

Hello. I am currently trying to train the DeepVoice 3 multi-speaker model on VCTK.
It mostly works, but training sometimes crashes with the following error.

2734it [13:58,  3.26it/s]Traceback (most recent call last):
  File "train.py", line 957, in <module>
    train_seq2seq=train_seq2seq, train_postnet=train_postnet)
  File "train.py", line 585, in train
    in tqdm(enumerate(data_loader)):
  File "H:\envs\pytorch\lib\site-packages\tqdm\_tqdm.py", line 959, in __iter__
    for obj in iterable:
  File "H:\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File "H:\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
MemoryError: Traceback (most recent call last):
  File "H:\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 55, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "H:\Tensorflow_Study\git\deepvoice3_pytorch\train.py", line 329, in collate_fn
    dtype=np.float32)
MemoryError

Forcing garbage collection occasionally (using gc.collect()) doesn't help.
I have 16 GB of RAM, plus 48 GB of virtual memory available on my SSD (just in case).
(Using Windows 10 with PyTorch 0.3.1, CUDA 8.0, and a GTX 1060 6GB.)
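For a sense of scale, the failing line in collate_fn allocates a padded float32 array for one batch. The sketch below estimates how many bytes such an array needs and how much could be queued at once. The function names, the example shapes (16 utterances, 1000 frames, 513 bins), and the assumption that each worker keeps about two batches in flight are all illustrative and not taken from train.py.

```python
def padded_batch_bytes(batch_size, max_frames, feature_dim, itemsize=4):
    """Bytes needed for one zero-padded batch; itemsize=4 corresponds to float32."""
    return batch_size * max_frames * feature_dim * itemsize


def queued_bytes(per_batch, num_workers, batches_per_worker=2):
    """Rough upper bound on batch memory waiting in the loader queue.

    batches_per_worker=2 is an assumption about prefetching, not a measured value.
    """
    return per_batch * num_workers * batches_per_worker


# Hypothetical shapes, chosen only to illustrate the arithmetic.
per_batch = padded_batch_bytes(batch_size=16, max_frames=1000, feature_dim=513)
print(per_batch / 2**20, "MiB per batch")
print(queued_bytes(per_batch, num_workers=2) / 2**20, "MiB queued with 2 workers")
print(queued_bytes(per_batch, num_workers=1) / 2**20, "MiB queued with 1 worker")
```

If the estimate is small compared with the installed RAM, as it is for these numbers, the error is more likely caused by memory that is never released than by a single oversized batch.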

Also, in Resource Monitor, the Commit (KB) and Working Set (KB) figures for the process differ significantly, as shown below. (Sorry for the non-English.)

Thank you for creating such a wonderful implementation!
:)


r9y9 · 2018-03-04T03:09:30Z

Thank you for the report. I have never seen a MemoryError on Linux, so I'm guessing this is a Windows-specific issue... If gc.collect() doesn't help, this is possibly a PyTorch (C++-side) issue.
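One way to probe the guess above is to measure whether Python-level allocations grow across iterations. The sketch below uses the standard-library tracemalloc module; traced_growth and the two step functions are hypothetical stand-ins for a training step. Memory allocated by native code outside Python's allocator would not show up in this number, so flat traced memory alongside a growing process size would point toward the native side.

```python
import gc
import tracemalloc


def traced_growth(step, iterations=100):
    """Run step() repeatedly and return the change in Python-traced memory, in bytes."""
    tracemalloc.start()
    gc.collect()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        step()
    gc.collect()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


leaked = []


def leaking_step():
    # Keeps a reference to every buffer, so traced memory grows each call.
    leaked.append(bytearray(10_000))


def clean_step():
    # The buffer is released as soon as the call returns.
    bytearray(10_000)


print("leaking step growth:", traced_growth(leaking_step))
print("clean step growth:", traced_growth(clean_step))
```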

r9y9 · 2018-03-04T03:11:13Z

Maybe worth trying num_workers=1 for DataLoader?

deepvoice3_pytorch/hparams.py, lines 82 to 84 in 5dc4426:

    # Data loader
    pin_memory=True,
    num_workers=2,
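The suggestion can be applied per platform rather than globally. This is a minimal sketch, assuming you want to keep the default of two workers elsewhere and drop to one only on Windows; loader_kwargs is a hypothetical helper, not part of the repository, and the batch size of 16 is made up.

```python
import sys


def loader_kwargs(batch_size, platform=sys.platform):
    """Build keyword arguments intended for torch.utils.data.DataLoader."""
    # Assumption: one worker on Windows, two elsewhere (the hparams.py default).
    num_workers = 1 if platform.startswith("win") else 2
    return {
        "batch_size": batch_size,
        "num_workers": num_workers,
        "pin_memory": True,
    }


print(loader_kwargs(16, platform="win32"))
print(loader_kwargs(16, platform="linux"))
```

The resulting dictionary could be unpacked into the DataLoader constructor together with the dataset and the project's own collate_fn.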

engiecat · 2018-03-04T22:07:32Z

@r9y9 It hasn't crashed for 15 hours and is still working well (compared to ~2.5 hours previously).
Thank you for your assistance!!!

* Fixed typeerror (torch.index_select received an invalid combination of arguments)

  File "synthesis.py", line 137, in <module>
    model, text, p=replace_pronunciation_prob, speaker_id=speaker_id, fast=True)
  File "synthesis.py", line 66, in tts
    sequence, text_positions=text_positions, speaker_ids=speaker_ids)
  File "H:\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "H:\Tensorflow_Study\git\deepvoice3_pytorch\deepvoice3_pytorch\__init__.py", line 79, in forward
    text_positions, frame_positions, input_lengths)
  File "H:\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "H:\Tensorflow_Study\git\deepvoice3_pytorch\deepvoice3_pytorch\__init__.py", line 116, in forward
    text_sequences, lengths=input_lengths, speaker_embed=speaker_embed)
  File "H:\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "H:\Tensorflow_Study\git\deepvoice3_pytorch\deepvoice3_pytorch\deepvoice3.py", line 75, in forward
    x = self.embed_tokens(text_sequences) <- change this to long!
  File "H:\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "H:\envs\pytorch\lib\site-packages\torch\nn\modules\sparse.py", line 103, in forward
    self.scale_grad_by_freq, self.sparse
  File "H:\envs\pytorch\lib\site-packages\torch\nn\_functions\thnn\sparse.py", line 59, in forward
    output = torch.index_select(weight, 0, indices.view(-1))
  TypeError: torch.index_select received an invalid combination of arguments - got (torch.cuda.FloatTensor, int, torch.cuda.IntTensor), but expected (torch.cuda.FloatTensor source, int dim, torch.cuda.LongTensor index)

  changed text_sequence to long, as required by torch.index_select.
* Fixed Nonetype error in collect_features
* requirements.txt fix
* Memory Leakage bugfix + hparams change
* Pre-PR modifications
* Pre-PR modifications 2
* Pre-PR modifications 3
* Post-PR modification
* remove requirements.txt
* num_workers to 1 in train.py

engiecat · 2018-03-10T10:04:39Z

Fixed for now!

Windows Filename bugfix

In Windows, this causes WinError 123

Windows Specific Filename bugfix (r9y9#58)

* Fixed typeerror (torch.index_select received an invalid combination of arguments); changed text_sequence to long, as required by torch.index_select.
* Fixed Nonetype error in collect_features
* requirements.txt fix
* Memory Leakage bugfix + hparams change
* Pre-PR modifications
* Pre-PR modifications 2
* Pre-PR modifications 3
* Post-PR modification
* remove requirements.txt
* num_workers to 1 in train.py
* Windows log filename bugfix
* Revert "Windows log filename bugfix" This reverts commit 5214c24.
* merge 2
* Windows Filename bugfix In Windows, this causes WinError 123
* Cleanup before PR

engiecat closed this as completed Mar 4, 2018

engiecat reopened this Mar 4, 2018

engiecat mentioned this issue Mar 8, 2018

Fix for #37, #50 and #53 (Windows specific issues) #54

Merged

engiecat closed this as completed Mar 10, 2018
