
prepare.py slow speed #18

Open
danpovey opened this issue Nov 16, 2020 · 12 comments

Comments

@danpovey
Contributor

I have been running some experiments to debug why prepare.py is so slow.
I'll post the results here in a bit, but let me just give you a preview of why I believe it's so slow.
By default, torch seems to use a lot of threads, but it does so in a very inefficient way which actually just wastes CPU.
We attempt to get around this with:

  torch.set_num_threads(1)
  torch.set_num_interop_threads(1)

but now that we are using "spawn" as the parallelization method and the main() is guarded by:

if __name__ == '__main__':
    main()

the spawned worker processes re-import the script without executing main(), so that code is never reached in them.
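For illustration, here is a minimal, self-contained sketch (hypothetical, not the actual prepare.py) of why the guarded code has no effect on the workers: with "spawn", each worker re-imports the module, so only module-level code runs in it.

import multiprocessing
from concurrent.futures import ProcessPoolExecutor

import torch

# Module level: spawned workers re-import this module, so code placed here
# DOES run in every worker.
# torch.set_num_threads(1)   # placing the call here is what the fix amounts to

def worker_threads(_):
    # Report how many intra-op threads torch will use in this worker.
    return torch.get_num_threads()

if __name__ == '__main__':
    # Only the parent process executes this block; spawned workers never do,
    # so a set_num_threads(1) call placed here leaves them at the default.
    torch.set_num_threads(1)
    executor = ProcessPoolExecutor(
        2, mp_context=multiprocessing.get_context("spawn"))
    print("parent:", torch.get_num_threads())                        # 1
    print("workers:", list(executor.map(worker_threads, range(2))))  # defaults, e.g. [24, 24]
    executor.shutdown()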

@danpovey
Contributor Author

Here are my notes from investigating the time to prepare dev-clean, including writing llc files.
The script prepare.py was modified to skip the prepare_librispeech step and just load the relevant manifest from
disk, so the time is just the time taken in cut_set.compute_and_store_features().

Below I show various setups. TL;DR: the following statements:

torch.set_num_threads(1)
torch.set_num_interop_threads(1)

need to be moved to OUTSIDE OF main() for a 100-fold speed improvement (see the sketch after the timings below).
This should be done regardless of the num-jobs or executor setup.

with executor = None; torch.set_num_threads(1); torch.set_num_interop_threads(1):
    real 1m17, user 0m51, sys 0m22

with executor = None and no torch.set_* calls, which means it uses about 25 CPUs:
    real 1m48, user 45m45, sys 1m6

with executor = ProcessPoolExecutor(num_jobs, mp_context=multiprocessing.get_context("spawn")),
  [inside main(): torch.set_num_threads(1); torch.set_num_interop_threads(1)]; num_jobs = 1:
    uses about 25 CPUs within the single job:
    real 1m35, user 39m30, sys 1m16

with executor = ProcessPoolExecutor(num_jobs, mp_context=multiprocessing.get_context("spawn")),
  [inside main(): torch.set_num_threads(10); torch.set_num_interop_threads(1)]; num_jobs = 10:
    each worker process uses about 500% CPU:
    real 10m16.768s, user 299m8.986s, sys 2m26.761s

with executor = ProcessPoolExecutor(num_jobs, mp_context=multiprocessing.get_context("spawn")),
  [outside main(): torch.set_num_threads(10); torch.set_num_interop_threads(1)]; num_jobs = 10:
    each worker process uses about 500% CPU:
    real 0m13.860s, user 1m8.146s, sys 0m44.639s
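For concreteness, here is a sketch of the layout the TL;DR describes (an assumption about how prepare.py would be restructured, not its actual contents):

import multiprocessing
from concurrent.futures import ProcessPoolExecutor

import torch

# Outside main(): executed by the parent and, crucially, by every spawned
# worker when it re-imports this script.
torch.set_num_threads(1)
torch.set_num_interop_threads(1)

num_jobs = 10  # illustrative value

def main():
    executor = ProcessPoolExecutor(
        num_jobs, mp_context=multiprocessing.get_context("spawn"))
    # ... cut_set.compute_and_store_features(...) would be run here,
    # passing this executor ...

if __name__ == '__main__':
    main()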

@danpovey
Contributor Author

Also notice there is about a factor of 300 difference in speed between the last two runs, if you look at the user time taken (i.e. the total across all CPUs).

@jimbozhang
Contributor

Let me try it following your suggestion and make a PR to fix it.

@jimbozhang
Contributor

jimbozhang commented Nov 16, 2020

Yes, it is much faster! 👍

Use 48 cores on ('dev-clean', 'test-clean'):

real    0m47.141s
user    3m11.562s
sys     1m39.085s

Use 72 cores on ('dev-clean', 'test-clean', 'train-clean-100'):

real    6m20.349s
user    32m31.692s
sys     7m57.317s

It is slower than Dan's result; I think that is caused by the NFS latency of my home directory.

@pzelasko
Collaborator

🎉

@mthrok

mthrok commented Nov 20, 2020

I have disabled OpenMP support for libsox on the torchaudio side (pytorch/audio#1026). It will be available through the nightly binary distribution starting tomorrow. (Please check pytorch.org for how to install the nightly build.)

This should remove the suspected cause of the crashes (GNU OpenMP and Intel OpenMP co-existing and interfering), and will allow you to go back to the fork method without worrying about the OMP_NUM_THREADS configuration (or torch.set_num_threads).

We have a minor release scheduled in early December, so we would appreciate it if you could test it and confirm whether or not it resolves the issue. pytorch/audio#1021

@danpovey
Contributor Author

@jimbozhang would you mind testing this? (I think what is required is testing that the speed is OK after you revert "spawn" back to "fork" and remove the stuff about setting the torch num_threads.)
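Concretely, reverting to the fork method presumably amounts to something like the following sketch (num_jobs is illustrative, not the value used in prepare.py):

import multiprocessing
from concurrent.futures import ProcessPoolExecutor

num_jobs = 10  # illustrative value

# With libsox's OpenMP support disabled in the torchaudio nightlies, the
# "fork" start method should be safe again, and the torch.set_num_threads /
# OMP_NUM_THREADS workarounds should no longer be needed.
executor = ProcessPoolExecutor(
    num_jobs, mp_context=multiprocessing.get_context("fork"))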

@jimbozhang
Contributor

jimbozhang commented Nov 21, 2020

@jimbozhang would you mind testing this? (I think what is required is testing that the speed is OK after you revert "spawn" back to "fork" and remove the stuff about setting the torch num_threads.)

OK, I'll work on this (and the k2 NVTX stuff) next Monday. @danpovey

@jimbozhang
Contributor

jimbozhang commented Nov 24, 2020

With the latest versions of k2 + lhotse + snowfall, under the path snowfall/egs/librispeech/asr/simple_v1:

$ time python prepare.py
dev-clean
Extracting and storing features in parallel: 100%|_________| 2703/2703 [00:07<00:00, 364.28it/s]

test-clean
Extracting and storing features in parallel: 100%|_________| 2620/2620 [00:07<00:00, 342.19it/s]

train-clean-100
Extracting and storing features in parallel: 100%|_________| 28539/28539 [01:26<00:00, 331.35it/s]

real    3m41.975s
user    33m55.149s
sys     5m25.221s

I didn't modify the job number setting, so num_jobs = min(15, os.cpu_count()) should be 15.

@danpovey
Contributor Author

danpovey commented Nov 24, 2020 via email

@jimbozhang
Contributor

Ah, apologies, I misunderstood. I didn't notice mthrok's comment at all 😓 and just tested the latest snowfall. Sorry about that ...

I'll try torchaudio with the fork method tomorrow.

@jimbozhang
Contributor

I tested it. The snowfall/egs/librispeech/asr/simple_v1/prepare.py worked fine when I just replaced "spawn" with "fork", with torchaudio (0.8.0.dev20201125) + torch (1.8.0.dev20201126+cu101).

But after I removed the torch.set_num_threads settings, it became extremely slow, same as before. @mthrok @danpovey
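For reference, a quick way to check what the forked workers actually end up with is something like the following (a hypothetical diagnostic snippet, not part of prepare.py):

import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor

import torch

def thread_info(_):
    # Intra-op thread count torch will use in this worker, plus the OpenMP
    # environment setting, if any.
    return torch.get_num_threads(), os.environ.get("OMP_NUM_THREADS")

if __name__ == '__main__':
    executor = ProcessPoolExecutor(
        4, mp_context=multiprocessing.get_context("fork"))
    print(list(executor.map(thread_info, range(4))))
    executor.shutdown()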
