
RTX2080 GPU is only being used at 30% - whilst a single CPU core is maxed out #1080

Closed
tombburnell opened this issue Sep 9, 2019 · 9 comments
Labels: bug (Something isn't working), wontfix (This will not be worked on)

@tombburnell

I am trying to train a model and I find that the GPU (RTX 2080) is only being used at approx. 30%, whilst one of my CPU cores (9900K @ 5GHz) appears to be maxed out.

I'm trying to train a corpus of roughly 100k records tagged with approx 100 keywords.

  • ubuntu18
  • cuda10
  • python 3.7.3
  • 'en' embeddings
  • embeddings_storage_mode=gpu

Is it expected that the GPU is not fully utilised?

Can it not use more than 1 cpu core?

What other info should I provide?

@tombburnell added the bug label on Sep 9, 2019
@pommedeterresautee
Contributor

What task are you trying to perform?
LM training should be around 100%.
For NER training, there are a few slow operations left to replace to improve performance (some memory transfers that can be reduced, some operations that can be vectorized, etc.).
I have not yet worked on classification.

Work is ongoing:
#1074
#1068
#1053
#1038

A few more are coming.
With all of them applied, my own 2080 Ti goes up to 50%.
More optimizations may help reduce inference time further; please don't hesitate to help!
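
To make those two fixes concrete, here is a rough sketch (illustrative only, not actual flair code) contrasting a per-token loop with many small host-to-GPU transfers against one bulk transfer and one vectorized operation:

import torch

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
embeddings = torch.randn(1000, 300)  # e.g. one 300-dim vector per token, on the CPU

# Slow: one small memory transfer and one tiny kernel launch per token
scaled_slow = [embeddings[i].to(device) * 2.0 for i in range(embeddings.size(0))]

# Fast: one bulk transfer, then one vectorized kernel over all tokens
scaled_fast = embeddings.to(device) * 2.0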

@tombburnell
Author

tombburnell commented Sep 10, 2019

Hi,
Sorry, I'm very new to this so not familiar with the terminology.
I'm trying to create a model that predicts multiple relevant topics given a text document.

NER is named entity recognition? I'm not doing that. What is LM?

Once trained, I run a script ./predict "Some text" and get back a list of labels and their scores.

I have created my corpus in FastText Format. I have eliminated labels that occur infrequently.
I notice that more labels seem to make it exponentially slower.

My training code is:

import torch
import flair
from flair.data import Corpus
from flair.datasets import ClassificationCorpus
from flair.embeddings import WordEmbeddings, FlairEmbeddings, DocumentRNNEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

if torch.cuda.is_available():
    flair.device = torch.device('cuda:0')
else:
    flair.device = torch.device('cpu')

FASTTEXT_EMBEDDINGS = 'en'
# this is the folder in which train, test and dev files reside
data_folder = 'corpus'
# load corpus containing training, test and dev data
corpus: Corpus = ClassificationCorpus(data_folder,
                                      test_file='test.txt',
                                      dev_file='dev.txt',
                                      train_file='train.txt')
# 2. create the label dictionary
label_dict = corpus.make_label_dictionary()
# 3. make a list of word embeddings
word_embeddings = [
                   #WordEmbeddings('glove'),
                   WordEmbeddings('en')
                   # comment in flair embeddings for state-of-the-art results
                   # FlairEmbeddings('news-forward'),
                   # FlairEmbeddings('news-backward'),
                   ]
# 4. initialize document embedding by passing list of word embeddings
# Can choose between many RNN types (GRU by default, to change use rnn_type parameter)
document_embeddings: DocumentRNNEmbeddings = DocumentRNNEmbeddings(word_embeddings,
                                                                     hidden_size=512,
                                                                     reproject_words=True,
                                                                     reproject_words_dimension=256,
                                                                     )
# 5. create the text classifier
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict)

# 6. initialize the text classifier trainer
trainer = ModelTrainer(classifier, corpus)
trainer.train('model',
              embeddings_storage_mode = 'gpu',
              learning_rate=0.1,
              mini_batch_size=32,
              anneal_factor=0.5,
              patience=5,
              max_epochs=200)

@pommedeterresautee
Contributor

LM -> language model.
You are right about NER; and yes, you are trying to perform classification.

I have not yet worked on classification optimization. What I can tell you is that the best way to reduce inference time is to increase GPU use by spotting and removing as many slow operations as possible. Parallelization may be an answer for some parts of the process, but so far I have had better results by vectorizing and avoiding memory copies.

If you are curious, you can check what slows things down using a profiler such as cProfile or the PyTorch profiler (easy to use, and it produces an easy-to-read report).
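
For example, a minimal sketch (the train_step() below is a hypothetical stand-in for whatever you want to time, e.g. a few mini-batches of trainer.train(...)):

import cProfile
import pstats

import torch

def train_step():
    # Hypothetical stand-in for the call you want to profile.
    ...

# cProfile: shows where the Python-side (CPU) time goes
cProfile.run('train_step()', 'train.prof')
pstats.Stats('train.prof').sort_stats('cumulative').print_stats(20)

# PyTorch autograd profiler: per-operator CPU and CUDA timings
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    train_step()
print(prof.key_averages().table(sort_by='cuda_time_total'))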

@tombburnell
Author

Thanks, I'll take a look at that.

Also, for this kind of classification task, should I be using Word Embeddings or Document Embeddings?

@alanakbik
Collaborator

Your code looks good, i.e. you use document embeddings built from a stack of word embeddings, which is the right setup for this kind of classification.
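
For illustration, a small sketch (using the same flair API as the code above) of what the document embedding produces, namely one fixed-size vector per text, which is exactly what the classifier consumes:

from flair.data import Sentence
from flair.embeddings import WordEmbeddings, DocumentRNNEmbeddings

# A stack of word embeddings pooled by an RNN into a single document vector
doc_embeddings = DocumentRNNEmbeddings([WordEmbeddings('glove')], hidden_size=512)

sentence = Sentence('A short example document.')
doc_embeddings.embed(sentence)
print(sentence.get_embedding().shape)  # one vector for the whole text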

@tombburnell
Author

I'm running a training job right now using the latest code and it only appears to be using 3-10% of my GPU.

2019-10-02 10:48:29,136 ----------------------------------------------------------------------------------------------------
2019-10-02 10:48:29,137 Model: "TextClassifier(
(document_embeddings): DocumentRNNEmbeddings(
(embeddings): StackedEmbeddings(
(list_embedding_0): WordEmbeddings('glove')
(list_embedding_1): WordEmbeddings('en')
)
(word_reprojection_map): Linear(in_features=400, out_features=256, bias=True)
(rnn): GRU(256, 512, batch_first=True)
(dropout): Dropout(p=0.5, inplace=False)
)
(decoder): Linear(in_features=512, out_features=27, bias=True)
(loss_function): BCEWithLogitsLoss()
)"
2019-10-02 10:48:29,137 ----------------------------------------------------------------------------------------------------
2019-10-02 10:48:29,137 Corpus: "Corpus: 188540 train + 23567 dev + 23567 test sentences"
2019-10-02 10:48:29,137 ----------------------------------------------------------------------------------------------------
2019-10-02 10:48:29,137 Parameters:
2019-10-02 10:48:29,137 - learning_rate: "0.1"
2019-10-02 10:48:29,137 - mini_batch_size: "32"
2019-10-02 10:48:29,137 - patience: "5"
2019-10-02 10:48:29,137 - anneal_factor: "0.5"
2019-10-02 10:48:29,137 - max_epochs: "200"
2019-10-02 10:48:29,137 - shuffle: "True"
2019-10-02 10:48:29,137 - train_with_dev: "False"
2019-10-02 10:48:29,137 - batch_growth_annealing: "False"

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50 Driver Version: 430.50 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2080 Off | 00000000:02:00.0 On | N/A |
| 27% 52C P8 22W / 215W | 2972MiB / 7979MiB | 3% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1658 G /usr/lib/xorg/Xorg 24MiB |
| 0 1696 G /usr/bin/gnome-shell 58MiB |
| 0 1881 G /usr/lib/xorg/Xorg 226MiB |
| 0 2030 G /usr/bin/gnome-shell 169MiB |
| 0 4705 G ...uest-channel-token=16982847936099033133 75MiB |
| 0 10912 G ...uest-channel-token=17263537349738535565 62MiB |
| 0 13452 C python 1363MiB |
| 0 18113 G ...pycharm-community-2019.2.1/jbr/bin/java 8MiB |
| 0 30650 C python 977MiB |
+-----------------------------------------------------------------------------+

A colleague ran some lower-level torch code on the same machine yesterday and was getting 99% GPU utilisation.
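
For reference, the kind of pure-PyTorch loop that keeps every tensor resident on the GPU (a rough sketch, not the exact code my colleague ran) looks like this:

import torch

device = torch.device('cuda:0')
model = torch.nn.Linear(1024, 1024).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(4096, 1024, device=device)
y = torch.randn(4096, 1024, device=device)

for _ in range(1000):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()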

@pommedeterresautee
Contributor

I have worked on the TextClassifier these last few days.
The optimizations are not yet pushed because I am waiting for a fix to the sentiment model to be merged first (#1165); otherwise I can't check that I haven't broken model accuracy. Anyway, soon it should run much faster.

@tombburnell
Author

OK, thanks - looking forward to it.

@stale

stale bot commented Apr 29, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label on Apr 29, 2020
@stale stale bot closed this as completed on May 6, 2020